GraphQL gives clients freedom to traverse object graphs. That freedom punishes naive resolvers: a single query for 20 posts with author and comments fields can trigger 41 SQL roundtrips. DataLoader fixes it in one place, per request, without restructuring your services.
This post walks through a real wiring in a NestJS 11 + Apollo 5 + Prisma 7 codebase: where DataLoader lives, how it sits between resolvers and repositories, and a runtime toggle that lets you flip batching on/off to see the difference.
The N+1 Problem in 30 Seconds
Given this query against a Relay-style connection:
query {
feed(first: 20) {
edges {
node {
id
content
author { id name }
comments { id content }
}
}
pageInfo { hasNextPage endCursor }
}
}
A naive implementation:
- feed resolver → SELECT ... FROM posts ORDER BY created_at DESC LIMIT 20 (1 query)
- For each post, author field → SELECT ... FROM users WHERE id = ? (20 queries)
- For each post, comments field → SELECT ... FROM comments WHERE post_id = ? (20 queries)
Total: 41 queries for 20 rows. The client requested one logical thing; the database paid for forty-one.
(The Relay edges/node envelope is just pagination metadata — the N+1 lives in the field resolvers under node, identical to a flat [Post] return.)
The fix isn’t “use joins” — GraphQL field resolvers run independently and can’t see siblings. The fix is to collect all .load(id) calls in the same tick of the event loop, hit the DB once, then fan results back out. That’s DataLoader.
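To make the "same tick" idea concrete, here is a minimal sketch of the mechanism. It is not the real DataLoader implementation (no caching, no key de-duplication), and createTinyBatcher and demo are hypothetical names used only for illustration.

```typescript
// Minimal illustration of tick-based batching; NOT the real DataLoader implementation.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>

function createTinyBatcher<K, V>(batchFn: BatchFn<K, V>) {
  let queue: { key: K; resolve: (v: V) => void; reject: (e: unknown) => void }[] = []

  const flush = async () => {
    const pending = queue
    queue = [] // later load() calls start a new batch
    try {
      const values = await batchFn(pending.map((p) => p.key))
      pending.forEach((p, i) => p.resolve(values[i]))
    } catch (err) {
      pending.forEach((p) => p.reject(err))
    }
  }

  return {
    load(key: K): Promise<V> {
      return new Promise<V>((resolve, reject) => {
        // The first call in this tick schedules one flush; the rest just join the queue.
        if (queue.length === 0) queueMicrotask(flush)
        queue.push({ key, resolve, reject })
      })
    },
  }
}

// Usage: three loads issued synchronously reach batchFn as a single array.
async function demo(): Promise<number[][]> {
  const calls: number[][] = []
  const loader = createTinyBatcher<number, string>(async (ids) => {
    calls.push([...ids])
    return ids.map((id) => `user-${id}`)
  })
  await Promise.all([loader.load(1), loader.load(2), loader.load(3)])
  return calls // [[1, 2, 3]]
}
```

The sketch schedules its flush with queueMicrotask for brevity; the point is only that the batch function runs once, after every synchronous load() call has queued its key.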
Anatomy of DataLoader
DataLoader (Facebook OSS, ~2 KB) is a small class with two jobs:
- Batching: loader.load(1), loader.load(2), loader.load(3) called synchronously resolve to a single batchFn([1, 2, 3]) call.
- Caching: within the same loader instance, repeated .load(sameId) returns the cached promise.
import DataLoader from 'dataloader'
const userLoader = new DataLoader<string, User | null>(async (ids) => {
const rows = await prisma.user.findMany({ where: { id: { in: [...ids] } } })
// CRITICAL: result must be in the same order as input ids
const byId = new Map(rows.map((r) => [r.id, r]))
return ids.map((id) => byId.get(id) ?? null)
})
await Promise.all([userLoader.load('1'), userLoader.load('2'), userLoader.load('1')])
// → 1 SQL query; the second load('1') reuses the cached promise for key '1'
Two non-obvious rules of batchFn:
- Output array length must equal input array length.
- Output order must match input order, with null (or an Error) in the slot of each missing key.
Get the length wrong and DataLoader rejects the whole batch with an error; get the order wrong and it silently mismatches users to posts. Always go through a Map keyed by id.
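Both rules can be folded into one small helper. The sketch below is an illustration, not part of the codebase described here: alignToKeys is a hypothetical name, and the sample rows are made up.

```typescript
// Hypothetical helper: align fetched rows with the keys DataLoader handed to batchFn.
function alignToKeys<K, R>(
  keys: readonly K[],
  rows: readonly R[],
  keyOf: (row: R) => K,
): (R | null)[] {
  const byKey = new Map<K, R>()
  for (const row of rows) byKey.set(keyOf(row), row)
  // One output slot per input key, in input order; missing rows become null.
  return keys.map((key) => byKey.get(key) ?? null)
}

// Rows come back in database order, and id '2' does not exist.
const rows = [
  { id: '3', name: 'Cy' },
  { id: '1', name: 'Ann' },
]
const aligned = alignToKeys(['1', '2', '3', '1'], rows, (r) => r.id)
// aligned has 4 slots: Ann, null, Cy, Ann
```

Because the output is built by mapping over the keys, its length and order follow the input by construction, whatever the database returns.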
Per-Request Lifecycle (and Why It Matters)
A DataLoader instance is a request-scoped cache. Reuse one across requests and:
- Stale data leaks between users.
- Authorization-sensitive fields (isFollowing, canEdit) compute against the wrong viewer.
- Memory grows unbounded.
Therefore: build loaders inside the GraphQL context factory, and throw them away when the response is sent.
Wiring It in NestJS + Apollo
1. The Loader Factory
Loaders depend on repositories, which are NestJS providers. Wrap them in an injectable service that exposes a createLoaders() method per request:
// src/graphql/dataloader/dataloader.service.ts
import { Comment, ICommentRepository } from '@modules/comment'
import { IFollowRepository } from '@modules/follow'
import { IUserRepository, User } from '@modules/user'
import { Injectable } from '@nestjs/common'
import DataLoader from 'dataloader'
export interface RequestLoaders {
userById: DataLoader<string, User | null>
commentsByPost: DataLoader<string, Comment[]>
followersCountByUser: DataLoader<string, number>
isFollowingByUser: DataLoader<string, boolean>
}
@Injectable()
export class DataLoaderService {
constructor(
private readonly users: IUserRepository,
private readonly comments: ICommentRepository,
private readonly follows: IFollowRepository,
) {}
createLoaders(viewerId?: string): RequestLoaders {
return {
userById: new DataLoader(async (ids) => {
const all = await this.users.findByIds([...ids])
const byId = new Map(all.map((u) => [u.id, u]))
return ids.map((id) => byId.get(id) ?? null)
}),
commentsByPost: new DataLoader(async (postIds) => {
const all = await this.comments.findByPostIds([...postIds])
return postIds.map((pid) => all.filter((c) => c.postId === pid))
}),
followersCountByUser: new DataLoader(async (userIds) => {
const map = await this.follows.countFollowers([...userIds])
return userIds.map((id) => map.get(id) ?? 0)
}),
isFollowingByUser: new DataLoader(async (userIds) => {
if (!viewerId) return userIds.map(() => false)
const set = await this.follows.isFollowingBatch(viewerId, [...userIds])
return userIds.map((id) => set.has(id))
}),
}
}
}
Notice isFollowingByUser closes over viewerId. That’s why a fresh set of loaders must be created per request — the cache is bound to a specific authenticated user.
2. Repository Methods Built for Batching
DataLoader is only as good as the batch function behind it. Repositories need bulk methods that take arrays:
// src/modules/follow/infrastructure/repositories/follow.prisma-repository.ts
async countFollowers(userIds: string[]): Promise<Map<string, number>> {
const rows = await this.prisma.follow.groupBy({
by: ['followingId'],
where: { followingId: { in: userIds } },
_count: { followerId: true },
})
return new Map(rows.map((r) => [r.followingId, r._count.followerId]))
}
async isFollowingBatch(viewerId: string, targetIds: string[]): Promise<Set<string>> {
const rows = await this.prisma.follow.findMany({
where: { followerId: viewerId, followingId: { in: targetIds } },
select: { followingId: true },
})
return new Set(rows.map((r) => r.followingId))
}
Returning a Map or Set keeps the batch function trivial: lookup by id, default to zero/false.
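The same idea carries over to one-to-many relations such as commentsByPost: grouping the rows once into a Map avoids re-filtering the full result for every post id. A minimal sketch, assuming a simplified CommentRow shape; groupByKey and commentsForPosts are hypothetical helpers, not methods of the repositories above.

```typescript
interface CommentRow {
  id: string
  postId: string
  content: string
}

// Hypothetical helper: bucket rows by a key in a single pass.
function groupByKey<K, R>(rows: readonly R[], keyOf: (row: R) => K): Map<K, R[]> {
  const groups = new Map<K, R[]>()
  for (const row of rows) {
    const key = keyOf(row)
    const bucket = groups.get(key)
    if (bucket) bucket.push(row)
    else groups.set(key, [row])
  }
  return groups
}

// Shape of a batch function body for a one-to-many loader.
function commentsForPosts(
  postIds: readonly string[],
  all: readonly CommentRow[],
): CommentRow[][] {
  const byPost = groupByKey(all, (c) => c.postId)
  // Posts without comments get an empty array, so length and order still match postIds.
  return postIds.map((pid) => byPost.get(pid) ?? [])
}
```

For a 20-post feed the difference is negligible; it starts to matter when a batch covers hundreds of parents with many children each.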
3. The Module (@Global() for Convenience)
// src/graphql/dataloader/dataloader.module.ts
import { Global, Module } from '@nestjs/common'
import { CommentModule } from '../../modules/comment/comment.module'
import { FollowModule } from '../../modules/follow/follow.module'
import { UserModule } from '../../modules/user/user.module'
import { DataLoaderService } from './dataloader.service'
@Global()
@Module({
imports: [UserModule, CommentModule, FollowModule],
providers: [DataLoaderService],
exports: [DataLoaderService],
})
export class DataLoaderModule {}
@Global() because every resolver wants loaders in context — explicit imports would be noise.
4. The Apollo context Factory
This is the only place that knows how to assemble a request-scoped context:
// src/app.module.ts
GraphQLModule.forRootAsync<ApolloDriverConfig>({
driver: ApolloDriver,
imports: [DataLoaderModule, AuthModule],
inject: [JwtService, DataLoaderService],
useFactory: (jwt: JwtService, loaderSvc: DataLoaderService) => ({
autoSchemaFile: join(process.cwd(), 'src/schema.gql'),
context: ({ req }) => {
// Parse JWT inline — keep auth simple, skip passport
const auth = req.headers.authorization as string | undefined
if (auth?.startsWith('Bearer ')) {
try {
const payload = jwt.verify(auth.slice(7)) as { sub: string; email: string }
req.user = { id: payload.sub, email: payload.email }
} catch {
/* invalid token — leave unauthenticated */
}
}
return {
req,
loaders: loaderSvc.createLoaders(req.user?.id),
}
},
}),
})
Each HTTP request → fresh loaders → isolated batching window. Apollo discards context after willSendResponse, taking the cache with it.
5. Typed Context for Resolvers
// src/common/graphql/gql-context.ts
import type { Request } from 'express'
import type { RequestLoaders } from '../../graphql/dataloader/dataloader.service'
export interface GqlContext {
req: Request & { user?: { id: string; email: string } }
loaders: RequestLoaders
}
6. Resolvers Just Call .load(id)
// src/resolvers/post.resolver.ts
@Resolver(() => PostType)
export class PostResolver {
@ResolveField(() => UserType)
author(@Parent() post: Post, @Context() ctx: GqlContext) {
return ctx.loaders.userById.load(post.authorId)
}
@ResolveField(() => [CommentType])
comments(@Parent() post: Post, @Context() ctx: GqlContext) {
return ctx.loaders.commentsByPost.load(post.id)
}
}
// src/resolvers/user.resolver.ts
@Resolver(() => UserType)
export class UserResolver {
@ResolveField(() => Int)
followersCount(@Parent() user: UserType, @Context() ctx: GqlContext) {
return ctx.loaders.followersCountByUser.load(user.id)
}
@ResolveField(() => Boolean)
isFollowing(@Parent() user: UserType, @Context() ctx: GqlContext) {
return ctx.loaders.isFollowingByUser.load(user.id)
}
}
That’s the entire integration. Resolvers stay one-liners; the orchestration lives in the loader factory.
Use Cases Worth Loading
| Field | Loader strategy | Wins |
|---|---|---|
| Post.author, Comment.author | Single userById shared across resolvers | Cache hit when same author appears in posts and their comments |
| Post.comments | commentsByPost keyed by postId | One WHERE post_id IN (...) instead of N queries |
| User.followersCount | Aggregate via GROUP BY | One count query for an entire feed of distinct authors |
| User.isFollowing | Viewer-scoped batch | One query checks “does viewer follow any of these users” |
The pattern: each @ResolveField that hits I/O gets a loader; same-typed loaders are shared across resolvers (a userById loader works equally well from Post.author, Comment.author, or Follow.target).
Demo Mode: Toggle Batching at Runtime
A great way to teach DataLoader’s value is to let users flip it off and watch query counts spike. Two pieces:
A KeyedLoader Interface
export interface KeyedLoader<K, V> {
load(key: K): Promise<V>
}
DataLoader<K, V> already implements this. So does a trivial fallback:
createLoaders(viewerId?: string, opts: { batch?: boolean } = {}): RequestLoaders {
const batch = opts.batch !== false
if (!batch) {
return {
userById: { load: (id) => this.users.findById(id) },
commentsByPost: {
load: async (postId) => {
const all = await this.comments.findByPostIds([postId])
return all.filter((c) => c.postId === postId)
},
},
// ... non-batched fallbacks for the rest
}
}
// ... real DataLoader instances (as before)
}
Resolvers don’t change: both branches satisfy KeyedLoader, provided the fields of RequestLoaders are typed as KeyedLoader<K, V> rather than DataLoader<K, V>.
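For both branches to type-check, the loader fields have to be declared against the interface rather than the DataLoader class. A rough sketch of that typing with a non-batching fallback; DemoUser, DemoLoaders, unbatched and findUserById are hypothetical stand-ins, not names from the codebase above.

```typescript
interface KeyedLoader<K, V> {
  load(key: K): Promise<V>
}

interface DemoUser {
  id: string
  name: string
}

// Trimmed-down stand-in for RequestLoaders, typed against the interface rather than the class.
interface DemoLoaders {
  userById: KeyedLoader<string, DemoUser | null>
}

// Hypothetical helper: wrap a single-row fetch so it looks like a loader
// but never batches or caches.
function unbatched<K, V>(fetchOne: (key: K) => Promise<V>): KeyedLoader<K, V> {
  return { load: (key) => fetchOne(key) }
}

// Fake repository call that records how often it is hit.
const fetchLog: string[] = []
async function findUserById(id: string): Promise<DemoUser | null> {
  fetchLog.push(id)
  return id === 'missing' ? null : { id, name: `user-${id}` }
}

const demoLoaders: DemoLoaders = { userById: unbatched(findUserById) }
```

Every load() on the fallback goes straight to the repository, including repeated keys, which is exactly the N+1 behaviour the demo mode is meant to expose.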
Header-Driven Switch + Stats Plugin
Read a header, count Prisma queries, return the count in extensions:
// app.module.ts context factory
context: ({ req }) => {
// ... auth parsing ...
const dataLoaderEnabled = req.headers['x-disable-dataloader'] !== '1'
const stats: RequestStats = { queryCount: 0, dataLoaderEnabled }
requestContext.enterWith(stats) // AsyncLocalStorage so Prisma middleware can bump the counter
return {
req,
loaders: loaderSvc.createLoaders(req.user?.id, { batch: dataLoaderEnabled }),
stats,
}
}
const demoStatsPlugin = {
async requestDidStart() {
return {
async willSendResponse({ response, contextValue }: any) {
const stats = contextValue?.stats
if (!stats || response.body.kind !== 'single') return
response.body.singleResult.extensions = {
...(response.body.singleResult.extensions ?? {}),
queryCount: stats.queryCount,
dataLoaderEnabled: stats.dataLoaderEnabled,
}
},
}
},
}
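A Prisma query hook increments stats.queryCount on every database query (in recent Prisma versions that means a client extension or a query event listener, since the older $use middleware API has been removed). Now the same GraphQL query returns: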
// With DataLoader
{ "data": {...}, "extensions": { "queryCount": 4, "dataLoaderEnabled": true } }
// curl -H "x-disable-dataloader: 1" ...
{ "data": {...}, "extensions": { "queryCount": 41, "dataLoaderEnabled": false } }
Wire a checkbox on the frontend that toggles the header → live demo of N+1 vs batched. Worth its weight in onboarding docs.
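The counter side of this can be sketched with nothing but Node's AsyncLocalStorage. The version below is an illustration under assumptions: it scopes the store with run() rather than the enterWith() call used above, and recordQuery, fakeQuery and simulateRequest are hypothetical names. In the real app, recordQuery would be called from whatever per-query hook the ORM offers.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

interface RequestStats {
  queryCount: number
  dataLoaderEnabled: boolean
}

const requestContext = new AsyncLocalStorage<RequestStats>()

// Call this from the ORM's per-query hook (hypothetical wiring).
function recordQuery(): void {
  const stats = requestContext.getStore()
  if (stats) stats.queryCount += 1 // outside a request there is no store, so nothing is counted
}

// Stand-in for a database call made somewhere deep in a repository.
async function fakeQuery(): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 1))
  recordQuery() // still sees the store of the request that started this call
}

async function simulateRequest(queries: number): Promise<RequestStats> {
  const stats: RequestStats = { queryCount: 0, dataLoaderEnabled: true }
  // run() scopes the store to this callback and everything it awaits.
  await requestContext.run(stats, async () => {
    await Promise.all(Array.from({ length: queries }, () => fakeQuery()))
  })
  return stats
}
```

Two overlapping calls to simulateRequest keep separate counts, which is the property the stats plugin relies on when requests run concurrently.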
Common Pitfalls
1. Sharing loaders across requests. Don’t make DataLoaderService hold loader instances. Always create fresh ones in the context factory.
2. Returning the wrong order from batchFn. Always go through a Map/Set keyed by id. Direct .findMany results are not guaranteed in input order.
3. Mixing entities and DTOs in cache. The cache key is K, but the cached value is whatever you returned. If userById sometimes returns User and sometimes returns User | null, downstream null checks get inconsistent. Pick one shape.
4. Caching mutations. DataLoader’s cache is read-through. If you update a user mid-request, the loader still returns the stale cached value. Either don’t load before mutations, or call loader.clear(id) after writes.
5. Forgetting non-id keys. A loader keyed by (viewerId, targetUserId) needs a composite key. Either stringify the tuple (`${viewer}:${target}`) or scope the loader to a single viewer (the approach above) so the key collapses to targetId.
6. Loading inside Promise chains. Calls to .load() must happen in the same tick to be batched together. After await something(), loader.load(x) runs in a later tick: it will batch with other late calls but not with the original burst. Usually fine; just know why.
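For pitfall 5, a stringified tuple might look like the sketch below. FollowKey, toCacheKey and resolveIsFollowing are hypothetical names, and the separator assumes ids never contain a colon.

```typescript
interface FollowKey {
  viewerId: string
  targetId: string
}

// Assumes ids never contain ':'; pick another separator or JSON.stringify otherwise.
const toCacheKey = (k: FollowKey): string => `${k.viewerId}:${k.targetId}`

// Batch function body for a loader keyed by (viewerId, targetId).
// `follows` is the set of existing "viewer:target" edges, standing in for a DB lookup.
function resolveIsFollowing(
  keys: readonly FollowKey[],
  follows: ReadonlySet<string>,
): boolean[] {
  return keys.map((k) => follows.has(toCacheKey(k)))
}
```

If the loader is keyed by the object itself rather than the string, DataLoader's cacheKeyFn option can be pointed at a function like toCacheKey so that two equal tuples share one cache entry instead of being compared by object identity.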
When NOT to Use DataLoader
- Root queries returning a single entity — no N to amortize.
- Mutations with a single write — overhead without payoff.
- Fields already eager-loaded on the parent (e.g., the post object already includes its author from a findFirst({ include: { author: true } })). In that case, the resolver should default to parent.author and only fall back to the loader if absent.
- Cross-request caching needs: DataLoader is per-request by design. Reach for Redis or an in-memory LRU instead.
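The eager-loaded case from the list above can be sketched as a plain function. The types here (Author, PostParent) and the resolveAuthor name are assumptions for illustration; in the NestJS resolver the same check would sit inside the author field resolver.

```typescript
interface KeyedLoader<K, V> {
  load(key: K): Promise<V>
}

interface Author {
  id: string
  name: string
}

interface PostParent {
  id: string
  authorId: string
  author?: Author | null // present only when the parent query included it
}

// Prefer the author already attached to the parent; only touch the loader when it is absent.
function resolveAuthor(
  post: PostParent,
  userById: KeyedLoader<string, Author | null>,
): Promise<Author | null> {
  if (post.author !== undefined) return Promise.resolve(post.author)
  return userById.load(post.authorId)
}
```

Checking for undefined rather than falsiness keeps an explicitly loaded null author from triggering a pointless second lookup.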
Why This Pattern Holds Up
- Resolvers stay dumb. They map parent → key → loader.load(key). Domain logic stays in services and repositories.
- Repositories gain real bulk methods. findByIds, findByPostIds, countFollowers([]) are useful outside GraphQL too: REST handlers, background jobs, exports.
- Per-request boundary is enforced by the context factory. No accidental leaks; lifecycle is obvious.
- Toggle is a 3-line change. Demoing the win, or rolling it back if a batch function breaks, costs nothing.
- Framework-agnostic core. DataLoaderService only depends on repository interfaces. Swap NestJS for Fastify or plain Express + graphql-yoga, the loader factory survives.