Every ride-hailing app answers one question, fast: which driver picks up this rider? It sounds simple until you remember the driver is moving, dozens of other riders are asking the same thing, and the answer has to come back before the rider closes the app.
This post walks through the matching core of a Grab-style ride-hailing demo: four NestJS microservices in a Turborepo monorepo, Redis GEO for nearest-driver search, a race-safe assignment step backed by Redis locks, and live driver tracking pushed over WebSocket. I’ll cover the stack first, then dig into the matching service — and the bug that taught me the most.
The Stack
The whole thing is a single Turborepo monorepo with a pnpm workspace — backend services and frontends side by side, plus shared packages.
| Layer | Choice | Why |
|---|---|---|
| Monorepo | Turborepo + pnpm workspace | One repo, shared contracts, cached builds |
| Services | NestJS 11 (TypeScript) | Modules, DI, guards — structure for free |
| ORM | Prisma 7 | New prisma-client generator, driver adapters |
| Database | PostgreSQL 16, schema-per-service | One instance, isolated user / driver / trip schemas |
| Cache/Geo | Redis 7 | GEO search + Pub/Sub + presence — one box, three jobs |
| Realtime | Socket.IO on the gateway | Live driver-location push to riders |
| Contracts | Zod, shared via a contracts package | One schema, validated on the backend, reused on the frontend |
| Frontend | Vite + React 19 + Tailwind + shadcn/ui | Passenger app + driver app |
A few decisions worth calling out.
Schema-per-service, one Postgres. Each service owns a Postgres schema (user, driver, trip) and connects with ?schema=<name>. Each has its own Prisma client and migration history. It’s not database-per-service in the strict sense — they share one instance — but services never query across schemas. They talk over the network, not the database. That keeps the boundaries honest without paying for three Postgres instances in a demo.
Zod as the contract. A @rocket/contracts package holds Zod schemas — auth, trip, driver, plus event payloads and enums. The backend validates requests with nestjs-zod; the frontend reuses the same schemas with React Hook Form. Write the shape once, infer the type with z.infer, never duplicate it.
REST now, gRPC later. Services call each other over plain REST today. To keep a future gRPC migration cheap, every cross-service call is wrapped in a thin client class — swap the transport in one file, leave the business logic alone. More on that below.
Architecture
Four services sit behind a gateway. The gateway is the only thing the frontends talk to — it verifies JWTs, proxies REST, and hosts the WebSocket server.
                 ┌───────────┐
Rider / Driver   │  Gateway  │  :3000 (JWT verify, REST proxy, Socket.IO)
    apps ───────▶│           │
                 └─────┬─────┘
       ┌───────────────┼────────────────┐
       │ REST          │ REST           │ REST
  ┌────▼────┐    ┌─────▼─────┐   ┌──────▼─────┐
  │  User   │    │  Driver   │◀──┤    Trip    │
  │  :3001  │    │   :3002   │   │   :3003    │
  │  auth   │    │ Redis GEO │   │  matching  │
  └────┬────┘    └─────┬─────┘   └──────┬─────┘
       │               │                │
  ┌────▼───────────────▼────────────────▼─────┐
  │   PostgreSQL (schema: user/driver/trip)   │
  └───────────────────────────────────────────┘
              ┌──────────────────┐
              │      Redis       │  GEO set · Pub/Sub · presence
              └──────────────────┘
- User service is the auth authority: it registers users, hashes passwords, and signs JWTs. Both riders and drivers are User rows with a role.
- Driver service owns driver profiles, online/offline status, and (the interesting part) driver locations in Redis GEO.
- Trip service is the matching brain: it creates trips, finds a driver, assigns one, and runs the trip lifecycle.
- Gateway verifies the JWT, injects an x-user-id header, and forwards. Internal services trust the gateway and never re-verify.
The matching story lives in the Driver and Trip services. Let’s build it.
Where Drivers Live: Redis GEO
A driver’s location is not in Postgres. It changes every few seconds — writing that to a relational table and running ORDER BY distance for every match request is the wrong tool.
Redis has a purpose-built answer: the GEO commands. Under the hood a GEO set is just a sorted set scored by a 52-bit geohash, but the API speaks latitude and longitude. The Driver service keeps a single key, drivers:locations, holding every online driver.
@Injectable()
export class GeoService {
  constructor(@Inject(REDIS_CLIENT) private readonly redis: Redis) {}

  // Called on every driver location update (~every 4s from the driver app)
  async addLocation(driverId: string, lat: number, lng: number): Promise<void> {
    // GEOADD takes longitude FIRST, then latitude (easy to get backwards)
    await this.redis.geoadd(GEO_KEY, lng, lat, driverId);
    // Presence key with a TTL: a driver who stops reporting falls off
    await this.redis.set(driverPresenceKey(driverId), '1', 'EX', 30);
  }

  async removeDriver(driverId: string): Promise<void> {
    await this.redis.zrem(GEO_KEY, driverId);
    await this.redis.del(driverPresenceKey(driverId));
  }

  async searchNearby(lat: number, lng: number, radiusKm: number, limit: number) {
    // GEOSEARCH ... FROMLONLAT lng lat BYRADIUS r km ASC COUNT n WITHDIST
    const rows = (await this.redis.geosearch(
      GEO_KEY,
      'FROMLONLAT', lng, lat,
      'BYRADIUS', radiusKm, 'km',
      'ASC', 'COUNT', limit,
      'WITHDIST',
    )) as [string, string][];
    return rows.map(([driverId, distance]) => ({
      driverId,
      distanceKm: Number(distance),
    }));
  }
}
Three things to notice:
- GEOADD is longitude-first. Passing latitude first is the single most common bug with Redis GEO. If the swapped value falls outside the valid latitude range (about ±85.05°), Redis rejects the write; otherwise it stores the driver at the wrong point on the globe, and your searches quietly miss them.
- GEOSEARCH ... ASC returns drivers sorted nearest-first, with distances. That ordering is the whole matching strategy: greedy nearest.
- Presence TTL. Every location write also refreshes a short-lived presence key. If a driver’s app dies, the key expires and they’re treated as gone, without anyone explicitly marking them offline.
Driver status (OFFLINE / ONLINE / BUSY) still lives in Postgres — it’s durable state, not ephemeral position. The GEO set is the index; Postgres is the record.
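To make the search semantics concrete, here is a small in-memory sketch of what GEOSEARCH ... BYRADIUS ... ASC COUNT n WITHDIST computes. It is not how Redis does it internally (Redis works on geohash ranges in a sorted set, not a linear scan), and the haversine helper, the `Point` type and the Earth-radius constant are my own illustrative choices rather than code from the demo.

```typescript
// In-memory stand-in for GEOSEARCH ... BYRADIUS r km ASC COUNT n WITHDIST.
// Illustrative only: names and the linear scan are assumptions for this sketch.
type Point = { driverId: string; lng: number; lat: number };

const EARTH_RADIUS_KM = 6371; // mean Earth radius, approximate

// Great-circle distance between two lat/lng pairs (haversine formula).
function haversineKm(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) * Math.sin(dLng / 2);
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Keep points inside the radius, sort nearest-first, cap at `limit`.
function searchNearby(points: Point[], lat: number, lng: number, radiusKm: number, limit: number) {
  return points
    .map((p) => ({ driverId: p.driverId, distanceKm: haversineKm(lat, lng, p.lat, p.lng) }))
    .filter((r) => r.distanceKm <= radiusKm)
    .sort((a, b) => a.distanceKm - b.distanceKm)
    .slice(0, limit);
}

const sampleDrivers: Point[] = [
  { driverId: 'far', lng: 103.9, lat: 1.35 },
  { driverId: 'near', lng: 103.801, lat: 1.3 },
  { driverId: 'mid', lng: 103.82, lat: 1.3 },
];

// Pickup at (lat 1.3, lng 103.8), 5 km radius, at most 2 results.
const nearest = searchNearby(sampleDrivers, 1.3, 103.8, 5, 2);
```

With these sample points, `far` sits outside the 5 km radius and the other two come back nearest-first, which is the ordering the matching loop relies on.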
The Matching Flow
When a rider requests a trip, the Trip service runs a short, explicit sequence. No magic — create the trip, find drivers, try to grab one, publish the result.
Rider           Gateway          trip-svc          driver-svc
│                  │                 │                  │
│ POST /trips      │                 │                  │
│ {pickup,dropoff} │                 │                  │
│─────────────────▶│ + x-user-id     │                  │
│                  │────────────────▶│                  │
│                  │                 │ 1. INSERT trip   │
│                  │                 │    status=REQUESTED
│                  │                 │                  │
│                  │                 │ 2. GET /drivers/nearby
│                  │                 │─────────────────▶│ GEOSEARCH
│                  │                 │◀─────────────────│ [nearest…]
│                  │                 │                  │
│                  │                 │ 3. for each: POST /drivers/:id/assign
│                  │                 │─────────────────▶│ SET NX lock
│                  │                 │◀─────────────────│ ok / 409
│                  │                 │                  │
│                  │                 │ 4. UPDATE trip   │
│                  │                 │    status=ASSIGNED
│ trip (ASSIGNED)  │                 │                  │
│◀─────────────────┴─────────────────│ 5. PUBLISH trip:events
Here is the core of it — the MatchingService:
async requestTrip(input: RequestTripInput): Promise<Trip> {
  // 1. Persist the trip as REQUESTED
  const trip = await this.prisma.trip.create({
    data: { ...input, status: TripStatus.REQUESTED },
  });

  // 2. Ask the Driver service for the nearest online drivers
  const candidates = await this.driverClient.findNearby(
    input.pickupLat, input.pickupLng, 5 /* km */, 5 /* limit */,
  );

  // 3. Walk candidates nearest-first, try to claim one
  let assignedDriverId: string | null = null;
  for (const candidate of candidates) {
    const result = await this.driverClient.assign(candidate.driverId);
    if (result.ok) {
      assignedDriverId = candidate.driverId;
      break; // first successful claim wins: it's the closest
    }
    // lock taken by another trip, so try the next driver
  }

  // 4. Update status + publish the outcome
  if (assignedDriverId) {
    const updated = await this.prisma.trip.update({
      where: { id: trip.id },
      data: { status: TripStatus.ASSIGNED, driverId: assignedDriverId },
    });
    await this.publisher.publish({
      type: 'trip', tripId: trip.id, passengerId: input.passengerId,
      driverId: assignedDriverId, status: TripStatus.ASSIGNED,
    });
    return updated;
  }

  // No driver found, or every candidate's lock was taken
  const updated = await this.prisma.trip.update({
    where: { id: trip.id },
    data: { status: TripStatus.NO_DRIVER },
  });
  await this.publisher.publish({
    type: 'trip', tripId: trip.id,
    passengerId: input.passengerId, status: TripStatus.NO_DRIVER,
  });
  return updated;
}
The algorithm is greedy nearest-first: GEOSEARCH hands back drivers sorted by distance, and the first one we can successfully claim wins. It’s not globally optimal — no ETA modelling, no batching multiple riders, no driver accept/reject step — but it’s predictable and easy to reason about, which is exactly what you want before you optimize.
Keeping the Transport Swappable
Notice the Trip service never calls the Driver service directly. Every call goes through a DriverClient — a thin wrapper that is the only place an HTTP detail appears.
import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { HttpService } from '@nestjs/axios';
import { firstValueFrom } from 'rxjs';
import type { AxiosError } from 'axios';

@Injectable()
export class DriverClient {
  private readonly baseUrl: string;
  // findNearby logs on failure, so the class needs its own logger
  private readonly logger = new Logger(DriverClient.name);

  constructor(
    private readonly http: HttpService,
    private readonly config: ConfigService,
  ) {
    this.baseUrl = this.config.get<string>('DRIVER_SERVICE_URL', 'http://localhost:3002');
  }

  async findNearby(lat: number, lng: number, radiusKm = 5, limit = 5) {
    try {
      const res = await firstValueFrom(
        this.http.get<NearbyDriver[]>(`${this.baseUrl}/drivers/nearby`, {
          params: { lat, lng, radiusKm, limit },
          timeout: 5000,
        }),
      );
      return Array.isArray(res.data) ? res.data : [];
    } catch (err) {
      // Driver service unreachable: degrade gracefully to "no drivers"
      this.logger.warn(`findNearby failed: ${(err as AxiosError).message}`);
      return [];
    }
  }

  async assign(driverId: string): Promise<{ ok: boolean }> { /* ... */ }
  async release(driverId: string): Promise<void> { /* ... */ }
}
This is the seam for the planned gRPC migration. When REST becomes the bottleneck, only DriverClient changes — MatchingService doesn’t know or care whether findNearby is an HTTP GET or a gRPC unary call. Wrapping inter-service calls in a typed client costs almost nothing up front and buys you a clean migration later.
It also gives one obvious place to put resilience: the try/catch degrades a Driver-service outage into “no drivers found” instead of a crashed request.
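One way to make that seam explicit is to have the matching logic depend on an interface instead of the concrete class. The sketch below is an assumption about how that could look, not the demo's actual code: `DriverGateway`, `InMemoryDriverGateway` and `pickDriver` are hypothetical names, and the in-memory implementation simply stands in for whichever transport (REST today, gRPC later) sits behind the interface.

```typescript
type NearbyDriver = { driverId: string; distanceKm: number };

// The only contract the matching logic sees. A REST client and a gRPC client
// would both implement it, so neither transport leaks past this interface.
interface DriverGateway {
  findNearby(lat: number, lng: number, radiusKm: number, limit: number): Promise<NearbyDriver[]>;
  assign(driverId: string): Promise<{ ok: boolean }>;
}

// Stand-in transport for the sketch (hypothetical): no network, just memory.
class InMemoryDriverGateway implements DriverGateway {
  private readonly claimed = new Set<string>();

  constructor(private readonly drivers: NearbyDriver[]) {}

  async findNearby(_lat: number, _lng: number, radiusKm: number, limit: number): Promise<NearbyDriver[]> {
    return this.drivers
      .filter((d) => d.distanceKm <= radiusKm && !this.claimed.has(d.driverId))
      .sort((a, b) => a.distanceKm - b.distanceKm)
      .slice(0, limit);
  }

  async assign(driverId: string): Promise<{ ok: boolean }> {
    if (this.claimed.has(driverId)) return { ok: false };
    this.claimed.add(driverId);
    return { ok: true };
  }
}

// Business logic written against the interface only.
async function pickDriver(gateway: DriverGateway, lat: number, lng: number): Promise<string | null> {
  const candidates = await gateway.findNearby(lat, lng, 5, 5);
  for (const candidate of candidates) {
    const result = await gateway.assign(candidate.driverId);
    if (result.ok) return candidate.driverId;
  }
  return null;
}
```

A side benefit of this shape is testability: the in-memory gateway lets you exercise the matching loop without starting the Driver service at all.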
The Race: Two Riders, One Driver
Step 3 hides the hardest problem in matching. Two riders request trips at the same instant. GEOSEARCH runs twice and returns the same nearest driver to both. Without coordination, both trips get assigned to that driver.
The fix lives in the Driver service’s assign endpoint — a Redis SET ... NX lock, which is atomic.
async assign(driverId: string): Promise<{ ok: boolean }> {
  // SET key value EX 10 NX: succeeds only if the key does not exist
  const locked = await this.redis.set(
    driverLockKey(driverId), '1', 'EX', 10, 'NX',
  );
  if (locked !== 'OK') {
    return { ok: false }; // another trip already claimed this driver
  }

  // We own the driver: mark BUSY and pull them out of the GEO index
  await this.prisma.driver.update({
    where: { id: driverId },
    data: { status: DriverStatus.BUSY },
  });
  await this.geo.removeDriver(driverId);
  return { ok: true };
}
SET ... NX returns OK only if it created the key. Exactly one of the two concurrent requests gets OK; the other gets null. The losing trip sees { ok: false }, the matching loop moves to its next candidate, and the second rider is assigned the second-nearest driver. No double-booking.
Two extra touches make the result clean:
- EX 10: the lock self-expires. If the trip service crashes mid-assignment, the driver isn’t locked forever.
- geo.removeDriver: once claimed, the driver is ZREM’d from drivers:locations, so no future GEOSEARCH even considers them. When the trip ends, release flips status back to ONLINE and the driver re-enters the GEO set on their next location update.
This is a deliberately small piece of distributed locking — no Redlock, no consensus. For a single Redis instance and a 10-second critical section, SET NX EX is the right amount of machinery.
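The race is easy to replay without a Redis server. Below is a minimal in-memory model of SET ... EX ... NX, plus two riders walking the same candidate list at once. Everything in it is a stand-in I made up for illustration: `FakeLockStore`, `claimFirst`, `simulateRace`, and the `driver:lock:<id>` key format are hypothetical, and a real deployment would rely on Redis itself for atomicity.

```typescript
// Minimal in-memory model of `SET key value EX ttl NX` (not a Redis client).
class FakeLockStore {
  private readonly expiresAt = new Map<string, number>();

  // The clock is injectable so expiry can be exercised deterministically.
  constructor(private readonly now: () => number = () => Date.now()) {}

  // Resolves to 'OK' if the key was created, null if a live key already exists.
  async setNx(key: string, ttlSeconds: number): Promise<'OK' | null> {
    const expiry = this.expiresAt.get(key);
    if (expiry !== undefined && expiry > this.now()) return null;
    this.expiresAt.set(key, this.now() + ttlSeconds * 1000);
    return 'OK';
  }
}

// Walk candidates nearest-first and keep the first driver whose lock we win.
async function claimFirst(store: FakeLockStore, candidates: string[]): Promise<string | null> {
  for (const driverId of candidates) {
    const locked = await store.setNx(`driver:lock:${driverId}`, 10);
    if (locked === 'OK') return driverId;
  }
  return null;
}

// Two riders see the same candidate list at the same moment.
async function simulateRace(): Promise<(string | null)[]> {
  const store = new FakeLockStore();
  const sameCandidates = ['driver-a', 'driver-b'];
  return Promise.all([
    claimFirst(store, sameCandidates),
    claimFirst(store, sameCandidates),
  ]);
}
```

Whichever request reaches the lock first takes `driver-a`; the other falls through to `driver-b`. Neither driver is handed out twice, which is the whole point of the NX flag.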
Trip Lifecycle
Once assigned, a trip walks a small state machine, and every transition is guarded.
REQUESTED ──▶ ASSIGNED ──▶ ONGOING ──▶ COMPLETED
   │             │            │
   └──▶ NO_DRIVER└────────────┴──▶ CANCELLED
start, complete, and cancel each check the current status before moving — calling complete on a REQUESTED trip is a 409, not a silent no-op. complete and cancel also call driverClient.release(driverId) so the driver becomes matchable again. Every transition publishes an event — which is how the rider’s screen finds out.
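A guarded state machine like this can be sketched as a lookup table plus one check. The table below is read straight off the diagram; the names `TRANSITIONS`, `canTransition` and `transition` are mine, and in a NestJS service the thrown error would presumably be a ConflictException so that it surfaces as a 409.

```typescript
type TripStatus =
  | 'REQUESTED'
  | 'ASSIGNED'
  | 'ONGOING'
  | 'COMPLETED'
  | 'NO_DRIVER'
  | 'CANCELLED';

// Allowed moves for each status; terminal states have no outgoing edges.
const TRANSITIONS: Record<TripStatus, readonly TripStatus[]> = {
  REQUESTED: ['ASSIGNED', 'NO_DRIVER'],
  ASSIGNED: ['ONGOING', 'CANCELLED'],
  ONGOING: ['COMPLETED', 'CANCELLED'],
  COMPLETED: [],
  NO_DRIVER: [],
  CANCELLED: [],
};

function canTransition(current: TripStatus, next: TripStatus): boolean {
  return TRANSITIONS[current].indexOf(next) !== -1;
}

// Returns the new status, or throws instead of silently ignoring a bad move.
function transition(current: TripStatus, next: TripStatus): TripStatus {
  if (!canTransition(current, next)) {
    throw new Error(`Cannot move trip from ${current} to ${next}`);
  }
  return next;
}
```

Keeping the rules in one table means start, complete, and cancel can share a single guard instead of each re-implementing its own status check.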
Real-Time Tracking: Pub/Sub to WebSocket
A rider watching their driver approach needs live updates. The pieces are already there — Redis is in the stack — so the realtime layer is Redis Pub/Sub feeding Socket.IO.
driver app ──POST /drivers/:id/location──▶ driver-svc
                                               │ GEOADD
                                               │ PUBLISH driver:location
                                               ▼
trip-svc ──PUBLISH trip:events──▶ ┌──────────────────┐
                                  │  Redis Pub/Sub   │
                                  └────────────┬─────┘
                                               │ subscribe
                                               ▼
                                      Gateway subscriber
                                               │ emit to room trip:<id>
                                               ▼
                                       Rider's WebSocket
The Driver service publishes a driver:location event after each GEOADD. The Trip service publishes trip:events on every status change. The gateway runs one Socket.IO server and a Redis subscriber: when a rider’s trip is assigned, their socket joins a room named trip:<tripId>, and the subscriber forwards matching events into that room.
One catch worth a sentence: the gateway keeps an in-memory Map<driverId, tripId> so a raw driver:location event can be routed to the right trip room. It works perfectly for one gateway instance — and it’s the first thing that breaks when you scale out. (See the trade-offs.)
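Here is roughly what that routing step looks like, reduced to its essentials. This is a sketch rather than the gateway's real code: `LocationRouter` and its method names are hypothetical, and the `emit` callback stands in for Socket.IO's `io.to(room).emit(event, payload)`.

```typescript
type DriverLocationEvent = { driverId: string; lat: number; lng: number };

// Stand-in for `io.to(room).emit(event, payload)` on a Socket.IO server.
type Emit = (room: string, event: string, payload: unknown) => void;

class LocationRouter {
  // Process-local state: this map is what stops working across instances.
  private readonly tripByDriver = new Map<string, string>();

  constructor(private readonly emit: Emit) {}

  onTripAssigned(driverId: string, tripId: string): void {
    this.tripByDriver.set(driverId, tripId);
  }

  onTripFinished(driverId: string): void {
    this.tripByDriver.delete(driverId);
  }

  // Forwards the event to the trip's room; returns false if no trip is known.
  onDriverLocation(event: DriverLocationEvent): boolean {
    const tripId = this.tripByDriver.get(event.driverId);
    if (tripId === undefined) return false;
    this.emit(`trip:${tripId}`, 'driver:location', event);
    return true;
  }
}
```

The `false` branch is exactly what a second gateway instance would hit for every driver whose trip was registered on the first one.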
The Bug That Taught Me Something: Nearest-Then-Filter Starves
The matching algorithm was never the hard part. This was.
/drivers/nearby does two things: a GEOSEARCH ... COUNT 5, then a filter that keeps only drivers whose Postgres status is ONLINE.
Read that order again. It searches for the 5 nearest drivers first, then filters. If the 5 nearest entries in the GEO set happen to be stale — drivers who went offline but were never removed from drivers:locations — the filter discards all 5 and returns an empty list. Meanwhile a perfectly available driver sits 2 km away, never considered, because they didn’t make the COUNT 5 cut.
I hit this when leftover test drivers polluted the GEO set. The matching service reported “no drivers” while five live drivers were online and moving.
There are two real fixes, and you want both:
- Keep the GEO set clean. A driver going offline must be ZREM’d. The set should only ever contain matchable drivers, so the COUNT window never wastes a slot on a ghost.
- Don’t trust COUNT-then-filter. Either filter inside the query, or fetch a generous over-count (say COUNT 50), filter, then take the top N.
The deeper lesson: a LIMIT applied before a filter is a correctness bug, not a performance tweak. It’s easy to read COUNT 5 as “give me 5 results” when it actually means “look at only 5 rows” — and those are very different promises.
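The difference is easy to reproduce in a few lines. In this sketch the distance-sorted array stands in for the GEOSEARCH result and `statusOf` stands in for the Postgres status lookup; both function names and the over-fetch default of 50 are assumptions for illustration, not the demo's actual code.

```typescript
type GeoHit = { driverId: string; distanceKm: number };
type DriverStatus = 'ONLINE' | 'OFFLINE' | 'BUSY';

// The buggy order: cap first, then filter. Stale entries eat the whole window.
function countThenFilter(
  sortedHits: GeoHit[],
  statusOf: (driverId: string) => DriverStatus,
  limit: number,
): GeoHit[] {
  return sortedHits.slice(0, limit).filter((h) => statusOf(h.driverId) === 'ONLINE');
}

// The safer order: look at a generous window, filter, then take the top N.
function overFetchThenFilter(
  sortedHits: GeoHit[],
  statusOf: (driverId: string) => DriverStatus,
  limit: number,
  overFetch = 50,
): GeoHit[] {
  return sortedHits
    .slice(0, Math.max(limit, overFetch))
    .filter((h) => statusOf(h.driverId) === 'ONLINE')
    .slice(0, limit);
}

// Five stale entries sit closer than the one driver who is actually online.
const hits: GeoHit[] = [
  { driverId: 'ghost-1', distanceKm: 0.2 },
  { driverId: 'ghost-2', distanceKm: 0.4 },
  { driverId: 'ghost-3', distanceKm: 0.6 },
  { driverId: 'ghost-4', distanceKm: 0.8 },
  { driverId: 'ghost-5', distanceKm: 1.0 },
  { driverId: 'live-1', distanceKm: 2.0 },
];
const statusOf = (driverId: string): DriverStatus =>
  driverId.indexOf('ghost') === 0 ? 'OFFLINE' : 'ONLINE';

const starved = countThenFilter(hits, statusOf, 5);
const matched = overFetchThenFilter(hits, statusOf, 5);
```

With this data `starved` is empty while `matched` contains `live-1`: same drivers, same statuses, and the only change is where the cap is applied.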
Trade-offs and What’s Missing
This design optimizes for clarity. The simplifications are deliberate, and they’re also exactly what you’d change first under real load.
The matching is synchronous. POST /trips blocks while the Trip service makes up to six sequential HTTP calls (one findNearby, then as many as five assign attempts). Under load that ties up request handlers. A production system enqueues a matching job and pushes the result over WebSocket, so the HTTP request returns the moment the trip is persisted.
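A rough sketch of that asynchronous shape, under heavy assumptions: the in-memory array stands in for a real job queue, the Map for the trip table, and `AsyncMatcher`, `processNext`, `findDriver` and `notify` are names invented for this example.

```typescript
type TripStatus = 'REQUESTED' | 'ASSIGNED' | 'NO_DRIVER';
type Trip = { id: string; status: TripStatus; driverId?: string };

// Hypothetical in-memory stand-ins for the trip table and a job queue.
class AsyncMatcher {
  private readonly trips = new Map<string, Trip>();
  private readonly pending: string[] = [];
  private nextId = 1;

  // What POST /trips would do: persist, enqueue, return without waiting for a match.
  requestTrip(): Trip {
    const trip: Trip = { id: `trip-${this.nextId++}`, status: 'REQUESTED' };
    this.trips.set(trip.id, trip);
    this.pending.push(trip.id);
    return trip;
  }

  // What a background worker would do for each queued job.
  async processNext(
    findDriver: (tripId: string) => Promise<string | null>,
    notify: (trip: Trip) => void,
  ): Promise<boolean> {
    const tripId = this.pending.shift();
    if (tripId === undefined) return false; // queue is empty
    const driverId = await findDriver(tripId);
    const updated: Trip = driverId
      ? { id: tripId, status: 'ASSIGNED', driverId }
      : { id: tripId, status: 'NO_DRIVER' };
    this.trips.set(tripId, updated);
    notify(updated); // in the real system this would publish to trip:events
    return true;
  }
}
```

The rider gets a REQUESTED trip back immediately and learns the outcome through the same event stream that already carries status changes.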
The gateway’s Map<driverId, tripId> is in-memory. Run two gateway instances and a driver:location event arriving at instance B can’t find a trip whose map entry lives on instance A. Horizontal scaling of the WebSocket tier means moving that map into Redis and adding the Socket.IO Redis adapter.
One Redis, one Postgres. GEO, Pub/Sub, and presence all share a single Redis — a single point of failure. The three Postgres schemas share one instance, so the services don’t actually scale independently at the data layer.
Greedy nearest, straight-line. Redis GEO distance is as-the-crow-flies, not road distance or ETA. There’s no driver accept/reject step — the system force-assigns. Real dispatch weighs traffic, driver heading, and acceptance rate.
Could it serve 50,000 concurrent users as-is? No — and honestly, not close. But the shape is right: stateless services, a clean event bus, a typed seam for gRPC. The path from here to scale is visible, and that’s the point of getting the boundaries right early.
Conclusion
A driver-matching service is, at its core, two ideas: an index that answers “who is near here?” and a claim that answers “can I have them?” Redis GEO gives you the first almost for free — GEOADD on every location ping, GEOSEARCH ... ASC to rank by distance. A Redis SET NX EX lock gives you the second, with just enough distributed-systems machinery and no more.
Everything else — the gateway, the schema-per-service split, the Zod contracts, the Pub/Sub-to-WebSocket bridge — is structure that keeps those two ideas honest as the system grows. And the bug that cost the most wasn’t in the algorithm — it was a LIMIT that ran before a filter. The matching maths was fine. The query that fed it wasn’t — until it was.