Trading engines sit at the core of every centralized exchange. They need to match orders in microseconds, settle wallets atomically, and notify users in real time, all while keeping funds consistent across failures. This post walks through the complete system design of a CEX trading platform built in Go, from the moment a user places an order to when they see their trade notification over WebSocket.
Architecture Overview
The system runs as 6 microservices inside a single Go binary (one CLI command per service). Services communicate synchronously via gRPC for critical operations and asynchronously via NATS JetStream for event-driven notifications.
┌─────────────────────────────────────────────────────────┐
│ Traefik (Reverse Proxy) │
└──────┬──────────────┬──────────────┬────────────────┬───┘
│ :8081 │ :8082 │ :8083 │ :8085
▼ ▼ ▼ ▼
users-svc wallet-svc order-svc market-svc
(auth/JWT) (balances) (CRUD) (tickers/WS)
│ │ │
│ :50052 :50053
│ ▲ ▲
│ │ gRPC │
│ └──────────────┴──────────┐
│ │
│ matching-svc
│ :50054
│ │
│ NATS publish
│ │
│ ▼
│ notification-svc
│ :8086 (WS)
│ │
└────────────────────────────────────────┘
Why gRPC for inter-service? Wallet holds and settlements are synchronous operations that must succeed or fail atomically. gRPC gives us type-safe contracts, streaming, and proper error codes. NATS handles the fire-and-forget notification fan-out where eventual consistency is acceptable.
The Trading Flow: A Saga in Disguise
When a user places a limit order, 11 steps execute across 4 services. This is effectively a saga pattern — a sequence of local transactions with compensation logic on failure.
Client order-svc wallet-svc matching-svc
│ │ │ │
│ POST /orders │ │ │
│────────────────────▶│ │ │
│ │ │ │
│ │ 1. HoldBalance() │ │
│ │──────────────────▶│ │
│ │ lock funds │ │
│ │◀──────────────────│ │
│ │ │ │
│ │ 2. INSERT order │ │
│ │ status=new │ │
│ │ │ │
│ │ 3. SubmitOrder() │ │
│ │───────────────────────────────────────▶│
│ 201 Created │ │ │
│◀────────────────────│ │ │
│ │ │ │
│ │ │ 4. Match() │
│ │ │ ┌───────────────▶│
│ │ │ │ order book │
│ │ │ │
│ │ │ 5-6. Settle() │
│ │ │◀───────────────────│
│ │ │ unlock + deposit │
│ │ │ │
│ │ 7. UpdateFill() │ │
│ │◀──────────────────────────────────────│
│ │ │ │
│ │ 8. NATS publish events │
│ │ │ │
│ 9. WebSocket push │ │ │
│◀═══════════════════════════════════════════════════════════ notif-svc
The HTTP response returns immediately after step 3 — the client gets a 201 Created with the order details. Matching and settlement happen asynchronously. The user learns about fills through WebSocket notifications.
Step 1: Order Validation and Wallet Hold
When an order arrives, the order service validates it and locks funds before anything touches the matching engine.
func (s *orderService) CreateOrder(ctx context.Context, userID string, req *dto.CreateOrderDTO) (*entities.Order, error) {
	// 1. Fetch and validate trading pair
	pair, err := s.repo.GetTradingPair(ctx, req.PairID)
	if err != nil {
		return nil, ErrTradingPairNotFound
	}
	// Validate tick size and minimum quantity
	if !req.Price.Mod(pair.TickSize).IsZero() {
		return nil, ErrInvalidTickSize
	}
	if req.Quantity.LessThan(pair.MinQty) {
		return nil, ErrQuantityTooSmall
	}
	// 2. Hold balance via gRPC to wallet service
	holdAsset, holdAmount := calculateHold(req.Side, pair, req.Price, req.Quantity)
	if err := s.walletClient.HoldBalance(ctx, userID, holdAsset, holdAmount); err != nil {
		return nil, err // ErrInsufficientBalance
	}
	// 3. Persist order
	order, err := s.repo.Create(ctx, &entities.Order{
		UserID:   userID,
		PairID:   req.PairID,
		Side:     req.Side,
		Price:    req.Price,
		Quantity: req.Quantity,
	})
	if err != nil {
		// Compensation: release the hold (best-effort; a failure here is not reported)
		s.walletClient.ReleaseBalance(ctx, userID, holdAsset, holdAmount)
		return nil, err
	}
	// 4. Submit to matching engine (fire-and-forget; the result is not checked)
	s.matchingClient.SubmitOrder(ctx, order)
	return order, nil
}
Key design decisions:
- BUY orders hold price × quantity of the quote asset (e.g., USDT)
- SELL orders hold quantity of the base asset (e.g., BTC)
- If the DB insert fails after locking funds, we compensate by releasing the hold
- The matching submission is fire-and-forget: if it fails, the order sits in the DB as new and can be cancelled
The wallet hold is an atomic SQL operation with a balance guard:
UPDATE wallets
SET available = available - $3, locked = locked + $3, updated_at = NOW()
WHERE user_id = $1 AND asset_id = $2 AND available >= $3
RETURNING *;
The available >= $3 guard is critical — it prevents negative balances at the database level, regardless of application-layer race conditions.
Step 2: The Matching Engine
The matching engine is the most performance-sensitive component. Each trading pair gets its own goroutine with a command channel, serializing all mutations to the order book without locks.
Engine Architecture
Engine (1 goroutine per trading pair)
┌───────────────────────────────────────────────┐
│ │
│ cmdCh (buffered channel) │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Order Book │ │
│ │ │ │
│ │ BIDS (sorted DESC) ASKS (sorted ASC) │ │
│ │ ┌────────────────┐ ┌────────────────┐│ │
│ │ │ 50,100 │ │ 50,200 ││ │
│ │ │ [ord3, ord7] │ │ [ord1, ord4] ││ │
│ │ ├────────────────┤ ├────────────────┤│ │
│ │ │ 50,000 │ │ 50,300 ││ │
│ │ │ [ord2, ord5] │ │ [ord6] ││ │
│ │ └────────────────┘ └────────────────┘│ │
│ │ │ │
│ │ INDEX: map[orderID] → node (O(1) del) │ │
│ └──────────────────────────────────────────┘ │
│ │
└───────────────────────────────────────────────┘
The order book has three data structures working together:
- Sorted price levels — bids sorted descending (highest first), asks sorted ascending (lowest first)
- FIFO queue per level — orders at the same price execute in arrival order
- Index map — maps order ID to its node in the book for O(1) cancellation
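To make the three structures concrete, here is a minimal sketch of one side of the book. The names (BookSide, entry, pos) are hypothetical and prices are integer minor units rather than decimals; the real engine may be organized differently. Note that in this sketch unlinking an order is O(1), but dropping an emptied price level from the sorted slice costs O(number of levels).

```go
package main

import (
	"container/list"
	"fmt"
	"sort"
)

// BookOrder is a trimmed-down order; Price is in integer minor units.
type BookOrder struct {
	ID    string
	Price int64
}

// entry records where an order lives so it can be unlinked without a scan.
type entry struct {
	price int64
	elem  *list.Element
}

// BookSide is one side of the book (all bids or all asks).
type BookSide struct {
	descending bool                 // true for bids, false for asks
	prices     []int64              // price levels, best price first
	levels     map[int64]*list.List // FIFO queue of *BookOrder per price
	index      map[string]entry     // order ID -> position in the book
}

func NewBookSide(descending bool) *BookSide {
	return &BookSide{
		descending: descending,
		levels:     make(map[int64]*list.List),
		index:      make(map[string]entry),
	}
}

// pos returns the index at which price sits (or would be inserted) in prices.
func (s *BookSide) pos(price int64) int {
	return sort.Search(len(s.prices), func(i int) bool {
		if s.descending {
			return s.prices[i] <= price
		}
		return s.prices[i] >= price
	})
}

// Add appends the order to the FIFO queue at its price, creating the level if needed.
func (s *BookSide) Add(o *BookOrder) {
	q, ok := s.levels[o.Price]
	if !ok {
		q = list.New()
		s.levels[o.Price] = q
		i := s.pos(o.Price)
		s.prices = append(s.prices, 0)
		copy(s.prices[i+1:], s.prices[i:])
		s.prices[i] = o.Price
	}
	s.index[o.ID] = entry{price: o.Price, elem: q.PushBack(o)}
}

// Cancel unlinks the order via the index map and removes the level if it is now empty.
func (s *BookSide) Cancel(id string) bool {
	e, ok := s.index[id]
	if !ok {
		return false
	}
	q := s.levels[e.price]
	q.Remove(e.elem)
	delete(s.index, id)
	if q.Len() == 0 {
		delete(s.levels, e.price)
		i := s.pos(e.price)
		s.prices = append(s.prices[:i], s.prices[i+1:]...)
	}
	return true
}

// Best returns the oldest order at the best price, or nil if the side is empty.
func (s *BookSide) Best() *BookOrder {
	if len(s.prices) == 0 {
		return nil
	}
	return s.levels[s.prices[0]].Front().Value.(*BookOrder)
}

func main() {
	bids := NewBookSide(true)
	bids.Add(&BookOrder{ID: "ord2", Price: 50000})
	bids.Add(&BookOrder{ID: "ord3", Price: 50100})
	bids.Add(&BookOrder{ID: "ord7", Price: 50100})
	fmt.Println(bids.Best().ID) // ord3: highest price, earliest arrival
	bids.Cancel("ord3")
	fmt.Println(bids.Best().ID) // ord7
}
```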
The Matching Algorithm
func (ob *OrderBook) Match(incoming *BookOrder) []*Trade {
	var trades []*Trade
	opposite := ob.asks // if incoming is BUY
	if incoming.Side == Sell {
		opposite = ob.bids
	}
	for _, level := range opposite.snapshot() {
		if !pricesCross(incoming, level.price) {
			break // no more matchable levels
		}
		for e := level.orders.Front(); e != nil; {
			resting := e.Value.(*BookOrder)
			next := e.Next()
			// Self-trade prevention
			if resting.UserID == incoming.UserID {
				e = next
				continue
			}
			fillQty := decimal.Min(incoming.Remaining, resting.Remaining)
			trades = append(trades, &Trade{
				Price:    resting.Price, // maker price
				Quantity: fillQty,
				BuyerID:  buyerID(incoming, resting),
				SellerID: sellerID(incoming, resting),
			})
			incoming.Remaining = incoming.Remaining.Sub(fillQty)
			resting.Remaining = resting.Remaining.Sub(fillQty)
			if resting.Remaining.IsZero() {
				ob.removeOrder(resting.ID)
			}
			if incoming.Remaining.IsZero() {
				return trades
			}
			e = next
		}
	}
	// Remaining quantity rests in the book
	if incoming.Remaining.GreaterThan(decimal.Zero) {
		ob.addOrder(incoming)
	}
	return trades
}
Price crossing rules:
- BUY matches when incoming.Price >= resting.Price
- SELL matches when incoming.Price <= resting.Price
- Trade always executes at the resting (maker) price, so the taker gets price improvement
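The crossing rules fit in a few lines. This is a sketch of what a helper like pricesCross could look like, using a simplified Order type with integer prices; the real helper takes the engine's own order type and decimal prices.

```go
package main

import "fmt"

type Side int

const (
	Buy Side = iota
	Sell
)

// Order is a simplified stand-in; Price is in integer minor units.
type Order struct {
	Side  Side
	Price int64
}

// pricesCross reports whether an incoming limit order can trade
// against a resting price level on the opposite side.
func pricesCross(incoming Order, restingPrice int64) bool {
	if incoming.Side == Buy {
		return incoming.Price >= restingPrice
	}
	return incoming.Price <= restingPrice
}

func main() {
	buy := Order{Side: Buy, Price: 50200}
	fmt.Println(pricesCross(buy, 50100)) // true: the trade executes at 50100, the maker price
	fmt.Println(pricesCross(buy, 50300)) // false: the ask is above the buyer's limit

	sell := Order{Side: Sell, Price: 50000}
	fmt.Println(pricesCross(sell, 50100)) // true: the seller receives 50100, better than asked
}
```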
Self-trade prevention: If the incoming and resting orders belong to the same user, skip the resting order. This prevents wash trading.
Why a Channel, Not a Mutex?
Using a command channel instead of a mutex gives us:
- Backpressure: if the engine can’t keep up, submissions block once the buffered channel is full
- Sequential reasoning: the goroutine processes one command at a time, with no concurrent mutation
- Clean shutdown: send a CmdStop command and the goroutine exits once it has processed everything queued ahead of it
func (e *Engine) Start(ctx context.Context) {
	go func() {
		for {
			select {
			case cmd := <-e.cmdCh:
				switch cmd.Type {
				case CmdSubmit:
					trades := e.book.Match(cmd.Order)
					cmd.ResultCh <- Result{Trades: trades}
				case CmdCancel:
					e.book.Cancel(cmd.OrderID)
					cmd.ResultCh <- Result{}
				case CmdStop:
					return
				}
			case <-ctx.Done():
				return
			}
		}
	}()
}
Step 3: Trade Settlement
After matching produces trades, the matching service orchestrates settlement — 4 wallet operations per trade via gRPC.
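func (s *matchingService) settleTrade(ctx context.Context, trade *engine.Trade) error {
	// pair is the trade's trading pair; how it is looked up is omitted from this excerpt.
	quoteAmount := trade.Price.Mul(trade.Quantity)
	// 1. Unlock buyer's quote asset (USDT)
	if err := s.walletClient.SettleTrade(ctx, trade.BuyerID, pair.QuoteAsset, quoteAmount); err != nil {
		return fmt.Errorf("settle buyer quote: %w", err)
	}
	// 2. Deposit base asset to buyer (BTC)
	if err := s.walletClient.Deposit(ctx, trade.BuyerID, pair.BaseAsset, trade.Quantity); err != nil {
		return fmt.Errorf("deposit buyer base: %w", err)
	}
	// 3. Unlock seller's base asset (BTC)
	if err := s.walletClient.SettleTrade(ctx, trade.SellerID, pair.BaseAsset, trade.Quantity); err != nil {
		return fmt.Errorf("settle seller base: %w", err)
	}
	// 4. Deposit quote asset to seller (USDT)
	if err := s.walletClient.Deposit(ctx, trade.SellerID, pair.QuoteAsset, quoteAmount); err != nil {
		return fmt.Errorf("deposit seller quote: %w", err)
	}
	return nil
}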
The Wallet State Machine
Each wallet tracks two balances: available (free to use) and locked (reserved for open orders).
Hold
available ──────────────▶ locked
▲ │
│ Release │
└───────────────────────┘
│ │
│ Deposit Settle
│ │
▲ ▼
(receive) (consumed)
| Operation | Available | Locked | Trigger |
|---|---|---|---|
| Hold | -= amount | += amount | Order created |
| Release | += amount | -= amount | Order cancelled |
| Settle | no change | -= amount | Trade matched |
| Deposit | += amount | no change | Trade matched |
Why separate Settle and Deposit? When a trade matches, the buyer’s locked USDT is consumed (Settle), and the buyer receives BTC (Deposit). These are logically distinct operations on different assets. Combining them would create coupling between asset types.
Balance Example: Buy 1 BTC at 50,000 USDT
Phase 1: Order Created (Hold) Phase 2: Trade Settled
┌────────────────────────────┐ ┌────────────────────────────┐
│ Buyer USDT │ │ Buyer USDT │
│ avail: 100k → 50k │ │ locked: 50k → 0 (settle) │
│ locked: 0 → 50k │ │ │
│ │ │ Buyer BTC │
│ Seller BTC │ │ avail: 0 → 1 (deposit) │
│ avail: 1 → 0 │ │ │
│ locked: 0 → 1 │ │ Seller BTC │
└────────────────────────────┘ │ locked: 1 → 0 (settle) │
│ │
│ Seller USDT │
│ avail: 0 → 50k (deposit) │
└────────────────────────────┘
Step 4: Event Publishing and Notifications
After settlement, events flow through NATS JetStream to the notification service.
Three Event Streams
matching-svc ──▶ TRADES stream ──▶ trades.{pair}.executed
order-svc ──▶ ORDERS stream ──▶ orders.{user}.{status}
wallet-svc ──▶ WALLETS stream ──▶ wallets.{user}.{action}
Each event carries a Nats-Msg-Id header for deduplication. JetStream rejects duplicate messages within its dedup window, making publishers safe to retry.
func (p *TradePublisher) PublishTrade(ctx context.Context, event TradeEvent) error {
	data, err := json.Marshal(event)
	if err != nil {
		return err
	}
	_, err = p.js.Publish(ctx,
		fmt.Sprintf("trades.%s.executed", event.PairID),
		data,
		jetstream.WithMsgID(event.TradeID), // dedup key
	)
	return err
}
Notification Consumer
The notification service runs 3 durable JetStream consumers, one per stream. Each consumer routes events to the appropriate handler:
func (h *EventHandler) Handle(ctx context.Context, msg jetstream.Msg) error {
	subject := msg.Subject()
	switch {
	case strings.HasPrefix(subject, "trades."):
		return h.handleTradeEvent(ctx, msg.Data())
	case strings.HasPrefix(subject, "orders."):
		return h.handleOrderEvent(ctx, msg.Data())
	case strings.HasPrefix(subject, "wallets."):
		return h.handleWalletEvent(ctx, msg.Data())
	}
	return nil
}
Trade events create two notifications — one for the buyer and one for the seller. Each notification has an event_key for idempotency:
- trade_{tradeID}_{userID}: one per user per trade
- order_{orderID}_{status}: one per status change
- wallet_{txID}: one per transaction
The event_key column has a UNIQUE constraint in Postgres. If a duplicate event arrives (NATS at-least-once delivery), the INSERT is silently skipped.
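As a rough model of that behavior, the sketch below builds the three key formats and uses an in-memory set to stand in for the UNIQUE constraint. The helper and type names are hypothetical; in Postgres the skip would typically be expressed with something like ON CONFLICT (event_key) DO NOTHING, which is an assumption about the implementation rather than something shown in the service code.

```go
package main

import (
	"fmt"
	"sync"
)

// Key builders matching the three event_key formats.
func tradeEventKey(tradeID, userID string) string {
	return fmt.Sprintf("trade_%s_%s", tradeID, userID)
}

func orderEventKey(orderID, status string) string {
	return fmt.Sprintf("order_%s_%s", orderID, status)
}

func walletEventKey(txID string) string {
	return fmt.Sprintf("wallet_%s", txID)
}

// notificationStore models a table with UNIQUE(event_key) in memory.
type notificationStore struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newNotificationStore() *notificationStore {
	return &notificationStore{seen: make(map[string]struct{})}
}

// Insert reports true if the notification was stored and false if the
// key already existed, in which case the duplicate is silently skipped.
func (s *notificationStore) Insert(eventKey string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, dup := s.seen[eventKey]; dup {
		return false
	}
	s.seen[eventKey] = struct{}{}
	return true
}

func main() {
	store := newNotificationStore()
	key := tradeEventKey("t42", "buyer1")

	fmt.Println(store.Insert(key))                            // true: first delivery
	fmt.Println(store.Insert(key))                            // false: redelivery skipped
	fmt.Println(store.Insert(tradeEventKey("t42", "seller9"))) // true: the seller gets their own row
}
```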
WebSocket Broadcasting
The WebSocket hub maintains a per-user connection registry with RWMutex protection:
func (h *Hub) SendToUser(userID string, notif *entities.Notification) {
	h.mu.RLock()
	conns := make([]*safeConn, len(h.connections[userID]))
	copy(conns, h.connections[userID]) // copy the slice (not the connections) under lock
	h.mu.RUnlock()
	for _, conn := range conns {
		go func(c *safeConn) {
			if err := c.WriteJSON(notif); err != nil {
				h.Unregister(userID, c)
			}
		}(conn)
	}
}
Key design choices:
- Copy the connection slice before releasing the lock: the lock is never held during slow writes, and the copy stays valid if the registry changes
- Async sends: one slow or dead connection doesn’t block the others
- Auto-cleanup: failed writes trigger unregistration
Order Cancellation: The Reverse Saga
Cancellation reverses the order creation saga with its own compensation logic:
1. Verify ownership (user_id matches JWT)
2. Check cancellable (status = new | partial)
3. Remove from matching engine (gRPC, best-effort)
4. Release wallet hold (gRPC)
BUY: release price × remainingQty of quote asset
SELL: release remainingQty of base asset
5. Update order status = cancelled in DB
6. Publish cancellation event to NATS
The compensation logic here is subtle:
- If wallet release fails → cancel fails, balance stays locked. This is safe — the user can retry.
- If DB update fails after release → attempt to re-hold the released amount. This prevents double-release (user getting funds back but order still appearing active).
The matching engine cancellation is best-effort because the order might already have been fully filled between the cancel request and its processing.
Financial Precision
All monetary values use shopspring/decimal in Go and NUMERIC in Postgres. Never use float64 for money:
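// decimal.Decimal: arbitrary precision. RequireFromString panics on invalid input;
// use NewFromString, which returns (Decimal, error), for untrusted values.
price := decimal.RequireFromString("50000.50")
qty := decimal.RequireFromString("0.00123456")
total := price.Mul(qty) // exact: 61.72861728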
Using float64 would introduce rounding errors that compound across trades — unacceptable for a financial system.
Key Invariants
Six invariants keep the system consistent:
- Balance conservation: available + locked is conserved across hold/release cycles. Settle and Deposit are external flows.
- Fill monotonicity: filled_qty only increases. SQL guard: $2 > filled_qty AND $2 <= quantity.
- Self-trade prevention: the matching engine skips orders from the same user.
- Idempotent notifications: the event_key UNIQUE constraint prevents duplicates.
- NATS dedup: Nats-Msg-Id prevents duplicate publishes within JetStream’s dedup window.
- Atomic wallet ops: SQL guards ensure sufficient balance before mutations.
Lessons from Production Debugging
Two bugs that only showed up in Docker deployment taught me more about this architecture than the design phase did.
The Silent gRPC Disconnect
After deploying all services via Docker Compose, orders were created successfully (HTTP 201) but never matched. No errors in the order service logs — just orders sitting in new status forever.
The root cause: order-svc was missing MATCHING_SERVICE__ADDRESS=matching-svc:50054 in its Docker environment. It fell back to the config default (localhost:50054), which doesn’t exist inside a container. The gRPC call to submit the order failed silently because SubmitOrder is fire-and-forget:
// order_service.go — fire-and-forget, error only logged
s.matchingClient.SubmitOrder(ctx, order)
The fix was one line in docker-compose.yml. The lesson: every inter-service address must be explicitly configured in every deployment environment. Config defaults that work locally (localhost:50054) become silent failures in containers.
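One way to turn this class of bug into a loud startup failure is to refuse to boot when a peer address is missing, instead of falling back to localhost. A minimal sketch, assuming a hypothetical requireAddr helper; the project's actual config loader is not shown in this post and may work differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requireAddr returns the value of key or an error if it is unset or blank.
// lookup has the same shape as os.LookupEnv so it can be faked in tests.
func requireAddr(lookup func(string) (string, bool), key string) (string, error) {
	v, ok := lookup(key)
	if !ok || strings.TrimSpace(v) == "" {
		return "", fmt.Errorf("%s is required and has no default", key)
	}
	return v, nil
}

func main() {
	addr, err := requireAddr(os.LookupEnv, "MATCHING_SERVICE__ADDRESS")
	if err != nil {
		// A real service would log this and exit non-zero so the container fails fast.
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("matching service at", addr)
}
```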
Self-Trade Prevention Breaks E2E Tests
The E2E test for order execution used a single test account to place both a sell and buy order at the same price, expecting them to match. They never did.
The matching engine has self-trade prevention — if the incoming and resting orders share the same UserID, the match is skipped:
if resting.UserID == incoming.UserID {
	e = next
	continue // skip self-trade
}
This is correct behavior for a real exchange. The fix: use two different test accounts.
Scenario: Buy and sell orders from different users match and execute
  # Trader 1 places a sell order
  Given I am logged in as "trader1@booker.dev"
  And I am on the trading page
  When I fill in the sell order with price "42000" and quantity "0.01"
  And I submit the sell order
  # Trader 2 places a matching buy order
  When I logout and login as "trader2@booker.dev"
  And I fill in the buy order with price "42000" and quantity "0.01"
  And I submit the buy order
  Then the buy order should be filled
The broader lesson: E2E tests must respect the same invariants as production. If your engine prevents self-trades, your tests need multiple users. Seed your database with enough test accounts to support this.
Trade-offs and What’s Missing
This design optimizes for correctness and simplicity over raw throughput. Some conscious trade-offs:
Settlement is synchronous and best-effort. If a settlement gRPC call fails mid-way, the buyer might be debited without the seller being credited. A production system would move settlement to an async queue with idempotent retries — or use a workflow engine like Temporal for durable orchestration.
The matching engine is in-memory. A crash loses the order book state. Recovery requires replaying open orders from the database on startup. A production system would snapshot the order book periodically or use event sourcing.
No rate limiting or circuit breakers. The gRPC calls between services have no circuit breaker protection. A wallet service outage would cascade to order creation failures.
Single-pair goroutine. One goroutine per pair works well for moderate throughput but becomes a bottleneck for high-frequency trading. Sharding the order book or using lock-free data structures would be the next optimization.
Conclusion
Building a trading engine is fundamentally about managing state transitions across multiple services while keeping funds consistent. The saga pattern — explicit compensation at each step — provides a clear mental model for reasoning about failures. gRPC gives you the synchronous guarantees where you need them, NATS JetStream handles the async fan-out, and atomic SQL operations with balance guards prevent the scariest bugs: negative balances and phantom funds.
The bugs that hurt most weren’t in the matching algorithm; they were in deployment configuration and test assumptions. A missing environment variable silently broke the entire trading flow. A test that ignored a business rule (self-trade prevention) failed even though the engine was behaving exactly as designed. The code was correct; the environment and the test setup weren’t.
The full source code is available on GitHub with unit tests (96% coverage), integration tests using testcontainers, and E2E tests with Cucumber + Playwright covering multi-user trade execution.