LangGraph extends LangChain with graph-based workflow orchestration for building stateful, multi-step AI agents. Unlike simple chains, LangGraph lets you define nodes (processing steps) and edges (transitions) that can branch conditionally based on state. This post walks through building a production customer support agent that classifies intent, routes to specialized handlers, and escalates to humans when confidence is low.
Why LangGraph Over Simple Chains?
LangChain chains are linear: input flows through a sequence of steps. But real-world agents need branching logic:
- Route to different handlers based on user intent
- Skip steps when data is missing
- Loop back for clarification
- Escalate when confidence drops
LangGraph models these flows as directed graphs with conditional edges. Each node transforms the shared state, and edges determine which node runs next.
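Before touching the LangGraph API, the core idea can be sketched in plain Python: a router function inspects the shared state and returns the name of the next node to run. (All node and intent names here are illustrative, not LangGraph internals.)

```python
# Minimal sketch of conditional routing: a router inspects shared state
# and picks the next handler by name. Names are illustrative.
def router(state: dict) -> str:
    if state.get("intent") == "greeting":
        return "greeting_handler"
    if state.get("confidence", 1.0) < 0.3:
        return "escalation"
    return "llm_responder"

def greeting_handler(state: dict) -> dict:
    return {"reply": "Hello!"}

# LangGraph formalizes exactly this: the router's return value selects
# which registered node runs next.
handlers = {"greeting_handler": greeting_handler}

state = {"intent": "greeting"}
next_node = router(state)
state.update(handlers[next_node](state))
print(state["reply"])  # Hello!
```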
Architecture Overview
flowchart TD
A[Classifier] --> |greeting| B[Greeting Handler]
A --> |knowledge_query| C[Knowledge Retriever]
A --> |product_query| D[Product Fetcher]
A --> |order_query| E[Order Fetcher]
A --> |outscope| F[Outscope Handler]
A --> |sensitive| G[Escalation]
B --> END1([END])
F --> END2([END])
C --> H[LLM Responder]
D --> H
E --> H
H --> I[Confidence Evaluator]
I --> |confidence >= 0.3| END3([END])
I --> |confidence < 0.3| G
G --> END4([END])
style A fill:#4f46e5,color:#fff
style H fill:#059669,color:#fff
style I fill:#d97706,color:#fff
style G fill:#dc2626,color:#fff
The flow: Classifier detects intent → routes to specialized handler → handler fetches context → LLM generates response → evaluator checks confidence → escalates if needed.
Project Setup
Initialize with uv:
uv init ai-chat-service
cd ai-chat-service
uv add fastapi uvicorn langgraph langchain-core langchain-anthropic pydantic python-dotenv
Create .env:
ANTHROPIC_API_KEY=sk-ant-...
Defining Agent State
LangGraph uses TypedDict for state management. Every node reads from and writes to this shared state:
# agent/state.py
from typing import TypedDict, Literal, NotRequired
class AgentState(TypedDict):
message: str
conversation_id: str
user_id: str
conversation_history: list[dict]
needs_human: bool
# Fields populated by agent nodes
intent: NotRequired[
Literal[
"greeting",
"knowledge_query",
"order_query",
"product_query",
"outscope",
"unclear",
"sensitive",
]
| None
]
entities: NotRequired[dict | None]
fetched_context: NotRequired[dict | None]
reply: NotRequired[str | None]
confidence: NotRequired[float | None]
Key patterns:
- Required fields must be present at input
- NotRequired fields are populated by nodes during processing
- Literal types constrain valid values for routing decisions
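Nodes never mutate the state in place; each returns a partial dict, and LangGraph merges those keys back into the shared state (with the default reducer, a returned key simply overwrites the old value). A rough simulation of that merge step:

```python
# Rough simulation of LangGraph's default state update: each node returns
# a partial dict, and returned keys overwrite the corresponding state keys.
def apply_node_output(state: dict, node_output: dict) -> dict:
    merged = dict(state)          # treat state as immutable per step
    merged.update(node_output)    # returned keys overwrite existing ones
    return merged

state = {"message": "hi", "needs_human": False}
state = apply_node_output(state, {"intent": "greeting", "entities": {}})
state = apply_node_output(state, {"reply": "Hello!", "confidence": 1.0})

print(state["intent"], state["confidence"])  # greeting 1.0
```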
Structured Classification with Pydantic
The classifier uses with_structured_output() to ensure the LLM returns valid intent and entities:
# agent/nodes/classifier.py
import logging
from typing import Literal
from pydantic import BaseModel, Field
from langchain_anthropic import ChatAnthropic
from agent.state import AgentState
logger = logging.getLogger(__name__)
class ClassifierOutput(BaseModel):
intent: Literal[
"greeting",
"knowledge_query",
"order_query",
"product_query",
"outscope",
"unclear",
"sensitive",
]
entities: dict = Field(default_factory=dict)
def _get_structured_llm():
llm = ChatAnthropic(model="claude-haiku-4-20250514")
return llm.with_structured_output(ClassifierOutput)
CLASSIFIER_PROMPT = """You are an intent classifier for customer support.
Classify the user message into exactly one intent:
- "greeting": simple greetings (hi, hello, hey)
- "knowledge_query": questions about policies, shipping, FAQs
- "order_query": questions about order status, tracking
- "product_query": questions about products, pricing
- "outscope": unrelated questions (weather, news, coding)
- "unclear": ambiguous messages
- "sensitive": complaints, refunds, legal threats
Extract entities:
- order_id: if mentioned
- product_name: if mentioned
- topic_keywords: keywords for search
Conversation context:
{history}
User message: {message}"""
async def classifier_node(state: AgentState) -> dict:
history_text = ""
for msg in state.get("conversation_history", [])[-5:]:
history_text += f"{msg.get('role', 'user')}: {msg.get('content', '')}\n"
prompt = CLASSIFIER_PROMPT.format(
history=history_text or "(no history)",
message=state["message"],
)
try:
structured_llm = _get_structured_llm()
result: ClassifierOutput = await structured_llm.ainvoke(prompt)
logger.info("Classified intent=%s entities=%s", result.intent, result.entities)
return {"intent": result.intent, "entities": result.entities}
except Exception:
logger.exception("Classifier failed, fallback to knowledge_query")
return {"intent": "knowledge_query", "entities": {}}
The Pydantic model enforces:
- intent must be one of the allowed literals
- entities defaults to an empty dict if not extracted
- Invalid LLM responses raise validation errors
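To see what this buys you without running an LLM, here is the same constraint expressed as a plain validation check. This is an illustration of the behavior, not the actual with_structured_output machinery:

```python
# Illustration of what the Pydantic model enforces: intents outside the
# allowed set are rejected, and entities defaults to an empty dict
# (mirroring Field(default_factory=dict)).
ALLOWED_INTENTS = {
    "greeting", "knowledge_query", "order_query",
    "product_query", "outscope", "unclear", "sensitive",
}

def validate_classifier_output(raw: dict) -> dict:
    if raw.get("intent") not in ALLOWED_INTENTS:
        raise ValueError(f"invalid intent: {raw.get('intent')!r}")
    return {"intent": raw["intent"], "entities": raw.get("entities") or {}}

print(validate_classifier_output({"intent": "greeting"}))
# {'intent': 'greeting', 'entities': {}}
```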
Building Specialized Handlers
Each intent gets a dedicated handler node. Here’s a greeting handler:
# agent/nodes/greeting_handler.py
import random
from agent.state import AgentState
GREETINGS = [
"Hello! How can I help you today?",
"Hi there! What can I assist you with?",
"Hey! Welcome! How may I help?",
]
async def greeting_handler_node(state: AgentState) -> dict:
return {
"reply": random.choice(GREETINGS),
"confidence": 1.0,
}
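Because nodes are plain async functions over a dict, they can be exercised directly with asyncio.run, no graph or server required. A self-contained sketch (the handler is re-declared here so the snippet runs on its own):

```python
import asyncio
import random

GREETINGS = [
    "Hello! How can I help you today?",
    "Hi there! What can I assist you with?",
    "Hey! Welcome! How may I help?",
]

async def greeting_handler_node(state: dict) -> dict:
    return {"reply": random.choice(GREETINGS), "confidence": 1.0}

# Invoke the node directly -- handy for unit tests.
result = asyncio.run(greeting_handler_node({"message": "hi"}))
assert result["reply"] in GREETINGS
assert result["confidence"] == 1.0
```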
And a knowledge retriever that queries a vector database:
# agent/nodes/knowledge_retriever.py
import os
import json
import logging
import voyageai
from sqlalchemy import text
from db import async_session
from agent.state import AgentState
logger = logging.getLogger(__name__)
def _get_voyage_client():
return voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY", ""))
async def knowledge_retriever_node(state: AgentState) -> dict:
try:
vo = _get_voyage_client()
# Embed user message
embed_result = vo.embed([state["message"]], model="voyage-4-lite")
query_vector = embed_result.embeddings[0]
vector_str = json.dumps(query_vector)
# Vector search
async with async_session() as session:
result = await session.execute(
text("""
SELECT title, content, category,
1 - (embedding <=> CAST(:vec AS vector)) AS similarity
FROM "KnowledgePage"
WHERE is_active = true AND embedding IS NOT NULL
ORDER BY embedding <=> CAST(:vec AS vector)
LIMIT 3
"""),
{"vec": vector_str},
)
rows = result.mappings().all()
if not rows or rows[0]["similarity"] < 0.3:
logger.info("No relevant pages found")
return {"fetched_context": {"pages": []}, "confidence": 0.4}
logger.info("Found %d pages, top similarity=%.3f", len(rows), rows[0]["similarity"])
return {
"fetched_context": {
"pages": [
{"title": r["title"], "content": r["content"], "category": r["category"]}
for r in rows
]
}
}
except Exception:
logger.exception("Knowledge retriever failed")
return {"fetched_context": {"pages": []}, "confidence": 0.4}
LLM Response Generation
The responder uses fetched context to generate grounded answers:
# agent/nodes/llm_responder.py
import json
import logging
from langchain_anthropic import ChatAnthropic
from agent.state import AgentState
logger = logging.getLogger(__name__)
def _get_llm():
return ChatAnthropic(model="claude-haiku-4-20250514")
SYSTEM_PROMPT = """You are a friendly customer support assistant.
GUIDELINES:
- Answer from provided context only
- If context is insufficient, acknowledge limitations
- Match customer's language
- Keep responses concise
FORMATTING:
- Use bullet points for lists
- Use **bold** for key terms
End with: CONFIDENCE: X.X (0.0-1.0)"""
async def llm_responder_node(state: AgentState) -> dict:
context_data = state.get("fetched_context") or {}
context_str = json.dumps(context_data, ensure_ascii=False, default=str)
history_text = ""
for msg in state.get("conversation_history", [])[-5:]:
history_text += f"{msg.get('role', 'user')}: {msg.get('content', '')}\n"
user_prompt = f"Context:\n{context_str}\n\nHistory:\n{history_text}\n\nMessage: {state['message']}"
try:
llm = _get_llm()
response = await llm.ainvoke([
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
])
content = response.content
reply_text = content if isinstance(content, str) else str(content)
confidence = state.get("confidence") or 0.7
# Extract inline confidence score
if "CONFIDENCE:" in reply_text:
parts = reply_text.rsplit("CONFIDENCE:", 1)
reply_text = parts[0].strip()
try:
                confidence = float(parts[1].strip().split()[0])
            except (ValueError, IndexError):
pass
return {"reply": reply_text, "confidence": confidence}
except Exception:
logger.exception("LLM responder failed")
return {
"reply": "Sorry, I'm having trouble. Let me connect you with a human.",
"confidence": 0.0,
"needs_human": True,
}
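The inline CONFIDENCE marker is a lightweight alternative to a second structured-output call, and parsing it is just string splitting. A self-contained sketch of that parsing, worth checking against a few shapes of model output:

```python
def split_confidence(reply_text: str, default: float = 0.7) -> tuple[str, float]:
    # Mirrors the responder's parsing: split on the last CONFIDENCE: marker
    # and fall back to the default when the score is missing or malformed.
    if "CONFIDENCE:" not in reply_text:
        return reply_text.strip(), default
    body, _, tail = reply_text.rpartition("CONFIDENCE:")
    try:
        return body.strip(), float(tail.strip().split()[0])
    except (ValueError, IndexError):
        return body.strip(), default

print(split_confidence("Free shipping over $50.\nCONFIDENCE: 0.9"))
# ('Free shipping over $50.', 0.9)
print(split_confidence("Not sure about that."))
# ('Not sure about that.', 0.7)
```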
Confidence-Based Escalation
The evaluator checks if the response needs human review:
# agent/nodes/confidence_evaluator.py
import logging
from typing import Literal
from agent.state import AgentState
logger = logging.getLogger(__name__)
ESCALATION_KEYWORDS = [
"refund", "complaint", "speak to human", "real person",
"not working", "terrible", "lawsuit",
]
def confidence_evaluator_node(state: AgentState) -> dict:
"""Check for escalation keywords in message."""
message_lower = state.get("message", "").lower()
has_escalation = any(kw in message_lower for kw in ESCALATION_KEYWORDS)
if has_escalation:
logger.info("Escalation keyword detected")
return {"confidence": 0.0, "needs_human": True}
return {}
def evaluate_route(state: AgentState) -> Literal["respond", "escalate"]:
"""Route based on confidence and flags."""
confidence = state.get("confidence") or 0.5
intent = state.get("intent")
if state.get("needs_human"):
logger.info("Escalating: needs_human flag")
return "escalate"
if intent == "sensitive":
logger.info("Escalating: sensitive intent")
return "escalate"
if confidence < 0.3:
logger.info("Escalating: low confidence %.2f", confidence)
return "escalate"
return "respond"
Assembling the Graph
Now wire everything together with LangGraph:
# agent/graph.py
from langgraph.graph import StateGraph, END
from agent.state import AgentState
from agent.nodes import (
classifier_node,
greeting_handler_node,
outscope_handler_node,
knowledge_retriever_node,
order_fetcher_node,
product_fetcher_node,
llm_responder_node,
confidence_evaluator_node,
evaluate_route,
escalation_node,
)
def route_by_intent(state: AgentState) -> str:
"""Route to handler based on classified intent."""
intent = state.get("intent") or "unclear"
if intent == "sensitive":
return "escalation"
routing_map: dict[str, str] = {
"greeting": "greeting_handler",
"knowledge_query": "knowledge_retriever",
"order_query": "order_fetcher",
"product_query": "product_fetcher",
"outscope": "outscope_handler",
}
return routing_map.get(intent, "knowledge_retriever")
def build_graph():
graph = StateGraph(AgentState)
# Add nodes
graph.add_node("classifier", classifier_node)
graph.add_node("greeting_handler", greeting_handler_node)
graph.add_node("outscope_handler", outscope_handler_node)
graph.add_node("knowledge_retriever", knowledge_retriever_node)
graph.add_node("order_fetcher", order_fetcher_node)
graph.add_node("product_fetcher", product_fetcher_node)
graph.add_node("llm_responder", llm_responder_node)
graph.add_node("confidence_evaluator", confidence_evaluator_node)
graph.add_node("escalation", escalation_node)
# Set entry point
graph.set_entry_point("classifier")
# Conditional routing after classification
graph.add_conditional_edges(
"classifier",
route_by_intent,
{
"greeting_handler": "greeting_handler",
"outscope_handler": "outscope_handler",
"knowledge_retriever": "knowledge_retriever",
"order_fetcher": "order_fetcher",
"product_fetcher": "product_fetcher",
"escalation": "escalation",
},
)
# Terminal handlers end immediately
graph.add_edge("greeting_handler", END)
graph.add_edge("outscope_handler", END)
# Context fetchers flow to responder
graph.add_edge("knowledge_retriever", "llm_responder")
graph.add_edge("order_fetcher", "llm_responder")
graph.add_edge("product_fetcher", "llm_responder")
# Response evaluation
graph.add_edge("llm_responder", "confidence_evaluator")
graph.add_conditional_edges(
"confidence_evaluator",
evaluate_route,
{"respond": END, "escalate": "escalation"},
)
graph.add_edge("escalation", END)
return graph.compile()
agent = build_graph()
Key LangGraph patterns:
- StateGraph(AgentState) - creates a graph with typed state
- add_node(name, func) - registers a processing function
- set_entry_point(name) - defines the starting node
- add_edge(from, to) - unconditional transition
- add_conditional_edges(from, router_fn, mapping) - branches on the router function's return value
- END - terminal sentinel that exits the graph
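Under the hood, the compiled graph is essentially a loop: run the current node, merge its output into state, ask the edges (fixed or conditional) for the next node, stop at END. A stripped-down simulation of those semantics, using illustrative names rather than LangGraph internals:

```python
END = "__end__"

def run_graph(nodes, edges, state):
    # nodes: name -> callable(state) -> partial state dict
    # edges: name -> either a fixed next-node name, or a router callable
    current = "classifier"
    while current != END:
        state = {**state, **nodes[current](state)}
        edge = edges[current]
        current = edge(state) if callable(edge) else edge
    return state

nodes = {
    "classifier": lambda s: {"intent": "greeting"},
    "greeting_handler": lambda s: {"reply": "Hello!"},
}
edges = {
    "classifier": lambda s: "greeting_handler" if s["intent"] == "greeting" else END,
    "greeting_handler": END,
}

final = run_graph(nodes, edges, {"message": "hi"})
print(final["reply"])  # Hello!
```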
FastAPI Integration
Expose the agent via API:
# main.py
import logging
from fastapi import FastAPI, HTTPException, Depends
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel
from agent import agent
import os
logging.basicConfig(level=logging.INFO)
app = FastAPI(title="AI Chat Service")
security = HTTPBearer()
NEXTJS_SECRET = os.getenv("NEXTJS_SECRET", "")
class ChatMessage(BaseModel):
role: str
content: str
class ChatRequest(BaseModel):
message: str
conversationId: str
userId: str
history: list[ChatMessage] = []
class ChatResponse(BaseModel):
reply: str
needs_human: bool
confidence: float
def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
if NEXTJS_SECRET and credentials.credentials != NEXTJS_SECRET:
raise HTTPException(status_code=401, detail="Invalid token")
@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest, _=Depends(verify_token)):
try:
result = await agent.ainvoke({
"message": req.message,
"conversation_id": req.conversationId,
"user_id": req.userId,
"conversation_history": [
{"role": m.role, "content": m.content} for m in req.history
],
"needs_human": False,
})
return ChatResponse(
reply=result.get("reply", "Sorry, I could not process your request."),
needs_human=result.get("needs_human", False),
confidence=result.get("confidence", 0.0),
)
except Exception:
logging.exception("Chat error")
return ChatResponse(
reply="Having trouble. Let me connect you with a human.",
needs_human=True,
confidence=0.0,
)
@app.get("/health")
async def health():
return {"status": "ok"}
Run with:
uv run uvicorn main:app --reload --port 8000
Testing the Agent
Test intent routing:
# Greeting - routes to greeting_handler
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer secret" \
-d '{"message": "Hello!", "conversationId": "1", "userId": "1"}'
# Product query - routes to product_fetcher -> llm_responder
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer secret" \
-d '{"message": "What shampoos do you have?", "conversationId": "1", "userId": "1"}'
# Sensitive - routes to escalation
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer secret" \
-d '{"message": "I want a refund!", "conversationId": "1", "userId": "1"}'
Production Considerations
Error Handling
Each node should catch exceptions and return graceful fallbacks:
async def some_node(state: AgentState) -> dict:
try:
# main logic
return {"result": data}
except Exception:
logger.exception("Node failed")
return {"confidence": 0.0, "needs_human": True}
Logging
Log intent, confidence, and routing decisions for debugging:
logger.info("intent=%s confidence=%.2f route=%s", intent, confidence, route)
Timeouts
Wrap LLM calls with timeouts for production:
import asyncio
async def with_timeout(coro, seconds=10):
return await asyncio.wait_for(coro, timeout=seconds)
response = await with_timeout(llm.ainvoke(prompt))
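asyncio.wait_for raises asyncio.TimeoutError when the deadline passes, so the call site still needs a fallback path. A self-contained sketch with a deliberately slow stand-in for the LLM call:

```python
import asyncio

async def slow_llm_call() -> str:
    # Stand-in for llm.ainvoke(...) that takes too long.
    await asyncio.sleep(5)
    return "never reached"

async def respond_with_deadline() -> dict:
    try:
        reply = await asyncio.wait_for(slow_llm_call(), timeout=0.05)
        return {"reply": reply, "confidence": 0.7}
    except asyncio.TimeoutError:
        # Degrade gracefully instead of hanging the request.
        return {"reply": "Sorry, that took too long.", "needs_human": True, "confidence": 0.0}

result = asyncio.run(respond_with_deadline())
print(result["needs_human"])  # True
```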
Observability
LangGraph integrates with LangSmith for tracing. Set environment variables:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls-...
Conclusion
LangGraph transforms LangChain from linear chains into flexible state machines. The key patterns:
- TypedDict state for type-safe data flow
- Pydantic structured output for reliable classification
- Conditional edges for intent-based routing
- Confidence scoring for automatic escalation
This architecture scales from simple chatbots to complex multi-agent systems with shared state and parallel execution. The graph structure makes the control flow explicit and testable.