LangGraph extends LangChain with graph-based workflow orchestration for building stateful, multi-step AI agents. Unlike simple chains, LangGraph lets you define nodes (processing steps) and edges (transitions) that can branch conditionally based on state. This post walks through building a production customer support agent that classifies intent, routes to specialized handlers, and escalates to humans when confidence is low.
Why LangGraph Over Simple Chains?
LangChain chains are linear: input flows through a sequence of steps. But real-world agents need branching logic:
- Route to different handlers based on user intent
- Skip steps when data is missing
- Loop back for clarification
- Escalate when confidence drops
LangGraph models these flows as directed graphs with conditional edges. Each node transforms the shared state, and edges determine which node runs next.
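Before touching the LangGraph API, the core idea can be sketched in plain Python: a router function inspects the shared state and returns the name of the next node to run. (All node and intent names here are illustrative, not LangGraph internals.)

```python
# Minimal sketch of conditional routing: a router inspects shared state
# and picks the next handler by name. Names are illustrative.
def router(state: dict) -> str:
    if state.get("intent") == "greeting":
        return "greeting_handler"
    if state.get("confidence", 1.0) < 0.3:
        return "escalation"
    return "llm_responder"

def greeting_handler(state: dict) -> dict:
    return {"reply": "Hello!"}

# LangGraph formalizes exactly this: the router's return value selects
# which registered node runs next.
handlers = {"greeting_handler": greeting_handler}

state = {"intent": "greeting"}
next_node = router(state)
state.update(handlers[next_node](state))
print(state["reply"])  # Hello!
```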
Architecture Overview
flowchart TD
A[Classifier] --> |greeting| B[Greeting Handler]
A --> |knowledge_query| C[Knowledge Retriever]
A --> |product_query| D[Product Fetcher]
A --> |order_query| E[Order Fetcher]
A --> |outscope| F[Outscope Handler]
A --> |sensitive| G[Escalation]
B --> END1([END])
F --> END2([END])
C --> H[LLM Responder]
D --> H
E --> H
H --> I[Confidence Evaluator]
I --> |confidence >= 0.3| END3([END])
I --> |confidence < 0.3| G
G --> END4([END])
style A fill:#4f46e5,color:#fff
style H fill:#059669,color:#fff
style I fill:#d97706,color:#fff
style G fill:#dc2626,color:#fff
The flow: Classifier detects intent → routes to specialized handler → handler fetches context → LLM generates response → evaluator checks confidence → escalates if needed.
Project Setup
Initialize with uv:
uv init ai-chat-service
cd ai-chat-service
uv add fastapi uvicorn langgraph langchain-core langchain-anthropic pydantic python-dotenv
Create .env:
ANTHROPIC_API_KEY=sk-ant-...
Defining Agent State
LangGraph uses TypedDict for state management. Every node reads from and writes to this shared state:
# agent/state.py
from typing import TypedDict, Literal, NotRequired
class AgentState(TypedDict):
message: str
conversation_id: str
user_id: str
conversation_history: list[dict]
needs_human: bool
# Fields populated by agent nodes
intent: NotRequired[
Literal[
"greeting",
"knowledge_query",
"order_query",
"product_query",
"outscope",
"unclear",
"sensitive",
]
| None
]
entities: NotRequired[dict | None]
fetched_context: NotRequired[dict | None]
reply: NotRequired[str | None]
confidence: NotRequired[float | None]
Key patterns:
- Required fields must be present at input
- NotRequired fields are populated by nodes during processing
- Literal types constrain valid values for routing decisions
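Nodes never mutate the state in place; each returns a partial dict, and LangGraph merges those keys back into the shared state (with the default reducer, a returned key simply overwrites the old value). A rough simulation of that merge step:

```python
# Rough simulation of LangGraph's default state update: each node returns
# a partial dict, and returned keys overwrite the corresponding state keys.
def apply_node_output(state: dict, node_output: dict) -> dict:
    merged = dict(state)          # treat state as immutable per step
    merged.update(node_output)    # returned keys overwrite existing ones
    return merged

state = {"message": "hi", "needs_human": False}
state = apply_node_output(state, {"intent": "greeting", "entities": {}})
state = apply_node_output(state, {"reply": "Hello!", "confidence": 1.0})

print(state["intent"], state["confidence"])  # greeting 1.0
```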
Structured Classification with Pydantic
The classifier uses with_structured_output() to ensure the LLM returns valid intent and entities:
# agent/nodes/classifier.py
import logging
from typing import Literal
from pydantic import BaseModel, Field
from langchain_anthropic import ChatAnthropic
from agent.state import AgentState
logger = logging.getLogger(__name__)
class ClassifierOutput(BaseModel):
intent: Literal[
"greeting",
"knowledge_query",
"order_query",
"product_query",
"outscope",
"unclear",
"sensitive",
]
entities: dict = Field(default_factory=dict)
def _get_structured_llm():
llm = ChatAnthropic(model="claude-haiku-4-20250514")
return llm.with_structured_output(ClassifierOutput)
CLASSIFIER_PROMPT = """You are an intent classifier for customer support.
Classify the user message into exactly one intent:
- "greeting": simple greetings (hi, hello, hey)
- "knowledge_query": questions about policies, shipping, FAQs
- "order_query": questions about order status, tracking
- "product_query": questions about products, pricing
- "outscope": unrelated questions (weather, news, coding)
- "unclear": ambiguous messages
- "sensitive": complaints, refunds, legal threats
Extract entities:
- order_id: if mentioned
- product_name: if mentioned
- topic_keywords: keywords for search
Conversation context:
{history}
User message: {message}"""
async def classifier_node(state: AgentState) -> dict:
history_text = ""
for msg in state.get("conversation_history", [])[-5:]:
history_text += f"{msg.get('role', 'user')}: {msg.get('content', '')}\n"
prompt = CLASSIFIER_PROMPT.format(
history=history_text or "(no history)",
message=state["message"],
)
try:
structured_llm = _get_structured_llm()
result: ClassifierOutput = await structured_llm.ainvoke(prompt)
logger.info("Classified intent=%s entities=%s", result.intent, result.entities)
return {"intent": result.intent, "entities": result.entities}
except Exception:
logger.exception("Classifier failed, fallback to knowledge_query")
return {"intent": "knowledge_query", "entities": {}}
The Pydantic model enforces:
- intent must be one of the allowed literals
- entities defaults to an empty dict if not extracted
- Invalid LLM responses raise validation errors
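To see what this buys you without running an LLM, here is the same constraint expressed as a plain validation check. This is an illustration of the behavior, not the actual with_structured_output machinery:

```python
# Illustration of what the Pydantic model enforces: intents outside the
# allowed set are rejected, and entities defaults to an empty dict
# (mirroring Field(default_factory=dict)).
ALLOWED_INTENTS = {
    "greeting", "knowledge_query", "order_query",
    "product_query", "outscope", "unclear", "sensitive",
}

def validate_classifier_output(raw: dict) -> dict:
    if raw.get("intent") not in ALLOWED_INTENTS:
        raise ValueError(f"invalid intent: {raw.get('intent')!r}")
    return {"intent": raw["intent"], "entities": raw.get("entities") or {}}

print(validate_classifier_output({"intent": "greeting"}))
# {'intent': 'greeting', 'entities': {}}
```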
Building Specialized Handlers
Each intent gets a dedicated handler node. Here’s a greeting handler:
# agent/nodes/greeting_handler.py
import random
from agent.state import AgentState
GREETINGS = [
"Hello! How can I help you today?",
"Hi there! What can I assist you with?",
"Hey! Welcome! How may I help?",
]
async def greeting_handler_node(state: AgentState) -> dict:
return {
"reply": random.choice(GREETINGS),
"confidence": 1.0,
}
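Because nodes are plain async functions over a dict, they can be exercised directly with asyncio.run, no graph or server required. A self-contained sketch (the handler is re-declared here so the snippet runs on its own):

```python
import asyncio
import random

GREETINGS = [
    "Hello! How can I help you today?",
    "Hi there! What can I assist you with?",
    "Hey! Welcome! How may I help?",
]

async def greeting_handler_node(state: dict) -> dict:
    return {"reply": random.choice(GREETINGS), "confidence": 1.0}

# Invoke the node directly -- handy for unit tests.
result = asyncio.run(greeting_handler_node({"message": "hi"}))
assert result["reply"] in GREETINGS
assert result["confidence"] == 1.0
```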
And a knowledge retriever that queries a vector database:
# agent/nodes/knowledge_retriever.py
import os
import json
import logging
import voyageai
from sqlalchemy import text
from db import async_session
from agent.state import AgentState
logger = logging.getLogger(__name__)
def _get_voyage_client():
return voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY", ""))
async def knowledge_retriever_node(state: AgentState) -> dict:
try:
vo = _get_voyage_client()
# Embed user message
embed_result = vo.embed([state["message"]], model="voyage-4-lite")
query_vector = embed_result.embeddings[0]
vector_str = json.dumps(query_vector)
# Vector search
async with async_session() as session:
result = await session.execute(
text("""
SELECT title, content, category,
1 - (embedding <=> CAST(:vec AS vector)) AS similarity
FROM "KnowledgePage"
WHERE is_active = true AND embedding IS NOT NULL
ORDER BY embedding <=> CAST(:vec AS vector)
LIMIT 3
"""),
{"vec": vector_str},
)
rows = result.mappings().all()
if not rows or rows[0]["similarity"] < 0.3:
logger.info("No relevant pages found")
return {"fetched_context": {"pages": []}, "confidence": 0.4}
logger.info("Found %d pages, top similarity=%.3f", len(rows), rows[0]["similarity"])
return {
"fetched_context": {
"pages": [
{"title": r["title"], "content": r["content"], "category": r["category"]}
for r in rows
]
}
}
except Exception:
logger.exception("Knowledge retriever failed")
return {"fetched_context": {"pages": []}, "confidence": 0.4}
LLM Response Generation
The responder uses fetched context to generate grounded answers:
# agent/nodes/llm_responder.py
import json
import logging
from langchain_anthropic import ChatAnthropic
from agent.state import AgentState
logger = logging.getLogger(__name__)
def _get_llm():
return ChatAnthropic(model="claude-haiku-4-20250514")
SYSTEM_PROMPT = """You are a friendly customer support assistant.
GUIDELINES:
- Answer from provided context only
- If context is insufficient, acknowledge limitations
- Match customer's language
- Keep responses concise
FORMATTING:
- Use bullet points for lists
- Use **bold** for key terms
End with: CONFIDENCE: X.X (0.0-1.0)"""
async def llm_responder_node(state: AgentState) -> dict:
context_data = state.get("fetched_context") or {}
context_str = json.dumps(context_data, ensure_ascii=False, default=str)
history_text = ""
for msg in state.get("conversation_history", [])[-5:]:
history_text += f"{msg.get('role', 'user')}: {msg.get('content', '')}\n"
user_prompt = f"Context:\n{context_str}\n\nHistory:\n{history_text}\n\nMessage: {state['message']}"
try:
llm = _get_llm()
response = await llm.ainvoke([
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
])
content = response.content
reply_text = content if isinstance(content, str) else str(content)
confidence = state.get("confidence") or 0.7
# Extract inline confidence score
if "CONFIDENCE:" in reply_text:
parts = reply_text.rsplit("CONFIDENCE:", 1)
reply_text = parts[0].strip()
try:
                confidence = float(parts[1].strip().split()[0])
            except (ValueError, IndexError):
pass
return {"reply": reply_text, "confidence": confidence}
except Exception:
logger.exception("LLM responder failed")
return {
"reply": "Sorry, I'm having trouble. Let me connect you with a human.",
"confidence": 0.0,
"needs_human": True,
}
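The inline CONFIDENCE marker is a lightweight alternative to a second structured-output call, and parsing it is just string splitting. A self-contained sketch of that parsing, worth checking against a few shapes of model output:

```python
def split_confidence(reply_text: str, default: float = 0.7) -> tuple[str, float]:
    # Mirrors the responder's parsing: split on the last CONFIDENCE: marker
    # and fall back to the default when the score is missing or malformed.
    if "CONFIDENCE:" not in reply_text:
        return reply_text.strip(), default
    body, _, tail = reply_text.rpartition("CONFIDENCE:")
    try:
        return body.strip(), float(tail.strip().split()[0])
    except (ValueError, IndexError):
        return body.strip(), default

print(split_confidence("Free shipping over $50.\nCONFIDENCE: 0.9"))
# ('Free shipping over $50.', 0.9)
print(split_confidence("Not sure about that."))
# ('Not sure about that.', 0.7)
```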
Confidence-Based Escalation
The evaluator checks if the response needs human review:
# agent/nodes/confidence_evaluator.py
import logging
from typing import Literal
from agent.state import AgentState
logger = logging.getLogger(__name__)
ESCALATION_KEYWORDS = [
"refund", "complaint", "speak to human", "real person",
"not working", "terrible", "lawsuit",
]
def confidence_evaluator_node(state: AgentState) -> dict:
"""Check for escalation keywords in message."""
message_lower = state.get("message", "").lower()
has_escalation = any(kw in message_lower for kw in ESCALATION_KEYWORDS)
if has_escalation:
logger.info("Escalation keyword detected")
return {"confidence": 0.0, "needs_human": True}
return {}
def evaluate_route(state: AgentState) -> Literal["respond", "escalate"]:
"""Route based on confidence and flags."""
confidence = state.get("confidence") or 0.5
intent = state.get("intent")
if state.get("needs_human"):
logger.info("Escalating: needs_human flag")
return "escalate"
if intent == "sensitive":
logger.info("Escalating: sensitive intent")
return "escalate"
if confidence < 0.3:
logger.info("Escalating: low confidence %.2f", confidence)
return "escalate"
return "respond"
Assembling the Graph
Now wire everything together with LangGraph:
# agent/graph.py
from langgraph.graph import StateGraph, END
from agent.state import AgentState
from agent.nodes import (
classifier_node,
greeting_handler_node,
outscope_handler_node,
knowledge_retriever_node,
order_fetcher_node,
product_fetcher_node,
llm_responder_node,
confidence_evaluator_node,
evaluate_route,
escalation_node,
)
def route_by_intent(state: AgentState) -> str:
"""Route to handler based on classified intent."""
intent = state.get("intent") or "unclear"
if intent == "sensitive":
return "escalation"
routing_map: dict[str, str] = {
"greeting": "greeting_handler",
"knowledge_query": "knowledge_retriever",
"order_query": "order_fetcher",
"product_query": "product_fetcher",
"outscope": "outscope_handler",
}
return routing_map.get(intent, "knowledge_retriever")
def build_graph():
graph = StateGraph(AgentState)
# Add nodes
graph.add_node("classifier", classifier_node)
graph.add_node("greeting_handler", greeting_handler_node)
graph.add_node("outscope_handler", outscope_handler_node)
graph.add_node("knowledge_retriever", knowledge_retriever_node)
graph.add_node("order_fetcher", order_fetcher_node)
graph.add_node("product_fetcher", product_fetcher_node)
graph.add_node("llm_responder", llm_responder_node)
graph.add_node("confidence_evaluator", confidence_evaluator_node)
graph.add_node("escalation", escalation_node)
# Set entry point
graph.set_entry_point("classifier")
# Conditional routing after classification
graph.add_conditional_edges(
"classifier",
route_by_intent,
{
"greeting_handler": "greeting_handler",
"outscope_handler": "outscope_handler",
"knowledge_retriever": "knowledge_retriever",
"order_fetcher": "order_fetcher",
"product_fetcher": "product_fetcher",
"escalation": "escalation",
},
)
# Terminal handlers end immediately
graph.add_edge("greeting_handler", END)
graph.add_edge("outscope_handler", END)
# Context fetchers flow to responder
graph.add_edge("knowledge_retriever", "llm_responder")
graph.add_edge("order_fetcher", "llm_responder")
graph.add_edge("product_fetcher", "llm_responder")
# Response evaluation
graph.add_edge("llm_responder", "confidence_evaluator")
graph.add_conditional_edges(
"confidence_evaluator",
evaluate_route,
{"respond": END, "escalate": "escalation"},
)
graph.add_edge("escalation", END)
return graph.compile()
agent = build_graph()
Key LangGraph patterns:
- StateGraph(AgentState) - creates a graph with typed state
- add_node(name, func) - registers a processing function
- set_entry_point(name) - defines the starting node
- add_edge(from, to) - unconditional transition
- add_conditional_edges(from, router_fn, mapping) - branches on the router function's return value
- END - terminal sentinel that exits the graph
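Under the hood, the compiled graph is essentially a loop: run the current node, merge its output into state, ask the edges (fixed or conditional) for the next node, stop at END. A stripped-down simulation of those semantics, using illustrative names rather than LangGraph internals:

```python
END = "__end__"

def run_graph(nodes, edges, state):
    # nodes: name -> callable(state) -> partial state dict
    # edges: name -> either a fixed next-node name, or a router callable
    current = "classifier"
    while current != END:
        state = {**state, **nodes[current](state)}
        edge = edges[current]
        current = edge(state) if callable(edge) else edge
    return state

nodes = {
    "classifier": lambda s: {"intent": "greeting"},
    "greeting_handler": lambda s: {"reply": "Hello!"},
}
edges = {
    "classifier": lambda s: "greeting_handler" if s["intent"] == "greeting" else END,
    "greeting_handler": END,
}

final = run_graph(nodes, edges, {"message": "hi"})
print(final["reply"])  # Hello!
```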
FastAPI Integration
Expose the agent via API:
# main.py
import logging
from fastapi import FastAPI, HTTPException, Depends
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel
from agent import agent
import os
logging.basicConfig(level=logging.INFO)
app = FastAPI(title="AI Chat Service")
security = HTTPBearer()
NEXTJS_SECRET = os.getenv("NEXTJS_SECRET", "")
class ChatMessage(BaseModel):
role: str
content: str
class ChatRequest(BaseModel):
message: str
conversationId: str
userId: str
history: list[ChatMessage] = []
class ChatResponse(BaseModel):
reply: str
needs_human: bool
confidence: float
def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
if NEXTJS_SECRET and credentials.credentials != NEXTJS_SECRET:
raise HTTPException(status_code=401, detail="Invalid token")
@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest, _=Depends(verify_token)):
try:
result = await agent.ainvoke({
"message": req.message,
"conversation_id": req.conversationId,
"user_id": req.userId,
"conversation_history": [
{"role": m.role, "content": m.content} for m in req.history
],
"needs_human": False,
})
return ChatResponse(
reply=result.get("reply", "Sorry, I could not process your request."),
needs_human=result.get("needs_human", False),
confidence=result.get("confidence", 0.0),
)
except Exception:
logging.exception("Chat error")
return ChatResponse(
reply="Having trouble. Let me connect you with a human.",
needs_human=True,
confidence=0.0,
)
@app.get("/health")
async def health():
return {"status": "ok"}
Run with:
uv run uvicorn main:app --reload --port 8000
Testing the Agent
Test intent routing:
# Greeting - routes to greeting_handler
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer secret" \
-d '{"message": "Hello!", "conversationId": "1", "userId": "1"}'
# Product query - routes to product_fetcher -> llm_responder
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer secret" \
-d '{"message": "What shampoos do you have?", "conversationId": "1", "userId": "1"}'
# Sensitive - routes to escalation
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer secret" \
-d '{"message": "I want a refund!", "conversationId": "1", "userId": "1"}'
Production Considerations
Error Handling
Each node should catch exceptions and return graceful fallbacks:
async def some_node(state: AgentState) -> dict:
try:
# main logic
return {"result": data}
except Exception:
logger.exception("Node failed")
return {"confidence": 0.0, "needs_human": True}
Logging
Log intent, confidence, and routing decisions for debugging:
logger.info("intent=%s confidence=%.2f route=%s", intent, confidence, route)
Timeouts
Wrap LLM calls with timeouts for production:
import asyncio
async def with_timeout(coro, seconds=10):
return await asyncio.wait_for(coro, timeout=seconds)
response = await with_timeout(llm.ainvoke(prompt))
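asyncio.wait_for raises asyncio.TimeoutError when the deadline passes, so the call site still needs a fallback path. A self-contained sketch with a deliberately slow stand-in for the LLM call:

```python
import asyncio

async def slow_llm_call() -> str:
    # Stand-in for llm.ainvoke(...) that takes too long.
    await asyncio.sleep(5)
    return "never reached"

async def respond_with_deadline() -> dict:
    try:
        reply = await asyncio.wait_for(slow_llm_call(), timeout=0.05)
        return {"reply": reply, "confidence": 0.7}
    except asyncio.TimeoutError:
        # Degrade gracefully instead of hanging the request.
        return {"reply": "Sorry, that took too long.", "needs_human": True, "confidence": 0.0}

result = asyncio.run(respond_with_deadline())
print(result["needs_human"])  # True
```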
Observability
LangGraph integrates with LangSmith for tracing. Set environment variables:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls-...
Conclusion
LangGraph transforms LangChain from linear chains into flexible state machines. The key patterns:
- TypedDict state for type-safe data flow
- Pydantic structured output for reliable classification
- Conditional edges for intent-based routing
- Confidence scoring for automatic escalation
This architecture scales from simple chatbots to complex multi-agent systems with shared state and parallel execution. The graph structure makes the control flow explicit and testable.