<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ langgraph - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ langgraph - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Mon, 25 May 2026 20:14:47 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/langgraph/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Multi-Agent AI System with LangGraph, MCP, and A2A [Full Book] ]]>
                </title>
                <description>
                    <![CDATA[ Building a single AI agent that answers questions or runs searches is a solved problem. A handful of tutorials and a few hours of work will get you there. What most tutorials skip is the engineering l ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-multi-agent-ai-system-with-langgraph-mcp-and-a2a-full-book/</link>
                <guid isPermaLink="false">69f36894909e64ad07e3fc7f</guid>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ large language models ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Multi-Agent Systems (MAS) ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langfuse ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MCP-protocol ]]>
                    </category>
                
                    <category>
                        <![CDATA[ A2A Protocol ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sandeep Bharadwaj Mannapur ]]>
                </dc:creator>
                <pubDate>Thu, 30 Apr 2026 14:35:00 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/41b8ee2f-3097-497e-b008-0259f6c10772.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building a single AI agent that answers questions or runs searches is a solved problem. A handful of tutorials and a few hours of work will get you there.</p>
<p>What most tutorials skip is the engineering layer that comes next: the part that makes a multi-agent system reliable enough to run in production.</p>
<p>How do you recover state after a process crash? How do you give agents standardized access to tools without writing a proprietary adapter for every integration? How do you coordinate agents built with different frameworks? How do you know when agent output quality is degrading?</p>
<p>These are infrastructure questions, and this book answers them with working code you can run on your own machine. No cloud accounts, no API keys, no ongoing cost.</p>
<p>You'll work with four technologies that tackle these problems at the protocol level:</p>
<ol>
<li><p><strong>LangGraph</strong> for stateful agent orchestration,</p>
</li>
<li><p><strong>MCP (Model Context Protocol)</strong> for standardized tool integration,</p>
</li>
<li><p><strong>A2A (Agent-to-Agent Protocol)</strong> for cross-framework agent coordination, and</p>
</li>
<li><p><strong>Ollama</strong> for local LLM inference.</p>
</li>
</ol>
<p>To make every concept concrete, you'll build a real system throughout: a Learning Accelerator that plans study roadmaps, explains topics from your own notes, runs quizzes, and adapts based on the results. The use case is the teaching vehicle. The architecture is the real subject.</p>
<p>That architecture pattern (specialized agents coordinating through open protocols) runs in production today for sales enablement (agents that onboard reps and adapt training paths), compliance training (agents that certify employees through regulatory curricula), customer support (agents that build knowledge bases and track escalation topics), and engineering onboarding (agents that walk new hires through codebases).</p>
<p>The domain changes. The infrastructure patterns don't.</p>
<h3 id="heading-get-the-complete-code">📦 <strong>Get the Complete Code</strong></h3>
<p>The full ready-to-run repository for this handbook <a href="http://github.com/sandeepmb/freecodecamp-multi-agent-ai-system">is on GitHub here</a>. Clone it and follow along, or use it as a reference implementation while you read.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-introduction">Introduction</a></p>
</li>
<li><p><a href="#heading-chapter-1-when-to-use-multiple-agents">Chapter 1: When to Use Multiple Agents</a></p>
</li>
<li><p><a href="#heading-chapter-2-stateful-orchestration-with-langgraph">Chapter 2: Stateful Orchestration with LangGraph</a></p>
</li>
<li><p><a href="#heading-chapter-3-standardized-tool-access-with-mcp">Chapter 3: Standardized Tool Access with MCP</a></p>
</li>
<li><p><a href="#heading-chapter-4-building-the-four-agent-system">Chapter 4: Building the Four-Agent System</a></p>
</li>
<li><p><a href="#heading-chapter-5-state-persistence-and-human-oversight">Chapter 5: State Persistence and Human Oversight</a></p>
</li>
<li><p><a href="#heading-chapter-6-observability-with-langfuse">Chapter 6: Observability with Langfuse</a></p>
</li>
<li><p><a href="#heading-chapter-7-evaluating-agent-quality-with-deepeval">Chapter 7: Evaluating Agent Quality with DeepEval</a></p>
</li>
<li><p><a href="#heading-chapter-8-cross-framework-coordination-with-a2a">Chapter 8: Cross-Framework Coordination with A2A</a></p>
</li>
<li><p><a href="#heading-chapter-9-the-complete-system-and-whats-next">Chapter 9: The Complete System and What's Next</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-appendix-a-framework-comparison">Appendix A: Framework Comparison</a></p>
</li>
<li><p><a href="#heading-appendix-b-model-selection-guide">Appendix B: Model Selection Guide</a></p>
</li>
<li><p><a href="#heading-appendix-c-production-hardening-checklist">Appendix C: Production Hardening Checklist</a></p>
</li>
</ul>
<h2 id="heading-introduction">Introduction</h2>
<h3 id="heading-what-youll-build">What You'll Build</h3>
<p>The system you'll build has four agents coordinated by LangGraph, two MCP servers giving those agents access to external tools, two A2A services that allow cross-framework agent delegation, Langfuse capturing full traces, and DeepEval running automated quality checks.</p>
<p>Here is what that looks like end to end:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6983b18befedc65b9820e223/4bcaabd4-644a-4787-a8ae-de0c4e7ca73c.png" alt="Architecture diagram of the Learning Accelerator showing five layers: a User on the left feeding learning goals, approval responses, and quiz answers into the Orchestration Layer; the Orchestration Layer contains a LangGraph workflow with five nodes (Curriculum Planner, Human Approval, Explainer, Quiz Generator, Progress Coach) connected to a SQLite checkpoint store; the Tool Layer beneath holds an MCP Filesystem Server and an MCP Memory Server that the agents read and write through; the Inference Layer at the bottom shows all four agents fanning into Ollama running locally on port 11434 with qwen2.5 models; the A2A Layer on the right shows a Quiz Generator A2A service on port 9001 and a CrewAI Study Buddy on port 9002, both reached over JSON-RPC 2.0; the Observability Layer on the right shows Langfuse capturing every LLM call, tool call, and node execution via callback traces." style="display:block;margin:0 auto" width="1672" height="941" loading="lazy">

<p><em>Figure 1. The complete system. LangGraph orchestrates the four agents. Each agent accesses tools through MCP. The Progress Coach delegates to external agents via A2A, including a CrewAI agent, a different framework entirely. Ollama runs all inference locally. Langfuse captures every trace.</em></p>
<p>You'll build each layer incrementally. By the time the system is complete, you'll understand not just how to wire these technologies together but why each one exists and what production failure mode it prevents.</p>
<h3 id="heading-the-technology-stack">The Technology Stack</h3>
<table>
<thead>
<tr>
<th>Technology</th>
<th>Version</th>
<th>Role</th>
</tr>
</thead>
<tbody><tr>
<td>LangGraph</td>
<td>1.1.0</td>
<td>Stateful multi-agent graph orchestration</td>
</tr>
<tr>
<td>MCP</td>
<td>1.26.0</td>
<td>Standardized agent-to-tool protocol</td>
</tr>
<tr>
<td>A2A SDK</td>
<td>0.3.25</td>
<td>Cross-framework agent-to-agent protocol</td>
</tr>
<tr>
<td>Ollama</td>
<td>latest</td>
<td>Local LLM inference (no API keys)</td>
</tr>
<tr>
<td>CrewAI</td>
<td>1.13.0</td>
<td>Cross-framework interop via A2A</td>
</tr>
<tr>
<td>Langfuse</td>
<td>4.0.1</td>
<td>Distributed tracing and observability</td>
</tr>
<tr>
<td>DeepEval</td>
<td>3.9.1</td>
<td>LLM-as-judge evaluation</td>
</tr>
</tbody></table>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>You should be comfortable with:</p>
<ul>
<li><p><strong>Python 3.11 or higher</strong>: type hints, dataclasses, async/await basics</p>
</li>
<li><p><strong>Basic LLM concepts</strong>: prompts, completions, tool calling</p>
</li>
<li><p><strong>Command line</strong>: creating virtual environments, running scripts</p>
</li>
</ul>
<p>You don't need prior experience with LangGraph, MCP, A2A, or any agent framework. This handbook builds from first principles.</p>
<h3 id="heading-hardware-requirements">Hardware Requirements</h3>
<table>
<thead>
<tr>
<th>Setup</th>
<th>RAM</th>
<th>VRAM</th>
<th>Model</th>
<th>Notes</th>
</tr>
</thead>
<tbody><tr>
<td>Minimum</td>
<td>16 GB</td>
<td>8 GB</td>
<td><code>qwen2.5:7b</code></td>
<td>Fully functional</td>
</tr>
<tr>
<td>Recommended</td>
<td>32 GB</td>
<td>24 GB</td>
<td><code>qwen2.5-coder:32b</code></td>
<td>Best tool-calling reliability</td>
</tr>
<tr>
<td>CPU-only</td>
<td>32 GB</td>
<td>None</td>
<td><code>qwen2.5:7b</code></td>
<td>Works but 5 to 10 times slower</td>
</tr>
</tbody></table>
<h3 id="heading-why-model-size-matters-for-agents">💡 Why Model Size Matters for Agents</h3>
<p>Agents call tools by generating structured JSON arguments. A model that hallucinates tool names or misformats arguments fails silently: the tool call doesn't execute, the agent loops, and you hit the iteration limit without a clear error.</p>
<p>Models under 7B parameters produce these JSON formatting errors frequently. The 7 to 9B range is the minimum viable tier for reliable tool calling in production.</p>
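<p>One way to make that silent failure loud is to validate every tool call before executing it. The sketch below is an illustration, not code from the repository: <code>KNOWN_TOOLS</code>, <code>validate_tool_call</code>, and the tool names are hypothetical, and it assumes the model emits a JSON object with <code>name</code> and <code>arguments</code> keys.</p>
<pre><code class="language-python">import json

# Hypothetical tool registry: tool name mapped to its required argument names.
KNOWN_TOOLS = {
    "read_file": {"path"},
    "save_note": {"topic", "content"},
}


def validate_tool_call(raw):
    """Parse a model-generated tool call and return (call, error).

    Exactly one of the two values is None, so the caller can surface
    the error instead of looping silently.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"malformed JSON: {exc.msg}"
    if not isinstance(call, dict):
        return None, "tool call must be a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in KNOWN_TOOLS:
        return None, f"unknown tool: {name!r}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return None, "arguments must be a JSON object"
    missing = KNOWN_TOOLS[name] - set(args)
    if missing:
        return None, f"missing arguments: {sorted(missing)}"
    return call, None
</code></pre>
<p>Feeding the returned error string back to the model, or logging it, turns a quiet loop into a visible failure you can count.</p>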
<h2 id="heading-chapter-1-when-to-use-multiple-agents">Chapter 1: When to Use Multiple Agents</h2>
<p>Before writing any code, you should answer a question that most multi-agent tutorials skip entirely: does your problem actually need multiple agents?</p>
<p>This matters because adding agents has a real cost. More agents means more moving parts, more potential failure points, shared state that can be corrupted from multiple directions, and debugging that requires following execution across process boundaries. A single agent with good tools is often the simpler, faster, and more reliable solution.</p>
<p>So the question isn't "should I use multiple agents?" as though multi-agent is inherently superior. The question is "does my problem have characteristics that justify the coordination overhead?"</p>
<h3 id="heading-11-when-a-single-agent-is-the-right-answer">1.1 When a Single Agent is the Right Answer</h3>
<p>A single agent is usually the right architecture when the problem has one primary job that fits in one context window.</p>
<p>An agent that researches a topic and summarizes it: one job, one context window, one agent. An agent that reviews a pull request and posts comments: one job. An agent that answers customer questions from a knowledge base: one job. An agent that extracts structured data from a document: one job.</p>
<p>In these cases, adding a second agent doesn't simplify anything. It adds a coordination layer, a shared state contract, a new failure surface, and debugging complexity, in exchange for no architectural benefit. The single agent does the whole job. You give it good tools and it works.</p>
<p>The model for a single agent is straightforward:</p>
<pre><code class="language-plaintext">User input → Agent (with tools) → Response
</code></pre>
<p>The agent may call tools in a loop (search, read, write, verify) but a single LLM with the right tool access handles the full task. This is the right starting point for most AI automation work, and it's often the right finishing point too.</p>
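<p>A minimal sketch of that loop, assuming a stubbed model rather than a real LLM client. The <code>llm</code> callable interface, <code>fake_llm</code>, and the <code>search</code> tool are hypothetical and exist only to show the control flow:</p>
<pre><code class="language-python">def run_single_agent(user_input, llm, tools, max_steps=5):
    """Run one agent: call the LLM, execute requested tools, repeat.

    `llm` is any callable that takes the message list and returns either
    {"tool": name, "args": {...}} or {"answer": text}. This interface is
    an assumption made for the sketch, not a real library API.
    """
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = llm(messages)
        if "answer" in reply:
            return reply["answer"]
        # The model asked for a tool: run it and feed the result back.
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("iteration limit reached without a final answer")


def fake_llm(messages):
    """Stand-in model: search once, then answer from the tool result."""
    if messages[-1]["role"] == "tool":
        return {"answer": "Summary: " + messages[-1]["content"]}
    return {"tool": "search", "args": {"query": messages[0]["content"]}}


tools = {"search": lambda query: f"3 results for {query!r}"}
print(run_single_agent("LangGraph checkpoints", fake_llm, tools))
</code></pre>
<p>The explicit <code>max_steps</code> limit matters: without it, a model that never produces a final answer keeps the loop running indefinitely.</p>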
<h3 id="heading-12-the-real-criteria-for-multiple-agents">1.2 The Real Criteria for Multiple Agents</h3>
<p>A problem warrants multiple agents when it has <em>genuinely distinct specializations</em>: subtasks so different in their tools, LLM call patterns, temperature requirements, or failure modes that combining them into one agent creates more problems than it solves.</p>
<p>Here are the specific conditions that justify the coordination overhead:</p>
<h4 id="heading-different-tools-for-different-subtasks">Different tools for different subtasks</h4>
<p>If one part of the workflow needs filesystem access, another needs database writes, and a third needs to call an external API, there's a natural seam for agent separation.</p>
<p>Each agent uses only the tools it needs, which means each agent is easier to test and reason about in isolation.</p>
<h4 id="heading-different-llm-call-patterns">Different LLM call patterns</h4>
<p>Some tasks need a single structured output call with <code>temperature=0</code>. Others need a multi-turn tool-calling loop that terminates when the LLM decides it has enough context.</p>
<p>Mixing these patterns in one agent creates a function that does too many different things and fails in different ways depending on which path executes.</p>
<h4 id="heading-different-temperature-and-model-requirements">Different temperature and model requirements</h4>
<p>Structured planning output wants low temperature for consistency. Creative explanation wants slightly higher temperature for variety. Grading wants low temperature for analytical consistency.</p>
<p>If these three tasks share one agent with one temperature setting, you're making compromises in every direction.</p>
<h4 id="heading-fault-isolation-requirements">Fault isolation requirements</h4>
<p>If one subtask can fail without stopping the others, you need a boundary between them. An agent that plans a curriculum can succeed even if the quiz grading service is temporarily down. If they're in the same process with the same failure surface, a grading error takes down planning too.</p>
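<p>The weakest form of such a boundary can be sketched inside a single process: wrap each node so a failure becomes data in the state instead of an exception that stops everything. The <code>isolated</code> wrapper and both node functions below are hypothetical, and a real deployment would usually put the boundary between processes or services instead.</p>
<pre><code class="language-python">def isolated(node):
    """Wrap a node so a failure becomes state data instead of a crash."""
    def wrapper(state):
        try:
            return node(state)
        except Exception as exc:
            # Record the failure and let the rest of the workflow continue.
            return {"error": f"{node.__name__} failed: {exc}"}
    return wrapper


def plan_curriculum(state):
    return {"roadmap": ["graphs", "state", "checkpoints"]}


def grade_quiz(state):
    raise ConnectionError("grading service unavailable")


state = {"goal": "Learn LangGraph", "error": None}
for node in (isolated(plan_curriculum), isolated(grade_quiz)):
    state.update(node(state))

print(state["roadmap"])
print(state["error"])
</code></pre>
<p>Planning succeeds and its output survives even though grading failed. The failure is still visible in <code>state["error"]</code> for a later step to act on.</p>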
<h4 id="heading-independent-deployment-needs">Independent deployment needs</h4>
<p>If different parts of the system might need to run at different scales, be updated independently, or be built by different teams using different frameworks, agent separation maps to deployment separation. The A2A protocol (Chapter 8) makes this concrete.</p>
<h4 id="heading-cross-framework-collaboration">Cross-framework collaboration</h4>
<p>If you want to use a CrewAI agent for one task and a LangGraph agent for another, because different frameworks have different strengths, you need a protocol for them to communicate. That protocol is A2A.</p>
<p>None of these conditions by itself mandates a multi-agent design. Any two together probably do. All of them together make a strong case.</p>
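<p>That rule of thumb can be written down as a small checklist. This is only a sketch of the heuristic as stated in this section: the criterion names and <code>recommend_architecture</code> are hypothetical labels, and the thresholds are a judgment call, not a hard rule.</p>
<pre><code class="language-python">CRITERIA = (
    "different_tools",
    "different_call_patterns",
    "different_temperatures",
    "fault_isolation",
    "independent_deployment",
    "cross_framework",
)


def recommend_architecture(met):
    """Map the set of criteria a problem meets to a rough recommendation."""
    unknown = set(met) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    count = len(set(met))
    if count == len(CRITERIA):
        return "strong case for multiple agents"
    if count in (0, 1):
        return "start with a single agent"
    return "multiple agents are probably justified"


print(recommend_architecture({"different_tools"}))
print(recommend_architecture({"different_tools", "fault_isolation"}))
</code></pre>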
<h3 id="heading-13-the-cost-youre-paying">1.3 The Cost You're Paying</h3>
<p>Before committing to a multi-agent architecture, name what you're paying for it.</p>
<p><strong>Shared state complexity:</strong> Every agent reads from and writes to a shared state object. If two agents write to the same field, you need a merge strategy. If one agent writes bad data, every subsequent agent gets bad input.</p>
<p>The state definition becomes a contract that all agents must honor, and changes to that contract require updating every agent.</p>
<p><strong>Harder debugging:</strong> A failure in a single agent shows up in one stack trace. A failure in a multi-agent system might be caused by bad output from three steps earlier, persisted in state, passed to a second agent, which produced output that caused the failure you're seeing now. The chain of causation crosses agent boundaries.</p>
<p><strong>Latency multiplication:</strong> Each agent makes at least one LLM call. A four-agent system makes a minimum of four LLM calls per session, often more when agents use tools in loops. At 2 to 5 seconds per Ollama call, that adds up quickly.</p>
<p><strong>More infrastructure:</strong> Multi-agent systems benefit from state persistence, observability, evaluation, and human oversight, all of which take time to set up. A single agent can often run without any of this. A multi-agent system in production really can't.</p>
<p>You should go into a multi-agent architecture with eyes open about these costs, and you should be able to name the specific benefits that justify them.</p>
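<p>The latency cost is easy to estimate before you build anything. The sketch below uses the 2 to 5 seconds per call figure from above and assumes calls run one after another. The function name and the seven-call breakdown are hypothetical examples, not measurements.</p>
<pre><code class="language-python">def session_latency_seconds(llm_calls, seconds_per_call=(2.0, 5.0)):
    """Return the (best, worst) sequential latency for a number of LLM calls."""
    low, high = seconds_per_call
    return llm_calls * low, llm_calls * high


# Four agents, one call each: the minimum for one pass through the graph.
print(session_latency_seconds(4))  # (8.0, 20.0)

# Hypothetical heavier pass: the Explainer makes three tool-loop calls and
# the Quiz Generator makes two, for seven calls in total.
print(session_latency_seconds(1 + 3 + 2 + 1))  # (14.0, 35.0)
</code></pre>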
<h3 id="heading-14-why-this-system-uses-four-agents">1.4 Why This System Uses Four Agents</h3>
<p>The Learning Accelerator uses four agents. Here is the honest technical justification for each separation&nbsp;– again, not because multi-agent is better, but because these four tasks are different enough that combining any two would make the combined agent worse at both.</p>
<table>
<thead>
<tr>
<th>Agent</th>
<th>What it does</th>
<th>Why it's a separate agent</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Curriculum Planner</strong></td>
<td>Takes a learning goal, produces a structured study roadmap</td>
<td>One LLM call, <code>temperature=0.1</code>, <code>format="json"</code>. Zero tools. Fast, deterministic, fails fast on bad input. Mixing tool-calling behavior here would add noise to structured output.</td>
</tr>
<tr>
<td><strong>Explainer</strong></td>
<td>Reads source notes via MCP, explains topics to the student</td>
<td>Multi-turn tool-calling loop. <code>temperature=0.3</code>. Loop count is non-deterministic: the LLM decides when it has enough context. Completely different execution pattern from the Planner.</td>
</tr>
<tr>
<td><strong>Quiz Generator</strong></td>
<td>Generates questions (creative), then grades answers (analytical)</td>
<td>Two separate LLM calls with different temperatures. Interactive: pauses for user input. Also runs as a standalone A2A service (Chapter 8). Can't do this if bundled with another agent.</td>
</tr>
<tr>
<td><strong>Progress Coach</strong></td>
<td>Synthesizes results, updates topic status, routes to next topic or ends</td>
<td>Makes the only cross-agent A2A call (to the CrewAI Study Buddy). Reads and writes MCP memory. Manages the routing decision that determines whether the graph loops or ends.</td>
</tr>
</tbody></table>
<p>The Curriculum Planner and Explainer alone justify separation: one does structured JSON output with no tools, the other does a multi-turn tool-calling loop. Putting these in one agent means one function that sometimes calls tools in a loop and sometimes doesn't, at different temperatures, returning different types of output. That's not one agent with a broad capability. That's two agents pretending to be one.</p>
<p>The Quiz Generator's dual-temperature pattern (creative question generation at 0.4, analytical grading at 0.1) and its need to run as a standalone A2A service make the case for its own boundary.</p>
<p>The Progress Coach is the coordinator. It synthesizes everything and makes the routing decision, which is exactly the wrong job to share with any other agent.</p>
<p>This is the pattern worth looking for in your own problems: if you can't explain why two tasks should be the same agent, they probably shouldn't be.</p>
<p>The same reasoning applies in production systems. A compliance training platform has a curriculum agent (builds the certification path), a content delivery agent (presents regulatory material from a content MCP server), an assessment agent (tests comprehension, records results), and a certification agent (evaluates readiness, issues certificates).</p>
<p>Each has different tools, different failure modes, and different update cadences. The separation isn't architectural philosophy. It's the direct consequence of what each task needs.</p>
<h3 id="heading-15-setting-up-the-project">1.5 Setting Up the Project</h3>
<p>With the architectural reasoning established, let's build the system.</p>
<h4 id="heading-install-ollama-and-pull-your-model">Install Ollama and pull your model</h4>
<p>Ollama serves local LLMs over HTTP on <code>localhost:11434</code>, exposing both its native API and an OpenAI-compatible API under <code>/v1</code>.</p>
<p>macOS and Linux:</p>
<pre><code class="language-bash">curl -fsSL https://ollama.com/install.sh | sh
</code></pre>
<p>Windows: Download the installer from <a href="https://ollama.com">ollama.com</a> and run it.</p>
<p>Pull the model that matches your hardware:</p>
<pre><code class="language-bash"># 8 GB VRAM
ollama pull qwen2.5:7b

# 24 GB VRAM: stronger tool calling, recommended if you have it
ollama pull qwen2.5-coder:32b

# Verify it works
ollama run qwen2.5:7b "Say hello in one sentence."
</code></pre>
<p>You should see a short response. Keep Ollama running as a background server: it stays alive between calls.</p>
<h4 id="heading-clone-the-repository">Clone the repository</h4>
<pre><code class="language-bash">git clone https://github.com/sandeepmb/freecodecamp-multi-agent-ai-system
cd freecodecamp-multi-agent-ai-system
</code></pre>
<h4 id="heading-set-up-the-virtual-environment">Set up the virtual environment</h4>
<pre><code class="language-bash">python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt
</code></pre>
<p>The <code>requirements.txt</code> pins every dependency to a tested version:</p>
<pre><code class="language-plaintext"># requirements.txt
langgraph==1.1.0
langgraph-checkpoint-sqlite==3.0.3
langchain-core==1.0.0
langchain-ollama==1.0.0

mcp==1.26.0
a2a-sdk==0.3.25
crewai==1.13.0

langfuse==4.0.1
deepeval==3.9.1

litellm==1.82.4
openai==2.8.0
httpx==0.28.1
fastapi==0.115.0
uvicorn==0.34.0
streamlit==1.43.2

pydantic==2.11.9
python-dotenv==1.1.1
tenacity==8.5.0

pytest==8.3.0
pytest-asyncio==0.25.0
</code></pre>
<p>⚠️ <strong>Don't upgrade dependency versions.</strong> The agent frameworks in this stack, particularly LangGraph, langchain-core, and the A2A SDK, have breaking changes between minor versions. The pinned versions are tested together. Running <code>pip install --upgrade</code> on any of them risks breaking imports or behavior.</p>
<h4 id="heading-configure-your-environment">Configure your environment</h4>
<pre><code class="language-bash">cp .env.example .env
</code></pre>
<p>Open <code>.env</code> and set your model:</p>
<pre><code class="language-bash"># .env: set this to match what you pulled
OLLAMA_MODEL=qwen2.5:7b
OLLAMA_BASE_URL=http://localhost:11434

# Storage
CHECKPOINT_DB=data/checkpoints.db
NOTES_PATH=study_materials/sample_notes

# A2A services (used in Chapter 8)
QUIZ_SERVICE_URL=http://localhost:9001
STUDY_BUDDY_URL=http://localhost:9002
USE_A2A_QUIZ=true
USE_STUDY_BUDDY=true

# Langfuse: leave empty for now, configured in Chapter 6
LANGFUSE_PUBLIC_KEY=
LANGFUSE_SECRET_KEY=
LANGFUSE_HOST=http://localhost:3000
</code></pre>
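<p>Agent code can then read these values in one place rather than scattering <code>os.getenv</code> calls across modules. The following is a sketch under assumptions: <code>Settings</code>, <code>load_settings</code>, and <code>_flag</code> are hypothetical names, the repository may organize configuration differently, and the variables are assumed to be in the process environment already (for example, loaded from <code>.env</code> with python-dotenv).</p>
<pre><code class="language-python">import os
from dataclasses import dataclass


def _flag(name, default):
    """Read a boolean flag such as USE_A2A_QUIZ from the environment."""
    return os.getenv(name, default).strip().lower() in ("1", "true", "yes")


@dataclass(frozen=True)
class Settings:
    ollama_model: str
    ollama_base_url: str
    checkpoint_db: str
    notes_path: str
    use_a2a_quiz: bool
    use_study_buddy: bool


def load_settings():
    """Build Settings from environment variables, falling back to defaults."""
    return Settings(
        ollama_model=os.getenv("OLLAMA_MODEL", "qwen2.5:7b"),
        ollama_base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
        checkpoint_db=os.getenv("CHECKPOINT_DB", "data/checkpoints.db"),
        notes_path=os.getenv("NOTES_PATH", "study_materials/sample_notes"),
        use_a2a_quiz=_flag("USE_A2A_QUIZ", "true"),
        use_study_buddy=_flag("USE_STUDY_BUDDY", "true"),
    )
</code></pre>
<p>Parsing the flags explicitly avoids a common trap: the string <code>"false"</code> is truthy in Python, so a bare <code>bool(os.getenv(...))</code> would leave the feature switched on.</p>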
<h4 id="heading-verify-the-setup">Verify the setup</h4>
<pre><code class="language-bash">python main.py --help
</code></pre>
<p>You should see the argparse help output with no errors. If you see import errors, check that the virtual environment is activated.</p>
<p>📌 <strong>Checkpoint:</strong> You have Ollama running, dependencies installed, and the environment configured. The project structure looks like this:</p>
<pre><code class="language-plaintext">freecodecamp-multi-agent-ai-system/
├── src/
│   ├── agents/           # LangGraph agent nodes
│   ├── graph/            # State definition and workflow
│   ├── mcp_servers/      # MCP tool servers
│   ├── a2a_services/     # A2A protocol services and client
│   ├── crewai_agent/     # CrewAI agent served via A2A
│   └── observability/    # Langfuse setup
├── tests/                # Unit and evaluation tests
├── study_materials/
│   └── sample_notes/     # Markdown files the Explainer reads
├── docs/
├── data/                 # SQLite checkpoint DB (created at runtime)
├── main.py
├── Makefile
├── docker-compose.yml    # Langfuse local stack
├── requirements.txt
└── .env.example
</code></pre>
<p>The project follows the standard Python <code>src/</code> layout. The <code>pyproject.toml</code> adds <code>src/</code> to the Python path so tests can import <code>from graph.state import AgentState</code> without path gymnastics.</p>
<p>In the next chapter, you'll build the first piece of the system: the LangGraph graph that coordinates all four agents. You'll start with the shared state definition that every agent reads and writes.</p>
<h2 id="heading-chapter-2-stateful-orchestration-with-langgraph">Chapter 2: Stateful Orchestration with LangGraph</h2>
<p>LangGraph models a multi-agent workflow as a directed graph. Nodes are Python functions: your agent code. Edges define the routing between them. Every node reads from and writes to a shared state object. With a SQLite checkpointer attached, LangGraph saves that state after every node runs.</p>
<p>That last part is what makes it a production tool rather than a convenience wrapper. A naïve multi-agent loop written as a <code>for</code> loop loses everything the moment it crashes. LangGraph doesn't. The checkpoint survives the crash, and <code>graph.invoke()</code> with the same session ID picks up exactly where it left off.</p>
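<p>To see why checkpointing after every node enables recovery, here is a toy version of the idea using only the standard library. It is not LangGraph's checkpointer: the table schema, <code>run_with_checkpoints</code>, and the two node functions are hypothetical, and it assumes a linear list of nodes whose state is JSON-serializable.</p>
<pre><code class="language-python">import json
import sqlite3


def run_with_checkpoints(conn, session_id, nodes, state):
    """Run nodes in order, saving state after each one.

    On restart with the same session_id, completed nodes are skipped
    and execution resumes from the saved state.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(session_id TEXT PRIMARY KEY, step INTEGER, state TEXT)"
    )
    row = conn.execute(
        "SELECT step, state FROM checkpoints WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    step = 0
    if row is not None:
        step, state = row[0], json.loads(row[1])
    for index in range(step, len(nodes)):
        state = {**state, **nodes[index](state)}  # merge the partial update
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (session_id, index + 1, json.dumps(state)),
        )
        conn.commit()
    return state


def plan(state):
    return {"roadmap": ["graphs", "state"]}


def explain(state):
    return {"explained": state["roadmap"][0]}


conn = sqlite3.connect(":memory:")
final = run_with_checkpoints(conn, "session-1", [plan, explain], {"goal": "LangGraph"})
print(final)
</code></pre>
<p>If <code>explain</code> crashed on the first run, calling the function again with <code>"session-1"</code> would skip <code>plan</code> and retry only <code>explain</code>, because the checkpoint written after <code>plan</code> is already on disk.</p>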
<p>This chapter builds the graph foundation: the shared state definition that all four agents use, the first working agent node, and the graph that wires it together.</p>
<h3 id="heading-21-the-shared-state">2.1 The Shared State</h3>
<p>Every node in the graph receives the complete state as a <code>dict</code> and returns a partial update with only the keys it changed. LangGraph merges that update into the full state and saves a checkpoint before calling the next node.</p>
<p>The state definition in <code>src/graph/state.py</code> starts with three dataclasses that hold structured data, then defines the <code>AgentState</code> TypedDict that LangGraph manages:</p>
<pre><code class="language-python"># src/graph/state.py

from __future__ import annotations

import json
from dataclasses import dataclass, field, asdict
from typing import Annotated, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


@dataclass
class Topic:
    """A single topic within the study roadmap."""
    title: str
    description: str
    estimated_minutes: int
    prerequisites: list[str] = field(default_factory=list)
    # pending → in_progress → completed | needs_review
    status: str = "pending"

    def to_dict(self) -&gt; dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -&gt; "Topic":
        return cls(
            title=data["title"],
            description=data["description"],
            estimated_minutes=data["estimated_minutes"],
            prerequisites=data.get("prerequisites", []),
            status=data.get("status", "pending"),
        )


@dataclass
class StudyRoadmap:
    """The full study plan produced by the Curriculum Planner."""
    goal: str
    total_weeks: int
    topics: list[Topic]
    weekly_hours: int = 5

    def is_complete(self) -&gt; bool:
        return all(t.status in ("completed", "needs_review") for t in self.topics)


@dataclass
class QuizResult:
    """The complete result of one quiz session on a single topic."""
    topic: str
    questions: list
    score: float       # 0.0 to 1.0
    weak_areas: list[str]
    timestamp: str = ""

    def passed(self) -&gt; bool:
        return self.score &gt;= 0.5


class AgentState(TypedDict):
    """
    The shared state for the Learning Accelerator graph.

    Partial updates: when a node returns {"approved": True}, LangGraph
    merges that into the existing state. It does NOT replace the whole dict.
    Nodes only return the keys they changed.

    The one exception is `messages`: it uses the add_messages reducer,
    which appends to the list instead of replacing it.
    """
    messages: Annotated[list[BaseMessage], add_messages]
    session_id: str
    goal: str
    roadmap: StudyRoadmap | None
    approved: bool
    current_topic_index: int
    quiz_results: list[QuizResult]
    weak_areas: list[str]
    study_materials_path: str
    error: str | None
</code></pre>
<p>A few design decisions worth understanding here.</p>
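<p><strong>Why TypedDict and not a regular class?</strong> Nodes receive the state as a dict and return dict updates, so the schema should describe a dict. LangGraph accepts other schema types as well, such as Pydantic models, but TypedDict gives you static type checking (your IDE catches misspelled keys) while staying a plain dict at runtime. It's the right tool for this specific use case.</p>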
<p><strong>Why</strong> <code>add_messages</code> <strong>on the</strong> <code>messages</code> <strong>field?</strong> Every other field in <code>AgentState</code> uses last-write-wins semantics. If two nodes write to <code>roadmap</code>, the second one wins. But conversation messages should accumulate. The <code>add_messages</code> reducer tells LangGraph to append new messages rather than replace the list. This preserves the full conversation history across all agent calls.</p>
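<p>The difference between the two merge behaviors can be sketched in plain Python. This is an illustration of the semantics only, not LangGraph's implementation: <code>REDUCERS</code> and <code>merge_update</code> are hypothetical, messages are simplified to strings, and the real <code>add_messages</code> does more than append (for example, it replaces a message whose ID matches an existing one).</p>
<pre><code class="language-python">import operator

# Hypothetical reducer table: fields listed here accumulate;
# every other field is last-write-wins.
REDUCERS = {"messages": operator.add}


def merge_update(state, update):
    """Merge a node's partial update into the full state."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["hello"], "roadmap": None, "approved": False}
state = merge_update(state, {"messages": ["plan ready"], "roadmap": "v1"})
state = merge_update(state, {"roadmap": "v2"})
print(state)
</code></pre>
<p>After both updates, <code>messages</code> holds both entries, <code>roadmap</code> holds only the last value written, and <code>approved</code> is untouched because no update mentioned it.</p>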
<p><strong>Why dataclasses for</strong> <code>Topic</code><strong>,</strong> <code>StudyRoadmap</code><strong>, and</strong> <code>QuizResult</code><strong>?</strong> Because agents need to read and update structured data without accidentally typo-ing a key. A misspelled attribute such as <code>topic.titl</code> raises an <code>AttributeError</code> immediately. On a plain dict, <code>topic.get("titl")</code> silently returns <code>None</code>. For structured data that multiple agents touch, dataclasses are safer than plain dicts.</p>
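<p>A quick demonstration of how the same typo behaves on a plain dict and on a dataclass. The <code>Topic</code> here is a one-field stand-in for the class defined above:</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class Topic:
    title: str


topic_obj = Topic(title="Graph basics")
topic_dict = {"title": "Graph basics"}

# A misspelled key read with .get() fails silently.
print(topic_dict.get("titl"))

# Subscripting the dict does raise, but only if every caller avoids .get().
try:
    topic_dict["titl"]
except KeyError as exc:
    print(f"KeyError: {exc}")

# The same typo on a dataclass always fails loudly.
try:
    topic_obj.titl
except AttributeError as exc:
    print(f"AttributeError: {exc}")
</code></pre>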
<p>The <code>src/graph/state.py</code> file also contains three utility functions: one builds the initial state, and two let agent nodes read from state safely:</p>
<pre><code class="language-python"># src/graph/state.py (continued)

def initial_state(
    goal: str,
    session_id: str,
    study_materials_path: str = "study_materials/sample_notes",
) -&gt; dict:
    """Create the initial state for a new study session."""
    return {
        "messages": [],
        "session_id": session_id,
        "goal": goal,
        "roadmap": None,
        "approved": False,
        "current_topic_index": 0,
        "quiz_results": [],
        "weak_areas": [],
        "study_materials_path": study_materials_path,
        "error": None,
    }


def get_current_topic(state: dict) -&gt; Topic | None:
    """Get the topic currently being studied, or None if done."""
    roadmap = state.get("roadmap")
    if roadmap is None:
        return None
    idx = state.get("current_topic_index", 0)
    if idx &gt;= len(roadmap.topics):
        return None
    return roadmap.topics[idx]


def session_is_complete(state: dict) -&gt; bool:
    """True when all topics have been studied."""
    roadmap = state.get("roadmap")
    if roadmap is None:
        return True
    idx = state.get("current_topic_index", 0)
    return idx &gt;= len(roadmap.topics)
</code></pre>
<p>Always create a new session with <code>initial_state()</code>. Never build the dict by hand. The function ensures every field has a valid default and no required key is accidentally missing.</p>
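<p>To illustrate what "accidentally missing" looks like, the sketch below compares a state built by <code>initial_state()</code> with a hand-built dict. The function body is copied from the listing above. <code>REQUIRED_KEYS</code> and the comparison are hypothetical additions for illustration, not part of the project.</p>

```python
def initial_state(goal, session_id, study_materials_path="study_materials/sample_notes"):
    """Create the initial state for a new study session."""
    return {
        "messages": [],
        "session_id": session_id,
        "goal": goal,
        "roadmap": None,
        "approved": False,
        "current_topic_index": 0,
        "quiz_results": [],
        "weak_areas": [],
        "study_materials_path": study_materials_path,
        "error": None,
    }


# Hypothetical helper: the full set of keys every node can rely on.
REQUIRED_KEYS = set(initial_state("", ""))

fresh = initial_state("Learn Python closures", "a3f1b2c4")
hand_built = {"goal": "Learn Python closures", "session_id": "a3f1b2c4"}

missing_from_fresh = REQUIRED_KEYS - set(fresh)
missing_from_hand_built = sorted(REQUIRED_KEYS - set(hand_built))

print(missing_from_fresh)
print(missing_from_hand_built)
```

<p>The hand-built dict looks reasonable at a glance, yet it lacks eight keys that downstream nodes expect to find.</p>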
<h3 id="heading-22-the-curriculum-planner-the-first-agent-node">2.2 The Curriculum Planner: the First Agent Node</h3>
<p>The Curriculum Planner is the simplest agent in the system: one LLM call, one JSON response, one dataclass output. No tools, no loops. It demonstrates the pattern every agent follows: read from state, call LLM, parse output, return partial state update.</p>
<pre><code class="language-python"># src/agents/curriculum_planner.py

import json
import os

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

from graph.state import StudyRoadmap, Topic

MODEL_NAME = os.getenv("OLLAMA_MODEL", "qwen2.5:7b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

PLANNER_SYSTEM_PROMPT = """You are an expert curriculum designer. Your job is to
create a structured study roadmap when given a learning goal.

Return ONLY valid JSON with no prose, no markdown code fences, no explanation.
The JSON must match this exact schema:

{
  "goal": "the original learning goal exactly as given",
  "total_weeks": &lt;integer between 1 and 12&gt;,
  "weekly_hours": &lt;integer between 3 and 10&gt;,
  "topics": [
    {
      "title": "Short topic name (3-6 words)",
      "description": "One clear sentence explaining what this topic covers",
      "estimated_minutes": &lt;integer between 30 and 120&gt;,
      "prerequisites": ["title of earlier topic if required, else empty list"],
      "status": "pending"
    }
  ]
}

Rules:
- Order topics from foundational to advanced
- prerequisites must reference earlier topic titles exactly as written
- Aim for 4 to 6 topics
- status must always be "pending"
"""
</code></pre>
<p>Two things about the model setup in the next block. First, <code>temperature=0.1</code>: very low, because structured JSON output needs consistency. A higher temperature introduces variation that makes JSON parsing less reliable.</p>
<p>Second, <code>format="json"</code>. This is Ollama's JSON mode, a constraint at the inference level rather than a request in the prompt. The model is constrained to emit syntactically valid JSON regardless of what the prompt asks, which is stronger than just telling it to output JSON in the system prompt. JSON mode doesn't enforce your schema, though, so the parser still validates every required field.</p>
<pre><code class="language-python">def build_planner_llm() -&gt; ChatOllama:
    return ChatOllama(
        model=MODEL_NAME,
        base_url=OLLAMA_BASE_URL,
        temperature=0.1,
        format="json",
    )
</code></pre>
<p>The parser is separated from the node function intentionally. This makes it independently testable without an LLM call. All 11 unit tests in <code>tests/test_curriculum_planner.py</code> call <code>parse_roadmap_json()</code> directly:</p>
<pre><code class="language-python">def parse_roadmap_json(json_string: str) -&gt; StudyRoadmap:
    """Parse the LLM's JSON output into a StudyRoadmap dataclass."""
    try:
        data = json.loads(json_string)
    except json.JSONDecodeError as e:
        # Chain the original exception so the traceback keeps the parse location.
        raise ValueError(
            f"LLM returned invalid JSON.\n"
            f"Error: {e}\n"
            f"Raw output (first 300 chars): {json_string[:300]}"
        ) from e

    required = ["goal", "total_weeks", "topics"]
    for field in required:
        if field not in data:
            raise ValueError(f"LLM JSON missing required field: '{field}'")

    if not isinstance(data["topics"], list) or len(data["topics"]) == 0:
        raise ValueError("LLM JSON 'topics' must be a non-empty list")

    topics = []
    for i, t in enumerate(data["topics"]):
        for field in ["title", "description", "estimated_minutes"]:
            if field not in t:
                raise ValueError(f"Topic {i} missing required field: '{field}'")
        topics.append(Topic(
            title=t["title"],
            description=t["description"],
            estimated_minutes=int(t["estimated_minutes"]),
            prerequisites=t.get("prerequisites", []),
            status=t.get("status", "pending"),
        ))

    return StudyRoadmap(
        goal=data["goal"],
        total_weeks=int(data["total_weeks"]),
        weekly_hours=int(data.get("weekly_hours", 5)),
        topics=topics,
    )
</code></pre>
<p>The node function itself follows the same pattern that every agent in this system uses:</p>
<pre><code class="language-python">def curriculum_planner_node(state: dict) -&gt; dict:
    """
    LangGraph node: Curriculum Planner

    Reads:  state["goal"]
    Writes: state["roadmap"], state["messages"], state["error"]
    """
    goal = state.get("goal", "").strip()
    if not goal:
        return {"error": "No learning goal provided."}

    print(f"\n[Curriculum Planner] Building roadmap for: '{goal}'")

    llm = build_planner_llm()
    messages = [
        SystemMessage(content=PLANNER_SYSTEM_PROMPT),
        HumanMessage(content=f"Create a study roadmap for: {goal}"),
    ]

    print(f"[Curriculum Planner] Calling {MODEL_NAME}...")
    response = llm.invoke(messages)

    try:
        roadmap = parse_roadmap_json(response.content)
    except ValueError as e:
        print(f"[Curriculum Planner] Parse error: {e}")
        return {
            "error": str(e),
            "messages": messages + [response],
        }

    print(f"[Curriculum Planner] Created {len(roadmap.topics)} topics")

    # Return ONLY the keys this node changed
    return {
        "roadmap": roadmap,
        "messages": messages + [response],
        "error": None,
    }
</code></pre>
<p>Notice the return value: <code>{"roadmap": roadmap, "messages": ..., "error": None}</code>. Not the full state – only the three keys this node touched. LangGraph merges these into the existing state. Every other field stays unchanged.</p>
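<p>As a mental model of that merge, here is a simplified sketch in plain Python. <code>merge_update</code> and the <code>reducers</code> mapping are hypothetical names invented for this illustration. LangGraph's real merge logic and its <code>add_messages</code> reducer do more than this (message IDs, for example), so treat this as an approximation of the behavior described above and not as the library's implementation.</p>

```python
def merge_update(state, update, reducers):
    """Simplified model of merging a node's partial update into state."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            # Reducer-backed field: combine the old and new values.
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            # Plain field: last write wins.
            merged[key] = value
    return merged


state = {
    "goal": "Learn closures",
    "approved": False,
    "roadmap": None,
    "messages": ["earlier message"],
    "error": "stale error",
}

# What a node returns: only the keys it touched.
update = {"roadmap": "new roadmap", "messages": ["planner reply"], "error": None}

# Only "messages" accumulates; every other key is replaced.
reducers = {"messages": lambda old, new: old + new}

new_state = merge_update(state, update, reducers)
print(new_state)
```

<p>Keys the node didn't return (<code>goal</code>, <code>approved</code>) pass through untouched, while <code>messages</code> grows instead of being overwritten.</p>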
<h3 id="heading-23-the-graph-definition">2.3 The Graph Definition</h3>
<p>The graph is wiring, not logic. All business logic lives in the agent modules. <code>src/graph/workflow.py</code> only describes which nodes exist, how they connect, and what decisions the routing functions make:</p>
<pre><code class="language-python"># src/graph/workflow.py

import os
import sqlite3
from pathlib import Path

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import END, START, StateGraph

from agents.curriculum_planner import curriculum_planner_node
from agents.explainer import explainer_node
from agents.human_approval import human_approval_node
from agents.progress_coach import progress_coach_node
from agents.quiz_generator import quiz_generator_node
from graph.state import AgentState, session_is_complete


def route_after_approval(state: dict) -&gt; str:
    if state.get("approved", False):
        return "explainer"
    return "curriculum_planner"


def route_after_coach(state: dict) -&gt; str:
    if session_is_complete(state):
        return "end"
    return "explainer"


def build_graph(
    db_path: str = "data/checkpoints.db",
    interrupt_before: list | None = None,
):
    if db_path == "data/checkpoints.db":
        db_path = os.getenv("CHECKPOINT_DB", db_path)
    # Create the directory that will actually hold the database file.
    Path(db_path).parent.mkdir(parents=True, exist_ok=True)

    builder = StateGraph(AgentState)

    # Register all five nodes
    builder.add_node("curriculum_planner", curriculum_planner_node)
    builder.add_node("human_approval", human_approval_node)
    builder.add_node("explainer", explainer_node)
    builder.add_node("quiz_generator", quiz_generator_node)
    builder.add_node("progress_coach", progress_coach_node)

    # Static edges
    builder.add_edge(START, "curriculum_planner")
    builder.add_edge("curriculum_planner", "human_approval")
    builder.add_edge("explainer", "quiz_generator")
    builder.add_edge("quiz_generator", "progress_coach")

    # Conditional edges
    builder.add_conditional_edges(
        "human_approval",
        route_after_approval,
        {"explainer": "explainer", "curriculum_planner": "curriculum_planner"},
    )
    builder.add_conditional_edges(
        "progress_coach",
        route_after_coach,
        {"explainer": "explainer", "end": END},
    )

    # IMPORTANT: create the connection directly, not via context manager.
    # SqliteSaver.from_conn_string() returns a context manager. If you use
    # `with SqliteSaver.from_conn_string(...) as checkpointer:`, the connection
    # closes when the `with` block exits. The graph object lives longer than
    # build_graph(), so the connection must stay open for the process lifetime.
    conn = sqlite3.connect(db_path, check_same_thread=False)
    checkpointer = SqliteSaver(conn)

    return builder.compile(
        checkpointer=checkpointer,
        interrupt_before=interrupt_before or [],
    )


graph = build_graph()
</code></pre>
<h4 id="heading-the-sqlitesaver-connection-pattern">💡 The SqliteSaver connection pattern</h4>
<p>The <code>check_same_thread=False</code> flag is required. SQLite's default behavior prevents a connection created on one thread from being used on another.</p>
<p>LangGraph runs node functions and checkpoint writes on different threads internally. Without this flag, you'll get <code>ProgrammingError: SQLite objects created in a thread can only be used in that same thread</code> at runtime. The flag is safe here because LangGraph serializes checkpoint writes: there's no concurrent write contention.</p>
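<p>You can reproduce the underlying SQLite behavior with the standard library alone. The sketch below doesn't involve LangGraph. It uses an in-memory database and a hypothetical helper, <code>use_from_worker</code>, to show what happens when a connection crosses threads with and without the flag.</p>

```python
import sqlite3
import threading


def use_from_worker(conn):
    """Run a query on `conn` from a new thread; return the error name or None."""
    outcome = {}

    def work():
        try:
            conn.execute("SELECT 1").fetchone()
            outcome["error"] = None
        except sqlite3.ProgrammingError as exc:
            outcome["error"] = type(exc).__name__

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    return outcome["error"]


# Default connection: bound to the thread that created it.
strict_conn = sqlite3.connect(":memory:")
strict_result = use_from_worker(strict_conn)

# With the flag: the connection may be used from other threads.
shared_conn = sqlite3.connect(":memory:", check_same_thread=False)
shared_result = use_from_worker(shared_conn)

print(strict_result, shared_result)

strict_conn.close()
shared_conn.close()
```

<p>Note that the flag only disables the same-thread check. It doesn't add any locking of its own, which is why it matters that checkpoint writes are serialized.</p>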
<p>The routing functions are pure Python. No LLM calls. They read from state and return a string. That string determines which node runs next. Keep control flow logic in Python, not in LLMs. An LLM routing decision introduces non-determinism into your graph's control flow, which makes it very hard to reason about and test.</p>
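<p>One practical payoff is that routing can be tested with plain dicts, with no model and no graph. The sketch below restates the two routing functions and <code>session_is_complete</code> in compact form so it runs on its own. The <code>SimpleNamespace</code> roadmap is a stand-in for the real <code>StudyRoadmap</code> dataclass.</p>

```python
from types import SimpleNamespace


def session_is_complete(state: dict) -> bool:
    roadmap = state.get("roadmap")
    if roadmap is None:
        return True
    return state.get("current_topic_index", 0) >= len(roadmap.topics)


def route_after_approval(state: dict) -> str:
    return "explainer" if state.get("approved", False) else "curriculum_planner"


def route_after_coach(state: dict) -> str:
    return "end" if session_is_complete(state) else "explainer"


# Stand-in roadmap with two topics; the real one is a StudyRoadmap dataclass.
roadmap = SimpleNamespace(topics=["Scope and the LEGB Rule", "Closures Explained"])

routes = {
    "approved": route_after_approval({"approved": True}),
    "rejected": route_after_approval({"approved": False}),
    "missing_flag": route_after_approval({}),
    "mid_session": route_after_coach({"roadmap": roadmap, "current_topic_index": 1}),
    "finished": route_after_coach({"roadmap": roadmap, "current_topic_index": 2}),
}
print(routes)
```

<p>The same inputs always produce the same route, so each branch of the graph can be covered by an ordinary assertion.</p>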
<p>The <code>interrupt_before</code> parameter defaults to an empty list. The terminal interface uses <code>interrupt()</code> <em>inside</em> <code>human_approval_node</code> to pause for roadmap approval, which you'll see in Chapter 5, so no compile-time interrupt is needed.</p>
<p>The Streamlit UI (Chapter 9) passes <code>interrupt_before=["quiz_generator"]</code> to stop the graph before the quiz node runs, so <code>input()</code> is never called inside the graph thread. The same graph builder supports both modes.</p>
<p>Here is what the complete graph looks like:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6983b18befedc65b9820e223/96774b41-787f-420b-ac36-a6883c79bb3c.png" alt="Flowchart of the LangGraph workflow showing the order of execution: START flows into curriculum_planner, then human_approval which contains an interrupt that pauses for user input, then a route_after_approval decision diamond that branches on dashed conditional edges (approved=true continues to explainer, approved=false loops back to curriculum_planner as the rejection loop); explainer flows into quiz_generator, then progress_coach, then a route_after_coach decision diamond that branches on dashed conditional edges (more topics loops back to explainer as the study loop, all done flows to END); solid arrows mark static edges and dashed arrows mark conditional edges." style="display:block;margin:0 auto" width="1668" height="681" loading="lazy">

<p><em>Figure 2. The complete LangGraph graph. Static edges are solid. Conditional edges are dashed. The routing function determines which path executes at runtime.</em></p>
<h3 id="heading-24-run-it-and-verify">2.4 Run it and Verify</h3>
<p>With the Curriculum Planner node and graph in place, you can run the first end-to-end test:</p>
<pre><code class="language-bash">python main.py "Learn Python closures and decorators from scratch"
</code></pre>
<p>You should see:</p>
<pre><code class="language-plaintext">============================================================
Learning Accelerator
Session ID: a3f1b2c4
Goal: Learn Python closures and decorators from scratch
============================================================

[Curriculum Planner] Building roadmap for: 'Learn Python closures...'
[Curriculum Planner] Calling qwen2.5:7b...
[Curriculum Planner] Created 5 topics

Proposed Study Plan
============================================================
Goal: Learn Python closures and decorators from scratch
Duration: 2 weeks @ 5 hrs/week

  1. Python Functions Review (45 min)
     Review function definition, arguments, return values, and scope basics
  2. Scope and the LEGB Rule (60 min)
     Understand how Python resolves variable names across nested scopes
  3. Closures Explained (75 min) (needs: Scope and the LEGB Rule)
     ...
</code></pre>
<p>The graph pauses here. The <code>interrupt()</code> call inside <code>human_approval_node</code> causes it to stop, save a checkpoint, and return control to the caller. Your terminal is waiting. Type <code>yes</code> to continue or <code>no</code> to regenerate.</p>
<p>📌 <strong>Checkpoint:</strong> You have a working graph with state persistence. The session ID printed at the top is stored in <code>data/checkpoints.db</code>. If you kill the process now and run <code>python main.py --resume a3f1b2c4</code>, it will pick up exactly at the approval prompt. Checkpointing is already working.</p>
<p>Now run the unit tests to verify the parsing logic:</p>
<pre><code class="language-bash">pytest tests/test_state.py tests/test_curriculum_planner.py -v
</code></pre>
<p>Expected: 35 tests, all passing, no Ollama required. These tests exercise <code>parse_roadmap_json()</code>, the state dataclasses, and the utility functions: everything except the actual LLM call.</p>
<p>The enterprise pattern here: a sales enablement system follows the same graph structure. A curriculum planner generates an onboarding path for a new sales rep, a manager approves it before training begins, then the study loop runs through product knowledge topics. The graph checkpoints after every topic. If a rep comes back after lunch, the system resumes exactly where they left off.</p>
<p>In the next chapter, you'll add the Model Context Protocol so your agents have standardized tool access, then build the Explainer: the first agent that calls tools in a loop and iterates until it has enough context to write a grounded explanation.</p>
<h2 id="heading-chapter-3-standardized-tool-access-with-mcp">Chapter 3: Standardized Tool Access with MCP</h2>
<p>The Explainer agent needs to read your study notes before it can explain anything. The Progress Coach needs to store and retrieve session data. Both could call Python functions directly, but that would couple every agent to the filesystem layout, the storage schema, and however you implemented those functions.</p>
<p>The Model Context Protocol solves this with a clean separation: agents describe <em>what</em> they need, tool servers handle <em>how</em> it's done. Change the storage backend, and no agent code changes. Build the same tool server once, and any MCP-compatible agent (LangGraph, CrewAI, Claude Desktop, or anything else) can use it.</p>
<h3 id="heading-31-mcps-three-primitives">3.1 MCP's Three Primitives</h3>
<p>MCP has three types of capabilities a server can expose:</p>
<ol>
<li><p><strong>Tools</strong> are executable functions the agent calls with arguments. <code>read_study_file(filename)</code> is a Tool. The agent controls when it's called and with what arguments. The server handles the implementation.</p>
</li>
<li><p><strong>Resources</strong> are structured data the agent reads, identified by a URI. <code>notes://index</code> is a Resource. Think of these as read-only HTTP GET endpoints. The server controls what data is available; the agent reads it on demand.</p>
</li>
<li><p><strong>Prompts</strong> are reusable prompt templates the server owns and the agent requests by name. This system doesn't use Prompts heavily, but they exist for cases where a tool server wants to own the prompt design for its domain.</p>
</li>
</ol>
<p>The key distinction: Tools are about actions, Resources are about data. If the agent needs to <em>do</em> something, it's a Tool. If the agent needs to <em>read</em> something structured, it's a Resource.</p>
<h4 id="heading-mcp-as-a-stable-contract">💡 MCP as a stable contract</h4>
<p>Think of MCP as the stable contract between agents and tools. The Explainer agent knows the tool is called <code>read_study_file</code> and takes a <code>filename</code> argument. Whether the implementation reads from disk, fetches from an S3 bucket, or queries a database is invisible to the agent.</p>
<p>That's the value. You can swap the implementation without touching any agent code.</p>
<h3 id="heading-32-build-the-filesystem-mcp-server">3.2 Build the Filesystem MCP Server</h3>
<p>The filesystem server gives agents access to your study notes. It exposes three tools and one resource.</p>
<pre><code class="language-python"># src/mcp_servers/filesystem_server.py

import os
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Filesystem Server")

# Path configured via environment variable
NOTES_BASE = Path(os.getenv("NOTES_PATH", "study_materials/sample_notes"))


@mcp.tool()
def list_study_files() -&gt; list[str]:
    """
    List all available study note files.

    Returns a list of filenames relative to the notes directory.
    Example: ['closures.md', 'decorators.md', 'python_basics.md']

    Always call this first to discover what materials are available
    before attempting to read specific files.
    """
    if not NOTES_BASE.exists():
        return []
    return sorted([
        str(f.relative_to(NOTES_BASE))
        for f in NOTES_BASE.rglob("*.md")
    ])


@mcp.tool()
def read_study_file(filename: str) -&gt; str:
    """
    Read the full content of a study note file.

    Args:
        filename: The filename to read, exactly as returned by
                  list_study_files(). Example: 'closures.md'

    Returns the full text content, or an error string if not found.
    Never raises. Errors are returned as strings so the agent
    can handle them gracefully.
    """
    file_path = NOTES_BASE / filename

    # Security: path traversal prevention.
    # Without this, an agent could call read_study_file("../../.env")
    # and expose your API keys. We resolve both paths and verify
    # the requested file is inside the notes directory.
    try:
        resolved = file_path.resolve()
        resolved.relative_to(NOTES_BASE.resolve())
    except ValueError:
        return (
            f"Error: path traversal attempt blocked for '{filename}'. "
            f"Only files within the notes directory are accessible."
        )

    if not file_path.exists():
        available = list_study_files()
        return f"Error: '{filename}' not found. Available: {available}"

    if file_path.suffix != ".md":
        return f"Error: only .md files are accessible, got '{file_path.suffix}'"

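    try:
        return file_path.read_text(encoding="utf-8")
    except (UnicodeDecodeError, PermissionError, OSError) as e:
        # UnicodeDecodeError is not an OSError, so catch it explicitly
        # to keep the "never raises" promise for non-UTF-8 files.
        return f"Error reading '{filename}': {e}"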


@mcp.tool()
def search_notes(query: str) -&gt; list[dict]:
    """
    Search across all study notes for a keyword or phrase.

    Args:
        query: The search term. Case-insensitive substring match.

    Returns a list of matches, each with keys: 'file', 'line_number', 'line'.
    Maximum 20 results to avoid overwhelming the context window.
    """
    if not NOTES_BASE.exists():
        return []

    results = []
    query_lower = query.lower()

    for file_path in sorted(NOTES_BASE.rglob("*.md")):
        rel_path = str(file_path.relative_to(NOTES_BASE))
        try:
            lines = file_path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, PermissionError, OSError):
            continue

        for line_num, line in enumerate(lines, 1):
            if query_lower in line.lower():
                results.append({
                    "file": rel_path,
                    "line_number": line_num,
                    "line": line.strip(),
                })
                if len(results) &gt;= 20:
                    return results

    return results


@mcp.resource("notes://index")
def get_notes_index() -&gt; str:
    """
    Resource: index of all available study materials with file sizes.
    URI: notes://index
    """
    files = list_study_files()
    if not files:
        return "# Study Materials Index\n\nNo study materials found."

    lines = ["# Study Materials Index\n"]
    for filename in files:
        file_path = NOTES_BASE / filename
        try:
            size_kb = file_path.stat().st_size / 1024
            lines.append(f"- **{filename}** ({size_kb:.1f} KB)")
        except OSError:
            lines.append(f"- **{filename}** (size unknown)")
    lines.append(f"\nTotal: {len(files)} file(s)")
    return "\n".join(lines)


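if __name__ == "__main__":
    import sys

    # Log to stderr: with the default stdio transport, stdout carries
    # protocol messages and must not contain anything else.
    print("[Filesystem MCP] Starting server", file=sys.stderr)
    print(f"[Filesystem MCP] Serving files from: {NOTES_BASE.resolve()}", file=sys.stderr)
    mcp.run()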
</code></pre>
<p><code>@mcp.tool()</code> and <code>@mcp.resource()</code> are the entire integration surface. FastMCP reads the function name (which becomes the tool name), the docstring (which becomes the description the LLM reads to decide whether to use the tool), and the type annotations (which become the argument schema). That's the full contract between the server and any client that connects to it.</p>
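<p>To get a feel for how much information a plain function already carries, here is a rough sketch that extracts the same three ingredients with the standard library. <code>describe_tool</code> is a hypothetical helper written for this illustration. FastMCP's actual implementation isn't shown in this book and produces a richer schema, so this only demonstrates the idea.</p>

```python
import inspect
from typing import get_type_hints


def describe_tool(fn):
    """Build a tool description from a function's name, docstring, and annotations."""
    hints = get_type_hints(fn)
    params = {
        name: hints[name].__name__ if name in hints else "any"
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }


def read_study_file(filename: str) -> str:
    """Read the full content of a study note file."""
    return ""


spec = describe_tool(read_study_file)
print(spec)
```

<p>Everything a client needs to call the tool is derived from the function definition itself, which is why renaming a function or changing an annotation changes the contract.</p>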
<p>The docstrings deserve attention. The LLM calling these tools reads the docstring to decide when to use the tool and with what arguments. A vague docstring (something like "reads a file") leads to incorrect tool selection. The docstrings in this server tell the agent exactly when to call each tool and what format the arguments should be in.</p>
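<p>The path traversal check in <code>read_study_file</code> is also worth exercising in isolation. The sketch below pulls the resolve-and-compare step into a hypothetical helper, <code>is_inside</code>, and runs it against a temporary directory. The directory and file names are made up for the example.</p>

```python
import tempfile
from pathlib import Path


def is_inside(base: Path, filename: str) -> bool:
    """True if base/filename resolves to a location inside base."""
    try:
        (base / filename).resolve().relative_to(base.resolve())
    except ValueError:
        # relative_to() raises ValueError when the path is outside base.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes"
    notes.mkdir()
    (notes / "closures.md").write_text("# Closures", encoding="utf-8")

    allowed = is_inside(notes, "closures.md")
    blocked_relative = is_inside(notes, "../../.env")
    blocked_absolute = is_inside(notes, "/etc/passwd")

print(allowed, blocked_relative, blocked_absolute)
```

<p>Resolving both paths before comparing is the important part. Comparing the raw strings would miss <code>..</code> segments and absolute paths, both of which escape the notes directory.</p>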
<h3 id="heading-33-build-the-memory-mcp-server">3.3 Build the Memory MCP Server</h3>
<p>The memory server gives agents a session-scoped key-value store. The Explainer writes which topics it has explained. The Progress Coach reads that history before deciding what to do next.</p>
<pre><code class="language-python"># src/mcp_servers/memory_server.py

from datetime import datetime, timezone
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Memory Server")

# In-process store: {session_id: {key: {"value": str, "updated_at": str}}}
# For production: replace with Redis or PostgreSQL.
# The MCP interface stays identical. Only this dict changes.
_store: dict[str, dict] = {}


def _now_iso() -&gt; str:
    return datetime.now(timezone.utc).isoformat()


@mcp.tool()
def memory_set(session_id: str, key: str, value: str) -&gt; str:
    """
    Store a value in session memory.

    Values are always strings. Use JSON for complex data:
    memory_set(session_id, 'quiz_scores', json.dumps([0.8, 0.6]))

    Args:
        session_id: Scopes this data to one study session.
        key: Descriptive name. Examples: 'explained_topics', 'last_quiz_score'
        value: String value. Use JSON for lists or dicts.
    """
    if session_id not in _store:
        _store[session_id] = {}
    _store[session_id][key] = {"value": value, "updated_at": _now_iso()}
    return f"Stored '{key}' for session '{session_id}'"


@mcp.tool()
def memory_get(session_id: str, key: str) -&gt; str:
    """
    Retrieve a value from session memory.

    Returns the stored value, or the string "null" if the key doesn't exist.
    Returns "null" (not Python None) so the LLM can handle the missing case
    without type errors.
    """
    session = _store.get(session_id, {})
    entry = session.get(key)
    return "null" if entry is None else entry["value"]


@mcp.tool()
def memory_list_keys(session_id: str) -&gt; list[str]:
    """List all keys stored for a session. Returns [] if none exist."""
    return list(_store.get(session_id, {}).keys())


@mcp.tool()
def memory_delete(session_id: str, key: str) -&gt; str:
    """Delete a specific key from session memory."""
    session = _store.get(session_id, {})
    if key in session:
        del session[key]
        return f"Deleted '{key}' from session '{session_id}'"
    return f"Key '{key}' not found in session '{session_id}'"


@mcp.resource("notes://session/{session_id}")
def get_session_summary(session_id: str) -&gt; str:
    """Full summary of everything stored for a session. URI: notes://session/{session_id}"""
    session = _store.get(session_id, {})
    if not session:
        return f"# Session Memory: {session_id}\n\nNo data stored yet."
    lines = [f"# Session Memory: {session_id}\n"]
    for key, entry in sorted(session.items()):
        lines.append(f"## {key}")
        lines.append(f"- Value: {entry['value']}\n")
    return "\n".join(lines)


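if __name__ == "__main__":
    import sys

    # Log to stderr: with the default stdio transport, stdout carries
    # protocol messages and must not contain anything else.
    print("[Memory MCP] Starting server", file=sys.stderr)
    mcp.run()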
</code></pre>
<p>The <code>_store</code> dict is intentionally simple. The entire memory server could be replaced with a Redis backend and no agent code would change. Only the bodies of the <code>memory_*</code> tool functions would. That's the value of the protocol boundary.</p>
<p>The choice to return the string <code>"null"</code> rather than Python <code>None</code> from <code>memory_get</code> is deliberate. When a <code>ToolMessage</code> contains <code>None</code>, some model versions handle it poorly. Returning <code>"null"</code> gives the LLM a string it can reason about ("the key doesn't exist yet") without type-handling edge cases.</p>
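<p>A side benefit of that choice is that <code>"null"</code> is itself valid JSON, so Python callers that store JSON values can decode hits and misses the same way. The sketch below uses simplified stand-ins for <code>memory_set</code> and <code>memory_get</code> (no timestamps, no MCP decorator) to show the round trip.</p>

```python
import json

_store = {}  # stand-in for the memory server's in-process store


def memory_set(session_id, key, value):
    _store.setdefault(session_id, {})[key] = value


def memory_get(session_id, key):
    value = _store.get(session_id, {}).get(key)
    return "null" if value is None else value


memory_set("a3f1b2c4", "quiz_scores", json.dumps([0.8, 0.6]))

# A stored JSON value decodes back to the original list.
scores = json.loads(memory_get("a3f1b2c4", "quiz_scores"))

# A missing key comes back as the string "null", which decodes to None.
missing_raw = memory_get("a3f1b2c4", "explained_topics")
missing = json.loads(missing_raw)

print(scores, repr(missing_raw), missing)
```

<p>This only holds for values that were stored as JSON. A plain comma-separated string is not valid JSON and would need to be read as text instead.</p>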
<h3 id="heading-34-how-agents-use-mcp-tools-the-tool-calling-loop">3.4 How Agents Use MCP Tools: the Tool-calling Loop</h3>
<p>The Explainer agent is where everything from Chapter 2 (state) and Chapter 3 (MCP) comes together. It's also the first agent in the system that makes multiple LLM calls: one per round of tool calls, iterating until the LLM decides it has enough information to write an explanation.</p>
<p>In <code>src/agents/explainer.py</code>, the MCP server functions are imported directly as Python functions and wrapped with LangChain's <code>@tool</code> decorator:</p>
<pre><code class="language-python"># src/agents/explainer.py (setup section)

import json
import os

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
from langchain_core.tools import tool
from langchain_ollama import ChatOllama

from graph.state import get_current_topic
from mcp_servers.filesystem_server import list_study_files, read_study_file, search_notes
from mcp_servers.memory_server import memory_get, memory_set

MODEL_NAME = os.getenv("OLLAMA_MODEL", "qwen2.5:7b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")


@tool
def tool_list_files() -&gt; list[str]:
    """
    List all available study note files in the notes directory.
    Returns filenames like ['closures.md', 'decorators.md'].
    Call this FIRST to discover what materials exist before reading any file.
    """
    return list_study_files()


@tool
def tool_read_file(filename: str) -&gt; str:
    """
    Read the complete content of a study note file.
    Args:
        filename: Exact filename as returned by tool_list_files().
    Returns the full file text, or an error string if not found.
    """
    return read_study_file(filename)


@tool
def tool_search_notes(query: str) -&gt; str:
    """
    Search across all study notes for a keyword or phrase.
    Args:
        query: Search term (case-insensitive). Example: 'nonlocal', 'closure'
    Returns a JSON string with matching lines and their file locations.
    """
    results = search_notes(query)
    if not results:
        return "No matches found."
    return json.dumps(results, indent=2)


@tool
def tool_memory_get(session_id: str, key: str) -&gt; str:
    """
    Retrieve a value from session memory.
    Args:
        session_id: The current session ID (from state).
        key: The memory key to look up.
    Returns the stored value, or 'null' if not found.
    """
    return memory_get(session_id, key)


@tool
def tool_memory_set(session_id: str, key: str, value: str) -&gt; str:
    """
    Store a value in session memory for later agents to read.
    Args:
        session_id: The current session ID (from state).
        key: Descriptive key name.
        value: String value. Use JSON for complex data.
    """
    return memory_set(session_id, key, value)


EXPLAINER_TOOLS = [
    tool_list_files, tool_read_file, tool_search_notes,
    tool_memory_get, tool_memory_set,
]
TOOL_MAP = {t.name: t for t in EXPLAINER_TOOLS}
</code></pre>
<h4 id="heading-direct-import-vs-subprocess-transport">⚠️ Direct import vs. subprocess transport</h4>
<p>In this tutorial, MCP tools are imported as Python functions and wrapped with <code>@tool</code>. This runs everything in one process. It's simpler for development, has zero subprocess overhead, and is easy to test.</p>
<p>In production, MCP servers run as separate processes communicating over stdio or HTTP. You'd use <code>MultiServerMCPClient</code> from <code>langchain-mcp-adapters</code> to connect. The agent code is nearly identical in both modes – only the tool wrapping changes.</p>
<p>The Explainer's system prompt tells the LLM not just what tools are available, but <em>how to use them in sequence</em>:</p>
<pre><code class="language-python">EXPLAINER_SYSTEM_PROMPT = """You are an expert tutor explaining topics to a student.

Your explanations must be grounded in the student's actual study materials.
Use the available tools to find and read relevant notes before explaining.

APPROACH (follow this sequence):
1. Call tool_list_files() to see what materials are available
2. Call tool_search_notes(topic) to find which files cover this topic
3. Call tool_read_file(filename) to read the most relevant file(s)
4. Check prior context: call tool_memory_get(session_id, 'explained_topics')
5. Write your explanation based on what you found in the notes

EXPLANATION FORMAT:
- Start with a real-world analogy (1-2 sentences)
- State the core concept clearly (2-3 sentences)
- Show a concrete code example from the student's notes
- End with one common mistake or gotcha to watch out for

After writing the explanation, store what you explained:
  tool_memory_set(session_id, 'explained_topics', &lt;comma-separated topic titles&gt;)
"""
</code></pre>
<p>The tool-calling loop in <code>explainer_node</code> is the core mechanism worth understanding carefully:</p>
<pre><code class="language-python"># src/agents/explainer.py (node function)

def execute_tool_call(tool_call: dict) -&gt; str:
    """Execute a tool call and return the result as a string. Never raises."""
    name = tool_call["name"]
    args = tool_call["args"]
    if name not in TOOL_MAP:
        return f"Error: unknown tool '{name}'. Available: {list(TOOL_MAP.keys())}"
    try:
        result = TOOL_MAP[name].invoke(args)
        if isinstance(result, (list, dict)):
            return json.dumps(result)
        return str(result)
    except Exception as e:
        return f"Error executing {name}({args}): {type(e).__name__}: {e}"


def explainer_node(state: dict) -&gt; dict:
    """
    LangGraph node: Explainer Agent

    Reads:  state["roadmap"], state["current_topic_index"], state["session_id"]
    Writes: state["messages"], state["error"]
    """
    topic = get_current_topic(state)
    if topic is None:
        return {"error": "No current topic found."}

    session_id = state.get("session_id", "unknown")
    print(f"\n[Explainer] Topic: '{topic.title}'")

    llm = ChatOllama(
        model=MODEL_NAME,
        base_url=OLLAMA_BASE_URL,
        temperature=0.3,
    ).bind_tools(EXPLAINER_TOOLS)

    messages = [
        SystemMessage(content=EXPLAINER_SYSTEM_PROMPT),
        HumanMessage(content=(
            f"Please explain this topic to me: '{topic.title}'\n"
            f"Context: {topic.description}\n"
            f"Session ID for memory calls: {session_id}"
        )),
    ]

    max_iterations = 8
    final_response = None

    for iteration in range(max_iterations):
        print(f"[Explainer] LLM call {iteration + 1}/{max_iterations}...")
        response = llm.invoke(messages)
        messages.append(response)

        if not response.tool_calls:
            final_response = response
            print(f"[Explainer] Complete after {iteration + 1} LLM call(s)")
            break

        print(f"[Explainer] {len(response.tool_calls)} tool call(s) requested:")
        for tool_call in response.tool_calls:
            print(f"  → {tool_call['name']}({tool_call['args']})")
            result = execute_tool_call(tool_call)
            log_result = result[:100] + "..." if len(result) &gt; 100 else result
            print(f"    ← {log_result}")

            # The tool_call_id must match the ID the LLM assigned to the request.
            # Without this, the LLM can't correlate result to request.
            messages.append(ToolMessage(
                content=result,
                tool_call_id=tool_call["id"],
            ))

    if final_response is None:
        return {
            "messages": messages,
            "error": f"Explainer reached max iterations ({max_iterations}).",
        }

    print(f"[Explainer] Explanation: {len(final_response.content)} characters")
    return {"messages": messages, "error": None}
</code></pre>
<p>Let's walk through what happens during one execution:</p>
<p><strong>LLM call 1:</strong> The LLM receives the system prompt and the human message asking for an explanation of "Closures Explained". It responds with tool calls: <code>tool_list_files()</code> and <code>tool_search_notes("closure")</code>. No text explanation yet.</p>
<p><strong>Tool execution:</strong> <code>tool_list_files()</code> returns <code>["closures.md", "decorators.md", "python_basics.md"]</code>. <code>tool_search_notes("closure")</code> returns matching lines from <code>closures.md</code>. Both results are appended to the message list as <code>ToolMessage</code> objects with the matching <code>tool_call_id</code>.</p>
<p><strong>LLM call 2:</strong> The LLM now has the file list and search results. It requests <code>tool_read_file("closures.md")</code>.</p>
<p><strong>Tool execution:</strong> The full content of <code>closures.md</code> is returned as a <code>ToolMessage</code>.</p>
<p><strong>LLM call 3:</strong> The LLM has read the notes. It calls <code>tool_memory_set(session_id, "explained_topics", "Closures Explained")</code> to record that this topic was covered.</p>
<p><strong>LLM call 4:</strong> With context stored, the LLM produces the final explanation. No more tool calls in the response. The loop exits. The explanation is grounded in what's actually in your notes, not in the model's training data.</p>
<p>The <code>tool_call_id=tool_call["id"]</code> argument deserves attention. When the LLM requests a tool call, it assigns the request an ID. The <code>ToolMessage</code> must carry that same ID so the LLM can correlate each result with its request. Without it, the conversation is malformed: the backend may reject the call, or the model may produce unreliable output because it can't tell which result answers which request.</p>
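<p>To make the pairing rule concrete, here is a minimal sketch that checks whether every requested tool call has a matching result. It uses plain dicts as hypothetical stand-ins for the LangChain message objects, and <code>unanswered_tool_calls</code> is an illustrative helper, not part of the project:</p>

```python
# Hypothetical stand-ins: plain dicts instead of LangChain message objects.

def unanswered_tool_calls(messages):
    """Return IDs of requested tool calls that have no matching tool result."""
    requested = []
    answered = set()
    for msg in messages:
        if msg["role"] == "assistant":
            requested.extend(call["id"] for call in msg.get("tool_calls", []))
        elif msg["role"] == "tool":
            answered.add(msg["tool_call_id"])
    return [call_id for call_id in requested if call_id not in answered]


conversation = [
    {"role": "assistant", "tool_calls": [
        {"id": "call_1", "name": "tool_list_files", "args": {}},
        {"id": "call_2", "name": "tool_search_notes", "args": {"query": "closure"}},
    ]},
    # Only the first request has been answered so far.
    {"role": "tool", "tool_call_id": "call_1", "content": '["closures.md"]'},
]

print(unanswered_tool_calls(conversation))
```

<p>This prints <code>['call_2']</code>: the second request has no result yet, so the conversation is not ready to go back to the model.</p>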
<p>The <code>max_iterations = 8</code> limit is a production circuit breaker. A confused model that calls tools indefinitely would otherwise run until you kill it. The runs shown in this chapter finish in four or five LLM calls, so eight leaves headroom for a legitimate explanation task. If a model reaches the limit, the node returns an error in state, and you can adjust the system prompt or switch to a larger model.</p>
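<p>The circuit breaker can be seen in isolation with a stubbed model. This is a sketch under assumptions: <code>run_bounded_loop</code>, <code>well_behaved</code>, and <code>confused</code> are hypothetical names, and the stub returns a simple tuple rather than a real LLM response:</p>

```python
def run_bounded_loop(model_step, max_iterations=8):
    """Call model_step until it returns a final answer or the budget runs out.

    model_step is a hypothetical callable returning either
    ("tool", payload) to keep going or ("final", text) to stop.
    """
    for iteration in range(max_iterations):
        kind, payload = model_step(iteration)
        if kind == "final":
            return {"answer": payload, "calls": iteration + 1, "error": None}
    # The loop ended without a final answer: report it instead of looping on.
    return {
        "answer": None,
        "calls": max_iterations,
        "error": f"reached max iterations ({max_iterations})",
    }


def well_behaved(iteration):
    # Requests tools twice, then answers on the third call.
    return ("final", "done") if iteration == 2 else ("tool", "tool_list_files")


def confused(iteration):
    # Never stops requesting tools.
    return ("tool", "tool_list_files")


print(run_bounded_loop(well_behaved))
print(run_bounded_loop(confused))
```

<p>The well-behaved stub finishes after three calls with no error. The confused one is cut off after eight and surfaces an error the caller can act on.</p>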
<h3 id="heading-35-run-the-explainer">3.5 Run the Explainer</h3>
<p>Approve the roadmap when prompted, then watch the tool-calling loop in action:</p>
<pre><code class="language-bash">python main.py
</code></pre>
<p>After approval:</p>
<pre><code class="language-plaintext">[Explainer] Topic: 'Python Functions Review'
[Explainer] LLM call 1/8...
  → tool_list_files({})
    ← ["closures.md", "decorators.md", "python_basics.md"]
[Explainer] LLM call 2/8...
  → tool_search_notes({'query': 'functions'})
    ← [{"file": "python_basics.md", "line_number": 12, "line": "## Functions"}]
[Explainer] LLM call 3/8...
  → tool_read_file({'filename': 'python_basics.md'})
    ← # Python Basics\n\n## Variables and Types...
[Explainer] LLM call 4/8...
  → tool_memory_set({'session_id': 'a3f1b2c4', 'key': 'explained_topics', ...})
    ← Stored 'explained_topics' for session 'a3f1b2c4'
[Explainer] LLM call 5/8...
[Explainer] Complete after 5 LLM call(s)
[Explainer] Explanation: 487 characters
</code></pre>
<p>Every arrow (<code>→</code>) is a tool call the LLM requested. Every back-arrow (<code>←</code>) is the result returned to the LLM. The loop terminates at LLM call 5 because that response contains the final explanation and no further tool requests.</p>
<p>📌 <strong>Checkpoint:</strong> Run the MCP server tests to verify the tools work independently of the LLM:</p>
<pre><code class="language-bash">pytest tests/test_mcp_servers.py -v
</code></pre>
<p>Expected: 36 tests, all passing, no Ollama required. These tests call the tool functions directly as Python functions. No subprocess, no protocol overhead. The tools work in both modes (direct Python import and MCP protocol) because the tool functions are just regular Python.</p>
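<p>As an illustration of why plain functions are easy to test, here is a sketch of a search tool and a direct test for it. <code>search_notes</code> is a hypothetical stand-in that takes an in-memory dict of notes. The real tool reads from disk and may differ in signature and behavior:</p>

```python
def search_notes(notes, query):
    """Return lines containing query (case-insensitive), with file and line number.

    notes maps a filename to its text. Hypothetical stand-in for the real tool.
    """
    needle = query.lower()
    hits = []
    for filename in sorted(notes):
        for line_number, line in enumerate(notes[filename].splitlines(), start=1):
            if needle in line.lower():
                hits.append({"file": filename, "line_number": line_number, "line": line})
    return hits


def test_search_notes_finds_heading():
    # No subprocess, no protocol, no LLM: just a function call and an assertion.
    notes = {"python_basics.md": "# Python Basics\n\n## Functions\n"}
    assert search_notes(notes, "functions") == [
        {"file": "python_basics.md", "line_number": 3, "line": "## Functions"},
    ]


test_search_notes_finds_heading()
```

<p>Because the function takes its inputs as arguments and returns plain data, the test needs no server process and no model.</p>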
<p>The enterprise connection here: a compliance training system using this same pattern would have an MCP server exposing the regulatory content library instead of study notes. Agents query it by topic, read requirements, and generate certification assessments from the actual regulatory text, not from what the model thinks the regulations say. The grounding is the point.</p>
<p>In the next chapter, you'll add the Quiz Generator and Progress Coach, wire the conditional routing that makes the graph loop automatically through all topics, and run the complete four-agent system end to end.</p>
<h2 id="heading-chapter-4-building-the-four-agent-system">Chapter 4: Building the Four-Agent System</h2>
<p>The first three chapters built the foundation: a shared state definition, a graph that checkpoints after every node, two MCP servers, and the Explainer agent that uses those servers to ground its explanations in your actual notes. What you have is an LLM that reads files and explains topics.</p>
<p>This chapter completes the system. You'll add the Quiz Generator and Progress Coach, wire the conditional routing that makes the graph loop through every topic automatically, and run a complete end-to-end session.</p>
<h3 id="heading-41-the-quiz-generator-llm-as-judge">4.1 The Quiz Generator: LLM as Judge</h3>
<p>The Quiz Generator is the most architecturally interesting agent in the system because it uses two LLM calls with different purposes and different temperatures, deliberately kept separate.</p>
<p><strong>The generation call</strong> produces questions from the Explainer's output. It uses <code>temperature=0.4</code> (enough creativity to produce varied, non-repetitive questions across multiple topics) and <code>format="json"</code> to enforce structured output.</p>
<p><strong>The grading call</strong> evaluates the student's answer. It uses <code>temperature=0.1</code> to keep the output analytical and consistent: grading the same answer twice should produce the same score, or very nearly so. Reusing the generation temperature would let the randomness that helps question variety leak into the evaluation.</p>
<p>This is a production pattern worth naming: when one workflow has subtasks with fundamentally different requirements, giving them separate LLM calls with separate configurations produces better results than a single call that tries to do both.</p>
<pre><code class="language-python"># src/agents/quiz_generator.py

import json
import os
from datetime import datetime, timezone

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

from graph.state import QuizQuestion, QuizResult, get_current_topic

MODEL_NAME = os.getenv("OLLAMA_MODEL", "qwen2.5:7b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

GENERATION_PROMPT = """You are a quiz designer for a student learning programming.

Given a topic and explanation, generate {n} quiz questions that test
genuine understanding, not just the ability to repeat memorized phrases.

Good questions require the student to:
  - Apply a concept to a new situation
  - Explain WHY something works, not just WHAT it does
  - Identify edge cases or common mistakes
  - Compare related concepts

Return ONLY valid JSON with no prose or markdown:
{{
  "questions": [
    {{
      "question": "Clear, specific question text ending with ?",
      "expected_answer": "Model answer in 1-3 sentences",
      "difficulty": "easy|medium|hard"
    }}
  ]
}}

Rules:
  - Include at least one question about a common mistake or gotcha
  - expected_answer should be concise but complete
  - Avoid yes/no questions. Ask for explanation or demonstration
"""

GRADING_PROMPT = """You are a fair teacher grading a student's answer.

Question: {question}
Model answer: {expected_answer}
Student's answer: {student_answer}

Grade the student's answer honestly. Be generous with partial credit:
  - Fundamentally correct with minor gaps: 0.7-0.9
  - Correct concept but imprecise: 0.5-0.7
  - Partially correct: 0.3-0.5
  - Fundamentally wrong: 0.0-0.2

Return ONLY valid JSON with no prose or markdown:
{{
  "correct": true,
  "score": 0.85,
  "feedback": "One specific sentence of feedback",
  "missing_concept": "Key concept missed, or empty string if answer is correct"
}}
"""
</code></pre>
<p>The <code>generate_questions</code> and <code>grade_answer</code> functions implement these two calls independently. Both are importable and callable as plain Python. No graph required. This makes them testable in isolation and reusable by the A2A service you'll build in Chapter 8.</p>
<pre><code class="language-python">def generate_questions(topic: str, explanation: str, n: int = 3) -&gt; list[dict]:
    """Generate n quiz questions from the Explainer's output."""
    llm = ChatOllama(
        model=MODEL_NAME,
        base_url=OLLAMA_BASE_URL,
        temperature=0.4,
        format="json",
    )

    prompt = GENERATION_PROMPT.format(n=n)
    try:
        response = llm.invoke([
            SystemMessage(content=prompt),
            HumanMessage(content=f"Topic: {topic}\n\nExplanation:\n{explanation}"),
        ])
        data = json.loads(response.content)
        questions = data.get("questions", [])
        if questions and isinstance(questions, list):
            return questions
    except Exception as e:
        print(f"[Quiz Generator] LLM call failed during question generation: {e}")

    # Fallback: one generic question
    return [{
        "question": f"In your own words, explain the key concept of {topic} and why it matters.",
        "expected_answer": "A clear explanation demonstrating conceptual understanding.",
        "difficulty": "medium",
    }]


def grade_answer(question: str, expected: str, student_answer: str) -&gt; dict:
    """Grade a student's answer using the LLM as judge."""
    llm = ChatOllama(
        model=MODEL_NAME,
        base_url=OLLAMA_BASE_URL,
        temperature=0.1,   # Analytical: grading must be consistent
        format="json",
    )

    prompt = GRADING_PROMPT.format(
        question=question,
        expected_answer=expected,
        student_answer=student_answer,
    )

    try:
        response = llm.invoke([HumanMessage(content=prompt)])
        return json.loads(response.content)
    except Exception as e:
        print(f"[Quiz Generator] LLM call failed during grading: {e}")
        return {
            "correct": False,
            "score": 0.5,
            "feedback": "Could not grade automatically. Please review manually.",
            "missing_concept": "",
        }
</code></pre>
<p>The <code>run_quiz</code> function orchestrates the interactive terminal session. It calls <code>generate_questions</code>, presents each question to the student via <code>input()</code>, grades each answer as it arrives, and builds the <code>QuizResult</code>:</p>
<pre><code class="language-python">def run_quiz(topic: str, explanation: str) -&gt; QuizResult:
    """Run an interactive quiz session in the terminal."""
    print(f"\n{'='*60}")
    print(f"Quiz: {topic}")
    print(f"{'='*60}")
    print("Answer each question in your own words. Press Enter to submit.\n")

    questions_data = generate_questions(topic, explanation, n=3)
    graded_questions = []
    total_score = 0.0
    weak_areas = []

    for i, q_data in enumerate(questions_data, 1):
        question_text = q_data["question"]
        expected = q_data["expected_answer"]
        difficulty = q_data.get("difficulty", "medium")

        print(f"Question {i} [{difficulty}]: {question_text}")
        user_answer = input("Your answer: ").strip()
        if not user_answer:
            user_answer = "(no answer provided)"

        print("Grading...")
        grade = grade_answer(question_text, expected, user_answer)

        score = float(grade.get("score", 0.0))
        correct = bool(grade.get("correct", False))
        feedback = grade.get("feedback", "")
        missing = grade.get("missing_concept", "")

        total_score += score
        status = "✓" if correct else "✗"
        print(f"{status} Score: {score:.0%}. {feedback}\n")

        if missing:
            weak_areas.append(missing)

        graded_questions.append(QuizQuestion(
            question=question_text,
            expected_answer=expected,
            user_answer=user_answer,
            correct=correct,
            feedback=feedback,
            score=score,
        ))

    avg_score = total_score / len(questions_data) if questions_data else 0.0
    correct_count = sum(1 for q in graded_questions if q.correct)

    print(f"{'='*60}")
    print(f"Quiz complete! Score: {avg_score:.0%} ({correct_count}/{len(graded_questions)} correct)")
    if weak_areas:
        print(f"Areas to review: {', '.join(set(weak_areas))}")
    print(f"{'='*60}\n")

    return QuizResult(
        topic=topic,
        questions=graded_questions,
        score=avg_score,
        weak_areas=list(set(weak_areas)),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
</code></pre>
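<p><code>run_quiz</code> trusts the shape of the grading JSON. One thing to watch: <code>bool("false")</code> is <code>True</code> in Python, so a model that returns the string <code>"false"</code> for <code>correct</code> would be counted as correct. The sketch below shows one way to normalize the parsed response before using it. <code>normalize_grade</code> is a hypothetical helper and not part of the project:</p>

```python
def normalize_grade(raw):
    """Coerce a parsed grading response into the shape run_quiz expects.

    Hypothetical helper, not part of the project. Unknown or malformed
    fields fall back to conservative defaults.
    """
    if not isinstance(raw, dict):
        raw = {}
    try:
        score = float(raw.get("score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    score = min(1.0, max(0.0, score))  # clamp into the 0.0 to 1.0 range
    return {
        # Only a real JSON boolean true counts as correct.
        "correct": raw.get("correct") is True,
        "score": score,
        "feedback": str(raw.get("feedback") or ""),
        "missing_concept": str(raw.get("missing_concept") or ""),
    }


print(normalize_grade({"correct": "yes", "score": "1.7"}))
```

<p>Here the out-of-range score is clamped to <code>1.0</code> and the non-boolean <code>"yes"</code> is treated as not correct. Whether to be this strict is a design choice. A more lenient version could accept common string spellings of true.</p>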
<p>The LangGraph node extracts the Explainer's output from the message history and calls <code>run_quiz</code>. It then accumulates the result and the weak areas into state:</p>
<pre><code class="language-python">def quiz_generator_node(state: dict) -&gt; dict:
    """
    LangGraph node: Quiz Generator

    Reads:  state["roadmap"], state["current_topic_index"], state["messages"]
    Writes: state["quiz_results"], state["weak_areas"], state["error"]
    """
    topic = get_current_topic(state)
    if topic is None:
        return {"error": "No current topic. Curriculum Planner must run first"}

    # Extract the Explainer's final response from message history.
    # The Explainer's output is the last AIMessage that has no tool_calls.
    # Tool-calling responses have content too, but they also have tool_calls set.
    # (AIMessage is already imported at the top of quiz_generator.py.)
    messages = state.get("messages", [])
    explanation = ""
    for msg in reversed(messages):
        if isinstance(msg, AIMessage) and msg.content and not getattr(msg, "tool_calls", None):
            explanation = msg.content
            break

    if not explanation:
        print("[Quiz Generator] Warning: no explanation found, generating generic quiz")
        explanation = f"Topic: {topic.title}. {topic.description}"

    print(f"\n[Quiz Generator] Generating quiz for: '{topic.title}'")
    quiz_result = run_quiz(topic.title, explanation)

    existing_results = state.get("quiz_results", [])
    all_weak_areas = list(set(
        state.get("weak_areas", []) + quiz_result.weak_areas
    ))

    return {
        "quiz_results": existing_results + [quiz_result],
        "weak_areas": all_weak_areas,
        "error": None,
        # Pass state forward explicitly to preserve it across interrupt/resume
        "roadmap": state.get("roadmap"),
        "current_topic_index": state.get("current_topic_index", 0),
        "session_id": state.get("session_id", ""),
    }
</code></pre>
<h4 id="heading-why-quizresults-accumulates-instead-of-replaces">💡 Why <code>quiz_results</code> accumulates instead of replaces</h4>
<p>The Progress Coach needs the current quiz result. The session summary needs all of them. The node appends to the existing list (<code>existing_results + [quiz_result]</code>) rather than replacing it.</p>
<p><code>weak_areas</code> follows the same pattern: <code>set(existing + new)</code> deduplicates across topics so the final weak areas list is the union of everything the student struggled with in the session.</p>
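<p>The accumulate-and-deduplicate pattern can be sketched on its own. Note one deliberate difference from the node above: this version swaps <code>set()</code> for <code>dict.fromkeys</code>, which also deduplicates but keeps first-seen order, so the list is stable from run to run. <code>merge_quiz_state</code> is a hypothetical helper, and the results are plain strings for brevity:</p>

```python
def merge_quiz_state(state, new_result, new_weak_areas):
    """Return the state update for one finished quiz (hypothetical helper)."""
    existing_results = state.get("quiz_results", [])
    existing_weak = state.get("weak_areas", [])
    # dict.fromkeys deduplicates while keeping first-seen order,
    # unlike set(), whose iteration order is not guaranteed.
    merged_weak = list(dict.fromkeys(existing_weak + new_weak_areas))
    return {
        # Build a new list rather than mutating the one in state.
        "quiz_results": existing_results + [new_result],
        "weak_areas": merged_weak,
    }


state = {"quiz_results": ["topic-0 result"], "weak_areas": ["scope"]}
update = merge_quiz_state(state, "topic-1 result", ["late binding", "scope"])
print(update)
```

<p>The update contains both quiz results and the weak areas <code>['scope', 'late binding']</code>, with the repeated <code>"scope"</code> collapsed.</p>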
<h3 id="heading-42-the-progress-coach-synthesis-and-routing">4.2 The Progress Coach: Synthesis and Routing</h3>
<p>The Progress Coach does three things in sequence: evaluate the quiz result, give the student feedback, and decide what happens next. The routing decision (loop to the next topic or end the session) is its most consequential responsibility.</p>
<pre><code class="language-python"># src/agents/progress_coach.py

import json
import os
from datetime import datetime, timezone

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

from graph.state import QuizResult, StudyRoadmap, get_latest_quiz_result
from mcp_servers.memory_server import memory_set

MODEL_NAME = os.getenv("OLLAMA_MODEL", "qwen2.5:7b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
PASS_THRESHOLD = 0.5

COACHING_PROMPT = """You are an encouraging learning coach reviewing a student's quiz results.

Provide a brief, warm coaching message (2-3 sentences max) based on:
  - The topic studied
  - Their score (0.0 = 0%, 1.0 = 100%)
  - Any weak areas identified

Return ONLY valid JSON:
{{
  "summary": "2-3 sentence encouraging summary",
  "encouragement": "One short motivational sentence for next steps"
}}

Be specific. Reference the topic and any weak areas by name.
Never be discouraging. A low score means "more practice needed", not "you failed."
"""
</code></pre>
<p>The <code>get_coaching_message</code> function makes a single LLM call with <code>temperature=0.4</code> and <code>format="json"</code>. The warmth in the response benefits from some sampling randomness. <code>temperature=0.1</code> would tend to produce technically correct but dry feedback:</p>
<pre><code class="language-python">def get_coaching_message(topic: str, score: float, weak_areas: list[str]) -&gt; dict:
    """Ask the LLM for a personalised coaching message."""
    llm = ChatOllama(
        model=MODEL_NAME,
        base_url=OLLAMA_BASE_URL,
        temperature=0.4,
        format="json",
    )
    context = {
        "topic":         topic,
        "score_percent": f"{score:.0%}",
        "weak_areas":    weak_areas if weak_areas else ["none identified"],
    }
    try:
        response = llm.invoke([
            SystemMessage(content=COACHING_PROMPT),
            HumanMessage(content=json.dumps(context)),
        ])
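        data = json.loads(response.content)
        # Return only the two fields the node reads. A missing key raises
        # KeyError here, which the except block turns into the fallback message.
        return {"summary": data["summary"], "encouragement": data["encouragement"]}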
    except Exception as e:
        print(f"[Progress Coach] LLM call failed: {e}")
        return {
            "summary":      f"You scored {score:.0%} on {topic}. Keep going!",
            "encouragement": "Every topic builds on the last.",
        }
</code></pre>
<p>The node function ties everything together. It reads the latest quiz result, updates the topic status in the roadmap, persists progress to MCP memory, prints feedback, and advances the topic index:</p>
<pre><code class="language-python">def progress_coach_node(state: dict) -&gt; dict:
    """
    LangGraph node: Progress Coach

    Reads:  state["quiz_results"], state["roadmap"],
            state["current_topic_index"], state["session_id"]
    Writes: state["roadmap"], state["current_topic_index"],
            state["messages"], state["error"]
    """
    latest = get_latest_quiz_result(state)
    if latest is None:
        return {"error": "No quiz results. Quiz Generator must run first"}

    roadmap = state.get("roadmap")
    if roadmap is None:
        return {"error": "No roadmap found"}

    idx = state.get("current_topic_index", 0)
    session_id = state.get("session_id", "unknown")
    score = latest.score

    print(f"\n[Progress Coach] Topic: '{latest.topic}'")
    print(f"[Progress Coach] Score: {score:.0%}")
    if latest.weak_areas:
        print(f"[Progress Coach] Weak areas: {', '.join(latest.weak_areas)}")

    # Get coaching message from LLM
    coaching = get_coaching_message(latest.topic, score, latest.weak_areas)

    # Update topic status in the roadmap
    topics = roadmap.get("topics", []) if isinstance(roadmap, dict) else roadmap.topics
    if idx &lt; len(topics):
        topic = topics[idx]
        new_status = "completed" if score &gt;= PASS_THRESHOLD else "needs_review"
        if isinstance(topic, dict):
            topic["status"] = new_status
        else:
            topic.status = new_status

    # Advance the topic index
    next_idx = idx + 1
    all_done = next_idx &gt;= len(topics)

    # Persist progress to MCP memory
    memory_set(session_id, f"progress_topic_{idx}", json.dumps({
        "topic":      latest.topic,
        "score":      score,
        "weak_areas": latest.weak_areas,
        "timestamp":  datetime.now(timezone.utc).isoformat(),
    }))

    # Print coaching feedback
    print(f"\n{'─'*60}")
    print(f"Coach: {coaching['summary']}")
    print(f"{coaching['encouragement']}")

    if all_done:
        results = state.get("quiz_results", [])
        avg = sum(r.score for r in results) / max(len(results), 1)
        print(f"\nSession complete! Average: {avg:.0%}")
    else:
        next_topic = topics[next_idx]
        next_title = next_topic.get("title") if isinstance(next_topic, dict) else next_topic.title
        print(f"\nNext topic: '{next_title}'")
    print(f"{'─'*60}\n")

    return {
        "roadmap":              roadmap,
        "current_topic_index":  next_idx,
        "messages":             [AIMessage(content=coaching["summary"])],
        "error":                None,
    }
</code></pre>
<p>Two things worth understanding in this function.</p>
<p><strong>Why update topic status before advancing the index?</strong> Because the status change (<code>"pending"</code> to <code>"completed"</code> or <code>"needs_review"</code>) must happen at <code>topics[idx]</code>, not <code>topics[next_idx]</code>. The index is incremented <em>after</em> updating the current topic's status. Getting this order wrong means the wrong topic gets marked. It's a subtle bug that's easy to miss because the session still appears to run correctly.</p>
<p><strong>Why write to MCP memory?</strong> The Progress Coach persists each topic's result via <code>memory_set</code>. This serves a production use case: if the session is resumed after a crash or pause, the memory server has a record of what was covered and how the student performed. The Explainer can check this history via <code>tool_memory_get</code> when explaining subsequent topics, adapting its emphasis based on where the student struggled.</p>
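<p>The ordering point is easy to check with a stripped-down version of the status update. This sketch assumes topics are plain dicts and uses a hypothetical <code>finish_topic</code> helper. It is not the node itself:</p>

```python
PASS_THRESHOLD = 0.5


def finish_topic(topics, idx, score):
    """Mark topics[idx] from the score, then return the next index."""
    if idx in range(len(topics)):
        # Status goes on the topic that was just quizzed...
        topics[idx]["status"] = "completed" if score >= PASS_THRESHOLD else "needs_review"
    # ...and only then does the index move on.
    return idx + 1


topics = [
    {"title": "Python Functions", "status": "pending"},
    {"title": "Scopes and Namespaces", "status": "pending"},
]
next_idx = finish_topic(topics, 0, 0.73)
print(next_idx, [t["status"] for t in topics])
```

<p>This prints <code>1 ['completed', 'pending']</code>. If the index were incremented first, the second topic would be marked completed before the student had seen it.</p>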
<h3 id="heading-43-wiring-the-complete-graph">4.3 Wiring the Complete Graph</h3>
<p>With all four agents defined, <code>workflow.py</code> wires them into the complete graph. The wiring itself is short: apart from imports and two small routing functions, the file is almost entirely <code>add_node</code>, <code>add_edge</code>, and <code>add_conditional_edges</code> calls.</p>
<pre><code class="language-python"># src/graph/workflow.py

import os
import sqlite3
from pathlib import Path

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import END, START, StateGraph

from agents.curriculum_planner import curriculum_planner_node
from agents.explainer import explainer_node
from agents.human_approval import human_approval_node
from agents.progress_coach import progress_coach_node
from agents.quiz_generator import quiz_generator_node
from graph.state import AgentState, session_is_complete


def route_after_approval(state: dict) -&gt; str:
    if state.get("approved", False):
        return "explainer"
    return "curriculum_planner"


def route_after_coach(state: dict) -&gt; str:
    if session_is_complete(state):
        return "end"
    return "explainer"


def build_graph(
    db_path: str = "data/checkpoints.db",
    interrupt_before: list | None = None,
):
    """
    Build and compile the Learning Accelerator graph.

    Args:
        db_path:          Path to the SQLite checkpoint database.
        interrupt_before: Optional list of node names to pause before.
                          Used by the Streamlit UI to intercept quiz_generator.
    """
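    if db_path == "data/checkpoints.db":
        db_path = os.getenv("CHECKPOINT_DB", db_path)
    # Create the parent directory of whichever path was chosen,
    # not just the default "data" directory.
    Path(db_path).parent.mkdir(parents=True, exist_ok=True)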

    builder = StateGraph(AgentState)

    builder.add_node("curriculum_planner", curriculum_planner_node)
    builder.add_node("human_approval",     human_approval_node)
    builder.add_node("explainer",          explainer_node)
    builder.add_node("quiz_generator",     quiz_generator_node)
    builder.add_node("progress_coach",     progress_coach_node)

    builder.add_edge(START, "curriculum_planner")
    builder.add_edge("curriculum_planner", "human_approval")
    builder.add_edge("explainer",          "quiz_generator")
    builder.add_edge("quiz_generator",     "progress_coach")

    builder.add_conditional_edges(
        "human_approval",
        route_after_approval,
        {"explainer": "explainer", "curriculum_planner": "curriculum_planner"},
    )
    builder.add_conditional_edges(
        "progress_coach",
        route_after_coach,
        {"explainer": "explainer", "end": END},
    )

    # CRITICAL: Create the connection directly. Do NOT use a context manager.
    # The connection must stay open for the process lifetime.
    # SqliteSaver requires check_same_thread=False because LangGraph runs
    # node functions and checkpoint writes on different threads.
    conn = sqlite3.connect(db_path, check_same_thread=False)
    checkpointer = SqliteSaver(conn)

    return builder.compile(
        checkpointer=checkpointer,
        interrupt_before=interrupt_before or [],
    )


graph = build_graph()
</code></pre>
<p>The <code>interrupt_before</code> parameter deserves a closer look here. The terminal interface (<code>main.py</code>) uses <code>interrupt()</code> inside <code>human_approval_node</code> to pause for roadmap approval. No <code>interrupt_before</code> needed.</p>
<p>The Streamlit UI (Chapter 9) needs a different kind of pause: it must stop before <code>quiz_generator_node</code> runs so that <code>input()</code> is never called inside the graph thread. The <code>build_graph(interrupt_before=["quiz_generator"])</code> call in <code>streamlit_app.py</code> produces a separate graph instance configured for UI use.</p>
<p>The terminal graph and the UI graph are built by the same <code>build_graph</code> function, with identical nodes and edges. Only the pause point differs.</p>
<p>The routing functions are pure Python with no LLM calls. <code>route_after_approval</code> reads <code>state["approved"]</code>, a boolean the human approval node writes. <code>route_after_coach</code> calls <code>session_is_complete(state)</code>, which checks whether the topic index has advanced past the roadmap. All control flow is deterministic Python, not probabilistic LLM output.</p>
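<p>Because the routers are plain functions of state, they can be exercised without a graph or a model. The sketch below includes an assumed implementation of <code>session_is_complete</code>, based only on the description above, and assumes the roadmap is a dict with a <code>topics</code> list. The project's real helper may differ:</p>

```python
def session_is_complete(state):
    """Assumed implementation: True once the index has passed the last topic."""
    roadmap = state.get("roadmap") or {}
    topics = roadmap.get("topics", [])
    return state.get("current_topic_index", 0) >= len(topics)


def route_after_coach(state):
    # Deterministic: the same state always yields the same route.
    return "end" if session_is_complete(state) else "explainer"


roadmap = {"topics": [{"title": "Python Functions"}, {"title": "Creating Closures"}]}
print(route_after_coach({"roadmap": roadmap, "current_topic_index": 1}))
print(route_after_coach({"roadmap": roadmap, "current_topic_index": 2}))
```

<p>With two topics, index 1 routes to <code>explainer</code> and index 2 routes to <code>end</code>.</p>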
<h3 id="heading-44-the-complete-execution-flow">4.4 The Complete Execution Flow</h3>
<p>Here's what happens when you run <code>python main.py "Learn Python closures"</code> and type <code>yes</code> at the approval prompt:</p>
<pre><code class="language-plaintext">START
  ↓
curriculum_planner_node
  reads:  state["goal"]
  writes: state["roadmap"], state["messages"]
  ↓
human_approval_node
  interrupt() pauses here. Waits for user input.
  user types "yes"
  writes: state["approved"] = True + full state forward
  ↓  route_after_approval → "explainer"
explainer_node (topic 0)
  reads:  state["roadmap"], state["current_topic_index"]
  calls:  tool_list_files, tool_search_notes, tool_read_file
  writes: state["messages"]
  ↓
quiz_generator_node (topic 0)
  reads:  state["messages"] (extracts explanation)
  calls:  run_quiz() → 3 questions, 3 graded answers
  writes: state["quiz_results"], state["weak_areas"]
  ↓
progress_coach_node (topic 0)
  reads:  state["quiz_results"], state["roadmap"]
  writes: state["roadmap"] (topic 0 status updated)
          state["current_topic_index"] = 1
          state["messages"] (coaching message)
  ↓  route_after_coach → "explainer" (more topics remain)
explainer_node (topic 1)
  ...
  ↓
  [loop continues until current_topic_index &gt;= len(roadmap.topics)]
  ↓  route_after_coach → "end"
END
</code></pre>
<p>LangGraph checkpoints state after every node. If the process crashes between <code>quiz_generator_node</code> and <code>progress_coach_node</code>, the next <code>graph.invoke(None, config=config)</code> with the same session ID resumes from <code>progress_coach_node</code>. The quiz result is already in state.</p>
<h3 id="heading-45-run-the-complete-system">4.5 Run the Complete System</h3>
<p>With all four nodes registered:</p>
<pre><code class="language-bash">rm -f data/checkpoints.db
python main.py "Learn Python closures and decorators from scratch"
</code></pre>
<p>You'll see the planner, the approval prompt, then the full loop:</p>
<pre><code class="language-plaintext">[Curriculum Planner] Building roadmap for: 'Learn Python closures...'
[Curriculum Planner] Created roadmap: 5 topics, 4 weeks
  1. Python Functions (60 min)
  2. Scopes and Namespaces (45 min)
  3. Inner Functions (60 min)
  4. Creating Closures (75 min)
  5. Decorator Basics (60 min)

[Human Approval] Pausing for roadmap review...
&gt; yes
[Human Approval] Roadmap approved. Starting study session.

[Explainer] Topic: 'Python Functions'
[Explainer] LLM call 1/8...
  → tool_list_files({})
    ← ["closures.md", "decorators.md", "python_basics.md"]
[Explainer] LLM call 2/8...
  → tool_read_file({'filename': 'python_basics.md'})
    ← # Python Basics...
[Explainer] Complete after 4 LLM call(s)
[Explainer] Explanation: 1938 characters

[Quiz Generator] Generating quiz for: 'Python Functions'

============================================================
Quiz: Python Functions
============================================================
Question 1 [medium]: What is the difference between...
Your answer: Functions are first-class objects...
Grading...
✓ Score: 80%. Good explanation of first-class functions.

...

[Progress Coach] Topic: 'Python Functions'
[Progress Coach] Score: 73%
────────────────────────────────────────────────────────────
Coach: You have a solid grasp of Python functions, especially...
Keep building on this foundation as you move into closures!

Next topic: 'Scopes and Namespaces'
────────────────────────────────────────────────────────────

[Explainer] Topic: 'Scopes and Namespaces'
...
</code></pre>
<p>The loop runs automatically. When <code>progress_coach_node</code> writes <code>current_topic_index = 1</code>, <code>route_after_coach</code> returns <code>"explainer"</code>, and the graph calls <code>explainer_node</code> with the updated index. No external loop in <code>main.py</code>. The graph topology handles the iteration.</p>
<p>📌 <strong>Checkpoint:</strong> Run the full test suite:</p>
<pre><code class="language-bash">pytest tests/ -v
</code></pre>
<p>Expected: 184 tests collected, eval tests automatically deselected. The unit tests cover the quiz and coach nodes without requiring Ollama:</p>
<pre><code class="language-bash">pytest tests/test_quiz_and_coach.py -v
</code></pre>
<p>These tests mock the LLM calls and verify the state contract: that <code>quiz_results</code> accumulates correctly, that <code>current_topic_index</code> increments, and that the routing functions return the right strings.</p>
<p>In the next chapter, you'll dig into the two production capabilities that have quietly been working since Chapter 2: state persistence that survives crashes, and human-in-the-loop oversight that pauses the graph for approval and resumes when the user responds.</p>
<h2 id="heading-chapter-5-state-persistence-and-human-oversight">Chapter 5: State Persistence and Human Oversight</h2>
<p>Two problems have quietly been solved in the background since Chapter 2: the system can survive crashes, and it can pause mid-execution to wait for a human decision. This chapter makes both explicit. Understanding them is what separates a demo from a production system.</p>
<h3 id="heading-51-what-checkpointing-actually-does">5.1 What Checkpointing Actually Does</h3>
<p>Every time a LangGraph node completes, the framework serializes the full <code>AgentState</code> to SQLite and writes it under a <code>thread_id</code>. That thread ID is the session ID you create at the start of <code>run_session</code>.</p>
<p>The database structure is straightforward:</p>
<pre><code class="language-plaintext">data/checkpoints.db
  └── checkpoints table
        thread_id = "a3f1b2c4"   ← your session ID
        checkpoint blob           ← serialized AgentState after each node
</code></pre>
<p>Multiple checkpoints accumulate per session, one after each node. LangGraph always loads the latest. When you call <code>graph.invoke(None, config={"configurable": {"thread_id": "a3f1b2c4"}})</code>, LangGraph reads the most recent checkpoint for that thread ID and picks up from there.</p>
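<p>To make the "many checkpoints, latest wins" idea concrete, here is a small standard-library sketch. It assumes a deliberately simplified table and JSON encoding. LangGraph's real schema and serializer are different, and <code>save_checkpoint</code> and <code>load_latest</code> are hypothetical helper names.</p>
<pre><code class="language-python">import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkpoints ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " thread_id TEXT NOT NULL,"
    " state TEXT NOT NULL)"
)


def save_checkpoint(thread_id, state):
    """Append a new snapshot; earlier snapshots for the thread are kept."""
    conn.execute(
        "INSERT INTO checkpoints (thread_id, state) VALUES (?, ?)",
        (thread_id, json.dumps(state)),
    )
    conn.commit()


def load_latest(thread_id):
    """Return the most recent snapshot for a thread, or None if there is none."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? ORDER BY id DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


save_checkpoint("a3f1b2c4", {"node": "curriculum_planner", "current_topic_index": 0})
save_checkpoint("a3f1b2c4", {"node": "explainer", "current_topic_index": 0})
save_checkpoint("other-session", {"node": "curriculum_planner", "current_topic_index": 0})

print(load_latest("a3f1b2c4")["node"])   # explainer
print(load_latest("missing"))            # None
</code></pre>
<p>The thread ID scopes the lookup, so two sessions writing to the same database never see each other's state.</p>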
<p>The <code>get_langfuse_config</code> function in <code>src/observability/langfuse_setup.py</code> builds the config dict that carries the thread ID:</p>
<pre><code class="language-python">def get_langfuse_config(session_id: str) -&gt; dict:
    """
    Build the graph run config with session ID as the checkpoint thread ID.

    The config is passed to graph.invoke() on every call: both the initial
    invocation and any subsequent resume calls. LangGraph uses the thread_id
    to find and load the right checkpoint.
    """
    config = {
        "configurable": {
            "thread_id": session_id,
        }
    }
    # If Langfuse is configured, callbacks are added here (Chapter 6)
    handler = get_langfuse_handler(session_id)
    if handler:
        config["callbacks"] = [handler]
    return config
</code></pre>
<p>This config object is the single piece of context that connects every <code>graph.invoke</code> call in a session to the same checkpoint history.</p>
<h4 id="heading-the-sqlitesaver-connection-pattern">💡 The SqliteSaver connection pattern</h4>
<p>SqliteSaver can be initialised in two ways. The context manager form (<code>with SqliteSaver.from_conn_string(...) as checkpointer</code>) closes the connection when the <code>with</code> block exits. That doesn't fit here: <code>graph = build_graph()</code> is a module-level variable that lives for the entire process, so a <code>with</code> block inside <code>build_graph()</code> would close the connection as soon as the function returns. Every subsequent <code>graph.invoke</code> call would then fail trying to write to a closed database.</p>
<p>The correct pattern is <code>conn = sqlite3.connect(db_path, check_same_thread=False)</code> followed by <code>checkpointer = SqliteSaver(conn)</code>. The connection stays open for the process lifetime.</p>
<p>The <code>check_same_thread=False</code> flag is required. SQLite's default prevents a connection created on one thread from being used on another. LangGraph runs node functions and checkpoint writes on different threads internally. Without this flag you get <code>ProgrammingError: SQLite objects created in a thread can only be used in that same thread</code> at runtime.</p>
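<p>You can reproduce the underlying SQLite behaviour without LangGraph. The sketch below uses only the standard library: it opens one connection with the default setting and one with <code>check_same_thread=False</code>, then writes to each from a second thread. The table name and the <code>write_from_worker</code> helper are made up for the demonstration.</p>
<pre><code class="language-python">import sqlite3
import threading


def write_from_worker(conn):
    """Write on `conn` from a second thread; return the error name or None."""
    outcome = {}

    def worker():
        try:
            conn.execute("INSERT INTO notes (body) VALUES ('written by worker')")
            outcome["error"] = None
        except sqlite3.ProgrammingError as exc:
            outcome["error"] = type(exc).__name__

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


# Default: the connection may only be used on the thread that created it.
strict = sqlite3.connect(":memory:")
strict.execute("CREATE TABLE notes (body TEXT)")

# Same setup, but the connection may be shared across threads.
shared = sqlite3.connect(":memory:", check_same_thread=False)
shared.execute("CREATE TABLE notes (body TEXT)")

strict_error = write_from_worker(strict)
shared_error = write_from_worker(shared)
print(strict_error, shared_error)   # ProgrammingError None
</code></pre>
<p>The flag only disables the ownership check. If your own code writes from several threads at once, you still need to serialise those writes yourself.</p>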
<h3 id="heading-52-the-human-approval-node-interrupt-and-resume">5.2 The Human Approval Node: Interrupt and Resume</h3>
<p>The Human Approval node uses <code>interrupt()</code> to pause the graph mid-execution. This is how LangGraph implements human-in-the-loop: execution stops inside the node, state is checkpointed, and control returns to the caller. When the caller calls <code>graph.invoke(Command(resume=value), config=config)</code>, LangGraph re-enters the same node. The node function runs again from its first line, and this time <code>interrupt()</code> returns <code>value</code> instead of pausing, so <code>decision</code> is set to <code>value</code>. Any code placed before the <code>interrupt()</code> call therefore runs twice, so keep it free of side effects you don't want repeated.</p>
<pre><code class="language-python"># src/agents/human_approval.py

from langgraph.types import interrupt
from graph.state import StudyRoadmap


def human_approval_node(state: dict) -&gt; dict:
    """
    LangGraph node: Human Approval

    Reads:  state["roadmap"]
    Writes: state["approved"]: True if approved, False if rejected.
            Also returns all other state keys explicitly (see note below).

    When approved=False, the conditional edge routes back to the
    Curriculum Planner to generate a new roadmap.
    When approved=True, the graph continues to the Explainer.
    """
    roadmap = state.get("roadmap")

    if roadmap is None:
        return {"approved": True}

    print(f"\n[Human Approval] Pausing for roadmap review...")

    # interrupt() pauses execution here.
    # The dict passed to interrupt() is the payload. The caller reads this
    # to know what to display to the user.
    # Execution resumes when Command(resume=value) is called by the caller.
    decision = interrupt({
        "type":   "roadmap_approval",
        "roadmap": roadmap,
        "prompt": (
            "Does this study plan look good?\n"
            "  Type 'yes' to start studying\n"
            "  Type 'no' to generate a different plan"
        ),
    })

    approved = str(decision).lower().strip() in ("yes", "y", "ok", "approve")

    if approved:
        print(f"[Human Approval] Roadmap approved. Starting study session.")
    else:
        print(f"[Human Approval] Roadmap rejected. Regenerating...")

    # LangGraph 1.1.0: after Command(resume=...), the next node receives only
    # the keys returned by this node. Not the full pre-interrupt checkpoint.
    # Returning the complete state explicitly ensures downstream agents
    # (explainer, quiz_generator, progress_coach) receive roadmap, session_id, etc.
    return {
        "approved":              approved,
        "roadmap":               roadmap,
        "goal":                  state.get("goal", ""),
        "session_id":            state.get("session_id", ""),
        "current_topic_index":   state.get("current_topic_index", 0),
        "quiz_results":          state.get("quiz_results", []),
        "weak_areas":            state.get("weak_areas", []),
        "study_materials_path":  state.get("study_materials_path",
                                           "study_materials/sample_notes"),
        "error":                 None,
    }
</code></pre>
<p>The comment about LangGraph 1.1.0 at the bottom of this function documents a real behaviour you will hit in production: after <code>Command(resume=...)</code>, the next node's state only contains what the interrupted node explicitly returns. If the node returns only <code>{"approved": True}</code>, the explainer node receives a state with no <code>roadmap</code>, no <code>session_id</code>, no <code>current_topic_index</code>, and immediately returns an error.</p>
<p>This is not a bug in your code. It's a known behaviour of LangGraph 1.1.0's state propagation after interrupt/resume. The fix is to return the full state explicitly.</p>
<p>Every state key that downstream nodes need must appear in the return dict. Nodes that run after an interrupt/resume boundary should be treated as if they're receiving state from scratch, not from a merged checkpoint.</p>
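<p>If several nodes sit after an interrupt boundary, repeating the full return dict by hand gets error-prone. One option is a small helper that rebuilds it from a single list of defaults. This is a sketch, not code from the repository: <code>CARRIED_DEFAULTS</code> and <code>carry_forward</code> are hypothetical names, and the defaults mirror the ones in <code>human_approval_node</code> above.</p>
<pre><code class="language-python"># Keys downstream nodes rely on, with the defaults used in human_approval_node.
CARRIED_DEFAULTS = {
    "roadmap": None,
    "goal": "",
    "session_id": "",
    "current_topic_index": 0,
    "quiz_results": [],
    "weak_areas": [],
    "study_materials_path": "study_materials/sample_notes",
}


def carry_forward(state, **updates):
    """Build a node return dict that repeats every carried key explicitly."""
    result = {}
    for key, default in CARRIED_DEFAULTS.items():
        value = state.get(key, default)
        # Copy lists so the shared default list is never mutated by a caller.
        result[key] = list(value) if isinstance(value, list) else value
    result.update(updates)
    return result


state = {"roadmap": {"goal": "Learn Python closures"}, "session_id": "a3f1b2c4"}
returned = carry_forward(state, approved=True, error=None)
print(sorted(returned))
</code></pre>
<p>One caveat: if a state key is declared with a reducer that appends, returning its current value again would duplicate entries. The helper assumes plain overwrite semantics for every key.</p>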
<h4 id="heading-interrupt-vs-interruptbefore">💡 interrupt() vs interrupt_before</h4>
<p>LangGraph offers two ways to pause a graph. <code>interrupt_before=["node_name"]</code> in <code>builder.compile()</code> pauses <em>before</em> the named node and is configured at compile time. <code>interrupt()</code> called <em>inside</em> a node pauses in the middle of that node's execution and can include a payload (a dict that the caller reads to know what to show the user).</p>
<p>This system uses <code>interrupt()</code> inside <code>human_approval_node</code> because the approval step needs to pass the roadmap object to the caller. The <code>interrupt_before</code> approach would pause before the node runs, but the roadmap is built <em>inside</em> the node's predecessor (<code>curriculum_planner_node</code>). Using <code>interrupt()</code> lets the node receive the roadmap, construct the approval payload, and pause, all in the right sequence.</p>
<p>The Streamlit UI uses <code>build_graph(interrupt_before=["quiz_generator"])</code> for a different reason: it needs to stop the graph before <code>quiz_generator_node</code> runs so that <code>input()</code> is never called inside the graph thread. Both mechanisms are correct for their respective use cases.</p>
<h3 id="heading-53-handling-the-interrupt-in-mainpy">5.3 Handling the Interrupt in <code>main.py</code></h3>
<p>The caller of <code>graph.invoke</code> needs to handle the case where the graph pauses. LangGraph signals a pause by including <code>"__interrupt__"</code> in the result dict. The interrupt payload (the dict you passed to <code>interrupt()</code>) is in <code>result["__interrupt__"][0].value</code>.</p>
<pre><code class="language-python"># main.py: the interrupt/resume loop

from langgraph.types import Command

result = graph.invoke(state, config=config)

while "__interrupt__" in result:
    interrupt_payload = result["__interrupt__"][0].value
    roadmap = interrupt_payload.get("roadmap")

    # Display the roadmap for the user to review
    if roadmap:
        print(f"\n{'='*60}")
        print("Proposed Study Plan")
        print(f"{'='*60}")
        print(f"Goal: {roadmap.goal}")
        print(f"Duration: {roadmap.total_weeks} weeks @ "
              f"{roadmap.weekly_hours} hrs/week\n")
        for i, topic in enumerate(roadmap.topics, 1):
            prereqs = (f" (needs: {', '.join(topic.prerequisites)})"
                       if topic.prerequisites else "")
            print(f"  {i}. {topic.title} ({topic.estimated_minutes} min){prereqs}")
            print(f"     {topic.description}")

    print(f"\n{interrupt_payload.get('prompt', 'Continue?')}")
    user_input = input("&gt; ").strip()

    # Resume the graph with the user's decision.
    # Command(resume=value) is how you pass input back to the interrupted node.
    result = graph.invoke(Command(resume=user_input), config=config)
</code></pre>
<p>The <code>while</code> loop handles the case where rejecting the roadmap causes the planner to regenerate, which triggers another interrupt. If the user types <code>no</code>, the graph runs <code>curriculum_planner_node</code> again, returns a new roadmap, hits <code>interrupt()</code> again, and the loop shows the new plan. The user can keep rejecting until satisfied. The loop only exits when the graph runs to completion without hitting another interrupt.</p>
<p>The structure is worth understanding precisely:</p>
<pre><code class="language-plaintext">graph.invoke(initial_state, config)
  → runs: curriculum_planner → human_approval (interrupt() fires)
  → returns: {"__interrupt__": [...]}  ← caller reads roadmap from here

main.py shows roadmap, collects "yes"

graph.invoke(Command(resume="yes"), config)
  → resumes: human_approval (decision = "yes", approved = True)
  → continues: explainer → quiz_generator → progress_coach → ... → END
  → returns: final state dict  ← no "__interrupt__" key
</code></pre>
<p>The <code>config</code> dict with the <code>thread_id</code> is identical on both <code>graph.invoke</code> calls. This is how LangGraph knows to load the checkpoint from the interrupted node rather than starting fresh.</p>
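<p>The loop itself can be exercised without Ollama or a compiled graph. The sketch below swaps in a stub: <code>StubGraph</code> and <code>run_with_approval</code> are hypothetical names, and the stub receives the raw answer string where the real code would pass <code>Command(resume=...)</code>.</p>
<pre><code class="language-python">from types import SimpleNamespace


class StubGraph:
    """Stand-in for the compiled graph: pauses until the user answers 'yes'."""

    def __init__(self):
        self.plans_shown = 0

    def invoke(self, graph_input, config=None):
        # The first call receives the initial state (a dict). Later calls
        # receive the user's decision; real code wraps it in Command(resume=...).
        if graph_input == "yes":
            return {"approved": True, "plans_shown": self.plans_shown}
        self.plans_shown += 1
        payload = {"type": "roadmap_approval", "prompt": "Does this study plan look good?"}
        return {"__interrupt__": [SimpleNamespace(value=payload)]}


def run_with_approval(graph, initial_state, config, ask):
    """Same loop shape as main.py, with input() replaced by the `ask` callable."""
    result = graph.invoke(initial_state, config=config)
    while "__interrupt__" in result:
        payload = result["__interrupt__"][0].value
        result = graph.invoke(ask(payload["prompt"]), config=config)
    return result


answers = iter(["no", "yes"])
config = {"configurable": {"thread_id": "a3f1b2c4"}}
final = run_with_approval(
    StubGraph(), {"goal": "Learn Python closures"}, config, lambda prompt: next(answers)
)
print(final)   # {'approved': True, 'plans_shown': 2}
</code></pre>
<p>Injecting <code>ask</code> instead of calling <code>input()</code> directly is what makes the rejection path testable: one "no" produces a second plan, and the loop exits on the first "yes".</p>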
<h3 id="heading-54-resuming-a-crashed-session">5.4 Resuming a Crashed Session</h3>
<p>The same mechanism that handles approval also handles crash recovery. If the process dies between <code>explainer_node</code> and <code>quiz_generator_node</code>, the SQLite checkpoint has the full state as of the last completed node. Starting a new process and invoking with the same <code>thread_id</code> picks up from there.</p>
<p>The <code>--resume</code> flag in <code>main.py</code> implements this:</p>
<pre><code class="language-python"># main.py

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Learning Accelerator")
    parser.add_argument("goal", nargs="?",
                        default="Learn Python closures and decorators from scratch")
    parser.add_argument("--resume", metavar="SESSION_ID",
                        help="Resume an existing session by ID")
    args = parser.parse_args()

    if args.resume:
        run_session(goal="", session_id=args.resume)
    else:
        run_session(goal=args.goal)
</code></pre>
<p>Inside <code>run_session</code>, a resume and a fresh start differ in exactly one line:</p>
<pre><code class="language-python"># New session: pass the initial state.
# Resume: pass None so LangGraph loads the latest checkpoint instead.
state = None if is_resume else initial_state(goal, session_id)

result = graph.invoke(state, config=config)
</code></pre>
<p>When <code>state</code> is <code>None</code>, LangGraph loads the most recent checkpoint for the <code>thread_id</code> in <code>config</code> and continues with the node that follows the last completed one. The session ID printed when the original session started is all you need:</p>
<pre><code class="language-bash"># Original session printed: Session ID: a3f1b2c4
# Process died mid-session

python main.py --resume a3f1b2c4
</code></pre>
<pre><code class="language-plaintext">============================================================
Learning Accelerator
Session ID: a3f1b2c4
Resuming existing session...
============================================================

[Explainer] Topic: 'Creating Closures'
...
</code></pre>
<p>The graph picks up at the next uncompleted node. Topics that already ran (with their explanations, quiz results, and coaching messages) stay in state. Only the remaining work runs.</p>
<h3 id="heading-55-the-deserialization-detail-you-need-to-know">5.5 The Deserialization Detail You Need to Know</h3>
<p>When LangGraph loads a checkpoint from SQLite, it deserializes the stored state back into Python objects. For primitive types (strings, ints, lists of strings), this is transparent. For your custom dataclasses (<code>Topic</code>, <code>StudyRoadmap</code>, <code>QuizResult</code>), LangGraph uses its internal msgpack serializer and may return them as plain dicts rather than dataclass instances.</p>
<p>This is why <code>get_current_topic</code>, <code>session_is_complete</code>, and <code>get_latest_quiz_result</code> in <code>state.py</code> all handle both forms:</p>
<pre><code class="language-python">def get_current_topic(state: dict) -&gt; Topic | None:
    roadmap = state.get("roadmap")
    if roadmap is None:
        return None

    # After checkpoint deserialization, roadmap may be a dict
    if isinstance(roadmap, dict):
        topics_raw = roadmap.get("topics", [])
    else:
        topics_raw = roadmap.topics

    idx = state.get("current_topic_index", 0)
    if idx &gt;= len(topics_raw):
        return None

    t = topics_raw[idx]
    # Individual topics may also be dicts after deserialization
    if isinstance(t, dict):
        return Topic.from_dict(t)
    return t
</code></pre>
<p>And it's why <code>Topic</code>, <code>StudyRoadmap</code>, and <code>QuizResult</code> each have <code>from_dict</code> classmethods. Not as a convenience, but as a necessity for resume to work correctly.</p>
<p>The same pattern applies in any production system that checkpoints custom objects. If your state contains dataclasses or Pydantic models, write every state accessor to handle both the live form and the deserialized form. Don't assume the type will be what you put in. Verify it at the point of use.</p>
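<p>Here is a minimal, self-contained sketch of that pattern. The field names follow the attributes that <code>main.py</code> prints for each topic. The real <code>Topic</code> class may carry more fields, and <code>as_topic</code> is a hypothetical helper.</p>
<pre><code class="language-python">from dataclasses import dataclass, field


@dataclass
class Topic:
    # Fields mirror what main.py prints; the real class may define more.
    title: str
    description: str = ""
    estimated_minutes: int = 0
    prerequisites: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        """Rebuild a Topic from a plain dict, ignoring unknown keys."""
        known = {name: data[name] for name in cls.__dataclass_fields__ if name in data}
        return cls(**known)


def as_topic(value):
    """Accept either a live Topic or its deserialized dict form."""
    return Topic.from_dict(value) if isinstance(value, dict) else value


live = Topic(title="Creating Closures", estimated_minutes=75)
restored = as_topic({"title": "Creating Closures", "estimated_minutes": 75, "extra": 1})
print(restored == live)   # True
</code></pre>
<p>Ignoring unknown keys in <code>from_dict</code> is a deliberate choice in this sketch: it lets an older checkpoint load even after the dataclass has changed shape.</p>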
<h3 id="heading-56-test-session-persistence">5.6 Test Session Persistence</h3>
<p>Run a session, kill it mid-way, and verify that the resume works:</p>
<pre><code class="language-bash">rm -f data/checkpoints.db
python main.py "Learn Python closures"
</code></pre>
<p>After the roadmap appears and you type <code>yes</code>, wait until you see <code>[Explainer] Complete after N LLM call(s)</code>. Then press <code>Ctrl+C</code> to kill the process. Note the session ID printed at the start.</p>
<p>Now resume:</p>
<pre><code class="language-bash">python main.py --resume &lt;session-id&gt;
</code></pre>
<p>The session should continue from the Quiz Generator. The explanation is already in state, so it goes straight to the questions for the first topic.</p>
<p>📌 <strong>Checkpoint:</strong> Run the checkpointing tests:</p>
<pre><code class="language-bash">pytest tests/test_checkpointing.py -v
</code></pre>
<p>Expected: 20 tests, all passing. These tests verify the checkpoint round-trip: that a session interrupted mid-run can be resumed and produces the expected state, and that the dict-vs-dataclass deserialization is handled correctly.</p>
<p>The enterprise connection: a sales enablement platform uses the same checkpoint pattern for manager approval.</p>
<p>When the curriculum agent builds a training plan for a new hire, the graph pauses and sends the manager a notification. The manager reviews the plan in a web dashboard, approves or modifies it, and submits. That HTTP POST calls <code>graph.invoke(Command(resume=decision), config=config)</code>. The LangGraph code is identical to the terminal version. Only the notification mechanism and input collection differ.</p>
<p>In the next chapter, you'll add observability: Langfuse capturing every agent call, LLM invocation, and tool execution as a structured trace you can query and visualise.</p>
<h2 id="heading-chapter-6-observability-with-langfuse">Chapter 6: Observability with Langfuse</h2>
<p>A multi-agent system that produces wrong output with no error is harder to debug than one that crashes. Standard infrastructure metrics (CPU, memory, request latency, error rate) tell you the system is healthy while the agents are reasoning incorrectly. You need a different kind of observability: one that captures not just whether a call was made, but what the model decided and why.</p>
<p>Langfuse provides this. It records every LLM call, every tool invocation, and the full message history at each step, grouped into traces by session. When something goes wrong, you open the trace for that session and see exactly what each agent received, what it called, and what it returned.</p>
<p>This chapter adds Langfuse to the system with a single integration point and a graceful degradation pattern: the system runs identically with or without Langfuse configured.</p>
<h3 id="heading-61-run-langfuse-locally-with-docker">6.1 Run Langfuse Locally with Docker</h3>
<p>Langfuse is self-hosted for this tutorial. All traces stay on your machine: no cloud account or third-party API keys are required, and no data leaves your network. The <code>docker-compose.yml</code> in the repository starts the full Langfuse stack:</p>
<pre><code class="language-yaml"># docker-compose.yml
services:
  langfuse-server:
    image: langfuse/langfuse:3
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/langfuse
      NEXTAUTH_URL: http://localhost:3000
      NEXTAUTH_SECRET: local-dev-secret-change-in-production
      SALT: local-dev-salt-change-in-production
      ENCRYPTION_KEY: "0000000000000000000000000000000000000000000000000000000000000000"
      LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES: "true"
      TELEMETRY_ENABLED: "false"

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: langfuse
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - langfuse_postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d langfuse"]
      interval: 5s
      retries: 10

volumes:
  langfuse_postgres_data:
</code></pre>
<p>Start the stack:</p>
<pre><code class="language-bash">docker compose up -d
</code></pre>
<p>Wait about 20 seconds for Postgres to initialise. Then open <a href="http://localhost:3000">http://localhost:3000</a>, create an account (local, no email verification required), and create a project called <code>learning-accelerator</code>.</p>
<p>Langfuse will show you your API keys under <strong>Settings → API Keys</strong>. Copy both the public and secret keys into your <code>.env</code>:</p>
<pre><code class="language-bash">LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=http://localhost:3000
</code></pre>
<h3 id="heading-62-the-observability-module">6.2 The Observability Module</h3>
<p>The integration lives entirely in <code>src/observability/langfuse_setup.py</code>. Every other file in the project is unchanged. Agent nodes don't import from this module, call any Langfuse functions, or know whether observability is running.</p>
<p>This is the correct architecture for observability. If you add logging calls inside agent functions, you've coupled agent logic to the observability framework. Replacing Langfuse with a different tool means touching every agent. The callback pattern keeps that coupling out of your business logic entirely.</p>
<p>The module has four functions with one-way dependencies. Each builds on the previous:</p>
<pre><code class="language-python"># src/observability/langfuse_setup.py

import os


def _langfuse_configured() -&gt; bool:
    """
    Check whether Langfuse credentials are present in the environment.

    Returns False if either key is missing or empty. In that case the
    system runs without observability rather than raising an error.
    """
    public_key = os.getenv("LANGFUSE_PUBLIC_KEY", "").strip()
    secret_key = os.getenv("LANGFUSE_SECRET_KEY", "").strip()
    return bool(public_key and secret_key)
</code></pre>
<p><code>_langfuse_configured()</code> is the guard used by every other function. No credentials means no Langfuse, but the system still runs. This is the graceful degradation pattern: observability is a production enhancement, not a hard dependency.</p>
<pre><code class="language-python">def get_langfuse_handler(session_id: str, user_id: str = "local"):
    """
    Create a Langfuse callback handler for a session, or None if not configured.

    The handler is a LangChain CallbackHandler that Langfuse provides.
    When attached to graph.invoke(), it intercepts every LLM call, tool call,
    and chain invocation automatically. No changes to agent code required.
    """
    if not _langfuse_configured():
        return None

    try:
        from langfuse.langchain import CallbackHandler

        return CallbackHandler(
            public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
            secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
            host=os.getenv("LANGFUSE_HOST", "http://localhost:3000"),
            session_id=session_id,
            user_id=user_id,
            tags=["learning-accelerator", "local-inference"],
            metadata={
                "model":     os.getenv("OLLAMA_MODEL", "qwen2.5:7b"),
                "framework": "langgraph",
            },
        )
    except ImportError:
        print("[Observability] langfuse not installed. Run: pip install langfuse")
        return None
    except Exception as e:
        print(f"[Observability] Failed to create handler: {e}")
        return None
</code></pre>
<p>The <code>session_id</code> passed to <code>CallbackHandler</code> groups all traces from one study session together in the Langfuse UI. Every LLM call, tool invocation, and node execution from that session appears under a single session view. You can follow the complete reasoning chain from goal input to final quiz result.</p>
<p>The <code>tags</code> list appears as filterable labels in Langfuse. If you run multiple projects, <code>"learning-accelerator"</code> lets you filter to just this system's traces.</p>
<pre><code class="language-python">def get_langfuse_config(
    session_id: str,
    user_id: str = "local",
    extra_config: dict | None = None,
) -&gt; dict:
    """
    Build the complete LangGraph run config for a session.

    Merges the checkpoint thread_id with the Langfuse callback handler.
    This is the only function main.py calls. One function, one config dict,
    everything set up.

    Returns a dict ready to pass as `config` to graph.invoke().
    """
    config = {
        "configurable": {"thread_id": session_id},
    }

    if extra_config:
        config.update(extra_config)

    handler = get_langfuse_handler(session_id, user_id)
    if handler:
        config["callbacks"] = [handler]
        print(f"[Observability] Tracing session {session_id} → "
              f"{os.getenv('LANGFUSE_HOST', 'http://localhost:3000')}")
    else:
        print(f"[Observability] Langfuse not configured. Running without tracing.")

    return config
</code></pre>
<p><code>get_langfuse_config</code> merges two concerns into one dict: the <code>thread_id</code> that LangGraph uses for checkpointing, and the <code>callbacks</code> list that LangChain uses to route observability events.</p>
<p>These two keys coexist because <code>graph.invoke(state, config=config)</code> passes the full config to LangGraph, which routes <code>configurable</code> keys to the checkpointer and <code>callbacks</code> to the callback system. Neither system interferes with the other.</p>
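<p>One detail to watch in the function above: <code>config.update(extra_config)</code> is a shallow merge, so an <code>extra_config</code> that carries its own <code>"configurable"</code> section would replace the whole section, <code>thread_id</code> included. A possible guard is sketched below. <code>merge_run_config</code> and the <code>user_tier</code> key are hypothetical, and the Langfuse handler is left out to keep the example self-contained.</p>
<pre><code class="language-python">def merge_run_config(session_id, extra_config=None):
    """Merge extra settings without letting them drop the thread_id."""
    extra = dict(extra_config or {})
    # Pull out any caller-supplied "configurable" section and merge it
    # key by key instead of replacing the whole section.
    extra_configurable = extra.pop("configurable", {})
    config = {"configurable": {**extra_configurable, "thread_id": session_id}}
    config.update(extra)
    return config


config = merge_run_config(
    "a3f1b2c4",
    {"recursion_limit": 50, "configurable": {"user_tier": "beta"}},
)
print(config)
</code></pre>
<p>Placing <code>thread_id</code> last in the merged dict means the session ID always wins, even if the caller passes a conflicting value.</p>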
<pre><code class="language-python">def flush_langfuse() -&gt; None:
    """
    Flush pending traces before process exit.

    Langfuse sends traces in a background thread. Without this call,
    the last few seconds of traces may be lost when the process exits.
    Call this at the end of main.py, after all graph.invoke() calls.
    """
    if not _langfuse_configured():
        return
    try:
        from langfuse import Langfuse
        Langfuse().flush()
    except Exception:
        pass  # Best-effort. Don't crash on exit.
</code></pre>
<p>The <code>flush</code> call matters in practice. Langfuse batches traces and sends them asynchronously. A short-running process like <code>python main.py</code> can exit before the batch is sent. <code>flush()</code> blocks until the queue is empty.</p>
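<p>The mechanics are easy to see in miniature. The sketch below is not Langfuse's implementation. It is a standard-library model of the same idea: events go onto a queue, a daemon thread drains it, and <code>flush()</code> blocks until the queue is empty. All names in it are illustrative.</p>
<pre><code class="language-python">import queue
import threading

sent = []
pending = queue.Queue()


def sender():
    """Background worker: drains the queue the way a tracing SDK batches uploads."""
    while True:
        event = pending.get()
        sent.append(event)      # stand-in for the network call
        pending.task_done()


# daemon=True means the thread is abandoned when the main thread exits.
threading.Thread(target=sender, daemon=True).start()


def record(event):
    pending.put(event)          # returns immediately; nothing is sent yet


def flush():
    pending.join()              # block until every queued event is processed


for name in ("curriculum_planner", "explainer", "quiz_generator"):
    record(name)
flush()
print(sent)   # ['curriculum_planner', 'explainer', 'quiz_generator']
</code></pre>
<p>Without the <code>flush()</code> call, the main thread could reach the end of the script while events are still queued, and the daemon thread would be stopped before sending them.</p>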
<h3 id="heading-63-the-single-integration-point">6.3 The Single Integration Point</h3>
<p>Everything above integrates into <code>main.py</code> in exactly two places:</p>
<pre><code class="language-python"># main.py

from observability.langfuse_setup import get_langfuse_config, flush_langfuse

def run_session(goal: str, session_id: str | None = None) -&gt; None:
    ...
    # One function call replaces: {"configurable": {"thread_id": session_id}}
    # It returns that same dict, plus callbacks if Langfuse is configured.
    config = get_langfuse_config(session_id)

    result = graph.invoke(state, config=config)
    while "__interrupt__" in result:
        ...
        result = graph.invoke(Command(resume=user_input), config=config)

    print_session_summary(result)

    # Flush before exit
    flush_langfuse()
</code></pre>
<p>That's the complete integration. No imports in agent files. No Langfuse calls scattered through the codebase. No conditional checks in node functions. The callback handler intercepts calls at the LangChain framework level. Your agent code is untouched.</p>
<h4 id="heading-what-the-callback-system-captures-automatically">💡 What the callback system captures automatically</h4>
<p>The <code>CallbackHandler</code> hooks into LangChain's callback protocol. Every time a LangChain-compatible object (<code>ChatOllama</code>, a tool, a chain, a graph node) starts or finishes execution, it fires callback events. Langfuse's handler catches these and records them as trace spans.</p>
<p>For this system, that means every <code>llm.invoke()</code> call across all five agents, every <code>TOOL_MAP[name].invoke(args)</code> call in the Explainer's tool-calling loop, every node start and end time, and the full message history at each step are all captured without any code change in the agents.</p>
<h3 id="heading-64-what-you-see-in-the-langfuse-ui">6.4 What You See in the Langfuse UI</h3>
<p>Run a session with Langfuse configured:</p>
<pre><code class="language-bash">python main.py "Learn Python closures"
</code></pre>
<p>Open <a href="http://localhost:3000">http://localhost:3000</a> and navigate to <strong>Traces</strong>. You'll see a trace for your session. Expand it:</p>
<pre><code class="language-plaintext">Session: a3f1b2c4
  ├── curriculum_planner_node       245ms
  │     └── ChatOllama.invoke       238ms
  │           input:  "Create a study roadmap for..."
  │           output: {"goal": "Learn Python closures", "topics": [...]}
  │
  ├── human_approval_node           (interrupted, user input collected)
  │
  ├── explainer_node                4,821ms
  │     ├── ChatOllama.invoke       312ms   → tool_list_files()
  │     ├── tool_list_files         2ms     ← ["closures.md", ...]
  │     ├── ChatOllama.invoke       287ms   → tool_read_file("closures.md")
  │     ├── tool_read_file          1ms     ← "# Python Closures\n..."
  │     ├── ChatOllama.invoke       1,204ms → (no tool calls. final explanation)
  │     └── tool_memory_set         1ms
  │
  ├── quiz_generator_node           8,342ms
  │     ├── ChatOllama.invoke       1,890ms  (question generation)
  │     ├── ChatOllama.invoke       892ms    (grading Q1)
  │     ├── ChatOllama.invoke       874ms    (grading Q2)
  │     └── ChatOllama.invoke       891ms    (grading Q3)
  │
  └── progress_coach_node           1,102ms
        └── ChatOllama.invoke       1,088ms
</code></pre>
<p>There are three things this trace tells you immediately that no infrastructure metric would reveal.</p>
<ol>
<li><p><strong>Latency breakdown by agent.</strong> The Quiz Generator takes 8 seconds across four LLM calls. If you need to optimise latency, the grading calls are the target: three calls at ~900ms each, potentially parallelisable.</p>
</li>
<li><p><strong>Tool call sequence.</strong> The Explainer called <code>tool_list_files</code>, then <code>tool_read_file</code>, then wrote to memory, in the right order. If the sequence is wrong, you see it here before you look at any code.</p>
</li>
<li><p><strong>LLM input and output at every step.</strong> If the Curriculum Planner produces a malformed roadmap, you see the raw LLM output in the trace. If the grader gives an incorrect score, you see what it received and what it returned.</p>
</li>
</ol>
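<p>To put a number on the first point, the arithmetic below uses the three grading latencies from the trace. The second half is a sketch of how the grading calls could be fanned out with a thread pool. <code>grade</code> is a hypothetical stand-in for the real LLM grader, and a local model server may still process requests one at a time, so treat the parallel figure as a best case.</p>
<pre><code class="language-python">from concurrent.futures import ThreadPoolExecutor

# Grading latencies in milliseconds, taken from the trace above.
grading_ms = [892, 874, 891]

sequential_ms = sum(grading_ms)      # calls run one after another
parallel_ms = max(grading_ms)        # best case: bounded by the slowest call
print(sequential_ms, parallel_ms, sequential_ms - parallel_ms)   # 2657 892 1765


def grade(answer):
    """Hypothetical grader; the real one would call the LLM."""
    return {"answer": answer, "score": 1.0 if answer.strip() else 0.0}


answers = ["Functions are first-class objects", "", "A closure captures variables"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() preserves input order, so results line up with the questions.
    results = list(pool.map(grade, answers))
print([r["score"] for r in results])   # [1.0, 0.0, 1.0]
</code></pre>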
<h3 id="heading-65-graceful-degradation">6.5 Graceful Degradation</h3>
<p>The system is designed to run identically with and without Langfuse. If you don't set the environment variables, <code>_langfuse_configured()</code> returns <code>False</code> and <code>get_langfuse_config</code> returns the minimal config with only <code>thread_id</code>:</p>
<pre><code class="language-python"># Without Langfuse configured
config = get_langfuse_config("a3f1b2c4")
# Returns: {"configurable": {"thread_id": "a3f1b2c4"}}

# With Langfuse configured
config = get_langfuse_config("a3f1b2c4")
# Returns: {"configurable": {"thread_id": "a3f1b2c4"},
#           "callbacks": [&lt;CallbackHandler&gt;]}
</code></pre>
<p>The agent nodes receive neither version of this config. They only receive <code>state</code>. The config is consumed by LangGraph and LangChain infrastructure, not by your business logic.</p>
<p>This is the right production pattern. Observability infrastructure should fail silently and degrade gracefully. An outage in your tracing backend shouldn't take down your application.</p>
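<p>The pattern can be sketched in a few lines. This is a simplified illustration, not the project's actual module: <code>langfuse_configured</code>, <code>build_config</code>, and <code>handler_factory</code> are hypothetical names, and the sketch assumes the credentials live in the <code>LANGFUSE_PUBLIC_KEY</code> and <code>LANGFUSE_SECRET_KEY</code> environment variables.</p>
<pre><code class="language-python">import os


def langfuse_configured():
    """True only when both credentials are present (assumed variable names)."""
    return bool(os.getenv("LANGFUSE_PUBLIC_KEY") and os.getenv("LANGFUSE_SECRET_KEY"))


def build_config(thread_id, handler_factory=None):
    """Always include thread_id; attach a tracing callback only if one can be built."""
    config = {"configurable": {"thread_id": thread_id}}
    if handler_factory is None or not langfuse_configured():
        return config
    try:
        config["callbacks"] = [handler_factory()]
    except Exception:
        # Tracing is optional: a broken backend must not break the application
        pass
    return config


# No factory is passed here, so only thread_id is set
print(build_config("a3f1b2c4"))
</code></pre>
<p>The key design choice is that every failure path falls back to the same minimal config, so callers never need to branch on whether tracing is available.</p>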
<h3 id="heading-66-run-the-observability-tests">6.6 Run the Observability Tests</h3>
<pre><code class="language-bash">pytest tests/test_observability.py -v
</code></pre>
<p>Expected: 16 tests passing, no Langfuse server required. The tests mock the <code>_langfuse_configured</code> check and verify:</p>
<ul>
<li><p><code>get_langfuse_config</code> always includes <code>thread_id</code> in <code>configurable</code></p>
</li>
<li><p>No <code>callbacks</code> key appears when Langfuse is not configured</p>
</li>
<li><p><code>flush_langfuse</code> is a no-op when credentials are missing</p>
</li>
<li><p><code>get_langfuse_handler</code> returns <code>None</code> on <code>ImportError</code> without raising</p>
</li>
</ul>
<p>Because the Langfuse check is mocked, these tests exercise only the integration logic: that the module behaves correctly in both the configured and unconfigured states.</p>
<p>The enterprise connection: production multi-agent systems in regulated industries use observability for compliance as much as debugging. Langfuse traces provide an auditable record of every LLM call (input, output, timestamp, session ID) that can be exported for regulatory review. The same trace that helps you debug a wrong quiz score can demonstrate to an auditor what the model was given and what it produced.</p>
<p>In the next chapter, you'll add automated quality evaluation: DeepEval running LLM-as-judge tests that verify the Explainer's output is faithful to your notes, and the Quiz Generator's questions are relevant to the topic.</p>
<h2 id="heading-chapter-7-evaluating-agent-quality-with-deepeval">Chapter 7: Evaluating Agent Quality with DeepEval</h2>
<p>Observability tells you what happened. Evaluation tells you whether what happened was any good.</p>
<p>A multi-agent system can run to completion with no errors while still producing explanations that hallucinate facts, questions that test the wrong thing, and grading that scores incorrect answers as correct.</p>
<p>These failures are invisible to infrastructure metrics. They're invisible to most unit tests. The only reliable way to catch them is to evaluate the LLM's outputs using another LLM as the judge.</p>
<p>This chapter adds automated quality evaluation using DeepEval with a custom <code>OllamaJudge</code> class. All evaluation runs locally. No cloud API keys, no per-evaluation cost.</p>
<h3 id="heading-71-llm-as-judge-evaluation">7.1 LLM-as-Judge Evaluation</h3>
<p>LLM-as-judge is the pattern of using one LLM call to evaluate the output of another. Given an explanation the Explainer produced, a judge model reads the explanation and the source notes and answers a structured question: "Is every claim in this explanation supported by the notes?"</p>
<p>This isn't a perfect evaluation, because the judge model can also be wrong. But for the kind of qualitative assessment that matters here (is the explanation faithful? are the questions relevant? is the grading fair?), a carefully prompted LLM judge generally catches problems that rule-based heuristics miss, and it is far more practical than human review at scale.</p>
<p>DeepEval provides the evaluation framework. It handles the judge prompt construction, scoring rubrics, and metric aggregation. You provide the test cases and optionally a custom model.</p>
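<p>To make the pattern concrete, here is a minimal sketch of a judge call written by hand. It is not DeepEval's prompt or parser: the prompt wording, the <code>judge</code> helper, and the <code>fake_llm</code> stand-in are all assumptions made for illustration. The point is the shape of the flow: build a structured prompt, call a model, and parse a machine-readable verdict.</p>
<pre><code class="language-python">import json


def build_judge_prompt(explanation, notes):
    """Ask for a machine-readable verdict so the reply can be parsed."""
    return (
        "You are grading an explanation against source notes.\n"
        f"NOTES:\n{notes}\n\n"
        f"EXPLANATION:\n{explanation}\n\n"
        'Reply with JSON only: {"supported": true or false, "reason": "..."}'
    )


def judge(explanation, notes, llm):
    """Call the judge model and parse its verdict; treat unusable replies as a failure."""
    reply = llm(build_judge_prompt(explanation, notes))
    try:
        verdict = json.loads(reply)
    except json.JSONDecodeError:
        verdict = None
    if not isinstance(verdict, dict):
        return {"supported": False, "reason": "judge reply was not a JSON object"}
    return {
        "supported": bool(verdict.get("supported", False)),
        "reason": str(verdict.get("reason", "")),
    }


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call."""
    return '{"supported": true, "reason": "every claim appears in the notes"}'


print(judge("A closure remembers enclosing variables.", "Closures capture scope.", fake_llm))
</code></pre>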
<h3 id="heading-72-the-ollamajudge-class">7.2 The OllamaJudge Class</h3>
<p>DeepEval uses OpenAI by default. To keep evaluation local, you subclass <code>DeepEvalBaseLLM</code> and wire it to your Ollama instance:</p>
<pre><code class="language-python"># tests/test_eval.py

import os
from deepeval.models import DeepEvalBaseLLM
from langchain_ollama import ChatOllama


class OllamaJudge(DeepEvalBaseLLM):
    """
    Custom judge model using local Ollama.

    DeepEval supports custom models via the DeepEvalBaseLLM interface.
    We wrap ChatOllama to provide synchronous and async generation.

    The judge runs at temperature=0.0 for consistency. The same answer
    evaluated twice should produce the same score.
    """

    def __init__(self):
        self.model_name = os.getenv("OLLAMA_MODEL", "qwen2.5:7b")
        self.base_url   = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

    def load_model(self):
        return ChatOllama(
            model=self.model_name,
            base_url=self.base_url,
            temperature=0.0,   # Deterministic for evaluation
        )

    def generate(self, prompt: str) -&gt; str:
        return self.load_model().invoke(prompt).content

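    async def a_generate(self, prompt: str) -&gt; str:
        # Use the async API so the judge call doesn't block the event loop
        response = await self.load_model().ainvoke(prompt)
        return response.content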

    def get_model_name(self) -&gt; str:
        return f"ollama/{self.model_name}"


def get_judge_model():
    """Return an OllamaJudge, or None if deepeval is not installed."""
    try:
        return OllamaJudge()
    except ImportError:
        return None
</code></pre>
<p><code>temperature=0.0</code> on the judge is a deliberate choice. You want evaluation to be stable: run the same test twice and get the same score. A higher temperature introduces variance that makes it hard to tell whether a score change reflects a real quality change or random sampling.</p>
<h3 id="heading-73-the-two-tier-test-strategy">7.3 The Two-tier Test Strategy</h3>
<p>The test suite uses two tiers with different execution profiles.</p>
<p><strong>Unit tests</strong> are fast, require no Ollama, and run on every code change. These verify the structural contracts: does <code>generate_questions</code> return a list of dicts with the right keys? Does <code>grade_answer</code> always return a dict with <code>correct</code>, <code>score</code>, and <code>feedback</code>? Does <code>get_coaching_message</code> always return <code>summary</code> and <code>encouragement</code>?</p>
<p><strong>Eval tests</strong> are slow (30 to 120 seconds each), require Ollama running, and run before significant changes or releases. These verify quality: is the Explainer's output faithful to the notes? Do the grader's scores track with actual answer quality?</p>
<p>The separation is enforced in two places. First, <code>pyproject.toml</code> adds <code>addopts = "-m 'not eval'"</code> so <code>pytest tests/</code> skips eval tests by default:</p>
<pre><code class="language-toml">[tool.pytest.ini_options]
pythonpath = ["src"]
testpaths  = ["tests"]
asyncio_mode = "auto"
addopts    = "-m 'not eval'"
markers = [
    "unit: fast tests, no external dependencies",
    "eval: slow evaluation tests requiring Ollama (LLM-as-judge)",
]
</code></pre>
<p>Second, every eval test class and function is decorated with <code>@pytest.mark.eval</code>:</p>
<pre><code class="language-python">@pytest.mark.eval
class TestExplainerQuality:
    ...
</code></pre>
<p>Running eval tests explicitly:</p>
<pre><code class="language-bash">pytest tests/test_eval.py -m eval -v -s
</code></pre>
<p>The <code>-s</code> flag disables output capture so you can see the model's scores and reasoning in real time.</p>
<h3 id="heading-74-shared-fixtures-in-conftestpy">7.4 Shared Fixtures in <code>conftest.py</code></h3>
<p><code>tests/conftest.py</code> holds fixtures shared across all test files:</p>
<pre><code class="language-python"># tests/conftest.py

import sys
from pathlib import Path
import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))


def pytest_configure(config):
    """Register custom markers so pytest doesn't warn about unknown marks."""
    config.addinivalue_line(
        "markers",
        "eval: marks tests requiring Ollama (deselect with -m 'not eval')"
    )
    config.addinivalue_line(
        "markers",
        "unit: marks fast tests with no external dependencies"
    )


@pytest.fixture
def sample_roadmap():
    """A minimal StudyRoadmap for use in unit tests."""
    from graph.state import StudyRoadmap, Topic
    return StudyRoadmap(
        goal="Learn Python closures",
        total_weeks=2,
        topics=[
            Topic(
                title="Closures Explained",
                description="Understand how closures capture enclosing scope variables",
                estimated_minutes=60,
            ),
            Topic(
                title="Practical Closure Patterns",
                description="Apply closures to real problems: factories, memoisation",
                estimated_minutes=45,
                prerequisites=["Closures Explained"],
            ),
        ],
    )


@pytest.fixture
def sample_state(sample_roadmap):
    """A minimal AgentState dict for use in unit tests."""
    from graph.state import initial_state
    state = initial_state("Learn Python closures", "test-session-001")
    state["roadmap"] = sample_roadmap
    state["current_topic_index"] = 0
    return state


@pytest.fixture
def closures_note_content():
    """
    The content of closures.md, used as retrieval context in faithfulness tests.
    Falls back to an inline summary if the file doesn't exist.
    """
    notes_path = (
        Path(__file__).parent.parent
        / "study_materials/sample_notes/closures.md"
    )
    if notes_path.exists():
        return notes_path.read_text(encoding="utf-8")
    return (
        "A closure is a nested function that remembers variables from its "
        "enclosing scope even after the enclosing function returns."
    )
</code></pre>
<p>The <code>closures_note_content</code> fixture is the retrieval context for faithfulness tests. DeepEval's <code>FaithfulnessMetric</code> asks the judge to verify each claim in the explanation against this content. If the Explainer invents a fact not present in the notes, the metric catches it.</p>
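<p>Conceptually, a faithfulness score is roughly the share of claims in the output that the judge finds supported by the retrieval context. The sketch below illustrates that ratio only. It is a simplified stand-in, not DeepEval's implementation: the verdict list shape and the choice to return 1.0 when there are no claims are assumptions made for this example.</p>
<pre><code class="language-python">def faithfulness_score(verdicts):
    """Fraction of claims the judge marked as supported by the context."""
    if not verdicts:
        # No claims means nothing was contradicted (a choice made for this sketch)
        return 1.0
    supported = sum(1 for v in verdicts if v["supported"])
    return supported / len(verdicts)


verdicts = [
    {"claim": "A closure is a nested function.", "supported": True},
    {"claim": "It remembers variables from the enclosing scope.", "supported": True},
    {"claim": "Closures were added in Python 3.9.", "supported": False},
]

# Two of the three claims are supported by the notes
print(round(faithfulness_score(verdicts), 3))
</code></pre>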
<h3 id="heading-75-the-explainer-quality-tests">7.5 The Explainer Quality Tests</h3>
<p>The eval tests for the Explainer answer two questions: is the output faithful to the notes, and is it relevant to what was asked?</p>
<pre><code class="language-python"># tests/test_eval.py

import pytest

def run_explainer(topic_title: str, topic_description: str, session_id: str) -&gt; str:
    """Run the Explainer agent and return its final explanation text."""
    from graph.state import StudyRoadmap, Topic, initial_state
    from agents.explainer import explainer_node
    from langchain_core.messages import AIMessage

    state = initial_state(f"Learn {topic_title}", session_id)
    state["roadmap"] = StudyRoadmap(
        goal=f"Learn {topic_title}",
        total_weeks=1,
        topics=[
            Topic(
                title=topic_title,
                description=topic_description,
                estimated_minutes=60,
            )
        ],
    )
    state["current_topic_index"] = 0

    result = explainer_node(state)

    # Extract the final response: last AIMessage with no tool_calls
    for msg in reversed(result.get("messages", [])):
        if (isinstance(msg, AIMessage) and msg.content
                and not getattr(msg, "tool_calls", None)):
            return msg.content
    return ""


@pytest.mark.eval
class TestExplainerQuality:

    FAITHFULNESS_THRESHOLD = 0.6
    RELEVANCY_THRESHOLD    = 0.6

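    _cached_explanation = None  # class-level cache shared by every test in the class

    @pytest.fixture(autouse=True)
    def setup(self, closures_note_content):
        """Run the Explainer once, reuse the output across all tests in this class."""
        cls = type(self)
        if cls._cached_explanation is None:
            # Function-scoped fixtures run before every test, so cache the
            # first result on the class instead of calling the Explainer again.
            cls._cached_explanation = run_explainer(
                topic_title="Closures Explained",
                topic_description="Understand how closures capture enclosing scope variables",
                session_id="eval-test-001",
            )
        self.retrieval_context = [closures_note_content]
        self.explanation = cls._cached_explanation
        if not self.explanation:
            pytest.skip("Explainer returned empty output. Check Ollama is running.")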

    def test_explanation_is_faithful_to_notes(self):
        """
        The explanation should not hallucinate facts not in the source notes.

        FaithfulnessMetric asks the judge: is every claim in the output
        supported by the retrieval context (the notes)?
        A low score means the agent is making things up.
        """
        from deepeval.test_case import LLMTestCase
        from deepeval.metrics import FaithfulnessMetric

        judge = get_judge_model()
        if judge is None:
            pytest.skip("Could not initialise judge model")

        test_case = LLMTestCase(
            input="Explain Python closures",
            actual_output=self.explanation,
            retrieval_context=self.retrieval_context,
        )
        metric = FaithfulnessMetric(
            model=judge,
            threshold=self.FAITHFULNESS_THRESHOLD,
            include_reason=True,
        )
        metric.measure(test_case)

        print(f"\n[Faithfulness] Score: {metric.score:.3f}")
        if hasattr(metric, "reason"):
            print(f"[Faithfulness] Reason: {metric.reason}")

        assert metric.score &gt;= self.FAITHFULNESS_THRESHOLD, (
            f"Faithfulness {metric.score:.3f} below {self.FAITHFULNESS_THRESHOLD}.\n"
            f"The explanation may contain hallucinated facts.\n"
            f"Reason: {getattr(metric, 'reason', 'not available')}"
        )

    def test_explanation_is_relevant_to_topic(self):
        """The explanation should address what was actually asked."""
        from deepeval.test_case import LLMTestCase
        from deepeval.metrics import AnswerRelevancyMetric

        judge = get_judge_model()
        if judge is None:
            pytest.skip("Could not initialise judge model")

        test_case = LLMTestCase(
            input="Explain Python closures",
            actual_output=self.explanation,
        )
        metric = AnswerRelevancyMetric(
            model=judge,
            threshold=self.RELEVANCY_THRESHOLD,
        )
        metric.measure(test_case)

        print(f"\n[Relevancy] Score: {metric.score:.3f}")

        assert metric.score &gt;= self.RELEVANCY_THRESHOLD, (
            f"Relevancy {metric.score:.3f} below {self.RELEVANCY_THRESHOLD}.\n"
            f"The explanation may have wandered off-topic."
        )
</code></pre>
<p>The <code>autouse=True</code> fixture in <code>TestExplainerQuality</code> runs the Explainer once and reuses the output across both tests. This avoids making two separate LLM calls (one per test) when the same explanation can serve both metrics.</p>
<h3 id="heading-76-the-grading-quality-tests">7.6 The Grading Quality Tests</h3>
<p>These tests verify that the grader's scores track with actual answer quality. They don't need DeepEval metrics. They call <code>grade_answer</code> directly and assert score ranges:</p>
<pre><code class="language-python">@pytest.mark.eval
class TestGradingQuality:

    def test_correct_answer_scores_high(self):
        """A clearly correct answer should score &gt;= 0.65."""
        from agents.quiz_generator import grade_answer

        result = grade_answer(
            question="What are the three requirements for a Python closure?",
            expected=(
                "A closure requires: 1) a nested inner function, "
                "2) the inner function references a variable from the enclosing scope, "
                "3) the enclosing function returns the inner function."
            ),
            student_answer=(
                "You need a nested function that uses variables from the outer "
                "function's scope, and the outer function has to return the inner function."
            ),
        )
        print(f"\n[GradeQuality] Correct answer: {result.get('score', 0):.2f}")
        assert result.get("score", 0) &gt;= 0.65, (
            f"Correct answer scored too low: {result.get('score', 0):.2f}\n"
            f"Feedback: {result.get('feedback', '')}"
        )

    def test_wrong_answer_scores_low(self):
        """A clearly wrong answer should score &lt;= 0.35."""
        from agents.quiz_generator import grade_answer

        result = grade_answer(
            question="What is a Python closure?",
            expected=(
                "A closure is a nested function that captures and remembers "
                "variables from its enclosing scope after the enclosing function returns."
            ),
            student_answer=(
                "A closure is a class that closes over its attributes "
                "and prevents external access to them."
            ),
        )
        print(f"\n[GradeQuality] Wrong answer: {result.get('score', 0):.2f}")
        assert result.get("score", 0) &lt;= 0.35, (
            f"Wrong answer scored too high: {result.get('score', 0):.2f}\n"
            f"The grader may be too lenient."
        )

    def test_partial_answer_scores_middle(self):
        """A partially correct answer should score between 0.3 and 0.75."""
        from agents.quiz_generator import grade_answer

        result = grade_answer(
            question="What is late binding in closures and how do you fix it?",
            expected=(
                "Late binding means closures look up variable values at call time, "
                "not at definition time. Fix: use default argument values "
                "(lambda i=i: i instead of lambda: i)."
            ),
            student_answer=(
                "Late binding means the closure uses the variable's current value "
                "when called, not when defined."  # Knows what, not how to fix
            ),
        )
        score = result.get("score", 0)
        print(f"\n[GradeQuality] Partial answer: {score:.2f}")
        assert 0.3 &lt;= score &lt;= 0.75, (
            f"Partial answer should score 0.3 to 0.75, got {score:.2f}"
        )
</code></pre>
<p>These three tests together give you calibration confidence: the grader rewards correct answers, penalises wrong ones, and gives appropriate partial credit. If any of the three fails after a model change or prompt update, you know immediately which direction the grader drifted.</p>
<h3 id="heading-77-the-coaching-quality-test">7.7 The Coaching Quality Test</h3>
<p>The coaching test uses DeepEval's <code>GEval</code> metric, a general-purpose evaluator where you write your own evaluation criteria in plain English:</p>
<pre><code class="language-python">@pytest.mark.eval
class TestProgressCoachQuality:

    COACHING_QUALITY_THRESHOLD = 0.6

    def test_coaching_message_is_encouraging_and_specific(self):
        """
        Coaching messages should be warm, specific, and actionable.

        GEval lets you write evaluation criteria in plain English.
        The judge scores the output 0.0 to 1.0 against those criteria.
        """
        from deepeval.test_case import LLMTestCase, LLMTestCaseParams
        from deepeval.metrics import GEval
        from agents.progress_coach import get_coaching_message

        judge = get_judge_model()
        if judge is None:
            pytest.skip("Could not initialise judge model")

        coaching = get_coaching_message(
            topic="Python Closures",
            score=0.67,
            weak_areas=["late binding", "nonlocal keyword"],
        )
        coaching_text = (
            f"Summary: {coaching.get('summary', '')}\n"
            f"Encouragement: {coaching.get('encouragement', '')}"
        )

        test_case = LLMTestCase(
            input=(
                "Generate coaching feedback for a student who scored 67% on "
                "Python Closures and struggled with late binding and nonlocal"
            ),
            actual_output=coaching_text,
        )
        metric = GEval(
            name="CoachingQuality",
            criteria=(
                "Evaluate whether this coaching message is: "
                "1) Encouraging without being dishonest about the score, "
                "2) Specific to the topic and weak areas mentioned, "
                "3) Actionable. Gives the student a clear next step. "
                "4) Concise. 2 to 4 sentences total. "
                "A poor message is generic, vague, or condescending."
            ),
            evaluation_params=[
                # The judge needs the input to check topic and weak-area specificity
                LLMTestCaseParams.INPUT,
                LLMTestCaseParams.ACTUAL_OUTPUT,
            ],
            model=judge,
            threshold=self.COACHING_QUALITY_THRESHOLD,
        )
        metric.measure(test_case)

        print(f"\n[CoachingQuality] Score: {metric.score:.3f}")

        assert metric.score &gt;= self.COACHING_QUALITY_THRESHOLD, (
            f"Coaching quality {metric.score:.3f} below threshold.\n"
            f"Message:\n{coaching_text}"
        )
</code></pre>
<p><code>GEval</code> is the most flexible metric DeepEval offers. You describe what "good" looks like in plain language, and the judge scores against those criteria. Use it when you have qualitative requirements that are hard to express as a formula but easy to describe in words.</p>
<h3 id="heading-78-run-the-evaluation-suite">7.8 Run the Evaluation Suite</h3>
<p>Unit tests (fast, no Ollama):</p>
<pre><code class="language-bash">pytest tests/ -v
# 184 tests, eval tests automatically excluded
</code></pre>
<p>Eval tests (slow, Ollama required):</p>
<pre><code class="language-bash">pytest tests/test_eval.py -m eval -v -s
</code></pre>
<p>You'll see output like:</p>
<pre><code class="language-plaintext">[TestExplainerQuality] Running Explainer for closures topic...
[TestExplainerQuality] Explanation length: 1,847 chars

[Faithfulness] Score: 0.782 (threshold: 0.600)
[Faithfulness] Reason: All major claims trace back to the closures.md source material.
PASSED

[Relevancy] Score: 0.841
PASSED

[GradeQuality] Correct answer: 0.82
PASSED

[GradeQuality] Wrong answer: 0.15
PASSED

[GradeQuality] Partial answer: 0.55
PASSED

[CoachingQuality] Score: 0.731
PASSED
</code></pre>
<h4 id="heading-setting-thresholds-conservatively">💡 Setting thresholds conservatively</h4>
<p>As a rough guide, local 7B models tend to land around 0.6 to 0.8 on faithfulness and relevancy metrics, while larger cloud models tend to land around 0.8 to 0.95. The thresholds in these tests are set at 0.6: low enough to pass reliably with a local model, high enough to catch significant degradation.</p>
<p>If you upgrade to a larger model and want stricter quality gates, raise the thresholds. If a test is consistently failing with a model that produces good output subjectively, lower the threshold and document why.</p>
<p>The enterprise connection: an evaluation suite like this is how you manage the model update problem in production. When you swap from one model version to another, run the eval tests before deploying.</p>
<p>If faithfulness drops below threshold, the model change introduces hallucination risk. Roll it back. If the grader starts scoring correct answers too low, that grading drift will hurt the student experience. The eval tests are your regression suite for LLM behaviour, the same way unit tests are your regression suite for code logic.</p>
<p>In the next chapter, you'll add the A2A protocol layer. The Quiz Generator becomes a standalone service that any agent or framework can call, and a CrewAI agent joins the system that the Progress Coach delegates to when a student needs supplementary help.</p>
<h2 id="heading-chapter-8-cross-framework-coordination-with-a2a">Chapter 8: Cross-Framework Coordination with A2A</h2>
<p>Every agent in the system so far is a Python function that LangGraph calls. That's fine, and for most production systems, keeping everything in one framework is the right choice.</p>
<p>But real infrastructure sometimes requires something different: an agent built with a different framework, maintained by a different team, deployed independently, and callable by anything that speaks HTTP.</p>
<p>The Agent-to-Agent (A2A) protocol makes this possible. A2A is an open standard (built on JSON-RPC 2.0 and HTTP) that gives any agent a standard way to advertise what it can do and accept tasks from any caller, regardless of what framework the caller uses.</p>
<p>A LangGraph agent and a CrewAI agent that have never heard of each other can coordinate through A2A the same way two REST services coordinate through HTTP.</p>
<p>This chapter adds two A2A services to the system: the Quiz Generator exposed as a standalone service, and a CrewAI Study Buddy that the Progress Coach calls when a student needs a different explanation angle.</p>
<h3 id="heading-81-how-a2a-works">8.1 How A2A Works</h3>
<p>A2A has three concepts worth understanding before writing any code.</p>
<p><strong>The Agent Card</strong> is a JSON document served at <code>/.well-known/agent-card.json</code>. It describes what the agent can do: its name, capabilities, skills, and how to send it tasks.</p>
<p>Any A2A client fetches this first to discover whether the agent can handle its request. The Agent Card is the agent's public API contract, analogous to an OpenAPI spec for a REST service.</p>
<p><strong>Task submission</strong> uses a single endpoint: <code>POST /tasks/send</code>. The request is a JSON-RPC 2.0 envelope wrapping a message: a role (<code>"user"</code>) and a list of parts (typically one <code>TextPart</code> with JSON content). The agent processes the task and responds with a message in the same format.</p>
<p><strong>Framework independence</strong> is the point. The A2A server handles all the HTTP and protocol mechanics. Your agent code goes in an <code>AgentExecutor</code> subclass: an <code>execute()</code> method that receives the parsed request and emits the response. The framework building the executor (LangGraph, CrewAI, or anything else) never appears in the protocol layer. Callers see only HTTP.</p>
<pre><code class="language-plaintext">Caller (any framework)
  ↓  GET /.well-known/agent-card.json   ← discover capabilities
  ↓  POST /tasks/send                   ← submit task (JSON-RPC 2.0)
  ↑  response with result artifacts
A2A Server (Starlette + uvicorn)
  ↓  calls AgentExecutor.execute()
Your agent logic (LangGraph / CrewAI / anything)
</code></pre>
<h3 id="heading-82-the-quiz-generator-as-an-a2a-service">8.2 The Quiz Generator as an A2A Service</h3>
<p><code>src/a2a_services/quiz_service.py</code> wraps <code>generate_questions</code> and <code>grade_answer</code> (the same functions used in Chapter 4) as an A2A service. Nothing in those functions changes.</p>
<p><strong>The Agent Card</strong> first:</p>
<pre><code class="language-python"># src/a2a_services/quiz_service.py

from a2a.types import AgentCapabilities, AgentCard, AgentSkill

QUIZ_SKILL = AgentSkill(
    id="generate_and_grade_quiz",
    name="Generate and Grade Quiz",
    description=(
        "Given a topic and optional explanation text, generates quiz questions "
        "that test conceptual understanding. If answers are provided, grades "
        "each answer and returns scores with identified weak areas."
    ),
    tags=["quiz", "assessment", "education", "grading"],
    examples=[
        "Generate a quiz on Python closures",
        "Grade these answers for a decorators quiz",
    ],
)

QUIZ_AGENT_CARD = AgentCard(
    name="Quiz Generator Service",
    description=(
        "Generates and grades quizzes using LLM-as-judge. "
        "Framework-agnostic: works with any A2A-compatible agent."
    ),
    url="http://localhost:9001/",
    version="1.0.0",
    defaultInputModes=["text"],
    defaultOutputModes=["text"],
    capabilities=AgentCapabilities(streaming=False),
    skills=[QUIZ_SKILL],
)
</code></pre>
<p>The Agent Card is served automatically at <code>GET /.well-known/agent-card.json</code> by the A2A framework. You don't write a handler for it.</p>
<p><strong>The AgentExecutor</strong> contains the actual quiz logic. It receives the parsed A2A request, calls <code>generate_questions</code> and optionally <code>grade_answer</code>, and emits the result:</p>
<pre><code class="language-python">import asyncio
import json

from a2a.server.agent_execution import AgentExecutor, RequestContext
from a2a.server.events import EventQueue
from a2a.types import Message, TextPart
from agents.quiz_generator import generate_questions, grade_answer


class QuizAgentExecutor(AgentExecutor):
    """
    Handles incoming A2A quiz tasks.

    Request format (JSON in the TextPart):
    {
        "topic":       "Python Closures",
        "explanation": "A closure is...",   (optional)
        "answers":     ["answer 1", ...]    (optional. omit for questions only)
    }
    """

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -&gt; None:
        # Parse request
        request_text = ""
        for part in context.current_request.params.message.parts:
            if isinstance(part, TextPart):
                request_text += part.text

        try:
            request_data = json.loads(request_text)
        except json.JSONDecodeError:
            request_data = {"topic": request_text}

        topic             = request_data.get("topic", "General Knowledge")
        explanation       = request_data.get("explanation", "")
        provided_answers  = request_data.get("answers", [])

        # Generate questions (synchronous blocking call in thread pool)
        questions_data = await asyncio.to_thread(
            generate_questions, topic, explanation, 3
        )

        if not provided_answers:
            # No answers. Return questions only.
            result = {
                "status":    "questions_ready",
                "topic":     topic,
                "questions": questions_data,
            }
        else:
            # Grade provided answers
            graded     = []
            total      = 0.0
            weak_areas = []

            for q_data, answer in zip(questions_data, provided_answers):
                grade = await asyncio.to_thread(
                    grade_answer,
                    q_data["question"],
                    q_data["expected_answer"],
                    answer,
                )
                score = float(grade.get("score", 0.0))
                total += score
                if grade.get("missing_concept"):
                    weak_areas.append(grade["missing_concept"])
                graded.append({
                    "question": q_data["question"],
                    "answer":   answer,
                    "score":    score,
                    "correct":  bool(grade.get("correct", False)),
                    "feedback": grade.get("feedback", ""),
                })

            result = {
                "status":           "graded",
                "topic":            topic,
                "score":            total / len(questions_data) if questions_data else 0.0,
                "questions":        questions_data,
                "graded_questions": graded,
                "weak_areas":       list(set(weak_areas)),
            }

        # Emit result. A2A sends this back to the caller.
        await event_queue.enqueue_event(
            Message(
                role="agent",
                parts=[TextPart(text=json.dumps(result, indent=2))],
            )
        )

    async def cancel(self, context: RequestContext, event_queue: EventQueue) -&gt; None:
        pass
</code></pre>
<p><code>asyncio.to_thread</code> wraps the synchronous <code>generate_questions</code> and <code>grade_answer</code> calls. The A2A executor is async. It runs in an event loop. Calling a blocking function directly would freeze the loop and block all other tasks. <code>to_thread</code> runs the blocking function in a thread pool and awaits the result without blocking the event loop.</p>
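<p>A small self-contained sketch shows the effect. Here <code>blocking_call</code> is a hypothetical stand-in for a synchronous LLM call, and <code>heartbeat</code> is a second task that only makes progress if the event loop stays free. The names and timings are assumptions chosen for illustration.</p>
<pre><code class="language-python">import asyncio
import time


def blocking_call(seconds):
    """Hypothetical stand-in for a synchronous LLM call."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, count, interval):
    """Record a tick every `interval` seconds to show the loop is still running."""
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.perf_counter())


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks, 3, 0.02))
    # The blocking call runs in a worker thread, so the heartbeat keeps ticking
    result = await asyncio.to_thread(blocking_call, 0.2)
    ticks_during_call = len(ticks)
    await beat
    return result, ticks_during_call


if __name__ == "__main__":
    print(asyncio.run(main()))
</code></pre>
<p>If you replace the <code>to_thread</code> line with a direct <code>blocking_call(0.2)</code>, the heartbeat task gets no chance to run until the call returns, so no ticks are recorded during the call.</p>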
<p><strong>Starting the server:</strong></p>
<pre><code class="language-python">import uvicorn

from a2a.server.apps import A2AStarletteApplication
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore

def create_quiz_server():
    handler = DefaultRequestHandler(
        agent_executor=QuizAgentExecutor(),
        task_store=InMemoryTaskStore(),
    )
    app = A2AStarletteApplication(
        agent_card=QUIZ_AGENT_CARD,
        http_handler=handler,
    )
    return app.build()

if __name__ == "__main__":
    uvicorn.run(create_quiz_server(), host="0.0.0.0", port=9001, log_level="warning")
</code></pre>
<pre><code class="language-bash">python src/a2a_services/quiz_service.py
# [Quiz A2A Service] Starting on http://localhost:9001
# [Quiz A2A Service] Agent Card: http://localhost:9001/.well-known/agent-card.json
</code></pre>
<p>Verify it's running:</p>
<pre><code class="language-bash">curl http://localhost:9001/.well-known/agent-card.json
</code></pre>
<pre><code class="language-json">{
  "name": "Quiz Generator Service",
  "description": "Generates and grades quizzes...",
  "url": "http://localhost:9001/",
  "skills": [
    {
      "id": "generate_and_grade_quiz",
      "name": "Generate and Grade Quiz"
    }
  ]
}
</code></pre>
<h3 id="heading-83-the-a2a-client">8.3 The A2A Client</h3>
<p><code>src/a2a_services/a2a_client.py</code> keeps the HTTP and protocol details out of agent code. The Progress Coach never constructs JSON-RPC envelopes. It calls <code>delegate_quiz_task</code> and gets a result dict back.</p>
<pre><code class="language-python"># src/a2a_services/a2a_client.py

import json
import os
import uuid

import httpx

QUIZ_SERVICE_URL  = os.getenv("QUIZ_SERVICE_URL",  "http://localhost:9001")
STUDY_BUDDY_URL   = os.getenv("STUDY_BUDDY_URL",   "http://localhost:9002")
DEFAULT_TIMEOUT   = 120.0


def discover_agent(base_url: str) -&gt; dict:
    """Fetch an Agent Card to discover capabilities. Returns {} if unreachable."""
    card_url = f"{base_url.rstrip('/')}/.well-known/agent-card.json"
    try:
        response = httpx.get(card_url, timeout=5.0)
        response.raise_for_status()
        return response.json()
    except Exception as e:
        print(f"[A2A Client] Cannot reach {card_url}: {e}")
        return {}


def send_task(
    base_url: str,
    message_text: str,
    task_id: str | None = None,
    timeout: float = DEFAULT_TIMEOUT,
) -&gt; dict:
    """
    Submit a task to an A2A agent via JSON-RPC 2.0.

    The JSON-RPC envelope is what A2A requires. Your caller doesn't
    need to know about the envelope. It just passes a text payload.
    Pass an explicit task_id when you need an idempotency key; otherwise
    a UUID is generated for you.
    """
    payload = {
        "jsonrpc": "2.0",
        "id":      1,
        "method":  "tasks/send",
        "params": {
            "id":      task_id or str(uuid.uuid4()),
            "message": {
                "role":  "user",
                "parts": [{"type": "text", "text": message_text}],
            },
        },
    }

    url = f"{base_url.rstrip('/')}/tasks/send"
    try:
        response = httpx.post(url, json=payload, timeout=timeout)
        response.raise_for_status()
        data = response.json()

        # Extract text from the A2A response envelope:
        # result.artifacts[0].parts[0].text
        result    = data.get("result", {})
        artifacts = result.get("artifacts", [])
        if artifacts:
            for part in artifacts[0].get("parts", []):
                if part.get("type") == "text":
                    try:
                        return json.loads(part["text"])
                    except json.JSONDecodeError:
                        return {"text": part["text"]}

        # Fallback: check status message
        status = result.get("status", {})
        for part in status.get("message", {}).get("parts", []):
            if part.get("type") == "text":
                try:
                    return json.loads(part["text"])
                except json.JSONDecodeError:
                    return {"text": part["text"]}

        return result

    except httpx.TimeoutException:
        return {"error": f"Service timed out after {timeout}s"}
    except httpx.ConnectError:
        return {"error": f"Cannot connect to {url}"}
    except Exception as e:
        return {"error": f"A2A task failed: {e}"}


def delegate_quiz_task(
    topic: str,
    explanation: str,
    answers: list[str] | None = None,
    quiz_service_url: str = QUIZ_SERVICE_URL,
) -&gt; dict:
    """High-level helper: delegate a quiz task to the Quiz A2A service."""
    payload = json.dumps({
        "topic":       topic,
        "explanation": explanation,
        "answers":     answers or [],
    })
    return send_task(quiz_service_url, payload)


def is_quiz_service_available(quiz_service_url: str = QUIZ_SERVICE_URL) -&gt; bool:
    """Quick health check: is the quiz service reachable?"""
    return bool(discover_agent(quiz_service_url))
</code></pre>
<p><code>discover_agent</code> is the health check. It fetches the Agent Card at <code>/.well-known/agent-card.json</code> with a 5-second timeout. If that succeeds, the service is reachable and can accept tasks. The Progress Coach calls this before delegating. If it returns <code>{}</code>, the coach falls back to local quiz generation without ever trying the full task submission.</p>
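<p>Reachability is the minimum check. Because the Agent Card also lists skills, a caller could go one step further and confirm that the skill it needs is advertised before delegating. The following is a minimal sketch, assuming the card shape shown in the <code>curl</code> output earlier. <code>find_skill</code> and <code>can_delegate</code> are hypothetical helper names and are not part of <code>a2a_client.py</code>.</p>
<pre><code class="language-python">def find_skill(card, skill_id):
    """Return the skill entry with the given id from an Agent Card dict, or None."""
    for skill in card.get("skills", []):
        if skill.get("id") == skill_id:
            return skill
    return None


def can_delegate(card, skill_id):
    """True only if the card was fetched (non-empty) and advertises the skill."""
    return bool(card) and find_skill(card, skill_id) is not None


# Sample card mirroring the JSON returned by the quiz service.
sample_card = {
    "name": "Quiz Generator Service",
    "url": "http://localhost:9001/",
    "skills": [{"id": "generate_and_grade_quiz", "name": "Generate and Grade Quiz"}],
}

print(can_delegate(sample_card, "generate_and_grade_quiz"))  # True
print(can_delegate({}, "generate_and_grade_quiz"))           # False: card could not be fetched
print(can_delegate(sample_card, "translate_text"))           # False: skill not advertised
</code></pre>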
<h3 id="heading-84-the-crewai-study-buddy">8.4 The CrewAI Study Buddy</h3>
<p>The Study Buddy demonstrates the core A2A value proposition: a LangGraph agent calling a CrewAI agent over a shared protocol, with neither side aware of the other's framework.</p>
<p><code>src/crewai_agent/study_buddy.py</code> builds a CrewAI agent, wraps it in an A2A <code>AgentExecutor</code>, and serves it on port 9002. The LangGraph Progress Coach never imports CrewAI. The CrewAI agent never imports LangGraph. They communicate only through HTTP.</p>
<p>The CrewAI side:</p>
<pre><code class="language-python"># src/crewai_agent/study_buddy.py

import json
import os

from crewai import Agent, Crew, LLM, Process, Task
from crewai.tools import BaseTool

MODEL_NAME      = os.getenv("OLLAMA_MODEL", "qwen2.5:7b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")


class TopicAnalyserTool(BaseTool):
    """
    Structures the Study Buddy's approach before generating its response.

    In production this might query a knowledge graph or curriculum database.
    For the tutorial, it produces structured guidance from the inputs.
    """
    name:        str = "topic_analyser"
    description: str = (
        "Analyse a study topic and weak areas to produce a structured "
        "list of key concepts to focus on."
    )
    args_schema: type = TopicAnalyserInput

    def _run(self, topic: str, weak_areas: list[str] | None = None) -&gt; str:
        # Fall back to a generic focus area so the guidance is never empty.
        focus_areas = weak_areas or [f"Core concepts of {topic}"]
        return json.dumps({
            "topic":              topic,
            "focus_areas":        focus_areas,
            "suggested_approach": f"Start with fundamentals, then address: {', '.join(focus_areas)}.",
            "study_tip": (
                "Try explaining the concept out loud in your own words. "
                "If you can teach it simply, you understand it."
            ),
        })


def build_study_buddy_crew(topic: str, explanation: str, weak_areas: list[str]) -&gt; Crew:
    """Build a CrewAI crew for a specific study assistance request."""
    llm = LLM(model=f"ollama/{MODEL_NAME}", base_url=OLLAMA_BASE_URL)

    agent = Agent(
        role="Study Buddy",
        goal=(
            "Provide clear, encouraging supplementary explanations that help "
            "students understand difficult concepts from a fresh angle."
        ),
        backstory=(
            "You are an experienced tutor who specialises in finding alternative "
            "explanations and analogies that make difficult ideas click."
        ),
        llm=llm,
        tools=[TopicAnalyserTool()],
        verbose=False,
        allow_delegation=False,
    )

    weak_text = (
        f"The student struggled with: {', '.join(weak_areas)}"
        if weak_areas else "No specific weak areas identified."
    )

    task = Task(
        description=(
            f"A student is studying '{topic}'. They received this explanation:\n\n"
            f"{explanation[:1000]}\n\n"
            f"{weak_text}\n\n"
            f"Use the topic_analyser tool to structure your approach. Then provide:\n"
            f"1) A fresh analogy that explains the core concept differently\n"
            f"2) One concrete example targeting the weak area(s)\n"
            f"3) One practical tip for remembering this concept\n"
            f"Keep your response concise and encouraging (150-250 words)."
        ),
        agent=agent,
        expected_output=(
            "A study assistance response with a fresh analogy, "
            "a targeted example, and a memory tip."
        ),
    )

    return Crew(
        agents=[agent],
        tasks=[task],
        process=Process.sequential,
        verbose=False,
    )
</code></pre>
<p>The A2A wrapper bridges the CrewAI crew to the A2A protocol. This is <code>StudyBuddyExecutor</code>, the same structure as <code>QuizAgentExecutor</code>, but calling <code>crew.kickoff()</code> instead of quiz functions:</p>
<pre><code class="language-python">class StudyBuddyExecutor(AgentExecutor):
    """
    Bridges the A2A protocol to CrewAI execution.

    The LangGraph system has no idea this is CrewAI.
    The CrewAI crew has no idea it's serving an A2A request.
    """

    async def execute(
        self,
        context: RequestContext,
        event_queue: EventQueue,
    ) -&gt; None:
        # Parse request
        request_text = ""
        for part in context.current_request.params.message.parts:
            if isinstance(part, TextPart):
                request_text += part.text

        try:
            request_data = json.loads(request_text)
        except json.JSONDecodeError:
            request_data = {"topic": request_text}

        topic       = request_data.get("topic", "General Topic")
        explanation = request_data.get("explanation", "")
        weak_areas  = request_data.get("weak_areas", [])

        # CrewAI's kickoff() is synchronous. Run in thread pool
        # to avoid blocking the async event loop.
        try:
            crew        = build_study_buddy_crew(topic, explanation, weak_areas)
            crew_result = await asyncio.to_thread(crew.kickoff)
            result_text = crew_result.raw if hasattr(crew_result, "raw") else str(crew_result)

            result = {
                "source":     "crewai_study_buddy",
                "topic":      topic,
                "weak_areas": weak_areas,
                "assistance": result_text,
                "status":     "complete",
            }
        except Exception as e:
            result = {
                "source":     "crewai_study_buddy",
                "topic":      topic,
                "assistance": f"Could not generate supplementary help for '{topic}'.",
                "status":     "error",
                "error":      str(e),
            }

        await event_queue.enqueue_event(
            Message(
                role="agent",
                parts=[TextPart(text=json.dumps(result, indent=2))],
            )
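        )

    async def cancel(self, context: RequestContext, event_queue: EventQueue) -&gt; None:
        # No-op, mirroring QuizAgentExecutor.cancel
        pass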
</code></pre>
<p><code>asyncio.to_thread(crew.kickoff)</code> is the critical line. CrewAI's <code>kickoff()</code> is synchronous and blocking. It can run for 30 to 60 seconds depending on the model and task complexity.</p>
<p>Calling it directly in an <code>async</code> function would freeze the entire A2A server during that time, preventing it from accepting any other requests. <code>asyncio.to_thread</code> runs it in Python's default thread pool, freeing the event loop to handle other requests while the crew runs.</p>
<h3 id="heading-85-the-progress-coach-fallback-pattern">8.5 The Progress Coach Fallback Pattern</h3>
<p>The Progress Coach module ships two helpers for talking to A2A services. Each one tries the external service first and falls back to a local default on any failure.</p>
<p>The Study Buddy helper is wired into <code>progress_coach_node</code> and runs whenever a topic score is below the pass threshold.</p>
<p>The quiz delegation helper is provided as a ready-to-use building block for readers who want to route grading through the A2A service instead of running it inline. The default flow keeps quiz generation local for simplicity.</p>
<p>Both helpers follow the same probe-then-fallback pattern, a lightweight relative of a circuit breaker: probe the Agent Card first, time-bound the actual task call, and never let an external failure surface to the user.</p>
<pre><code class="language-python"># src/agents/progress_coach.py

import os

QUIZ_SERVICE_URL = "http://localhost:9001"

def try_a2a_quiz_delegation(topic, explanation, answers) -&gt; dict | None:
    """
    Attempt to delegate quiz grading to the A2A Quiz Service.
    Returns the grading result, or None on any failure.

    Note: USE_A2A_QUIZ is read at call time, not at module load time.
    Reading env vars at import time causes test isolation failures.
    The env var state at import time gets baked in for the process lifetime.
    """
    use_a2a = os.getenv("USE_A2A_QUIZ", "true").lower() == "true"
    if not use_a2a:
        return None

    try:
        from a2a_services.a2a_client import delegate_quiz_task, is_quiz_service_available

        if not is_quiz_service_available(QUIZ_SERVICE_URL):
            print("[Progress Coach] Quiz A2A service unavailable. Using local.")
            return None

        print(f"[Progress Coach] Delegating quiz to A2A: {QUIZ_SERVICE_URL}")
        result = delegate_quiz_task(topic=topic, explanation=explanation, answers=answers)

        if "error" in result:
            print(f"[Progress Coach] A2A failed: {result['error']}")
            return None

        return result

    except Exception as e:
        print(f"[Progress Coach] A2A error: {e}")
        return None


def try_study_buddy_assistance(topic, explanation, weak_areas) -&gt; str | None:
    """
    Request supplementary help from the CrewAI Study Buddy.
    Returns assistance text, or None if the service is unavailable.
    """
    study_buddy_url = os.getenv("STUDY_BUDDY_URL", "http://localhost:9002")
    use_study_buddy = os.getenv("USE_STUDY_BUDDY", "true").lower() == "true"

    if not use_study_buddy:
        return None

    try:
        from a2a_services.a2a_client import request_study_assistance, is_study_buddy_available

        if not is_study_buddy_available(study_buddy_url):
            return None

        result = request_study_assistance(
            topic=topic,
            explanation=explanation,
            weak_areas=weak_areas,
            study_buddy_url=study_buddy_url,
        )

        if result.get("status") == "error" or "error" in result:
            return None

        return result.get("assistance", "")

    except Exception:
        return None
</code></pre>
<p>The comment about <code>os.getenv</code> at call time is worth internalising. Reading an environment variable at module import time (<code>USE_A2A = os.getenv("USE_A2A_QUIZ", "true") == "true"</code> at the top of the file) bakes in the value that was present when the module was first imported. Tests that set the env var before calling a function won't see the change because the module already ran. Reading inside the function guarantees the current value at every call.</p>
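<p>The difference is easy to reproduce in isolation. In the sketch below, <code>use_a2a_frozen</code> and <code>use_a2a_live</code> are hypothetical names chosen for illustration. The first returns a module-level constant, the second re-reads the environment on each call, and the flag is flipped in between, as a test would do.</p>
<pre><code class="language-python">import os

os.environ["USE_A2A_QUIZ"] = "true"

# Read once, at import time: the value is frozen from here on.
USE_A2A_AT_IMPORT = os.getenv("USE_A2A_QUIZ", "true").lower() == "true"


def use_a2a_frozen():
    """Returns whatever the env var was when the module was first imported."""
    return USE_A2A_AT_IMPORT


def use_a2a_live():
    """Re-reads the env var on every call."""
    return os.getenv("USE_A2A_QUIZ", "true").lower() == "true"


# A test flips the flag after import (pytest's monkeypatch.setenv does the same).
os.environ["USE_A2A_QUIZ"] = "false"

print(use_a2a_frozen())  # True  (stale value from import time)
print(use_a2a_live())    # False (current value)
</code></pre>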
<h3 id="heading-86-running-the-full-three-terminal-setup">8.6 Running the Full Three-Terminal Setup</h3>
<p>With all services in place, the full system uses three terminals.</p>
<p><strong>Terminal 1:</strong> The main Learning Accelerator:</p>
<pre><code class="language-bash">source .venv/bin/activate
python main.py "Learn Python closures"
</code></pre>
<p><strong>Terminal 2:</strong> The Quiz Generator A2A service:</p>
<pre><code class="language-bash">source .venv/bin/activate
python src/a2a_services/quiz_service.py
</code></pre>
<p><strong>Terminal 3:</strong> The CrewAI Study Buddy:</p>
<pre><code class="language-bash">source .venv/bin/activate
python src/crewai_agent/study_buddy.py
</code></pre>
<p>Or using Make:</p>
<pre><code class="language-bash">make services   # Terminals 2 and 3 in background
make run        # Terminal 1
</code></pre>
<p>When the Progress Coach runs with both services up, you'll see:</p>
<pre><code class="language-plaintext">[Progress Coach] Score: 35%
[Progress Coach] Delegating quiz to A2A: http://localhost:9001
[Quiz A2A] Task received: topic='Python Functions', answers_provided=3
[Quiz A2A] Task complete: status=graded
[Progress Coach] A2A quiz complete: score=35%
[Progress Coach] Requesting study assistance from CrewAI Study Buddy...
[Study Buddy A2A] Request: topic='Python Functions', weak_areas=['first-class functions']
[Study Buddy A2A] Task complete (287 chars)

────────────────────────────────────────────────────────────
Coach: You scored 35% on Python Functions. That's a solid foundation to build on...

📚 Study Buddy says:
Think of functions like variables with superpowers. Just as you can pass a number
to another function, you can pass a function too...
────────────────────────────────────────────────────────────
</code></pre>
<p>When either service is not running, the Progress Coach falls back gracefully:</p>
<pre><code class="language-plaintext">[A2A Client] Cannot reach http://localhost:9001/.well-known/agent-card.json: Connection refused
[Progress Coach] Quiz A2A service unavailable. Using local.
</code></pre>
<p>The session continues. The student never sees the error.</p>
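<p>Stripped of the A2A specifics, the fallback behaviour reduces to a small control-flow pattern. The sketch below is a generic illustration rather than the project's code: <code>call_with_fallback</code>, <code>local_quiz</code>, and <code>failing_remote</code> are hypothetical names, and the timeout is assumed to be enforced inside the remote call itself.</p>
<pre><code class="language-python">def call_with_fallback(probe, remote_call, local_fallback):
    """
    Probe first, then try the remote call, and fall back locally on any failure.
    All three arguments are zero-argument callables supplied by the caller.
    """
    try:
        if not probe():
            return local_fallback()
        result = remote_call()
        if not isinstance(result, dict) or "error" in result:
            return local_fallback()
        return result
    except Exception:
        # Any failure in the probe or the remote call is absorbed here.
        return local_fallback()


def local_quiz():
    return {"status": "graded", "source": "local"}


def failing_remote():
    raise ConnectionError("connection refused")


# Service down: the probe fails, so the remote call is never attempted.
print(call_with_fallback(lambda: False, failing_remote, local_quiz))
# Service up but the task reports an error: still falls back.
print(call_with_fallback(lambda: True, lambda: {"error": "timed out"}, local_quiz))
# Service healthy: the remote result is returned unchanged.
print(call_with_fallback(lambda: True, lambda: {"status": "graded", "source": "a2a"}, local_quiz))
</code></pre>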
<p>📌 <strong>Checkpoint:</strong> Run the A2A tests:</p>
<pre><code class="language-bash">pytest tests/test_a2a.py tests/test_crewai_interop.py -v
</code></pre>
<p>Expected: 44 tests, all passing. These tests mock the HTTP calls and verify that <code>delegate_quiz_task</code> constructs the right JSON-RPC payload, that <code>discover_agent</code> handles connection errors gracefully, and that <code>build_study_buddy_crew</code> produces a properly configured Crew. No running services required.</p>
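<p>For a sense of what such a test can look like, here is a minimal sketch using only the standard library. It is not taken from the repository's test suite. <code>send_task_with</code> is a hypothetical variant of <code>send_task</code> that receives the HTTP function as an argument, so a <code>Mock</code> can stand in for <code>httpx.post</code> and capture the outgoing payload.</p>
<pre><code class="language-python">import json
import uuid
from unittest.mock import Mock


def send_task_with(post, base_url, message_text):
    """Hypothetical variant of send_task that takes the HTTP function as an argument."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tasks/send",
        "params": {
            "id": str(uuid.uuid4()),
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": message_text}],
            },
        },
    }
    response = post(f"{base_url.rstrip('/')}/tasks/send", json=payload, timeout=120.0)
    response.raise_for_status()
    return response.json().get("result", {})


def test_payload_is_valid_jsonrpc():
    # The fake response returns a canned JSON body; no network is involved.
    fake_response = Mock()
    fake_response.json.return_value = {"result": {"status": "graded"}}
    fake_post = Mock(return_value=fake_response)

    body = json.dumps({"topic": "closures", "explanation": "A closure captures variables.", "answers": []})
    result = send_task_with(fake_post, "http://localhost:9001/", body)

    url = fake_post.call_args.args[0]
    sent = fake_post.call_args.kwargs["json"]
    assert url == "http://localhost:9001/tasks/send"
    assert sent["jsonrpc"] == "2.0"
    assert sent["method"] == "tasks/send"
    assert json.loads(sent["params"]["message"]["parts"][0]["text"])["topic"] == "closures"
    assert result == {"status": "graded"}


if __name__ == "__main__":
    test_payload_is_valid_jsonrpc()
    print("payload test passed")
</code></pre>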
<p>The enterprise connection: A2A is what makes agent systems composable at the organisational level. A compliance training platform built by one team (LangGraph) can call a certification verification service built by another team (CrewAI, or any HTTP service) without either team needing to know the other's implementation details. The A2A protocol is the contract. Both sides honour it. The rest is internal.</p>
<p>In the final chapter, you'll see the complete system running end to end, walk through how to extend it, and look at where the multi-agent ecosystem is heading next.</p>
<h2 id="heading-chapter-9-the-complete-system-and-whats-next">Chapter 9: The Complete System and What's Next</h2>
<p>Everything is built. Four LangGraph agents coordinating through a shared state, two MCP servers providing tool access, two A2A services running as independent processes, Langfuse capturing decision-level traces, DeepEval running quality gates, and a Streamlit UI that makes the whole thing usable without a terminal.</p>
<p>This chapter is the runbook: how every piece fits together, how to run it, how to extend it, and where the patterns apply beyond the Learning Accelerator.</p>
<h3 id="heading-91-mainpy-the-entry-point">9.1 <code>main.py</code>: the Entry Point</h3>
<p><code>main.py</code> is under 140 lines. It does four things: load configuration, handle command-line arguments, run the graph with the interrupt/resume loop, and print the session summary.</p>
<p>Every other concern (agents, tools, observability, persistence) is handled by the modules <code>main.py</code> imports.</p>
<pre><code class="language-python"># main.py

import sys
import os
import uuid
from pathlib import Path

# Add src/ to Python path before any project imports
sys.path.insert(0, str(Path(__file__).parent / "src"))

from dotenv import load_dotenv
load_dotenv()

from graph.workflow import graph
from graph.state import initial_state
from observability.langfuse_setup import get_langfuse_config, flush_langfuse


def run_session(goal: str, session_id: str | None = None) -&gt; None:
    """Run a complete interactive study session with Langfuse tracing."""
    is_resume = session_id is not None
    if not session_id:
        session_id = str(uuid.uuid4())[:8]

    # get_langfuse_config() builds the full run config:
    #   - thread_id for SQLite checkpointing
    #   - Langfuse callback handler (if LANGFUSE_PUBLIC_KEY is set)
    config = get_langfuse_config(session_id)

    print(f"\n{'='*60}")
    print("Learning Accelerator")
    print(f"Session ID: {session_id}")
    if is_resume:
        print("Resuming existing session...")
    else:
        print(f"Goal: {goal}")
    print(f"{'='*60}")

    # For a new session: initial state. For resume: None. LangGraph loads from checkpoint.
    state = None if is_resume else initial_state(goal, session_id)
    result = graph.invoke(state, config=config)

    # Interrupt/resume loop
    from langgraph.types import Command
    while "__interrupt__" in result:
        interrupt_payload = result["__interrupt__"][0].value
        roadmap = interrupt_payload.get("roadmap")
        if roadmap:
            # Display roadmap (abbreviated for chapter. See repo for the full version.)
            print_roadmap(roadmap)
        print(f"\n{interrupt_payload.get('prompt', 'Continue?')}")
        user_input = input("&gt; ").strip()
        result = graph.invoke(Command(resume=user_input), config=config)

    if result.get("error"):
        print(f"\n[ERROR] {result['error']}")
        flush_langfuse()   # Send traces on the error path too
        return

    print_session_summary(result)
    flush_langfuse()   # Ensure all traces are sent before exit


if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Learning Accelerator")
    parser.add_argument("goal", nargs="?",
                        default="Learn Python closures and decorators from scratch")
    parser.add_argument("--resume", metavar="SESSION_ID",
                        help="Resume an existing session by ID")
    args = parser.parse_args()

    if args.resume:
        run_session(goal="", session_id=args.resume)
    else:
        run_session(goal=args.goal)
</code></pre>
<p>Three things worth noting about this file.</p>
<p><strong>The graph is imported as a module-level singleton.</strong> <code>from graph.workflow import graph</code> runs <code>build_graph()</code> once at import time. The compiled graph lives for the entire process: same SqliteSaver connection, same registered nodes.</p>
<p>This is intentional. Multiple <code>graph.invoke</code> calls (initial plus any resumes from interrupts) all use the same compiled graph with the same checkpointer.</p>
<p><strong>State handling for resume is one line.</strong> <code>state = None if is_resume else initial_state(...)</code>. Passing <code>None</code> tells LangGraph to load the latest checkpoint for the <code>thread_id</code> in <code>config</code>. That's the entire resume mechanism from the caller's side.</p>
<p><strong>The</strong> <code>while</code> <strong>loop handles both approval and rejection.</strong> If the user types <code>no</code>, the conditional edge routes back to <code>curriculum_planner</code>, which generates a new roadmap, which triggers another <code>interrupt()</code>. The loop keeps showing new roadmaps until the user approves one.</p>
<h3 id="heading-92-the-three-terminal-startup">9.2 The Three-Terminal Startup</h3>
<p>The full system needs three processes running simultaneously. The <code>Makefile</code> provides one-command targets:</p>
<pre><code class="language-bash">make setup      # First time only: create venv and install dependencies
make langfuse   # Optional: start self-hosted Langfuse
make services   # Start both A2A services in background
make run        # Start main application (foreground)
</code></pre>
<p>The <code>services</code> target:</p>
<pre><code class="language-makefile">services: stop
	@echo "Starting A2A services..."
	$(PYTHON) src/a2a_services/quiz_service.py &amp;
	@sleep 1
	$(PYTHON) src/crewai_agent/study_buddy.py &amp;
	@sleep 1
	@echo ""
	@echo "Services started:"
	@echo "  Quiz:        http://localhost:9001"
	@echo "  Study Buddy: http://localhost:9002"
</code></pre>
<p>Verify everything is reachable:</p>
<pre><code class="language-bash">curl http://localhost:9001/.well-known/agent-card.json
curl http://localhost:9002/.well-known/agent-card.json
curl http://localhost:3000                   # Langfuse UI
</code></pre>
<h3 id="heading-93-a-complete-session-end-to-end">9.3 A Complete Session, End to End</h3>
<p>With Ollama running, the A2A services up, and Langfuse configured:</p>
<pre><code class="language-bash">make services
make run
</code></pre>
<p>The goal input, approval, and topic loop:</p>
<pre><code class="language-plaintext">============================================================
Learning Accelerator
Session ID: 8660e1d6
Goal: Learn Python closures and decorators from scratch
============================================================

[Observability] Tracing session 8660e1d6 → http://localhost:3000

[Curriculum Planner] Building roadmap for: 'Learn Python closures...'
[Curriculum Planner] Calling qwen2.5:7b...
[Curriculum Planner] Created roadmap: 5 topics, 4 weeks
  1. Python Functions: 60 min
  2. Scopes and Namespaces (needs: Python Functions): 45 min
  3. Inner Functions (needs: Scopes and Namespaces): 60 min
  4. Creating Closures (needs: Inner Functions): 75 min
  5. Decorator Basics (needs: Creating Closures): 60 min

[Human Approval] Pausing for roadmap review...

============================================================
Proposed Study Plan
============================================================
Goal: Learn Python closures and decorators from scratch
Duration: 4 weeks @ 5 hrs/week

  1. Python Functions (60 min)
     Understand how functions are first-class objects in Python.
  ...

Does this study plan look good?
  Type 'yes' to start studying
  Type 'no' to generate a different plan
&gt; yes

[Human Approval] Roadmap approved. Starting study session.

[Explainer] Topic: 'Python Functions'
[Explainer] LLM call 1/8...
  → tool_list_files({})
    ← ["closures.md", "decorators.md", "python_basics.md"]
[Explainer] LLM call 2/8...
  → tool_read_file({'filename': 'python_basics.md'})
    ← # Python Basics...
[Explainer] Complete after 4 LLM call(s)

[Quiz Generator] Generating quiz for: 'Python Functions'
[Progress Coach] Delegating quiz to A2A: http://localhost:9001
[Quiz A2A] Task received: topic='Python Functions', answers_provided=3
[Quiz A2A] Task complete: status=graded

[Progress Coach] Score: 67%
[Progress Coach] Requesting study assistance from CrewAI Study Buddy...
[Study Buddy A2A] Task complete (287 chars)

────────────────────────────────────────────────────────────
Coach: You've got a solid foundation in Python functions...

📚 Study Buddy says:
Think of functions like variables with superpowers...

Next topic: 'Scopes and Namespaces'
────────────────────────────────────────────────────────────
</code></pre>
<p>That single session exercises every component in the system: LangGraph orchestration, SQLite checkpointing, human-in-the-loop interrupt, MCP tool calling, A2A delegation to both the Quiz service and the CrewAI Study Buddy, and Langfuse tracing. The session summary prints at the end. The trace appears in Langfuse within seconds.</p>
<h3 id="heading-94-the-streamlit-ui">9.4 The Streamlit UI</h3>
<p>The terminal interface is fine for development. For daily use, and for demonstrating the system to anyone who isn't going to open a terminal, the system needs a web UI.</p>
<p><code>streamlit_app.py</code> at the project root provides one. The architectural point is worth understanding: <strong>the LangGraph code in</strong> <code>src/</code> <strong>is unchanged</strong>. The same graph that powers <code>main.py</code> powers the web app. Only the I/O mechanism is different. <code>input()</code> and <code>print()</code> become Streamlit widgets, and the interrupt/resume pattern becomes button clicks with <code>st.session_state</code> carrying context across reruns.</p>
<p>Streamlit reruns the entire Python script on every user interaction. Anything that needs to persist across reruns lives in <code>st.session_state</code>, a dict Streamlit preserves between runs. The LangGraph session ID, run config, roadmap, topic index, and quiz progress all live there.</p>
<p>The app is structured as a state machine with five screens (goal input, roadmap approval, explaining, quizzing, complete) and <code>st.session_state.screen</code> determines what renders on each rerun.</p>
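<p>The screen-driven structure can be sketched without Streamlit at all. In the example below, a plain dict stands in for <code>st.session_state</code>, and the screen identifiers and event names are assumptions chosen to match the five screens listed above. The real app's names may differ.</p>
<pre><code class="language-python"># Allowed transitions between the five screens, keyed by (screen, event).
# Screen and event names are illustrative assumptions.
TRANSITIONS = {
    ("goal_input", "submit_goal"): "roadmap_approval",
    ("roadmap_approval", "approve"): "explaining",
    ("roadmap_approval", "reject"): "roadmap_approval",  # regenerate and ask again
    ("explaining", "start_quiz"): "quizzing",
    ("quizzing", "next_topic"): "explaining",
    ("quizzing", "finish"): "complete",
}


def handle_event(session_state, event):
    """Advance session_state['screen']; unknown events leave the screen unchanged."""
    current = session_state.get("screen", "goal_input")
    session_state["screen"] = TRANSITIONS.get((current, event), current)
    return session_state["screen"]


def render(session_state):
    """Stand-in for the per-screen rendering that runs on every rerun."""
    return f"rendering: {session_state.get('screen', 'goal_input')}"


# A plain dict stands in for st.session_state here.
state = {}
for event in ["submit_goal", "reject", "approve", "start_quiz", "finish"]:
    handle_event(state, event)
    print(render(state))
</code></pre>
<p>Each button click maps to one event. Because the current screen is stored in the state rather than in local variables, a full script rerun picks up exactly where the last interaction left off.</p>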
<p>The architectural wrinkle is that <code>quiz_generator_node</code> calls <code>run_quiz()</code>, which uses <code>input()</code> to collect answers from the terminal. Calling that from Streamlit would block the script waiting on terminal input, so the browser would appear frozen. The fix is a UI-specific graph compiled with <code>interrupt_before=["quiz_generator"]</code>:</p>
<pre><code class="language-python"># streamlit_app.py (key excerpt)

from graph.workflow import build_graph
from graph.state import initial_state, StudyRoadmap, QuizResult
from agents.quiz_generator import generate_questions, grade_answer

# UI-specific graph: pauses BEFORE quiz_generator so the UI can
# handle quiz I/O without input() being called inside the graph.
ui_graph = build_graph(
    db_path="data/checkpoints_ui.db",
    interrupt_before=["quiz_generator"],
)
</code></pre>
<p>The UI handles the quiz itself by calling <code>generate_questions</code> and <code>grade_answer</code> directly from the app layer (same functions, different caller). Once the quiz is complete, the app uses <code>graph.update_state()</code> to inject the <code>QuizResult</code> back into the checkpoint as if <code>quiz_generator_node</code> had run, then resumes the graph to execute the Progress Coach:</p>
<pre><code class="language-python">def advance_after_quiz(quiz_result: QuizResult):
    """After UI-handled quiz completes, inject result and resume graph."""
    config = st.session_state.graph_config

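    # Note: `existing` (the quiz results so far) and `all_weak` (the merged
    # weak-area list) are built before this call; that code is not shown here.
    # Tell LangGraph quiz_generator has already run with this result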
    ui_graph.update_state(
        config,
        {
            "quiz_results":        existing + [quiz_result],
            "weak_areas":          all_weak,
            "roadmap":             st.session_state.roadmap,
            "current_topic_index": st.session_state.current_topic_index,
        },
        as_node="quiz_generator",
    )

    # Resume. Runs progress_coach, then either explainer (next topic) or END.
    # Because interrupt_before=["quiz_generator"], if a next topic exists
    # the graph pauses again before its quiz_generator.
    result = ui_graph.invoke(None, config=config)
</code></pre>
<p>This is the pattern worth remembering: <code>graph.update_state(config, values, as_node=...)</code> lets the caller patch the checkpoint as if a specific node had produced those values. It's how you inject results from code running outside the graph back into the graph's state flow.</p>
<p>Run it:</p>
<pre><code class="language-bash">make streamlit
# or: streamlit run streamlit_app.py
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/6983b18befedc65b9820e223/0eb788a1-5333-440e-802a-4159a413ea6b.png" alt="Screenshot of the Streamlit web interface showing the roadmap approval screen of the Learning Accelerator: a sidebar on the left labeled Navigation with the Learning Accelerator entry highlighted, and a main content area with a graduation-cap heading &quot;Learning Accelerator&quot;, a &quot;Proposed Study Plan&quot; section listing the goal &quot;Learn Python closures and decorators from scratch&quot; and duration &quot;4 weeks @ 5 hrs/week&quot;, followed by five numbered topic cards (Python Functions, Scopes and Namespaces, Inner Functions, Creating Closures, Decorator Basics) each with estimated minutes, a one-sentence description, and prerequisite topics; two buttons at the bottom labeled &quot;Approve and start studying&quot; and &quot;Generate a different plan&quot;." style="display:block;margin:0 auto" width="1672" height="941" loading="lazy">

<p><em>Figure 3. The Streamlit web interface. Same LangGraph code, same MCP servers, same A2A services. Different I/O.</em></p>
<p>The browser opens at <a href="http://localhost:8501">http://localhost:8501</a>. You get the same system with a web UI. Goal input becomes a form. Roadmap approval becomes two buttons. The explanation renders as formatted markdown. Quiz questions appear one at a time with an answer field. Coach feedback shows in an info box before the next topic.</p>
<p>When the session completes, the summary screen shows per-topic scores and the session ID for terminal resume.</p>
<h4 id="heading-the-streamlit-sessionstate-pattern">💡 The Streamlit <code>session_state</code> pattern</h4>
<p>Streamlit reruns the entire script on every user interaction. Anything that must survive across reruns lives in <code>st.session_state</code>, a dict-like object that Streamlit preserves between runs. The LangGraph <code>session_id</code> and <code>graph_config</code> both go there. So do the current screen, the roadmap, the current question index, the graded answers, and the list of completed <code>QuizResult</code> objects.</p>
<p>The app is effectively a state machine: <code>st.session_state.screen</code> determines what renders, and transitions happen in response to button clicks.</p>
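<p>The screen flow can be sketched without Streamlit at all. The snippet below is a minimal illustration, not the code in <code>streamlit_app.py</code>: a plain dict stands in for <code>st.session_state</code>, and the screen and event names are assumptions picked for the example.</p>
<pre><code class="language-python"># Hypothetical screen names and events; the real app may use different ones.
TRANSITIONS = {
    ("goal_input", "submit_goal"): "roadmap_approval",
    ("roadmap_approval", "approve"): "explanation",
    ("roadmap_approval", "regenerate"): "roadmap_approval",
    ("explanation", "start_quiz"): "quiz",
    ("quiz", "next_topic"): "explanation",
    ("quiz", "finish"): "complete",
}


def handle_event(session_state: dict, event: str) -> str:
    """Advance the screen stored in session_state and return the new screen."""
    current = session_state.get("screen", "goal_input")
    # Unknown events leave the screen unchanged, like a rerun with no click.
    session_state["screen"] = TRANSITIONS.get((current, event), current)
    return session_state["screen"]


# A plain dict stands in for st.session_state here.
state = {"screen": "goal_input"}
for event in ["submit_goal", "approve", "start_quiz", "finish"]:
    handle_event(state, event)
print(state["screen"])
</code></pre>
<p>Keeping the transitions in one table makes the flow easy to test without a browser, which is handy when every rerun re-executes the whole script.</p>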
<p>This is the payoff of protocol-first architecture: the system has a terminal UI, a web UI, and the option to add a React frontend, a Slack bot, or an iOS app next, and the LangGraph code in <code>src/</code> is untouched through all of it.</p>
<h3 id="heading-95-the-project-structure-final">9.5 The Project Structure, Final</h3>
<p>After everything is built, the repository layout is:</p>
<pre><code class="language-plaintext">freecodecamp-multi-agent-ai-system/
├── src/
│   ├── agents/
│   │   ├── curriculum_planner.py   # JSON roadmap generation
│   │   ├── explainer.py             # MCP tool-calling loop
│   │   ├── quiz_generator.py        # Two-call pattern + grading
│   │   ├── progress_coach.py        # Synthesis + A2A delegation
│   │   └── human_approval.py        # interrupt() / Command resume
│   ├── graph/
│   │   ├── state.py                 # AgentState + 4 dataclasses
│   │   └── workflow.py              # StateGraph definition
│   ├── mcp_servers/
│   │   ├── filesystem_server.py     # Tools: list, read, search
│   │   └── memory_server.py         # Tools: get, set, delete, list
│   ├── a2a_services/
│   │   ├── quiz_service.py          # Quiz agent on :9001
│   │   └── a2a_client.py            # JSON-RPC client + discovery
│   ├── crewai_agent/
│   │   └── study_buddy.py           # CrewAI agent on :9002
│   └── observability/
│       └── langfuse_setup.py        # Callback handler + config
├── tests/                           # 182 unit + 12 eval tests
├── study_materials/sample_notes/    # Explainer's source content
├── docs/                            # ARCHITECTURE.md, MODEL_SELECTION.md
├── data/                            # SQLite checkpoints (created at runtime)
├── main.py                          # Terminal entry point
├── streamlit_app.py                 # Web UI entry point
├── Makefile                         # One-command targets
├── docker-compose.yml               # Self-hosted Langfuse
├── requirements.txt                 # Pinned versions
└── pyproject.toml                   # pythonpath + pytest config
</code></pre>
<h3 id="heading-96-extending-the-system">9.6 Extending the System</h3>
<p>The architecture supports extension in several directions, each confined to one layer and requiring little or no change to existing code.</p>
<p><strong>Add a new agent.</strong> Write a node function in <code>src/agents/your_agent.py</code>. Register it in <code>workflow.py</code> with <code>builder.add_node("your_agent", your_agent_node)</code>. Add the edges that connect it to existing nodes. Every other agent continues to work unchanged because agents don't know about each other. They only know about state.</p>
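<p>As a rough sketch of what such a node looks like: it takes the current state and returns only the keys it changes. The field names <code>quiz_results</code> and <code>recap</code> and the edge shown in the comment are assumptions for illustration, not part of the repository.</p>
<pre><code class="language-python">def your_agent_node(state: dict) -> dict:
    """Hypothetical node: summarise how many quizzes are recorded in state.

    The field names "quiz_results" and "recap" are illustrative assumptions,
    not fields confirmed to exist in AgentState.
    """
    results = state.get("quiz_results", [])
    # Return only the keys this node changes; LangGraph merges them into state.
    return {"recap": f"{len(results)} quiz result(s) recorded so far"}


# Registration in workflow.py would then follow the pattern from the text:
#   builder.add_node("your_agent", your_agent_node)
#   builder.add_edge("progress_coach", "your_agent")  # edge choice is an assumption

update = your_agent_node({"quiz_results": [{"score": 0.8}]})
print(update)
</code></pre>
<p>Because the node only reads from and writes to state, you can unit test it with a plain dict before wiring it into the graph.</p>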
<p><strong>Swap the inference backend.</strong> Every agent uses <code>ChatOllama</code> pointing at <code>OLLAMA_BASE_URL</code>. Setting that URL to a LiteLLM gateway (which speaks Ollama's API on the front and routes to OpenAI, Anthropic, or any other provider on the back) switches all four agents to the new backend with zero code change. The API is the contract.</p>
<p><strong>Add an MCP tool.</strong> Add a <code>@mcp.tool()</code> function to <code>filesystem_server.py</code> or <code>memory_server.py</code>. Add a corresponding <code>@tool</code> wrapper in <code>explainer.py</code> and include it in <code>EXPLAINER_TOOLS</code>. The agent's system prompt tells the LLM when to use the new tool. No other changes needed.</p>
<p><strong>Add a new A2A service.</strong> Create a new module under <code>a2a_services/</code> following the <code>quiz_service.py</code> pattern: Agent Card, Executor subclass, uvicorn server. Add a client function in <code>a2a_client.py</code>. Any agent that needs it calls the client function. The service is a separate process and can be deployed, scaled, and restarted independently of the main application.</p>
<p><strong>Migrate state to PostgreSQL.</strong> Replace <code>SqliteSaver</code> with <code>PostgresSaver</code> in <code>workflow.py</code>. Set the connection string to your Postgres instance. Nothing else changes. LangGraph's checkpoint interface is backend-agnostic.</p>
<p><strong>Add authentication to A2A services.</strong> Wrap <code>create_quiz_server()</code>'s Starlette app with authentication middleware. The A2A protocol supports this. Agent Cards can declare authentication schemes, and clients send credentials in HTTP headers with each request. Production deployments outside a trusted network should do this.</p>
<p>Each of these extensions exercises one specific layer of the architecture. None of them requires rewriting the layers below.</p>
<p>📌 <strong>Checkpoint:</strong> Run the full test suite with all services running:</p>
<pre><code class="language-bash">make services
pytest tests/ -v
# 184 tests, eval tests skipped by default
</code></pre>
<p>Then run the eval tests with Ollama:</p>
<pre><code class="language-bash">pytest tests/test_eval.py -m eval -s -v
# 12 eval tests: checks quality, faithfulness, grading calibration
</code></pre>
<p>Finally, exercise the full system manually:</p>
<pre><code class="language-bash">make run
# Follow the prompts, complete a session
# Check Langfuse UI for the trace
</code></pre>
<p>Once all three verification steps pass, the system is complete.</p>
<h3 id="heading-97-five-extensions-ordered-by-effort">9.7 Five Extensions, Ordered by Effort</h3>
<p>You have a working four-agent system. That's the hard part. The rest is incremental. Each direction below is a natural next step, not a rewrite.</p>
<h4 id="heading-1-swap-the-inference-backend-to-a-managed-gateway-under-an-hour-of-work">1. Swap the inference backend to a managed gateway (under an hour of work).</h4>
<p>Every agent in the system uses <code>ChatOllama</code> pointing at <code>OLLAMA_BASE_URL</code>. Set that URL to a LiteLLM gateway instead. LiteLLM speaks Ollama's API on the front and routes to OpenAI, Anthropic, Together, or any other provider on the back. All four agents switch to the new backend with one environment variable change.</p>
<p>The same approach handles fallback routing: configure LiteLLM to try GPT-4, fall back to Claude if it fails, fall back to a local model if both are down. Your agent code doesn't know any of this happens.</p>
<h4 id="heading-2-add-an-authentication-layer-to-the-a2a-services-a-few-hours-of-work">2. Add an authentication layer to the A2A services (a few hours of work).</h4>
<p>The Agent Card can declare authentication schemes. Production A2A deployments should require bearer tokens or mTLS certificates. Wrap <code>create_quiz_server()</code>'s Starlette app with FastAPI-compatible auth middleware, update <code>a2a_client.py</code> to send credentials with every request, and the services become safe to expose outside a trusted network.</p>
<p>The A2A protocol supports this natively. The bearer token goes in the HTTP <code>Authorization</code> header like any other REST service.</p>
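<p>The core check that such middleware performs can be sketched in a few lines. This is a minimal illustration under stated assumptions: <code>is_authorized</code> is a hypothetical helper, the headers are a plain dict with lowercase keys, and the token is a static shared secret rather than anything issued by an identity provider.</p>
<pre><code class="language-python">import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Return True when the Authorization header carries the expected bearer token."""
    value = headers.get("authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(token.encode(), expected_token.encode())


print(is_authorized({"authorization": "Bearer s3cret"}, "s3cret"))
print(is_authorized({"authorization": "Basic s3cret"}, "s3cret"))
</code></pre>
<p>In a real deployment the middleware would call a check like this on every request and return a 401 response when it fails, and the expected token would come from a secret store rather than source code.</p>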
<h4 id="heading-3-migrate-sqlite-checkpointing-to-postgresql-half-a-day-including-testing">3. Migrate SQLite checkpointing to PostgreSQL (half a day including testing).</h4>
<p>Replace <code>SqliteSaver</code> with <code>PostgresSaver</code> in <code>workflow.py</code>. Set the connection string to your Postgres instance. LangGraph's checkpoint interface is backend-agnostic.</p>
<p>This matters for multi-instance deployments. SQLite works for a single process, but PostgreSQL lets you run multiple instances of <code>main.py</code> (or the Streamlit app) against the same checkpoint store, so sessions survive instance restarts and can be picked up by any instance.</p>
<h4 id="heading-4-add-streaming-responses-a-day-or-two-of-work">4. Add streaming responses (a day or two of work).</h4>
<p>LangGraph supports <code>graph.astream()</code>, which with <code>stream_mode="messages"</code> streams tokens from agent nodes as they are generated. Update the Streamlit UI to consume the stream and render the explanation as it arrives. Users see output starting in 500ms instead of waiting 3-4 seconds for the full response.</p>
<p>The Explainer is the agent that benefits most. It produces 1,500 to 2,500 character explanations, and the perceived latency improvement is significant.</p>
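<p>The consuming side of that change can be sketched independently of LangGraph. In the snippet below, <code>fake_token_stream</code> is a stand-in for the real stream and <code>render_incrementally</code> is a hypothetical helper. What <code>graph.astream()</code> actually yields depends on the stream mode you pick, so treat the chunk handling as an assumption to adapt.</p>
<pre><code class="language-python">import asyncio


async def fake_token_stream(text: str):
    """Stand-in for a token stream; yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield word + " "


async def render_incrementally(stream, on_update) -> str:
    """Accumulate chunks and call on_update with the text so far after each one."""
    buffer = ""
    async for chunk in stream:
        buffer += chunk
        on_update(buffer)
    return buffer


updates = []
final = asyncio.run(
    render_incrementally(fake_token_stream("Closures capture variables"), updates.append)
)
print(len(updates), repr(final))
</code></pre>
<p>In the Streamlit app, <code>on_update</code> could be the <code>markdown</code> method of an <code>st.empty()</code> placeholder, so each chunk redraws the explanation in place.</p>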
<h4 id="heading-5-build-a-mobile-friendly-frontend-a-week-of-focused-work">5. Build a mobile-friendly frontend (a week of focused work).</h4>
<p>Replace the Streamlit UI with a React or Next.js frontend that calls a FastAPI wrapper around the graph. The wrapper exposes the same five-screen flow (goal input, roadmap approval, explanation, quiz, complete) as REST endpoints. The LangGraph code in <code>src/</code> doesn't change at all. The quiz collection and grading pattern stays identical to what the Streamlit app does now. The API contract is:</p>
<pre><code class="language-plaintext">POST /api/sessions                     → create session, return session_id + roadmap
POST /api/sessions/:id/approval        → body: {"approved": true/false}
GET  /api/sessions/:id/current         → current topic, explanation, questions
POST /api/sessions/:id/answer          → submit one quiz answer, get graded response
GET  /api/sessions/:id/summary         → final summary when complete
</code></pre>
<p>This is the architecture you'd build if the Learning Accelerator became a real product. The graph runs on the backend. The frontend is a thin client. The production hardening checklist in Appendix C applies.</p>
<h3 id="heading-98-production-hardening">9.8 Production Hardening</h3>
<p>The system as written is tutorial-grade. It runs locally, handles errors gracefully, and demonstrates every concept correctly. It's not ready to serve thousands of concurrent users at enterprise scale.</p>
<p>Here's what changes for that, in order of how much work each item requires.</p>
<p><strong>Per-request rate limiting.</strong> Add token budgets per agent, enforced at the orchestrator level as hard limits rather than guidelines.</p>
<p>A 4-agent system with 5 tool calls per agent is 20+ LLM calls per user request. At scale, cost becomes an engineering concern before architecture does. The LiteLLM gateway makes this straightforward. It tracks spend per session and can enforce caps.</p>
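<p>A hard limit means the call is refused, not logged. Here is a minimal sketch of an in-process budget. <code>TokenBudget</code> and <code>BudgetExceeded</code> are hypothetical names, the numbers are illustrative, and a gateway-level cap would replace or complement this in practice.</p>
<pre><code class="language-python">class BudgetExceeded(RuntimeError):
    """Raised when an agent tries to spend more tokens than it was allotted."""


class TokenBudget:
    def __init__(self, limits: dict):
        self.limits = dict(limits)          # agent name -> max tokens per request
        self.used = {name: 0 for name in limits}

    def charge(self, agent: str, tokens: int) -> int:
        """Record usage and return the remaining budget, or raise if over the cap."""
        new_total = self.used[agent] + tokens
        if new_total > self.limits[agent]:
            raise BudgetExceeded(
                f"{agent} would use {new_total} of {self.limits[agent]} tokens"
            )
        self.used[agent] = new_total
        return self.limits[agent] - new_total


budget = TokenBudget({"explainer": 4000, "quiz_generator": 2000})  # illustrative numbers
print(budget.charge("explainer", 1500))
</code></pre>
<p>The orchestrator would call <code>charge()</code> after each LLM response using the token counts the provider reports, and treat <code>BudgetExceeded</code> as a signal to stop the agent and return what it has.</p>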
<p><strong>Checkpoint migration safety.</strong> Version your <code>AgentState</code> schema. When you deploy a new version of the system, in-flight workflows checkpointed against the old schema will try to deserialize with the new code. If fields are added or removed, those workflows fail mid-flight.</p>
<p>Treat checkpoint format as a public API: add new fields as optional with defaults, deprecate removed fields for a release cycle before deleting them, and test schema migrations as part of your deployment pipeline.</p>
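<p>One way to apply that rule is an explicit upgrade step that runs when an old checkpoint is loaded. The sketch below assumes a <code>schema_version</code> key and two made-up fields. Neither is taken from the real <code>AgentState</code>.</p>
<pre><code class="language-python">CURRENT_SCHEMA_VERSION = 2

# Fields added in each version, with the default an old checkpoint should receive.
# The field names are illustrative, not taken from the real AgentState.
ADDED_FIELDS = {
    2: {"coach_feedback": None, "retry_count": 0},
}


def upgrade_state(state: dict) -> dict:
    """Return a copy of a checkpointed state upgraded to the current schema."""
    upgraded = dict(state)
    version = upgraded.get("schema_version", 1)  # checkpoints without a version are v1
    for target in range(version + 1, CURRENT_SCHEMA_VERSION + 1):
        for field, default in ADDED_FIELDS.get(target, {}).items():
            upgraded.setdefault(field, default)
    upgraded["schema_version"] = CURRENT_SCHEMA_VERSION
    return upgraded


old_checkpoint = {"goal": "Learn Python closures"}
print(upgrade_state(old_checkpoint))
</code></pre>
<p>A deployment pipeline test can then load a fixture checkpoint from each previous release, run it through the upgrade, and assert that the graph can resume from it.</p>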
<p><strong>Cold start handling.</strong> Agent containers with model weights and heavy dependencies can take 30 to 60 seconds to cold start. Production request rates can't tolerate users waiting a minute while a container initializes. Either maintain a warm pool of containers (cost trade-off) or design fallback paths that tolerate cold start delays with a simpler, faster backup agent. There is no third option. Don't pretend cold starts won't happen.</p>
<p><strong>Observability at scale.</strong> Local Langfuse works for development. Production deployments need either managed Langfuse or a similar distributed tracing backend that can handle millions of traces per day.</p>
<p>The decision-level tracing is what you need. Infrastructure metrics alone can't tell you what went wrong in a multi-agent reasoning chain. Request latency can be fine while the model is producing wrong answers.</p>
<p><strong>Evaluation in CI.</strong> The DeepEval tests from Chapter 7 should run as part of your deployment pipeline. Every new model, prompt, or agent change triggers a full eval suite. If faithfulness drops below threshold, the change is blocked. This is the regression suite for LLM behaviour, your insurance against gradual quality erosion.</p>
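<p>The gate itself is simple once the scores exist. This sketch assumes the eval run has already produced a dict of metric scores. <code>failing_metrics</code> is a hypothetical helper and the scores are made up.</p>
<pre><code class="language-python">def failing_metrics(scores: dict, thresholds: dict) -> list:
    """Return the names of metrics that are missing or below their threshold."""
    failures = []
    for name, minimum in thresholds.items():
        score = scores.get(name)
        if score is None or not score >= minimum:
            failures.append(name)
    return sorted(failures)


thresholds = {"faithfulness": 0.6, "relevancy": 0.6}   # 0.6 matches the 7B defaults
scores = {"faithfulness": 0.55, "relevancy": 0.82}      # made-up scores for the demo
failed = failing_metrics(scores, thresholds)
print(failed)
# In CI, a non-empty list would translate into a non-zero exit code:
#   sys.exit(1 if failed else 0)
</code></pre>
<p>Treating a missing metric as a failure is deliberate: a silently skipped eval should block the deploy just like a low score.</p>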
<p><strong>Content safety.</strong> Agent outputs should pass through content filters before reaching users or production systems. The Explainer is grounded in your notes, but the LLM can still produce hallucinations or content that violates policies.</p>
<p>A schema validation layer plus a content filter before the output reaches the database or the user is non-negotiable in any production environment where the consequence of a bad output matters.</p>
<p>Appendix C contains the complete hardening checklist.</p>
<h3 id="heading-99-where-the-ecosystem-is-going-in-2026">9.9 Where the Ecosystem is Going in 2026</h3>
<p>Three trends are reshaping how multi-agent systems get built, and all of them are worth watching as you plan your next project.</p>
<h4 id="heading-protocol-consolidation">Protocol consolidation</h4>
<p>MCP and A2A both shipped v1.0 specs in 2025. Google, Anthropic, Salesforce, SAP, and dozens of other vendors signed on. The agentic era is following the same standardisation arc that web services followed with REST: messy at first, then a few clear winners that everything else converges on.</p>
<p>The implication for your work: standardising your tool access on MCP and your agent coordination on A2A now is a low-risk bet. These protocols will still be relevant in three years. Framework choices will come and go.</p>
<h4 id="heading-local-first-infrastructure">Local-first infrastructure</h4>
<p>The gap between local and cloud inference quality keeps narrowing. A year ago, running a multi-agent system on a local 7B model was a demo, not a production tool. Today, Qwen 2.5 at 7 to 32B parameters handles tool calling reliably enough for production workflows.</p>
<p>The privacy, cost, and latency benefits of local inference are significant. Some industries genuinely can't send data to external APIs. Architectures that work well locally also work well with managed gateways. Architectures built around a specific cloud provider's features tend to be harder to migrate.</p>
<h4 id="heading-longer-context-narrower-agents">Longer context, narrower agents</h4>
<p>Context windows keep growing. 1M+ tokens is available on several commercial models now. This pushes against the case for multi-agent systems in general: if one agent can hold the full conversation and reason over everything, why split the work?</p>
<p>The answer has shifted. Multi-agent is no longer about context window management. It's about specialisation, failure isolation, and independent deployment.</p>
<p>The reasons are discussed in Chapter 1. As single-agent capability increases, the bar for "does this problem warrant multi-agent" moves higher. Many teams building multi-agent systems today could achieve the same outcomes with a single agent and better tools.</p>
<p>The patterns in this handbook still apply. The question is just when to reach for them.</p>
<h3 id="heading-910-where-to-apply-these-patterns">9.10 Where to Apply These Patterns</h3>
<p>The Learning Accelerator is a teaching vehicle. The patterns are what transfer. The same architecture fits production systems in domains like these.</p>
<h4 id="heading-1-sales-enablement">1. Sales enablement</h4>
<p>A curriculum agent builds an onboarding path for a new sales rep. A content agent explains product features from an internal knowledge base via MCP. An assessment agent tests comprehension. A progress agent tracks certification across multiple product areas. Managers approve curricula via the human-in-the-loop gate before training begins.</p>
<h4 id="heading-2-compliance-training">2. Compliance training</h4>
<p>Domain-specific curriculum agents for HIPAA, SOX, GDPR. Content agents grounded in the actual regulatory text (not the model's training data) via MCP servers. Assessment agents with stricter grading thresholds and audit logs that can be exported for regulators. The human-in-the-loop gate becomes a legal review step before the training is assigned.</p>
<h4 id="heading-3-customer-support">3. Customer support</h4>
<p>An intake agent categorises tickets. A research agent reads knowledge base articles via MCP. A drafting agent composes responses. A review agent checks for policy compliance before sending. The A2A layer lets a Salesforce agent call a ServiceNow agent, which in turn calls a custom LangGraph agent: cross-system coordination without bespoke integrations.</p>
<h4 id="heading-4-engineering-onboarding">4. Engineering onboarding</h4>
<p>A codebase agent walks new hires through the repository. A tooling agent explains the development environment. A review agent answers questions about coding standards. All are grounded in the actual codebase and docs via MCP servers pointing at internal repos.</p>
<p>The common thread: each of these has the architectural markers from Chapter 1. Different tools for different subtasks. Different LLM call patterns. Specialisation that would compromise one shared agent. Fault isolation requirements.</p>
<p>The multi-agent architecture isn't chosen for novelty. It's chosen because the problem shape matches.</p>
<h3 id="heading-911-what-to-build-next">9.11 What to Build Next</h3>
<p>A few suggestions for where to take this, from lightest lift to largest.</p>
<ol>
<li><p><strong>Add your own MCP tools:</strong> Point the filesystem server at your own notes directory. Write an MCP server that queries your preferred knowledge source: Notion, Confluence, your team's documentation site. The tool-calling loop works identically. Only the server implementation changes.</p>
</li>
<li><p><strong>Fork the curriculum:</strong> The Learning Accelerator assumes programming topics. Change the prompts in <code>curriculum_planner.py</code> to your domain: medical education, language learning, legal training. The graph structure stays the same.</p>
</li>
<li><p><strong>Build a companion analytics agent:</strong> Add a sixth agent that runs periodically (not in the main graph) and summarises learning patterns across sessions. It reads from the checkpoint database, the Langfuse traces, and MCP memory. It produces weekly progress reports. This is a great extension because it exercises every part of the system without modifying existing code.</p>
</li>
<li><p><strong>Write your own handbook:</strong> The best way to solidify these patterns is to teach them. Build a different multi-agent system for a different problem and document what you learned. The infrastructure patterns (MCP for tools, A2A for agent coordination, LangGraph for orchestration, checkpointing for resilience, LLM-as-judge for evaluation) apply to any multi-agent problem. The specific agents and tools change.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You started this handbook with a single question: does your problem actually warrant multiple agents? That question kept the rest of the engineering honest.</p>
<p>Every agent in the Learning Accelerator exists because the task it handles is genuinely different from the others. Different tools, different LLM call patterns, different temperatures, different failure modes.</p>
<p>We didn't choose multi-agent architecture for its own sake. We chose it because the problem shape required it.</p>
<p>Every technology layer above that decision followed the same discipline.</p>
<ul>
<li><p>LangGraph gave you stateful orchestration and checkpointing because a production system cannot lose state on a crash.</p>
</li>
<li><p>MCP standardised tool access because agents shouldn't be coupled to specific implementations.</p>
</li>
<li><p>A2A made cross-framework coordination possible because real infrastructure sometimes spans multiple frameworks.</p>
</li>
<li><p>Langfuse captured decision-level traces because infrastructure metrics alone can't tell you whether an agent is reasoning correctly.</p>
</li>
<li><p>DeepEval ran quality gates because the only reliable way to evaluate LLM output is another LLM judging against explicit criteria.</p>
</li>
<li><p>The Streamlit UI demonstrated that the LangGraph code is I/O-agnostic: the same graph powers a terminal session and a web app.</p>
</li>
</ul>
<p>The engineering principle underneath all of this is the one worth carrying forward: <strong>every boundary in a well-designed multi-agent system is a protocol, not a coupling</strong>.</p>
<p>Agents talk to state through a TypedDict contract. Agents talk to tools through MCP. Agents talk to each other through A2A. Agents talk to observability through LangChain callbacks.</p>
<p>Each of those boundaries can be swapped, replaced, or extended without touching the rest. That's what makes the system production-grade. Not the specific frameworks you used, but the discipline of keeping those frameworks behind clear interfaces.</p>
<p>Whatever you build next, keep that principle in view. Models will change. Frameworks will change. The agentic era's specific tooling will evolve faster than any handbook can keep up with. Good architectural decisions outlive all of it.</p>
<p>The complete code for this handbook is at <a href="https://github.com/sandeepmb/freecodecamp-multi-agent-ai-system">github.com/sandeepmb/freecodecamp-multi-agent-ai-system</a>. Clone it, run it, fork it, extend it. If you build something interesting on top of these patterns, I'd genuinely like to hear about it.</p>
<p>Now go build something.</p>
<h2 id="heading-appendix-a-framework-comparison">Appendix A: Framework Comparison</h2>
<p>Frameworks covered in this handbook and when each one fits. This table reflects the state of the ecosystem as of early 2026. Specific features change. The fit-for-purpose reasoning tends to stay stable.</p>
<table>
<thead>
<tr>
<th>Framework</th>
<th>What it is</th>
<th>When to use</th>
<th>When to skip</th>
</tr>
</thead>
<tbody><tr>
<td><strong>LangGraph</strong></td>
<td>Stateful agent graph with checkpointing, conditional routing, and native HITL</td>
<td>Production multi-agent workflows where state persistence and deterministic routing matter</td>
<td>Simple single-agent tasks with no state</td>
</tr>
<tr>
<td><strong>CrewAI</strong></td>
<td>Role-based multi-agent framework with declarative crews and tasks</td>
<td>Rapid prototyping of role-based agent collaborations. Use cases that fit the crew metaphor naturally.</td>
<td>Complex branching logic or custom control flow. The crew abstraction gets in the way.</td>
</tr>
<tr>
<td><strong>AutoGen</strong></td>
<td>Microsoft's conversational multi-agent framework with group chat patterns</td>
<td>Research and exploratory work. Multi-agent scenarios driven by conversation patterns.</td>
<td>Production systems requiring strict control flow and explicit state management</td>
</tr>
<tr>
<td><strong>LlamaIndex</strong></td>
<td>RAG-first framework with strong data ingestion and retrieval</td>
<td>Systems where retrieval over unstructured data is the core problem</td>
<td>Pure agent orchestration. You'd end up using LangGraph or similar on top.</td>
</tr>
<tr>
<td><strong>LangChain</strong></td>
<td>Broad toolkit for LLM app primitives. Foundation that LangGraph sits on</td>
<td>Lower-level building blocks (prompts, output parsers, chains) used inside agents</td>
<td>Orchestration itself. Use LangGraph for graph-based multi-agent systems.</td>
</tr>
<tr>
<td><strong>MCP</strong> (protocol)</td>
<td>Model Context Protocol. Standardised agent-to-tool interface</td>
<td>Any system where tool implementations should be swappable and cross-framework reusable</td>
<td>Single-use internal tools where a Python function works fine</td>
</tr>
<tr>
<td><strong>A2A</strong> (protocol)</td>
<td>Agent-to-Agent Protocol. Cross-framework agent coordination over HTTP</td>
<td>Cross-team or cross-framework agent coordination, independent deployment of agents</td>
<td>Tightly coupled agents that always deploy together. Direct function calls are simpler.</td>
</tr>
</tbody></table>
<p>Here's a rule of thumb for choosing the orchestrator: LangGraph's strengths (checkpointing, interrupt/resume, explicit state contracts) become essential in production. CrewAI is great when the role-based metaphor maps cleanly to your domain. AutoGen's group-chat pattern fits research and exploratory work better than strict production control flow.</p>
<p>Don't let framework preference override problem shape. If your problem is a graph, use LangGraph. If your problem is a conversation, use AutoGen.</p>
<p>And note that MCP and A2A aren't in competition with these frameworks. They're the integration layer underneath. Build your agent in LangGraph, expose it as an A2A service, use MCP for its tools. You can mix and match all three regardless of which orchestration framework you chose.</p>
<h2 id="heading-appendix-b-model-selection-guide">Appendix B: Model Selection Guide</h2>
<p>All agents in this system use Ollama for local inference. Model choice determines whether tool calling works reliably. Models under 7B parameters tend to produce malformed JSON and hallucinate tool names often enough to fail in agentic use.</p>
<h3 id="heading-recommendations-by-vram">Recommendations by VRAM</h3>
<table>
<thead>
<tr>
<th>VRAM</th>
<th>Model</th>
<th>Pull command</th>
<th>Best for</th>
</tr>
</thead>
<tbody><tr>
<td>8 GB</td>
<td><code>qwen2.5:7b</code></td>
<td><code>ollama pull qwen2.5:7b</code></td>
<td>General purpose, reliable tool calling</td>
</tr>
<tr>
<td>8 GB</td>
<td><code>qwen3:8b</code></td>
<td><code>ollama pull qwen3:8b</code></td>
<td>Better reasoning, same VRAM class</td>
</tr>
<tr>
<td>24 GB</td>
<td><code>qwen2.5-coder:32b</code></td>
<td><code>ollama pull qwen2.5-coder:32b</code></td>
<td>Best tool calling at this tier</td>
</tr>
<tr>
<td>24 GB</td>
<td><code>qwen3:32b</code></td>
<td><code>ollama pull qwen3:32b</code></td>
<td>Best overall at this tier</td>
</tr>
<tr>
<td>CPU only</td>
<td><code>qwen2.5:7b</code> (Q4_K_M)</td>
<td><code>ollama pull qwen2.5:7b</code></td>
<td>Works, 5 to 10 times slower</td>
</tr>
</tbody></table>
<p><strong>On macOS,</strong> Apple Silicon unified memory is shared between CPU and GPU. A 16 GB unified memory Mac gives roughly 8 GB to the model. Check via Apple menu → About This Mac → chip info.</p>
<p><strong>Minimum viable tier for production agentic use: 7B parameters.</strong> Sub-7B models handle chat fine but produce too many JSON formatting errors for reliable tool calling.</p>
<p>The <code>format="json"</code> constraint in Ollama helps. It's an inference-time guarantee of valid JSON. But the model still needs to produce <em>meaningful</em> JSON, not just parseable JSON, and that requires the 7B+ parameter count.</p>
<h3 id="heading-temperature-settings-used-in-this-system">Temperature Settings Used in This System</h3>
<p>These are the settings baked into each agent. Never use <code>temperature &gt; 0.5</code> for any agent that produces structured JSON output. Parsing becomes unreliable.</p>
<pre><code class="language-python">from langchain_ollama import ChatOllama

MODEL = "qwen2.5:7b"  # in the project this comes from OLLAMA_MODEL

# Structured output: Curriculum Planner, Quiz Generator grading
ChatOllama(model=MODEL, temperature=0.1, format="json")

# Tool-calling loop: Explainer
ChatOllama(model=MODEL, temperature=0.3)

# Creative generation: Quiz Generator questions, Progress Coach
ChatOllama(model=MODEL, temperature=0.4, format="json")

# Deterministic evaluation: DeepEval OllamaJudge
ChatOllama(model=MODEL, temperature=0.0)
</code></pre>
<p><strong>Why different temperatures matter:</strong> A single agent with one temperature setting compromises every task it handles. Structured JSON planning needs 0.1 for consistency. Creative question generation benefits from 0.4 for variety. Grading needs 0.1 for fairness.</p>
<p>If one agent did all three with <code>temperature=0.25</code>, planning would produce more parse errors and question generation would turn repetitive. Splitting these into different agents with different temperature configurations is one of the core justifications for multi-agent architecture in this system.</p>
<h3 id="heading-switching-models">Switching Models</h3>
<p>Change <code>OLLAMA_MODEL</code> in <code>.env</code>. No code changes needed.</p>
<pre><code class="language-bash"># .env
OLLAMA_MODEL=qwen2.5-coder:32b
OLLAMA_BASE_URL=http://localhost:11434
</code></pre>
<p>Then pull the model if you haven't:</p>
<pre><code class="language-bash">ollama pull qwen2.5-coder:32b
</code></pre>
<p>All four agents automatically use the new model on the next run.</p>
<h3 id="heading-eval-test-thresholds-by-model">Eval Test Thresholds by Model</h3>
<p>Thresholds in <code>tests/test_eval.py</code> are calibrated for 7B models at 0.6. Larger models typically score higher. If you upgrade and want stricter quality gates, raise these:</p>
<table>
<thead>
<tr>
<th>Model tier</th>
<th>Faithfulness</th>
<th>Relevancy</th>
<th>Question Quality</th>
<th>Notes</th>
</tr>
</thead>
<tbody><tr>
<td>7-8B local</td>
<td>0.65-0.80</td>
<td>0.70-0.85</td>
<td>0.65-0.80</td>
<td>Default thresholds at 0.6</td>
</tr>
<tr>
<td>32B local</td>
<td>0.80-0.90</td>
<td>0.85-0.95</td>
<td>0.80-0.90</td>
<td>Can raise thresholds to 0.75</td>
</tr>
<tr>
<td>GPT-4 / Claude</td>
<td>0.85-0.98</td>
<td>0.90-0.98</td>
<td>0.85-0.95</td>
<td>Can raise thresholds to 0.85</td>
</tr>
</tbody></table>
<p>Set the threshold at roughly 10 percentage points below the typical score. Too close to the typical score and you get flaky tests. Too far and you miss regressions.</p>
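<p>As a quick worked example of that rule, the helper below takes a handful of observed scores and suggests a gate about 0.10 under their mean. <code>suggest_threshold</code> is a hypothetical helper and the scores are made up. Using the mean as the "typical" score is an assumption, and a median would work equally well.</p>
<pre><code class="language-python">from statistics import mean


def suggest_threshold(observed_scores: list, margin: float = 0.10) -> float:
    """Suggest a gate roughly `margin` below the typical (mean) observed score."""
    typical = mean(observed_scores)
    return round(max(0.0, typical - margin), 2)


# Made-up faithfulness scores from a handful of eval runs.
print(suggest_threshold([0.70, 0.75, 0.80]))
</code></pre>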
<h2 id="heading-appendix-c-production-hardening-checklist">Appendix C: Production Hardening Checklist</h2>
<p>The system as written is tutorial-grade. Before deploying at scale, work through this checklist. Each item maps to a real failure mode that appears in production deployments.</p>
<h3 id="heading-orchestration-and-state">Orchestration and State</h3>
<ul>
<li><p>[ ] <strong>Replace SQLite with PostgreSQL</strong> for checkpointing. SQLite works for single-process. Postgres is required for multi-instance deployments.</p>
</li>
<li><p>[ ] <strong>Version your</strong> <code>AgentState</code> <strong>schema.</strong> Add new fields as optional with defaults. Deprecate removed fields for a release cycle before deleting.</p>
</li>
<li><p>[ ] <strong>Test schema migrations</strong> as part of your deployment pipeline. In-flight workflows must survive rolling deployments.</p>
</li>
<li><p>[ ] <strong>Set explicit timeout budgets</strong> on every agent call. Propagate the timeout from the orchestrator to every downstream service.</p>
</li>
<li><p>[ ] <strong>Add circuit breakers</strong> around every external service call (LLM API, A2A services, MCP servers). Retry storms amplify production pressure.</p>
</li>
</ul>
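<p>The circuit breaker item above can be sketched as a small wrapper around any callable. This is a simplified illustration, not a production library: <code>CircuitBreaker</code> and <code>CircuitOpen</code> are hypothetical names, and the failure count and cool-down are arbitrary.</p>
<pre><code class="language-python">import time


class CircuitOpen(RuntimeError):
    """Raised when calls are being rejected without contacting the service."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after      # seconds to wait before trying again
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.reset_after > self.clock() - self.opened_at:
                raise CircuitOpen("service recently failed; skipping call")
            self.opened_at = None           # cool-down elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0                   # any success closes the circuit again
        return result


def flaky():
    raise ConnectionError("A2A service unreachable")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except (ConnectionError, CircuitOpen) as exc:
        print(type(exc).__name__)
</code></pre>
<p>After two real failures the third call is rejected locally, which is what keeps a struggling service from being hit by a retry storm.</p>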
<h3 id="heading-inference-and-cost">Inference and Cost</h3>
<ul>
<li><p>[ ] <strong>Route through an inference gateway</strong> (LiteLLM or similar) with rate limiting, model fallback, and per-session cost tracking.</p>
</li>
<li><p>[ ] <strong>Enforce per-agent token budgets</strong> at the orchestrator level. Hard limits, not guidelines.</p>
</li>
<li><p>[ ] <strong>Cap</strong> <code>max_iterations</code> on every tool-calling loop. The Explainer has <code>max_iterations=8</code>. Verify each agent has a similar cap.</p>
</li>
<li><p>[ ] <strong>Monitor per-session cost</strong> and alert when a session exceeds the budget. A confused agent can loop indefinitely otherwise.</p>
</li>
</ul>
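<p>The iteration cap from the list above looks like this in its simplest form. <code>run_tool_loop</code> and its <code>step</code> callable are hypothetical stand-ins for one LLM call plus tool execution. The default of 8 mirrors the Explainer's cap mentioned above.</p>
<pre><code class="language-python">def run_tool_loop(step, max_iterations: int = 8):
    """Call step() until it reports a final answer or the iteration cap is hit.

    `step` is a hypothetical callable returning (done, value); it stands in
    for one LLM call plus any tool execution it requested.
    """
    for iteration in range(1, max_iterations + 1):
        done, value = step()
        if done:
            return {"answer": value, "iterations": iteration, "capped": False}
    # Cap reached: stop and report it rather than looping forever.
    return {"answer": None, "iterations": max_iterations, "capped": True}


confused_agent = lambda: (False, None)   # never produces a final answer
print(run_tool_loop(confused_agent))
</code></pre>
<p>Returning a <code>capped</code> flag instead of raising lets the orchestrator decide whether to fall back, retry with a different prompt, or surface the failure to the user.</p>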
<h3 id="heading-observability">Observability</h3>
<ul>
<li><p>[ ] <strong>Move Langfuse to managed or high-availability self-hosted.</strong> Local Langfuse doesn't scale to production trace volumes.</p>
</li>
<li><p>[ ] <strong>Capture session-level traces</strong> with structured tags (user ID, feature flag, model version) so you can filter and compare.</p>
</li>
<li><p>[ ] <strong>Set up alerting</strong> on error rate spikes, token cost spikes, and latency regressions.</p>
</li>
<li><p>[ ] <strong>Sample traces</strong> in production. 100% sampling becomes expensive. 10 to 20% sampling with full capture of errors is typically enough.</p>
</li>
<li><p>[ ] <strong>Export traces to a data warehouse</strong> periodically for long-term analysis and regulatory audit.</p>
</li>
</ul>
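<p>The sampling item can be sketched in a few lines. This is a minimal illustration, assuming each trace has a string ID and an error flag. The <code>should_keep_trace</code> function and its parameters are hypothetical and are not part of Langfuse's API.</p>
<pre><code class="language-python">import hashlib


def should_keep_trace(trace_id, is_error, sample_percent=15):
    """Keep every error trace, plus a stable sample of the rest."""
    if is_error:
        return True
    # Hash the trace ID into a bucket from 0 to 99 so the decision is
    # deterministic: the same trace is always kept or always dropped.
    digest = hashlib.sha256(trace_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket in range(sample_percent)


kept = sum(should_keep_trace(f"trace-{i}", is_error=False) for i in range(10_000))
print(f"kept {kept} of 10000 non-error traces")
</code></pre>
<p>Hashing the ID instead of calling a random number generator means every service that sees the same trace makes the same decision, so sampled traces stay complete across service boundaries.</p>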
<h3 id="heading-evaluation-and-quality">Evaluation and Quality</h3>
<ul>
<li><p>[ ] <strong>Run the eval suite in CI</strong> on every deployment. Block deployments that fail quality thresholds.</p>
</li>
<li><p>[ ] <strong>Maintain a regression test set</strong> of known-good inputs and expected outputs. Run this before every model change.</p>
</li>
<li><p>[ ] <strong>Track quality metrics over time.</strong> Gradual drift is harder to catch than a sudden regression.</p>
</li>
<li><p>[ ] <strong>Have human-review sampling</strong> for high-risk decisions. Not every output, but a statistically meaningful sample.</p>
</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><p>[ ] <strong>Add authentication to A2A services.</strong> Bearer tokens, mTLS, or OAuth depending on your environment.</p>
</li>
<li><p>[ ] <strong>Audit MCP tool implementations</strong> for path traversal, injection, and privilege escalation. The <code>read_study_file</code> function in this system shows the pattern.</p>
</li>
<li><p>[ ] <strong>Sanitise LLM inputs.</strong> Anything the model sees can influence its behaviour, including indirect prompt injection from retrieved content.</p>
</li>
<li><p>[ ] <strong>Validate structured outputs</strong> before applying them to production systems. Schema validation, policy rules, safety filters.</p>
</li>
<li><p>[ ] <strong>Maintain immutable audit logs</strong> of every decision that results in a production action. Required for regulated industries.</p>
</li>
<li><p>[ ] <strong>Implement human-in-the-loop thresholds</strong> for high-risk actions. Automation for low-risk, escalation for high-risk.</p>
</li>
<li><p>[ ] <strong>Rotate credentials</strong> for API keys, database connections, and service tokens.</p>
</li>
</ul>
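<p>For the path traversal item, a check along the following lines is one common approach. It is a sketch only: <code>resolve_inside</code> and the <code>study_files</code> directory are hypothetical names, and this is not the book's <code>read_study_file</code> implementation.</p>
<pre><code class="language-python">from pathlib import Path


def resolve_inside(base_dir, requested):
    """Resolve requested relative to base_dir, rejecting escapes from it."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below runs against the real location rather than the raw input string.
    target = (base / requested).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"Path escapes the allowed directory: {requested}")
    return target


print(resolve_inside("study_files", "notes/week1.md"))
try:
    resolve_inside("study_files", "../secrets.env")
except ValueError as exc:
    print(exc)
</code></pre>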
<h3 id="heading-reliability-and-failure-modes">Reliability and Failure Modes</h3>
<ul>
<li><p>[ ] <strong>Design fallback paths</strong> for every external dependency. The Progress Coach's A2A fallback pattern in this system is the model: try the service, fall back silently on any failure.</p>
</li>
<li><p>[ ] <strong>Handle cold starts</strong> for agent containers. Warm pool or tolerable fallback. Never let users wait 60 seconds for a container to initialise.</p>
</li>
<li><p>[ ] <strong>Implement content filters</strong> on agent outputs. Hallucinations happen even with grounded inputs.</p>
</li>
<li><p>[ ] <strong>Set up health checks</strong> for every service. A2A Agent Cards serve as health endpoints. Any client can fetch them to verify reachability.</p>
</li>
<li><p>[ ] <strong>Test graceful degradation</strong> explicitly. Kill services one at a time and verify the main app stays responsive.</p>
</li>
</ul>
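<p>The fallback item above boils down to a small wrapper. The sketch below is an illustration under assumed names (<code>call_with_fallback</code>, <code>remote_progress_summary</code>, <code>local_progress_summary</code>) and is not the Progress Coach code from this system. It logs the failure so that a fallback which is silent for the user is still visible to operators.</p>
<pre><code class="language-python">import logging

logger = logging.getLogger("fallback")


def call_with_fallback(primary, fallback):
    """Try primary(); on any exception, log it and return fallback()."""
    try:
        return primary()
    except Exception:
        # Silent for the user, but still recorded for operators.
        logger.warning("Primary call failed, using fallback", exc_info=True)
        return fallback()


def remote_progress_summary():
    # Stand-in for a remote service request that fails (service down, timeout).
    raise ConnectionError("progress service unreachable")


def local_progress_summary():
    return "Progress data is temporarily unavailable."


print(call_with_fallback(remote_progress_summary, local_progress_summary))
</code></pre>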
<h3 id="heading-governance">Governance</h3>
<ul>
<li><p>[ ] <strong>Document every agent's responsibilities.</strong> What tools it uses, what state it reads and writes, what failure modes are expected.</p>
</li>
<li><p>[ ] <strong>Maintain a prompt version registry</strong> tied to git commits. Know which prompt was in production when an issue occurred.</p>
</li>
<li><p>[ ] <strong>Review and approve model upgrades.</strong> Swapping a model version can change output behaviour in ways that break downstream assumptions.</p>
</li>
<li><p>[ ] <strong>Establish a rollback procedure</strong> for both code and model changes. Rolling back a bad deployment should take minutes, not hours.</p>
</li>
</ul>
<p>This isn't an exhaustive list, but it covers the failure modes that actually appear in production deployments of multi-agent systems. Work through it before your first public launch, and revisit it quarterly as the system evolves.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Protect Sensitive Data by Running LLMs Locally with Ollama ]]>
                </title>
                <description>
                    <![CDATA[ Whenever engineers are building AI-powered applications, use of sensitive data is always a top priority. You don't want to send users' data to an external API that you don't control. For me, this happ ]]>
                </description>
                <link>https://www.freecodecamp.org/news/protect-sensitive-data-with-local-llms/</link>
                <guid isPermaLink="false">69a99b623728a9dc358a5d85</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ LLM&#39;s  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manoj Aggarwal ]]>
                </dc:creator>
                <pubDate>Thu, 05 Mar 2026 15:04:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/92c9b0b4-5ff8-40ab-b5f5-a060765e99b4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When engineers build AI-powered applications, protecting sensitive data is a top priority. You don't want to send users' data to an external API that you don't control.</p>
<p>For me, this happened when I was building <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>, which is my personal open-source project that helps me with my finances. This application lets you upload your bank statements, tax forms like 1099s, and so on, and then you can ask questions in plain English like, "How much did I spend on groceries this month?" or "What was my effective tax rate last year?"</p>
<p>The problem is that answering these questions means sending sensitive transaction history, W-2s, and income data to OpenAI, Anthropic, or Google, which I was not comfortable with. Even after redacting PII from these documents, I was not OK with the trade-off.</p>
<p>This is where Ollama comes in. Ollama lets you run large language models entirely on your own laptop. You don't need any API keys or cloud infrastructure and no data leaves your machine.</p>
<p>In this tutorial, I will walk you through what Ollama is, how to get started with it, and how to use it in a real Python application so that users of the application can choose to keep their data completely local.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-ollama">What is Ollama?</a></p>
</li>
<li><p><a href="#heading-how-ollamas-api-works">How Ollama's API works</a></p>
</li>
<li><p><a href="#heading-how-to-call-ollama-from-python">How to Call Ollama from Python</a></p>
</li>
<li><p><a href="#heading-how-to-integrate-ollama-into-a-langchain-app">How to Integrate Ollama into a LangChain App</a></p>
</li>
<li><p><a href="#heading-how-to-build-an-llm-provider-agnostic-app">How to Build an LLM-Provider Agnostic App</a></p>
</li>
<li><p><a href="#heading-how-to-use-ollama-with-langgraph">How to use Ollama with LangGraph</a></p>
</li>
<li><p><a href="#heading-how-financegpt-uses-this-in-practice">How FinanceGPT Uses This in Practice</a></p>
</li>
<li><p><a href="#heading-tradeoffs-to-be-aware-of">Tradeoffs to be Aware Of</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-check-out-financegpt">Check Out FinanceGPT</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You will need the following at a minimum:</p>
<ul>
<li><p>Python 3.10+</p>
</li>
<li><p>A machine with at least 8GB of RAM (16GB recommended for larger models)</p>
</li>
<li><p>Basic familiarity with Python and pip</p>
</li>
</ul>
<h2 id="heading-what-is-ollama">What is Ollama?</h2>
<p>Ollama is an open-source tool that makes running LLMs locally very easy. You can think of it as Docker but for AI models. You can pull a model with a single command, and Ollama handles everything else: downloading the weights, managing memory, and serving the model through a local REST API.</p>
<p>Ollama also exposes an endpoint that is compatible with OpenAI's API format, which means an application that already talks to OpenAI can usually switch to Ollama by changing only the base URL and the model name.</p>
<h3 id="heading-installation">Installation</h3>
<p>First, download the installer from <a href="https://ollama.com/">ollama.com</a>. Once it is installed, you can verify the installation:</p>
<pre><code class="language-shell">ollama --version
</code></pre>
<p>The above command checks whether Ollama was installed correctly and prints the current version.</p>
<h3 id="heading-pull-and-run-your-first-model">Pull and Run Your First Model</h3>
<p>Ollama hosts a variety of models on <a href="https://ollama.com/library">ollama.com/library</a>. To pull and immediately chat with one, just do:</p>
<pre><code class="language-shell">ollama run llama3.2
</code></pre>
<p>This command downloads the model from Ollama's library and starts an interactive chat session with it. Note that models are typically a few gigabytes in size, depending on which one you download. Alternatively, if you only want to download a model without chatting with it:</p>
<pre><code class="language-shell">ollama pull mistral
</code></pre>
<p>This downloads a model to your machine without starting a chat session, which is useful when you want to set up models in advance.</p>
<p>You can run the following command to list the models you have installed:</p>
<pre><code class="language-shell">ollama list
</code></pre>
<p>This shows all models you've downloaded locally along with their sizes.</p>
<p>I have used the following models and they have worked great for specific tasks:</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>Size</th>
<th>Good For</th>
</tr>
</thead>
<tbody><tr>
<td><code>llama3.2</code></td>
<td>~2GB</td>
<td>Fast, general purpose</td>
</tr>
<tr>
<td><code>mistral</code></td>
<td>~4GB</td>
<td>Strong instruction following</td>
</tr>
<tr>
<td><code>qwen2.5:7b</code></td>
<td>~4GB</td>
<td>Multilingual, reasoning</td>
</tr>
<tr>
<td><code>deepseek-r1:7b</code></td>
<td>~4GB</td>
<td>Complex reasoning tasks</td>
</tr>
</tbody></table>
<h2 id="heading-how-ollamas-api-works">How Ollama's API works</h2>
<p>Once Ollama is running, it serves its API on <code>localhost:11434</code>. You can call it directly using curl:</p>
<pre><code class="language-shell">curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "What is compound interest?" }],
  "stream": false
}'
</code></pre>
<p>This sends a chat message directly to Ollama's REST API from the command line, with streaming disabled so you get the full response at once. That endpoint is Ollama's native chat API. For integration work, the more useful endpoint is <code>http://localhost:11434/v1</code>, which is OpenAI-compatible. This is the key feature that makes it easy to drop Ollama into existing apps that use OpenAI or other LLMs.</p>
<h2 id="heading-how-to-call-ollama-from-python">How to Call Ollama from Python</h2>
<h3 id="heading-how-to-use-the-ollama-python-library">How to Use the Ollama Python Library</h3>
<p>Ollama has its own Python library that is pretty intuitive to use:</p>
<pre><code class="language-shell">pip install ollama
</code></pre>
<pre><code class="language-python">from ollama import chat

response = chat(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.message.content)
</code></pre>
<p>The above code uses Ollama's native Python SDK to send a message and print the model's reply, which is the most straightforward way to call Ollama from Python.</p>
<h3 id="heading-how-to-use-the-openai-sdk-with-ollama-as-the-backend">How to Use the OpenAI SDK with Ollama as the Backend</h3>
<p>As mentioned earlier, Ollama has an endpoint that is OpenAI compatible, so you can also use the OpenAI Python SDK and just point it to your local server:</p>
<pre><code class="language-shell">pip install openai
</code></pre>
<pre><code class="language-python">from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # Required by the SDK, but ignored by Ollama
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.choices[0].message.content)
</code></pre>
<p>This uses the standard OpenAI Python SDK but redirects it to your local Ollama server. The <code>api_key</code> field is required by the SDK but ignored by Ollama. This pattern makes using Ollama seamless for existing applications. The code is nearly identical to what you would write for OpenAI.</p>
<h2 id="heading-how-to-integrate-ollama-into-a-langchain-app">How to Integrate Ollama into a LangChain App</h2>
<p>Most production applications are built with an orchestration framework like LangChain, which has native Ollama support. This means swapping providers is just a one-line change.</p>
<p>Install the integration:</p>
<pre><code class="language-shell">pip install langchain-ollama
</code></pre>
<h3 id="heading-how-to-create-a-chat-model">How to Create a Chat Model</h3>
<pre><code class="language-python">from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")

response = llm.invoke("What is the difference between a W-2 and a 1099?")
print(response.content)
</code></pre>
<p>This creates a LangChain-compatible chat model backed by a local Ollama model, a one-line swap from <code>ChatOpenAI</code>.</p>
<p>Compare this to the OpenAI version and you will see that the interface is almost identical:</p>
<pre><code class="language-python">from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
</code></pre>
<h2 id="heading-how-to-build-an-llm-provider-agnostic-app">How to Build an LLM-Provider Agnostic App</h2>
<p>The real power comes from abstracting over LLM providers. Applications like Perplexity let users choose the LLM they want to use for their tasks. Here's a simple factory pattern that returns the right LLM based on the configuration:</p>
<pre><code class="language-python">from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_anthropic import ChatAnthropic

def get_llm(provider: str, model: str):
    """
    Return the appropriate LangChain LLM based on the provider.
    
    Args:
        provider: One of "openai", "ollama", "anthropic"
        model: The model name (e.g. "gpt-4o", "llama3.2", "claude-3-5-sonnet")
    
    Returns:
        A LangChain chat model ready to use
    """
    if provider == "openai":
        return ChatOpenAI(model=model)
    elif provider == "ollama":
        return ChatOllama(model=model)
    elif provider == "anthropic":
        return ChatAnthropic(model=model)
    else:
        raise ValueError(f"Unknown provider: {provider}")
</code></pre>
<p>The above snippet shows a helper that returns the right LangChain model based on a provider string.</p>
<p>Now the rest of your code, including your chains, agents, and tools, does not need to know whose LLM is running underneath. You pass <code>llm</code> around and it just works.</p>
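<p>One possible refinement, sketched here under some assumptions, is to read the provider from configuration and import each integration lazily, so that a user who only wants Ollama does not need the OpenAI or Anthropic packages installed. The <code>LLM_PROVIDER</code> and <code>LLM_MODEL</code> variable names mirror the FinanceGPT <code>.env</code> settings shown later in this article. The <code>get_llm_from_env</code> helper, the <code>PROVIDERS</code> table, and the default values are hypothetical.</p>
<pre><code class="language-python">import importlib
import os

# Maps a provider name to the (module, class) that implements it.
# The package and class names match the imports used in the factory above.
PROVIDERS = {
    "openai": ("langchain_openai", "ChatOpenAI"),
    "ollama": ("langchain_ollama", "ChatOllama"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic"),
}


def get_llm_from_env(env=None):
    """Build a chat model from LLM_PROVIDER and LLM_MODEL settings."""
    env = os.environ if env is None else env
    # Assumed defaults: fall back to a local model if nothing is configured.
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    model = env.get("LLM_MODEL", "llama3.2")
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    module_name, class_name = PROVIDERS[provider]
    # Import lazily so only the selected provider's package must be installed.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)(model=model)
</code></pre>
<p>With <code>langchain-ollama</code> installed and <code>LLM_PROVIDER=ollama</code> set, calling <code>get_llm_from_env()</code> should return a <code>ChatOllama</code> instance that you can pass around like any other chat model.</p>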
<h2 id="heading-how-to-use-ollama-with-langgraph">How to use Ollama with LangGraph</h2>
<p>If you're using LangGraph to build agents (as I covered in my <a href="https://www.freecodecamp.org/news/how-to-develop-ai-agents-using-langgraph-a-practical-guide/">previous article on AI agents</a>), plugging in Ollama is equally seamless:</p>
<pre><code class="language-python">from langgraph.prebuilt import create_react_agent
from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def get_spending_summary(category: str) -&gt; str:
    """Get total spending for a given category this month."""
    # In a real app, this would query your database
    return f"You spent $342.50 on {category} this month."

llm = ChatOllama(model="llama3.2")

agent = create_react_agent(
    model=llm,
    tools=[get_spending_summary]
)

response = agent.invoke({
    "messages": [{"role": "user", "content": "How much did I spend on groceries?"}]
})

print(response["messages"][-1].content)
</code></pre>
<p>This snippet builds a ReAct agent that uses a locally-running model to decide when to call tools while keeping all data on-device even during agentic workflows.</p>
<p>The agent will decide to call the <code>get_spending_summary</code> tool when needed and get the result using the locally running model instead of sending your data over the internet to OpenAI.</p>
<h2 id="heading-how-financegpt-uses-this-in-practice">How FinanceGPT Uses This in Practice</h2>
<p>FinanceGPT is built to support OpenAI, Anthropic, Google and Ollama as LLM providers. The user sets their preference on the UI or in a config file and the application instantiates the right model using a pattern very similar to the factory pattern above.</p>
<p>When the user chooses Ollama, here's what happens:</p>
<ol>
<li><p>Their bank statements and other sensitive documents are parsed locally</p>
</li>
<li><p>Sensitive fields like SSNs are masked before any LLM call</p>
</li>
<li><p>The masked data and query go to the local Ollama server running on their own machine</p>
</li>
<li><p>The response comes back locally and nothing ever leaves their network</p>
</li>
</ol>
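<p>To make the masking step concrete, here is a minimal sketch of what masking SSNs before an LLM call could look like. It is not FinanceGPT's actual masking code. The <code>mask_ssns</code> function and the pattern are assumptions for illustration, and the pattern only handles SSNs written with dashes.</p>
<pre><code class="language-python">import re

# Matches SSNs written as 123-45-6789. Other formats (no dashes, spaces)
# would need additional patterns.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssns(text):
    """Replace all but the last four digits of each SSN."""
    return SSN_PATTERN.sub(r"***-**-\1", text)


print(mask_ssns("Employee SSN: 123-45-6789, wages: 85,000"))
</code></pre>
<p>Keeping the last four digits lets the model still tell two people apart in a document without ever seeing a full identifier.</p>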
<p>To run FinanceGPT locally with Ollama, the setup looks like this:</p>
<pre><code class="language-shell"># 1. Pull a capable model
ollama pull llama3.2

# 2. Clone and configure FinanceGPT
git clone https://github.com/manojag115/FinanceGPT.git
cd FinanceGPT
cp .env.example .env

# 3. In .env, set your LLM provider to Ollama
# LLM_PROVIDER=ollama
# LLM_MODEL=llama3.2

# 4. Start the full stack
docker compose -f docker-compose.quickstart.yml up -d
</code></pre>
<p>With this setup, the entire application, including the frontend, backend, and LLM, runs on your own hardware.</p>
<h2 id="heading-tradeoffs-to-be-aware-of">Tradeoffs to be Aware Of</h2>
<p>Ollama is a great local alternative to using cloud LLMs, but it comes with its own problems.</p>
<h3 id="heading-response-quality">Response Quality</h3>
<p>The models most people run locally through Ollama are small (the ones in the table above are in the 3B to 7B parameter range), so they will not match GPT-4o on complex reasoning tasks. For simple Q&amp;A and summarization tasks, the results are often comparable, but for multi-step reasoning or nuanced judgement calls, the gap is noticeable.</p>
<h3 id="heading-speed">Speed</h3>
<p>Inference speed depends on the hardware that is running the model. Without a GPU, the Ollama models can take several seconds to respond. On Apple Silicon (M1/M2/M3), the performance is surprisingly good even without a dedicated GPU.</p>
<h3 id="heading-hardware-requirements">Hardware Requirements</h3>
<p>Small models (7B parameters) need around 8GB of RAM, while larger models (13B+) need 16GB or more. If you are building your application for end users, you cannot guarantee they have the hardware.</p>
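<p>A back-of-the-envelope calculation helps explain these numbers. The sketch below estimates the size of the weights alone from the parameter count and the bits stored per weight. The 4-bit figure is an assumption about how the default model tags are quantized, though it is consistent with the roughly 4GB downloads listed for 7B models in the table earlier in this article. Real memory use is higher, because the runtime also needs room for the context and for the rest of your system.</p>
<pre><code class="language-python">def estimate_weights_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for name, params in [("7B", 7), ("13B", 13)]:
    four_bit = estimate_weights_gb(params, 4)
    sixteen_bit = estimate_weights_gb(params, 16)
    print(f"{name}: {four_bit:.1f} GB at 4-bit, {sixteen_bit:.1f} GB at 16-bit")
</code></pre>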
<h3 id="heading-tool-use-and-function-calling">Tool Use and Function Calling</h3>
<p>Not all local models support function calling reliably. If your agent depends heavily on tool use, test your chosen model carefully. Models like <code>qwen2.5</code> and <code>mistral</code> generally handle this better than others.</p>
<p>The right mental model: use cloud models when you need maximum capability, and local models when privacy or cost constraints make cloud models impractical.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you learned what Ollama is, how to install it and pull models, and three different ways to call it from Python: the native Ollama library, the OpenAI-compatible SDK, and LangChain. You also saw how to build a provider-agnostic factory pattern so your app can switch between cloud and local models with a single config change.</p>
<p>Ollama makes local LLMs genuinely practical for production apps. The OpenAI-compatible API means integration is nearly zero-friction, and LangChain's native support means you can build provider-agnostic apps from the start.</p>
<p>The finance domain is an obvious fit — but the same principle applies anywhere sensitive data is involved: healthcare, legal tech, HR, personal productivity. If your app processes data that users wouldn't want stored on someone else's server, giving them a local option isn't just a nice-to-have. It's a trust feature.</p>
<h2 id="heading-check-out-financegpt">Check Out FinanceGPT</h2>
<p>All the code examples here came from <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>. If you want to see these patterns in a complete app, poke around the repo. It's got document processing, portfolio tracking, tax optimization – all built with LangGraph.</p>
<p>If you find this helpful, <a href="https://github.com/manojag115/FinanceGPT">give the project a star on GitHub</a> – it helps other developers discover it.</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><a href="https://ollama.com/docs">Ollama Documentation</a></p>
</li>
<li><p><a href="https://ollama.com/library">Ollama Model Library</a></p>
</li>
<li><p><a href="https://python.langchain.com/docs/integrations/chat/ollama/">LangChain Ollama Integration</a></p>
</li>
<li><p><a href="https://www.freecodecamp.org/news/how-to-develop-ai-agents-using-langgraph-a-practical-guide/">How to Build AI Agents with LangGraph (my previous article)</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Develop AI Agents Using LangGraph: A Practical Guide ]]>
                </title>
                <description>
                    <![CDATA[ AI agents are all the rage these days. They’re like traditional chatbots, but they have the ability to utilize a plethora of tools in the background. They can also decide which tool to use and when to ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-develop-ai-agents-using-langgraph-a-practical-guide/</link>
                <guid isPermaLink="false">69965d1013f3e8d4dfe2a929</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agentic AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI Agent Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manoj Aggarwal ]]>
                </dc:creator>
                <pubDate>Thu, 19 Feb 2026 00:45:04 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771461883355/00e4ae2d-048d-461c-93f9-184a67280770.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>AI agents are all the rage these days. They’re like traditional chatbots, but they have the ability to utilize a plethora of tools in the background. They can also decide which tool to use and when to use it to answer your questions.</p>
<p>In this tutorial, I’ll show you how to build this type of agent using <code>LangGraph</code>. We’ll dig into real code from my personal project <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>, an open-source financial assistant I created to help me with my finances.</p>
<p>You’ll walk away understanding how AI agents actually work under the hood, and you’ll be able to build your own agent for whatever domain you are working on.</p>
<h2 id="heading-what-ill-cover">What I’ll Cover:</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-are-ai-agents">What Are AI Agents?</a></p>
</li>
<li><p><a href="#heading-what-is-langgraph">What is LangGraph?</a></p>
</li>
<li><p><a href="#heading-core-concept-1-tools">Core Concept 1: Tools</a></p>
</li>
<li><p><a href="#heading-core-concept-2-agent-state">Core Concept 2: Agent State</a></p>
</li>
<li><p><a href="#heading-core-concept-3-the-agent-graph">Core Concept 3: The Agent Graph</a></p>
</li>
<li><p><a href="#heading-how-to-put-it-all-together">How to Put it All Together</a></p>
</li>
<li><p><a href="#heading-how-the-agent-thinks">How the Agent Thinks</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-resources-worth-checking-out">Resources Worth Checking Out</a></p>
</li>
<li><p><a href="#heading-check-out-financegpt">Check Out FinanceGPT</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before diving in, you should be comfortable with the following:</p>
<p><strong>Python knowledge</strong>: You should know how to write Python functions, work with async/await syntax, and understand decorators. The code examples use all three extensively.</p>
<p><strong>Basic LLM/chatbot familiarity</strong>: You don't need to be an expert, but knowing what a large language model is and having some experience calling one (via OpenAI's API or similar) will help you follow along.</p>
<p><strong>LangChain basics</strong>: We'll be using LangGraph, which is built on top of LangChain. If you've never used LangChain before, it's worth skimming their <a href="https://python.langchain.com/docs/get_started/quickstart">quickstart guide</a> first.</p>
<p>You'll also need the following tools installed:</p>
<ul>
<li><p>Python 3.10+</p>
</li>
<li><p><a href="https://python.langchain.com/docs/get_started/quickstart">An OpenAI API key</a> (the examples use <code>gpt-4-turbo-preview</code>)</p>
</li>
<li><p>The following packages, installable via pip:</p>
</li>
</ul>
<pre><code class="language-shell">pip install langchain langgraph langchain-openai sqlalchemy
</code></pre>
<p>If you're planning to follow along with the full FinanceGPT project rather than just the code snippets, you'll also want a PostgreSQL database set up, but that's optional for understanding the core concepts covered here.</p>
<h2 id="heading-what-are-ai-agents">What Are AI Agents?</h2>
<p>AI agents are like traditional chatbots in that they answer user questions, but they can also figure out which tools they need and chain multiple actions together to reach an answer.</p>
<p>Here’s an example conversation with my FinanceGPT AI agent:</p>
<pre><code class="language-plaintext">User: "How much did I spend on groceries this month?"

Agent: [Thinks: I need transaction data filtered by category]

Agent: [Calls search_transactions(category="Groceries")]

Agent: [Gets back: $1,245.67 across 23 transactions]

Agent: "You spent $1,245.67 on groceries this month."
</code></pre>
<p>The agent broke down the problem, picked the right tool to use, and generated the answer. This matters a lot when you’re working with messy real-world problems where:</p>
<ul>
<li><p>Questions don’t fit into specific categories</p>
</li>
<li><p>You need to pull data from multiple sources</p>
</li>
<li><p>Users want to ask followup questions</p>
</li>
</ul>
<h2 id="heading-what-is-langgraph">What is LangGraph?</h2>
<p><code>LangGraph</code> is an open-source extension of <code>LangChain</code> that’s useful for creating stateful AI agents by modeling workflows as nodes and edges in a graph. You can think of your agent’s logic as a flowchart where:</p>
<ul>
<li><p><strong>Nodes</strong> are the actions (for example “ask the LLM” or “run this tool”)</p>
</li>
<li><p><strong>Edges</strong> are the arrows (what happens next)</p>
</li>
<li><p><strong>State</strong> is the information passed around</p>
</li>
</ul>
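<p>To make the flowchart idea concrete, here is a toy version written in plain Python. This is not LangGraph's API. The <code>ask_llm</code>, <code>run_tool</code>, and <code>run_graph</code> functions are made up for illustration, and the "LLM" and "tool" are hard-coded stand-ins.</p>
<pre><code class="language-python"># A toy illustration in plain Python. This is NOT LangGraph's API; it only
# mirrors the ideas of nodes, edges, and shared state.

def ask_llm(state):
    # Pretend the model requests a tool on the first pass, then answers.
    if state["tool_result"] is None:
        return {"next_action": "run_tool"}
    return {"answer": f"You spent {state['tool_result']}.", "next_action": "end"}


def run_tool(state):
    # Pretend tool call with a hard-coded result.
    return {"tool_result": "$1,245.67 on groceries", "next_action": "ask_llm"}


NODES = {"ask_llm": ask_llm, "run_tool": run_tool}


def run_graph(state, start="ask_llm"):
    current = start
    while current != "end":
        updates = NODES[current](state)   # node: do one action
        state = {**state, **updates}      # state: merge what the node returned
        current = state["next_action"]    # edge: decide what happens next
    return state


final = run_graph({"question": "How much on groceries?", "tool_result": None})
print(final["answer"])
</code></pre>
<p>LangGraph gives you the same three pieces as first-class objects, so you don't have to write and maintain a loop like this by hand.</p>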
<p>LangGraph is especially good at providing the following benefits:</p>
<ol>
<li><p><strong>Flow control</strong>: You define exactly what happens when.</p>
</li>
<li><p><strong>Stateful</strong>: The framework preserves conversation history for you.</p>
</li>
<li><p><strong>Easy to use</strong>: Just adding a decorator to an existing Python function makes it a tool.</p>
</li>
<li><p><strong>Production-ready</strong>: It has built-in error handling and retries.</p>
</li>
</ol>
<h2 id="heading-core-concept-1-tools">Core Concept 1: Tools</h2>
<p>Think of tools as just Python functions your AI agent can call. The LLM uses the function name, the docstring (including what it says about the return value), and the parameters to understand what each function does and when to use it.</p>
<p><code>LangChain</code> has a <code>@tool</code> decorator that can convert any function into a tool, for example:</p>
<pre><code class="language-python">from langchain_core.tools import tool

@tool
def get_current_weather(location: str) -&gt; str:
    """Get the current weather for a location.
    
    Use this when the user asks about weather conditions.
    
    Args:
        location: City name (e.g., "San Francisco", "New York")
    
    Returns:
        Weather description string
    """
    # In real life, you'd call a weather API here
    return f"The weather in {location} is sunny, 72°F"
</code></pre>
<p>Notice how descriptive the docstring is. That's because the LLM relies on it to decide whether this tool is the right choice or not.</p>
<p>Here is a real example from FinanceGPT. This is a tool that searches through financial transactions:</p>
<pre><code class="language-python">from langchain_core.tools import tool
from sqlalchemy.ext.asyncio import AsyncSession
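from sqlalchemy import select

# `Document` is FinanceGPT's SQLAlchemy model. Import it from wherever your
# project defines its models; the exact module path is not shown here.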

def create_search_transactions_tool(search_space_id: int, db_session: AsyncSession):
    """
    Factory function that creates a search tool with database access.
    
    This pattern lets you inject dependencies (database, user context)
    while keeping the tool signature clean for the LLM.
    """
    
    @tool
    async def search_transactions(
        keywords: str | None = None,
        category: str | None = None
    ) -&gt; dict:
        """Search financial transactions by merchant or category.
        
        Use when users ask about:
        - Spending at specific merchants ("How much at Starbucks?")
        - Spending in categories ("How much on groceries?")
        - Both combined ("Show me restaurant spending at McDonald's")
        
        Args:
            keywords: Merchant name to search for
            category: Spending category (e.g., "Groceries", "Gas")
        
        Returns:
            Dictionary with transactions, total amount, and count
        """
        # Query the database
        query = select(Document.document_metadata).where(
            Document.search_space_id == search_space_id
        )
        result = await db_session.execute(query)
        documents = result.all()
        
        # Filter transactions based on criteria
        all_transactions = []
        for (doc_metadata,) in documents:
            transactions = doc_metadata.get("financial_data", {}).get("transactions", [])
            
            for txn in transactions:
                # Apply filters
                if category and category.lower() not in str(txn.get("category", "")).lower():
                    continue
                if keywords and keywords.lower() not in txn.get("description", "").lower():
                    continue
                
                # Include matching transaction
                all_transactions.append({
                    "date": txn.get("date"),
                    "description": txn.get("description"),
                    "amount": float(txn.get("amount", 0)),
                    "category": txn.get("category"),
                })
        
        # Calculate total and return
        total = sum(abs(t["amount"]) for t in all_transactions if t["amount"] &lt; 0)
        
        return {
            "transactions": all_transactions[:20],  # Limit results
            "total_amount": total,
            "count": len(all_transactions),
            "summary": f"Found {len(all_transactions)} transactions totaling ${total:,.2f}"
        }
    
    return search_transactions
</code></pre>
<p>Let’s dive into what this code is doing.</p>
<p><strong>The factory function pattern</strong>: The tool only takes parameters the LLM can provide (a keyword and category), but it also needs a database session and <code>search_space_id</code> to know whose data to query. The factory function solves this by capturing those dependencies in a closure, so the LLM sees a clean interface while the database wiring stays hidden.</p>
<p><strong>The filtering logic</strong>: We loop through all transactions and apply the optional filters. If <code>category</code> is provided, it must appear in the transaction's category field. If <code>keywords</code> is provided, it must appear in the merchant description. Both can be used together, letting the LLM handle questions like "How much did I spend at McDonald's in the Restaurants category?"</p>
<p><strong>The return value</strong>: Instead of a raw list, the tool returns a structured dict with a capped result set, a pre-calculated total, and a plain-English summary string. The summary means the LLM can read <code>"Found 23 transactions totaling $1,245.67"</code> and immediately know what to say, rather than parsing the raw data itself.</p>
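<p>To see the closure and the filtering logic in isolation, here is a minimal sketch in plain Python with no database or LangChain involved. The names <code>FAKE_STORE</code> and <code>make_search</code> are hypothetical stand-ins for illustration and are not part of FinanceGPT:</p>
<pre><code class="language-python"># Hypothetical in-memory stand-in for the database, keyed by search space ID.
FAKE_STORE = {
    1: [
        {"date": "2024-05-01", "description": "Starbucks #123", "amount": -5.25, "category": "Restaurants"},
        {"date": "2024-05-02", "description": "Whole Foods", "amount": -82.10, "category": "Groceries"},
        {"date": "2024-05-03", "description": "Payroll", "amount": 2500.00, "category": "Income"},
    ],
    2: [
        {"date": "2024-05-04", "description": "Trader Joe's", "amount": -40.00, "category": "Groceries"},
    ],
}


def make_search(search_space_id, store):
    """Factory: captures search_space_id and store in a closure."""

    def search(keywords=None, category=None):
        # Only the caller-facing filters appear in this signature.
        matches = []
        for txn in store.get(search_space_id, []):
            if category and category.lower() not in txn["category"].lower():
                continue
            if keywords and keywords.lower() not in txn["description"].lower():
                continue
            matches.append(txn)

        # Sum only the outflows (negative amounts), as the tool above does.
        total = abs(sum(min(t["amount"], 0.0) for t in matches))
        return {
            "transactions": matches[:20],  # capped result set
            "total_amount": total,
            "count": len(matches),
            "summary": f"Found {len(matches)} transactions totaling ${total:,.2f}",
        }

    return search


search_for_space_1 = make_search(1, FAKE_STORE)
result = search_for_space_1(category="groceries")
print(result["summary"])
</code></pre>
<p>The caller only ever passes <code>keywords</code> and <code>category</code>. Which data gets searched was fixed when the factory ran, so one search space can never read another's transactions by accident.</p>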
<h3 id="heading-key-tool-design-principles">Key Tool Design Principles</h3>
<p>These are the principles that differentiate a good tool from a great tool:</p>
<ol>
<li><p><strong>Docstrings:</strong> Instead of vague descriptions, you need to be thorough with the explanation of the tool in the docstring. The more examples you give, the better the LLM gets at picking the right tool.</p>
</li>
<li><p><strong>Clean signature:</strong> The tool should only take the parameters that the LLM has access to and can provide. If the tool needs user ids, or database connections (and so on), you can hide those in factory functions using closures.</p>
</li>
<li><p><strong>Return both data and summaries:</strong> Instead of just the raw data, if you include a summary field, the agent can just use that to understand the output better. Here’s an example:</p>
<pre><code class="language-json">{
    "transactions": [...],           # For detailed analysis
    "total_amount": 1245.67,         # Pre-calculated
    "summary": "Found 23 transactions..."  # Ready to send to user
}
</code></pre>
</li>
<li><p><strong>Limited context window:</strong> Capping results to a finite amount like 20-50 items depending on the use case will make sure your LLM doesn’t choke or hit context limits.</p>
</li>
</ol>
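<p>The first two principles can be checked with the standard library. The <code>@tool</code> decorator derives the tool's schema from the function signature and its description from the docstring, so inspecting those two things gives a rough idea of what the LLM will see. This is only a sketch: <code>make_lookup_tool</code>, <code>lookup_balance</code>, and <code>fake_connection</code> are made-up names, and the dictionary stands in for a real database connection:</p>
<pre><code class="language-python">import inspect


def make_lookup_tool(user_id, connection):
    """Factory that hides user_id and connection from the tool signature."""

    def lookup_balance(account_name: str):
        """Look up the balance of one account.

        Use when the user asks how much money is in a named account
        ("What's my checking balance?").

        Args:
            account_name: Account label, e.g. "checking" or "savings"
        """
        return connection[user_id][account_name]

    return lookup_balance


# Hypothetical stand-in for a database connection.
fake_connection = {42: {"checking": 1250.0, "savings": 9800.0}}
tool_fn = make_lookup_tool(42, fake_connection)

visible_params = list(inspect.signature(tool_fn).parameters)
description = inspect.getdoc(tool_fn)

print(visible_params)               # parameters a schema generator would see
print(description.splitlines()[0])  # first line of the description
print(tool_fn("checking"))
</code></pre>
<p>Only <code>account_name</code> shows up in the signature. The user ID and the connection stay inside the closure, where the LLM can neither see nor set them.</p>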
<h2 id="heading-core-concept-2-agent-state">Core Concept 2: Agent State</h2>
<p>Your agent carries around information as it works. This is called the agent’s state. For a chatbot, it’s usually the conversation history.</p>
<p>In <code>LangGraph</code>, state is defined with a <code>TypedDict</code>:</p>
<pre><code class="language-python">from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    """
    This is what flows through your agent.
    
    Messages is a list that keeps growing:
    - User questions
    - Agent responses
    - Tool results
    """
    # The add_messages reducer appends new messages to the list.
    # Without a reducer, each node's update would overwrite it.
    messages: Annotated[Sequence[BaseMessage], add_messages]
</code></pre>
<p>For complex agents, you can track more than just messages, like:</p>
<pre><code class="language-python">from langgraph.graph.message import add_messages

class FancierState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    user_id: str
    retry_count: int
    last_tool_used: str | None
</code></pre>
<p>This matters more than it might look. Each field here has a real purpose in a production-grade agent. <code>user_id</code> tells every node whose data to fetch without you having to pass it around manually. <code>retry_count</code> helps the agent detect when it's stuck in a loop so it can bail out gracefully. <code>last_tool_used</code> helps the agent avoid redundant calls.</p>
<p>As the agent grows in complexity, state becomes the single source of truth that keeps every node coordinated.</p>
<h3 id="heading-why-state-matters">Why State Matters</h3>
<p>State is what separates a conversational agent from a stateless API call. Without it, every message would be processed in isolation, and the agent would have no recollection of what was asked earlier, which tools it had already used, or what data it had already retrieved.</p>
<p>With state, the full conversation history is passed through each step of the agent’s execution.</p>
<p>Here's what that looks like in practice for our grocery spending example:</p>
<pre><code class="language-plaintext">When the conversation starts:
{
    "messages": []
}

User asks something:
{
    "messages": [
        HumanMessage("How much did I spend on groceries?")
    ]
}

Agent decides to use a tool:
{
    "messages": [
        HumanMessage("How much did I spend on groceries?"),
        AIMessage(tool_calls=[{name: "search_transactions", ...}]),
        ToolMessage({"total_amount": 1245.67, ...}),
    ]
}

Agent responds with the answer:
{
    "messages": [
        HumanMessage("How much did I spend on groceries?"),
        AIMessage(tool_calls=[...]),
        ToolMessage({...}),
        AIMessage("You spent $1,245.67 on groceries this month.")
    ]
}
</code></pre>
<p>Notice that the state grows with every tool call and every result. This means that when the user asks a follow-up like “How does that compare to last month?”, the agent can look back and know what “that” refers to.</p>
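<p>In LangGraph, a state key can be given a reducer function that merges each node's update into the existing value. For messages, that means appending. The sketch below models that idea in plain Python. It is a simplification under stated assumptions: <code>REDUCERS</code> and <code>apply_update</code> are hypothetical names, messages are plain tuples, and LangGraph's real <code>add_messages</code> reducer also matches messages by ID:</p>
<pre><code class="language-python">import operator

# Keys listed here are merged with a reducer; all other keys are overwritten.
REDUCERS = {"messages": operator.add}


def apply_update(state, update):
    """Return a new state with a node's partial update merged in."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": [], "retry_count": 0}
state = apply_update(state, {"messages": [("human", "How much did I spend on groceries?")]})
state = apply_update(state, {"messages": [("ai", "tool call: search_transactions")]})
state = apply_update(state, {"messages": [("tool", "total_amount=1245.67")], "retry_count": 1})

print(len(state["messages"]))  # the history keeps growing
print(state["retry_count"])    # plain keys are simply replaced
</code></pre>
<p>Each node returns only the new messages, and the reducer is what turns those partial updates into a growing history.</p>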
<h2 id="heading-core-concept-3-the-agent-graph">Core Concept 3: The Agent Graph</h2>
<p>The graph is the backbone of your agent. Think of it as a collection of tools and an LLM, combined together to reason, act and respond in a structured way. Specifically, it determines the order of operations – that is, what runs first, what happens next, and what conditions determine which path to take.</p>
<p>Without a graph, you would have to manually orchestrate the workflow: calling the LLM, then checking whether it wants to use a tool, executing the tool, and then feeding the result back to it and deciding when to stop. The graph encodes this logic explicitly, so the sequence of steps is defined in one place instead of being scattered through hand-written control flow.</p>
<p>Each node in the graph is an action like “ask the LLM” or “run a tool” and each edge is a connection between those actions.</p>
<p>With that in mind, let's build one step by step.</p>
<h3 id="heading-step-1-create-the-agent-node">Step 1: Create the Agent Node</h3>
<p>The agent node is where the LLM makes decisions like “Should I use a tool?” or “Which tool should I use?”. Let’s take an example:</p>
<pre><code class="language-python">from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Create the LLM with tools
llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)

# Create your tools
tools = [
    create_search_transactions_tool(search_space_id, db_session),
    # ... other tools
]

# Bind tools to the LLM so it knows what's available
llm_with_tools = llm.bind_tools(tools)

# Create the system prompt
system_prompt = """You are a helpful AI financial assistant.

Your capabilities:
- Search transactions by merchant, category, or date
- Analyze portfolio performance
- Find tax optimization opportunities

Guidelines:
- Be concise and cite specific data
- Format currency as $X,XXX.XX
- Remind users to consult professionals for tax/investment advice"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="messages"),
])

# Define the agent node function
async def call_agent(state: AgentState):
    """
    The agent node calls the LLM to decide the next action.
    
    The LLM can:
    1. Call one or more tools
    2. Generate a text response
    3. Both
    """
    messages = state["messages"]
    
    # Format messages with system prompt
    formatted = prompt.format_messages(messages=messages)
    
    # Call the LLM
    response = await llm_with_tools.ainvoke(formatted)
    
    # Return state update (add the LLM's response)
    return {"messages": [response]}
</code></pre>
<p>Let’s walk through what's happening here.</p>
<p>First, we initialize the LLM with <code>temperature=0</code>, which makes the model's output as consistent as possible from run to run (though not strictly deterministic). This is important for an agent that needs to make reliable decisions rather than creative ones.</p>
<p>Next, we call <code>llm.bind_tools(tools)</code>. It tells the LLM what tools are available by passing along their names, descriptions, and parameter schemas. Without this, the LLM would have no idea it could call any tools at all. With it, the LLM can look at a user's question and decide both whether a tool is needed and which one to use.</p>
<p>The prompt is built using <code>ChatPromptTemplate</code>, which combines a static system prompt with a <code>MessagesPlaceholder</code>. The placeholder is where the full conversation history gets inserted at runtime, meaning the LLM always has the complete context of the conversation when making its decision.</p>
<p>Last, <code>call_agent</code> is the actual node function. It pulls the current messages from state, formats them with the prompt, calls the LLM, and returns the response to be appended to state. This is the function LangGraph will call every time execution reaches the agent node.</p>
<h3 id="heading-step-2-create-the-tool-node">Step 2: Create the Tool Node</h3>
<p><code>LangGraph</code> has a pre-built <code>ToolNode</code> that executes tools:</p>
<pre><code class="language-python">from langgraph.prebuilt import ToolNode

# This node automatically executes any tools the LLM requested
tool_node = ToolNode(tools)
</code></pre>
<p>When the LLM includes tool calls in its response, <code>ToolNode</code> will:</p>
<ol>
<li><p>extract the tool calls,</p>
</li>
<li><p>execute each tool with specific params, and</p>
</li>
<li><p>add a <code>ToolMessage</code> object with the result to state</p>
</li>
</ol>
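<p>The three steps above can be sketched in plain Python. This is not <code>ToolNode</code>'s actual implementation, just a rough model of the behavior described: <code>TOOLS_BY_NAME</code> and <code>run_tool_calls</code> are hypothetical names, and messages are plain dictionaries rather than LangChain message objects:</p>
<pre><code class="language-python">import json


def get_current_weather(location):
    return f"The weather in {location} is sunny"


# Registry mapping tool names to callables (hypothetical stand-in).
TOOLS_BY_NAME = {"get_current_weather": get_current_weather}


def run_tool_calls(last_message):
    """Execute every tool call on the message and return tool-result messages."""
    results = []
    for call in last_message.get("tool_calls", []):  # 1. extract the tool calls
        tool_fn = TOOLS_BY_NAME[call["name"]]
        output = tool_fn(**call["args"])             # 2. execute with the given args
        results.append({                             # 3. wrap the result for state
            "role": "tool",
            "content": output if isinstance(output, str) else json.dumps(output),
            "tool_call_id": call["id"],  # ties the result back to the request
        })
    return {"messages": results}


ai_message = {
    "role": "ai",
    "content": "",
    "tool_calls": [{"name": "get_current_weather", "args": {"location": "Paris"}, "id": "call_1"}],
}
update = run_tool_calls(ai_message)
print(update["messages"][0]["content"])
</code></pre>
<p>The <code>tool_call_id</code> matters because an LLM can request several tools in one response, and each result has to be matched to the call that produced it.</p>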
<h3 id="heading-step-3-define-control-flow">Step 3: Define Control Flow</h3>
<p>This is where we decide whether the agent should run a tool next or stop.</p>
<pre><code class="language-python">from langgraph.graph import END

def should_continue(state: AgentState):
    """
    Router function that determines the next step.
    
    Returns:
        "tools" - if the LLM wants to use tools
        END - if the LLM is done (just text response)
    """
    last_message = state["messages"][-1]
    
    # Check if the LLM included tool calls
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    
    # No tool calls means we're done
    return END
</code></pre>
<p>This tiny function is the decision-maker of your entire agent. After the LLM responds, LangGraph calls <code>should_continue</code> to figure out what to do next. It works by inspecting the last message in state: the LLM's most recent response. If that response contains tool calls, it means the LLM has decided it needs more data before it can answer, so we return <code>"tools"</code> to route execution to the tool node. If there are no tool calls, the LLM has produced a final answer and we return <code>END</code> to stop execution.</p>
<p>This is the mechanism that makes the agent loop. The agent doesn't just call one tool and stop. It can call a tool, see the result, decide it needs another tool, call that one too, and only stop when it has everything it needs to respond.</p>
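<p>Because the router is an ordinary function, you can exercise it without an LLM or a graph. The sketch below uses <code>SimpleNamespace</code> objects as stand-ins for real message objects and a local <code>END</code> constant in place of the one imported from <code>langgraph.graph</code>. Both are assumptions made to keep the example self-contained:</p>
<pre><code class="language-python">from types import SimpleNamespace

END = "__end__"  # local stand-in for langgraph.graph.END


def should_continue(state):
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END


wants_tool = SimpleNamespace(
    content="",
    tool_calls=[{"name": "search_transactions", "args": {}, "id": "call_1"}],
)
final_answer = SimpleNamespace(content="You spent $1,245.67 on groceries.", tool_calls=[])
no_attribute = SimpleNamespace(content="Hi")  # no tool_calls attribute at all

print(should_continue({"messages": [wants_tool]}))                # tools
print(should_continue({"messages": [wants_tool, final_answer]}))  # __end__
print(should_continue({"messages": [no_attribute]}))              # __end__
</code></pre>
<p>The <code>hasattr</code> check is what makes the third case safe: a message type without a <code>tool_calls</code> attribute routes to the end instead of raising an error.</p>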
<h3 id="heading-step-4-assemble-the-graph">Step 4: Assemble the Graph</h3>
<p>Now, we can connect everything:</p>
<pre><code class="language-python">from langgraph.graph import StateGraph

# Create the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("agent", call_agent)
workflow.add_node("tools", tool_node)

# Set entry point
workflow.set_entry_point("agent")

# Add conditional edge from agent
workflow.add_conditional_edges(
    "agent",           # From this node
    should_continue,   # Use this function to decide
    {
        "tools": "tools",  # If "tools" is returned, go to tools node
        END: END           # If END is returned, finish
    }
)

# After tools execute, go back to agent
workflow.add_edge("tools", "agent")

# Compile into a runnable agent
agent = workflow.compile()
</code></pre>
<p>This is where everything gets wired together. We start by creating a <code>StateGraph</code> and passing it our <code>AgentState</code> type. This tells LangGraph what shape the state will take as it flows through the graph.</p>
<p>We then register our two nodes with <code>add_node</code>. The string name we give each node ("agent" and "tools") is what we'll use to reference them when defining edges. <code>set_entry_point</code> tells LangGraph where execution should begin, which in our case is the agent node.</p>
<p>The conditional edge is where the routing logic plugs in. We're telling LangGraph: "After the agent node runs, call <code>should_continue</code> to decide what happens next, then use this mapping to translate that decision into the next node." If <code>should_continue</code> returns <code>"tools"</code>, go to the tools node. If it returns <code>END</code>, stop.</p>
<p>Finally, <code>add_edge("tools", "agent")</code> creates an unconditional edge: after the tools node runs, always go back to the agent node. This is what creates the loop, letting the agent review the tool results and decide whether it's done or needs to keep going. Calling <code>workflow.compile()</code> locks everything in and returns a runnable agent.</p>
<h3 id="heading-understanding-the-flow">Understanding the Flow</h3>
<p>Here’s what happens when you run the agent:</p>
<pre><code class="language-plaintext">User Question
    ↓
[AGENT NODE]
    ↓
[SHOULD_CONTINUE]
    ↓
  Tools needed?
    ↓ YES   ↓ NO
[TOOLS]    [END]
    ↓
[AGENT NODE]
    ↓
[SHOULD_CONTINUE]
    ↓
    ...
</code></pre>
<p>The loop above allows the agent to:</p>
<ol>
<li><p>Use a tool</p>
</li>
<li><p>See the results</p>
</li>
<li><p>Decide if more tools are needed</p>
</li>
<li><p>Use more tools or generate final answer</p>
</li>
</ol>
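<p>To make the loop concrete, here is a small simulation of the same control flow in plain Python. It is a sketch, not how LangGraph executes a graph internally: <code>fake_llm</code> is a scripted stand-in for the model, <code>search_transactions</code> returns a canned result, and <code>run</code> is a hypothetical driver that walks the two nodes by hand:</p>
<pre><code class="language-python">END = "__end__"  # local stand-in for langgraph.graph.END


def fake_llm(messages):
    """Scripted stand-in for the model: request a tool once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        call = {"name": "search_transactions", "args": {"category": "Groceries"}, "id": "call_1"}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    total = messages[-1]["content"]["total_amount"]
    return {"role": "ai", "content": f"You spent ${total:,.2f} on groceries.", "tool_calls": []}


def search_transactions(category=None):
    return {"total_amount": 1245.67, "count": 23}  # canned result


def agent_node(state):
    return {"messages": [fake_llm(state["messages"])]}


def tools_node(state):
    calls = state["messages"][-1]["tool_calls"]
    return {"messages": [
        {"role": "tool", "content": search_transactions(**c["args"]), "tool_call_id": c["id"]}
        for c in calls
    ]}


def should_continue(state):
    return "tools" if state["messages"][-1].get("tool_calls") else END


NODES = {"agent": agent_node, "tools": tools_node}


def run(question, max_steps=10):
    state = {"messages": [{"role": "human", "content": question}]}
    current, visited = "agent", []
    for _ in range(max_steps):  # hard cap so a misbehaving model cannot loop forever
        visited.append(current)
        update = NODES[current](state)
        state = {"messages": state["messages"] + update["messages"]}
        if current == "tools":
            current = "agent"  # unconditional edge back to the agent
        else:
            current = should_continue(state)  # conditional edge
            if current == END:
                break
    return state, visited


final_state, path = run("How much did I spend on groceries?")
print(path)
print(final_state["messages"][-1]["content"])
</code></pre>
<p>The path comes out as agent, tools, agent: one pass to request the tool, one to run it, and one to turn the result into an answer. The <code>max_steps</code> cap is an addition for this sketch.</p>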
<h2 id="heading-how-to-put-it-all-together">How to Put it All Together</h2>
<p>Let’s see the complete agent in one place:</p>
<pre><code class="language-python">from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
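from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

# 1. Define State
class AgentState(TypedDict):
    # add_messages appends each node's new messages instead of overwriting the list
    messages: Annotated[Sequence[BaseMessage], add_messages]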

# 2. Create Agent Function
def create_agent(tools):
    # Set up LLM
    llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)
    llm_with_tools = llm.bind_tools(tools)
    
    # Create prompt
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful AI assistant."),
        MessagesPlaceholder(variable_name="messages"),
    ])
    
    # Define nodes
    async def call_agent(state: AgentState):
        formatted = prompt.format_messages(messages=state["messages"])
        response = await llm_with_tools.ainvoke(formatted)
        return {"messages": [response]}
    
    def should_continue(state: AgentState):
        last_message = state["messages"][-1]
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        return END
    
    # Build graph
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", call_agent)
    workflow.add_node("tools", ToolNode(tools))
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    workflow.add_edge("tools", "agent")
    
    return workflow.compile()

# 3. Use the Agent
async def main():
    # Create tools (simplified example; assumes `session` is an open AsyncSession)
    tools = [create_search_transactions_tool(search_space_id=1, db_session=session)]
    
    # Create agent
    agent = create_agent(tools)
    
    # Run agent
    result = await agent.ainvoke({
        "messages": [HumanMessage(content="How much did I spend on groceries?")]
    })
    
    # Get final response
    final_response = result["messages"][-1].content
    print(final_response)
</code></pre>
<h2 id="heading-how-the-agent-thinks">How the Agent Thinks</h2>
<p>Let’s use an example to see how the agent reasons.</p>
<p><strong>Example: “How much did I spend on groceries this month?”</strong></p>
<h3 id="heading-step-1-user-input">Step 1: User Input</h3>
<pre><code class="language-python">State: {
    "messages": [HumanMessage("How much did I spend on groceries this month?")]
}
</code></pre>
<h3 id="heading-step-2-agent-node">Step 2: Agent Node</h3>
<p>The LLM gets:</p>
<ul>
<li><p>A system prompt, like the one we defined above</p>
</li>
<li><p>User question: “How much did I spend on groceries this month?”</p>
</li>
<li><p>List of available tools: <code>search_transactions(keywords, category)</code></p>
</li>
</ul>
<p>The LLM reasons that this is about spending in a specific category and decides that it should use <code>search_transactions</code> with <code>category="Groceries"</code>. It responds with a tool call:</p>
<pre><code class="language-python">AIMessage(
    content="",
    tool_calls=[{
        "name": "search_transactions",
        "args": {"category": "Groceries"},
        "id": "call_123"
    }]
)
</code></pre>
<h3 id="heading-step-3-should-continue">Step 3: Should Continue</h3>
<p>The router sees tool calls and returns “tools”.</p>
<h3 id="heading-step-4-tools-node">Step 4: Tools Node</h3>
<p>It executes <code>search_transactions(category="Groceries")</code> and gets:</p>
<pre><code class="language-python">{
    "transactions": [...],
    "total_amount": 1245.67,
    "count": 23,
    "summary": "Found 23 transactions totaling $1,245.67"
}
</code></pre>
<p>And adds this to the state:</p>
<pre><code class="language-python">ToolMessage(
    content='{"transactions": [...], "total_amount": 1245.67, ...}',
    tool_call_id="call_123"
)
</code></pre>
<h3 id="heading-step-5-agent-node-again">Step 5: Agent Node Again</h3>
<p>The LLM now sees the user question, its previous tool call, and the results. The LLM thinks: “I now have the data, the user spent $1,245.67 on groceries. I can answer now.” And the LLM responds with:</p>
<pre><code class="language-python">AIMessage(content="You spent $1,245.67 on groceries this month across 23 transactions.")
</code></pre>
<h3 id="heading-step-6-should-continue">Step 6: Should Continue</h3>
<p>No tool calls this time, so the router returns <code>END</code>.</p>
<p><strong>Final State:</strong></p>
<pre><code class="language-python">{
    "messages": [
        HumanMessage("How much did I spend on groceries this month?"),
        AIMessage("", tool_calls=[...]),
        ToolMessage('{"total_amount": 1245.67, ...}'),
        AIMessage("You spent $1,245.67 on groceries this month across 23 transactions.")
    ]
}
</code></pre>
<p>The user receives: "You spent $1,245.67 on groceries this month across 23 transactions."</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building an AI agent boils down to three ideas:</p>
<ol>
<li><p>Tools</p>
</li>
<li><p>State</p>
</li>
<li><p>Graph</p>
</li>
</ol>
<p>LangGraph gives you control, so you are not left hoping that the agent does the right thing – instead, you’re explicitly defining what the “right thing” is.</p>
<p>The FinanceGPT example shows how this works in a real application. With these concepts in hand, you can now build specialized agents for different jobs.</p>
<h2 id="heading-resources-worth-checking-out">Resources Worth Checking Out</h2>
<p>These helped me learn LangGraph:</p>
<ul>
<li><p><a href="https://python.langchain.com/docs/langgraph">Official LangGraph docs</a>: Start here</p>
</li>
<li><p><a href="https://python.langchain.com/docs/concepts/langgraph">LangGraph conceptual guide</a>: Deeper theory</p>
</li>
<li><p><a href="https://python.langchain.com/docs/concepts/agents">LangChain agent patterns</a>: Alternative approaches</p>
</li>
</ul>
<h2 id="heading-check-out-financegpt"><strong>Check Out FinanceGPT</strong></h2>
<p>All the code examples here came from <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>. If you want to see these patterns in a complete app, poke around the repo. It's got document processing, portfolio tracking, tax optimization – all built with LangGraph.</p>
<p>If you find this helpful, <a href="https://github.com/manojag115/FinanceGPT">give the project a star on GitHub</a> – it helps other developers discover it.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use LangChain and LangGraph: A Beginner’s Guide to AI Workflows ]]>
                </title>
                <description>
                    <![CDATA[ Artificial intelligence is moving fast. Every week, new tools appear that make it easier to build apps powered by large language models. But many beginners still get stuck on one question: how do you structure the logic of an AI application? How do y... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-langchain-and-langgraph-a-beginners-guide-to-ai-workflows/</link>
                <guid isPermaLink="false">690b882e468be723832787a7</guid>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Wed, 05 Nov 2025 17:23:58 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762363391314/34c1c950-b257-40b2-a03d-cbaf1bfbd4b6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Artificial intelligence is moving fast. Every week, new tools appear that make it easier to build apps powered by large language models.</p>
<p>But many beginners still get stuck on one question: how do you structure the logic of an AI application? How do you connect prompts, memory, tools, and APIs in a clean way?</p>
<p>That is where popular open-source frameworks like <a target="_blank" href="https://www.langchain.com/">LangChain</a> and <a target="_blank" href="https://www.langchain.com/langgraph">LangGraph</a> come in.</p>
<p>Both are part of the same ecosystem, and they’re designed to help you build complex AI workflows without reinventing the wheel.</p>
<p>LangChain focuses on building sequences of steps called chains, while LangGraph takes things a step further by adding memory, branching, and feedback loops to make your AI more intelligent and flexible.</p>
<p>This guide will help you understand what these tools do, how they differ, and how you can start using them to build your own AI projects.</p>
<h2 id="heading-what-we-will-cover"><strong>What we will cover</strong></h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-langchain">What is LangChain?</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-why-langchain-was-not-enough">Why LangChain Was Not Enough</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-langgraph">What is LangGraph?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-langchain-vs-langgraph">LangChain vs LangGraph</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-to-use-each">When to Use Each</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-memory-and-persistence">Adding Memory and Persistence</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-monitoring-and-debugging-with-langsmith">Monitoring and Debugging with LangSmith</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-langchain-ecosystem">The LangChain Ecosystem</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-what-is-langchain"><strong>What is LangChain?</strong></h2>
<p><a target="_blank" href="https://www.turingtalks.ai/p/how-to-build-better-ai-workflows-with-langchain">LangChain</a> is a Python and JavaScript framework that helps you build language model-powered applications. It provides a structure for connecting models like GPT, data sources, and tools into a single flow.</p>
<p>Instead of writing long prompt templates or hardcoding logic, you use components like chains, tools, and agents.</p>
<p>A simple example is chaining prompts together. For instance, you might first ask the model to summarize text, and then use the summary to generate a title. LangChain lets you define both steps and connect them in code.</p>
<p>Here is a basic example in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_core.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI

llm = ChatOpenAI(model=<span class="hljs-string">"gpt-4o-mini"</span>)
prompt = PromptTemplate.from_template(<span class="hljs-string">"Summarize the following text:\n{text}"</span>)
<span class="hljs-comment"># Pipe the prompt into the model (LLMChain and chain.run are deprecated)</span>
chain = prompt | llm
result = chain.invoke({<span class="hljs-string">"text"</span>: <span class="hljs-string">"LangChain helps developers build AI apps faster."</span>})
print(result.content)
</code></pre>
<p>This simple chain takes text and runs it through an OpenAI model to get a summary. You can add more steps, like a second chain to turn that summary into a title or a question.</p>
<p>LangChain provides modules for prompt templates, models, retrievers, and tools so you can build workflows without managing the raw API logic.</p>
<p>Here is the full <a target="_blank" href="https://docs.langchain.com/oss/python/langchain/overview">LangChain documentation</a>.</p>
<h3 id="heading-why-langchain-was-not-enough"><strong>Why LangChain Was Not Enough</strong></h3>
<p>LangChain made it easy to build straight-line workflows.</p>
<p>But most real-world applications are not linear. When <a target="_blank" href="https://www.freecodecamp.org/news/build-a-custom-ai-chat-application-with-nextjs/">building a chatbot</a>, summarizer, or an autonomous agent, you often need loops, memory, and conditions.</p>
<p>For example, if the AI makes a wrong assumption, you might want it to try again. If it needs more data, it should call a search tool. Or if a user changes context, the AI should remember what was discussed earlier.</p>
<p>LangChain’s chains and agents could do some of this, but the flow was hard to visualize and manage. You had to write nested chains or use callbacks to handle decisions.</p>
<p>Developers wanted a better way to represent how AI systems actually think. Not in straight lines, but as graphs where outputs can lead to different paths.</p>
<p>That’s what led to LangGraph.</p>
<h2 id="heading-what-is-langgraph"><strong>What is LangGraph?</strong></h2>
<p>LangGraph is an extension of LangChain that introduces a graph-based approach to AI workflows.</p>
<p>Instead of chaining steps in one direction, LangGraph lets you define nodes and edges like a flowchart. Each node can represent a task, an action, or a model call.</p>
<p>This structure allows loops, branching, and parallel paths. It’s perfect for building agent-like systems where the model reasons, decides, and acts.</p>
<p>Here is an example of a simple LangGraph setup:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langgraph.graph <span class="hljs-keyword">import</span> StateGraph, MessagesState, END
<span class="hljs-keyword">from</span> langgraph.prebuilt <span class="hljs-keyword">import</span> create_react_agent
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain_core.tools <span class="hljs-keyword">import</span> tool

<span class="hljs-meta">@tool</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">multiply</span>(<span class="hljs-params">a: int, b: int</span>) -&gt; int:</span>
    <span class="hljs-string">"""Multiply two numbers."""</span>
    <span class="hljs-keyword">return</span> a * b

tools = [multiply]
llm = ChatOpenAI(model=<span class="hljs-string">"gpt-4o-mini"</span>)
agent_executor = create_react_agent(llm, tools)
<span class="hljs-comment"># The graph needs a state schema; MessagesState holds the chat history</span>
graph = StateGraph(MessagesState)
graph.add_node(<span class="hljs-string">"agent"</span>, agent_executor)
graph.set_entry_point(<span class="hljs-string">"agent"</span>)
graph.add_edge(<span class="hljs-string">"agent"</span>, END)
app = graph.compile()
<span class="hljs-comment"># create_react_agent reads its input from the "messages" key</span>
response = app.invoke({<span class="hljs-string">"messages"</span>: [(<span class="hljs-string">"user"</span>, <span class="hljs-string">"Use the multiply tool to get 8 times 7"</span>)]})
print(response[<span class="hljs-string">"messages"</span>][<span class="hljs-number">-1</span>].content)
</code></pre>
<p>This example shows a basic agent graph.</p>
<p>The AI receives a request, reasons about it, decides to use the tool, and completes the task. You can imagine extending this to more complex graphs where the AI can retry, call APIs, or fetch new information.</p>
<p>LangGraph gives you full control over how the AI moves between states. Each node can have conditions. For example, if an answer is incomplete, you can send it back to another node to refine it.</p>
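<p>The idea of sending an incomplete answer back for refinement can be sketched without any framework. The plain-Python example below is only an illustration of the control flow: <code>draft_node</code>, <code>refine_node</code>, and <code>route</code> are hypothetical names, and a trailing period stands in for a real completeness check:</p>
<pre><code class="lang-python">def draft_node(state):
    # First pass: an answer that is missing its conclusion.
    return {"answer": "LangGraph models workflows", "attempts": 1}


def refine_node(state):
    # Refinement pass: extend the previous answer.
    return {
        "answer": state["answer"] + " as graphs of nodes and edges.",
        "attempts": state["attempts"] + 1,
    }


def route(state):
    # Condition on the edge: incomplete answers go back for refinement.
    return "done" if state["answer"].endswith(".") else "refine"


state = draft_node({})
# Loop back to the refine node until the condition passes or the cap of 5 attempts is hit.
while route(state) == "refine" and state["attempts"] != 5:
    state = refine_node(state)

print(state)
</code></pre>
<p>In LangGraph, the <code>while</code> loop would be replaced by a conditional edge that points back to the refine node.</p>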
<p>This makes LangGraph ideal for building systems that need multiple reasoning steps, like document analysis bots, code reviewers, or research assistants.</p>
<p>Here is the full <a target="_blank" href="https://docs.langchain.com/oss/python/langgraph/overview">LangGraph documentation</a>.</p>
<h2 id="heading-langchain-vs-langgraph"><strong>LangChain vs LangGraph</strong></h2>
<p>LangChain and LangGraph share the same foundation, but they approach workflows differently.</p>
<p>LangChain is linear. Each chain or agent moves from one step to the next in a sequence. It is simpler to start with, especially for prompt engineering, retrieval-augmented generation, and structured pipelines.</p>
<p>LangGraph is dynamic. It represents workflows as graphs that can loop, branch, and self-correct. It is more powerful when building agents that need reasoning, planning, or memory.</p>
<p>A good analogy is this: LangChain is like writing a list of tasks in order. LangGraph is like drawing a flowchart where decisions can lead to different actions or back to previous steps.</p>
<p>Most developers start with LangChain to learn the basics, then move to LangGraph when they want to build more interactive or autonomous AI systems.</p>
<h2 id="heading-when-to-use-each"><strong>When to Use Each</strong></h2>
<p>If you’re building simple tools like text summarizers, chatbots, or document retrievers, LangChain is enough. It’s easy to get started and integrates well with popular models like GPT, Claude, and Gemini.</p>
<p>If you want to build multi-step agents, or apps that think and adapt, go with LangGraph. You can define how the AI reacts to different outcomes, and you get more control over retry logic, context switching, and feedback loops.</p>
<p>In practice, many developers combine both. LangChain provides the building blocks, while LangGraph organizes how those blocks interact.</p>
<h2 id="heading-adding-memory-and-persistence"><strong>Adding Memory and Persistence</strong></h2>
<p>Both LangChain and LangGraph support memory, which allows your AI to remember context between interactions. This is useful when you’re building chatbots, assistants, or agents that need to carry information across steps.</p>
<p>For example, if a user introduces themselves once, the AI should be able to recall that detail later in the conversation.</p>
<p>In LangChain, memory is handled through built-in modules like <code>ConversationBufferMemory</code> or <code>ConversationSummaryMemory</code>. These let you store previous inputs and outputs so the model can reference them in future responses.</p>
<p>Here’s a simple example using LangChain:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> ConversationChain
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI

memory = ConversationBufferMemory()
llm = ChatOpenAI(model=<span class="hljs-string">"gpt-4o-mini"</span>)
conversation = ConversationChain(llm=llm, memory=memory)

conversation.predict(input=<span class="hljs-string">"Hello, I am Manish."</span>)
response = conversation.predict(input=<span class="hljs-string">"What did I just tell you?"</span>)
print(response)
</code></pre>
<p>In this case, the model remembers your previous message and answers accordingly. The memory object acts like a running conversation log, keeping track of the dialogue as it evolves.</p>
<p>LangGraph takes this a step further by embedding memory into the graph’s state. Each node in the graph can access or update shared memory, allowing your AI to maintain context across multiple reasoning steps or branches. This approach is especially useful when building agents that loop, revisit nodes, or depend on previous interactions.</p>
<p>Here’s how memory can be added inside a LangGraph workflow:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langgraph.graph <span class="hljs-keyword">import</span> StateGraph, MessagesState, END
<span class="hljs-keyword">from</span> langgraph.checkpoint.memory <span class="hljs-keyword">import</span> MemorySaver
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI

llm = ChatOpenAI(model=<span class="hljs-string">"gpt-4o-mini"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">chat</span>(<span class="hljs-params">state: MessagesState</span>):</span>
    <span class="hljs-comment"># Call the model with the full message history held in the graph state</span>
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"messages"</span>: [llm.invoke(state[<span class="hljs-string">"messages"</span>])]}

<span class="hljs-comment"># MessagesState appends new messages to the history instead of overwriting it</span>
graph = StateGraph(MessagesState)
graph.add_node(<span class="hljs-string">"chat"</span>, chat)
graph.set_entry_point(<span class="hljs-string">"chat"</span>)
graph.add_edge(<span class="hljs-string">"chat"</span>, END)

<span class="hljs-comment"># The checkpointer saves state between invocations that share a thread_id</span>
app = graph.compile(checkpointer=MemorySaver())
config = {<span class="hljs-string">"configurable"</span>: {<span class="hljs-string">"thread_id"</span>: <span class="hljs-string">"demo"</span>}}

app.invoke({<span class="hljs-string">"messages"</span>: [(<span class="hljs-string">"user"</span>, <span class="hljs-string">"Hello, I am Manish."</span>)]}, config)
response = app.invoke({<span class="hljs-string">"messages"</span>: [(<span class="hljs-string">"user"</span>, <span class="hljs-string">"What did I just tell you?"</span>)]}, config)
print(response[<span class="hljs-string">"messages"</span>][<span class="hljs-number">-1</span>].content)
</code></pre>
<p>Here, the checkpointer keeps track of the graph's state between invocations. Even though each call runs through the same node, calls that share a <code>thread_id</code> reload the earlier messages from the checkpoint, so the model sees what was said before. This design lets you build agents that remember user context, maintain history, and adapt as they move between nodes.</p>
<p>Whether you use LangChain or LangGraph, adding memory is what turns a simple workflow into a stateful system, one that can carry on a conversation, refine its reasoning, and respond more naturally over time.</p>
<h2 id="heading-monitoring-and-debugging-with-langsmith"><strong>Monitoring and Debugging with LangSmith</strong></h2>
<p><a target="_blank" href="https://www.langchain.com/langsmith/observability">LangSmith</a> is another important tool from the LangChain ecosystem. It helps you visualize, monitor, and debug your AI applications.</p>
<p>When building workflows, you often want to see how the model behaves, how much it costs, and where things go wrong.</p>
<p>LangSmith records every call made by your chains and agents. You can view input and output data, timing, token usage, and errors. It provides a dashboard that shows how your system performed across multiple runs.</p>
<p>You can integrate LangSmith by setting two environment variables in your shell:</p>
<pre><code class="lang-python-repl">export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="your_api_key_here"
</code></pre>
<p>Then, every LangChain or LangGraph process you run will automatically log to LangSmith. This helps developers find bugs, optimize prompts, and understand how the workflow behaves at each step.</p>
<p>Note that while LangChain and LangGraph are open source, LangSmith is a paid platform. LangSmith is nice to have, but it is not required for building AI workflows.</p>
<h2 id="heading-the-langchain-ecosystem"><strong>The LangChain Ecosystem</strong></h2>
<p>LangChain is not just one library. It has grown into an ecosystem of tools that work together.</p>
<ul>
<li><p><strong>LangChain Core</strong>: The main framework for chains, prompts, and memory.</p>
</li>
<li><p><strong>LangGraph</strong>: A graph-based extension for building adaptive workflows.</p>
</li>
<li><p><strong>LangSmith</strong>: A debugging and monitoring platform for AI apps.</p>
</li>
<li><p><strong>LangServe</strong>: A deployment layer that lets you turn your chains and graphs into APIs with one command.</p>
</li>
</ul>
<p>Together, these tools form a complete stack for building, managing, and deploying language model applications. You can start with a simple chain, evolve it into a graph-based system, test it with LangSmith, and deploy it using LangServe.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>LangChain and LangGraph make it easier to move from prompts to production-ready AI systems. LangChain helps you build linear flows that connect models, data, and tools. LangGraph lets you go further by building adaptive and intelligent workflows that reason and learn.</p>
<p>For beginners, starting with LangChain is the best way to understand how language models can interact with other components. As your projects grow, LangGraph will give you the flexibility to handle complex logic and long-term state.</p>
<p>Whether you are building a chatbot, an agent, or a knowledge assistant, these tools will help you go from idea to implementation faster and more reliably.</p>
<p><em>Hope you enjoyed this article. Sign up for my free newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><strong><em>visit my website</em></strong></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a LangGraph and Composio-Powered Discord Bot ]]>
                </title>
                <description>
                    <![CDATA[ With the rise of AI tools over the past couple years, most of us are learning how to use them in our projects. And in this article, I’ll teach you how to build a quick Discord bot with LangGraph and Composio. You’ll use LangGraph nodes to build a bra... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-a-langgraph-composio-powered-discord-bot/</link>
                <guid isPermaLink="false">685b12877dabc4d300e53706</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agentic AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ bot ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Shrijal Acharya ]]>
                </dc:creator>
                <pubDate>Tue, 24 Jun 2025 21:03:03 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750798930964/65dd7078-e4e7-42d0-a797-1e7d72690513.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>With the rise of AI tools over the past couple of years, most of us are learning how to use them in our projects. In this article, I’ll teach you how to build a quick Discord bot with LangGraph and Composio.</p>
<p>You’ll use LangGraph nodes to build a branching flow that processes incoming messages and detects intent like chat, support, or tool usage. It’ll then route them to the right logic based on what the user says.</p>
<p>I know it may sound a bit weird to use LangGraph for a Discord bot, but you’ll soon see that this project is a pretty fun way to visualize how node-based AI workflows actually run.</p>
<p>For now, the workflow is simple: you’ll figure out if the user is just chatting, asking a support question, or requesting that the bot perform an action, and respond based on that.</p>
<p><strong>What you will learn:</strong> 👀</p>
<ul>
<li><p>How to use LangGraph to create an AI-driven workflow that powers your bot’s logic.</p>
</li>
<li><p>How you can integrate Composio to let your bot take real-world actions using external tools.</p>
</li>
<li><p>How you can use Discord.js and handle different message types like replies, threads, and embeds.</p>
</li>
<li><p>How you can maintain per-channel context using message history and pass it into AI.</p>
</li>
</ul>
<p>By the end of this article, you’ll have a decent, functional Discord bot that you can add to your server. It replies to users based on message context and even has tool-calling support! (And there’s a small challenge for you to implement something yourself.) 😉</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Make sure you have Discord installed on your machine so you can test the bot easily.</p>
<p>This project is designed to demonstrate how you can build a bot powered by LangGraph and Composio. Before proceeding, it is helpful to have a basic understanding of:</p>
<ul>
<li><p>How to work with Node.js</p>
</li>
<li><p>What LangGraph is and roughly how it works</p>
</li>
<li><p>How to work with Discord.js</p>
</li>
<li><p>What AI Agents are</p>
</li>
</ul>
<p>If you’re not confident about any of these, try following along anyway. You might pick things up just fine. And if it ever gets confusing, you can always check out the full source code <a target="_blank" href="https://github.com/shricodev/discord-bot-langgraph-composio">here</a>.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-the-environment">How to Set Up the Environment</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-initialize-the-project">Initialize the Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-install-dependencies">Install Dependencies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-configure-composio">Configure Composio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-configure-discord-integration">Configure Discord Integration</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-add-environment-variables">Add Environment Variables</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-build-the-application-logic">Build the Application Logic</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-define-types-and-utility-helpers">Define Types and Utility Helpers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-implement-langgraph-workflow">Implement LangGraph Workflow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-set-up-discordjs-client">Set Up Discord.js Client</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-how-to-set-up-the-environment">How to Set Up the Environment</h2>
<p>In this section, we will get everything set up for building the project.</p>
<h3 id="heading-initialize-the-project">Initialize the Project</h3>
<p>Initialize a Node.js application with the following command:</p>
<p>💁 Here I'm using Bun, but you can choose any package manager of your choice.</p>
<pre><code class="lang-bash">mkdir discord-bot-langgraph &amp;&amp; <span class="hljs-built_in">cd</span> discord-bot-langgraph \
&amp;&amp; bun init -y
</code></pre>
<p>Now that our Node.js application is ready, let's install some dependencies.</p>
<h3 id="heading-install-dependencies">Install Dependencies</h3>
<p>We'll be using the following main packages and some other helper packages:</p>
<ul>
<li><p><a target="_blank" href="https://discord.js.org">discord.js</a>: Interacts with the Discord API</p>
</li>
<li><p><a target="_blank" href="https://composio.dev">composio</a>: Adds tools integration support to the bot</p>
</li>
<li><p><a target="_blank" href="https://platform.openai.com">openai</a>: Enables AI-powered responses</p>
</li>
<li><p><a target="_blank" href="https://www.langchain.com">langchain</a>: Manages LLM workflows</p>
</li>
<li><p><a target="_blank" href="https://zod.dev">zod</a>: Validates and parses data safely</p>
</li>
</ul>
<pre><code class="lang-bash">bun add discord.js openai @langchain/core @langchain/langgraph \
langchain composio-core dotenv zod uuid
</code></pre>
<h3 id="heading-configure-composio">Configure Composio</h3>
<p>💁 You’ll use Composio to add integrations to your application. You can pick any integration you like, but here I'm using Google Sheets.</p>
<p>First, before moving forward, you need to get access to a Composio API key.</p>
<p>Go ahead and create an account on Composio, get your API key, and paste it in the <code>.env</code> file in the root of the project:</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lkr1pys0txedp9vam4tt.png" alt="Composio dashboard" width="1920" height="972" loading="lazy"></p>
<pre><code class="lang-ini"><span class="hljs-attr">COMPOSIO_API_KEY</span>=&lt;your_composio_api_key&gt;
</code></pre>
<p>Authenticate yourself with the following command:</p>
<pre><code class="lang-bash">composio login
</code></pre>
<p>Once that’s done, run the <code>composio whoami</code> command, and if you see something like the below, you’re successfully logged in.</p>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ifzbkw6u6bwnj68lwqxt.png" alt="Output of the `composio whoami` command" width="1115" height="304" loading="lazy"></p>
<p>You're almost there: now you just need to set up integrations. Here, I’ll use Google Sheets, but again you can set up any integration you like.</p>
<p>Run the following command to set up the Google Sheets integration:</p>
<pre><code class="lang-bash">composio add googlesheets
</code></pre>
<p>You should see an output similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750336813743/9079ef2b-dc2a-4b10-b001-50e4cf98f3c5.png" alt="Add Composio Google Sheets integration" class="image--center mx-auto" width="1457" height="384" loading="lazy"></p>
<p>Head over to the URL that’s shown, and you should be authenticated like so:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750325571006/b0864445-7471-471f-88eb-f2ec8d832b39.png" alt="Composio authentication success" class="image--center mx-auto" width="1916" height="947" loading="lazy"></p>
<p>That's it. You’ve successfully added the Google Sheets integration and can access all its tools in your application.</p>
<p>Once finished, run the <code>composio integrations</code> command to verify that it worked. You should see a list of all your integrations:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750325653419/4585b63a-5581-4102-92e4-a55dca018063.png" alt="Composio list of integrations" class="image--center mx-auto" width="1175" height="268" loading="lazy"></p>
<h3 id="heading-configure-discord-integration">Configure Discord Integration</h3>
<p>This is a bit off topic for this tutorial, but basically, you’ll create an application/bot on Discord and add it to your server.</p>
<p>You can find a guide on how to create and add a bot to your server in the <a target="_blank" href="https://discordjs.guide/preparations/adding-your-bot-to-servers.html#bot-invite-links">Discord.js</a> documentation.</p>
<p>And yes, it’s free if you’re wondering whether any step here requires a pro account or anything. 😉</p>
<p>Make sure you populate these three environment variables:</p>
<pre><code class="lang-ini"><span class="hljs-attr">DISCORD_BOT_TOKEN</span>=&lt;YOUR_DISCORD_BOT_TOKEN&gt;
<span class="hljs-attr">DISCORD_BOT_GUILD_ID</span>=&lt;YOUR_DISCORD_BOT_GUILD_ID&gt;
<span class="hljs-attr">DISCORD_BOT_CHANNEL_ID</span>=&lt;YOUR_DISCORD_BOT_CHANNEL_ID&gt;
</code></pre>
<h3 id="heading-add-environment-variables">Add Environment Variables</h3>
<p>You’ll require a few other environment variables, including the OpenAI API key, for the bot to work.</p>
<p>Your final <code>.env</code> file should look something like this:</p>
<pre><code class="lang-ini"><span class="hljs-attr">OPENAI_API_KEY</span>=&lt;YOUR_OPENAI_API_KEY&gt;

<span class="hljs-attr">COMPOSIO_API_KEY</span>=&lt;YOUR_COMPOSIO_API_KEY&gt;

<span class="hljs-attr">DISCORD_BOT_TOKEN</span>=&lt;YOUR_DISCORD_BOT_TOKEN&gt;
<span class="hljs-attr">DISCORD_BOT_GUILD_ID</span>=&lt;YOUR_DISCORD_BOT_GUILD_ID&gt;
<span class="hljs-attr">DISCORD_BOT_CHANNEL_ID</span>=&lt;YOUR_DISCORD_BOT_CHANNEL_ID&gt;
</code></pre>
<h2 id="heading-build-the-application-logic">Build the Application Logic</h2>
<p>Now that you’ve laid all the groundwork, you can finally start coding the project.</p>
<h3 id="heading-define-types-and-utility-helpers">Define Types and Utility Helpers</h3>
<p>Let’s start by writing some helper functions and defining the types of data you’ll be working with.</p>
<p>Logging matters in any application, and especially in one like this, where multiple API calls leave plenty of room for errors. Decent logs tell you when and how things go wrong.</p>
<p>Create a new file named <code>logger.ts</code> inside the <code>utils</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/utils/logger.ts</span>

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DEBUG = <span class="hljs-string">"DEBUG"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> INFO = <span class="hljs-string">"INFO"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> WARN = <span class="hljs-string">"WARN"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ERROR = <span class="hljs-string">"ERROR"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> LogLevel = <span class="hljs-keyword">typeof</span> DEBUG | <span class="hljs-keyword">typeof</span> INFO | <span class="hljs-keyword">typeof</span> WARN | <span class="hljs-keyword">typeof</span> ERROR;

<span class="hljs-comment">// eslint-disable-next-line  @typescript-eslint/no-explicit-any</span>
<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">log</span>(<span class="hljs-params">level: LogLevel, message: <span class="hljs-built_in">string</span>, ...data: <span class="hljs-built_in">any</span>[]</span>) </span>{
  <span class="hljs-keyword">const</span> timestamp = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toLocaleString();
  <span class="hljs-keyword">const</span> prefix = <span class="hljs-string">`[<span class="hljs-subst">${timestamp}</span>] [<span class="hljs-subst">${level}</span>]`</span>;

  <span class="hljs-keyword">switch</span> (level) {
    <span class="hljs-keyword">case</span> ERROR:
      <span class="hljs-built_in">console</span>.error(prefix, message, ...data);
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> WARN:
      <span class="hljs-built_in">console</span>.warn(prefix, message, ...data);
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">default</span>:
      <span class="hljs-built_in">console</span>.log(prefix, message, ...data);
  }
}
</code></pre>
<p>This is already looking great. Next, write a small environment variable validator. Run it once at program startup: if any required variables are missing, the application logs their names and exits, so users know exactly what to add.</p>
<p>Create a new file named <code>env-validator.ts</code> in the <code>utils</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/utils/env-validator.ts</span>

<span class="hljs-keyword">import</span> { log, ERROR } <span class="hljs-keyword">from</span> <span class="hljs-string">"./logger.js"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> OPENAI_API_KEY = <span class="hljs-string">"OPENAI_API_KEY"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DISCORD_BOT_TOKEN = <span class="hljs-string">"DISCORD_BOT_TOKEN"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DISCORD_BOT_GUILD_ID = <span class="hljs-string">"DISCORD_BOT_GUILD_ID"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DISCORD_BOT_CLIENT_ID = <span class="hljs-string">"DISCORD_BOT_CLIENT_ID"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> COMPOSIO_API_KEY = <span class="hljs-string">"COMPOSIO_API_KEY"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> validateEnvVars = (requiredEnvVars: <span class="hljs-built_in">string</span>[]): <span class="hljs-function"><span class="hljs-params">void</span> =&gt;</span> {
  <span class="hljs-keyword">const</span> missingVars: <span class="hljs-built_in">string</span>[] = [];

  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> envVar <span class="hljs-keyword">of</span> requiredEnvVars) {
    <span class="hljs-keyword">if</span> (!process.env[envVar]) {
      missingVars.push(envVar);
    }
  }

  <span class="hljs-keyword">if</span> (missingVars.length &gt; <span class="hljs-number">0</span>) {
    log(
      ERROR,
      <span class="hljs-string">"missing required environment variables. please create a .env file and add the following:"</span>,
    );
    missingVars.forEach(<span class="hljs-function">(<span class="hljs-params">envVar</span>) =&gt;</span> <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`- <span class="hljs-subst">${envVar}</span>`</span>));
    process.exit(<span class="hljs-number">1</span>);
  }
};
</code></pre>
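<p>If you want to unit test the check without exiting the process, one option is to split the lookup from the exit. The sketch below is only an illustration: <code>findMissingEnvVars</code> is a hypothetical helper that is not part of the original project, and it takes the environment as a plain object instead of reading <code>process.env</code> directly.</p>

```typescript
// Hypothetical helper (not in the original project): returns the names of
// required variables that are unset or empty, without exiting the process.
function findMissingEnvVars(
  requiredEnvVars: string[],
  env: Record<string, string | undefined>,
): string[] {
  return requiredEnvVars.filter((envVar) => !env[envVar]);
}

// Example with a hand-written env object instead of process.env.
// DISCORD_BOT_TOKEN is empty and COMPOSIO_API_KEY is absent, so both are reported.
const missingVars = findMissingEnvVars(
  ["OPENAI_API_KEY", "DISCORD_BOT_TOKEN", "COMPOSIO_API_KEY"],
  { OPENAI_API_KEY: "sk-test", DISCORD_BOT_TOKEN: "" },
);
console.log(missingVars);
```

<p>With this split, <code>validateEnvVars</code> could call the helper with <code>process.env</code> and keep the logging and <code>process.exit(1)</code> in one place.</p>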
<p>Now, let's also define the types of data you'll be working with.</p>
<p>Create a new file named <code>types.ts</code> inside the <code>types</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/types/types.ts</span>

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> QUESTION = <span class="hljs-string">"QUESTION"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> HELP = <span class="hljs-string">"HELP"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> SUPPORT = <span class="hljs-string">"SUPPORT"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> OTHER = <span class="hljs-string">"OTHER"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> TOOL_CALL_REQUEST = <span class="hljs-string">"TOOL_CALL_REQUEST"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> FinalAction =
  | { <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY"</span>; content: <span class="hljs-built_in">string</span> }
  | { <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY_IN_THREAD"</span>; content: <span class="hljs-built_in">string</span> }
  | {
      <span class="hljs-keyword">type</span>: <span class="hljs-string">"CREATE_EMBED"</span>;
      title: <span class="hljs-built_in">string</span>;
      description: <span class="hljs-built_in">string</span>;
      roleToPing?: <span class="hljs-built_in">string</span>;
    };

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> MessageChoice =
  | <span class="hljs-keyword">typeof</span> SUPPORT
  | <span class="hljs-keyword">typeof</span> OTHER
  | <span class="hljs-keyword">typeof</span> TOOL_CALL_REQUEST;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> SupportTicketType = <span class="hljs-keyword">typeof</span> QUESTION | <span class="hljs-keyword">typeof</span> HELP;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Message = {
  author: <span class="hljs-built_in">string</span>;
  content: <span class="hljs-built_in">string</span>;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> SupportTicketQuestion = {
  description: <span class="hljs-built_in">string</span>;
  answer: <span class="hljs-built_in">string</span>;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> SupportTicket = {
  <span class="hljs-keyword">type</span>?: SupportTicketType;
  question?: SupportTicketQuestion;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> ToolCallRequestAction = {
  <span class="hljs-comment">// actionLog is not intended to be shown to the end-user.</span>
  <span class="hljs-comment">// This is solely for logging purpose.</span>
  actionLog: <span class="hljs-built_in">string</span>;
  status: <span class="hljs-string">"success"</span> | <span class="hljs-string">"failed"</span> | <span class="hljs-string">"acknowledged"</span>;
};
</code></pre>
<p>The types are pretty self-explanatory, but here’s a quick overview.</p>
<p><code>Message</code> holds the user's input and author. Each message can be marked as support, a tool call request, or just other, like spam or small talk.</p>
<p>Support messages are further labeled as either help or a question using <code>SupportTicketType</code>.</p>
<p>The graph returns a <code>FinalAction</code>, which can be a direct reply, a reply in a thread, or an embed. If it's <code>CREATE_EMBED</code> and has <code>roleToPing</code> set, it denotes support help, so we can ping the mod.</p>
<p>For tool-based responses, <code>ToolCallRequestAction</code> stores the status and an internal log used for debugging.</p>
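<p>Because <code>FinalAction</code> is a discriminated union on <code>type</code>, TypeScript can narrow each branch for you. Here is a minimal sketch of how a consumer might handle it. The <code>describeAction</code> function is a hypothetical helper for illustration only, and the type is copied from <code>types.ts</code> so the snippet stands on its own.</p>

```typescript
// Copied from types/types.ts so the sketch is self-contained.
type FinalAction =
  | { type: "REPLY"; content: string }
  | { type: "REPLY_IN_THREAD"; content: string }
  | {
      type: "CREATE_EMBED";
      title: string;
      description: string;
      roleToPing?: string;
    };

// Hypothetical helper: summarizes what the bot would do for each action.
function describeAction(action: FinalAction): string {
  switch (action.type) {
    case "REPLY":
      return `reply: ${action.content}`;
    case "REPLY_IN_THREAD":
      return `thread reply: ${action.content}`;
    case "CREATE_EMBED": {
      // roleToPing is optional; when set, it marks a support-help embed.
      const ping = action.roleToPing ? ` (ping ${action.roleToPing})` : "";
      return `embed: ${action.title}${ping}`;
    }
    default: {
      // Exhaustiveness check: this stops type-checking if a new
      // FinalAction variant is added but not handled above.
      const unreachable: never = action;
      return unreachable;
    }
  }
}

const embedSummary = describeAction({
  type: "CREATE_EMBED",
  title: "Need help",
  description: "User cannot log in",
  roleToPing: "mods",
});
console.log(embedSummary); // embed: Need help (ping mods)
```

<p>The <code>never</code> assignment in the default branch is a design choice: it turns a forgotten variant into a compile-time error rather than a silent no-op at runtime.</p>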
<p>Now, you need one last helper function to use in your nodes to extract the response from the LLM. Create a new file named <code>helpers.ts</code> inside the <code>utils</code> directory and add the following code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/utils/helpers.ts</span>

<span class="hljs-keyword">import</span> <span class="hljs-keyword">type</span> { AIMessage } <span class="hljs-keyword">from</span> <span class="hljs-string">"@langchain/core/messages"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">extractStringFromAIMessage</span>(<span class="hljs-params">
  message: AIMessage,
  fallback: <span class="hljs-built_in">string</span> = "No valid response generated by the LLM.",
</span>): <span class="hljs-title">string</span> </span>{
  <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> message.content === <span class="hljs-string">"string"</span>) {
    <span class="hljs-keyword">return</span> message.content;
  }

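  <span class="hljs-keyword">if</span> (<span class="hljs-built_in">Array</span>.isArray(message.content)) {
    <span class="hljs-comment">// Content blocks are usually objects such as { type: "text", text: "..." },</span>
    <span class="hljs-comment">// so read the text field and skip non-text blocks (for example, images).</span>
    <span class="hljs-keyword">const</span> textContent = message.content
      .map(<span class="hljs-function">(<span class="hljs-params">item</span>) =&gt;</span> {
        <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> item === <span class="hljs-string">"string"</span>) <span class="hljs-keyword">return</span> item;
        <span class="hljs-keyword">return</span> item.type === <span class="hljs-string">"text"</span> ? <span class="hljs-built_in">String</span>(item.text ?? <span class="hljs-string">""</span>) : <span class="hljs-string">""</span>;
      })
      .join(<span class="hljs-string">" "</span>);
    <span class="hljs-keyword">return</span> textContent.trim() || fallback;
  }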

  <span class="hljs-keyword">return</span> fallback;
}
</code></pre>
<p>You're all set for now with these helper functions in place. Now, you can start coding the logic.</p>
<h3 id="heading-implement-langgraph-workflow">Implement LangGraph Workflow</h3>
<p>Now that you have the types defined, structure your graph and connect it with some edges.</p>
<p>Create a new file named <code>graph.ts</code> inside the <code>src</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/src/graph.ts</span>

<span class="hljs-keyword">import</span> { Annotation, END, START, StateGraph } <span class="hljs-keyword">from</span> <span class="hljs-string">"@langchain/langgraph"</span>;
<span class="hljs-keyword">import</span> {
  <span class="hljs-keyword">type</span> FinalAction,
  <span class="hljs-keyword">type</span> ToolCallRequestAction,
  <span class="hljs-keyword">type</span> Message,
  <span class="hljs-keyword">type</span> MessageChoice,
  <span class="hljs-keyword">type</span> SupportTicket,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"../types/types.js"</span>;
<span class="hljs-keyword">import</span> {
  processToolCall,
  processMessage,
  processOther,
  processSupport,
  processSupportHelp,
  processSupportQuestion,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"./nodes.js"</span>;
<span class="hljs-keyword">import</span> { processMessageEdges, processSupportEdges } <span class="hljs-keyword">from</span> <span class="hljs-string">"./edges.js"</span>;

<span class="hljs-keyword">const</span> state = Annotation.Root({
  message: Annotation&lt;Message&gt;(),
  previousMessages: Annotation&lt;Message[]&gt;(),
  messageChoice: Annotation&lt;MessageChoice&gt;(),
  supportTicket: Annotation&lt;SupportTicket&gt;(),
  toolCallRequest: Annotation&lt;ToolCallRequestAction&gt;(),
  finalAction: Annotation&lt;FinalAction&gt;(),
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> State = <span class="hljs-keyword">typeof</span> state.State;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Update = <span class="hljs-keyword">typeof</span> state.Update;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">initializeGraph</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> workflow = <span class="hljs-keyword">new</span> StateGraph(state);

  workflow
    .addNode(<span class="hljs-string">"process-message"</span>, processMessage)
    .addNode(<span class="hljs-string">"process-support"</span>, processSupport)
    .addNode(<span class="hljs-string">"process-other"</span>, processOther)

    .addNode(<span class="hljs-string">"process-support-question"</span>, processSupportQuestion)
    .addNode(<span class="hljs-string">"process-support-help"</span>, processSupportHelp)
    .addNode(<span class="hljs-string">"process-tool-call"</span>, processToolCall)

    <span class="hljs-comment">// Edges setup starts here....</span>
    .addEdge(START, <span class="hljs-string">"process-message"</span>)

    .addConditionalEdges(<span class="hljs-string">"process-message"</span>, processMessageEdges)
    .addConditionalEdges(<span class="hljs-string">"process-support"</span>, processSupportEdges)

    .addEdge(<span class="hljs-string">"process-other"</span>, END)
    .addEdge(<span class="hljs-string">"process-support-question"</span>, END)
    .addEdge(<span class="hljs-string">"process-support-help"</span>, END)
    .addEdge(<span class="hljs-string">"process-tool-call"</span>, END);

  <span class="hljs-keyword">const</span> graph = workflow.compile();

  <span class="hljs-comment">// To get the graph in png</span>
  <span class="hljs-comment">// getGraph() is deprecated though</span>
  <span class="hljs-comment">// Bun.write("graph/graph.png", await graph.getGraph().drawMermaidPng());</span>

  <span class="hljs-keyword">return</span> graph;
}
</code></pre>
<p>The <code>initializeGraph</code> function, as the name suggests, returns the graph you can use to execute the workflow.</p>
<p>The <code>process-message</code> node is the starting point of the graph. It takes in the user’s message, processes it, and routes it to the appropriate next node: <code>process-support</code>, <code>process-tool-call</code>, or <code>process-other</code>.</p>
<p>The <code>process-support</code> node further classifies the support message and decides whether it should go to <code>process-support-help</code> or <code>process-support-question</code>.</p>
<p>The <code>process-tool-call</code> node handles messages when the user tries to trigger some kind of tool or action.</p>
<p>The <code>process-other</code> node handles everything that doesn’t fall into the support or tool call categories. These are general or fallback responses.</p>
<p>To help you visualize how things will shape up, here’s how the graph looks with all of its nodes (you’ll implement them shortly):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750327093884/fa8e6b4e-ca61-4900-9b3b-7b3a2863c296.png" alt="LangGraph nodes for the Discord bot workflow" class="image--center mx-auto" width="886" height="432" loading="lazy"></p>
<p>To wire everything together, you need to define edges between nodes, including conditional edges that dynamically decide the next step based on the state.</p>
<p>Create a new file named <code>edges.ts</code> inside the <code>src</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/src/edges.ts</span>

<span class="hljs-keyword">import</span> { END } <span class="hljs-keyword">from</span> <span class="hljs-string">"@langchain/langgraph"</span>;
<span class="hljs-keyword">import</span> { <span class="hljs-keyword">type</span> State } <span class="hljs-keyword">from</span> <span class="hljs-string">"./graph.js"</span>;
<span class="hljs-keyword">import</span> { QUESTION, OTHER, SUPPORT, TOOL_CALL_REQUEST } <span class="hljs-keyword">from</span> <span class="hljs-string">"../types/types.js"</span>;
<span class="hljs-keyword">import</span> { log, WARN } <span class="hljs-keyword">from</span> <span class="hljs-string">"../utils/logger.js"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processMessageEdges = (
  state: State,
): <span class="hljs-string">"process-support"</span> | <span class="hljs-string">"process-other"</span> | <span class="hljs-string">"process-tool-call"</span> | <span class="hljs-string">"__end__"</span> =&gt; {
  <span class="hljs-keyword">if</span> (!state.messageChoice) {
    log(WARN, <span class="hljs-string">"state.messageChoice is undefined. Returning..."</span>);
    <span class="hljs-keyword">return</span> END;
  }

  <span class="hljs-keyword">switch</span> (state.messageChoice) {
    <span class="hljs-keyword">case</span> SUPPORT:
      <span class="hljs-keyword">return</span> <span class="hljs-string">"process-support"</span>;
    <span class="hljs-keyword">case</span> TOOL_CALL_REQUEST:
      <span class="hljs-keyword">return</span> <span class="hljs-string">"process-tool-call"</span>;
    <span class="hljs-keyword">case</span> OTHER:
      <span class="hljs-keyword">return</span> <span class="hljs-string">"process-other"</span>;
    <span class="hljs-keyword">default</span>:
      log(WARN, <span class="hljs-string">"unknown message choice. Returning..."</span>);
      <span class="hljs-keyword">return</span> END;
  }
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processSupportEdges = (
  state: State,
): <span class="hljs-string">"process-support-question"</span> | <span class="hljs-string">"process-support-help"</span> | <span class="hljs-string">"__end__"</span> =&gt; {
  <span class="hljs-keyword">if</span> (!state.supportTicket?.type) {
    log(WARN, <span class="hljs-string">"state.supportTicket.type is undefined. Returning..."</span>);
    <span class="hljs-keyword">return</span> END;
  }

  <span class="hljs-keyword">return</span> state.supportTicket.type === QUESTION
    ? <span class="hljs-string">"process-support-question"</span>
    : <span class="hljs-string">"process-support-help"</span>;
};
</code></pre>
<p>These are the edges that connect different nodes in your application. They direct the flow in your graph.</p>
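<p>To see how the two routers combine, here is a minimal, dependency-free sketch that walks the graph by hand. It doesn't use LangGraph at all. The label values (<code>"support"</code>, <code>"question"</code>, and so on) and the <code>RouteState</code>, <code>afterProcessMessage</code>, <code>afterProcessSupport</code>, and <code>tracePath</code> names are assumptions made for illustration; the real constants live in <code>types/types.ts</code>, and in the real graph the labels are written into state by the nodes rather than passed in up front.</p>
<pre><code class="lang-typescript">// Hypothetical label values; the real constants are defined in types/types.ts.
type MessageChoice = "support" | "tool-call-request" | "other";
type SupportType = "question" | "help";

// A cut-down stand-in for the graph state, holding only what the routers read.
interface RouteState {
  messageChoice?: MessageChoice;
  supportType?: SupportType;
}

const END = "__end__";

// Mirrors processMessageEdges: pick the node that follows "process-message".
function afterProcessMessage(state: RouteState): string {
  switch (state.messageChoice) {
    case "support":
      return "process-support";
    case "tool-call-request":
      return "process-tool-call";
    case "other":
      return "process-other";
    default:
      return END; // no usable classification, so stop early
  }
}

// Mirrors processSupportEdges: pick the node that follows "process-support".
function afterProcessSupport(state: RouteState): string {
  if (!state.supportType) {
    return END;
  }
  return state.supportType === "question"
    ? "process-support-question"
    : "process-support-help";
}

// Records every node a message would visit on its way to END.
function tracePath(state: RouteState): string[] {
  const path = ["process-message"];
  let next = afterProcessMessage(state);
  if (next === "process-support") {
    path.push(next);
    next = afterProcessSupport(state);
  }
  if (next !== END) {
    path.push(next);
  }
  path.push(END);
  return path;
}

console.log(tracePath({ messageChoice: "support", supportType: "help" }).join(" -> "));
</code></pre>
<p>For a support message classified as a help request, this prints <code>process-message -&gt; process-support -&gt; process-support-help -&gt; __end__</code>. A state with no classification at all goes straight from <code>process-message</code> to <code>__end__</code>, which matches the early <code>return END</code> branches in the edge functions.</p>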
<p>Things are really shaping up – so let’s finish the core logic by implementing all the nodes for your application.</p>
<p>Create a new file named <code>nodes.ts</code> inside the <code>src</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/src/nodes.ts</span>

<span class="hljs-keyword">import</span> { <span class="hljs-keyword">type</span> State, <span class="hljs-keyword">type</span> Update } <span class="hljs-keyword">from</span> <span class="hljs-string">"./graph.js"</span>;
<span class="hljs-keyword">import</span> { ChatOpenAI } <span class="hljs-keyword">from</span> <span class="hljs-string">"@langchain/openai"</span>;
<span class="hljs-keyword">import</span> { z } <span class="hljs-keyword">from</span> <span class="hljs-string">"zod"</span>;
<span class="hljs-keyword">import</span> {
  HELP,
  TOOL_CALL_REQUEST,
  OTHER,
  QUESTION,
  SUPPORT,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"../types/types.js"</span>;
<span class="hljs-keyword">import</span> { extractStringFromAIMessage } <span class="hljs-keyword">from</span> <span class="hljs-string">"../utils/helpers.js"</span>;
<span class="hljs-keyword">import</span> { OpenAIToolSet } <span class="hljs-keyword">from</span> <span class="hljs-string">"composio-core"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-keyword">type</span> { ChatCompletionMessageToolCall } <span class="hljs-keyword">from</span> <span class="hljs-string">"openai/resources/chat/completions.mjs"</span>;
<span class="hljs-keyword">import</span> { v4 <span class="hljs-keyword">as</span> uuidv4 } <span class="hljs-keyword">from</span> <span class="hljs-string">"uuid"</span>;
<span class="hljs-keyword">import</span> { DEBUG, ERROR, INFO, log, WARN } <span class="hljs-keyword">from</span> <span class="hljs-string">"../utils/logger.js"</span>;
<span class="hljs-keyword">import</span> {
  SystemMessage,
  HumanMessage,
  ToolMessage,
  BaseMessage,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@langchain/core/messages"</span>;

<span class="hljs-comment">// feel free to use any model. Here I'm going with gpt-4o-mini</span>
<span class="hljs-keyword">const</span> model = <span class="hljs-string">"gpt-4o-mini"</span>;

<span class="hljs-keyword">const</span> toolset = <span class="hljs-keyword">new</span> OpenAIToolSet();
<span class="hljs-keyword">const</span> llm = <span class="hljs-keyword">new</span> ChatOpenAI({
  model,
  apiKey: process.env.OPENAI_API_KEY,
  temperature: <span class="hljs-number">0</span>,
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processMessage = <span class="hljs-keyword">async</span> (state: State): <span class="hljs-built_in">Promise</span>&lt;Update&gt; =&gt; {
  log(DEBUG, <span class="hljs-string">"message in process message:"</span>, state.message);

  <span class="hljs-keyword">const</span> llm = <span class="hljs-keyword">new</span> ChatOpenAI({
    model,
    apiKey: process.env.OPENAI_API_KEY,
    temperature: <span class="hljs-number">0</span>,
  });

  <span class="hljs-keyword">const</span> structuredLlm = llm.withStructuredOutput(
    z.object({
      <span class="hljs-keyword">type</span>: z.enum([SUPPORT, OTHER, TOOL_CALL_REQUEST]).describe(<span class="hljs-string">`
Categorize the user's message:
- <span class="hljs-subst">${SUPPORT}</span>: Technical support, help with problems, or questions about AI.
- <span class="hljs-subst">${TOOL_CALL_REQUEST}</span>: User asks the bot to perform tool action (e.g., "send an email", "summarize chat", "summarize google sheets").
- <span class="hljs-subst">${OTHER}</span>: General conversation, spam, or off-topic messages.
`</span>),
    }),
  );

  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> structuredLlm.invoke([
    [
      <span class="hljs-string">"system"</span>,
      <span class="hljs-string">`You are an expert message analyzer AI. You need to categorize the message into
one of these categories:

- <span class="hljs-subst">${SUPPORT}</span>: If the message asks for technical support, help with a problem, or questions about AIs and LLMs.
- <span class="hljs-subst">${TOOL_CALL_REQUEST}</span>: If the message is a direct command or request for the bot to perform an action using external tools/services. Examples: "Summarize a document or Google Sheet", "Summarize the last hour of chat", "Send an email to devteam about this bug", "Create a Trello card for this feature request". Prioritize this if the user is asking the bot to *do* something beyond just answering.
- <span class="hljs-subst">${OTHER}</span>: For general chit-chat, spam, off-topic messages, or anything not fitting <span class="hljs-subst">${SUPPORT}</span> or <span class="hljs-subst">${TOOL_CALL_REQUEST}</span>.
`</span>,
    ],
    [<span class="hljs-string">"human"</span>, state.message.content],
  ]);

  <span class="hljs-keyword">return</span> {
    messageChoice: res.type,
  };
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processSupport = <span class="hljs-keyword">async</span> (state: State): <span class="hljs-built_in">Promise</span>&lt;Update&gt; =&gt; {
  log(DEBUG, <span class="hljs-string">"message in support:"</span>, state.message);

  <span class="hljs-keyword">const</span> llm = <span class="hljs-keyword">new</span> ChatOpenAI({
    model,
    apiKey: process.env.OPENAI_API_KEY,
    temperature: <span class="hljs-number">0</span>,
  });

  <span class="hljs-keyword">const</span> structuredLlm = llm.withStructuredOutput(
    z.object({
      <span class="hljs-keyword">type</span>: z.enum([QUESTION, HELP]).describe(<span class="hljs-string">`
Type of support needed:
- <span class="hljs-subst">${QUESTION}</span>: User asks a specific question seeking information or an answer.
- <span class="hljs-subst">${HELP}</span>: User needs broader assistance, guidance, or reports an issue requiring intervention/troubleshooting.
`</span>),
    }),
  );

  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> structuredLlm.invoke([
    [
      <span class="hljs-string">"system"</span>,
      <span class="hljs-string">`
You are a support ticket analyzer. Given a support message, categorize it as <span class="hljs-subst">${QUESTION}</span> or <span class="hljs-subst">${HELP}</span>.
- <span class="hljs-subst">${QUESTION}</span>: For specific questions.
- <span class="hljs-subst">${HELP}</span>: For requests for assistance, troubleshooting, or problem reports.
`</span>,
    ],
    [<span class="hljs-string">"human"</span>, state.message.content],
  ]);

  <span class="hljs-keyword">return</span> {
    supportTicket: {
      ...state.supportTicket,
      <span class="hljs-keyword">type</span>: res.type,
    },
  };
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processSupportHelp = <span class="hljs-keyword">async</span> (state: State): <span class="hljs-built_in">Promise</span>&lt;Update&gt; =&gt; {
  log(DEBUG, <span class="hljs-string">"message in support help:"</span>, state.message);

  <span class="hljs-keyword">return</span> {
    supportTicket: {
      ...state.supportTicket,
    },
    finalAction: {
      <span class="hljs-keyword">type</span>: <span class="hljs-string">"CREATE_EMBED"</span>,
      title: <span class="hljs-string">"🚨 Help Needed!"</span>,
      description: <span class="hljs-string">`A new request for help has been raised by **@<span class="hljs-subst">${state.message.author}</span>**.\n\n**Query:**\n&gt; <span class="hljs-subst">${state.message.content}</span>`</span>,
      roleToPing: process.env.DISCORD_SUPPORT_MOD_ID,
    },
  };
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processSupportQuestion = <span class="hljs-keyword">async</span> (state: State): <span class="hljs-built_in">Promise</span>&lt;Update&gt; =&gt; {
  log(DEBUG, <span class="hljs-string">"message in support question category:"</span>, state.message);

  <span class="hljs-keyword">const</span> llm = <span class="hljs-keyword">new</span> ChatOpenAI({
    model,
    apiKey: process.env.OPENAI_API_KEY,
    temperature: <span class="hljs-number">0</span>,
  });

  <span class="hljs-keyword">const</span> systemPrompt = <span class="hljs-string">`
You are a helpful AI assistant specializing in AI, and LLMs. Answer
the user's question concisely and accurately based on general knowledge in
these areas. If the question is outside this scope (e.g., personal advice,
non-technical topics), politely state you cannot answer. User's question:
`</span>;

  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> llm.invoke([
    [<span class="hljs-string">"system"</span>, systemPrompt],
    [<span class="hljs-string">"human"</span>, state.message.content],
  ]);

  <span class="hljs-keyword">const</span> llmResponse = extractStringFromAIMessage(res);
  <span class="hljs-keyword">return</span> {
    supportTicket: {
      ...state.supportTicket,
      question: {
        description: state.message.content,
        answer: llmResponse,
      },
    },
    finalAction: {
      <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY"</span>,
      content: llmResponse,
    },
  };
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processOther = <span class="hljs-keyword">async</span> (state: State): <span class="hljs-built_in">Promise</span>&lt;Update&gt; =&gt; {
  log(DEBUG, <span class="hljs-string">"message in other category:"</span>, state.message);

  <span class="hljs-keyword">const</span> response =
    <span class="hljs-string">"This seems to be a general message. I'm here to help with technical support or perform specific actions if you ask. How can I assist you with those?"</span>;

  <span class="hljs-keyword">return</span> {
    finalAction: {
      <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY_IN_THREAD"</span>,
      content: response,
    },
  };
};
</code></pre>
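<p>There’s not much to explain for these nodes. Two of them are classifiers: <code>processMessage</code> and <code>processSupport</code> each spin up a chat LLM instance and use structured output so the model returns exactly one label from a predefined set, such as <code>QUESTION</code> or <code>HELP</code> for support messages. The system prompt defines what each label means, and the user’s message is passed in for classification. The remaining nodes act on that classification: <code>processSupportQuestion</code> asks the LLM for an answer and replies with it, <code>processSupportHelp</code> builds an embed that pings the support moderator role, and <code>processOther</code> returns a canned reply in a thread.</p>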
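<p>The idea behind constraining the model to a predefined set of labels can be sketched without any libraries. The article’s code relies on <code>z.enum</code> and <code>withStructuredOutput</code> for this. The version below is only an illustration of the same guarantee, with hypothetical label values and a hypothetical <code>parseSupportLabel</code> helper that is not part of the project.</p>
<pre><code class="lang-typescript">// Hypothetical label values standing in for the QUESTION and HELP constants.
const SUPPORT_LABELS = ["question", "help"] as const;
type SupportLabel = (typeof SUPPORT_LABELS)[number];

// Narrows an arbitrary model reply to one of the allowed labels.
// Anything outside the set (or not a string) becomes undefined.
function parseSupportLabel(raw: unknown): SupportLabel | undefined {
  if (typeof raw !== "string") {
    return undefined;
  }
  const normalized = raw.trim().toLowerCase();
  return SUPPORT_LABELS.find((label) => label === normalized);
}

console.log(parseSupportLabel(" HELP ")); // "help"
console.log(parseSupportLabel("refund")); // undefined
</code></pre>
<p>An <code>undefined</code> result here corresponds to the case the edge functions guard against: when no valid label ends up in state, they log a warning and route to <code>END</code> instead of guessing a branch.</p>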
<p>You’re almost there. But there’s one piece missing. Can you spot it?</p>
<p>What’s missing is the <code>process-tool-call</code> node, which handles the workflow when the user asks the bot to use a tool. This is a big piece of the workflow.</p>
<p>It’s a bit longer, so I’ll explain it separately.</p>
<p>Modify the above <code>nodes.ts</code> file to add the missing node:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/src/nodes.ts</span>

<span class="hljs-comment">// Rest of the code...</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> processToolCall = <span class="hljs-keyword">async</span> (state: State): <span class="hljs-built_in">Promise</span>&lt;Update&gt; =&gt; {
  log(DEBUG, <span class="hljs-string">"message in tool call request category:"</span>, state.message);

  <span class="hljs-keyword">const</span> structuredOutputType = z.object({
    service: z
      .string()
      .describe(<span class="hljs-string">"The target service (e.g., 'email', 'discord')."</span>),
    task: z
      .string()
      .describe(
        <span class="hljs-string">"A concise description of the task (e.g., 'send email to X', 'summarize recent chat', 'create task Y')."</span>,
      ),
    details: z
      .string()
      .optional()
      .describe(
        <span class="hljs-string">"Any specific details or parameters extracted from the message relevant to the task."</span>,
      ),
  });

  <span class="hljs-keyword">const</span> structuredLlm = llm.withStructuredOutput(structuredOutputType);

  <span class="hljs-keyword">let</span> parsedActionDetails: z.infer&lt;<span class="hljs-keyword">typeof</span> structuredOutputType&gt; = {
    service: <span class="hljs-string">"unknown"</span>,
    task: <span class="hljs-string">"perform a requested action"</span>,
  };

  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> structuredLlm.invoke([
      [
        <span class="hljs-string">"system"</span>,
        <span class="hljs-string">`Parse the user's request to identify an action. Extract the target service, a description of the task, and any relevant details or parameters.
      Examples:
      - "Remind me to check emails at 5 PM": service: calendar/reminder, task: set reminder, details: check emails at 5 PM
      - "Send a summary of this conversation to #general channel": service: discord, task: send summary to channel, details: channel #general
      - "Create a bug report for 'login fails on mobile'": service: project_manager, task: create bug report, details: title 'login fails on mobile'`</span>,
      ],
      [<span class="hljs-string">"human"</span>, state.message.content],
    ]);

    parsedActionDetails = res;
    log(INFO, <span class="hljs-string">"initial parsing action details:"</span>, parsedActionDetails);
  } <span class="hljs-keyword">catch</span> (error) {
    log(ERROR, <span class="hljs-string">"initial parsing error:"</span>, error);
    <span class="hljs-keyword">return</span> {
      toolCallRequest: {
        actionLog: <span class="hljs-string">`Failed to parse user request: <span class="hljs-subst">${state.message.content}</span>`</span>,
        status: <span class="hljs-string">"failed"</span>,
      },
      finalAction: {
        <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY_IN_THREAD"</span>,
        content:
          <span class="hljs-string">"I'm sorry, I had trouble understanding that action. Could you please rephrase it?"</span>,
      },
    };
  }

  <span class="hljs-keyword">try</span> {
    log(INFO, <span class="hljs-string">"fetching composio tools"</span>);
    <span class="hljs-keyword">const</span> tools = <span class="hljs-keyword">await</span> toolset.getTools({
      apps: [<span class="hljs-string">"GOOGLESHEETS"</span>],
    });

    log(INFO, <span class="hljs-string">`fetched <span class="hljs-subst">${tools.length}</span> tools (OpenAI rejects requests with more than 128 tools)`</span>);

    <span class="hljs-keyword">if</span> (tools.length === <span class="hljs-number">0</span>) {
      log(WARN, <span class="hljs-string">"no tools fetched from Composio. skipping..."</span>);
      <span class="hljs-keyword">return</span> {
        toolCallRequest: {
          actionLog: <span class="hljs-string">`Service: <span class="hljs-subst">${parsedActionDetails.service}</span>, Task: <span class="hljs-subst">${parsedActionDetails.task}</span>. No composio tools found`</span>,
          status: <span class="hljs-string">"failed"</span>,
        },
        finalAction: {
          <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY_IN_THREAD"</span>,
          content: <span class="hljs-string">"Couldn't find any tools to perform your action."</span>,
        },
      };
    }

    log(DEBUG, <span class="hljs-string">"starting iterative tool execution loop"</span>);

    <span class="hljs-keyword">const</span> conversationHistory: BaseMessage[] = [
      <span class="hljs-keyword">new</span> SystemMessage(
        <span class="hljs-string">"You are a helpful assistant that performs tool calls. Your task is to understand the user's request and use the available tools to fulfill the request completely. You can use multiple tools in sequence to accomplish complex tasks. Always provide a brief, conversational summary of what you accomplished after using tools."</span>,
      ),
      <span class="hljs-keyword">new</span> HumanMessage(state.message.content),
    ];

    <span class="hljs-keyword">let</span> totalToolsUsed = <span class="hljs-number">0</span>;
    <span class="hljs-keyword">let</span> finalResponse: <span class="hljs-built_in">string</span> | <span class="hljs-literal">null</span> = <span class="hljs-literal">null</span>;

    <span class="hljs-keyword">const</span> maxIterations = <span class="hljs-number">5</span>;
    <span class="hljs-keyword">let</span> iteration = <span class="hljs-number">0</span>;

    <span class="hljs-keyword">while</span> (iteration &lt; maxIterations) {
      iteration++;
      log(
        DEBUG,
        <span class="hljs-string">`Iteration <span class="hljs-subst">${iteration}</span>: calling LLM with <span class="hljs-subst">${tools.length}</span> tools`</span>,
      );

      <span class="hljs-keyword">const</span> llmResponse = <span class="hljs-keyword">await</span> llm.invoke(conversationHistory, {
        tools: tools,
      });

      log(DEBUG, <span class="hljs-string">`Iteration <span class="hljs-subst">${iteration}</span> LLM response:`</span>, llmResponse);

      <span class="hljs-keyword">const</span> toolCalls = llmResponse.tool_calls;

      <span class="hljs-keyword">if</span> ((!toolCalls || toolCalls.length === <span class="hljs-number">0</span>) &amp;&amp; llmResponse.content) {
        finalResponse =
          <span class="hljs-keyword">typeof</span> llmResponse.content === <span class="hljs-string">"string"</span>
            ? llmResponse.content
            : <span class="hljs-built_in">JSON</span>.stringify(llmResponse.content);
        log(
          INFO,
          <span class="hljs-string">`Final response received after <span class="hljs-subst">${iteration}</span> iterations:`</span>,
          finalResponse,
        );
        <span class="hljs-keyword">break</span>;
      }

      <span class="hljs-keyword">if</span> (toolCalls &amp;&amp; toolCalls.length &gt; <span class="hljs-number">0</span>) {
        log(
          INFO,
          <span class="hljs-string">`Iteration <span class="hljs-subst">${iteration}</span>: executing <span class="hljs-subst">${toolCalls.length}</span> tool(s)`</span>,
        );
        totalToolsUsed += toolCalls.length;

        conversationHistory.push(llmResponse);

        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> toolCall <span class="hljs-keyword">of</span> toolCalls) {
          log(
            INFO,
            <span class="hljs-string">`Executing tool: <span class="hljs-subst">${toolCall.name}</span> with args:`</span>,
            toolCall.args,
          );

          <span class="hljs-comment">// Resolve the id once so the ToolMessage below references the same call</span>
          <span class="hljs-keyword">const</span> toolCallId = toolCall.id || uuidv4();

          <span class="hljs-keyword">const</span> composioCompatibleToolCall: ChatCompletionMessageToolCall = {
            id: toolCallId,
            <span class="hljs-keyword">type</span>: <span class="hljs-string">"function"</span>,
            <span class="hljs-function"><span class="hljs-keyword">function</span>: </span>{
              name: toolCall.name,
              <span class="hljs-built_in">arguments</span>: <span class="hljs-built_in">JSON</span>.stringify(toolCall.args),
            },
          };

          <span class="hljs-keyword">let</span> toolOutputContent: <span class="hljs-built_in">string</span>;
          <span class="hljs-keyword">try</span> {
            <span class="hljs-keyword">const</span> executionResult = <span class="hljs-keyword">await</span> toolset.executeToolCall(
              composioCompatibleToolCall,
            );
            log(
              INFO,
              <span class="hljs-string">`Tool <span class="hljs-subst">${toolCall.name}</span> execution result:`</span>,
              executionResult,
            );
            toolOutputContent = <span class="hljs-built_in">JSON</span>.stringify(executionResult);
          } <span class="hljs-keyword">catch</span> (toolError) {
            log(ERROR, <span class="hljs-string">`Tool <span class="hljs-subst">${toolCall.name}</span> execution error:`</span>, toolError);
            <span class="hljs-keyword">const</span> errorMessage =
              toolError <span class="hljs-keyword">instanceof</span> <span class="hljs-built_in">Error</span>
                ? toolError.message
                : <span class="hljs-built_in">String</span>(toolError);

            toolOutputContent = <span class="hljs-string">`Error: <span class="hljs-subst">${errorMessage}</span>`</span>;
          }

          conversationHistory.push(
            <span class="hljs-keyword">new</span> ToolMessage({
              content: toolOutputContent,
              tool_call_id: toolCallId,
            }),
          );
        }

        <span class="hljs-keyword">continue</span>;
      }

      log(
        WARN,
        <span class="hljs-string">`Iteration <span class="hljs-subst">${iteration}</span>: LLM provided no tool calls or content`</span>,
      );
      <span class="hljs-keyword">break</span>;
    }

    <span class="hljs-keyword">let</span> userFriendlyResponse: <span class="hljs-built_in">string</span>;

    <span class="hljs-keyword">if</span> (totalToolsUsed &gt; <span class="hljs-number">0</span>) {
      log(DEBUG, <span class="hljs-string">"Generating user-friendly summary using LLM"</span>);

      <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> summaryResponse = <span class="hljs-keyword">await</span> llm.invoke([
          <span class="hljs-keyword">new</span> SystemMessage(
            <span class="hljs-string">"You are tasked with creating a brief, friendly summary for a Discord user about what actions were just completed. Keep it conversational, under 2-3 sentences, and focus on what was accomplished rather than technical details. Start with phrases like 'Done!', 'Successfully completed', 'All set!', etc."</span>,
          ),
          <span class="hljs-keyword">new</span> HumanMessage(
            <span class="hljs-string">`The user requested: "<span class="hljs-subst">${state.message.content}</span>"

I used <span class="hljs-subst">${totalToolsUsed}</span> tools across <span class="hljs-subst">${iteration}</span> iterations to complete their request. <span class="hljs-subst">${finalResponse ? <span class="hljs-string">`My final response was: <span class="hljs-subst">${finalResponse}</span>`</span> : <span class="hljs-string">"The task was completed successfully."</span>}</span>

Generate a brief, friendly summary of what was accomplished.`</span>,
          ),
        ]);

        userFriendlyResponse =
          <span class="hljs-keyword">typeof</span> summaryResponse.content === <span class="hljs-string">"string"</span>
            ? summaryResponse.content
            : <span class="hljs-string">`Done! I've completed your request using <span class="hljs-subst">${totalToolsUsed}</span> action<span class="hljs-subst">${totalToolsUsed &gt; <span class="hljs-number">1</span> ? <span class="hljs-string">"s"</span> : <span class="hljs-string">""</span>}</span>.`</span>;

        log(INFO, <span class="hljs-string">"Generated user-friendly summary:"</span>, userFriendlyResponse);
      } <span class="hljs-keyword">catch</span> (summaryError) {
        log(ERROR, <span class="hljs-string">"Failed to generate summary:"</span>, summaryError);
        userFriendlyResponse = <span class="hljs-string">`All set! I've completed your request using <span class="hljs-subst">${totalToolsUsed}</span> action<span class="hljs-subst">${totalToolsUsed &gt; <span class="hljs-number">1</span> ? <span class="hljs-string">"s"</span> : <span class="hljs-string">""</span>}</span>.`</span>;
      }
    } <span class="hljs-keyword">else</span> {
      userFriendlyResponse =
        finalResponse ||
        <span class="hljs-string">`I understood your request about '<span class="hljs-subst">${parsedActionDetails.task}</span>' but couldn't find the right tools to complete it.`</span>;
    }

    <span class="hljs-keyword">const</span> actionLog = <span class="hljs-string">`Service: <span class="hljs-subst">${parsedActionDetails.service}</span>, Task: <span class="hljs-subst">${parsedActionDetails.task}</span>. Used <span class="hljs-subst">${totalToolsUsed}</span> tools across <span class="hljs-subst">${iteration}</span> iterations.`</span>;

    <span class="hljs-keyword">return</span> {
      toolCallRequest: {
        actionLog,
        status: totalToolsUsed &gt; <span class="hljs-number">0</span> ? <span class="hljs-string">"success"</span> : <span class="hljs-string">"acknowledged"</span>,
      },
      finalAction: {
        <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY_IN_THREAD"</span>,
        content: userFriendlyResponse,
      },
    };
  } <span class="hljs-keyword">catch</span> (error) {
    log(ERROR, <span class="hljs-string">"processing tool call with Composio:"</span>, error);
    <span class="hljs-keyword">const</span> errorMessage = error <span class="hljs-keyword">instanceof</span> <span class="hljs-built_in">Error</span> ? error.message : <span class="hljs-built_in">String</span>(error);

    <span class="hljs-keyword">return</span> {
      toolCallRequest: {
        actionLog: <span class="hljs-string">`Error during tool call (Service: <span class="hljs-subst">${parsedActionDetails.service}</span>, Task: <span class="hljs-subst">${parsedActionDetails.task}</span>). Error: <span class="hljs-subst">${errorMessage}</span>`</span>,
        status: <span class="hljs-string">"failed"</span>,
      },
      finalAction: {
        <span class="hljs-keyword">type</span>: <span class="hljs-string">"REPLY_IN_THREAD"</span>,
        content: <span class="hljs-string">"Sorry, I encountered an error while processing your request."</span>,
      },
    };
  }
};
</code></pre>
<p>Everything up to the first try-catch block is the same as before. Up to that point, you're figuring out which tool the user is trying to call. Now comes the juicy part: actually handling tool calls.</p>
<p>At this point, you need to fetch the tools from Composio. Here, I’m passing in only Google Sheets for demo purposes, but you could use any other supported app once you’ve authenticated as shown above.</p>
<p>After fetching the tools, you enter a loop where the LLM can use them. It reviews the conversation history and decides which tools to call. You execute those calls, feed the results back, and repeat until the LLM gives a final answer. The loop is capped at 5 iterations as a safeguard so the LLM doesn’t get stuck in an endless back-and-forth.</p>
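<p>If tools were used, you ask the LLM to write a friendly summary for the user instead of dumping the raw JSON response. If that summary call fails, you fall back to a short canned message. If no tools were used at all, you return the LLM’s own answer or, failing that, let the user know you couldn’t find the right tools for the request.</p>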
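<p>To make the shape of that loop easier to see, here is a minimal sketch of a capped tool-calling loop. It is not the bot's actual code: the <code>Model</code> and <code>ToolRunner</code> types, the <code>runToolLoop</code> function, and the fake model are all hypothetical stand-ins for the LangChain and Composio calls used above.</p>

```typescript
// Hypothetical shapes for illustration only; the real bot works with
// LangChain messages and Composio tools instead of these simplified types.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { content: string; toolCalls: ToolCall[] };
type HistoryEntry = { role: "user" | "assistant" | "tool"; content: string };

type Model = (history: HistoryEntry[]) => Promise<ModelTurn>;
type ToolRunner = (call: ToolCall) => Promise<string>;

const MAX_ITERATIONS = 5; // same cap the article uses

async function runToolLoop(
  model: Model,
  runTool: ToolRunner,
  userPrompt: string,
): Promise<{ finalResponse: string | null; totalToolsUsed: number; iteration: number }> {
  const history: HistoryEntry[] = [{ role: "user", content: userPrompt }];
  let totalToolsUsed = 0;
  let iteration = 0;
  let finalResponse: string | null = null;

  while (iteration < MAX_ITERATIONS) {
    iteration++;
    const turn = await model(history);
    history.push({ role: "assistant", content: turn.content });

    // No tool calls means the model considers its answer final.
    if (turn.toolCalls.length === 0) {
      finalResponse = turn.content;
      break;
    }

    // Execute each requested tool and feed the result back to the model.
    for (const call of turn.toolCalls) {
      const result = await runTool(call);
      history.push({ role: "tool", content: result });
      totalToolsUsed++;
    }
  }

  return { finalResponse, totalToolsUsed, iteration };
}

// Demo with a fake model: it asks for one tool on the first turn, then answers.
const fakeModel: Model = async (history) =>
  history.some((entry) => entry.role === "tool")
    ? { content: "Row added.", toolCalls: [] }
    : { content: "", toolCalls: [{ name: "add_row", args: { value: 1 } }] };
const fakeRunner: ToolRunner = async (call) => `${call.name} ok`;

runToolLoop(fakeModel, fakeRunner, "add a row").then((result) => console.log(result));
```

<p>If the model never stops asking for tools, the loop exits after the fifth iteration with <code>finalResponse</code> still <code>null</code>, which is why the real code needs a fallback message.</p>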
<p>Now with that, you’re done with the difficult part (I mean, it was pretty easy though, right?). From here on, you just need to set up and work with the Discord API using Discord.js.</p>
<h3 id="heading-set-up-discordjs-client">Set Up Discord.js Client</h3>
<p>In this application, you’re using slash commands. To use slash commands in Discord, you need to register them first. You can do this manually, but why not automate it as well? 😉</p>
<p>Create a new file named <code>slash-deploy.ts</code> inside the <code>utils</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/utils/slash-deploy.ts</span>

<span class="hljs-keyword">import</span> { REST, Routes } <span class="hljs-keyword">from</span> <span class="hljs-string">"discord.js"</span>;
<span class="hljs-keyword">import</span> dotenv <span class="hljs-keyword">from</span> <span class="hljs-string">"dotenv"</span>;
<span class="hljs-keyword">import</span> { log, INFO, ERROR } <span class="hljs-keyword">from</span> <span class="hljs-string">"./logger.js"</span>;
<span class="hljs-keyword">import</span> {
  DISCORD_BOT_TOKEN,
  DISCORD_BOT_GUILD_ID,
  OPENAI_API_KEY,
  DISCORD_BOT_CLIENT_ID,
  validateEnvVars,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"./env-validator.js"</span>;

dotenv.config();

<span class="hljs-keyword">const</span> requiredEnvVars = [
  DISCORD_BOT_TOKEN,
  DISCORD_BOT_GUILD_ID,
  DISCORD_BOT_CLIENT_ID,
  OPENAI_API_KEY,
];
validateEnvVars(requiredEnvVars);

<span class="hljs-keyword">const</span> commands = [
  {
    name: <span class="hljs-string">"ask"</span>,
    description: <span class="hljs-string">"Ask the AI assistant a question or give it a command."</span>,
    options: [
      {
        name: <span class="hljs-string">"prompt"</span>,
        <span class="hljs-keyword">type</span>: <span class="hljs-number">3</span>, <span class="hljs-comment">// 3 = ApplicationCommandOptionType.String</span>
        description: <span class="hljs-string">"Your question or command for the bot"</span>,
        required: <span class="hljs-literal">true</span>,
      },
    ],
  },
];

<span class="hljs-keyword">const</span> rest = <span class="hljs-keyword">new</span> REST({ version: <span class="hljs-string">"10"</span> }).setToken(
  process.env.DISCORD_BOT_TOKEN!,
);

(<span class="hljs-keyword">async</span> () =&gt; {
  <span class="hljs-keyword">try</span> {
    log(INFO, <span class="hljs-string">"deploying slash(/) commands"</span>);
    <span class="hljs-keyword">await</span> rest.put(
      Routes.applicationGuildCommands(
        process.env.DISCORD_BOT_CLIENT_ID!,
        process.env.DISCORD_BOT_GUILD_ID!,
      ),
      {
        body: commands,
      },
    );

    log(INFO, <span class="hljs-string">"slash(/) commands deployed"</span>);
  } <span class="hljs-keyword">catch</span> (error) {
    log(ERROR, <span class="hljs-string">"deploying slash(/) commands:"</span>, error);
  }
})();
</code></pre>
<p>See your <code>validateEnvVars</code> function in action? Here, you’re specifying the environment variables that must be set before running the program. If any are missing and you try to run the program, you’ll get an error.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750340614800/ce0b37bc-647c-4b94-9099-2e396b0ffa93.png" alt="Command failed output for deploying slash command to Discord" class="image--center mx-auto" width="1221" height="191" loading="lazy"></p>
<p>You deploy the slash commands with the <code>REST</code> client that <code>discord.js</code> provides, calling <code>rest.put</code> with your command data and the target guild.</p>
<p>Now, run the <code>commands:deploy</code> script with <code>bun run commands:deploy</code> and you should have <code>/ask</code> registered as a slash command in your Discord server.</p>
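<p>The command data is plain JSON, and the option <code>type</code> in it is a number defined by Discord's API, where <code>3</code> means a string option. The sketch below rebuilds the same payload with a named constant. The <code>OptionType</code> object and <code>buildAskCommand</code> helper are hypothetical and only keep the example dependency-free. In a real project you would more likely import <code>ApplicationCommandOptionType</code> from <code>discord.js</code>.</p>

```typescript
// Numeric option types from Discord's application command API.
// discord.js exports the same values as ApplicationCommandOptionType;
// this local copy only keeps the sketch free of dependencies.
const OptionType = {
  String: 3,
  Integer: 4,
  Boolean: 5,
  User: 6,
} as const;

type CommandOption = {
  name: string;
  type: number;
  description: string;
  required?: boolean;
};

type SlashCommand = {
  name: string;
  description: string;
  options?: CommandOption[];
};

// Hypothetical helper: builds the same payload shape the deploy script sends.
function buildAskCommand(): SlashCommand {
  return {
    name: "ask",
    description: "Ask the AI assistant a question or give it a command.",
    options: [
      {
        name: "prompt",
        type: OptionType.String, // same value as the literal 3 in the script
        description: "Your question or command for the bot",
        required: true,
      },
    ],
  };
}

const commands = [buildAskCommand()];
console.log(JSON.stringify(commands, null, 2));
```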
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750340646555/2d5b22df-cd43-4e54-b985-b64576831316.png" alt="2d5b22df-cd43-4e54-b985-b64576831316" class="image--center mx-auto" width="1080" height="165" loading="lazy"></p>
<p>At this point, you should see the <code>/ask</code> slash command available in your server. All that’s left is to create the <code>index.ts</code> file, which will be the entry point to your Discord bot.</p>
<p>Create a new file named <code>index.ts</code> inside the <code>src</code> directory and add the following lines of code:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// 👇 discord-bot-langgraph/src/index.ts</span>

<span class="hljs-keyword">import</span> dotenv <span class="hljs-keyword">from</span> <span class="hljs-string">"dotenv"</span>;
<span class="hljs-keyword">import</span> {
  Client,
  Events,
  GatewayIntentBits,
  EmbedBuilder,
  <span class="hljs-keyword">type</span> Interaction,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"discord.js"</span>;
<span class="hljs-keyword">import</span> { initializeGraph } <span class="hljs-keyword">from</span> <span class="hljs-string">"./graph.js"</span>;
<span class="hljs-keyword">import</span> { <span class="hljs-keyword">type</span> Message <span class="hljs-keyword">as</span> ChatMessage } <span class="hljs-keyword">from</span> <span class="hljs-string">"../types/types.js"</span>;
<span class="hljs-keyword">import</span> { ERROR, INFO, log } <span class="hljs-keyword">from</span> <span class="hljs-string">"../utils/logger.js"</span>;
<span class="hljs-keyword">import</span> {
  DISCORD_BOT_TOKEN,
  DISCORD_BOT_GUILD_ID,
  OPENAI_API_KEY,
  validateEnvVars,
  DISCORD_BOT_CLIENT_ID,
  COMPOSIO_API_KEY,
} <span class="hljs-keyword">from</span> <span class="hljs-string">"../utils/env-validator.js"</span>;

dotenv.config();

<span class="hljs-keyword">const</span> requiredEnvVars = [
  DISCORD_BOT_CLIENT_ID,
  DISCORD_BOT_TOKEN,
  DISCORD_BOT_GUILD_ID,

  OPENAI_API_KEY,

  COMPOSIO_API_KEY,
];
validateEnvVars(requiredEnvVars);

<span class="hljs-keyword">const</span> graph = initializeGraph();

<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
});

<span class="hljs-comment">// use a map to store history per channel to make it work properly with all the</span>
<span class="hljs-comment">// channels and not for one specific channel.</span>
<span class="hljs-keyword">const</span> channelHistories = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>&lt;<span class="hljs-built_in">string</span>, ChatMessage[]&gt;();

client.on(Events.ClientReady, <span class="hljs-keyword">async</span> (readyClient) =&gt; {
  log(INFO, <span class="hljs-string">`logged in as <span class="hljs-subst">${readyClient.user.tag}</span>. ready to process commands!`</span>);
});

client.on(Events.InteractionCreate, <span class="hljs-keyword">async</span> (interaction: Interaction) =&gt; {
  <span class="hljs-keyword">if</span> (!interaction.isChatInputCommand()) <span class="hljs-keyword">return</span>;
  <span class="hljs-keyword">if</span> (interaction.commandName !== <span class="hljs-string">"ask"</span>) <span class="hljs-keyword">return</span>;

  <span class="hljs-keyword">const</span> userPrompt = interaction.options.getString(<span class="hljs-string">"prompt"</span>, <span class="hljs-literal">true</span>);
  <span class="hljs-keyword">const</span> user = interaction.user;
  <span class="hljs-keyword">const</span> channelId = interaction.channelId;

  <span class="hljs-keyword">if</span> (!channelHistories.has(channelId)) channelHistories.set(channelId, []);

  <span class="hljs-keyword">const</span> messageHistory = channelHistories.get(channelId)!;

  <span class="hljs-keyword">const</span> currentUserMessage: ChatMessage = {
    author: user.username,
    content: userPrompt,
  };

  <span class="hljs-keyword">const</span> graphInput = {
    message: currentUserMessage,
    previousMessages: [...messageHistory],
  };

  messageHistory.push(currentUserMessage);
  <span class="hljs-keyword">if</span> (messageHistory.length &gt; <span class="hljs-number">20</span>) messageHistory.shift();

  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">await</span> interaction.reply({
      content: <span class="hljs-string">"Hmm... processing your request! 🐀"</span>,
    });

    <span class="hljs-keyword">const</span> finalState = <span class="hljs-keyword">await</span> graph.invoke(graphInput);

    <span class="hljs-keyword">if</span> (!finalState.finalAction) {
      log(ERROR, <span class="hljs-string">"no final action found"</span>);
      <span class="hljs-keyword">await</span> interaction.editReply({
        content: <span class="hljs-string">"I'm sorry, I couldn't process your request."</span>,
      });
      <span class="hljs-keyword">return</span>;
    }

    <span class="hljs-keyword">const</span> userPing = <span class="hljs-string">`&lt;@<span class="hljs-subst">${user.id}</span>&gt;`</span>;
    <span class="hljs-keyword">const</span> action = finalState.finalAction;

    <span class="hljs-keyword">const</span> quotedPrompt = <span class="hljs-string">`🗣️ "<span class="hljs-subst">${userPrompt}</span>"`</span>;

    <span class="hljs-keyword">switch</span> (action.type) {
      <span class="hljs-keyword">case</span> <span class="hljs-string">"REPLY"</span>:
        <span class="hljs-keyword">await</span> interaction.editReply({
          content: <span class="hljs-string">`<span class="hljs-subst">${userPing}</span>\n\n<span class="hljs-subst">${quotedPrompt}</span>\n\n<span class="hljs-subst">${action.content}</span>`</span>,
        });
        <span class="hljs-keyword">break</span>;

      <span class="hljs-keyword">case</span> <span class="hljs-string">"REPLY_IN_THREAD"</span>:
        <span class="hljs-keyword">if</span> (!interaction.channel || !(<span class="hljs-string">"threads"</span> <span class="hljs-keyword">in</span> interaction.channel)) {
          <span class="hljs-keyword">await</span> interaction.editReply({
            content: <span class="hljs-string">"Cannot create a thread in this channel"</span>,
          });
          <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-keyword">try</span> {
          <span class="hljs-keyword">const</span> thread = <span class="hljs-keyword">await</span> interaction.channel.threads.create({
            name: <span class="hljs-string">`Action: <span class="hljs-subst">${userPrompt.substring(<span class="hljs-number">0</span>, <span class="hljs-number">50</span>)}</span>...`</span>,
            autoArchiveDuration: <span class="hljs-number">60</span>,
          });

          <span class="hljs-keyword">await</span> thread.send(
            <span class="hljs-string">`<span class="hljs-subst">${userPing}</span>\n\n<span class="hljs-subst">${quotedPrompt}</span>\n\n<span class="hljs-subst">${action.content}</span>`</span>,
          );
          <span class="hljs-keyword">await</span> interaction.editReply({
            content: <span class="hljs-string">`I've created a thread for you: <span class="hljs-subst">${thread.url}</span>`</span>,
          });
        } <span class="hljs-keyword">catch</span> (threadError) {
          log(ERROR, <span class="hljs-string">"failed to create or reply in thread:"</span>, threadError);
          <span class="hljs-keyword">await</span> interaction.editReply({
            content: <span class="hljs-string">`<span class="hljs-subst">${userPing}</span>\n\n<span class="hljs-subst">${quotedPrompt}</span>\n\nI tried to create a thread but failed. Here is your response:\n\n<span class="hljs-subst">${action.content}</span>`</span>,
          });
        }
        <span class="hljs-keyword">break</span>;

      <span class="hljs-keyword">case</span> <span class="hljs-string">"CREATE_EMBED"</span>: {
        <span class="hljs-keyword">const</span> embed = <span class="hljs-keyword">new</span> EmbedBuilder()
          .setColor(<span class="hljs-number">0xffa500</span>)
          .setTitle(action.title)
          .setDescription(action.description)
          .setTimestamp()
          .setFooter({ text: <span class="hljs-string">"Support System"</span> });

        <span class="hljs-keyword">const</span> rolePing = action.roleToPing ? <span class="hljs-string">`&lt;@<span class="hljs-subst">${action.roleToPing}</span>&gt;`</span> : <span class="hljs-string">""</span>;

        <span class="hljs-keyword">await</span> interaction.editReply({
          content: <span class="hljs-string">`<span class="hljs-subst">${userPing}</span> <span class="hljs-subst">${rolePing}</span>`</span>,
          embeds: [embed],
        });
        <span class="hljs-keyword">break</span>;
      }
    }
  } <span class="hljs-keyword">catch</span> (error) {
    log(ERROR, <span class="hljs-string">"generating AI response or processing graph:"</span>, error);
    <span class="hljs-keyword">const</span> errorMessage =
      <span class="hljs-string">"sorry, I encountered an error while processing your request."</span>;
    <span class="hljs-keyword">if</span> (interaction.replied || interaction.deferred) {
      <span class="hljs-keyword">await</span> interaction.followUp({ content: errorMessage, ephemeral: <span class="hljs-literal">true</span> });
    } <span class="hljs-keyword">else</span> {
      <span class="hljs-keyword">await</span> interaction.reply({ content: errorMessage, ephemeral: <span class="hljs-literal">true</span> });
    }
  }
});

<span class="hljs-keyword">const</span> token = process.env.DISCORD_BOT_TOKEN!;
client.login(token);
</code></pre>
<p>At the core of our bot is the <code>Client</code> object from <code>discord.js</code>. This represents your bot and handles everything from connecting to Discord’s API to listening for events like user messages or interactions.</p>
<p>What’s with those intents? Discord uses intents to let bots declare which events and data they want to receive. In this case:</p>
<ul>
<li><p><code>Guilds</code> delivers server (guild) events, such as channels and threads being created or updated</p>
</li>
<li><p><code>GuildMessages</code> delivers events for messages sent in servers</p>
</li>
<li><p><code>MessageContent</code> gives access to the actual content of those messages. It’s a privileged intent, so you also need to enable it for your bot in the Discord Developer Portal</p>
</li>
</ul>
<p>These are quite standard, and there are many more based on different use cases. You can always check them all out <a target="_blank" href="https://discordjs.guide/popular-topics/intents.html#privileged-intents">here</a>.</p>
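<p>Under the hood, each intent is a single bit, and the list you pass to <code>Client</code> is combined into one number. The sketch below shows that idea with the three intents used here. The bit positions match Discord's documented values, but the <code>Intents</code> object, <code>combineIntents</code>, and <code>hasIntent</code> are hypothetical helpers written for this example, not part of <code>discord.js</code>.</p>

```typescript
// Bit positions as documented for Discord's gateway intents; discord.js
// exposes the same numbers through GatewayIntentBits.
const Intents = {
  Guilds: 1 << 0,
  GuildMessages: 1 << 9,
  MessageContent: 1 << 15, // privileged: must also be enabled in the Developer Portal
} as const;

// Combine a list of intents into the single number the bot sends to Discord.
function combineIntents(intents: readonly number[]): number {
  return intents.reduce((bits, intent) => bits | intent, 0);
}

// Check whether a combined bitfield includes a given intent.
function hasIntent(bitfield: number, intent: number): boolean {
  return (bitfield & intent) === intent;
}

const bitfield = combineIntents([
  Intents.Guilds,
  Intents.GuildMessages,
  Intents.MessageContent,
]);

console.log(bitfield); // 33281 (1 + 512 + 32768)
console.log(hasIntent(bitfield, Intents.MessageContent)); // true
```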
<p>You also keep a <code>Map</code> to store per-channel message history so the bot can respond with context across multiple channels:</p>
<pre><code class="lang-ts"><span class="hljs-keyword">const</span> channelHistories = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>&lt;<span class="hljs-built_in">string</span>, ChatMessage[]&gt;();
</code></pre>
<p>Discord.js exposes a number of events that you can listen to. Every time someone uses a slash command, the client emits <code>Events.InteractionCreate</code>, which is the event you’re listening for.</p>
<p>With every <code>/ask</code> command, you take the user's prompt and the channel's previous messages. If <code>channelHistories</code> has no entry for that <code>channelId</code>, meaning the bot is being used in that channel for the first time, you initialize it with an empty array. You then pass the prompt and a copy of the history into the graph state:</p>
<pre><code class="lang-ts"><span class="hljs-keyword">const</span> finalState = <span class="hljs-keyword">await</span> graph.invoke({
  message: currentUserMessage,
  previousMessages: [...messageHistory],
});
</code></pre>
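<p>If you want to keep the event handler short, the per-channel bookkeeping could be pulled into a small helper. This is only a sketch: <code>recordMessage</code> and <code>HISTORY_LIMIT</code> are hypothetical names, and the <code>ChatMessage</code> type is reduced to the two fields used in <code>index.ts</code>.</p>

```typescript
type ChatMessage = { author: string; content: string };

const HISTORY_LIMIT = 20; // the cap used in index.ts

// Hypothetical helper: returns a snapshot of the earlier messages for the graph,
// then records the new message and trims the oldest entries.
function recordMessage(
  histories: Map<string, ChatMessage[]>,
  channelId: string,
  message: ChatMessage,
  limit: number = HISTORY_LIMIT,
): ChatMessage[] {
  let history = histories.get(channelId);
  if (!history) {
    // First use in this channel: start with an empty history.
    history = [];
    histories.set(channelId, history);
  }

  const previousMessages = [...history]; // copy, so later pushes don't leak in
  history.push(message);
  while (history.length > limit) history.shift();
  return previousMessages;
}

const histories = new Map<string, ChatMessage[]>();
const first = recordMessage(histories, "general", { author: "ada", content: "hi" });
const second = recordMessage(histories, "general", { author: "ada", content: "again" });
console.log(first.length, second.length); // 0 1
```

<p>Returning a copy matters here. The graph receives only the messages that came before the current prompt, even though the stored history already includes it.</p>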
<p>Depending on the <code>finalAction.type</code> that the graph returns, you either:</p>
<ul>
<li><p>reply directly,</p>
</li>
<li><p>create a thread and respond there,</p>
</li>
<li><p>or send an embed (for support-type replies).</p>
</li>
</ul>
<p>If a thread can’t be created, you fall back to replying in the main channel. Message history is capped at 20 messages per channel to keep things lightweight.</p>
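<p>One way to keep that branching safe as the bot grows is to model the action as a discriminated union and let TypeScript check that every case is handled. The sketch below assumes a simplified <code>FinalAction</code> type inferred from the switch in <code>index.ts</code>. The real type lives in <code>types/types.ts</code> and may differ, and <code>describeAction</code> is a hypothetical helper that only returns a description instead of calling Discord.</p>

```typescript
// Simplified action union inferred from the switch in index.ts.
// The real definition in types/types.ts may carry more fields.
type FinalAction =
  | { type: "REPLY"; content: string }
  | { type: "REPLY_IN_THREAD"; content: string }
  | { type: "CREATE_EMBED"; title: string; description: string; roleToPing?: string };

// Hypothetical helper that describes what the bot would do for each action.
function describeAction(action: FinalAction): string {
  switch (action.type) {
    case "REPLY":
      return `edit reply: ${action.content}`;
    case "REPLY_IN_THREAD":
      return `create thread, then send: ${action.content}`;
    case "CREATE_EMBED":
      return `send embed: ${action.title}`;
    default: {
      // If a new action type is added to the union, this assignment stops compiling.
      const unreachable: never = action;
      throw new Error(`Unhandled action: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(describeAction({ type: "REPLY", content: "Hello!" }));
```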
<p>Note that we’re not really using <code>previousMessages</code> much at the moment in the application, but I’ve prepared everything you need to handle querying previous conversations. You could easily create a new LangGraph node that queries or reasons over history if the bot needs to reference past conversations. (Take this as your challenge!)</p>
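<p>As a possible starting point for that challenge, a new node could turn <code>previousMessages</code> into a short transcript and prepend it to its prompt. The <code>formatHistory</code> helper below is hypothetical and assumes each message has only an <code>author</code> and a <code>content</code> field, as in <code>index.ts</code>.</p>

```typescript
type ChatMessage = { author: string; content: string };

// Hypothetical helper: turns the most recent messages into a transcript
// that a LangGraph node could include in its prompt.
function formatHistory(previousMessages: ChatMessage[], maxMessages = 10): string {
  if (previousMessages.length === 0) return "(no earlier messages)";
  return previousMessages
    .slice(-maxMessages) // keep only the newest messages
    .map((msg) => `${msg.author}: ${msg.content}`)
    .join("\n");
}

const transcript = formatHistory([
  { author: "ada", content: "Add a row to my sheet" },
  { author: "ada", content: "What did I just ask?" },
]);
console.log(transcript);
```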
<p>This project should give you a basic idea of how you can use LangGraph + Composio to build a somewhat useful bot that can already handle decent stuff. There’s a lot more you could improve. I’ll leave that up to you. ✌️</p>
<p>Here’s a quick demo of what we’ve built so far:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/aeQKN0nMGRg" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<hr>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>By now you should have a good idea of how LangGraph works and also how to power the bot with integrations using Composio.</p>
<p>This is just a fraction of what you can do. Try adding more features and more integration support to the bot to fit your workflow. This can come in really handy.</p>
<p>If you got lost somewhere while coding along, you can find the source code <a target="_blank" href="https://github.com/shricodev/discord-bot-langgraph-composio">here</a>.</p>
<p>So, that is it for this article. Thank you so much for reading! See you next time. 🫡</p>
<p>Love building cool stuff like this? I build projects like this every few weeks. Feel free to reach out to me here:</p>
<ul>
<li><p>GitHub: <a target="_blank" href="http://github.com/shricodev">github.com/shricodev</a></p>
</li>
<li><p>Portfolio: <a target="_blank" href="http://techwithshrijal.com">techwithshrijal.com</a></p>
</li>
<li><p>LinkedIn: <a target="_blank" href="http://linkedin.com/in/iamshrijal">linkedin.com/in/iamshrijal</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn LangGraph and Build Conversational AI with Python ]]>
                </title>
                <description>
                    <![CDATA[ If you're building conversational AI and tired of messy logic or hard-to-scale workflows, LangGraph makes it easier. It uses graphs to manage dialogue flow, so your bots stay organized even as they get more complex. Great for anything from support ag... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-langgraph-and-build-conversational-ai-with-python/</link>
                <guid isPermaLink="false">682ca2c574938e5428551845</guid>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Tue, 20 May 2025 15:41:57 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747755679385/a984244e-b0a8-4431-8090-75db806e9616.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you're building conversational AI and tired of messy logic or hard-to-scale workflows, LangGraph makes it easier. It uses graphs to manage dialogue flow, so your bots stay organized even as they get more complex. Great for anything from support agents to multi-step AI tools.</p>
<p>We just published a full video course on the freeCodeCamp.org YouTube channel about LangGraph. It is an open-source Python framework from the LangChain team you can use to build advanced conversational AI workflows. If you're looking to build chatbots or AI agents that can handle more than just basic Q&amp;A, this course is for you.</p>
<p>Your instructor, Vaibhav Mehra, walks you through everything step by step. You’ll learn how to use a graph-based structure to manage complex dialogue systems. This approach makes it easier to build scalable, flexible conversational applications powered by large language models.</p>
<p>Here’s a quick breakdown of what’s covered:</p>
<ul>
<li><p>The Basics: Get started with LangGraph, type annotations, and core elements.</p>
</li>
<li><p>Agents: Build five different agents from scratch. Each one includes a walkthrough and coding exercises so you can follow along and try it yourself.</p>
</li>
<li><p>AI Agents: Go deeper with AI-powered agents. These modules cover key design patterns and show you how to build more intelligent, adaptive systems.</p>
</li>
<li><p>RAG (Retrieval-Augmented Generation): Learn how to integrate RAG into your LangGraph workflows for smarter, context-aware responses.</p>
</li>
<li><p>Wrap-up: Recap what you’ve learned and see how all the pieces come together.</p>
</li>
</ul>
<p>This is a hands-on course, so expect to write a lot of code and build real working examples. Whether you're new to LangGraph or just want to sharpen your skills, this course will help you go from zero to production-ready applications.</p>
<p>Watch it now on the <a target="_blank" href="https://youtu.be/jGg_1h0qzaM">freeCodeCamp.org YouTube channel</a> (3-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/jGg_1h0qzaM" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
