<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ langchain - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ langchain - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Wed, 01 Jul 2026 10:25:29 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/langchain/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ From LLMs to LangChain: Understanding How Modern AI Applications Actually Work ]]>
                </title>
                <description>
                    <![CDATA[ Typically, when we start experimenting with AI, many of us begin similarly. We try a single LLM call as the core of an app, like this: const response = await llm.chat("Explain Kubernetes"); For a lit ]]>
                </description>
                <link>https://www.freecodecamp.org/news/from-llms-to-langchain-understanding-how-modern-ai-applications-actually-work/</link>
                <guid isPermaLink="false">6a3aab13b5ad15098db82372</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Open Source ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sudheesh Shetty ]]>
                </dc:creator>
                <pubDate>Tue, 23 Jun 2026 15:49:39 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/38787e16-7e86-44da-9a6a-620cc1a99fce.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Typically, when we start experimenting with AI, many of us begin similarly. We try a single LLM call as the core of an app, like this:</p>
<pre><code class="language-javascript">const response = await llm.chat("Explain Kubernetes");
</code></pre>
<p>For a little while it feels like the whole flow is: the user asks something, and the model returns an answer. That early success often creates a false impression that building AI is just about sending prompts and getting responses.</p>
<p>That simplicity is seductive, but it doesn't hold up. Over time, users want the assistant to find answers in their documents and knowledge bases, call APIs, fetch live data, trigger services, or schedule meetings.</p>
<p>Users also expect the agent to access internal systems and interact with ERPs, CRMs, or other tools holding critical business data. They'll want agents to combine multiple steps, as workflows often require chaining queries, computations, and side effects into reliable processes.</p>
<p>This is where concepts like MCP (the Model Context Protocol) and tools like LangChain come in. Initially, they may seem like buzzwords, but they address different aspects of running LLMs in production.</p>
<p>After experimenting with AI tools, I found that these concepts help solve different problems related to interfaces, orchestration, and system integration.</p>
<p>This article is a practical guide to understanding how LLMs connect with tools, orchestrate workflows, and power real AI applications.</p>
<h3 id="heading-heres-what-well-cover">Here’s what we’ll cover:</h3>
<ol>
<li><p><a href="#heading-what-is-an-llm">What Is an LLM?</a></p>
</li>
<li><p><a href="#heading-why-llms-need-tools">Why LLMs Need Tools</a></p>
</li>
<li><p><a href="#heading-where-mcp-comes-in">Where MCP Comes In</a></p>
</li>
<li><p><a href="#heading-so-what-does-langchain-actually-do">So What Does LangChain Actually Do?</a></p>
</li>
<li><p><a href="#heading-putting-it-together">Putting It Together</a></p>
</li>
<li><p><a href="#heading-what-i-built-while-learning-this">What I Built While Learning This</a></p>
</li>
</ol>
<p>Throughout the article we'll discuss what LLMs are and how they work, what tool-calling looks like in practice, what MCP is and how it works, how LangChain fits into the whole process, and how to put all these tools together.</p>
<p>To follow along, you'll need a basic understanding of Node.js, JavaScript, and HTTP APIs.</p>
<h2 id="heading-what-is-an-llm"><strong>What Is an LLM?</strong></h2>
<p>LLM stands for <strong>Large Language Model</strong>. It's a class of deep neural networks trained on massive amounts of text to model and generate human-like language. Popular examples you might have heard of include GPT, Claude, Gemini, and Llama.</p>
<h3 id="heading-how-to-call-an-llm-from-a-nodejs-application">How to Call an LLM From a Node.js Application</h3>
<p>Before writing code, let’s understand what it means to call an LLM from a Node.js application.</p>
<p>Calling an LLM means sending input from your application to an AI provider’s API and receiving generated output in return. It's similar to calling any other external service.</p>
<p>In most real-world applications, the model isn't hosted or trained by your application. Instead, providers such as OpenAI and Groq host and maintain the models, while your application communicates with them over HTTP APIs.</p>
<p>In this example, we’ll build a minimal API using Node.js and Express. We’ll create a simple <code>POST /chat</code> endpoint that accepts a user message, sends it to Groq's OpenAI-compatible API, receives the generated response, and returns it to the client.</p>
<p>Here, our Node.js server acts as the bridge between the user and the LLM provider.</p>
<p>For this example, create an API key from the <a href="https://console.groq.com/keys">Groq</a> console. Since it offers a free tier, it’s a simple way to experiment and understand the concepts.</p>
<p>First, install the dependencies:</p>
<pre><code class="language-plaintext">npm install express
</code></pre>
<pre><code class="language-javascript">import express from "express";

const app = express();
app.use(express.json());

app.post("/chat", async (req, res) =&gt; {
  const { message } = req.body;
  const response = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.GROQ_API_KEY}`, // API key read from the environment
    },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile",
      messages: [{ role: "user", content: message }],
    }),
  });

  const data = await response.json();

  if (!response.ok) {
    return res.status(response.status).json({ error: data });
  }

  const reply = data.choices[0].message.content;

  res.json({ reply });
});

const PORT = process.env.PORT || 8888;
app.listen(PORT, () =&gt; {
  console.log(`Server running on http://localhost:${PORT}`);
});
</code></pre>
<p>Start the server and send a POST request to <code>/chat</code> (for example, with Postman) using the body below:</p>
<pre><code class="language-plaintext">POST /chat

{
  "message": "Explain Kubernetes"
}
</code></pre>
<p>Example response:</p>
<pre><code class="language-plaintext">{
  "reply": "Kubernetes is a container orchestration platform..."
}
</code></pre>
<p>The backend receives the message, forwards it to the model provider, receives generated text, and returns it to the client.</p>
<p>LLMs are excellent at language-centric tasks: they understand phrasing and intent, generate coherent text, extract structured information from unstructured input, and perform basic reasoning over provided context. These capabilities make them powerful for things like summarization, drafting, and conversational QA.</p>
<p>But there’s an important limitation: LLMs don't automatically know about and can't access your private or live data. They don’t have implicit access to your company database, internal APIs, or the current state of your systems unless you provide that information at runtime.</p>
<p>Because of that limitation, you need secure mechanisms to connect models to live systems and data — which brings us to the idea of tools.</p>
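<p>Before moving on to tools, here is a minimal sketch of what "providing information at runtime" can look like in its simplest form: placing data in the prompt. The helper name <code>buildMessagesWithContext</code> and the sample order snippets are assumptions made up for illustration, not part of any provider's API.</p>

```javascript
// Hypothetical helper: builds the `messages` array for a chat completion request,
// placing runtime context (for example, rows fetched from your own database)
// in a system message so the model can answer from it.
function buildMessagesWithContext(question, contextSnippets) {
  const context = contextSnippets
    .map((snippet, index) => `[${index + 1}] ${snippet}`)
    .join("\n");

  return [
    {
      role: "system",
      content:
        "Answer using only the context below. If the answer is not there, say so.\n\n" +
        context,
    },
    { role: "user", content: question },
  ];
}

// Usage: the snippets are made-up sample data
const messages = buildMessagesWithContext("Where is order 123?", [
  "Order 123: shipped on Monday, carrier DHL",
  "Order 456: pending payment",
]);
console.log(messages[0].content);
```

<p>This works for small, read-only data, but the application has to decide up front what to include. It also can't take actions, which is the gap tools fill.</p>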
<h2 id="heading-why-llms-need-tools"><strong>Why LLMs Need Tools</strong></h2>
<p>Imagine asking:</p>
<blockquote>
<p>Check my order and raise support if delivery is delayed.</p>
</blockquote>
<p>The model alone can't inspect your order database or create a support ticket in your system. To do that, it must call external functions — for example, a <code>getOrderStatus(orderId)</code> API and a <code>createSupportTicket(orderId, issue)</code> action.</p>
<p>Those callable functions are what we call tools: programmatic interfaces the AI can use to interact with external systems and take concrete actions on behalf of users.</p>
<p>For example, imagine we have a <code>getOrderStatus(id)</code> function that returns an order’s delivery status.</p>
<p>To expose this to the LLM, we define a tools array. Each tool includes:</p>
<ul>
<li><p>type – currently "function"</p>
</li>
<li><p>function name – the function identifier</p>
</li>
<li><p>function description – helps the LLM decide when to call the tool</p>
</li>
<li><p>function parameters – a JSON Schema describing the arguments the tool expects</p>
</li>
</ul>
<p>Here's an example:</p>
<pre><code class="language-typescript">function getOrderStatus(id) {
  const statuses = ["pending", "success", "cancelled"];
  const status = statuses[Math.floor(Math.random() * statuses.length)];
  return `Your order status is ${status}.`;
}

const tools = [
  {
    type: "function",
    function: {
      name: "getOrderStatus",
      description: "Get the status of an order by its ID",
      parameters: {
        type: "object",
        properties: {
          id: { type: "string", description: "The order ID" },
        },
        required: ["id"],
      },
    },
  },
];
</code></pre>
<p>The tool format above is the one Groq accepts. Different LLM providers may use different formats for defining tools, but the overall idea remains the same.</p>
<p>When making the API call, we pass both the user messages and the list of available tools.</p>
<pre><code class="language-typescript">body: JSON.stringify({
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: message }],
    tools,
}),
</code></pre>
<p>After the API call, the LLM decides whether a tool is needed. If a tool call is requested, our application executes the corresponding function and sends the result back to the model.</p>
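<p>For this example, we'll only handle the <code>getOrderStatus</code> tool. We can check whether the model requested a tool call like this:</p>
<pre><code class="language-typescript">const assistantMessage = data.choices[0].message;
const toolCall = assistantMessage.tool_calls?.[0];
// No tool requested: the model answered directly
if (!toolCall) return res.json({ reply: assistantMessage.content });
const { id } = JSON.parse(toolCall.function.arguments);
const toolResult = getOrderStatus(id);
</code></pre>
<p>Then we send a second request that includes the original message, the assistant's tool-call message, and the tool result:</p>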
<pre><code class="language-typescript">body: JSON.stringify({
    model: "llama-3.3-70b-versatile",
    messages: [
        { role: "user", content: message },
        assistantMessage,
        { role: "tool", tool_call_id: toolCall.id, content: toolResult },
    ],
    tools,
}),
</code></pre>
<p>Finally, parse the follow-up response (stored here as <code>followUpData</code>) and return the model's reply:</p>
<pre><code class="language-typescript">return res.json({ reply: followUpData.choices[0].message.content });
</code></pre>
<p>Here's a diagram of the flow:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6a1fa5fdc5c3ae375fb38ab2/22d6dc4d-ad5e-4fbb-84f6-71c367565282.png" alt="User -> LLM -> Tool Execution -> Tool Result -> Final Response" style="display:block;margin:0 auto" width="1774" height="887" loading="lazy">

<p>The LLM decides whether a tool is needed and generates the required inputs, while your application executes the function.</p>
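<p>The snippets above handle a single, known tool. As a rough sketch of how the same step might generalize, the code below looks up each requested tool in a registry and builds the <code>role: "tool"</code> messages for the follow-up request. The names <code>toolRegistry</code> and <code>runToolCalls</code>, the stand-in tool implementation, and the sample assistant message are all assumptions for illustration.</p>

```javascript
// Hypothetical registry: maps tool names (as declared in the `tools` array)
// to the local functions that implement them.
const toolRegistry = {
  getOrderStatus: async ({ id }) => `Order ${id} is pending.`, // stand-in implementation
};

// Runs every tool call in an assistant message and returns the
// `role: "tool"` messages to append before the follow-up request.
async function runToolCalls(assistantMessage, registry) {
  const toolCalls = assistantMessage.tool_calls ?? [];
  const results = [];

  for (const call of toolCalls) {
    const name = call.function.name;
    let content;
    try {
      if (!Object.hasOwn(registry, name)) throw new Error(`Unknown tool: ${name}`);
      // Arguments arrive as a JSON string generated by the model
      const args = JSON.parse(call.function.arguments);
      content = String(await registry[name](args));
    } catch (error) {
      // Report the failure to the model instead of crashing the request
      content = `Tool error: ${error.message}`;
    }
    results.push({ role: "tool", tool_call_id: call.id, content });
  }

  return results;
}

// Made-up assistant message: one known tool and one the registry doesn't have
const sampleAssistantMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    { id: "call_1", type: "function", function: { name: "getOrderStatus", arguments: '{"id":"123"}' } },
    { id: "call_2", type: "function", function: { name: "refundOrder", arguments: "{}" } },
  ],
};

runToolCalls(sampleAssistantMessage, toolRegistry).then((toolMessages) => {
  console.log(toolMessages);
});
```

<p>Returning errors as tool results is a design choice: it gives the model a chance to recover or explain the failure, rather than failing the whole request.</p>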
<h2 id="heading-where-mcp-comes-in"><strong>Where MCP Comes In</strong></h2>
<p>Tools are simple. You define functions and tell the AI what it can use.</p>
<p>For example, <code>getOrderStatus()</code> works well when all tools are built inside your application. But as applications grow, tools may come from many places, like Slack, GitHub, databases, internal systems, or third-party services. Each one may expose tools differently.</p>
<p>This is where <a href="https://www.freecodecamp.org/news/how-does-an-mcp-work-under-the-hood/">MCP (Model Context Protocol) helps</a>. Think of MCP as a common language that lets AI systems connect to external tools in a consistent way.</p>
<p>Tools define what the AI can do. MCP standardizes how the AI connects to and uses those tools.</p>
<p>Now let’s extend the previous /chat API example so the LLM can use tools exposed through MCP. There are multiple ways to do this:</p>
<ul>
<li><p>build and host your own MCP server and expose your application functions</p>
</li>
<li><p>connect to existing third-party MCP servers such as Slack</p>
</li>
</ul>
<p>For this tutorial, we'll keep things simple and use a remote MCP server approach because it's easier to understand.</p>
<pre><code class="language-plaintext">npm install express @modelcontextprotocol/sdk zod
</code></pre>
<p>Now let’s create our own MCP server and expose the same <code>getOrderStatus</code> function as an MCP tool:</p>
<pre><code class="language-typescript">import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { createMcpExpressApp } from "@modelcontextprotocol/sdk/server/express.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";

function getOrderStatus(id) {
  const statuses = ["pending", "success", "cancelled"];
  const status = statuses[Math.floor(Math.random() * statuses.length)];
  return `Your order status is ${status}.`;
}

function createOrderServer() {
  const server = new McpServer({ name: "order-server", version: "1.0.0" });

  server.registerTool(
    "getOrderStatus",
    {
      description: "Get the status of an order by its ID",
      inputSchema: { id: z.string() },
    },
    async ({ id }) =&gt; ({
      content: [{ type: "text", text: getOrderStatus(id) }],
    })
  );

  return server;
}

const app = createMcpExpressApp({ host: "0.0.0.0" });

app.post("/mcp", async (req, res) =&gt; {
  const server = createOrderServer();
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
  });

  res.on("close", () =&gt; {
    transport.close();
    server.close();
  });

  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

const PORT = process.env.PORT || 3001;
app.listen(PORT, "0.0.0.0", () =&gt; {
  console.log(`Order MCP server running on http://0.0.0.0:${PORT}/mcp`);
});
</code></pre>
<p>This is useful when you want to expose your own application functions through MCP. Typically, the MCP server runs separately and is accessed by MCP clients. Now any MCP client can connect to this server and discover the available tools automatically.</p>
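<p>To make "discover the available tools" a little more concrete: MCP messages are JSON-RPC 2.0 objects, and the protocol defines <code>tools/list</code> and <code>tools/call</code> methods. The helpers below are hypothetical and only build the request payloads. A real client library normally does this for you and also performs an initialization handshake first.</p>

```javascript
// Hypothetical helpers that build MCP request payloads (JSON-RPC 2.0).
// They don't send anything; they only show the message shapes.
function buildListToolsRequest(requestId) {
  return { jsonrpc: "2.0", id: requestId, method: "tools/list" };
}

function buildCallToolRequest(requestId, toolName, toolArguments) {
  return {
    jsonrpc: "2.0",
    id: requestId,
    method: "tools/call",
    params: { name: toolName, arguments: toolArguments },
  };
}

const listRequest = buildListToolsRequest(1);
const callRequest = buildCallToolRequest(2, "getOrderStatus", { id: "123" });
console.log(JSON.stringify(callRequest, null, 2));
```

<p>Because every MCP server accepts the same message shapes, a client written once can talk to the order server above or to a third-party server without custom glue code.</p>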
<p>The same idea applies to third-party MCP servers.</p>
<p>For example, if a Slack MCP server is available, we can connect to it instead of writing Slack integration code ourselves.</p>
<p>In that case, our application isn't directly calling Slack APIs. It connects to the Slack MCP server, which exposes Slack-related tools using the MCP standard.</p>
<p>So the difference is:</p>
<ul>
<li><p>For our own features, we can build our own MCP server</p>
</li>
<li><p>For external systems, we can use existing MCP servers when available</p>
</li>
</ul>
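<p>Now we can pass MCP servers in the LLM request. With this approach the provider connects to each server itself, so every <code>server_url</code> must be reachable from the provider's side, and support for the <code>mcp</code> tool type depends on the provider and endpoint you use:</p>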
<pre><code class="language-typescript">body: JSON.stringify({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: message }],
  tools: [
    {
      type: "mcp",
      server_label: "OrderServer",
      server_url: `http://0.0.0.0:${PORT}/mcp`,
      server_description: "Get the status of an order by its ID",
    },
    {
      type: "mcp",
      server_label: "Slack",
      server_url: "https://mcp.slack.com/mcp",
      server_description: "Send and read Slack messages",
      headers: {
        Authorization: `Bearer ${process.env.SLACK_BOT_TOKEN}`,
      },
    },
  ],
})
</code></pre>
<p>We can also use local MCP servers instead of remote URLs by connecting through transports such as <code>StdioClientTransport</code>. In that case, we connect locally, discover the available tools, and expose them to the LLM.</p>
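<p>For the "discover, then expose" step, one possible approach is to convert the tool descriptions an MCP server returns into the function-tool format shown earlier. The sketch below assumes the discovered tools carry <code>name</code>, <code>description</code>, and <code>inputSchema</code> fields, as MCP tool listings do. The helper name <code>mcpToolsToFunctionTools</code> and the sample listing are made up for illustration.</p>

```javascript
// Hypothetical helper: converts tools discovered from an MCP server
// into the `type: "function"` format used in the earlier `tools` array.
function mcpToolsToFunctionTools(mcpTools) {
  return mcpTools.map((tool) => ({
    type: "function",
    function: {
      name: tool.name,
      description: tool.description ?? "",
      parameters: tool.inputSchema, // already a JSON Schema object
    },
  }));
}

// Made-up sample of what a tool listing could contain
const discoveredTools = [
  {
    name: "getOrderStatus",
    description: "Get the status of an order by its ID",
    inputSchema: {
      type: "object",
      properties: { id: { type: "string" } },
      required: ["id"],
    },
  },
];

const functionTools = mcpToolsToFunctionTools(discoveredTools);
console.log(JSON.stringify(functionTools, null, 2));
```

<p>With this approach, the application stays in the loop: when the model requests one of these tools, your code forwards the call to the MCP server and passes the result back.</p>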
<p>Now if the user sends:</p>
<pre><code class="language-json">{
  "message": "What is status of order 123"
}
</code></pre>
<p>The LLM decides whether a tool is needed, MCP exposes and executes the tool, and the final response is returned to the user.</p>
<p>The flow becomes:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6a1fa5fdc5c3ae375fb38ab2/2db75d86-db9a-477e-b578-92221a490a2a.png" alt="User -> /chat api -> LLM -> MCP Tool -> Tool Result -> Tool Response" style="display:block;margin:0 auto" width="1774" height="887" loading="lazy">

<p>This standardization makes integrations far more reusable: instead of rewriting glue logic for each new connector, teams can register MCP-compliant tools and let the orchestrator and model handle discovery and invocation.</p>
<h2 id="heading-so-what-does-langchain-actually-do"><strong>So What Does LangChain Actually Do?</strong></h2>
<p>I initially thought LangChain was simply another wrapper around LLM APIs, but it is better understood as an orchestration framework for AI workflows. Tools let an LLM perform actions. MCP standardizes how tools are exposed. LangChain helps coordinate models, tools, and application logic to build multi-step workflows.</p>
<p>For example:</p>
<blockquote>
<p>User: Check my order and raise a support ticket if delivery is delayed.</p>
</blockquote>
<p>Now the system may need to:</p>
<ul>
<li><p>Check order status</p>
</li>
<li><p>Decide whether support is needed</p>
</li>
<li><p>Create a support ticket</p>
</li>
<li><p>Generate the final response</p>
</li>
</ul>
<p>Without orchestration, you would manually control each step. LangChain helps manage this flow.</p>
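<p>To show what controlling each step manually can involve, here is a rough sketch of a hand-written loop. It's not how LangChain is implemented. <code>callModel</code> is a hypothetical function that sends the messages to an LLM and resolves to the assistant message. A scripted fake stands in for it here so the example runs without an API key.</p>

```javascript
// `callModel` is a hypothetical function: it receives the message history and
// resolves to the assistant message. `tools` maps tool names to local functions.
async function runManualLoop(callModel, tools, userMessage, maxRounds = 5) {
  const messages = [{ role: "user", content: userMessage }];

  for (let round = 0; round < maxRounds; round += 1) {
    const assistantMessage = await callModel(messages);
    messages.push(assistantMessage);

    const toolCalls = assistantMessage.tool_calls ?? [];
    if (toolCalls.length === 0) return assistantMessage.content; // final answer

    for (const call of toolCalls) {
      const args = JSON.parse(call.function.arguments);
      const result = await tools[call.function.name](args);
      messages.push({ role: "tool", tool_call_id: call.id, content: String(result) });
    }
  }

  throw new Error(`No final answer after ${maxRounds} rounds`);
}

// Scripted stand-in for a real model: asks for the status, then a ticket, then answers.
const scriptedReplies = [
  { role: "assistant", content: null, tool_calls: [{ id: "c1", type: "function", function: { name: "getOrderStatus", arguments: '{"id":"123"}' } }] },
  { role: "assistant", content: null, tool_calls: [{ id: "c2", type: "function", function: { name: "createSupportTicket", arguments: '{"id":"123"}' } }] },
  { role: "assistant", content: "Order 123 is delayed, so I opened a support ticket." },
];
let turn = 0;
const fakeModel = async () => scriptedReplies[turn++];

// Made-up tool implementations
const demoTools = {
  getOrderStatus: ({ id }) => `Order ${id} is delayed.`,
  createSupportTicket: ({ id }) => `Ticket created for order ${id}.`,
};

const finalAnswer = runManualLoop(fakeModel, demoTools, "Check order 123 and raise support if it is delayed");
finalAnswer.then((answer) => console.log(answer));
```

<p>Even this small loop has to track message history, cap the number of rounds, and route tool calls. That bookkeeping is what an orchestration framework takes over.</p>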
<p>To use LangChain, install the required packages:</p>
<pre><code class="language-plaintext">npm install express langchain @langchain/groq zod
</code></pre>
<p>We'll reuse the same tool functions from earlier:</p>
<pre><code class="language-typescript">import express from "express";
import { createAgent, tool } from "langchain";
import { ChatGroq } from "@langchain/groq";
import { z } from "zod";

const app = express();
app.use(express.json());

// Wrap each plain function as a LangChain tool: name, description, and input schema
const orderStatusTool = tool(
  ({ id }) =&gt; getOrderStatus(id), // we have this function above
  {
    name: "getOrderStatus",
    description: "Get the status of an order by its ID",
    schema: z.object({ id: z.string().describe("The order ID") }),
  }
);

const supportTicketTool = tool(
  ({ id }) =&gt; createSupportTicket(id), // imagine a function that creates a support ticket
  {
    name: "createSupportTicket",
    description: "Create a support ticket for an order",
    schema: z.object({ id: z.string().describe("The order ID") }),
  }
);

const agent = createAgent({
  model: new ChatGroq({
    model: "llama-3.3-70b-versatile",
    apiKey: process.env.GROQ_API_KEY,
  }),
  tools: [orderStatusTool, supportTicketTool],
});

app.post(
  "/chat",
  async (req, res) =&gt; {
    const { message } = req.body;

    const response =
      await agent.invoke({
        messages: [
          {
            role: "user",
            content: message,
          },
        ],
      });

    res.json({
      reply:
        response.messages
          ?.at(-1)
          ?.text,
    });
  }
);

app.listen(3000);
</code></pre>
<p>Now the flow becomes:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6a1fa5fdc5c3ae375fb38ab2/bd2a266c-39eb-4f3e-9909-ad81360bccb7.png" alt="Horizontal architecture diagram showing User → /chat API → LangChain Agent → OpenAI → Tool → Tool Result → Final Response." style="display:block;margin:0 auto" width="1930" height="815" loading="lazy">

<p>LangChain doesn't replace tools or MCP. It sits above them and coordinates how everything works together.</p>
<h2 id="heading-putting-it-together"><strong>Putting It Together</strong></h2>
<p>A modern AI application usually has multiple layers working together. The LLM handles reasoning and language generation. Tools perform real operations such as reading data, calling APIs, or executing actions. MCP helps standardize how those tools are exposed and accessed. LangChain helps orchestrate the interaction between models, tools, and workflows.</p>
<p>By separating these responsibilities, applications become easier to extend, maintain, and scale.</p>
<p>The goal is more than just generating text. You want to be able to build systems that can reason, retrieve information, take actions, and reliably solve real user problems.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6a1fa5fdc5c3ae375fb38ab2/bfc88660-3145-4b89-a626-158c4ec52bcc.png" alt="User ->LLM -> LangChain -> MCP -> Tools -> Systems &amp; Data" style="display:block;margin:0 auto" width="1536" height="1024" loading="lazy">

<h2 id="heading-what-i-built-while-learning-this"><strong>What I Built While Learning This</strong></h2>
<p>After understanding the concepts above, I wanted to reduce some of this setup for my own projects. As I experimented, I noticed most applications recreate the same plumbing over and over: connecting an LLM, wiring up tools, managing execution, and exposing orchestration patterns.</p>
<p>So I built a small open-source toolkit to reduce that setup. The goal was simple: you should be able to focus on business logic instead of wiring AI infrastructure.</p>
<p>Current capabilities:</p>
<ul>
<li><p>LLM integration</p>
</li>
<li><p>Tool registration</p>
</li>
<li><p>Tool execution</p>
</li>
<li><p>Chat orchestration</p>
</li>
<li><p>LangChain support</p>
</li>
<li><p>Extensible architecture</p>
</li>
</ul>
<h3 id="heading-packages">Packages:</h3>
<p>AI Chat Widget: <a href="https://www.npmjs.com/package/ai-chat-toolkit-widget">https://www.npmjs.com/package/ai-chat-toolkit-widget</a></p>
<p>AI Chat Server: <a href="https://www.npmjs.com/package/ai-chat-toolkit-server">https://www.npmjs.com/package/ai-chat-toolkit-server</a></p>
<p>GitHub Repository: <a href="https://github.com/sudheeshshetty/ai-chat-toolkit">https://github.com/sudheeshshetty/ai-chat-toolkit</a></p>
<p>To build a server using the toolkit:</p>
<pre><code class="language-plaintext">npm install express ai-chat-toolkit-server
</code></pre>
<p>Create the chat server:</p>
<pre><code class="language-typescript">const aiChat = new AiChatServer({
  path: "/my-chat",
  provider: "groq",
  apiKey: process.env.API_KEY,
  model: process.env.MODEL || "llama-3.3-70b-versatile",
  cors: {
    origin: "http://localhost:5174",
  },
  orchestration: "langchain",
  maxToolRounds: 6,
  systemPrompt:
    "You are a helpful operations assistant for a demo store. Keep answers concise.",
});
</code></pre>
<p>Add your tools:</p>
<pre><code class="language-typescript">aiChat.addTools([
  {
    name: "...",
    description: "...",
    inputSchema: { ... },
    handler: async (input) =&gt; { /* runs in Node */ },
  },
]);
</code></pre>
<p>Attach it to your Express app:</p>
<pre><code class="language-typescript">aiChat.attach(app);
</code></pre>
<p>Now <code>/my-chat</code> is exposed in your Express server and can be used directly.</p>
<p>You can also use <code>ai-chat-toolkit-widget</code> if you want to skip building the chat UI.</p>
<p>Examples are available in the repository, so you can try it out quickly.</p>
<p>A quick look at one of the examples:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6a1fa5fdc5c3ae375fb38ab2/a9079710-be65-472b-881f-350daeeb0f3b.gif" alt="Animated demo of one of the toolkit examples" style="display:block;margin:0 auto" width="3456" height="2234" loading="lazy">

<p>If you find it useful, I’d appreciate a star, feedback, or contributions on GitHub as I continue improving the developer experience and exploring new ideas.<br>Thanks for reading — I hope this helped make LLMs, tools, MCP, and LangChain feel a little less magical and a lot more practical.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Mastra vs LangChain: Building an AI Agent Pipeline and Analyzing the Data ]]>
                </title>
                <description>
                    <![CDATA[ A week ago, I saw this tweet: I had just shipped SupportMesh, a multi-tenant AI support platform built on Mastra, so I had opinions from production. I liked the .dowhile() loop, the typed step schem ]]>
                </description>
                <link>https://www.freecodecamp.org/news/mastra-vs-langchain-building-an-ai-agent-pipeline-and-analyzing-the-data/</link>
                <guid isPermaLink="false">6a2d04106a8db5c6ef6facf4</guid>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MastraAI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mastra ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TypeScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Next.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Convex ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #anthropic ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm evaluation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tavily ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agent-benchmarking ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Shola Jegede ]]>
                </dc:creator>
                <pubDate>Sat, 13 Jun 2026 07:17:36 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/0e1e81b3-6e39-4532-a12b-e99f600e372f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A week ago, I saw this tweet:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62cab1b3e62bf98e0fb0a38f/fae48919-95f1-4089-969a-98da75006424.png" alt="tweet image: @omaroubari_ asking &quot;has anyone tried mastra and langchain for agent orchestration? which is better?&quot;" style="display:block;margin:0 auto" width="1190" height="536" loading="lazy">

<p>I had just shipped SupportMesh, a multi-tenant AI support platform built on Mastra, so I had opinions from production.</p>
<p>I liked the <code>.dowhile()</code> loop, the typed step schemas, and the way <code>createWorkflow</code> kept orchestration logic in one place. What I didn't like was the token overhead: every agent step initialises Mastra's tool loop manager regardless of whether tools are needed, and across a four-step pipeline that adds up to seconds of extra latency and thousands of extra tokens per run.</p>
<p>At the same time, I was looking at LangChain for a separate project I was starting. The approach is completely different from Mastra. Instead of a workflow with typed step contracts, you build a directed graph where nodes are plain async functions and state is a single shared object.</p>
<p>The promise is leaner execution and more explicit control over exactly what goes into each model call, which, given the token overhead I had been seeing with Mastra, was exactly the kind of thing I wanted to understand properly.</p>
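<p>To picture the graph idea in plain JavaScript, here is a toy sketch with nodes as async functions, a shared state object, and edges naming the next node. It uses no framework API and isn't the pipeline from this article. The node names, state fields, and <code>runGraph</code> helper are invented for illustration.</p>

```javascript
// Each node is a plain async function: it reads the shared state
// and returns a partial update to merge back in.
const nodes = {
  search: async (state) => ({ sources: [`result for ${state.topic}`] }),
  write: async (state) => ({ draft: `Draft based on ${state.sources.length} source(s)` }),
};

// Edges name the next node to run; `null` ends the run.
const edges = { search: "write", write: null };

async function runGraph(startNode, initialState) {
  let state = { ...initialState };
  let current = startNode;

  while (current !== null) {
    const update = await nodes[current](state);
    state = { ...state, ...update }; // merge the node's output into shared state
    current = edges[current];
  }

  return state;
}

const graphRun = runGraph("search", { topic: "agent costs" });
graphRun.then((finalState) => console.log(finalState));
```

<p>Because each node sees the whole state and decides what to pass to the model, what goes into each call is explicit in the node's own code.</p>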
<p>So rather than picking one based on documentation and vibes, I built the same pipeline in both and measured everything. The same five-step research and synthesis pipeline, twice, with every piece instrumented: tokens per step, latency per step, the exact prompt sent to Claude at each stage, the raw Tavily search results, and a production-grade evaluation system that actually produces varied scores rather than giving everything a 7.</p>
<p>Then I built a real-time web dashboard on Convex and Next.js so you can run any topic yourself and see every decision both frameworks make to get there.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62cab1b3e62bf98e0fb0a38f/950d7575-7048-42d9-9e00-9a59446c36dd.png" alt="Mastra vs LangChain dashboard showing both pipelines complete side by side, with Mastra scoring 9/10 in 25.2s using 9,846 tokens and LangChain scoring 8/10 in 19.8s using 7,875 tokens on the topic &quot;What is the real cost of running AI agents in production?&quot;" style="display:block;margin:0 auto" width="3456" height="2166" loading="lazy">
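<p>As a rough idea of what per-step instrumentation can look like, here is a minimal wrapper that records latency and token counts for one step. It's a sketch, not the code from this project. It assumes, hypothetically, that each step resolves to an object with <code>output</code> and <code>usage</code> fields. The sample step and its token numbers are made up.</p>

```javascript
const metrics = [];

// Wraps one pipeline step and records how long it took and how many tokens it reported.
// Assumption: the step resolves to { output, usage: { inputTokens, outputTokens } }.
async function instrumentStep(name, stepFn) {
  const startedAt = Date.now();
  const result = await stepFn();
  const latencyMs = Date.now() - startedAt;

  metrics.push({
    step: name,
    latencyMs,
    tokens: result.usage.inputTokens + result.usage.outputTokens,
  });

  return result.output;
}

// Made-up step standing in for a real model call
const fakeSummarizeStep = async () => ({
  output: "summary text",
  usage: { inputTokens: 120, outputTokens: 30 },
});

const instrumentedRun = instrumentStep("summarize", fakeSummarizeStep);
instrumentedRun.then((output) => console.log(output, metrics));
```

<p>Wrapping every step the same way in both pipelines keeps the comparison fair, since the measurement code is identical on each side.</p>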

<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-the-tools-were-using">The Tools We're Using</a></p>
</li>
<li><p><a href="#heading-why-this-pipeline">Why This Pipeline</a></p>
</li>
<li><p><a href="#heading-the-project-structure">The Project Structure</a></p>
</li>
<li><p><a href="#heading-building-the-mastra-pipeline">Building the Mastra Pipeline</a></p>
<ul>
<li><p><a href="#heading-the-search-tool">The Search Tool</a></p>
</li>
<li><p><a href="#heading-the-agents">The Agents</a></p>
</li>
<li><p><a href="#heading-the-writecriticstep-why-write-and-critic-live-in-the-same-step">The writeCriticStep: Why Write and Critic Live in the Same Step</a></p>
</li>
<li><p><a href="#heading-token-capture">Token Capture</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-building-the-langchain-pipeline">Building the LangChain Pipeline</a></p>
<ul>
<li><p><a href="#heading-the-state-annotation">The State Annotation</a></p>
</li>
<li><p><a href="#heading-the-factory-pattern">The Factory Pattern</a></p>
</li>
<li><p><a href="#heading-the-graph-and-the-node-naming-collision">The Graph and the Node Naming Collision</a></p>
</li>
<li><p><a href="#heading-the-retry-wrapper">The Retry Wrapper</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-the-critic-that-gave-everything-a-7-out-of-10">The Critic That Gave Everything a 7 out of 10</a></p>
<ul>
<li><p><a href="#heading-what-production-grade-evaluation-actually-looks-like">What Production-Grade Evaluation Actually Looks Like</a></p>
</li>
<li><p><a href="#heading-extracting-json-from-chain-of-thought-output">Extracting JSON from Chain-of-Thought Output</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-the-evaluation-bias-i-almost-shipped">The Evaluation Bias I Almost Shipped</a></p>
</li>
<li><p><a href="#heading-the-real-time-dashboard">The Real-Time Dashboard</a></p>
<ul>
<li><p><a href="#heading-the-convex-schema">The Convex Schema</a></p>
</li>
<li><p><a href="#heading-the-fire-and-forget-pattern">The Fire-and-Forget Pattern</a></p>
</li>
<li><p><a href="#heading-subscribing-to-live-updates">Subscribing to Live Updates</a></p>
</li>
<li><p><a href="#heading-deduplicating-steps-after-retries">Deduplicating Steps After Retries</a></p>
</li>
<li><p><a href="#heading-the-live-log-auto-scroll">The Live Log Auto-Scroll</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-what-the-data-actually-shows">What the Data Actually Shows</a></p>
</li>
<li><p><a href="#heading-try-it-yourself">Try it Yourself</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along and run this yourself, you'll need four things:</p>
<ul>
<li><p><strong>Node.js 22 or later</strong>: the pipeline packages use modern TypeScript features that require a recent Node version.</p>
</li>
<li><p><strong>An Anthropic API key</strong>: you can get one at <a href="http://console.anthropic.com">console.anthropic.com</a>. Claude Haiku 4.5 is cheap enough that running this benchmark a dozen times costs a few cents.</p>
</li>
<li><p><strong>A Tavily API key</strong>: you can get one at <a href="http://tavily.com">tavily.com</a>. The free tier gives you 1,000 searches a month, which is more than enough to run this benchmark repeatedly.</p>
</li>
<li><p><strong>A Convex account</strong>: you can sign up at <a href="http://convex.dev">convex.dev</a>. The free tier covers everything here.</p>
</li>
</ul>
<p>Once you have those, the setup section at the end of this article walks through exactly where each one goes.</p>
<h2 id="heading-the-tools-were-using">The Tools We're Using</h2>
<p>Before getting into the build, it helps to know what each tool I used is and why it's in the stack. If you're already familiar with all of these, you can skip ahead.</p>
<p><a href="https://mastra.ai">Mastra</a> is a TypeScript-first framework for building AI-powered applications and agents. The idea is that you define individual steps with typed input and output schemas, chain them into a workflow, and the framework handles the data flow between them. It has opinions about structure, which is either a feature or a constraint depending on what you're building.</p>
<p><a href="https://www.langchain.com"><strong>LangChain</strong></a> is one of the most widely used frameworks for building LLM applications. It started in Python and has a TypeScript version.</p>
<p>For agent orchestration specifically, the relevant piece is <strong>LangGraph</strong>, which is LangChain's graph-based execution layer. Instead of a workflow with typed step contracts, you build a directed graph: nodes are plain async functions, state is a single shared object that every node reads from and writes to, and the flow between nodes is controlled by edges.</p>
<p><a href="https://www.anthropic.com/claude/haiku"><strong>Claude Haiku 4.5</strong></a> is the model powering all agents. It is Anthropic's fastest and most cost-efficient model, which matters for a benchmark that makes several model calls per run and is meant to be run repeatedly.</p>
<p><a href="https://www.tavily.com"><strong>Tavily</strong></a> is a web search API built specifically for AI agents. Unlike a general search API, it returns structured results with relevance scores and content snippets that are ready to pass directly into a model prompt. The free tier is generous enough to run this benchmark without paying anything.</p>
<p>I used it here because it has a clean TypeScript SDK, it works in both Mastra tools and plain LangChain nodes without any adapter layer, and the search results are consistent enough that both pipelines are working with the same quality of input.</p>
<p><a href="https://www.convex.dev"><strong>Convex</strong></a> is a real-time database with a React hook, <code>useQuery</code>, that automatically re-renders your component whenever the underlying data changes. No polling, no WebSocket setup, and no manual state syncing. When both pipelines are writing step data as they execute, the run page just updates.</p>
<p><a href="https://nextjs.org"><strong>Next.js</strong></a> is the web framework for the dashboard. App Router, API routes for the pipeline execution, and server components where they make sense.</p>
<h2 id="heading-why-this-pipeline">Why This Pipeline</h2>
<p>A simple comparison wouldn't tell me anything useful, because the difference between frameworks only shows up when you actually push them.</p>
<p>The pipeline I landed on has five steps:</p>
<pre><code class="language-plaintext">Topic
  ↓
1. RESEARCH   (Tavily web search, 5 results with relevance scores)
2. ANALYSIS   (Extract 5 key findings, 3 themes, 1 central argument)
3. WRITE      (Draft a structured ~400-word report)
4. CRITIC     (Score the draft, provide dimension-level feedback)
5. LOOP       (Revise if score below 7, output if passes or 3 iterations used)
</code></pre>
<p>I chose each step because it stresses the frameworks differently.</p>
<p>The research step requires a real tool call, which is where Mastra's Agent abstraction does its heaviest work. The analysis step needs structured JSON output, which tests how each framework enforces output shape. The write step has strict content requirements enforced purely through prompt engineering. The critic needs to do chain-of-thought reasoning and produce structured JSON at the same time, which turns out to be harder than it sounds. And the revision loop tests perhaps the most fundamental difference between the two frameworks: how each one expresses conditional iteration.</p>
<p>Taken together, this covers most of what you would actually build with an agent framework in production: tool calls, structured output, multi-step orchestration, quality evaluation, and feedback loops.</p>
<h2 id="heading-the-project-structure">The Project Structure</h2>
<p>Everything lives in a single monorepo using npm workspaces, which means all packages share a single <code>node_modules</code> at the root and can import each other directly:</p>
<pre><code class="language-plaintext">mastra-vs-langchain/
├── packages/
│   ├── mastra-pipeline/          # Mastra implementation
│   ├── langchain-pipeline/       # LangChain/LangGraph implementation
│   ├── web/                      # Next.js 16 App Router dashboard
│   └── shared/                   # Shared TypeScript types
├── convex/                       # Real-time backend
└── package.json                  # Workspace root
</code></pre>
<p>The most important piece of the shared package is the <code>PipelineCallbacks</code> interface, which both pipeline implementations must satisfy. This is the contract that lets the dashboard receive live events from either framework: step starts, step completions, token counts, and Tavily results, all without knowing anything about Mastra or LangChain specifically:</p>
<pre><code class="language-typescript">// packages/shared/src/types.ts
export interface PipelineCallbacks {
  onPipelineStart: () =&gt; Promise&lt;string&gt;;
  onPipelineComplete: (id: string, data: PipelineCompleteData) =&gt; Promise&lt;void&gt;;
  onPipelineError: (id: string, error: string) =&gt; Promise&lt;void&gt;;
  step: {
    onStepStart: (stepName: string, iteration: number, input: string) =&gt; Promise&lt;string&gt;;
    onStepComplete: (stepId: string, data: StepCompleteData) =&gt; Promise&lt;void&gt;;
    onStepError: (stepId: string, error: string) =&gt; Promise&lt;void&gt;;
  };
}
</code></pre>
<p>Every Convex write, live log entry, and token count flows through this interface. Adding a new framework to the benchmark in the future means implementing this interface and plugging it into the API route, and nothing else needs to change.</p>
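<p>As a rough illustration of how small an implementation of this contract can be, here is a minimal in-memory sketch. The interface is copied from above. The shapes of <code>PipelineCompleteData</code> and <code>StepCompleteData</code> and the <code>createMemoryCallbacks</code> helper are assumptions made for the example, not the project's actual code, which writes to Convex:</p>
<pre><code class="language-typescript">// Assumed shapes: the article does not show these two types.
interface StepCompleteData { output: string; inputTokens?: number; outputTokens?: number; }
interface PipelineCompleteData { finalReport: string; totalTokens: number; }

interface PipelineCallbacks {
  onPipelineStart: () =&gt; Promise&lt;string&gt;;
  onPipelineComplete: (id: string, data: PipelineCompleteData) =&gt; Promise&lt;void&gt;;
  onPipelineError: (id: string, error: string) =&gt; Promise&lt;void&gt;;
  step: {
    onStepStart: (stepName: string, iteration: number, input: string) =&gt; Promise&lt;string&gt;;
    onStepComplete: (stepId: string, data: StepCompleteData) =&gt; Promise&lt;void&gt;;
    onStepError: (stepId: string, error: string) =&gt; Promise&lt;void&gt;;
  };
}

// Hypothetical helper: records every event in an array instead of a database.
function createMemoryCallbacks(log: string[]): PipelineCallbacks {
  let nextId = 0;
  const newId = (prefix: string) =&gt; `${prefix}-${++nextId}`;
  return {
    onPipelineStart: async () =&gt; {
      const id = newId("run");
      log.push(`start ${id}`);
      return id;
    },
    onPipelineComplete: async (id, data) =&gt; {
      log.push(`complete ${id} (${data.totalTokens} tokens)`);
    },
    onPipelineError: async (id, error) =&gt; {
      log.push(`error ${id}: ${error}`);
    },
    step: {
      onStepStart: async (stepName, iteration) =&gt; {
        const id = newId(stepName);
        log.push(`step ${id} iteration ${iteration}`);
        return id;
      },
      onStepComplete: async (stepId, data) =&gt; {
        log.push(`done ${stepId} (${data.output.length} chars)`);
      },
      onStepError: async (stepId, error) =&gt; {
        log.push(`failed ${stepId}: ${error}`);
      },
    },
  };
}
</code></pre>
<p>A stub like this is also handy for running either pipeline from a test or a script without the dashboard attached.</p>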
<h2 id="heading-building-the-mastra-pipeline">Building the Mastra Pipeline</h2>
<p>If you haven't used Mastra before, the core mental model is this: you define individual steps with typed input and output schemas, chain them together into a workflow, and Mastra manages the data flow between them.</p>
<p>The framework is opinionated about structure but that structure gives you type safety across the entire pipeline and makes the orchestration logic easy to read.</p>
<h3 id="heading-the-search-tool">The Search Tool</h3>
<p>Mastra tools are created with <code>createTool</code>, which takes a Zod input schema and an <code>execute</code> function that receives the validated input directly:</p>
<pre><code class="language-typescript">// packages/mastra-pipeline/src/tools/search.ts
import { createTool } from "@mastra/core/tools";
import { z } from "zod";
import { tavily } from "@tavily/core";

const client = tavily({ apiKey: process.env.TAVILY_API_KEY! });

export let lastTavilyCapture: { query: string; results: any[] } = {
  query: "",
  results: [],
};

export function resetTavilyCapture() {
  lastTavilyCapture = { query: "", results: [] };
}

export const searchTool = createTool({
  id: "web-search",
  description: "Search the web for information on a topic",
  inputSchema: z.object({ query: z.string() }),
  execute: async ({ query }) =&gt; {
    lastTavilyCapture = { query, results: [] };
    const results = await client.search(query, {
      maxResults: 5,
      searchDepth: "basic",
    });
    lastTavilyCapture.results = results.results;
    return { results: results.results };
  },
});
</code></pre>
<p>The <code>lastTavilyCapture</code> module-level variable is a deliberate workaround for a real constraint. Mastra's tool execution happens inside the agent's internal tool loop, which sits one layer below the workflow step.</p>
<p>I needed to capture the Tavily query and results for the dashboard so users can see the actual URLs and relevance scores for each run, but threading a callback through the agent's tool execution context would have required patching Mastra internals. Capturing at module scope and calling <code>resetTavilyCapture()</code> at the start of each research step is less elegant but completely reliable, and it prevents stale data from a previous run bleeding into the current one.</p>
<h3 id="heading-the-agents">The Agents</h3>
<p>Each step in the Mastra pipeline runs as a separate <code>Agent</code> instance. One thing worth knowing if you're just getting started with Mastra is that it requires an explicit <code>id</code> field alongside <code>name</code>. If you skip it, the TypeScript compiler reports a confusing error about missing required fields that doesn't point at the actual problem:</p>
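<pre><code class="language-typescript">// packages/mastra-pipeline/src/agents/researcher.ts
import { Agent } from "@mastra/core/agent";
import { anthropic } from "@ai-sdk/anthropic";
import { searchTool } from "../tools/search";
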
export const researcherAgent = new Agent({
  name: "Researcher",
  id: "researcher",           // required in v1.41 - easy to miss
  instructions: `You are a research agent. When given a topic, use the 
  web-search tool to find 5 relevant results. Return ALL the raw search 
  results including titles, URLs, and content snippets as a formatted string.`,
  model: anthropic("claude-haiku-4-5"),
  tools: { searchTool },
});
</code></pre>
<p>The writer agent carries all its content requirements directly in the instructions rather than in a separate validation layer. This keeps the constraints in one visible place, which matters when the critic is giving feedback about which specific requirements the draft violated:</p>
<pre><code class="language-typescript">// packages/mastra-pipeline/src/agents/writer.ts
export const writerAgent = new Agent({
  name: "Writer",
  id: "writer",
  instructions: `You are a research analyst writing for a technical audience.

STRICT REQUIREMENTS:
- Opening sentence must state a specific finding from the research.
  Never open with "X is increasingly important."
- Every paragraph makes exactly one argument. State it first.
  Support it with specific evidence.
- Name specific tools, frameworks, companies, numbers, and dates.
- Conclusion must make a specific recommendation or prediction.
  It must not restate the introduction.
- Target length: 350-450 words.

FORBIDDEN PHRASES:
"it is important to note", "it is worth noting",
"organizations must consider", "in conclusion", "in summary",
"as we look to the future", "rapidly evolving landscape",
any sentence equally true if you replaced the topic`,
  model: anthropic("claude-haiku-4-5"),
});
</code></pre>
<h3 id="heading-the-writecriticstep-why-write-and-critic-live-in-the-same-step">The writeCriticStep: Why Write and Critic Live in the Same Step</h3>
<p>While implementing Mastra, I made one architectural decision here that diverges from most tutorials, and it's worth understanding why.</p>
<p>Mastra's <code>.dowhile()</code> construct loops a single step until a condition is met. That's clean when you have one thing to repeat, but the revision loop needs two things: a write step followed by a critic step. You can either combine them into one step, or build a nested workflow where the inner workflow contains both steps.</p>
<p>A nested workflow adds a layer of complexity that doesn't buy you anything in this case, so the write and critic phases live together in <code>writeCriticStep</code>. The step runs the writer first, then immediately runs the critic on the draft, and returns a combined output that includes both the draft and the score:</p>
<pre><code class="language-typescript">const writeCriticStep = createStep({
  id: "write-critic",
  inputSchema: z.object({
    topic: z.string(),
    research: z.string(),
    analysis: z.string(),
    keyFindings: z.array(z.string()),
    mainThemes: z.array(z.string()),
    centralArgument: z.string(),
    draft: z.string().optional(),       // populated after first iteration
    score: z.number().optional(),       // populated after first iteration
    feedback: z.string().optional(),    // populated after first iteration
    iterations: z.number().optional(),
  }),
  outputSchema: z.object({
    // ... all input fields plus draft, score, feedback, iterations
  }),
  execute: async ({ inputData }) =&gt; {
    const iteration = (inputData.iterations ?? 0) + 1;

    // WRITE phase
    let writerPrompt = `Topic: "${inputData.topic}"\n\nResearch:\n${inputData.research}\n\nAnalysis:\n${inputData.analysis}`;
    if (inputData.feedback &amp;&amp; inputData.draft) {
      // On revisions, the writer sees its previous attempt and the specific feedback
      writerPrompt += `\n\nPrevious draft:\n${inputData.draft}\n\nFeedback:\n${inputData.feedback}`;
    }

    const writeStepId = await callbacks.step.onStepStart("write", iteration, writerPrompt.slice(0, 500));
    const writerResult = await writerAgent.generate(writerPrompt);
    const draft = writerResult.text;
    await callbacks.step.onStepComplete(writeStepId, { output: draft, /* token data */ });

    // CRITIC phase: runs immediately after write, on the same draft
    const criticPrompt = `RESEARCH:\n${inputData.research}\n\nANALYSIS:\n${inputData.analysis}\n\nDRAFT:\n${draft}`;
    const criticStepId = await callbacks.step.onStepStart("critic", iteration, draft.slice(0, 500));
    const criticResult = await criticAgent.generate(criticPrompt);
    const parsed = extractJson(criticResult.text);
    const score = parsed?.score ?? 4;
    const feedback = parsed?.feedback ?? "Score parsing failed";
    await callbacks.step.onStepComplete(criticStepId, { output: criticResult.text, criticScore: score });

    return { ...inputData, draft, score, feedback, iterations: iteration };
  },
});
</code></pre>
<p>The <code>.dowhile()</code> condition then checks whether to loop again. It receives the output of the previous <code>writeCriticStep</code> as <code>inputData</code>, so it can read the score directly:</p>
<pre><code class="language-typescript">const workflow = createWorkflow({
  id: `research-pipeline-${Date.now()}`,  // timestamp prevents conflicts on concurrent runs
  inputSchema: z.object({ topic: z.string() }),
})
  .then(researchStep)
  .then(analysisStep)
  .dowhile(
    writeCriticStep,
    async ({ inputData }) =&gt; inputData.score &lt; 7 &amp;&amp; inputData.iterations &lt; 3
  )
  .commit();
</code></pre>
<p>The <code>Date.now()</code> in the workflow ID is there because Mastra workflows with a static ID conflict when two runs start concurrently. Adding the timestamp gives each run a unique workflow instance.</p>
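<p>To make the loop semantics concrete, here is a plain TypeScript sketch of what a do-while over a single step amounts to. It doesn't use Mastra at all: <code>runDoWhile</code>, <code>LoopState</code>, and the scripted scores are hypothetical stand-ins for the workflow, the step output, and the critic:</p>
<pre><code class="language-typescript">interface LoopState { score: number; iterations: number; }

// Hypothetical stand-in for a do-while loop: the step always runs once,
// then repeats for as long as the condition returns true.
async function runDoWhile(
  step: (state: LoopState) =&gt; Promise&lt;LoopState&gt;,
  condition: (state: LoopState) =&gt; boolean,
  initial: LoopState
): Promise&lt;LoopState&gt; {
  let state = initial;
  do {
    state = await step(state);
  } while (condition(state));
  return state;
}

// Same predicate as the workflow above: revise while the score is low
// and fewer than three iterations have been used.
const shouldLoop = (s: LoopState) =&gt; s.score &lt; 7 &amp;&amp; s.iterations &lt; 3;

// Fake step that returns scripted critic scores instead of calling a model.
function scriptedStep(scores: number[]) {
  return async (s: LoopState): Promise&lt;LoopState&gt; =&gt; ({
    score: scores[s.iterations] ?? 0,
    iterations: s.iterations + 1,
  });
}
</code></pre>
<p>With scripted scores of <code>[5, 8]</code> the loop stops after two iterations with a score of 8. With <code>[4, 5, 6]</code> it stops after the third iteration with a score of 6, because the iteration cap ends the loop even though the score never reached 7.</p>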
<h3 id="heading-token-capture">Token Capture</h3>
<p>After any <code>agent.generate()</code> call, usage data lives on the result object. The shape changes between Mastra versions, so checking both possible field names is the safe approach:</p>
<pre><code class="language-typescript">const inputTokens =
  (result as any).usage?.promptTokens ??
  (result as any).usage?.inputTokens ??
  0;
const outputTokens =
  (result as any).usage?.completionTokens ??
  (result as any).usage?.outputTokens ??
  0;
</code></pre>
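<p>If you would rather not repeat the <code>as any</code> casts at every call site, one option is a small helper that narrows the result once. This is a sketch, not part of Mastra: <code>readUsage</code>, <code>pickNumber</code>, and <code>TokenUsage</code> are hypothetical names, and the field names are the ones checked above:</p>
<pre><code class="language-typescript">interface TokenUsage { inputTokens: number; outputTokens: number; }

// Returns the first finite number found under any of the given keys, else 0.
function pickNumber(source: Record&lt;string, unknown&gt;, keys: string[]): number {
  for (const key of keys) {
    const value = source[key];
    if (typeof value === "number" &amp;&amp; Number.isFinite(value)) return value;
  }
  return 0;
}

// Hypothetical helper: accepts any result object and reads whichever
// usage field names are present.
function readUsage(result: unknown): TokenUsage {
  const usage =
    typeof result === "object" &amp;&amp; result !== null
      ? (result as { usage?: unknown }).usage
      : undefined;
  if (typeof usage !== "object" || usage === null) {
    return { inputTokens: 0, outputTokens: 0 };
  }
  const fields = usage as Record&lt;string, unknown&gt;;
  return {
    inputTokens: pickNumber(fields, ["promptTokens", "inputTokens"]),
    outputTokens: pickNumber(fields, ["completionTokens", "outputTokens"]),
  };
}
</code></pre>
<p>Calling <code>readUsage(writerResult)</code> then gives the same numbers as the inline version, and a result without a <code>usage</code> field falls back to zeros instead of throwing.</p>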
<h2 id="heading-building-the-langchain-pipeline">Building the LangChain Pipeline</h2>
<p>LangChain/LangGraph solves the same problem with a fundamentally different mental model.</p>
<p>Where Mastra gives you a workflow with explicitly typed step contracts, LangGraph gives you a directed graph. Nodes are plain async functions, state is a single shared mutable object that flows through the graph, and the execution order is determined by edges rather than a chain of <code>.then()</code> calls.</p>
<h3 id="heading-the-state-annotation">The State Annotation</h3>
<p>Before writing any nodes, you define the shape of the shared state using <code>Annotation.Root</code>. Every node in the graph reads from and writes to this object:</p>
<pre><code class="language-typescript">// packages/langchain-pipeline/src/graph/state.ts
import { Annotation } from "@langchain/langgraph";

export const PipelineState = Annotation.Root({
  topic: Annotation&lt;string&gt;(),
  research: Annotation&lt;string&gt;(),
  analysis: Annotation&lt;string&gt;(),
  draft: Annotation&lt;string&gt;(),
  score: Annotation&lt;number&gt;(),
  feedback: Annotation&lt;string&gt;(),
  iterations: Annotation&lt;number&gt;(),
  finalReport: Annotation&lt;string&gt;(),
  criticDimensions: Annotation&lt;object&gt;(),
});
</code></pre>
<p>Coming from Mastra, the difference in how data flows is significant. In Mastra, each step declares what it receives and returns, and the framework enforces that contract at the TypeScript level.</p>
<p>In LangGraph, any node can read or write any field in the shared state. The structure comes from the graph topology rather than the type system, which means Mastra catches data flow bugs at compile time while LangGraph makes it easier to add new fields to the pipeline without touching every step's schema.</p>
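<p>One way to picture the shared state is as an object that each node patches with a partial update. The sketch below is a simplified model of that idea in plain TypeScript, not LangGraph's implementation. <code>applyUpdate</code>, the trimmed-down <code>State</code> type, and the two toy nodes are assumptions for illustration:</p>
<pre><code class="language-typescript">interface State { topic: string; research?: string; draft?: string; iterations?: number; }

// Simplified model: a node returns only the fields it changed,
// and those fields overwrite the previous values.
function applyUpdate(state: State, update: Partial&lt;State&gt;): State {
  return { ...state, ...update };
}

// Two hypothetical nodes. Nothing stops the second one from reading or
// overwriting a field that the first one produced.
const researchNode = (s: State): Partial&lt;State&gt; =&gt; ({
  research: `results for ${s.topic}`,
});
const writeNode = (s: State): Partial&lt;State&gt; =&gt; ({
  draft: `Report based on: ${s.research ?? "nothing"}`,
  iterations: (s.iterations ?? 0) + 1,
});

let state: State = { topic: "agent costs" };
state = applyUpdate(state, researchNode(state));
state = applyUpdate(state, writeNode(state));
</code></pre>
<p>The flexibility and the risk are both visible here: adding a field means touching one type, but the compiler won't tell you if <code>writeNode</code> runs before anything has set <code>research</code>.</p>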
<h3 id="heading-the-factory-pattern">The Factory Pattern</h3>
<p>LangGraph nodes are plain async functions, which is exactly what makes them lean: no framework overhead, no initialization, just your code calling the model.</p>
<p>The challenge is that I needed to thread callbacks and a shared token accumulator through all four nodes, and plain functions have no built-in mechanism for that.</p>
<p>The solution is a factory function that creates all four nodes as closures over the callbacks and the token accumulator:</p>
<pre><code class="language-typescript">// packages/langchain-pipeline/src/graph/nodes.ts
export function createNodes(
  callbacks: PipelineCallbacks,
  acc: { inputTokens: number; outputTokens: number }
) {
  const tavilyClient = tavily({ apiKey: process.env.TAVILY_API_KEY! });

  async function researchNode(state: PipelineStateType): Promise&lt;Partial&lt;PipelineStateType&gt;&gt; {
    const startedAt = Date.now();
    const stepId = await callbacks.step.onStepStart("research", 1, state.topic);
    const results = await tavilyClient.search(state.topic, { maxResults: 5, searchDepth: "basic" });
    const research = results.results
      .map((r, i) =&gt; `[${i + 1}] ${r.title}\nURL: ${r.url}\nContent: ${r.content}`)
      .join("\n\n");
    await callbacks.step.onStepComplete(stepId, {
      output: research,
      promptSent: state.topic,
      timeMs: Date.now() - startedAt,   // wall-clock duration of this step
      inputTokens: 0,      // research step uses Tavily, not an LLM
      outputTokens: 0,
      model: "tavily-search",
      tavilyQuery: state.topic,
      tavilyResults: JSON.stringify(results.results),
    });
    return { research };
  }

  // analysisNode, writeNode, criticNode follow the same pattern

  return { researchNode, analysisNode, writeNode, criticNode };
}
</code></pre>
<p>Notice the research node returns 0 tokens because it calls Tavily directly without any LLM involvement, which is one of the key differences that shows up in the benchmark data. Each subsequent node accumulates tokens directly into the shared <code>acc</code> object:</p>
<pre><code class="language-typescript">const inputTokens = response.usage_metadata?.input_tokens ?? 0;
const outputTokens = response.usage_metadata?.output_tokens ?? 0;
acc.inputTokens += inputTokens;
acc.outputTokens += outputTokens;
</code></pre>
<p>LangChain's <code>ChatAnthropic</code> puts usage on <code>response.usage_metadata</code>, which is cleanly typed and requires no casting.</p>
<h3 id="heading-the-graph-and-the-node-naming-collision">The Graph and the Node Naming Collision</h3>
<p>One thing LangGraph enforces that's easy to miss: node names can't conflict with state annotation keys. Naming a node <code>"research"</code> throws a runtime error because <code>state.research</code> already exists as a state channel, and the error message doesn't explain why. Renaming to <code>"researcher"</code> and <code>"analyzer"</code> fixes it:</p>
<pre><code class="language-typescript">import { StateGraph, START, END } from "@langchain/langgraph";

export const pipeline = new StateGraph(PipelineState)
  .addNode("researcher", researchNode)   // NOT "research": conflicts with state.research
  .addNode("analyzer", analysisNode)     // NOT "analysis": conflicts with state.analysis
  .addNode("write", writeNode)
  .addNode("critic", criticNode)
  .addEdge(START, "researcher")
  .addEdge("researcher", "analyzer")
  .addEdge("analyzer", "write")
  .addEdge("write", "critic")
  .addConditionalEdges("critic", shouldRevise, {
    revise: "write",
    end: END,
  })
  .compile();
</code></pre>
<p>The revision loop in LangGraph is expressed as a conditional edge with a routing function:</p>
<pre><code class="language-typescript">function shouldRevise(state: PipelineStateType): string {
  if (state.score &gt;= 7 || state.iterations &gt;= 3) return "end";
  return "revise";
}
</code></pre>
<p>After every critic execution, <code>shouldRevise</code> runs and returns either <code>"revise"</code> to loop back to the write node or <code>"end"</code> to exit the graph. That's the state machine equivalent of Mastra's <code>.dowhile()</code>: the same conditional logic expressed as graph routing rather than as a named loop construct.</p>
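<p>The two conditions are logical complements of each other, which is easy to confirm by brute force. The sketch below restates both predicates in plain TypeScript (the <code>keepLooping</code> name and the <code>LoopState</code> type are hypothetical) and compares them over every score from 1 to 10 and every iteration count from 1 to 3:</p>
<pre><code class="language-typescript">interface LoopState { score: number; iterations: number; }

// Mastra-style loop condition: keep going while this is true.
const keepLooping = (s: LoopState): boolean =&gt; s.score &lt; 7 &amp;&amp; s.iterations &lt; 3;

// LangGraph-style router: names the next edge to follow.
const shouldRevise = (s: LoopState): string =&gt;
  s.score &gt;= 7 || s.iterations &gt;= 3 ? "end" : "revise";

// Count the states where the two formulations disagree.
let mismatches = 0;
for (let score = 1; score &lt;= 10; score++) {
  for (let iterations = 1; iterations &lt;= 3; iterations++) {
    const s = { score, iterations };
    if (keepLooping(s) !== (shouldRevise(s) === "revise")) mismatches++;
  }
}
</code></pre>
<p>After the loops finish, <code>mismatches</code> is still 0: by De Morgan's law, "not (score below 7 and iterations below 3)" is the same as "score at least 7 or iterations at least 3".</p>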
<h3 id="heading-the-retry-wrapper">The Retry Wrapper</h3>
<p>Both frameworks hit intermittent TLS session reuse errors when making concurrent HTTPS requests. The error looks like this: <code>SSL routines:tls_get_more_records:decryption failed or bad record mac</code>. A retry wrapper with linear backoff handles it:</p>
<pre><code class="language-typescript">async function retryOnFetch&lt;T&gt;(fn: () =&gt; Promise&lt;T&gt;, retries = 3): Promise&lt;T&gt; {
  for (let i = 0; i &lt;= retries; i++) {
    try {
      return await fn();
    } catch (e: any) {
      const shouldRetry =
        e?.message?.includes("fetch") ||
        e?.message === "fetch failed" ||
        e?.message?.includes("SSL") ||
        e?.message?.includes("ECONNRESET") ||
        e?.message?.includes("other side closed") ||
        e?.cause?.code === "ECONNRESET";
      if (i &lt; retries &amp;&amp; shouldRetry) {
        await new Promise((r) =&gt; setTimeout(r, 1000 * (i + 1)));
        continue;
      }
      throw e;
    }
  }
  throw new Error("unreachable");
}
</code></pre>
<p>Every <code>llm.invoke()</code> call in the LangChain nodes is wrapped in this. In the web app's API route, there's an equivalent <code>retryMutation</code> wrapper around every Convex call for the same reason.</p>
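<p>To see this kind of wrapper behave without touching the network, here is a self-contained sketch with a deliberately flaky function. It is a simplified variant rather than the project's code: <code>retryWithBackoff</code>, <code>isTransient</code>, and the <code>baseDelayMs</code> parameter are hypothetical, and the delay is shortened so the example finishes quickly:</p>
<pre><code class="language-typescript">// Treats a small set of network-looking messages as retryable.
function isTransient(e: unknown): boolean {
  const message = e instanceof Error ? e.message : String(e);
  return ["fetch", "SSL", "ECONNRESET"].some((needle) =&gt; message.includes(needle));
}

// Linear backoff: wait baseDelayMs, then twice that, then three times, and so on.
async function retryWithBackoff&lt;T&gt;(
  fn: () =&gt; Promise&lt;T&gt;,
  retries = 3,
  baseDelayMs = 10
): Promise&lt;T&gt; {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      // Give up once the retry budget is spent or the error is not transient.
      if (attempt &gt;= retries || !isTransient(e)) throw e;
      await new Promise((resolve) =&gt; setTimeout(resolve, baseDelayMs * (attempt + 1)));
    }
  }
}

// Fails twice with a transient error, then succeeds on the third call.
let calls = 0;
async function flaky(): Promise&lt;string&gt; {
  calls++;
  if (calls &lt; 3) throw new Error("fetch failed");
  return "ok";
}
</code></pre>
<p>Here <code>retryWithBackoff(flaky)</code> resolves to <code>"ok"</code> on the third call, while an error such as <code>new Error("bad input")</code> is rethrown on the first attempt because it doesn't look like a network failure.</p>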
<h2 id="heading-the-critic-that-gave-everything-a-7-out-of-10">The Critic That Gave Everything a 7 out of 10</h2>
<p>With both pipelines running, I tested a few topics. Every score came back 7 out of 10, regardless of topic, framework, or iteration.</p>
<p>This is actually a well-documented failure mode called LLM-as-judge bias. When you ask a language model to assign a score from 1 to 10 without giving it structured criteria and explicit anchors for each score level, it gravitates toward 7. It's the socially safe answer: high enough to signal quality, low enough to seem fair, and it requires no real justification. The model has no incentive to discriminate because nothing in the prompt forces it to.</p>
<p>My original critic was this:</p>
<pre><code class="language-plaintext">You are a critical editor. Score the draft 1-10 on accuracy,
clarity, and depth. Return { score, feedback }.
</code></pre>
<p>Those three short sentences were the entire prompt, so it's no surprise that everything came back as a 7.</p>
<h3 id="heading-what-production-grade-evaluation-actually-looks-like">What Production-Grade Evaluation Actually Looks Like</h3>
<p>The solution I used comes from the <a href="https://arxiv.org/abs/2303.16634">G-Eval paper</a>, which is also the approach behind tools like DeepEval and RAGAS. The key insight is that you need three things working together: the judge must reason step-by-step before assigning any score, the dimensions being scored must be independent of each other, and each score level must have an explicit description of what it means, not just "1 is bad, 10 is perfect."</p>
<p>So, I rebuilt the critic around six mandatory steps that must all complete before a number is produced:</p>
<ol>
<li><p><strong>Claim audit</strong>: every factual claim in the report gets classified as GROUNDED (supported by a specific search result), INFERRED (reasonable extension of the research), UNSUPPORTED (no basis in the results), or HALLUCINATED (contradicts the results).</p>
</li>
<li><p><strong>Specificity audit</strong>: every generic sentence and every forbidden phrase gets flagged explicitly.</p>
</li>
<li><p><strong>Insight audit</strong>: checks whether the conclusion actually adds something beyond restating the introduction.</p>
</li>
<li><p><strong>Counterfactual check</strong>: the judge must name at least one specific belief a reader would hold after reading this that they wouldn't hold from just the topic title alone. If it can't identify one, the insight score can't exceed 6.</p>
</li>
<li><p><strong>Dimension scoring</strong>: three independent scores with explicit anchors for each level.</p>
</li>
<li><p><strong>Floor rule</strong>: if any single dimension scores 4 or below, the final score can't exceed 6 regardless of the other dimensions.</p>
</li>
</ol>
<p>The floor rule deserves a specific explanation because it addresses a real failure mode: without it, a report that hallucinates facts could score 2 on source fidelity but still end up with a passing score on the weighted average if specificity and insight are high enough. A critical failure in one dimension should disqualify the report, not get diluted.</p>
<p>This is the full critic prompt, which is shared between Mastra and LangChain via a constant in <code>nodes.ts</code>:</p>
<pre><code class="language-typescript">const CRITIC_INSTRUCTIONS = `You are a senior research editor.
Catch the specific ways AI-generated reports fail.

STEP 1: CLAIM AUDIT
Classify every claim: [GROUNDED] [INFERRED] [UNSUPPORTED] [HALLUCINATED]

STEP 2: SPECIFICITY AUDIT
List sentences that are generic, use forbidden phrases, or make no
falsifiable claims. Forbidden phrases: "it is important to note",
"organizations must consider", "rapidly evolving", "as we look to the future"

STEP 3: INSIGHT AUDIT
Does the conclusion add anything not already in the introduction?

STEP 3.5: COUNTERFACTUAL CHECK
Name one specific belief a reader holds after reading this that they
would not hold from just the topic title. If you cannot identify one,
insight cannot exceed 6.

STEP 4: SCORE EACH DIMENSION

SOURCE FIDELITY (40% weight):
5-6: Claims accurate but traced to general topic knowledge, not these specific results
7:   Most claims traceable, at least one source cited by name
8:   All major claims grounded, two or more named sources with specific details
9-10: Every claim traces to a named source, at least one statistic used

SPECIFICITY (30% weight):
5-6: Some specific claims but generic analysis between paragraphs
7:   Mostly specific, minor filler remains
8:   Every paragraph falsifiable, named entities throughout
9-10: Zero sentences survive if you swap the topic

INSIGHT (30% weight):
5-6: Some synthesis but conclusion could have been written before reading
7:   Conclusion makes a recommendation that follows from the evidence
8:   Identifies a tradeoff the reader has not considered
9-10: A senior engineer would reconsider an architectural decision after reading this

STEP 5: FLOOR RULE
If any dimension scores 4 or below, the final score cannot exceed 6.

STEP 6: CALCULATE
finalScore = round((fidelity * 0.40) + (specificity * 0.30) + (insight * 0.30))

Respond ONLY with this JSON:
{
  "fidelity": &lt;1-10&gt;,
  "fidelityReasoning": "&lt;one sentence&gt;",
  "specificity": &lt;1-10&gt;,
  "specificityReasoning": "&lt;one sentence&gt;",
  "insight": &lt;1-10&gt;,
  "insightReasoning": "&lt;one sentence&gt;",
  "score": &lt;weighted final&gt;,
  "feedback": "&lt;surgical: quote the specific sentence that caused the
  lowest-scoring dimension to fail, then state exactly what needs to change&gt;"
}`;
</code></pre>
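<p>Steps 5 and 6 of the prompt are plain arithmetic, so they are easy to express in code, for example to double-check the number the model reports. This is a sketch under that assumption: <code>finalScore</code> and <code>Dimensions</code> are hypothetical names, and the weights and the floor rule are taken from the prompt above:</p>
<pre><code class="language-typescript">interface Dimensions { fidelity: number; specificity: number; insight: number; }

function finalScore({ fidelity, specificity, insight }: Dimensions): number {
  // Step 6: weighted average, rounded to the nearest integer.
  const weighted = Math.round(fidelity * 0.4 + specificity * 0.3 + insight * 0.3);
  // Step 5: any dimension at 4 or below caps the final score at 6.
  const floored = Math.min(fidelity, specificity, insight) &lt;= 4;
  return floored ? Math.min(weighted, 6) : weighted;
}
</code></pre>
<p>Take a draft with fidelity 4 and the other two dimensions at 9. The weighted sum is 1.6 + 2.7 + 2.7 = 7.0, which would clear the passing threshold of 7 on its own. The floor rule caps it at 6, so the draft goes back for revision.</p>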
<h3 id="heading-extracting-json-from-chain-of-thought-output">Extracting JSON from Chain-of-Thought Output</h3>
<p>Because the critic now writes several paragraphs of reasoning before producing the JSON, the response is no longer pure JSON and <code>JSON.parse(result.text)</code> throws. Before I caught this and fixed it, the fallback value of <code>4</code> was returned silently on every parse failure, which meant the loop ran the full three iterations on every topic.</p>
<p>The fix scans the text for the last valid JSON object, working backwards through any matches because the JSON block always appears at the end after the reasoning:</p>
<pre><code class="language-typescript">function extractJson(text: string): any {
  try { return JSON.parse(text.trim()); } catch {}

  const matches = text.match(/\{[\s\S]*\}/g);
  if (matches) {
    for (let i = matches.length - 1; i &gt;= 0; i--) {
      try { return JSON.parse(matches[i]); } catch {}
    }
  }

  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fenced) {
    try { return JSON.parse(fenced[1].trim()); } catch {}
  }

  return null;
}
</code></pre>
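<p>One caveat with the regex step: <code>[\s\S]*</code> is greedy, so the pattern returns a single match running from the first <code>{</code> in the text to the last <code>}</code>. If the reasoning happens to contain a brace before the JSON block and the JSON isn't fenced, neither step recovers it. A possible alternative is to pair braces by position and try candidates starting from the end. This is a sketch, not the project's implementation, and <code>extractLastJsonObject</code> is a hypothetical name:</p>
<pre><code class="language-typescript">function extractLastJsonObject(text: string): unknown {
  // Record the position of every opening and closing brace.
  const opens: number[] = [];
  const closes: number[] = [];
  for (let i = 0; i &lt; text.length; i++) {
    if (text[i] === "{") opens.push(i);
    if (text[i] === "}") closes.push(i);
  }

  // Start from the last closing brace and pair it with each earlier opening
  // brace, nearest first, until some slice parses as JSON.
  for (const end of [...closes].reverse()) {
    for (const start of [...opens].reverse()) {
      if (start &gt; end) continue;
      try {
        return JSON.parse(text.slice(start, end + 1));
      } catch {
        // Not valid JSON for this pair of braces, so keep scanning.
      }
    }
  }
  return null;
}
</code></pre>
<p>The scan tries many slices in the worst case, which is fine for a critic response of a few hundred words but would be wasteful on very large inputs.</p>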
<h2 id="heading-the-evaluation-bias-i-almost-shipped">The Evaluation Bias I Almost Shipped</h2>
<p>After the critic rebuild, things were working properly: first drafts scoring 4-6, the revision loop triggering, revisions actually improving on the previous attempt.</p>
<p>But a clear pattern emerged across technology topics: Mastra consistently scored 8-9 and LangChain consistently scored 6-7, on every single topic.</p>
<p>Looking at what the critic was actually rewarding revealed the problem. Source Fidelity carries 40% of the final score, and it rewards reports that cite specific named sources from the Tavily results. Mastra's reports were full of phrases like "according to Kore.ai's analysis" and "the ArXiv paper on orchestrated multi-agent systems identifies." LangChain's reports made the same points but without attributing them to specific sources.</p>
<p>The cause was how context flowed through each pipeline. Mastra's Agent class carries the full Tavily content (titles, URLs, content snippets) in its conversation history through the tool loop. By the time the writer agent runs, all of that source material is available in context.</p>
<p>The LangChain write node, on the other hand, only received <code>state.analysis</code>, which is the structured JSON extracted from the research: key findings, themes, and a central argument. By the time that JSON was produced, the specific source details had already been abstracted away. The writer had the conclusions but not the citations.</p>
<p>Both pipelines were correctly implemented according to each framework's idioms, but I had given them unequal inputs without realising it. The evaluation system was rewarding one framework for having more context rather than for producing a better report, and the consistent score gap across every technology topic was the signal: a genuine quality difference would vary by topic and draft, but a structural gap shows up the same way every time.</p>
<p>The fix was one change in the LangChain write node: pass <code>state.research</code> (the raw Tavily results) alongside <code>state.analysis</code>:</p>
<pre><code class="language-typescript">async function writeNode(state: PipelineStateType): Promise&lt;Partial&lt;PipelineStateType&gt;&gt; {
  const prompt = `You are a research analyst writing for a technical audience.

RESEARCH (raw search results -- cite specific sources by name):
${state.research}

ANALYSIS:
${state.analysis}
${state.feedback ? `\nCRITIC FEEDBACK FROM PREVIOUS DRAFT:\n${state.feedback}` : ""}

${WRITER_INSTRUCTIONS}

Return ONLY the report text.`;

  const response = await retryOnFetch(() =&gt; llm.invoke(prompt));
  return { draft: response.content as string, iterations: (state.iterations ?? 0) + 1 };
}
</code></pre>
<p>With both writers receiving identical source material, quality scores now reflect actual writing quality. If your evaluation system consistently favours one option across many runs, the first thing to check is whether both options have equal inputs. A structural gap produces consistent results, while a genuine quality difference varies by topic and draft quality.</p>
<h2 id="heading-the-real-time-dashboard">The Real-Time Dashboard</h2>
<p>Running pipelines in the terminal works for your own comparisons, but it doesn't scale to a benchmark that other people can use. The dashboard needed both pipelines running in parallel, every step visible as it executes, the full prompt and response expandable per step, Tavily results with relevance score bars, token counts, a live scrolling log, and everything saved and browsable by category.</p>
<h3 id="heading-the-convex-schema">The Convex Schema</h3>
<p>Convex was chosen specifically for real-time capabilities: its <code>useQuery</code> hook in React subscribes to database queries and automatically re-renders when the underlying data changes, without any polling or websocket management on your end.</p>
<p>The schema stores every run at three levels of granularity:</p>
<pre><code class="language-typescript">steps: defineTable({
  runId: v.id("runs"),
  pipelineResultId: v.id("pipelineResults"),
  framework: v.union(v.literal("mastra"), v.literal("langchain")),
  stepName: v.union(
    v.literal("research"), v.literal("analysis"),
    v.literal("write"), v.literal("critic")
  ),
  iterationNumber: v.number(),
  status: v.union(v.literal("running"), v.literal("complete"), v.literal("error")),
  promptSent: v.optional(v.string()),
  output: v.optional(v.string()),
  timeMs: v.optional(v.number()),
  inputTokens: v.optional(v.number()),
  outputTokens: v.optional(v.number()),
  model: v.optional(v.string()),
  tavilyQuery: v.optional(v.string()),
  tavilyResults: v.optional(v.string()),
  criticScore: v.optional(v.number()),
  criticFeedback: v.optional(v.string()),
  criticDimensions: v.optional(v.object({
    fidelity: v.number(),
    specificity: v.number(),
    insight: v.number(),
    fidelityReasoning: v.string(),
    specificityReasoning: v.string(),
    insightReasoning: v.string(),
  })),
}).index("by_pipeline_result", ["pipelineResultId"]),
</code></pre>
<p>The <code>criticDimensions</code> field stores the full G-Eval breakdown so the dashboard can render individual dimension scores with colored bars and the per-dimension reasoning text.</p>
<h3 id="heading-the-fire-and-forget-pattern">The Fire-and-Forget Pattern</h3>
<p>The most important decision in the Next.js API route is returning the <code>runId</code> before either pipeline finishes. If you await both pipelines first, the browser sits waiting 30-60 seconds before it can even navigate to the run page, and the whole point of real-time updates is gone.</p>
<pre><code class="language-typescript">const activeTasks = new Map&lt;string, Promise&lt;void&gt;&gt;();

export async function POST(req: NextRequest) {
  const { topic, category } = await req.json();

  // Create the Convex records up front, awaiting each mutation (these are fast)
  const runId = await retryMutation(() =&gt;
    fetchMutation(api.runs.createRun, { topic, category, status: "running" })
  );
  const mastraResultId = await retryMutation(() =&gt;
    fetchMutation(api.pipelineResults.createPipelineResult, {
      runId, framework: "mastra", status: "running", iterations: 0,
    })
  );
  const langchainResultId = await retryMutation(() =&gt;
    fetchMutation(api.pipelineResults.createPipelineResult, {
      runId, framework: "langchain", status: "running", iterations: 0,
    })
  );

  // Start both pipelines without awaiting them
  const task = Promise.allSettled([
    withRetry(() =&gt; runMastraPipeline(topic, buildCallbacks(runId, mastraResultId, "mastra"))),
    withRetry(() =&gt; runLangChainPipeline(topic, buildCallbacks(runId, langchainResultId, "langchain"))),
  ]).then(async () =&gt; {
    await retryMutation(() =&gt;
      fetchMutation(api.runs.updateRunStatus, { runId, status: "complete" })
    );
    activeTasks.delete(runId as string);
  });

  // Hold a reference in the Map so Node.js doesn't garbage-collect the promise
  activeTasks.set(runId as string, task);
  return NextResponse.json({ runId });   // returns immediately
}
</code></pre>
<p>On Vercel, this pattern still fails because serverless functions terminate when the route handler returns, killing any background promises. The fix is using <code>waitUntil</code> from <code>@vercel/functions</code>, which tells Vercel to keep the execution context alive until the promise resolves:</p>
<pre><code class="language-typescript">import { waitUntil } from "@vercel/functions";

waitUntil(task);
return NextResponse.json({ runId });
</code></pre>
<h3 id="heading-subscribing-to-live-updates">Subscribing to Live Updates</h3>
<p>On the run page, three Convex queries run simultaneously: the run itself, the pipeline results, and the steps for each pipeline result.</p>
<p>The <code>"skip"</code> sentinel is important here: it tells <code>useQuery</code> not to run the query at all (the hook simply returns <code>undefined</code>) until a real argument is available. This prevents a race condition where the steps query fires before the pipeline result records have loaded:</p>
<pre><code class="language-typescript">const mastraSteps = useQuery(
  api.steps.getStepsForPipelineResult,
  mastraResult ? { pipelineResultId: mastraResult._id } : "skip"
);
</code></pre>
<h3 id="heading-deduplicating-steps-after-retries">Deduplicating Steps After Retries</h3>
<p>When a pipeline fails due to a TLS error and retries from the beginning, the failed attempt's step records stay in Convex alongside the successful attempt's records. The UI would render both, creating a visible gap between the research card and the rest of the steps.</p>
<p>The fix groups steps by <code>stepName + iterationNumber</code> and keeps the best version of each: a completed record always wins, and if no attempt has completed yet, the most recent one is shown:</p>
<pre><code class="language-typescript">const stepMap = new Map&lt;string, Step&gt;();
[...steps]
  .sort((a, b) =&gt; (a._creationTime ?? 0) - (b._creationTime ?? 0))
  .forEach((s) =&gt; {
    const key = `${s.stepName}-${s.iterationNumber}`;
    const existing = stepMap.get(key);
    if (!existing) { stepMap.set(key, s); return; }
    if (s.status === "complete") { stepMap.set(key, s); return; }
    if (existing.status !== "complete") { stepMap.set(key, s); }
  });
</code></pre>
<h3 id="heading-the-live-log-auto-scroll">The Live Log Auto-Scroll</h3>
<p>Log entries are appended to the pipeline result document in Convex as an array, and the panel auto-scrolls as new entries arrive using a ref attached to an empty div at the bottom:</p>
<pre><code class="language-typescript">function LiveLogPanel({ logs }: { logs?: LogEntry[] }) {
  const endRef = useRef&lt;HTMLDivElement&gt;(null);

  useEffect(() =&gt; {
    endRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [logs?.length]);

  return (
    &lt;div className="max-h-52 overflow-y-auto font-mono text-xs"&gt;
      {logs?.map((entry, i) =&gt; (
        &lt;div key={i} className="flex gap-2"&gt;
          &lt;span className="text-[#484f58]"&gt;[{fmtTs(entry.timestamp)}]&lt;/span&gt;
          &lt;span className={`font-bold w-14 ${tagColor(entry.tag)}`}&gt;{entry.tag}&lt;/span&gt;
          &lt;span className="text-[#c9d1d9]"&gt;{entry.message}&lt;/span&gt;
        &lt;/div&gt;
      ))}
      &lt;div ref={endRef} /&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p>The effect dependency is <code>logs?.length</code>, so the scroll triggers every time a new log entry arrives from Convex.</p>
<h2 id="heading-what-the-data-actually-shows">What the Data Actually Shows</h2>
<p><strong>Speed:</strong> LangChain is 25-45% faster in every run. On shorter topics the gap narrows to 7-8 seconds, but it never reverses.</p>
<p>I think the reason for this is structural. Mastra's Agent class initialises its tool loop manager on every step, even when no tools are called. That means internal conversation history, tool schemas, and retry infrastructure are all set up as overhead before the actual model call happens.</p>
<p>Across a four-step pipeline, those 2-5 seconds per step accumulate. LangGraph nodes are plain async functions, so your code runs directly, with no framework initialisation between you and the model.</p>
<p><strong>Tokens:</strong> Mastra uses 1.5-2.5x more tokens. The research step alone accounted for most of that gap because LangChain's research node calls Tavily directly without invoking an LLM at all.</p>
<p>On more typical topics, Mastra runs around 6,200 tokens and LangChain around 3,900. The gap scales with how much content Tavily returns, because that content flows into Mastra's agent conversation history on every subsequent step.</p>
<p><strong>Quality:</strong> After fixing the evaluation bias, scores vary meaningfully by topic rather than by framework. Both produce high-scoring reports when the Tavily results are specific and rich. Both struggle on vague or biographical topics where the search results are generic.</p>
<p>A first draft scoring 7 or 8 means the research was strong and the writer made specific grounded claims. A 4 or 5 means the research returned thin results and the writer defaulted to generic observations, and the revision loop runs until either the draft improves or the iteration limit is hit.</p>
<p><strong>The tradeoff:</strong> Mastra handles orchestration complexity in the framework so you don't have to. You write <code>.dowhile()</code> instead of a conditional edge, typed step schemas instead of a shared mutable state object, and the framework manages conversation history and tool execution. The cost is a consistent token and latency overhead on every step.</p>
<p>LangChain gives you the graph execution engine and leaves everything else to you: more explicit wiring to write, but leaner execution and precise control over every token that enters each model call.</p>
<h2 id="heading-try-it-yourself">Try it Yourself</h2>
<p>The live demo is at <a href="https://mastra-vs-langchain.vercel.app">mastra-vs-langchain.vercel.app</a> and the complete source code for this comparison is at <a href="https://github.com/sholajegede/mastra-vs-langchain">github.com/sholajegede/mastra-vs-langchain</a>. If it helped you, consider giving it a star.</p>
<pre><code class="language-bash">git clone https://github.com/sholajegede/mastra-vs-langchain.git
cd mastra-vs-langchain
npm install
cp .env.example .env
# Add ANTHROPIC_API_KEY and TAVILY_API_KEY
npx convex dev   # Terminal 1
npm run web      # Terminal 2
</code></pre>
<p>Open <code>localhost:3000</code>, enter a topic, pick a category, and run both. Every step is visible as it happens, every token is counted, and the history page stores all previous runs by category.</p>
<p>If you want to take this comparison further by adding CrewAI, CopilotKit, or any other framework to the benchmark, the <code>PipelineCallbacks</code> interface in <code>packages/shared</code> is the only contract you need to implement.</p>
<p>If this tutorial was useful, feel free to share it with others who might benefit. I’d really appreciate your thoughts. You can mention me on X at <a href="https://x.com/wani_shola">@wani_shola</a> or <a href="https://linkedin.com/in/sholajegede">connect with me on LinkedIn</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Protect Sensitive Data by Running LLMs Locally with Ollama ]]>
                </title>
                <description>
                    <![CDATA[ Whenever engineers are building AI-powered applications, use of sensitive data is always a top priority. You don't want to send users' data to an external API that you don't control. For me, this happ ]]>
                </description>
                <link>https://www.freecodecamp.org/news/protect-sensitive-data-with-local-llms/</link>
                <guid isPermaLink="false">69a99b623728a9dc358a5d85</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ LLM&#39;s  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manoj Aggarwal ]]>
                </dc:creator>
                <pubDate>Thu, 05 Mar 2026 15:04:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/92c9b0b4-5ff8-40ab-b5f5-a060765e99b4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Whenever engineers build AI-powered applications, protecting sensitive data is a top priority. You don't want to send users' data to an external API that you don't control.</p>
<p>For me, this happened when I was building <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>, which is my personal open-source project that helps me with my finances. This application lets you upload your bank statements, tax forms like 1099s, and so on, and then you can ask questions in plain English like, "How much did I spend on groceries this month?" or "What was my effective tax rate last year?"</p>
<p>The problem is that answering these questions means sending sensitive transaction history, W-2s, and income data to OpenAI, Anthropic, or Google, which I was not comfortable with. Even after redacting PII from these documents, I was not OK with the trade-off.</p>
<p>This is where Ollama comes in. Ollama lets you run large language models entirely on your own laptop. You don't need any API keys or cloud infrastructure and no data leaves your machine.</p>
<p>In this tutorial, I will walk you through what Ollama is, how to get started with it, and how to use it in a real Python application so that users of the application can choose to keep their data completely local.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-ollama">What is Ollama?</a></p>
</li>
<li><p><a href="#heading-how-ollamas-api-works">How Ollama's API Works</a></p>
</li>
<li><p><a href="#heading-how-to-call-ollama-from-python">How to Call Ollama from Python</a></p>
</li>
<li><p><a href="#heading-how-to-integrate-ollama-into-a-langchain-app">How to Integrate Ollama into a LangChain App</a></p>
</li>
<li><p><a href="#heading-how-to-build-an-llm-provider-agnostic-app">How to Build an LLM-Provider Agnostic App</a></p>
</li>
<li><p><a href="#heading-how-to-use-ollama-with-langgraph">How to Use Ollama with LangGraph</a></p>
</li>
<li><p><a href="#heading-how-financegpt-uses-this-in-practice">How FinanceGPT Uses This in Practice</a></p>
</li>
<li><p><a href="#heading-tradeoffs-to-be-aware-of">Tradeoffs to be Aware Of</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-check-out-financegpt">Check Out FinanceGPT</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You will need the following at a minimum:</p>
<ul>
<li><p>Python 3.10+</p>
</li>
<li><p>A machine with at least 8GB of RAM (16GB recommended for larger models)</p>
</li>
<li><p>Basic familiarity with Python and pip</p>
</li>
</ul>
<h2 id="heading-what-is-ollama">What is Ollama?</h2>
<p>Ollama is an open-source tool that makes running LLMs locally very easy. You can think of it as Docker, but for AI models. You can pull a model with a single command, and Ollama handles everything else: downloading the weights, managing memory, and serving the model through a local REST API.</p>
<p>The local REST API is compatible with OpenAI's API format, which means any application that can talk to OpenAI can switch to Ollama by pointing it at the local server and changing the model name, with little or no other code change.</p>
<h3 id="heading-installation">Installation</h3>
<p>First, download the installer from <a href="https://ollama.com/">ollama.com</a>. Once it is installed, you can verify the installation:</p>
<pre><code class="language-shell">ollama --version
</code></pre>
<p>The above command checks whether Ollama was installed correctly and prints the current version.</p>
<h3 id="heading-pull-and-run-your-first-model">Pull and Run Your First Model</h3>
<p>Ollama hosts a variety of models on <a href="https://ollama.com/library">ollama.com/library</a>. To pull and immediately chat with one, just do:</p>
<pre><code class="language-shell">ollama run llama3.2
</code></pre>
<p>This command downloads the model from the Ollama library and starts an interactive chat session with it. Note that the download is a few GB, depending on which model you choose. Alternatively, if you only want to download a model without chatting with it:</p>
<pre><code class="language-shell">ollama pull mistral
</code></pre>
<p>This downloads a model to your machine without starting a chat session which is useful when you want to set up models in advance.</p>
<p>You can run the following command to list the models you have installed:</p>
<pre><code class="language-shell">ollama list
</code></pre>
<p>This shows all models you've downloaded locally along with their sizes.</p>
<p>I have used the following models and they have worked great for specific tasks:</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>Size</th>
<th>Good For</th>
</tr>
</thead>
<tbody><tr>
<td><code>llama3.2</code></td>
<td>~2GB</td>
<td>Fast, general purpose</td>
</tr>
<tr>
<td><code>mistral</code></td>
<td>~4GB</td>
<td>Strong instruction following</td>
</tr>
<tr>
<td><code>qwen2.5:7b</code></td>
<td>~4GB</td>
<td>Multilingual, reasoning</td>
</tr>
<tr>
<td><code>deepseek-r1:7b</code></td>
<td>~4GB</td>
<td>Complex reasoning tasks</td>
</tr>
</tbody></table>
<h2 id="heading-how-ollamas-api-works">How Ollama's API Works</h2>
<p>Once Ollama is running, it serves its API on <code>localhost:11434</code>. You can call it directly using curl:</p>
<pre><code class="language-shell">curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "What is compound interest?" }],
  "stream": false
}'
</code></pre>
<p>This sends a chat message directly to Ollama's REST API from the command line, with streaming disabled so you get the full response at once. <code>/api/chat</code> is Ollama's native chat endpoint. For existing apps, the more useful base URL is <code>http://localhost:11434/v1</code>, which is OpenAI-compatible. This is the key feature that makes it easy to drop Ollama into applications that already use OpenAI or other LLMs.</p>
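<p>If you would rather not shell out to curl, the same request can be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: the helper names <code>build_chat_payload</code> and <code>send_chat</code> are hypothetical, and it assumes Ollama is listening on the default <code>localhost:11434</code> address used in the curl example.</p>
<pre><code class="language-python">import json
import urllib.request

# Default address from the curl example; change it if your server runs elsewhere
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model, prompt, stream=False):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def send_chat(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With streaming disabled, the reply text sits under message.content
    return body["message"]["content"]


payload = build_chat_payload("llama3.2", "What is compound interest?")
print(json.dumps(payload, indent=2))
# send_chat(payload) needs a running Ollama server, so it is left commented out
</code></pre>
<p>Keeping the payload builder separate from the network call makes it easy to inspect exactly what would be sent before anything reaches the model.</p>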
<h2 id="heading-how-to-call-ollama-from-python">How to Call Ollama from Python</h2>
<h3 id="heading-how-to-use-the-ollama-python-library">How to Use the Ollama Python Library</h3>
<p>Ollama has its own Python library that is pretty intuitive to use:</p>
<pre><code class="language-shell">pip install ollama
</code></pre>
<pre><code class="language-python">from ollama import chat

response = chat(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.message.content)
</code></pre>
<p>The above code uses Ollama's native Python SDK to send a message and print the model's reply, which is the most straightforward way to call Ollama from Python.</p>
<h3 id="heading-how-to-use-the-openai-sdk-with-ollama-as-the-backend">How to Use the OpenAI SDK with Ollama as the Backend</h3>
<p>As mentioned earlier, Ollama has an endpoint that is OpenAI compatible, so you can also use the OpenAI Python SDK and just point it to your local server:</p>
<pre><code class="language-shell">pip install openai
</code></pre>
<pre><code class="language-python">from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # Required by the SDK, but ignored by Ollama
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.choices[0].message.content)
</code></pre>
<p>This uses the standard OpenAI Python SDK but redirects it to your local Ollama server. The <code>api_key</code> field is required by the SDK but ignored by Ollama. This pattern makes using Ollama seamless for existing applications. The code is nearly identical to what you would write for OpenAI.</p>
<h2 id="heading-how-to-integrate-ollama-into-a-langchain-app">How to Integrate Ollama into a LangChain App</h2>
<p>Most production applications are built with an orchestration framework like LangChain, which has native Ollama support. This means swapping providers is often just a one-line change.</p>
<p>Install the integration:</p>
<pre><code class="language-shell">pip install langchain-ollama
</code></pre>
<h3 id="heading-how-to-create-a-chat-model">How to Create a Chat Model</h3>
<pre><code class="language-python">from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")

response = llm.invoke("What is the difference between a W-2 and a 1099?")
print(response.content)
</code></pre>
<p>This creates a LangChain-compatible chat model backed by a local Ollama model, a one-line swap from <code>ChatOpenAI</code>.</p>
<p>Compare this to the OpenAI version and you will see that the interface is almost identical:</p>
<pre><code class="language-python">from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
</code></pre>
<h2 id="heading-how-to-build-an-llm-provider-agnostic-app">How to Build an LLM-Provider Agnostic App</h2>
<p>The real power comes from abstracting away the LLM provider. Applications like Perplexity let users choose the LLM they want to use for their tasks. Here's a simple factory pattern that returns the right LLM based on the configuration:</p>
<pre><code class="language-python">from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_anthropic import ChatAnthropic

def get_llm(provider: str, model: str):
    """
    Return the appropriate LangChain LLM based on the provider.
    
    Args:
        provider: One of "openai", "ollama", "anthropic"
        model: The model name (e.g. "gpt-4o", "llama3.2", "claude-3-5-sonnet")
    
    Returns:
        A LangChain chat model ready to use
    """
    if provider == "openai":
        return ChatOpenAI(model=model)
    elif provider == "ollama":
        return ChatOllama(model=model)
    elif provider == "anthropic":
        return ChatAnthropic(model=model)
    else:
        raise ValueError(f"Unknown provider: {provider}")
</code></pre>
<p>The above snippet shows a helper that returns the right LangChain model based on a provider string, so the rest of your app never needs to know which LLM is running underneath.</p>
<p>That applies to your chains, your agents, and your tools alike: you pass <code>llm</code> around and it just works.</p>
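<p>To drive the factory from configuration, you could read the provider and model from environment variables. The following is a minimal sketch under stated assumptions: the variable names <code>LLM_PROVIDER</code> and <code>LLM_MODEL</code> are borrowed from the FinanceGPT <code>.env</code> example later in this article, while the helper <code>load_llm_settings</code> and its default values are hypothetical.</p>
<pre><code class="language-python">import os

SUPPORTED_PROVIDERS = ("openai", "ollama", "anthropic")


def load_llm_settings(env=None):
    """Read the provider and model name from environment-style settings."""
    env = os.environ if env is None else env
    # Defaulting to the local provider is an assumption made for this sketch
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    model = env.get("LLM_MODEL", "llama3.2").strip()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    return provider, model


provider, model = load_llm_settings({"LLM_PROVIDER": "Ollama", "LLM_MODEL": "mistral"})
print(provider, model)  # prints: ollama mistral
# llm = get_llm(provider, model)  # hand the pair to the factory above
</code></pre>
<p>Validating the provider name here means a typo in the config fails at startup instead of on the first model call.</p>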
<h2 id="heading-how-to-use-ollama-with-langgraph">How to Use Ollama with LangGraph</h2>
<p>If you're using LangGraph to build agents (as I covered in my <a href="https://www.freecodecamp.org/news/how-to-develop-ai-agents-using-langgraph-a-practical-guide/">previous article on AI agents</a>), plugging in Ollama is equally seamless:</p>
<pre><code class="language-python">from langgraph.prebuilt import create_react_agent
from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def get_spending_summary(category: str) -&gt; str:
    """Get total spending for a given category this month."""
    # In a real app, this would query your database
    return f"You spent $342.50 on {category} this month."

llm = ChatOllama(model="llama3.2")

agent = create_react_agent(
    model=llm,
    tools=[get_spending_summary]
)

response = agent.invoke({
    "messages": [{"role": "user", "content": "How much did I spend on groceries?"}]
})

print(response["messages"][-1].content)
</code></pre>
<p>This snippet builds a ReAct agent that uses a locally-running model to decide when to call tools while keeping all data on-device even during agentic workflows.</p>
<p>The agent decides when to call the <code>get_spending_summary</code> tool and then writes its final answer using the locally running model, instead of sending your data over the internet to OpenAI.</p>
<h2 id="heading-how-financegpt-uses-this-in-practice">How FinanceGPT Uses This in Practice</h2>
<p>FinanceGPT is built to support OpenAI, Anthropic, Google, and Ollama as LLM providers. The user sets their preference in the UI or in a config file, and the application instantiates the right model using a pattern very similar to the factory pattern above.</p>
<p>When the user chooses Ollama, here's what happens:</p>
<ol>
<li><p>Their bank statements and other sensitive documents are parsed locally</p>
</li>
<li><p>Sensitive fields like SSNs are masked before any LLM call</p>
</li>
<li><p>The masked data and query go to the local Ollama server running on their own machine</p>
</li>
<li><p>The response comes back locally and nothing ever leaves their network</p>
</li>
</ol>
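<p>The masking step in that flow can be sketched with a regular expression. This is a hypothetical illustration and not FinanceGPT's actual implementation: the <code>mask_ssns</code> helper is made up for this example, and it assumes US Social Security numbers written in the <code>123-45-6789</code> format.</p>
<pre><code class="language-python">import re

# US Social Security numbers written as 123-45-6789 (an assumed format for this sketch)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssns(text):
    """Replace anything that looks like an SSN before the text reaches a model."""
    return SSN_PATTERN.sub("[SSN REDACTED]", text)


statement = "Account holder SSN: 123-45-6789, balance $4,210.00"
masked = mask_ssns(statement)
print(masked)  # Account holder SSN: [SSN REDACTED], balance $4,210.00
</code></pre>
<p>A single pattern like this only catches one format, so a real application would need broader detection for other kinds of sensitive fields.</p>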
<p>To run FinanceGPT locally with Ollama, the setup looks like this:</p>
<pre><code class="language-shell"># 1. Pull a capable model
ollama pull llama3.2

# 2. Clone and configure FinanceGPT
git clone https://github.com/manojag115/FinanceGPT.git
cd FinanceGPT
cp .env.example .env

# 3. In .env, set your LLM provider to Ollama
# LLM_PROVIDER=ollama
# LLM_MODEL=llama3.2

# 4. Start the full stack
docker compose -f docker-compose.quickstart.yml up -d
</code></pre>
<p>With this setup, the entire application, including the frontend, backend, and LLM, runs on your own hardware.</p>
<h2 id="heading-tradeoffs-to-be-aware-of">Tradeoffs to be Aware Of</h2>
<p>Ollama is a great local alternative to using cloud LLMs, but it comes with its own problems.</p>
<h3 id="heading-response-quality">Response Quality</h3>
<p>Models that fit comfortably on a laptop are typically in the 7B-parameter range, so they will not match GPT-4o on complex reasoning tasks. For simple Q&amp;A and summarization tasks, the results are often comparable, but for multi-step reasoning or nuanced judgement calls, the gap is noticeable.</p>
<h3 id="heading-speed">Speed</h3>
<p>Inference speed depends on the hardware that is running the model. Without a GPU, the Ollama models can take several seconds to respond. On Apple Silicon (M1/M2/M3), the performance is surprisingly good even without a dedicated GPU.</p>
<h3 id="heading-hardware-requirements">Hardware Requirements</h3>
<p>Small models (7B parameters) need around 8GB of RAM, while larger models (13B+) need 16GB or more. If you are building your application for end users, you cannot guarantee they have the hardware.</p>
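<p>A back-of-the-envelope way to sanity-check these numbers is to multiply the parameter count by the bytes stored per weight. The sketch below assumes that the weights dominate memory use and that the model is quantized to about 4 bits per weight. Both are assumptions for illustration, since the actual quantization depends on the model tag you pull, and the <code>estimate_weight_gb</code> helper is hypothetical.</p>
<pre><code class="language-python">def estimate_weight_gb(params_billions, bits_per_weight=4):
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes each / (1e9 bytes per GB)
    return params_billions * bytes_per_weight


for size in (7, 13):
    print(f"{size}B parameters: about {estimate_weight_gb(size)} GB of weights")
</code></pre>
<p>Under these assumptions a 7B model needs roughly 3.5 GB for its weights, which lines up with the ~4GB download sizes in the table earlier. The operating system, the context window, and your other applications need memory on top of that.</p>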
<h3 id="heading-tool-use-and-function-calling">Tool Use and Function Calling</h3>
<p>Not all local models support function calling reliably. If your agent depends heavily on tool use, test your chosen model carefully. Models like <code>qwen2.5</code> and <code>mistral</code> generally handle this better than others.</p>
<p>The right mental model: use cloud models when you need maximum capability, and local models when privacy or cost constraints make cloud models impractical.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you learned what Ollama is, how to install it and pull models, and three different ways to call it from Python: the native Ollama library, the OpenAI-compatible SDK, and LangChain. You also saw how to build a provider-agnostic factory pattern so your app can switch between cloud and local models with a single config change.</p>
<p>Ollama makes local LLMs genuinely practical for production apps. The OpenAI-compatible API means integration is nearly zero-friction, and LangChain's native support means you can build provider-agnostic apps from the start.</p>
<p>The finance domain is an obvious fit — but the same principle applies anywhere sensitive data is involved: healthcare, legal tech, HR, personal productivity. If your app processes data that users wouldn't want stored on someone else's server, giving them a local option isn't just a nice-to-have. It's a trust feature.</p>
<h2 id="heading-check-out-financegpt">Check Out FinanceGPT</h2>
<p>All the code examples here came from <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>. If you want to see these patterns in a complete app, poke around the repo. It's got document processing, portfolio tracking, tax optimization – all built with LangGraph.</p>
<p>If you find this helpful, <a href="https://github.com/manojag115/FinanceGPT">give the project a star on GitHub</a> – it helps other developers discover it.</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><a href="https://ollama.com/docs">Ollama Documentation</a></p>
</li>
<li><p><a href="https://ollama.com/library">Ollama Model Library</a></p>
</li>
<li><p><a href="https://python.langchain.com/docs/integrations/chat/ollama/">LangChain Ollama Integration</a></p>
</li>
<li><p><a href="https://www.freecodecamp.org/news/how-to-develop-ai-agents-using-langgraph-a-practical-guide/">How to Build AI Agents with LangGraph (my previous article)</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build and Deploy an AI Agent with LangChain, FastAPI, and Sevalla ]]>
                </title>
                <description>
                    <![CDATA[ Artificial intelligence is changing how we build software. Just a few years ago, writing code that could talk, decide, or use external data felt hard. Today, thanks to new tools, developers can build smart agents that read messages, reason about them... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-ai-agent-with-langchain-fastapi-and-sevalla/</link>
                <guid isPermaLink="false">6960413b864205dd1936a070</guid>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ FastAPI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Thu, 08 Jan 2026 23:43:55 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767915474046/728b3bd5-2dfe-45a3-a2a9-c682e4719d7d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Artificial intelligence is changing how we build software. Just a few years ago, writing code that could talk, decide, or use external data felt hard.</p>
<p>Today, thanks to new tools, developers can build smart agents that read messages, reason about them, and call functions on their own.</p>
<p>One framework that makes this easy is <a target="_blank" href="https://github.com/langchain-ai/langchain">LangChain</a>. With LangChain, you can link language models, tools, and apps together. You can also wrap your agent inside a FastAPI server, then push it to a cloud platform for deployment.</p>
<p>This article will walk you through building your first AI agent. You will learn what LangChain is, how to build an agent, how to serve it through FastAPI, and how to deploy it on Sevalla.</p>
<h2 id="heading-what-well-cover">What We’ll Cover</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-langchain">What is LangChain?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-your-first-agent-with-langchain">How to Build Your First Agent with LangChain</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-wrapping-your-agent-with-fastapi">Wrapping Your Agent with FastAPI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-deploy-your-ai-agent-to-sevalla">How to Deploy Your AI Agent to Sevalla</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-langchain">What is LangChain?</h2>
<p>LangChain is a framework for working with large language models. It helps you build apps that think, reason, and act.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767629343581/a7f55a7e-f9fa-4d34-9ce5-666adf9cb93d.jpeg" alt="Langchain" class="image--center mx-auto" width="891" height="708" loading="lazy"></p>
<p>A model on its own only gives text replies, but LangChain lets it do more. It lets a model call functions, use tools, connect with databases, and follow workflows.</p>
<p>Think of LangChain as a bridge. On one side is the language model. On the other side are your tools, data sources, and business logic. LangChain tells the model what tools exist, when to use them, and how to reply. This makes it ideal for building agents that answer questions, automate tasks, or handle complex flows.</p>
<p>Many developers use LangChain because it is flexible. It supports many AI models. It fits well with Python.</p>
<p>LangChain also makes it easier to move from prototype to production. Once you learn how to create an agent, you can reuse the pattern for more advanced use cases.</p>
<p>I have recently published a detailed <a target="_blank" href="https://www.turingtalks.ai/p/langchain-tutorial">LangChain tutorial</a> here.</p>
<h2 id="heading-how-to-build-your-first-agent-with-langchain">How to Build Your First Agent with LangChain</h2>
<p>Let’s make our first agent. It will respond to user questions and <a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-your-first-mcp-server-using-fastmcp/">call a tool</a> when needed.</p>
<p>We’ll give it a simple weather tool, then ask it about the weather in a city. Before this, create a file called <code>.env</code> and add your OpenAI API key. Once <code>load_dotenv()</code> loads the file, LangChain will automatically use the key when making requests to OpenAI.</p>
<pre><code class="lang-python">OPENAI_API_KEY=&lt;key&gt;
</code></pre>
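<p>A missing key usually only shows up as an error on the first request, so it can help to fail early. Here is a minimal sketch of that check. The <code>require_env</code> helper is a hypothetical name used for illustration and is not part of LangChain or python-dotenv:</p>
<pre><code class="lang-python">import os

def require_env(name: str):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value

# Example usage (assumes load_dotenv() has already run):
# api_key = require_env("OPENAI_API_KEY")
</code></pre>
<p>Calling it right after <code>load_dotenv()</code> gives you a readable error before any agent code runs.</p>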
<p>Here is the code for our agent:</p>
<pre><code class="lang-python">
<span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> create_agent
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># load environment variables</span>
load_dotenv()

<span class="hljs-comment"># defining the tool that LLM can call</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_weather</span>(<span class="hljs-params">city: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""Get weather for a given city."""</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">f"It's always sunny in <span class="hljs-subst">{city}</span>!"</span>

<span class="hljs-comment"># Creating an agent</span>
agent = create_agent(
    model=<span class="hljs-string">"gpt-4o"</span>,
    tools=[get_weather],
    system_prompt=<span class="hljs-string">"You are a helpful assistant"</span>,
)

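result = agent.invoke({<span class="hljs-string">"messages"</span>:[{<span class="hljs-string">"role"</span>:<span class="hljs-string">"user"</span>,<span class="hljs-string">"content"</span>:<span class="hljs-string">"What is the weather in san francisco?"</span>}]})

<span class="hljs-comment"># the last message in the result holds the agent's final reply</span>
print(result[<span class="hljs-string">"messages"</span>][<span class="hljs-number">-1</span>].content)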
</code></pre>
<p>This small program shows the power of LangChain agents.</p>
<p>First, we import <code>create_agent</code>, which helps us build the agent. Then we write a function called <code>get_weather</code>. It takes a city name and returns a friendly sentence.</p>
<p>The function acts as our tool. A tool is something the agent can use. In real projects, tools might fetch prices, store notes, or call APIs.</p>
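<p>As a rough illustration of a slightly more realistic tool, here is a sketch of a price lookup. The <code>get_price</code> function and the <code>PRICES</code> table are hypothetical stand-ins for a real pricing API. The type hint and docstring matter because LangChain typically uses them to describe the tool to the model:</p>
<pre><code class="lang-python"># Hypothetical in-memory price table standing in for a real pricing API
PRICES = {"espresso": 3.25, "latte": 4.75}

def get_price(item: str):
    """Get the price in USD for a menu item."""
    key = item.strip().lower()
    if key not in PRICES:
        # A readable message lets the model explain the miss to the user
        return f"No price found for {item}."
    return f"{item} costs ${PRICES[key]:.2f}."

print(get_price("Latte"))  # prints: Latte costs $4.75.
</code></pre>
<p>Returning a plain string, even for the "not found" case, keeps the tool easy for the model to read and relay.</p>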
<p>Next, we call <code>create_agent</code>. We give it three things. We pass the model we want to use. We list the tools we want it to call. And we give a system prompt. The system prompt tells the agent who it is and how it should behave.</p>
<p>Finally, we run the agent. We call <code>invoke</code> with a message.</p>
<p>The user asks for the weather in San Francisco. The agent reads this message. It sees that the question needs the weather function. So it calls our tool <code>get_weather</code>, passes the city, and returns an answer.</p>
<p>Even though this example is tiny, it captures the main idea. The agent reads natural language, figures out what tool to use, and sends a reply.</p>
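<p>To make that read, decide, and reply loop concrete, here is a framework-free sketch. It is a simplified illustration and not how <code>create_agent</code> is implemented internally. The <code>fake_model</code> function is a hypothetical stand-in for the LLM that always asks for the weather tool once and then answers:</p>
<pre><code class="lang-python">def get_weather(city: str):
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

# Registry mapping tool names to plain Python callables
TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Hypothetical stand-in for an LLM: request the tool once, then answer."""
    last = messages[-1]
    if last["role"] == "tool":
        # A tool result is available, so produce the final reply
        return {"role": "assistant", "content": last["content"]}
    # Otherwise ask the runtime to call a tool with structured arguments
    return {
        "role": "assistant",
        "content": "",
        "tool_call": {"name": "get_weather", "args": {"city": "San Francisco"}},
    }

def run_agent(user_message, max_steps=5):
    """Loop until the model replies without requesting a tool."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        # Run the requested tool and feed its result back to the model
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What is the weather in san francisco?"))
</code></pre>
<p>A real agent replaces <code>fake_model</code> with an actual model call, but the shape of the loop stays the same: the model either asks for a tool or returns a final answer.</p>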
<p>Later, you can add more tools or replace the weather function with one that connects to a real API. But this is enough for us to wrap and deploy.</p>
<h2 id="heading-wrapping-your-agent-with-fastapi">Wrapping Your Agent with FastAPI</h2>
<p>The next step is to serve our agent. <a target="_blank" href="https://fastapi.tiangolo.com/">FastAPI</a> helps us expose our agent through an HTTP endpoint. That way, users and systems can call it through a URL, send messages, and get replies.</p>
<p>To begin, you install FastAPI and write a simple file like <code>main.py</code>. Inside it, you import FastAPI, load the agent, and write a route.</p>
<p>When someone posts a question, the API forwards it to the agent and returns the answer. The flow is simple.</p>
<p>The user talks to FastAPI. FastAPI talks to your agent. The agent thinks and replies. Here is the FastAPI wrapper for your agent:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI
<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel
<span class="hljs-keyword">import</span> uvicorn
<span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> create_agent
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">import</span> os

load_dotenv()

<span class="hljs-comment"># defining the tool that LLM can call</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_weather</span>(<span class="hljs-params">city: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""Get weather for a given city."""</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">f"It's always sunny in <span class="hljs-subst">{city}</span>!"</span>

<span class="hljs-comment"># Creating an agent</span>
agent = create_agent(
    model=<span class="hljs-string">"gpt-4o"</span>,
    tools=[get_weather],
    system_prompt=<span class="hljs-string">"You are a helpful assistant"</span>,
)

app = FastAPI()

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ChatRequest</span>(<span class="hljs-params">BaseModel</span>):</span>
    message: str

<span class="hljs-meta">@app.get("/")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">root</span>():</span>
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"message"</span>: <span class="hljs-string">"Welcome to your first agent"</span>}

<span class="hljs-meta">@app.post("/chat")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">chat</span>(<span class="hljs-params">request: ChatRequest</span>):</span>
    result = agent.invoke({<span class="hljs-string">"messages"</span>:[{<span class="hljs-string">"role"</span>:<span class="hljs-string">"user"</span>,<span class="hljs-string">"content"</span>:request.message}]})
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"reply"</span>: result[<span class="hljs-string">"messages"</span>][<span class="hljs-number">-1</span>].content}

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    port = int(os.getenv(<span class="hljs-string">"PORT"</span>, <span class="hljs-number">8000</span>))
    uvicorn.run(app, host=<span class="hljs-string">"0.0.0.0"</span>, port=port)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p>Here, FastAPI defines a <code>/chat</code> endpoint. When someone sends a message, the server calls our agent. The agent processes it as before. Then FastAPI returns a clean JSON reply. The API layer hides the complexity inside a simple interface.</p>
<p>At this point, you have a working agent server. You can run it on your machine, call it with Postman or cURL, and check responses. When this works, you are ready to deploy.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767629386493/e5699447-d82e-4c73-87f8-87cec2d7dac2.png" alt="Postman Result" class="image--center mx-auto" width="1000" height="593" loading="lazy"></p>
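<p>If you would rather test from Python than from Postman or cURL, here is a small sketch that uses only the standard library. The helper names <code>build_chat_request</code> and <code>send_chat</code> are assumptions made for illustration. The request body and the <code>reply</code> field match the <code>/chat</code> endpoint defined above:</p>
<pre><code class="lang-python">import json
import urllib.request

def build_chat_request(base_url: str, message: str):
    """Build a POST request for the /chat endpoint without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_chat(base_url: str, message: str):
    """Send the request and return the agent's reply text."""
    request = build_chat_request(base_url, message)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["reply"]

# Building the request does not need a running server
request = build_chat_request("http://localhost:8000", "What is the weather in Tokyo?")
print(request.get_method(), request.full_url)
</code></pre>
<p>With the server running, <code>send_chat("http://localhost:8000", "What is the weather in Tokyo?")</code> should return the agent's reply as a string. Because the base URL is a parameter, the same helper works against a deployed URL later.</p>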
<h2 id="heading-how-to-deploy-your-ai-agent-to-sevalla">How to Deploy Your AI Agent to Sevalla</h2>
<p>You can choose any cloud provider, like AWS, DigitalOcean, or others to host your agent. I will be using Sevalla for this example.</p>
<p><a target="_blank" href="https://sevalla.com/">Sevalla</a> is a developer-friendly PaaS provider. It offers application hosting, database, object storage, and static site hosting for your projects.</p>
<p>Every platform will charge you for creating a cloud resource. Sevalla comes with a $50 credit for us to use, so we won’t incur any costs for this example.</p>
<p>Let’s push this project to GitHub so that we can connect our repository to Sevalla. We can also enable auto-deployments so that any new change to the repository is automatically deployed.</p>
<p>You can also <a target="_blank" href="https://github.com/manishmshiva/first-agent-with-fastapi">fork my repository</a> from here.</p>
<p><a target="_blank" href="https://app.sevalla.com/login">Log in</a> to Sevalla and click on Applications -&gt; Create new application. You will see the option to link your GitHub repository to create a new application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767629443568/85e00d7f-c296-4bed-94ba-8e2e5bbdb0ba.png" alt="Create application" class="image--center mx-auto" width="1000" height="825" loading="lazy"></p>
<p>Use the default settings. Click “Create application”. Now we have to add our OpenAI API key to the environment variables. Click on the “Environment variables” section once the application is created, and save the <code>OPENAI_API_KEY</code> value as an environment variable.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767629507196/0ae254e2-00f6-46a1-8535-c3af006022c6.png" alt="Sevalla Environment Variables" class="image--center mx-auto" width="1000" height="293" loading="lazy"></p>
<p>Now we are ready to deploy our application. Click on “Deployments” and click “Deploy now”. It will take 2–3 minutes for the deployment to complete.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767629546289/cbdc2f5d-4902-4799-aed4-2177695748bc.png" alt="Sevalla Deployment" class="image--center mx-auto" width="1000" height="483" loading="lazy"></p>
<p>Once done, click on “Visit app”. You will see the application served via a URL ending with <code>sevalla.app</code>. This is your new root URL. You can replace <code>localhost:8000</code> with this URL and test in Postman.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767629568646/e849222d-0cb5-433f-a399-0e8a63d891d1.png" alt="Postman Response" class="image--center mx-auto" width="1000" height="592" loading="lazy"></p>
<p>Congrats! Your first AI agent with tool calling is now live. You can extend it by adding more tools and other capabilities. When you push your code to GitHub, Sevalla will automatically deploy the changes to production.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building AI agents is no longer a task for experts. With LangChain, you can write a few lines of code and create agents that reason, respond to users, and call functions on their own.</p>
<p>By wrapping the agent with FastAPI, you give it a doorway that apps and users can access. Finally, Sevalla makes it easy to push your agent live, monitor it, and run it in production.</p>
<p>This journey from agent idea to deployed service shows what modern AI development looks like. You start small. You explore tools. You wrap them and deploy them.</p>
<p>Then you iterate, add more capability, improve logic, and plug in real tools. Before long, you have a smart, living agent online. That is the power of this new wave of technology.</p>
<p><em>Hope you enjoyed this article. Sign up for my free newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><strong><em>visit my website</em></strong></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build an AI Agent with LangChain and LangGraph: Build an Autonomous Starbucks Agent ]]>
                </title>
                <description>
                    <![CDATA[ Back in 2023, when I started using ChatGPT, it was just another chatbot that I could ask complex questions to and it would identify errors in my code snippets. Everything was fine. The application had no memory of previous states or what was said the... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-starbucks-ai-agent-with-langchain/</link>
                <guid isPermaLink="false">69449a6dcd2a4eec1f27eb1b</guid>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nestjs ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jibril-M🍀 ]]>
                </dc:creator>
                <pubDate>Fri, 19 Dec 2025 00:21:01 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765630477745/8dffec85-c3c4-4d83-9aa4-f332439d4663.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Back in 2023, when I started using ChatGPT, it was just another chatbot that I could ask complex questions to and it would identify errors in my code snippets. Everything was fine. The application had no memory of previous states or what was said the day before.</p>
<p>Then in 2024, everything started to change. We went from a stateless chatbot to an AI agent that could call tools, search the internet, and generate download links.</p>
<p>At this point, I started to get curious. How can an LLM search the internet? An infinite number of questions were flowing through my head. Can it create its own tools, programs, or execute its own code? It felt like we were heading toward the Skynet (Terminator) revolution.</p>
<p>I was just ignorant 😅. But that's when I started my research and discovered LangChain, a tool that promises all those miracles without a billion-dollar budget.</p>
<p>In this article, you’ll build a fully functional AI agent using LangChain and LangGraph. You’ll start by defining structured data using Zod schemas, then parse them into a form the AI can understand. Next, you’ll learn about summarizing data into text, creating tools the agent can call, and setting up LangGraph nodes to orchestrate workflows.</p>
<p>You’ll see how to compile the workflow graph, manage state, and persist conversation history using MongoDB. By the end, you’ll have a working Starbucks barista AI that demonstrates how to combine reasoning, tool execution, and memory in a single agent.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-an-llm-agent">What is an LLM Agent?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-data-schematization-with-zod">Data Schematization with Zod</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-parse-the-schema">How to Parse the Schema</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-data-to-text-summarization">Data-to-Text Summarization</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-persist-orders-with-mongodb-in-nestjs">How to Persist Orders with MongoDB in NestJS</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-langgraph-stateannotation-terms">LangGraph State/Annotation Terms</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-tools-for-the-agent">How to Create Tools for the Agent</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-langgraph-nodes-workflow-components">LangGraph Nodes (Workflow Components)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-graph-declaration">Graph Declaration</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-workflow-compilation-and-state-persistence-final-part">Workflow Compilation and State Persistence (Final Part)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To take full advantage of this article, you should have a basic understanding of TypeScript and Node.js. Some familiarity with NestJS will also help, as it’s the backend framework we’ll be using.</p>
<h2 id="heading-what-is-an-llm-agent"><strong>What is an LLM Agent?</strong></h2>
<p>By definition, an LLM agent is a software program that’s capable of perceiving its environment, making decisions, and taking autonomous actions to achieve specific goals. It often does this by interacting with tools and systems.</p>
<p>Many frameworks and conventions were created to achieve this, and one of the most famous and widely used is the ReAct (Reason &amp; Act) framework.</p>
<p>With this framework, the LLM receives a prompt, thinks, decides the next action (this can be calling a specific tool), and receives the tool data. Once the tool’s response has been received, the AI model observes the response, generates its own response, and plans its next actions based on the tool’s response.</p>
<p>You can read more about this concept in the original <a target="_blank" href="https://arxiv.org/abs/2210.03629">ReAct paper</a>. Here’s a diagram that summarizes the entire process:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765064426716/b1e6d7b2-4e4b-43c4-af5c-9cd49b27a864.png" alt="Diagram illustrating an LLM agent workflow: the agent receives a prompt, reasons, decides an action (such as calling a tool), observes the tool’s response, generates its own response, and iteratively plans its next actions using the ReAct framework" class="image--center mx-auto" width="3015" height="1827" loading="lazy"></p>
<p>Note that the workflow is not limited to a single tool invocation – it can proceed through several rounds before returning to the user.</p>
<p>But for an LLM agent to be truly human-like and act with knowledge of the past, it requires a memory. This enables it to recall previous prompts and responses, maintaining consistency within the given thread.</p>
<p>There’s no single source of truth for how to approach this. Most agents implement a short-term memory. This means that the agent will append each new chat to the conversation history, and when a new prompt is submitted, the agent will append the previous messages to the new prompt.</p>
<p>This method is simple and gives the LLM a strong knowledge of previous states. But it can also introduce problems: as the conversation grows, the LLM has to go through every previous message to understand what action to take next.</p>
<p>And this can introduce some context drift, just like humans experience. You can’t watch a two-hour podcast and remember all the spoken words, right? In this scenario, the LLM will focus on the most relevant information, eventually losing some context.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765064542431/18b8d0a7-b9f1-4f7d-993d-76b3c4058ccf.png" alt="Illustration showing an LLM agent workflow with memory: the agent processes multiple rounds of prompts and tool interactions, maintains a short-term memory of previous conversations, and uses this context to decide actions, while older context may fade over time causing potential context drift." class="image--center mx-auto" width="3015" height="1827" loading="lazy"></p>
<p>You don’t have to implement this from scratch. Many tools and frameworks have been developed to make the implementation as easy as possible. You can build it from scratch if you want, of course, but we won’t be doing that here.</p>
<p>In this article, we’ll build a Starbucks barista that collects order information and calls a <code>create_order</code> tool once the order meets the full criteria. This is a tool that we’ll create and expose to the AI.</p>
<h2 id="heading-project-setup">Project Setup</h2>
<p>Let’s start by initializing our project. We’ll use NestJS for its efficiency and native TypeScript support. Note that nothing here is tied to NestJS. This is just a framework preference, and everything we’ll do here can be done with Node.js and Express.js.</p>
<p>Here is a list of all the tools that we’ll use:</p>
<ol>
<li><p><code>@langchain/core</code> - <strong>Always required</strong></p>
<p> This is the main LangChain engine that defines all core tools and fundamental functions, containing:</p>
<ul>
<li><p>prompt templates</p>
</li>
<li><p>message types</p>
</li>
<li><p>runnables</p>
</li>
<li><p>tool interfaces</p>
</li>
<li><p>chain composition utilities, and more.</p>
</li>
</ul>
</li>
</ol>
<p>    Most LangChain projects need this.</p>
<ol start="2">
<li><p><code>@langchain/google-genai</code> - This package is used to interact with Google’s generative AI models, vector embedding models, and other related tools.</p>
</li>
<li><p><code>@langchain/langgraph</code> - <strong>Important for building an AI agent with total control</strong></p>
<p> LangGraph is a low-level orchestration framework for building controllable agents. It can be used to build:</p>
<ul>
<li><p>Conversational agents.</p>
</li>
<li><p>Complex task automation.</p>
</li>
<li><p>Agent context management.</p>
</li>
</ul>
</li>
<li><p><code>@langchain/langgraph-checkpoint-mongodb</code> - This package provides a MongoDB-based checkpointer for LangGraph, enabling persistence of agent state and short-term memory using MongoDB.</p>
</li>
<li><p><code>@langchain/mongodb</code> - This package provides MongoDB integrations for LangChain, allowing you to:</p>
<ul>
<li><p>Store and retrieve vector embeddings.</p>
</li>
<li><p>Persist LangChain documents, agents, or memory states.</p>
</li>
<li><p>Easily integrate MongoDB as a database backend for your AI workflows.</p>
</li>
</ul>
</li>
<li><p><code>@nestjs/mongoose</code> - A NestJS wrapper around Mongoose for MongoDB. Provides:</p>
<ul>
<li><p>Dependency injection support for Mongoose models.</p>
</li>
<li><p>Simplified schema definition and model management.</p>
</li>
<li><p>Seamless integration of MongoDB into NestJS applications, enabling structured data persistence for AI apps or any backend.</p>
</li>
</ul>
</li>
<li><p><code>langchain</code> - This is the main npm package that aggregates LangChain functionality. It provides:</p>
<ul>
<li><p>Access to connectors, utilities, and core modules.</p>
</li>
<li><p>Easy import of different LangChain components in one place.</p>
</li>
<li><p>Commonly used alongside <code>@langchain/core</code> for building applications with minimal setup.</p>
</li>
</ul>
</li>
<li><p><code>mongodb</code> - The official MongoDB driver for Node.js. It provides:</p>
<ul>
<li><p>Low-level, flexible access to MongoDB databases.</p>
</li>
<li><p>Support for CRUD operations, transactions, and indexing.</p>
</li>
<li><p>A required dependency if you plan to connect LangChain components or your backend directly to MongoDB.</p>
</li>
</ul>
</li>
<li><p><code>mongoose</code> - An ODM (Object Data Modeling) library for MongoDB. Offers:</p>
<ul>
<li><p>Schema-based data modeling for MongoDB documents.</p>
</li>
<li><p>Middleware, validation, and hooks for MongoDB operations.</p>
</li>
<li><p>Ideal for structured data management in NestJS or other Node.js applications.</p>
</li>
</ul>
</li>
<li><p><code>zod</code> - A TypeScript-first schema validation library. Used for:</p>
<ul>
<li><p>Defining strict data schemas and validating inputs/outputs.</p>
</li>
<li><p>Ensuring type safety at runtime.</p>
</li>
<li><p>Useful in AI applications to validate responses from models or enforce data consistency.</p>
</li>
</ul>
</li>
</ol>
<p>Start by initializing your NestJS project and installing all the required dependencies:</p>
<pre><code class="lang-dart">$ npm i -g <span class="hljs-meta">@nestjs</span>/cli <span class="hljs-comment"># Only needed if the Nest CLI isn't installed on your machine</span>
$ nest <span class="hljs-keyword">new</span> project-name

<span class="hljs-string">"dependencies"</span> : {
    <span class="hljs-string">"@langchain/core"</span>: <span class="hljs-string">"^0.3.75"</span>,
    <span class="hljs-string">"@langchain/google-genai"</span>: <span class="hljs-string">"^0.2.16"</span>,
    <span class="hljs-string">"@langchain/langgraph"</span>: <span class="hljs-string">"^0.4.8"</span>,
    <span class="hljs-string">"@langchain/langgraph-checkpoint-mongodb"</span>: <span class="hljs-string">"^0.1.1"</span>,
    <span class="hljs-string">"@langchain/mongodb"</span>: <span class="hljs-string">"^0.1.0"</span>,
    <span class="hljs-string">"@nestjs/mongoose"</span>: <span class="hljs-string">"^11.0.3"</span>,
    <span class="hljs-string">"langchain"</span>: <span class="hljs-string">"^0.3.33"</span>,
    <span class="hljs-string">"mongodb"</span>: <span class="hljs-string">"^6.19.0"</span>,
    <span class="hljs-string">"mongoose"</span>: <span class="hljs-string">"^8.18.1"</span>,
    <span class="hljs-string">"zod"</span>: <span class="hljs-string">"^4.1.8"</span>
}

<span class="hljs-comment">// Versions may differ by the time you read this, so I recommend checking</span>
<span class="hljs-comment">// the official documentation for each package.</span>
</code></pre>
<p>Now that we have our project created and all the packages installed, let’s see what we need to do to turn our vision into a project. Think of what you’ll need in order to create a Starbucks barista:</p>
<ul>
<li><p>First, we need to define the structure of our data (creating schemas)</p>
</li>
<li><p>Then we need to create a menu list that our agent will be referring to.</p>
</li>
<li><p>After that, we’ll add LLM interaction</p>
</li>
<li><p>And last but not least, we’ll add the ability to save previous conversations for conversational context.</p>
</li>
</ul>
<h3 id="heading-folder-structure">Folder Structure</h3>
<p>You can modify this folder structure and adapt it based on your framework of choice. But the core implementation is the same across all frameworks.</p>
<pre><code class="lang-plaintext">├── .env
├── .eslintrc.js
├── .gitignore
├── .prettierrc
├── nest-cli.json
├── package.json
├── README.md
├── tsconfig.build.json
├── tsconfig.json
├── src/
│   ├── app.controller.ts
│   ├── app.module.ts
│   ├── app.service.ts
│   ├── main.ts
│   ├── chat/
│   │   ├── chat.controller.ts
│   │   ├── chat.module.ts
│   │   ├── chat.service.ts
│   │   └── dtos/
│   │       └── chat.dto.ts
│   ├── data/
│   │   └── schema/
│   │       └── order.schema.ts
│   └── util/
│       ├── constants/
│       │   └── drinks_data.ts
│       ├── schemas/
│       │   ├── drinks/
│       │   │   └── Drink.schema.ts
│       │   └── orders/
│       │       └── Order.schema.ts
│       ├── summeries/
│       │   └── drink.ts
│       └── types/
</code></pre>
<h2 id="heading-data-schematization-with-zod">Data Schematization with Zod</h2>
<p>In this section, we’ll define the schemas for drinks and every modification they can receive. These schemas define the structure of the data that the AI agent will use.</p>
<h3 id="heading-importing-zod"><strong>Importing Zod</strong></h3>
<p>In the <code>src/util/schemas/drinks/Drink.schema.ts</code> file, before defining any schemas, import the Zod library, which provides tools for building TypeScript-first schemas.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Imports the 'z' object from the 'zod' library.</span>
<span class="hljs-comment">// Zod is a TypeScript-first schema declaration and validation library.</span>
<span class="hljs-comment">// 'z' is the primary object used to define schemas (e.g., z.object, z.string, z.boolean, z.array).</span>
<span class="hljs-keyword">import</span> z <span class="hljs-keyword">from</span> <span class="hljs-string">"zod"</span>;
</code></pre>
<p>Zod gives you a simple and expressive way to define and validate the structure of the data our agent will interact with.</p>
<h3 id="heading-drink-schema"><strong>Drink Schema</strong></h3>
<p>This schema represents the structure of a drink in the Starbucks-style menu. Each field is commented so you can see what each property controls.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DrinkSchema = z.object({
  name: z.string(),            <span class="hljs-comment">// Required name of the drink</span>
  description: z.string(),     <span class="hljs-comment">// Required explanation of what the drink is</span>
  supportMilk: z.boolean(),    <span class="hljs-comment">// Whether milk options are available</span>
  supportSweeteners: z.boolean(), <span class="hljs-comment">// Whether sweeteners can be added</span>
  supportSyrup: z.boolean(),   <span class="hljs-comment">// Whether flavor syrups are allowed</span>
  supportTopping: z.boolean(), <span class="hljs-comment">// Whether toppings are supported</span>
  supportSize: z.boolean(),    <span class="hljs-comment">// Whether the drink can be ordered in sizes</span>
  image: z.string().url().optional(), <span class="hljs-comment">// Optional image URL</span>
});
</code></pre>
<h3 id="heading-what-this-schema-represents"><strong>What this schema represents</strong></h3>
<ul>
<li><p>It ensures every drink has a proper name and a description.</p>
</li>
<li><p>It defines which customizations apply to the drink.</p>
</li>
<li><p>It prepares the agent to reason about drink options in a structured, validated format.</p>
</li>
</ul>
<h3 id="heading-sweetener-schema"><strong>Sweetener Schema</strong></h3>
<p>Each sweetener option in the menu is represented with its own schema.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> SweetenerSchema = z.object({
  name: z.string(),                <span class="hljs-comment">// Sweetener name</span>
  description: z.string(),         <span class="hljs-comment">// What it is / taste description</span>
  image: z.string().url().optional(), <span class="hljs-comment">// Optional image URL</span>
});
</code></pre>
<p>This ensures consistency across all sweetener entries and avoids malformed data.</p>
<h3 id="heading-syrup-schema"><strong>Syrup Schema</strong></h3>
<p>Similar to sweeteners, but for syrup flavors:</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> SyrupSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});
</code></pre>
<p>This can represent flavors like Vanilla, Caramel, or Hazelnut.</p>
<h3 id="heading-topping-schema"><strong>Topping Schema</strong></h3>
<p>Toppings such as whipped cream or cinnamon are defined here.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ToppingSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});
</code></pre>
<h3 id="heading-size-schema"><strong>Size Schema</strong></h3>
<p>Drink sizes are modeled as objects as well:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> SizeSchema = z.object({
  name: z.string(),               <span class="hljs-comment">// e.g. Small, Medium</span>
  description: z.string(),        <span class="hljs-comment">// A short explanation</span>
  image: z.string().url().optional(),
});
</code></pre>
<h3 id="heading-milk-schema"><strong>Milk Schema</strong></h3>
<p>Represents milk types such as Whole, Skim, Almond, or Oat.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> MilkSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});
</code></pre>
<h3 id="heading-collections-of-items"><strong>Collections of Items</strong></h3>
<p>Now that the individual item schemas exist, we can create <strong>collections</strong> of them. These represent all available toppings, sizes, milk types, syrups, sweeteners, and the entire menu of drinks.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ToppingsSchema = z.array(ToppingSchema);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> SizesSchema = z.array(SizeSchema);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> MilksSchema = z.array(MilkSchema);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> SyrupsSchema = z.array(SyrupSchema);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> SweetenersSchema = z.array(SweetenerSchema);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DrinksSchema = z.array(DrinkSchema);
</code></pre>
<p>Why arrays? Because in the real world, your agent will receive <strong>lists</strong> from a database or API—not single items.</p>
<h3 id="heading-inferred-types"><strong>Inferred Types</strong></h3>
<p>Zod also allows TypeScript to infer types from schemas automatically.</p>
<p>This ensures:</p>
<ul>
<li><p>TypeScript types always match the schemas.</p>
</li>
<li><p>You avoid duplicated definitions.</p>
</li>
<li><p>The agent code stays consistent and safe.</p>
</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Drink = z.infer&lt;<span class="hljs-keyword">typeof</span> DrinkSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> SupportSweetener = z.infer&lt;<span class="hljs-keyword">typeof</span> SweetenerSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Syrup = z.infer&lt;<span class="hljs-keyword">typeof</span> SyrupSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Topping = z.infer&lt;<span class="hljs-keyword">typeof</span> ToppingSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Size = z.infer&lt;<span class="hljs-keyword">typeof</span> SizeSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Milk = z.infer&lt;<span class="hljs-keyword">typeof</span> MilkSchema&gt;;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Toppings = z.infer&lt;<span class="hljs-keyword">typeof</span> ToppingsSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Sizes = z.infer&lt;<span class="hljs-keyword">typeof</span> SizesSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Milks = z.infer&lt;<span class="hljs-keyword">typeof</span> MilksSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Syrups = z.infer&lt;<span class="hljs-keyword">typeof</span> SyrupsSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Sweeteners = z.infer&lt;<span class="hljs-keyword">typeof</span> SweetenersSchema&gt;;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Drinks = z.infer&lt;<span class="hljs-keyword">typeof</span> DrinksSchema&gt;;
</code></pre>
<p>These provide the rest of your LangChain/LangGraph code with strong typing based on your schema definitions.</p>
<p>This entire file:</p>
<ul>
<li><p>Encodes all drink-related data structures.</p>
</li>
<li><p>Provides validation to ensure clean, predictable data.</p>
</li>
<li><p>Automatically generates TypeScript types.</p>
</li>
<li><p>Helps the AI agent reason reliably about drinks and customization options.</p>
</li>
</ul>
<p>You’ll use these schemas later and convert them into string representations for LLM prompts.</p>
<p><em>You can find the file containing all the code</em> <a target="_blank" href="https://github.com/DjibrilM/langgraph-starbucks-agent/blob/main/src/lib/schemas/drinks.ts"><em>here</em></a><em>.</em></p>
<h2 id="heading-how-to-parse-the-schema">How to Parse the Schema</h2>
<p>As mentioned earlier, LLMs are <strong>text input–output machines</strong>. They don’t understand TypeScript types or Zod schemas directly. If you include a schema inside a prompt, the model will simply see it as plain text without understanding its structure or constraints.</p>
<p>Because of this, we need a way to convert schemas into a readable string format that can be embedded inside a prompt, such as:</p>
<blockquote>
<p>“The output must be a JSON object with the following fields…”</p>
</blockquote>
<p>This is exactly the problem solved by <code>StructuredOutputParser</code> from <code>langchain/output_parsers</code>. It takes a Zod schema and turns it into:</p>
<ul>
<li><p>A human-readable description that can be sent to an LLM.</p>
</li>
<li><p>A validator that checks whether the model’s output matches the schema.</p>
</li>
</ul>
<p>In short, it acts as a bridge between typed application logic and text-based AI output.</p>
<h3 id="heading-defining-the-order-schema">Defining the Order Schema</h3>
<p>We’ll start with a simple Zod schema that represents a customer’s drink order. This schema defines the exact shape and constraints of the data we expect the model to produce.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> OrderSchema = z.object({
  drink: z.string(),
  size: z.string(),
  milk: z.string(),
  syrup: z.string(),
  sweeteners: z.string(),
  toppings: z.string(),
  quantity: z.number().min(<span class="hljs-number">1</span>).max(<span class="hljs-number">10</span>),
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> OrderType = z.infer&lt;<span class="hljs-keyword">typeof</span> OrderSchema&gt;;
</code></pre>
<p>At this point, the schema is useful only inside our TypeScript application. The LLM still has no idea what this structure means.</p>
<h3 id="heading-parsing-the-schema-into-human-readable-text">Parsing the Schema into Human-Readable Text</h3>
<p>This is where schema parsing comes in. Using <code>StructuredOutputParser.fromZodSchema</code>, we can transform the Zod schema into:</p>
<ul>
<li><p>Instructions the LLM can understand.</p>
</li>
<li><p>A runtime validator that ensures the response is correct.</p>
</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> OrderParser =
  StructuredOutputParser.fromZodSchema(OrderSchema <span class="hljs-keyword">as</span> <span class="hljs-built_in">any</span>);
</code></pre>
<p>The parser enables two critical workflows:</p>
<h4 id="heading-generating-prompt-instructions">Generating prompt instructions</h4>
<p>Calling <code>OrderParser.getFormatInstructions()</code> returns a text description of the schema. Its meaning is roughly: “Return a JSON object with the fields <code>drink</code>, <code>size</code>, <code>milk</code>, <code>syrup</code>, <code>sweeteners</code>, and <code>toppings</code> as strings, and <code>quantity</code> as a number between 1 and 10.” This string can be injected directly into your prompt so the LLM knows exactly how to format its response.</p>
<h4 id="heading-validating-the-models-output">Validating the model’s output</h4>
<p>After the LLM responds, its output is still just text. The parser:</p>
<ul>
<li><p>Converts that text into a JavaScript object.</p>
</li>
<li><p>Validates it against the original Zod schema.</p>
</li>
<li><p>Throws an error if anything is missing, malformed, or out of bounds.</p>
</li>
</ul>
<p>This prevents invalid AI-generated data (for example, <code>quantity: 0</code>) from entering your system.</p>
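<p>To make the three validation steps above concrete, here is a minimal, dependency-free sketch of the kind of checks involved. It is not LangChain's implementation: <code>parseOrderText</code> and <code>ParsedOrder</code> are hypothetical names, and only two fields are checked to keep it short. In the real application, <code>OrderParser</code> derives these checks from the Zod schema.</p>

```typescript
// Hypothetical, hand-rolled stand-in for what a structured output parser does:
// turn the model's text into an object, then reject anything off-schema.
type ParsedOrder = { drink: string; quantity: number };

const parseOrderText = (text: string): ParsedOrder => {
  // Step 1: the model's reply is only text, so parse it as JSON first.
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Expected a JSON object");
  }
  const candidate = raw as { drink?: unknown; quantity?: unknown };

  // Step 2: check types, mirroring z.string() and z.number().
  if (typeof candidate.drink !== "string") {
    throw new Error("drink must be a string");
  }
  if (typeof candidate.quantity !== "number") {
    throw new Error("quantity must be a number");
  }

  // Step 3: check bounds, mirroring .min(1).max(10).
  if (1 > candidate.quantity || candidate.quantity > 10) {
    throw new Error("quantity must be between 1 and 10");
  }

  return { drink: candidate.drink, quantity: candidate.quantity };
};

// A well-formed reply passes; an out-of-bounds quantity is rejected.
const validOrder = parseOrderText('{"drink": "Espresso", "quantity": 2}');

let rejectedReason = "";
try {
  parseOrderText('{"drink": "Espresso", "quantity": 0}');
} catch (error) {
  rejectedReason = (error as Error).message;
}
console.log(validOrder.drink, rejectedReason);
```

<p>The point of the sketch is the order of operations: parse first, then check types, then check bounds, and fail loudly rather than pass questionable data downstream.</p>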
<h3 id="heading-reusing-the-same-approach-for-other-schemas">Reusing the Same Approach for Other Schemas</h3>
<p>Once you understand this pattern, applying it to other schemas is straightforward.</p>
<p>For example, you can do the same thing for a <code>DrinkSchema</code>:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> DrinkParser =
  StructuredOutputParser.fromZodSchema(DrinkSchema <span class="hljs-keyword">as</span> <span class="hljs-built_in">any</span>);
</code></pre>
<p>Now you can confidently say something like: “Hey Gemini, this is what a drink object looks like—please respond using this structure.”</p>
<h3 id="heading-why-this-matters">Why This Matters</h3>
<p>Schema parsing allows you to:</p>
<ul>
<li><p>Keep strong typing in your application.</p>
</li>
<li><p>Give clear formatting instructions to the LLM.</p>
</li>
<li><p>Safely convert unstructured AI output into validated, production-ready data.</p>
</li>
</ul>
<p>Without this step, working with LLMs at scale becomes unreliable and error-prone.</p>
<h2 id="heading-data-to-text-summarization">Data-to-Text Summarization</h2>
<p>In the context of LLM agents, <strong>data-to-text summarization</strong> means converting structured data—such as objects returned from a database or backend API—into <strong>clear, human-readable strings</strong> that can be embedded directly into prompts.</p>
<p>Even the most advanced LLMs operate purely on text. They don’t reason over JavaScript objects, database rows, or JSON structures in the same way humans or programs do. The clearer and more descriptive your text input is, the more accurate and reliable the model’s output will be.</p>
<p>Because of this, a common and recommended pattern when building LLM-powered systems is:</p>
<p><strong>Fetch structured data → summarize it into natural language → pass the summary into the prompt</strong></p>
<p>To keep this article focused, we’ll store our data in constants instead of querying a real database. The technique is exactly the same whether the data comes from MongoDB, PostgreSQL, or an API.</p>
<h3 id="heading-the-core-idea">The Core Idea</h3>
<p>The goal of data-to-text summarization is simple:</p>
<ul>
<li><p>Take an object with fields and boolean flags</p>
</li>
<li><p>Convert it into a short paragraph that explains what the object represents</p>
</li>
<li><p>Remove ambiguity and guesswork for the LLM</p>
</li>
</ul>
<p>Instead of forcing the model to infer meaning from raw data, we <em>spell it out explicitly</em>.</p>
<h3 id="heading-summarizing-a-drink-object">Summarizing a Drink Object</h3>
<p>Consider the following drink object:</p>
<pre><code class="lang-typescript">{
  name: <span class="hljs-string">'Espresso'</span>,
  description: <span class="hljs-string">'Strong concentrated coffee shot.'</span>,
  supportMilk: <span class="hljs-literal">false</span>,
  supportSweeteners: <span class="hljs-literal">true</span>,
  supportSyrup: <span class="hljs-literal">true</span>,
  supportTopping: <span class="hljs-literal">false</span>,
  supportSize: <span class="hljs-literal">false</span>,
}
</code></pre>
<p>While this structure is easy for developers to understand, it’s not ideal for an LLM prompt. Boolean flags like <code>supportMilk: false</code> require interpretation, which increases the chance of incorrect assumptions.</p>
<p>Instead, we convert this object into a descriptive paragraph:</p>
<p>“A drink named Espresso. It is described as a strong, concentrated coffee shot. It cannot be made with milk. It can be made with sweeteners. It can be made with syrup. It cannot be made with toppings. It cannot be made in different sizes.”</p>
<p>This transformation is exactly what data-to-text summarization provides.</p>
<h3 id="heading-a-standard-summarization-pattern">A Standard Summarization Pattern</h3>
<p>Below is a simplified example of how we convert a <code>Drink</code> object into a readable description.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> createDrinkItemSummary = (drink: Drink): <span class="hljs-function"><span class="hljs-params">string</span> =&gt;</span> {
  <span class="hljs-keyword">const</span> name = <span class="hljs-string">`A drink named <span class="hljs-subst">${drink.name}</span>.`</span>;
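  <span class="hljs-comment">// Drop a trailing period from the description so the sentence does not end in "..".</span>
  <span class="hljs-keyword">const</span> description = <span class="hljs-string">`It is described as <span class="hljs-subst">${drink.description.replace(/\.$/, '')}</span>.`</span>;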

  <span class="hljs-keyword">const</span> milk = drink.supportMilk
    ? <span class="hljs-string">'It can be made with milk.'</span>
    : <span class="hljs-string">'It cannot be made with milk.'</span>;

  <span class="hljs-keyword">const</span> sweeteners = drink.supportSweeteners
    ? <span class="hljs-string">'It can be made with sweeteners.'</span>
    : <span class="hljs-string">'It cannot be made with sweeteners.'</span>;

  <span class="hljs-keyword">const</span> syrup = drink.supportSyrup
    ? <span class="hljs-string">'It can be made with syrup.'</span>
    : <span class="hljs-string">'It cannot be made with syrup.'</span>;

  <span class="hljs-keyword">const</span> toppings = drink.supportTopping
    ? <span class="hljs-string">'It can be made with toppings.'</span>
    : <span class="hljs-string">'It cannot be made with toppings.'</span>;

  <span class="hljs-keyword">const</span> size = drink.supportSize
    ? <span class="hljs-string">'It can be made in different sizes.'</span>
    : <span class="hljs-string">'It cannot be made in different sizes.'</span>;

  <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${name}</span> <span class="hljs-subst">${description}</span> <span class="hljs-subst">${milk}</span> <span class="hljs-subst">${sweeteners}</span> <span class="hljs-subst">${syrup}</span> <span class="hljs-subst">${toppings}</span> <span class="hljs-subst">${size}</span>`</span>;
};
</code></pre>
<h3 id="heading-why-this-works-well-for-llms">Why this works well for LLMs</h3>
<ul>
<li><p>Boolean logic is converted into <strong>explicit sentences</strong></p>
</li>
<li><p>Every capability and limitation is clearly stated</p>
</li>
<li><p>The output can be embedded directly into a system or user prompt</p>
</li>
</ul>
<h3 id="heading-summarizing-collections-of-data">Summarizing Collections of Data</h3>
<p>This same approach applies to lists of data such as milks, syrups, toppings, or sizes. Instead of passing an array of objects to the model, we convert them into bullet-style text summaries:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> createSweetenersSummary = (): <span class="hljs-function"><span class="hljs-params">string</span> =&gt;</span> {
  <span class="hljs-keyword">return</span> <span class="hljs-string">`Available sweeteners are:
<span class="hljs-subst">${SWEETENERS.map(
  (s) =&gt; <span class="hljs-string">`- <span class="hljs-subst">${s.name}</span>: <span class="hljs-subst">${s.description}</span>`</span>
).join(<span class="hljs-string">'\n'</span>)}</span>`</span>;
};
</code></pre>
<p>This gives the model a <strong>complete, readable overview</strong> of available options without requiring it to interpret raw arrays.</p>
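<p>Since milks, syrups, toppings, and sizes all share the same <code>name</code> and <code>description</code> fields, one possible refinement is a single reusable helper. The sketch below is an assumption rather than code from the project: <code>MenuOption</code>, <code>SAMPLE_MILKS</code>, and <code>createOptionsSummary</code> are hypothetical names, and the sample data is invented for illustration.</p>

```typescript
// Hypothetical menu item shape and sample data, for illustration only.
type MenuOption = { name: string; description: string };

const SAMPLE_MILKS: MenuOption[] = [
  { name: "Whole", description: "Classic full-fat dairy milk." },
  { name: "Oat", description: "Creamy plant-based milk." },
];

// One reusable helper instead of one summary function per collection.
const createOptionsSummary = (label: string, options: MenuOption[]): string => {
  // Each option becomes one bullet-style line of plain text.
  const lines = options.map((option) => `- ${option.name}: ${option.description}`);
  return [`Available ${label} are:`, ...lines].join("\n");
};

const milksSummary = createOptionsSummary("milks", SAMPLE_MILKS);
console.log(milksSummary);
// Available milks are:
// - Whole: Classic full-fat dairy milk.
// - Oat: Creamy plant-based milk.
```

<p>Whether you keep one function per collection or share a helper like this is a matter of taste. Separate functions leave room for collection-specific wording.</p>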
<h3 id="heading-applying-the-same-idea-to-other-domains">Applying the Same Idea to Other Domains</h3>
<p>This pattern is not limited to drinks or menus. It works for <em>any</em> domain. For example, here’s the same summarization technique applied to an object representing a shoe in an online ordering assistant:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> createShoeItemSummary = (shoe: {
  name: <span class="hljs-built_in">string</span>;
  description: <span class="hljs-built_in">string</span>;
  genderCategory: <span class="hljs-built_in">string</span>;
  styleType: <span class="hljs-built_in">string</span>;
  material: <span class="hljs-built_in">string</span>;
  availableInMultipleColors: <span class="hljs-built_in">boolean</span>;
  limitedEdition: <span class="hljs-built_in">boolean</span>;
  supportsCustomization: <span class="hljs-built_in">boolean</span>;
}): <span class="hljs-function"><span class="hljs-params">string</span> =&gt;</span> {
  <span class="hljs-keyword">return</span> <span class="hljs-string">`
A shoe named <span class="hljs-subst">${shoe.name}</span>.
It is described as <span class="hljs-subst">${shoe.description}</span>.
It is categorized as a <span class="hljs-subst">${shoe.genderCategory.toLowerCase()}</span> shoe.
It belongs to the <span class="hljs-subst">${shoe.styleType.toLowerCase()}</span> fashion style.
It is made of <span class="hljs-subst">${shoe.material.toLowerCase()}</span> material.
<span class="hljs-subst">${shoe.availableInMultipleColors ? <span class="hljs-string">'It is available in multiple colors.'</span> : <span class="hljs-string">'It is available in a single color.'</span>}</span>
<span class="hljs-subst">${shoe.limitedEdition ? <span class="hljs-string">'It is a limited-edition release.'</span> : <span class="hljs-string">'It is not a limited-edition release.'</span>}</span>
<span class="hljs-subst">${shoe.supportsCustomization ? <span class="hljs-string">'It supports customization options.'</span> : <span class="hljs-string">'It does not support customization options.'</span>}</span>
`</span>.trim();
};
</code></pre>
<p>Which produces an output like:</p>
<p>“A shoe named Veloria Canvas Sneaker. It is described as a minimalist everyday sneaker designed for casual wear. It is categorized as a unisex shoe. It belongs to the casual fashion style. It is made of breathable canvas material. It is available in multiple colors. It is not a limited-edition release. It supports customization options.”</p>
<h2 id="heading-how-to-persist-orders-with-mongodb-in-nestjs">How to Persist Orders with MongoDB in NestJS</h2>
<p>Now that we’ve established the core foundations of our application—schemas, parsers, and data-to-text summaries—it’s time to <strong>persist data</strong>. In a real-world assistant, orders and conversations shouldn’t disappear when the server restarts. They need to be stored reliably so they can be retrieved, analyzed, or continued later.</p>
<p>To achieve this, we’ll use MongoDB as our database and the NestJS Mongoose integration to manage data models and collections.</p>
<h3 id="heading-connecting-mongodb-to-a-nestjs-application">Connecting MongoDB to a NestJS Application</h3>
<p>In NestJS, the <code>AppModule</code> is the root module of the application. This is where global dependencies—such as database connections—are configured.</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@Module</span>({
  imports: [
    MongooseModule.forRoot(process.env.MONGO_URI),
    ChatsModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> AppModule {}
</code></pre>
<p>What’s happening here?</p>
<ul>
<li><p><code>MongooseModule.forRoot(...)</code> establishes a global MongoDB connection.</p>
</li>
<li><p>The connection string is read from an environment variable (<code>MONGO_URI</code>), which is the recommended practice for security.</p>
</li>
<li><p>Once configured, this connection becomes available throughout the entire application.</p>
</li>
<li><p><code>ChatsModule</code> is imported so it can access the database connection and register its own schemas.</p>
</li>
</ul>
<p>This setup ensures that every feature module can safely interact with MongoDB without creating multiple connections.</p>
<h3 id="heading-defining-an-order-schema-with-mongoose">Defining an Order Schema with Mongoose</h3>
<p>NestJS uses decorators to define MongoDB schemas in a clean, class-based way. Each class represents a MongoDB document, and each property becomes a field in the collection.</p>
<pre><code class="lang-typescript"><span class="hljs-meta">@Schema</span>()
<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> Order {
  <span class="hljs-meta">@Prop</span>({ required: <span class="hljs-literal">true</span> })
  drink: <span class="hljs-built_in">string</span>;

  <span class="hljs-meta">@Prop</span>({ <span class="hljs-keyword">default</span>: <span class="hljs-literal">null</span> })
  size: <span class="hljs-built_in">string</span>;

  <span class="hljs-meta">@Prop</span>({ <span class="hljs-keyword">default</span>: <span class="hljs-literal">null</span> })
  milk: <span class="hljs-built_in">string</span>;

  <span class="hljs-meta">@Prop</span>({ <span class="hljs-keyword">default</span>: <span class="hljs-literal">null</span> })
  syrup: <span class="hljs-built_in">string</span>;

  <span class="hljs-meta">@Prop</span>({ <span class="hljs-keyword">default</span>: <span class="hljs-literal">null</span> })
  sweeteners: <span class="hljs-built_in">string</span>;

  <span class="hljs-meta">@Prop</span>({ <span class="hljs-keyword">default</span>: <span class="hljs-literal">null</span> })
  toppings: <span class="hljs-built_in">string</span>;

  <span class="hljs-meta">@Prop</span>({ <span class="hljs-keyword">default</span>: <span class="hljs-number">1</span> })
  quantity: <span class="hljs-built_in">number</span>;
}
</code></pre>
<p>Why this approach?</p>
<ul>
<li><p>Each <code>@Prop()</code> decorator maps directly to a MongoDB field.</p>
</li>
<li><p>Default values allow partial orders to be saved incrementally.</p>
</li>
<li><p>Required fields (like <code>drink</code>) enforce basic data integrity.</p>
</li>
<li><p>The schema closely mirrors the structured output produced by the LLM.</p>
</li>
</ul>
<p>Once the class is defined, it’s converted into a MongoDB schema:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> OrderSchema = SchemaFactory.createForClass(Order);
</code></pre>
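<p>This single line produces a Mongoose schema that:</p>
<ul>
<li><p>Describes the shape of the documents stored in the orders collection</p>
</li>
<li><p>Carries the rules declared with <code>@Prop()</code>, such as <code>required</code> and <code>default</code></p>
</li>
<li><p>Can be registered as a model so Mongoose can create, read, and update orders</p>
</li>
</ul>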
<h3 id="heading-how-this-fits-into-the-llm-agent-architecture">How This Fits into the LLM Agent Architecture</h3>
<p>At this point, we have:</p>
<ul>
<li><p><strong>Zod schemas</strong> → for validating AI output</p>
</li>
<li><p><strong>Summarization functions</strong> → for converting data into readable prompts</p>
</li>
<li><p><strong>MongoDB schemas</strong> → for persisting finalized orders</p>
</li>
</ul>
<p>This separation is intentional:</p>
<ul>
<li><p>Zod handles <em>AI-facing validation</em></p>
</li>
<li><p>Mongoose handles <em>database persistence</em></p>
</li>
<li><p>NestJS acts as the glue that ties everything together</p>
</li>
</ul>
<h3 id="heading-preparing-for-the-agent-logic">Preparing for the Agent Logic</h3>
<p>With the database in place, we’re now ready to implement the agent itself.</p>
<p>The agent’s responsibilities will include:</p>
<ul>
<li><p>Interpreting user messages</p>
</li>
<li><p>Calling tools</p>
</li>
<li><p>Generating structured orders</p>
</li>
<li><p>Validating them</p>
</li>
<li><p>Persisting them to MongoDB</p>
</li>
<li><p>Maintaining conversational state</p>
</li>
</ul>
<p>All of this logic will live inside the <code>src/chats/chats.service.ts</code> file. The next section introduces the <strong>agent’s core logic</strong>, and we’ll walk through it step by step so every part is easy to follow.</p>
<p>Start by importing the required dependencies:</p>
<pre><code class="lang-tsx">
import { Injectable } from '@nestjs/common';
import { InjectModel } from '@nestjs/mongoose';
import { MongoClient } from 'mongodb';
import { Model } from 'mongoose';

import { tool } from '@langchain/core/tools';
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from '@langchain/core/prompts';
import { AIMessage, BaseMessage, HumanMessage } from '@langchain/core/messages';

import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { StateGraph } from '@langchain/langgraph';
import { ToolNode } from '@langchain/langgraph/prebuilt';
import { Annotation } from '@langchain/langgraph';
import { START, END } from '@langchain/langgraph';

import { MongoDBSaver } from '@langchain/langgraph-checkpoint-mongodb';

import z from 'zod';

import { Order } from './schemas/order.schema';
import { OrderParser, OrderSchema, OrderType } from 'src/lib/schemas/orders';
import { DrinkParser } from 'src/lib/schemas/drinks';
import { DRINKS } from 'src/lib/utils/constants/menu_data';

import {
  createSweetenersSummary,
  availableToppingsSummary,
  createAvailableMilksSummary,
  createSyrupsSummary,
  createSizesSummary,
  createDrinkItemSummary,
} from 'src/lib/summaries';

const GOOGLE_API_KEY = process.env.GOOGLE_API_KEY || '';
const client: MongoClient = new MongoClient(process.env.MONGO_URI || '');
const database_name = 'drinks_db';
</code></pre>
<h2 id="heading-langgraph-stateannotation-terms">LangGraph State/Annotation Terms</h2>
<p>In LangGraph, <strong>state</strong> can be thought of as a temporary workspace that exists while the agent is running. It stores everything that nodes (we’ll cover nodes in detail later) might need, such as the last message, the history of the conversation, or any intermediate data generated during execution.</p>
<p>This state allows nodes to <strong>read from it, update it, and pass information along</strong> as the agent processes a workflow, making it the agent’s short-term memory for the duration of the run.</p>
<pre><code class="lang-tsx">@Injectable()
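export class ChatService {
  // Assumed wiring: the order tool defined later calls this.orderModel.
  constructor(
    @InjectModel(Order.name) private readonly orderModel: Model&lt;Order&gt;,
  ) {}
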
  chatWithAgent = async ({
    thread_id,
    query,
  }: {
    thread_id: string;
    query: string;
  }) =&gt; {

    const graphState = Annotation.Root({
      messages: Annotation&lt;BaseMessage[]&gt;({
        reducer: (x, y) =&gt; [...x, ...y],
      }),
    });

  }

}
</code></pre>
<p>This code defines the <strong>LangGraph state</strong> for the chat agent. The <code>graphState</code> object acts as a central memory that every node in the workflow can read from and update.</p>
<p>The <code>messages</code> field specifically stores all messages in the conversation, including user messages, AI responses, and tool outputs. The reducer function <code>[...x, ...y]</code> appends new messages to the existing array, preserving the conversation history across multiple steps.</p>
<p>LangGraph’s reducer mechanism lets developers control how new state merges with old state. In this chat system, the approach is similar to updating React state with <code>setMessages(prev =&gt; [...prev, ...newMessages])</code>: it keeps the old messages while adding the new ones.</p>
<p>Together, this state enables the agent, tools, and checkpointing system to maintain a coherent conversation, allowing each node in the LangGraph workflow to access the full context and contribute incrementally.</p>
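<p>The reducer's behavior can be seen in isolation with plain arrays. This is a small sketch outside LangGraph: <code>ChatMessage</code> is a simplified, hypothetical stand-in for <code>BaseMessage</code>, and <code>appendMessages</code> is just a named copy of the reducer shown above.</p>

```typescript
// Simplified stand-in for BaseMessage; the real class carries more fields.
type ChatMessage = { role: "human" | "ai" | "tool"; content: string };

// Same logic as the reducer passed to Annotation above.
const appendMessages = (existing: ChatMessage[], incoming: ChatMessage[]): ChatMessage[] => [
  ...existing,
  ...incoming,
];

// Each step contributes only its new messages; the reducer merges them into the history.
let messages: ChatMessage[] = [];
messages = appendMessages(messages, [{ role: "human", content: "One latte, please." }]);
messages = appendMessages(messages, [{ role: "ai", content: "Which size would you like?" }]);

console.log(messages.length); // 2
console.log(messages.map((m) => m.role).join(" -> ")); // human -> ai
```

<p>Because the reducer returns a new array instead of mutating the old one, earlier snapshots of the conversation stay intact.</p>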
<h2 id="heading-how-to-create-tools-for-the-agent">How to Create Tools for the Agent</h2>
<p>Modern chatbots can do more than just generate text - they can also search the internet, read files, or perform computations. While LLMs are powerful, they cannot execute code or compile programs on their own.</p>
<p>In the context of LLM agents, a tool is a piece of code written by the agent developer that an LLM can invoke on the host machine. The host machine executes the code, and the LLM only receives the final output of the computation.</p>
<p>Here's how to create a tool that stores orders in the database. Add it inside the <code>chatWithAgent</code> function of the <code>ChatService</code> class, below the state definition:</p>
<pre><code class="lang-tsx">const orderTool = tool(
  async ({ order }: { order: OrderType }) =&gt; {
    try {
      await this.orderModel.create(order);
      return 'Order created successfully';
    } catch (error) {
      console.log(error);
      return 'Failed to create the order';
    }
  },
  {
    schema: z.object({
      order: OrderSchema.describe('The order that will be stored in the DB'),
    }),
    name: 'create_order',
    description: 'This tool creates a new order in the database',
  }
);

const tools = [orderTool];
</code></pre>
<h2 id="heading-langgraph-nodes-workflow-components">LangGraph Nodes (Workflow Components)</h2>
<p>From a definition standpoint, a LangGraph node is a fundamental component of a LangGraph workflow, representing a single unit of computation or an individual step in an AI agent's process.</p>
<p>Each node can perform a specific task, such as generating a message, invoking a tool, or transforming data, and it interacts with the state to read inputs and write outputs. Together, nodes are connected to form the agent’s workflow or execution graph, allowing complex reasoning and multi-step operations.</p>
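<p>In our project, the workflow is built from four pieces: two nodes that we define, the built-in <code>START</code> entry point, and a conditional edge.</p>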
<ol>
<li><p><strong>Agent node:</strong> This node is in charge of interacting with the LLM - it constructs the agent’s main message template and stacks old messages to the new prompt to create context.</p>
</li>
<li><p><strong>Tools node:</strong> The tools node introduces external capabilities, which allow the workflow to interact with external APIs.</p>
</li>
<li><p><code>START</code> <strong>node:</strong> This node indicates the entry point of our workflow, or to be precise, which node to call when a user initiates a conversation with the agent. It’s quite simple to define.</p>
</li>
<li><p><strong>Conditional edge:</strong> In LangGraph, <code>.addConditionalEdges('agent', shouldContinue)</code> lets the workflow branch dynamically after the <code>'agent'</code> node runs, based on a condition defined in <code>shouldContinue</code>. Unlike a fixed edge, which always goes from one node to the next, a conditional edge evaluates the agent’s output and directs the workflow to different nodes depending on the result, allowing the AI agent to make decisions and adapt its next steps.</p>
</li>
</ol>
<h2 id="heading-graph-declaration">Graph Declaration</h2>
<p>In LangGraph, a graph is the central structure that models an AI agent’s workflow as interconnected nodes, where each node represents a computation step, tool, or decision. It orchestrates the flow of data and control between nodes, manages conditional branching, and maintains the recursive loop of execution.</p>
<p>Essentially, the graph is the backbone that ensures complex, stateful interactions happen in a coordinated and modular way, connecting nodes like <code>agent</code>, <code>tools</code>, and conditional edges into a coherent workflow.</p>
<p>With that knowledge in place, we can now create the agent graph with all its nodes.</p>
<pre><code class="lang-tsx">  const callModal = async (states: typeof graphState.State) =&gt; {
    const prompt = ChatPromptTemplate.fromMessages([
      {
        role: 'system',
        content: `
            You are a helpful assistant that helps users order drinks from Starbucks.
            Your job is to take the user's request and fill in any missing details based on how a complete order should look.
            A complete order follows this structure: ${OrderParser}.

            **TOOLS**
            You have access to a "create_order" tool.
            Use this tool when the user confirms the final order.
            After calling the tool, you should inform the user whether the order was successfully created or if it failed.

            **DRINK DETAILS**
            Each drink has its own set of properties such as size, milk, syrup, sweetener, and toppings.
            Here is the drink schema: ${DrinkParser}.

            You must ask for any missing details before creating the order.

            If the user requests a modification that is not supported for the selected drink, tell them that it is not possible.

            If the user asks for something unrelated to drink orders, politely tell them that you can only assist with drink orders.

            **AVAILABLE OPTIONS**
            List of available drinks and their allowed modifications:
            ${DRINKS.map((drink) =&gt; `- ${createDrinkItemSummary(drink)}`).join('\n')}

            Sweeteners: ${createSweetenersSummary()}
            Toppings: ${availableToppingsSummary()}
            Milks: ${createAvailableMilksSummary()}
            Syrups: ${createSyrupsSummary()}
            Sizes: ${createSizesSummary()}

            Order schema: ${OrderParser}

            If the user's query is unclear, tell them that the request is not clear.

            **ORDER CONFIRMATION**
            Once the order is ready, you must ask the user to confirm it.
            If they confirm, immediately call the "create_order" tool.
            Only respond after the tool completes, indicating success or failure.

            **FRONTEND RESPONSE FORMAT**
            Every response must include:

            "message": "Your message to the user",
            "current_order": "The order currently being constructed",
            "suggestions": "Options the user can choose from",
            "progress": "Order status ('completed' after creation)"

            **IMPORTANT RULES**
            - Be friendly, use emojis, and add humor.
            - Use null for unfilled fields.
            - Never omit the JSON tracking object.
        `,
      },
      new MessagesPlaceholder('messages'),
    ]);

    const formattedPrompt = await prompt.formatMessages({
      time: new Date().toISOString(),
      messages: states.messages,
    });

    const chat = new ChatGoogleGenerativeAI({
      model: 'gemini-2.0-flash',
      temperature: 0,
      apiKey: GOOGLE_API_KEY,
    }).bindTools(tools);

    const result = await chat.invoke(formattedPrompt);
    return { messages: [result] };
  };

  const shouldContinue = (state: typeof graphState.State) =&gt; {
    const lastMessage = state.messages[
      state.messages.length - 1
    ] as AIMessage;
    return lastMessage.tool_calls?.length ? 'tools' : END;
  };

  const toolsNode = new ToolNode&lt;typeof graphState.State&gt;(tools);

  /**
   * Build the conversation graph.
   */
  const graph = new StateGraph(graphState)
    .addNode('agent', callModal)
    .addNode('tools', toolsNode)
    .addEdge(START, 'agent')
    .addConditionalEdges('agent', shouldContinue)
    .addEdge('tools', 'agent');
</code></pre>
<h3 id="heading-explanation">Explanation</h3>
<ul>
<li><p><strong>Graph State (</strong><code>graphState</code>)<br>  The <code>graphState</code> object is the shared memory across all nodes. It stores <code>messages</code>, which track the conversation history including user inputs, AI responses, and tool interactions. The reducer <code>[...x, ...y]</code> appends new messages, preserving past context. This is similar to React state updates: old messages remain while new ones are added.</p>
</li>
<li><p><strong>Agent Node (</strong><code>callModal</code>)<br>  This node handles the <strong>LLM call</strong>. It formats a prompt containing system instructions, drink schemas, available tools, and frontend response rules. By including <code>states.messages</code>, the AI sees the full conversation history, enabling multi-turn dialogue.</p>
</li>
<li><p><strong>LLM Execution</strong><br>  <code>ChatGoogleGenerativeAI</code> generates the AI response. <code>.bindTools(tools)</code> allows the AI to call tools like <code>create_order</code> directly if needed.</p>
</li>
<li><p><strong>Conditional Flow (</strong><code>shouldContinue</code>)<br>  After the AI responds, the <code>shouldContinue</code> function checks if the message includes tool calls. If so, execution moves to the <code>tools</code> node; otherwise, the workflow ends. This allows dynamic branching depending on the AI’s output.</p>
</li>
<li><p><strong>Tool Node (</strong><code>ToolNode</code>)<br>  The <code>tools</code> node executes the requested tool, such as saving the order to the database. Once completed, control returns to the agent node, enabling the AI to respond to the user with results.</p>
</li>
<li><p><strong>Graph Construction (</strong><code>StateGraph</code>)<br>  Nodes are connected in a coherent workflow:</p>
<ul>
<li><p><code>START → agent</code> begins the conversation</p>
</li>
<li><p>Conditional edges handle tool execution</p>
</li>
<li><p><code>tools → agent</code> ensures the agent can respond after tools run</p>
</li>
</ul>
</li>
<li><p><strong>Overall Flow</strong><br>  Together, the graph and shared state ensure a <strong>stateful, multi-turn conversation</strong>. The AI can ask for missing details, call tools when needed, and maintain context across interactions. Every node reads and writes to the same state.</p>
</li>
</ul>
<h2 id="heading-workflow-compilation-and-state-persistence-final-part"><strong>Workflow Compilation and State Persistence (Final Part)</strong></h2>
<p>So far, our state is temporary: it exists only for the duration of a single request. However, we want our agent to <strong>remember and recall conversation context</strong> when a new request arrives with the same <code>thread_id</code> (the conversation ID).</p>
<p>To achieve this, we’ll use MongoDB together with the <code>@langchain/langgraph-checkpoint-mongodb</code> library. This library simplifies state persistence by associating each conversation with a unique ID that you assign. Retrieving previous messages and saving new ones are both handled internally, so you only need to provide the ID of the conversation you want to work with.</p>
<pre><code class="lang-tsx">const graph = new StateGraph(graphState)
  .addNode('agent', callModal)
  .addNode('tools', toolsNode)
  .addEdge(START, 'agent')
  .addConditionalEdges('agent', shouldContinue)
  .addEdge('tools', 'agent');

  const checkpointer = new MongoDBSaver({ client, dbName: database_name });

  const app = graph.compile({ checkpointer });

  /**
     * Run the graph using the user's message.
     */
    const finalState = await app.invoke(
      { messages: [new HumanMessage(query)] },
      { recursionLimit: 15, configurable: { thread_id } },
    );

  /**
   * Extract JSON payload from AI response.
   */
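  function extractJsonResponse(response: unknown) {
    // Check the type first: calling .match on a non-string would throw
    if (typeof response !== 'string') {
      throw new Error('Expected the AI response content to be a string');
    }
    const match = response.match(/```json\s*([\s\S]*?)\s*```/i);
    if (!match) {
      throw new Error('No JSON block found in the AI response');
    }
    return JSON.parse(match[1].trim());
  }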

  const lastMessage = finalState.messages.at(-1) as AIMessage; // Extract the last message of the conversation
  return extractJsonResponse(lastMessage.content); //Response
</code></pre>
<p>The code above shows how to initialize a checkpointer, compile the graph, and invoke the agent with an incoming prompt.</p>
<p>The <code>extractJsonResponse</code> function pulls out the JSON payload that we instructed the LLM to include whenever it replies to the user.</p>
<p>As instructed in the system prompt, every response must include four fields: <code>message</code> (the message to the user), <code>current_order</code> (the order currently being constructed), <code>suggestions</code> (options the user can choose from), and <code>progress</code> (the order status, <code>'completed'</code> after creation).</p>
<p>Every response from the LLM should look like this:</p>
<pre><code class="lang-tsx">'```json\n' +
  '{\n' +
  '"message": "Got it! To make sure I get your order just right, can you clarify which coffee drink you\'d like? We have Latte, Cappuccino, Cold Brew, and Frappuccino. 😊",\n' +
  '"current_order": {\n' +
  '"drink": null,\n' +
  '"size": null,\n' +
  '"milk": null,\n' +
  '"syrup": null,\n' +
  '"sweeteners": null,\n' +
  '"toppings": null,\n' +
  '"quantity": null\n' +
  '},\n' +
  '"suggestions": [\n' +
  '"Latte",\n' +
  '"Cappuccino",\n' +
  '"Cold Brew",\n' +
  '"Frappuccino"\n' +
  '],\n' +
  '"progress": "incomplete"\n' +
  '}\n' +
  '```';
</code></pre>
<p>This structure allows the frontend to easily render the LLM response and track the state of the current order. This is more of a design choice and less of a convention.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Building an autonomous AI agent with LangChain and LangGraph allows you to combine the reasoning power of LLMs with practical tool execution and persistent memory. By defining schemas, parsing data into human-readable formats, and orchestrating workflows through nodes, you can create intelligent agents capable of handling real-world tasks—like our Starbucks barista.</p>
<p>With MongoDB integration for state persistence, your agent can maintain context across conversations, making interactions feel more natural and human-like. This approach opens the door to building more sophisticated, domain-specific AI assistants without starting from scratch.</p>
<p>In short: <strong>define your data, teach your agent how to reason, and let LangGraph orchestrate the magic.</strong> ☕🤖</p>
<p>Source code here: <a target="_blank" href="https://github.com/DjibrilM/langgraph-starbucks-agent">https://github.com/DjibrilM/langgraph-starbucks-agent</a></p>
<h3 id="heading-resources"><strong>Resources</strong></h3>
<ul>
<li><p>LangGraph documentation: <a target="_blank" href="https://docs.langchain.com/oss/javascript/langgraph/quickstart">https://docs.langchain.com/oss/javascript/langgraph/quickstart</a></p>
</li>
<li><p>Synergizing Reasoning and Acting in Language Models: <a target="_blank" href="https://arxiv.org/abs/2210.03629">https://arxiv.org/abs/2210.03629</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use LangChain and LangGraph: A Beginner’s Guide to AI Workflows ]]>
                </title>
                <description>
                    <![CDATA[ Artificial intelligence is moving fast. Every week, new tools appear that make it easier to build apps powered by large language models. But many beginners still get stuck on one question: how do you structure the logic of an AI application? How do y... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-langchain-and-langgraph-a-beginners-guide-to-ai-workflows/</link>
                <guid isPermaLink="false">690b882e468be723832787a7</guid>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Wed, 05 Nov 2025 17:23:58 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762363391314/34c1c950-b257-40b2-a03d-cbaf1bfbd4b6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Artificial intelligence is moving fast. Every week, new tools appear that make it easier to build apps powered by large language models.</p>
<p>But many beginners still get stuck on one question: how do you structure the logic of an AI application? How do you connect prompts, memory, tools, and APIs in a clean way?</p>
<p>That is where popular open-source frameworks like <a target="_blank" href="https://www.langchain.com/">LangChain</a> and <a target="_blank" href="https://www.langchain.com/langgraph">LangGraph</a> come in.</p>
<p>Both are part of the same ecosystem, and they’re designed to help you build complex AI workflows without reinventing the wheel.</p>
<p>LangChain focuses on building sequences of steps called chains, while LangGraph takes things a step further by adding memory, branching, and feedback loops to make your AI more intelligent and flexible.</p>
<p>This guide will help you understand what these tools do, how they differ, and how you can start using them to build your own AI projects.</p>
<h2 id="heading-what-we-will-cover"><strong>What we will cover</strong></h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-langchain">What is LangChain?</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-why-langchain-was-not-enough">Why LangChain Was Not Enough</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-langgraph">What is LangGraph?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-langchain-vs-langgraph">LangChain vs LangGraph</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-to-use-each">When to Use Each</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-memory-and-persistence">Adding Memory and Persistence</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-monitoring-and-debugging-with-langsmith">Monitoring and Debugging with LangSmith</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-langchain-ecosystem">The LangChain Ecosystem</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-what-is-langchain"><strong>What is LangChain?</strong></h2>
<p><a target="_blank" href="https://www.turingtalks.ai/p/how-to-build-better-ai-workflows-with-langchain">LangChain</a> is a Python and JavaScript framework that helps you build language model-powered applications. It provides a structure for connecting models like GPT, data sources, and tools into a single flow.</p>
<p>Instead of writing long prompt templates or hardcoding logic, you use components like chains, tools, and agents.</p>
<p>A simple example is chaining prompts together. For instance, you might first ask the model to summarize text, and then use the summary to generate a title. LangChain lets you define both steps and connect them in code.</p>
<p>Here is a basic example in Python:</p>
<pre><code class="lang-python">from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Requires the OPENAI_API_KEY environment variable to be set
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = PromptTemplate.from_template("Summarize the following text:\n{text}")

# Pipe the prompt into the model, then parse the reply into a plain string
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"text": "LangChain helps developers build AI apps faster."})
print(result)
</code></pre>
<p>This simple chain takes text and runs it through an OpenAI model to get a summary. You can add more steps, like a second chain to turn that summary into a title or a question.</p>
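<p>To make the idea of chaining concrete, here is a minimal plain-Python sketch that does not use LangChain at all. Each step fills a prompt template and calls a model, and the output of one step becomes the input of the next. The names <code>fake_llm</code>, <code>make_step</code>, and <code>chain</code> are hypothetical helpers invented for this illustration, and <code>fake_llm</code> is only a stand-in for a real model call so the example runs offline.</p>

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call, so the example runs offline."""
    instruction, text = prompt.split("\n", 1)
    if instruction.startswith("Summarize"):
        return " ".join(text.split()[:4])  # crude "summary": keep the first four words
    return text.title()  # crude "title": capitalize each word


def make_step(template: str, llm=fake_llm):
    """Build one chain step: fill the template, then call the model."""
    def step(value: str) -> str:
        return llm(template.format(text=value))
    return step


def chain(*steps):
    """Compose steps so that each output becomes the next input."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


summarize = make_step("Summarize the following text:\n{text}")
make_title = make_step("Write a title for this summary:\n{text}")
pipeline = chain(summarize, make_title)

result = pipeline("LangChain helps developers build AI apps faster.")
print(result)
```

<p>Swapping <code>fake_llm</code> for a real model call would keep the same shape: the second step never sees the original text, only the first step's output.</p>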
<p>LangChain provides modules for prompt templates, models, retrievers, and tools so you can build workflows without managing the raw API logic.</p>
<p>Here is the full <a target="_blank" href="https://docs.langchain.com/oss/python/langchain/overview">LangChain documentation</a>.</p>
<h3 id="heading-why-langchain-was-not-enough"><strong>Why LangChain Was Not Enough</strong></h3>
<p>LangChain made it easy to build straight-line workflows.</p>
<p>But most real-world applications are not linear. When <a target="_blank" href="https://www.freecodecamp.org/news/build-a-custom-ai-chat-application-with-nextjs/">building a chatbot</a>, summarizer, or an autonomous agent, you often need loops, memory, and conditions.</p>
<p>For example, if the AI makes a wrong assumption, you might want it to try again. If it needs more data, it should call a search tool. Or if a user changes context, the AI should remember what was discussed earlier.</p>
<p>LangChain’s chains and agents could do some of this, but the flow was hard to visualize and manage. You had to write nested chains or use callbacks to handle decisions.</p>
<p>Developers wanted a better way to represent how AI systems actually think. Not in straight lines, but as graphs where outputs can lead to different paths.</p>
<p>That’s what led to LangGraph.</p>
<h2 id="heading-what-is-langgraph"><strong>What is LangGraph?</strong></h2>
<p>LangGraph is an extension of LangChain that introduces a graph-based approach to AI workflows.</p>
<p>Instead of chaining steps in one direction, LangGraph lets you define nodes and edges like a flowchart. Each node can represent a task, an action, or a model call.</p>
<p>This structure allows loops, branching, and parallel paths. It’s perfect for building agent-like systems where the model reasons, decides, and acts.</p>
<p>Here is an example of a simple LangGraph setup:</p>
<pre><code class="lang-python">from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import create_react_agent

@tool
def multiply(a: int, b: int) -&gt; int:
    """Multiply two numbers."""
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini")
agent_executor = create_react_agent(llm, [multiply])

# A StateGraph needs a state schema; MessagesState holds the chat history
graph = StateGraph(MessagesState)
# The prebuilt agent is itself a compiled graph, so it can be used as a node
graph.add_node("agent", agent_executor)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)
app = graph.compile()

response = app.invoke({"messages": [("user", "Use the multiply tool to get 8 times 7")]})
print(response["messages"][-1].content)
</code></pre>
<p>This example shows a basic agent graph.</p>
<p>The AI receives a request, reasons about it, decides to use the tool, and completes the task. You can imagine extending this to more complex graphs where the AI can retry, call APIs, or fetch new information.</p>
<p>LangGraph gives you full control over how the AI moves between states. Each node can have conditions. For example, if an answer is incomplete, you can send it back to another node to refine it.</p>
<p>This makes LangGraph ideal for building systems that need multiple reasoning steps, like document analysis bots, code reviewers, or research assistants.</p>
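<p>The looping behavior described above can be sketched in plain Python. This is not LangGraph's API: <code>run_graph</code>, <code>draft</code>, and <code>route</code> are hypothetical names made up for this illustration. The sketch only shows the underlying idea, where a routing function inspects the state after a node runs and either sends control back to a node or ends the run.</p>

```python
END = "__end__"


def draft(state: dict) -> dict:
    """Node: extend the answer (hypothetical stand-in for a model call)."""
    return {"answer": state["answer"] + " detail", "attempts": state["attempts"] + 1}


def route(state: dict) -> str:
    """Conditional edge: loop back until the answer is long enough or attempts run out."""
    done = len(state["answer"].split()) >= 4 or state["attempts"] >= 5
    return END if done else "draft"


def run_graph(nodes: dict, edges: dict, entry: str, state: dict) -> dict:
    """Run nodes one at a time, merging each node's update into the shared state."""
    current = entry
    while current != END:
        state = {**state, **nodes[current](state)}
        current = edges[current](state)  # ask the edge where to go next
    return state


final = run_graph(
    nodes={"draft": draft},
    edges={"draft": route},
    entry="draft",
    state={"answer": "short", "attempts": 0},
)
print(final)
```

<p>The attempt cap in <code>route</code> plays the same role as a recursion limit: it guarantees that the loop ends even if the answer never becomes good enough.</p>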
<p>Here is the full <a target="_blank" href="https://docs.langchain.com/oss/python/langgraph/overview">LangGraph documentation</a>.</p>
<h2 id="heading-langchain-vs-langgraph"><strong>LangChain vs LangGraph</strong></h2>
<p>LangChain and LangGraph share the same foundation, but they approach workflows differently.</p>
<p>LangChain is linear. Each chain or agent moves from one step to the next in a sequence. It is simpler to start with, especially for prompt engineering, retrieval-augmented generation, and structured pipelines.</p>
<p>LangGraph is dynamic. It represents workflows as graphs that can loop, branch, and self-correct. It is more powerful when building agents that need reasoning, planning, or memory.</p>
<p>A good analogy is this: LangChain is like writing a list of tasks in order. LangGraph is like drawing a flowchart where decisions can lead to different actions or back to previous steps.</p>
<p>Most developers start with LangChain to learn the basics, then move to LangGraph when they want to build more interactive or autonomous AI systems.</p>
<h2 id="heading-when-to-use-each"><strong>When to Use Each</strong></h2>
<p>If you’re building simple tools like text summarizers, chatbots, or document retrievers, LangChain is enough. It’s easy to get started and integrates well with popular models like GPT, Claude, and Gemini.</p>
<p>If you want to build multi-step agents, or apps that think and adapt, go with LangGraph. You can define how the AI reacts to different outcomes, and you get more control over retry logic, context switching, and feedback loops.</p>
<p>In practice, many developers combine both. LangChain provides the building blocks, while LangGraph organizes how those blocks interact.</p>
<h2 id="heading-adding-memory-and-persistence"><strong>Adding Memory and Persistence</strong></h2>
<p>Both LangChain and LangGraph support memory, which allows your AI to remember context between interactions. This is useful when you’re building chatbots, assistants, or agents that need to carry information across steps.</p>
<p>For example, if a user introduces themselves once, the AI should be able to recall that detail later in the conversation.</p>
<p>In LangChain, memory is handled through built-in modules like <code>ConversationBufferMemory</code> or <code>ConversationSummaryMemory</code>. These let you store previous inputs and outputs so the model can reference them in future responses.</p>
<p>Here’s a simple example using LangChain:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> ConversationChain
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI

memory = ConversationBufferMemory()
llm = ChatOpenAI(model=<span class="hljs-string">"gpt-4o-mini"</span>)
conversation = ConversationChain(llm=llm, memory=memory)

conversation.predict(input=<span class="hljs-string">"Hello, I am Manish."</span>)
response = conversation.predict(input=<span class="hljs-string">"What did I just tell you?"</span>)
print(response)
</code></pre>
<p>In this case, the model remembers your previous message and answers accordingly. The memory object acts like a running conversation log, keeping track of the dialogue as it evolves.</p>
<p>LangGraph takes this a step further by embedding memory into the graph’s state. Each node in the graph can access or update shared memory, allowing your AI to maintain context across multiple reasoning steps or branches. This approach is especially useful when building agents that loop, revisit nodes, or depend on previous interactions.</p>
<p>Here’s how memory can be added inside a LangGraph workflow:</p>
<pre><code class="lang-python">from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph import StateGraph, MessagesState, START, END

llm = ChatOpenAI(model="gpt-4o-mini")

def chat(state: MessagesState):
    # The node sees the full message history stored in the graph state
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("chat", chat)
graph.add_edge(START, "chat")
graph.add_edge("chat", END)

# The checkpointer saves the state after each step, keyed by thread_id
app = graph.compile(checkpointer=InMemorySaver())
config = {"configurable": {"thread_id": "demo-thread"}}

app.invoke({"messages": [("user", "Hello, I am Manish.")]}, config)
response = app.invoke({"messages": [("user", "What did I just tell you?")]}, config)
print(response["messages"][-1].content)
</code></pre>
<p>Here, the graph keeps track of memory between invocations. Both calls use the same <code>thread_id</code>, so the checkpointer restores the earlier messages before the node runs, and the model can see what was said earlier. This design lets you build agents that remember user context, maintain history, and adapt as they move between nodes.</p>
<p>Whether you use LangChain or LangGraph, adding memory is what turns a simple workflow into a stateful system, one that can carry on a conversation, refine its reasoning, and respond more naturally over time.</p>
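<p>What makes a workflow stateful can be sketched in plain Python: load the saved messages for a conversation, append the new ones, run the node, and save the result. Everything below is a simplified illustration under stated assumptions, not LangGraph's implementation. <code>DictCheckpointer</code>, <code>add_messages</code>, <code>echo_node</code>, and <code>invoke</code> are hypothetical names, and <code>echo_node</code> stands in for a model call.</p>

```python
from collections import defaultdict


class DictCheckpointer:
    """Hypothetical, simplified checkpointer: one message list per thread_id."""

    def __init__(self):
        self._threads = defaultdict(list)

    def load(self, thread_id: str) -> list:
        return list(self._threads[thread_id])

    def save(self, thread_id: str, messages: list) -> None:
        self._threads[thread_id] = list(messages)


def add_messages(existing: list, new: list) -> list:
    """Reducer: append new messages instead of overwriting the history."""
    return existing + new


def echo_node(state: dict) -> dict:
    """Stand-in for a model call: reports how many user messages it can see."""
    user_turns = [m for m in state["messages"] if m[0] == "user"]
    return {"messages": [("ai", f"I have seen {len(user_turns)} user message(s).")]}


def invoke(checkpointer: DictCheckpointer, thread_id: str, user_text: str) -> str:
    # Restore history, add the new user message, run the node, persist the result
    messages = add_messages(checkpointer.load(thread_id), [("user", user_text)])
    update = echo_node({"messages": messages})
    messages = add_messages(messages, update["messages"])
    checkpointer.save(thread_id, messages)
    return messages[-1][1]


saver = DictCheckpointer()
first = invoke(saver, "thread-1", "Hello, I am Manish.")
second = invoke(saver, "thread-1", "What did I just tell you?")
other = invoke(saver, "thread-2", "Hi")
print(first, second, other, sep="\n")
```

<p>The second call on <code>thread-1</code> sees both user messages, while <code>thread-2</code> starts from an empty history. Keying the saved state by conversation ID is what keeps separate conversations from leaking into each other.</p>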
<h2 id="heading-monitoring-and-debugging-with-langsmith"><strong>Monitoring and Debugging with LangSmith</strong></h2>
<p><a target="_blank" href="https://www.langchain.com/langsmith/observability">LangSmith</a> is another important tool from the LangChain ecosystem. It helps you visualize, monitor, and debug your AI applications.</p>
<p>When building workflows, you often want to see how the model behaves, how much it costs, and where things go wrong.</p>
<p>LangSmith records every call made by your chains and agents. You can view input and output data, timing, token usage, and errors. It provides a dashboard that shows how your system performed across multiple runs.</p>
<p>You can integrate LangSmith by setting two environment variables:</p>
<pre><code class="lang-bash">export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="your_api_key_here"
</code></pre>
<p>Then, every LangChain or LangGraph process you run will automatically log to LangSmith. This helps developers find bugs, optimize prompts, and understand how the workflow behaves at each step.</p>
<p>Note that while LangChain and LangGraph are open source, LangSmith is a commercial platform with paid plans. LangSmith is a good-to-have tool and not a requirement to build AI workflows.</p>
<h2 id="heading-the-langchain-ecosystem"><strong>The LangChain Ecosystem</strong></h2>
<p>LangChain is not just one library. It has grown into an ecosystem of tools that work together.</p>
<ul>
<li><p><strong>LangChain Core</strong>: The main framework for chains, prompts, and memory.</p>
</li>
<li><p><strong>LangGraph</strong>: A graph-based extension for building adaptive workflows.</p>
</li>
<li><p><strong>LangSmith</strong>: A debugging and monitoring platform for AI apps.</p>
</li>
<li><p><strong>LangServe</strong>: A deployment layer that lets you turn your chains and graphs into APIs with one command.</p>
</li>
</ul>
<p>Together, these tools form a complete stack for building, managing, and deploying language model applications. You can start with a simple chain, evolve it into a graph-based system, test it with LangSmith, and deploy it using LangServe.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>LangChain and LangGraph make it easier to move from prompts to production-ready AI systems. LangChain helps you build linear flows that connect models, data, and tools. LangGraph lets you go further by building adaptive and intelligent workflows that reason and learn.</p>
<p>For beginners, starting with LangChain is the best way to understand how language models can interact with other components. As your projects grow, LangGraph will give you the flexibility to handle complex logic and long-term state.</p>
<p>Whether you are building a chatbot, an agent, or a knowledge assistant, these tools will help you go from idea to implementation faster and more reliably.</p>
<p><em>Hope you enjoyed this article. Sign up for my free newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><strong><em>visit my website</em></strong></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use LangChain and GPT to Analyze Multiple Documents ]]>
                </title>
                <description>
                    <![CDATA[ Over the past year or so, the developer universe has exploded with ingenious new tools, applications, and processes for working with large language models and generative AI. One particularly versatile example is the LangChain project. The overall goa... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-langchain-and-gpt-to-analyze-multiple-documents/</link>
                <guid isPermaLink="false">672b941f0c32c8c8cd6159a9</guid>
                
                    <category>
                        <![CDATA[ generative ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ David Clinton ]]>
                </dc:creator>
                <pubDate>Wed, 06 Nov 2024 16:06:55 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1730909200914/e75f3725-7453-49c0-b4e9-8b14fbc3b783.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Over the past year or so, the developer universe has exploded with ingenious new tools, applications, and processes for working with large language models and generative AI.</p>
<p>One particularly versatile example is <a target="_blank" href="https://www.langchain.com/">the LangChain project</a>. Its overall goal is to provide easy integrations with various LLMs. But the LangChain ecosystem is also host to a growing number of (sometimes experimental) projects pushing the limits of the humble LLM.</p>
<p>Spend some time browsing <a target="_blank" href="https://www.langchain.com/">LangChain’s website</a> to get a sense of what's possible. You'll see how many tools are designed to help you build more powerful applications.</p>
<p>But you can also use it as an alternative for connecting your favorite AI with the live internet. Specifically, this demo will show you how to use it to programmatically access, summarize, and analyze long and complex online documents.</p>
<p>To make it all happen, you’ll need a Python runtime environment (like Jupyter Lab) and a valid OpenAI API key.</p>
<h3 id="heading-prepare-your-environment">Prepare Your Environment</h3>
<p>One popular use for LangChain involves loading multiple PDF files in parallel and asking GPT to analyze and compare their contents.</p>
<p>As you can see for yourself in <a target="_blank" href="https://python.langchain.com/docs/integrations/toolkits/document_comparison_toolkit">the LangChain documentation,</a> existing modules can be loaded to permit PDF consumption and natural language parsing. I'm going to walk you through a use-case sample that's loosely based on the example in that documentation. Here's how that begins:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
os.environ[<span class="hljs-string">'OPENAI_API_KEY'</span>] = <span class="hljs-string">"sk-xxx"</span>
<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel, Field
<span class="hljs-keyword">from</span> langchain.chat_models <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> AgentType, Tool, initialize_agent
<span class="hljs-keyword">from</span> langchain.embeddings.openai <span class="hljs-keyword">import</span> OpenAIEmbeddings
<span class="hljs-keyword">from</span> langchain.text_splitter <span class="hljs-keyword">import</span> CharacterTextSplitter
<span class="hljs-keyword">from</span> langchain.vectorstores <span class="hljs-keyword">import</span> FAISS
<span class="hljs-keyword">from</span> langchain.document_loaders <span class="hljs-keyword">import</span> PyPDFLoader
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> RetrievalQA
</code></pre>
<p>That code will build your environment and set up the tools necessary for:</p>
<ul>
<li><p>Enabling OpenAI Chat (ChatOpenAI)</p>
</li>
<li><p>Understanding and processing text (OpenAIEmbeddings, CharacterTextSplitter, FAISS, RetrievalQA)</p>
</li>
<li><p>Managing an AI agent (Tool)</p>
</li>
</ul>
<p>Next, you'll define a <code>DocumentInput</code> class and an <code>llm</code> object that sets some familiar GPT parameters. You'll use both of them later:</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">DocumentInput</span>(<span class="hljs-params">BaseModel</span>):</span>
    question: str = Field()
llm = ChatOpenAI(temperature=<span class="hljs-number">0</span>, model=<span class="hljs-string">"gpt-3.5-turbo-0613"</span>)
</code></pre>
<h3 id="heading-load-your-documents">Load Your Documents</h3>
<p>Next, you'll create a couple of lists: an empty <code>tools</code> list and a <code>files</code> list. The three <code>path</code> values in <code>files</code> contain the URLs for recent financial reports issued by three software/IT services companies: Alphabet (Google), Cisco, and IBM.</p>
<p>We're going to have GPT dig into three companies’ data simultaneously, have the AI compare the results, and do it all without having to go to the trouble of downloading PDFs to a local environment.</p>
<p>You can usually find such legal filings in the Investor Relations section of a company's website.</p>
<pre><code class="lang-python">tools = []
files = [
    {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"alphabet-earnings"</span>,
        <span class="hljs-string">"path"</span>: <span class="hljs-string">"https://abc.xyz/investor/static/pdf/"</span>
                <span class="hljs-string">"2023Q1_alphabet_earnings_release.pdf"</span>,
    },
    {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"Cisco-earnings"</span>,
        <span class="hljs-string">"path"</span>: <span class="hljs-string">"https://d18rn0p25nwr6d.cloudfront.net/CIK-0000858877/"</span>
                <span class="hljs-string">"5b3c172d-f7a3-4ecb-b141-03ff7af7e068.pdf"</span>,
    },
    {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"IBM-earnings"</span>,
        <span class="hljs-string">"path"</span>: <span class="hljs-string">"https://www.ibm.com/investor/att/pdf/"</span>
                <span class="hljs-string">"IBM_Annual_Report_2022.pdf"</span>,
    },
    ]
</code></pre>
<p>This <code>for</code> loop will iterate through each entry in the <code>files</code> list I just showed you. For each iteration, it'll use <code>PyPDFLoader</code> to load the specified PDF file, <code>CharacterTextSplitter</code> to break the text into chunks, and the remaining tools to apply the embeddings and index the data. It'll then wrap the resulting retriever in a <code>Tool</code> that uses the <code>DocumentInput</code> class we created earlier as its input schema:</p>
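<pre><code class="lang-python"><span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> files:
    <span class="hljs-comment"># Load the PDF from its URL and split it into pages</span>
    loader = PyPDFLoader(file[<span class="hljs-string">"path"</span>])
    pages = loader.load_and_split()
    <span class="hljs-comment"># Break the pages into chunks of up to about 1,000 characters</span>
    text_splitter = CharacterTextSplitter(chunk_size=<span class="hljs-number">1000</span>, chunk_overlap=<span class="hljs-number">0</span>)
    docs = text_splitter.split_documents(pages)
    <span class="hljs-comment"># Embed the chunks and index them in a FAISS vector store</span>
    embeddings = OpenAIEmbeddings()
    retriever = FAISS.from_documents(docs, embeddings).as_retriever()
    <span class="hljs-comment"># Wrap each retriever in a Tool (inside the loop, so every file gets one)</span>
    tools.append(
        Tool(
            args_schema=DocumentInput,
            name=file[<span class="hljs-string">"name"</span>],
            description=<span class="hljs-string">f"useful when you want to answer questions about {file['name']}"</span>,
            func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),
        )
    )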
</code></pre>
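<p>To build some intuition for what <code>chunk_size</code> and <code>chunk_overlap</code> control, here is a minimal standard-library sketch of fixed-size chunking. It's a simplification, not <code>CharacterTextSplitter</code>'s actual algorithm (which splits on a separator first and then merges the pieces), and the <code>chunk_text</code> helper is a hypothetical name used only for illustration:</p>

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4))                   # no overlap
print(chunk_text(sample, chunk_size=4, chunk_overlap=2))  # windows share 2 characters
```

<p>With an overlap of zero, as in the loop above, a sentence that straddles a chunk boundary is cut in two. A small overlap trades some duplicated text for a better chance that each chunk carries enough context on its own.</p>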
<h3 id="heading-prompt-your-model">Prompt Your Model</h3>
<p>At this point, we're finally ready to create an agent and feed it our prompt as <code>input</code>.</p>
<pre><code class="lang-python">llm = ChatOpenAI(
    temperature=<span class="hljs-number">0</span>,
    model=<span class="hljs-string">"gpt-3.5-turbo-0613"</span>,
)
agent = initialize_agent(
    agent=AgentType.OPENAI_FUNCTIONS,
    tools=tools,
    llm=llm,
    verbose=<span class="hljs-literal">True</span>,
)
agent({<span class="hljs-string">"input"</span>: (
    <span class="hljs-string">"Based on these SEC filing documents, identify "</span>
    <span class="hljs-string">"which of these three companies - Alphabet, IBM, and Cisco "</span>
    <span class="hljs-string">"has the greatest short-term debt levels and which has the "</span>
    <span class="hljs-string">"highest research and development costs."</span>
)})
</code></pre>
<p>The output that I got was short and to the point:</p>
<blockquote>
<p>‘output’: ‘Based on the SEC filing documents:\n\n- The company with the greatest short-term debt levels is IBM, with a short-term debt level of $4,760 million.\n- The company with the highest research and development costs is Alphabet, with research and development costs of $11,468 million.’</p>
</blockquote>
<h3 id="heading-wrapping-up">Wrapping Up</h3>
<p>As you’ve seen, LangChain lets you integrate multiple tools into generative AI operations, enabling multi-layered programmatic access to the live internet and more sophisticated LLM prompts.</p>
<p>With these tools, you’ll be able to automate applying the power of AI engines to real-world data assets in real time. Try it out for yourself.</p>
<p><em>This article is excerpted from</em> <a target="_blank" href="https://www.amazon.com/dp/1633436985"><em>my Manning book, The Complete Obsolete Guide to Generative AI</em></a><em>.  But you can find plenty more technology goodness at</em> <a target="_blank" href="https://bootstrap-it.com/"><em>my website</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Start Building Projects with LLMs ]]>
                </title>
                <description>
                    <![CDATA[ If you’re an aspiring AI professional, becoming an LLM engineer offers an exciting and promising career path. But where should you start? What should your trajectory look like? How should you learn? In one of my previous posts, I laid out the complet... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-start-building-projects-with-llms/</link>
                <guid isPermaLink="false">66faf2011a0aeb460edd6a88</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ chatbot ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Harshit Tyagi ]]>
                </dc:creator>
                <pubDate>Mon, 30 Sep 2024 18:46:25 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727442031549/2b9f61f1-d25d-4c10-8a9e-c63fe7ee7cad.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you’re an aspiring AI professional, becoming an LLM engineer offers an exciting and promising career path.</p>
<p>But where should you start? What should your trajectory look like? How should you learn?</p>
<p>In one of my <a target="_blank" href="https://dswharshit.medium.com/roadmap-to-become-an-ai-engineer-roadmap-6d9558d970cf">previous posts</a>, I laid out the complete roadmap to become an AI / LLM Engineer. Reading this article will give you insights into the types of skills you’ll need to acquire and how to start learning.</p>
<h2 id="heading-the-best-way-to-learn-is-to-build">The Best Way to Learn is to BUILD!</h2>
<p>As Andrej Karpathy puts it:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441366598/07d24597-c31d-45b5-a99c-fbb485ce3459.png" alt="Karpathy's message on how to become an expert at a thing" width="1170" height="410" loading="lazy"></p>
<p>Andrej emphasizes that you should build concrete projects, and explain everything you learn in your own words. (He also instructs us to only compare ourselves to a younger version of ourselves – never to others.)</p>
<p>And I agree – building projects is the best way to not just learn but really grok these concepts. It also sharpens your ability to think about cutting-edge use cases for the skills you’re learning.</p>
<p>But the main challenge with this learning philosophy is that good projects can be hard to find.</p>
<p>And that’s the problem I am trying to solve. I want to help people, including myself, discover and build practical, real-world projects that develop skills worth showcasing in a portfolio.</p>
<h2 id="heading-heres-what-well-cover">Here’s What We’ll Cover:</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-should-be-your-first-project">What Should Be Your First Project?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-1-summarise-youtube-videos">Project #1: YouTube Video Summarizer</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-setup-and-requirements">Setup and Requirements</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-introduction-to-document-loaders">Introduction to Document Loaders</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-process-the-youtube-transcript">Processing YouTube Transcripts</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-use-llmchain-lcel-for-summarization">Using LangChain for Summarization</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-serve-the-yt-summariser-on-whatsapp">Deploying the Summarizer on WhatsApp</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-flask-api">Creating a Flask API</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-with-twilio-for-whatsapp-integration">Connecting with Twilio for WhatsApp integration</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-project-2-build-a-bot-that-can-handle-different-types-of-user-querieshttpswwwwiplanecomwhatsapp-chatbot">Project #2 preview: Multi-purpose Customer Service Bot</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-3-rag-powered-support-bothttpswwwwiplanecomwhatsapp-chatbot">Project #3 preview: RAG-Powered Support Bot</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-what-should-be-your-first-project">What Should Be Your First Project?</h2>
<p>If you’re a beginner who knows basic to intermediate programming, your initial projects should showcase that you can comfortably build applications with LLMs.</p>
<p>They should demonstrate that:</p>
<ul>
<li><p>you know what APIs are</p>
</li>
<li><p>you know how to consume them</p>
</li>
<li><p>you know how to build products that people actually want to use</p>
</li>
</ul>
<p>Building a chatbot provides a great starting point, but at this point everyone has developed one. And there are many solutions for easy Streamlit-based prototypes. So you need to develop something that’s actually usable and has the potential to reach a wider audience.</p>
<p>I’d suggest building a chatbot for WhatsApp or Discord or Telegram. Build a chatbot which solves a problem people struggle with, a problem that companies have started to build solutions for.</p>
<p>If I had to pick a good and, arguably, the most common AI project that every company has started to work on, it would be RAG-powered chatbots.</p>
<p>But before you get to building RAG-powered bots, you should start building something slightly more basic but practical with LLMs.</p>
<p>To kick things off, let’s start by building a YouTube Summariser.</p>
<h2 id="heading-project-1-summarise-youtube-videos">Project #1: Summarise YouTube Videos</h2>
<p>We’ll build the first part of this project in this tutorial: the core functionality of a YouTube video summariser tool.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441993970/d318b7d9-37d5-4e93-a862-4d8c6e23886b.png" alt="Wiplane's project on building Youtube summariser whatsapp chatbot" width="880" height="896" loading="lazy"></p>
<p>Our bot will:</p>
<ul>
<li><p>Receive the YouTube URL.</p>
</li>
<li><p>Validate if the URL is correct.</p>
</li>
<li><p>Retrieve the transcript of the video.</p>
</li>
<li><p>Use an LLM to analyze and summarize the video’s content.</p>
</li>
<li><p>Return the summary to the user.</p>
</li>
</ul>
<h3 id="heading-setup-and-requirements">Setup and Requirements</h3>
<p>For this project, we’ll code the core functionality in a Jupyter Notebook using the following Python packages:</p>
<ul>
<li><p><code>langchain-together</code> — for the LLM using the LangChain &lt;&gt; Together AI integration</p>
</li>
<li><p><code>langchain-community</code> — for specific data loaders</p>
</li>
<li><p><code>langchain</code> — for programming with LLMs</p>
</li>
<li><p><code>pytube</code> — for fetching video info</p>
</li>
<li><p><code>youtube-transcript-api</code> — for fetching YouTube video transcripts</p>
</li>
</ul>
<p>We’ll use the Llama 3.1 model offered as an API by <a target="_blank" href="https://www.together.ai/">Together AI</a>.</p>
<p><strong>Together AI</strong> is a cloud platform that offers open source models as inference APIs, so you can use them without worrying about the underlying infrastructure.</p>
<p>Let’s start by installing these:</p>
<pre><code class="lang-bash">!pip install --upgrade --quiet langchain
!pip install --quiet langchain-community
!pip install --upgrade --quiet langchain-together
!pip install youtube_transcript_api
!pip install pytube
</code></pre>
<p>Now let’s set up our LLM:</p>
<pre><code class="lang-python"><span class="hljs-comment">## setting up the language model</span>
<span class="hljs-keyword">from</span> langchain_together <span class="hljs-keyword">import</span> ChatTogether
<span class="hljs-keyword">import</span> api_key  <span class="hljs-comment"># local api_key.py that defines `api`, your Together AI key</span>

llm = ChatTogether(api_key=api_key.api,temperature=<span class="hljs-number">0.0</span>, 
                   model=<span class="hljs-string">"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"</span>)
</code></pre>
<p>The next step is to process the YouTube videos as a data source. For this we’ll need to understand the concept of document loaders.</p>
<h3 id="heading-introduction-to-document-loaders">Introduction to Document Loaders</h3>
<p>Document loaders provide a unified interface to load data from various sources into a standardized Document format.</p>
<ul>
<li><p>They automatically extract and attach relevant metadata to the loaded content.</p>
</li>
<li><p>The metadata can include source information, timestamps, or other contextual data that can be valuable for downstream processing.</p>
</li>
<li><p>LangChain offers loaders for CSV, PDF, HTML, JSON, and even specialized loaders for sources like YouTube transcripts or GitHub repositories, as listed in <a target="_blank" href="https://python.langchain.com/docs/how_to/#document-loaders">their integrations page</a>.</p>
</li>
</ul>
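<p>As a rough mental model, a loader reads some source and returns a list of objects that pair the text with metadata about it. The sketch below imitates that shape with a plain dataclass. The <code>page_content</code> and <code>metadata</code> attribute names mirror LangChain's <code>Document</code>, but <code>SimpleDocument</code> and <code>load_text_file</code> are hypothetical stand-ins, not LangChain APIs:</p>

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Stand-in for a loaded document: the text plus metadata about it."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str) -> list[SimpleDocument]:
    """Toy loader: read one text file and record where it came from."""
    file_path = Path(path)
    text = file_path.read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": file_path.name})]


# Demo: write a small file to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "notes.txt"
    demo_path.write_text("LangChain loaders return documents.", encoding="utf-8")
    docs = load_text_file(str(demo_path))

print(docs[0].page_content)
print(docs[0].metadata)
```

<p>Because every loader produces the same kind of object, the code that comes after it (splitting, summarizing, embedding) doesn't need to know whether the text came from a PDF, a web page, or a video transcript.</p>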
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441974919/e979be2a-c1d8-4936-aa45-58d909855ace.png" alt="LangChain supports different types of document loaders" width="2118" height="1394" loading="lazy"></p>
<h4 id="heading-categories-of-document-loaders">Categories of Document Loaders</h4>
<p>Document loaders in LangChain can be broadly categorized into two types:</p>
<p><strong>1. File Type-Based Loaders</strong></p>
<ul>
<li><p>Parse and load documents based on specific file formats</p>
</li>
<li><p>Examples include: CSV, PDF, HTML, Markdown</p>
</li>
</ul>
<p><strong>2. Data Source-Based Loaders</strong></p>
<ul>
<li><p>Retrieve data from various external sources</p>
</li>
<li><p>Load the data into Document objects</p>
</li>
<li><p>Examples include: YouTube, Wikipedia, GitHub</p>
</li>
</ul>
<h4 id="heading-integration-capabilities">Integration Capabilities</h4>
<ul>
<li><p>LangChain’s document loaders can integrate with almost any file format you might need.</p>
</li>
<li><p>They also support many third-party data sources.</p>
</li>
</ul>
<p>For our project, we’ll use the YoutubeLoader to get the transcripts in the required format.</p>
<h4 id="heading-youtubeloader-from-langchain-to-get-transcript">YoutubeLoader from LangChain to Get Transcript:</h4>
<pre><code class="lang-python"><span class="hljs-comment">## import the YouTube document loader from LangChain</span>
<span class="hljs-keyword">from</span> langchain_community.document_loaders <span class="hljs-keyword">import</span> YoutubeLoader

video_url = <span class="hljs-string">'https://www.youtube.com/watch?v=gaWxyWwziwE'</span>
loader = YoutubeLoader.from_youtube_url(video_url, add_video_info=<span class="hljs-literal">False</span>)
data = loader.load()
</code></pre>
<h3 id="heading-process-the-youtube-transcript">Process the YouTube Transcript</h3>
<ul>
<li><p>Display raw transcript content</p>
</li>
<li><p>Use the LLM to summarize and extract key points from the transcript:</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># show the extracted page content</span>
data[<span class="hljs-number">0</span>].page_content
</code></pre>
<p>The <code>page_content</code> attribute contains the complete transcript as shown in the output below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441916343/b834abbf-f4d5-4464-a421-257ef95fcbd1.png" alt="Youtube video transcript from the youtube loader" width="2890" height="860" loading="lazy"></p>
<p>Now that we have the transcript, we simply need to pass this to the LLM we configured above along with the prompt to summarise.</p>
<p>First, let’s understand a simple method:</p>
<p>LangChain offers the <code>invoke()</code> method, to which you pass the system message and the user (or human) message.</p>
<p>The system message is essentially the instructions for the LLM on how it is supposed to process the human request.</p>
<p>And the human message is simply what we want the LLM to do.</p>
<pre><code class="lang-python"><span class="hljs-comment"># This code creates a list of messages for the language model:</span>
<span class="hljs-comment"># 1. A system message with instructions on how to summarize the video transcript</span>
<span class="hljs-comment"># 2. A human message containing the actual video transcript</span>

<span class="hljs-comment"># The messages are then passed to the language model (llm) for processing</span>
<span class="hljs-comment"># The model's response is stored in the 'ai_msg' variable and returned</span>

messages = [
    (
        <span class="hljs-string">"system"</span>, 
        <span class="hljs-string">"""Read through the entire transcript carefully.
           Provide a concise summary of the video's main topic and purpose.
           Extract and list the five most interesting or important points from the transcript. For each point: State the key idea in a clear and concise manner.

        - Ensure your summary and key points capture the essence of the video without including unnecessary details.
        - Use clear, engaging language that is accessible to a general audience.
        - If the transcript includes any statistical data, expert opinions, or unique insights, prioritize including these in your summary or key points."""</span>,
    ),
    (<span class="hljs-string">"human"</span>, data[<span class="hljs-number">0</span>].page_content),
]
ai_msg = llm.invoke(messages)
ai_msg
</code></pre>
<p>But this approach becomes hard to maintain when your prompt has more variables or when you want a more dynamic solution.</p>
<h4 id="heading-for-this-langchain-offers-prompttemplate">For this, LangChain offers PromptTemplate:</h4>
<p>A PromptTemplate in LangChain is a powerful tool that helps in creating dynamic prompts for large language models (LLMs). It allows you to define a template with placeholders for variables that can be filled in with actual values at runtime.</p>
<p>This helps in managing and reusing prompts efficiently, ensuring consistency and reducing the likelihood of errors in prompt creation.</p>
<p>A PromptTemplate consists of:</p>
<ul>
<li><p><strong>Template String</strong>: The actual prompt text with placeholders for variables.</p>
</li>
<li><p><strong>Input Variables</strong>: A list of variables that will be replaced in the template string at runtime.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># Set up a prompt template for summarizing a video transcript using LangChain</span>

<span class="hljs-comment"># Import necessary classes from LangChain</span>
<span class="hljs-keyword">from</span> langchain.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> LLMChain

<span class="hljs-comment"># Define a PromptTemplate for summarizing video transcripts</span>
<span class="hljs-comment"># The template includes instructions for the AI model on how to process the transcript</span>
product_description_template = PromptTemplate(
    input_variables=[<span class="hljs-string">"video_transcript"</span>],
    template=<span class="hljs-string">"""
    Read through the entire transcript carefully.
           Provide a concise summary of the video's main topic and purpose.
           Extract and list the five most interesting or important points from the transcript. 
           For each point: State the key idea in a clear and concise manner.

        - Ensure your summary and key points capture the essence of the video without including unnecessary details.
        - Use clear, engaging language that is accessible to a general audience.
        - If the transcript includes any statistical data, expert opinions, or unique insights, 
        prioritize including these in your summary or key points.

    Video transcript: {video_transcript}    """</span>
)
</code></pre>
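<p>Conceptually, filling in a prompt template is string substitution with a check that every expected variable was supplied. Here is a plain Python sketch of that idea using <code>str.format</code>. The <code>fill_template</code> helper and the <code>num_points</code> variable are hypothetical and only illustrate the mechanism; they are not how <code>PromptTemplate</code> is implemented:</p>

```python
def fill_template(template: str, input_variables: list[str], **values: str) -> str:
    """Replace each {placeholder} in template, failing loudly if one is missing."""
    missing = [name for name in input_variables if name not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    return template.format(**values)


summary_template = "Summarize this transcript in {num_points} points:\n{video_transcript}"

prompt = fill_template(
    summary_template,
    input_variables=["num_points", "video_transcript"],
    num_points="five",
    video_transcript="Welcome to the channel...",
)
print(prompt)
```

<p>Keeping the template text separate from the values is what makes a prompt reusable: the same template can be filled with a different transcript on every request.</p>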
<h3 id="heading-how-to-use-llmchain-lcel-for-summarization">How to Use LLMChain / LCEL for Summarization</h3>
<p>A chain is a sequence of steps that consists of a language model, PromptTemplate, and an optional output parser.</p>
<ul>
<li><p>Create an LLMChain with the custom prompt template</p>
</li>
<li><p>Generate a summary of the video transcript using the chain</p>
</li>
</ul>
<p>Here, we are using LLMChain, but you can also use LangChain Expression Language (LCEL) to do this:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Create the chain from the LLM and the prompt template</span>
chain = LLMChain(llm=llm, prompt=product_description_template)

<span class="hljs-comment"># Run the chain with the video transcript</span>
summary = chain.invoke({
    <span class="hljs-string">"video_transcript"</span>: data[<span class="hljs-number">0</span>].page_content
})
</code></pre>
<p>This returns a <code>summary</code> dictionary whose <code>text</code> key holds the response in Markdown format.</p>
<pre><code class="lang-python">summary[<span class="hljs-string">'text'</span>]
</code></pre>
<p>The raw response will look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441806141/be122b5b-6774-46be-92ab-1f9e651b5045.png" alt="summary response from simple LLM chain" width="2340" height="470" loading="lazy"></p>
<p>To see the Markdown formatted response:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> IPython.display <span class="hljs-keyword">import</span> Markdown, display

display(Markdown(summary[<span class="hljs-string">'text'</span>]))
</code></pre>
<p>And there you go:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441776170/98223339-03d2-483c-84ef-9400d2eb33f2.png" alt="Structure summary display using Markdown function " width="2272" height="866" loading="lazy"></p>
<p>So, the core functionality of our YouTube summariser is now working.</p>
<p>But so far this only works in your Jupyter Notebook. To make it more accessible, we need to deploy this functionality on WhatsApp.</p>
<h3 id="heading-how-to-serve-the-yt-summariser-on-whatsapp">How to Serve the YT Summariser on WhatsApp</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727421384448/cd7f0f37-f25b-4b46-a4a9-0bcd5bf0f0fd.png" alt="Establishing connection between youtube and flask server using Twilio" class="image--center mx-auto" width="1905" height="318" loading="lazy"></p>
<p>For this, we’d need to serve our YT summarisation functionality as an API endpoint for which we are going to use Flask. You can also use FastAPI.</p>
<p>Now we’ll turn all the code in the Jupyter notebook into functions. First, add a function that checks whether the input is a valid YouTube URL. Then define the <code>summarise</code> function, which is essentially a compilation of what we wrote in the Jupyter notebook.</p>
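<p>The article doesn't show how the URL check is implemented, so here is one possible sketch using only the standard library. The <code>is_youtube_url</code> name matches the function called in the endpoint, but the body, the <code>YOUTUBE_HOSTS</code> set, and the decision to accept only <code>/watch?v=</code> and <code>youtu.be</code> links are assumptions:</p>

```python
from urllib.parse import parse_qs, urlparse

# Hostnames accepted by this sketch; extend the set if you need others.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if url looks like a link to a single YouTube video."""
    if not url:
        return False
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or host not in YOUTUBE_HOSTS:
        return False
    if host == "youtu.be":
        # Short links carry the video ID in the path: https://youtu.be/VIDEO_ID
        return len(parsed.path.strip("/")) > 0
    # Standard links carry it in the query string: /watch?v=VIDEO_ID
    return parsed.path == "/watch" and bool(parse_qs(parsed.query).get("v"))


print(is_youtube_url("https://www.youtube.com/watch?v=gaWxyWwziwE"))
print(is_youtube_url("https://example.com/watch?v=gaWxyWwziwE"))
```

<p>This check only looks at the shape of the URL. It doesn't confirm that the video exists or has a transcript, so the summarising step should still handle loader errors.</p>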
<p>You can configure your endpoint in the following manner:</p>
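<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask, request
<span class="hljs-keyword">from</span> twilio.twiml.messaging_response <span class="hljs-keyword">import</span> MessagingResponse

app = Flask(__name__)

<span class="hljs-comment"># is_youtube_url() and summarise() are the helper functions described above</span>
<span class="hljs-meta">@app.route('/summary', methods=['POST'])</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">summary</span>():</span>
    url = request.form.get(<span class="hljs-string">'Body'</span>)  <span class="hljs-comment"># Twilio sends the message text in the 'Body' form field</span>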
    print(url)
    <span class="hljs-keyword">if</span> is_youtube_url(url):
        response = summarise(url)
    <span class="hljs-keyword">else</span>:
        response = <span class="hljs-string">"please check if this is a correct youtube video url"</span>
    print(response)
    resp = MessagingResponse()
    msg = resp.message()
    msg.body(response)
    <span class="hljs-keyword">return</span> str(resp)
</code></pre>
<p>Once your <code>app.py</code> is ready with your Flask API, run the Python script, and you should have your server running locally on your system.</p>
<p>The next step is to make your local server connect with WhatsApp, and that’s where we’ll use Twilio.</p>
<p>Twilio allows us to implement this handshake by offering a WhatsApp sandbox to test your bot. You can follow the steps in this guide <a target="_blank" href="https://www.twilio.com/docs/whatsapp/quickstart/python">here</a> to build this connection.</p>
<p>I got the connection established:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727422495235/4a60a190-2d57-4726-be7c-1e062c4528e5.png" alt="Configure twilio sandbox settings" class="image--center mx-auto" width="1274" height="496" loading="lazy"></p>
<p>Now, we can start testing our WhatsApp bot:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727422721636/339fd977-6b63-4f57-ba40-e677c32e1814.png" alt="Summariser chatbot screenshot" class="image--center mx-auto" width="1508" height="1290" loading="lazy"></p>
<p>Amazing!</p>
<p>I explain all the steps in detail in my project-based course on <a target="_blank" href="https://www.wiplane.com/whatsapp-chatbot"><strong>Building LLM-powered WhatsApp Chatbots</strong></a><strong>.</strong></p>
<p>It’s a <strong>3-project course</strong> that contains two other more complex projects. I’ll give you a brief summary of those other projects here so you can try them out for yourselves. And if you’re interested, you can check out the course as well.</p>
<h2 id="heading-project-2-build-a-bot-that-can-handle-different-types-of-user-querieshttpswwwwiplanecomwhatsapp-chatbot"><a target="_blank" href="https://www.wiplane.com/whatsapp-chatbot">Project #2 — Build a Bot that Can Handle Different Types of User Queries</a></h2>
<p>This bot acts as a customer service representative for an airline. It can answer questions about flight status, baggage, ticket booking, and more. It uses LangChain’s Router and an LLM to dynamically generate responses based on the user’s input.</p>
<ul>
<li><p>Different prompt templates are defined for various customer queries, such as flight status, baggage inquiries, and complaints.</p>
</li>
<li><p>Based on the query, the router selects the appropriate template and generates a response.</p>
</li>
<li><p>Twilio then sends the response back to the WhatsApp chat.</p>
</li>
</ul>
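<p>To make the routing idea concrete, here is a minimal sketch in plain Python. It is not the course's code and it does not use LangChain's router: the template texts, the keyword lists, and the <code>route</code> and <code>build_prompt</code> helpers are all assumptions made for illustration, with simple keyword matching standing in for the LLM-based routing decision.</p>

```python
# Hypothetical prompt templates; the course's actual templates are not shown here.
TEMPLATES = {
    "flight_status": "You are an airline agent. Answer this flight status question: {query}",
    "baggage": "You are an airline agent. Answer this baggage question: {query}",
    "complaint": "You are an airline agent. Respond with empathy to this complaint: {query}",
    "general": "You are an airline agent. Answer this question: {query}",
}

# Hypothetical keyword lists standing in for an LLM-based router.
KEYWORDS = {
    "flight_status": ("status", "delayed", "on time", "departure"),
    "baggage": ("baggage", "luggage", "suitcase"),
    "complaint": ("complaint", "refund", "unhappy"),
}


def route(query: str) -> str:
    """Return the name of the first template whose keywords match the query."""
    text = query.lower()
    for name, words in KEYWORDS.items():
        if any(word in text for word in words):
            return name
    return "general"  # fallback when no keyword matches


def build_prompt(query: str) -> str:
    """Fill the selected template with the user's query."""
    return TEMPLATES[route(query)].format(query=query)


if __name__ == "__main__":
    # The resulting prompt would then be sent to the LLM.
    print(build_prompt("Is flight AI202 delayed?"))
```

<p>In a real bot, the routing decision would typically be made by the LLM itself rather than by keywords, but the flow is the same: classify the query, pick a template, fill it in, and send the result to the model.</p>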
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441691086/54bcc4a9-8e04-4509-a361-ee4eb15bca08.png" alt="Wiplane's project 2 - Airline customer support to handle different types of queries" width="880" height="977" loading="lazy"></p>
<h2 id="heading-project-3-rag-powered-support-bothttpswwwwiplanecomwhatsapp-chatbot"><a target="_blank" href="https://www.wiplane.com/whatsapp-chatbot">Project #3 — RAG-Powered Support Bot</a> </h2>
<p>This chatbot answers questions about airline services using a document-based system. The document is converted into embeddings, which are then queried through a RAG pipeline built with LangChain to generate responses. Companies are looking for developers with these skills, so this is an especially practical project.</p>
<ul>
<li><p>The guidelines/rules document is embedded using HuggingFace models and indexed with FAISS.</p>
</li>
<li><p>When a user submits a question, the RAG system retrieves relevant information from the document.</p>
</li>
<li><p>The system then generates a response using a pre-trained LLM and sends it back via Twilio.</p>
</li>
</ul>
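<p>The retrieval step can be sketched without any external libraries. The example below is a toy stand-in, not the course's implementation: it swaps the HuggingFace embeddings for simple word counts and the FAISS index for a sorted list, and the guideline snippets and function names are hypothetical. It only shows the shape of the flow: embed the chunks, find the chunk closest to the question, and build a prompt around it.</p>

```python
import math
import re
from collections import Counter

# Hypothetical guideline snippets standing in for the airline document.
CHUNKS = [
    "Each passenger may check in one bag weighing up to 23 kg.",
    "Tickets can be refunded within 24 hours of booking.",
    "Pets may travel in the cabin if the carrier fits under the seat.",
]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (not a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Combine the retrieved context with the question for the LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Can my ticket be refunded?"))
```

<p>With real embeddings, the question and the chunks do not need to share exact words to match, which is the main reason to use an embedding model and a vector index instead of word counts.</p>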
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727441686023/fe55ec78-96dd-42bd-aeae-ceaad24aae44.png" alt="Wiplane's project 3 - RAG powered support bot" width="880" height="1090" loading="lazy"></p>
<p>These 3 projects will get you started so you can continue experimenting and learning more about AI engineering.</p>
<p><a target="_blank" href="https://www.wiplane.com/whatsapp-chatbot"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727306395800/82bf4b68-a79b-4f40-b4fe-61f99fa445ab.png" alt="Wiplane's 3 project course on building LLM powered whatsapp chatbots" class="image--center mx-auto" width="3420" height="1238" loading="lazy"></a></p>
<p>Customer support is among the most heavily funded categories in AI, because costs drop immediately when AI can handle conversations with disgruntled users.</p>
<p>So, in this course we build bots that can handle different types of queries, as well as intelligent RAG-powered bots that have access to proprietary documents and can provide up-to-date information to users.</p>
<p>That’s why I created <a target="_blank" href="https://www.wiplane.com/whatsapp-chatbot">this project-based course</a> to help you start building with LLMs.</p>
<p>Check out the course preview here:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/6R5DMyqMOz4" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p> </p>
<p>And to thank you for reading this guide, you can use the code FREECODECAMP to get a 20% discount on my course.</p>
<p>I want to make this accessible to everyone who is serious about building with AI, so I’ve priced it at $14.99 USD.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, we focused on building a fun YouTube video summarizer tool that is served on WhatsApp.</p>
<p>The bot's core functionality includes:</p>
<ul>
<li><p>Receiving a YouTube URL</p>
</li>
<li><p>Validating the URL</p>
</li>
<li><p>Retrieving the video transcript</p>
</li>
<li><p>Using an LLM to summarize the content</p>
</li>
<li><p>Returning the summary to the user</p>
</li>
</ul>
<p>We used a number of Python packages including langchain-together, langchain-community, langchain, pytube, and youtube-transcript-api.</p>
<p>The project uses the Llama 3.1 model via Together AI's API.</p>
<p>We built the core summarisation functionality in two ways:</p>
<ul>
<li><p>Using LangChain's invoke() method with system and human messages</p>
</li>
<li><p>Using PromptTemplate and LLMChain for more dynamic solutions</p>
</li>
</ul>
<p>To make the tool accessible via WhatsApp:</p>
<ul>
<li><p>The functionality is served as an API endpoint using Flask</p>
</li>
<li><p>Twilio is used to connect the local server with WhatsApp</p>
</li>
<li><p>A WhatsApp sandbox is used for testing the bot</p>
</li>
</ul>
<p>To continue building further projects, check out the course.</p>
<p>It is a beginner-track course where you start by learning to build with LLMs, then apply those skills to build 3 different types of LLM applications. You also learn to serve your applications as WhatsApp chatbots.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn LangChain to link LLMs with external data ]]>
                </title>
                <description>
                    <![CDATA[ LangChain is an AI-first framework designed to enable developers to create context-aware reasoning applications by linking powerful Large Language Models with external data sources. We just published a course on the freeCodeCamp.org YouTube channel t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-langchain-to-link-llms-with-external-data/</link>
                <guid isPermaLink="false">66b204b3712508eb16067889</guid>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Wed, 22 Nov 2023 04:10:10 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/11/langchain4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>LangChain is an AI-first framework designed to enable developers to create context-aware reasoning applications by linking powerful Large Language Models with external data sources.</p>
<p>We just published a course on the freeCodeCamp.org YouTube channel that will teach you all about LangChain. The course will equip you with the cutting-edge skills needed to build a highly knowledgeable chatbot using LangChain Expression Language. </p>
<p>Tom Chant is a popular instructor at Scrimba. In this course, Tom will take you on a journey from the basics of LangChain.js to advanced concepts. You'll delve into an array of topics including embeddings, app flow diagrams, Supabase vector store, text splitting, and much more. The course is structured to make learning LangChain.js approachable and enjoyable, with a focus on practical applications.</p>
<p>The course even includes an introduction to LangChain from Jacob Lee, the lead maintainer of LangChain.js.</p>
<p>In this course, you will learn about:</p>
<ul>
<li>Splitting with a LangChain textSplitter tool</li>
<li>Vectorising text chunks</li>
<li>Using embeddings models</li>
<li>Supabase vector store</li>
<li>Templates with input_variables</li>
<li>Prompts from templates</li>
<li>LangChain Expression Language</li>
<li>Basic chains with the .pipe() method</li>
<li>Retrieval from a vector store</li>
<li>Complex chains with RunnableSequence()</li>
<li>The StringOutputParser() class</li>
<li>Troubleshooting performance issues</li>
</ul>
<p>In this course, you'll learn how to use LangChain.js to build a chatbot that can answer questions on a specific text you give it.</p>
<p>In the first part of the project, you'll learn about using LangChain to split text into chunks, convert the chunks to vectors using an OpenAI embeddings model, and store them together in a Supabase vector store.</p>
<p>Next, you'll learn about chains, which are the building blocks of LangChain. You'll build them using LangChain Expression Language, which makes coding in LangChain much smoother and easier to grasp.</p>
<p>Finally, you'll learn about retrieval: using vector matching to select the text chunks from our vector store which are most likely to hold the answer to a user’s query. This enables the chatbot to answer questions specific to your data - a critical skill when working with AI and one of the most common use-cases for AI in web dev.</p>
<p>Watch the full course on the <a target="_blank" href="https://youtu.be/HSZ_uaif57o">freeCodeCamp.org YouTube channel</a> (2-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/HSZ_uaif57o" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
