langchain - freeCodeCamp.org

How to Protect Sensitive Data by Running LLMs Locally with Ollama

Manoj Aggarwal — Thu, 05 Mar 2026 15:04:02 +0000

When engineers build AI-powered applications, protecting sensitive data is always a top priority. You don't want to send users' data to an external API that you don't control.

For me, this happened when I was building FinanceGPT, which is my personal open-source project that helps me with my finances. This application lets you upload your bank statements, tax forms like 1099s, and so on, and then you can ask questions in plain English like, "How much did I spend on groceries this month?" or "What was my effective tax rate last year?"

The problem is that answering these questions means sending all the sensitive transaction history, W-2s and income data to OpenAI, Anthropic or Google, which I was not comfortable with. Even after redacting PII from these documents, I was not OK with the trade-off.

This is where Ollama comes in. Ollama lets you run large language models entirely on your own laptop. You don't need any API keys or cloud infrastructure and no data leaves your machine.

In this tutorial, I will walk you through what Ollama is, how to get started with it, and how to use it in a real Python application so that users of the application can choose to keep their data completely local.

Prerequisites
What is Ollama
How Ollama's API works
How to call Ollama from Python
How to Integrate Ollama into a LangChain App
How to Build an LLM-Provider Agnostic App
How to use Ollama with LangGraph
How FinanceGPT Uses This in Practice
Tradeoffs to be Aware Of
Conclusion
Check Out FinanceGPT
Resources

Prerequisites

You will need the following at a minimum:

Python 3.10+
A machine with at least 8GB of RAM (16GB recommended for larger models)
Basic familiarity with Python and pip

What is Ollama?

Ollama is an open-source tool that makes running LLMs locally very easy. You can think of it as Docker, but for AI models. You can pull a model with a single command, and Ollama handles everything else: downloading the weights, managing memory and serving the model through a local REST API.

The local REST API also exposes an endpoint compatible with OpenAI's API format, which means an application that already talks to OpenAI can usually switch to Ollama by changing only the base URL and the model name.

Installation

First, download the installer from ollama.com. Once it is installed, you can verify the installation:

ollama --version

The above command checks whether Ollama was installed correctly and prints the current version.

Pull and Run Your First Model

Ollama hosts a variety of models on ollama.com/library. To pull and immediately chat with one, just do:

ollama run llama3.2

This command downloads the model from the Ollama library (if you don't already have it) and starts an interactive chat session with it. Note: the download is typically a few GB, depending on the model. Alternatively, if you only want to download a model:

ollama pull mistral

This downloads a model to your machine without starting a chat session, which is useful when you want to set up models in advance.

You can run the following command to list the models you have installed:

ollama list

This shows all models you've downloaded locally along with their sizes.
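If you would rather check the installed models from code, Ollama's local server exposes the same information at GET /api/tags. Here is a minimal sketch using only the standard library. The helper names (format_models, list_local_models) are my own, and it assumes the server is listening on the default port:

```python
import json
from urllib.request import urlopen

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumes the default local address


def format_models(payload: dict) -> list[str]:
    """Turn the /api/tags JSON payload into 'name (size GB)' strings."""
    lines = []
    for model in payload.get("models", []):
        size_gb = model.get("size", 0) / 1e9  # size is reported in bytes
        lines.append(f"{model['name']} ({size_gb:.1f} GB)")
    return lines


def list_local_models(url: str = OLLAMA_TAGS_URL) -> list[str]:
    """Fetch and format the installed models (requires Ollama to be running)."""
    with urlopen(url, timeout=5) as resp:
        return format_models(json.load(resp))
```

With Ollama running, calling list_local_models() returns one string per downloaded model, which is handy for showing users which local models they can choose from.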

I have used the following models and they have worked great for specific tasks:

Model	Size	Good For
`llama3.2`	~2GB	Fast, general purpose
`mistral`	~4GB	Strong instruction following
`qwen2.5:7b`	~4GB	Multilingual, reasoning
`deepseek-r1:7b`	~4GB	Complex reasoning tasks

How Ollama's API works

Once Ollama is running, it serves its API on localhost:11434. You can call it directly using curl:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "What is compound interest?" }],
  "stream": false
}'

This sends a chat message directly to Ollama's REST API from the command line, with streaming disabled so you get the full response at once. This is Ollama's native chat endpoint. For existing applications, the more useful endpoint is http://localhost:11434/v1, which is OpenAI-compatible. This is the key feature that makes it easy to drop Ollama into apps that already use OpenAI's API.
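If you leave streaming on (it is the default when "stream" is omitted), /api/chat returns newline-delimited JSON: each line carries a fragment of the reply in message.content, and the last line has "done": true. The sketch below shows one way to stitch those fragments back together. The function name collect_stream is hypothetical, not part of any Ollama library:

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the message fragments from a streamed /api/chat response."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final object closes the stream
    return "".join(parts)
```

In practice you could pass the HTTP response object itself, since responses returned by urllib.request.urlopen can be iterated line by line. In a UI you would print each fragment as it arrives instead of joining them at the end.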

How to Call Ollama from Python

How to Use the Ollama Python Library

Ollama has its own Python library that is pretty intuitive to use:

pip install ollama

from ollama import chat

response = chat(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.message.content)

The above code uses Ollama's native Python SDK to send a message and print the model's reply, which is the most straightforward way to call Ollama from Python.

How to Use the OpenAI SDK with Ollama as the Backend

As mentioned earlier, Ollama has an endpoint that is OpenAI compatible, so you can also use the OpenAI Python SDK and just point it to your local server:

pip install openai

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # Required by the SDK, but ignored by Ollama
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.choices[0].message.content)

This uses the standard OpenAI Python SDK but redirects it to your local Ollama server. The api_key field is required by the SDK but ignored by Ollama. This pattern makes using Ollama seamless for existing applications. The code is nearly identical to what you would write for OpenAI.

How to Integrate Ollama into a LangChain App

Many production applications are built with an orchestration framework like LangChain, which has native Ollama support. This means swapping providers is just a one-line change.

Install the integration:

pip install langchain-ollama

How to Create a Chat Model

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")

response = llm.invoke("What is the difference between a W-2 and a 1099?")
print(response.content)

This creates a LangChain-compatible chat model backed by a local Ollama model, a one-line swap from ChatOpenAI.

Compare this to the OpenAI version and you will see that the interface is almost identical:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

How to Build an LLM-Provider Agnostic App

The real power comes from abstracting away the LLM provider. Applications like Perplexity let users choose the LLM they want to use for their tasks. Here's a simple factory pattern that returns the right LLM based on the configuration:

from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_anthropic import ChatAnthropic

def get_llm(provider: str, model: str):
    """
    Return the appropriate LangChain LLM based on the provider.
    
    Args:
        provider: One of "openai", "ollama", "anthropic"
        model: The model name (e.g. "gpt-4o", "llama3.2", "claude-3-5-sonnet")
    
    Returns:
        A LangChain chat model ready to use
    """
    if provider == "openai":
        return ChatOpenAI(model=model)
    elif provider == "ollama":
        return ChatOllama(model=model)
    elif provider == "anthropic":
        return ChatAnthropic(model=model)
    else:
        raise ValueError(f"Unknown provider: {provider}")

The above snippet shows a helper that returns the right LangChain model based on a provider string.

Now the rest of your code, including your chains, agents and tools, does not need to know which provider's LLM is running underneath. You pass llm around and it just works.
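One way to drive a factory like get_llm from configuration is to read the provider and model from environment variables. The sketch below is an illustration, not FinanceGPT's actual code: the variable names LLM_PROVIDER and LLM_MODEL, the local defaults, and the helper names are assumptions. It also imports the integration package lazily, so a user who picks Ollama does not need the OpenAI or Anthropic packages installed:

```python
import importlib
import os

# provider name -> (module path, class name), mirroring the factory above
PROVIDERS = {
    "openai": ("langchain_openai", "ChatOpenAI"),
    "ollama": ("langchain_ollama", "ChatOllama"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic"),
}


def read_llm_config(env=os.environ) -> tuple[str, str]:
    """Read provider and model from the environment, defaulting to local."""
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()  # assumed variable name
    model = env.get("LLM_MODEL", "llama3.2").strip()              # assumed variable name
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    return provider, model


def get_llm_from_env(env=os.environ):
    """Build the chat model, importing only the selected integration."""
    provider, model = read_llm_config(env)
    module_name, class_name = PROVIDERS[provider]
    module = importlib.import_module(module_name)  # lazy import
    return getattr(module, class_name)(model=model)
```

Defaulting to the local provider means that a missing or empty configuration never sends data to a cloud API by accident.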

How to use Ollama with LangGraph

If you're using LangGraph to build agents (as I covered in my previous article on AI agents), plugging in Ollama is equally seamless:

from langgraph.prebuilt import create_react_agent
from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def get_spending_summary(category: str) -> str:
    """Get total spending for a given category this month."""
    # In a real app, this would query your database
    return f"You spent $342.50 on {category} this month."

llm = ChatOllama(model="llama3.2")

agent = create_react_agent(
    model=llm,
    tools=[get_spending_summary]
)

response = agent.invoke({
    "messages": [{"role": "user", "content": "How much did I spend on groceries?"}]
})

print(response["messages"][-1].content)

This snippet builds a ReAct agent that uses a locally-running model to decide when to call tools while keeping all data on-device even during agentic workflows.

The agent decides when to call the get_spending_summary tool and produces its answer using the locally running model, instead of sending your data over the internet to OpenAI.

How FinanceGPT Uses This in Practice

FinanceGPT is built to support OpenAI, Anthropic, Google and Ollama as LLM providers. The user sets their preference on the UI or in a config file and the application instantiates the right model using a pattern very similar to the factory pattern above.

When the user chooses Ollama, here's what happens:

Their bank statements and other sensitive documents are parsed locally
Sensitive fields like SSNs are masked before any LLM call
The masked data and query go to the local Ollama server running on their own machine
The response comes back locally and nothing ever leaves their network
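To make the masking step concrete, here is a minimal sketch of redacting Social Security numbers with a regular expression before any text reaches a model. This is an illustration under my own assumptions (dashed US-style SSNs only, a hypothetical mask_ssns helper), not FinanceGPT's actual implementation:

```python
import re

# Matches US SSNs written as 123-45-6789; other formats need their own patterns
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssns(text: str) -> str:
    """Replace every dashed SSN in the text with a fixed mask."""
    return SSN_PATTERN.sub("***-**-****", text)
```

Masking is still worth doing with a local model: it keeps identifiers out of prompts, logs and conversation history even though nothing leaves the machine.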

To run FinanceGPT locally with Ollama, the setup looks like this:

# 1. Pull a capable model
ollama pull llama3.2

# 2. Clone and configure FinanceGPT
git clone https://github.com/manojag115/FinanceGPT.git
cd FinanceGPT
cp .env.example .env

# 3. In .env, set your LLM provider to Ollama
# LLM_PROVIDER=ollama
# LLM_MODEL=llama3.2

# 4. Start the full stack
docker compose -f docker-compose.quickstart.yml up -d

With this setup, the entire application, including the frontend, backend and LLM, runs on your own hardware.

Tradeoffs to be Aware Of

Ollama is a great local alternative to using cloud LLMs, but it comes with its own problems.

Response Quality

The models most people run through Ollama on a laptop are small, typically in the 3B to 8B parameter range, so they will not match GPT-4o on complex reasoning tasks. For simple Q&A and summarization, the results are often comparable, but for multi-step reasoning or nuanced judgement calls, the gap is noticeable.

Speed

Inference speed depends on the hardware that is running the model. Without a GPU, local models can take several seconds or more to respond. On Apple Silicon (M1/M2/M3), performance is surprisingly good even without a dedicated GPU.
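You can measure speed on your own hardware instead of guessing. Ollama's native API responses include eval_count (tokens generated) and eval_duration (time spent generating, in nanoseconds). The sketch below turns those two fields into tokens per second. The function name is my own:

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama /api/chat response body."""
    count = response.get("eval_count", 0)           # tokens generated
    duration_ns = response.get("eval_duration", 0)  # nanoseconds
    if duration_ns <= 0:
        return 0.0  # field missing, e.g. on an intermediate streamed chunk
    return count / duration_ns * 1e9
```

Running the same prompt against two or three candidate models and comparing this number is a quick way to decide which one is responsive enough for your app.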

Hardware Requirements

Small models (around 7B parameters) need around 8GB of RAM, while larger models (13B+) need 16GB or more. If you are building your application for end users, you cannot guarantee they have the hardware.
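A rough rule of thumb helps explain these numbers: the weights alone take roughly (number of parameters) x (bits per weight) / 8 bytes. The sketch below assumes about 4 bits per weight, a common quantization level for local models, and it deliberately ignores the context cache and runtime overhead, so treat the result as a lower bound:

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9
```

For a 7B model this gives about 3.5 GB, which lines up with the ~4GB download sizes in the table earlier. Add the operating system, your application and the model's working memory on top, and 8GB of RAM becomes a realistic minimum.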

Tool Use and Function Calling

Not all local models support function calling reliably. If your agent depends heavily on tool use, test your chosen model carefully. Models like qwen2.5 and mistral generally handle this better than others.
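A quick smoke test can tell you whether a given local model emits tool calls at all. The sketch below binds a tool to ChatOllama with bind_tools and checks the tool_calls list on the reply. The helper names are hypothetical, and a single passing prompt is only a first signal, not proof of reliability:

```python
def emitted_tool_call(message, expected_tool: str) -> bool:
    """Return True if the model's reply requests the expected tool."""
    calls = getattr(message, "tool_calls", None) or []
    return any(call.get("name") == expected_tool for call in calls)


def check_model(model_name: str, tool, prompt: str) -> bool:
    """Ask a local model a question that should trigger `tool`."""
    # Imported lazily; needs langchain-ollama installed and Ollama running
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model=model_name).bind_tools([tool])
    return emitted_tool_call(llm.invoke(prompt), tool.name)
```

For example, check_model("qwen2.5:7b", get_spending_summary, "How much did I spend on groceries?") should return True for a model that handles tool use well. Run it over a handful of prompts per model before committing to one.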

The right mental model: use cloud models when you need maximum capability, and local models when privacy or cost constraints make cloud models impractical.

Conclusion

In this tutorial, you learned what Ollama is, how to install it and pull models, and three different ways to call it from Python: the native Ollama library, the OpenAI-compatible SDK, and LangChain. You also saw how to build a provider-agnostic factory pattern so your app can switch between cloud and local models with a single config change.

Ollama makes local LLMs genuinely practical for production apps. The OpenAI-compatible API means integration is nearly zero-friction, and LangChain's native support means you can build provider-agnostic apps from the start.

The finance domain is an obvious fit — but the same principle applies anywhere sensitive data is involved: healthcare, legal tech, HR, personal productivity. If your app processes data that users wouldn't want stored on someone else's server, giving them a local option isn't just a nice-to-have. It's a trust feature.

Check Out FinanceGPT

All the code examples here came from FinanceGPT. If you want to see these patterns in a complete app, poke around the repo. It's got document processing, portfolio tracking, tax optimization – all built with LangGraph.

If you find this helpful, give the project a star on GitHub – it helps other developers discover it.

Resources

How to Build and Deploy an AI Agent with LangChain, FastAPI, and Sevalla

Manish Shivanandhan — Thu, 08 Jan 2026 23:43:55 +0000

Artificial intelligence is changing how we build software. Just a few years ago, writing code that could talk, decide, or use external data felt hard.

Today, thanks to new tools, developers can build smart agents that read messages, reason about them, and call functions on their own.

One such platform that makes this easy is LangChain. With LangChain, you can link language models, tools, and apps together. You can also wrap your agent inside a FastAPI server, then push it to a cloud platform for deployment.

This article will walk you through building your first AI agent. You will learn what LangChain is, how to build an agent, how to serve it through FastAPI, and how to deploy it on Sevalla.

What We’ll Cover

What is LangChain?
How to Build Your First Agent with LangChain
Wrapping Your Agent with FastAPI
How to Deploy Your AI Agent to Sevalla
Conclusion

What is LangChain?

LangChain is a framework for working with large language models. It helps you build apps that think, reason, and act.

A model on its own only gives text replies, but LangChain lets it do more. It lets a model call functions, use tools, connect with databases, and follow workflows.

Think of LangChain as a bridge. On one side is the language model. On the other side are your tools, data sources, and business logic. LangChain tells the model what tools exist, when to use them, and how to reply. This makes it ideal for building agents that answer questions, automate tasks, or handle complex flows.

Many developers use LangChain because it is flexible. It supports many AI models. It fits well with Python.

LangChain also makes it easier to move from prototype to production. Once you learn how to create an agent, you can reuse the pattern for more advanced use cases.

I have recently published a detailed LangChain tutorial here.

How to Build Your First Agent with LangChain

Let’s make our first agent. It will respond to user questions and call a tool when needed.

We’ll give it a simple weather tool, then ask it about the weather in a city. Before this, create a file called .env and add your OpenAI API key. Once load_dotenv() has loaded it into the environment, LangChain will automatically use it when making requests to OpenAI.

OPENAI_API_KEY=

Here is the code for our agent:


from langchain.agents import create_agent
from dotenv import load_dotenv

# load environment variables
load_dotenv()

# defining the tool that LLM can call
def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

# Creating an agent
agent = create_agent(
    model="gpt-4o",
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

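result = agent.invoke({"messages": [{"role": "user", "content": "What is the weather in san francisco?"}]})

# The last message in the returned state is the agent's final answer
print(result["messages"][-1].content)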

This small program shows the power of LangChain agents.

First, we import create_agent, which helps us build the agent. Then we write a function called get_weather. It takes a city name and returns a friendly sentence.

The function acts as our tool. A tool is something the agent can use. In real projects, tools might fetch prices, store notes, or call APIs.

Next, we call create_agent. We give it three things. We pass the model we want to use. We list the tools we want it to call. And we give a system prompt. The system prompt tells the agent who it is and how it should behave.

Finally, we run the agent. We call invoke with a message.

The user asks for the weather in San Francisco. The agent reads this message. It sees that the question needs the weather function. So it calls our tool get_weather, passes the city, and returns an answer.

Even though this example is tiny, it captures the main idea. The agent reads natural language, figures out what tool to use, and sends a reply.

Later, you can add more tools or replace the weather function with one that connects to a real API. But this is enough for us to wrap and deploy.

Wrapping Your Agent with FastAPI

The next step is to serve our agent. FastAPI helps us expose our agent through an HTTP endpoint. That way, users and systems can call it through a URL, send messages, and get replies.

To begin, you install FastAPI and write a simple file like main.py. Inside it, you import FastAPI, load the agent, and write a route.

When someone posts a question, the API forwards it to the agent and returns the answer. The flow is simple.

The user talks to FastAPI. FastAPI talks to your agent. The agent thinks and replies. Here is the FastAPI wrapper for your agent:

from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from langchain.agents import create_agent
from dotenv import load_dotenv
import os

load_dotenv()

# defining the tool that LLM can call
def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

# Creating an agent
agent = create_agent(
    model="gpt-4o",
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.get("/")
def root():
    return {"message": "Welcome to your first agent"}

@app.post("/chat")
def chat(request: ChatRequest):
    result = agent.invoke({"messages":[{"role":"user","content":request.message}]})
    return {"reply": result["messages"][-1].content}

def main():
    port = int(os.getenv("PORT", 8000))
    uvicorn.run(app, host="0.0.0.0", port=port)

if __name__ == "__main__":
    main()

Here, FastAPI defines a /chat endpoint. When someone sends a message, the server calls our agent. The agent processes it as before. Then FastAPI returns a clean JSON reply. The API layer hides the complexity inside a simple interface.

At this point, you have a working agent server. You can run it on your machine, call it with Postman or cURL, and check responses. When this works, you are ready to deploy.
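If you prefer to test from Python rather than Postman or cURL, a small client is enough. The sketch below uses only the standard library and assumes the server from main.py is running locally on port 8000. The helper names are my own:

```python
import json
from urllib.request import Request, urlopen


def build_chat_request(message: str, base_url: str = "http://localhost:8000") -> Request:
    """Build the POST request that the /chat route expects."""
    body = json.dumps({"message": message}).encode("utf-8")
    return Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(message: str, base_url: str = "http://localhost:8000") -> str:
    """Send a message to the running server and return the agent's reply."""
    with urlopen(build_chat_request(message, base_url), timeout=60) as resp:
        return json.load(resp)["reply"]
```

Calling ask_agent("What is the weather in Paris?") returns the text from the reply field of the JSON response. After deployment, pass your hosted URL as base_url to test the live service the same way.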

How to Deploy Your AI Agent to Sevalla

You can choose any cloud provider, like AWS, DigitalOcean, or others to host your agent. I will be using Sevalla for this example.

Sevalla is a developer-friendly PaaS provider. It offers application hosting, database, object storage, and static site hosting for your projects.

Every platform will charge you for creating a cloud resource. Sevalla comes with a $50 credit for us to use, so we won’t incur any costs for this example.

Let’s push this project to GitHub so that we can connect our repository to Sevalla. We can also enable auto-deployments so that any new change to the repository is automatically deployed.

You can also fork my repository from here.

Log in to Sevalla and click on Applications -> Create new application. You can see the option to link your GitHub repository to create a new application.

Use the default settings. Click “Create application”. Now we have to add our OpenAI API key to the environment variables. Click on the “Environment variables” section once the application is created, and save the OPENAI_API_KEY value as an environment variable.

Now we are ready to deploy our application. Click on “Deployments” and click “Deploy now”. It will take 2–3 minutes for the deployment to complete.

Once done, click on “Visit app”. You will see the application served via a URL ending with sevalla.app. This is your new root URL. You can replace localhost:8000 with this URL and test in Postman.

Congrats! Your first AI agent with tool calling is now live. You can extend it by adding more tools and other capabilities. When you push your code to GitHub, Sevalla will automatically deploy your application to production.

Conclusion

Building AI agents is no longer a task for experts. With LangChain, you can write a few lines and create reasoning tools that respond to users and call functions on their own.

By wrapping the agent with FastAPI, you give it a doorway that apps and users can access. Finally, Sevalla makes it easy to push your agent live, monitor it, and run it in production.

This journey from agent idea to deployed service shows what modern AI development looks like. You start small. You explore tools. You wrap them and deploy them.

Then you iterate, add more capability, improve logic, and plug in real tools. Before long, you have a smart, living agent online. That is the power of this new wave of technology.

Hope you enjoyed this article. Sign up for my free newsletter TuringTalks.ai for more hands-on tutorials on AI. You can also visit my website.

How to Build an AI Agent with LangChain and LangGraph: Build an Autonomous Starbucks Agent

Djibril-M🍀 — Fri, 19 Dec 2025 00:21:01 +0000

Back in 2023, when I started using ChatGPT, it was just another chatbot that I could ask complex questions to and it would identify errors in my code snippets. Everything was fine. The application had no memory of previous states or what was said the day before.

Then in 2024, everything started to change. We went from a stateless chatbot to an AI agent that could call tools, search the internet, and generate download links.

At this point, I started to get curious. How can an LLM search the internet? An infinite number of questions were flowing through my head. Can it create its own tools, programs, or execute its own code? It felt like we were heading toward the Skynet (Terminator) revolution.

I was just ignorant 😅. But that's when I started my research and discovered LangChain, a tool that promises all those miracles without a billion-dollar budget.

In this article, you’ll build a fully functional AI agent using LangChain and LangGraph. You’ll start by defining structured data using Zod schemas, then parsing them for AI understanding. Next, you’ll learn about summarizing data into text, creating tools the agent can call, and setting up LangGraph nodes to orchestrate workflows.

You’ll see how to compile the workflow graph, manage state, and persist conversation history using MongoDB. By the end, you’ll have a working Starbucks barista AI that demonstrates how to combine reasoning, tool execution, and memory in a single agent.

Prerequisites
What is an LLM Agent?
Project Setup
Data Schematization with Zod
How to Parse the Schema
Data-to-Text Summarization
How to Persist Orders with MongoDB in NestJS
LangGraph State/Annotation Terms
How to Create Tools for the Agent
LangGraph Nodes (Workflow Components)
Graph Declaration
Workflow Compilation and State Persistence (Final Part)
Conclusion

Prerequisites

To take full advantage of this article, you should have a basic understanding of TypeScript and Node.js. Some familiarity with NestJS will also help, as it’s the backend framework we’ll be using.

What is an LLM Agent?

By definition, an LLM agent is a software program that’s capable of perceiving its environment, making decisions, and taking autonomous actions to achieve specific goals. It often does this by interacting with tools and systems.

Many frameworks and conventions were created to achieve this, and one of the most famous and widely used is the ReAct (Reason & Act) framework.

With this framework, the LLM receives a prompt, thinks, decides the next action (this can be calling a specific tool), and receives the tool data. Once the tool’s response has been received, the AI model observes the response, generates its own response, and plans its next actions based on the tool’s response.

You can read more about this concept in the official white paper. And here’s a diagram that summarizes the entire process:

Note that the workflow is not limited to a single tool invocation – it can proceed through several rounds before returning to the user.

But for an LLM agent to be truly human-like and act with knowledge of the past, it requires a memory. This enables it to recall previous prompts and responses, maintaining consistency within the given thread.

There’s no single source of truth for how to approach this. Most agents implement a short-term memory. This means that the agent will append each new chat to the conversation history, and when a new prompt is submitted, the agent will append the previous messages to the new prompt.

This method is very efficient and gives the LLM a strong knowledge of previous states. But it can also introduce problems, because the more the conversation grows, the more the LLM will have to go through all previous messages in order to understand what action to take next.

And this can introduce some context drift, just like humans experience. You can’t watch a two-hour podcast and remember all the spoken words, right? In this scenario, the LLM will focus on the most relevant information, eventually losing some context.

You don’t have to implement this from scratch. Many tools and frameworks have been developed to make the implementation as easy as possible. You can build it from scratch if you want, of course, but we won’t be doing that here.

In this article, we’ll build a Starbucks barista that collects order information and calls a create_order tool once the order meets the full criteria. This is a tool that we’ll create and expose to the AI.

Project Setup

Let’s start by initializing our project. We’ll use Nest.js for its efficiency and native TypeScript support. Note that nothing here is tied to Nest.js – this is just a framework preference, and everything we’ll do here can be done with Node.js and Express.js.

Here is a list of all the tools that we’ll use:

@langchain/core - Always required

This is the main LangChain engine that defines all core tools and fundamental functions, containing:
- prompt templates
- message types
- runnables
- tool interfaces
- chain composition utilities, and more.

Most LangChain projects need this.

@langchain/google-genai - This package is used to interact with Google’s generative AI models, vector embedding models, and other related tools.
@langchain/langgraph - Important for building an AI agent with total control

LangGraph is a low-level orchestration framework for building controllable agents. It can be used to build:
- Conversational agents.
- Complex task automation.
- Agent context management.
@langchain/langgraph-checkpoint-mongodb - This package provides a MongoDB-based checkpointer for LangGraph, enabling persistence of agent state and short-term memory using MongoDB.
@langchain/mongodb - This package provides MongoDB integrations for LangChain, allowing you to:
- Store and retrieve vector embeddings.
- Persist LangChain documents, agents, or memory states.
- Easily integrate MongoDB as a database backend for your AI workflows.
@nestjs/mongoose - A NestJS wrapper around Mongoose for MongoDB. Provides:
- Dependency injection support for Mongoose models.
- Simplified schema definition and model management.
- Seamless integration of MongoDB into NestJS applications, enabling structured data persistence for AI apps or any backend.
langchain - This is the main npm package that aggregates LangChain functionality. It provides:
- Access to connectors, utilities, and core modules.
- Easy import of different LangChain components in one place.
- Commonly used alongside @langchain/core for building applications with minimal setup.
mongodb - The official MongoDB driver for Node.js. It provides:
- Low-level, flexible access to MongoDB databases.
- Support for CRUD operations, transactions, and indexing.
- A required dependency if you plan to connect LangChain components or your backend directly to MongoDB.
mongoose - An ODM (Object Data Modeling) library for MongoDB. Offers:
- Schema-based data modeling for MongoDB documents.
- Middleware, validation, and hooks for MongoDB operations.
- Ideal for structured data management in NestJS or other Node.js applications.
zod - A TypeScript-first schema validation library. Used for:
- Defining strict data schemas and validating inputs/outputs.
- Ensuring type safety at runtime.
- Useful in AI applications to validate responses from models or enforce data consistency.

Start by initializing your Nest.js project, and installing all the required dependencies:

$ npm i -g @nestjs/cli # If you don't have the Nest CLI installed on your machine
$ nest new project-name

"dependencies" : {
    "@langchain/core": "^0.3.75",
    "@langchain/google-genai": "^0.2.16",
    "@langchain/langgraph": "^0.4.8",
    "@langchain/langgraph-checkpoint-mongodb": "^0.1.1",
    "@langchain/mongodb": "^0.1.0",
    "@nestjs/mongoose": "^11.0.3",
    "langchain": "^0.3.33",
    "mongodb": "^6.19.0",
    "mongoose": "^8.18.1",
    "zod": "^4.1.8"
}

// The versions may not be the same by the time you read this, so I recommend checking
// the official documentation for each package.

Now that we have our project created and all the packages installed, let’s see what we need to do to turn our vision into a project. Think of what you’ll need in order to create a Starbucks barista:

First, we need to define the structure of our data (creating schemas).
Then we need to create a menu list that our agent will refer to.
After that, we’ll add LLM interaction.
And last but not least, we’ll add the ability to save previous conversations for conversational context.

Folder Structure

You can modify this folder structure and adapt it based on your framework of choice. But the core implementation is the same across all frameworks.

├── .env
├── .eslintrc.js
├── .gitignore
├── .prettierrc
├── nest-cli.json
├── package.json
├── README.md
├── tsconfig.build.json
├── tsconfig.json
├── src/
│   ├── app.controller.ts
│   ├── app.module.ts
│   ├── app.service.ts
│   ├── main.ts
│   ├── chat/
│   │   ├── chat.controller.ts
│   │   ├── chat.module.ts
│   │   ├── chat.service.ts
│   │   └── dtos/
│   │       └── chat.dto.ts
│   ├── data/
│   │   └── schema/
│   │       └── order.schema.ts
│   └── util/
│       ├── constants/
│       │   └── drinks_data.ts
│       ├── schemas/
│       │   ├── drinks/
│       │   │   └── Drink.schema.ts
│       │   └── orders/
│       │       └── Order.schema.ts
│       ├── summeries/
│       │   └── drink.ts
│       └── types/

Data Schematization with Zod

This file contains all our schema definitions regarding drinks and all modifications they can receive. This part is useful for defining the structure of the data that will be used by the AI agent.

Importing Zod

In the lib/util/schemas/drinks.ts file, before defining any schemas, import the Zod library, which provides tools for building TypeScript-first schemas.

// Imports the 'z' object from the 'zod' library.
// Zod is a TypeScript-first schema declaration and validation library.
// 'z' is the primary object used to define schemas (e.g., z.object, z.string, z.boolean, z.array).
import z from "zod";

Zod gives you a simple and expressive way to define and validate the structure of the data our agent will interact with.

Drink Schema

This schema represents the structure of a drink in the Starbucks-style menu. Each field is commented so that it is clear what each property controls.

export const DrinkSchema = z.object({
  name: z.string(),            // Required name of the drink
  description: z.string(),     // Required explanation of what the drink is
  supportMilk: z.boolean(),    // Whether milk options are available
  supportSweeteners: z.boolean(), // Whether sweeteners can be added
  supportSyrup: z.boolean(),   // Whether flavor syrups are allowed
  supportTopping: z.boolean(), // Whether toppings are supported
  supportSize: z.boolean(),    // Whether the drink can be ordered in sizes
  image: z.string().url().optional(), // Optional image URL
});

What this schema represents

It ensures every drink has a proper name and a description.
It defines which customizations apply to the drink.
It prepares the agent to reason about drink options in a structured, validated format.

Sweetener Schema

Each sweetener option in the menu is represented with its own schema.

export const SweetenerSchema = z.object({
  name: z.string(),                // Sweetener name
  description: z.string(),         // What it is / taste description
  image: z.string().url().optional(), // Optional image URL
});

This ensures consistency across all sweetener entries and avoids malformed data.

Syrup Schema

Similar to sweeteners, but for syrup flavors:


export const SyrupSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});

This can represent flavors like Vanilla, Caramel, or Hazelnut.

Topping Schema

Toppings such as whipped cream or cinnamon are defined here.

export const ToppingSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});

Size Schema

Drink sizes are modeled as objects as well:

export const SizeSchema = z.object({
  name: z.string(),               // e.g. Small, Medium
  description: z.string(),        // A short explanation
  image: z.string().url().optional(),
});

Milk Schema

Represents milk types such as Whole, Skim, Almond, or Oat.

export const MilkSchema = z.object({
  name: z.string(),
  description: z.string(),
  image: z.string().url().optional(),
});

Collections of Items

Now that the individual item schemas exist, we can create collections of them. These represent all available toppings, sizes, milk types, syrups, sweeteners, and the entire menu of drinks.

export const ToppingsSchema = z.array(ToppingSchema);
export const SizesSchema = z.array(SizeSchema);
export const MilksSchema = z.array(MilkSchema);
export const SyrupsSchema = z.array(SyrupSchema);
export const SweetenersSchema = z.array(SweetenerSchema);
export const DrinksSchema = z.array(DrinkSchema);

Why arrays? Because in the real world, your agent will receive lists from a database or API—not single items.

Inferred Types

Zod also allows TypeScript to infer types from schemas automatically.

This ensures:

TypeScript types always match the schemas.
You avoid duplicated definitions.
The agent code stays consistent and safe.

export type Drink = z.infer<typeof DrinkSchema>;
export type SupportSweetener = z.infer<typeof SweetenerSchema>;
export type Syrup = z.infer<typeof SyrupSchema>;
export type Topping = z.infer<typeof ToppingSchema>;
export type Size = z.infer<typeof SizeSchema>;
export type Milk = z.infer<typeof MilkSchema>;

export type Toppings = z.infer<typeof ToppingsSchema>;
export type Sizes = z.infer<typeof SizesSchema>;
export type Milks = z.infer<typeof MilksSchema>;
export type Syrups = z.infer<typeof SyrupsSchema>;
export type Sweeteners = z.infer<typeof SweetenersSchema>;
export type Drinks = z.infer<typeof DrinksSchema>;

These provide the rest of your LangChain/LangGraph code with strong typing based on your schema definitions.

This entire file:

Encodes all drink-related data structures.
Provides validation to ensure clean, predictable data.
Automatically generates TypeScript types.
Helps the AI agent reason reliably about drinks and customization options.

You’ll use these schemas later and convert them into string representations for LLM prompts.

You can find the file containing all the code here.

How to Parse the Schema

As mentioned earlier, LLMs are text input–output machines. They don’t understand TypeScript types or Zod schemas directly. If you include a schema inside a prompt, the model will simply see it as plain text without understanding its structure or constraints.

Because of this, we need a way to convert schemas into a readable string format that can be embedded inside a prompt, such as:

“The output must be a JSON object with the following fields…”

This is exactly the problem solved by StructuredOutputParser from langchain/output_parsers. It takes a Zod schema and turns it into:

A human-readable description that can be sent to an LLM.
A validator that checks whether the model’s output matches the schema.

In short, it acts as a bridge between typed application logic and text-based AI output.

Defining the Order Schema

We’ll start with a simple Zod schema that represents a customer’s drink order. This schema defines the exact shape and constraints of the data we expect the model to produce.

export const OrderSchema = z.object({
  drink: z.string(),
  size: z.string(),
  milk: z.string(),
  syrup: z.string(),
  sweeteners: z.string(),
  toppings: z.string(),
  quantity: z.number().min(1).max(10), // Between 1 and 10 drinks per order
});

export type OrderType = z.infer<typeof OrderSchema>;

At this point, the schema is useful only inside our TypeScript application. The LLM still has no idea what this structure means.

Parsing the Schema into Human-Readable Text

This is where schema parsing comes in. Using StructuredOutputParser.fromZodSchema, we can transform the Zod schema into:

Instructions the LLM can understand.
A runtime validator that ensures the response is correct.

import { StructuredOutputParser } from "langchain/output_parsers";

export const OrderParser =
  StructuredOutputParser.fromZodSchema(OrderSchema as any);

The parser enables two critical workflows:

Generating prompt instructions

The parser's getFormatInstructions() method returns a text description of the schema, roughly equivalent to: “Return a JSON object with the fields drink, size, milk, syrup, sweeteners, and toppings as strings, and quantity as a number between 1 and 10.” This string can be injected into your prompt so the LLM knows exactly how to format its response.
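
To make the idea concrete without depending on LangChain, here is a minimal sketch of turning a field specification into prompt instructions. The FieldSpec type and the buildFormatInstructions helper are hypothetical names used only for illustration, and only a subset of the order fields is shown. The real parser derives a more detailed description from the Zod schema.

```typescript
// Hypothetical, simplified description of one output field.
type FieldSpec =
  | { field: string; kind: "string" }
  | { field: string; kind: "number"; min: number; max: number };

// A subset of the order fields; the spec format itself is an assumption.
const orderFieldSpecs: FieldSpec[] = [
  { field: "drink", kind: "string" },
  { field: "size", kind: "string" },
  { field: "quantity", kind: "number", min: 1, max: 10 },
];

// Build one instruction line per field, then join them into a prompt snippet.
function buildFormatInstructions(specs: FieldSpec[]): string {
  const lines = specs.map((spec) =>
    spec.kind === "number"
      ? `- "${spec.field}": a number between ${spec.min} and ${spec.max}`
      : `- "${spec.field}": a string`,
  );
  return ["Return a JSON object with the following fields:", ...lines].join("\n");
}

const orderInstructions = buildFormatInstructions(orderFieldSpecs);
```

The resulting string can be placed in a system prompt in the same position where the parser's own instructions would go.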

Validating the model’s output

After the LLM responds, its output is still just text. The parser:

Converts that text into a JavaScript object.
Validates it against the original Zod schema.
Throws an error if anything is missing, malformed, or out of bounds.

This prevents invalid AI-generated data (for example, quantity: 0) from entering your system.
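
The validation step can also be sketched without any libraries. The hand-written checks below stand in for what the Zod-backed parser does automatically. The validateOrderJson function and the trimmed-down MiniOrder type are hypothetical and exist only for illustration.

```typescript
// Trimmed-down order shape; the real schema has more fields.
interface MiniOrder {
  drink: string;
  quantity: number;
}

type ValidationResult =
  | { ok: true; order: MiniOrder }
  | { ok: false; error: string };

// Parse raw model text and check it against the expected shape and bounds.
function validateOrderJson(raw: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const candidate = parsed as Record<string, unknown>;
  const drink = candidate["drink"];
  const quantity = candidate["quantity"];
  if (typeof drink !== "string") {
    return { ok: false, error: "drink must be a string" };
  }
  if (typeof quantity !== "number" || quantity < 1 || quantity > 10) {
    return { ok: false, error: "quantity must be a number between 1 and 10" };
  }
  return { ok: true, order: { drink, quantity } };
}
```

Returning a result object instead of throwing is a design choice for this sketch: it lets the caller decide whether to retry the model call or surface the error.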

Reusing the Same Approach for Other Schemas

Once you understand this pattern, applying it to other schemas is straightforward.

For example, you can do the same thing for a DrinkSchema:

export const DrinkParser =
  StructuredOutputParser.fromZodSchema(DrinkSchema as any);

Now you can confidently say something like: “Hey Gemini, this is what a drink object looks like—please respond using this structure.”

Why This Matters

Schema parsing allows you to:

Keep strong typing in your application.
Give clear formatting instructions to the LLM.
Safely convert unstructured AI output into validated, production-ready data.

Without this step, working with LLMs at scale becomes unreliable and error-prone.

Data-to-Text Summarization

In the context of LLM agents, data-to-text summarization means converting structured data—such as objects returned from a database or backend API—into clear, human-readable strings that can be embedded directly into prompts.

Even the most advanced LLMs operate purely on text. They don’t reason over JavaScript objects, database rows, or JSON structures in the same way humans or programs do. The clearer and more descriptive your text input is, the more accurate and reliable the model’s output will be.

Because of this, a common and recommended pattern when building LLM-powered systems is:

Fetch structured data → summarize it into natural language → pass the summary into the prompt

To keep this article focused, we’ll store our data in constants instead of querying a real database. The technique is exactly the same whether the data comes from MongoDB, PostgreSQL, or an API.

The Core Idea

The goal of data-to-text summarization is simple:

Take an object with fields and boolean flags
Convert it into a short paragraph that explains what the object represents
Remove ambiguity and guesswork for the LLM

Instead of forcing the model to infer meaning from raw data, we spell it out explicitly.

Summarizing a Drink Object

Consider the following drink object:

{
  name: 'Espresso',
  description: 'Strong concentrated coffee shot.',
  supportMilk: false,
  supportSweeteners: true,
  supportSyrup: true,
  supportTopping: false,
  supportSize: false,
}

While this structure is easy for developers to understand, it’s not ideal for an LLM prompt. Boolean flags like supportMilk: false require interpretation, which increases the chance of incorrect assumptions.

Instead, we convert this object into a descriptive paragraph:

“A drink named Espresso. It is described as a strong, concentrated coffee shot. It cannot be made with milk. It can be made with sweeteners. It can be made with syrup. It cannot be made with toppings. It cannot be made in different sizes.”

This transformation is exactly what data-to-text summarization provides.

A Standard Summarization Pattern

Below is a simplified example of how we convert a Drink object into a readable description.

export const createDrinkItemSummary = (drink: Drink): string => {
  const name = `A drink named ${drink.name}.`;
  const description = `It is described as ${drink.description}.`;

  const milk = drink.supportMilk
    ? 'It can be made with milk.'
    : 'It cannot be made with milk.';

  const sweeteners = drink.supportSweeteners
    ? 'It can be made with sweeteners.'
    : 'It cannot contain sweeteners.';

  const syrup = drink.supportSyrup
    ? 'It can be made with syrup.'
    : 'It cannot be made with syrup.';

  const toppings = drink.supportTopping
    ? 'It can be made with toppings.'
    : 'It cannot be made with toppings.';

  const size = drink.supportSize
    ? 'It can be made in different sizes.'
    : 'It cannot be made in different sizes.';

  return `${name} ${description} ${milk} ${sweeteners} ${syrup} ${toppings} ${size}`;
};

Why this works well for LLMs

Boolean logic is converted into explicit sentences
Every capability and limitation is clearly stated
The output can be embedded directly into a system or user prompt
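
The repeated ternaries can be factored into a small helper. This is a sketch, assuming a hypothetical capability function and a local copy of the drink shape (DrinkFlags) rather than the type inferred from the Zod schema. It also trims a trailing period from the description so the sentence does not end in two periods.

```typescript
// Local stand-in for the Drink type inferred from DrinkSchema.
interface DrinkFlags {
  name: string;
  description: string;
  supportMilk: boolean;
  supportSweeteners: boolean;
  supportSyrup: boolean;
  supportTopping: boolean;
  supportSize: boolean;
}

// Turn one boolean flag into an explicit sentence.
const capability = (allowed: boolean, phrase: string): string =>
  allowed ? `It can be made ${phrase}.` : `It cannot be made ${phrase}.`;

function summarizeDrink(drink: DrinkFlags): string {
  // Drop trailing periods so the description sentence ends with exactly one.
  const description = drink.description.replace(/\.+$/, "");
  return [
    `A drink named ${drink.name}.`,
    `It is described as ${description}.`,
    capability(drink.supportMilk, "with milk"),
    capability(drink.supportSweeteners, "with sweeteners"),
    capability(drink.supportSyrup, "with syrup"),
    capability(drink.supportTopping, "with toppings"),
    capability(drink.supportSize, "in different sizes"),
  ].join(" ");
}

const espressoSummary = summarizeDrink({
  name: "Espresso",
  description: "Strong concentrated coffee shot.",
  supportMilk: false,
  supportSweeteners: true,
  supportSyrup: true,
  supportTopping: false,
  supportSize: false,
});
```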

Summarizing Collections of Data

This same approach applies to lists of data such as milks, syrups, toppings, or sizes. Instead of passing an array of objects to the model, we convert them into bullet-style text summaries:

export const createSweetenersSummary = (): string => {
  return `Available sweeteners are:
${SWEETENERS.map(
  (s) => `- ${s.name}: ${s.description}`
).join('\n')}`;
};

This gives the model a complete, readable overview of available options without requiring it to interpret raw arrays.
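
Since milks, syrups, toppings, and sizes all share the same name and description fields, one generic helper could cover every list. The sketch below assumes a hypothetical summarizeOptions function and made-up sample data. It also states explicitly when a list is empty, so the model is not left to guess.

```typescript
// Minimal shape shared by sweeteners, milks, syrups, toppings, and sizes.
interface MenuOption {
  name: string;
  description: string;
}

// Hypothetical generic helper: one function for every option list.
function summarizeOptions(label: string, items: MenuOption[]): string {
  if (items.length === 0) {
    // Say so explicitly rather than leaving the model to guess.
    return `No ${label} are available.`;
  }
  const bullets = items.map((item) => `- ${item.name}: ${item.description}`);
  return [`Available ${label} are:`, ...bullets].join("\n");
}

// Sample data for illustration only.
const sampleMilks: MenuOption[] = [
  { name: "Whole", description: "Classic full-fat dairy milk" },
  { name: "Oat", description: "Creamy plant-based alternative" },
];

const milksSummary = summarizeOptions("milks", sampleMilks);
```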

Applying the Same Idea to Other Domains

This pattern is not limited to drinks or menus. It works for any domain. For example, here’s the same summarization technique applied to an object representing a shoe in an online ordering assistant:

export const createShoeItemSummary = (shoe: {
  name: string;
  description: string;
  genderCategory: string;
  styleType: string;
  material: string;
  availableInMultipleColors: boolean;
  limitedEdition: boolean;
  supportsCustomization: boolean;
}): string => {
  return `
A shoe named ${shoe.name}.
It is described as ${shoe.description}.
It is categorized as a ${shoe.genderCategory.toLowerCase()} shoe.
It belongs to the ${shoe.styleType.toLowerCase()} fashion style.
It is made of ${shoe.material.toLowerCase()} material.
${shoe.availableInMultipleColors ? 'It is available in multiple colors.' : 'It is available in a single color.'}
${shoe.limitedEdition ? 'It is a limited-edition release.' : 'It is not a limited-edition release.'}
${shoe.supportsCustomization ? 'It supports customization options.' : 'It does not support customization options.'}
`.trim();
};

Which produces an output like:

“A shoe named Veloria Canvas Sneaker. It is described as a minimalist everyday sneaker designed for casual wear. It is categorized as a unisex shoe. It belongs to the casual fashion style. It is made of breathable canvas material. It is available in multiple colors. It is not a limited-edition release. It supports customization options.”

How to Persist Orders with MongoDB in NestJS

Now that we’ve established the core foundations of our application—schemas, parsers, and data-to-text summaries—it’s time to persist data. In a real-world assistant, orders and conversations shouldn’t disappear when the server restarts. They need to be stored reliably so they can be retrieved, analyzed, or continued later.

To achieve this, we’ll use MongoDB as our database and the NestJS Mongoose integration to manage data models and collections.

Connecting MongoDB to a NestJS Application

In NestJS, the AppModule is the root module of the application. This is where global dependencies—such as database connections—are configured.

@Module({
  imports: [
    MongooseModule.forRoot(process.env.MONGO_URI),
    ChatsModule,
  ],
  controllers: [AppController],
  providers: [AppService],
})
export class AppModule {}

What’s happening here?

MongooseModule.forRoot(...) establishes a global MongoDB connection.
The connection string is read from an environment variable (MONGO_URI), which is the recommended practice for security.
Once configured, this connection becomes available throughout the entire application.
ChatsModule is imported so it can access the database connection and register its own schemas.

This setup ensures that every feature module can safely interact with MongoDB without creating multiple connections.
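
Because process.env.MONGO_URI is undefined when the variable is not set, it can help to fail fast at startup instead of passing an empty value to forRoot. Here is a minimal sketch, assuming a hypothetical requireEnv helper (it is not part of NestJS or Mongoose). The environment is passed in as a plain object so the function is easy to test.

```typescript
// Minimal view of an environment object such as process.env.
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper: return the value or throw with a clear message.
function requireEnv(env: EnvVars, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// In the application this would be called as requireEnv(process.env, "MONGO_URI").
// The connection string below is a placeholder for illustration.
const exampleEnv: EnvVars = { MONGO_URI: "mongodb://localhost:27017/drinks_db" };
const mongoUri = requireEnv(exampleEnv, "MONGO_URI");
```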

Defining an Order Schema with Mongoose

NestJS uses decorators to define MongoDB schemas in a clean, class-based way. Each class represents a MongoDB document, and each property becomes a field in the collection.

@Schema()
export class Order {
  @Prop({ required: true })
  drink: string;

  @Prop({ default: null })
  size: string;

  @Prop({ default: null })
  milk: string;

  @Prop({ default: null })
  syrup: string;

  // Named to match the "sweeteners" field produced by the LLM
  @Prop({ default: null })
  sweeteners: string;

  @Prop({ default: null })
  toppings: string;

  @Prop({ default: 1 })
  quantity: number;
}

Why this approach?

Each @Prop() decorator maps directly to a MongoDB field.
Default values allow partial orders to be saved incrementally.
Required fields (like drink) enforce basic data integrity.
The schema closely mirrors the structured output produced by the LLM.

Once the class is defined, it’s converted into a MongoDB schema:

export const OrderSchema = SchemaFactory.createForClass(Order);

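This single line produces a Mongoose schema that:

Describes the shape of documents in the collection that stores orders
Applies the defaults and required-field checks declared with @Prop()
Can be registered with MongooseModule.forFeature() so Mongoose can create, read, and update orders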

How This Fits into the LLM Agent Architecture

At this point, we have:

Zod schemas → for validating AI output
Summarization functions → for converting data into readable prompts
MongoDB schemas → for persisting finalized orders

This separation is intentional:

Zod handles AI-facing validation
Mongoose handles database persistence
NestJS acts as the glue that ties everything together

Preparing for the Agent Logic

With the database in place, we’re now ready to implement the agent itself.

The agent’s responsibilities will include:

Interpreting user messages
Calling tools
Generating structured orders
Validating them
Persisting them to MongoDB
Maintaining conversational state

All of this logic will live inside the src/chats/chats.service.ts file. The next section introduces the agent’s core logic, and we’ll walk through it step by step so every part is easy to follow.

Start by importing the required dependencies:


import { Injectable } from '@nestjs/common';
import { InjectModel } from '@nestjs/mongoose';
import { MongoClient } from 'mongodb';
import { Model } from 'mongoose';

import { tool } from '@langchain/core/tools';
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from '@langchain/core/prompts';
import { AIMessage, BaseMessage, HumanMessage } from '@langchain/core/messages';

import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { Annotation, END, START, StateGraph } from '@langchain/langgraph';
import { ToolNode } from '@langchain/langgraph/prebuilt';

import { MongoDBSaver } from '@langchain/langgraph-checkpoint-mongodb';

import z from 'zod';

import { Order } from './schemas/order.schema';
import { OrderParser, OrderSchema, OrderType } from 'src/lib/schemas/orders';
import { DrinkParser } from 'src/lib/schemas/drinks';
import { DRINKS } from 'src/lib/utils/constants/menu_data';

import {
  createSweetenersSummary,
  availableToppingsSummary,
  createAvailableMilksSummary,
  createSyrupsSummary,
  createSizesSummary,
  createDrinkItemSummary,
} from 'src/lib/summaries';

const GOOGLE_API_KEY = process.env.GOOGLE_API_KEY || '';
const client: MongoClient = new MongoClient(process.env.MONGO_URI || '');
const database_name = 'drinks_db';

LangGraph State/Annotation Terms

In LangGraph, state can be thought of as a temporary workspace that exists while the agent is running. It stores everything that nodes (we’ll cover nodes in detail later) might need, such as the last message, the history of the conversation, or any intermediate data generated during execution.

This state allows nodes to read from it, update it, and pass information along as the agent processes a workflow, making it the agent’s short-term memory for the duration of the run.

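@Injectable()
export class ChatService {
  // Inject the Mongoose model for orders; the order tool uses it later.
  constructor(
    @InjectModel(Order.name) private readonly orderModel: Model<Order>,
  ) {}

  chatWithAgent = async ({
    thread_id,
    query,
  }: {
    thread_id: string;
    query: string;
  }) => {
    // Shared state: every node reads from and appends to `messages`.
    const graphState = Annotation.Root({
      messages: Annotation<BaseMessage[]>({
        reducer: (x, y) => [...x, ...y],
      }),
    });
  };
}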

This code defines the LangGraph state for the chat agent. The graphState object acts as a central memory that every node in the workflow can read from and update.

The messages field specifically stores all messages in the conversation, including user messages, AI responses, and tool outputs. The reducer function [...x, ...y] appends new messages to the existing array, preserving the conversation history across multiple steps.

LangGraph’s reducer mechanism lets developers control how new state merges with old state. In this chat system, the approach is similar to updating React state with setMessages(prev => [...prev, ...newMessages]): it keeps the old messages while adding the new ones.

Together, this state enables the agent, tools, and checkpointing system to maintain a coherent conversation, allowing each node in the LangGraph workflow to access the full context and contribute incrementally.
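
To see the append-style reducer in isolation, here is a small dependency-free sketch. The ChatTurn type is a simplified, hypothetical stand-in for LangGraph's message objects, and the loop imitates how each node's partial update gets merged into the existing state.

```typescript
// Simplified message shape; LangGraph uses BaseMessage objects instead.
interface ChatTurn {
  role: "user" | "ai" | "tool";
  content: string;
}

// The same append-style reducer used in the state definition.
const appendMessages = (existing: ChatTurn[], incoming: ChatTurn[]): ChatTurn[] => [
  ...existing,
  ...incoming,
];

// Each node returns only its new messages; the reducer merges them into state.
const nodeUpdates: ChatTurn[][] = [
  [{ role: "user", content: "One latte, please" }],
  [{ role: "ai", content: "Sure! Which size would you like?" }],
];

let conversation: ChatTurn[] = [];
for (const update of nodeUpdates) {
  conversation = appendMessages(conversation, update);
}
```

After the loop, conversation holds both turns in order. A reducer that returned only the incoming messages would instead discard the history on every step.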

How to Create Tools for the Agent

Modern chatbots can do more than just generate text - they can also search the internet, read files, or perform computations. While LLMs are powerful, they cannot execute code or compile programs on their own.

In the context of LLM agents, a tool is a piece of code written by the agent developer that an LLM can ask the host machine to run. The host machine executes the code, and the LLM only receives the final output of the computation.

Here's how to create a tool that stores orders in the database. Add it inside the chatWithAgent function of the ChatService class, below the state definition:

const orderTool = tool(
  async ({ order }: { order: OrderType }) => {
    try {
      await this.orderModel.create(order);
      return 'Order created successfully';
    } catch (error) {
      console.log(error);
      return 'Failed to create the order';
    }
  },
  {
    schema: z.object({
      order: OrderSchema.describe('The order that will be stored in the DB'),
    }),
    name: 'create_order',
    description: 'This tool creates a new order in the database',
  }
);

const tools = [orderTool];
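
The division of labor between the model and the host can be sketched without LangChain. In the example below, the model's request is reduced to a plain object, and the host looks up and runs the matching function. The ToolCallRequest shape, the toolRegistry, and the in-memory savedOrders array are all hypothetical stand-ins, not the library's actual types.

```typescript
// What the model emits when it wants a tool to run (simplified).
interface ToolCallRequest {
  toolName: string;
  args: Record<string, unknown>;
}

// A tool is a named function that runs on the host and returns text.
type ToolHandler = (args: Record<string, unknown>) => string;

// In-memory array standing in for the database.
const savedOrders: Record<string, unknown>[] = [];

const toolRegistry: Record<string, ToolHandler> = {
  create_order: (args) => {
    savedOrders.push(args);
    return "Order created successfully";
  },
};

// The host looks up the requested tool, runs it, and returns only the result.
function runToolCall(call: ToolCallRequest): string {
  const known = Object.prototype.hasOwnProperty.call(toolRegistry, call.toolName);
  const handler = known ? toolRegistry[call.toolName] : undefined;
  if (!handler) {
    return `Unknown tool: ${call.toolName}`;
  }
  try {
    return handler(call.args);
  } catch {
    return `Tool ${call.toolName} failed`;
  }
}
```

The model never touches savedOrders directly. It only sees the string that runToolCall returns, which mirrors how the create_order tool reports success or failure.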

LangGraph Nodes (Workflow Components)

From a definition standpoint, a LangGraph node is a fundamental component of a LangGraph workflow, representing a single unit of computation or an individual step in an AI agent's process.

Each node can perform a specific task, such as generating a message, invoking a tool, or transforming data, and it interacts with the state to read inputs and write outputs. Together, nodes are connected to form the agent’s workflow or execution graph, allowing complex reasoning and multi-step operations.

Our workflow is built from four pieces: three nodes and one conditional edge.

Agent node: This node interacts with the LLM. It builds the agent’s main prompt template and adds the earlier messages to the new prompt to provide context.
Tools node: This node gives the workflow external capabilities, such as calling external APIs or, in our case, writing to the database.
START node: This node marks the entry point of the workflow, that is, which node runs first when a user sends a message to the agent.
Conditional edge: .addConditionalEdges('agent', shouldContinue) lets the workflow branch after the 'agent' node runs, based on the value returned by shouldContinue. Unlike a fixed edge, which always goes from one node to the next, a conditional edge inspects the agent’s output and routes the workflow to a different node depending on the result.

Graph Declaration

In LangGraph, a graph is the central structure that models an AI agent’s workflow as interconnected nodes, where each node represents a computation step, tool, or decision. It orchestrates the flow of data and control between nodes, manages conditional branching, and maintains the recursive loop of execution.

Essentially, the graph is the backbone that ensures complex, stateful interactions happen in a coordinated and modular way, connecting nodes like agent, tools, and conditional edges into a coherent workflow.

With that knowledge in place, we can now create the agent graph with all its nodes.

  const callModal = async (states: typeof graphState.State) => {
    // {order_format} and {drink_format} are template variables filled in by
    // formatMessages below. Passing the parser instructions as variables keeps
    // the braces in their JSON from being read as placeholders.
    const prompt = ChatPromptTemplate.fromMessages([
      {
        role: 'system',
        content: `
            You are a helpful assistant that helps users order drinks from Starbucks.
            Your job is to take the user's request and fill in any missing details based on how a complete order should look.
            A complete order follows this structure: {order_format}

            **TOOLS**
            You have access to a "create_order" tool.
            Use this tool when the user confirms the final order.
            After calling the tool, you should inform the user whether the order was successfully created or if it failed.

            **DRINK DETAILS**
            Each drink has its own set of properties such as size, milk, syrup, sweetener, and toppings.
            Here is the drink schema: {drink_format}

            You must ask for any missing details before creating the order.

            If the user requests a modification that is not supported for the selected drink, tell them that it is not possible.

            If the user asks for something unrelated to drink orders, politely tell them that you can only assist with drink orders.

            **AVAILABLE OPTIONS**
            List of available drinks and their allowed modifications:
            ${DRINKS.map((drink) => `- ${createDrinkItemSummary(drink)}`).join('\n')}

            Sweeteners: ${createSweetenersSummary()}
            Toppings: ${availableToppingsSummary()}
            Milks: ${createAvailableMilksSummary()}
            Syrups: ${createSyrupsSummary()}
            Sizes: ${createSizesSummary()}

            Order schema: {order_format}

            If the user's query is unclear, tell them that the request is not clear.

            **ORDER CONFIRMATION**
            Once the order is ready, you must ask the user to confirm it.
            If they confirm, immediately call the "create_order" tool.
            Only respond after the tool completes, indicating success or failure.

            **FRONTEND RESPONSE FORMAT**
            Every response must include:

            "message": "Your message to the user",
            "current_order": "The order currently being constructed",
            "suggestions": "Options the user can choose from",
            "progress": "Order status ('completed' after creation)"

            **IMPORTANT RULES**
            - Be friendly, use emojis, and add humor.
            - Use null for unfilled fields.
            - Never omit the JSON tracking object.
        `,
      },
      new MessagesPlaceholder('messages'),
    ]);

    // Fill the template variables: parser instructions plus the chat history.
    const formattedPrompt = await prompt.formatMessages({
      order_format: OrderParser.getFormatInstructions(),
      drink_format: DrinkParser.getFormatInstructions(),
      messages: states.messages,
    });

    // Bind the tools so the model can request a create_order call.
    const chat = new ChatGoogleGenerativeAI({
      model: 'gemini-2.0-flash',
      temperature: 0,
      apiKey: GOOGLE_API_KEY,
    }).bindTools(tools);

    const result = await chat.invoke(formattedPrompt);
    return { messages: [result] };
  };

    const shouldContinue = (state: typeof graphState.State) => {
      const lastMessage = state.messages[
        state.messages.length - 1
      ] as AIMessage;
      return lastMessage.tool_calls?.length ? 'tools' : END;
    };

    const toolsNode = new ToolNode(tools);

    /**
     * Build the conversation graph.
     */
    const graph = new StateGraph(graphState)
      .addNode('agent', callModal)
      .addNode('tools', toolsNode)
      .addEdge(START, 'agent')
      .addConditionalEdges('agent', shouldContinue)
      .addEdge('tools', 'agent');

Explanation

Graph State (graphState)
The graphState object is the shared memory across all nodes. It stores messages, which track the conversation history including user inputs, AI responses, and tool interactions. The reducer [...x, ...y] appends new messages, preserving past context. This is similar to React state updates: old messages remain while new ones are added.
Agent Node (callModal)
This node handles the LLM call. It formats a prompt containing system instructions, drink schemas, available tools, and frontend response rules. By including states.messages, the AI sees the full conversation history, enabling multi-turn dialogue.
LLM Execution
ChatGoogleGenerativeAI generates the AI response. .bindTools(tools) allows the AI to call tools like create_order directly if needed.
Conditional Flow (shouldContinue)
After the AI responds, the shouldContinue function checks if the message includes tool calls. If so, execution moves to the tools node; otherwise, the workflow ends. This allows dynamic branching depending on the AI’s output.
Tool Node (ToolNode)
The tools node executes the requested tool, such as saving the order to the database. Once completed, control returns to the agent node, enabling the AI to respond to the user with results.
Graph Construction (StateGraph)
Nodes are connected in a coherent workflow:
- START → agent begins the conversation
- Conditional edges handle tool execution
- tools → agent ensures the agent can respond after tools run
Overall Flow
Together, the graph and shared state ensure a stateful, multi-turn conversation. The AI can ask for missing details, call tools when needed, and maintain context across interactions. Every node reads and writes to the same state.
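
The agent and tools loop described above can be imitated in a few lines of plain TypeScript. This is only a sketch of the control flow, not of LangGraph itself: fakeAgent and fakeTools are hypothetical stand-ins for the LLM call and the tools node, and routeAfterAgent makes the same decision as shouldContinue.

```typescript
// Simplified message: an AI message may carry pending tool calls.
interface StepMessage {
  role: "user" | "ai" | "tool";
  content: string;
  toolCalls?: string[];
}

interface LoopState {
  messages: StepMessage[];
}

// Stand-in for the LLM: asks for the tool once, then answers.
function fakeAgent(state: LoopState): StepMessage {
  const toolAlreadyRan = state.messages.some((m) => m.role === "tool");
  return toolAlreadyRan
    ? { role: "ai", content: "Your order has been placed." }
    : { role: "ai", content: "", toolCalls: ["create_order"] };
}

// Stand-in for the tools node.
function fakeTools(): StepMessage {
  return { role: "tool", content: "Order created successfully" };
}

// Go to the tools step if the last message has tool calls; otherwise stop.
function routeAfterAgent(state: LoopState): "tools" | "end" {
  const last = state.messages[state.messages.length - 1];
  return last && last.toolCalls && last.toolCalls.length > 0 ? "tools" : "end";
}

// START -> agent -> (tools -> agent)* -> END, bounded by a step limit.
function runLoop(userText: string, stepLimit = 15): LoopState {
  const state: LoopState = { messages: [{ role: "user", content: userText }] };
  for (let step = 0; step < stepLimit; step++) {
    state.messages.push(fakeAgent(state));
    if (routeAfterAgent(state) === "end") {
      return state;
    }
    state.messages.push(fakeTools());
  }
  throw new Error("Step limit reached without a final answer");
}

const loopResult = runLoop("Yes, confirm my order");
```

The step limit plays a role similar to a recursion limit: it stops a model that keeps requesting tools from looping forever.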

Workflow Compilation and State Persistence (Final Part)

So far, all of our states are temporary, meaning they only exist for the duration of a user’s request. However, we want our agent to remember and recall conversation context even when a new request is sent with the same thread_id or conversation ID.

To achieve this, we’ll use MongoDB together with the @langchain/langgraph-checkpoint-mongodb library. This library simplifies state persistence by associating each conversation with a unique, manually assigned ID. All operations, from retrieving previous messages to saving new ones, are handled internally. You only need to provide the ID of the conversation you want to work with.

const graph = new StateGraph(graphState)
  .addNode('agent', callModal)
  .addNode('tools', toolsNode)
  .addEdge(START, 'agent')
  .addConditionalEdges('agent', shouldContinue)
  .addEdge('tools', 'agent');

  // Persist graph state in MongoDB so each thread_id keeps its history.
  const checkpointer = new MongoDBSaver({ client, dbName: database_name });

  const app = graph.compile({ checkpointer });

  /**
   * Run the graph using the user's message.
   */
  const finalState = await app.invoke(
    { messages: [new HumanMessage(query)] },
    { recursionLimit: 15, configurable: { thread_id } },
  );

  /**
   * Extract the JSON payload from the AI response.
   * The reply is expected to contain a ```json fenced block.
   */
  function extractJsonResponse(response: unknown) {
    if (typeof response !== 'string') {
      throw new Error('Expected the AI response to be a string');
    }
    const match = response.match(/```json\s*([\s\S]*?)\s*```/i);
    if (!match) {
      throw new Error('No JSON block found in the AI response');
    }
    return JSON.parse(match[1].trim());
  }

  // The last message in the final state is the agent's reply to the user.
  const lastMessage = finalState.messages.at(-1) as AIMessage;
  return extractJsonResponse(lastMessage.content);

The above code demonstrates how to initialize a checkpoint, compile a graph, and invoke the agent with an incoming prompt.
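
To make the idea of thread-scoped persistence concrete, here is a deliberately simplified sketch. An in-memory object keyed by thread ID stands in for the MongoDB collection, and invokeWithMemory is a hypothetical function that loads the previous turns, appends the new ones, and saves the result. The real checkpointer stores full graph checkpoints rather than a plain list of strings.

```typescript
// In-memory stand-in for the checkpoint collection, keyed by thread ID.
const threadStore: Record<string, string[]> = {};

// Load prior turns for the thread, append the new ones, and save again.
function invokeWithMemory(threadId: string, userText: string): string[] {
  const previous = threadStore[threadId] || [];
  // Placeholder for the agent's answer; numbered so the history is visible.
  const reply = `Reply #${previous.length / 2 + 1}`;
  const updated = [...previous, userText, reply];
  threadStore[threadId] = updated;
  return updated;
}

invokeWithMemory("thread-a", "I'd like a latte");
const threadA = invokeWithMemory("thread-a", "Make it a large");
const threadB = invokeWithMemory("thread-b", "Hello");
```

The second call on thread-a sees the first exchange, while thread-b starts with an empty history. That is the behavior the thread_id passed to app.invoke is meant to provide.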

The extractJsonResponse function pulls out the formatted response that we instructed the LLM to produce whenever it replies to the user.

As a reminder, the main prompt template requires every response to include: "message": "Your message to the user", "current_order": "The order currently being constructed", "suggestions": "Options the user can choose from", "progress": "Order status ('completed' after creation)"

Every response from the LLM should look like this:

```json
{
  "message": "Got it! To make sure I get your order just right, can you clarify which coffee drink you'd like? We have Latte, Cappuccino, Cold Brew, and Frappuccino. 😊",
  "current_order": {
    "drink": null,
    "size": null,
    "milk": null,
    "syrup": null,
    "sweeteners": null,
    "toppings": null,
    "quantity": null
  },
  "suggestions": [
    "Latte",
    "Cappuccino",
    "Cold Brew",
    "Frappuccino"
  ],
  "progress": "incomplete"
}
```

This structure allows the frontend to easily render the LLM response and track the state of the current order. This is more of a design choice and less of a convention.
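
On the receiving side, it can be useful to confirm that a parsed reply really has this shape before rendering it. The sketch below assumes a hypothetical AgentReply interface and isAgentReply type guard. The field names come from the prompt, while the exact types are an interpretation of the sample response.

```typescript
// Shape the prompt asks the model to return on every turn.
interface AgentReply {
  message: string;
  current_order: Record<string, unknown> | null;
  suggestions: string[];
  progress: string;
}

// Hypothetical type guard: confirm a parsed value has the expected fields.
function isAgentReply(value: unknown): value is AgentReply {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  const suggestions = candidate["suggestions"];
  return (
    typeof candidate["message"] === "string" &&
    typeof candidate["progress"] === "string" &&
    typeof candidate["current_order"] === "object" &&
    Array.isArray(suggestions) &&
    suggestions.every((s) => typeof s === "string")
  );
}
```

A frontend could call isAgentReply on the parsed JSON and fall back to a generic error message when it returns false.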

Conclusion

Building an autonomous AI agent with LangChain and LangGraph allows you to combine the reasoning power of LLMs with practical tool execution and persistent memory. By defining schemas, parsing data into human-readable formats, and orchestrating workflows through nodes, you can create intelligent agents capable of handling real-world tasks—like our Starbucks barista.

With MongoDB integration for state persistence, your agent can maintain context across conversations, making interactions feel more natural and human-like. This approach opens the door to building more sophisticated, domain-specific AI assistants without starting from scratch.

In short: define your data, teach your agent how to reason, and let LangGraph orchestrate the magic. ☕🤖

Source code here: https://github.com/DjibrilM/langgraph-starbucks-agent

Resources

LangGraph documentation: https://docs.langchain.com/oss/javascript/langgraph/quickstart
Synergizing Reasoning and Acting in Language Models: https://arxiv.org/abs/2210.03629

How to Use LangChain and LangGraph: A Beginner’s Guide to AI Workflows

Manish Shivanandhan — Wed, 05 Nov 2025 17:23:58 +0000

Artificial intelligence is moving fast. Every week, new tools appear that make it easier to build apps powered by large language models.

But many beginners still get stuck on one question: how do you structure the logic of an AI application? How do you connect prompts, memory, tools, and APIs in a clean way?

That is where popular open-source frameworks like LangChain and LangGraph come in.

Both are part of the same ecosystem, and they’re designed to help you build complex AI workflows without reinventing the wheel.

LangChain focuses on building sequences of steps called chains, while LangGraph takes things a step further by adding memory, branching, and feedback loops to make your AI more intelligent and flexible.

This guide will help you understand what these tools do, how they differ, and how you can start using them to build your own AI projects.

What we will cover

What is LangChain?
- Why LangChain Was Not Enough
What is LangGraph?
LangChain vs LangGraph
When to Use Each
Adding Memory and Persistence
Monitoring and Debugging with LangSmith
The LangChain Ecosystem
Conclusion

What is LangChain?

LangChain is a Python and JavaScript framework that helps you build language model-powered applications. It provides a structure for connecting models like GPT, data sources, and tools into a single flow.

Instead of writing long prompt templates or hardcoding logic, you use components like chains, tools, and agents.

A simple example is chaining prompts together. For instance, you might first ask the model to summarize text, and then use the summary to generate a title. LangChain lets you define both steps and connect them in code.

Here is a basic example in Python:

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = PromptTemplate.from_template("Summarize the following text:\n{text}")
# Pipe the prompt into the model (LLMChain and chain.run are deprecated in recent releases)
chain = prompt | llm
result = chain.invoke({"text": "LangChain helps developers build AI apps faster."})
print(result.content)

This simple chain takes text and runs it through an OpenAI model to get a summary. You can add more steps, like a second chain to turn that summary into a title or a question.
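To make the summarize-then-title idea concrete without any framework, here is a minimal sketch in plain Python. The `fake_llm` function is a hypothetical stand-in for a real model call, and `summarize`, `make_title`, and `chain` are illustrative names, not LangChain APIs.

```python
# A framework-free sketch of chaining: the output of one step feeds the next.
# fake_llm is a hypothetical stand-in for a real model call.

def fake_llm(prompt: str) -> str:
    # Pretend "model": returns the last line of the prompt, cut to 40 characters.
    return prompt.strip().splitlines()[-1][:40]

def summarize(text: str) -> str:
    return fake_llm(f"Summarize the following text:\n{text}")

def make_title(summary: str) -> str:
    return fake_llm(f"Write a short title for this summary:\n{summary}").title()

def chain(text: str) -> str:
    # Step 1 produces a summary; step 2 turns that summary into a title.
    return make_title(summarize(text))

print(chain("LangChain helps developers build AI apps faster."))
```

A real chain works the same way in spirit: each step is a function of the previous step's output, and the framework mostly handles the plumbing.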

LangChain provides modules for prompt templates, models, retrievers, and tools so you can build workflows without managing the raw API logic.

Here is the full LangChain documentation.

Why LangChain Was Not Enough

LangChain made it easy to build straight-line workflows.

But most real-world applications are not linear. When building a chatbot, summarizer, or an autonomous agent, you often need loops, memory, and conditions.

For example, if the AI makes a wrong assumption, you might want it to try again. If it needs more data, it should call a search tool. Or if a user changes context, the AI should remember what was discussed earlier.

LangChain’s chains and agents could do some of this, but the flow was hard to visualize and manage. You had to write nested chains or use callbacks to handle decisions.

Developers wanted a better way to represent how AI systems actually think. Not in straight lines, but as graphs where outputs can lead to different paths.

That’s what led to LangGraph.

What is LangGraph?

LangGraph is an extension of LangChain that introduces a graph-based approach to AI workflows.

Instead of chaining steps in one direction, LangGraph lets you define nodes and edges like a flowchart. Each node can represent a task, an action, or a model call.

This structure allows loops, branching, and parallel paths. It’s perfect for building agent-like systems where the model reasons, decides, and acts.

Here is an example of a simple LangGraph setup:

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import create_react_agent

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini")
agent_executor = create_react_agent(llm, [multiply])

# StateGraph needs a state schema; MessagesState stores the chat messages
graph = StateGraph(MessagesState)
graph.add_node("agent", agent_executor)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)
app = graph.compile()
response = app.invoke({"messages": [("user", "Use the multiply tool to get 8 times 7")]})
print(response["messages"][-1].content)

This example shows a basic agent graph.

The AI receives a request, reasons about it, decides to use the tool, and completes the task. You can imagine extending this to more complex graphs where the AI can retry, call APIs, or fetch new information.

LangGraph gives you full control over how the AI moves between states. Each node can have conditions. For example, if an answer is incomplete, you can send it back to another node to refine it.

This makes LangGraph ideal for building systems that need multiple reasoning steps, like document analysis bots, code reviewers, or research assistants.
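The idea of nodes, conditional edges, and loops can be sketched in a few lines of plain Python. This is not the LangGraph API: the node name `draft`, the `route` function, and the shape of the state dictionary are all assumptions made for illustration.

```python
# A plain-Python sketch of a graph with one node, a conditional edge, and a retry loop.
# Not the LangGraph API; names and state shape are illustrative only.

def draft(state: dict) -> dict:
    # Each pass appends one more point to the answer.
    state["answer"] += f"Point {state['attempts'] + 1}. "
    state["attempts"] += 1
    return state

def route(state: dict) -> str:
    # Conditional edge: loop back until the answer has three points (or we give up).
    if state["answer"].count("Point") < 3 and state["attempts"] < 5:
        return "draft"
    return "END"

NODES = {"draft": draft}

def run(state: dict, entry: str = "draft") -> dict:
    current = entry
    while current != "END":
        state = NODES[current](state)
        current = route(state)
    return state

result = run({"answer": "", "attempts": 0})
print(result)
```

The `attempts < 5` guard is a design choice worth copying in real graphs: any loop that depends on model output should have an upper bound.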

Here is the full LangGraph documentation.

LangChain vs LangGraph

LangChain and LangGraph share the same foundation, but they approach workflows differently.

LangChain is linear. Each chain or agent moves from one step to the next in a sequence. It is simpler to start with, especially for prompt engineering, retrieval-augmented generation, and structured pipelines.

LangGraph is dynamic. It represents workflows as graphs that can loop, branch, and self-correct. It is more powerful when building agents that need reasoning, planning, or memory.

A good analogy is this: LangChain is like writing a list of tasks in order. LangGraph is like drawing a flowchart where decisions can lead to different actions or back to previous steps.

Most developers start with LangChain to learn the basics, then move to LangGraph when they want to build more interactive or autonomous AI systems.

When to Use Each

If you’re building simple tools like text summarizers, chatbots, or document retrievers, LangChain is enough. It’s easy to get started and integrates well with popular models like GPT, Claude, and Gemini.

If you want to build multi-step agents, or apps that think and adapt, go with LangGraph. You can define how the AI reacts to different outcomes, and you get more control over retry logic, context switching, and feedback loops.

In practice, many developers combine both. LangChain provides the building blocks, while LangGraph organizes how those blocks interact.

Adding Memory and Persistence

Both LangChain and LangGraph support memory, which allows your AI to remember context between interactions. This is useful when you’re building chatbots, assistants, or agents that need to carry information across steps.

For example, if a user introduces themselves once, the AI should be able to recall that detail later in the conversation.

In LangChain, memory is handled through built-in modules like ConversationBufferMemory or ConversationSummaryMemory. These let you store previous inputs and outputs so the model can reference them in future responses.

Here’s a simple example using LangChain:

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

memory = ConversationBufferMemory()
llm = ChatOpenAI(model="gpt-4o-mini")
conversation = ConversationChain(llm=llm, memory=memory)

conversation.predict(input="Hello, I am Manish.")
response = conversation.predict(input="What did I just tell you?")
print(response)

In this case, the model remembers your previous message and answers accordingly. The memory object acts like a running conversation log, keeping track of the dialogue as it evolves.
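Conceptually, a conversation buffer just stores earlier turns and prepends them to the next prompt. The sketch below is a simplified stand-in under that assumption, not LangChain's implementation; `BufferMemory`, `save`, and `as_prompt` are hypothetical names.

```python
# A simplified stand-in for a conversation buffer; not LangChain's implementation.

class BufferMemory:
    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_prompt(self, new_input: str) -> str:
        # Prepend every earlier turn so the model can "see" the history.
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nHuman: {new_input}".strip()

memory = BufferMemory()
memory.save("Human", "Hello, I am Manish.")
memory.save("AI", "Nice to meet you, Manish!")
prompt = memory.as_prompt("What did I just tell you?")
print(prompt)
```

Because the whole history is resent on every turn, a buffer like this grows without bound, which is the motivation for summary-style memory.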

LangGraph takes this a step further by embedding memory into the graph’s state. Each node in the graph can access or update shared memory, allowing your AI to maintain context across multiple reasoning steps or branches. This approach is especially useful when building agents that loop, revisit nodes, or depend on previous interactions.

Here’s how memory can be added inside a LangGraph workflow:

from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, MessagesState, START, END

llm = ChatOpenAI(model="gpt-4o-mini")

def chat(state: MessagesState):
    # The node sees the full message history stored in the graph state
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("chat", chat)
graph.add_edge(START, "chat")
graph.add_edge("chat", END)

# The checkpointer saves the state for each thread_id between invocations
app = graph.compile(checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "demo"}}

app.invoke({"messages": [("user", "Hello, I am Manish.")]}, config)
response = app.invoke({"messages": [("user", "What did I just tell you?")]}, config)
print(response["messages"][-1].content)

Here, the graph keeps track of memory between invocations. Even though each call runs through the same node, the checkpointer restores the message history for the same thread_id, so the model sees what was said earlier. This design lets you build agents that remember user context, maintain history, and adapt as they move between nodes.

Whether you use LangChain or LangGraph, adding memory is what turns a simple workflow into a stateful system, one that can carry on a conversation, refine its reasoning, and respond more naturally over time.

Monitoring and Debugging with LangSmith

LangSmith is another important tool from the LangChain ecosystem. It helps you visualize, monitor, and debug your AI applications.

When building workflows, you often want to see how the model behaves, how much it costs, and where things go wrong.

LangSmith records every call made by your chains and agents. You can view input and output data, timing, token usage, and errors. It provides a dashboard that shows how your system performed across multiple runs.

You can integrate LangSmith easily by setting two environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="your_api_key_here"

Then, every LangChain or LangGraph process you run will automatically log to LangSmith. This helps developers find bugs, optimize prompts, and understand how the workflow behaves at each step.
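If you prefer to configure tracing from Python rather than the shell, the same two variables can be set on the current process before any chain or graph runs. This is a small sketch; `enable_langsmith_tracing` is a hypothetical helper name, and the key shown is a placeholder.

```python
import os

def enable_langsmith_tracing(api_key: str) -> None:
    # Same two variables as the shell exports, set for the current process only.
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = api_key

enable_langsmith_tracing("your_api_key_here")
print(os.environ["LANGCHAIN_TRACING_V2"])
```

In a real project you would load the key from a secrets store or a `.env` file instead of hardcoding it.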

Note that while LangChain and LangGraph are open source, LangSmith is a paid platform. LangSmith is a good-to-have tool and not a requirement to build AI workflows.

The LangChain Ecosystem

LangChain is not just one library. It has grown into an ecosystem of tools that work together.

LangChain Core: The main framework for chains, prompts, and memory.
LangGraph: A graph-based extension for building adaptive workflows.
LangSmith: A debugging and monitoring platform for AI apps.
LangServe: A deployment layer that lets you turn your chains and graphs into APIs with one command.

Together, these tools form a complete stack for building, managing, and deploying language model applications. You can start with a simple chain, evolve it into a graph-based system, test it with LangSmith, and deploy it using LangServe.

Conclusion

LangChain and LangGraph make it easier to move from prompts to production-ready AI systems. LangChain helps you build linear flows that connect models, data, and tools. LangGraph lets you go further by building adaptive and intelligent workflows that reason and learn.

For beginners, starting with LangChain is the best way to understand how language models can interact with other components. As your projects grow, LangGraph will give you the flexibility to handle complex logic and long-term state.

Whether you are building a chatbot, an agent, or a knowledge assistant, these tools will help you go from idea to implementation faster and more reliably.

Hope you enjoyed this article. Sign up for my free newsletter TuringTalks.ai for more hands-on tutorials on AI. You can also visit my website.

How to Use LangChain and GPT to Analyze Multiple Documents

David Clinton — Wed, 06 Nov 2024 16:06:55 +0000

Over the past year or so, the developer universe has exploded with ingenious new tools, applications, and processes for working with large language models and generative AI.

One particularly versatile example is the LangChain project. The overall goal involves providing easy integrations with various LLM models. But the LangChain ecosystem is also host to a growing number of (sometimes experimental) projects pushing the limits of the humble LLM.

Spend some time browsing LangChain’s website to get a sense of what's possible. You'll see how many tools are designed to help you build more powerful applications.

But you can also use it as an alternative for connecting your favorite AI with the live internet. Specifically, this demo will show you how to use it to programmatically access, summarize, and analyze long and complex online documents.

To make it all happen, you’ll need a Python runtime environment (like Jupyter Lab) and a valid OpenAI API key.

Prepare Your Environment

One popular use for LangChain involves loading multiple PDF files in parallel and asking GPT to analyze and compare their contents.

As you can see for yourself in the LangChain documentation, existing modules can be loaded to permit PDF consumption and natural language parsing. I'm going to walk you through a use-case sample that's loosely based on the example in that documentation. Here's how that begins:

import os
os.environ['OPENAI_API_KEY'] = "sk-xxx"
from pydantic import BaseModel, Field
from langchain.chat_models import ChatOpenAI
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA

That code will build your environment and set up the tools necessary for:

Enabling OpenAI Chat (ChatOpenAI)
Understanding and processing text (OpenAIEmbeddings, CharacterTextSplitter, FAISS, RetrievalQA)
Managing an AI agent (Tool)

Next, you'll create and define a DocumentInput class and a value called llm which sets some familiar GPT parameters that'll both be called later:

class DocumentInput(BaseModel):
    question: str = Field()
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

Load Your Documents

Next, you'll create a couple of arrays. The three path variables in the files array contain the URLs for recent financial reports issued by three software/IT services companies: Alphabet (Google), Cisco, and IBM.

We're going to have GPT dig into three companies’ data simultaneously, have the AI compare the results, and do it all without having to go to the trouble of downloading PDFs to a local environment.

You can usually find such legal filings in the Investor Relations section of a company's website.

tools = []
# Keep each URL on one line: a backslash continuation inside a string
# would embed the next line's leading whitespace in the URL
files = [
    {
        "name": "alphabet-earnings",
        "path": "https://abc.xyz/investor/static/pdf/2023Q1_alphabet_earnings_release.pdf",
    },
    {
        "name": "Cisco-earnings",
        "path": "https://d18rn0p25nwr6d.cloudfront.net/CIK-0000858877/5b3c172d-f7a3-4ecb-b141-03ff7af7e068.pdf",
    },
    {
        "name": "IBM-earnings",
        "path": "https://www.ibm.com/investor/att/pdf/IBM_Annual_Report_2022.pdf",
    },
]

This for loop will iterate through each value of the files array I just showed you. For each iteration, it'll use PyPDFLoader to load the specified PDF file, loader and CharacterTextSplitter to parse the text, and the remaining tools to organize the data and apply the embeddings. It'll then invoke the DocumentInput class we created earlier:

for file in files:
    loader = PyPDFLoader(file["path"])
    pages = loader.load_and_split()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(pages)
    embeddings = OpenAIEmbeddings()
    retriever = FAISS.from_documents(docs, embeddings).as_retriever()
    # Wrap each retriever in a Tool (inside the loop, so every file gets one)
    tools.append(
        Tool(
            args_schema=DocumentInput,
            name=file["name"],
            description=f"useful when you want to answer questions about {file['name']}",
            func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),
        )
    )
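To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch of fixed-size chunking in plain Python. It is an assumption-laden stand-in: LangChain's CharacterTextSplitter splits on a separator first and behaves differently in detail, and `chunk_text` is a hypothetical helper name.

```python
# A simplified sketch of fixed-size chunking with overlap.
# Not how CharacterTextSplitter works internally; it only illustrates
# the chunk_size / chunk_overlap idea.

def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    # Each new chunk starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With an overlap of 1, the last character of each chunk is repeated at the start of the next, which helps a retriever when a relevant sentence straddles a chunk boundary.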

Prompt Your Model

At this point, we're finally ready to create an agent and feed it our prompt as input.

llm = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo-0613",
)
agent = initialize_agent(
    agent=AgentType.OPENAI_FUNCTIONS,
    tools=tools,
    llm=llm,
    verbose=True,
)
agent({"input": "Based on these SEC filing documents, identify "
    "which of these three companies - Alphabet, IBM, and Cisco "
    "has the greatest short-term debt levels and which has the "
    "highest research and development costs."})

The output that I got was short and to the point:

'output': 'Based on the SEC filing documents:\n\n- The company with the greatest short-term debt levels is IBM, with a short-term debt level of $4,760 million.\n- The company with the highest research and development costs is Alphabet, with research and development costs of $11,468 million.'

Wrapping Up

As you’ve seen, LangChain lets you integrate multiple tools into generative AI operations, enabling multi-layered programmatic access to the live internet and more sophisticated LLM prompts.

With these tools, you’ll be able to automate applying the power of AI engines to real-world data assets in real time. Try it out for yourself.

This article is excerpted from my Manning book, The Complete Obsolete Guide to Generative AI. But you can find plenty more technology goodness at my website.

How to Start Building Projects with LLMs

Harshit Tyagi — Mon, 30 Sep 2024 18:46:25 +0000

If you’re an aspiring AI professional, becoming an LLM engineer offers an exciting and promising career path.

But where should you start? What should your trajectory look like? How should you learn?

In one of my previous posts, I laid out the complete roadmap to become an AI / LLM Engineer. Reading this article will give you insights into the types of skills you’ll need to acquire and how to start learning.

The Best Way to Learn is to BUILD!

Andrej Karpathy emphasizes that you should build concrete projects and explain everything you learn in your own words. (He also advises comparing yourself only to a younger version of yourself, never to others.)

And I agree – building projects is the best way to not just learn but really grok these concepts. It will further sharpen the skills you’re learning to think about cutting edge use cases.

But the main challenge with this learning philosophy is that good projects can be hard to find.

And that’s the problem I am trying to resolve. I want to help people, including myself, discover and build practical and real-world projects that help you develop skills that are worth showcasing in your portfolio.

Here’s What We’ll Cover:

What Should Be Your First Project?
Project #1: YouTube Video Summarizer
Project #2 preview: Multi-purpose Customer Service Bot
Project #3 preview: RAG-Powered Support Bot
Conclusion

What Should Be Your First Project?

If you’re a beginner who knows basic to intermediate programming, your initial projects should showcase that you can comfortably build applications with LLMs.

They should demonstrate that:

you know what APIs are
you know how to consume them
you know how to build products that people actually want to use

Building a chatbot provides a great starting point, but at this point everyone has developed one. And there are many solutions for easy Streamlit based prototypes. So, you need to develop something that’s actually usable and has the potential to reach a wider audience.

I’d suggest building a chatbot for WhatsApp or Discord or Telegram. Build a chatbot which solves a problem people struggle with, a problem that companies have started to build solutions for.

If I had to pick a good and, arguably, the most common AI project that every company has started to work on, it would be RAG-powered chatbots.

But before you get to building RAG-powered bots, you should start building something slightly more basic but practical with LLMs.

To kick things off, let’s start by building a YouTube Summariser.

Project #1: Summarise YouTube Videos

We’ll build the first part of this project in this tutorial: the core functionality of a YouTube video summariser tool.

Our bot will:

Receive the YouTube URL.
Validate if the URL is correct.
Retrieve the transcript of the video.
Use an LLM to analyze and summarize the video’s content.
Return the summary to the user.

Setup and Requirements

For this project, we’ll code the core functionality in a Jupyter Notebook using the following Python packages:

langchain-together — for the LLM using the LangChain <> Together AI integration
langchain-community — for specific data loaders
langchain — for programming with LLMs
pytube — for fetching video info
youtube-transcript-api — for YouTube video transcripts

We’ll use the Llama 3.1 model offered as an API by Together AI.

Together AI is a cloud platform that offers open-source models as inference APIs, so you can use them without worrying about the underlying infrastructure.

Let’s start by installing these:

!pip install --upgrade --quiet langchain
!pip install --quiet langchain-community
!pip install --upgrade --quiet langchain-together
!pip install youtube_transcript_api
!pip install pytube

Now let’s set up our LLM:

## setting up the language model
from langchain_together import ChatTogether
import api_key  # local api_key.py file that stores your Together AI key in a variable named `api`

llm = ChatTogether(api_key=api_key.api, temperature=0.0,
                   model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo")

The next step is to process the YouTube videos as a data source. For this we’ll need to understand the concept of document loaders.

Introduction to Document Loaders

Document loaders provide a unified interface to load data from various sources into a standardized Document format.

They automatically extract and attach relevant metadata to the loaded content.
The metadata can include source information, timestamps, or other contextual data that can be valuable for downstream processing.
LangChain offers loaders for CSV, PDF, HTML, JSON, and even specialized loaders for sources like YouTube transcripts or GitHub repositories, as listed in their integrations page.
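The "content plus metadata" shape is easy to picture with a small sketch. The `Document` dataclass and `load_text_file` function below are simplified, hypothetical stand-ins written for illustration; they are not LangChain's classes.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Document:
    # Simplified stand-in: the text plus a metadata dictionary.
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_text_file(path: str) -> list[Document]:
    # A toy "loader": reads one file and attaches its source and size as metadata.
    p = Path(path)
    return [Document(page_content=p.read_text(encoding="utf-8"),
                     metadata={"source": str(p), "size_bytes": p.stat().st_size})]

# Usage: write a temporary file, then load it into the standardized format.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("hello loaders", encoding="utf-8")
    docs = load_text_file(str(sample))

print(docs[0].page_content, docs[0].metadata["size_bytes"])
```

Every loader returning the same shape is what lets later steps (splitting, embedding, summarizing) stay independent of where the data came from.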

Categories of Document Loaders

Document loaders in LangChain can be broadly categorized into two types:

1. File Type-Based Loaders

Parse and load documents based on specific file formats
Examples include: CSV, PDF, HTML, Markdown

2. Data Source-Based Loaders

Retrieve data from various external sources
Load the data into Document objects
Examples include: YouTube, Wikipedia, GitHub

Integration Capabilities

LangChain’s document loaders can integrate with almost any file format you might need.
They also support many third-party data sources.

For our project, we’ll use the YoutubeLoader to get the transcripts in the required format.

YoutubeLoader from LangChain to Get Transcript:

## import the YouTube document loader from LangChain
from langchain_community.document_loaders import YoutubeLoader

video_url = 'https://www.youtube.com/watch?v=gaWxyWwziwE'
loader = YoutubeLoader.from_youtube_url(video_url, add_video_info=False)
data = loader.load()

Process the YouTube Transcript

Display raw transcript content
Use the LLM to summarize and extract key points from the transcript:

# show the extracted page content
data[0].page_content

The page_content attribute contains the complete transcript as a single string.

Now that we have the transcript, we simply need to pass this to the LLM we configured above along with the prompt to summarise.

First, let’s understand a simple method:

LangChain offers the invoke() method, which takes a list of messages: a system message and a user (human) message.

The system message is essentially the instructions for the LLM on how it is supposed to process the human request.

And the human message is simply what we want the LLM to do.

# This code creates a list of messages for the language model:
# 1. A system message with instructions on how to summarize the video transcript
# 2. A human message containing the actual video transcript

# The messages are then passed to the language model (llm) for processing
# The model's response is stored in the 'ai_msg' variable and returned

messages = [
    (
        "system", 
        """Read through the entire transcript carefully.
           Provide a concise summary of the video's main topic and purpose.
           Extract and list the five most interesting or important points from the transcript. For each point: State the key idea in a clear and concise manner.

        - Ensure your summary and key points capture the essence of the video without including unnecessary details.
        - Use clear, engaging language that is accessible to a general audience.
        - If the transcript includes any statistical data, expert opinions, or unique insights, prioritize including these in your summary or key points.""",
    ),
    ("human", data[0].page_content),
]
ai_msg = llm.invoke(messages)
ai_msg

But this approach becomes hard to maintain when you have more variables or want a more dynamic solution.

For this, LangChain offers PromptTemplate:

A PromptTemplate in LangChain is a powerful tool that helps in creating dynamic prompts for large language models (LLMs). It allows you to define a template with placeholders for variables that can be filled in with actual values at runtime.

This helps in managing and reusing prompts efficiently, ensuring consistency and reducing the likelihood of errors in prompt creation.

A PromptTemplate consists of:

Template String: The actual prompt text with placeholders for variables.
Input Variables: A list of variables that will be replaced in the template string at runtime.

# Set up a prompt template for summarizing a video transcript using LangChain

# Import necessary classes from LangChain
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Define a PromptTemplate for summarizing video transcripts
# The template includes instructions for the AI model on how to process the transcript
product_description_template = PromptTemplate(
    input_variables=["video_transcript"],
    template="""
    Read through the entire transcript carefully.
           Provide a concise summary of the video's main topic and purpose.
           Extract and list the five most interesting or important points from the transcript. 
           For each point: State the key idea in a clear and concise manner.

        - Ensure your summary and key points capture the essence of the video without including unnecessary details.
        - Use clear, engaging language that is accessible to a general audience.
        - If the transcript includes any statistical data, expert opinions, or unique insights, 
        prioritize including these in your summary or key points.

    Video transcript: {video_transcript}    """
)
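The two parts of a prompt template, a template string and its input variables, can be illustrated with Python's built-in `str.format`. `SimpleTemplate` is a hypothetical class written for this sketch, not LangChain's PromptTemplate, and the `n_points` variable is an assumption added to show more than one placeholder.

```python
# A minimal stand-in for a prompt template, built on str.format.
# SimpleTemplate is a hypothetical name, not LangChain's PromptTemplate.

class SimpleTemplate:
    def __init__(self, template: str, input_variables: list[str]) -> None:
        self.template = template
        self.input_variables = input_variables

    def format(self, **values: str) -> str:
        # Fail early with a clear message if a placeholder has no value.
        missing = [name for name in self.input_variables if name not in values]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**values)

summary_prompt = SimpleTemplate(
    template="Summarize this transcript in {n_points} points:\n{video_transcript}",
    input_variables=["n_points", "video_transcript"],
)
print(summary_prompt.format(n_points="five", video_transcript="Hello and welcome..."))
```

Declaring the input variables up front is what lets a template catch a missing value before the prompt ever reaches the model.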

How to Use LLMChain / LCEL for Summarization

A chain is a sequence of steps that consists of a language model, PromptTemplate, and an optional output parser.

Create an LLMChain with the custom prompt template
Generate a summary of the video transcript using the chain

Here, we are using LLMChain, but you can also use LangChain Expression Language (LCEL) to do this:

## create the chain from the LLM and the prompt template
chain = LLMChain(llm=llm, prompt=product_description_template)

# Run the chain with the video transcript
summary = chain.invoke({
    "video_transcript": data[0].page_content
})

This will give you the summary object, a dictionary whose text key contains the response in Markdown format.

summary['text']

The raw response is an unrendered Markdown string. To see the Markdown-formatted response in the notebook:

from IPython.display import Markdown, display

display(Markdown(summary['text']))

And there you go. The core functionality of our YouTube summariser is now working.

But so far this only works in your Jupyter Notebook. To make it more accessible, we need to deploy this functionality on WhatsApp.

How to serve the YT summariser on WhatsApp

For this, we’d need to serve our YT summarisation functionality as an API endpoint for which we are going to use Flask. You can also use FastAPI.

Now we’ll turn all the code in the Jupyter notebook into functions. First, add a function that checks whether the input is a valid YouTube URL. Then define the summarise function, which is basically a compilation of what we wrote in the Jupyter notebook.
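The article does not show the URL check itself, so here is one possible sketch using only the standard library. The function name `is_youtube_url` matches the one used in the endpoint, but this implementation, the accepted host list, and the rules for what counts as valid are assumptions.

```python
from urllib.parse import urlparse, parse_qs

# Hosts accepted by this sketch; extend the set if you need other YouTube domains.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}

def is_youtube_url(url: str) -> bool:
    # Accepts watch URLs with a ?v= parameter and youtu.be short links.
    if not url:
        return False
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or host not in YOUTUBE_HOSTS:
        return False
    if host == "youtu.be":
        return len(parsed.path.strip("/")) > 0
    return parsed.path == "/watch" and bool(parse_qs(parsed.query).get("v"))

print(is_youtube_url("https://www.youtube.com/watch?v=gaWxyWwziwE"))
```

The early `if not url` check matters in the webhook setting, where a message with no body can arrive as an empty value.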

You can configure the endpoint in the following manner:

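from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)

@app.route('/summary', methods=['POST'])
def summary():
    # Twilio sends the incoming WhatsApp message as form data in the 'Body' field
    url = request.form.get('Body', '').strip()
    print(url)
    if is_youtube_url(url):
        response = summarise(url)
    else:
        response = "please check if this is a correct youtube video url"
    print(response)
    # Build a TwiML reply that Twilio relays back to the WhatsApp chat
    resp = MessagingResponse()
    msg = resp.message()
    msg.body(response)
    return str(resp)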

Once your app.py is ready with your Flask API, run the Python script, and you should have your server running locally on your system.

The next step is to make your local server connect with WhatsApp, and that’s where we’ll use Twilio.

Twilio allows us to implement this handshake by offering a WhatsApp sandbox to test your bot. You can follow the steps in this guide here to build this connection.

Once the connection is established, you can start testing your WhatsApp bot by sending it a YouTube URL.

I explain all the steps in detail in my project-based course on Building LLM-powered WhatsApp Chatbots.

It’s a 3-project course that contains two other more complex projects. I’ll give you a brief summary of those other projects here so you can try them out for yourselves. And if you’re interested, you can check out the course as well.

Project #2 — Build a Bot that Can Handle Different Types of User Queries

This bot acts as a customer service representative for an airline. It can answer questions related to flight status, baggage inquiries, ticket booking, and more. It uses LangChain’s Router and LLM models to dynamically generate responses based on the user’s input.

Different prompt templates are defined for various customer queries, such as flight status, baggage inquiries, and complaints.
Based on the query, the router selects the appropriate template and generates a response.
Twilio then sends the response back to the WhatsApp chat.
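The routing step above can be sketched with simple keyword matching. This is only an illustration of the control flow: the project described here uses an LLM-based router, and the template texts, category names, and keywords below are hypothetical.

```python
# A keyword-based sketch of routing a query to a prompt template.
# The real project uses an LLM router; everything named here is hypothetical.

TEMPLATES = {
    "flight_status": "You are an airline agent. Give the flight status for: {query}",
    "baggage": "You are an airline agent. Answer this baggage question: {query}",
    "complaint": "You are an airline agent. Respond with empathy to: {query}",
    "general": "You are an airline agent. Answer: {query}",
}

KEYWORDS = {
    "flight_status": ("status", "delayed", "on time"),
    "baggage": ("baggage", "luggage", "bag"),
    "complaint": ("complaint", "refund", "unhappy"),
}

def route(query: str) -> str:
    # Return the first category whose keywords appear in the query.
    lowered = query.lower()
    for name, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return name
    return "general"

def build_prompt(query: str) -> str:
    return TEMPLATES[route(query)].format(query=query)

print(build_prompt("Is flight AB123 delayed?"))
```

Keeping a `general` fallback means an unrecognized query still gets a sensible prompt instead of an error.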

Project #3 — RAG-Powered Support Bot

This chatbot answers questions related to airline services using a document-based system. The document is converted into embeddings, which are then queried using LangChain’s RAG system to generate responses. Companies are actively looking for developers with these skills, so this is an especially practical project.

The guidelines/rules document is embedded using FAISS and HuggingFace models.
When a user submits a question, the RAG system retrieves relevant information from the document.
The system then generates a response using a pre-trained LLM and sends it back via Twilio.
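The retrieve-then-generate flow can be sketched without any external services. The example below swaps in a toy bag-of-words "embedding" and cosine similarity in place of FAISS and HuggingFace models, and the policy sentences are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector (a stand-in for a real model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Made-up policy snippets standing in for the embedded guidelines document.
chunks = [
    "Checked baggage allowance is 23 kg per passenger.",
    "Refunds are processed within 7 business days.",
    "Online check-in opens 24 hours before departure.",
]
question = "What is the baggage allowance?"
context = retrieve(question, chunks)
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```

The final prompt is what would be sent to the LLM; restricting the model to the retrieved context is what keeps answers grounded in the document.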

These 3 projects will get you started so you can continue experimenting and learning more about AI engineering.

Customer support is the most funded category in AI because costs drop immediately when AI can handle communication with disgruntled users.

So, we build bots that can handle different types of queries, including intelligent RAG-powered bots that have access to proprietary documents to provide up-to-date information to users.

That’s why I created this project-based course to help you start building with LLMs.

Check out the course preview here:

And to thank you for reading this guide, you can use the code FREECODECAMP to get a 20% discount on my course.

I want to make this accessible to everyone who is sincere about building with AI, so I’ve priced it at $14.99 USD.

Conclusion

In this tutorial, we focused on building a fun YouTube video summarizer tool that is served on WhatsApp.

The bot's core functionality includes:

Receiving a YouTube URL
Validating the URL
Retrieving the video transcript
Using an LLM to summarize the content
Returning the summary to the user

We used a number of Python packages including langchain-together, langchain-community, langchain, pytube, and youtube-transcript-api.

The project uses the Llama 3.1 model via Together AI's API.

We built the core summarisation functionality in two ways:

Using LangChain's invoke() method with system and human messages
Using PromptTemplate and LLMChain for more dynamic solutions

To make the tool accessible via WhatsApp:

The functionality is served as an API endpoint using Flask
Twilio is used to connect the local server with WhatsApp
A WhatsApp sandbox is used for testing the bot
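To show what travels between Twilio and the server, here is a minimal sketch that uses only the standard library. It is not the tutorial's Flask code: `twiml_reply` and `handle_webhook` are hypothetical names, and the sketch assumes Twilio posts the incoming message as a form-encoded request with a `Body` field and accepts a TwiML `<Response><Message>` document as the reply.

```python
from typing import Callable
from urllib.parse import parse_qs
from xml.etree.ElementTree import Element, SubElement, tostring


def twiml_reply(text: str) -> str:
    """Build a TwiML document that tells Twilio to send `text` back to the user."""
    response = Element("Response")
    SubElement(response, "Message").text = text
    return tostring(response, encoding="unicode")


def handle_webhook(form_body: str, summarize: Callable[[str], str]) -> str:
    """Read the incoming message text from the form body and reply with TwiML."""
    fields = parse_qs(form_body)
    incoming = fields.get("Body", [""])[0]
    return twiml_reply(summarize(incoming))


# A stub summarizer stands in for the LLM call; the phone number is made up.
xml = handle_webhook(
    "Body=https%3A%2F%2Fyoutu.be%2FdQw4w9WgXcQ&From=whatsapp%3A%2B15550000000",
    summarize=lambda url: f"Summary of {url}",
)
```

In a Flask app, the route handler would read the same `Body` field from the request form and return this XML string as the response body.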

To continue building further projects, check out the course.

It is a beginner-track course where you first learn to build with LLMs, then apply those skills to build 3 different types of LLM applications. You also learn to serve your applications as WhatsApp chatbots.

Learn LangChain to link LLMs with external data

Beau Carnes — Wed, 22 Nov 2023 04:10:10 +0000

LangChain is an AI-first framework designed to enable developers to create context-aware reasoning applications by linking powerful Large Language Models with external data sources.

We just published a course on the freeCodeCamp.org YouTube channel that will teach you all about LangChain. The course will equip you with the cutting-edge skills needed to build a highly knowledgeable chatbot using LangChain Expression Language.

Tom Chant is a popular instructor at Scrimba. In this course, Tom will take you on a journey from the basics of LangChain.js to advanced concepts. You'll delve into an array of topics including embeddings, app flow diagrams, Supabase vector store, text splitting, and much more. The course is structured to make learning LangChain.js approachable and enjoyable, with a focus on practical applications.

The course even includes an introduction to LangChain from Jacob Lee, the lead maintainer of LangChain.js.

In this course, you will learn about:

Splitting with a LangChain textSplitter tool
Vectorising text chunks
Using embeddings models
Supabase vector store
Templates with input_variables
Prompts from templates
LangChain Expression Language
Basic chains with the .pipe() method
Retrieval from a vector store
Complex chains with RunnableSequence()
The StringOutputParser() class
Troubleshooting performance issues

In this course, you'll learn how to use LangChain.js to build a chatbot that can answer questions on a specific text you give it.

In the first part of the project, you'll learn about using LangChain to split text into chunks, convert the chunks to vectors using an OpenAI embeddings model, and store them together in a Supabase vector store.
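The splitting step can be illustrated with a short sketch. The course itself uses LangChain.js and its text splitter; the Python function below is only a simplified stand-in that cuts on character counts, and `split_text`, `chunk_size`, and `overlap` are hypothetical names chosen for this example.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, so no redundant tail chunk is added.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


chunks = split_text("abcdefghijklmnopqrstuvwxy", chunk_size=10, overlap=2)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps the later retrieval step. Each chunk would then be converted to a vector and stored.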

Next, you'll learn about chains, which are the building blocks of LangChain. And we do this using LangChain Expression Language. This makes the process of coding in LangChain much smoother and easier to grasp.

Finally, you'll learn about retrieval: using vector matching to select the text chunks from our vector store which are most likely to hold the answer to a user’s query. This enables the chatbot to answer questions specific to your data - a critical skill when working with AI and one of the most common use-cases for AI in web dev.

Watch the full course on the freeCodeCamp.org YouTube channel (2-hour watch).
