chatbot - freeCodeCamp.org

How to Build and Deploy a Production-Ready WhatsApp Bot with FastAPI, Evolution API, Docker, EasyPanel, and GCP

Raju Manoj — Fri, 20 Feb 2026 15:03:08 +0000

WhatsApp bots are widely used for customer support, automated replies, notifications, and internal tools. Instead of relying on expensive third-party platforms, you can build and deploy your own self-hosted WhatsApp bot using modern open-source tools.

In this tutorial, you’ll learn how to build and deploy a production-ready WhatsApp bot using:

FastAPI
Evolution API
Docker
EasyPanel
Google Cloud Platform (GCP)

By the end of this guide, you will have a fully working WhatsApp bot connected to your own WhatsApp account and deployed on a cloud virtual machine.

How the Architecture Works
How Your WhatsApp Bot Works
Prerequisites
Step 1: Create Firewall Rules on GCP
Step 2: Create a Virtual Machine (Ubuntu 22.04)
Step 3: SSH into the VM
Step 4: Install Docker
Step 5: Install EasyPanel
Step 6: Open the EasyPanel Dashboard
Step 7: Deploy Evolution API
Step 8: Connect WhatsApp
Step 9: Deploy the FastAPI Bot
Step 10: Connect the Webhook - Telling Evolution API Where to Send Messages
Step 11: Final Test
Production Considerations
Conclusion

How the Architecture Works

Before we start installing anything, let’s understand how the system works.

How Your WhatsApp Bot Works

Before we continue setting things up, let's make sure you understand what's actually happening behind the scenes. Don't worry – no technical experience needed here.

Imagine a postal service

Think of your WhatsApp bot like a very fast, automated postal service:

Someone sends you a letter (a WhatsApp message)
A postal worker (Evolution API) picks it up and brings it to your office
Your office manager (FastAPI bot) reads it and writes a reply
The postal worker takes the reply back and delivers it

That's it. That's the whole system.

The 7 steps

Someone sends a message to your WhatsApp number – just like texting a friend.
Evolution API notices the message – it's constantly watching your WhatsApp number for new messages, like a receptionist sitting by the phone.
Evolution API passes the message to your bot – it sends the message content to your app and says "hey, you've got a new message!"
Your bot reads the message and decides what to say – this is where your code does its job.
Your bot sends the reply back to Evolution API – "okay, send this response."
Evolution API delivers the reply through WhatsApp.
The user sees the reply on their phone – usually within seconds.

One line summary

User → WhatsApp → Evolution API → Your Bot → Evolution API → WhatsApp → User

Every step in this guide is just setting up one piece of that chain. Once they're all connected, the whole thing runs on its own automatically.

This architecture allows you to automate replies while keeping full control of your infrastructure.
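
To make the hand-offs in that chain visible, here is a toy model in plain Python. It is only an illustration: the function names and the dictionary shape are invented for this sketch and are not part of Evolution API or the bot repository.

```python
def evolution_api_receive(sender: str, text: str) -> dict:
    """Steps 2-3: wrap an incoming WhatsApp message and hand it to the bot."""
    return {"from": sender, "text": text}


def bot_decide_reply(event: dict) -> str:
    """Step 4: the bot reads the message and decides what to say."""
    if event["text"].strip().lower() == "hi":
        return "Hello! Bot is working."
    return "Sorry, I did not understand that."


def evolution_api_deliver(recipient: str, reply: str) -> str:
    """Steps 5-7: the reply travels back through WhatsApp to the user."""
    return f"to {recipient}: {reply}"


def handle_message(sender: str, text: str) -> str:
    """Run one message through the whole chain."""
    event = evolution_api_receive(sender, text)
    reply = bot_decide_reply(event)
    return evolution_api_deliver(event["from"], reply)


print(handle_message("user-1", "Hi"))
```

In the real deployment, each function corresponds to a separate service talking over HTTP, but the order of the hand-offs is the same.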

Why These Tools?

Let’s briefly understand why we’re using each tool.

FastAPI

FastAPI is a modern Python framework for building APIs. It is fast, lightweight, and ideal for handling webhook requests from Evolution API.

Evolution API

Evolution API is a self-hosted WhatsApp automation server built on top of Baileys. It connects your personal WhatsApp account without requiring official WhatsApp Business API approval.

Docker

Docker allows us to run applications in containers. This makes deployments consistent, portable, and production-ready.

EasyPanel

EasyPanel is a graphical platform for managing Docker services. Instead of writing Docker Compose files manually, we use EasyPanel’s UI to deploy and manage our services easily.

Google Cloud Platform (GCP)

GCP provides the virtual machine that hosts our infrastructure. We will use an Ubuntu 22.04 server to run Docker, EasyPanel, Evolution API, and our FastAPI bot.

I chose these tools because they are practical, lightweight, and suitable for real-world production deployments.

Prerequisites

Before starting, make sure you have:

A Google Cloud account (the commands in this guide are GCP-specific, so you will need to adapt them if you use AWS or Azure)
Billing enabled
A project selected
Access to Cloud Shell
Basic Linux and Docker knowledge

Step 1: Create Firewall Rules on GCP

We need to allow traffic to specific ports on our VM. So, we run this command in GCP Cloud Shell:

gcloud compute firewall-rules create easypanel-whatsapp-fw \
 --network default \
 --direction INGRESS \
 --priority 1000 \
 --action ALLOW \
 --rules tcp:22,tcp:80,tcp:443,tcp:3000,tcp:8080,tcp:9000,tcp:5000-5999 \
 --source-ranges 0.0.0.0/0 \
 --description "SSH, EasyPanel, Evolution API, Bot"

This command:

Creates a firewall rule named easypanel-whatsapp-fw
On the default network
Allows incoming internet traffic (INGRESS)
Opens these ports:
- 22 → SSH (server access)
- 80 → HTTP
- 443 → HTTPS
- 3000, 8080, 9000 → App panels / APIs
- 5000–5999 → Custom app range
Allows access from any IP address (0.0.0.0/0)

In short, this firewall rule opens your server so that you (and other people) can reach your apps and services from the internet.
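
Once the VM from the next step is running, one way to confirm that a port is actually reachable is a plain TCP connection test. The sketch below uses only the Python standard library; the function name port_is_open is made up for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("YOUR_VM_IP", 3000) should return True once EasyPanel is installed. Keep in mind that False can mean either that the firewall blocks the port or that nothing is listening on it yet.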

Step 2: Create a Virtual Machine (Ubuntu 22.04)

Now we'll create the server that hosts everything. Run the following command in the GCP Cloud Shell to set up a virtual machine with Ubuntu 22.04.

gcloud compute instances create whatsapp-vm \
  --zone=asia-south1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=30GB \
  --tags=easypanel

This command creates a new virtual machine (VM) on Google Cloud:

Name: whatsapp-vm
Location (zone): asia-south1-a (India region)
Machine size: e2-medium (2 vCPU, 4GB RAM)
Operating System: Ubuntu 22.04 LTS
Disk size: 30GB
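Tag: easypanel (a network tag that firewall rules can target; the rule from Step 1 has no target tag, so it applies to every VM on the default network)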

This creates a Linux server in Google Cloud that you can use to host EasyPanel, WhatsApp bot, or your APIs.

Note: Wait about one minute for the instance to start.

Step 3: SSH into the VM

Connect to your server by using SSH to access the virtual machine you just created on Google Cloud.

gcloud compute ssh whatsapp-vm --zone=asia-south1-a

This command connects to your virtual machine named whatsapp-vm in the zone asia-south1-a using SSH (secure remote login).

It logs you into your Google Cloud server so you can start installing software and running commands. After running this, you will see a terminal prompt – that means you are now inside your Ubuntu server and ready to go.

Step 4: Install Docker

Docker is needed to run EasyPanel and the Evolution API.

First update the system:

sudo apt update -y
sudo apt install -y curl

This does two things:

sudo apt update -y → Updates your server's package list (refreshes available software info).
sudo apt install -y curl → Installs curl, a tool used to download things from the internet using the terminal.

It prepares your server and installs a tool needed to download and install other software.

Then install Docker:

curl -fsSL https://get.docker.com | sudo sh

This command uses curl to download Docker’s official installation script. The | (pipe) sends it directly to sudo sh, which runs the script as administrator.

It automatically installs Docker on your server.

After this finishes, Docker should be installed.

Enable Docker:

sudo systemctl enable docker
sudo systemctl start docker

These commands do two things:

enable docker → Makes Docker start automatically every time the server reboots.
start docker → Starts Docker right now.

It turns Docker ON now and makes sure it stays ON after restart.

Allow the Ubuntu user to run Docker:

sudo usermod -aG docker ubuntu

This command adds the user ubuntu to the Docker group.

This is important: By default, you must use sudo before every Docker command. After running this, the ubuntu user can run Docker without needing sudo every time.

Note: This command assumes your username is ubuntu. When you connect with gcloud compute ssh, the username is usually derived from your Google account instead, so run whoami to check. If your username is different, replace ubuntu with your actual username.

Exit the session and reconnect:

exit
gcloud compute ssh whatsapp-vm --zone=asia-south1-a

exit → Logs you out of your current server session.
gcloud compute ssh whatsapp-vm --zone=asia-south1-a → Logs you back into your Google Cloud VM.

Why we do this: After adding the ubuntu user to the Docker group, you must log out and log back in for the permission changes to work.
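
The difference between "added to the group" and "active in this session" can be checked with a short script. This is a sketch using Python's standard grp, pwd, and os modules (Unix only); the two helper names are invented for this example.

```python
import grp
import os
import pwd


def user_in_group(username: str, group_name: str) -> bool:
    """Check the system's group database (what usermod just changed)."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False  # the group does not exist
    if username in group.gr_mem:
        return True
    try:
        # The group may also be the user's primary group.
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False  # the user does not exist


def session_has_group(group_name: str) -> bool:
    """Check the groups of the current login session."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False
    return gid in os.getgroups() or gid == os.getegid()
```

Right after the usermod command, user_in_group("ubuntu", "docker") can already be True while session_has_group("docker") is still False. Logging out and back in makes the second check catch up.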

Test Docker:

docker run hello-world

This command downloads a small test image called hello-world, runs it inside Docker, and prints a success message if Docker is working correctly.

It checks if Docker is installed and working properly. If you see “Hello from Docker!”, Docker is working correctly.

Step 5: Install EasyPanel

EasyPanel provides a user interface for deploying Docker services. Run this command in the VM:

curl -sSL https://get.easypanel.io | sudo bash

This command:

Downloads the official EasyPanel installation script
Runs it with administrator (sudo) permission
Automatically installs and configures EasyPanel on your server

It installs EasyPanel on your VM so you can manage apps using a web dashboard instead of commands. Installation takes about one minute.

Step 6: Open the EasyPanel Dashboard

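You need your VM's external IP address. It is shown in the output of the instance creation command from Step 2 and on the VM instances page of the Google Cloud console. Once you have it, open a new tab in your browser and type it in like this:

http://YOUR_VM_IP:3000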

For example, if your IP was 34.123.45.67, you would type:

http://34.123.45.67:3000

EasyPanel runs on port 3000 by default – that's why we add :3000 at the end. Without it, your browser won't know which service to open on the server.

When the page loads, EasyPanel asks you to create an admin account.

Click “Create Admin Account”.

Fill in:

Username (choose something you’ll remember)
Email
Password (make it strong!)
Submit the form.

You are now logged in as the admin and can start managing apps, APIs, and bots through the EasyPanel dashboard.

Step 7: Deploy Evolution API

Create a new project (for example: whatsapp-1)
Go to Services → Templates
Select Evolution API
Deploy the latest version

Wait until all services turn green.

Next, open Environment Variables and locate:

AUTHENTICATION_API_KEY

Copy the AUTHENTICATION_API_KEY.

Open the Evolution API dashboard

Inside EasyPanel, find your Evolution API service. You will see a clickable domain link – it usually looks something like:

https://evolution-api.easypanel.host

Click that link to open it in your browser. You will see a JSON response confirming that the service is running.

The response includes the service version and a Manager link. Copy the Manager link: it opens the management dashboard where you can authenticate and begin using the Evolution API.
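
If you prefer to read that response in code rather than by eye, a minimal sketch looks like this. The field names "version" and "manager" are assumptions based on the description above, and the sample body is hypothetical, so check them against what your own instance returns.

```python
import json


def summarize_root_response(body: str) -> dict:
    """Pick out the fields of interest from the service's JSON response."""
    data = json.loads(body)
    return {
        "version": data.get("version"),  # assumed field name
        "manager": data.get("manager"),  # assumed field name
    }


# Hypothetical response body, for illustration only.
sample = (
    '{"status": 200, "version": "2.0.0", '
    '"manager": "https://evolution-api.easypanel.host/manager"}'
)
print(summarize_root_response(sample))
```

Using .get() means a missing field shows up as None instead of raising an error, which is convenient while you are still exploring the response.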

Open the Manager link in a new tab, then paste the AUTHENTICATION_API_KEY you copied in the previous step to log in.

Create a new instance:

Choose channel: Baileys
Leave phone number blank
Give your instance a name

Save the instance.

Step 8: Connect WhatsApp

Inside your instance dashboard:

Click Get QR
Scan it using WhatsApp on your phone

Once connected, your chats and contacts will sync automatically. If syncing fails, disconnect and reconnect the session.

Step 9: Deploy the FastAPI Bot

Now we’ll deploy the bot service.

1: Go to EasyPanel

You’re opening the EasyPanel dashboard you just installed. This is where you can manage apps, servers, and services using a graphical interface instead of terminal commands.

2: Create a new project

A “project” is like a container or folder for your bot service. It organizes all files, settings, and deployments for this app.

3: Add an App service

“App service” means a running instance of your application. In this case, it will be the WhatsApp bot.

4: Choose Git deployment

Git deployment lets you connect a code repository to EasyPanel. EasyPanel will automatically download your code from GitHub and run it inside Docker.

5: Paste your repository URL

https://github.com/rajumanoj333/wabot

This is the GitHub repository containing the WhatsApp bot code. EasyPanel will clone this repo and prepare the app automatically.

6: Domains in EasyPanel

This section lets you assign a URL or domain name to your app service. Even if you don’t have a custom domain, you can use your server’s public IP. Your WhatsApp bot app runs on port 9000 inside the server.

7: Set the port to `9000`

By setting the domain to use port 9000, EasyPanel knows where to send traffic.

Example URL after this step:

https://your-project.easypanel.host

This is the public address people (and other services) will use to reach your bot.

You’re telling EasyPanel:

“Whenever someone accesses this project, forward them to the bot service running on port 9000.”

Without this step, the bot service would run but you wouldn’t be able to access it from your browser or other apps.

Configure Environment Variables

Set the following variables:

EVOLUTION_API_URL=http://evolution-api:8080
EVOLUTION_API_KEY=YOUR_AUTHENTICATION_API_KEY
INSTANCE_NAME=your_instance_name

Note: You might notice two different names here – AUTHENTICATION_API_KEY (used in EasyPanel) and EVOLUTION_API_KEY (used in your bot code). They are the same key. Just copy the value from EasyPanel and paste it into both places.
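
For reference, this is one way a Python service can read and validate those three variables at startup. It is a sketch, not the code from the repository: the Settings class and load_settings function are names invented here, and the repository's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("EVOLUTION_API_URL", "EVOLUTION_API_KEY", "INSTANCE_NAME")


@dataclass(frozen=True)
class Settings:
    evolution_api_url: str
    evolution_api_key: str
    instance_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the bot's settings and fail early if any variable is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        # Strip a trailing slash so URLs can be joined predictably later.
        evolution_api_url=env["EVOLUTION_API_URL"].rstrip("/"),
        evolution_api_key=env["EVOLUTION_API_KEY"],
        instance_name=env["INSTANCE_NAME"],
    )
```

Failing at startup with a clear message is usually easier to debug than a bot that starts fine and then silently fails to send replies.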

Step 10: Connect the Webhook – Telling Evolution API Where to Send Messages

At this point, you have two separate things running:

Evolution API: the service that connects to WhatsApp and handles messages
Your app (FastAPI bot): the chatbot brain you deployed in the previous steps

Right now, these two don't know the other exists. They're like two people in different rooms with no way to pass notes between them. A webhook fixes that.

So what exactly is a webhook?

A webhook is simply a URL (a web address) that you hand to one service so it can automatically notify another service when something happens.

You're going to tell Evolution API "whenever a WhatsApp message arrives, forward it to this address." Your app will be sitting at that address, waiting to receive it, read it, and send a reply.

Think of it like a forwarding address at the post office. When mail (a WhatsApp message) arrives, it gets automatically redirected to your app's door.

Let's set it up

1. Open your Evolution API dashboard.

You should already have this open from earlier steps. In the left sidebar, click on Events, then click on Webhook. This is where you control how Evolution API sends data to your app.

2. Turn the webhook on.

At the top of the page, you'll see a toggle next to the word "Enabled". Click it so it turns green. This tells Evolution API that you want to start using a webhook.

3. Enter your app's webhook URL.

In the URL field, type your app's address with /webhook added to the end, like this:

https://your-domain.easypanel.host/webhook

Replace your-domain with the actual domain name you set up when you deployed your app. The /webhook part at the end is important: it's a specific page your app has set up just for receiving these messages. Without it, Evolution API would be knocking on the wrong door.

4. Leave "Webhook by Events" and "Webhook Base64" turned off for now.

These are advanced options you won't need for a basic chatbot.

5. Scroll down to the Events section and enable these two events:

MESSAGES_UPSERT: This triggers every time someone sends your WhatsApp number a message. Without this, your app would never know a message arrived.
SEND_MESSAGE: This triggers when a message is sent out. It helps your app confirm that replies are going through correctly.

You can leave all the other events (like APPLICATION_STARTUP) turned off. They handle things like group chats and contact updates, which aren't needed for what we're building.

6. Click Save.

Quick recap of what you just did

You created a direct line between Evolution API and your app. Now, the moment someone messages your WhatsApp number, Evolution API will instantly pass that message along to your app. Your app reads it, figures out a response, and sends one back, all automatically.

This is the step that brings your chatbot to life. Without it, nothing would happen when someone sent you a message. With it, the whole system clicks into place.
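
On the receiving side, the /webhook endpoint has to pull the sender and the text out of the JSON body that Evolution API posts. The sketch below shows that parsing step as a plain function. The payload shape (an "event" field, with "data.key.remoteJid", "data.key.fromMe", and "data.message.conversation") is an assumption about Evolution API's format, and extract_incoming_text is a name invented here, so compare it with a real payload from your instance before relying on it.

```python
from typing import Optional


def extract_incoming_text(payload: dict) -> Optional[dict]:
    """Return {"sender", "text"} for an incoming text message, else None."""
    if payload.get("event") != "messages.upsert":  # assumed event name
        return None
    data = payload.get("data") or {}
    key = data.get("key") or {}
    if key.get("fromMe"):
        # Ignore messages sent by the connected account itself,
        # otherwise the bot could end up replying to its own replies.
        return None
    message = data.get("message") or {}
    text = message.get("conversation") or (
        message.get("extendedTextMessage") or {}
    ).get("text")
    sender = key.get("remoteJid")
    if not text or not sender:
        return None
    return {"sender": sender, "text": text}


# Hypothetical payload, for illustration only.
sample_payload = {
    "event": "messages.upsert",
    "data": {
        "key": {"remoteJid": "15551234567@s.whatsapp.net", "fromMe": False},
        "message": {"conversation": "Hi"},
    },
}
print(extract_incoming_text(sample_payload))
```

A FastAPI route for /webhook would read the request body as JSON, pass it to a function like this, and return quickly when the result is None.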

Step 11: Final Test

Send a message from a different WhatsApp number (not the connected one).

Send:

Hi

If everything is configured correctly, your bot should reply:

👋 Hello! Bot is working.

Congratulations! Your WhatsApp bot is now live.
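
Behind that reply, the bot has to ask Evolution API to send a text message. The sketch below builds such a request with the standard library without sending it. The endpoint path /message/sendText/{instance}, the apikey header, and the {"number", "text"} body are assumptions about Evolution API's HTTP interface, and build_send_text_request is a name invented here, so verify them against the API documentation for your version.

```python
import json
import urllib.request
from urllib.parse import quote


def build_send_text_request(
    base_url: str, api_key: str, instance: str, number: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a request asking Evolution API to send a text."""
    # Assumed endpoint path; the instance name is URL-encoded in case it has spaces.
    url = f"{base_url.rstrip('/')}/message/sendText/{quote(instance, safe='')}"
    # Assumed body shape.
    body = json.dumps({"number": number, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "apikey": api_key},
    )
```

Passing the returned object to urllib.request.urlopen() would actually send it. Keeping the "build" step separate makes the request easy to inspect and test without a running Evolution API.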

Production Considerations

For real-world deployments, consider:

Restricting firewall rules instead of allowing 0.0.0.0/0
Using HTTPS with a custom domain
Securing API keys with a secret manager
Monitoring logs and container health
Setting up automatic backups

This tutorial demonstrates the core working system, but these improvements will make your deployment more secure and scalable.

Conclusion

You now have a fully self-hosted WhatsApp bot running on a cloud VM using FastAPI, Evolution API, Docker, EasyPanel, and GCP.

This setup gives you:

Full control over infrastructure
No dependency on expensive SaaS platforms
Production-ready container deployment
Scalable architecture

From here, you can extend your bot with:

AI integrations: connect your bot to ChatGPT, Gemini, or Claude so it can answer questions intelligently instead of just sending fixed replies.

Database storage: save incoming messages, user details, or conversation history to a database like PostgreSQL or MongoDB.

Custom automation workflows: trigger actions based on keywords, like sending a PDF when someone types "menu" or booking an appointment when they type "schedule".

CRM integrations: connect your bot to tools like HubSpot or Notion to automatically log leads and customer conversations.

Building your own infrastructure is one of the best ways to deeply understand how modern backend systems work together.

Happy building!

How to Build an AI-Powered RAG Chatbot with Amazon Lex, Bedrock, and S3

Chisom Uma — Wed, 03 Dec 2025 22:34:26 +0000

Chatbots are widely adopted among software companies, especially those that interact heavily with customers. They are typically used for tasks such as customer support, answering questions, and providing information on websites, apps, and messaging platforms.

These days, as expected, some chatbots are AI-powered and can generate answers to queries through Retrieval-Augmented Generation (RAG). I was curious about how this works, so I built one myself. In this tutorial, we'll look at how to build an AI-powered RAG chatbot.

For this tutorial, you'll build a RAG chatbot that answers queries about travel policies to Mars. The chatbot retrieves its answers from our own data source (travel policy documents) stored in an S3 bucket. These documents serve as the internal data source the chatbot references when generating responses.

Instead of scripted responses from pre-trained data, the chatbot will pull contextual answers directly from the knowledge base.

Let's get started :)

Prerequisites
What is Retrieval-Augmented Generation (RAG)?
What is Amazon Bedrock?
Getting Started: Access models on Bedrock
Step 1: Upload Travel Policy Documents to the S3 Bucket
Step 2: Create a Knowledge base in Amazon Bedrock
Step 3: Create an Amazon Lex Chatbot
Step 4: Add a Welcome Intent to Your Chatbot
Step 5: Build the Chatbot
Step 6: Adding Amazon QnAIntent
Conclusion

Prerequisites

An AWS account, logged in as an IAM user with admin privileges
Access to Amazon Titan Embeddings G1 - text model on Amazon Bedrock
Access to Anthropic Claude 3.5 Sonnet on Amazon Bedrock.
Access to travel policy documents. You can download these from Google Drive here.
Experience using the AWS console.
No coding required.

What is Retrieval-Augmented Generation (RAG)?

Large Language Models (LLMs) like GPT-4 and Claude are basically everywhere. They get some things amazingly right and others very interestingly wrong, like hallucinations, where the model generates factually incorrect or fabricated information. This brings us to the idea of RAG.

Marina Danilevsky, Senior Research Scientist at IBM, in a lecture, referred to RAG as a “framework” for helping LLMs be more accurate and up-to-date.

Before going into the full scope of RAG, let’s talk briefly about the “generation” part. Generation, in the context of RAG, refers to LLMs that generate texts in response to a user query, referred to as a prompt.

These LLMs can sometimes give incorrect answers because of limited context or outdated information, especially since they draw only on their pre-trained data. Imagine you're asked how many Grammy Awards your favorite artist has, and you give an answer you read in a magazine four years ago. You might be correct, but there are two problems with this answer: first, you didn't cite a source, and second, it's outdated.

This is the problem LLMs have traditionally had. The answers were outdated, and no credible sources were cited.

Now, imagine if you had looked up the answer first, from a reputable source on Google. Your answer would be more accurate and factual, and if there was ever a doubt from the person who asked the question, you could easily share the link to the reputable source on Google, and there would be no further doubts or questions.

What does this have to do with LLMs and RAG? Without RAG, the LLM answers only from its pre-trained data and risks giving outdated answers. With RAG, it first retrieves relevant information from a content store, which can comprise external sources, such as the internet, or internal sources, such as documents (which will be used in this tutorial). This way, its generated answers are more accurate.

RAG helps the LLM stay up to date by further retrieving information from other sources rather than solely from its pre-trained data.
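
The retrieve-then-generate idea can be sketched in a few lines of Python. This is a deliberately simplified toy: it ranks documents by shared words, whereas a real system such as the one built in this tutorial uses an embedding model and a vector store. The document texts are invented for the example, and the function names are hypothetical.

```python
import re

# Made-up sample documents, standing in for the travel policy files.
DOCUMENTS = {
    "baggage.txt": "Each traveler may pack two bags weighing up to 20 kg in total.",
    "pets.txt": "Pets are not allowed on flights to Mars.",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: dict, top_k: int = 1) -> list:
    """Retrieval step: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(query_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, documents: dict) -> str:
    """Augmentation step: put the retrieved text in front of the question."""
    context = "\n".join(f"[{name}] {text}" for name, text in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


print(build_prompt("Can I bring my pets?", DOCUMENTS))
```

The generation step would then send this prompt to an LLM. Because the prompt names the source file, the answer can also point back to where the information came from.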

What is Amazon Bedrock?

Amazon Bedrock is AWS's managed service that gives you access to foundation models, essentially the core AI engines that power generative AI applications. The beauty of Bedrock is that it handles all the heavy lifting for you. No need to provision GPUs, set up model pipelines, or deal with infrastructure headaches.

It's a single platform where you can experiment with, customize, and deploy top-tier AI models from providers like Anthropic, Stability AI, and Amazon's own Titan models (used in this tutorial).

Here's a practical example: let’s say you're building a customer support chatbot. With Bedrock, you simply pick a language model that fits your needs, fine-tune it for your specific use case, and integrate it into your app, all without touching server configuration or infrastructure code.

Getting Started: Access models on Bedrock

To get access to models on AWS via Bedrock:

Log in to your AWS account as an IAM user with admin privileges.
Navigate to Amazon Bedrock > Model catalog.

Locate the “Titan Embeddings G1 - Text” and “Claude 3.5 Sonnet” models.

When you click these models, you are directed to a page with more details. You don’t need to do anything on this page. We will be using these models later in this tutorial. In the following sections, we’ll walk through the steps to build the chatbot.

Step 1: Upload Travel Policy Documents to the S3 Bucket

To upload documents, navigate to the Amazon S3 page in your AWS console, then create a bucket. For more details on creating a bucket, refer to the AWS documentation. Next, upload the downloaded document to the S3 bucket.

Note that the document is zipped; you will need to unzip it before uploading.
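
You can unzip the download with your operating system's file manager. If you would rather script it, the sketch below extracts the archive with Python's standard zipfile module and lists the PDFs that are ready to upload. The function name and the paths in the usage note are placeholders.

```python
import zipfile
from pathlib import Path


def extract_pdfs(zip_path: str, target_dir: str) -> list:
    """Extract the archive into target_dir and return the PDF files found there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Only the PDFs need to go into the bucket.
    return sorted(target.rglob("*.pdf"))
```

For example, extract_pdfs("travel-policies.zip", "travel-policies") returns the list of PDF paths, which you can then upload through the S3 console as described above.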

Step 2: Create a Knowledge base in Amazon Bedrock

Now that we have created our S3 bucket and uploaded our documents, we can't just hook up our Lex chatbot directly to the bucket. S3 isn't really "smart" from an AI perspective. To get the AI capabilities needed to make this work, we need Amazon Bedrock.

First, we need to create a knowledge base in Amazon Bedrock.

To get started, head back to the Bedrock page opened up earlier and navigate to Build > Knowledge Bases. Click Create. From the dropdown, choose “Knowledge base with vector store.” Leave IAM permissions as “Create and use a new service role”. This is what allows Bedrock to access other services. Choose “Amazon S3” as the data source type. Click Next.

Next, click Browse S3 and select the created bucket with the uploaded documents. Click Next. On the next page, click Select model to choose an embedding model. Select the “Titan Embeddings G1 - Text”, then select “Amazon OpenSearch Serverless” and click Apply.

Leave everything else the same and click Next. On the next page, click Create Knowledge Base. Note that this takes some time (a few minutes), so you need to be patient with this step. Once your knowledge base is created, you'll be taken to a new page showing confirmation messages.

One of these messages tells you that you need to sync the knowledge base with its data sources. To do this, scroll down to the Data source section, select the data source, then click Sync. Syncing should finish within a few seconds.

Note: If you have more data than we have in this tutorial (just four PDFs), syncing may take longer.

Now, we have our Bedrock knowledge base set up. The knowledge base connects to the S3 bucket containing the travel documents.

It's now time to create the chatbot. For this, we’ll use Amazon Lex.

Step 3: Create an Amazon Lex Chatbot

In your AWS console, navigate to Amazon Lex, then click Create bot. Select Create a blank bot under the Traditional creation method. For the bot name, you can call it “Mars travel bot” or any name you prefer.

Under the “IAM permissions” section, select Create a role with basic Amazon Lex permissions. Under the “Children’s Online Privacy Protection Act (COPPA)” section, select No, since our bot isn't subject to COPPA, and click Next.

On the next page, enter a short description in the Description text field. Select your preferred voice interaction option available for text-to-speech. This is the voice your users will hear when they use the chatbot.

The cool thing about Lex is that you can play a voice sample for each voice. This can help you make the best decision for your business. Next, click Done.

Step 4: Add a Welcome Intent to Your Chatbot

After hitting the Done button, you should see a page for creating an intent next. An intent is basically an action that fulfils a user's request.

Let's start with creating a welcome intent. To get started, change Intent name to “WelcomeIntent”. Then scroll down to the “Sample utterances” section and add utterances. These are example texts that you expect a user to type or speak when they start using your chatbot. So, if the user says “Hi” the chatbot responds with a welcome message. For this tutorial, I added the following expected utterances:

“Hi”
“Hey”
“hello”

You can add as many as you want.

In the “Initial response” section, you can provide a response to the user's utterance. Under the Message group dropdown, you can type in something like “Hi! welcome! How can I help you today?” Next, click the Advanced options button. This reveals a dialog box. Under Set values, select the “Wait for users input” option. You can select other options, but for this tutorial, we are going with this. Click Update options.

When you navigate back to the Intents page, you'll notice a “FallbackIntent” intent automatically generated for you. This intent is invoked when a user's utterance doesn't match any of the ones created for the welcome intent.
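
The relationship between sample utterances, the welcome intent, and the fallback intent can be pictured with a small toy model. This is not how Amazon Lex works internally (Lex uses a trained language model, not exact string matching), and the data structure and function name are invented for illustration.

```python
INTENTS = {
    "WelcomeIntent": {
        "utterances": ["hi", "hey", "hello"],
        "response": "Hi! Welcome! How can I help you today?",
    },
}
FALLBACK_INTENT = "FallbackIntent"


def match_intent(user_text: str, intents: dict = INTENTS) -> str:
    """Return the name of the first intent whose utterances include the text."""
    normalized = user_text.strip().lower()
    for name, intent in intents.items():
        if normalized in intent["utterances"]:
            return name
    # Nothing matched, so the fallback intent handles the message.
    return FALLBACK_INTENT


print(match_intent("Hello"))
print(match_intent("What can I expense?"))
```

The second call falls through to the fallback intent, which is exactly the gap that the knowledge-base-backed intent added later in this tutorial is meant to fill.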

Step 5: Build the Chatbot

In the previous step, we built an intent for the bot. Now it's time to build the actual chatbot that bundles up all of this configuration into something usable.

To get started, click Build at the top-right side of your screen.

Once the building is completed, you’ll get a message at the top of the page. Now, it’s time to test the bot. Next, click Test at the top-right side of your screen.

You get a pre-built chatbot for testing your implementation. Enter a text or utterance, in this case, for example, “Hi”, and you get an initial response. Remember, the utterance and initial response were set in the previous section.

When you click Inspect, you'll see the current intent: in this case, WelcomeIntent.

At this point, we haven’t fully integrated the AI capabilities required to get answers about travel policies to Mars.

Step 6: Adding Amazon QnAIntent

The Amazon QnAIntent introduces GenAI capabilities to our bot. It is a built-in intent that uses Generative AI to fulfill Frequently Asked Questions (FAQ) requests by querying the authorized knowledge content.

To get started, navigate to Add intent > Use built-in intent on the Intents page. Select the QnAIntent option.

Give it a name of your choice. Click Add. You'll be directed to the intent page. In the “QnA configuration” section, select Claude 3.5 Sonnet as the desired model.

For the ID, since we had already created a knowledge base earlier, navigate back to Amazon Bedrock > Knowledge Bases and copy your Knowledge Base ID and paste it into the “Knowledge base for Amazon Bedrock Id” field. Click Save intent. Before testing your changes, click Build to build the bot again.

Now, let’s run a little test with the chatbot. I will be prompting it about items I can expense for my trip.

In my test conversation, I sent an utterance for the welcome intent, and the chatbot responded with a welcome message. When I asked what items I can expense for the trip, it pulled the information from the Bedrock knowledge base, which is connected to the S3 bucket housing the travel policy documents.

Try experimenting with other questions like “How much does my trip cost?” or “Can I bring my pets?”
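
This tutorial needs no code, but the same knowledge base can also be queried programmatically, which is handy for quick experiments outside Lex. The sketch below is an optional aside that assumes the boto3 library is installed and AWS credentials are configured. The knowledge base ID, model ARN, and region are placeholders you must supply, and the two function names are invented here.

```python
def build_rag_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Build the arguments for Bedrock's retrieve_and_generate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(
    question: str, knowledge_base_id: str, model_arn: str, region: str
) -> str:
    """Send the question to the knowledge base and return the generated answer."""
    import boto3  # third-party; imported lazily so the builder above works without it

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"]
```

Calling ask_knowledge_base("Can I bring my pets?", "YOUR_KB_ID", "YOUR_MODEL_ARN", "YOUR_REGION") would send the question through the same retrieval and generation path that the QnAIntent uses.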

Want to add a proper web UI to your bot? Follow the step-by-step instructions in this GitHub repository.

FYI - you should delete resources such as your knowledge base, S3 bucket, and vector store (navigate to Amazon OpenSearch Service > Serverless > Dashboard and delete the knowledge base vector collection) to avoid incurring any unwanted charges from AWS.

Conclusion

You've just built an AI-powered chatbot that pulls answers from your own data sources. No more generic responses or outdated information. By combining Amazon Lex, Bedrock, S3, and RAG, you've created a system that actually understands your documentation/knowledge base and delivers accurate, contextual answers.

The real power here isn't just in the technology stack, it's in what you can do with it. Scale this approach to handle customer support queries, internal HR questions, product documentation, or any scenario where you need instant, accurate responses from your own knowledge base.

This is just the beginning. Experiment with different foundation models in Bedrock, expand your knowledge base with more documents, or refine your intents to handle more complex conversations. The infrastructure is built. Now it's time to customize it for your specific use case.

If you found this tutorial helpful, consider sharing it with your team or fellow developers.

How to Build a Conversational AI Chatbot with Stream Chat and React

Timothy Olanrewaju — Tue, 17 Jun 2025 20:29:11 +0000

Modern chat applications are increasingly incorporating voice input capabilities because they offer a more engaging and versatile user experience. This also improves accessibility, allowing users with different needs to interact more comfortably with such applications.

In this tutorial, I'll guide you through the process of creating a conversational AI application that integrates real-time chat functionality with voice recognition. By leveraging Stream Chat for robust messaging and the Web Speech API for speech-to-text conversion, you'll build a multi-faceted chat application that supports both voice and text interaction.

Prerequisites
Sneak Peek
Core Technologies
Backend Implementation Guide
- Project Setup
Frontend Implementation Guide
Complete Process Flow
Conclusion

Prerequisites

Before we begin, ensure you have the following:

A Stream account with an API key and secret (read on how to get them here).
Access to an LLM API (like OpenAI or Anthropic).
Node.js and npm/yarn installed.
Basic knowledge of React and TypeScript.
A modern browser with Web Speech API support (like Chrome or Edge).

Sneak Peek

Let’s take a quick look at the app we’ll be building in this tutorial. This way, you get a feel for what it does before we jump into the details.

If you’re now excited, let’s get straight into it!

Core Technologies

This application is powered by three main players: Stream Chat, the Web Speech API, and a Node.js + Express backend.

Stream Chat is a platform that helps you easily build and integrate rich, real-time chat and messaging experiences into your applications. It offers a variety of SDKs (Software Development Kits) for different platforms (like Android, iOS, React) and pre-built UI components to streamline development. Its robustness and engaging chat functionality make it a great choice for this app – we don’t need to build anything from scratch.

Web Speech API is a browser standard that allows you to integrate voice input and output into your apps, enabling features like speech recognition (converting spoken speech to text) and speech synthesis (converting text to speech). We’ll use the speech recognition feature in this project.

The Node.js + Express backend manages correct agent instantiation and processes the conversational responses generated by our LLM API.

Backend Implementation Guide

Let’s begin with our backend, the engine room – where user input is routed to the appropriate AI model, and a processed response is returned. Our backend supports multiple AI models, specifically OpenAI and Anthropic.

Project Setup

Create a folder, call it ‘My-Chat-Application’.
Clone this GitHub repository
After cloning, rename the folder to ‘backend’
Open the .env.example file and provide the necessary keys (you’ll need to provide either the OpenAI or Anthropic key – the Open Weather key is optional).
Rename the .env.example file to .env
Install dependencies by running this command:
```
 npm install
```
Run the project by entering this command:
```
 npm start
```
Your backend should be running smoothly on localhost:3000.

Frontend Implementation Guide

This section explores two broad, interrelated components: the chat structure and speech recognition.

Project Setup

We will be creating and setting up our React project with the Stream Chat React SDK. We'll use Vite with the TypeScript template. To do that, navigate to your My-Chat-Application folder, open your terminal and enter this command:

npm create vite frontend -- --template react-ts
cd frontend
npm i stream-chat stream-chat-react

With our frontend project set up, we can now run the app:

npm run dev

Understanding the App Component

The main focus here is to initialize a chat client, connect a user, create a channel, and render the chat interface. We’ll go through all these processes step by step to help you understand them better:

Define Constants

First, we need to provide some important credentials that we need for user creation and chat client setup. You can find these credentials on your Stream dashboard.

const apiKey = "xxxxxxxxxxxxx";
const userId = "111111111";
const userName = "John Doe";
const userToken = "xxxxxxxxxx.xxxxxxxxxxxx.xx_xxxxxxx-xxxxx_xxxxxxxx"; // your Stream user token (signed with your Stream secret, not the secret itself)

Note: These are dummy credentials. Make sure to use your own credentials.
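If you'd rather not keep these values in the source file, one option is to read them from environment variables. The sketch below assumes a Vite project, where variables prefixed with VITE_ are exposed to client code through import.meta.env. The variable names and the readStreamCredentials helper are hypothetical and not part of the original project:

```typescript
// Hypothetical variable names. Vite only exposes variables prefixed with VITE_.
type EnvRecord = Record<string, string | undefined>;

interface StreamCredentials {
  apiKey: string;
  userId: string;
  userName: string;
  userToken: string;
}

// Reads the four values from an env-like object and fails fast if one is missing.
function readStreamCredentials(env: EnvRecord): StreamCredentials {
  const required = (key: string): string => {
    const value = env[key];
    if (!value) {
      throw new Error(`Missing environment variable: ${key}`);
    }
    return value;
  };

  return {
    apiKey: required("VITE_STREAM_API_KEY"),
    userId: required("VITE_STREAM_USER_ID"),
    userName: required("VITE_STREAM_USER_NAME"),
    userToken: required("VITE_STREAM_USER_TOKEN"),
  };
}
```

In a Vite app, you would call readStreamCredentials(import.meta.env). Keep in mind that anything exposed this way still ends up in the browser bundle, so for a real application the user token would typically come from your backend instead.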

Create a User

Next, we need to create a user object. We’ll create it using an ID, name and a generated avatar URL:

const user: User = {
  id: userId,
  name: userName,
  image: `https://getstream.io/random_png/?name=${userName}`,
};
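Because the display name goes straight into a query string, a name containing characters like & could change how the URL is parsed. As a small, optional safeguard, you could encode the name first. The buildAvatarUrl helper below is a hypothetical addition:

```typescript
// Hypothetical helper: builds the avatar URL with the name safely encoded.
function buildAvatarUrl(displayName: string): string {
  return `https://getstream.io/random_png/?name=${encodeURIComponent(displayName)}`;
}
```

You could then set image: buildAvatarUrl(userName) in the user object.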

Setup a Client

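We need to track the state of the active chat channel using the useState hook to ensure seamless real-time messaging in this Stream Chat application. The useCreateChatClient hook from stream-chat-react initializes the chat client with an API key, user token, and user data:

  // StreamChannel is the Channel type from 'stream-chat', imported under an alias
  // so it doesn't clash with the Channel component from 'stream-chat-react'.
  const [channel, setChannel] = useState<StreamChannel>();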
  const client = useCreateChatClient({
    apiKey,
    tokenOrProvider: userToken,
    userData: user,
  });

Initialize Channel

Now, we initialize a messaging channel to enable real-time communication in the Stream Chat application. When the chat client is ready, the useEffect hook triggers the creation of a messaging channel named my_channel, adding the user as a member. This channel is then stored in the channel state, ensuring that the app is primed for dynamic conversation rendering.

  useEffect(() => {
    if (!client) return;
    const channel = client.channel("messaging", "my_channel", {
      members: [userId],
    });

    setChannel(channel);
  }, [client]);

Render Chat Interface

With all the integral parts of our chat application set up, we’ll return JSX that defines the chat interface's structure and components:

  if (!client) return <div>Setting up client & connection...</div>;

  return (
    <Chat client={client}>
      <Channel channel={channel}>
        <Window>
          <MessageList />
          <MessageInput />
        </Window>
        <Thread />
      </Channel>
    </Chat>
  );

In this JSX structure:

If the client is not ready, it displays a "Setting up client & connection..." message.
Once the client is ready, it renders the chat interface using:
- <Chat>: Wraps the Stream Chat context with the initialized client.
- <Channel>: Sets the active channel.
- <Window>: Contains the main chat UI components:
  - <MessageList />: Displays the list of messages.
  - <MessageInput />: Renders the input for sending messages. Later, we'll plug a custom CustomMessageInput into it.
- <Thread />: Renders threaded replies.

With this, we've set up our chat interface and channel, and we have a client ready. Here's what our interface looks like so far:

Adding AI to the Channel

Remember, this chat application is designed to interact with an AI, so we need to be able to both add and remove the AI from the channel. On the UI, we’ll add a button in the channel header that lets users add and remove the AI. But we still need to determine whether or not we already have it in the channel to know which option to display.

For that we’ll create a custom hook called useWatchers. It monitors the presence of the AI using a concept called watchers:

import { useCallback, useEffect, useState } from 'react';
import { Channel } from 'stream-chat';

export const useWatchers = ({ channel }: { channel: Channel }) => {
  const [watchers, setWatchers] = useState<string[]>([]);
  const [error, setError] = useState<Error | null>(null);

  const queryWatchers = useCallback(async () => {
    setError(null);

    try {
      const result = await channel.query({ watchers: { limit: 5, offset: 0 } });
      setWatchers(result?.watchers?.map((watcher) => watcher.id).filter((id): id is string => id !== undefined) || [])
      return;
    } catch (err) {
      setError(err as Error);
    }
  }, [channel]);

  useEffect(() => {
    queryWatchers();
  }, [queryWatchers]);

  useEffect(() => {
    const watchingStartListener = channel.on('user.watching.start', (event) => {
      const userId = event?.user?.id;
      if (userId && userId.startsWith('ai-bot')) {
        setWatchers((prevWatchers) => [
          userId,
          ...(prevWatchers || []).filter((watcherId) => watcherId !== userId),
        ]);
      }
    });

    const watchingStopListener = channel.on('user.watching.stop', (event) => {
      const userId = event?.user?.id;
      if (userId && userId.startsWith('ai-bot')) {
        setWatchers((prevWatchers) =>
          (prevWatchers || []).filter((watcherId) => watcherId !== userId)
        );
      }
    });

    return () => {
      watchingStartListener.unsubscribe();
      watchingStopListener.unsubscribe();
    };
  }, [channel]);

  return { watchers, error };
};
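The two event listeners in the hook boil down to a pair of list updates: put the bot at the front when it starts watching, and drop it when it stops. As a rough sketch, the same logic can be pulled out into small pure functions that are easy to test in isolation. The helper names below are hypothetical, and the 'ai-bot' prefix is taken from the hook above:

```typescript
// Prefix used by the hook above to recognize the AI user.
const AI_BOT_PREFIX = "ai-bot";

// True only for defined user IDs that look like the AI bot.
function isAiBot(userId: string | undefined): userId is string {
  return typeof userId === "string" && userId.startsWith(AI_BOT_PREFIX);
}

// Puts the user at the front of the list, without duplicating an existing entry.
function addWatcher(watchers: string[], userId: string): string[] {
  return [userId, ...watchers.filter((id) => id !== userId)];
}

// Returns the list without the given user.
function removeWatcher(watchers: string[], userId: string): string[] {
  return watchers.filter((id) => id !== userId);
}
```

Inside the hook, the listeners could then call setWatchers((prev) => addWatcher(prev, userId)) and setWatchers((prev) => removeWatcher(prev, userId)).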

Configuring the ChannelHeader

We can now build a new channel header component by utilizing the useChannelStateContext hook to access the channel and initialize the custom useWatchers hook. Using the watchers' data, we define an aiInChannel variable to display relevant text. Based on this variable, we invoke either the start-ai-agent or stop-ai-agent endpoint on the Node.js backend.

import { useChannelStateContext } from 'stream-chat-react';
import { useWatchers } from './useWatchers';

export default function ChannelHeader() {
  const { channel } = useChannelStateContext();
  const { watchers } = useWatchers({ channel });

  const aiInChannel =
    (watchers ?? []).filter((watcher) => watcher.includes('ai-bot')).length > 0;
  return (
    <div className='my-channel-header'>
      <h2>{(channel?.data as { name?: string })?.name ?? 'Voice-and-Text AI Chat'}</h2>
      <button onClick={addOrRemoveAgent}>
        {aiInChannel ? 'Remove AI' : 'Add AI'}
      </button>
    </div>
  );

  async function addOrRemoveAgent() {
    if (!channel) return;
    const endpoint = aiInChannel ? 'stop-ai-agent' : 'start-ai-agent';
    await fetch(`http://127.0.0.1:3000/${endpoint}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ channel_id: channel.id, platform: 'openai' }),
    });
  }
}
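To make the endpoint selection easier to follow (and to test without a running backend), the request can be assembled by a small pure function. This is only a sketch: buildAgentRequest is a hypothetical helper, and the base URL and payload simply mirror the values used in the component above:

```typescript
interface AgentRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Picks the endpoint based on whether the AI is already in the channel
// and builds the same POST request the component sends.
function buildAgentRequest(
  aiInChannel: boolean,
  channelId: string,
  baseUrl = "http://127.0.0.1:3000",
): AgentRequest {
  const endpoint = aiInChannel ? "stop-ai-agent" : "start-ai-agent";
  return {
    url: `${baseUrl}/${endpoint}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ channel_id: channelId, platform: "openai" }),
    },
  };
}
```

In the component, you would destructure the result and pass it on, for example: const { url, init } = buildAgentRequest(aiInChannel, channel.id); await fetch(url, init);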

Adding an AI State Indicator

AIs take a bit of time to process information, so while the AI is processing, we add an indicator to reflect its status. We create a MyAIStateIndicator component that does that for us:

import { AIState } from 'stream-chat';
import { useAIState, useChannelStateContext } from 'stream-chat-react';

export default function MyAIStateIndicator() {
  const { channel } = useChannelStateContext();
  const { aiState } = useAIState(channel);
  const text = textForState(aiState);
  return text && <p className='my-ai-state-indicator'>{text}</p>;

  function textForState(aiState: AIState): string {
    switch (aiState) {
      case 'AI_STATE_ERROR':
        return 'Something went wrong...';
      case 'AI_STATE_CHECKING_SOURCES':
        return 'Checking external resources...';
      case 'AI_STATE_THINKING':
        return "I'm currently thinking...";
      case 'AI_STATE_GENERATING':
        return 'Generating an answer for you...';
      default:
        return '';
    }
  }
}

Building the Speech to Text Functionality

Up to this point, we have a functional chat application that sends messages and receives feedback from an AI. Now, we want to enable voice interaction, allowing users to speak to the AI instead of typing manually.

To achieve this, we’ll set up speech-to-text functionality within a CustomMessageInput component. Let’s walk through the entire process, step by step, to understand how to achieve it.

Initial States Configuration

When the CustomMessageInput component first mounts, it begins by establishing its foundational state structure:

  const [isRecording, setIsRecording] = useState(false);
  const [isRecognitionReady, setIsRecognitionReady] = useState(false);
  const recognitionRef = useRef<any>(null); // holds the SpeechRecognition instance
  const isManualStopRef = useRef(false);
  const currentTranscriptRef = useRef("");

This initialization step is crucial because it establishes the component's ability to track multiple concurrent states: whether recording is active, whether the speech API is ready, and various persistence mechanisms for managing the speech recognition lifecycle.

Context Integration

In Stream Chat, the MessageInputContext is established within the MessageInput component. It provides data to the Input UI component and its children. Since we want to use the values stored within the MessageInputContext to build our own custom input UI component, we’ll be calling the useMessageInputContext custom hook:

  // Access the MessageInput context
  const { handleSubmit, textareaRef } = useMessageInputContext();

This step ensures that the voice input feature integrates seamlessly with the existing chat infrastructure, sharing the same textarea reference and submission mechanisms that other input methods use.

Web Speech API Detection and Initialization

The Web Speech API is not supported by some browsers, which is why we need to check if the browser running this application is compatible. The component's first major process involves detecting and initializing the Web Speech API:

  const SpeechRecognition =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

Once the API is detected, the component configures the speech recognition service with optimal settings.
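The exact settings aren't listed here, so treat the following as a sketch under assumptions rather than the project's actual configuration. The properties continuous, interimResults, lang, and maxAlternatives are standard SpeechRecognition properties, but the values chosen and the configureRecognition helper are illustrative:

```typescript
// Minimal shape of the SpeechRecognition properties this sketch touches.
interface RecognitionSettings {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  maxAlternatives: number;
}

// Applies assumed settings: keep listening across pauses, emit interim
// results for live feedback, and return only the top alternative.
function configureRecognition<T extends RecognitionSettings>(
  recognition: T,
  lang = "en-US",
): T {
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.lang = lang;
  recognition.maxAlternatives = 1;
  return recognition;
}
```

In the browser, this would be used as const recognition = configureRecognition(new SpeechRecognition()); before attaching the event handlers.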

Event Handler Configuration

We’ll have two event handlers: the result processing handler and the lifecycle event handler.

The result processing handler processes speech recognition output. It demonstrates a two-phase processing approach where interim results provide immediate feedback while final results are accumulated for accuracy.

      recognition.onresult = (event: any) => {
        let finalTranscript = "";
        let interimTranscript = "";

        // Process all results from the last processed index
        for (let i = event.resultIndex; i < event.results.length; i++) {
          const transcriptSegment = event.results[i][0].transcript;
          if (event.results[i].isFinal) {
            finalTranscript += transcriptSegment + " ";
          } else {
            interimTranscript += transcriptSegment;
          }
        }

        // Update the current transcript
        if (finalTranscript) {
          currentTranscriptRef.current += finalTranscript;
        }

        // Combine stored final transcript with current interim results
        const combinedTranscript = (currentTranscriptRef.current + interimTranscript).trim();

        // Update the textarea
        if (combinedTranscript) {
          updateTextareaValue(combinedTranscript);
        }
      };
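To see the two-phase approach in isolation, here is a rough sketch of the same loop as a pure function that can be run against mock data. The splitTranscripts name and the simplified types are hypothetical. They only model the parts of the result event that the handler above reads:

```typescript
interface TranscriptAlternative {
  transcript: string;
}

// Each result exposes its best alternative at index 0, as in the handler above.
interface RecognitionResultLike {
  isFinal: boolean;
  0: TranscriptAlternative;
}

interface ResultEventLike {
  resultIndex: number;
  results: ArrayLike<RecognitionResultLike>;
}

// Walks the results starting at resultIndex and separates confirmed text
// (final) from text that may still change (interim).
function splitTranscripts(evt: ResultEventLike): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (let i = evt.resultIndex; i < evt.results.length; i++) {
    const result = evt.results[i];
    if (!result) continue;
    const segment = result[0].transcript;
    if (result.isFinal) {
      finalText += segment + " ";
    } else {
      interimText += segment;
    }
  }
  return { finalText, interimText };
}
```

The handler would append finalText to the stored transcript and display the stored transcript plus interimText, trimmed, in the textarea.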

The lifecycle event handler ensures that the component responds appropriately to each phase of the speech recognition lifecycle events (onstart, onend and onerror):

      recognition.onstart = () => {
        console.log("Speech recognition started");
        setIsRecording(true);
        currentTranscriptRef.current = ""; // Reset transcript on start
      };

      recognition.onend = () => {
        console.log("Speech recognition ended");
        setIsRecording(false);

        // If it wasn't manually stopped and we're still supposed to be recording, restart
        if (!isManualStopRef.current && isRecording) {
          try {
            recognition.start();
          } catch (error) {
            console.error("Error restarting recognition:", error);
          }
        }

        isManualStopRef.current = false;
      };

      recognition.onerror = (event: any) => {
        console.error("Speech recognition error:", event.error);
        setIsRecording(false);
        isManualStopRef.current = false;

        switch (event.error) {
          case "no-speech":
            console.warn("No speech detected");
            // Don't show alert for no-speech, just log it
            break;
          case "not-allowed":
            alert(
              "Microphone access denied. Please allow microphone permissions.",
            );
            break;
          case "network":
            alert("Network error occurred. Please check your connection.");
            break;
          case "aborted":
            console.log("Speech recognition aborted");
            break;
          default:
            console.error("Speech recognition error:", event.error);
        }
      };

      recognitionRef.current = recognition;
      setIsRecognitionReady(true);
      } else {
      console.warn("Web Speech API not supported in this browser.");
      setIsRecognitionReady(false);
      }
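If you want to keep the error handler compact, the mapping from error codes to user-facing messages can live in its own function. The sketch below is an optional variation, not the project's code: alertMessageForSpeechError is a hypothetical helper, and it also covers "audio-capture" and "service-not-allowed", two standard Web Speech API error codes that the handler above leaves to the default branch:

```typescript
// Returns the message to show the user, or null when the error
// should only be logged (for example "no-speech" or "aborted").
function alertMessageForSpeechError(code: string): string | null {
  switch (code) {
    case "not-allowed":
    case "service-not-allowed":
      return "Microphone access denied. Please allow microphone permissions.";
    case "audio-capture":
      return "No microphone was found. Please check your audio input device.";
    case "network":
      return "Network error occurred. Please check your connection.";
    default:
      return null;
  }
}
```

Inside recognition.onerror, you could then write const message = alertMessageForSpeechError(event.error); and call alert(message) only when it isn't null.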

Starting Voice Input

When a user clicks the microphone button, the component initiates a multi-step process that involves requesting microphone permissions and providing clear error handling if users deny access.

 const toggleRecording = async (): Promise<void> => {
    if (!recognitionRef.current) {
      alert("Speech recognition not available");
      return;
    }

    if (isRecording) {
      // Stop recording
      isManualStopRef.current = true;
      recognitionRef.current.stop();
    } else {
      try {
        // Request microphone permission
        await navigator.mediaDevices.getUserMedia({ audio: true });

        // Clear current text and reset transcript before starting
        currentTranscriptRef.current = "";
        updateTextareaValue("");

        // Start recognition
        recognitionRef.current.start();
      } catch (error) {
        console.error("Microphone access error:", error);
        alert(
          "Unable to access microphone. Please check permissions and try again.",
        );
      }
    }
  };

Resetting State and Start Recognition

Before beginning speech recognition, the component resets its internal state. This reset ensures that each new voice input session starts with a clean slate, preventing interference from previous sessions.

currentTranscriptRef.current = "";
updateTextareaValue("");
recognitionRef.current.start();

Real-Time Speech Processing

Two things happen simultaneously during this process:

Continuous Result Processing: As the user speaks, the component continuously processes incoming speech data through the following pipeline:
- Each speech segment is classified as either interim (temporary) or final (confirmed).
- Final results are accumulated in the persistent transcript reference.
- Interim results are combined with accumulated finals for immediate display.

Dynamic Textarea Updates: The component updates the textarea in real-time using a custom DOM manipulation approach:

 const updateTextareaValue = (value: string) => {
   const nativeInputValueSetter = Object.getOwnPropertyDescriptor(
     window.HTMLTextAreaElement.prototype,
     'value'
   )?.set;

   if (nativeInputValueSetter && textareaRef.current) {
     nativeInputValueSetter.call(textareaRef.current, value);
     const inputEvent = new Event('input', { bubbles: true });
     textareaRef.current.dispatchEvent(inputEvent);
   }
 };

This step involves bypassing React's conventional controlled component behavior to provide immediate feedback, while still maintaining compatibility with React's event system.

User Interface Feedback

To make voice interactions feel smoother for users, we’ll add some visual feedback features. These include:

Toggling between mic and stop icons

We show a microphone icon when idle and a stop icon when recording is active. This provides a clear indication of the recording state.

Recording notification banner

A notification banner appears at the top of the screen to indicate that voice recording is in progress. This notification ensures users are aware when the microphone is active, addressing privacy and usability concerns.
```
 {isRecording && (
   <div className="recording-notification show">
     <span className="recording-icon">🎤</span>
     Recording... Click stop when finished
   </div>
 )}
```

Message Integration and Submission

The transcribed text integrates with the existing chat system through the shared textarea reference and the submission handler (handleSubmit) provided by the message input context.

This integration means that voice-generated messages follow the same submission pathway as typed messages, maintaining consistency with the chat system's behavior. After message submission, the component ensures proper cleanup of its internal state, preparing for the next voice input session.

Passing the CustomMessageInput component

Having built our custom message input component, we’ll now pass it to the Input prop of the MessageInput component in our App.tsx, replacing <MessageInput /> with <MessageInput Input={CustomMessageInput} />.

Complete Process Flow

Here’s how the application works:

After the component mounts, you add the AI to the chat by clicking the Add AI button.
Click the mic icon to start recording.
Your browser will ask for permission to use the microphone.
If you deny permission, recording won't begin.
If you allow permission, recording and transcription start simultaneously.
Click the stop (square) icon to end the recording.
Click the send button to submit your message.
The AI processes your input and generates a response.
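The recording part of this flow can be pictured as a tiny state machine. The sketch below is purely illustrative: the state and action names are hypothetical and don't appear in the project, which tracks the same information with the isRecording state and refs instead:

```typescript
type VoiceState = "idle" | "requesting-permission" | "recording";
type VoiceAction =
  | "mic-clicked"
  | "permission-granted"
  | "permission-denied"
  | "stop-clicked";

// Returns the next state for a given action. Actions that don't apply
// to the current state leave it unchanged.
function nextVoiceState(current: VoiceState, action: VoiceAction): VoiceState {
  switch (current) {
    case "idle":
      return action === "mic-clicked" ? "requesting-permission" : current;
    case "requesting-permission":
      if (action === "permission-granted") return "recording";
      if (action === "permission-denied") return "idle";
      return current;
    case "recording":
      return action === "stop-clicked" ? "idle" : current;
  }
}
```

Clicking the mic and granting permission moves the state from idle to recording, while denying permission returns it to idle without ever recording.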

Conclusion

In this tutorial, you’ve learned how to build a powerful conversational chatbot using Stream Chat and React. The application supports both text and voice inputs.

If you want to create your own engaging chat experiences, you can explore Stream Chat and Video features to take your projects to the next level.

Get the full source code for this project here. If you enjoyed reading this article, connect with me on LinkedIn or follow me on X for more programming-related posts and articles.

See you on the next one!

How to Build a Medical Chatbot with Flutter and Gemini: A Beginner’s Guide

Atuoha Anthony — Fri, 13 Jun 2025 17:37:31 +0000

In today's digital age, the demand for accessible and accurate health information is higher than ever. Leveraging the power of artificial intelligence, we can create intelligent chatbots that provide reliable health-related guidance.

This beginner's guide will walk you through building a powerful and specialized medical chatbot using Flutter and Google's Gemini API. The chatbot will be able to receive input from various modalities like text, audio, camera, files, and a gallery, and it will be strictly confined to answering health-related questions.

The Power of AI in Healthcare

AI-powered chatbots are transforming various industries, and healthcare is no exception. They offer a scalable and efficient way to disseminate information, answer frequently asked questions, and even provide initial assessments. By focusing on health-related queries, our chatbot will act as a specialized assistant, providing concise and accurate information to users.

Core Technologies

We’ll build our medical chatbot using the following key technologies:

Flutter: Google's UI toolkit for building natively compiled applications for mobile, web, and desktop from a single codebase. Its rich set of widgets and expressive UI make it ideal for creating engaging chat interfaces.
Google Gemini API: Google's most capable and flexible AI model. Gemini is multimodal, meaning it can process and understand different types of information, including text, images, audio, and video. This capability is crucial for our chatbot to handle diverse user inputs.
flutter_ai_toolkit: A Flutter package that provides a set of AI chat-related widgets and an abstract LLM provider API, simplifying the integration of AI models into your Flutter app. It offers out-of-the-box support for Gemini.
google_generative_ai: The official Dart package for interacting with Google's Generative AI models (Gemini).

How to Set Up Your Development Environment

Before we dive into the code, make sure you have Flutter installed and configured on your system. If not, follow the official Flutter installation guide here.

Get Your Gemini API Key

To interact with the Gemini API, you need an API key. This key authenticates your application and allows it to send requests to the Gemini model.

Here's how to get your Gemini API key:

Go to Google AI Studio: Open your web browser and navigate to https://aistudio.google.com/.
Log in with your Google account: If you're not already logged in, you'll be prompted to sign in with your Google account.
Click "Get API key in Google AI Studio": On the Google AI Studio homepage, you'll see a prominent button with this text. Click it.
Review and approve terms of service: A pop-up will appear asking you to consent to the Google APIs Terms of Service and Gemini API Additional Terms of Service. Read them carefully, check the necessary boxes, and click "Continue."
Create your API key: You'll now have the option to "Create API key in new project" or "Create API key in existing project." Choose the one that suits your needs. Your API key will be auto-generated.
Copy your API key: Crucially, copy this API key immediately and store it securely. It will not be shown again. Do NOT hardcode your API key directly into your production code, especially for client-side applications. For development purposes, we will use it directly in our MedicalChatScreen for simplicity, but for a real-world application, consider using environment variables or a secure backend to manage your API key.

Add Dependencies (`pubspec.yaml`)

Open your pubspec.yaml file (located at the root of your Flutter project) and add the following dependencies under dependencies:

dependencies:
  flutter:
    sdk: flutter
  flutter_ai_toolkit: ^0.6.8
  google_generative_ai: ^0.4.6

After adding these, run flutter pub get in your terminal to fetch the packages.

Project Structure

Our project will have a simple structure:

lib/main.dart: The entry point of our Flutter application.
lib/screens/chat.dart: Contains the main chat interface for our medical chatbot.

Code Implementation and Explanation

Let's break down the provided code and understand each part.

`lib/main.dart`

import 'package:ai_demo/screens/chat.dart';
import 'package:flutter/material.dart';

void main() {
  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Medical ChatBot',
      theme: ThemeData(
        colorScheme: ColorScheme.fromSeed(seedColor: Colors.deepPurple),
      ),
      home: const MedicalChatScreen(),
    );
  }
}

Here’s what’s going on in this code:

import 'package:ai_demo/screens/chat.dart';: This line imports the chat.dart file from the screens folder. This is where our MedicalChatScreen widget is defined.
import 'package:flutter/material.dart';: This imports the fundamental Flutter Material Design widgets, essential for building the UI.
void main() { runApp(const MyApp()); }: This is the entry point of every Flutter application. runApp() takes a widget as an argument and makes it the root of the widget tree.
class MyApp extends StatelessWidget: MyApp is the root widget of our application. StatelessWidget means its properties don't change over time.
Widget build(BuildContext context): This method is where the UI of the MyApp widget is built.
MaterialApp: This widget provides the basic Material Design visual structure for a Flutter app.
- title: 'Medical ChatBot': Sets the title of the application, which might be displayed in the device's task switcher or browser tab.
- theme: ThemeData(...): Defines the visual theme of the application.
  - colorScheme: ColorScheme.fromSeed(seedColor: Colors.deepPurple): Generates a color scheme based on a primary "seed" color (Colors.deepPurple), ensuring a consistent and harmonious look across the app.
- home: const MedicalChatScreen(): Sets the initial screen of the application to our MedicalChatScreen widget.

`lib/screens/chat.dart`

import 'package:flutter/material.dart';
import 'package:flutter_ai_toolkit/flutter_ai_toolkit.dart';
import 'package:google_generative_ai/google_generative_ai.dart';

class MedicalChatScreen extends StatefulWidget {
  const MedicalChatScreen({super.key});

  @override
  State<MedicalChatScreen> createState() => _MedicalChatScreenState();
}

class _MedicalChatScreenState extends State<MedicalChatScreen> {
  String apiKey = ""; // IMPORTANT: Replace with your actual Gemini API Key

  @override
  void initState() {
    super.initState();
    // It's a good practice to load the API key from a secure source
    // rather than hardcoding it, especially for production apps.
    // For this demo, we'll keep it simple.
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        backgroundColor: Colors.white,
        automaticallyImplyLeading: false,
        title: const Text("Medical ChatBot"),
      ),
      body: LlmChatView(
        suggestions: const [
          "I've been feeling dizzy lately. What now?",
          "How do I know if I need to see a doctor?",
          "What should I eat to boost my immunity?"
        ],
        style: LlmChatViewStyle(
          backgroundColor: Colors.white,
          chatInputStyle: ChatInputStyle(
            hintText: "Enter your message",
            decoration: const BoxDecoration().copyWith(
              borderRadius: BorderRadius.circular(50),
            ),
          ),
        ),
        provider: GeminiProvider(
          model: GenerativeModel(
            model: "gemini-2.0-flash",
            apiKey: apiKey,
            systemInstruction: Content.system(
              // Note the trailing space: adjacent string literals are joined as-is.
              "You are a professional medical health assistant. Only respond to health and medically related questions and make them concise and straight to the point without too much explanation. "
                  "If a question is unrelated to health or medicine, politely inform the user that you can only answer medical-related queries.",
            ),
          ),
        ),
        welcomeMessage:
        "Hello👋 I’m here to help with your medical questions. Please tell me how I can assist you."
      ),
    );
  }
}

What’s going on in chat.dart:

import 'package:flutter/material.dart';: Imports Material Design widgets.
import 'package:flutter_ai_toolkit/flutter_ai_toolkit.dart';: Imports the flutter_ai_toolkit package, which provides the LlmChatView and GeminiProvider.
import 'package:google_generative_ai/google_generative_ai.dart';: Imports the google_generative_ai package, which allows us to interact with the Gemini model.
class MedicalChatScreen extends StatefulWidget: Our chat screen is a StatefulWidget because its apiKey and potentially other chat-related states might change.
State<MedicalChatScreen> createState() => _MedicalChatScreenState();: Creates the mutable state for this widget.
String apiKey = "";: This is where you need to paste your Gemini API key. Replace "" with the actual key you obtained from Google AI Studio. For example: String apiKey = "YOUR_GEMINI_API_KEY_HERE";.
- Security note: As mentioned before, hardcoding API keys is not recommended for production applications. Consider using environment variables, a secrets management service (like Google Cloud Secret Manager), or a backend server to handle API requests securely.
initState(): This method is called once when the widget is inserted into the widget tree. It's a good place for initial setup. In this case, it's empty but serves as a placeholder for potential future initialization like loading the API key securely.
Scaffold: Implements the basic Material Design visual layout structure.
- appBar: Displays a top app bar.
  - backgroundColor: Colors.white: Sets the background color of the app bar to white.
  - automaticallyImplyLeading: false: Prevents Flutter from automatically adding a back button if this screen is pushed onto a navigation stack.
  - title: const Text("Medical ChatBot"): Sets the title text of the app bar.
body: LlmChatView(...): This is the core widget from the flutter_ai_toolkit that provides the chat UI and handles the interaction with the LLM.
- suggestions: const [...]: Provides a list of initial suggested prompts to the user when the chat is empty. These prompts guide the user on the types of questions the chatbot can answer.
- style: LlmChatViewStyle(...): Customizes the appearance of the chat view.
  - backgroundColor: Colors.white: Sets the background color of the chat area.
  - chatInputStyle: ChatInputStyle(...): Styles the text input field.
    - hintText: "Enter your message": Placeholder text in the input field.
    - decoration: const BoxDecoration().copyWith(borderRadius: BorderRadius.circular(50)): Styles the input field with rounded corners.
- provider: GeminiProvider(...): This is where we configure our Gemini model as the AI provider for the LlmChatView.
  - model: GenerativeModel(...): Creates an instance of the Gemini model.
    - model: "gemini-2.0-flash": Specifies the particular Gemini model to use. "gemini-2.0-flash" is a lightweight and fast model suitable for many applications. Other models like "gemini-pro" are also available, offering different capabilities and costs.
    - apiKey: apiKey: Passes your obtained Gemini API key to the model.
    - systemInstruction: Content.system(...): This is crucial for defining the chatbot's persona and limitations. It's a system message sent to the Gemini model at the beginning of the conversation (and potentially with every turn, depending on the implementation details of flutter_ai_toolkit and google_generative_ai).
      - "You are a professional medical health assistant. Only respond to health and medical-related questions and make them concise and straight to the point without too much explanation.": This is the primary instruction. It tells Gemini to act as a medical assistant and to keep its health-related responses concise.
      - "If a question is unrelated to health or medicine, politely inform the user that you can only answer medical-related queries.": This instruction helps keep the chatbot within its defined scope and discourages irrelevant answers to non-medical questions, which is vital for a specialized health bot.
- welcomeMessage: "Hello👋 I’m here to help with your medical questions. Please tell me how I can assist you.": A friendly message displayed to the user when they first open the chat screen, setting the context for the conversation.

The flutter_ai_toolkit package, when used with GeminiProvider, intrinsically supports multi-modal inputs. The LlmChatView automatically provides UI elements for:

Text input: The standard text field for typing messages.
Audio input: A microphone icon will typically be present, allowing users to record voice messages that are then transcribed and sent to Gemini.
Camera input: A camera icon will allow users to take a photo and send it to the chatbot. Gemini can then process the image and provide a response.
File input: An attachment icon (often a paperclip) will enable users to select files (like documents or images) from their device to send to the chatbot.
Gallery input: Similar to file input, but specifically for selecting images or videos from the device's photo gallery.

The flutter_ai_toolkit abstracts away the complexities of handling these different input types, converting them into a format that the google_generative_ai package and subsequently the Gemini model can understand and process. For instance, images are sent as a DataPart (raw bytes plus a MIME type) within the Content object, and audio might be transcribed to text before being sent, or sent as a DataPart with an audio MIME type if the model supports direct audio input.

The systemInstruction we set for the GenerativeModel is crucial here. While the flutter_ai_toolkit handles the input, the Gemini model's ability to understand various modalities in the context of health questions depends on its training and our clear instructions.

For example, if a user uploads an image of a rash, the system instruction helps guide Gemini to interpret it from a medical perspective (though it's important to remember that an AI chatbot is not a substitute for professional medical diagnosis).

How to Run Your Chatbot

Replace apiKey: In lib/screens/chat.dart, replace "" with your actual Gemini API key.
Run the application: In your terminal, navigate to your project's root directory and run:
```
 flutter run
```

This will launch the application on a connected device or emulator. You should see the "Medical ChatBot" app with the welcome message and suggested prompts. Try typing some health-related questions, and also experiment with the multi-modal input options (microphone, camera, attachment icon) if your device/emulator supports them.

Important Considerations and Future Enhancements

API key security: Just to reiterate the importance of not hardcoding API keys in production. For a deployed app, consider using environment variables, backend services, or Flutter's build configurations to inject the API key securely.
Error handling: The current code doesn't explicitly show error handling for API calls. In a real application, you'd want to handle network errors, invalid API keys, or rate limits gracefully. The flutter_ai_toolkit and google_generative_ai packages provide mechanisms for this.
User Experience (UX):
- Loading indicators: Show a loading indicator while the AI is generating a response.
- Input validation: For certain inputs (for example, file types), you might want to add client-side validation.
- Chat history: Implement features to clear chat history or save past conversations.
Medical disclaimer: Crucially, any medical chatbot should include a prominent disclaimer stating that it is not a substitute for professional medical advice, diagnosis, or treatment. It should always advise users to consult with a qualified healthcare professional for any health concerns.
Privacy and data security: When dealing with health-related information, data privacy is paramount. Ensure your application complies with relevant regulations (for example, HIPAA in the U.S., GDPR in Europe) and that user data is handled securely. The Gemini API has its own data policies you should review.
Advanced system instructions: For a more sophisticated medical chatbot, you could expand the systemInstruction to include specific medical knowledge domains, preferred response formats (for example, always list bullet points for symptoms), or even direct the AI to ask clarifying questions.
Tool use/function calling: Gemini supports tool use (function calling), allowing the AI to interact with external services. For a health bot, this could mean:
- Looking up drug information from a database.
- Finding nearby clinics or pharmacies.
- Accessing up-to-date medical research.
- This would require more complex setup with backend functions that the AI can call.
Speech-to-Text (STT) and Text-to-Speech (TTS): While flutter_ai_toolkit handles audio input, you might want more fine-grained control over STT and TTS services for improved voice interaction.
Image processing and medical imaging: For truly advanced medical applications, you might integrate specialized image processing libraries for analyzing medical images (for example, X-rays, MRIs), but this is a complex domain requiring expert knowledge and regulatory compliance. Our current setup allows Gemini to interpret images, but it relies on Gemini's general vision capabilities.

You can check out the full project here: https://github.com/Atuoha/ai_medical_assistant

Wrapping Up

You've now successfully built a foundational medical chatbot using Flutter and Google Gemini! This application demonstrates how to integrate a powerful AI model with a multi-modal user interface, while also enforcing specific behavioral constraints (health-only questions).

By extending this base with robust error handling, enhanced UX, and potentially advanced AI features like tool use, you can create even more sophisticated and valuable healthcare applications.

Remember to always prioritize user safety and data privacy when developing AI solutions in the medical domain.

Flutter and Dart Packages:

flutter_ai_toolkit:
- Pub.dev page: https://pub.dev/packages/flutter_ai_toolkit
- Flutter Documentation (AI Toolkit): https://docs.flutter.dev/ai-toolkit
google_generative_ai:
- Pub.dev page: https://pub.dev/packages/google_generative_ai

Google Gemini API and AI Studio:

Google AI Studio: https://aistudio.google.com/
Get a Gemini API Key (Google AI for Developers): https://ai.google.dev/gemini-api/docs/api-key
Gemini API Documentation (Google AI for Developers): https://ai.google.dev/api (General API documentation)

Flutter Documentation:

Flutter Official Website (Installation Guide): https://flutter.dev/docs/get-started/install

General Concepts (for further reading):

Material Design: https://m3.material.io/ (For understanding Flutter's UI principles)
Large Language Models (LLMs): A broad topic, but understanding the basics of how LLMs work will enhance comprehension of the systemInstruction and model behavior.
Multimodal AI: Research on multimodal AI provides context for why Gemini can handle various input types (text, image, audio, and so on).
API Key Security Best Practices: For production applications, it's crucial to understand secure API key management (for example, environment variables, secret management services). A good starting point would be general security best practices for API keys.

How to Add Live Chat to Your Applications with Rocket.chat

Spruce Emmanuel — Mon, 07 Apr 2025 13:26:13 +0000

The fastest way to gather valuable information about your site’s users is still by talking to them. And what better way to do this than by adding a chat system to your app?

For my case, I just wanted to add a chat system to my portfolio website so I could get valuable info from potential employers and clients. I ended up building something like this:

Why Rocket.Chat, you may ask?
Prerequisites
Getting Started
Conclusion

Why Rocket.Chat, you may ask?

Rocket.Chat is a great option because:

Open Source: It’s free and customizable.
Comprehensive APIs: Their APIs make integration simple.
Flexible Hosting: Self-host your own or use their cloud version with a free trial (which we’ll use here).

Prerequisites

Before you continue, there are a few things you should know and have:

A running Rocket.Chat server (either self-hosted or on Rocket.Chat Cloud). Here, I'll show you how to set up one with Rocket.Chat Cloud.
A working knowledge of JavaScript fundamentals.

Getting Started

First things first, let's set up a Rocket.Chat server. Again, you can either self-host your own or use their cloud version. And don't worry – you don't have to pay anything for this tutorial, as they provide a 30-day free trial.

Step 1: Set Up the Rocket.Chat Server

Head over to https://cloud.rocket.chat and create your free account.

Once you're logged in, click on the "Change to SaaS trial" button to launch a cloud-hosted server.

Next, create a Cloud Workspace by providing your workspace name, URL, and server region.

It will take a little while to set up. When it’s done, you should see something similar to this:

Now copy your server URL—it should look like this: https://example.rocket.chat.

Step 2: Configure the Rocket.Chat Server

Before diving into the code, we need to configure our server so we can use the livechat API.

To start, open your Rocket.Chat server and click on the menu button, then click on Omnichannel.

Click on Agents on the sidebar and add yourself as an agent.

Next, click on Departments and create a Department. I'll call mine Chats.

Now you need to configure a few things about the Livechat widget:

Make sure you turn on the offline form and set the Email Address to Send Offline Messages.
Also, configure your business hours to the times you'll be available.

Step 3: Register the Visitor

Next, we need to register the visitor and create a room for them. To do this, you need to collect the visitor's name and email and generate a random unique ID.
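
The examples in this article use JavaScript, but if your backend happens to be written in Python, here is a minimal sketch of generating that random unique ID with the standard library. The function name new_visitor_token and the 16-byte length are assumptions made for illustration, not something Rocket.Chat requires.

```python
import secrets


def new_visitor_token(num_bytes: int = 16) -> str:
    """Return a random, URL-safe string to identify one livechat visitor."""
    # token_urlsafe draws from the OS's cryptographically secure random source
    return secrets.token_urlsafe(num_bytes)


token = new_visitor_token()
print(token)
```

Store the token alongside the visitor's session (for example, in a cookie), so a returning visitor gets the same room back.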

How to Register the Visitor

First, we need to register the visitor on the server. We need their name, email, and token. You send those to this endpoint: /api/v1/livechat/visitor. Here's some example code that you might run on your backend:

// The livechat API expects the visitor fields nested under a "visitor" key
const body = {
  visitor: {
    name: "Visitor Name",          // Replace with the visitor's name
    email: "visitor@example.com",  // Replace with the visitor's email
    token: "unique-visitor-token"  // Replace with a generated unique token
  }
};

fetch(`${process.env.ROCKETCHAT_URL}/api/v1/livechat/visitor`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Cache-Control': 'no-cache'
  },
  body: JSON.stringify(body)
})
  .then(response => response.json())
  .then(data => {
    if (data.success) {
      console.log("Visitor registered:", data);
    } else {
      console.error("Visitor registration failed:", data);
    }
  })
  .catch(error => console.error("Error in visitor registration:", error));

How to Create or Retrieve the Chat Room

After you've registered the visitor, you need to create a room for them so they can send you messages and you can respond.

Call this endpoint /api/v1/livechat/room with the visitor token as a query parameter. If the visitor already has a room, it’ll be returned. If not, a new one will be created. This is how you can make that request from your backend:

const token = "unique-visitor-token"; // Replace with the actual visitor token

fetch(`${process.env.ROCKETCHAT_URL}/api/v1/livechat/room?token=${token}`, {
  method: 'GET',
  headers: { 'Content-Type': 'application/json' },
})
  .then(response => response.json())
  .then(data => {
    if (data.success) {
      console.log("Room retrieved:", data);
    } else {
      console.error("Failed to retrieve room:", data);
    }
  })
  .catch(error => console.error("Error in retrieving room:", error));

How to Retrieve Livechat Configuration

Lastly, we need to get the info about the visitor and the agent we registered. Use this API endpoint to get the visitor token, room ID, and agent info. You can use it to check if the agent is online before trying to connect to the WebSocket.

const token = "unique-visitor-token"; // Replace with the actual visitor token
const url = `${process.env.ROCKETCHAT_URL}/api/v1/livechat/config?token=${token}`;

fetch(url, {
  method: 'GET',
  headers: { 'Content-Type': 'application/json' },
})
  .then(response => response.json())
  .then(data => {
    if (data.success) {
      console.log("Livechat config:", data);
    } else {
      console.error("Failed to get livechat config:", data);
    }
  })
  .catch(error => console.error("Error fetching livechat config:", error));

Step 4: Create the Connection to WebSocket

To establish the live chat experience, we need to open a WebSocket connection to Rocket.Chat and handle messaging.

WebSocket Connection Example

First, open the WebSocket like this:

// Use wss:// (not ws://) when your server URL starts with https://
const rocketChatSocket = new WebSocket("wss://example.rocket.chat/websocket");

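Then, once the socket is open, send the connect message:

const connectRequest = {
  msg: "connect",
  version: "1",
  support: ["1", "pre2", "pre1"]
};
// Calling send() before the socket has opened throws an error, so wait for onopen
rocketChatSocket.onopen = () => {
  rocketChatSocket.send(JSON.stringify(connectRequest));
};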

You can keep the connection alive by responding to the server's "ping" messages with a "pong".

rocketChatSocket.onmessage = (event) => {
  try {
    const data = JSON.parse(event.data);
    if (data.msg === "ping") {
      console.log("Received ping from server, sending pong");
      rocketChatSocket.send(JSON.stringify({ msg: "pong" }));
    }
  } catch (error) {
    console.error("Error parsing WebSocket message:", error);
  }
};

You can subscribe to the room created for the visitor. Just use the visitor’s token and room ID from the previous sections.

const subscribeRequest = {
  msg: "sub",
  id: "unique-subscription-id", // Replace with your unique ID
  name: "stream-room-messages",
  params: [
    "fetched-room-id", // Replace with the room ID variable
    {
      useCollection: false,
      args: [
        { visitorToken: "visitor-token" } // Replace with your visitor token variable
      ],
    },
  ],
};
rocketChatSocket.send(JSON.stringify(subscribeRequest));

You can also listen for incoming messages. Assigning onmessage a second time would replace the ping handler, so this example registers its listener with addEventListener instead:

rocketChatSocket.addEventListener("message", (event) => {
  try {
    const data = JSON.parse(event.data);
    if (
      data.msg === "changed" &&
      data.collection === "stream-room-messages"
    ) {
      // Handle new messages
      if (data.fields && data.fields.args && data.fields.args.length > 0) {
        const newMessage = data.fields.args[0];
        // Assume isValidChatMessage is defined to validate the message format
        if (isValidChatMessage(newMessage)) {
          // Update your messages list here
          console.log("New message received:", newMessage);
        }
      }
    }
  } catch (error) {
    console.error("Error parsing WebSocket message:", error);
  }
});

What if you want to send livechat messages? Just use this code to do so:

const sendMessageRequest = {
  msg: "method",
  method: "sendMessageLivechat",
  params: [
    {
      _id: "unique-message-id",  // Replace with a generated unique ID for the message
      rid: "room-id",            // Replace with the actual room ID
      msg: "Your message here",  // Replace with the message text you want to send
      token: "visitor-token"     // Replace with the actual visitor token
    }
  ],
  id: "unique-request-id"        // Replace with a unique request ID
};

rocketChatSocket.send(JSON.stringify(sendMessageRequest));

In your actual implementation, you can integrate these examples into your backend or client-side logic as needed.

You can take a look at the source code for how I implemented mine with Next.js or you can look at the live demo.

Conclusion

Adding a Livechat feature to your web apps shouldn't be hard. With Rocket.Chat's livechat API, you can quickly integrate chat functionality and gain valuable insights from your users. I even built an SDK wrapper to make it easier to use.

Now it’s your turn! Try out Rocket.Chat’s API and build your own live chat system. You can explore more in the Rocket.Chat documentation.

Happy coding!

How to Start Building Projects with LLMs

Harshit Tyagi — Mon, 30 Sep 2024 18:46:25 +0000

If you’re an aspiring AI professional, becoming an LLM engineer offers an exciting and promising career path.

But where should you start? What should your trajectory look like? How should you learn?

In one of my previous posts, I laid out the complete roadmap to become an AI / LLM Engineer. Reading this article will give you insights into the types of skills you’ll need to acquire and how to start learning.

The Best Way to Learn is to BUILD!

As Andrej Karpathy puts it:

Andrej emphasizes that you should build concrete projects, and explain everything you learn in your own words. (He also instructs us to only compare ourselves to a younger version of ourselves – never to others.)

And I agree – building projects is the best way to not just learn but really grok these concepts. It will further sharpen the skills you’re learning to think about cutting edge use cases.

But the main challenge with this learning philosophy is that good projects can be hard to find.

And that’s the problem I am trying to resolve. I want to help people, including myself, discover and build practical and real-world projects that help you develop skills that are worth showcasing in your portfolio.

Here’s What We’ll Cover:

What Should Be Your First Project?
Project #1: YouTube Video Summarizer
Project #2 preview: Multi-purpose Customer Service Bot
Project #3 preview: RAG-Powered Support Bot
Conclusion

What Should Be Your First Project?

If you’re a beginner who knows basic to intermediate programming, your initial projects should showcase that you can comfortably build applications with LLMs.

They should demonstrate that:

you know what APIs are
you know how to consume them
you know how to build products that people actually want to use

Building a chatbot provides a great starting point, but at this point everyone has developed one. And there are many solutions for easy Streamlit-based prototypes. So, you need to develop something that’s actually usable and has the potential to reach a wider audience.

I’d suggest building a chatbot for WhatsApp or Discord or Telegram. Build a chatbot which solves a problem people struggle with, a problem that companies have started to build solutions for.

If I had to pick a good and, arguably, the most common AI project that every company has started to work on, it would be RAG-powered chatbots.

But before you get to building RAG-powered bots, you should start building something slightly more basic but practical with LLMs.

To kick things off, let’s start by building a YouTube Summariser.

Project #1: Summarise YouTube Videos

We’ll build the first part of this project in this tutorial: the core functionality of a YouTube video summariser tool.

Our bot will:

Receive the YouTube URL.
Validate if the URL is correct.
Retrieve the transcript of the video.
Use an LLM to analyze and summarize the video’s content.
Return the summary to the user.

Setup and Requirements

For this project, we’ll code the core functionality in a Jupyter Notebook using the following Python packages:

langchain-together — for the LLM using the LangChain <> Together AI integration
langchain-community — for specific data loaders
langchain — for programming with LLMs
pytube — for fetching video info
youtube-transcript-api — for youtube video transcript

We’ll use the Llama 3.1 model offered as an API by Together AI.

Together AI is a cloud platform that offers open source models as inference APIs, so you can use them without worrying about the underlying infrastructure.

Let’s start by installing these:

!pip install --upgrade --quiet langchain
!pip install --quiet langchain-community
!pip install --upgrade --quiet langchain-together
!pip install youtube_transcript_api
!pip install pytube

Now let’s set up our LLM:

## setting up the language model
from langchain_together import ChatTogether
import api_key

llm = ChatTogether(api_key=api_key.api,temperature=0.0, 
                   model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo")

The next step is to process the YouTube videos as a data source. For this we’ll need to understand the concept of document loaders.

Introduction to Document Loaders

Document loaders provide a unified interface to load data from various sources into a standardized Document format.

They automatically extract and attach relevant metadata to the loaded content.
The metadata can include source information, timestamps, or other contextual data that can be valuable for downstream processing.
LangChain offers loaders for CSV, PDF, HTML, JSON, and even specialized loaders for sources like YouTube transcripts or GitHub repositories, as listed in their integrations page.

Categories of Document Loaders

Document loaders in LangChain can be broadly categorized into two types:

1. File Type-Based Loaders

Parse and load documents based on specific file formats
Examples include: CSV, PDF, HTML, Markdown

2. Data Source-Based Loaders

Retrieve data from various external sources
Load the data into Document objects
Examples include: YouTube, Wikipedia, GitHub

Integration Capabilities

LangChain’s document loaders can integrate with almost any file format you might need.
They also support many third-party data sources.

For our project, we’ll use the YoutubeLoader to get the transcripts in the required format.

YoutubeLoader from LangChain to Get Transcript:

## import the YouTube document loader from LangChain
from langchain_community.document_loaders import YoutubeLoader

video_url = 'https://www.youtube.com/watch?v=gaWxyWwziwE'
loader = YoutubeLoader.from_youtube_url(video_url, add_video_info=False)
data = loader.load()

Process the YouTube Transcript

Display raw transcript content
Use the LLM to summarize and extract key points from the transcript:

# show the extracted page content
data[0].page_content

The page_content attribute contains the complete transcript as shown in the output below:

Now that we have the transcript, we simply need to pass this to the LLM we configured above along with the prompt to summarise.

First, let’s understand a simple method:

LangChain offers the invoke() method, to which you pass the system message and the user (or human) message.

The system message is essentially the instructions for the LLM on how it is supposed to process the human request.

And the human message is simply what we want the LLM to do.

# This code creates a list of messages for the language model:
# 1. A system message with instructions on how to summarize the video transcript
# 2. A human message containing the actual video transcript

# The messages are then passed to the language model (llm) for processing
# The model's response is stored in the 'ai_msg' variable and returned

messages = [
    (
        "system", 
        """Read through the entire transcript carefully.
           Provide a concise summary of the video's main topic and purpose.
           Extract and list the five most interesting or important points from the transcript. For each point: State the key idea in a clear and concise manner.

        - Ensure your summary and key points capture the essence of the video without including unnecessary details.
        - Use clear, engaging language that is accessible to a general audience.
        - If the transcript includes any statistical data, expert opinions, or unique insights, prioritize including these in your summary or key points.""",
    ),
    ("human", data[0].page_content),
]
ai_msg = llm.invoke(messages)
ai_msg

But this method won’t work when you have more variables and when you want a more dynamic solution.

For this, LangChain offers PromptTemplate:

A PromptTemplate in LangChain is a powerful tool that helps in creating dynamic prompts for large language models (LLMs). It allows you to define a template with placeholders for variables that can be filled in with actual values at runtime.

This helps in managing and reusing prompts efficiently, ensuring consistency and reducing the likelihood of errors in prompt creation.

A PromptTemplate consists of:

Template String: The actual prompt text with placeholders for variables.
Input Variables: A list of variables that will be replaced in the template string at runtime.

# Set up a prompt template for summarizing a video transcript using LangChain

# Import necessary classes from LangChain
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain  # top-level `from langchain import LLMChain` fails on newer releases

# Define a PromptTemplate for summarizing video transcripts
# The template includes instructions for the AI model on how to process the transcript
product_description_template = PromptTemplate(
    input_variables=["video_transcript"],
    template="""
    Read through the entire transcript carefully.
           Provide a concise summary of the video's main topic and purpose.
           Extract and list the five most interesting or important points from the transcript. 
           For each point: State the key idea in a clear and concise manner.

        - Ensure your summary and key points capture the essence of the video without including unnecessary details.
        - Use clear, engaging language that is accessible to a general audience.
        - If the transcript includes any statistical data, expert opinions, or unique insights, 
        prioritize including these in your summary or key points.

    Video transcript: {video_transcript}    """
)
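
To make the placeholder idea concrete, here is a plain-Python sketch of what filling a template amounts to. It uses str.format as a stand-in and is not LangChain's implementation; the names SUMMARY_TEMPLATE and fill_template are made up for this illustration.

```python
# Plain-Python stand-in for a prompt template; not LangChain's implementation.
SUMMARY_TEMPLATE = (
    "Provide a concise summary of the video's main topic and purpose.\n"
    "Video transcript: {video_transcript}"
)


def fill_template(template: str, **variables: str) -> str:
    """Replace each {placeholder} in template with the matching keyword argument."""
    # str.format raises KeyError if a placeholder has no matching argument
    return template.format(**variables)


prompt = fill_template(SUMMARY_TEMPLATE, video_transcript="Hello and welcome...")
print(prompt)
```

The same template can be reused for every video, with only the transcript value changing at runtime.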

How to Use LLMChain / LCEL for Summarization

A chain is a sequence of steps that consists of a language model, PromptTemplate, and an optional output parser.

Create an LLMChain with the custom prompt template
Generate a summary of the video transcript using the chain

Here, we are using LLMChain, but you can also use the LangChain Expression Language (LCEL) to do this:

## create the chain from the LLM and the prompt template
chain = LLMChain(llm=llm, prompt=product_description_template)

# Run the chain with the video transcript
summary = chain.invoke({
    "video_transcript": data[0].page_content
})

This will give you a summary dictionary whose 'text' key contains the response in Markdown format.

summary['text']

The raw response will look like this:

To see the Markdown formatted response:

from IPython.display import Markdown, display

display(Markdown(summary['text']))

And there you go:

So, the core functionality of our YouTube summariser is now working.

But this only works in your Jupyter Notebook. To make it more accessible, we need to get this functionality deployed on WhatsApp.

How to serve the YT summariser on WhatsApp

For this, we’d need to serve our YT summarisation functionality as an API endpoint for which we are going to use Flask. You can also use FastAPI.

Now we’ll turn all the code in the Jupyter notebook into functions. So, add a function to check if it is a valid youtube URL, then define the summarise function that is basically a compilation of what we wrote in the Jupyter notebook.

You can configure our endpoint in the following manner:

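from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)

# is_youtube_url() and summarise() are the helper functions described above

@app.route('/summary', methods=['POST'])
def summary():
    # Twilio posts form-encoded data; the message text is in the 'Body' field
    url = (request.form.get('Body') or '').strip()
    print(url)
    if is_youtube_url(url):
        response = summarise(url)
    else:
        response = "please check if this is a correct youtube video url"
    print(response)
    # Wrap the reply in TwiML so Twilio can send it back to the WhatsApp chat
    resp = MessagingResponse()
    msg = resp.message()
    msg.body(response)
    return str(resp)

if __name__ == '__main__':
    app.run()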
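
The endpoint calls is_youtube_url, which isn't shown in this article. Here is one possible sketch of it using only the standard library. The set of accepted hostnames and the decision to accept only /watch?v=... and youtu.be/... links are assumptions, so adjust them to the link formats you want to support.

```python
from urllib.parse import urlparse, parse_qs

# Hostnames treated as YouTube here; this set is an assumption, extend as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if url looks like a link to a single YouTube video."""
    if not url:
        return False
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return False
    if host == "youtu.be":
        # Short links carry the video ID in the path: https://youtu.be/<id>
        return len(parsed.path.strip("/")) > 0
    # Standard links carry the video ID in the "v" query parameter
    return parsed.path == "/watch" and bool(parse_qs(parsed.query).get("v"))
```

This only checks the shape of the link. A video can still be private, deleted, or missing a transcript, so the summarise step should handle those failures too.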

Once your app.py is ready with your Flask API, run the Python script, and you should have your server running locally on your system.

The next step is to make your local server connect with WhatsApp, and that’s where we’ll use Twilio.

Twilio allows us to implement this handshake by offering a WhatsApp sandbox to test your bot. You can follow the steps in this guide here to build this connection.

I got the connection established:

Now, we can start testing our WhatsApp bot:

Amazing!

I explain all the steps in detail in my project-based course on Building LLM-powered WhatsApp Chatbots.

It’s a 3-project course that contains two other more complex projects. I’ll give you a brief summary of those other projects here so you can try them out for yourselves. And if you’re interested, you can check out the course as well.

Project #2 — Build a Bot that Can Handle Different Types of User Queries

This bot acts as a customer service representative for an airline. It can answer questions related to flight status, baggage inquiries, ticket booking, and more. It uses LangChain’s Router and LLMs to dynamically generate responses based on the user’s input.

Different prompt templates are defined for various customer queries, such as flight status, baggage inquiries, and complaints.
Based on the query, the router selects the appropriate template and generates a response.
Twilio then sends the response back to the WhatsApp chat.
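
To give a feel for the routing step, here is a deliberately simplified sketch. In the actual project the routing decision is made with LangChain's router and an LLM; the keyword matching below is only a stand-in, and every template, keyword, and function name is hypothetical.

```python
# Hypothetical prompt templates, one per query type.
TEMPLATES = {
    "flight_status": "You are an airline agent. Answer this flight status question: {query}",
    "baggage": "You are an airline agent. Answer this baggage question: {query}",
    "complaint": "You are an airline agent. Respond with empathy to this complaint: {query}",
}
DEFAULT_TEMPLATE = "You are an airline agent. Answer this question: {query}"

# Naive keyword lists standing in for the LLM-based routing decision.
KEYWORDS = {
    "flight_status": ("delayed", "status", "on time", "departure"),
    "baggage": ("baggage", "luggage", "suitcase"),
    "complaint": ("complaint", "refund", "unhappy"),
}


def route(query: str) -> str:
    """Return the name of the first route whose keywords appear in the query."""
    text = query.lower()
    for name, words in KEYWORDS.items():
        if any(word in text for word in words):
            return name
    return "default"


def build_prompt(query: str) -> str:
    """Pick the template for the query's route and fill in the query."""
    template = TEMPLATES.get(route(query), DEFAULT_TEMPLATE)
    return template.format(query=query)


print(build_prompt("I lost my luggage in Delhi"))
```

The filled-in prompt is what you would then pass to the LLM before sending its answer back through Twilio.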

Project #3 — RAG-Powered Support Bot

This chatbot answers questions related to airline services using a document-based system. The document is converted into embeddings, which are then queried using LangChain’s RAG system to generate responses. Companies are actively looking for developers with these skills, so this is an especially practical project.

The guidelines/rules document is embedded using FAISS and HuggingFace models.
When a user submits a question, the RAG system retrieves relevant information from the document.
The system then generates a response using a pre-trained LLM and sends it back via Twilio.
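
The retrieval step can be sketched without any external libraries. This toy version uses word counts and cosine similarity where the real project uses HuggingFace embeddings and a FAISS index, so treat it as an illustration of the idea only. The policy snippets are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up policy snippets, for illustration only.
chunks = [
    "Each passenger may check one bag of up to 23 kg.",
    "Refunds are processed within seven business days.",
    "Online check-in opens 24 hours before departure.",
]
question = "How many kg can my checked bag weigh?"
context = retrieve(question, chunks)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The prompt built at the end is what gets passed to the LLM, which is how the model can answer from a document it was never trained on.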

These 3 projects will get you started so you can continue experimenting and learning more about AI engineering.

Customer support is one of the most heavily funded categories in AI, because costs drop immediately when AI can handle communication with disgruntled users.

So, we build bots that can handle different types of queries, as well as intelligent RAG-powered bots that have access to proprietary documents and can provide up-to-date information to users.

That’s why I created this project-based course to help you start building with LLMs.

Check out the course preview here:

And to thank you for reading this guide, you can use the code FREECODECAMP to get a 20% discount on my course.

I want this to be accessible to everyone who is sincere about building with AI, so I’ve priced it at $14.99 USD.

Conclusion

In this tutorial, we focused on building a fun YouTube video summarizer tool that is served on WhatsApp.

The bot's core functionality includes:

Receiving a YouTube URL
Validating the URL
Retrieving the video transcript
Using an LLM to summarize the content
Returning the summary to the user

We used a number of Python packages including langchain-together, langchain-community, langchain, pytube, and youtube-transcript-api.

The project uses the Llama 3.1 model via Together AI's API.

We built the core summarisation functionality in two ways:

LangChain's invoke() method with system and human messages
PromptTemplate and LLMChain for more dynamic solutions

To make the tool accessible via WhatsApp:

The functionality is served as an API endpoint using Flask
Twilio is used to connect the local server with WhatsApp
A WhatsApp sandbox is used for testing the bot
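The WhatsApp leg of this setup can be sketched as follows. Twilio posts the incoming message as form fields (including `Body`) and expects a TwiML document in return. This sketch builds the TwiML by hand with the standard library rather than with Flask and the Twilio helper library used in the tutorial; `handle_incoming` and `fake_summarize` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def twiml_reply(text):
    """Wrap reply text in a TwiML <Response><Message> document."""
    response = ET.Element("Response")
    ET.SubElement(response, "Message").text = text
    return ET.tostring(response, encoding="unicode")

def handle_incoming(form, summarize):
    """Turn a Twilio webhook form payload into a TwiML reply.

    `form` is a dict of posted fields; `summarize` is any callable that
    maps the incoming text to the reply text.
    """
    body = form.get("Body", "").strip()
    if not body:
        return twiml_reply("Please send a YouTube URL.")
    return twiml_reply(summarize(body))

def fake_summarize(url):
    """Stand-in for the real LLM summarizer."""
    return f"Summary for {url}"

reply = handle_incoming({"Body": "https://youtu.be/VIDEO_ID_123"}, fake_summarize)
```

Inside a Flask route, the same function could be called with `request.form` and its return value sent back with an XML content type.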

To continue building further projects, check out the course.

It is a beginner-track course in which you first learn to build with LLMs, then apply those skills to build 3 different types of LLM applications. You also learn to serve your applications as WhatsApp chatbots.

