<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ ollama - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ ollama - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 14 Jun 2026 10:15:04 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/ollama/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Protect Sensitive Data by Running LLMs Locally with Ollama ]]>
                </title>
                <description>
                    <![CDATA[ Whenever engineers are building AI-powered applications, use of sensitive data is always a top priority. You don't want to send users' data to an external API that you don't control. For me, this happ ]]>
                </description>
                <link>https://www.freecodecamp.org/news/protect-sensitive-data-with-local-llms/</link>
                <guid isPermaLink="false">69a99b623728a9dc358a5d85</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langchain ]]>
                    </category>
                
                    <category>
                        <![CDATA[ langgraph ]]>
                    </category>
                
                    <category>
                        <![CDATA[ LLM&#39;s  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manoj Aggarwal ]]>
                </dc:creator>
                <pubDate>Thu, 05 Mar 2026 15:04:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/92c9b0b4-5ff8-40ab-b5f5-a060765e99b4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Whenever engineers build AI-powered applications, protecting sensitive data is a top priority. You don't want to send users' data to an external API that you don't control.</p>
<p>For me, this happened when I was building <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>, which is my personal open-source project that helps me with my finances. This application lets you upload your bank statements, tax forms like 1099s, and so on, and then you can ask questions in plain English like, "How much did I spend on groceries this month?" or "What was my effective tax rate last year?"</p>
<p>The problem is that answering these questions means sending sensitive transaction history, W-2s, and income data to OpenAI, Anthropic, or Google, which I was not comfortable with. Even after redacting personally identifiable information (PII) from these documents, the trade-off didn't sit right with me.</p>
<p>This is where Ollama comes in. Ollama lets you run large language models entirely on your own laptop. You don't need any API keys or cloud infrastructure, and no data leaves your machine.</p>
<p>In this tutorial, I will walk you through what Ollama is, how to get started with it, and how to use it in a real Python application so that users of the application can choose to keep their data completely local.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-ollama">What is Ollama</a></p>
</li>
<li><p><a href="#heading-how-ollamas-api-works">How Ollama's API works</a></p>
</li>
<li><p><a href="#heading-how-to-call-ollama-from-python">How to Call Ollama from Python</a></p>
</li>
<li><p><a href="#heading-how-to-integrate-ollama-into-a-langchain-app">How to Integrate Ollama into a LangChain App</a></p>
</li>
<li><p><a href="#heading-how-to-build-an-llm-provider-agnostic-app">How to Build an LLM-Provider Agnostic App</a></p>
</li>
<li><p><a href="#heading-how-to-use-ollama-with-langgraph">How to use Ollama with LangGraph</a></p>
</li>
<li><p><a href="#heading-how-financegpt-uses-this-in-practice">How FinanceGPT Uses This in Practice</a></p>
</li>
<li><p><a href="#heading-tradeoffs-to-be-aware-of">Tradeoffs to be Aware Of</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-check-out-financegpt">Check Out FinanceGPT</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You will need the following at a minimum:</p>
<ul>
<li><p>Python 3.10+</p>
</li>
<li><p>A machine with at least 8GB of RAM (16GB recommended for larger models)</p>
</li>
<li><p>Basic familiarity with Python and pip</p>
</li>
</ul>
<h2 id="heading-what-is-ollama">What is Ollama?</h2>
<p>Ollama is an open-source tool that makes running LLMs locally very easy. You can think of it as Docker for AI models: you pull a model with a single command, and Ollama handles everything else, including downloading the weights, managing memory, and serving the model through a local REST API.</p>
<p>Alongside its native REST API, Ollama exposes an endpoint that is compatible with OpenAI's API format. This means an application that already talks to OpenAI can usually switch to Ollama by changing little more than the base URL and the model name.</p>
<h3 id="heading-installation">Installation</h3>
<p>First, download the installer from <a href="https://ollama.com/">ollama.com</a>. Once it's installed, verify the installation:</p>
<pre><code class="language-shell">ollama --version
</code></pre>
<p>The above command checks whether Ollama was installed correctly and prints the current version.</p>
<h3 id="heading-pull-and-run-your-first-model">Pull and Run Your First Model</h3>
<p>Ollama hosts a variety of models on <a href="https://ollama.com/library">ollama.com/library</a>. To pull a model and immediately chat with it, run:</p>
<pre><code class="language-shell">ollama run llama3.2
</code></pre>
<p>This command downloads the model (if it isn't already on your machine) and starts an interactive chat session with it. Note that models are typically a few gigabytes each, depending on which one you download. Alternatively, if you only want to download a model:</p>
<pre><code class="language-shell">ollama pull mistral
</code></pre>
<p>This downloads a model to your machine without starting a chat session, which is useful when you want to set up models in advance.</p>
<p>You can run the following command to list the models you have installed:</p>
<pre><code class="language-shell">ollama list
</code></pre>
<p>This shows all models you've downloaded locally along with their sizes.</p>
<p>I have used the following models, and they have worked well for the tasks listed:</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>Size</th>
<th>Good For</th>
</tr>
</thead>
<tbody><tr>
<td><code>llama3.2</code></td>
<td>~2GB</td>
<td>Fast, general purpose</td>
</tr>
<tr>
<td><code>mistral</code></td>
<td>~4GB</td>
<td>Strong instruction following</td>
</tr>
<tr>
<td><code>qwen2.5:7b</code></td>
<td>~4GB</td>
<td>Multilingual, reasoning</td>
</tr>
<tr>
<td><code>deepseek-r1:7b</code></td>
<td>~4GB</td>
<td>Complex reasoning tasks</td>
</tr>
</tbody></table>
<h2 id="heading-how-ollamas-api-works">How Ollama's API works</h2>
<p>Once Ollama is running, it serves its API on <code>localhost:11434</code> by default. You can call it directly using curl:</p>
<pre><code class="language-shell">curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "What is compound interest?" }],
  "stream": false
}'
</code></pre>
<p>This sends a chat message directly to Ollama's native REST API from the command line, with streaming disabled so you get the full response at once. Ollama also exposes an OpenAI-compatible API at <code>http://localhost:11434/v1</code>. This is the key feature that makes it easy to drop Ollama into existing apps built for OpenAI's API.</p>
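<p>If you'd rather make the same request from Python without installing any SDK, the standard library is enough. The following is a minimal sketch that assumes Ollama is listening on its default address. The helper names (<code>build_chat_payload</code>, <code>extract_reply</code>, <code>ask_ollama</code>) are illustrative and not part of Ollama. The sketch sends the same JSON body as the curl command and reads the reply from the <code>message.content</code> field of the non-streaming response.</p>
<pre><code class="language-python">import json
import urllib.request

# Ollama's default local address for the native chat endpoint.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model: str, prompt: str, stream: bool = False):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def extract_reply(response_body: dict):
    """Return the assistant's text from a non-streaming /api/chat response."""
    return response_body["message"]["content"]


def ask_ollama(prompt: str, model: str = "llama3.2", url: str = OLLAMA_CHAT_URL):
    """POST one prompt to a running Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # Local models can be slow on CPU-only machines, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as response:
        return extract_reply(json.load(response))
</code></pre>
<p>With the <code>llama3.2</code> model pulled, <code>print(ask_ollama("What is compound interest?"))</code> should print an answer comparable to the one from the curl command.</p>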
<h2 id="heading-how-to-call-ollama-from-python">How to Call Ollama from Python</h2>
<h3 id="heading-how-to-use-the-ollama-python-library">How to Use the Ollama Python Library</h3>
<p>Ollama has its own Python library that is pretty intuitive to use:</p>
<pre><code class="language-shell">pip install ollama
</code></pre>
<pre><code class="language-python">from ollama import chat

response = chat(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.message.content)
</code></pre>
<p>The above code uses Ollama's native Python SDK to send a message and print the model's reply, which is the most straightforward way to call Ollama from Python.</p>
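<p>For longer answers, you may prefer to print the reply as it is generated. When you pass <code>stream=True</code>, the Ollama Python library's <code>chat</code> function returns an iterator of partial responses. The sketch below assumes that behavior. The helper names are hypothetical, and <code>collect_stream</code> is kept separate so that the joining logic can be tested without a running server.</p>
<pre><code class="language-python">def collect_stream(chunks, on_token=None):
    """Join streamed chunks into one reply, optionally handling each piece as it arrives."""
    parts = []
    for chunk in chunks:
        piece = chunk["message"]["content"]
        if on_token is not None:
            on_token(piece)
        parts.append(piece)
    return "".join(parts)


def stream_answer(question: str, model: str = "llama3.2"):
    """Stream a reply from a local Ollama model, printing tokens as they arrive."""
    from ollama import chat  # imported lazily so collect_stream works without the package

    chunks = chat(
        model=model,
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    return collect_stream(chunks, on_token=lambda piece: print(piece, end="", flush=True))
</code></pre>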
<h3 id="heading-how-to-use-the-openai-sdk-with-ollama-as-the-backend">How to Use the OpenAI SDK with Ollama as the Backend</h3>
<p>As mentioned earlier, Ollama has an OpenAI-compatible endpoint, so you can also use the OpenAI Python SDK and just point it to your local server:</p>
<pre><code class="language-shell">pip install openai
</code></pre>
<pre><code class="language-python">from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # Required by the SDK, but ignored by Ollama
)

response = client.chat.completions.create(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'Explain what a Roth IRA is in simple terms.'}
    ]
)

print(response.choices[0].message.content)
</code></pre>
<p>This uses the standard OpenAI Python SDK but points it at your local Ollama server. The <code>api_key</code> field is required by the SDK but ignored by Ollama. Apart from <code>base_url</code>, <code>api_key</code>, and the model name, the code is identical to what you would write for OpenAI. That makes adopting Ollama in existing applications straightforward.</p>
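<p>Because only the client construction differs, one way to let an existing app choose between a hosted model and a local one is to isolate those settings in a single function. This is a sketch, not a prescribed pattern, and the function names are hypothetical. For the hosted API, it relies on the SDK's default behavior of reading <code>OPENAI_API_KEY</code> from the environment.</p>
<pre><code class="language-python">def openai_client_settings(use_local: bool):
    """Return keyword arguments for openai.OpenAI(), pointing at Ollama or OpenAI."""
    if use_local:
        # Ollama's OpenAI-compatible endpoint; the key is required by the SDK but unused.
        return {"base_url": "http://localhost:11434/v1", "api_key": "ollama"}
    # Hosted API: the SDK reads OPENAI_API_KEY from the environment by default.
    return {}


def make_client(use_local: bool):
    """Create an OpenAI SDK client for the chosen backend."""
    from openai import OpenAI  # lazy import: only needed when a client is built

    return OpenAI(**openai_client_settings(use_local))
</code></pre>
<p>The call site (<code>client.chat.completions.create(...)</code>) stays the same either way. Only the <code>model</code> name needs to match the backend, for example <code>llama3.2</code> locally.</p>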
<h2 id="heading-how-to-integrate-ollama-into-a-langchain-app">How to Integrate Ollama into a LangChain App</h2>
<p>Many production applications are built with an orchestration framework like LangChain, which has native Ollama support. This means swapping providers is usually a one-line change.</p>
<p>Install the integration:</p>
<pre><code class="language-shell">pip install langchain-ollama
</code></pre>
<h3 id="heading-how-to-create-a-chat-model">How to Create a Chat Model</h3>
<pre><code class="language-python">from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2")

response = llm.invoke("What is the difference between a W-2 and a 1099?")
print(response.content)
</code></pre>
<p>This creates a LangChain-compatible chat model backed by a local Ollama model, a one-line swap from <code>ChatOpenAI</code>.</p>
<p>Compare this to the OpenAI version and you will see that the interface is almost identical:</p>
<pre><code class="language-python">from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
</code></pre>
<h2 id="heading-how-to-build-an-llm-provider-agnostic-app">How to Build an LLM-Provider Agnostic App</h2>
<p>The real power comes from abstracting away the LLM provider. Applications like Perplexity let users choose the LLM they want to use for their tasks. Here's a simple factory function that returns the right LLM based on the configuration:</p>
<pre><code class="language-python">from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_anthropic import ChatAnthropic

def get_llm(provider: str, model: str):
    """
    Return the appropriate LangChain LLM based on the provider.
    
    Args:
        provider: One of "openai", "ollama", "anthropic"
        model: The model name (e.g. "gpt-4o", "llama3.2", "claude-3-5-sonnet")
    
    Returns:
        A LangChain chat model ready to use
    """
    if provider == "openai":
        return ChatOpenAI(model=model)
    elif provider == "ollama":
        return ChatOllama(model=model)
    elif provider == "anthropic":
        return ChatAnthropic(model=model)
    else:
        raise ValueError(f"Unknown provider: {provider}")
</code></pre>
<p>The above snippet returns the right LangChain model based on a provider string. The rest of your code, including your chains, agents, and tools, never needs to know whose LLM is running underneath: you pass <code>llm</code> around and it just works.</p>
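<p>To make the provider a configuration choice rather than a code change, you can read it from the environment. The sketch below assumes the <code>LLM_PROVIDER</code> and <code>LLM_MODEL</code> variable names used in the FinanceGPT <code>.env</code> setup later in this article. <code>LLMConfig</code> and <code>load_llm_config</code> are hypothetical names. Defaulting to Ollama is a design choice made here so that nothing leaves the machine unless you opt in.</p>
<pre><code class="language-python">import os
from dataclasses import dataclass

SUPPORTED_PROVIDERS = ("openai", "ollama", "anthropic")


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str


def load_llm_config(env=None):
    """Read the provider and model from environment-style settings.

    Falls back to a local Ollama model so that, by default, no data leaves the machine.
    """
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    model = env.get("LLM_MODEL", "llama3.2").strip()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    return LLMConfig(provider=provider, model=model)
</code></pre>
<p>You would then hand the result to the factory: <code>config = load_llm_config()</code> followed by <code>llm = get_llm(config.provider, config.model)</code>. Validating the provider up front means a typo in the config fails at startup rather than on the first request.</p>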
<h2 id="heading-how-to-use-ollama-with-langgraph">How to use Ollama with LangGraph</h2>
<p>If you're using LangGraph to build agents (as I covered in my <a href="https://www.freecodecamp.org/news/how-to-develop-ai-agents-using-langgraph-a-practical-guide/">previous article on AI agents</a>), plugging in Ollama is equally seamless:</p>
<pre><code class="language-python">from langgraph.prebuilt import create_react_agent
from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def get_spending_summary(category: str) -&gt; str:
    """Get total spending for a given category this month."""
    # In a real app, this would query your database
    return f"You spent $342.50 on {category} this month."

llm = ChatOllama(model="llama3.2")

agent = create_react_agent(
    model=llm,
    tools=[get_spending_summary]
)

response = agent.invoke({
    "messages": [{"role": "user", "content": "How much did I spend on groceries?"}]
})

print(response["messages"][-1].content)
</code></pre>
<p>This snippet builds a ReAct agent that uses a locally running model to decide when to call tools, keeping all data on-device even during agentic workflows.</p>
<p>The agent will decide to call the <code>get_spending_summary</code> tool when needed and get the result using the locally running model instead of sending your data over the internet to OpenAI.</p>
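<p>The tool above returns a hard-coded string. The sketch below shows what "query your database" could look like while still keeping everything local, using SQLite from Python's standard library. The <code>transactions</code> table, its columns (<code>date</code> as <code>YYYY-MM-DD</code> text, <code>category</code>, <code>amount</code>), and the function names are assumptions for illustration, not FinanceGPT's actual schema.</p>
<pre><code class="language-python">import sqlite3
from datetime import date


def spending_for_category(conn, category: str, today=None):
    """Sum this month's spending for one category from a local SQLite table."""
    today = today or date.today()
    month_prefix = today.strftime("%Y-%m")  # dates are stored as 'YYYY-MM-DD' text
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions "
        "WHERE category = ? AND substr(date, 1, 7) = ?",
        (category, month_prefix),
    ).fetchone()
    return round(row[0], 2)


def get_spending_summary_text(conn, category: str, today=None):
    """Format the result the same way the example tool does."""
    total = spending_for_category(conn, category, today)
    return f"You spent ${total:.2f} on {category} this month."
</code></pre>
<p>Inside the <code>@tool</code> function, you would open a connection to your local database file and return <code>get_spending_summary_text(conn, category)</code>. The parameterized query (<code>?</code> placeholders) keeps model-supplied category names from being interpreted as SQL.</p>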
<h2 id="heading-how-financegpt-uses-this-in-practice">How FinanceGPT Uses This in Practice</h2>
<p>FinanceGPT is built to support OpenAI, Anthropic, Google, and Ollama as LLM providers. The user sets their preference in the UI or in a config file, and the application instantiates the right model using a pattern very similar to the factory function above.</p>
<p>When the user chooses Ollama, here's what happens:</p>
<ol>
<li><p>Their bank statements and other sensitive documents are parsed locally</p>
</li>
<li><p>Sensitive fields like SSNs are masked before any LLM call</p>
</li>
<li><p>The masked data and query go to the local Ollama server running on their own machine</p>
</li>
<li><p>The response comes back locally and nothing ever leaves their network</p>
</li>
</ol>
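<p>Step 2 is where much of the privacy work happens. As a simple illustration (not FinanceGPT's actual implementation), a regular expression can mask US Social Security numbers before any text reaches a model. A real redaction step would need many more patterns, such as account and routing numbers, and should be tested against your own documents.</p>
<pre><code class="language-python">import re

# Matches US Social Security numbers written as 123-45-6789 or 123 45 6789.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")


def mask_ssns(text: str):
    """Replace every SSN-like number with a mask that keeps only the last four digits."""
    return SSN_PATTERN.sub(lambda m: "***-**-" + m.group(0)[-4:], text)
</code></pre>
<p>For example, <code>mask_ssns("SSN: 123-45-6789")</code> returns <code>"SSN: ***-**-6789"</code>, so the model can still refer to the document without ever seeing the full number.</p>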
<p>To run FinanceGPT locally with Ollama, the setup looks like this:</p>
<pre><code class="language-shell"># 1. Pull a capable model
ollama pull llama3.2

# 2. Clone and configure FinanceGPT
git clone https://github.com/manojag115/FinanceGPT.git
cd FinanceGPT
cp .env.example .env

# 3. In .env, set your LLM provider to Ollama
# LLM_PROVIDER=ollama
# LLM_MODEL=llama3.2

# 4. Start the full stack
docker compose -f docker-compose.quickstart.yml up -d
</code></pre>
<p>With this setup, the entire application, including the frontend, backend, and LLM, runs on your own hardware.</p>
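<p>Before starting the stack, it can help to confirm that the Ollama server is running and that the model from step 1 has been pulled. The sketch below queries Ollama's <code>/api/tags</code> endpoint, which lists locally available models. The helper names are hypothetical. The <code>:latest</code> handling reflects how Ollama names a model that was pulled without an explicit tag.</p>
<pre><code class="language-python">import json
import urllib.request


def installed_models(tags_response: dict):
    """Return the model names listed in an /api/tags response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def has_model(tags_response: dict, wanted: str):
    """Check for a model, treating 'llama3.2' and 'llama3.2:latest' as the same."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in installed_models(tags_response)


def fetch_tags(base_url: str = "http://localhost:11434"):
    """Ask a running Ollama server which models it has pulled."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as response:
        return json.load(response)
</code></pre>
<p>For example, <code>has_model(fetch_tags(), "llama3.2")</code> should return <code>True</code> after <code>ollama pull llama3.2</code>.</p>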
<h2 id="heading-tradeoffs-to-be-aware-of">Tradeoffs to be Aware Of</h2>
<p>Ollama is a great local alternative to using cloud LLMs, but it comes with its own problems.</p>
<h3 id="heading-response-quality">Response Quality</h3>
<p>The models most people run through Ollama, like the ones in the table above, are small (roughly 3B to 7B parameters), so they will not match GPT-4o on complex reasoning tasks. For simple Q&amp;A and summarization tasks, the results can be comparable, but for multi-step reasoning or nuanced judgment calls, the gap is noticeable.</p>
<h3 id="heading-speed">Speed</h3>
<p>Inference speed depends on the hardware running the model. Without a GPU, local models can take several seconds to respond. On Apple Silicon (M1/M2/M3), performance is surprisingly good even without a dedicated GPU, because the integrated GPU shares unified memory with the CPU.</p>
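<p>Rather than relying on rules of thumb, you can measure speed on your own hardware. Non-streaming responses from Ollama's native API include <code>eval_count</code> (tokens generated) and <code>eval_duration</code> (generation time in nanoseconds), so computing tokens per second takes a single division. The function name below is hypothetical.</p>
<pre><code class="language-python">def tokens_per_second(response_body: dict):
    """Compute generation speed from the timing fields of an Ollama response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    duration_ns = response_body.get("eval_duration") or 0
    if duration_ns == 0:
        return 0.0  # no timing data (for example, a partial streaming chunk)
    return response_body["eval_count"] / (duration_ns / 1_000_000_000)
</code></pre>
<p>Pass it the parsed JSON body of a non-streaming <code>/api/chat</code> response. Running the same prompt on different machines or models gives you a like-for-like comparison.</p>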
<h3 id="heading-hardware-requirements">Hardware Requirements</h3>
<p>Small models (around 7B parameters) need about 8GB of RAM, while larger models (13B+) need 16GB or more. If you are building your application for end users, you cannot guarantee they have that hardware.</p>
<h3 id="heading-tool-use-and-function-calling">Tool Use and Function Calling</h3>
<p>Not all local models support function calling reliably. If your agent depends heavily on tool use, test your chosen model carefully. Models like <code>qwen2.5</code> and <code>mistral</code> generally handle this better than others.</p>
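<p>When testing a model's tool use, it helps to look at the raw tool calls it produces. In Ollama's native <code>/api/chat</code> API, a tool-calling reply carries a <code>tool_calls</code> list. Each entry holds <code>function.name</code> and <code>function.arguments</code>, where the arguments are an object rather than a JSON string. The following sketch dispatches those calls to plain Python functions. It reports unknown tools or bad arguments back as text instead of crashing, since these are common ways for smaller models to fail. The registry, function names, and error messages are assumptions for illustration.</p>
<pre><code class="language-python">def get_spending_summary(category: str):
    """Stand-in tool used for illustration."""
    return f"You spent $342.50 on {category} this month."


TOOLS = {"get_spending_summary": get_spending_summary}


def run_tool_calls(message: dict, tools=TOOLS):
    """Execute the tool calls in an Ollama /api/chat assistant message.

    Returns a list of 'tool' role messages to append to the conversation.
    Unknown tools and bad arguments are reported back instead of raising.
    """
    results = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        args = call["function"].get("arguments") or {}
        func = tools.get(name)
        if func is None:
            content = f"Error: unknown tool {name!r}"
        else:
            try:
                content = str(func(**args))
            except TypeError as exc:  # wrong or missing argument names
                content = f"Error calling {name}: {exc}"
        results.append({"role": "tool", "content": content})
    return results
</code></pre>
<p>Logging how often a model triggers the error branches is a quick way to compare candidates such as <code>qwen2.5</code> and <code>mistral</code> on your own tools.</p>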
<p>The right mental model: use cloud models when you need maximum capability, and local models when privacy or cost constraints make cloud models impractical.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you learned what Ollama is, how to install it and pull models, and three different ways to call it from Python: the native Ollama library, the OpenAI-compatible SDK, and LangChain. You also saw how to build a provider-agnostic factory pattern so your app can switch between cloud and local models with a single config change.</p>
<p>Ollama makes local LLMs genuinely practical for production apps. The OpenAI-compatible API means integration is nearly zero-friction, and LangChain's native support means you can build provider-agnostic apps from the start.</p>
<p>The finance domain is an obvious fit, but the same principle applies anywhere sensitive data is involved: healthcare, legal tech, HR, personal productivity. If your app processes data that users wouldn't want stored on someone else's server, giving them a local option isn't just a nice-to-have. It's a trust feature.</p>
<h2 id="heading-check-out-financegpt"><strong>Check Out FinanceGPT</strong></h2>
<p>All the code examples here came from <a href="https://github.com/manojag115/FinanceGPT">FinanceGPT</a>. If you want to see these patterns in a complete app, poke around the repo. It's got document processing, portfolio tracking, tax optimization – all built with LangGraph.</p>
<p>If you find this helpful, <a href="https://github.com/manojag115/FinanceGPT">give the project a star on GitHub</a> – it helps other developers discover it.</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><a href="https://ollama.com/docs">Ollama Documentation</a></p>
</li>
<li><p><a href="https://ollama.com/library">Ollama Model Library</a></p>
</li>
<li><p><a href="https://python.langchain.com/docs/integrations/chat/ollama/">LangChain Ollama Integration</a></p>
</li>
<li><p><a href="https://www.freecodecamp.org/news/how-to-develop-ai-agents-using-langgraph-a-practical-guide/">How to Build AI Agents with LangGraph (my previous article)</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Run and Customize LLMs Locally with Ollama ]]>
                </title>
                <description>
                    <![CDATA[ In the long history of technological innovation, only a few developments have been as impactful as Large Language Models (LLMs). LLMs are advanced AI systems trained on vast datasets to understand, ge ]]>
                </description>
                <link>https://www.freecodecamp.org/news/run-and-customize-llms-locally-with-ollama/</link>
                <guid isPermaLink="false">69a6cd5c75d7a0f10015ca0d</guid>
                
                    <category>
                        <![CDATA[ LLM&#39;s  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ikegah Oliver ]]>
                </dc:creator>
                <pubDate>Tue, 03 Mar 2026 12:00:28 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/1d477910-f378-421b-87a2-20e390738e7c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In the long history of technological innovation, only a few developments have been as impactful as Large Language Models (LLMs). LLMs are advanced AI systems trained on vast datasets to understand, generate, and process human language for tasks like writing, translation, summarization, and powering chatbots.</p>
<p>Having a powerful tool like this available offline is a game-changer. Models that run on your own machine, known as <strong>local LLMs</strong>, keep that capability at your fingertips even without an internet connection. By the end of this guide, you’ll understand what local LLMs are, why they matter, and how to run them yourself, both the easy way and the more technical way.</p>
<p>This guide is suited to, but not limited to:</p>
<ul>
<li><p>Developers, technical writers, or curious engineers.</p>
</li>
<li><p>Anyone comfortable with the terminal.</p>
</li>
<li><p>People with some exposure to AI tools (ChatGPT, Claude, and so on).</p>
</li>
<li><p>Anyone with little or no experience running LLMs locally.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-are-local-llms">What Are Local LLMs?</a></p>
</li>
<li><p><a href="#heading-what-running-locally-means">What Running “Locally” Means</a></p>
</li>
<li><p><a href="#heading-why-run-llms-locally">Why Run LLMs Locally?</a></p>
</li>
<li><p><a href="#heading-how-to-set-up-a-local-llm">How to Set Up a Local LLM</a></p>
</li>
<li><p><a href="#heading-what-is-ollama">What Is Ollama?</a></p>
</li>
<li><p><a href="#heading-how-ollama-operates">How Ollama Operates</a></p>
</li>
<li><p><a href="#heading-how-to-install-ollama">How to Install Ollama</a></p>
</li>
<li><p><a href="#heading-how-to-pull-an-llm">How to Pull an LLM</a></p>
</li>
<li><p><a href="#heading-how-to-run-your-llm">How to Run Your LLM</a></p>
</li>
<li><p><a href="#heading-how-to-customize-local-llms-in-ollama-with-modelfiles">How to Customize Local LLMs in Ollama with Modelfiles</a></p>
<ul>
<li><p><a href="#heading-what-are-modelfiles">What Are Modelfiles?</a></p>
</li>
<li><p><a href="#heading-how-to-customize-a-model">How to Customize a Model</a></p>
</li>
<li><p><a href="#heading-what-modelfiles-do-and-dont-do">What Modelfiles Do and Don't Do</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-are-local-llms">What are Local LLMs?</h2>
<p>Local Large Language Models (LLMs) bring AI off the cloud and onto your personal hardware. Full-size models are usually too large for consumer devices, so a process called <strong>quantization</strong> reduces the numerical precision of their weights, much like compressing a large high-resolution video file so it can stream smoothly on a mobile phone. This lets capable models run locally on your laptop without needing massive server farms.</p>
<p>Running models such as Meta’s Llama 3.3, Google’s Gemma 3, or Alibaba’s Qwen series locally keeps your data private and eliminates subscription costs. Because the AI lives on your machine, you get a fast, offline-capable workspace that keeps your code secure and under your direct control.</p>
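<p>To make the idea concrete, the sketch below applies the simplest form of quantization, symmetric 8-bit, to a handful of weights. Each float is divided by a shared scale and rounded to an integer between -127 and 127. Formats used by local runtimes (such as the 4-bit block-wise schemes in GGUF files) are more elaborate, so treat this as an illustration of the principle, not of what Ollama does internally. The function names are hypothetical.</p>
<pre><code class="language-python">def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127] plus one scale."""
    largest = max((abs(w) for w in weights), default=0.0)
    scale = largest / 127 if largest else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from their 8-bit codes."""
    return [q * scale for q in q_weights]


codes, scale = quantize_int8([0.5, -1.27, 0.003, 1.0])
print(codes)  # small integers that fit in one byte each
print(dequantize(codes, scale))  # close to, but not exactly, the originals
</code></pre>
<p>Each weight now needs 8 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale. Tiny values like <code>0.003</code> can round to zero, which is one reason heavily quantized models lose some quality.</p>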
<h2 id="heading-what-running-locally-means">What Running “Locally” Means</h2>
<p>To understand how local LLMs run on your machine, you have to look into the physical components of your computer. When you run a model like Llama 3 or Mistral locally, your hardware transforms from a general-purpose machine into a specialized AI engine.</p>
<p>The process relies on a tight coordination between four key hardware pillars: <strong>Storage, RAM, the GPU, and the CPU.</strong></p>
<h3 id="heading-storage-the-models-permanent-home">Storage (The model's permanent home)</h3>
<p>Before you can chat, you must download the model. Unlike a standard app, an LLM is primarily a massive file of "weights", numerical values that represent everything the AI knows.</p>
<ul>
<li><p><strong>The Files:</strong> You’ll likely see formats like <code>.gguf</code> or <code>.safetensors</code>. These files are large: a "small" 7B (7-billion-parameter) model occupies roughly <strong>4GB</strong> of disk space when quantized to 4 bits and about <strong>14GB</strong> at full 16-bit precision.</p>
</li>
<li><p><strong>SSD vs. HDD:</strong> An SSD is strongly recommended. Because the computer must move several gigabytes of data into memory every time you launch the model, a traditional hard drive can leave you waiting minutes for the "brain" to wake up.</p>
</li>
</ul>
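<p>You can estimate a model's size yourself: multiply the number of parameters by the bits stored per weight, then divide by eight to get bytes. This is a rough sketch that ignores file metadata and the extra runtime memory a model needs for its context. The function name is hypothetical.</p>
<pre><code class="language-python">def estimate_model_size_gb(params_billions: float, bits_per_weight: float):
    """Rough size of a model's weights: parameters x bits per weight / 8 bits per byte.

    Ignores file metadata and the extra memory needed at runtime (context cache, buffers).
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9  # decimal gigabytes


for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: ~{estimate_model_size_gb(7, bits):.1f} GB")
</code></pre>
<p>For a 7B model this prints about 14.0 GB at 16-bit, 7.0 GB at 8-bit, and 3.5 GB at 4-bit. This is why the quantization level matters so much on consumer hardware.</p>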
<h3 id="heading-vram-and-ram-the-models-workspace">VRAM and RAM (The Model’s Workspace)</h3>
<p>This is the most critical bottleneck. For an AI to respond quickly, its entire "brain" must fit into high-speed memory.</p>
<ul>
<li><p><strong>VRAM (Video RAM):</strong> This is the memory physically attached to your graphics card (GPU). It is significantly faster than regular system RAM. If your model fits entirely in VRAM, the AI will likely type faster than you can read.</p>
</li>
<li><p><strong>System RAM:</strong> If your model is too big for your GPU, the software will "spill over" into your computer’s regular RAM. While this allows you to run massive models on modest hardware, the speed penalty is severe, often dropping from around 50 words per second to just one or two.</p>
</li>
</ul>
<h3 id="heading-the-gpu-the-mathematical-engine">The GPU (The Mathematical Engine)</h3>
<p>While your CPU is the "manager" of your computer, the <strong>GPU (Graphics Processing Unit)</strong> is the "mathematician."</p>
<ul>
<li><p><strong>Parallel Power:</strong> LLMs work by performing billions of simple math problems (matrix multiplications) at the same time. A CPU has a few powerful cores, but a GPU has thousands of smaller cores designed specifically for this parallel math.</p>
</li>
<li><p><strong>Unified Memory (Apple Silicon):</strong> On modern Macs (M1/M2/M3), the CPU and GPU share the same pool of memory. This "Unified Memory" is a game-changer for local AI, allowing even thin laptops to handle relatively large models that would typically require a chunky desktop GPU.</p>
</li>
</ul>
<p>For optimal performance, always compare your computer's specs with the model’s requirements to see which models you can comfortably run.</p>
<h2 id="heading-why-run-llms-locally">Why Run LLMs Locally?</h2>
<p>Running an LLM locally isn't just for tech enthusiasts; it’s a strategic move for anyone who wants full control over their AI. The core benefits of running an LLM locally are:</p>
<ol>
<li><p><strong>Offline Usage</strong>: You're not limited to the cloud. You can explore and use your data wherever you go. Whether you're on a plane or in a remote area, your AI works without an internet connection.</p>
</li>
<li><p><strong>Privacy and data ownership</strong>: Because your prompts and data never leave your machine, there is no risk of a third party accessing them remotely or using them to train a company’s next model.</p>
</li>
<li><p><strong>Cost control</strong>: No need for monthly subscriptions or per-token API fees. Once you have the hardware, running the model costs little beyond electricity.</p>
</li>
<li><p><strong>Customization &amp; Experimentation</strong>: If you have multiple models downloaded, you can "swap brains" instantly. Try different models, fine-tune them for specific tasks, and tweak settings that big providers keep locked.</p>
</li>
<li><p><strong>Faster iteration for dev workflows</strong>: For developers, local hosting eliminates network latency, allowing for near-instant responses and faster testing loops.</p>
</li>
</ol>
<h3 id="heading-tradeoffs">Tradeoffs</h3>
<p>Local LLMs have certain tradeoffs to consider:</p>
<ul>
<li><p><strong>Hardware Requirements:</strong> You’ll need a decent setup, specifically a GPU with a good amount of VRAM (usually 8GB+) or a Mac with Apple Silicon (M1/M2/M3), to achieve smooth performance.</p>
</li>
<li><p><strong>Performance Limitations:</strong> Local models are getting better every day, but they might not yet match the sheer "reasoning power" of frontier cloud models like GPT-4, which run on massive, billion-dollar server clusters.</p>
</li>
<li><p><strong>Initial Setup Friction:</strong> It isn’t always "plug and play." If you want to get hands-on with specific features, you will have to spend some time configuring software, downloading large model files, and troubleshooting your environment.</p>
</li>
</ul>
<p>Even with these trade-offs, having such a tool at your disposal and under your control remains a significant advantage in everyday life.</p>
<h2 id="heading-how-to-set-up-a-local-llm">How to Set Up a Local LLM</h2>
<p>There are many ways to get and set up a local LLM, but for this guide, you will use Ollama, a user-friendly tool that brings private, secure AI directly to your desktop. You will learn to pull and deploy high-performance models with a single command, optimize them for your specific CPU/GPU configuration, and use the powerful <strong>Modelfile</strong> system to "program" custom AI personalities tailored to your exact needs.</p>
<p>What We’ll Cover:</p>
<ul>
<li><p><strong>The Basics:</strong> Understanding how Ollama turns your PC into an AI powerhouse.</p>
</li>
<li><p><strong>Installation &amp; Setup:</strong> Getting up and running in under five minutes.</p>
</li>
<li><p><strong>Model Management:</strong> How to find, "pull" (download), and run models like Llama 3 or Mistral.</p>
</li>
<li><p><strong>Customization:</strong> Writing your first <strong>Modelfile</strong> to give your AI a specific job or personality.</p>
</li>
</ul>
<p>By the end of this, you will have a fully independent AI workstation, capable of sophisticated reasoning without ever sending a byte of data to the cloud.</p>
<h2 id="heading-what-is-ollama">What is Ollama?</h2>
<p><a href="https://ollama.com/">Ollama</a> is a free, open-source tool that makes running Large Language Models (LLMs) on your own hardware as easy as opening a web browser. It strips away the technical complexity that usually comes with AI research, giving you a clean, simple way to chat with, manage, and even customize your own AI models.</p>
<p>Before Ollama, running a local AI was a headache. You had to hunt for the right "weights" files on the internet, set up complex coding environments, and hope your hardware didn't crash. Now, instead of spending hours configuring software, you can let Ollama handle the heavy lifting. It automatically detects your graphics card (GPU) and tunes the settings for you.</p>
<h2 id="heading-how-ollama-operates">How Ollama Operates</h2>
<p>Ollama follows a simple mental model that mimics how you handle apps on a phone or music on a streaming service.</p>
<h3 id="heading-the-model-registry-the-library">The Model Registry (The Library)</h3>
<p>Ollama maintains a central <a href="https://ollama.com/library">library</a> of prepackaged AI models such as Llama 3, Mistral, and Gemma. You don't have to worry about file formats; you just pick a name from the list, and Ollama "pulls" it down to your machine.</p>
<h3 id="heading-the-local-runtime-the-engine">The Local Runtime (The Engine)</h3>
<p>Once you have a model, Ollama acts as the engine. It wakes the model up, loads it into your computer's memory (RAM/VRAM), and starts the mathematical "thinking" process. It is smart enough to use your GPU for speed, but it can also run on a standard CPU if that's all you have.</p>
<h3 id="heading-the-cli-the-control-centre">The CLI (The Control Centre)</h3>
<p>Ollama uses a <strong>Command Line Interface (CLI)</strong>. While that sounds technical, it just means you type simple, human-like instructions into a terminal window. Want to talk to a model? You just tell it to run. Want to see what you've downloaded? You ask it to list them.</p>
<h2 id="heading-how-to-install-ollama">How to Install Ollama</h2>
<p>Go to the Ollama <a href="https://ollama.com/download">download page</a>. For Windows and Mac, click the download button.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6349de767b9ff550634412bf/0e61c2b6-598f-49af-8de7-9029240ed9c2.png" alt="Screenshot of the Ollama download page showing macOS, Linux, and Windows options, with Windows selected and a PowerShell install command (irm https://ollama.com/install.ps1 | iex) plus a “Download for Windows” button (requires Windows 10 or later)." style="display:block;margin:0 auto" width="1920" height="895" loading="lazy">

<p>For Linux, run this command:</p>
<pre><code class="language-plaintext">curl -fsSL https://ollama.com/install.sh | sh
</code></pre>
<p>On Windows and Mac, open the downloaded installer and follow the setup instructions.</p>
<p>After installation, the native Ollama desktop application should open.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6349de767b9ff550634412bf/811604e9-ad6a-44cb-afde-c842c249b32e.png" alt="Screenshot of the Ollama desktop app interface showing the sidebar with “New Chat” and “Settings,” a blank chat area with a llama icon, a message input field, and the selected model set to “llama2:7b.”" style="display:block;margin:0 auto" width="1920" height="1020" loading="lazy">

<p>This GUI is most beneficial for those who feel the CLI is intimidating; you don't have to be a coder to use Ollama. Instead of typing commands, you can manage your models and start conversations through a sleek window that feels just like any other chat app.</p>
<h2 id="heading-how-to-pull-an-llm">How to Pull an LLM</h2>
<p>As mentioned earlier, Ollama has a large <a href="https://ollama.com/library">library</a> of LLMs for different hardware specs and use cases. To download one to your computer, use the <code>pull</code> command followed by the model's name. For example, this pulls the 1-billion-parameter variant of Gemma 3 (the <code>:1b</code> tag selects the size):</p>
<pre><code class="language-plaintext">ollama pull gemma3:1b
</code></pre>
<p>To see the models you have downloaded, use the <code>list</code> command:</p>
<pre><code class="language-plaintext">ollama list
</code></pre>
<h2 id="heading-how-to-run-your-llm">How to Run Your LLM</h2>
<p>You now have the model on your computer. To use it, type <code>ollama run</code> followed by the model's name. For example:</p>
<pre><code class="language-plaintext">ollama run gemma3:1b
</code></pre>
<p>The LLM will load up, and you can prompt it.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6349de767b9ff550634412bf/b78fba39-1de0-4ec8-b313-8bc19211bda9.png" alt="Screenshot of a Windows Command Prompt showing ollama run gemma3:1b executed successfully, with the prompt displaying “Send a message (/? for help)” indicating the model is ready for input." style="display:block;margin:0 auto" width="760" height="222" loading="lazy">

<p>To exit the session, press Ctrl + D or type <code>/bye</code>.</p>
<p>You can also delete a model (<code>ollama rm</code>), copy a model (<code>ollama cp</code>), show information about a model (<code>ollama show</code>), and more. Type <code>ollama help</code> to see all available commands.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6349de767b9ff550634412bf/a30f393e-0b88-40db-b5ef-e0e1e30acc43.png" alt="Screenshot of a command line interface on a dark background displays the help message for &quot;ollama&quot; with the title &quot;Large language model runner&quot;. It includes sections for &quot;Usage&quot; and &quot;Available Commands,&quot; listing options such as &quot;serve,&quot; &quot;create,&quot; &quot;show,&quot; &quot;run,&quot; &quot;stop,&quot; &quot;pull,&quot; and &quot;list,&quot; with brief descriptions of each. The bottom of the screen displays &quot;Flags,&quot; which lists options such as &quot;-h, --help,&quot; &quot;--verbose,&quot; and &quot;--version&quot;." style="display:block;margin:0 auto" width="1614" height="785" loading="lazy">
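<p>Behind the CLI, Ollama runs a local server (by default at <code>http://localhost:11434</code>) that exposes an HTTP API, which is how other programs can use your local models. The sketch below uses only Python's standard library to send a prompt to the <code>/api/generate</code> endpoint with streaming turned off. It assumes the Ollama server is running and that <code>gemma3:1b</code> has been pulled; the helper names are illustrative.</p>

```python
import json
import urllib.request

# Ollama's default local address: the request never leaves your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]  # the model's reply text
```

<p>With the server running (the desktop app or <code>ollama serve</code> starts it), calling <code>generate("gemma3:1b", "Explain what a Modelfile is in one sentence.")</code> returns the model's reply as a string.</p>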

<h2 id="heading-how-to-customize-local-llms-in-ollama-with-modelfiles">How to Customize Local LLMs in Ollama with Modelfiles</h2>
<p>One of Ollama’s most powerful features is the ability to customize how a local model behaves using <strong>Modelfiles</strong>. Rather than treating models as fixed black boxes, Modelfiles allow you to define <em>how</em> a model should respond, what role it should play, and how it should generate text, without retraining or fine-tuning.</p>
<p>This makes Modelfiles ideal for creating reusable, task-specific local models such as technical writers, code reviewers, research assistants, internal developer tools, or even character-driven assistants.</p>
<h2 id="heading-what-are-modelfiles">What are Modelfiles?</h2>
<p>A Modelfile is a plain-text configuration file used by Ollama to create a new model based on an existing one. It describes how a base model should be wrapped, prompted, and configured at runtime.</p>
<p>Essentially, a Modelfile:</p>
<ul>
<li><p>Starts from a base model</p>
</li>
<li><p>Applies a set of instructions</p>
</li>
<li><p>Produces a new, named model that can be run like any other</p>
</li>
</ul>
<p>Modelfiles do not modify the underlying model weights. Instead, they define behavioral rules: how the model should be prompted, how it should generate text, and how it should respond to user input.</p>
<h3 id="heading-modelfile-syntax-and-structure">Modelfile Syntax and Structure</h3>
<p>Modelfiles are line-based and declarative. Each directive defines a specific aspect of the model’s behavior.</p>
<p>A minimal Modelfile looks like this:</p>
<pre><code class="language-markdown">FROM llama3

SYSTEM """
You are a senior technical writer.
"""

PARAMETER temperature 0.2
</code></pre>
<ul>
<li><p><strong>FROM</strong>: This is the foundation. It tells Ollama which base model (such as llama3) to build on; your new model inherits that model's weights and tokenizer.</p>
</li>
<li><p><strong>SYSTEM</strong>: This sets the "permanent" instructions. Assigning the senior technical writer role steers every response toward a professional, structured tone without you having to remind the AI in every prompt.</p>
</li>
<li><p><strong>PARAMETER</strong>: These are the model's dials and knobs. Here, <code>temperature 0.2</code> sets a low "creativity dial," making the model's word choices more deterministic and focused, which suits consistent, factual output.</p>
</li>
</ul>
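<p>To see why a low temperature makes output more deterministic, here is a small worked example. Sampling-based text generation typically divides the model's raw scores (logits) by the temperature before applying softmax, so values below 1 sharpen the distribution toward the top token. The logits are made-up numbers, and this is the textbook formula rather than Ollama's exact sampling code.</p>

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

<p>With these numbers, the top token's probability rises from roughly 63% at temperature 1.0 to over 99% at 0.2, so the model almost always picks the same word.</p>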
<p>Advanced users can also use TEMPLATE for custom prompt formatting and additional MESSAGE directives to include specific conversation history, though these aren't required for this basic setup.</p>
<p><strong>Quick reference cheat sheet:</strong></p>
<table style="min-width:75px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Directive</strong></p></td><td><p><strong>Purpose</strong></p></td><td><p><strong>Example</strong></p></td></tr><tr><td><p><strong>FROM</strong></p></td><td><p><strong>Required.</strong> Defines the base model.</p></td><td><p>FROM llama3</p></td></tr><tr><td><p><strong>SYSTEM</strong></p></td><td><p>Sets the model's persona and rules.</p></td><td><p>SYSTEM "You are a helpful assistant."</p></td></tr><tr><td><p><strong>PARAMETER</strong></p></td><td><p>Adjusts generation settings (randomness, context).</p></td><td><p>PARAMETER temperature 0.2</p></td></tr><tr><td><p><strong>TEMPLATE</strong></p></td><td><p>Formats how User/System prompts are structured.</p></td><td><p>TEMPLATE "{{ .System }}\nUser: {{ .Prompt }}"</p></td></tr><tr><td><p><strong>PARAMETER stop</strong></p></td><td><p>Defines stop sequences that end the model's response (set through PARAMETER, not as a separate directive).</p></td><td><p>PARAMETER stop "&lt;/s&gt;"</p></td></tr><tr><td><p><strong>MESSAGE</strong></p></td><td><p>Adds specific message history to the model.</p></td><td><p>MESSAGE user "Hello!"</p></td></tr></tbody></table>
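<p>Because Modelfiles are line-based, a tool can split one into (directive, argument) pairs with very little code. The sketch below is a simplified, hypothetical parser, not Ollama's own: it handles one-line directives and triple-quoted blocks like the <code>SYSTEM</code> prompt above, and skips blank lines and <code>#</code> comments.</p>

```python
def parse_modelfile(text: str) -> list[tuple[str, str]]:
    """Split Modelfile text into (DIRECTIVE, argument) pairs.

    Simplified sketch: supports one-line directives and triple-quoted
    multi-line arguments; not a full implementation of Ollama's format.
    """
    directives = []
    lines = iter(text.splitlines())
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, arg = line.partition(" ")
        arg = arg.strip()
        if arg.startswith('"""'):
            # Keep reading lines until the closing triple quote appears.
            body = arg[3:]
            while not body.rstrip().endswith('"""'):
                try:
                    body += "\n" + next(lines)
                except StopIteration:
                    raise ValueError(f"unterminated triple quote in {name}") from None
            arg = body.rstrip()[:-3].strip()
        directives.append((name.upper(), arg))
    return directives


example = '''FROM llama3

SYSTEM """
You are a senior technical writer.
"""

PARAMETER temperature 0.2
'''
for directive, argument in parse_modelfile(example):
    print(directive, "->", repr(argument))
```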

<h2 id="heading-how-to-customize-a-model">How to Customize a Model</h2>
<p>To create a model using a Modelfile, Ollama performs the following steps:</p>
<ul>
<li><p>Loads the specified base model</p>
</li>
<li><p>Applies system-level instructions</p>
</li>
<li><p>Configures generation parameters</p>
</li>
<li><p>Registers the result as a new local model</p>
</li>
</ul>
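<p>For this article, you will create a technical writing assistant from any local LLM of your choice. You can use the model you downloaded earlier (for example, <code>gemma3:1b</code>) or another one you feel is a better fit for the task; just replace <code>llama3</code> in the <code>FROM</code> line below with that model's name.</p>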
<ol>
<li><p>Set up your environment: Create a folder named <code>my-writing-assistant</code>, then open it in your preferred IDE or text editor.</p>
</li>
<li><p>Create a Modelfile: Create a file named <code>Modelfile</code> (with no file extension) in your folder. Populate it with the following:</p>
</li>
</ol>
<pre><code class="language-markdown">FROM llama3 

SYSTEM """
You are a senior technical writer.
Write clear, concise explanations.
Use headings and bullet points where appropriate.
Avoid marketing language.
"""

PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
</code></pre>
<ol start="3">
<li><p>Create your model: Open the terminal in your IDE, or, if your text editor has no built-in terminal, open a terminal such as Command Prompt and navigate into the <code>my-writing-assistant</code> directory. Run this command:</p>
<pre><code class="language-plaintext">ollama create tech-writer -f Modelfile
</code></pre>
<p>You should see a response like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6349de767b9ff550634412bf/a403675f-396d-43af-b2ae-63bf3e56906c.png" alt="Screenshot of a command line interface showing the successful creation of a custom model named &quot;tech-writer&quot; using the command ollama create tech-writer -f Modelfile. The terminal displays progress logs for gathering components, using existing layers, and creating new layers, ending with a &quot;success&quot; message." style="display:block;margin:0 auto" width="1135" height="272" loading="lazy">
</li>
<li><p>Run your model: You can run your model like any other Ollama model, with the run command:</p>
<pre><code class="language-plaintext">ollama run tech-writer
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/6349de767b9ff550634412bf/ea8bb6a7-b64d-422c-9f52-6526ed1c8496.png" alt="Screenshot of a command line interface showing the command ollama run tech-writer being executed. Below the command, an interactive prompt appears with the text &quot;>>> Send a message (/? for help),&quot; indicating the custom model is ready for use." style="display:block;margin:0 auto" width="1198" height="111" loading="lazy">

<p>Try a documentation-related prompt and check that the model follows the instructions in your Modelfile.</p>
</li>
</ol>
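<p>If you create several assistants, you may prefer to script these steps. The following sketch writes a Modelfile into a project folder and builds the same <code>ollama create</code> command you ran above. It assumes the <code>ollama</code> CLI is on your PATH, and the helper names are hypothetical; the command only executes when you pass <code>run=True</code>.</p>

```python
import subprocess
from pathlib import Path


def write_modelfile(folder: Path, base_model: str, system_prompt: str,
                    parameters: dict[str, object]) -> Path:
    """Write a Modelfile with FROM, SYSTEM, and PARAMETER lines into `folder`."""
    folder.mkdir(parents=True, exist_ok=True)
    lines = [f"FROM {base_model}", "", 'SYSTEM """', system_prompt.strip(), '"""', ""]
    lines += [f"PARAMETER {name} {value}" for name, value in parameters.items()]
    path = folder / "Modelfile"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def create_model(name: str, modelfile: Path, run: bool = False) -> list[str]:
    """Build the `ollama create` command; execute it only when run=True."""
    command = ["ollama", "create", name, "-f", str(modelfile)]
    if run:
        # check=True raises CalledProcessError if ollama exits with an error.
        subprocess.run(command, check=True)
    return command


# Example usage (requires the ollama CLI when run=True):
# path = write_modelfile(Path("my-writing-assistant"), "gemma3:1b",
#                        "You are a senior technical writer.", {"temperature": 0.2})
# create_model("tech-writer", path, run=True)
```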
<p>You can also interact with your models (both downloaded and customized) using the <strong>Desktop App</strong>. Simply open the application, select your preferred model from the chat box's dropdown menu, and start prompting.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6349de767b9ff550634412bf/896cd707-0695-494c-b8a8-b70ded37ce10.png" alt="A screenshot of a white theme chat interface showing a model selection dropdown menu open." style="display:block;margin:0 auto" width="1920" height="1020" loading="lazy">

<h2 id="heading-what-modelfiles-do-and-dont-do">What Modelfiles Do and Don't Do</h2>
<p>Modelfiles are powerful, but it’s important to understand their scope.</p>
<p>They:</p>
<ul>
<li><p>Customize model behavior</p>
</li>
<li><p>Enforce consistent prompting</p>
</li>
<li><p>Tune generation characteristics</p>
</li>
<li><p>Create reusable local models</p>
</li>
</ul>
<p>They do not:</p>
<ul>
<li><p>Retrain or fine-tune model weights</p>
</li>
<li><p>Add new knowledge</p>
</li>
<li><p>Change the model’s architecture</p>
</li>
</ul>
<p>A Modelfile shapes how a model responds, not what it knows.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Running large language models locally is no longer limited to researchers or high-end machines. With Ollama and Modelfiles, you can download capable models, run them on your own device, and tailor their behavior to fit your workflow.</p>
<p>In this guide, we covered what local LLMs are, why they matter, how Ollama simplifies setup, and how Modelfiles let you control tone, structure, and generation settings. Instead of relying on a generic chatbot, you can build assistants that feel intentional and purpose-built.</p>
<p>More importantly, running models locally changes how you interact with AI. You move from simply consuming an API to understanding and shaping the system itself. As AI continues to influence software, business, and everyday tools, hands-on experience with local models gives you a clearer view of where the technology is heading. The best way to understand that shift is to experiment, pull a model, refine a Modelfile,</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build and Deploy a Multi-Agent AI System with Python and Docker ]]>
                </title>
                <description>
                    <![CDATA[ You wake up and open your laptop. Your browser has 27 tabs open, your inbox is overflowing with unread newsletters, and meeting notes are scattered across three apps. Sound familiar? Now imagine you h ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-and-deploy-multi-agent-ai-with-python-and-docker/</link>
                <guid isPermaLink="false">699c785540e1f055acbb8b6f</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Open Source ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Balajee Asish Brahmandam ]]>
                </dc:creator>
                <pubDate>Mon, 23 Feb 2026 15:55:01 +0000</pubDate>
                <media:content url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/5fc16e412cae9c5b190b6cdd/6bd425e1-7427-4fe8-b1a7-80fff56102f7.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You wake up and open your laptop. Your browser has 27 tabs open, your inbox is overflowing with unread newsletters, and meeting notes are scattered across three apps. Sound familiar?</p>
<p>Now imagine you had a team of specialized assistants that worked overnight — one to read your inputs, one to summarize the key facts, one to rank what matters most, and one to format everything into a clean daily brief waiting in your inbox.</p>
<p>That is exactly what this handbook walks you through building. You will create a multi-agent AI system where four Python-based agents each handle one job. You will containerize each agent with Docker so the whole thing runs reliably on any machine. And you will wire it all together with Docker Compose so you can launch the entire pipeline with a single command.</p>
<p>This handbook assumes you are comfortable reading Python code, but it does not assume you have used Docker before. If you have never written a Dockerfile or run a container, that is fine — the fundamentals are covered as we go.</p>
<p>By the end, you will have a working system that turns digital noise into an organized daily digest, and you will understand the patterns behind it well enough to adapt them to your own projects.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#what-is-a-multi-agent-system-and-why-build-one">What is a Multi-Agent System (and Why Build One)?</a></p>
<ul>
<li><p><a href="#how-traditional-scripts-work">How Traditional Scripts Work</a></p>
</li>
<li><p><a href="#how-ai-agents-are-different">How AI Agents are Different</a></p>
</li>
<li><p><a href="#why-use-multiple-agents-instead-of-one">Why Use Multiple Agents Instead of One?</a></p>
</li>
</ul>
</li>
<li><p><a href="#what-is-docker-and-why-does-it-matter-here">What is Docker (and Why Does It Matter Here)?</a></p>
<ul>
<li><p><a href="#the-environment-problem">The Environment Problem</a></p>
</li>
<li><p><a href="#how-docker-solves-this">How Docker Solves This</a></p>
</li>
<li><p><a href="#how-docker-layers-work">How Docker Layers Work</a></p>
</li>
<li><p><a href="#docker-vs-no-docker">Docker vs. No Docker</a></p>
</li>
</ul>
</li>
<li><p><a href="#how-to-plan-the-architecture">How to Plan the Architecture</a></p>
</li>
<li><p><a href="#prerequisites-and-environment-setup">Prerequisites and Environment Setup</a></p>
<ul>
<li><p><a href="#how-to-install-python">How to Install Python</a></p>
</li>
<li><p><a href="#how-to-install-docker">How to Install Docker</a></p>
</li>
<li><p><a href="#how-to-verify-your-setup">How to Verify Your Setup</a></p>
</li>
<li><p><a href="#how-to-set-up-the-project-structure">How to Set Up the Project Structure</a></p>
</li>
</ul>
</li>
<li><p><a href="#how-to-build-each-agent-step-by-step">How to Build Each Agent Step by Step</a></p>
<ul>
<li><p><a href="#the-ingestor-agent">The Ingestor Agent</a></p>
</li>
<li><p><a href="#the-summarizer-agent">The Summarizer Agent</a></p>
</li>
<li><p><a href="#the-prioritizer-agent">The Prioritizer Agent</a></p>
</li>
<li><p><a href="#the-formatter-agent">The Formatter Agent</a></p>
</li>
</ul>
</li>
<li><p><a href="#how-to-handle-secrets-and-api-keys">How to Handle Secrets and API Keys</a></p>
<ul>
<li><p><a href="#using-env-files-for-development">Using .env Files for Development</a></p>
</li>
<li><p><a href="#how-to-use-docker-secrets-for-production">How to Use Docker Secrets for Production</a></p>
</li>
</ul>
</li>
<li><p><a href="#how-to-orchestrate-everything-with-docker-compose">How to Orchestrate Everything with Docker Compose</a></p>
</li>
<li><p><a href="#how-to-run-the-pipeline">How to Run the Pipeline</a></p>
</li>
<li><p><a href="#how-to-test-the-pipeline">How to Test the Pipeline</a></p>
<ul>
<li><p><a href="#unit-tests">Unit Tests</a></p>
</li>
<li><p><a href="#integration-tests">Integration Tests</a></p>
</li>
</ul>
</li>
<li><p><a href="#how-to-add-logging-and-observability">How to Add Logging and Observability</a></p>
</li>
<li><p><a href="#cost-rate-limits-and-graceful-degradation">Cost, Rate Limits, and Graceful Degradation</a></p>
</li>
<li><p><a href="#security-and-privacy-considerations">Security and Privacy Considerations</a></p>
</li>
<li><p><a href="#how-to-use-a-local-llm-for-full-privacy-ollama">How to Use a Local LLM for Full Privacy (Ollama)</a></p>
</li>
<li><p><a href="#example-seed-data-and-expected-output">Example Seed Data and Expected Output</a></p>
</li>
<li><p><a href="#how-to-automate-daily-execution">How to Automate Daily Execution</a></p>
</li>
<li><p><a href="#how-to-use-cron-on-linux-or-macos">How to Use Cron on Linux or macOS</a></p>
</li>
<li><p><a href="#how-to-use-task-scheduler-on-windows">How to Use Task Scheduler on Windows</a></p>
</li>
<li><p><a href="#how-to-add-delivery-notifications">How to Add Delivery Notifications</a></p>
</li>
<li><p><a href="#troubleshooting-common-errors">Troubleshooting Common Errors</a></p>
</li>
<li><p><a href="#production-deployment-options">Production Deployment Options</a></p>
<ul>
<li><p><a href="#docker-swarm">Docker Swarm</a></p>
</li>
<li><p><a href="#kubernetes">Kubernetes</a></p>
</li>
</ul>
</li>
<li><p><a href="#cloud-platforms">Cloud Platforms</a></p>
</li>
<li><p><a href="#conclusion-and-next-steps">Conclusion and Next Steps</a></p>
</li>
</ul>
<h2 id="heading-what-is-a-multi-agent-system-and-why-build-one">What is a Multi-Agent System (and Why Build One)?</h2>
<h3 id="heading-how-traditional-scripts-work">How Traditional Scripts Work</h3>
<p>A traditional Python script follows a fixed path. It reads some input, processes it through a series of hard-coded steps, and writes the output. If the input format changes even slightly, the script often breaks. Think of it like a train on a track. Trains are fast and efficient, but they can only go where the rails take them. If the track is blocked, the train stops.</p>
<h3 id="heading-how-ai-agents-are-different">How AI Agents are Different</h3>
<p>An AI agent is more like a bus driver. It has a destination (a goal), but it can decide which route to take based on current conditions (the data). If one road is blocked, it finds another.</p>
<p>Many agents follow a loop called the <strong>ReAct pattern</strong>, short for Reasoning plus Acting. At each step, the agent thinks about what to do, takes an action, observes the result, and decides whether it has reached its goal. If not, it loops back and tries again. If so, it finishes.</p>
<p>In practice, this means an LLM-based agent can handle messy, unpredictable input much better than a traditional script. If a newsletter changes its format, the summarizer agent can still extract the key points because it reasons about the content rather than parsing a rigid structure.</p>
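<p>The ReAct loop described above can be sketched in a few lines of Python. In a real agent, the "think" step would be an LLM call that chooses the next action; here it is replaced by a hypothetical rule-based stub so the control flow is easy to follow and runs without an API key. The tool name and the <code>max_steps</code> guard are illustrative assumptions.</p>

```python
from typing import Callable


def react_loop(goal: str, think: Callable[[str, list[str]], tuple[str, str]],
               tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Run a minimal Reason -> Act -> Observe loop until the agent finishes."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, argument = think(goal, observations)  # Reason: pick the next action
        if action == "finish":
            return argument  # the agent decided its goal is reached
        result = tools[action](argument)  # Act: call the chosen tool
        observations.append(result)  # Observe: feed the result into the next step
    return "Stopped: step limit reached"


def stub_think(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM: count the words once, then finish."""
    if not observations:
        return "count_words", goal
    return "finish", f"The goal text has {observations[-1]} words."


tools = {"count_words": lambda text: str(len(text.split()))}
print(react_loop("summarize the morning newsletters", stub_think, tools))
```

<p>The <code>max_steps</code> limit matters in practice: a real LLM may never decide it is done, so a hard cap keeps a misbehaving agent from looping forever.</p>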
<h3 id="heading-why-use-multiple-agents-instead-of-one">Why Use Multiple Agents Instead of One?</h3>
<p>You might wonder: why not just use one powerful agent that does everything? That approach is called the "God Model" pattern, and it has real problems. When you ask a single LLM to ingest data, summarize it, prioritize it, and format it all in one prompt, you are giving it too much to think about at once. LLMs have a limited context window and limited attention. The more tasks you pile on, the more likely the model is to hallucinate, skip steps, or produce inconsistent output.</p>
<p>A multi-agent system solves this through <strong>separation of concerns</strong>. Each agent has one narrow job. The Ingestor reads and combines raw files, with no LLM needed. The Summarizer calls the LLM with a focused prompt: just summarize this text. The Prioritizer scores lines by keyword with no LLM needed. And the Formatter writes Markdown output, also with no LLM.</p>
<p>This design has several advantages. Each agent is simpler to build, test, and debug. You can swap out the Summarizer for a better model without touching anything else. And you can scale individual agents independently — for example, running multiple Summarizers in parallel if you have a lot of input.</p>
<h2 id="heading-what-is-docker-and-why-does-it-matter-here">What is Docker (and Why Does It Matter Here)?</h2>
<h3 id="heading-the-environment-problem">The Environment Problem</h3>
<p>If you have ever shared a Python project with someone and heard "it does not work on my machine," you already understand the problem Docker solves. Every Python project depends on specific versions of Python itself, plus libraries like <code>openai</code>, <code>requests</code>, or <code>beautifulsoup4</code>. These dependencies live in your operating system's environment. When you install a new library or upgrade Python, you might break a different project that depends on the old version.</p>
<p>Virtual environments help, but they only isolate Python packages. They do not isolate the operating system, system libraries, or other tools your code might need. And they do not guarantee that someone else can recreate your exact environment. For a multi-agent system, this problem gets worse. Each agent might need different dependencies. If they share an environment, their dependencies can conflict.</p>
<h3 id="heading-how-docker-solves-this">How Docker Solves This</h3>
<p>Docker packages your code, its dependencies, and a minimal operating system into a single unit called a <strong>container</strong>. When you run that container, it behaves exactly the same way regardless of what machine it is running on — your laptop, a coworker's computer, or a cloud server. Think of a Docker container like a shipping container for software. The contents are sealed inside, protected from the outside environment.</p>
<p>There are a few key Docker concepts to understand:</p>
<p><strong>Image</strong> — A read-only template that contains your code, dependencies, and a minimal OS. You build an image from a Dockerfile. Think of it as a recipe.</p>
<p><strong>Container</strong> — A running instance of an image. When you "run" an image, Docker creates a container from it. Think of it as a dish made from the recipe.</p>
<p><strong>Dockerfile</strong> — A text file with instructions for building an image. It specifies the base OS, what to install, what code to copy in, and what command to run when the container starts.</p>
<p><strong>Volume</strong> — A way to share files between your computer and a container, or between multiple containers. Our agents will use a shared volume to pass data to each other.</p>
<p><strong>Docker Compose</strong> — A tool for defining and running multiple containers together. You describe all your containers in a single YAML file, and Compose handles building, networking, and ordering them.</p>
<h3 id="heading-how-docker-layers-work">How Docker Layers Work</h3>
<p>Docker builds images in layers. Each instruction in a Dockerfile creates a new layer. Docker caches these layers, so if a layer has not changed since the last build, Docker reuses the cached version instead of rebuilding it. This is why Dockerfiles are structured in a specific order: the base OS layer rarely changes, the dependency installation layer changes when <code>requirements.txt</code> changes, and the application code layer changes on every code edit. By putting dependency installation before the code copy, Docker only re-runs <code>pip install</code> when your requirements actually change, making rebuilds much faster — seconds instead of minutes.</p>
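<p>Here is a simplified model of that caching behavior: treat each layer's cache key as a hash of the parent layer's key, the instruction, and any file contents that instruction copies. Docker's real cache rules have more detail (for example, <code>RUN</code> instructions are matched by their command text, not their results), but the sketch shows why editing <code>app.py</code> leaves the <code>pip install</code> layer cached when dependencies are installed first. The instructions and file contents are hypothetical.</p>

```python
import hashlib


def layer_keys(instructions: list[tuple[str, str]]) -> list[str]:
    """Compute a chained cache key for each (instruction, copied_content) pair."""
    keys = []
    parent = ""
    for instruction, content in instructions:
        digest = hashlib.sha256(f"{parent}|{instruction}|{content}".encode()).hexdigest()
        keys.append(digest[:12])  # a short prefix is enough for comparison
        parent = digest  # each layer depends on every layer before it
    return keys


def build(requirements: str, app_code: str) -> list[str]:
    """Layers in dependency-first order, like a typical Python Dockerfile."""
    return layer_keys([
        ("FROM python:3.12-slim", ""),
        ("COPY requirements.txt .", requirements),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY app.py .", app_code),
    ])


before = build("openai==1.0.0", "print('v1')")
after = build("openai==1.0.0", "print('v2')")  # only the code changed
reused = sum(1 for old, new in zip(before, after) if old == new)
print(f"{reused} of {len(before)} layers reused")
```

<p>Because the keys are chained, changing <code>requirements.txt</code> invalidates its own layer and every layer after it, which is exactly why the code copy goes last.</p>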
<h3 id="heading-docker-vs-no-docker">Docker vs. No Docker</h3>
<p>To be clear, you do not strictly need Docker for this tutorial. You can run all four agents as plain Python scripts. But without Docker you face dependency conflicts from a shared environment, manual process management for scaling, having to redo all setup on every new machine, complex orchestration for testing, and painful Python version management when one agent needs 3.8 and another needs 3.10. With Docker, each agent has its own isolated environment, you run multiple containers in parallel with one command, <code>docker compose up</code> produces identical results everywhere, and each container runs its own Python version independently.</p>
<p>For a personal project, either approach works. But if you ever want to share this system, deploy it to a server, or run it in the cloud, Docker makes the difference between "here is a README with 15 setup steps" and "run <code>docker compose up</code>."</p>
<h2 id="heading-how-to-plan-the-architecture">How to Plan the Architecture</h2>
<p>Before writing any code, it is worth mapping out how the pieces fit together. The full system consists of four agents arranged in a sequential pipeline, all orchestrated by Docker Compose. Data flows through the Ingestor Agent, the Summarizer Agent, the Prioritizer Agent, and the Formatter Agent in that order. Each agent reads from a shared volume, processes its input, writes the result, and exits. Docker Compose enforces execution order by waiting for each container to finish successfully before starting the next one.</p>
<p>This is a synchronous pipeline: agents run one at a time, in sequence. It is the simplest multi-agent pattern to implement and understand. For more complex systems, you could replace the shared volume with a message broker like Redis or RabbitMQ, which lets agents run asynchronously and react to events. But for this daily-digest use case, the sequential approach is exactly right.</p>
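<p>The ordering rule behind this pipeline (start a stage only after the previous one exits successfully, which Compose can express with <code>depends_on</code> and <code>condition: service_completed_successfully</code>) can be sketched as a plain Python loop. This is a hypothetical illustration of the rule, not how Compose is implemented: each stage is a function returning an exit code, just as each container process would.</p>

```python
from typing import Callable

Stage = Callable[[], int]  # a stage returns an exit code, like a container process


def run_pipeline(stages: list[tuple[str, Stage]]) -> list[str]:
    """Run stages in order and stop at the first non-zero exit code."""
    completed: list[str] = []
    for name, stage in stages:
        exit_code = stage()
        if exit_code != 0:
            print(f"{name} exited with code {exit_code}; later stages will not start")
            break
        completed.append(name)
    return completed


def succeed() -> int:
    return 0


def crash() -> int:
    return 1


print(run_pipeline([("ingestor", succeed), ("summarizer", crash),
                    ("prioritizer", succeed), ("formatter", succeed)]))
```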
<p>In terms of responsibilities:</p>
<ul>
<li><p><strong>Ingestor</strong> — Reads and combines raw files from <code>/data/input/</code> into <code>ingested.txt</code>. No LLM required.</p>
</li>
<li><p><strong>Summarizer</strong> — Distills key points from <code>ingested.txt</code> into <code>summary.txt</code>. The only agent that requires an LLM.</p>
</li>
<li><p><strong>Prioritizer</strong> — Scores items by urgency keywords, turning <code>summary.txt</code> into <code>prioritized.txt</code>. No LLM.</p>
</li>
<li><p><strong>Formatter</strong> — Produces the final Markdown report, <code>daily_digest.md</code>. No LLM.</p>
</li>
</ul>
<p>Notice that only one of the four agents actually calls an LLM. The others are plain Python. This is intentional — you should only use an LLM when you need reasoning or language understanding. Everything else should be deterministic code. It is cheaper, faster, and more predictable.</p>
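<p>As an example of the kind of deterministic code that can replace an LLM call, here is a minimal keyword-scoring sketch in the spirit of the Prioritizer. The keywords and weights are hypothetical placeholders, not the ones the Prioritizer in this handbook uses.</p>

```python
import re

# Hypothetical keywords and weights; tune them for your own inbox.
URGENCY_WEIGHTS = {"urgent": 3, "deadline": 2, "today": 2, "meeting": 1}


def score_line(line: str) -> int:
    """Sum the weights of urgency keywords in a line (each keyword counts once)."""
    words = set(re.findall(r"[a-z]+", line.lower()))
    return sum(weight for keyword, weight in URGENCY_WEIGHTS.items() if keyword in words)


def prioritize(lines: list[str]) -> list[str]:
    """Sort lines from most to least urgent; ties keep their original order."""
    return sorted(lines, key=score_line, reverse=True)


items = [
    "Newsletter: new Python release notes",
    "Urgent: API key expires today",
    "Team meeting moved to Thursday",
]
for item in prioritize(items):
    print(score_line(item), item)
```

<p>Given the same input, this code always produces the same ranking, costs nothing per run, and can be unit-tested with plain assertions, which is the trade-off the paragraph above describes.</p>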
<h2 id="heading-prerequisites-and-environment-setup">Prerequisites and Environment Setup</h2>
<p>You need the following tools installed before starting:</p>
<ul>
<li><p><strong>Python</strong> 3.10 or higher — the language for the agents</p>
</li>
<li><p><strong>Docker Desktop</strong> (Engine 20.10+) — the container runtime</p>
</li>
<li><p><strong>Docker Compose</strong> v2 (included with Docker Desktop) — multi-container orchestration</p>
</li>
<li><p><strong>Git</strong> 2.30+ — version control</p>
</li>
<li><p><strong>OpenAI Python SDK</strong> (<code>openai &gt;= 1.0</code>) — LLM API access</p>
</li>
<li><p><strong>Redis or RabbitMQ</strong> (optional) — async message queuing</p>
</li>
<li><p><strong>PostgreSQL</strong> (optional) — persistent data storage</p>
</li>
</ul>
<h3 id="heading-how-to-install-python">How to Install Python</h3>
<p>Download Python from <a href="https://python.org/">python.org</a>. On Windows, check the "Add Python to PATH" box during installation. On macOS, you can use Homebrew:</p>
<pre><code class="language-bash">brew install python@3.12
</code></pre>
<p>On Linux (Ubuntu/Debian), use your package manager:</p>
<pre><code class="language-bash">sudo apt update &amp;&amp; sudo apt install python3 python3-pip
</code></pre>
<h3 id="heading-how-to-install-docker">How to Install Docker</h3>
<p>Docker Desktop is the easiest way to get started on Windows and macOS. Download it from <a href="https://docker.com/">docker.com</a> and follow the prompts. On Windows, Docker Desktop uses the WSL 2 backend by default, and the installer will guide you through enabling it. On Linux, install Docker Engine directly:</p>
<pre><code class="language-bash"># Ubuntu/Debian
sudo apt update
sudo apt install docker.io docker-compose-v2
sudo usermod -aG docker $USER  # So you don't need sudo for docker commands
</code></pre>
<p>After installing, log out and back in for the group change to take effect.</p>
<h3 id="heading-how-to-verify-your-setup">How to Verify Your Setup</h3>
<p>Open your terminal and run these commands. Each should print a version number without errors:</p>
<pre><code class="language-bash">python --version        # Should show 3.10 or higher (use python3 on macOS/Linux)
docker --version        # Should show 20.10 or higher
docker compose version  # Should show v2.x
git --version           # Should show 2.30 or higher
</code></pre>
<p>If any command fails, go back to the installation step for that tool. The most common issue is that the command is not in your PATH.</p>
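<p>If you would rather check the minimum versions programmatically, the sketch below pulls the first dotted version number out of each tool's <code>--version</code> output and compares it with the minimums listed above. It assumes each command prints a version string such as <code>git version 2.43.0</code>; the helper names are illustrative, and the parser is deliberately loose.</p>

```python
import re
import shutil
import subprocess

# Minimum versions from the prerequisites list above.
MINIMUMS = {"python": (3, 10), "docker": (20, 10), "git": (2, 30)}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'Python 3.12.1' -> (3, 12, 1)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(text: str, minimum: tuple[int, ...]) -> bool:
    """Tuples compare element by element, so (3, 12) >= (3, 10) is True."""
    return parse_version(text)[: len(minimum)] >= minimum


def check_tool(tool: str) -> str:
    """Run 'TOOL --version' and report whether it meets the minimum version."""
    if shutil.which(tool) is None:
        return f"{tool}: not found on PATH"
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    status = "OK" if meets_minimum(output, MINIMUMS[tool]) else "too old"
    return f"{tool}: {output} ({status})"
```

<p>For example, <code>for tool in MINIMUMS: print(check_tool(tool))</code> prints one status line per tool. On systems where the interpreter is called <code>python3</code>, rename that key accordingly.</p>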
<h2 id="heading-how-to-set-up-the-project-structure">How to Set Up the Project Structure</h2>
<p>Each agent lives in its own directory with its own code, Dockerfile, and requirements file. This isolation means you can build, test, and update each agent independently. Create the following structure:</p>
<pre><code class="language-plaintext">multi-agent-digest/
├── agents/
│   ├── ingestor/
│   │   ├── app.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── summarizer/
│   │   ├── app.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── prioritizer/
│   │   ├── app.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   └── formatter/
│       ├── app.py
│       ├── Dockerfile
│       └── requirements.txt
├── data/
│   └── input/          # Your raw files go here
├── output/              # The final digest appears here
├── tests/               # Unit and integration tests
├── .env                 # API keys (gitignored!)
├── .gitignore
├── docker-compose.yml
└── README.md
</code></pre>
<p>You can create the folders quickly from the terminal:</p>
<pre><code class="language-bash">mkdir -p multi-agent-digest/agents/{ingestor,summarizer,prioritizer,formatter}
mkdir -p multi-agent-digest/{data/input,output,tests}
cd multi-agent-digest
</code></pre>
<h2 id="heading-how-to-build-each-agent-step-by-step">How to Build Each Agent Step by Step</h2>
<p>Every agent follows the same simple pattern: read an input file from the shared volume, do its job, and write an output file. This consistency makes the system easy to understand and extend.</p>
<h3 id="heading-the-ingestor-agent">The Ingestor Agent</h3>
<p>The Ingestor is the entry point of the pipeline. Its job is to read all text files from the input folder and combine them into a single file that the Summarizer can process. This is the simplest agent — no external libraries, no API calls, just file reading and writing.</p>
<p><code>agents/ingestor/app.py</code></p>
<pre><code class="language-python">import os
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
logger = logging.getLogger("ingestor")

INPUT_DIR = "/data/input"
OUTPUT_FILE = "/data/ingested.txt"

def ingest():
    content = ""
    files_processed = 0
    for filename in sorted(os.listdir(INPUT_DIR)):
        filepath = os.path.join(INPUT_DIR, filename)
        if os.path.isfile(filepath):
            try:
                with open(filepath, "r", encoding="utf-8") as f:
                    text = f.read()
            except Exception as e:
                # Skip unreadable files (e.g. invalid UTF-8) without leaving a stray header
                logger.error(f"Failed to read {filename}: {e}")
                continue
            content += f"\n--- {filename} ---\n{text}\n"
            files_processed += 1

    if files_processed == 0:
        logger.warning("No input files found in /data/input/")

    with open(OUTPUT_FILE, "w", encoding="utf-8") as out:
        out.write(content)
    logger.info(f"Ingested {files_processed} files -&gt; {OUTPUT_FILE}")

if __name__ == "__main__":
    ingest()
</code></pre>
<p>The <code>logging.basicConfig</code> block sets up a consistent log format. Every agent uses the same format, so when Docker Compose runs them together, you get a clean, uniform timeline. The <code>sorted(os.listdir())</code> call ensures files are processed in alphabetical order; without it, the order depends on the filesystem and can vary between machines. The <code>try/except</code> block around each file read means a single unreadable file (for example, one that is not valid UTF-8) is logged and skipped instead of crashing the entire pipeline. And if the input folder contains no files, the agent still writes an empty output file, so downstream agents can handle empty input gracefully. Note that <code>os.listdir()</code> raises <code>FileNotFoundError</code> if <code>/data/input</code> does not exist at all, so make sure the folder is there before you run the pipeline.</p>
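<p>The Ingestor reads every regular file in the folder, including stray files such as the <code>.DS_Store</code> files macOS creates. If you want it to pick up only text files, here is a minimal sketch using <code>pathlib</code>. The <code>ALLOWED_SUFFIXES</code> set and the <code>ingest_dir</code> function are hypothetical additions, not part of the original agent; the sketch also treats a missing input folder as empty instead of raising.</p>
<pre><code class="language-python">from pathlib import Path

# Hypothetical allow-list: only files with these extensions are ingested.
ALLOWED_SUFFIXES = {".txt", ".md"}

def ingest_dir(input_dir):
    """Concatenate allowed text files in alphabetical order.

    Returns a (combined_text, files_read) tuple. A missing directory is
    treated as empty rather than raising FileNotFoundError.
    """
    root = Path(input_dir)
    if not root.is_dir():
        return "", 0

    parts = []
    for path in sorted(root.iterdir()):
        # Skip subfolders, hidden files like .DS_Store (no suffix), and other types.
        if not path.is_file() or path.suffix.lower() not in ALLOWED_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # unreadable file: skip it rather than abort the run
        parts.append(f"\n--- {path.name} ---\n{text}\n")
    return "".join(parts), len(parts)
</code></pre>
<p>Inside <code>ingest()</code>, you would then write the returned text to <code>OUTPUT_FILE</code> exactly as before.</p>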
<p><code>agents/ingestor/Dockerfile</code></p>
<pre><code class="language-dockerfile">FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]
</code></pre>
<p><code>FROM python:3.10-slim</code> starts from a minimal Debian-based image with Python pre-installed; the <code>-slim</code> variant is roughly 120 MB versus about 900 MB for the full image. <code>WORKDIR /app</code> sets the working directory inside the container. <code>COPY requirements.txt .</code> and <code>RUN pip install</code> install dependencies at build time rather than at runtime. <code>COPY app.py .</code> comes last because application code changes most often: Docker caches each layer, so editing <code>app.py</code> rebuilds only that final layer and reuses the cached dependency install. <code>CMD</code> specifies the command to run when the container starts.</p>
<p>Since the Ingestor uses only standard library modules, its <code>requirements.txt</code> can be empty:</p>
<pre><code class="language-plaintext"># No external dependencies needed
</code></pre>
<h3 id="heading-the-summarizer-agent">The Summarizer Agent</h3>
<p>The Summarizer is the most complex agent in the pipeline. It reads the ingested text and calls an LLM API to produce a concise summary. This is the only agent that makes a network call, which means it is the only one that can fail due to external factors: the API might be down, you might hit rate limits, or your key might be invalid.</p>
<p><code>agents/summarizer/app.py</code>:</p>
<pre><code class="language-python">import os
import logging
import time
from openai import OpenAI, RateLimitError, APIError

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
logger = logging.getLogger("summarizer")

INPUT_FILE = "/data/ingested.txt"
OUTPUT_FILE = "/data/summary.txt"

client = OpenAI()  # reads OPENAI_API_KEY from environment

SYSTEM_PROMPT = (
    "You are a helpful assistant that summarizes long text "
    "into key bullet points. Each bullet should be one "
    "concise sentence capturing a core insight."
)

MAX_RETRIES = 3
RETRY_DELAY = 5  # seconds

def summarize(text, retries=MAX_RETRIES):
    """Call the LLM API with retry logic for rate limits."""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": text[:8000]}
                ],
                max_tokens=1000,
                temperature=0.3,
            )
            return response.choices[0].message.content
        except RateLimitError:
            wait = RETRY_DELAY * (attempt + 1)
            logger.warning(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        except APIError as e:
            logger.error(f"API error: {e}")
            raise
    raise RuntimeError("Max retries exceeded for LLM API call")

def main():
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        raw_text = f.read()

    if not raw_text.strip():
        logger.warning("Empty input. Writing fallback summary.")
        summary = "No content to summarize."
    else:
        try:
            summary = summarize(raw_text)
        except Exception as e:
            logger.error(f"Summarization failed: {e}")
            summary = f"Summarization failed: {e}"

    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        f.write(summary)
    logger.info(f"Summary written to {OUTPUT_FILE}")

if __name__ == "__main__":
    main()
</code></pre>
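<p>The <code>OpenAI()</code> client automatically reads the <code>OPENAI_API_KEY</code> environment variable, so you do not need to pass the key explicitly in code, which is both cleaner and safer. Because the client is created when the module is first loaded, a missing key makes the agent exit with an error before the fallback logic in <code>main()</code> runs, and Compose will then not start the downstream agents. The <code>text[:8000]</code> slice caps the input at 8,000 characters (characters, not tokens; for English text that is very roughly 2,000 tokens). Sending less text means faster responses and lower cost, but anything past the cap is silently dropped. For production, you would want smarter chunking that splits on sentence or paragraph boundaries rather than a raw character count.</p>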
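<p>As one illustration of smarter chunking, the sketch below splits text on blank lines and packs whole paragraphs into chunks of at most <code>max_chars</code> characters, hard-splitting any single paragraph that is longer than the limit. The <code>chunk_paragraphs</code> name is hypothetical, and the limit is still measured in characters; counting tokens with a tokenizer library would track the API's limits more closely.</p>
<pre><code class="language-python">def chunk_paragraphs(text, max_chars=8000):
    """Split text into chunks of at most max_chars, breaking on blank lines."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        # Hard-split a paragraph that is longer than the limit on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) &lt;= max_chars:
                current = candidate  # still fits: keep packing paragraphs
            else:
                chunks.append(current)  # chunk is full: start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
</code></pre>
<p>You would then call <code>summarize()</code> once per chunk and join the partial summaries, which costs one API call per chunk instead of one per run.</p>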
<p><strong>Temperature 0.3</strong> makes the output more focused and less random, which suits summarization. The retry logic catches <code>RateLimitError</code> specifically and waits longer after each failure (5, 10, then 15 seconds). Because the delay grows by a fixed step, this is <strong>linear backoff</strong>; exponential backoff would double the delay each time (5, 10, 20). Any other <code>APIError</code> is re-raised immediately; for errors such as an invalid key, retrying would not help anyway. If the input is empty or the API fails completely, the agent writes a fallback message instead of crashing, so the downstream agents can still run.</p>
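<p>For true exponential backoff, the delay doubles after each failure (5, 10, 20 seconds) rather than growing by a fixed step. Here is a minimal sketch of a generic retry helper, assuming you wrap the API call in a function. <code>retry_with_backoff</code> and its parameters are hypothetical (not part of the OpenAI SDK), and the <code>sleep</code> argument is injectable only so the helper can be tested without actually waiting. It adds a little random jitter so that several clients hitting a rate limit at once do not retry in lockstep, and it skips the pointless sleep after the final attempt.</p>
<pre><code class="language-python">import random
import time

def retry_with_backoff(call, retry_on, max_retries=3, base_delay=5.0,
                       max_delay=60.0, jitter=1.0, sleep=time.sleep):
    """Run call(); on a retry_on exception, wait and try again.

    The wait before retry n (starting at 0) is base_delay * 2**n, capped at
    max_delay, plus up to `jitter` seconds of random noise. The last
    exception is re-raised once all attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: let the caller handle the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, jitter))
</code></pre>
<p>In the Summarizer, you might call it as <code>retry_with_backoff(lambda: client.chat.completions.create(...), RateLimitError)</code>. Keep in mind that the OpenAI Python SDK already retries some failures (including rate limits) on its own, two times by default with a short exponential backoff, so your own retries stack on top of those.</p>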
<p><code>agents/summarizer/requirements.txt</code>:</p>
<pre><code class="language-plaintext">openai&gt;=1.0
</code></pre>
<p>The Dockerfile is identical to the Ingestor's.</p>
<h3 id="heading-the-prioritizer-agent">The Prioritizer Agent</h3>
<p>The Prioritizer takes the LLM-generated summary and scores each line based on urgency keywords. This is a rule-based agent — no LLM call needed. It is fast, deterministic, and free.</p>
<p><code>agents/prioritizer/app.py</code>:</p>
<pre><code class="language-python">import os
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
logger = logging.getLogger("prioritizer")

INPUT_FILE = "/data/summary.txt"
OUTPUT_FILE = "/data/prioritized.txt"

PRIORITY_KEYWORDS = [
    "urgent", "today", "asap", "important",
    "deadline", "critical", "action required"
]

def score_line(line):
    """Count how many priority keywords appear in a line."""
    lower = line.lower()
    return sum(1 for kw in PRIORITY_KEYWORDS if kw in lower)

def prioritize():
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]

    scored = [(line, score_line(line)) for line in lines]
    scored.sort(key=lambda x: x[1], reverse=True)

    with open(OUTPUT_FILE, "w", encoding="utf-8") as out:
        for line, score in scored:
            out.write(f"[{score}] {line}\n")

    logger.info(f"Prioritized {len(scored)} items -&gt; {OUTPUT_FILE}")

if __name__ == "__main__":
    prioritize()
</code></pre>
<p>The scoring function counts how many priority keywords appear in each line. A line containing "urgent deadline" scores 2, and a line with no keywords scores 0. The scored lines are sorted in descending order, so the most urgent items appear first; because Python's sort is stable, lines with the same score keep their original order from the summary. Each line is prefixed with its score in brackets, like <code>[2] Urgent: quarterly report due today</code>. In a more advanced system, you could replace this keyword scorer with an LLM-based ranker, but for a daily digest, simple keyword matching works surprisingly well.</p>
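<p>One side effect of plain substring matching is that <code>"important"</code> also matches inside "unimportant," and <code>"today"</code> matches inside "todays." If that causes false positives, the sketch below could replace <code>score_line</code>: it counts only whole words or phrases using regular expressions, keeps the same keyword list, and returns the same kind of score, so the rest of the agent does not change.</p>
<pre><code class="language-python">import re

PRIORITY_KEYWORDS = [
    "urgent", "today", "asap", "important",
    "deadline", "critical", "action required"
]

# One precompiled pattern per keyword. The \b anchors only match at word
# boundaries, so "important" no longer matches inside "unimportant".
KEYWORD_PATTERNS = [
    re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE) for kw in PRIORITY_KEYWORDS
]

def score_line(line):
    """Count how many distinct priority keywords appear as whole words."""
    return sum(1 for pattern in KEYWORD_PATTERNS if pattern.search(line))
</code></pre>
<p>The unit tests shown later in this article still pass with this version, since all of their keywords appear as whole words.</p>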
<p>This agent has no pip dependencies, so the Dockerfile skips the requirements step:</p>
<p><code>agents/prioritizer/Dockerfile</code>:</p>
<pre><code class="language-dockerfile">FROM python:3.10-slim
WORKDIR /app
COPY app.py .
CMD ["python", "app.py"]
</code></pre>
<h3 id="heading-the-formatter-agent">The Formatter Agent</h3>
<p>The Formatter is the final agent in the pipeline. It reads the scored lines and writes a clean Markdown document to the output directory.</p>
<p><code>agents/formatter/app.py</code>:</p>
<pre><code class="language-python">import os
import logging
from datetime import datetime

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
logger = logging.getLogger("formatter")

INPUT_FILE = "/data/prioritized.txt"
OUTPUT_FILE = "/output/daily_digest.md"

def format_to_markdown():
    with open(INPUT_FILE, "r", encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]

    today = datetime.now().strftime('%Y-%m-%d')

    with open(OUTPUT_FILE, "w", encoding="utf-8") as out:
        out.write("# Your Daily AI Digest\n\n")
        out.write(f"**Date:** {today}\n\n")
        out.write("## Top Insights\n\n")
        for line in lines:
            if '] ' in line:
                score = line.split(']')[0][1:]
                content = line.split('] ', 1)[1]
                out.write(f"- **Priority {score}**: {content}\n")
            else:
                out.write(f"- {line}\n")

    logger.info(f"Digest written to {OUTPUT_FILE}")

if __name__ == "__main__":
    format_to_markdown()
</code></pre>
<p>Notice that the Formatter writes to <code>/output</code> instead of <code>/data</code>. This is a separate volume mount in Docker Compose. The <code>/data</code> volume is internal plumbing that agents use to communicate, while the <code>/output</code> volume maps to a folder on your host machine where you can access the final result. The <code>split('] ', 1)</code> with <code>maxsplit=1</code> ensures that bracket characters inside the actual content do not break the parsing.</p>
<p>The Dockerfile is the same as the Prioritizer's (no external dependencies).</p>
<h2 id="heading-how-to-handle-secrets-and-api-keys">How to Handle Secrets and API Keys</h2>
<blockquote>
<p>⚠️ <strong>Warning:</strong> Never commit API keys or secrets to version control. A leaked OpenAI key can rack up thousands of dollars in charges before you notice.</p>
</blockquote>
<h3 id="heading-using-env-files-for-development">Using .env Files for Development</h3>
<p>Create a <code>.env</code> file in your project root:</p>
<pre><code class="language-plaintext"># .env -- DO NOT COMMIT THIS FILE
OPENAI_API_KEY=sk-your-key-here
</code></pre>
<p>Then immediately add it to your <code>.gitignore</code>:</p>
<pre><code class="language-plaintext"># .gitignore
.env
output/
data/ingested.txt
data/summary.txt
data/prioritized.txt
__pycache__/
*.pyc
</code></pre>
<p>Docker Compose reads <code>.env</code> files automatically when it starts. In your <code>docker-compose.yml</code>, you reference the variable with <code>${OPENAI_API_KEY}</code>, and Compose substitutes the real value at runtime. The key never appears in your Dockerfile, your code, or your version history.</p>
<h3 id="heading-how-to-use-docker-secrets-for-production">How to Use Docker Secrets for Production</h3>
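<p>In production, environment variables are a weaker way to pass secrets: anyone who can run <code>docker inspect</code> on the container, or read the process environment, can see them. On Docker Swarm, Docker secrets are more secure (Kubernetes has its own equivalent, Kubernetes Secrets):</p>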
<pre><code class="language-bash"># Create the secret
echo "sk-your-key-here" | docker secret create openai_key -
</code></pre>
<pre><code class="language-yaml"># Reference in docker-compose.yml (Swarm mode only)
services:
  summarizer:
    secrets:
      - openai_key

secrets:
  openai_key:
    external: true
</code></pre>
<p>The secret gets mounted as a read-only file at <code>/run/secrets/openai_key</code> inside the container. Your code reads the key from that file instead of from an environment variable.</p>
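<p>Here is a minimal sketch of that file-based lookup, assuming the secret is named <code>openai_key</code> as above and that you want to fall back to the <code>OPENAI_API_KEY</code> environment variable during local development. The <code>read_api_key</code> helper is hypothetical; you would pass its result to the client as <code>OpenAI(api_key=read_api_key())</code>.</p>
<pre><code class="language-python">import os
from pathlib import Path

def read_api_key(secret_name="openai_key", env_var="OPENAI_API_KEY",
                 secrets_dir="/run/secrets"):
    """Return the API key from a Docker secret file, else from an env var."""
    secret_path = Path(secrets_dir) / secret_name
    if secret_path.is_file():
        # strip() drops the trailing newline that `echo` adds when creating the secret
        return secret_path.read_text(encoding="utf-8").strip()
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"No secret at {secret_path} and {env_var} is not set")
    return value
</code></pre>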
<h2 id="heading-how-to-orchestrate-everything-with-docker-compose">How to Orchestrate Everything with Docker Compose</h2>
<p>With all four agents built, Docker Compose ties them together. It builds each container, mounts the shared volumes, passes environment variables, and enforces the correct execution order.</p>
<p><code>docker-compose.yml</code>:</p>
<pre><code class="language-yaml">version: "3.9"

services:
  ingestor:
    build: ./agents/ingestor
    container_name: agent_ingestor
    volumes:
      - ./data:/data
    restart: "no"

  summarizer:
    build: ./agents/summarizer
    container_name: agent_summarizer
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      ingestor:
        condition: service_completed_successfully
    volumes:
      - ./data:/data
    deploy:
      resources:
        limits:
          memory: 512M
    restart: "no"

  prioritizer:
    build: ./agents/prioritizer
    container_name: agent_prioritizer
    depends_on:
      summarizer:
        condition: service_completed_successfully
    volumes:
      - ./data:/data
    restart: "no"

  formatter:
    build: ./agents/formatter
    container_name: agent_formatter
    depends_on:
      prioritizer:
        condition: service_completed_successfully
    volumes:
      - ./data:/data
      - ./output:/output
    restart: "no"
</code></pre>
<p>The <code>depends_on</code> with <code>condition: service_completed_successfully</code> is the key to the sequential pipeline. This setting (available in Compose v2) tells Docker to wait until the previous container exits with a zero exit code before starting the next one. Without this condition, <code>depends_on</code> only waits for the container to <em>start</em>, not to <em>finish</em> — which would cause race conditions where the Summarizer tries to read a file the Ingestor has not written yet.</p>
<p>The <strong>volume mounts</strong> (<code>./data:/data</code>) map your local data folder into each container. All agents share this volume, which is how they pass files to each other. The Formatter also gets <code>./output:/output</code> so the final digest lands on your host machine. The <strong>memory limit</strong> of 512M on the Summarizer prevents it from consuming too much RAM. And <code>restart: "no"</code> ensures Docker does not restart the agents after they finish, since they are batch jobs.</p>
<h3 id="heading-how-to-run-the-pipeline">How to Run the Pipeline</h3>
<pre><code class="language-bash">docker compose up --build
</code></pre>
<p>The <code>--build</code> flag tells Compose to rebuild the images before running. You will see log lines from each agent in sequence (abbreviated here):</p>
<pre><code class="language-plaintext">agent_ingestor    | 2025-01-20 07:00:01 [INFO] ingestor: Ingested 3 files
agent_summarizer  | 2025-01-20 07:00:04 [INFO] summarizer: Summary written
agent_prioritizer | 2025-01-20 07:00:05 [INFO] prioritizer: Prioritized 8 items
agent_formatter   | 2025-01-20 07:00:05 [INFO] formatter: Digest written
</code></pre>
<p>When all four containers finish, open <code>output/daily_digest.md</code> to see your morning brief.</p>
<h2 id="heading-how-to-test-the-pipeline">How to Test the Pipeline</h2>
<h3 id="heading-unit-tests">Unit Tests</h3>
<p>Because each agent's core logic is a plain Python function, you can test it in isolation without Docker.</p>
<p><code>tests/test_prioritizer.py</code></p>
<pre><code class="language-python">import sys
sys.path.insert(0, 'agents/prioritizer')
from app import score_line

def test_urgent_keyword_scores_one():
    assert score_line("This is urgent") == 1

def test_multiple_keywords_stack():
    assert score_line("Urgent and important deadline") == 3

def test_no_keywords_scores_zero():
    assert score_line("Regular project update") == 0

def test_scoring_is_case_insensitive():
    assert score_line("URGENT DEADLINE ASAP") == 3
</code></pre>
<p>Run the tests with pytest:</p>
<pre><code class="language-bash">pip install pytest
python -m pytest tests/ -v
</code></pre>
<p>Writing tests for each agent's core function means you can catch bugs before you build any Docker images, saving a lot of time compared to debugging inside running containers.</p>
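<p>One caveat with the <code>sys.path</code> approach: every agent's module is named <code>app</code>, and Python caches imported modules by name, so once one test file imports <code>agents/prioritizer/app.py</code>, a later <code>from app import ...</code> in another test file gets the cached prioritizer module instead of the one you meant. Here is a sketch of a helper that loads each file under its own module name with <code>importlib</code>; the <code>load_agent</code> name and where you put it (for example, a shared <code>tests/conftest.py</code>) are assumptions.</p>
<pre><code class="language-python">import importlib.util

def load_agent(path, module_name):
    """Import the Python file at `path` as a module called `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module

# Example usage in a test file:
# prioritizer = load_agent("agents/prioritizer/app.py", "prioritizer_app")
# assert prioritizer.score_line("This is urgent") == 1
</code></pre>
<p>Because loading a module runs its top-level code, loading the Summarizer this way also runs <code>OpenAI()</code>, which fails if no key is set. Setting a dummy <code>OPENAI_API_KEY</code> in the test environment avoids that without making any API calls.</p>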
<h3 id="heading-integration-tests">Integration Tests</h3>
<p>To test the full pipeline end-to-end, create known input files and verify the expected output:</p>
<pre><code class="language-bash"># Create test data
mkdir -p data/input
echo "Urgent: quarterly report due today" &gt; data/input/test.txt
echo "Regular standup notes, no blockers" &gt;&gt; data/input/test.txt

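# Remove any digest left over from a previous run so the checks below test this run
rm -f output/daily_digest.md

# Run the pipeline (the Summarizer calls the real LLM API, so your key must be set)
docker compose up --build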

# Verify the output exists and contains expected content
test -f output/daily_digest.md &amp;&amp; echo "File exists: PASS" || echo "File missing: FAIL"
grep -q "Priority" output/daily_digest.md &amp;&amp; echo "Content check: PASS" || echo "Content check: FAIL"
</code></pre>
<h2 id="heading-how-to-add-logging-and-observability">How to Add Logging and Observability</h2>
<p>Every agent uses Python's <code>logging</code> module with a consistent format. When Docker Compose runs all four containers, it interleaves their logs with container name prefixes, giving you a unified timeline of the entire pipeline.</p>
<p>For production systems, consider switching to JSON-formatted logs. They are easier to parse with log aggregation tools like the ELK Stack, Grafana Loki, or AWS CloudWatch:</p>
<pre><code class="language-python">import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "agent": record.name,
            "message": record.getMessage(),
        })
</code></pre>
<p>To use this formatter, replace the <code>basicConfig</code> call with a handler:</p>
<pre><code class="language-python">handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("summarizer")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
</code></pre>
<p>The most useful metrics to track include the number of files ingested per run, Summarizer latency (time from API call to response), LLM token usage for cost tracking, the number of errors and retries per agent, and whether <code>daily_digest.md</code> was successfully generated. A simple approach for personal use is to write a JSON metrics file alongside the digest in the output directory. For team or production use, consider adding Prometheus metrics or sending data to a monitoring service.</p>
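<p>Here is a minimal sketch of that personal-use approach, assuming each agent appends one JSON object per run to a hypothetical <code>metrics.jsonl</code> file (one JSON document per line). The helper name and field names are illustrative. For the Summarizer, token counts are available on the response as <code>response.usage.prompt_tokens</code> and <code>response.usage.completion_tokens</code>, and latency can be measured with <code>time.perf_counter()</code> around the API call.</p>
<pre><code class="language-python">import json
import time
from pathlib import Path

def record_run_metrics(metrics_file, **fields):
    """Append one JSON object describing this run to a JSON Lines file."""
    entry = {"timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"), **fields}
    path = Path(metrics_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Example call (the values are placeholders, not real measurements):
# record_run_metrics("/data/metrics.jsonl", agent="summarizer",
#                    latency_s=2.4, prompt_tokens=1800, completion_tokens=350, retries=0)
</code></pre>
<p>In the Compose file above, only the Formatter mounts <code>/output</code>, so the other agents would either need that mount too or write their metrics under <code>/data</code>, as in the example comment.</p>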
<h2 id="heading-cost-rate-limits-and-graceful-degradation">Cost, Rate Limits, and Graceful Degradation</h2>
<p>The Summarizer is the only agent that calls a paid API. Here is what you can expect to pay:</p>
<table style="min-width:100px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><th><p>Model</p></th><th><p>Input Cost</p></th><th><p>Output Cost</p></th><th><p>Cost per Daily Run</p></th></tr><tr><td><p><code>gpt-4o-mini</code></p></td><td><p>$0.15 / 1M tokens</p></td><td><p>$0.60 / 1M tokens</p></td><td><p>Less than $0.01</p></td></tr><tr><td><p><code>gpt-4o</code></p></td><td><p>$2.50 / 1M tokens</p></td><td><p>$10.00 / 1M tokens</p></td><td><p>$0.02 to $0.10</p></td></tr><tr><td><p>Local model (Ollama)</p></td><td><p>Free (uses your hardware)</p></td><td><p>Free</p></td><td><p>$0.00</p></td></tr></tbody></table>

<p>For a daily personal digest processing a few thousand tokens of input, <code>gpt-4o-mini</code> costs less than a penny per run. That keeps the yearly cost under $3.65 (365 runs at less than $0.01 each).</p>
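<p>To see where these numbers come from: the cost of one call is input tokens times the input price plus output tokens times the output price. Here is a quick sketch using the per-million-token prices from the table (prices change, so check OpenAI's pricing page) and an assumed run of 2,000 input tokens and 1,000 output tokens, which roughly matches the Summarizer's 8,000-character input cap and <code>max_tokens=1000</code>.</p>
<pre><code class="language-python"># Prices in USD per 1M tokens, copied from the table above (they may change).
PRICES_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def run_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one API call with the given token counts."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

for model in PRICES_PER_MILLION:
    # Assumed run size: ~2,000 input tokens and up to 1,000 output tokens.
    per_run = run_cost(model, input_tokens=2_000, output_tokens=1_000)
    print(f"{model}: ${per_run:.4f} per run, ${per_run * 365:.2f} per year")
</code></pre>
<p>With these assumptions, <code>gpt-4o-mini</code> comes to about $0.0009 per run (roughly 33 cents a year) and <code>gpt-4o</code> to about $0.015 per run; longer inputs or outputs scale the cost proportionally.</p>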
<p>To protect against unexpected bills, set a monthly spending cap in your OpenAI dashboard. You can also set per-minute rate limits to prevent runaway usage if a bug causes repeated API calls.</p>
<p>Beyond the retry logic already built into the Summarizer, you can cache LLM responses so that if the same input text appears again you reuse the previous summary instead of calling the API. Use the cheapest model that gives acceptable results — for summarization, <code>gpt-4o-mini</code> usually works as well as <code>gpt-4o</code> at a fraction of the cost. And batch requests when possible by combining many small texts into one API call.</p>
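<p>A minimal sketch of the caching idea, assuming a file-based cache keyed by a SHA-256 hash of the model name, system prompt, and input text. The <code>cached_summarize</code> helper and the <code>/data/cache</code> directory are hypothetical, and <code>summarize_fn</code> stands in for the Summarizer's <code>summarize()</code>. Because <code>summarize()</code> raises on failure, error messages never end up in the cache; only successful summaries are stored.</p>
<pre><code class="language-python">import hashlib
import json
from pathlib import Path

def cached_summarize(text, summarize_fn, cache_dir="/data/cache",
                     model="gpt-4o-mini", prompt=""):
    """Return a stored summary for identical input; call summarize_fn on a miss."""
    # Key on model + prompt + text so changing any of them bypasses old entries.
    key_material = json.dumps({"model": model, "prompt": prompt, "text": text})
    key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
    cache_path = Path(cache_dir) / f"{key}.txt"

    if cache_path.is_file():
        return cache_path.read_text(encoding="utf-8")

    summary = summarize_fn(text)  # exceptions propagate, so failures are never cached
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(summary, encoding="utf-8")
    return summary
</code></pre>
<p>In <code>main()</code>, this would look like <code>cached_summarize(raw_text, summarize, prompt=SYSTEM_PROMPT)</code>.</p>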
<p>The Summarizer already writes a fallback message when the API fails. This is the most important form of graceful degradation: the pipeline keeps running, and you get a less useful digest instead of nothing at all. If the digest is critical for your workflow, add an alerting step — for example, you could extend the Formatter to send a Slack notification when the Summarizer falls back.</p>
<h2 id="heading-security-and-privacy-considerations">Security and Privacy Considerations</h2>
<p>When you feed personal data (emails, meeting notes, private newsletters) into an LLM, you need to think carefully about where that data goes.</p>
<p>Text you send to OpenAI or similar providers leaves your machine and is processed on their servers. As of early 2025, OpenAI's API does not use submitted data for model training by default, but policies can change. Always check your provider's current data retention and usage policies. If your input contains personally identifiable information like names, email addresses, or phone numbers, consider stripping it before calling the API, or use a local model.</p>
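<p>A rough sketch of stripping obvious identifiers before the API call, using regular expressions for email addresses and North American style phone numbers. These patterns are simplified assumptions: they miss names entirely, miss many international number formats, and can produce false positives, so treat them as a first filter rather than a guarantee.</p>
<pre><code class="language-python">import re

# Simplified patterns: these are assumptions, not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# North American style numbers such as 555-123-4567, (555) 123-4567, +1 555 123 4567.
PHONE_RE = re.compile(r"(?:\+\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")

def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
</code></pre>
<p>In the Summarizer, you would call <code>redact_pii(raw_text)</code> before passing the text to <code>summarize()</code>.</p>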
<p>The intermediate files created during the pipeline (<code>ingested.txt</code>, <code>summary.txt</code>, <code>prioritized.txt</code>) contain processed versions of your raw input. For personal use, keep them for debugging and delete manually. For automated pipelines, add a cleanup step that deletes intermediate files after the digest is generated. If you operate in the EU, review GDPR requirements around data minimization, right to deletion, and records of processing.</p>
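<p>A small cleanup sketch, assuming it runs at the end of the Formatter after the digest has been written successfully, and that the intermediate paths match the ones used in this pipeline. The <code>cleanup_intermediate</code> function is hypothetical. Deleting a file this way removes it from the folder but does not securely erase it from disk.</p>
<pre><code class="language-python">import logging
from pathlib import Path

logger = logging.getLogger("formatter")

# Intermediate files written by the earlier agents (paths as used in this pipeline).
INTERMEDIATE_FILES = [
    "/data/ingested.txt",
    "/data/summary.txt",
    "/data/prioritized.txt",
]

def cleanup_intermediate(paths=INTERMEDIATE_FILES):
    """Delete intermediate files that exist; return how many were removed."""
    removed = 0
    for p in map(Path, paths):
        try:
            p.unlink()
            removed += 1
        except FileNotFoundError:
            pass  # already gone; nothing to do
    logger.info(f"Removed {removed} intermediate files")
    return removed
</code></pre>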
<p>To secure your containers, use minimal base images like <code>python:3.10-slim</code> to reduce the attack surface, run containers as a non-root user by adding a <code>USER</code> directive to your Dockerfiles, update base images regularly (at least monthly) to pick up security patches, and scan your images for vulnerabilities using <code>docker scout</code> or Trivy.</p>
<h2 id="heading-how-to-use-a-local-llm-for-full-privacy-ollama">How to Use a Local LLM for Full Privacy (Ollama)</h2>
<p>If you want to keep all data on your machine and avoid sending anything to external APIs, you can swap the OpenAI API for a local model running through <strong>Ollama</strong>. Ollama lets you run open-source LLMs locally, handling model weight downloads, memory management, and serving an API.</p>
<p>To set up Ollama:</p>
<pre><code class="language-bash"># Install Ollama on Linux (on macOS or Windows, download the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (llama3 is a good general-purpose choice)
ollama pull llama3

# Verify it is running
ollama list
</code></pre>
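<p>Replace the OpenAI API call in the Summarizer with a request to Ollama's local API. This version uses the <code>requests</code> library, so also add <code>requests</code> to <code>agents/summarizer/requirements.txt</code>:</p>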
<pre><code class="language-python">import requests

def summarize_locally(text):
    """Call a local Ollama instance from inside a Docker container."""
    url = "http://host.docker.internal:11434/api/generate"
    payload = {
        "model": "llama3",
        "prompt": (
            "Summarize the following text into key "
            f"bullet points:\n\n{text}"
        ),
        "stream": False
    }
    try:
        resp = requests.post(url, json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json().get('response', 'No response')
    except requests.exceptions.RequestException as e:
        return f"Ollama error: {e}"
</code></pre>
<p>The <code>host.docker.internal</code> hostname lets a container communicate with services running on the host machine. Ollama runs on your host (not inside a container), so this is how the Summarizer reaches it.</p>
<blockquote>
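<p><strong>Note:</strong> On Linux, <code>host.docker.internal</code> may not resolve by default. Add this to your <code>docker-compose.yml</code> under the summarizer service: <code>extra_hosts: ["host.docker.internal:host-gateway"]</code>. Ollama also listens only on <code>127.0.0.1</code> by default, so on Linux you may need to start it with <code>OLLAMA_HOST=0.0.0.0</code> before the container can reach it. That makes Ollama reachable from your local network too, so keep port 11434 firewalled.</p>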
</blockquote>
<p>Local models are slower than cloud APIs and require decent hardware (at least 8 GB of RAM for smaller models, 16 GB or more for larger ones). But they are free, fully private, and work without an internet connection.</p>
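<p>Ollama also serves an OpenAI-compatible API under <code>/v1</code>, so another option is to keep the Summarizer's existing <code>client.chat.completions.create(...)</code> call and change only where the client points. Here is a sketch that picks the backend from a hypothetical <code>LLM_BACKEND</code> environment variable; that variable, <code>OLLAMA_BASE_URL</code>, <code>LLM_MODEL</code>, and the default model names are assumptions for illustration.</p>
<pre><code class="language-python">import os

def llm_settings(env=os.environ):
    """Return (base_url, api_key, model) for the selected backend.

    LLM_BACKEND is a hypothetical variable: "openai" (default) or "ollama".
    """
    backend = env.get("LLM_BACKEND", "openai").lower()
    if backend == "ollama":
        # Ollama ignores the API key, but the OpenAI client requires a non-empty value.
        base_url = env.get("OLLAMA_BASE_URL", "http://host.docker.internal:11434/v1")
        return base_url, "ollama", env.get("LLM_MODEL", "llama3")
    # base_url=None lets the OpenAI client use its default endpoint.
    return None, env.get("OPENAI_API_KEY"), env.get("LLM_MODEL", "gpt-4o-mini")
</code></pre>
<p>In the Summarizer, you would then build the client with <code>base_url, api_key, model = llm_settings()</code> and <code>client = OpenAI(base_url=base_url, api_key=api_key)</code>, and pass <code>model=model</code> to <code>create()</code>.</p>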
<h2 id="heading-example-seed-data-and-expected-output">Example Seed Data and Expected Output</h2>
<p>To test the full pipeline without real newsletters, create these sample input files:</p>
<p><code>data/input/newsletter_ai.txt</code></p>
<pre><code class="language-plaintext">AI Weekly Roundup - January 2025
OpenAI released a new reasoning model this week.
URGENT: New EU AI Act regulations take effect in March.
Google announced updates to their Gemini model family.
A startup raised $50M for AI-powered code review tools.
</code></pre>
<p><code>data/input/meeting_notes.txt</code>:</p>
<pre><code class="language-plaintext">Team Standup Notes - Monday
IMPORTANT: Deadline for Q1 report is this Friday.
Action required: Review the updated API documentation.
Sprint velocity is on track. No blockers reported.
</code></pre>
<p>Expected output in <code>output/daily_digest.md</code>:</p>
<pre><code class="language-markdown"># Your Daily AI Digest

**Date:** 2025-01-20

## Top Insights

- **Priority 2**: IMPORTANT: Deadline for Q1 report due Friday
- **Priority 1**: URGENT: New EU AI Act regulations in March
- **Priority 1**: Action required: Review the updated API docs
- **Priority 0**: OpenAI released a new reasoning model
- **Priority 0**: Sprint velocity is on track
</code></pre>
<p>The exact summary text will vary depending on your LLM model and settings, and so can the scores, because they depend on which keywords the summary keeps. The structure and the highest-priority-first ordering stay the same.</p>
<h2 id="heading-how-to-automate-daily-execution">How to Automate Daily Execution</h2>
<p>Now that the pipeline works end-to-end with a single command, you can schedule it to run automatically every morning.</p>
<h3 id="heading-how-to-use-cron-on-linux-or-macos">How to Use Cron on Linux or macOS</h3>
<p>Open your crontab with <code>crontab -e</code> and add this line to run the pipeline every day at 7:00 AM:</p>
<pre><code class="language-bash">0 7 * * * cd /path/to/multi-agent-digest &amp;&amp; docker compose up --build &gt;&gt; cron.log 2&gt;&amp;1
</code></pre>
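<p>The <code>&gt;&gt; cron.log 2&gt;&amp;1</code> part appends all output (including errors) to <code>cron.log</code> in the project folder so you can check it later. Cron runs jobs with a minimal <code>PATH</code>, so if the job fails because the <code>docker</code> command cannot be found, replace <code>docker</code> with its full path (run <code>which docker</code> to find it). Make sure your machine is running at the scheduled time and that Docker (Docker Desktop on macOS) is started.</p>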
<h3 id="heading-how-to-use-task-scheduler-on-windows">How to Use Task Scheduler on Windows</h3>
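<p>Open Task Scheduler and create a new task. Under "Actions," choose "Start a program," enter <code>wsl</code> as the program, and put the rest of this command in the arguments field:</p>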
<pre><code class="language-bash">wsl -e bash -c 'cd /mnt/c/path/to/multi-agent-digest &amp;&amp; docker compose up --build'
</code></pre>
<p>Set the trigger to fire every morning at your preferred time.</p>
<h3 id="heading-how-to-add-delivery-notifications">How to Add Delivery Notifications</h3>
<p>For the digest to be truly useful, you want it delivered to you rather than sitting in a folder. Here are three options:</p>
<p><strong>Email</strong> — Extend the Formatter to send the digest via Python's <code>smtplib</code> module. You will need SMTP credentials for a service like Gmail, SendGrid, or Amazon SES.</p>
<p><strong>Slack</strong> — Create an incoming webhook in your Slack workspace and POST the digest as a message. This takes about 10 lines of code.</p>
<p><strong>Notion or Obsidian</strong> — Use their APIs to create a new page or note with the digest content each morning.</p>
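<p>Here is a minimal sketch of the Slack option using only the standard library, assuming the webhook URL is provided through a hypothetical <code>SLACK_WEBHOOK_URL</code> environment variable (treat it like an API key and keep it in <code>.env</code>). Slack incoming webhooks accept a JSON body with a <code>text</code> field; the request is built in a separate function so it can be inspected without sending anything.</p>
<pre><code class="language-python">import json
import os
import urllib.request

def build_slack_request(webhook_url, digest_text):
    """Build a POST request that sends the digest as a Slack message."""
    body = json.dumps({"text": digest_text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_digest_to_slack(digest_path="/output/daily_digest.md"):
    """Read the digest and POST it to the webhook; return the HTTP status code."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical variable name
    with open(digest_path, encoding="utf-8") as f:
        digest_text = f.read()
    request = build_slack_request(webhook_url, digest_text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
</code></pre>
<p>Slack's message formatting is not the same as Markdown, so some of the digest's formatting may not render exactly as it does in a Markdown viewer.</p>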
<h2 id="heading-troubleshooting-common-errors">Troubleshooting Common Errors</h2>
<p><strong>Container exits with OOM error</strong> — Large files or LLM processing are exceeding memory. Increase the memory limit in <code>docker-compose.yml</code> under <code>deploy &gt; resources &gt; limits &gt; memory</code>. Try <code>1G</code>.</p>
<p><strong>Rate limit errors from OpenAI</strong> — The retry logic handles temporary rate limits automatically. Check your OpenAI dashboard for usage caps.</p>
<p><code>depends_on</code> <strong>does not wait for completion</strong> — Make sure you are using <code>condition: service_completed_successfully</code>, which requires Docker Compose v2.</p>
<p><strong>Permission denied on</strong> <code>/output</code> — Volume mount permissions mismatch. Run <code>chmod -R 777 ./output</code> on the host, or add a <code>USER</code> directive to your Dockerfiles.</p>
<p><code>OPENAI_API_KEY</code> <strong>not found</strong> — The <code>.env</code> file may be missing or not in the right directory. Create <code>.env</code> in the same folder as <code>docker-compose.yml</code> and verify with <code>docker compose config</code>.</p>
<p><strong>Cannot reach Ollama from container</strong> — <code>host.docker.internal</code> may not be resolving on Linux. Add <code>extra_hosts: ["host.docker.internal:host-gateway"]</code> to the service in <code>docker-compose.yml</code>.</p>
<h2 id="heading-production-deployment-options">Production Deployment Options</h2>
<p>The <code>docker compose up</code> approach works well for personal use and development. When you are ready to deploy to a server or the cloud, here are your main options.</p>
<h3 id="heading-docker-swarm">Docker Swarm</h3>
<p>Docker Swarm is the simplest step up from Compose. It lets you deploy across multiple machines with minimal changes to your existing Compose file:</p>
<pre><code class="language-bash">docker swarm init
docker stack deploy -c docker-compose.yml morning-brief
</code></pre>
<h3 id="heading-kubernetes">Kubernetes</h3>
<p>For production at scale, Kubernetes gives you more control over scheduling, scaling, and fault tolerance. Use Kubernetes <strong>Jobs</strong> (not Deployments) for batch agents that run once and exit. Set resource requests and limits on each container so the cluster scheduler can allocate resources efficiently. Store API keys in <strong>Kubernetes Secrets</strong>, and use <strong>CronJobs</strong> for scheduled daily execution — they work like cron but are managed by the cluster.</p>
<h3 id="heading-cloud-platforms">Cloud Platforms</h3>
<p>All major cloud providers offer managed container services that can run this pipeline:</p>
<p><strong>AWS</strong> — ECS Fargate with scheduled tasks for serverless execution, or EKS for managed Kubernetes.</p>
<p><strong>Azure</strong> — Azure Container Instances for simple runs, or AKS for managed Kubernetes.</p>
<p><strong>GCP</strong> — Cloud Run Jobs for serverless batch processing, or GKE for managed Kubernetes.</p>
<h2 id="heading-conclusion-and-next-steps">Conclusion and Next Steps</h2>
<p>In this handbook, you built a multi-agent AI system from scratch. You created four specialized Python agents, containerized each one with Docker, orchestrated them with Docker Compose, and added secrets handling, structured logging, retry logic, and graceful fallbacks.</p>
<p>The core patterns you learned — separation of concerns, containerized agents, shared-volume communication, and defensive coding against external APIs — apply far beyond this specific use case. Any time you need a reliable, modular, and reproducible AI workflow, these patterns are a solid foundation.</p>
<p>Here are some directions to explore next:</p>
<p><strong>Agent collaboration frameworks</strong> — Tools like CrewAI and LangGraph let you build agents that delegate tasks to each other, negotiate priorities, and collaborate in more sophisticated ways.</p>
<p><strong>Local and fine-tuned models</strong> — Experiment with Ollama or vLLM to run models locally. Fine-tune a small model specifically for summarization to get better results at lower cost.</p>
<p><strong>Event-driven architectures</strong> — Replace the shared volume with Redis or RabbitMQ so agents react to events in real time rather than running on a schedule.</p>
<p><strong>Feedback loops</strong> — Add an agent that evaluates the quality of the daily digest and adjusts the Summarizer's prompts over time. This is how production agent systems learn and improve.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Run an LLM Locally to Interact with Your Documents ]]>
                </title>
                <description>
                    <![CDATA[ Most AI tools require you to send your prompts and files to third-party servers. That’s a non-starter if your data includes private journals, research notes, or sensitive business documents (contracts, board decks, HR files, financials). The good new... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/run-an-llm-locally-to-interact-with-your-documents/</link>
                <guid isPermaLink="false">69619f7198022932a4f500a0</guid>
                
                    <category>
                        <![CDATA[ LLM&#39;s  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Zoe Isabel Senón ]]>
                </dc:creator>
                <pubDate>Sat, 10 Jan 2026 00:38:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767976983680/2e3671cd-4280-4a32-9508-47fe9c06ab22.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most AI tools require you to send your prompts and files to third-party servers. That’s a non-starter if your data includes private journals, research notes, or sensitive business documents (contracts, board decks, HR files, financials). The good news: you can run capable LLMs locally (on a laptop or your own server) and query your documents without sending a single byte to the cloud.</p>
<p>In this tutorial, you’ll learn how to run an LLM locally and privately, so you can search and chat with sensitive journals and business docs on your own machine. We’ll install <strong>Ollama</strong> and <strong>OpenWebUI</strong>, pick a model that fits your hardware, enable private document search with <strong>nomic-embed-text</strong>, and create a local knowledge base so everything stays on-disk.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-installation">Installation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-settings-for-documents">Settings for Documents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-upload-your-documents">How to Upload Your Documents</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-optional-adding-a-system-prompt">(Optional) Adding a system prompt</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-run-your-llm-locally">How to Run Your LLM Locally</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You’ll need a terminal (Windows, macOS, and Linux all include one, and a quick search will show you how to open yours). Depending on how you want to install OpenWebUI, you’ll also need either Python with pip or Docker.</p>
<h2 id="heading-installation">Installation</h2>
<p>You’ll need <a target="_blank" href="https://ollama.com/download"><strong>Ollama</strong></a> and <a target="_blank" href="https://docs.openwebui.com/getting-started/quick-start/"><strong>OpenWebUI</strong></a>. Ollama runs the models, while OpenWebUI gives you a browser interface to interact with your local LLM, like you would with ChatGPT.</p>
<h3 id="heading-step-1-install-ollama">Step 1: Install Ollama</h3>
<p>Download and install Ollama from its <a target="_blank" href="https://ollama.com/download">official site</a>. Installers are available for <strong>macOS</strong>, <strong>Linux</strong>, and <strong>Windows</strong>. Once installed, verify it’s running by opening a terminal and executing:</p>
<pre><code class="lang-bash">ollama list
</code></pre>
<p>If Ollama is running, this returns the list of models you have downloaded. On a fresh install, the list is empty.</p>
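<p>Behind the scenes, Ollama runs a local HTTP server (at <code>http://localhost:11434</code> by default), and OpenWebUI connects to that API. As a sketch, assuming the default address, the Python below asks the same question as <code>ollama list</code> through the <code>/api/tags</code> endpoint. The function names are illustrative. If the server is not running, <code>urlopen</code> raises a <code>URLError</code>.</p>
<pre><code class="lang-python">import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def model_names(tags_response):
    """Pull the model names out of a GET /api/tags JSON response."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=5.0):
    """Roughly what `ollama list` prints: the models stored on this machine."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
</code></pre>
<p>This also works as a quick health check. If <code>list_local_models()</code> returns without raising an error, the server is up.</p>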
<h3 id="heading-step-2-install-openwebui">Step 2: Install OpenWebUI</h3>
<p>You can install OpenWebUI either with Python (pip) or with Docker. Here we’ll use pip. You can find the Docker instructions in the <a target="_blank" href="https://docs.openwebui.com/getting-started/quick-start/">official OpenWebUI docs</a>.</p>
<p>Install OpenWebUI with the following command:</p>
<pre><code class="lang-bash">pip install open-webui
</code></pre>
<p>This works on <strong>macOS, Linux, and Windows</strong>, as long as you have a compatible Python version installed (the OpenWebUI docs recommend Python 3.11).</p>
<p>Next, start the server:</p>
<pre><code class="lang-bash">open-webui serve
</code></pre>
<p>Then open your browser and go to:</p>
<pre><code class="lang-bash">http://localhost:8080
</code></pre>
<h3 id="heading-step-3-install-a-model">Step 3: Install a Model</h3>
<p>Choose a model from the <a target="_blank" href="https://ollama.com/library">Ollama model list</a> and pull it locally by copying the command provided.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302463715/fbbaabf7-6612-460c-8e09-1c5143eacc1a.png" alt="Screenshot of the model download page with an arrow pointing to the upper-right corner box that includes the installation command with a shortcut to copy-paste" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>For example:</p>
<pre><code class="lang-bash">ollama pull gemma3:4b
</code></pre>
<p>If you’re unsure which model your machine can handle, ask an AI to recommend one based on your hardware. Smaller models (1B–4B) are safer on laptops.</p>
<p>I would recommend Gemma3 as a starter (you can download multiple models and easily switch between them). Pick the <strong>parameter number</strong> at the end (“:4b”, “:1b”, and so on) based on this guide:</p>
<ul>
<li><p>Tier 1 (small laptops or weak computers): RAM ≤8 GB or no GPU → 1B–2B.</p>
</li>
<li><p>Tier 2: RAM 16 GB, weak GPU → 2B–4B.</p>
</li>
<li><p>Tier 3: RAM ≥16 GB, 6–8 GB VRAM → 4B–9B.</p>
</li>
<li><p>Tier 4: RAM ≥32 GB, 12 GB+ VRAM → 12B+.</p>
</li>
</ul>
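<p>If you want to script the choice, you can write the tier guide down as a small helper. It is a heuristic that simply mirrors the thresholds in the list, and <code>suggest_param_range</code> is a hypothetical name. Machines that fall between tiers (say, 12 GB of RAM) are treated conservatively as Tier 1.</p>
<pre><code class="lang-python">def suggest_param_range(ram_gb, vram_gb=0.0):
    """Map RAM and GPU memory (in GB) to the parameter range from the tier guide."""
    if ram_gb >= 32 and vram_gb >= 12:
        return "12B+"    # Tier 4
    if ram_gb >= 16 and vram_gb >= 6:
        return "4B-9B"   # Tier 3: 6-8 GB of VRAM (more also lands here)
    if ram_gb >= 16 and vram_gb > 0:
        return "2B-4B"   # Tier 2: 16 GB RAM with a weak GPU
    return "1B-2B"       # Tier 1: 8 GB RAM or less, no GPU, or in-between setups
</code></pre>
<p>For example, <code>suggest_param_range(16, 8)</code> returns <code>"4B-9B"</code>, so <code>gemma3:4b</code> would be a reasonable first pull on that machine.</p>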
<p>Once you have installed Ollama and your chosen model, confirm that the model is available by running <code>ollama list</code> in the terminal:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767465401368/d1b8abc0-7aaa-4c2f-ad4c-30ae908f9e8b.png" alt="Image showing the output of running the &quot;ollama list&quot; command (shows the list of downloaded models, in this case &quot;gemma3:1b&quot;)" width="600" height="400" loading="lazy"></p>
<p>Run OpenWebUI to launch the browser interface:</p>
<pre><code class="lang-bash">open-webui serve
</code></pre>
<p>Then head over to <a target="_blank" href="http://localhost:8080/">http://localhost:8080/</a>. Now you are ready to start using your LLM locally!</p>
<p><strong>Note</strong>: it will ask you for login credentials, but these don’t really matter if you only intend to use it locally.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302486263/14d93c7e-415c-463f-82da-fc515f28663a.png" alt="Screenshot of the frontend of a running instance of OpenWebUI, showing the homepage, which includes a text input box in the center with the placeholder &quot;how can I help you today?&quot;, and a side panel with the list of previous chats, and links to &quot;search&quot;, &quot;notes&quot;, &quot;workspace&quot;, and &quot;new chat&quot;, as well as a setting button. At the top there is a model selector that currently has &quot;gemma3:1b&quot; selected as the model to use." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-settings-for-documents">Settings for Documents</h2>
<p>Now we are going to set up everything we need to interact with our local documents. First of all, we need to install the “<a target="_blank" href="https://ollama.com/library/nomic-embed-text"><strong>nomic-embed-text</strong></a>” model to process our documents. Install it with:</p>
<pre><code class="lang-bash">ollama pull nomic-embed-text
</code></pre>
<p><strong>Note</strong>: If you are wondering why we need another model (nomic-embed-text) besides our main one:</p>
<ul>
<li><p>The embedding model (<code>nomic-embed-text</code>) maps each text chunk from your documents to a numerical vector so OpenWebUI can quickly find semantically similar chunks when you ask a question.</p>
</li>
<li><p>The chat model (for example <code>gemma3:1b</code>) receives your question plus those retrieved chunks as context and generates the natural-language response.</p>
</li>
</ul>
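<p>To make this division of labor concrete, here is a toy, pure-Python version of the retrieval step. It illustrates the idea only and is not how OpenWebUI implements it. The three-number vectors are hypothetical stand-ins for the much longer vectors <code>nomic-embed-text</code> would return, and <code>build_prompt</code> uses a made-up template.</p>
<pre><code class="lang-python">import math


def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Rank stored chunks by similarity to the question's embedding."""
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine_similarity(query_vec, pair[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Hand the chat model the retrieved chunks plus the question."""
    context = "\n\n".join(context_chunks)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


# Toy embeddings (hypothetical values) for three document chunks.
chunks = ["Q3 revenue grew 12%.", "The office moved in May.", "Q3 costs fell 3%."]
vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
question_vec = [1.0, 0.2, 0.0]  # pretend embedding of "How did Q3 go?"
print(build_prompt("How did Q3 go?", top_k_chunks(question_vec, vectors, chunks)))
</code></pre>
<p>The two Q3 chunks score highest, so only they reach the chat model. The unrelated office note stays out of the prompt.</p>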
<p>Next, you should enable the “<strong>memory</strong>” feature if you want the LLM to remember the context of your past conversations in your future ones.</p>
<p>Download the adaptive memory function <a target="_blank" href="https://openwebui.com/f/alexgrama7/adaptive_memory_v2"><strong>here</strong></a>. Functions are like plug-ins.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302505221/b247316c-0863-410a-84c9-abc084a6631f.png" alt="Screenshot showing the page (website) for the &quot;adaptive memory v3&quot; function. It shows a big &quot;get&quot; button, that when clicked opens a pop-up view named &quot;Open WebUI URL&quot; with the current placeholder being &quot;http:localhost:8080&quot; (the default WebUI port) and a button to &quot;import to WebUI&quot; and another one below to &quot;Download as JSON export&quot; in case the first one doesn't work)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now we will update our settings to enable these features. Click on your name in the bottom-left corner, then “Settings”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302517617/e73983f3-0e36-4c0a-a61c-96a0a42f1fab.png" alt="Screenshot showing the menu panel that pops up when clicking on the bottom-left round icon with the user's initial and name, showing a list of options, starting with &quot;Settings&quot; and followed by &quot;Archived Chats&quot;, &quot;Playground&quot;, &quot;Admin Panel&quot; and &quot;Sign out&quot;" width="600" height="400" loading="lazy"></p>
<p>Click “Settings”, then go to “Personalization” and enable “Memory”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752935284007/aa42c76b-f38c-4485-b442-8844c6c3a544.png" alt="Screenshot of the OpenWebUI settings panel with the Personalization tab open and the Memory toggle switched on for saving past conversation context." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now we are going to access the other settings panel (“Admin Panel”). Click again on your name in the bottom-left corner and go to <strong>Admin panel → Settings → Documents</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302570583/96784c55-484b-4c66-bdc4-ce23a7e901a1.png" alt="Screenshot of the OpenWebUI Admin → Settings → Documents page, showing a text input field called &quot;Chunk size&quot; currently set to 512" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In this section (Admin Panel → Settings → Documents), find the “<strong>Embedding</strong>” section, go to “<strong>Embedding Model Engine</strong>”, and choose Ollama from the selector on the right. Leave the API Key blank.</p>
<p>Next, under “<strong>Embedding Model</strong>”, enter <code>nomic-embed-text</code>. Then, under “Retrieval”, enable “Full Context Mode”.</p>
<h3 id="heading-chunking-settings">Chunking settings</h3>
<p>You should also set the <strong>chunk size</strong> and <strong>overlap</strong>. OpenWebUI splits documents into smaller chunks before indexing them, since models can’t embed or retrieve very long texts in one piece.</p>
<p>A good default is <strong>128–512 tokens per chunk</strong>, with <strong>10–20% overlap</strong>. Larger chunks preserve more context but are slower and more memory-intensive, while smaller chunks are faster but can lose higher-level meaning. Overlap helps prevent important context from being cut off when text is split.</p>
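<p>Here is a minimal sketch of fixed-size chunking with overlap, showing what the two settings control. It splits on whitespace as a rough stand-in for real tokens, and <code>chunk_tokens</code> is an illustrative name. OpenWebUI’s own text splitter may count and cut text differently.</p>
<pre><code class="lang-python">def chunk_tokens(tokens, chunk_size=256, overlap_pct=15):
    """Cut tokens into windows of chunk_size; each window repeats the last
    overlap_pct percent of the previous one so no passage is cut off cold."""
    overlap = int(chunk_size * overlap_pct / 100)  # 256 at 15% gives 38 tokens
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap_pct=50):
    print(chunk)
</code></pre>
<p>With a chunk size of 4 and 50% overlap, consecutive chunks share two words: <code>['one', 'two', 'three', 'four']</code>, then <code>['three', 'four', 'five', 'six']</code>, and so on. Larger overlap therefore means more chunks, and more embedding work, for the same document.</p>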
<p>Here’s a guiding table. For values tailored to your use case and setup, I recommend describing both (including GPU or laptop model, storage, RAM, and so on) to an LLM like ChatGPT or Claude and asking for a recommendation, <strong>since changing the chunk size or overlap later requires re-uploading your documents.</strong></p>
<h3 id="heading-suggested-chunkoverlap-by-tier">Suggested chunk/overlap by tier</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Tier / scenario</strong></td><td><strong>Typical hardware</strong></td><td><strong>Chunk size (tokens)</strong></td><td><strong>Overlap (%)</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Tier 1 – constrained</td><td>≤8 GB RAM, no/weak GPU</td><td>128–256</td><td>10–15</td><td>Prioritizes speed and low memory use.</td></tr>
<tr>
<td>Tier 2 – mid</td><td>16 GB RAM, modest GPU or strong CPU</td><td>256–384</td><td>15–20</td><td>Balanced context vs. performance.</td></tr>
<tr>
<td>Tier 3 – comfortable</td><td>≥16 GB RAM, 6–8 GB VRAM</td><td>384–512</td><td>15–20</td><td>More semantics per chunk, still practical.</td></tr>
<tr>
<td>Dense technical PDFs / legal docs</td><td>Any, but especially Tier 2–3</td><td>384–512</td><td>15–20</td><td>Keeps paragraphs and arguments intact.</td></tr>
<tr>
<td>Short notes, tickets, emails</td><td>Any</td><td>128–256</td><td>10–15</td><td>Items are small, large chunks not needed.</td></tr>
<tr>
<td>Very long queries, need many retrieved chunks</td><td>Any with larger context window</td><td>256–384</td><td>10–15</td><td>Smaller chunks fit more pieces into context.</td></tr>
</tbody>
</table>
</div><h2 id="heading-how-to-upload-your-documents">How to Upload Your Documents</h2>
<p>Now, the final step: uploading your documents! Go to “Workspace” in the side panel, then “Knowledge”, and create a new collection (database). You can start uploading files here.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302584485/63c04901-f5d3-4ac7-bab5-b23362fb83cb.png" alt="Screenshot of the &quot;Workspace&quot; page (after clicking on &quot;workspace&quot; in the side panel) highlighting the &quot;Workspace&quot; button on the left-hand side, the &quot;Knowledge&quot; tab being selected from the options at the top within this Workspace page, then &quot;Upload files&quot; which is the first option shown on the list after clicking the &quot;+&quot; (plus) sign button at the right of the text input with the placeholder that says &quot;Search Collection&quot;." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text">Make sure to check for errors during the upload. Unfortunately, they only appear as temporary pop-ups. Some errors may be due to the format of your files, so also check the console (the terminal where <code>open-webui serve</code> is running) for more detailed error logs.</div>
</div>

<p>Then, within “Workspace”, switch to the “Models” tab and create a new custom model. Creating a custom model and attaching your knowledge base tells OpenWebUI to automatically search your document collection and include the most relevant chunks as context whenever you ask a question.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302593445/b5316a4a-8c8a-4348-a31e-1c10fe0e1abb.png" alt="Screenshot of the &quot;Workspace&quot; page (after clicking on &quot;workspace&quot; in the side panel), highlighting the first tab/option in the upper menu named &quot;Models&quot;, which when clicked shows the list of custom models and an option to create new ones (in this case the user has created one called &quot;Gemma-custom-knowledge&quot;)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Here, make sure to select your model (in my case “gemma3:1b”) and attach your knowledge base.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302604758/df0c7948-bb9b-4615-8f09-21faaa64fdde.png" alt="Screenshot of the model creation page, highlighting the selectable options under the &quot;Base model (from)&quot; field, specifically highlighting &quot;gemma3:1b&quot; or the model of choice, under the selected-by-default option &quot;select a base model&quot;. The second element highlighted in red is the other field below titled &quot;Knowledge&quot;, with a button called &quot;Select Knowledge&quot;. There are 2 other elements highlighted in yellow (indicating lower priority): the first one is &quot;Model Params&quot; that includes a &quot;system prompt&quot; input field right below, and the other one is &quot;Filters&quot; which includes multiple selectable options depending on the different plugins or &quot;functions&quot; installed." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302612285/8247d1c3-5f84-42de-9861-34416d0b7f10.png" alt="Screenshot showing the options available after clicking &quot;Select Knowledge&quot; under &quot;Knowledge&quot;, highlighting the option that says &quot;COLLECTION&quot; in green followed by the title &quot;Test-knowledge-base&quot; (example title chosen by the author) and the description added by the author (&quot;adding my documents&quot;)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-optional-adding-a-system-prompt">(Optional) Adding a system prompt</h3>
<p>When creating your custom model in <strong>Workspace → Models</strong>, you can define a <strong>system prompt</strong> that the model will use for context throughout all your conversations.</p>
<p>Here are some examples of information you might want to add:</p>
<ul>
<li><p>context about yourself <em>(“I am a 20-year-old student in bioengineering interested in…”)</em></p>
</li>
<li><p>your preferred communication style <em>(“no fluff”, “be direct”, “be analytical”…)</em></p>
</li>
<li><p>context about how your data is structured</p>
</li>
</ul>
<p><strong>Example system prompt:</strong></p>
<blockquote>
<p>You are a thoughtful, analytical assistant helping me explore patterns and insights in my personal journals. Be direct, avoid speculation, and clearly distinguish between facts from the documents and interpretation.</p>
</blockquote>
<p>This prompt will automatically apply to every chat using this custom model, helping keep responses consistent and aligned with your goals.</p>
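<p>OpenWebUI stores the system prompt for you, but at the API level a system prompt is essentially the first message in the conversation. As a sketch, here is what a request to Ollama’s <code>/api/chat</code> endpoint with a system prompt could look like, assuming Ollama’s default address. The helper names are illustrative. Calling Ollama directly like this bypasses OpenWebUI, so your knowledge base is not attached.</p>
<pre><code class="lang-python">import json
import urllib.request


def build_chat_payload(model, system_prompt, user_message):
    """Request body for Ollama's POST /api/chat, system message first."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "stream": False,  # one JSON reply instead of a stream of partial ones
    }


def chat(payload, base_url="http://localhost:11434", timeout=300):
    """Send the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["message"]["content"]
</code></pre>
<p>For example, <code>chat(build_chat_payload("gemma3:1b", "Be direct and avoid speculation.", "What is a journal prompt?"))</code> sends one question with the system prompt attached.</p>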
<h2 id="heading-how-to-run-your-llm-locally">How to Run Your LLM Locally</h2>
<p>Now open a new chat and make sure to select your custom model:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302621012/241f461c-acf6-41ae-b68d-ad187790aef4.png" alt="Screenshot showing the &quot;New chat&quot; page after clicking on the &quot;+&quot; (plus) symbol/button next to the custom model name. It shows the options shown when clicking on the input field that says &quot;Search a model&quot; as a placeholder, and the option highlighted within it is the name of the custom model (in this case the author chose the name &quot;Gemma-custom-knowledge&quot;)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now you are ready to chat with your own docs in a private local environment!</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text"><strong>Note</strong>: By default, the frontend/browser will stop streaming the response after five minutes, even though it will keep processing your query in the background. This means that if your query takes more than five minutes to process, it will not be displayed on the browser. You can reload the page and click “continue response” to get the latest output.</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">I recommend installing the <a target="_self" href="https://openwebui.com/f/alexgrama7/enhanced_context_tracker_v4">Enhanced Context Tracker</a> function (plugin) to get more visibility into the progress of your query.</div>
</div>

<h2 id="heading-conclusion">Conclusion</h2>
<p>You now have a private LLM stack (<strong>Ollama</strong> for models, <strong>OpenWebUI</strong> for the UI, and <strong>nomic-embed-text</strong> for embeddings) wired to your on-disk knowledge base. Your journals and business docs stay local; nothing is sent to third parties. The main dials are simple: pick a model that fits your hardware, enable memory and full-context retrieval, use sensible chunk/overlap, and check the console when runs stall.</p>
<p>If you need more headroom, deploy the same setup on your own server and keep the privacy guarantees. From here, iterate on model choice, chunking, and prompts, and add the optional functions if you need deeper visibility during long jobs.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How To Run an Open-Source LLM on Your Personal Computer – Run Ollama Locally ]]>
                </title>
                <description>
                    <![CDATA[ Running a large language model (LLM) on your computer is now easier than ever. You no longer need a cloud subscription or a massive server. With just your PC, you can run models like Llama, Mistral, or Phi, privately and offline. This guide will show... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-run-an-open-source-llm-on-your-personal-computer-run-ollama-locally/</link>
                <guid isPermaLink="false">691256ca726af9fcf5543027</guid>
                
                    <category>
                        <![CDATA[ LLM&#39;s  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ open source ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Mon, 10 Nov 2025 21:19:06 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762809417189/37e154b9-9bf0-4210-921a-4722cd448b09.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Running a large language model (LLM) on your computer is now easier than ever. You no longer need a cloud subscription or a massive server. With just your PC, you can run models like Llama, Mistral, or Phi, privately and offline.</p>
<p>This guide will show you how to set up an open-source LLM locally, explain the tools involved, and walk you through both the UI and command-line installation methods.</p>
<h2 id="heading-what-well-cover">What We’ll Cover</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-understanding-open-source-llms">Understanding Open Source LLMs</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-choosing-a-platform-to-run-llms-locally">Choosing a Platform to Run LLMs Locally</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-install-ollama">How to Install Ollama</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-install-and-run-llms-via-the-command-line">How to Install and Run LLMs via the Command Line</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-manage-models-and-resources">How to Manage Models and Resources</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-use-ollama-with-other-applications">How to Use Ollama with Other Applications</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-troubleshooting-and-common-issues">Troubleshooting and Common Issues</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-running-llms-locally-matters">Why Running LLMs Locally Matters</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-understanding-open-source-llms">Understanding Open Source LLMs</h2>
<p>An open-source large language model is a type of AI that can understand and generate text, much like ChatGPT, but it can function without depending on external servers. </p>
<p>You can download the model files, run them on your machine, and even <a target="_blank" href="https://www.turingtalks.ai/p/how-ai-agents-remember-things-the-role-of-vector-stores-in-llm-memory">fine-tune</a> them for your use cases.</p>
<p>Projects like Llama 3, Mistral, Gemma, and Phi have made it possible to run models that fit well on consumer hardware. You can choose between smaller models that run on CPUs or larger ones that benefit from GPUs.</p>
<p>Running these models locally gives you privacy, control, and flexibility. It also helps developers integrate AI features into their applications without relying on cloud APIs.</p>
<h2 id="heading-choosing-a-platform-to-run-llms-locally">Choosing a Platform to Run LLMs Locally</h2>
<p>To run an open source model, you need a platform that can load it, manage its parameters, and provide an interface to interact with it.</p>
<p>Three popular choices for local setup are:</p>
<ol>
<li><p><a target="_blank" href="https://ollama.com/"><strong>Ollama</strong></a>: a user-friendly tool that runs models such as OpenAI’s gpt-oss and Google’s Gemma with a single command. It offers both a Windows desktop UI and a CLI.</p>
</li>
<li><p><a target="_blank" href="https://lmstudio.ai/"><strong>LM Studio</strong></a>: a graphical desktop application for those who prefer a point-and-click interface.</p>
</li>
<li><p><a target="_blank" href="https://www.nomic.ai/gpt4all">GPT4All</a>: another popular GUI desktop application.</p>
</li>
</ol>
<p>We’ll use Ollama as the example in this guide since it’s widely supported and integrates easily with other tools.</p>
<h2 id="heading-how-to-install-ollama">How to Install Ollama</h2>
<p>Ollama provides a one-click installer that sets up everything you need to run local models. Visit <a target="_blank" href="https://ollama.com/">the official Ollama website</a> and download the Windows installer.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762438947066/9b6c84c1-e8ae-4765-9b55-a444bdf68283.png" alt="Ollama home page" class="image--center mx-auto" width="1241" height="721" loading="lazy"></p>
<p>Once downloaded, double-click the file to start installation. The setup wizard will guide you through the process, which only takes a few minutes.</p>
<p>When the installation finishes, Ollama will run in the background as a local service. You can access it either through its graphical desktop interface or using the command line.</p>
<p>After installing Ollama, you can open the application from the Start Menu. The UI makes it easy for beginners to start interacting with local models.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439008725/a1ebb4fc-c638-41f0-817a-cd6772c8577e.png" alt="Ollama Interface" class="image--center mx-auto" width="1000" height="532" loading="lazy"></p>
<p>On the Ollama interface, you’ll see a simple text box where you can type prompts and receive responses. There’s also a panel that lists available models.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439045357/760b04b6-f826-422d-8ba9-6a255917ae29.png" alt="Ollama Models" class="image--center mx-auto" width="759" height="622" loading="lazy"></p>
<p>To download and use a model, just select it from the list. Ollama will automatically fetch the model weights and load them into memory.</p>
<p>The first time you ask a question, it will download the model if it does not exist. You can also choose the model from the <a target="_blank" href="https://ollama.com/search">models search page</a>. </p>
<p>I’ll use the <a target="_blank" href="https://ollama.com/library/gemma3">Gemma 3 270M</a> model, one of the smallest models available in Ollama.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439068617/c88f191b-f2f7-4c7a-b1dc-b1eea7745a35.png" alt="Ollama downloading model" class="image--center mx-auto" width="1000" height="345" loading="lazy"></p>
<p>You can see the model being downloaded when used for the first time. Depending on the model size and your system’s performance, this might take a few minutes.</p>
<p>Once loaded, you can start chatting or running tasks directly within the UI. It’s designed to look and feel like a normal chat window, but everything runs locally on your PC. </p>
<p>You don’t need an internet connection after the model has been downloaded.</p>
<h2 id="heading-how-to-install-and-run-llms-via-the-command-line">How to Install and Run LLMs via the Command Line</h2>
<p>If you prefer more control, you can use the Ollama command-line interface (CLI). This is useful for developers or those who want to integrate local models into scripts and workflows.</p>
<p>To open the command line, search for “Command Prompt” or “PowerShell” in Windows and run it. You can now interact with Ollama using simple commands.</p>
<p>To check if the installation worked, type:</p>
<pre><code class="lang-bash">ollama --version
</code></pre>
<p>If you see a version number, Ollama is ready. Next, download your first model with the pull command:</p>
<pre><code class="lang-bash">ollama pull gemma3:270m
</code></pre>
<p>This will download the Gemma model to your machine.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439104192/14ed4a53-330f-41c6-82dd-f2a22ecb9d05.png" alt="Ollama pull model" class="image--center mx-auto" width="1000" height="204" loading="lazy"></p>
<p>When the process finishes, start it with:</p>
<pre><code class="lang-bash">ollama run gemma3:270m
</code></pre>
<p>Ollama will launch the model and open an interactive prompt where you can type messages.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439115178/9d17c753-52af-4834-93f4-155bad39bd8d.png" alt="Ollama Interactive shell" class="image--center mx-auto" width="844" height="157" loading="lazy"></p>
<p>Everything happens locally, and your data never leaves your computer.</p>
<p>You can exit the interactive session at any time by typing <code>/bye</code>.</p>
<h2 id="heading-how-to-manage-models-and-resources">How to Manage Models and Resources</h2>
<p>Each model you download takes up disk space, and it uses memory while it is loaded. Smaller models like Phi-3 Mini or Gemma 2B are lighter and suitable for most consumer laptops. Larger ones such as Mistral 7B or Llama 3 8B need more memory and run best on a dedicated GPU or a high-end CPU.</p>
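<p>To get a feel for those size differences, you can estimate the memory a model’s weights need from its parameter count and the number of bits used to store each weight. The sketch below is a rough rule of thumb, not an Ollama formula: <code>approx_weight_gb</code> is a hypothetical helper, the bit widths are illustrative, and real usage is higher because the runtime also needs memory for the context.</p>
<pre><code class="lang-python">def approx_weight_gb(params_billion, bits_per_weight):
    """Rough size of a model's weights in gigabytes (1 GB = 10**9 bytes).

    Assumes every weight uses the same number of bits. Real quantization
    formats mix precisions and add metadata, and inference also needs memory
    for the context (KV) cache, so treat the result as a lower bound.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Illustrative combinations of model size and precision
for label, params, bits in [
    ("270M parameters at 8 bits", 0.27, 8),
    ("8B parameters at 4 bits", 8, 4),
    ("8B parameters at 16 bits", 8, 16),
]:
    print(f"{label}: about {approx_weight_gb(params, bits):.1f} GB")
</code></pre>
<p>Both the parameter count and the quantization level matter: by this estimate, the same 8B model needs roughly a quarter of the memory at 4 bits that it needs at 16 bits.</p>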
<p>You can list all installed models using:</p>
<pre><code class="lang-bash">ollama list
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439131985/31bc6125-aec9-47bb-90a8-7017d422e527.png" alt="Ollama installed models" class="image--center mx-auto" width="848" height="104" loading="lazy"></p>
<p>And remove one when you no longer need it:</p>
<pre><code class="lang-bash">ollama rm model_name
</code></pre>
<p>If your PC has limited RAM, try running smaller models first. You can experiment with different ones to find the right balance between speed and accuracy.</p>
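<p>If you’d rather check installed models from a script, Ollama’s local API exposes the same list at <code>GET /api/tags</code>. It returns JSON with a <code>models</code> array whose entries include a <code>name</code> and a <code>size</code> in bytes. Here’s a minimal sketch using only Python’s standard library; <code>summarize_models</code> and <code>fetch_installed_models</code> are hypothetical helper names, and the code assumes Ollama is listening on its default address.</p>
<pre><code class="lang-python">import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address (assumption: not changed)


def summarize_models(tags):
    """Turn a parsed /api/tags response into (name, size in GB) pairs."""
    return [(m["name"], round(m["size"] / 1e9, 2)) for m in tags.get("models", [])]


def fetch_installed_models(base_url=OLLAMA_URL):
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return summarize_models(json.load(resp))
</code></pre>
<p>With Ollama running, <code>print(fetch_installed_models())</code> prints a list of <code>(name, size)</code> pairs, similar to the output of <code>ollama list</code>.</p>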
<h2 id="heading-how-to-use-ollama-with-other-applications">How to Use Ollama with Other Applications</h2>
<p>Once you’ve installed Ollama, you can use it beyond the chat interface. Developers can connect to it through its local HTTP API.</p>
<p>Ollama runs a local server on <code>http://localhost:11434</code>. This means you can send requests from your own scripts or applications.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439148881/b506c227-8b83-45f4-a2c3-662081ec9faf.png" alt="Ollama API" class="image--center mx-auto" width="1000" height="343" loading="lazy"></p>
<p>For example, a simple Python script can call the local model like this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests, json

<span class="hljs-comment"># Define the local Ollama API endpoint</span>
url = <span class="hljs-string">"http://localhost:11434/api/generate"</span>

<span class="hljs-comment"># Send a prompt to the Gemma 3 model</span>
payload = {
    <span class="hljs-string">"model"</span>: <span class="hljs-string">"gemma3:270m"</span>,
    <span class="hljs-string">"prompt"</span>: <span class="hljs-string">"Write a short story about space exploration."</span>
}

<span class="hljs-comment"># stream=True tells requests to read the response as a live data stream</span>
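response = requests.post(url, json=payload, stream=<span class="hljs-literal">True</span>)
response.raise_for_status()  <span class="hljs-comment"># stop early on HTTP errors, e.g. when the model hasn't been pulled yet</span>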

<span class="hljs-comment"># Ollama sends one JSON object per line as it generates text</span>
<span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> response.iter_lines():
    <span class="hljs-keyword">if</span> line:
        data = json.loads(line.decode(<span class="hljs-string">"utf-8"</span>))
        <span class="hljs-comment"># Each chunk has a "response" key containing part of the text</span>
        <span class="hljs-keyword">if</span> <span class="hljs-string">"response"</span> <span class="hljs-keyword">in</span> data:
            print(data[<span class="hljs-string">"response"</span>], end=<span class="hljs-string">""</span>, flush=<span class="hljs-literal">True</span>)
</code></pre>
<p>This setup turns your computer into a local AI engine. You can integrate it with chatbots, coding assistants, or automation tools without using external APIs.</p>
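<p>Each line Ollama streams from <code>/api/generate</code> is a JSON object with a <code>response</code> fragment, and the last one has <code>"done": true</code>. If you’d rather collect the full text than print it piece by piece, a small helper can join the fragments. This is a sketch: <code>collect_stream</code> is a hypothetical name, and it accepts any iterable of byte lines, so you can pass it <code>response.iter_lines()</code> from the script above or test it with sample data.</p>
<pre><code class="lang-python">import json


def collect_stream(lines):
    """Join the "response" fragments of Ollama's newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip empty lines, as the script above does
        chunk = json.loads(raw)  # json.loads accepts bytes or str
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object carries timing stats rather than more text
    return "".join(parts)
</code></pre>
<p>For example, <code>collect_stream(response.iter_lines())</code> returns the whole story as one string. Alternatively, adding <code>"stream": False</code> to the payload makes Ollama return a single JSON object whose <code>response</code> field holds the complete text.</p>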
<h2 id="heading-troubleshooting-and-common-issues">Troubleshooting and Common Issues</h2>
<p>If you face issues running a model, check your system resources first. Models need enough RAM and disk space to load properly. Closing other apps can help free up memory.</p>
<p>Sometimes, antivirus software may block local network ports. If Ollama fails to start, add it to the list of allowed programs.</p>
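<p>Before digging into firewall or antivirus settings, it can help to confirm whether the Ollama server answers at all. The sketch below assumes the default address <code>http://localhost:11434</code>, where a running server normally replies to a plain request with a short "Ollama is running" message; <code>ollama_is_running</code> is a hypothetical helper name.</p>
<pre><code class="lang-python">import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=3.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False
</code></pre>
<p>If <code>ollama_is_running()</code> returns <code>False</code> while the app appears to be open, something (a stopped service, a different port, or a security tool) is likely keeping the connection from getting through.</p>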
<p>If you use the CLI and see errors about GPU drivers, ensure that your graphics drivers are up to date. Ollama supports both CPU and GPU execution, but having updated drivers improves performance.</p>
<h2 id="heading-why-running-llms-locally-matters">Why Running LLMs Locally Matters</h2>
<p>Running LLMs locally changes how you work with AI. You’re no longer tied to API costs or rate limits. It’s ideal for developers who want to prototype fast, researchers exploring fine-tuning, or hobbyists who value privacy.</p>
<p>Local models are also great for offline environments. You can experiment with prompt design, generate content, or test AI-assisted apps without an internet connection.</p>
<p>As hardware improves and open source communities grow, local AI will continue to become more powerful and accessible.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Setting up and running an open-source LLM on Windows is now simple. With tools like Ollama and LM Studio, you can download a model, run it locally, and start generating text in minutes.</p>
<p>The UI makes it friendly for beginners, while the command line offers full control for developers. Whether you’re building an app, testing ideas, or exploring AI for personal use, running models locally puts everything in your hands, making it fast, private, and flexible.</p>
<p><em>Hope you enjoyed this article. Sign up for my free newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><strong><em>visit my website</em></strong></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Local RAG App with Ollama and ChromaDB in the R Programming Language ]]>
                </title>
                <description>
                    <![CDATA[ A Large Language Model (LLM) is a type of machine learning model that is trained to understand and generate human-like text. These models are trained on vast datasets to capture the nuances of human language, enabling them to generate coherent and co... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-a-local-rag-app-with-ollama-and-chromadb-in-r/</link>
                <guid isPermaLink="false">67fd5ac89a2c2895da61d799</guid>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ chromadb ]]>
                    </category>
                
                    <category>
                        <![CDATA[ R Language ]]>
                    </category>
                
                    <category>
                        <![CDATA[ RAG  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Elabonga Atuo ]]>
                </dc:creator>
                <pubDate>Mon, 14 Apr 2025 18:58:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1744638731389/83993a5e-7a4d-4615-a8c5-582008115fc4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A Large Language Model (LLM) is a type of machine learning model that is trained to understand and generate human-like text. These models are trained on vast datasets to capture the nuances of human language, enabling them to generate coherent and contextually relevant responses.</p>
<p>You can enhance the performance of an LLM by providing context — structured or unstructured data, such as documents, articles, or knowledge bases — tailored to the domain or information you want the model to specialize in. Using techniques like prompt engineering and context injection, you can build an intelligent chatbot capable of navigating extensive datasets, retrieving relevant information, and delivering responses.</p>
<p>Whether it's storing recipes, code documentation, research articles, or answering domain-specific queries, an LLM-based chatbot can adapt to your needs with customization and privacy. You can deploy it locally to create a highly specialized conversational assistant that respects your data.</p>
<p>In this article, you will learn how to build a local Retrieval-Augmented Generation (RAG) application using Ollama and ChromaDB in R. By the end, you'll have a custom conversational assistant with a Shiny interface that efficiently retrieves information while maintaining privacy and customization.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-rag">What is RAG?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-overview">Project Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ollama-installation">Ollama Installation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-data-collection-and-cleaning">Data Collection and Cleaning</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-chunks">How to Create Chunks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-generate-sentence-embeddings">How to Generate Sentence Embeddings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-the-vector-database-for-embedding-storage">How to Set Up the Vector Database for Embedding Storage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-write-the-user-input-query-embedding-function">How to Write the User Input Query Embedding Function</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tool-calling">Tool Calling</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-initialize-the-chat-system-design-prompts-and-integrate-tools">How to Initialize the Chat System, Design Prompts, and Integrate Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-interact-with-your-chatbot-using-a-shiny-app">How to Interact with Your Chatbot Using a Shiny App</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-complete-code">Complete Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-rag">What is RAG?</h2>
<p>Retrieval-Augmented Generation (RAG) is a method that integrates retrieval systems with generative AI, enabling chatbots to access recent and specific information from external sources.</p>
<p>By using a retrieval pipeline, the chatbot can fetch up-to-date, relevant data and combine it with the generative model’s language capabilities, producing responses that are both accurate and contextually enriched. This makes RAG particularly useful for applications requiring fact-based, real-time knowledge delivery.</p>
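<p>This tutorial builds the real pipeline in R, but the retrieve-then-generate flow itself is small. The toy Python sketch below uses bag-of-words counts as a stand-in for real embeddings, picks the document most similar to the question by cosine similarity, and pastes it into the prompt that would be sent to the model. The documents and the <code>embed</code>, <code>cosine</code>, and <code>build_prompt</code> functions are illustrative only.</p>
<pre><code class="lang-python">import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(question, documents):
    # Retrieval: pick the document most similar to the question
    q = embed(question)
    best = max(documents, key=lambda d: cosine(q, embed(d)))
    # Augmentation: the retrieved text becomes context for the generator
    return f"Answer using only this context:\n{best}\n\nQuestion: {question}"


docs = [
    "pancakes need flour eggs and milk",
    "guacamole needs avocado lime and salt",
]
print(build_prompt("what needs avocado", docs))
</code></pre>
<p>A real system swaps the word counts for model-generated embeddings and the list for a vector database, but the two steps stay the same.</p>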
<h2 id="heading-project-overview">Project Overview</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744367291671/3e7989f8-0cd9-4857-ba48-23a352d9ae8d.png" alt="Setting up a local RAG chatbot from data gathering, cleaning, chunking, embedding, vector database storage, system prompting and interactive chatbot using Shiny" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-project-setup">Project Setup</h2>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>Before you begin, ensure you have installed the latest version of the items listed here:</p>
<ol>
<li><p><a target="_blank" href="https://posit.co/download/rstudio-desktop/"><strong>RStudio</strong></a><strong>: The IDE</strong> <em>–</em> RStudio is the primary workspace where you'll write and test your R code. Its user-friendly interface, debugging tools, and integrated environment make it ideal for data analysis and chatbot development.</p>
</li>
<li><p><a target="_blank" href="https://cran.rstudio.com/"><strong>R</strong></a><strong>: The Programming Language</strong> <em>–</em> R is the backbone of your project. You'll use it to handle data manipulation, apply statistical models, and integrate your recipe chatbot components seamlessly.</p>
</li>
<li><p><a target="_blank" href="https://www.python.org/downloads/"><strong>Python</strong></a> – Some libraries, like the embedding library you'll use for text vectorization, are built on Python. It’s vital to have Python installed to enable these functionalities alongside your R code.</p>
</li>
<li><p><a target="_blank" href="https://www.java.com/en/download/"><strong>Java</strong></a> – Java serves as a foundational element for certain embedding libraries. It ensures efficient processing and compatibility for the text embedding tasks your chatbot depends on.</p>
</li>
<li><p><a target="_blank" href="https://www.docker.com/products/docker-desktop/"><strong>Docker Desktop</strong></a> – Docker Desktop allows you to run ChromaDB, the vector database, locally on your machine. This enables fast and reliable storage of embeddings, ensuring your chatbot retrieves relevant information quickly.</p>
</li>
<li><p><a target="_blank" href="https://ollama.com/"><strong>Ollama</strong></a> – Ollama brings powerful Large Language Models (LLMs) directly to your local computer, removing the need for cloud resources. It lets you access multiple models, customize outputs, and integrate them into your chatbot effortlessly.</p>
</li>
</ol>
<h2 id="heading-ollama-installation">Ollama Installation</h2>
<p>Ollama is an open-source tool you can use to run and manage LLMs on your computer. Once installed, you can access various LLMs as needed. You will be using the <code>llama3.2:3b-instruct-q4_K_M</code> model to build this chatbot.</p>
<p>A quantized model is a version of a machine learning model that has been optimized to use less memory and computational power by reducing the precision of the numbers it uses. This makes it practical to run an LLM locally, especially when you don’t have access to a GPU (Graphics Processing Unit, a specialized processor that performs complex computations in parallel).</p>
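<p>Here’s a simplified illustration of the idea, assuming symmetric 8-bit quantization with one shared scale: each float is stored as a small integer, and multiplying by the scale recovers an approximation. Formats like Q4_K_M are considerably more elaborate (fewer bits, per-block scales, mixed precisions), so treat this as a sketch of the principle rather than of Ollama’s implementation. The function names are hypothetical.</p>
<pre><code class="lang-python">def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 127
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


weights = [0.12, -0.98, 0.45, 0.003]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)         # small integers that fit in 1 byte each (float32 needs 4)
print(restored)  # close to the originals, but not exact
</code></pre>
<p>Each restored value is off by at most half the scale, which is the precision you trade for the smaller memory footprint.</p>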
<p>To start, you can download and install the Ollama software <a target="_blank" href="https://ollama.com/download">here</a>.</p>
<p>Then you can confirm installation by running this command:</p>
<pre><code class="lang-bash">ollama --version
</code></pre>
<p>If Ollama isn’t already running in the background (the desktop app usually starts it for you), start the server with this command:</p>
<pre><code class="lang-bash">ollama serve
</code></pre>
<p>Next, run the following command to pull the Q4_K_M quantization of llama3.2:3b-instruct:</p>
<pre><code class="lang-bash">ollama pull llama3.2:3b-instruct-q4_K_M
</code></pre>
<p>Then confirm that the model was downloaded:</p>
<pre><code class="lang-bash">ollama list
</code></pre>
<p>If the download succeeded, the output lists the model’s name, ID, size, and when it was last modified, like so:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744288047721/f6349ca4-fe86-4851-beaf-2f04fe2a4d80.png" alt="Confirm Ollama Installation" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now you can chat with the model:</p>
<pre><code class="lang-bash">ollama run llama3.2:3b-instruct-q4_K_M
</code></pre>
<p>If successful, you should receive a prompt that you can test by asking a question and getting an answer. For example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744288433940/d831d256-0f6c-49c0-b647-bce1c1976584.png" alt="Ollama llama3.2:3b-instruct-q4_K_M chat console" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Then you can exit the console by typing <code>/bye</code> or pressing Ctrl + D.</p>
<h2 id="heading-data-collection-and-cleaning">Data Collection and Cleaning</h2>
<p>The chatbot you are building will be a cooking assistant that suggests recipes given your available ingredients, what you want to eat, and how much food a recipe yields.</p>
<p>First, you need recipe data for the chatbot to retrieve from. You will be using a <a target="_blank" href="https://www.kaggle.com/datasets/paultimothymooney/recipenlg">dataset</a> of recipes from Kaggle.</p>
<p>To start, load the necessary libraries:</p>
<pre><code class="lang-r"><span class="hljs-comment"># loading required libraries</span>
<span class="hljs-keyword">library</span>(xml2) <span class="hljs-comment">#read, parse, and manipulate XML,HTML documents</span>
<span class="hljs-keyword">library</span>(jsonlite) <span class="hljs-comment">#manipulate JSON objects</span>

<span class="hljs-keyword">library</span>(RKaggle) <span class="hljs-comment"># download datasets from Kaggle </span>
<span class="hljs-keyword">library</span>(dplyr)   <span class="hljs-comment"># data manipulation</span>
</code></pre>
<p>Then download the recipe dataset:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Download and read the "recipe" dataset from Kaggle</span>
recipes_list &lt;- RKaggle::get_dataset(<span class="hljs-string">"thedevastator/better-recipes-for-a-better-life"</span>)
</code></pre>
<p>Inspect the returned object and extract its first element like this:</p>
<pre><code class="lang-r"><span class="hljs-comment"># inspect the dataset</span>
class(recipes_list)
str(recipes_list)
head(recipes_list)
<span class="hljs-comment"># extract the first tibble</span>
recipes_df &lt;- recipes_list[[<span class="hljs-number">1</span>]]
</code></pre>
<p>A quick inspection of the <code>recipes_list</code> object shows that it contains two objects of type tibble. You will be using only the first element for this project. A tibble is a type of data structure used for storing and manipulating data. It’s similar to a traditional dataframe, but it’s designed to enforce stricter rules and perform fewer automatic actions compared to traditional dataframes.</p>
<p>We’ll use a regular dataframe in this project because more people are likely familiar with it. It can also efficiently handle row indexing, which is crucial for accessing and manipulating specific rows in our recipe dataset.</p>
<p>In the code block below, you’ll convert the tibble to a dataframe and then drop the first column, which is the index column. Then you’ll inspect the newly converted dataframe and drop unnecessary columns.</p>
<p>Removing unnecessary columns streamlines the dataset and keeps the focus on relevant features. In this project, we’ll drop columns that aren’t useful for answering recipe questions, so the text you embed contains only the information the chatbot actually needs.</p>
<pre><code class="lang-r"><span class="hljs-comment"># convert to dataframe and drop the first column</span>
recipes_df &lt;- as.data.frame(recipes_df[, -<span class="hljs-number">1</span>])
<span class="hljs-comment"># inspect the converted dataframe</span>
head(recipes_df)
class(recipes_df)
colnames(recipes_df)
<span class="hljs-comment"># drop unnecessary columns</span>
cleaned_recipes_df &lt;- subset(recipes_df, select = -c(yield,rating,url,cuisine_path,nutrition,timing,img_src))
</code></pre>
<p>Now you need to identify rows with NA (missing) values, which you can do like this:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Identify rows and columns with NA values</span>
which(is.na(cleaned_recipes_df), arr.ind = <span class="hljs-literal">TRUE</span>)

<span class="hljs-comment"># a quick inspection reveals columns [2:4] have missing values</span>
subset_column_names &lt;- colnames(cleaned_recipes_df)[<span class="hljs-number">2</span>:<span class="hljs-number">4</span>]
subset_column_names
</code></pre>
<p>It is important to handle NA values to ensure that your data is complete, to prevent errors, and to preserve context.</p>
<p>Now, replace the NA values and confirm that there are no missing values:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Replace NA values dynamically based on conditions</span>
cols_to_modify &lt;- c(<span class="hljs-string">"prep_time"</span>, <span class="hljs-string">"cook_time"</span>, <span class="hljs-string">"total_time"</span>)
cleaned_recipes_df[cols_to_modify] &lt;- lapply(
  cleaned_recipes_df[cols_to_modify],
  <span class="hljs-keyword">function</span>(x, df) {
    <span class="hljs-comment"># Where both prep_time and cook_time are NA, set all three time columns to "unknown"</span>
    replace(x, is.na(df$prep_time) &amp; is.na(df$cook_time), <span class="hljs-string">"unknown"</span>)
  },
  df = cleaned_recipes_df  <span class="hljs-comment"># Pass the whole dataframe for conditions</span>
)
cleaned_recipes_df &lt;- cleaned_recipes_df %&gt;%
  mutate(
    prep_time = case_when(
      <span class="hljs-comment"># If cooktime is present but preptime is NA, replace with "no preparation required"</span>
      !is.na(cook_time) &amp; is.na(prep_time) ~ <span class="hljs-string">"no preparation required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(prep_time)
    ),
    cook_time = case_when(
      <span class="hljs-comment"># If prep_time is present but cook_time is NA, replace with "no cooking required"</span>
      !is.na(prep_time) &amp; is.na(cook_time) ~ <span class="hljs-string">"no cooking required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(cook_time)
    )
  )
<span class="hljs-comment"># confirm there are no missing values</span>
any(is.na(cleaned_recipes_df))

<span class="hljs-comment"># confirm the replacing NA logic works by inspecting specific rows</span>
cleaned_recipes_df[<span class="hljs-number">1081</span>,]
cleaned_recipes_df[<span class="hljs-number">1</span>,]
cleaned_recipes_df[<span class="hljs-number">405</span>,]
</code></pre>
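<p>The replacement rules above are easiest to check one row at a time. This Python sketch restates them as a hypothetical <code>fill_times</code> function, with <code>None</code> standing in for R’s <code>NA</code>. Like the R code, it leaves a missing <code>total_time</code> alone when at least one of the other two times is known.</p>
<pre><code class="lang-python">def fill_times(prep_time, cook_time, total_time):
    """Apply the tutorial's NA rules to one row (None plays the role of NA)."""
    if prep_time is None and cook_time is None:
        # Rule 1: nothing is known about the timings, so all three become "unknown"
        return "unknown", "unknown", "unknown"
    if prep_time is None:
        # Rule 2: only a cooking time is given, so there is no preparation step
        prep_time = "no preparation required"
    if cook_time is None:
        # Rule 3: only a preparation time is given, so nothing needs cooking
        cook_time = "no cooking required"
    return prep_time, cook_time, total_time


print(fill_times(None, "20 mins", "20 mins"))
</code></pre>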
<p>For this tutorial, we’ll subset the dataframe to the first 250 rows for demo purposes. This keeps embedding generation fast.</p>
<pre><code class="lang-r"><span class="hljs-comment"># recommended for demo/learning purposes</span>
cleaned_recipes_df &lt;- head(cleaned_recipes_df,<span class="hljs-number">250</span>)
</code></pre>
<h2 id="heading-how-to-create-chunks">How to Create Chunks</h2>
<p>To understand why chunking is important before embedding, you need to understand what an embedding is.</p>
<p>An embedding is a vector representation of a word or a sentence. Machines don’t understand human text; they work with numbers, so LLMs transform text into numerical representations before they can give answers. Generating embeddings is computationally expensive, and embedding models accept only a limited number of tokens per input, so breaking the data into smaller pieces keeps each input within that limit and makes the work easier to process in batches.</p>
<p>So now we’re going to split the dataframe into smaller chunks of a specified size to enable efficient batch processing and iteration.</p>
<pre><code class="lang-r"><span class="hljs-comment"># Define the size of each chunk (number of rows per chunk)</span>
chunk_size &lt;- <span class="hljs-number">1</span>

<span class="hljs-comment"># Get the total number of rows in the dataframe</span>
n &lt;- nrow(cleaned_recipes_df)

<span class="hljs-comment"># Create a vector of group numbers for chunking</span>
<span class="hljs-comment"># Each group number repeats for 'chunk_size' rows</span>
<span class="hljs-comment"># Ensure the vector matches the total number of rows</span>
r &lt;- rep(<span class="hljs-number">1</span>:ceiling(n/chunk_size), each = chunk_size)[<span class="hljs-number">1</span>:n]

<span class="hljs-comment"># Split the dataframe into smaller chunks (subsets) based on the group numbers</span>
chunks &lt;- split(cleaned_recipes_df, r)
</code></pre>
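<p>The <code>rep(...)[1:n]</code> line assigns each row a group number. The same arithmetic, sketched in Python with 0-based row indices, is <code>row // chunk_size + 1</code>; with <code>chunk_size = 1</code>, as in this tutorial, every recipe becomes its own chunk. Both function names are hypothetical.</p>
<pre><code class="lang-python">import math


def chunk_groups(n, chunk_size):
    """Group number (1-based, like R) for each of n rows."""
    return [row // chunk_size + 1 for row in range(n)]


def split_rows(rows, chunk_size):
    """Split a list of rows into consecutive chunks of at most chunk_size rows."""
    n_chunks = math.ceil(len(rows) / chunk_size)
    return [rows[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]


print(chunk_groups(5, 2))                        # [1, 1, 2, 2, 3]
print(split_rows(["a", "b", "c", "d", "e"], 2))  # [['a', 'b'], ['c', 'd'], ['e']]
</code></pre>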
<h2 id="heading-how-to-generate-sentence-embeddings">How to Generate Sentence Embeddings</h2>
<p>As previously mentioned, embeddings are vector representations of words or sentences. Embeddings can be generated from both words and sentences. How you choose to generate embeddings depends on your intended application of the LLM.</p>
<p>Word embeddings are numerical representations of individual words in a continuous vector space. They capture semantic relationships between words, allowing similar words to have vectors close to each other.</p>
<p>Word embeddings can be used in search engines as they support word-level queries by matching embeddings to retrieve relevant documents. They can also be used in text classification to classify documents, emails, or tweets based on word-level features (for example, detecting spam emails or sentiment analysis).</p>
<p>Sentence embeddings are numerical representations of entire sentences in a vector space, designed to capture the overall meaning and context of the sentence. They are used in settings where sentences provide better context like question answering systems where user queries are matched to relevant sentences or documents for more precise retrieval.</p>
<p>For our recipe chatbot, sentence embedding is the best choice.</p>
<p>First, create an empty dataframe that has three columns.</p>
<pre><code class="lang-r"><span class="hljs-comment">#empty dataframe</span>
recipe_sentence_embeddings &lt;-  data.frame(
  recipe = character(),
  recipe_vec_embeddings = I(list()),
  recipe_id = character()
)
</code></pre>
<p>The first column will hold the actual recipe in text form, the <code>recipe_vec_embeddings</code> column will hold the generated sentence embeddings, and the <code>recipe_id</code> holds a unique id for each recipe. This will help in indexing and retrieval from the vector database.</p>
<p>Next, it’s helpful to define a progress bar, which you can do like this:</p>
<pre><code class="lang-r"><span class="hljs-comment"># create a progress bar</span>
pb &lt;- txtProgressBar(min = <span class="hljs-number">1</span>, max = length(chunks), style = <span class="hljs-number">3</span>)
</code></pre>
<p>Embedding can take a while, so it’s important to keep track of the progress of the process.</p>
<p>Now it’s time to generate embeddings and populate the dataframe.</p>
<p>Write a for loop that runs once for each chunk.</p>
<pre><code class="lang-r"><span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {}
</code></pre>
<p>The recipe field is the text of the chunk currently being processed, and the unique ID is generated by pasting the text “recipe” together with the chunk’s index.</p>
<pre><code class="lang-r"><span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
    recipe &lt;- as.character(chunks[i])
    recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
}
</code></pre>
<p>The <code>textEmbed()</code> function from the text library generates either sentence or word embeddings. It takes a character variable or a dataframe and returns the embeddings in tibble form. Before using it, follow the setup instructions for the <a target="_blank" href="https://www.r-text.org/">text</a> library so that it runs smoothly.</p>
<p>The <code>layers</code> argument selects which of the model’s hidden layers to use (here, layers 10 and 11), and <code>batch_size</code> defines how many rows are embedded at a time. Setting <code>keep_token_embeddings = FALSE</code> discards the embeddings for individual tokens after processing, and <code>aggregation_from_layers_to_tokens = "concatenate"</code> joins the selected layers’ embeddings for each token into one longer vector. A token is the smallest unit of text that a model can process.</p>
<pre><code class="lang-r"><span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
    recipe &lt;- as.character(chunks[i])
    recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
    recipe_embeddings &lt;- textEmbed(as.character(recipe),
                                layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                keep_token_embeddings = <span class="hljs-literal">FALSE</span>,
                                batch_size = <span class="hljs-number">1</span>
  )
}
</code></pre>
<p>To produce sentence embeddings, set the <code>aggregation_from_tokens_to_texts</code> parameter to <code>"mean"</code>.</p>
<pre><code class="lang-r">aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>
</code></pre>
<p>The "mean" operation averages the embeddings of all tokens in a sentence to generate a single vector that represents the entire sentence. This sentence-level embedding captures the overall meaning and semantics of the text, regardless of its token length.</p>
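<p>Here’s a small numeric sketch of what the two aggregation settings do, using made-up 2-dimensional vectors instead of real model outputs (a real model’s hidden layers are far wider). For each token, the vectors from the selected layers are concatenated; the token vectors are then averaged into one sentence vector. The helper names are hypothetical.</p>
<pre><code class="lang-python">def concat_layers(layer_outputs):
    """layer_outputs[layer][token] is a vector; return one concatenated vector per token."""
    n_tokens = len(layer_outputs[0])
    return [[v for layer in layer_outputs for v in layer[t]] for t in range(n_tokens)]


def mean_pool(token_vectors):
    """Average token vectors dimension by dimension into one sentence vector."""
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]


# Two layers (think 10 and 11), three tokens, 2 dimensions each (made-up numbers)
layer_10 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layer_11 = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]

tokens = concat_layers([layer_10, layer_11])  # each token now has 4 numbers
sentence = mean_pool(tokens)
print(sentence)  # [3.0, 4.0, 1.0, 1.0]
</code></pre>
<p>However many tokens a recipe has, the result is a single vector whose length is the per-token length, which is what makes recipes of different lengths comparable.</p>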
<pre><code class="lang-r"><span class="hljs-comment"># convert tibble to vector</span>
  recipe_vec_embeddings &lt;- unlist(recipe_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
  recipe_vec_embeddings &lt;- list(recipe_vec_embeddings)
</code></pre>
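<p>The embedding function returns its results as a tibble. To get a plain numeric vector, <code>unlist()</code> the result with <code>use.names = FALSE</code> to drop the names, and then wrap the vector in <code>list()</code> so it can be stored as a single entry in the <code>recipe_vec_embeddings</code> list-column.</p>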
<pre><code class="lang-r">  <span class="hljs-comment"># Append the current chunk's data to the dataframe</span>
  recipe_sentence_embeddings &lt;- recipe_sentence_embeddings %&gt;%
    add_row(
      recipe = recipe,
      recipe_vec_embeddings = recipe_vec_embeddings,
      recipe_id = recipe_id
    )
</code></pre>
<p>This appends the current recipe, its embedding, and its ID to the dataframe at the end of each iteration.</p>
<pre><code class="lang-r">  <span class="hljs-comment"># track embedding progress</span>
  setTxtProgressBar(pb, i)
</code></pre>
<p>In order to keep track of the embedding progress, you can use the earlier defined progress bar inside the loop. It will update at the end of every iteration.</p>
<p><strong>Complete Code Block:</strong></p>
<pre><code class="lang-r"><span class="hljs-comment"># load required library</span>
<span class="hljs-keyword">library</span>(text)
<span class="hljs-comment"># the 'text' library needs some extra setup; see the installation instructions at</span>
<span class="hljs-comment"># https://www.r-text.org/</span>
<span class="hljs-comment"># embedding data</span>
<span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
  recipe &lt;- as.character(chunks[i])
  recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
  recipe_embeddings &lt;- textEmbed(as.character(recipe),
                                layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                keep_token_embeddings = <span class="hljs-literal">FALSE</span>,
                                batch_size = <span class="hljs-number">1</span>
  )

  <span class="hljs-comment"># convert tibble to vector</span>
  recipe_vec_embeddings &lt;- unlist(recipe_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
  recipe_vec_embeddings &lt;- list(recipe_vec_embeddings)

  <span class="hljs-comment"># Append the current chunk's data to the dataframe</span>
  recipe_sentence_embeddings &lt;- recipe_sentence_embeddings %&gt;%
    add_row(
      recipe = recipe,
      recipe_vec_embeddings = recipe_vec_embeddings,
      recipe_id = recipe_id
    )

  <span class="hljs-comment"># track embedding progress</span>
  setTxtProgressBar(pb, i)

}

<span class="hljs-comment"># close the progress bar once the loop finishes</span>
close(pb)
</code></pre>
<h2 id="heading-how-to-set-up-the-vector-database-for-embedding-storage">How to Set Up the Vector Database for Embedding Storage</h2>
<p>A vector database is a special type of database that stores embeddings and allows you to query and retrieve relevant information. There are numerous vector databases available, but for this project, you will use ChromaDB, an open-source option that integrates with the R environment through the <code>rchroma</code> library.</p>
<p>In this project, ChromaDB runs locally in a Docker container, so make sure Docker is installed and running on your machine before you continue.</p>
<p>Then load the rchroma library and run your ChromaDB instance:</p>
<pre><code class="lang-r"><span class="hljs-comment"># load rchroma library</span>
<span class="hljs-keyword">library</span>(rchroma)
<span class="hljs-comment"># run ChromaDB instance.</span>
chroma_docker_run()
</code></pre>
<p>If it succeeds, you should see output like this in the console:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744383249217/bd8fb67c-0731-46f9-8a13-0747b4789714.png" alt="Confirm ChromaDB is running locally" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Next, connect to a local ChromaDB instance and check the connection:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Connect to a local ChromaDB instance</span>
client &lt;- chroma_connect()

<span class="hljs-comment"># Check the connection</span>
heartbeat(client)
version(client)
</code></pre>
<p>Now you’ll need to create a collection and confirm that it was created. Collections in ChromaDB function similarly to tables in conventional databases.</p>
<pre><code class="lang-r"><span class="hljs-comment"># Create a new collection</span>
create_collection(client, <span class="hljs-string">"recipes_collection"</span>)

<span class="hljs-comment"># List all collections</span>
list_collections(client)
</code></pre>
<p>Next, add the recipes and their embeddings to <code>recipes_collection</code> with the <code>add_documents()</code> function.</p>
<pre><code class="lang-r"><span class="hljs-comment"># Add documents to the collection</span>
add_documents(
  client,
  <span class="hljs-string">"recipes_collection"</span>,
  documents = recipe_sentence_embeddings$recipe,
  ids = recipe_sentence_embeddings$recipe_id,
  embeddings = recipe_sentence_embeddings$recipe_vec_embeddings
)
</code></pre>
<p>The <code>add_documents()</code> function is used to add recipe data to the <code>recipes_collection</code>. Here's a breakdown of its arguments and how the corresponding data is accessed:</p>
<ol>
<li><p><code>documents</code>: This argument represents the recipe text. It is sourced from the <code>recipe</code> column of the <code>recipe_sentence_embeddings</code> dataframe.</p>
</li>
<li><p><code>ids</code>: This is the unique identifier for each recipe. It is extracted from the <code>recipe_id</code> column of the same dataframe.</p>
</li>
<li><p><code>embeddings</code>: This contains the sentence embeddings, which were previously generated for each recipe. These embeddings are accessed from the <code>recipe_vec_embeddings</code> column of the dataframe.</p>
</li>
</ol>
<p>All three arguments—<code>documents</code>, <code>ids</code>, and <code>embeddings</code>—are obtained by subsetting their respective columns from the <code>recipe_sentence_embeddings</code> dataframe.</p>
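<p>Because <code>documents</code>, <code>ids</code>, and <code>embeddings</code> are matched up by position, it can help to check them before loading. The base R sketch below defines a hypothetical helper, <code>check_collection_inputs()</code>, which is not part of <code>rchroma</code>. It runs on a made-up two-row stand-in for <code>recipe_sentence_embeddings</code> and confirms that the three inputs have the same length, the IDs are unique, and every embedding has the same number of dimensions.</p>
<pre><code class="lang-r"># Hypothetical helper, not part of rchroma: sanity-check the inputs to add_documents()
check_collection_inputs = function(documents, ids, embeddings) {
  stopifnot(
    length(documents) == length(ids),         # one ID per document
    length(ids) == length(embeddings),        # one embedding per document
    !anyDuplicated(ids),                      # IDs must be unique
    length(unique(lengths(embeddings))) == 1  # all embeddings share one dimension
  )
  invisible(TRUE)
}

# Made-up stand-in for recipe_sentence_embeddings with 3-dimensional vectors
toy = data.frame(recipe = c("apple pie ...", "lentil soup ..."),
                 recipe_id = c("recipe1", "recipe2"))
toy$recipe_vec_embeddings = list(c(0.1, 0.2, 0.3), c(0.0, 0.5, 0.1))

check_collection_inputs(toy$recipe, toy$recipe_id, toy$recipe_vec_embeddings)
</code></pre>
<p>If any check fails, <code>stopifnot()</code> raises an error before anything is sent to the database.</p>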
<h2 id="heading-how-to-write-the-user-input-query-embedding-function">How to Write the User Input Query Embedding Function</h2>
<p>To retrieve information from a vector database, you first embed your query text. The database then compares the query's embedding with its stored embeddings to find and return the most relevant documents.</p>
<p>The query embedding must have the same number of dimensions (vector length) as the embeddings stored in the database. The simplest way to guarantee this is to embed the query with the same model and the same settings you used for the recipes.</p>
<p>Matching means scoring how close the query embedding is to each stored embedding with a similarity or distance measure (for example, cosine similarity) and returning the closest matches.</p>
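<p>The retrieval idea can be sketched in a few lines of base R. This is not how ChromaDB works internally (it uses an approximate nearest-neighbour index); it only illustrates the ranking step on made-up 3-dimensional vectors. The sketch ranks stored embeddings by cosine similarity to a query and keeps the top two, mirroring the <code>n_results = 2</code> used in the <code>question()</code> function. It also computes squared Euclidean (L2) distance, which is ChromaDB's default metric unless a collection is configured otherwise.</p>
<pre><code class="lang-r"># Made-up stored embeddings: one row per document, one column per dimension
stored = rbind(
  recipe1 = c(0.9, 0.1, 0.0),
  recipe2 = c(0.1, 0.8, 0.2),
  recipe3 = c(0.7, 0.3, 0.1)
)
query_vec = c(1.0, 0.2, 0.0)

# The query must have as many dimensions as the stored embeddings
stopifnot(length(query_vec) == ncol(stored))

# Cosine similarity: higher means more similar
cosine = apply(stored, 1, function(v) {
  sum(v * query_vec) / (sqrt(sum(v^2)) * sqrt(sum(query_vec^2)))
})

# Squared L2 distance: lower means more similar
sq_l2 = apply(stored, 1, function(v) sum((v - query_vec)^2))

top_by_cosine = names(sort(cosine, decreasing = TRUE))[1:2]
top_by_l2 = names(sort(sq_l2))[1:2]
print(top_by_cosine)  # [1] "recipe1" "recipe3"
print(top_by_l2)      # [1] "recipe1" "recipe3"
</code></pre>
<p>Both measures pick the same two recipes here; with real embeddings the rankings can differ, so it is worth knowing which metric your collection uses.</p>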
<p>Next, write a function that embeds a query and then uses the resulting embedding to look up similar documents. Wrapping these steps in a function makes them reusable.</p>
<pre><code class="lang-r">  <span class="hljs-comment">#sentence embeddings function and query</span>
  question &lt;- <span class="hljs-keyword">function</span>(sentence){
    sentence_embeddings &lt;- textEmbed(sentence,
                                     layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                     aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                     aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                     keep_token_embeddings = <span class="hljs-literal">FALSE</span>
    )

    <span class="hljs-comment"># convert tibble to vector</span>
    sentence_vec_embeddings &lt;- unlist(sentence_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
    sentence_vec_embeddings &lt;- list(sentence_vec_embeddings)

    <span class="hljs-comment"># Query similar documents using embeddings</span>
    results &lt;- query(
      client,
      <span class="hljs-string">"recipes_collection"</span>,
      query_embeddings = sentence_vec_embeddings ,
      n_results = <span class="hljs-number">2</span>
    )
    results

  }
</code></pre>
<p>This code embeds the query with <code>textEmbed()</code> using the same settings as the recipes, so the query vector has the same length as the stored vectors. The <code>query()</code> function then searches <code>recipes_collection</code> and, because <code>n_results = 2</code>, returns the two documents that most closely match the user’s query.</p>
<h2 id="heading-tool-calling">Tool Calling</h2>
<p>To interact with Ollama in R, you’ll use the <code>ellmer</code> library. It streamlines working with large language models (LLMs) by offering a single interface to a variety of LLM providers, including Ollama.</p>
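<p>To give useful answers, the LLM needs context from your recipe collection. One way to provide it is tool calling: you describe an R function to the model, the model decides when to call it and with which arguments, and the function’s result is passed back to the model as additional context.</p>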
<p>For this project, we are implementing <a target="_blank" href="https://www.freecodecamp.org/news/learn-rag-fundamentals-and-advanced-techniques/">Retrieval-Augmented Generation (RAG)</a>, which combines retrieving relevant information from a vector database and generating responses using an LLM. This approach improves the chatbot's ability to provide accurate and contextually relevant answers.</p>
<p>Now wrap <code>question()</code> as a tool with the <code>tool()</code> function from <code>ellmer</code>, so that the LLM can call it to fetch context.</p>
<pre><code class="lang-r"><span class="hljs-comment"># load ellmer library</span>
<span class="hljs-keyword">library</span>(ellmer)

<span class="hljs-comment"># function that links to llm to provide context</span>
  tool_context  &lt;- tool(
    question,
    <span class="hljs-string">"obtains the right context for a given question"</span>,
    sentence = type_string()

  )
</code></pre>
<p>The first argument to <code>tool()</code> is the <code>question()</code> function, which returns the relevant documents. The LLM uses those documents as context when answering.</p>
<p>The second argument, "obtains the right context for a given question", describes what the tool does. The model reads this description when deciding whether to call the tool.</p>
<p>Finally, <code>sentence = type_string()</code> declares that <code>question()</code> takes a single argument, <code>sentence</code>, which must be a string.</p>
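<p>Tool calling follows a simple loop: the model replies with a request naming a tool and its arguments, the client runs the matching R function, and the result goes back to the model as context. The base R sketch below simulates only the dispatch step so you can see its shape. <code>fake_question()</code>, the <code>tools</code> registry, and the hard-coded request are hypothetical stand-ins; this is not how <code>ellmer</code> is implemented internally.</p>
<pre><code class="lang-r"># Hypothetical stand-in for question(): returns canned text instead of querying ChromaDB
fake_question = function(sentence) {
  paste("Top recipes for:", sentence)
}

# Hypothetical registry mapping tool names to R functions
tools = list(question = fake_question)

# Made-up tool request, shaped like a model asking to call "question"
request = list(
  name = "question",
  arguments = list(sentence = "What can I cook with rice and beans?")
)

# Check the argument type, as sentence = type_string() declares
stopifnot(is.character(request$arguments$sentence))

# Dispatch: look up the function by name and call it with the model's arguments
result = do.call(tools[[request$name]], request$arguments)
print(result)
# [1] "Top recipes for: What can I cook with rice and beans?"
</code></pre>
<p>In the real app, the result would be the documents returned by <code>query()</code>, and <code>ellmer</code> sends it back to the model, which then writes the final answer.</p>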
<h2 id="heading-how-to-initialize-the-chat-system-design-prompts-and-integrate-tools">How to Initialize the Chat System, Design Prompts, and Integrate Tools</h2>
<p>Next, you’ll set up the conversational AI system itself. With a system prompt, you’ll shape the assistant’s behavior, tone, and focus as a culinary assistant, and by registering the tool, you’ll extend what the chatbot can do.</p>
<p>First, initialize a chat object:</p>
<pre><code class="lang-r"><span class="hljs-comment"># Initialize the chat system with prompt instructions.</span>
  chat &lt;- chat_ollama(system_prompt = <span class="hljs-string">"You are a knowledgeable culinary assistant specializing in recipe recommendations. 
                      You provide tailored meal suggestions based on the user's available ingredients and the desired amount of food or servings.
                      Ensure the recipes align closely with the user's inputs and yield the expected quantity."</span>,
                      model = <span class="hljs-string">"llama3.2:3b-instruct-q4_K_M"</span>)
</code></pre>
<p>The <code>chat_ollama()</code> function creates a chat object that talks to your local Ollama server using the specified system prompt and model.</p>
<p>The system prompt defines the LLM’s conversational behavior, tone, and focus, while the <code>model</code> argument specifies the language model (<code>llama3.2:3b-instruct-q4_K_M</code>) that generates the responses. The model must already be available in Ollama, for example by running <code>ollama pull llama3.2:3b-instruct-q4_K_M</code>.</p>
<p>Next, you need to register a tool.</p>
<pre><code class="lang-r"> <span class="hljs-comment">#register tool</span>
  chat$register_tool(tool_context)
</code></pre>
<p>Registering <code>tool_context</code> with the <code>register_tool()</code> method tells the chat object that the tool exists, so the model can call it during a conversation.</p>
<h2 id="heading-how-to-interact-with-your-chatbot-using-a-shiny-app">How to Interact with Your Chatbot Using a Shiny App</h2>
<p>To interact with the chatbot you’ve just created, we’ll use <strong>Shiny</strong>, a framework for building interactive web applications in R. Shiny provides a user-friendly graphical interface that allows seamless interaction with the chatbot.</p>
<p>For this purpose, we’ll use the <strong>shinychat</strong> library, which simplifies the process of building a chat interface within a Shiny app. This involves defining two key components:</p>
<ol>
<li><p><strong>User Interface (UI)</strong>:</p>
<ul>
<li><p>Responsible for the visual layout and what the user sees.</p>
</li>
<li><p>In this case, <code>chat_ui("chat")</code> is used to create the interactive chat interface.</p>
</li>
</ul>
</li>
<li><p><strong>Server Function</strong>:</p>
<ul>
<li><p>Handles the functionality and logic of the application.</p>
</li>
<li><p>It connects the chatbot to external tools and manages processes like embedding queries, retrieving relevant documents, and handling user inputs.</p>
</li>
</ul>
</li>
</ol>
<pre><code class="lang-r"><span class="hljs-comment"># load the required libraries</span>
<span class="hljs-keyword">library</span>(shiny)
<span class="hljs-keyword">library</span>(shinychat)

<span class="hljs-comment"># wrap the chat code in a Shiny App</span>
ui &lt;- bslib::page_fluid(
  chat_ui(<span class="hljs-string">"chat"</span>)
)

server &lt;- <span class="hljs-keyword">function</span>(input, output, session) {
  <span class="hljs-comment"># Connect to a local ChromaDB instance running on docker with embeddings loaded</span>
  client &lt;- chroma_connect()

  <span class="hljs-comment">#sentence embeddings function and query</span>
  question &lt;- <span class="hljs-keyword">function</span>(sentence){
    sentence_embeddings &lt;- textEmbed(sentence,
                                     layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                     aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                     aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                     keep_token_embeddings = <span class="hljs-literal">FALSE</span>
    )

    <span class="hljs-comment"># convert tibble to vector</span>
    sentence_vec_embeddings &lt;- unlist(sentence_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
    sentence_vec_embeddings &lt;- list(sentence_vec_embeddings)

    <span class="hljs-comment"># Query similar documents using embeddings</span>
    results &lt;- query(
      client,
      <span class="hljs-string">"recipes_collection"</span>,
      query_embeddings = sentence_vec_embeddings ,
      n_results = <span class="hljs-number">2</span>
    )
    results

  }


  <span class="hljs-comment"># function that provides context</span>
  tool_context  &lt;- tool(
    question,
    <span class="hljs-string">"obtains the right context for a given question"</span>,
    sentence = type_string()

  )

  <span class="hljs-comment">#  Initialize the chat system with the system prompt and model</span>
  chat &lt;- chat_ollama(system_prompt = <span class="hljs-string">"You are a knowledgeable culinary assistant specializing in recipe recommendations. 
                      You provide tailored meal suggestions based on the user's available ingredients and the desired amount of food or servings.
                      Ensure the recipes align closely with the user's inputs and yield the expected quantity."</span>,
                      model = <span class="hljs-string">"llama3.2:3b-instruct-q4_K_M"</span>)
  <span class="hljs-comment">#register tool</span>
  chat$register_tool(tool_context)

  observeEvent(input$chat_user_input, {
    stream &lt;- chat$stream_async(input$chat_user_input)
    chat_append(<span class="hljs-string">"chat"</span>, stream)
  })
}

shinyApp(ui, server)
</code></pre>
<p>Here’s how this works:</p>
<ol>
<li><p><strong>User input monitoring with</strong> <code>observeEvent()</code>: The <code>observeEvent()</code> block watches for new messages from the chat interface (<code>input$chat_user_input</code>). When a user sends one, <code>chat$stream_async()</code> passes it to the model, which can call the registered tool to retrieve relevant recipes, and <code>chat_append()</code> streams the response into the chat interface as it is generated.</p>
</li>
<li><p><strong>Tool calling for context</strong>: The chatbot uses tool calling to reach external resources such as the vector database. Combining this retrieval step with generation (RAG) helps the chatbot give accurate, context-rich answers.</p>
</li>
</ol>
<p>This approach brings the chatbot to life, enabling users to interact with it dynamically through a responsive Shiny app.</p>
<h2 id="heading-complete-code">Complete Code</h2>
<p>The code is split into two R scripts. <code>data.R</code> handles data gathering and cleaning, text chunking, sentence embedding generation, vector database creation, and loading documents into the database.</p>
<p>The <code>chat.R</code> script contains code that handles user input querying, context retrieval, chat initialization, system prompt design, tool integration, and a chat Shiny app.</p>
<p><strong>data.R</strong></p>
<pre><code class="lang-r"><span class="hljs-comment"># install and load required packages</span>
<span class="hljs-comment"># install devtools from CRAN</span>
install.packages(<span class="hljs-string">'devtools'</span>)
devtools::install_github(<span class="hljs-string">"benyamindsmith/RKaggle"</span>)

<span class="hljs-keyword">library</span>(text)
<span class="hljs-keyword">library</span>(rchroma)
<span class="hljs-keyword">library</span>(RKaggle)
<span class="hljs-keyword">library</span>(dplyr)

<span class="hljs-comment"># run ChromaDB instance.</span>
chroma_docker_run()

<span class="hljs-comment"># Connect to a local ChromaDB instance</span>
client &lt;- chroma_connect()

<span class="hljs-comment"># Check the connection</span>
heartbeat(client)
version(client)


<span class="hljs-comment"># Create a new collection</span>
create_collection(client, <span class="hljs-string">"recipes_collection"</span>)

<span class="hljs-comment"># List all collections</span>
list_collections(client)

<span class="hljs-comment"># Download and read the "recipe" dataset from Kaggle</span>
recipes_list &lt;- RKaggle::get_dataset(<span class="hljs-string">"thedevastator/better-recipes-for-a-better-life"</span>)

<span class="hljs-comment"># extract the first tibble</span>
recipes_df &lt;- recipes_list[[<span class="hljs-number">1</span>]]

<span class="hljs-comment"># convert to dataframe and drop the first column</span>
recipes_df &lt;- as.data.frame(recipes_df[, -<span class="hljs-number">1</span>])

<span class="hljs-comment"># drop unnecessary columns</span>
cleaned_recipes_df &lt;- subset(recipes_df, select = -c(yield,rating,url,cuisine_path,nutrition,timing,img_src))

<span class="hljs-comment">## Replace NA values dynamically based on conditions</span>
<span class="hljs-comment"># Where both prep_time and cook_time are NA, set all three time columns to "unknown"</span>
cols_to_modify &lt;- c(<span class="hljs-string">"prep_time"</span>, <span class="hljs-string">"cook_time"</span>, <span class="hljs-string">"total_time"</span>)
cleaned_recipes_df[cols_to_modify] &lt;- lapply(
  cleaned_recipes_df[cols_to_modify],
  <span class="hljs-keyword">function</span>(x, df) {
    <span class="hljs-comment"># Set this column to "unknown" in rows where prep_time and cook_time are both NA</span>
    replace(x, is.na(df$prep_time) &amp; is.na(df$cook_time), <span class="hljs-string">"unknown"</span>)
  },
  df = cleaned_recipes_df  
)

<span class="hljs-comment"># Replace NA when only one of prep_time and cook_time is missing</span>
cleaned_recipes_df &lt;- cleaned_recipes_df %&gt;%
  mutate(
    prep_time = case_when(
      <span class="hljs-comment"># If cook_time is present but prep_time is NA, replace with "no preparation required"</span>
      !is.na(cook_time) &amp; is.na(prep_time) ~ <span class="hljs-string">"no preparation required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(prep_time)
    ),
    cook_time = case_when(
      <span class="hljs-comment"># If prep_time is present but cook_time is NA, replace with "no cooking required"</span>
      !is.na(prep_time) &amp; is.na(cook_time) ~ <span class="hljs-string">"no cooking required"</span>,
      <span class="hljs-comment"># Otherwise, retain original value</span>
      <span class="hljs-literal">TRUE</span> ~ as.character(cook_time)
    )
  )

<span class="hljs-comment"># chunk the dataset</span>
chunk_size &lt;- <span class="hljs-number">1</span>
n &lt;- nrow(cleaned_recipes_df)
r &lt;- rep(<span class="hljs-number">1</span>:ceiling(n/chunk_size),each = chunk_size)[<span class="hljs-number">1</span>:n]
chunks &lt;- split(cleaned_recipes_df,r)

<span class="hljs-comment">#empty dataframe</span>
recipe_sentence_embeddings &lt;-  data.frame(
  recipe = character(),
  recipe_vec_embeddings = I(list()),
  recipe_id = character()
)

<span class="hljs-comment"># create a progress bar</span>
pb &lt;- txtProgressBar(min = <span class="hljs-number">1</span>, max = length(chunks), style = <span class="hljs-number">3</span>)

<span class="hljs-comment"># embedding data</span>
<span class="hljs-keyword">for</span> (i <span class="hljs-keyword">in</span> <span class="hljs-number">1</span>:length(chunks)) {
  recipe &lt;- as.character(chunks[i])
  recipe_id &lt;- paste0(<span class="hljs-string">"recipe"</span>,i)
  recipe_embeddings &lt;- textEmbed(as.character(recipe),
                                layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                keep_token_embeddings = <span class="hljs-literal">FALSE</span>,
                                batch_size = <span class="hljs-number">1</span>
  )

  <span class="hljs-comment"># convert tibble to vector</span>
  recipe_vec_embeddings &lt;- unlist(recipe_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
  recipe_vec_embeddings &lt;- list(recipe_vec_embeddings)

  <span class="hljs-comment"># Append the current chunk's data to the dataframe</span>
  recipe_sentence_embeddings &lt;- recipe_sentence_embeddings %&gt;%
    add_row(
      recipe = recipe,
      recipe_vec_embeddings = recipe_vec_embeddings,
      recipe_id = recipe_id
    )

  <span class="hljs-comment"># track embedding progress</span>
  setTxtProgressBar(pb, i)

}

<span class="hljs-comment"># close the progress bar once the loop finishes</span>
close(pb)

<span class="hljs-comment"># Add documents to the collection</span>
add_documents(
  client,
  <span class="hljs-string">"recipes_collection"</span>,
  documents = recipe_sentence_embeddings$recipe,
  ids = recipe_sentence_embeddings$recipe_id,
  embeddings = recipe_sentence_embeddings$recipe_vec_embeddings
)
</code></pre>
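<p>The NA-handling rules in <code>data.R</code> are easy to misread, so here is a base R re-expression of the same two rules on a made-up four-row data frame, one row per case. It is a sketch for checking the behaviour, not a replacement for the <code>dplyr</code> version, and it assumes the time columns are character vectors.</p>
<pre><code class="lang-r"># Made-up time columns covering each case (character vectors, as assumed above)
times = data.frame(
  prep_time  = c("10 mins", NA,        "5 mins", NA),
  cook_time  = c("20 mins", "30 mins", NA,       NA),
  total_time = c("30 mins", "30 mins", "5 mins", NA),
  stringsAsFactors = FALSE
)

# Rule 1: if prep_time and cook_time are both NA, all three time columns become "unknown"
both_missing = rowSums(is.na(times[c("prep_time", "cook_time")])) == 2
times[both_missing, c("prep_time", "cook_time", "total_time")] = "unknown"

# Rule 2: any NA left in one column now means the other column has a value
times$prep_time[is.na(times$prep_time)] = "no preparation required"
times$cook_time[is.na(times$cook_time)] = "no cooking required"

print(times)
</code></pre>
<p>Note that <code>total_time</code> is only filled in when both of the other columns are missing; if just one is missing, an NA in <code>total_time</code> is left as it is.</p>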
<p><strong>chat.R</strong></p>
<pre><code class="lang-r"><span class="hljs-comment"># Load required packages</span>
<span class="hljs-keyword">library</span>(ellmer)
<span class="hljs-keyword">library</span>(text)
<span class="hljs-keyword">library</span>(rchroma)
<span class="hljs-keyword">library</span>(shinychat)
<span class="hljs-keyword">library</span>(shiny)

ui &lt;- bslib::page_fluid(
  chat_ui(<span class="hljs-string">"chat"</span>)
)

server &lt;- <span class="hljs-keyword">function</span>(input, output, session) {
  <span class="hljs-comment"># Connect to a local ChromaDB instance running on docker with embeddings loaded </span>
  client &lt;- chroma_connect()

  <span class="hljs-comment"># sentence embeddings function and query</span>
  question &lt;- <span class="hljs-keyword">function</span>(sentence){
    sentence_embeddings &lt;- textEmbed(sentence,
                                     layers = <span class="hljs-number">10</span>:<span class="hljs-number">11</span>,
                                     aggregation_from_layers_to_tokens = <span class="hljs-string">"concatenate"</span>,
                                     aggregation_from_tokens_to_texts = <span class="hljs-string">"mean"</span>,
                                     keep_token_embeddings = <span class="hljs-literal">FALSE</span>
    )

    <span class="hljs-comment"># convert tibble to vector</span>
    sentence_vec_embeddings &lt;- unlist(sentence_embeddings, use.names = <span class="hljs-literal">FALSE</span>)
    sentence_vec_embeddings &lt;- list(sentence_vec_embeddings)

    <span class="hljs-comment"># Query similar documents</span>
    results &lt;- query(
      client,
      <span class="hljs-string">"recipes_collection"</span>,
      query_embeddings = sentence_vec_embeddings ,
      n_results = <span class="hljs-number">2</span>
    )
    results

  }


  <span class="hljs-comment"># function that provides context</span>
  tool_context  &lt;- tool(
    question,
    <span class="hljs-string">"obtains the right context for a given question"</span>,
    sentence = type_string()

  )

  <span class="hljs-comment">#  Initialize the chat system </span>
  chat &lt;- chat_ollama(system_prompt = <span class="hljs-string">"You are a knowledgeable culinary assistant specializing in recipe recommendations. 
                      You provide tailored meal suggestions based on the user's available ingredients and the desired amount of food or servings.
                      Ensure the recipes align closely with the user's inputs and yield the expected quantity."</span>,
                      model = <span class="hljs-string">"llama3.2:3b-instruct-q4_K_M"</span>)
  <span class="hljs-comment">#register tool</span>
  chat$register_tool(tool_context)

  observeEvent(input$chat_user_input, {
    stream &lt;- chat$stream_async(input$chat_user_input)
    chat_append(<span class="hljs-string">"chat"</span>, stream)
  })
}

shinyApp(ui, server)
</code></pre>
<p>You can find the complete code <a target="_blank" href="https://github.com/elabongaatuo/Recipe-Chatbot/">here</a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building a local Retrieval-Augmented Generation (RAG) application using Ollama and ChromaDB in R offers a powerful way to create a specialized conversational assistant.</p>
<p>By leveraging the capabilities of large language models and vector databases, you can efficiently manage and retrieve relevant information from extensive datasets.</p>
<p>Retrieval grounds the language model’s answers in your own data, and running the whole application locally keeps that data private and lets you customize every part of the pipeline.</p>
<p>Whether you're developing a cooking assistant or any other domain-specific chatbot, this method provides a robust framework for delivering intelligent and contextually aware responses.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744380659737/4e096d1c-87d6-4baa-bbf3-03657e05c182.gif" alt="Chatbot running on Shiny giving relevant recipe after user prompt" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Run Open Source LLMs on Your Own Computer Using Ollama ]]>
                </title>
                <description>
                    <![CDATA[ AI tools have become commonplace these days, and you may use them daily. One of the key ways to secure your confidential data – both personal and business-related – is by running your own AI on your own infrastructure. This guide will explain how to ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-run-open-source-llms-on-your-own-computer-using-ollama/</link>
                <guid isPermaLink="false">6765d8da1ec59713cfa06b50</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Krishna Sarathi Ghosh ]]>
                </dc:creator>
                <pubDate>Fri, 20 Dec 2024 20:51:38 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734681473969/20c1a1cd-898a-4f48-a26f-d2d3d2917efc.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>AI tools have become commonplace these days, and you may use them daily. One of the key ways to secure your confidential data – both personal and business-related – is by running your own AI on your own infrastructure.</p>
<p>This guide will explain how to host an open source LLM on your computer. Doing this helps keep your data from being shared with third-party companies through cloud-based AI solutions.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p><strong>A little AI knowledge</strong>: I’ll cover the main concepts related to what we’ll be doing in the article, but some basic knowledge of LLMs will help you understand it better. No worries if you don’t know anything, though – you should still find this interesting.</p>
</li>
<li><p><strong>A decent computer</strong>: A system with at least 16GB of RAM, a multi-core CPU, and preferably a GPU for optimal performance. (With lower specs, models may run quite slowly.)</p>
</li>
<li><p><strong>Internet connection</strong>: Required to download and install the models.</p>
</li>
<li><p><strong>Time and patience</strong></p>
</li>
</ul>
<h2 id="heading-what-is-an-llm">What is an LLM?</h2>
<p>LLMs, or Large Language Models, are advanced AI systems trained to understand and generate natural, human-readable language. They are neural networks (typically based on the transformer architecture) trained on very large amounts of text, from which they learn patterns and relationships in language.</p>
<p>Companies like OpenAI, Anthropic, and Meta have created LLMs that you can use to perform tasks such as generating content, analyzing code, planning trips, and so on.</p>
<h2 id="heading-cloud-based-ai-vs-self-hosted-ai">Cloud-Based AI vs. Self-Hosted AI</h2>
<p>Before deciding to host an AI model locally, it’s important to understand how this approach differs from cloud-based solutions. Both options have their strengths and are suited to different use cases.</p>
<h3 id="heading-cloud-based-ai-solutions"><strong>Cloud-Based AI Solutions</strong></h3>
<p>These services are hosted and maintained by providers like OpenAI, Google, or AWS. Examples include OpenAI’s GPT models, Google Gemini (formerly Bard), and AWS SageMaker. You access them over the internet through APIs or web interfaces.</p>
<p><strong>Key Characteristics</strong>:</p>
<ul>
<li><p><strong>Easy to use</strong>: Setup is minimal – you simply integrate with an API or use a web interface.</p>
</li>
<li><p><strong>Scalability</strong>: Because the provider manages the infrastructure, these services handle large workloads and many concurrent requests well.</p>
</li>
<li><p><strong>Cutting-edge models</strong>: Often the latest and most powerful models are available in the cloud.</p>
</li>
<li><p><strong>Data dependency</strong>: Your data is sent to the cloud for processing, which may raise privacy concerns.</p>
</li>
<li><p><strong>Ongoing costs</strong>: Some models are free, but others, especially the newest and most powerful ones, are typically billed per request or by usage, which makes them an ongoing operational expense.</p>
</li>
</ul>
<h3 id="heading-self-hosted-ai"><strong>Self-Hosted AI</strong></h3>
<p>With this approach, you run the model on your own hardware. Open-source LLMs like Llama 2, GPT-J, or Mistral can be downloaded and hosted using tools like Ollama.</p>
<p><strong>Key Characteristics</strong>:</p>
<ul>
<li><p><strong>Data privacy</strong>: Your data stays on your infrastructure, giving you full control over it.</p>
</li>
<li><p><strong>More cost-effective over the long term</strong>: Requires an upfront investment in hardware, but avoids recurring API fees.</p>
</li>
<li><p><strong>Customizability</strong>: You can fine-tune and adapt models to specific needs.</p>
</li>
<li><p><strong>Technical requirements</strong>: Requires powerful hardware, setup effort, and technical know-how.</p>
</li>
<li><p><strong>Limited scalability</strong>: Best suited for personal or small-scale use.</p>
</li>
</ul>
<h3 id="heading-which-should-you-choose"><strong>Which Should You Choose?</strong></h3>
<p>If you need quick and scalable access to advanced models and don’t mind sharing data with a third party, cloud-based AI solutions are likely the better option. On the other hand, if data security, customization, or cost savings are top priorities, hosting an LLM locally could be the way to go.</p>
<h2 id="heading-how-can-you-run-llms-locally-on-your-machine">How Can You Run LLMs Locally on Your Machine?</h2>
<p>There are various solutions out there that let you run certain open-source LLMs on your own infrastructure.</p>
<p>While most locally hosted solutions focus on <strong>open-source LLMs</strong> (such as Llama 2, GPT-J, or Mistral), there are cases where proprietary or licensed models can also be run locally, depending on their terms of use.</p>
<ul>
<li><p><strong>Open-Source Models</strong>: These are freely available and can be downloaded, modified, and hosted, subject to their license terms (which, for some models such as Meta’s Llama 2, include usage restrictions). Examples include Llama 2 (Meta), GPT-J, and Mistral.</p>
</li>
<li><p><strong>Proprietary Models with Local Options</strong>: Some companies may offer downloadable versions of their models for offline use, but this often requires specific licensing or hardware. For instance, NVIDIA’s NeMo framework provides tools for hosting their models on your infrastructure, and some smaller companies may offer downloadable versions of their proprietary LLMs for enterprise customers.</p>
</li>
</ul>
<p>Just remember that if you run your own LLM, you’ll need a powerful computer (with a good GPU and CPU). If your computer isn’t very powerful, you can try smaller, more lightweight models, though they may still run slowly.</p>
<p><strong>Here’s an example of a suitable system setup that I am using for this guide</strong>:</p>
<ul>
<li><p>CPU: Intel Core i7 13700HX</p>
</li>
<li><p>RAM: 16GB DDR5</p>
</li>
<li><p>STORAGE: 512GB SSD</p>
</li>
<li><p>GPU: Nvidia RTX 3050 (6GB)</p>
</li>
</ul>
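<p>Whether a model runs well on a machine like this depends largely on whether its weights fit in memory. As a rough rule of thumb (an approximation, not an exact figure), the weights take about parameter count × bits per weight ÷ 8 bytes. The sketch below applies that rule; the function name is just illustrative, and it deliberately ignores the extra memory needed for the context window and runtime overhead.</p>
<pre><code class="lang-python">def estimate_weight_memory_gb(num_params_billion, bits_per_weight):
    """Rough memory needed for the model weights alone, in gigabytes (10**9 bytes).

    This ignores the context (KV) cache and runtime overhead, so real usage is higher.
    """
    bytes_per_weight = bits_per_weight / 8
    # One billion parameters at one byte each is about one gigabyte.
    return num_params_billion * bytes_per_weight


# Compare a 7-billion-parameter model at common precisions.
for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: about {estimate_weight_memory_gb(7, bits):.1f} GB")
</code></pre>
<p>This prints about 14.0 GB, 7.0 GB, and 3.5 GB. On a 6GB GPU like the one above, that suggests the 16-bit weights won’t fit, while a 4-bit quantized version leaves some room for the context.</p>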
<p>In this guide, you’ll be using Ollama to download and run AI models on your PC.</p>
<h3 id="heading-what-is-ollama">What is Ollama?</h3>
<p><a target="_blank" href="http://ollama.com">Ollama</a> is a tool designed to simplify the process of running open-source large language models (LLMs) directly on your computer. It acts as a local model manager and runtime, handling everything from downloading the model files to setting up a local environment where you can interact with them.</p>
<p><strong>Here’s what Ollama helps you do:</strong></p>
<ul>
<li><p><strong>Manage your models</strong>: Ollama provides a straightforward way to browse, download, and manage different open-source models. You can view a list of supported models on their official website.</p>
</li>
<li><p><strong>Deploy easily</strong>: With just a few commands, you can set up a fully functional environment to run and interact with LLMs.</p>
</li>
<li><p><strong>Host locally</strong>: Models run entirely on your infrastructure, ensuring that your data stays private and secure.</p>
</li>
<li><p><strong>Integrate different models</strong>: Ollama exposes a local REST API and offers official libraries for Python and JavaScript, so you can integrate models into your own projects.</p>
</li>
</ul>
<p>By using Ollama, you don’t need to dive deep into the complexities of setting up machine learning frameworks or managing dependencies. It simplifies the process, especially for those who want to experiment with LLMs without needing a deep technical background.</p>
<p>You can install Ollama by clicking the <strong>Download</strong> button on their <a target="_blank" href="http://ollama.com">website</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734604517326/06605e51-4425-4dbe-b8d9-403270eec95b.png" alt="ollama official website" class="image--center mx-auto" width="797" height="651" loading="lazy"></p>
<h3 id="heading-how-to-use-ollama-to-installrun-your-model">How to Use Ollama to Install/Run Your Model</h3>
<p>After you have installed Ollama, follow these steps to install and use your model:</p>
<ol>
<li><p>Open your browser and go to <a target="_blank" href="http://localhost:11434">localhost:11434</a> to make sure Ollama is running.</p>
</li>
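<li><p>Now, open the command prompt and run <code>ollama run &lt;model_name&gt;</code>, replacing <code>&lt;model_name&gt;</code> with a model supported by Ollama, such as <code>llama2</code> (by Meta) or <code>mistral</code>. If the model isn’t on your machine yet, Ollama downloads it first.</p>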
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734604496300/beef69ca-f6e0-44b8-a3a7-ed488e78e776.png" alt="picture of a command prompt window where the llama2 model is being installed" class="image--center mx-auto" width="1468" height="158" loading="lazy"></p>
</li>
<li><p>Wait for the model to finish downloading.</p>
</li>
<li><p>In the prompt that says <code>&gt;&gt;&gt; Send a message (/? for help)</code>, write a message to the AI and press Enter.</p>
</li>
</ol>
<p>You have successfully installed your model and now you can chat with it!</p>
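<p>If you’d prefer to check from a script rather than the browser that the Ollama server is up, the small sketch below requests Ollama’s default local address. It assumes the server listens on <code>http://localhost:11434</code> and that its root URL replies with the text <code>Ollama is running</code>, which is what current versions return; <code>is_ollama_running</code> is a hypothetical helper name.</p>
<pre><code class="lang-python">import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at base_url, otherwise False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and "Ollama is running" in body
    except OSError:
        # URLError (connection refused, unknown host) and timeouts are both OSError subclasses.
        return False


if is_ollama_running():
    print("Ollama is running.")
else:
    print("Could not reach Ollama. Is the app started?")
</code></pre>
<p>Catching <code>OSError</code> keeps the check from crashing when nothing is listening, so you can call it at the start of your own scripts before sending any prompts.</p>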
<h2 id="heading-building-a-chatbot-with-your-newly-installed-model">Building a Chatbot with Your Newly Installed Model</h2>
<p>With open-source models running on your own infrastructure, you have a lot of freedom to adapt and use the models however you like. You can even build local chatbots or applications for personal use with the <code>ollama</code> library for Python, JavaScript, and other languages.</p>
<p>Now let’s walk through how you can build a chatbot with it in Python in just a few minutes.</p>
<h3 id="heading-step-1-install-python">Step 1: Install Python</h3>
<p>If you don’t already have Python installed, download and install it from the <a target="_blank" href="https://www.python.org/">official Python website</a>. For best compatibility, avoid a brand-new Python release, as some modules may not fully support it yet. Choosing the previous stable version helps ensure that all required modules work smoothly.</p>
<p>If you’re on Windows, make sure to give the installer admin privileges and check the <strong>Add to PATH</strong> checkbox.</p>
<h3 id="heading-step-2-install-ollama">Step 2: Install the Ollama Python Library</h3>
<p>Now, open a new terminal window in the directory where you plan to save your Python file. You can open the directory in File Explorer, <strong>right-click</strong>, and then click <strong>Open in Terminal</strong> (<strong>Open with Command Prompt</strong> or <strong>PowerShell</strong> if you’re using Windows 10 or an earlier version).</p>
<p>Type <code>pip install ollama</code> and press Enter. This installs the <code>ollama</code> library for Python, which lets you access your models and Ollama’s features from Python code. Wait until the process finishes.</p>
<h3 id="heading-step-3-add-the-python-code">Step 3: Add the Python Code</h3>
<p>Go ahead and create a Python file with the <code>.py</code> extension somewhere in your file system where you can access it easily. Open the file with your favourite code editor. If you don’t have one installed, you can use the online version of <a target="_blank" href="https://vscode.dev/">VS Code</a> in your browser.</p>
<p>Now, add this code to your Python file:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> ollama <span class="hljs-keyword">import</span> chat

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">stream_response</span>(<span class="hljs-params">user_input</span>):</span>
    <span class="hljs-string">"""Stream the response from the chat model and display it in the CLI."""</span>
    <span class="hljs-keyword">try</span>:
        print(<span class="hljs-string">"\nAI: "</span>, end=<span class="hljs-string">""</span>, flush=<span class="hljs-literal">True</span>)
        stream = chat(model=<span class="hljs-string">'llama2'</span>, messages=[{<span class="hljs-string">'role'</span>: <span class="hljs-string">'user'</span>, <span class="hljs-string">'content'</span>: user_input}], stream=<span class="hljs-literal">True</span>)
        <span class="hljs-keyword">for</span> chunk <span class="hljs-keyword">in</span> stream:
            content = chunk[<span class="hljs-string">'message'</span>][<span class="hljs-string">'content'</span>]
            print(content, end=<span class="hljs-string">''</span>, flush=<span class="hljs-literal">True</span>)
        print() 
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"\nError: <span class="hljs-subst">{str(e)}</span>"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    print(<span class="hljs-string">"Welcome to your CLI AI Chatbot! Type 'exit' to quit.\n"</span>)
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        user_input = input(<span class="hljs-string">"You: "</span>)
        <span class="hljs-keyword">if</span> user_input.lower() <span class="hljs-keyword">in</span> {<span class="hljs-string">"exit"</span>, <span class="hljs-string">"quit"</span>}:
            print(<span class="hljs-string">"Goodbye!"</span>)
            <span class="hljs-keyword">break</span>
        stream_response(user_input)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p>If you don’t understand Python code, here’s what it basically does:</p>
<ul>
<li><p>First, the <code>chat</code> function is imported from the <code>ollama</code> library, which contains pre-written code to communicate with the Ollama application on your computer.</p>
</li>
<li><p>Then a <code>stream_response</code> function is declared, which passes your prompt to the specified model (<code>llama2</code> here) and streams the response back to you, printing it chunk by chunk as it is generated.</p>
</li>
<li><p>Finally, the <code>main</code> function prints a welcome message to the terminal and then runs a <code>while True</code> (infinite) loop. On each pass, it reads your input and passes it to <code>stream_response</code>, so you can keep asking the AI questions without the program exiting. If you type <strong>exit</strong> or <strong>quit</strong> (in any letter case), the loop ends and the program stops.</p>
</li>
</ul>
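<p>The script above sends only your latest message to the model, so it doesn’t see earlier turns of the conversation. A minimal sketch of keeping context is to store every message in a list and pass the whole list on each call. Here, <code>ask</code> and <code>history</code> are hypothetical names, and <code>chat_fn</code> is assumed to behave like <code>ollama.chat</code> with <code>stream=True</code> (yielding chunks whose <code>['message']['content']</code> holds the next piece of text). Passing it in as a parameter keeps the function easy to test without a running server.</p>
<pre><code class="lang-python">def ask(history, user_input, chat_fn, model="llama2"):
    """Send the conversation so far, stream the reply, and remember both turns.

    history is a list of {"role": ..., "content": ...} dicts that grows on each call.
    chat_fn is assumed to behave like ollama.chat(..., stream=True).
    """
    history.append({"role": "user", "content": user_input})
    print("\nAI: ", end="", flush=True)
    reply_parts = []
    for chunk in chat_fn(model=model, messages=history, stream=True):
        piece = chunk["message"]["content"]
        print(piece, end="", flush=True)
        reply_parts.append(piece)
    print()
    reply = "".join(reply_parts)
    # Store the model's answer so the next call includes it as context.
    history.append({"role": "assistant", "content": reply})
    return reply
</code></pre>
<p>To try it, you could add <code>history = []</code> before the <code>while</code> loop in <code>main()</code> and replace <code>stream_response(user_input)</code> with <code>ask(history, user_input, chat)</code>. Keep in mind that the history grows with every turn, and very long conversations can eventually exceed the model’s context window.</p>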
<h3 id="heading-step-4-write-prompts">Step 4: Write Prompts</h3>
<p>Now go back to the terminal window and type <code>python filename.py</code>, replacing <code>filename</code> with the actual file name that you set, and press Enter.</p>
<p>You should see a <code>You:</code> prompt, just as defined in the code. Write your prompt and press Enter, and you should see the AI’s response being streamed. To stop the program, enter <code>exit</code> or close the terminal window.</p>
<p>You can also install the library for JavaScript or another supported language and integrate the AI into that code. Feel free to check the <a target="_blank" href="https://github.com/ollama/ollama/blob/main/docs/README.md">Ollama Official Documentation</a> to see what else you can build with the models.</p>
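<p>If you’d rather not install a client library at all, you can also talk to Ollama’s REST API directly. The sketch below uses only Python’s standard library to call the <code>/api/generate</code> endpoint on the default local address, setting <code>"stream": False</code> so the server returns a single JSON object whose <code>response</code> field holds the generated text. The helper names are hypothetical; check the Ollama API documentation for the full set of request fields.</p>
<pre><code class="lang-python">import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(model, prompt, url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/generate endpoint (non-streaming)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text (needs Ollama running)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server and a downloaded model):
# print(generate("llama2", "Explain what an API is in one sentence."))
</code></pre>
<p>Building the request separately from sending it makes it easy to inspect or reuse the payload; the same JSON-over-HTTP approach works from any language that can make HTTP requests.</p>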
<h2 id="heading-how-to-customize-your-models-with-fine-tuning">How to Customize Your Models with Fine-Tuning</h2>
<h3 id="heading-what-is-fine-tuning">What is Fine-Tuning?</h3>
<p>Fine-tuning is the process of taking a pre-trained language model and training it further on a custom dataset for a specific purpose. While LLMs are trained on massive general-purpose datasets, they may not always align with your needs. Fine-tuning lets you adapt the model to your particular use case.</p>
<h3 id="heading-how-to-fine-tune-a-model">How to Fine-Tune a Model</h3>
<p>Fine-tuning requires:</p>
<ul>
<li><p><strong>A pre-trained model</strong>: I’d suggest starting with a powerful open-source LLM like Llama, Mistral, or Falcon.</p>
</li>
<li><p><strong>A quality dataset</strong>: A <strong>dataset</strong> is a collection of data that is used for training, testing, or evaluating machine learning models, including LLMs. The quality and relevance of the dataset directly influence how well the model performs on a given task. Use a dataset relevant to your domain or task. For example, if you want the AI to write blog posts, train it on high-quality blog content.</p>
</li>
<li><p><strong>Sufficient resources</strong>: Fine-tuning involves further training of the model, which requires significant computational resources (preferably a machine with a powerful GPU).</p>
</li>
</ul>
<p>There are several tools you can use to fine-tune your model. <a target="_blank" href="https://unsloth.ai/">Unsloth</a> is a fast option for fine-tuning a model on your own dataset.</p>
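<p>Once you’ve fine-tuned a model, you’ll probably want to run it through Ollama like the others. One way, assuming you’ve exported the result to a GGUF file (a format Ollama can import), is to describe it in a Modelfile and register it with <code>ollama create</code>. The sketch below writes a minimal Modelfile using the <code>FROM</code>, <code>PARAMETER</code>, and <code>SYSTEM</code> instructions; the file name, model name, system prompt, and temperature are hypothetical placeholders.</p>
<pre><code class="lang-python">from pathlib import Path


def write_modelfile(gguf_path, system_prompt, temperature=0.7, out_path="Modelfile"):
    """Write a minimal Ollama Modelfile that builds a model from a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",                     # weights to load
        f"PARAMETER temperature {temperature}",  # sampling randomness
        f'SYSTEM """{system_prompt}"""',         # default system message
    ]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return Path(out_path)


# Hypothetical path to a model exported after fine-tuning.
modelfile = write_modelfile("./my-finetuned-model.gguf", "You write clear, friendly blog posts.")
print(modelfile.read_text(encoding="utf-8"))

# Then, from a terminal in the same directory:
#   ollama create my-blog-writer -f Modelfile
#   ollama run my-blog-writer
</code></pre>
<p>After <code>ollama create</code> finishes, the new model can be used anywhere the built-in ones can, including the Python chatbot above by changing the <code>model</code> argument.</p>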
<h2 id="heading-what-are-the-benefits-of-self-hosted-llms">What Are the Benefits of Self-hosted LLMs?</h2>
<p>As I’ve briefly discussed above, there are various reasons to self-host an LLM. To summarize, here are some of the top benefits:</p>
<ul>
<li><p>Enhanced data privacy and security, as your data does not leave your computer, and you have complete control over it.</p>
</li>
<li><p>Cost savings, as you don’t need to pay for API subscriptions regularly. Instead, you make a one-time investment in powerful-enough hardware that keeps you going in the long run.</p>
</li>
<li><p>Great customizability, as you get to tailor the models to your specific needs through fine-tuning or training on your own datasets.</p>
</li>
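<li><p>Lower network latency, as requests don’t have to travel to a remote server (although on modest hardware, the model itself may generate text more slowly than a cloud service).</p>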
</li>
</ul>
<h2 id="heading-when-should-you-not-use-a-self-hosted-ai">When Should You NOT Use a Self-hosted AI?</h2>
<p>That said, self-hosting might not be the right fit for you, for several reasons. First, you may not have the system resources required to run the models, and perhaps you don’t want to or can’t upgrade.</p>
<p>Second, you may not have the technical knowledge or time to set up your own model and fine-tune it. It’s not terribly difficult, but it does require some background knowledge and particular skills. This can also be a problem if you don’t know how to troubleshoot errors that come up.</p>
<p>You also may need your models to be up 24/7, and you might not have the infrastructure to handle it.</p>
<p>None of these issues are insurmountable, but they may inform your decision as to whether you use a cloud-based solution or host your own model.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Hosting your own LLMs can be a game-changer if you value data privacy, cost-efficiency, and customization.</p>
<p>Tools like Ollama make it easier than ever to bring powerful AI models right to your personal infrastructure. While self-hosting isn't without its challenges, it gives you control over your data and the flexibility to adapt models to your needs.</p>
<p>Just make sure you assess your technical capabilities, hardware resources, and project requirements before deciding to go this way. If you need reliability, scalability, and quick access to cutting-edge features, cloud-based LLMs might still be the better fit.</p>
<p>If you liked this article, don’t forget to show your support, and follow me on <a target="_blank" href="https://x.com/Codeskae">X</a> and <a target="_blank" href="https://www.linkedin.com/in/imkrishnasarathi/">LinkedIn</a> to get connected. Also, I create short but informative tech content on <a target="_blank" href="https://youtube.com/@krishcodes">YouTube</a>, so don’t forget to check out my content.  </p>
<p>Thanks for reading this article!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Local AI Development with Ollama Course ]]>
                </title>
                <description>
                    <![CDATA[ Artificial Intelligence is evolving rapidly, and with tools like Ollama, you can bring cutting-edge AI capabilities directly to your local environment. Learning how to harness local large language models (LLMs) can open up a world of opportunities. L... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/local-ai-development-with-ollama-course/</link>
                <guid isPermaLink="false">6745ffa3608b32cb4a868bca</guid>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Tue, 26 Nov 2024 17:04:35 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1732640638451/ecee4cf1-efb6-41cc-823b-4808bc1670df.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Artificial Intelligence is evolving rapidly, and with tools like Ollama, you can bring cutting-edge AI capabilities directly to your local environment. Learning how to harness local large language models (LLMs) can open up a world of opportunities. Local LLMs provide greater control, customization, and data privacy compared to cloud-based AI systems.</p>
<p>We just published a course on the <a target="_blank" href="http://freeCodeCamp.org">freeCodeCamp.org</a> YouTube channel that will teach you all about <strong>setting up and using Ollama to build powerful AI applications locally</strong>. This comprehensive course, created by Paulo Dichone, takes a hands-on approach to exploring Ollama, a tool designed for running LLMs efficiently on your local machine. You’ll learn everything from installing and customizing models to using REST APIs and Python libraries to build real-world AI applications. By the end of the course, you'll have the skills to develop projects like a Grocery List Organizer, a Retrieval-Augmented Generation (RAG) system, and an AI-powered Recruiter Agency.</p>
<h3 id="heading-what-youll-learn">What You’ll Learn</h3>
<p>This course is packed with practical knowledge and real-world applications. Here's a glimpse of the topics covered:</p>
<ol>
<li><p><strong>Getting Started with Ollama</strong></p>
<ul>
<li><p>What Ollama is and why it's a game-changer for local AI development.</p>
</li>
<li><p>Step-by-step setup instructions for your development environment.</p>
</li>
<li><p>Pulling and testing different Ollama models using basic CLI commands.</p>
</li>
</ul>
</li>
<li><p><strong>Deep Dive into LLMs</strong></p>
<ul>
<li><p>Understanding parameters, benchmarks, and how to optimize models for specific use cases.</p>
</li>
<li><p>Customizing models using the Modelfile for tasks like summarization and sentiment analysis.</p>
</li>
</ul>
</li>
<li><p><strong>Ollama REST API</strong></p>
<ul>
<li><p>Learn how to interact with Ollama models programmatically using JSON requests.</p>
</li>
<li><p>Overview of integration techniques, including a Python library for building local LLM applications.</p>
</li>
</ul>
</li>
<li><p><strong>Hands-On Projects</strong></p>
<ul>
<li><p><strong>Grocery List Organizer</strong>: Automate and optimize your grocery planning using an AI-powered assistant.</p>
</li>
<li><p><strong>RAG System</strong>: Build a sophisticated Retrieval-Augmented Generation system, complete with document ingestion, vector database creation, and Streamlit integration for user-friendly interfaces.</p>
</li>
<li><p><strong>AI Recruiter Agency</strong>: Develop a project that showcases how AI can streamline recruitment processes by matching candidates to job descriptions effectively.</p>
</li>
</ul>
</li>
<li><p><strong>Advanced Features and Tools</strong></p>
<ul>
<li><p>Explore multimodal models like Llava for tasks such as image captioning.</p>
</li>
<li><p>Use Ollama's features, such as the "show" function, for interactive model exploration.</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-why-choose-ollama">Why Choose Ollama?</h3>
<p>Running AI models locally provides unique advantages:</p>
<ul>
<li><p><strong>Privacy and Security</strong>: Your data stays on your machine, offering unmatched confidentiality.</p>
</li>
<li><p><strong>Customization</strong>: Fine-tune models to fit your specific needs without relying on external servers.</p>
</li>
<li><p><strong>Cost-Effectiveness</strong>: Save on recurring API costs by working entirely offline.</p>
</li>
</ul>
<p>With detailed lessons, engaging projects, and expert guidance from Paulo Dichone, you’ll be empowered to create AI solutions that are both innovative and impactful.</p>
<h3 id="heading-ready-to-build-with-local-ai">Ready to Build with Local AI?</h3>
<p>Whether you're building practical applications or experimenting with new ideas, this course will equip you with the tools and knowledge to succeed. Watch now <a target="_blank" href="https://youtu.be/GWB9ApTPTv4">on the freeCodeCamp.org YouTube channel</a> (3-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/GWB9ApTPTv4" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
