<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Zoe Isabel Senón - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Zoe Isabel Senón - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 31 May 2026 05:04:47 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/techno0ptimist/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Run an LLM Locally to Interact with Your Documents ]]>
                </title>
                <description>
                    <![CDATA[ Most AI tools require you to send your prompts and files to third-party servers. That’s a non-starter if your data includes private journals, research notes, or sensitive business documents (contracts, board decks, HR files, financials). The good new... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/run-an-llm-locally-to-interact-with-your-documents/</link>
                <guid isPermaLink="false">69619f7198022932a4f500a0</guid>
                
                    <category>
                        <![CDATA[ LLM's ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ollama ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Zoe Isabel Senón ]]>
                </dc:creator>
                <pubDate>Sat, 10 Jan 2026 00:38:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767976983680/2e3671cd-4280-4a32-9508-47fe9c06ab22.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most AI tools require you to send your prompts and files to third-party servers. That’s a non-starter if your data includes private journals, research notes, or sensitive business documents (contracts, board decks, HR files, financials). The good news: you can run capable LLMs locally (on a laptop or your own server) and query your documents without sending a single byte to the cloud.</p>
<p>In this tutorial, you’ll learn how to run an LLM locally and privately, so you can search and chat with sensitive journals and business docs on your own machine. We’ll install <strong>Ollama</strong> and <strong>OpenWebUI</strong>, pick a model that fits your hardware, enable private document search with <strong>nomic-embed-text</strong>, and create a local knowledge base so everything stays on-disk.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-installation">Installation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-settings-for-documents">Settings for Documents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-upload-your-documents">How to Upload Your Documents</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-optional-adding-a-system-prompt">(Optional) Adding a system prompt</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-run-your-llm-locally">How to Run Your LLM Locally</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You’ll need a terminal (Windows, macOS, and Linux all include one, and you can find yours with a quick search) and either Python with pip or Docker, depending on how you prefer to install OpenWebUI.</p>
<h2 id="heading-installation">Installation</h2>
<p>You’ll need <a target="_blank" href="https://ollama.com/download"><strong>Ollama</strong></a> and <a target="_blank" href="https://docs.openwebui.com/getting-started/quick-start/"><strong>OpenWebUI</strong></a>. Ollama runs the models, while OpenWebUI gives you a browser interface to interact with your local LLM, like you would with ChatGPT.</p>
<h3 id="heading-step-1-install-ollama">Step 1: Install Ollama</h3>
<p>Download and install Ollama from its <a target="_blank" href="https://ollama.com/download">official site</a>. Installers are available for <strong>macOS</strong>, <strong>Linux</strong>, and <strong>Windows</strong>. Once installed, verify it’s running by opening a terminal and executing:</p>
<pre><code class="lang-bash">ollama list
</code></pre>
<p>If Ollama is running, this will return a list of your downloaded models (or an empty list if you haven’t pulled any yet).</p>
<h3 id="heading-step-2-install-openwebui">Step 2: Install OpenWebUI</h3>
<p>You can install OpenWebUI either with Python (pip) or with Docker. This tutorial uses pip, but you can find instructions for Docker in the <a target="_blank" href="https://docs.openwebui.com/getting-started/quick-start/">official OpenWebUI docs</a>.</p>
<p>Install OpenWebUI with the following command:</p>
<pre><code class="lang-bash">pip install open-webui
</code></pre>
<p>This works on <strong>macOS, Linux, and Windows</strong>, as long as you have a compatible Python version installed (the official OpenWebUI docs recommend Python 3.11).</p>
<p>Next, start the server:</p>
<pre><code class="lang-bash">open-webui serve
</code></pre>
<p>Then open your browser and go to:</p>
<pre><code class="lang-bash">http://localhost:8080
</code></pre>
<h3 id="heading-step-3-install-a-model">Step 3: Install a Model</h3>
<p>Choose a model from the <a target="_blank" href="https://ollama.com/library">Ollama model list</a> and pull it locally by copying the command provided.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302463715/fbbaabf7-6612-460c-8e09-1c5143eacc1a.png" alt="Screenshot of the model download page with an arrow pointing to the upper-right corner box that includes the installation command with a shortcut to copy-paste" class="image--center mx-auto" width="1748" height="1110" loading="lazy"></p>
<p>For example:</p>
<pre><code class="lang-bash">ollama pull gemma3:4b
</code></pre>
<p>If you’re unsure which model your machine can handle, ask an AI to recommend one based on your hardware. Smaller models (1B–4B) are safer on laptops.</p>
<p>I would recommend Gemma3 as a starter (you can download multiple models and easily switch between them). Pick the <strong>parameter count</strong>, which is the tag after the colon (“:4b”, “:1b”, and so on), based on this guide:</p>
<ul>
<li><p>Tier 1 (small laptops or weak computers): RAM ≤8 GB or no GPU → 1B–2B.</p>
</li>
<li><p>Tier 2: RAM 16 GB, weak GPU → 2B–4B.</p>
</li>
<li><p>Tier 3: RAM ≥16 GB, 6–8 GB VRAM → 4B–9B.</p>
</li>
<li><p>Tier 4: RAM ≥32 GB, 12 GB+ VRAM → 12B+.</p>
</li>
</ul>
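<p>If it helps, the tier list above can be written as a small shell function. This is only a sketch: <code>suggest_model_size</code> is a hypothetical helper (not part of Ollama), it takes whole-number RAM and VRAM values in GB, and it treats Tier 2 as the fallback for any machine the list doesn’t explicitly cover.</p>
<pre><code class="lang-bash"># Hypothetical helper: suggest a model size range from RAM and VRAM (whole GB).
# Thresholds mirror the tier list above; Tier 2 as the fallback is an assumption.
suggest_model_size() {
  local ram_gb="$1" vram_gb="$2"
  local size="2B-4B"                                  # Tier 2 (fallback)
  if [ "$ram_gb" -ge 16 ]; then
    if [ "$vram_gb" -ge 6 ]; then size="4B-9B"; fi    # Tier 3
  fi
  if [ "$ram_gb" -ge 32 ]; then
    if [ "$vram_gb" -ge 12 ]; then size="12B+"; fi    # Tier 4
  fi
  if [ "$ram_gb" -le 8 ]; then size="1B-2B"; fi       # Tier 1: low RAM
  if [ "$vram_gb" -eq 0 ]; then size="1B-2B"; fi      # Tier 1: no GPU
  echo "$size"
}

suggest_model_size 16 8   # prints 4B-9B
</code></pre>
<p>For example, <code>suggest_model_size 8 0</code> prints <code>1B-2B</code>, which points to a tag like <code>gemma3:1b</code>.</p>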
<p>Once you have installed Ollama and your desired model, confirm that the model was downloaded by running <code>ollama list</code> in the terminal:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767465401368/d1b8abc0-7aaa-4c2f-ad4c-30ae908f9e8b.png" alt="Image showing the output of running the &quot;ollama list&quot; command (shows the list of downloaded models, in this case &quot;gemma3:1b&quot;)" width="436" height="190" loading="lazy"></p>
<p>Run OpenWebUI to launch the browser interface with:</p>
<pre><code class="lang-bash">open-webui serve
</code></pre>
<p>Then head over to <a target="_blank" href="http://localhost:8080/">http://localhost:8080/</a>. Now you are ready to start using your LLM locally!</p>
<p><strong>Note</strong>: OpenWebUI will ask you to create login credentials the first time. The account is stored on your own machine, so the details don’t matter much if you only intend to use it locally.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302486263/14d93c7e-415c-463f-82da-fc515f28663a.png" alt="Screenshot of the frontend of a running instance of OpenWebUI, showing the homepage, which includes a text input box in the center with the placeholder &quot;how can I help you today?&quot;, and a side panel with the list of previous chats, and links to &quot;search&quot;, &quot;notes&quot;, &quot;workspace&quot;, and &quot;new chat&quot;, as well as a setting button. At the top there is a model selector that currently has &quot;gemma3:1b&quot; selected as the model to use." class="image--center mx-auto" width="2736" height="1390" loading="lazy"></p>
<h2 id="heading-settings-for-documents">Settings for Documents</h2>
<p>Now we are going to set up everything we need to interact with our local documents. First of all, we need to install the “<a target="_blank" href="https://ollama.com/library/nomic-embed-text"><strong>nomic-embed-text</strong></a>” model to process our documents. Install it with:</p>
<pre><code class="lang-bash">ollama pull nomic-embed-text
</code></pre>
<p><strong>Note</strong>: If you are wondering why we need another model (nomic-embed-text) besides our main one:</p>
<ul>
<li><p>The embedding model (<code>nomic-embed-text</code>) maps each text chunk from your documents to a numerical vector so OpenWebUI can quickly find semantically similar chunks when you ask a question.</p>
</li>
<li><p>The chat model (for example <code>gemma3:1b</code>) receives your question plus those retrieved chunks as context and generates the natural-language response.</p>
</li>
</ul>
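<p>To make the first point concrete, here is a toy similarity check you can run in a terminal. The three-number vectors are made up for illustration (real embeddings have hundreds of dimensions), and <code>cosine</code> is a hypothetical helper, not something Ollama or OpenWebUI provides. Cosine similarity is one common way to compare embeddings: the score gets closer to 1 as two vectors point in the same direction.</p>
<pre><code class="lang-bash"># Hypothetical helper: cosine similarity of two space-separated vectors.
# The vectors used below are toy values, not real nomic-embed-text output.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    for (i = 1; i in x; i++) {          # walk every component of the vectors
      dot += x[i] * y[i]
      na  += x[i] * x[i]
      nb  += y[i] * y[i]
    }
    printf "%.2f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

question="0.9 0.1 0.0"               # pretend embedding of your question
cosine "$question" "0.8 0.2 0.1"     # chunk on the same topic: prints 0.98
cosine "$question" "0.0 0.1 0.9"     # unrelated chunk: prints 0.01
</code></pre>
<p>The chunk with the higher score is the one that would be retrieved first and handed to the chat model as context.</p>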
<p>Next, enable the “<strong>memory</strong>” feature if you want the LLM to carry context from your past conversations into new ones.</p>
<p>Download the adaptive memory function <a target="_blank" href="https://openwebui.com/f/alexgrama7/adaptive_memory_v2"><strong>here</strong></a>. Functions are like plug-ins.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302505221/b247316c-0863-410a-84c9-abc084a6631f.png" alt="Screenshot showing the page (website) for the &quot;adaptive memory v3&quot; function. It shows a big &quot;get&quot; button, that when clicked opens a pop-up view named &quot;Open WebUI URL&quot; with the current placeholder being &quot;http:localhost:8080&quot; (the default WebUI port) and a button to &quot;import to WebUI&quot; and another one below to &quot;Download as JSON export&quot; in case the first one doesn't work)" class="image--center mx-auto" width="1488" height="1206" loading="lazy"></p>
<p>Now we will update our settings to enable these features. Click on your name in the bottom-left corner, then “Settings”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302517617/e73983f3-0e36-4c0a-a61c-96a0a42f1fab.png" alt="Screenshot showing the menu panel that pops up when clicking on the bottom-left round icon with the user's initial and name, showing a list of options, starting with &quot;Settings&quot; and followed by &quot;Archived Chats&quot;, &quot;Playground&quot;, &quot;Admin Panel&quot; and &quot;Sign out&quot;" width="554" height="572" loading="lazy"></p>
<p>Click “Settings”, then go to “Personalization” and enable “Memory”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752935284007/aa42c76b-f38c-4485-b442-8844c6c3a544.png" alt="“Screenshot of the OpenWebUI settings panel with the Personalization tab open and the Memory toggle switched on for saving past conversation context.”" class="image--center mx-auto" width="1802" height="536" loading="lazy"></p>
<p>Now we are going to access the other settings panel (“Admin Panel”). Click again on your name in the bottom-left corner and go to <strong>Admin panel → Settings → Documents</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302570583/96784c55-484b-4c66-bdc4-ce23a7e901a1.png" alt="Screenshot of the OpenWebUI Admin → Settings → Documents page, showing a text input field called &quot;Chunk size&quot; currently set to 512" class="image--center mx-auto" width="1172" height="650" loading="lazy"></p>
<p>On this page (Admin Panel → Settings → Documents), find the “<strong>Embedding</strong>” section, go to “<strong>Embedding Model Engine</strong>”, and choose Ollama from the dropdown on the right. Leave the API Key blank.</p>
<p>Now, under “<strong>Embedding Model</strong>”, enter <code>nomic-embed-text</code>. Then, under “Retrieval”, enable “Full Context Mode”.</p>
<h3 id="heading-chunking-settings">Chunking settings</h3>
<p>You should also set the <strong>chunk size</strong> and <strong>overlap</strong>. OpenWebUI splits documents into smaller chunks before indexing them, since models can’t embed or retrieve very long texts in one piece.</p>
<p>A good default is <strong>128–512 tokens per chunk</strong>, with <strong>10–20% overlap</strong>. Larger chunks preserve more context but are slower and more memory-intensive, while smaller chunks are faster but can lose higher-level meaning. Overlap helps prevent important context from being cut off when text is split.</p>
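<p>To see how chunk size and overlap interact, here is a rough calculation. It assumes a simple sliding window, where each chunk starts a fixed number of tokens after the previous one. OpenWebUI’s actual splitter may cut at different boundaries, so treat the numbers as estimates. <code>chunk_plan</code> is a hypothetical helper written for this example.</p>
<pre><code class="lang-bash"># Hypothetical helper: estimate overlap, stride, and chunk count.
# Arguments: document length in tokens, chunk size in tokens, overlap in percent.
chunk_plan() {
  local doc_tokens="$1" chunk_size="$2" overlap_pct="$3"
  local overlap=$(( chunk_size * overlap_pct / 100 ))   # tokens shared by neighboring chunks
  local stride=$(( chunk_size - overlap ))              # new tokens each chunk adds
  local chunks=1
  if [ "$doc_tokens" -gt "$chunk_size" ]; then
    # one first chunk, plus enough strides (rounded up) to cover the rest
    chunks=$(( 1 + (doc_tokens - chunk_size + stride - 1) / stride ))
  fi
  echo "overlap=$overlap stride=$stride chunks=$chunks"
}

chunk_plan 10000 512 20   # prints overlap=102 stride=410 chunks=25
</code></pre>
<p>With the same 10,000-token document, <code>chunk_plan 10000 128 10</code> prints <code>overlap=12 stride=116 chunks=87</code>: smaller chunks mean many more pieces to embed and search, each carrying less context.</p>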
<p>The table below is a rough guide. For values tuned to your use case, I recommend describing your setup (GPU or laptop model, storage, RAM, and so on) to an LLM like ChatGPT or Claude. It’s worth settling on values up front, <strong>as changing the chunk size or overlap later requires re-uploading your documents.</strong></p>
<h3 id="heading-suggested-chunkoverlap-by-tier">Suggested chunk/overlap by tier</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Tier / scenario</strong></td><td><strong>Typical hardware</strong></td><td><strong>Chunk size (tokens)</strong></td><td><strong>Overlap (%)</strong></td><td><strong>Notes</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Tier 1 – constrained</td><td>≤8 GB RAM, no/weak GPU</td><td>128–256</td><td>10–15</td><td>Prioritizes speed and low memory use.</td></tr>
<tr>
<td>Tier 2 – mid</td><td>16 GB RAM, modest GPU or strong CPU</td><td>256–384</td><td>15–20</td><td>Balanced context vs. performance.</td></tr>
<tr>
<td>Tier 3 – comfortable</td><td>≥16 GB RAM, 6–8 GB VRAM</td><td>384–512</td><td>15–20</td><td>More semantics per chunk, still practical.</td></tr>
<tr>
<td>Dense technical PDFs / legal docs</td><td>Any, but especially Tier 2–3</td><td>384–512</td><td>15–20</td><td>Keeps paragraphs and arguments intact.</td></tr>
<tr>
<td>Short notes, tickets, emails</td><td>Any</td><td>128–256</td><td>10–15</td><td>Items are small, large chunks not needed.</td></tr>
<tr>
<td>Very long queries, need many retrieved chunks</td><td>Any with larger context window</td><td>256–384</td><td>10–15</td><td>Smaller chunks fit more pieces into context.</td></tr>
</tbody>
</table>
</div><h2 id="heading-how-to-upload-your-documents">How to Upload Your Documents</h2>
<p>Now, the final step: uploading your documents! Go to “Workspace” in the side panel, then “Knowledge”, and create a new collection (database). You can start uploading files here.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302584485/63c04901-f5d3-4ac7-bab5-b23362fb83cb.png" alt="Screenshot of the &quot;Workspace&quot; page (after clicking on &quot;workspace&quot; in the side panel) highlight the &quot;Workspace&quot; button on the lefthand side, the &quot;Knowledge&quot; tab being selected from the options at the top within this Workspace page, then &quot;Upload files&quot; which is the first option shown on the list after clicking the &quot;+&quot; (plus) sign button at the right of the text input with the placeholder that says &quot;Search Collection&quot;." class="image--center mx-auto" width="1596" height="672" loading="lazy"></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text">Make sure to check for any errors during the upload. Unfortunately, they only show as temporary pop-ups. Some errors might be due to the format of your files, so make sure to check the console for further error logs.</div>
</div>

<p>Then, within “Workspace”, switch to the “Models” tab and create a new custom model. Creating a custom model and attaching your knowledge base tells OpenWebUI to automatically search your document collection and include the most relevant chunks as context whenever you ask a question.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302593445/b5316a4a-8c8a-4348-a31e-1c10fe0e1abb.png" alt="Screenshot of the &quot;Workspace&quot; page (after clicking on &quot;workspace&quot; in the side panel), highlighting the first tab/option in the upper menu named &quot;Models&quot;, which when clicked shows the list of custom models and an option to create new ones (in this case the user has created one called &quot;Gemma-custom-knowledge&quot;)" class="image--center mx-auto" width="1328" height="560" loading="lazy"></p>
<p>Here, make sure to select your model (in my case “gemma3:1b”) and attach your knowledge base.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302604758/df0c7948-bb9b-4615-8f09-21faaa64fdde.png" alt="Screenshot of the model creation page, highlighting the selectable options under the &quot;Base model (from)&quot; field, specifically highlighting &quot;gemma3:1b&quot; or the model of choice, under the selected-by-default option &quot;select a base model&quot;. The second element highlighted in red is the other field below titled &quot;Knowledge&quot;, with a button called &quot;Select Knowledge&quot;. There are 2 other elements highlighted in yellow (indicating lower priority): the first one is &quot;Model Params&quot; that includes a &quot;system prompt&quot; input field right below, and the other one is &quot;Filters&quot; which includes multiple selectable options depending on the different plugins or &quot;functions&quot; installed." class="image--center mx-auto" width="1724" height="1578" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302612285/8247d1c3-5f84-42de-9861-34416d0b7f10.png" alt="Screenshot showing the options available after clicking &quot;Select Knowledge&quot; under &quot;Knowledge&quot;, highlighting the option that says &quot;COLLECTION&quot; in green followed by the title &quot;Test-knowledge-base&quot; (example title chosen by the author) and the description added by the author (&quot;adding my documents&quot;)" class="image--center mx-auto" width="550" height="310" loading="lazy"></p>
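<p>Conceptually, attaching a knowledge base means the retrieved chunks are placed in the prompt next to your question. The sketch below shows the idea with a hypothetical <code>build_prompt</code> helper. The template wording and the sample chunks are made up for illustration, and OpenWebUI’s real prompt template is different.</p>
<pre><code class="lang-bash"># Hypothetical helper: combine a question with retrieved chunks into one prompt.
# The template text is illustrative only, not the template OpenWebUI uses.
build_prompt() {
  local question="$1" chunk
  shift                                    # remaining arguments are the chunks
  printf 'Answer using only the context below.\n\n'
  for chunk in "$@"; do
    printf 'Context: %s\n' "$chunk"
  done
  printf '\nQuestion: %s\n' "$question"
}

build_prompt "When does the contract renew?" \
  "The agreement renews every January." \
  "Either party may cancel with 30 days of notice."
</code></pre>
<p>The chat model only ever sees this combined text, which is why the quality of the retrieved chunks matters as much as the model you pick.</p>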
<h3 id="heading-optional-adding-a-system-prompt">(Optional) Adding a system prompt</h3>
<p>When creating your custom model in <strong>Workspace → Models</strong>, you can define a <strong>system prompt</strong> that the model will use for context throughout all your conversations.</p>
<p>Here are some examples of information you might want to add:</p>
<ul>
<li><p>context about yourself <em>(“I am a 20-year-old student in bioengineering interested in…”)</em></p>
</li>
<li><p>your preferred communication style <em>(“no fluff”, “be direct”, “be analytical”…)</em></p>
</li>
<li><p>context about how your data is structured</p>
</li>
</ul>
<p><strong>Example system prompt:</strong></p>
<blockquote>
<p>You are a thoughtful, analytical assistant helping me explore patterns and insights in my personal journals. Be direct, avoid speculation, and clearly distinguish between facts from the documents and interpretation.</p>
</blockquote>
<p>This prompt will automatically apply to every chat using this custom model, helping keep responses consistent and aligned with your goals.</p>
<h2 id="heading-how-to-run-your-llm-locally">How to Run Your LLM Locally</h2>
<p>Now open a new chat and make sure to select your custom model:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758302621012/241f461c-acf6-41ae-b68d-ad187790aef4.png" alt="Screenshot showing the &quot;New chat&quot; page after clicking on the &quot;+&quot; (plus) symbol/button next to the custom model name. It shows the options shown when clicking on the input field that says &quot;Search a model&quot; as a placeholder, and the option highlighted within it is the name of the custom model (in this case the author chose the name &quot;Gemma-custom-knowledge&quot;)" class="image--center mx-auto" width="1404" height="944" loading="lazy"></p>
<p>Now you are ready to chat with your own docs in a private local environment!</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text"><strong>Note</strong>: By default, the browser frontend stops streaming the response after five minutes, even though OpenWebUI keeps processing your query in the background. If your query takes longer than five minutes, the response will not appear in the browser. You can reload the page and click “continue response” to get the latest output.</div>
</div>

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">I recommend installing the <a target="_self" href="https://openwebui.com/f/alexgrama7/enhanced_context_tracker_v4">Enhanced Context Tracker</a> function (plugin) to get more visibility into the progress of your query.</div>
</div>

<h2 id="heading-conclusion">Conclusion</h2>
<p>You now have a private LLM stack (<strong>Ollama</strong> for models, <strong>OpenWebUI</strong> for the UI, and <strong>nomic-embed-text</strong> for embeddings) wired to your on-disk knowledge base. Your journals and business docs stay local; nothing is sent to third parties. The main dials are simple: pick a model that fits your hardware, enable memory and full-context retrieval, use sensible chunk/overlap, and check the console when runs stall.</p>
<p>If you need more headroom, deploy the same setup on your own server and keep the privacy guarantees. From here, iterate on model choice, chunking, and prompts, and add the optional functions if you need deeper visibility during long jobs.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
