<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ agents - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ agents - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Tue, 16 Jun 2026 21:35:25 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/agents/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Use Context Hub (chub) to Build a Companion Relevance Engine ]]>
                </title>
                <description>
                    <![CDATA[ Large language models can write code quickly, but they still misremember APIs, miss version-specific details, and forget what they learned at the end of a session. That is the problem Context Hub is t ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-context-hub-chub-to-build-a-companion-relevance-engine/</link>
                <guid isPermaLink="false">69e299d0fd22b8ad6276817b</guid>
                
                    <category>
                        <![CDATA[ context-hub ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Developer Tools ]]>
                    </category>
                
                    <category>
                        <![CDATA[ search ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agentic AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Open Source ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Nataraj Sundar ]]>
                </dc:creator>
                <pubDate>Fri, 17 Apr 2026 20:36:32 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/14f9768e-436d-4c7e-b86c-3d380e821354.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Large language models can write code quickly, but they still misremember APIs, miss version-specific details, and forget what they learned at the end of a session.</p>
<p>That is the problem Context Hub is trying to solve.</p>
<p>Context Hub (<code>chub</code>) gives coding agents curated, versioned documentation and skills that they can search and fetch through a CLI. It also gives them two learning loops: local annotations for agent memory and feedback for maintainers.</p>
<p>In this tutorial, you'll learn how the official <code>chub</code> workflow works, how Context Hub organizes docs and skills, how annotations and feedback create a memory loop, and how to build a <a href="https://github.com/natarajsundar/context-hub-relevance-engine/">companion relevance engine</a> that improves retrieval without breaking the upstream content model.</p>
<p>This tutorial uses two public repositories side by side:</p>
<ul>
<li><p>the official upstream project: <a href="https://github.com/andrewyng/context-hub">andrewyng/context-hub</a></p>
</li>
<li><p>the companion implementation for this article: <a href="https://github.com/natarajsundar/context-hub-relevance-engine/">natarajsundar/context-hub-relevance-engine</a></p>
</li>
</ul>
<p>I've also opened a corresponding upstream pull request from my fork to the main project. If you want to track that work from the article, use the upstream pull request list filtered by author: <a href="https://github.com/andrewyng/context-hub/pulls?q=is%3Apr+author%3Anatarajsundar">andrewyng/context-hub pull requests by <code>natarajsundar</code></a>.</p>
<h2 id="heading-what-well-build">What We'll Build</h2>
<p>By the end of this tutorial, you'll have:</p>
<ul>
<li><p>a clear mental model for how Context Hub works</p>
</li>
<li><p>a working local install of the official <code>chub</code> CLI</p>
</li>
<li><p>a repeatable workflow for search, fetch, annotations, and feedback</p>
</li>
<li><p>a companion repo that layers an additive reranking stage on top of a Context-Hub-style content tree</p>
</li>
<li><p>a small benchmark and local comparison UI you can run end to end</p>
</li>
<li><p>a clear bridge between the companion repo and the smaller upstream PR</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have:</p>
<ul>
<li><p>Node.js 18 or newer</p>
</li>
<li><p>npm</p>
</li>
<li><p>comfort with the terminal</p>
</li>
<li><p>basic familiarity with Markdown</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-how-to-understand-context-hub">How to Understand Context Hub</a></p>
</li>
<li><p><a href="#heading-how-to-understand-the-official-repo-the-companion-repo-and-the-upstream-pr">How to Understand the Official Repo, the Companion Repo, and the Upstream PR</a></p>
</li>
<li><p><a href="#heading-how-to-install-and-use-the-official-cli">How to Install and Use the Official CLI</a></p>
</li>
<li><p><a href="#heading-how-to-understand-docs-skills-and-the-content-layout">How to Understand Docs, Skills, and the Content Layout</a></p>
</li>
<li><p><a href="#heading-how-to-use-incremental-fetch-and-layered-sources">How to Use Incremental Fetch and Layered Sources</a></p>
</li>
<li><p><a href="#heading-how-to-use-annotations-and-feedback-to-create-a-memory-loop">How to Use Annotations and Feedback to Create a Memory Loop</a></p>
</li>
<li><p><a href="#heading-how-to-see-where-relevance-still-misses">How to See Where Relevance Still Misses</a></p>
</li>
<li><p><a href="#heading-how-the-companion-relevance-engine-improves-retrieval">How the Companion Relevance Engine Improves Retrieval</a></p>
</li>
<li><p><a href="#heading-how-to-run-the-companion-repo-end-to-end">How to Run the Companion Repo End to End</a></p>
</li>
<li><p><a href="#heading-how-to-read-the-benchmark-honestly">How to Read the Benchmark Honestly</a></p>
</li>
<li><p><a href="#heading-how-to-connect-the-companion-repo-to-the-upstream-pr">How to Connect the Companion Repo to the Upstream PR</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-sources">Sources</a></p>
</li>
</ol>
<h2 id="heading-how-to-understand-context-hub">How to Understand Context Hub</h2>
<p>Context Hub is easiest to understand as a workflow for turning fast-moving documentation into a reliable input for coding agents.</p>
<p>Instead of asking an agent to rely on whatever it remembers from training data, you give it a predictable contract:</p>
<ol>
<li><p>search for the right entry</p>
</li>
<li><p>fetch the right doc or skill</p>
</li>
<li><p>write code against that curated content</p>
</li>
<li><p>save local lessons as annotations</p>
</li>
<li><p>send doc-quality feedback back to maintainers</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/694ca88d5ac09a5d68c63854/09d75c85-fbb0-4c9a-86d5-8acdff4e1abf.png" alt="Diagram showing the Context Hub loop from developer prompt to agent search and fetch, then annotations and maintainer feedback." style="display:block;margin:0 auto" width="1654" height="307" loading="lazy">

<p>That system boundary matters.</p>
<p>It makes the agent easier to audit, easier to improve, and easier to extend. It also keeps the interface small enough that you can reason about where the failures happen. If the agent still misses the answer, you can ask whether the problem happened during search, fetch, context selection, or generation.</p>
<h2 id="heading-how-to-understand-the-official-repo-the-companion-repo-and-the-upstream-pr">How to Understand the Official Repo, the Companion Repo, and the Upstream PR</h2>
<p>This tutorial is intentionally split across two codebases and one contribution path.</p>
<p>The official upstream project, <a href="https://github.com/andrewyng/context-hub">andrewyng/context-hub</a>, is the source of truth for the real CLI, the content model, and the documented workflows. That's the codebase you should use to learn how <code>chub</code> works today.</p>
<p>The companion repository, <a href="https://github.com/natarajsundar/context-hub-relevance-engine/">natarajsundar/context-hub-relevance-engine</a>, is where the relevance ideas in this article become concrete. It's a companion implementation, not a replacement product. Its job is to make retrieval tradeoffs visible, measurable, and easy to run locally.</p>
<p>The upstream PR is the bridge between those two worlds. The companion repo is where you can iterate faster on benchmarks, reranking, and the comparison UI. The upstream PR is where the smallest reviewable slices can be proposed back to the main project. You can track that thread here: <a href="https://github.com/andrewyng/context-hub/pulls?q=is%3Apr+author%3Anatarajsundar">upstream PR search filtered by author</a>.</p>
<p>That three-part framing keeps the article honest:</p>
<ul>
<li><p><strong>use the upstream repo</strong> to understand the current system</p>
</li>
<li><p><strong>use the companion repo</strong> to explore relevance improvements end to end</p>
</li>
<li><p><strong>use the upstream PR</strong> to show how a larger idea can be broken into reviewable pieces</p>
</li>
</ul>
<h2 id="heading-how-to-install-and-use-the-official-cli">How to Install and Use the Official CLI</h2>
<p>The official quick start is intentionally small.</p>
<pre><code class="language-bash">npm install -g @aisuite/chub
</code></pre>
<p>Once the CLI is installed, you can search for what is available and fetch a specific entry:</p>
<pre><code class="language-bash">chub search openai
chub get openai/chat --lang py
</code></pre>
<p>That's the happy path, but it helps to think through the request flow.</p>
<img src="https://cdn.hashnode.com/uploads/covers/694ca88d5ac09a5d68c63854/c5ff71d4-5e51-48b8-bbd3-fc2aafa93b9d.png" alt="Sequence diagram showing the developer asking the agent for current docs, the agent calling chub search and chub get, and the CLI fetching docs from the registry." style="display:block;margin:0 auto" width="1416" height="683" loading="lazy">

<p>In practice, the most useful detail is that the CLI is designed for the <strong>agent</strong> to use, not just for the human to use by hand.</p>
<p>That's why the upstream CLI also ships a <code>get-api-docs</code> skill. For example, if you use Claude Code, you can copy the skill into your local project like this:</p>
<pre><code class="language-bash">mkdir -p .claude/skills/get-api-docs
cp "$(npm root -g)/@aisuite/chub/skills/get-api-docs/SKILL.md" \
  .claude/skills/get-api-docs/SKILL.md
</code></pre>
<p>That step teaches the agent a retrieval habit:</p>
<blockquote>
<p>Before you write code against a third-party SDK or API, use <code>chub</code> instead of guessing.</p>
</blockquote>
<p>That behavioral rule is often as important as the docs themselves.</p>
<h2 id="heading-how-to-understand-docs-skills-and-the-content-layout">How to Understand Docs, Skills, and the Content Layout</h2>
<p>Context Hub separates content into two categories:</p>
<ul>
<li><p><strong>docs</strong>, which answer “what should the agent know?”</p>
</li>
<li><p><strong>skills</strong>, which answer “how should the agent behave?”</p>
</li>
</ul>
<p>That distinction makes the content model easier to scale. Docs can be versioned and language-specific. Skills can stay short and operational.</p>
<p>The directory structure is also predictable. The content guide organizes entries by author, then by <code>docs</code> or <code>skills</code>, then by entry name.</p>
<img src="https://cdn.hashnode.com/uploads/covers/694ca88d5ac09a5d68c63854/3ac72bc2-c869-4e2e-9294-d63b35991135.png" alt="Diagram showing the content tree from author to docs and skills, with DOC.md and SKILL.md feeding a build step that emits registry and search artifacts." style="display:block;margin:0 auto" width="674" height="739" loading="lazy">

<p>A small example looks like this:</p>
<pre><code class="language-text">author/docs/payments/python/DOC.md
author/docs/payments/python/references/errors.md
author/skills/login-flows/SKILL.md
</code></pre>
<p>This is one of the reasons Context Hub is easy to work with.</p>
<p>The shape of the content is plain Markdown, the main entry file is predictable, and the build output is inspectable. You don't have to reverse-engineer a hidden prompt layer to figure out what the agent is reading.</p>
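<p>To make the layout concrete, here is a minimal Python sketch that walks a local tree shaped like the example above and lists the entries it finds. It assumes only the conventions shown here (author, then <code>docs</code> or <code>skills</code>, then the entry name, with an optional language directory for docs); <code>discover_entries</code> is a hypothetical helper, not part of the <code>chub</code> build.</p>
<pre><code class="language-python">import tempfile
from pathlib import Path


def discover_entries(root):
    """Return (kind, entry_id, language, entry_file) for each DOC.md / SKILL.md.

    Hypothetical helper: assumes the author/docs|skills/entry[/language] layout.
    """
    entries = []
    root = Path(root)
    for entry_file in sorted(root.rglob("*.md")):
        if entry_file.name not in ("DOC.md", "SKILL.md"):
            continue  # reference files such as references/errors.md are not entries
        parts = entry_file.relative_to(root).parts
        author, kind, name = parts[0], parts[1], parts[2]
        # docs may carry a language directory (payments/python/DOC.md); skills do not
        language = parts[3] if kind == "docs" and len(parts) == 5 else None
        entries.append((kind, f"{author}/{name}", language, entry_file.name))
    return entries


# Build the three-file example tree in a temporary directory and scan it.
with tempfile.TemporaryDirectory() as tmp:
    for rel in [
        "author/docs/payments/python/DOC.md",
        "author/docs/payments/python/references/errors.md",
        "author/skills/login-flows/SKILL.md",
    ]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("# placeholder\n")
    found = discover_entries(tmp)

print(found)
</code></pre>
<p>Reference files such as <code>references/errors.md</code> are skipped on purpose: they belong to the entry above them rather than being entries of their own.</p>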
<h2 id="heading-how-to-use-incremental-fetch-and-layered-sources">How to Use Incremental Fetch and Layered Sources</h2>
<p>One of the best design choices in Context Hub is that it doesn't force you to inject every file into the model on every request.</p>
<p>Instead, the entry file gives you the overview, and the reference files hold the deeper material.</p>
<img src="https://cdn.hashnode.com/uploads/covers/694ca88d5ac09a5d68c63854/88d80a48-c991-495a-af25-14a0c0ac9868.png" alt="Diagram showing how chub get can fetch just the main entry file, a specific reference file, or the full entry directory." style="display:block;margin:0 auto" width="592" height="460" loading="lazy">

<p>That lets you fetch content in progressively larger slices.</p>
<pre><code class="language-bash">chub get stripe/webhooks --lang py
chub get stripe/webhooks --lang py --file references/raw-body.md
chub get stripe/webhooks --lang py --full
</code></pre>
<p>This is a token-budget feature as much as it is a documentation feature. A good agent should first load the overview, decide what part of the task matters, and only then fetch the specific supporting file.</p>
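<p>If an agent harness drives the CLI from Python, that overview-first habit can be encoded in a small command builder. This is a sketch that assumes only the <code>--lang</code>, <code>--file</code>, and <code>--full</code> flags shown above; <code>chub_get_command</code> and <code>run_chub</code> are hypothetical names, and <code>run_chub</code> simply shells out with <code>subprocess.run</code>.</p>
<pre><code class="language-python">import subprocess


def chub_get_command(entry_id, lang, file=None, full=False):
    """Build a `chub get` argv list for one slice of an entry (hypothetical helper)."""
    cmd = ["chub", "get", entry_id, "--lang", lang]
    if full:
        cmd.append("--full")  # whole entry directory: the most expensive slice
    elif file:
        cmd += ["--file", file]  # one reference file the overview pointed to
    return cmd  # no extra flag: just the main entry file (the overview)


def run_chub(cmd):
    """Run the CLI and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


overview = chub_get_command("stripe/webhooks", "py")
detail = chub_get_command("stripe/webhooks", "py", file="references/raw-body.md")
everything = chub_get_command("stripe/webhooks", "py", full=True)
print(overview)
print(detail)
print(everything)
</code></pre>
<p>A harness would call <code>run_chub(overview)</code> first and escalate to <code>detail</code> or <code>everything</code> only when the overview does not cover the task.</p>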
<p>Context Hub also supports layered sources. You can merge public content with your own local build output through <code>~/.chub/config.yaml</code>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/694ca88d5ac09a5d68c63854/67465254-7a7c-4cfc-b9f0-9e94d8c3e2f3.png" alt="Diagram showing community, official, and local team sources merging into one search surface for chub search and chub get." style="display:block;margin:0 auto" width="774" height="460" loading="lazy">

<p>A minimal configuration looks like this:</p>
<pre><code class="language-yaml">sources:
  - name: community
    url: https://cdn.aichub.org/v1
  - name: my-team
    path: /opt/team-docs/dist
</code></pre>
<p>That means you can keep public docs in one lane and team-specific runbooks in another lane while still giving the agent one search surface.</p>
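<p>Conceptually, “one search surface” means a query runs over every configured source and each hit remembers where it came from. The sketch below shows only that idea: the source names mirror the YAML above, while the entries and the substring matching are illustrative assumptions. It makes no claim about how <code>chub</code> resolves conflicts between sources.</p>
<pre><code class="language-python">def merged_search(sources, query):
    """Search every source as one surface; each hit is (source_name, entry_id).

    Hypothetical sketch: `sources` maps a source name (as in config.yaml) to
    {entry_id: description}, and matching is a naive case-insensitive substring test.
    """
    needle = query.lower()
    hits = []
    for source_name, entries in sources.items():
        for entry_id, description in entries.items():
            if needle in entry_id.lower() or needle in description.lower():
                hits.append((source_name, entry_id))
    return hits


# Illustrative entries only.
sources = {
    "community": {"stripe/webhooks": "Verify Stripe webhook signatures and handle events."},
    "my-team": {"my-team/stripe-runbook": "On-call runbook for Stripe payment failures."},
}
print(merged_search(sources, "stripe"))
</code></pre>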
<h2 id="heading-how-to-use-annotations-and-feedback-to-create-a-memory-loop">How to Use Annotations and Feedback to Create a Memory Loop</h2>
<p>Context Hub has two different improvement channels.</p>
<p>Annotations are local. They help your agent remember what worked last time. Feedback is shared. It helps maintainers improve the docs for everyone.</p>
<p>That distinction matters because not every lesson belongs in the shared registry. Some lessons are environment-specific. Others point to content quality issues that should be fixed centrally.</p>
<img src="https://cdn.hashnode.com/uploads/covers/694ca88d5ac09a5d68c63854/a8514430-08cb-4085-8047-64df25c603c7.png" alt="Diagram showing the agent fetch/write cycle, then branching to local annotations or maintainer feedback before the next task." style="display:block;margin:0 auto" width="808" height="798" loading="lazy">

<p>Here is what local memory looks like in practice:</p>
<pre><code class="language-bash">chub annotate stripe/webhooks \
  "Remember: Flask request.data must stay raw for Stripe signature verification."
</code></pre>
<p>And here's the feedback path:</p>
<pre><code class="language-bash">chub feedback stripe/webhooks up
</code></pre>
<p>That loop is simple, but it's one of the most important ideas in the project. It turns a one-off debugging lesson into either persistent local memory or a signal that the shared docs need to improve.</p>
<h2 id="heading-how-to-see-where-relevance-still-misses">How to See Where Relevance Still Misses</h2>
<p>The upstream project already has a real ranking story. It uses BM25 and lexical rescue so that package-like identifiers, exact tokens, and fuzzy matches still have a chance to surface.</p>
<p>That is a strong baseline.</p>
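<p>To see how a purely lexical first pass can return nothing for an acronym, here is a textbook Okapi BM25 scorer in Python that indexes entry descriptions only. It is a sketch, not the upstream implementation (which also applies lexical rescue); the first description is borrowed from the companion repo's example output later in this article, and the second is made up.</p>
<pre><code class="language-python">import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Okapi BM25 with Lucene-style IDF; returns {doc_id: score}, best first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        length_norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # absent terms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scores[doc_id] = score
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


docs = {
    "langchain/retrievers": (
        "Composable retrieval patterns for hybrid search, "
        "parent documents, query expansion, and reranking."
    ),
    "stripe/webhooks": "Verify Stripe webhook signatures and handle events.",
}
print(bm25_scores(docs, "rrf"))  # {}
print(list(bm25_scores(docs, "hybrid search")))  # ['langchain/retrievers']
</code></pre>
<p>The empty result for <code>rrf</code> is the gap the rest of this section is about: if the acronym appears only in a reference file name, an index built from descriptions never sees it.</p>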
<p>But developer queries are often much messier than package names.</p>
<p>People search for:</p>
<ul>
<li><p><code>rrf</code></p>
</li>
<li><p><code>signin</code></p>
</li>
<li><p><code>pg vector</code></p>
</li>
<li><p><code>hnsw</code></p>
</li>
<li><p><code>raw body stripe</code></p>
</li>
</ul>
<p>Those aren't “bad” queries. They're realistic shorthand.</p>
<p>And they expose an opportunity in the content model itself: many of the exact answers live in reference files such as <code>references/rrf.md</code>, <code>references/raw-body.md</code>, and <code>references/hnsw.md</code>.</p>
<p>So the question is not whether the current search works at all. It clearly does. The better question is this:</p>
<blockquote>
<p>How can you improve retrieval without breaking the content contract that already makes Context Hub useful?</p>
</blockquote>
<p>The answer in the companion repo is to keep the current model and add a reranking layer on top of it.</p>
<h2 id="heading-how-the-companion-relevance-engine-improves-retrieval">How the Companion Relevance Engine Improves Retrieval</h2>
<p>The companion repository in this article is <a href="https://github.com/natarajsundar/context-hub-relevance-engine/"><code>context-hub-relevance-engine</code></a>.</p>
<p>It keeps the same broad ideas that make Context Hub attractive:</p>
<ul>
<li><p>plain Markdown content</p>
</li>
<li><p><code>DOC.md</code> and <code>SKILL.md</code> entry points</p>
</li>
<li><p>build artifacts you can inspect</p>
</li>
<li><p>local annotations and feedback</p>
</li>
<li><p>progressive fetch behavior</p>
</li>
</ul>
<p>Then it adds one new build artifact: <code>signals.json</code>.</p>
<p>At build time, the engine extracts extra signals such as:</p>
<ul>
<li><p>headings from the main file</p>
</li>
<li><p>titles and tokens from reference files</p>
</li>
<li><p>language and version metadata</p>
</li>
<li><p>source metadata and freshness</p>
</li>
<li><p>annotation overlap</p>
</li>
<li><p>feedback priors</p>
</li>
</ul>
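<p>The actual schema of <code>signals.json</code> lives in the companion repo. As an illustration only, the sketch below builds a record with assumed field names for a subset of the signals above: headings from the main file, tokens from reference file names, and language and version metadata. The sample entry text and its reference files are hypothetical.</p>
<pre><code class="language-python">import json
import re


def extract_signals(entry_id, doc_text, reference_files, language=None, version=None):
    """Build one signals record for an entry (field names are hypothetical)."""
    # Markdown ATX headings ("# Title", "## Section") from the main entry file.
    headings = [
        line.lstrip("#").strip()
        for line in doc_text.splitlines()
        if re.match(r"#{1,6} ", line)
    ]
    # "references/raw-body.md" -> "raw-body" -> {"raw", "body"}
    reference_tokens = sorted({
        token.lower()
        for path in reference_files
        for token in re.split(r"[-_]", path.rsplit("/", 1)[-1].removesuffix(".md"))
        if token
    })
    return {
        "id": entry_id,
        "headings": headings,
        "reference_tokens": reference_tokens,
        "language": language,
        "version": version,
    }


doc_text = "# LangChain retrievers\n\n## Hybrid search\n\n## Reranking\n"
signals = extract_signals(
    "langchain/retrievers",
    doc_text,
    ["references/rrf.md", "references/parent-documents.md"],
    language="python",
)
print(json.dumps(signals, indent=2))
</code></pre>
<p>Splitting <code>references/rrf.md</code> into the token <code>rrf</code> is what lets an acronym query match an entry whose description never mentions it.</p>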
<p>The first pass stays cheap and transparent. The reranker only runs after the baseline has done its work.</p>
<img src="https://cdn.hashnode.com/uploads/covers/694ca88d5ac09a5d68c63854/2ed2dadb-8fff-41ee-904b-0792cafcf744.png" alt="Diagram showing the relevance pipeline from query to BM25 and lexical rescue, then synonym expansion, candidate set building, reranking signals, and final results." style="display:block;margin:0 auto" width="1399" height="541" loading="lazy">

<p>That approach matters for two reasons.</p>
<p>First, it's additive. You don't have to redesign the content tree.</p>
<p>Second, it's measurable. You can define concrete failure modes, fix them one by one, and run the same benchmark every time you change the scorer.</p>
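<p>Here is a minimal sketch of that two-stage shape in Python. A cheap first pass supplies baseline scores, the candidate set also admits entries whose signals share a query token, and a reranker adds weighted bonuses for reference-file, heading, and annotation overlap plus a feedback prior. The weights, field names, and the prior (a Laplace-smoothed up-vote ratio) are assumptions for illustration, not the companion repo's scorer.</p>
<pre><code class="language-python">def feedback_prior(up, down):
    """Laplace-smoothed share of up-votes; 0.5 when there is no feedback yet."""
    return (up + 1) / (up + down + 2)


def rerank(query_tokens, baseline, signals, weights=None):
    """Two-stage rerank sketch with hypothetical signal names and weights.

    baseline: {entry_id: first-pass score}, possibly empty
    signals:  {entry_id: {"reference_tokens": [...], "heading_tokens": [...],
                          "annotation_tokens": [...], "up": int, "down": int}}
    """
    w = weights or {"reference": 3.0, "heading": 1.5, "annotation": 2.0, "feedback": 1.0}
    query = set(query_tokens)
    # Candidate set: first-pass hits plus entries whose signals share a query token.
    candidates = set(baseline) | {
        entry_id
        for entry_id, sig in signals.items()
        if not query.isdisjoint(sig.get("reference_tokens", []) + sig.get("heading_tokens", []))
    }
    results = {}
    for entry_id in candidates:
        sig = signals.get(entry_id, {})
        score = baseline.get(entry_id, 0.0)  # keep the cheap first-pass score
        score += w["reference"] * len(query.intersection(sig.get("reference_tokens", [])))
        score += w["heading"] * len(query.intersection(sig.get("heading_tokens", [])))
        score += w["annotation"] * len(query.intersection(sig.get("annotation_tokens", [])))
        score += w["feedback"] * feedback_prior(sig.get("up", 0), sig.get("down", 0))
        results[entry_id] = round(score, 3)
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical signals for two entries.
signals = {
    "langchain/retrievers": {
        "reference_tokens": ["rrf", "parent", "documents"],
        "heading_tokens": ["hybrid", "search", "reranking"],
        "annotation_tokens": [], "up": 3, "down": 0,
    },
    "stripe/webhooks": {
        "reference_tokens": ["raw", "body"],
        "heading_tokens": ["signature", "verification"],
        "annotation_tokens": ["flask", "raw"], "up": 1, "down": 1,
    },
}
print(rerank(["rrf"], {}, signals))
print(rerank(["raw", "body", "stripe"], {"stripe/webhooks": 2.0}, signals))
</code></pre>
<p>Because every bonus is additive, setting a weight to zero removes that signal without touching the others, which keeps benchmark comparisons between scorer versions easy to interpret.</p>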
<h2 id="heading-how-to-run-the-companion-repo-end-to-end">How to Run the Companion Repo End to End</h2>
<p>Clone the repository from <a href="https://github.com/natarajsundar/context-hub-relevance-engine/">GitHub</a>, move into the project directory, and then install, build, and test:</p>
<pre><code class="language-bash">git clone https://github.com/natarajsundar/context-hub-relevance-engine.git
cd context-hub-relevance-engine
npm install
npm run build
npm test
</code></pre>
<p>The repository has no third-party runtime dependencies, so <code>npm install</code> is mostly there to keep the workflow familiar. The main commands are all plain Node scripts.</p>
<h3 id="heading-how-to-reproduce-a-baseline-miss">How to Reproduce a Baseline Miss</h3>
<p>Start with the query <code>rrf</code>.</p>
<pre><code class="language-bash">node bin/chub-lab.mjs search rrf --mode baseline --lang python
</code></pre>
<p>Expected output:</p>
<pre><code class="language-text">No results.
</code></pre>
<p>Now run the improved mode.</p>
<pre><code class="language-bash">node bin/chub-lab.mjs search rrf --mode improved --lang python
</code></pre>
<p>Expected top result:</p>
<pre><code class="language-text">langchain/retrievers [doc] score=320.24
  Composable retrieval patterns for hybrid search, parent documents, query expansion, and reranking.
</code></pre>
<p>That win happens because the improved mode looks beyond the top-level entry description. It also sees the reference file title <code>rrf</code>, the related terms from query expansion, and the broader token overlap in the extracted signals.</p>
<h3 id="heading-how-to-reproduce-a-workflow-intent-win">How to Reproduce a Workflow-Intent Win</h3>
<p>Try a sign-in query.</p>
<pre><code class="language-bash">node bin/chub-lab.mjs search signin --mode baseline
node bin/chub-lab.mjs search signin --mode improved
</code></pre>
<p>The baseline misses. The improved mode returns <code>playwright-community/login-flows</code> because the reranker treats <code>signin</code>, <code>sign in</code>, <code>login</code>, and <code>authentication</code> as related intent.</p>
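<p>Intent matching like this can start as a hand-maintained synonym table applied before matching. The sketch below assumes such a table; the groups are illustrative (built from queries mentioned in this article), and the companion repo's own expansion rules may differ. It also joins adjacent tokens so that <code>sign in</code> and <code>signin</code> reach the same group.</p>
<pre><code class="language-python">import re

# Hypothetical synonym groups; a query token in a group pulls in the whole group.
SYNONYM_GROUPS = [
    {"signin", "login", "authentication"},
    {"rrf", "reciprocal", "rank", "fusion"},
    {"pgvector", "postgres", "vector"},
]


def expand_query(query):
    """Return the query's tokens plus joined pairs and their synonym groups."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    # Also try each adjacent pair joined: "sign in" -> "signin", "pg vector" -> "pgvector".
    joined = [a + b for a, b in zip(tokens, tokens[1:])]
    expanded = set(tokens) | set(joined)
    for group in SYNONYM_GROUPS:
        if not expanded.isdisjoint(group):
            expanded |= group
    return expanded


print(sorted(expand_query("signin")))  # ['authentication', 'login', 'signin']
print(sorted(expand_query("sign in")))
</code></pre>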
<h3 id="heading-how-to-test-the-memory-loop">How to Test the Memory Loop</h3>
<p>Write a local note:</p>
<pre><code class="language-bash">node bin/chub-lab.mjs annotate stripe/webhooks \
  "Remember: Flask request.data must stay raw for Stripe signature verification."
</code></pre>
<p>Then fetch the doc:</p>
<pre><code class="language-bash">node bin/chub-lab.mjs get stripe/webhooks --lang python
</code></pre>
<p>You will see the main doc content, the list of available reference files, and the appended annotation.</p>
<p>That's the behavior you want from an agent memory loop: learn once, reuse many times.</p>
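<p>Stripped to its core, that loop is a per-entry note store plus a render step that appends saved notes on fetch. The sketch below assumes a JSON file as the store; <code>annotate</code> and <code>render_with_annotations</code> are hypothetical names, and this is not how <code>chub</code> or the companion repo actually persist annotations.</p>
<pre><code class="language-python">import json
import tempfile
from pathlib import Path


def load_store(store_path):
    """Read the JSON note store, or start empty if it does not exist yet."""
    return json.loads(store_path.read_text()) if store_path.exists() else {}


def annotate(store_path, entry_id, note):
    """Append a note for entry_id and write the store back to disk."""
    store = load_store(store_path)
    store.setdefault(entry_id, []).append(note)
    store_path.write_text(json.dumps(store, indent=2))


def render_with_annotations(store_path, entry_id, doc_text):
    """Return the doc followed by any saved notes, the way a fetch could show them."""
    notes = load_store(store_path).get(entry_id, [])
    if not notes:
        return doc_text
    return doc_text + "\n\n## Annotations\n" + "\n".join(f"- {n}" for n in notes)


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp, "annotations.json")
    annotate(store, "stripe/webhooks",
             "Remember: Flask request.data must stay raw for Stripe signature verification.")
    output = render_with_annotations(store, "stripe/webhooks", "# Stripe webhooks")
    untouched = render_with_annotations(store, "openai/chat", "# OpenAI chat")

print(output)
</code></pre>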
<h3 id="heading-how-to-run-the-benchmark">How to Run the Benchmark</h3>
<p>Start from an empty store:</p>
<pre><code class="language-bash">npm run reset-store
node bin/chub-lab.mjs evaluate
</code></pre>
<p>The included synthetic stress set reports the following summary with an empty store:</p>
<table>
<thead>
<tr>
<th>Mode</th>
<th>Top-1 Accuracy</th>
<th>MRR</th>
</tr>
</thead>
<tbody><tr>
<td>baseline</td>
<td>0.333</td>
<td>0.333</td>
</tr>
<tr>
<td>improved</td>
<td>1.000</td>
<td>1.000</td>
</tr>
</tbody></table>
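<p>Both metrics are simple to compute. Top-1 accuracy is the share of queries whose expected entry is ranked first. MRR (mean reciprocal rank) averages 1/rank of the expected entry, counting a miss as 0. Because 1/rank is below 1 for any rank after the first, equal values, as in each row above, mean that (up to rounding) every query either ranked its expected entry first or missed it entirely. A minimal Python version with made-up rankings:</p>
<pre><code class="language-python">def top1_and_mrr(cases):
    """cases: list of (expected_id, ranked_ids). Returns (top1, mrr) rounded to 3 places."""
    top1_hits = 0
    reciprocal_sum = 0.0
    for expected, ranked in cases:
        if expected in ranked:
            rank = ranked.index(expected) + 1  # ranks are 1-based
            reciprocal_sum += 1 / rank
            if rank == 1:
                top1_hits += 1
        # a miss adds 0 to both totals
    n = len(cases)
    return round(top1_hits / n, 3), round(reciprocal_sum / n, 3)


# Made-up rankings: one hit at rank 1, one miss, one hit at rank 2.
cases = [
    ("langchain/retrievers", ["langchain/retrievers", "stripe/webhooks"]),
    ("playwright-community/login-flows", []),
    ("stripe/webhooks", ["langchain/retrievers", "stripe/webhooks"]),
]
print(top1_and_mrr(cases))  # (0.333, 0.5)
</code></pre>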
<p>You can also seed the store and rerun the evaluation:</p>
<pre><code class="language-bash">npm run seed-demo
node bin/chub-lab.mjs evaluate
</code></pre>
<p>That demonstrates how annotations and feedback can raise the scores of relevant entries further when the query overlaps with the agent’s own history.</p>
<h3 id="heading-how-to-launch-the-local-comparison-ui">How to Launch the Local Comparison UI</h3>
<pre><code class="language-bash">npm run serve
</code></pre>
<p>Then open <code>http://localhost:8787</code> in your browser.</p>
<p>The UI lets you compare baseline and improved retrieval, inspect stored annotations and feedback, rebuild the local artifacts, and rerun the benchmark from one place.</p>
<h2 id="heading-how-to-read-the-benchmark-honestly">How to Read the Benchmark Honestly</h2>
<p>The benchmark in this repo is intentionally small.</p>
<p>That is a feature, not a flaw.</p>
<p>The point is not to claim universal search quality. The point is to make a handful of realistic failure modes easy to reproduce:</p>
<ul>
<li><p>acronym queries</p>
</li>
<li><p>shorthand workflow queries</p>
</li>
<li><p>reference-file topic queries</p>
</li>
<li><p>memory-aware reranking</p>
</li>
</ul>
<p>That keeps the evaluation honest.</p>
<p>If a future scoring change breaks <code>rrf</code>, <code>signin</code>, or <code>raw body stripe</code>, you'll know immediately. And if you add a stronger dataset later, you can keep these tests as regression guards.</p>
<p>The benchmark files included in the repo are:</p>
<ul>
<li><p><code>demo/benchmark.json</code></p>
</li>
<li><p><code>docs/benchmark-empty-store.json</code></p>
</li>
<li><p><code>docs/benchmark-seeded-store.json</code></p>
</li>
<li><p><code>docs/relevance-improvement-plan.md</code></p>
</li>
</ul>
<h2 id="heading-how-to-connect-the-companion-repo-to-the-upstream-pr">How to Connect the Companion Repo to the Upstream PR</h2>
<p>A good companion repo is broad enough to explore ideas quickly. A good upstream PR is narrow enough to review.</p>
<p>That's why the two shouldn't be identical.</p>
<p>The companion repository is where you can keep the full relevance story together:</p>
<ul>
<li><p>the local comparison UI</p>
</li>
<li><p>the synthetic benchmark</p>
</li>
<li><p>the richer reranking signals</p>
</li>
<li><p>the debug and explain surfaces</p>
</li>
<li><p>the documentation that walks through tradeoffs end to end</p>
</li>
</ul>
<p>The upstream PR should be smaller and more surgical. In practice, that usually means proposing the most reviewable slices first, such as:</p>
<ol>
<li><p>reference-file signal extraction</p>
</li>
<li><p>explainable score output for debugging</p>
</li>
<li><p>a lightweight benchmark fixture format</p>
</li>
<li><p>one additive reranking hook behind a flag</p>
</li>
</ol>
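<p>To picture slices 2 and 4 together, here is a sketch of an additive reranking hook that is off by default and can explain its scores. The function name, signal names, and bonus values are hypothetical. The point is the shape: with the flag off, results pass through unchanged, and with it on, each score carries a per-signal breakdown a reviewer can check.</p>
<pre><code class="language-python">def rerank_hook(results, signal_scores, enabled=False, explain=False):
    """Additive reranking behind a flag (hypothetical hook).

    results:       list of (entry_id, baseline_score), best first
    signal_scores: {entry_id: {signal_name: bonus}}
    """
    if not enabled:
        return results  # flag off: baseline order and scores untouched
    reranked = []
    for entry_id, base in results:
        bonuses = signal_scores.get(entry_id, {})
        total = base + sum(bonuses.values())
        if explain:
            # Keep the breakdown next to the score so reviewers can audit it.
            reranked.append((entry_id, total, {"baseline": base, **bonuses}))
        else:
            reranked.append((entry_id, total))
    return sorted(reranked, key=lambda item: item[1], reverse=True)


results = [("openai/chat", 4.0), ("langchain/retrievers", 3.5)]
bonuses = {"langchain/retrievers": {"reference_title": 2.0, "feedback_prior": 0.5}}
print(rerank_hook(results, bonuses))
for row in rerank_hook(results, bonuses, enabled=True, explain=True):
    print(row)
</code></pre>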
<p>That keeps the main repository maintainable while still letting the article and companion repo tell the full engineering story. The upstream thread for this work lives here: <a href="https://github.com/andrewyng/context-hub/pulls?q=is%3Apr+author%3Anatarajsundar">andrewyng/context-hub pull requests by <code>natarajsundar</code></a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>What makes Context Hub interesting is not just that it stores documentation. It gives you a clear system boundary for improving coding agents.</p>
<p>You can inspect what the agent reads. You can decide when it should retrieve. You can layer public and private sources. You can persist local lessons. And you can improve ranking without tearing the whole model apart.</p>
<p>The companion relevance engine shows how to keep what already works, make one part of the system measurably better, and package the result in a way other developers can run, inspect, and extend. The upstream PR, in turn, shows how to turn a broad idea into smaller pieces that are realistic to review in the main project.</p>
<h2 id="heading-diagram-attribution">Diagram Attribution</h2>
<p>All diagrams used in this article were created by the author specifically for this tutorial and its companion repository.</p>
<h2 id="heading-sources">Sources</h2>
<ul>
<li><p><a href="https://github.com/andrewyng/context-hub">Context Hub repository</a></p>
</li>
<li><p><a href="https://github.com/andrewyng/context-hub/blob/main/README.md">Context Hub README</a></p>
</li>
<li><p><a href="https://github.com/andrewyng/context-hub/blob/main/cli/README.md">Context Hub CLI README</a></p>
</li>
<li><p><a href="https://github.com/andrewyng/context-hub/blob/main/docs/cli-reference.md">Context Hub CLI reference</a></p>
</li>
<li><p><a href="https://github.com/andrewyng/context-hub/blob/main/docs/content-guide.md">Context Hub content guide</a></p>
</li>
<li><p><a href="https://github.com/andrewyng/context-hub/blob/main/docs/byod-guide.md">Context Hub bring-your-own-docs guide</a></p>
</li>
<li><p><a href="https://github.com/andrewyng/context-hub/blob/main/docs/feedback-and-annotations.md">Context Hub feedback and annotations guide</a></p>
</li>
<li><p><a href="https://github.com/natarajsundar/context-hub-relevance-engine/">Companion repository: <code>context-hub-relevance-engine</code></a></p>
</li>
<li><p><a href="https://github.com/andrewyng/context-hub/pulls?q=is%3Apr+author%3Anatarajsundar">Upstream pull request search filtered by author</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Docker Container Doctor: How I Built an AI Agent That Monitors and Fixes My Containers ]]>
                </title>
                <description>
                    <![CDATA[ Maybe this sounds familiar: your production container crashes at 3 AM. By the time you wake up, it's been throwing the same error for 2 hours. You SSH in, pull logs, decode the cryptic stack trace, Go ]]>
                </description>
                <link>https://www.freecodecamp.org/news/docker-container-doctor-how-i-built-an-ai-agent-that-monitors-and-fixes-my-containers/</link>
                <guid isPermaLink="false">69c1768730a9b81e3a833f20</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agentic AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agents ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Balajee Asish Brahmandam ]]>
                </dc:creator>
                <pubDate>Mon, 23 Mar 2026 17:21:11 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/8bb7701d-e519-407f-92ba-59639e13729d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Maybe this sounds familiar: your production container crashes at 3 AM. By the time you wake up, it's been throwing the same error for 2 hours. You SSH in, pull logs, decode the cryptic stack trace, Google the error, and finally restart it. Twenty minutes of your morning gone. And the worst part? It happens again next week.</p>
<p>I got tired of this cycle. I was running 5 containerized services on a single Linode box – a Flask API, a Postgres database, an Nginx reverse proxy, a Redis cache, and a background worker. Every other week, one of them would crash. The logs were messy. The errors weren't obvious. And I'd waste time debugging something that could've been auto-detected and fixed in seconds.</p>
<p>So I built something better: a Python agent that watches your containers in real time, spots errors, figures out what went wrong using Claude, and fixes them without waking you up. I call it the Container Doctor. It's not magic. It's Docker API + LLM reasoning + some automation glue. Here's exactly how I built it, what went wrong along the way, and what I'd do differently.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-why-not-just-use-prometheus">Why Not Just Use Prometheus?</a></p>
</li>
<li><p><a href="#heading-the-architecture">The Architecture</a></p>
</li>
<li><p><a href="#heading-setting-up-the-project">Setting Up the Project</a></p>
</li>
<li><p><a href="#heading-the-monitoring-script-line-by-line">The Monitoring Script – Line by Line</a></p>
</li>
<li><p><a href="#heading-the-claude-diagnosis-prompt-and-why-structure-matters">The Claude Diagnosis Prompt (and Why Structure Matters)</a></p>
</li>
<li><p><a href="#heading-auto-fix-logic-being-conservative-on-purpose">Auto-Fix Logic – Being Conservative on Purpose</a></p>
</li>
<li><p><a href="#heading-adding-slack-notifications">Adding Slack Notifications</a></p>
</li>
<li><p><a href="#heading-health-check-endpoint">Health Check Endpoint</a></p>
</li>
<li><p><a href="#heading-rate-limiting-claude-calls">Rate Limiting Claude Calls</a></p>
</li>
<li><p><a href="#heading-docker-compose-the-full-setup">Docker Compose – The Full Setup</a></p>
</li>
<li><p><a href="#heading-real-errors-i-caught-in-production">Real Errors I Caught in Production</a></p>
</li>
<li><p><a href="#heading-cost-breakdown-what-this-actually-costs">Cost Breakdown – What This Actually Costs</a></p>
</li>
<li><p><a href="#heading-security-considerations">Security Considerations</a></p>
</li>
<li><p><a href="#heading-what-id-do-differently">What I'd Do Differently</a></p>
</li>
<li><p><a href="#heading-whats-next">What's Next?</a></p>
</li>
</ol>
<h2 id="heading-why-not-just-use-prometheus">Why Not Just Use Prometheus?</h2>
<p>Fair question. Prometheus, Grafana, Datadog – they're all great. But for my setup, they were overkill. I had 5 containers on a $20/month Linode. Setting up Prometheus means deploying a metrics server, configuring exporters for each service, building Grafana dashboards, and writing alert rules. That's a whole side project just to monitor a side project.</p>
<p>Even then, those tools tell you <em>what</em> happened. They'll show you a spike in memory or a 500 error rate. But they won't tell you <em>why</em>. You still need a human to look at the logs, figure out the root cause, and decide what to do.</p>
<p>That's the gap I wanted to fill. I didn't need another dashboard. I needed something that could read a stack trace, understand the context, and either fix it or tell me exactly what to do when I wake up. Claude turned out to be surprisingly good at this. It can read a Python traceback and tell you the issue faster than most junior devs (and some senior ones, honestly).</p>
<h2 id="heading-the-architecture">The Architecture</h2>
<p>Here's how the pieces fit together:</p>
<pre><code class="language-plaintext">┌─────────────────────────────────────────────┐
│                 Docker Host                 │
│                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │   web    │  │   api    │  │    db    │   │
│  │ (nginx)  │  │ (flask)  │  │(postgres)│   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘   │
│       │             │             │         │
│       └─────────────┼─────────────┘         │
│                     │                       │
│               Docker Socket                 │
│                     │                       │
│           ┌─────────┴─────────┐             │
│           │ Container Doctor  │             │
│           │  (Python agent)   │             │
│           └─────────┬─────────┘             │
│                     │                       │
└─────────────────────┼───────────────────────┘
                      │
             ┌────────┴────────┐
             │   Claude API    │
             │  (diagnosis)    │
             └────────┬────────┘
                      │
             ┌────────┴────────┐
             │  Slack Webhook  │
             │  (alerts)       │
             └─────────────────┘
</code></pre>
<p>The flow works like this:</p>
<ol>
<li><p>The Container Doctor runs in its own container with the Docker socket mounted</p>
</li>
<li><p>Every 10 seconds, it pulls the last 50 lines of logs from each target container</p>
</li>
<li><p>It scans for error patterns (keywords like "error", "exception", "traceback", "fatal")</p>
</li>
<li><p>When it finds something, it sends the logs to Claude with a structured prompt</p>
</li>
<li><p>Claude returns a JSON diagnosis: root cause, severity, suggested fix, and whether it's safe to auto-restart</p>
</li>
<li><p>If severity is high and auto-restart is safe, the script restarts the container</p>
</li>
<li><p>Either way, it sends a Slack notification with the full diagnosis</p>
</li>
<li><p>A simple health endpoint lets you check the doctor's own status</p>
</li>
</ol>
<p>The key insight: the script doesn't try to be smart about the diagnosis itself. It outsources all the thinking to Claude. The script's job is just plumbing: collecting logs, routing them to Claude, and executing the response.</p>
<h2 id="heading-setting-up-the-project">Setting Up the Project</h2>
<p>Create your project directory:</p>
<pre><code class="language-bash">mkdir container-doctor &amp;&amp; cd container-doctor
</code></pre>
<p>Here's your <code>requirements.txt</code>:</p>
<pre><code class="language-plaintext">docker==7.0.0
anthropic&gt;=0.28.0
python-dotenv==1.0.0
flask==3.0.0
requests==2.31.0
</code></pre>
<p>Install locally for testing: <code>pip install -r requirements.txt</code></p>
<p>Create a <code>.env</code> file:</p>
<pre><code class="language-bash">ANTHROPIC_API_KEY=sk-ant-...
TARGET_CONTAINERS=web,api,db
CHECK_INTERVAL=10
LOG_LINES=50
AUTO_FIX=true
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
POSTGRES_USER=user
POSTGRES_PASSWORD=changeme
POSTGRES_DB=mydb
MAX_DIAGNOSES_PER_HOUR=20
</code></pre>
<p>A quick note on <code>CHECK_INTERVAL</code>: 10 seconds is aggressive. For production, I'd bump this to 30-60 seconds. I kept it low during development so I could see results faster, and honestly forgot to change it. My API bill reminded me.</p>
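<p>If you want the script to fail fast on a bad <code>.env</code> instead of discovering problems mid-loop, one option is to parse and sanity-check the settings once at startup. This is a sketch, not how <code>container_doctor.py</code> reads its config: <code>load_config</code> is a hypothetical helper, and the 5-second floor and 30-second default for <code>CHECK_INTERVAL</code> are assumptions chosen to match the advice above.</p>
<pre><code class="language-python">import os


def load_config(environ=None):
    """Parse Container Doctor settings from environment variables (sketch)."""
    env = os.environ if environ is None else environ
    if not env.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # Strip whitespace and drop empty entries, so " web, api ,,db" works
    names = [n.strip() for n in env.get("TARGET_CONTAINERS", "").split(",")]
    return {
        "target_containers": [n for n in names if n],
        # The 5-second floor is an arbitrary choice: a typo like 1 can't hammer the host
        "check_interval": max(int(env.get("CHECK_INTERVAL", "30")), 5),
        "log_lines": int(env.get("LOG_LINES", "50")),
        "auto_fix": env.get("AUTO_FIX", "true").strip().lower() == "true",
        "max_diagnoses_per_hour": int(env.get("MAX_DIAGNOSES_PER_HOUR", "20")),
    }
</code></pre>
<p>Calling <code>load_config()</code> with no argument reads the real environment; passing a plain dict makes it easy to test. Stripping the container names up front also keeps stray spaces out of anything that reports the list.</p>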
<h2 id="heading-the-monitoring-script-line-by-line">The Monitoring Script – Line by Line</h2>
<p>Here's the full <code>container_doctor.py</code>. I'll walk through the important parts after:</p>
<pre><code class="language-python">import docker
import json
import time
import logging
import os
import requests
from datetime import datetime, timedelta
from collections import defaultdict
from threading import Thread
from flask import Flask, jsonify
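from anthropic import Anthropic
from dotenv import load_dotenv

# Pick up .env when running locally (a no-op in the container, where
# compose passes the variables in as environment variables)
load_dotenv()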

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

client = Anthropic()
docker_client = None

# --- Config ---
TARGET_CONTAINERS = [
    name.strip() for name in os.getenv("TARGET_CONTAINERS", "").split(",")
    if name.strip()
]
CHECK_INTERVAL = int(os.getenv("CHECK_INTERVAL", "10"))
LOG_LINES = int(os.getenv("LOG_LINES", "50"))
AUTO_FIX = os.getenv("AUTO_FIX", "true").lower() == "true"
SLACK_WEBHOOK = os.getenv("SLACK_WEBHOOK_URL", "")
MAX_DIAGNOSES = int(os.getenv("MAX_DIAGNOSES_PER_HOUR", "20"))

# --- State tracking ---
diagnosis_history = []
fix_history = defaultdict(list)
last_error_seen = {}
rate_limit_counter = defaultdict(int)
rate_limit_reset = datetime.now() + timedelta(hours=1)

app = Flask(__name__)


def get_docker_client():
    """Lazily initialize Docker client."""
    global docker_client
    if docker_client is None:
        docker_client = docker.from_env()
    return docker_client


def get_container_logs(container_name):
    """Fetch last N lines from a container."""
    try:
        container = get_docker_client().containers.get(container_name)
        logs = container.logs(
            tail=LOG_LINES,
            timestamps=True
        ).decode("utf-8", errors="replace")  # don't drop all logs over one bad byte
        return logs
    except docker.errors.NotFound:
        logger.warning(f"Container '{container_name}' not found. Skipping.")
        return None
    except docker.errors.APIError as e:
        logger.error(f"Docker API error for {container_name}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error fetching logs for {container_name}: {e}")
        return None


def detect_errors(logs):
    """Check if logs contain error patterns."""
    error_patterns = [
        "error", "exception", "traceback", "failed", "crash",
        "fatal", "panic", "segmentation fault", "out of memory",
        "killed", "oomkiller", "connection refused", "timeout",
        "permission denied", "no such file", "errno"
    ]
    logs_lower = logs.lower()
    found = []
    for pattern in error_patterns:
        if pattern in logs_lower:
            found.append(pattern)
    return found


def is_new_error(container_name, logs):
    """Check if this is a new error or the same one we already diagnosed."""
    log_hash = hash(logs[-200:])  # Hash last 200 chars
    if last_error_seen.get(container_name) == log_hash:
        return False
    last_error_seen[container_name] = log_hash
    return True


def check_rate_limit():
    """Ensure we don't spam Claude with too many requests."""
    global rate_limit_counter, rate_limit_reset

    now = datetime.now()
    if now &gt; rate_limit_reset:
        rate_limit_counter.clear()
        rate_limit_reset = now + timedelta(hours=1)

    total = sum(rate_limit_counter.values())
    if total &gt;= MAX_DIAGNOSES:
        logger.warning(f"Rate limit reached ({total}/{MAX_DIAGNOSES} per hour). Skipping diagnosis.")
        return False
    return True


def diagnose_with_claude(container_name, logs, error_patterns):
    """Send logs to Claude for diagnosis."""
    if not check_rate_limit():
        return None

    rate_limit_counter[container_name] += 1

    prompt = f"""You are a DevOps expert analyzing container logs.

Container: {container_name}
Timestamp: {datetime.now().isoformat()}
Detected patterns: {', '.join(error_patterns)}

Recent logs:
---
{logs}
---

Analyze these logs and respond with ONLY valid JSON (no markdown, no explanation):
{{
    "root_cause": "One sentence explaining exactly what went wrong",
    "severity": "low|medium|high",
    "suggested_fix": "Step-by-step fix the operator should apply",
    "auto_restart_safe": true or false,
    "config_suggestions": ["ENV_VAR=value", "..."],
    "likely_recurring": true or false,
    "estimated_impact": "What breaks if this isn't fixed"
}}
"""

    try:
        message = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=600,
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        return message.content[0].text
    except Exception as e:
        logger.error(f"Claude API error: {e}")
        return None


def parse_diagnosis(diagnosis_text):
    """Extract JSON from Claude's response."""
    if not diagnosis_text:
        return None
    try:
        start = diagnosis_text.find("{")
        end = diagnosis_text.rfind("}") + 1
        if start &gt;= 0 and end &gt; start:
            json_str = diagnosis_text[start:end]
            return json.loads(json_str)
    except json.JSONDecodeError as e:
        logger.error(f"JSON parse error: {e}")
        logger.debug(f"Raw response: {diagnosis_text}")
    except Exception as e:
        logger.error(f"Failed to parse diagnosis: {e}")
    return None


def apply_fix(container_name, diagnosis):
    """Apply auto-fixes if safe."""
    if not AUTO_FIX:
        logger.info(f"Auto-fix disabled globally. Skipping {container_name}.")
        return False

    if not diagnosis.get("auto_restart_safe"):
        logger.info(f"Claude says restart is unsafe for {container_name}. Skipping.")
        return False

    # Don't restart the same container more than 3 times per hour
    recent_fixes = [
        t for t in fix_history[container_name]
        if t &gt; datetime.now() - timedelta(hours=1)
    ]
    if len(recent_fixes) &gt;= 3:
        logger.warning(
            f"Container {container_name} already restarted {len(recent_fixes)} "
            f"times this hour. Something deeper is wrong. Skipping."
        )
        send_slack_alert(
            container_name, diagnosis,
            extra="REPEATED FAILURE: This container has been restarted 3+ times "
                  "in the last hour. Manual intervention needed."
        )
        return False

    try:
        container = get_docker_client().containers.get(container_name)
        logger.info(f"Restarting container {container_name}...")
        container.restart(timeout=30)
        fix_history[container_name].append(datetime.now())

        # Verify it's actually running before reporting success
        time.sleep(5)
        container.reload()
        if container.status != "running":
            logger.error(f"Container {container_name} failed to start after restart")
            return False

        logger.info(f"Container {container_name} restarted successfully")
        return True
    except Exception as e:
        logger.error(f"Failed to restart {container_name}: {e}")
        return False


def send_slack_alert(container_name, diagnosis, extra=""):
    """Send diagnosis to Slack."""
    if not SLACK_WEBHOOK:
        return

    severity_emoji = {
        "low": "🟡",
        "medium": "🟠",
        "high": "🔴"
    }

    severity = diagnosis.get("severity", "unknown")
    emoji = severity_emoji.get(severity, "⚪")

    blocks = [
        {
            "type": "header",
            "text": {
                "type": "plain_text",
                "text": f"{emoji} Container Doctor Alert: {container_name}"
            }
        },
        {
            "type": "section",
            "fields": [
                {"type": "mrkdwn", "text": f"*Severity:* {severity}"},
                {"type": "mrkdwn", "text": f"*Container:* `{container_name}`"},
                {"type": "mrkdwn", "text": f"*Root Cause:* {diagnosis.get('root_cause', 'Unknown')}"},
                {"type": "mrkdwn", "text": f"*Fix:* {diagnosis.get('suggested_fix', 'N/A')}"},
                {"type": "mrkdwn", "text": f"*Impact:* {diagnosis.get('estimated_impact', 'Unknown')}"},
            ]
        }
    ]

    if diagnosis.get("config_suggestions"):
        suggestions = "\n".join(
            f"• `{s}`" for s in diagnosis["config_suggestions"]
        )
        blocks.append({
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*Config Suggestions:*\n{suggestions}"
            }
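        })

    # Recurring issues get an extra line: a restart is only a band-aid for these
    if diagnosis.get("likely_recurring"):
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": "*🔁 Likely to recur:* fix the root cause, not just the symptom."}
        })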

    if extra:
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*⚠️ {extra}*"}
        })

    try:
        requests.post(SLACK_WEBHOOK, json={"blocks": blocks}, timeout=10)
    except Exception as e:
        logger.error(f"Slack notification failed: {e}")


# --- Health Check Endpoint ---
@app.route("/health")
def health():
    """Health check endpoint for the doctor itself."""
    try:
        get_docker_client().ping()
        docker_ok = True
    except Exception:
        docker_ok = False

    # 503 when degraded, so status-code checks (curl -f, uptime monitors) notice
    return jsonify({
        "status": "healthy" if docker_ok else "degraded",
        "docker_connected": docker_ok,
        "monitoring": TARGET_CONTAINERS,
        "total_diagnoses": len(diagnosis_history),
        "fixes_applied": {k: len(v) for k, v in fix_history.items()},
        "rate_limit_remaining": MAX_DIAGNOSES - sum(rate_limit_counter.values()),
        "uptime_check": datetime.now().isoformat()
    }), (200 if docker_ok else 503)


@app.route("/history")
def history():
    """Return recent diagnosis history."""
    return jsonify(diagnosis_history[-50:])


def monitor_containers():
    """Main monitoring loop."""
    logger.info("Container Doctor starting up")
    logger.info(f"Monitoring: {TARGET_CONTAINERS}")
    logger.info(f"Check interval: {CHECK_INTERVAL}s")
    logger.info(f"Auto-fix: {AUTO_FIX}")
    logger.info(f"Rate limit: {MAX_DIAGNOSES}/hour")

    while True:
        for container_name in TARGET_CONTAINERS:
            container_name = container_name.strip()
            if not container_name:
                continue

            logs = get_container_logs(container_name)
            if not logs:
                continue

            error_patterns = detect_errors(logs)
            if not error_patterns:
                continue

            # Skip if we already diagnosed this exact error
            if not is_new_error(container_name, logs):
                continue

            logger.warning(
                f"Errors detected in {container_name}: {error_patterns}"
            )

            diagnosis_text = diagnose_with_claude(
                container_name, logs, error_patterns
            )
            if not diagnosis_text:
                continue

            diagnosis = parse_diagnosis(diagnosis_text)
            if not diagnosis:
                logger.error("Failed to parse Claude's response. Skipping.")
                continue

            # Record it
            diagnosis_history.append({
                "container": container_name,
                "timestamp": datetime.now().isoformat(),
                "diagnosis": diagnosis,
                "patterns": error_patterns
            })

            logger.info(
                f"Diagnosis for {container_name}: "
                f"severity={diagnosis.get('severity')}, "
                f"cause={diagnosis.get('root_cause')}"
            )

            # Auto-fix only on high severity
            fixed = False
            if diagnosis.get("severity") == "high":
                fixed = apply_fix(container_name, diagnosis)

            # Always notify Slack
            send_slack_alert(
                container_name, diagnosis,
                extra="Auto-restarted" if fixed else ""
            )

        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    # Run Flask health endpoint in background
    flask_thread = Thread(
        target=lambda: app.run(host="0.0.0.0", port=8080, debug=False),
        daemon=True
    )
    flask_thread.start()
    logger.info("Health endpoint running on :8080")

    try:
        monitor_containers()
    except KeyboardInterrupt:
        logger.info("Container Doctor shutting down")
</code></pre>
<p>That's a lot of code, so let me walk through the parts that matter.</p>
<p><strong>Error deduplication (</strong><code>is_new_error</code><strong>)</strong>: This was a lesson I learned the hard way. Without this, the script would see the same error every 10 seconds and spam Claude with identical requests. I hash the last 200 characters of the log output and skip if it matches the last error we saw. Simple, but it cut my API costs by about 80%.</p>
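<p>One caveat with hashing the raw tail: the script fetches logs with <code>timestamps=True</code>, so a crash loop that prints the same error with a fresh timestamp produces a new hash every time, and so does a healthy line logged after the error. A possible refinement, sketched below and not what the script above does, is to fingerprint only the distinct error lines with their timestamps stripped. <code>error_fingerprint</code> is a hypothetical helper, the regex assumes Docker's RFC 3339 timestamp prefix, and the keyword list is trimmed for brevity.</p>
<pre><code class="language-python">import hashlib
import re

# Docker prepends an RFC 3339 timestamp such as 2026-03-15T14:30:00.123456789Z
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z\s+")
# A trimmed version of the keyword list in detect_errors()
ERROR_WORDS = ("error", "exception", "traceback", "fatal", "killed")


def error_fingerprint(logs):
    """Return a stable hash of the distinct error lines, or None if there are none."""
    error_lines = set()
    for line in logs.splitlines():
        text = TIMESTAMP.sub("", line).strip()
        if any(word in text.lower() for word in ERROR_WORDS):
            error_lines.add(text)
    if not error_lines:
        return None
    # Sorting makes the result independent of line order and repeat count;
    # sha256 is also stable across process restarts, unlike the salted hash()
    joined = "\n".join(sorted(error_lines))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
</code></pre>
<p>Storing this value in <code>last_error_seen</code> in place of <code>hash(logs[-200:])</code> would make the comparison ignore timestamps and unrelated log noise. The trade-off: two genuinely separate incidents that print identical error text are treated as one.</p>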
<p><strong>Rate limiting (</strong><code>check_rate_limit</code><strong>)</strong>: Belt and suspenders. Even with deduplication, I cap it at 20 diagnoses per hour. If something is so broken that it's generating 20+ unique errors per hour, you need a human anyway.</p>
<p><strong>Restart throttling (inside</strong> <code>apply_fix</code><strong>)</strong>: If the same container has been restarted 3 times in an hour, something deeper is wrong. A restart loop won't fix a misconfigured database or a missing volume. The script stops restarting and sends a louder Slack alert instead.</p>
<p><strong>Post-restart verification</strong>: After restarting, the script waits 5 seconds and checks if the container is actually running. I've seen cases where a container restarts and immediately crashes again. Without this check, the script would report success while the container is still down.</p>
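<p>A single check after a fixed 5-second sleep can still report success for a container that dies a few seconds later. A variation, sketched here under assumptions, is to re-check several times and also consult Docker's health status when the image or compose file defines a healthcheck (Docker reports it under <code>State.Health</code> in the inspect data, which the Docker SDK exposes as <code>container.attrs</code>). <code>wait_until_stable</code> and its default timings are hypothetical.</p>
<pre><code class="language-python">import time


def wait_until_stable(container, checks=5, interval=2.0):
    """Return True only if the container is still up on every check.

    `container` is assumed to be a Docker SDK Container (anything with
    reload(), status and attrs works, which keeps this easy to test).
    """
    for _ in range(checks):
        time.sleep(interval)
        container.reload()  # refresh status and attrs from the Docker daemon
        if container.status != "running":
            return False
        # Present only when a healthcheck is defined; "starting" is let through
        health = container.attrs.get("State", {}).get("Health")
        if health and health.get("Status") == "unhealthy":
            return False
    return True
</code></pre>
<p>In <code>apply_fix</code>, this would replace the <code>time.sleep(5)</code> and <code>reload()</code> pair. With the defaults it blocks the monitoring loop for about 10 seconds per restart, which is fine for a handful of containers but worth knowing.</p>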
<h2 id="heading-the-claude-diagnosis-prompt-and-why-structure-matters">The Claude Diagnosis Prompt (and Why Structure Matters)</h2>
<p>Getting Claude to return parseable JSON took some iteration. My first attempt used a casual prompt, and I got back paragraphs of explanation with JSON buried somewhere in the middle. Sometimes it used markdown code fences, sometimes not.</p>
<p>The version I landed on is explicit about format:</p>
<pre><code class="language-python">prompt = f"""You are a DevOps expert analyzing container logs.

Container: {container_name}
Timestamp: {datetime.now().isoformat()}
Detected patterns: {', '.join(error_patterns)}

Recent logs:
---
{logs}
---

Analyze these logs and respond with ONLY valid JSON (no markdown, no explanation):
{{
    "root_cause": "One sentence explaining exactly what went wrong",
    "severity": "low|medium|high",
    "suggested_fix": "Step-by-step fix the operator should apply",
    "auto_restart_safe": true or false,
    "config_suggestions": ["ENV_VAR=value", "..."],
    "likely_recurring": true or false,
    "estimated_impact": "What breaks if this isn't fixed"
}}
"""
</code></pre>
<p>A few things I learned:</p>
<p><strong>Include the detected patterns.</strong> Telling Claude "I found 'timeout' and 'connection refused'" helps it focus. Without this, it sometimes fixated on irrelevant warnings in the logs.</p>
<p><strong>Ask for</strong> <code>estimated_impact</code><strong>.</strong> This field turned out to be the most useful in Slack alerts. When your team sees "Database connections will pile up and crash the API within 15 minutes," they act faster than when they see "connection pool exhausted."</p>
<p><code>likely_recurring</code> <strong>is gold.</strong> If Claude says an issue is likely to recur, I know a restart is a band-aid and I need to actually fix the root cause. I flag these in Slack with extra emphasis.</p>
<p>Claude returns something like:</p>
<pre><code class="language-json">{
    "root_cause": "Connection pool exhausted. Default pool size is 5, but app has 8+ concurrent workers.",
    "severity": "high",
    "suggested_fix": "1. Set POOL_SIZE=20 in environment. 2. Add connection timeout of 30s. 3. Consider a connection pooler like PgBouncer.",
    "auto_restart_safe": true,
    "config_suggestions": ["POOL_SIZE=20", "CONNECTION_TIMEOUT=30"],
    "likely_recurring": true,
    "estimated_impact": "API requests will queue and timeout. Users will see 503 errors within 2-3 minutes."
}
</code></pre>
<p>I only auto-restart on <code>high</code> severity. Medium and low issues get logged, sent to Slack, and I deal with them during business hours. This distinction matters: you don't want the script restarting containers over every transient warning.</p>
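<p>Because the restart decision rests on <code>severity</code> and <code>auto_restart_safe</code>, it can be worth checking those fields before acting on them: a capitalized <code>"High"</code> would silently skip the auto-fix, and the string <code>"true"</code> would count as truthy in <code>apply_fix</code>. A minimal sketch, assuming the JSON shape from the prompt above; <code>validate_diagnosis</code> is a hypothetical helper, not part of the script.</p>
<pre><code class="language-python">ALLOWED_SEVERITIES = ("low", "medium", "high")


def validate_diagnosis(diagnosis):
    """Normalize a parsed diagnosis, or return None if it can't be trusted."""
    if not isinstance(diagnosis, dict):
        return None
    severity = str(diagnosis.get("severity", "")).strip().lower()
    if severity not in ALLOWED_SEVERITIES:
        return None  # e.g. "critical", or the literal template "low|medium|high"
    cleaned = dict(diagnosis)
    cleaned["severity"] = severity
    # Only a real JSON true counts as safe; strings, nulls, or a missing key do not
    cleaned["auto_restart_safe"] = diagnosis.get("auto_restart_safe") is True
    suggestions = diagnosis.get("config_suggestions") or []
    if not isinstance(suggestions, list):
        suggestions = [suggestions]
    cleaned["config_suggestions"] = [str(s) for s in suggestions]
    return cleaned
</code></pre>
<p>Calling it right after <code>parse_diagnosis</code> and skipping on <code>None</code> keeps malformed answers on the same path as unparseable ones. Treating anything other than a literal <code>true</code> as unsafe errs toward not restarting.</p>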
<h2 id="heading-auto-fix-logic-being-conservative-on-purpose">Auto-Fix Logic – Being Conservative on Purpose</h2>
<p>The auto-fix function is intentionally limited. Right now it only restarts containers. It doesn't modify environment variables, change configs, or scale services. Here's why:</p>
<p>Restarting is safe and reversible. If the restart makes things worse, the container just crashes again and I get another alert. But if the script started changing environment variables or modifying docker-compose files, a bad decision could cascade across services.</p>
<p>The three safety checks before any restart:</p>
<ol>
<li><p><strong>Global toggle</strong>: <code>AUTO_FIX</code>. Setting it to <code>false</code> (and recreating the container so the new value takes effect) disables every auto-fix at once.</p>
</li>
<li><p><strong>Claude's assessment</strong>: <code>auto_restart_safe</code> must be true. If Claude says "don't restart this, it'll corrupt the database," the script listens.</p>
</li>
<li><p><strong>Restart throttle</strong>: No more than 3 restarts per container per hour. After that, it's a human problem.</p>
</li>
</ol>
<p>If I were building this for a team, I'd add approval flows. Send a Slack message with "Restart?" and two buttons. Wait for a human to click yes. That adds latency but removes the risk of automated chaos.</p>
<h2 id="heading-adding-slack-notifications">Adding Slack Notifications</h2>
<p>Every diagnosis gets sent to Slack, whether the container was restarted or not. The notification includes color-coded severity, root cause, suggested fix, and config suggestions.</p>
<p>The Slack Block Kit formatting makes these alerts scannable. A red dot for high severity, orange for medium, yellow for low. Your team can glance at the channel and know if they need to drop everything or if it can wait.</p>
<p>To set this up, create a Slack app at <a href="https://api.slack.com/apps">api.slack.com/apps</a>, add an incoming webhook, and paste the URL in your <code>.env</code>.</p>
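<p>Before wiring the webhook into the script, it's handy to confirm it works with a one-off message. Incoming webhooks accept a JSON body with a <code>text</code> field. The sketch below uses only the standard library (<code>urllib.request</code>) so it runs before you install anything; the helper names are hypothetical, and it reads the URL from the same <code>SLACK_WEBHOOK_URL</code> variable.</p>
<pre><code class="language-python">import json
import os
import urllib.request


def build_test_payload(text="Container Doctor webhook test"):
    """Smallest useful incoming-webhook payload: a plain text message."""
    return {"text": text}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    url = os.getenv("SLACK_WEBHOOK_URL")
    if url:
        print(post_to_slack(url, build_test_payload()))
    else:
        print("SLACK_WEBHOOK_URL is not set")
</code></pre>
<p>A <code>200</code> status and a message in the channel mean the URL is good.</p>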
<h2 id="heading-health-check-endpoint">Health Check Endpoint</h2>
<p>The doctor needs a doctor. I added a simple Flask endpoint so I can monitor the monitoring script:</p>
<pre><code class="language-bash">curl http://localhost:8080/health
</code></pre>
<p>Returns:</p>
<pre><code class="language-json">{
    "status": "healthy",
    "docker_connected": true,
    "monitoring": ["web", "api", "db"],
    "total_diagnoses": 14,
    "fixes_applied": {"api": 2, "web": 1},
    "rate_limit_remaining": 6,
    "uptime_check": "2026-03-15T14:30:00"
}
</code></pre>
<p>And <code>/history</code> returns the last 50 diagnoses:</p>
<pre><code class="language-bash">curl http://localhost:8080/history
</code></pre>
<p>I point an uptime checker (UptimeRobot, free tier) at the <code>/health</code> endpoint. If the Container Doctor itself goes down, I get an email. It's monitoring all the way down.</p>
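<p>A status-code check tells you the doctor answered, not what it said. If you'd rather also look inside the JSON, for example to catch a lost Docker connection or an exhausted diagnosis budget, a small self-hosted check is one option. This is a sketch assuming the <code>/health</code> fields shown above; <code>health_problems</code> and <code>check</code> are hypothetical helpers.</p>
<pre><code class="language-python">import json
import urllib.request


def health_problems(payload):
    """List human-readable problems found in a /health response."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}")
    if not payload.get("docker_connected"):
        problems.append("doctor cannot reach the Docker daemon")
    if payload.get("rate_limit_remaining") == 0:
        problems.append("diagnosis budget for this hour is used up")
    return problems


def check(url="http://localhost:8080/health", timeout=5):
    """Fetch /health and return its problems; a failed request is a problem too."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError) as exc:
        # URLError and HTTPError are OSError subclasses; bad JSON raises ValueError
        return [f"health check failed: {exc}"]
    return health_problems(payload)
</code></pre>
<p>Run <code>check()</code> from cron and send yourself an alert whenever the returned list is non-empty.</p>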
<h2 id="heading-rate-limiting-claude-calls">Rate Limiting Claude Calls</h2>
<p>This is where I burned money during development. Without rate limiting, the script was sending 100+ requests per hour during a container crash loop. At a few cents per request, that's a few dollars per hour. Not catastrophic, but annoying.</p>
<p>The rate limiter is simple: a counter that resets every hour. The default cap is 20 diagnoses per hour. If you hit the limit, the script logs a warning and skips diagnosis until the window resets. Errors still get detected; they just don't get sent to Claude.</p>
<p>Combined with error deduplication (same error won't trigger a second diagnosis), this keeps my Claude bill under $5/month even with 5 containers monitored.</p>
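<p>The limiter above uses a fixed window: the counter clears once an hour, so up to 40 calls can land in a short span if they straddle a reset. If that matters to you, a sliding window is a small change. The sketch below is an alternative, not what the script does; <code>SlidingWindowLimiter</code> is a hypothetical class, and the injectable <code>clock</code> exists only to make it testable.</p>
<pre><code class="language-python">import bisect
import time


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit=20, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests don't have to wait an hour
        self.calls = []  # timestamps of allowed calls, oldest first

    def allow(self):
        """Record and permit a call if the window has room, else refuse it."""
        now = self.clock()
        # Timestamps at least `window` seconds old fall out of the window;
        # the list is sorted, so bisect finds how many to drop
        expired = bisect.bisect_right(self.calls, now - self.window)
        del self.calls[:expired]
        if len(self.calls) == self.limit:
            return False
        self.calls.append(now)
        return True
</code></pre>
<p>In <code>diagnose_with_claude</code>, a single <code>limiter.allow()</code> call would stand in for both <code>check_rate_limit()</code> and the counter increment.</p>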
<h2 id="heading-docker-compose-the-full-setup">Docker Compose – The Full Setup</h2>
<p>Here's the complete <code>docker-compose.yml</code> with the Container Doctor, a sample web server, API, and database:</p>
<pre><code class="language-yaml">services:
  container_doctor:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: container_doctor
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - TARGET_CONTAINERS=${TARGET_CONTAINERS:-web,api,db}
      - CHECK_INTERVAL=${CHECK_INTERVAL:-10}
      - LOG_LINES=${LOG_LINES:-50}
      - AUTO_FIX=${AUTO_FIX:-true}
      - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
      - MAX_DIAGNOSES_PER_HOUR=${MAX_DIAGNOSES_PER_HOUR:-20}
    ports:
      - "127.0.0.1:8080:8080"
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  web:
    image: nginx:latest
    container_name: web
    ports:
      - "80:80"
    restart: unless-stopped

  api:
    build: ./api
    container_name: api
    environment:
      - DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}
      - POOL_SIZE=20
    depends_on:
      - db
    restart: unless-stopped

  db:
    image: postgres:15
    container_name: db
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    volumes:
      - db_data:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  db_data:
</code></pre>
<p>And the <code>Dockerfile</code>:</p>
<pre><code class="language-dockerfile">FROM python:3.12-slim

WORKDIR /app

RUN apt-get update &amp;&amp; apt-get install -y curl &amp;&amp; rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY container_doctor.py .

EXPOSE 8080

CMD ["python", "-u", "container_doctor.py"]
</code></pre>
<p>Start everything: <code>docker compose up -d</code></p>
<p><strong>Important:</strong> The socket mount (<code>/var/run/docker.sock:/var/run/docker.sock</code>) gives the Container Doctor full access to the Docker daemon. Also, don't copy <code>.env</code> into the Docker image: that bakes your API key into an image layer. Pass environment variables via the compose file or at runtime.</p>
<h2 id="heading-real-errors-i-caught-in-production">Real Errors I Caught in Production</h2>
<p>I've been running this for about 3 weeks now. Here are the actual incidents it caught:</p>
<h3 id="heading-incident-1-oom-kill-week-1">Incident 1: OOM Kill (Week 1)</h3>
<p>Logs showed a single word: <code>Killed</code>. That's Linux's OOMKiller doing its thing.</p>
<p>Claude's diagnosis:</p>
<pre><code class="language-json">{
    "root_cause": "Process killed by OOMKiller. Container is requesting more memory than the 256MB limit allows under load.",
    "severity": "high",
    "suggested_fix": "Increase memory limit to 512MB in docker-compose. Monitor if the leak continues at higher limits.",
    "auto_restart_safe": true,
    "config_suggestions": ["mem_limit: 512m", "memswap_limit: 1g"],
    "likely_recurring": true,
    "estimated_impact": "API is completely down. All requests return 502 from nginx."
}
</code></pre>
<p>The script restarted the container in 3 seconds. I updated the compose file the next morning. Before the Container Doctor, this would've been a 2-hour outage overnight.</p>
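<p>This one was caught because "killed" is in the keyword list, but a bare <code>Killed</code> line is easy to lose in noisy logs. Docker also records the outcome in the container's inspect data: <code>State.OOMKilled</code> reports whether a process in the container was killed for running out of memory since it last started, and exit code 137 means the process received SIGKILL (128 + 9), which is often, though not always, the OOM killer. A sketch of checking those fields alongside the logs; <code>oom_signal</code> is a hypothetical helper that takes the <code>State</code> dict, which the Docker SDK exposes as <code>container.attrs["State"]</code>.</p>
<pre><code class="language-python">def oom_signal(state):
    """Return a short reason if Docker's State dict points to an OOM kill, else None."""
    if state.get("OOMKilled"):
        return "OOMKilled flag set by Docker"
    # 137 = 128 + 9 (SIGKILL): killed from outside, frequently by the OOM killer
    if state.get("ExitCode") == 137:
        return "exit code 137 (SIGKILL)"
    return None
</code></pre>
<p>A positive result could be passed to Claude as an extra detected pattern, giving it a signal that doesn't depend on what the process managed to print before it died.</p>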
<h3 id="heading-incident-2-connection-pool-exhausted-week-2">Incident 2: Connection Pool Exhausted (Week 2)</h3>
<pre><code class="language-plaintext">ERROR: database connection pool exhausted
ERROR: cannot create new pool entry
ERROR: QueuePool limit of 5 overflow 0 reached
</code></pre>
<p>Claude caught that my pool size was too small for the number of workers:</p>
<pre><code class="language-json">{
    "root_cause": "SQLAlchemy connection pool (size=5) can't keep up with 8 concurrent Gunicorn workers. Each worker holds a connection during request processing.",
    "severity": "high",
    "suggested_fix": "Set POOL_SIZE=20 and add POOL_TIMEOUT=30. Long-term: add PgBouncer as a connection pooler.",
    "auto_restart_safe": true,
    "config_suggestions": ["POOL_SIZE=20", "POOL_TIMEOUT=30", "POOL_RECYCLE=3600"],
    "likely_recurring": true,
    "estimated_impact": "New API requests will hang for 30s then timeout. Existing requests may complete but slowly."
}
</code></pre>
<h3 id="heading-incident-3-transient-timeout-week-2">Incident 3: Transient Timeout (Week 2)</h3>
<pre><code class="language-plaintext">WARN: timeout connecting to upstream service
WARN: retrying request (attempt 2/3)
INFO: request succeeded on retry
</code></pre>
<p>Claude correctly identified this as a non-issue:</p>
<pre><code class="language-json">{
    "root_cause": "Transient network timeout during a DNS resolution hiccup. Retries succeeded.",
    "severity": "low",
    "suggested_fix": "No action needed. This is expected during brief network blips. Only investigate if frequency increases.",
    "auto_restart_safe": false,
    "config_suggestions": [],
    "likely_recurring": false,
    "estimated_impact": "Minimal. Individual requests delayed by ~2s but all completed."
}
</code></pre>
<p>No restart. No alert (I filter low-severity from Slack pings). This is the right call: restarting on every transient timeout causes more downtime than it prevents.</p>
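<p><code>monitor_containers</code> as listed calls <code>send_slack_alert</code> for every diagnosis, so the low-severity filtering described here needs a small check in front of that call. A minimal sketch, assuming the three severity levels from the prompt; <code>should_notify</code> and its <code>min_severity</code> default are hypothetical, and an unrecognized severity errs toward alerting.</p>
<pre><code class="language-python">SEVERITY_ORDER = ["low", "medium", "high"]


def should_notify(diagnosis, min_severity="medium"):
    """Decide whether a diagnosis is worth a Slack ping."""
    severity = str(diagnosis.get("severity", "")).lower()
    # Every level from min_severity upward, e.g. ["medium", "high"]
    loud_enough = SEVERITY_ORDER[SEVERITY_ORDER.index(min_severity):]
    # Unknown or missing severities still alert, so a bad parse isn't silenced
    return severity in loud_enough or severity not in SEVERITY_ORDER
</code></pre>
<p>Alerts that carry an <code>extra</code> message, such as the repeated-restart warning, are probably worth sending regardless of severity, so the check fits best around the final "Always notify Slack" call rather than inside <code>send_slack_alert</code>.</p>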
<h3 id="heading-incident-4-disk-full-week-3">Incident 4: Disk Full (Week 3)</h3>
<pre><code class="language-plaintext">ERROR: could not write to temporary file: No space left on device
FATAL: data directory has no space
</code></pre>
<pre><code class="language-json">{
    "root_cause": "Postgres data volume is full. WAL files and temporary sort files consumed all available space.",
    "severity": "high",
    "suggested_fix": "1. Clean WAL files: SELECT pg_switch_wal(). 2. Increase volume size. 3. Add log rotation. 4. Set max_wal_size=1GB.",
    "auto_restart_safe": false,
    "config_suggestions": ["max_wal_size=1GB", "log_rotation_age=1d"],
    "likely_recurring": true,
    "estimated_impact": "Database is read-only. All writes fail. API returns 500 on any mutation."
}
</code></pre>
<p>Notice Claude said <code>auto_restart_safe: false</code> here. Restarting Postgres on a full disk frees no space, and it can leave the database unable to start or put data at risk. The script didn't touch it; it just sent me a detailed Slack alert at 4 AM. I cleaned up the WAL files the next morning. Good call by Claude.</p>
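<p>The decisions in Incidents 3 and 4 boil down to a small gate on the diagnosis JSON. Here's a hypothetical sketch of that gate (the function name and the three action labels are my own, not the Container Doctor's actual code). It only relies on the <code>severity</code> and <code>auto_restart_safe</code> fields shown in the examples above, and it treats anything missing as unsafe.</p>

```python
def plan_action(diagnosis: dict) -> str:
    """Map a diagnosis dict to one action: "restart", "alert", or "log"."""
    # Restart only when the model explicitly marked it safe; a missing or
    # non-boolean field is treated as unsafe.
    if diagnosis.get("auto_restart_safe") is True:
        return "restart"
    # Never touch the container otherwise. Escalate anything that isn't
    # low severity; a missing severity is treated as serious.
    if diagnosis.get("severity", "high") != "low":
        return "alert"
    # Low-severity diagnoses with no restart are only recorded.
    return "log"


transient_timeout = {"severity": "low", "auto_restart_safe": False}
disk_full = {"severity": "high", "auto_restart_safe": False}
print(plan_action(transient_timeout))  # log
print(plan_action(disk_full))          # alert
```

<p>Defaulting to "don't restart, do alert" when fields are missing means a malformed model response fails toward a human, not toward an automated action.</p>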
<h2 id="heading-cost-breakdown-what-this-actually-costs">Cost Breakdown – What This Actually Costs</h2>
<p>After 3 weeks of running this on 5 containers:</p>
<ul>
<li><p><strong>Claude API</strong>: ~$3.80/month (with rate limiting and deduplication)</p>
</li>
<li><p><strong>Linode compute</strong>: $0 extra (the Container Doctor uses about 50MB RAM)</p>
</li>
<li><p><strong>Slack</strong>: Free tier</p>
</li>
<li><p><strong>My time saved</strong>: ~2-3 hours/month of 3 AM debugging</p>
</li>
</ul>
<p>Without rate limiting, my first week cost $8 in API calls. Adding deduplication and rate limiting brought that down to the ~$3.80/month above. Most of my containers run fine, so the script only calls Claude when something actually breaks.</p>
<p>If you're monitoring more containers or have noisier logs, expect higher costs. The <code>MAX_DIAGNOSES_PER_HOUR</code> setting is your budget knob.</p>
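<p>To make the budget knob concrete, here's a minimal sketch of deduplication plus an hourly rate limit. It's an assumption about how such a gate could work, not the Container Doctor's code: the class name, the digit-stripping signature, and the default values are hypothetical. Only <code>MAX_DIAGNOSES_PER_HOUR</code> comes from the article.</p>

```python
import hashlib
import re
import time
from collections import deque
from typing import Deque, Dict, Optional

MAX_DIAGNOSES_PER_HOUR = 10   # hypothetical default: your budget knob
DEDUP_WINDOW_SECONDS = 3600   # hypothetical: skip repeats of the same error for an hour


class DiagnosisGate:
    """Decide whether an error is worth sending to the LLM."""

    def __init__(self, max_per_hour: int = MAX_DIAGNOSES_PER_HOUR,
                 dedup_window: float = DEDUP_WINDOW_SECONDS) -> None:
        self.max_per_hour = max_per_hour
        self.dedup_window = dedup_window
        self.calls: Deque[float] = deque()     # timestamps of recent LLM calls
        self.last_seen: Dict[str, float] = {}  # error signature -> last diagnosis time

    @staticmethod
    def signature(container: str, error_line: str) -> str:
        # Replace digits (timestamps, PIDs, ports) so near-identical errors collapse.
        normalized = re.sub(r"\d+", "N", error_line.strip().lower())
        return hashlib.sha256(f"{container}:{normalized}".encode()).hexdigest()

    def allow(self, container: str, error_line: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Sliding one-hour window: forget calls older than 3600 seconds.
        while self.calls and now - self.calls[0] >= 3600:
            self.calls.popleft()
        sig = self.signature(container, error_line)
        last = self.last_seen.get(sig)
        if last is not None and now - last < self.dedup_window:
            return False  # the same error was diagnosed recently
        if len(self.calls) >= self.max_per_hour:
            return False  # hourly budget used up
        self.calls.append(now)
        self.last_seen[sig] = now
        return True
```

<p>You'd call <code>gate.allow(container, line)</code> before each API request and skip the call when it returns <code>False</code>. A rejected error isn't recorded, so it can still be diagnosed once budget frees up.</p>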
<h2 id="heading-security-considerations">Security Considerations</h2>
<p>Let's talk about the elephant in the room: the Docker socket.</p>
<p>Mounting <code>/var/run/docker.sock</code> gives the Container Doctor <strong>root-equivalent access</strong> to your Docker daemon. It can start, stop, and remove any container. It can pull images. It can exec into running containers. If someone compromises the Container Doctor, they own your entire Docker host.</p>
<p>Here's how I mitigate this:</p>
<ol>
<li><p><strong>Network isolation</strong>: The Container Doctor's health endpoint is only exposed on localhost. In production, put it behind a reverse proxy with auth.</p>
</li>
<li><p><strong>Read-mostly access</strong>: The script only <em>reads</em> logs and <em>restarts</em> containers. It never execs into containers, pulls images, or modifies volumes.</p>
</li>
<li><p><strong>No external inputs</strong>: The script doesn't accept commands from Slack or any external source. It's outbound-only (logs out, alerts out).</p>
</li>
<li><p><strong>API key rotation</strong>: I rotate the Anthropic API key monthly. If the container is compromised, a stolen key stays useful only until the next rotation.</p>
</li>
</ol>
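<p>For a more secure setup, don't rely on mounting the socket read-only (<code>/var/run/docker.sock:/var/run/docker.sock:ro</code>): that only protects the socket file itself, not the Docker API calls sent through it. Instead, put a tool like <a href="https://github.com/Tecnativa/docker-socket-proxy">docker-socket-proxy</a> in front of the daemon to restrict which API calls the Container Doctor can make.</p>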
<h2 id="heading-what-id-do-differently">What I'd Do Differently</h2>
<p>After 3 weeks in production, here's my honest retrospective:</p>
<p><strong>I'd use structured logging from day one.</strong> My regex-based error detection catches too many false positives. A JSON log format with severity levels would make detection way more accurate.</p>
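<p>Here's roughly what detection looks like once logs are structured. This is a sketch under the assumption that each line is a JSON object with a <code>level</code> field (the field name and level set are hypothetical; match them to your logger). Instead of regex-matching the word "error" anywhere in the text, it only trusts the declared severity.</p>

```python
import json

ALERT_LEVELS = {"ERROR", "CRITICAL", "FATAL"}  # assumed level names


def is_error(line: str) -> bool:
    """Return True if a JSON log line declares an error-level severity.

    Non-JSON lines are ignored in this sketch, so a message that merely
    mentions "error" (e.g. "retried after error") doesn't trigger.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    return str(record.get("level", "")).upper() in ALERT_LEVELS
```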
<p><strong>I'd add per-container policies.</strong> Right now, every container gets the same treatment. But you probably want different rules for a database vs a web server. Never auto-restart a database. Always auto-restart a stateless web server.</p>
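<p>A per-container policy could be as simple as a lookup table consulted before any restart. The sketch below is hypothetical (the container names, fields, and limits are illustrative). Containers without an explicit entry get the most conservative treatment.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerPolicy:
    auto_restart: bool          # may the doctor restart this container at all?
    max_restarts_per_hour: int  # cap on automatic restarts


# Hypothetical policy table keyed by container name.
POLICIES = {
    "postgres": ContainerPolicy(auto_restart=False, max_restarts_per_hour=0),
    "web": ContainerPolicy(auto_restart=True, max_restarts_per_hour=3),
}
# Conservative default for containers without an explicit policy.
DEFAULT_POLICY = ContainerPolicy(auto_restart=False, max_restarts_per_hour=0)


def may_restart(container: str, diagnosis: dict, restarts_this_hour: int) -> bool:
    """Restart only if the policy, the diagnosis, and the hourly cap all agree."""
    policy = POLICIES.get(container, DEFAULT_POLICY)
    return (
        policy.auto_restart
        and diagnosis.get("auto_restart_safe") is True
        and restarts_this_hour < policy.max_restarts_per_hour
    )
```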
<p><strong>I'd build a simple web UI.</strong> The <code>/history</code> endpoint returns JSON, but a small React dashboard showing a timeline of incidents, fix success rates, and cost tracking would be much more useful.</p>
<p><strong>I'd try local models first.</strong> For simple errors (OOM, connection refused), a small local model running on Ollama could handle the diagnosis without any API cost. Reserve Claude for the weird, complex stack traces where you actually need strong reasoning.</p>
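<p>The routing decision itself can stay cheap and deterministic. Here's a hypothetical sketch: short log excerpts matching well-known patterns go to a local model, and everything else goes to Claude. The patterns and the five-line cutoff are assumptions, and the actual Ollama and Claude calls are deliberately left out.</p>

```python
import re

# Hypothetical patterns for "simple" errors a small local model can handle.
SIMPLE_PATTERNS = [
    re.compile(r"out of memory|OOMKilled", re.IGNORECASE),
    re.compile(r"connection refused", re.IGNORECASE),
]


def choose_backend(log_excerpt: str) -> str:
    """Return "local" for short, well-known errors and "claude" otherwise."""
    # Long stack traces go to the stronger model even if they mention a known error.
    is_short = len(log_excerpt.splitlines()) <= 5
    if is_short and any(p.search(log_excerpt) for p in SIMPLE_PATTERNS):
        return "local"
    return "claude"
```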
<p><strong>I'd add a "learning mode."</strong> Run the Container Doctor in observe-only mode for a week. Let it diagnose everything but fix nothing. Review the diagnoses manually. Once you trust its judgment, flip on auto-fix. This builds confidence before you give it restart power.</p>
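<p>One way to wire up that learning mode is a single mode switch that turns "restart" into "record what I would have done." In this sketch, the environment variable name and the action labels are hypothetical. It defaults to observe-only, so auto-fix is something you opt into.</p>

```python
import os


def current_mode() -> str:
    # Hypothetical env var; defaults to observe-only so auto-fix is opt-in.
    return os.environ.get("CONTAINER_DOCTOR_MODE", "observe")


def decide(diagnosis: dict, mode: str) -> str:
    """Return the action to take (or, in observe mode, the action to record)."""
    wants_restart = diagnosis.get("auto_restart_safe") is True
    if not wants_restart:
        return "no_action"
    if mode == "fix":
        return "restart"
    # Observe mode: log what would have happened so you can review it later.
    return "would_restart"
```

<p>After a week, comparing the <code>would_restart</code> entries against what you actually did by hand gives you a concrete basis for flipping the mode to <code>fix</code>.</p>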
<h2 id="heading-whats-next">What's Next?</h2>
<p>If you found this useful, I write about Docker, AI tools, and developer workflows every week. I'm Balajee Asish – Docker Captain, freeCodeCamp contributor, and currently building my way through the AI tools space one project at a time.</p>
<p>Got questions or built something similar? Drop a comment below or find me on <a href="https://github.com/balajee-asish">GitHub</a> and <a href="https://linkedin.com/in/balajee-asish">LinkedIn</a>.</p>
<p>Happy building.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Open Source LLM Agent Handbook: How to Automate Complex Tasks with LangGraph and CrewAI ]]>
                </title>
                <description>
                    <![CDATA[ Ever feel like your AI tools are a bit...well, passive? Like they just sit there, waiting for your next command? Imagine if they could take initiative, break down big problems, and even work together to get things done. That's exactly what LLM agents... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-open-source-llm-agent-handbook/</link>
                <guid isPermaLink="false">683f04aedfb685791a4e8dd2</guid>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ openai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Open Source ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agentic AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #agent ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ML ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Bash ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Beginner Developers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Balajee Asish Brahmandam ]]>
                </dc:creator>
                <pubDate>Tue, 03 Jun 2025 14:20:30 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748956366197/c4dd2bba-430a-4f12-a3d4-becc6707c52e.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Ever feel like your AI tools are a bit...well, passive? Like they just sit there, waiting for your next command? Imagine if they could take initiative, break down big problems, and even work together to get things done.</p>
<p>That's exactly what LLM agents bring to the table. They're changing how we automate complex tasks, and they can help bring our AI ideas to life in a whole new way.</p>
<p>In this article, we'll explore what LLM agents are, how they work, and how you can build your very own using awesome open-source frameworks.</p>
<h3 id="heading-what-well-cover">What we’ll cover:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-the-current-state-of-llm-agents">The Current State of LLM Agents</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-from-chatbots-to-autonomous-agents">From Chatbots to Autonomous Agents</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-can-agents-do-today">What Can Agents Do Today?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-whats-available-to-build-with">What's Available to Build With?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-now-is-the-best-time-to-learn">Why Now Is the Best Time to Learn</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-llm-agents-and-why-are-they-a-big-deal">What Are LLM Agents and Why Are They a Big Deal?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-an-llm">What Is an LLM?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-so-whats-an-llm-agent">So, What’s an LLM Agent?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-does-this-matter">Why Does This Matter?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-the-rise-of-open-source-agent-frameworks">The Rise of Open-Source Agent Frameworks</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-popular-open-source-agent-frameworks">Popular Open-Source Agent Frameworks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-these-tools-enable">What These Tools Enable</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-use-a-framework-instead-of-building-from-scratch">Why Use a Framework Instead of Building from Scratch?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-core-concepts-behind-agent-design">Core Concepts Behind Agent Design</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-agent-loop">The Agent Loop</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-components-of-an-agent">Key Components of an Agent</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multi-agent-collaboration">Multi-Agent Collaboration</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-project-automate-your-daily-schedule-from-emails">Project: Automate Your Daily Schedule from Emails</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-were-automating">What We’re Automating</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-install-the-required-tools">Step 1: Install the Required Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-define-the-task">Step 2: Define the Task</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-build-the-workflow-with-langgraph">Step 3: Build the Workflow with LangGraph</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-multi-agent-collaboration-with-crewai">Multi-Agent Collaboration with CrewAI</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-crewai">What Is CrewAI?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-sample-roles-for-the-email-summary-task">Sample Roles for the Email Summary Task</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-sample-crewai-code">Sample CrewAI Code</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-actually-happens-during-execution">What Actually Happens During Execution?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-are-llm-agents-safe-what-to-know-about-security-and-privacy">Are LLM Agents Safe? What to Know About Security and Privacy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-troubleshooting-and-tips">Troubleshooting &amp; Tips</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-explore-more-daily-automations">Explore More Daily Automations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-whats-next-in-agent-technology">What’s Next in Agent Technology?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-summary">Final Summary</a></p>
</li>
</ol>
<h2 id="heading-the-current-state-of-llm-agents">The Current State of LLM Agents</h2>
<p>LLM agents are one of the most exciting developments in AI right now. They’re already helping automate real tasks, but they’re also still evolving. So where are we today?</p>
<h3 id="heading-from-chatbots-to-autonomous-agents">From Chatbots to Autonomous Agents</h3>
<p>Large Language Models (LLMs) like GPT-4, Claude, Gemini, and LLaMA have evolved from simple chatbots into surprisingly capable reasoning engines. They've gone from answering trivia questions and generating essays to performing complex reasoning, following multi-step instructions, and interacting with tools like web search and code interpreters.</p>
<p>But here’s the catch: these models are <strong>reactive</strong>. They wait for input and give output. They don't retain memory between tasks, plan ahead, or pursue goals on their own. That’s where <strong>LLM agents</strong> come in – they bridge this gap by adding structure, memory, and autonomy.</p>
<h3 id="heading-what-can-agents-do-today">What Can Agents Do Today?</h3>
<p>Right now, LLM agents are already being used for:</p>
<ul>
<li><p>Summarizing emails or documents</p>
</li>
<li><p>Planning daily schedules</p>
</li>
<li><p>Running DevOps scripts</p>
</li>
<li><p>Searching APIs or tools for answers</p>
</li>
<li><p>Collaborating in small “teams” to complete complex tasks</p>
</li>
</ul>
<p>But they’re not perfect yet. Agents can still:</p>
<ul>
<li><p>Get stuck in loops</p>
</li>
<li><p>Misunderstand goals</p>
</li>
<li><p>Require detailed prompts and guardrails</p>
</li>
</ul>
<p>That’s because this technology is still early-stage. Frameworks are getting better fast, but reliability and memory are still works in progress. So just keep that in mind as you experiment.</p>
<h3 id="heading-why-now-is-the-best-time-to-learn">Why Now Is the Best Time to Learn</h3>
<p>The truth is: we’re still early. But not <em>too</em> early.</p>
<p>This is the perfect time to start experimenting with agents:</p>
<ul>
<li><p>The tooling is mature enough to build real projects</p>
</li>
<li><p>The community is growing rapidly</p>
</li>
<li><p>And you don’t need to be an AI expert, just comfortable with Python</p>
</li>
</ul>
<h2 id="heading-what-are-llm-agents-and-why-are-they-a-big-deal">What Are LLM Agents and Why Are They a Big Deal?</h2>
<p>Before we dive into the exciting world of agents, let's quickly chat a bit more about the basics.</p>
<h3 id="heading-what-is-an-llm">What Is an LLM?</h3>
<p>An LLM, or Large Language Model, is basically an AI that's learned from a massive amount of text from the internet – think books, articles, code, and tons more. You can picture it as a super-smart autocomplete engine. But it does way more than just finish your sentences. It can also:</p>
<ul>
<li><p>Answer tricky questions</p>
</li>
<li><p>Summarize long articles or documents</p>
</li>
<li><p>Write code, emails, or creative stories</p>
</li>
<li><p>Translate languages instantly</p>
</li>
<li><p>Even solve logic puzzles and have engaging conversations</p>
</li>
</ul>
<p>Chances are you've heard of ChatGPT, which is powered by OpenAI's GPT models. Other popular LLMs you might come across include Claude (from Anthropic), LLaMA (by Meta), Mistral, and Gemini (from Google).</p>
<p>These models work by predicting the next word (more precisely, the next token) based on the context so far. That sounds simple, but when trained on enormous amounts of text, LLMs become capable of surprisingly intelligent behavior: understanding your instructions, following step-by-step reasoning, and producing coherent responses across almost any topic you can imagine.</p>
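<p>To get a feel for "predict the next word," here's a deliberately tiny toy: a bigram model that counts which word follows which and picks the most frequent one. This is only an analogy. Real LLMs use neural networks over tokens and consider far more context than a single previous word.</p>

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


def train_bigrams(text: str) -> DefaultDict[str, Counter]:
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows: DefaultDict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: DefaultDict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent next word, or None for an unseen word."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the meeting starts at ten . the meeting ends at noon . the report is due at five"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # meeting (seen twice, vs. "report" once)
```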
<h3 id="heading-so-whats-an-llm-agent">So, What’s an LLM Agent?</h3>
<p>While LLMs are super powerful, they usually just <em>react</em>: they only respond when you ask them something. An LLM agent, on the other hand, is <em>proactive</em>.</p>
<p>LLM agents can:</p>
<ul>
<li><p>Break down big, complex tasks into smaller, manageable steps</p>
</li>
<li><p>Make smart decisions and figure out what to do next</p>
</li>
<li><p>Use "tools" like web search, calculators, or even other apps</p>
</li>
<li><p>Work towards a goal, even if it takes multiple steps or tries</p>
</li>
<li><p>Team up with other agents to accomplish shared objectives</p>
</li>
</ul>
<p>In short, LLM agents can think, plan, act, and adapt.</p>
<p>Think of an LLM agent like your super-efficient new assistant: you give it a goal, and it figures out how to achieve it all on its own.</p>
<h3 id="heading-why-does-this-matter">Why Does This Matter?</h3>
<p>This shift from just responding to actively pursuing goals opens a ton of exciting possibilities:</p>
<ul>
<li><p>Automating boring IT or DevOps tasks</p>
</li>
<li><p>Generating detailed reports from raw data</p>
</li>
<li><p>Helping you with multi-step research projects</p>
</li>
<li><p>Reading through your daily emails and highlighting key info</p>
</li>
<li><p>Running your internal tools to take real-world actions</p>
</li>
</ul>
<p>Unlike older, rule-based bots, LLM agents can reason, reflect, and adjust their approach based on earlier attempts. This makes them a much better fit for real-world tasks that are messy, require flexibility, and depend on understanding context.</p>
<h2 id="heading-the-rise-of-open-source-agent-frameworks">The Rise of Open-Source Agent Frameworks</h2>
<p>Not too long ago, if you wanted to build an AI system that could act autonomously, it meant writing a ton of custom code, painstakingly managing memory, and trying to stitch together dozens of components. It was a complex, delicate, and highly specialized job.</p>
<p>But guess what? That's not the case anymore.</p>
<p>Over 2023 and 2024, a wave of open-source frameworks hit the scene. These tools have made it dramatically easier to build powerful LLM agents without reinventing the wheel every time.</p>
<h3 id="heading-popular-open-source-agent-frameworks">Popular Open-Source Agent Frameworks</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Framework</strong></td><td><strong>Description</strong></td><td><strong>Maintainer</strong></td></tr>
</thead>
<tbody>
<tr>
<td>LangGraph</td><td>Graph-based framework for agent state and memory</td><td>LangChain</td></tr>
<tr>
<td>CrewAI</td><td>Role-based, multi-agent collaboration engine</td><td>Community (CrewAI)</td></tr>
<tr>
<td>AutoGen</td><td>Customizable multi-agent chat orchestration</td><td>Microsoft</td></tr>
<tr>
<td>AgentVerse</td><td>Modular framework for agent simulation and testing</td><td>Open-source project</td></tr>
</tbody>
</table>
</div><h3 id="heading-what-these-tools-enable">What These Tools Enable</h3>
<p>These frameworks give you ready-made building blocks to handle the trickier parts of creating agents:</p>
<ul>
<li><p><strong>Planning</strong> – Letting agents decide their next move</p>
</li>
<li><p><strong>Tool Use</strong> – Easily connecting agents to things like file systems, web browsers, APIs, or databases</p>
</li>
<li><p><strong>Memory</strong> – Storing and retrieving past information or intermediate results for long-term context</p>
</li>
<li><p><strong>Multi-Agent Collaboration</strong> – Setting up teams of agents that work together on shared goals</p>
</li>
</ul>
<h3 id="heading-why-use-a-framework-instead-of-building-from-scratch">Why Use a Framework Instead of Building from Scratch?</h3>
<p>While you <em>could</em> build a custom agent from the ground up, using a framework will save you a huge amount of time and effort. Open-source agent libraries come packed with:</p>
<ul>
<li><p>Built-in support for orchestrating LLMs</p>
</li>
<li><p>Proven patterns for task planning, keeping track of where you are, and getting feedback</p>
</li>
<li><p>Easy integration with popular hosted models, such as OpenAI's GPT family, or even models you run locally</p>
</li>
<li><p>The flexibility to grow from a single helpful agent to entire teams of agents</p>
</li>
</ul>
<p>Basically, these frameworks let you focus on <strong>what your agent should do</strong>, rather than getting bogged down in how to build all the internal workings. Plus, choosing open source means you benefit from community contributions, transparency in how they work, and the freedom to tweak them to your exact needs, without getting locked into a single vendor.</p>
<h2 id="heading-core-concepts-behind-agent-design">Core Concepts Behind Agent Design</h2>
<p>To really grasp how LLM agents operate, it helps to think of them as goal-driven systems that constantly cycle through observing, reasoning, and acting. This continuous loop allows them to tackle tasks that go beyond simple questions and answers, moving into true automation, tool usage, and adapting on the fly.</p>
<h3 id="heading-the-agent-loop">The Agent Loop</h3>
<p>Most LLM agents function based on a mental model called the <strong>Agent Loop</strong>: a step-by-step cycle that repeats until the job is done. Here’s how it typically works:</p>
<ul>
<li><p><strong>Perceive:</strong> The agent starts by noticing something in its environment or receiving new information. This could be your prompt, a piece of data, or the current state of a system.</p>
</li>
<li><p><strong>Plan:</strong> Based on what it perceives and its overall goal, the agent decides what to do next. It might break the task into smaller sub-goals or figure out the best tool for the job.</p>
</li>
<li><p><strong>Act:</strong> The agent then acts. This could mean running a function, calling an API, searching the web, interacting with a database, or even asking another agent for help.</p>
</li>
<li><p><strong>Reflect:</strong> After acting, the agent looks at the outcome: Did it work? Was the result useful? Should it try a different approach? Based on this, it updates its plan and keeps going until the task is complete.</p>
</li>
</ul>
<p>This loop is what makes agents so dynamic. It allows them to handle ever-changing tasks, learn from partial results, and correct their course, qualities that are vital for building truly useful AI assistants.</p>
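<p>Here's the loop stripped down to plain Python so each phase is visible. It's a toy sketch, not any framework's API: the "plan" step is a fixed rule (try the next untried tool) where a real agent would ask the LLM to decide, and the tools are stand-in functions. The <code>max_steps</code> cap is there because, as mentioned earlier, agents can get stuck in loops.</p>

```python
from typing import Callable, Dict, List, Optional

Tool = Callable[[str], Optional[str]]


def run_agent(tasks: List[str], tools: Dict[str, Tool], max_steps: int = 10) -> Dict[str, str]:
    """Toy agent loop: perceive -> plan -> act -> reflect, until done or out of steps."""
    results: Dict[str, str] = {}       # memory of completed work
    failed: Dict[str, List[str]] = {}  # tools already tried per task
    for _ in range(max_steps):
        # Perceive: which tasks are still open?
        pending = [t for t in tasks if t not in results]
        if not pending:
            break  # goal reached
        task = pending[0]
        # Plan: pick the first tool not yet tried for this task.
        untried = [name for name in tools if name not in failed.get(task, [])]
        if not untried:
            results[task] = "UNRESOLVED"  # give up instead of looping forever
            continue
        tool_name = untried[0]
        # Act: call the tool.
        output = tools[tool_name](task)
        # Reflect: keep useful output, otherwise remember the failure and retry.
        if output:
            results[task] = output
        else:
            failed.setdefault(task, []).append(tool_name)
    return results


tools = {
    "cache": lambda q: None,                   # always misses
    "lookup": lambda q: {"2+2": "4"}.get(q),   # knows one answer
    "shout": lambda q: q.upper(),              # always returns something
}
print(run_agent(["2+2", "hello"], tools))  # {'2+2': '4', 'hello': 'HELLO'}
```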
<h3 id="heading-key-components-of-an-agent">Key Components of an Agent</h3>
<p>To do their job effectively, agents are built around several crucial parts:</p>
<ul>
<li><p><strong>Tools</strong> are how an agent interacts with the real (or digital) world. These can be anything from search engines, code execution environments, file readers, or API clients, to simple calculators or command-line scripts.</p>
</li>
<li><p><strong>Memory</strong> lets agents remember what they've done or seen across different steps. This might include previous things you've said, temporary results, or key decisions. Some frameworks offer short-term memory (just for one session), while others support long-term memory that can span multiple sessions or goals.</p>
</li>
<li><p><strong>Environment</strong> refers to the external data or system context the agent operates within: think APIs, documents, databases, files, or sensor inputs. The more information and access an agent has to its environment, the more meaningful actions it can take.</p>
</li>
<li><p><strong>Goal</strong> is the agent's ultimate objective: what it's trying to achieve. Goals should be specific and clear, for instance “generate a daily schedule,” “summarize this document,” or “extract tasks from emails.”</p>
</li>
</ul>
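<p>To see how these four pieces relate, here's a hypothetical container for them. The class and field names are illustrative, not taken from LangGraph or CrewAI. Each tool call is written to short-term memory so later steps can see what already happened.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    goal: str                               # what the agent is trying to achieve
    tools: Dict[str, Callable[[str], str]]  # how it acts on the world
    environment: Dict[str, str]             # data it can observe (here: a fake inbox)
    memory: List[str] = field(default_factory=list)  # short-term, one session only

    def use_tool(self, name: str, arg: str) -> str:
        result = self.tools[name](arg)
        # Remember what was done so later steps can build on it.
        self.memory.append(f"{name}({arg!r}) -> {result!r}")
        return result


inbox = {"email_1": "Standup Call at 10 AM", "email_2": "Lunch with Sarah at noon"}
agent = Agent(
    goal="extract tasks from emails",
    tools={"read_email": lambda key: inbox[key]},
    environment=inbox,
)
agent.use_tool("read_email", "email_1")
print(agent.memory)  # ["read_email('email_1') -> 'Standup Call at 10 AM'"]
```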
<h3 id="heading-multi-agent-collaboration">Multi-Agent Collaboration</h3>
<p>For more advanced systems, you can even have multiple agents working together toward a shared target. Each agent can be given a specific <strong>role</strong> that highlights its specialty, just like people working on a team.</p>
<p>For example:</p>
<ul>
<li><p>A <strong>researcher agent</strong> might be tasked with gathering information.</p>
</li>
<li><p>A <strong>coder agent</strong> could write Python scripts or automation routines.</p>
</li>
<li><p>A <strong>reviewer agent</strong> might check the results and ensure everything is up to snuff.</p>
</li>
</ul>
<p>These agents can chat with each other, share information, and even debate or vote on decisions. This kind of teamwork allows AI systems to tackle bigger, more complex tasks while keeping things organized and modular.</p>
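<p>Stripped of the LLM calls, a simple team is just a shared message history that each role reads from and appends to. In this sketch, the three role functions are stand-ins (a real researcher or coder would prompt a model), and the team runs in a fixed order. Frameworks like CrewAI add richer patterns on top of this.</p>

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)
Role = Callable[[str, List[Message]], str]


def researcher(task: str, history: List[Message]) -> str:
    return f"facts about {task}"


def coder(task: str, history: List[Message]) -> str:
    findings = history[-1][1]  # read the researcher's message
    return f"script using {findings}"


def reviewer(task: str, history: List[Message]) -> str:
    code = history[-1][1]      # read the coder's message
    return "approved" if task in code else "needs changes"


def run_team(task: str, team: List[Tuple[str, Role]]) -> List[Message]:
    """Run each role in order; every role sees the full shared history."""
    history: List[Message] = []
    for role, agent in team:
        history.append((role, agent(task, history)))
    return history


team = [("researcher", researcher), ("coder", coder), ("reviewer", reviewer)]
for role, content in run_team("email parsing", team):
    print(f"{role}: {content}")
```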
<h2 id="heading-project-automate-your-daily-schedule-from-emails">Project: Automate Your Daily Schedule from Emails</h2>
<h3 id="heading-what-were-automating">What We’re Automating</h3>
<p>Think about your typical morning routine:</p>
<ul>
<li><p>You open your inbox.</p>
</li>
<li><p>You quickly scan through a bunch of emails.</p>
</li>
<li><p>You try to spot meetings, tasks, and important reminders.</p>
</li>
<li><p>Then, you manually write a to-do list or add things to your calendar.</p>
</li>
</ul>
<p>Let's use an LLM agent to make that process effortless. Our agent will:</p>
<ul>
<li><p>Read a list of your email messages</p>
</li>
<li><p>Pull out time-sensitive items like meetings or deadlines</p>
</li>
<li><p>Summarize everything into a nice, clean daily schedule</p>
</li>
</ul>
<h3 id="heading-step-1-install-the-required-tools">Step 1: Install the Required Tools</h3>
<p>To get started, you'll need Python, an OpenAI API key, and (optionally) VSCode.</p>
<h4 id="heading-1-install-python-39-or-higher">1. Install Python 3.9 or Higher</h4>
<p>Download Python 3.9 or later from the official website: <a target="_blank" href="https://www.python.org/downloads/">https://www.python.org/downloads/</a></p>
<p>Once it's installed, double-check it by running <code>python --version</code> in your terminal (on many macOS and Linux systems, the command is <code>python3 --version</code>).</p>
<p>This command simply asks your system to report the Python version currently installed. You'll want to see Python 3.9.x or something higher to ensure compatibility with our project.</p>
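<p>If you'd rather check from inside Python, for example at the top of the scripts you'll write later, a short guard like this works. It uses only the standard <code>sys</code> module.</p>

```python
import sys


def check_python(min_version=(3, 9)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version


if not check_python():
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```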
<h4 id="heading-2-install-vscode-optional-but-recommended">2. Install VSCode (Optional but Recommended)</h4>
<p>VSCode is a fantastic, user-friendly code editor that works perfectly with Python. You can download it right here: <a target="_blank" href="https://code.visualstudio.com/">https://code.visualstudio.com/</a>.</p>
<h4 id="heading-3-get-your-openai-api-key">3. Get Your OpenAI API Key</h4>
<p>Head over to <a target="_blank" href="https://platform.openai.com">https://platform.openai.com</a>.</p>
<p>Sign in or create a new account. Navigate to your API Keys page. Click “Create new secret key” and make sure to copy that key somewhere safe for later.</p>
<h4 id="heading-4-install-python-libraries">4. Install Python Libraries</h4>
<p>Open your terminal or command prompt and install these essential packages:</p>
<pre><code class="lang-bash">pip install langgraph langchain langchain-openai openai
</code></pre>
<p>This command uses pip, Python's package manager, to download and install four libraries for our agent:</p>
<ul>
<li><p>langgraph: The core framework we'll use to build our agent's workflow.</p>
</li>
<li><p>langchain: A foundational library for working with large language models, from the same team that builds LangGraph.</p>
</li>
<li><p>langchain-openai: The LangChain integration package that provides the <code>ChatOpenAI</code> class we import later (<code>from langchain_openai import ChatOpenAI</code>).</p>
</li>
<li><p>openai: The official Python library for connecting to OpenAI's models.</p>
</li>
</ul>
<p>If you want to try out multi-agent setups (which we'll cover in the CrewAI section later on), also install CrewAI:</p>
<pre><code class="lang-bash">pip install crewai
</code></pre>
<p>This command installs CrewAI, a specialized framework that makes it easy to orchestrate multiple AI agents working together as a team.</p>
<h4 id="heading-5-set-your-openai-api-key">5. Set Your OpenAI API Key</h4>
<p>You need to make sure your Python code can find and use your OpenAI API key. This is typically done by setting it as an environment variable.</p>
<p>On macOS/Linux, run this in your terminal (replace "your-api-key" with your actual key):</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> OPENAI_API_KEY=<span class="hljs-string">"your-api-key"</span>
</code></pre>
<p>This command sets an environment variable named OPENAI_API_KEY. Environment variables let applications (like your Python script) read sensitive values such as API keys without hardcoding them into the source code.</p>
<p>On Windows (using Command Prompt), do this:</p>
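<pre><code class="lang-bash"><span class="hljs-built_in">set</span> OPENAI_API_KEY=your-api-key
</code></pre>
<p>This is the Windows equivalent command to set the <code>OPENAI_API_KEY</code> environment variable. Don't wrap the value in quotes here: Command Prompt keeps them as part of the value. Also note that both <code>set</code> and <code>export</code> only apply to the current terminal session, so run your script from that same window.</p>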
<p>Now, your Python code will be all set to talk to the OpenAI model!</p>
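<p>A quick sanity check from Python can save a confusing authentication error later. This sketch (the function name is hypothetical) reads the variable with the standard <code>os</code> module and also catches a common mistake: stray quote characters stored as part of the key.</p>

```python
import os


def get_openai_key() -> str:
    """Return OPENAI_API_KEY, or raise a readable error if it looks wrong."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in this terminal session.")
    if key.startswith('"') or key.endswith('"'):
        raise RuntimeError("OPENAI_API_KEY contains quote characters; remove them.")
    return key
```

<p>Call <code>get_openai_key()</code> once at startup; the OpenAI and LangChain clients still read the variable themselves.</p>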
<h3 id="heading-step-2-define-the-task">Step 2: Define the Task</h3>
<p>We discussed this briefly at the beginning of this section. To reiterate, here's the manual routine we want our agent to take over:</p>
<ul>
<li><p>Scan for meetings, events, and important tasks.</p>
</li>
<li><p>Jot them down quickly in a notebook or an app.</p>
</li>
<li><p>Create a rough mental plan for your day.</p>
</li>
</ul>
<p>This routine takes time and mental energy. So having an agent do it for us will be super helpful.</p>
<h3 id="heading-step-3-build-the-workflow-with-langgraph">Step 3: Build the Workflow with LangGraph</h3>
<h4 id="heading-what-is-langgraph">What Is LangGraph?</h4>
<p>LangGraph is a cool framework that helps you build agents using a "graph-based" workflow, kind of like drawing a flowchart. It's powered by LangChain and gives you a lot more control over exactly how each step in your agent's process unfolds.</p>
<p>Each "node" in this graph represents a decision point or a function that:</p>
<ul>
<li><p>Takes some input (its current "state").</p>
</li>
<li><p>Does some reasoning or takes an action (often involving the LLM and its tools).</p>
</li>
<li><p>Returns an updated output (a new "state").</p>
</li>
</ul>
<p>You draw the connections between these nodes, and LangGraph then executes it like a smart, automated state machine.</p>
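<p>To make the "nodes update a shared state" idea concrete before we touch LangGraph, here's a plain-Python imitation. It is not LangGraph's API, just a sketch of the same execution model: each node reads the state and returns a partial update that gets merged in, and edges decide which node runs next. LangGraph adds features like conditional branching and persistence on top of this.</p>

```python
from typing import Callable, Dict, Optional

State = Dict[str, str]
Node = Callable[[State], State]


def run_graph(nodes: Dict[str, Node], edges: Dict[str, Optional[str]],
              entry: str, state: State, max_steps: int = 100) -> State:
    """Run nodes from `entry`, following `edges` until an edge points to None (the end)."""
    current: Optional[str] = entry
    for _ in range(max_steps):  # guard against accidental cycles
        if current is None:
            return state
        update = nodes[current](state)  # node reads state, returns changes
        state = {**state, **update}     # merge the partial update
        current = edges[current]        # follow the outgoing edge
    raise RuntimeError("graph did not reach the end within max_steps")


nodes = {
    "extract": lambda s: {"items": s["emails"].upper()},
    "summarize": lambda s: {"result": f"Schedule: {s['items']}"},
}
edges = {"extract": "summarize", "summarize": None}
final = run_graph(nodes, edges, "extract", {"emails": "standup at 10"})
print(final["result"])  # Schedule: STANDUP AT 10
```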
<h4 id="heading-why-use-langgraph">Why Use LangGraph?</h4>
<ul>
<li><p>You get to control the precise order of execution.</p>
</li>
<li><p>It's fantastic for building workflows that have multiple steps or even branch off into different paths.</p>
</li>
<li><p>It plays nicely with both cloud-based models (like OpenAI) and models you run locally.</p>
</li>
</ul>
<p>Alright – now let’s write the code.</p>
<h5 id="heading-1-simulate-email-input"><strong>1. Simulate Email Input</strong></h5>
<p>In a real application, your agent would probably connect to Gmail or Outlook to fetch your actual emails. For this example, though, we’ll just hardcode some sample messages to keep things simple:</p>
<pre><code class="lang-python">emails = <span class="hljs-string">"""
1. Subject: Standup Call at 10 AM
2. Subject: Client Review due by 5 PM
3. Subject: Lunch with Sarah at noon
4. Subject: AWS Budget Warning – 80% usage
5. Subject: Dentist Appointment - 4 PM
"""</span>
</code></pre>
<p>This multiline Python string, <code>emails</code>, acts as our stand-in for real email content. We're providing a simple, structured list of email subjects to demonstrate how the agent will process text.</p>
<h5 id="heading-2-define-the-agent-logic"><strong>2. Define the Agent Logic</strong></h5>
<p>Now, we'll tell OpenAI’s GPT model how to process this email text and turn it into a summary.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langgraph.graph <span class="hljs-keyword">import</span> StateGraph, END
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> TypedDict, Annotated, List
<span class="hljs-keyword">import</span> operator

<span class="hljs-comment"># Define the state for our graph</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AgentState</span>(<span class="hljs-params">TypedDict</span>):</span>
    emails: str
    result: str

llm = ChatOpenAI(temperature=<span class="hljs-number">0</span>, model=<span class="hljs-string">"gpt-4o"</span>) <span class="hljs-comment"># Using gpt-4o for better performance</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calendar_summary_agent</span>(<span class="hljs-params">state: AgentState</span>) -&gt; AgentState:</span>
    emails = state[<span class="hljs-string">"emails"</span>]
    prompt = <span class="hljs-string">f"Summarize today's schedule based on these emails, listing time-sensitive items first and then other important notes. Be concise and use bullet points:\n<span class="hljs-subst">{emails}</span>"</span>
    summary = llm.invoke(prompt).content
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"result"</span>: summary, <span class="hljs-string">"emails"</span>: emails} <span class="hljs-comment"># Ensure emails is also returned</span>
</code></pre>
<p>Here’s what’s going on:</p>
<ul>
<li><p><strong>Imports</strong>: We bring in necessary components:</p>
<ul>
<li><p><code>ChatOpenAI</code> to connect to the LLM,</p>
</li>
<li><p><code>StateGraph</code> and <code>END</code> from <code>langgraph.graph</code> to build our agent workflow,</p>
</li>
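<li><p><code>TypedDict</code> from <code>typing</code> to describe the structure of the agent's state,</p>
</li>
<li><p><code>Annotated</code>, <code>List</code>, and <code>operator</code>, which aren't used in this snippet. In LangGraph they're commonly combined, as in <code>Annotated[List[str], operator.add]</code>, to give a state key a reducer that appends new values instead of overwriting old ones.</p>
</li>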
</ul>
</li>
<li><p><strong>AgentState</strong>: This <code>TypedDict</code> defines the shape of the data our agent will work with. It includes:</p>
<ul>
<li><p><code>emails</code>: the raw input messages.</p>
</li>
<li><p><code>result</code>: the final output (the daily summary).</p>
</li>
</ul>
</li>
<li><p><strong>llm = ChatOpenAI(...)</strong>: Initializes the language model. We're using GPT-4o with <code>temperature=0</code>, which minimizes randomness and keeps the output as consistent as possible. That suits structured summarization tasks.</p>
</li>
<li><p><strong>calendar_summary_agent(state: AgentState)</strong>: This function is the "brain" of our agent. It:</p>
<ul>
<li><p>Takes in the current state, which includes the raw email text as a single string.</p>
</li>
<li><p>Extracts the emails from that state.</p>
</li>
<li><p>Constructs a prompt that tells the model to generate a concise daily schedule summary using bullet points, prioritizing time-sensitive items.</p>
</li>
<li><p>Sends this prompt to the model with <code>llm.invoke(prompt)</code>, which returns a chat message object. Its <code>.content</code> attribute holds the LLM’s response as plain text.</p>
</li>
<li><p>Returns a new <code>AgentState</code> dictionary containing:</p>
<ul>
<li><p><code>result</code>: the generated summary,</p>
</li>
<li><p><code>emails</code>: preserved in case we need it downstream.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
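<p>The node relies only on the model's <code>invoke(...).content</code>, so you can check its logic offline before spending API calls. The sketch below shows one way to do that. It assumes you pass the model in as an argument instead of using the module-level <code>llm</code>. <code>FakeLLM</code>, <code>FakeMessage</code>, and <code>make_calendar_summary_agent</code> are hypothetical names for this example, not part of LangChain or LangGraph.</p>
<pre><code class="lang-python">from dataclasses import dataclass
from typing import Optional, TypedDict


class AgentState(TypedDict):
    emails: str
    result: str


@dataclass
class FakeMessage:
    content: str  # mimics the .content attribute of a chat model's reply


class FakeLLM:
    """Hypothetical stand-in for ChatOpenAI: records the prompt, returns canned text."""

    def __init__(self, reply: str):
        self.reply = reply
        self.last_prompt: Optional[str] = None

    def invoke(self, prompt: str) -> FakeMessage:
        self.last_prompt = prompt
        return FakeMessage(content=self.reply)


def make_calendar_summary_agent(llm):
    """Same logic as calendar_summary_agent, but with the model injected."""

    def calendar_summary_agent(state: AgentState) -> AgentState:
        emails = state["emails"]
        prompt = (
            "Summarize today's schedule based on these emails, listing time-sensitive "
            "items first and then other important notes. Be concise and use bullet points:\n"
            f"{emails}"
        )
        summary = llm.invoke(prompt).content
        return {"result": summary, "emails": emails}

    return calendar_summary_agent


# Usage: run the node once against the stub
fake = FakeLLM(reply="- 10:00 AM - Standup Call")
node = make_calendar_summary_agent(fake)
state = node({"emails": "1. Subject: Standup Call at 10 AM", "result": ""})
print(state["result"])
</code></pre>
<p>Injecting the model is a design choice made only for testability. In the main example, the module-level <code>llm</code> works just as well.</p>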
<h5 id="heading-3-build-and-run-the-graph"><strong>3. Build and Run the Graph</strong></h5>
<p>Now, let's use LangGraph to map out the flow of our single-agent task and then run it.</p>
<pre><code class="lang-python">builder = StateGraph(AgentState)
builder.add_node(<span class="hljs-string">"calendar"</span>, calendar_summary_agent)
builder.set_entry_point(<span class="hljs-string">"calendar"</span>)
builder.set_finish_point(<span class="hljs-string">"calendar"</span>) <span class="hljs-comment"># Same as builder.add_edge("calendar", END)</span>

graph = builder.compile()

<span class="hljs-comment"># Run the graph using your simulated email data</span>
result = graph.invoke({<span class="hljs-string">"emails"</span>: emails})
print(result[<span class="hljs-string">"result"</span>])
</code></pre>
<p>Here’s what’s going on:</p>
<ul>
<li><p><strong>builder = StateGraph(AgentState):</strong> We're creating a <code>StateGraph</code> object. By passing <code>AgentState</code>, we tell LangGraph what shape the shared state will have as it flows between nodes.</p>
</li>
<li><p><strong>builder.add_node("calendar", calendar_summary_agent):</strong> This line adds a named "node" to our graph. We're calling it "calendar", and we're linking it to our <code>calendar_summary_agent</code> function, meaning that function will be executed when this node is active.</p>
</li>
<li><p><strong>builder.set_entry_point("calendar"):</strong> This sets "calendar" as the very first step in our workflow. When we start the graph, execution will begin here.</p>
</li>
<li><p><strong>builder.set_finish_point("calendar"):</strong> This tells LangGraph that once the "calendar" node finishes its job, the entire graph process is complete.</p>
</li>
<li><p><strong>graph = builder.compile():</strong> This command takes our defined graph blueprint and "compiles" it into an executable workflow.</p>
</li>
<li><p><strong>result = graph.invoke({"emails": emails}):</strong> This is where the magic happens! We're telling our graph to start running. We pass it an initial state that contains our emails data. The graph will then process this data through its nodes until it reaches an end point, returning the final state.</p>
</li>
<li><p><strong>print(result["result"]):</strong> Finally, we grab the summarized schedule from the result (the final state of our graph) and print it to the console.</p>
</li>
</ul>
<h4 id="heading-example-output">Example Output</h4>
<pre><code>Your Schedule:
- 10:00 AM – Standup Call
- 12:00 PM – Lunch with Sarah
- 4:00 PM – Dentist Appointment
- Submit client report by 5:00 PM
- AWS Budget Warning – check usage
</code></pre>
<p>Boom! You've just built an AI agent that can read your emails and whip up your daily schedule. Pretty cool, right? This is a simple yet powerful peek into what LLM agents can do with just a few lines of code.</p>
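<p>It can help to see the graph's control flow concretely. Here is a toy, dependency-free runner that mimics the shape of the LangGraph calls used above. It starts at the entry point, runs a node, merges the dictionary the node returns into the state, and follows edges until it reaches <code>END</code>. This is an illustration under simplifying assumptions (linear edges only, no branching, no validation), not how LangGraph is implemented. The <code>TinyGraph</code> class and its <code>max_steps</code> guard are hypothetical. LangGraph has its own guard against runaway loops: the <code>recursion_limit</code> value in the config passed to <code>invoke</code>.</p>
<pre><code class="lang-python">from typing import Callable, Dict, Optional

END = "__end__"  # sentinel name used only in this sketch


class TinyGraph:
    """Toy linear workflow runner that mirrors the StateGraph calls above."""

    def __init__(self):
        self.nodes: Dict[str, Callable[[dict], dict]] = {}
        self.edges: Dict[str, str] = {}
        self.entry: Optional[str] = None

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_edge(self, source, target):
        self.edges[source] = target

    def set_entry_point(self, name):
        self.entry = name

    def set_finish_point(self, name):
        # A finish point is just an edge from that node to END
        self.add_edge(name, END)

    def invoke(self, state, max_steps=25):
        state = dict(state)  # work on a copy of the input
        current = self.entry
        for _ in range(max_steps):
            update = self.nodes[current](state)
            state.update(update)  # merge the node's partial update into the state
            current = self.edges[current]
            if current == END:
                return state
        raise RuntimeError("step limit reached before END")


def extract(state):
    return {"items": [line.strip() for line in state["emails"].strip().splitlines()]}


def summarize(state):
    return {"result": f"{len(state['items'])} items found"}


graph = TinyGraph()
graph.add_node("extract", extract)
graph.add_node("summarize", summarize)
graph.set_entry_point("extract")
graph.add_edge("extract", "summarize")
graph.set_finish_point("summarize")
print(graph.invoke({"emails": "1. Standup\n2. Lunch\n"})["result"])  # 2 items found
</code></pre>
<p>Each node returns only the keys it changes, and the runner merges them. That is why <code>calendar_summary_agent</code> can hand back a small dictionary and still leave the rest of the state intact.</p>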
<h2 id="heading-multi-agent-collaboration-with-crewai">Multi-Agent Collaboration with CrewAI</h2>
<h3 id="heading-what-is-crewai">What Is CrewAI?</h3>
<p>CrewAI is an open-source framework that lets you build <em>teams</em> of agents that work together, much like a real-world project team. Each agent in a CrewAI setup:</p>
<ul>
<li><p>Has a specific, specialized role.</p>
</li>
<li><p>Can communicate and share information with its teammates.</p>
</li>
<li><p>Collaborates to achieve a shared goal.</p>
</li>
</ul>
<p>This multi-agent approach is super useful when your task is too big or too complex for just one agent, or when breaking it down into specialized parts makes it clearer and more efficient.</p>
<h3 id="heading-sample-roles-for-the-email-summary-task">Sample Roles for the Email Summary Task</h3>
<p>Let's imagine our email summary task being handled by a small team of agents:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Agent Name</strong></td><td><strong>Role</strong></td><td><strong>Responsibility</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Extractor</td><td>Email Scanner</td><td>Find meetings, reminders, and tasks from emails</td></tr>
<tr>
<td>Prioritizer</td><td>Schedule Optimizer</td><td>Sort items by urgency and time</td></tr>
<tr>
<td>Formatter</td><td>Output Generator</td><td>Write a clean, polished daily agenda</td></tr>
</tbody>
</table>
</div><h3 id="heading-sample-crewai-code">Sample CrewAI Code</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> crewai <span class="hljs-keyword">import</span> Agent, Crew, Task, Process
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">import</span> os

<span class="hljs-comment"># Set your OpenAI API key from environment variables</span>
<span class="hljs-comment"># os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY" # Make sure this is set, or defined directly</span>

<span class="hljs-comment"># Initialize the LLM (using gpt-4o for better performance)</span>
llm = ChatOpenAI(temperature=<span class="hljs-number">0</span>, model=<span class="hljs-string">"gpt-4o"</span>)

<span class="hljs-comment"># Define the agents with specific roles and goals</span>
extractor = Agent(
    role=<span class="hljs-string">"Email Scanner"</span>,
    goal=<span class="hljs-string">"Find all meetings, reminders, and tasks from the given emails, accurately extracting details like time, date, and subject."</span>,
    backstory=<span class="hljs-string">"You are an expert at scanning emails for key information. You meticulously extract every relevant detail."</span>,
    verbose=<span class="hljs-literal">True</span>,
    allow_delegation=<span class="hljs-literal">False</span>,
    llm=llm
)

prioritizer = Agent(
    role=<span class="hljs-string">"Schedule Optimizer"</span>,
    goal=<span class="hljs-string">"Sort extracted items by urgency and time, preparing them for a daily agenda."</span>,
    backstory=<span class="hljs-string">"You are a master of time management, always knowing what needs to be done first. You organize tasks logically."</span>,
    verbose=<span class="hljs-literal">True</span>,
    allow_delegation=<span class="hljs-literal">False</span>,
    llm=llm
)

formatter = Agent(
    role=<span class="hljs-string">"Output Generator"</span>,
    goal=<span class="hljs-string">"Generate a clean, polished, and concise daily agenda in bullet-point format, clearly listing all schedule items."</span>,
    backstory=<span class="hljs-string">"You are a professional secretary, ensuring all outputs are perfectly formatted and easy to read. You prioritize clarity."</span>,
    verbose=<span class="hljs-literal">True</span>,
    allow_delegation=<span class="hljs-literal">False</span>,
    llm=llm
)

<span class="hljs-comment"># Simulate email input</span>
emails = <span class="hljs-string">"""
1. Subject: Standup Call at 10 AM
2. Subject: Client Review due by 5 PM
3. Subject: Lunch with Sarah at noon
4. Subject: AWS Budget Warning – 80% usage
5. Subject: Dentist Appointment - 4 PM
"""</span>

<span class="hljs-comment"># Define the tasks for each agent</span>
extract_task = Task(
    description=<span class="hljs-string">f"Extract all relevant events, meetings, and tasks from these emails: <span class="hljs-subst">{emails}</span>. Focus on precise details."</span>,
    agent=extractor,
    expected_output=<span class="hljs-string">"A list of extracted items with their details (e.g., '- Standup Call at 10 AM', '- Client Review due by 5 PM')."</span>
)

prioritize_task = Task(
    description=<span class="hljs-string">"Prioritize the extracted items by time and urgency. Meetings first, then deadlines, then other notes."</span>,
    agent=prioritizer,
    context=[extract_task], <span class="hljs-comment"># The output of extract_task is the input here</span>
    expected_output=<span class="hljs-string">"A prioritized list of schedule items."</span>
)

format_task = Task(
    description=<span class="hljs-string">"Format the prioritized schedule into a clean, easy-to-read daily agenda using bullet points. Ensure concise language."</span>,
    agent=formatter,
    context=[prioritize_task], <span class="hljs-comment"># The output of prioritize_task is the input here</span>
    expected_output=<span class="hljs-string">"A well-formatted daily agenda with bullet points."</span>
)

<span class="hljs-comment"># Instantiate the crew</span>
crew = Crew(
    agents=[extractor, prioritizer, formatter],
    tasks=[extract_task, prioritize_task, format_task],
    process=Process.sequential, <span class="hljs-comment"># Tasks are executed sequentially</span>
    verbose=<span class="hljs-number">2</span> <span class="hljs-comment"># Outputs more details during execution</span>
)

<span class="hljs-comment"># Run the crew</span>
result = crew.kickoff()
print(<span class="hljs-string">"\n########################"</span>)
print(<span class="hljs-string">"## Final Daily Agenda ##"</span>)
print(<span class="hljs-string">"########################\n"</span>)
print(result)
</code></pre>
<p>Here’s what’s going on:</p>
<ul>
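<li><p><strong>Imports:</strong> We bring in key classes from CrewAI: <code>Agent</code>, <code>Crew</code>, <code>Task</code>, and <code>Process</code>. We also import <code>ChatOpenAI</code> for our language model and <code>os</code>, which the commented-out line uses if you prefer to set the API key in code.</p>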
</li>
<li><p><strong>llm = ChatOpenAI(...):</strong> Just like in the LangGraph example, this sets up the OpenAI language model with the gpt-4o model and <code>temperature=0</code>, which keeps responses as consistent as possible.</p>
</li>
<li><p><strong>Agent Definitions (extractor, prioritizer, formatter):</strong></p>
<ul>
<li><p>Each of these variables creates an Agent instance. An agent is defined by its role (what it does), a specific goal it's trying to achieve, and a backstory (a sort of personality or expertise that helps the LLM understand its purpose better).</p>
</li>
<li><p>verbose=True is super helpful for debugging, as it makes the agents print out their "thoughts" as they work.</p>
</li>
<li><p>allow_delegation=False means these agents won't pass their assigned tasks to other agents (though this can be set to True for more complex delegation scenarios).</p>
</li>
<li><p>llm=llm connects each agent to our OpenAI language model.</p>
</li>
</ul>
</li>
<li><p><strong>Simulated emails:</strong> We reuse the same sample email data for this example.</p>
</li>
<li><p><strong>Task Definitions (extract_task, prioritize_task, format_task):</strong></p>
<ul>
<li><p>Each Task defines a specific piece of work that an agent needs to perform.</p>
</li>
<li><p>description clearly tells the agent what the task involves.</p>
</li>
<li><p>agent assigns this task to one of our defined agents (e.g., extractor for extract_task).</p>
</li>
<li><p>context=[...] is a critical part of CrewAI's collaboration. It tells a task to use the <em>output</em> of a previous task as its <em>input</em>. For instance, prioritize_task takes the extract_task's output as its context.</p>
</li>
<li><p>expected_output gives the agent an idea of what its result should look like, helping guide the LLM.</p>
</li>
</ul>
</li>
<li><p><strong>crew = Crew(...):</strong></p>
<ul>
<li><p>This is where we assemble our team! We create a Crew instance, giving it our list of agents and tasks.</p>
</li>
<li><p>process=Process.sequential tells the crew to execute tasks one after another in the order they're defined in the tasks list. CrewAI also supports more advanced processes like hierarchical ones.</p>
</li>
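<li><p>verbose=2 turns on detailed logging of the crew's internal workings and communication. Older CrewAI releases accepted an integer log level here. Newer releases expect a boolean, so if this line raises a validation error, use verbose=True instead.</p>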
</li>
</ul>
</li>
<li><p><strong>result = crew.kickoff():</strong> This command officially starts the entire multi-agent workflow. The agents will begin collaborating, passing information, and working through their assigned tasks in sequence.</p>
</li>
<li><p><strong>print(result):</strong> Finally, the consolidated output from the entire crew's collaborative effort is printed to your console.</p>
</li>
</ul>
<p>CrewAI handles the communication between agents and passes each task's output along to the tasks that depend on it. With a sequential process, it's like having a mini AI assembly line!</p>
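<p>The <code>context</code> mechanism is easier to reason about with a stripped-down model. The sketch below is a plain-Python analogy, not CrewAI's implementation. Each hypothetical <code>ToyTask</code> runs in list order and receives the outputs of the tasks listed in its <code>context</code>. Simple string functions stand in for the three LLM-backed agents so the example runs without an API key.</p>
<pre><code class="lang-python">from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyTask:
    name: str
    work: Callable[[List[str]], str]  # receives the outputs of its context tasks
    context: List["ToyTask"] = field(default_factory=list)
    output: Optional[str] = None


def kickoff_sequential(tasks: List[ToyTask]) -> str:
    """Run tasks in order; each one sees the outputs of its context tasks."""
    for task in tasks:
        inputs = [dep.output for dep in task.context]
        if any(value is None for value in inputs):
            raise ValueError(f"{task.name} depends on a task that has not run yet")
        task.output = task.work(inputs)
    return tasks[-1].output


EMAILS = """
1. Subject: AWS Budget Warning - 80% usage
2. Subject: Standup Call at 10 AM
3. Subject: Client Review due by 5 PM
"""


def extract(_inputs):
    # Stand-in for the Email Scanner: keep the text after "Subject: "
    return "\n".join(line.split("Subject: ", 1)[1] for line in EMAILS.strip().splitlines())


def prioritize(inputs):
    # Stand-in for the Schedule Optimizer: timed items first, original order otherwise
    items = inputs[0].splitlines()
    timed = [item for item in items if " AM" in item or " PM" in item]
    return "\n".join(timed + [item for item in items if item not in timed])


def format_agenda(inputs):
    # Stand-in for the Output Generator: turn each line into a bullet
    return "\n".join(f"- {item}" for item in inputs[0].splitlines())


extract_task = ToyTask("extract", extract)
prioritize_task = ToyTask("prioritize", prioritize, context=[extract_task])
format_task = ToyTask("format", format_agenda, context=[prioritize_task])
print(kickoff_sequential([extract_task, prioritize_task, format_task]))
</code></pre>
<p>In the real crew, each <code>work</code> step is an LLM call guided by the agent's role, goal, and backstory. The data flow between tasks follows the same pattern.</p>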
<h2 id="heading-what-actually-happens-during-execution">What Actually Happens During Execution?</h2>
<p>So, whether you're using LangGraph or CrewAI, what's really going on behind the scenes when an agent runs? Let's break down the execution process:</p>
<ul>
<li><p>The system gets an <strong>input state</strong> (for example, your emails).</p>
</li>
<li><p>The first agent or graph node reads this input and uses a <strong>Large Language Model (LLM)</strong> to make sense of it.</p>
</li>
<li><p>Based on its understanding, the agent decides on an <strong>action</strong> like pulling out key events or calling a specific tool.</p>
</li>
<li><p>If needed, the agent might <strong>invoke tools</strong> (like a web search or a file reader) to get more context or perform external operations.</p>
</li>
<li><p>The result of that action is then <strong>passed to the next agent</strong> in the team (if it's a multi-agent setup) or returned directly to you.</p>
</li>
</ul>
<p>Execution keeps going until:</p>
<ul>
<li><p>The task is fully completed.</p>
</li>
<li><p>All agents have finished their assigned roles.</p>
</li>
<li><p>A stopping condition or a designated "END" point in the workflow is reached.</p>
</li>
</ul>
<p>Think of this as a smart workflow engine where each step can involve reasoning and decision-making, and the shared state carries results from earlier steps forward.</p>
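<p>The loop described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the internals of LangGraph or CrewAI. A hypothetical <code>decide</code> function plays the role of the LLM by choosing either a tool call or a final answer. Tools are ordinary functions in a dictionary, and <code>max_steps</code> is the stopping condition.</p>
<pre><code class="lang-python">from typing import Callable, Dict, List, Tuple

Decision = Tuple[str, str, str]  # ("tool", tool_name, argument) or ("finish", answer, "")


def run_agent(task: str,
              decide: Callable[[str, List[str]], Decision],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> Tuple[str, List[str]]:
    history: List[str] = []  # observations the "LLM" can see on the next step
    for _ in range(max_steps):
        kind, value, argument = decide(task, history)
        if kind == "finish":
            return value, history
        observation = tools[value](argument)  # invoke the chosen tool
        history.append(f"{value}({argument!r}) -> {observation}")
    return "stopped: step limit reached", history


# A scripted stand-in for the LLM: count the emails once, then answer.
def scripted_decide(task: str, history: List[str]) -> Decision:
    if not history:
        return ("tool", "count_lines", task)
    count = history[-1].rsplit("-> ", 1)[1]
    return ("finish", f"You have {count} emails today.", "")


tools = {"count_lines": lambda text: str(len(text.strip().splitlines()))}
answer, trace = run_agent("Standup at 10 AM\nLunch at noon\n", scripted_decide, tools)
print(answer)
</code></pre>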
<h2 id="heading-are-llm-agents-safe-what-to-know-about-security-and-privacy">Are LLM Agents Safe? What to Know About Security and Privacy</h2>
<p>As cool as LLM agents are, they raise an important question: <em>can you really trust an AI to run parts of your workflow or interact with your data?</em> It depends. If you’re calling the APIs of services like OpenAI or Anthropic, your data is encrypted in transit and, under their current policies, isn’t used for training by default.</p>
<p>But some data might still be temporarily logged to prevent abuse. That’s usually fine for testing and personal projects, but if you’re working with sensitive business info, customer data, or anything private, you’ll want to be careful.</p>
<p>Use anonymized inputs, avoid exposing full datasets, and consider running agents locally using open-source models like LLaMA or Mistral if full control matters to you.</p>
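<p>As a starting point for anonymizing inputs, you could mask obvious identifiers before the text leaves your machine. This regex-based sketch is deliberately crude and is not a complete PII solution. It replaces email addresses and phone-number-like digit runs, but it will miss names and street addresses. It can also over-match things like dates written as <code>2024-06-16</code>. The function name <code>redact</code> is hypothetical.</p>
<pre><code class="lang-python">import re

# Simple patterns; tune them for your own data before relying on them
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


print(redact("Lunch with sarah@example.com, call +1 (555) 123-4567 at noon"))
</code></pre>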
<p>You can also set clear boundaries for your agents so they don’t overstep. Think of it like onboarding a new intern: you wouldn’t give them access to everything on day one.</p>
<p>Give agents only the tools and files they need, keep logs of what they do, and always review the results before letting them make real changes.</p>
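<p>Those three habits (least privilege, logging, and human review) can be enforced in code around whatever tools your agent calls. The wrapper below is a framework-agnostic sketch. <code>ToolGate</code>, its allowlist, and the <code>approve</code> callback are hypothetical names for this example. In a real app, the approval step might be a CLI prompt or a review queue.</p>
<pre><code class="lang-python">import logging
from typing import Callable, Dict, Set

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.tools")


class ToolGate:
    """Only runs allowlisted tools, logs every call, and asks before risky ones."""

    def __init__(self, tools: Dict[str, Callable[[str], str]], allowed: Set[str],
                 needs_approval: Set[str], approve: Callable[[str, str], bool]):
        self.tools = tools
        self.allowed = allowed
        self.needs_approval = needs_approval
        self.approve = approve

    def call(self, name: str, argument: str) -> str:
        if name not in self.allowed:
            logger.warning("blocked tool %s", name)
            raise PermissionError(f"tool {name!r} is not allowlisted")
        if name in self.needs_approval and not self.approve(name, argument):
            logger.info("reviewer rejected %s(%r)", name, argument)
            return "rejected by reviewer"
        logger.info("calling %s(%r)", name, argument)
        return self.tools[name](argument)


gate = ToolGate(
    tools={
        "read_calendar": lambda day: f"3 events on {day}",
        "delete_email": lambda msg_id: f"deleted {msg_id}",
        "send_email": lambda body: "sent",
    },
    allowed={"read_calendar", "delete_email"},  # send_email is deliberately left out
    needs_approval={"delete_email"},
    approve=lambda name, arg: False,  # stand-in for a real human check
)
print(gate.call("read_calendar", "today"))
</code></pre>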
<p>As this tech grows, more safety features are arriving, such as better sandboxing, memory limits, and role-based access. For now, though, it’s smart to treat your agents like powerful helpers that still need some human supervision.</p>
<h2 id="heading-troubleshooting-amp-tips">Troubleshooting &amp; Tips</h2>
<p>Sometimes, agents can be a bit quirky! Here are some common issues you might run into and how to fix them:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Issue</strong></td><td><strong>Suggested Fix</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Agent seems to loop forever</td><td>Set a maximum number of iterations or define a clearer stopping point.</td></tr>
<tr>
<td>Output is too chatty or verbose</td><td>Use more specific prompts (for example, “Respond in bullet points only”).</td></tr>
<tr>
<td>Input is too long or gets cut off</td><td>Break down large pieces of content into smaller chunks and summarize them individually.</td></tr>
<tr>
<td>Agent runs too slowly</td><td>Try a smaller, faster model such as gpt-4o-mini or gpt-3.5-turbo, or consider running a local model.</td></tr>
</tbody>
</table>
</div><p>A handy tip: you can also add <code>print()</code> statements or logging calls inside your agent functions to see what's happening at each stage and debug state transitions.</p>
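<p>Two of these fixes are easy to sketch in plain Python. <code>chunk_text</code> splits long input on line boundaries so each piece can be summarized separately. The 4,000-character default is an arbitrary assumption; real limits depend on the model's context window and are measured in tokens, not characters. <code>log_node</code> is a small decorator that logs which state keys go into and come out of a node function like <code>calendar_summary_agent</code>. Both are hypothetical helpers, not LangGraph or CrewAI APIs.</p>
<pre><code class="lang-python">import functools
import logging
from typing import Callable, List

logger = logging.getLogger("agent.debug")


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Split text into pieces of at most max_chars, preferring line boundaries."""
    if not max_chars > 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is longer than the limit
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when this line would overflow the current one
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def log_node(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Log the state keys a node receives and the keys it returns."""

    @functools.wraps(fn)
    def wrapper(state: dict) -> dict:
        logger.debug("enter %s with keys %s", fn.__name__, sorted(state))
        update = fn(state)
        logger.debug("exit %s returning keys %s", fn.__name__, sorted(update))
        return update

    return wrapper


# Usage with the LangGraph example from earlier (in that script):
# logging.basicConfig(level=logging.DEBUG)
# builder.add_node("calendar", log_node(calendar_summary_agent))
</code></pre>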
<h2 id="heading-explore-more-daily-automations">Explore More Daily Automations</h2>
<p>Once you've built one agent-based task, you'll find it incredibly easy to adapt the pattern for other automations. Here are some cool ideas to get your creative juices flowing:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Task Type</strong></td><td><strong>Example Automation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>DevOps Assistant</td><td>Read system logs, detect potential issues, and suggest solutions.</td></tr>
<tr>
<td>Finance Tracker</td><td>Read bank statements or CSV files and summarize your spending habits and budgets.</td></tr>
<tr>
<td>Meeting Organizer</td><td>After a meeting, automatically extract action items and assign owners.</td></tr>
<tr>
<td>Inbox Cleaner</td><td>Automatically label, archive, and delete non-urgent emails.</td></tr>
<tr>
<td>Note Summarizer</td><td>Convert your daily notes into a neatly formatted to-do list or summary.</td></tr>
<tr>
<td>Link Checker</td><td>Extract URLs from documents and automatically test if they're still valid.</td></tr>
<tr>
<td>Resume Formatter</td><td>Score resumes against job descriptions and format them automatically.</td></tr>
</tbody>
</table>
</div><p>Each of these can be built using the same principles and frameworks we discussed, whether that's LangGraph or CrewAI.</p>
<h2 id="heading-whats-next-in-agent-technology">What’s Next in Agent Technology?</h2>
<p>LLM agents are evolving at lightning speed, and the next wave of innovation is already here:</p>
<ul>
<li><p><strong>Smarter memory systems</strong>: Expect agents to have better long-term memory, allowing them to learn over extended periods and remember past conversations and actions.</p>
</li>
<li><p><strong>Multi-modal agents</strong>: Agents won't just handle text anymore! They'll be able to process and understand images, audio, and video, making them much more versatile.</p>
</li>
<li><p><strong>Advanced planning frameworks</strong>: Approaches such as ReAct-style reasoning, Toolformer-style tool use, and multi-agent frameworks like AutoGen keep improving agents' ability to reason and plan, and help reduce those pesky "hallucinations."</p>
</li>
<li><p><strong>Edge deployment</strong>: Imagine agents running entirely offline on your local computer or device using lightweight models like LLaMA 3 or Mistral.</p>
</li>
</ul>
<p>Over the next few years, expect to see agents increasingly integrated into:</p>
<ul>
<li><p>Your DevOps pipelines</p>
</li>
<li><p>Big enterprise workflows</p>
</li>
<li><p>Everyday productivity tools</p>
</li>
<li><p>Mobile apps and smart devices</p>
</li>
<li><p>Games, simulations, and educational platforms</p>
</li>
</ul>
<h2 id="heading-final-summary">Final Summary</h2>
<p>Alright, let's quickly recap all the cool stuff you've just learned and accomplished:</p>
<ul>
<li><p>You've gotten a solid grasp of what LLM agents are and why they're so powerful.</p>
</li>
<li><p>You've seen how open-source frameworks like LangGraph and CrewAI make building agents much easier.</p>
</li>
<li><p>You've built a real LLM agent using LangGraph to automate a common daily task: summarizing your inbox!</p>
</li>
<li><p>You've explored the world of multi-agent collaboration with CrewAI, understanding how teams of AIs can work together.</p>
</li>
<li><p>You've learned how to take these principles and scale them to automate countless other tasks.</p>
</li>
</ul>
<p>So, next time you find yourself stuck doing something repetitive, just ask yourself: "Hey, can I build an agent for that?" The answer is probably yes!</p>
<h3 id="heading-resources-recap">Resources Recap</h3>
<p>Here are some helpful resources if you want to dive deeper into building LLM agents:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Resource</strong></td><td><strong>Link</strong></td></tr>
</thead>
<tbody>
<tr>
<td>LangGraph Docs</td><td><a target="_blank" href="https://docs.langgraph.dev/">https://docs.langgraph.dev/</a></td></tr>
<tr>
<td>CrewAI GitHub</td><td><a target="_blank" href="https://github.com/joaomdmoura/crewAI">https://github.com/joaomdmoura/crewAI</a></td></tr>
<tr>
<td>LangChain Docs</td><td><a target="_blank" href="https://docs.langchain.com/docs/">https://docs.langchain.com/docs/</a></td></tr>
<tr>
<td>OpenAI API Docs</td><td><a target="_blank" href="https://platform.openai.com/docs">https://platform.openai.com/docs</a></td></tr>
<tr>
<td>Python 3.9+</td><td><a target="_blank" href="https://www.python.org/downloads/">https://www.python.org/downloads/</a></td></tr>
<tr>
<td>VSCode</td><td><a target="_blank" href="https://code.visualstudio.com/">https://code.visualstudio.com/</a></td></tr>
</tbody>
</table>
</div> ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
