agents - freeCodeCamp.org

How to Use Context Hub (chub) to Build a Companion Relevance Engine

Nataraj Sundar — Fri, 17 Apr 2026 20:36:32 +0000

Large language models can write code quickly, but they still misremember APIs, miss version-specific details, and forget what they learned at the end of a session.

That is the problem Context Hub is trying to solve.

Context Hub (chub) gives coding agents curated, versioned documentation and skills that they can search and fetch through a CLI. It also gives them two learning loops: local annotations for agent memory and feedback for maintainers.

In this tutorial, you'll learn how the official chub workflow works, how Context Hub organizes docs and skills, how annotations and feedback create a memory loop, and how to build a companion relevance engine that improves retrieval without breaking the upstream content model.

This tutorial uses two public repositories side by side:

the official upstream project: andrewyng/context-hub
the companion implementation for this article: natarajsundar/context-hub-relevance-engine

I've also opened a corresponding upstream pull request from my fork to the main project. If you want to track that work from the article, use the upstream pull request list filtered by author: andrewyng/context-hub pull requests by natarajsundar.

What We'll Build

By the end of this tutorial, you'll have:

a clear mental model for how Context Hub works
a working local install of the official chub CLI
a repeatable workflow for search, fetch, annotations, and feedback
a companion repo that adds an additive reranking layer on top of a Context-Hub-style content tree
a small benchmark and local comparison UI you can run end to end
a clear bridge between the companion repo and the smaller upstream PR

Prerequisites

Before you start, make sure you have:

Node.js 18 or newer
npm
comfort with the terminal
basic familiarity with Markdown

How to Understand Context Hub
How to Understand the Official Repo, the Companion Repo, and the Upstream PR
How to Install and Use the Official CLI
How to Understand Docs, Skills, and the Content Layout
How to Use Incremental Fetch and Layered Sources
How to Use Annotations and Feedback to Create a Memory Loop
How to See Where Relevance Still Misses
How the Companion Relevance Engine Improves Retrieval
How to Run the Companion Repo End to End
How to Read the Benchmark Honestly
How to Connect the Companion Repo to the Upstream PR
Conclusion
Sources

How to Understand Context Hub

Context Hub is easiest to understand as a workflow for turning fast-moving documentation into a reliable input for coding agents.

Instead of asking an agent to rely on whatever it remembers from training data, you give it a predictable contract:

search for the right entry
fetch the right doc or skill
write code against that curated content
save local lessons as annotations
send doc-quality feedback back to maintainers

That system boundary matters.

It makes the agent easier to audit, easier to improve, and easier to extend. It also keeps the interface small enough that you can reason about where the failures happen. If the agent still misses the answer, you can ask whether the problem happened during search, fetch, context selection, or generation.

How to Understand the Official Repo, the Companion Repo, and the Upstream PR

This tutorial is intentionally split across two codebases and one contribution path.

The official upstream project, andrewyng/context-hub, is the source of truth for the real CLI, the content model, and the documented workflows. That's the codebase you should use to learn how chub works today.

The companion repository, natarajsundar/context-hub-relevance-engine, is where the relevance ideas in this article are made concrete. It's a companion implementation, not a replacement product. Its job is to make retrieval tradeoffs visible, measurable, and easy to run locally.

The upstream PR is the bridge between those two worlds. The companion repo is where you can iterate faster on benchmarks, reranking, and the comparison UI. The upstream PR is where the smallest reviewable slices can be proposed back to the main project. You can track that thread here: upstream PR search filtered by author.

That three-part framing keeps the article honest:

use the upstream repo to understand the current system
use the companion repo to explore relevance improvements end to end
use the upstream PR to show how a larger idea can be broken into reviewable pieces

How to Install and Use the Official CLI

The official quick start is intentionally small.

npm install -g @aisuite/chub

Once the CLI is installed, you can search for what is available and fetch a specific entry:

chub search openai
chub get openai/chat --lang py

That's the happy path, but it helps to think through the request flow.

In practice, the most useful detail is that the CLI is designed for the agent to use, not just for the human to use by hand.

That's why the upstream CLI also ships a get-api-docs skill. For example, if you use Claude Code, you can copy the skill into your local project like this:

mkdir -p .claude/skills
cp "$(npm root -g)/@aisuite/chub/skills/get-api-docs/SKILL.md" \
  .claude/skills/get-api-docs.md

That step teaches the agent a retrieval habit:

Before you write code against a third-party SDK or API, use chub instead of guessing.

That behavioral rule is often as important as the docs themselves.

How to Understand Docs, Skills, and the Content Layout

Context Hub separates content into two categories:

docs, which answer “what should the agent know?”
skills, which answer “how should the agent behave?”

That distinction makes the content model easier to scale. Docs can be versioned and language-specific. Skills can stay short and operational.

The directory structure is also predictable. The content guide organizes entries by author, then by docs or skills, then by entry name.

A small example looks like this:

author/docs/payments/python/DOC.md
author/docs/payments/python/references/errors.md
author/skills/login-flows/SKILL.md

This is one of the reasons Context Hub is easy to work with.

The shape of the content is plain Markdown, the main entry file is predictable, and the build output is inspectable. You don't have to reverse engineer a hidden prompt layer to figure out what the agent is reading.
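Because the layout is this regular, a path can be split into its parts with a few lines of code. The sketch below is illustrative only: EntryPath and parse_entry_path are hypothetical names, and the rule that docs carry a language directory while skills do not is an assumption inferred from the three example paths above, not a statement about the official content guide.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class EntryPath:
    author: str
    kind: str                 # "docs" or "skills"
    name: str
    language: Optional[str]   # None when the path has no language segment
    relative_file: str        # path below the entry (and language) directory
    is_entry_file: bool       # True for DOC.md / SKILL.md


def parse_entry_path(path: str) -> EntryPath:
    """Split author/kind/name[/language]/file into named parts."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[1] not in ("docs", "skills"):
        raise ValueError(f"not a recognizable entry path: {path!r}")

    author, kind, name = parts[:3]
    rest = parts[3:]

    language = None
    # Assumption: docs have a language directory, skills do not.
    if kind == "docs" and len(rest) > 1:
        language, rest = rest[0], rest[1:]

    relative_file = "/".join(rest)
    entry_name = "DOC.md" if kind == "docs" else "SKILL.md"
    return EntryPath(author, kind, name, language,
                     relative_file, relative_file == entry_name)
```

Parsing author/docs/payments/python/references/errors.md this way yields the language python and the relative file references/errors.md, which is the same value the --file flag takes.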

How to Use Incremental Fetch and Layered Sources

One of the best design choices in Context Hub is that it doesn't force you to inject every file into the model on every request.

Instead, the entry file gives you the overview, and the reference files hold the deeper material.

That lets you fetch content in progressively larger slices.

chub get stripe/webhooks --lang py
chub get stripe/webhooks --lang py --file references/raw-body.md
chub get stripe/webhooks --lang py --full

This is a token-budget feature as much as it is a documentation feature. A good agent should first load the overview, decide what part of the task matters, and only then fetch the specific supporting file.
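If an agent harness drives the CLI from Python, the overview-first habit can be encoded as a small plan builder. This is a sketch under assumptions: build_get_command and progressive_fetch_plan are hypothetical helpers, and the only flags used are the ones shown in the commands above (--lang, --file, --full).

```python
from typing import List, Optional


def build_get_command(entry: str, lang: str,
                      file: Optional[str] = None,
                      full: bool = False) -> List[str]:
    """Build the argv list for one `chub get` call."""
    cmd = ["chub", "get", entry, "--lang", lang]
    if file is not None:
        cmd += ["--file", file]
    if full:
        cmd.append("--full")
    return cmd


def progressive_fetch_plan(entry: str, lang: str,
                           wanted_reference: Optional[str] = None) -> List[List[str]]:
    """Overview first, then one reference file only if the task needs it."""
    plan = [build_get_command(entry, lang)]
    if wanted_reference:
        plan.append(build_get_command(entry, lang, file=wanted_reference))
    return plan
```

Each command in the plan can be passed to subprocess.run(cmd, capture_output=True, text=True) once chub is installed. Keeping the plan as data makes it easy to log how many tokens each step added before deciding whether --full is worth it.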

Context Hub also supports layered sources. You can merge public content with your own local build output through ~/.chub/config.yaml.

A minimal configuration looks like this:

sources:
  - name: community
    url: https://cdn.aichub.org/v1
  - name: my-team
    path: /opt/team-docs/dist

That means you can keep public docs in one lane and team-specific runbooks in another lane while still giving the agent one search surface.
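To make the "one search surface" idea concrete, here is a toy model of layering. It is not how chub merges sources internally, which this article does not cover. SourcedEntry, merge_sources, and search are hypothetical names, and the sketch assumes the simplest policy: keep every entry and tag it with the source it came from.

```python
from typing import Dict, List, NamedTuple


class SourcedEntry(NamedTuple):
    source: str     # e.g. "community" or "my-team"
    entry_id: str   # e.g. "stripe/webhooks"


def merge_sources(registries: Dict[str, List[str]]) -> List[SourcedEntry]:
    """Flatten several registries into one list, remembering each entry's source."""
    merged = []
    for source_name, entry_ids in registries.items():
        for entry_id in entry_ids:
            merged.append(SourcedEntry(source_name, entry_id))
    return merged


def search(merged: List[SourcedEntry], term: str) -> List[SourcedEntry]:
    """Naive substring search across every layered source at once."""
    needle = term.lower()
    return [entry for entry in merged if needle in entry.entry_id.lower()]
```

Tagging each hit with its source keeps the public and team lanes distinguishable even though the agent issues a single query.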

How to Use Annotations and Feedback to Create a Memory Loop

Context Hub has two different improvement channels.

Annotations are local. They help your agent remember what worked last time. Feedback is shared. It helps maintainers improve the docs for everyone.

That distinction matters because not every lesson belongs in the shared registry. Some lessons are environment-specific. Others point to content quality issues that should be fixed centrally.

Here is what local memory looks like in practice:

chub annotate stripe/webhooks \
  "Remember: Flask request.data must stay raw for Stripe signature verification."

And here's the feedback path:

chub feedback stripe/webhooks up

That loop is simple, but it's one of the most important ideas in the project. It turns a one-off debugging lesson into either persistent local memory or a signal that the shared docs need to improve.
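The local half of that loop can be modeled in a few lines. This is a minimal in-memory sketch, assuming annotations are plain strings keyed by entry ID and appended when the doc is rendered. AnnotationStore is a hypothetical class, and the real CLI's storage format is not shown in this article.

```python
from collections import defaultdict
from typing import DefaultDict, List


class AnnotationStore:
    """Keeps local notes per entry and appends them to fetched docs."""

    def __init__(self) -> None:
        self._notes: DefaultDict[str, List[str]] = defaultdict(list)

    def annotate(self, entry_id: str, note: str) -> None:
        self._notes[entry_id].append(note)

    def render(self, entry_id: str, doc_text: str) -> str:
        """Return the doc unchanged, or with an annotations section appended."""
        notes = self._notes.get(entry_id, [])
        if not notes:
            return doc_text
        lines = [doc_text.rstrip(), "", "Annotations:"]
        lines += [f"- {note}" for note in notes]
        return "\n".join(lines)
```

The important property is that the shared doc text is never edited. The note travels alongside it, so a lesson that only applies to your environment stays local.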

How to See Where Relevance Still Misses

The upstream project already has a real ranking story. It uses BM25 and lexical rescue so that package-like identifiers, exact tokens, and fuzzy matches still have a chance to surface.

That is a strong baseline.

But developer queries are often much messier than package names.

People search for:

rrf
signin
pg vector
hnsw
raw body stripe

Those aren't “bad” queries. They're realistic shorthand.

And they expose an opportunity in the content model itself: many of the exact answers live in reference files such as references/rrf.md, references/raw-body.md, and references/hnsw.md.

So the question is not whether the current search works at all. It clearly does. The better question is this:

How can you improve retrieval without breaking the content contract that already makes Context Hub useful?

The answer in the companion repo is to keep the current model and add a reranking layer on top of it.

How the Companion Relevance Engine Improves Retrieval

The companion repository in this article is context-hub-relevance-engine.

It keeps the same broad ideas that make Context Hub attractive:

plain Markdown content
DOC.md and SKILL.md entry points
build artifacts you can inspect
local annotations and feedback
progressive fetch behavior

Then it adds one new build artifact: signals.json.

At build time, the engine extracts extra signals such as:

headings from the main file
titles and tokens from reference files
language and version metadata
source metadata and freshness
annotation overlap
feedback priors

The first pass stays cheap and transparent. The reranker only runs after the baseline has done its work.

That approach matters for two reasons.

First, it's additive. You don't have to redesign the content tree.

Second, it's measurable. You can define concrete failure modes, fix them one by one, and run the same benchmark every time you change the scorer.
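A stripped-down version of the additive idea looks like this. It is a sketch, not the companion repo's scorer: the signal field names (reference_titles, heading_tokens) and the weights are assumptions chosen for illustration, and the actual signals.json schema may differ.

```python
from typing import Dict, List, Tuple

Signals = Dict[str, Dict[str, List[str]]]


def rerank(query: str,
           baseline_scores: Dict[str, float],
           signals: Signals,
           reference_weight: float = 3.0,
           heading_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Add signal-overlap bonuses on top of whatever the baseline scored."""
    query_tokens = set(query.lower().split())
    ranked = []
    for entry_id, entry_signals in signals.items():
        ref_titles = {t.lower() for t in entry_signals.get("reference_titles", [])}
        headings = {t.lower() for t in entry_signals.get("heading_tokens", [])}
        bonus = (reference_weight * len(query_tokens & ref_titles)
                 + heading_weight * len(query_tokens & headings))
        # The baseline score is kept as-is; the bonus is purely additive.
        score = baseline_scores.get(entry_id, 0.0) + bonus
        if score > 0:
            ranked.append((entry_id, score))
    # Highest score first; entry ID breaks ties deterministically.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Because the baseline score passes through untouched, setting both weights to zero reproduces the baseline ordering, which is a convenient sanity check when tuning.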

How to Run the Companion Repo End to End

Open the repository on GitHub, clone it using GitHub’s normal clone flow, and then run the commands below from the project root.

cd context-hub-relevance-engine
npm install
npm run build
npm test

The repository has no third-party runtime dependencies, so npm install is mostly there to keep the workflow familiar. The main commands are all plain Node scripts.

How to Reproduce a Baseline Miss

Start with the query rrf.

node bin/chub-lab.mjs search rrf --mode baseline --lang python

Expected output:

No results.

Now run the improved mode.

node bin/chub-lab.mjs search rrf --mode improved --lang python

Expected top result:

langchain/retrievers [doc] score=320.24
  Composable retrieval patterns for hybrid search, parent documents, query expansion, and reranking.

That win happens because the improved mode looks beyond the top-level entry description. It also sees the reference file title rrf, the related terms from query expansion, and the broader token overlap in the extracted signals.

How to Reproduce a Workflow-intent Win

Try a sign-in query.

node bin/chub-lab.mjs search signin --mode baseline
node bin/chub-lab.mjs search signin --mode improved

The baseline misses. The improved mode returns playwright-community/login-flows because the reranker treats signin, sign in, login, and authentication as related intent.
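One simple way to get that behavior is to expand the query into a set of related phrasings before matching. The sketch below assumes a hand-written synonym group built from the four terms named above. INTENT_GROUPS and expand_query are hypothetical, and the companion repo may implement expansion differently.

```python
from typing import List, Set

# Assumption: a small hand-curated table of phrases that share one intent.
INTENT_GROUPS: List[Set[str]] = [
    {"signin", "sign in", "login", "authentication"},
]


def expand_query(query: str) -> Set[str]:
    """Return the normalized query plus every phrase that shares its intent."""
    normalized = " ".join(query.lower().split())
    tokens = set(normalized.split())
    expanded = {normalized}
    for group in INTENT_GROUPS:
        # Match on the whole query ("sign in") or on any single token ("signin").
        if normalized in group or tokens & group:
            expanded |= group
    return expanded
```

A query with no known intent comes back unchanged, so the expansion step never makes an exact-match query worse.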

How to Test the Memory Loop

Write a local note:

node bin/chub-lab.mjs annotate stripe/webhooks \
  "Remember: Flask request.data must stay raw for Stripe signature verification."

Then fetch the doc:

node bin/chub-lab.mjs get stripe/webhooks --lang python

You will see the main doc content, the list of available reference files, and the appended annotation.

That's the behavior you want from an agent memory loop: learn once, reuse many times.

How to Run the Benchmark

Start from an empty store:

npm run reset-store
node bin/chub-lab.mjs evaluate

The included synthetic stress set reports the following summary with an empty store:

Mode	Top-1 Accuracy	MRR
baseline	0.333	0.333
improved	1.000	1.000
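Both columns are standard ranking metrics and are easy to compute by hand. The sketch below shows the usual definitions, assuming each query has exactly one expected entry. The function names and the input shapes are illustrative and are not taken from the repo's evaluate command.

```python
from typing import Dict, List


def top1_accuracy(results: Dict[str, List[str]],
                  expected: Dict[str, str]) -> float:
    """Fraction of queries whose first result is the expected entry."""
    if not expected:
        return 0.0
    hits = sum(
        1 for query, want in expected.items()
        if results.get(query) and results[query][0] == want
    )
    return hits / len(expected)


def mean_reciprocal_rank(results: Dict[str, List[str]],
                         expected: Dict[str, str]) -> float:
    """Average of 1/rank of the expected entry; a miss contributes 0."""
    if not expected:
        return 0.0
    total = 0.0
    for query, want in expected.items():
        ranked = results.get(query, [])
        if want in ranked:
            total += 1.0 / (ranked.index(want) + 1)
    return total / len(expected)
```

When every hit lands at rank 1 and every miss returns nothing, the two metrics coincide. For instance, one rank-1 hit out of three queries gives 0.333 for both, which is the shape of the baseline row.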

You can also seed the store and rerun the evaluation:

npm run seed-demo
node bin/chub-lab.mjs evaluate

That demonstrates how annotations and feedback can push relevant entries even higher when the query overlaps with the agent’s own history.

How to Launch the Local Comparison UI

npm run serve

Then open http://localhost:8787 in your browser.

The UI lets you compare baseline and improved retrieval, inspect stored annotations and feedback, rebuild the local artifacts, and rerun the benchmark from one place.

How to Read the Benchmark Honestly

The benchmark in this repo is intentionally small.

That is a feature, not a flaw.

The point is not to claim universal search quality. The point is to make a handful of realistic failure modes easy to reproduce:

acronym queries
shorthand workflow queries
reference-file topic queries
memory-aware reranking

That keeps the evaluation honest.

If a future scoring change breaks rrf, signin, or raw body stripe, you'll know immediately. And if you add a stronger dataset later, you can keep these tests as regression guards.

The benchmark files included in the repo are:

demo/benchmark.json
docs/benchmark-empty-store.json
docs/benchmark-seeded-store.json
docs/relevance-improvement-plan.md

How to Connect the Companion Repo to the Upstream PR

A good companion repo is broad enough to explore ideas quickly. A good upstream PR is narrow enough to review.

That's why the two shouldn't be identical.

The companion repository is where you can keep the full relevance story together:

the local comparison UI
the synthetic benchmark
the richer reranking signals
the debug and explain surfaces
the documentation that walks through tradeoffs end to end

The upstream PR should be smaller and more surgical. In practice, that usually means proposing the most reviewable slices first, such as:

reference-file signal extraction
explainable score output for debugging
a lightweight benchmark fixture format
one additive reranking hook behind a flag

That keeps the main repository maintainable while still letting the article and companion repo tell the full engineering story. The upstream thread for this work lives here: andrewyng/context-hub pull requests by natarajsundar.

Conclusion

What makes Context Hub interesting is not just that it stores documentation. It gives you a clear system boundary for improving coding agents.

You can inspect what the agent reads. You can decide when it should retrieve. You can layer public and private sources. You can persist local lessons. And you can improve ranking without tearing the whole model apart.

The companion relevance engine shows how to keep what already works, make one part of the system measurably better, and package the result in a way other developers can run, inspect, and extend. The upstream PR, in turn, shows how to turn a broad idea into smaller pieces that are realistic to review in the main project.

Diagram Attribution

All diagrams used in this article were created by the author specifically for this tutorial and its companion repository.

Sources

Docker Container Doctor: How I Built an AI Agent That Monitors and Fixes My Containers

Balajee Asish Brahmandam — Mon, 23 Mar 2026 17:21:11 +0000

Maybe this sounds familiar: your production container crashes at 3 AM. By the time you wake up, it's been throwing the same error for 2 hours. You SSH in, pull logs, decode the cryptic stack trace, Google the error, and finally restart it. Twenty minutes of your morning gone. And the worst part? It happens again next week.

I got tired of this cycle. I was running 5 containerized services on a single Linode box – a Flask API, a Postgres database, an Nginx reverse proxy, a Redis cache, and a background worker. Every other week, one of them would crash. The logs were messy. The errors weren't obvious. And I'd waste time debugging something that could've been auto-detected and fixed in seconds.

So I built something better: a Python agent that watches your containers in real-time, spots errors, figures out what went wrong using Claude, and fixes them without waking you up. I call it the Container Doctor. It's not magic. It's Docker API + LLM reasoning + some automation glue. Here's exactly how I built it, what went wrong along the way, and what I'd do differently.

Why Not Just Use Prometheus?
The Architecture
Setting Up the Project
The Monitoring Script – Line by Line
The Claude Diagnosis Prompt (and Why Structure Matters)
Auto-Fix Logic – Being Conservative on Purpose
Adding Slack Notifications
Health Check Endpoint
Rate Limiting Claude Calls
Docker Compose – The Full Setup
Real Errors I Caught in Production
Cost Breakdown – What This Actually Costs
Security Considerations
What I'd Do Differently
What's Next?

Why Not Just Use Prometheus?

Fair question. Prometheus, Grafana, DataDog – they're all great. But for my setup, they were overkill. I had 5 containers on a $20/month Linode. Setting up Prometheus means deploying a metrics server, configuring exporters for each service, building Grafana dashboards, and writing alert rules. That's a whole side project just to monitor a side project.

Even then, those tools tell you what happened. They'll show you a spike in memory or a 500 error rate. But they won't tell you why. You still need a human to look at the logs, figure out the root cause, and decide what to do.

That's the gap I wanted to fill. I didn't need another dashboard. I needed something that could read a stack trace, understand the context, and either fix it or tell me exactly what to do when I wake up. Claude turned out to be surprisingly good at this. It can read a Python traceback and tell you the issue faster than most junior devs (and some senior ones, honestly).

The Architecture

Here's how the pieces fit together:

┌─────────────────────────────────────────────┐
│              Docker Host                      │
│                                               │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │   web    │  │   api    │  │    db    │   │
│  │ (nginx)  │  │ (flask)  │  │(postgres)│   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘   │
│       │              │              │         │
│       └──────────────┼──────────────┘         │
│                      │                         │
│              Docker Socket                     │
│                      │                         │
│            ┌─────────┴─────────┐              │
│            │ Container Doctor  │              │
│            │  (Python agent)   │              │
│            └─────────┬─────────┘              │
│                      │                         │
└──────────────────────┼─────────────────────────┘
                       │
              ┌────────┴────────┐
              │   Claude API    │
              │  (diagnosis)    │
              └────────┬────────┘
                       │
              ┌────────┴────────┐
              │  Slack Webhook  │
              │  (alerts)       │
              └─────────────────┘

The flow works like this:

The Container Doctor runs in its own container with the Docker socket mounted
Every 10 seconds, it pulls the last 50 lines of logs from each target container
It scans for error patterns (keywords like "error", "exception", "traceback", "fatal")
When it finds something, it sends the logs to Claude with a structured prompt
Claude returns a JSON diagnosis: root cause, severity, suggested fix, and whether it's safe to auto-restart
If severity is high and auto-restart is safe, the script restarts the container
Either way, it sends a Slack notification with the full diagnosis
A simple health endpoint lets you check the doctor's own status

The key insight: the script doesn't try to be smart about the diagnosis itself. It outsources all the thinking to Claude. The script's job is just plumbing: collecting logs, routing them to Claude, and executing the response.
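The routing rule in that flow can be isolated as a small pure function, which makes it testable without Docker or an API key. This is a sketch rather than the script's own code: should_restart is a hypothetical helper, and the strict `is True` comparison is an assumption that is deliberately tighter than a plain truthiness check, so a stray string such as "false" never triggers a restart.

```python
from typing import Any, Dict


def should_restart(diagnosis: Dict[str, Any], auto_fix_enabled: bool) -> bool:
    """Restart only when all three conditions from the flow hold."""
    return (
        auto_fix_enabled
        and diagnosis.get("severity") == "high"
        # Require a real JSON boolean, not just any truthy value.
        and diagnosis.get("auto_restart_safe") is True
    )
```

Keeping the decision separate from the Docker call also means the Slack notification can report what the doctor would have done when auto-fix is switched off.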

Setting Up the Project

Create your project directory:

mkdir container-doctor && cd container-doctor

Here's your requirements.txt:

docker==7.0.0
anthropic>=0.28.0
python-dotenv==1.0.0
flask==3.0.0
requests==2.31.0

Install locally for testing: pip install -r requirements.txt

Create a .env file:

ANTHROPIC_API_KEY=sk-ant-...
TARGET_CONTAINERS=web,api,db
CHECK_INTERVAL=10
LOG_LINES=50
AUTO_FIX=true
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
POSTGRES_USER=user
POSTGRES_PASSWORD=changeme
POSTGRES_DB=mydb
MAX_DIAGNOSES_PER_HOUR=20

A quick note on CHECK_INTERVAL: 10 seconds is aggressive. For production, I'd bump this to 30-60 seconds. I kept it low during development so I could see results faster, and honestly forgot to change it. My API bill reminded me.

The Monitoring Script – Line by Line

Here's the full container_doctor.py. I'll walk through the important parts after:

import docker
import json
import time
import logging
import os
import requests
from datetime import datetime, timedelta
from collections import defaultdict
from threading import Thread
from flask import Flask, jsonify
from anthropic import Anthropic

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

client = Anthropic()
docker_client = None

# --- Config ---
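# Strip whitespace and drop empty names so an unset variable yields [] rather than [""]
TARGET_CONTAINERS = [
    name.strip()
    for name in os.getenv("TARGET_CONTAINERS", "").split(",")
    if name.strip()
]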
CHECK_INTERVAL = int(os.getenv("CHECK_INTERVAL", "10"))
LOG_LINES = int(os.getenv("LOG_LINES", "50"))
AUTO_FIX = os.getenv("AUTO_FIX", "true").lower() == "true"
SLACK_WEBHOOK = os.getenv("SLACK_WEBHOOK_URL", "")
MAX_DIAGNOSES = int(os.getenv("MAX_DIAGNOSES_PER_HOUR", "20"))

# --- State tracking ---
diagnosis_history = []
fix_history = defaultdict(list)
last_error_seen = {}
rate_limit_counter = defaultdict(int)
rate_limit_reset = datetime.now() + timedelta(hours=1)

app = Flask(__name__)


def get_docker_client():
    """Lazily initialize Docker client."""
    global docker_client
    if docker_client is None:
        docker_client = docker.from_env()
    return docker_client


def get_container_logs(container_name):
    """Fetch last N lines from a container."""
    try:
        container = get_docker_client().containers.get(container_name)
        logs = container.logs(
            tail=LOG_LINES,
            timestamps=True
        ).decode("utf-8")
        return logs
    except docker.errors.NotFound:
        logger.warning(f"Container '{container_name}' not found. Skipping.")
        return None
    except docker.errors.APIError as e:
        logger.error(f"Docker API error for {container_name}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error fetching logs for {container_name}: {e}")
        return None


def detect_errors(logs):
    """Check if logs contain error patterns."""
    error_patterns = [
        "error", "exception", "traceback", "failed", "crash",
        "fatal", "panic", "segmentation fault", "out of memory",
        "killed", "oomkiller", "connection refused", "timeout",
        "permission denied", "no such file", "errno"
    ]
    logs_lower = logs.lower()
    found = []
    for pattern in error_patterns:
        if pattern in logs_lower:
            found.append(pattern)
    return found


def is_new_error(container_name, logs):
    """Check if this is a new error or the same one we already diagnosed."""
    log_hash = hash(logs[-200:])  # Hash last 200 chars
    if last_error_seen.get(container_name) == log_hash:
        return False
    last_error_seen[container_name] = log_hash
    return True


def check_rate_limit():
    """Ensure we don't spam Claude with too many requests."""
    global rate_limit_counter, rate_limit_reset

    now = datetime.now()
    if now > rate_limit_reset:
        rate_limit_counter.clear()
        rate_limit_reset = now + timedelta(hours=1)

    total = sum(rate_limit_counter.values())
    if total >= MAX_DIAGNOSES:
        logger.warning(f"Rate limit reached ({total}/{MAX_DIAGNOSES} per hour). Skipping diagnosis.")
        return False
    return True


def diagnose_with_claude(container_name, logs, error_patterns):
    """Send logs to Claude for diagnosis."""
    if not check_rate_limit():
        return None

    rate_limit_counter[container_name] += 1

    prompt = f"""You are a DevOps expert analyzing container logs.

Container: {container_name}
Timestamp: {datetime.now().isoformat()}
Detected patterns: {', '.join(error_patterns)}

Recent logs:
---
{logs}
---

Analyze these logs and respond with ONLY valid JSON (no markdown, no explanation):
{{
    "root_cause": "One sentence explaining exactly what went wrong",
    "severity": "low|medium|high",
    "suggested_fix": "Step-by-step fix the operator should apply",
    "auto_restart_safe": true or false,
    "config_suggestions": ["ENV_VAR=value", "..."],
    "likely_recurring": true or false,
    "estimated_impact": "What breaks if this isn't fixed"
}}
"""

    try:
        message = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=600,
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        return message.content[0].text
    except Exception as e:
        logger.error(f"Claude API error: {e}")
        return None


def parse_diagnosis(diagnosis_text):
    """Extract JSON from Claude's response."""
    if not diagnosis_text:
        return None
    try:
        start = diagnosis_text.find("{")
        end = diagnosis_text.rfind("}") + 1
        if start >= 0 and end > start:
            json_str = diagnosis_text[start:end]
            return json.loads(json_str)
    except json.JSONDecodeError as e:
        logger.error(f"JSON parse error: {e}")
        logger.debug(f"Raw response: {diagnosis_text}")
    except Exception as e:
        logger.error(f"Failed to parse diagnosis: {e}")
    return None


def apply_fix(container_name, diagnosis):
    """Apply auto-fixes if safe."""
    if not AUTO_FIX:
        logger.info(f"Auto-fix disabled globally. Skipping {container_name}.")
        return False

    if not diagnosis.get("auto_restart_safe"):
        logger.info(f"Claude says restart is unsafe for {container_name}. Skipping.")
        return False

    # Don't restart the same container more than 3 times per hour
    recent_fixes = [
        t for t in fix_history[container_name]
        if t > datetime.now() - timedelta(hours=1)
    ]
    if len(recent_fixes) >= 3:
        logger.warning(
            f"Container {container_name} already restarted {len(recent_fixes)} "
            f"times this hour. Something deeper is wrong. Skipping."
        )
        send_slack_alert(
            container_name, diagnosis,
            extra="REPEATED FAILURE: This container has been restarted 3+ times "
                  "in the last hour. Manual intervention needed."
        )
        return False

    try:
        container = get_docker_client().containers.get(container_name)
        logger.info(f"Restarting container {container_name}...")
        container.restart(timeout=30)
        fix_history[container_name].append(datetime.now())
        logger.info(f"Container {container_name} restarted successfully")

        # Verify it's actually running after restart
        time.sleep(5)
        container.reload()
        if container.status != "running":
            logger.error(f"Container {container_name} failed to start after restart")
            return False

        return True
    except Exception as e:
        logger.error(f"Failed to restart {container_name}: {e}")
        return False


def send_slack_alert(container_name, diagnosis, extra=""):
    """Send diagnosis to Slack."""
    if not SLACK_WEBHOOK:
        return

    severity_emoji = {
        "low": "🟡",
        "medium": "🟠",
        "high": "🔴"
    }

    severity = diagnosis.get("severity", "unknown")
    emoji = severity_emoji.get(severity, "⚪")

    blocks = [
        {
            "type": "header",
            "text": {
                "type": "plain_text",
                "text": f"{emoji} Container Doctor Alert: {container_name}"
            }
        },
        {
            "type": "section",
            "fields": [
                {"type": "mrkdwn", "text": f"*Severity:* {severity}"},
                {"type": "mrkdwn", "text": f"*Container:* `{container_name}`"},
                {"type": "mrkdwn", "text": f"*Root Cause:* {diagnosis.get('root_cause', 'Unknown')}"},
                {"type": "mrkdwn", "text": f"*Fix:* {diagnosis.get('suggested_fix', 'N/A')}"},
            ]
        }
    ]

    if diagnosis.get("config_suggestions"):
        suggestions = "\n".join(
            f"• `{s}`" for s in diagnosis["config_suggestions"]
        )
        blocks.append({
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*Config Suggestions:*\n{suggestions}"
            }
        })

    if extra:
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*⚠️ {extra}*"}
        })

    try:
        requests.post(SLACK_WEBHOOK, json={"blocks": blocks}, timeout=10)
    except Exception as e:
        logger.error(f"Slack notification failed: {e}")


# --- Health Check Endpoint ---
@app.route("/health")
def health():
    """Health check endpoint for the doctor itself."""
    try:
        get_docker_client().ping()
        docker_ok = True
    except Exception:
        # Avoid a bare except, which would also swallow SystemExit and KeyboardInterrupt
        docker_ok = False

    return jsonify({
        "status": "healthy" if docker_ok else "degraded",
        "docker_connected": docker_ok,
        "monitoring": TARGET_CONTAINERS,
        "total_diagnoses": len(diagnosis_history),
        "fixes_applied": {k: len(v) for k, v in fix_history.items()},
        "rate_limit_remaining": MAX_DIAGNOSES - sum(rate_limit_counter.values()),
        "uptime_check": datetime.now().isoformat()
    })


@app.route("/history")
def history():
    """Return recent diagnosis history."""
    return jsonify(diagnosis_history[-50:])


def monitor_containers():
    """Main monitoring loop."""
    logger.info("Container Doctor starting up")
    logger.info(f"Monitoring: {TARGET_CONTAINERS}")
    logger.info(f"Check interval: {CHECK_INTERVAL}s")
    logger.info(f"Auto-fix: {AUTO_FIX}")
    logger.info(f"Rate limit: {MAX_DIAGNOSES}/hour")

    while True:
        for container_name in TARGET_CONTAINERS:
            container_name = container_name.strip()
            if not container_name:
                continue

            logs = get_container_logs(container_name)
            if not logs:
                continue

            error_patterns = detect_errors(logs)
            if not error_patterns:
                continue

            # Skip if we already diagnosed this exact error
            if not is_new_error(container_name, logs):
                continue

            logger.warning(
                f"Errors detected in {container_name}: {error_patterns}"
            )

            diagnosis_text = diagnose_with_claude(
                container_name, logs, error_patterns
            )
            if not diagnosis_text:
                continue

            diagnosis = parse_diagnosis(diagnosis_text)
            if not diagnosis:
                logger.error("Failed to parse Claude's response. Skipping.")
                continue

            # Record it
            diagnosis_history.append({
                "container": container_name,
                "timestamp": datetime.now().isoformat(),
                "diagnosis": diagnosis,
                "patterns": error_patterns
            })

            logger.info(
                f"Diagnosis for {container_name}: "
                f"severity={diagnosis.get('severity')}, "
                f"cause={diagnosis.get('root_cause')}"
            )

            # Auto-fix only on high severity
            fixed = False
            if diagnosis.get("severity") == "high":
                fixed = apply_fix(container_name, diagnosis)

            # Always notify Slack
            send_slack_alert(
                container_name, diagnosis,
                extra="Auto-restarted" if fixed else ""
            )

        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    # Run Flask health endpoint in background
    flask_thread = Thread(
        target=lambda: app.run(host="0.0.0.0", port=8080, debug=False),
        daemon=True
    )
    flask_thread.start()
    logger.info("Health endpoint running on :8080")

    try:
        monitor_containers()
    except KeyboardInterrupt:
        logger.info("Container Doctor shutting down")

That's a lot of code, so let me walk through the parts that matter.

Error deduplication (is_new_error): This was a lesson I learned the hard way. Without this, the script would see the same error every 10 seconds and spam Claude with identical requests. I hash the last 200 characters of the log output and skip if it matches the last error we saw. Simple, but it cut my API costs by about 80%.

Rate limiting (check_rate_limit): Belt and suspenders. Even with deduplication, I cap it at 20 diagnoses per hour. If something is so broken that it's generating 20+ unique errors per hour, you need a human anyway.

Restart throttling (inside apply_fix): If the same container has been restarted 3 times in an hour, something deeper is wrong. A restart loop won't fix a misconfigured database or a missing volume. The script stops restarting and sends a louder Slack alert instead.

Post-restart verification: After restarting, the script waits 5 seconds and checks if the container is actually running. I've seen cases where a container restarts and immediately crashes again. Without this check, the script would report success while the container is still down.
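The deduplication step described above can be sketched in a few lines. This is a minimal illustration rather than the exact function from the script: the names `TAIL_CHARS` and `last_error_hashes`, and the choice of SHA-256, are assumptions made for the example.

```python
import hashlib

TAIL_CHARS = 200          # assumed: how much of the log tail identifies an error
last_error_hashes = {}    # container name -> hash of the last error tail seen


def is_new_error(container_name: str, logs: str) -> bool:
    """Return True only if the log tail differs from the last one recorded."""
    tail = logs[-TAIL_CHARS:]
    digest = hashlib.sha256(tail.encode("utf-8")).hexdigest()
    if last_error_hashes.get(container_name) == digest:
        return False  # same tail as last time: skip the API call
    last_error_hashes[container_name] = digest
    return True
```

One caveat with this approach: if every log line carries a timestamp, the tail changes on each poll even when the error is the same, so it may be worth stripping timestamps before hashing.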

The Claude Diagnosis Prompt (and Why Structure Matters)

Getting Claude to return parseable JSON took some iteration. My first attempt used a casual prompt and I got back paragraphs of explanation with JSON buried somewhere in the middle. Sometimes it'd use markdown code fences, sometimes not.

The version I landed on is explicit about format:

prompt = f"""You are a DevOps expert analyzing container logs.

Container: {container_name}
Timestamp: {datetime.now().isoformat()}
Detected patterns: {', '.join(error_patterns)}

Recent logs:
---
{logs}
---

Analyze these logs and respond with ONLY valid JSON (no markdown, no explanation):
{{
    "root_cause": "One sentence explaining exactly what went wrong",
    "severity": "low|medium|high",
    "suggested_fix": "Step-by-step fix the operator should apply",
    "auto_restart_safe": true or false,
    "config_suggestions": ["ENV_VAR=value", "..."],
    "likely_recurring": true or false,
    "estimated_impact": "What breaks if this isn't fixed"
}}
"""

A few things I learned:

Include the detected patterns. Telling Claude "I found 'timeout' and 'connection refused'" helps it focus. Without this, it sometimes fixated on irrelevant warnings in the logs.

Ask for estimated_impact. This field turned out to be the most useful in Slack alerts. When your team sees "Database connections will pile up and crash the API within 15 minutes," they act faster than when they see "connection pool exhausted."

likely_recurring is gold. If Claude says an issue is likely to recur, I know a restart is a band-aid and I need to actually fix the root cause. I flag these in Slack with extra emphasis.

Claude returns something like:

{
    "root_cause": "Connection pool exhausted. Default pool size is 5, but app has 8+ concurrent workers.",
    "severity": "high",
    "suggested_fix": "1. Set POOL_SIZE=20 in environment. 2. Add connection timeout of 30s. 3. Consider a connection pooler like PgBouncer.",
    "auto_restart_safe": true,
    "config_suggestions": ["POOL_SIZE=20", "CONNECTION_TIMEOUT=30"],
    "likely_recurring": true,
    "estimated_impact": "API requests will queue and timeout. Users will see 503 errors within 2-3 minutes."
}

I only auto-restart on high severity. Medium and low issues get logged, sent to Slack, and I deal with them during business hours. This distinction matters: you don't want the script restarting containers over every transient warning.
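Even with an explicit prompt, it helps to parse the reply defensively, since earlier drafts sometimes came back with prose or code fences around the JSON. Here is a minimal sketch of that idea. The function name `extract_diagnosis` and the `REQUIRED_KEYS` set are hypothetical and are not taken from the script itself.

```python
import json
from typing import Optional

# Assumed minimum set of fields the rest of the script relies on.
REQUIRED_KEYS = {"root_cause", "severity", "auto_restart_safe"}
VALID_SEVERITIES = {"low", "medium", "high"}


def extract_diagnosis(text: str) -> Optional[dict]:
    """Pull a JSON object out of the model's reply, or return None."""
    # Take the outermost {...} span so surrounding prose or fences are ignored.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    if data["severity"] not in VALID_SEVERITIES:
        return None
    return data
```

Returning None on anything unexpected keeps the monitoring loop simple: a reply that can't be validated is treated the same as no diagnosis at all.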

Auto-Fix Logic – Being Conservative on Purpose

The auto-fix function is intentionally limited. Right now it only restarts containers. It doesn't modify environment variables, change configs, or scale services. Here's why:

Restarting is safe and reversible. If the restart makes things worse, the container just crashes again and I get another alert. But if the script started changing environment variables or modifying docker-compose files, a bad decision could cascade across services.

The three safety checks before any restart:

Global toggle: AUTO_FIX=true in .env. I can kill all auto-fixes instantly by changing one variable.
Claude's assessment: auto_restart_safe must be true. If Claude says "don't restart this, it'll corrupt the database," the script listens.
Restart throttle: No more than 3 restarts per container per hour. After that, it's a human problem.
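The three checks above can be expressed as a single gate function. This is a sketch under assumptions, not the script's actual apply_fix: the name `should_restart` and its parameters are hypothetical, and the restart history is passed in as a plain list of datetimes.

```python
from datetime import datetime, timedelta

MAX_RESTARTS_PER_HOUR = 3  # the throttle described above


def should_restart(diagnosis, restart_times, auto_fix_enabled, now=None):
    """Return True only if every safety check passes.

    restart_times: datetimes of this container's earlier restarts.
    """
    now = now or datetime.now()
    if not auto_fix_enabled:                              # 1. global toggle
        return False
    if diagnosis.get("auto_restart_safe") is not True:    # 2. Claude's assessment
        return False
    recent = [t for t in restart_times if now - t < timedelta(hours=1)]
    return len(recent) < MAX_RESTARTS_PER_HOUR            # 3. restart throttle
```

Checking `is not True` rather than truthiness means a missing field or a string like "false" is treated as unsafe, which matches the conservative intent.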

If I were building this for a team, I'd add approval flows. Send a Slack message with "Restart?" and two buttons. Wait for a human to click yes. That adds latency but removes the risk of automated chaos.

Adding Slack Notifications

Every diagnosis gets sent to Slack, whether the container was restarted or not. The notification includes color-coded severity, root cause, suggested fix, and config suggestions.

The Slack Block Kit formatting makes these alerts scannable. A red dot for high severity, orange for medium, yellow for low. Your team can glance at the channel and know if they need to drop everything or if it can wait.

To set this up, create a Slack app at api.slack.com/apps, add an incoming webhook, and paste the URL in your .env.
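As a rough sketch of what such an alert could look like, the snippet below builds a Block Kit payload and posts it to an incoming webhook using only the standard library. The function names, the emoji mapping, and the message layout are assumptions for illustration and may differ from the script's own send_slack_alert.

```python
import json
import urllib.request

# Assumed emoji names for the red / orange / yellow dots.
SEVERITY_EMOJI = {
    "high": ":red_circle:",
    "medium": ":large_orange_circle:",
    "low": ":large_yellow_circle:",
}


def build_slack_payload(container_name, diagnosis, extra=""):
    """Build an incoming-webhook payload made of two Block Kit sections."""
    severity = diagnosis.get("severity", "low")
    emoji = SEVERITY_EMOJI.get(severity, ":white_circle:")
    header = f"{emoji} *{container_name}* ({severity})"
    if extra:
        header += f" - {extra}"
    lines = [
        f"*Root cause:* {diagnosis.get('root_cause', 'unknown')}",
        f"*Suggested fix:* {diagnosis.get('suggested_fix', 'n/a')}",
    ]
    configs = diagnosis.get("config_suggestions") or []
    if configs:
        lines.append("*Config suggestions:* " + ", ".join(f"`{c}`" for c in configs))
    return {
        "text": f"{container_name}: {severity} severity",  # plain-text fallback
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": header}},
            {"type": "section", "text": {"type": "mrkdwn", "text": "\n".join(lines)}},
        ],
    }


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to the webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction separate from the network call makes the formatting easy to test without a real webhook.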

Health Check Endpoint

The doctor needs a doctor. I added a simple Flask endpoint so I can monitor the monitoring script:

curl http://localhost:8080/health

Returns:

{
    "status": "healthy",
    "docker_connected": true,
    "monitoring": ["web", "api", "db"],
    "total_diagnoses": 14,
    "fixes_applied": {"api": 2, "web": 1},
    "rate_limit_remaining": 6,
    "uptime_check": "2026-03-15T14:30:00"
}

And /history returns the last 50 diagnoses:

curl http://localhost:8080/history

I point an uptime checker (UptimeRobot, free tier) at the /health endpoint. If the Container Doctor itself goes down, I get an email. It's monitoring all the way down.

Rate Limiting Claude Calls

This is where I burned money during development. Without rate limiting, the script was sending 100+ requests per hour during a container crash loop. At a few cents per request, that's a few dollars per hour. Not catastrophic, but annoying.

The rate limiter is simple: a counter that resets every hour. Default cap is 20 diagnoses per hour. If you hit the limit, the script logs a warning and skips diagnosis until the window resets. Errors still get detected, they just don't get sent to Claude.

Combined with error deduplication (same error won't trigger a second diagnosis), this keeps my Claude bill under $5/month even with 5 containers monitored.
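A counter that resets every hour can be sketched as a small fixed-window limiter. The class name `HourlyRateLimiter` and the injectable `clock` parameter are assumptions for this example; the script's own check_rate_limit may be structured differently.

```python
import time


class HourlyRateLimiter:
    """Fixed-window counter: allow up to `limit` calls per window."""

    def __init__(self, limit=20, window_seconds=3600, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock              # injectable so the limiter can be tested
        self.window_start = clock()
        self.count = 0

    def allow(self):
        """Return True and count the call if the current window has room."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now     # window expired: reset the counter
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True

    def remaining(self):
        """Calls left in the window, as of the last allow() call."""
        return max(self.limit - self.count, 0)
```

In the monitoring loop, a call like `if not limiter.allow(): continue` placed just before the Claude request would skip diagnosis while still letting error detection run.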

Docker Compose – The Full Setup

Here's the complete docker-compose.yml with the Container Doctor, a sample web server, API, and database:

version: '3.8'

services:
  container_doctor:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: container_doctor
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - TARGET_CONTAINERS=web,api,db
      - CHECK_INTERVAL=10
      - LOG_LINES=50
      - AUTO_FIX=true
      - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
      - MAX_DIAGNOSES_PER_HOUR=20
    ports:
      - "8080:8080"
    restart: unless-stopped
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  web:
    image: nginx:latest
    container_name: web
    ports:
      - "80:80"
    restart: unless-stopped

  api:
    build: ./api
    container_name: api
    environment:
      - DATABASE_URL=postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}
      - POOL_SIZE=20
    depends_on:
      - db
    restart: unless-stopped

  db:
    image: postgres:15
    container_name: db
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    volumes:
      - db_data:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  db_data:

And the Dockerfile:

FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY container_doctor.py .

EXPOSE 8080

CMD ["python", "-u", "container_doctor.py"]

Start everything: docker compose up -d

Important: The socket mount (/var/run/docker.sock:/var/run/docker.sock) gives the Container Doctor full access to the Docker daemon. Don't copy .env into the Docker image either — it bakes your API key into the image layer. Pass environment variables via the compose file or at runtime.

Real Errors I Caught in Production

I've been running this for about 3 weeks now. Here are the actual incidents it caught:

Incident 1: OOM Kill (Week 1)

Logs showed a single word: Killed. That's Linux's OOMKiller doing its thing.

Claude's diagnosis:

{
    "root_cause": "Process killed by OOMKiller. Container is requesting more memory than the 256MB limit allows under load.",
    "severity": "high",
    "suggested_fix": "Increase memory limit to 512MB in docker-compose. Monitor if the leak continues at higher limits.",
    "auto_restart_safe": true,
    "config_suggestions": ["mem_limit: 512m", "memswap_limit: 1g"],
    "likely_recurring": true,
    "estimated_impact": "API is completely down. All requests return 502 from nginx."
}

The script restarted the container in 3 seconds. I updated the compose file the next morning. Before the Container Doctor, this would've been a 2-hour outage overnight.

Incident 2: Connection Pool Exhausted (Week 2)

ERROR: database connection pool exhausted
ERROR: cannot create new pool entry
ERROR: QueuePool limit of 5 overflow 0 reached

Claude caught that my pool size was too small for the number of workers:

{
    "root_cause": "SQLAlchemy connection pool (size=5) can't keep up with 8 concurrent Gunicorn workers. Each worker holds a connection during request processing.",
    "severity": "high",
    "suggested_fix": "Set POOL_SIZE=20 and add POOL_TIMEOUT=30. Long-term: add PgBouncer as a connection pooler.",
    "auto_restart_safe": true,
    "config_suggestions": ["POOL_SIZE=20", "POOL_TIMEOUT=30", "POOL_RECYCLE=3600"],
    "likely_recurring": true,
    "estimated_impact": "New API requests will hang for 30s then timeout. Existing requests may complete but slowly."
}

Incident 3: Transient Timeout (Week 2)

WARN: timeout connecting to upstream service
WARN: retrying request (attempt 2/3)
INFO: request succeeded on retry

Claude correctly identified this as a non-issue:

{
    "root_cause": "Transient network timeout during a DNS resolution hiccup. Retries succeeded.",
    "severity": "low",
    "suggested_fix": "No action needed. This is expected during brief network blips. Only investigate if frequency increases.",
    "auto_restart_safe": false,
    "config_suggestions": [],
    "likely_recurring": false,
    "estimated_impact": "Minimal. Individual requests delayed by ~2s but all completed."
}

No restart. No alert (I filter low-severity from Slack pings). This is the right call: restarting on every transient timeout causes more downtime than it prevents.

Incident 4: Disk Full (Week 3)

ERROR: could not write to temporary file: No space left on device
FATAL: data directory has no space

{
    "root_cause": "Postgres data volume is full. WAL files and temporary sort files consumed all available space.",
    "severity": "high",
    "suggested_fix": "1. Clean WAL files: SELECT pg_switch_wal(). 2. Increase volume size. 3. Add log rotation. 4. Set max_wal_size=1GB.",
    "auto_restart_safe": false,
    "config_suggestions": ["max_wal_size=1GB", "log_rotation_age=1d"],
    "likely_recurring": true,
    "estimated_impact": "Database is read-only. All writes fail. API returns 500 on any mutation."
}

Notice Claude said auto_restart_safe: false here. Restarting Postgres when the disk is full can corrupt data. The script didn't touch it. It just sent me a detailed Slack alert at 4 AM. I cleaned up the WAL files the next morning. Good call by Claude.

Cost Breakdown – What This Actually Costs

After 3 weeks of running this on 5 containers:

Claude API: ~$3.80/month (with rate limiting and deduplication)
Linode compute: $0 extra (the Container Doctor uses about 50MB RAM)
Slack: Free tier
My time saved: ~2-3 hours/month of 3 AM debugging

Without rate limiting, my first week cost $8 in API calls. The deduplication + rate limiter brought that down dramatically. Most of my containers run fine. The script only calls Claude when something actually breaks.

If you're monitoring more containers or have noisier logs, expect higher costs. The MAX_DIAGNOSES_PER_HOUR setting is your budget knob.
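To see why the cap works as a budget knob, it helps to compute the worst case: the cost if the limit were hit every hour of the month. The per-call price below is a placeholder standing in for "a few cents per request", not a measured figure.

```python
def worst_case_monthly_cost(max_per_hour, cost_per_call, hours_per_day=24, days=30):
    """Upper bound on spend if the hourly cap were reached every single hour."""
    return max_per_hour * hours_per_day * days * cost_per_call


# Assumed price per diagnosis call; substitute your own measured average.
ceiling = worst_case_monthly_cost(max_per_hour=20, cost_per_call=0.03)
print(f"Worst case: ${ceiling:.2f}/month")
```

With these assumed numbers the ceiling is roughly $432 per month, far above the few dollars actually spent, which shows the cap is a safety limit rather than a forecast. Halving MAX_DIAGNOSES_PER_HOUR halves the ceiling.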

Security Considerations

Let's talk about the elephant in the room: the Docker socket.

Mounting /var/run/docker.sock gives the Container Doctor root-equivalent access to your Docker daemon. It can start, stop, and remove any container. It can pull images. It can exec into running containers. If someone compromises the Container Doctor, they own your entire Docker host.

Here's how I mitigate this:

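Network isolation: Keep the Container Doctor's health endpoint off the public internet. Note that a port mapping of "8080:8080" publishes the port on all host interfaces; use "127.0.0.1:8080:8080" to expose it on localhost only. In production, put it behind a reverse proxy with auth.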
Read-mostly access: The script only reads logs and restarts containers. It never execs into containers, pulls images, or modifies volumes.
No external inputs: The script doesn't accept commands from Slack or any external source. It's outbound-only (logs out, alerts out).
API key rotation: I rotate the Anthropic API key monthly. If the container is compromised, the key has limited blast radius.

For a more secure setup, consider a tool like docker-socket-proxy to restrict which API calls the Container Doctor can make. Mounting the socket read-only (:ro) does not help much on its own, because a process can still send API requests over a read-only socket mount.

What I'd Do Differently

After 3 weeks in production, here's my honest retrospective:

I'd use structured logging from day one. My regex-based error detection catches too many false positives. A JSON log format with severity levels would make detection way more accurate.

I'd add per-container policies. Right now, every container gets the same treatment. But you probably want different rules for a database vs a web server. Never auto-restart a database. Always auto-restart a stateless web server.

I'd build a simple web UI. The /history endpoint returns JSON, but a small React dashboard showing a timeline of incidents, fix success rates, and cost tracking would be much more useful.

I'd try local models first. For simple errors (OOM, connection refused), a small local model running on Ollama could handle the diagnosis without any API cost. Reserve Claude for the weird, complex stack traces where you actually need strong reasoning.

I'd add a "learning mode." Run the Container Doctor in observe-only mode for a week. Let it diagnose everything but fix nothing. Review the diagnoses manually. Once you trust its judgment, flip on auto-fix. This builds confidence before you give it restart power.
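The per-container policy and learning-mode ideas could be combined into one small lookup. This is only a sketch of a possible design: `ContainerPolicy`, `POLICIES`, and `may_restart` are hypothetical names, and the example policies are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerPolicy:
    auto_restart: bool = False   # may the doctor restart this container?
    observe_only: bool = True    # learning mode: diagnose and alert, never act


# Hypothetical policies: restart the stateless web server, never the database.
POLICIES = {
    "web": ContainerPolicy(auto_restart=True, observe_only=False),
    "db": ContainerPolicy(auto_restart=False, observe_only=False),
}
DEFAULT_POLICY = ContainerPolicy()  # unknown containers start in learning mode


def may_restart(container_name, diagnosis):
    """Combine the container's policy with Claude's own safety assessment."""
    policy = POLICIES.get(container_name, DEFAULT_POLICY)
    if policy.observe_only or not policy.auto_restart:
        return False
    return diagnosis.get("auto_restart_safe") is True
```

Defaulting unknown containers to observe-only means a newly added service gets diagnoses and alerts from day one but no automated restarts until someone opts it in.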

What's Next?

If you found this useful, I write about Docker, AI tools, and developer workflows every week. I'm Balajee Asish – Docker Captain, freeCodeCamp contributor, and currently building my way through the AI tools space one project at a time.

Got questions or built something similar? Drop a comment below or find me on GitHub and LinkedIn.

Happy building.

The Open Source LLM Agent Handbook: How to Automate Complex Tasks with LangGraph and CrewAI

Balajee Asish Brahmandam — Tue, 03 Jun 2025 14:20:30 +0000

Ever feel like your AI tools are a bit...well, passive? Like they just sit there, waiting for your next command? Imagine if they could take initiative, break down big problems, and even work together to get things done.

That's exactly what LLM agents bring to the table. They're changing how we automate complex tasks, and they can help bring our AI ideas to life in a whole new way.

In this article, we'll explore what LLM agents are, how they work, and how you can build your very own using awesome open-source frameworks.

What we’ll cover:

The Current State of LLM Agents
What Are LLM Agents and Why Are They a Big Deal?
The Rise of Open-Source Agent Frameworks
Core Concepts Behind Agent Design
Project: Automate Your Daily Schedule from Emails
Multi-Agent Collaboration with CrewAI
What Actually Happens During Execution?
Are LLM Agents Safe? What to Know About Security and Privacy
Troubleshooting & Tips
Explore More Daily Automations
What’s Next in Agent Technology?
Final Summary

The Current State of LLM Agents

LLM agents are one of the most exciting developments in AI right now. They’re already helping automate real tasks, but they’re also still evolving. So where are we today?

From Chatbots to Autonomous Agents

Large Language Models (LLMs) like GPT-4, Claude, Gemini, and LLaMA have evolved from simple chatbots into surprisingly capable reasoning engines. They've gone from answering trivia questions and generating essays to performing complex reasoning, following multi-step instructions, and interacting with tools like web search and code interpreters.

But here’s the catch: these models are reactive. They wait for input and give output. They don't retain memory between tasks, plan ahead, or pursue goals on their own. That’s where LLM agents come in – they bridge this gap by adding structure, memory, and autonomy.

What Can Agents Do Today?

Right now, LLM agents are already being used for:

Summarizing emails or documents
Planning daily schedules
Running DevOps scripts
Searching APIs or tools for answers
Collaborating in small “teams” to complete complex tasks

But they’re not perfect yet. Agents can still:

Get stuck in loops
Misunderstand goals
Require detailed prompts and guardrails

That’s because this technology is still early-stage. Frameworks are getting better fast, but reliability and memory are still works in progress. So just keep that in mind as you experiment.

Why Now Is the Best Time to Learn

The truth is: we’re still early. But not too early.

This is the perfect time to start experimenting with agents:

The tooling is mature enough to build real projects
The community is growing rapidly
And you don’t need to be an AI expert, just comfortable with Python

What Are LLM Agents and Why Are They a Big Deal?

Before we dive into the exciting world of agents, let's quickly chat a bit more about the basics.

What Is an LLM?

An LLM, or Large Language Model, is basically an AI that's learned from a massive amount of text from the internet – think books, articles, code, and tons more. You can picture it as a super-smart autocomplete engine. But it does way more than just finish your sentences. It can also:

Answer tricky questions
Summarize long articles or documents
Write code, emails, or creative stories
Translate languages instantly
Even solve logic puzzles and have engaging conversations

Chances are you've heard of ChatGPT, which is powered by OpenAI's GPT models. Other popular LLMs you might come across include Claude (from Anthropic), LLaMA (by Meta), Mistral, and Gemini (from Google).

These models work by predicting the next word in a sentence based on the context. That sounds straightforward, but when trained on billions of words, LLMs become capable of surprisingly intelligent behavior: understanding your instructions, following step-by-step reasoning, and producing coherent responses across almost any topic you can imagine.

So, What’s an LLM Agent?

While LLMs are super powerful, they usually just react – they only respond when you ask them something. An LLM agent, on the other hand, is proactive.

LLM agents can:

Break down big, complex tasks into smaller, manageable steps
Make smart decisions and figure out what to do next
Use "tools" like web search, calculators, or even other apps
Work towards a goal, even if it takes multiple steps or tries
Team up with other agents to accomplish shared objectives

In short, LLM agents can think, plan, act, and adapt.

Think of an LLM agent like your super-efficient new assistant: you give it a goal, and it figures out how to achieve it all on its own.

Why Does This Matter?

This shift from just responding to actively pursuing goals opens a ton of exciting possibilities:

Automating boring IT or DevOps tasks
Generating detailed reports from raw data
Helping you with multi-step research projects
Reading through your daily emails and highlighting key info
Running your internal tools to take real-world actions

Unlike older, rule-based bots, LLM agents can reason, reflect, and learn from their attempts. This makes them a much better fit for real-world tasks that are messy, require flexibility, and depend on understanding context.

The Rise of Open-Source Agent Frameworks

Not too long ago, if you wanted to build an AI system that could act autonomously, it meant writing a ton of custom code, painstakingly managing memory, and trying to stitch together dozens of components. It was a complex, delicate, and highly specialized job.

But guess what? That's not the case anymore.

In 2024, a wave of fantastic open-source frameworks hit the scene. These tools have made it dramatically easier to build powerful LLM agents without you having to reinvent the wheel every time.

Popular Open-Source Agent Frameworks

Framework	Description	Maintainer
LangGraph	Graph-based framework for agent state and memory	LangChain
CrewAI	Role-based, multi-agent collaboration engine	Community (CrewAI)
AutoGen	Customizable multi-agent chat orchestration	Microsoft
AgentVerse	Modular framework for agent simulation and testing	Open-source project

What These Tools Enable

These frameworks give you ready-made building blocks to handle the trickier parts of creating agents:

Planning – Letting agents decide their next move
Tool Use – Easily connecting agents to things like file systems, web browsers, APIs, or databases
Memory – Storing and retrieving past information or intermediate results for long-term context
Multi-Agent Collaboration – Setting up teams of agents that work together on shared goals

Why Use a Framework Instead of Building from Scratch?

While you could build a custom agent from the ground up, using a framework will save you a huge amount of time and effort. Open-source agent libraries come packed with:

Built-in support for orchestrating LLMs
Proven patterns for task planning, keeping track of where you are, and getting feedback
Easy integration with popular models like OpenAI, or even models you run locally
The flexibility to grow from a single helpful agent to entire teams of agents

Basically, these frameworks let you focus on what your agent should do, rather than getting bogged down in how to build all the internal workings. Plus, choosing open source means you benefit from community contributions, transparency in how they work, and the freedom to tweak them to your exact needs, without getting locked into a single vendor.

Core Concepts Behind Agent Design

To really grasp how LLM agents operate, it helps to think of them as goal-driven systems that constantly cycle through observing, reasoning, and acting. This continuous loop allows them to tackle tasks that go beyond simple questions and answers, moving into true automation, tool usage, and adapting on the fly.

The Agent Loop

Most LLM agents function based on a mental model called the Agent Loop: a step-by-step cycle that repeats until the job is done. Here’s how it typically works:

Perceive: The agent starts by noticing something in its environment or receiving new information. This could be your prompt, a piece of data, or the current state of a system.
Plan: Based on what it perceives and its overall goal, the agent decides what to do next. It might break the task into smaller sub-goals or figure out the best tool for the job.
Act: The agent then acts. This could mean running a function, calling an API, searching the web, interacting with a database, or even asking another agent for help.
Reflect: After acting, the agent looks at the outcome: Did it work? Was the result useful? Should it try a different approach? Based on this, it updates its plan and keeps going until the task is complete.

This loop is what makes agents so dynamic. It allows them to handle ever-changing tasks, learn from partial results, and correct their course, qualities that are vital for building truly useful AI assistants.
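The four stages above can be sketched as a toy loop in plain Python. No LLM is involved here: the "plan" step simply picks the next outstanding sub-task, where a real agent would ask the model. The function name `run_agent_loop` and the shape of the state dictionary are assumptions for illustration.

```python
def run_agent_loop(goal, tools, max_steps=10):
    """Toy perceive -> plan -> act -> reflect loop.

    goal:  ordered list of sub-task names.
    tools: mapping from sub-task name to a zero-argument callable.
    """
    state = {"goal": goal, "done": [], "log": []}
    for _ in range(max_steps):
        # Perceive: look at what is still outstanding.
        remaining = [task for task in goal if task not in state["done"]]
        if not remaining:
            break
        # Plan: pick the next sub-task (a real agent would ask the LLM here).
        task = remaining[0]
        # Act: call the matching tool, if there is one.
        tool = tools.get(task)
        result = tool() if tool else None
        # Reflect: record the outcome and decide whether to continue.
        if result is not None:
            state["done"].append(task)
            state["log"].append((task, result))
        else:
            state["log"].append((task, "no usable result, stopping"))
            break
    return state


tools = {"fetch": lambda: "3 emails", "summarize": lambda: "2 meetings today"}
print(run_agent_loop(["fetch", "summarize"], tools)["log"])
```

The `max_steps` bound is there on purpose: without a step limit, a loop like this is exactly how an agent ends up stuck repeating itself.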

Key Components of an Agent

To do their job effectively, agents are built around several crucial parts:

Tools are how an agent interacts with the real (or digital) world. These can be anything from search engines, code execution environments, file readers, or API clients, to simple calculators or command-line scripts.
Memory lets agents remember what they've done or seen across different steps. This might include previous things you've said, temporary results, or key decisions. Some frameworks offer short-term memory (just for one session), while others support long-term memory that can span multiple sessions or goals.
Environment refers to the external data or system context the agent operates within: think APIs, documents, databases, files, or sensor inputs. The more information and access an agent has to its environment, the more meaningful actions it can take.
Goal is the agent's ultimate objective: what it's trying to achieve. Goals should be specific and clear, for instance “generate a daily schedule,” “summarize this document,” or “extract tasks from emails.”

Multi-Agent Collaboration

For more advanced systems, you can even have multiple agents working together to hit a shared target. Each agent can be given a specific role that highlights its specialty, just like people working on a team.

For example:

A researcher agent might be tasked with gathering information.
A coder agent could write Python scripts or automation routines.
A reviewer agent might check the results and ensure everything is up to snuff.

These agents can chat with each other, share information, and even debate or vote on decisions. This kind of teamwork allows AI systems to tackle bigger, more complex tasks while keeping things organized and modular.

Project: Automate Your Daily Schedule from Emails

What We’re Automating

Think about your typical morning routine:

You open your inbox.
You quickly scan through a bunch of emails.
You try to spot meetings, tasks, and important reminders.
Then, you manually write a to-do list or add things to your calendar.

Let's use an LLM agent to make that process effortless. Our agent will:

Read a list of your email messages
Pull out time-sensitive items like meetings or deadlines
Summarize everything into a nice, clean daily schedule

Step 1: Install the Required Tools

To get started, you'll need three main tools: Python, VSCode, and an OpenAI API key.

1. Install Python 3.9 or Higher

Grab the latest version of Python 3.9+ from the official website: https://www.python.org/downloads/

Once it's installed, double-check it by running python --version in your terminal.

This command simply asks your system to report the Python version currently installed. You'll want to see Python 3.9.x or something higher to ensure compatibility with our project.

2. Install VSCode (Optional but Recommended)

VSCode is a fantastic, user-friendly code editor that works perfectly with Python. You can download it right here: https://code.visualstudio.com/.

3. Get Your OpenAI API Key

Head over to: https://platform.openai.com

Sign in or create a new account. Navigate to your API Keys page. Click “Create new secret key” and make sure to copy that key somewhere safe for later.

4. Install Python Libraries

Open your terminal or command prompt and install these essential packages:

pip install langgraph langchain langchain-openai openai

This command uses pip, Python's package manager, to download and install the libraries our agent needs:

langgraph: The core framework we'll use to build our agent's workflow.
langchain: A foundational library for working with large language models, upon which LangGraph is built.
langchain-openai: The LangChain integration package that provides the ChatOpenAI class we import in the code below.
openai: The official Python library for connecting to OpenAI's powerful AI models.

If you're excited to try out multi-agent setups (which we'll cover in Step 5), also install CrewAI:

pip install crewai

This command installs CrewAI, a specialized framework that makes it easy to orchestrate multiple AI agents working together as a team.

5. Set Your OpenAI API Key

You need to make sure your Python code can find and use your OpenAI API key. This is typically done by setting it as an environment variable.

On macOS/Linux, run this in your terminal (replace "your-api-key" with your actual key):

export OPENAI_API_KEY="your-api-key"

This command sets an environment variable named OPENAI_API_KEY. Environment variables are a secure way for applications (like your Python script) to access sensitive information without hardcoding it directly into the code itself.

On Windows (using Command Prompt), do this:

set OPENAI_API_KEY=your-api-key

This is the Windows equivalent command to set the OPENAI_API_KEY environment variable. Leave the quotes out here, because Command Prompt would store them as part of the value.

Now, your Python code will be all set to talk to the OpenAI model!
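Before running the agent, it can help to confirm that the variable is actually visible to Python. The helper below is a small sketch; the name `get_openai_api_key` is hypothetical, and the `env` parameter exists only so the function can be tested without touching the real environment.

```python
import os


def get_openai_api_key(env=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Set it in this terminal session "
            "before running the script."
        )
    return key
```

Failing early with a readable message is usually easier to debug than an authentication error raised deep inside the first model call.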

Step 2: Define the Task

We discussed this briefly at the beginning of this section. To reiterate, this is the manual routine we want our agent to take over:

Scan for meetings, events, and important tasks.
Jot them down quickly in a notebook or an app.
Create a rough mental plan for your day.

This routine takes time and mental energy. So having an agent do it for us will be super helpful.

Step 3: Build the Workflow with LangGraph

What Is LangGraph?

LangGraph is a cool framework that helps you build agents using a "graph-based" workflow, kind of like drawing a flowchart. It's powered by LangChain and gives you a lot more control over exactly how each step in your agent's process unfolds.

Each "node" in this graph represents a decision point or a function that:

Takes some input (its current "state").
Does some reasoning or takes an action (often involving the LLM and its tools).
Returns an updated output (a new "state").

You draw the connections between these nodes, and LangGraph then executes it like a smart, automated state machine.
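To make the node idea concrete before bringing in the framework, here is a plain-Python sketch of "state in, updated state out". It deliberately does not use LangGraph's API: `extract_node`, `summarize_node`, and `run_pipeline` are hypothetical names, and the runner only handles a straight line of nodes with no branching.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Node = Callable[[State], State]


def extract_node(state: State) -> State:
    """Take the raw emails and add a cleaned-up list of items to the state."""
    lines = [line.strip() for line in state["emails"].splitlines() if line.strip()]
    return {**state, "items": "; ".join(lines)}


def summarize_node(state: State) -> State:
    """Turn the extracted items into a final result (a real node would call the LLM)."""
    return {**state, "result": f"Today: {state['items']}"}


def run_pipeline(nodes: List[Node], state: State) -> State:
    """Run each node in order, feeding every node the previous node's output."""
    for node in nodes:
        state = node(state)
    return state


final = run_pipeline(
    [extract_node, summarize_node],
    {"emails": "Standup at 10 AM\nLunch at noon"},
)
print(final["result"])
```

Each node returns a new dictionary rather than mutating its input, which keeps every step easy to inspect on its own. A graph framework adds what this sketch lacks: named nodes, conditional edges, and a defined end state.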

Why Use LangGraph?

You get to control the precise order of execution.
It's fantastic for building workflows that have multiple steps or even branch off into different paths.
It plays nicely with both cloud-based models (like OpenAI) and models you run locally.

Alright – now let’s write the code.

1. Simulate Email Input

In a real application, your agent would probably connect to Gmail or Outlook to fetch your actual emails. For this example, though, we’ll just hardcode some sample messages to keep things simple:

emails = """
1. Subject: Standup Call at 10 AM
2. Subject: Client Review due by 5 PM
3. Subject: Lunch with Sarah at noon
4. Subject: AWS Budget Warning – 80% usage
5. Subject: Dentist Appointment - 4 PM
"""

This multiline Python string, emails, acts as our stand-in for real email content. We're providing a simple, structured list of email subjects to demonstrate how the agent will process text.
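If you later want to work with individual messages rather than one block of text, a small parser is enough for this sample format. The sketch below assumes every line looks like "N. Subject: ...", as in the example above. The function name extract_subjects is hypothetical.

```python
import re
from typing import List

emails = """
1. Subject: Standup Call at 10 AM
2. Subject: Client Review due by 5 PM
3. Subject: Lunch with Sarah at noon
4. Subject: AWS Budget Warning – 80% usage
5. Subject: Dentist Appointment - 4 PM
"""

# Matches lines such as "3. Subject: Lunch with Sarah at noon" and captures the subject.
SUBJECT_LINE = re.compile(r"^[ \t]*\d+\.[ \t]*Subject:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_subjects(raw: str) -> List[str]:
    """Return the subject text of each numbered 'Subject:' line, in order."""
    return SUBJECT_LINE.findall(raw)


subjects = extract_subjects(emails)
print(len(subjects))  # 5
print(subjects[0])    # Standup Call at 10 AM
```

The agent in this tutorial doesn't need this step, because the LLM reads the raw string directly. It becomes useful once you want to filter, count, or batch messages before sending them to the model.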

2. Define the Agent Logic

Now, we'll tell OpenAI’s GPT model how to process this email text and turn it into a summary.

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, List
import operator

# Define the state for our graph
class AgentState(TypedDict):
    emails: str
    result: str

llm = ChatOpenAI(temperature=0, model="gpt-4o") # Using gpt-4o for better performance

def calendar_summary_agent(state: AgentState) -> AgentState:
    emails = state["emails"]
    prompt = f"Summarize today's schedule based on these emails, listing time-sensitive items first and then other important notes. Be concise and use bullet points:\n{emails}"
    summary = llm.invoke(prompt).content
    return {"result": summary, "emails": emails} # Ensure emails is also returned

Here’s what’s going on:

Imports: We bring in necessary components:
- ChatOpenAI to connect to the LLM,
- StateGraph and END from langgraph.graph to build our agent workflow,
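- TypedDict from typing to describe the shape of the state. Annotated and List are imported too, but this snippet doesn't use them,
- operator, which is also unused here. Together with Annotated, it's typically used in LangGraph to tell the graph how to merge updates to a state field, so you can safely drop these imports in this single-node example.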
AgentState: This TypedDict defines the shape of the data our agent will work with. It includes:
- emails: the raw input messages.
- result: the final output (the daily summary).
llm = ChatOpenAI(...): Initializes the language model. We're using GPT-4o with temperature=0 to make the output as consistent and predictable as possible, which suits structured summarization tasks.
calendar_summary_agent(state: AgentState): This function is the "brain" of our agent. It:
- Takes in the current state, which holds the emails as a single string.
- Extracts the emails from that state.
- Constructs a prompt that tells the model to generate a concise daily schedule summary using bullet points, prioritizing time-sensitive items.
- Sends this prompt to the model with llm.invoke(prompt).content, which returns the LLM’s response as plain text.
- Returns a new AgentState dictionary containing:
  - result: the generated summary,
  - emails: preserved in case we need it downstream.
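While developing, you may want to exercise the node function without spending API credits. One way to do that, sketched below, is to pass the model in as a parameter and substitute a stand-in during tests. FakeLLM, FakeMessage, and make_calendar_agent are hypothetical names. The fake only mimics the .invoke(prompt).content shape used above and is not a LangChain class.

```python
from dataclasses import dataclass
from typing import TypedDict


class AgentState(TypedDict, total=False):
    emails: str
    result: str


@dataclass
class FakeMessage:
    content: str


class FakeLLM:
    """Stand-in model: records prompts and returns a canned reply."""

    def __init__(self) -> None:
        self.prompts = []

    def invoke(self, prompt: str) -> FakeMessage:
        self.prompts.append(prompt)
        return FakeMessage(content="- 10:00 AM Standup Call")


def make_calendar_agent(llm):
    """Build a node function bound to whichever model object is passed in."""

    def calendar_summary_agent(state: AgentState) -> AgentState:
        emails = state["emails"]
        prompt = f"Summarize today's schedule based on these emails:\n{emails}"
        summary = llm.invoke(prompt).content
        return {"result": summary, "emails": emails}

    return calendar_summary_agent


fake = FakeLLM()
agent = make_calendar_agent(fake)
out = agent({"emails": "1. Subject: Standup Call at 10 AM"})
print(out["result"])  # - 10:00 AM Standup Call
```

Because the function only relies on an object with an invoke method, the same factory should work with the real ChatOpenAI instance when you're ready to call the API.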

3. Build and Run the Graph

Now, let's use LangGraph to map out the flow of our single-agent task and then run it.

builder = StateGraph(AgentState)
builder.add_node("calendar", calendar_summary_agent)
builder.set_entry_point("calendar")
builder.set_finish_point("calendar") # Same effect as adding an edge from "calendar" to END

graph = builder.compile()

# Run the graph using your simulated email data
result = graph.invoke({"emails": emails})
print(result["result"])

Here’s what’s going on:

builder = StateGraph(AgentState): We're initiating a StateGraph object. By passing AgentState, we're telling LangGraph the expected data structure for its internal state.
builder.add_node("calendar", calendar_summary_agent): This line adds a named "node" to our graph. We're calling it "calendar", and we're linking it to our calendar_summary_agent function, meaning that function will be executed when this node is active.
builder.set_entry_point("calendar"): This sets "calendar" as the very first step in our workflow. When we start the graph, execution will begin here.
builder.set_finish_point("calendar"): This tells LangGraph that once the "calendar" node finishes its job, the entire graph process is complete.
graph = builder.compile(): This command takes our defined graph blueprint and "compiles" it into an executable workflow.
result = graph.invoke({"emails": emails}): This is where the magic happens! We're telling our graph to start running. We pass it an initial state that contains our emails data. The graph will then process this data through its nodes until it reaches an end point, returning the final state.
print(result["result"]): Finally, we grab the summarized schedule from the result (the final state of our graph) and print it to the console.

Example Output

Your Schedule:
- 10:00 AM – Standup Call
- 12:00 PM – Lunch with Sarah
- 4:00 PM – Dentist Appointment
- Submit client report by 5:00 PM
- AWS Budget Warning – check usage

Boom! You've just built an AI agent that can read your emails and whip up your daily schedule. Pretty cool, right? This is a simple yet powerful peek into what LLM agents can do with just a few lines of code.

Multi-Agent Collaboration with CrewAI

What Is CrewAI?

CrewAI is an exciting open-source framework that lets you build teams of agents that work together, just like a real-world project team! Each agent in a CrewAI setup:

Has a specific, specialized role.
Can communicate and share information with its teammates.
Collaborates to achieve a shared goal.

This multi-agent approach is super useful when your task is too big or too complex for just one agent, or when breaking it down into specialized parts makes it clearer and more efficient.

Sample Roles for the Email Summary Task

Let's imagine our email summary task being handled by a small team of agents:

Agent Name	Role	Responsibility
Extractor	Email Scanner	Find meetings, reminders, and tasks from emails
Prioritizer	Schedule Optimizer	Sort items by urgency and time
Formatter	Output Generator	Write a clean, polished daily agenda

Sample CrewAI Code

from crewai import Agent, Crew, Task, Process
from langchain_openai import ChatOpenAI
import os

# Set your OpenAI API key from environment variables
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY" # Make sure this is set, or defined directly

# Initialize the LLM (using gpt-4o for better performance)
llm = ChatOpenAI(temperature=0, model="gpt-4o")

# Define the agents with specific roles and goals
extractor = Agent(
    role="Email Scanner",
    goal="Find all meetings, reminders, and tasks from the given emails, accurately extracting details like time, date, and subject.",
    backstory="You are an expert at scanning emails for key information. You meticulously extract every relevant detail.",
    verbose=True,
    allow_delegation=False,
    llm=llm
)

prioritizer = Agent(
    role="Schedule Optimizer",
    goal="Sort extracted items by urgency and time, preparing them for a daily agenda.",
    backstory="You are a master of time management, always knowing what needs to be done first. You organize tasks logically.",
    verbose=True,
    allow_delegation=False,
    llm=llm
)

formatter = Agent(
    role="Output Generator",
    goal="Generate a clean, polished, and concise daily agenda in bullet-point format, clearly listing all schedule items.",
    backstory="You are a professional secretary, ensuring all outputs are perfectly formatted and easy to read. You prioritize clarity.",
    verbose=True,
    allow_delegation=False,
    llm=llm
)

# Simulate email input
emails = """
1. Subject: Standup Call at 10 AM
2. Subject: Client Review due by 5 PM
3. Subject: Lunch with Sarah at noon
4. Subject: AWS Budget Warning – 80% usage
5. Subject: Dentist Appointment - 4 PM
"""

# Define the tasks for each agent
extract_task = Task(
    description=f"Extract all relevant events, meetings, and tasks from these emails: {emails}. Focus on precise details.",
    agent=extractor,
    expected_output="A list of extracted items with their details (e.g., '- Standup Call at 10 AM', '- Client Review due by 5 PM')."
)

prioritize_task = Task(
    description="Prioritize the extracted items by time and urgency. Meetings first, then deadlines, then other notes.",
    agent=prioritizer,
    context=[extract_task], # The output of extract_task is the input here
    expected_output="A prioritized list of schedule items."
)

format_task = Task(
    description="Format the prioritized schedule into a clean, easy-to-read daily agenda using bullet points. Ensure concise language.",
    agent=formatter,
    context=[prioritize_task], # The output of prioritize_task is the input here
    expected_output="A well-formatted daily agenda with bullet points."
)

# Instantiate the crew
crew = Crew(
    agents=[extractor, prioritizer, formatter],
    tasks=[extract_task, prioritize_task, format_task],
    process=Process.sequential, # Tasks are executed sequentially
    verbose=True # Logs execution details (older CrewAI releases also accepted an integer level such as 2)
)

# Run the crew
result = crew.kickoff()
print("\n########################")
print("## Final Daily Agenda ##")
print("########################\n")
print(result)

Here’s what’s going on:

Imports: We bring in key classes from CrewAI: Agent, Crew, Task, and Process. We also import ChatOpenAI for our language model and os to handle environment variables.
llm = ChatOpenAI(...): Just like in the LangGraph example, this sets up our OpenAI language model, using the gpt-4o model with temperature=0 to keep its responses as consistent as possible.
Agent Definitions (extractor, prioritizer, formatter):
- Each of these variables creates an Agent instance. An agent is defined by its role (what it does), a specific goal it's trying to achieve, and a backstory (a sort of personality or expertise that helps the LLM understand its purpose better).
- verbose=True is super helpful for debugging, as it makes the agents print out their "thoughts" as they work.
- allow_delegation=False means these agents won't pass their assigned tasks to other agents (though this can be set to True for more complex delegation scenarios).
- llm=llm connects each agent to our OpenAI language model.
Simulated emails: We reuse the same sample email data for this example.
Task Definitions (extract_task, prioritize_task, format_task):
- Each Task defines a specific piece of work that an agent needs to perform.
- description clearly tells the agent what the task involves.
- agent assigns this task to one of our defined agents (e.g., extractor for extract_task).
- context=[...] is a critical part of CrewAI's collaboration. It tells a task to use the output of a previous task as its input. For instance, prioritize_task takes the extract_task's output as its context.
- expected_output gives the agent an idea of what its result should look like, helping guide the LLM.
crew = Crew(...):
- This is where we assemble our team! We create a Crew instance, giving it our list of agents and tasks.
- process=Process.sequential tells the crew to execute tasks one after another in the order they're defined in the tasks list. CrewAI also supports more advanced processes like hierarchical ones.
- The verbose setting turns on a detailed log of the crew's internal workings and communication. Older CrewAI releases accepted an integer level such as 2, while newer releases expect a boolean.
result = crew.kickoff(): This command officially starts the entire multi-agent workflow. The agents will begin collaborating, passing information, and working through their assigned tasks in sequence.
print(result): Finally, the consolidated output from the entire crew's collaborative effort is printed to your console.

CrewAI handles the communication between agents, runs each task in the order you defined, and passes the output smoothly from one agent to the next. It's like having a mini AI assembly line!
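The hand-off itself is a simple idea. Here is a toy sketch in plain Python of a sequential process where each task receives the outputs of the tasks listed in its context. This is not CrewAI's implementation. SimpleTask and run_sequential are hypothetical names, and ordinary functions stand in for the LLM-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SimpleTask:
    name: str
    run: Callable[[str], str]  # takes context text, returns output text
    context: List["SimpleTask"] = field(default_factory=list)
    output: str = ""


def run_sequential(tasks: List[SimpleTask]) -> str:
    """Run tasks in list order, feeding each one the outputs of its context tasks."""
    for task in tasks:
        context_text = "\n".join(dep.output for dep in task.context)
        task.output = task.run(context_text)
    return tasks[-1].output


extract = SimpleTask("extract", lambda _ctx: "16:00 Dentist\n10:00 Standup\n12:00 Lunch")
prioritize = SimpleTask(
    "prioritize",
    lambda ctx: "\n".join(sorted(ctx.splitlines())),
    context=[extract],
)
format_agenda = SimpleTask(
    "format",
    lambda ctx: "\n".join(f"- {line}" for line in ctx.splitlines()),
    context=[prioritize],
)

agenda = run_sequential([extract, prioritize, format_agenda])
print(agenda)
# - 10:00 Standup
# - 12:00 Lunch
# - 16:00 Dentist
```

In the real crew, each run step is an LLM call shaped by the agent's role, goal, and backstory, but the data flow between tasks follows this same pattern.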

What Actually Happens During Execution?

So, whether you're using LangGraph or CrewAI, what's really going on behind the scenes when an agent runs? Let's break down the execution process:

The system gets an input state (for example, your emails).
The first agent or graph node reads this input and uses a Large Language Model (LLM) to make sense of it.
Based on its understanding, the agent decides on an action like pulling out key events or calling a specific tool.
If needed, the agent might invoke tools (like a web search or a file reader) to get more context or perform external operations.
The result of that action is then passed to the next agent in the team (if it's a multi-agent setup) or returned directly to you.

Execution keeps going until:

The task is fully completed.
All agents have finished their assigned roles.
A stopping condition or a designated "END" point in the workflow is reached.

Think of this as a super-smart workflow engine where every single step involves reasoning, making decisions, and remembering previous interactions.
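The loop described above can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the internals of LangGraph or CrewAI: the decide function stands in for the LLM's reasoning, and run_agent, scripted_decide, and the read_inbox tool are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Decision = Tuple[str, str]  # (action name, argument); the action "finish" ends the loop


def run_agent(decide: Callable[[List[tuple]], Decision],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Decide, act, observe, and repeat until 'finish' or the step limit."""
    history: List[tuple] = []
    for _ in range(max_steps):
        action, arg = decide(history)
        if action == "finish":
            return arg
        observation = tools[action](arg)  # call the chosen tool
        history.append((action, arg, observation))
    raise RuntimeError(f"Stopped after {max_steps} steps without finishing")


def scripted_decide(history: List[tuple]) -> Decision:
    # A real agent would ask the LLM here; this script reads the inbox once, then finishes.
    if not history:
        return ("read_inbox", "today")
    return ("finish", f"Agenda: {history[-1][2]}")


tools = {"read_inbox": lambda day: "Standup at 10 AM"}
print(run_agent(scripted_decide, tools))  # Agenda: Standup at 10 AM
```

The max_steps limit is the stopping condition that keeps a confused agent from looping indefinitely.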

Are LLM Agents Safe? What to Know About Security and Privacy

As cool as LLM agents are, they raise an important question: can you really trust an AI to run parts of your workflow or interact with your data? It depends. If you're calling hosted models through the OpenAI or Anthropic APIs, your data is encrypted in transit and, under their default API policies at the time of writing, isn't used to train their models.

But some data might still be temporarily logged to prevent abuse. That’s usually fine for testing and personal projects, but if you’re working with sensitive business info, customer data, or anything private, you’ll want to be careful.

Use anonymized inputs, avoid exposing full datasets, and consider running agents locally using open-source models like LLaMA or Mistral if full control matters to you.

You can also set clear boundaries for your agents so they don’t overstep. Think of it like onboarding a new intern: you wouldn’t give them access to everything on day one.

Give agents only the tools and files they need, keep logs of what they do, and always review the results before letting them make real changes.
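One lightweight way to apply this advice is to route every tool call through a wrapper that enforces an allowlist and records what happened. The sketch below uses only the standard library. ToolBox and the two example tools are hypothetical and don't correspond to any LangGraph or CrewAI API.

```python
import logging
from typing import Callable, Dict, Iterable

logger = logging.getLogger("agent.tools")


class ToolBox:
    """Expose only allowlisted tools to an agent and keep an audit trail."""

    def __init__(self, tools: Dict[str, Callable[[str], str]], allowed: Iterable[str]):
        self._tools = tools
        self._allowed = set(allowed)
        self.audit_log = []  # (tool name, argument, outcome) tuples

    def call(self, name: str, arg: str) -> str:
        if name not in self._allowed:
            self.audit_log.append((name, arg, "denied"))
            logger.warning("Denied tool call: %s", name)
            raise PermissionError(f"Tool '{name}' is not allowed for this agent")
        result = self._tools[name](arg)
        self.audit_log.append((name, arg, "ok"))
        logger.info("Tool call: %s", name)
        return result


box = ToolBox(
    tools={
        "read_file": lambda path: f"contents of {path}",
        "delete_file": lambda path: f"deleted {path}",
    },
    allowed=["read_file"],
)
print(box.call("read_file", "notes.txt"))  # contents of notes.txt
try:
    box.call("delete_file", "notes.txt")
except PermissionError as err:
    print(err)  # Tool 'delete_file' is not allowed for this agent
```

Reviewing box.audit_log after a run shows exactly which tools the agent tried to use, including the attempts that were blocked.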

As this tech grows, more safety features are coming, like better sandboxing, memory limits, and role-based access. But for now, it's smart to treat your agents like powerful helpers that still need some human supervision.

Troubleshooting & Tips

Sometimes, agents can be a bit quirky! Here are some common issues you might run into and how to fix them:

Issue	Suggested Fix
Agent seems to loop forever	Set a maximum number of iterations or define a clearer stopping point.
Output is too chatty or verbose	Use more specific prompts (for example, “Respond in bullet points only”).
Input is too long or gets cut off	Break down large pieces of content into smaller chunks and summarize them individually.
Agent runs too slowly	Try a faster model like gpt-3.5-turbo, or consider running a local model.

A handy tip: You can also add print() statements or logging messages inside your agent functions to see what's happening at each stage and debug state transitions.
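For the "input is too long" case, here is a minimal sketch of the chunking step. It splits on line boundaries and uses character count as a rough stand-in for token count, which is an assumption: real limits are measured in tokens. The function name chunk_lines is hypothetical.

```python
from typing import List


def chunk_lines(text: str, max_chars: int = 2000) -> List[str]:
    """Group whole lines into chunks of at most max_chars characters.

    A single line longer than max_chars is kept whole as its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in text.splitlines():
        needed = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + needed > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            needed = len(line)
        current.append(line)
        size += needed
    if current:
        chunks.append("\n".join(current))
    return chunks


print(chunk_lines("aaaa\nbbbb\ncccc", max_chars=9))  # ['aaaa\nbbbb', 'cccc']
```

You would then summarize each chunk separately and ask the model to combine the partial summaries into one final result.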

Explore More Daily Automations

Once you've built one agent-based task, you'll find it incredibly easy to adapt the pattern for other automations. Here are some cool ideas to get your creative juices flowing:

Task Type	Example Automation
DevOps Assistant	Read system logs, detect potential issues, and suggest solutions.
Finance Tracker	Read bank statements or CSV files and summarize your spending habits/budgets.
Meeting Organizer	After a meeting, automatically extract action items and assign owners.
Inbox Cleaner	Automatically label, archive, and delete non-urgent emails.
Note Summarizer	Convert your daily notes into a neatly formatted to-do list or summary.
Link Checker	Extract URLs from documents and automatically test if they're still valid.
Resume Formatter	Score resumes against job descriptions and format them automatically.

Each of these can be built using the very same principles and frameworks we discussed, whether that's LangGraph or CrewAI.

What’s Next in Agent Technology?

LLM agents are evolving at lightning speed, and the next wave of innovation is already here:

Smarter memory systems: Expect agents to have better long-term memory, allowing them to learn over extended periods and remember past conversations and actions.
Multi-modal agents: Agents won't just handle text anymore! They'll be able to process and understand images, audio, and video, making them much more versatile.
Advanced planning frameworks: Techniques like ReAct and Toolformer, along with frameworks like AutoGen, are constantly improving agents' ability to reason, plan, and reduce those pesky "hallucinations."
Edge deployment: Imagine agents running entirely offline on your local computer or device using lightweight models like LLaMA 3 or Mistral.

In the very near future, you'll see agents seamlessly integrated into:

Your DevOps pipelines
Big enterprise workflows
Everyday productivity tools
Mobile apps and smart devices
Games, simulations, and educational platforms

Final Summary

Alright, let's quickly recap all the cool stuff you've just learned and accomplished:

You've gotten a solid grasp of what LLM agents are and why they're so powerful.
You've seen how open-source frameworks like LangGraph and CrewAI make building agents much easier.
You've built a real LLM agent using LangGraph to automate a common daily task: summarizing your inbox!
You've explored the world of multi-agent collaboration with CrewAI, understanding how teams of AIs can work together.
You've learned how to take these principles and scale them to automate countless other tasks.

So, next time you find yourself stuck doing something repetitive, just ask yourself: "Hey, can I build an agent for that?" The answer is probably yes!

Resources Recap

Here are some helpful resources if you want to dive deeper into building LLM agents:

Resource	Link
LangGraph Docs	https://docs.langgraph.dev/
CrewAI GitHub	https://github.com/joaomdmoura/crewAI
LangChain Docs	https://docs.langchain.com/docs/
OpenAI API Docs	https://platform.openai.com/docs
Python 3.9+	https://www.python.org/downloads/
VSCode	https://code.visualstudio.com/
