<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Christopher Galliart - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Christopher Galliart - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 31 May 2026 09:37:50 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/hatmanstack/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Trace Multi-Agent AI Swarms with Jaeger v2 ]]>
                </title>
                <description>
                    <![CDATA[ When you run a single AI agent, debugging is straightforward. You read the log, you see what happened. When you run five agents in a swarm, each spawning its own tool calls and producing its own outpu ]]>
                </description>
                <link>https://www.freecodecamp.org/news/multi-agent-ai-swarms-tracing/</link>
                <guid isPermaLink="false">69eaae45904b915438cefb47</guid>
                
                    <category>
                        <![CDATA[ jaeger ]]>
                    </category>
                
                    <category>
                        <![CDATA[ OpenTelemetry ]]>
                    </category>
                
                    <category>
                        <![CDATA[ distributed tracing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ multi-agent systems ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ observability ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Thu, 23 Apr 2026 23:41:57 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/308710e6-cfe6-4007-887a-c49a5e2e6b9a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When you run a single AI agent, debugging is straightforward. You read the log, you see what happened.</p>
<p>When you run five agents in a swarm, each spawning its own tool calls and producing its own output, "read the log" stops being a strategy.</p>
<p>I built <a href="https://github.com/HatmanStack/claude-forge">Claude Forge</a> as an adversarial multi-agent coding framework on top of Claude Code. A typical run spawns a planner, an implementer, a reviewer, and a fixer. They evaluate each other's work and loop back when quality checks fail.</p>
<p>When something went wrong, I had timestamps and text dumps, but no way to see which agent was responsible, how long it actually took, or where the tokens went.</p>
<p>Jaeger fixed that. This article covers setting up Jaeger v2 with Docker, wiring it into a multi-agent system through OpenTelemetry, and what I learned along the way.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-distributed-tracing">What Is Distributed Tracing?</a></p>
</li>
<li><p><a href="#heading-why-jaeger-v2">Why Jaeger v2?</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-installing-docker-on-debian">Installing Docker on Debian</a></p>
</li>
<li><p><a href="#heading-setting-up-jaeger-v2">Setting Up Jaeger v2</a></p>
</li>
<li><p><a href="#heading-setting-up-claude-forge-tracing">Setting Up Claude Forge Tracing</a></p>
</li>
<li><p><a href="#heading-understanding-the-span-model">Understanding the Span Model</a></p>
</li>
<li><p><a href="#heading-instrumenting-a-multi-agent-swarm">Instrumenting a Multi-Agent Swarm</a></p>
</li>
<li><p><a href="#heading-viewing-traces-in-the-jaeger-ui">Viewing Traces in the Jaeger UI</a></p>
</li>
<li><p><a href="#heading-lessons-from-the-trenches">Lessons from the Trenches</a></p>
</li>
<li><p><a href="#heading-environment-variable-reference">Environment Variable Reference</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-what-is-distributed-tracing">What Is Distributed Tracing?</h2>
<p>Distributed tracing tracks a single operation as it moves through multiple services. A span is one unit of work with a start time, end time, and key-value attributes. Spans nest into parent-child trees. Each operation produces one tree, and that tree is a trace.</p>
<p>Microservices people already know this pattern: follow an HTTP request from the gateway through auth, the database, and the cache. The same idea works for multi-agent AI: follow one swarm invocation from the orchestrator through each subagent and its tool calls.</p>
<p>OpenTelemetry (OTel) is the standard. It gives you SDKs for creating spans and shipping them over OTLP. Jaeger receives that data and renders it as a searchable timeline.</p>
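<p>The span and trace vocabulary above can be modeled in a few lines of plain Python. This is only an illustrative sketch: the <code>Span</code> class and <code>render_trace</code> helper are hypothetical names, and real OpenTelemetry spans link to their parent by span ID rather than by name.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One unit of work: a name, start/end times, and key-value attributes."""
    name: str
    start_ns: int
    end_ns: int
    parent: Optional[str] = None  # simplification: parent referenced by name; None marks the root
    attributes: Dict[str, object] = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1_000_000


def render_trace(spans: List[Span]) -> List[str]:
    """Return the spans of one trace as indented lines, parents before children."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent, []).append(span)

    lines: List[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        # Siblings are listed in start-time order, like a trace timeline.
        for span in sorted(children.get(parent, []), key=lambda s: s.start_ns):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)")
            walk(span.name, depth + 1)

    walk(None, 0)
    return lines


# Made-up timings for one swarm run, in nanoseconds.
trace = [
    Span("session", 0, 65_000_000_000),
    Span("subagent:planner", 1_000_000_000, 13_000_000_000, parent="session"),
    Span("subagent:implementer", 13_000_000_000, 58_000_000_000, parent="session"),
    Span("tool:Bash", 20_000_000_000, 24_000_000_000, parent="subagent:implementer"),
]
print("\n".join(render_trace(trace)))
```

<p>One root, children nested under it, and a duration on every node: that is all a trace timeline is showing you.</p>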
<h2 id="heading-why-jaeger-v2">Why Jaeger v2?</h2>
<p>Jaeger started at Uber and graduated as a CNCF project in 2019. v1 hit end of life in December 2025. v2 is the current release, built on the OpenTelemetry Collector framework. Single binary: collector, query service, and UI. It speaks OTLP natively on port 4317 (gRPC) and 4318 (HTTP). There's no separate collector needed for local work.</p>
<p>One important difference from v1: configuration moved from CLI flags and environment variables to a YAML file. The old <code>-e SPAN_STORAGE_TYPE=badger</code> env vars are silently ignored in v2. The container starts fine but falls back to in-memory storage. I lost two days of traces before noticing. More on the correct setup below.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p><strong>Docker</strong> installed and running.</p>
</li>
<li><p><strong>Claude Code</strong> installed.</p>
</li>
<li><p><strong>Python 3.8+</strong> for the tracing hook.</p>
</li>
<li><p><strong>Claude Forge</strong> or another multi-agent system to instrument.</p>
</li>
</ul>
<h2 id="heading-installing-docker-on-debian">Installing Docker on Debian</h2>
<p>Skip this if you already have Docker. macOS and Windows users can use Docker Desktop. On Debian:</p>
<pre><code class="language-bash">sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/debian \
  $(. /etc/os-release &amp;&amp; echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
</code></pre>
<p>Ubuntu users: replace both <code>linux/debian</code> URLs with <code>linux/ubuntu</code>.</p>
<h2 id="heading-setting-up-jaeger-v2">Setting Up Jaeger v2</h2>
<h3 id="heading-basic-run">Basic Run</h3>
<p>For quick testing with no persistence:</p>
<pre><code class="language-bash">docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/jaeger:2.17.0
</code></pre>
<p>Port 16686 is the UI. Port 4317 is OTLP/gRPC ingestion. Port 4318 is OTLP/HTTP. Remove the container and your traces are gone.</p>
<h3 id="heading-persistent-storage-with-badger">Persistent Storage with Badger</h3>
<p>v2 reads configuration from a YAML file, not environment variables. Save this as <code>~/.local/share/jaeger/config.yaml</code>:</p>
<pre><code class="language-yaml">service:
  extensions: [jaeger_storage, jaeger_query, healthcheckv2]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger_storage_exporter]
extensions:
  healthcheckv2:
    use_v2: true
    http: { endpoint: 0.0.0.0:13133 }
  jaeger_query:
    storage: { traces: main_store }
  jaeger_storage:
    backends:
      main_store:
        badger:
          directories: { keys: /badger/key, values: /badger/data }
          ephemeral: false
          ttl: { spans: 720h }
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
processors:
  batch:
exporters:
  jaeger_storage_exporter:
    trace_storage: main_store
</code></pre>
<p>The Jaeger container runs as UID 10001. Docker named volumes default to root ownership. Without fixing permissions first, the container crash-loops with <code>mkdir /badger/key: permission denied</code>.</p>
<p>Pre-create the volume and fix ownership:</p>
<pre><code class="language-bash">docker volume create jaeger-data

docker run --rm \
  -v jaeger-data:/badger \
  alpine sh -c "mkdir -p /badger/data /badger/key &amp;&amp; chown -R 10001:10001 /badger"
</code></pre>
<p>Then run Jaeger with the config mounted in:</p>
<pre><code class="language-bash">docker run -d --name jaeger \
  --restart unless-stopped \
  -v ~/.local/share/jaeger/config.yaml:/etc/jaeger/config.yaml:ro \
  -v jaeger-data:/badger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/jaeger:2.17.0 \
  --config /etc/jaeger/config.yaml
</code></pre>
<p>Open <code>http://localhost:16686</code> and you should see the UI. To verify persistence, record a trace, run <code>docker restart jaeger</code>, and confirm the trace is still there afterward.</p>
<h2 id="heading-setting-up-claude-forge-tracing">Setting Up Claude Forge Tracing</h2>
<h3 id="heading-installing-claude-forge">Installing Claude Forge</h3>
<p>Install it through the Claude Code plugin marketplace:</p>
<pre><code class="language-bash">/plugin marketplace add hatmanstack/claude-forge
/plugin install forge@claude-forge
/reload-plugins
</code></pre>
<p>The install opens a TUI to confirm scope and settings. After reload, commands use the <code>forge:</code> prefix (for example, <code>/forge:pipeline</code>).</p>
<p>You can also clone the repo from <a href="https://github.com/HatmanStack/claude-forge">GitHub</a>.</p>
<h3 id="heading-installing-the-tracing-hook">Installing the Tracing Hook</h3>
<p>From your target project directory, run the install script. For plugin installs:</p>
<pre><code class="language-bash">cd your-project
forge-trace                # if you set up the alias from the README
# or, without the alias:
bash "$(find ~/.claude -path '*/forge*' -name install-tracing.sh 2&gt;/dev/null | head -1)"
</code></pre>
<p>For clone installs:</p>
<pre><code class="language-bash">cd your-project
bash /path/to/claude-forge/bin/install-tracing.sh
</code></pre>
<p>The script builds a dedicated venv at <code>~/.local/share/claude-forge/venv</code> (prefers <code>uv</code>, falls back to <code>python3 -m venv</code>), installs the OpenTelemetry packages, copies the hook into place, merges hook entries into <code>.claude/settings.local.json</code>, and self-tests against the OTLP endpoint.</p>
<p>Pass <code>--no-settings</code> to skip the settings merge, or <code>--uninstall</code> to tear everything down.</p>
<h3 id="heading-opting-in">Opting In</h3>
<p>Add to your shell init and restart your terminal:</p>
<pre><code class="language-bash">export CLAUDE_FORGE_TRACING=1
</code></pre>
<p>Restart Claude Code, run the pipeline (<code>/forge:pipeline</code> on a plugin install), then check <code>http://localhost:16686</code> for the <code>claude-forge</code> service.</p>
<h2 id="heading-understanding-the-span-model">Understanding the Span Model</h2>
<p>Here's what the hierarchy looks like for a typical swarm run:</p>
<pre><code class="language-plaintext">session: "implement login form with OAuth"        &lt;- root span
├── subagent:planner
│   ├── tool:Write  (Phase-0.md)                  &lt;- mutation spans (on by default)
│   ├── tool:Write  (Phase-1.md)
│   └── subagent_result:planner                   &lt;- duration, token counts, output
├── subagent:implementer
│   ├── tool:Edit   (src/auth.ts)
│   ├── tool:Bash   (npm test)
│   ├── tool:Write  (src/oauth.ts)
│   └── subagent_result:implementer
├── subagent:reviewer
│   └── subagent_result:reviewer
└── session_complete                              &lt;- session totals
</code></pre>
<p>The root span's name comes from the first line of your prompt. Find traces by what you asked for, not by a UUID.</p>
<p>Subagents get an anchor span on start and a result span on completion. The result carries duration, token counts, prompt, and output.</p>
<h3 id="heading-three-tiers-of-detail">Three Tiers of Detail</h3>
<p>Not all inner tool calls are equally interesting. Write, Edit, MultiEdit, and Bash are mutational: small in number, high signal. They tell you what actually changed. Read, Glob, Grep, and WebFetch are navigation: lots of them, mostly noise.</p>
<p>Tracing captures mutations by default. That middle ground turned out to be the right one. Before mutation spans became the default, you either saw nothing inside subagents or you saw 200+ spans per run.</p>
<table>
<thead>
<tr>
<th>Mode</th>
<th>Subagents</th>
<th>Mutations (Write/Edit/Bash)</th>
<th>Other inner tools</th>
</tr>
</thead>
<tbody><tr>
<td>Default</td>
<td>yes</td>
<td>yes</td>
<td>no</td>
</tr>
<tr>
<td><code>CLAUDE_FORGE_TRACE_INNER=1</code></td>
<td>yes</td>
<td>yes</td>
<td>yes (minus blocklist)</td>
</tr>
<tr>
<td><code>CLAUDE_FORGE_TRACE_MUTATIONS=0</code></td>
<td>yes</td>
<td>no</td>
<td>no (unless <code>CLAUDE_FORGE_TRACE_INNER=1</code>)</td>
</tr>
</tbody></table>
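<p>One way to read the table above is as a small decision function. The sketch below is my interpretation, not the hook's actual code: <code>should_trace_inner_tool</code> and <code>flag</code> are hypothetical names, the mutation set comes from the paragraph above, and the default blocklist is the one listed in the environment variable reference. How the real hook treats a mutation tool when mutations are off but inner tracing is on is an assumption here.</p>

```python
from typing import Mapping

MUTATION_TOOLS = {"Write", "Edit", "MultiEdit", "Bash"}
DEFAULT_BLOCKLIST = "Read,Glob,Grep,TodoWrite,NotebookRead"
TRUTHY = {"1", "true", "yes", "on"}


def flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean environment variable against an explicit allowlist."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def should_trace_inner_tool(tool_name: str, env: Mapping[str, str]) -> bool:
    """Decide whether a tool call made inside a subagent gets its own span."""
    # Tier 1: mutation spans are on unless explicitly disabled.
    if tool_name in MUTATION_TOOLS and flag(env, "CLAUDE_FORGE_TRACE_MUTATIONS", True):
        return True
    # Tier 2: everything else is opt-in, minus the blocklist.
    # Assumption: with mutations off, mutation tools fall through to this rule.
    if flag(env, "CLAUDE_FORGE_TRACE_INNER", False):
        raw = env.get("CLAUDE_FORGE_TRACE_TOOL_BLOCKLIST", DEFAULT_BLOCKLIST)
        blocklist = {t.strip() for t in raw.split(",") if t.strip()}
        return tool_name not in blocklist
    return False


for tool in ("Write", "Read", "WebFetch"):
    print(tool, should_trace_inner_tool(tool, {}))
```

<p>Passing the environment in as a mapping, rather than reading <code>os.environ</code> directly, keeps the rule easy to test.</p>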
<h3 id="heading-span-attributes">Span Attributes</h3>
<p><strong>On</strong> <code>session_complete</code><strong>:</strong> <code>session.tokens.input</code>, <code>session.tokens.output</code>, <code>session.tokens.total</code>, <code>session.tokens.turns</code>, <code>session.duration_ms</code>, <code>user.prompt</code> (first 2KB).</p>
<p><strong>On</strong> <code>subagent_result</code><strong>:</strong> <code>agent.description</code>, <code>agent.prompt</code>, <code>agent.output</code>, <code>agent.duration_ms</code>, <code>agent.is_error</code>, <code>agent.tokens.input</code>, <code>agent.tokens.output</code>.</p>
<p><strong>On</strong> <code>tool:*</code><strong>:</strong> <code>tool.name</code>, <code>tool.input</code>, <code>tool.output</code>, <code>tool.duration_ms</code>, <code>tool.is_error</code>.</p>
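<p>Capping an attribute like <code>user.prompt</code> at 2KB takes a little care if the text contains multi-byte characters. Here is a minimal sketch, assuming the limit is counted in UTF-8 bytes (the hook may count characters instead). <code>truncate_utf8</code> is a hypothetical helper name.</p>

```python
def truncate_utf8(text: str, max_bytes: int = 2048) -> str:
    """Trim text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


prompt = "implement login form with OAuth " * 200
print(len(truncate_utf8(prompt).encode("utf-8")))
```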
<h2 id="heading-instrumenting-a-multi-agent-swarm">Instrumenting a Multi-Agent Swarm</h2>
<h3 id="heading-hook-architecture">Hook Architecture</h3>
<p>Claude Code has lifecycle hooks that fire scripts on specific events. Four matter here:</p>
<ol>
<li><p><strong>UserPromptSubmit</strong> (create the root span)</p>
</li>
<li><p><strong>PreToolUse</strong> (start a span)</p>
</li>
<li><p><strong>PostToolUse</strong> (end it with results)</p>
</li>
<li><p><strong>Stop</strong> (finalize the trace)</p>
</li>
</ol>
<p>Each hook gets a JSON payload on stdin and runs as a subprocess.</p>
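<p>The four events can be routed from a single script. The following is a rough sketch of that dispatch, not the Claude Forge hook itself. It assumes the payload names its event in a <code>hook_event_name</code> field and its tool in <code>tool_name</code>; the handler functions are placeholders that return a description instead of touching OpenTelemetry.</p>

```python
import io
import json
from typing import Callable, Dict, TextIO


def on_prompt(payload: dict) -> str:
    return "create root span"


def on_pre_tool(payload: dict) -> str:
    return f"start span for {payload.get('tool_name', 'unknown')}"


def on_post_tool(payload: dict) -> str:
    return f"end span for {payload.get('tool_name', 'unknown')}"


def on_stop(payload: dict) -> str:
    return "finalize trace"


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "UserPromptSubmit": on_prompt,
    "PreToolUse": on_pre_tool,
    "PostToolUse": on_post_tool,
    "Stop": on_stop,
}


def dispatch(payload: dict) -> str:
    """Route one hook payload to its handler; unknown events are ignored."""
    handler = HANDLERS.get(payload.get("hook_event_name", ""))
    return handler(payload) if handler else "ignored"


def run_hook(stream: TextIO) -> str:
    """Read one JSON payload from a stream and dispatch it, never raising."""
    try:
        payload = json.load(stream)
    except ValueError:
        return "ignored"  # malformed input must not break the swarm
    if not isinstance(payload, dict):
        return "ignored"
    return dispatch(payload)


print(run_hook(io.StringIO('{"hook_event_name": "PreToolUse", "tool_name": "Bash"}')))
```

<p>A real hook would call <code>run_hook(sys.stdin)</code> and exit with status 0 either way, so a tracing problem never blocks the agent.</p>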
<h3 id="heading-sending-spans-with-opentelemetry">Sending Spans with OpenTelemetry</h3>
<p>Here's some minimal Python to get a span into Jaeger:</p>
<pre><code class="language-python">from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

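# The service name is what appears in the Jaeger UI's service dropdown.
resource = Resource.create({"service.name": "my-agent-system"})
# Send spans to Jaeger's OTLP/gRPC port without TLS (fine for local work).
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider = TracerProvider(resource=resource)
# Batch spans in the background instead of exporting each one synchronously.
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("agent-tracer")

# The span ends, and is queued for export, when the with block exits.
with tracer.start_as_current_span("my-agent-task") as span:
    span.set_attribute("agent.name", "planner")
    span.set_attribute("agent.tokens.input", 1500)
    span.set_attribute("agent.tokens.output", 800)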
</code></pre>
<p>Refresh <code>localhost:16686</code>, pick your service, click "Find Traces."</p>
<h3 id="heading-correlating-pre-and-post-events">Correlating Pre and Post Events</h3>
<p>You need to match each PreToolUse to its PostToolUse. Agent-type tool calls didn't include a <code>tool_use_id</code> in the payload, so I hashed the tool name and input instead. Pre and Post carry identical <code>tool_input</code>, so the hashes line up.</p>
<pre><code class="language-python">import hashlib, json

def correlation_key(tool_name: str, tool_input: dict) -&gt; str:
    content = json.dumps({"tool": tool_name, "input": tool_input}, sort_keys=True)
    return hashlib.sha1(content.encode()).hexdigest()[:16]
</code></pre>
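<p>To see why the hashes line up, here is the same function applied to a pair of made-up payloads. The tool name and inputs are illustrative only. The Post payload lists its keys in a different order and carries an extra response field, and the key still matches because only the tool name and input are hashed, with <code>sort_keys=True</code> normalizing order.</p>

```python
import hashlib
import json


def correlation_key(tool_name: str, tool_input: dict) -> str:
    content = json.dumps({"tool": tool_name, "input": tool_input}, sort_keys=True)
    return hashlib.sha1(content.encode()).hexdigest()[:16]


# Hypothetical payloads; real ones carry more fields.
pre_payload = {
    "tool_name": "Task",
    "tool_input": {"description": "planner", "prompt": "Plan the login form"},
}
post_payload = {
    "tool_name": "Task",
    "tool_input": {"prompt": "Plan the login form", "description": "planner"},
    "tool_response": "done",
}

pre_key = correlation_key(pre_payload["tool_name"], pre_payload["tool_input"])
post_key = correlation_key(post_payload["tool_name"], post_payload["tool_input"])
other_key = correlation_key("Task", {"description": "reviewer", "prompt": "Review the diff"})

print(pre_key == post_key, pre_key == other_key)
```

<p>One limitation worth keeping in mind: two in-flight calls with the same tool and byte-identical input would produce the same key, so this approach relies on concurrent calls differing in their input.</p>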
<h3 id="heading-state-across-invocations">State Across Invocations</h3>
<p>Every hook call is a separate process. No shared memory. So I wrote span context to JSON files on Pre and read them back on Post:</p>
<pre><code class="language-plaintext">/tmp/claude-forge-tracing/&lt;session_id&gt;/
├── _root.json              # trace ID, root span context
├── _session_start_ns.json  # timestamp for duration calculation
├── subagent_&lt;hash&gt;.json    # per-subagent span context
└── tool_&lt;hash&gt;.json        # per-tool span context
</code></pre>
<p>File names get sanitized against path traversal. <code>_safe_name()</code> strips everything outside <code>[A-Za-z0-9._-]</code> and falls back to a SHA1 slug.</p>
<h3 id="heading-flushing-without-blocking">Flushing Without Blocking</h3>
<pre><code class="language-python">try:
    provider.force_flush(timeout_millis=1000)
except Exception:
    pass  # Never block the swarm
</code></pre>
<p>I tried 2000ms first and the swarm felt slow. 100ms lost spans on cold TLS connections. 1000ms worked. If Jaeger is down, the swarm keeps running regardless.</p>
<h2 id="heading-viewing-traces-in-the-jaeger-ui">Viewing Traces in the Jaeger UI</h2>
<p>Open <code>http://localhost:16686</code>. Pick <code>claude-forge</code> from the service dropdown. Click "Find Traces."</p>
<p>The trace search filters by operation name, tags, and time range. Since session spans take their name from your prompt, searching "login form" pulls up the runs where you asked for one.</p>
<p>The timeline view is where I spend most of my time. Every span is a horizontal bar, nested by parent-child relationships. I can see the planner took 12 seconds, the implementer 45, the reviewer 8. Click any bar to see token counts, prompts, outputs, error status.</p>
<p>Trace comparison puts two runs side by side. This is good for figuring out why one run succeeded and another did not.</p>
<h2 id="heading-lessons-from-the-trenches">Lessons from the Trenches</h2>
<p><strong>One trace per swarm, not per subagent:</strong> My first version wiped the root span's state file on every Stop event, so each subagent started a new trace. I changed Stop to mark a timestamp while preserving the root.</p>
<p><strong>Use descriptions, not type names:</strong> Subagents all report their type as <code>general-purpose</code>. The description field is where the actual role lives.</p>
<p><strong>Token attribution needs per-agent transcripts:</strong> Claude Code writes subagent transcripts to <code>~/.claude/projects/&lt;project&gt;/&lt;session&gt;/subagents/agent-*.jsonl</code>. Match them via <code>agent-*.meta.json</code>.</p>
<p><strong>Parse boolean env vars explicitly:</strong> <code>bool("0")</code> in Python is <code>True</code>. Use an allowlist: <code>{"1", "true", "yes", "on"}</code>.</p>
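<p>The allowlist approach fits in one small function. This is a sketch with a hypothetical <code>env_flag</code> helper; the optional <code>env</code> argument exists only so the function can be exercised without touching the real environment.</p>

```python
import os
from typing import Mapping, Optional

TRUTHY = frozenset({"1", "true", "yes", "on"})


def env_flag(name: str, default: bool = False, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the variable is set to an allowlisted truthy value."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None:
        return default  # unset: use the caller's default
    return raw.strip().lower() in TRUTHY


# The pitfall: any non-empty string is truthy in Python, including "0".
print(bool("0"))
print(env_flag("CLAUDE_FORGE_TRACE_MUTATIONS", default=True, env={"CLAUDE_FORGE_TRACE_MUTATIONS": "0"}))
```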
<h2 id="heading-environment-variable-reference">Environment Variable Reference</h2>
<table>
<thead>
<tr>
<th>Variable</th>
<th>Purpose</th>
</tr>
</thead>
<tbody><tr>
<td><code>CLAUDE_FORGE_TRACING=1</code></td>
<td>Master opt-in. Hook is a no-op without this.</td>
</tr>
<tr>
<td><code>CLAUDE_FORGE_TRACE_MUTATIONS=0</code></td>
<td>Disable default mutation spans (Write/Edit/Bash). On by default.</td>
</tr>
<tr>
<td><code>CLAUDE_FORGE_TRACE_INNER=1</code></td>
<td>Capture all inner tool calls as child spans (off by default).</td>
</tr>
<tr>
<td><code>CLAUDE_FORGE_TRACE_TOOL_BLOCKLIST</code></td>
<td>Comma-separated tools to skip when inner tracing is on. Defaults to <code>Read,Glob,Grep,TodoWrite,NotebookRead</code>.</td>
</tr>
<tr>
<td><code>CLAUDE_FORGE_HOOK_DEBUG=1</code></td>
<td>Enable debug logging of raw hook payloads. Off by default.</td>
</tr>
<tr>
<td><code>CLAUDE_FORGE_HOOK_DEBUG_LOG</code></td>
<td>Override debug log path. Defaults to <code>~/.cache/claude-forge/hook.log</code>.</td>
</tr>
<tr>
<td><code>OTEL_EXPORTER_OTLP_ENDPOINT</code></td>
<td>OTLP/gRPC endpoint. Defaults to <code>http://localhost:4317</code>.</td>
</tr>
</tbody></table>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>Without visibility into the process, you're being inefficient with tokens and your time. Multi-agent swarms cost real money on every run. When an agent fails and retries, or when a reviewer rejects work that was close, you're paying for that blind.</p>
<p>Tracing gives you the map. You find out where the failure modes are. You find out which agents burn tokens going nowhere. A 45-second implementer run might have been 10 seconds with a better planner prompt. But you would never know that without seeing the breakdown.</p>
<p>Get observability in early. Jaeger and OpenTelemetry make it cheap to set up. Once you can see where things go wrong you can actually fix them.</p>
<p>Claude Forge tracing is on the <a href="https://github.com/HatmanStack/claude-forge">main branch</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Why Chrome OS Is the Operating System the AI Era Was Built For ]]>
                </title>
                <description>
                    <![CDATA[ Chrome OS runs on a read-only filesystem. You can't install executables on the host. There's no traditional desktop environment. Everything that interacts with the underlying system does so through a  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/why-chrome-os-is-the-ai-os/</link>
                <guid isPermaLink="false">69e2765cfd22b8ad62611ba8</guid>
                
                    <category>
                        <![CDATA[ Chrome OS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Linux ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cybersecurity ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Fri, 17 Apr 2026 18:05:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c4116a06-9e42-4da5-a152-0fe1433e0857.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Chrome OS runs on a read-only filesystem. You can't install executables on the host. There's no traditional desktop environment. Everything that interacts with the underlying system does so through a sandboxed browser, a containerized Linux terminal, or a cloud connection.</p>
<p>For years, that list of constraints was the reason people dismissed it. But in 2026, it's the reason Chrome OS might be the most correctly designed operating system for what's coming.</p>
<p>The security architecture treats the endpoint as untrusted by default. The containerized Linux environment gives developers a full headless stack without compromising the host. And an upcoming OS-level rewrite, Aluminium, puts Google's on-device AI models directly into the kernel.</p>
<p>This article covers security architecture, the container-based developer environment, cloud-streamed creative tools via AWS NICE DCV, cloud gaming, and what Aluminium OS means for on-device AI.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ol>
<li><p><a href="#heading-security-first-architecture-in-an-era-of-ai-powered-threats">Security-First Architecture in an Era of AI-Powered Threats</a></p>
</li>
<li><p><a href="#heading-a-headless-linux-stack-thats-more-flexible-than-it-looks">A Headless Linux Stack That's More Flexible Than It Looks</a></p>
</li>
<li><p><a href="#heading-aws-nice-dcv-changes-the-creative-tools-conversation">AWS NICE DCV Changes the Creative Tools Conversation</a></p>
</li>
<li><p><a href="#heading-cloud-gaming-works">Cloud Gaming Works</a></p>
</li>
<li><p><a href="#heading-aluminum-os-on-device-models-on-googles-own-architecture">Aluminium OS: On-Device Models on Google's Own Architecture</a></p>
</li>
<li><p><a href="#heading-where-this-lands">Where This Lands</a></p>
</li>
</ol>
<h2 id="heading-security-first-architecture-in-an-era-of-ai-powered-threats">Security-First Architecture in an Era of AI-Powered Threats</h2>
<p>Threat actors are getting better tools. Models like Mythos are lowering the barrier for generating convincing phishing campaigns, crafting polymorphic malware, and automating social engineering at scale.</p>
<p>Traditional operating systems present exactly the attack surface these tools target: writable system files, user-installable executables, patches that sit uninstalled for weeks because someone clicked "remind me later."</p>
<p>Chrome OS sidesteps most of this by design. The root filesystem is read-only and cryptographically verified on every boot through a process called Verified Boot.</p>
<p>If anything has modified the OS files since the last verified state, whether that's malware, a compromised package, or a rogue AI agent that decided to start deleting system files, the device detects it at startup and either self-corrects or refuses to boot.</p>
<p>Persistence across reboots isn't just difficult. It's architecturally impossible through software alone.</p>
<p>Updates happen silently. While you're working, the system downloads the next OS version to an inactive partition. On your next reboot, it pivots to the updated version. No prompts, no deferred patches, no exposure window.</p>
<p>Major updates ship every four to six weeks. Security patches land every two to three weeks. The gap between vulnerability discovery and remediation is measured in days.</p>
<p>Chrome OS has consistently stayed out of the top 50 products by CVE count in the NIST vulnerability database, while Windows and the Linux kernel sit near the top every year. When AI is actively being weaponized to find and exploit vulnerabilities faster than humans can patch them, a read-only, verified, automatically updated endpoint is a different category of security posture.</p>
<p>The tradeoff is trust. Chrome OS's security model means trusting Google as the root authority for your entire computing stack: updates, certificate trust, telemetry. Organizations with strict data sovereignty requirements should weigh that dependency carefully.</p>
<h2 id="heading-a-headless-linux-stack-thats-more-flexible-than-it-looks">A Headless Linux Stack That's More Flexible Than It Looks</h2>
<p>Chrome OS is a text-based operating system. There's no native GUI layer. Stop and sit with that for a second, because it's the thing that makes people dismiss Chrome OS and also the thing that makes it work.</p>
<p>The entire graphical interface you interact with IS the Chrome browser. The Ash shell, Chrome's window manager, is the desktop. You don't install applications onto it the way you install .exe files on Windows or drag .app bundles into a macOS Applications folder. If it isn't running in a browser tab, an Android VM, or a Linux container, it doesn't run. That restriction is what keeps the host locked down, and it's what makes everything else possible.</p>
<p>Under the hood, Chrome OS runs a minimal virtual machine called Termina through crosvm, Google's Rust-based VM monitor.</p>
<p>Inside Termina, LXD manages Linux containers. The default container, penguin, is a Debian environment with a special trick: it bridges GUI-based Linux applications directly into the Chrome OS desktop through a Wayland proxy called Sommelier. Install VS Code, GIMP, or LibreOffice in penguin and they show up in your Chrome OS app launcher, running in windows alongside your browser tabs. For a lot of developers, penguin alone covers the daily workflow.</p>
<p>But Termina gives you more than penguin. Through the LXD layer you can spin up independent containers that are fully isolated operating systems: Arch, Alpine, Ubuntu, whatever you need.</p>
<p>These aren't attached to the GUI bridge. They run headless, natively, with their own systemd, their own package managers, their own persistent state. Need a clean Ubuntu environment to test a deployment script without touching your main setup? <code>lxc launch</code> and you're there. Need to blow it away? <code>lxc delete</code> and it's gone. No orphaned files on the host, no cross-contamination between environments.</p>
<p>The key distinction from Docker is that LXD runs system containers (a full OS userspace with its own init system) rather than application containers. You get background services, persistent daemons, the works. You can also run Docker inside any of these LXD containers if you need application-level containerization on top of that.</p>
<p>Snapshot your entire environment with <code>lxc snapshot</code> before a risky dependency install and roll back instantly if something breaks. That kind of safety net is broader than version control alone: it captures your full OS configuration, not just code.</p>
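<p>The launch, snapshot, roll back, and delete cycle described above can be scripted. The sketch below only builds and prints the commands. The container name <code>scratch</code>, the snapshot name, and the <code>ubuntu:24.04</code> image alias are placeholders, and which image remotes are available inside Termina is an assumption you should check on your own device.</p>

```python
import shlex
from typing import List


def lxc_launch(image: str, name: str) -> List[str]:
    """Create and start a new container from an image."""
    return ["lxc", "launch", image, name]


def lxc_snapshot(name: str, snapshot: str) -> List[str]:
    """Capture the container's current state under a snapshot name."""
    return ["lxc", "snapshot", name, snapshot]


def lxc_restore(name: str, snapshot: str) -> List[str]:
    """Roll the container back to an earlier snapshot."""
    return ["lxc", "restore", name, snapshot]


def lxc_delete(name: str) -> List[str]:
    """Remove the container, stopping it first if it is running."""
    return ["lxc", "delete", name, "--force"]


workflow = [
    lxc_launch("ubuntu:24.04", "scratch"),
    lxc_snapshot("scratch", "before-deps"),
    lxc_restore("scratch", "before-deps"),
    lxc_delete("scratch"),
]
for argv in workflow:
    print(shlex.join(argv))
```

<p>To actually run a step, pass its list to <code>subprocess.run(argv, check=True)</code> from a shell that has access to <code>lxc</code>.</p>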
<p>Pair this with browser-native tools like GitHub Codespaces, Google Colab, AWS CloudShell, or vscode.dev, and the terminal handles your local tooling while the browser handles everything else.</p>
<p>AI coding assistants like Claude and Gemini already operate natively in the browser. The distance between "cloud IDE" and "local IDE" keeps shrinking.</p>
<p>There are friction points: no custom kernel modules inside Crostini. Nested KVM requires Intel Gen 10+ processors. VPN routing into the Linux container from the Chrome OS host can be a headache, with WireGuard requiring userspace workarounds inside the container.</p>
<p>But none of these break the core architecture for cloud-native work. They're just worth knowing about before you commit.</p>
<h2 id="heading-aws-nice-dcv-changes-the-creative-tools-conversation">AWS NICE DCV Changes the Creative Tools Conversation</h2>
<p>One of the longest-standing arguments against Chrome OS has been the absence of professional creative software. There's no Premiere, no DaVinci Resolve, no Blender, no Ableton. For years, this was a dead-end conversation.</p>
<p>AWS NICE DCV (Desktop Cloud Visualization) reopens it. DCV is a high-performance remote display protocol that streams GPU-accelerated desktop sessions from EC2 instances to any device, including a Chromebook running the browser-based DCV client. It supports OpenGL, Vulkan, and DirectX rendering, with adaptive encoding that adjusts to network conditions. On AWS, the DCV license is free. You pay only for the EC2 compute time.</p>
<p>Netflix engineers use DCV to stream content creation applications to remote artists. Volkswagen runs 3D CAD simulations across their engineering division through it. A VFX studio called RVX used it to deliver visual effects for HBO's The Last of Us, streaming Nuke, Maya, Houdini, and Blender to artists distributed across Europe from servers in Iceland. Their team said it was the best remote experience they'd ever worked with.</p>
<p>So: a Chromebook connected to a g5.xlarge EC2 instance (one A10G GPU) can run Blender, DaVinci Resolve, or any other GPU-accelerated creative application with full hardware acceleration. The rendering happens in the data center. DCV streams the pixels. The creative professional gets a responsive, high-fidelity workspace on a $400 machine that couldn't locally render a single frame.</p>
<p>The constraints are connectivity and cost. You need sustained bandwidth (25+ Mbps for 1080p work, more for 4K multi-monitor setups) and leaving a GPU instance running around the clock adds up. But for studios and professionals who already budget for high-end workstations, the math often pencils out, especially when you factor in zero local hardware maintenance and the ability to scale GPU power on demand.</p>
<h2 id="heading-cloud-gaming-works">Cloud Gaming Works</h2>
<p>GeForce NOW survived where Stadia failed because it made a better business decision: bring your own games. Connect your existing Steam, Epic, or Ubisoft library and stream from NVIDIA's server-side hardware. The Ultimate tier now runs on RTX 5080-class infrastructure. 4K at 120fps with ray tracing, on a fanless Chromebook.</p>
<p>Chrome OS has a structural advantage as a cloud gaming client. GeForce NOW runs natively in the Chromium browser via WebRTC, and users consistently report less micro-stuttering and tighter input handling than the standalone Windows desktop app. Under good network conditions, measured total latency runs 13 to 14ms, with sub-3ms ping documented near datacenter proximity. That's below the human perceptual threshold for most game types.</p>
<p>Anti-cheat systems like Easy Anti-Cheat and Riot Vanguard are a non-issue in this model. They run on the server where the game executes, not on your local endpoint. On-device gaming isn't viable on Chrome OS and likely never will be. The architecture isn't designed for it, and even projects attempting to bridge local GPUs hit bottlenecks in the container layers. Cloud gaming is the path, and it works.</p>
<p>The limiting factors are network-dependent. Latency spikes above 500ms on bad connections make fast-twitch games unplayable, and NVIDIA's 100-hour monthly cap on the Ultimate tier has drawn criticism. But cloud gaming on Chrome OS has crossed the line from novelty to daily-driver viable for most use cases.</p>
<h2 id="heading-aluminium-os-on-device-models-on-googles-own-architecture">Aluminium OS: On-Device Models on Google's Own Architecture</h2>
<p>The most consequential near-term development for Chrome OS is Project Aluminium, a ground-up rewrite that replaces the current Chrome OS foundation with a native Android kernel. Not another bolted-on compatibility layer: a new operating system built on Android 16, designed to run Android applications natively with direct hardware acceleration instead of routing them through the resource-heavy ARCVM virtual machine that currently eats CPU cycles on even basic app launches.</p>
<p>The AI story is the real story. Aluminium is being built with Gemini models integrated directly into the OS: the file system, the application launcher, the window manager.</p>
<p>Google serving their own proprietary models on their own devices, using an architecture optimized specifically to run them, is a level of vertical integration that no other OS vendor has in the pipeline. Apple has the silicon advantage for local inference. Google has the model-to-OS integration advantage. Those are competing theses about where AI compute should live, and both are worth taking seriously.</p>
<p>The rollout timeline from court documents and leaked roadmaps puts a trusted tester program on select hardware in late 2026, premium tablets by early 2027, and general consumer availability in 2028. Chrome OS Classic gets maintained through existing support obligations until 2033 or 2034.</p>
<p>The launch won't be perfect. Google's track record on platform transitions gives the community earned skepticism. But the ability to iterate a natively AI-integrated OS on hardware they control is the kind of capability that compounds over time.</p>
<h2 id="heading-where-this-lands">Where This Lands</h2>
<p>Two years ago, calling Chrome OS a serious platform for development or creative work would have been a stretch. Today you can run a full Debian environment with systemd daemons, snapshot your workspace, stream Blender from a GPU-backed data center, play AAA games at 4K on hardware you don't own, and do all of it from a verified, read-only endpoint that patches itself while you sleep.</p>
<p>The remaining gaps are real. But they're concentrated in workflows that are themselves moving to the cloud. Chrome OS was designed around assumptions about computing that used to be premature. They're not premature anymore.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Apply GAN Architecture to Multi-Agent Code Generation ]]>
                </title>
                <description>
                    <![CDATA[ Ask an AI coding agent to build a feature and it will probably do a decent job. Ask it to review its own work and it will tell you everything looks great. This is the fundamental problem with single-p ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-apply-gan-architecture-to-multi-agent-code-generation/</link>
                <guid isPermaLink="false">69c4123410e664c5dac5298f</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Software Engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude ]]>
                    </category>
                
                    <category>
                        <![CDATA[ multi-agent systems ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Code Quality ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Wed, 25 Mar 2026 16:49:56 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/3c06f375-0e26-427d-9659-b3be60716492.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Ask an AI coding agent to build a feature and it will probably do a decent job. Ask it to review its own work and it will tell you everything looks great.</p>
<p>This is the fundamental problem with single-pass AI code generation: the same context that created the code is the one evaluating it. There's no adversarial pressure. No second opinion. No fresh eyes.</p>
<p>What if you could structure the work so that separate agents generate and critique each other in iterative loops, the way a generator and discriminator improve each other in a <a href="https://www.freecodecamp.org/news/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394/">GAN</a>? The code that reaches you has already survived an argument between agents who disagreed about whether it was good enough.</p>
<p>This article walks through why that pattern works, how to build it, and when it is (and is not) worth the extra tokens. The concrete example is an open source project called <a href="https://github.com/HatmanStack/claude-forge">Claude Forge</a>, but the ideas are framework-agnostic. Anything that supports subagent spawning with fresh context windows can implement this pattern.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-the-single-pass-problem">The Single-Pass Problem</a></p>
</li>
<li><p><a href="#heading-what-the-ecosystem-is-solving">What the Ecosystem Is Solving</a></p>
</li>
<li><p><a href="#heading-the-gan-pattern-applied-to-code">The GAN Pattern Applied to Code</a></p>
</li>
<li><p><a href="#heading-why-rhetorical-questions-outperform-direct-instructions">Why Rhetorical Questions Outperform Direct Instructions</a></p>
</li>
<li><p><a href="#heading-feedback-as-filesystem">Feedback as Filesystem</a></p>
</li>
<li><p><a href="#heading-the-zero-context-engineer">The Zero-Context Engineer</a></p>
</li>
<li><p><a href="#heading-phase-0-immutable-conventions">Phase-0: Immutable Conventions</a></p>
</li>
<li><p><a href="#heading-convergence-design-knowing-when-to-stop">Convergence Design: Knowing When to Stop</a></p>
</li>
<li><p><a href="#heading-ground-truth-documents-and-the-pipeline">Ground Truth Documents and the Pipeline</a></p>
</li>
<li><p><a href="#heading-what-the-adversarial-loop-actually-catches">What the Adversarial Loop Actually Catches</a></p>
</li>
<li><p><a href="#heading-honest-trade-offs">Honest Trade-offs</a></p>
</li>
<li><p><a href="#heading-when-to-use-this-and-when-not-to">When to Use This (And When Not To)</a></p>
</li>
<li><p><a href="#heading-getting-started">Getting Started</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>Familiarity with <a href="https://docs.anthropic.com/en/docs/claude-code">Claude Code</a> or a similar AI coding agent</p>
</li>
<li><p>A working installation of Claude Code (for the hands-on sections)</p>
</li>
<li><p>Basic understanding of how LLM context windows work</p>
</li>
<li><p>Git installed and configured</p>
</li>
</ul>
<p>No machine learning background is required. The GAN concepts are explained from first principles where they appear.</p>
<h2 id="heading-the-single-pass-problem">The Single-Pass Problem</h2>
<p>The AI generates code in one pass. If it hallucinates a file path, misunderstands the architecture, or writes tests that don't actually test anything, you catch it during review. Or worse, you don't.</p>
<p>This isn't a hypothetical. Anyone who has used AI coding agents at scale has seen placeholder tests like <code>expect(true).toBe(true)</code>, phantom dependencies where Phase 2 assumes a model that Phase 1 never creates, and instructions so ambiguous that two valid interpretations exist. These aren't rare edge cases. They're the predictable failure mode of single-pass generation.</p>
<p>The problem compounds with task complexity. A simple utility function generates fine in one pass. An auth middleware with token refresh, error handling, rate limiting, and logging across multiple files? The agent starts cutting corners, because the entire generation happens inside one context window that is simultaneously tracking the plan, the code, the tests, and the growing weight of its own prior reasoning.</p>
<h2 id="heading-what-the-ecosystem-is-solving">What the Ecosystem Is Solving</h2>
<p>There is a growing ecosystem of frameworks tackling different aspects of this problem. They each bring real contributions worth understanding.</p>
<p><a href="https://github.com/obra/superpowers">Superpowers</a> focuses on development methodology. It uses subagent-driven development, TDD enforcement, and multi-stage review. The framework generates a design spec, then an implementation plan, then dispatches subagents to execute. Review subagents check the output, and if they find issues, the implementer revises and gets re-reviewed until approved.</p>
<p><a href="https://github.com/gsd-build/get-shit-done">Get Shit Done</a> (GSD) focuses on context engineering. Its key insight is fighting context window degradation through fresh 200k-token subagent contexts, parallel wave execution, and XML-structured plans. A JavaScript CLI handles the deterministic work (tracking progress, dependency ordering, context budgets) so the LLM never wastes tokens on bookkeeping it would do unreliably anyway.</p>
<p>Both frameworks share a crucial design decision: fresh context windows. When an agent has been reasoning for 100k tokens, its attention degrades. By spawning subagents with clean 200k-token contexts, these frameworks sidestep the "context rot" problem that plagues long-running agent sessions.</p>
<p>Where these frameworks diverge is in how they handle quality assurance. GSD relies on mechanical verification: lint, test, type-check, and auto-fix retries if the checks fail. There is no agent reading another agent's code to assess whether it matches the spec's intent. The "review" is whether <code>npm run test</code> passes.</p>
<p>Superpowers does have agent-to-agent review with iterative loops. But the review is enforced by in-context instructions, which means the agent can (and frequently does) rationalize skipping the review step to save tokens.</p>
<p>This is a known issue in the project. When review enforcement lives inside the same prompt that the model is also using to make efficiency decisions, the model sometimes decides that review is not worth the cost.</p>
<p>The adversarial GAN pattern addresses this differently. Instead of asking an agent to review its own work or trusting in-context instructions to enforce review, it structures the pipeline so that <strong>review is architecturally mandatory</strong>. The reviewer is a separate agent that cannot be skipped, because the orchestrator will not advance the pipeline without the reviewer's signal. The reviewer cannot modify source code, only <code>feedback.md</code>. The generator cannot approve its own output. Role separation is enforced by the system, not suggested by the prompt.</p>
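<p>To make "enforced by the system" concrete, here is a minimal Python sketch of a role table that an orchestrator could consult before accepting a write or an approval. It is an illustration only, not Claude Forge's implementation: the <code>Role</code> class, the <code>ROLES</code> table, the path prefixes, and both helper functions are hypothetical names introduced for this example.</p>

```python
from dataclasses import dataclass

# Hypothetical role table: which paths a role may write, and whether it may approve.
@dataclass(frozen=True)
class Role:
    name: str
    writable: frozenset
    can_approve: bool

ROLES = {
    # The generator writes code but can never approve its own output.
    "implementer": Role("implementer", frozenset({"src/", "tests/"}), can_approve=False),
    # The reviewer may only touch feedback.md, never source code.
    "reviewer": Role("reviewer", frozenset({"feedback.md"}), can_approve=True),
}

APPROVAL_SIGNALS = {"PLAN_APPROVED", "PHASE_APPROVED"}

def may_write(role_name: str, path: str) -> bool:
    """True if the role is allowed to modify the given path."""
    role = ROLES[role_name]
    return any(
        path == allowed or (allowed.endswith("/") and path.startswith(allowed))
        for allowed in role.writable
    )

def may_advance(signal: str, sender: str) -> bool:
    """The pipeline advances only on an approval signal from a role allowed to approve."""
    return signal in APPROVAL_SIGNALS and ROLES[sender].can_approve
```

<p>Because the check lives in the orchestrator rather than in a prompt, an agent cannot talk its way past it: an approval signal from the implementer is simply rejected.</p>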
<h2 id="heading-the-gan-pattern-applied-to-code">The GAN Pattern Applied to Code</h2>
<p>In machine learning, GANs pit two networks against each other: a generator creates content, a discriminator evaluates it, and the feedback loop between them drives both to improve. The generator gets better at producing realistic output. The discriminator gets better at finding flaws. The adversarial tension is what produces quality.</p>
<p>Applied to software development, this creates two stacked feedback loops:</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/09d2fc1a-94f7-44e8-a35e-f221f5c9563e.jpg" alt="Diagram comparing generator and discriminator roles in GAN loops. Top: Planner (Generator) vs Plan Reviewer (Discriminator). Bottom: Implementer (Generator) vs Reviewer (Discriminator). Arrows indicate iterative feedback between each pair." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Each role runs as a <strong>separate agent with its own fresh context window</strong>. The Plan Reviewer has never seen the Planner's reasoning process. It only sees the output. The Code Reviewer has never seen the Implementer's struggles. It only sees the code.</p>
<p>This separation fundamentally changes what the reviewer can catch. When a reviewer shares context with the generator, it inherits the generator's blind spots. When a reviewer starts fresh, it reads the plan the way an actual engineer would: with no assumptions about what the author "meant" versus what they wrote.</p>
<p>The adversarial Plan Reviewer doesn't just verify structure. It actively tries to break the plan:</p>
<ul>
<li><p><strong>Deadlock search:</strong> Is there a task ordering that would deadlock the implementer? (Task 3 needs the output of Task 5.)</p>
</li>
<li><p><strong>False positive verification:</strong> Could any verification checklist pass even with a wrong implementation?</p>
</li>
<li><p><strong>Ambiguity search:</strong> Are there instructions that could be interpreted two valid ways?</p>
</li>
<li><p><strong>Missing context:</strong> Could the implementer get stuck because a task assumes knowledge not provided?</p>
</li>
</ul>
<p>This is where the GAN analogy is most literal. The discriminator isn't checking if the plan looks good. It's trying to find failure modes.</p>
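<p>The deadlock search in particular is easy to picture as a mechanical check. The sketch below assumes a plan can be reduced to an ordered list of task IDs with their dependencies. That representation and the function name <code>find_ordering_deadlocks</code> are assumptions made for illustration. In the pattern described here, the reviewer agent reasons about the plan text rather than running code like this.</p>

```python
def find_ordering_deadlocks(tasks):
    """Find tasks that depend on work not yet finished at the time they run.

    tasks: list of (task_id, [dependency_ids]) in planned execution order.
    Returns (task_id, dependency_id) pairs that would block the implementer.
    """
    done = set()
    problems = []
    for task_id, deps in tasks:
        for dep in deps:
            if dep not in done:
                problems.append((task_id, dep))
        done.add(task_id)
    return problems

# Task 3 needs the output of Task 5, which is scheduled later.
plan = [(1, []), (2, [1]), (3, [5]), (4, [2]), (5, [3])]
print(find_ordering_deadlocks(plan))  # [(3, 5)]
```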
<h2 id="heading-why-rhetorical-questions-outperform-direct-instructions">Why Rhetorical Questions Outperform Direct Instructions</h2>
<p>When a reviewer finds an issue, there are two ways to communicate it.</p>
<p><strong>Direct instruction:</strong></p>
<pre><code class="language-plaintext">Fix line 45: the error handler returns 500 instead of 401 for invalid tokens.
</code></pre>
<p><strong>Rhetorical question:</strong></p>
<pre><code class="language-plaintext">Consider: The test test_invalid_token_rejection expects a 401 status code.
Are you returning the correct HTTP status in your error handling?

Think about: In src/auth/middleware.js:45, what happens when the token is
invalid? Is the error properly caught?

Reflect: Look at how other middleware handles auth errors. Are you following
the same pattern?
</code></pre>
<p>The direct instruction produces a mechanical edit. The agent changes line 45 and moves on. The rhetorical question produces a deeper investigation. The agent re-examines the surrounding code, considers the pattern used elsewhere, and is more likely to find the root cause rather than just patching the symptom.</p>
<p>This maps to how the underlying models tend to behave. When given an explicit instruction, the model follows it literally. When guided to reason about a problem, it searches more broadly through its understanding of the codebase. The resulting fix is more likely to address related issues that a mechanical edit would miss.</p>
<p>Reviewer prompts structured around "Consider," "Think about," and "Reflect" prefixes consistently produce better fixes than "Fix" or "Change" directives. The implementer agent receives these as feedback in <code>feedback.md</code> and addresses them in the next iteration of the GAN loop.</p>
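<p>One way to keep reviewer output in this shape is to build it from a template. The following is a small sketch under the assumption that a finding can be split into an expectation, a code location, and a pattern hint. The function name and its parameters are hypothetical and do not come from Claude Forge.</p>

```python
def as_rhetorical_feedback(expectation: str, location: str, pattern_hint: str) -> str:
    """Render one finding as Consider / Think about / Reflect prompts."""
    return "\n".join([
        f"Consider: {expectation}",
        f"Think about: In {location}, what happens on the failing path?",
        f"Reflect: {pattern_hint}",
    ])

print(as_rhetorical_feedback(
    expectation="The test test_invalid_token_rejection expects a 401 status code.",
    location="src/auth/middleware.js:45",
    pattern_hint="Look at how other middleware handles auth errors.",
))
```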
<h2 id="heading-feedback-as-filesystem">Feedback as Filesystem</h2>
<p>Most agent orchestration systems rely on some form of message passing: API calls, databases, queue systems, in-memory state. These all work, but they introduce infrastructure dependencies and make the agent conversation opaque after the fact.</p>
<p>An alternative: use the filesystem as the message bus and git as the orchestration layer.</p>
<p>All agent communication flows through <code>feedback.md</code>, a structured markdown file with two sections:</p>
<pre><code class="language-markdown">## Active Feedback (OPEN)

### FB-001: Auth middleware missing rate limiting
- **Status:** OPEN
- **Source:** Plan Reviewer
- **Phase:** 1
- **Detail:** The plan specifies JWT validation but does not address rate
  limiting for failed auth attempts. Consider: what happens if an attacker
  brute-forces tokens?

## Resolved Feedback

### FB-000: Missing error codes in API spec
- **Status:** RESOLVED
- **Resolution:** Added error code table to Phase-0 conventions
</code></pre>
<p>This design has several properties that matter in practice:</p>
<p><strong>Full audit trail:</strong> Every piece of feedback, every resolution, every signal is committed to git alongside the code it produced. When you want to understand why the auth middleware was designed a certain way, the conversation that shaped it is right there in the commit history.</p>
<p><strong>State recovery:</strong> If a pipeline gets interrupted (token limits, network issues, you need to step away), resuming is trivial. The orchestrator re-reads <code>feedback.md</code> and <code>git log</code>, determines what stage the pipeline reached, and picks up where it left off. No cloud infrastructure, no database, no queue. Just files.</p>
<p><strong>Transparency:</strong> You can read the agent conversation in your editor. You can see exactly what the reviewer flagged, exactly how the implementer responded, and whether the resolution actually addressed the concern.</p>
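<p>Because the state is plain markdown, recovering it takes very little code. The sketch below assumes the entry layout shown in the example above (an <code>### FB-NNN:</code> heading followed by a <code>Status</code> bullet). The parser and the name <code>open_feedback</code> are illustrative only, not the project's actual tooling.</p>

```python
import re

FEEDBACK = """\
## Active Feedback (OPEN)

### FB-001: Auth middleware missing rate limiting
- **Status:** OPEN
- **Source:** Plan Reviewer

## Resolved Feedback

### FB-000: Missing error codes in API spec
- **Status:** RESOLVED
"""

def open_feedback(text: str) -> list[str]:
    """Return the IDs of feedback entries whose Status line is OPEN."""
    open_ids = []
    current_id = None
    for line in text.splitlines():
        heading = re.match(r"### (FB-\d+):", line)
        if heading:
            current_id = heading.group(1)
        elif current_id and re.match(r"- \*\*Status:\*\* OPEN\s*$", line):
            open_ids.append(current_id)
    return open_ids

print(open_feedback(FEEDBACK))  # ['FB-001']
```

<p>An orchestrator resuming an interrupted run could use a check like this to decide whether the generator still has open items to address before the reviewer is invoked again.</p>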
<p>Agents communicate through structured signals routed by the orchestrator:</p>
<ul>
<li><p><code>PLAN_COMPLETE</code> / <code>REVISION_REQUIRED</code> / <code>PLAN_APPROVED</code> (plan GAN loop)</p>
</li>
<li><p><code>IMPLEMENTATION_COMPLETE</code> / <code>CHANGES_REQUESTED</code> / <code>PHASE_APPROVED</code> (code GAN loop)</p>
</li>
<li><p><code>GO</code> / <code>NO-GO</code> (final gate)</p>
</li>
<li><p><code>VERIFIED</code> / <code>UNVERIFIED</code> (post-remediation verification)</p>
</li>
</ul>
<p>Each signal marks a state transition. The orchestrator reads the signal, determines the next agent to invoke, and passes it the relevant context. The orchestrator itself is a Claude Code session, but the agents it spawns are fresh subagents with clean context windows.</p>
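<p>A routing table is one simple way to picture these state transitions. The signal names below come from the list above, but the agent names on the right-hand side and the exact transitions are assumptions made for this sketch. The real orchestrator is a Claude Code session, not a lookup table.</p>

```python
# Hypothetical mapping from a received signal to the next step in the pipeline.
NEXT_STEP = {
    # Plan GAN loop
    "PLAN_COMPLETE": "plan_reviewer",
    "REVISION_REQUIRED": "planner",
    "PLAN_APPROVED": "implementer",
    # Code GAN loop
    "IMPLEMENTATION_COMPLETE": "reviewer",
    "CHANGES_REQUESTED": "implementer",
    "PHASE_APPROVED": "final_reviewer",
    # Final gate
    "GO": "done",
    "NO-GO": "rollback_phase",
}

def route(signal: str) -> str:
    """Return the next step for a signal, rejecting anything unrecognized."""
    if signal not in NEXT_STEP:
        raise ValueError(f"Unknown signal: {signal!r}")
    return NEXT_STEP[signal]

print(route("CHANGES_REQUESTED"))  # implementer
```

<p>Rejecting unknown signals matters here: free-form agent output that does not match a signal never moves the pipeline forward.</p>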
<h2 id="heading-the-zero-context-engineer">The Zero-Context Engineer</h2>
<p>One of the most effective constraints in the system is the "zero-context engineer" framing. The Planner writes every plan as if it will be executed by an engineer who:</p>
<ul>
<li><p>Is skilled but has <strong>zero context</strong> on the codebase</p>
</li>
<li><p>Is unfamiliar with the toolset and problem domain</p>
</li>
<li><p>Will follow instructions precisely</p>
</li>
<li><p>Will not infer missing details. If it's not in the plan, it won't happen.</p>
</li>
</ul>
<p>This constraint forces explicit instructions. No "add the usual auth middleware." Instead: which library, which pattern, which error codes, which files to create, which existing files to modify, and how to verify the result.</p>
<p>The Plan Reviewer then simulates this zero-context experience: "If I knew nothing about this codebase, could I follow these instructions and produce a working result?"</p>
<p>This framing catches a class of failures that are invisible to someone with context. The author of the plan knows what they meant. The zero-context reviewer only knows what is written. The gap between intention and specification is where bugs live.</p>
<h2 id="heading-phase-0-immutable-conventions">Phase-0: Immutable Conventions</h2>
<p>Every pipeline run starts with a Phase-0 document that defines immutable rules: tech stack, testing strategy, deployment approach, shared patterns, commit format. Every subsequent phase inherits from Phase-0. Every reviewer checks against it.</p>
<p>This solves a common multi-agent problem: drift. Without a shared source of truth, Agent A might decide to use Jest while Agent B sets up Vitest. Agent C might use a different error handling pattern than Agent D. Phase-0 prevents this by establishing conventions before any code is written.</p>
<p>The conventions aren't suggestions. They're constraints that every agent in the pipeline must respect, and every reviewer must verify against.</p>
<h2 id="heading-convergence-design-knowing-when-to-stop">Convergence Design: Knowing When to Stop</h2>
<p>An adversarial loop without exit conditions is just two agents arguing forever. The convergence design has three mechanisms:</p>
<p><strong>Iteration caps:</strong> Each GAN loop (plan review, code review) runs a maximum of 3 iterations. If the planner and reviewer cannot converge in 3 rounds, the issue requires human judgment, not more machine cycles.</p>
<p><strong>Signal protocol:</strong> The structured signals (<code>PLAN_APPROVED</code>, <code>GO</code>, <code>NO-GO</code>) are explicit state transitions, not suggestions. When the final reviewer issues <code>NO-GO</code>, the pipeline rolls back the phase. There is no "let's try one more time." The rollback is automatic.</p>
<p><strong>Token budget:</strong> Each phase targets roughly 50k tokens with a 75k hard ceiling. This prevents any single phase from consuming the entire context budget and ensures the orchestrator retains enough headroom to manage the pipeline.</p>
<p>These caps exist because adversarial loops have a cost curve. The first iteration catches major issues. The second iteration catches subtle issues. The third iteration catches edge cases. A fourth iteration almost never catches anything the previous three missed, but it costs just as many tokens. Three iterations hit the sweet spot between thoroughness and efficiency.</p>
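<p>The iteration cap and token budget can be sketched in a few lines. The numbers (3 iterations, 50k target, 75k ceiling) come from the text above. The function names, the <code>NEEDS_HUMAN</code> outcome label, and the toy generator and reviewer are hypothetical stand-ins for real agents.</p>

```python
MAX_ITERATIONS = 3       # cap per GAN loop
TOKEN_TARGET = 50_000    # per-phase target
TOKEN_CEILING = 75_000   # per-phase hard ceiling

def run_gan_loop(generate, review, max_iterations=MAX_ITERATIONS):
    """Alternate generator and reviewer until approval or the cap is hit."""
    feedback = None
    for iteration in range(1, max_iterations + 1):
        artifact = generate(feedback)
        approved, feedback = review(artifact)
        if approved:
            return "APPROVED", iteration
    # No convergence within the cap: hand the decision to a human.
    return "NEEDS_HUMAN", max_iterations

def budget_status(tokens_used: int) -> str:
    """Classify a phase's token usage against the target and ceiling."""
    if tokens_used > TOKEN_CEILING:
        return "over_ceiling"
    if tokens_used > TOKEN_TARGET:
        return "over_target"
    return "within_target"

# Toy stand-ins: the reviewer approves once feedback has been applied.
def toy_generate(feedback):
    return "draft" if feedback is None else "draft+fix"

def toy_review(artifact):
    if artifact.endswith("+fix"):
        return True, None
    return False, "Consider: is the error path tested?"

print(run_gan_loop(toy_generate, toy_review))  # ('APPROVED', 2)
```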
<h2 id="heading-ground-truth-documents-and-the-pipeline">Ground Truth Documents and the Pipeline</h2>
<p>The adversarial pipeline doesn't start from a vague prompt. Every workflow begins with an intake skill that produces a structured ground truth document. The pipeline then runs from that document, not from the original user request.</p>
<h3 id="heading-brainstorm-turning-ideas-into-specs">Brainstorm: Turning Ideas into Specs</h3>
<p>The <code>/brainstorm</code> skill is the feature creation workflow. Given a feature idea, it first explores the codebase to understand the existing architecture, tech stack, and patterns. Then it asks 5-15 clarifying questions designed to front-load high-impact decisions:</p>
<pre><code class="language-plaintext">The codebase uses DynamoDB for storage. For this feature's data, should we:

A) Add tables to the existing DynamoDB setup
B) Use a different storage approach (e.g., S3 for documents)
C) Both - DynamoDB for metadata, S3 for content
</code></pre>
<p>These aren't generic questions. They're grounded in what the skill found during codebase exploration. The skill identifies the real decision points for this specific project and surfaces them before any planning or code generation begins.</p>
<p>The output is <code>brainstorm.md</code>, a structured design spec. Not a conversation transcript, but a distilled set of decisions that the Planner agent can consume cold. This document becomes the single source of truth for the entire pipeline run.</p>
<h3 id="heading-repository-evaluation-health-and-documentation-audits">Repository Evaluation, Health, and Documentation Audits</h3>
<p>The same ground-truth-document pattern applies to the audit workflows:</p>
<ul>
<li><p><code>/repo-eval</code> spawns three evaluator agents in parallel (the Pragmatist, the Oncall Engineer, the Team Lead), each scoring the codebase from a different lens across 12 pillars. The output is <code>eval.md</code>.</p>
</li>
<li><p><code>/repo-health</code> runs a technical debt auditor across four vectors (architectural, structural, operational, hygiene). The output is <code>health-audit.md</code>.</p>
</li>
<li><p><code>/doc-health</code> runs six detection phases comparing documentation against actual code. The output is <code>doc-audit.md</code>.</p>
</li>
<li><p><code>/audit</code> runs any combination of the above. It asks scoping questions once, then spawns up to 5 agents in parallel (3 evaluators + health auditor + doc auditor). All intake documents land in one directory.</p>
</li>
</ul>
<p>Each of these intake skills produces a read-only assessment. The agents doing the evaluation never modify the codebase. They only write their findings into the intake document.</p>
<h3 id="heading-the-pipeline-runs-from-ground-truth">The Pipeline Runs from Ground Truth</h3>
<p>The <code>/pipeline</code> skill reads whatever intake documents exist and runs the adversarial GAN loop from them. For a feature, it reads <code>brainstorm.md</code>. For an audit, it reads whichever combination of <code>eval.md</code>, <code>health-audit.md</code>, and <code>doc-audit.md</code> is present.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/ec48de80-2185-45d2-89b2-2008fbc14365.jpg" alt="Diagram of the extended pipeline agent workflow. Shows brainstorm exploring the codebase and producing a design spec, which feeds into the pipeline orchestrator. The orchestrator routes through three stages: Planning (Planner and Plan Reviewer in a GAN loop, max 3 iterations), Implementation (Implementer and Reviewer in a GAN loop, max 3 iterations), and Final Review (GO or NO-GO gate)." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>When multiple intake documents exist (from a combined audit), the Planner reads all findings together and consolidates overlapping concerns into a single unified plan. Phases are tagged by implementer type and ordered:</p>
<ol>
<li><p><code>[HYGIENIST]</code> phases first, subtractive cleanup (deleting dead code, simplifying over-abstractions)</p>
</li>
<li><p><code>[IMPLEMENTER]</code> phases next, structural fixes on clean code</p>
</li>
<li><p><code>[FORTIFIER]</code> phases next, locking in the clean state (linting, CI checks, git hooks)</p>
</li>
<li><p><code>[DOC-ENGINEER]</code> phases last, documentation reflecting final code</p>
</li>
</ol>
<p>The ordering matters. You don't want the implementer building on top of dead code that the hygienist would have removed. You don't want the doc-engineer documenting an API that the fortifier is about to add validation to.</p>
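<p>The tag ordering amounts to a stable sort on a fixed ranking. In this sketch the tags are the ones listed above, while the tuple representation of a phase and the sample phase titles are assumptions made for illustration.</p>

```python
PHASE_ORDER = ["HYGIENIST", "IMPLEMENTER", "FORTIFIER", "DOC-ENGINEER"]

def order_phases(phases):
    """Sort (tag, title) pairs by tag rank, keeping original order within a tag."""
    rank = {tag: index for index, tag in enumerate(PHASE_ORDER)}
    # sorted() is stable, so phases sharing a tag keep their relative order.
    return sorted(phases, key=lambda phase: rank[phase[0]])

phases = [
    ("DOC-ENGINEER", "Update API docs"),
    ("IMPLEMENTER", "Fix error handling"),
    ("HYGIENIST", "Delete dead code"),
    ("FORTIFIER", "Add lint checks"),
    ("HYGIENIST", "Simplify over-abstractions"),
]
for tag, title in order_phases(phases):
    print(f"[{tag}] {title}")
```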
<p>This separation between intake and pipeline is deliberate. The intake skills are exploratory and interactive. They ask questions, explore the codebase, and produce a document. The pipeline is autonomous. It reads the document and runs through the adversarial loops with minimal human intervention, stopping only at explicit decision points.</p>
<h2 id="heading-what-the-adversarial-loop-actually-catches">What the Adversarial Loop Actually Catches</h2>
<p>In practice, the adversarial loops catch issues that single-pass generation consistently misses.</p>
<p><strong>Plan Review catches:</strong></p>
<ul>
<li><p>Hallucinated file paths (the Planner says "modify" a file that doesn't exist)</p>
</li>
<li><p>Phantom dependencies (Phase 2 assumes a model that Phase 1 never creates)</p>
</li>
<li><p>Test strategies that require live cloud resources instead of mocks</p>
</li>
<li><p>Ambiguous instructions that a zero-context engineer could misinterpret</p>
</li>
<li><p>Deadlocks in task ordering (Task 3 needs the output of Task 5)</p>
</li>
</ul>
<p><strong>Code Review catches:</strong></p>
<ul>
<li><p>Placeholder tests (<code>expect(true).toBe(true)</code>)</p>
</li>
<li><p>Deviations from Phase-0 architecture conventions</p>
</li>
<li><p>Missing error path coverage (only happy paths tested)</p>
</li>
<li><p>Hardcoded secrets and input validation gaps</p>
</li>
</ul>
<p><strong>Verification catches:</strong></p>
<ul>
<li><p>Remediation targets that weren't actually addressed</p>
</li>
<li><p>Regressions introduced during fixes</p>
</li>
<li><p>Partial fixes where the symptom changed but the root cause remains</p>
</li>
</ul>
<p>An earlier design re-ran the full evaluator or auditor agents after remediation, 3-5 agents re-scanning the entire codebase. This was token-expensive and redundant since the per-phase reviewers had already verified each fix. The current design uses a single verification agent with a targeted scope: read the original intake document findings and check each specific <code>file:line</code> location. One agent, targeted scope, a fraction of the tokens. Evaluator and auditor agents run exactly once (during intake) and never again.</p>
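<p>Targeted verification starts by pulling the <code>file:line</code> locations out of the intake findings. The sketch below assumes findings are written as free text containing references like <code>src/auth/middleware.js:45</code>. The regular expression, the function name, and the sample findings are hypothetical.</p>

```python
import re

# Matches a path with a file extension followed by ":<line number>".
LOCATION = re.compile(r"\b([\w./-]+\.\w+):(\d+)\b")

def extract_locations(findings: str) -> list[tuple[str, int]]:
    """Return (path, line) pairs mentioned in the findings text."""
    return [(path, int(line)) for path, line in LOCATION.findall(findings)]

findings = (
    "Hardcoded secret in src/config.py:12 and missing validation "
    "in src/auth/middleware.js:45."
)
print(extract_locations(findings))
# [('src/config.py', 12), ('src/auth/middleware.js', 45)]
```

<p>A single verification agent can then be handed exactly this list, instead of being asked to re-scan the whole codebase.</p>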
<h2 id="heading-honest-trade-offs">Honest Trade-offs</h2>
<p>This pipeline is not free. Consider these trade-offs before adopting it:</p>
<h3 id="heading-token-cost">Token Cost</h3>
<p>Multiple agents reviewing each other's work uses significantly more tokens than a single-pass approach. The adversarial loops can triple the total token usage for a feature. On a subscription plan, this means hitting session limits faster. On API billing, this means real money.</p>
<h3 id="heading-time">Time</h3>
<p>A feature that takes one agent 10 minutes might take the pipeline 30-45 minutes with review loops. Multi-agent frameworks in general are slower than single-pass. The adversarial loops add time on top of the orchestration overhead that any multi-agent system carries.</p>
<h3 id="heading-orchestrator-context-pressure">Orchestrator Context Pressure</h3>
<p>The orchestrator accumulates agent result summaries across phases. Long pipelines with many phases may hit context compression, which degrades the orchestrator's ability to route effectively.</p>
<h3 id="heading-not-fire-and-forget">Not Fire-and-Forget</h3>
<p>Despite the automation, complex features benefit from human checkpoints. The pipeline stops and asks for judgment at key moments. If you skip those checkpoints, you may end up with technically correct code that misses the actual requirement.</p>
<h3 id="heading-diminishing-returns-on-simple-tasks">Diminishing Returns on Simple Tasks</h3>
<p>For a quick script, a utility function, or a prototype, the adversarial overhead is pure waste. Single-pass generation is faster, cheaper, and sufficient.</p>
<p>The trade-off is worth it for features where correctness matters more than speed: anything touching auth, payments, data integrity, or infrastructure. When the cost of a bug in production exceeds the cost of the extra tokens to prevent it, the math works. For everything else, single-pass is fine.</p>
<h2 id="heading-when-to-use-this-and-when-not-to">When to Use This (And When Not To)</h2>
<p><strong>Use adversarial multi-agent patterns when:</strong></p>
<ul>
<li><p>The feature touches authentication, authorization, or session management</p>
</li>
<li><p>The code handles payments or financial transactions</p>
</li>
<li><p>Data integrity is critical (migrations, schema changes, ETL pipelines)</p>
</li>
<li><p>Infrastructure changes could affect production (IaC, CI/CD modifications)</p>
</li>
<li><p>The codebase is unfamiliar to the agents (large legacy systems)</p>
</li>
</ul>
<p><strong>Use single-pass generation when:</strong></p>
<ul>
<li><p>Prototyping or exploring an idea</p>
</li>
<li><p>Writing utility scripts or one-off tools</p>
</li>
<li><p>Making small, well-scoped changes to familiar code</p>
</li>
<li><p>Speed matters more than thoroughness</p>
</li>
<li><p>You will review the output carefully yourself anyway</p>
</li>
</ul>
<h2 id="heading-getting-started">Getting Started</h2>
<p>Claude Forge is built entirely from Claude Code custom skills. No external tooling, no CI integration required. Install by copying the skills directory into your project:</p>
<pre><code class="language-bash">git clone https://github.com/hatmanstack/claude-forge.git
cp -r claude-forge/.claude/skills/ /path/to/your-project/.claude/skills/
</code></pre>
<p>Then in your project:</p>
<pre><code class="language-bash"># Feature development
/brainstorm I want to add webhook support for payment events
/pipeline 2026-03-12-payment-webhooks

# Full audit (health + eval + docs), one command
/audit all
/pipeline 2026-03-16-audit-remediation

# Individual audits
/repo-eval
/repo-health
/doc-health
</code></pre>
<p>The pipeline handles the orchestration. You'll see progress reports between stages, and it will stop and ask when something needs human judgment.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The adversarial pattern (separate generator and discriminator with isolated context windows, structured feedback as the communication channel, iteration caps for convergence) can be implemented in any agent system that supports subagent spawning with fresh contexts. The specific implementation uses Claude Code skills, but the pattern is the contribution, not the tooling.</p>
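<p>As a rough illustration of that pattern, here is a minimal Python sketch. The <code>generate</code> and <code>review</code> callables are hypothetical stand-ins for subagents spawned with fresh contexts, and the <code>Review</code> shape and the cap of three rounds are assumptions for the example, not Claude Forge's actual implementation.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Structured feedback passed from the reviewer back to the generator."""
    approved: bool
    findings: List[str] = field(default_factory=list)


def adversarial_loop(
    task: str,
    generate: Callable[[str, List[str]], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> tuple:
    """Alternate generator and reviewer until approval or the iteration cap.

    Each callable receives only its explicit arguments, which models the
    isolated context windows: the reviewer sees the task and the draft,
    never the generator's reasoning.
    """
    findings: List[str] = []
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(task, findings)
        verdict = review(task, draft)
        if verdict.approved:
            return draft, round_number, True
        # Only the structured findings travel back to the generator.
        findings = verdict.findings
    # Cap reached without approval: surface the last draft for human judgment.
    return draft, max_rounds, False


if __name__ == "__main__":
    # Toy stand-ins: the generator adds tests once the reviewer asks for them.
    def toy_generate(task, findings):
        return "code with tests" if findings else "code"

    def toy_review(task, draft):
        if "tests" in draft:
            return Review(approved=True)
        return Review(approved=False, findings=["missing tests"])

    print(adversarial_loop("add webhook support", toy_generate, toy_review))
```

<p>The iteration cap is what guarantees the loop terminates. When it trips, the third element of the result is <code>False</code>, which is a natural place to stop and ask a human.</p>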
<p>Sometimes the best code comes from the argument, not the agreement.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Serverless RAG Pipeline on AWS That Scales to Zero ]]>
                </title>
                <description>
                    <![CDATA[ Most RAG tutorials end the same way: you've got a working prototype and a bill for a vector database that runs whether anyone's querying it or not. Add an always-on embedding service, a hosted LLM end ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-serverless-rag-pipeline-on-aws-that-scales-to-zero/</link>
                <guid isPermaLink="false">69b1b23c6c896b0519b4eda8</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ RAG  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Wed, 11 Mar 2026 18:19:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c0416d9e-9661-47a3-ba9c-8001f5f91b8c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most RAG tutorials end the same way: you've got a working prototype and a bill for a vector database that runs whether anyone's querying it or not. Add an always-on embedding service, a hosted LLM endpoint, and the usual AWS infrastructure, and you're looking at real money before a single user shows up.</p>
<p>But it doesn't have to work that way. In this tutorial, you'll deploy a fully serverless RAG pipeline that processes documents, images, video, and audio, then scales to zero when nobody's using it.</p>
<p>Everything runs in your AWS account, your data never leaves your infrastructure, and your ongoing monthly cost for a modest knowledge base will be closer to <code>2-3 USD</code> than <code>300 USD</code>.</p>
<p>We'll use <a href="https://github.com/HatmanStack/RAGStack-Lambda">RAGStack-Lambda</a>, an open-source project I built on AWS. By the end, you'll have a deployed pipeline with a dashboard, an AI chat interface with source citations, a drop-in web component you can embed in any app, and an MCP server you can use to feed your assistant context.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ul>
<li><p><a href="#heading-what-this-actually-costs">What This Actually Costs</a></p>
</li>
<li><p><a href="#heading-what-youre-building">What You're Building</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-deploying-from-aws-marketplace">Deploying from AWS Marketplace</a></p>
</li>
<li><p><a href="#heading-deploying-from-source">Deploying from Source</a></p>
</li>
<li><p><a href="#heading-uploading-your-first-documents">Uploading Your First Documents</a></p>
</li>
<li><p><a href="#heading-chatting-with-your-knowledge-base">Chatting With Your Knowledge Base</a></p>
</li>
<li><p><a href="#heading-embedding-the-web-component-in-your-app">Embedding the Web Component in Your App</a></p>
</li>
<li><p><a href="#heading-using-the-mcp-server">Using the MCP Server</a></p>
</li>
<li><p><a href="#heading-what-you-can-build-from-here">What You Can Build From Here</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-what-this-actually-costs">What This Actually Costs</h2>
<p>Before we build anything, let's talk money, because the cost story is the whole point.</p>
<p>RAG pipelines have two cost phases: ingestion (processing your documents once) and operation (querying them over time).</p>
<p>Most platforms charge you a flat monthly rate regardless of which phase you're in. A serverless architecture flips that: ingestion costs something, and then everything scales to zero.</p>
<h3 id="heading-ingestion-the-one-time-hit">Ingestion: The One-Time Hit</h3>
<p>When you upload documents, several things happen: text extraction (OCR for PDFs and images), embedding generation, metadata extraction, and storage. Here's what that actually costs per service:</p>
<p><strong>Textract (OCR):</strong> This is the most expensive part of ingestion, and it only applies to scanned PDFs and images that need text extraction. Plain text, HTML, CSV, and other text-based formats skip this entirely.</p>
<p>Textract charges about <code>1.50 USD</code> per 1,000 pages for standard text detection. If you're uploading 500 pages of scanned PDFs, that's about <code>0.75 USD</code>. A heavy initial load of several thousand scanned pages might run <code>5-10 USD</code>. But once your documents are processed, you never pay this again unless you add new ones.</p>
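<p>If you want to plug in your own page counts, the arithmetic is a one-liner. This is only a sketch using the rate quoted above. The function name is made up for illustration, and you should check the current Textract price sheet before relying on it.</p>

```python
# Rate quoted above for standard text detection; verify against current AWS pricing.
TEXTRACT_USD_PER_1000_PAGES = 1.50


def textract_cost_usd(scanned_pages: int) -> float:
    """One-time OCR cost for the pages that actually need text detection."""
    return scanned_pages / 1000 * TEXTRACT_USD_PER_1000_PAGES


if __name__ == "__main__":
    for pages in (500, 5000):
        print(pages, "pages:", round(textract_cost_usd(pages), 2), "USD")
```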
<p><strong>Bedrock Embeddings (Nova Multimodal):</strong> This is where your content gets converted into vectors for semantic search. The pricing is almost comically cheap:</p>
<ul>
<li><p>Text: <code>0.00002 USD</code> per 1,000 input tokens</p>
</li>
<li><p>Images: <code>0.00115 USD</code> per image</p>
</li>
<li><p>Video/Audio: <code>0.00200 USD</code> per minute</p>
</li>
</ul>
<p>To put that in perspective: if you have 1,500 text documents averaging 2,500 tokens each after chunking, your total embedding cost is about <code>0.08 USD</code>. A knowledge base with 500 images runs <code>0.58 USD</code>. Even a mixed corpus of text, images, and a few hours of video stays well under <code>2 USD</code> for the entire embedding pass. This is a one-time cost – you only re-embed if you add or update documents.</p>
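<p>Here's the same estimate as a small Python sketch, so you can swap in your own corpus. The rates are the ones listed above and the function name is hypothetical. The 180 minutes of video in the example is an assumed figure standing in for "a few hours".</p>

```python
# Rates from the list above (USD); treat them as a snapshot, not a price sheet.
TEXT_USD_PER_1000_TOKENS = 0.00002
IMAGE_USD_EACH = 0.00115
MEDIA_USD_PER_MINUTE = 0.00200


def embedding_cost_usd(text_tokens: int = 0, images: int = 0, media_minutes: float = 0.0) -> float:
    """One-time embedding cost for a mixed corpus of text, images, and video/audio."""
    return (
        text_tokens / 1000 * TEXT_USD_PER_1000_TOKENS
        + images * IMAGE_USD_EACH
        + media_minutes * MEDIA_USD_PER_MINUTE
    )


if __name__ == "__main__":
    text_only = embedding_cost_usd(text_tokens=1500 * 2500)
    images_only = embedding_cost_usd(images=500)
    # Assumed mixed corpus: the text and images above plus 180 minutes of video.
    mixed = embedding_cost_usd(text_tokens=1500 * 2500, images=500, media_minutes=180)
    print(f"text: {text_only:.3f}  images: {images_only:.3f}  mixed: {mixed:.2f}")
```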
<p><strong>Bedrock LLM (Metadata Extraction):</strong> RAGStack uses an LLM to analyze each document and extract structured metadata automatically. This is a few inference calls per document using Nova Lite or a similar model. At <code>0.06 USD</code>/<code>0.24 USD</code> per million input/output tokens, processing 1,500 documents costs well under <code>1 USD</code>.</p>
<p><strong>S3 Vectors (Storage):</strong> Storing your embeddings. At <code>0.06 USD</code> per GB/month, a knowledge base of 1,500 documents with 1,024-dimension vectors takes up a trivially small amount of space. We're talking pennies per month.</p>
<p><strong>S3 (Document Storage):</strong> Your source documents in standard S3. Even cheaper, <code>0.023 USD</code> per GB/month.</p>
<p><strong>DynamoDB:</strong> Stores document metadata and processing state. The on-demand pricing model means you pay per request during ingestion, then essentially nothing at rest. A few cents for the initial load.</p>
<p>To put real numbers on it: if you upload 200 text documents (PDFs, HTML, Markdown), your total ingestion cost is likely under <code>1 USD</code>. If you upload 1,000 scanned PDFs that need OCR, you might see <code>5-8 USD</code> as a one-time hit. If you see a <code>7-10 USD</code> figure referenced elsewhere, that's the upper end for a heavy initial load with lots of OCR work.</p>
<h3 id="heading-operation-where-scale-to-zero-shines">Operation: Where Scale-to-Zero Shines</h3>
<p>Once your documents are ingested, the pipeline is waiting. Not running. Waiting. Here's what each query costs:</p>
<p><strong>Lambda:</strong> Invocations are billed per request and duration. The free tier covers 1 million requests/month. For a personal or small-team knowledge base, you may never leave the free tier.</p>
<p><strong>S3 Vectors (Queries):</strong> <code>2.50 USD</code> per million query API calls, plus a per-TB data processing charge. For a small index queried a few hundred times a month, this rounds to effectively zero.</p>
<p><strong>Bedrock (Chat Inference):</strong> This is your main operating cost. Each chat response requires an LLM call. Using Nova Lite at <code>0.06 USD</code> per million input tokens and <code>0.24 USD</code> per million output tokens, a typical RAG query (retrieval context + user question + response) might cost <code>0.001-0.003 USD</code> per query. A hundred queries a month is <code>0.10-0.30 USD</code>.</p>
<p><strong>Step Functions:</strong> Orchestrates the document processing pipeline. Standard workflows charge <code>0.025 USD</code> per 1,000 state transitions. Minimal during operation since it's only active during ingestion.</p>
<p><strong>Cognito:</strong> User authentication. Free for the first 10,000 monthly active users.</p>
<p><strong>CloudFront:</strong> Serves the dashboard UI. Free tier covers 1 TB of data transfer per month.</p>
<p><strong>API Gateway:</strong> Handles GraphQL API requests. Free tier covers 1 million API calls per month.</p>
<p>Add it all up for a knowledge base with 500 documents getting a few hundred queries per month, and your monthly operating cost is somewhere between <code>0.50 USD</code> and <code>3.00 USD</code>. Most of that is the LLM inference for chat responses.</p>
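<p>Since chat inference dominates the bill, it's worth modeling on its own. The sketch below uses the Nova Lite rates quoted above. The token counts (15,000 tokens of retrieved context plus question, 500 tokens of answer) are assumptions picked for illustration, so measure your own before trusting the result.</p>

```python
# Nova Lite rates quoted above (USD per million tokens).
INPUT_USD_PER_MILLION = 0.06
OUTPUT_USD_PER_MILLION = 0.24


def chat_query_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one chat response: retrieved context and question in, answer out."""
    return (
        input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


def monthly_chat_cost_usd(queries: int, input_tokens: int, output_tokens: int) -> float:
    """Monthly inference cost when every query has roughly the same size."""
    return queries * chat_query_cost_usd(input_tokens, output_tokens)


if __name__ == "__main__":
    # Assumed sizes: 15,000 input tokens and 500 output tokens per query.
    per_query = chat_query_cost_usd(15_000, 500)
    print(f"per query: {per_query:.5f} USD")
    print(f"300 queries: {monthly_chat_cost_usd(300, 15_000, 500):.2f} USD")
```

<p>With those assumed sizes, a query lands at about a tenth of a cent, which is the low end of the range above. Larger retrieval contexts push it up proportionally.</p>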
<h3 id="heading-the-comparison-that-matters">The Comparison That Matters</h3>
<p>Here's the same pipeline on a traditional always-on stack:</p>
<table>
<thead>
<tr>
<th>Service</th>
<th>RAGStack-Lambda</th>
<th>Traditional Stack</th>
</tr>
</thead>
<tbody><tr>
<td>Vector Database</td>
<td>S3 Vectors: pennies/mo</td>
<td>Pinecone Starter: <code>70 USD</code>/mo</td>
</tr>
<tr>
<td>Vector Database (alt)</td>
<td>S3 Vectors: pennies/mo</td>
<td>OpenSearch Serverless: about <code>350 USD</code>/mo min</td>
</tr>
<tr>
<td>Compute</td>
<td>Lambda: free tier</td>
<td>EC2 or ECS: <code>50-150 USD</code>/mo</td>
</tr>
<tr>
<td>LLM Inference</td>
<td>Same per-query cost</td>
<td>Same per-query cost</td>
</tr>
<tr>
<td>Total (idle)</td>
<td>about <code>0.50-3.00 USD</code>/mo</td>
<td><code>120-500 USD</code>/mo</td>
</tr>
</tbody></table>
<p>The LLM inference cost per query is roughly the same everywhere – that's Bedrock's on-demand pricing regardless of your architecture. The difference is everything else. Traditional stacks pay a floor cost whether anyone's using them or not. A serverless stack pays for what it uses, and idle costs essentially nothing.</p>
<h3 id="heading-what-about-transcribe">What About Transcribe?</h3>
<p>If you're uploading video or audio, Amazon Transcribe adds cost for speech-to-text conversion. Standard transcription runs about <code>0.024 USD</code> per minute of audio, so a 10-minute video costs <code>0.24 USD</code> to transcribe. This is a one-time ingestion cost: once transcribed and embedded, the resulting text chunks are queried like any other document.</p>
<h2 id="heading-what-youre-building">What You're Building</h2>
<p>By the end of this tutorial, you'll have a deployed pipeline that does the following:</p>
<ol>
<li><p>You upload a document through a web dashboard. Supported types include PDF, image, video, audio, HTML, and CSV (<a href="https://github.com/HatmanStack/RAGStack-Lambda/blob/main/docs/ARCHITECTURE.md">the full list</a> is extensive).</p>
</li>
<li><p>The pipeline detects the file type and routes it to the right processor. Scanned PDFs go through OCR via Textract. Video and audio go through Transcribe for speech-to-text, split into 30-second searchable chunks with speaker identification. Images get visual embeddings and any caption text you provide.</p>
</li>
<li><p>An LLM analyzes each document and extracts structured metadata: topic, document type, date range, people mentioned, and whatever else is relevant. This happens automatically.</p>
</li>
<li><p>Everything gets embedded using Amazon Nova Multimodal Embeddings and stored in a Bedrock Knowledge Base backed by S3 Vectors.</p>
</li>
<li><p>You (or your users) ask questions through an AI chat interface. The pipeline retrieves relevant documents, passes them as context to a Bedrock LLM, and returns an answer with collapsible source citations, including timestamp links for video and audio that jump to the exact position.</p>
</li>
</ol>
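<p>The routing in step 2 can be pictured as a simple dispatch on file type. The sketch below is illustrative only: the extension sets, the processor names, and the <code>scanned</code> flag are assumptions, and the real pipeline makes this decision inside its Step Functions workflow.</p>

```python
from pathlib import Path

# Hypothetical extension groups; the project's actual supported list is longer.
MEDIA = {".mp4", ".mov", ".mp3", ".wav"}
IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
TEXT = {".txt", ".md", ".html", ".csv", ".json", ".xml"}


def route(filename: str, scanned: bool = False) -> str:
    """Pick a processor name for an uploaded file.

    `scanned` stands in for whatever check decides that a PDF has no
    extractable text layer and therefore needs OCR.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in MEDIA:
        return "transcribe"
    if suffix in IMAGES:
        return "image_embedding"
    if suffix == ".pdf" and scanned:
        return "textract_ocr"
    if suffix == ".pdf" or suffix in TEXT:
        return "text_extraction"
    raise ValueError(f"unsupported file type: {suffix or filename}")


if __name__ == "__main__":
    for name, is_scan in [("talk.mp4", False), ("letter.pdf", True), ("notes.md", False)]:
        print(name, "goes to", route(name, scanned=is_scan))
```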
<p>All of this runs in your AWS account. No external control plane, no third-party services beyond AWS itself.</p>
<h3 id="heading-the-architecture">The Architecture</h3>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/45eca6a5-91b4-4f55-8b1a-ba9f59a3e25d.png" alt="The diagram illustrates a flowchart of a buyer's AWS account, detailing the application plane with processes like S3 to Lambda OCR, supported by services like Cognito Auth. It emphasizes Amazon Bedrock's integration for knowledge and chat." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A few things to note about this architecture:</p>
<p><strong>Step Functions orchestrate everything.</strong> When a document is uploaded, a state machine manages the entire processing flow: detecting the file type, routing to the right processor, waiting for async operations like Transcribe jobs, then triggering embedding and metadata extraction.</p>
<p>This is what makes the pipeline reliable without a running server. If a step fails, it retries. You can see exactly where every document is in the processing pipeline.</p>
<p><strong>Lambda does the compute.</strong> Every processing step is a Lambda function. They spin up when needed, run for a few seconds to a few minutes, and shut down. There's no EC2 instance idling at 3 AM.</p>
<p><strong>S3 Vectors is the vector store.</strong> Your embeddings live in S3's purpose-built vector storage rather than in a dedicated vector database like Pinecone or OpenSearch.</p>
<p>This is what makes the "scale to zero" cost possible: you're paying object storage rates for vector data instead of keeping a database cluster warm. It also means your vectors are sitting in your own S3 bucket, not in a third-party managed service that holds your data on their terms.</p>
<p><strong>Cognito handles auth.</strong> The dashboard and API are protected with Cognito user pools. When you deploy, you get a temporary password via email. The web component uses IAM-based authentication, and server-side integrations use API key auth.</p>
<p><strong>CloudFront serves the UI.</strong> The dashboard is a static React app served through CloudFront, so there's no web server to maintain.</p>
<h3 id="heading-two-ways-to-deploy">Two Ways to Deploy</h3>
<p>You have two deployment paths depending on what you want:</p>
<p><strong>AWS Marketplace (the fast path):</strong> Click deploy, fill in two fields (stack name and email), and wait about 10 minutes. No local tooling required. This is the path we'll walk through first.</p>
<p><strong>From Source (the developer path):</strong> Clone the repo, run <code>publish.py</code>, and deploy via SAM CLI. This is the path for when you want to customize the processing pipeline, modify the UI, or contribute to the project. We'll cover this after the Marketplace walkthrough.</p>
<p>Both paths produce the same stack. The Marketplace version just wraps the CloudFormation template in a one-click deployment.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you deploy, you'll need:</p>
<ul>
<li><p><strong>An AWS account</strong> with permissions to create CloudFormation stacks, Lambda functions, S3 buckets, DynamoDB tables, and Cognito user pools. If you're using an admin account, you're covered.</p>
</li>
<li><p><strong>Bedrock model access:</strong> RAGStack defaults to <code>us-east-1</code> because that's where Nova Multimodal Embeddings is available. Amazon's own models (including Nova) are available by default in Bedrock, so no manual enablement is required. Just make sure your IAM role has the necessary <code>bedrock:InvokeModel</code> permissions.</p>
</li>
<li><p><strong>For the Marketplace path:</strong> just a web browser.</p>
</li>
<li><p><strong>For the source path:</strong> Python 3.13+, Node.js 24+, AWS CLI and SAM CLI configured, and Docker (for building Lambda layers).</p>
</li>
</ul>
<h2 id="heading-deploying-from-aws-marketplace">Deploying from AWS Marketplace</h2>
<p>This is the fastest path – no local tools, no CLI, no Docker. You'll launch a CloudFormation stack and have a working pipeline in about 10 minutes.</p>
<h3 id="heading-step-1-launch-the-stack">Step 1: Launch the Stack</h3>
<p>Click the <a href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://ragstack-quicklaunch-public.s3.us-east-1.amazonaws.com/ragstack-template.yaml&amp;stackName=my-docs">direct deploy link</a> to open CloudFormation's "Quick create stack" page with the template pre-loaded.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/d354f6bc-dee8-4f44-9b3b-523ea27564c7.png" alt="Screenshot of AWS CloudFormation Quick Create Stack page in dark mode. Sections for template URL, stack name, parameters, and build options are visible." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-2-fill-in-two-fields">Step 2: Fill In Two Fields</h3>
<p>The page has a lot of options, but you only need two:</p>
<ul>
<li><p><strong>Stack name:</strong> Must be lowercase. This becomes the prefix for all your AWS resources (for example, <code>my-docs</code>, <code>team-kb</code>, <code>project-notes</code>). Keep it short.</p>
</li>
<li><p><strong>Admin Email:</strong> Under Required Settings. Cognito will send your temporary login credentials here. Use an email you can access right now.</p>
</li>
</ul>
<p>Everything else – Build Options, Advanced Settings, OCR Backend, model selections – can stay at the defaults. They're there for customization later, but the defaults work out of the box.</p>
<h3 id="heading-step-3-deploy">Step 3: Deploy</h3>
<p>Scroll to the bottom, check the three acknowledgment boxes under "Capabilities and transforms," and click <strong>Create stack</strong>.</p>
<p>Deployment takes roughly 10 minutes. You can watch the progress in the CloudFormation Events tab if you're curious, but there's nothing to do until the stack status flips to <code>CREATE_COMPLETE</code>.</p>
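<p>If you'd rather poll from a terminal than refresh the console, something like the following sketch works. It assumes you have <code>boto3</code> installed and AWS credentials configured. The helper name, polling interval, and poll limit are arbitrary choices for this example. It relies only on the CloudFormation <code>describe_stacks</code> call.</p>

```python
import time


def wait_for_stack(cfn_client, stack_name, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll CloudFormation until the stack leaves an *_IN_PROGRESS state.

    `cfn_client` is anything with a boto3-style describe_stacks method,
    which keeps the function easy to exercise with a stub.
    """
    status = "UNKNOWN"
    for _ in range(max_polls):
        response = cfn_client.describe_stacks(StackName=stack_name)
        status = response["Stacks"][0]["StackStatus"]
        if not status.endswith("_IN_PROGRESS"):
            # CREATE_COMPLETE on success; CREATE_FAILED or ROLLBACK_COMPLETE otherwise.
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"{stack_name} still {status} after {max_polls} polls")


# Usage with boto3 (third-party package), shown as comments so nothing runs on import:
#   import boto3
#   client = boto3.client("cloudformation", region_name="us-east-1")
#   print(wait_for_stack(client, "my-docs"))
```

<p>Anything other than <code>CREATE_COMPLETE</code> means the Events tab is worth a look after all.</p>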
<h3 id="heading-step-4-log-in">Step 4: Log In</h3>
<p>Once the stack finishes, check your email. Cognito sends you the dashboard URL and a temporary password. Log in, set a new password, and you're looking at an empty dashboard ready for documents.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/5ac31b6c-2782-4b66-82a9-0cb962c5dac4.png" alt="A software dashboard interface titled 'Document Pipeline (Demo)' displaying options for uploading, scraping, and searching documents. The screen shows no current documents or scrape jobs, with menu options on the left and a search and filter bar at the center. The overall tone is functional and minimalist." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-deploying-from-source">Deploying from Source</h2>
<p>If you want to customize the pipeline, modify the UI, or contribute to the project, deploy from source instead.</p>
<h3 id="heading-step-1-clone-and-set-up">Step 1: Clone and Set Up</h3>
<pre><code class="language-bash">git clone https://github.com/HatmanStack/RAGStack-Lambda.git
cd RAGStack-Lambda

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
</code></pre>
<h3 id="heading-step-2-deploy">Step 2: Deploy</h3>
<p>The <code>publish.py</code> script handles everything: building the frontend, packaging Lambda functions, and deploying via SAM CLI.</p>
<pre><code class="language-bash">python publish.py \
  --project-name my-docs \
  --admin-email admin@example.com
</code></pre>
<p>This defaults to <code>us-east-1</code> for Nova Multimodal Embeddings. The script will build the React dashboard, build the web component, package all Lambda layers with Docker, and deploy the CloudFormation stack through SAM.</p>
<p>The first deploy takes longer (15-20 minutes) because it's building everything from scratch. Subsequent deploys are faster since SAM caches unchanged resources.</p>
<p>If you only want to iterate on the backend and skip UI builds:</p>
<pre><code class="language-bash"># Skip dashboard build (still builds web component)
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui

# Skip ALL UI builds
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui-all
</code></pre>
<p>Once it finishes, you'll get the same Cognito email and dashboard URL as the Marketplace path.</p>
<h2 id="heading-uploading-your-first-documents">Uploading Your First Documents</h2>
<p>The dashboard has tabs for different content types. We'll start with the Documents tab since that's the most common use case.</p>
<h3 id="heading-documents">Documents</h3>
<p>Click the <strong>Documents</strong> tab and upload a file. RAGStack accepts a wide range of formats: PDF, DOCX, XLSX, HTML, CSV, JSON, XML, EML, EPUB, TXT, and Markdown. Drag and drop or use the file picker.</p>
<p>Once uploaded, the document enters the processing pipeline. You'll see the status update in real time:</p>
<ol>
<li><p><strong>UPLOADED:</strong> File received and stored in S3.</p>
</li>
<li><p><strong>PROCESSING:</strong> Step Functions has picked it up and routed it to the right processor. Text-based files (HTML, CSV, Markdown) go through direct extraction. Scanned PDFs and images go through Textract OCR. The LLM analyzes the content and extracts structured metadata: topic, document type, people mentioned, date ranges, and whatever else is relevant to the content.</p>
</li>
<li><p><strong>INDEXED:</strong> Embeddings generated, vectors stored, document is searchable.</p>
</li>
</ol>
<p>Text documents typically process in 1-5 minutes. OCR-heavy documents (scanned PDFs, images with text) can take 2-15 minutes depending on page count.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/3df05041-2632-41a9-a71c-6d764c503f2a.png" alt="Screenshot of a document upload interface labeled &quot;Document Pipeline (Demo).&quot; Central panel shows a box for drag-and-drop file upload. Sleek, modern design." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-images">Images</h3>
<p>The <strong>Images</strong> tab works differently. Upload a JPG, PNG, GIF, or WebP and you can add a caption. Both the visual content and caption text get embedded using Nova Multimodal Embeddings, so you can search by what's in the image or by your description of it.</p>
<p>This is where multimodal embeddings earn their keep. A traditional text-only RAG pipeline would need you to describe every image manually. Here, the image itself becomes searchable, and since everything stays in your AWS account, you're not sending personal photos or sensitive visual content to an external service to get there.</p>
<h3 id="heading-what-about-video-and-audio">What About Video and Audio?</h3>
<p>Upload video or audio files and RAGStack routes them through Amazon Transcribe for speech-to-text conversion. The transcript gets split into 30-second chunks with speaker identification, then embedded like any other document. When chat results reference a video source, you get timestamp links that jump to the exact position in the recording.</p>
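<p>To make the chunking concrete, here is a minimal sketch of grouping transcript segments into 30-second windows. The input shape (a list of dicts with <code>start</code>, <code>speaker</code>, and <code>text</code>) is an assumption for illustration, not Transcribe's raw output format or RAGStack's internal code.</p>

```python
from typing import Dict, List

CHUNK_SECONDS = 30  # window size described above


def chunk_transcript(segments: List[Dict], chunk_seconds: int = CHUNK_SECONDS) -> List[Dict]:
    """Group transcript segments into fixed windows keyed by their start time."""
    chunks: Dict[int, Dict] = {}
    for segment in sorted(segments, key=lambda s: s["start"]):
        index = int(segment["start"] // chunk_seconds)
        chunk = chunks.setdefault(
            index,
            {"start": index * chunk_seconds, "end": (index + 1) * chunk_seconds, "lines": []},
        )
        # Keep the speaker label next to the words so it survives embedding.
        chunk["lines"].append(f'{segment["speaker"]}: {segment["text"]}')
    return [
        {"start": c["start"], "end": c["end"], "text": "\n".join(c["lines"])}
        for _, c in sorted(chunks.items())
    ]


if __name__ == "__main__":
    demo = [
        {"start": 2.0, "speaker": "spk_0", "text": "Welcome back."},
        {"start": 12.4, "speaker": "spk_1", "text": "Thanks for having me."},
        {"start": 31.5, "speaker": "spk_0", "text": "Let's talk about tracing."},
    ]
    for chunk in chunk_transcript(demo):
        print(chunk["start"], chunk["end"], repr(chunk["text"]))
```

<p>Each chunk's <code>start</code> value is what a timestamp link would need in order to jump to the right position in the recording.</p>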
<h3 id="heading-web-scraping">Web Scraping</h3>
<p>The <strong>Scrape</strong> tab lets you pull websites directly into your knowledge base. Enter a URL and RAGStack crawls the page, extracts the content, and processes it through the same pipeline as uploaded documents: metadata extraction, embedding, and indexing.</p>
<p>This is useful for building a knowledge base from existing web content without manually saving and uploading pages. Documentation sites, blog archives, reference material, anything publicly accessible.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/ac2c6239-a323-4770-80f7-31aa7ff3bdfb.png" alt="Web scraping interface with fields for URL, max pages, and depth. A dropdown for scope selection and a 'Start Scrape' button are visible." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-chatting-with-your-knowledge-base">Chatting With Your Knowledge Base</h2>
<p>This is the payoff. Go to the <strong>Chat</strong> tab, type a question, and RAGStack retrieves relevant documents from your knowledge base, passes them as context to a Bedrock LLM, and returns an answer with source citations.</p>
<p>The citations are collapsible, so click to expand and see which documents informed the answer, with the option to download the source file. For video and audio sources, you get clickable timestamps that jump to the relevant moment.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/760b3cd0-8bb8-493d-97ce-5eb3d0138592.png" alt="Screenshot of a web interface titled &quot;Knowledge Base Chat&quot; with menu options on the left. The central section prompts users to ask document-related questions." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-metadata-filtering">Metadata Filtering</h3>
<p>If you've uploaded enough documents to have meaningful metadata categories, the chat interface lets you filter search results by metadata before querying. RAGStack auto-discovers the metadata structure from your documents, so you don't configure this manually. It just appears as your knowledge base grows.</p>
<p>This is useful when you have a large mixed corpus. Instead of hoping the vector search picks the right context from thousands of documents, you can narrow it down: "only search documents about project X" or "only search content from Q4 2024."</p>
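<p>Conceptually, filtering first and ranking second looks like the sketch below. It's a toy in-memory version with made-up documents and two-dimensional vectors. In the deployed stack the filtering and similarity search happen inside the managed knowledge base, not in code like this.</p>

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vector: List[float], documents: List[Dict], filters: Dict, top_k: int = 3) -> List[Dict]:
    """Keep documents whose metadata matches every filter, then rank by similarity."""
    candidates = [
        doc for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(query_vector, d["vector"]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        {"id": "kickoff-notes", "metadata": {"project": "x"}, "vector": [0.9, 0.1]},
        {"id": "other-sow", "metadata": {"project": "y"}, "vector": [1.0, 0.0]},
        {"id": "retro", "metadata": {"project": "x"}, "vector": [0.1, 0.9]},
    ]
    hits = filtered_search([1.0, 0.0], docs, {"project": "x"})
    print([doc["id"] for doc in hits])
```

<p>Note that <code>other-sow</code> is the closest vector to the query but never makes it into the results, because the metadata filter removes it before ranking.</p>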
<h2 id="heading-embedding-the-web-component-in-your-app">Embedding the Web Component in Your App</h2>
<p>The dashboard is useful for managing your knowledge base, but the real power is embedding RAGStack's chat in your own application. The web component works with any framework: React, Vue, Angular, Svelte, or plain HTML.</p>
<p>Load the script once from your CloudFront distribution:</p>
<pre><code class="language-html">&lt;script src="https://your-cloudfront-url/ragstack-chat.js"&gt;&lt;/script&gt;
</code></pre>
<p>Then drop the component wherever you want a chat interface:</p>
<pre><code class="language-html">&lt;ragstack-chat
  conversation-id="my-app"
  header-text="Ask About Documents"
&gt;&lt;/ragstack-chat&gt;
</code></pre>
<p>That's it. The component handles authentication (via IAM), manages conversation state, and renders source citations, all self-contained. Your CloudFront URL is in the stack outputs.</p>
<p>For server-side integrations that don't need a UI, the GraphQL API is available with API key authentication. You can find your endpoint and API key in the dashboard under Settings.</p>
<h2 id="heading-using-the-mcp-server">Using the MCP Server</h2>
<p>RAGStack includes an MCP server that connects your knowledge base to AI assistants like Claude Desktop, Cursor, VS Code, and Amazon Q CLI. Instead of switching to the dashboard to search your documents, you ask your assistant directly.</p>
<p>Install it:</p>
<pre><code class="language-bash">pip install ragstack-mcp
</code></pre>
<p>Then add it to your AI assistant's MCP configuration:</p>
<pre><code class="language-json">{
  "ragstack": {
    "command": "uvx",
    "args": ["ragstack-mcp"],
    "env": {
      "RAGSTACK_GRAPHQL_ENDPOINT": "YOUR_ENDPOINT",
      "RAGSTACK_API_KEY": "YOUR_API_KEY"
    }
  }
}
</code></pre>
<p>Your endpoint and API key are in the dashboard under Settings. Once configured, type <code>@ragstack</code> in your assistant's chat to invoke the MCP server, then ask things like "search my knowledge base for authentication docs" and it queries RAGStack directly.</p>
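<p>Where that entry goes depends on the assistant. Claude Desktop, for example, expects server entries under a top-level <code>mcpServers</code> key in its JSON config file. Other clients may use a different key or file, so check their docs. As a sketch under that assumption, this merges the entry into an existing config without clobbering other servers. The helper names are hypothetical and the config path is yours to supply.</p>

```python
import json
from pathlib import Path

# Same entry as the JSON above; replace the placeholders with your own values.
RAGSTACK_ENTRY = {
    "command": "uvx",
    "args": ["ragstack-mcp"],
    "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "YOUR_ENDPOINT",
        "RAGSTACK_API_KEY": "YOUR_API_KEY",
    },
}


def add_mcp_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of the config with the entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the assistant's config (if any), add RAGStack, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_mcp_server(config, "ragstack", RAGSTACK_ENTRY), indent=2))


if __name__ == "__main__":
    existing = {"mcpServers": {"filesystem": {"command": "npx", "args": ["some-server"]}}}
    print(json.dumps(add_mcp_server(existing, "ragstack", RAGSTACK_ENTRY), indent=2))
```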
<p>See the <a href="https://github.com/HatmanStack/RAGStack-Lambda/blob/main/src/ragstack-mcp/README.md">MCP Server docs</a> for the full list of available tools and setup details.</p>
<h2 id="heading-what-you-can-build-from-here">What You Can Build From Here</h2>
<p>You've got a deployed RAG pipeline that costs almost nothing to run and handles text, images, video, and audio. A few directions you might take it:</p>
<p><strong>A searchable personal archive.</strong> Every conference talk you've saved, every PDF textbook, every tutorial video that's sitting in a folder somewhere. Upload it all, and now you have one search interface across years of accumulated material. The multimodal embeddings mean your screenshots and diagrams are searchable too, not just the text.</p>
<p>I built <a href="https://github.com/HatmanStack/family-archive-document-ai">a family archive app</a> this way. It holds scanned letters, old photos, and home videos, with RAGStack deployed as a nested CloudFormation stack so the whole family can search across decades of memories using the chat widget.</p>
<p><strong>A second brain for a client project.</strong> Scrape the client's existing docs, upload the SOW and meeting notes, drop in the codebase documentation. Now you've got a searchable knowledge base scoped to that engagement. Spin it up at the start, tear it down when the contract ends. At these costs, it's disposable infrastructure.</p>
<p><strong>AI chat over a niche dataset.</strong> Recipe collections, legal filings, research papers, local government meeting minutes, or any other corpus that's too specialized for general-purpose LLMs to know well. The web component means you can ship it as a standalone tool without building a frontend from scratch.</p>
<p><strong>RAG for your MCP workflow.</strong> If you're already using Claude Desktop or Cursor, the MCP server turns your knowledge base into another tool your assistant can reach for. Upload your team's runbooks and architecture docs, and now <code>@ragstack</code> in your editor gives you instant context without tab-switching.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The serverless RAG pipeline you just deployed handles document processing, multimodal embeddings, metadata extraction, and AI chat with source citations, all scaling to zero when idle, all running in your AWS account. Your documents, your vectors, your infrastructure. The traditional approach to this stack costs 120-500 USD per month in baseline infrastructure. This one costs pocket change.</p>
<p>The full source is at <a href="https://github.com/HatmanStack/RAGStack-Lambda">github.com/HatmanStack/RAGStack-Lambda</a>. File issues, open PRs, or just poke around the architecture. If you want to go deeper on the technical tradeoffs, particularly how filtered vector search behaves on cost-optimized backends like S3 Vectors, that's a story for the next post.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
