<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ claude.ai - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ claude.ai - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 14 Jun 2026 22:42:47 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/claudeai/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Keep Human Experts Visible in Your AI-Assisted Codebase ]]>
                </title>
                <description>
                    <![CDATA[ Six months ago, Stack Overflow processed 108,563 questions in a single month. By December 2025, that number had fallen to 3,862. A 78% collapse in two years. The explanation everyone reaches for is th ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-keep-human-experts-visible-in-your-ai-assisted-codebase/</link>
                <guid isPermaLink="false">69dd18d4217f5dfcbd13e964</guid>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Productivity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude-code ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Nwaneri ]]>
                </dc:creator>
                <pubDate>Mon, 13 Apr 2026 16:24:52 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/21d160a8-af66-4048-9fda-1d83b2e26148.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Six months ago, Stack Overflow processed 108,563 questions in a single month. By December 2025, that number had fallen to 3,862. A 78% collapse in two years.</p>
<p>The explanation everyone reaches for is that AI replaced it. That's partly true. But it misses the structural problem underneath: every time a developer asks Claude or ChatGPT to write code, the knowledge that shaped the answer disappears.</p>
<p>The GitHub discussion where someone spent two hours documenting why cursor-based pagination beats offset for live-updating datasets. The Stack Overflow answer from 2019 where one engineer, after a week of debugging, documented exactly why that approach fails under concurrent writes.</p>
<p>The AI consumed all of it. The humans who produced it got nothing — no citation in the codebase, no signal that their work mattered.</p>
<p>Over time, those people stopped contributing. Stack Overflow isn't dying because it's bad. It's dying because AI extracted its value and the feedback loop that kept humans contributing broke down.</p>
<p>This tutorial builds a tool that puts that loop back together. <strong>proof-of-contribution</strong> is a Claude Code skill that links every AI-generated artifact back to the human knowledge that inspired it — and surfaces exactly where the AI made choices with no human source at all.</p>
<p>I'll show you how to install proof-of-contribution, how to record your first provenance entry, how to use the spec-writer integration that makes Knowledge Gaps deterministic, and how to run <code>poc.py verify</code> — a static analyser that detects gaps without a single API call.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-what-you-will-build">What You Will Build</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-quickstart-in-5-minutes">Quickstart in 5 Minutes</a></p>
</li>
<li><p><a href="#heading-how-the-tool-works">How the Tool Works</a></p>
</li>
<li><p><a href="#heading-how-to-install-proof-of-contribution">How to Install proof-of-contribution</a></p>
</li>
<li><p><a href="#heading-how-to-scaffold-your-project">How to Scaffold Your Project</a></p>
</li>
<li><p><a href="#heading-how-to-record-your-first-provenance-entry">How to Record Your First Provenance Entry</a></p>
</li>
<li><p><a href="#heading-how-to-use-import-spec-to-seed-knowledge-gaps">How to Use import-spec to Seed Knowledge Gaps</a></p>
</li>
<li><p><a href="#heading-how-to-trace-human-attribution">How to Trace Human Attribution</a></p>
</li>
<li><p><a href="#heading-how-to-verify-with-static-analysis">How to Verify with Static Analysis</a></p>
</li>
<li><p><a href="#heading-how-to-enable-pr-enforcement">How to Enable PR Enforcement</a></p>
</li>
<li><p><a href="#heading-where-to-go-next">Where to Go Next</a></p>
</li>
</ol>
<h2 id="heading-what-you-will-build">What You Will Build</h2>
<p>proof-of-contribution is a Claude Code skill with a local CLI. Together they give you:</p>
<ul>
<li><p><strong>Provenance Blocks</strong>: Claude appends a structured attribution block to every generated artifact, listing the human sources that inspired it and flagging what it synthesized without any traceable source.</p>
</li>
<li><p><strong>Knowledge Gaps</strong>: the parts of AI-generated code that have no human citation, surfaced before they become production incidents</p>
</li>
<li><p><code>poc.py trace</code>: a CLI command that shows the full human attribution chain for any file in thirty seconds</p>
</li>
<li><p><code>poc.py import-spec</code>: bridges proof-of-contribution with spec-writer, seeding knowledge gaps from your spec's assumptions list before the agent builds anything</p>
</li>
<li><p><code>poc.py verify</code>: a static analyser that cross-checks your file's structure against seeded claims using Python's AST. Zero API calls. Exit code 0 means clean, exit code 1 means gaps found — wires directly into CI</p>
</li>
<li><p><strong>A GitHub Action</strong>: optional PR enforcement that fails PRs missing attribution, for teams that want a standard</p>
</li>
</ul>
<p>The complete source is at <a href="https://github.com/dannwaneri/proof-of-contribution">github.com/dannwaneri/proof-of-contribution</a>.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>This is a beginner-to-intermediate tutorial. You should be comfortable with:</p>
<ul>
<li><p><strong>Command line basics</strong>: navigating directories, running scripts</p>
</li>
<li><p><strong>Git</strong>: basic commits and PRs</p>
</li>
<li><p><strong>Python 3.8 or higher</strong>: the CLI is pure Python with no dependencies</p>
</li>
</ul>
<p>You will need:</p>
<ul>
<li><p><strong>Python installed</strong>: check with <code>python --version</code> or <code>python3 --version</code></p>
</li>
<li><p><strong>Git installed</strong>: check with <code>git --version</code></p>
</li>
<li><p><strong>Claude Code</strong> (or any agent that supports the Agent Skills standard — Cursor and Gemini CLI also work)</p>
</li>
</ul>
<p>There's no database to install. No API keys. No paid services. The default storage is SQLite, accessed through the <code>sqlite3</code> module in Python's standard library.</p>
<h2 id="heading-quickstart-in-5-minutes">Quickstart in 5 Minutes</h2>
<p>If you want to try the tool before reading the full tutorial, here are the five commands that take you from zero to your first gap detection:</p>
<p><strong>Mac and Linux:</strong></p>
<pre><code class="language-bash"># 1. Install
mkdir -p ~/.claude/skills
git clone https://github.com/dannwaneri/proof-of-contribution.git \
  ~/.claude/skills/proof-of-contribution

# 2. Scaffold your project (run in your repo root)
python ~/.claude/skills/proof-of-contribution/assets/scripts/poc_init.py

# 3. Record attribution for an AI-generated file
python poc.py add src/utils/parser.py

# 4. Detect gaps via static analysis
python poc.py verify src/utils/parser.py

# 5. See the full provenance chain
python poc.py trace src/utils/parser.py
</code></pre>
<p><strong>Windows PowerShell:</strong></p>
<pre><code class="language-powershell"># 1. Install
New-Item -ItemType Directory -Force -Path "$HOME\.claude\skills"
git clone https://github.com/dannwaneri/proof-of-contribution.git `
  "$HOME\.claude\skills\proof-of-contribution"

# 2. Scaffold your project
python "$HOME\.claude\skills\proof-of-contribution\assets\scripts\poc_init.py"

# 3. Record attribution
python poc.py add src\utils\parser.py

# 4. Detect gaps
python poc.py verify src\utils\parser.py

# 5. See the full provenance chain
python poc.py trace src\utils\parser.py
</code></pre>
<p>That's the whole tool. The sections below walk through each step in detail with real terminal output at every stage.</p>
<h2 id="heading-how-the-tool-works">How the Tool Works</h2>
<p>Before you install anything, you need a clear mental model of what proof-of-contribution actually does — because the most important part isn't obvious.</p>
<h3 id="heading-the-archaeology-problem">The Archaeology Problem</h3>
<p>Here's a scenario that happens on every team using AI-assisted development.</p>
<p>A developer joins. They inherit six months of AI-generated code. They hit a bug in the pagination logic: cursor-based, unusual implementation, nobody remembers why it was built that way. The original developer has left.</p>
<p>Old answer: two days of archaeology. <code>git blame</code> points to a commit message that says "fix pagination." The commit before that says "implement pagination." Dead end.</p>
<p>With <code>poc.py trace src/utils/paginator.py</code>, that same developer sees this in thirty seconds:</p>
<pre><code class="language-plaintext">Provenance trace: src/utils/paginator.py
────────────────────────────────────────────────────────────
  [HIGH]  @tannerlinsley on github
          Cursor pagination discussion
          https://github.com/TanStack/query/discussions/123
          Insight: cursor beats offset for live-updating datasets

Knowledge gaps (AI-synthesized, no human source):
  • Error retry strategy — no human source cited
  • Concurrent write handling — AI chose this arbitrarily
</code></pre>
<p>They now know where the pattern came from and — critically — which parts have no traceable human source. The concurrent write handling is where the bug lives. The AI made a choice nobody reviewed.</p>
<p>That's what this tool does. Not enforcement first. Archaeology first.</p>
<h3 id="heading-how-knowledge-gaps-are-detected">How Knowledge Gaps Are Detected</h3>
<p>The obvious assumption is that Claude introspects and reports what it doesn't know. That assumption is wrong. LLMs hallucinate confidently. An AI that could reliably detect its own knowledge gaps wouldn't produce them.</p>
<p>The detection mechanism is a comparison, not introspection.</p>
<p>When you use <a href="https://github.com/dannwaneri/spec-writer">spec-writer</a> before building, it generates a spec with an explicit <code>## Assumptions to review</code> section — every decision the AI is making that you didn't specify, each one impact-rated. That list is the contract.</p>
<p>When you run <code>poc.py import-spec spec.md --artifact src/utils/paginator.py</code>, those assumptions get seeded into the database as unresolved knowledge gaps. After the agent builds, <code>poc.py trace</code> shows which assumptions made it into code with no human source ever cited.</p>
<p>The AI isn't grading its own exam. The spec is the answer key.</p>
<p><code>poc.py verify</code> takes this further. After the agent builds, it parses the file's actual structure using Python's built-in <code>ast</code> module — extracting every function definition, conditional branch, and return path. It cross-checks each one against the seeded claims. Any structural unit with no resolved claim surfaces as a deterministic Knowledge Gap, regardless of how confident the model was when it wrote the code.</p>
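<p>A minimal sketch of that structural pass, assuming only the standard <code>ast</code> module. The names <code>structural_units</code> and <code>uncited_units</code> and the sample source are hypothetical illustrations, not code taken from <code>poc.py</code>, and return paths are left out to keep it short:</p>
<pre><code class="language-python">import ast

# Hypothetical sample file content, used only for this illustration.
SOURCE = '''
def parse_query(text):
    if not text:
        return None
    return text.strip()

def fetch_results(query):
    return [query]
'''


def structural_units(source):
    """Return (kind, label, first_line, last_line) for functions and if-branches."""
    units = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            units.append(("function", node.name, node.lineno, node.end_lineno))
        elif isinstance(node, ast.If):
            # Recover the condition text exactly as written in the source.
            condition = ast.get_source_segment(source, node.test)
            units.append(("branch", "if " + condition, node.lineno, node.end_lineno))
    return sorted(units, key=lambda unit: unit[2])


def uncited_units(units, cited_labels):
    """Keep only the units whose label has no recorded human source."""
    return [unit for unit in units if unit[1] not in cited_labels]


for kind, label, first, last in uncited_units(structural_units(SOURCE), {"parse_query"}):
    print(kind, label, first, last)
</code></pre>
<p>The comparison is a plain set lookup, which is why the result does not depend on anything the model reports about itself.</p>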
<h2 id="heading-how-to-install-proof-of-contribution">How to Install proof-of-contribution</h2>
<h3 id="heading-mac-and-linux">Mac and Linux</h3>
<pre><code class="language-bash">mkdir -p ~/.claude/skills
git clone https://github.com/dannwaneri/proof-of-contribution.git \
  ~/.claude/skills/proof-of-contribution
</code></pre>
<h3 id="heading-windows-powershell">Windows PowerShell</h3>
<pre><code class="language-powershell">New-Item -ItemType Directory -Force -Path "$HOME\.claude\skills"
git clone https://github.com/dannwaneri/proof-of-contribution.git `
  "$HOME\.claude\skills\proof-of-contribution"
</code></pre>
<p>That's the entire installation. No package to install, no configuration file to edit. The skill is a markdown file the agent reads. The CLI is a Python script that runs locally.</p>
<h3 id="heading-verify-the-install">Verify the Install</h3>
<pre><code class="language-bash">ls ~/.claude/skills/proof-of-contribution/
</code></pre>
<p>You should see <code>SKILL.md</code>, <code>poc.py</code>, <code>assets/</code>, and <code>references/</code>. If the directory is empty, the clone failed — check your internet connection and try again.</p>
<h2 id="heading-how-to-scaffold-your-project">How to Scaffold Your Project</h2>
<p>The scaffold script creates the database, config, CLI, and GitHub integration in your project root. Run it once per project.</p>
<h3 id="heading-mac-and-linux">Mac and Linux</h3>
<pre><code class="language-bash">cd /path/to/your/project
python ~/.claude/skills/proof-of-contribution/assets/scripts/poc_init.py
</code></pre>
<h3 id="heading-windows-powershell">Windows PowerShell</h3>
<pre><code class="language-powershell">cd C:\path\to\your\project
python "$HOME\.claude\skills\proof-of-contribution\assets\scripts\poc_init.py"
</code></pre>
<p>You should see output like this:</p>
<pre><code class="language-plaintext">🔗 Proof of Contribution — init

  →  Project root: /path/to/your/project
  ✔  Created .poc/config.json
  ✔  Created .poc/.gitignore  (db excluded from git, config tracked)
  ✔  Created .poc/provenance.db  (SQLite — no extra infra needed)
  ✔  Created .github/PULL_REQUEST_TEMPLATE.md
  ✔  Created .github/workflows/poc-check.yml
  ✔  Created poc.py  (local CLI — includes import-spec command)
  ✔  Created .gitignore

✔ Proof of Contribution initialised for 'your-project'
</code></pre>
<p>This creates four things in your project:</p>
<pre><code class="language-plaintext">your-project/
├── .poc/
│   ├── config.json      ← project settings (commit this)
│   ├── provenance.db    ← SQLite database (local only, gitignored)
│   └── .gitignore
├── .github/
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── workflows/
│       └── poc-check.yml
└── poc.py               ← your local CLI
</code></pre>
<ul>
<li><p><code>.poc/</code> — the tool's local data directory. <code>config.json</code> stores project settings and is committed to git. <code>provenance.db</code> is the SQLite database where attribution records and knowledge gaps are stored — local only, gitignored.</p>
</li>
<li><p><code>poc.py</code> — your local CLI, copied into the project root. Run <code>python poc.py trace</code>, <code>python poc.py verify</code>, and every other command directly without a global install.</p>
</li>
<li><p><code>.github/PULL_REQUEST_TEMPLATE.md</code> — a PR template with the <code>## 🤖 AI Provenance</code> section pre-filled. Developers fill it in when submitting PRs that contain AI-generated code.</p>
</li>
<li><p><code>.github/workflows/poc-check.yml</code> — the optional GitHub Action for PR enforcement. Installed but dormant until you push the workflow file and enable it in your repo settings.</p>
</li>
</ul>
<p><strong>Windows note:</strong> if the scaffold fails with a <code>UnicodeEncodeError</code>, Python is writing the PR template with the default Windows encoding, which can't represent the emoji in it. Open <code>assets/scripts/poc_init.py</code> in a text editor and find every <code>.write_text(...)</code> call. Change each one to <code>.write_text(..., encoding="utf-8")</code>. Save and re-run.</p>
<h3 id="heading-verify-the-scaffold-worked">Verify the Scaffold Worked</h3>
<pre><code class="language-bash">python poc.py report
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Proof of Contribution Report
────────────────────────────────────────
  Artifacts tracked    : 0
  With provenance      : 0  (0%)
  Unresolved gaps      : 0
  Resolved claims      : 0
  Human experts        : 0
</code></pre>
<p>Empty database, clean state. You're ready.</p>
<h2 id="heading-how-to-record-your-first-provenance-entry">How to Record Your First Provenance Entry</h2>
<p>One clarification before you start. <code>poc.py verify</code> does detect Knowledge Gaps automatically, but the static analyser can only tell you <em>that</em> a function has no human citation. It can't tell you <em>which</em> human source inspired it. That knowledge lives in your head, not in the code.</p>
<p><code>poc.py add</code> is where you supply that context. After the agent builds a file, you record the human sources you actually drew on: the GitHub discussion you read before prompting, the Stack Overflow answer that shaped the approach. Those records become the attribution chain <code>poc.py trace</code> surfaces — and what closes the gaps <code>poc.py verify</code> flags.</p>
<p><code>verify</code> finds the gaps. <code>add</code> fills them.</p>
<p><code>poc.py add</code> records attribution for a file interactively. You can run it on any AI-generated file in your project.</p>
<pre><code class="language-bash">python poc.py add src/utils/parser.py
</code></pre>
<p>You'll see a prompt:</p>
<pre><code class="language-plaintext">Recording provenance for: src/utils/parser.py
(Press Ctrl+C to cancel)

  Human source URL (or Enter to finish):
</code></pre>
<p>Enter the URL of the human-authored source that inspired the code. This could be a GitHub discussion, a Stack Overflow answer, a documentation page, a blog post, or an RFC.</p>
<pre><code class="language-plaintext">  Human source URL (or Enter to finish): https://github.com/TanStack/query/discussions/123
  Author handle: tannerlinsley
  Platform (github/stackoverflow/docs/other): github
  Source title: Cursor pagination discussion
  What specific insight came from this? cursor beats offset for live-updating datasets
  Confidence HIGH/MEDIUM/LOW [MEDIUM]: HIGH
  ✔ Recorded.
</code></pre>
<p>Add as many sources as apply. Press Enter on a blank URL when you're done.</p>
<pre><code class="language-plaintext">  Human source URL (or Enter to finish): 
✔ Provenance saved. Run: python poc.py trace src/utils/parser.py
</code></pre>
<h3 id="heading-check-what-you-recorded">Check What You Recorded</h3>
<pre><code class="language-bash">python poc.py trace src/utils/parser.py
</code></pre>
<pre><code class="language-plaintext">Provenance trace: src/utils/parser.py
────────────────────────────────────────────────────────────
  [HIGH]  @tannerlinsley on github
          Cursor pagination discussion
          https://github.com/TanStack/query/discussions/123
          Insight: cursor beats offset for live-updating datasets
</code></pre>
<p>No knowledge gaps — because you recorded a source. If the file had parts with no human source, they would appear below as gaps.</p>
<h3 id="heading-see-all-experts-in-your-graph">See All Experts in Your Graph</h3>
<p>Every <code>poc.py add</code> call stores not just the URL but the author — their handle, platform, and the specific insight they contributed. Run it across enough files, and those authors accumulate into a <strong>knowledge graph</strong>: a local record of which human experts your codebase drew from, which files their knowledge shaped, and how many artifacts trace back to their work.</p>
<p><code>poc.py experts</code> surfaces the top contributors. On a new project, it'll be one or two entries. On a mature codebase, it becomes a map of whose knowledge is load-bearing — the people you'd want to consult if that part of the code ever needed to change.</p>
<pre><code class="language-bash">python poc.py experts
</code></pre>
<pre><code class="language-plaintext">Top Human Experts in Knowledge Graph
──────────────────────────────────────────────────────
  @tannerlinsley            github          1 artifact(s)
</code></pre>
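<p>The aggregation behind a listing like this can be sketched with the standard <code>sqlite3</code> module. The <code>citation</code> table, its columns, and the <code>top_experts</code> function are assumptions made for illustration. The real schema in <code>.poc/provenance.db</code> may differ:</p>
<pre><code class="language-python">import sqlite3


def top_experts(rows):
    """Count distinct artifacts per author.

    rows is an iterable of (author, platform, artifact_path) tuples.
    The table layout here is hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE citation (author TEXT, platform TEXT, artifact TEXT)")
    db.executemany("INSERT INTO citation VALUES (?, ?, ?)", rows)
    query = """
        SELECT author, platform, COUNT(DISTINCT artifact) AS artifacts
        FROM citation
        GROUP BY author, platform
        ORDER BY artifacts DESC, author ASC
    """
    result = db.execute(query).fetchall()
    db.close()
    return result


sample = [
    ("tannerlinsley", "github", "src/utils/parser.py"),
    ("tannerlinsley", "github", "src/utils/paginator.py"),
    ("juliandeangelis", "github", "src/utils/parser.py"),
]
for author, platform, count in top_experts(sample):
    print("@" + author, platform, count, "artifact(s)")
</code></pre>
<p>Counting distinct artifacts rather than raw citations keeps one heavily cited file from inflating an author's rank.</p>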
<h2 id="heading-how-to-use-import-spec-to-seed-knowledge-gaps">How to Use import-spec to Seed Knowledge Gaps</h2>
<p>This is the most important command in the tool. It connects proof-of-contribution with spec-writer and makes Knowledge Gaps deterministic.</p>
<p>When you use spec-writer before building a feature, it generates an <code>## Assumptions to review</code> section — every implicit decision is impact-rated HIGH, MEDIUM, or LOW. The <code>import-spec</code> command reads that section and seeds those assumptions into the database as unresolved gaps before the agent writes a line of code.</p>
<p>After the agent builds, any assumption that made it into the implementation without a cited human source surfaces automatically in <code>poc.py trace</code>. You don't need to know which parts of the code are uncertain. The spec already told you.</p>
<h3 id="heading-step-1-create-a-test-spec">Step 1 — Create a Test Spec</h3>
<p>If you don't have a spec-writer output yet, create one manually to see how the import works.</p>
<p><strong>Mac and Linux:</strong></p>
<pre><code class="language-bash">cat &gt; test-spec.md &lt;&lt; 'EOF'
## Assumptions to review

1. SQLite is sufficient for single-developer use — Impact: HIGH
   Correct this if: you need team-shared provenance

2. Filepath is the artifact identifier — Impact: MEDIUM
   Correct this if: you use content hashing instead

3. REST pattern for any future API — Impact: LOW
   Correct this if: you prefer GraphQL
EOF
</code></pre>
<p><strong>Windows PowerShell:</strong></p>
<pre><code class="language-powershell">python -c "
content = '''## Assumptions to review

1. SQLite is sufficient for single-developer use - Impact: HIGH
   Correct this if: you need team-shared provenance

2. Filepath is the artifact identifier - Impact: MEDIUM
   Correct this if: you use content hashing instead

3. REST pattern for any future API - Impact: LOW
   Correct this if: you prefer GraphQL'''
open('test-spec.md', 'w', encoding='utf-8').write(content)
print('test-spec.md created')
"
</code></pre>
<p><strong>Windows note:</strong> don't use PowerShell's <code>echo</code> with <code>&gt;</code> redirection to create spec files. Windows PowerShell 5.1 writes redirected output as UTF-16, which causes a <code>UnicodeDecodeError</code> when <code>import-spec</code> reads the file. The <code>python -c</code> approach above writes UTF-8 explicitly.</p>
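<p>You can reproduce the encoding failure on any platform. This sketch writes the same text twice, once as UTF-16 and once as UTF-8, and reads it back as UTF-8 each time. The helper name <code>read_as_utf8</code> is made up for the example:</p>
<pre><code class="language-python">import tempfile
from pathlib import Path


def read_as_utf8(path):
    """Return the file text, or a short message if the bytes are not valid UTF-8."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as error:
        return "decode failed: " + error.reason


with tempfile.TemporaryDirectory() as folder:
    spec = Path(folder) / "test-spec.md"

    # UTF-16 output starts with a byte-order mark that is not valid UTF-8.
    spec.write_text("## Assumptions to review\n", encoding="utf-16")
    utf16_result = read_as_utf8(spec)

    # Writing with an explicit UTF-8 encoding round-trips cleanly.
    spec.write_text("## Assumptions to review\n", encoding="utf-8")
    utf8_result = read_as_utf8(spec)

print(utf16_result)
print(utf8_result)
</code></pre>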
<h3 id="heading-step-2-import-the-assumptions">Step 2 — Import the Assumptions</h3>
<pre><code class="language-bash">python poc.py import-spec test-spec.md --artifact src/utils/parser.py
</code></pre>
<pre><code class="language-plaintext">Spec assumptions imported — 3 Knowledge Gap(s) seeded
───────────────────────────────────────────────────────
  1. [HIGH] SQLite is sufficient for single-developer use
       Correct if: you need team-shared provenance
  2. [MEDIUM] Filepath is the artifact identifier
       Correct if: you use content hashing instead
  3. [LOW] REST pattern for any future API
       Correct if: you prefer GraphQL

  →  Bound to: src/utils/parser.py
  After the agent builds, run:
  python poc.py trace src/utils/parser.py
  python poc.py add src/utils/parser.py
</code></pre>
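<p>For intuition, here is one way the parsing step could work. It is a sketch under assumptions: the regular expressions, the <code>parse_assumptions</code> function, and the dictionary keys are illustrative and not the parser shipped in <code>poc.py</code>. It accepts a hyphen, en dash, or em dash before <code>Impact:</code>, since the two test specs above use different separators:</p>
<pre><code class="language-python">import re

# "1. Some claim - Impact: HIGH" with a hyphen, en dash, or em dash separator.
ASSUMPTION = re.compile(
    r"^\s*\d+\.\s+(.+?)\s+[-\u2013\u2014]\s+Impact:\s*(HIGH|MEDIUM|LOW)\s*$"
)
CORRECTION = re.compile(r"^\s*Correct this if:\s*(.+?)\s*$")


def parse_assumptions(spec_text):
    """Collect the numbered items under the '## Assumptions to review' heading."""
    gaps = []
    in_section = False
    for line in spec_text.splitlines():
        if line.startswith("## "):
            # Any other second-level heading ends the section.
            in_section = line.strip().lower() == "## assumptions to review"
            continue
        if not in_section:
            continue
        match = ASSUMPTION.match(line)
        if match:
            gaps.append(
                {"claim": match.group(1), "impact": match.group(2), "correct_if": None}
            )
            continue
        match = CORRECTION.match(line)
        if match and gaps:
            gaps[-1]["correct_if"] = match.group(1)
    return gaps


SPEC = """## Assumptions to review

1. SQLite is sufficient for single-developer use - Impact: HIGH
   Correct this if: you need team-shared provenance

2. Filepath is the artifact identifier - Impact: MEDIUM
   Correct this if: you use content hashing instead
"""

for gap in parse_assumptions(SPEC):
    print(gap["impact"], gap["claim"])
</code></pre>
<p>Because the separator must have whitespace on both sides, the hyphen inside a word such as "single-developer" is not mistaken for it.</p>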
<h3 id="heading-step-3-trace-the-gaps">Step 3 — Trace the Gaps</h3>
<pre><code class="language-bash">python poc.py trace src/utils/parser.py
</code></pre>
<pre><code class="language-plaintext">Knowledge gaps (AI-synthesized, no human source):
  • REST pattern for any future API [Correct if: you prefer GraphQL]
  • SQLite is sufficient for single-developer use [Correct if: you need team-shared provenance]
  • Filepath is the artifact identifier [Correct if: you use content hashing instead]

  Resolve gaps: python poc.py add src/utils/parser.py
</code></pre>
<p>Three gaps, colour-coded by urgency. The HIGH-impact assumption — SQLite for single-developer use — appears in red. The LOW-impact one appears in green. When you run <code>poc.py add</code> and record a human source with an insight that overlaps the gap text, the gap auto-closes.</p>
<h3 id="heading-preview-without-writing">Preview Without Writing</h3>
<pre><code class="language-bash">python poc.py import-spec test-spec.md --dry-run
</code></pre>
<p>This parses the spec and prints what would be seeded without touching the database. This is useful before committing to an import.</p>
<h3 id="heading-check-the-overall-health">Check the Overall Health</h3>
<pre><code class="language-bash">python poc.py report
</code></pre>
<pre><code class="language-plaintext">Proof of Contribution Report
────────────────────────────────────────
  Artifacts tracked    : 1
  With provenance      : 0  (0%)
  Unresolved gaps      : 3
  Resolved claims      : 0
  Human experts        : 1
  ⚠ Less than 50% of artifacts have provenance records.
  ⚠ 3 unresolved Knowledge Gap(s).
    Run `poc.py trace &lt;filepath&gt;` to locate them.
</code></pre>
<h2 id="heading-how-to-trace-human-attribution">How to Trace Human Attribution</h2>
<p><code>poc.py trace</code> is the command you'll use most. It shows the full human attribution chain for any file and lists any knowledge gaps — parts of the code with no traceable human source.</p>
<pre><code class="language-bash">python poc.py trace src/utils/parser.py
</code></pre>
<p>A file with both attribution and gaps looks like this:</p>
<pre><code class="language-plaintext">Provenance trace: src/utils/parser.py
────────────────────────────────────────────────────────────
  [HIGH]  @juliandeangelis on github
          Spec Driven Development methodology
          https://github.com/dannwaneri/spec-writer
          Insight: separate functional from technical spec

  [MEDIUM] @tannerlinsley on github
           Cursor pagination discussion
           https://github.com/TanStack/query/discussions/123
           Insight: cursor beats offset for live-updating datasets

Knowledge gaps (AI-synthesized, no human source):
  • Error retry strategy — no human source cited
  • CSV column ordering — AI chose this arbitrarily

  Resolve gaps: python poc.py add src/utils/parser.py
</code></pre>
<p>The human attribution section shows every cited source, colour-coded by confidence. The knowledge gaps section shows every assumption that shipped without a human citation — either seeded from a spec via <code>import-spec</code>, or flagged by Claude in the Provenance Block.</p>
<h3 id="heading-resolving-gaps">Resolving Gaps</h3>
<p>Run <code>poc.py add</code> on any file with open gaps:</p>
<pre><code class="language-bash">python poc.py add src/utils/parser.py
</code></pre>
<p>When you enter an insight that shares words with an open gap claim, the gap auto-closes. Run <code>poc.py trace</code> again to confirm it's resolved.</p>
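<p>The tutorial doesn't specify the matching rule, so treat the following as a guess at its general shape. It closes a gap when the insight and the gap claim share at least one non-trivial word. The stopword list, the one-word threshold, and the function names are all assumptions:</p>
<pre><code class="language-python">import re

# Small illustrative stopword list. The real tool may use a different one or none.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "no", "of", "on", "or", "the", "this", "to"}


def keywords(text):
    """Lowercase the text and keep the words that carry meaning."""
    return {word for word in re.findall(r"[a-z0-9]+", text.lower()) if word not in STOPWORDS}


def closes_gap(insight, gap_claim):
    """True when the insight shares at least one keyword with the gap claim."""
    return bool(keywords(insight).intersection(keywords(gap_claim)))


print(closes_gap(
    "SQLite handles single-developer workloads well",
    "SQLite is sufficient for single-developer use",
))
print(closes_gap("cursor beats offset", "REST pattern for any future API"))
</code></pre>
<p>A rule this loose can close a gap by accident, which is a good reason to re-run <code>poc.py trace</code> and read what was actually resolved.</p>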
<h2 id="heading-how-to-verify-with-static-analysis">How to Verify with Static Analysis</h2>
<p><code>poc.py verify</code> is the command that removes self-reporting from the loop. It detects Knowledge Gaps by analysing the file's actual code structure, not by asking the AI what it doesn't know.</p>
<p>Run it after the agent builds, once you've seeded gaps with <code>import-spec</code>:</p>
<pre><code class="language-bash">python poc.py verify src/utils/parser.py
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Verify: src/utils/parser.py
────────────────────────────────────────────────────────────
  Structural units detected : 11
  Seeded claims             : 3
  Covered by cited source   : 2
  Deterministic gaps        : 1

Deterministic Knowledge Gaps (no human source):
  • function: handle_concurrent_writes (lines 47–61)
      Seeded assumption: concurrent write handling — AI chose this arbitrarily

  Resolve: python poc.py add src/utils/parser.py
</code></pre>
<p>The gap shown is not something Claude admitted. It's something the analyser found by comparing the file's function list against your seeded claims. The function <code>handle_concurrent_writes</code> exists in the code but has no resolved human citation in the database. That's the gap.</p>
<h3 id="heading-what-the-exit-codes-mean">What the Exit Codes Mean</h3>
<pre><code class="language-bash">python poc.py verify src/utils/parser.py
echo $?   # Mac/Linux

python poc.py verify src/utils/parser.py
echo $LASTEXITCODE   # Windows PowerShell
</code></pre>
<ul>
<li><p><strong>Exit code 0</strong> — no gaps, all detected units have cited sources</p>
</li>
<li><p><strong>Exit code 1</strong> — gaps found, resolve with <code>poc.py add</code></p>
</li>
<li><p><strong>Exit code 2</strong> — file not found or unsupported language</p>
</li>
</ul>
<p>A non-zero exit code fails the step in GitHub Actions and in pre-commit hooks. Add <code>poc.py verify</code> to either one, and gaps block the build before they reach production.</p>
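<p>If you want to check several files in one CI step, a small wrapper can run <code>verify</code> on each and fail if any run fails. This is a sketch: <code>verify_commands</code> and <code>run_checks</code> are hypothetical helpers, and only the <code>python poc.py verify &lt;file&gt;</code> invocation comes from the tutorial:</p>
<pre><code class="language-python">import subprocess
import sys


def verify_commands(paths):
    """Build one 'python poc.py verify FILE' command per path."""
    return [[sys.executable, "poc.py", "verify", path] for path in paths]


def run_checks(commands):
    """Run every command and return 1 if any of them exits non-zero, else 0."""
    failed = False
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            failed = True
    return 1 if failed else 0
</code></pre>
<p>In a script you would finish with <code>sys.exit(run_checks(verify_commands(sys.argv[1:])))</code>. Running every file before exiting means one CI run reports all the gaps instead of stopping at the first.</p>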
<h3 id="heading-run-it-without-a-seeded-spec">Run it Without a Seeded Spec</h3>
<p>If you haven't run <code>import-spec</code> first, <code>verify</code> still works — it falls back to structural analysis and surfaces every uncited function and branch as a gap:</p>
<pre><code class="language-bash">python poc.py verify src/utils/parser.py
</code></pre>
<pre><code class="language-plaintext">⚠ No spec imported — showing all uncited structural units.
  Run: python poc.py import-spec spec.md --artifact src/utils/parser.py
  for deterministic gap detection.

Deterministic Knowledge Gaps (no human source):
  • function: parse_query (lines 1–7)
  • branch: if not text (lines 2–3)
  • function: fetch_results (lines 9–12)
  ...
</code></pre>
<p>It's less precise than the spec-writer path — every structural unit shows rather than only the ones tied to named assumptions — but it's useful as a baseline on any file, new or old.</p>
<h3 id="heading-the-strict-flag">The <code>--strict</code> Flag</h3>
<pre><code class="language-bash">python poc.py verify src/utils/parser.py --strict
</code></pre>
<p>Strict mode flags every uncited structural unit as a gap even when claims are seeded. You can use it when you want zero tolerance: any function or branch without a resolved human source fails the check.</p>
<h2 id="heading-how-to-enable-pr-enforcement">How to Enable PR Enforcement</h2>
<p>Once <code>poc.py trace</code> has saved you real hours — not before — enable the GitHub Action. The distinction matters. Turning it on day one frames the tool as overhead. Turning it on after the team already finds value frames it as a standard.</p>
<pre><code class="language-bash">git add .github/ .poc/config.json poc.py
git commit -m "chore: add proof-of-contribution"
git push
</code></pre>
<p>After that, every PR is checked for an <code>## 🤖 AI Provenance</code> section. The scaffold already created the PR template with that section included. Developers fill it in naturally once they're already running <code>poc.py trace</code> locally — the template just asks them to record what they already know.</p>
<p>Developers who write fully human code opt out by adding <code>100% human-written</code> anywhere in the PR body. The action skips the check automatically.</p>
<h3 id="heading-what-the-action-checks">What the Action Checks</h3>
<p>The action reads the PR description and looks for:</p>
<ol>
<li><p>The <code>## 🤖 AI Provenance</code> heading</p>
</li>
<li><p>At least one populated row in the attribution table</p>
</li>
</ol>
<p>If the section is missing or the table is empty, the action fails and posts a comment explaining what to add. The comment includes a link to <code>poc.py trace &lt;filepath&gt;</code> so the developer knows exactly where to look.</p>
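<p>The two checks, plus the opt-out described earlier, can be sketched in a few lines of Python. This is an illustration only, not the action's actual code: the function name <code>check_pr_body</code>, the case-sensitive matching, and the rule for what counts as a populated row (a table row below the header separator with at least one non-empty cell) are all assumptions.</p>
<pre><code class="language-python">HEADING = "## 🤖 AI Provenance"
OPT_OUT = "100% human-written"


def check_pr_body(body):
    """Return (passed, message) for a pull request description."""
    if OPT_OUT in body:
        return True, "opt-out phrase found, check skipped"
    if HEADING not in body:
        return False, "missing AI Provenance section"
    # Only look at the text that follows the heading
    section = body.split(HEADING, 1)[1]
    rows = [line.strip() for line in section.splitlines() if line.strip().startswith("|")]
    # rows[0] is the table header, rows[1] the |---| separator; data rows follow
    data_rows = [
        row for row in rows[2:]
        if any(cell.strip() for cell in row.strip("|").split("|"))
    ]
    if not data_rows:
        return False, "attribution table is empty"
    return True, "attribution found"
</code></pre>
<p>A check like this only confirms that the section exists and has content. Whether the attribution is accurate is still a question for the reviewer.</p>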
<h2 id="heading-where-to-go-next">Where to Go Next</h2>
<h3 id="heading-use-it-with-spec-writer-on-a-real-feature">Use it with spec-writer on a Real Feature</h3>
<p>The real value of <code>import-spec</code> is on actual features, not test specs. If you use <a href="https://github.com/dannwaneri/spec-writer">spec-writer</a>, the workflow is:</p>
<pre><code class="language-plaintext">/spec-writer "your feature description"
</code></pre>
<p>Save the output to <code>spec.md</code>. Then:</p>
<pre><code class="language-bash">python poc.py import-spec spec.md --artifact src/path/to/output.py
</code></pre>
<p>Build the feature with your agent. Then run <code>poc.py trace</code> to see which assumptions made it into code with no human source. Resolve the HIGH-impact gaps first — those are the ones that will cause production incidents.</p>
<h3 id="heading-activate-the-claude-code-skill">Activate the Claude Code Skill</h3>
<p>The SKILL.md file makes Claude automatically append a Provenance Block to every generated artifact when the skill is active. The block lists human sources Claude drew from and flags what it synthesized without any traceable source.</p>
<p>In Claude Code, no separate activation step is needed. The skill is already installed at <code>~/.claude/skills/proof-of-contribution/</code>, and Claude Code loads it automatically when you are in a project that has <code>.poc/config.json</code>.</p>
<p>A generated Provenance Block looks like this:</p>
<pre><code class="language-plaintext">## PROOF OF CONTRIBUTION
Generated artifact: fetch_github_discussions()
Confidence: MEDIUM

## HUMAN SOURCES THAT INSPIRED THIS

[1] GitHub GraphQL API Documentation Team
    Source type: Official Docs
    URL: docs.github.com/en/graphql
    Contribution: cursor-based pagination pattern

[2] GitHub Community (multiple contributors)
    Source type: GitHub Discussions
    URL: github.com/community/community
    Contribution: "ghost" fallback for deleted accounts
                  surfaced in bug reports

## KNOWLEDGE GAPS (AI synthesized, no human cited)
- Error handling / retry logic
- Rate limit strategy

## RECOMMENDED HUMAN EXPERTS TO CONSULT
- github.com/octokit community for pagination
</code></pre>
<p>The Knowledge Gaps section is the part no other tool produces. It's where AI admits what it synthesized without a traceable human source — before that gap becomes a production incident.</p>
<h3 id="heading-upgrade-when-you-outgrow-sqlite">Upgrade When You Outgrow SQLite</h3>
<p>The default database is SQLite — local only, no infra required. When you need team sharing or graph queries, the <code>references/</code> directory in the repo has migration guides:</p>
<table>
<thead>
<tr>
<th>Need</th>
<th>File</th>
</tr>
</thead>
<tbody><tr>
<td>Team sharing a provenance DB</td>
<td><code>references/relational-schema.md</code></td>
</tr>
<tr>
<td>Graph traversal queries</td>
<td><code>references/neo4j-implementation.md</code></td>
</tr>
<tr>
<td>Semantic web / interoperability</td>
<td><code>references/jsonld-schema.md</code></td>
</tr>
</tbody></table>
<h2 id="heading-manual-tracking-vs-proof-of-contribution">Manual Tracking vs. proof-of-contribution</h2>
<table>
<thead>
<tr>
<th></th>
<th>Manual tracking</th>
<th>proof-of-contribution</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Finding who wrote the code</strong></td>
<td>Search Slack, ask the team, dig through commits</td>
<td><code>poc.py trace &lt;file&gt;</code> — thirty seconds</td>
</tr>
<tr>
<td><strong>Knowing which parts the AI guessed</strong></td>
<td>You don't, until it breaks in production</td>
<td>Knowledge Gaps section — surfaced before the code ships</td>
</tr>
<tr>
<td><strong>Detecting gaps after the build</strong></td>
<td>Code review, if someone notices</td>
<td><code>poc.py verify</code> — static analysis, zero API calls</td>
</tr>
<tr>
<td><strong>Enforcing attribution on PRs</strong></td>
<td>Honor system</td>
<td>GitHub Action fails the PR if attribution is missing</td>
</tr>
<tr>
<td><strong>Connecting to your spec</strong></td>
<td>Copy-paste assumptions into comments manually</td>
<td><code>poc.py import-spec</code> seeds them as tracked claims automatically</td>
</tr>
<tr>
<td><strong>Infrastructure required</strong></td>
<td>None (usually a spreadsheet or nothing)</td>
<td>None — SQLite, pure Python, no paid services</td>
</tr>
</tbody></table>
<p>The tool doesn't replace code review. It gives code review the context it needs to catch the right things.</p>
<p>The archaeology scenario — two days tracing a bug through dead-end commit messages — takes thirty seconds with <code>poc.py trace</code>. The code still has gaps, and it always will. But now you know where they are.</p>
<p><em>Built by</em> <a href="https://dev.to/dannwaneri"><em>Daniel Nwaneri</em></a><em>. The spec-writer skill that feeds</em> <code>import-spec</code> <em>is at</em> <a href="https://github.com/dannwaneri/spec-writer"><em>github.com/dannwaneri/spec-writer</em></a><em>. The full proof-of-contribution repo is at</em> <a href="https://github.com/dannwaneri/proof-of-contribution"><em>github.com/dannwaneri/proof-of-contribution</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Cost-Efficient AI Agent with Tiered Model Routing ]]>
                </title>
                <description>
                    <![CDATA[ Most AI agent tutorials make the same mistake: they route every task to the most expensive model available. A character count doesn't need GPT-4. A presence check doesn't need Sonnet. A regex doesn't  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-cost-efficient-ai-agent-with-tiered-model-routing/</link>
                <guid isPermaLink="false">69d6ddbd707c1ce7688e7ea0</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude-code ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webdev ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Nwaneri ]]>
                </dc:creator>
                <pubDate>Wed, 08 Apr 2026 22:59:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/3a60436b-cbd7-4005-8e52-36291d815eea.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most AI agent tutorials make the same mistake: they route every task to the most expensive model available.</p>
<p>A character count doesn't need GPT-4. A presence check doesn't need Sonnet. A regex doesn't need anything except Python.</p>
<p>The mistake isn't using AI — it's not knowing when to stop using it.</p>
<p>This tutorial shows you how to build a tiered routing system that sends tasks to the cheapest model that can solve them. The pattern is called the cost curve. It came out of a comment thread on a DEV.to article, was implemented by three developers over a weekend, and cut the per-URL cost of a real SEO audit agent from $0.006 to effectively $0 for most pages.</p>
<p>By the end, you'll have a working <code>cost_curve.py</code> module you can drop into any agent project.</p>
<h2 id="heading-what-youll-build">What You'll Build</h2>
<p>A three-tier routing function that:</p>
<ul>
<li><p>Runs deterministic Python checks first — zero API cost</p>
</li>
<li><p>Escalates to Claude Haiku only for genuinely ambiguous cases — ~$0.0001 per call</p>
</li>
<li><p>Escalates to Claude Sonnet only when semantic judgment is required — ~$0.006 per call</p>
</li>
<li><p>Falls back gracefully when any tier fails</p>
</li>
<li><p>Returns a consistent result schema regardless of which tier handled the request</p>
</li>
</ul>
<p>The full implementation is part of <a href="https://github.com/dannwaneri/seo-agent">dannwaneri/seo-agent</a>, an open-core SEO audit agent. The cost curve module is the premium routing layer, and the principle applies to any agent with mixed-complexity tasks.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>Python 3.11 or higher</p>
</li>
<li><p>An Anthropic API key</p>
</li>
<li><p>Basic familiarity with Python and the Claude API</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-the-problem-with-calling-claude-on-everything">The Problem with Calling Claude on Everything</a></p>
</li>
<li><p><a href="#heading-the-cost-curve-explained">The Cost Curve Explained</a></p>
</li>
<li><p><a href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a href="#heading-tier-1-deterministic-python">Tier 1: Deterministic Python</a></p>
</li>
<li><p><a href="#heading-tier-2-claude-haiku-for-ambiguous-cases">Tier 2: Claude Haiku for Ambiguous Cases</a></p>
</li>
<li><p><a href="#heading-tier-3-claude-sonnet-for-semantic-judgment">Tier 3: Claude Sonnet for Semantic Judgment</a></p>
</li>
<li><p><a href="#heading-the-router-auditurl">The Router: audit_url()</a></p>
</li>
<li><p><a href="#heading-graceful-fallback">Graceful Fallback</a></p>
</li>
<li><p><a href="#heading-testing-the-cost-curve">Testing the Cost Curve</a></p>
</li>
<li><p><a href="#heading-applying-this-pattern-to-your-agent">Applying This Pattern to Your Agent</a></p>
</li>
</ol>
<h2 id="heading-the-problem-with-calling-claude-on-everything">The Problem with Calling Claude on Everything</h2>
<p>Here's what most agent code looks like:</p>
<pre><code class="language-python">def audit_url(snapshot: dict) -&gt; dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": build_prompt(snapshot)}]
    )
    return parse_response(response)
</code></pre>
<p>This works. It also calls Sonnet for every URL in the list — including the ones where the title is 142 characters long and the answer is obviously FAIL without any model involvement.</p>
<p>Claude Sonnet 4 is priced at $3 per million input tokens and $15 per million output tokens. A typical page snapshot is around 500 input tokens. That's $0.0015 per URL just for input, before output tokens. Across a 20-URL weekly audit, the total is around $0.12. Not expensive. But most of those pages have mechanical SEO issues: missing descriptions, titles over 60 characters, no canonical tag. A character count or presence check catches all of that. You don't need a model.</p>
<p>The cost curve fixes this by routing based on what the task actually requires, not on what the model is capable of.</p>
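<p>To make the pricing arithmetic in this section concrete, here is a small sketch of the per-call estimate. The 500-token input and the Sonnet 4 prices come from the text. The 300-token output figure is an assumption, chosen because it reproduces the roughly $0.006 per URL quoted in this tutorial, and the helper name <code>call_cost</code> is made up for the example.</p>
<pre><code class="language-python">def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars for one call; prices are dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Sonnet 4 pricing from the text: $3 input, $15 output per million tokens
input_only = call_cost(500, 0, 3.0, 15.0)
per_url = call_cost(500, 300, 3.0, 15.0)  # 300 output tokens is an assumption

print(f"input only: ${input_only:.4f}")
print(f"per URL: ${per_url:.4f}")
print(f"20-URL audit: ${per_url * 20:.2f}")
</code></pre>
<p>Swapping in your own token counts and prices gives a quick estimate of what an untiered agent costs before you decide how much routing is worth building.</p>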
<h2 id="heading-the-cost-curve-explained">The Cost Curve Explained</h2>
<p>In the cost curve, we have three tiers, three tools, and three price points:</p>
<p><strong>Tier 1 — Deterministic Python. Cost: $0.</strong> Check title length, description length, H1 count, canonical presence. These are not judgment calls. They're string operations. If title length &gt; 60, FAIL. No model needed.</p>
<p><strong>Tier 2 — Claude Haiku. Cost: ~$0.0001 per call.</strong> Title present but only 4 characters long. Description present but only 30 characters. Status code is a redirect. These pass the mechanical audit but something is off. Haiku is fast and cheap enough that escalating ambiguous cases costs less than the debugging time you'd spend on false positives.</p>
<p><strong>Tier 3 — Claude Sonnet. Cost: ~$0.006 per call.</strong> Pages Haiku flags as needing semantic judgment. "This title passes length but reads like a navigation label." "This description duplicates the title verbatim." Sonnet earns its cost on genuinely hard cases — not on every URL in the list.</p>
<p>The routing decision happens before any API call. The result schema is identical regardless of which tier handled the request.</p>
<h2 id="heading-project-setup">Project Setup</h2>
<pre><code class="language-bash">mkdir cost-curve-demo &amp;&amp; cd cost-curve-demo
pip install anthropic
</code></pre>
<p>Set your API key:</p>
<pre><code class="language-bash"># macOS/Linux
export ANTHROPIC_API_KEY="sk-ant-..."

# Windows PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
</code></pre>
<p>Create <code>cost_curve.py</code> — you'll build this module step by step.</p>
<h2 id="heading-tier-1-deterministic-python">Tier 1: Deterministic Python</h2>
<p>Tier 1 runs first on every URL. It checks four fields using only Python string operations. There's no API call, no latency, and no cost.</p>
<pre><code class="language-python">import json
import logging
import os
import re
from datetime import datetime, timezone

import anthropic

logger = logging.getLogger(__name__)

REDIRECT_CODES = {301, 302, 307, 308}

# Fields that trigger Tier 2 escalation
# Title or description present but suspiciously short
AMBIGUOUS_TITLE_MAX = 10   # chars — present but too short to be real
AMBIGUOUS_DESC_MAX = 50    # chars — present but too short to be useful


def _now_iso() -&gt; str:
    return datetime.now(timezone.utc).isoformat()


def _build_result(snapshot: dict, method: str) -&gt; dict:
    """Base result skeleton — same schema regardless of tier."""
    return {
        "url": snapshot.get("final_url", ""),
        "final_url": snapshot.get("final_url", ""),
        "status_code": snapshot.get("status_code"),
        "title": {"value": None, "length": 0, "status": "PASS"},
        "description": {"value": None, "length": 0, "status": "PASS"},
        "h1": {"count": 0, "value": None, "status": "PASS"},
        "canonical": {"value": None, "status": "PASS"},
        "flags": [],
        "human_review": False,
        "audited_at": _now_iso(),
        "method": method,
        "needs_tier3": False,
    }


def tier1_check(snapshot: dict) -&gt; dict:
    """
    Pure Python SEO checks. Zero API calls.

    Returns a result dict with method="deterministic".
    Sets needs_tier3=False always — Tier 1 never escalates to Tier 3 directly.
    Escalation to Tier 2 is decided by the router, not here.
    """
    result = _build_result(snapshot, "deterministic")

    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    h1s = snapshot.get("h1s") or []
    canonical = snapshot.get("canonical") or ""

    # Title check
    result["title"]["value"] = title or None
    result["title"]["length"] = len(title)
    if not title or len(title) &gt; 60:
        result["title"]["status"] = "FAIL"
        msg = "Title is missing" if not title else f"Title is {len(title)} characters (max 60)"
        result["flags"].append(msg)

    # Description check
    result["description"]["value"] = description or None
    result["description"]["length"] = len(description)
    if not description or len(description) &gt; 160:
        result["description"]["status"] = "FAIL"
        msg = "Meta description is missing" if not description else f"Meta description is {len(description)} characters (max 160)"
        result["flags"].append(msg)

    # H1 check
    result["h1"]["count"] = len(h1s)
    result["h1"]["value"] = h1s[0] if h1s else None
    if len(h1s) == 0:
        result["h1"]["status"] = "FAIL"
        result["flags"].append("H1 tag is missing")
    elif len(h1s) &gt; 1:
        result["h1"]["status"] = "FAIL"
        result["flags"].append(f"Multiple H1 tags found ({len(h1s)})")

    # Canonical check
    result["canonical"]["value"] = canonical or None
    if not canonical:
        result["canonical"]["status"] = "FAIL"
        result["flags"].append("Canonical tag is missing")

    return result
</code></pre>
<p>The key design decision: <code>tier1_check()</code> never decides whether to escalate. It just runs the checks and returns. The router decides escalation based on the result.</p>
<h2 id="heading-tier-2-claude-haiku-for-ambiguous-cases">Tier 2: Claude Haiku for Ambiguous Cases</h2>
<p>Tier 2 runs when the router sees a field that is present but suspicious, so the mechanical result might need a second look: a 4-character title that is clearly wrong, a 30-character description that's technically there but useless, or a redirect status that needs a human-readable explanation.</p>
<p>Haiku is the right model here. It's fast, cheap ($1 input / $5 output per million tokens), and sufficient for triage-level judgment. The prompt asks a narrow question: is this ambiguous enough to need Sonnet?</p>
<pre><code class="language-python">def tier2_check(snapshot: dict) -&gt; dict:
    """
    Claude Haiku call for ambiguous cases.

    Returns result with method="haiku".
    Sets needs_tier3=True if Haiku determines the case needs semantic judgment.
    Falls back to Tier 1 result on API error.
    """
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise OSError("ANTHROPIC_API_KEY is not set.")

    client = anthropic.Anthropic(api_key=api_key)

    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    status_code = snapshot.get("status_code")

    prompt = f"""You are an SEO auditor doing a quick triage check.

Page data:
- Title: {repr(title)} ({len(title)} chars)
- Meta description: {repr(description)} ({len(description)} chars)
- Status code: {status_code}

Answer these two questions with only "yes" or "no":
1. Does this page need semantic judgment beyond simple length/presence checks? 
   (e.g. title is present but clearly wrong, description is present but meaningless)
2. Is the status code a redirect that needs investigation?

Respond in this exact JSON format and nothing else:
{{"needs_tier3": true_or_false, "reason": "one sentence explanation"}}"""

    try:
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=150,
            messages=[{"role": "user", "content": prompt}],
        )
        raw = response.content[0].text.strip()
        # Strip markdown fences if present
        if raw.startswith("```"):
            lines = raw.splitlines()
            raw = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:])
        parsed = json.loads(raw)

        result = _build_result(snapshot, "haiku")
        # Copy Tier 1 field checks — Haiku doesn't redo those
        t1 = tier1_check(snapshot)
        result["title"] = t1["title"]
        result["description"] = t1["description"]
        result["h1"] = t1["h1"]
        result["canonical"] = t1["canonical"]
        result["flags"] = t1["flags"]
        result["needs_tier3"] = parsed.get("needs_tier3", False)
        if result["needs_tier3"]:
            result["flags"].append(f"Escalated to Tier 3: {parsed.get('reason', '')}")

        return result

    except Exception as exc:
        logger.warning("[tier2] Haiku API error: %s — falling back to Tier 1 result", exc)
        fallback = tier1_check(snapshot)
        fallback["method"] = "haiku-fallback"
        return fallback
</code></pre>
<p>The fallback is the critical piece. If Haiku fails — rate limit, network error, malformed response — the function returns the Tier 1 result rather than crashing. The audit continues. The URL gets flagged with <code>method="haiku-fallback"</code> so you can identify it later.</p>
<h2 id="heading-tier-3-claude-sonnet-for-semantic-judgment">Tier 3: Claude Sonnet for Semantic Judgment</h2>
<p>Tier 3 is where the full extraction prompt runs. This is the same call you'd make in a naïve implementation — the difference is that only a small fraction of URLs reach this tier.</p>
<pre><code class="language-python">def tier3_check(snapshot: dict) -&gt; dict:
    """
    Claude Sonnet call for semantic judgment.

    Returns result with method="sonnet".
    This is the full extraction prompt — same as calling the model directly.
    """
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise OSError("ANTHROPIC_API_KEY is not set.")

    client = anthropic.Anthropic(api_key=api_key)

    prompt = f"""You are an SEO auditor. Analyze this page snapshot and return ONLY a JSON object.
No prose. No explanation. No markdown fences. Raw JSON only.

Page data:
- URL: {snapshot.get('final_url')}
- Status code: {snapshot.get('status_code')}
- Title: {snapshot.get('title')}
- Meta description: {snapshot.get('meta_description')}
- H1 tags: {snapshot.get('h1s')}
- Canonical: {snapshot.get('canonical')}

Return this exact schema:
{{
  "url": "string",
  "final_url": "string",
  "status_code": number,
  "title": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}},
  "description": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}},
  "h1": {{"count": number, "value": "string or null", "status": "PASS or FAIL"}},
  "canonical": {{"value": "string or null", "status": "PASS or FAIL"}},
  "flags": ["array of strings describing specific issues"],
  "human_review": false,
  "audited_at": "ISO timestamp"
}}

PASS/FAIL rules:
- title: FAIL if null or length &gt; 60 characters, or if present but clearly not a real title
- description: FAIL if null or length &gt; 160 characters, or if present but meaningless
- h1: FAIL if count is 0 or count &gt; 1
- canonical: FAIL if null
- audited_at: use current UTC time"""

    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}],
        )
        raw = response.content[0].text.strip()
        if raw.startswith("```"):
            lines = raw.splitlines()
            raw = "\n".join(lines[1:-1] if lines[-1].strip() == "```" else lines[1:])

        result = json.loads(raw)
        result["method"] = "sonnet"
        result["needs_tier3"] = False
        return result

    except Exception as exc:
        logger.warning("[tier3] Sonnet API error: %s — falling back to Tier 1 result", exc)
        fallback = tier1_check(snapshot)
        fallback["method"] = "sonnet-fallback"
        return fallback
</code></pre>
<p>Note the prompt addition in Tier 3 that isn't in Tier 1: <code>"or if present but clearly not a real title"</code> and <code>"or if present but meaningless"</code>. That's the semantic judgment Haiku identified as needed. Tier 3 acts on it.</p>
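<p>Because the Tier 3 result is parsed straight from model output, it can be worth confirming that it matches the schema the other tiers return before passing it on. The following is a minimal sketch of such a check and is not part of the original module: the helper name <code>schema_problems</code> is hypothetical, and the key list is copied from <code>_build_result()</code> minus the two fields that <code>tier3_check()</code> sets itself.</p>
<pre><code class="language-python">REQUIRED_KEYS = {
    "url", "final_url", "status_code", "title", "description",
    "h1", "canonical", "flags", "human_review", "audited_at",
}
STATUS_FIELDS = ("title", "description", "h1", "canonical")


def schema_problems(result):
    """Return a list of schema problems; an empty list means the shape is valid."""
    # Top-level keys the model left out
    problems = sorted(REQUIRED_KEYS - set(result))
    for field in STATUS_FIELDS:
        if field not in result:
            continue  # already reported as a missing key
        value = result[field]
        status = value.get("status") if isinstance(value, dict) else None
        if status not in ("PASS", "FAIL"):
            problems.append(f"{field}.status")
    return problems
</code></pre>
<p>One option, under these assumptions, is to treat a non-empty list the same way as an API error and return the Tier 1 result instead.</p>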
<h2 id="heading-the-router-auditurl">The Router: audit_url()</h2>
<p>The router is the public interface. Everything else is an implementation detail.</p>
<pre><code class="language-python">def audit_url(snapshot: dict, tiered: bool = False) -&gt; dict:
    """
    Route a page snapshot through the appropriate audit tier.

    Args:
        snapshot: Page data from browser.py — must contain final_url,
                  status_code, title, meta_description, h1s, canonical.
        tiered: If False, delegates directly to Tier 3 (Sonnet).
                If True, routes through the cost curve.

    Returns:
        Audit result dict with method field indicating which tier ran.
    """
    if not tiered:
        # Non-tiered mode: call Sonnet directly, same as v1 behavior
        return tier3_check(snapshot)

    # Tier 1: always runs first
    t1_result = tier1_check(snapshot)

    # Check if escalation to Tier 2 is warranted
    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    status_code = snapshot.get("status_code")

    needs_tier2 = (
        # Title present but suspiciously short
        (title and len(title) &lt; AMBIGUOUS_TITLE_MAX) or
        # Description present but suspiciously short
        (description and len(description) &lt; AMBIGUOUS_DESC_MAX) or
        # Redirect status — may need explanation
        (status_code in REDIRECT_CODES)
    )

    if not needs_tier2:
        # Tier 1 result is definitive — return without any API call
        return t1_result

    # Tier 2: Haiku triage
    t2_result = tier2_check(snapshot)

    if not t2_result.get("needs_tier3", False):
        # Haiku determined no semantic judgment needed
        return t2_result

    # Tier 3: Sonnet for semantic judgment
    return tier3_check(snapshot)
</code></pre>
<p>The router logic is explicit and readable. Each decision point is a named condition. When <code>tiered=False</code>, behavior is identical to the v1 naive implementation — this is the backward compatibility guarantee that lets you add the cost curve incrementally without breaking existing audits.</p>
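<p>If you want to unit test the escalation rule without mocking any API client, one option is to pull it out of the router into its own function. The sketch below is a possible refactor rather than code from the original module. The name <code>needs_tier2</code> as a standalone helper is an assumption, and the constants are repeated so the block runs on its own.</p>
<pre><code class="language-python">REDIRECT_CODES = {301, 302, 307, 308}
AMBIGUOUS_TITLE_MAX = 10
AMBIGUOUS_DESC_MAX = 50


def needs_tier2(snapshot):
    """True when a page is present-but-suspicious and worth a Haiku triage call."""
    title = snapshot.get("title") or ""
    description = snapshot.get("meta_description") or ""
    # range(1, N) matches lengths 1..N-1: present, but shorter than the threshold
    short_title = len(title) in range(1, AMBIGUOUS_TITLE_MAX)
    short_description = len(description) in range(1, AMBIGUOUS_DESC_MAX)
    is_redirect = snapshot.get("status_code") in REDIRECT_CODES
    return short_title or short_description or is_redirect
</code></pre>
<p>A missing title or description does not trigger escalation here, which matches the router: a missing field is a definitive Tier 1 FAIL, not an ambiguous case.</p>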
<h2 id="heading-graceful-fallback">Graceful Fallback</h2>
<p>The fallback pattern appears in both Tier 2 and Tier 3. It's worth making explicit:</p>
<pre><code class="language-python"># Pattern used in both tier2_check() and tier3_check()
except Exception as exc:
    logger.warning("[tierN] API error: %s — falling back to Tier 1 result", exc)
    fallback = tier1_check(snapshot)
    fallback["method"] = "tierN-fallback"
    return fallback
</code></pre>
<p>Three things this does:</p>
<ol>
<li><p>Logs the error with enough context to debug later</p>
</li>
<li><p>Returns a valid result — the Tier 1 deterministic check always runs regardless</p>
</li>
<li><p>Tags the result with the fallback method so you can filter these in your report</p>
</li>
</ol>
<p>An agent that crashes on API errors is not production-ready. An agent that degrades gracefully and continues is.</p>
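<p>Since every result carries a <code>method</code> field, a report can show how many URLs each tier handled and which ones hit a fallback path. Here is a minimal sketch of that summary. The helper name <code>summarize_methods</code> and the sample results are hypothetical, and the results are trimmed to the two fields the summary needs.</p>
<pre><code class="language-python">from collections import Counter


def summarize_methods(results):
    """Count results per method and collect URLs that hit a fallback path."""
    counts = Counter(result["method"] for result in results)
    fallbacks = [r["url"] for r in results if r["method"].endswith("-fallback")]
    return counts, fallbacks


# Hypothetical audit results, not real data
results = [
    {"url": "https://example.com/", "method": "deterministic"},
    {"url": "https://example.com/a", "method": "haiku"},
    {"url": "https://example.com/b", "method": "haiku-fallback"},
]
counts, fallbacks = summarize_methods(results)
print(dict(counts))
print(fallbacks)
</code></pre>
<p>A rising fallback count is a useful signal on its own: the audit still completes, but those URLs only received the deterministic checks.</p>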
<h2 id="heading-testing-the-cost-curve">Testing the Cost Curve</h2>
<p>Create <code>test_cost_curve.py</code> to verify routing behavior without live API calls:</p>
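<pre><code class="language-python">import json
import os
from unittest import mock

# tier2_check() and tier3_check() raise if the key is unset, so supply a dummy one.
# The client itself is always patched, so no real request is sent.
os.environ.setdefault("ANTHROPIC_API_KEY", "test-key")

from cost_curve import audit_url, tier1_check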


def make_snapshot(title="Normal Title Under 60 Chars",
                  description="A normal meta description that is under 160 characters and describes the page content well.",
                  h1s=["Single H1"],
                  canonical="https://example.com/page",
                  status_code=200,
                  final_url="https://example.com/page"):
    return {
        "title": title,
        "meta_description": description,
        "h1s": h1s,
        "canonical": canonical,
        "status_code": status_code,
        "final_url": final_url,
    }


def test_clean_page_returns_tier1_no_api_calls():
    """Clean page: all checks pass deterministically — no API call."""
    snapshot = make_snapshot()
    with mock.patch("anthropic.Anthropic") as mock_client:
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "deterministic"
        mock_client.assert_not_called()
    print("PASS: clean page → Tier 1, zero API calls")


def test_long_title_returns_tier1_fail_no_api_call():
    """Title &gt;60 chars: FAIL from Tier 1, no API call."""
    snapshot = make_snapshot(title="A" * 70)
    with mock.patch("anthropic.Anthropic") as mock_client:
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "deterministic"
        assert result["title"]["status"] == "FAIL"
        mock_client.assert_not_called()
    print("PASS: title &gt;60 → Tier 1 FAIL, zero API calls")


def test_suspiciously_short_title_escalates_to_tier2():
    """Title present but only 3 chars: escalates to Tier 2."""
    snapshot = make_snapshot(title="SEO")  # 3 chars, under AMBIGUOUS_TITLE_MAX
    mock_response = mock.MagicMock()
    mock_response.content = [mock.MagicMock(
        text='{"needs_tier3": false, "reason": "title is short but not ambiguous"}'
    )]
    with mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.return_value = mock_response
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "haiku"
        assert mock_client.return_value.messages.create.call_count == 1
    print("PASS: short title → Tier 2 (Haiku called once)")


def test_tiered_false_calls_sonnet_directly():
    """tiered=False: Sonnet called regardless of snapshot content."""
    snapshot = make_snapshot()  # clean page, would be Tier 1 in tiered mode
    mock_response = mock.MagicMock()
    mock_response.content = [mock.MagicMock(text=json.dumps({
        "url": "https://example.com/page",
        "final_url": "https://example.com/page",
        "status_code": 200,
        "title": {"value": "Normal Title Under 60 Chars", "length": 27, "status": "PASS"},
        "description": {"value": "desc", "length": 4, "status": "PASS"},
        "h1": {"count": 1, "value": "Single H1", "status": "PASS"},
        "canonical": {"value": "https://example.com/page", "status": "PASS"},
        "flags": [],
        "human_review": False,
        "audited_at": "2026-04-01T00:00:00+00:00",
    }))]
    with mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.return_value = mock_response
        result = audit_url(snapshot, tiered=False)
        assert result["method"] == "sonnet"
        assert mock_client.return_value.messages.create.call_count == 1
    print("PASS: tiered=False → Sonnet called directly")


def test_haiku_api_failure_falls_back_to_tier1():
    """Haiku failure: falls back to Tier 1 result, no crash."""
    snapshot = make_snapshot(title="SEO")  # triggers Tier 2
    with mock.patch("anthropic.Anthropic") as mock_client:
        mock_client.return_value.messages.create.side_effect = Exception("rate limit")
        result = audit_url(snapshot, tiered=True)
        assert result["method"] == "haiku-fallback"
    print("PASS: Haiku failure → fallback to Tier 1, no crash")


if __name__ == "__main__":
    test_clean_page_returns_tier1_no_api_calls()
    test_long_title_returns_tier1_fail_no_api_call()
    test_suspiciously_short_title_escalates_to_tier2()
    test_tiered_false_calls_sonnet_directly()
    test_haiku_api_failure_falls_back_to_tier1()
    print("\nAll tests passed.")
</code></pre>
<p>Run it:</p>
<pre><code class="language-bash">python test_cost_curve.py
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">PASS: clean page → Tier 1, zero API calls
PASS: title &gt;60 → Tier 1 FAIL, zero API calls
PASS: short title → Tier 2 (Haiku called once)
PASS: tiered=False → Sonnet called directly
PASS: Haiku failure → fallback to Tier 1, no crash

All tests passed.
</code></pre>
<h2 id="heading-applying-this-pattern-to-your-agent">Applying This Pattern to Your Agent</h2>
<p>The cost curve is not SEO-specific. Any agent with mixed-complexity tasks can use it.</p>
<p>The principle: classify tasks by what they actually require before deciding which model to invoke.</p>
<p><strong>Customer support agent:</strong></p>
<ul>
<li><p>Tier 1: keyword matching for known FAQ topics — no model</p>
</li>
<li><p>Tier 2: Haiku for intent classification on ambiguous queries</p>
</li>
<li><p>Tier 3: Sonnet for complex complaints requiring judgment</p>
</li>
</ul>
<p><strong>Code review agent:</strong></p>
<ul>
<li><p>Tier 1: lint rules, syntax checks — no model</p>
</li>
<li><p>Tier 2: Haiku for common pattern detection</p>
</li>
<li><p>Tier 3: Sonnet for architectural review</p>
</li>
</ul>
<p><strong>Content moderation agent:</strong></p>
<ul>
<li><p>Tier 1: blocklist matching — no model</p>
</li>
<li><p>Tier 2: Haiku for borderline cases</p>
</li>
<li><p>Tier 3: Sonnet for context-dependent judgment</p>
</li>
</ul>
<p>The implementation pattern is the same in all three cases. The <code>audit_url()</code> router becomes <code>route_task()</code>. The tier functions change their prompts and escalation conditions. The fallback logic stays identical.</p>
<p>The key question to ask before writing any agent code: what fraction of my inputs are mechanically solvable? That fraction goes to Tier 1. The cost curve routes the rest, escalating to Haiku and then Sonnet only when a cheaper tier cannot decide.</p>
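<p>As a rough illustration of that routing, here is a minimal sketch of a generic <code>route_task()</code>. The <code>TierResult</code> type, the tier callables, and the <code>"needs-review"</code> fallback value are all hypothetical names chosen for this example, not part of the repository. A tier signals "escalate" by returning <code>None</code>.</p>

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TierResult:
    answer: str
    method: str  # which tier produced the answer


def route_task(
    task: str,
    tier1: Callable[[str], Optional[str]],
    tier2: Callable[[str], Optional[str]],
    tier3: Callable[[str], str],
) -> TierResult:
    """Try the cheapest tier first and escalate only when a tier declines."""
    answer = tier1(task)  # deterministic rules, no model call
    if answer is not None:
        return TierResult(answer, "tier1")

    try:
        answer = tier2(task)  # cheap model, e.g. Haiku
    except Exception:
        # Assumed policy: a Tier 2 failure must not crash the run.
        return TierResult("needs-review", "tier2-fallback")
    if answer is not None:
        return TierResult(answer, "tier2")

    return TierResult(tier3(task), "tier3")  # expensive model, e.g. Sonnet


# Stand-in tiers for a support agent; real ones would call a model.
faq = {"reset password": "See the password reset guide."}
result = route_task(
    "reset password",
    tier1=lambda t: faq.get(t),
    tier2=lambda t: None,
    tier3=lambda t: "escalated answer",
)
print(result.method)  # tier1
```

<p>Passing the tiers in as callables keeps the router identical across the support, code review, and moderation cases; only the functions you hand it change.</p>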
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The full implementation — including the SEO audit agent that uses this module in production — is at <a href="https://github.com/dannwaneri/seo-agent">dannwaneri/seo-agent</a>. The <code>core/</code> directory is MIT licensed. The tiered routing lives in <code>premium/cost_curve.py</code>.</p>
<p><em>This tutorial is the companion piece to</em> <a href="https://dev.to/dannwaneri/i-was-paying-0006-per-url-for-seo-audits-until-i-realized-most-needed-0-132j">I Was Paying $0.006 Per URL for SEO Audits Until I Realized Most Needed $0</a> <em>on DEV.to, which covers the architecture decisions behind the cost curve.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Local SEO Audit Agent with Browser Use and Claude API ]]>
                </title>
                <description>
                    <![CDATA[ Every digital marketing agency has someone whose job involves opening a spreadsheet, visiting each client URL, checking the title tag, meta description, and H1, noting broken links, and pasting everyt ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-local-seo-audit-agent-with-browser-use-and-claude-api/</link>
                <guid isPermaLink="false">69cb09249fffa747409f133f</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python 3 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Nwaneri ]]>
                </dc:creator>
                <pubDate>Mon, 30 Mar 2026 23:37:08 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/98f8eb73-bfe2-4990-b41a-1997a35134f2.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Every digital marketing agency has someone whose job involves opening a spreadsheet, visiting each client URL, checking the title tag, meta description, and H1, noting broken links, and pasting everything into a report. Then doing it again next week.</p>
<p>That work is deterministic. An agent can do it.</p>
<p>In this tutorial, you'll build a local SEO audit agent from scratch using Python, Browser Use, and the Claude API. The agent visits real pages in a visible browser window, extracts SEO signals using Claude, checks for broken links asynchronously, handles edge cases with a human-in-the-loop pause, and writes a structured report — all resumable if interrupted.</p>
<p>By the end, you'll have a working agent you can run against any list of URLs. It costs less than $0.01 per URL to run.</p>
<h2 id="heading-what-youll-build">What You'll Build</h2>
<p>A seven-module Python agent that:</p>
<ul>
<li><p>Reads a URL list from a CSV file</p>
</li>
<li><p>Visits each URL in a real Chromium browser (not a headless scraper)</p>
</li>
<li><p>Extracts title, meta description, H1s, and canonical tag from the rendered DOM, then grades them with the Claude API</p>
</li>
<li><p>Checks for broken links asynchronously using httpx</p>
</li>
<li><p>Detects edge cases (404s, login walls, redirects) and pauses for human input</p>
</li>
<li><p>Writes results to <code>report.json</code> incrementally — safe to interrupt and resume</p>
</li>
<li><p>Generates a plain-English <code>report-summary.txt</code> on completion</p>
</li>
</ul>
<p>The full code is on GitHub at <a href="https://github.com/dannwaneri/seo-agent">dannwaneri/seo-agent</a>.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>Python 3.11 or higher</p>
</li>
<li><p>An Anthropic API key (get one at console.anthropic.com)</p>
</li>
<li><p>Windows, macOS, or Linux</p>
</li>
<li><p>Basic familiarity with Python and the command line</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-why-browser-use-instead-of-a-scraper">Why Browser Use Instead of a Scraper</a></p>
</li>
<li><p><a href="#heading-project-structure">Project Structure</a></p>
</li>
<li><p><a href="#heading-setup">Setup</a></p>
</li>
<li><p><a href="#heading-module-1-state-management">Module 1: State Management</a></p>
</li>
<li><p><a href="#heading-module-2-browser-integration">Module 2: Browser Integration</a></p>
</li>
<li><p><a href="#heading-module-3-claude-extraction-layer">Module 3: Claude Extraction Layer</a></p>
</li>
<li><p><a href="#heading-module-4-broken-link-checker">Module 4: Broken Link Checker</a></p>
</li>
<li><p><a href="#heading-module-5-human-in-the-loop">Module 5: Human-in-the-Loop</a></p>
</li>
<li><p><a href="#heading-module-6-report-writer">Module 6: Report Writer</a></p>
</li>
<li><p><a href="#heading-module-7-the-main-loop">Module 7: The Main Loop</a></p>
</li>
<li><p><a href="#heading-running-the-agent">Running the Agent</a></p>
</li>
<li><p><a href="#heading-scheduling-for-agency-use">Scheduling for Agency Use</a></p>
</li>
<li><p><a href="#heading-what-the-results-look-like">What the Results Look Like</a></p>
</li>
</ol>
<h2 id="heading-why-browser-use-instead-of-a-scraper">Why Browser Use Instead of a Scraper</h2>
<p>The standard approach to SEO auditing is to fetch page HTML with <code>requests</code> and parse it with BeautifulSoup. That works on static pages. It breaks on JavaScript-rendered content, misses dynamically injected meta tags, and fails entirely on authenticated pages.</p>
<p>Browser Use (84,000+ GitHub stars, MIT license) takes a different approach. It controls a real Chromium browser, reads the DOM after JavaScript executes, and exposes the page through Playwright's accessibility tree. The agent sees what a human would see.</p>
<p>The practical difference: a requests-based scraper might miss a meta description injected by a React component. Browser Use won't.</p>
<p>The other difference worth naming: Browser Use reads pages semantically. A Playwright script breaks when a button's CSS class changes from <code>btn-primary</code> to <code>button-main</code>. Browser Use identifies it's still a "Submit" button and acts accordingly. The extraction logic lives in the Claude prompt, not in brittle CSS selectors.</p>
<h2 id="heading-project-structure">Project Structure</h2>
<pre><code class="language-plaintext">seo-agent/
├── index.py          # Main audit loop
├── browser.py        # Browser Use / Playwright page driver
├── extractor.py      # Claude API extraction layer
├── linkchecker.py    # Async broken link checker
├── hitl.py           # Human-in-the-loop pause logic
├── reporter.py       # Report writer
├── state.py          # State persistence (resume on interrupt)
├── input.csv         # Your URL list
├── requirements.txt
├── .env.example
└── .gitignore
</code></pre>
<h2 id="heading-setup">Setup</h2>
<p>Create a project folder and install dependencies:</p>
<pre><code class="language-bash">mkdir seo-agent &amp;&amp; cd seo-agent
pip install browser-use anthropic playwright httpx
playwright install chromium
</code></pre>
<p>Create <code>input.csv</code> with your URLs:</p>
<pre><code class="language-plaintext">url
https://example.com
https://example.com/about
https://example.com/contact
</code></pre>
<p>Create <code>.env.example</code>:</p>
<pre><code class="language-plaintext">ANTHROPIC_API_KEY=your-key-here
</code></pre>
<p>Set your API key as an environment variable before running:</p>
<pre><code class="language-bash"># macOS/Linux
export ANTHROPIC_API_KEY="sk-ant-..."

# Windows PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
</code></pre>
<p>Create <code>.gitignore</code>:</p>
<pre><code class="language-plaintext">state.json
report.json
report-summary.txt
.env
__pycache__/
*.pyc
</code></pre>
<h2 id="heading-module-1-state-management">Module 1: State Management</h2>
<p>The agent needs to track which URLs it has already audited. If the run is interrupted — power cut, keyboard interrupt, network error — it should resume from where it stopped, not start over.</p>
<p><code>state.py</code> handles this with a flat JSON file:</p>
<pre><code class="language-python">import json
import os

STATE_FILE = os.path.join(os.path.dirname(__file__), "state.json")

_DEFAULT_STATE = {"audited": [], "pending": [], "needs_human": []}


def load_state() -&gt; dict:
    if not os.path.exists(STATE_FILE):
        save_state(_DEFAULT_STATE.copy())
    with open(STATE_FILE, encoding="utf-8") as f:
        return json.load(f)


def save_state(state: dict) -&gt; None:
    with open(STATE_FILE, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)


def is_audited(url: str) -&gt; bool:
    return url in load_state()["audited"]


def mark_audited(url: str) -&gt; None:
    state = load_state()
    if url not in state["audited"]:
        state["audited"].append(url)
    save_state(state)


def add_to_needs_human(url: str) -&gt; None:
    state = load_state()
    if url not in state["needs_human"]:
        state["needs_human"].append(url)
    save_state(state)
</code></pre>
<p>The design is intentional: <code>mark_audited()</code> is called immediately after a URL is processed and written to the report. If the agent crashes mid-run, it loses at most one URL's work.</p>
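<p>One caveat: <code>save_state()</code> opens the file in <code>"w"</code> mode, which truncates it before writing, so a crash in the middle of a write could leave invalid JSON behind. If that matters for your runs, a possible hardening is to write to a temporary file and swap it into place. This is a sketch, not what the repository does; <code>save_state_atomic</code> is a hypothetical name.</p>

```python
import json
import os
import tempfile


def save_state_atomic(state: dict, path: str) -> None:
    """Write state to a temp file, then swap it into place with one rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

<p>Calling <code>save_state_atomic(state, STATE_FILE)</code> in place of the body of <code>save_state()</code> would leave the rest of the module unchanged.</p>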
<h2 id="heading-module-2-browser-integration">Module 2: Browser Integration</h2>
<p><code>browser.py</code> does the actual page navigation. It uses Playwright directly (which Browser Use installs as a dependency) to open a visible Chromium window, navigate to the URL, capture HTTP status and redirect information, and extract the raw SEO signals from the DOM.</p>
<p>The key design decisions:</p>
<p><strong>Visible browser, not headless.</strong> Set <code>headless=False</code> so you can watch the agent work. This matters for the demo and for debugging.</p>
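<p><strong>Status capture via response listener.</strong> <code>page.goto()</code> does not raise on 4xx/5xx responses, but it does raise on timeouts and network-level failures. The <code>on("response", ...)</code> handler records the status of the first response as soon as it arrives, so the code is still available if navigation fails afterwards.</p>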
<p><strong>2-second delay between visits.</strong> Prevents triggering rate limiting or bot detection on agency client sites.</p>
<p>Here is the core navigation function:</p>
<pre><code class="language-python">import asyncio
import sys
import time
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeout

TIMEOUT = 20_000  # 20 seconds


def fetch_page(url: str) -&gt; dict:
    result = {
        "final_url": url,
        "status_code": None,
        "title": None,
        "meta_description": None,
        "h1s": [],
        "canonical": None,
        "raw_links": [],
    }

    first_status = {"code": None}

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()

        def on_response(response):
            if first_status["code"] is None:
                first_status["code"] = response.status

        page.on("response", on_response)

        try:
            page.goto(url, wait_until="domcontentloaded", timeout=TIMEOUT)
            result["status_code"] = first_status["code"] or 200
            result["final_url"] = page.url

            # Extract SEO signals from DOM
            result["title"] = page.title() or None
            result["meta_description"] = page.evaluate(
                "() =&gt; { const m = document.querySelector('meta[name=\"description\"]'); "
                "return m ? m.getAttribute('content') : null; }"
            )
            result["h1s"] = page.evaluate(
                "() =&gt; Array.from(document.querySelectorAll('h1')).map(h =&gt; h.innerText.trim())"
            )
            result["canonical"] = page.evaluate(
                "() =&gt; { const c = document.querySelector('link[rel=\"canonical\"]'); "
                "return c ? c.getAttribute('href') : null; }"
            )
            result["raw_links"] = page.evaluate(
                "() =&gt; Array.from(document.querySelectorAll('a[href]'))"
                ".map(a =&gt; a.href).filter(Boolean).slice(0, 100)"
            )

        except PlaywrightTimeout:
            result["status_code"] = first_status["code"] or 408
        except Exception as exc:
            print(f"[browser] Error: {exc}", file=sys.stderr)
            result["status_code"] = first_status["code"]
        finally:
            browser.close()

    time.sleep(2)
    return result
</code></pre>
<p>A few things worth noting:</p>
<p>The <code>raw_links</code> cap at 100 is deliberate. DEV.to profile pages have hundreds of links — you don't need all of them for broken link detection.</p>
<p>The <code>wait_until="domcontentloaded"</code> setting is faster than <code>networkidle</code> and sufficient for meta tags that are present in the initial HTML or set by early scripts. Tags injected later by client-side code may need <code>wait_until="load"</code> or an explicit wait before extraction.</p>
<h2 id="heading-module-3-claude-extraction-layer">Module 3: Claude Extraction Layer</h2>
<p><code>extractor.py</code> takes the raw page snapshot from <code>browser.py</code> and calls Claude to produce a structured SEO audit result.</p>
<p>This is where most tutorials go wrong. They either write complex parsing logic in Python (fragile) or ask Claude for a free-form response and try to parse prose (unreliable). The right approach: give Claude a strict JSON schema and tell it to return nothing else.</p>
<p><strong>The prompt engineering that makes this reliable:</strong></p>
<pre><code class="language-python">import json
import os
import sys
from datetime import datetime, timezone
import anthropic

MODEL = "claude-sonnet-4-20250514"
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))


def _strip_fences(text: str) -&gt; str:
    """Remove accidental markdown code fences from Claude's response."""
    text = text.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop opening fence
        lines = lines[1:] if lines[0].startswith("```") else lines
        # Drop closing fence
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines).strip()
    return text


def extract(snapshot: dict) -&gt; dict:
    if not os.environ.get("ANTHROPIC_API_KEY"):
        raise OSError("ANTHROPIC_API_KEY is not set.")

    prompt = f"""You are an SEO auditor. Analyze this page snapshot and return ONLY a JSON object.
No prose. No explanation. No markdown fences. Raw JSON only.

Page data:
- URL: {snapshot.get('final_url')}
- Status code: {snapshot.get('status_code')}
- Title: {snapshot.get('title')}
- Meta description: {snapshot.get('meta_description')}
- H1 tags: {snapshot.get('h1s')}
- Canonical: {snapshot.get('canonical')}

Return this exact schema:
{{
  "url": "string",
  "final_url": "string",
  "status_code": number,
  "title": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}},
  "description": {{"value": "string or null", "length": number, "status": "PASS or FAIL"}},
  "h1": {{"count": number, "value": "string or null", "status": "PASS or FAIL"}},
  "canonical": {{"value": "string or null", "status": "PASS or FAIL"}},
  "flags": ["array of strings describing specific issues"],
  "human_review": false,
  "audited_at": "ISO timestamp"
}}

PASS/FAIL rules:
- title: FAIL if null or length &gt; 60 characters
- description: FAIL if null or length &gt; 160 characters  
- h1: FAIL if count is 0 (missing) or count &gt; 1 (multiple)
- canonical: FAIL if null
- flags: list every failing field with a clear description
- audited_at: use current UTC time in ISO 8601 format"""

    response = client.messages.create(
        model=MODEL,
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}],
    )

    raw = response.content[0].text
    clean = _strip_fences(raw)

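    try:
        result = json.loads(clean)
    except json.JSONDecodeError as exc:
        print(f"[extractor] JSON parse error: {exc}", file=sys.stderr)
        return _error_result(snapshot, str(exc))

    # The model has no clock, so stamp the audit time locally.
    result["audited_at"] = datetime.now(timezone.utc).isoformat()
    return result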


def _error_result(snapshot: dict, reason: str) -&gt; dict:
    return {
        "url": snapshot.get("final_url", ""),
        "final_url": snapshot.get("final_url", ""),
        "status_code": snapshot.get("status_code"),
        "title": {"value": None, "length": 0, "status": "ERROR"},
        "description": {"value": None, "length": 0, "status": "ERROR"},
        "h1": {"count": 0, "value": None, "status": "ERROR"},
        "canonical": {"value": None, "status": "ERROR"},
        "flags": [f"Extraction error: {reason}"],
        "human_review": True,
        "audited_at": datetime.now(timezone.utc).isoformat(),
    }
</code></pre>
<p>Two things make this reliable in production:</p>
<p>First, <code>_strip_fences()</code> handles the case where Claude wraps its response in <code>```json</code> fences despite being told not to. This happens occasionally with Sonnet and consistently breaks <code>json.loads()</code> if you don't handle it.</p>
<p>Second, the <code>_error_result()</code> fallback means the agent never crashes on a bad Claude response — it logs the error and marks the URL for human review, then continues to the next URL.</p>
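<p>A third safeguard you may want, offered here as a sketch rather than something the repository does: language models are not always accurate at counting characters, so the title and description lengths can be recomputed locally and the PASS/FAIL status overwritten. <code>recheck_lengths</code> and the two constants are hypothetical names; the thresholds are the 60 and 160 character rules from the prompt.</p>

```python
TITLE_MAX = 60
DESCRIPTION_MAX = 160


def recheck_lengths(result: dict, snapshot: dict) -> dict:
    """Overwrite model-reported lengths and statuses with locally computed ones."""
    limits = {
        "title": (snapshot.get("title"), TITLE_MAX),
        "description": (snapshot.get("meta_description"), DESCRIPTION_MAX),
    }
    for field, (value, limit) in limits.items():
        length = len(value) if value else 0
        # Same rule as the prompt: FAIL if missing or longer than the limit.
        ok = bool(value) and length <= limit
        result[field] = {
            "value": value or None,
            "length": length,
            "status": "PASS" if ok else "FAIL",
        }
    return result
```

<p>You could call this on the parsed result inside <code>extract()</code>, leaving Claude responsible for the flags and the human-readable descriptions.</p>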
<p><strong>Cost:</strong> Claude Sonnet 4 is priced at $3 per million input tokens and $15 per million output tokens. A typical page snapshot is around 500 input tokens; the structured JSON response is around 300 output tokens. That works out to roughly $0.006 per URL, or about $0.12 for a 20-URL audit.</p>
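<p>To rerun that arithmetic with your own token counts, a small helper is enough. The prices and token estimates are the ones quoted above; <code>estimate_cost</code> is a hypothetical name, and actual usage will vary with page size.</p>

```python
INPUT_PRICE_PER_MTOK = 3.00    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per million output tokens


def estimate_cost(input_tokens: int, output_tokens: int, urls: int = 1) -> float:
    """Estimated USD cost for auditing `urls` pages of similar size."""
    per_url = (
        input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000
    return per_url * urls


print(f"{estimate_cost(500, 300):.4f}")      # 0.0060 per URL
print(f"{estimate_cost(500, 300, 20):.2f}")  # 0.12 for 20 URLs
```

<p>Output tokens account for three quarters of the per-URL cost here, so trimming the response schema saves more than trimming the prompt.</p>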
<h2 id="heading-module-4-broken-link-checker">Module 4: Broken Link Checker</h2>
<p><code>linkchecker.py</code> takes the <code>raw_links</code> list from the browser snapshot and checks same-domain links for broken status using async HEAD requests.</p>
<p>The design choices:</p>
<ul>
<li><p><strong>Same-domain only.</strong> Checking every external link on a page would take minutes and isn't what agency clients need. Filter to links on the same domain as the page being audited.</p>
</li>
<li><p><strong>HEAD requests, not GET.</strong> Faster, lower bandwidth, sufficient for status code detection.</p>
</li>
<li><p><strong>Cap at 50 links.</strong> Pages like DEV.to article listings have hundreds of internal links. Checking all of them would dominate the runtime.</p>
</li>
<li><p><strong>Concurrent requests via asyncio.</strong> All links are checked in parallel, not sequentially.</p>
</li>
</ul>
<pre><code class="language-python">import asyncio
import logging
from urllib.parse import urlparse
import httpx

CAP = 50
TIMEOUT = 5.0
logger = logging.getLogger(__name__)


def _same_domain(link: str, final_url: str) -&gt; bool:
    if not link:
        return False
    lower = link.strip().lower()
    if lower.startswith(("#", "mailto:", "javascript:", "tel:", "data:")):
        return False
    try:
        page_host = urlparse(final_url).netloc.lower()
        parsed = urlparse(link)
        return parsed.scheme in ("http", "https") and parsed.netloc.lower() == page_host
    except Exception:
        return False


async def _check_link(client: httpx.AsyncClient, url: str) -&gt; tuple[str, bool]:
    try:
        resp = await client.head(url, follow_redirects=True, timeout=TIMEOUT)
        return url, resp.status_code &gt;= 400  # any 2xx after redirects counts as healthy
    except Exception:
        return url, True  # Timeout or connection error = broken


async def _run_checks(links: list[str]) -&gt; list[str]:
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*[_check_link(client, url) for url in links])
    return [url for url, broken in results if broken]


def check_links(raw_links: list[str], final_url: str) -&gt; dict:
    same_domain = [l for l in raw_links if _same_domain(l, final_url)]

    capped = len(same_domain) &gt; CAP
    if capped:
        logger.warning("Page has %d same-domain links — capping at %d.", len(same_domain), CAP)
        same_domain = same_domain[:CAP]

    broken = asyncio.run(_run_checks(same_domain))

    return {
        "broken": broken,
        "count": len(broken),
        "status": "FAIL" if broken else "PASS",
        "capped": capped,
    }
</code></pre>
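<p>Firing up to 50 requests at once can itself look like abusive traffic to a small client site. If you see that, one option is to bound the number of requests in flight with a semaphore. The sketch below uses a stand-in <code>fake_check</code> coroutine instead of a real HEAD request so it runs without network access; <code>check_all</code>, <code>fake_check</code>, and <code>MAX_CONCURRENCY</code> are hypothetical names.</p>

```python
import asyncio
from typing import Awaitable, Callable

MAX_CONCURRENCY = 10  # assumed limit; tune for the target site


async def check_all(
    links: list[str],
    check_one: Callable[[str], Awaitable[bool]],
    limit: int = MAX_CONCURRENCY,
) -> list[str]:
    """Return the links for which check_one() reports broken (True)."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> tuple[str, bool]:
        async with semaphore:  # at most `limit` checks run at the same time
            return url, await check_one(url)

    results = await asyncio.gather(*(guarded(url) for url in links))
    return [url for url, broken in results if broken]


async def fake_check(url: str) -> bool:
    # Stand-in for a real HEAD request: treat paths ending in /missing as broken.
    await asyncio.sleep(0)
    return url.endswith("/missing")


links = ["https://example.com/", "https://example.com/missing"]
broken = asyncio.run(check_all(links, fake_check))
print(broken)  # ['https://example.com/missing']
```

<p>In <code>linkchecker.py</code>, the same guard could wrap the call to <code>_check_link()</code> inside <code>_run_checks()</code>.</p>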
<h2 id="heading-module-5-human-in-the-loop">Module 5: Human-in-the-Loop</h2>
<p>This is the part most automation tutorials skip. What happens when the agent hits a login wall? A page that returns 403? A URL that redirects to a "Subscribe to continue reading" page?</p>
<p>Most scripts either crash or silently skip. Neither is acceptable in an agency context.</p>
<p><code>hitl.py</code> handles this with two functions: one that detects whether a pause is needed, and one that handles the pause itself.</p>
<pre><code class="language-python">from state import add_to_needs_human

LOGIN_KEYWORDS = {"login", "sign in", "sign-in", "access denied", "log in", "unauthorized"}
REDIRECT_CODES = {301, 302, 307, 308}


def should_pause(snapshot: dict) -&gt; bool:
    code = snapshot.get("status_code")

    # Navigation failed entirely
    if code is None:
        return True

    # Non-200, non-redirect
    if code != 200 and code not in REDIRECT_CODES:
        return True

    # Login wall detection
    title = (snapshot.get("title") or "").lower()
    h1s = [h.lower() for h in (snapshot.get("h1s") or [])]

    if any(kw in title for kw in LOGIN_KEYWORDS):
        return True
    if any(kw in h1 for kw in LOGIN_KEYWORDS for h1 in h1s):
        return True

    return False


def pause_reason(snapshot: dict) -&gt; str:
    code = snapshot.get("status_code")
    if code is None:
        return "Navigation failed (None status)"
    if code != 200 and code not in REDIRECT_CODES:
        return f"Unexpected status code: {code}"
    return "Possible login wall detected"


def pause_and_prompt(url: str, reason: str) -&gt; str:
    print(f"\n⚠️  HUMAN REVIEW NEEDED")
    print(f"   URL:    {url}")
    print(f"   Reason: {reason}")
    print(f"   Options: [s] skip  [r] retry  [q] quit\n")

    while True:
        choice = input("Your choice: ").strip().lower()
        if choice in ("s", "r", "q"):
            return {"s": "skip", "r": "retry", "q": "quit"}[choice]
        print("   Enter s, r, or q.")
</code></pre>
<p>The <code>should_pause()</code> function catches four cases: navigation failure, unexpected HTTP status, login keywords in the title, and login keywords in H1 tags. The login keyword check is what catches "Please sign in to continue" pages that return 200 but are effectively inaccessible.</p>
<p>In <code>--auto</code> mode (for scheduled runs), the main loop skips the <code>pause_and_prompt()</code> call and automatically handles these cases by logging the URL to <code>needs_human[]</code> in state and continuing.</p>
<h2 id="heading-module-6-report-writer">Module 6: Report Writer</h2>
<p><code>reporter.py</code> writes results incrementally. This is important: results are written after each URL is audited, not batched at the end. If the run is interrupted, you don't lose completed work.</p>
<pre><code class="language-python">import json
import os
from datetime import datetime, timezone

REPORT_JSON = os.path.join(os.path.dirname(__file__), "report.json")
REPORT_TXT = os.path.join(os.path.dirname(__file__), "report-summary.txt")


def _load_report() -&gt; list:
    if not os.path.exists(REPORT_JSON):
        return []
    with open(REPORT_JSON, encoding="utf-8") as f:
        return json.load(f)


def write_result(result: dict) -&gt; None:
    """Append or update a result in report.json."""
    entries = _load_report()
    url = result.get("url", "")

    # Update existing entry if URL already present (handles retries)
    for i, entry in enumerate(entries):
        if entry.get("url") == url:
            entries[i] = result
            break
    else:
        entries.append(result)

    with open(REPORT_JSON, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)


def _is_overall_pass(result: dict) -&gt; bool:
    fields = ["title", "description", "h1", "canonical"]
    for field in fields:
        if result.get(field, {}).get("status") != "PASS":
            return False
    if result.get("broken_links", {}).get("status") == "FAIL":
        return False
    return True


def write_summary() -&gt; None:
    entries = _load_report()
    passed = sum(1 for e in entries if _is_overall_pass(e))

    lines = []
    for entry in entries:
        overall = "PASS" if _is_overall_pass(entry) else "FAIL"
        failed_fields = [
            f for f in ["title", "description", "h1", "canonical", "broken_links"]
            if entry.get(f, {}).get("status") == "FAIL"
        ]
        suffix = f" [{', '.join(failed_fields)}]" if failed_fields else ""
        lines.append(f"{entry.get('url', 'unknown'):&lt;60} | {overall}{suffix}")

    lines.append("")
    lines.append(f"{passed}/{len(entries)} URLs passed")

    with open(REPORT_TXT, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
</code></pre>
<p>The deduplication in <code>write_result()</code> handles retries cleanly. If a URL is retried after a human reviews a login wall and authenticates, the new result replaces the old one rather than creating a duplicate entry.</p>
<h2 id="heading-module-7-the-main-loop">Module 7: The Main Loop</h2>
<p><code>index.py</code> wires everything together. It reads the URL list, loads state, skips already-audited URLs, and runs the audit loop.</p>
<pre><code class="language-python">import argparse
import csv
import os
import sys

from state import is_audited, mark_audited, add_to_needs_human
from browser import fetch_page
from extractor import extract
from linkchecker import check_links
from hitl import should_pause, pause_reason, pause_and_prompt
from reporter import write_result, write_summary

INPUT_CSV = os.path.join(os.path.dirname(__file__), "input.csv")


def read_urls(path: str) -&gt; list[str]:
    with open(path, newline="", encoding="utf-8") as f:
        return [row["url"].strip() for row in csv.DictReader(f) if row.get("url", "").strip()]


def run(auto: bool = False):
    if not os.environ.get("ANTHROPIC_API_KEY"):
        print("Error: ANTHROPIC_API_KEY environment variable is not set.")
        sys.exit(1)

    urls = read_urls(INPUT_CSV)
    pending = [u for u in urls if not is_audited(u)]

    print(f"Starting audit: {len(pending)} pending, {len(urls) - len(pending)} already done.\n")

    total = len(urls)

    try:
        for i, url in enumerate(pending, start=1):
            position = urls.index(url) + 1
            print(f"[{position}/{total}] {url}", end=" -&gt; ", flush=True)

            # Browser navigation
            snapshot = fetch_page(url)

            # Human-in-the-loop check
            if should_pause(snapshot):
                reason = pause_reason(snapshot)

                if auto:
                    print(f"AUTO-SKIPPED ({reason})")
                    add_to_needs_human(url)
                    mark_audited(url)
                    continue

                action = pause_and_prompt(url, reason)
                if action == "quit":
                    print("Exiting.")
                    break
                elif action == "skip":
                    add_to_needs_human(url)
                    mark_audited(url)
                    continue
                # "retry" falls through to re-fetch below
                snapshot = fetch_page(url)

            # Claude extraction
            result = extract(snapshot)

            # Broken link check
            links = check_links(snapshot.get("raw_links", []), snapshot.get("final_url", url))
            result["broken_links"] = links

            # Write result immediately
            write_result(result)
            mark_audited(url)

            overall = "PASS" if all(
                result.get(f, {}).get("status") == "PASS"
                for f in ["title", "description", "h1", "canonical"]
            ) and links["status"] == "PASS" else "FAIL"

            print(overall)

    except KeyboardInterrupt:
        print("\n\nInterrupted. Progress saved. Re-run to continue.")
        return

    write_summary()
    print("\nAudit complete. Report saved to report.json and report-summary.txt")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--auto", action="store_true", help="Auto-skip URLs requiring human review")
    args = parser.parse_args()
    run(auto=args.auto)
</code></pre>
<p>The <code>KeyboardInterrupt</code> handler gives the run a clean exit; the resume itself comes from <code>state.json</code>. When you press Ctrl+C, the handler prints a message and returns. Because <code>mark_audited()</code> is called after <code>write_result()</code> for each URL, the next run skips everything already processed.</p>
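<p>The resume behaviour can be seen in isolation with a small simulation. This is a sketch with stand-ins: an in-memory list replaces <code>state.json</code>, and a hypothetical <code>stop_after</code> parameter replaces the Ctrl+C.</p>

```python
from typing import Optional


def run_audit(urls: list[str], audited: list[str], stop_after: Optional[int] = None) -> list[str]:
    """Process pending URLs in order; stop_after simulates an interruption."""
    pending = [u for u in urls if u not in audited]
    processed = []
    for count, url in enumerate(pending, start=1):
        # In the real loop, write_result() runs here before the URL is marked.
        audited.append(url)  # stands in for mark_audited(url)
        processed.append(url)
        if stop_after is not None and count == stop_after:
            break  # stands in for Ctrl+C
    return processed


urls = ["https://example.com", "https://example.com/about", "https://example.com/contact"]
audited: list[str] = []  # stands in for state.json

first = run_audit(urls, audited, stop_after=2)
second = run_audit(urls, audited)
print(first)   # ['https://example.com', 'https://example.com/about']
print(second)  # ['https://example.com/contact']
```

<p>The second call does no repeated work because the pending list is rebuilt from the persisted state every time the script starts.</p>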
<h2 id="heading-running-the-agent">Running the Agent</h2>
<p>Interactive mode (pauses on edge cases):</p>
<pre><code class="language-bash">python index.py
</code></pre>
<p>Auto mode (skips edge cases, adds to <code>needs_human[]</code>):</p>
<pre><code class="language-bash">python index.py --auto
</code></pre>
<p>When it runs, you'll see the browser window open for each URL and the terminal print progress:</p>
<pre><code class="language-plaintext">Starting audit: 7 pending, 0 already done.

[1/7] https://example.com -&gt; PASS
[2/7] https://example.com/about -&gt; FAIL
[3/7] https://example.com/contact -&gt; AUTO-SKIPPED (Unexpected status code: 404)
...
Audit complete. Report saved to report.json and report-summary.txt
</code></pre>
<p>To resume after an interruption:</p>
<pre><code class="language-bash">python index.py --auto
# Starting audit: 4 pending, 3 already done.
</code></pre>
<h2 id="heading-scheduling-for-agency-use">Scheduling for Agency Use</h2>
<p>For recurring weekly audits, create a batch file and schedule it with Windows Task Scheduler.</p>
<p>Create <code>run-audit.bat</code>:</p>
<pre><code class="language-batch">@echo off
set ANTHROPIC_API_KEY=your-key-here
cd /d C:\Users\yourname\Desktop\seo-agent
python index.py --auto
</code></pre>
<p>In Windows Task Scheduler:</p>
<ol>
<li><p>Create a new Basic Task</p>
</li>
<li><p>Set the trigger to Weekly, Monday at 7:00 AM</p>
</li>
<li><p>Set the action to "Start a program"</p>
</li>
<li><p>Browse to your <code>run-audit.bat</code> file</p>
</li>
</ol>
<p>Check <code>report-summary.txt</code> on Monday morning. URLs in <code>needs_human[]</code> in <code>state.json</code> need manual review — login walls, paywalls, or pages that returned unexpected status codes.</p>
<p>For macOS/Linux, use cron:</p>
<pre><code class="language-bash"># Run every Monday at 7am
0 7 * * 1 cd /path/to/seo-agent &amp;&amp; ANTHROPIC_API_KEY=your-key python index.py --auto
</code></pre>
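<p>Whichever scheduler you use, the Monday-morning check can be scripted too. The sketch below is a minimal example, assuming <code>state.json</code> is a JSON object with a top-level <code>needs_human</code> list. The helper name <code>load_needs_human</code> is hypothetical and not part of the agent's code.</p>
<pre><code class="language-python">import json
from pathlib import Path


def load_needs_human(state_path="state.json"):
    """Return the entries queued for manual review, or an empty list.

    Assumption: state.json is a JSON object with a top-level "needs_human" list.
    """
    path = Path(state_path)
    if not path.exists():
        return []
    state = json.loads(path.read_text(encoding="utf-8"))
    return list(state.get("needs_human", []))
</code></pre>
<p>Calling <code>load_needs_human()</code> from the agent's directory and printing each entry gives you the manual-review list without opening the file by hand.</p>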
<h2 id="heading-what-the-results-look-like">What the Results Look Like</h2>
<p>I ran this agent against seven of my own published pages across Hashnode, freeCodeCamp, and DEV.to. Every single one failed.</p>
<pre><code class="language-plaintext">https://hashnode.com/@dannwaneri                    | FAIL [h1]
https://freecodecamp.org/news/claude-code-skill     | FAIL [description]
https://freecodecamp.org/news/stop-letting-ai-guess | FAIL [description]
https://freecodecamp.org/news/rag-system-handbook   | FAIL [title, description]
https://freecodecamp.org/news/author/dannwaneri     | FAIL [description]
https://dev.to/dannwaneri/gatekeeping-panic         | FAIL [title]
https://dev.to/dannwaneri/production-rag-system     | FAIL [title]

0/7 URLs passed
</code></pre>
<p>The freeCodeCamp description issues are partly platform-level — freeCodeCamp's template sometimes truncates or omits meta descriptions for article listing pages. The DEV.to title issues are mine. Article titles that work as headlines often exceed 60 characters in the <code>&lt;title&gt;</code> tag.</p>
<p>A note on the 60-character title rule: this is a display threshold, not a ranking penalty. Google indexes titles of any length. The 60-character guideline reflects approximately how many characters fit in a desktop SERP result before truncation. Titles over 60 characters often still rank — they just get cut off in search results, which can hurt click-through rate. The agent flags display risk, not a ranking violation.</p>
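<p>To make the distinction concrete, here is a minimal sketch of a check that reports display risk only. It is not the agent's actual implementation: the function name <code>title_display_risk</code> and the returned fields are hypothetical, and real truncation depends on pixel width rather than a strict character count.</p>
<pre><code class="language-python">TITLE_DISPLAY_LIMIT = 60  # approximate desktop SERP width, in characters


def title_display_risk(title, limit=TITLE_DISPLAY_LIMIT):
    """Report how far a title runs past the display guideline.

    This describes truncation risk only; it makes no claim about ranking.
    """
    visible = title.strip()
    overflow = max(0, len(visible) - limit)
    return {
        "length": len(visible),
        "overflow": overflow,
        "truncation_risk": overflow != 0,
    }
</code></pre>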
<h2 id="heading-next-steps">Next Steps</h2>
<p>The agent as built handles the core SEO audit workflow. Obvious extensions:</p>
<ul>
<li><p><strong>Performance metrics</strong> — add a Lighthouse or PageSpeed Insights API call per URL</p>
</li>
<li><p><strong>Structured data validation</strong> — check for JSON-LD schema markup and validate it</p>
</li>
<li><p><strong>Email delivery</strong> — send <code>report-summary.txt</code> via SMTP after the run completes</p>
</li>
<li><p><strong>Multi-client support</strong> — separate <code>input.csv</code> files per client, separate report directories</p>
</li>
</ul>
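<p>As a starting point for the email delivery extension, here is a rough sketch using only Python's standard library. It is not part of the repository: the function names, the subject line, and the SMTP host and credentials you pass in are all placeholders, and it assumes your mail server accepts STARTTLS on the port you choose.</p>
<pre><code class="language-python">import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_report_email(summary_path, sender, recipient):
    """Wrap the plain-text summary file in an email message."""
    message = EmailMessage()
    message["Subject"] = "Weekly SEO audit summary"  # placeholder subject
    message["From"] = sender
    message["To"] = recipient
    message.set_content(Path(summary_path).read_text(encoding="utf-8"))
    return message


def send_report(message, host, port=587, username=None, password=None):
    """Send the message over SMTP with STARTTLS; log in only if credentials are given."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if username and password:
            server.login(username, password)
        server.send_message(message)
</code></pre>
<p>Building the message separately from sending it lets you inspect the email before any network call happens.</p>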
<p>The full code including all seven modules is at <a href="https://github.com/dannwaneri/seo-agent">dannwaneri/seo-agent</a>. Clone it, add your URLs, and run it.</p>
<p><em>If you found this useful, I write about practical AI agent setups for developers and agencies at</em> <a href="https://dev.to/dannwaneri"><em>DEV.to/@dannwaneri</em></a><em>. The DEV.to companion piece covers the design decisions behind the agent — why HITL matters, why Browser Use over scrapers, and what the audit results mean for your own published content.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build Your Own Claude Code Skill ]]>
                </title>
                <description>
                    <![CDATA[ Every developer eventually has a workflow they repeat. A way they write commit messages. A checklist they run before opening a pull request. A structure they follow when reviewing code. They do it man ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-your-own-claude-code-skill/</link>
                <guid isPermaLink="false">69c6ecde7cf27065104cd8a1</guid>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Productivity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webdev ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Nwaneri ]]>
                </dc:creator>
                <pubDate>Fri, 27 Mar 2026 20:47:26 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c0947834-4b11-46c6-ab61-994667e70a7e.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Every developer eventually has a workflow they repeat. A way they write commit messages. A checklist they run before opening a pull request. A structure they follow when reviewing code. They do it manually, explain it to their agents in every session, and watch the agent interpret it differently each time.</p>
<p>Agent skills fix this. A skill is a markdown file that loads into Claude Code's context automatically when you need it. You write the workflow once. The agent follows it every time. And because skills follow an open standard, the same file works in Claude Code, GitHub Copilot, Cursor, and Gemini CLI.</p>
<p>This tutorial shows you how to build a skill from scratch. You will build a commit-message-writer — a skill that reads your staged changes and generates a structured commit message following the Conventional Commits standard. By the end, you will have a working skill installed and ready to use, and you will understand the structure well enough to build any skill you need.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-what-an-agent-skill-is">What an Agent Skill Is</a></p>
</li>
<li><p><a href="#heading-how-to-choose-what-to-build">How to Choose What to Build</a></p>
</li>
<li><p><a href="#heading-how-to-structure-your-skill">How to Structure Your Skill</a></p>
</li>
<li><p><a href="#heading-how-to-write-the-description">How to Write the Description</a></p>
</li>
<li><p><a href="#heading-how-to-write-the-instructions">How to Write the Instructions</a></p>
</li>
<li><p><a href="#heading-how-to-build-the-commit-message-writer-skill">How to Build the commit-message-writer Skill</a></p>
</li>
<li><p><a href="#heading-how-to-install-and-test-your-skill">How to Install and Test Your Skill</a></p>
</li>
<li><p><a href="#heading-how-to-improve-your-skill-over-time">How to Improve Your Skill Over Time</a></p>
</li>
<li><p><a href="#heading-where-to-go-next">Where to Go Next</a></p>
</li>
</ol>
<h2 id="heading-what-an-agent-skill-is">What an Agent Skill Is</h2>
<p>A skill is a folder containing a <code>SKILL.md</code> file. That file has two parts: a YAML frontmatter block at the top, and a markdown body below it.</p>
<pre><code class="language-plaintext">my-skill/
└── SKILL.md
</code></pre>
<p>The frontmatter tells the agent what the skill is called and when to use it. The body tells the agent what to do when it loads the skill. Here is the minimal structure:</p>
<pre><code class="language-yaml">---
name: my-skill
description: What this skill does and when to use it.
---
 
# My Skill
 
Instructions for the agent go here.
</code></pre>
<p>When you invoke a skill, either explicitly with <code>/skill-name</code> or by describing what you want, the agent reads the SKILL.md body and follows the instructions inside it. The frontmatter is not part of those instructions. It's metadata used to decide whether to load the skill at all.</p>
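<p>If it helps to see the two parts as data, here is a small Python sketch that splits a SKILL.md file into frontmatter and body. It only illustrates the file layout and says nothing about how any agent parses skills internally. The function name <code>split_skill_md</code> is hypothetical, and the parser handles only flat <code>key: value</code> lines, not full YAML.</p>
<pre><code class="language-python">def split_skill_md(text):
    """Split SKILL.md text into (frontmatter dict, body string).

    Simplification: only flat "key: value" lines are understood.
    Multi-line YAML values are out of scope for this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    try:
        # Index of the line that closes the frontmatter block.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("frontmatter is never closed with '---'") from None
    frontmatter = {}
    for line in lines[1:end]:
        key, separator, value = line.partition(":")
        if separator:
            frontmatter[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return frontmatter, body
</code></pre>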
<h3 id="heading-how-the-agent-decides-to-load-a-skill">How the Agent Decides to Load a Skill</h3>
<p>This is the most important thing to understand before you write your first skill: <strong>the agent decides whether to load your skill based entirely on the description field.</strong></p>
<p>Skills appear in Claude Code's context as a list of names and descriptions. When you make a request, the agent scans that list and loads any skill whose description matches what you're asking for. If the description is vague, the skill won't load when you need it. If the description is too narrow, it won't load for variations of the same request.</p>
<p>The instructions in the body only matter after the skill loads. Getting the description right is what determines whether the skill loads at all.</p>
<h3 id="heading-what-skills-are-not">What Skills Are Not</h3>
<p>Skills are instruction files. They cannot run code on their own — but they can instruct the agent to run code using its existing tools. They are not plugins, extensions, or packages. They have no runtime. They are markdown files the agent reads, like a recipe a chef follows.</p>
<h2 id="heading-how-to-choose-what-to-build">How to Choose What to Build</h2>
<p>The best skills share three properties.</p>
<ol>
<li><p><strong>They encode a repeatable workflow.</strong> If you do something differently every time, a skill won't help. If you follow the same steps every session — even if you explain them differently each time — that's a skill candidate.</p>
</li>
<li><p><strong>They have a clear trigger.</strong> You should be able to finish the sentence "I need this skill when I want to...". If you can't finish that sentence in one clause, the workflow isn't scoped enough for a skill.</p>
</li>
<li><p><strong>They produce a consistent output format.</strong> Skills that output in a fixed structure — a commit message, a code review, a spec — are easier to build and test than skills that produce open-ended prose.</p>
</li>
</ol>
<p>Good candidates: commit messages, pull request descriptions, code reviews, changelog entries. Bad candidates: "help me think through this", "make this better" — too open-ended to encode in a skill.</p>
<p>For this tutorial, commit message generation is the right scope. The trigger is obvious (you want to commit), the workflow is defined (read staged changes, apply Conventional Commits format), and the output is structured (a commit message with a specific shape).</p>
<h2 id="heading-how-to-structure-your-skill">How to Structure Your Skill</h2>
<p>Every skill starts as a single folder with a single file:</p>
<pre><code class="language-plaintext">commit-message-writer/
└── SKILL.md
</code></pre>
<p>As skills grow, they can include additional files the agent loads as needed:</p>
<pre><code class="language-plaintext">commit-message-writer/
├── SKILL.md          ← always loaded when skill triggers
└── references/
    └── examples.md   ← loaded only when the agent needs examples
</code></pre>
<p>The SKILL.md body should stay under 500 lines. If your instructions are growing beyond that, move supporting detail into a <code>references/</code> subfolder and tell the agent when to read those files. This keeps the skill lean — the agent only loads what it needs.</p>
<p>For this tutorial, a single SKILL.md is enough.</p>
<h2 id="heading-how-to-write-the-description">How to Write the Description</h2>
<p>The description field is the trigger condition. It determines when your skill loads and when it doesn't. Most skills fail not because the instructions are wrong, but because the description doesn't match how people actually ask for help.</p>
<p>Here is a weak description:</p>
<pre><code class="language-yaml">description: Generates commit messages.
</code></pre>
<p>This will undertrigger. "Generate a commit message" will load it. "Write a commit for my changes" probably won't. "Summarize my staged diff" definitely won't — even though all three are asking for the same thing.</p>
<p>Here is a stronger description:</p>
<pre><code class="language-yaml">description: Generates structured commit messages following the Conventional Commits standard. Use when you want to commit your changes and need a well-formatted message. Triggers on "write a commit message", "commit my changes", "summarize my staged diff", "what should my commit say", or any request to describe or document code changes for version control.
</code></pre>
<p>The pattern is: <strong>what the skill does + when to use it + specific trigger phrases</strong>. The trigger phrases cover the different ways a developer might ask for the same thing.</p>
<p>Two rules for descriptions:</p>
<p><strong>Be specific about the output.</strong> "Generates commit messages" is vague. "Generates structured commit messages following the Conventional Commits standard" tells the agent and the user exactly what they'll get.</p>
<p><strong>Be slightly pushy.</strong> The agent has a natural tendency to undertrigger skills — to handle requests itself rather than loading a skill. A description that explicitly lists trigger phrases counteracts this. You are not being redundant. You are training the trigger.</p>
<h2 id="heading-how-to-write-the-instructions">How to Write the Instructions</h2>
<p>The body of SKILL.md is where you define what the agent does when the skill loads. Good instructions follow two principles.</p>
<p><strong>Generate first, clarify second.</strong> The agent should produce output immediately rather than asking clarifying questions. If it needs to make assumptions, it should make them and flag them — not ask. Asking questions before producing output adds friction and loses the benefit of having a skill at all.</p>
<p><strong>Define the output format explicitly.</strong> Don't say "write a good commit message." Say exactly what the structure is, what fields are required, what the character limits are. The more specific the output format, the more consistent the results.</p>
<p>Here is what weak instructions look like:</p>
<pre><code class="language-markdown"># Commit Message Writer
 
Look at the staged changes and write a commit message that describes what changed.
</code></pre>
<p>That will produce different results every time — different formats, different lengths, different conventions. It's not a skill. It's a prompt.</p>
<p>Here is what strong instructions look like:</p>
<pre><code class="language-markdown"># Commit Message Writer
 
Read the staged diff using `git diff --staged`. Generate a commit message
following the Conventional Commits standard.
 
Output format:
type(scope): short description under 72 characters
 
Body (if changes are non-trivial):
- What changed and why, not how
- One bullet per logical change
 
Footer (if applicable):
BREAKING CHANGE: description
Closes #issue-number
</code></pre>
<p>The agent knows exactly what to produce. The output will be consistent across sessions, across projects, and across agents that support the standard.</p>
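<p>An explicit format also means the output can be checked mechanically. The sketch below is one way to do that in Python, under a few assumptions: the name <code>check_header</code> is hypothetical, the 72-character limit is applied to the description part of the first line, and the pattern covers only the <code>type(scope): description</code> shape shown above rather than the full Conventional Commits specification.</p>
<pre><code class="language-python">import re

# type(scope): description, with the scope optional.
HEADER_PATTERN = re.compile(r"^([a-z]+)(\(([^()]+)\))?: (.+)$")
MAX_DESCRIPTION_LENGTH = 72  # the description must be shorter than this


def check_header(header):
    """Return a list of problems with a commit header; an empty list means it passes."""
    match = HEADER_PATTERN.match(header)
    if match is None:
        return ["header does not match 'type(scope): description'"]
    problems = []
    description = match.group(4)
    # range(72) covers lengths 0 to 71, which is "under 72 characters".
    if len(description) not in range(MAX_DESCRIPTION_LENGTH):
        problems.append("description is not under 72 characters")
    return problems
</code></pre>
<p>A check like this is useful when you compare output across sessions: a header either matches the shape or it doesn't.</p>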
<h2 id="heading-how-to-build-the-commit-message-writer-skill">How to Build the <code>commit-message-writer</code> Skill</h2>
<p>Now build it. Create the skill directory:</p>
<pre><code class="language-bash">mkdir -p ~/.claude/skills/commit-message-writer
</code></pre>
<p>On Windows PowerShell:</p>
<p><strong>Note:</strong> PowerShell uses backtick (<code>`</code>) for line continuation, not backslash.</p>
<pre><code class="language-powershell">New-Item -ItemType Directory -Force -Path "$HOME\.claude\skills\commit-message-writer"
</code></pre>
<p>Create the SKILL.md file inside that directory. Here is the complete content:</p>
<pre><code class="language-markdown">---
name: commit-message-writer
description: Generates structured commit messages following the Conventional Commits
  standard. Use when you want to commit your changes and need a well-formatted message.
  Triggers on "write a commit message", "commit my changes", "summarize my staged
  diff", "what should my commit say", or any request to describe or document staged
  changes for version control.
---
 
# commit-message-writer
 
You generate structured commit messages from staged git changes.
 
## How to invoke
 
Run `git diff --staged` to read the staged changes. If nothing is staged, tell the
user and suggest they run `git add` first.
 
Generate first. Do not ask clarifying questions before producing the commit message.
If you need to make assumptions about scope or type, make them and note them after
the output.
 
## Output format
 
~~~
type(scope): short description
 
[body — optional, include if changes are non-trivial]
 
[footer — optional]
~~~
 
**Type** — choose one:
- `feat` — a new feature
- `fix` — a bug fix
- `docs` — documentation changes only
- `refactor` — code change that neither fixes a bug nor adds a feature
- `test` — adding or updating tests
- `chore` — build process, tooling, or dependency updates
 
**Scope** — the module, file, or area affected. Use the directory name or component
name. Omit if the change spans the entire codebase.
 
**Short description** — imperative mood, under 72 characters, no period at the end.
"Add user authentication" not "Added user authentication" or "Adds user authentication."
 
**Body** — what changed and why, not how. One bullet per logical change. Skip if the
short description is self-explanatory.
 
**Footer** — include `BREAKING CHANGE:` if the commit breaks backward compatibility.
Include `Closes #N` if it resolves a GitHub issue.
 
## Quality rules
 
- Never use "updated", "changed", or "modified" in the short description — be specific
- Never write "various improvements" or "misc fixes" — name what improved
- If more than three files changed across unrelated concerns, flag it:
  "These changes may be better split into separate commits: [list concerns]"
- The short description must be under 72 characters — count before outputting
 
## Example output
 
Input: staged changes adding a rate limiter to an API endpoint
 
~~~
feat(api): add rate limiting to /query endpoint
 
- Limits requests to 100 per minute per IP using Cloudflare's rate limit binding
- Returns 429 with Retry-After header when limit is exceeded
- Adds rate limit configuration to wrangler.toml
 
Closes #47
~~~
</code></pre>
<p>Save that file. The skill is built.</p>
<h2 id="heading-how-to-install-and-test-your-skill">How to Install and Test Your Skill</h2>
<h3 id="heading-verify-the-file-exists">Verify the File Exists</h3>
<pre><code class="language-bash">cat ~/.claude/skills/commit-message-writer/SKILL.md
</code></pre>
<p>You should see the full SKILL.md content. If you get an error, check the directory path.</p>
<h3 id="heading-test-the-skill">Test the Skill</h3>
<p>Open Claude Code in any git repository that has staged changes. Type:</p>
<pre><code class="language-plaintext">/commit-message-writer
</code></pre>
<p>The agent will read your staged diff and produce a commit message following the format you defined.</p>
<p>You can also trigger it naturally:</p>
<pre><code class="language-plaintext">write a commit message for my staged changes
</code></pre>
<pre><code class="language-plaintext">what should my commit say
</code></pre>
<pre><code class="language-plaintext">summarize my diff for git
</code></pre>
<p>All three should load the skill and produce a structured commit message. If the skill doesn't trigger on natural language requests, the description needs more trigger phrases — see the improvement section below.</p>
<h3 id="heading-test-edge-cases">Test Edge Cases</h3>
<p>Test these cases before relying on the skill in production:</p>
<pre><code class="language-bash"># Stage nothing, then ask for a commit message
git reset  # unstage everything so that nothing is staged
# In Claude Code: "write a commit message"
# Expected: skill tells you nothing is staged and suggests git add
</code></pre>
<pre><code class="language-bash"># Stage changes across unrelated files
git add src/api.ts src/styles.css README.md package.json
# In Claude Code: "write a commit message"  
# Expected: skill flags that commits may be better split
</code></pre>
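<p>Before running either test, it can help to confirm what is actually staged. The following sketch is an optional helper and not part of the skill. The function names are hypothetical, and grouping by file extension is only a crude stand-in for "unrelated concerns", which the agent judges from the diff itself.</p>
<pre><code class="language-python">import subprocess
from pathlib import PurePosixPath


def staged_files(repo_dir="."):
    """Return the staged paths reported by `git diff --staged --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--staged", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def group_by_suffix(paths):
    """Group paths by file extension as a rough proxy for separate concerns."""
    groups = {}
    for path in paths:
        suffix = PurePosixPath(path).suffix or "(no extension)"
        groups.setdefault(suffix, []).append(path)
    return groups
</code></pre>
<p>An empty list from <code>staged_files()</code> means you are set up for the first test. Several groups from <code>group_by_suffix()</code> suggest the staged changes are mixed enough for the second.</p>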
<h2 id="heading-how-to-improve-your-skill-over-time">How to Improve Your Skill Over Time</h2>
<p>The first version of any skill is a draft. You improve it by observing where it produces inconsistent or wrong output, then updating the instructions.</p>
<h3 id="heading-when-the-skill-undertriggers">When the Skill Undertriggers</h3>
<p>If you type "summarize my changes for git" and the skill doesn't load, add that phrase to the description's trigger list:</p>
<pre><code class="language-yaml">description: ... Triggers on "write a commit message", "commit my changes",
  "summarize my staged diff", "summarize my changes for git", ...
</code></pre>
<p>The description is your primary lever for fixing triggering problems.</p>
<h3 id="heading-when-the-output-format-drifts">When the Output Format Drifts</h3>
<p>If the agent starts producing commit messages that don't match your format — wrong type, missing scope, body in the wrong style — the instructions need to be more explicit. Add a concrete example that shows the failure and the correct output:</p>
<pre><code class="language-markdown">## Common mistakes to avoid
 
Wrong: "Updated the authentication flow"
Right: "refactor(auth): simplify token validation logic"
 
Wrong: "Fixed bugs"
Right: "fix(api): handle null response from upstream service"
</code></pre>
<p>Concrete counterexamples are more effective than abstract rules.</p>
<h3 id="heading-when-the-scope-grows">When the Scope Grows</h3>
<p>If you find yourself wanting the skill to handle related tasks — reviewing commit messages, generating changelogs, writing PR descriptions — resist the urge to add everything to one skill. Build separate skills. Each skill should do one thing well. The Agent Skills standard is designed for composition, not for monolithic instructions.</p>
<h2 id="heading-where-to-go-next">Where to Go Next</h2>
<p>The commit-message-writer covers the core pattern. The same structure works for any repeatable workflow.</p>
<p><strong>Pull request descriptions</strong> follow the same shape — read the diff, apply a structure, produce consistent output. The trigger phrases are different ("write a PR description", "summarize my branch for review") and the output format adds sections for motivation and testing, but the SKILL.md structure is identical.</p>
<p><strong>Code review checklists</strong> work well as skills when your team has a standard review process. The trigger is "review this code" or "check this PR", and the instructions encode whatever your team actually checks — security concerns, test coverage, naming conventions.</p>
<p>The commit-message-writer is the simplest skill architecture — instructions only. As your skills grow more specialized, two other patterns become useful.</p>
<p>The first adds a <code>references/</code> directory: the voice-humanizer skill loads a CORPUS.md file containing the author's published writing, which the agent reads when it needs to check output against a specific style. The second adds quality rules and structured output formats that make results stricter and more consistent — that's the pattern spec-writer uses to surface assumptions inline. Each is the same SKILL.md structure at a different level of complexity.</p>
<p>Start with instructions only. Add references when the agent needs external context. Add output format rules when consistency matters more than flexibility.</p>
<p>The Agent Skills standard is supported in Claude Code, GitHub Copilot in VS Code, Cursor, and Gemini CLI. A skill you build once installs across all of them. The install path differs by agent:</p>
<table>
<thead>
<tr>
<th>Agent</th>
<th>Skills directory</th>
</tr>
</thead>
<tbody><tr>
<td>Claude Code</td>
<td><code>~/.claude/skills/</code></td>
</tr>
<tr>
<td>GitHub Copilot</td>
<td><code>~/.copilot/skills/</code> or <code>.github/skills/</code></td>
</tr>
<tr>
<td>Cursor</td>
<td><code>~/.cursor/skills/</code></td>
</tr>
<tr>
<td>Gemini CLI</td>
<td><code>~/.gemini/skills/</code></td>
</tr>
</tbody></table>
<p>The SKILL.md format is the same across all of them.</p>
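<p>Because only the directory changes, copying a skill into each location is easy to script. Here is a minimal Python sketch, assuming the user-level paths from the table and Python 3.8 or later. The function name <code>install_skill</code> is hypothetical, and the project-level <code>.github/skills/</code> path is left out.</p>
<pre><code class="language-python">import shutil
from pathlib import Path

# User-level skills directories from the table, relative to the home directory.
AGENT_SKILL_DIRS = {
    "Claude Code": ".claude/skills",
    "GitHub Copilot": ".copilot/skills",
    "Cursor": ".cursor/skills",
    "Gemini CLI": ".gemini/skills",
}


def install_skill(skill_dir, home=None):
    """Copy a skill folder into each agent's skills directory.

    Returns the list of directories that were written.
    """
    source = Path(skill_dir).resolve()
    if not (source / "SKILL.md").is_file():
        raise FileNotFoundError(f"{source} has no SKILL.md")
    home = Path(home) if home is not None else Path.home()
    installed = []
    for relative in AGENT_SKILL_DIRS.values():
        target = home / relative / source.name
        if target.resolve() == source:
            continue  # the skill already lives here, nothing to copy
        shutil.copytree(source, target, dirs_exist_ok=True)
        installed.append(target)
    return installed
</code></pre>
<p>For example, <code>install_skill(Path.home() / ".claude/skills/commit-message-writer")</code> would copy the skill you just built into the other three directories and skip the one it already lives in.</p>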
<p>The commit-message-writer you just built is a working skill. The next one will take less time. By the third, you will start seeing workflows you repeat and immediately think: that should be a skill.</p>
<p>That's the point.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What Happened When I Replaced Copilot with Claude Code for 2 Weeks ]]>
                </title>
                <description>
                    <![CDATA[ GitHub Copilot costs $10/month, and I'd been using it for two years without thinking twice. But when Claude Code launched, I got curious. What if I just... switched? I didn't want to just add Claude C ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-happened-when-i-replaced-copilot-with-claude-code-for-2-weeks/</link>
                <guid isPermaLink="false">69c6d07e7cf2706510370b13</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ copilot ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GitHub ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Programming Tips ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Balajee Asish Brahmandam ]]>
                </dc:creator>
                <pubDate>Fri, 27 Mar 2026 18:46:22 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/b4f5a663-3ef6-4fcb-a08c-1c0ff36c495d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>GitHub Copilot costs $10/month, and I'd been using it for two years without thinking twice. But when Claude Code launched, I got curious. What if I just... switched?</p>
<p>I didn't want to just add Claude Code to my stack. I actually wanted to replace Copilot entirely for two weeks. I kept everything else the same – same editor, same projects, same workflow. I just swapped the autocomplete suggestion tool.</p>
<p>Here's what broke, what improved, and whether I went back.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-the-setup">The Setup</a></p>
</li>
<li><p><a href="#heading-what-worked-better">What Worked Better</a></p>
</li>
<li><p><a href="#heading-what-broke-or-slowed-things-down">What Broke (Or Slowed Things Down)</a></p>
</li>
<li><p><a href="#heading-the-first-week-vs-the-second-week">The First Week vs The Second Week</a></p>
</li>
<li><p><a href="#heading-why-i-went-back">Why I Went Back</a></p>
</li>
<li><p><a href="#heading-the-honest-verdict">The Honest Verdict</a></p>
</li>
<li><p><a href="#heading-what-i-actually-use-now">What I Actually Use Now</a></p>
</li>
<li><p><a href="#heading-copilot-vs-claude-code-the-breakdown">Copilot vs Claude Code — The Breakdown</a></p>
</li>
<li><p><a href="#heading-a-word-on-developer-experience">A Word on Developer Experience</a></p>
</li>
<li><p><a href="#heading-what-would-make-me-switch">What Would Make Me Switch</a></p>
</li>
<li><p><a href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h2 id="heading-the-setup">The Setup</h2>
<p><strong>Environment:</strong></p>
<ul>
<li><p>Python 3.12 for backend work (Django REST framework specifically)</p>
</li>
<li><p>React/TypeScript for frontend</p>
</li>
<li><p>VS Code as my editor</p>
</li>
<li><p>A mid-sized project with about 15k lines of code across backend and frontend</p>
</li>
<li><p>Two weeks, normal workload (roughly 30-40 hours of coding)</p>
</li>
<li><p>Working on features I'd normally tackle: adding endpoints, debugging issues, writing tests</p>
</li>
</ul>
<p><strong>What I did:</strong></p>
<ul>
<li><p>Disabled GitHub Copilot completely. Uninstalled the extension.</p>
</li>
<li><p>Set up Claude Code (via their CLI and VS Code integration).</p>
</li>
<li><p>Kept everything else identical: same repos, same Git flow, same daily work.</p>
</li>
<li><p>Tracked time on each task to see if there was a real difference.</p>
</li>
</ul>
<p><strong>Ground rules:</strong></p>
<ul>
<li><p>I couldn't use Copilot as a fallback. This was an honest comparison.</p>
</li>
<li><p>I logged every time I got frustrated or felt like Claude Code was slowing me down.</p>
</li>
<li><p>I kept track of bugs I caught vs. bugs I missed.</p>
</li>
</ul>
<p>The goal: Does Claude Code work as a day-to-day replacement for Copilot, or does it force me back?</p>
<h2 id="heading-what-worked-better">What Worked Better</h2>
<h3 id="heading-accuracy">Accuracy</h3>
<p>Copilot sometimes suggests things that are close but not quite right. It might finish a regex pattern 80% correctly, and I have to tweak it. It happens maybe 20% of the time.</p>
<p>Claude Code was more accurate. In the first week, I noticed fewer "close but wrong" suggestions. When I typed a function signature, Claude got the implementation right more often than Copilot did.</p>
<p>One example: I was writing a utility to parse JSON and handle errors. Copilot suggested:</p>
<pre><code class="language-python">import json


def parse_json(data):
    try:
        return json.loads(data)
    except:
        return None
</code></pre>
<p>That's sloppy. It catches all exceptions and silently fails.</p>
<p>Claude Code suggested:</p>
<pre><code class="language-python">import json
import logging


def parse_json(data):
    try:
        return json.loads(data)
    except json.JSONDecodeError as e:
        logging.error(f"Failed to parse JSON: {e}")
        return None
    except Exception as e:
        logging.error(f"Unexpected error: {e}")
        raise
</code></pre>
<p>Better error handling. More production-ready. That's a real difference.</p>
<p>I estimate Claude Code's suggestions were "immediately usable" about 85% of the time. Copilot was more like 70%.</p>
<h3 id="heading-understanding-context">Understanding Context</h3>
<p>Claude Code seems to understand your project better than Copilot. When I opened a file in Claude Code, it picked up on:</p>
<ul>
<li><p>My project's naming conventions (I use <code>fetch_</code> for async functions, <code>get_</code> for sync).</p>
</li>
<li><p>My error handling style.</p>
</li>
<li><p>What libraries I was using.</p>
</li>
</ul>
<p>Copilot sometimes forgot these patterns or suggested things using the wrong library. Claude Code was more consistent.</p>
<p>One morning I was adding a new endpoint to an existing API. I typed the route signature:</p>
<pre><code class="language-python">@app.post("/api/users")
async def create_user(data: UserPayload):
</code></pre>
<p>Copilot might suggest:</p>
<pre><code class="language-python">    response = requests.post(...)
</code></pre>
<p>(Wrong! That's sync. This function is async.)</p>
<p>Claude Code suggested:</p>
<pre><code class="language-python">    async with httpx.AsyncClient() as client:
        response = await client.post(...)
</code></pre>
<p>It remembered that the entire codebase uses async/await and httpx for async calls. That's attention to detail.</p>
<h3 id="heading-reasoning-about-requirements">Reasoning About Requirements</h3>
<p>Sometimes Copilot just completes code. It doesn't think about whether it makes sense.</p>
<p>Claude Code seemed to reason about whether the suggestion was actually what you wanted. A few times, when I was writing ambiguous code, Claude Code offered a clarifying suggestion instead of just finishing it.</p>
<p>Example: I started a function for sorting users:</p>
<pre><code class="language-python">def sort_users(users):
</code></pre>
<p>Copilot would auto-complete with some sorting logic, but I'd have to check if it was what I meant.</p>
<p>Claude Code would sometimes suggest:</p>
<pre><code class="language-python">def sort_users(users, key="created_at", reverse=False):
</code></pre>
<p>It was thinking: "Sorting is ambiguous. What key? What order?" It was right more often than not.</p>
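<p>For illustration, here is roughly what a completed version of that signature could look like. This is my own sketch and not the suggestion Claude Code produced, and it assumes each user is a dictionary that contains the sort field.</p>
<pre><code class="language-python">from operator import itemgetter


def sort_users(users, key="created_at", reverse=False):
    """Return a new list of user dicts ordered by the given field."""
    return sorted(users, key=itemgetter(key), reverse=reverse)
</code></pre>
<p>With explicit <code>key</code> and <code>reverse</code> parameters, the caller states the ordering instead of inheriting whatever the completion guessed.</p>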
<h2 id="heading-what-broke-or-slowed-things-down">What Broke (Or Slowed Things Down)</h2>
<h3 id="heading-response-time">Response Time</h3>
<p>This was the biggest issue. Copilot is instant. I type <code>def get_</code> and it finishes before I can blink. It's autocomplete, and autocomplete needs to be fast. The latency is maybe 100-200ms.</p>
<p>Claude Code has a noticeable delay. Maybe 1-2 seconds before suggestions appear. On day one, that felt fine – I had time to think. By day two, I was annoyed. By day three, I was genuinely frustrated.</p>
<p>Over a day of coding, that adds up. If you're typing 20 functions and each one has a 2-second delay, that's 40 seconds of just waiting. It doesn't sound like much, but it breaks flow. Flow is where the good coding happens.</p>
<p>I'd type faster than Claude Code could suggest, so I'd often just finish the code myself. By the time a suggestion appeared, I'd already moved on, which defeated the purpose.</p>
<p>I tested this by tracking time. Same function, same complexity:</p>
<ul>
<li><p><strong>With Copilot:</strong> 3 minutes (including auto-complete time)</p>
</li>
<li><p><strong>With Claude Code:</strong> 5 minutes (waiting for suggestions + finishing manually)</p>
</li>
</ul>
<p>The delay isn't theoretical. It's real and measurable.</p>
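<p>If you want to run the same kind of comparison, a rough sketch for logging task times might look like this. The <code>timed</code> helper, the <code>timings</code> dictionary, and the label are hypothetical names rather than part of either tool, and the <code>sleep</code> call only stands in for the real work of writing a function.</p>
<pre><code class="language-python">import time
from contextlib import contextmanager

timings = {}  # maps a tool label to a list of elapsed seconds


@contextmanager
def timed(label):
    """Record how long the wrapped block of work takes under a label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


def average(label):
    """Mean elapsed seconds across all runs recorded for a label."""
    runs = timings[label]
    return sum(runs) / len(runs)


with timed("copilot"):
    time.sleep(0.01)  # stand-in for writing the function
</code></pre>
<p>Wrapping each task in <code>timed(...)</code> under a different label gives you per-tool averages you can compare at the end of the day.</p>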
<p><strong>The truth:</strong> Copilot is an autocomplete tool. It needs sub-second latency. Claude Code, being more powerful, is inherently slower. That's a fundamental tradeoff. You can't have both "instant" and "smart." Choose one.</p>
<h3 id="heading-no-inline-acceptance">No Inline Acceptance</h3>
<p>With Copilot, I press Tab to accept. It's in my muscle memory. Tab = accept.</p>
<p>Claude Code doesn't work exactly the same way. I had to click or use a different keyboard shortcut. Small thing, but it broke my rhythm constantly. I'd write code, see a suggestion, and instinctively press Tab. Nothing would happen. Then I'd remember: "Oh right, it's a different tool."</p>
<p>Even after two weeks, I never fully got used to it.</p>
<h3 id="heading-disconnected-from-flow">Disconnected From Flow</h3>
<p>Copilot is so embedded in the editor that I don't think about it. It's just there, like spellcheck. Claude Code feels like a separate tool I'm using, which means I'm more aware of it. That sounds like a good thing, but it's actually more cognitively expensive.</p>
<p>I wanted to type and have suggestions appear. Instead, I felt like I was using a tool. There's a difference. It's the same difference between walking and thinking about walking. When you're thinking about your walking mechanics, you walk worse.</p>
<p>This affected my productivity more than I expected. On day three, I found myself just typing manually instead of waiting for suggestions. It wasn't a conscious decision. I'd just start typing and then remember "oh, the suggestion came in." By then I'd already finished half the function myself.</p>
<h3 id="heading-limited-to-the-file">Limited to the File</h3>
<p>Copilot understands your entire project. It knows what's in other files, what libraries you import, what conventions you follow. If I'm importing a utility function that doesn't exist yet, Copilot knows to suggest the import with the path I'd use.</p>
<p>Claude Code seemed more limited to the current file. Sometimes it would suggest imports that weren't already in the file, or use patterns different from the rest of my codebase. Not often, but enough to notice. On one occasion, it suggested a database query pattern that was different from my whole codebase. It would've worked, but it would've been inconsistent.</p>
<p>This is less of a limitation and more of a design difference. Claude Code is built for depth on individual files, not breadth across a project.</p>
<h2 id="heading-the-first-week-vs-the-second-week">The First Week vs The Second Week</h2>
<p><strong>Week 1:</strong> I was excited. Claude Code felt smarter. I noticed the accuracy advantage. But the latency was starting to annoy me.</p>
<p><strong>Week 2:</strong> The novelty wore off. The latency was more annoying. I was missing Copilot's speed. I found myself disabling Claude Code's suggestions and typing manually more often, which defeated the purpose. "If I'm typing it all manually anyway, why switch?"</p>
<p>By day 10, I was typing code faster with Claude Code disabled than with it enabled. That's when I knew it wasn't working for me.</p>
<h2 id="heading-why-i-went-back">Why I Went Back</h2>
<p>On day 14, I re-enabled Copilot.</p>
<p>The first thing I noticed: speed. Code was completing again instantly. My rhythm came back. I hit Tab, it accepted, I moved on. That's the entire appeal of Copilot: it's frictionless.</p>
<p>I also realized how much I'd been manually typing. On days 10-14, I was writing more code by hand because the suggestions felt too slow to be worth waiting for. Without realizing it, I'd completely stopped using Claude Code's suggestions. I was just typing. That's the worst of both worlds: no AI help and the cognitive burden of being aware you're using a tool that's not helping.</p>
<p>Was I sacrificing accuracy? A little. But I'm accurate enough that I catch mistakes in review. For day-to-day, Copilot is fine.</p>
<p>The second thing: it just works. No weird setup, no integration issues. It's part of VS Code. It's always there.</p>
<p>By day 15, I was back to normal productivity, maybe even higher because the flow was better.</p>
<h2 id="heading-the-honest-verdict">The Honest Verdict</h2>
<p>Claude Code isn't a Copilot replacement. It's not worse. It's different. It's like comparing a calculator to a calculator app on your phone. One is designed for speed and muscle memory. One is designed to be a full computer in your pocket. They're not competitors.</p>
<p>If I'd tried Claude Code expecting it to be better at debugging, I would've been happy. I was trying it expecting it to replace my autocomplete, which is where it falls flat.</p>
<p>The experiment was valuable, though. It taught me that:</p>
<ol>
<li><p>Latency matters more than I expected. A 2-second delay breaks flow.</p>
</li>
<li><p>Familiarity matters. Tab to accept is burned into my muscle memory.</p>
</li>
<li><p>Tool stacking works. Claude Code is great for debugging. Copilot is great for autocomplete. Together they're better than either alone.</p>
</li>
</ol>
<h2 id="heading-what-i-actually-use-now">What I Actually Use Now</h2>
<p>I didn't abandon Claude Code. I just changed how I use it.</p>
<ul>
<li><p><strong>Claude Code:</strong> For debugging, analysis, and big changes. "Why is this function slow?" "Refactor this for readability." I invoke it deliberately when I need thinking, not continuous autocomplete.</p>
</li>
<li><p><strong>Copilot:</strong> For routine coding. Finishing functions, auto-completing imports, normal flow.</p>
</li>
</ul>
<p>That's the working solution. Claude Code is powerful, but it's not a Copilot replacement for daily work. It's a different tool for a different use case.</p>
<h2 id="heading-copilot-vs-claude-code-the-breakdown">Copilot vs Claude Code: The Breakdown</h2>
<p><strong>Copilot is better for:</strong></p>
<ul>
<li><p>Pure autocomplete speed</p>
</li>
<li><p>Routine, well-understood coding</p>
</li>
<li><p>Low friction, high flow state</p>
</li>
<li><p>Simple suggestions</p>
</li>
</ul>
<p><strong>Claude Code is better for:</strong></p>
<ul>
<li><p>Complex suggestions that require reasoning</p>
</li>
<li><p>Debugging and analysis</p>
</li>
<li><p>Understanding intent (not just completing code)</p>
</li>
<li><p>Asking questions about code you've written</p>
</li>
</ul>
<p>If you're a Copilot user thinking about switching, don't do it as a straight replacement. Claude Code isn't faster. It's smarter, but slower, and for day-to-day autocomplete, faster wins.</p>
<p>Try using both. Use Copilot for normal coding, Claude Code for debugging and complex changes. If you only want to pay for one, stick with Copilot. It's cheaper, it's faster, and it does the job.</p>
<p>If you're a heavy debugger and you spend a lot of time analyzing code, Claude Code might be worth it. But as a Copilot replacement? No.</p>
<h2 id="heading-a-word-on-developer-experience">A Word on Developer Experience</h2>
<p>What surprised me wasn't just the latency. It was how much I missed the seamlessness of Copilot. With Copilot, I don't think about it. It's like breathing: automatic. I type, it suggests, I accept or reject, I move on.</p>
<p>With Claude Code, I was constantly aware I was using a tool. I'd finish typing before the suggestion appeared. I'd have to remember the keyboard shortcut. I'd have to context-switch to look at the suggestion.</p>
<p>That awareness is exhausting. It's why flow state is so important to programming. The best tools get out of your way. Copilot gets out of the way. Claude Code, for autocomplete purposes, doesn't.</p>
<p>Developer experience isn't a nice-to-have. It's core to productivity. A tool that's 10% smarter but 50% more annoying is worse, not better.</p>
<h2 id="heading-what-would-make-me-switch">What Would Make Me Switch</h2>
<ul>
<li><p>Claude Code needs to get faster. Sub-second latency for suggestions.</p>
</li>
<li><p>It needs better editor integration. Tab to accept, like Copilot.</p>
</li>
<li><p>It needs to understand the full project, not just the current file.</p>
</li>
</ul>
<p>Once those three things happen, it'd be competitive. Until then, Copilot is still the better choice for daily coding work.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>This experiment taught me something: better isn't always better. Claude Code is arguably smarter than Copilot. But Copilot is more efficient. For autocomplete, efficiency matters more than intelligence.</p>
<p>It's like comparing a sports car to a Jeep. The sports car is faster on a highway. The Jeep is better on a mountain trail. Neither is "better." They're different. Copilot is trying to predict the next line of code fast. Claude Code is trying to understand your code deeply. They're solving different problems.</p>
<p>I went back to Copilot not because Claude Code is bad. It's actually impressive. But it's a different category of tool. Using it for autocomplete is like using a hammer when you need a screwdriver. The hammer might be fancier, but the screwdriver does the job.</p>
<p>What surprised me most was how much latency matters. I didn't expect a 2-second delay to be that noticeable. But when you're in the zone, typing code, and the autocomplete lags, it completely breaks your flow. It's not about the absolute time. It's about the interruption.</p>
<p>Don't take my word for it though. Run your own two-week experiment. Pick a tool, commit to it, and see what happens. Track your productivity. Track your frustration. The best tool is the one you'll actually use. And you can only find that out by using it.</p>
<h2 id="heading-whats-next">What's Next?</h2>
<p>If you found this useful, I write about Docker, AI tools, and developer workflows every week. I'm Balajee Asish: Docker Captain, freeCodeCamp contributor, and currently building my way through the AI tools space one project at a time.</p>
<p>Got questions or built something similar? Drop a comment below or find me on <a href="https://github.com/balajee-asish">GitHub</a> and <a href="https://linkedin.com/in/balajee-asish">LinkedIn</a>.</p>
<p>Happy building.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Stop Letting AI Agents Guess Your Requirements ]]>
                </title>
                <description>
                    <![CDATA[ I spent 64% of my weekly Claude budget before Wednesday building a tool designed to reduce Claude usage. That's the kind of irony that deserves its own specification. The tool is spec-writer: a Claude ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-stop-letting-ai-agents-guess-your-requirements/</link>
                <guid isPermaLink="false">69c1dc5930a9b81e3ac400fc</guid>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ClaudeCode ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webdev ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webdevelopment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Productivity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TypeScript ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Nwaneri ]]>
                </dc:creator>
                <pubDate>Tue, 24 Mar 2026 00:35:37 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/06a3ff85-4d60-4e05-b494-8d2f3e6024ac.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>I spent 64% of my weekly Claude budget before Wednesday building a tool designed to reduce Claude usage. That's the kind of irony that deserves its own specification.</p>
<p>The tool is spec-writer: a Claude Code skill that takes a vague feature request and generates a structured spec, technical plan, and task breakdown before a single line of code gets written.</p>
<p>The problem it solves is one most developers hit within their first week of using AI coding agents seriously: the agent writes confidently in the wrong direction and you pay for it twice, once in tokens, once in rewrites.</p>
<p>This tutorial shows you how to install spec-writer, how to invoke it on a real feature, and how to read the output so you can catch the assumptions that would have wasted your time.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-the-problem-with-prompting-agents-directly">The Problem with Prompting Agents Directly</a></p>
</li>
<li><p><a href="#heading-what-spec-driven-development-is">What Spec-Driven Development Is</a></p>
</li>
<li><p><a href="#heading-how-spec-writer-works">How spec-writer Works</a></p>
</li>
<li><p><a href="#heading-how-to-install-spec-writer">How to Install spec-writer</a></p>
</li>
<li><p><a href="#heading-how-to-write-your-first-spec">How to Write Your First Spec</a></p>
</li>
<li><p><a href="#heading-how-to-read-the-output">How to Read the Output</a></p>
</li>
<li><p><a href="#heading-how-to-hand-the-spec-to-your-agent">How to Hand the Spec to Your Agent</a></p>
</li>
<li><p><a href="#heading-where-to-go-next">Where to Go Next</a></p>
</li>
</ol>
<h2 id="heading-the-problem-with-prompting-agents-directly">The Problem with Prompting Agents Directly</h2>
<p>Here is what happens when you skip the spec.</p>
<p>You have a feature in your head: "Add a way for users to export their data." You open Claude Code and describe it. The agent produces code. It looks right. You run it. It's mostly right – except it exports everything including soft-deleted records, it doesn't paginate, it times out on large accounts, and it has no authentication check on the export endpoint.</p>
<p>None of those things were in your prompt. The agent guessed, and it guessed plausibly – which is worse than guessing obviously wrong. You didn't notice until testing.</p>
<p>This is the fundamental problem with prompting agents directly on anything non-trivial: your prompt carries your conscious requirements, but every feature has a shadow of requirements you didn't think to state. And the agent fills that shadow with assumptions.</p>
<p>Most of the time, those assumptions are reasonable. Some of the time, they're wrong in ways that take hours to unravel.</p>
<p>The failure mode isn't hallucination. It's the agent being exactly as helpful as the prompt allowed, which wasn't helpful enough.</p>
<p>Spec-Driven Development addresses this directly. The methodology – documented extensively by practitioners like Julián Deangelis – argues that a written spec isn't documentation overhead. It's the mechanism that forces you to make decisions before the agent does.</p>
<h2 id="heading-what-spec-driven-development-is">What Spec-Driven Development Is</h2>
<p>Spec-Driven Development is the practice of writing a structured specification before you write code or prompt an agent. The spec defines what the feature must do, what assumptions are being made, and what tasks the implementation breaks into.</p>
<p>The key insight is what a spec is <em>for</em>. A spec is not trying to replace code. It's trying to surface the decisions that would otherwise be invisible. The agent will make those decisions either way: with a spec, you make them first. Without a spec, you discover them during testing.</p>
<p>The strongest counterargument to SDD comes from Gabriella Gonzalez: <em>a sufficiently detailed spec is just code</em>. She's right that some specs devolve into pseudocode so specific they might as well be implementations.</p>
<p>But that's a spec written at the wrong level of abstraction. The goal is to name the decisions, not to pre-implement them. "Only authenticated users can trigger this export" is a decision. "Call <code>verifyJWT(token)</code> and return 401 if it fails" is implementation. The spec needs the first. The agent handles the second.</p>
<p>SDD has three levels:</p>
<ol>
<li><p><strong>Spec-First</strong>: write a spec before every feature and hand it to the agent as context. This is the entry point and the workflow this tutorial focuses on.</p>
</li>
<li><p><strong>Spec-Anchored</strong>: the spec lives in the repository and evolves alongside the code. When requirements change, you update the spec and re-prompt the agent to realign.</p>
</li>
<li><p><strong>Spec-as-Source</strong>: the spec is the primary artifact. Code is generated from it and considered disposable. This is the most ambitious level and the direction many teams are moving toward.</p>
</li>
</ol>
<p>spec-writer gets you to Spec-First immediately, with no ceremony.</p>
<h2 id="heading-how-spec-writer-works">How spec-writer Works</h2>
<p>spec-writer is a Claude Code skill – a markdown file that loads into the agent's context and changes how it responds when invoked.</p>
<p>The skill follows one rule: generate first, flag assumptions inline. Instead of asking you clarifying questions before producing output, it generates the full spec immediately and marks every decision it made without your explicit input using <code>[ASSUMPTION: ...]</code> tags. Then you correct what's wrong.</p>
<p>This is faster than Q&amp;A because it makes the decisions visible in a form you can react to rather than anticipate.</p>
<p>The output has three sections in fixed order:</p>
<ol>
<li><p><strong>SPEC</strong>: the what. One-line purpose, user stories, requirements, edge cases, and acceptance criteria in Given/When/Then format.</p>
</li>
<li><p><strong>PLAN</strong>: the how. Stack and architecture decisions, data model changes, API contracts, testing strategy, and security constraints.</p>
</li>
<li><p><strong>TASKS</strong>: the breakdown. Ordered, self-contained tasks each completable in a single agent session, each with its own acceptance criteria.</p>
</li>
</ol>
<p>After the three sections, the skill produces an <strong>Assumptions summary</strong>: every <code>[ASSUMPTION: ...]</code> from the output, ranked by impact. This is the part you review before handing anything to the agent.</p>
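<p>If you ever want to pull those tags out of a saved spec yourself, a small sketch along these lines should do it. It is not part of spec-writer; the function name and the sample text are hypothetical, and it assumes an assumption's text never contains a closing square bracket.</p>
<pre><code class="language-python">import re

# Matches [ASSUMPTION: ...] even when the tag wraps across lines.
ASSUMPTION_TAG = re.compile(r"\[ASSUMPTION:\s*(.*?)\]", re.DOTALL)


def extract_assumptions(spec_text):
    """Return the text of every assumption tag, in order of appearance."""
    # Collapse internal whitespace so wrapped tags come back as one line.
    return [" ".join(m.split()) for m in ASSUMPTION_TAG.findall(spec_text)]


# Hypothetical spec fragment with one wrapped tag and one short tag.
sample = """CLI capture runs locally. [ASSUMPTION: CLI capture is a local
script that calls the Foundation API]
Sessions are deduplicated. [ASSUMPTION: filename is the dedup key]"""

found = extract_assumptions(sample)
for number, text in enumerate(found, start=1):
    print(number, text)
</code></pre>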
<p>The skill is compatible with <a href="https://github.com/github/spec-kit">GitHub Spec Kit</a> and <a href="https://github.com/Fission-AI/OpenSpec">OpenSpec</a>. If you use either framework, save the spec output to your <code>.specify/</code> or <code>openspec/changes/</code> directory and continue from there.</p>
<h2 id="heading-how-to-install-spec-writer">How to Install spec-writer</h2>
<p>spec-writer uses the Agent Skills standard, which means the same SKILL.md file works across Claude Code, Cursor, GitHub Copilot, Gemini CLI, and any other agent that supports the standard. You install it once and it works everywhere.</p>
<h3 id="heading-installation">Installation</h3>
<p>Create the skills directory if it doesn't exist and clone the repo:</p>
<pre><code class="language-bash">mkdir -p ~/.claude/skills
git clone https://github.com/dannwaneri/spec-writer.git ~/.claude/skills/spec-writer
</code></pre>
<p>On Windows PowerShell:</p>
<p><strong>(Note:</strong> PowerShell uses backtick (<code>`</code>) for line continuation, not backslash.)</p>
<pre><code class="language-powershell">New-Item -ItemType Directory -Force -Path "$HOME\.claude\skills"
git clone https://github.com/dannwaneri/spec-writer.git "$HOME\.claude\skills\spec-writer"
</code></pre>
<p>That's the entire installation. No package to install, no configuration file to edit, no API key. The skill is a markdown file. The agent reads it.</p>
<h3 id="heading-verification">Verification</h3>
<p>Open Claude Code and type:</p>
<pre><code class="language-plaintext">/spec-writer test
</code></pre>
<p>If the skill is installed correctly, the agent will read the SKILL.md and produce a spec structure – even for "test" as input. You'll see the three sections and the Assumptions summary. If nothing happens, confirm that the <code>~/.claude/skills/spec-writer/SKILL.md</code> file exists.</p>
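<p>The same file check can be scripted. The sketch below only tests for the path named above; the <code>skill_installed</code> helper is a hypothetical name, and the demo runs against a throwaway directory so it has no effect on your real setup.</p>
<pre><code class="language-python">import tempfile
from pathlib import Path


def skill_installed(home=None):
    """Check whether SKILL.md sits where the clone command put it."""
    base = Path(home) if home is not None else Path.home()
    skill_file = base / ".claude" / "skills" / "spec-writer" / "SKILL.md"
    return skill_file.is_file()


# Demo against a temporary directory standing in for the home folder.
with tempfile.TemporaryDirectory() as tmp:
    before = skill_installed(tmp)
    skill_dir = Path(tmp) / ".claude" / "skills" / "spec-writer"
    skill_dir.mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text("# spec-writer")
    after = skill_installed(tmp)

print("before:", before, "after:", after)
</code></pre>
<p>Calling <code>skill_installed()</code> with no argument checks your actual home directory.</p>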
<h2 id="heading-how-to-write-your-first-spec">How to Write Your First Spec</h2>
<p>The invocation pattern is:</p>
<pre><code class="language-plaintext">/spec-writer [your feature description]
</code></pre>
<p>The description can be as rough as you'd write in a Slack message. The skill's job is to structure it. You don't need to be precise. Being precise is what the spec is for.</p>
<h3 id="heading-a-real-example">A Real Example</h3>
<p>Here is a real invocation from the Foundation project, a federated knowledge base built on Cloudflare Workers:</p>
<pre><code class="language-plaintext">/spec-writer Add CLI capture for Claude Code sessions stored in ~/.claude/projects/
</code></pre>
<p>That's twelve words. No mention of the tech stack, no data model details, no authentication requirements, no error handling spec. All of that comes out in the spec.</p>
<p>Here is the abbreviated output:</p>
<p><strong>SPEC</strong></p>
<p><strong>One-line purpose:</strong> Developers can capture Claude Code session logs from their local filesystem into Foundation before they are deleted after 30 days.</p>
<p><strong>Users and use cases:</strong></p>
<ul>
<li><p>As a developer using Claude Code daily, I want my coding sessions captured automatically so that architectural decisions and debugging patterns don't disappear after 30 days.</p>
</li>
<li><p>As a Foundation user, I want CLI sessions and browser extension captures in the same knowledge base so that I have one place to query across all my AI interactions.</p>
</li>
</ul>
<p><strong>Acceptance criteria:</strong></p>
<pre><code class="language-plaintext">Given a developer with Claude Code sessions in ~/.claude/projects/
When they run the Foundation CLI capture command
Then a list of uncaptured sessions is displayed, sorted by most recent first
 
Given a session already in Foundation
When the developer runs capture again
Then that session does not appear in the review list
 
Given a malformed .jsonl file
When the capture command processes it
Then an error is logged for that file and processing continues on remaining sessions
</code></pre>
<p><strong>PLAN</strong></p>
<p><strong>Stack and architecture:</strong></p>
<pre><code class="language-plaintext">CLI capture runs locally — it needs to read the local filesystem, which
Cloudflare Workers cannot do. [ASSUMPTION: CLI capture is a local Node.js
or Bun script that calls the Foundation API to insert sessions, rather
than a Worker itself]
</code></pre>
<p><strong>API contracts:</strong></p>
<ul>
<li><p><code>POST /api/sessions/import</code> – accepts parsed session JSON, returns session ID</p>
</li>
<li><p>Returns 409 if session already exists</p>
</li>
<li><p>Triggers evaluator pipeline after successful insert</p>
</li>
</ul>
<p><strong>TASKS</strong></p>
<pre><code class="language-plaintext">## Task 1: .jsonl parser
 
**What to build:** A parser that reads a Claude Code .jsonl session file
and outputs Foundation's internal message format.
**Files likely affected:** cli/src/parsers/claudeCode.js (create)
**Acceptance criteria:**
1. Parses a valid .jsonl file into the correct message format
2. Handles malformed lines without crashing — logs and skips
3. Returns empty array for empty file
**Dependencies:** none
</code></pre>
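<p>To show how those three acceptance criteria translate into code, here is a rough sketch of such a parser. It is written in Python for brevity, even though the plan above assumes Node.js or Bun, and it returns the raw JSON objects because Foundation's internal message format isn't shown here. The <code>role</code> and <code>content</code> fields in the demo are made-up sample data, not the real session schema.</p>
<pre><code class="language-python">import json
import logging
import os
import tempfile

logger = logging.getLogger("capture")


def parse_session(path):
    """Parse a .jsonl session file into a list of message dicts."""
    messages = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines carry no message
            try:
                messages.append(json.loads(line))
            except json.JSONDecodeError as error:
                # Criterion 2: log and skip instead of crashing.
                logger.warning("%s line %d skipped: %s", path, line_number, error)
    return messages  # Criterion 3: an empty file yields []


# Demo with hypothetical sample data, including one malformed line.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "session.jsonl")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write('{"role": "user", "content": "hi"}\n')
        handle.write("not json\n")
        handle.write('{"role": "assistant", "content": "hello"}\n')
    parsed = parse_session(sample)

    empty = os.path.join(tmp, "empty.jsonl")
    open(empty, "w", encoding="utf-8").close()
    parsed_empty = parse_session(empty)
</code></pre>
<p>Each criterion maps to one branch of the function, which is what makes the task easy to verify once the agent hands it back.</p>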
<p><strong>Assumptions to review:</strong></p>
<pre><code class="language-plaintext">1. CLI capture is a local script calling the Foundation API, not a Worker
   Impact: HIGH
   Correct this if: you want a purely serverless approach
 
2. Manual curation before capture, not automatic bulk import
   Impact: HIGH
   Correct this if: you want automatic background capture
 
3. Session ID from .jsonl filename is the deduplication key
   Impact: MEDIUM
   Correct this if: session IDs are stored differently in your schema
 
4. No sensitive data scrubbing in v1
   Impact: MEDIUM
   Correct this if: your sessions contain credentials or keys
</code></pre>
<p>Twelve words in, four decisions surfaced immediately – three of which had real architectural implications.</p>
<p>The third assumption ("Session ID from .jsonl filename is the deduplication key") is the one that would have caused the most subtle bug. The agent would have implemented deduplication based on the filename and it would have worked until a session was renamed. The spec caught it before a line of code was written.</p>
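<p>A small sketch makes the rename problem visible. The content-hash key below is only one hypothetical alternative, not something the spec or Foundation prescribes, and the function names and sample bytes are invented for illustration.</p>
<pre><code class="language-python">import hashlib
from pathlib import Path


def key_from_filename(path):
    """Deduplication key as the spec assumed: the file name minus .jsonl."""
    return Path(path).stem


def key_from_content(raw_bytes):
    """Hypothetical alternative: a hash of the session contents."""
    return hashlib.sha256(raw_bytes).hexdigest()


session = b'{"role": "user", "content": "hi"}\n'

# Renaming the file changes the filename key, so the same session
# would look new and be imported a second time.
renamed_keys_match = (
    key_from_filename("abc123.jsonl") == key_from_filename("renamed.jsonl")
)

# The content key depends only on the bytes, so a rename is harmless.
content_keys_match = key_from_content(session) == key_from_content(session)
</code></pre>
<p>A content hash has its own tradeoff: the key changes whenever the session file grows. Which key fits depends on your schema, which is exactly why the assumption is worth reviewing.</p>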
<h2 id="heading-how-to-read-the-output">How to Read the Output</h2>
<p>The output is designed to be scanned for <code>[ASSUMPTION: ...]</code> tags first, read for the tasks second.</p>
<h3 id="heading-reading-the-assumptions">Reading the Assumptions</h3>
<p>Every <code>[ASSUMPTION: ...]</code> tag marks a place where the agent filled in something you didn't specify. Your job is to go through the Assumptions summary and decide for each one:</p>
<ul>
<li><p><strong>Correct</strong>: the assumption is right, leave it</p>
</li>
<li><p><strong>Override</strong>: the assumption is wrong, restate it and re-run the spec</p>
</li>
<li><p><strong>Defer</strong>: the assumption doesn't matter for this iteration, mark it and move on</p>
</li>
</ul>
<p>The impact rating tells you which assumptions to fix before you start coding. HIGH-impact assumptions affect architecture or data model. If they're wrong, fixing them requires rework. LOW-impact assumptions affect behavior details that are easy to change later.</p>
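<p>If you save the summary to a file, you could sort it by impact with a few lines of code. This sketch assumes the summary keeps the numbered layout shown in the example earlier, with an <code>Impact:</code> line directly under each assumption. The helper name and rank table are hypothetical.</p>
<pre><code class="language-python">import re

# One numbered line followed by an indented "Impact: LEVEL" line.
ENTRY = re.compile(r"^\s*\d+\.\s+(.+)\n\s+Impact:\s+(\w+)", re.MULTILINE)
RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def by_impact(summary_text):
    """Return (impact, assumption) pairs, highest impact first."""
    pairs = [
        (impact.upper(), text.strip())
        for text, impact in ENTRY.findall(summary_text)
    ]
    # Unknown ratings sort last; sorted() is stable, so ties keep order.
    return sorted(pairs, key=lambda pair: RANK.get(pair[0], len(RANK)))


summary = """1. Session ID from .jsonl filename is the deduplication key
   Impact: MEDIUM
   Correct this if: session IDs are stored differently in your schema

2. CLI capture is a local script calling the Foundation API, not a Worker
   Impact: HIGH
   Correct this if: you want a purely serverless approach
"""

ranked = by_impact(summary)
for impact, text in ranked:
    print(impact, "-", text)
</code></pre>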
<h3 id="heading-reading-the-acceptance-criteria">Reading the Acceptance Criteria</h3>
<p>The acceptance criteria in Given/When/Then format are the most useful part of the spec for catching scope errors. Read each one and ask: is this actually what I want?</p>
<p>Criteria are binary by design. "Returns 401 when unauthenticated" is a criterion. "Works correctly" is not. If you find yourself reading a criterion and thinking "well, it depends", then that's a signal that the criterion is hiding an assumption. Restate it.</p>
<h3 id="heading-reading-the-tasks">Reading the Tasks</h3>
<p>The tasks are ordered and self-contained. Each task produces a verifiable change. Before you hand any task to an agent, check two things:</p>
<ol>
<li><p>Does the task have all the context it needs? If a task says "follow the existing auth pattern" and you haven't pointed the agent at your auth code, it will guess.</p>
</li>
<li><p>Do the acceptance criteria match what you'd actually test? If the criteria are vague, tighten them before the agent sees the task.</p>
</li>
</ol>
<h2 id="heading-how-to-hand-the-spec-to-your-agent">How to Hand the Spec to Your Agent</h2>
<p>The spec is context, not a prompt. When you start an agent session for a task, include the relevant spec sections alongside the task description.</p>
<p>For Task 1 from the example above, your agent session might open like this:</p>
<pre><code class="language-plaintext">Context:
- This is a federated knowledge base built on Cloudflare Workers, D1, and Vectorize
- Sessions are stored in ~/.claude/projects/ as .jsonl files
- The API runs at https://&lt;your-worker&gt;.workers.dev
 
Spec:
[paste the SPEC and PLAN sections]
 
Task:
[paste Task 1]
</code></pre>
<p>The context block is just an example. Replace it with your own project's tech stack, file locations, and API URL. The point is to give the agent the same context a new team member would need on day one.</p>
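<p>If you hand over tasks often, you could assemble that layout with a small helper instead of pasting by hand. This is only a sketch: <code>build_task_prompt</code> is a hypothetical name, and the strings in the demo are shortened stand-ins for your real context, spec, and task text.</p>
<pre><code class="language-python">def build_task_prompt(context_lines, spec_text, task_text):
    """Assemble the Context / Spec / Task layout into one prompt string."""
    context = "\n".join("- " + line for line in context_lines)
    sections = [
        "Context:\n" + context,
        "Spec:\n" + spec_text.strip(),
        "Task:\n" + task_text.strip(),
    ]
    # A blank line between sections mirrors the layout shown above.
    return "\n\n".join(sections) + "\n"


prompt = build_task_prompt(
    ["Sessions are stored in ~/.claude/projects/ as .jsonl files"],
    "One-line purpose: capture Claude Code session logs into Foundation.",
    "## Task 1: .jsonl parser",
)
print(prompt)
</code></pre>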
<p>The agent now has requirements, architecture context, and a single scoped task with binary acceptance criteria. It cannot guess the deduplication key incorrectly because the spec already resolved that assumption. It cannot skip error handling because the acceptance criteria explicitly require it.</p>
<p>This is the workflow the spec is designed for. The spec doesn't replace the agent. Rather, it removes the decisions from the agent's hands and puts them in yours, before the work starts.</p>
<h3 id="heading-saving-the-spec-for-later">Saving the Spec for Later</h3>
<p>If you want to move toward Spec-Anchored development – where the spec lives in the repository – save the output to a <code>specs/</code> directory in your project:</p>
<pre><code class="language-bash"># Create specs directory
mkdir -p specs
 
# Save your spec
# Paste the output into specs/cli-capture.md
</code></pre>
<p>When requirements change, update the spec and re-prompt the agent to realign the implementation. The spec becomes the source of truth, not the code comments.</p>
<h2 id="heading-where-to-go-next">Where to Go Next</h2>
<p>Try it on your next feature before you write a line of code. The assumptions it flags will tell you something about your feature you hadn't consciously decided yet – and correcting the HIGH-impact ones before you hand anything to an agent is the whole point. Skipping that step is the same as prompting directly.</p>
<p>If your project is growing, move toward Spec-Anchored. Save specs in your repository under <code>specs/</code>. When a new contributor joins or an agent starts a session cold, the specs give them the decisions that got made without requiring them to reverse-engineer the code.</p>
<p>The strongest ongoing challenge to this workflow is Gabriella Gonzalez's argument that detailed specs become code. If your specs are getting implementation-specific, you've crossed a line. Pull back to decisions – "only authenticated users can trigger this" – and leave implementation to the agent. The spec's job is to name what the agent would have guessed wrong, not to write the feature in prose.</p>
<p>The Agent Skills standard now works across Claude Code, GitHub Copilot, Cursor, and Gemini CLI. The spec-writer repo is at <a href="https://github.com/dannwaneri/spec-writer">github.com/dannwaneri/spec-writer</a>.</p>
<p>The irony of spending 64% of a Claude budget building a token-efficiency tool is real. But the spec surfaced four decisions on a twelve-word prompt. The third one – the deduplication key assumption – would have produced a bug that worked perfectly until a session got renamed.</p>
<p>That's not a hallucination. That's the agent being exactly as helpful as the prompt allowed.</p>
<p>The spec is how you raise the ceiling on what "helpful" means.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build an MCP Server with Python, Docker, and Claude Code ]]>
                </title>
                <description>
                    <![CDATA[ Every MCP tutorial I've found so far has followed the same basic script: build a server, point Claude Desktop at it, screenshot the chat window, done. This is fine if you want a demo. But it's not fin ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-an-mcp-server-with-python-docker-and-claude-code/</link>
                <guid isPermaLink="false">69b09018abc0d95001a8f07f</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ML ]]>
                    </category>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ mcp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ mcp server ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Balajee Asish Brahmandam ]]>
                </dc:creator>
                <pubDate>Tue, 10 Mar 2026 21:41:44 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/02826050-87fa-42cb-8167-73bca4b42616.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Every MCP tutorial I've found so far has followed the same basic script: build a server, point Claude Desktop at it, screenshot the chat window, done.</p>
<p>This is fine if you want a demo. But it's not fine if you want something you can ship, defend in an interview, or hand to another developer without a README that starts with "first, install this Electron app."</p>
<p>So I built an MCP server in Python, containerized it with Docker, and wired it into Claude Code – all from the terminal, no GUI required.</p>
<p>This article walks through the full loop in one afternoon: what MCP actually is, why it matters now that OpenAI and Google have adopted it, the real security problems nobody puts in their tutorial (complete with CVEs), and every command you need to go from an empty directory to a working tool.</p>
<p>If you're between jobs and need a portfolio project that shows you understand how AI tooling actually works under the hood, this is the one.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-you-will-build">What You Will Build</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-mcp-and-why-should-you-care">What is MCP (and Why Should You Care)?</a></p>
</li>
<li><p><a href="#heading-why-claude-code-instead-of-claude-desktop">Why Claude Code Instead of Claude Desktop?</a></p>
</li>
<li><p><a href="#heading-step-1-build-the-mcp-server">Step 1: Build the MCP Server</a></p>
</li>
<li><p><a href="#heading-step-2-test-it-locally">Step 2: Test It Locally</a></p>
</li>
<li><p><a href="#heading-step-3-dockerize-it">Step 3: Dockerize It</a></p>
</li>
<li><p><a href="#heading-step-4-wire-it-into-claude-code">Step 4: Wire It Into Claude Code</a></p>
</li>
<li><p><a href="#heading-step-5-use-it">Step 5: Use It</a></p>
</li>
<li><p><a href="#heading-security-what-the-other-tutorials-leave-out">Security: What the Other Tutorials Leave Out</a></p>
</li>
<li><p><a href="#heading-what-to-do-next">What to Do Next</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-what-you-will-build">What You Will Build</h2>
<p>By the end of this tutorial, you will have:</p>
<ul>
<li><p>A Python MCP server that exposes custom tools to any MCP-compatible AI client</p>
</li>
<li><p>A Docker container that packages the server for reproducible deployment</p>
</li>
<li><p>A working connection between that container and Claude Code in your terminal</p>
</li>
<li><p>An understanding of the security risks involved and how to mitigate the worst of them</p>
</li>
</ul>
<p>The server we are building is a <strong>project scaffolder</strong>. You give it a project name and a language, and it generates a starter directory structure with the right files. It's simple enough to build in an afternoon, but useful enough to actually put on your résumé.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You will need the following installed on your machine:</p>
<ul>
<li><p><strong>Python 3.10+</strong> (check with <code>python3 --version</code>)</p>
</li>
<li><p><strong>Docker</strong> (check with <code>docker --version</code>)</p>
</li>
<li><p><strong>Claude Code</strong> with an active Claude Pro, Max, or API plan (check with <code>claude --version</code>)</p>
</li>
<li><p><strong>Node.js 20+</strong> (required by Claude Code – check with <code>node --version</code>)</p>
</li>
<li><p>A terminal you are comfortable in</p>
</li>
</ul>
<p>If you don't have Claude Code installed yet, follow the <a href="https://code.claude.com/docs/en/getting-started">official installation instructions</a>. The npm installation method is deprecated, so make sure you use the native binary installer instead.</p>
<h2 id="heading-what-is-mcp-and-why-should-you-care">What is MCP (and Why Should You Care)?</h2>
<p>The Model Context Protocol (MCP) is an open standard that lets AI models connect to external tools and data sources. Anthropic released it in November 2024, and within a year it became the default way to extend what an LLM can do. OpenAI adopted it in March 2025. Google DeepMind followed in April. The protocol now has over 97 million monthly SDK downloads and more than 10,000 active servers.</p>
<p>The easiest way to think about MCP is as a USB-C port for AI. Before MCP, every AI provider had its own way of calling tools. OpenAI had function calling. Google had their own format. If you wanted your tool to work with multiple models, you had to implement it multiple times. MCP gives you one interface that works everywhere.</p>
<p>Here is how the pieces fit together:</p>
<ul>
<li><p>An <strong>MCP server</strong> exposes tools, resources, and prompts. It is your code.</p>
</li>
<li><p>An <strong>MCP client</strong> (like Claude Code, Claude Desktop, or Cursor) discovers those tools and calls them on behalf of the LLM.</p>
</li>
<li><p>The <strong>transport</strong> is how they communicate. For local servers, that's usually stdio (standard input/output). For remote servers, it's HTTP.</p>
</li>
</ul>
<p>When you type a message in Claude Code and it decides to use one of your tools, here is what happens: Claude Code sends a JSON-RPC 2.0 message to your server over stdin, your server executes the tool and writes the result to stdout, and Claude Code reads it back. The LLM never talks to your server directly. The client is always in the middle.</p>
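<p>To make that exchange concrete, here is a minimal sketch of the two messages involved in a single tool call. The <code>tools/call</code> method and the <code>content</code> list in the result follow the MCP specification. The helper names <code>make_tool_call</code> and <code>read_tool_result</code> are my own, chosen for illustration, and a real session also begins with an <code>initialize</code> handshake that is left out here.</p>

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build one newline-delimited JSON-RPC 2.0 request for an MCP tool call."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps never emits raw newlines, so one message is exactly one line.
    return json.dumps(message) + "\n"


def read_tool_result(line):
    """Return the text items from a JSON-RPC response line, or raise on error."""
    message = json.loads(line)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return [
        item["text"]
        for item in message["result"]["content"]
        if item.get("type") == "text"
    ]


# A hand-written reply, shaped like what a server would send back for id 1.
example_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": '{"status": "created"}'}],
        "isError": False,
    },
})
```

<p>You will not normally write these messages by hand, since the client and the SDK do it for you. Seeing the shape helps when you are reading Inspector traces or debugging a server that connects and then goes quiet.</p>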
<p>If you want the deeper architecture breakdown, freeCodeCamp already has a <a href="https://www.freecodecamp.org/news/how-does-an-mcp-work-under-the-hood/">solid explainer on how MCP works under the hood</a>. Here, I will focus on building.</p>
<h2 id="heading-why-claude-code-instead-of-claude-desktop">Why Claude Code Instead of Claude Desktop?</h2>
<p>Most MCP tutorials use Claude Desktop as the client. That works, but Claude Code has a few advantages for developers:</p>
<ol>
<li><p><strong>It lives in your terminal.</strong> No GUI to configure. No JSON files to hand-edit in hidden config directories. You add an MCP server with one command and you are done.</p>
</li>
<li><p><strong>It's already where you code.</strong> If you're writing the server, testing it, and connecting it, doing all of that in the same terminal session cuts the context switching.</p>
</li>
<li><p><strong>It works on headless machines.</strong> If you're SSHing into a dev box or running in CI, Claude Desktop isn't an option. Claude Code is.</p>
</li>
<li><p><strong>It's also an MCP server itself.</strong> Claude Code can expose its own tools (file reading, writing, shell commands) to other MCP clients via <code>claude mcp serve</code>. That's a neat trick we won't use today, but it's worth knowing about.</p>
</li>
</ol>
<p>The relevant commands:</p>
<pre><code class="language-bash"># Add an MCP server
claude mcp add &lt;name&gt; -- &lt;command&gt;

# List configured servers
claude mcp list

# Remove a server
claude mcp remove &lt;name&gt;

# Check MCP status inside Claude Code
/mcp
</code></pre>
<h2 id="heading-step-1-build-the-mcp-server">Step 1: Build the MCP Server</h2>
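<p>We're using <a href="https://github.com/jlowin/fastmcp">FastMCP</a>, a Python framework that handles all the protocol plumbing so you can focus on your tools. The version used here is the one bundled with the official MCP Python SDK (the <code>mcp</code> package), which is why the install command and the import below reference <code>mcp</code> rather than a separate <code>fastmcp</code> package. Create a new project directory and set it up:</p>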
<pre><code class="language-bash">mkdir mcp-scaffolder &amp;&amp; cd mcp-scaffolder
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp[cli]&gt;=1.25,&lt;2"
</code></pre>
<p>Why pin the version? The MCP Python SDK v2.0 is in development and will change the transport layer significantly. Pinning to <code>&gt;=1.25,&lt;2</code> keeps your server working until you're ready to migrate.</p>
<p>Now create <code>server.py</code>:</p>
<pre><code class="language-python"># server.py
from mcp.server.fastmcp import FastMCP
import os
import json

mcp = FastMCP("project-scaffolder")

# Templates for different languages
TEMPLATES = {
    "python": {
        "files": {
            "main.py": '"""Entry point."""\n\n\ndef main():\n    print("Hello, world!")\n\n\nif __name__ == "__main__":\n    main()\n',
            "requirements.txt": "",
            "README.md": "# {name}\n\nA Python project.\n\n## Setup\n\n```bash\npip install -r requirements.txt\npython main.py\n```\n",
            ".gitignore": "__pycache__/\n*.pyc\n.venv/\n",
        },
        "dirs": ["tests"],
    },
    "node": {
        "files": {
            "index.js": 'console.log("Hello, world!");\n',
            "package.json": '{{\n  "name": "{name}",\n  "version": "1.0.0",\n  "main": "index.js"\n}}\n',
            "README.md": "# {name}\n\nA Node.js project.\n\n## Setup\n\n```bash\nnpm install\nnode index.js\n```\n",
            ".gitignore": "node_modules/\n",
        },
        "dirs": [],
    },
    "go": {
        "files": {
            "main.go": 'package main\n\nimport "fmt"\n\nfunc main() {{\n\tfmt.Println("Hello, world!")\n}}\n',
            "go.mod": "module {name}\n\ngo 1.21\n",
            "README.md": "# {name}\n\nA Go project.\n\n## Setup\n\n```bash\ngo run main.go\n```\n",
            ".gitignore": "bin/\n",
        },
        "dirs": ["cmd", "internal"],
    },
}


@mcp.tool()
def scaffold_project(name: str, language: str) -&gt; str:
    """Create a new project directory structure.

    Args:
        name: The project name (used as the directory name)
        language: The programming language - one of: python, node, go
    """
    language = language.lower().strip()

    if language not in TEMPLATES:
        return json.dumps({
            "error": f"Unsupported language: {language}",
            "supported": list(TEMPLATES.keys()),
        })

    template = TEMPLATES[language]
    base_path = os.path.join(os.getcwd(), name)

    if os.path.exists(base_path):
        return json.dumps({
            "error": f"Directory already exists: {name}",
        })

    # Create the project directory
    os.makedirs(base_path, exist_ok=True)

    # Create subdirectories
    for dir_name in template["dirs"]:
        os.makedirs(os.path.join(base_path, dir_name), exist_ok=True)

    # Create files
    created_files = []
    for filename, content in template["files"].items():
        filepath = os.path.join(base_path, filename)
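        # str.format fills in {name} and turns the doubled braces in the
        # templates ({{ and }}) into the literal braces JSON and Go need
        formatted_content = content.format(name=name)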
        with open(filepath, "w") as f:
            f.write(formatted_content)
        created_files.append(filename)

    return json.dumps({
        "status": "created",
        "path": base_path,
        "language": language,
        "files": created_files,
        "directories": template["dirs"],
    })


@mcp.tool()
def list_templates() -&gt; str:
    """List all available project templates and their contents."""
    result = {}
    for lang, template in TEMPLATES.items():
        result[lang] = {
            "files": list(template["files"].keys()),
            "directories": template["dirs"],
        }
    return json.dumps(result, indent=2)


if __name__ == "__main__":
    mcp.run(transport="stdio")
</code></pre>
<p>A few things to notice about this code:</p>
<p>Tools return strings. MCP tools communicate through text. I'm returning JSON strings so the LLM can parse the results reliably. You could return plain text, but structured data gives the model more to work with.</p>
<p>The <code>@mcp.tool()</code> decorator does the heavy lifting. FastMCP reads your function signature and docstring to generate the JSON schema that tells the LLM what this tool does, what arguments it takes, and what types they are. Good docstrings aren't optional here – they're how the LLM decides whether to call your tool.</p>
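<p>If you are curious what "reads your function signature and docstring" could look like, here is a rough sketch built on the standard library. This is not FastMCP's actual implementation, which is more thorough. The <code>describe_tool</code> helper and the small <code>JSON_TYPES</code> map are assumptions of mine, and only the <code>name</code>, <code>description</code>, and <code>inputSchema</code> field names are taken from the MCP tool listing format.</p>

```python
import inspect
from typing import get_type_hints

# Illustrative mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn):
    """Derive a tool description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    properties = {}
    required = []
    for param_name, param in inspect.signature(fn).parameters.items():
        # Fall back to "string" for annotations this sketch does not know.
        json_type = JSON_TYPES.get(hints.get(param_name), "string")
        properties[param_name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(param_name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def scaffold_project(name: str, language: str) -> str:
    """Create a new project directory structure."""
    return ""


tool_description = describe_tool(scaffold_project)
```

<p>Notice that the docstring ends up verbatim in <code>description</code>. That text is all the model has to go on when deciding whether your tool fits the request.</p>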
<p><code>transport="stdio"</code> is the key line. This tells FastMCP to communicate over standard input/output, which is what Claude Code expects for local servers.</p>
<h2 id="heading-step-2-test-it-locally">Step 2: Test It Locally</h2>
<p>Before we Dockerize anything, make sure the server actually works:</p>
<pre><code class="language-bash"># Quick smoke test - the server should start without errors
python server.py
</code></pre>
<p>You should see... nothing. That is correct. An MCP server over stdio just sits there waiting for JSON-RPC messages on stdin. Press <code>Ctrl+C</code> to stop it.</p>
<p>For a proper test, use the MCP Inspector (Anthropic's debugging tool):</p>
<pre><code class="language-bash"># Install and run the inspector
npx @modelcontextprotocol/inspector python server.py
</code></pre>
<p>This opens a web interface where you can see your tools, call them manually, and inspect the JSON-RPC messages going back and forth. Verify that both <code>scaffold_project</code> and <code>list_templates</code> show up and return sensible results.</p>
<p><strong>Here's a debugging tip that will save you time:</strong> If your MCP server logs anything to stdout, it will corrupt the JSON-RPC stream and the client will disconnect. Use stderr for all logging: <code>print("debug info", file=sys.stderr)</code>. This is the single most common source of "my server connects but then immediately fails" bugs. The New Stack called stdio transport "incredibly fragile" for exactly this reason.</p>
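<p>One way to make that habit stick is to route everything through the standard <code>logging</code> module with a handler bound to stderr, so nobody reaches for a bare <code>print()</code>. The sketch below assumes a logger name of <code>scaffolder</code> and a helper called <code>make_stderr_logger</code>. Both are placeholders of mine, not something the MCP SDK requires.</p>

```python
import logging
import sys


def make_stderr_logger(name="scaffolder"):
    """Return a logger that writes only to stderr, never to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        # Attach the handler once, even if this function is called again.
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    # Do not pass records up to the root logger, whose handlers we don't control.
    logger.propagate = False
    return logger


log = make_stderr_logger()
log.debug("server starting")
```

<p>With this in place, stdout stays reserved for JSON-RPC traffic and your debug output still shows up in the terminal or in <code>docker logs</code>.</p>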
<h2 id="heading-step-3-dockerize-it">Step 3: Dockerize It</h2>
<p>Create a <code>Dockerfile</code> in your project root:</p>
<pre><code class="language-dockerfile">FROM python:3.12-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy server code
COPY server.py .

# MCP servers over stdio need unbuffered output
ENV PYTHONUNBUFFERED=1

# The server reads from stdin and writes to stdout
CMD ["python", "server.py"]
</code></pre>
<p>Create <code>requirements.txt</code>:</p>
<pre><code class="language-plaintext">mcp[cli]&gt;=1.25,&lt;2
</code></pre>
<p>Build and verify:</p>
<pre><code class="language-bash">docker build -t mcp-scaffolder .

# Quick test - should start without errors
docker run -i mcp-scaffolder
</code></pre>
<p>Again, you'll see nothing because the server is waiting for input. Press <code>Ctrl+C</code> to stop it.</p>
<p>Two things matter in this Dockerfile:</p>
<ol>
<li><p><code>PYTHONUNBUFFERED=1</code> <strong>is critical.</strong> Without it, Python buffers stdout, and the MCP client may hang waiting for responses that are sitting in a buffer. This is one of those bugs that works fine in local testing and breaks in Docker.</p>
</li>
<li><p><code>docker run -i</code> <strong>(interactive mode) is required.</strong> The <code>-i</code> flag keeps stdin open so the MCP client can send messages to the container. Without it, the server gets an immediate EOF and exits.</p>
</li>
</ol>
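<p>Both points are easier to believe once you have watched them happen. The following sketch stands in for the client side of a stdio transport: it starts a child process, sends it lines over stdin, reads replies from stdout, and then closes stdin. The child here is a toy echo script of my own, not an MCP server, and <code>round_trip</code> is a hypothetical helper name.</p>

```python
import subprocess
import sys

# Toy child process: upper-cases each stdin line and flushes it immediately.
CHILD_SOURCE = """
import sys
for line in sys.stdin:
    sys.stdout.write(line.upper())
    sys.stdout.flush()
"""


def round_trip(lines):
    """Send lines to the child one at a time and collect one reply per line."""
    proc = subprocess.Popen(
        # -u is the command-line equivalent of PYTHONUNBUFFERED=1.
        [sys.executable, "-u", "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    for line in lines:
        proc.stdin.write(line + "\n")
        proc.stdin.flush()
        # This read would block forever if the child kept its reply buffered.
        replies.append(proc.stdout.readline().rstrip("\n"))
    # Closing stdin is the EOF the child sees when -i is missing from docker run.
    proc.stdin.close()
    exit_code = proc.wait(timeout=10)
    proc.stdout.close()
    return replies, exit_code
```

<p>Calling <code>round_trip(["ping"])</code> exchanges one message and then lets the child exit cleanly once stdin closes. A container started without <code>-i</code> is in that closed-stdin state from its first moment.</p>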
<h2 id="heading-step-4-wire-it-into-claude-code">Step 4: Wire It Into Claude Code</h2>
<p>Now connect your Docker container to Claude Code:</p>
<pre><code class="language-bash">claude mcp add scaffolder -- docker run -i --rm mcp-scaffolder
</code></pre>
<p>That's the whole command. Let me break it down:</p>
<ul>
<li><p><code>claude mcp add</code> registers a new MCP server</p>
</li>
<li><p><code>scaffolder</code> is the name you will reference it by</p>
</li>
<li><p>Everything after <code>--</code> is the command Claude Code runs to start the server</p>
</li>
<li><p><code>docker run -i --rm mcp-scaffolder</code> starts the container with interactive stdin and removes it when done</p>
</li>
</ul>
<p>Verify that it registered:</p>
<pre><code class="language-bash">claude mcp list
</code></pre>
<p>You should see <code>scaffolder</code> in the output with a <code>stdio</code> transport type.</p>
<p>Now launch Claude Code and check the connection:</p>
<pre><code class="language-bash">claude
</code></pre>
<p>Once inside Claude Code, type <code>/mcp</code> to see the status of your MCP servers. You should see <code>scaffolder</code> listed as connected with two tools available.</p>
<h2 id="heading-step-5-use-it">Step 5: Use It</h2>
<p>Still inside Claude Code, try it out:</p>
<pre><code class="language-plaintext">Create a new Python project called "weather-api"
</code></pre>
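<p>Claude Code should discover your <code>scaffold_project</code> tool, call it with <code>name="weather-api"</code> and <code>language="python"</code>, and report back what it created. Because the server runs inside the container, the files land in the container's <code>/app</code> directory and are discarded when the <code>--rm</code> container exits. To keep them on your host, the server has to write into a bind-mounted directory, which the security section below covers.</p>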
<p>Try a few more:</p>
<pre><code class="language-plaintext">What project templates are available?
</code></pre>
<pre><code class="language-plaintext">Scaffold a Go project called "url-shortener"
</code></pre>
<p>If Claude Code doesn't pick up your tools, run <code>/mcp</code> to check the connection status. If it shows as disconnected, the most common causes are that the Docker image failed to build, stdout is being polluted (check for stray print statements), or the Docker daemon is not running.</p>
<h2 id="heading-security-what-the-other-tutorials-leave-out">Security: What the Other Tutorials Leave Out</h2>
<p>This is the section most MCP tutorials skip. They should not. MCP has had real security incidents, not theoretical ones, and understanding them makes you a better developer.</p>
<h3 id="heading-the-prompt-injection-problem">The Prompt Injection Problem</h3>
<p>MCP servers execute code on your machine based on what an LLM decides to do. If an attacker can influence what the LLM sees, they can influence what your server does. This is called prompt injection, and it is the number one unsolved security problem in the MCP ecosystem.</p>
<p>In May 2025, researchers at Invariant Labs demonstrated this against the official GitHub MCP server. They created a malicious GitHub issue that, when read by an AI agent, hijacked the agent into leaking private repository data (including salary information) into a public pull request. The root cause was an overly broad Personal Access Token combined with untrusted content landing in the LLM's context window.</p>
<p>This was not a contrived lab demo. It used the official GitHub MCP server, the kind of thing people install from the MCP server directory without a second thought.</p>
<h3 id="heading-real-cves-not-theory">Real CVEs, Not Theory</h3>
<p>The ecosystem has accumulated real vulnerability reports:</p>
<ul>
<li><p><strong>CVE-2025-6514:</strong> A critical command-injection bug in <code>mcp-remote</code>, a popular OAuth proxy that 437,000+ environments used. An attacker could execute arbitrary OS commands through crafted OAuth redirect URIs.</p>
</li>
<li><p><strong>CVE-2025-6515:</strong> Session hijacking in <code>oatpp-mcp</code> through predictable session IDs, letting attackers inject prompts into other users' sessions.</p>
</li>
<li><p><strong>MCP Inspector RCE:</strong> Anthropic's own debugging tool allowed unauthenticated remote code execution. Inspecting a malicious server meant giving the attacker a shell on your machine.</p>
</li>
</ul>
<p>An Equixly security assessment found command injection in 43% of tested MCP server implementations. Nearly a third were vulnerable to server-side request forgery.</p>
<h3 id="heading-what-you-should-actually-do">What You Should Actually Do</h3>
<p>For the server we built today, here is what matters:</p>
<h4 id="heading-limit-file-system-access">Limit file system access</h4>
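<p>Our Docker container doesn't mount your home directory. That's intentional. If you need the server to write files to your host, mount only the specific directory you need: <code>docker run -i --rm -v "$(pwd)/projects:/app/projects" mcp-scaffolder</code>. Keep in mind that the server as written creates projects under its working directory (<code>/app</code>), so it also has to be pointed at the mounted path before anything shows up on the host. Never mount <code>/</code> or <code>~</code>.</p>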
<h4 id="heading-validate-all-inputs">Validate all inputs</h4>
<p>Our <code>scaffold_project</code> tool checks that the language is in a known list and that the directory does not already exist. But think about what happens if someone passes <code>name="../../etc/passwd"</code> as the project name. Path traversal is the kind of thing you need to catch. Add this to the tool:</p>
<pre><code class="language-python"># Add this validation at the top of scaffold_project
if ".." in name or "/" in name or "\\" in name:
    return json.dumps({"error": "Invalid project name"})
</code></pre>
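<p>A blocklist like that catches the obvious cases. If you want something stricter, an allowlist plus a containment check is a common pattern. The sketch below is one possible version under my own assumptions: the <code>NAME_PATTERN</code> rule (letters, digits, dot, underscore, hyphen, at most 64 characters) and the <code>safe_project_path</code> helper are illustrative choices, not part of the tutorial's server.</p>

```python
import os
import re

# Allowlist: must start with a letter or digit, then up to 63 safe characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def safe_project_path(root, name):
    """Return the resolved path for name under root, or raise ValueError."""
    if not NAME_PATTERN.fullmatch(name) or ".." in name:
        raise ValueError(f"Invalid project name: {name!r}")
    resolved_root = os.path.realpath(root)
    # realpath follows symlinks, so a link pointing outside root is caught too.
    candidate = os.path.realpath(os.path.join(resolved_root, name))
    if os.path.commonpath([resolved_root, candidate]) != resolved_root:
        raise ValueError(f"Project path escapes the root: {name!r}")
    return candidate
```

<p>Inside <code>scaffold_project</code> you could call this with <code>os.getcwd()</code> as the root, catch the <code>ValueError</code>, and return its message as the JSON error.</p>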
<h4 id="heading-use-least-privilege-tokens">Use least-privilege tokens</h4>
<p>If your MCP server connects to an API, give it the minimum permissions it needs. The GitHub MCP incident happened because the PAT had access to every private repo. A read-only token scoped to one repo would have contained the blast radius.</p>
<h4 id="heading-do-not-install-mcp-servers-from-untrusted-sources">Do not install MCP servers from untrusted sources</h4>
<p>A malicious npm package posing as a "Postmark MCP Server" was caught silently BCC'ing all emails to an attacker's address. Treat MCP server packages with the same caution you would give any code that runs on your machine with your permissions.</p>
<h2 id="heading-what-to-do-next">What to Do Next</h2>
<p>You have a working MCP server in a Docker container, connected to Claude Code. Here is how to make it portfolio-ready:</p>
<ol>
<li><p><strong>Add more tools:</strong> The scaffolder is a starting point. Add a tool that reads a project's dependency file and lists outdated packages. Add one that generates a Dockerfile for an existing project. Each tool is a function with a decorator – the pattern is the same every time.</p>
</li>
<li><p><strong>Add tests:</strong> Write pytest tests that call your tool functions directly and verify the output. MCP tools are just Python functions. Test them like Python functions.</p>
</li>
<li><p><strong>Push the Docker image:</strong> Tag it and push to Docker Hub or GitHub Container Registry. Then your <code>claude mcp add</code> command becomes <code>claude mcp add scaffolder -- docker run -i --rm yourusername/mcp-scaffolder:latest</code> and anyone can use it.</p>
</li>
<li><p><strong>Write a README that explains the security model:</strong> What permissions does your server need? What file system access? What happens if inputs are malicious? Answering these questions in your README signals that you think about security, which is exactly what hiring managers are looking for right now.</p>
</li>
</ol>
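<p>On the testing point: <code>scaffold_project</code> writes relative to <code>os.getcwd()</code>, so tests should run it inside a throwaway directory. Here is a small helper that does that with only the standard library. The <code>call_in_temp_dir</code> name and the <code>fake_tool</code> stand-in are mine, and the stand-in exists only so the sketch runs without the tutorial's <code>server.py</code> on the path.</p>

```python
import json
import os
import tempfile


def call_in_temp_dir(tool, **kwargs):
    """Call a tool inside a temporary cwd; return (parsed JSON, files created)."""
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            result = json.loads(tool(**kwargs))
            created = sorted(
                os.path.relpath(os.path.join(dirpath, filename), tmp)
                for dirpath, _dirs, files in os.walk(tmp)
                for filename in files
            )
        finally:
            # Always restore the working directory, even if the tool raises.
            os.chdir(previous)
    return result, created


def fake_tool(name):
    """Hypothetical stand-in with the same contract as a scaffolding tool."""
    os.makedirs(name)
    with open(os.path.join(name, "README.md"), "w") as f:
        f.write(f"# {name}\n")
    return json.dumps({"status": "created", "files": ["README.md"]})


result, created = call_in_temp_dir(fake_tool, name="demo")
```

<p>Against the real server, the call would look like <code>call_in_temp_dir(scaffold_project, name="demo", language="python")</code>, assuming the <code>@mcp.tool()</code> decorator hands back the original function so it stays directly callable. In pytest, the <code>tmp_path</code> and <code>monkeypatch.chdir</code> fixtures give you the same isolation with less code.</p>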
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>We built a Python MCP server with FastMCP, containerized it with Docker, and connected it to Claude Code. The whole thing fits in about 100 lines of Python, a seven-instruction Dockerfile, and one <code>claude mcp add</code> command.</p>
<p>The MCP ecosystem is real and growing fast. The protocol has the backing of Anthropic, OpenAI, and Google. It's now governed by the Linux Foundation. But it's also young, and the security story is still being written. Build with it, but build with your eyes open.</p>
<p>If you want to go deeper, here are the resources I found most useful:</p>
<ul>
<li><p><a href="https://modelcontextprotocol.io/specification/2025-11-25">MCP specification</a>: the actual protocol docs</p>
</li>
<li><p><a href="https://code.claude.com/docs/en/mcp">Claude Code MCP documentation</a>: how Claude Code implements MCP</p>
</li>
<li><p><a href="https://github.com/jlowin/fastmcp">FastMCP GitHub</a>: the Python framework we used</p>
</li>
<li><p><a href="https://authzed.com/blog/timeline-mcp-breaches">AuthZed's timeline of MCP security incidents</a>: required reading if you are building MCP servers for production</p>
</li>
<li><p><a href="https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/">Simon Willison on MCP prompt injection</a>: the clearest explanation of why this is hard to solve</p>
</li>
</ul>
<p>The complete source code for this tutorial is on <a href="https://github.com/balajeeasish/ai-workshop/tree/main/mcp-server">GitHub</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn to Use Claude AI to Build Text Summarizers, Image Describers, and More ]]>
                </title>
                <description>
                    <![CDATA[ From summarizing lengthy articles to providing detailed descriptions of images, AI models are becoming essential tools for developers. One such powerful tool is Claude, a state-of-the-art AI language model developed by Anthropic. Whether you're an as... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-to-use-claude-ai/</link>
                <guid isPermaLink="false">6717c129cfa8adba4005005e</guid>
                
                    <category>
                        <![CDATA[ claude.ai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Tue, 22 Oct 2024 15:13:45 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1729609697168/55a20068-be7b-4617-ac08-cde170ee0914.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>From summarizing lengthy articles to providing detailed descriptions of images, AI models are becoming essential tools for developers. One such powerful tool is Claude, a state-of-the-art AI language model developed by Anthropic. Whether you're an aspiring AI enthusiast or an experienced developer, learning how to leverage Claude’s capabilities can open up a world of creative and practical possibilities.</p>
<p>We just published a course on the <a target="_blank" href="http://freeCodeCamp.org">freeCodeCamp.org</a> YouTube channel that will teach you all about Claude AI and how to build exciting projects using Anthropic's API. In this course, you'll dive into Claude’s capabilities and discover how to use this AI model to create applications like text summarizers and image describers. The course is packed with hands-on coding challenges that will help you build practical skills while working with real-world AI tasks. Shant Dashjian from Scrimba developed this course.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729609616999/d1dba58d-3b2e-4ee1-97be-38190565a77e.png" alt="Image of Claude chat site." class="image--center mx-auto" width="1588" height="368" loading="lazy"></p>
<p>You'll begin by learning the basics: getting familiar with Claude, understanding its potential, and setting up your Anthropic API key. From there, you’ll quickly progress to interacting with Claude through conversations, where you’ll learn how to craft effective prompts to control its responses. The course will guide you through building two main projects that showcase how AI can process both text and visual data. The projects are:</p>
<ul>
<li><p>🗞️ a text summarizer</p>
</li>
<li><p>🖼️ an image describer</p>
</li>
</ul>
<p>In addition to learning how to work with Claude’s API, you’ll also develop important skills like error handling, prompt engineering, and cloud deployment, which are essential for creating robust, real-world applications. By the end of the course, you’ll not only have built two impressive projects but also gained a deeper understanding of how Claude fits into the broader AI landscape and how to effectively use AI models in your own projects.</p>
<p>Ready to meet Claude and start building? Watch the <a target="_blank" href="https://www.youtube.com/watch?v=QfJB9d0J3Iw">full course on the freeCodeCamp.org YouTube channel</a> (1-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/QfJB9d0J3Iw" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
