mcp - freeCodeCamp.org

How I Used Harness Engineering to Make Our Company AI-Native

Tech With RJ — Wed, 15 Jul 2026 15:34:41 +0000

Most companies say they want to "adopt AI". In practice this usually means a chatbot bolted onto a website.

Meanwhile, engineers using AI coding tools hit the opposite wall. The AI writes code fast, but nobody fully trusts the output, so someone reviews every line and the speed evaporates.

Both problems have the same root. The AI has no structure around it. No checks it must pass, and no access to the data your company actually runs on. Building that structure is a discipline called harness engineering, and it's what this article teaches you.

I'm a full-stack engineer who builds lending systems. Our documentation kept drifting away from the code, so I set out to fix it with Claude Code. What made it work in the end wasn't a smarter model like Fable or Opus. It was structure and guardrails.

In 30 days, I built V1 of an internal documentation platform where most of the code was written by the agent, kept safe by a set of automatic checks. Then I gave the platform a Model Context Protocol (MCP) server, so AI agents could read and write company docs with the same permissions as the person running them.

After rounds of improvement and tweaks, by day 50, the company adopted it. Requirement gathering, development work, and documentation all flow through the platform as one source of truth, in production, for a new project.

This article is a playbook, not a product tour. I won't go through all the features I built. I'll walk through the mindset and how it led to this outcome.

What you'll find below:

What harness engineering means, in plain terms
The four gates that let an AI agent write most of a production system
What an MCP server is and why it matters more than the chatbot
Why "you can only improve what you track" is the core idea behind an AI-native company
How to start with one process in your own company

What Harness Engineering Means
Pointing It at a Real Problem
The Four Gates
Where the Harness Failed
What an MCP Server Is and Why You Should Care
You Can Only Improve What You Track
How to Start in Your Own Company
The Real Shift

What Harness Engineering Means

Here's the usual way people use an AI coding agent. You ask for code, it writes some, you read every line because you don't trust it, you fix what's wrong, and you repeat. The AI is fast, but your review is the bottleneck, so nothing actually gets faster.

Harness engineering flips the job. Instead of reviewing every line, you build the environment the agent works in.

The term comes from OpenAI. In a post called Harness engineering, their team describes a five-month experiment where Codex agents wrote roughly a million lines of a production product with no code written by hand.

They define the harness as "the full environment of scaffolding, constraints, and feedback loops" that surrounds an agent and lets it do stable work. In their setup that meant repository structure, CI configuration, formatting rules, project instructions, and tool integrations. The engineer's job shifts from writing the code to designing that environment.

Here's how that applied to us. OpenAI ran the idea with a team of engineers at a million-line scale. I ran it alone, on an internal tool, with four automatic checks, a rules file the agent reads at the start of every session, and a habit of proving each change by running the app and watching it. Same idea, budget version, and it held.

You stop trusting the AI. You start trusting the harness.

This changes what your job is. You spend your time designing checks, writing down rules, and reviewing the output at a higher level. The agent spends its time inside the fence you built.

And this is why one engineer suddenly matters a lot. An agent's speed is worthless when nobody trusts its output, and the harness is the thing that turns speed into output you can trust. Build a good harness and one person ships what used to take a team.

None of this needs permission from your company. My harness was made of things every engineer already knows. A type checker, a test runner, a coverage rule, and a text file with rules in it.

Pointing It at a Real Problem

The problem I pointed all this at is one every company has. A spec or requirement gets written. Developers build from it. The code changes during review, again in testing, again in production support. Nobody goes back to update the spec, for whatever reason. Six months later the document describes a system that no longer exists.

Most places shrug at this. In regulated lending you don't get to. You need to know what's current, and you sometimes need to show what changed, on what date, and who changed it. A document that quietly stopped being true is a business risk.

So, the case study was an internal documentation platform with one design goal. Docs should tell you when they go stale, instead of waiting for a human to notice.

Every doc declares which code paths it describes. A small script in CI reports code changes to the platform, and any doc whose code moved after its last edit gets flagged as drifting. On top of that sit a sign-off workflow where the approval badge turns amber if the doc changes after approval, a health score per document, and a digest that tells owners what needs attention.
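The drift rule can be sketched in a few lines. This is a minimal illustration, not the platform's actual code: the Doc and CodeChange shapes, the field names, and the prefix-matching rule are all assumptions made for the example.

```typescript
// Hypothetical shapes: the article does not show the platform's real schema.
interface Doc {
  slug: string;
  codePaths: string[]; // path prefixes this doc declares it describes
  lastEditedAt: Date;
}

interface CodeChange {
  path: string; // file touched by a commit, as reported from CI
  committedAt: Date;
}

// Counts the code changes under a doc's declared paths that landed after its last edit.
function driftCount(doc: Doc, changes: CodeChange[]): number {
  return changes.filter(
    (c) =>
      c.committedAt.getTime() > doc.lastEditedAt.getTime() &&
      doc.codePaths.some((prefix) => c.path.startsWith(prefix))
  ).length;
}

// A doc is drifting when at least one such change exists.
function driftingDocs(docs: Doc[], changes: CodeChange[]): Doc[] {
  return docs.filter((d) => driftCount(d, changes) > 0);
}
```

Keeping the count rather than a boolean is what makes a line like "this doc's code changed 3 times since its last edit" possible later on.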

Fifty days, 300+ commits, and most of that code was written by Claude Code inside the harness. The plan was mine. We'd worked with a regular wiki for years, so I knew exactly what was missing and what to build. The agent wrote the code. The commits are not the point of the article. They're the evidence that the method works.

The Four Gates

Every change the agent made had to pass four gates before it could land. None of them are exotic.

Gate 1: The Type Checker

tsc --noEmit across the whole codebase. No change lands with a type error. This is the cheapest gate and it catches a surprising number of agent mistakes.

Gate 2: 100% Test Coverage on the Logic

Every line, every branch, and every function of the core business logic must be covered by a test, or the build fails. That sounds extreme for a human team, and it is.

For an agent it's perfect, for two reasons. First, the rule is binary, so there's nothing to negotiate. An uncovered branch means a missing test, full stop. Second, the agent has no ego. It never argues that a test is unnecessary. It reads the coverage report like a to-do list and works through it.
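To show how binary the rule is, here is a small sketch of a coverage gate. It assumes the test runner can emit an Istanbul-style json-summary report, whose total entry carries a pct value per metric. The article does not say which runner or reporter was used, so treat the shape and the function name as assumptions.

```typescript
// Shape of the "total" entry in an Istanbul-style json-summary coverage report.
// Assumption: your test runner can emit this; the article does not name one.
type Metric = "lines" | "branches" | "functions" | "statements";
type CoverageTotals = Record<Metric, { pct: number }>;

// Returns the metrics that fall short of 100%. An empty list means the gate passes.
function coverageGaps(totals: CoverageTotals): Metric[] {
  const metrics: Metric[] = ["lines", "branches", "functions", "statements"];
  return metrics.filter((m) => totals[m].pct < 100);
}
```

A CI step would fail the build whenever the returned list is non-empty. Many test runners also accept a coverage threshold directly in their config, which achieves the same result without custom code.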

Gate 3: End-to-End Tests

A Playwright suite clicks through the real app the way a user would. Unit tests check the logic in isolation. This gate checks the parts users actually touch.

I've written before about testing with plain-English assertions, and the same idea applies here. The e2e suite asserts what a user sees, not what the code intends.

Gate 4: Verify by Running It

After every change, the agent starts the app and watches the behaviour it claims to have changed. This one sounds obvious and gets skipped everywhere. Green tests plus an unverified claim is how a broken change ships with full confidence. Tests confirm the logic. Running the app confirms the claim.

Two more pieces complete the harness: a text file and a habit. The text file is a rules file in the repo. It holds the architecture, the step-by-step recipe every feature follows, and a list of ideas I already rejected, with reasons. Every fresh agent session starts by reading it, so the agent stays consistent and stops re-proposing bad ideas.

The other is a habit. Every feature ships with a short usage page written by the agent, showing the feature working. Writing it forces the agent to actually use what it built. Cheapest integration test I know.

Notice what the harness doesn't include. There's no linter. Style is not what goes wrong in agent-written code. What goes wrong is a plausible-looking branch nobody exercised. Spend your gate budget on behaviour, not formatting.

Where the Harness Failed

I want to be honest about the limits, because this is the part most AI articles skip.

The worst bug in the project passed every gate, and I found it by using the platform myself. I renamed a document, the slug got corrupted, and the page stopped loading.

Digging into the rename code showed something worse. The rename rebuilt the record from a partial payload, and any field missing from that payload quietly reset to its default. One of those fields controlled who could see the document. So a rename made a restricted document visible to everyone. Type-safe, fully covered, and wrong, because every test checked the fields the payload carried and no test checked the fields it left out.
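The failure pattern is easy to reproduce in miniature. The sketch below is a reconstruction under assumptions, not the project's code: the DocRecord shape, the defaults, and both function names are hypothetical. It shows how rebuilding from a partial payload resets omitted fields, and how merging onto the stored record avoids that.

```typescript
type Visibility = "public" | "restricted";

interface DocRecord {
  slug: string;
  title: string;
  visibility: Visibility;
}

// Hypothetical defaults used when a record is built from scratch.
const DEFAULTS: DocRecord = { slug: "", title: "", visibility: "public" };

// Buggy pattern: rebuild from defaults plus the payload.
// Any field the payload omits silently falls back to its default.
function renameBuggy(_existing: DocRecord, payload: Partial<DocRecord>): DocRecord {
  return { ...DEFAULTS, ...payload };
}

// Safer pattern: start from the stored record and overwrite only what was sent.
function renameSafe(existing: DocRecord, payload: Partial<DocRecord>): DocRecord {
  return { ...existing, ...payload };
}
```

The test that was missing is one that asserts on a field the payload does not carry, for example that visibility is unchanged after a rename.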

Using my own product caught it, not a gate. That's the honest shape of harness engineering. Gates catch the failure types you thought to encode. Using the product and reviewing the output catch the rest. You need both. The harness doesn't remove your judgement from the loop. It spends your judgement where it matters instead of on every line.

What an MCP Server Is and Why You Should Care

Everything up to here is about building software with AI. The second half of the story is about what your company does with AI, and this is where MCP comes in.

MCP (Model Context Protocol) is a standard way to give an AI agent access to a system. Think of it as a USB port for your company's tools. Any agent that speaks the protocol can plug into any system that exposes it to read data, take actions, and do work.

I gave the documentation platform an MCP server with 50+ tools. Search the docs, read a page, write a page, comment, check what's drifting, and so on. Any engineer at the company connects their AI agent to it and their agent now works with the company's knowledge base directly.

I got the security model wrong the first time, and the mistake is worth sharing because you might make it. Version one gave the agent direct, trusted access to the database. It was convenient, and broken in three ways: every agent action was anonymous, the agent could read documents its user had no right to see, and there was no way to revoke access.

The fix was to make the MCP server hold no credentials of its own. Each person mints a personal access token in their profile, and every agent action runs as that person, with their exact permissions. A junior's agent can read and comment. An editor's agent can write. Every action lands in the audit trail under the real person's name, and revoking the token cuts the agent off instantly.

The part I like most is how this plays with role-based access control. The token carries no permissions of its own; it only says who you are. Permissions are checked server-side against your current role on every call. So when a person's role changes, or a whole group's access gets tightened, nobody has to hunt down and revoke existing tokens. The agent might still show the same tools in its list, but the server blocks the call the moment the role behind the token no longer allows it.
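A minimal sketch of that check, with in-memory maps standing in for the real token and role stores. The token value, the user name, the role names, and the authorize function are all hypothetical, and the role-to-action table is only inferred from the description above.

```typescript
type Role = "junior" | "editor";
type Action = "read" | "comment" | "write";

// Hypothetical in-memory stores standing in for the platform's database.
const tokenOwners = new Map<string, string>([["tok_abc", "dana"]]);
const currentRoles = new Map<string, Role>([["dana", "editor"]]);

const allowed: Record<Role, Action[]> = {
  junior: ["read", "comment"],
  editor: ["read", "comment", "write"],
};

// The token only identifies the user. The role is looked up on every call,
// so a role change takes effect without reissuing or revoking tokens.
function authorize(token: string, action: Action): { ok: boolean; user?: string } {
  const user = tokenOwners.get(token);
  if (user === undefined) return { ok: false }; // unknown or revoked token
  const role = currentRoles.get(user);
  if (role === undefined) return { ok: false };
  return { ok: allowed[role].includes(action), user };
}
```

Because nothing about permissions is baked into the token, downgrading a role in the role store changes the outcome of the very next call made with the same token.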

Here's what that looks like in practice. This is a cut-down version of one tool from my server, using the official TypeScript SDK. The full server is the same pattern repeated 50 times.

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const API = process.env.WIKI_API_URL;   // your existing HTTP API
const TOKEN = process.env.WIKI_TOKEN;   // the user's personal access token

const server = new McpServer({ name: "docs-wiki", version: "1.0.0" });

server.registerTool(
  "read_doc",
  {
    description: "Read one document by its slug",
    inputSchema: { slug: z.string() },
  },
  async ({ slug }) => {
    // The MCP server holds no credentials of its own.
    // It forwards the user's token, and the API checks
    // that user's current role on every single call.
    const res = await fetch(`${API}/docs/${slug}`, {
      headers: { Authorization: `Bearer ${TOKEN}` },
    });

    if (res.status === 403) {
      // Forbidden comes back as a clean tool error,
      // never a crash and never a silent success.
      return {
        content: [{ type: "text", text: "Error: Forbidden." }],
        isError: true,
      };
    }
    if (res.status === 404) {
      // A restricted doc the user can't see returns the same
      // response as a missing one, so its existence never leaks.
      return {
        content: [{ type: "text", text: `No document: ${slug}` }],
        isError: true,
      };
    }

    return { content: [{ type: "text", text: await res.text() }] };
  }
);

await server.connect(new StdioServerTransport());

Three things in this small file carry all the security weight. The server has no database access, so there's nothing to steal from it. The token travels with every request, so the API applies the real user's permissions and the audit trail gets a real name. And the two error branches make failure boring: a forbidden action reads as a plain error message, and a document the user can't see is indistinguishable from one that doesn't exist.

The rule underneath is simple: give AI your permission model, not a back door. That single design decision is why the company trusts agent-written documentation. Nothing the agent does is anonymous or outside what its human could do anyway.

And once agents could write docs safely, something changed. Documentation stopped being a chore after development and became part of it. An agent finishes a feature, writes the doc through the same MCP tools, and flags anything it isn't sure about with an inline [!VERIFY] marker. Anything touching rates or compliance gets an [!SME] marker that blocks approval until an expert signs off. The agent brings speed. The human keeps authority.
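The marker rule can be sketched as a small scan over the document body. This assumes the markers appear inline as the literal strings [!VERIFY] and [!SME]. The function names and the report shape are hypothetical, and since the article only says that [!SME] blocks approval, the sketch counts [!VERIFY] without blocking on it.

```typescript
// Assumption: markers appear inline as the literal strings [!VERIFY] and [!SME].
interface MarkerReport {
  verify: number;
  sme: number;
  canApprove: boolean;
}

// Counts non-overlapping occurrences of a literal marker in the text.
function countOccurrences(text: string, marker: string): number {
  return text.split(marker).length - 1;
}

function scanMarkers(body: string, smeSignedOff: boolean): MarkerReport {
  const verify = countOccurrences(body, "[!VERIFY]");
  const sme = countOccurrences(body, "[!SME]");
  // Per the article, an [!SME] marker blocks approval until an expert signs off.
  return { verify, sme, canApprove: sme === 0 || smeSignedOff };
}
```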

You Can Only Improve What You Track

Here's the belief driving all of this. You can only improve what you track.

Our documentation didn't go stale because people were careless. It went stale because nothing measured staleness. The moment drift became a tracked number, like "this doc's code changed 3 times since its last edit", keeping docs current became a finite, visible job instead of a vague wish.

The same pattern showed up everywhere once I looked for it:

Every question the AI assistant had no answer for gets logged. An assistant that knows when not to answer turns its own gaps into data. That list is literally a ranked backlog of what to write next, sorted by real demand.
Health scores per document show which owner is overloaded and which corner of the knowledge base needs attention.
The audit log keeps a tamper-evident history of every action. When we need proof of what changed, on what date, and by whom, it's one query instead of an archaeology dig, and agents can read it through the MCP server to compare versions.
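As one concrete case from the list above, turning the unanswered-question log into a ranked backlog is a plain group-and-sort. The log entry shape and the idea of a normalized topic field are assumptions for this sketch; the article does not describe how questions are grouped.

```typescript
// Hypothetical log entry: one row per question the assistant could not answer.
interface UnansweredQuestion {
  topic: string; // assumed to be a normalized topic or search phrase
  askedAt: Date;
}

// Groups the log by topic and sorts by demand, highest first.
// Ties are broken alphabetically so the order is stable.
function rankedBacklog(log: UnansweredQuestion[]): Array<{ topic: string; count: number }> {
  const counts = new Map<string, number>();
  for (const q of log) {
    counts.set(q.topic, (counts.get(q.topic) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([topic, count]) => ({ topic, count }))
    .sort((a, b) => b.count - a.count || a.topic.localeCompare(b.topic));
}
```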

None of this needed advanced AI. It needed the data to exist somewhere structured, instead of evaporating in chat messages and inboxes.

That's my working definition of an AI-native company. Not a company with a chatbot. A company whose processes leave trackable data behind, and whose tools are reachable by agents through something like MCP.

Once both are true, the AI does what AI is genuinely good at. It reads more data than any human has patience for, and it points at the patterns. Where work piles up. Which step everyone waits on. What keeps going stale. You stop guessing at bottlenecks and start reading them.

Your company already produces all of this data every day. The question is whether it lands somewhere an agent can read.

How to Start in Your Own Company

You don't need a mandate. I didn't have one. Here's the sequence I'd repeat:

Pick one process that annoys everyone. Docs going stale, tickets triaged by hand, release notes nobody writes. Small and real beats big and strategic.
Make its data trackable. Structured, timestamped, with an owner. This step is boring and it's the one that matters. A spreadsheet is a fine start.
Build the harness before the features. Decide the checks a change must pass. Write the rules file. Then let the agent build fast inside it.
Expose it over MCP with real permissions. Personal tokens, actions attributed to real people, revocable. Never a shared back door.
Ask the agent what it sees. Once the data accumulates, ask where the bottleneck is, what's going stale, what gets asked but never answered. This is the payoff step.

Start low-risk. An internal tool is the perfect first target because your colleagues are forgiving users and the data stays in-house.

In a larger company, you won't get to skip the approval layers, so design for them instead of around them. Reuse the permission model your security team already trusts, keep every agent action attributed to a real person and revocable, and run the pilot inside one team's boundary. Those three properties answer most of the questions a review board will ask before it asks them.

Then let the tracked data make your case. A pilot that shows exactly what it caught, in numbers, is a stronger argument for the next approval than any slide deck.

The Real Shift

Fifty days and one engineer changed how a whole company handles its knowledge. But the model didn't do that, and honestly, neither did I in the way it sounds. The harness did the trusting, the MCP did the connecting, and the tracked data did the convincing.

The shift worth copying isn't "use AI to write code faster." It's three habits:

Build checks so you can mostly trust code you didn't write yourself.
Give agents the same permissions as the person running them, never full access.
Record what your processes do, because you can only improve what you track.

Pick the process that annoys everyone and build the first gate.

How MCP Is Changing WordPress Development

Manish Shivanandhan — Wed, 08 Jul 2026 23:18:17 +0000

For years, the promise of AI-assisted development felt just out of reach for WordPress developers.

You could ask a chatbot to generate a block of PHP, paste it into your editor, run into a conflict, copy the error back into the chat, and repeat the whole cycle until something worked. It was useful, but it was also exhausting.

The gap between "AI knows how to do this" and "AI can actually do this in my environment" stayed stubbornly wide.

Model Context Protocol (MCP) is closing that gap, and it's doing so in a way that changes not just how WordPress developers work, but what they can reasonably attempt on their own.

What We'll Cover:

What MCP Actually Is
The Shift from Autocomplete to Agency
Tools Leading the Shift
What This Means for Day-to-Day WordPress Work
The Developer's Role Is Changing, Not Disappearing
Where This Goes Next

What MCP Actually Is

MCP is an open standard, originally introduced by Anthropic, that defines how AI models communicate with external tools and data sources.

Before MCP, every integration between an AI assistant and an external system was a custom job. A team building an AI coding tool had to write proprietary connectors for their editor, their file system, and their APIs. It worked, but nothing was interoperable, and every new tool started from scratch.

MCP introduces a shared language. When a tool exposes an MCP server, any compatible AI client can connect to it and issue requests in a standard format.

The AI doesn't just receive information. It can take actions: read a file, query a database, call an API endpoint, update a record. The connection is bidirectional and structured.
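For a sense of what "a standard format" means on the wire: MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method with a tool name and an arguments object. The sketch below builds such a request. The get_post tool and its id argument are hypothetical, since each server advertises its own tools.

```typescript
// MCP messages are JSON-RPC 2.0. A tool invocation uses the "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// "get_post" and its argument are hypothetical; a real server advertises its own
// tool names and input schemas in response to a "tools/list" request.
const request = buildToolCall(1, "get_post", { id: 42 });
console.log(JSON.stringify(request));
```

In practice an MCP client library builds and sends these messages for you. The point is only that every server receives the same envelope, whatever the tool does.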

For WordPress developers, this is significant because WordPress isn't a simple codebase. It's a deep ecosystem with its own database schema, a plugin architecture with thousands of moving parts, a REST API (plus GraphQL through plugins such as WPGraphQL), a block editor with its own component model, and hosting environments that all behave slightly differently.

Getting an AI to help you meaningfully inside that ecosystem used to require constant hand-holding. MCP changes the premise entirely.

The Shift from Autocomplete to Agency

The practical difference shows up quickly once you start working with MCP-powered tools. Traditional AI coding assistance is fundamentally reactive. You write some code, you ask a question, you get a suggestion. The AI has no context about your project unless you paste it in yourself.

An MCP-connected AI assistant can read your theme files, inspect your database tables, check which plugins are active, pull the schema of a custom post type, and cross-reference all of that before it suggests anything. That's not autocomplete. That's an agent that understands what you're actually building.

For WordPress specifically, this matters at every layer of a project. Setting up custom post types, registering taxonomies, writing WooCommerce hooks, and building Gutenberg blocks: each of these tasks requires awareness of what already exists in the project. An AI without that context gives generic answers. An AI with live project context gives accurate ones.

Tools Leading the Shift

Several tools are already putting MCP to work inside the WordPress ecosystem, and they approach the problem from different angles.

WPVibe AI

WPVibe AI is one of the more focused implementations in this space. It connects an MCP server directly to your WordPress site, giving the AI assistant access to your real content, settings, and plugin configuration.

Rather than working from a description of your site, the AI works from the site itself. Because it exposes your WordPress site through MCP rather than tying itself to a single editor, it can work with compatible AI clients such as Claude Code, Cursor, OpenAI's Codex, and other MCP-enabled development tools, so developers can keep the workflow they already prefer.

For developers who spend significant time debugging plugin conflicts or reverse-engineering how a client's site has been customized over the years, this kind of grounded context is genuinely valuable.

The same thinking runs through the rest of the design. The connection uses an encrypted WordPress login that can be revoked in one click, theme changes are built as drafts with a preview link so nothing reaches the live site until you approve it, and a daily usage limit sits on top of whatever caps your AI provider already enforces.

Large database fields, like page layouts and settings, are edited surgically on the server rather than being pulled through the conversation, which keeps token costs down and limits the blast radius of a bad change.

Cursor

Cursor is an AI-powered code editor built on VS Code, and it has become popular in the WordPress community partly because of how well it handles large, unfamiliar codebases.

With MCP support, Cursor can connect to local WordPress development environments and operate with awareness of project structure, file relationships, and dependencies.

Cursor's AI capabilities become even more powerful when paired with MCP servers. Rather than relying only on the files currently open in the editor, it can query external tools, inspect WordPress installations, retrieve project metadata, and automate common development tasks through a consistent protocol. This gives the AI richer context and enables more accurate code generation and refactoring.

For developers maintaining WordPress plugins, themes, or enterprise websites, Cursor offers a familiar VS Code experience while extending it with intelligent automation.

As the ecosystem of WordPress MCP servers continues to grow, Cursor provides a practical way to integrate AI-assisted development into existing workflows without requiring teams to adopt an entirely new editor.

Zed

Zed is a newer code editor with native MCP support built into its architecture from the ground up rather than added as an extension. It's still building out its WordPress-specific tooling, but its performance and deep AI integration make it a tool worth watching for developers who want MCP capabilities without the overhead of a heavier editor.

One of Zed's biggest strengths is its speed. The editor is written in Rust and is designed to remain highly responsive even when working with large codebases. Features such as collaborative editing, built-in AI assistance, and native MCP support create a workflow where developers can navigate, modify, and understand projects with minimal friction.

While Zed's plugin ecosystem isn't yet as extensive as those of more established editors, development is progressing rapidly. As the MCP ecosystem matures and more WordPress-focused servers become available, Zed is well positioned to become an attractive choice for developers who want a modern, AI-first editor without sacrificing performance.

What This Means for Day-to-Day WordPress Work

The use cases that benefit most are the ones that have always been tedious rather than technically difficult.

Tasks like plugin audits, theme customization, writing migration scripts, generating test data, and documenting custom functions require a lot of context and not much creativity. They are exactly the kind of work an MCP-connected AI can take on end-to-end.

MCP also helps with managing multiple WordPress sites from a single AI-assisted workflow. Agencies and freelancers rarely work on just one installation. With MCP-connected access, developers can switch between client sites, inspect plugin configurations, compare environments, audit updates, and troubleshoot issues without manually rebuilding context for each project.

Instead of treating every website as a separate conversation, the AI can work with each site's live configuration, making multi-site maintenance significantly more efficient.

Consider a common scenario: a developer inherits a site built by someone else, with a handful of custom plugins, a heavily modified theme, and minimal documentation.

Before MCP, getting up to speed meant reading through files, tracing function calls, and building a mental model of how everything connected. With an MCP-enabled assistant that can read the actual codebase and database, the developer can ask the AI to map the custom post type structure, identify all the custom hooks in use, summarize what each plugin is responsible for, and get a reliable answer in minutes rather than hours.

On the build side, MCP-powered tools are changing the threshold for what a solo developer or small agency can deliver. Tasks that previously required deep specialization, such as writing performant database queries, implementing custom REST API endpoints, or setting up complex ACF field groups programmatically, become more approachable when the AI can see exactly what your installation looks like and generate code that fits it.

The Developer's Role Is Changing, Not Disappearing

It's worth being direct about what MCP doesn't do. It doesn't replace judgment, and it doesn't replace the developer's understanding of why WordPress works the way it does.

An AI that can read your database schema can also generate a query that technically runs but performs terribly at scale. An AI that knows your plugin list can still suggest an integration that creates a subtle conflict you won't notice until production.

The developer who gets the most from MCP-powered tools is the one who knows enough to evaluate what the AI produces. That bar is real. If anything, MCP raises the importance of WordPress fundamentals because the AI is now doing more and doing it faster, which means mistakes can travel further before anyone catches them.

What MCP changes is where a capable developer's attention goes. Less time spelunking through files to establish context. Less time writing boilerplate that requires no original thought. More time on the decisions that actually require a human: architecture choices, client communication, performance trade-offs, accessibility, and the kind of judgment that only comes from having shipped and broken things before.

Where This Goes Next

MCP is still in its relatively early stages. The ecosystem of WordPress-specific servers and tools is growing, but it's not yet mature. The tooling for managing an AI's permissions inside your environment (what it can read, what it can modify, and what requires confirmation) is still being worked out across the ecosystem.

For production environments especially, those guardrails matter enormously, and the better tools are starting to treat them as a design problem rather than an afterthought, gating destructive actions behind explicit approval while letting reversible work flow freely.
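One way to picture that kind of gate is a small decision function that sits in front of every tool call. This is a sketch under assumptions, not how any particular tool implements it: the risk levels, the PlannedAction shape, the tool names in the comments, and the idea of a set of approved tool names are all hypothetical.

```typescript
// Hypothetical classification; each tool would declare its own risk level.
type Risk = "read" | "reversible" | "destructive";

interface PlannedAction {
  tool: string;
  risk: Risk;
}

type Decision = "run" | "needs_approval";

// Reads and reversible changes (for example drafts) run directly.
// Destructive actions wait until a human has explicitly approved that tool.
function decide(action: PlannedAction, approvedTools: ReadonlySet<string>): Decision {
  if (action.risk !== "destructive") return "run";
  return approvedTools.has(action.tool) ? "run" : "needs_approval";
}
```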

But the direction is clear. WordPress development has always rewarded developers who adopted better tools early.

The developers who start building their workflows around these tools now won't just be faster. They'll be capable of things that weren't practical to attempt before. That's not a small change in degree. It's a change in kind.


How to Build an MCP Server with FastMCP for Your Local AI Agent

Darsh Shah — Wed, 08 Jul 2026 18:56:26 +0000

In this tutorial, I'll show you how to build an MCP server with FastMCP, connect your local AI agent to the tools it exposes, and add support for remote MCP servers. We'll wire the whole thing together with LangChain v1, Ollama, Qwen, and Python.

Model Context Protocol (MCP) is the common language between AI agents and tools. It's the standard way to expose tools to AI agents.

More companies are starting to expose MCP servers alongside their existing APIs, because MCP gives LLMs and AI agents a standard way to discover and use those capabilities directly.

Background
What is MCP?
What is FastMCP?
Motivation and Architecture
Step 1: Install Ollama and Pull the Model
Step 2: Install Python Dependencies
Step 3: Build the Local MCP Server with FastMCP
Step 4: Agent Python Code
Step 5: Run the Agent
Conclusion

Background

A lot of simple local AI agents define their tools directly inside the same Python script as the agent. Those tools are tied to that one agent, so every new agent has to re-implement them from scratch.

MCP improves this by giving tools a standard interface that any MCP-compatible client can use. Write the tool once as an MCP server, and any compatible client can reuse it. And because MCP also runs over the network, not just over local stdio, those tools don't even have to run on your machine. Someone else can host an MCP server, and your agent can use its tools the same way it uses your local ones.

To follow this tutorial, you'll need Ollama installed on your machine. The tutorial works on macOS, Windows, and Linux. I'm using a MacBook Pro with 32 GB of RAM, but you can run this on a lower-memory machine by choosing a smaller Qwen model from Ollama.

What is MCP?

MCP (Model Context Protocol) is an open protocol that exposes tools, resources, and prompts to LLM clients.

Just as REST standardized many web APIs, MCP is the standardizing protocol for AI tools. Instead of every framework inventing its own tool interface, MCP defines a shared one, and anything that understands the protocol can use tools exposed by any MCP-compatible server.

The image on modelcontextprotocol.io captures the idea well.

An MCP server is a small program that exposes a list of tools. An MCP client is anything that connects to that server (for example, an AI agent) and lets an LLM call those tools.

MCP servers are commonly exposed over transports like:

stdio: the server runs as a subprocess of the client, communicating over stdin/stdout. Best for local tools that only your agent needs.
http: the server runs as an HTTP service and clients connect over the network. Best for shared or remote tools.

The protocol standardizes how tools are exposed so different AI agents and clients can use them consistently.

What is FastMCP?

FastMCP is a Python library that makes writing an MCP server feel like writing a FastAPI app. You decorate functions with @mcp.tool, and FastMCP handles the protocol details: JSON-RPC messages, tool schema generation from your type hints and docstrings, and the transport layer.

On the LangChain side, langchain-mcp-adapters is a library that connects to one or more MCP servers and loads their tools into a format LangChain v1's create_agent can use directly. The agent code doesn't know if a tool lives in a subprocess on your machine or on a remote server. It just sees a list of tools with names and descriptions.

Motivation and Architecture

The motivation behind this project is to create sharable tools and to reuse tools others have already built. I wanted to create tools like current_time and word_count and share them across every agent I build. I also wanted to use tools from public MCP servers for capabilities I don't want to write myself, like browsing GitHub repos.

Using a local LLM means my conversations never leave my machine. The only thing that touches the network is whatever the model decides to send to remote tools, and only when it decides to call them.

For this project, I'll use FastMCP to build a local MCP server with two tools, connect to DeepWiki's free public MCP server for GitHub repo lookups, use langchain-mcp-adapters to load both into a LangChain v1 agent, and Ollama to run the local Qwen model.

The flow has three processes.

The local MCP server is a standalone Python script that exposes current_time and word_count. It runs as a subprocess of the agent, over stdio.
The remote MCP server is DeepWiki's public service that exposes three tools (read_wiki_structure, read_wiki_contents, ask_question) for asking questions about any GitHub repo, over HTTP.
The agent is the coordinating script that connects to both, merges their tools into a single list, and runs the interactive loop.

When the user asks a question, the model sees all tools from both servers as one list and picks whichever ones it needs.
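
Conceptually, the merge step looks something like the sketch below. This is only an illustration of the idea: merge_tools is a hypothetical helper, and I'm assuming you'd want duplicate tool names rejected, which may not be how langchain-mcp-adapters behaves.

```python
def merge_tools(tools_by_server: dict[str, list[str]]) -> list[str]:
    """Flatten per-server tool names into one list, rejecting duplicates."""
    merged: list[str] = []
    owner: dict[str, str] = {}
    for server, names in tools_by_server.items():
        for name in names:
            if name in owner:
                # Two servers exposing the same name would confuse the model.
                raise ValueError(
                    f"tool {name!r} is exposed by both {owner[name]!r} and {server!r}"
                )
            owner[name] = server
            merged.append(name)
    return merged


if __name__ == "__main__":
    print(merge_tools({
        "tools": ["current_time", "word_count"],
        "deepwiki": ["read_wiki_structure", "read_wiki_contents", "ask_question"],
    }))
```

Because the model chooses by name and description alone, distinct and descriptive tool names matter more as you add servers.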

Step 1: Install Ollama and Pull the Model

To get started, install the Ollama application for your platform.

We'll use Qwen as the chat model. Qwen has native tool-calling support, which is what makes it work well with MCP tools. I'm using qwen3.5:4b. If your machine has less RAM, you can use qwen3.5:0.8b.

ollama pull qwen3.5:4b

Step 2: Install Python Dependencies

python3 -m venv venv
source venv/bin/activate
pip install fastmcp langchain langchain-core langchain-ollama langchain-mcp-adapters

This tutorial requires langchain>=1.0.0.

Step 3: Build the Local MCP Server with FastMCP

The local MCP server exposes two small utility tools: current_time for checking the current date and time, and word_count for counting words in a piece of text. Any MCP client can use them, not just this agent.

FastMCP generates each tool's schema automatically from the type hints and docstrings, so the docstring wording matters. That's what the LLM sees when deciding whether to call each tool.

Save the code in your mcp_server.py file.

from datetime import datetime
from fastmcp import FastMCP

mcp = FastMCP("local-tools")


@mcp.tool
def current_time() -> str:
    """Return the current local date and time.
    Use this when the user asks what time or date it is.
    """
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


@mcp.tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text.
    Use this when the user asks how long a piece of writing is
    or asks you to count the words in something they've shared.
    Returns the word count as an integer.
    """
    return len(text.split())


if __name__ == "__main__":
    # Run the MCP server over stdio.
    mcp.run()

Since mcp_server.py will be run in stdio mode as a subprocess, we don't need to start it separately. The agent will run it automatically.

Step 4: Agent Python Code

The agent code has three parts. First, the configuration at the top defines the model, the system prompt, and the URL of the remote MCP server. Second, the build_agent() function loads the tools from both MCP servers into a single list and creates a LangChain v1 agent. Third, the main() function configures the MCP client and runs the interactive loop.

The [tool call] log line lets us see exactly which tool (local or remote) the agent picked on each turn.

Finally, await is used because build_agent(client) is asynchronous. It needs to wait for async MCP operations like client.get_tools() before it can return the finished agent. Without await, we would just get a coroutine object instead of the actual agent.

Save the code in your agent_with_mcp.py file:

import asyncio

from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from langchain_mcp_adapters.client import MultiServerMCPClient

# Local Ollama model to use for the chat agent.
CHAT_MODEL = "qwen3.5:4b"

# Hosted remote MCP server we'll connect to over HTTP.
DEEPWIKI_MCP_URL = "https://mcp.deepwiki.com/mcp"

# System prompt that tells the model what tools it has and how to behave.
SYSTEM_PROMPT = (
    "You are a helpful assistant with access to tools for checking the current time, "
    "counting words, and looking up information about GitHub repositories. "
    "Use tools when the user's request needs information you don't already have. "
    "If a tool returns an error, tell the user plainly and do not retry with made-up arguments. "
    "If the question doesn't need a tool, just answer directly."
)


async def build_agent(client: MultiServerMCPClient):
    # Load tools from all connected MCP servers.
    # This is async because MCP communication happens over I/O.
    tools = await client.get_tools()
    print(f"Loaded {len(tools)} tools: {[t.name for t in tools]}")

    # Create the local Ollama chat model.
    model = ChatOllama(model=CHAT_MODEL, temperature=0)

    # Build a LangChain agent with the local model and all MCP tools.
    return create_agent(
        model=model,
        tools=tools,
        system_prompt=SYSTEM_PROMPT,
    )


async def main():
    # Create one MCP client that connects to two servers:
    #
    # 1. "tools" is a local MCP server started as a subprocess over stdio.
    #    LangChain will launch `python mcp_server.py` for us.
    # 2. "deepwiki" is a hosted MCP server we connect to over HTTP.
    client = MultiServerMCPClient({
        "tools": {
            "command": "python",
            "args": ["mcp_server.py"],
            "transport": "stdio",
        },
        "deepwiki": {
            "url": DEEPWIKI_MCP_URL,
            "transport": "streamable_http",
        },
    })

    # Build the agent after the MCP client is ready and tools are loaded.
    agent = await build_agent(client)

    print("\nReady! Ask the agent something.")
    print("Type 'exit' to quit.\n")

    while True:
        question = input("You: ").strip()
        if not question or question.lower() in {"exit", "quit"}:
            break

        # Send the user's message to the agent.
        # We use `ainvoke()` because the agent may call async MCP tools.
        result = await agent.ainvoke({
            "messages": [{"role": "user", "content": question}],
        })

        # Walk through the returned messages and print any tool calls
        # the agent made during this turn.
        for msg in result["messages"]:
            tool_calls = getattr(msg, "tool_calls", None)
            if tool_calls:
                for call in tool_calls:
                    print(f"[tool call] {call['name']}({call['args']})")

        # The final message in the list is the agent's final answer.
        print(f"\nAnswer: {result['messages'][-1].content}\n")


if __name__ == "__main__":
    # Run the async program.
    asyncio.run(main())
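
If the earlier note about await feels abstract, this small standard-library sketch shows the difference in isolation. build_value is a made-up stand-in for an async setup step like build_agent(); nothing here touches MCP or LangChain.

```python
import asyncio
import inspect


async def build_value() -> int:
    """Stand-in for an async setup step such as build_agent()."""
    await asyncio.sleep(0)  # yield to the event loop once, like real I/O would
    return 42


async def main() -> tuple[bool, int]:
    pending = build_value()  # no await: this is only a coroutine object
    was_coroutine = inspect.iscoroutine(pending)
    value = await pending  # await runs the coroutine and returns its result
    return was_coroutine, value


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints (True, 42)
```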

Step 5: Run the Agent

python agent_with_mcp.py

You don't need to start the local MCP server yourself. MultiServerMCPClient launches mcp_server.py as a subprocess over stdio, and also opens an HTTP connection to DeepWiki. If either server is unreachable, you'll see an error during startup rather than a silent fallback.

Once the agent is running, you can ask it questions in plain English. Before trusting the answers, watch the tool calls to make sure the agent picked the right tool with the right arguments. Local models are smaller than hosted frontier models and tend to hallucinate more. Spot-checking helps.

As a test run, I asked the agent a mix of questions:

$ python agent_with_mcp.py

Starting MCP server 'local-tools' with transport 'stdio'  transport.py:210
Loaded 5 tools: ['current_time', 'word_count', 'read_wiki_structure', 'read_wiki_contents', 'ask_question']

Ready! Ask the agent something.
Type 'exit' to quit.

You: what is the current time
[tool call] current_time({})

Answer: The current time is 2026-07-01 16:41:42

You: Give me one line summary of karpathy/nanochat 
[tool call] ask_question({'repoName': 'karpathy/nanochat', 'question': 'Give me a one-line summary of this repository'})

Answer: This repository, `karpathy/nanochat`, is a minimal, full-stack experimental system for training large language models (LLMs) from scratch, designed to be accessible and cost-effective, with a primary development focus on optimizing the "Time-to-GPT-2" benchmark.

You: what's the capital of France?

Answer: Paris

The agent behaved reasonably well for a 4B local model. It called the current_time tool for the time question and reached out to DeepWiki's remote ask_question tool to answer the question about the nanochat repo. It also skipped tool calls entirely for the France question.

You can explore more MCP servers in the MCP server registry: https://github.com/modelcontextprotocol/servers

Conclusion

In this tutorial, we built an MCP server with FastMCP, connected to a free public remote MCP server, and wired both into a local AI agent using LangChain v1's create_agent and langchain-mcp-adapters.

From here, try adding your own tools to the local server, like a note reader or a wrapper around another local capability. Point the agent at other remote MCP servers. Or turn your local server into a remote one by switching its transport to HTTP and running it on a small server, so you can use it from any device you own or even publish it for others to use. Happy tinkering!

If you enjoyed this tutorial, you can find more of my writing on my blog (recent posts include a system design paper series), my work on my personal website, and updates on LinkedIn.

How to Connect Your AI Coding Agent to a Browser on macOS

אחיה כהן — Tue, 26 May 2026 12:40:33 +0000

AI coding agents like Claude Code, Cursor, and the rest have gotten remarkably good at reading and writing code. But the moment they need to look at something on the web, they hit a wall. They can't see your staging site. They can't read the error in your analytics dashboard. They can't check whether the form they just built actually submits.

The usual fix is to hand the agent a headless browser: Puppeteer or Playwright driving a fresh Chromium instance. That works, sort of. But a headless Chromium starts every session as a stranger: no logins, no cookies, no sessions. It runs a second browser engine that loads your CPU and spins up your fan. And a growing number of sites simply block it on sight.

There's another option, and on a Mac it's a good one: let the agent drive the Safari you already use — the one that's already logged into GitHub, your analytics, your staging environment. That's what Safari MCP does. It's an open-source MCP server that exposes Safari to any MCP-capable agent through around 80 tools, with no Chromium, no WebDriver, and no separate browser to babysit.

In this tutorial you'll connect Safari MCP to an AI agent, run your first automation, and then build something a headless browser fundamentally cannot do: an automation that works inside a page you're logged into. By the end you'll understand not just how to wire this up, but when native browser automation is the right call — and when it isn't.

Here's what you'll need:

A Mac (Safari MCP is macOS-only — more on that trade-off later)
Node.js 18 or newer
An MCP-capable AI agent — this tutorial uses Claude Code and Cursor, but any MCP client works

What is MCP, and Why Does Browser Automation Need It?
Why Safari Instead of Chrome or Playwright?
Installing Safari MCP
Your First Automation: Reading a Page
The Payoff: Automating a Logged-in Workflow
Handling the Tricky Parts
Limitations: When Not to Use This
Wrapping Up

What is MCP, and Why Does Browser Automation Need It?

Before wiring anything up, it helps to know what the "MCP" in Safari MCP stands for.

MCP is the Model Context Protocol — an open standard for connecting AI agents to external tools and data. Think of it the way you'd think of a USB port. Before USB, every device needed its own connector. MCP is the equivalent of agreeing on one connector: an agent that speaks MCP can use any tool that speaks MCP, with no custom integration code on either side.

An MCP server exposes a set of tools. An MCP client — your AI agent — discovers those tools and calls them. The server describes each tool (its name, what it does, what arguments it takes) and the agent decides when to call it. When Claude Code decides it needs to read a web page, it doesn't run browser code itself. It calls a tool that some MCP server provides.

Browser automation is a natural fit for this model. The agent's job is reasoning — "I need to see what's on the staging site, then check the console for errors." The actual mechanics — open a tab, wait for load, read the DOM, capture console output — are well-defined operations that belong behind a stable interface. That interface is exactly what an MCP server provides.

Safari MCP is one such server. It runs as a local process, exposes around 80 browser tools (navigate, click, fill, read, screenshot, extract, and more), and any MCP client can drive it. The agent never touches AppleScript or WebKit internals. It just calls safari_navigate and gets a result.

The "USB port" framing matters for a practical reason: nothing in this tutorial is Claude-specific. Wire Safari MCP into Cursor, Cline, Windsurf, or your own MCP client and the tools are identical.

Why Safari Instead of Chrome or Playwright?

If you've automated a browser before, you've almost certainly used Chrome through Puppeteer, Playwright, or Selenium. So why reach for Safari?

It comes down to three differences that matter once an AI agent, not a test script, is the thing driving the browser.

1. It's your real browser, with your real sessions. A headless Chromium launched by Playwright is a clean room. It has never logged into anything. If you want your agent to read your analytics dashboard, you first have to solve authentication — store credentials somewhere, script the login, handle two-factor prompts, refresh tokens. Safari MCP skips all of that. It drives the Safari instance you use every day, which is already logged into your dashboards, your GitHub, your email. The agent inherits those sessions for free.

2. It doesn't melt your laptop. A headless Chromium is a second, full browser engine running alongside the browser you already have open. On a laptop that's real CPU, real memory, and a fan you can hear. Safari MCP uses the WebKit engine that's already running on every Mac — there's no second engine to start. The project measures this at roughly 60% less CPU for the browsing work, and the automation runs with Safari in the background, so it doesn't steal your screen.

3. Sites don't treat it as a bot. Headless browsers leak. They expose navigator.webdriver, they ship with telltale automation fingerprints, and bot-detection services — Cloudflare's challenge pages, reCAPTCHA, the WAFs in front of a lot of B2B sites — have gotten very good at spotting them. Your real Safari, driven through the operating system, looks like exactly what it is: a person's browser. (To be clear: this is for automating your own accounts and sites — not for evading access controls you don't own.)

The cost of all this is the obvious one: Safari MCP is macOS-only. It's built on WebKit and AppleScript, so there's no Windows or Linux story. If your agent runs on a Linux CI box, this isn't your tool. If it runs on your Mac — which, for a coding agent, it very often does — the trade is a good one. We'll come back to limitations honestly at the end.

Installing Safari MCP

Installation is genuinely one command, but there are two Safari settings to flip first. Let's do it in order.

Step 1 — Enable Safari's developer features

Safari MCP reads and controls pages by running JavaScript inside Safari. Two settings have to be on:

Open Safari → Settings → Advanced and check "Show features for web developers." This reveals the Develop menu.
Open the new Develop menu and check "Allow JavaScript from Apple Events."

That second one is the important one. It's what lets an outside process — the MCP server — ask Safari to run JavaScript on a page. Without it, every tool call fails.

Step 2 — Run the server

npx safari-mcp

That's the whole install. npx fetches the package and runs it; there's nothing to build. The first time an agent calls a tool, macOS will pop up a permission prompt — something like "Terminal wants to control Safari." Click OK. That's the standard Automation permission, and you can review it later under System Settings → Privacy & Security → Automation.

If you'd rather have it installed permanently:

npm install -g safari-mcp

Step 3 — Tell your agent about it

Your AI agent needs to know the server exists. For Claude Code, one command does it:

claude mcp add safari -- npx safari-mcp

For Cursor, create .cursor/mcp.json in your project:

{
  "mcpServers": {
    "safari": {
      "command": "npx",
      "args": ["safari-mcp"]
    }
  }
}
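
If you set up several projects, you may prefer to generate that file from a script. Here's a minimal sketch using only the standard library. The helper names build_mcp_config and write_mcp_config are my own, and the demo writes into a temporary directory so it won't touch a real project.

```python
import json
import tempfile
from pathlib import Path


def build_mcp_config(name: str, command: str, args: list[str]) -> dict:
    """Return the config structure shown above for a single MCP server."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


def write_mcp_config(project_dir: Path, config: dict) -> Path:
    """Write the config to <project_dir>/.cursor/mcp.json and return its path."""
    target = project_dir / ".cursor" / "mcp.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    config = build_mcp_config("safari", "npx", ["safari-mcp"])
    # A temporary directory stands in for your project root in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_mcp_config(Path(tmp), config)
        print(path.read_text(encoding="utf-8"))
```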

The process is the same for every client — Claude Desktop, Cline, Windsurf, Continue, VS Code. You're telling the agent: "there's an MCP server named safari; start it by running npx safari-mcp."

Restart your agent (or reload its MCP servers) and it will connect. In Claude Code you can confirm with the /mcp command, which lists connected servers and their tools. You should see safari with around 80 tools available.

That's it. Your agent now has a browser.

Your First Automation: Reading a Page

Let's prove the wiring works with the simplest possible task: have the agent read a web page.

In your agent, just ask in plain language:

"Use the safari tools to open example.com and tell me what the page says."

Behind that request, the agent makes two tool calls. First it navigates:

{ "tool": "safari_navigate", "arguments": { "url": "https://example.com" } }

Then it reads the content:

{ "tool": "safari_read_page", "arguments": {} }

safari_read_page returns the page's title, URL, and text content with the HTML stripped out — exactly the form an LLM wants. The agent gets back something like this:

Example Domain
https://example.com/
This domain is for use in illustrative examples in documents. You may
use this domain in literature without prior coordination or asking for
permission.

And it relays that to you. You just watched your agent browse.

A quick note on how the agent should look at a page, because it changes everything downstream. safari_read_page is great for "what does this say." But when the agent needs to act — click a button, fill a field — text isn't enough. It needs to know what's actually there and how to target it. For that, the better first move is safari_snapshot:

{ "tool": "safari_snapshot", "arguments": {} }

This returns an accessibility-tree view of the page, where every interactive element has a stable ref ID:

[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"

Those ref IDs are the agent's reliable handles. CSS selectors break when a page re-renders. A snapshot ref stays valid for the life of the page. Keep that in mind — it's the difference between an automation that works once and one that works every time.
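
The agent does this lookup in its head, but if you ever post-process a snapshot yourself, a small parser is enough. This is a sketch that assumes every line looks like the three sample lines above. The real snapshot output may carry more fields, so treat the regex as illustrative.

```python
import re

# Matches lines shaped like: [button ref=0_15] "Submit"
LINE = re.compile(r'^\[(?P<role>\w+) ref=(?P<ref>\w+)\]\s+"(?P<label>[^"]*)"')

SAMPLE = '''[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"'''


def parse_snapshot(text: str) -> list[dict]:
    """Turn snapshot lines into dicts with role, ref, and label keys."""
    elements = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            elements.append(match.groupdict())
    return elements


def find_ref(elements: list[dict], role: str, label: str):
    """Return the ref of the first element with this role and label, else None."""
    for element in elements:
        if element["role"] == role and element["label"] == label:
            return element["ref"]
    return None


if __name__ == "__main__":
    print(find_ref(parse_snapshot(SAMPLE), "button", "Submit"))  # prints 0_15
```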

The Payoff: Automating a Logged-in Workflow

Reading example.com is a wiring test. Here's the thing a headless browser genuinely cannot do.

Pick a site you're logged into in Safari right now — your analytics, your project board, your CI dashboard. We'll use GitHub, because every developer has an account and the notifications page is a real, mildly annoying chore. The task: have the agent open your GitHub notifications and summarize what actually needs your attention.

Ask the agent:

"Open my GitHub notifications, read them, and group them into 'needs a reply' versus 'just FYI'."

The agent navigates:

{ "tool": "safari_navigate", "arguments": { "url": "https://github.com/notifications" } }

Stop and notice what didn't happen. No login screen. No OAuth dance. No personal access token in an environment variable. Safari is already authenticated as you, so the agent lands directly on your real notifications. A headless Chromium would have hit a login wall here and stopped.

Notification lists load incrementally, so the agent should wait for content before reading. safari_wait_for polls the page until a selector or piece of text appears, or a timeout elapses:

{ "tool": "safari_wait_for", "arguments": { "text": "Inbox", "timeout": 10000 } }

Then it reads. safari_read_page scoped to the notifications region returns the list as clean text:

{ "tool": "safari_read_page", "arguments": { "selector": "main" } }

The agent reasons over that text and hands you the grouped summary. The whole loop — navigate, wait, read, summarize — is a handful of tool calls.

When you need data in a precise shape rather than prose — to feed another step, or to write to a file — the agent can reach for safari_evaluate, which runs custom JavaScript on the page and returns whatever you build:

{
  "tool": "safari_evaluate",
  "arguments": {
    "expression": "JSON.stringify([...document.querySelectorAll('li')].map(li => li.innerText.trim()))"
  }
}

The agent writes that expression itself, against the page structure it has already inspected (for example, with safari_snapshot). You don't hand-author selectors.

You might be thinking: GitHub has an API, why scrape the page? Fair. For GitHub specifically, the API is excellent. But the point generalizes. Most of the dashboards you stare at every day — your billing portal, your error tracker's specific filtered view, a client's analytics, the admin panel of some tool your company pays for — either have no usable API or would cost you an afternoon of OAuth setup to reach. With Safari MCP, "the page I'm already looking at" is the API. The agent reads what you can see, because it's using the browser you're seeing it in.

That's the capability headless automation can't match. Not speed, not features — access.

Handling the Tricky Parts

A first automation always looks easy. Three things tend to bite on the second one.

Tab Safety: The Agent Must Not Hijack Your Tabs

This is the scariest failure mode: you're typing in a tab, the agent navigates that tab, and your work is gone. Safari MCP guards against it by stamping each automation tab with an identity marker — it uses window.name, which survives page navigations — and resolving "the agent's tab" through that marker on every call. If it can't positively identify its own tab, it refuses to act and raises a re-anchor error rather than guessing.

The practical rule for you: let the agent open its own tab with safari_new_tab, and it will stay in its lane. Don't point it at "the current tab" and assume.

Waiting for Dynamic Content

Modern pages render after load. If the agent reads too early, it reads an empty shell. Don't have it guess with fixed sleeps — use safari_wait_for, which polls for a selector or text until it appears or the timeout elapses:

{ "tool": "safari_wait_for", "arguments": { "selector": ".results-list", "timeout": 8000 } }

This is the single most common fix for "the automation works when I step through it slowly but fails when it runs."
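
The idea behind poll-until-ready is simple enough to sketch in a few lines. This is a generic Python illustration of the pattern, not Safari MCP's implementation. The wait_for function, its parameter names, and the 100 ms default interval are all assumptions.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_ms: int, interval_ms: int = 100) -> bool:
    """Poll condition() until it returns True or timeout_ms elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_ms / 1000)


if __name__ == "__main__":
    checks = {"count": 0}

    def ready() -> bool:
        # Pretend the content shows up on the third check.
        checks["count"] += 1
        return checks["count"] >= 3

    print(wait_for(ready, timeout_ms=1000, interval_ms=10))  # prints True
```

Compared with a fixed sleep, polling returns as soon as the content is there and fails clearly when it never arrives.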

Framework Forms

Set a React or Vue input's .value directly and the framework never notices — its internal state stays empty, and your "filled" form submits blank. Safari MCP's safari_fill and safari_fill_form use the native value setters and dispatch the input and change events the framework listens for, so React, Vue, Angular, and Svelte state all stay in sync:

{
  "tool": "safari_fill_form",
  "arguments": {
    "fields": [
      { "selector": "#email", "value": "jane@example.com" },
      { "selector": "#message", "value": "Looks great." }
    ]
  }
}

For framework-heavy pages where CSS selectors are fragile, go back to the snapshot refs introduced in the first automation: pass { "ref": "0_9" } instead of { "selector": "#email" }. Refs survive re-renders; selectors don't.

None of these are exotic. They're just the difference between a demo and an automation you'd actually leave running.

Limitations: When Not to Use This

A tool tutorial that only lists strengths isn't worth much. Here's where Safari MCP is the wrong choice.

It's macOS-only, and that's structural. Safari MCP is built on WebKit and AppleScript. There's no Windows or Linux port coming, because the foundation doesn't exist on those platforms. If your agent runs in Linux CI, use Playwright.

It drives one Safari, on one Mac. This is browser automation for your machine — a coding agent working alongside you. It is not a fleet. If you need 50 parallel browsers scraping in a data center, that's a headless-Chromium-in-containers job, and Safari MCP is the wrong shape for it.

Cross-browser test suites should stay on Playwright. If you're writing end-to-end tests that must pass on Chrome, Firefox, and Safari, use the tool built for that. Safari MCP drives exactly one engine: WebKit.

It shares a browser with you. Because it uses your real Safari, the agent and you are in the same browser. That's the entire point — but it means you should let the agent work in its own tabs and not fight it for the same window.

The honest summary: Safari MCP is built for one specific situation — an AI agent doing real browser work on the Mac you're sitting at, against sites you're already logged into. In that situation it's hard to beat. Outside it, reach for the headless tools. Knowing which situation you're in is the actual skill.

Wrapping Up

You've gone from an AI agent that could only see code to one that can see the web — the real web, behind your real logins.

To recap what you did: you learned what MCP is and why browser automation belongs behind that interface. You saw why a native Safari engine beats a headless Chromium for an agent working on your Mac, and you installed Safari MCP with one command and two settings. You ran a first read, and then you did the thing that actually matters: an automation inside a logged-in page, with no auth code at all. Finally, you saw the edges: tab safety, waiting for dynamic content, framework forms, and the cases where you should pick a different tool.

The bigger idea is worth holding onto. An AI agent is only as capable as the tools you connect to it. Giving it a browser — a real one — turns "write me code" into "go look at the staging site, find the bug, and tell me what's wrong." That's a different kind of collaborator.

Safari MCP is open source under the MIT license, and its roughly 80 tools go well beyond the handful you used here: screenshots, network inspection, storage, accessibility audits, multi-tab workflows. The repository and full tool reference are at github.com/achiya-automation/safari-mcp. Point your agent at it and see what it does when it can finally look around.

How to Build an Autonomous OSINT Agent in Python Using Claude's Tool Use API

Tommaso Bertocchi — Fri, 15 May 2026 00:19:42 +0000

When I started studying OSINT, I always felt I was just putting random values into software without deeply understanding what I was doing. After months in the field, I realized I wasn't really investigating — I was just executing steps that follow a predictable pattern. That's exactly what an AI agent is good at. So I built one.

In this tutorial you'll learn how to set up OpenOSINT, an open-source Python OSINT framework with an AI agent at its core. You'll learn how Claude's native tool use API works, how to run autonomous investigations from the terminal using the interactive AI REPL, how to use the direct CLI for scripting, and how to expose all the tools to Claude Code or Claude Desktop via an MCP server.

What Is OSINT and Why Manual Workflows Break Down
What You'll Build
Prerequisites
How Claude's Tool Use API Works
How to Install OpenOSINT
How to Use the Interactive AI REPL
How to Run Individual Tools from the CLI
How to Set Up the MCP Server
How the Agent Loop Works Under the Hood
Project Architecture
Conclusion

What Is OSINT and Why Manual Workflows Break Down

Open Source Intelligence (OSINT) is the practice of collecting and analyzing information from publicly available sources. Security researchers use it during penetration tests. Journalists use it to verify identities and trace connections. Threat analysts use it to profile infrastructure.

A typical OSINT workflow looks like this:

You have a target email address
You run holehe to find which platforms that email is registered on
You notice a username in the output
You manually copy that username and run sherlock to search 300+ platforms
You switch to a browser to check HaveIBeenPwned
You open another tab for a WHOIS lookup
You take notes and repeat

Every tool is a silo. Every pivot is manual. The investigation logic — what to run next, what to chain, what the findings mean — lives entirely in your head.

When you close the terminal, it's gone.

This tutorial walks you through OpenOSINT, an open-source Python framework that replaces that fragmented workflow with an AI agent that chains tools autonomously, executes them against real binaries, and saves a structured Markdown report.

More importantly, you'll learn the core design principle that makes it trustworthy for security research: hallucination in tool results is structurally impossible.

What You'll Build

By the end of this tutorial, you'll have a working OSINT agent that you can use in three ways:

Interactive AI REPL — type a target in natural language and the agent decides what to run
Direct CLI — run individual tools without AI, useful for scripting
MCP Server — expose all tools to Claude Code or Claude Desktop

Here's what a real session looks like:

$ openosint
openosint ❯ investigate target@example.com

  → generate_dorks('target@example.com')
  → search_email('target@example.com')
  ✓ Found: Spotify, WordPress, Gravatar, Office365

  → search_breach('target@example.com')
  ✓ Found in 2 breaches: LinkedIn (2016), Adobe (2013)

  → search_username('target_handle')
  ✓ Found on: GitHub, Reddit, HackerNews, Twitter

  ╭──────────────── Report ────────────────╮
  │ ## Online Presence                     │
  │ Spotify · WordPress · Gravatar         │
  │                                        │
  │ ## Data Breaches                       │
  │ LinkedIn (2016) · Adobe (2013)         │
  ╰────────────────────────────────────────╯

  ✓ Report saved → reports/2026-05-11_report.md

The agent went from email → linked accounts → username pivot → cross-platform search with no human orchestration at any step.

Prerequisites

To follow this tutorial, you'll need:

Python 3.10 or later installed on your machine
Basic familiarity with the command line
An Anthropic API key — only required for the AI REPL, not for the CLI or MCP server
Git installed

You don't need prior experience with OSINT tools or the Anthropic SDK.

How Claude's Tool Use API Works

Before you dive into installation, it's worth understanding the mechanism that makes this framework trustworthy for security research.

Most AI applications that wrap external tools work by generating text that describes what a tool would return. That's a problem when accuracy matters — the model can hallucinate plausible-looking usernames, fake subdomains, or data breaches that never happened.

Claude's tool use API works differently. When the model decides it needs to call a tool, it does not generate the output. It stops and emits a structured tool_use block containing the tool name and the arguments it wants to pass.

Your code then runs the actual binary — holehe, sherlock, or whatever else — and sends the real output back as a tool_result. The model reads that real output and decides its next step.

Here's the flow:

User prompt
    ↓
Model decides to call search_email()
    ↓
Hard stop — model emits tool_use block
    ↓
Your code runs holehe against the real target
    ↓
Real output sent back as tool_result
    ↓
Model reads actual results, decides next step
    ↓
Repeat until investigation is complete

The model never generates tool output. It only ever reads it. If sherlock finds 12 profiles, those 12 URLs go back into the context verbatim. The model has no way to slip a 13th into that tool result.

This is not a prompting trick or a system prompt instruction. It is how the API is architected. Keep this in mind as you read through the agent loop code later in this tutorial.
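
To make the two message shapes concrete, here is a minimal offline sketch using plain dictionaries. The field names follow the tool_use and tool_result blocks described above, while the id, the sample output string, and the make_tool_result helper are made up for illustration and are not taken from the OpenOSINT source.

```python
def make_tool_result(tool_use_block: dict, real_output: str) -> dict:
    """Wrap real tool output in a tool_result that points back at the request."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_block["id"],
        "content": real_output,
    }

# What the model emits when it wants a tool to run (generation stops here).
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_example_123",  # hypothetical id, normally assigned by the API
    "name": "search_email",
    "input": {"email": "target@example.com"},
}

# What your code sends back after running the real binary.
# The output string below is a placeholder, not real holehe output.
result = make_tool_result(tool_use_block, "placeholder output from the real tool")
next_user_message = {"role": "user", "content": [result]}

print(next_user_message)
```

The tool_use_id is what ties each result to the request that asked for it, which matters when the model requests several tools in one turn.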

How to Install OpenOSINT

Start by cloning the repository and installing the package:

git clone https://github.com/OpenOSINT/OpenOSINT.git
cd OpenOSINT
pip install -e .

Alternatively, if you just want to use the tool without modifying the source, install it directly from PyPI:

pip install openosint

Next, set your Anthropic API key. This is only required for the interactive AI REPL — the direct CLI and MCP server work without it:

export ANTHROPIC_API_KEY=sk-ant-...

How to Install the External Tool Dependencies

OpenOSINT wraps several standalone OSINT tools. Install the ones you plan to use:

pip install holehe            # email account enumeration
pip install sherlock-project  # username search across 300+ platforms
pip install sublist3r         # subdomain enumeration

For phone intelligence, phoneinfoga is a standalone binary. Download the release for your platform from its GitHub releases page and place it somewhere in your PATH.

How to Configure Optional API Keys

Two tools read API keys from the environment. The HaveIBeenPwned key is required for breach checks, while the ipinfo.io token is optional and only raises rate limits:

export HIBP_API_KEY=your_key    # required for breach checks via HaveIBeenPwned v3
export IPINFO_TOKEN=your_token  # optional: raises ipinfo.io rate limits

If a binary is missing or an API key is not configured, that specific tool returns a descriptive error string. All other tools continue to work normally.
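
The behavior described above, where a missing dependency becomes an error string instead of an exception, could be sketched like this. The check_tool_ready function and its messages are hypothetical and only illustrate the idea. OpenOSINT's actual checks and wording may differ.

```python
import os
import shutil

def check_tool_ready(binary=None, env_var=None):
    """Return None if the tool can run, otherwise a descriptive error string."""
    if binary and shutil.which(binary) is None:
        return f"error: '{binary}' not found in PATH; install it to enable this tool"
    if env_var and not os.environ.get(env_var):
        return f"error: {env_var} is not set; export it to enable this tool"
    return None

# Each tool is checked on its own, so one missing dependency
# does not stop the others from working.
for name, binary, env_var in [
    ("search_email", "holehe", None),
    ("search_breach", None, "HIBP_API_KEY"),
]:
    problem = check_tool_ready(binary, env_var)
    print(f"{name}: {problem or 'ready'}")
```

Returning a string rather than raising keeps the failure visible to the agent as ordinary tool output, so it can report the gap and carry on with the remaining tools.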

How to Use the Interactive AI REPL

Run openosint with no arguments to start the AI-powered REPL. You can also use openosint shell — it's equivalent:

$ openosint
# or
$ openosint shell

If you prefer to pass the API key inline rather than via environment variable, use the --api-key flag:

$ openosint --api-key sk-ant-...

You'll get a prompt where you can type targets or questions in natural language:

openosint ❯ investigate target@example.com
openosint ❯ find all accounts for johndoe99
openosint ❯ what subdomains does example.com have?
openosint ❯ check if +14155552671 is a mobile number

The agent decides which tools to run based on your input. You don't need to specify which tools to use or in what order. If you type an email address, the agent will run email enumeration. If it finds a linked username, it may pivot and search that username across platforms.

Reports are saved automatically to the reports/ directory after every investigation that produces structured findings.

Here are the commands available inside the REPL:

Command	Description
`clear`	Reset the conversation memory
`save`	Manually save the last report
`tools`	Show available tools and their status
`config`	Show current configuration
`help`	List all commands
`exit` or Ctrl-D	Quit

How to Run Individual Tools from the CLI

If you want to run a single tool without the AI layer — for scripting, automation, or quick lookups — use the direct CLI:

# Email account enumeration (default timeout: 120s)
openosint email target@example.com

# With a custom timeout in seconds
openosint email target@example.com -t 60

# Username search across 300+ platforms (default timeout: 180s)
openosint username johndoe99

# Enable verbose output for debugging
openosint -v email target@example.com

The direct CLI doesn't require an Anthropic API key. It runs the underlying binary and prints the output to the terminal.

This mode is useful when you need predictable, scriptable behavior — for example, piping output into another tool or running automated checks.
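
As a sketch of what scripting the CLI might look like from Python, the snippet below builds the command from the flags shown above and captures the output. The helper names are invented for this example, and because the tool's exit codes aren't documented here, the sketch does not treat a non-zero exit as an error.

```python
import subprocess

def build_email_command(target, timeout_s=None):
    """Build the argv list for `openosint email`, using only the -t flag shown above."""
    cmd = ["openosint", "email", target]
    if timeout_s is not None:
        cmd += ["-t", str(timeout_s)]
    return cmd

def run_email_lookup(target, timeout_s=60):
    """Run the CLI and return whatever it printed to stdout."""
    proc = subprocess.run(
        build_email_command(target, timeout_s),
        capture_output=True,
        text=True,
        timeout=timeout_s + 10,  # outer safety margin, an arbitrary choice
    )
    return proc.stdout

print(build_email_command("target@example.com", 60))
# To run a real lookup (requires openosint and holehe to be installed):
# print(run_email_lookup("target@example.com"))
```

Passing the arguments as a list instead of a shell string avoids quoting problems when targets come from a file or another program.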

How to Set Up the MCP Server

OpenOSINT also ships as a Model Context Protocol (MCP) server. This exposes all 9 tools to any MCP-compatible AI client.

How to Register with Claude Code

claude mcp add openosint python /absolute/path/to/OpenOSINT/openosint/mcp_server.py

Verify the registration worked:

claude mcp list

Once registered, you can drive investigations from the Claude Code prompt:

> Investigate target@example.com. If you find a linked username,
  trace it across other platforms and compile a full report.

How to Configure Claude Desktop

Add the following to your Claude Desktop config. On macOS, the file is at ~/Library/Application Support/Claude/claude_desktop_config.json (on Windows, look under %APPDATA%\Claude\):

{
  "mcpServers": {
    "openosint": {
      "command": "python",
      "args": ["/absolute/path/to/OpenOSINT/openosint/mcp_server.py"]
    }
  }
}

Restart Claude Desktop after saving the file. The tools will appear in Claude's tool list.

The MCP server uses stdio transport and does not need a persistent background process. Claude Code or Claude Desktop starts it on demand.
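
If you already have other MCP servers configured, pasting the JSON above by hand can overwrite them. Here is a small sketch of merging one entry into an existing config file instead. The add_mcp_server helper is hypothetical, and the demo writes to a temporary directory rather than your real config.

```python
import json
import tempfile
from pathlib import Path

def add_mcp_server(config_path, name, command, args):
    """Add or update one entry under "mcpServers", keeping everything else."""
    path = Path(config_path)
    text = path.read_text().strip() if path.exists() else ""
    config = json.loads(text) if text else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config

# Demo against a throwaway file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(
        json.dumps({"mcpServers": {"other": {"command": "node", "args": []}}})
    )
    updated = add_mcp_server(
        demo_path,
        "openosint",
        "python",
        ["/absolute/path/to/OpenOSINT/openosint/mcp_server.py"],
    )

print(sorted(updated["mcpServers"]))
```

To use it for real, you would point config_path at the Claude Desktop config file for your platform and keep a backup copy first.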

How the Agent Loop Works Under the Hood

Here is a simplified version of the agent loop from openosint/agent.py:

import anthropic
import asyncio

# Async client, so the API call doesn't block the event loop
client = anthropic.AsyncAnthropic()

async def run_investigation(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]

    while True:
        response = await client.messages.create(
            model="claude-...",   # model ID elided here; the API key comes from --api-key or ANTHROPIC_API_KEY
            max_tokens=4096,
            tools=TOOL_SCHEMAS,   # JSON schemas for all 9 tools
            messages=messages
        )

        # Agent is done — extract and return the final report
        if response.stop_reason == "end_turn":
            return extract_text(response)

        # Agent needs a tool — run the real binary
        if response.stop_reason == "tool_use":
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    # Runs holehe, sherlock, etc. as real subprocesses
                    real_output = await execute_tool(block.name, block.input)

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": real_output  # real output, never generated
                    })

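            # Append assistant turn and real tool results to conversation
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Any other stop reason (for example "max_tokens") would repeat
            # the same request forever, so fail loudly instead
            raise RuntimeError(f"unexpected stop_reason: {response.stop_reason}")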

There are a few important things to understand in this code.

The loop runs until stop_reason == "end_turn": The agent decides when it has gathered enough information to write the final report. It may call one tool or ten, depending on what it finds.
execute_tool() runs real subprocesses: It's a thin async wrapper around Python's asyncio.create_subprocess_exec() with a configurable timeout. There's no simulation and no mocked data at any point.
Conversation history is maintained across the entire loop: Each tool result goes back into messages, so the model always has full context of what it found when deciding what to run next.
Tool schemas are defined as JSON Schema: Each tool has a name, description, and parameter schema, written as a plain Python dict that serializes to JSON. The model uses these to know what tools exist and what arguments they accept. Here's a simplified example for search_email:

{
    "name": "search_email",
    "description": (
        "Enumerates online services and social accounts "
        "associated with an email address using holehe."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Target email address"
            }
        },
        "required": ["email"]
    }
}

The same pattern applies to all 9 tools. The model reads these schemas at the start of every request and uses them to decide what's available and how to call it.
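
For a feel of what a thin async wrapper around asyncio.create_subprocess_exec() with a timeout can look like, here is a minimal sketch. It is not the actual execute_tool() from the repository: the run_binary name, the error strings, and the choice to merge stderr into stdout are all assumptions made for this example.

```python
import asyncio
import sys

async def run_binary(argv, timeout_s=120.0):
    """Run a real subprocess and return its combined output as a string."""
    try:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,  # fold stderr into stdout
        )
    except FileNotFoundError:
        return f"error: binary not found: {argv[0]}"

    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()        # stop the process so it doesn't linger
        await proc.wait()  # reap it before returning
        return f"error: timed out after {timeout_s}s"

    return stdout.decode(errors="replace").strip()

# Demo with the Python interpreter itself, so no OSINT tool is required.
demo_output = asyncio.run(
    run_binary([sys.executable, "-c", "print('hello from a real process')"])
)
print(demo_output)
```

Because failures come back as strings, the agent loop can pass them to the model as a normal tool_result and let it decide what to do next.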

Project Architecture

The codebase is organized in five layers. The hard rule across the codebase is that no layer imports from a layer above it:

openosint/tools/        Core tools
                        Async wrappers around external binaries and APIs.
                        Stateless. No AI. No CLI. Pure functions.

openosint/agent.py      AI agent
                        Anthropic tool use loop.
                        Per-session conversation history.
                        Imports from tools/. Nothing imports from agent.py.

openosint/repl.py       Interactive REPL (prompt_toolkit + Rich)
openosint/mcp_server.py MCP server (stdio transport)
openosint/cli.py        CLI entry point

This separation makes each layer independently testable. The core tools are pure async functions that take a string and return a string — you can unit test them without touching the agent or the CLI.

It also means the AI layer is entirely optional. If you don't have an Anthropic API key, you use the CLI and bypass the agent. The MCP server also operates independently of the agent.

The 9 Available Tools

Tool	Backend	What it returns
`search_email`	holehe	Social accounts linked to an email
`search_username`	sherlock	Accounts across 300+ platforms
`search_breach`	HaveIBeenPwned v3	Breach names, dates, leaked data types
`search_whois`	python-whois	Registrant, registrar, creation/expiry
`search_ip`	ipinfo.io	Geolocation, ASN, hostname, org
`search_domain`	sublist3r	Subdomain enumeration
`generate_dorks`	built-in	12 targeted Google dork URLs, no network calls
`search_paste`	psbdmp.ws	Pastebin dump mentions
`search_phone`	phoneinfoga	Carrier, country, line type

Conclusion

In this tutorial, you learned how to set up and use OpenOSINT — a Python OSINT framework built on Claude's tool use API.

The key takeaway is the design principle: by using native tool use, the agent never generates tool output. It only reads real output from real binaries. This makes it suitable for security research where accuracy matters and hallucination isn't an acceptable failure mode.

To recap the three interfaces:

Run openosint for the interactive AI REPL — best for full investigations with automatic chaining
Run openosint email or openosint username for direct CLI access — best for scripting and automation
Register the MCP server in Claude Code or Claude Desktop to run investigations inside your existing AI environment

The full source code is available on GitHub under the MIT license. Contributions and issues are welcome.

Legal note: OpenOSINT is for authorized security research, penetration testing, and investigative journalism only. Users are solely responsible for compliance with applicable law, including GDPR, CCPA, and the CFAA. See the DISCLAIMER.md for the full notice.

How to Build a Market Research Copilot with MCP and Python [Full Handbook]

Nikhil Adithyan — Wed, 06 May 2026 18:11:37 +0000

Most financial AI tools are good at one thing: summarizing a stock. You ask about Apple, NVIDIA, or Tesla, and they give you a clean overview of price action, a few ratios, and maybe some company context. That can be useful, but it falls short the moment the task becomes more like real research.

Real research usually starts with a view. Not a ticker. A trader, analyst, or product team is more likely to ask something like, “Apple looks attractive because downside has been controlled and business quality remains high. Does the data actually support that?” That's a different problem. A summary can't answer it properly because the system needs to test the claim itself, not just describe the company around it.

In this tutorial, we're going to build a financial research copilot that does exactly that. It takes a natural-language thesis, pulls historical prices and fundamentals through EODHD’s MCP server, turns those inputs into structured evidence, and returns a short research memo with a verdict.

Prerequisites
What This Copilot Actually Produces
What Makes This Different from a Normal Stock Assistant
The Workflow
Building the MCP Client
Setting Up core.py
Parsing a Research Prompt into a Structured Request
Fetching the Two Data Sources: Historical & Fundamental Data
Building the First Evidence Layer from Price Data
Building the Second Evidence Layer from Fundamentals
What do we have so far?
Classifying the Thesis
Turning Signals into Support, Contradiction, and Missing Evidence
- Sanity Check (Jupyter Notebook)
Assigning a Verdict
Building the Facts Object
Writing the Final Memo
- Sanity Check (Jupyter Notebook)
Stitching Everything Together
Demo Time! (Jupyter Notebook)
- Demo 1. Testing Whether a Premium Is Actually Justified
- Demo 2. Testing Whether Volatility Is Too High for the Underlying Business
Final Thoughts

Prerequisites

Before starting, make sure you have the following in place.

You will need Python 3.10 or later (the mcp package does not support older versions), along with these libraries: mcp, openai, numpy, and pandas. Install them with pip before running any code.

You will also need two API keys. One from EODHD for historical prices and fundamentals data, and one from OpenAI for parsing and memo generation. If you don't have an EODHD key, you can get one by registering for a developer account at eodhd.com.

The tutorial assumes basic familiarity with Python and async programming. You don't need a background in finance, but it helps to understand what a P/E ratio and drawdown mean before reading the evidence-building sections.

A Jupyter notebook environment is recommended for running the sanity checks, though any Python environment that supports await will work.

What This Copilot Actually Produces

Before getting into the pipeline, it helps to see the kind of output we're building toward. The easiest way to understand this project is to look at one real example.

Suppose the user gives the system this prompt:

I think Apple looks attractive because downside has been controlled and business quality remains high. Can you test that for AAPL over the last 180 days?

The copilot doesn't respond with a loose summary of Apple. It turns that into a structured research memo:

1. Thesis under review  

Apple appears attractive due to controlled downside and sustained high business quality.

2. Supporting evidence  

Over the past 180 days, maximum drawdown was limited to -13.82%, suggesting relatively contained downside. Profitability metrics are strong, with a 35.37% operating margin and 27.04% profit margin. Returns on capital are high, with ROA at 24.38% and ROE at 152.02%, indicating efficient asset use and strong capital efficiency. Growth metrics support ongoing business strength, with quarterly revenue growth of 15.70% and earnings growth of 18.30% year-over-year. Forward estimates also remain positive, with expected earnings growth of 9.68% and revenue growth of 6.87%.

3. Evidence that weakens the thesis  

Net EPS revisions over the past 30 days are negative (-3), indicating some deterioration in analyst sentiment.

4. Missing evidence  

No material gaps in the provided dataset.

5. Verdict  

partially_supported - There is more supporting evidence than contradicting evidence, but the thesis is not fully confirmed.

6. Bottom-line assessment  

Apple demonstrates strong and consistent business quality supported by high margins, returns, and continued growth. Downside has been relatively contained over the observed period, though not negligible. However, negative earnings revisions introduce some caution, leaving the thesis supported but not conclusively established.

This example makes the goal of the project much clearer. We're not building a system that simply tells us what happened to Apple. We're building one that takes a claim, checks it against market and fundamentals data, and returns a structured judgment.

That distinction matters because the memo is only the final surface. Underneath it, the system first parses the thesis, pulls prices and fundamentals through EODHD’s MCP server, computes the relevant signals, builds support and contradiction, assigns a verdict, and only then writes the final note. That's what gives the output its structure.

We'll start by building everything up to the evidence layers that power this kind of output, then move on to classification, the verdict, and the memo itself.

What Makes This Different from a Normal Stock Assistant

A normal stock assistant starts with a ticker and tries to explain what happened. It may summarize price action, mention a few ratios, and add some company context. That is useful when the question is broad, but it's not enough when the input is a specific investment view.

This project starts from the opposite direction. The input is not “tell me about Apple.” The input is a claim, like Apple looks attractive because downside has been controlled and business quality remains high. That changes the job of the system. It now has to test each part of that claim, decide what supports it, decide what weakens it, and be clear about what's still missing.

That one shift is what shapes the whole workflow. Instead of ending at retrieval and summarization, the pipeline has to parse the thesis, map the data to the right kind of evidence, and return a verdict. That's what makes this feel like a research copilot rather than a better stock summary tool.

The Workflow

At a high level, the copilot follows a simple sequence:

parse the user’s thesis into a structured request
fetch historical prices and fundamentals through MCP
turn those inputs into market and business signals
map those signals into support, contradiction, and missing evidence
assign a verdict
write the final memo

That's the full loop. The output may look like a short research note, but it sits on top of a more controlled pipeline in core.py.

Project structure:

project/
├── client.py
├── core.py
└── test.ipynb

client.py is the MCP access layer. It connects to EODHD, lists tools, calls them with retries and timeouts, and returns metadata for each request. core.py contains the actual thesis-testing logic, including parsing, data fetching, signal computation, evidence building, verdict assignment, and memo generation. test.ipynb is where the quality checks and end-to-end demos are run.

This split is useful because it keeps the tutorial easy to follow. When we move into code, each block has a clear place. MCP access stays in client.py, while the research workflow stays in core.py.

Building the MCP Client

We’ll start with the thinnest part of the project, which is the MCP access layer.

This file only does one job. It connects to EODHD’s MCP server, lists available tools, calls a tool with retries and a timeout, and returns a small metadata object alongside the response. The actual thesis logic doesn't belong here. Keeping this layer small makes the rest of the project much easier to reason about later.

Create a file called client.py and add this:

import time
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

class EODHDMCP:
    def __init__(self, apikey, base_url=None):
        self.apikey = apikey
        self.base_url = base_url or "https://mcp.eodhd.dev/mcp"
        self._tools = None

    def _url(self):
        return f"{self.base_url}?apikey={self.apikey}"

    def _open(self):
        return streamablehttp_client(self._url())

    async def list_tools(self):
        if self._tools is not None:
            return self._tools

        async with self._open() as (read, write, _):
            async with ClientSession(read, write) as s:
                await s.initialize()
                resp = await s.list_tools()
                self._tools = [t.name for t in resp.tools]
                return self._tools

    async def call_tool(self, name, args, trace_id, timeout_s=25, retries=2):
        last = None

        for attempt in range(retries + 1):
            t0 = time.time()
            try:
                async with self._open() as (read, write, _):
                    async with ClientSession(read, write) as s:
                        await s.initialize()
                        out = await asyncio.wait_for(s.call_tool(name, args), timeout=timeout_s)
                        dt = time.time() - t0
                        meta = {
                            "trace_id": trace_id,
                            "tool": name,
                            "args": args,
                            "latency_s": round(dt, 3),
                        }
                        return out, meta
            except Exception as e:
                last = e
                if attempt < retries:
                    await asyncio.sleep(0.5 * (attempt + 1))

        raise last

There are only two methods that really matter here. list_tools() is just a quick way to inspect and cache the tools exposed by the MCP server. call_tool() is the method the rest of the project will actually use. It makes the request, applies timeout and retry handling, and returns both the raw output and a small metadata object.

That metadata becomes useful later because the workflow stays traceable. When the copilot returns a memo, we still know which tool was called, with what arguments, and how long it took. So even though this file is small, it gives the rest of the system a clean and inspectable access layer.
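
The retry behavior in call_tool() is easier to see in isolation. The sketch below pulls the same pattern (retries + 1 attempts, with a sleep that grows by a fixed step after each failure) into a standalone helper and exercises it with a fake tool. Both with_retries and FlakyTool are invented for this illustration and aren't part of client.py.

```python
import asyncio

async def with_retries(make_call, retries=2, base_delay_s=0.5):
    """Await make_call() up to retries + 1 times, sleeping a bit longer each time."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                # 0.5s, then 1.0s, ... with the default base delay
                await asyncio.sleep(base_delay_s * (attempt + 1))
    raise last_error

class FlakyTool:
    """Hypothetical stand-in for a remote tool: fails twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    async def __call__(self):
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError(f"simulated failure #{self.calls}")
        return "ok"

flaky = FlakyTool()
# A tiny delay keeps the demo fast; call_tool() above uses 0.5s steps.
result = asyncio.run(with_retries(flaky, retries=2, base_delay_s=0.01))
print(result, flaky.calls)
```

With retries=2, the third attempt is the last one, so a tool that fails three times in a row would surface its final exception to the caller.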

Setting Up `core.py`

Now that the MCP client is ready, we can start building the main workflow in core.py.

This file will hold the actual thesis-testing logic, so the first step is to set up the imports, API clients, a few limits, and some small helper functions that the rest of the pipeline will reuse.

Create a file called core.py and start with this:

import json
import re
import time
import uuid
import asyncio
from datetime import date, timedelta

import numpy as np
import pandas as pd
from openai import OpenAI

from client import EODHDMCP

eodhd_api_key = "your eodhd api key"
mcp_base_url = "https://mcp.eodhd.dev/mcp"

openai_api_key = "your openai api key"
model_name = "gpt-5.3-chat-latest"

max_lookback_days = 365
max_tool_calls = 10
max_tickers = 5

mcp = EODHDMCP(eodhd_api_key, base_url=mcp_base_url)
oa = OpenAI(api_key=openai_api_key)

def log_event(event, trace_id, **extra):
    payload = {
        "event": event,
        "trace_id": trace_id,
        "ts": round(time.time(), 3),
    }
    payload.update(extra)
    print(json.dumps(payload, default=str))

def get_dates_from_lookback(days):
    end = date.today()
    start = end - timedelta(days=int(days))
    return start.isoformat(), end.isoformat()

def make_state():
    return {
        "tool_calls": 0,
        "tool_trace": [],
    }

def bump_tool_call(state, meta):
    state["tool_calls"] += 1
    state["tool_trace"].append(meta)

    if state["tool_calls"] > max_tool_calls:
        raise RuntimeError("tool call budget exceeded")

def to_text(out):
    if isinstance(out, str):
        return out.strip()

    if hasattr(out, "content"):
        try:
            parts = []
            for item in out.content:
                if hasattr(item, "text") and item.text is not None:
                    parts.append(item.text)
                else:
                    parts.append(str(item))
            return "\n".join(parts).strip()
        except Exception:
            pass

    return str(out).strip()

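Note: Replace “your eodhd api key” and “your openai api key” with your actual keys. If you don’t have an EODHD key, you can obtain one by opening an EODHD developer account. For anything beyond a local experiment, load both keys from environment variables instead of hardcoding them.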

This block does three things:

First, it sets up the two clients we need. mcp is the EODHD MCP client from client.py, and oa is the OpenAI client that will be used for parsing and memo generation later.
Second, it defines a few small limits for the workflow. These help keep the system controlled by capping the lookback window, the number of tickers, and the number of tool calls in a single run.
Third, it adds helper functions that the rest of the file depends on. log_event() gives us lightweight tracing, get_dates_from_lookback() converts a lookback window into start and end dates, make_state() and bump_tool_call() help track MCP usage, and to_text() safely converts tool output into plain text before we parse it.

Parsing a Research Prompt into a Structured Request

The first thing this copilot needs to do is clean up the input. A user isn't going to send a perfectly formatted request every time. They're more likely to write a research thought in plain English and mix the thesis, ticker, and timeframe into one prompt.

That is why the system starts by turning the raw prompt into four fields:

ticker
lookback window
thesis
mode

This logic goes into core.py.

def parse_request(text):
    prompt = f"""
You are extracting fields for a financial thesis-testing copilot.

Return only valid JSON with this exact shape:
{{
  "tickers": ["AAPL"],
  "lookback_days": 180,
  "thesis": "the actual thesis statement",
  "mode": "single"
}}

Rules:
- Extract only tickers explicitly mentioned or strongly implied.
- Do not invent tickers.
- If there are multiple tickers, mode must be "watchlist".
- If there is one ticker, mode must be "single".
- If no timeframe is mentioned, use 180.
- Convert months to days using 30 days per month.
- Convert years to days using 365 days per year.
- Keep the thesis concise but faithful to the user's intent.
- Return JSON only. No markdown. No explanation.

User request:
{text}
""".strip()

    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )

    raw = r.output_text.strip()

    try:
        parsed = json.loads(raw)
    except Exception:
        raise RuntimeError(f"parser returned non-json text: {raw[:500]}")

    return parsed

This function gives the model one very narrow job. It's not asking for an opinion or analysis. It's only asking for structured extraction. That matters because we want flexibility at the input layer, but we don't want the whole workflow to become fuzzy.
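
One practical wrinkle: even with a "JSON only" instruction, a model will occasionally wrap its answer in a Markdown code fence, and json.loads() then fails. The helper below is an optional hardening sketch, not part of the original core.py, and the extract_json_object name is made up. It strips a surrounding fence if one is present before parsing.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
FENCED_BLOCK = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)

def extract_json_object(raw):
    """Parse a JSON object from model text, tolerating a Markdown fence around it."""
    text = raw.strip()
    match = FENCED_BLOCK.match(text)
    if match:
        text = match.group(1)
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed

plain = '{"tickers": ["AAPL"], "lookback_days": 180}'
fenced = f'{FENCE}json\n{plain}\n{FENCE}'

print(extract_json_object(plain))
print(extract_json_object(fenced))
```

If you adopt something like this, it would replace the bare json.loads(raw) call inside parse_request(), with the same RuntimeError handling around it.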

Once the model returns that JSON, Python takes over and tightens it up.

def enforce_limits(parsed):
    tickers = parsed.get("tickers", [])
    if not isinstance(tickers, list):
        tickers = []

    tickers = [str(x).upper().strip() for x in tickers if str(x).strip()]
    tickers = tickers[:max_tickers]

    lookback_days = parsed.get("lookback_days", 180)
    try:
        lookback_days = int(lookback_days)
    except Exception:
        lookback_days = 180

    if lookback_days < 1:
        lookback_days = 1
    if lookback_days > max_lookback_days:
        lookback_days = max_lookback_days

    thesis = str(parsed.get("thesis", "")).strip()
    if not thesis:
        thesis = "No thesis provided."

    # The model's "mode" value is ignored: mode is derived from the ticker count
    mode = "watchlist" if len(tickers) > 1 else "single"

    return {
        "tickers": tickers,
        "lookback_days": lookback_days,
        "thesis": thesis,
        "mode": mode,
    }

This second function is what keeps the workflow controlled. It cleans the tickers, caps how many we allow in one request, clamps the time window, and makes sure the mode matches the number of tickers. So the model gives us flexibility, while the code gives us boundaries. That combination is important for a build like this.

Fetching the Two Data Sources: Historical & Fundamental Data

Once the request is parsed, the next step is to pull the data that will feed the rest of the workflow. For this version, we only use two sources from EODHD: historical prices and fundamentals. That's enough to test a surprising number of thesis types without making the build unnecessarily wide.

Add these two functions to core.py:

async def fetch_prices(ticker, start_date, end_date, trace_id, state):
    args = {
        "ticker": ticker,
        "start_date": start_date,
        "end_date": end_date,
        "period": "d",
        "order": "a",
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_historical_stock_prices", args, trace_id)
    text = to_text(out)

    bump_tool_call(state, meta)

    if not text:
        raise RuntimeError("empty response from get_historical_stock_prices")

    try:
        data = json.loads(text)
    except Exception:
        raise RuntimeError(f"price tool returned non-json text: {text[:300]}")

    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    df = pd.DataFrame(data)
    if df.empty:
        return df

    keep = [c for c in ["date", "close"] if c in df.columns]
    df = df[keep].copy()
    df["ticker"] = ticker

    return df

async def fetch_fundamentals(ticker, trace_id, state):
    args = {
        "ticker": ticker,
        "include_financials": False,
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_fundamentals_data", args, trace_id)
    text = to_text(out)

    bump_tool_call(state, meta)

    if not text:
        raise RuntimeError("empty response from get_fundamentals_data")

    try:
        data = json.loads(text)
    except Exception:
        raise RuntimeError(f"fundamentals tool returned non-json text: {text[:300]}")

    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    return data

fetch_prices() pulls daily historical data for the requested window and reduces it to the fields we actually need right now: date, close, and the ticker itself. That trimmed DataFrame is what we'll later use for return, drawdown, volatility, trend, and other market signals.
fetch_fundamentals() keeps the fundamentals payload as JSON because we'll extract different categories from it in the next sections, including margins, growth, valuation, revisions, and beta.

A couple of details matter here. Both functions run through the same MCP wrapper, so they automatically inherit the timeout, retry, and metadata handling we already built in client.py. Both also call bump_tool_call(), which lets us track how many external calls were made during a single run. That becomes useful later when we want the workflow to stay inspectable rather than feel like a black box.

Building the First Evidence Layer from Price Data

Once the price data is in, the next step is to turn that raw series into something we can actually reason with. For this copilot, price history isn't the final answer, but it is still the first evidence layer. It helps us test claims around downside control, risk, momentum, and the quality of returns.

Add this to core.py:

def compute_price_signals(prices_df):
    if prices_df is None or prices_df.empty:
        return {}

    df = prices_df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["close"] = pd.to_numeric(df["close"], errors="coerce")

    df = df.dropna(subset=["date", "close"]).sort_values("date")
    if df.empty:
        return {}

    close = df["close"]
    rets = close.pct_change().dropna()

    out = {
        "n_points": int(len(close)),
        "start_price": float(close.iloc[0]),
        "end_price": float(close.iloc[-1]),
    }

    if len(close) >= 2:
        out["ret_total"] = float(close.iloc[-1] / close.iloc[0] - 1)

    if len(rets) >= 2:  # std() of a single return is NaN, so require at least two
        vol_daily = float(rets.std())
        vol_annualized = float(vol_daily * np.sqrt(252))

        out["vol_daily"] = vol_daily
        out["vol_annualized"] = vol_annualized

        if vol_annualized > 0 and "ret_total" in out:
            out["ret_to_vol"] = float(out["ret_total"] / vol_annualized)

    peak = close.cummax()
    drawdown = close / peak - 1
    out["max_drawdown"] = float(drawdown.min())

    logp = np.log(close.values)
    x = np.arange(len(logp))
    if len(logp) >= 3:
        out["trend_slope"] = float(np.polyfit(x, logp, 1)[0])
    else:
        out["trend_slope"] = 0.0

    return out

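This function gives us a compact set of market signals from a plain close-price series. ret_total tells us how the stock moved over the full window. vol_annualized tells us how noisy that move was, scaling the daily standard deviation of returns by the square root of 252 trading days. max_drawdown is the deepest fall from a running peak, which is useful when the thesis talks about downside control. trend_slope is the slope of a straight line fitted to log prices, so it gives us a simple directional measure. ret_to_vol divides total return by annualized volatility, which helps us judge return quality instead of looking at raw return alone. Because the return itself isn't annualized, treat it as a rough ratio rather than a true Sharpe ratio.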

The important point here is that we aren't asking the model to infer all of this from raw prices. We compute it first in Python, so the later reasoning step starts from explicit signals rather than vague interpretation. That makes the whole workflow much more stable.
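To make the return and drawdown arithmetic concrete, here is a minimal sketch in plain Python on a made-up six-day close series. The prices are illustrative rather than real market data, and the helper name toy_price_signals is hypothetical. It mirrors only the ret_total and max_drawdown parts of compute_price_signals().

```python
def toy_price_signals(closes):
    """Compute total return and max drawdown for a list of close prices."""
    peak = closes[0]
    max_drawdown = 0.0
    for price in closes:
        peak = max(peak, price)  # running high seen so far
        max_drawdown = min(max_drawdown, price / peak - 1)  # deepest fall from that high
    return {
        "ret_total": closes[-1] / closes[0] - 1,
        "max_drawdown": max_drawdown,
    }


closes = [100.0, 110.0, 99.0, 105.0, 120.0, 108.0]  # illustrative prices only
signals = toy_price_signals(closes)
print(signals)
```

On this series the total return comes out near 8% and the maximum drawdown near -10%, a level reached twice (110 to 99, then 120 to 108). A positive window return can still hide a double-digit fall along the way, which is why both numbers are kept as separate signals.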

Building the Second Evidence Layer from Fundamentals

Price data gives us one side of the thesis. The second side comes from fundamentals. This is the part that makes the project stop sounding generic. Once the copilot starts treating fundamentals as actual evidence, instead of just company profile data, the outputs become much more useful.

Add this helper first in core.py:

def _to_float(x):
    if x in (None, "", "NA"):
        return None
    try:
        return float(x)
    except Exception:
        return None

This small function just cleans values before we use them. Fundamentals payloads often contain strings, nulls, or "NA", so it helps to normalize everything early.

Now add the main function:

def compute_fundamental_signals(fundamentals):
    if not isinstance(fundamentals, dict):
        return {}

    general = fundamentals.get("General", {}) or {}
    highlights = fundamentals.get("Highlights", {}) or {}
    valuation = fundamentals.get("Valuation", {}) or {}
    technicals = fundamentals.get("Technicals", {}) or {}

    earnings = fundamentals.get("Earnings", {}) or {}
    trend = earnings.get("Trend", {}) or {}

    latest_trend = None
    if isinstance(trend, dict) and trend:
        latest_key = sorted(trend.keys())[-1]
        latest_trend = trend.get(latest_key, {}) or {}
    else:
        latest_trend = {}

    out = {
        "sector": general.get("Sector"),
        "industry": general.get("Industry"),
        "employees": _to_float(general.get("FullTimeEmployees")),

        "market_cap": _to_float(highlights.get("MarketCapitalization")),
        "pe_ratio": _to_float(highlights.get("PERatio")),
        "peg_ratio": _to_float(highlights.get("PEGRatio")),
        "profit_margin": _to_float(highlights.get("ProfitMargin")),
        "operating_margin": _to_float(highlights.get("OperatingMarginTTM")),
        "roa": _to_float(highlights.get("ReturnOnAssetsTTM")),
        "roe": _to_float(highlights.get("ReturnOnEquityTTM")),
        "revenue_ttm": _to_float(highlights.get("RevenueTTM")),
        "revenue_growth_yoy": _to_float(highlights.get("QuarterlyRevenueGrowthYOY")),
        "earnings_growth_yoy": _to_float(highlights.get("QuarterlyEarningsGrowthYOY")),
        "dividend_yield": _to_float(highlights.get("DividendYield")),

        "trailing_pe": _to_float(valuation.get("TrailingPE")),
        "forward_pe": _to_float(valuation.get("ForwardPE")),
        "price_sales": _to_float(valuation.get("PriceSalesTTM")),
        "price_book": _to_float(valuation.get("PriceBookMRQ")),
        "ev_revenue": _to_float(valuation.get("EnterpriseValueRevenue")),
        "ev_ebitda": _to_float(valuation.get("EnterpriseValueEbitda")),

        "beta": _to_float(technicals.get("Beta")),

        "earnings_estimate_growth": _to_float(latest_trend.get("earningsEstimateGrowth")),
        "revenue_estimate_growth": _to_float(latest_trend.get("revenueEstimateGrowth")),
        "eps_revisions_up_30d": _to_float(latest_trend.get("epsRevisionsUpLast30days")),
        "eps_revisions_down_30d": _to_float(latest_trend.get("epsRevisionsDownLast30days")),
    }

    if out["trailing_pe"] is not None and out["forward_pe"] is not None:
        out["forward_vs_trailing_pe_change"] = out["forward_pe"] - out["trailing_pe"]

    if out["eps_revisions_up_30d"] is not None and out["eps_revisions_down_30d"] is not None:
        out["net_eps_revisions_30d"] = out["eps_revisions_up_30d"] - out["eps_revisions_down_30d"]

    return out

This function pulls together the parts of the fundamentals payload that matter most for thesis testing.

From Highlights, we get profitability, returns on capital, growth, and market cap.
From Valuation, we get multiples like trailing P/E, forward P/E, price-to-sales, and EV-based ratios.
From Technicals, we take beta.
From Earnings.Trend, we pick up forward estimate growth and revision data.

These are the fields that let us test claims around business quality, premium justification, valuation, and forward expectations in a much more concrete way.

The last two derived fields are also useful. The gap between forward P/E and trailing P/E gives us a quick way to see whether valuation is easing or staying stretched. Net EPS revisions over the last 30 days tell us whether analyst expectations are improving or deteriorating.
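The two derived fields, and the way the latest Earnings.Trend entry is picked, can be sketched on their own. This is a minimal illustration with made-up numbers. The helper name derive_forward_fields is hypothetical, and the trend keys are assumed to be ISO-style date strings, which is what makes a plain sort return the most recent one.

```python
def derive_forward_fields(trailing_pe, forward_pe, revisions_up, revisions_down):
    """Recompute the two derived fields from already-cleaned numeric inputs."""
    out = {}
    if trailing_pe is not None and forward_pe is not None:
        # Negative means the forward multiple is lower than the trailing one.
        out["forward_vs_trailing_pe_change"] = forward_pe - trailing_pe
    if revisions_up is not None and revisions_down is not None:
        # Positive means more estimates were raised than cut.
        out["net_eps_revisions_30d"] = revisions_up - revisions_down
    return out


# Assumed shape: trend entries keyed by ISO dates, so string order equals date order.
trend = {"2026-03-31": {}, "2025-12-31": {}, "2026-06-30": {}}
latest_key = sorted(trend.keys())[-1]

derived = derive_forward_fields(32.0, 28.5, 12.0, 4.0)  # illustrative values
print(latest_key, derived)
```

Here the forward P/E sits 3.5 points below the trailing P/E and net revisions are +8. If the trend keys were not ISO-formatted, the string sort would no longer be a reliable way to find the latest entry.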

What Do We Have So Far?

At this point, the copilot can parse a thesis, fetch prices and fundamentals, and convert both into two reusable signal layers:

Price signals cover return, volatility, drawdown, trend, and return quality.
Fundamentals signals cover margins, returns on capital, growth, valuation, revisions, and beta.

Next, we’ll turn those signals into what a real research workflow needs: supporting evidence, weakening evidence, what’s missing, a verdict, and the final memo.

Classifying the Thesis

Before the copilot can judge a thesis, it first needs to understand what kind of claim is being made.

This matters because not every thesis should be tested the same way. A claim about controlled downside should care more about drawdown and volatility. A claim about business quality should lean more on margins, returns on capital, and growth. A claim about premium justification may need both business quality and valuation context.

So instead of jumping straight from signals to a verdict, we'll add a small classification step. This gives the system a short list of claim types to work with and a cleaner summary of the thesis.

Add this to core.py:

def classify_thesis(thesis):
    prompt = f"""
You are classifying a stock thesis into a few broad claim types.

Return only valid JSON like this:
{{
  "claim_types": ["controlled_downside", "business_quality"],
  "summary": "short restatement of the thesis"
}}

Allowed claim types:
- controlled_downside
- momentum_strength
- low_risk
- high_risk
- valuation_attractive
- valuation_expensive
- business_quality
- weak_business_quality
- premium_justified
- premium_not_justified

Rules:
- pick only the claim types that are clearly relevant
- do not invent extra labels
- if nothing fits strongly, return an empty list
- summary should be short and faithful

Thesis:
{thesis}
""".strip()

    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )

    raw = r.output_text.strip()

    try:
        out = json.loads(raw)
    except Exception:
        raise RuntimeError(f"thesis classifier returned non-json text: {raw[:500]}")

    claim_types = out.get("claim_types", [])
    if not isinstance(claim_types, list):
        claim_types = []

    clean = []
    allowed = {
        "controlled_downside",
        "momentum_strength",
        "low_risk",
        "high_risk",
        "valuation_attractive",
        "valuation_expensive",
        "business_quality",
        "weak_business_quality",
        "premium_justified",
        "premium_not_justified",
    }

    for x in claim_types:
        x = str(x).strip()
        if x in allowed and x not in clean:
            clean.append(x)

    return {
        "claim_types": clean,
        "summary": str(out.get("summary", "")).strip(),
    }

This function keeps the model’s job narrow. It's not being asked to decide whether the thesis is right or wrong. It's only being asked to identify the kind of thesis it's dealing with. That makes the next step much cleaner, because the evidence engine no longer has to treat every prompt the same way.

The validation at the bottom is important too. Even though the model returns the labels, Python still filters them through an allowed set and removes anything unexpected. That keeps this step flexible, but still controlled.
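The filtering step can be tried offline, without calling the model, by feeding it a hard-coded response string. This is a minimal sketch: the raw text below is invented to show the failure modes the filter guards against, and clean_claim_types is a hypothetical name for the validation logic pulled out of classify_thesis().

```python
import json

ALLOWED = {
    "controlled_downside", "momentum_strength", "low_risk", "high_risk",
    "valuation_attractive", "valuation_expensive", "business_quality",
    "weak_business_quality", "premium_justified", "premium_not_justified",
}


def clean_claim_types(raw_text):
    """Parse a model response and keep only known, de-duplicated labels."""
    out = json.loads(raw_text)
    claim_types = out.get("claim_types", [])
    if not isinstance(claim_types, list):
        return []
    clean = []
    for label in claim_types:
        label = str(label).strip()
        if label in ALLOWED and label not in clean:
            clean.append(label)
    return clean


# Invented response: one unknown label, one padded label, one duplicate.
raw = (
    '{"claim_types": ["business_quality", "great_management", '
    '" low_risk ", "business_quality"], "summary": "example"}'
)
print(clean_claim_types(raw))  # ['business_quality', 'low_risk']
```

The invented label is dropped, the padded one is normalized, and the duplicate is removed, so downstream code only ever sees labels it has rules for.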

Turning Signals into Support, Contradiction, and Missing Evidence

This is the step where the copilot actually starts reasoning.

Up to this point, we have three things in hand. We have the thesis, we have the claim types, and we have the signal layers built from price data and fundamentals. But none of that is useful on its own unless the system can turn it into a clear argument.

That means it needs to answer three questions for every thesis:

What in the data supports this claim?
What in the data weakens it?
What is still missing before we can judge it properly?

That's exactly what build_evidence_blocks() does. It takes the classified thesis, checks the relevant price and fundamentals signals, and sorts them into three buckets: support, contradiction, and missing evidence.

Add this to core.py:

def build_evidence_blocks(thesis, thesis_tags, price_signals, fundamental_signals):
    evidence_for = []
    evidence_against = []
    missing_evidence = []

    ret_total = price_signals.get("ret_total")
    vol = price_signals.get("vol_annualized")
    dd = price_signals.get("max_drawdown")
    trend = price_signals.get("trend_slope")
    ret_to_vol = price_signals.get("ret_to_vol")

    pe = fundamental_signals.get("pe_ratio") or fundamental_signals.get("trailing_pe")
    forward_pe = fundamental_signals.get("forward_pe")
    beta = fundamental_signals.get("beta")

    profit_margin = fundamental_signals.get("profit_margin")
    operating_margin = fundamental_signals.get("operating_margin")
    roa = fundamental_signals.get("roa")
    roe = fundamental_signals.get("roe")
    revenue_growth = fundamental_signals.get("revenue_growth_yoy")
    earnings_growth = fundamental_signals.get("earnings_growth_yoy")
    earnings_estimate_growth = fundamental_signals.get("earnings_estimate_growth")
    revenue_estimate_growth = fundamental_signals.get("revenue_estimate_growth")
    net_eps_revisions = fundamental_signals.get("net_eps_revisions_30d")

    claim_types = thesis_tags.get("claim_types", [])

    if "controlled_downside" in claim_types:
        if dd is not None:
            if dd > -0.15:
                evidence_for.append(f"Maximum drawdown was relatively contained at {dd:.2%}.")
            else:
                evidence_against.append(f"Maximum drawdown reached {dd:.2%}, which weakens the controlled-downside claim.")
        else:
            missing_evidence.append("No drawdown signal available to test downside control.")

    if "momentum_strength" in claim_types:
        if trend is not None and ret_total is not None:
            if trend > 0 and ret_total > 0:
                evidence_for.append(f"Trend was positive and total return over the window was {ret_total:.2%}.")
            else:
                evidence_against.append("Trend and total return do not strongly support a momentum-strength view.")
        else:
            missing_evidence.append("No usable trend or return signal available to test momentum.")

    if "low_risk" in claim_types:
        if vol is not None:
            if vol < 0.30:
                evidence_for.append(f"Annualized volatility was {vol:.2%}, which supports a lower-risk view.")
            else:
                evidence_against.append(f"Annualized volatility was {vol:.2%}, which weakens a low-risk thesis.")
        else:
            missing_evidence.append("No volatility signal available to test risk.")

    if "high_risk" in claim_types:
        if vol is not None:
            if vol >= 0.30:
                evidence_for.append(f"Annualized volatility was {vol:.2%}, which supports a higher-risk view.")
            else:
                evidence_against.append(f"Annualized volatility was only {vol:.2%}, which does not strongly support a high-risk thesis.")
        else:
            missing_evidence.append("No volatility signal available to test risk.")

    if "valuation_attractive" in claim_types:
        if pe is not None:
            if pe < 20:
                evidence_for.append(f"P/E is {pe:.2f}, which supports a more attractive valuation view.")
            elif pe > 30:
                evidence_against.append(f"P/E is {pe:.2f}, which weakens the attractive-valuation claim.")
        else:
            missing_evidence.append("No P/E metric available to test valuation attractiveness.")

        if forward_pe is not None and pe is not None:
            if forward_pe < pe:
                evidence_for.append(f"Forward P/E ({forward_pe:.2f}) is below trailing P/E ({pe:.2f}), which can support an improving earnings setup.")

    if "valuation_expensive" in claim_types or "premium_not_justified" in claim_types:
        if pe is not None:
            if pe > 30:
                evidence_for.append(f"P/E is {pe:.2f}, which supports an expensive-valuation view.")
            else:
                evidence_against.append(f"P/E is {pe:.2f}, which does not strongly support an expensive-valuation claim.")
        else:
            missing_evidence.append("No P/E metric available to test whether valuation looks expensive.")

    if "business_quality" in claim_types or "premium_justified" in claim_types:
        quality_hits = 0

        if operating_margin is not None:
            if operating_margin >= 0.25:
                evidence_for.append(f"Operating margin is {operating_margin:.2%}, which supports strong business quality.")
                quality_hits += 1
            else:
                evidence_against.append(f"Operating margin is {operating_margin:.2%}, which is not especially strong for a quality claim.")

        if profit_margin is not None:
            if profit_margin >= 0.20:
                evidence_for.append(f"Profit margin is {profit_margin:.2%}, which supports business quality.")
                quality_hits += 1
            else:
                evidence_against.append(f"Profit margin is {profit_margin:.2%}, which weakens a strong-quality thesis.")

        if roa is not None:
            if roa >= 0.10:
                evidence_for.append(f"ROA is {roa:.2%}, which supports efficient asset use.")
                quality_hits += 1
            else:
                evidence_against.append(f"ROA is {roa:.2%}, which does not strongly support a quality claim.")

        if roe is not None:
            if roe >= 0.20:
                evidence_for.append(f"ROE is {roe:.2%}, which supports strong capital efficiency.")
                quality_hits += 1
            else:
                evidence_against.append(f"ROE is {roe:.2%}, which is weaker than expected for a strong-quality thesis.")

        if revenue_growth is not None:
            if revenue_growth > 0:
                evidence_for.append(f"Quarterly revenue growth was {revenue_growth:.2%} YoY, which supports business momentum.")
                quality_hits += 1
            else:
                evidence_against.append(f"Quarterly revenue growth was {revenue_growth:.2%} YoY, which weakens the quality claim.")

        if earnings_growth is not None:
            if earnings_growth > 0:
                evidence_for.append(f"Quarterly earnings growth was {earnings_growth:.2%} YoY, which supports operating strength.")
                quality_hits += 1
            else:
                evidence_against.append(f"Quarterly earnings growth was {earnings_growth:.2%} YoY, which weakens the quality claim.")

        if earnings_estimate_growth is not None:
            if earnings_estimate_growth > 0:
                evidence_for.append(f"Forward earnings estimate growth is {earnings_estimate_growth:.2%}, which supports a healthier forward outlook.")
            else:
                evidence_against.append(f"Forward earnings estimate growth is {earnings_estimate_growth:.2%}, which weakens the quality argument.")

        if revenue_estimate_growth is not None:
            if revenue_estimate_growth > 0:
                evidence_for.append(f"Forward revenue estimate growth is {revenue_estimate_growth:.2%}, which supports ongoing business strength.")
            else:
                evidence_against.append(f"Forward revenue estimate growth is {revenue_estimate_growth:.2%}, which weakens the quality argument.")

        if net_eps_revisions is not None:
            if net_eps_revisions > 0:
                evidence_for.append(f"Net EPS revisions over the last 30 days are positive ({net_eps_revisions:.0f}), which supports improving expectations.")
            elif net_eps_revisions < 0:
                evidence_against.append(f"Net EPS revisions over the last 30 days are negative ({net_eps_revisions:.0f}), which weakens the thesis.")

        if quality_hits == 0:
            missing_evidence.append("This version could not extract enough direct business-quality metrics to test the quality claim.")

    if "weak_business_quality" in claim_types:
        if operating_margin is not None and operating_margin < 0.15:
            evidence_for.append(f"Operating margin is only {operating_margin:.2%}, which supports a weaker-quality view.")
        if profit_margin is not None and profit_margin < 0.10:
            evidence_for.append(f"Profit margin is only {profit_margin:.2%}, which supports a weaker-quality view.")
        if revenue_growth is not None and revenue_growth <= 0:
            evidence_for.append(f"Revenue growth is {revenue_growth:.2%} YoY, which supports a weaker-quality view.")
        if earnings_growth is not None and earnings_growth <= 0:
            evidence_for.append(f"Earnings growth is {earnings_growth:.2%} YoY, which supports a weaker-quality view.")

    if beta is not None:
        if beta > 1.2:
            evidence_against.append(f"Beta is {beta:.2f}, which suggests above-market sensitivity.")
        elif beta < 0.9:
            evidence_for.append(f"Beta is {beta:.2f}, which suggests below-market sensitivity.")
    else:
        missing_evidence.append("No beta value available.")

    if ret_to_vol is None:
        missing_evidence.append("No return-to-volatility signal available.")

    if not evidence_for and not evidence_against:
        missing_evidence.append("The current data is not enough to strongly support or reject the thesis.")

    return {
        "thesis": thesis,
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": claim_types,
        "evidence_for": evidence_for,
        "evidence_against": evidence_against,
        "missing_evidence": list(dict.fromkeys(missing_evidence)),
    }

The function looks long, but the logic is simple once you break it down.

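It starts by pulling the signals it needs from the two evidence layers that we built earlier. Then it checks the thesis tags one by one. If the thesis is about controlled downside, it looks at drawdown. If it's about risk, it looks at volatility. If it's about business quality, it leans on margins, returns on capital, growth, and revisions. If it's about valuation, it checks multiples like P/E and the relationship between forward and trailing valuation. Beta is checked for every thesis regardless of claim type: a beta above 1.2 is logged as a weakening point and a beta below 0.9 as a supporting one, so read that line with care for a high_risk thesis, where a high beta actually agrees with the claim.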

That's the key shift in this project. The copilot is no longer just collecting data. It's deciding which parts of the EODHD-backed signal set actually matter for the thesis in front of it.

The three output buckets are what make this useful.

evidence_for holds the points that support the claim.
evidence_against holds the points that weaken it.
missing_evidence makes the gaps explicit instead of letting the system sound more confident than it should.

That's what makes this feel like a thesis-testing workflow rather than a polished stock summary.
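The three-bucket pattern is easier to see on a single rule. The sketch below isolates the drawdown check with the same -15% threshold used above. The function name check_downside and the sample readings are hypothetical, chosen so that each bucket gets hit once.

```python
def check_downside(max_drawdown, threshold=-0.15):
    """Sort one drawdown reading into support, contradiction, or missing."""
    buckets = {"evidence_for": [], "evidence_against": [], "missing_evidence": []}
    if max_drawdown is None:
        buckets["missing_evidence"].append(
            "No drawdown signal available to test downside control."
        )
    elif max_drawdown > threshold:
        buckets["evidence_for"].append(
            f"Maximum drawdown was relatively contained at {max_drawdown:.2%}."
        )
    else:
        buckets["evidence_against"].append(
            f"Maximum drawdown reached {max_drawdown:.2%}."
        )
    return buckets


for reading in (-0.11, -0.27, None):  # illustrative readings
    print(reading, check_downside(reading))
```

Every rule in build_evidence_blocks() follows this shape: a missing signal is recorded as a gap, not silently skipped, and a present signal lands on exactly one side of a fixed threshold.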

Sanity Check (Jupyter Notebook)

Run this code inside test.ipynb for a quick sanity check:

import json
import uuid

from core import (
    fetch_prices,
    fetch_fundamentals,
    compute_price_signals,
    compute_fundamental_signals,
    classify_thesis,
    build_evidence_blocks,
    make_state,
)

trace_id = uuid.uuid4().hex[:10]
state = make_state()

thesis = "Apple looks attractive because downside has been controlled and business quality remains high."

# Top-level await works inside a Jupyter cell.
prices = await fetch_prices("AAPL.US", "2026-01-01", "2026-04-01", trace_id, state)
funds = await fetch_fundamentals("AAPL.US", trace_id, state)

signals = compute_price_signals(prices)
# build_evidence_blocks() reads flattened keys such as "operating_margin",
# so pass it the computed signal layer, not the raw fundamentals payload.
fund_signals = compute_fundamental_signals(funds)
tags = classify_thesis(thesis)
evidence = build_evidence_blocks(thesis, tags, signals, fund_signals)

print(tags)
print(json.dumps(evidence, indent=2))

Assigning a Verdict

Once the evidence is structured, the copilot still needs one more layer before it can write a memo. It needs a controlled way to label the thesis.

That's the job of decide_verdict(). It looks at how much evidence supports the thesis, how much weakens it, and whether the claim still depends on missing business-quality or valuation evidence. The goal here isn't to create a perfect scoring model. It's to make sure the system doesn't jump from a few evidence strings straight into a confident conclusion.

Add this to core.py:

def decide_verdict(evidence, claim_types=None):
    claim_types = claim_types or []

    evidence_for = evidence.get("evidence_for", [])
    evidence_against = evidence.get("evidence_against", [])
    missing = evidence.get("missing_evidence", [])

    n_for = len(evidence_for)
    n_against = len(evidence_against)
    n_missing = len(missing)

    quality_claim = any(x in claim_types for x in ["business_quality", "weak_business_quality", "premium_justified", "premium_not_justified"])
    valuation_claim = any(x in claim_types for x in ["valuation_attractive", "valuation_expensive", "premium_justified", "premium_not_justified"])

    if n_for == 0 and n_against == 0:
        return {
            "verdict": "unresolved_due_to_missing_evidence",
            "reason": "There is not enough usable evidence to test the thesis.",
        }

    if quality_claim and n_missing >= 1:
        if n_against > 0:
            return {
                "verdict": "weakly_supported",
                "reason": "Some evidence supports the thesis, but direct business-quality evidence is missing and contradictory signals remain.",
            }
        return {
            "verdict": "partially_supported",
            "reason": "Part of the thesis is supported, but direct business-quality evidence is missing.",
        }

    if valuation_claim and n_missing >= 1:
        return {
            "verdict": "unresolved_due_to_missing_evidence",
            "reason": "The thesis depends on valuation evidence that is not available in this version.",
        }

    if n_for > 0 and n_against == 0:
        if n_missing >= 2:
            return {
                "verdict": "partially_supported",
                "reason": "The available evidence supports the thesis, but important evidence is still missing.",
            }
        return {
            "verdict": "supported",
            "reason": "The available evidence mainly supports the thesis.",
        }

    if n_against > 0 and n_for == 0:
        return {
            "verdict": "not_supported",
            "reason": "The available evidence mainly weakens the thesis.",
        }

    if n_for > n_against:
        return {
            "verdict": "partially_supported",
            "reason": "There is more supporting evidence than contradicting evidence, but the thesis is not fully confirmed.",
        }

    # Only one case is left here: evidence on both sides with n_against >= n_for.
    return {
        "verdict": "weakly_supported",
        "reason": "Contradicting evidence is meaningful enough that the thesis is only weakly supported.",
    }

The logic here is intentionally simple. It doesn't try to do fine-grained scoring. Instead, it uses the shape of the evidence to decide whether the thesis is supported, partially supported, weakly supported, not supported, or still unresolved.

A couple of checks matter more than the rest. If the thesis makes a business-quality or valuation claim and anything is listed in missing_evidence, the verdict gets capped early instead of sounding stronger than it should. That is important because a thesis can look convincing on price behavior alone, but still be incomplete if the claim depends on fundamentals that aren't actually present. Note that the check counts every missing item, so for these claim types even a missing beta value is enough to trigger the cap.

The other useful thing about this function is that it returns both a short label and a reason. That makes the final output easier to understand later, and it also gives the memo-writing step something cleaner to work from than a bare category.
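The branch order is easier to reason about as a small truth table. The sketch below is a condensed restatement of decide_verdict() that takes counts directly and returns only the label. The name verdict_label and the sample counts are hypothetical, and the reasons are left out for brevity.

```python
QUALITY = {"business_quality", "weak_business_quality",
           "premium_justified", "premium_not_justified"}
VALUATION = {"valuation_attractive", "valuation_expensive",
             "premium_justified", "premium_not_justified"}


def verdict_label(n_for, n_against, n_missing, claim_types=()):
    """Return the verdict label for a given shape of evidence counts."""
    claims = set(claim_types)
    if n_for == 0 and n_against == 0:
        return "unresolved_due_to_missing_evidence"
    if claims & QUALITY and n_missing >= 1:
        return "weakly_supported" if n_against > 0 else "partially_supported"
    if claims & VALUATION and n_missing >= 1:
        return "unresolved_due_to_missing_evidence"
    if n_against == 0:
        return "partially_supported" if n_missing >= 2 else "supported"
    if n_for == 0:
        return "not_supported"
    return "partially_supported" if n_for > n_against else "weakly_supported"


cases = [
    (3, 0, 1, ["controlled_downside"]),  # one gap, no quality claim
    (3, 0, 1, ["business_quality"]),     # same counts, quality claim
    (2, 3, 0, ["low_risk"]),             # more against than for
    (0, 2, 0, ["low_risk"]),             # nothing in favor
]
for n_for, n_against, n_missing, claims in cases:
    print(claims, verdict_label(n_for, n_against, n_missing, claims))
```

The first two cases have identical counts but different labels: supported for the downside claim, partially_supported for the quality claim. That difference is the early cap at work.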

Building the Facts Object

Before the memo gets written, the system first puts everything into one structured object. That object becomes the single source of truth for the final output. Instead of handing the model a mix of scattered variables, we'll give it one clean package containing the thesis, signals, company context, evidence, and verdict.

1. Company Context

We’ll start with a small helper that pulls the basic company context from the fundamentals payload.

Add this to core.py:

def extract_company_context(fundamentals):
    if not isinstance(fundamentals, dict):
        return {}

    gen = fundamentals.get("General", {}) or {}

    out = {
        "name": gen.get("Name"),
        "code": gen.get("Code"),
        "exchange": gen.get("Exchange"),
        "sector": gen.get("Sector"),
        "industry": gen.get("Industry"),
        "country": gen.get("CountryName"),
        "market_cap": gen.get("MarketCapitalization"),
        "pe_ratio": gen.get("PERatio"),
        "beta": gen.get("Beta"),
        "dividend_yield": gen.get("DividendYield"),
        "description": gen.get("Description"),
    }

    clean = {}
    for k, v in out.items():
        if v not in (None, "", "NA"):
            clean[k] = v

    return clean

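This function is just a cleanup step. It gives us a compact company context block that can later sit alongside the price and fundamentals signals without dragging the full fundamentals payload into the memo layer. One caveat: compute_fundamental_signals() reads MarketCapitalization, PERatio, and DividendYield from Highlights and Beta from Technicals. When the payload keeps those fields there and not under General, this helper returns nothing for market_cap, pe_ratio, beta, or dividend_yield, and they show up as null in the facts object. The values are still available in the fundamentals signal layer.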
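If you want the company context to carry market cap, P/E, dividend yield, and beta, one option is to read them from Highlights and Technicals, the sections compute_fundamental_signals() already uses for those fields. The sketch below is an assumption-labeled variant, not part of the original core.py: the helper name extract_market_fields is hypothetical and the payload fragment uses made-up numbers.

```python
def extract_market_fields(fundamentals):
    """Hypothetical helper: read valuation-style context from Highlights/Technicals."""
    if not isinstance(fundamentals, dict):
        return {}
    highlights = fundamentals.get("Highlights", {}) or {}
    technicals = fundamentals.get("Technicals", {}) or {}
    raw = {
        "market_cap": highlights.get("MarketCapitalization"),
        "pe_ratio": highlights.get("PERatio"),
        "dividend_yield": highlights.get("DividendYield"),
        "beta": technicals.get("Beta"),
    }
    # Drop empty placeholders, the same way extract_company_context() does.
    return {k: v for k, v in raw.items() if v not in (None, "", "NA")}


payload = {  # made-up fragment for illustration
    "General": {"Name": "Example Corp"},
    "Highlights": {"MarketCapitalization": 1500000000, "PERatio": 24.1, "DividendYield": "NA"},
    "Technicals": {"Beta": 1.05},
}
market_fields = extract_market_fields(payload)
print(market_fields)
```

The result could be merged into the dictionary returned by extract_company_context(), for example with `{**context, **market_fields}`, which would leave the General-based fields untouched.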

2. Single-Stock Facts Builder

Now add the single-stock facts builder:

def build_thesis_facts(parsed, ticker, signals, fundamentals, thesis_tags, evidence):
    company = extract_company_context(fundamentals)

    facts = {
        "type": "single_name_thesis_test",
        "ticker": ticker,
        "lookback_days": parsed["lookback_days"],
        "thesis": parsed["thesis"],
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": thesis_tags.get("claim_types", []),
        "market_signals": {
            "ret_total": signals.get("ret_total"),
            "vol_annualized": signals.get("vol_annualized"),
            "max_drawdown": signals.get("max_drawdown"),
            "trend_slope": signals.get("trend_slope"),
            "ret_to_vol": signals.get("ret_to_vol"),
            "start_price": signals.get("start_price"),
            "end_price": signals.get("end_price"),
            "n_points": signals.get("n_points"),
        },
        "company_context": {
            "name": company.get("name"),
            "exchange": company.get("exchange"),
            "sector": company.get("sector"),
            "industry": company.get("industry"),
            "country": company.get("country"),
            "market_cap": company.get("market_cap"),
            "pe_ratio": company.get("pe_ratio"),
            "beta": company.get("beta"),
            "dividend_yield": company.get("dividend_yield"),
        },
        "description": company.get("description"),
        "evidence_for": evidence.get("evidence_for", []),
        "evidence_against": evidence.get("evidence_against", []),
        "missing_evidence": evidence.get("missing_evidence", []),
    }

    facts["verdict"] = decide_verdict(evidence, thesis_tags.get("claim_types", []))
    return facts

This is the main facts object for a single-stock thesis. It pulls together the parsed thesis, the market signals, the basic company context, the evidence buckets, and the verdict. At this point, the copilot has already done the reasoning work. The memo isn't deciding anything new. It's just writing from this object.

3. Watchlist Facts Builder

Now add the watchlist version:

def build_watchlist_facts(parsed, tickers, signals_by_ticker, fundamentals_by_ticker, thesis_tags, evidence_by_ticker):
    per_ticker = {}

    for t in tickers:
        company = extract_company_context(fundamentals_by_ticker.get(t, {}))
        signals = signals_by_ticker.get(t, {})
        evidence = evidence_by_ticker.get(t, {})

        per_ticker[t] = {
            "company_context": {
                "name": company.get("name"),
                "sector": company.get("sector"),
                "industry": company.get("industry"),
                "market_cap": company.get("market_cap"),
                "pe_ratio": company.get("pe_ratio"),
                "beta": company.get("beta"),
            },
            "market_signals": {
                "ret_total": signals.get("ret_total"),
                "vol_annualized": signals.get("vol_annualized"),
                "max_drawdown": signals.get("max_drawdown"),
                "trend_slope": signals.get("trend_slope"),
                "ret_to_vol": signals.get("ret_to_vol"),
            },
            "evidence_for": evidence.get("evidence_for", []),
            "evidence_against": evidence.get("evidence_against", []),
            "missing_evidence": evidence.get("missing_evidence", []),
            "verdict": decide_verdict(evidence, thesis_tags.get("claim_types", []))
        }

    facts = {
        "type": "watchlist_thesis_test",
        "tickers": tickers,
        "lookback_days": parsed["lookback_days"],
        "thesis": parsed["thesis"],
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": thesis_tags.get("claim_types", []),
        "per_ticker": per_ticker,
    }

    return facts

This version does the same thing, but across multiple tickers. Instead of one top-level evidence block, it stores a per-ticker structure so the memo layer can later compare names without needing to reconstruct anything.

That is the main reason this section matters. By the time we reach the memo step, we no longer want to pass loose values around. We want one structured object that already contains:

the thesis
the relevant signals
the company context
the evidence buckets
the verdict

That keeps the final writing step much cleaner and makes the whole workflow easier to debug.
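One way to benefit from a single structured object is to check it before it reaches the memo step. The sketch below assumes the single-stock shape produced by build_thesis_facts(). The helper name facts_to_prompt_block and the sample facts are hypothetical, and the watchlist shape would need its own key list because it nests evidence under per_ticker.

```python
import json

REQUIRED_KEYS = (
    "thesis", "claim_types", "market_signals", "company_context",
    "evidence_for", "evidence_against", "missing_evidence", "verdict",
)


def facts_to_prompt_block(facts):
    """Hypothetical helper: validate a facts object, then serialize it for a prompt."""
    missing = [k for k in REQUIRED_KEYS if k not in facts]
    if missing:
        raise ValueError(f"facts object is missing keys: {missing}")
    return json.dumps(facts, indent=2, sort_keys=True)


sample_facts = {  # made-up values for illustration
    "thesis": "Example thesis",
    "claim_types": ["controlled_downside"],
    "market_signals": {"max_drawdown": -0.11},
    "company_context": {"name": "Example Corp"},
    "evidence_for": ["Maximum drawdown was relatively contained at -11.00%."],
    "evidence_against": [],
    "missing_evidence": [],
    "verdict": {"verdict": "supported", "reason": "Example reason."},
}
block = facts_to_prompt_block(sample_facts)
print(block)
```

Failing fast on a missing key makes a broken pipeline step visible immediately, before the memo gets written from an incomplete package.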

Sanity Check (Jupyter Notebook)

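Run this code inside test.ipynb, in the same kernel session as the previous sanity check, because it reuses json, signals, funds, tags, and evidence from that cell: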

from core import build_thesis_facts, extract_company_context

facts = build_thesis_facts(
    parsed={
        "tickers": ["AAPL"],
        "lookback_days": 180,
        "thesis": "Apple looks attractive because downside has been controlled and business quality remains high.",
        "mode": "single"
    },
    ticker="AAPL.US",
    signals=signals,
    fundamentals=funds,
    thesis_tags=tags,
    evidence=evidence
)

print(json.dumps(facts, indent=2))

Expected Output:

{
  "type": "single_name_thesis_test",
  "ticker": "AAPL.US",
  "lookback_days": 180,
  "thesis": "Apple looks attractive because downside has been controlled and business quality remains high.",
  "thesis_summary": "Apple is attractive due to controlled downside and strong business quality",
  "claim_types": [
    "controlled_downside",
    "business_quality"
  ],
  "market_signals": {
    "ret_total": -0.05675067340688533,
    "vol_annualized": 0.2504818805125429,
    "max_drawdown": -0.11322450740687473,
    "trend_slope": -0.0005437843809243782,
    "ret_to_vol": -0.22656598270006817,
    "start_price": 271.01,
    "end_price": 255.63,
    "n_points": 62
  },
  "company_context": {
    "name": "Apple Inc",
    "exchange": "NASDAQ",
    "sector": "Technology",
    "industry": "Consumer Electronics",
    "country": "USA",
    "market_cap": null,
    "pe_ratio": null,
    "beta": null,
    "dividend_yield": null
  },
  "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple Vision Pro, Apple TV, Apple Watch, Beats products, and HomePod, as well as Apple branded and third-party accessories. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV, which offers exclusive original content and live sports; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It distributes third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers and resellers. The company was formerly known as Apple Computer, Inc. and changed its name to Apple Inc. in January 2007. Apple Inc. was founded in 1976 and is headquartered in Cupertino, California.",
  "evidence_for": [
    "Maximum drawdown was relatively contained at -11.32%."
  ],
  "evidence_against": [],
  "missing_evidence": [
    "This version does not include direct business-quality metrics such as margins, growth, cash flow, or return on capital.",
    "Only basic company context is available, which is not enough on its own to confirm business quality.",
    "No beta value available."
  ],
  "verdict": {
    "verdict": "partially_supported",
    "reason": "Part of the thesis is supported, but direct business-quality evidence is missing."
  }
}

Writing the Final Memo

At this point, the hard part is already done.

By the time we reach the memo step, the copilot already has a structured facts object with the thesis, claim types, market signals, company context, evidence buckets, and verdict. So this final function isn't where the reasoning happens. It's just the presentation layer that turns that structured judgment into something readable.

Add this to core.py:

def write_thesis_memo(facts):
    prompt = f"""
You are writing a short financial research memo.

Write using only the facts provided below.
Do not invent numbers, events, comparisons, or opinions beyond the supplied evidence.
If evidence is missing, say so clearly.

Use this exact structure:

1. Thesis under review
2. Supporting evidence
3. Evidence that weakens the thesis
4. Missing evidence
5. Verdict
6. Bottom-line assessment

Style rules:
- Keep it concise
- Keep it analytical and professional
- No bullet points unless necessary
- No hype
- No generic investment disclaimer language
- The bottom-line assessment should be balanced and evidence-based
- The verdict section must explicitly use the supplied verdict

Facts:
{json.dumps(facts, indent=2, default=str)}
""".strip()

    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )

    return r.output_text.strip()

This function keeps the model boxed into one narrow task. It's not being asked to look at raw price history, raw fundamentals, or scattered variables. It's being asked to write from one clean facts object that already contains the judgment.

That separation matters because it keeps the final memo grounded. The model isn't deciding what it thinks about the stock at the last second. It's simply turning the structured output of the earlier steps into a short research note.

The prompt is also deliberately strict. It fixes the memo structure, tells the model not to invent anything, and makes the verdict explicit instead of leaving it implied. That helps the final output stay consistent even when the underlying thesis changes.
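Because the structure is fixed, it can also be checked after the fact. The following is a minimal sketch, assuming the model writes the section headings as given in the prompt. The helper memo_follows_structure is hypothetical and not part of core.py, and a simple substring search like this can produce false positives.

```python
# Hypothetical post-check, not part of the article's core.py.
# Headings are copied from the structure list in the memo prompt.
MEMO_SECTIONS = [
    "Thesis under review",
    "Supporting evidence",
    "Evidence that weakens the thesis",
    "Missing evidence",
    "Verdict",
    "Bottom-line assessment",
]


def memo_follows_structure(memo, verdict):
    """Check that the headings appear in order and the supplied verdict is used."""
    text = memo.lower()
    position = 0
    for heading in MEMO_SECTIONS:
        found = text.find(heading.lower(), position)
        if found == -1:
            return False
        position = found + len(heading)
    # Accept the verdict label with underscores or with spaces.
    label = verdict.lower()
    return label in text or label.replace("_", " ") in text


sample_memo = "\n".join(
    [
        "1. Thesis under review",
        "Apple looks attractive.",
        "2. Supporting evidence",
        "Drawdown was contained.",
        "3. Evidence that weakens the thesis",
        "None found.",
        "4. Missing evidence",
        "No margin data.",
        "5. Verdict",
        "partially_supported",
        "6. Bottom-line assessment",
        "Mixed picture.",
    ]
)

print(memo_follows_structure(sample_memo, "partially_supported"))
```

If the check fails, you can retry the call or flag the memo for review instead of showing a malformed note to the user.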

Sanity Check (Jupyter Notebook)

You can test it with a facts object from the previous section:

from core import write_thesis_memo

memo = write_thesis_memo(facts)
print(memo)

Stitching Everything Together

At this point, all the individual pieces are ready. We have the parser, the data fetchers, the signal builders, the thesis classifier, the evidence engine, the verdict layer, and the memo writer. The only thing left is to connect them into one end-to-end function.

Add this to core.py:

async def run_thesis_copilot(user_text):
    trace_id = uuid.uuid4().hex[:10]
    log_event("request_started", trace_id, text=user_text)

    parsed = enforce_limits(parse_request(user_text))
    tickers = parsed["tickers"]

    if not tickers:
        return {
            "memo": "No valid ticker was found in the request.",
            "facts": {},
            "data_used": {},
            "tool_trace_id": trace_id,
        }

    log_event(
        "parsed",
        trace_id,
        tickers=tickers,
        lookback_days=parsed["lookback_days"],
        mode=parsed["mode"],
        thesis=parsed["thesis"],
    )

    start_date, end_date = get_dates_from_lookback(parsed["lookback_days"])
    state = make_state()

    try:
        thesis_tags = classify_thesis(parsed["thesis"])

        if parsed["mode"] == "single":
            ticker = tickers[0]
            ticker_full = ticker if "." in ticker else f"{ticker}.US"

            log_event(
                "tool_phase",
                trace_id,
                mode="single",
                ticker=ticker_full,
                start_date=start_date,
                end_date=end_date,
            )

            prices = await fetch_prices(ticker_full, start_date, end_date, trace_id, state)
            funds = await fetch_fundamentals(ticker_full, trace_id, state)

            price_signals = compute_price_signals(prices)
            fundamental_signals = compute_fundamental_signals(funds)

            evidence = build_evidence_blocks(
                parsed["thesis"],
                thesis_tags,
                price_signals,
                fundamental_signals
            )

            facts = build_thesis_facts(
                parsed,
                ticker_full,
                price_signals,
                funds,
                thesis_tags,
                evidence
            )

            facts["fundamental_signals"] = fundamental_signals

            memo = write_thesis_memo(facts)

            out = {
                "memo": memo,
                "facts": facts,
                "data_used": {
                    "tickers": [ticker_full],
                    "date_range": [start_date, end_date],
                    "tools_called": [x.get("tool") for x in state["tool_trace"]],
                    "tool_calls": state["tool_calls"],
                },
                "tool_trace_id": trace_id,
            }

            log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
            return out

        ticker_full = [x if "." in x else f"{x}.US" for x in tickers]

        log_event(
            "tool_phase",
            trace_id,
            mode="watchlist",
            tickers=ticker_full,
            start_date=start_date,
            end_date=end_date,
        )

        signals_by_ticker = {}
        funds_by_ticker = {}
        evidence_by_ticker = {}

        for t in ticker_full:
            prices = await fetch_prices(t, start_date, end_date, trace_id, state)
            funds = await fetch_fundamentals(t, trace_id, state)

            price_signals = compute_price_signals(prices)
            fundamental_signals = compute_fundamental_signals(funds)

            evidence = build_evidence_blocks(
                parsed["thesis"],
                thesis_tags,
                price_signals,
                fundamental_signals
            )

            signals_by_ticker[t] = {
                **price_signals,
                "fundamental_signals": fundamental_signals
            }
            funds_by_ticker[t] = funds
            evidence_by_ticker[t] = evidence

        facts = build_watchlist_facts(
            parsed,
            ticker_full,
            signals_by_ticker,
            funds_by_ticker,
            thesis_tags,
            evidence_by_ticker,
        )

        memo = write_thesis_memo(facts)

        out = {
            "memo": memo,
            "facts": facts,
            "data_used": {
                "tickers": ticker_full,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

        log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
        return out

    except Exception as e:
        detail = repr(e)
        if hasattr(e, "exceptions"):
            detail = detail + " | " + " ; ".join([repr(x) for x in e.exceptions])

        log_event("request_failed", trace_id, err=detail)

        return {
            "memo": f"failed: {e}",
            "facts": {},
            "data_used": {
                "tickers": tickers,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

This function is just the full workflow in one place. It parses the request, fetches the data, computes the two signal layers, builds the evidence, assembles the facts object, writes the memo, and returns everything in a clean output.

The useful part is that it returns more than just the memo. It also returns the structured facts object, the tools that were used, the date range, and the trace ID. That keeps the final result inspectable instead of turning the copilot into a black box.
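To show how that output can be inspected, here is a small sketch that condenses a result into one log-friendly line. The helper summarize_run is hypothetical, and the values in the example dict (tool names, dates, trace ID) are made up for illustration. Only the key names follow the return value of run_thesis_copilot.

```python
# Hypothetical helper, not part of the article's core.py.
def summarize_run(result):
    """Condense a copilot result dict into one log-friendly line."""
    data_used = result.get("data_used") or {}
    facts = result.get("facts") or {}
    verdict = facts.get("verdict")
    if isinstance(verdict, dict):  # single-name facts nest the label
        verdict = verdict.get("verdict")
    tickers = ",".join(data_used.get("tickers", [])) or "-"
    tools = ",".join(str(t) for t in data_used.get("tools_called", [])) or "-"
    return (
        f"trace={result.get('tool_trace_id')} tickers={tickers} "
        f"tools={tools} calls={data_used.get('tool_calls', 0)} "
        f"verdict={verdict or 'n/a'}"
    )


# Illustrative values only; real tool names come from state["tool_trace"].
example = {
    "memo": "...",
    "facts": {"verdict": {"verdict": "partially_supported", "reason": "..."}},
    "data_used": {
        "tickers": ["AAPL.US"],
        "date_range": ["2025-01-01", "2025-06-30"],
        "tools_called": ["prices", "fundamentals"],
        "tool_calls": 2,
    },
    "tool_trace_id": "abc123def4",
}

print(summarize_run(example))
```

The same helper also handles the early-return case where no ticker was found, since it falls back to placeholders when facts and data_used are empty.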

Demo Time! (Jupyter Notebook)

Demo 1: Testing Whether a Premium Is Actually Justified

This is a good first demo because it pushes the copilot beyond a basic single-stock check. The prompt isn't asking whether NVIDIA is a good company in general. It's asking whether NVIDIA’s premium over AMD can actually be defended using market behavior and business quality.

Here's the prompt:

from core import run_thesis_copilot

q = """
Between NVDA and AMD, I think NVDA's premium is still justified by stronger market behavior and business quality.
Check that over the last 6 months.
""".strip()

result = await run_thesis_copilot(q)

print(result["memo"])
print(result["data_used"])

And here's the output:

What makes this output useful is that it doesn't flatten the result into a simple yes or no. NVIDIA clearly looks stronger on business quality, but market behavior isn't as convincing, and the lack of direct valuation data stops the copilot from overclaiming.

This is the kind of behavior we want. The system isn't just comparing two companies. It's testing whether the specific claim about a premium actually holds up.

Demo 2: Testing Whether Volatility Is Too High for the Underlying Business

The second demo shifts back to a single-stock thesis, but the claim is different. This time, the question isn't whether the company looks attractive. It's whether the stock is more volatile than the underlying business quality would justify.

Here's the prompt:

q = """
TSLA feels too volatile for the underlying business quality.
Test that thesis over the last year.
""".strip()

result = await run_thesis_copilot(q)

print(result["memo"])
print(result["data_used"])

And here's the output:

This result is useful because it shows a more conflicted thesis. Tesla’s recent returns and forward growth expectations offer some support, but the current profitability, recent operating trends, revisions, and volatility profile all push back against the idea that the business quality is strong enough to fully justify that risk.

So the final verdict lands where it should: not as a clean confirmation, but as a weakly supported thesis.

Final Thoughts

At this point, the copilot already does the most important part well. It can take a natural-language thesis, pull the right market and fundamentals data through EODHD’s MCP layer, turn those inputs into structured evidence, and return a research memo that's much more disciplined than a normal stock summary.

At the same time, this version still has clear limits. It doesn't yet go deeper into statement-level accounting logic, it doesn't use news or catalyst context, and its handling of relative valuation can still be stronger for more demanding comparison cases.

But even with those limits, the shift here is already meaningful. The real change wasn't just connecting a model to financial data. It was moving from summarizing stocks to testing claims.

How to Build an Agentic Terminal Workflow with GitHub Copilot CLI and MCP Servers

Caleb Mintoumba — Wed, 29 Apr 2026 14:14:42 +0000

Most developers live in their terminal. You run commands, debug pipelines, manage infrastructure, and navigate codebases, all from a shell prompt.

But despite how central the terminal is to developer workflows, AI assistance there has remained shallow: autocomplete a command here, explain an error there.

That changes when you combine GitHub Copilot CLI with MCP (Model Context Protocol) servers. Instead of an AI that reacts to isolated prompts, you get a terminal that understands your project context, queries live data sources, and chains tool calls autonomously – what the industry is starting to call an agentic workflow.

In this tutorial, you'll learn exactly how to wire these two systems together, step by step. By the end, your terminal will be able to do things like understand your Git history before suggesting a fix, query your running Docker containers before writing a compose patch, or pull live API schemas before generating a request.

Prerequisites
What is GitHub Copilot CLI?
What is the Model Context Protocol?
How MCP Servers Work in a Terminal Context
Step 1 – Install and Configure GitHub Copilot CLI
Step 2 – Set Up Your First MCP Server
Step 3 – Wire Copilot CLI to Your MCP Server
Step 4 – Build a Real Agentic Workflow
Step 5 – Extend with Multiple MCP Servers
Debugging Common Issues
Conclusion

Prerequisites

Before you start, make sure you have the following:

Node.js v18 or later (node --version)
npm v9 or later
A GitHub account with Copilot enabled. The free tier (available to all GitHub users) is sufficient to follow this tutorial. Pro, Business, and Enterprise plans unlock higher usage limits but aren't required.
GitHub CLI (gh) installed. We'll use it to authenticate.
Basic familiarity with the terminal and JSON configuration files
(Optional) Docker installed if you want to follow the Docker MCP example in Step 5

You don't need prior experience with MCP or agentic AI systems, as this guide builds that understanding from the ground up.

What is GitHub Copilot CLI?

GitHub Copilot CLI is the terminal-native interface to GitHub's Copilot AI. Unlike the IDE plugin (which assists with code completion), Copilot CLI is designed specifically for shell workflows. It exposes three main commands:

gh copilot suggest proposes a shell command based on a natural language description
gh copilot explain explains what a given command does
gh copilot alias generates shell aliases for Copilot subcommands

Here's a quick example of suggest in action:

gh copilot suggest "find all files modified in the last 24 hours and larger than 1MB"

Copilot will return something like:

find . -mtime -1 -size +1M

It will also ask if you want to copy it, run it directly, or revise the request. This interactive loop is already useful – but by itself, Copilot CLI has no awareness of your project context. It doesn't know your repo structure, your running services, or your deployment environment. That's where MCP comes in.

What is the Model Context Protocol?

The Model Context Protocol (MCP) is an open standard introduced by Anthropic in late 2024. Its goal is straightforward: give AI models a standardized way to connect to external tools, data sources, and services.

Think of MCP as a universal adapter layer between an AI model and the real world. Without MCP, each AI integration is custom-built: one plugin for GitHub, another for Postgres, another for Slack, all with incompatible interfaces. MCP defines a single protocol that any tool can implement, and any compatible AI client can consume.

An MCP server exposes tools (functions the AI can call), resources (data the AI can read), and prompts (reusable instruction templates). The AI client (in our case, a Copilot-powered terminal) discovers these capabilities at runtime and uses them autonomously to complete a task.

A few notable MCP servers that are already production-ready:

MCP Server	What it exposes
@modelcontextprotocol/server-filesystem	Read/write access to local files
@modelcontextprotocol/server-git	Git log, diff, blame, branch operations
@modelcontextprotocol/server-github	GitHub Issues, PRs, repos via API
@modelcontextprotocol/server-postgres	Live query execution on a Postgres DB
@modelcontextprotocol/server-docker	Container inspection, logs, stats

The full registry lives at github.com/modelcontextprotocol/servers.

How MCP Servers Work in a Terminal Context

Before we get hands-on, it's worth understanding the communication model.

MCP servers run as local processes. They communicate with the AI client over stdio (standard input/output) or over an HTTP/SSE transport. The client sends JSON-RPC messages to the server, and the server responds with structured data.

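Here's the simplified flow: you type a prompt, the client starts the declared MCP servers, calls the tools it needs over JSON-RPC, and the servers return structured data that the model uses to write a grounded answer.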

The key word here is grounded. Without MCP, Copilot responds based purely on its training data and your prompt. With MCP, it can call git log --oneline -20 before answering your question about recent regressions, so its answer is based on your actual code history, not a generalized assumption.
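To make the message exchange concrete, here is a rough sketch of what two JSON-RPC requests look like on the wire. The method names tools/list and tools/call come from the MCP specification. The argument names passed to git_log are assumptions for illustration and depend on the server you run. A real session also begins with an initialize handshake, which is omitted here.

```python
import json


def jsonrpc_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request as a Python dict."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it offers.
list_tools = jsonrpc_request(1, "tools/list")

# Call one tool by name. "git_log" is the tool named in this article;
# the argument names below are illustrative and depend on the server.
call_tool = jsonrpc_request(
    2,
    "tools/call",
    {"name": "git_log", "arguments": {"repo_path": ".", "max_count": 20}},
)

# Over the stdio transport, each message is sent as a single line of JSON.
for message in (list_tools, call_tool):
    print(json.dumps(message))
```

You normally never write these messages by hand, because the client does it for you. Seeing their shape mostly helps when you are reading server logs or debugging a misbehaving tool call.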

Step 1 – Install and Configure GitHub Copilot CLI

If you haven't already, install the GitHub CLI:

# macOS
brew install gh

# Ubuntu/Debian
sudo apt install gh

# Windows (via winget)
winget install --id GitHub.cli

Then authenticate:

gh auth login

Follow the interactive prompts. Select GitHub.com, then HTTPS, and authenticate via browser when prompted.

Now install the Copilot CLI extension:

gh extension install github/gh-copilot

Verify the installation:

gh copilot --version

You should see output like gh-copilot version 1.x.x.

Optional but recommended: set up shell aliases. This makes the workflow much faster. For bash or zsh:

# Add to your ~/.bashrc or ~/.zshrc
eval "$(gh copilot alias -- bash)"   # for bash
eval "$(gh copilot alias -- zsh)"    # for zsh

After reloading your shell (source ~/.bashrc), you can use ghcs as shorthand for gh copilot suggest and ghce for gh copilot explain.

Step 2 – Set Up Your First MCP Server

We'll start with server-git. It's the most immediately useful for a development workflow and has zero external dependencies.

Install it globally via npm:

npm install -g @modelcontextprotocol/server-git

Test that it runs:

mcp-server-git --version

This server exposes the following tools to any compatible MCP client:

git_log retrieve commit history with filters
git_diff diff between branches or commits
git_status current working tree status
git_show inspect a specific commit
git_blame annotate file lines with commit info
git_branch list or switch branches

Now create a configuration file. MCP clients look for a file called mcp.json to discover available servers. Create it in your project root or in a global config directory:

mkdir -p ~/.config/mcp
touch ~/.config/mcp/mcp.json

Add the following content:

{
  "mcpServers": {
    "git": {
      "command": "mcp-server-git",
      "args": ["--repository", "."],
      "transport": "stdio"
    }
  }
}

A few notes on this config:

command is the binary to run. Make sure it's on your $PATH.
args passes --repository . so the server scopes itself to the current working directory.
transport: "stdio" means communication happens over standard input/output, the simplest and most stable option for local servers.
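Since a missing binary is the most common reason a server fails to start, it can help to check the config against your $PATH before wiring anything together. The sketch below is a hypothetical helper, not part of any MCP tooling. It assumes the mcpServers layout shown above and uses only the Python standard library.

```python
import json
import shutil


def commands_on_path(config_text):
    """Map each declared server name to whether its command resolves on PATH."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    result = {}
    for name, spec in servers.items():
        command = spec.get("command")
        result[name] = bool(command) and shutil.which(command) is not None
    return result


# Same shape as the mcp.json from this step.
config_text = """
{
  "mcpServers": {
    "git": {
      "command": "mcp-server-git",
      "args": ["--repository", "."],
      "transport": "stdio"
    }
  }
}
"""

for name, found in commands_on_path(config_text).items():
    print(f"{name}: {'found' if found else 'NOT on PATH'}")
```

To check your real file, read it with pathlib.Path("~/.config/mcp/mcp.json").expanduser().read_text() and pass the text to the same function.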

Step 3 – Wire Copilot CLI to Your MCP Server

This is where the two systems connect. GitHub Copilot CLI supports MCP via its --mcp-config flag (available from version 1.3+). You point it at your mcp.json, and Copilot will automatically initialize the declared servers before processing your prompt.

Here's the basic invocation:

gh copilot suggest --mcp-config ~/.config/mcp/mcp.json "why did the build break in the last commit?"

When you run this inside a Git repository, Copilot CLI will:

Start the mcp-server-git process
Call git_log to retrieve recent commits
Call git_diff on the most recent commit
Synthesize an answer based on the actual diff output

Try it yourself on a repo with a recent failing commit. The difference in response quality compared to a plain gh copilot suggest is immediately obvious.

Tip: avoid retyping the flag every time. Add a shell function to your .bashrc/.zshrc:

function aterm() {
  gh copilot suggest --mcp-config ~/.config/mcp/mcp.json "$@"
}

Now you just type:

aterm "what changed between main and feature/auth?"

And you're running a fully context-aware, MCP-powered query from a single short command. The function name aterm (short for agentic terminal) is what we'll use throughout the rest of this tutorial.

Step 4 – Build a Real Agentic Workflow

Let's move beyond individual queries and build a workflow that chains multiple tool calls to complete a real developer task: diagnosing a regression.

Imagine you pushed a feature branch and your CI pipeline failed. You don't know exactly which change caused it. Here's how your agentic terminal handles it:

Query 1: understand what changed

aterm "summarize all commits on feature/auth that aren't on main yet"

Copilot calls git_log with branch filters, then returns a structured summary of commits unique to your branch. No copy-pasting SHAs manually.

Query 2: isolate the diff

aterm "show me everything that changed in the auth middleware between main and feature/auth"

This triggers git_diff scoped to the path containing your middleware. Copilot returns the diff with an explanation of what each change does.

Query 3: find the likely culprit

aterm "which of those changes could cause a JWT validation failure?"

At this point, Copilot has the diff in its context window from the previous tool calls. It reasons over the actual code changes, not generic knowledge about JWT, and pinpoints the likely issue.

Query 4: generate the fix

aterm "write the corrected version of that validation function"

Copilot generates a targeted fix based on the specific code it retrieved via MCP. You get a patch you can directly apply, not a generic code template.

This four-step sequence – understand, isolate, reason, fix – is a complete agentic loop. Each step is grounded in live repository data retrieved through MCP tools. The AI is not hallucinating context. Instead, it's reading your actual codebase.

Step 5 – Extend with Multiple MCP Servers

One MCP server is useful. Multiple MCP servers working together is where the workflow becomes genuinely powerful. Let's add two more: server-filesystem and server-docker.

Install the additional servers:

npm install -g @modelcontextprotocol/server-filesystem
npm install -g @modelcontextprotocol/server-docker

Update your mcp.json:

{
  "mcpServers": {
    "git": {
      "command": "mcp-server-git",
      "args": ["--repository", "."],
      "transport": "stdio"
    },
    "filesystem": {
      "command": "mcp-server-filesystem",
      "args": ["--root", "."],
      "transport": "stdio"
    },
    "docker": {
      "command": "mcp-server-docker",
      "transport": "stdio"
    }
  }
}

With all three servers active, your terminal can now answer cross-domain questions:

aterm "my Express app container keeps restarting, check the logs and compare with what the healthcheck in my Dockerfile expects"

To answer this, Copilot will:

Call docker_logs (server-docker) to pull the container's recent stderr output
Call read_file (server-filesystem) to read your Dockerfile
Parse the HEALTHCHECK instruction
Cross-reference the log errors with the health endpoint path
Return a diagnosis explaining the mismatch and suggest the fix

This is an agentic workflow: the model autonomously decides which tools to call, in what order, and synthesizes the results into a coherent answer. You didn't tell it to read the Dockerfile. It inferred that was necessary based on your question.

A note on security: When running server-filesystem, always scope it to a specific directory using --root. Never point it at / or your home directory. Similarly, server-docker has access to your Docker socket, so run it only in trusted environments.

Debugging Common Issues

mcp-server-git: command not found

The npm global bin directory isn't on your $PATH. Fix:

export PATH="$PATH:$(npm bin -g)"
# or for newer npm versions:
export PATH="$PATH:$(npm prefix -g)/bin"

Add this line to your .bashrc/.zshrc to persist it.

Copilot CLI doesn't seem to be using MCP tools

Check your Copilot CLI version:

gh copilot --version

MCP support requires version 1.3 or later. Update with:

gh extension upgrade copilot

Also verify that your mcp.json is valid JSON: a trailing comma or missing bracket will silently prevent server initialization.
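Because that failure is silent, a quick syntax check is worth having on hand. Here is a minimal sketch using Python's standard json module. The helper check_mcp_config is hypothetical, and the only structural rule it assumes is the top-level mcpServers object used in this tutorial.

```python
import json


def check_mcp_config(text):
    """Return (ok, message) for the text of an mcp.json file."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(config, dict) or not isinstance(config.get("mcpServers"), dict):
        return False, 'missing top-level "mcpServers" object'
    return True, f"{len(config['mcpServers'])} server(s) declared"


# A trailing comma after the last property makes the file invalid JSON.
broken = '{"mcpServers": {"git": {"command": "mcp-server-git",}}}'
fixed = '{"mcpServers": {"git": {"command": "mcp-server-git"}}}'

print(check_mcp_config(broken))
print(check_mcp_config(fixed))
```

The line and column in the error message point at the spot where parsing stopped, which is usually right at or just after the stray comma or missing bracket.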

MCP server starts but returns no data

Run the server manually to check for errors:

mcp-server-git --repository .

If it exits immediately, check that you're running the command inside a valid Git repository. For server-docker, make sure the Docker daemon is running and your user has access to the Docker socket:

sudo usermod -aG docker $USER
# Then log out and back in

Responses are slow with multiple servers

Each MCP server is a separate subprocess. Spawning several at once adds startup latency, especially on slower machines. Two optimizations:

Only declare the servers you actually need for a given project in your mcp.json
Use project-specific config files instead of one global config:

# project A (backend)
aterm --mcp-config ./mcp-backend.json "..."

# project B (infra)
aterm --mcp-config ./mcp-infra.json "..."

Conclusion

You've just built an agentic terminal workflow from scratch. Here's a quick recap of what you did:

Installed and configured GitHub Copilot CLI with shell aliases for fast access
Set up MCP servers (server-git, server-filesystem, server-docker) and wired them through a mcp.json config
Created a shell function (aterm) that transparently passes your MCP config to every Copilot query
Built a multi-step agentic loop for diagnosing regressions using live Git data
Extended the setup with cross-domain tool orchestration across Git, filesystem, and Docker

The architecture you've built here is not a demo – it's a production-ready pattern. You can extend it with any MCP-compatible server: server-postgres for database-aware queries, server-github for issue and PR context, or custom MCP servers you write yourself for your internal APIs.

The terminal has always been the most powerful surface in a developer's environment. With Copilot CLI and MCP, it's finally becoming an intelligent one.

How to Use the Model Context Protocol to Build a Personal Financial Assistant

Nikhil Adithyan — Wed, 25 Mar 2026 16:41:36 +0000

LLMs are great at writing market commentary. The problem is they can sound confident even when they haven't looked at any data. That’s fine for casual chat, but it’s not fine if you’re building a feature for a product, an internal tool, or anything a user might rely on.

In this guide, we’ll build a small financial assistant that fetches real data by calling tools exposed via the Model Context Protocol (MCP), then computes the numbers in Python. The LLM’s job is only to narrate the computed facts. It doesn't invent metrics, and it doesn't do the math.

By the end, you’ll have two outputs you can actually plug into a product flow: a single-ticker market brief, and a watchlist snapshot that compares multiple tickers on volatility and drawdown, with the tool calls traced so you can see exactly what data was used.

What is MCP, and How Does it Change the Integration Story?
Architecture: The “Narrator” Pattern
Step 1: MCP Client Wrapper (client.py)
Step 2: The Assistant Core (core.py)
Demo 1: Market Brief for One Ticker
Demo 2: Watchlist Snapshot
What Makes this Shippable, and What Can Be Improved?
Conclusion

Prerequisites

This is a code-first guide. I won’t explain every line of Python, so you should be comfortable reading pandas code, basic async/await patterns, and calling APIs from Python.

Before you start, you’ll need:

Python 3.10+
An EODHD API key (to access the EODHD MCP server)
An OpenAI API key (for the narration step)
The MCP Python client installed, plus the usual data stack: numpy and pandas
A local environment where you can run async Python code (Jupyter or a normal script both work)

If you’ve never worked with async code before, you can still follow along. Just treat the async functions as "network calls" and focus on how the data flows from tool calls, to deterministic metrics, to narration.

What is MCP, and How Does it Change the Integration Story?

MCP (Model Context Protocol) is a protocol for how an LLM application can discover and call external tools exposed by an MCP server. Instead of hardcoding a bunch of function schemas or building custom connectors per framework, you plug into an MCP server and the tools become “available” in a consistent format.

For product teams, this matters because it reduces integration churn. Tool discovery is predictable, you’re not rewriting wrappers every time your stack changes, and you get a clean separation between the model and the data layer.

In our case, that data layer is EOD Historical Data (EODHD), a market data provider. We’ll use EODHD’s MCP server, which exposes market data tools the assistant can call whenever it needs prices or fundamentals.

One important clarification for this tutorial: we’re using an MCP server purely as the data access layer. The model doesn’t decide which MCP tools to call or what parameters to pass. We'll do that deterministically in Python, then hand the model a facts object and let it write the narrative. This keeps the output grounded and makes the system much easier to trust and debug.

Architecture: The “Narrator” Pattern

Here’s the architecture we’re using in this guide:

The idea is simple: we'll separate “getting facts” from “writing words”. The model only does the second part.

First, the user asks a question like “Give me a 30-day brief for AAPL” or “Compare TSLA, NVDA, AMZN over the last 60 days”. That raw text goes into a tiny parser. The parser is intentionally boring. It only extracts what the system needs to operate: a list of tickers and a lookback window.
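To give a feel for how small such a parser can be, here is a sketch built on two regular expressions. It is an assumption for illustration, not the parser used later in core.py: the function name parse_brief_request and the 30-day default are hypothetical, and a pattern this simple will also pick up any other all-caps word (for example "US") as a ticker.

```python
import re

DEFAULT_LOOKBACK_DAYS = 30  # assumed default, not taken from the article


def parse_brief_request(text, default_days=DEFAULT_LOOKBACK_DAYS):
    """Extract tickers and a lookback window (in days) from raw user text."""
    # Tickers: standalone runs of 2-5 capital letters, de-duplicated in order.
    tickers = list(dict.fromkeys(re.findall(r"\b[A-Z]{2,5}\b", text)))
    # Lookback: a number followed by "day" or "days", e.g. "30-day" or "60 days".
    match = re.search(r"(\d+)[\s-]*days?\b", text, flags=re.IGNORECASE)
    days = int(match.group(1)) if match else default_days
    return {"tickers": tickers, "lookback_days": days}


print(parse_brief_request("Give me a 30-day brief for AAPL"))
print(parse_brief_request("Compare TSLA, NVDA, AMZN over the last 60 days"))
```

Keeping the parser this plain is the point. Everything downstream depends only on a list of tickers and a number of days, so there is very little that can go wrong at this stage.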

Once we have tickers and dates, our MCP client connects to the EODHD MCP server and fetches data by calling its tools. So instead of the assistant guessing prices or fundamentals, it calls tools like “get historical prices” and “get fundamentals”. At this point we have raw data. Nothing has been computed yet, and the model has not written a single sentence.

Then Python takes over. This is where we compute everything deterministically: returns, volatility, max drawdown, trend slope, and a simple volatility regime label. For watchlists, we align returns and compute correlation. These numbers are the backbone of the output. If you rerun the same query with the same window, you should get the same metrics.

Only after that do we involve the LLM. We pass it a compact facts object that contains the metrics we computed, plus a few clean fundamentals fields. The prompt is strict: use only these facts, with no extra numbers and no guessing. The model’s job is to turn the facts into a clean note that feels like something a product would show.

Finally, the assistant returns a structured response object. Not just text. You get:

answer (the narrative)
metrics (the exact computed numbers)
data_used (tickers, date range, and which tools were called)
tool_trace_id (a trace id you can log, debug, or attach to monitoring)

This pattern is B2B-friendly for a very practical reason. It reduces hallucinations because the model isn’t doing analysis. It makes numbers repeatable because Python computes them. And it’s easy to audit because you can always show what data was fetched, what window was used, and which tool calls happened.
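To make the shape of that response concrete, here is a minimal sketch using `TypedDict`. The class names `DataUsed` and `AssistantResponse` and the sample values are assumptions for illustration only. The tutorial’s own code returns plain dicts with these keys.

```python
from typing import Any, TypedDict


class DataUsed(TypedDict):
    tickers: list[str]
    date_range: list[str]       # [start_date, end_date] as ISO strings
    tools_called: list[str]     # names of the MCP tools that were called
    tool_calls: int


class AssistantResponse(TypedDict):
    answer: str                 # narrative written by the model
    metrics: dict[str, Any]     # numbers computed in Python
    data_used: DataUsed         # audit block
    tool_trace_id: str          # id shared with the log lines


# Hypothetical values, only to show the shape.
example: AssistantResponse = {
    "answer": "Short market note goes here.",
    "metrics": {"ret_total": 0.012},
    "data_used": {
        "tickers": ["AAPL.US"],
        "date_range": ["2026-01-01", "2026-01-31"],
        "tools_called": ["get_historical_stock_prices"],
        "tool_calls": 1,
    },
    "tool_trace_id": "abc123",
}
```

Type hints like these are not enforced at runtime, but they give a type checker and your teammates one place to see what every response must contain.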

Step 1: MCP Client Wrapper (`client.py`)

Before we touch any “assistant logic”, we need one thing: a tiny MCP client wrapper that opens MCP sessions to the EODHD MCP server and calls tools reliably. That’s it.

This file does three jobs:

opens a streamable HTTP MCP session
calls a tool with a timeout and a small retry loop
returns the tool output plus a small metadata object we can later attach to logs and traces

Here’s the complete client.py:

import time
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamable_http_client

class EODHDMCP:
    def __init__(self, apikey, base_url=None):
        self.apikey = apikey
        self.base_url = base_url or "https://mcp.eodhd.dev/mcp"
        self._tools = None

    def _url(self):
        return f"{self.base_url}?apikey={self.apikey}"

    def _open(self):
        return streamable_http_client(self._url())

    async def list_tools(self):
        if self._tools is not None:
            return self._tools

        async with self._open() as (read, write, _):
            async with ClientSession(read, write) as s:
                await s.initialize()
                resp = await s.list_tools()
                self._tools = [t.name for t in resp.tools]
                return self._tools

    async def call_tool(self, name, args, trace_id, timeout_s=25, retries=1):
        last = None

        for attempt in range(retries + 1):
            t0 = time.time()
            try:
                async with self._open() as (read, write, _):
                    async with ClientSession(read, write) as s:
                        await s.initialize()
                        out = await asyncio.wait_for(s.call_tool(name, args), timeout=timeout_s)
                        dt = time.time() - t0
                        meta = {"trace_id": trace_id, "tool": name, "args": args, "latency_s": round(dt, 3)}
                        return out, meta
            except Exception as e:
                last = e
                if attempt < retries:
                    await asyncio.sleep(0.25)

        raise last

How this works:

streamable_http_client(self._url()) opens an MCP session over streamable HTTP. The URL includes your API key as a query param, so the server can authenticate.
list_tools() is just a convenience. It asks the server which tools exist and caches the names in memory so you don’t fetch them repeatedly.
call_tool() is the workhorse. It opens a session, initializes it, calls a tool with call_tool(name, args), and wraps the result with a meta object.
That meta object is important later. It lets you trace which tool was called, with which params, how long it took, and which request it belonged to (trace_id).
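The timeout-plus-retry logic inside call_tool() can be tried in isolation, without a network or an MCP server. The sketch below is a simplified stand-in, not the tutorial’s client: `call_with_retry` and `flaky_tool` are hypothetical names, and the fake tool fails once and then succeeds so that the retry path actually runs.

```python
import asyncio


async def call_with_retry(make_call, timeout_s=1.0, retries=1, pause_s=0.01):
    """Await make_call() with a per-attempt timeout, retrying on any failure."""
    last = None
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except Exception as e:  # a timeout lands here too
            last = e
            if attempt < retries:
                await asyncio.sleep(pause_s)
    raise last


attempts = []


async def flaky_tool():
    # Hypothetical stand-in for an MCP tool call: fails once, then succeeds.
    attempts.append(1)
    if len(attempts) == 1:
        raise ConnectionError("transient failure")
    return {"ok": True}


# In a notebook, use `await call_with_retry(flaky_tool)` instead of asyncio.run().
result = asyncio.run(call_with_retry(flaky_tool))
print(result, "after", len(attempts), "attempts")
```

With retries=1 the call gets two attempts in total, which matches the default in client.py. If both attempts fail, the last exception is re-raised so the caller sees the real error.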

Next, we’ll build the core runner in core.py. This is where we parse the user’s request, fetch prices and fundamentals via MCP, compute metrics in Python, and then hand the facts to the LLM for narration.

Step 2: The Assistant Core (`core.py`)

This is where the assistant actually becomes “real”. client.py was just a connector. Here we decide what data to fetch, how much to fetch, how to compute the numbers, and what we hand to the model for narration.

1. Budgets and Trace Logging

When you build anything that calls real tools, you want limits. Not because you don’t trust your code, but because without limits, one messy prompt can easily turn into an expensive, slow request.

In our case, we cap:

how far back we’ll fetch data (MAX_LOOKBACK_DAYS)
how many tool calls we allow per request (MAX_TOOL_CALLS)
how many tickers we’ll accept in one query (MAX_TICKERS)

And we log a few events so we can always debug what happened later.

Here’s the top part of core.py for that:

import json
import re
import time
import uuid
from datetime import date, timedelta
from openai import OpenAI
import numpy as np
import pandas as pd
import asyncio
from client import EODHDMCP

EODHD_API_KEY = "YOUR EODHD API KEY"
MCP_BASE_URL = "https://mcp.eodhd.dev/mcp"

MAX_LOOKBACK_DAYS = 365
MAX_TOOL_CALLS = 6
MAX_TICKERS = 5

mcp = EODHDMCP(EODHD_API_KEY, base_url=MCP_BASE_URL)
oa = OpenAI(api_key = "OPENAI API KEY")
NARRATION_MODEL = "gpt-5.3-chat-latest"

def log_event(event, trace_id, **k):
    payload = {"event": event, "trace_id": trace_id, "ts": round(time.time(), 3)}
    payload.update(k)
    print(json.dumps(payload, default=str))

What’s going on here:

MAX_LOOKBACK_DAYS, MAX_TOOL_CALLS, MAX_TICKERS are basically your safety rails. We’ll enforce them later, right after parsing the user query.
trace_id is a small id we generate per request. Every log line includes it, so when something breaks, you can reconstruct the exact flow for that request.
log_event() prints one JSON line. Nothing fancy – but it’s enough for debugging and it also looks very similar to how real systems emit traces.
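Because every log line is a self-contained JSON object, reconstructing one request is just a filter on trace_id. Here is a small sketch of that idea. The log lines and the `events_for` helper are made up for illustration and are not part of core.py.

```python
import json

# Hypothetical log output from two interleaved requests.
log_lines = [
    '{"event": "request_started", "trace_id": "aaa111", "ts": 1.0}',
    '{"event": "request_started", "trace_id": "bbb222", "ts": 1.1}',
    '{"event": "parsed", "trace_id": "aaa111", "ts": 1.2}',
    '{"event": "request_finished", "trace_id": "aaa111", "ts": 2.5}',
]


def events_for(trace_id, lines):
    """Return the parsed log events for one request, ordered by timestamp."""
    events = [json.loads(line) for line in lines]
    mine = [e for e in events if e.get("trace_id") == trace_id]
    return sorted(mine, key=lambda e: e["ts"])


flow = events_for("aaa111", log_lines)
names = [e["event"] for e in flow]
elapsed = flow[-1]["ts"] - flow[0]["ts"]
print(names, round(elapsed, 3))
```

The same filter works on a log file or a log aggregator query, which is why keeping the trace id on every line pays off.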

Note: Make sure to replace YOUR EODHD API KEY with your actual EODHD API key, and OPENAI API KEY with your OpenAI API key. If you don’t have an EODHD key, you can obtain one by creating an EODHD developer account.

2. Parsing the Request

This part is intentionally not “smart”. We’re not doing NLP. We’re not letting the model interpret the query. We just want to extract two things in a predictable way:

tickers
lookback window

That’s it.

The benefit of keeping it dumb is that the behavior is stable. If the query is messy, we still do something consistent, and the rest of the pipeline remains controllable.

Here are the two functions:

def parse_request(text):
    t = (text or "").upper()

    raw = re.findall(r"\b[A-Z]{1,5}\b", t)

    bad = {
        "I","A","AN","THE","AND","OR","TO","FOR","OF","IN","ON","BY","WITH","ME","WE","US",
        "GIVE","DAY","DAYS","BRIEF","COMPARE","RANK","OVER","LAST","TREND","VOL","VOLATILITY",
        "DRAWDOWN","FLAG","RISKS","RISK","PLUS","MAX","MIN","LOOKBACK"
    }

    tickers = []
    for x in raw:
        if x in bad:
            continue
        if len(x) < 2:
            continue
        if x not in tickers:
            tickers.append(x)

    days = 30

    if "LAST" in t:
        after = t.split("LAST", 1)[1]
        m = re.search(r"\d{1,4}", after)
        if m:
            days = int(m.group(0))
    
    return tickers, days

def enforce_budgets(tickers, lookback_days):
    if lookback_days < 1:
        lookback_days = 1
    if lookback_days > MAX_LOOKBACK_DAYS:
        lookback_days = MAX_LOOKBACK_DAYS

    tickers = tickers[:MAX_TICKERS]

    return tickers, lookback_days

How to read this:

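re.findall(r"\b[A-Z]{1,5}\b", t) pulls out every token of one to five letters. Because we uppercase the whole query first, that includes ordinary short words, not just symbols typed in caps. That’s our crude “ticker candidate” list.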
The bad set is just a blacklist of common words that show up in prompts but are obviously not tickers.
We keep unique tickers in order, because the first ticker becomes the “base” for correlation in the watchlist demo.
Lookback is simple: the default is 30 days. If the query contains “last …”, we grab the first number after “LAST”. That avoids regex edge cases with punctuation.

Then enforce_budgets() clamps everything so one request can’t ask for 500 tickers or a 10-year window.
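To see what the parser does with a real sentence, here is a short walk-through of its two steps on a sample query. It is a sketch, not a copy of parse_request(): the `stop` set below is a small subset of the tutorial’s blacklist, chosen for this one query.

```python
import re

query = "Show me NVDA and MSFT over the last 90 days"
t = query.upper()

# Step 1: every 1-5 letter word is a ticker candidate.
candidates = re.findall(r"\b[A-Z]{1,5}\b", t)

# Step 2: drop blacklisted words and single letters, keep first-seen order.
stop = {"ME", "AND", "OVER", "THE", "LAST", "DAYS"}  # subset of the tutorial's set
tickers = [w for w in dict.fromkeys(candidates) if w not in stop and len(w) >= 2]

# Lookback: the first number that appears after the word LAST.
after = t.split("LAST", 1)[1]
days = int(re.search(r"\d{1,4}", after).group(0))

print(candidates)
print(tickers, days)
```

Note that SHOW survives the filter here, and it is not in the tutorial’s blacklist either, so it would be treated as a ticker. That is the trade-off of a deliberately dumb parser: the behavior is predictable, but the blacklist has to grow with the vocabulary of your users.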

Next, we’ll wire these parsed values into a request state and start making actual MCP calls for prices and fundamentals.

3. Tool Wrappers: Prices and Fundamentals

Now we’re at the point where the assistant actually touches data.

These two functions follow the same pattern for different data:

fetch_prices() calls the historical prices tool on the EODHD MCP server, then normalizes the output into a tiny DataFrame with just date and price.
fetch_fundamentals() calls the fundamentals tool on the EODHD MCP server.

We also keep a small state object per request. It tracks tool calls and keeps a trace of what was called. That’s how we later produce the data_used block in the final response.

Here’s the code:

def new_state():
    return {"tool_calls": 0, "tool_trace": [], "rows": {}}

def _bump(state, meta):
    state["tool_calls"] += 1
    state["tool_trace"].append(meta)
    if state["tool_calls"] > MAX_TOOL_CALLS:
        raise RuntimeError("tool call budget exceeded")

def _as_json_text(out):
    if isinstance(out, str):
        return out
    if hasattr(out, "content"):
        try:
            return out.content[0].text
        except Exception:
            pass
    return str(out)

async def fetch_prices(ticker, start_date, end_date, trace_id, state):
    args = {
        "ticker": ticker,
        "start_date": start_date,
        "end_date": end_date,
        "period": "d",
        "order": "a",
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_historical_stock_prices", args, trace_id)
    txt = _as_json_text(out)

    _bump(state, meta)

    data = json.loads(txt)
    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    df = pd.DataFrame(data)
    if df.empty:
        return df

    cols = [c for c in ["date", "adjusted_close", "close"] if c in df.columns]
    df = df[cols].copy()

    if "adjusted_close" in df.columns:
        df = df.rename(columns={"adjusted_close": "price"})
    elif "close" in df.columns:
        df = df.rename(columns={"close": "price"})
    else:
        return pd.DataFrame()

    df["ticker"] = ticker

    state["rows"][f"{meta['tool']}:{ticker}"] = len(df)
    return df

async def fetch_fundamentals(ticker, trace_id, state):
    args = {
        "ticker": ticker,
        "include_financials": False,
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_fundamentals_data", args, trace_id)
    txt = _as_json_text(out)

    _bump(state, meta)

    data = json.loads(txt)
    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    return data

What’s happening here:

_bump() is the budget guard. Every time we make a tool call, we increment the counter and store the tool metadata. If we cross the budget, we fail fast.
meta comes from client.py. It contains trace_id, tool, args, and latency_s. That’s enough to trace “what did we call, for which request, and how long did it take”.
_as_json_text() is there because the tool results returned by the MCP server are not always plain strings. Sometimes it’s an object with .content. This helper just tries to extract the text cleanly.
In fetch_prices(), we intentionally keep only date and price. That’s not because OHLC is useless. It’s because this tutorial’s metrics only need adjusted closes. Fewer columns means simpler code, smaller payloads, and fewer chances to break.
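The text-extraction step is easy to test without a server. The sketch below mimics a tool result with `SimpleNamespace`. The payload shape and values are assumptions made for illustration, and only the field names (date, close, adjusted_close, error) come from the code above.

```python
import json
from types import SimpleNamespace


def as_json_text(out):
    """Same idea as _as_json_text(): accept a plain string or an object with .content."""
    if isinstance(out, str):
        return out
    if hasattr(out, "content"):
        try:
            return out.content[0].text
        except Exception:
            pass
    return str(out)


# Hypothetical stand-in for a tool result that wraps JSON text in .content[0].text.
fake_result = SimpleNamespace(
    content=[SimpleNamespace(
        text='[{"date": "2026-01-05", "close": 10.0, "adjusted_close": 9.8}]'
    )]
)

rows = json.loads(as_json_text(fake_result))
# Prefer adjusted_close and fall back to close, as fetch_prices() does.
prices = [(r["date"], r.get("adjusted_close", r.get("close"))) for r in rows]

# An error payload is a dict with an "error" key, which the wrappers turn into an exception.
error_payload = json.loads(as_json_text('{"error": "unknown ticker"}'))
is_error = isinstance(error_payload, dict) and bool(error_payload.get("error"))

print(prices, is_error)
```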

Next, we’ll compute the actual metrics. This is where the assistant stops being “an API caller” and starts producing something useful.

4. Deterministic Metrics

This is the most important design choice in the whole build. The model never computes numbers. Python does.

So for every ticker, we compute a small set of metrics that are easy to explain and are actually useful in a “market brief” style output:

total return over the window
realized volatility (daily and annualized)
max drawdown (worst peak-to-trough fall)
a simple trend slope (so we can say “mild uptrend” or “downtrend” without vibes)
a lightweight regime label (low, mid, high volatility)

Here’s the code:

def compute_metrics(prices_df):
    if prices_df is None or prices_df.empty:
        return {}

    df = prices_df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"]).sort_values("date")

    close = pd.to_numeric(df["price"], errors="coerce").dropna()
    if close.empty:
        return {}

    rets = close.pct_change().dropna()

    out = {}

    # realized vol (daily), annualize with sqrt(252)
    if not rets.empty:
        out["vol_daily"] = float(rets.std())
        out["vol_annualized"] = float(rets.std() * np.sqrt(252))
        out["ret_total"] = float((close.iloc[-1] / close.iloc[0]) - 1.0)

    # max drawdown
    peak = close.cummax()
    dd = (close / peak) - 1.0
    out["max_drawdown"] = float(dd.min())

    # simple trend score
    logp = np.log(close.values)
    x = np.arange(len(logp))
    if len(logp) >= 3:
        slope = np.polyfit(x, logp, 1)[0]
        out["trend_slope"] = float(slope)
    else:
        out["trend_slope"] = 0.0

    # basic helpers
    out["n_points"] = int(len(close))
    out["start_close"] = float(close.iloc[0])
    out["end_close"] = float(close.iloc[-1])

    return out

def compute_regime(prices_df, window=20):
    # cheap regime label, based on rolling vol percentile
    if prices_df is None or prices_df.empty:
        return {"regime": "unknown"}

    df = prices_df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"]).sort_values("date")

    close = pd.to_numeric(df["price"], errors="coerce").dropna()
    if close.empty:
        return {"regime": "unknown"}

    rets = close.pct_change()
    rv = rets.rolling(window).std()

    last = rv.dropna()
    if last.empty:
        return {"regime": "unknown"}

    cur = float(last.iloc[-1])
    p80 = float(last.quantile(0.8))
    p50 = float(last.quantile(0.5))

    if cur >= p80:
        reg = "high_vol"
    elif cur >= p50:
        reg = "mid_vol"
    else:
        reg = "low_vol"

    return {"regime": reg, "rolling_vol": cur, "window": int(window)}

How to think about these calculations:

Total return is just end / start - 1. It’s the simplest “did it go up or down” number.
Volatility here is realized volatility of daily returns. That’s just the standard deviation of daily % changes. We annualize it using sqrt(252) because markets have roughly 252 trading days.
Max drawdown tells you how bad the worst dip was during the window. It’s often more meaningful than return when you’re writing a quick risk note.
Trend slope is intentionally simple. We fit a straight line to log prices. If the slope is positive, it’s generally drifting up. If it’s negative, it’s drifting down.
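Regime label is not a fancy model. It just says “compared to its own recent rolling volatility, are we currently in a high, medium, or low vol phase”. The percentiles come from the fetched window only, so a short lookback leaves few rolling values to compare against: with about 21 prices and window=20 there is a single rolling value, and the label then comes out as high_vol by construction.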

The main point is this: these numbers are deterministic. If the assistant says “max drawdown was -13%”, you can trace it back to the exact adjusted close series that produced it.
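If you want to check the arithmetic by hand, the core metrics fit in a few lines of plain Python. This is a sketch on a made-up five-point price series, not real market data, and it skips the trend slope and the regime label.

```python
import math
import statistics

prices = [100.0, 110.0, 99.0, 104.5, 120.0]  # made-up closes

# Daily returns and total return over the window.
rets = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
ret_total = prices[-1] / prices[0] - 1.0

# Sample standard deviation (ddof=1), which is also the pandas .std() default.
vol_daily = statistics.stdev(rets)
vol_annualized = vol_daily * math.sqrt(252)

# Max drawdown: worst drop from the running peak.
peak = prices[0]
max_drawdown = 0.0
for p in prices:
    peak = max(peak, p)
    max_drawdown = min(max_drawdown, p / peak - 1.0)

print(round(ret_total, 4), round(max_drawdown, 4))  # 0.2 -0.1
```

The series ends 20% above where it started, but it fell 10% from its peak of 110 to 99 along the way. That gap between total return and max drawdown is exactly what a risk note should surface.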

Next, we’ll handle the watchlist side. That means aligning returns across tickers, computing correlation, and generating a ranked snapshot.

5. Watchlist Utilities

Once you have more than one ticker, you want two extra things:

a quick ranking so you can say “this is the riskiest name in the basket”
a correlation snapshot so you can see what’s moving together

The only “gotcha” with correlation is dates. If TSLA has 41 price points and NVDA has 39 because of missing days, you can’t just correlate blindly. You need the returns lined up on the same dates first. That’s what align_returns() does.

Here’s the code:

def align_returns(price_frames):
    if not price_frames:
        return pd.DataFrame()

    parts = []
    for df in price_frames:
        if df is None or df.empty:
            continue
        x = df.copy()
        x["date"] = pd.to_datetime(x["date"], errors="coerce")
        x = x.dropna(subset=["date"])
        x["price"] = pd.to_numeric(x["price"], errors="coerce")
        x = x.dropna(subset=["price"])
        x = x.sort_values("date")
        x["ret"] = x["price"].pct_change()
        x = x.dropna(subset=["ret"])
        parts.append(x[["date", "ticker", "ret"]])

    if not parts:
        return pd.DataFrame()

    allr = pd.concat(parts, ignore_index=True)
    wide = allr.pivot(index="date", columns="ticker", values="ret").dropna(how="any")
    return wide


def corr_summary(ret_wide, base_ticker, top_n=3):
    if ret_wide is None or ret_wide.empty:
        return []

    if base_ticker not in ret_wide.columns:
        return []

    c = ret_wide.corr()[base_ticker].dropna()
    c = c.drop(labels=[base_ticker], errors="ignore")
    if c.empty:
        return []

    out = []
    for k, v in c.sort_values(ascending=False).head(top_n).items():
        out.append({"ticker": k, "corr": float(v)})

    return out


def rank_watchlist(metrics_by_ticker):
    rows = []
    for t, m in metrics_by_ticker.items():
        if not m:
            continue
        rows.append({
            "ticker": t,
            "vol_annualized": m.get("vol_annualized"),
            "max_drawdown": m.get("max_drawdown"),
            "ret_total": m.get("ret_total"),
            "trend_slope": m.get("trend_slope"),
        })

    if not rows:
        return pd.DataFrame()

    df = pd.DataFrame(rows)
    df = df.sort_values(["vol_annualized", "max_drawdown"], ascending=[False, True])
    return df.reset_index(drop=True)

What’s happening here:

align_returns() takes a list of price DataFrames, computes daily returns for each, then pivots them into a wide table like: date -> TSLA.US, NVDA.US, AMZN.US.
We drop rows where any ticker is missing, because correlation only makes sense when the returns are aligned on the same dates.
corr_summary() is a compact “who moves with whom” helper. We pick one base ticker, compute correlations against everything else, then grab the top few. For a watchlist widget, that’s usually enough.
rank_watchlist() is the ranking logic for the snapshot. We sort primarily by annualized volatility, and use drawdown as a secondary risk indicator. You could choose different ranking logic. The point is to keep it deterministic and explainable.
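The alignment step is the part people tend to skip, so here is the same idea in plain Python with the arithmetic visible. The return values are invented for illustration, and `pearson` is a hand-written helper, not something from the tutorial’s code.

```python
import math

# Made-up daily returns keyed by date. NVDA has no row for 2026-01-06.
tsla = {"2026-01-05": 0.010, "2026-01-06": -0.020, "2026-01-07": 0.015, "2026-01-08": 0.005}
nvda = {"2026-01-05": 0.008, "2026-01-07": 0.012, "2026-01-08": -0.004}

# Keep only the dates both series share, like dropna(how="any") on the wide table.
common = sorted(set(tsla) & set(nvda))
x = [tsla[d] for d in common]
y = [nvda[d] for d in common]


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


corr = pearson(x, y)
print(common, round(corr, 3))
```

Three dates survive the alignment, so the correlation is computed on three pairs. On real data you would want far more observations before reading anything into the number.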

Next, we’ll build the facts objects and narration layer. That’s where we enforce the “model is just a narrator” contract.

6. Facts Object and Narration

This is where the “narrator pattern” becomes real.

Up to this point, we’ve done everything with MCP and Python. We fetched prices and fundamentals from EODHD, we computed metrics, and we aligned returns. Now we need one clean object that represents “the truth” for this request.

That’s what the facts object is.

The rule is simple.

facts contains only things we actually fetched or computed.
The model never sees raw market data. It sees the cleaned facts.
The model is told to write using only those facts, and not to invent any numbers.

Here are the functions that build those facts objects for the two demos, plus the narration function.

def build_facts_single(ticker, lookback_days, metrics, regime, fundamentals):
    # keep this compact. LLM will narrate from this later
    out = {
        "type": "single_ticker_brief",
        "ticker": ticker,
        "lookback_days": int(lookback_days),
        "metrics": metrics,
        "regime": regime,
    }

    if isinstance(fundamentals, dict):
        gen = fundamentals.get("General", {}) or {}
        hi = fundamentals.get("Highlights", {}) or {}
        val = fundamentals.get("Valuation", {}) or {}
        tech = fundamentals.get("Technicals", {}) or {}

        base = {
            "name": gen.get("Name"),
            "exchange": gen.get("Exchange"),
            "sector": gen.get("Sector"),
            "industry": gen.get("Industry"),
        }

        # named fund_metrics so it doesn't shadow the metrics argument above
        fund_metrics = {
            "market_cap": hi.get("MarketCapitalization"),
            "pe": hi.get("PERatio") or val.get("TrailingPE") or val.get("PERatio"),
            "beta": tech.get("Beta"),
            "div_yield": hi.get("DividendYield"),
        }

        out["fundamentals"] = {k: v for k, v in {**base, **fund_metrics}.items() if v is not None}

    return out


def build_facts_watchlist(tickers, lookback_days, rank_df, corr_bits, metrics_by_ticker):
    out = {
        "type": "watchlist_snapshot",
        "tickers": tickers,
        "lookback_days": int(lookback_days),
        "ranking": rank_df.to_dict(orient="records") if isinstance(rank_df, pd.DataFrame) else [],
        "correlation": corr_bits,
        "metrics_by_ticker": metrics_by_ticker,
    }
    return out


def narrate(facts):
    prompt = (
        "Write a short, product-ready market note using ONLY the facts below.\n"
        "No guessing. No extra numbers. If something is missing, say it's missing.\n"
        "Keep it tight and readable.\n\n"
        f"FACTS:\n{json.dumps(facts, indent=2, default=str)}"
    )

    r = oa.responses.create(
        model=NARRATION_MODEL,
        input=[{"role": "user", "content": prompt}],
    )

    try:
        return r.output_text
    except Exception:
        return str(r)

What’s happening here:

build_facts_single() takes the ticker, window, computed metrics, the vol regime label, and the fundamentals payload. But it doesn’t dump the entire fundamentals JSON. It picks a handful of fields from the General, Highlights, Valuation, and Technicals sections and only keeps what exists. That keeps the prompt tight and the output predictable.
build_facts_watchlist() is the same idea but for multiple tickers. It passes the ranking table, correlation notes, and per-ticker metrics.
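narrate() is basically “convert this facts object into human-friendly text”. The prompt is strict on purpose. Because the model only sees these facts and is told to use nothing else, it has far less room to hallucinate numbers, and any number in the note can be checked against the facts object.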

One small implementation detail: narrate() is a normal blocking function, while everything else is async. That’s why later, inside run_assistant(), we call it with await asyncio.to_thread(...) so it doesn’t block the async flow.
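Here is a minimal sketch of why asyncio.to_thread matters (it needs Python 3.9 or later). `slow_narrate` is a hypothetical stand-in for the blocking model call, and `heartbeat` is another coroutine that should keep running while the model call is in flight.

```python
import asyncio
import time


def slow_narrate(facts):
    # Hypothetical stand-in for a blocking HTTP call to the model.
    time.sleep(0.2)
    return f"note for {facts['ticker']}"


async def heartbeat(ticks):
    # Another coroutine that should keep running while narration is in flight.
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def main():
    ticks = []
    answer, _ = await asyncio.gather(
        asyncio.to_thread(slow_narrate, {"ticker": "AAPL.US"}),
        heartbeat(ticks),
    )
    return answer, len(ticks)


# In a notebook, use `await main()` instead of asyncio.run().
answer, n_ticks = asyncio.run(main())
print(answer, n_ticks)
```

The blocking call runs in a worker thread, so the event loop stays free for the heartbeat. Calling slow_narrate() directly inside a coroutine would freeze every other task for the duration of the model call.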

7. The Orchestration Function (`run_assistant()`)

This is the piece that ties everything together. It does four things in order:

create a trace id and log the request
parse tickers and lookback, then clamp them to budgets
fetch EODHD data via MCP and compute metrics in Python
call the model to narrate the facts, then return a structured response

Here’s the function:

def _dates_from_lookback(lookback_days):
    end = date.today()
    start = end - timedelta(days=int(lookback_days))
    return start.isoformat(), end.isoformat()

async def run_assistant(user_text, mode="auto"):
    trace_id = uuid.uuid4().hex[:10]
    log_event("request_started", trace_id, text=user_text, mode=mode)

    tickers, lookback = parse_request(user_text)
    tickers, lookback = enforce_budgets(tickers, lookback)

    if not tickers:
        return {
            "answer": "no tickers found in request",
            "metrics": {},
            "data_used": {},
            "tool_trace_id": trace_id,
        }

    log_event("parsed", trace_id, tickers=tickers, lookback_days=lookback)
    
    start_date, end_date = _dates_from_lookback(lookback)
    state = new_state()
        
    if mode == "auto":
        mode = "watchlist" if len(tickers) > 1 else "single"

    try:
        if mode == "single":
            t = tickers[0]
            t_full = t if "." in t else f"{t}.US"

            log_event("tool_phase", trace_id, mode="single", ticker=t_full, start_date=start_date, end_date=end_date)

            prices = await fetch_prices(t_full, start_date, end_date, trace_id, state)
            metrics = compute_metrics(prices)
            regime = compute_regime(prices)

            fundamentals = await fetch_fundamentals(t_full, trace_id, state)

            facts = build_facts_single(t_full, lookback, metrics, regime, fundamentals)
            answer = await asyncio.to_thread(narrate, facts)

            resp = {
                "answer": answer,
                "metrics": metrics,
                "data_used": {
                    "tickers": [t_full],
                    "date_range": [start_date, end_date],
                    "tools_called": [x.get("tool") for x in state["tool_trace"]],
                    "tool_calls": state["tool_calls"],
                },
                "tool_trace_id": trace_id,
            }

            log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
            return resp

        # watchlist
        full = [x if "." in x else f"{x}.US" for x in tickers]

        log_event("tool_phase", trace_id, mode="watchlist", tickers=full, start_date=start_date, end_date=end_date)

        frames = []
        metrics_by = {}

        for t in full:
            prices = await fetch_prices(t, start_date, end_date, trace_id, state)
            frames.append(prices)
            metrics_by[t] = compute_metrics(prices)

        ret_wide = align_returns(frames)

        base = full[0]
        corr_bits = []
        top = corr_summary(ret_wide, base, top_n=3)
        if top:
            corr_bits.append({"base": base, "top": top})

        rank_df = rank_watchlist(metrics_by)
        facts = build_facts_watchlist(full, lookback, rank_df, corr_bits, metrics_by)
        answer = await asyncio.to_thread(narrate, facts)

        resp = {
            "answer": answer,
            "metrics": {"by_ticker": metrics_by},
            "data_used": {
                "tickers": full,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

        log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
        return resp

    except Exception as e:
        detail = repr(e)
        if hasattr(e, "exceptions"):
            detail = detail + " | " + " ; ".join([repr(x) for x in e.exceptions])

        log_event("request_failed", trace_id, err=detail)
        
        return {
            "answer": f"failed: {e}",
            "metrics": {},
            "data_used": {
                "tickers": tickers,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

This function is the glue. It creates a trace_id, logs the request, extracts tickers and a lookback window, then clamps both to your budgets so the assistant can’t over-fetch or spam tool calls.

After that, it turns the lookback into a start_date and end_date, initializes a fresh state, and picks a mode. In single mode, it fetches prices and fundamentals for one ticker via EODHD’s MCP tools, computes the metrics in Python, packs everything into a facts object, and asks the LLM to only narrate those facts. In watchlist mode it does the same across multiple tickers, then aligns returns so correlation is computed on matching dates, and builds a ranked snapshot.

The response is always structured the same way. You get the narrative answer, the raw computed metrics, a data_used block that shows tickers, date range, and tools called, plus a tool_trace_id so you can trace any output back to logs.

That structure is the difference between “a chat response” and “a shippable assistant output”. You can plug the same response into a UI card, a Slack alert, or a dashboard without changing anything.
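As a sketch of what “plug it into a Slack alert” could look like, the helper below turns a response into a one-line message. `format_alert` and the sample response are hypothetical. Only the four top-level keys come from run_assistant().

```python
def format_alert(resp, max_chars=120):
    """Build a one-line alert from a structured assistant response."""
    used = resp.get("data_used", {})
    tickers = ", ".join(used.get("tickers", [])) or "n/a"
    summary = resp.get("answer", "").replace("\n", " ").strip()
    if len(summary) > max_chars:
        summary = summary[: max_chars - 3] + "..."
    return f"[{resp.get('tool_trace_id', '?')}] {tickers}: {summary}"


# Illustrative response with the same keys that run_assistant() returns.
sample = {
    "answer": "Shares declined 2.58% over the past 30 days.",
    "metrics": {"ret_total": -0.0258},
    "data_used": {
        "tickers": ["AAPL.US"],
        "date_range": ["2026-02-03", "2026-03-05"],
        "tools_called": ["get_historical_stock_prices", "get_fundamentals_data"],
        "tool_calls": 2,
    },
    "tool_trace_id": "abc123",
}

line = format_alert(sample)
print(line)
```

Because the trace id travels with the alert, whoever reads it can jump straight to the matching log lines.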

Demo 1: Market Brief for One Ticker

Let’s start with the simplest flow. One ticker, one lookback window, and a market brief that looks like something you could show inside a product.

Prompt used:

“Give me a 30-day brief for AAPL. trend, volatility, max drawdown, plus 3 fundamental highlights.”

Code (Jupyter Notebook):

import asyncio
import json
from core import run_assistant

q1 = "Give me a 30-day brief for AAPL. trend, volatility, max drawdown, plus 3 fundamental highlights."

r1 = await run_assistant(q1, mode="single")
print(json.dumps(r1, indent=2, ensure_ascii=False))

Output:

{"event": "request_started", "trace_id": "2af550173f", "ts": 1772735388.777, "text": "Give me a 30-day brief for AAPL. trend, volatility, max drawdown, plus 3 fundamental highlights.", "mode": "single"}
{"event": "parsed", "trace_id": "2af550173f", "ts": 1772735388.778, "tickers": ["AAPL"], "lookback_days": 30}
{"event": "tool_phase", "trace_id": "2af550173f", "ts": 1772735388.778, "mode": "single", "ticker": "AAPL.US", "start_date": "2026-02-03", "end_date": "2026-03-05"}
{"event": "request_finished", "trace_id": "2af550173f", "ts": 1772735404.392, "tool_calls": 2}
{
  "answer": "Apple Inc (AAPL.US) | NASDAQ | Technology — Consumer Electronics\n
\nOver the past 30 days, Apple shares declined 2.58%, falling from 269.48 to 
262.52 across 21 trading observations. The trend slope over the period was 
negative (-0.00175), indicating a modest downward drift.\n\nRealized daily 
volatility was 1.93%, equivalent to about 30.65% annualized. The stock is currently 
classified in a high‑volatility regime based on a 20‑day rolling volatility measure.
\n\nMaximum drawdown during the period reached -8.03%.\n\nAdditional fundamentals 
or valuation metrics were not provided.",
  "metrics": {
    "vol_daily": 0.01930981768788001,
    "vol_annualized": 0.3065338527847606,
    "ret_total": -0.02582751966750796,
    "max_drawdown": -0.08032503955127279,
    "trend_slope": -0.0017498633497641184,
    "n_points": 21,
    "start_close": 269.48,
    "end_close": 262.52
  },
  "data_used": {
    "tickers": [
      "AAPL.US"
    ],
    "date_range": [
      "2026-02-03",
      "2026-03-05"
    ],
    "tools_called": [
      "get_historical_stock_prices",
      "get_fundamentals_data"
    ],
    "tool_calls": 2
  },
  "tool_trace_id": "2af550173f"
}

First, you’ll see the log events. They’re not part of the final response. They’re just the trace trail.

request_started shows the raw prompt and that we forced mode="single".
parsed confirms the parser extracted AAPL and a 30-day lookback.
tool_phase shows what we actually fetched: AAPL.US from 2026-02-03 to 2026-03-05.
request_finished confirms we made exactly 2 tool calls.

Now the actual response JSON:

answer is the narrative. In this run it summarizes:

return of -2.58% (269.48 to 262.52)
21 price observations in that window
negative trend slope (-0.00175) meaning mild downward drift
daily vol 1.93% and annualized vol 30.65%
max drawdown -8.03%
and it labels the regime as high volatility using the rolling vol logic.

metrics is where those numbers come from. This is the deterministic part. ret_total, vol_daily, vol_annualized, max_drawdown, and trend_slope were computed directly from the fetched closes. start_close, end_close, and n_points explain the exact series used.

data_used is the audit block for this specific output. It shows:

ticker normalized to AAPL.US
the exact date range pulled
the exact tools called on the MCP server: get_historical_stock_prices and get_fundamentals_data
and again, tool_calls: 2 so you can quickly spot runaway calls.

tool_trace_id (2af550173f) is your handle for debugging. Every log line above carries the same id, so you can trace this brief back to the exact tool calls and parameters.
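Since the response carries both the inputs and the results, part of it can be re-derived on the spot. The sketch below takes the metrics block printed above and recomputes two of the numbers from the others. It is only a consistency check on the response, not a re-fetch of the data.

```python
import math

# Values copied from the metrics block in the output above.
metrics = {
    "vol_daily": 0.01930981768788001,
    "vol_annualized": 0.3065338527847606,
    "ret_total": -0.02582751966750796,
    "start_close": 269.48,
    "end_close": 262.52,
}

# Total return is end / start - 1.
ret_check = metrics["end_close"] / metrics["start_close"] - 1.0

# Annualized vol is daily vol times sqrt(252).
vol_check = metrics["vol_daily"] * math.sqrt(252)

ret_ok = math.isclose(ret_check, metrics["ret_total"], rel_tol=1e-6)
vol_ok = math.isclose(vol_check, metrics["vol_annualized"], rel_tol=1e-6)
print(ret_ok, vol_ok)
```

A check like this is cheap enough to run on every response in a test suite, and it catches the case where someone changes a formula in compute_metrics() without updating the others.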

Demo 2: Watchlist Snapshot

Now let’s switch to the watchlist flow. Same assistant core. The only difference is we pass multiple tickers and a longer window, so the output becomes a comparative risk snapshot.

Prompt used:

“Compare TSLA, NVDA, AMZN over the last 60 days. rank by volatility and drawdown, and flag valuation risks.”

Code:

q2 = "Compare TSLA, NVDA, AMZN over the last 60 days. rank by volatility and drawdown, and flag valuation risks."

r2 = await run_assistant(q2, mode="watchlist")
print(json.dumps(r2, indent=2, ensure_ascii=False))

Output:

{"event": "request_started", "trace_id": "1b67bb47d6", "ts": 1772735404.394, "text": "Compare TSLA, NVDA, AMZN over the last 60 days. rank by volatility and drawdown, and flag valuation risks.", "mode": "watchlist"}
{"event": "parsed", "trace_id": "1b67bb47d6", "ts": 1772735404.394, "tickers": ["TSLA", "NVDA", "AMZN"], "lookback_days": 60}
{"event": "tool_phase", "trace_id": "1b67bb47d6", "ts": 1772735404.394, "mode": "watchlist", "tickers": ["TSLA.US", "NVDA.US", "AMZN.US"], "start_date": "2026-01-05", "end_date": "2026-03-06"}
{"event": "request_finished", "trace_id": "1b67bb47d6", "ts": 1772735423.004, "tool_calls": 3}
{
  "answer": "Market Watchlist Snapshot (last 60 days)\n\nAll three names show 
negative total returns and downward trend slopes over the period.\n\nNVDA.US 
ranks highest in the group despite a small decline. Total return is -0.027. 
Price moved from 188.12 to 183.04 across 41 observations. Annualized volatility is 
0.3808 and maximum drawdown is -0.107.\n\nTSLA.US shows the second‑highest volatility 
profile with annualized volatility of 0.3561. Total return is -0.101, with price 
falling from 451.67 to 405.94. Maximum drawdown reached -0.131. Trend slope is negative.
\n\nAMZN.US has the lowest volatility in the set (annualized 0.3196) but the deepest 
drawdown at -0.196. Total return is -0.0697, with price moving from 233.06 to 
216.82. Trend slope is also negative.\n\nCorrelation: TSLA shows a stronger 
relationship with NVDA (0.533) than with AMZN (0.177).\n\nMissing from the 
data: trading volume, catalysts, sector context, and forward-looking indicators.",
  "metrics": {
    "by_ticker": {
      "TSLA.US": {
        "vol_daily": 0.02243518393199404,
        "vol_annualized": 0.3561475038122908,
        "ret_total": -0.10124648526579139,
        "max_drawdown": -0.13115770363318358,
        "trend_slope": -0.0026452119688441023,
        "n_points": 41,
        "start_close": 451.67,
        "end_close": 405.94
      },
      "NVDA.US": {
        "vol_daily": 0.023987861378298222,
        "vol_annualized": 0.3807954941476091,
        "ret_total": -0.027004039974484417,
        "max_drawdown": -0.10716326424601319,
        "trend_slope": -4.3573704505466623e-05,
        "n_points": 41,
        "start_close": 188.12,
        "end_close": 183.04
      },
      "AMZN.US": {
        "vol_daily": 0.020129905817481322,
        "vol_annualized": 0.31955234824924766,
        "ret_total": -0.06968162704882863,
        "max_drawdown": -0.1964184655186353,
        "trend_slope": -0.00520436173926906,
        "n_points": 41,
        "start_close": 233.06,
        "end_close": 216.82
      }
    }
  },
  "data_used": {
    "tickers": [
      "TSLA.US",
      "NVDA.US",
      "AMZN.US"
    ],
    "date_range": [
      "2026-01-05",
      "2026-03-06"
    ],
    "tools_called": [
      "get_historical_stock_prices",
      "get_historical_stock_prices",
      "get_historical_stock_prices"
    ],
    "tool_calls": 3
  },
  "tool_trace_id": "1b67bb47d6"
}

The logs show the assistant correctly extracted TSLA, NVDA, AMZN and a 60-day lookback, then fetched TSLA.US, NVDA.US, and AMZN.US from 2026-01-05 to 2026-03-06. Since this is a watchlist request, it made exactly 3 tool calls: one get_historical_stock_prices call per ticker.

Inside answer, the model is basically summarizing what Python computed. In this run, all three names had negative returns and negative trend slopes.

NVDA had the highest annualized volatility at 0.3808 with a relatively small decline of -2.7%.
TSLA was next in volatility (0.3561) with a larger decline (-10.1%) and drawdown of about -13.1%.
AMZN had the lowest volatility (0.3196) but the deepest drawdown at around -19.6%.

It also includes a correlation note derived from the aligned returns table: TSLA’s return series correlated more with NVDA (0.533) than with AMZN (0.177) in this window.

metrics.by_ticker is where the snapshot really lives. It contains the full computed metric set per ticker, including observation count (n_points=41) and the start and end closes used for the return calculation. data_used shows exactly what we fetched, including the tickers, the date range, and the three price tool calls. And tool_trace_id is the id that links this output back to the full trace logs.

So how would a product team use this? Well, this output is already shaped like a widget backend. You can render the ranking as a watchlist “risk card”, show the top volatility and drawdown names, and drop the narrative into a compact summary box. Since you also get deterministic metrics, you can build UI elements without parsing text, and still keep the narration as a layer on top.
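
As a rough sketch of that widget backend, the rankings can be built straight from metrics.by_ticker without touching the narrative. The numbers below are rounded copies of the output above. The rank helper and the risk_card shape are hypothetical.

```python
# Values copied (rounded) from metrics.by_ticker in the output above.
by_ticker = {
    "TSLA.US": {"vol_annualized": 0.3561, "max_drawdown": -0.1312},
    "NVDA.US": {"vol_annualized": 0.3808, "max_drawdown": -0.1072},
    "AMZN.US": {"vol_annualized": 0.3196, "max_drawdown": -0.1964},
}


def rank(metrics: dict, key: str, descending: bool) -> list[str]:
    """Return tickers ordered by one metric."""
    return sorted(metrics, key=lambda t: metrics[t][key], reverse=descending)


# Highest volatility first; most negative (deepest) drawdown first.
by_volatility = rank(by_ticker, "vol_annualized", descending=True)
by_drawdown = rank(by_ticker, "max_drawdown", descending=False)

# Hypothetical payload a UI could render as a watchlist "risk card".
risk_card = {
    "most_volatile": by_volatility[0],
    "deepest_drawdown": by_drawdown[0],
    "volatility_ranking": by_volatility,
    "drawdown_ranking": by_drawdown,
}
print(risk_card)
```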

What Makes This Shippable, and What Can Be Improved?

The core reason this works in a real product setting is that the numbers are deterministic. Prices and fundamentals come from EODHD via MCP, metrics are computed in Python, and the model only writes narrative from a facts object.

On top of that, every run is traceable. You get tool logs, data_used, and a tool_trace_id, plus hard limits on lookback, tickers, and tool calls so the system can’t spiral.
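
The hard limits mentioned above could look something like the following. This is a sketch only: the three cap values and both names (clamp_request, ToolBudget) are assumptions, since the article does not list its actual limits here.

```python
MAX_TICKERS = 5          # assumed cap, not from the article
MAX_LOOKBACK_DAYS = 365  # assumed cap, not from the article
MAX_TOOL_CALLS = 8       # assumed cap, not from the article


def clamp_request(tickers: list[str], lookback_days: int) -> tuple[list[str], int]:
    """Trim the ticker list and force the lookback into the allowed range."""
    return tickers[:MAX_TICKERS], max(1, min(lookback_days, MAX_LOOKBACK_DAYS))


class ToolBudget:
    """Counts tool calls for one request and refuses to go over the limit."""

    def __init__(self, limit: int = MAX_TOOL_CALLS) -> None:
        self.limit = limit
        self.tools_called: list[str] = []

    def charge(self, tool_name: str) -> None:
        if len(self.tools_called) >= self.limit:
            raise RuntimeError(f"tool call limit of {self.limit} reached")
        self.tools_called.append(tool_name)


tickers, lookback = clamp_request(["TSLA", "NVDA", "AMZN"], 10_000)
budget = ToolBudget(limit=3)
for _ in tickers:
    budget.charge("get_historical_stock_prices")
```

Calling budget.charge(...) before every tool call also gives you the tools_called list for the data_used block for free.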

At the same time, this is still an MVP. The parsing is a simple heuristic, the metric set is intentionally small, and fundamentals are only lightly extracted.

If you want to take this further, the next upgrades are straightforward: you can add volume and a couple more data tools like earnings calendar and news, introduce caching for repeated requests, build a tiny evaluation harness with fixed prompts and expected outputs, then wrap run_assistant() behind a small API so it can power an actual UI or internal service.
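
Of those upgrades, caching is the quickest to prototype. Below is a minimal in-memory sketch with a time-to-live. The class name, the key layout, and the fake fetch function are all hypothetical, and a real service would more likely use a shared cache.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the tool call
        value = fetch()
        self._store[key] = (now, value)
        return value


calls = []


def fake_fetch() -> list[float]:
    """Stand-in for a real tool call; records how often it runs."""
    calls.append(1)
    return [269.48, 262.52]


cache = TTLCache(ttl_seconds=300)
key = ("get_historical_stock_prices", "AAPL.US", "2026-02-03", "2026-03-05")
first = cache.get_or_fetch(key, fake_fetch)
second = cache.get_or_fetch(key, fake_fetch)  # served from the cache
```

Keying on the tool name plus its exact parameters keeps the cache aligned with what data_used reports.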

Conclusion

The main takeaway is simple. If you want a financial assistant to be usable beyond casual chat, you need to separate facts from narrative. MCP gives you a clean way to connect to tool providers via an MCP server. Python gives you deterministic metrics, and the model becomes the last-mile layer that turns those facts into readable output.

This is still a small build, but it’s already shaped like something you can ship. The response format is structured, traceable, and easy to plug into a UI. If you extend it with a few more tools and add basic caching, it can quickly move from a Jupyter notebook demo to a real feature.

If you want to try the same approach with a full market data tool layer out of the box, EODHD’s MCP server is a solid starting point.

With that being said, you’ve reached the end of the article. Hope you learned something new and useful today. Thank you very much for your time.

How to Build an MCP Server with Python, Docker, and Claude Code

Balajee Asish Brahmandam — Tue, 10 Mar 2026 21:41:44 +0000

Every MCP tutorial I've found so far has followed the same basic script: build a server, point Claude Desktop at it, screenshot the chat window, done.

This is fine if you want a demo. But it's not fine if you want something you can ship, defend in an interview, or hand to another developer without a README that starts with "first, install this Electron app."

So I built an MCP server in Python, containerized it with Docker, and wired it into Claude Code – all from the terminal, no GUI required.

This article walks through the full loop in one afternoon: what MCP actually is, why it matters now that OpenAI and Google have adopted it, the real security problems nobody puts in their tutorial (complete with CVEs), and every command you need to go from an empty directory to a working tool.

If you're between jobs and need a portfolio project that shows you understand how AI tooling actually works under the hood, this is the one.

What You Will Build

By the end of this tutorial, you will have:

A Python MCP server that exposes custom tools to any MCP-compatible AI client
A Docker container that packages the server for reproducible deployment
A working connection between that container and Claude Code in your terminal
An understanding of the security risks involved and how to mitigate the worst of them

The server we are building is a project scaffolder. You give it a project name and a language, and it generates a starter directory structure with the right files. It's simple enough to build in an afternoon, but useful enough to actually put on your résumé.

Prerequisites

You will need the following installed on your machine:

Python 3.10+ (check with python3 --version)
Docker (check with docker --version)
Claude Code with an active Claude Pro, Max, or API plan (check with claude --version)
Node.js 20+ (required by Claude Code – check with node --version)
A terminal you are comfortable in

If you don't have Claude Code installed yet, follow the official installation instructions. The npm installation method is deprecated, so make sure you use the native binary installer instead.

What is MCP (and Why Should You Care)?

The Model Context Protocol (MCP) is an open standard that lets AI models connect to external tools and data sources. Anthropic released it in November 2024, and within a year it became the default way to extend what an LLM can do. OpenAI adopted it in March 2025. Google DeepMind followed in April. The protocol now has over 97 million monthly SDK downloads and more than 10,000 active servers.

The easiest way to think about MCP is as a USB-C port for AI. Before MCP, every AI provider had its own way of calling tools. OpenAI had function calling. Google had their own format. If you wanted your tool to work with multiple models, you had to implement it multiple times. MCP gives you one interface that works everywhere.

Here is how the pieces fit together:

An MCP server exposes tools, resources, and prompts. It is your code.
An MCP client (like Claude Code, Claude Desktop, or Cursor) discovers those tools and calls them on behalf of the LLM.
The transport is how they communicate. For local servers, that's usually stdio (standard input/output). For remote servers, it's HTTP.

When you type a message in Claude Code and it decides to use one of your tools, here is what happens: Claude Code sends a JSON-RPC 2.0 message to your server over stdin, your server executes the tool and writes the result to stdout, and Claude Code reads it back. The LLM never talks to your server directly. The client is always in the middle.
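
To give a feel for what travels over stdin and stdout, here is a simplified sketch of one tool call as Python dictionaries. The method name and result shape follow the MCP specification, but the tool name and arguments are just the scaffolder example built later in this tutorial, and a real session begins with an initialize handshake and a tools/list request that are left out here.

```python
import json

# Request the client writes to the server's stdin.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "scaffold_project",
        "arguments": {"name": "weather-api", "language": "python"},
    },
}

# Response the server writes to its stdout; the id ties it to the request.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": '{"status": "created"}'}],
        "isError": False,
    },
}

# Over stdio, each message is a single line of JSON followed by a newline.
wire_line = json.dumps(request) + "\n"
decoded = json.loads(wire_line)
print(wire_line, end="")
```

Since each message must be one clean line of JSON, anything else written to stdout breaks the stream, which matters again in the debugging tip later on.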

If you want the deeper architecture breakdown, freeCodeCamp already has a solid explainer on how MCP works under the hood. Here, I will focus on building.

Why Claude Code Instead of Claude Desktop?

Most MCP tutorials use Claude Desktop as the client. That works, but Claude Code has a few advantages for developers:

It lives in your terminal. No GUI to configure. No JSON files to hand-edit in hidden config directories. You add an MCP server with one command and you are done.
It's already where you code. If you're writing the server, testing it, and connecting it, doing all of that in the same terminal session cuts the context switching.
It works on headless machines. If you're SSHing into a dev box or running in CI, Claude Desktop isn't an option. Claude Code is.
It's also an MCP server itself. Claude Code can expose its own tools (file reading, writing, shell commands) to other MCP clients via claude mcp serve. That's a neat trick we won't use today, but it's worth knowing about.

The relevant commands:

# Add an MCP server
claude mcp add <name> -- <command>

# List configured servers
claude mcp list

# Remove a server
claude mcp remove <name>

# Check MCP status inside Claude Code
/mcp

Step 1: Build the MCP Server

We're using FastMCP, a Python framework that handles all the protocol plumbing so you can focus on your tools. Create a new project directory and set it up:

mkdir mcp-scaffolder && cd mcp-scaffolder
python3 -m venv .venv
source .venv/bin/activate
pip install "mcp[cli]>=1.25,<2"

Why pin the version? The MCP Python SDK v2.0 is in development and will change the transport layer significantly. Pinning to >=1.25,<2 keeps your server working until you're ready to migrate.

Now create server.py:

# server.py
from mcp.server.fastmcp import FastMCP
import os
import json

mcp = FastMCP("project-scaffolder")

# Templates for different languages
TEMPLATES = {
    "python": {
        "files": {
            "main.py": '"""Entry point."""\n\n\ndef main():\n    print("Hello, world!")\n\n\nif __name__ == "__main__":\n    main()\n',
            "requirements.txt": "",
            "README.md": "# {name}\n\nA Python project.\n\n## Setup\n\n```bash\npip install -r requirements.txt\npython main.py\n```\n",
            ".gitignore": "__pycache__/\n*.pyc\n.venv/\n",
        },
        "dirs": ["tests"],
    },
    "node": {
        "files": {
            "index.js": 'console.log("Hello, world!");\n',
            "package.json": '{{\n  "name": "{name}",\n  "version": "1.0.0",\n  "main": "index.js"\n}}\n',
            "README.md": "# {name}\n\nA Node.js project.\n\n## Setup\n\n```bash\nnpm install\nnode index.js\n```\n",
            ".gitignore": "node_modules/\n",
        },
        "dirs": [],
    },
    "go": {
        "files": {
            "main.go": 'package main\n\nimport "fmt"\n\nfunc main() {{\n\tfmt.Println("Hello, world!")\n}}\n',
            "go.mod": "module {name}\n\ngo 1.21\n",
            "README.md": "# {name}\n\nA Go project.\n\n## Setup\n\n```bash\ngo run main.go\n```\n",
            ".gitignore": "bin/\n",
        },
        "dirs": ["cmd", "internal"],
    },
}


@mcp.tool()
def scaffold_project(name: str, language: str) -> str:
    """Create a new project directory structure.

    Args:
        name: The project name (used as the directory name)
        language: The programming language - one of: python, node, go
    """
    language = language.lower().strip()

    if language not in TEMPLATES:
        return json.dumps({
            "error": f"Unsupported language: {language}",
            "supported": list(TEMPLATES.keys()),
        })

    template = TEMPLATES[language]
    base_path = os.path.join(os.getcwd(), name)

    if os.path.exists(base_path):
        return json.dumps({
            "error": f"Directory already exists: {name}",
        })

    # Create the project directory
    os.makedirs(base_path, exist_ok=True)

    # Create subdirectories
    for dir_name in template["dirs"]:
        os.makedirs(os.path.join(base_path, dir_name), exist_ok=True)

    # Create files
    created_files = []
    for filename, content in template["files"].items():
        filepath = os.path.join(base_path, filename)
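        # Templates escape literal braces as {{ and }}, so fill {name} with
        # str.format; a plain replace would leave the doubled braces in place.
        formatted_content = content.format(name=name)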
        with open(filepath, "w") as f:
            f.write(formatted_content)
        created_files.append(filename)

    return json.dumps({
        "status": "created",
        "path": base_path,
        "language": language,
        "files": created_files,
        "directories": template["dirs"],
    })


@mcp.tool()
def list_templates() -> str:
    """List all available project templates and their contents."""
    result = {}
    for lang, template in TEMPLATES.items():
        result[lang] = {
            "files": list(template["files"].keys()),
            "directories": template["dirs"],
        }
    return json.dumps(result, indent=2)


if __name__ == "__main__":
    mcp.run(transport="stdio")

A few things to notice about this code:

Tools return strings. MCP tools communicate through text. I'm returning JSON strings so the LLM can parse the results reliably. You could return plain text, but structured data gives the model more to work with.

The @mcp.tool() decorator does the heavy lifting. FastMCP reads your function signature and docstring to generate the JSON schema that tells the LLM what this tool does, what arguments it takes, and what types they are. Good docstrings aren't optional here – they're how the LLM decides whether to call your tool.

transport="stdio" is the key line. This tells FastMCP to communicate over standard input/output, which is what Claude Code expects for local servers.
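
To see why the signature and docstring matter, here is a deliberately simplified illustration of turning a function into a tool description. It is not FastMCP's implementation (the real framework does considerably more). The describe_tool helper and the type mapping are hypothetical, though name, description, and inputSchema are the field names MCP uses when listing tools.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn) -> dict:
    """Build a tool description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)

    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def scaffold_project(name: str, language: str = "python") -> str:
    """Create a new project directory structure."""
    return ""


description = describe_tool(scaffold_project)
print(description)
```

The docstring ends up verbatim in the description field, which is the text the model reads when deciding whether to call the tool.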

Step 2: Test It Locally

Before we Dockerize anything, make sure the server actually works:

# Quick smoke test - the server should start without errors
python server.py

You should see... nothing. That is correct. An MCP server over stdio just sits there waiting for JSON-RPC messages on stdin. Press Ctrl+C to stop it.

For a proper test, use the MCP Inspector (Anthropic's debugging tool):

# Install and run the inspector
npx @modelcontextprotocol/inspector python server.py

This opens a web interface where you can see your tools, call them manually, and inspect the JSON-RPC messages going back and forth. Verify that both scaffold_project and list_templates show up and return sensible results.

Here's a debugging tip that will save you time: If your MCP server logs anything to stdout, it will corrupt the JSON-RPC stream and the client will disconnect. Use stderr for all logging: print("debug info", file=sys.stderr). This is the single most common source of "my server connects but then immediately fails" bugs. The New Stack called stdio transport "incredibly fragile" for exactly this reason.
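
If you prefer the logging module over bare print calls, a minimal setup that keeps everything on stderr might look like this. The logger name is arbitrary.

```python
import logging
import sys

# Dedicated logger that writes only to stderr, so stdout stays reserved
# for JSON-RPC messages.
logger = logging.getLogger("scaffolder")
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Do not pass records up to the root logger, which other code may have
# configured to write somewhere else.
logger.propagate = False

logger.info("server starting")
```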

Step 3: Dockerize It

Create a Dockerfile in your project root:

FROM python:3.12-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy server code
COPY server.py .

# MCP servers over stdio need unbuffered output
ENV PYTHONUNBUFFERED=1

# The server reads from stdin and writes to stdout
CMD ["python", "server.py"]

Create requirements.txt:

mcp[cli]>=1.25,<2

Build and verify:

docker build -t mcp-scaffolder .

# Quick test - should start without errors
docker run -i mcp-scaffolder

Again, you'll see nothing because the server is waiting for input. Ctrl+C to stop.

Two things matter in this Dockerfile:

PYTHONUNBUFFERED=1 is critical. Without it, Python buffers stdout, and the MCP client may hang waiting for responses that are sitting in a buffer. This is one of those bugs that works fine in local testing and breaks in Docker.
docker run -i (interactive mode) is required. The -i flag keeps stdin open so the MCP client can send messages to the container. Without it, the server gets an immediate EOF and exits.

Step 4: Wire It Into Claude Code

Now connect your Docker container to Claude Code:

claude mcp add scaffolder -- docker run -i --rm mcp-scaffolder

That's the whole command. Let me break it down:

claude mcp add registers a new MCP server
scaffolder is the name you will reference it by
Everything after -- is the command Claude Code runs to start the server
docker run -i --rm mcp-scaffolder starts the container with interactive stdin and removes it when done

Verify that it registered:

claude mcp list

You should see scaffolder in the output with a stdio transport type.

Now launch Claude Code and check the connection:

claude

Once inside Claude Code, type /mcp to see the status of your MCP servers. You should see scaffolder listed as connected with two tools available.

Step 5: Use It

Still inside Claude Code, try it out:

Create a new Python project called "weather-api"

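Claude Code should discover your scaffold_project tool, call it with name="weather-api" and language="python", and report back what it created. Because the server runs inside the container, the project is written under /app in that container, and --rm discards it when the container exits. To see the files on your host, you need to mount a host directory into the container (covered in the security section below).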

Try a few more:

What project templates are available?

Scaffold a Go project called "url-shortener"

If Claude Code doesn't pick up your tools, run /mcp to check the connection status. If it shows as disconnected, the most common causes are that the Docker image failed to build, stdout is being polluted (check for stray print statements), or the Docker daemon is not running.

Security: What the Other Tutorials Leave Out

This is the section most MCP tutorials skip. They should not. MCP has had real security incidents, not theoretical ones, and understanding them makes you a better developer.

The Prompt Injection Problem

MCP servers execute code on your machine based on what an LLM decides to do. If an attacker can influence what the LLM sees, they can influence what your server does. This is called prompt injection, and it is the number one unsolved security problem in the MCP ecosystem.

In May 2025, researchers at Invariant Labs demonstrated this against the official GitHub MCP server. They created a malicious GitHub issue that, when read by an AI agent, hijacked the agent into leaking private repository data (including salary information) into a public pull request. The root cause was an overly broad Personal Access Token combined with untrusted content landing in the LLM's context window.

This was not a contrived lab demo. It used the official GitHub MCP server, the kind of thing people install from the MCP server directory without a second thought.

Real CVEs, Not Theory

The ecosystem has accumulated real vulnerability reports:

CVE-2025-6514: A critical command-injection bug in mcp-remote, a popular OAuth proxy that 437,000+ environments used. An attacker could execute arbitrary OS commands through crafted OAuth redirect URIs.
CVE-2025-6515: Session hijacking in oatpp-mcp through predictable session IDs, letting attackers inject prompts into other users' sessions.
MCP Inspector RCE: Anthropic's own debugging tool allowed unauthenticated remote code execution. Inspecting a malicious server meant giving the attacker a shell on your machine.

An Equixly security assessment found command injection in 43% of tested MCP server implementations. Nearly a third were vulnerable to server-side request forgery.

What You Should Actually Do

For the server we built today, here is what matters:

Limit file system access

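Our Docker container doesn't mount your home directory. That's intentional. If you need the server to write files to your host, mount only the specific directory you need, for example docker run -i --rm -v "$(pwd)/projects:/app/projects" mcp-scaffolder, and make sure the tool writes under that mounted path (as written, scaffold_project writes to the container's working directory, /app). Never mount / or ~.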

Validate all inputs

Our scaffold_project tool checks that the language is in a known list and that the directory does not already exist. But think about what happens if someone passes name="../../etc/passwd" as the project name. Path traversal is the kind of thing you need to catch. Add this to the tool:

# Add this validation at the top of scaffold_project
if ".." in name or "/" in name or "\\" in name:
    return json.dumps({"error": "Invalid project name"})
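
A stricter variant is to allowlist the characters a name may contain and then confirm the resolved path stays inside the intended root. The sketch below is one way to do that. The helper name, the pattern, and the 64-character limit are assumptions, not part of the original server.

```python
import os
import re

# Allow letters, digits, dot, underscore, and hyphen; must start alphanumeric.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def safe_project_path(name: str, root: str) -> str:
    """Return the absolute target path, or raise ValueError for unsafe names."""
    if not NAME_PATTERN.fullmatch(name) or ".." in name:
        raise ValueError(f"Invalid project name: {name!r}")

    # Resolve symlinks, then verify the target is still inside the root.
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, name))
    if os.path.commonpath([root_real, target]) != root_real:
        raise ValueError(f"Project path escapes the root: {name!r}")
    return target


print(safe_project_path("weather-api", os.getcwd()))
```

Inside scaffold_project, you would catch the ValueError and return it as a JSON error, the same way the other checks do.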

Use least-privilege tokens

If your MCP server connects to an API, give it the minimum permissions it needs. The GitHub MCP incident happened because the PAT had access to every private repo. A read-only token scoped to one repo would have contained the blast radius.

Do not install MCP servers from untrusted sources

A malicious npm package posing as a "Postmark MCP Server" was caught silently BCC'ing all emails to an attacker's address. Treat MCP server packages with the same caution you would give any code that runs on your machine with your permissions.

What to Do Next

You have a working MCP server in a Docker container, connected to Claude Code. Here is how to make it portfolio-ready:

Add more tools: The scaffolder is a starting point. Add a tool that reads a project's dependency file and lists outdated packages. Add one that generates a Dockerfile for an existing project. Each tool is a function with a decorator – the pattern is the same every time.
Add tests: Write pytest tests that call your tool functions directly and verify the output. MCP tools are just Python functions. Test them like Python functions.
Push the Docker image: Tag it and push to Docker Hub or GitHub Container Registry. Then your claude mcp add command becomes claude mcp add scaffolder -- docker run -i --rm yourusername/mcp-scaffolder:latest and anyone can use it.
Write a README that explains the security model: What permissions does your server need? What file system access? What happens if inputs are malicious? Answering these questions in your README signals that you think about security, which is exactly what hiring managers are looking for right now.

Wrapping Up

We built a Python MCP server with FastMCP, containerized it with Docker, and connected it to Claude Code. The whole thing fits in about 100 lines of Python, a seven-instruction Dockerfile, and one claude mcp add command.

The MCP ecosystem is real and growing fast. The protocol has the backing of Anthropic, OpenAI, and Google. It's now governed by the Linux Foundation. But it's also young, and the security story is still being written. Build with it, but build with your eyes open.

If you want to go deeper, here are the resources I found most useful:

MCP specification: the actual protocol docs
Claude Code MCP documentation: how Claude Code implements MCP
FastMCP GitHub: the Python framework we used
AuthZed's timeline of MCP security incidents: required reading if you are building MCP servers for production
Simon Willison on MCP prompt injection: the clearest explanation of why this is hard to solve

The complete source code for this tutorial is on GitHub.

How to Build MCP Servers for Your Internal Data

Mayur Vekariya — Wed, 04 Mar 2026 15:04:09 +0000

The Model Context Protocol (MCP) is changing how AI applications connect to external tools and data. While some tutorials stop at "connect to GitHub" or "read a file," the real power of MCP is unlocking your internal data—databases, internal APIs, knowledge bases, and proprietary systems—for AI assistants in a structured, secure way.

In this guide, I'll walk you through building production-grade MCP servers that expose your organization's internal data to AI models. We'll go beyond simple examples and cover authentication, multi-tenancy, streaming, and deployment patterns you'll actually need.

Prerequisites
What is MCP and Why Does It Matter for Internal Data?
Architecture Overview
Setting Up the Project
Building the MCP Server
Adding Authentication
- Bearer Token Authentication
- OAuth 2.0 for MCP
Scoping Data Access Per User
Connecting to Internal APIs
Building a RAG Tool for Internal Documents
Production Deployment
Connecting Your MCP Server to AI Clients
- Claude Desktop
- Custom Application (using the MCP Client SDK)
Common Pitfalls
Wrapping Up

Prerequisites

This is an advanced guide. You should be comfortable with:

TypeScript / Node.js
REST APIs and server-side development
Basic understanding of LLMs and tool calling
Familiarity with protocols like JSON-RPC

What is MCP, and Why Does It Matter for Internal Data?

MCP is an open protocol (created by Anthropic) that standardizes how AI assistants discover and invoke external tools. Think of it as a USB-C port for AI — one standard interface that lets any AI model connect to any data source.

Before MCP, connecting an AI assistant to your internal database meant:

Writing custom tool definitions for each LLM provider
Hardcoding data access logic into your AI application
Rebuilding everything when you switched models or added new data sources

MCP separates the data layer from the AI layer. Your MCP server exposes tools and resources. Any MCP-compatible client—Claude, ChatGPT, your custom app—can use them without modification.

For internal data, this is significant because:

Your CRM, ERP, ticketing system, and wiki all become AI-accessible through one protocol
Access control stays in your MCP server, not scattered across AI application code
New AI models or clients automatically get access without rewiring integrations
Tool definitions live close to the data, making them easier to maintain and version

Architecture Overview

Here's what we're building:

The MCP server sits between your AI client and your internal systems. It handles:

Tool discovery: Tells the AI what operations are available
Parameter validation: Ensures the AI sends correct inputs
Data access: Queries your internal systems
Response formatting: Returns structured data the AI can reason about
Authentication: Verifies who's making the request

Setting Up the Project

Let's build an MCP server that exposes an internal employee directory and project management system.

mkdir internal-data-mcp && cd internal-data-mcp
npm init -y
npm install @modelcontextprotocol/sdk zod express pg
npm install -D typescript @types/node @types/express @types/pg tsx

These commands scaffold the project. npm install pulls in the runtime dependencies: the official MCP SDK, Zod for schema validation, Express for the HTTP server, and pg for PostgreSQL. The -D flag installs TypeScript and its type definitions as dev-only dependencies — they're needed to compile the code but don't ship to production. tsx lets you run TypeScript directly during development without a separate compile step.

Now, create your tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "Node16",
    "moduleResolution": "Node16",
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "declaration": true
  },
  "include": ["src/**/*"]
}

This TypeScript config targets ES2022, which supports modern JavaScript features like top-level await. "module": "Node16" and "moduleResolution": "Node16" are required when using the MCP SDK's .js import extensions. "strict": true enables all of TypeScript's strictness checks, which helps catch bugs in tool handlers before they reach production. The outDir/rootDir pair tells the compiler to take source files from src/ and emit compiled JavaScript into dist/.

Building the MCP Server

Step 1: Server Skeleton

Create src/server.ts:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer(
  { name: "internal-data", version: "1.0.0" },
  { capabilities: { tools: {}, resources: {} } }
);

The McpServer class from the official SDK handles the JSON-RPC protocol, transport negotiation, and lifecycle management. We declare support for both tools (actions the AI can take) and resources (data the AI can read).

Step 2: Connecting to Internal Data

Let's say you have a PostgreSQL database with employee and project data. Create a data access layer:

// src/db.ts
import pg from "pg";

const pool = new pg.Pool({
  connectionString: process.env.INTERNAL_DB_URL,
  max: 10,
  idleTimeoutMillis: 30000,
});

export interface Employee {
  id: string;
  name: string;
  email: string;
  department: string;
  role: string;
  manager_id: string | null;
  start_date: string;
}

export interface Project {
  id: string;
  name: string;
  status: "active" | "completed" | "on_hold";
  lead_id: string;
  department: string;
  deadline: string | null;
}

export async function searchEmployees(
  query: string,
  department?: string
): Promise<Employee[]> {
  const conditions = ["(name ILIKE $1 OR email ILIKE $1 OR role ILIKE $1)"];
  const params: string[] = [`%${query}%`];

  if (department) {
    conditions.push(`department = $${params.length + 1}`);
    params.push(department);
  }

  const result = await pool.query(
    `SELECT id, name, email, department, role, manager_id, start_date
     FROM employees
     WHERE ${conditions.join(" AND ")}
     ORDER BY name
     LIMIT 25`,
    params
  );

  return result.rows;
}

export async function getProjectsByStatus(
  status: string
): Promise<Project[]> {
  const result = await pool.query(
    `SELECT id, name, status, lead_id, department, deadline
     FROM projects
     WHERE status = $1
     ORDER BY deadline ASC NULLS LAST`,
    [status]
  );

  return result.rows;
}

export async function getProjectMembers(
  projectId: string
): Promise<Employee[]> {
  const result = await pool.query(
    `SELECT e.id, e.name, e.email, e.department, e.role,
            e.manager_id, e.start_date
     FROM employees e
     JOIN project_members pm ON pm.employee_id = e.id
     WHERE pm.project_id = $1
     ORDER BY e.name`,
    [projectId]
  );

  return result.rows;
}

Notice this is plain SQL with parameterized queries. Your MCP server's data access layer should use whatever your team already uses — Prisma, Drizzle, Knex, raw SQL. MCP doesn't dictate your data access patterns.
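
One gap in the search query above: a % or _ typed by the user is treated as a wildcard by ILIKE. Here is a minimal sketch of one way to handle that, assuming PostgreSQL's default backslash escape character. The names escapeLike and buildEmployeeSearch are hypothetical and not part of the data layer shown earlier. Building the query as a pure function also lets you check the placeholder numbering without a database.

```typescript
// Hypothetical helper: escape LIKE/ILIKE metacharacters so user input
// is matched literally (assumes the default backslash escape character).
function escapeLike(term: string): string {
  return term.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

interface BuiltQuery {
  text: string;
  values: string[];
}

// Builds the same WHERE clause as searchEmployees, but returns the SQL
// text and parameter list instead of running the query.
function buildEmployeeSearch(query: string, department?: string): BuiltQuery {
  const values: string[] = ["%" + escapeLike(query) + "%"];
  const conditions = ["(name ILIKE $1 OR email ILIKE $1 OR role ILIKE $1)"];

  if (department) {
    values.push(department);
    // The placeholder number is the position of the value just pushed
    conditions.push("department = $" + values.length);
  }

  return {
    text:
      "SELECT id, name, email, department, role, manager_id, start_date " +
      "FROM employees WHERE " +
      conditions.join(" AND ") +
      " ORDER BY name LIMIT 25",
    values,
  };
}
```

The result can be passed to pool.query(built.text, built.values) unchanged.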

Step 3: Defining Tools

Now expose this data through MCP tools. This is where the design matters most. Good tool definitions directly impact how well the AI uses your data.

// src/tools.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import {
  searchEmployees,
  getProjectsByStatus,
  getProjectMembers,
} from "./db.js";

export function registerTools(server: McpServer) {
  // Tool 1: Search the employee directory
  server.tool(
    "search_employees",
    `Search the internal employee directory by name, email, or role.
     Returns matching employees with their department and reporting structure.
     Use this when the user asks about people, teams, or org structure.`,
    {
      query: z
        .string()
        .describe("Search term: employee name, email, or role title"),
      department: z
        .string()
        .optional()
        .describe(
          "Filter by department name (e.g., 'Engineering', 'Marketing')"
        ),
    },
    async ({ query, department }) => {
      const employees = await searchEmployees(query, department);

      if (employees.length === 0) {
        return {
          content: [
            {
              type: "text",
              text: `No employees found matching "${query}"${department ? ` in ${department}` : ""}.`,
            },
          ],
        };
      }

      const formatted = employees
        .map(
          (e) =>
            `- **${e.name}** (${e.email})\n  Role: ${e.role} | Dept: ${e.department} | Since: ${e.start_date}`
        )
        .join("\n");

      return {
        content: [
          {
            type: "text",
            text: `Found ${employees.length} employee(s):\n\n${formatted}`,
          },
        ],
      };
    }
  );

  // Tool 2: List projects by status
  server.tool(
    "list_projects",
    `List internal projects filtered by status.
     Returns project name, lead, department, and deadline.
     Use this when the user asks about ongoing work, project status, or deadlines.`,
    {
      status: z
        .enum(["active", "completed", "on_hold"])
        .describe("Project status to filter by"),
    },
    async ({ status }) => {
      const projects = await getProjectsByStatus(status);

      if (projects.length === 0) {
        return {
          content: [
            {
              type: "text",
              text: `No ${status} projects found.`,
            },
          ],
        };
      }

      const formatted = projects
        .map(
          (p) =>
            `- **${p.name}** [${p.status}]\n  Lead: ${p.lead_id} | Dept: ${p.department} | Deadline: ${p.deadline ?? "None"}`
        )
        .join("\n");

      return {
        content: [
          {
            type: "text",
            text: `${projects.length} ${status} project(s):\n\n${formatted}`,
          },
        ],
      };
    }
  );

  // Tool 3: Get team members for a project
  server.tool(
    "get_project_team",
    `Get all team members assigned to a specific project.
     Returns employee details for each member.
     Use this when the user asks who is working on a project.`,
    {
      project_id: z
        .string()
        .uuid()
        .describe("The UUID of the project to look up"),
    },
    async ({ project_id }) => {
      const members = await getProjectMembers(project_id);

      if (members.length === 0) {
        return {
          content: [
            {
              type: "text",
              text: "No team members found for this project.",
            },
          ],
        };
      }

      const formatted = members
        .map((m) => `- ${m.name} (${m.role}, ${m.department})`)
        .join("\n");

      return {
        content: [
          {
            type: "text",
            text: `Project team (${members.length} members):\n\n${formatted}`,
          },
        ],
      };
    }
  );
}

server.tool() registers each tool with four arguments: the tool name, a plain-English description the AI reads to decide when to call it, a Zod schema defining the parameters, and the async handler that runs when the tool is invoked. The handler receives validated, typed parameters — Zod rejects malformed inputs before your handler ever runs. Each handler returns a content array; the type: "text" block is the most common format and tells the AI client to treat the response as readable text. Returning an empty result (zero matches) is handled explicitly so the AI gets a useful message rather than an empty array it might misinterpret.

Tool Design Principles

Three things make the difference between tools an AI uses well and tools it struggles with:

1. Descriptive names and descriptions. The AI decides which tool to call based entirely on the description. Be specific about when to use the tool, not just what it does. Compare:

// Vague — the AI won't know when to pick this
"Search employees"

// Specific — the AI knows exactly when this tool is relevant
"Search the internal employee directory by name, email, or role.
 Use this when the user asks about people, teams, or org structure."

2. Typed parameters with descriptions. Use Zod's .describe() on every parameter. The AI needs to understand what each field expects:

// The AI has to guess what format "query" expects
{ query: z.string() }

// The AI knows exactly what to pass
{ query: z.string().describe("Search term: employee name, email, or role title") }

3. Structured return values. Return data in a format the AI can reason about. Use markdown tables or structured lists rather than raw JSON dumps. The AI processes structured text better than deeply nested objects.
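
For the third principle, a small formatter is often enough. This is a sketch rather than anything the SDK provides, and toMarkdownTable is a hypothetical helper name. It renders flat rows as a markdown table and neutralizes characters that would break the layout.

```typescript
// Render rows as a markdown table. Pipes and newlines inside cells are
// neutralized so a stray value cannot break the table layout.
function toMarkdownTable(
  rows: Array<Record<string, unknown>>,
  columns: string[]
): string {
  const cell = (value: unknown): string =>
    String(value ?? "")
      .replace(/\|/g, "\\|")
      .replace(/\r?\n/g, " ");

  const header = "| " + columns.join(" | ") + " |";
  const divider = "| " + columns.map(() => "---").join(" | ") + " |";
  const body = rows.map(
    (row) => "| " + columns.map((c) => cell(row[c])).join(" | ") + " |"
  );

  return [header, divider, ...body].join("\n");
}
```

A handler could return toMarkdownTable(projects, ["name", "status", "deadline"]) as the text of its content block instead of assembling bullet strings by hand.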

Step 4: Exposing Resources

Resources are read-only data the AI can pull into its context. Unlike tools (which the AI invokes during reasoning), resources are typically loaded upfront to provide background knowledge.

// src/resources.ts
import {
  McpServer,
  ResourceTemplate,
} from "@modelcontextprotocol/sdk/server/mcp.js";
// generateOrgOverview and getDepartmentDetails are your own helpers that
// return markdown strings; import them from wherever they live.

export function registerResources(server: McpServer) {
  // Static resource: org chart overview
  server.resource(
    "org-structure",
    "internal://org-structure",
    {
      description:
        "Overview of the organization structure including departments and leadership",
      mimeType: "text/markdown",
    },
    async (uri) => ({
      contents: [
        {
          uri: uri.href,
          mimeType: "text/markdown",
          text: await generateOrgOverview(),
        },
      ],
    })
  );

  // Dynamic resource template: department details
  server.resource(
    "department-info",
    new ResourceTemplate("internal://departments/{name}", {
      list: undefined,
    }),
    {
      description: "Detailed information about a specific department",
      mimeType: "text/markdown",
    },
    async (uri, variables) => ({
      contents: [
        {
          uri: uri.href,
          mimeType: "text/markdown",
          text: await getDepartmentDetails(
            variables.name as string
          ),
        },
      ],
    })
  );
}

server.resource() registers two kinds of resources here. The first uses a fixed URI (internal://org-structure) — this is a static resource the AI can request by name. The second uses a ResourceTemplate, which defines a URI pattern with a {name} placeholder; the AI can request internal://departments/Engineering and the variables.name parameter will be populated with "Engineering" at runtime. Both resources return a contents array with mimeType: "text/markdown" — this tells the client how to render the response. Resources differ from tools in that they're meant to be read as background context, not invoked as actions.

Resources are useful for data that provides context rather than answering a specific question — company policies, API documentation, database schemas, configuration references.
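
The resource handlers above call generateOrgOverview() without showing it. One possible shape for the rendering half of that helper is sketched below, assuming you have already fetched employee rows. The renderOrgOverview name and the EmployeeRow type are hypothetical. It groups rows by department and emits markdown, matching the text/markdown MIME type the resource declares.

```typescript
interface EmployeeRow {
  name: string;
  department: string;
  role: string;
}

// Group employees by department and render one markdown section per
// department, with departments and members sorted alphabetically.
function renderOrgOverview(rows: EmployeeRow[]): string {
  const byDept = new Map<string, EmployeeRow[]>();
  for (const row of rows) {
    const list = byDept.get(row.department) ?? [];
    list.push(row);
    byDept.set(row.department, list);
  }

  const sections = Array.from(byDept.keys())
    .sort()
    .map((dept) => {
      const members = byDept.get(dept)!;
      const lines = members
        .slice()
        .sort((a, b) => a.name.localeCompare(b.name))
        .map((m) => `- ${m.name} (${m.role})`);
      return `## ${dept} (${members.length})\n${lines.join("\n")}`;
    });

  return `# Organization overview\n\n${sections.join("\n\n")}`;
}
```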

Step 5: Transport and Startup

MCP supports multiple transports. For internal data servers, you'll typically use one of two:

Streamable HTTP — the recommended transport for remote servers (replaces the older SSE transport):

// src/index.ts
import express from "express";
import { randomUUID } from "node:crypto";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { registerTools } from "./tools.js";
import { registerResources } from "./resources.js";

const app = express();
app.use(express.json());

// Build a fresh server per session. An McpServer instance talks to one
// transport at a time, so sharing one across sessions would cross their wires.
function createServer(): McpServer {
  const server = new McpServer(
    { name: "internal-data", version: "1.0.0" },
    { capabilities: { tools: {}, resources: {} } }
  );

  registerTools(server);
  registerResources(server);
  return server;
}

// Store transports by session ID
const transports = new Map<string, StreamableHTTPServerTransport>();

// Handle all MCP requests on a single endpoint
app.all("/mcp", async (req, res) => {
  // Check for existing session
  const sessionId = req.headers["mcp-session-id"] as string | undefined;

  if (sessionId && transports.has(sessionId)) {
    // Existing session: route to its transport.
    // express.json() already consumed the body, so pass the parsed body along.
    const transport = transports.get(sessionId)!;
    await transport.handleRequest(req, res, req.body);
    return;
  }

  if (sessionId && !transports.has(sessionId)) {
    // Unknown session ID
    res.status(404).json({ error: "Session not found" });
    return;
  }

  // New session: create transport and connect
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: () => randomUUID(),
    onsessioninitialized: (id) => {
      transports.set(id, transport);
    },
  });

  transport.onclose = () => {
    if (transport.sessionId) {
      transports.delete(transport.sessionId);
    }
  };

  await createServer().connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3100, () => {
  console.log("MCP server running on http://localhost:3100/mcp");
});

This sets up a single /mcp endpoint that handles all MCP communication. When a new client connects (no mcp-session-id header), a StreamableHTTPServerTransport is created and stored in the transports Map keyed by a generated UUID. On subsequent requests, the session ID from the header is used to look up the existing transport and route the request to it — this is how the server maintains stateful sessions with multiple clients simultaneously. transport.onclose cleans up the Map entry when a session ends, preventing memory leaks. The StdioServerTransport alternative (shown below) skips all of this: it reads from stdin and writes to stdout, which is how Claude Desktop spawns local servers as child processes.
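
The onclose cleanup only runs when a session ends cleanly. Clients that simply disappear leave their entry in the Map. One way to cover that case is sketched below, assuming you are willing to wrap the Map. SessionRegistry is a hypothetical class, not part of the SDK, and the clock is injected so the eviction logic can be checked without waiting.

```typescript
interface Closable {
  close(): void;
}

// Hypothetical wrapper around the transports Map that also evicts
// sessions which have been idle longer than ttlMs.
class SessionRegistry<T extends Closable> {
  private entries = new Map<string, { value: T; lastSeen: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(id: string, value: T): void {
    this.entries.set(id, { value, lastSeen: this.now() });
  }

  // Looking a session up counts as activity and refreshes its timestamp.
  get(id: string): T | undefined {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    entry.lastSeen = this.now();
    return entry.value;
  }

  delete(id: string): void {
    this.entries.delete(id);
  }

  // Close and drop every idle session; returns how many were evicted.
  sweep(): number {
    const cutoff = this.now() - this.ttlMs;
    let evicted = 0;
    for (const [id, entry] of Array.from(this.entries)) {
      if (entry.lastSeen < cutoff) {
        this.entries.delete(id);
        entry.value.close();
        evicted++;
      }
    }
    return evicted;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In the server you would call sweep() on a timer, for example setInterval(() => registry.sweep(), 60_000). The TTL to choose depends on how long your clients stay idle between requests.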

Stdio — for local development or when the MCP client spawns the server as a child process:

// src/stdio.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { registerTools } from "./tools.js";
import { registerResources } from "./resources.js";

const server = new McpServer(
  { name: "internal-data", version: "1.0.0" },
  { capabilities: { tools: {}, resources: {} } }
);

registerTools(server);
registerResources(server);

const transport = new StdioServerTransport();
await server.connect(transport);

For internal data in a production setting, Streamable HTTP is almost always what you want. Stdio is convenient for development and when the client and server run on the same machine.

Adding Authentication

Internal data servers need authentication. You don't want every AI client on the network querying your employee database unauthenticated.

Bearer Token Authentication

The simplest approach is to validate a token on every request:

// src/auth-middleware.ts
import { Request, Response, NextFunction } from "express";

interface AuthenticatedRequest extends Request {
  userId?: string;
  orgId?: string;
}

export function authMiddleware(
  req: AuthenticatedRequest,
  res: Response,
  next: NextFunction
) {
  const authHeader = req.headers.authorization;

  if (!authHeader?.startsWith("Bearer ")) {
    return res.status(401).json({ error: "Missing authorization header" });
  }

  const token = authHeader.slice(7);

  try {
    // Validate against your internal auth system
    const claims = validateInternalToken(token);
    req.userId = claims.sub;
    req.orgId = claims.org;
    next();
  } catch {
    // An invalid or expired token is still an authentication failure
    return res.status(401).json({ error: "Invalid token" });
  }
}

function validateInternalToken(token: string) {
  // Replace with your actual token validation:
  // - JWT verification against your auth service
  // - API key lookup in your database
  // - Session token validation against Redis
  // This is a placeholder
  return { sub: "user-123", org: "org-456" };
}

The middleware checks every request for an Authorization: Bearer header before it reaches the MCP handler. validateInternalToken is a placeholder that accepts any token, so replace it before deploying with your real validation logic: JWT verification using a library like jsonwebtoken, an API key lookup in your database, or a session token check against Redis. The validated claims are attached to the request object (req.userId, req.orgId) so downstream code can use them for access scoping. Registering the middleware with app.use("/mcp", authMiddleware) before the /mcp route handler ensures no request reaches the MCP endpoint without passing this check first.

Add it to your Express app:

app.use("/mcp", authMiddleware);
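
The tool handlers later in this guide call getUserContext(), but the claims live on the Express request. One way to bridge that gap is Node's AsyncLocalStorage, sketched below. This is an assumption about how you might wire it, not something the MCP SDK prescribes, and runWithUser and UserContext are hypothetical names.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface UserContext {
  userId: string;
  orgId: string;
  role: string;
}

// One store for the process; each request runs inside its own context.
const userContextStore = new AsyncLocalStorage<UserContext>();

// Run fn (and any async work it starts) with ctx as the current user.
function runWithUser<T>(ctx: UserContext, fn: () => T): T {
  return userContextStore.run(ctx, fn);
}

// Read the current user, failing loudly if no request context is active
// so a tool can never run with a missing identity.
function getUserContext(): UserContext {
  const ctx = userContextStore.getStore();
  if (!ctx) {
    throw new Error("No authenticated user in scope");
  }
  return ctx;
}
```

In the middleware, calling runWithUser({ userId, orgId, role }, () => next()) instead of a bare next() should make the context visible to the code that handles that request.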

OAuth 2.0 for MCP

For clients that support MCP's built-in OAuth flow (like Claude Desktop), you can implement the full OAuth handshake. The MCP SDK provides the OAuthServerProvider interface with these required methods:

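import type { Response } from "express";
import type {
  AuthorizationParams,
  OAuthServerProvider,
} from "@modelcontextprotocol/sdk/server/auth/provider.js";
import type { OAuthRegisteredClientsStore } from "@modelcontextprotocol/sdk/server/auth/clients.js";
import type { AuthInfo } from "@modelcontextprotocol/sdk/server/auth/types.js";
import type {
  OAuthClientInformationFull,
  OAuthTokens,
} from "@modelcontextprotocol/sdk/shared/auth.js";
// db below stands for your own persistence layer for OAuth clients and sessions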

class InternalOAuthProvider implements OAuthServerProvider {
  // Store for registered OAuth clients
  get clientsStore(): OAuthRegisteredClientsStore {
    return this._clientsStore;
  }

  private _clientsStore: OAuthRegisteredClientsStore = {
    async getClient(clientId: string) {
      // Look up the registered client in your database
      return db.getOAuthClient(clientId);
    },
    async registerClient(clientMetadata) {
      // Register a new dynamic client
      return db.createOAuthClient(clientMetadata);
    },
  };

  // Redirect the user to your internal SSO for authorization
  async authorize(
    client: OAuthClientInformationFull,
    params: AuthorizationParams,
    res: Response
  ): Promise<void> {
    const authUrl = new URL(
      "https://sso.internal.company.com/authorize"
    );
    authUrl.searchParams.set("client_id", client.client_id);
    authUrl.searchParams.set("redirect_uri", params.redirectUri);
    authUrl.searchParams.set("state", params.state ?? "");
    authUrl.searchParams.set(
      "code_challenge",
      params.codeChallenge
    );
    // The method writes to the response directly
    res.redirect(authUrl.toString());
  }

  // Return the PKCE challenge for a given authorization code
  async challengeForAuthorizationCode(
    _client: OAuthClientInformationFull,
    authorizationCode: string
  ): Promise<string> {
    const session = await db.getSessionByCode(authorizationCode);
    return session.codeChallenge;
  }

  // Exchange authorization code for access + refresh tokens
  async exchangeAuthorizationCode(
    client: OAuthClientInformationFull,
    authorizationCode: string,
    _codeVerifier?: string,
    _redirectUri?: string
  ): Promise<OAuthTokens> {
    const response = await fetch(
      "https://sso.internal.company.com/token",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded",
        },
        body: new URLSearchParams({
          grant_type: "authorization_code",
          code: authorizationCode,
          client_id: client.client_id,
        }),
      }
    );

    return response.json() as Promise<OAuthTokens>;
  }

  // Refresh expired tokens
  async exchangeRefreshToken(
    client: OAuthClientInformationFull,
    refreshToken: string
  ): Promise<OAuthTokens> {
    const response = await fetch(
      "https://sso.internal.company.com/token",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded",
        },
        body: new URLSearchParams({
          grant_type: "refresh_token",
          refresh_token: refreshToken,
          client_id: client.client_id,
        }),
      }
    );

    return response.json() as Promise<OAuthTokens>;
  }

  // Validate an access token on every request
  async verifyAccessToken(token: string): Promise<AuthInfo> {
    const response = await fetch(
      "https://sso.internal.company.com/introspect",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded",
        },
        body: new URLSearchParams({ token }),
      }
    );

    const data = await response.json();
    if (!data.active) throw new Error("Token inactive");

    return {
      token,
      clientId: data.client_id,
      scopes: data.scope?.split(" ") ?? [],
      expiresAt: data.exp,
    };
  }
}

InternalOAuthProvider implements the OAuthServerProvider interface, which the MCP SDK calls at each stage of the OAuth flow. clientsStore handles dynamic client registration — MCP clients like Claude Desktop register themselves the first time they connect. authorize() redirects the user to your internal SSO; it writes directly to the Express response. challengeForAuthorizationCode() returns the PKCE code challenge stored when the authorization session began — this is how the token exchange is verified without transmitting secrets. exchangeAuthorizationCode() and exchangeRefreshToken() make server-to-server calls to your SSO's token endpoint, keeping credentials out of the browser. verifyAccessToken() is called on every incoming MCP request using the token introspection endpoint to confirm the token is still active and extract the user's scopes.
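
To make the PKCE step concrete, here is the S256 check from RFC 7636 as a standalone sketch. It illustrates the comparison between the stored challenge and the verifier the client sends at token exchange. It is not the SDK's internal code, and computeS256Challenge and verifyPkce are hypothetical names.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// S256 transform from RFC 7636: BASE64URL(SHA256(code_verifier)), unpadded.
function computeS256Challenge(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier, "ascii").digest("base64url");
}

// Compare in constant time so the check does not reveal how much of
// the challenge matched.
function verifyPkce(codeVerifier: string, storedChallenge: string): boolean {
  const expected = Buffer.from(computeS256Challenge(codeVerifier));
  const actual = Buffer.from(storedChallenge);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Only the hash travels with the authorization request. The verifier itself is sent later over the back channel, which is why an intercepted authorization code is not enough to obtain tokens.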

Scoping Data Access Per User

This is the most important part of an internal data MCP server: the AI should only access data the requesting user is authorized to see.

Don't skip this. Without user-scoped access, you're building a data exfiltration tool with an AI wrapper.

// src/scoped-tools.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { searchEmployees } from "./db.js";
// getUserDepartment and formatEmployeeList are helpers you supply: one looks
// up the caller's department, the other renders the rows as text.

export function registerScopedTools(
  server: McpServer,
  getUserContext: () => { userId: string; orgId: string; role: string }
) {
  server.tool(
    "search_employees",
    "Search the employee directory. Results are filtered based on your access level.",
    {
      query: z.string().describe("Name, email, or role to search for"),
    },
    async ({ query }) => {
      const ctx = getUserContext();

      // Only admins and HR see every department and every field
      const seesEverything = ["admin", "hr"].includes(ctx.role);

      // Enforce access boundaries. Everyone else, including any role this
      // code does not recognize, is limited to their own department.
      const departmentFilter = seesEverything
        ? undefined
        : await getUserDepartment(ctx.userId);

      const employees = await searchEmployees(query, departmentFilter);

      // Redact sensitive fields based on role
      const results = employees.map((e) => ({
        name: e.name,
        email: e.email,
        department: e.department,
        role: e.role,
        // Only HR and admins see start date and manager info
        ...(seesEverything
          ? { start_date: e.start_date, manager_id: e.manager_id }
          : {}),
      }));

      return {
        content: [
          {
            type: "text",
            text: formatEmployeeList(results),
          },
        ],
      };
    }
  );
}

The pattern here:

Extract user context from the authenticated session
Filter queries at the database level (not after fetching everything)
Redact fields the user shouldn't see
Log access for audit trails
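
The redaction step in that pattern is easier to test if it lives in its own function. Here is a minimal sketch, assuming the same four public fields and the same privileged roles as the listing above. The redactEmployee name and the EmployeeRecord type are hypothetical. Unknown roles get the restricted view, so a new role added elsewhere cannot widen access by accident.

```typescript
interface EmployeeRecord {
  name: string;
  email: string;
  department: string;
  role: string;
  manager_id: string | null;
  start_date: string;
}

// Fields every authenticated user may see
const PUBLIC_FIELDS = ["name", "email", "department", "role"] as const;

// Roles allowed to see the full record
const PRIVILEGED_ROLES = new Set(["admin", "hr"]);

// Return a copy of the record containing only what viewerRole may see.
// Any role that is not explicitly privileged gets the public fields only.
function redactEmployee(
  employee: EmployeeRecord,
  viewerRole: string
): Partial<EmployeeRecord> {
  if (PRIVILEGED_ROLES.has(viewerRole)) {
    return { ...employee };
  }

  const visible: Partial<EmployeeRecord> = {};
  for (const field of PUBLIC_FIELDS) {
    visible[field] = employee[field];
  }
  return visible;
}
```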

Connecting to Internal APIs

Not all internal data lives in databases. You often need to wrap existing internal APIs:

server.tool(
  "get_ticket_details",
  `Look up a support ticket from the internal ticketing system.
   Returns ticket status, assignee, priority, and recent updates.`,
  {
    ticket_id: z
      .string()
      .regex(/^TK-\d+$/)
      .describe("Ticket ID in format TK-12345"),
  },
  async ({ ticket_id }) => {
    const ctx = getUserContext();

    const response = await fetch(
      `${process.env.TICKETING_API_URL}/api/v2/tickets/${ticket_id}`,
      {
        headers: {
          Authorization: `Bearer ${process.env.TICKETING_SERVICE_TOKEN}`,
          "X-On-Behalf-Of": ctx.userId,
        },
      }
    );

    if (response.status === 404) {
      return {
        content: [
          { type: "text", text: `Ticket ${ticket_id} not found.` },
        ],
      };
    }

    if (response.status === 403) {
      return {
        content: [
          {
            type: "text",
            text: `You don't have access to ticket ${ticket_id}.`,
          },
        ],
      };
    }

    if (!response.ok) {
      // Any other failure: report it without exposing upstream details
      return {
        content: [
          {
            type: "text",
            text: `Could not load ticket ${ticket_id}. Try again later.`,
          },
        ],
        isError: true,
      };
    }

    const ticket = await response.json();

    return {
      content: [
        {
          type: "text",
          text: [
            `**${ticket.id}: ${ticket.title}**`,
            `Status: ${ticket.status} | Priority: ${ticket.priority}`,
            `Assignee: ${ticket.assignee?.name ?? "Unassigned"}`,
            `Created: ${ticket.created_at}`,
            "",
            `**Latest Update:**`,
            ticket.updates?.[0]?.body ?? "No updates yet.",
          ].join("\n"),
        },
      ],
    };
  }
);

Key points when wrapping internal APIs:

Use service tokens for server-to-server auth, but pass user identity via headers like X-On-Behalf-Of
Handle HTTP errors explicitly — return user-friendly messages, not raw error objects
Validate input formats — the regex on ticket_id prevents injection and guides the AI on expected format
Don't leak internal implementation details in error messages
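
The error-handling points above can be collected into one small mapping so every wrapped API reports failures the same way. This is a sketch under the assumption that the upstream API uses conventional status codes, and describeUpstreamFailure is a hypothetical helper name.

```typescript
// Map an upstream HTTP status to a message that is safe to show the user.
// Returns null when the response succeeded and the caller should go on
// to parse the body.
function describeUpstreamFailure(
  status: number,
  ticketId: string
): string | null {
  if (status >= 200 && status < 300) return null;
  if (status === 404) return `Ticket ${ticketId} not found.`;
  if (status === 403) return `You don't have access to ticket ${ticketId}.`;
  if (status === 429) {
    return "The ticketing system is rate limiting requests. Try again shortly.";
  }
  // Everything else (including 401 for a bad service token and 5xx) gets a
  // generic message; the real status belongs in server logs, not in the reply.
  return "The ticketing system is unavailable right now. Try again later.";
}
```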

Building a RAG Tool for Internal Documents

One of the highest-value use cases: letting the AI search your internal knowledge base. Here's a tool that performs vector search against an internal document store:

server.tool(
  "search_internal_docs",
  `Search the internal knowledge base for relevant documents.
   Covers engineering docs, runbooks, architecture decisions, and policies.
   Use this when the user asks about internal processes, systems, or decisions.`,
  {
    query: z
      .string()
      .describe("Natural language search query"),
    category: z
      .enum(["engineering", "policy", "runbook", "architecture", "all"])
      .default("all")
      .describe("Document category to search within"),
    limit: z
      .number()
      .min(1)
      .max(10)
      .default(5)
      .describe("Maximum number of results"),
  },
  async ({ query, category, limit }) => {
    // Generate embedding for the search query
    const embedding = await generateEmbedding(query);

    // Vector similarity search against your document store
    const results = await pool.query(
      `SELECT
         d.id,
         d.title,
         d.category,
         d.content_chunk,
         d.source_url,
         d.updated_at,
         1 - (d.embedding <=> $1::vector) AS similarity
       FROM document_chunks d
       WHERE ($2 = 'all' OR d.category = $2)
         AND 1 - (d.embedding <=> $1::vector) > 0.7
       ORDER BY d.embedding <=> $1::vector
       LIMIT $3`,
      [JSON.stringify(embedding), category, limit]
    );

    if (results.rows.length === 0) {
      return {
        content: [
          {
            type: "text",
            text: `No relevant documents found for "${query}".`,
          },
        ],
      };
    }

    const formatted = results.rows
      .map(
        (doc, i) =>
          `### ${i + 1}. ${doc.title}\n` +
          `Category: ${doc.category} | Updated: ${doc.updated_at} | Relevance: ${(doc.similarity * 100).toFixed(0)}%\n\n` +
          `${doc.content_chunk}\n\n` +
          `Source: ${doc.source_url}`
      )
      .join("\n\n---\n\n");

    return {
      content: [
        {
          type: "text",
          text: `Found ${results.rows.length} relevant document(s):\n\n${formatted}`,
        },
      ],
    };
  }
);

This tool combines two operations: embedding generation and vector similarity search. generateEmbedding(query) calls an embedding model (such as OpenAI's text-embedding-3-small or a self-hosted model) to convert the user's query into a numeric vector. The SQL query then uses pgvector's <=> operator to compute cosine distance between the query vector and stored document chunk embeddings — lower distance means higher similarity. The 1 - (embedding <=> $1) > 0.7 condition filters out results below 70% similarity, so the AI doesn't receive loosely related noise. Results are ordered by ascending distance (most similar first) and capped by the limit parameter. The formatted output includes a relevance percentage so the AI can communicate confidence levels to the user.
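
To see what the SQL is computing, here is the same ranking logic in plain TypeScript over in-memory vectors. It is only an illustration of cosine similarity, the 0.7 threshold, and the ordering. In production pgvector does this inside the database, and rankChunks is a hypothetical name.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). 1 means same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's <=> operator returns cosine distance, which is 1 - similarity.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

interface Chunk {
  id: string;
  embedding: number[];
}

// Keep chunks above the similarity threshold, most similar first.
function rankChunks(
  queryVector: number[],
  chunks: Chunk[],
  minSimilarity = 0.7,
  limit = 5
): Array<{ id: string; similarity: number }> {
  return chunks
    .map((c) => ({
      id: c.id,
      similarity: cosineSimilarity(queryVector, c.embedding),
    }))
    .filter((c) => c.similarity > minSimilarity)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

Note that vector length does not matter here: [2, 0] and [1, 0] point the same way and score 1.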

Production Deployment

Dockerizing the MCP Server

FROM node:22-slim AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages are copied forward
RUN npm prune --omit=dev

FROM node:22-slim AS runtime

WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

ENV NODE_ENV=production
EXPOSE 3100

# node:22-slim does not ship curl, so probe with Node's built-in fetch
HEALTHCHECK --interval=30s --timeout=5s \
  CMD node -e "fetch('http://localhost:3100/health').then((r) => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"

CMD ["node", "dist/index.js"]

The Dockerfile uses a two-stage build. The builder stage installs all dependencies (including devDependencies), compiles TypeScript to JavaScript in dist/, and then prunes devDependencies with npm prune --omit=dev. The runtime stage starts fresh from a clean Node image and copies only the compiled output and the pruned node_modules, so build-time packages like TypeScript stay out of the final image. The HEALTHCHECK instruction tells Docker to poll /health every 30 seconds and mark the container unhealthy if the probe fails. It uses Node's built-in fetch because the slim image does not include curl. Kubernetes ignores the Dockerfile HEALTHCHECK, so there you would point a liveness or readiness probe at the same /health endpoint to get automatic restarts or removal from the load balancer rotation.

Health Checks and Monitoring

Add a health endpoint that verifies your dependencies:

app.get("/health", async (_req, res) => {
  const checks = {
    database: false,
    ticketingApi: false,
  };

  try {
    await pool.query("SELECT 1");
    checks.database = true;
  } catch {}

  try {
    const resp = await fetch(
      `${process.env.TICKETING_API_URL}/health`
    );
    checks.ticketingApi = resp.ok;
  } catch {}

  const healthy = Object.values(checks).every(Boolean);
  res.status(healthy ? 200 : 503).json({
    status: healthy ? "healthy" : "degraded",
    checks,
    uptime: process.uptime(),
  });
});

The /health endpoint runs two dependency checks one after the other: a lightweight SELECT 1 query to confirm the database connection is live, then an HTTP ping to the ticketing API. Both results are collected into a checks object. If any check fails, the endpoint returns HTTP 503 (Service Unavailable), which is the signal load balancers and container orchestrators use to stop routing traffic to an unhealthy instance. process.uptime() is included as a diagnostic field so you can quickly tell whether a degraded instance just started or has been running for hours.
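
If the number of dependencies grows, you may prefer to run the checks concurrently and put a time limit on each one, so a hung dependency cannot stall the probe. This is a sketch of that variant. runChecks and its default timeout are assumptions, not part of the endpoint above.

```typescript
type Check = () => Promise<boolean>;

// Run every check concurrently. A check that throws, rejects, or takes
// longer than timeoutMs counts as failed instead of failing the probe.
async function runChecks(
  checks: Record<string, Check>,
  timeoutMs = 2000
): Promise<{ healthy: boolean; results: Record<string, boolean> }> {
  const names = Object.keys(checks);

  const settled = await Promise.all(
    names.map((name) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<boolean>((resolve) => {
        timer = setTimeout(() => resolve(false), timeoutMs);
      });
      const attempt = Promise.resolve()
        .then(() => checks[name]())
        .catch(() => false);
      // Whichever finishes first wins; always clear the timer afterwards
      return Promise.race([attempt, timeout]).finally(() =>
        clearTimeout(timer)
      );
    })
  );

  const results: Record<string, boolean> = {};
  names.forEach((name, i) => {
    results[name] = settled[i];
  });

  return { healthy: settled.every(Boolean), results };
}
```

The handler would then call runChecks with one entry per dependency and return 503 when healthy is false.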

Logging and Audit Trail

Every tool invocation against internal data should be logged:

function createAuditLogger() {
  return {
    logToolCall(params: {
      userId: string;
      tool: string;
      input: Record<string, unknown>;
      resultSize: number;
      durationMs: number;
    }) {
      // Ship to your logging infrastructure
      // (Datadog, ELK, CloudWatch, etc.)
      console.log(
        JSON.stringify({
          event: "mcp_tool_call",
          timestamp: new Date().toISOString(),
          ...params,
        })
      );
    },
  };
}

createAuditLogger returns a plain logger object rather than a class instance, which makes it easy to swap the underlying transport (stdout, a logging SDK, and so on) without changing the call sites. The audited wrapper shown next is a higher-order function: it takes a tool handler and returns a new function with the same signature, with timing and logging added around the original call. Its try/catch ensures a log entry is written even when the handler throws, because you want failed calls in your audit trail, not just successful ones. Shipping these logs to a centralized store (Datadog, CloudWatch, ELK) lets you answer questions like "what data did this user's AI session access last Tuesday?", which is often required for compliance in organizations handling sensitive internal data.

Wrap your tool handlers to automatically log every call:

function audited<T extends Record<string, unknown>, R>(
  handler: (params: T) => Promise<R>,
  toolName: string,
  audit: ReturnType<typeof createAuditLogger>
) {
  return async (params: T): Promise<R> => {
    const start = Date.now();
    const ctx = getUserContext();

    try {
      const result = await handler(params);
      audit.logToolCall({
        userId: ctx.userId,
        tool: toolName,
        input: params,
        resultSize: JSON.stringify(result).length,
        durationMs: Date.now() - start,
      });
      return result;
    } catch (error) {
      audit.logToolCall({
        userId: ctx.userId,
        tool: toolName,
        input: params,
        resultSize: 0,
        durationMs: Date.now() - start,
      });
      throw error;
    }
  };
}
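
One thing to watch with the wrapper: it logs the raw tool input, and audit logs tend to be readable by more people than the data they describe. Here is a hedged sketch of a sanitizing step you could apply to params before logging. sanitizeForAudit, the key pattern, and the length cap are all assumptions to adapt to your own fields.

```typescript
// Keys whose values should never reach the log store
const SENSITIVE_KEY = /token|secret|password|authorization/i;

// Long free-text values are truncated to keep log entries small
const MAX_VALUE_LENGTH = 200;

// Return a shallow copy of the input that is safe to write to the audit log.
function sanitizeForAudit(
  input: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    if (SENSITIVE_KEY.test(key)) {
      out[key] = "[redacted]";
    } else if (typeof value === "string" && value.length > MAX_VALUE_LENGTH) {
      out[key] = value.slice(0, MAX_VALUE_LENGTH) + "...";
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

Inside audited, that would mean passing input: sanitizeForAudit(params) to logToolCall in both branches.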

Connecting Your MCP Server to AI Clients

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "internal-data": {
      "url": "http://localhost:3100/mcp",
      "headers": {
        "Authorization": "Bearer your-internal-token"
      }
    }
  }
}

Custom Application (using the MCP Client SDK)

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:3100/mcp"),
  {
    requestInit: {
      headers: {
        Authorization: `Bearer ${userToken}`,
      },
    },
  }
);

const client = new Client(
  { name: "my-ai-app", version: "1.0.0" }
);

await client.connect(transport);

// Discover available tools
const { tools } = await client.listTools();
console.log("Available tools:", tools.map((t) => t.name));

// Call a tool
const result = await client.callTool({
  name: "search_employees",
  arguments: { query: "engineering manager" },
});

console.log(result.content);

StreamableHTTPClientTransport manages the HTTP connection to your MCP server, including attaching the Authorization header to every request. client.connect(transport) performs the MCP initialization handshake: the client and server exchange protocol versions and capabilities. The tool catalog is not part of that handshake. client.listTools() fetches it with a separate request, and you can use the result to dynamically build a UI or map it onto an LLM's tool-calling API. client.callTool() sends a JSON-RPC request to invoke a specific tool by name and returns the content array from the handler, the same format the AI model receives. In a production application, you'd pass this content back to the model as a tool result in the conversation history.
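
Handing the catalog to a model usually takes a small mapping step, because providers name the schema field differently. The sketch below assumes a target shape of { name, description, input_schema }, which is the shape Anthropic's Messages API documents. Other providers need a different mapping, and toLlmTools is a hypothetical helper.

```typescript
// Shape of an entry returned by client.listTools() (only the fields used here)
interface McpToolInfo {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Assumed target shape for the model provider's tool-calling API
interface LlmToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// Convert MCP tool metadata to the provider's tool definition format.
// The JSON Schema itself is passed through untouched.
function toLlmTools(tools: McpToolInfo[]): LlmToolDefinition[] {
  return tools.map((tool) => ({
    name: tool.name,
    description: tool.description ?? "",
    input_schema: tool.inputSchema,
  }));
}
```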

Common Pitfalls

1. Returning too much data. LLMs have context limits. If your database query returns 500 rows, don't send them all. Paginate, summarize, or limit results. 25 items is a reasonable default.

2. Tool descriptions that are too generic. If you have search_employees and search_contractors, the AI needs to know the difference. Don't rely on the tool name alone — the description is what the model reads.

3. Missing error handling. When a database query fails, return a structured error message, not a stack trace. The AI needs to tell the user something useful, and raw errors leak implementation details.

4. No rate limiting. AI tool calls can happen in loops. If the model calls your tool 50 times in one conversation, you need circuit breakers:

// Timestamps (in ms) of recent calls, keyed by user ID
const rateLimiter = new Map<string, number[]>();

function checkRateLimit(userId: string, limit = 30, windowMs = 60000) {
  const now = Date.now();
  const calls = rateLimiter.get(userId) ?? [];
  const recent = calls.filter((t) => now - t < windowMs);

  if (recent.length >= limit) {
    throw new Error(
      `Rate limit exceeded. Max ${limit} calls per minute.`
    );
  }

  recent.push(now);
  rateLimiter.set(userId, recent);
}

5. Not testing with actual AI models. Your tools might look correct in unit tests but confuse the model. Test the full loop: AI model receives tool definitions, decides to call a tool, gets the result, and reasons about it. Adjust descriptions based on how the model actually behaves.
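Pitfall 1 is the easiest one to address mechanically. The sketch below, written in Python and not taken from this guide's source code, caps a result set at 25 items and returns a cursor so the model can ask for the next page. The helper name paginate and the response keys are assumptions made for illustration.

```python
from typing import Any, Optional

DEFAULT_PAGE_SIZE = 25  # the default suggested in pitfall 1


def paginate(rows: list[Any], cursor: int = 0, page_size: int = DEFAULT_PAGE_SIZE) -> dict:
    """Return one page of rows plus the cursor for the next page (hypothetical helper)."""
    page = rows[cursor:cursor + page_size]
    has_more = cursor + page_size < len(rows)
    next_cursor: Optional[int] = cursor + page_size if has_more else None
    return {
        "items": page,
        "total": len(rows),
        "next_cursor": next_cursor,  # None means there are no more pages
    }


rows = [f"employee-{i}" for i in range(60)]
first = paginate(rows)
print(len(first["items"]), first["next_cursor"])
```

Returning the total alongside the page lets the model tell the user how many matches exist without loading them all into the context window.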

Wrapping Up

Building MCP servers for internal data is about three things:

Good tool design — clear descriptions, typed parameters, structured responses
Proper access control — authenticate users, scope data access, log everything
Production readiness — health checks, rate limiting, error handling, monitoring

The protocol itself is straightforward. The hard work is designing the right abstractions over your internal systems so the AI can use them effectively without leaking data or overwhelming the context window.

Start with one or two high-value tools (employee lookup, document search), test them with real users, and expand from there. The best internal MCP servers grow organically based on what people actually ask the AI.

The full source code from this guide is available on GitHub.

Learn RAG & MCP Fundamentals

Beau Carnes — Thu, 22 Jan 2026 14:34:33 +0000

Building AI today is about more than just a clever prompt. If you really want to move from playing with standalone tools to creating integrated systems that actually work with your data, our new crash course on the freeCodeCamp.org YouTube channel is exactly where you need to start.

Mastering RAG (Retrieval Augmented Generation)

Everyone is talking about RAG, but many people struggle to understand how it works under the hood. This course starts by breaking down how to connect a model to your own private information. You will learn how to turn documents into embeddings (mathematical representations of meaning) and store them in vector databases like Chroma.

The course also covers the "precision problem." You will learn why just uploading a massive PDF doesn't work and how to use chunking strategies to ensure the AI finds exactly the right paragraph to answer a user's question.

Coordination with MCP

While RAG gives an AI knowledge, the Model Context Protocol (MCP) gives it the ability to coordinate actions. MCP allows AI agents to interact with third-party software, databases, and local files. Instead of writing custom code for every single API, MCP provides a standardized way for agents to discover what a server can do and then execute tasks.

You will learn how to build your own MCP server and client using the Python SDK, giving your AI the "hands" it needs to perform real-world tasks.

Watch the full course on the freeCodeCamp.org YouTube channel (2-hour watch).

How Does an MCP Work Under the Hood? MCP Workflow Explained

Ajay Patel — Tue, 16 Dec 2025 18:35:11 +0000

We’ve all faced that awkward limitation with AI: it can write code or explain complex topics in seconds, but the moment you ask it to check a local file or run a quick database query, it hits a wall. It’s like having a genius assistant who is locked in an empty room—smart, but completely cut off from your actual work. This is where the Model Context Protocol (MCP) changes the game. In this article, we’ll explore MCP in depth.

MCP Server: A-Z of Model Context Protocol
What is MCP (Model Context Protocol)?
Architecture of MCP
How Does MCP Work?
MCP vs RAG
MCP vs A2A
Resources
Conclusion

MCP Server: A-Z of Model Context Protocol

LLMs possess impressive knowledge and reasoning skills, which allow them to perform many complex tasks. But the problem is that their knowledge is limited to their initial training data. It means they can’t access your calendar, run SQL queries, or send an email.

To give LLMs real-world knowledge, we have to provide integrations that let them access live data or perform actions in the real world. This leads to the classic MxN problem, where developers have to build and maintain a custom integration for every combination of M models and N tools.

The image below properly demonstrates the MxN Problem:

Function calling (also known as tool calling) gives models a flexible way to interface with external systems and access data outside their training data. However, each provider defines its own format for describing and invoking tools, so an integration written for one vendor's API has to be adapted for every other one.

That’s where MCP steps in. MCP is a write once, use anywhere approach to the problem. An app developer can write a single MCP server for any AI system to use and expose a set of tools and data. Similarly, an AI system can implement the protocol and connect to any MCP server that exists today or in the future.
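The arithmetic behind the MxN problem is simple enough to check directly. Here is a minimal sketch, with the function name and dictionary keys chosen for illustration:

```python
def integrations_needed(models: int, tools: int) -> dict:
    """Compare bespoke integrations (one per model-tool pair) with a shared protocol."""
    return {
        "custom": models * tools,           # every model needs its own adapter for every tool
        "shared_protocol": models + tools,  # each side implements the protocol once
    }


print(integrations_needed(4, 10))
```

With 4 models and 10 tools, that is 40 custom integrations versus 14 protocol implementations, and the gap widens as either number grows.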

What is MCP (Model Context Protocol)?

MCP is an open-source standard, developed by Anthropic, for connecting AI applications to external systems.

By using an MCP, AI applications like Claude or ChatGPT can connect to data sources like local files and databases, tools like search engines and calculators, and workflows like specialized prompts—enabling them to access key information and perform tasks.

Think of an MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect electronic devices, an MCP provides a standardized way to connect AI applications to external systems.

The image below will help you to better understand the MCP Server:

Architecture of MCP

The Model Context Protocol has a clear structure with components that work together to help LLMs and outside systems interact easily. An MCP follows a simple client-server architecture, which can be broken down into three simple key components:

MCP Host

The host is the user-facing AI application, the environment where the AI model lives and interacts with the user. Hosts manage the discovery, permissions, and communication between clients and servers. This can be a chat application like OpenAI’s ChatGPT interface or Anthropic’s Claude desktop app, or an AI-enhanced IDE like Cursor or Windsurf.

MCP Client

The MCP client is a component within the host that handles the low-level communication with the MCP server. MCP clients are instantiated by host applications to communicate with particular MCP servers. Each client maintains a one-to-one connection with a single server.

Here, the difference is important: the host is the application users interact with, while clients are the components that enable server connections.

MCP Server

The MCP server is the external program or service that exposes the capabilities (tools, data, and so on) to the application. An MCP server can be seen as a wrapper around some functionality, which exposes a set of tools or resources in a standardized way so that any MCP client can invoke them.

Servers can run locally on the same machine as the host, or remotely on a cloud service. MCP is designed to support both scenarios.

The image below will help you to better understand the concept:

An MCP server can expose one or more capabilities to the client. Capabilities are essentially the features or functions that the server makes available.

The MCP server provides the following capabilities:

Tools: Tools are functions that act on behalf of the AI model. They are model-controlled: the LLM (via the host) decides to call a tool when it determines it needs to perform a specific task. For example, send_email sends an email to a recipient.
Resources: Resources provide read-only data to the AI model. A resource can be a database record or a knowledge base that the AI can query to get information, but can’t modify.
Prompts: Prompts are the predefined templates or workflows that the server can provide.
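The three capability types can be pictured as three registries on the server. The sketch below is a toy in-memory model and not a real SDK class. ToyServer, the send_email lambda, and the kb:// URI are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyServer:
    """In-memory stand-in for an MCP server (not a real SDK class)."""
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)
    resources: dict[str, str] = field(default_factory=dict)
    prompts: dict[str, str] = field(default_factory=dict)

    def capabilities(self) -> dict[str, list[str]]:
        """List what this server offers, grouped by capability type."""
        return {
            "tools": sorted(self.tools),
            "resources": sorted(self.resources),
            "prompts": sorted(self.prompts),
        }


server = ToyServer()
# Tool: an action the model chooses to invoke.
server.tools["send_email"] = lambda to, body: f"sent to {to}"
# Resource: read-only data addressed by a URI.
server.resources["kb://refund-policy"] = "Refunds are accepted within 30 days."
# Prompt: a reusable template the host can offer to the user.
server.prompts["summarize"] = "Summarize the following text in three bullet points:\n{text}"

print(server.capabilities())
print(server.tools["send_email"]("manager@example.com", "Report attached"))
```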

Transport Layer

The transport layer uses JSON-RPC 2.0 messages to communicate between the client and server. For this, we have mainly two transport methods:

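Standard Input/Output (stdio): Ideal for local environments. The host launches the server as a subprocess and exchanges messages over its stdin and stdout.
Streamable HTTP: Best suited for remote servers. The client sends messages as HTTP POST requests, and the server can optionally use Server-Sent Events (SSE) to stream responses back. It replaces the older standalone HTTP+SSE transport, which is deprecated in the specification but still supported by many SDKs.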
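To make the JSON-RPC 2.0 framing concrete, the sketch below builds a tools/list request and a tools/call request as plain dictionaries, then serializes each one on its own line, which is how the stdio transport delimits messages. The helper names make_request and encode_line are assumptions. The method names and the params layout follow the MCP specification.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC requests carry an id so responses can be matched


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def encode_line(message: dict) -> str:
    """Serialize one message as a single newline-terminated line for stdio framing."""
    return json.dumps(message, separators=(",", ":")) + "\n"


list_tools = make_request("tools/list")
call_tool = make_request("tools/call", {"name": "add", "arguments": {"a": 5, "b": 7}})

wire = encode_line(list_tools) + encode_line(call_tool)
for line in wire.splitlines():
    print(json.loads(line)["method"])
```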

How Does MCP Work?

An MCP gives an AI assistant the ability to securely use external tools, databases, and services. Imagine you ask Claude:

“Find the latest sales report in our database and email it to my manager.”

Step #1 - Tool Discovery

When you launch an MCP host such as Claude Desktop, it connects to your configured MCP servers and asks: “What tools do you have available?”

Each server responds with its available tools:

database_query, email_sender, file_browser

Now, Claude knows about the tools it has.

Step #2 - Understanding Your Requirement

Claude reads your query and realizes:

It needs to retrieve information it doesn’t have (the sales data, via database_query)
It needs to take an external action (sending the email, via email_sender)

So Claude plans a 2-step tool sequence.

Step #3 - Ask for Permission

Before any external action happens, Claude Desktop prompts you: “Claude wants to query your sales database. Allow?”

Nothing proceeds without your approval. This is core to the MCP’s security model.

Step #4 - Querying the Database

Once you grant permission, Claude sends a structured MCP tool call to the server that provides database_query.

Next, the server will run a secure database lookup and return the latest sales report data. This doesn’t give Claude direct access to the database.

Step #5 - Sending the Email

Once Claude has the data, Claude triggers a second permission prompt: “Claude wants to send an email on your behalf. Approve?”

Once approved, Claude formats the email and sends a second tool call to the server that provides email_sender, which delivers it to your manager.

Step #6 - Natural Answer

Claude wraps everything up nicely and sends a response to you, “Done! I found the latest sales report and emailed it to your manager.”

The entire process typically happens in seconds. From your perspective, Claude simply "knows" how to access your database and send emails, but in reality, the MCP has orchestrated a secure, standardized exchange between multiple systems.

The beauty of MCP is that it transforms AI assistants from isolated conversational tools into genuine productivity partners that can interact with your entire digital ecosystem, safely and with your explicit permission every step of the way.
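The six steps above can be sketched as a small loop: look up the tool, ask for approval, and only then run it. This is a toy simulation and not how any particular host is implemented. The TOOLS table, the canned return values, and the run_plan function are all hypothetical.

```python
from typing import Callable

# Hypothetical in-memory tools standing in for the servers in the walkthrough.
TOOLS: dict[str, Callable[..., str]] = {
    "database_query": lambda query: "Q3 sales report: revenue up 12%",
    "email_sender": lambda to, body: f"Email sent to {to}",
}


def run_plan(plan: list[tuple[str, dict]], approve: Callable[[str], bool]) -> list[str]:
    """Execute planned tool calls in order, asking for approval before each one."""
    results = []
    for tool_name, arguments in plan:
        if tool_name not in TOOLS:   # Step 1: only discovered tools can be called
            raise KeyError(f"Unknown tool: {tool_name}")
        if not approve(tool_name):   # Step 3: nothing runs without user consent
            results.append(f"{tool_name}: denied by user")
            break
        results.append(TOOLS[tool_name](**arguments))  # Steps 4 and 5: the server does the work
    return results


plan = [
    ("database_query", {"query": "latest sales report"}),
    ("email_sender", {"to": "manager@example.com", "body": "See the attached report"}),
]

print(run_plan(plan, approve=lambda tool: True))
print(run_plan(plan, approve=lambda tool: tool != "email_sender"))
```

The second call shows what happens when the user approves the database query but declines the email: the plan stops before any external action is taken.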

MCP vs RAG

Fundamentally, MCP and RAG serve different purposes.

RAG is a technique that is used to supply the relevant knowledge that we have stored in a vector database. In RAG, the user’s query is converted to a vector embedding, which searches through embeddings in the vector database and finds the relevant context based on similarity. This relevant context is then provided to the LLM. It is great for answering questions from large documents like company wikis, knowledge bases, or research papers.

An MCP enables AI models to perform real-world actions with the help of tools. It lets the AI connect to tools and services like databases, APIs, Gmail, calendar, and so on.
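The retrieval step that RAG relies on can be sketched with nothing more than cosine similarity. The three-dimensional vectors below are toy values invented for illustration. A real system would get its vectors from an embedding model and store them in a vector database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings"; the numbers are made up so that each text points in its own direction.
store = {
    "Refunds are accepted within 30 days.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.": [0.1, 0.9, 0.1],
    "Shipping takes 3 to 5 business days.": [0.2, 0.1, 0.9],
}


def retrieve(query_vector: list[float], top_k: int = 1) -> list[str]:
    """Return the top_k stored texts most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vector, store[text]),
        reverse=True,
    )
    return ranked[:top_k]


query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "What is the refund policy?"
print(retrieve(query_vector))
```

The retrieved text is what gets added to the prompt as context. Nothing is executed, which is the key contrast with an MCP tool call.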

MCP vs A2A

The Model Context Protocol (MCP) and the Agent-to-Agent (A2A) protocol are complementary open standards in AI architecture that serve different purposes in how AI agents connect with external systems.

MCP standardizes how a single AI agent connects to tools, data, and external systems (agent-to-tool communication).
A2A standardizes how multiple, independent AI agents communicate and collaborate with each other (agent-to-agent communication).

Resources

For more information on the MCP, you can refer to the official website: modelcontextprotocol.io.

Some of the awesome MCP Servers which you can check:

Brave Search MCP Server
- An MCP server implementation that integrates the Brave Search API, providing both web and local search capabilities.
Sentry MCP server
- This server provides tools to inspect error reports, stacktraces, and other debugging information from your Sentry account.
Google Maps MCP Server
- MCP Server for the Google Maps API.
Tailwind MCP Server by FlyonUI
- MCP Server for FlyonUI - Generate Amazing UIs/Themes/Sections with just a single prompt.
git MCP server
- A Model Context Protocol server for Git repository interaction and automation. This server provides tools to read, search, and manipulate Git repositories via Large Language Models.
GitHub MCP Server
- MCP Server for the GitHub API, enabling file operations, repository management, search functionality, and more.
Shadcn MCP Server
- MCP Server for shadcn/studio - Generate Amazing UIs/Themes/Sections with just a single prompt.

You can explore a list of available MCP servers here: https://github.com/punkpeye/awesome-mcp-servers

If you're interested in learning how to build your own MCP server, check out this detailed course on Hugging Face: https://huggingface.co/mcp-course.

Conclusion

MCP (Model Context Protocol) is an open-source standard for connecting AI applications to external systems. With MCP, AI models are no longer just chatbots. They become capable agents that can work with your local files, query your database, and send emails, all with your permission and under your control.

It has also solved the classic MxN problem—developers only need to build the MCP server once, then all other AI systems can integrate the MCP server in their application.

MCP marks a significant shift in how AI systems interact with the real world. As the MCP ecosystem continues to grow, it will enable AI agents to become more powerful assistants that can operate across diverse environments with reliability and security.

How to Build Your First MCP Server using FastMCP

Manish Shivanandhan — Wed, 03 Dec 2025 17:17:30 +0000

Model Context Protocol, or MCP, is changing how large language models connect with data and tools.

Instead of treating an AI model as a black box, MCP gives it structured access to information and actions.

It is like the USB-C port for AI, creating a standard way for models to interact with servers that hold real-world data or perform useful tasks.

FastMCP is the easiest and fastest framework for building MCP servers with Python. It hides all the complex protocol details and lets you focus on your logic.

In this guide, you will learn what MCP is, how FastMCP works, and how to build and run your first MCP server from scratch.

What is MCP?
Why use FastMCP?
Creating Your First MCP Server
Running the Server
Adding More Tools
Adding Resources
Using Context in Tools
Connecting with an MCP Client
Authentication and Security
Deploying Your MCP Server
Using the MCP Server with an LLM Application
Conclusion

What is MCP?

MCP is a standard protocol that allows language models to talk to external systems in a secure and consistent way. MCP is similar to an API, but built for large language models instead of humans.

An MCP server can do three main things:

It can expose data as resources (similar to GET endpoints)
It can provide actions through tools (similar to POST requests)
And it can define prompts that guide how the model interacts with data or users.

For example, a resource might return a list of articles, a tool might analyze those articles, and a prompt might define how the model summarizes them. By connecting an LLM to such an MCP server, you give it the power to use your own data and logic in real time.

Why Use FastMCP?

While you could build an MCP server using the official SDK, FastMCP takes things much further. It’s a production-ready framework with enterprise authentication, client libraries, testing tools, and automatic API generation.

You can use FastMCP to build secure, scalable MCP applications that integrate with providers like Google, GitHub, and Azure. It also supports deployment to the cloud or your own infrastructure.

Most importantly, the framework is extremely developer-friendly. You can create a working MCP server in just a few lines of Python code.

Creating Your First MCP Server

Before you start building, install FastMCP in your Python environment. You can use pip or uv. The uv tool is recommended because it handles environments and dependencies efficiently.

uv pip install fastmcp

Once installed, you are ready to write your first server.

Every MCP server starts with the FastMCP class. This class represents your application and manages your tools, resources, and prompts. Let’s start by creating a simple server that adds two numbers together.

Create a file named server.py and add the following code:

from fastmcp import FastMCP

mcp = FastMCP("Demo Server 🚀")

@mcp.tool
def add(a: int, b: int) -> int:
    """Add two numbers and return the result"""
    return a + b

if __name__ == "__main__":
    mcp.run()

That’s all you need. You have just created a fully working MCP server with one tool called add. When a client calls this tool, the server adds two numbers and returns the result.

Running the Server

To run your server locally, open your terminal and type:

fastmcp run server.py

This command starts the MCP server. You can also use HTTP or SSE transports for web-based deployments. For example, to run your server over HTTP, use:

mcp.run(transport="http", host="127.0.0.1", port=8000, path="/mcp")

Once the server is running, clients can connect and call the add tool remotely.

Adding More Tools

FastMCP tools are simple Python functions that you decorate with @mcp.tool. You can add as many as you like. Let’s add a multiplication tool next:

@mcp.tool
def multiply(a: float, b: float) -> float:
    """Multiply two numbers"""
    return a * b

You can now run the server again, and clients will have access to both the add and multiply tools.

FastMCP automatically generates schemas based on your function signatures and docstrings, making it easy for clients to understand your API.
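To get a feel for how a schema can be derived from a signature and docstring, here is a simplified sketch using only the standard library. It is not FastMCP's actual implementation. The describe_tool helper and the small type mapping are assumptions, and real frameworks handle many more types.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_tool(func) -> dict:
    """Derive a tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)
    properties = {}
    required = []
    for name, parameter in signature.parameters.items():
        # Fall back to "string" for annotations this sketch does not know about.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if parameter.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def multiply(a: float, b: float) -> float:
    """Multiply two numbers"""
    return a * b


print(describe_tool(multiply))
```

This is why type hints and docstrings matter so much in tool functions: they are the only description of your API that the client and the model ever see.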

Adding Resources

Resources in MCP represent read-only data that clients can access. You can create static resources or dynamic templates that take parameters. For example, you might expose a version number or a user profile.

@mcp.resource("config://version")
def get_version():
    return "1.0.0"

@mcp.resource("user://{user_id}/profile")
def get_profile(user_id: int):
    return {"name": f"User {user_id}", "status": "active"}

In this example, the first resource always returns the version number, while the second resource dynamically fetches a user profile based on the ID provided.
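If you are curious how a URI template such as user://{user_id}/profile can be matched against a concrete URI, the following sketch does it with regular expressions. It is a simplified stand-in and not FastMCP's code. The resource decorator and read_resource function here are hypothetical re-implementations for illustration only.

```python
import re
from typing import Callable

_templates: list[tuple[re.Pattern, Callable]] = []


def resource(template: str):
    """Register a handler for a URI template such as 'user://{user_id}/profile'."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>[^/]+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )

    def decorator(func: Callable) -> Callable:
        _templates.append((re.compile(f"^{pattern}$"), func))
        return func

    return decorator


def read_resource(uri: str):
    """Find the first template that matches the URI and call its handler."""
    for pattern, func in _templates:
        match = pattern.match(uri)
        if match:
            return func(**match.groupdict())
    raise LookupError(f"No resource matches {uri}")


@resource("config://version")
def get_version():
    return "1.0.0"


@resource("user://{user_id}/profile")
def get_profile(user_id: str):
    # Path parameters arrive as strings, so convert before use.
    return {"name": f"User {int(user_id)}", "status": "active"}


print(read_resource("config://version"))
print(read_resource("user://42/profile"))
```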

Using Context in Tools

FastMCP allows you to access the session context within any tool, resource, or prompt by including a ctx: Context parameter. The context gives you powerful capabilities like logging, LLM sampling, progress tracking, and resource access.

Here is an example that shows how to use context:

from fastmcp import Context

@mcp.tool
async def summarize(uri: str, ctx: Context):
    await ctx.info(f"Reading resource from {uri}")
    data = await ctx.read_resource(uri)
    summary = await ctx.sample(f"Summarize this: {data.content[:500]}")
    return summary.text

This tool logs a message, reads a resource, and then asks the client’s language model to summarise it. Context makes your MCP tools smarter and more interactive.

Connecting with an MCP Client

Once your server is running, you can connect to it using the fastmcp.Client class. The client can communicate via STDIO, HTTP, or SSE, and can even run in-memory for testing.

Here is a simple example of connecting to your local server and calling the add tool:

from fastmcp import Client
import asyncio

async def main():
    async with Client("server.py") as client:
        tools = await client.list_tools()
        print("Available tools:", tools)
        result = await client.call_tool("add", {"a": 5, "b": 7})
        print("Result:", result.content[0].text)

asyncio.run(main())

You can also connect to multiple servers using a standard MCP configuration file, making it easy to build complex systems that interact with several services simultaneously.

Authentication and Security

When you move from development to production, authentication becomes important.

FastMCP has built-in support for enterprise-grade authentication providers such as Google, GitHub, Microsoft Azure, Auth0, and WorkOS. You can enable secure OAuth-based access with just a few lines of code:

from fastmcp.server.auth.providers.google import GoogleProvider
from fastmcp import FastMCP

auth = GoogleProvider(client_id="...", client_secret="...", base_url="https://myserver.com")
mcp = FastMCP("Secure Server", auth=auth)

Now only authenticated users can access your server. On the client side, you can connect using an OAuth flow like this:

async with Client("https://secure-server.com/mcp", auth="oauth") as client:
    result = await client.call_tool("protected_tool")

FastMCP handles tokens, refreshes, and error handling automatically.

Deploying Your MCP Server

You can deploy FastMCP servers anywhere.

For testing, the fastmcp run command is enough. For production, you can deploy to FastMCP Cloud, which provides instant HTTPS endpoints and built-in authentication.

If you prefer to self-host, use the HTTP or SSE transport to serve your MCP endpoints from your own infrastructure. A simple deployment command might look like this:

mcp.run(transport="http", host="0.0.0.0", port=8080)

Once deployed, your MCP server is ready to connect with language models, web clients, or automation workflows.

Using the MCP Server with an LLM Application

Once your MCP server is running, the next step is to connect it to a large language model. This allows an LLM to securely call your server’s functions, read resources, and perform actions as part of a conversation.

To connect an LLM application, you first define your MCP configuration file. This file lists the available servers, their connection methods, and any authentication requirements.

Once configured, the LLM can automatically discover your MCP tools and call them when needed.
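As a rough illustration, many MCP hosts read a JSON file with a top-level mcpServers object. The sketch below parses such a file and works out how each server would be reached. The server names, the command, and the URL are placeholders, and connection_kind is a hypothetical helper, so check your host's documentation for the exact fields it supports.

```python
import json

# Placeholder configuration: one local server started as a subprocess, one remote server.
CONFIG_JSON = """
{
  "mcpServers": {
    "calculator": {
      "command": "python",
      "args": ["server.py"]
    },
    "docs": {
      "url": "https://example.com/mcp"
    }
  }
}
"""


def connection_kind(entry: dict) -> str:
    """Classify a server entry by how a client would reach it."""
    if "command" in entry:
        return "stdio"  # the host launches the server as a subprocess
    if "url" in entry:
        return "http"   # the host connects to an already-running remote server
    raise ValueError("Server entry needs either 'command' or 'url'")


config = json.loads(CONFIG_JSON)
for name, entry in config["mcpServers"].items():
    print(name, "->", connection_kind(entry))
```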

For example, if your server exposes an add or summarize tool, the model can directly use them as if they were built-in capabilities. In a chat-based environment, when a user asks the model to perform a task such as “Summarize the latest article,” the LLM will call your summarize tool, process the result, and respond with the output.

If you are building a custom LLM application with frameworks like OpenAI’s Assistants API or LangChain, you can register your MCP server as an external tool. The LLM then interacts with it through the MCP client library.

Here is a simple example:

from fastmcp import Client
from openai import OpenAI
import asyncio

async def main():
    # Connect to your MCP server
    async with Client("http://localhost:8000/mcp") as client:
        # Call an MCP tool directly
        result = await client.call_tool("add", {"a": 10, "b": 5})
        print("MCP Result:", result.content[0].text)
        # Use the result inside an LLM prompt
        llm = OpenAI(api_key="YOUR_KEY")
        response = llm.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are an AI assistant using MCP tools."},
                {"role": "user", "content": f"The sum of 10 and 5 is {result.content[0].text}. Explain how MCP helps with this integration."}
            ]
        )
        print(response.choices[0].message.content)

asyncio.run(main())

In this setup, the LLM can seamlessly combine its reasoning with your server’s logic. It uses the MCP client to fetch data or perform computations and then incorporates the output into its conversation or workflow.

This approach lets you build intelligent systems that go beyond static prompts. You can connect your LLM to real databases, APIs, or automation tools, turning it into an active agent that can read, write, and execute with real-world context.
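In the example above, your code calls the tool and then hands the result to the model. To let the model choose the tool itself, you would describe each MCP tool in the shape the Chat Completions tools parameter expects, then dispatch whichever call the model requests. The sketch below shows only that translation and dispatch, with no network calls. The helpers to_openai_tool and dispatch are hypothetical, and the simulated tool call stands in for a real model response.

```python
import json


def to_openai_tool(mcp_tool: dict) -> dict:
    """Map an MCP tool description onto a Chat Completions 'tools' entry."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


def dispatch(tool_call: dict, handlers: dict) -> str:
    """Run the handler the model asked for; arguments arrive as a JSON string."""
    arguments = json.loads(tool_call["arguments"])
    return str(handlers[tool_call["name"]](**arguments))


# Hand-written stand-in for one entry of a tools/list response.
add_tool = {
    "name": "add",
    "description": "Add two numbers and return the result",
    "inputSchema": {
        "type": "object",
        "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
        "required": ["a", "b"],
    },
}

print(to_openai_tool(add_tool)["function"]["name"])

# Simulated model output; a real one would come from the chat completion response.
simulated_call = {"name": "add", "arguments": '{"a": 10, "b": 5}'}
print(dispatch(simulated_call, {"add": lambda a, b: a + b}))
```

In a real application, the handler for each tool would forward the call to the MCP client, and the returned text would be appended to the conversation as a tool result before asking the model to continue.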

Conclusion

FastMCP makes it simple to bring your data, APIs, and tools into the world of AI through the Model Context Protocol. With just a few lines of Python, you can create powerful MCP servers that connect to language models, automate workflows, and handle real-world logic securely.

Whether you are building a quick demo or an enterprise-grade system, FastMCP gives you the shortest path from idea to production. Install it today, start your first server, and explore how MCP can unlock the next level of AI integration.

If you want to learn more about general MCP concepts and how to build an MCP server with Python, I wrote another article about that which you can check out here.

Hope you enjoyed this article. Sign up for my free newsletter TuringTalks.ai for more hands-on tutorials on AI. You can also visit my website.

How to Build Your Own MCP Server with Python

Manish Shivanandhan — Thu, 30 Oct 2025 16:12:21 +0000

Artificial intelligence is evolving at a remarkable pace. Models today can reason, write, code, and analyze information in ways that once seemed impossible.

But there’s one major limitation that still holds them back: context.

Most AI models don’t have access to your system, files, APIs, or live data. They only know what you tell them in a prompt.

The Model Context Protocol, also known as MCP, was created to address this problem. It enables AI models to securely connect to your own tools, APIs, and systems via small, structured servers known as MCP servers.

In this guide, you’ll learn how to build your own MCP server using Python. We’ll walk through each part of the code and I’ll explain how it works.

By the end, you’ll have a running MCP server that can add numbers, return random words, and fetch live weather data from the internet. We will also see how to host this MCP server on the cloud.

What is Model Context Protocol?

Before diving into the code, it’s important to understand what the Model Context Protocol actually is.

MCP is an open standard that defines how AI models and external systems communicate. You can think of it as an API that’s designed specifically for AI assistants.

If an API lets two software programs exchange data, MCP allows an AI model to talk to your system. This opens up endless possibilities.

You could build an MCP server that lets ChatGPT read files from your local machine, or one that calls your company’s internal APIs to fetch data. You could even expose your own Python functions so that a model can use them as tools.

MCP makes this communication structured, secure, and extensible. It runs on familiar web technologies such as Server-Sent Events, or SSE, which allow the server to send real-time data streams to the client.

Setting Up Your Environment

To follow along, you’ll need Python version 3.10 or higher, which the MCP Python SDK requires. You can find the code for this example in this repository.

We’ll use a library called FastMCP that simplifies the process of building MCP servers. You can install it using pip:

pip install fastmcp requests

The requests library will be used to make HTTP calls later in the example. Once installed, you’re ready to create your first MCP server.

Creating the Project

Create a new file called server.py and start by importing the necessary modules:

import logging
import os
import random
import sys
import requests
from mcp.server.fastmcp import FastMCP

Here’s what each one does:

The logging module records what your server is doing.
os is used to access environment variables like port numbers.
random will help us generate random words.
sys allows the script to exit gracefully in case of errors.
requests lets us fetch live data from APIs.
And finally, FastMCP turns our Python functions into tools that can be called through MCP. This import path comes from the official mcp SDK, which pip installs as a dependency of fastmcp.

Configuring Logging

Logging gives you visibility into what your server is doing. It helps during development and is vital when you deploy your server in production.

name = "demo-mcp-server"
logging.basicConfig(
    level=logging.INFO,
    format='%(name)s - %(levelname)s - %(message)s',
    handlers=[logging.StreamHandler()]
)
logger = logging.getLogger(name)

This configuration prints log messages to the console in a simple format showing the server name, the log level, and the message. Every time a tool runs, a message will appear in the logs such as:

demo-mcp-server - INFO - Tool called: add(3, 5)

Creating the MCP Server

Next, we’ll create the server instance that will host our tools.

port = int(os.environ.get('PORT', 8080))
mcp = FastMCP(name, logger=logger, port=port)

The server will listen on the port specified by the environment variable PORT. If that variable isn’t set, it defaults to 8080. The FastMCP object now represents your MCP server, though it won’t start accepting connections until you call run().

Defining Tools

Each function that you decorate with @mcp.tool() becomes an accessible tool that clients can call. Let’s start with a simple example: an addition tool.

Example 1: Adding Two Numbers

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers"""
    logger.info(f"Tool called: add({a}, {b})")
    return a + b

This tool takes two numbers, logs the call, and returns their sum. Calling add(3, 5) will return 8.

Even though it’s simple, this shows the basic structure of every MCP tool: input parameters, a logging statement, and a return value.

Example 2: Returning a Random Secret Word

Let’s make another tool that returns a random word from a small list.

@mcp.tool()
def get_secret_word() -> str:
    """Get a random secret word"""
    logger.info("Tool called: get_secret_word()")
    return random.choice(["apple", "banana", "cherry"])

When you call this function, it picks one of the three words at random. Each time you call it, you might get a different result. This function demonstrates how MCP tools can use logic or randomness just like any regular Python function.

Example 3: Fetching Weather Data

Now let’s build something more practical. We’ll create a tool that fetches live weather data from the web using the requests library.

@mcp.tool()
def get_current_weather(city: str) -> str:
    """Get current weather for a city"""
    logger.info(f"Tool called: get_current_weather({city})")

    try:
        endpoint = "https://wttr.in"
        response = requests.get(f"{endpoint}/{city}", timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        logger.error(f"Error fetching weather data: {str(e)}")
        return f"Error fetching weather data: {str(e)}"

This tool accepts a city name, sends a request to the public weather service at wttr.in, and returns the text-based weather report. If there’s any issue, such as a network timeout or invalid city name, the function logs an error and returns a descriptive message.

Calling get_current_weather("London") will return a short weather summary for that city.

Running the Server

Once all your tools are defined, you can start the server. Add the following code to the bottom of your file:

if __name__ == "__main__":
    logger.info(f"Starting MCP Server on port {port}...")
    try:
        mcp.run(transport="sse")
    except Exception as e:
        logger.error(f"Server error: {str(e)}")
        sys.exit(1)
    finally:
        logger.info("Server terminated")

This block starts the server using the Server-Sent Events transport method. If anything goes wrong, it logs the error and shuts down cleanly.

You can now run the server from your terminal:

python server.py

If everything is working, you’ll see:

demo-mcp-server - INFO - Starting MCP Server on port 8080...

Your MCP server is now live and ready to accept requests.

Testing the Tools

To test your tools, you need an MCP-compatible client such as ChatGPT with developer features or another app that supports the protocol. Once connected, the client will list your available tools.

For example, in simplified form, a request to the add tool looks like this:

{
  "tool": "add",
  "args": [5, 7]
}

The server will respond with:

{
  "result": 12
}

The same applies to the other tools such as get_secret_word or get_current_weather.
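The request and response above are simplified for readability. On the wire, MCP messages are JSON-RPC 2.0, and a tool call uses the tools/call method. Here is a rough sketch of the actual shapes, assuming the add tool declared its parameters as a and b. Those argument names and the extract_text helper are illustrative and do not come from the tutorial:

```python
import json

# Argument names "a" and "b" are assumptions about how the add tool was declared.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 5, "b": 7}},
}

# A typical successful reply: the tool's return value arrives as text content.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "12"}], "isError": False},
}


def extract_text(reply: dict) -> str:
    """Join the text parts of a tools/call result (hypothetical helper)."""
    result = reply["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "".join(p["text"] for p in result["content"] if p["type"] == "text")


print(json.dumps(request, indent=2))
print(extract_text(response))
```

An MCP client library builds and parses these messages for you, so you rarely write them by hand.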

If you want to test the server directly without the MCP client, you can still send HTTP requests manually (though this bypasses the full protocol logic).

For example, to test your weather tool, you can send a simple GET request:

curl "http://localhost:8080/tool/get_current_weather?city=London"

or in Python:

import requests
response = requests.get("http://localhost:8080/tool/get_current_weather", params={"city": "London"})
print(response.text)

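This won’t use the MCP structure (like SSE streaming), so it only works if your server also exposes a plain HTTP route at that path. With the SSE transport alone, the server typically serves only the MCP endpoints, and this request will return a 404. When the route exists, it’s a quick sanity check that your server works.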

Deploying Your MCP Server to Sevalla

You can run this server locally for development. But if you want to use it in production applications, you have to deploy it to a server.

You can choose any cloud provider, like AWS, Heroku, or others to set up this project. But I will be using Sevalla.

Sevalla is a modern, usage-based Platform-as-a-service provider. It offers application hosting, database, object storage, and static site hosting for your projects.

I am using Sevalla for hosting for two reasons:

Every platform will charge you for creating a cloud resource. Sevalla comes with a $50 credit, so we won’t incur any costs for this example.
Sevalla has a template for a Python MCP server, which saves you from manually installing and configuring each resource the application needs.

Click on the “Python MCP Server” template. You will see the resources needed to provision the application. Click on “Deploy Template”.

You can see the resource being provisioned. If the deployment doesn't start automatically, click “Deploy now”.

Wait for a few minutes. Once the deployment is complete, you will see a green checkmark.

Once deployment is complete, click on “Visit app”. You will get a cloud URL such as https://python-mcp-server-rlfdk.sevalla.app. Use this as the base URL instead of the localhost:8080 URL.

You now have a production-grade MCP server running in the cloud. You can plug this into any application to fetch data for your LLM applications.

Why Build Your Own MCP Server?

Building an MCP server gives you control and flexibility.

You can connect AI models directly to your databases or internal systems, automate repetitive actions, and decide exactly what data an AI model can access.

It also allows you to experiment quickly. You can start small with a few simple tools and expand later into complex workflows.

By creating your own MCP server, you’re not just writing code – you’re defining how intelligent systems interact with the real world through your logic and data.

Expanding the Server

Once you’ve mastered the basics, it’s easy to extend your server. You can add tools that read and write files, query databases, interact with APIs like GitHub or Slack, or monitor your system. Each new function becomes another tool that your AI can use.

This modular approach lets you build an entire ecosystem of AI-aware tools, each performing a specific task but working together through the same MCP interface.

Conclusion

In this tutorial, you learned how to create an MCP server in Python using the FastMCP library. You configured logging, set up a server, defined multiple tools, and learned how to run and test it. You also saw how easily these tools can expose real functionality, like fetching live weather data or performing basic computations.

This structure is simple yet powerful. With just a few lines of Python code, you can build bridges between your systems and intelligent models. The Model Context Protocol represents a step toward AI systems that can truly understand and interact with real-world data and actions.

Hope you enjoyed this article. Sign up for my free newsletter TuringTalks.ai for more hands-on tutorials on AI. You can also visit my website.

MCP vs APIs: What's the Real Difference?

Manish Shivanandhan — Wed, 29 Oct 2025 21:51:55 +0000

APIs and MCPs both help systems talk to each other.

At first, they might look the same. Both allow one piece of software to ask another for data or perform an action. But the way they work and the reason they exist are completely different.

An API, or Application Programming Interface, is built for developers. It’s how one program communicates with another. MCP, or Model Context Protocol, is built for AI models. It’s how a large language model like GPT or Claude can safely talk to external systems and use tools.

Let’s look at what makes them different, why MCP exists when APIs already do the job, and how they work in real examples.

What is an API?
What is MCP?
How MCP Works
Why Not Just Use an API?
MCP vs API in Practice
Key Conceptual Difference
Discovery and Schema
Security and Privacy
The Future of MCP
Conclusion

What is an API?

An API is a set of rules that lets software talk to software.

It is like a waiter in a restaurant. You tell the waiter what you want, the kitchen prepares it, and the waiter brings it back. You never go into the kitchen yourself.

For example, if you want to get details of a GitHub user, you can make a simple API request.

GET https://api.github.com/users/username

The server replies with a response like this:

{
  "login": "john",
  "id": 12345,
  "followers": 120,
  "repos": 42
}

The API follows a pattern that both the client and the server understand. Developers use APIs every day to connect systems like payment gateways, weather data, or user accounts.

APIs are built for humans to code against. A developer writes the logic, sends requests, handles errors, adds authentication, and decides what to do with the response.

What is MCP?

MCP stands for Model Context Protocol. It’s a new standard that allows AI models to interact with external tools, data, and systems in a safe, structured way.

MCP is not meant for developers directly. It’s meant for large language models.

An AI model cannot make network requests by itself. It doesn’t know how to use headers, tokens, or API formats. It just predicts text based on what you type.

So if you tell a model, “Get the weather for Delhi,” it might generate some text that looks like a Python request. But it cannot actually run that code.

That is where MCP comes in. MCP acts like a bridge between the AI model and the real world. It defines a set of “tools” that the model can use safely.

Each tool is described using a schema so that the model knows what the tool does, what inputs it takes, and what it returns.

How MCP Works

You can think of MCP as a server that runs in the background. It exposes tools that an AI model can call. Each tool is a small piece of code that performs an action.

For example, you can write a simple MCP server in Python like this:

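from mcp.server.fastmcp import FastMCP
import requests

mcp = FastMCP(name="github-tools")


@mcp.tool()
def get_repos(username: str):
    """Fetch public repositories for a user"""
    url = f"https://api.github.com/users/{username}/repos"
    # A timeout keeps a slow upstream from hanging the tool call.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    mcp.run()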

This server defines a single tool called get_repos. It takes a username and fetches their GitHub repositories using the GitHub API.

Now, if an AI model is connected to this MCP server, it can ask for “get_repos for user john” and receive the data. The model does not know or care about the actual URL, headers, or tokens. The MCP server handles that part.
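The core idea of calling a tool by name can be sketched without any MCP library. The toy registry below is only an illustration of the dispatch step, not how FastMCP is implemented. The tool decorator, the TOOLS dictionary, and call_tool are hypothetical names, and get_repos returns canned data instead of calling GitHub:

```python
from typing import Any, Callable

# Maps tool names to the Python functions that implement them.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable tool (toy stand-in for @mcp.tool())."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_repos(username: str) -> list[str]:
    """Return repository names for a user (canned data, no real API call)."""
    fake_db = {"john": ["dotfiles", "blog"]}
    return fake_db.get(username, [])


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Look up a tool by name and run it with the model-supplied arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


print(call_tool("get_repos", {"username": "john"}))
```

The model only ever supplies a name and arguments. Everything else, including URLs and credentials, lives inside the registered function.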

Why Not Just Use an API?

You might wonder, why not just let the AI model call the API directly? If the model can talk to APIs, why add another layer?

The short answer is that AI models cannot safely call APIs on their own. They have no built-in execution environment, no way to store secrets, and no limits.

Letting a model make arbitrary network requests would be dangerous. It could expose keys, access private data, or even cause damage by mistake.

MCP solves that problem by creating a controlled layer between the model and your systems. You decide which tools the model can use. You can restrict inputs, filter outputs, and monitor everything the model does.

In an MCP setup, the model never sees API keys or sensitive URLs. It just calls a tool that you define. The tool itself handles the network call and returns only the safe data.

This makes MCP much safer for real-world use, especially in enterprise or private environments.
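As a rough sketch of "returns only the safe data", the tool body below reads the key on the server and strips everything except an allowlist of fields before replying. The environment variable WEATHER_API_KEY, the fetch_upstream stand-in, and the field names are all assumptions made for illustration:

```python
import os
from typing import Any

# Fields the model is allowed to see; everything else is dropped.
SAFE_FIELDS = ("city", "temp_c", "condition")


def fetch_upstream(city: str, api_key: str) -> dict[str, Any]:
    """Stand-in for the real HTTP call; returns a canned upstream payload."""
    return {
        "city": city,
        "temp_c": 31.0,
        "condition": "Sunny",
        "account_id": "acct-123",  # internal detail, not for the model
        "request_url": f"https://example.invalid/?key={api_key}",
    }


def get_weather(city: str) -> dict[str, Any]:
    """Tool body: the key is read server-side and never returned."""
    api_key = os.environ.get("WEATHER_API_KEY", "dummy-key")
    raw = fetch_upstream(city, api_key)
    # Keep only allowlisted fields so secrets cannot leak through the result.
    return {k: raw[k] for k in SAFE_FIELDS if k in raw}


print(get_weather("Delhi"))
```

An allowlist is used here rather than a blocklist, so a new upstream field stays hidden until you decide to expose it.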

MCP vs API in Practice

Let’s take a simple example. Suppose you want an AI to fetch weather data.

If you were using an API, you might write code like this:

import requests
response = requests.get("https://api.weatherapi.com/v1/current.json?key=API_KEY&q=Delhi")
print(response.json())

That works fine if a human developer runs it. But if an AI model tried to do the same, it would need access to your API key, network, and code execution. That is unsafe.

With MCP, you can define a tool like this:

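import requests

@mcp.tool()
def get_weather(city: str):
    """Get weather for a city"""
    # API_KEY is a placeholder; the real key stays on the server.
    url = "https://api.weatherapi.com/v1/current.json"
    params = {"key": "API_KEY", "q": city}  # params= URL-encodes the city name
    return requests.get(url, params=params, timeout=10).json()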

Now the AI model can simply say, “Call get_weather with city=Delhi,” and the MCP server runs the function.

The model does not see the API key or the actual URL. It just uses the tool safely.

Key Conceptual Difference

The difference between MCP and API is not just technical. It’s also philosophical.

APIs are for humans to use directly. They assume the caller understands the system, can handle tokens, and knows how to format requests.

MCP is for AI models. It assumes the caller is an intelligent but untrusted system that cannot hold secrets or execute code. The protocol gives the model only what it needs to perform reasoning and tool usage.

So while APIs expose endpoints like /users or /weather, MCP exposes capabilities like “get_user_info” or “get_weather.” The AI model does not call URLs. It calls functions with typed parameters.

Discovery and Schema

Another big advantage of MCP is that it can tell the model what tools are available.

When an AI model connects to an MCP server, it can ask for a list of tools. The server replies with their names, descriptions, and parameters in a structured format.

For example, the model might receive something like this:

{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get weather for a city",
      "parameters": {
        "city": {"type": "string"}
      }
    }
  ]
}

This means the model does not need separate documentation or prompt tuning. It knows exactly how to call each tool.

In contrast, an API would require reading human-written docs, copying sample requests, and guessing formats.
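The listing above is simplified. In the MCP specification, each tool carries an inputSchema field containing a JSON Schema object. Here is a minimal sketch of how such a description could be derived from a Python function's signature. The describe_tool helper and its type mapping are illustrative assumptions, not FastMCP's actual implementation:

```python
import inspect
from typing import Any, Callable

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn: Callable[..., Any]) -> dict[str, Any]:
    """Build a tool description from a function's name, docstring, and hints."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(city: str, days: int = 1) -> str:
    """Get weather for a city"""
    return f"{city}: sunny for {days} day(s)"


print(describe_tool(get_weather))
```

This is why type hints and docstrings matter in tool code: they are the raw material for what the model is told about each tool.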

Security and Privacy

MCP provides better control over what the model can do.

Since the tools are defined in your server, you can add rules, limits, and validations. You can prevent the model from sending dangerous inputs or accessing private data.

For example, your tool can reject requests that ask for too much data or contain suspicious patterns. You can also log every call for audit purposes.

APIs, on the other hand, are often exposed over the internet. If an API key leaks or a model calls the wrong endpoint, you could face a data breach.

With MCP, everything can run locally, behind a firewall, or on a private network. The model never needs direct access to the outside world.
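The rules, limits, and audit logging described in this section can be sketched in a few lines. The search_stations tool, the 50-result cap, and the city-name pattern below are hypothetical choices for illustration, and the tool returns canned data:

```python
import logging
import re

logger = logging.getLogger("mcp-audit")

MAX_RESULTS = 50  # upper bound on rows per call
CITY_PATTERN = re.compile(r"[A-Za-z][A-Za-z .'-]{0,63}")


def validate_search(city: str, limit: int) -> tuple[str, int]:
    """Reject suspicious or oversized requests before any real work happens."""
    city = city.strip()
    if not CITY_PATTERN.fullmatch(city):
        raise ValueError("city contains unexpected characters")
    if not 1 <= limit <= MAX_RESULTS:
        raise ValueError(f"limit must be between 1 and {MAX_RESULTS}")
    return city, limit


def search_stations(city: str, limit: int = 10) -> list[str]:
    """Hypothetical tool: validates input, logs the call, returns canned data."""
    city, limit = validate_search(city, limit)
    # Every accepted call leaves an audit record.
    logger.info("tool=search_stations city=%r limit=%d", city, limit)
    return [f"{city} station {i}" for i in range(1, limit + 1)]
```

Because validation runs before the tool does any work, a malformed or oversized request fails fast and never reaches your data.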

The Future of MCP

Big AI companies like OpenAI and Anthropic are adopting MCP as a shared standard. That means any model that supports MCP can use your tools without modification.

If you build a weather MCP server today, it could work with GPT, Claude, or any other MCP-compatible model in the future.

This makes MCP a unifying layer between AI systems and external tools, much like APIs are for web applications.

Conclusion

At a glance, MCP and APIs might seem similar because both pass data between systems. But the difference lies in who they are built for.

APIs are built for developers and systems that can safely make network calls. MCP is built for AI models that reason with text but cannot safely execute code.

An API gives you endpoints to access data. MCP gives the AI tools to use that data safely.

Think of it this way. APIs connect machines. MCP connects intelligence to machines.

That is why MCP is not replacing APIs but sitting above them as a new layer. APIs will still provide the data. MCP will just make it possible for AI to use those APIs safely, with structure, control, and understanding.

Hope you enjoyed this article. Sign up for my free AI newsletter TuringTalks.ai for more hands-on tutorials on AI. You can also visit my website.
