When I started studying OSINT, I always felt I was just putting random values into software without deeply understanding what I was doing. After months in the field, I realized I wasn't really investigating — I was just executing steps that follow a predictable pattern. That's exactly what an AI agent is good at. So I built one.
In this tutorial you'll learn how to set up OpenOSINT, an open-source Python OSINT framework with an AI agent at its core. You'll learn how Claude's native tool use API works, how to run autonomous investigations from the terminal using the interactive AI REPL, how to use the direct CLI for scripting, and how to expose all the tools to Claude Code or Claude Desktop via an MCP server.
What Is OSINT and Why Manual Workflows Break Down
Open Source Intelligence (OSINT) is the practice of collecting and analyzing information from publicly available sources. Security researchers use it during penetration tests. Journalists use it to verify identities and trace connections. Threat analysts use it to profile infrastructure.
A typical OSINT workflow looks like this:
You have a target email address
You run holehe to find which platforms that email is registered on
You notice a username in the output
You manually copy that username and run sherlock to search 300+ platforms
You switch to a browser to check HaveIBeenPwned
You open another tab for a WHOIS lookup
You take notes and repeat
Every tool is a silo. Every pivot is manual. The investigation logic — what to run next, what to chain, what the findings mean — lives entirely in your head.
When you close the terminal, it's gone.
This tutorial walks you through OpenOSINT, an open-source Python framework that replaces that fragmented workflow with an AI agent that chains tools autonomously, executes them against real binaries, and saves a structured Markdown report.
More importantly, you'll learn the core design principle that makes it trustworthy for security research: hallucination in tool results is structurally impossible.
What You'll Build
By the end of this tutorial, you'll have a working OSINT agent that you can use in three ways:
Interactive AI REPL — type a target in natural language and the agent decides what to run
Direct CLI — run individual tools without AI, useful for scripting
MCP Server — expose all tools to Claude Code or Claude Desktop
Here's what a real session looks like:
$ openosint
openosint ❯ investigate target@example.com
→ generate_dorks('target@example.com')
→ search_email('target@example.com')
✓ Found: Spotify, WordPress, Gravatar, Office365
→ search_breach('target@example.com')
✓ Found in 2 breaches: LinkedIn (2016), Adobe (2013)
→ search_username('target_handle')
✓ Found on: GitHub, Reddit, HackerNews, Twitter
╭──────────────── Report ────────────────╮
│ ## Online Presence │
│ Spotify · WordPress · Gravatar │
│ │
│ ## Data Breaches │
│ LinkedIn (2016) · Adobe (2013) │
╰────────────────────────────────────────╯
✓ Report saved → reports/2026-05-11_report.md
The agent went from email → linked accounts → username pivot → cross-platform search with no human orchestration at any step.
Prerequisites
To follow this tutorial, you'll need:
Python 3.10 or later installed on your machine
Basic familiarity with the command line
An Anthropic API key — only required for the AI REPL, not for the CLI or MCP server
Git installed
You don't need prior experience with OSINT tools or the Anthropic SDK.
How Claude's Tool Use API Works
Before you dive into installation, it's worth understanding the mechanism that makes this framework trustworthy for security research.
Most AI applications that wrap external tools work by generating text that describes what a tool would return. That's a problem when accuracy matters — the model can hallucinate plausible-looking usernames, fake subdomains, or data breaches that never happened.
Claude's tool use API works differently. When the model decides it needs to call a tool, it does not generate the output. It stops and emits a structured tool_use block containing the tool name and the arguments it wants to pass.
Your code then runs the actual binary — holehe, sherlock, or whatever else — and sends the real output back as a tool_result. The model reads that real output and decides its next step.
Here's the flow:
User prompt
↓
Model decides to call search_email()
↓
Hard stop — model emits tool_use block
↓
Your code runs holehe against the real target
↓
Real output sent back as tool_result
↓
Model reads actual results, decides next step
↓
Repeat until investigation is complete
The model never generates tool output. It only ever reads it. If sherlock finds 12 profiles, those 12 URLs go back into the context verbatim. The model cannot add a 13th that doesn't exist.
This is not a prompting trick or a system prompt instruction. It is how the API is architected. Keep this in mind as you read through the agent loop code later in this tutorial.
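To make the exchange concrete, here is a minimal sketch of the two message shapes involved, written as plain Python dictionaries. The tool ID and the captured output are placeholders, and build_tool_result_turn is a hypothetical helper for illustration, not part of OpenOSINT or the Anthropic SDK.

```python
import json

# Shape of an assistant turn that requests a tool call.
# The ID below is a made-up placeholder; real IDs come from the API.
assistant_turn = {
    "role": "assistant",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_example_01",
            "name": "search_email",
            "input": {"email": "target@example.com"},
        }
    ],
}


def build_tool_result_turn(assistant_turn: dict, outputs: dict[str, str]) -> dict:
    """Build the user turn that answers every tool_use block.

    `outputs` maps each tool_use ID to the text your code actually captured.
    """
    results = []
    for block in assistant_turn["content"]:
        if block["type"] == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],      # ties the result to the request
                "content": outputs[block["id"]],  # captured output, passed through as-is
            })
    return {"role": "user", "content": results}


# Placeholder text standing in for whatever the real binary printed.
reply = build_tool_result_turn(
    assistant_turn,
    {"toolu_example_01": "<captured holehe output goes here>"},
)
print(json.dumps(reply, indent=2))
```

The only text the model sees for a tool call is whatever your code puts in the content field, which is why the result is only as real as the process that produced it.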
How to Install OpenOSINT
Start by cloning the repository and installing the package:
git clone https://github.com/OpenOSINT/OpenOSINT.git
cd OpenOSINT
pip install -e .
Alternatively, if you just want to use the tool without modifying the source, install it directly from PyPI:
pip install openosint
Next, set your Anthropic API key. This is only required for the interactive AI REPL — the direct CLI and MCP server work without it:
export ANTHROPIC_API_KEY=sk-ant-...
How to Install the External Tool Dependencies
OpenOSINT wraps several standalone OSINT tools. Install the ones you plan to use:
pip install holehe # email account enumeration
pip install sherlock-project # username search across 300+ platforms
pip install sublist3r # subdomain enumeration
For phone intelligence, phoneinfoga is a standalone binary. Download the release for your platform from its GitHub releases page and place it somewhere in your PATH.
How to Configure Optional API Keys
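Two tools read API keys from environment variables. The HaveIBeenPwned key is required for breach checks, while the ipinfo.io token is optional and only raises rate limits: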
export HIBP_API_KEY=your_key # required for breach checks via HaveIBeenPwned v3
export IPINFO_TOKEN=your_token # optional — raises ipinfo.io rate limits
If a binary is missing or an API key is not configured, that specific tool returns a descriptive error string. All other tools continue to work normally.
How to Use the Interactive AI REPL
Run openosint with no arguments to start the AI-powered REPL. You can also use openosint shell — it's equivalent:
$ openosint
# or
$ openosint shell
If you prefer to pass the API key inline rather than via environment variable, use the --api-key flag:
$ openosint --api-key sk-ant-...
You'll get a prompt where you can type targets or questions in natural language:
openosint ❯ investigate target@example.com
openosint ❯ find all accounts for johndoe99
openosint ❯ what subdomains does example.com have?
openosint ❯ check if +14155552671 is a mobile number
The agent decides which tools to run based on your input. You don't need to specify which tools to use or in what order. If you type an email address, the agent will run email enumeration. If it finds a linked username, it may pivot and search that username across platforms.
Reports are saved automatically to the reports/ directory after every investigation that produces structured findings.
Here are the commands available inside the REPL:
| Command | Description |
|---|---|
| clear | Reset the conversation memory |
| save | Manually save the last report |
| tools | Show available tools and their status |
| config | Show current configuration |
| help | List all commands |
| exit or Ctrl-D | Quit |
How to Run Individual Tools from the CLI
If you want to run a single tool without the AI layer — for scripting, automation, or quick lookups — use the direct CLI:
# Email account enumeration (default timeout: 120s)
openosint email target@example.com
# With a custom timeout in seconds
openosint email target@example.com -t 60
# Username search across 300+ platforms (default timeout: 180s)
openosint username johndoe99
# Enable verbose output for debugging
openosint -v email target@example.com
The direct CLI doesn't require an Anthropic API key. It runs the underlying binary and prints the output to the terminal.
This mode is useful when you need predictable, scriptable behavior — for example, piping output into another tool or running automated checks.
How to Set Up the MCP Server
OpenOSINT also ships as a Model Context Protocol (MCP) server. This exposes all 9 tools to any MCP-compatible AI client.
How to Register with Claude Code
claude mcp add openosint python /absolute/path/to/OpenOSINT/openosint/mcp_server.py
Verify the registration worked:
claude mcp list
Once registered, you can drive investigations from the Claude Code prompt:
> Investigate target@example.com. If you find a linked username,
trace it across other platforms and compile a full report.
How to Configure Claude Desktop
Add the following to your Claude Desktop config file. On macOS, it lives at ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "openosint": {
      "command": "python",
      "args": ["/absolute/path/to/OpenOSINT/openosint/mcp_server.py"]
    }
  }
}
Restart Claude Desktop after saving the file. The tools will appear in Claude's tool list.
The MCP server uses stdio transport and does not need a persistent background process. Claude Code or Claude Desktop starts it on demand.
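If your Claude Desktop config already lists other MCP servers, you'll want to add the openosint entry without overwriting them. The sketch below shows one way to merge the entry into an existing config dictionary. The add_mcp_server helper and the "other" server in the demo are hypothetical.

```python
import json


def add_mcp_server(config: dict, name: str, script_path: str, python: str = "python") -> dict:
    """Return a copy of `config` with one stdio MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any servers already registered
    servers[name] = {"command": python, "args": [script_path]}
    merged["mcpServers"] = servers
    return merged


# A made-up existing config with one unrelated server
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_mcp_server(
    existing,
    "openosint",
    "/absolute/path/to/OpenOSINT/openosint/mcp_server.py",
)
print(json.dumps(updated, indent=2))
```

Write the result back with json.dump() and restart Claude Desktop, exactly as you would after editing the file by hand.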
How the Agent Loop Works Under the Hood
Here is a simplified version of the agent loop from openosint/agent.py:
import anthropic

# TOOL_SCHEMAS, execute_tool(), and extract_text() are defined elsewhere
# in the project and omitted from this simplified version.

# The async client keeps the event loop free while waiting on the API.
# It reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.AsyncAnthropic()

async def run_investigation(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        response = await client.messages.create(
            model="claude-...",  # model ID elided
            max_tokens=4096,
            tools=TOOL_SCHEMAS,  # JSON schemas for all 9 tools
            messages=messages,
        )
        # Agent is done: extract and return the final report
        if response.stop_reason == "end_turn":
            return extract_text(response)
        # Agent needs a tool: run the real binary
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # Runs holehe, sherlock, etc. as real subprocesses
                    real_output = await execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": real_output,  # real output, never generated
                    })
            # Append assistant turn and real tool results to conversation
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
            continue
        # Any other stop reason (for example "max_tokens") would otherwise
        # resend the same messages forever, so return what we have
        return extract_text(response)
There are a few important things to understand in this code.
The loop runs until stop_reason == "end_turn": The agent decides when it has gathered enough information to write the final report. It may call one tool or ten, depending on what it finds.
execute_tool() runs real subprocesses: It's a thin async wrapper around Python's asyncio.create_subprocess_exec() with a configurable timeout. There's no simulation and no mocked data at any point.
Conversation history is maintained across the entire loop: Each tool result goes back into messages, so the model always has full context of what it found when deciding what to run next.
Tool schemas are defined as JSON: Each tool has a name, description, and parameter schema. The model uses these to know what tools exist and what arguments they accept. Here's a simplified example for search_email:
{
    "name": "search_email",
    "description": (
        "Enumerates online services and social accounts "
        "associated with an email address using holehe."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Target email address"
            }
        },
        "required": ["email"]
    }
}
The same pattern applies to all 9 tools. The model reads these schemas at the start of every request and uses them to decide what's available and how to call it.
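The agent loop relies on execute_tool(), described above as a thin async wrapper around asyncio.create_subprocess_exec() with a configurable timeout. Here is a minimal sketch of what such a wrapper could look like. The run_binary name, the error message wording, and the choice to merge stderr into stdout are assumptions for illustration, not the project's actual implementation.

```python
import asyncio
import sys


async def run_binary(argv: list[str], timeout: float = 120.0) -> str:
    """Run an external program and return its combined output as text.

    Failures come back as descriptive error strings rather than exceptions,
    so the caller can always hand a string to the model as a tool_result.
    """
    try:
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,  # fold stderr into stdout
        )
    except FileNotFoundError:
        return f"Error: '{argv[0]}' was not found in PATH."

    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()        # stop the runaway process
        await proc.wait()  # reap it so no zombie is left behind
        return f"Error: '{argv[0]}' timed out after {timeout:g} seconds."

    return stdout.decode("utf-8", errors="replace")


# Demo with the Python interpreter itself, so no OSINT binary is needed
output = asyncio.run(run_binary([sys.executable, "-c", "print('hello')"], timeout=10))
print(output.strip())  # hello
```

Because arguments are passed as a list and never through a shell, a target string supplied by the model cannot inject extra shell commands.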
Project Architecture
The codebase is organized in five layers. The hard rule across the codebase is that no layer imports from a layer above it:
openosint/tools/          Core tools
                          Async wrappers around external binaries and APIs.
                          Stateless. No AI. No CLI. Pure functions.
openosint/agent.py        AI agent
                          Anthropic tool use loop.
                          Per-session conversation history.
                          Imports from tools/. Nothing imports from agent.py.
openosint/repl.py         Interactive REPL (prompt_toolkit + Rich)
openosint/mcp_server.py   MCP server (stdio transport)
openosint/cli.py          CLI entry point
This separation makes each layer independently testable. The core tools are pure async functions that take a string and return a string — you can unit test them without touching the agent or the CLI.
It also means the AI layer is entirely optional. If you don't have an Anthropic API key, you use the CLI and bypass the agent. The MCP server also operates independently of the agent.
The 9 Available Tools
| Tool | Backend | What it returns |
|---|---|---|
| search_email | holehe | Social accounts linked to an email |
| search_username | sherlock | Accounts across 300+ platforms |
| search_breach | HaveIBeenPwned v3 | Breach names, dates, leaked data types |
| search_whois | python-whois | Registrant, registrar, creation/expiry |
| search_ip | ipinfo.io | Geolocation, ASN, hostname, org |
| search_domain | sublist3r | Subdomain enumeration |
| generate_dorks | built-in | 12 targeted Google dork URLs, no network calls |
| search_paste | psbdmp.ws | Pastebin dump mentions |
| search_phone | phoneinfoga | Carrier, country, line type |
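Of the nine tools, generate_dorks is the easiest to picture because it only formats strings. The sketch below shows the general idea with three made-up templates. The project's actual 12 templates are not listed in this tutorial, so treat the template list here as purely illustrative.

```python
from urllib.parse import quote_plus

# Hypothetical templates for illustration; not the project's real list
DORK_TEMPLATES = [
    '"{target}"',
    '"{target}" site:pastebin.com',
    '"{target}" filetype:pdf',
]


def generate_dorks(target: str) -> str:
    """Return one Google search URL per template, newline-separated.

    No network request is made: the function only builds URLs.
    """
    urls = [
        "https://www.google.com/search?q=" + quote_plus(template.format(target=target))
        for template in DORK_TEMPLATES
    ]
    return "\n".join(urls)


print(generate_dorks("target@example.com"))
```

Like the other tools, it takes a string and returns a string, so you can unit test it without mocking any network or subprocess calls.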
Conclusion
In this tutorial, you learned how to set up and use OpenOSINT — a Python OSINT framework built on Claude's tool use API.
The key takeaway is the design principle: by using native tool use, the agent never generates tool output. It only reads real output from real binaries. This makes it suitable for security research where accuracy matters and hallucination isn't an acceptable failure mode.
To recap the three interfaces:
Run openosint for the interactive AI REPL, best for full investigations with automatic chaining
Run openosint email or openosint username for direct CLI access, best for scripting and automation
Register the MCP server in Claude Code or Claude Desktop to run investigations inside your existing AI environment
The full source code is available on GitHub under the MIT license. Contributions and issues are welcome.
Legal note: OpenOSINT is for authorized security research, penetration testing, and investigative journalism only. Users are solely responsible for compliance with applicable law, including GDPR, CCPA, and the CFAA. See the DISCLAIMER.md for the full notice.