<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ אחיה כהן - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ אחיה כהן - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Fri, 26 Jun 2026 10:11:05 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/achiya-automation/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Connect Your AI Coding Agent to a Browser on macOS  ]]>
                </title>
                <description>
                    <![CDATA[ AI coding agents like Claude Code, Cursor, and the rest have gotten remarkably good at reading and writing code. But the moment they need to look at something on the web, they hit a wall. They can't s ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-connect-your-ai-coding-agent-to-a-browser-on-macos/</link>
                <guid isPermaLink="false">6a1594c1da253d50d4ae1277</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ macOS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Open Source ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Developer Tools ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agentic AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ mcp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ אחיה כהן ]]>
                </dc:creator>
                <pubDate>Tue, 26 May 2026 12:40:33 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/7e77f1c5-6942-4dbe-a3c6-ca74cc4354e5.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>AI coding agents like Claude Code, Cursor, and the rest have gotten remarkably good at reading and writing code. But the moment they need to <em>look at something on the web</em>, they hit a wall. They can't see your staging site. They can't read the error in your analytics dashboard. They can't check whether the form they just built actually submits.</p>
<p>The usual fix is to hand the agent a headless browser: Puppeteer or Playwright driving a fresh Chromium instance. That works, sort of. But a headless Chromium starts every session as a stranger: no logins, no cookies, no sessions. It runs a second browser engine that loads your CPU and spins up your fan. And a growing number of sites simply block it on sight.</p>
<p>There's another option, and on a Mac it's a good one: let the agent drive the <strong>Safari you already use</strong> — the one that's already logged into GitHub, your analytics, your staging environment. That's what Safari MCP does. It's an open-source MCP server that exposes Safari to any MCP-capable agent through around 80 tools, with no Chromium, no WebDriver, and no separate browser to babysit.</p>
<p>In this tutorial you'll connect Safari MCP to an AI agent, run your first automation, and then build something a headless browser fundamentally cannot do: an automation that works inside a page you're logged into. By the end you'll understand not just <em>how</em> to wire this up, but <em>when</em> native browser automation is the right call — and when it isn't.</p>
<p>Here's what you'll need:</p>
<ul>
<li><p>A Mac (Safari MCP is macOS-only — more on that trade-off later)</p>
</li>
<li><p>Node.js 18 or newer</p>
</li>
<li><p>An MCP-capable AI agent — this tutorial uses Claude Code and Cursor, but any MCP client works</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-mcp-and-why-does-browser-automation-need-it">What is MCP, and Why Does Browser Automation Need It?</a></p>
</li>
<li><p><a href="#heading-why-safari-instead-of-chrome-or-playwright">Why Safari Instead of Chrome or Playwright?</a></p>
</li>
<li><p><a href="#heading-installing-safari-mcp">Installing Safari MCP</a></p>
</li>
<li><p><a href="#heading-your-first-automation-reading-a-page">Your First Automation: Reading a Page</a></p>
</li>
<li><p><a href="#heading-the-payoff-automating-a-logged-in-workflow">The Payoff: Automating a Logged-in Workflow</a></p>
</li>
<li><p><a href="#heading-handling-the-tricky-parts">Handling the Tricky Parts</a></p>
</li>
<li><p><a href="#heading-limitations-when-not-to-use-this">Limitations: When Not to Use This</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-what-is-mcp-and-why-does-browser-automation-need-it">What is MCP, and Why Does Browser Automation Need It?</h2>
<p>Before wiring anything up, it helps to know what the "MCP" in Safari MCP stands for.</p>
<p><strong>MCP</strong> is the Model Context Protocol — an open standard for connecting AI agents to external tools and data. Think of it the way you'd think of a USB port. Before USB, every device needed its own connector. MCP is the equivalent of agreeing on one connector: an agent that speaks MCP can use <em>any</em> tool that speaks MCP, with no custom integration code on either side.</p>
<p>An MCP <strong>server</strong> exposes a set of tools. An MCP <strong>client</strong> — your AI agent — discovers those tools and calls them. The server describes each tool (its name, what it does, what arguments it takes) and the agent decides when to call it. When Claude Code decides it needs to read a web page, it doesn't run browser code itself. It calls a tool that some MCP server provides.</p>
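<p>To make the client/server split concrete, here is a minimal sketch of the message a client sends when it calls a tool. MCP messages are JSON-RPC 2.0, and tool invocation uses the <code>tools/call</code> method; the helper name <code>buildToolCall</code> and the ID counter are illustrative assumptions, not part of any particular client.</p>

```javascript
// Sketch: build the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
// buildToolCall is a hypothetical helper; real clients do this for you.
let nextId = 1;

function buildToolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,            // each request gets its own ID so replies can be matched
    method: "tools/call",    // the MCP method for invoking a tool
    params: { name, arguments: args },
  };
}

const request = buildToolCall("safari_navigate", { url: "https://example.com" });
console.log(JSON.stringify(request, null, 2));
```

<p>The tool-call snippets in this tutorial show only the tool name and its arguments, which is the part you actually care about when reading an agent's trace.</p>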
<p>Browser automation is a natural fit for this model. The agent's job is reasoning — "I need to see what's on the staging site, then check the console for errors." The actual mechanics — open a tab, wait for load, read the DOM, capture console output — are well-defined operations that belong behind a stable interface. That interface is exactly what an MCP server provides.</p>
<p>Safari MCP is one such server. It runs as a local process, exposes around 80 browser tools (navigate, click, fill, read, screenshot, extract, and more), and any MCP client can drive it. The agent never touches AppleScript or WebKit internals. It just calls <code>safari_navigate</code> and gets a result.</p>
<p>The "USB port" framing matters for a practical reason: nothing in this tutorial is Claude-specific. Wire Safari MCP into Cursor, Cline, Windsurf, or your own MCP client and the tools are identical.</p>
<h2 id="heading-why-safari-instead-of-chrome-or-playwright">Why Safari Instead of Chrome or Playwright?</h2>
<p>If you've automated a browser before, you've almost certainly used Chrome through Puppeteer, Playwright, or Selenium. So why reach for Safari?</p>
<p>It comes down to three differences that matter once an <em>AI agent</em>, not a test script, is the thing driving the browser.</p>
<p><strong>1. It's your real browser, with your real sessions.</strong> A headless Chromium launched by Playwright is a clean room. It has never logged into anything. If you want your agent to read your analytics dashboard, you first have to solve authentication — store credentials somewhere, script the login, handle two-factor prompts, refresh tokens. Safari MCP skips all of that. It drives the Safari instance you use every day, which is <em>already</em> logged into your dashboards, your GitHub, your email. The agent inherits those sessions for free.</p>
<p><strong>2. It doesn't melt your laptop.</strong> A headless Chromium is a second, full browser engine running alongside the browser you already have open. On a laptop that's real CPU, real memory, and a fan you can hear. Safari MCP uses the WebKit engine that's already running on every Mac — there's no second engine to start. The project measures this at roughly 60% less CPU for the browsing work, and the automation runs with Safari in the background, so it doesn't steal your screen.</p>
<p><strong>3. Sites don't treat it as a bot.</strong> Headless browsers leak. They expose <code>navigator.webdriver</code>, they ship with telltale automation fingerprints, and bot-detection services — Cloudflare's challenge pages, reCAPTCHA, the WAFs in front of a lot of B2B sites — have gotten very good at spotting them. Your real Safari, driven through the operating system, looks like exactly what it is: a person's browser. (To be clear: this is for automating <em>your own</em> accounts and sites — not for evading access controls you don't own.)</p>
<p>The cost of all this is the obvious one: <strong>Safari MCP is macOS-only.</strong> It's built on WebKit and AppleScript, so there's no Windows or Linux story. If your agent runs on a Linux CI box, this isn't your tool. If it runs on your Mac — which, for a coding agent, it very often does — the trade is a good one. We'll come back to limitations honestly at the end.</p>
<h2 id="heading-installing-safari-mcp">Installing Safari MCP</h2>
<p>Installation is genuinely one command, but there are two Safari settings to flip first. Let's do it in order.</p>
<h3 id="heading-step-1-enable-safaris-developer-features">Step 1 — Enable Safari's developer features</h3>
<p>Safari MCP reads and controls pages by running JavaScript inside Safari. Two settings have to be on:</p>
<ol>
<li><p>Open <strong>Safari → Settings → Advanced</strong> and check <strong>"Show features for web developers."</strong> This reveals the Develop menu.</p>
</li>
<li><p>Open the new <strong>Develop</strong> menu and check <strong>"Allow JavaScript from Apple Events."</strong></p>
</li>
</ol>
<p>That second one is the important one. It's what lets an outside process — the MCP server — ask Safari to run JavaScript on a page. Without it, every tool call fails.</p>
<h3 id="heading-step-2-run-the-server">Step 2 — Run the server</h3>
<pre><code class="language-bash">npx safari-mcp
</code></pre>
<p>That's the whole install. <code>npx</code> fetches the package and runs it; there's nothing to build. The first time an agent calls a tool, macOS will pop up a permission prompt — something like <em>"Terminal wants to control Safari."</em> Click <strong>OK</strong>. That's the standard Automation permission, and you can review it later under <strong>System Settings → Privacy &amp; Security → Automation</strong>.</p>
<p>If you'd rather have it installed permanently:</p>
<pre><code class="language-bash">npm install -g safari-mcp
</code></pre>
<h3 id="heading-step-3-tell-your-agent-about-it">Step 3 — Tell your agent about it</h3>
<p>Your AI agent needs to know the server exists. For <strong>Claude Code</strong>, one command does it:</p>
<pre><code class="language-bash">claude mcp add safari -- npx safari-mcp
</code></pre>
<p>For <strong>Cursor</strong>, create <code>.cursor/mcp.json</code> in your project:</p>
<pre><code class="language-json">{
  "mcpServers": {
    "safari": {
      "command": "npx",
      "args": ["safari-mcp"]
    }
  }
}
</code></pre>
<p>The process is the same for every client — Claude Desktop, Cline, Windsurf, Continue, VS Code. You're telling the agent: "there's an MCP server named <code>safari</code>; start it by running <code>npx safari-mcp</code>."</p>
<p>Restart your agent (or reload its MCP servers) and it will connect. In Claude Code you can confirm with the <code>/mcp</code> command, which lists connected servers and their tools. You should see <code>safari</code> with around 80 tools available.</p>
<p>That's it. Your agent now has a browser.</p>
<h2 id="heading-your-first-automation-reading-a-page">Your First Automation: Reading a Page</h2>
<p>Let's prove the wiring works with the simplest possible task: have the agent read a web page.</p>
<p>In your agent, just ask in plain language:</p>
<blockquote>
<p>"Use the safari tools to open example.com and tell me what the page says."</p>
</blockquote>
<p>Behind that request, the agent makes two tool calls. First it navigates:</p>
<pre><code class="language-json">{ "tool": "safari_navigate", "arguments": { "url": "https://example.com" } }
</code></pre>
<p>Then it reads the content:</p>
<pre><code class="language-json">{ "tool": "safari_read_page", "arguments": {} }
</code></pre>
<p><code>safari_read_page</code> returns the page's title, URL, and text content with the HTML stripped out — exactly the form an LLM wants. The agent gets back something like this:</p>
<pre><code class="language-plaintext">Example Domain
https://example.com/
This domain is for use in illustrative examples in documents. You may
use this domain in literature without prior coordination or asking for
permission.
</code></pre>
<p>And it relays that to you. You just watched your agent browse.</p>
<p>A quick note on <em>how</em> the agent should look at a page, because it changes everything downstream. <code>safari_read_page</code> is great for "what does this say." But when the agent needs to <em>act</em> — click a button, fill a field — text isn't enough. It needs to know what's actually there and how to target it. For that, the better first move is <code>safari_snapshot</code>:</p>
<pre><code class="language-json">{ "tool": "safari_snapshot", "arguments": {} }
</code></pre>
<p>This returns an accessibility-tree view of the page, where every interactive element has a stable <code>ref</code> ID:</p>
<pre><code class="language-plaintext">[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"
</code></pre>
<p>Those <code>ref</code> IDs are the agent's reliable handles. CSS selectors break when a page re-renders. A snapshot ref stays valid for the life of the page. Keep that in mind — it's the difference between an automation that works once and one that works every time.</p>
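<p>If you ever want to work with a snapshot in your own scripts, a small parser is enough to turn it into data. This is a rough sketch that assumes every line looks exactly like the sample above (<code>[role ref=ID] "label"</code>); the real output may carry more fields, and the function names here are made up for illustration.</p>

```javascript
// Assumed line shape, taken from the sample snapshot: [role ref=ID] "label" ...
const SNAPSHOT_LINE = /^\[(\w+) ref=(\w+)\] "([^"]*)"/;

// Turn snapshot text into a list of { role, ref, label } records.
function parseSnapshot(text) {
  const elements = [];
  for (const line of text.split("\n")) {
    const match = SNAPSHOT_LINE.exec(line.trim());
    if (match) {
      elements.push({ role: match[1], ref: match[2], label: match[3] });
    }
  }
  return elements;
}

// Look up the ref for an element by role and visible label.
function findRef(elements, role, label) {
  const hit = elements.find((el) => el.role === role && el.label === label);
  return hit ? hit.ref : null;
}

const sample = [
  '[textbox ref=0_8] "Full Name" value=""',
  '[combobox ref=0_10] "Subject"',
  '[button ref=0_15] "Submit"',
].join("\n");

const elements = parseSnapshot(sample);
console.log(findRef(elements, "button", "Submit")); // prints 0_15
```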
<h2 id="heading-the-payoff-automating-a-logged-in-workflow">The Payoff: Automating a Logged-in Workflow</h2>
<p>Reading example.com is a wiring test. Here's the thing a headless browser genuinely cannot do.</p>
<p>Pick a site you're logged into in Safari right now — your analytics, your project board, your CI dashboard. We'll use GitHub, because every developer has an account and the notifications page is a real, mildly annoying chore. The task: <strong>have the agent open your GitHub notifications and summarize what actually needs your attention.</strong></p>
<p>Ask the agent:</p>
<blockquote>
<p>"Open my GitHub notifications, read them, and group them into 'needs a reply' versus 'just FYI'."</p>
</blockquote>
<p>The agent navigates:</p>
<pre><code class="language-json">{ "tool": "safari_navigate", "arguments": { "url": "https://github.com/notifications" } }
</code></pre>
<p>Stop and notice what <em>didn't</em> happen. No login screen. No OAuth dance. No personal access token in an environment variable. Safari is already authenticated as you, so the agent lands directly on your real notifications. A headless Chromium would have hit a login wall here and stopped.</p>
<p>Notification lists load incrementally, so the agent should wait for content before reading. <code>safari_wait_for</code> polls the page until a selector or piece of text appears, or a timeout elapses:</p>
<pre><code class="language-json">{ "tool": "safari_wait_for", "arguments": { "text": "Inbox", "timeout": 10000 } }
</code></pre>
<p>Then it reads. <code>safari_read_page</code> scoped to the notifications region returns the list as clean text:</p>
<pre><code class="language-json">{ "tool": "safari_read_page", "arguments": { "selector": "main" } }
</code></pre>
<p>The agent reasons over that text and hands you the grouped summary. The whole loop — navigate, wait, read, summarize — is a handful of tool calls.</p>
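<p>The navigate, wait, read sequence can be sketched as one small function. The tool names and arguments are the ones used above; <code>callTool</code> is a hypothetical stand-in for however your MCP client invokes a tool, and the stub below exists only so the sketch runs without Safari.</p>

```javascript
// callTool(tool, args) is an assumed interface, not a real client API.
async function readLoggedInPage(callTool, url, waitText) {
  await callTool("safari_navigate", { url });
  await callTool("safari_wait_for", { text: waitText, timeout: 10000 });
  return callTool("safari_read_page", { selector: "main" });
}

// Stub client that records each call and returns canned results.
const calls = [];
async function stubCallTool(tool, args) {
  calls.push({ tool, args });
  return tool === "safari_read_page" ? "3 unread notifications" : "ok";
}

const pending = readLoggedInPage(stubCallTool, "https://github.com/notifications", "Inbox");
pending.then((text) => {
  console.log(calls.map((c) => c.tool).join(", "));
  console.log(text);
});
```

<p>Each step awaits the previous one, which mirrors what the agent does: it never reads before the wait has succeeded.</p>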
<p>When you need data in a precise shape rather than prose — to feed another step, or to write to a file — the agent can reach for <code>safari_evaluate</code>, which runs custom JavaScript on the page and returns whatever you build:</p>
<pre><code class="language-json">{
  "tool": "safari_evaluate",
  "arguments": {
    "expression": "JSON.stringify([...document.querySelectorAll('li')].map(li =&gt; li.innerText.trim()))"
  }
}
</code></pre>
<p>The agent writes that expression itself, against the structure it just saw in the snapshot — you don't hand-author selectors.</p>
<p>You might be thinking: <em>GitHub has an API, why scrape the page?</em> Fair. For GitHub specifically, the API is excellent. But the point generalizes. Most of the dashboards you stare at every day — your billing portal, your error tracker's specific filtered view, a client's analytics, the admin panel of some tool your company pays for — either have no usable API or would cost you an afternoon of OAuth setup to reach. With Safari MCP, "the page I'm already looking at" <em>is</em> the API. The agent reads what you can see, because it's using the browser you're seeing it in.</p>
<p>That's the capability headless automation can't match. Not speed, not features — <strong>access.</strong></p>
<h2 id="heading-handling-the-tricky-parts">Handling the Tricky Parts</h2>
<p>A first automation always looks easy. Three things tend to bite on the second one.</p>
<h3 id="heading-tab-safety-the-agent-must-not-hijack-your-tabs">Tab Safety: The Agent Must Not Hijack Your Tabs</h3>
<p>This is the scariest failure mode: you're typing in a tab, the agent navigates <em>that</em> tab, and your work is gone. Safari MCP guards against it by stamping each automation tab with an identity marker — it uses <code>window.name</code>, which survives page navigations — and resolving "the agent's tab" through that marker on every call. If it can't positively identify its own tab, it refuses to act and raises a re-anchor error rather than guessing.</p>
<p>The practical rule for you: let the agent open its own tab with <code>safari_new_tab</code>, and it will stay in its lane. Don't point it at "the current tab" and assume.</p>
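<p>The "refuse rather than guess" rule is easy to picture in code. The sketch below is not Safari MCP's implementation; it only illustrates the idea, assuming tabs are plain records with a <code>windowName</code> field and using a made-up marker value.</p>

```javascript
// Hypothetical marker the agent stamps into window.name on its own tab.
const AGENT_MARKER = "agent-tab-7f3a";

// Return the one tab carrying the marker, or refuse to act.
function resolveAgentTab(tabs, marker) {
  const matches = tabs.filter((tab) => tab.windowName === marker);
  if (matches.length !== 1) {
    // Zero or several candidates both mean the tab is not positively identified.
    throw new Error("re-anchor required: found " + matches.length + " tabs with marker");
  }
  return matches[0];
}

const tabs = [
  { id: 1, windowName: "" },            // the tab you are typing in
  { id: 2, windowName: AGENT_MARKER },  // the tab the agent opened
];
console.log(resolveAgentTab(tabs, AGENT_MARKER).id); // prints 2
```

<p>The important design choice is the failure mode: when identification is ambiguous, the function throws instead of falling back to whichever tab happens to be in front.</p>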
<h3 id="heading-waiting-for-dynamic-content">Waiting for Dynamic Content</h3>
<p>Modern pages render after load. If the agent reads too early, it reads an empty shell. Don't have it guess with fixed sleeps — use <code>safari_wait_for</code>, which polls for a selector or text until it appears or the timeout elapses:</p>
<pre><code class="language-json">{ "tool": "safari_wait_for", "arguments": { "selector": ".results-list", "timeout": 8000 } }
</code></pre>
<p>This is the single most common fix for "the automation works when I step through it slowly but fails when it runs."</p>
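<p>Polling with a deadline is a general pattern, and a few lines show why it beats a fixed sleep. This is a generic sketch, not the server's code: <code>waitFor</code>, its parameters, and the simulated page are all assumptions made for the example.</p>

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call check() repeatedly until it returns true or the timeout elapses.
async function waitFor(check, timeoutMs, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true;               // condition met: stop immediately
    if (Date.now() >= deadline) return false;     // out of time: report failure
    await sleep(intervalMs);
  }
}

// Simulated page whose content "renders" about 250 ms from now.
const renderedAt = Date.now() + 250;
const contentReady = () => Date.now() >= renderedAt;

const waited = waitFor(contentReady, 2000, 50);
waited.then((found) => console.log(found ? "content appeared" : "timed out"));
```

<p>A fixed sleep has to be long enough for the slowest case, so it wastes time on every fast one. Polling returns as soon as the content exists and still fails cleanly when it never shows up.</p>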
<h3 id="heading-framework-forms">Framework Forms</h3>
<p>Set a React or Vue input's <code>.value</code> directly and the framework never notices — its internal state stays empty, and your "filled" form submits blank. Safari MCP's <code>safari_fill</code> and <code>safari_fill_form</code> use the native value setters and dispatch the <code>input</code> and <code>change</code> events the framework listens for, so React, Vue, Angular, and Svelte state all stay in sync:</p>
<pre><code class="language-json">{
  "tool": "safari_fill_form",
  "arguments": {
    "fields": [
      { "selector": "#email", "value": "jane@example.com" },
      { "selector": "#message", "value": "Looks great." }
    ]
  }
}
</code></pre>
<p>For framework-heavy pages where CSS selectors are fragile, go back to the snapshot refs from your first automation: pass <code>{ "ref": "0_9" }</code> instead of <code>{ "selector": "#email" }</code>. Refs survive re-renders; selectors don't.</p>
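<p>The native-setter technique itself is short. The sketch below shows the general approach under the assumption that the framework tracks changes through <code>input</code> and <code>change</code> events; it is not copied from Safari MCP, and <code>fillLikeAUser</code> and <code>FakeInput</code> are names invented for the example. <code>FakeInput</code> stands in for a real input element so the code runs under Node 18 or newer.</p>

```javascript
function fillLikeAUser(element, value) {
  // Use the setter defined on the element's prototype, which bypasses any
  // instance-level override a framework may have installed on "value".
  const proto = Object.getPrototypeOf(element);
  const descriptor = Object.getOwnPropertyDescriptor(proto, "value");
  if (descriptor && descriptor.set) {
    descriptor.set.call(element, value);
  } else {
    element.value = value;
  }
  // Frameworks update their state in response to these events.
  element.dispatchEvent(new Event("input", { bubbles: true }));
  element.dispatchEvent(new Event("change", { bubbles: true }));
}

// Minimal stand-in for an input element: a value accessor plus event support.
class FakeInput extends EventTarget {
  #value = "";
  get value() { return this.#value; }
  set value(next) { this.#value = next; }
}

const input = new FakeInput();
const seen = [];
input.addEventListener("input", () => seen.push("input"));
input.addEventListener("change", () => seen.push("change"));

fillLikeAUser(input, "jane@example.com");
console.log(input.value, seen);
```

<p>In a browser you would pass a real element, for example the result of <code>document.querySelector("#email")</code>, instead of the stand-in.</p>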
<p>None of these are exotic. They're just the difference between a demo and an automation you'd actually leave running.</p>
<h2 id="heading-limitations-when-not-to-use-this">Limitations: When Not to Use This</h2>
<p>A tool tutorial that only lists strengths isn't worth much. Here's where Safari MCP is the wrong choice.</p>
<p><strong>It's macOS-only, and that's structural.</strong> Safari MCP is built on WebKit and AppleScript. There's no Windows or Linux port coming, because the foundation doesn't exist on those platforms. If your agent runs in Linux CI, use Playwright.</p>
<p><strong>It drives one Safari, on one Mac.</strong> This is browser automation for <em>your</em> machine — a coding agent working alongside you. It is not a fleet. If you need 50 parallel browsers scraping in a data center, that's a headless-Chromium-in-containers job, and Safari MCP is the wrong shape for it.</p>
<p><strong>Cross-browser test suites should stay on Playwright.</strong> If you're writing end-to-end tests that must pass on Chrome, Firefox, and Safari, use the tool built for that. Safari MCP drives exactly one engine: WebKit.</p>
<p><strong>It shares a browser with you.</strong> Because it uses your real Safari, the agent and you are in the same browser. That's the entire point — but it means you should let the agent work in its own tabs and not fight it for the same window.</p>
<p>The honest summary: Safari MCP is built for one specific situation — an AI agent doing real browser work on the Mac you're sitting at, against sites you're already logged into. In that situation it's hard to beat. Outside it, reach for the headless tools. Knowing which situation you're in is the actual skill.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>You've gone from an AI agent that could only see code to one that can see the web — the real web, behind your real logins.</p>
<p>To recap what you did: you learned what MCP is and why browser automation belongs behind that interface. You saw why a native Safari engine beats a headless Chromium for an agent working on your Mac, and you installed Safari MCP with one command and two settings. You ran a first read, and then you did the thing that actually matters: an automation inside a logged-in page, with no auth code at all. Finally, you saw the edges: tab safety, waiting for dynamic content, framework forms, and the cases where you should pick a different tool.</p>
<p>The bigger idea is worth holding onto. An AI agent is only as capable as the tools you connect to it. Giving it a browser — a <em>real</em> one — turns "write me code" into "go look at the staging site, find the bug, and tell me what's wrong." That's a different kind of collaborator.</p>
<p>Safari MCP is open source under the MIT license, and it exposes around 80 tools beyond the handful you used here — screenshots, network inspection, storage, accessibility audits, multi-tab workflows. The repository and full tool reference are at <a href="https://github.com/achiya-automation/safari-mcp">github.com/achiya-automation/safari-mcp</a>. Point your agent at it and see what it does when it can finally look around.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Self-Hosted WhatsApp Bot with n8n and WAHA ]]>
                </title>
                <description>
                    <![CDATA[ WhatsApp is where many of your customers likely already are. For support tickets, order updates, booking reminders, and lead qualification, a WhatsApp channel often converts several times better  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-self-hosted-whatsapp-bot-with-n8n-and-waha/</link>
                <guid isPermaLink="false">6a01e032fca21b0d4b2bb4c1</guid>
                
                    <category>
                        <![CDATA[ whatsapp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ n8n ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ self-hosted ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ אחיה כהן ]]>
                </dc:creator>
                <pubDate>Mon, 11 May 2026 13:57:06 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/28affe4d-9359-4cbb-a311-a2ee9d0829c0.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>WhatsApp is where many of your customers likely already are. For support tickets, order updates, booking reminders, and lead qualification, a WhatsApp channel often converts several times better than email.</p>
<p>But the official WhatsApp Business Cloud API can be slow to onboard, template-restricted for proactive messages, and priced per conversation — which adds up fast at scale.</p>
<p>There's another path: you can run your own WhatsApp HTTP gateway on a small server, connect it to a workflow engine, and keep every message — inbound and outbound — inside infrastructure you control. No monthly conversation fees, no template approvals for routine replies, no third-party middleman holding your customer data.</p>
<p>In this tutorial, you'll build exactly that. By the end, you'll have a WhatsApp bot that:</p>
<ul>
<li><p>Receives every incoming message through a webhook</p>
</li>
<li><p>Routes messages through an n8n workflow</p>
</li>
<li><p>Replies automatically based on keywords, AI, or any API call you want</p>
</li>
<li><p>Runs entirely on your own server, using two open-source tools</p>
</li>
</ul>
<p>You'll use <strong>WAHA</strong> (WhatsApp HTTP API) as the gateway, and <strong>n8n</strong> as the workflow engine. Both run in Docker, both are free for self-hosting, and together they cover everything from a simple auto-reply to a full CRM integration.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-youll-learn">What You'll Learn</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-a-note-on-which-whatsapp-account-to-use">A Note on Which WhatsApp Account to Use</a></p>
</li>
<li><p><a href="#heading-waha-vs-the-official-whatsapp-business-cloud-api">WAHA vs the official WhatsApp Business Cloud API</a></p>
</li>
<li><p><a href="#heading-part-1-understanding-waha">Part 1: Understanding WAHA</a></p>
</li>
<li><p><a href="#heading-part-2-running-waha-with-docker">Part 2: Running WAHA with Docker</a></p>
</li>
<li><p><a href="#heading-part-3-starting-a-whatsapp-session">Part 3: Starting a WhatsApp session</a></p>
</li>
<li><p><a href="#heading-part-4-running-n8n">Part 4: Running n8n</a></p>
</li>
<li><p><a href="#heading-part-5-creating-the-webhook-trigger-in-n8n">Part 5: Creating the Webhook Trigger in n8n</a></p>
</li>
<li><p><a href="#heading-part-6-wiring-waha-to-n8n">Part 6: Wiring WAHA to n8n</a></p>
</li>
<li><p><a href="#heading-part-7-building-the-first-auto-reply">Part 7: Building the first auto-reply</a></p>
</li>
<li><p><a href="#heading-part-8-a-second-example-proactive-booking-confirmations">Part 8: A Second Example — Proactive Booking Confirmations</a></p>
</li>
<li><p><a href="#heading-part-9-going-to-production">Part 9: Going to Production</a></p>
</li>
<li><p><a href="#heading-common-pitfalls">Common Pitfalls</a></p>
</li>
<li><p><a href="#heading-where-to-go-next">Where to Go Next</a></p>
</li>
</ul>
<h2 id="heading-what-youll-learn">What You'll Learn</h2>
<ul>
<li><p>How WAHA works under the hood and when to use it instead of the official Cloud API</p>
</li>
<li><p>How to run WAHA and n8n side by side with Docker Compose</p>
</li>
<li><p>How to scan the QR code and bind a WhatsApp account to your gateway</p>
</li>
<li><p>How to connect WAHA's webhook to an n8n workflow</p>
</li>
<li><p>How to build a keyword-based auto-reply bot</p>
</li>
<li><p>How to send proactive confirmations from a separate workflow</p>
</li>
<li><p>How to harden the setup for production (HTTPS, API keys, rate limits, Queue Mode)</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>A Linux server (any VPS works — 2 GB of RAM is enough for a small bot)</p>
</li>
<li><p>Docker and Docker Compose installed</p>
</li>
<li><p>A public hostname with DNS pointing at the server, or an ngrok tunnel for local testing</p>
</li>
<li><p>A WhatsApp account you're willing to dedicate to the bot (more on that below)</p>
</li>
<li><p>Basic familiarity with JSON and HTTP requests</p>
</li>
</ul>
<p>You don't need prior n8n experience. If you can drag a box and wire it to another box, you can build the flow.</p>
<h2 id="heading-a-note-on-which-whatsapp-account-to-use">A Note on Which WhatsApp Account to Use</h2>
<p>WAHA works by running an actual WhatsApp Web session inside a headless Chromium process. It logs in as a real account — the same way you would open web.whatsapp.com in your browser. Meta doesn't officially endorse this approach for commercial use at scale, and heavy volume from a single number can lead to a ban.</p>
<p>For that reason, use a dedicated number for the bot. Don't use your personal WhatsApp. Get a second SIM, eSIM, or a VoIP number that supports WhatsApp activation. Keep outbound volume reasonable, and you'll be fine for most small-business use cases.</p>
<p>If you plan to send thousands of marketing messages per day, switch to the official WhatsApp Business Cloud API — that's what it exists for. This tutorial is aimed at the middle ground: support bots, order updates, booking confirmations, and similar conversational flows where you need real-time control without enterprise pricing.</p>
<h2 id="heading-waha-vs-the-official-whatsapp-business-cloud-api">WAHA vs the official WhatsApp Business Cloud API</h2>
<p>Before writing any code, it helps to understand when each option is the right fit.</p>
<table>
<thead>
<tr>
<th>Dimension</th>
<th>WAHA (self-hosted)</th>
<th>WhatsApp Cloud API (Meta)</th>
</tr>
</thead>
<tbody><tr>
<td>Onboarding</td>
<td>Scan a QR code — ready in minutes</td>
<td>Business verification, app review — days to weeks</td>
</tr>
<tr>
<td>Cost</td>
<td>Server cost only</td>
<td>Per-conversation pricing</td>
</tr>
<tr>
<td>Template approval</td>
<td>Not needed</td>
<td>Required for proactive messages outside the 24-hour window</td>
</tr>
<tr>
<td>Session model</td>
<td>One WhatsApp Web session per Core container</td>
<td>Native API, no web session</td>
</tr>
<tr>
<td>Risk</td>
<td>Account ban possible at high unsolicited volume</td>
<td>Rate limits but no ban for normal use</td>
</tr>
<tr>
<td>Vendor lock-in</td>
<td>None — pure open source</td>
<td>Tied to Meta's API and pricing</td>
</tr>
<tr>
<td>Best for</td>
<td>Support bots, small-team workflows, internal tools</td>
<td>High-volume marketing, regulated industries, &gt;100k monthly messages</td>
</tr>
</tbody></table>
<p>Neither is strictly better. If you run a support team for a small business, WAHA is often the pragmatic choice. If you're a bank sending millions of transactional messages, you want the Cloud API. Many teams run both — WAHA for conversational support, Cloud API for bulk transactional traffic.</p>
<h2 id="heading-part-1-understanding-waha">Part 1: Understanding WAHA</h2>
<p>WAHA is an open-source project that wraps WhatsApp Web behind a clean REST API. You <code>POST /api/sendText</code> with a chat ID and a message, and WAHA sends it. You configure a webhook URL, and WAHA <code>POST</code>s to that URL every time a message arrives.</p>
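<p>Under the hood, WAHA automates the session through one of three engines. The <code>WEBJS</code> engine used in this tutorial (built on <code>whatsapp-web.js</code>) spawns a Chromium instance and drives WhatsApp Web inside it, while <code>NOWEB</code> and <code>GOWS</code> talk to WhatsApp over a WebSocket without a browser. Your code doesn't see any of that complexity; you just see an HTTP API.</p>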
<p>The project ships in two flavors:</p>
<ul>
<li><p><strong>WAHA Core</strong>: free, Apache 2.0 licensed, one active session per container, community support.</p>
</li>
<li><p><strong>WAHA Plus</strong>: commercial license, multi-session support, priority support, and access to advanced endpoints.</p>
</li>
</ul>
<p>For most developers building a single bot, Core is enough. You can always upgrade later.</p>
<p>Official docs live at <a href="https://waha.devlike.pro/">waha.devlike.pro</a>. Keep that open in another tab — we'll reference specific endpoints as we go.</p>
<h2 id="heading-part-2-running-waha-with-docker">Part 2: Running WAHA with Docker</h2>
<p>Create a fresh directory for the project:</p>
<pre><code class="language-bash">mkdir whatsapp-bot &amp;&amp; cd whatsapp-bot
</code></pre>
<p>Create a <code>docker-compose.yml</code> file:</p>
<pre><code class="language-yaml">services:
  waha:
    image: devlikeapro/waha:latest
    container_name: waha
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - WAHA_DASHBOARD_ENABLED=true
      - WAHA_DASHBOARD_USERNAME=admin
      - WAHA_DASHBOARD_PASSWORD=change-me-now
      - WHATSAPP_API_KEY=super-secret-key-change-me
      - WHATSAPP_DEFAULT_ENGINE=WEBJS
    volumes:
      - ./waha-sessions:/app/.sessions
</code></pre>
<p>A few things to notice:</p>
<ul>
<li><p>The dashboard username and password protect the web UI at <code>http://your-server:3000</code>. Always change the defaults before you expose the port publicly.</p>
</li>
<li><p><code>WHATSAPP_API_KEY</code> is the key every HTTP request to WAHA must include in the <code>X-Api-Key</code> header. Treat it like a database password.</p>
</li>
<li><p><code>WHATSAPP_DEFAULT_ENGINE=WEBJS</code> uses the mature <code>whatsapp-web.js</code> engine. WAHA also supports <code>NOWEB</code> and <code>GOWS</code> engines with different trade-offs — WEBJS is the safest default for a first deployment.</p>
</li>
<li><p>The volume mount persists the session across restarts. Without it, every container rebuild forces you to scan the QR code again.</p>
</li>
</ul>
<p>Start the container:</p>
<pre><code class="language-bash">docker compose up -d
docker compose logs -f waha
</code></pre>
<p>Within about 20 seconds WAHA finishes booting. Visit <code>http://your-server:3000</code> and log in with the dashboard credentials.</p>
<h2 id="heading-part-3-starting-a-whatsapp-session">Part 3: Starting a WhatsApp session</h2>
<p>WAHA calls each WhatsApp account a "session." You can have one session at a time on WAHA Core.</p>
<p>From the dashboard, click <strong>Start New Session</strong> and name it <code>default</code>. WAHA displays a QR code.</p>
<p>On your phone:</p>
<ol>
<li><p>Open WhatsApp.</p>
</li>
<li><p>Tap the three-dot menu (Android) or Settings (iOS).</p>
</li>
<li><p>Tap Linked Devices → Link a Device.</p>
</li>
<li><p>Point the camera at the QR code on your screen.</p>
</li>
</ol>
<p>Within a few seconds the dashboard shows <code>WORKING</code> status. Your session is live.</p>
<p>You can also do this over the API. Start the session (<code>default</code> is the session name, encoded in the URL path):</p>
<pre><code class="language-bash">curl -X POST http://your-server:3000/api/sessions/default/start \
  -H "X-Api-Key: super-secret-key-change-me"
</code></pre>
<p>The call is idempotent — if the session is already running, nothing happens.</p>
<p>Fetch the QR as a PNG:</p>
<pre><code class="language-bash">curl http://your-server:3000/api/default/auth/qr \
  -H "X-Api-Key: super-secret-key-change-me" \
  -H "Accept: image/png" \
  --output qr.png
</code></pre>
<p>Scan and you're in.</p>
<p>Test that the session works by sending a message to yourself:</p>
<pre><code class="language-bash">curl -X POST http://your-server:3000/api/sendText \
  -H "X-Api-Key: super-secret-key-change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "session": "default",
    "chatId": "15555550123@c.us",
    "text": "Hello from WAHA!"
  }'
</code></pre>
<p>Replace <code>15555550123</code> with your own number (country code plus number, no <code>+</code>, no spaces, no dashes). The <code>@c.us</code> suffix marks it as an individual chat. Groups use <code>@g.us</code>.</p>
<p>If the message lands on your phone — congratulations. The gateway works.</p>
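<p>If you would rather script this than type curl commands, the same request can be built in Python. This is a minimal sketch using only the standard library; <code>to_chat_id</code> and <code>build_send_text</code> are hypothetical helper names, and the host and key are the placeholders from the examples above. It builds the request without sending it, so you can inspect it first.</p>
<pre><code class="language-python">import json
import urllib.request

WAHA_URL = "http://your-server:3000"      # placeholder host from the curl examples
API_KEY = "super-secret-key-change-me"    # the WHATSAPP_API_KEY from docker-compose.yml


def to_chat_id(phone: str):
    """Turn '+1 (555) 555-0123' into '15555550123@c.us'."""
    digits = "".join(ch for ch in phone if ch in "0123456789")
    if not digits:
        raise ValueError(f"no digits found in {phone!r}")
    return f"{digits}@c.us"


def build_send_text(phone: str, text: str, session: str = "default"):
    """Build (but do not send) the POST /api/sendText request."""
    body = {"session": session, "chatId": to_chat_id(phone), "text": text}
    return urllib.request.Request(
        f"{WAHA_URL}/api/sendText",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


request = build_send_text("+1 (555) 555-0123", "Hello from WAHA!")
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
</code></pre>
<p>Stripping everything except digits up front means a number pasted with a <code>+</code>, spaces, or dashes still produces a valid individual chat ID.</p>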
<h2 id="heading-part-4-running-n8n">Part 4: Running n8n</h2>
<p>Add an <code>n8n</code> service to your <code>docker-compose.yml</code> alongside WAHA:</p>
<pre><code class="language-yaml">services:
  waha:
    # ... existing config

  n8n:
    image: n8nio/n8n:latest
    container_name: n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      - N8N_HOST=n8n.example.com
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - WEBHOOK_URL=https://n8n.example.com/
      - GENERIC_TIMEZONE=UTC
    volumes:
      - ./n8n-data:/home/node/.n8n
</code></pre>
<p>Replace <code>n8n.example.com</code> with your real domain. For purely local testing, set:</p>
<pre><code class="language-yaml">- N8N_HOST=localhost
- N8N_PROTOCOL=http
- WEBHOOK_URL=http://localhost:5678/
</code></pre>
<p>If you want to test webhooks from your laptop without a server, run <code>ngrok http 5678</code> in another terminal and use the ngrok HTTPS URL as <code>WEBHOOK_URL</code>. n8n uses <code>WEBHOOK_URL</code> to tell external services where to POST — get this wrong and your webhooks will 404.</p>
<p>Start the stack:</p>
<pre><code class="language-bash">docker compose up -d
</code></pre>
<p>Visit <code>http://your-server:5678</code>. On the first visit, n8n walks you through creating an owner account (email and password). Every subsequent visit requires that login. For extra safety in production, put n8n behind a reverse proxy with an allow-list or an additional auth layer — we'll set that up later.</p>
<h2 id="heading-part-5-creating-the-webhook-trigger-in-n8n">Part 5: Creating the Webhook Trigger in n8n</h2>
<p>Click Create Workflow. You'll see an empty canvas.</p>
<p>Add a Webhook node and configure it:</p>
<ul>
<li><p><strong>HTTP Method</strong>: POST</p>
</li>
<li><p><strong>Path</strong>: <code>whatsapp</code> (this becomes part of the URL)</p>
</li>
<li><p><strong>Response Mode</strong>: Respond Immediately</p>
</li>
<li><p><strong>Response Data</strong>: First Entry JSON</p>
</li>
</ul>
<p>Click Listen for Test Event. n8n shows you two URLs: a test URL and a production URL. Copy the production URL. It looks like this:</p>
<pre><code class="language-plaintext">https://n8n.example.com/webhook/whatsapp
</code></pre>
<p>Don't use the <code>webhook-test</code> URL: it only works while the editor is listening for a test event. You want <code>webhook</code>, which starts responding once the workflow is activated.</p>
<h2 id="heading-part-6-wiring-waha-to-n8n">Part 6: Wiring WAHA to n8n</h2>
<p>WAHA can POST to a webhook on every WhatsApp event. Tell it where to send those events.</p>
<p>In the WAHA dashboard, open your session and set the webhook URL. Or do it over the API:</p>
<pre><code class="language-bash">curl -X PUT http://your-server:3000/api/sessions/default \
  -H "X-Api-Key: super-secret-key-change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "webhooks": [
        {
          "url": "https://n8n.example.com/webhook/whatsapp",
          "events": ["message", "session.status"]
        }
      ]
    }
  }'
</code></pre>
<p>The <code>message</code> event fires on every inbound message. <code>session.status</code> fires when the session connects, disconnects, or reconnects — which is useful for alerting when your bot goes down.</p>
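<p>Test it. The production URL only accepts requests while the workflow is active, so activate the workflow first (or temporarily point WAHA at the test URL and click Listen for Test Event). Then, from another phone, send a WhatsApp message to your bot's number. Within a second or two the event shows up in n8n: on the canvas for a test run, or under Executions for an active workflow.</p>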
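<p>WAHA's request body looks roughly like this (n8n's Webhook node exposes it under <code>body</code>, alongside <code>headers</code>, <code>params</code>, and <code>query</code>):</p>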
<pre><code class="language-json">{
  "event": "message",
  "session": "default",
  "payload": {
    "id": "false_15555550123@c.us_3EB0...",
    "from": "15555550123@c.us",
    "body": "Hello",
    "timestamp": 1713801234,
    "fromMe": false
  }
}
</code></pre>
<p>Everything you need is in <code>payload</code>: who sent it (<code>from</code>), what they said (<code>body</code>), and when (<code>timestamp</code>).</p>
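<p>To make the field layout concrete, here is a small Python sketch that pulls those three values out of an event shaped like the sample above. The function name <code>parse_message_event</code> is hypothetical, and skipping <code>fromMe</code> messages is an assumption about what a bot usually wants, not something WAHA requires.</p>
<pre><code class="language-python">from datetime import datetime, timezone


def parse_message_event(event: dict):
    """Return (sender, text, sent_at) for an inbound message, or None to skip it."""
    if event.get("event") != "message":
        return None                      # e.g. session.status events
    payload = event.get("payload", {})
    if payload.get("fromMe"):
        return None                      # ignore messages the bot sent itself
    sender = payload.get("from", "")
    text = (payload.get("body") or "").strip()
    sent_at = datetime.fromtimestamp(payload.get("timestamp", 0), tz=timezone.utc)
    return sender, text, sent_at


sample = {
    "event": "message",
    "session": "default",
    "payload": {
        "id": "false_15555550123@c.us_3EB0...",
        "from": "15555550123@c.us",
        "body": "Hello",
        "timestamp": 1713801234,
        "fromMe": False,
    },
}
print(parse_message_event(sample))
</code></pre>
<p>Treating a missing <code>body</code> as an empty string keeps the parser from failing on messages that carry media and no text.</p>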
<h2 id="heading-part-7-building-the-first-auto-reply">Part 7: Building the first auto-reply</h2>
<p>A bot that only listens is boring. Let's make it answer.</p>
<p>You'll build a tiny keyword router: if the user sends <code>hi</code> or <code>hello</code>, the bot greets them. If they send <code>price</code>, it sends a pricing message. Anything else gets a fallback.</p>
<p>After the Webhook node, add a Switch node.</p>
<p>Configure the Switch node:</p>
<ul>
<li><p><strong>Mode</strong>: Rules</p>
</li>
<li><p><strong>Value</strong>: <code>{{ $json.body.payload.body.toLowerCase().trim() }}</code> (the Webhook node nests the incoming request body under <code>body</code>)</p>
</li>
<li><p>Add routing rules:</p>
<ul>
<li><p>Rule 1: equals <code>hi</code> — output 0</p>
</li>
<li><p>Rule 2: equals <code>hello</code> — output 0</p>
</li>
<li><p>Rule 3: equals <code>price</code> — output 1</p>
</li>
<li><p>Fallback output: 2</p>
</li>
</ul>
</li>
</ul>
<p>After the Switch, add three HTTP Request nodes, one per output.</p>
<p>Configure each HTTP Request node identically, except for the body text:</p>
<ul>
<li><p><strong>Method</strong>: POST</p>
</li>
<li><p><strong>URL</strong>: <code>http://waha:3000/api/sendText</code> (inside the Docker network you can reach WAHA by its service name; from outside, use the full public URL)</p>
</li>
<li><p><strong>Send Headers</strong>: on</p>
<ul>
<li><p><code>X-Api-Key</code>: <code>super-secret-key-change-me</code></p>
</li>
<li><p><code>Content-Type</code>: <code>application/json</code></p>
</li>
</ul>
</li>
<li><p><strong>Send Body</strong>: on</p>
<ul>
<li><p><strong>Body Content Type</strong>: JSON</p>
</li>
<li><p><strong>Specify Body</strong>: Using JSON</p>
</li>
</ul>
</li>
</ul>
<p>For the greeting node, the JSON body is:</p>
<pre><code class="language-json">{
  "session": "default",
  "chatId": "{{ $('Webhook').item.json.body.payload.from }}",
  "text": "Hi! I'm the bot. Send 'price' to see pricing, or anything else for help."
}
</code></pre>
<p>For the pricing node:</p>
<pre><code class="language-json">{
  "session": "default",
  "chatId": "{{ $('Webhook').item.json.body.payload.from }}",
  "text": "Our plans start at $49/month. Reply 'sales' to talk to a human."
}
</code></pre>
<p>For the fallback:</p>
<pre><code class="language-json">{
  "session": "default",
  "chatId": "{{ $('Webhook').item.json.body.payload.from }}",
  "text": "I didn't catch that. Try 'hi' or 'price'."
}
</code></pre>
<p>The <code>{{ ... }}</code> syntax is an n8n expression: at runtime it pulls values from earlier nodes. Switch the JSON field to expression mode so n8n evaluates it. The Webhook node exposes WAHA's request body under <code>body</code>, which is why the path runs through <code>json.body.payload</code>.</p>
<p>Connect the Switch outputs to their matching HTTP Request nodes. Save the workflow. Click Activate in the top-right.</p>
<p>Send <code>hi</code> to your bot from any phone. It should reply within a second.</p>
<p>Congratulations — you have a WhatsApp bot running entirely on your own infrastructure.</p>
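<p>Stripped of the n8n UI, the Switch node and its three HTTP Request nodes amount to a lookup table. The Python sketch below mirrors that logic so you can reason about it or unit-test it outside n8n; the names <code>ROUTES</code>, <code>REPLIES</code>, and <code>route</code> are illustrative only.</p>
<pre><code class="language-python">REPLIES = {
    "greeting": "Hi! I'm the bot. Send 'price' to see pricing, or anything else for help.",
    "pricing": "Our plans start at $49/month. Reply 'sales' to talk to a human.",
    "fallback": "I didn't catch that. Try 'hi' or 'price'.",
}

# Keyword to reply key, matching the Switch rules (outputs 0 and 1).
ROUTES = {"hi": "greeting", "hello": "greeting", "price": "pricing"}


def route(body):
    """Normalize the text like the Switch expression does, then match it exactly."""
    keyword = (body or "").lower().strip()
    return REPLIES[ROUTES.get(keyword, "fallback")]


for text in ["  Hello ", "PRICE", "what time do you open?"]:
    print(repr(text), "gets:", route(text))
</code></pre>
<p>Because the match is exact, a message like <code>hi there</code> falls through to the fallback, which is the same behavior the Switch rules give you.</p>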
<h2 id="heading-part-8-a-second-example-proactive-booking-confirmations">Part 8: A Second Example — Proactive Booking Confirmations</h2>
<p>Auto-reply is useful. Proactive outbound is where the value really compounds. Here's a second workflow that sends a booking confirmation whenever a new row lands in a database.</p>
<p>Create a second workflow in n8n. Use one of these triggers:</p>
<ul>
<li><p><strong>Schedule Trigger</strong> — poll a database every minute for new rows</p>
</li>
<li><p><strong>Webhook Trigger</strong> — listen for a notification from your booking system</p>
</li>
<li><p><strong>Database Trigger</strong> (Postgres, MySQL, Supabase) — react to inserts in real time</p>
</li>
</ul>
<p>For this example, use a Schedule Trigger set to every minute, followed by a Postgres <strong>Execute Query</strong> node that reads pending confirmations:</p>
<pre><code class="language-sql">SELECT id, customer_phone, service_name, booking_time
FROM bookings
WHERE confirmation_sent = false
LIMIT 20;
</code></pre>
<p>After the Postgres node, add an HTTP Request node pointing to the same WAHA <code>sendText</code> endpoint you used earlier. The body:</p>
<pre><code class="language-json">{
  "session": "default",
  "chatId": "{{ $json.customer_phone }}@c.us",
  "text": "Hi! Your booking for {{ $json.service_name }} on {{ $json.booking_time }} is confirmed. Reply 'change' to reschedule."
}
</code></pre>
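<p>Finally, add a second Postgres node that marks the booking as sent. At this point <code>$json</code> holds WAHA's HTTP response rather than the booking row, so reference the query node by name (<code>Postgres</code> here, the default node name; adjust it to match your workflow):</p>
<pre><code class="language-sql">UPDATE bookings
SET confirmation_sent = true, confirmation_sent_at = NOW()
WHERE id = {{ $('Postgres').item.json.id }};
</code></pre>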
<p>Activate the workflow. Every minute, n8n pulls pending bookings, sends a WhatsApp confirmation, and marks them done.</p>
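<p>One detail worth getting right in this loop is ordering: a booking should only be marked as sent after its message actually went out. The following Python sketch shows that idea outside n8n. The row fields match the SQL above, while <code>confirmation_body</code>, <code>process_pending</code>, and the <code>send</code> callback are hypothetical names standing in for the HTTP Request and Postgres nodes.</p>
<pre><code class="language-python">def confirmation_body(row: dict, session: str = "default"):
    """Build the sendText body for one pending booking row."""
    return {
        "session": session,
        "chatId": f"{row['customer_phone']}@c.us",
        "text": (
            f"Hi! Your booking for {row['service_name']} on "
            f"{row['booking_time']} is confirmed. Reply 'change' to reschedule."
        ),
    }


def process_pending(rows, send):
    """Send one confirmation per row; return the ids that are safe to mark as sent."""
    sent_ids = []
    for row in rows:
        try:
            send(confirmation_body(row))
        except Exception as error:
            # Leave the row pending so the next scheduled run retries it.
            print(f"booking {row['id']} failed: {error}")
            continue
        sent_ids.append(row["id"])
    return sent_ids


rows = [
    {"id": 1, "customer_phone": "15555550123",
     "service_name": "Haircut", "booking_time": "2026-07-01 10:00"},
]
outbox = []
print(process_pending(rows, outbox.append))
print(outbox[0]["text"])
</code></pre>
<p>In the n8n workflow, the equivalent safeguard is to make sure the HTTP Request node stops the branch on error, so the <code>UPDATE</code> never runs for a message that failed.</p>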
<p>This pattern generalizes. Replace the SQL with a call to Shopify for order confirmations, Stripe for receipt messages, or Calendly for appointment reminders. The WhatsApp layer stays the same — only the source of truth changes.</p>
<h2 id="heading-part-9-going-to-production">Part 9: Going to Production</h2>
<p>The setup above works, but it's not yet production-ready. Here's what to harden before you point real customers at it.</p>
<h3 id="heading-1-put-everything-behind-https">1. Put Everything Behind HTTPS</h3>
<p>Never expose n8n or WAHA directly on plain HTTP. Put a reverse proxy in front. Caddy is the easiest choice because it handles Let's Encrypt automatically.</p>
<p>A minimal <code>Caddyfile</code>:</p>
<pre><code class="language-plaintext">n8n.example.com {
    reverse_proxy n8n:5678
}

waha.example.com {
    reverse_proxy waha:3000
}
</code></pre>
<p>Run Caddy as another service in the same <code>docker-compose.yml</code>, publishing ports 80 and 443. TLS certificates are issued and renewed automatically.</p>
<h3 id="heading-2-rotate-the-api-keys">2. Rotate the API Keys</h3>
<p>Don't ship <code>super-secret-key-change-me</code> to production. Generate a real key:</p>
<pre><code class="language-bash">openssl rand -hex 32
</code></pre>
<p>Put it in a <code>.env</code> file, reference it as <code>${WHATSAPP_API_KEY}</code> in <code>docker-compose.yml</code>, and add <code>.env</code> to your <code>.gitignore</code>.</p>
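<p>If you script your deployments in Python, the same two ideas (generate a strong key, read it from the environment) look roughly like this. It is a sketch under assumptions: <code>new_api_key</code> and <code>load_api_key</code> are hypothetical helpers, and rejecting the tutorial placeholder is a safety check added here, not a WAHA feature.</p>
<pre><code class="language-python">import os
import secrets

PLACEHOLDER = "super-secret-key-change-me"


def new_api_key():
    """32 random bytes as 64 hex characters, like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def load_api_key(env=os.environ):
    """Read the key from the environment and refuse the tutorial placeholder."""
    key = env.get("WHATSAPP_API_KEY", "")
    if not key or key == PLACEHOLDER:
        raise RuntimeError("set WHATSAPP_API_KEY to a real secret")
    return key


print(new_api_key())
</code></pre>
<p>Failing fast on the placeholder catches the most common mistake, which is deploying the example value unchanged.</p>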
<h3 id="heading-3-rate-limit-outbound-messages">3. Rate-limit Outbound Messages</h3>
<p>WhatsApp bans accounts that send too many messages too fast. A safe outbound rate for a fresh number is well under 20 messages per minute. For bursty replies, add an n8n Wait node between sends, or queue outgoing messages through a small custom function node that sleeps between requests.</p>
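<p>As an illustration of what "sleep between requests" means in practice, here is a minimal pacing sketch in Python. The <code>Pacer</code> class and <code>send_paced</code> function are hypothetical, and the rate you pass in is your own judgment call; the clock and sleep functions are injectable only so the logic can be tested without waiting.</p>
<pre><code class="language-python">import time


class Pacer:
    """Space out calls so they never exceed max_per_minute."""

    def __init__(self, max_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_slot = None

    def wait(self):
        """Block until the next send slot, then reserve the one after it."""
        now = self.clock()
        if self.next_slot is None:
            self.next_slot = now             # the first call goes out immediately
        delay = max(0.0, self.next_slot - now)
        if delay:
            self.sleep(delay)
        self.next_slot = max(now, self.next_slot) + self.interval
        return delay


def send_paced(messages, send, pacer):
    """Send each message through `send`, waiting for a free slot before each one."""
    for message in messages:
        pacer.wait()
        send(message)
</code></pre>
<p>For example, <code>send_paced(outbox, send, Pacer(max_per_minute=10))</code> would leave six seconds between sends. Inside n8n, a Wait node between iterations achieves the same spacing.</p>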
<h3 id="heading-4-scale-n8n-with-queue-mode">4. Scale n8n with Queue Mode</h3>
<p>By default, n8n runs everything in a single process. That's fine for low volume. For higher throughput, switch to Queue Mode:</p>
<ul>
<li><p>Add a Redis container.</p>
</li>
<li><p>Run one <code>n8n</code> main container (the web UI and webhook receiver).</p>
</li>
<li><p>Run one or more <code>n8n-worker</code> containers that pull jobs from the queue.</p>
</li>
</ul>
<p>Queue Mode is documented at <a href="https://docs.n8n.io/hosting/scaling/queue-mode/">docs.n8n.io/hosting/scaling/queue-mode/</a>. At minimum, set <code>EXECUTIONS_MODE=queue</code> and <code>QUEUE_BULL_REDIS_HOST=redis</code> on the main and worker containers. They must also share the same database and <code>N8N_ENCRYPTION_KEY</code>, and n8n recommends Postgres rather than the default SQLite for this mode. Queue Mode decouples incoming webhooks from workflow execution: the webhook responds in milliseconds while workers chew through the queue in the background.</p>
<h3 id="heading-5-monitor-the-session">5. Monitor the Session</h3>
<p>WhatsApp Web sessions drop. The phone loses connection, WhatsApp rotates security tokens, or your server reboots. Catch those drops early.</p>
<p>Subscribe to the <code>session.status</code> webhook event in WAHA. When status becomes <code>FAILED</code> or <code>STOPPED</code>, route it to an n8n workflow that posts to Slack, sends an email, or pages you. The faster you know, the faster you recover.</p>
<p>For overall uptime, point something like Uptime Kuma at <code>GET /api/sessions/default</code> on WAHA. If WAHA reports <code>WORKING</code>, you're fine. Anything else triggers an alert.</p>
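<p>The alerting rule is simple enough to sketch in a few lines of Python. This assumes the session JSON carries <code>name</code> and <code>status</code> fields; check the WAHA docs for the exact response shape in your version. <code>alert_for</code> is a hypothetical helper, and <code>SCAN_QR_CODE</code> is singled out only because it calls for a different fix than a crash.</p>
<pre><code class="language-python">def alert_for(session: dict):
    """Return an alert message, or None when the session is healthy."""
    name = session.get("name", "unknown")
    status = session.get("status", "UNKNOWN")
    if status == "WORKING":
        return None
    if status == "SCAN_QR_CODE":
        return f"Session {name} needs a QR rescan"
    return f"Session {name} is {status}"


for sample in (
    {"name": "default", "status": "WORKING"},
    {"name": "default", "status": "FAILED"},
    {"name": "default", "status": "SCAN_QR_CODE"},
):
    print(sample["status"], "gives:", alert_for(sample))
</code></pre>
<p>Treating every status other than <code>WORKING</code> as alert-worthy matches the rule above: a missing or unexpected status is a reason to look, not a reason to stay quiet.</p>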
<h3 id="heading-6-back-up-the-sessions-volume">6. Back Up the Sessions Volume</h3>
<p>The <code>waha-sessions</code> directory contains the logged-in state. If you lose it, you have to scan the QR code again — possibly from a phone that's no longer handy. Back it up nightly. A simple cron job with <code>tar</code> and <code>rclone</code> to S3-compatible storage is plenty.</p>
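<p>If you prefer Python to a shell one-liner for the archive step, a sketch with the standard <code>tarfile</code> module could look like this. The function name <code>backup_sessions</code> and the archive naming scheme are assumptions; uploading the result to remote storage is left to whatever tool you already use.</p>
<pre><code class="language-python">import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_sessions(source: Path, dest_dir: Path):
    """Write a timestamped .tar.gz of the sessions directory and return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest_dir / f"waha-sessions-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(source, arcname=source.name)
    return archive


# Example (paths are placeholders):
# backup_sessions(Path("./waha-sessions"), Path("./backups"))
</code></pre>
<p>The session files are credentials for your WhatsApp account, so store the archives somewhere access-controlled.</p>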
<h3 id="heading-7-add-a-live-agent-handoff">7. Add a Live-Agent Handoff</h3>
<p>Not every conversation should stay with the bot. When a user types <code>human</code> — or when your intent classifier can't answer confidently — hand off to a real agent.</p>
<p>Chatwoot is a solid open-source option: it has a dedicated WhatsApp channel, agent inbox, team assignment, and conversation history. The handoff is an n8n branch that stops processing bot replies and forwards the message stream to Chatwoot's API.</p>
<h2 id="heading-common-pitfalls">Common Pitfalls</h2>
<p>A few issues catch almost everyone on their first production deploy.</p>
<h3 id="heading-webhooks-timing-out">Webhooks Timing Out</h3>
<p>WAHA gives your webhook a few seconds to respond. If your n8n workflow is slow (calling an LLM, hitting a remote API), the webhook times out and WAHA retries, potentially causing duplicate replies.</p>
<p>Fix: make the webhook return <code>200</code> immediately and offload the slow work. In n8n, set the Webhook node's Response Mode to <em>Using Respond to Webhook Node</em>, add a Respond to Webhook node as the first step with a <code>200</code> and empty body, then do the heavy lifting after that.</p>
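<p>The underlying pattern, acknowledge first and work later, is easier to see in plain code. This Python sketch uses an in-process queue and one worker thread; <code>accept</code>, <code>worker</code>, and <code>jobs</code> are hypothetical names, and a production setup would use a durable queue rather than memory.</p>
<pre><code class="language-python">import queue
import threading

jobs = queue.Queue()


def accept(event: dict):
    """Webhook handler: enqueue the event and answer right away."""
    jobs.put(event)
    return 200, ""


def worker(handle):
    """Background loop: run the slow handler for each queued event."""
    while True:
        event = jobs.get()
        if event is None:                # sentinel value: stop the worker
            break
        handle(event)
        jobs.task_done()


processed = []
thread = threading.Thread(target=worker, args=(processed.append,))
thread.start()

status, body = accept({"event": "message", "payload": {"body": "hi"}})
jobs.put(None)                           # stop once the queue is drained
thread.join()
print(status, processed)
</code></pre>
<p>The handler's response time no longer depends on how long the slow work takes, which is what keeps WAHA from timing out and retrying.</p>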
<h3 id="heading-duplicate-messages">Duplicate Messages</h3>
<p>WAHA delivers the same <code>message</code> event more than once in edge cases (phone comes back online, session reconnects). Store the <code>payload.id</code> somewhere — Redis, a database, or n8n's static data store — and drop any ID you've already processed.</p>
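<p>A deduplication check can be very small. The sketch below keeps recent IDs in memory with a fixed cap so it cannot grow without bound; the class name <code>SeenIds</code> and the default capacity are arbitrary choices for illustration, and an in-memory store only works while a single process handles all events.</p>
<pre><code class="language-python">from collections import OrderedDict


class SeenIds:
    """Remember the most recent message ids, up to a fixed capacity."""

    def __init__(self, capacity: int = 10000):
        self.capacity = capacity
        self.ids = OrderedDict()

    def first_time(self, message_id: str):
        """Return True the first time an id is seen, False for repeats."""
        if message_id in self.ids:
            return False
        self.ids[message_id] = True
        if len(self.ids) == self.capacity + 1:
            self.ids.popitem(last=False)     # evict the oldest id
        return True


seen = SeenIds()
for message_id in ["msg-1", "msg-2", "msg-1"]:
    action = "process" if seen.first_time(message_id) else "skip duplicate"
    print(message_id, action)
</code></pre>
<p>With several workers, the same check moves to a shared store such as Redis, where setting a key only if it does not already exist plays the role of <code>first_time</code>.</p>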
<h3 id="heading-messages-arriving-out-of-order">Messages Arriving Out of Order</h3>
<p>The webhook is async, and n8n may parallelize executions. If ordering matters — for example, in a multi-step conversation — key a queue by the sender's <code>chatId</code> and process each sender serially.</p>
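<p>As a rough illustration of keying work by sender, this Python sketch buckets a batch of events by <code>from</code> and orders each bucket by <code>timestamp</code>. It is a batch simplification of a real per-sender queue, and <code>group_by_sender</code> is a hypothetical name; the field names come from the payload shown earlier.</p>
<pre><code class="language-python">from collections import defaultdict


def group_by_sender(events):
    """Bucket message events by sender, oldest first within each bucket."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event["payload"]["from"]].append(event)
    for bucket in buckets.values():
        bucket.sort(key=lambda item: item["payload"]["timestamp"])
    return dict(buckets)


events = [
    {"payload": {"from": "15555550123@c.us", "timestamp": 3, "body": "and a shave"}},
    {"payload": {"from": "15555550199@c.us", "timestamp": 2, "body": "price"}},
    {"payload": {"from": "15555550123@c.us", "timestamp": 1, "body": "book a haircut"}},
]
for sender, bucket in group_by_sender(events).items():
    print(sender, [item["payload"]["body"] for item in bucket])
</code></pre>
<p>Each bucket can then be handled by one worker at a time, so different senders still run in parallel while one sender's messages stay in order.</p>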
<h3 id="heading-sessions-disconnecting-after-a-phone-restart">Sessions Disconnecting After a Phone Restart</h3>
<p>Normal WhatsApp Web behavior. WAHA auto-reconnects, but occasionally the linked-devices list needs a manual refresh. If a session refuses to come back, stop the WAHA container, delete that session's folder under <code>waha-sessions/</code>, start the container again, and rescan the QR.</p>
<h3 id="heading-your-number-gets-banned">Your Number Gets Banned</h3>
<p>The single biggest cause is rate: a new number blasting hundreds of messages an hour gets flagged fast. Warm up a number slowly — send a normal, human-like volume for the first week. Don't send to strangers unsolicited. Prefer inbound-driven replies over outbound pushes wherever you can.</p>
<h3 id="heading-the-wrong-chat-id-format">The Wrong Chat ID Format</h3>
<p>WhatsApp individual chats use <code>&lt;number&gt;@c.us</code> and groups use <code>&lt;groupId&gt;@g.us</code>. Don't include the <code>+</code> or spaces in the number. If WAHA returns a 404 when sending, the chat ID is almost always the problem.</p>
<h2 id="heading-where-to-go-next">Where to Go Next</h2>
<p>You now have the foundation. The same two-service stack supports almost any bot you can imagine — you're only limited by what you can build in an n8n workflow.</p>
<p>Some natural next steps:</p>
<ul>
<li><p><strong>Plug in AI replies:</strong> Add an OpenAI or Anthropic node after the Webhook, pass the user's message through it with a short system prompt, and send the response back through WAHA. Cap conversation length to prevent runaway token usage.</p>
</li>
<li><p><strong>Integrate a CRM:</strong> Look up the sender's <code>chatId</code> in HubSpot, Pipedrive, or your own database before deciding how to reply. Segment responses by customer tier.</p>
</li>
<li><p><strong>Send proactive notifications:</strong> Appointment reminders, shipping updates, payment receipts, abandoned-cart nudges. Keep the content transactional and expected — unsolicited marketing blasts are the fastest way to a ban.</p>
</li>
<li><p><strong>Log every conversation:</strong> Add a Postgres or Supabase node after the Webhook to persist messages for analytics and customer history. Your future self (and your support team) will thank you.</p>
</li>
<li><p><strong>Add media handling:</strong> WAHA exposes <code>sendImage</code>, <code>sendFile</code>, and <code>sendVoice</code> endpoints. Teach the bot to accept photos for support tickets, or send invoices as PDFs directly inside the chat.</p>
</li>
</ul>
<p>The WhatsApp layer stays the same. Everything interesting happens upstream in the workflow.</p>
<p><em>If you want to see production examples of n8n and WAHA running at scale — or you need a similar automation built for your business — I'm the founder of Achiya Automation, where we ship WhatsApp, n8n, and Chatwoot integrations. You can find more at</em> <a href="https://achiya-automation.com"><em>achiya-automation.com</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
