<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 10 May 2026 22:19:31 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ The Codex Handbook: A Practical Guide to OpenAI's Coding Platform ]]>
                </title>
                <description>
                    <![CDATA[ This handbook is written for developers, team leads, and admins who want to understand what Codex is, how to set it up, how to use it well, how it differs from general-purpose models, and how pricing  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-codex-handbook-a-practical-guide-to-openai-s-coding-platform/</link>
                <guid isPermaLink="false">69fe6b68f239332df41e4063</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #ai-tools ]]>
                    </category>
                
                    <category>
                        <![CDATA[ codex ]]>
                    </category>
                
                    <category>
                        <![CDATA[ openai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tatev Aslanyan ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 23:02:00 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/e558d0da-b13d-4fce-90de-9ef1e818fcff.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>This handbook is written for developers, team leads, and admins who want to understand what Codex is, how to set it up, how to use it well, how it differs from general-purpose models, and how pricing works today.</p>
<p>It's based on current OpenAI Codex documentation and Help Center articles. Pricing and plan availability change frequently, so treat the pricing section as a snapshot of the current docs and verify against the official links before making procurement decisions.</p>
<p><strong>What's new (April 2026):</strong> OpenAI released <strong>GPT-5.5</strong> and <strong>GPT-5.5 Pro</strong> on April 23–24, 2026. GPT-5.5 is now the flagship general model and is rolling into Codex surfaces. See the new "GPT-5.5: The Newest Release" subsection in <a href="#heading-section-2-where-codex-fits-in-the-openai-ecosystem">Section 2</a>, the full benchmark deep dive in <a href="#heading-section-11-model-specs-and-benchmarks-gpt-55-deep-dive">Section 11</a>, and the updated pricing snapshot in <a href="#heading-section-7-pricing-and-plan-access">Section 7</a>.</p>
<p><strong>Authors:</strong> Tatev Aslanyan, Vahe Aslanyan, Jim Amuto | <strong>Version:</strong> 1.3 — Last updated April 30, 2026</p>
<h2 id="heading-executive-summary">Executive Summary</h2>
<p>Codex is OpenAI's coding agent — not a single model, but a product and workflow layer that wraps OpenAI's frontier models with file access, shell execution, sandboxes, approval flows, and code review.</p>
<p>It runs in four surfaces: the CLI, IDE extensions (VS Code, Cursor, Windsurf), the macOS/Windows app, and Codex Cloud for background tasks against GitHub repositories.</p>
<p>The product is included with most paid ChatGPT plans (Plus, Pro, Business, Enterprise/Edu) and, for now, Free and Go with stricter rate limits.</p>
<p>The model layer beneath Codex shifted in April 2026. GPT-5.5 is the new general flagship, with substantial gains on agentic and long-context benchmarks (MRCR v2 at 1M tokens jumped from 36.6% on GPT-5.4 to 74.0% on GPT-5.5. Terminal-Bench 2.0 reaches 82.7%, and hallucination rate dropped roughly 60% versus prior generations). It's also roughly 2× the per-token cost of GPT-5.4, so picking the right model per task now matters more for budget than it did a quarter ago.</p>
<p>For teams adopting Codex, the highest-leverage choices are:</p>
<ol>
<li><p>Start in the CLI or IDE on small bounded tasks before enabling cloud</p>
</li>
<li><p>Use Codex as a pre-merge reviewer in addition to a code generator</p>
</li>
<li><p>Keep admin and user access separated through workspace RBAC, and</p>
</li>
<li><p>Treat token consumption — not prompt count — as the cost driver.</p>
</li>
</ol>
<p>The 30-60-90 day adoption plan in the appendix gives a phased rollout that surfaces friction early.</p>
<p>This handbook covers what Codex is, how to set it up, how to use it well, how it compares to Claude Code, GitHub Copilot, and self-hosted alternatives. We'll also discuss what it costs, how to govern it in an enterprise, and where it does and does not fit. You'll find a glossary, security checklist, and worked cost example in the appendix.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<h3 id="heading-heres-what-well-cover">Here's What We'll Cover:</h3>
<ol>
<li><p><a href="#heading-executive-summary">Executive Summary</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-section-1-what-codex-is">Section 1: What Codex Is</a></p>
</li>
<li><p><a href="#heading-section-2-where-codex-fits-in-the-openai-ecosystem">Section 2: Where Codex Fits in the OpenAI Ecosystem</a></p>
</li>
<li><p><a href="#heading-section-3-the-core-surfaces">Section 3: The Core Surfaces</a></p>
</li>
<li><p><a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4: Getting Started: Install, Set Up, and Your First Task</a></p>
</li>
<li><p><a href="#heading-section-5-how-to-use-codex-effectively">Section 5: How to Use Codex Effectively</a></p>
</li>
<li><p><a href="#heading-section-6-difference-between-codex-and-other-coding-tools">Section 6: Difference Between Codex and Other Coding Tools</a></p>
</li>
<li><p><a href="#heading-comparison-matrix">Comparison Matrix</a></p>
</li>
<li><p><a href="#heading-section-7-pricing-and-plan-access">Section 7: Pricing and Plan Access</a></p>
</li>
<li><p><a href="#heading-worked-cost-example">Worked Cost Example</a></p>
</li>
<li><p><a href="#heading-section-8-security-permissions-and-enterprise-setup">Section 8: Security, Permissions, and Enterprise Setup</a></p>
</li>
<li><p><a href="#heading-section-9-best-practices-for-teams">Section 9: Best Practices for Teams</a></p>
</li>
<li><p><a href="#heading-section-10-common-workflows-and-examples">Section 10: Common Workflows and Examples</a></p>
</li>
<li><p><a href="#heading-section-11-model-specs-and-benchmarks-gpt-55-deep-dive">Section 11: Model Specs and Benchmarks (GPT-5.5 Deep Dive)</a></p>
</li>
<li><p><a href="#heading-section-12-troubleshooting">Section 12: Troubleshooting</a></p>
</li>
<li><p><a href="#heading-section-13-faq">Section 13: FAQ</a></p>
</li>
<li><p><a href="#heading-section-14-when-not-to-use-codex">Section 14: When NOT to Use Codex</a></p>
</li>
<li><p><a href="#heading-section-15-final-recommendations">Section 15: Final Recommendations</a></p>
</li>
<li><p><a href="#heading-section-16-source-references">Section 16: Source References</a></p>
</li>
<li><p><a href="#heading-appendix-a-30-60-90-day-adoption-plan">Appendix A: 30-60-90 Day Adoption Plan</a></p>
</li>
<li><p><a href="#heading-appendix-b-glossary">Appendix B: Glossary</a></p>
</li>
<li><p><a href="#heading-appendix-c-admin-security-checklist">Appendix C: Admin Security Checklist</a></p>
</li>
<li><p><a href="#heading-appendix-d-changelog">Appendix D: Changelog</a></p>
</li>
<li><p><a href="#heading-appendix-e-working-with-codex-in-vs-code">Appendix E: Working with Codex in VS Code</a></p>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>This handbook is hands-on. To get the most out of it — especially <a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4</a>, <a href="#heading-section-5-how-to-use-codex-effectively">Section 5</a>, and <a href="#heading-section-10-common-workflows-and-examples">Section 10</a> where you'll install Codex and run real tasks — you should have the following in place.</p>
<h3 id="heading-background-knowledge-you-should-already-have">Background Knowledge You Should Already Have</h3>
<p>You don't need to be a senior engineer, but the walkthroughs assume:</p>
<ul>
<li><p><strong>Comfort using the command line.</strong> You can <code>cd</code> into a directory, list files, run <code>git</code> commands, and read shell error messages. If you have never opened a terminal, work through a one-hour shell tutorial first.</p>
</li>
<li><p><strong>Basic Git literacy.</strong> You understand commits, branches, pull requests, and the difference between staged and unstaged changes. The Codex workflow centers on producing reviewable diffs, so this is non-negotiable.</p>
</li>
<li><p><strong>Experience reading code in at least one mainstream language.</strong> Codex can work in any language, but the demo repo in <a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4</a> is a small Python service. If you can read Python, JavaScript, Go, or similar, you'll be fine.</p>
</li>
<li><p><strong>A mental model of "what an API call costs."</strong> <a href="#heading-section-7-pricing-and-plan-access">Section 7</a>'s worked cost example assumes you understand that LLM usage is metered by tokens. If "tokens" is a brand-new concept, skim the OpenAI tokenizer page once before reading <a href="#heading-section-7-pricing-and-plan-access">Section 7</a>.</p>
</li>
</ul>
<p>If you're an engineering manager, procurement lead, or admin and you only need <a href="#heading-section-7-pricing-and-plan-access">Section 7</a>, <a href="#heading-section-8-security-permissions-and-enterprise-setup">Section 8</a>, and <a href="#heading-section-14-when-not-to-use-codex">Section 14</a>, you can skip the technical prerequisites and jump straight to those sections.</p>
<h3 id="heading-tools-and-accounts-you-need-to-install">Tools and Accounts You Need to Install</h3>
<p>Before starting <a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4</a>, have the following ready. Approximate setup time: <strong>15–25 minutes</strong> if you're starting from scratch.</p>
<table>
<thead>
<tr>
<th>Tool / Account</th>
<th>Why you need it</th>
<th>Where to get it</th>
</tr>
</thead>
<tbody><tr>
<td>A ChatGPT account on Plus, Pro, Business, or Enterprise/Edu</td>
<td>Codex is included with these plans. Free and Go work for now but with stricter rate limits</td>
<td><a href="https://chatgpt.com">chatgpt.com</a></td>
</tr>
<tr>
<td><strong>Node.js 18+ and npm</strong></td>
<td>The Codex CLI is installed via npm (<code>npm i -g @openai/codex</code>)</td>
<td><a href="https://nodejs.org">nodejs.org</a></td>
</tr>
<tr>
<td><strong>Git 2.30+</strong></td>
<td>Required to clone the demo repo and produce diffs Codex can review</td>
<td><a href="https://git-scm.com">git-scm.com</a></td>
</tr>
<tr>
<td><strong>A code editor</strong></td>
<td>VS Code is the recommended baseline. Cursor and Windsurf also work</td>
<td><a href="https://code.visualstudio.com">code.visualstudio.com</a></td>
</tr>
<tr>
<td><strong>A GitHub account</strong></td>
<td>Required only for Codex Cloud tasks (<a href="#heading-section-8-security-permissions-and-enterprise-setup">Section 8</a> and <a href="#heading-appendix-e-working-with-codex-in-vs-code">Appendix E</a>)</td>
<td><a href="https://github.com">github.com</a></td>
</tr>
<tr>
<td><strong>WSL2</strong> (Windows users only)</td>
<td>The Codex CLI is experimental on native Windows; WSL is the supported path</td>
<td><a href="https://learn.microsoft.com/en-us/windows/wsl/install">Microsoft WSL docs</a></td>
</tr>
</tbody></table>
<h3 id="heading-verify-your-environment">Verify Your Environment</h3>
<p>Run these three commands before you start <a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4</a>. If any of them fails, fix it first.</p>
<pre><code class="language-bash">node --version   # should print v18.x or higher
npm --version    # should print 9.x or higher
git --version    # should print 2.30 or higher
</code></pre>
<h3 id="heading-what-this-handbook-will-not-teach-you">What This Handbook Will Not Teach You</h3>
<p>To set expectations honestly, this handbook does <strong>not</strong> cover:</p>
<ul>
<li><p>How to write production-grade Python, JavaScript, or any specific language. We use small examples to demonstrate Codex behavior, not teach syntax.</p>
</li>
<li><p>How to design a system architecture from scratch. <a href="#heading-section-14-when-not-to-use-codex">Section 14</a> explains why Codex is a poor fit for novel architecture decisions.</p>
</li>
<li><p>How to administer GitHub at the organization level. <a href="#heading-section-8-security-permissions-and-enterprise-setup">Section 8</a> covers the Codex-specific GitHub Connector setup, but assumes your GitHub org already exists.</p>
</li>
<li><p>LLM internals (attention, RLHF, and so on). We treat the model as a black box with measurable behavior.</p>
</li>
</ul>
<h2 id="heading-section-1-what-codex-is">Section 1: What Codex Is</h2>
<p>Codex is OpenAI's coding agent. The most important thing to understand is that Codex is not just a single model name. It's a product and workflow layer designed to help people write, review, debug, and ship code faster. In OpenAI's own wording, it's an AI coding agent that can work with you locally or complete tasks in the cloud.</p>
<p>That distinction matters. Most people think of AI in one of two ways:</p>
<ul>
<li><p>A chat model that answers questions.</p>
</li>
<li><p>A coding assistant that suggests snippets.</p>
</li>
</ul>
<p>Codex is broader than both. It can inspect a repository, edit files, run commands, and execute tests. It can also handle larger chunks of work by taking a prompt or spec and turning it into a task plan, code changes, and reviewable output.</p>
<p>For teams, the cloud-based workflow is especially important because it lets Codex run in the background while engineers stay in flow.</p>
<p>OpenAI's current docs also place Codex alongside a wider set of developer tools: the API, the Responses API, the Agents SDK, MCP tools, and the Codex app. If you are onboarding a team, the easiest mental model is this:</p>
<ul>
<li><p>The models are the engine.</p>
</li>
<li><p>Codex is the coding product that uses those engines.</p>
</li>
<li><p>The CLI, IDE extension, web app, and cloud tasks are the ways you interact with it.</p>
</li>
</ul>
<h2 id="heading-section-2-where-codex-fits-in-the-openai-ecosystem">Section 2: Where Codex Fits in the OpenAI Ecosystem</h2>
<p>OpenAI now offers a layered stack:</p>
<ul>
<li><p>General-purpose frontier models such as <strong>GPT-5.5</strong>, <strong>GPT-5.5 Pro</strong>, GPT-5.4, GPT-5.4-mini, and GPT-5.4-nano.</p>
</li>
<li><p>Codex-specific models such as GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex, and codex-mini-latest.</p>
</li>
<li><p>Product surfaces that package those models into workflows, such as Codex CLI, the Codex app, IDE extensions, cloud tasks, and code review.</p>
</li>
</ul>
<p>The practical difference is simple:</p>
<ul>
<li><p>If you need one-off reasoning, synthesis, or general chat, you may use a general model.</p>
</li>
<li><p>If you need an agent that should navigate a repository, change files, run tests, and push toward a concrete code outcome, Codex is the purpose-built surface.</p>
</li>
</ul>
<p>OpenAI's current model docs describe GPT-5.4 as the flagship model for complex reasoning and coding. At the same time, Codex-specific model pages describe GPT-5.3-Codex and GPT-5.2-Codex as optimized for agentic coding tasks in Codex or similar environments. That tells you how OpenAI is positioning the stack:</p>
<ul>
<li><p>GPT-5.4 is the general flagship.</p>
</li>
<li><p>Codex-specific models are tuned for coding workflows.</p>
</li>
<li><p>Codex the product can switch models depending on the surface and configuration.</p>
</li>
</ul>
<p>If you remember nothing else from this section, remember this: Codex is the workflow. Models are the engine.</p>
<h3 id="heading-gpt-55-the-newest-release">GPT-5.5: The Newest Release</h3>
<p>OpenAI launched <strong>GPT-5.5</strong> on April 23, 2026, with API availability following on April 24, 2026. A higher-tier <strong>GPT-5.5 Pro</strong> variant shipped alongside it. OpenAI describes GPT-5.5 as their "smartest and most intuitive to use model yet, and the next step toward a new way of getting work done on a computer."</p>
<p>For a Codex user, the practical upshot is short:</p>
<ol>
<li><p><strong>GPT-5.5 is the new general flagship.</strong> Anywhere older docs say "GPT-5.4 is the flagship," read GPT-5.5 going forward. GPT-5.4 remains available as a cheaper default.</p>
</li>
<li><p><strong>Codex surfaces will switch over.</strong> Expect GPT-5.5 to become selectable (and often the default) inside the CLI, IDE, app, and cloud tasks shortly after launch. Verify the active model in your settings.</p>
</li>
<li><p><strong>Pricing has shifted.</strong> GPT-5.5 sits well above GPT-5.4 on a per-token basis. See <a href="#heading-section-7-pricing-and-plan-access">Section 7</a> before approving budgets.</p>
</li>
</ol>
<p>The full benchmark breakdown, performance highlights, and per-workload guidance for picking GPT-5.5 vs GPT-5.4 vs Codex-specific models are in <a href="#heading-section-11-model-specs-and-benchmarks-gpt-55-deep-dive">Section 11: Model Specs and Benchmarks</a>. Read that section once you have the foundational chapters under your belt.</p>
<h2 id="heading-section-3-the-core-surfaces">Section 3: The Core Surfaces</h2>
<p>Codex currently shows up in a few places, and each one is optimized for a slightly different working style.</p>
<h3 id="heading-codex-cli">Codex CLI</h3>
<ul>
<li><p><a href="https://developers.openai.com/codex/cli">Official docs: developers.openai.com/codex/cli</a></p>
</li>
<li><p><a href="https://www.npmjs.com/package/@openai/codex">npm package: <code>@openai/codex</code></a></p>
</li>
<li><p><a href="https://github.com/openai/codex">GitHub repo</a></p>
</li>
</ul>
<p>The CLI is the fastest way to put Codex directly into a terminal session. The docs describe it as OpenAI's coding agent that runs locally from your terminal, can read, change, and run code on your machine, and is open source and written in Rust.</p>
<p>Use the CLI when you want:</p>
<ul>
<li><p>A terminal-first workflow.</p>
</li>
<li><p>Fast iteration inside an existing repo.</p>
</li>
<li><p>Fine-grained control over approvals and execution.</p>
</li>
<li><p>A lightweight path for local coding tasks.</p>
</li>
</ul>
<h3 id="heading-ide-extension">IDE Extension</h3>
<ul>
<li><p><a href="https://developers.openai.com/codex/ide">Official docs: developers.openai.com/codex/ide</a></p>
</li>
<li><p><a href="https://marketplace.visualstudio.com/items?itemName=openai.chatgpt">VS Code Marketplace listing (<code>openai.chatgpt</code>)</a></p>
</li>
</ul>
<p>The CLI docs and Help Center articles point to the IDE extension for VS Code, Cursor, Windsurf, and other VS Code forks. This is the natural fit when your team lives in an editor and wants Codex embedded in the normal coding flow.</p>
<p>Use the IDE extension when you want:</p>
<ul>
<li><p>Codex close to the files you are already editing.</p>
</li>
<li><p>Prompting and editing without switching contexts.</p>
</li>
<li><p>A bridge between human-driven and agent-driven editing.</p>
</li>
</ul>
<h3 id="heading-codex-app">Codex App</h3>
<ul>
<li><p><a href="https://help.openai.com/en/articles/11369540-codex-in-chatgpt-faq">Help Center: Using Codex with your ChatGPT plan</a></p>
</li>
<li><p><a href="https://chatgpt.com/codex">Download from chatgpt.com/codex</a></p>
</li>
</ul>
<p>OpenAI's Help Center says the Codex app is available on macOS and Windows. It is designed for parallel work across projects, with built-in worktree support, skills, automations, and git functionality.</p>
<p>Use the app when you want:</p>
<ul>
<li><p>Multiple Codex agents running in parallel.</p>
</li>
<li><p>Cloud tasks without bouncing between terminal and editor.</p>
</li>
<li><p>A project-centric place to assign and monitor tasks.</p>
</li>
</ul>
<h3 id="heading-codex-cloud">Codex Cloud</h3>
<ul>
<li><p><a href="https://developers.openai.com/codex/cloud">Official docs: developers.openai.com/codex/cloud</a></p>
</li>
<li><p><a href="https://chatgpt.com/codex">Web interface: chatgpt.com/codex</a></p>
</li>
</ul>
<p>Codex cloud is the background execution mode. It runs each task in an isolated sandbox with the repository and environment, and it is intended for reviewable code output rather than direct interactive sessions.</p>
<p>Use Codex cloud when you want:</p>
<ul>
<li><p>Tasks to run while you do something else.</p>
</li>
<li><p>Sandboxed execution with reviewable diffs.</p>
</li>
<li><p>Automated code review or repository-level workflows.</p>
</li>
</ul>
<h3 id="heading-code-review">Code Review</h3>
<ul>
<li><p><a href="https://help.openai.com/en/articles/11369540-codex-in-chatgpt-faq">Help Center: Codex for code review</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/use-cases">Codex use cases</a></p>
</li>
</ul>
<p>Codex can also review code inside GitHub. OpenAI describes this as a way to automatically review your personal pull requests or configure reviews at the team level.</p>
<p>Use code review when you want:</p>
<ul>
<li><p>A second set of eyes on pull requests.</p>
</li>
<li><p>Automated regression or issue spotting before human review.</p>
</li>
<li><p>Lightweight review coverage across a team.</p>
</li>
</ul>
<h2 id="heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4: Getting Started: Install, Set Up, and Your First Task</h2>
<p>This section walks you end-to-end from "nothing installed" to "Codex just fixed a real bug for me."</p>
<p>We will use a tiny demo repository you build yourself in two minutes — a small Python price-calculator with one obvious bug and one missing test. That gives you a real, reproducible target you can throw away when you're done.</p>
<p>The same walkthrough works for the CLI, the IDE extension, and the app, with notes for each.</p>
<p>If you have existing code you would rather use, skip ahead to <a href="#heading-step-4-launch-codex-and-run-your-first-task">Step 4</a> and point Codex at your own repo. The demo is for readers who want a known-good starting point.</p>
<h3 id="heading-step-0-confirm-access">Step 0: Confirm Access</h3>
<p>Codex is included with ChatGPT Plus, Pro, Business, and Enterprise/Edu plans. For a limited time, it is also included with Free and Go, with stricter rate limits.</p>
<p>If you are in a team or enterprise workspace, access may also depend on workspace settings and role-based controls. Do not assume that a ChatGPT subscription alone guarantees access in a managed environment — confirm with your admin or look in Codex Cloud settings at <a href="https://chatgpt.com/codex">chatgpt.com/codex</a>.</p>
<h3 id="heading-step-1-install-codex">Step 1: Install Codex</h3>
<p>You have three install paths. Pick <strong>one</strong> to start; you can add the others later.</p>
<h4 id="heading-option-a-the-cli-recommended-for-first-task">Option A: The CLI (recommended for first task)</h4>
<p>The CLI is the most direct way to see how Codex behaves. The official docs note that <strong>macOS and Linux are first-class, while Windows is experimental and you should use WSL2</strong>.</p>
<pre><code class="language-bash">npm i -g @openai/codex
codex --version
</code></pre>
<p>If <code>codex --version</code> prints a version number, you are done.</p>
<h4 id="heading-option-b-the-vs-code-extension">Option B: The VS Code Extension</h4>
<p>In VS Code (or Cursor / Windsurf), open the Extensions panel, search for "Codex" by <code>openai</code>, and install it. Or from a terminal:</p>
<pre><code class="language-bash">code --install-extension openai.chatgpt
</code></pre>
<p>The Codex panel will appear in the right sidebar after install.</p>
<h4 id="heading-option-c-the-codex-app">Option C: The Codex App</h4>
<p>Download the Codex app for macOS or Windows from <a href="https://chatgpt.com/codex">chatgpt.com/codex</a>. The app shines when you want parallel tasks, built-in git worktrees, and a project-centric UI. For your very first task it is overkill — start with the CLI or extension.</p>
<p><strong>VS Code users:</strong> For a step-by-step guide covering all three VS Code entry points (extension, CLI in the integrated terminal, and browser Codex), see <strong>Appendix E: Working with Codex in VS Code</strong>.</p>
<h3 id="heading-step-2-authenticate">Step 2: Authenticate</h3>
<p>Run <code>codex</code> in a terminal (or open the extension panel). You will be prompted to:</p>
<ul>
<li><p><strong>Sign in with ChatGPT</strong> — recommended. Usage is charged against your plan's included Codex credits.</p>
</li>
<li><p><strong>Sign in with an API key</strong> — used when you want metered API billing or your workspace policy requires it.</p>
</li>
</ul>
<p>If you are unsure, pick ChatGPT sign-in.</p>
<h3 id="heading-step-3-build-the-demo-repo">Step 3: Build the Demo Repo</h3>
<p>This is the part most quick-starts skip. Instead of pointing Codex at "any repo," let's create a small, <strong>self-contained demo repo with a known bug</strong> so you can verify Codex actually fixes it.</p>
<p>In a terminal, run:</p>
<pre><code class="language-bash">mkdir codex-demo &amp;&amp; cd codex-demo
git init
</code></pre>
<p>Now create three files. First, <code>pricing.py</code> — a small pricing calculator with one off-by-one bug and one missing edge case:</p>
<pre><code class="language-python"># pricing.py
def apply_discount(price: float, discount_percent: float) -&gt; float:
    """Apply a percentage discount to a price.

    BUG: The discount is applied as a multiplier of (discount_percent / 10)
    instead of (discount_percent / 100). A 20% discount currently doubles
    the price instead of reducing it.
    """
    if discount_percent &lt; 0:
        raise ValueError("discount_percent must be &gt;= 0")
    return price * (1 - discount_percent / 10)


def cart_total(items: list[dict], discount_percent: float = 0) -&gt; float:
    """Compute the total for a list of cart items after a discount."""
    subtotal = sum(item["price"] * item["quantity"] for item in items)
    return apply_discount(subtotal, discount_percent)
</code></pre>
<p>Then <code>test_pricing.py</code> — a single passing test plus one that will fail because of the bug:</p>
<pre><code class="language-python"># test_pricing.py
from pricing import apply_discount, cart_total


def test_no_discount_returns_original_price():
    assert apply_discount(100.0, 0) == 100.0


def test_twenty_percent_discount_on_100_is_80():
    # This will FAIL until the bug in apply_discount is fixed.
    assert apply_discount(100.0, 20) == 80.0


def test_cart_total_with_discount():
    items = [
        {"price": 10.0, "quantity": 2},
        {"price": 5.0, "quantity": 1},
    ]
    # Subtotal is 25.0. With 10% off, expected total is 22.5.
    assert cart_total(items, discount_percent=10) == 22.5
</code></pre>
<p>And a tiny <code>README.md</code>:</p>
<pre><code class="language-markdown"># codex-demo

A tiny pricing module used to learn the Codex workflow.

Run tests with: `python -m pytest`
</code></pre>
<p>Commit the starting state so Codex's diffs are easy to review:</p>
<pre><code class="language-bash">git add .
git commit -m "Initial demo: pricing module with a known bug"
</code></pre>
<p>Confirm the bug is real before you ask Codex to fix it:</p>
<pre><code class="language-bash">python -m pytest
</code></pre>
<p>You should see two failing tests (<code>test_twenty_percent_discount_on_100_is_80</code> and <code>test_cart_total_with_discount</code>).</p>
<p>If <code>pytest</code> is not installed: <code>pip install pytest</code>. The full demo needs only Python 3.10+ and pytest.</p>
<h3 id="heading-step-4-launch-codex-and-run-your-first-task">Step 4: Launch Codex and Run Your First Task</h3>
<p>Now point Codex at the demo repo.</p>
<p><strong>From the CLI:</strong></p>
<pre><code class="language-bash">cd codex-demo
codex
</code></pre>
<p>When Codex starts, give it a clear, bounded task. <strong>Type this prompt exactly:</strong></p>
<pre><code class="language-text">The test suite has two failing tests. Read pricing.py and test_pricing.py,
identify the root cause, fix the smallest possible thing, then run the tests
to confirm they pass. Explain what you changed and why.
</code></pre>
<p>Codex will:</p>
<ol>
<li><p>Inspect <code>pricing.py</code> and <code>test_pricing.py</code>.</p>
</li>
<li><p>Recognize the off-by-one bug (<code>/ 10</code> should be <code>/ 100</code>).</p>
</li>
<li><p>Propose a one-line diff.</p>
</li>
<li><p>Ask for approval before modifying the file (in the default approval mode).</p>
</li>
<li><p>After you approve, run <code>python -m pytest</code> and report that all three tests now pass.</p>
</li>
</ol>
<p><strong>From the VS Code extension:</strong> Open the <code>codex-demo</code> folder in VS Code, open the Codex panel in the right sidebar, and paste the same prompt. The diff will appear inline in the editor for you to review and accept.</p>
<h3 id="heading-step-5-review-the-diff">Step 5: Review the Diff</h3>
<p>This is the most important habit to build early. Even though the fix is one character (<code>10</code> → <code>100</code>), look at the diff before accepting:</p>
<pre><code class="language-bash">git diff
</code></pre>
<p>Read the change. Confirm it matches what Codex described. Run the tests yourself:</p>
<pre><code class="language-bash">python -m pytest
</code></pre>
<p>All three should pass. Commit the fix:</p>
<pre><code class="language-bash">git commit -am "Fix off-by-one in apply_discount"
</code></pre>
<p>You have just completed the full Codex loop: <strong>context → task → change → review → verify</strong>. Every bigger task is a longer version of this loop.</p>
<h3 id="heading-step-6-try-two-more-bounded-tasks">Step 6: Try Two More Bounded Tasks</h3>
<p>Now that the loop works, try these against the same demo repo:</p>
<ol>
<li><p><strong>Add an edge case test.</strong> Prompt: <em>"Add a test that verifies</em> <code>apply_discount</code> <em>raises a ValueError when</em> <code>discount_percent</code> <em>is negative. Run the tests after."</em></p>
</li>
<li><p><strong>Add a missing safety check.</strong> Prompt: <em>"</em><code>apply_discount</code> <em>does not currently reject</em> <code>discount_percent</code> <em>values greater than 100, which would produce a negative price. Add validation, update the existing tests if needed, and add a new test for the new behavior."</em></p>
</li>
</ol>
<p>Each task is small, has a clear acceptance criterion (the tests pass), and produces a reviewable diff. That is the shape of every good Codex task.</p>
<h3 id="heading-step-7-optional-set-up-codex-cloud">Step 7 (Optional): Set Up Codex Cloud</h3>
<p>Cloud tasks let Codex run in the background while you do other work. They require a <strong>GitHub-hosted repository</strong>.</p>
<p>To enable Codex Cloud against the demo repo:</p>
<ol>
<li><p>Push <code>codex-demo</code> to a private GitHub repo: <code>gh repo create codex-demo --private --source=. --push</code> (requires the <code>gh</code> CLI).</p>
</li>
<li><p>Visit <a href="https://chatgpt.com/codex">chatgpt.com/codex</a> and connect the <strong>ChatGPT GitHub Connector</strong>.</p>
</li>
<li><p>Allow the <code>codex-demo</code> repository in the connector. <strong>Do not grant org-wide access by default</strong> — see <a href="#heading-appendix-c-admin-security-checklist">Appendix C</a>.</p>
</li>
<li><p>From the web interface, pick the repo and prompt: <em>"Add type hints to every function in</em> <code>pricing.py</code> <em>and add a CI-style summary of what changed."</em></p>
</li>
<li><p>Wait for the sandbox to finish, review the diff in the browser, and either accept it or open a PR.</p>
</li>
</ol>
<p>By default, <strong>Codex Cloud sandboxes have no internet access</strong>. That is deliberate — admins can allowlist dependency registries and trusted sites if a real workflow needs them.</p>
<h3 id="heading-when-to-use-which-surface">When to Use Which Surface</h3>
<p>After completing the demo, the surface trade-offs become concrete:</p>
<ul>
<li><p><strong>CLI</strong> — fastest for terminal-heavy local work, scriptable, best for multi-step agentic tasks with explicit approvals.</p>
</li>
<li><p><strong>VS Code extension</strong> — lowest friction for in-flow editing while you are already in the editor.</p>
</li>
<li><p><strong>Codex app</strong> — best when you want to run multiple parallel tasks across projects with worktree isolation.</p>
</li>
<li><p><strong>Codex Cloud</strong> — best for background work, long-running tasks, and PR-style review you can leave running.</p>
</li>
</ul>
<p>Most experienced users have <strong>all of them installed</strong> and pick per task. A single workflow rarely fits every kind of work.</p>
<h3 id="heading-what-if-something-doesnt-work">What If Something Doesn't Work?</h3>
<p>If you get stuck during this walkthrough:</p>
<ul>
<li><p><code>codex</code> command not found → npm's global bin is not on your PATH. Restart your terminal, or use a Node version manager like nvm.</p>
</li>
<li><p>Sign-in keeps failing → confirm the email matches your ChatGPT plan; in enterprise workspaces, your admin must enable Codex.</p>
</li>
<li><p>Codex won't modify the file → you may be in a strict approval mode. Approve when prompted, or relax the mode after your first successful task.</p>
</li>
<li><p>Windows misbehavior → switch to a WSL2 terminal. Native Windows for the CLI is experimental.</p>
</li>
</ul>
<p>The full troubleshooting guide is in <a href="#heading-section-12-troubleshooting">Section 12</a>.</p>
<h2 id="heading-section-5-how-to-use-codex-effectively">Section 5: How to Use Codex Effectively</h2>
<p>Codex works best when you treat it like a developer you're onboarding rather than a magic prompt responder. The more concrete your task, the better the result.</p>
<p>Each tip below has a <strong>bad example</strong> (what people actually type) and a <strong>good example</strong> (what produces a useful result). Most use the <code>codex-demo</code> repo from <a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4</a> so you can run them yourself.</p>
<h3 id="heading-give-it-a-real-objective">Give It a Real Objective</h3>
<p>A "real objective" means a concrete goal with a verifiable outcome — not a feeling.</p>
<p><strong>Bad:</strong></p>
<pre><code class="language-text">Improve this codebase.
</code></pre>
<p>Codex will pick something to do, but you have no way to know if the result is what you wanted, and the diff will probably touch more than you can review.</p>
<p><strong>Good:</strong></p>
<pre><code class="language-text">Refactor cart_total in pricing.py so the iteration logic and the discount
application are in two separate helper functions. Keep the public signature
of cart_total unchanged. Add tests for each helper. Run pytest at the end.
</code></pre>
<p>This works because there is exactly one acceptance criterion (tests pass with the new structure) and exactly one boundary (public signature unchanged). You can review the diff in 30 seconds.</p>
<p>Other shapes that work:</p>
<ul>
<li><p>"Fix the failing test in <code>test_pricing.py::test_twenty_percent_discount_on_100_is_80</code>."</p>
</li>
<li><p>"Add a <code>currency: str = 'USD'</code> parameter to <code>cart_total</code> and update the tests."</p>
</li>
<li><p>"Review the changes in my last commit for missing edge cases."</p>
</li>
</ul>
<h3 id="heading-provide-the-right-context">Provide the Right Context</h3>
<p>Codex can inspect the repo, but you still need to steer it to the right files and constraints. Without that, it wanders.</p>
<p><strong>Bad:</strong></p>
<pre><code class="language-text">Add validation to the pricing module.
</code></pre>
<p>What kind of validation? On which inputs? What error class? Codex has to guess all of that.</p>
<p><strong>Good:</strong></p>
<pre><code class="language-text">Context:
- File: pricing.py
- Function: apply_discount
- Current behavior: raises ValueError for negative discount_percent.
- Desired behavior: also raise ValueError when discount_percent &gt; 100,
  with the message "discount_percent must be between 0 and 100".

Task:
- Add the validation.
- Add a matching test in test_pricing.py.
- Do not change apply_discount's public signature.
- Run pytest after.
</code></pre>
<p>Notice the structure: <strong>what file</strong>, <strong>current behavior</strong>, <strong>desired behavior</strong>, <strong>task</strong>, <strong>constraints</strong>, <strong>how to verify</strong>. That is the difference between a hopeful prompt and a usable spec.</p>
<p>For larger tasks, also include:</p>
<ul>
<li><p>A link to the issue or spec (Codex can fetch it if web access is enabled).</p>
</li>
<li><p>The names of related files even if Codex could find them itself — naming them halves the time-to-first-edit.</p>
</li>
<li><p>The name of any test command, build command, or lint that should pass.</p>
</li>
</ul>
<h3 id="heading-ask-for-intermediate-thinking-when-needed">Ask for Intermediate Thinking When Needed</h3>
<p>"Intermediate thinking" means asking Codex to <strong>plan in writing before it edits files</strong>. The default is for Codex to dive straight to code. For anything larger than a single function, that is the wrong default.</p>
<p><strong>Without intermediate thinking</strong> (the alternative):</p>
<pre><code class="language-text">Refactor pricing.py to support multiple currencies.
</code></pre>
<p>Codex starts editing immediately. You discover after the fact that it changed the database schema, the API contract, and three test files — and you have no idea whether the design choice it made was the right one.</p>
<p><strong>With intermediate thinking:</strong></p>
<pre><code class="language-text">I want to add multi-currency support to pricing.py.

Before editing anything:
1. List the files you expect to touch and why.
2. Outline the approach in 5-10 bullets.
3. Call out any assumptions you are making and any open questions.
4. Identify the riskiest part of the change.

Wait for my approval before making any edits.
</code></pre>
<p>Now you get a plan you can review, push back on, or scrap entirely — at zero cost to the codebase. After you approve, Codex executes against the plan it just wrote, which makes the resulting diff predictable.</p>
<p>Use intermediate thinking whenever the task is:</p>
<ul>
<li><p>Multi-file or cross-cutting.</p>
</li>
<li><p>Architecturally novel for this codebase.</p>
</li>
<li><p>Hard to test (so the diff is your only signal).</p>
</li>
<li><p>High blast-radius if wrong (auth, payments, data migrations).</p>
</li>
</ul>
<h3 id="heading-prefer-bounded-changes">Prefer Bounded Changes</h3>
<p>A <strong>bounded change</strong> is one with all four of these properties:</p>
<ol>
<li><p><strong>Small surface area</strong> — touches one file, one module, or one logical concept.</p>
</li>
<li><p><strong>Clear acceptance criterion</strong> — there's a specific test, output, or behavior that proves it worked.</p>
</li>
<li><p><strong>Reviewable in a few minutes</strong> — a human can read the diff and form an opinion without setting aside an hour.</p>
</li>
<li><p><strong>Easily revertible</strong> — if it goes wrong, <code>git revert</code> undoes it cleanly without breaking anything else.</p>
</li>
</ol>
<p>The opposite is an <strong>unbounded change</strong>: "make the codebase faster," "modernize the API," "add types everywhere." These have no clear endpoint, no easy verification, and no clean revert path.</p>
<p><strong>Bounded examples (good):</strong></p>
<ul>
<li><p>"Add a <code>serialize()</code> method to <code>CartItem</code> that returns a dict suitable for JSON encoding. Add a test."</p>
</li>
<li><p>"In <code>apply_discount</code>, replace the magic number 100 with a module-level constant <code>MAX_DISCOUNT_PERCENT</code>."</p>
</li>
<li><p>"The <code>cart_total</code> function takes a <code>discount_percent</code> keyword argument that defaults to 0. Make the default <code>None</code> and treat <code>None</code> as 'no discount.' Update the tests."</p>
</li>
</ul>
<p><strong>Unbounded examples (avoid):</strong></p>
<ul>
<li><p>"Make pricing.py production-ready."</p>
</li>
<li><p>"Add proper error handling everywhere."</p>
</li>
<li><p>"Improve the architecture."</p>
</li>
</ul>
<p>When you catch yourself writing an unbounded prompt, break it into a list of bounded ones before sending. The decomposition itself is most of the work; once you have it, Codex is good at executing each piece.</p>
<h3 id="heading-use-reviews-as-a-loop">Use Reviews as a Loop</h3>
<p>Codex is not just for writing code — it is also a useful pre-merge reviewer. The loop is:</p>
<ol>
<li><p>You (or Codex) write the change.</p>
</li>
<li><p>Ask Codex to review it.</p>
</li>
<li><p>Fix the issues it finds.</p>
</li>
<li><p>Re-run tests.</p>
</li>
</ol>
<p><strong>What this looks like in practice:</strong></p>
<p>After completing a task in <code>codex-demo</code>, ask Codex to review your own commit:</p>
<pre><code class="language-text">Review the change in my last commit (git show HEAD) for:
- correctness issues (off-by-one, type mismatches, wrong defaults)
- missing tests, especially edge cases
- security concerns (input validation, injection, unsafe defaults)
- maintainability risks (unclear naming, hidden coupling)

Prioritize findings by severity (critical / important / nit). For each
finding, point to the exact line and propose a concrete fix. Do not
modify any files in this turn — just produce the review.
</code></pre>
<p>You will typically get back a structured response like:</p>
<pre><code class="language-text">CRITICAL: line 14 — apply_discount accepts NaN silently because the type
  check is `discount_percent &lt; 0`, which is False for NaN. Fix: add an
  explicit math.isnan() check before the comparison.

IMPORTANT: test_pricing.py has no test for the boundary discount_percent=100.
  Fix: add a test asserting apply_discount(100, 100) == 0.

NIT: line 8 — the docstring mentions a "BUG" comment that should be removed
  now that the bug is fixed.
</code></pre>
<p>Then you triage: fix the critical and important findings (often by feeding them back to Codex with "apply the fixes you proposed"), defer or reject the nits, and re-run tests.</p>
<p>This converts Codex from a code generator into a <strong>quality gate</strong>, which is usually the higher-leverage use. A team that uses Codex only as a generator gets faster code; a team that also uses it as a reviewer gets better code.</p>
<h2 id="heading-section-6-difference-between-codex-and-other-coding-tools">Section 6: Difference Between Codex and Other Coding Tools</h2>
<p>This is the section that usually matters most to new users, because the category boundaries are easy to blur.</p>
<h3 id="heading-codex-is-a-product-layer-not-just-a-model">Codex Is A Product Layer, Not Just A Model</h3>
<p>Codex is the product experience and workflow layer. Models are the underlying engines. Put differently:</p>
<ul>
<li><p>A general model answers questions or writes text.</p>
</li>
<li><p>A coding model is tuned more narrowly for software tasks.</p>
</li>
<li><p>Codex packages the model inside an agentic coding workflow with files, commands, approvals, sandboxes, and reviews.</p>
</li>
</ul>
<p>That matters because users often compare Codex to "another model" when the real comparison is "another coding system."</p>
<h3 id="heading-codex-vs-openai-general-models">Codex vs OpenAI General Models</h3>
<p>OpenAI's current models page recommends GPT-5.4 as the flagship model for complex reasoning and coding. That is the general model-side recommendation.</p>
<p>Codex-specific pages, on the other hand, describe models like GPT-5.3-Codex and GPT-5.2-Codex as optimized for agentic coding tasks in Codex or similar environments.</p>
<p>The practical takeaway:</p>
<ul>
<li><p>Use GPT-5.4 when you want a top-tier general model.</p>
</li>
<li><p>Use Codex-specific models when you want a model optimized for coding workflows inside Codex.</p>
</li>
<li><p>Use the Codex surface when you want file edits, shell commands, reviews, and sandboxes, not just text output.</p>
</li>
</ul>
<h3 id="heading-codex-vs-claude-code">Codex vs Claude Code</h3>
<p>Claude Code is also a terminal-based agentic coding tool. Anthropic's docs describe it as a terminal tool that can make plans, edit files, run commands, create commits, and work with MCP-connected data sources. It is strong if your team already prefers a terminal-first workflow and wants a tightly scriptable developer tool.</p>
<p>Codex differs in a few practical ways:</p>
<ul>
<li><p>Codex spans more surfaces, including CLI, IDE extension, app, cloud tasks, and code review.</p>
</li>
<li><p>Codex cloud is built around GitHub-connected task execution and review.</p>
</li>
<li><p>Codex is more explicitly positioned as a family of coding workflows, not just a single terminal agent.</p>
</li>
</ul>
<p>The practical takeaway:</p>
<ul>
<li><p>Choose Claude Code if you want a terminal-native workflow with strong composability and you are happy living mostly in the shell.</p>
</li>
<li><p>Choose Codex if you want a broader product layer with local, cloud, and app-based workflows that can be shared across a team.</p>
</li>
</ul>
<h3 id="heading-codex-vs-github-copilot-coding-agent">Codex vs GitHub Copilot Coding Agent</h3>
<p>GitHub Copilot coding agent is designed around GitHub's own workflow. GitHub docs describe it as an agent you can assign issues or pull requests to, and it works in the background to create or modify PRs. It lives very naturally inside GitHub-hosted development flows.</p>
<p>Codex is different in emphasis:</p>
<ul>
<li><p>Copilot coding agent is highly GitHub-centric.</p>
</li>
<li><p>Codex is broader across terminal, IDE, app, and cloud.</p>
</li>
<li><p>Copilot is a strong fit if your team already uses GitHub as the center of gravity for task assignment and review.</p>
</li>
<li><p>Codex is a stronger fit if you want a more general coding agent surface that can work across local and cloud workflows.</p>
</li>
</ul>
<p>The practical takeaway:</p>
<ul>
<li><p>Choose Copilot coding agent if your process is already deeply anchored in GitHub issues and pull requests.</p>
</li>
<li><p>Choose Codex if you want a wider agent workflow that can run locally, in the IDE, or in Codex cloud.</p>
</li>
</ul>
<h3 id="heading-codex-vs-open-weight-and-self-hosted-models">Codex vs Open-Weight and Self-Hosted Models</h3>
<p>Open-weight or self-hosted models serve a different need. Teams usually reach for them when they want:</p>
<ul>
<li><p>Full infrastructure control.</p>
</li>
<li><p>Custom hosting or air-gapped deployment.</p>
</li>
<li><p>More direct control over retention and data boundaries.</p>
</li>
<li><p>A lower-cost path at high scale if they already own the hardware and ops stack.</p>
</li>
</ul>
<p>The tradeoff is that self-hosted models usually do not give you the same out-of-the-box agentic product experience that Codex does. You have to assemble the orchestration, repo access, sandboxing, approvals, and review loop yourself.</p>
<p>That means the real choice is not "Which model is smartest?" It is "How much engineering do I want to spend on the workflow around the model?"</p>
<p>The practical takeaway:</p>
<ul>
<li><p>Choose open-weight or self-hosted models when infrastructure control is the main requirement and you are willing to build the surrounding agent system.</p>
</li>
<li><p>Choose Codex when you want the workflow already packaged, especially for day-to-day engineering teams.</p>
</li>
</ul>
<h3 id="heading-codex-vs-general-chat-models">Codex vs General Chat Models</h3>
<p>General chat models are best when the task is:</p>
<ul>
<li><p>A question and answer exchange.</p>
</li>
<li><p>Conceptual reasoning.</p>
</li>
<li><p>Drafting prose.</p>
</li>
<li><p>Summarizing or rewriting text.</p>
</li>
</ul>
<p>Codex is better when the task is:</p>
<ul>
<li><p>Reading and modifying a repository.</p>
</li>
<li><p>Running tests.</p>
</li>
<li><p>Fixing code.</p>
</li>
<li><p>Reviewing pull requests.</p>
</li>
<li><p>Coordinating multi-step implementation work.</p>
</li>
</ul>
<h3 id="heading-codex-vs-api-usage-of-the-same-models">Codex vs API Usage of the Same Models</h3>
<p>The same model family can behave differently depending on the surface.</p>
<ul>
<li><p>In the API, you may call a model directly and design your own orchestration.</p>
</li>
<li><p>In Codex, the same or similar model may be wrapped in repo access, approval flows, and task execution.</p>
</li>
</ul>
<p>That is why some model pages mention that a model is optimized for "Codex or similar environments." The model is tuned for agentic software work, but the workflow surface still matters.</p>
<h3 id="heading-comparison-matrix">Comparison Matrix</h3>
<p>The prose comparisons above collapse into a single matrix for fast reference:</p>
<table>
<thead>
<tr>
<th>Dimension</th>
<th>Codex</th>
<th>Claude Code</th>
<th>GitHub Copilot Coding Agent</th>
<th>Self-hosted / Open-weight</th>
</tr>
</thead>
<tbody><tr>
<td>Primary surface</td>
<td>CLI, IDE, app, cloud</td>
<td>CLI (terminal-first)</td>
<td>GitHub web/PR/issues</td>
<td>Whatever you build</td>
</tr>
<tr>
<td>Background execution</td>
<td>Yes (Codex Cloud sandboxes)</td>
<td>Limited; runs locally</td>
<td>Yes (GitHub Actions runners)</td>
<td>DIY</td>
</tr>
<tr>
<td>Repository integration</td>
<td>GitHub via connector; local repos directly</td>
<td>Local; MCP-connected sources</td>
<td>Native GitHub</td>
<td>DIY</td>
</tr>
<tr>
<td>Model choice</td>
<td>OpenAI models, switchable per surface</td>
<td>Anthropic Claude models</td>
<td>GitHub-managed (mix of vendors)</td>
<td>Any model you can host</td>
</tr>
<tr>
<td>Approval and sandbox controls</td>
<td>Yes, per-surface</td>
<td>Yes, per-tool</td>
<td>GitHub permission model</td>
<td>DIY</td>
</tr>
<tr>
<td>Parallel agents</td>
<td>Yes (app + cloud)</td>
<td>Limited</td>
<td>Yes (per-PR)</td>
<td>DIY</td>
</tr>
<tr>
<td>Best fit</td>
<td>Cross-surface team workflows</td>
<td>Terminal-native power users</td>
<td>Teams already living in GitHub</td>
<td>Air-gapped, custom infra, or cost-sensitive at scale</td>
</tr>
<tr>
<td>Main tradeoff</td>
<td>OpenAI ecosystem lock-in; price tier</td>
<td>Less product surface area</td>
<td>Heavily GitHub-coupled</td>
<td>Significant engineering effort</td>
</tr>
</tbody></table>
<p>Use the matrix to pick the dominant tool, then layer the others where they fit. Many teams legitimately run two of these in parallel — for example, Codex for cross-surface work and Claude Code for power-user terminal workflows.</p>
<h3 id="heading-which-tool-should-a-new-user-choose">Which Tool Should A New User Choose?</h3>
<p>As a rule of thumb:</p>
<ul>
<li><p>For terminal-first coding and scripting, Claude Code is a strong alternative.</p>
</li>
<li><p>For GitHub-native issue and PR automation, GitHub Copilot coding agent fits naturally.</p>
</li>
<li><p>For local plus cloud plus app-based team workflows, Codex is the most flexible option.</p>
</li>
<li><p>For maximum infrastructure control, self-hosted or open-weight stacks make sense.</p>
</li>
</ul>
<p>OpenAI's docs currently list GPT-5.5 as the general flagship, with GPT-5.4, GPT-5.4-mini, and GPT-5.4-nano remaining available below it, while Codex docs and model pages expose Codex-specific variants and model switching inside the CLI.</p>
<h2 id="heading-section-7-pricing-and-plan-access">Section 7: Pricing and Plan Access</h2>
<p>Pricing is the part of Codex most likely to change, so this section should be treated as a snapshot of the current official docs.</p>
<h3 id="heading-plan-access">Plan Access</h3>
<p>OpenAI's current Help Center says Codex is included with:</p>
<ul>
<li><p>ChatGPT Plus</p>
</li>
<li><p>ChatGPT Pro</p>
</li>
<li><p>ChatGPT Business</p>
</li>
<li><p>ChatGPT Enterprise/Edu</p>
</li>
</ul>
<p>For a limited time, it is also included with Free and Go, though those plans are temporary exceptions and subject to rate limits.</p>
<h3 id="heading-flexible-pricing-and-credits">Flexible Pricing and Credits</h3>
<p>The current rate card says Codex pricing changed on April 2, 2026 to align with API token usage instead of purely per-message pricing. The same article explains that:</p>
<ul>
<li><p>New and existing Plus and Pro customers use the token-based rate card.</p>
</li>
<li><p>New and existing Business customers use the token-based rate card.</p>
</li>
<li><p>New Enterprise customers use the token-based rate card.</p>
</li>
<li><p>Existing Enterprise/Edu and several other legacy plan categories remain on the legacy rate card until migration.</p>
</li>
</ul>
<p>This is important because two teams in the same company can be on different pricing logic depending on workspace status and plan vintage.</p>
<h3 id="heading-current-model-pricing-snapshot">Current Model Pricing Snapshot</h3>
<p>The current model pages list pricing per 1M tokens in USD. The exact numbers depend on the model you choose:</p>
<ul>
<li><p><strong>GPT-5.5: \(5 input, \)30 output.</strong> New flagship as of April 23, 2026.</p>
</li>
<li><p><strong>GPT-5.5 Pro: \(30 input, \)180 output.</strong> Higher-tier variant for the most demanding agentic and reasoning workloads.</p>
</li>
<li><p>GPT-5.4: \(2.50 input, \)15 output.</p>
</li>
<li><p>GPT-5.4-mini: \(0.75 input, \)4.50 output.</p>
</li>
<li><p>GPT-5.4-nano: \(0.20 input, \)1.25 output.</p>
</li>
<li><p>GPT-5-Codex: \(1.25 input, \)10 output.</p>
</li>
<li><p>GPT-5.2-Codex: \(1.75 input, \)14 output.</p>
</li>
<li><p>GPT-5.1-Codex-mini: \(0.25 input, \)2 output.</p>
</li>
<li><p>codex-mini-latest: \(1.50 input, \)6 output.</p>
</li>
</ul>
<p>These model pages also note context windows, output limits, and whether the model is intended for Codex-specific or general API use. For budget planning, remember that longer outputs can cost much more than the input prompt, so task framing matters as much as model choice.</p>
<p>Note that GPT-5.5 is roughly 2x the input price and 2x the output price of GPT-5.4, and GPT-5.5 Pro is an order of magnitude above that. OpenAI's framing is that GPT-5.5 is also more token-efficient than GPT-5.4, which can offset some of the headline price difference, but you should measure this on your own workloads before assuming it nets out. For the Codex-specific models, expect the lineup to shift as Codex variants based on GPT-5.5 ship; until then, the Codex-specific models above remain the right choice for purely coding-shaped tasks.</p>
<h3 id="heading-what-this-means-in-practice">What This Means in Practice</h3>
<p>The real cost depends on:</p>
<ul>
<li><p>Input size.</p>
</li>
<li><p>Cached input.</p>
</li>
<li><p>Output length.</p>
</li>
<li><p>Whether the task uses fast mode.</p>
</li>
<li><p>Which model you select.</p>
</li>
</ul>
<p>So if you are planning a team rollout, do not estimate usage from "number of prompts" alone. Estimate based on expected token consumption and task type.</p>
<h3 id="heading-legacy-pricing">Legacy Pricing</h3>
<p>The legacy rate card still matters for users and workspaces that have not been migrated. The big lesson is that pricing is now tied more closely to model usage than to a simple fixed message count. Anyone budgeting Codex should read the current rate card before setting internal chargeback rules or usage policies.</p>
<h3 id="heading-worked-cost-example">Worked Cost Example</h3>
<p>Pricing tables are easy to misread. A worked example makes the model selection question concrete.</p>
<p><strong>Scenario:</strong> A 30-engineer team uses Codex Cloud for automated pull request review. Each engineer opens roughly 4 PRs per week. Each PR review pulls in approximately 30,000 input tokens (the diff plus relevant context files) and produces approximately 3,000 output tokens (the review comments and risk summary).</p>
<p>Weekly token volume:</p>
<ul>
<li><p>Reviews per week: 30 engineers × 4 PRs = 120 reviews</p>
</li>
<li><p>Input tokens per week: 120 × 30,000 = 3.6M input tokens</p>
</li>
<li><p>Output tokens per week: 120 × 3,000 = 360K output tokens</p>
</li>
</ul>
<p>Cost per week by model:</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>Input cost</th>
<th>Output cost</th>
<th>Weekly total</th>
<th>Annualized (52 wk)</th>
</tr>
</thead>
<tbody><tr>
<td>GPT-5.5 (\(5 / \)30)</td>
<td>3.6M × \(5/1M = \)18.00</td>
<td>0.36M × \(30/1M = \)10.80</td>
<td><strong>$28.80</strong></td>
<td>$1,498</td>
</tr>
<tr>
<td>GPT-5.5 Pro (\(30 / \)180)</td>
<td>$108.00</td>
<td>$64.80</td>
<td><strong>$172.80</strong></td>
<td>$8,986</td>
</tr>
<tr>
<td>GPT-5.4 (\(2.50 / \)15)</td>
<td>$9.00</td>
<td>$5.40</td>
<td><strong>$14.40</strong></td>
<td>$749</td>
</tr>
<tr>
<td>GPT-5-Codex (\(1.25 / \)10)</td>
<td>$4.50</td>
<td>$3.60</td>
<td><strong>$8.10</strong></td>
<td>$421</td>
</tr>
<tr>
<td>GPT-5.1-Codex-mini (\(0.25 / \)2)</td>
<td>$0.90</td>
<td>$0.72</td>
<td><strong>$1.62</strong></td>
<td>$84</td>
</tr>
</tbody></table>
<p><strong>Reading the table:</strong> The headline GPT-5.5 sticker shock disappears at this volume — under $1,500/year for 30 engineers' worth of automated review is a rounding error against engineering payroll. GPT-5.5 Pro is 6× more expensive and generally not justified for routine review; reserve it for the small share of reviews where you need its extra capability. The Codex-specific models are dramatically cheaper and are the right default if your reviews are mostly mechanical (style, obvious bugs, missing tests).</p>
<p><strong>What this example does not capture:</strong></p>
<ul>
<li><p><strong>Cached input.</strong> OpenAI prices repeated input tokens lower; if your review pulls the same context files repeatedly, real costs are lower than shown.</p>
</li>
<li><p><strong>Long-task overhead.</strong> Agentic workflows that re-read files or iterate burn many more tokens than a single-shot review. A coding task can easily be 5–10× the tokens of a review.</p>
</li>
<li><p><strong>Failure retries.</strong> A failed task that gets re-run costs roughly the same as the original. Agent flakiness is a real budget line item.</p>
</li>
<li><p><strong>Mixed-model strategies.</strong> Most mature teams route cheap tasks (test stubs, doc updates) to a Codex-mini model and reserve GPT-5.5 for repository-wide refactors and PRs that need long-context reasoning.</p>
</li>
</ul>
<p>The practical pattern: build the cost model around your actual highest-volume workload (usually PR review or test generation), then size the GPT-5.5 budget separately for the smaller set of tasks that actually benefit from the new capabilities.</p>
<h2 id="heading-section-8-security-permissions-and-enterprise-setup">Section 8: Security, Permissions, and Enterprise Setup</h2>
<p>Teams care about Codex not just as a productivity tool, but as a controlled software-development system. OpenAI's docs reflect that reality.</p>
<h3 id="heading-local-vs-cloud-access">Local vs Cloud Access</h3>
<p>Enterprise admins can separately enable:</p>
<ul>
<li><p>Codex Local</p>
</li>
<li><p>Codex Cloud</p>
</li>
<li><p>Both</p>
</li>
</ul>
<p>Codex Local covers the app, CLI, and IDE extension. Codex Cloud covers hosted tasks, code review, and related integrations.</p>
<p>That separation is useful because some organizations want local tooling enabled broadly while keeping cloud tasks restricted to fewer users.</p>
<h3 id="heading-workspace-controls">Workspace Controls</h3>
<p>The admin docs say workspace owners can use RBAC to manage access. They can:</p>
<ul>
<li><p>Set a default role.</p>
</li>
<li><p>Create custom roles.</p>
</li>
<li><p>Assign roles to groups.</p>
</li>
<li><p>Sync groups with SCIM.</p>
</li>
<li><p>Manage permissions centrally.</p>
</li>
</ul>
<p>This is the right place to build a rollout with least privilege rather than giving every developer broad Codex access by default.</p>
<h3 id="heading-github-connector-and-repository-access">GitHub Connector and Repository Access</h3>
<p>Codex Cloud requires GitHub-hosted repositories. Admins connect the ChatGPT GitHub Connector, choose an installation target, and allow specific repositories. Codex uses short-lived, least-privilege GitHub App tokens and respects repository permissions and branch protection rules.</p>
<p>For security teams, that matters because it keeps Codex aligned with the repo access model you already use.</p>
<h3 id="heading-internet-access">Internet Access</h3>
<p>By default, Codex cloud agents do not have internet access at runtime. That is deliberate. If your task truly needs access to dependency registries or trusted sites, admins can configure allowlists and HTTP method limits.</p>
<h3 id="heading-recommended-governance-pattern">Recommended Governance Pattern</h3>
<p>The enterprise docs recommend using separate groups for users and admins:</p>
<ul>
<li><p>A smaller Codex Admin group for people who manage policy and governance.</p>
</li>
<li><p>A broader Codex Users group for developers who just need to use the tool.</p>
</li>
</ul>
<p>That keeps policy management tight and avoids accidental over-permissioning.</p>
<h2 id="heading-section-9-best-practices-for-teams">Section 9: Best Practices for Teams</h2>
<p>If you are onboarding a team, you will get much better outcomes if you set expectations up front.</p>
<h3 id="heading-start-with-simple-valuable-tasks">Start With Simple, Valuable Tasks</h3>
<p>Good first-team use cases:</p>
<ul>
<li><p>Pull request review.</p>
</li>
<li><p>Small bug fixes.</p>
</li>
<li><p>Test generation.</p>
</li>
<li><p>Documentation updates.</p>
</li>
<li><p>Codebase navigation and understanding.</p>
</li>
</ul>
<p>These are easy to compare against human work and easy to judge for quality.</p>
<h3 id="heading-standardize-task-prompts">Standardize Task Prompts</h3>
<p>Give people a shared prompt template. For example:</p>
<pre><code class="language-text">Task: Fix the failing test in X.
Context: The regression started after Y.
Constraints: Do not change public API behavior.
Output: Explain root cause, apply fix, run tests, summarize risks.
</code></pre>
<p>This makes results easier to review and reduces the "prompt quality lottery" that often hurts team adoption.</p>
<h3 id="heading-use-a-review-culture">Use a Review Culture</h3>
<p>Codex should not replace code review discipline. Treat it as:</p>
<ul>
<li><p>A first-pass implementer.</p>
</li>
<li><p>A pre-review reviewer.</p>
</li>
<li><p>A way to reduce repetitive work.</p>
</li>
</ul>
<p>The human team should still own architecture, product tradeoffs, and final sign-off.</p>
<h3 id="heading-measure-what-matters">Measure What Matters</h3>
<p>The metrics that matter are the ones that tell you whether Codex is producing reviewable, mergeable, trustworthy work — not the ones that count activity. Below is each metric, <strong>how to actually compute it from data you already have</strong>, and the rule of thumb for what "healthy" looks like.</p>
<h4 id="heading-1-time-to-first-useful-diff">1. Time to First Useful Diff</h4>
<p><strong>Definition:</strong> From the moment a Codex task is started, how long until it produces a diff that a human would actually consider applying (after possible small tweaks).</p>
<p><strong>How to measure:</strong></p>
<ul>
<li><p>For CLI/IDE tasks, log the wall-clock time from prompt submission to first diff. The Codex CLI emits structured logs you can parse; a simple wrapper script suffices:</p>
<pre><code class="language-bash">start=\((date +%s); codex "&lt;prompt&gt;"; echo "elapsed: \)(( $(date +%s) - start ))s"
</code></pre>
</li>
<li><p>For Codex Cloud tasks, use the task duration shown in the chatgpt.com/codex dashboard, or pull it from the workspace usage export.</p>
</li>
<li><p>Tag each task as "useful" or "discarded" in a shared spreadsheet for the first month. After that, you can sample.</p>
</li>
</ul>
<p><strong>Healthy:</strong> under 2 minutes for bounded tasks; under 10 minutes for multi-file refactors. If the median is much higher, your prompts probably lack context (see <a href="#heading-section-5-how-to-use-codex-effectively">Section 5</a>).</p>
<h4 id="heading-2-test-pass-rate-on-codex-generated-changes">2. Test Pass Rate on Codex-Generated Changes</h4>
<p><strong>Definition:</strong> Of the diffs Codex produces, what percentage pass the existing test suite on the first try.</p>
<p><strong>How to measure:</strong></p>
<ul>
<li><p>In CI, tag PRs that originated from Codex (a label like <code>codex-authored</code> or a commit-message prefix works). Then run a simple weekly query:</p>
<pre><code class="language-sql">SELECT
  COUNT(*) FILTER (WHERE first_ci_run = 'pass') * 100.0 / COUNT(*) AS first_try_pass_rate
FROM pull_requests
WHERE labels @&gt; '{"codex-authored"}'
  AND created_at &gt; NOW() - INTERVAL '7 days';
</code></pre>
</li>
<li><p>For local CLI usage, instrument with a wrapper that runs your test command immediately after Codex finishes and records the exit code.</p>
</li>
</ul>
<p><strong>Healthy:</strong> above 75% for bounded tasks. Below 50% means Codex is making changes without verifying them — usually fixable by adding "run the tests after" to your prompt template (see <a href="#heading-standardize-task-prompts">Section 9 → Standardize Task Prompts</a>).</p>
<h4 id="heading-3-review-findings-caught-by-codex">3. Review Findings Caught by Codex</h4>
<p><strong>Definition:</strong> When Codex is used as a pre-merge reviewer, how many issues does it surface that a human reviewer or CI would have caught anyway, vs. issues only Codex caught, vs. false positives.</p>
<p><strong>How to measure:</strong></p>
<ul>
<li><p>Have human reviewers annotate Codex's review comments with one of three tags: <code>agree-found-it</code>, <code>agree-missed-it</code>, <code>disagree-noise</code>.</p>
</li>
<li><p>Track the ratios over time:</p>
<ul>
<li><p><strong>Useful-finding rate</strong> = (<code>agree-found-it</code> + <code>agree-missed-it</code>) / total Codex comments.</p>
</li>
<li><p><strong>Unique-value rate</strong> = <code>agree-missed-it</code> / total Codex comments.</p>
</li>
</ul>
</li>
<li><p>A simple GitHub Actions step that posts the Codex review and asks the human reviewer to react with emoji (✅ / ⚠️ / ❌) makes this nearly free to collect.</p>
</li>
</ul>
<p><strong>Healthy:</strong> useful-finding rate above 70%; unique-value rate above 20%. Unique-value rate is the number that justifies keeping the workflow on — if it is near zero, Codex is duplicating CI and you can disable it without losing anything.</p>
<h4 id="heading-4-tasks-completed-without-human-rewrite">4. Tasks Completed Without Human Rewrite</h4>
<p><strong>Definition:</strong> Of all merged Codex-authored changes, what fraction shipped substantially as Codex wrote them (vs. being heavily rewritten by a human before merge).</p>
<p><strong>How to measure:</strong></p>
<ul>
<li><p>Compare the diff Codex initially produced to the diff that actually merged. The simplest proxy:</p>
<pre><code class="language-bash"># in the Codex-authored branch:
git diff codex/initial-commit HEAD --shortstat
</code></pre>
<p>If the post-Codex diff changes more than ~30% of the lines Codex originally wrote, count the task as "rewritten."</p>
</li>
<li><p>Track this monthly. The trend line matters more than the absolute number.</p>
</li>
</ul>
<p><strong>Healthy:</strong> above 60% shipped without major rewrite. Lower than that, and either prompts are under-specified or Codex is being pushed into work it is bad at — re-read <a href="#heading-section-14-when-not-to-use-codex">Section 14</a>.</p>
<h4 id="heading-5-developer-satisfaction">5. Developer Satisfaction</h4>
<p><strong>Definition:</strong> Whether the people actually using the tool think it makes them faster and want to keep using it. Hard numbers do not capture this.</p>
<p><strong>How to measure:</strong></p>
<ul>
<li><p>Run a 5-question pulse survey monthly. Keep it short. Suggested questions, all on a 1–5 scale:</p>
<ol>
<li><p>"Codex saved me time this week."</p>
</li>
<li><p>"I trust Codex's diffs enough to review them confidently."</p>
</li>
<li><p>"Codex's review comments are usually worth reading."</p>
</li>
<li><p>"I would be unhappy if Codex were taken away."</p>
</li>
<li><p>"What is the single biggest friction point?" (free text)</p>
</li>
</ol>
</li>
<li><p>Track the <strong>trend in question 4</strong> specifically. That is the closest equivalent to a product-market-fit signal for an internal tool.</p>
</li>
</ul>
<p><strong>Healthy:</strong> average score above 3.5/5 on questions 1–4 by month 3 of rollout. If question 4 trends down, the rollout is failing regardless of what the other metrics say.</p>
<h4 id="heading-what-not-to-measure">What NOT to Measure</h4>
<p>These look useful but mislead:</p>
<ul>
<li><p><strong>Number of prompts sent.</strong> Counts activity, not value. A team sending 10× more prompts may be 10× more productive — or 10× more confused.</p>
</li>
<li><p><strong>Tokens consumed.</strong> Useful for budget, useless for impact. Heavy users are not necessarily good users.</p>
</li>
<li><p><strong>Lines of code generated.</strong> Same problem as LOC has always had: you reward verbosity.</p>
</li>
<li><p><strong>PRs opened by Codex.</strong> A Codex-opened PR that nobody merges is a negative outcome dressed up as a positive one.</p>
</li>
</ul>
<p>Use the cost data (<a href="#heading-section-7-pricing-and-plan-access">Section 7</a>) to manage budget. Use the metrics above to manage adoption.</p>
<h3 id="heading-use-the-right-surface-for-the-job">Use the Right Surface for the Job</h3>
<ul>
<li><p>CLI for terminal-heavy local work.</p>
</li>
<li><p>IDE extension for day-to-day coding.</p>
</li>
<li><p>App for parallel project work.</p>
</li>
<li><p>Cloud for background tasks and review.</p>
</li>
</ul>
<p>That is usually the difference between "this is useful" and "this is annoying."</p>
<h2 id="heading-section-10-common-workflows-and-examples">Section 10: Common Workflows and Examples</h2>
<p>Here are the workflows most teams will actually use. Each one includes a <strong>worked example</strong> against the <code>codex-demo</code> repo from <a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4</a> so you can see the full prompt, the kind of output Codex produces, and what to do with it.</p>
<h3 id="heading-workflow-1-fix-a-bug-locally">Workflow 1: Fix a Bug Locally</h3>
<p><strong>Use when:</strong> A test is failing, a behavior is wrong, and the cause is contained to one file or function.</p>
<p><strong>Steps:</strong></p>
<ol>
<li><p>Open the repo in your terminal or IDE.</p>
</li>
<li><p>Ask Codex to inspect the failing path.</p>
</li>
<li><p>Request a fix and a test.</p>
</li>
<li><p>Review the diff.</p>
</li>
<li><p>Run the test suite.</p>
</li>
</ol>
<p><strong>Worked example:</strong></p>
<p>In the <code>codex-demo</code> repo, suppose a teammate just reported: <em>"</em><code>apply_discount</code> <em>is silently returning a negative price when discount_percent is greater than 100."</em> Verify the bug first:</p>
<pre><code class="language-bash">python -c "from pricing import apply_discount; print(apply_discount(100, 150))"
# prints: -50.0    &lt;-- silent negative price, no error raised
</code></pre>
<p>Now launch Codex and run:</p>
<pre><code class="language-text">Bug: apply_discount(100, 150) returns -50.0 instead of raising an error.
Expected: discount_percent values above 100 should raise ValueError with
the message "discount_percent must be between 0 and 100".

Task:
- Add the validation in pricing.py.
- Add a test in test_pricing.py that asserts ValueError is raised for
  discount_percent=150.
- Keep the existing tests passing.
- Run pytest at the end and report the result.
</code></pre>
<p><strong>What you get back:</strong> a diff that adds <code>if discount_percent &gt; 100: raise ValueError(...)</code> in <code>apply_discount</code>, a new <code>test_invalid_discount_percent_above_100</code> test, and the pytest output showing all four tests passing. Review with <code>git diff</code>, run <code>python -m pytest</code> yourself to confirm, then <code>git commit -am "Reject discount_percent &gt; 100"</code>.</p>
<p>This works best when the bug is bounded and reproducible. If you cannot reproduce it from the command line, Codex usually cannot either.</p>
<h3 id="heading-workflow-2-review-a-pull-request">Workflow 2: Review a Pull Request</h3>
<p><strong>Use when:</strong> You (or a teammate) just made a change and want a fast pre-merge sanity check before opening it for human review.</p>
<p><strong>Steps:</strong></p>
<ol>
<li><p>Point Codex at the PR or changed files.</p>
</li>
<li><p>Ask for correctness issues, missing tests, and security risks.</p>
</li>
<li><p>Compare the findings against human review.</p>
</li>
<li><p>Use Codex as a pre-filter before the broader team reviews.</p>
</li>
</ol>
<p><strong>Worked example:</strong></p>
<p>After completing Workflow 1 above, ask Codex to review your own change before opening a PR:</p>
<pre><code class="language-text">Review the change in my last commit (HEAD) — it added validation to
apply_discount in pricing.py.

Look for:
- correctness issues (off-by-one on the boundary, wrong error type, etc.)
- missing tests (boundary cases like exactly 100, exactly 0, NaN, negative zero)
- security or robustness issues
- API consistency with the existing apply_discount validation style

Prioritize findings as CRITICAL / IMPORTANT / NIT and propose a concrete
fix for each. Do not modify any files in this turn.
</code></pre>
<p><strong>What you might get back:</strong></p>
<pre><code class="language-text">IMPORTANT: line 14 — the new validation rejects discount_percent &gt; 100 but
  silently allows discount_percent == 100, which makes the price 0. That is
  technically valid but worth a test to lock the boundary. Add:
    test_apply_discount_at_boundary_100_returns_zero

NIT: the new error message says "between 0 and 100" but the existing check
  for negative values says "must be &gt;= 0". Consider unifying the messages
  for consistency.
</code></pre>
<p>You apply the IMPORTANT fix (often by following up with: <em>"apply the IMPORTANT fix from your review"</em>), defer or accept the nit, and re-run tests.</p>
<p>This is one of the highest-leverage team workflows because it catches obvious problems before a human spends review time on them. See <a href="#heading-3-review-findings-caught-by-codex">Section 9 → Measure What Matters → Review Findings Caught by Codex</a> for how to track its actual value over time.</p>
<h3 id="heading-workflow-3-understand-a-large-codebase">Workflow 3: Understand a Large Codebase</h3>
<p><strong>Use when:</strong> You are new to a repo (or returning after months away) and need a map before you can safely make changes.</p>
<p><strong>Steps:</strong></p>
<ol>
<li><p>Ask Codex to trace a request flow.</p>
</li>
<li><p>Ask for the key modules and entry points.</p>
</li>
<li><p>Request a map of the code path before editing anything.</p>
</li>
</ol>
<p><strong>Worked example:</strong></p>
<p>The <code>codex-demo</code> repo is too small to need this, so imagine a more realistic case: a teammate's repo with <code>app/</code>, <code>services/</code>, <code>models/</code>, <code>api/</code>, and 80 files you have never seen. Open the repo in Codex and run:</p>
<pre><code class="language-text">I am new to this codebase. Without modifying anything, give me an
orientation:

1. What is the entry point for the HTTP API?
2. Trace what happens when a POST hits /users — list every file the
   request touches in order, with a one-line description of each.
3. Where is database access centralized? Is there a repository pattern?
4. What test command should I run to verify any change I make?
5. What are the three files I should read first to understand the
   project's conventions?

Output as a structured markdown report.
</code></pre>
<p><strong>What you get back:</strong> a markdown report you can paste into your notes. Read the recommended files, then start working with Codex on actual changes. The 10 minutes spent on this orientation typically saves an hour of confused refactoring later.</p>
<p>This workflow is particularly useful for new hires. A senior engineer can also use it the first time they touch an unfamiliar service to avoid breaking conventions they cannot see.</p>
<h3 id="heading-workflow-4-generate-a-feature-in-parallel">Workflow 4: Generate a Feature in Parallel</h3>
<p><strong>Use when:</strong> A feature naturally splits into independent pieces (API + tests + docs, or UI + backend + migration) that do not block each other.</p>
<p><strong>Steps:</strong></p>
<ol>
<li><p>Break the work into subtasks.</p>
</li>
<li><p>Run separate Codex tasks for UI, API, tests, or docs.</p>
</li>
<li><p>Merge the outputs after review.</p>
</li>
</ol>
<p><strong>Worked example:</strong></p>
<p>Add a new "loyalty discount" capability to <code>codex-demo</code>. The work splits into three pieces that do not depend on each other:</p>
<table>
<thead>
<tr>
<th>Subtask</th>
<th>Surface</th>
<th>Prompt</th>
</tr>
</thead>
<tbody><tr>
<td><strong>A. Implementation</strong></td>
<td>CLI in terminal 1</td>
<td>"Add a <code>loyalty_discount(price, customer_tier)</code> function to <code>pricing.py</code>. Tiers are 'bronze' (0%), 'silver' (5%), 'gold' (10%). Reject unknown tiers with ValueError. Do not change any other function."</td>
</tr>
<tr>
<td><strong>B. Tests</strong></td>
<td>Codex Cloud</td>
<td>"Generate exhaustive tests in <code>test_pricing.py</code> for a function <code>loyalty_discount(price, customer_tier)</code> with tiers bronze/silver/gold. Cover: each tier, unknown tier, negative price, zero price, decimal prices. Do not modify pricing.py — assume the function will exist."</td>
</tr>
<tr>
<td><strong>C. Docs</strong></td>
<td>VS Code extension</td>
<td>"Add a section to README.md documenting the new loyalty_discount function: signature, tier table, and one usage example."</td>
</tr>
</tbody></table>
<p>Each runs in parallel. When all three finish, merge the diffs (typically the implementation goes first, then tests verify against it, then docs reference what shipped). Review each independently.</p>
<p>The Codex app and cloud surfaces are especially good for this because they let you launch and monitor multiple tasks without juggling terminal windows. The CLI also supports parallel work, but it benefits from <code>git worktree</code> so each run operates on its own branch checkout.</p>
<h3 id="heading-workflow-5-use-subagents-for-decomposition">Workflow 5: Use Subagents for Decomposition</h3>
<p><strong>Use when:</strong> A single task is too large for one Codex run but can be naturally split into investigate / plan / implement phases.</p>
<p>The CLI explicitly supports subagents — one Codex task that spawns child tasks, each with a narrower scope and its own context window.</p>
<p><strong>Worked example:</strong></p>
<p>A bug report says: <em>"Cart totals are sometimes off by a penny for European currencies."</em> You do not yet know if this is a rounding bug, a currency-conversion bug, or a data bug. Run a parent task that decomposes:</p>
<pre><code class="language-text">A bug report says cart totals are occasionally off by a penny for
European currencies.

Decompose this into three subagent tasks:

1. INVESTIGATE: Read pricing.py and any currency-related code. Identify
   every place where floating-point arithmetic touches a money value.
   Report findings without proposing fixes.

2. REPRODUCE: Write a failing test in test_pricing.py that demonstrates
   a one-cent discrepancy with EUR amounts. Use the smallest possible
   reproduction.

3. PROPOSE: Based on (1) and (2), propose two possible fixes (e.g.,
   switching to Decimal vs. rounding at the boundary) with the trade-offs
   of each. Do not implement either yet.

Wait for me to pick a fix before writing any production code.
</code></pre>
<p><strong>Why subagents help:</strong> each child task has a clean context, so the investigation findings do not pollute the test-writing context, and the proposal task gets a clean view of both. You also get a natural human checkpoint between investigation and implementation.</p>
<p>That division is often faster than one giant all-purpose run, and dramatically more reviewable.</p>
<h3 id="heading-prompt-cookbook">Prompt Cookbook</h3>
<p>New users often ask for examples because they know what they want outcome-wise but not how to phrase it. These templates are a good starting point.</p>
<h4 id="heading-bug-fix-template">Bug Fix Template</h4>
<pre><code class="language-text">Inspect the failing behavior in [file or module].
Identify the root cause.
Patch the smallest safe fix.
Add or update tests.
Summarize what changed and any edge cases I should watch.
</code></pre>
<p>Use this when the bug is narrow and you want a disciplined fix, not a redesign.</p>
<h4 id="heading-refactor-template">Refactor Template</h4>
<pre><code class="language-text">Refactor [module] to improve readability and maintain the current behavior.
Keep external APIs stable.
Explain the refactor plan before editing.
Make the smallest set of changes that achieves the goal.
</code></pre>
<p>Use this when the code works but is hard to maintain.</p>
<h4 id="heading-review-template">Review Template</h4>
<pre><code class="language-text">Review this change for correctness, missing tests, security issues, and maintainability risks.
Prioritize findings by severity.
Call out any behavior changes or ambiguous logic.
</code></pre>
<p>Use this when you want Codex to act like a pre-merge reviewer.</p>
<h4 id="heading-feature-template">Feature Template</h4>
<pre><code class="language-text">Implement [feature] in [file or subsystem].
List the files you expect to touch before changing anything.
Add tests.
Keep the implementation aligned with the current architecture.
</code></pre>
<p>Use this when the task spans multiple files and you want visibility into the plan.</p>
<h3 id="heading-signs-you-are-using-codex-well">Signs You Are Using Codex Well</h3>
<p>You usually know the workflow is healthy when:</p>
<ul>
<li><p>Codex makes small, reviewable diffs instead of broad rewrites.</p>
</li>
<li><p>The model asks for clarification only when the missing detail matters.</p>
</li>
<li><p>Test coverage improves along with functionality.</p>
</li>
<li><p>New developers can use the tool without needing a custom training session.</p>
</li>
<li><p>The time from prompt to merged change is lower, but review quality does not drop.</p>
</li>
</ul>
<p>You usually know the workflow is unhealthy when:</p>
<ul>
<li><p>Prompts are vague and every result needs heavy rework.</p>
</li>
<li><p>The team treats the first output as final.</p>
</li>
<li><p>Nobody is checking diffs or running tests.</p>
</li>
<li><p>Users keep asking for "make it better" instead of defining a clear target.</p>
</li>
</ul>
<p>Those signals matter more than raw usage counts.</p>
<h2 id="heading-section-11-model-specs-and-benchmarks-gpt-55-deep-dive">Section 11: Model Specs and Benchmarks (GPT-5.5 Deep Dive)</h2>
<p><a href="#heading-section-2-where-codex-fits-in-the-openai-ecosystem">Section 2</a> introduced GPT-5.5 as the new general flagship and gave the three-bullet practical takeaway. This section is the deep dive: the published benchmark numbers, what each one actually measures, why it matters for Codex workloads specifically, and how to use those numbers to pick the right model per task.</p>
<p>If you are setting budgets or choosing default models for a team, read this section in full. If you just want to use Codex, you can skim it.</p>
<h3 id="heading-why-benchmarks-matter-for-model-selection">Why Benchmarks Matter for Model Selection</h3>
<p>Codex lets you pick the model behind each surface. Picking well is mostly about matching the model's strengths to the task shape:</p>
<ul>
<li><p>A <strong>bounded local edit</strong> (one file, one function) does not benefit much from a frontier model. Codex-specific or Codex-mini variants are usually the right call.</p>
</li>
<li><p>A <strong>repository-wide refactor</strong> that needs the model to keep many files in working memory benefits enormously from long-context performance.</p>
</li>
<li><p>An <strong>agentic cloud task</strong> that runs unattended for ten minutes benefits from low hallucination rates and strong tool-use behavior.</p>
</li>
<li><p>A <strong>PR review</strong> benefits from low hallucination rates above almost everything else — a confident-but-wrong review comment costs more than a missed real issue.</p>
</li>
</ul>
<p>The benchmarks below tell you which model best matches each shape.</p>
<h3 id="heading-gpt-55-performance-highlights">GPT-5.5 Performance Highlights</h3>
<p>The published benchmarks position GPT-5.5 as a meaningful jump over GPT-5.4, particularly on agentic and long-context work — the workloads most relevant to Codex users.</p>
<ul>
<li><p><strong>Knowledge work (GDPval)</strong> — <strong>84.9%</strong>. GDPval evaluates whether a model can produce well-specified knowledge-work output across 44 occupations. This is the headline general-capability number.</p>
</li>
<li><p><strong>Computer use (OSWorld-Verified)</strong> — <strong>78.7%</strong>. Measures whether the model can drive a real computer environment end-to-end. Directly relevant to Codex Cloud sandboxes and agentic CLI runs.</p>
</li>
<li><p><strong>Coding (Terminal-Bench 2.0)</strong> — <strong>82.7%</strong>. A terminal-centric coding benchmark with long-context retrieval and computer-use components. The closest public proxy for Codex CLI workloads.</p>
</li>
<li><p><strong>Customer-service workflows (Tau2-bench Telecom)</strong> — <strong>98.0%</strong> without prompt tuning. Indicates strong tool-use and policy-adherence behavior straight out of the box.</p>
</li>
<li><p><strong>Long-context retrieval (MRCR v2 at 1M tokens)</strong> — <strong>74.0%</strong>, up from <strong>36.6%</strong> on GPT-5.4. This is the largest single jump in the report and the most important one for repository-scale Codex tasks where the model must keep many files in working memory.</p>
</li>
<li><p><strong>Hallucination rate</strong> — independent coverage reports a roughly <strong>60% reduction in hallucinations</strong> versus prior generations, which materially changes the trust calculus for review and PR-feedback workflows.</p>
</li>
</ul>
<h3 id="heading-what-each-benchmark-actually-measures">What Each Benchmark Actually Measures</h3>
<p>Benchmarks are easy to misread. Quick definitions of the ones cited above:</p>
<ul>
<li><p><strong>GDPval</strong> — Asks the model to produce specified knowledge-work output across 44 occupations (legal memos, financial summaries, technical documentation, etc.). A high score means the model can produce structured, well-specified output reliably. Use as a general-capability signal, not a coding-specific one.</p>
</li>
<li><p><strong>OSWorld-Verified</strong> — Tasks the model with operating a real desktop environment to complete real workflows (open files, navigate UIs, run commands). High scores predict the model will behave well in agentic sandboxes that mimic a developer's desktop.</p>
</li>
<li><p><strong>Terminal-Bench 2.0</strong> — A terminal-driven coding benchmark with long-context retrieval and computer-use components. The closest public proxy for what Codex CLI actually does day to day.</p>
</li>
<li><p><strong>Tau2-bench Telecom</strong> — Evaluates complex customer-service-style workflows that require following policies and using tools correctly. A proxy for "does the model do what you told it without going off-script."</p>
</li>
<li><p><strong>MRCR v2 at 1M tokens</strong> — A long-context retrieval benchmark. Tests whether the model can find and use information across a full 1M-token context window. The single best predictor of behavior on repository-scale Codex tasks where many files must be kept in working memory.</p>
</li>
</ul>
<h3 id="heading-practical-guidance-for-codex-users">Practical Guidance for Codex Users</h3>
<p>Translate the benchmarks into model choice:</p>
<ul>
<li><p><strong>Repository-wide tasks</strong> (cross-file refactors, multi-module migrations): GPT-5.5. The MRCR v2 jump is the single best signal that it will behave better on large codebases than GPT-5.4 did.</p>
</li>
<li><p><strong>Cheap, bounded local edits</strong> (single function, single test, doc tweak): GPT-5.4 or a Codex-specific model. The cost/latency tradeoff is much better and the capability headroom is wasted on small tasks. Do not default everything to GPT-5.5 just because it is newest.</p>
</li>
<li><p><strong>Agentic cloud tasks</strong> (background sandbox runs, multi-step workflows): GPT-5.5. The OSWorld-Verified score and lower hallucination rate are the relevant signals — fewer broken sandbox runs and fewer confidently-wrong outputs.</p>
</li>
<li><p><strong>PR review and code review workflows</strong>: GPT-5.5. The 60% hallucination drop is the single most important number for review work; a noisy reviewer trains the team to ignore the reviewer.</p>
</li>
<li><p><strong>Most expensive workloads</strong> (anything that approaches GPT-5.5 Pro pricing): keep GPT-5.5 Pro reserved for the small set of tasks where its extra capability is justified — typically deeply novel reasoning or extreme long-context work.</p>
</li>
</ul>
<h3 id="heading-for-procurement-treat-gpt-55-as-a-separate-budget-line">For Procurement: Treat GPT-5.5 as a Separate Budget Line</h3>
<p>Token consumption on agentic tasks is dominated by output. GPT-5.5 outputs are substantially more expensive than GPT-5.4 outputs. Concretely:</p>
<ul>
<li><p>Mixed-model strategies are now the rule, not the exception. Most mature teams route routine work to a Codex-mini model and reserve GPT-5.5 for repository-wide and review-heavy work.</p>
</li>
<li><p>The <a href="#heading-worked-cost-example">worked cost example in Section 7</a> shows the 30-engineer PR-review case across all five model tiers. Read it before approving a budget.</p>
</li>
<li><p>Re-check pricing every quarter. The rate card has changed in the past and will change again.</p>
</li>
</ul>
<h3 id="heading-verify-before-quoting">Verify Before Quoting</h3>
<p>The numbers in this section come from OpenAI's launch documentation and contemporaneous press coverage. Before they go into a procurement deck or a public document, verify against the official OpenAI announcement and the model page — see <a href="#heading-section-16-source-references">Section 16: Source References</a>. Benchmarks get re-run; numbers shift with eval methodology changes.</p>
<h2 id="heading-section-12-troubleshooting">Section 12: Troubleshooting</h2>
<p>Even good tools fail if the setup is wrong. Here are the most common issues.</p>
<h3 id="heading-codex-is-not-installed">"Codex is not installed"</h3>
<p>Check:</p>
<ul>
<li><p>You ran <code>npm i -g @openai/codex</code>.</p>
</li>
<li><p>You are using a supported shell and runtime.</p>
</li>
<li><p>The binary is on your path.</p>
</li>
</ul>
<h3 id="heading-i-cannot-sign-in">"I cannot sign in"</h3>
<p>Check:</p>
<ul>
<li><p>Your ChatGPT account has the right plan.</p>
</li>
<li><p>Your workspace allows Codex local or cloud use.</p>
</li>
<li><p>You are signing in with the correct account.</p>
</li>
</ul>
<h3 id="heading-windows-is-behaving-badly">"Windows is behaving badly"</h3>
<p>The CLI docs say Windows support is experimental. If you are on Windows, the best supported path is to use WSL for the CLI or use the Codex app where appropriate.</p>
<h3 id="heading-cloud-task-cannot-see-my-repo">"Cloud task cannot see my repo"</h3>
<p>Check:</p>
<ul>
<li><p>The GitHub connector is installed.</p>
</li>
<li><p>The repository is allowed in the connector.</p>
</li>
<li><p>Your organization admin has enabled Codex cloud.</p>
</li>
<li><p>You are using a GitHub-hosted repository.</p>
</li>
</ul>
<h3 id="heading-codex-will-not-browse-the-internet">"Codex will not browse the internet"</h3>
<p>That is expected by default in cloud mode. Ask your admin whether internet access has been intentionally restricted.</p>
<h3 id="heading-the-result-is-technically-correct-but-not-what-i-wanted">"The result is technically correct but not what I wanted"</h3>
<p>Usually this means the prompt was under-specified. Tighten:</p>
<ul>
<li><p>The target file or feature.</p>
</li>
<li><p>The acceptance criteria.</p>
</li>
<li><p>The constraints.</p>
</li>
<li><p>The expected output format.</p>
</li>
</ul>
<h2 id="heading-section-13-faq">Section 13: FAQ</h2>
<h3 id="heading-is-codex-a-chat-model">Is Codex a chat model?</h3>
<p>Not exactly. It is a coding agent and product surface built to work on repositories, tests, code review, and multi-step software tasks.</p>
<h3 id="heading-can-i-use-codex-without-switching-tools-all-the-time">Can I use Codex without switching tools all the time?</h3>
<p>Yes. That is one of its strengths. You can use the CLI, IDE extension, or Codex app depending on your workflow.</p>
<h3 id="heading-do-i-need-the-cloud-features">Do I need the cloud features?</h3>
<p>No. Many individual users will get value from the local CLI or IDE extension alone. Cloud tasks become more valuable as soon as you want background execution, parallelism, or automated review.</p>
<h3 id="heading-is-codex-only-for-professional-engineers">Is Codex only for professional engineers?</h3>
<p>No, but it is most useful when the user can evaluate code changes and understand a repository. It is a developer tool first.</p>
<h3 id="heading-is-codex-the-same-as-gpt-54">Is Codex the same as GPT-5.4?</h3>
<p>No. GPT-5.4 is a model. Codex is the coding product/workflow. Codex may use different models depending on the surface and configuration.</p>
<h3 id="heading-what-is-the-safest-way-to-start">What is the safest way to start?</h3>
<p>Use the CLI or IDE extension in a small repo change, keep the approval mode conservative, and review every diff before merging.</p>
<h2 id="heading-section-14-when-not-to-use-codex">Section 14: When NOT to Use Codex</h2>
<p>Most of this handbook is affirmative — Codex is good at this, Codex fits here, here is how to set it up. That framing risks creating the impression that Codex is the right tool for any coding-adjacent task. It is not. The fastest way to lose team trust in an AI coding tool is to push it into work it is bad at. The following is an honest list of where Codex is a poor fit today.</p>
<h3 id="heading-tasks-with-no-reviewable-output">Tasks With No Reviewable Output</h3>
<p>Codex's value depends on a human reviewing the diff, the test result, or the explanation. If the task produces something nobody will check — a one-off script that touches production data, an exploratory query whose result drives a decision before anyone reads the SQL — the AI's confidence becomes the only quality gate. That is a bad position to be in regardless of model quality. Either add a review step or do the task yourself.</p>
<h3 id="heading-highly-novel-architecture-decisions">Highly Novel Architecture Decisions</h3>
<p>Codex is good at applying patterns. It is much weaker at choosing which pattern fits a problem the team has not solved before. Expect it to confidently generate plausible-but-wrong architecture for genuinely new domains: a new pricing model, a new auth boundary, a new event-sourcing scheme. Use it to prototype options, not to decide between them.</p>
<h3 id="heading-work-that-crosses-org-boundaries">Work That Crosses Org Boundaries</h3>
<p>Codex sees the repository it has access to. It does not see the cross-team contracts, the deprecation calendar in the platform team's roadmap, the half-finished migration in another repo, or the political reasons one approach is off-limits. For changes that span multiple teams or services, Codex can implement individual pieces, but a human still needs to own the cross-cutting plan.</p>
<h3 id="heading-anything-touching-live-production-state">Anything Touching Live Production State</h3>
<p>Codex Cloud sandboxes are good. They are not a substitute for human approval before a production change. Database migrations, infrastructure-as-code that mutates real resources, secret rotation, customer-data scripts — these need a human in the approval path even if Codex wrote the diff. The fact that Codex can run commands does not mean it should run those commands.</p>
<h3 id="heading-compliance-and-safety-critical-code">Compliance- and Safety-Critical Code</h3>
<p>Code that lives inside a regulated boundary (payments, medical, security primitives, model-evaluation harnesses for safety) has higher review and provenance requirements than typical product code. Codex output is fine as a starting draft, but the review burden is the same as for any third-party-authored code, which usually means the speed advantage shrinks substantially. Plan for that or keep these areas Codex-free.</p>
<h3 id="heading-tasks-where-the-real-bottleneck-is-knowledge-not-typing">Tasks Where the Real Bottleneck Is Knowledge, Not Typing</h3>
<p>If the team is stuck because nobody understands the legacy system, the failing test, or the weird customer report, generating more code rarely helps. Codex can accelerate the implementation once you know what to do. It cannot replace the discovery and design conversation that should happen first. Teams that skip the discovery step and go straight to "ask Codex" tend to ship the wrong thing fast.</p>
<h3 id="heading-anything-where-hallucinations-have-high-cost">Anything Where Hallucinations Have High Cost</h3>
<p>GPT-5.5 dropped hallucination rates by roughly 60% versus prior generations, which is a real improvement. It is not zero. Tasks where a confident-but-wrong output causes real damage — generating regulatory citations, copying API contract details from a doc the model hasn't actually read, asserting facts about an unfamiliar third-party library — still need the same skepticism you would apply to any AI output. Use search-grounded workflows or human verification for these.</p>
<h3 id="heading-quick-heuristic">Quick Heuristic</h3>
<p>If you can answer all four of these with "yes," Codex is likely a good fit:</p>
<ol>
<li><p>Can the output be reviewed by someone who would catch a mistake?</p>
</li>
<li><p>Is the task a known pattern, not a novel architecture decision?</p>
</li>
<li><p>Is the blast radius local to one repository or service?</p>
</li>
<li><p>Is the cost of a bad output bounded (e.g., a failed test, a reverted commit) rather than unbounded (e.g., production data loss, regulatory exposure)?</p>
</li>
</ol>
<p>If any of those are "no," either restructure the task to make them "yes" or keep the work outside Codex.</p>
<h2 id="heading-section-15-final-recommendations">Section 15: Final Recommendations</h2>
<p>If you are rolling Codex out to new users, I would keep the guidance very simple:</p>
<ol>
<li><p>Start with the CLI or IDE extension.</p>
</li>
<li><p>Use one small task to learn the tool.</p>
</li>
<li><p>Review every change before merging.</p>
</li>
<li><p>Move to cloud tasks only after users trust the local workflow.</p>
</li>
<li><p>For teams, separate user access from admin access.</p>
</li>
<li><p>Re-check pricing whenever your plan or workspace changes.</p>
</li>
</ol>
<p>Codex is most valuable when it is treated as a disciplined engineering tool rather than a novelty. If you give it real code, clear constraints, and a review culture, it can accelerate the boring parts of software development and make bigger tasks easier to break down.</p>
<h3 id="heading-the-lunartech-fellowship-bridging-academia-and-industry">The LUNARTECH Fellowship: Bridging Academia and Industry</h3>
<p>Addressing the growing disconnect between academic theory and the practical demands of the tech industry, the LUNARTECH Fellowship was created to bridge this talent gap.</p>
<p>Far too often, aspiring engineers are caught in the “no experience, no job” loop, graduating with theoretical knowledge but unprepared for the messy reality of production systems.</p>
<p>To combat this systemic issue and halt the resulting brain drain, the Fellowship invests heavily in promising individuals, offering a transformative environment that prioritizes hands-on experience, mentorship, and real-world engineering over traditional degrees.</p>
<p>This 6-month, remote-first apprenticeship serves as an immersive odyssey from aspiring talent to AI trailblazer. Rather than paying to learn in isolation, Fellows work on live, high-stakes AI and data products alongside experienced senior engineers and founders. By tackling actual engineering challenges and building a concrete portfolio of production-ready work, participants acquire the job-ready skills needed to thrive in today’s competitive landscape.</p>
<p>If you are ready to break the loop and accelerate your career, you can explore these opportunities and start your journey here: <a href="https://www.lunartech.ai/our-careers">https://www.lunartech.ai/our-careers</a>.</p>
<h3 id="heading-master-your-career-the-ai-engineering-handbook">Master Your Career: The AI Engineering Handbook</h3>
<p>For those ready to transition from theory to practice, we have developed <a href="https://www.lunartech.ai/download/the-ai-engineering-handbook"><strong>The AI Engineering Handbook: How to Start a Career and Excel as an AI Engineer</strong></a>. This comprehensive guide provides a step-by-step roadmap for mastering the skills necessary to thrive in the transformative world of AI in 2026.</p>
<p>Whether you are a developer looking to break into a competitive field or a professional seeking to future-proof your career, this handbook offers proven strategies and actionable insights that have already empowered countless individuals to secure high-impact roles.</p>
<p>Inside, you will explore real-world industry workflows, advanced architecting methods, and expert perspectives from leaders at companies like NVIDIA, Microsoft, and OpenAI. From discovering the technology behind ChatGPT to learning how to architect systems that transform research into world-changing products, this eBook is your ultimate companion for career acceleration. You can <a href="https://www.lunartech.ai/download/the-ai-engineering-handbook">download your free copy</a> and start mastering the future of AI.</p>
<h2 id="heading-section-16-source-references">Section 16: Source References</h2>
<p>Official OpenAI sources used for this handbook:</p>
<ul>
<li><p><a href="https://openai.com/index/introducing-gpt-5-5/">Introducing GPT-5.5 (OpenAI)</a></p>
</li>
<li><p><a href="https://help.openai.com/en/articles/11369540-codex-in-chatgpt-faq">Using Codex with your ChatGPT plan</a></p>
</li>
<li><p><a href="https://help.openai.com/en/articles/11487671-flexible-pricing-for-the-enterprise-edu-and-team-plans">Flexible pricing for the Enterprise, Edu, and Business plans</a></p>
</li>
<li><p><a href="https://developers.openai.com/api/docs/models/all">All models</a></p>
</li>
<li><p><a href="https://developers.openai.com/api/docs/models">OpenAI API models overview</a></p>
</li>
<li><p><a href="https://developers.openai.com/api/docs/models/gpt-5-codex">GPT-5-Codex model</a></p>
</li>
<li><p><a href="https://developers.openai.com/api/docs/models/gpt-5.2-codex">GPT-5.2-Codex model</a></p>
</li>
<li><p><a href="https://developers.openai.com/api/docs/models/codex-mini-latest">codex-mini-latest model</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/use-cases">Codex use cases</a></p>
</li>
<li><p><a href="https://docs.anthropic.com/en/docs/overview">Claude overview</a></p>
</li>
<li><p><a href="https://docs.github.com/en/copilot/">GitHub Copilot documentation</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/enterprise/admin-setup">Codex enterprise admin setup</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/ide">Codex IDE extension docs</a></p>
</li>
<li><p><a href="https://marketplace.visualstudio.com/items?itemName=openai.chatgpt">Codex – OpenAI's coding agent (VS Code Marketplace listing)</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/cloud">Codex web (cloud) docs</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/cli">Codex CLI docs</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/cli/reference">Codex CLI command-line reference</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/cli/features">Codex CLI features</a></p>
</li>
<li><p><a href="https://developers.openai.com/codex/quickstart">Codex quickstart</a></p>
</li>
<li><p><a href="https://help.openai.com/en/articles/11369540-using-codex-with-your-chatgpt-plan">Using Codex with your ChatGPT plan (Help Center)</a></p>
</li>
</ul>
<p>Press coverage of the GPT-5.5 release referenced in <a href="#heading-section-2-where-codex-fits-in-the-openai-ecosystem">Section 2</a> and <a href="#heading-section-11-model-specs-and-benchmarks-gpt-55-deep-dive">Section 11</a>:</p>
<ul>
<li><p><a href="https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/">OpenAI releases GPT-5.5, bringing company one step closer to an AI 'super app' (TechCrunch)</a></p>
</li>
<li><p><a href="https://thenewstack.io/openai-launches-gpt-5-5-calling-it-a-new-class-of-intelligence/">OpenAI launches GPT-5.5, calling it "a new class of intelligence" (The New Stack)</a></p>
</li>
<li><p><a href="https://startupfortune.com/openais-gpt-55-benchmarks-show-a-60-hallucination-drop-and-coding-skills-that-rival-senior-engineers/">OpenAI's GPT-5.5 benchmarks show a 60% hallucination drop and coding skills that rival senior engineers (Startup Fortune)</a></p>
</li>
</ul>
<h2 id="heading-appendix-a-30-60-90-day-adoption-plan">Appendix A: 30-60-90 Day Adoption Plan</h2>
<p>If you are introducing Codex to a team, the fastest way to create trust is to phase adoption instead of rolling it out as a big-bang change. A staged plan also helps you discover where the real friction lives: authentication, permissions, prompt quality, review habits, or budget assumptions.</p>
<h3 id="heading-first-30-days-prove-value">First 30 Days: Prove Value</h3>
<p>In the first month, the goal is not maximum usage. The goal is repeatable wins.</p>
<p>Recommended actions:</p>
<ol>
<li><p>Pick one or two engineers who are comfortable trying new tools.</p>
</li>
<li><p>Restrict usage to small, low-risk tasks such as bug fixes, test generation, and documentation updates.</p>
</li>
<li><p>Standardize a short prompt template so every request includes task, context, constraints, and expected output.</p>
</li>
<li><p>Require human review for every change.</p>
</li>
<li><p>Track the time it takes to go from prompt to merged diff.</p>
</li>
</ol>
<p>What you should learn in this phase:</p>
<ul>
<li><p>Does Codex understand your codebase structure?</p>
</li>
<li><p>Are the diffs reviewable?</p>
</li>
<li><p>Does the approval flow slow people down in a useful way, or in a frustrating way?</p>
</li>
<li><p>Which classes of tasks work well, and which ones need more guidance?</p>
</li>
</ul>
<p>If the first month is noisy, do not blame the model first. Usually the issue is task scope, missing context, or unclear acceptance criteria.</p>
<h3 id="heading-days-31-60-expand-carefully">Days 31-60: Expand Carefully</h3>
<p>Once the tool has proven itself on a handful of tasks, expand to a broader pilot group.</p>
<p>Recommended actions:</p>
<ol>
<li><p>Add more developers from different parts of the stack.</p>
</li>
<li><p>Include at least one person who is skeptical, because their feedback will reveal weak spots.</p>
</li>
<li><p>Try the app, CLI, and IDE extension in parallel so people can choose the workflow that matches their habits.</p>
</li>
<li><p>Introduce Codex cloud for one or two background tasks or pull request reviews.</p>
</li>
<li><p>Start documenting prompts that worked well, including examples of high-quality follow-up instructions.</p>
</li>
</ol>
<p>What you should learn in this phase:</p>
<ul>
<li><p>Which surfaces are actually sticky for the team?</p>
</li>
<li><p>Where does Codex save the most time?</p>
</li>
<li><p>Do people trust the output enough to delegate real work?</p>
</li>
<li><p>Are you seeing the same mistakes repeatedly?</p>
</li>
</ul>
<p>At this stage, your internal documentation matters. A short "how we use Codex here" page is often more useful than another technical deep dive.</p>
<h3 id="heading-days-61-90-operationalize">Days 61-90: Operationalize</h3>
<p>After about three months, your objective should shift from experimentation to operating practice.</p>
<p>Recommended actions:</p>
<ol>
<li><p>Assign ownership for workspace settings, GitHub connector setup, and model access.</p>
</li>
<li><p>Define which tasks should stay local and which can go to cloud sandboxes.</p>
</li>
<li><p>Document your review standards for Codex-generated diffs.</p>
</li>
<li><p>Set budget expectations with the team so no one is surprised by token-heavy tasks.</p>
</li>
<li><p>Add Codex to onboarding for new engineers, starting with one simple flow.</p>
</li>
</ol>
<p>What good looks like at this stage:</p>
<ul>
<li><p>New hires can use Codex on day one.</p>
</li>
<li><p>Team members know when to reach for Codex and when to use a different workflow.</p>
</li>
<li><p>Admins can answer access and pricing questions quickly.</p>
</li>
<li><p>The organization has a realistic picture of the tool's strengths and limits.</p>
</li>
</ul>
<h3 id="heading-a-practical-onboarding-script">A Practical Onboarding Script</h3>
<p>If you need a ready-made orientation for a new user, use this:</p>
<ol>
<li><p>"Install the CLI or extension."</p>
</li>
<li><p>"Open a repository you know well."</p>
</li>
<li><p>"Ask Codex to make one small, safe change."</p>
</li>
<li><p>"Review the diff line by line."</p>
</li>
<li><p>"Run the tests."</p>
</li>
<li><p>"Ask Codex to explain what it changed and why."</p>
</li>
<li><p>"Repeat with a slightly larger task."</p>
</li>
</ol>
<p>That sequence teaches the core loop: context, task, change, review, verify. Once a user understands that loop, the rest of the product family becomes much easier to adopt.</p>
<h2 id="heading-appendix-b-glossary">Appendix B: Glossary</h2>
<p>Terms used in this handbook, in alphabetical order. The list is intentionally narrow — only terms that appear in the body and are likely to be unfamiliar to a non-engineering reader (procurement, security, leadership) are defined here.</p>
<ul>
<li><p><strong>Agent / agentic workflow.</strong> Software that can take a goal, plan steps, take actions (read files, run commands, call APIs), observe the result, and iterate. Codex is an agentic coding workflow; a chatbot is not.</p>
</li>
<li><p><strong>Approval mode.</strong> A Codex setting that controls how much the agent can do without asking. Stricter modes prompt the human before running shell commands or modifying files; permissive modes let the agent work uninterrupted.</p>
</li>
<li><p><strong>CLI.</strong> Command-line interface. The Codex CLI is the terminal-based version of Codex, installed via <code>npm i -g @openai/codex</code>.</p>
</li>
<li><p><strong>Codex Cloud.</strong> The hosted, sandboxed execution mode for Codex. Tasks run in isolated environments with the repo and finish with a reviewable diff.</p>
</li>
<li><p><strong>GDPval.</strong> A benchmark that scores models on their ability to produce well-specified knowledge-work output across 44 occupations. Used in <a href="#heading-section-11-model-specs-and-benchmarks-gpt-55-deep-dive">Section 11</a> as a general-capability signal.</p>
</li>
<li><p><strong>GitHub Connector.</strong> The integration that lets Codex Cloud access GitHub repositories. Required for cloud tasks; uses short-lived, least-privilege tokens.</p>
</li>
<li><p><strong>MCP (Model Context Protocol).</strong> An open protocol for connecting models to external data sources and tools. Codex CLI supports MCP, which lets it pull in data from systems beyond the repo.</p>
</li>
<li><p><strong>MRCR v2.</strong> A long-context retrieval benchmark that measures whether the model can find and use information across very large input windows. The 1M-token version is cited in the GPT-5.5 section because it predicts behavior on repository-scale tasks.</p>
</li>
<li><p><strong>OSWorld-Verified.</strong> A benchmark that measures whether a model can operate a real desktop computer environment to complete tasks. A direct proxy for agentic and computer-use workloads.</p>
</li>
<li><p><strong>PR (pull request).</strong> A proposed change to a code repository, hosted on GitHub or similar platforms, where reviewers approve before the change merges.</p>
</li>
<li><p><strong>RBAC (role-based access control).</strong> A permission model where users are assigned to roles, and roles have specific permissions. Used by Codex workspace admins to control who can do what.</p>
</li>
<li><p><strong>SCIM (System for Cross-domain Identity Management).</strong> A standard for syncing users and groups from an identity provider (Okta, Entra ID, etc.) into another system. Codex supports SCIM-based group sync for enterprise.</p>
</li>
<li><p><strong>Subagent.</strong> A Codex CLI feature that splits a task across multiple parallel agent runs, each handling a piece of the work.</p>
</li>
<li><p><strong>Tau2-bench Telecom.</strong> A benchmark for complex customer-service workflows with tool use. Cited as a signal for tool-use reliability and policy adherence.</p>
</li>
<li><p><strong>Terminal-Bench 2.0.</strong> A coding benchmark focused on terminal-driven workflows, including long-context retrieval and computer use. The closest public proxy for Codex CLI workloads.</p>
</li>
<li><p><strong>Worktree.</strong> A git feature that lets multiple branches be checked out simultaneously in different directories. The Codex app uses worktrees so multiple agents can work in parallel without stepping on each other.</p>
</li>
<li><p><strong>WSL (Windows Subsystem for Linux).</strong> A compatibility layer that runs Linux binaries natively on Windows. The recommended environment for Codex CLI on Windows, since direct Windows support is experimental.</p>
</li>
</ul>
<h2 id="heading-appendix-c-admin-security-checklist">Appendix C: Admin Security Checklist</h2>
<p>For workspace admins setting up Codex for an enterprise. This checklist condenses <a href="#heading-section-8-security-permissions-and-enterprise-setup">Section 8</a> into actionable items. Run through it before broad rollout, then revisit quarterly.</p>
<p><strong>Access</strong></p>
<ul>
<li><p>[ ] Decide whether Codex Local, Codex Cloud, or both are enabled at the workspace level.</p>
</li>
<li><p>[ ] Create separate RBAC groups for Codex Admins (policy and governance) and Codex Users (day-to-day developers). Avoid mixing the two.</p>
</li>
<li><p>[ ] Sync user and group membership from your identity provider via SCIM rather than managing users by hand.</p>
</li>
<li><p>[ ] Set a sensible default role for new workspace members. Do not default to admin.</p>
</li>
</ul>
<p><strong>GitHub integration</strong></p>
<ul>
<li><p>[ ] Install the ChatGPT GitHub Connector against the correct GitHub organization.</p>
</li>
<li><p>[ ] Allowlist only the repositories Codex Cloud needs. Do not grant org-wide access by default.</p>
</li>
<li><p>[ ] Verify Codex respects existing branch protection rules on protected branches before enabling cloud tasks against them.</p>
</li>
<li><p>[ ] Confirm the GitHub App tokens Codex uses are short-lived and least-privilege.</p>
</li>
</ul>
<p><strong>Network and runtime</strong></p>
<ul>
<li><p>[ ] Confirm Codex Cloud runs with no internet access by default. This is the secure default; verify it is on.</p>
</li>
<li><p>[ ] If a workflow requires internet access, define an explicit allowlist (dependency registries, trusted sites) and limit allowed HTTP methods.</p>
</li>
<li><p>[ ] Document which model surfaces are approved for sensitive code (often: local CLI yes, cloud no for the most sensitive repositories).</p>
</li>
</ul>
<p><strong>Data and review</strong></p>
<ul>
<li><p>[ ] Document the team's review standard for Codex-generated diffs. At minimum: a human approves every merge.</p>
</li>
<li><p>[ ] Confirm logging and audit trails are configured for Codex actions (model used, prompts, files changed) per your compliance requirements.</p>
</li>
<li><p>[ ] Define which classes of data are off-limits to Codex (PII, customer data, secrets) and how those boundaries are enforced.</p>
</li>
<li><p>[ ] Establish an incident playbook for the case where Codex generates or commits something it should not have.</p>
</li>
</ul>
<p><strong>Budget and ongoing operations</strong></p>
<ul>
<li><p>[ ] Set a per-workspace token budget or alert threshold so unexpected spend is caught early.</p>
</li>
<li><p>[ ] Pick a default model per task type (e.g., Codex-mini for routine review, GPT-5.5 for repository-wide refactors) and document the choice.</p>
</li>
<li><p>[ ] Review the Codex pricing page quarterly. The rate card has changed in the past and will change again.</p>
</li>
<li><p>[ ] Re-run this checklist when (a) a major model release lands, (b) the workspace expands to a new team, or (c) Codex adds a new surface or capability.</p>
</li>
</ul>
<h2 id="heading-appendix-d-changelog">Appendix D: Changelog</h2>
<p>A short, append-only log of substantive revisions to this handbook. Each entry lists the version, date, and a one-line summary of what changed.</p>
<ul>
<li><p><strong>v1.3 — 2026-04-30.</strong> Made the Table of Contents clickable. Added a new Prerequisites section after the TOC. Restructured the early sections: merged the old "Quick Start" and "How to Set Up Codex" into a single <a href="#heading-section-4-getting-started-install-set-up-and-your-first-task">Section 4</a> walkthrough using a self-contained <code>codex-demo</code> repo readers build themselves. Slimmed <a href="#heading-section-2-where-codex-fits-in-the-openai-ecosystem">Section 2</a> by moving the GPT-5.5 benchmark deep dive to a new <a href="#heading-section-11-model-specs-and-benchmarks-gpt-55-deep-dive">Section 11</a> (Model Specs and Benchmarks). Added per-surface hyperlinks to <a href="#heading-section-3-the-core-surfaces">Section 3</a>. Rewrote <a href="#heading-section-5-how-to-use-codex-effectively">Section 5</a> (How to Use Codex Effectively) with bad/good examples for every tip and a definition of "bounded change." Rewrote the "Measure What Matters" subsection with concrete computation methods for each metric. Added worked, runnable examples to every workflow in <a href="#heading-section-10-common-workflows-and-examples">Section 10</a>. Renumbered downstream sections accordingly.</p>
</li>
<li><p><strong>v1.2 — 2026-04-25.</strong> Added Appendix E (Working with Codex in VS Code), a detailed step-by-step guide covering the three VS Code entry points — the extension, the CLI in the integrated terminal, and browser Codex at chatgpt.com/codex — with setup instructions, a decision matrix, a combined-workflow pattern, and VS Code-specific troubleshooting. Added a forward-pointer in the setup section.</p>
</li>
<li><p><strong>v1.1 — 2026-04-25.</strong> Added GPT-5.5 / GPT-5.5 Pro coverage in <a href="#heading-section-2-where-codex-fits-in-the-openai-ecosystem">Section 2</a> and <a href="#heading-section-7-pricing-and-plan-access">Section 7</a>. Added executive summary, comparison matrix in the model-comparison section, worked cost example, "When NOT to use Codex" in <a href="#heading-section-14-when-not-to-use-codex">Section 14</a>. Added Appendix B (Glossary), Appendix C (Admin Security Checklist), Appendix D (Changelog). Added version stamp and author line. Press coverage sources for GPT-5.5 added in <a href="#heading-section-16-source-references">Section 16</a>.</p>
</li>
<li><p><strong>v1.0 — Initial release.</strong> Original Codex onboarding handbook covering surfaces, setup, usage, model comparison, pricing, security, team practices, workflows, troubleshooting, FAQ, and the 30-60-90 day adoption plan.</p>
</li>
</ul>
<h2 id="heading-appendix-e-working-with-codex-in-vs-code">Appendix E: Working with Codex in VS Code</h2>
<p>This appendix is a focused, step-by-step guide to using Codex inside Visual Studio Code (and its forks, Cursor and Windsurf).</p>
<p>VS Code is the most common starting surface for new Codex users, and the workflow has three distinct entry points that can be used independently or together. This guide covers each one, when to pick it, and how the three combine into a single fluid workflow.</p>
<h3 id="heading-e1-why-vs-code-is-the-recommended-starting-surface">E.1 Why VS Code Is the Recommended Starting Surface</h3>
<p>Most teams start with VS Code rather than the standalone Codex app or pure CLI for a few practical reasons:</p>
<ul>
<li><p>The editor is already where engineers spend their day. Adding Codex does not require a context switch.</p>
</li>
<li><p>The extension surface area is small and reviewable. Engineers can try it on a single file before adopting it more broadly.</p>
</li>
<li><p>VS Code's integrated terminal makes the CLI a one-keystroke experience, so the extension and CLI can be combined without leaving the editor.</p>
</li>
<li><p>Cursor and Windsurf, the most popular VS Code forks, both run the same Codex extension. A team that standardizes on the VS Code workflow does not have to retrain people if some engineers prefer a fork.</p>
</li>
</ul>
<p>The downside of starting in VS Code is that you do not get parallel-task management or worktree support out of the box — those are stronger in the Codex app. For most individual contributors, that is not a meaningful loss in the first month.</p>
<h3 id="heading-e2-the-three-entry-points">E.2 The Three Entry Points</h3>
<p>Codex shows up in VS Code in three distinct ways, and they are easy to confuse. Each is a separate piece of software with its own install and its own auth handshake, even though they all sign in with the same ChatGPT account.</p>
<ol>
<li><p><strong>The Codex VS Code extension</strong> — a sidebar UI inside VS Code itself. Installed from the VS Code Marketplace. Best for in-flow editing, quick questions about the open file, and short bounded tasks.</p>
</li>
<li><p><strong>The Codex CLI, run inside VS Code's integrated terminal</strong> — the command-line agent (<code>codex</code>) running in the terminal pane that is already attached to your VS Code workspace. Best for multi-step agentic tasks, scripted runs, and anything where you want explicit approval gates.</p>
</li>
<li><p><strong>Browser Codex at chatgpt.com/codex</strong> — the web interface to Codex Cloud, where tasks run in isolated sandboxes against your GitHub repository. Best for background work, parallel tasks, and PR-style review.</p>
</li>
</ol>
<p>These are not alternatives to each other in the sense that you must pick one. They are three workflows that target different kinds of work, and most experienced Codex users have all three set up.</p>
<h3 id="heading-e3-setting-up-the-codex-vs-code-extension">E.3 Setting Up the Codex VS Code Extension</h3>
<p>This is the entry point most new users meet first.</p>
<p><strong>Install</strong></p>
<p>There are two install paths:</p>
<ol>
<li><p>Open the VS Code Marketplace, search for "Codex" or "ChatGPT", and install the extension published by <code>openai</code>. The marketplace identifier is <code>openai.chatgpt</code>.</p>
</li>
<li><p>From a terminal, run:</p>
</li>
</ol>
<pre><code class="language-bash">code --install-extension openai.chatgpt
</code></pre>
<p>The CLI install path is useful for scripted dev-environment provisioning, dotfiles repos, and onboarding scripts that bring a new machine up to a known baseline.</p>
<p><strong>Sign in</strong></p>
<p>After install, the Codex panel appears in the right sidebar. The first time you open it, you will be prompted to sign in. You have two options:</p>
<ul>
<li><p><strong>Sign in with ChatGPT.</strong> Recommended for individuals on Plus, Pro, Business, or Enterprise/Edu plans. Usage is charged against your plan's included Codex credits.</p>
</li>
<li><p><strong>Sign in with an API key.</strong> Used when you want metered API billing instead of plan-based usage, or when your workspace policy requires it. Get the key from the OpenAI developer console, then paste it into the extension's auth prompt.</p>
</li>
</ul>
<p>If both options are visible and you are unsure which to pick, default to ChatGPT sign-in. It is the path that exercises the same plan-included usage that the rest of your team is on, which makes cost behavior predictable.</p>
<p><strong>First-run sanity check</strong></p>
<p>Once signed in, do a five-minute sanity check before relying on the extension for real work:</p>
<ol>
<li><p>Open a small repository you know well.</p>
</li>
<li><p>Open the Codex panel in the right sidebar.</p>
</li>
<li><p>Ask a question about the open file (e.g., "What does this function do?") and confirm the answer matches what you already know.</p>
</li>
<li><p>Ask for a small change (e.g., "Add a docstring to this function") and confirm a reviewable diff appears.</p>
</li>
<li><p>Apply the change, run your tests, and revert if needed.</p>
</li>
</ol>
<p>If any of those steps fails, fix the auth or install before going further. Trying to debug the extension on a real task is much harder than debugging it on a known-good toy task.</p>
<p><strong>Platform notes</strong></p>
<ul>
<li><p><strong>macOS and Linux</strong> are first-class. The extension and the underlying CLI both work natively.</p>
</li>
<li><p><strong>Windows</strong> is experimental for the CLI. The extension itself works, but if you also want to run the CLI inside VS Code's integrated terminal, OpenAI recommends using a WSL workspace. Open the folder via "Reopen in WSL" before installing the CLI.</p>
</li>
<li><p><strong>Cursor and Windsurf</strong> run the same extension. Watch for visual or shortcut conflicts with the fork's built-in AI features — see E.9 for specifics.</p>
</li>
</ul>
<h3 id="heading-e4-setting-up-the-codex-cli-inside-vs-codes-integrated-terminal">E.4 Setting Up the Codex CLI Inside VS Code's Integrated Terminal</h3>
<p>The CLI is the second entry point. It runs as a normal command-line tool, but inside VS Code's integrated terminal it picks up the active workspace folder automatically, which makes it feel like a native part of the editor.</p>
<p><strong>Install the CLI</strong></p>
<p>From any terminal, including VS Code's integrated terminal:</p>
<pre><code class="language-bash">npm i -g @openai/codex
</code></pre>
<p>This installs the <code>codex</code> binary globally. Confirm by running:</p>
<pre><code class="language-bash">codex --version
</code></pre>
<p>If the command is not found, the most common cause is that npm's global bin directory is not on your PATH. Either fix the PATH or use a Node version manager (nvm, fnm, volta) that handles it for you.</p>
<p><strong>Open the integrated terminal in VS Code</strong></p>
<p>Three ways to open it, pick whichever matches your habits:</p>
<ul>
<li><p>The View menu → Terminal.</p>
</li>
<li><p>The keyboard shortcut <strong>Ctrl+</strong><code>** (backtick) on Windows/Linux, **⌃</code> on macOS.</p>
</li>
<li><p>The Command Palette: <code>Terminal: Create New Terminal</code>.</p>
</li>
</ul>
<p>The integrated terminal inherits the active workspace folder as its working directory, which means <code>codex</code> launched from there immediately sees the right repo.</p>
<p><strong>Run Codex</strong></p>
<p>In the terminal, navigate to the repo (if you are not already there) and run:</p>
<pre><code class="language-bash">codex
</code></pre>
<p>The first time you run it, you will go through the same auth flow as the extension — sign in with ChatGPT or paste an API key.</p>
<p><strong>Pick an approval mode</strong></p>
<p>The CLI supports several approval modes that govern how much Codex can do without explicit confirmation. For new users, start with the strictest mode (asks before every shell command and every file change), then loosen it once you trust the workflow on your repo. The relevant modes and how to toggle them are described in the CLI docs linked in <a href="#heading-section-16-source-references">Section 16</a>.</p>
<p><strong>Where the CLI beats the extension</strong></p>
<ul>
<li><p>Multi-step agentic runs that need to read several files, run tests, iterate, and report.</p>
</li>
<li><p>Anything you want to script or invoke from a <code>package.json</code> script, a Makefile, or a CI step.</p>
</li>
<li><p>Subagent decomposition (the CLI explicitly supports splitting a task across multiple parallel agent runs).</p>
</li>
<li><p>MCP-connected tools and custom data sources.</p>
</li>
<li><p>Cloud task launching from the terminal, when you do not want to leave the keyboard.</p>
</li>
</ul>
<h3 id="heading-e5-setting-up-browser-codex-chatgptcomcodex">E.5 Setting Up Browser Codex (chatgpt.com/codex)</h3>
<p>The third entry point lives outside VS Code but is essential for the full workflow because it is how you launch and monitor cloud tasks.</p>
<p><strong>Open browser Codex</strong></p>
<p>Navigate to <strong>chatgpt.com/codex</strong>. You will need to be signed into the same ChatGPT account you used for the extension and CLI. If you are part of an enterprise workspace, your admin must have enabled Codex Cloud at the workspace level — see <a href="#heading-section-8-security-permissions-and-enterprise-setup">Section 8</a>.</p>
<p>You can also reach Codex through the sidebar in regular ChatGPT. The browser surface exposes two main verbs:</p>
<ul>
<li><p><strong>Code</strong> — assign a coding task. Codex spins up a sandbox preloaded with your repository and produces a reviewable diff.</p>
</li>
<li><p><strong>Ask</strong> — ask a question about your codebase without changing any code.</p>
</li>
</ul>
<p><strong>Connect a GitHub repository</strong></p>
<p>Cloud tasks need a GitHub-hosted repository. Connect it once:</p>
<ol>
<li><p>Open environment settings at chatgpt.com/codex.</p>
</li>
<li><p>Connect your GitHub account through the ChatGPT GitHub Connector.</p>
</li>
<li><p>Grant access to the specific repositories you want Codex to be able to use. Do not grant org-wide access by default — see Appendix C for the security checklist.</p>
</li>
<li><p>Confirm the connector shows the repo as available.</p>
</li>
</ol>
<p><strong>Launch a task</strong></p>
<p>From the Codex web interface:</p>
<ol>
<li><p>Pick the repository and (optionally) the branch.</p>
</li>
<li><p>Type a prompt describing the task. Be specific — "Add input validation to the <code>/users</code> POST endpoint and update the matching tests" beats "Improve the API."</p>
</li>
<li><p>Click <strong>Code</strong> (or <strong>Ask</strong> for a non-mutating question).</p>
</li>
<li><p>Watch the live logs as Codex works, or close the tab and let it run in the background.</p>
</li>
<li><p>When it finishes, review the diff. From there you can request changes, accept the result, or open a pull request.</p>
</li>
</ol>
<p><strong>Delegate from a GitHub PR comment</strong></p>
<p>A useful shortcut: in any PR on a connected repo, you can post a comment that tags <code>@codex</code> with an instruction (for example, "@codex review this PR for security issues and missing tests"). Codex will pick up the request and respond on the PR. This requires being signed into ChatGPT in the same browser.</p>
<p><strong>Why the browser surface matters even if you live in VS Code</strong></p>
<p>Cloud tasks decouple Codex from your local machine. You can launch a long-running task from the browser, close the laptop, and come back to the diff later. The extension and CLI cannot do this — they need an open VS Code instance to run.</p>
<h3 id="heading-e6-when-to-pick-which-entry-point">E.6 When to Pick Which Entry Point</h3>
<p>The three entry points overlap, which causes confusion. This table makes the choice mechanical.</p>
<table>
<thead>
<tr>
<th>Situation</th>
<th>Best entry point</th>
<th>Why</th>
</tr>
</thead>
<tbody><tr>
<td>Quick edit on the file you have open</td>
<td>Extension</td>
<td>Lowest friction, no context switch</td>
</tr>
<tr>
<td>"What does this function do?"</td>
<td>Extension</td>
<td>Right-sidebar Q&amp;A is faster than typing it into a terminal</td>
</tr>
<tr>
<td>Multi-file refactor with tests</td>
<td>CLI in integrated terminal</td>
<td>Better at multi-step agentic work and approvals</td>
</tr>
<tr>
<td>Anything you want to script or wire into a Makefile</td>
<td>CLI</td>
<td>Only the CLI is invokable from other scripts</td>
</tr>
<tr>
<td>Long-running task you want to leave running</td>
<td>Browser (cloud)</td>
<td>Decoupled from your laptop</td>
</tr>
<tr>
<td>Parallel tasks (e.g., three independent fixes at once)</td>
<td>Browser (cloud)</td>
<td>Cloud sandboxes run in parallel without local resource contention</td>
</tr>
<tr>
<td>PR review on a teammate's pull request</td>
<td>Browser, via <code>@codex</code> mention in PR</td>
<td>Lives where the review actually happens</td>
</tr>
<tr>
<td>Anything touching production credentials or live infra</td>
<td>None of the above without explicit human approval</td>
<td>See <a href="#heading-section-14-when-not-to-use-codex">Section 14</a></td>
</tr>
</tbody></table>
<p>The pattern that emerges: <strong>extension for in-flow editing, CLI for serious local agentic work, browser for anything you want offloaded or shared with the team.</strong></p>
<h3 id="heading-e7-the-combined-vs-code-workflow">E.7 The Combined VS Code Workflow</h3>
<p>The three entry points are most powerful when used together. A representative day looks like this.</p>
<p><strong>Morning, in VS Code:</strong></p>
<ol>
<li><p>Open the repo. The Codex extension panel is in the right sidebar.</p>
</li>
<li><p>Use the extension to ask questions about an unfamiliar module before you touch it.</p>
</li>
<li><p>Make small in-line edits — single-function changes, docstrings, type fixes — using the extension's diff-apply flow.</p>
</li>
</ol>
<p><strong>Mid-morning, in the integrated terminal:</strong></p>
<ol>
<li><p>Open the integrated terminal (Ctrl+`).</p>
</li>
<li><p>Run <code>codex</code> and start a multi-file task with explicit approval mode: "Refactor the auth middleware to use the new session interface. List the files you intend to touch first, then make the changes in the smallest commits possible."</p>
</li>
<li><p>Approve each shell command and each diff as Codex requests them.</p>
</li>
<li><p>Run the test suite when Codex finishes.</p>
</li>
</ol>
<p><strong>Afternoon, in the browser:</strong></p>
<ol>
<li><p>While you are reviewing the morning's CLI changes, open chatgpt.com/codex in another tab.</p>
</li>
<li><p>Launch a cloud task: "Add OpenAPI annotations to every public endpoint in the <code>/api/v2</code> directory." This will take a while.</p>
</li>
<li><p>Switch back to VS Code and keep working. The cloud task runs in its own sandbox.</p>
</li>
<li><p>When the cloud task finishes, review the diff in the browser, request any tweaks, and open a PR.</p>
</li>
</ol>
<p><strong>End of day, on GitHub:</strong></p>
<ol>
<li>Tag <code>@codex</code> on a teammate's open PR with "review for correctness and missing tests." The result lands as a comment overnight.</li>
</ol>
<p>The point of the combined workflow is that each entry point is doing what it is best at simultaneously. The extension keeps in-flow editing fast, the CLI handles local agentic work where you want approval control, and the cloud handles long-running and parallel tasks without consuming your local machine.</p>
<h3 id="heading-e8-vs-code-specific-tips">E.8 VS Code-Specific Tips</h3>
<p>These are small tips that compound over time once you use Codex daily inside VS Code.</p>
<ul>
<li><p><strong>Sidebar position.</strong> The Codex panel defaults to the right sidebar. If you also have GitHub PR review or another panel there, drag Codex to the secondary side or to a panel-bottom dock — whichever keeps it visible without stealing space from the editor.</p>
</li>
<li><p><strong>Keybindings.</strong> Bind the most-used Codex commands (open panel, new task, accept diff) to keyboard shortcuts via VS Code's <code>Preferences: Open Keyboard Shortcuts</code>. Reach for the keyboard, not the mouse.</p>
</li>
<li><p><strong>Settings sync.</strong> If you use VS Code's Settings Sync, the Codex extension's settings travel with you to other machines. Auth state does not — you sign in again on each machine. This is the right behavior; do not work around it.</p>
</li>
<li><p><strong>Multi-root workspaces.</strong> The extension scopes to the active workspace folder. If you open a multi-root workspace, switch the active folder explicitly before asking Codex to make changes, otherwise it may operate against the wrong root.</p>
</li>
<li><p><strong>Integrated terminal profiles.</strong> If you use multiple terminal profiles (PowerShell, bash, WSL), set the WSL profile as default on Windows so <code>codex</code> from the integrated terminal always lands in the supported environment.</p>
</li>
<li><p><strong>Source control panel.</strong> After Codex applies a change, the VS Code Source Control panel shows the diff. Review there before committing — it gives you the same context as a <code>git diff</code> without leaving the editor.</p>
</li>
<li><p><strong>Don't fight the approval mode.</strong> New users often loosen approvals to "auto" too quickly because the prompts feel slow. Resist that for the first week. The approvals are how you build a mental model of what Codex actually does in your repo.</p>
</li>
<li><p><strong>One Codex panel per VS Code window.</strong> Avoid running the extension and the CLI in the same workspace simultaneously on the same task — they can both touch files and you will get confused about which one made which change.</p>
</li>
</ul>
<h3 id="heading-e9-cursor-and-windsurf">E.9 Cursor and Windsurf</h3>
<p>The Codex extension explicitly supports Cursor and Windsurf, the two most popular VS Code forks. The install and sign-in flow is identical. The notes worth knowing:</p>
<ul>
<li><p><strong>Avoid double-AI confusion.</strong> Cursor and Windsurf both ship their own AI features. Engineers using them with Codex sometimes accidentally invoke the fork's built-in AI when they meant to invoke Codex, or vice versa. Pick a primary tool for editing and use the other only when its specific strengths matter.</p>
</li>
<li><p><strong>Auth is independent.</strong> The Codex extension's ChatGPT sign-in is separate from Cursor's or Windsurf's own model accounts. Your Codex usage is billed against your ChatGPT plan; Cursor/Windsurf usage against theirs.</p>
</li>
<li><p><strong>Keybinding conflicts.</strong> Cursor in particular has heavily customized AI-related keybindings. Audit your bindings after installing the Codex extension to make sure both surfaces are reachable.</p>
</li>
<li><p><strong>Settings sync caveat.</strong> Cursor and Windsurf have their own settings sync that diverges from upstream VS Code. Codex extension settings may sync within Cursor or Windsurf separately from your VS Code installs.</p>
</li>
</ul>
<p>For pure Codex-first teams, vanilla VS Code is the simplest baseline. For teams that already standardized on Cursor or Windsurf for other reasons, the Codex extension is a clean addition rather than a replacement.</p>
<h3 id="heading-e10-troubleshooting-vs-code-specifically">E.10 Troubleshooting VS Code Specifically</h3>
<p>The general troubleshooting list is in <a href="#heading-section-12-troubleshooting">Section 12</a>. The issues below are specific to running Codex inside VS Code.</p>
<p><strong>Extension installs but sidebar panel never appears</strong></p>
<p>Reload the window (Command Palette → "Developer: Reload Window"). If that does not fix it, check the Output panel, switch the dropdown to "Codex", and look for the actual error. The most common causes are a corporate proxy blocking the extension's auth handshake, or a conflicting older version of the extension still installed.</p>
<p><strong>"Sign in" keeps looping back to the sign-in prompt</strong></p>
<p>This usually means the redirect from the browser auth flow did not reach the extension. Try signing out completely, closing all VS Code windows, then reopening and signing in fresh. On Windows, verify your default browser is one VS Code can open via the OS handler.</p>
<p><code>codex</code> <strong>command not found in the integrated terminal</strong></p>
<p>The CLI's npm global bin directory is not on PATH. The fastest fix on macOS/Linux is to add <code>$(npm bin -g)</code> to your shell profile (<code>.zshrc</code>, <code>.bashrc</code>). On Windows, restart VS Code after the npm install so the integrated terminal picks up the updated PATH, or switch to a WSL terminal where the install is already on PATH.</p>
<p><strong>Cloud task says "no repository connected" even though you connected one</strong></p>
<p>Verify in chatgpt.com/codex environment settings that the specific repository is in the allowlist. The GitHub Connector grants per-repository access; granting access to the org alone is not enough. Also confirm your workspace admin has enabled Codex Cloud — individual users cannot enable it themselves.</p>
<p><strong>Extension and CLI both editing the same file at the same time</strong></p>
<p>Stop one of them. They do not coordinate, and you will get conflicting edits. The simplest discipline: pick one entry point per task, switch between tasks rather than trying to combine within a task.</p>
<p><strong>Extension feels slower than the CLI for the same prompt</strong></p>
<p>Often this is because the extension is using a different default model than your CLI configuration. Check both for the active model — the model picker in the extension panel, and <code>codex --help</code> or the relevant config file for the CLI.</p>
<p><strong>Windows behavior is generally bad</strong></p>
<p>Switch to a WSL workspace. OpenAI's own docs call out Windows as experimental for the CLI; the WSL path is the supported one and clears most issues at once.</p>
<h3 id="heading-ready-to-excel-as-an-ai-engineer"><strong>Ready to Excel as an AI Engineer?</strong></h3>
<p>As we conclude this exploration of intelligent healthcare, it’s clear that the future belongs to those who can bridge the gap between groundbreaking research and real-world utility. If you are inspired to lead this transformation, we invite you to download our flagship resource, <strong>The AI Engineering Handbook</strong>. Authored by Tatev Aslanyan, a pioneering AI engineer and co-founder of LUNARTECH, this guide is designed to help you navigate the highly competitive landscape of AI engineering, providing you with the step-by-step roadmap and industry workflows needed to build world-changing products.</p>
<p>Empower yourself with the same strategies used by AI trailblazers at the world's most innovative tech companies. By mastering these production-ready skills, you won't just keep pace with the hyper-connected world — you will help define it. Get started today by downloading your eBook here: <a href="https://www.lunartech.ai/download/the-ai-engineering-handbook">https://www.lunartech.ai/download/the-ai-engineering-handbook</a>.</p>
<h2 id="heading-about-lunartech-lab"><strong>About LunarTech Lab</strong></h2>
<p><em>“Real AI. Real ROI. Delivered by Engineers — Not Slide Decks.”</em></p>
<p><a href="https://technologies.lunartech.ai"><strong>LunarTech Lab</strong></a> is a deep-tech innovation partner specializing in AI, data science, and digital transformation – from healthcare to energy, telecom, and beyond.</p>
<p>We build real systems, not PowerPoint strategies. Our teams combine clinical, data, and engineering expertise to design AI that’s measurable, compliant, and production-ready. We’re vendor-neutral, globally distributed, and grounded in real AI and engineering, not hype. Our model blends Western European and North American leadership with high-performance technical teams offering world-class delivery at 70% of the Big Four’s cost.</p>
<h3 id="heading-how-we-work-from-scratch-in-four-phases">How We Work — From Scratch, in Four Phases</h3>
<p><strong>1. Discovery Sprint (2–4 Weeks):</strong> We start with data and ROI – not assumptions to define what’s worth building and what’s not and how much it will cost you.</p>
<p><strong>2. Pilot / Proof of Concept (8–12 Weeks):</strong> We prototype the core idea – fast, focused, and measurable.<br>This phase tests models, integrations, and real-world ROI before scaling.</p>
<p><strong>3. Full Implementation (6–12 Months):</strong> We industrialize the solution – secure data pipelines, production-grade models, full compliance (HIPAA, MDR, GDPR), and knowledge transfer.</p>
<p><strong>4. Managed Services (Ongoing):</strong> We maintain, retrain, and evolve the AI models for lasting ROI. Quarterly reviews ensure that performance improves with time, not decays. As we own <a href="https://academy.lunartech.ai/courses">LunarTech Academy</a>, we also build customised training to ensure clients tech team can continue working without us.</p>
<p>Every project is designed <strong>from scratch</strong>, integrating clinical knowledge, data engineering, and applied AI research.</p>
<h3 id="heading-why-lunartech-lab">Why LunarTech Lab?</h3>
<p>LunarTech Lab bridges the gap between strategy and real engineering, where most competitors fall short. Traditional consultancies, including the Big Four, sell frameworks, not systems – expensive slide decks with little execution.</p>
<p>We offer the same strategic clarity, but it’s delivered by engineers and data scientists who build what they design, at about 70% of the cost. Cloud vendors push their own stacks and lock clients in. LunarTech is vendor-neutral: we choose what’s best for your goals, ensuring freedom and long-term flexibility.</p>
<p>Outsourcing firms execute without innovation. LunarTech works like an R&amp;D partner, building from first principles, co-creating IP, and delivering measurable ROI.</p>
<p>From discovery to deployment, we combine strategy, science, and engineering, with one promise: We don’t sell slides. We deliver intelligence that works.</p>
<h3 id="heading-stay-connected-with-lunartech">Stay Connected with LunarTech</h3>
<p>Follow LunarTech Lab on <a href="https://substack.com/@lunartech">LunarTech NewsLetter</a> <strong>and</strong> <a href="https://www.linkedin.com/in/tatev-karen-aslanyan/"><strong>LinkedIn</strong></a><strong>,</strong> where innovation meets real engineering. You’ll get insights, project stories, and industry breakthroughs from the front lines of applied AI and data science.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Command Line Interface (CLI) Development with Dart: From Zero to a Fully Published Developer Tool ]]>
                </title>
                <description>
                    <![CDATA[ Most developers spend a significant portion of their day in the terminal. They run flutter build, push with git, manage packages with dart pub, and orchestrate pipelines from the command line. Every o ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-command-line-interface-cli-development-with-dart-from-zero-to-a-fully-published-developer-tool/</link>
                <guid isPermaLink="false">69fe3149f239332df4fdfd46</guid>
                
                    <category>
                        <![CDATA[ Flutter ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Dart ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cli ]]>
                    </category>
                
                    <category>
                        <![CDATA[ command line ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mobile Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Software Engineering ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Oluwaseyi Fatunmole ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 18:54:01 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/a4c564c2-f5f3-4824-b4e7-d103b5fc488e.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most developers spend a significant portion of their day in the terminal. They run <code>flutter build</code>, push with <code>git</code>, manage packages with <code>dart pub</code>, and orchestrate pipelines from the command line. Every one of those tools is a CLI, or command line interface: a program that lives in the terminal and responds to text commands.</p>
<p>Yet most developers have never built one.</p>
<p>That's a missed opportunity. CLI tools are one of the most practical things a developer can ship. They automate repetitive workflows, standardise processes across teams, and, when published, become tangible artifacts that the developer community can discover, install, and use.</p>
<p>In this handbook, you'll go from zero to building a fully distributed Dart CLI tool. We'll start with the fundamentals – how CLIs work, how Dart receives and processes terminal input, and the core syntax you need to know. Then we'll build three progressively complex CLIs, starting with the basics and finishing with a real-world API request runner. Finally, we will cover every distribution path available, from <code>pub.dev</code> to compiled binaries, Homebrew taps, Docker, and local team activation.</p>
<p>By the end of the guide, you'll understand both how to build a CLI tool in Dart as well as how to ship it so other developers can actually use it.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-a-cli-and-why-should-you-build-one">What is a CLI and Why Should You Build One?</a></p>
</li>
<li><p><a href="#heading-cli-syntax-anatomy">CLI Syntax Anatomy</a></p>
</li>
<li><p><a href="#heading-how-dart-receives-terminal-input">How Dart Receives Terminal Input</a></p>
</li>
<li><p><a href="#heading-core-cli-concepts-in-dart">Core CLI Concepts in Dart</a></p>
<ul>
<li><p><a href="#heading-stdout-stderr-and-stdin">stdout, stderr, and stdin</a></p>
</li>
<li><p><a href="#heading-exit-codes">Exit Codes</a></p>
</li>
<li><p><a href="#heading-environment-variables">Environment Variables</a></p>
</li>
<li><p><a href="#heading-file-and-directory-operations">File and Directory Operations</a></p>
</li>
<li><p><a href="#heading-running-external-processes">Running External Processes</a></p>
</li>
<li><p><a href="#heading-platform-detection">Platform Detection</a></p>
</li>
<li><p><a href="#heading-async-in-cli">Async in CLI</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-setting-up-your-dart-cli-project">Setting Up Your Dart CLI Project</a></p>
</li>
<li><p><a href="#heading-cli-1-hello-cli-the-fundamentals">CLI 1 — Hello CLI: The Fundamentals</a></p>
</li>
<li><p><a href="#heading-cli-2-darttodo-a-terminal-task-manager">CLI 2 — dart_todo: A Terminal Task Manager</a></p>
<ul>
<li><p><a href="#heading-introducing-the-args-package">Introducing the args Package</a></p>
</li>
<li><p><a href="#heading-building-darttodo">Building dart_todo</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-cli-3-darthttp-a-lightweight-api-request-runner">CLI 3 — dart_http: A Lightweight API Request Runner</a></p>
<ul>
<li><a href="#heading-building-darthttp">Building dart_http</a></li>
</ul>
</li>
<li><p><a href="#heading-adding-color-and-polish-to-your-cli">Adding Color and Polish to Your CLI</a></p>
</li>
<li><p><a href="#heading-testing-your-cli-tool">Testing Your CLI Tool</a></p>
</li>
<li><p><a href="#heading-deploying-and-distributing-your-cli">Deploying and Distributing Your CLI</a></p>
<ul>
<li><p><a href="#heading-mode-1-pubdev-public-package-distribution">Mode 1: pub.dev — Public Package Distribution</a></p>
</li>
<li><p><a href="#heading-mode-2-local-path-activation">Mode 2: Local Path Activation</a></p>
</li>
<li><p><a href="#heading-mode-3-compiled-binary-via-github-releases">Mode 3: Compiled Binary via GitHub Releases</a></p>
</li>
<li><p><a href="#heading-mode-4-homebrew-tap">Mode 4: Homebrew Tap</a></p>
</li>
<li><p><a href="#heading-mode-5-docker">Mode 5: Docker</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-choosing-the-right-distribution-mode">Choosing the Right Distribution Mode</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before starting, you should have:</p>
<ul>
<li><p>Dart SDK installed (<code>dart --version</code> should work in your terminal)</p>
</li>
<li><p>Basic familiarity with Dart syntax</p>
</li>
<li><p>Comfort with the terminal and running commands</p>
</li>
<li><p>A pub.dev account (for the publishing section)</p>
</li>
<li><p>A GitHub account (for the binary distribution section)</p>
</li>
</ul>
<h2 id="heading-what-is-a-cli-and-why-should-you-build-one">What is a CLI and Why Should You Build One?</h2>
<p>A CLI (or <strong>Command Line Interface</strong>) is a program you interact with entirely through text commands in a terminal, rather than through buttons and screens in a graphical interface.</p>
<p>Many of the tools you likely already rely on as a developer are CLI tools:</p>
<pre><code class="language-yaml">flutter build apk
git commit -m "fix: auth flow"
dart pub get
npm install
</code></pre>
<p>Flutter, Git, Dart, npm – all CLIs. You are already a CLI user every single day. This article is about becoming a CLI builder.</p>
<p>There are three strong reasons to build CLI tools as a developer:</p>
<ol>
<li><p><strong>Automating repetitive work:</strong> Anything you type more than twice a week is a candidate for automation. Generating boilerplate folder structures, running sequences of commands, scaffolding files, checking environments before a build a CLI turns a seven-step manual process into a single command.</p>
</li>
<li><p><strong>Standardising team workflows:</strong> Instead of a README that says "run these commands in this order," you ship one command that does all of it – consistently, every time, with no room for human error or a missed step.</p>
</li>
<li><p><strong>Building and publishing tooling.</strong> A published Dart CLI package is a tangible artifact. It shows up on pub.dev, gets installed and used by other developers, and communicates real engineering depth in a way that a portfolio or resume cannot.</p>
</li>
</ol>
<h2 id="heading-cli-syntax-anatomy">CLI Syntax Anatomy</h2>
<p>Before writing a single line of code, it helps to understand the structure of a CLI command. Every command follows a consistent pattern:</p>
<pre><code class="language-bash">tool [subcommand] [arguments] [options/flags]
</code></pre>
<p>Breaking down a real example:</p>
<pre><code class="language-bash">flutter build apk --release --obfuscate
│       │     │   │
tool    sub   arg  flags
</code></pre>
<ul>
<li><p><strong>Tool</strong> — the program itself (<code>flutter</code>, <code>dart</code>, <code>git</code>)</p>
</li>
<li><p><strong>Subcommand</strong> — the action being performed (<code>build</code>, <code>run</code>, <code>pub</code>)</p>
</li>
<li><p><strong>Arguments</strong> — what the action operates on (<code>apk</code>, <code>main.dart</code>, a filename)</p>
</li>
<li><p><strong>Flags and Options</strong> — modifiers that change behaviour</p>
</li>
</ul>
<p>There are two types of options:</p>
<pre><code class="language-plaintext">--release              # Boolean flag — either present or absent

--output=build/app     # Key-value option — name and a value
-v                     # Short flag — single hyphen, single character
</code></pre>
<p>This is the anatomy your CLIs will follow. Understanding it before writing any code means you will design your commands intentionally rather than stumbling into structure by accident.</p>
<h2 id="heading-how-dart-receives-terminal-input">How Dart Receives Terminal Input</h2>
<p>In Dart, everything the user types after your tool name is passed into your program through the <code>main</code> function:</p>
<pre><code class="language-dart">void main(List&lt;String&gt; args) {
  print(args);
}
</code></pre>
<p>Run it:</p>
<pre><code class="language-bash">dart run bin/mytool.dart hello world --name=Seyi
# [hello, world, --name=Seyi]
</code></pre>
<p>That <code>List&lt;String&gt; args</code> is just a list of strings. Each word or flag the user typed becomes an element in that list. Everything else you build on top of a CLI subcommands, flags, validation — is ultimately just processing this list.</p>
<h2 id="heading-core-cli-concepts-in-dart">Core CLI Concepts in Dart</h2>
<p>Before building anything, there's a set of foundational concepts that every CLI developer needs to understand. These are the building blocks that everything else sits on top of.</p>
<h3 id="heading-stdout-stderr-and-stdin">stdout, stderr, and stdin</h3>
<p>Most developers use <code>print()</code> for all output when they start building CLIs. That works for learning but it's incorrect in production.</p>
<p>There are two separate output streams in a terminal program:</p>
<ul>
<li><p><code>stdout</code> — regular output, meant for the user</p>
</li>
<li><p><code>stderr</code> — error output, meant for diagnostic messages and failures</p>
</li>
</ul>
<pre><code class="language-dart">import 'dart:io';

void main(List&lt;String&gt; args) {
  if (args.isEmpty) {
    stderr.writeln('Error: no arguments provided');
    exit(1);
  }

  stdout.writeln('Processing: ${args[0]}');
}
</code></pre>
<p>Keeping these separate matters because users can redirect stdout to a file without errors polluting it:</p>
<pre><code class="language-bash">dart run bin/tool.dart &gt; output.txt
# Errors still appear in the terminal
# Normal output goes cleanly to the file
</code></pre>
<p>Tools like <code>git</code>, <code>flutter</code>, and <code>curl</code> all do this correctly. Your CLI should too.</p>
<p><code>stdin</code> is the third stream — reading input from the user interactively at runtime:</p>
<pre><code class="language-dart">import 'dart:io';

void main() {
  stdout.write('Enter your name: ');
  final name = stdin.readLineSync();

  if (name == null || name.trim().isEmpty) {
    stderr.writeln('Error: no name provided');
    exit(1);
  }

  stdout.writeln('Hello, $name!');
}
</code></pre>
<p><code>stdout.write</code> (without <code>ln</code>) keeps the cursor on the same line so the user types right after the prompt. <code>stdin.readLineSync()</code> blocks until the user presses Enter and returns the typed string, or <code>null</code> if the stream closes unexpectedly. Always handle the null case.</p>
<h3 id="heading-exit-codes">Exit Codes</h3>
<p>Every program returns an exit code when it finishes. This is how the shell – and any script or CI system calling your tool – knows whether it succeeded or failed.</p>
<pre><code class="language-dart">import 'dart:io';

void main(List&lt;String&gt; args) {
  if (args.isEmpty) {
    stderr.writeln('Error: please provide an argument');
    exit(1); // failure
  }

  stdout.writeln('Done');
  exit(0); // success — also the default if you don't call exit()
}
</code></pre>
<p>The conventions are:</p>
<ul>
<li><p><code>0</code> — success</p>
</li>
<li><p><code>1</code> — general failure</p>
</li>
<li><p><code>2</code> — incorrect usage (wrong arguments, missing flags)</p>
</li>
</ul>
<p>Exit codes are critical when your CLI is called inside shell scripts or GitHub Actions workflows. A non-zero exit code stops a pipeline immediately. That's exactly the behaviour you want from a quality gate or a validation step.</p>
<h3 id="heading-environment-variables">Environment Variables</h3>
<p>Your CLI can read environment variables set in the user's shell:</p>
<pre><code class="language-dart">import 'dart:io';

void main() {
  final token = Platform.environment['API_TOKEN'];

  if (token == null) {
    stderr.writeln('Error: API_TOKEN environment variable is not set');
    exit(1);
  }

  stdout.writeln('Token found — proceeding...');
}
</code></pre>
<p>Set it in the terminal and run:</p>
<pre><code class="language-bash">export API_TOKEN=mytoken123
dart run bin/tool.dart
# Token found — proceeding...
</code></pre>
<p>This pattern is essential for CLI tools that interact with APIs, cloud services, or CI environments where credentials should never be hardcoded.</p>
<h3 id="heading-file-and-directory-operations">File and Directory Operations</h3>
<p>Many CLI tools read from or write to the file system. Dart's <code>dart:io</code> library covers everything you need:</p>
<pre><code class="language-dart">import 'dart:io';

void main(List&lt;String&gt; args) {
  if (args.isEmpty) {
    stderr.writeln('Usage: tool &lt;filename&gt;');
    exit(2);
  }

  final file = File(args[0]);

  if (!file.existsSync()) {
    stderr.writeln('Error: "${args[0]}" not found');
    exit(1);
  }

  final contents = file.readAsStringSync();
  stdout.writeln(contents);

  final output = File('output.txt');
  output.writeAsStringSync('Processed:\n$contents');
  stdout.writeln('Written to output.txt');
}
</code></pre>
<p>Working with directories:</p>
<pre><code class="language-dart">import 'dart:io';

void main() {
  // Where the command was run from
  final cwd = Directory.current.path;
  stdout.writeln('Working directory: $cwd');

  // Create a directory relative to current location
  final dir = Directory('$cwd/generated');

  if (!dir.existsSync()) {
    dir.createSync(recursive: true);
    stdout.writeln('Created: ${dir.path}');
  } else {
    stdout.writeln('Already exists: ${dir.path}');
  }
}
</code></pre>
<p>The <code>recursive: true</code> flag on <code>createSync</code> means it creates all intermediate directories — equivalent to <code>mkdir -p</code> in bash.</p>
<h3 id="heading-running-external-processes">Running External Processes</h3>
<p>One of the most powerful things a CLI can do is call other programs. Your Dart CLI can run <code>git</code>, <code>flutter</code>, <code>dart</code>, or any shell command programmatically:</p>
<pre><code class="language-dart">import 'dart:io';

void main() async {
  // Run a command and wait for it to finish
  final result = await Process.run('dart', ['pub', 'get']);

  stdout.write(result.stdout);

  if (result.exitCode != 0) {
    stderr.write(result.stderr);
    exit(result.exitCode);
  }

  stdout.writeln('Dependencies installed successfully');
}
</code></pre>
<p>For long-running commands where you want output to stream live as it happens:</p>
<pre><code class="language-dart">import 'dart:io';

void main() async {
  final process = await Process.start('flutter', ['build', 'apk']);

  // Pipe output directly to the terminal in real time
  process.stdout.pipe(stdout);
  process.stderr.pipe(stderr);

  final exitCode = await process.exitCode;
  exit(exitCode);
}
</code></pre>
<p><code>Process.run</code> — waits for completion, returns all output at once. Use for short commands.</p>
<p><code>Process.start</code> — streams output live as it arrives. Use for long-running commands where the user needs to see progress.</p>
<h3 id="heading-platform-detection">Platform Detection</h3>
<p>Sometimes your CLI needs to behave differently depending on the operating system it is running on:</p>
<pre><code class="language-dart">import 'dart:io';

void main() {
  if (Platform.isWindows) {
    stdout.writeln('Running on Windows');
  } else if (Platform.isMacOS) {
    stdout.writeln('Running on macOS');
  } else if (Platform.isLinux) {
    stdout.writeln('Running on Linux');
  }

  // Useful for path handling across operating systems
  stdout.writeln(Platform.pathSeparator); // \ on Windows, / elsewhere
  stdout.writeln(Platform.operatingSystem); // 'macos', 'linux', 'windows'
}
</code></pre>
<p>This matters when your CLI creates files, resolves paths, or calls shell commands that differ between operating systems.</p>
<h3 id="heading-async-in-cli">Async in CLI</h3>
<p>Dart CLIs support <code>async/await</code> natively. Any <code>main</code> function can be made async:</p>
<pre><code class="language-dart">import 'dart:io';

void main() async {
  stdout.writeln('Starting...');

  await Future.delayed(const Duration(seconds: 1)); // simulating async work

  stdout.writeln('Done');
}
</code></pre>
<p>Any operation involving file I/O, HTTP requests, or spawning processes will be asynchronous. Get comfortable with async <code>main</code> functions early — you'll use them constantly.</p>
<h2 id="heading-setting-up-your-dart-cli-project">Setting Up Your Dart CLI Project</h2>
<p>Create a new Dart console project:</p>
<pre><code class="language-bash">dart create -t console my_cli_tool
cd my_cli_tool
</code></pre>
<p>This generates a clean structure:</p>
<pre><code class="language-plaintext">my_cli_tool/
  bin/
    my_cli_tool.dart    ← entry point
  lib/                  ← shared library code
  test/                 ← tests
  pubspec.yaml
  README.md
</code></pre>
<p>The <code>bin/</code> directory is where your executable entry point lives. The <code>lib/</code> directory is where you put everything else — commands, utilities, models — that <code>bin/</code> imports and uses.</p>
<p>Open <code>pubspec.yaml</code>. You'll need to add an <code>executables</code> block before publishing:</p>
<pre><code class="language-yaml">name: my_cli_tool
description: A sample CLI tool built with Dart
version: 1.0.0

environment:
  sdk: '&gt;=3.0.0 &lt;4.0.0'

executables:
  my_cli_tool: my_cli_tool  # executable name: bin file name

dependencies:
  args: ^2.4.2

dev_dependencies:
  lints: ^3.0.0
  test: ^1.24.0
</code></pre>
<p>The <code>executables</code> block is what makes <code>dart pub global activate my_cli_tool</code> work. It tells Dart which script in <code>bin/</code> to expose as a runnable command after installation.</p>
<h2 id="heading-cli-1-hello-cli-the-fundamentals">CLI 1 — Hello CLI: The Fundamentals</h2>
<p>This first CLI uses pure Dart — no packages. The goal is to get comfortable with args, subcommands, input validation, and exit codes before introducing any external dependencies.</p>
<p>Replace the contents of <code>bin/my_cli_tool.dart</code>:</p>
<pre><code class="language-dart">import 'dart:io';

void main(List&lt;String&gt; args) {
  if (args.isEmpty) {
    printHelp();
    exit(0);
  }

  final command = args[0];

  switch (command) {
    case 'greet':
      handleGreet(args.sublist(1));
    case 'time':
      handleTime();
    case 'echo':
      handleEcho(args.sublist(1));
    case 'help':
      printHelp();
    default:
      stderr.writeln('Unknown command: "$command"');
      stderr.writeln('Run "mytool help" to see available commands.');
      exit(1);
  }
}

void handleGreet(List&lt;String&gt; args) {
  if (args.isEmpty) {
    stderr.writeln('Usage: mytool greet &lt;name&gt;');
    exit(2);
  }

  final name = args[0];
  stdout.writeln('Hello, $name! Welcome to your first Dart CLI.');
}

void handleTime() {
  final now = DateTime.now();
  stdout.writeln(
    'Current time: ${now.hour.toString().padLeft(2, '0')}:'
    '${now.minute.toString().padLeft(2, '0')}:'
    '${now.second.toString().padLeft(2, '0')}',
  );
}

void handleEcho(List&lt;String&gt; args) {
  if (args.isEmpty) {
    stderr.writeln('Usage: mytool echo &lt;message&gt;');
    exit(2);
  }

  stdout.writeln(args.join(' '));
}

void printHelp() {
  stdout.writeln('''
mytool — a simple Dart CLI

Usage:
  mytool &lt;command&gt; [arguments]

Commands:
  greet &lt;name&gt;      Greet someone by name
  time              Show the current time
  echo &lt;message&gt;    Echo a message back to the terminal
  help              Show this help message

Examples:
  mytool greet Seyi
  mytool echo "Hello from the terminal"
  mytool time
  ''');
}
</code></pre>
<p>Run it:</p>
<pre><code class="language-bash">dart run bin/my_cli_tool.dart help

dart run bin/my_cli_tool.dart greet Seyi
# Hello, Seyi! Welcome to your first Dart CLI.

dart run bin/my_cli_tool.dart time
# Current time: 14:32:10

dart run bin/my_cli_tool.dart echo "Dart CLIs are powerful"
# Dart CLIs are powerful

dart run bin/my_cli_tool.dart unknown
# Unknown command: "unknown"
# Run "mytool help" to see available commands.
</code></pre>
<p>Three things this CLI demonstrates that are worth internalising:</p>
<ol>
<li><p><strong>Subcommands are just a switch on</strong> <code>args[0]</code><strong>.</strong> The pattern is simple and scalable — add a new <code>case</code> to add a new command.</p>
</li>
<li><p><code>args.sublist(1)</code> <strong>passes remaining args to the handler.</strong> When <code>greet</code> receives <code>['greet', 'Seyi']</code>, it calls <code>handleGreet(['Seyi'])</code> — clean and isolated.</p>
</li>
<li><p><strong>Every error path has a message and a non-zero exit code.</strong> The user always knows what went wrong and what to do next.</p>
</li>
</ol>
<h2 id="heading-cli-2-darttodo-a-terminal-task-manager">CLI 2 — dart_todo: A Terminal Task Manager</h2>
<p>This CLI introduces the <code>args</code> package, JSON file persistence, and structured terminal output. It's meaningfully more complex than CLI 1 and reflects real patterns you will use in production tools.</p>
<h3 id="heading-introducing-the-args-package">Introducing the args Package</h3>
<p>Manually parsing <code>List&lt;String&gt; args</code> works for simple cases, but breaks down quickly when you add flags like <code>--priority=high</code>, boolean options like <code>--done</code>, or commands with multiple optional arguments.</p>
<p>The <code>args</code> package handles all of that cleanly.</p>
<p>Add it to your <code>pubspec.yaml</code>:</p>
<pre><code class="language-yaml">dependencies:
  args: ^2.4.2
</code></pre>
<p>Run:</p>
<pre><code class="language-bash">dart pub get
</code></pre>
<p>The core concept in <code>args</code> is the <code>ArgParser</code>. You define what your CLI accepts, and <code>args</code> handles parsing, validation, and generating help text automatically:</p>
<pre><code class="language-dart">import 'package:args/args.dart';

void main(List&lt;String&gt; arguments) {
  final parser = ArgParser()
    ..addCommand('add')
    ..addCommand('list')
    ..addFlag('help', abbr: 'h', negatable: false);

  final results = parser.parse(arguments);

  if (results['help'] as bool) {
    print(parser.usage);
    return;
  }
}
</code></pre>
<p>For more complex CLIs with subcommands that each have their own flags, use <code>ArgParser</code> per command:</p>
<pre><code class="language-dart">final parser = ArgParser();

final addCommand = ArgParser()
  ..addOption('priority', abbr: 'p', defaultsTo: 'normal');

parser.addCommand('add', addCommand);
</code></pre>
<h3 id="heading-building-darttodo">Building dart_todo</h3>
<p>Create a fresh project:</p>
<pre><code class="language-bash">dart create -t console dart_todo
cd dart_todo
</code></pre>
<p>Update <code>pubspec.yaml</code>:</p>
<pre><code class="language-yaml">name: dart_todo
description: A terminal task manager built with Dart
version: 1.0.0

environment:
  sdk: '&gt;=3.0.0 &lt;4.0.0'

executables:
  dart_todo: dart_todo

dependencies:
  args: ^2.4.2

dev_dependencies:
  lints: ^3.0.0
  test: ^1.24.0
</code></pre>
<p>Run <code>dart pub get</code>.</p>
<p>Create the folder structure:</p>
<pre><code class="language-plaintext">dart_todo/
  bin/
    dart_todo.dart
  lib/
    models/
      task.dart
    storage/
      task_storage.dart
    commands/
      add_command.dart
      list_command.dart
      complete_command.dart
      delete_command.dart
      clear_command.dart
  pubspec.yaml
</code></pre>
<h4 id="heading-step-1-the-task-model-libmodelstaskdart">Step 1 — The Task Model (<code>lib/models/task.dart</code>)</h4>
<pre><code class="language-dart">class Task {
  final int id;
  final String title;
  final String priority;
  final bool isComplete;
  final DateTime createdAt;

  Task({
    required this.id,
    required this.title,
    required this.priority,
    this.isComplete = false,
    required this.createdAt,
  });

  Task copyWith({bool? isComplete}) {
    return Task(
      id: id,
      title: title,
      priority: priority,
      isComplete: isComplete ?? this.isComplete,
      createdAt: createdAt,
    );
  }

  Map&lt;String, dynamic&gt; toJson() =&gt; {
        'id': id,
        'title': title,
        'priority': priority,
        'isComplete': isComplete,
        'createdAt': createdAt.toIso8601String(),
      };

  factory Task.fromJson(Map&lt;String, dynamic&gt; json) =&gt; Task(
        id: json['id'] as int,
        title: json['title'] as String,
        priority: json['priority'] as String,
        isComplete: json['isComplete'] as bool,
        createdAt: DateTime.parse(json['createdAt'] as String),
      );
}
</code></pre>
<h4 id="heading-step-2-storage-libstoragetaskstoragedart">Step 2 — Storage (<code>lib/storage/task_storage.dart</code>)</h4>
<p>This class handles reading and writing tasks to a local JSON file so they persist between CLI runs:</p>
<pre><code class="language-dart">import 'dart:convert';
import 'dart:io';

import '../models/task.dart';

class TaskStorage {
  static final _file = File(
    '${Platform.environment['HOME'] ?? Directory.current.path}/.dart_todo.json',
  );

  static List&lt;Task&gt; loadAll() {
    if (!_file.existsSync()) return [];

    try {
      final content = _file.readAsStringSync();
      final List&lt;dynamic&gt; json = jsonDecode(content) as List&lt;dynamic&gt;;
      return json
          .map((e) =&gt; Task.fromJson(e as Map&lt;String, dynamic&gt;))
          .toList();
    } catch (_) {
      return [];
    }
  }

  static void saveAll(List&lt;Task&gt; tasks) {
    final json = jsonEncode(tasks.map((t) =&gt; t.toJson()).toList());
    _file.writeAsStringSync(json);
  }
}
</code></pre>
<p>Tasks are stored in a hidden JSON file in the user's home directory — a common pattern for CLI tools that need lightweight local persistence.</p>
<h4 id="heading-step-3-commands">Step 3 — Commands</h4>
<p><code>lib/commands/add_command.dart</code>:</p>
<pre><code class="language-dart">import 'dart:io';

import '../models/task.dart';
import '../storage/task_storage.dart';

void runAdd(List&lt;String&gt; args, String priority) {
  if (args.isEmpty) {
    stderr.writeln('Usage: dart_todo add &lt;title&gt; [--priority=high|normal|low]');
    exit(2);
  }

  final title = args.join(' ');
  final tasks = TaskStorage.loadAll();

  final newTask = Task(
    id: tasks.isEmpty ? 1 : tasks.last.id + 1,
    title: title,
    priority: priority,
    createdAt: DateTime.now(),
  );

  tasks.add(newTask);
  TaskStorage.saveAll(tasks);

  stdout.writeln('Added task #\({newTask.id}: "\)title" [$priority]');
}
</code></pre>
<p><code>lib/commands/list_command.dart</code>:</p>
<pre><code class="language-cpp">import 'dart:io';

import '../storage/task_storage.dart';

void runList() {
  final tasks = TaskStorage.loadAll();

  if (tasks.isEmpty) {
    stdout.writeln('No tasks yet. Add one with: dart_todo add &lt;title&gt;');
    return;
  }

  stdout.writeln('');
  stdout.writeln('  ID   Status      Priority   Title');
  stdout.writeln('  ───  ──────────  ─────────  ────────────────────────');

  for (final task in tasks) {
    final status = task.isComplete ? 'done  ' : 'pending';
    final id = task.id.toString().padRight(4);
    final priority = task.priority.padRight(9);
    stdout.writeln('  \(id \)status  \(priority  \){task.title}');
  }

  stdout.writeln('');
}
</code></pre>
<p><code>lib/commands/complete_command.dart</code>:</p>
<pre><code class="language-dart">import 'dart:io';

import '../storage/task_storage.dart';

void runComplete(List&lt;String&gt; args) {
  if (args.isEmpty) {
    stderr.writeln('Usage: dart_todo complete &lt;id&gt;');
    exit(2);
  }

  final id = int.tryParse(args[0]);
  if (id == null) {
    stderr.writeln('Error: "${args[0]}" is not a valid task ID');
    exit(1);
  }

  final tasks = TaskStorage.loadAll();
  final index = tasks.indexWhere((t) =&gt; t.id == id);

  if (index == -1) {
    stderr.writeln('Error: No task found with ID $id');
    exit(1);
  }

  if (tasks[index].isComplete) {
    stdout.writeln('Task #$id is already complete.');
    return;
  }

  tasks[index] = tasks[index].copyWith(isComplete: true);
  TaskStorage.saveAll(tasks);

  stdout.writeln('Task #\(id marked as complete: "\){tasks[index].title}"');
}
</code></pre>
<p><code>lib/commands/delete_command.dart</code>:</p>
<pre><code class="language-dart">import 'dart:io';

import '../storage/task_storage.dart';

void runDelete(List&lt;String&gt; args) {
  if (args.isEmpty) {
    stderr.writeln('Usage: dart_todo delete &lt;id&gt;');
    exit(2);
  }

  final id = int.tryParse(args[0]);
  if (id == null) {
    stderr.writeln('Error: "${args[0]}" is not a valid task ID');
    exit(1);
  }

  final tasks = TaskStorage.loadAll();
  final index = tasks.indexWhere((t) =&gt; t.id == id);

  if (index == -1) {
    stderr.writeln('Error: No task found with ID $id');
    exit(1);
  }

  final title = tasks[index].title;
  tasks.removeAt(index);
  TaskStorage.saveAll(tasks);

  stdout.writeln('Deleted task #\(id: "\)title"');
}
</code></pre>
<p><code>lib/commands/clear_command.dart</code>:</p>
<pre><code class="language-dart">import 'dart:io';

import '../storage/task_storage.dart';

void runClear() {
  stdout.write('Are you sure you want to delete all tasks? (y/N): ');
  final input = stdin.readLineSync()?.trim().toLowerCase();

  if (input != 'y') {
    stdout.writeln('Cancelled.');
    return;
  }

  TaskStorage.saveAll([]);
  stdout.writeln('All tasks cleared.');
}
</code></pre>
<h4 id="heading-step-4-entry-point-bindarttododart">Step 4 — Entry Point (<code>bin/dart_todo.dart</code>)</h4>
<pre><code class="language-dart">import 'dart:io';

import 'package:args/args.dart';

import '../lib/commands/add_command.dart';
import '../lib/commands/clear_command.dart';
import '../lib/commands/complete_command.dart';
import '../lib/commands/delete_command.dart';
import '../lib/commands/list_command.dart';

void main(List&lt;String&gt; arguments) {
  final parser = ArgParser();

  // Add subcommand parsers
  final addParser = ArgParser()
    ..addOption(
      'priority',
      abbr: 'p',
      defaultsTo: 'normal',
      allowed: ['high', 'normal', 'low'],
      help: 'Task priority level',
    );

  parser
    ..addCommand('add', addParser)
    ..addCommand('list')
    ..addCommand('complete')
    ..addCommand('delete')
    ..addCommand('clear')
    ..addFlag('help', abbr: 'h', negatable: false, help: 'Show help');

  ArgResults results;

  try {
    results = parser.parse(arguments);
  } catch (e) {
    stderr.writeln('Error: $e');
    stderr.writeln(parser.usage);
    exit(2);
  }

  if (results['help'] as bool || results.command == null) {
    printHelp(parser);
    exit(0);
  }

  final command = results.command!;

  switch (command.name) {
    case 'add':
      runAdd(command.rest, command['priority'] as String);
    case 'list':
      runList();
    case 'complete':
      runComplete(command.rest);
    case 'delete':
      runDelete(command.rest);
    case 'clear':
      runClear();
    default:
      stderr.writeln('Unknown command: "${command.name}"');
      exit(1);
  }
}

void printHelp(ArgParser parser) {
  stdout.writeln('''
dart_todo — a terminal task manager

Usage:
  dart_todo &lt;command&gt; [arguments]

Commands:
  add &lt;title&gt;        Add a new task
    -p, --priority   Priority: high, normal, low (default: normal)
  list               List all tasks
  complete &lt;id&gt;      Mark a task as complete
  delete &lt;id&gt;        Delete a task
  clear              Delete all tasks

Examples:
  dart_todo add "Write the CLI article" --priority=high
  dart_todo list
  dart_todo complete 1
  dart_todo delete 2
  dart_todo clear
  ''');
}
</code></pre>
<p>Run it:</p>
<pre><code class="language-bash">dart run bin/dart_todo.dart add "Write the CLI article" --priority=high
# Added task #1: "Write the CLI article" [high]

dart run bin/dart_todo.dart add "Review PR comments"
# Added task #2: "Review PR comments" [normal]

dart run bin/dart_todo.dart list
#   ID   Status      Priority   Title
#   ───  ──────────  ─────────  ────────────────────────
#   1    ⬜ pending  high       Write the CLI article
#   2    ⬜ pending  normal     Review PR comments

dart run bin/dart_todo.dart complete 1
# Task #1 marked as complete: "Write the CLI article"

dart run bin/dart_todo.dart delete 2
# Deleted task #2: "Review PR comments"
</code></pre>
<p><code>dart_todo</code> demonstrates the patterns that form the backbone of almost every real CLI tool — argument parsing with <code>args</code>, JSON persistence, interactive prompts, structured output, and clean error handling across every command.</p>
<h2 id="heading-cli-3-darthttp-a-lightweight-api-request-runner">CLI 3 — dart_http: A Lightweight API Request Runner</h2>
<p>This is the most complex CLI in this article – and the most immediately useful. <code>dart_http</code> lets developers make HTTP requests directly from the terminal, with pretty-printed JSON responses, response metadata, header support, and the ability to save responses to a file.</p>
<pre><code class="language-bash">dart_http get https://jsonplaceholder.typicode.com/users/1
dart_http post https://jsonplaceholder.typicode.com/posts --body='{"title":"Hello"}'
dart_http get https://jsonplaceholder.typicode.com/users --save=users.json
dart_http get https://api.example.com/me --header="Authorization: Bearer mytoken"
</code></pre>
<h3 id="heading-building-darthttp">Building dart_http</h3>
<p>Create the project:</p>
<pre><code class="language-bash">dart create -t console dart_http
cd dart_http
</code></pre>
<p>Update <code>pubspec.yaml</code>:</p>
<pre><code class="language-yaml">name: dart_http
description: A lightweight API request runner for the terminal
version: 1.0.0

environment:
  sdk: '&gt;=3.0.0 &lt;4.0.0'

executables:
  dart_http: dart_http

dependencies:
  args: ^2.4.2
  http: ^1.2.1

dev_dependencies:
  lints: ^3.0.0
  test: ^1.24.0
</code></pre>
<p>Run <code>dart pub get</code>.</p>
<p>Project structure:</p>
<pre><code class="language-plaintext">dart_http/
  bin/
    dart_http.dart
  lib/
    runner/
      request_runner.dart
    printer/
      response_printer.dart
    utils/
      headers_parser.dart
  pubspec.yaml
</code></pre>
<h4 id="heading-step-1-headers-parser-libutilsheadersparserdart">Step 1 — Headers Parser (<code>lib/utils/headers_parser.dart</code>)</h4>
<pre><code class="language-dart">Map&lt;String, String&gt; parseHeaders(List&lt;String&gt; rawHeaders) {
  final headers = &lt;String, String&gt;{};

  for (final header in rawHeaders) {
    final index = header.indexOf(':');
    if (index == -1) continue;

    final key = header.substring(0, index).trim();
    final value = header.substring(index + 1).trim();
    headers[key] = value;
  }

  return headers;
}
</code></pre>
<h4 id="heading-step-2-response-printer-libprinterresponseprinterdart">Step 2 — Response Printer (<code>lib/printer/response_printer.dart</code>)</h4>
<pre><code class="language-dart">import 'dart:convert';
import 'dart:io';

void printResponse({
  required int statusCode,
  required String body,
  required int durationMs,
  required int bodyBytes,
}) {
  final statusLabel = _statusLabel(statusCode);
  final size = _formatSize(bodyBytes);

  stdout.writeln('');
  stdout.writeln('\(statusLabel | \){durationMs}ms | $size');
  stdout.writeln('─' * 50);

  try {
    final decoded = jsonDecode(body);
    const encoder = JsonEncoder.withIndent('  ');
    stdout.writeln(encoder.convert(decoded));
  } catch (_) {
    // Not JSON — print as plain text
    stdout.writeln(body);
  }

  stdout.writeln('');
}

String _statusLabel(int code) {
  if (code &gt;= 200 &amp;&amp; code &lt; 300) return '✅ $code';
  if (code &gt;= 300 &amp;&amp; code &lt; 400) return '↪️  $code';
  if (code &gt;= 400 &amp;&amp; code &lt; 500) return '❌ $code';
  return '$code';
}

String _formatSize(int bytes) {
  if (bytes &lt; 1024) return '${bytes}b';
  if (bytes &lt; 1024 * 1024) return '${(bytes / 1024).toStringAsFixed(1)}kb';
  return '${(bytes / (1024 * 1024)).toStringAsFixed(1)}mb';
}
</code></pre>
<h4 id="heading-step-3-request-runner-librunnerrequestrunnerdart">Step 3 — Request Runner (<code>lib/runner/request_runner.dart</code>)</h4>
<pre><code class="language-dart">import 'dart:io';

import 'package:http/http.dart' as http;

import '../printer/response_printer.dart';

Future&lt;void&gt; runRequest({
  required String method,
  required String url,
  required Map&lt;String, String&gt; headers,
  String? body,
  String? saveToFile,
}) async {
  final uri = Uri.tryParse(url);

  if (uri == null) {
    stderr.writeln('Error: "$url" is not a valid URL');
    exit(1);
  }

  stdout.writeln('→ \({method.toUpperCase()} \)url');

  http.Response response;
  final stopwatch = Stopwatch()..start();

  try {
    switch (method.toLowerCase()) {
      case 'get':
        response = await http.get(uri, headers: headers);
      case 'post':
        response = await http.post(uri, headers: headers, body: body);
      case 'put':
        response = await http.put(uri, headers: headers, body: body);
      case 'patch':
        response = await http.patch(uri, headers: headers, body: body);
      case 'delete':
        response = await http.delete(uri, headers: headers);
      default:
        stderr.writeln('Error: unsupported method "$method"');
        exit(2);
    }
  } catch (e) {
    stderr.writeln('Error: request failed — $e');
    exit(1);
  }

  stopwatch.stop();

  printResponse(
    statusCode: response.statusCode,
    body: response.body,
    durationMs: stopwatch.elapsedMilliseconds,
    bodyBytes: response.bodyBytes.length,
  );

  if (saveToFile != null) {
    final file = File(saveToFile);
    file.writeAsStringSync(response.body);
    stdout.writeln('Response saved to $saveToFile');
  }
}
</code></pre>
<h4 id="heading-step-4-entry-point-bindarthttpdart">Step 4 — Entry Point (<code>bin/dart_http.dart</code>)</h4>
<pre><code class="language-dart">import 'dart:io';

import 'package:args/args.dart';

import '../lib/runner/request_runner.dart';
import '../lib/utils/headers_parser.dart';

void main(List&lt;String&gt; arguments) async {
  final parser = ArgParser();

  for (final method in ['get', 'post', 'put', 'patch', 'delete']) {
    final commandParser = ArgParser()
      ..addMultiOption('header', abbr: 'H', help: 'Request header (repeatable)')
      ..addOption('body', abbr: 'b', help: 'Request body (for POST/PUT/PATCH)')
      ..addOption('save', abbr: 's', help: 'Save response body to a file');

    parser.addCommand(method, commandParser);
  }

  parser.addFlag('help', abbr: 'h', negatable: false, help: 'Show help');

  ArgResults results;

  try {
    results = parser.parse(arguments);
  } catch (e) {
    stderr.writeln('Error: $e');
    printHelp();
    exit(2);
  }

  if (results['help'] as bool || results.command == null) {
    printHelp();
    exit(0);
  }

  final command = results.command!;
  final method = command.name!;
  final rest = command.rest;

  if (rest.isEmpty) {
    stderr.writeln('Error: please provide a URL');
    stderr.writeln('Usage: dart_http $method &lt;url&gt;');
    exit(2);
  }

  final url = rest[0];
  final rawHeaders = command['header'] as List&lt;String&gt;;
  final body = command['body'] as String?;
  final saveToFile = command['save'] as String?;

  final headers = parseHeaders(rawHeaders);

  // Default Content-Type for requests with a body
  if (body != null &amp;&amp; !headers.containsKey('Content-Type')) {
    headers['Content-Type'] = 'application/json';
  }

  await runRequest(
    method: method,
    url: url,
    headers: headers,
    body: body,
    saveToFile: saveToFile,
  );
}

void printHelp() {
  stdout.writeln('''
dart_http — a lightweight API request runner

Usage:
  dart_http &lt;method&gt; &lt;url&gt; [options]

Methods:
  get       Send a GET request
  post      Send a POST request
  put       Send a PUT request
  patch     Send a PATCH request
  delete    Send a DELETE request

Options:
  -H, --header    Add a request header (repeatable)
  -b, --body      Request body (JSON string)
  -s, --save      Save response body to a file
  -h, --help      Show this help message

Examples:
  dart_http get https://jsonplaceholder.typicode.com/users
  dart_http get https://api.example.com/me --header="Authorization: Bearer token"
  dart_http post https://api.example.com/posts --body=\'{"title":"Hello"}\'
  dart_http get https://api.example.com/users --save=users.json
  ''');
}
</code></pre>
<p>Run it:</p>
<pre><code class="language-bash">dart run bin/dart_http.dart get https://jsonplaceholder.typicode.com/users/1

# → GET https://jsonplaceholder.typicode.com/users/1
# 200 | 87ms | 510b
# ──────────────────────────────────────────────────
# {
#   "id": 1,
#   "name": "Leanne Graham",
#   "username": "Bret",
#   "email": "Sincere@april.biz"
# }

dart run bin/dart_http.dart get https://jsonplaceholder.typicode.com/users --save=users.json
# → GET https://jsonplaceholder.typicode.com/users
# 200 | 143ms | 5.3kb
# ──────────────────────────────────────────────────
# [ ... ]
# Response saved to users.json

dart run bin/dart_http.dart post https://jsonplaceholder.typicode.com/posts \
  --body='{"title":"Hello from dart_http","userId":1}'
# → POST https://jsonplaceholder.typicode.com/posts
# 201 | 312ms | 72b
</code></pre>
<h2 id="heading-adding-color-and-polish-to-your-cli">Adding Color and Polish to Your CLI</h2>
<p>The CLIs above are functional, but terminal output can be made significantly more readable with color. The <code>ansi_styles</code> package provides ANSI escape code support for coloring text in the terminal.</p>
<p>Add it to <code>pubspec.yaml</code>:</p>
<pre><code class="language-yaml">dependencies:
  ansi_styles: ^0.3.0
</code></pre>
<p>Using it:</p>
<pre><code class="language-dart">import 'package:ansi_styles/ansi_styles.dart';

stdout.writeln(AnsiStyles.green('✅ Success'));
stdout.writeln(AnsiStyles.red('❌ Error: something went wrong'));
stdout.writeln(AnsiStyles.yellow('⚠️  Warning: check your config'));
stdout.writeln(AnsiStyles.bold('dart_http — API request runner'));
stdout.writeln(AnsiStyles.cyan('→ GET https://api.example.com/users'));
</code></pre>
<p>Apply color intentionally and consistently:</p>
<ul>
<li><p><strong>Green</strong> — success states, completed operations</p>
</li>
<li><p><strong>Red</strong> — errors and failures</p>
</li>
<li><p><strong>Yellow</strong> — warnings and non-blocking issues</p>
</li>
<li><p><strong>Cyan</strong> — informational output, URLs, paths</p>
</li>
<li><p><strong>Bold</strong> — headers, tool names, important values</p>
</li>
</ul>
<p>Avoid coloring everything. Color loses meaning when it is everywhere. Use it to draw the user's eye to what actually matters.</p>
<h2 id="heading-testing-your-cli-tool">Testing Your CLI Tool</h2>
<p>CLI tools are testable, and they should be tested. The most reliable approach is to test the logic inside your commands directly — not the terminal output formatting, but the behaviour.</p>
<p>Add <code>test</code> to your dev dependencies if it's not already there:</p>
<pre><code class="language-yaml">dev_dependencies:
  test: ^1.24.0
</code></pre>
<p><strong>Testing command logic:</strong></p>
<pre><code class="language-dart">import 'package:test/test.dart';

import '../lib/models/task.dart';

void main() {
  group('Task model', () {
    test('copyWith updates isComplete correctly', () {
      final task = Task(
        id: 1,
        title: 'Write tests',
        priority: 'high',
        createdAt: DateTime.now(),
      );

      final completed = task.copyWith(isComplete: true);

      expect(completed.isComplete, isTrue);
      expect(completed.title, equals('Write tests'));
      expect(completed.id, equals(1));
    });

    test('toJson and fromJson round-trips correctly', () {
      final task = Task(
        id: 2,
        title: 'Ship the tool',
        priority: 'normal',
        createdAt: DateTime.parse('2025-01-01T00:00:00.000'),
      );

      final json = task.toJson();
      final restored = Task.fromJson(json);

      expect(restored.id, equals(task.id));
      expect(restored.title, equals(task.title));
      expect(restored.priority, equals(task.priority));
    });
  });
}
</code></pre>
<p><strong>Testing the headers parser:</strong></p>
<pre><code class="language-dart">import 'package:test/test.dart';

import '../lib/utils/headers_parser.dart';

void main() {
  group('parseHeaders', () {
    test('parses a single header correctly', () {
      final result = parseHeaders(['Authorization: Bearer mytoken']);
      expect(result['Authorization'], equals('Bearer mytoken'));
    });

    test('parses multiple headers', () {
      final result = parseHeaders([
        'Authorization: Bearer token',
        'Accept: application/json',
      ]);
      expect(result.length, equals(2));
      expect(result['Accept'], equals('application/json'));
    });

    test('ignores malformed headers without a colon', () {
      final result = parseHeaders(['malformed-header']);
      expect(result.isEmpty, isTrue);
    });
  });
}
</code></pre>
<p>Run your tests:</p>
<pre><code class="language-bash">dart test
</code></pre>
<h2 id="heading-deploying-and-distributing-your-cli">Deploying and Distributing Your CLI</h2>
<p>Building a CLI tool is half the work. Getting it into the hands of developers is the other half. There are five distribution paths available, each suited to a different use case.</p>
<h3 id="heading-mode-1-pubdev-public-package-distribution">Mode 1: pub.dev — Public Package Distribution</h3>
<p>Publishing to pub.dev makes your tool installable by anyone in the Dart and Flutter community with a single command.</p>
<h4 id="heading-prepare-your-package">Prepare your package:</h4>
<p>Your <code>pubspec.yaml</code> needs to be complete:</p>
<pre><code class="language-yaml">name: dart_http
description: A lightweight API request runner for Dart developers.
version: 1.0.0
homepage: https://github.com/yourname/dart_http

environment:
  sdk: '&gt;=3.0.0 &lt;4.0.0'

executables:
  dart_http: dart_http
</code></pre>
<p>The <code>executables</code> block is critical. It tells pub.dev which script in <code>bin/</code> to expose as a runnable command.</p>
<p>You also need:</p>
<ul>
<li><p><code>README.md</code> — what the tool does, how to install it, usage examples</p>
</li>
<li><p><code>CHANGELOG.md</code> — version history</p>
</li>
<li><p><code>LICENSE</code> — an open source license (MIT is standard)</p>
</li>
</ul>
<h4 id="heading-validate-before-publishing">Validate before publishing:</h4>
<pre><code class="language-bash">dart pub publish --dry-run
</code></pre>
<p>This runs all validation checks without actually publishing. Fix any warnings before proceeding.</p>
<h4 id="heading-publish">Publish:</h4>
<pre><code class="language-bash">dart pub publish
</code></pre>
<p>You will be prompted to authenticate with your pub.dev account. Once published, your tool is available globally:</p>
<pre><code class="language-bash">dart pub global activate dart_http
dart_http get https://api.example.com/users
</code></pre>
<h3 id="heading-mode-2-local-path-activation">Mode 2: Local Path Activation</h3>
<p>For internal team tools that you don't want to publish publicly, activate directly from a local or cloned repository:</p>
<pre><code class="language-bash">dart pub global activate --source path /path/to/dart_http
</code></pre>
<p>Any developer on the team clones the repo and runs this command once. The tool is then available globally in their terminal without needing a pub.dev publish.</p>
<p>This is the right distribution mode for:</p>
<ul>
<li><p>Internal company tooling</p>
</li>
<li><p>Tools that depend on private packages</p>
</li>
<li><p>Work-in-progress tools shared within a team before a public release</p>
</li>
</ul>
<h3 id="heading-mode-3-compiled-binary-via-github-releases">Mode 3: Compiled Binary via GitHub Releases</h3>
<p>Dart can compile to a self-contained native executable — no Dart SDK required on the target machine. This makes your tool accessible to developers outside the Dart ecosystem.</p>
<h4 id="heading-compile">Compile:</h4>
<pre><code class="language-bash"># macOS
dart compile exe bin/dart_http.dart -o dist/dart_http-macos

# Linux
dart compile exe bin/dart_http.dart -o dist/dart_http-linux

# Windows
dart compile exe bin/dart_http.dart -o dist/dart_http-windows.exe
</code></pre>
<p>The compiled binary is fully self-contained. Copy it to any machine and run it — no Dart installation needed.</p>
<h4 id="heading-automate-with-github-actions">Automate with GitHub Actions:</h4>
<p>Create <code>.github/workflows/release.yml</code>:</p>
<pre><code class="language-yaml">name: Release

on:
  push:
    tags:
      - 'v*'

jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}

    steps:
      - uses: actions/checkout@v3

      - uses: dart-lang/setup-dart@v1
        with:
          sdk: stable

      - name: Install dependencies
        run: dart pub get

      - name: Compile binary
        run: |
          mkdir -p dist
          dart compile exe bin/dart_http.dart -o dist/dart_http-${{ runner.os }}

      - name: Upload binary to release
        uses: softprops/action-gh-release@v1
        with:
          files: dist/dart_http-${{ runner.os }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
</code></pre>
<p>Every time you push a version tag (<code>v1.0.0</code>), GitHub Actions compiles binaries for all three platforms and attaches them to the GitHub Release automatically.</p>
<h4 id="heading-write-an-install-script">Write an install script:</h4>
<pre><code class="language-bash">#!/usr/bin/env bash
set -euo pipefail

VERSION="1.0.0"
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
BINARY="dart_http-$OS"
INSTALL_DIR="/usr/local/bin"

curl -L "https://github.com/yourname/dart_http/releases/download/v\(VERSION/\)BINARY" \
  -o "$INSTALL_DIR/dart_http"

chmod +x "$INSTALL_DIR/dart_http"
echo "dart_http installed successfully"
</code></pre>
<p>Developers install it with:</p>
<pre><code class="language-bash">curl -fsSL https://raw.githubusercontent.com/yourname/dart_http/main/install.sh | bash
</code></pre>
<h3 id="heading-mode-4-homebrew-tap">Mode 4: Homebrew Tap</h3>
<p>Homebrew is the standard package manager for macOS and is widely used on Linux. A Homebrew tap makes your tool installable with <code>brew install</code> — the most familiar installation pattern for macOS developers.</p>
<h4 id="heading-create-your-tap-repository">Create your tap repository:</h4>
<p>Create a new GitHub repository named <code>homebrew-tools</code> (the <code>homebrew-</code> prefix is required by Homebrew's naming convention).</p>
<h4 id="heading-write-the-formula">Write the formula:</h4>
<p>Create <code>Formula/dart_http.rb</code> in that repository:</p>
<pre><code class="language-ruby">class DartHttp &lt; Formula
  desc "A lightweight API request runner for the terminal"
  homepage "https://github.com/yourname/dart_http"
  version "1.0.0"

  on_macos do
    url "https://github.com/yourname/dart_http/releases/download/v1.0.0/dart_http-macOS"
    sha256 "YOUR_SHA256_HASH_HERE"
  end

  on_linux do
    url "https://github.com/yourname/dart_http/releases/download/v1.0.0/dart_http-Linux"
    sha256 "YOUR_SHA256_HASH_HERE"
  end

  def install
    bin.install "dart_http-#{OS.mac? ? 'macOS' : 'Linux'}" =&gt; "dart_http"
  end

  test do
    system "#{bin}/dart_http", "--help"
  end
end
</code></pre>
<p>Generate the SHA256 hash for each binary:</p>
<pre><code class="language-bash">shasum -a 256 dist/dart_http-macOS
</code></pre>
<h4 id="heading-install-from-the-tap">Install from the tap:</h4>
<pre><code class="language-bash">brew tap yourname/tools
brew install dart_http
</code></pre>
<p>When you release a new version, update the <code>url</code> and <code>sha256</code> values in the formula and push the change. Users run <code>brew upgrade dart_http</code> to update.</p>
<h3 id="heading-mode-5-docker">Mode 5: Docker</h3>
<p>Docker distribution is best suited for CI environments, teams that standardise on containers, or tools with complex dependencies.</p>
<h4 id="heading-write-a-dockerfile">Write a Dockerfile:</h4>
<pre><code class="language-dockerfile">FROM dart:stable AS build

WORKDIR /app
COPY pubspec.* ./
RUN dart pub get

COPY . .
RUN dart compile exe bin/dart_http.dart -o /app/dart_http

FROM debian:stable-slim
COPY --from=build /app/dart_http /usr/local/bin/dart_http

ENTRYPOINT ["dart_http"]
</code></pre>
<p>This uses a multi-stage build: the first stage compiles the binary using the Dart SDK image, and the second stage copies only the binary into a minimal Debian image. The final image has no Dart SDK — just the compiled binary.</p>
<h4 id="heading-build-and-run">Build and run:</h4>
<pre><code class="language-bash">docker build -t dart_http .
docker run dart_http get https://jsonplaceholder.typicode.com/users/1
</code></pre>
<h4 id="heading-publish-to-docker-hub">Publish to Docker Hub:</h4>
<pre><code class="language-bash">docker tag dart_http yourname/dart_http:1.0.0
docker push yourname/dart_http:1.0.0
</code></pre>
<p>Users can then run your tool without installing anything locally:</p>
<pre><code class="language-bash">docker run yourname/dart_http get https://api.example.com/users
</code></pre>
<h2 id="heading-choosing-the-right-distribution-mode">Choosing the Right Distribution Mode</h2>
<table>
<thead>
<tr>
<th>Mode</th>
<th>Best for</th>
<th>Dart SDK required</th>
</tr>
</thead>
<tbody><tr>
<td>pub.dev</td>
<td>Public Dart/Flutter developer tools</td>
<td>Yes</td>
</tr>
<tr>
<td>Local path activation</td>
<td>Internal team tools, pre-release builds</td>
<td>Yes</td>
</tr>
<tr>
<td>Compiled binary</td>
<td>Language-agnostic tools, broad adoption</td>
<td>No</td>
</tr>
<tr>
<td>Homebrew tap</td>
<td>macOS/Linux developer tools</td>
<td>No</td>
</tr>
<tr>
<td>Docker</td>
<td>CI environments, complex dependencies</td>
<td>No</td>
</tr>
</tbody></table>
<p>For most tools, the practical recommendation is:</p>
<ul>
<li><p>Start with <strong>pub.dev</strong> if your audience is Dart developers</p>
</li>
<li><p>Add <strong>compiled binary + GitHub Releases</strong> once you want broader adoption</p>
</li>
<li><p>Add a <strong>Homebrew tap</strong> when macOS developers start asking for it</p>
</li>
<li><p>Use <strong>Docker</strong> only when it is already part of your team's workflow</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You've gone from understanding what a CLI is to building three progressively complex tools and distributing them across five different channels.</p>
<p>The foundational skills – <code>args</code>, <code>stdin</code>, <code>stdout</code>, <code>stderr</code>, exit codes, file I/O, and process spawning – are the same building blocks that tools like <code>flutter</code>, <code>git</code>, and <code>dart</code> themselves are built on. Everything else is composition.</p>
<p>The three CLIs we built (Hello CLI, <code>dart_todo</code>, and <code>dart_http</code>) each introduced a new layer: raw Dart fundamentals, the <code>args</code> package with JSON persistence, and real-world HTTP interaction. The distribution section ensures that whatever you build next, you have a clear path to getting it in front of the developers who will use it.</p>
<p>Dart is a powerful language for CLI development. Its strong typing, async support, native compilation, and pub.dev ecosystem make it a serious choice for building developer tooling, not just mobile apps.</p>
<p>The next step is building something that solves a real problem for you or your team, and shipping it.</p>
<p>Happy coding!!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Bypass Cloud SMTP Restrictions Using Brevo and HTTP APIs ]]>
                </title>
                <description>
                    <![CDATA[ Being able to communicate by sending emails through web applications is important these days. It helps businesses stay connected with their potential customers, securely verify user identities, and de ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-bypass-cloud-smtp-restrictions-using-brevo-and-http-apis/</link>
                <guid isPermaLink="false">69fe2a2ef239332df4f7f8b9</guid>
                
                    <category>
                        <![CDATA[ api ]]>
                    </category>
                
                    <category>
                        <![CDATA[ brevo ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google smtp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cloud-smtp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Okoro Emmanuel Nzube ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 18:23:42 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/64f68792-e18c-4b90-9c65-c6d9884ab191.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Being able to communicate by sending emails through web applications is important these days. It helps businesses stay connected with their potential customers, securely verify user identities, and deliver crucial notifications like password resets.</p>
<p>But sometimes, deploying your perfectly working email function to the cloud leads to unexpected and frustrating errors. You build your backend, test it locally, and it works flawlessly. Then you deploy to the cloud, and suddenly your app stops sending emails completely.</p>
<p>In this article, you’ll learn exactly why your email setup fails on cloud platforms like Render or Heroku, the underlying networking rules causing the issue, and how to elegantly bypass these restrictions using Brevo's HTTP API.</p>
<p>Let’s dive right in.</p>
<h2 id="heading-outline">Outline</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-tools-well-be-using">Tools We'll Be Using</a></p>
</li>
<li><p><a href="#heading-the-problem-nodemailer-and-smtp-blocking">The Problem: Nodemailer and SMTP Blocking</a></p>
</li>
<li><p><a href="#heading-the-modern-trap-domain-verification">The "Modern" Trap: Domain Verification</a></p>
</li>
<li><p><a href="#heading-the-ultimate-solution-brevo-and-http-apis">The Ultimate Solution: Brevo and HTTP APIs</a></p>
</li>
<li><p><a href="#heading-backend-setup">Backend Setup</a></p>
</li>
<li><p><a href="#heading-brevo-configuration-setup">Brevo Configuration Setup</a></p>
</li>
<li><p><a href="#heading-creating-the-email-function">Creating the Email Function</a></p>
</li>
<li><p><a href="#heading-integrating-the-function-into-an-express-route">Integrating the Function into an Express Route</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the absolute most out of this tutorial, it’s important to have some basic knowledge of the following:</p>
<ul>
<li><p><strong>JavaScript and Node.js:</strong> Having a good fundamental understanding of how JS works on the server side will make it easier to follow along with the project.</p>
</li>
<li><p><strong>REST APIs:</strong> You should have a basic understanding of how HTTP requests (like POST and GET) work using native <code>fetch()</code> in Node.js.</p>
</li>
<li><p><strong>Express.js:</strong> A little background on creating basic server routes will be helpful, as we'll build a real-world controller.</p>
</li>
<li><p>A basic understanding of what <a href="https://nodemailer.com/">Nodemailer</a> is and how cloud hosting platforms (like Render or Heroku) operate.</p>
</li>
</ul>
<h2 id="heading-tools-well-be-using">Tools We’ll Be Using</h2>
<p>In one of my recent projects, I created a complex authentication flow where users needed an OTP (One Time Password) sent to their email to complete registration. I set up Nodemailer, linked my Gmail, and tested it on <code>localhost</code>. Within seconds, the emails arrived perfectly.</p>
<p>But when I deployed my backend to Render, the entire signup flow broke. After doing some deep digging, I found out why it broke and how to fix it permanently. And now that I know how it works, I wanted to share it with you all.</p>
<h2 id="heading-the-problem-nodemailer-and-smtp-blocking">The Problem: Nodemailer and SMTP Blocking</h2>
<p>So what exactly is the issue?</p>
<p>Nodemailer is a very popular Node.js module that lets you send emails efficiently. Usually, developers use it to connect to services like Gmail or Mailtrap using <strong>SMTP</strong> (Simple Mail Transfer Protocol). When your code tries to send an email, Nodemailer opens a connection to the mail server using Port <code>587</code> (for STARTTLS) or Port <code>465</code> (for SSL).</p>
<p>But cloud providers like Render, Heroku, DigitalOcean, and AWS face a massive daily battle against automated spammers. Malicious users often spin up thousands of free-tier servers specifically to blast out millions of spam emails. If a cloud provider allows this, their entire network IP address block will get blacklisted by Gmail, Outlook, and Yahoo.</p>
<p>To protect their network reputation, cloud providers enacted a heavy-handed, silent rule: <strong>All outbound traffic on Ports 25, 465, and 587 is strictly blocked on free and entry-level tiers.</strong></p>
<p>This means your server is literally trapped behind a firewall. If you check your server logs, you won't see an "Invalid Password" error. Instead, you'll see a timeout error that looks like this:</p>
<pre><code class="language-plaintext">Error: connect ETIMEDOUT 142.250.102.108:587
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
</code></pre>
<p>Your code isn't broken – it's just being blocked at the network level!</p>
<h3 id="heading-the-modern-trap-domain-verification">The "Modern" Trap: Domain Verification</h3>
<p>When developers hit this wall, they often try modern API-based email services like Resend or SendGrid. These are amazing tools, but they introduce a new problem for beginners: <strong>Strict Domain Authentication.</strong></p>
<p>To use Resend in production, you must own a custom domain (like <code>yourname.com</code>) and configure DNS records (SPF, DKIM, and DMARC). If you don't own a domain, Resend's sandbox mode strictly restricts you to sending emails <em>only</em> to yourself. You can't send emails to your live users.</p>
<p>For a developer just trying to launch a portfolio project, buying a domain just to send test emails is a huge bottleneck.</p>
<h3 id="heading-the-ultimate-solution-brevo-and-http-apis">The Ultimate Solution: Brevo and HTTP APIs</h3>
<p>We need a solution that meets two criteria:</p>
<ol>
<li><p>It must bypass the Port <code>587</code> firewall.</p>
</li>
<li><p>It must let us send emails to <em>anyone</em> without forcing us to buy a custom domain.</p>
</li>
</ol>
<p>This is where the architectural difference between SMTP and REST APIs comes to the rescue. While SMTP is a dedicated protocol for routing mail, a REST API operates over standard web traffic using <strong>HTTPS (Port 443)</strong>. Cloud providers <em>can't</em> block Port 443, because doing so would prevent your server from fetching data from databases or functioning as a web server entirely.</p>
<p>Enter <strong>Brevo</strong> (formerly Sendinblue). Brevo is a powerful email platform that allows you to send emails via a standard REST API. Best of all, their free tier (300 emails/day) allows Single Sender Verification. You just verify your standard Gmail address, and they let you send to anyone!</p>
<p>By sending a JSON payload via HTTPS to Brevo's API, your server routes the traffic out of the unrestricted Port <code>443</code>, bypassing the Render firewall completely.</p>
<p>Now that you know the theory behind the tools we’ll be using, let’s move on to writing the code.</p>
<h2 id="heading-backend-setup">Backend Setup</h2>
<p>First things first, you have to set up your environment. If you don't already have Node.js installed on your computer, head to their <a href="https://nodejs.org/en">website</a> to download and install it.</p>
<p>Start by running <code>npm init -y</code> in your terminal. This creates the <code>package.json</code> file which manages your project and stores all the dependencies.</p>
<p>Next, run <code>npm install express dotenv</code>.</p>
<p>You might be used to installing <code>nodemailer</code> for your email tasks. But because we are going to use the native Node.js <code>fetch()</code> API to talk to the Brevo API, you actually don't need to install <em>any</em> heavy email libraries at all! We want to keep our backend as lightweight as possible.</p>
<h3 id="heading-brevo-configuration-setup">Brevo Configuration Setup</h3>
<p>Before you write the email function, you first need to configure Brevo to get access to your API key.</p>
<ol>
<li><p>Go to <a href="https://www.brevo.com/">Brevo.com</a> and create a free account.</p>
</li>
<li><p>During setup, they will ask you to add a <strong>Sender Email</strong>. Make sure you input your standard Gmail address. They will send you an email with a link to verify you own this address.</p>
</li>
<li><p>Once verified and inside the dashboard, click on your profile name in the top right corner, and select <strong>SMTP &amp; API</strong> from the dropdown menu.</p>
</li>
<li><p>Go to the <strong>API Keys</strong> tab and click <strong>Generate a new API key</strong>. Give it a name like "MyWebApp".</p>
</li>
</ol>
<p>Copy this generated key and store it safely in a <code>.env</code> file at the root of your project:</p>
<pre><code class="language-env"># .env file
EMAIL_USER = yourverifiedemail@gmail.com
BREVO_API_KEY = xkeysib-your-generated-api-key-goes-here
</code></pre>
<h3 id="heading-creating-the-email-function">Creating the Email Function</h3>
<p>Now that you’ve gotten your API key and set up your environment variables, all that remains is to start putting your backend code together.</p>
<p>Create a file named <code>utils/email.js</code>.</p>
<p>First, start by ensuring you can load your <code>.env</code> file so you can easily access the credentials you generated:</p>
<pre><code class="language-javascript">require("dotenv").config();

// We'll define the function to accept dynamic options
const sendEmail = async (options) =&gt; {
  const brevoApiKey = process.env.BREVO_API_KEY;
  const senderEmail = process.env.EMAIL_USER;

  // Validate that the keys actually exist
  if (!brevoApiKey || !senderEmail) {
    throw new Error("Missing Brevo credentials in environment variables.");
  }
</code></pre>
<p>Next on the line, you’ll need to structure your payload. This is the JSON object that tells Brevo exactly who is sending the email, who is receiving it, and what the content is. Here’s how you can do that:</p>
<pre><code class="language-javascript">  const payload = {
    sender: {
      name: "My Awesome Web App",
      email: senderEmail, // Must match your verified Brevo email
    },
    to: [
      {
        email: options.email, // The dynamic email address of the user receiving the email
      },
    ],
    subject: options.subject,
    htmlContent: options.html,
  };
</code></pre>
<p>In the code above, the <code>payload</code> object securely packages up your information. We pass in <code>options.email</code>, <code>options.subject</code>, and <code>options.html</code> so that we can reuse this single function for welcome emails, password resets, and notifications.</p>
<p>Now, create the actual network request that sends your data to the Brevo backend. We'll use the <code>POST</code> method. When the data is sent, it must be stringified into a JSON format.</p>
<pre><code class="language-javascript">  try {
    const response = await fetch("https://api.brevo.com/v3/smtp/email", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "api-key": brevoApiKey,
      },
      body: JSON.stringify(payload),
    });

    const result = await response.json();

    if (!response.ok) {
      throw new Error(`Brevo API Error: ${JSON.stringify(result)}`);
    }

    console.log(`Email successfully sent to ${options.email} via Brevo HTTP API!`);
  } catch (error) {
    console.error("Error details:", error.message);
  }
};

module.exports = sendEmail;
</code></pre>
<p>In the code above, after the payload is submitted, if the message is sent successfully, a success log will be displayed in your terminal. But if the message wasn’t successful – maybe due to a typo in your API key – an error message will be thrown to help you debug exactly what went wrong.</p>
<h3 id="heading-integrating-the-function-into-an-express-route">Integrating the Function into an Express Route</h3>
<p>At this point, you've successfully built a robust email function. Let's see how you would actually use this in a real Express application.</p>
<p>Create an <code>index.js</code> file and set up a simple Express server route:</p>
<pre><code class="language-javascript">const express = require("express");
const sendEmail = require("./utils/email");
const app = express();

app.use(express.json()); // Middleware to parse JSON request bodies

app.post("/api/signup", async (req, res) =&gt; {
  const { username, email } = req.body;

  // 1. Save user to database (skipped for brevity)
  
  // 2. Generate a random OTP
  const otp = Math.floor(100000 + Math.random() * 900000);

  // 3. Send the email using our new Brevo function
  try {
    await sendEmail({
      email: email,
      subject: "Welcome! Here is your Verification Code",
      html: `
        &lt;div style="font-family: sans-serif; text-align: center;"&gt;
          &lt;h2&gt;Welcome to My Awesome Web App, ${username}!&lt;/h2&gt;
          &lt;p&gt;Please use the verification code below to complete your registration:&lt;/p&gt;
          &lt;h1 style="color: #2563eb; letter-spacing: 5px;"&gt;${otp}&lt;/h1&gt;
          &lt;p&gt;This code will expire in 10 minutes.&lt;/p&gt;
        &lt;/div&gt;
      `,
    });

    res.status(201).json({ message: "User created and email sent!" });
  } catch (error) {
    res.status(500).json({ error: "Failed to send email." });
  }
});

app.listen(8000, () =&gt; {
  console.log("Server running on port 8000");
});
</code></pre>
<p>And that is it! You can now hit this <code>/api/signup</code> endpoint from your React or Vue frontend, and it will instantly fire off a beautifully formatted email via Brevo's REST API.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>As developers, encountering a bug that works locally but fails in production is a rite of passage. But the "Email Delivery Failed" timeout error is special. It teaches you that software engineering isn't just about writing clean syntax – it's about understanding the underlying infrastructure, network layers, and the security context of the environment your code runs in.</p>
<p>By swapping a protocol (SMTP) for an architectural pattern (REST API over HTTPS), you didn't just fix a bug. You successfully engineered a secure, free, and robust bypass around a cloud-level firewall without relying on heavy third-party NPM modules like Nodemailer.</p>
<p>If you've made it this far, I hope I've successfully shown you the importance of understanding network layers and how you can use HTTP APIs to send email messages directly from your web applications safely.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Apply Academic Theories to Human-Centered Web Design [Full Handbook ]]>
                </title>
                <description>
                    <![CDATA[ Have you ever abandoned an app right at the sign‑up page? Or felt uneasy navigating a website because the buttons were scattered randomly, the colors clashed, and the layout felt confusing and unneces ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-apply-academic-theories-to-human-centered-web-design-handbook/</link>
                <guid isPermaLink="false">69fe29e9f239332df4f7cd02</guid>
                
                    <category>
                        <![CDATA[ Frontend Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ UX ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ux design ]]>
                    </category>
                
                    <category>
                        <![CDATA[ UI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ UI Design ]]>
                    </category>
                
                    <category>
                        <![CDATA[ user experience ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Great John ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 18:22:33 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/d7621fda-83a6-460e-aa38-bce970d4a655.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Have you ever abandoned an app right at the sign‑up page? Or felt uneasy navigating a website because the buttons were scattered randomly, the colors clashed, and the layout felt confusing and unnecessarily complex?</p>
<p>Maybe you were asked to complete twenty fields in one go. You carefully filled everything out, hit Submit — and only then were you told that your password didn't meet some hidden, unspoken requirement. A requirement that was never communicated upfront.</p>
<p>Instead of helpful guidance, you were met with a vague message: “Invalid input." Invalid how, you wonder?</p>
<p>Required fields weren’t marked. There was no real‑time validation. No helpful red outline showing which field was wrong. Just a generic prompt telling you to “go back and correct missing information,” as if you’re supposed to magically know what the system wants.</p>
<p>So you scroll.</p>
<p>You search.</p>
<p>You guess.</p>
<p>And you're now getting frustrated.</p>
<p>The reason you're frustrated is simple: no one enjoys repeating a task they thought they had already completed — especially when the mistakes could've been prevented with clear guidance along the way.</p>
<p>You manage to fill in the form and you tap the Submit button.</p>
<p>Nothing happens.</p>
<p>No loading spinner.</p>
<p>No subtle animation.</p>
<p>No confirmation message.</p>
<p>No success screen.</p>
<p>Just silence. For a brief moment, you’re left wondering: Did it go through? So you tap again. And maybe… one more time.</p>
<p>At this point, you become fed up and you either postpone the signup process to when you have the time, or you may not ever return.</p>
<p>Even if you haven’t experienced this exact scenario, you’ve almost certainly felt the same kind of friction: that moment when a digital interface makes you pause, hesitate, or wonder what you’re supposed to do next.</p>
<p>These frustrations often arise because frontend developers either overlook or are unaware of the essential design principles and theories that underpin a smooth, intuitive user experience.</p>
<p>As a frontend developer, your interface should minimise cognitive load, provide immediate clarity, and guide users effortlessly through every task.</p>
<p>In this handbook, I'll introduce the academic theories that should inform and elevate your frontend decisions.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-10-fittss-law">1.0 Fitts’s Law:</a></p>
<ul>
<li><p><a href="#heading-11-use-padding-wisely">1.1 Use padding wisely</a></p>
</li>
<li><p><a href="#heading-12-use-infinite-targets">1.2 Use infinite targets</a></p>
</li>
<li><p><a href="#heading-design-takeaway-from-fitts-law">Design Takeaway from Fitts Law:</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-20-hicks-law">2.0 Hick's Law:</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-hicks-law">Design Takeaway from Hick's Law</a></li>
</ul>
</li>
<li><p><a href="#heading-30-gestalt-principles">3.0 Gestalt Principles:</a></p>
<ul>
<li><p><a href="#heading-key-gestalt-principles">Key Gestalt Principles:</a></p>
</li>
<li><p><a href="#heading-31-proximity">3.1 Proximity</a></p>
</li>
<li><p><a href="#heading-32-similarity">3.2 Similarity</a></p>
</li>
<li><p><a href="#heading-33-continuity">3.3 Continuity</a></p>
</li>
<li><p><a href="#heading-34-closure">3.4 Closure</a></p>
</li>
<li><p><a href="#heading-35-figureground">3.5 Figure/Ground</a></p>
</li>
<li><p><a href="#heading-36-common-fate">3.6 Common Fate</a></p>
</li>
<li><p><a href="#heading-37-focal-point">3.7 Focal Point</a></p>
</li>
<li><p><a href="#heading-design-takeaways-from-the-gestalt-principles">Design Takeaways from the Gestalt Principles</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-40-von-restorff-effect-the-isolation-effect">4.0 Von Restorff Effect (The Isolation Effect):</a></p>
<ul>
<li><a href="#heading-design-takeway-from-von-restorff">Design takeway from Von Restorff</a></li>
</ul>
</li>
<li><p><a href="#heading-50-jakobs-law">5.0 Jakob’s Law</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-jakobs-law">Design Takeaway from Jakob's Law</a></li>
</ul>
</li>
<li><p><a href="#heading-60-millers-law">6.0 Miller’s Law</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-millers-law">Design Takeaway from Miller's Law</a></li>
</ul>
</li>
<li><p><a href="#heading-70-the-goal-gradient-hypothesis">7.0 The Goal-Gradient Hypothesis</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-goal-gradient-hypothesis">Design Takeaway from Goal-Gradient Hypothesis</a></li>
</ul>
</li>
<li><p><a href="#heading-80-zeigarnik-effect">8.0 Zeigarnik Effect</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-zeigarnik-effect">Design Takeaway from Zeigarnik Effect</a></li>
</ul>
</li>
<li><p><a href="#heading-90-teslas-law">9.0 Tesla’s Law:</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-teslas-law">Design Takeaway from Tesla's Law</a></li>
</ul>
</li>
<li><p><a href="#heading-100-peak-end-rule">10.0 Peak End Rule:</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-peak-end-rule">Design takeaway from Peak End Rule</a></li>
</ul>
</li>
<li><p><a href="#heading-110-postels-law">11.0 Postel’s Law:</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-postels-law">Design Takeaway from Postel's Law</a></li>
</ul>
</li>
<li><p><a href="#heading-120-doherty-threshold">12.0 Doherty Threshold:</a></p>
<ul>
<li><a href="#heading-design-takeaways-from-doherty-threshold">Design Takeaways from Doherty Threshold</a></li>
</ul>
</li>
<li><p><a href="#heading-130-serial-position-effect-primacy-and-recency">13.0 Serial Position Effect (Primacy and Recency):</a></p>
<ul>
<li><a href="#heading-design-takeaways-serial-position-effect">Design Takeaways Serial Position Effect</a></li>
</ul>
</li>
<li><p><a href="#heading-140-occams-razor">14.0 Occam’s Razor:</a></p>
<ul>
<li><a href="#heading-design-takeaway-from-occams-razor">Design Takeaway from Occam's Razor</a></li>
</ul>
</li>
<li><p><a href="#heading-150-parkinsons-law">15.0 Parkinson's Law</a></p>
<ul>
<li><a href="#heading-design-takeaway-for-parkinsons-law">Design Takeaway for Parkinson's law</a></li>
</ul>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-references">References</a></p>
</li>
</ul>
<p>You might wonder what academic theories have to do with frontend development.</p>
<p>The answer is simple. Academic theories aren't abstract ideas. There are the result of rigorous scientific investigation — controlled experiments, validated models, and decades of research into how humans think, learn, perceive, and interact with information.</p>
<p>Because these theories are grounded in evidence rather than opinion, they offer reliable guidance for building interfaces that align with how the human brain actually processes information.</p>
<p>Applying them to frontend development means you're not designing by guesswork or personal preference. Instead, you're applying tested, scientific insights to create clearer, faster, more humane user experiences.</p>
<p>In other words, when you build with academic theory in mind, your frontend becomes more than just visually appealing — it becomes cognitively efficient, behaviourally aligned, and measurably easier for users to navigate.</p>
<p>You can use the following laws and principles to guide your development work. Let’s start by looking at Fitt’s law.</p>
<h2 id="heading-10-fittss-law"><strong>1.0 Fitts’s Law:</strong></h2>
<p>Fitts’s law is the brainchild of Paul Fitts. He was among the early psychologists who recognised that many human errors result from flawed design rather than simple human weakness.</p>
<p>During World War II, he studied airplane cockpit layouts and concluded that numerous incidents attributed to pilot error were actually caused by poor design decisions (Hall, 2023; Budiu, 2022).</p>
<p>Here's the formula:</p>
<p>$$T = a + b \cdot \log_2\left(1 + \frac{D}{W}\right)$$</p>
<p>T = Movement Time</p>
<p>D = Distance to the target</p>
<p>W = Width (size) of the target</p>
<p>a, b = Empirically determined constants</p>
<p>Based on his findings, Fitts postulated that the time required to acquire/reach a target is determined by the distance to the target and the size of the target.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/7d2699bd-4878-4ab0-8669-21616cb8faf1.png" alt="7d2699bd-4878-4ab0-8669-21616cb8faf1" style="display:block;margin:0 auto" width="691" height="244" loading="lazy">

<p><em>Fig 1.0: Illustration of Fitts Law.</em></p>
<p>From the above, between Target B and Target C, it will be faster to interact with Target C than Target B simply because of the distance (Target B is farther away). Interestingly, though Target A and Target C are at the same distance, Target C will still be faster to interact with and less error-prone because of its larger size.</p>
<p>In simple terms, Fitt’s Law tells us that the time required to move to a target depends on two main factors: the distance to the target and the size of the target. The farther away an element is, the longer it takes to reach. The smaller it is, the more precision it demands, which increases the interaction time and the likelihood of errors.</p>
<p>Conversely, closer and larger targets reduce cognitive load, motor effort, and frustration.</p>
<p>In a nutshell, Fitts’s main message to developers is to reduce the distance users must travel on the screen and to make important buttons large and visually dominant.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/bc293949-cadc-46b2-892d-bdfa1dfc6db3.jpg" alt="bc293949-cadc-46b2-892d-bdfa1dfc6db3" style="display:block;margin:0 auto" width="928" height="664" loading="lazy">

<p><em>Fig 1.1: Showing Call-to-Action buttons are the largest and most visually prominent elements on each screen.</em></p>
<p>From the image above, you can see that the Call-to-Action buttons on each of the screens are the most visually dominant button and largest in size. They're also placed within the natural region. This makes them faster/easier to interact with.</p>
<p>You should also place your Call-to-Action button within the natural zone. This is a zone on a mobile phone where it's easy to reach with the thumb (as most people use their thumbs to select things on a phone screen). Here's a diagram showing the "natural zone" on a typical smartphone. It's much faster for a user to interact within the "natural zone" than the "hard zone" (see figure).</p>
<img src="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/e98d8d91-b4e6-4b4a-96e5-2524a6e09028.png" alt="e98d8d91-b4e6-4b4a-96e5-2524a6e09028" style="display:block;margin:0 auto" width="1536" height="1024" loading="lazy">

<p><em>Fig 1.2: Showing three different zones for buttons placement (natural, stretching and hard region)</em></p>
<h3 id="heading-11-use-padding-wisely">1.1 Use Padding Wisely</h3>
<p>Fitts' law can be applied to your development by increasing padding wisely. You can also use padding to increase the interactive area. By doing this, you're increasing the size of the targets.</p>
<p>This is important, because imagine a menu that disappears the moment your cursor drifts a few inches away. You’re weren't trying to close it — you simply moved slightly, and suddenly the entire menu collapses. That tiny slip forces you to start the interaction all over again. It’s a small mistake, but it creates a disproportionately frustrating experience.</p>
<p>This happens because the interactive area is too narrow.</p>
<p>That’s why effective padding — or more broadly, generous interactive zones — is essential. By increasing the clickable or hoverable area around a menu, you are increasing the size of the targets, which makes the interaction more stable, more forgiving, and far less cognitively demanding.</p>
<p>This ensures users can move naturally without fear of accidentally “falling off” the target.</p>
<h3 id="heading-12-use-infinite-targets">1.2 Use Infinite Targets</h3>
<p>Another fundamental principle that emerges from Fitt’s Law is the idea of infinite targets. When an interface element is placed at the very edge or corner of a screen, it becomes effectively “infinite” because the cursor can't move beyond the screen boundary. The edge acts as a physical barrier, allowing the user to fling the mouse in that direction without precision or careful aiming.</p>
<p>As a result, corners and edges become the fastest, easiest, and most reliable places for users to access important controls.</p>
<p>This is why operating systems such as Apple’s macOS and Microsoft Windows position their most essential menus and buttons at these locations. The macOS Apple Menu sits in the top‑left corner, Windows historically placed the Start button in the bottom‑left corner, and both systems anchor taskbars, docks, and notification areas along screen edges.</p>
<p>These placements reduce cognitive load, minimise motor effort, and increase interaction speed because users do not need to slow down or correct their cursor movement. The screen itself “catches” the pointer.</p>
<p>In essence, infinite targets transform small interface elements into large, easy‑to‑hit zones simply by leveraging the geometry of the screen.</p>
<p>What this means for you: place your most important and frequently used actions where users can reach them with the least effort. Screen edges and corners act as natural stopping points, meaning users can't overshoot them.</p>
<h3 id="heading-design-takeaways-from-fitts-law">Design Takeaways from Fitts Law:</h3>
<p><strong>Place Primary Actions Where the Task Ends:</strong><br>Placing a submit button at the top‑right forces users to travel all the way back after completing a long form. This increases interaction cost and breaks flow. The best place for a submit button is at the bottom of the form — exactly where the user finishes the task. This aligns with natural reading and interaction patterns.</p>
<p><strong>Keep Related Actions Physically Close:</strong><br>Separating “Add to Cart” and “Check Out” across opposite sides of the screen forces unnecessary thumb movement. Group related actions to reduce effort and speed up decisions.</p>
<p><strong>Make Primary Targets Large and Visually Dominant:</strong><br>Your main CTAs (“Subscribe Now,” “Pay Now,” “Create Account,” “Sign Up”) should be the most recognisable elements on the screen. Large, high‑contrast targets reduce errors and improve speed.</p>
<p><strong>Place High‑Value Actions at Screen Edges and Corners:</strong><br>Edges and corners act as “infinite targets” because the cursor can’t overshoot them. This makes them the fastest, easiest, and most reliable places for critical controls.</p>
<p>A tiny icon in the middle of the screen is hard to hit. The same icon placed at an edge becomes effectively huge because the boundary “catches” the pointer. Also, actions like navigation, primary CTAs, or global controls should live where users can reach them with minimal effort. Avoid burying important actions in the centre of the screen.</p>
<p><strong>Increase Target Size With Generous Padding:</strong><br>Small interactive zones force users to aim with pixel‑level precision. Adding padding expands the clickable or hoverable area, making interactions easier, faster, and more forgiving.</p>
<p><strong>Prevent Accidental “Fall‑Off” With Larger Hit Areas:</strong><br>Menus that collapse the moment the cursor drifts slightly create frustration. A wider interactive zone keeps the menu open during natural mouse movement, reducing accidental resets.</p>
<p>Users don’t move perfectly. Interfaces should accommodate slight slips without punishing them. Larger targets reduce cognitive load and eliminate unnecessary frustration. so by increasing the effective size of buttons, menus, and controls, you create interactions that feel stable and predictable, and users can move confidently without fear of losing their place.</p>
<p><strong>To Sum Up:</strong> The farther away an element is, the longer it takes to reach. The smaller it is, the more precision it demands, which increases the interaction time and the likelihood of errors. Conversely, closer and larger targets reduce cognitive load, motor effort, and frustration.</p>
<h2 id="heading-20-hicks-law"><strong>2.0 Hick's Law</strong>:</h2>
<p>Hick’s Law is a psychological principle that describes the relationship between the number of choices presented to a user and the time it takes them to make a decision. It was formulated by William Edmund Hick in 1952 (Yablonski, 2022; Proctor &amp; Scheider, 2018).</p>
<p>The law states that as the number of options increases, the decision time increases logarithmically. In simple terms, more choices slow users down, while fewer choices speed up decision-making.</p>
<p>$$T = a + b \cdot \log_2(n + 1)$$</p>
<p>Where:</p>
<p>T = time to make a decision,</p>
<p>n = number of choices,</p>
<p>b= a constant that depends on the task and the individual</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/d2c45f4b-77e9-42dc-a9c3-165dfe5f7ce7.jpg" alt="d2c45f4b-77e9-42dc-a9c3-165dfe5f7ce7" style="display:block;margin:0 auto" width="1136" height="692" loading="lazy">

<p><em>Figure 2.0 illustrates the relationship between user experience, reaction time, and the number of actions.</em></p>
<p>This is how users feel, for example, when they encounter a form that asks for too much information upfront. The longer the form gets, the more frustrated they become.</p>
<p>Examples of this are overloading menus with too many items, presenting long, unorganised forms, giving too many calls-to-action on one screen, and building nested menus with excessive depth.</p>
<p>All of these create friction and can lead to cognitive overload.</p>
<h3 id="heading-design-takeaway-from-hicks-law">Design Takeaway from Hick's Law</h3>
<p><strong>Avoid Overloading Users With Too Many Actions:</strong><br>Too many buttons, menu items, or choices at once increases cognitive load and slows decision‑making. Users freeze when everything competes for attention.</p>
<p><strong>Keep Navigation Clean and Focused:</strong><br>Cluttered menus hurt both usability and SEO. Search engines struggle to track overly complex navigation structures, and users struggle to find what matters.</p>
<p><strong>Use Progressive Disclosure to Reduce Complexity:</strong><br>Hide advanced or rarely used options under “More” or expandable sections. Reveal complexity only when the user needs it.</p>
<p><strong>Break Complex Tasks Into Smaller, Manageable Steps:</strong><br>Progressive disclosure works beautifully for multi‑step forms and decision flows. Smaller steps reduce overwhelm and improve completion rates.</p>
<p><strong>Group Related Options Into Logical Categories:</strong><br>Organising actions into meaningful clusters helps users process information faster. For example, placing “Edit” and “Delete” together leverages natural mental grouping.</p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/mbYIfRxSkHs" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>

<p><em>Video 2.0: Video description of Progressive Disclosure.</em></p>
<p>From the video above, instead of showing all the menu details at once, it is better to hide them initially. As you can see, the additional information only appears when the arrow down button is pressed. This approach prevents overwhelming the user and keeps the interface clean and focused.</p>
<p>You should also reduce decision anxiety, as too many choices create doubt and friction (as they say, the more you ask from a user, the less you get).</p>
<p>Beyond this, try to use recommended labels, show brief descriptions, provide visual previews, and use comparison tables wisely to show comparison between products especially when they have many characteristics. An example of a comparison table is shown below:</p>
<p><a href="https://drive.google.com/file/d/1sY-tb9W1QDnyrH9dd3NSYsteP9WoMT27/view?usp=drive_link"><img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/5e52527a-3faf-4da1-97c2-a2a09ca7af8b.jpg" alt="Use of comparison table to compare products" style="display:block;margin:0 auto" width="1038" height="972" loading="lazy"></a></p>
<p><em>Figure 2.1: A comparison table being used to simplify complex information.</em></p>
<p>Also, rather than showing advanced configuration options by default, display only the most commonly used settings. Advanced options can be hidden under an expandable section like “Advanced” or “More Settings. This makes your interface less cluttered and more visually organized.</p>
<p>And speaking of visual organization, this is the perfect moment to introduce Gestalt principles — the psychological rules that explain how users naturally group and interpret what they see.</p>
<p><strong>To Sum Up</strong>: As the number of options increases, the decision time increases logarithmically.</p>
<h2 id="heading-30-gestalt-principles"><strong>3.0 Gestalt Principles</strong>:</h2>
<p>In the 1920s, a group of German psychologists – Max Wertheimer, Kurt Koffka, and Wolfgang Köhlern – introduced what are now known as the Gestalt Principles. Their work sought to understand how humans perceive and interpret visual information (Bustamante, 2023).</p>
<p>The word “Gestalt” is German for “unified whole,” reflecting the core idea behind the theory: people naturally perceive objects as organised patterns and complete forms rather than as separate, disconnected parts.</p>
<p>These principles explain how the human mind structures visual elements to make sense of the world. Over time, they have become highly influential in fields such as design, user experience (UX), psychology, and data visualization, where understanding perception is critical.</p>
<h3 id="heading-key-gestalt-principles">Key Gestalt Principles:</h3>
<h3 id="heading-31-proximity">3.1 Proximity</h3>
<p>Elements that are placed close to each other are perceived as a group, while those spaced far apart are seen as separate. This is why labels are placed directly next to their corresponding input fields.</p>
<p>For example: In a blog feed, the "Title," "Author," and "Date" should have small margins between them (8px), while the space between one blog post card and the next should be much larger (40px). This tells the user's brain: "These three text strings belong to this specific post."</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/b68ab7f4-1b2e-47f1-8e8c-8ae8f32198c1.jpg" alt="b68ab7f4-1b2e-47f1-8e8c-8ae8f32198c1" style="display:block;margin:0 auto" width="1046" height="956" loading="lazy">

<p><em>Fig 3:0 Illustration of proximity (Gestalt Principle)</em></p>
<p>From the fig above, the spacing within the blog feed plays a powerful role in how effortlessly users interpret what they see. When elements sit close together, the brain instinctively treats them as belonging to the same unit. This is why placing the author credit just 8px beneath the title creates an immediate mental link. The viewer doesn’t need to pause or decode who wrote which article; proximity does the cognitive work automatically, forming a tight, intuitive grouping.</p>
<p>Equally important is the generous 40px gap between individual cards. This larger spacing introduces “visual breathing room.” Without it, a feed can quickly collapse into a dense wall of text, overwhelming the user and discouraging exploration. The wider margin establishes a clear boundary—a natural stop-and-start rhythm—that makes each card feel distinct and the entire layout more scannable.</p>
<p>Finally, subtle spacing differences can guide behaviour, not just perception. The slightly larger 12px margin above the read‑more link separates it from the passive information above it. This spacing cues the user that the link represents an action rather than another piece of descriptive text. It’s a small adjustment, but it shifts the element’s role from informational to interactive, helping users understand what they can <em>do</em> next.</p>
<p>Together, these spacing decisions transform a simple list of posts into a structured, intuitive, and behaviourally clear interface—one where the user never has to think about the layout, because the layout is already thinking for them.</p>
<p>Proximity controls meaning: move elements closer to show connection, separate them to show difference.</p>
<h3 id="heading-32-similarity">3.2 Similarity</h3>
<p>We naturally group elements that share similar visual characteristics, such as color, shape, size, or orientation.</p>
<p>For example, even if buttons are spread across a page, if they're all the same shade of blue, the user understands they perform similar functions.</p>
<p>If your primary "Submit" button is blue with rounded corners, every other primary action on your site should look exactly the same. If you suddenly use a square red button for a primary action, the user will be confused because the "similarity" is broken.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/151d9658-5a4b-4d1d-b885-02c73c53cdbd.jpg" alt="151d9658-5a4b-4d1d-b885-02c73c53cdbd" style="display:block;margin:0 auto" width="1618" height="2170" loading="lazy">

<p><em>Fig 3:1 : illustration of similarity (Gestalt principle)</em></p>
<p>As you can see from above, the layout clearly demonstrates how the Gestalt Principle of Similarity works by showing two different visual situations: one where everything matches, and one where a single element breaks the pattern.</p>
<p>All three product cards share the same visual characteristics:</p>
<ul>
<li><p>Same card shape</p>
</li>
<li><p>Same border and shadow</p>
</li>
<li><p>Same image size and placement</p>
</li>
<li><p>Same blue “Add to Cart” button</p>
</li>
<li><p>Same font style and spacing</p>
</li>
</ul>
<p>Because these elements look alike, your brain automatically groups them as one category — “products that belong together.”<br>You don’t have to think about it; the similarity creates instant visual unity.</p>
<p>This is the Gestalt Principle of Similarity in action.</p>
<p>In the second row, everything is still similar except one button:</p>
<ul>
<li><p>The middle product’s button is orange, not blue</p>
</li>
<li><p>It has square corners, not rounded</p>
</li>
<li><p>The text is italic, not regular</p>
</li>
<li><p>The label changes to “Quick Buy”</p>
</li>
</ul>
<p>Because this button breaks the shared pattern, your brain immediately notices it and treats it as different or special.</p>
<p>Developers can use broken similarity to intentionally highlight featured items, promotions, or urgent actions.</p>
<p>When similarity is broken, the different element stands out and draws attention.</p>
<h3 id="heading-33-continuity">3.3 Continuity</h3>
<p>The human eye prefers to follow a continuous path or curve rather than jagged or broken lines. We perceive items aligned on a line or curve as being related. This is often used in navigation menus or horizontal carousels to guide the user's gaze.</p>
<p>For example, you might have a horizontal carousel where the last visible card is slightly "cut off" at the edge of the screen. This visual break creates a path that encourages the user to keep scrolling as their eyes follow the line of cards.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/e44556e8-4174-4073-9c05-63b5df9a1278.jpg" alt="e44556e8-4174-4073-9c05-63b5df9a1278" style="display:block;margin:0 auto" width="1650" height="214" loading="lazy">

<p><em>Fig 3:2: illustration of continuity (Gestalt principle)</em></p>
<p>As you can see, all four form fields — <em>First Name</em>, <em>Last Name</em>, <em>Email Address</em>, and <em>Phone Number</em> — are perfectly aligned along one continuous horizontal path. Because the human eye naturally prefers to follow an unbroken line, your gaze moves smoothly from left to right across the fields without effort.</p>
<p>The final field is slightly cut off at the edge, which creates a subtle visual cue that the line continues beyond the visible area. This encourages the user to keep scrolling or swiping, because their eyes are already following the direction of the form.<br>when elements are arranged along a straight path, curve, or flow, the brain automatically treats them as connected and expects the pattern to continue.</p>
<p>Another example is Instagram Stories, which are arranged in a smooth horizontal line at the top of the app. Instagram reinforces this by slightly revealing the next story circle at the edge of the screen. That tiny “peek” acts as a continuation cue — your eyes expect the line to keep going, so your finger follows.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/bb019a6d-33c9-4509-b1c7-191b5a453c46.jpg" alt="bb019a6d-33c9-4509-b1c7-191b5a453c46" style="display:block;margin:0 auto" width="1320" height="520" loading="lazy">

<p><em>Fig 3:3: illustration of continuity (Gestalt principle)</em></p>
<p>As you can see from above, all the circular story icons are arranged in a straight horizontal line, and your visual system instinctively follows that line from left to right without effort.</p>
<p>The slight visibility of the next story at the edge of the screen strengthens this effect, signaling that the sequence continues beyond what's currently shown. Also, because the icons share the same size, spacing, and shape, there are no visual interruptions, allowing your eyes to glide across them in one continuous motion.</p>
<p>This seamless flow is exactly what continuity describes: the tendency of the human eye to follow the direction of a line or pattern, assuming it continues even when part of it is out of view.</p>
<p>Continuity is the tendency of the human eye to follow the direction of a line or pattern, assuming it continues even when part of it is out of view.</p>
<h3 id="heading-34-closure">3.4 Closure</h3>
<p>Closure refers to the mind’s ability to perceive a complete, unified form even when parts of that form are missing. Rather than requiring every boundary, line, or shape to be explicitly drawn, the brain instinctively fills in the gaps. When used intentionally, closure allows interfaces to feel cleaner, more elegant, and more cognitively efficient.</p>
<p>When we look at a complex arrangement of visual elements, we tend to look for a single, recognisable pattern. If an image is missing parts, our brains fill in the gaps to "close" the shape.</p>
<p>One of the most celebrated examples of closure in visual identity design is the panda symbol used by the World Wide Fund for Nature (WWF). This logo demonstrates how strategic omission can produce a memorable, emotionally resonant, and universally recognisable mark.</p>
<p>At first glance, the panda illustration appears simple, composed of a few bold black shapes arranged against a white background.</p>
<p>Yet a closer look reveals that the panda is not fully drawn. There are no outlines defining the body, no complete contours around the head, and no explicit boundaries separating limbs from background. Instead, the designer uses a series of carefully placed shapes (ears, eye patches, nose, and partial limbs) to imply the rest of the animal. The viewer’s mind fills in the missing information, completing the silhouette effortlessly.</p>
<p>This is closure at its most effective: the brain constructs a whole from fragments, creating a sense of completeness without visual overload.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/0c40effd-e4d2-4669-af1d-56faac976b4a.jpg" alt="0c40effd-e4d2-4669-af1d-56faac976b4a" style="display:block;margin:0 auto" width="522" height="784" loading="lazy">

<p><em>Fig 3:4: illustration of closure (Gestalt principle)</em></p>
<p>For example, a "hamburger menu" (three lines) isn't a literal drawer, but our brains "close" the shape to understand it represents a menu.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/b67253cd-2a2c-4565-9818-f5586dad6d36.jpg" alt="b67253cd-2a2c-4565-9818-f5586dad6d36" style="display:block;margin:0 auto" width="1244" height="236" loading="lazy">

<p><em>Fig 3:5: illustration of closure (Gestalt principle)</em></p>
<p>An example of closure in practice can be seen in step indicators commonly used in checkout flows. These components often rely on partial shapes, implied boundaries, and incomplete outlines to guide the user through a sequence of actions.</p>
<p>For instance, upcoming steps may be represented by dashed circles. Although the circles aren't fully drawn, the viewer immediately recognises them as complete shapes. The brain resolves the missing segments, allowing the interface to communicate progression without heavy borders or fully rendered icons. This subtle use of closure reduces visual clutter while preserving clarity.</p>
<p>Closure refers to the mind’s ability to perceive a complete, unified form even when parts of that form are missing.</p>
<h3 id="heading-35-figureground">3.5 Figure/Ground</h3>
<p>This principle describes the mind's tendency to separate an object (the figure) from its surrounding area (the ground or background). In web design, using a "modal" or "pop-up" relies on this: by blurring the background, you force the user to see the pop-up as the focal figure.</p>
<p>When a user clicks "Login" on a modal/lightbox, the background site often dims (the "Ground") while the login box stays bright and centered (the "Figure"). This immediate depth change tells the user exactly where their attention belongs.</p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/UCdjymjASOU" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>

<p><em>Video 3.5.0 Video description of Figure/Ground (Gestalt Principle)</em></p>
<p>From the video above, you can see that when the Quick View button is clicked, the selected figure stands out while the background darkens. This contrast guides the user’s attention and helps them focus on the figure. Developers can use this technique to direct users’ attention to what matters most or to what they want users to notice.</p>
<p>This principle describes the mind's tendency to separate an object (the figure) from its surrounding area (the ground or background).</p>
<h3 id="heading-36-common-fate">3.6 Common Fate</h3>
<p>Elements that move in the same direction are perceived as more related than elements that are stationary or move in different directions. Think of a dropdown menu: when all sub-items slide down together, they are clearly part of the same "unit."</p>
<p>For example, when you click a FAQ header and five sub-items slide down at the exact same speed and direction, the "Common Fate" tells the user that all those items belong to that specific category. If they flew in from different directions, the relationship would be lost.</p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/T10xmlne6E4" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>

<p><em>Video 3.6.1 Video description of common fate (Gestalt Principle)</em></p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/5FncGLRy0bM" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>

<p><em>Video 3.6.2 Video description of common fate (Gestalt Principle)</em></p>
<p>From the video shown above, the e‑commerce animation example demonstrates these principles clearly by using two distinct motion patterns: a group of regular products that move upward together, and a pair of special‑category items that enter dramatically from the left. Through these contrasting movements, the interface communicates category differences without relying on text labels or explicit instructions.</p>
<p>Therefore, developers can use this motion‑based differentiation as a design strategy to guide users’ perception—allowing the interface to signal hierarchy, category structure, and product importance purely through animated behaviour rather than through static visual labels.</p>
<p>Elements that move in the same direction are perceived as more related than elements that are stationary or move in different directions.</p>
<h3 id="heading-37-focal-point">3.7 Focal Point</h3>
<p>Whatever stands out visually will capture and hold the viewer’s attention first. This is essentially the principle of emphasis. A bright "Sign Up" button in a sea of gray text acts as the focal point, directing the user's primary action.</p>
<p>For example, an alert banner or a pricing table should stand out from its surroundings. Beyond this, in a three-tier pricing table (Basic, Pro, Enterprise), the "Pro" column is often slightly larger or a different color. This creates a focal point that draws the eye to the "recommended" option immediately.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/711c9ee9-b546-4041-bf86-30c80154bd47.jpg" alt="711c9ee9-b546-4041-bf86-30c80154bd47" style="display:block;margin:0 auto" width="1784" height="646" loading="lazy">

<p><em>Fig 3:7: illustration of closure (Gestalt principle)</em></p>
<p>In visual interface design, the Gestalt principle of Focal Point plays a crucial role in directing user attention toward the most important element on a screen.</p>
<p>A focal point is created when one element breaks the established pattern of surrounding elements, making it stand out immediately.</p>
<p>In e‑commerce interfaces, this principle is often applied to highlight primary actions such as purchasing, subscribing, or upgrading. The “Buy Now” button provides a clear and practical example of how focal points function within a layout.</p>
<p>From the example above, the first two buttons share the same visual characteristics: neutral colours, and regular weight text. This repetition establishes a visual pattern that the user quickly becomes familiar with.</p>
<p>But the “Buy Now” button intentionally disrupts this pattern. It uses a bright colour, which contrasts sharply with the muted tones of the other buttons. This colour difference alone is enough to draw the eye, as humans are naturally sensitive to changes in hue and saturation within a uniform environment.</p>
<p>The Focal Point may sound like it's similar to the principle of Similarity, but the two operate in completely opposite ways within perceptual psychology.</p>
<p>Similarity explains how the mind naturally groups elements that share visual characteristics –&nbsp;such as colour, shape, or size – into coherent units. Once this grouping is established, the interface gains structure and predictability.</p>
<p>Focal Point, on the other hand, works by intentionally breaking that structure. Instead of reinforcing uniformity, it introduces a deliberate contrast – through colour, scale, brightness, or motion –&nbsp;to draw the viewer’s attention to one specific element.</p>
<p>In other words, Similarity creates the background pattern, while Focal Point identifies the one element that must stand out against that pattern.</p>
<p>Whatever stands out visually will capture and hold the viewer’s attention first.</p>
<h3 id="heading-design-takeaways-from-the-gestalt-principles">Design Takeaways from the Gestalt Principles</h3>
<p><strong>Use Spacing as Your Primary Grouping Tool:</strong><br>Elements that belong together should sit closer to each other than to anything else. Spacing communicates structure faster than borders or boxes. Use tight internal spacing (6–12px) for related items and wide external spacing (24–48px) to separate groups.</p>
<p><strong>Build a Strict, Consistent Visual System — and Stick to It:</strong><br>Define clear rules for button types, text styles, icon sizes, and alignment patterns. Consistent left‑aligned text blocks, predictable carousel lines, and stable flow patterns reduce cognitive load and make interfaces feel trustworthy.</p>
<p><strong>Guide the Brain With Spacing, Alignment, Consistency, Contrast, and Motion:</strong><br>The human brain is always trying to group, follow, and prioritise what it sees. Your job is to guide that instinct through intentional layout decisions, not fight against it.</p>
<h2 id="heading-40-von-restorff-effect-the-isolation-effect"><strong>4.0 Von Restorff Effect (The Isolation Effect)</strong>:</h2>
<p>This is the brainchild of Hedwig von Restorff, posited in 1933. In principle it states: An item that stands out is more noticable and more likely to be remembered than other items (Hunt, 1995).</p>
<p>So unique or visually distinct elements grab attention and are more memorable – in other words, distinctiveness dictates memory. When a user interacts with an interface, their brain naturally seeks patterns to minimize cognitive effort.</p>
<p>While consistency is generally a virtue in design, a perfectly uniform layout can lead to "banner blindness" or habituation, where the user stops noticing details.</p>
<p>By strategically breaking a pattern through changes in color, size, shape, or spacing, the developer can "isolate" an element, triggering a biological response that flags the item as high-priority.</p>
<p>Note that although the Focal Point principle may initially seem similar to the Von Restorff Effect, they describe two different psychological processes.</p>
<p>Focal Point is a Gestalt visual principle that explains how one element becomes the centre of attention within a composition because it carries the strongest visual contrast –&nbsp;through size, colour, brightness, position, or motion. Its purpose is to guide the viewer’s eye toward the most important element in the layout.</p>
<p>The Von Restorff Effect comes from cognitive psychology, not Gestalt theory. It states that an item that is noticeably different from a group of similar items is not only more attention‑grabbing but also more memorable.</p>
<p>So Focal Point is about where the eye goes first, while the Von Restorff Effect is about what the brain remembers later.</p>
<h3 id="heading-design-takeaways-from-von-restorff">Design takeaways from Von Restorff</h3>
<p><strong>Use Isolation to Make CTAs Impossible to Miss:</strong><br>On a page filled with neutral text and standard links, a single high‑contrast button (like a bold “Primary Blue” or “Emergency Red”) instantly becomes the standout element. This leverages the Von Restorff Effect to pull the user’s eye toward the most important action.</p>
<p><strong>Create a Visual “Hitch” in the Scan Path:</strong><br>A distinct CTA interrupts the user’s natural left‑to‑right, top‑to‑bottom scanning rhythm. This makes actions like “Buy Now” or “Sign Up” the first thing they notice and the last thing they forget.</p>
<p><strong>Make Critical Actions Visually Distinct:</strong><br>Because users naturally notice the one element that breaks a pattern, your most important actions should use deliberate contrast — color, size, shape, weight, or motion. Isolate key information instead of letting it blend into surrounding UI noise.</p>
<p><strong>Avoid Over‑Differentiation — or Nothing Stands Out:</strong> If every button is loud, animated, or uniquely styled, the interface becomes chaotic. The Von Restorff Effect only works when there is a clear, stable pattern — and you break it once, intentionally.</p>
<p><strong>To Sum Up:</strong> An item that stands out is more noticable and more likely to be remembered than other items.</p>
<h2 id="heading-50-jakobs-law"><strong>5.0 Jakob’s Law</strong></h2>
<p>Jakob’s Law states that users spend most of their time on other sites, so they expect your interface to behave like the ones they already know.</p>
<p>Familiar patterns — hamburger menus, top navigation, search icons, and clickable top‑left logos — reduce cognitive load because users don’t have to interpret anything new.</p>
<p>But while Jakob’s Law is foundational to UX, I think it can also unintentionally suppress innovation.</p>
<p>When developers over‑prioritise familiarity, they fall into a standardisation trap: endlessly optimising conventional patterns instead of exploring fundamentally better ones.</p>
<p>The Pie Menu is a perfect illustration of this. According to Fitts’s Law, the time required to reach a target depends on its distance and size. Linear menus place the last item much farther from the cursor than the first, creating uneven interaction costs.</p>
<p>Radial menus position every option at an equal distance from the centre, and their wedge‑shaped targets effectively grow larger as the pointer moves outward.</p>
<p>Mathematically, pie/radial menu are faster to interact with and more efficient — yet they remain rare in mainstream web design because they violate users’ expectations. In other words, Jakob’s Law keeps us locked into a familiar but suboptimal pattern simply because “that’s how it’s always been done.”</p>
<p>But the challenge is not choosing between familiarity and innovation, but balancing them.</p>
<p>This is where the Aesthetic–Usability Effect becomes powerful. Research shows that users perceive attractive interfaces as easier to use, and they are more forgiving of minor usability friction when the design is visually pleasing.</p>
<p>A beautifully crafted Pie Menu, for example, can encourage users to invest the small amount of learning required to use it. By applying aesthetic delight strategically, developers can introduce innovative patterns without overwhelming users.</p>
<p>The principle that emerges is simple: Be conventional where it matters, and innovative where it delights.</p>
<h3 id="heading-design-takeaway-from-jakobs-law">Design Takeaway from Jakob's Law</h3>
<p><strong>Keep Trust‑Critical Elements Predictable:</strong><br>Navigation, search, authentication, and other high‑stakes interactions must follow established conventions. Users rely on these patterns for speed, confidence, and safety — this is where Jakob’s Law should be respected without exception.</p>
<p><strong>Experiment Only in Low‑Risk, High‑Creativity Areas:</strong><br>In creative or productivity‑focused zones — like editing tools in a photo app — you can safely introduce new interaction models such as radial menus, gesture wheels, or context‑aware tool selectors. These areas invite exploration and benefit from efficiency‑driven innovation.</p>
<p><strong>To Sum Up:</strong> Be conventional where it matters, and innovative where it delights.</p>
<h2 id="heading-60-millers-law"><strong>6.0 Miller’s Law</strong></h2>
<p>Miller’s Law originates from George A. Miller’s classic paper “The Magical Number Seven, Plus or Minus Two.” It states that the average person can hold only about 7 (±2) chunks of information in working memory at any given moment (Miller, 1956).</p>
<p>Crucially, Miller emphasised that the brain doesn’t store isolated items — it groups them into meaningful units called chunks. Because working memory is so limited, developers must structure information in ways that respect this cognitive boundary.</p>
<p>This principle has direct implications for interface design. Long, unbroken strings of information overwhelm users, whereas chunked formats are far easier to process.</p>
<p>For example, instead of displaying a phone number as 1234567890, formatting it as 123‑456‑7890 transforms ten digits into three manageable chunks. The same logic applies to navigation: aim for five to nine primary menu items, and if you need more, group them into categories. Users remember the category as a single chunk rather than each individual link.</p>
<p>Miller’s Law also explains why long forms are so intimidating. When a user sees 30 fields on one page, their brain interprets it as a single, massive task — far beyond the 7±2 limit.</p>
<p>A progressive stepper solves this by breaking the form into smaller stages of 5–7 fields each. This reduces cognitive load, creates a sense of progress, and significantly lowers abandonment rates.</p>
<p>The same principle applies to product listings or search results. Expecting users to compare 50 items at once is unrealistic. Instead, provide strong filtering tools so users can narrow the set to a manageable size — ideally within the range their working memory can meaningfully evaluate.</p>
<p>In essence, Miller’s Law reminds developers that humans don’t process information in bulk. They process it in structured, meaningful chunks.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6998def93dc17c4862075045/3add2891-c6f8-4df1-a044-6cea6c5f1945.jpg" alt="Image of progressive stepper" style="display:block;margin:0 auto" width="818" height="960" loading="lazy">

<p><em>Fig 6.0: Illustrating progressive stepper</em></p>
<p>In the example above, the interface uses both a progress bar and a stepper to guide the user through multiple stages of a task. After completing the first page and selecting “Continue,” the user moves to the next step, and the progress bar updates accordingly. This creates a clear sense of forward movement and accomplishment.</p>
<p>By breaking the process into smaller segments, the interface prevents cognitive overload. If all the information were presented on a single page, users might feel overwhelmed, unsure where to begin, or discouraged by the sheer volume of work.</p>
<p>A step‑by‑step flow transforms a large task into a sequence of manageable actions, increasing the likelihood of completion.</p>
<h3 id="heading-design-takeaway-from-millers-law">Design Takeaway from Miller's Law</h3>
<p><strong>Respect the 7±2 Working‑Memory Limit:</strong><br>Users can only hold about seven chunks of information at once. Long, unbroken content overwhelms them, while chunked information is instantly easier to process.</p>
<p><strong>Chunk Information Into Meaningful Units:</strong><br>The brain doesn’t store isolated items — it groups them. Format data (like phone numbers), menus, and settings into clear, memorable chunks instead of long, flat lists.</p>
<p><strong>Keep Navigation Within 5–9 Primary Items:</strong><br>If you need more than nine options, group them into categories. Users remember the category as a single chunk, not each individual link.</p>
<p><strong>Break Long Forms Into Smaller Steps:</strong><br>A 30‑field form feels like one giant task. A progressive stepper with 5–7 fields per step keeps users below the cognitive overload threshold and dramatically reduces abandonment.</p>
<p><strong>Reduce Comparison Load With Strong Filters:</strong><br>Expecting users to compare 50 products at once is unrealistic. Provide filtering tools that shrink the decision set to something the working memory can actually handle.</p>
<p><strong>Design for Chunked Thinking, Not Bulk Processing:</strong><br>Humans don’t process information in bulk — they process structured, meaningful groups. Interfaces that respect this limitation feel lighter, faster, and more intuitive.</p>
<p><strong>To Sum Up:</strong> A step‑by‑step flow transforms a large task into a sequence of manageable actions, increasing the likelihood of completion.</p>
<h2 id="heading-70-the-goal-gradient-hypothesis">7.0 The Goal-Gradient Hypothesis</h2>
<p>This is the perfect moment to introduce the Goal‑Gradient Hypothesis, originally proposed by behaviorist Clark Hull in 1932 (Yablonski, 2022). The hypothesis states that people become more motivated as they get closer to achieving a goal. In other words, users naturally accelerate their engagement when they sense they are nearing completion.</p>
<p>This principle is incredibly powerful in UX design, especially for progress tracking, gamification, and reward systems.</p>
<p>The takeaway is straightforward: Because users are more motivated near the finish line, progress indicators should be prominent and meaningful.</p>
<p>Percentages, progress bars, and step counters reinforce momentum. Micro‑achievements — such as badges, checkmarks, or subtle confetti — amplify motivation by celebrating small wins.</p>
<p>Tasks should be broken into measurable milestones so users can see themselves advancing.</p>
<p>This is why e‑learning platforms display messages like “You’ve completed 8 of 10 lessons — almost there!” and why fitness apps highlight progress with prompts such as “3 km done, 2 km to go.” These cues leverage the goal‑gradient effect to keep users engaged, energized, and eager to finish.</p>
<p>By combining progressive steppers with clear progress feedback, developers create interfaces that feel lighter, more encouraging, and far more motivating — ultimately improving completion rates and overall user satisfaction.</p>
<p>But what happens when a goal isn't completed? Why do we sometimes feel uncomfortable leaving things unfinished? That discomfort is explained by another psychological principle called the Zeigarnik Effect — the tendency for people to remember and feel tension about incomplete tasks. We will look at this next.</p>
<h3 id="heading-design-takeaway-from-goal-gradient-hypothesis">Design Takeaway from Goal-Gradient Hypothesis</h3>
<p><strong>Make Progress Visible to Boost Motivation:</strong><br>According to the Goal‑Gradient Hypothesis, users naturally speed up as they sense they’re nearing completion. Prominent progress bars, percentages, and step counters tap into this instinct and keep momentum high.</p>
<p><strong>Celebrate Micro‑Achievements to Reinforce Engagement:</strong><br>Badges, checkmarks, subtle confetti, and “step completed” cues reward small wins. These micro‑rewards amplify motivation and make long tasks feel lighter and more achievable.</p>
<p><strong>Break Tasks Into Measurable Milestones:</strong><br>Users stay motivated when they can see themselves advancing. Divide complex flows into clear steps so progress feels tangible rather than overwhelming.</p>
<p><strong>Use Progress Feedback to Drive Completion:</strong><br>Messages like “8 of 10 lessons completed — almost there” or “3 km done, 2 km to go” leverage the goal‑gradient effect to energise users and pull them toward the finish line.</p>
<p><strong>Combine Steppers With Clear Feedback for Maximum Impact:</strong><br>Progressive steppers paired with strong visual feedback create interfaces that feel encouraging, structured, and motivating — dramatically improving completion rates.</p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/-mDuFB3W2bg" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>

<p><em>Video 8.0 : Video illustrating goal gradient</em></p>
<p><strong>To Sum Up:</strong> People become more motivated as they get closer to achieving a goal.</p>
<h2 id="heading-80-zeigarnik-effect"><strong>8.0 Zeigarnik Effect</strong></h2>
<p>The Zeigarnik Effect is a psychological principle stating that people remember unfinished or interrupted tasks better than completed ones (Cherry, 2024).</p>
<p>Memory begins with sensory input, which is processed into short-term memory. Unfinished tasks persist in our thoughts, leading to active recall. This ongoing engagement can turn them into long-term memories, enhancing recall until resolved. This increases engagement, encourages task completion, improves retention, and drives conversions.</p>
<p>So because people remember unfinished tasks better than completed ones (Zeigarnik Effect), developers use progress indicators to make users aware that something is incomplete and motivate them to finish it.</p>
<p>In your designs, you can break long forms into multi-step processes to encourage completion and display profile completion percentages (for example, 70% complete) to push users toward 100%.</p>
<p>This is the main reason why e-commerce platforms send abandoned cart reminders to bring users back to complete their purchases. It's also why apps use streak systems to encourage daily engagement and habit formation and learning platforms show course completion bars to motivate users to finish modules.</p>
<h3 id="heading-design-takeaway-from-zeigarnik-effect">Design Takeaway from Zeigarnik Effect</h3>
<p><strong>Unfinished Tasks Stay Active in Memory — Use That to Drive Completion:</strong><br>Because incomplete tasks linger in working memory (Zeigarnik Effect), users naturally keep thinking about what they haven’t finished. This tension boosts recall, engagement, and the likelihood of returning to complete the task.</p>
<p><strong>Make Incompleteness Visible With Progress Indicators:</strong><br>Progress bars, percentages, and step counters remind users that something is still unfinished. This gentle psychological pressure motivates them to continue until the task is complete.</p>
<p><strong>Break Long Flows Into Multi‑Step Processes:</strong><br>A massive form feels overwhelming, but a stepper with smaller chunks keeps users moving. Showing “70% complete” nudges them toward finishing the last stretch.</p>
<p><strong>Use Reminders to Re‑activate Unfinished Intent:</strong><br>Abandoned cart emails, streak reminders, and “continue your lesson” prompts work because the unfinished task is already active in the user’s mind. The reminder simply pulls them back into the loop.</p>
<p><strong>Celebrate Completion to Close the Cognitive Loop:</strong><br>Checkmarks, confirmations, and completion badges give users closure. This resolves the mental tension created by the unfinished task and reinforces positive behaviour.</p>
<p><strong>To Sum Up:</strong> Unfinished tasks persist in our thoughts, leading to active recall.</p>
<h2 id="heading-90-teslers-law"><strong>9.0 Tesler’s Law</strong>:</h2>
<p>This law was proposed by Lawrence Tesler. He was a computer scientist known for his work on human-computer interaction, and he contributed significantly to making software more user-friendly, including work on cut, copy, and paste functionality.</p>
<p>This law is otherwise known as the Law of Conservation of Complexity. The core Idea here is every process has a certain amount of “inherent complexity" that can't be removed. You can only decide who handles it: the user or the system.</p>
<p>Some examples of these inherent complexities might be:</p>
<ul>
<li><p>translating user actions into correct operations behind the scenes,</p>
</li>
<li><p>handling unreliable or slow network connections,</p>
</li>
<li><p>connecting with third-party APIs, services, or legacy systems,</p>
</li>
<li><p>sorting large datasets quickly,</p>
</li>
<li><p>performing complex search operations</p>
</li>
<li><p>managing version changes and compatibility issues,</p>
</li>
<li><p>managing state, interactions, and animations without confusing the user.</p>
</li>
</ul>
<p>All of these can be inherently complex, but it's the job of the developer to deal with the complexity.</p>
<p>As a developer, you should always try as much as possible to push complexity to the system. For example, instead of making a user type their full address manually, use an Auto-complete API (Google’s Places and Map is best for this). The complexity of finding and validating the address still exists, but the software handles the work for them.</p>
<p>Here's a practical example: let’s say you're designing a student platform that requires users to enter their university name. A practical approach would be to store an array of all universities in the UK in your codebase (This is the hard part Tesla hinted at).</p>
<p>As the user types, they don't need to enter the full name, and their full university name is shown (relating to what they have typed). For instance, if they intend to type “University of Sheffield,” simply typing “Sheff” should prompt the system to display the full university name, which they can then select.</p>
<p>In Dart, you can use a package like fuzzysearch to implement this kind of intelligent matching.</p>
<p>The advantage of this approach is greater than it first appears. It improves data consistency because users often enter the same information in different ways. For example, some users might type “Uni of Sheff,” others “Sheffield University,” and others “Uni of Sheffield,” while all are referring to “University of Sheffield.”</p>
<p>This is how messy data is created, and it creates more work for data analysts. Little wonder that data analysts spend up to 70% of their time cleaning data.</p>
<p>If developers invested more time in structuring how data is collected to ensure consistency, there would be far less work downstream for analysts. This same logic should be applied in how we collect date, time, and other information.</p>
<p>So apart from people's names and email addresses, you should try to standardize the data your app collects as much as possible. Use date and time pickers, stepper controls, input masks, checkboxes, dropdown menu and radio buttons, toggle switches. and so on.</p>
<p>The essence of removing complexity from the user is not only about improving usability, but also about ensuring that the data collected is standardised, structured, and consistent.</p>
<h3 id="heading-design-takeaway-from-teslers-law">Design Takeaway from Tesler's Law</h3>
<p><strong>Push Complexity to the System, Not the User:</strong><br>Every process contains unavoidable complexity. Your job is to handle it behind the scenes so the user experiences the simplest possible interaction.</p>
<p><strong>Automate Tasks Users Shouldn’t Have to Think About:</strong><br>Use tools like autocomplete, fuzzy search, intelligent defaults, and validation APIs to remove manual effort. The complexity still exists — but the system absorbs it instead of the user.</p>
<p><strong>Standardise Inputs to Prevent Messy Data:</strong><br>Users enter the same information in wildly different ways. Use pickers, dropdowns, input masks, radio buttons, and toggles to enforce consistent, structured data collection.</p>
<p><strong>Handle Inherent Technical Complexity Internally:</strong><br>Network issues, API quirks, large dataset sorting, search optimisation, state management, and animation logic are all developer responsibilities. Users should never feel this complexity.</p>
<p><strong>To Sum Up:</strong> Every process contains unavoidable complexity. Your job as a developer is to handle it behind the scenes so the user experiences the simplest possible interaction.</p>
<h2 id="heading-100-peak-end-rule"><strong>10.0 Peak End Rule</strong>:</h2>
<p>In 1993, Daniel Kahneman, Barbara Fredrickson, Charles Schreiber, and Donald Redelmeier invited volunteers into a lab for what sounded like a simple experiment. The task was straightforward: place a hand into a container of painfully cold water (Kahneman et al., 1993)</p>
<p>In the first round, participants kept their hand in 14°C water for 60 seconds. It was uncomfortable, sharp, and unpleasant but after one minute, it was over.</p>
<p>In the second round, they again endured 60 seconds in 14°C water. But this time, they were asked to keep their hand in for an extra 30 seconds. The temperature was raised slightly to 15°C. Still cold. Still unpleasant. Just slightly less intense.</p>
<p>Objectively, the second experience was worse. It lasted 90 seconds instead of 60. More total pain. More suffering.</p>
<p>Later, the researchers asked a simple question:</p>
<p>If you had to repeat one of the trials, which would you choose?” Surprisingly, most participants chose the longer one.</p>
<p>Why would anyone choose more pain?</p>
<p>The researchers realised something profound: people don’t remember experiences by calculating total discomfort. Instead, the mind summarizes the experience using just two key moments — the most intense point (the peak) and the final moment (the end).</p>
<p>In both trials, the peak pain was the same: 14°C. But the longer trial ended slightly better, at 15°C. That small improvement at the end reshaped how the entire episode was remembered. The participants’ “experiencing self” suffered more during the longer trial. But their “remembering self” preferred it because it ended on a less painful note.</p>
<p>From this, the researchers introduced what became known as the Peak–End Rule: we judge experiences largely by their most intense moment and how they finish, not by how long they last.</p>
<p>Since people largely judge an experience by how it ends, developers should focus on designing satisfying confirmation screens and smooth exit interactions. You should concentrate less on making every single moment perfect and instead prioritise optimising the peak and final moments.</p>
<p>A negative ending can overshadow an otherwise good experience, so carefully avoid frustrating final steps such as unexpected fees or confusing confirmations.</p>
<p>Emotional intensity strongly shapes memory, which is why many apps incorporate celebration animations, rewards, or success messages at key moments to leave a lasting positive impression.</p>
<h3 id="heading-design-takeaway-from-peak-end-rule">Design takeaway from Peak End Rule</h3>
<p><strong>People Judge Experiences by the Peak and the Ending — Not the Total Duration:</strong><br>Users don’t remember every moment. They remember the most intense point and how the experience ends. A slightly better ending can completely reshape how the entire interaction is remembered.</p>
<p><strong>Prioritise Strong, Positive Endings in Your UX Flows:</strong><br>A smooth final step, a clear confirmation, or a satisfying success screen leaves a disproportionately strong impression. A bad ending can overshadow an otherwise great experience.</p>
<p><strong>Design for Emotional Peaks at Key Moments:</strong><br>Celebration animations, rewards, checkmarks, and success messages create memorable emotional spikes. These peaks anchor the experience in the user’s memory.</p>
<p><strong>Don’t Try to Perfect Every Moment — Perfect the Right Moments:</strong><br>Optimise the peak and the end of the journey. These two moments define how users recall the entire interaction.</p>
<p><strong>Avoid Negative Surprises at the Finish Line:</strong><br>Unexpected fees, confusing confirmations, or friction at the last step can ruin the memory of the whole process. Protect the ending carefully.</p>
<p><strong>To Sum Up:</strong> We judge experiences largely by their most intense moment and how they finish, not by how long they last.</p>
<h2 id="heading-110-postels-law"><strong>11.0 Postel’s Law</strong>:</h2>
<p>Jon Postel’s famous principle – “Be conservative in what you send, be liberal in what you accept” –&nbsp; is a philosophy of kindness in software design. At its core, the principle argues that systems should be generous with what they accept from users, yet disciplined and predictable in what they output.</p>
<p>When developers follow this approach, users feel supported and understood. When they don’t, users feel punished for being human.</p>
<p>A user’s input is rarely perfect. People type quickly, make mistakes, follow their own habits, or rely on formats familiar to them. A robust system embraces this reality. It accepts messy, human input and quietly transforms it into clean, standardized data.</p>
<p>Real people don't think in strict formats. They write dates the way they learned in school, type phone numbers the way they say them aloud, and enter names and addresses in whatever structure feels natural to them.</p>
<p>A rigid system will reject anything that doesn’t match its narrow expectations, but a robust system, by contrast, adapts to the user.</p>
<p>Consider dates. A brittle interface might demand MM/DD/YYYY and reject everything else. A more humane system accepts a wide range of formats — “1 May 2024,” “2024‑05‑01,” “05/01/24,” or “May 1st, 2024” — and quietly converts them into a standard internal representation. This is where the complex handling described by Tesla's Law comes into play (Shifting complexity to the system, rather than the user).</p>
<p>Phone numbers follow the same pattern. People might enter (555) 123 4567, 555‑123‑4567, 5551234567, or +1 555 123 4567. A fragile system throws errors. A robust one parses all of them using libraries like libphonenumber and moves on.</p>
<p>Addresses are equally varied. “221B Baker St,” “221‑B Baker Street,” and “221 Baker St., Apt B” all refer to the same place. A forgiving system normalizes these instead of rejecting them.</p>
<p>Even names can be surprisingly complex. Hyphens, apostrophes, multiple words, and titles are all part of real human identity. Rejecting “O’Connor,” “Jean‑Luc,” or “Dr. Sarah Lee” is not just technically incorrect — it's disrespectful to the user.</p>
<p>Search bars offer another clear example. A strict search bar demands perfect spelling and exact phrasing. A robust one handles typos (“restuarant”), partial words (“resta”), synonyms (“food places”), and natural language (“where can I eat nearby”). It meets the user where they are instead of forcing them to think like a machine.</p>
<p>Currency should be normalized to a clear format such as GBP 5.00, no matter whether the user typed “£5,” “5 pounds,” or “5 GBP.”</p>
<p>Even file uploads benefit from standardization: whether the user uploads .jpeg, .jpg, .JPG, or .JPEG, the system should store everything as .jpg.</p>
<p>Error messages follow the same principle. Vague feedback like “Invalid password” leaves users confused and frustrated.</p>
<p>A clear, conservative message — “Incorrect password. Please try again.” — respects the user’s time. And instead of hiding password requirements, the system should state them upfront: minimum eight characters, at least one uppercase letter, at least one number.</p>
<p>Predictability reduces friction.</p>
<p>Because users inevitably make mistakes or enter data in unexpected ways, developers should design input fields that are tolerant rather than brittle. This means accepting flexible formats, offering autocorrect or intelligent parsing, and using forgiving validation rules that interpret the user’s intent instead of rejecting their effort.</p>
<p>Clear instructions, tooltips, and visible requirements should appear before submission so users understand what the system expects without trial and error.</p>
<p>When errors do occur, the interface should handle them gently—never crashing, and never forcing the user to start over.</p>
<p>Even simple variations, such as phone numbers typed with spaces, dashes, or parentheses, should be accepted and normalized behind the scenes.</p>
<p>By embracing flexibility on the input side and clarity on the output side, developers create systems that feel humane, resilient, and respectful of the way real people actually behave.</p>
<h3 id="heading-design-takeaway-from-postels-law">Design Takeaway from Postel's Law</h3>
<p><strong>Accept Messy Human Input, Output Clean Structured Data:</strong><br>Users type dates, names, phone numbers, and addresses in unpredictable ways. A humane system accepts this variability and quietly normalises it into a consistent internal format.</p>
<p>Rigid interfaces punish users for being human. Robust interfaces interpret intent — handling typos, partial matches, synonyms, and natural language without complaint.</p>
<p>Also accept variations in spacing, punctuation, casing, and structure. Let users type naturally — the system should handle the complexity, not them.</p>
<p><strong>Be Flexible With Input, Be Strict With Output:</strong><br>This is the heart of Postel’s Law. Let users express information naturally, but ensure your system stores and displays it in a predictable, standardised way.</p>
<p><strong>Use Intelligent Parsing and Autocorrection to Reduce Errors:</strong><br>Libraries like libphonenumber, fuzzy search, and natural‑language parsers allow systems to accept a wide range of formats while still producing clean, reliable data.</p>
<p><strong>Normalise Everything Behind the Scenes:</strong><br>Dates, phone numbers, currency, file extensions, and addresses should all be standardised internally. This prevents messy data and reduces downstream cleanup work.</p>
<p><strong>Provide Clear, Predictable Feedback:</strong><br>Error messages should be specific and helpful. Requirements should be visible upfront. Users should never be surprised, confused, or forced to start over.</p>
<p><strong>Combine Postel’s Law With Tesler’s Law:</strong><br>Shift complexity to the system. Intelligent handling of messy input reduces cognitive load, improves usability, and ensures consistent, high‑quality data.</p>
<p><strong>To Sum Up:</strong> A rigid system will reject anything that doesn’t match its narrow expectations, but a robust system, by contrast, adapts to the user.</p>
<h2 id="heading-120-doherty-threshold"><strong>12.0 Doherty Threshold</strong>:</h2>
<p>The Doherty Threshold is a principle in human–computer interaction which proposes that systems should respond quickly enough to keep users actively engaged (Mod 2024).</p>
<p>When response times stay below a certain limit, users remain focused and productive. But once performance already meets this optimal responsiveness level, making the system even faster or adding extra capability doesn't significantly enhance satisfaction or efficiency.</p>
<p>The idea was introduced by Walter J. Doherty in 1976 in his paper “A Comparison of Programming Systems and Doherty Threshold.” His research showed that maintaining rapid system feedback fast enough to sustain continuous interaction has a stronger impact on productivity than simply increasing system power or features beyond that point.</p>
<p>Doherty proposes that this shouldn't be greater than 400ms Rule: If the system responds within this window, the user feels in total control. If the response takes longer, the user's attention begins to wander, and their "train of thought" is broken.</p>
<p>The challenge, of course, is that not every operation can realistically complete within 400ms. Some tasks require heavy computation, large network calls, or complex rendering. This is where the concept of perceived performance becomes essential.</p>
<p>Even when the system can't finish the work quickly, it can feel fast by responding instantly at the UI level. Developers can achieve this illusion of speed through a combination of thoughtful design patterns and disciplined engineering practices.</p>
<p>On the technical side, performance begins with reducing unnecessary work. Keeping the number of HTML elements low helps the browser render faster. Rendering only the visible portion of long lists prevents the Document Oject Model (DOM) from becoming bloated. Splitting scripts and deferring non‑critical code ensures that essential interactions load first.</p>
<p>Using CSS transforms and opacity changes avoids expensive layout recalculations. Lazy‑loading images, videos, and scripts ensures that the interface becomes interactive long before all assets are downloaded.</p>
<p>These optimizations don’t just improve raw speed — they create the foundation for interfaces that feel responsive.</p>
<h3 id="heading-design-takeaways-from-doherty-threshold">Design Takeaways from Doherty Threshold</h3>
<p><strong>Instant Feedback</strong>: When a user clicks a button, provide a visual change (like a button press animation or a spinner) immediately, even if the background task takes longer.</p>
<p><strong>Skeleton Screens</strong>: Use placeholder blocks that mimic the layout of the page while data loads. This makes the app feel like it is responding instantly.</p>
<p><strong>Progressive Loading</strong>: Load text and basic structures first, then "pop in" high-resolution images later.</p>
<p><strong>Optimistic UI</strong>: When a user hits "Save," don't wait for the server. Update the UI instantly (Doherty) and handle the "messy" data formatting on the backend (Postel).</p>
<p><strong>Live Inline Validation</strong>: Show a green checkmark or a helpful error message as the user types. This keeps them below the 400ms "thought-break" limit.</p>
<p><strong>Debouncing</strong>: In search bars, start showing results after a few keystrokes so the user feels the app is "predicting" their needs.</p>
<p><strong>To Sum Up:</strong> When response times stay below a certain limit, users remain focused and productive. But once performance already meets this optimal responsiveness level, making the system even faster or adding extra capability doesn't significantly enhance satisfaction or efficiency.</p>
<h2 id="heading-130-serial-position-effect-primacy-and-recency"><strong>13.0 Serial Position Effect (Primacy and Recency)</strong>:</h2>
<p>Murdock’s study investigated how the position of a word in a list affects recall, known as the serial position effect. He presented 103 psychology students with lists of 10 to 40 words, one at a time, at either 1 or 2 seconds per word (McLeod, 2025).</p>
<p>Participants were divided into six groups, each experiencing a different combination of list length and presentation rate, and were asked to recall as many words as possible in any order.</p>
<p>The results showed that participants were most likely to remember words at the beginning of the list (primacy effect) and at the end of the list (recency effect), while words in the middle were recalled less often. The recency effect persisted even in longer lists, and the middle section of the recall curve formed a flat asymptote.</p>
<p>Murdock explained this using the multi-store model of memory: early words were rehearsed and transferred to long-term memory, last words remained in short-term memory, and middle words were neither sufficiently rehearsed nor retained, leading to poorer recall.</p>
<p>The experiment demonstrated that memory performance varies systematically with the position of information in a sequence.</p>
<p>This is the reason why the most important information or actions should never be buried in the middle.</p>
<p>As a developer, you should put your most critical navigation links (like "Home" or "Dashboard") at the far left or the top of a list. In a pricing table, put the most popular or recommended plan on the Place "Final Actions" (like "Log Out," "Cart," or "Support") at the end of a menu or the far right of a navigation bar.</p>
<p>In a long onboarding flow, put the most exciting benefit of the app on the very last slide so the user enters the app feeling motivated.</p>
<p>Avoid placing highly important buttons in the middle of a row. If you have a row of 7 buttons, the user is statistically likely to overlook the 4th one.</p>
<h3 id="heading-design-takeaways-serial-position-effect">Design Takeaways <strong>Serial Position Effect</strong></h3>
<p><strong>Place Critical Items at the Beginning or End — Never the Middle:</strong><br>Users reliably remember the first and last items in any sequence (primacy and recency). Anything placed in the middle is statistically more likely to be forgotten or ignored. Also, actions such as “Log Out,” “Cart,” “Support,” or “Checkout” should sit at the far right or bottom — the natural recency position.</p>
<p><strong>Put Essential Navigation Links at the Far Left or Top:</strong><br>Links like “Home,” “Dashboard,” or “Overview” should appear at the start of a menu, where recall and recognition are strongest.</p>
<p><strong>To Sum Up:</strong> The results showed that participants were most likely to remember words at the beginning of the list (primacy effect) and at the end of the list (recency effect), while words in the middle were recalled less often.</p>
<h2 id="heading-140-occams-razor"><strong>14.0 Occam’s Razor</strong>:</h2>
<p>Although first articulated in the 14th century by the Franciscan friar William of Ockham, Occam’s Razor remains one of the most indispensable principles in a developer’s toolkit. In fact, skipping this law while discussing other theories and principles would be like skipping the glue that holds the entire framework together.</p>
<p>At its core, Occam’s Razor states that “among competing explanations, the simplest one is usually the best.”</p>
<p>For example, if two user interfaces achieve the same goal, the one with fewer visual elements is typically superior because it requires less processing power.</p>
<p>The fundamental takeaway for modern developers regarding Occam’s Razor is that complexity is a tax on the user’s cognitive resources.</p>
<p>In an era of information density, the developer's primary role is no longer to provide "more" features – rather, it's to curate the most direct path to a solution.</p>
<p>In practice, Occam’s Razor becomes a reminder to keep things as simple as possible. This “less is more” mindset shapes everything from navigation to forms.</p>
<p>A good rule for navigation is the Rule of Five: aim for three to five main menu items instead of a long, overwhelming list. This keeps choices clear and prevents users from freezing up when they see too many options.</p>
<p>The same idea applies to data entry. When you ask only for the information that truly matters, you respect the user’s time and reduce the chance of “form fatigue,” which is one of the biggest reasons people abandon sign‑ups or checkout flows.</p>
<p>Simplicity isn’t just elegant — it’s practical, humane, and far more effective.</p>
<h3 id="heading-design-takeaway-from-occams-razor">Design Takeaway from Occam's Razor</h3>
<p><strong>Choose the Simplest Effective Solution:</strong><br>When two designs achieve the same goal, the one with fewer elements is almost always better. Simplicity reduces cognitive load and speeds up user decision‑making.</p>
<p><strong>Simplicity Is Not Just Aesthetic — It’s Humane:</strong><br>Clear, minimal interfaces respect the user’s time, reduce friction, and make the product feel effortless. Simplicity is both a design strategy and an act of empathy.</p>
<p><strong>To Sum Up:</strong> Simplicity isn’t just elegant — it’s practical, humane, and far more effective.</p>
<h2 id="heading-150-parkinsons-law">15.0 Parkinson's Law</h2>
<p>Occam’s Razor teaches us to prefer the simplest solution that works. But why do we so often end up with complex systems in the first place? That tendency is explained by another principle: Parkinson’s Law.</p>
<p>Parkinson’s Law states that "work expands to fill the time available for its completion". In design, this means projects often become overly complex or take longer than necessary if given too much time, resulting in inefficient, over-designed, or cluttered interfaces.</p>
<p>In design, this manifests as Feature Creep. If you give yourself three months to build an app, you will spend three months adding "nice-to-have" animations, extra settings toggles, and niche edge cases that nobody asked for and in reality, what you have added isn’t that important.</p>
<p>You just succeeded in adding layers of complexity that might ends up violating some of the laws we spoke about. Occam’s Razor reminds us that the simplest solution is often the most effective.</p>
<p>By being aware of Parkinson’s Law and the tendency for work to expand, developers can manage their time intentionally and focus only on what truly matters.</p>
<h3 id="heading-design-takeaway-for-parkinsons-law">Design Takeaway for Parkinson's law</h3>
<p><strong>Set Clear Constraints to Keep Designs Focused:</strong><br>Intentional time limits and scope boundaries prevent over‑designing. Constraints force clarity, prioritisation, and simplicity.</p>
<p><strong>Build Only What Truly Matters for the User:</strong><br>Parkinson’s Law reminds you to resist the urge to fill time with unnecessary features. Focus on the core experience, not the edge cases nobody asked for.</p>
<p><strong>Use Occam’s Razor to Counterbalance Parkinson’s Law:</strong><br>As work expands, complexity grows. Occam’s Razor pulls you back to the simplest effective solution. Together, the two principles prevent bloated, over‑engineered products.</p>
<p><strong>To Sum Up:</strong> Work expands to fill the time available for its completion</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Human-centered design is deeply influenced by a set of psychological principles that explain how users perceive, process, and interact with digital systems.</p>
<p>Among these, Fitts’s Law establishes that the time required to acquire a target depends on its size and distance. In practice, this means that larger and closer elements are easier and faster to interact with.</p>
<p>To apply this in practice, developers should make primary call-to-action elements prominent, large, and easily reachable –&nbsp;especially in mobile interfaces where thumb accessibility is critical.</p>
<p>Closely related to decision-making is Hick’s Law, which states that the more choices a user is presented with, the longer it takes to make a decision. Excessive options can overwhelm users and lead to decision fatigue.</p>
<p>To address this, developers should simplify interfaces, minimise unnecessary options, and guide users through processes step-by-step rather than presenting everything at once.</p>
<p>Another important cognitive principle discussed is Miller’s Law, which suggests that the average person can hold approximately seven (plus or minus two) items in working memory at a time. This limitation highlights the need to present information in manageable chunks.</p>
<p>By breaking content into smaller groups and avoiding information overload, developers can improve comprehension and usability.</p>
<p>User expectations are strongly shaped by Jakob’s Law, which says that people spend most of their time on other websites and therefore expect similar patterns across digital products.</p>
<p>Instead of reinventing basic interactions, developers should follow familiar conventions such as placing the logo in the top‑left, the shopping cart in the top‑right, and keeping scrolling behaviour predictable.</p>
<p>But innovation is still possible where it truly adds value. As we discussed with the Aesthetic‑Usability Effect, users are far more tolerant of new or unusual design patterns when the interface is visually appealing and thoughtfully crafted.</p>
<p>The Gestalt Principles provided additional insight into how users visually organise information. The principle of proximity suggests that objects placed close together are perceived as related, so grouping related elements improves clarity. Similarity indicates that elements with consistent colours, shapes, or styles are seen as belonging together, reinforcing visual hierarchy and function. Closure explains that users can perceive incomplete shapes as complete, allowing for minimalistic designs where the brain fills in missing details. Continuity highlights that users naturally follow smooth visual paths, meaning layouts should guide the eye logically through alignment and structure.</p>
<p>We also looked at The Von Restorff Effect which emphasizes that elements which stand out are more likely to be remembered. By using contrast in colour, size, or design, important features such as buttons or alerts can capture user attention.</p>
<p>Managing complexity was addressed by Tesler’s Law, which asserts that every system has inherent complexity that cannot be eliminated but only managed.</p>
<p>Developers must therefore shift complexity away from the user by simplifying interfaces while handling intricate processes behind the scenes.</p>
<p>The Zeigarnik Effect reveals that people remember unfinished tasks better than completed ones, creating a sense of mental tension. This can be leveraged by incorporating progress indicators, checklists, and reminders that encourage users to complete tasks.</p>
<p>Similarly, the Peak-End Rule suggests that users judge an experience based on its most intense moment and its conclusion. Developers should create memorable highlights and ensure a smooth, satisfying ending to user journeys.</p>
<p>We also discussed the Goal-Gradient Effect, which explains that users become more motivated as they approach the completion of a task. By showing progress –such as indicating that a process is “80% complete” –&nbsp;and breaking tasks into stages, developers can encourage users to finish what they have started.</p>
<p>In terms of system interaction, Postel’s Law advises developers to be flexible in accepting user input while maintaining strict standards for output. This means allowing different input formats while ensuring consistent and reliable system responses.</p>
<p>Performance is equally important, as highlighted by the Doherty Threshold, which shows that productivity increases when system response times stay under 400 milliseconds. Fast systems keep users engaged and create a sense of ease.</p>
<p>This means that developers should focus on building interfaces that feel instant, even when real processing takes longer, by combining smart engineering practices with thoughtful design patterns that maintain the illusion of speed.</p>
<p>Memory and attention are further explained by the Serial Position Effect, where users tend to remember the first and last items in a sequence more than those in the middle. Developers should position key information or actions at the beginning or end of lists.</p>
<p>Simplicity is reinforced by Occam’s Razor, which argues that the simplest solution is often the most effective. Eliminating unnecessary features reduces friction and enhances usability, and we further discussed about Parkinson’s Law, which suggests that tasks expand to fill the time available, indicating the importance of setting constraints such as deadlines or timers to encourage timely action.</p>
<p>These principles collectively highlight the importance of simplicity, clarity, performance, and user psychology in design. By applying them thoughtfully, developers can create intuitive, efficient, and engaging user experiences that align with both human behaviour and user expectations.</p>
<h2 id="heading-references">References</h2>
<p>Budiu, R. (2022). <em>Fitts’s Law and Its Applications in UX</em>. [online] Nielsen Norman Group. Available at: <a href="https://www.nngroup.com/articles/fitts-law/">https://www.nngroup.com/articles/fitts-law/</a>.</p>
<p>Bustamante, N. (2023). <em>Gestalt Psychology? Definition, Principles, &amp; Examples - Simply Psychology</em>. [online] <a href="http://www.simplypsychology.org">www.simplypsychology.org</a>. Available at: <a href="https://www.simplypsychology.org/what-is-gestalt-psychology.html">https://www.simplypsychology.org/what-is-gestalt-psychology.html</a>.</p>
<p>Cherry, K. (2024). <em>The Zeigarnik Effect Is Why You Keep Thinking of Unfinished Work</em>. [online] Verywell Mind. Available at: <a href="https://www.verywellmind.com/zeigarnik-effect-memory-overview-4175150">https://www.verywellmind.com/zeigarnik-effect-memory-overview-4175150</a>.</p>
<p>DO, A.M., RUPERT, A.V. and WOLFORD, G. (2008). Evaluations of pleasurable experiences: The peak-end rule. <em>Psychonomic Bulletin &amp; Review</em>, 15(1), pp.96–98. doi:<a href="https://doi.org/10.3758/pbr.15.1.96">https://doi.org/10.3758/pbr.15.1.96</a>.</p>
<p>GUPTA, S., GUPTA, S., MAHENDRA, A. and GUPTA, S. (2006). Inverse Halo Nevus. <em>Dermatologic Surgery</em>, 32(6), pp.871–872. doi:<a href="https://doi.org/10.1097/00042728-200606000-00025">https://doi.org/10.1097/00042728-200606000-00025</a>.</p>
<p>‌Hall, D. (2023). <em>Pilot Error, Chapanis and The Shape of Things to Come</em>. [online] UX Magazine. Available at: <a href="https://uxmag.com/articles/pilot-error-chapanis-and-the-shape-of-things-to-come">https://uxmag.com/articles/pilot-error-chapanis-and-the-shape-of-things-to-come</a>.</p>
<p>Hunt, R.R. (1995). The subtlety of distinctiveness: What von Restorff really did. <em>Psychonomic Bulletin &amp; Review</em>, 2(1), pp.105–112. doi:<a href="https://doi.org/10.3758/bf03214414">https://doi.org/10.3758/bf03214414</a>.</p>
<p>Kahneman, D., Fredrickson, B.L., Schreiber, C.A. and Redelmeier, D.A. (1993). When More Pain Is Preferred to Less: Adding a Better End. <em>Psychological Science</em>, 4(6), pp.401–405. doi:<a href="https://doi.org/10.1111/j.1467-9280.1993.tb00589.x">https://doi.org/10.1111/j.1467-9280.1993.tb00589.x</a>.</p>
<p>Mod, D. (2024). <em>Doherty Threshold: UX Law of Swift Interactions</em>. [online] Articles on everything UX: Research, Testing &amp; Design. Available at: <a href="https://blog.uxtweak.com/doherty-threshold/">https://blog.uxtweak.com/doherty-threshold/</a>.</p>
<p>Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. <em>Psychological Review</em>, [online] 101(2), pp.343–352. doi:<a href="https://doi.org/10.1037/0033-295x.101.2.343">https://doi.org/10.1037/0033-295x.101.2.343</a>.</p>
<p>Proctor, R.W. and Schneider, D.W. (2018). Hick’s law for choice reaction time: A review. <em>Quarterly Journal of Experimental Psychology</em>, [online] 71(6), pp.1281–1299. doi:<a href="https://doi.org/10.1080/17470218.2017.1322622">https://doi.org/10.1080/17470218.2017.1322622</a>.</p>
<p>Yablonski, J. (2022). <em>Hick’s Law</em>. [online] Laws of UX. Available at: <a href="https://lawsofux.com/hicks-law/">https://lawsofux.com/hicks-law/</a>.</p>
<p>Yablonski, J. (2022). <em>Goal-Gradient Effect</em>. [online] Laws of UX. Available at: <a href="https://lawsofux.com/goal-gradient-effect/">https://lawsofux.com/goal-gradient-effect/</a>.</p>
<p>‌</p>
<p>‌</p>
<p>‌</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Convert Images to PDF in the Browser Using JavaScript – A Step-by-Step Guide ]]>
                </title>
                <description>
                    <![CDATA[ Whether it’s scanned documents, screenshots, receipts, notes, certificates, or multiple photos, users often need a quick way to combine images into a downloadable PDF. Modern browsers make this much e ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-convert-images-to-pdf-using-javascript/</link>
                <guid isPermaLink="false">69fe1ae5f239332df4ec3436</guid>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ pdf ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webdev ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Programming Blogs ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Bhavin Sheth ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 17:18:29 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/8902fd4f-fcfa-4f7b-8baf-9b595239254f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Whether it’s scanned documents, screenshots, receipts, notes, certificates, or multiple photos, users often need a quick way to combine images into a downloadable PDF.</p>
<p>Modern browsers make this much easier than before.</p>
<p>Instead of uploading files to a server, we can now process images directly in the browser using JavaScript. This keeps the tool fast, private, and easy to use.</p>
<p>In this tutorial, you’ll build a browser-based Image to PDF converter using JavaScript.</p>
<p>The tool will support uploading multiple images, sorting files, choosing orientation and page size, configuring margins, and merging images into either a single PDF or separate PDF files. Users will also be able to preview and download the generated document directly in the browser.</p>
<p>Everything runs entirely client-side without any backend.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/b3742c45-abef-44a6-9703-ee1538fa4c68.png" alt="Convert Images to PDF" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-how-image-to-pdf-conversion-works">How Image to PDF Conversion Works</a></p>
</li>
<li><p><a href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a href="#heading-what-library-are-we-using">What Library Are We Using?</a></p>
</li>
<li><p><a href="#heading-creating-the-upload-interface">Creating the Upload Interface</a></p>
</li>
<li><p><a href="#heading-reading-uploaded-images">Reading Uploaded Images</a></p>
</li>
<li><p><a href="#heading-generating-the-pdf">Generating the PDF</a></p>
</li>
<li><p><a href="#heading-handling-multiple-images">Handling Multiple Images</a></p>
</li>
<li><p><a href="#heading-configuring-pdf-settings">Configuring PDF Settings</a></p>
</li>
<li><p><a href="#heading-renaming-and-downloading-the-pdf">Renaming and Downloading the PDF</a></p>
</li>
<li><p><a href="#heading-demo-how-the-image-to-pdf-tool-works">Demo: How the Image to PDF Tool Works</a></p>
</li>
<li><p><a href="#heading-important-notes-from-real-world-use">Important Notes from Real-World Use</a></p>
</li>
<li><p><a href="#heading-common-mistakes-to-avoid">Common Mistakes to Avoid</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-how-image-to-pdf-conversion-works">How Image to PDF Conversion Works</h2>
<p>The browser can't directly combine images into a PDF by itself.</p>
<p>Instead, we'll use a JavaScript PDF library that creates pages, inserts images, and exports everything as a downloadable PDF document.</p>
<p>The process starts when users upload one or multiple images into the browser. JavaScript then reads the image data and prepares it for PDF generation. After that, the tool creates PDF pages, inserts the uploaded images into those pages, and finally exports everything as a downloadable PDF document.</p>
<p>Everything happens locally inside the browser.</p>
<p>This means users don’t need to upload private files to a server, which makes the process faster and more privacy-friendly.</p>
<h2 id="heading-project-setup">Project Setup</h2>
<p>This project is intentionally simple.</p>
<p>You only need:</p>
<ul>
<li><p>an HTML file</p>
</li>
<li><p>a JavaScript file</p>
</li>
<li><p>a PDF library</p>
</li>
</ul>
<p>No backend or database is required.</p>
<h2 id="heading-what-library-are-we-using">What Library Are We Using?</h2>
<p>We’ll use the jsPDF library. It allows us to generate PDF files directly in JavaScript.</p>
<p>Add it using a CDN:</p>
<pre><code class="language-html">&lt;script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"&gt;&lt;/script&gt;
</code></pre>
<p>Once loaded, we can create and export PDF files directly from the browser.</p>
<h2 id="heading-creating-the-upload-interface">Creating the Upload Interface</h2>
<p>Start with a basic upload area:</p>
<pre><code class="language-html">&lt;input type="file" id="upload" multiple accept="image/*"&gt;

&lt;button onclick="convertToPDF()"&gt;
  Convert to PDF
&lt;/button&gt;
</code></pre>
<p>This allows users to upload multiple image files and generate the PDF.</p>
<p>Here’s what the upload section looks like inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/0d9e3a68-26cd-4b5e-83ed-93149fcffdd1.png" alt="Image upload interface for browser-based image to PDF converter tool" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>You can also expand the interface with additional controls for sorting, page settings, margins, and merge modes.</p>
<h2 id="heading-reading-uploaded-images">Reading Uploaded Images</h2>
<p>After users select files, we need to read them in JavaScript.</p>
<p>We can use <code>FileReader</code> for this:</p>
<pre><code class="language-javascript">const fileInput = document.getElementById("upload");

const files = fileInput.files;

for (const file of files) {
  const reader = new FileReader();

  reader.onload = function (e) {
    const imageData = e.target.result;

    console.log(imageData);
  };

  reader.readAsDataURL(file);
}
</code></pre>
<p>This converts uploaded images into readable Base64 data that can later be inserted into the PDF.</p>
<h2 id="heading-generating-the-pdf">Generating the PDF</h2>
<p>Now we can create the PDF document.</p>
<pre><code class="language-javascript">const { jsPDF } = window.jspdf;

const pdf = new jsPDF();
</code></pre>
<p>Once the PDF is created, images can be inserted into pages:</p>
<pre><code class="language-javascript">pdf.addImage(imageData, "JPEG", 10, 10, 180, 120);
</code></pre>
<p>This inserts the uploaded image into the PDF page at a specific position and size.</p>
<p>Finally, export the document:</p>
<pre><code class="language-javascript">pdf.save("images.pdf");
</code></pre>
<p>This downloads the generated PDF instantly.</p>
<h2 id="heading-handling-multiple-images">Handling Multiple Images</h2>
<p>If users upload multiple files, each image can be added to its own PDF page automatically.</p>
<p>For example:</p>
<pre><code class="language-javascript">files.forEach((file, index) =&gt; {

  if (index !== 0) {
    pdf.addPage();
  }

});
</code></pre>
<p>This creates a new page before inserting the next image into the document.</p>
<p>In some situations, users may also want multiple images on the same page instead of one image per page.</p>
<p>For example:</p>
<pre><code class="language-javascript">pdf.addImage(img1, "JPEG", 10, 20, 80, 80);

pdf.addImage(img2, "JPEG", 110, 20, 80, 80);
</code></pre>
<p>This allows more flexible layouts for galleries, reports, or grouped documents.</p>
<h2 id="heading-configuring-pdf-settings">Configuring PDF Settings</h2>
<p>Before generating the final PDF, users can customize several layout and output settings.</p>
<p>These settings improve document quality and give users more control over the generated file.</p>
<p>Here’s what the configuration panel looks like inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/c13b5ba4-054b-4698-9ea8-48d932926c91.png" alt="allinonetools - image to pdf convertion setting" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-sorting-images">Sorting Images</h3>
<p>When multiple images are uploaded, organizing them properly becomes important before generating the PDF.</p>
<p>Users may want to sort images alphabetically, reverse the order, or arrange them based on file size.</p>
<p>For example, images can be sorted alphabetically like this:</p>
<pre><code class="language-javascript">files.sort((a, b) =&gt; a.name.localeCompare(b.name));
</code></pre>
<p>You can also sort files by size:</p>
<pre><code class="language-javascript">files.sort((a, b) =&gt; a.size - b.size);
</code></pre>
<p>Here’s an example of sorting options inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/bd834efc-6a70-41be-8597-9cddbb20c5ae.png" alt="image to pdf convertion image sorting option" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This helps users organize documents more efficiently before converting them into a PDF.</p>
<h3 id="heading-choosing-orientation">Choosing Orientation</h3>
<p>Different images work better in different page orientations.</p>
<p>Portrait orientation works well for vertical images, while landscape orientation is better for wider images.</p>
<p>For example:</p>
<pre><code class="language-javascript">const pdf = new jsPDF({
  orientation: "portrait"
});
</code></pre>
<p>You can also switch to <code>"landscape"</code> when needed.</p>
<p>Here’s an example of orientation options inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/1c9edb6b-3c15-4246-a060-0834a05493bf.png" alt="image to pdf convertion image orientation option" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-selecting-page-size">Selecting Page Size</h3>
<p>PDF page size controls the dimensions of the generated document.</p>
<p>For example:</p>
<pre><code class="language-javascript">const pdf = new jsPDF({
  unit: "mm",
  format: "a4"
});
</code></pre>
<p>This creates an A4-sized PDF document using millimeter units.</p>
<p>Other formats like letter, legal, or custom page sizes can also be supported.</p>
<p>Here’s an example of selecting page size options inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/72c06be9-12e5-4c55-937a-93420072ae04.png" alt="image to pdf convertion page size selection option" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-adding-margins">Adding Margins</h3>
<p>Margins create spacing between the image and the edges of the page.</p>
<p>Without margins, images may touch the borders and appear cramped.</p>
<p>For example:</p>
<pre><code class="language-javascript">const margin = 10;

pdf.addImage(imageData, "JPEG", margin, margin, 180, 120);
</code></pre>
<p>Here’s an example of margins options inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/bd36e4aa-d65c-486f-970e-fdfc283235b7.png" alt="image to pdf convertion margin selection option" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This creates cleaner spacing around the inserted image.</p>
<h3 id="heading-automatic-image-fitting">Automatic Image Fitting</h3>
<p>One common issue when generating PDFs from images is incorrect sizing.</p>
<p>If images are inserted with fixed dimensions, they may stretch, overflow outside the page, or appear distorted.</p>
<p>Instead, it’s better to calculate image dimensions dynamically.</p>
<p>For example:</p>
<pre><code class="language-javascript">const pageWidth = pdf.internal.pageSize.getWidth();

const imgWidth = pageWidth - 20;

const imgHeight = (image.height * imgWidth) / image.width;

pdf.addImage(imageData, "JPEG", 10, 10, imgWidth, imgHeight);
</code></pre>
<p>This automatically scales images proportionally while maintaining margins and layout consistency.</p>
<h3 id="heading-merge-options">Merge Options</h3>
<p>One useful feature is allowing different output modes.</p>
<p>For example, users may want to merge all uploaded images into a single PDF document when creating reports, notes, or combined files.</p>
<p>In some cases, users may prefer generating separate PDFs for each image instead of combining everything together. This can be useful when exporting individual documents or scanned pages.</p>
<p>Custom grouping is another helpful option because it allows users to combine selected images into multiple PDFs based on their own arrangement or categories.</p>
<p>These different output modes make the tool much more flexible for different real-world use cases.</p>
<p>A simple selection dropdown works well:</p>
<pre><code class="language-html">&lt;select id="mergeMode"&gt;
  &lt;option&gt;Merge all into Single PDF&lt;/option&gt;
  &lt;option&gt;Create Separate PDFs&lt;/option&gt;
  &lt;option&gt;Custom Grouping&lt;/option&gt;
&lt;/select&gt;
</code></pre>
<p>Once selected, JavaScript can apply different generation logic based on the chosen mode.</p>
<p>Here’s an example of merge mode options inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/84dfc427-6037-4520-b3e3-accb3d0837db.png" alt="image to pdf convertion image merge option" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This makes the tool more flexible for handling different document workflows.</p>
<h2 id="heading-renaming-and-downloading-the-pdf">Renaming and Downloading the PDF</h2>
<p>After generating the document, users may want to rename the file before downloading.</p>
<p>You can prompt for a filename like this:</p>
<pre><code class="language-javascript">const fileName = prompt("Enter PDF name:", "images");

pdf.save(`${fileName}.pdf`);
</code></pre>
<p>This gives users more control over the exported file.</p>
<p>Here’s an example of the rename popup inside the tool:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/6409dfec-8ee5-4437-b7ca-2f78b27323ec.png" alt="image to pdf convertion rename popup" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-demo-how-the-image-to-pdf-tool-works">Demo: How the Image to PDF Tool Works</h2>
<h3 id="heading-step-1-upload-images">Step 1: Upload Images</h3>
<p>Users upload one or multiple image files into the browser-based tool.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/0d9e3a68-26cd-4b5e-83ed-93149fcffdd1.png" alt="Image upload interface for browser-based image to PDF converter tool" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The tool supports common formats like JPG, PNG, and WEBP.</p>
<h3 id="heading-step-2-configure-pdf-settings">Step 2: Configure PDF Settings</h3>
<p>Users can customize layout settings before generating the PDF.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/4cab41c0-ffe7-4f07-9846-649d538628a9.png" alt="Image to PDF converter settings showing sorting, orientation, page size, margins, and uploaded image previews" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This includes:</p>
<ul>
<li><p>sorting images</p>
</li>
<li><p>orientation</p>
</li>
<li><p>page size</p>
</li>
<li><p>margins</p>
</li>
<li><p>merge mode</p>
</li>
</ul>
<p>These settings help create cleaner PDF output.</p>
<h3 id="heading-step-3-generate-the-pdf">Step 3: Generate the PDF</h3>
<p>Once settings are configured, users click the convert button.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/1323e37f-5947-415b-a747-e471b3b5ac53.png" alt="Convert to PDF button in browser-based image to PDF tool" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The browser processes all uploaded images locally and generates the PDF instantly.</p>
<h3 id="heading-step-4-rename-the-generated-file">Step 4: Rename the Generated File</h3>
<p>Before downloading, users can rename the generated PDF.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/620bfe32-903f-4fce-9228-2afc7f8940eb.png" alt="Rename generated PDF popup before downloading the converted file" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This improves organization when exporting multiple documents.</p>
<h3 id="heading-step-5-download-the-pdf">Step 5: Download the PDF</h3>
<p>Finally, the generated PDF becomes available for download directly in the browser.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979d22f93bc273cc33971b1/a1c808d1-4af6-4ca8-9bfc-a6df158e7777.png" alt="PDF preview and download section showing generated image PDF file" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The entire process works without uploading files to any server.</p>
<h2 id="heading-important-notes-from-real-world-use">Important Notes from Real-World Use</h2>
<p>When working with large images, performance and memory usage become important.</p>
<p>Large images can slow down PDF generation and create unnecessarily large output files.</p>
<p>For example, you can limit upload size before processing:</p>
<pre><code class="language-plaintext">const MAX_SIZE = 10 * 1024 * 1024;

if (file.size &gt; MAX_SIZE) {
  alert("Image is too large.");
  return;
}
</code></pre>
<p>Another useful optimization is resizing images before inserting them into the PDF.</p>
<p>For example:</p>
<pre><code class="language-plaintext">const canvas = document.createElement("canvas");

const ctx = canvas.getContext("2d");

canvas.width = image.width * 0.5;
canvas.height = image.height * 0.5;

ctx.drawImage(image, 0, 0, canvas.width, canvas.height);

const resizedImage = canvas.toDataURL("image/jpeg", 0.7);
</code></pre>
<p>This reduces image dimensions and compression quality before generating the PDF.</p>
<p>It also helps reduce memory usage and improves PDF generation speed for large files.</p>
<p>Since everything runs directly inside the browser, uploaded images never leave the user’s device, which improves privacy.</p>
<h2 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h2>
<p>One common mistake is not validating uploaded files before processing them.</p>
<p>For example, users may upload unsupported formats or attempt to generate a PDF without selecting images.</p>
<p>Always validate input before processing:</p>
<pre><code class="language-javascript">if (!fileInput.files.length) {
  alert("Please upload images first.");
  return;
}
</code></pre>
<p>Another issue is inserting very large images without resizing them first.</p>
<p>Large images can create oversized PDFs and reduce performance significantly.</p>
<p>Incorrect image positioning is also common.</p>
<p>If dimensions are hardcoded incorrectly, images may overflow outside the page or become distorted.</p>
<p>Using dynamic image sizing and margins helps prevent these layout issues.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you built a browser-based image to PDF converter using JavaScript.</p>
<p>You learned how to upload images, generate PDF documents, configure layout settings, and export files directly inside the browser.</p>
<p>More importantly, you saw how modern browsers can handle document generation locally without relying on a backend server.</p>
<p>This approach keeps the tool fast, private, and easy to use.</p>
<p>Once you understand this workflow, you can extend it further with features like compression, drag-and-drop sorting, watermarking, batch exports, or advanced PDF editing tools.</p>
<p>You can also try a full working version here:</p>
<p><a href="https://allinonetools.net/image-to-pdf-converter/">https://allinonetools.net/image-to-pdf-converter/</a></p>
<p>And that’s where things start getting really interesting.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Rise of AI Agents: How Software Is Learning to Act ]]>
                </title>
                <description>
                    <![CDATA[ Software has always been reactive. You click a button, it responds. You call an API, it returns data. Even the most sophisticated systems have historically depended on explicit instructions and tightl ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-rise-of-ai-agents-how-software-is-learning-to-act/</link>
                <guid isPermaLink="false">69fe184ef239332df4ea34e7</guid>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 17:07:26 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/1351f6d0-79c2-491b-a8e7-943cc9ece905.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Software has always been reactive.</p>
<p>You click a button, it responds. You call an API, it returns data.</p>
<p>Even the most sophisticated systems have historically depended on explicit instructions and tightly defined workflows. That model is starting to break.</p>
<p>A new class of software is emerging that doesn't just respond, but act.</p>
<p>This shift isn't cosmetic. It changes how software is designed, how systems are operated, and how work itself is executed.</p>
<p>Instead of encoding every step of a workflow, developers are now defining goals, constraints, and tools, then letting software figure out the execution path. The result is software that behaves less like a function and more like an operator.</p>
<p>In this article, you'll learn what AI agents actually are, how they differ from traditional software systems, and why they're starting to represent a major shift in modern software design.</p>
<p>This article is written for developers, technical founders, engineering managers, and anyone building software systems with AI components.</p>
<p>You don't need prior experience building AI agents, but it helps to be familiar with Basic Python syntax and Large language models (LLMs)</p>
<h3 id="heading-what-well-cover">What We'll Cover:</h3>
<ul>
<li><p><a href="#heading-from-deterministic-systems-to-goal-driven-execution">From Deterministic Systems to Goal-Driven Execution</a></p>
</li>
<li><p><a href="#heading-the-core-components-of-an-ai-agent">The Core Components of an AI Agent</a></p>
</li>
<li><p><a href="#heading-why-ai-agents-are-emerging-now">Why AI Agents Are Emerging Now</a></p>
</li>
<li><p><a href="#heading-the-illusion-and-reality-of-autonomy">The Illusion and Reality of Autonomy</a></p>
</li>
<li><p><a href="#heading-designing-agents-that-work-in-practice">Designing Agents That Work in Practice</a></p>
</li>
<li><p><a href="#heading-multi-agent-systems-and-coordination">Multi-Agent Systems and Coordination</a></p>
</li>
<li><p><a href="#heading-where-ai-agents-are-already-delivering-value">Where AI Agents Are Already Delivering Value</a></p>
</li>
<li><p><a href="#heading-the-shift-in-software-design">The Shift in Software Design</a></p>
</li>
<li><p><a href="#heading-what-comes-next">What Comes Next</a></p>
</li>
</ul>
<h2 id="heading-from-deterministic-systems-to-goal-driven-execution">From Deterministic Systems to Goal-Driven Execution</h2>
<p>Traditional software systems are deterministic. Given the same input, they produce the same output.</p>
<p>This predictability is what makes them reliable, but it's also what limits them. Any variation in workflow requires new code, new conditions, and new branches.</p>
<p>AI agents introduce a different model. They're goal-driven rather than instruction-driven. Instead of specifying every step, you define an objective and provide access to tools. The agent decides how to achieve the objective, often adapting in real time.</p>
<p>Consider a simple task like summarizing a set of documents and emailing the result. In a traditional system, you would write a pipeline that loads documents, processes them, formats the output, and sends an email. Each step is explicitly coded.</p>
<p>With an agent, the system might look more like this:</p>
<pre><code class="language-plaintext">from openai import OpenAI

client = OpenAI()
goal = "Summarize all documents in /reports and email a concise briefing to the leadership team"
tools = [
    "read_files",
    "summarize_text",
    "send_email"
]
response = client.responses.create(
    model="gpt-4.1",
    input=f"Goal: {goal}. Available tools: {tools}"
)
print(response.output_text)
</code></pre>
<p>This example is simplified, but it captures the shift. The developer defines intent and capability. The agent determines execution.</p>
<h2 id="heading-the-core-components-of-an-ai-agent">The Core Components of an AI&nbsp;Agent</h2>
<p>To understand how agents work, it helps to break them into components. At a high level, most agents consist of reasoning, memory, and tools.</p>
<p>Reasoning is handled by a large language model. This is what allows the agent to interpret goals, plan actions, and adapt when something fails. It's not just generating text, it's generating decisions.</p>
<p>Memory allows the agent to maintain context across steps. Without memory, the agent behaves like a stateless function. With memory, it can track progress, recall past actions, and refine its approach.</p>
<p><a href="https://www.freecodecamp.org/news/how-to-build-your-first-mcp-server-using-fastmcp/">Tools are what make the agent useful</a>. A tool can be anything from an API to a database query to a shell command. The agent doesn't need to know how the tool works internally. It only needs to know when and how to use it.</p>
<p>Here is a minimal example of tool usage in an agent loop:</p>
<pre><code class="language-plaintext">def agent_loop(goal, tools):
    context = []
    
    while True:
        prompt = f"Goal: {goal}\nContext: {context}\nWhat should be done next?"
        
        decision = model.generate(prompt)
        
        if decision == "DONE":
            break
        
        if decision.startswith("USE_TOOL"):
            tool_name, tool_input = parse_tool_call(decision)
            result = tools[tool_name](tool_input)
            context.append(result)
        else:
            context.append(decision)
    
    return context
</code></pre>
<p>This loop is where the agent “acts.” It observes, decides, executes, and updates its understanding.</p>
<h2 id="heading-why-ai-agents-are-emerging-now">Why AI Agents Are Emerging&nbsp;Now</h2>
<p>The idea of autonomous software isn't new. What has changed is the capability of the underlying models.</p>
<p>Large language models can now reason across multiple steps, interpret unstructured inputs, and generate structured outputs that can drive real systems.</p>
<p>Equally important is the ecosystem around them. APIs are more standardized, infrastructure is more programmable, and data is more accessible. This makes it easier to expose tools and let them interact with real systems helping build some of the <a href="https://nexos.ai/blog/best-ai-agents/">best AI agents</a> in use today.</p>
<p>There's also an economic driver. Many workflows today are still manual, even in highly digitized organizations. These workflows often involve coordination across systems, interpretation of data, and decision-making under uncertainty. This is exactly the kind of work agents are suited for.</p>
<h2 id="heading-the-illusion-and-reality-of-autonomy">The Illusion and Reality of&nbsp;Autonomy</h2>
<p>It's tempting to describe AI agents as fully autonomous. In practice, most are not. They operate within constraints defined by developers. They rely on tools that expose only certain actions. They're often monitored, rate-limited, and evaluated at each step.</p>
<p>What makes them different isn't complete autonomy, but partial autonomy. They can decide how to execute within a bounded environment.</p>
<p>This distinction matters because it affects how systems are designed. You're not building a system that always behaves predictably. You're building a system that explores a solution space and converges on an outcome.</p>
<p>That introduces new challenges. Agents can take inefficient paths. They can misinterpret goals. They can fail in ways that are hard to debug because the failure isn't a single error, but a chain of decisions.</p>
<h2 id="heading-designing-agents-that-work-in-practice">Designing Agents That Work in&nbsp;Practice</h2>
<p>Building an agent is easy. Building one that works reliably is harder. The difference comes down to control.</p>
<p>One approach is to constrain the agent’s <a href="https://milvus.io/ai-quick-reference/what-is-an-action-space-in-rl">action space</a>. Instead of giving it open-ended access, you define a limited set of tools with clear interfaces. This reduces ambiguity and makes behavior more predictable.</p>
<p>Another approach is to introduce intermediate checkpoints. Instead of letting the agent run freely, you validate its decisions at key steps. You can do this through rules, secondary models, or even human review.</p>
<p>Here's an example of adding a validation layer:</p>
<pre><code class="language-plaintext">def safe_execute(tool, input_data):
    if not validate_input(tool, input_data):
        return "Invalid input"
    
    result = tool(input_data)
    
    if not validate_output(tool, result):
        return "Invalid output"
    
    return result
</code></pre>
<p>This pattern is critical in production systems. It turns an unconstrained agent into a controlled system that can still adapt, but within safe boundaries.</p>
<h2 id="heading-multi-agent-systems-and-coordination">Multi-Agent Systems and Coordination</h2>
<p>As agents become more capable, a single agent is often not enough. Complex tasks can be decomposed into multiple agents, each responsible for a specific function.</p>
<p>For example, one agent might handle data retrieval, another might handle analysis, and a third might handle communication. These agents can coordinate by passing structured messages.</p>
<pre><code class="language-plaintext">class Message:
    def __init__(self, sender, receiver, content):
        self.sender = sender
        self.receiver = receiver
        self.content = content

def send_message(agent, message):
    return agent.process(message)
message = Message("retriever", "analyst", "Data collected from API")
response = send_message(analyst_agent, message)
</code></pre>
<p>This model starts to resemble a distributed system, but with agents instead of services. Coordination becomes a first-class concern. You need to define protocols, handle failures, and ensure consistency across agents.</p>
<h2 id="heading-where-ai-agents-are-already-delivering-value">Where AI Agents Are Already Delivering Value</h2>
<p>Despite the hype, there are concrete areas where agents are already useful. Internal tooling is one of them. Automating repetitive workflows, generating reports, and orchestrating tasks across systems are all well-suited for agents.</p>
<p>Customer support is another area. Agents can handle complex queries that require accessing multiple systems, not just retrieving canned responses.</p>
<p>Security and compliance workflows are also a strong fit. These often involve monitoring signals, correlating data, and taking action based on rules that aren't always deterministic.</p>
<p>What these use cases have in common is that they involve structured environments with clear objectives and measurable outcomes. Agents perform best when the problem space is bounded, even if the execution path is not.</p>
<h2 id="heading-the-shift-in-software-design">The Shift in Software&nbsp;Design</h2>
<p>The rise of AI agents isn't just about adding a new feature. It's about changing the abstraction layer of software.</p>
<p>Instead of writing code that directly implements behavior, you're designing systems that enable behavior. You define goals, expose capabilities, and enforce constraints. The actual execution becomes dynamic.</p>
<p>This requires a different mindset. Debugging is no longer just about tracing code. It's about understanding decision paths. Testing is no longer just about input-output pairs. It's about evaluating behavior across scenarios.</p>
<p>Observability becomes critical. You need to log not just what the system did, but why it did it. This includes prompts, intermediate decisions, and tool interactions.</p>
<h2 id="heading-what-comes-next">What Comes&nbsp;Next</h2>
<p>AI agents are still in the relatively early stages. The current generation is powerful but imperfect. Reliability is a major challenge. So is cost, especially when agents require multiple model calls per task.</p>
<p>But the direction is clear: software is moving from static execution to dynamic action. The boundary between user and system is becoming less rigid. Instead of telling software what to do step by step, users will increasingly define outcomes and let systems figure out the rest.</p>
<p>This doesn't eliminate the need for engineers. It changes what engineers do. The focus shifts from implementing logic to designing systems that can reason, act, and adapt.</p>
<p>The rise of AI agents marks a transition. Software is no longer just a tool. It's becoming an actor.</p>
<p><em>Join my</em> <a href="https://applyaito.substack.com/"><em><strong>Applied AI newsletter</strong></em></a> <em>to learn how to build and ship real AI systems. Practical projects, production-ready code, and direct Q&amp;A. You can also</em> <a href="https://www.linkedin.com/in/manishmshiva/"><em><strong>connect with me on</strong></em> <em><strong>LinkedIn</strong></em></a><em><strong>.</strong></em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Complete SaaS Payment Flow with Stripe, Webhooks, and Email Notifications ]]>
                </title>
                <description>
                    <![CDATA[ Most Stripe tutorials end at the checkout page. The customer clicks "Pay," Stripe processes the charge, and the tutorial congratulates you on integrating payments. But that's only the first 10% of a r ]]>
                </description>
                <link>https://www.freecodecamp.org/news/saas-payment-flow-stripe-webhooks-email/</link>
                <guid isPermaLink="false">69fe0830f239332df4de5722</guid>
                
                    <category>
                        <![CDATA[ TypeScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Node.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Software Engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Magnus Rødseth ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 15:58:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/de7d5c4d-062c-4879-892c-4486c7c461af.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most Stripe tutorials end at the checkout page. The customer clicks "Pay," Stripe processes the charge, and the tutorial congratulates you on integrating payments.</p>
<p>But that's only the first 10% of a real payment system.</p>
<p>What happens after the customer pays? You need to record the purchase in your database, send a confirmation email, and grant product access (a GitHub repo invitation, an API key, a license file). You need to notify yourself as the admin. You need to handle refunds two weeks later and send recovery emails when someone abandons checkout.</p>
<p>This is the complete payment lifecycle, and it's where most SaaS applications break.</p>
<p>This article walks you through building the entire flow, from the "Buy" button to the "Welcome" email and everything in between. Every code example comes from a production application processing real payments. You'll see how to design the database schema, create Stripe products, build the checkout flow, process purchases reliably, handle refunds, recover abandoned carts, and send transactional emails.</p>
<p>Here is what you'll learn:</p>
<ul>
<li><p>How to design a database schema that tracks every stage of a purchase</p>
</li>
<li><p>How to create Stripe products and prices programmatically</p>
</li>
<li><p>How to build a checkout flow with success/cancel handling</p>
</li>
<li><p>How to process webhooks securely with signature verification</p>
</li>
<li><p>How to split post-payment processing into durable, independently retried steps</p>
</li>
<li><p>How to handle full and partial refunds with automatic access revocation</p>
</li>
<li><p>How to recover revenue from abandoned checkouts</p>
</li>
<li><p>How to build transactional email templates with React Email and Resend</p>
</li>
<li><p>How to test the entire flow locally with Stripe CLI and Inngest</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-how-to-design-the-payment-database-schema">How to Design the Payment Database Schema</a></p>
</li>
<li><p><a href="#heading-how-to-create-stripe-products-and-prices">How to Create Stripe Products and Prices</a></p>
</li>
<li><p><a href="#heading-how-to-build-the-checkout-flow">How to Build the Checkout Flow</a></p>
</li>
<li><p><a href="#heading-how-to-handle-webhooks-securely">How to Handle Webhooks Securely</a></p>
</li>
<li><p><a href="#heading-how-to-process-purchases-with-durable-background-jobs">How to Process Purchases with Durable Background Jobs</a></p>
</li>
<li><p><a href="#heading-how-to-handle-refunds">How to Handle Refunds</a></p>
</li>
<li><p><a href="#heading-how-to-recover-abandoned-checkouts">How to Recover Abandoned Checkouts</a></p>
</li>
<li><p><a href="#heading-how-to-send-transactional-emails-with-react-email">How to Send Transactional Emails with React Email</a></p>
</li>
<li><p><a href="#heading-how-to-test-the-complete-flow-locally">How to Test the Complete Flow Locally</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along, you should be familiar with:</p>
<ul>
<li><p>TypeScript and Node.js</p>
</li>
<li><p>SQL databases (the examples use PostgreSQL)</p>
</li>
<li><p>React (for email templates)</p>
</li>
<li><p>Basic understanding of webhooks</p>
</li>
</ul>
<p>You don't need prior experience with any of the specific libraries. This handbook explains each one as it appears.</p>
<h3 id="heading-what-you-need-installed">What You Need Installed</h3>
<p>Install these packages to run the code examples:</p>
<pre><code class="language-bash">bun add stripe drizzle-orm @neondatabase/serverless inngest resend @react-email/components
</code></pre>
<p>You'll also need:</p>
<ul>
<li><p>A <a href="https://dashboard.stripe.com/register">Stripe account</a> (test mode is fine)</p>
</li>
<li><p>A <a href="https://neon.tech">Neon</a> PostgreSQL database (or any PostgreSQL instance)</p>
</li>
<li><p>A <a href="https://resend.com">Resend</a> account for sending emails</p>
</li>
<li><p>The <a href="https://stripe.com/docs/stripe-cli">Stripe CLI</a> for local webhook testing</p>
</li>
</ul>
<h3 id="heading-environment-variables">Environment Variables</h3>
<p>Set up these environment variables in your <code>.env</code> file:</p>
<pre><code class="language-bash"># Database
DATABASE_URL=postgresql://...

# Stripe
STRIPE_SECRET_KEY=sk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRO_PRICE_ID=price_...

# Email
RESEND_API_KEY=re_...
EMAIL_FROM="Your App &lt;noreply@mail.yourapp.com&gt;"
ADMIN_EMAIL=you@yourapp.com

# App
BETTER_AUTH_URL=http://localhost:3000
</code></pre>
<h2 id="heading-how-to-design-the-payment-database-schema">How to Design the Payment Database Schema</h2>
<p>Before writing any Stripe code, you need a database schema that can track a purchase through every stage of its lifecycle: creation, completion, partial refund, and full refund.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69a694d8d4dc9b42434c218f/6d0650fa-a568-4cb5-8560-8a2414635476.png" alt="Purchase status state machine showing transitions from pending to completed via Stripe webhook, then to refunded or partially refunded" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A purchase starts as <code>pending</code> when the user clicks "Buy." After Stripe confirms payment, it transitions to <code>completed</code>. From there, it can move to <code>refunded</code> or <code>partially_refunded</code>. Pending purchases that are never completed expire after 24 hours (abandoned carts).</p>
<p>Here is the schema I use in production, defined with <a href="https://orm.drizzle.team">Drizzle ORM</a>. The examples throughout this article grant access to a private GitHub repository because that's what this particular product sells.</p>
<p>Your "grant access" step will be different: upgrading a user to a Pro plan, provisioning API credits, unlocking course content, or activating a subscription. The schema fields and step logic change, but the durable execution pattern is the same.</p>
<pre><code class="language-typescript">// src/lib/db/schema.ts
import {
  boolean,
  integer,
  pgEnum,
  pgTable,
  text,
  timestamp,
  varchar,
} from "drizzle-orm/pg-core";

export const purchaseTierEnum = pgEnum("purchase_tier", ["pro"]);
export const purchaseStatusEnum = pgEnum("purchase_status", [
  "completed",
  "partially_refunded",
  "refunded",
]);

export const users = pgTable("users", {
  id: text("id").primaryKey(),
  email: varchar("email", { length: 255 }).notNull().unique(),
  emailVerified: boolean("email_verified").notNull().default(false),
  name: text("name"),
  image: text("image"),
  githubUsername: text("github_username"),
  createdAt: timestamp("created_at").notNull().defaultNow(),
  updatedAt: timestamp("updated_at").notNull().defaultNow(),
});

export const purchases = pgTable("purchases", {
  id: text("id")
    .primaryKey()
    .$defaultFn(() =&gt; crypto.randomUUID()),
  userId: text("user_id")
    .notNull()
    .references(() =&gt; users.id, { onDelete: "cascade" }),
  stripeCheckoutSessionId: text("stripe_checkout_session_id")
    .notNull()
    .unique(),
  stripeCustomerId: text("stripe_customer_id"),
  stripePaymentIntentId: text("stripe_payment_intent_id"),
  tier: purchaseTierEnum("tier").notNull(),
  status: purchaseStatusEnum("status").notNull().default("completed"),
  githubAccessGranted: boolean("github_access_granted")
    .notNull()
    .default(false),
  githubInvitationId: text("github_invitation_id"),
  amount: integer("amount").notNull(),
  currency: text("currency").notNull().default("usd"),
  purchasedAt: timestamp("purchased_at").notNull().defaultNow(),
  createdAt: timestamp("created_at").notNull().defaultNow(),
  updatedAt: timestamp("updated_at").notNull().defaultNow(),
});

export type Purchase = typeof purchases.$inferSelect;
export type NewPurchase = typeof purchases.$inferInsert;
</code></pre>
<p>Let me walk through the design decisions behind this schema.</p>
<h3 id="heading-why-three-stripe-id-columns">Why Three Stripe ID Columns?</h3>
<p>The <code>purchases</code> table stores three separate Stripe identifiers: <code>stripeCheckoutSessionId</code>, <code>stripeCustomerId</code>, and <code>stripePaymentIntentId</code>.</p>
<p>Each one serves a different purpose.</p>
<p>The <strong>checkout session ID</strong> is what you receive first. When a customer starts checkout, Stripe creates a session and gives you this ID. You use it to claim the purchase after the customer returns from Stripe's hosted checkout page.</p>
<p>The <code>unique()</code> constraint on this column is your idempotency guard. If someone tries to claim the same session twice, the database rejects the second insert.</p>
<p>The <strong>customer ID</strong> is Stripe's internal identifier for the buyer. You need this to look up the customer's payment history in Stripe's dashboard and to create future checkout sessions pre-filled with their billing info.</p>
<p>The <strong>payment intent ID</strong> is what Stripe sends in refund webhook events. When a <code>charge.refunded</code> event fires, it includes the payment intent ID but not the checkout session ID. Without storing this field, you would have no way to match a refund back to a purchase in your database.</p>
<h3 id="heading-why-track-access-state-in-your-database">Why Track Access State in Your Database</h3>
<p>The <code>githubAccessGranted</code> and <code>githubInvitationId</code> fields might look unnecessary. You could check GitHub's API to see if a user has access. But querying an external API every time you need to check a user's access state is slow, rate-limited, and unreliable.</p>
<p>By tracking access state in your own database, you can answer "does this user have access?" with a single indexed query. You also know whether access was ever granted, which is critical for refund processing. If <code>githubAccessGranted</code> is <code>false</code>, you don't need to revoke anything on refund.</p>
<h3 id="heading-why-a-status-enum-with-three-values">Why a Status Enum with Three Values?</h3>
<p>The <code>purchaseStatusEnum</code> has three values: <code>completed</code>, <code>partially_refunded</code>, and <code>refunded</code>.</p>
<p>This matters for downstream logic. Your dashboard, analytics, support tools, and email sequences all need to know the exact state of a purchase. A partially refunded customer still has access, but a fully refunded customer doesn't.</p>
<p>If you only tracked "refunded" as a boolean, you would lose the distinction between partial and full refunds. That distinction affects whether you revoke product access.</p>
<h3 id="heading-how-to-generate-and-run-migrations">How to Generate and Run Migrations</h3>
<p>After defining your schema, generate a migration file and apply it to your database:</p>
<pre><code class="language-bash"># Generate migration SQL from schema changes
bun run drizzle-kit generate

# Push schema directly (development only)
bun run drizzle-kit push

# Run migrations (production)
bun run drizzle-kit migrate
</code></pre>
<p>Drizzle Kit compares your TypeScript schema to the database and generates the SQL needed to bring them in sync. Review the generated migration file before running it in production. Schema changes are one of the few things you can't easily undo.</p>
<p>For development, <code>drizzle-kit push</code> is faster because it applies changes directly without creating migration files. For production, always use <code>drizzle-kit generate</code> followed by <code>drizzle-kit migrate</code> so you have a versioned record of every schema change.</p>
<h2 id="heading-how-to-create-stripe-products-and-prices">How to Create Stripe Products and Prices</h2>
<p>You can create products and prices through the Stripe dashboard, but managing them programmatically is better for reproducibility. Here's a seed script that creates everything you need:</p>
<pre><code class="language-typescript">// src/lib/payments/seed.ts
import { stripe } from "./index";

const PRODUCTS = [
  {
    name: "My SaaS Product",
    description: "Full access, one-time purchase",
    features: [
      "Full source code access",
      "Production-ready infrastructure",
      "Lifetime updates",
    ],
    metadata: { tier: "pro" },
    prices: [
      {
        lookupKey: "pro_one_time",
        unitAmount: 19900, // $199.00 in cents
        currency: "usd",
        nickname: "Pro One-Time",
      },
    ],
  },
];

async function main() {
  console.log("Seeding Stripe products and prices...\n");

  for (const config of PRODUCTS) {
    // Create or find product
    const products = await stripe.products.list({ active: true, limit: 100 });
    let product = products.data.find((p) =&gt; p.name === config.name);

    if (!product) {
      product = await stripe.products.create({
        name: config.name,
        description: config.description,
        marketing_features: config.features.map((f) =&gt; ({ name: f })),
        metadata: config.metadata,
      });
      console.log(`Created product "\({config.name}" (\){product.id})`);
    }

    // Create prices
    for (const priceConfig of config.prices) {
      const existing = await stripe.prices.list({
        lookup_keys: [priceConfig.lookupKey],
        active: true,
        limit: 1,
      });

      if (existing.data[0]) {
        console.log(`Price "${priceConfig.lookupKey}" already exists`);
        continue;
      }

      const price = await stripe.prices.create({
        product: product.id,
        unit_amount: priceConfig.unitAmount,
        currency: priceConfig.currency,
        nickname: priceConfig.nickname,
        lookup_key: priceConfig.lookupKey,
        transfer_lookup_key: true,
      });

      console.log(`Created price "\({priceConfig.lookupKey}" (\){price.id})`);
    }
  }

  console.log("\nDone! Add the price ID to your .env as STRIPE_PRO_PRICE_ID");
}

main().catch(console.error);
</code></pre>
<p>Run this with <code>bun run src/lib/payments/seed.ts</code>.</p>
<p>A few things worth noting.</p>
<ul>
<li><p><strong>Use</strong> <code>lookup_key</code> <strong>instead of hardcoding price IDs:</strong> Price IDs are different between test and live mode. Lookup keys let you reference prices by name (<code>pro_one_time</code>) rather than by Stripe's generated ID (<code>price_1P...</code>).  </p>
<p>The <code>transfer_lookup_key: true</code> option ensures that if you create a new price with the same lookup key, it replaces the old one automatically.</p>
</li>
<li><p><strong>Prices are in cents:</strong> Stripe's API expects amounts in the smallest currency unit. For USD, that means <code>19900</code> represents $199.00.  </p>
<p>This is a common source of bugs. Always store amounts in cents in your database and convert to dollars only at the display layer.</p>
</li>
<li><p><strong>The seed script is idempotent:</strong> You can run it multiple times safely. It checks for existing products and prices before creating new ones.</p>
</li>
</ul>
<h3 id="heading-how-to-set-up-the-stripe-client">How to Set Up the Stripe Client</h3>
<p>The Stripe client uses lazy initialization so that importing it doesn't throw if the API key is missing at module load time. This matters in build environments where environment variables aren't set.</p>
<pre><code class="language-typescript">// src/lib/payments/index.ts
import Stripe from "stripe";

let stripeClient: Stripe | null = null;

function getStripe(): Stripe {
  if (!stripeClient) {
    const secretKey = process.env.STRIPE_SECRET_KEY;
    if (!secretKey) {
      throw new Error("STRIPE_SECRET_KEY is not set");
    }
    stripeClient = new Stripe(secretKey);
  }
  return stripeClient;
}

export const stripe = new Proxy({} as Stripe, {
  get(_, prop) {
    return Reflect.get(getStripe(), prop);
  },
});
</code></pre>
<p>The <code>Proxy</code> wrapper is the key pattern here. Code across your application imports <code>stripe</code> and calls methods like <code>stripe.checkout.sessions.create(...)</code>. The proxy intercepts every property access and forwards it to the lazily initialized client.</p>
<p>This means the Stripe SDK only initializes when you actually use it, not when the module is imported.</p>
<h2 id="heading-how-to-build-the-checkout-flow">How to Build the Checkout Flow</h2>
<p>The checkout flow has three parts: creating the session, redirecting the customer, and handling the return.</p>
<h3 id="heading-how-to-create-a-checkout-session">How to Create a Checkout Session</h3>
<p>Here's the function that creates a Stripe Checkout session for a one-time payment:</p>
<pre><code class="language-typescript">// src/lib/payments/index.ts
export async function createOneTimeCheckoutSession(params: {
  priceId: string;
  successUrl: string;
  cancelUrl: string;
  metadata: Record&lt;string, string&gt;;
  customerEmail?: string;
  couponId?: string;
}) {
  const client = getStripe();

  const session = await client.checkout.sessions.create({
    mode: "payment",
    line_items: [{ price: params.priceId, quantity: 1 }],
    success_url: params.successUrl,
    cancel_url: params.cancelUrl,
    metadata: params.metadata,
    ...(params.customerEmail &amp;&amp; {
      customer_email: params.customerEmail,
    }),
    ...(params.couponId
      ? { discounts: [{ coupon: params.couponId }] }
      : { allow_promotion_codes: true }),
  });

  return session;
}
</code></pre>
<p>Three details matter here.</p>
<ul>
<li><p><strong>The</strong> <code>mode: "payment"</code> <strong>setting tells Stripe this is a one-time charge</strong>, not a subscription. For subscriptions, you would use <code>mode: "subscription"</code>. The mode affects which webhook events Stripe sends after payment.</p>
</li>
<li><p><strong>The</strong> <code>metadata</code> <strong>field is how you link the Stripe session back to your application.</strong> Pass your internal product tier, user ID, or any other data you need after payment. Stripe stores this metadata and includes it in webhook events and API responses.</p>
</li>
<li><p><strong>The</strong> <code>allow_promotion_codes: true</code> <strong>option shows a promo code field on the checkout page.</strong> If you have a specific coupon to apply (from a landing page URL parameter, for example), pass it via <code>discounts</code> instead. You can't use both at the same time.</p>
</li>
</ul>
<h3 id="heading-how-to-create-the-checkout-api-endpoint">How to Create the Checkout API Endpoint</h3>
<p>Here's the API endpoint that creates a checkout session and returns the URL:</p>
<pre><code class="language-typescript">// src/server/api.ts
app.post("/api/payments/checkout", async ({ set }) =&gt; {
  const priceId = process.env.STRIPE_PRO_PRICE_ID;

  if (!priceId) {
    set.status = 500;
    return { error: "Price not configured" };
  }

  const baseUrl = process.env.BETTER_AUTH_URL ?? "http://localhost:3000";
  const tier = "pro";

  const checkoutSession = await createOneTimeCheckoutSession({
    priceId,
    successUrl: `${baseUrl}/dashboard?purchase=success&amp;session_id={CHECKOUT_SESSION_ID}`,
    cancelUrl: `${baseUrl}/pricing`,
    metadata: { tier },
  });

  return { url: checkoutSession.url };
});
</code></pre>
<p>The <code>{CHECKOUT_SESSION_ID}</code> placeholder in the success URL is a Stripe template variable. Stripe replaces it with the actual session ID when redirecting the customer. This lets your frontend know which session just completed.</p>
<h3 id="heading-how-to-claim-the-purchase-after-checkout">How to Claim the Purchase After Checkout</h3>
<p>When the customer returns to your success URL, your frontend reads the <code>session_id</code> from the URL and sends it to a "claim" endpoint. This endpoint verifies the payment and creates the purchase record.</p>
<pre><code class="language-typescript">// src/server/api.ts
app.post(
  "/api/purchases/claim",
  async ({ body, request, set }) =&gt; {
    const session = await auth.api.getSession({
      headers: request.headers,
    });

    if (!session) {
      set.status = 401;
      return { error: "Unauthorized" };
    }

    const { sessionId } = body;

    // Check if this session was already claimed
    const existing = await db
      .select()
      .from(purchases)
      .where(eq(purchases.stripeCheckoutSessionId, sessionId))
      .limit(1);

    if (existing[0]) {
      return { success: true, alreadyClaimed: true, tier: existing[0].tier };
    }

    // Retrieve the Stripe checkout session to verify payment
    const stripeSession = await retrieveCheckoutSession(sessionId);

    if (stripeSession.payment_status !== "paid") {
      set.status = 400;
      return { error: "Payment not completed" };
    }

    const tier = (stripeSession.metadata?.tier ?? "pro") as PaymentTier;

    // Create purchase record
    await db.insert(purchases).values({
      userId: session.user.id,
      stripeCheckoutSessionId: sessionId,
      stripeCustomerId:
        typeof stripeSession.customer === "string"
          ? stripeSession.customer
          : stripeSession.customer?.id ?? null,
      stripePaymentIntentId:
        typeof stripeSession.payment_intent === "string"
          ? stripeSession.payment_intent
          : stripeSession.payment_intent?.id ?? null,
      tier,
      status: "completed",
      amount: stripeSession.amount_total ?? 0,
      currency: stripeSession.currency ?? "usd",
    });

    // Trigger background processing
    await inngest.send({
      name: "purchase/completed",
      data: {
        userId: session.user.id,
        tier,
        sessionId,
      },
    });

    return { success: true, tier };
  },
  {
    body: t.Object({
      sessionId: t.String(),
    }),
  }
);
</code></pre>
<p>This endpoint does four things, in order.</p>
<ol>
<li><p><strong>First, it checks if the session was already claimed.</strong> The <code>unique()</code> constraint on <code>stripeCheckoutSessionId</code> in the schema prevents duplicate records, but checking first lets you return a clean response without catching a database error.</p>
</li>
<li><p><strong>Second, it verifies payment with Stripe.</strong> Never trust data from the client. The frontend passes the session ID, but you must call Stripe's API to confirm that <code>payment_status</code> is <code>"paid"</code>.</p>
</li>
<li><p><strong>Third, it creates the purchase record.</strong> Notice how it extracts the <code>customer</code> and <code>payment_intent</code> from the Stripe session. Both fields are returned as either strings or expanded objects depending on your Stripe API settings, so the ternary handles both cases.</p>
</li>
<li><p><strong>Fourth, it sends a</strong> <code>purchase/completed</code> <strong>event to Inngest.</strong> This triggers the background processing flow that handles emails, access grants, analytics, and follow-up scheduling. The API endpoint doesn't do any of that work and returns <code>{ success: true }</code> immediately.</p>
</li>
</ol>
<p>This separation between recording the purchase and processing it is fundamental. The database insert is fast and reliable. The downstream processing (emails, API calls, analytics) is slow and unreliable.</p>
<p>By splitting them, you ensure the customer sees a success response instantly while the background work happens durably.</p>
<h2 id="heading-how-to-handle-webhooks-securely">How to Handle Webhooks Securely</h2>
<p>Your webhook endpoint is the entry point for Stripe events that happen outside your checkout flow: refunds, expired sessions, and disputes.</p>
<h3 id="heading-how-to-verify-webhook-signatures">How to Verify Webhook Signatures</h3>
<p>Every webhook from Stripe includes a signature header. You must verify this signature before processing the event. Without verification, anyone could send fake events to your webhook URL.</p>
<pre><code class="language-typescript">// src/lib/payments/index.ts
export async function constructWebhookEvent(
  payload: string | Buffer,
  signature: string
) {
  const webhookSecret = process.env.STRIPE_WEBHOOK_SECRET;
  if (!webhookSecret) {
    throw new Error("STRIPE_WEBHOOK_SECRET is not set");
  }
  const client = getStripe();
  return client.webhooks.constructEventAsync(payload, signature, webhookSecret);
}
</code></pre>
<p>One critical detail: <strong>use</strong> <code>constructEventAsync</code> <strong>instead of</strong> <code>constructEvent</code><strong>.</strong> The async version uses the Web Crypto API, which is compatible with modern runtimes like Bun and Cloudflare Workers. The synchronous version depends on Node.js's <code>crypto</code> module, which isn't available everywhere.</p>
<p>Another critical detail: <strong>pass the raw request body to signature verification.</strong> If your framework parses the body as JSON before you access it, the signature check fails. The signature is computed over the raw bytes of the request, not the parsed JSON.</p>
<h3 id="heading-how-to-build-the-webhook-endpoint">How to Build the Webhook Endpoint</h3>
<p>Here is the production webhook handler. Its only job is to validate the event and route it to the background job system.</p>
<pre><code class="language-typescript">// src/server/api.ts
app.post("/api/payments/webhook", async ({ request, set }) =&gt; {
  const body = await request.text();
  const sig = request.headers.get("stripe-signature");

  if (!sig) {
    set.status = 400;
    return { error: "Missing signature" };
  }

  try {
    const event = await constructWebhookEvent(body, sig);
    console.log(`[Webhook] Received ${event.type}`);

    if (event.type === "charge.refunded") {
      const charge = event.data.object as {
        id: string;
        payment_intent: string;
        amount: number;
        amount_refunded: number;
        currency: string;
      };
      await inngest.send({
        name: "stripe/charge.refunded",
        data: {
          chargeId: charge.id,
          paymentIntentId: charge.payment_intent,
          amountRefunded: charge.amount_refunded,
          originalAmount: charge.amount,
          currency: charge.currency,
        },
      });
    }

    if (event.type === "checkout.session.expired") {
      const session = event.data.object as {
        id: string;
        customer_email: string | null;
      };
      await inngest.send({
        name: "stripe/checkout.session.expired",
        data: {
          sessionId: session.id,
          customerEmail: session.customer_email,
        },
      });
    }

    return { received: true };
  } catch (error) {
    console.error("[Webhook] Stripe verification failed:", error);
    set.status = 400;
    return { error: "Webhook verification failed" };
  }
});
</code></pre>
<p>This is the "thin webhook handler" pattern. Notice what it does <strong>not</strong> do: it does not query the database, send emails, grant access, or call any external service. It validates the signature, extracts the fields it needs, and sends a typed event to Inngest.</p>
<p>The entire handler completes in milliseconds.</p>
<p>Why does this matter? Stripe expects your webhook to return a 2xx response within about 20 seconds. If your handler tries to do too much work (database queries, email sends, API calls), it risks timing out.</p>
<p>Stripe marks it as failed and retries the entire event. Now you have partial completion and duplicate processing.</p>
<p>The thin handler avoids this entirely. Validate, enqueue, return. All the real work happens asynchronously in durable background functions.</p>
<h3 id="heading-why-extract-fields-before-enqueueing">Why Extract Fields Before Enqueueing?</h3>
<p>You might notice that the webhook handler extracts specific fields from the Stripe event before sending them to Inngest:</p>
<pre><code class="language-typescript">await inngest.send({
  name: "stripe/charge.refunded",
  data: {
    chargeId: charge.id,
    paymentIntentId: charge.payment_intent,
    amountRefunded: charge.amount_refunded,
    originalAmount: charge.amount,
    currency: charge.currency,
  },
});
</code></pre>
<p>Why not forward the entire Stripe event? Two reasons.</p>
<p>First, Stripe event objects are large and deeply nested. Your background function only needs five fields. Sending the entire object means your durable function stores a large payload at every checkpoint, and over thousands of runs, this adds up.</p>
<p>Second, extracting fields at the boundary creates a clean contract between your webhook handler and your background functions. If Stripe changes the shape of their event objects in a future API version, you only need to update the extraction logic in the webhook handler. Your background functions keep working because they depend on your own typed data shape, not Stripe's.</p>
<h3 id="heading-how-to-set-up-webhooks-in-production">How to Set Up Webhooks in Production</h3>
<p>For production, you configure webhooks in the Stripe Dashboard:</p>
<ol>
<li><p>Go to Stripe Dashboard, then Developers, then Webhooks.</p>
</li>
<li><p>Add an endpoint pointing to your production URL: <code>https://yourapp.com/api/payments/webhook</code>.</p>
</li>
<li><p>Select the events you want to receive: <code>charge.refunded</code> and <code>checkout.session.expired</code>.</p>
</li>
<li><p>Copy the signing secret and add it to your production environment variables as <code>STRIPE_WEBHOOK_SECRET</code>.</p>
</li>
</ol>
<p>The production signing secret is different from the one the Stripe CLI generates for local testing. Make sure your environment variables are set correctly for each environment.</p>
<h3 id="heading-which-webhook-events-to-listen-for">Which Webhook Events to Listen For</h3>
<p>For a complete payment flow, you need these webhook events configured in Stripe:</p>
<table>
<thead>
<tr>
<th>Event</th>
<th>When It Fires</th>
<th>What You Do</th>
</tr>
</thead>
<tbody><tr>
<td><code>charge.refunded</code></td>
<td>Customer receives a refund</td>
<td>Revoke access (full refund) or update status (partial)</td>
</tr>
<tr>
<td><code>checkout.session.expired</code></td>
<td>Checkout session times out (24 hours)</td>
<td>Send abandoned cart recovery email</td>
</tr>
</tbody></table>
<p>For subscription-based billing, you would also listen for <code>customer.subscription.updated</code>, <code>customer.subscription.deleted</code>, and <code>invoice.payment_failed</code>. This article covers one-time payments, so the examples focus on the two events above.</p>
<p>The <code>checkout.session.completed</code> event is notably absent. For one-time payments, you typically process the purchase in the "claim" endpoint (shown in the previous section) rather than in a webhook, because you need the authenticated user's session to link the purchase to their account.</p>
<h2 id="heading-how-to-process-purchases-with-durable-background-jobs">How to Process Purchases with Durable Background Jobs</h2>
<p>This is the heart of the payment flow. After the purchase record is created and the <code>purchase/completed</code> event is sent, a durable function takes over and runs the entire post-payment workflow.</p>
<p>Each step in this function is individually checkpointed. If step 5 fails, steps 1 through 4 don't re-run. Step 5 retries on its own, and once it succeeds, steps 6 through 9 continue.</p>
<p>This is what "durable execution" means. It's the difference between a payment system that works in development and one that works in production.</p>
<p>I use <a href="https://www.inngest.com/">Inngest</a> for this. It is an event-driven durable execution platform that provides step-level checkpointing out of the box. You define functions with <code>step.run()</code> blocks, and Inngest handles retry logic, state persistence, and observability.</p>
<p>The Inngest client setup is minimal:</p>
<pre><code class="language-typescript">// src/lib/jobs/client.ts
import { Inngest } from "inngest";

export const inngest = new Inngest({
  id: "my-app",
});
</code></pre>
<p>Register your functions with the Inngest serve handler so the dev server (and production) can discover them:</p>
<pre><code class="language-typescript">import { serve } from "inngest/bun";
import { inngest } from "@/lib/jobs/client";
import { stripeFunctions } from "@/lib/jobs/functions/stripe";

const inngestHandler = serve({
  client: inngest,
  functions: [...stripeFunctions],
});

// Mount on your API
app.all("/api/inngest", async (ctx) =&gt; {
  return inngestHandler(ctx.request);
});
</code></pre>
<p>Here's the complete purchase function:</p>
<pre><code class="language-typescript">// src/lib/jobs/functions/stripe.ts
import { eq } from "drizzle-orm";
import { createElement } from "react";

import { inngest } from "../client";
import { trackServerEvent } from "@/lib/analytics/server";
import { brand } from "@/lib/brand";
import { db, purchases, users } from "@/lib/db";
import {
  sendEmail,
  PurchaseConfirmationEmail,
  AdminPurchaseNotificationEmail,
  RepoAccessGrantedEmail,
} from "@/lib/email";
import { addCollaborator } from "@/lib/github";

export const handlePurchaseCompleted = inngest.createFunction(
  { id: "purchase-completed", triggers: [{ event: "purchase/completed" }] },
  async ({ event, step }) =&gt; {
    const { userId, tier, sessionId } = event.data as {
      userId: string;
      tier: string;
      sessionId: string;
    };

    // Step 1: Look up user and purchase details
    const { user, purchase } = await step.run(
      "lookup-user-and-purchase",
      async () =&gt; {
        const userResult = await db
          .select({
            id: users.id,
            email: users.email,
            name: users.name,
            githubUsername: users.githubUsername,
          })
          .from(users)
          .where(eq(users.id, userId))
          .limit(1);

        const foundUser = userResult[0];
        if (!foundUser) {
          throw new Error(`User not found: ${userId}`);
        }

        const purchaseResult = await db
          .select({
            amount: purchases.amount,
            currency: purchases.currency,
            stripePaymentIntentId: purchases.stripePaymentIntentId,
          })
          .from(purchases)
          .where(eq(purchases.stripeCheckoutSessionId, sessionId))
          .limit(1);

        const foundPurchase = purchaseResult[0];

        return {
          user: foundUser,
          purchase: foundPurchase ?? {
            amount: 0,
            currency: "usd",
            stripePaymentIntentId: null,
          },
        };
      }
    );

    // Step 2: Track purchase in analytics
    await step.run("track-purchase-to-posthog", async () =&gt; {
      try {
        await trackServerEvent(userId, "purchase_completed_server", {
          tier,
          amount_cents: purchase.amount,
          currency: purchase.currency,
          stripe_session_id: sessionId,
          stripe_payment_intent_id: purchase.stripePaymentIntentId,
        });
      } catch (error) {
        console.error(`Failed to track to PostHog:`, error);
      }
    });

    // Step 3: Send purchase confirmation to customer
    await step.run("send-purchase-confirmation", async () =&gt; {
      await sendEmail({
        to: user.email,
        subject: `Your ${brand.name} purchase is confirmed!`,
        template: createElement(PurchaseConfirmationEmail, {
          amount: purchase.amount,
          currency: purchase.currency,
          customerEmail: user.email,
        }),
      });
    });

    // Step 4: Send admin notification
    await step.run("send-admin-notification", async () =&gt; {
      const adminEmail = process.env.ADMIN_EMAIL;
      if (!adminEmail) return;

      await sendEmail({
        to: adminEmail,
        subject: `New template sale: ${user.email}`,
        template: createElement(AdminPurchaseNotificationEmail, {
          amount: purchase.amount,
          currency: purchase.currency,
          customerEmail: user.email,
          customerName: user.name,
          stripeSessionId: purchase.stripePaymentIntentId ?? sessionId,
        }),
      });
    });

    // Early return if user has no GitHub username
    if (!user.githubUsername) {
      return { success: true, userId, tier, githubAccessGranted: false };
    }

    // Step 5: Grant GitHub repository access
    const collaboratorResult = await step.run(
      "add-github-collaborator",
      async () =&gt; {
        return addCollaborator(user.githubUsername!);
      }
    );

    // Step 6: Track GitHub access granted
    await step.run("track-github-access", async () =&gt; {
      await trackServerEvent(userId, "github_access_granted", {
        tier,
        github_username: user.githubUsername,
        invitation_status: collaboratorResult.status,
      });
    });

    // Step 7: Update purchase record
    await step.run("update-purchase-record", async () =&gt; {
      await db
        .update(purchases)
        .set({
          githubAccessGranted: true,
          githubInvitationId: collaboratorResult.status,
          updatedAt: new Date(),
        })
        .where(eq(purchases.stripeCheckoutSessionId, sessionId));
    });

    // Step 8: Send repo access email
    await step.run("send-repo-access-email", async () =&gt; {
      const repoUrl = brand.social.github;
      await sendEmail({
        to: user.email,
        subject: `Your ${brand.name} repository access is ready!`,
        template: createElement(RepoAccessGrantedEmail, { repoUrl }),
      });
    });

    // Step 9: Schedule follow-up email sequence
    await step.run("schedule-follow-up", async () =&gt; {
      const purchaseRecord = await db
        .select({ id: purchases.id })
        .from(purchases)
        .where(eq(purchases.stripeCheckoutSessionId, sessionId))
        .limit(1);

      if (purchaseRecord[0]) {
        await inngest.send({
          name: "purchase/follow-up.scheduled",
          data: {
            userId,
            purchaseId: purchaseRecord[0].id,
            tier,
          },
        });
      }
    });

    return { success: true, userId, tier, githubAccessGranted: true };
  }
);
</code></pre>
<p>That's a lot of code. Let me break down why each step exists and why it must be separate.</p>
<h3 id="heading-step-1-look-up-user-and-purchase">Step 1: Look Up User and Purchase</h3>
<pre><code class="language-typescript">const { user, purchase } = await step.run(
  "lookup-user-and-purchase",
  async () =&gt; {
    // Database queries for user and purchase records
    return { user: foundUser, purchase: foundPurchase };
  }
);
</code></pre>
<p>This step queries the database for the user and purchase details. Every subsequent step depends on these values (the user's email, the purchase amount, the user's GitHub username).</p>
<p>Because this is wrapped in <code>step.run()</code>, the return value is cached by Inngest. If a later step fails and the function retries, this step doesn't re-run. The cached values are replayed instead.</p>
<p>If the user doesn't exist in the database, this step throws an error that halts the entire function. There's no point continuing if the user can't be found.</p>
<h3 id="heading-step-2-track-analytics">Step 2: Track Analytics</h3>
<pre><code class="language-typescript">await step.run("track-purchase-to-posthog", async () =&gt; {
  try {
    await trackServerEvent(userId, "purchase_completed_server", {
      tier,
      amount_cents: purchase.amount,
      currency: purchase.currency,
    });
  } catch (error) {
    console.error(`Failed to track to PostHog:`, error);
  }
});
</code></pre>
<p>Analytics tracking gets its own step because analytics services have their own failure modes. PostHog could be rate-limited or temporarily unreachable. If that happens, you don't want it to block the confirmation email.</p>
<p>Notice the try-catch. A tracking failure logs the error but doesn't halt the function. Analytics data is valuable but not critical to the purchase flow.</p>
<h3 id="heading-steps-3-and-4-email-notifications">Steps 3 and 4: Email Notifications</h3>
<p>The customer confirmation and admin notification are separate steps because they are independent operations. If Resend returns a 500 when sending the admin email, the customer should still get their confirmation.</p>
<pre><code class="language-typescript">// Step 3: Customer confirmation
await step.run("send-purchase-confirmation", async () =&gt; {
  await sendEmail({
    to: user.email,
    subject: `Your ${brand.name} purchase is confirmed!`,
    template: createElement(PurchaseConfirmationEmail, {
      amount: purchase.amount,
      currency: purchase.currency,
      customerEmail: user.email,
    }),
  });
});

// Step 4: Admin notification
await step.run("send-admin-notification", async () =&gt; {
  const adminEmail = process.env.ADMIN_EMAIL;
  if (!adminEmail) return;

  await sendEmail({
    to: adminEmail,
    subject: `New template sale: ${user.email}`,
    template: createElement(AdminPurchaseNotificationEmail, {
      // ... admin-specific fields
    }),
  });
});
</code></pre>
<p>The admin notification step includes a guard: if <code>ADMIN_EMAIL</code> isn't set, it returns early. This makes the function work in development environments where you haven't configured all environment variables.</p>
<h3 id="heading-step-5-grant-product-access">Step 5: Grant Product Access</h3>
<pre><code class="language-typescript">if (!user.githubUsername) {
  return { success: true, userId, tier, githubAccessGranted: false };
}

const collaboratorResult = await step.run(
  "add-github-collaborator",
  async () =&gt; {
    return addCollaborator(user.githubUsername!);
  }
);
</code></pre>
<p>This is the step most likely to fail. GitHub's API has rate limits, can time out, and the user's GitHub username might be invalid.</p>
<p>By making it its own step, a GitHub API failure doesn't re-trigger the confirmation email (step 3) or the admin notification (step 4). Those are already checkpointed.</p>
<p>Notice the early return before step 5. If the user has no GitHub username linked, the function returns after step 4. The remaining steps only run when there's a GitHub account to grant access to.</p>
<h3 id="heading-steps-6-7-track-and-update">Steps 6-7: Track and Update</h3>
<p>After granting GitHub access, the function tracks the event in analytics (step 6) and updates the purchase record in the database (step 7).</p>
<p>The database update is intentionally ordered after the GitHub API call. You only set <code>githubAccessGranted: true</code> after the invitation actually succeeded. If you updated the record first and the GitHub step failed, your database would say access was granted when it was not.</p>
<h3 id="heading-step-8-send-access-email">Step 8: Send Access Email</h3>
<pre><code class="language-typescript">await step.run("send-repo-access-email", async () =&gt; {
  const repoUrl = brand.social.github;
  await sendEmail({
    to: user.email,
    subject: `Your ${brand.name} repository access is ready!`,
    template: createElement(RepoAccessGrantedEmail, { repoUrl }),
  });
});
</code></pre>
<p>This email only sends after the GitHub invitation is confirmed. The ordering is deliberate. You don't tell the customer "your access is ready" if the invitation hasn't been sent.</p>
<h3 id="heading-step-9-schedule-follow-up-sequence">Step 9: Schedule Follow-Up Sequence</h3>
<pre><code class="language-typescript">await step.run("schedule-follow-up", async () =&gt; {
  const purchaseRecord = await db
    .select({ id: purchases.id })
    .from(purchases)
    .where(eq(purchases.stripeCheckoutSessionId, sessionId))
    .limit(1);

  if (purchaseRecord[0]) {
    await inngest.send({
      name: "purchase/follow-up.scheduled",
      data: {
        userId,
        purchaseId: purchaseRecord[0].id,
        tier,
      },
    });
  }
});
</code></pre>
<p>The final step triggers a separate function that handles the follow-up email sequence: day 7 onboarding tips, day 14 feedback request, day 30 testimonial request. This is an event-driven chain: one function completes and triggers another.</p>
<p>The follow-up function uses <code>step.sleep()</code> to wait between emails without consuming compute resources:</p>
<pre><code class="language-typescript">export const handlePurchaseFollowUp = inngest.createFunction(
  {
    id: "purchase-follow-up",
    triggers: [{ event: "purchase/follow-up.scheduled" }],
    cancelOn: [
      {
        event: "purchase/follow-up.cancelled",
        match: "data.purchaseId",
      },
    ],
  },
  async ({ event, step }) =&gt; {
    await step.sleep("wait-7-days", "7d");
    await step.run("send-day-7-email", async () =&gt; {
      // Send onboarding tips
    });

    await step.sleep("wait-14-days", "7d");
    await step.run("send-day-14-email", async () =&gt; {
      // Send feedback request
    });
  }
);
</code></pre>
<p>The <code>cancelOn</code> option is worth noting. If the purchase is refunded, you send a <code>purchase/follow-up.cancelled</code> event, and the entire follow-up sequence stops. No stale emails to customers who refunded.</p>
<h3 id="heading-the-rule-for-step-separation">The Rule for Step Separation</h3>
<p>Any operation that calls an external service or could fail independently should be its own step. A database query is a step because the database can be temporarily unreachable. An email send or API call is a step because those services can return errors or hit rate limits.</p>
<p>If two operations always succeed or fail together, they can share a step. But when in doubt, make it separate. The overhead is negligible, and the reliability gain is significant.</p>
<h2 id="heading-how-to-handle-refunds">How to Handle Refunds</h2>
<p>Refund processing is the most commonly overlooked part of a payment system. You need to handle two cases: full refunds (revoke access) and partial refunds (keep access, update status).</p>
<p>Here's the complete refund handler:</p>
<pre><code class="language-typescript">// src/lib/jobs/functions/stripe.ts
export const handleRefund = inngest.createFunction(
  { id: "refund-processed", triggers: [{ event: "stripe/charge.refunded" }] },
  async ({ event, step }) =&gt; {
    const data = event.data as {
      chargeId: string;
      paymentIntentId: string;
      amountRefunded: number;
      originalAmount: number;
      currency: string;
    };

    const chargeId = data.chargeId;
    const paymentIntentId = data.paymentIntentId;
    const currency = data.currency;
    const amountRefunded = data.amountRefunded;
    const originalAmount = data.originalAmount;
    const isFullRefund = amountRefunded &gt;= originalAmount;

    // Step 1: Look up the purchase and user
    const { user, purchase } = await step.run(
      "lookup-purchase-by-payment-intent",
      async () =&gt; {
        const purchaseResult = await db
          .select({
            id: purchases.id,
            userId: purchases.userId,
            stripePaymentIntentId: purchases.stripePaymentIntentId,
            githubAccessGranted: purchases.githubAccessGranted,
          })
          .from(purchases)
          .where(eq(purchases.stripePaymentIntentId, paymentIntentId))
          .limit(1);

        const foundPurchase = purchaseResult[0];
        if (!foundPurchase) {
          return { user: null, purchase: null };
        }

        const userResult = await db
          .select({
            id: users.id,
            email: users.email,
            name: users.name,
            githubUsername: users.githubUsername,
          })
          .from(users)
          .where(eq(users.id, foundPurchase.userId))
          .limit(1);

        return { user: userResult[0] ?? null, purchase: foundPurchase };
      }
    );

    if (!purchase || !user) {
      return { success: false, reason: "no_matching_purchase" };
    }

    let accessRevoked = false;

    // Step 2: Revoke GitHub access (only for full refunds)
    if (isFullRefund &amp;&amp; user.githubUsername &amp;&amp; purchase.githubAccessGranted) {
      const revokeResult = await step.run(
        "revoke-github-access",
        async () =&gt; {
          return removeCollaborator(user.githubUsername!);
        }
      );
      accessRevoked = revokeResult.success;
    }

    // Step 3: Update purchase status
    await step.run("update-purchase-status", async () =&gt; {
      if (isFullRefund) {
        await db
          .update(purchases)
          .set({
            status: "refunded",
            githubAccessGranted: false,
            updatedAt: new Date(),
          })
          .where(eq(purchases.id, purchase.id));
      } else {
        await db
          .update(purchases)
          .set({
            status: "partially_refunded",
            updatedAt: new Date(),
          })
          .where(eq(purchases.id, purchase.id));
      }
    });

    // Step 4: Track refund in analytics
    await step.run("track-refund-event", async () =&gt; {
      try {
        await trackServerEvent(user.id, "refund_processed", {
          charge_id: chargeId,
          payment_intent_id: paymentIntentId,
          amount_cents: amountRefunded,
          original_amount_cents: originalAmount,
          currency,
          is_full_refund: isFullRefund,
          github_access_revoked: accessRevoked,
        });
      } catch (error) {
        console.error(`Failed to track to PostHog:`, error);
      }
    });

    // Step 5: Notify customer
    await step.run("send-customer-notification", async () =&gt; {
      if (isFullRefund) {
        await sendEmail({
          to: user.email,
          subject: `Your ${brand.name} refund has been processed`,
          template: createElement(AccessRevokedEmail, {
            customerEmail: user.email,
            refundAmount: amountRefunded,
            currency,
          }),
        });
      } else {
        await sendEmail({
          to: user.email,
          subject: `Your ${brand.name} partial refund has been processed`,
          template: createElement(PartialRefundEmail, {
            customerEmail: user.email,
            refundAmount: amountRefunded,
            originalAmount,
            currency,
          }),
        });
      }
    });

    // Step 6: Notify admin
    await step.run("send-admin-notification", async () =&gt; {
      const adminEmail = process.env.ADMIN_EMAIL;
      if (!adminEmail) return;

      await sendEmail({
        to: adminEmail,
        subject: `\({isFullRefund ? "Full" : "Partial"} refund processed: \){user.email}`,
        template: createElement(AdminRefundNotificationEmail, {
          customerEmail: user.email,
          customerName: user.name,
          githubUsername: user.githubUsername,
          refundAmount: amountRefunded,
          originalAmount,
          currency,
          stripeChargeId: chargeId,
          accessRevoked,
          isPartialRefund: !isFullRefund,
        }),
      });
    });

    return { success: true, accessRevoked, isFullRefund, userId: user.id };
  }
);
</code></pre>
<h3 id="heading-how-full-refunds-differ-from-partial-refunds">How Full Refunds Differ from Partial Refunds</h3>
<p>The function distinguishes between the two with a simple comparison:</p>
<pre><code class="language-typescript">const isFullRefund = amountRefunded &gt;= originalAmount;
</code></pre>
<p>For a <strong>full refund</strong>, three things happen:</p>
<ol>
<li><p>GitHub access is revoked (the <code>removeCollaborator</code> call).</p>
</li>
<li><p>The purchase status is set to <code>"refunded"</code>.</p>
</li>
<li><p>The customer receives an <code>AccessRevokedEmail</code> explaining that their access has been removed.</p>
</li>
</ol>
<p>For a <strong>partial refund</strong>, the customer keeps access:</p>
<ol>
<li><p>GitHub access is <strong>not</strong> revoked.</p>
</li>
<li><p>The purchase status is set to <code>"partially_refunded"</code>.</p>
</li>
<li><p>The customer receives a <code>PartialRefundEmail</code> showing the refunded amount and the original amount.</p>
</li>
</ol>
<p>This distinction matters for your database integrity. Downstream systems (your dashboard, analytics, support tools) need accurate status values. A <code>partially_refunded</code> purchase still represents an active customer.</p>
<h3 id="heading-how-conditional-steps-work">How Conditional Steps Work</h3>
<p>The "revoke GitHub access" step only runs when three conditions are all true: it's a full refund, the user has a GitHub username, and access was previously granted.</p>
<pre><code class="language-typescript">if (isFullRefund &amp;&amp; user.githubUsername &amp;&amp; purchase.githubAccessGranted) {
  const revokeResult = await step.run("revoke-github-access", async () =&gt; {
    return removeCollaborator(user.githubUsername!);
  });
  accessRevoked = revokeResult.success;
}
</code></pre>
<p>If any of those conditions is false, the step is skipped entirely. Inngest handles this cleanly. The function continues to step 3 (update purchase status) with <code>accessRevoked</code> still set to <code>false</code>.</p>
<h2 id="heading-how-to-recover-abandoned-checkouts">How to Recover Abandoned Checkouts</h2>
<p>When a customer starts checkout but doesn't complete it, Stripe eventually expires the session (after 24 hours by default). You can listen for this event and send a recovery email.</p>
<p>The key insight is that you don't want to send the email immediately. Give the customer an hour to come back on their own.</p>
<pre><code class="language-typescript">// src/lib/jobs/functions/stripe.ts
export const handleCheckoutExpired = inngest.createFunction(
  {
    id: "checkout-expired",
    triggers: [{ event: "stripe/checkout.session.expired" }],
  },
  async ({ event, step }) =&gt; {
    const { customerEmail, sessionId } = event.data as {
      customerEmail: string | null;
      sessionId: string;
    };

    if (!customerEmail) {
      return { success: false, reason: "no_email" };
    }

    // Wait 1 hour before sending recovery email
    await step.sleep("wait-before-recovery-email", "1h");

    // Send abandoned cart email
    await step.run("send-abandoned-cart-email", async () =&gt; {
      const baseUrl =
        process.env.BETTER_AUTH_URL ?? "https://your-app.com";
      const checkoutUrl = `${baseUrl}/pricing`;

      await sendEmail({
        to: customerEmail,
        subject: `Your ${brand.name} checkout is waiting`,
        template: createElement(AbandonedCartEmail, {
          customerEmail,
          checkoutUrl,
        }),
      });
    });

    // Track the recovery attempt
    await step.run("track-abandoned-cart", async () =&gt; {
      try {
        await trackServerEvent("anonymous", "abandoned_cart_email_sent", {
          customer_email: customerEmail,
          session_id: sessionId,
        });
      } catch (error) {
        console.error(`Failed to track to PostHog:`, error);
      }
    });

    return { success: true, customerEmail };
  }
);
</code></pre>
<p>The <code>step.sleep("wait-before-recovery-email", "1h")</code> line pauses the function for one hour without consuming compute resources. Inngest schedules the function to resume after the delay. No cron jobs, no Redis queues, no <code>setTimeout</code> that gets lost when your server restarts.</p>
<p>There is a guard at the top of the function. If the checkout session has no customer email (the customer closed the page before entering their email), the function returns early. You can't send a recovery email without an address.</p>
<p>You could extend this pattern with a second sleep and follow-up email three days later. You could also check if the customer has since completed a purchase (by querying the database in a <code>step.run()</code>) and skip the email if they have.</p>
<h3 id="heading-why-one-hour-is-the-right-delay">Why One Hour Is the Right Delay</h3>
<p>Sending the recovery email immediately after checkout expiration feels aggressive. The customer might still be comparing options, waiting for payday, or just distracted. An immediate email says "we noticed you left," which feels surveillance-like.</p>
<p>Waiting 24 hours is too long. The customer has moved on. They have forgotten your product or found an alternative.</p>
<p>One hour is the sweet spot I found through testing. The customer's intent is still fresh, and the email feels helpful rather than pushy.</p>
<p>Your mileage may vary. The delay is configurable: change <code>"1h"</code> to <code>"30m"</code> or <code>"3h"</code> and redeploy.</p>
<h3 id="heading-why-this-is-better-than-a-cron-job">Why This Is Better Than a Cron Job</h3>
<p>Without durable execution, abandoned cart recovery typically works like this: a cron job runs every hour, queries the database for expired sessions that haven't been recovered yet, sends emails to each one, and marks them as recovered.</p>
<p>This approach has several problems. You need a <code>recovered_at</code> column to avoid sending duplicate emails. You need to handle the case where the cron job crashes halfway through the batch, and you need to tune the cron interval carefully.</p>
<p>The <code>step.sleep()</code> approach eliminates all of this. Each expired session gets its own function instance with its own timer. There's no batch processing, no database flag, and no duplicate risk.</p>
<h2 id="heading-how-to-send-transactional-emails-with-react-email">How to Send Transactional Emails with React Email</h2>
<p>Every email in the payment flow is a React component rendered to HTML and sent via Resend. This gives you type-safe templates with props, component reuse, and the ability to preview emails in your browser during development.</p>
<h3 id="heading-how-to-set-up-the-email-client">How to Set Up the Email Client</h3>
<p>The email client wraps Resend with a simple <code>sendEmail</code> function:</p>
<pre><code class="language-typescript">// src/lib/email/index.ts
import { render } from "@react-email/components";
import type { ReactElement } from "react";
import { Resend } from "resend";

import { brand } from "@/lib/brand";

let resendClient: Resend | null = null;

function getResend(): Resend {
  if (!resendClient) {
    const apiKey = process.env.RESEND_API_KEY;
    if (!apiKey) {
      throw new Error("RESEND_API_KEY is not set");
    }
    resendClient = new Resend(apiKey);
  }
  return resendClient;
}

interface SendEmailOptions {
  to: string | string[];
  subject: string;
  template: ReactElement;
  from?: string;
  replyTo?: string;
}

export async function sendEmail({
  to,
  subject,
  template,
  from = process.env.EMAIL_FROM ?? brand.emails.from,
  replyTo,
}: SendEmailOptions) {
  const resend = getResend();
  const html = await render(template);

  return resend.emails.send({
    from,
    to,
    subject,
    html,
    replyTo,
  });
}
</code></pre>
<p>The <code>render()</code> function from <code>@react-email/components</code> converts a React element into an HTML string. This HTML is what Resend delivers to the customer's inbox.</p>
<p>The <code>from</code> address defaults to your brand's email configuration. You need a verified domain in Resend for this to work. During development, Resend's free tier lets you send to your own email address without domain verification.</p>
<h3 id="heading-how-to-build-a-purchase-confirmation-template">How to Build a Purchase Confirmation Template</h3>
<p>Here's the real purchase confirmation email template:</p>
<pre><code class="language-tsx">// src/lib/email/emails/purchase-confirmation.tsx
import {
  Body,
  Container,
  Head,
  Heading,
  Hr,
  Html,
  Link,
  Preview,
  Section,
  Text,
} from "@react-email/components";

import { brand } from "@/lib/brand";

interface PurchaseConfirmationEmailProps {
  amount: number;
  currency: string;
  customerEmail: string;
}

const colors = {
  primary: "#d97757",
  background: "#faf9f5",
  foreground: "#30302e",
  muted: "#6b6860",
  border: "#e5e4df",
  card: "#ffffff",
  success: "#16a34a",
  successLight: "#f0fdf4",
};

export default function PurchaseConfirmationEmail({
  amount,
  currency,
  customerEmail,
}: PurchaseConfirmationEmailProps) {
  const formattedAmount = new Intl.NumberFormat("en-US", {
    style: "currency",
    currency: currency.toUpperCase(),
  }).format(amount / 100);

  return (
    &lt;Html&gt;
      &lt;Head /&gt;
      &lt;Preview&gt;Your {brand.name} purchase is confirmed!&lt;/Preview&gt;
      &lt;Body style={main}&gt;
        &lt;Container style={container}&gt;
          &lt;Section style={header}&gt;
            &lt;Text style={logoText}&gt;{brand.name}&lt;/Text&gt;
          &lt;/Section&gt;

          &lt;Hr style={divider} /&gt;

          &lt;Section style={successBadge}&gt;
            &lt;Text style={successText}&gt;Payment Successful&lt;/Text&gt;
          &lt;/Section&gt;

          &lt;Heading style={h1}&gt;Thank you for your purchase!&lt;/Heading&gt;

          &lt;Text style={text}&gt;
            Your payment has been processed successfully. We are now setting
            up your GitHub repository access. You will receive another email
            shortly with your access link.
          &lt;/Text&gt;

          &lt;Section style={detailsBox}&gt;
            &lt;Text style={detailsTitle}&gt;Order Details&lt;/Text&gt;

            &lt;Section style={detailRow}&gt;
              &lt;Text style={detailLabel}&gt;Product&lt;/Text&gt;
              &lt;Text style={detailValue}&gt;{brand.name}&lt;/Text&gt;
            &lt;/Section&gt;

            &lt;Section style={detailRow}&gt;
              &lt;Text style={detailLabel}&gt;Amount&lt;/Text&gt;
              &lt;Text style={detailValue}&gt;{formattedAmount}&lt;/Text&gt;
            &lt;/Section&gt;

            &lt;Section style={detailRow}&gt;
              &lt;Text style={detailLabel}&gt;Email&lt;/Text&gt;
              &lt;Text style={detailValue}&gt;{customerEmail}&lt;/Text&gt;
            &lt;/Section&gt;
          &lt;/Section&gt;

          &lt;Text style={text}&gt;
            This is a one-time purchase. No recurring charges will be made.
          &lt;/Text&gt;

          &lt;Hr style={divider} /&gt;

          &lt;Text style={footer}&gt;
            Questions about your purchase? Reply to this email or reach
            out at{" "}
            &lt;Link
              href={`mailto:${brand.emails.support}`}
              style={link}
            &gt;
              {brand.emails.support}
            &lt;/Link&gt;
          &lt;/Text&gt;
        &lt;/Container&gt;
      &lt;/Body&gt;
    &lt;/Html&gt;
  );
}

PurchaseConfirmationEmail.PreviewProps = {
  amount: 9900,
  currency: "usd",
  customerEmail: "customer@example.com",
} satisfies PurchaseConfirmationEmailProps;
</code></pre>
<p>A few things to note about this template.</p>
<ul>
<li><p><strong>Currency formatting happens in the template:</strong> The <code>amount</code> prop is in cents (the same format stored in your database and returned by Stripe). The <code>Intl.NumberFormat</code> call converts it to a human-readable string like "$99.00" and keeps currency formatting logic in one place.</p>
</li>
<li><p><strong>The</strong> <code>PreviewProps</code> <strong>object is for development.</strong> React Email uses these props to render a preview in the browser. The <code>satisfies</code> keyword ensures the preview props match the component's interface.</p>
</li>
<li><p><strong>All styles are inline objects.</strong> Email clients strip <code>&lt;style&gt;</code> tags and ignore most CSS. Inline styles are the only reliable way to style emails across Gmail, Outlook, Apple Mail, and every other client.</p>
</li>
</ul>
<h3 id="heading-how-to-build-a-repo-access-template">How to Build a Repo Access Template</h3>
<p>The repo access email is sent after the GitHub invitation succeeds:</p>
<pre><code class="language-tsx">// src/lib/email/emails/repo-access-granted.tsx
import {
  Body,
  Button,
  Container,
  Head,
  Heading,
  Hr,
  Html,
  Link,
  Preview,
  Section,
  Text,
} from "@react-email/components";

import { brand } from "@/lib/brand";

interface RepoAccessGrantedEmailProps {
  repoUrl: string;
}

export default function RepoAccessGrantedEmail({
  repoUrl,
}: RepoAccessGrantedEmailProps) {
  return (
    &lt;Html&gt;
      &lt;Head /&gt;
      &lt;Preview&gt;Your {brand.name} repository access is ready!&lt;/Preview&gt;
      &lt;Body style={main}&gt;
        &lt;Container style={container}&gt;
          &lt;Section style={header}&gt;
            &lt;Text style={logoText}&gt;{brand.name}&lt;/Text&gt;
          &lt;/Section&gt;

          &lt;Hr style={divider} /&gt;

          &lt;Heading style={h1}&gt;You are in!&lt;/Heading&gt;

          &lt;Text style={text}&gt;
            Your GitHub repository access has been granted. You now have
            full access to the {brand.name} codebase.
          &lt;/Text&gt;

          &lt;Section style={buttonContainer}&gt;
            &lt;Button style={button} href={repoUrl}&gt;
              Open Repository
            &lt;/Button&gt;
          &lt;/Section&gt;

          &lt;Section style={infoBox}&gt;
            &lt;Text style={infoTitle}&gt;Quick Start&lt;/Text&gt;
            &lt;Text style={infoText}&gt;
              &lt;strong&gt;1.&lt;/strong&gt; Clone the repository to your machine
            &lt;/Text&gt;
            &lt;Text style={infoText}&gt;
              &lt;strong&gt;2.&lt;/strong&gt; Run{" "}
              &lt;code style={codeStyle}&gt;bun install&lt;/code&gt; to install
              dependencies
            &lt;/Text&gt;
            &lt;Text style={infoText}&gt;
              &lt;strong&gt;3.&lt;/strong&gt; Follow the README for environment setup
            &lt;/Text&gt;
            &lt;Text style={infoText}&gt;
              &lt;strong&gt;4.&lt;/strong&gt; Run{" "}
              &lt;code style={codeStyle}&gt;bun dev&lt;/code&gt; to start building
            &lt;/Text&gt;
          &lt;/Section&gt;

          &lt;Hr style={divider} /&gt;

          &lt;Text style={footer}&gt;
            Need help? Reply to this email or reach out at{" "}
            &lt;Link
              href={`mailto:${brand.emails.support}`}
              style={link}
            &gt;
              {brand.emails.support}
            &lt;/Link&gt;
          &lt;/Text&gt;
        &lt;/Container&gt;
      &lt;/Body&gt;
    &lt;/Html&gt;
  );
}
</code></pre>
<p>This template includes a <code>&lt;Button&gt;</code> component that links directly to the GitHub repository. The quick start section gives the customer immediate next steps so they aren't left wondering what to do after gaining access.</p>
<h3 id="heading-how-to-build-an-abandoned-cart-template">How to Build an Abandoned Cart Template</h3>
<p>The abandoned cart email brings the customer back to your pricing page:</p>
<pre><code class="language-tsx">// src/lib/email/emails/abandoned-cart.tsx
import {
  Body,
  Button,
  Container,
  Head,
  Heading,
  Hr,
  Html,
  Preview,
  Section,
  Text,
} from "@react-email/components";

import { brand } from "@/lib/brand";

interface AbandonedCartEmailProps {
  customerEmail: string;
  checkoutUrl: string;
}

export default function AbandonedCartEmail({
  customerEmail,
  checkoutUrl,
}: AbandonedCartEmailProps) {
  return (
    &lt;Html&gt;
      &lt;Head /&gt;
      &lt;Preview&gt;Your {brand.name} checkout is waiting for you&lt;/Preview&gt;
      &lt;Body style={main}&gt;
        &lt;Container style={container}&gt;
          &lt;Section style={header}&gt;
            &lt;Text style={logoText}&gt;{brand.name}&lt;/Text&gt;
          &lt;/Section&gt;

          &lt;Hr style={divider} /&gt;

          &lt;Heading style={h1}&gt;You left something behind&lt;/Heading&gt;

          &lt;Text style={text}&gt;
            We noticed you started a checkout but did not complete your
            purchase. No worries. Your cart is still waiting for you.
          &lt;/Text&gt;

          &lt;Text style={text}&gt;
            {brand.name} gives you everything you need to ship your
            startup this weekend: authentication, payments, email,
            background jobs, and more. All wired together and ready
            to go.
          &lt;/Text&gt;

          &lt;Section style={buttonContainer}&gt;
            &lt;Button style={button} href={checkoutUrl}&gt;
              Complete Your Purchase
            &lt;/Button&gt;
          &lt;/Section&gt;

          &lt;Text style={textSmall}&gt;
            If you ran into any issues during checkout or have questions
            about {brand.name}, just reply to this email. I read every
            message personally.
          &lt;/Text&gt;

          &lt;Hr style={divider} /&gt;

          &lt;Text style={footer}&gt;
            This email was sent to {customerEmail} because you started
            a checkout on {brand.name}. If this was not you, you can
            safely ignore this email.
          &lt;/Text&gt;
        &lt;/Container&gt;
      &lt;/Body&gt;
    &lt;/Html&gt;
  );
}
</code></pre>
<p>The tone matters here. "You left something behind" is friendly, not pushy. The email explains the product's value briefly, includes a single clear call to action, and the footer explains why they received the email.</p>
<h3 id="heading-how-templates-integrate-with-durable-steps">How Templates Integrate with Durable Steps</h3>
<p>Every email template is invoked via <code>createElement</code> inside a <code>step.run()</code> block:</p>
<pre><code class="language-typescript">await step.run("send-purchase-confirmation", async () =&gt; {
  await sendEmail({
    to: user.email,
    subject: `Your ${brand.name} purchase is confirmed!`,
    template: createElement(PurchaseConfirmationEmail, {
      amount: purchase.amount,
      currency: purchase.currency,
      customerEmail: user.email,
    }),
  });
});
</code></pre>
<p>The <code>createElement</code> call creates a React element from the template component with the given props. The <code>sendEmail</code> function renders it to HTML via React Email's <code>render()</code> and sends it through Resend.</p>
<p>Because this is inside a <code>step.run()</code>, the email send is checkpointed. If Resend is down and the step fails, it retries on its own without re-running previous steps. The customer never gets a duplicate email.</p>
<h2 id="heading-how-to-test-the-complete-flow-locally">How to Test the Complete Flow Locally</h2>
<p>Testing the complete payment lifecycle locally requires three things running simultaneously: your application, the Stripe CLI forwarding webhook events, and the Inngest dev server processing background jobs.</p>
<h3 id="heading-step-1-start-the-stripe-cli">Step 1: Start the Stripe CLI</h3>
<p>Install the Stripe CLI and log in:</p>
<pre><code class="language-bash"># macOS
brew install stripe/stripe-cli/stripe

# Authenticate
stripe login
</code></pre>
<p>Forward webhook events to your local server:</p>
<pre><code class="language-bash">stripe listen --forward-to localhost:3000/api/payments/webhook
</code></pre>
<p>The CLI prints a webhook signing secret starting with <code>whsec_</code>. Copy this to your <code>.env</code> as <code>STRIPE_WEBHOOK_SECRET</code>.</p>
<h3 id="heading-step-2-start-the-inngest-dev-server">Step 2: Start the Inngest Dev Server</h3>
<p>The Inngest dev server gives you real-time visibility into every function execution, every step, and every retry:</p>
<pre><code class="language-bash">npx inngest-cli@latest dev -u http://localhost:3000/api/inngest
</code></pre>
<p>Open <code>http://localhost:8288</code> in your browser. This is the Inngest dashboard where you'll watch your durable functions execute step by step.</p>
<h3 id="heading-step-3-start-your-application">Step 3: Start Your Application</h3>
<pre><code class="language-bash">bun run dev
</code></pre>
<p>Your application should now be running on <code>http://localhost:3000</code>.</p>
<h3 id="heading-step-4-test-the-purchase-flow">Step 4: Test the Purchase Flow</h3>
<ol>
<li><p>Go to your pricing page and click the checkout button.</p>
</li>
<li><p>Use Stripe's test card number <code>4242 4242 4242 4242</code> with any future expiration date and any CVC.</p>
</li>
<li><p>Complete the checkout. Stripe redirects you to your success URL.</p>
</li>
<li><p>Your frontend calls the <code>/api/purchases/claim</code> endpoint with the session ID.</p>
</li>
<li><p>Watch the Inngest dashboard. You should see the <code>purchase-completed</code> function trigger and each step execute in sequence.</p>
</li>
</ol>
<p>In the Inngest dashboard, you will see:</p>
<ul>
<li><p><strong>Step 1:</strong> "lookup-user-and-purchase" completes with the user and purchase data.</p>
</li>
<li><p><strong>Step 2:</strong> "track-purchase-to-posthog" completes (or logs a warning if PostHog isn't configured).</p>
</li>
<li><p><strong>Step 3:</strong> "send-purchase-confirmation" completes. Check your email.</p>
</li>
<li><p><strong>Step 4:</strong> "send-admin-notification" completes (if <code>ADMIN_EMAIL</code> is set).</p>
</li>
<li><p><strong>Steps 5-9:</strong> Run if the user has a GitHub username linked.</p>
</li>
</ul>
<h3 id="heading-step-5-test-a-refund">Step 5: Test a Refund</h3>
<p>Trigger a refund through the Stripe CLI:</p>
<pre><code class="language-bash">stripe trigger charge.refunded
</code></pre>
<p>Or go to the Stripe dashboard, find the test payment, and issue a refund manually. The Stripe CLI will forward the <code>charge.refunded</code> webhook to your local server.</p>
<p>In the Inngest dashboard, you'll see the <code>refund-processed</code> function trigger with its own set of steps: lookup, conditional access revocation, status update, analytics tracking, and email notifications.</p>
<h3 id="heading-step-6-test-abandoned-cart-recovery">Step 6: Test Abandoned Cart Recovery</h3>
<p>Trigger a checkout expiration:</p>
<pre><code class="language-bash">stripe trigger checkout.session.expired
</code></pre>
<p>The <code>checkout-expired</code> function will appear in the Inngest dashboard. You'll see the 1-hour sleep step. In the dev server, you can fast-forward through sleeps by clicking the "Skip" button in the dashboard. This lets you test the delayed email without actually waiting an hour.</p>
<h3 id="heading-how-to-simulate-step-failures">How to Simulate Step Failures</h3>
<p>To test the retry behavior, temporarily throw an error in one of your steps:</p>
<pre><code class="language-typescript">const collaboratorResult = await step.run(
  "add-github-collaborator",
  async () =&gt; {
    throw new Error("Simulated GitHub API failure");
  }
);
</code></pre>
<p>In the Inngest dashboard, you'll see:</p>
<ul>
<li><p>Steps 1 through 4 succeed and their results are cached.</p>
</li>
<li><p>Step 5 fails and is retried with exponential backoff.</p>
</li>
<li><p>Steps 6 through 9 remain pending.</p>
</li>
</ul>
<p>Remove the thrown error, and on the next retry, step 5 succeeds. Steps 6 through 9 execute, while steps 1 through 4 aren't re-executed. This is the checkpointing behavior that makes durable execution reliable.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building a complete SaaS payment flow is more than integrating Stripe Checkout. It's the entire lifecycle from "Buy" button to "Welcome" email, including the parts that happen when things go wrong.</p>
<p>Here's what you built in this tutorial:</p>
<ul>
<li><p>A <strong>database schema</strong> that tracks purchases through every state: completed, partially refunded, and fully refunded.</p>
</li>
<li><p>A <strong>Stripe product and price seed script</strong> that creates your catalog programmatically.</p>
</li>
<li><p>A <strong>checkout flow</strong> with session creation, payment verification, and idempotent purchase claiming.</p>
</li>
<li><p>A <strong>thin webhook handler</strong> that validates signatures and routes events to background jobs.</p>
</li>
<li><p>A <strong>9-step durable purchase function</strong> where each step is independently checkpointed and retried.</p>
</li>
<li><p>A <strong>refund handler</strong> that distinguishes between full and partial refunds, revoking access only when appropriate.</p>
</li>
<li><p>An <strong>abandoned cart recovery flow</strong> that waits an hour before sending a friendly recovery email.</p>
</li>
<li><p><strong>Three transactional email templates</strong> built with React Email: purchase confirmation, repo access granted, and abandoned cart.</p>
</li>
<li><p>A <strong>local testing setup</strong> with Stripe CLI, Inngest dev server, and step-by-step observability.</p>
</li>
</ul>
<p>The most important pattern is the separation between receiving and processing. Your API endpoints and webhook handlers should be thin: validate, record, enqueue, return. All the complex multi-step work happens in durable background functions where failures are isolated and retried at the step level.</p>
<p>This pattern scales. Add a new step to the purchase flow, and it gets the same checkpointing and retry behavior. Add a new webhook event, and you route it to a new durable function.</p>
<p>Your requirements may differ. You might sell subscriptions instead of one-time purchases, or provision API keys instead of GitHub access. The specific steps change, but the architecture stays the same.</p>
<p>If you want to start with all of these patterns already wired together in a production-ready codebase, <a href="https://eden-stack.com?utm_source=freecodecamp&amp;utm_medium=article&amp;utm_campaign=saas-payment-flow-stripe-webhooks-email">Eden Stack</a> includes the complete payment flow described in this article, along with 30+ additional production-tested patterns for authentication, email, analytics, background jobs, and more.</p>
<p><em>Magnus Rødseth builds AI-native applications and is the creator of</em> <a href="https://eden-stack.com?utm_source=freecodecamp&amp;utm_medium=article&amp;utm_campaign=saas-payment-flow-stripe-webhooks-email"><em>Eden Stack</em></a><em>, a production-ready starter kit with 30+ Claude skills encoding production patterns for AI-native SaaS development.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Product Experimentation with Regression Discontinuity: How an LLM Confidence Threshold Creates a Natural Experiment in Python ]]>
                </title>
                <description>
                    <![CDATA[ Causal inference for LLM-based features starts with one question editors ask before they ship anything: Did the change actually move the metric, or did the metric just move? Let's say that your team b ]]>
                </description>
                <link>https://www.freecodecamp.org/news/gen-ai-product-experimentation-with-regression-discontinuity-design/</link>
                <guid isPermaLink="false">69fe0255f239332df4da1c33</guid>
                
                    <category>
                        <![CDATA[ product experimentation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ causal inference ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ regression-discontinuity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ experimentation ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Rudrendu Paul ]]>
                </dc:creator>
                <pubDate>Fri, 08 May 2026 15:33:41 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/a6b7e375-8638-4b98-824e-bb94c60e9e57.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Causal inference for LLM-based features starts with one question editors ask before they ship anything: Did the change actually move the metric, or did the metric just move?</p>
<p>Let's say that your team built a routing layer that splits incoming queries between two models: queries with a confidence score below 0.85 go to a premium model, and those above 0.85 go to a cheaper distilled model. The premium model costs 5x as much as the cheaper one.</p>
<p>Your boss wants the answer that ends the debate: Is the premium model worth it for the queries it sees?</p>
<p>You can't run a clean A/B test, because routing is deterministic: a query at confidence 0.84 always gets premium, a query at 0.86 always gets cheap, and you can't randomize the assignment.</p>
<p>You also can't trust a naïve comparison of premium-routed users against cheap-routed users. Premium handles the harder queries by design (that's the reason you built the gate), so the two groups differ in query difficulty before either model touches them.</p>
<p>The threshold itself is your free experiment. Right at 0.85, the assignment flips, but the queries on either side of that boundary are essentially identical. A query at confidence 0.849 isn't meaningfully different from a query at 0.851. Any differences in outcomes between the two narrow groups stem solely from the routing decision. That's what regression discontinuity design (RDD) reads.</p>
<p>In this tutorial, you'll use Python to estimate the causal effect of premium routing on task completion using sharp RDD with local linear regression. You'll sweep bandwidths to test estimate stability, run a manipulation diagnostic, check robustness with a quadratic specification, and bootstrap 95% confidence intervals around every point estimate.</p>
<p>The LLM telemetry is a 50,000-user synthetic dataset with the ground-truth premium-routing effect baked in at +6 percentage points, so you can verify that RDD recovers it.</p>
<p><strong>Companion code:</strong> every code block runs end-to-end <a href="https://github.com/RudrenduPaul/product-experimentation-causal-inference-genai-llm/tree/main/03_rdd_confidence_threshold">in the companion notebook</a>.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-why-threshold-routing-is-a-natural-experiment">Why Threshold Routing is a Natural Experiment</a></p>
</li>
<li><p><a href="#heading-what-regression-discontinuity-actually-does">What Regression Discontinuity Actually Does</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-setting-up-the-working-example">Setting Up the Working Example</a></p>
</li>
<li><p><a href="#heading-step-1-a-sharp-rdd-with-local-linear-regression">Step 1: A Sharp RDD with Local Linear Regression</a></p>
</li>
<li><p><a href="#heading-step-2-try-different-bandwidths">Step 2: Try Different Bandwidths</a></p>
</li>
<li><p><a href="#heading-step-3-checking-for-manipulation-at-the-threshold">Step 3: Checking for Manipulation at the Threshold</a></p>
</li>
<li><p><a href="#heading-step-4-quadratic-specification-as-a-robustness-check">Step 4: Quadratic Specification as a Robustness Check</a></p>
</li>
<li><p><a href="#heading-step-5-bootstrap-confidence-intervals">Step 5: Bootstrap Confidence Intervals</a></p>
</li>
<li><p><a href="#heading-when-regression-discontinuity-fails">When Regression Discontinuity Fails</a></p>
</li>
<li><p><a href="#heading-what-to-do-next">What to Do Next</a></p>
</li>
</ul>
<h2 id="heading-why-threshold-routing-is-a-natural-experiment">Why Threshold Routing is a Natural Experiment</h2>
<p>The product reason this routing rule exists is to help your team spend the premium model budget where it earns its keep. Low-confidence queries are the harder ones, which is where a stronger model has the most upside. High-confidence queries already look easy enough for the cheap model to handle.</p>
<p>You'll see this routing direction across confidence-score gates for Q&amp;A assistants, query-complexity gates in multi-model gateways like OpenRouter, safety-score gates in content moderation, and latency-budget gates that re-route when the cheap model would exceed a p99 latency budget.</p>
<p>The mechanism is the same in every case: a continuous score, a threshold, and a deterministic routing rule.</p>
<p>What makes this setup useful for causal inference is that users don't pick which model they get. A query lands, the system computes confidence, and the routing layer decides. Right at the threshold, the user's experience flips from premium to cheap based on a difference too small to be meaningful.</p>
<p>Again, a query at 0.849 confidence isn't shipping a different problem to the model than a query at 0.851. Anything that differs in outcomes between those two groups is the routing decision speaking. The underlying query is the same.</p>
<p>That local randomness is the experiment RDD reads from. You don't need a randomized control group, you don't need a propensity score. And you don't need an instrument, you need a sharp threshold that nobody can game.</p>
<h2 id="heading-what-regression-discontinuity-actually-does">What Regression Discontinuity Actually Does</h2>
<p>The jump at the threshold is the causal effect, which is the number a product team can act on. RDD reads it by fitting two separate regression lines to the outcome: one for users just below the threshold and one for users just above. The vertical difference between those two fitted lines at the cutoff is the local average treatment effect at that point.</p>
<p>Graphically, picture task completion on the y-axis and query confidence on the x-axis. Completion generally trends with confidence (easier queries complete more often). At exactly 0.85, though, users below the cutoff get premium routing, and users above get cheap.</p>
<p>If premium routing helps, you'd see a sharp upward jump in task completion just below 0.85, then disappear just above. Approached from left to right with confidence rising, the visual reads as a downward step at 0.85, because you're moving from the premium-treated zone into the cheap-treated zone.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69cc82ffe4688e4edd796adb/f772c04b-5642-472c-8182-695183027294.png" alt="f772c04b-5642-472c-8182-695183027294" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><em>Figure 1. Conceptual schematic. Two outcome trajectories, one for premium-routed queries (confidence below 0.85) and one for cheap-routed queries (confidence above 0.85), meet at the threshold but don't match. The vertical gap between their endpoints at 0.85 is the local causal effect of premium routing.</em></p>
<p>That gap is identified under two named assumptions:</p>
<ol>
<li><p><strong>No manipulation of the running variable:</strong> Users (or your system) can't precisely nudge a query's confidence score across the cutoff. If anyone can game their score to land just below 0.85 and grab premium routing, the cutoff is no longer drawn at random, and RDD breaks.</p>
</li>
<li><p><strong>Continuity of potential outcomes at the cutoff:</strong> Every other factor that affects task completion (query type, user expertise, workspace tenure, time of day) varies smoothly across 0.85. Only the routing assignment changes discontinuously at exactly the threshold. If a second product rule fires at 0.85 (a different logging level, a separate UI treatment, a retry policy), RDD will attribute that rule's effect to the routing decision.</p>
</li>
</ol>
<p>These are the two assumptions you check before you trust the estimate. Step 3 below tests the first one. The second is a structural property of your system that you have to know cold.</p>
<p>Two practical choices shape every RDD: the <strong>bandwidth</strong> (how close to the cutoff to restrict the analysis) and the <strong>functional form</strong> (linear, quadratic, or local polynomial).</p>
<p>Narrow bandwidths cut potential bias by staying close to the local-randomization zone, but they shrink the sample. Linear specifications are stable, though they assume the underlying relationship can be approximated by a straight line on each side.</p>
<p>You'll try both linear and quadratic specifications at multiple bandwidths to see whether the answer holds.</p>
<p>The article uses sharp RDD throughout, since assignment is a deterministic function of confidence (below 0.85 always premium, above 0.85 always cheap). When the threshold is probabilistic and compliance is partial, the design is a fuzzy RDD, which requires an instrumental variables framework that you can implement using the <code>rdrobust</code> Python package.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You need Python 3.11 or newer, comfort with pandas and statsmodels, and rough familiarity with linear regression and interaction terms.</p>
<p>Install the packages used in this tutorial:</p>
<pre><code class="language-shell">pip install numpy pandas statsmodels matplotlib scipy
</code></pre>
<p><strong>Here's what's happening:</strong> four standard scientific Python libraries plus matplotlib for the diagnostic visualization. Nothing exotic.</p>
<p>Clone the companion repo and generate the synthetic dataset:</p>
<pre><code class="language-shell">git clone https://github.com/RudrenduPaul/product-experimentation-causal-inference-genai-llm.git
cd product-experimentation-causal-inference-genai-llm
python data/generate_data.py --seed 42 --n-users 50000 --out data/synthetic_llm_logs.csv
</code></pre>
<p><strong>Here's what's happening:</strong> the data generator draws 50,000 users with a <code>query_confidence</code> score from a Beta(5,2) distribution, applies the routing rule (<code>routed_to_premium = query_confidence &lt; 0.85</code>), and bakes a +6-percentage-point premium routing effect into <code>task_completed</code>. Same seed, same dataset, every time.</p>
<h2 id="heading-setting-up-the-working-example">Setting Up the Working Example</h2>
<p>The dataset simulates a SaaS product that routes queries between a premium and a cheap model based on confidence score. The threshold is 0.85, and the ground-truth causal effect of premium routing is +6 percentage points on task completion. You know the truth, so you can check whether RDD recovers it.</p>
<p>Load the data and look at the routing breakdown:</p>
<pre><code class="language-python">import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data/synthetic_llm_logs.csv")
print(f"Loaded {len(df):,} rows, {df.shape[1]} columns")

print("\nRouting breakdown:")
counts = df.routed_to_premium.value_counts().to_dict()
print(f"  Premium-routed (confidence &lt; 0.85):  {counts.get(1, 0):,}")
print(f"  Cheap-routed   (confidence &gt;= 0.85): {counts.get(0, 0):,}")

print("\nQuery confidence distribution:")
print(df.query_confidence.describe().round(3))
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="language-python">Loaded 50,000 rows, 16 columns

Routing breakdown:
  Premium-routed (confidence &lt; 0.85):  38,874
  Cheap-routed   (confidence &gt;= 0.85): 11,126

Query confidence distribution:
count    50000.000
mean         0.715
std          0.159
min          0.078
25%          0.611
50%          0.736
75%          0.838
max          0.998
</code></pre>
<p><strong>Here's what's happening:</strong> about 78% of queries land below the 0.85 cutoff and get premium routing. The Beta(5,2) distribution is skewed toward the upper end, with a median of 0.736, and most of its mass still sits below 0.85. The remaining 22% are queries that the model already feels confident about, and they go to the cheap model.</p>
<p>Before any regression, look at the naïve comparison every product team is tempted to run:</p>
<pre><code class="language-python">naive = (
    df[df.routed_to_premium == 1].task_completed.mean()
    - df[df.routed_to_premium == 0].task_completed.mean()
)
print(f"Naive premium-vs-cheap effect: {naive:+.4f}  (ground truth = +0.06)")
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="language-python">Naive premium-vs-cheap effect: +0.0632  (ground truth = +0.06)
</code></pre>
<p><strong>Here's what's happening:</strong> the naive estimate sits at +0.0632, which is suspiciously close to the truth. That's a coincidence of this specific synthetic dataset, where the only confounder of premium vs. cheap is <code>query_confidence</code> itself, and the outcome doesn't depend on confidence except through routing.</p>
<p>In production, you almost never get this lucky. User expertise, prompt phrasing, time of day, and a dozen unobserved query traits all correlate with confidence and with completion.</p>
<p>A naïve comparison in a real system can be off by 50% or more in either direction. RDD gives you identification that doesn't depend on the absence of hidden confounders.</p>
<h3 id="heading-step-1-a-sharp-rdd-with-local-linear-regression">Step 1: A Sharp RDD with Local Linear Regression</h3>
<p>The basic sharp RDD estimator is a local linear regression. Restrict to users whose confidence sits within a bandwidth of the cutoff, fit separate linear slopes on each side, and read off the jump at 0.85.</p>
<pre><code class="language-python">cutoff = 0.85
bw = 0.10

near = df[(df.query_confidence &gt; cutoff - bw)
          &amp; (df.query_confidence &lt; cutoff + bw)].copy()
near["below_cutoff"] = (near.query_confidence &lt; cutoff).astype(int)
near["rc"] = near.query_confidence - cutoff

rdd_model = smf.ols(
    "task_completed ~ below_cutoff + rc + below_cutoff:rc",
    data=near,
).fit(cov_type="HC3")

effect = rdd_model.params["below_cutoff"]
print(f"RDD effect at cutoff (LATE): {effect:+.4f}")
print(f"Std error (HC3):             {rdd_model.bse['below_cutoff']:.4f}")
print(f"p-value:                     {rdd_model.pvalues['below_cutoff']:.4f}")
print(f"N users in [0.75, 0.95):     {len(near):,}")
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="language-python">RDD effect at cutoff (LATE): +0.0548
Std error (HC3):             0.0131
p-value:                     0.0000
N users in [0.75, 0.95):     21,689
</code></pre>
<p><strong>Here's what's happening:</strong> the model fits separate intercepts and slopes on each side of 0.85 (<code>below_cutoff</code> is the side indicator, <code>rc</code> is confidence centered at the cutoff). The coefficient on <code>below_cutoff</code> reads off the vertical jump at the threshold, which is the local average treatment effect (LATE) for queries with confidence near 0.85. You get +0.0548, within sampling noise of the +0.06 ground truth.</p>
<p>Three notes on the specification. First, <code>task_completed</code> is binary, so this is a linear probability model. For RDD with a binary outcome at the cutoff, the linear probability model is standard practice because local linearity is the identifying assumption either way. Logit at the cutoff is an alternative if you need bounded predictions globally.</p>
<p>Second, the standard errors are used <code>cov_type="HC3"</code> to relax the homoskedasticity assumption, which is almost always wrong for binary outcomes.</p>
<p>Third, the dataset has one query per user with no within-user clustering, so cluster-robust standard errors aren't needed here. In a setting with multiple queries per user, you'd cluster on <code>user_id</code>.</p>
<p>The next diagnostic to look at is the confidence distribution near the cutoff. Figure 2 shows what 50,000 queries look like in the bandwidth window:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69cc82ffe4688e4edd796adb/9ecb8a4c-6eac-4732-95ae-2a5981917f54.png" alt="9ecb8a4c-6eac-4732-95ae-2a5981917f54" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><em>Figure 2. Real distribution from the 50,000-user synthetic dataset. Unlike the schematic in Figure 1, this shows the actual query density by confidence score, with the routing threshold annotated. The bottom panel counts how many queries land in each 2-percentage-point bin near the cutoff (2,461 / 2,481 / 2,335 / 2,229 / 2,048 across the 0.80–0.90 range). The roughly uniform spread is the visual signal that no manipulation is concentrating users on one side of the threshold.</em></p>
<h3 id="heading-step-2-try-different-bandwidths">Step 2: Try Different Bandwidths</h3>
<p>Bandwidth choice matters. Too narrow and you have too few observations, so the confidence interval blows up. Too wide and you're extrapolating into regions where the linear specification is no longer a reasonable local approximation.</p>
<p>The honest move is to try multiple bandwidths and report whether the estimate holds.</p>
<pre><code class="language-python">results = []
for bw in [0.05, 0.10, 0.15, 0.20]:
    sub = df[(df.query_confidence &gt; cutoff - bw)
             &amp; (df.query_confidence &lt; cutoff + bw)].copy()
    sub["below_cutoff"] = (sub.query_confidence &lt; cutoff).astype(int)
    sub["rc"] = sub.query_confidence - cutoff

    m = smf.ols(
        "task_completed ~ below_cutoff + rc + below_cutoff:rc",
        data=sub,
    ).fit(cov_type="HC3")

    results.append({
        "bandwidth": bw,
        "n": len(sub),
        "effect": m.params["below_cutoff"],
        "se": m.bse["below_cutoff"],
        "p": m.pvalues["below_cutoff"],
    })

print(pd.DataFrame(results).round(4).to_string(index=False))
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="language-python"> bandwidth      n  effect     se       p
      0.05  11554  0.0635  0.0183  0.0005
      0.10  21689  0.0548  0.0131  0.0000
      0.15  29137  0.0618  0.0112  0.0000
      0.20  34074  0.0614  0.0107  0.0000
</code></pre>
<p><strong>Here's what's happening:</strong> four bandwidths from ±0.05 to ±0.20 around the cutoff, refitting the same RDD specification at each. The estimates range from +0.0548 to +0.0635, all in the same neighborhood as the +0.06 ground truth, with standard errors that shrink as the bandwidth widens and grow as it narrows. Every p-value is well below 0.05. Whether the estimates are "stable" depends on the confidence intervals around them, which Step 5 produces with the bootstrap.</p>
<h3 id="heading-step-3-checking-for-manipulation-at-the-threshold">Step 3: Checking for Manipulation at the Threshold</h3>
<p>RDD is valid only if users can't precisely manipulate the running variable around the cutoff. If your users (or your system) can nudge confidence scores just below 0.85 to force premium routing, you get a density spike at the cutoff, and the RDD estimate is contaminated.</p>
<p>The standard diagnostic is the McCrary density test, which checks whether the distribution of the running variable has a sharp jump at the cutoff. The simple version: bin the data tightly around 0.85 and check whether the counts on the two sides are similar.</p>
<pre><code class="language-python">print("User counts in 2-percentage-point bins around 0.85:")
for lo in [0.80, 0.82, 0.84, 0.86, 0.88]:
    hi = lo + 0.02
    cnt = ((df.query_confidence &gt;= lo) &amp; (df.query_confidence &lt; hi)).sum()
    print(f"  [{lo:.2f}, {hi:.2f}):  n = {cnt:,}")
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="language-python">User counts in 2-percentage-point bins around 0.85:
  [0.80, 0.82):  n = 2,461
  [0.82, 0.84):  n = 2,481
  [0.84, 0.86):  n = 2,335
  [0.86, 0.88):  n = 2,229
  [0.88, 0.90):  n = 2,048
</code></pre>
<p><strong>Here's what's happening:</strong> counts trend gently downward across the bandwidth because Beta(5,2) places more mass at higher confidence levels, and the density tapers as it approaches 1.0. There's no spike or dip at the 0.84–0.86 bin that straddles the cutoff. The 433-user spread across all five bins is consistent with smooth tapering of the underlying density.</p>
<p>That's the pattern you want when manipulation is absent. For a more rigorous test, the <a href="https://github.com/rdpackages/rddensity"><code>rddensity</code></a> Python package implements the formal McCrary procedure with bias-corrected standard errors.</p>
<p>What manipulation looks like when it's real: a spike in users at confidences just barely below 0.85 (they're being nudged into premium routing) and a dip just above. If you see that pattern, the RDD estimate overstates the causal effect because the users right below 0.85 differ in motivation from those right above. They cared enough to manipulate the score, and they'd have shown different outcomes even under random routing.</p>
<h3 id="heading-step-4-quadratic-specification-as-a-robustness-check">Step 4: Quadratic Specification as a Robustness Check</h3>
<p>If the true relationship between confidence and task completion isn't exactly linear, a local linear RDD can mistake the curvature for a jump. The standard robustness check allows quadratic terms on both sides of the cutoff and tests whether the estimate holds.</p>
<pre><code class="language-python">near = df[(df.query_confidence &gt; cutoff - 0.10)
         &amp; (df.query_confidence &lt; cutoff + 0.10)].copy()
near["below_cutoff"] = (near.query_confidence &lt; cutoff).astype(int)
near["rc"] = near.query_confidence - cutoff
near["rc2"] = near.rc ** 2

rdd_quad = smf.ols(
    "task_completed ~ below_cutoff + rc + below_cutoff:rc"
    " + rc2 + below_cutoff:rc2",
    data=near,
).fit(cov_type="HC3")

print(f"Linear RDD    (bw=0.10):  effect = +0.0548, p &lt; 0.0001")
print(f"Quadratic RDD (bw=0.10):  effect = "
      f"{rdd_quad.params['below_cutoff']:+.4f}, "
      f"p = {rdd_quad.pvalues['below_cutoff']:.4f}")
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="language-text">Linear RDD    (bw=0.10):  effect = +0.0548, p &lt; 0.0001
Quadratic RDD (bw=0.10):  effect = +0.0569, p = 0.0036
</code></pre>
<p><strong>Here's what's happening:</strong> the quadratic specification adds squared terms and interactions with the cutoff indicator, allowing the relationship to curve differently on each side. The <code>below_cutoff</code> coefficient still captures the jump at the threshold, now under a more flexible specification.</p>
<p>The two estimates differ by 0.0022, both close to the +0.06 ground truth, and both are significant at p &lt; 0.01. The answer doesn't change when you let the model bend.</p>
<p>When linear and quadratic specifications disagree noticeably, you have a real signal. With small samples (a few thousand at narrow bandwidths), the quadratic version can lose power because four extra parameters need data to be identified.</p>
<p>The standard move is to widen the bandwidth and re-run both specifications. If they still disagree at wider bandwidths, the linear approximation is wrong, and you should report both numbers.</p>
<h3 id="heading-step-5-bootstrap-confidence-intervals">Step 5: Bootstrap Confidence Intervals</h3>
<p>Every point estimate in this article is a single number from a finite sample. The bootstrap quantifies how much that number would move under resampling, which is what a confidence interval describes.</p>
<pre><code class="language-python">def bootstrap_ci(df, cutoff, bw, quadratic=False, n_reps=500, seed=7):
    rng = np.random.default_rng(seed)
    near = df[(df.query_confidence &gt; cutoff - bw)
              &amp; (df.query_confidence &lt; cutoff + bw)].copy()
    near["below_cutoff"] = (near.query_confidence &lt; cutoff).astype(int)
    near["rc"] = near.query_confidence - cutoff
    if quadratic:
        near["rc2"] = near.rc ** 2
        formula = ("task_completed ~ below_cutoff + rc + below_cutoff:rc"
                   " + rc2 + below_cutoff:rc2")
    else:
        formula = "task_completed ~ below_cutoff + rc + below_cutoff:rc"

    n = len(near)
    estimates = np.empty(n_reps)
    for i in range(n_reps):
        sample = near.iloc[rng.integers(0, n, size=n)]
        m = smf.ols(formula, data=sample).fit()
        estimates[i] = m.params["below_cutoff"]
    return (np.percentile(estimates, 2.5), np.percentile(estimates, 97.5))


print("Linear RDD (bw=0.10):")
lo, hi = bootstrap_ci(df, cutoff, bw=0.10)
print(f"  effect = +0.0548   95% CI: [{lo:+.4f}, {hi:+.4f}]")

print("\nBandwidth sensitivity:")
for bw, eff in [(0.05, 0.0635), (0.10, 0.0548), (0.15, 0.0618), (0.20, 0.0614)]:
    lo, hi = bootstrap_ci(df, cutoff, bw=bw)
    print(f"  bw = {bw:.2f}   effect = {eff:+.4f}   "
          f"95% CI: [{lo:+.4f}, {hi:+.4f}]")

print("\nQuadratic RDD (bw=0.10):")
lo, hi = bootstrap_ci(df, cutoff, bw=0.10, quadratic=True)
print(f"  effect = +0.0569   95% CI: [{lo:+.4f}, {hi:+.4f}]")
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="language-text">Linear RDD (bw=0.10):
  effect = +0.0548   95% CI: [+0.0278, +0.0817]

Bandwidth sensitivity:
  bw = 0.05   effect = +0.0635   95% CI: [+0.0244, +0.0986]
  bw = 0.10   effect = +0.0548   95% CI: [+0.0278, +0.0817]
  bw = 0.15   effect = +0.0618   95% CI: [+0.0381, +0.0823]
  bw = 0.20   effect = +0.0614   95% CI: [+0.0420, +0.0808]

Quadratic RDD (bw=0.10):
  effect = +0.0569   95% CI: [+0.0205, +0.0959]
</code></pre>
<p><strong>Here's what's happening:</strong> the bootstrap resamples the bandwidth-restricted data with replacement 500 times, refits the RDD on each replicate, and collects the <code>below_cutoff</code> coefficient. The 2.5th and 97.5th percentiles of those 500 estimates form the 95% interval. Every interval covers the +0.06 ground truth, every interval excludes zero, and the bandwidth sweep produces overlapping intervals.</p>
<p>That's quantitative stability, verified by resampling across the full bandwidth range. Intervals widen as the bandwidth shrinks and narrow as it grows. The quadratic interval is wider than the linear one because the four extra parameters absorb degrees of freedom.</p>
<p>One thing the intervals do NOT do on this dataset: exclude the naive +0.0632 estimate. That's because the data generator doesn't bake in confounding by query confidence. The only difference between the premium and cheap groups in expectations is the +6pp routing effect itself, so the naïve comparison is close to the truth.</p>
<p>Real systems are messier. In a production setting where unobserved query traits affect both the routing assignment and task completion, the naïve estimate would diverge from the RDD estimate, and the bootstrap intervals would tell you which one to trust.</p>
<h2 id="heading-when-regression-discontinuity-fails">When Regression Discontinuity Fails</h2>
<p>RDD looks clean, but several specific failure modes can destroy the identification. Each one maps to a violation of one of the two named assumptions.</p>
<p><strong>Users manipulate the running variable</strong> (violates assumption 1). The whole setup depends on users (or any upstream service) being unable to precisely control which side of the cutoff they land on. Any system that reveals the cutoff and gives users a way to influence their score (a retry mechanism, a prompt engineering workaround, a confidence-inflating trick) breaks RDD.</p>
<p>Run the density check in Step 3 every time. If you find manipulation, switch to a fuzzy RDD that treats the threshold as probabilistic, or abandon the approach.</p>
<p><strong>Other policies fire at the same cutoff</strong> (violates assumption 2). If your product has additional rules that activate at 0.85 (a separate UI treatment, a different logging level, a different retry policy), RDD can't separate the routing effect from those other policy effects. Audit the full rule book for anything that shares the threshold.</p>
<p><strong>The threshold has noise or overrides</strong> (violates assumption 1, in the structural sense). Maybe routing isn't strictly deterministic at 0.85&nbsp;– it may have random jitter, or a second rule may override the main rule in some cases.</p>
<p>If assignment to the premium model isn't a deterministic function of <code>query_confidence</code>, you have a fuzzy RDD, which requires an instrumental variables framework. The <code>rdrobust</code> package handles both sharp and fuzzy designs.</p>
<p><strong>Curvature masquerading as a jump</strong> (breaks the linear approximation that supports identification at the cutoff). Sharp RDD assumes linearity is a reasonable local approximation. When the underlying outcome-confidence relationship is strongly curved, the linear specification can mistake the bend for a jump.</p>
<p>Step 4's quadratic robustness check is the standard diagnostic. If linear and quadratic disagree, widen the bandwidth and re-run both.</p>
<p><strong>Extrapolation bias</strong> (a continuity issue, reframed). RDD estimates are strictly local to the cutoff. The +0.06 effect at 0.85 tells you nothing about what premium routing would do for queries with confidence 0.30 or 0.99.</p>
<p>If you want a global average effect, you need a different technique: propensity methods, regression with confounder adjustment, or an actual experiment.</p>
<h2 id="heading-what-to-do-next">What to Do Next</h2>
<p>RDD is the right tool when your AI feature is gated by a continuous score and a sharp threshold.</p>
<p>If your feature is gated by a user-controlled toggle, propensity score methods are a better fit. If it's gated by a staged rollout across workspaces, difference-in-differences handles it. If it's gated by rules you can't observe directly but that have a random component, instrumental variables is the right choice.</p>
<p>For production RDD analyses, use the <a href="https://github.com/rdpackages/rdrobust"><code>rdrobust</code></a> Python package. It gives you optimal bandwidth selection (Calonico, Cattaneo, and Titiunik 2014), bias-corrected standard errors, and a built-in plotting utility. The companion <a href="https://github.com/rdpackages/rddensity"><code>rddensity</code></a> package implements the McCrary density test you saw informally in Step 3.</p>
<p>The from-scratch version in this tutorial shows the mechanics. The rd-packages stack is what you ship to a reviewer.</p>
<p>One thing the LATE doesn't do: tell you the effect for users far from the cutoff. If a +0.06 LATE at 0.85 is enough to keep premium routing in the pipeline, you're done. If you need to know what premium would do for the easy queries you're currently sending to cheap (or the hardest queries near the floor), the next step is a small randomized rollout in those zones, scored against the RDD estimate as a calibration check. Don't generalize the LATE without evidence.</p>
<p>The companion notebook for this tutorial <a href="https://github.com/RudrenduPaul/product-experimentation-causal-inference-genai-llm/tree/main/03_rdd_confidence_threshold">lives here on GitHub</a>. Clone the repo, generate the synthetic dataset, and run <code>rdd_demo.ipynb</code> to reproduce every code block from this tutorial.</p>
<p>Threshold routing is one of the most common patterns in production LLM systems, and every confidence-gated routing decision in your stack is a potential RDD. Run the analysis.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Live Options Database in Python – A Complete Guide ]]>
                </title>
                <description>
                    <![CDATA[ Live options analytics change constantly. Implied volatility shifts, Greeks drift, and the shape of the surface can look different even a few minutes later. But a lot of teams still treat these number ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-live-options-database-in-python-a-complete-guide/</link>
                <guid isPermaLink="false">69fd19789f93a850a43041c9</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ stockmarket ]]>
                    </category>
                
                    <category>
                        <![CDATA[ trading,  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Nikhil Adithyan ]]>
                </dc:creator>
                <pubDate>Thu, 07 May 2026 23:00:08 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/4ecffa99-c492-4959-9899-885021d11ee4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Live options analytics change constantly. Implied volatility shifts, Greeks drift, and the shape of the surface can look different even a few minutes later.</p>
<p>But a lot of teams still treat these numbers like something you glance at once. A screenshot in a deck. A one-off notebook cell. A quick check in a UI before a meeting.</p>
<p>That works until you need to answer basic questions that show up in real workflows:</p>
<p>What did TSLA's surface look like at 10:32? When did skew start steepening? Did the change come from the wings moving or the ATM shifting?</p>
<p>If you don't store the data as it arrives, you can't replay it, compare it, or audit it. You're stuck with whatever you happened to look at in the moment.</p>
<p>In this walkthrough, we'll build something small but practical: an internal database that continuously captures SpiderRock MLink's LiveImpliedQuote analytics for TSLA, stores each snapshot as queryable history, and also maintains a "latest view" table so you can pull the current surface state without scanning the full history.</p>
<p><strong>The goal is not to build a trading system. It's to build a reliable internal dataset that you can monitor and query.</strong></p>
<p>Note: SpiderRock MLink's LiveImpliedQuote analytics is a product offered for a fee, which includes exchange charges for the underlying market data used in its creation.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-data-were-using">What Data We're Using</a></p>
</li>
<li><p><a href="#heading-setup-importing-packages">Setup: Importing Packages</a></p>
</li>
<li><p><a href="#heading-database-design">Database Design</a></p>
</li>
<li><p><a href="#heading-pulling-liveimpliedquote">Pulling LiveImpliedQuote</a></p>
</li>
<li><p><a href="#heading-normalizing-the-response-into-rows">Normalizing the Response Into Rows</a></p>
</li>
<li><p><a href="#heading-writing-to-the-database">Writing To The Database</a></p>
</li>
<li><p><a href="#heading-running-a-short-polling-capture">Running a Short Polling Capture</a></p>
</li>
<li><p><a href="#heading-analysis-smile-reconstruction-from-the-database">Analysis: Smile Reconstruction From the Database</a></p>
<ul>
<li><p><a href="#heading-pick-an-expiry-with-good-coverage">Pick an Expiry with Good Coverage</a></p>
</li>
<li><p><a href="#heading-rebuild-the-smile-across-snapshots">Rebuild the Smile Across Snapshots</a></p>
</li>
<li><p><a href="#heading-zoom-in-around-spot">Zoom-In Around Spot</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-analysis-atm-iv-and-skew-over-time">Analysis: ATM IV and Skew Over Time</a></p>
</li>
<li><p><a href="#heading-alert-style-thresholds">Alert-Style Thresholds</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before running any of the code in this walkthrough, there are a few things you need to have in place.</p>
<p>On the API side, you need a SpiderRock MLink account with access to the LiveImpliedQuote feed. The examples use the REST interface, so no websocket setup is required, but you do need a valid API key. If you don't have one yet, you can reach out to SpiderRock directly to get access.</p>
<p>On the Python side, the environment is minimal. You need Python 3.10 or later for the tuple type hint syntax used in one of the function signatures. The external packages are requests, pandas, numpy, and matplotlib. Everything else – sqlite3, time, datetime – is part of the standard library. You can install the external dependencies with:</p>
<pre><code class="language-plaintext">pip install requests pandas numpy matplotlib
</code></pre>
<p>No database setup is required beyond a writable local path. SQLite creates the file automatically on first run, so there's nothing to install or configure separately.</p>
<p>Finally, the walkthrough uses TSLA as the target symbol because it has a liquid and active options chain. If you want to swap in a different underlying, the only thing you need to change is the symbol variable in the config block.</p>
<h2 id="heading-what-data-were-using">What Data We're Using</h2>
<p>This build is driven by one OptAnalytics message type from SpiderRock MLink: <a href="https://docs.spiderrockconnect.com/docs/next/MessageSchemas/Schema/Topics/analytics/LiveImpliedQuote/"><strong>LiveImpliedQuote</strong></a>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/7150e733-6238-410b-afe7-abc781d67e7a.png" alt="LiveImpliedQuote docs page" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Each message represents an option contract and comes with the analytics you actually need for monitoring:</p>
<ul>
<li><p>the option identifier (symbol, expiry, strike, call or put)</p>
</li>
<li><p>surface IV (sVol) and related surface fields</p>
</li>
<li><p>Greeks (delta, gamma, theta, vega)</p>
</li>
<li><p>context fields like underlying price (uPrc), time to expiry (years), and rate (rate)</p>
</li>
<li><p>timestamps and calc source markers, which matter when you're turning a live feed into a database</p>
</li>
</ul>
<p>We'll treat sVol as the main volatility field for the article and refer to it as surface IV. That keeps the workflow consistent when we rebuild smiles or compute skew proxies from stored history.</p>
<p>The demo uses TSLA because it has a rich and active options chain, which makes the database and queries more interesting even in a short capture window. The same pipeline works for any other underlying&nbsp;– the only thing you change is the symbol filter.</p>
<h2 id="heading-setup-importing-packages">Setup: Importing Packages</h2>
<p>Before touching the database or the API, we set up a small, repeatable environment. This section is intentionally minimal. We only import what we need for three things: making REST calls, storing data in SQLite, and doing basic analysis and plots.</p>
<pre><code class="language-python">import requests
import sqlite3
import pandas as pd
import numpy as np
import time
from datetime import datetime, timezone
import matplotlib.pyplot as plt
plt.style.use('ggplot')
</code></pre>
<ul>
<li><p><code>requests</code> is used for calling MLink REST endpoints.</p>
</li>
<li><p><code>sqlite3</code> gives us a lightweight database we can write to locally without extra setup.</p>
</li>
<li><p><code>pandas</code> and <code>numpy</code> are only for shaping and filtering the data once it comes back.</p>
</li>
<li><p><code>time</code> and <code>datetime</code> help us run a polling loop and timestamp each snapshot so the database becomes a real-time series.</p>
</li>
</ul>
<h2 id="heading-database-design">Database Design</h2>
<p>If the goal is to make live analytics queryable, the database design has to support two different needs.</p>
<p>First, you want an audit trail. Every snapshot should be preserved so you can reconstruct what the surface looked like at a specific time.</p>
<p>Second, you also want a fast way to answer "what does it look like right now" without scanning everything you've ever stored.</p>
<p>So we use two tables:</p>
<ul>
<li><p><code>implied_quote_history</code>: Append-only. Every poll inserts a full snapshot.</p>
</li>
<li><p><code>implied_quote_latest</code>: One row per option contract. Each poll upserts into this table so it always reflects the most recent snapshot.</p>
</li>
</ul>
<p>The core of both tables is a stable option identifier. In the feed, the option key is nested, so we normalize it into a single <code>option_key</code> string that includes symbol, expiry, strike, call or put, and venue fields. This becomes the primary key for the latest table and the main join key for queries.</p>
<pre><code class="language-python">#config
api_key = "YOUR SPIDERROCK API KEY"
mlink_url = "https://mlink-live.nms.saturn.spiderrockconnect.com/rest/json"

msg_type = "LiveImpliedQuote"

symbol = "TSLA"
poll_interval_s = 10
poll_duration_s = 120
limit = 2000

#create db connection
db_path = "/mnt/data/optanalytics_iv_greeks.db"

def get_conn(path: str = db_path):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA synchronous=NORMAL;")
    return conn

#create db schema
def setup_db(path: str = db_path):
    conn = get_conn(path)
    cur = conn.cursor()

    cur.execute("""
    create table if not exists implied_quote_history (
        id integer primary key autoincrement,
        asof_ts text not null,

        option_key text not null,
        symbol text not null,
        expiry text not null,
        strike real not null,
        cp text not null,

        calc_source text,
        u_prc real,
        years real,
        rate real,

        s_vol real,
        atm_vol real,
        s_mark real,

        o_bid real,
        o_ask real,
        o_bid_iv real,
        o_ask_iv real,

        delta real,
        gamma real,
        theta real,
        vega real,

        src_ts text
    );
    """)

    cur.execute("""
    create index if not exists idx_hist_symbol_expiry_asof
    on implied_quote_history(symbol, expiry, asof_ts);
    """)

    cur.execute("""
    create index if not exists idx_hist_option_asof
    on implied_quote_history(option_key, asof_ts);
    """)

    cur.execute("""
    create table if not exists implied_quote_latest (
        option_key text primary key,

        last_asof_ts text not null,
        symbol text not null,
        expiry text not null,
        strike real not null,
        cp text not null,

        calc_source text,
        u_prc real,
        years real,
        rate real,

        s_vol real,
        atm_vol real,
        s_mark real,

        o_bid real,
        o_ask real,
        o_bid_iv real,
        o_ask_iv real,

        delta real,
        gamma real,
        theta real,
        vega real,

        src_ts text
    );
    """)

    cur.execute("""
    create index if not exists idx_latest_symbol_expiry
    on implied_quote_latest(symbol, expiry);
    """)

    conn.commit()
    conn.close()

setup_db()
</code></pre>
<p>This creates the SQLite database file and both tables. The history table is append-only and indexed for the two queries we'll run later: pulling snapshots by expiry and time, and pulling a specific option's timeline by <code>option_key</code>. The latest table is keyed by <code>option_key</code>, which lets us upsert and maintain a consistent "current view."</p>
<p>The columns we store are intentionally opinionated. We keep surface IV (s_vol), surface mark (s_mark), Greeks, and a few context fields. We also store timestamps so later we can reason about when a value was produced.</p>
<h2 id="heading-pulling-liveimpliedquote">Pulling LiveImpliedQuote</h2>
<p>Now we do the first live pull. The goal here is not to build a perfect filter. It's to confirm that we can retrieve a meaningful slice of TSLA option analytics and that the response structure is what we expect.</p>
<p>We request LiveImpliedQuote and filter by symbol using the where clause. The response is a list where most rows are actual LiveImpliedQuote messages, and one row at the end is a QueryResult summary.</p>
<pre><code class="language-python">def fetch_live_implied_quote(symbol: str, limit: int = 2000):
    where = f"okey.tk:eq:{symbol}"

    params = {
        "apiKey": api_key,
        "cmd": "getmsgs",
        "msgType": msg_type,
        "where": where,
        "limit": limit
    }

    r = requests.get(mlink_url, params=params)
    r.raise_for_status()
    return r.json()

raw = fetch_live_implied_quote(symbol, limit=limit)
print("raw messages:", len(raw))
print("first type:", raw[0].get("header", {}).get("mTyp") if raw else None)
</code></pre>
<p>This is a straight REST <code>getmsgs</code> call. We pass the API key, message type, and a simple symbol filter. The <code>limit</code> is important. It caps how many messages we get back in one poll, so for active underlyings, the returned set of strikes and expiries can vary between polls. That's fine for this tutorial, because the goal is to show the database pattern and the types of monitoring queries it enables.</p>
<p>This is the output you should see:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/606259cd-e6ed-4f6f-b24f-48fafe9c561b.png" alt="LiveImpliedQuote sample pull" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-normalizing-the-response-into-rows">Normalizing the Response Into Rows</h2>
<p>Right now, raw is a list of nested message objects. That format is fine for transport, but it's not something you can store or query directly. So now, we turn each LiveImpliedQuote message into one flat row with a consistent schema.</p>
<pre><code class="language-python">def make_option_key(okey: dict) -&gt; str:
    return "|".join([
        str(okey.get("tk")),
        str(okey.get("dt")),
        str(okey.get("xx")),
        str(okey.get("cp")),
        str(okey.get("at")),
        str(okey.get("ts")),
    ])

def normalize_liq(raw: list, asof_ts: str, keep_calc_source: str = "Loop") -&gt; pd.DataFrame:
    rows = []

    for row in raw:
        if row.get("header", {}).get("mTyp") != "LiveImpliedQuote":
            continue

        m = row.get("message", {})
        if keep_calc_source and m.get("calcSource") != keep_calc_source:
            continue

        pkey = m.get("pkey", {})
        okey = pkey.get("okey", {})
        if not okey:
            continue

        s_vol = m.get("sVol")
        if s_vol is None or s_vol == 0:
            continue

        o_bid = m.get("oBid", 0) or 0
        o_ask = m.get("oAsk", 0) or 0

        quote_ok = int(not (o_bid == 0 and o_ask == 0))

        rows.append({
            "asof_ts": asof_ts,
            "option_key": make_option_key(okey),

            "symbol": okey.get("tk"),
            "expiry": okey.get("dt"),
            "strike": okey.get("xx"),
            "cp": okey.get("cp"),

            "calc_source": m.get("calcSource"),
            "u_prc": m.get("uPrc"),
            "years": m.get("years"),
            "rate": m.get("rate"),

            "s_vol": s_vol,
            "atm_vol": m.get("atmVol"),
            "s_mark": m.get("sMark"),

            "o_bid": o_bid,
            "o_ask": o_ask,
            "o_bid_iv": m.get("oBidIv"),
            "o_ask_iv": m.get("oAskIv"),
            "quote_ok": quote_ok,

            "delta": m.get("de"),
            "gamma": m.get("ga"),
            "theta": m.get("th"),
            "vega": m.get("ve"),

            "src_ts": m.get("timestamp"),
        })

    df = pd.DataFrame(rows)
    if df.empty:
        return df

    df = (
        df.sort_values("src_ts")
          .drop_duplicates(subset=["option_key"], keep="last")
          .reset_index(drop=True)
    )
    return df

asof_ts = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")
snapshot_df = normalize_liq(raw, asof_ts)

print("snapshot rows:", len(snapshot_df))
print("quote_ok distribution:", snapshot_df["quote_ok"].value_counts().to_dict() if not snapshot_df.empty else {})
snapshot_df.head()
</code></pre>
<p>There are three practical decisions baked into this normalization step:</p>
<ul>
<li><p>First, we build a stable <code>option_key</code> from the option identifier so we have a consistent primary key for the latest table.</p>
</li>
<li><p>Second, we keep only <code>calcSource="Loop"</code>. LiveImpliedQuote can include both Tick and Loop records. Loop records tend to be more consistent for snapshot-style analysis because the underlying reference price is stable across the surface.</p>
</li>
<li><p>Third, we avoid aggressive filtering. In this dataset, the top-of-book bid and ask fields can be zero even when the analytics fields are populated. So instead of dropping those rows, we store a <code>quote_ok</code> flag and keep the record. That keeps the pipeline usable while still making it obvious later which rows had live quotes.</p>
</li>
</ul>
<p>This is the output:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/7d04a9e8-d3ec-4737-a0a7-64cb3888380c.png" alt="LiveImpliedQuote snapshot" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>At this point, one row represents one option contract snapshot. The fact that <code>quote_ok</code> is 0 across the board simply means bid and ask are not populated in this slice, even though surface IV, Greeks, and other analytics fields are present. That's still useful for building a monitoring database, because the core idea here is tracking the evolution of analytics over time, not reconstructing executable markets.</p>
<h2 id="heading-writing-to-the-database">Writing to the Database</h2>
<p>Now that we have a clean snapshot DataFrame, the job is to persist it in two places.</p>
<p>History table: Append everything. This is the audit log. Latest table: Upsert by <code>option_key</code>. This is the fast "current view."</p>
<p>This separation is what makes the database useful. History lets you reconstruct any past snapshot. Latest lets you answer "what does the surface look like right now" without scanning time series.</p>
<pre><code class="language-python">def safe_add_column(table: str, col: str, col_type: str, path: str = db_path):
    conn = get_conn(path)
    cur = conn.cursor()
    existing = [r[1] for r in cur.execute(f"PRAGMA table_info({table});").fetchall()]
    if col not in existing:
        cur.execute(f"ALTER TABLE {table} ADD COLUMN {col} {col_type};")
    conn.commit()
    conn.close()

safe_add_column("implied_quote_history", "quote_ok", "INTEGER")
safe_add_column("implied_quote_latest", "quote_ok", "INTEGER")

def write_snapshot_to_db(df: pd.DataFrame, path: str = db_path) -&gt; tuple[int, int]:
    if df.empty:
        return 0, 0

    conn = get_conn(path)
    cur = conn.cursor()

    cols = [
        "asof_ts",
        "option_key","symbol","expiry","strike","cp",
        "calc_source","u_prc","years","rate",
        "s_vol","atm_vol","s_mark",
        "o_bid","o_ask","o_bid_iv","o_ask_iv",
        "delta","gamma","theta","vega",
        "quote_ok","src_ts"
    ]

    for c in cols:
        if c not in df.columns:
            df[c] = None

    insert_df = df[cols].copy()

    cur.executemany(
        """
        insert into implied_quote_history (
            asof_ts,
            option_key, symbol, expiry, strike, cp,
            calc_source, u_prc, years, rate,
            s_vol, atm_vol, s_mark,
            o_bid, o_ask, o_bid_iv, o_ask_iv,
            delta, gamma, theta, vega,
            quote_ok, src_ts
        ) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """,
        insert_df.itertuples(index=False, name=None)
    )
    history_inserted = cur.rowcount

    cur.executemany(
        """
        insert into implied_quote_latest (
            option_key,
            last_asof_ts, symbol, expiry, strike, cp,
            calc_source, u_prc, years, rate,
            s_vol, atm_vol, s_mark,
            o_bid, o_ask, o_bid_iv, o_ask_iv,
            delta, gamma, theta, vega,
            quote_ok, src_ts
        ) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        on conflict(option_key) do update set
            last_asof_ts=excluded.last_asof_ts,
            symbol=excluded.symbol,
            expiry=excluded.expiry,
            strike=excluded.strike,
            cp=excluded.cp,
            calc_source=excluded.calc_source,
            u_prc=excluded.u_prc,
            years=excluded.years,
            rate=excluded.rate,
            s_vol=excluded.s_vol,
            atm_vol=excluded.atm_vol,
            s_mark=excluded.s_mark,
            o_bid=excluded.o_bid,
            o_ask=excluded.o_ask,
            o_bid_iv=excluded.o_bid_iv,
            o_ask_iv=excluded.o_ask_iv,
            delta=excluded.delta,
            gamma=excluded.gamma,
            theta=excluded.theta,
            vega=excluded.vega,
            quote_ok=excluded.quote_ok,
            src_ts=excluded.src_ts
        """,
        insert_df[[
            "option_key","asof_ts","symbol","expiry","strike","cp",
            "calc_source","u_prc","years","rate",
            "s_vol","atm_vol","s_mark",
            "o_bid","o_ask","o_bid_iv","o_ask_iv",
            "delta","gamma","theta","vega",
            "quote_ok","src_ts"
        ]].itertuples(index=False, name=None)
    )
    latest_upserted = cur.rowcount

    conn.commit()
    conn.close()
    return history_inserted, latest_upserted

hist_n, latest_n = write_snapshot_to_db(snapshot_df)
print("history inserted:", hist_n)
print("latest upserted:", latest_n)
</code></pre>
<p>We batch write using <code>executemany</code> so inserts are fast even with thousands of option rows. The history insert is straightforward. The latest write uses a SQLite upsert keyed on <code>option_key</code>, which means if the contract already exists in the latest table, its fields are overwritten with the newest snapshot.</p>
<p>You should see:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/8fdbdeb1-a4f2-434d-a3c7-99f44e51ec5d.png" alt="History inserted: 1852, latest upserted: 1852" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>After the first write, both tables have the same number of rows. That's expected, because there is only one snapshot in history so far. Once we start polling multiple snapshots, the history table will grow every cycle, while the latest table will stay roughly flat and continue updating in place.</p>
<h2 id="heading-running-a-short-polling-capture">Running a Short Polling Capture</h2>
<p>At this point, the pipeline works end-to-end for a single snapshot. The whole point of the database, though, is to turn live analytics into a time series. So we run a short capture window and store multiple snapshots back-to-back.</p>
<p>This isn't meant to be a production scheduler. It's just a simple loop that runs for a couple of minutes, polls every few seconds, timestamps the snapshot, and writes it to both tables.</p>
<pre><code class="language-python">def poll_and_write(symbol: str, duration_s: int = poll_duration_s, interval_s: int = poll_interval_s):
    start = time.time()
    polls = 0
    total_hist = 0

    while time.time() - start &lt; duration_s:
        asof_ts = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")

        raw = fetch_live_implied_quote(symbol, limit=limit)
        df = normalize_liq(raw, asof_ts)

        hist_n, latest_n = write_snapshot_to_db(df)
        polls += 1
        total_hist += hist_n

        print(f"[{polls}] {asof_ts} snapshot_rows={len(df)} history+={hist_n} latest_upsert={latest_n}")
        time.sleep(interval_s)

    print(f"done. polls={polls}, total_history_added={total_hist}")

poll_and_write(symbol, duration_s=120, interval_s=10)
</code></pre>
<p>Each loop iteration represents one snapshot. We generate a UTC timestamp (asof_ts), pull the latest batch from LiveImpliedQuote, normalize it into rows, then write it into the database. The history table accumulates every snapshot. The latest table overwrites by <code>option_key</code>, so it always represents the most recent view.</p>
<p>One practical detail is worth calling out. The API call is capped by limit, so you're not guaranteed to receive an identical set of strikes and expiries every poll. That's why <code>snapshot_rows</code> can vary between iterations.</p>
<p>In production, you usually stabilize the slice by pinning specific expiries and a strike band or by interpolating IV to fixed moneyness points. For this tutorial, we're keeping ingestion simple and focusing on the database pattern and the monitoring queries it enables.</p>
<p>You should see per-poll telemetry like this:</p>
<pre><code class="language-plaintext">[1] 2026-04-14T18:09:29Z snapshot_rows=1454 history+=1454 latest_upsert=1454
...
done. polls=9, total_history_added=12806
</code></pre>
<p>This confirms the database is building a time series. Over nine polls, you stored 12,806 option rows in history. The latest table is updated each time, but it doesn't grow in the same way as history because it overwrites per contract key.</p>
<p>From the next section, we'll stop writing and start querying.</p>
<h2 id="heading-analysis-smile-reconstruction-from-the-database">Analysis: Smile Reconstruction From the Database</h2>
<p>Once the data is in <code>implied_quote_history</code>, the workflow flips. We stop thinking in terms of "API responses" and start thinking in terms of "queries." This section does two things. First, it picks an expiry that has enough rows to be representative. Then it reconstructs the call-side volatility smile for that expiry across a few timestamps.</p>
<h3 id="heading-pick-an-expiry-with-good-coverage">Pick an Expiry with Good Coverage</h3>
<p>If you pick an expiry that only appears sporadically in the captured snapshots, the smile plot will be misleading. So we start by looking at which expiries have the most rows in the history table.</p>
<pre><code class="language-python">conn = get_conn()

expiry_counts = pd.read_sql_query(
    """
    select expiry, count(*) as n
    from implied_quote_history
    where symbol = ?
    group by expiry
    order by n desc
    limit 10
    """,
    conn,
    params=(symbol,)
)

conn.close()
expiry_counts
</code></pre>
<p>This query scans only the history table, filters to TSLA, and counts how many option rows exist per expiry across the capture window. We keep the top 10 and pick the first one as the expiry we'll reconstruct.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/2f7b897f-0a4f-4b1a-826e-0fee6b19f2bd.png" alt="Expiry-wise coverage" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The expiry date <code>2026-11-20</code> has the highest count.</p>
<p>Here, the count doesn't mean this expiry is "best" in any trading sense. It just means it showed up most consistently in the captured data. That makes it a practical choice for a clean smile comparison.</p>
<h3 id="heading-rebuild-the-smile-across-snapshots">Rebuild the Smile Across Snapshots</h3>
<p>Now we query the stored history for one expiry, keep only calls, and plot surface IV (s_vol) against strike for multiple snapshot timestamps.</p>
<pre><code class="language-python">chosen_expiry = "2026-11-20" 

conn = get_conn()
smile = pd.read_sql_query(
    """
    select asof_ts, strike, cp, s_vol, u_prc
    from implied_quote_history
    where symbol = ? and expiry = ?
    """,
    conn,
    params=(symbol, chosen_expiry)
)
conn.close()

smile_calls = smile[smile["cp"] == "Call"].copy()

ts_list = sorted(smile_calls["asof_ts"].unique())
pick = [ts_list[0], ts_list[len(ts_list)//2], ts_list[-1]]

plt.figure(figsize=(9,5))
for ts in pick:
    g = smile_calls[smile_calls["asof_ts"] == ts].sort_values("strike")
    plt.plot(g["strike"], g["s_vol"], label=ts)

plt.title(f"{symbol} Vol Smile (Calls) | Expiry {chosen_expiry} | 3 snapshots")
plt.xlabel("Strike")
plt.ylabel("Implied Vol (s_vol)")
plt.grid(True)
plt.legend()
plt.show()
</code></pre>
<p>We pull all rows for the chosen expiry from history, then filter to calls so we don't mix put and call shapes. To keep the plot readable, we only plot three snapshots. First, middle, and last.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/84416f80-9253-4f18-8da4-ea814e174987.png" alt="TSLA vol smile (calls)" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Over a short capture window, the smiles often overlap heavily. That doesn't mean the system isn't working. It usually means the surface didn't move much in those two minutes. The important part is that we can reconstruct and compare it purely from stored history.</p>
<h3 id="heading-zoom-in-around-spot">Zoom-In Around Spot</h3>
<p>The full-range plot is useful for shape, but it can hide small shifts near the region people actually care about. So we zoom to a band around the underlying price.</p>
<pre><code class="language-python">s0 = float(smile_calls["u_prc"].dropna().median())
low, high = s0 * 0.6, s0 * 1.4

for ts in pick:
    g = smile_calls[smile_calls["asof_ts"] == ts].sort_values("strike")
    g = g[(g["strike"] &gt;= low) &amp; (g["strike"] &lt;= high)]
    plt.plot(g["strike"], g["s_vol"], label=ts)

plt.title(f"{symbol} Vol Smile (Calls) | Expiry {chosen_expiry} | zoomed")
plt.xlabel("Strike")
plt.ylabel("Implied Vol (s_vol)")
plt.grid(True)
plt.legend(fontsize=8)
plt.show()
</code></pre>
<p>We take a robust spot proxy from the stored <code>u_prc</code> values and then keep strikes within a range around it. The goal is not precision. It's to make the chart readable and show whether the near-ATM region is drifting.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/107de4b4-7b40-4e79-a38b-fac96cb11b26.png" alt="TSLA vol smile (calls)  -  zoomed-in" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Here, even small changes become visible. This is also why storing history matters. If you only looked at one snapshot in isolation, these shifts would be easy to miss or dismiss.</p>
<h2 id="heading-analysis-atm-iv-and-skew-over-time">Analysis: ATM IV and Skew Over Time</h2>
<p>A full smile plot is useful, but it's not always the fastest way to monitor a surface. In practice, teams usually track a few summary numbers per expiry so they can spot changes quickly, then drill down only when something looks off.</p>
<p>Here we reduce each stored snapshot into two metrics for a single expiry.</p>
<ul>
<li><p>ATM IV: Surface IV at the strike closest to spot.</p>
</li>
<li><p>Skew proxy: Surface IV at 0.9 times spot minus surface IV at 1.1 times spot, using the closest available strikes.</p>
</li>
</ul>
<pre><code class="language-python">chosen_expiry = "2026-11-20"

conn = get_conn()
df = pd.read_sql_query(
    """
    select asof_ts, strike, s_vol, u_prc
    from implied_quote_history
    where symbol = ? and expiry = ? and cp = 'Call'
    """,
    conn,
    params=(symbol, chosen_expiry)
)
conn.close()

df["strike"] = df["strike"].astype(float)
df["s_vol"] = df["s_vol"].astype(float)

def closest_iv(grp: pd.DataFrame, target_strike: float):
    g = grp.iloc[(grp["strike"] - target_strike).abs().argsort()[:1]]
    return float(g["s_vol"].iloc[0]), float(g["strike"].iloc[0])

rows = []
for ts, grp in df.groupby("asof_ts"):
    spot = float(grp["u_prc"].dropna().median())
    atm_target = spot
    down_target = spot * 0.9
    up_target = spot * 1.1

    atm_iv, atm_k = closest_iv(grp, atm_target)
    down_iv, down_k = closest_iv(grp, down_target)
    up_iv, up_k = closest_iv(grp, up_target)

    rows.append({
        "asof_ts": ts,
        "spot": spot,
        "atm_strike": atm_k,
        "atm_iv": atm_iv,
        "k90": down_k,
        "iv_90": down_iv,
        "k110": up_k,
        "iv_110": up_iv,
        "skew_90_110": down_iv - up_iv
    })

metrics = pd.DataFrame(rows).sort_values("asof_ts").reset_index(drop=True)
metrics
</code></pre>
<p>We query the history table for one expiry and keep only calls, then group by snapshot timestamp. For each snapshot, we use the median <code>u_prc</code> as a spot proxy and pick the closest available strike to spot. That gives ATM IV. We repeat the same approach for 0.9 times spot and 1.1 times spot and compute a skew proxy as the difference.</p>
<p>The table also stores the actual strikes used (atm_strike, k90, k110). Options strikes are discrete, so the nearest strike can change between snapshots. Keeping the chosen strikes visible makes the metric explainable when it moves.</p>
<p>The output is a table with one row per snapshot timestamp and the computed metrics.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/5590b162-5fe7-4713-8f56-edc4c6171ab2.png" alt="ATM IV, skew proxy metrics" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now that we have a clean time series table, we can visualize the two metrics. First, ATM IV. Then, the skew proxy.</p>
<pre><code class="language-python">plt.plot(metrics["asof_ts"], metrics["atm_iv"])
plt.title(f"{symbol} ATM IV over time | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.ylabel("ATM IV (s_vol)")
plt.grid(True)
plt.show()

plt.plot(metrics["asof_ts"], metrics["skew_90_110"])
plt.title(f"{symbol} Skew proxy (IV@0.9S - IV@1.1S) | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.ylabel("Skew proxy")
plt.grid(True)
plt.show()
</code></pre>
<p>Here is the first chart, ATM IV over time.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/0df9b0ff-e02f-4c6b-b4ec-175ddc46522c.png" alt="TSLA ATM IV over time" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>ATM IV tends to move slowly over short windows unless there is a sharp repricing event. In this run, it stays fairly stable, which is a realistic outcome for a short capture. The value here is that the database turns "fairly stable" into something you can quantify and compare later, rather than a vague impression.</p>
<p>Here is the second chart, Skew proxy over time.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/f90243ee-6039-4d7e-94ed-d248eaaf9722.png" alt="TSLA skew proxy" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The skew proxy is more sensitive because it's based on wing points. If it changes, it usually means the downside is being repriced differently from the upside for that expiry. One nuance is that the nearest available strike can change between snapshots, which can create step-like moves even when the surface isn't moving dramatically. That's why we keep k90 and k110 in the metrics table. It keeps the skew plot explainable.</p>
<h2 id="heading-alert-style-thresholds">Alert-Style Thresholds</h2>
<p>Once you have a metrics table per snapshot, adding a monitoring layer is straightforward. The idea isn't to generate trades. It's to flag when the surface moves enough that someone should look closer.</p>
<p>Here we do two checks:</p>
<ul>
<li><p>ATM IV change alert: Flag if ATM IV changes more than a small threshold between snapshots.</p>
</li>
<li><p>Skew change alert: Flag if the skew proxy changes more than a threshold between snapshots.</p>
</li>
</ul>
<pre><code class="language-python">alerts = metrics.copy()

alerts["atm_iv_change"] = alerts["atm_iv"].diff()
alerts["skew_change"] = alerts["skew_90_110"].diff()

atm_thresh = 0.002    
skew_thresh = 0.003   

alerts["atm_alert"] = alerts["atm_iv_change"].abs() &gt;= atm_thresh
alerts["skew_alert"] = alerts["skew_change"].abs() &gt;= skew_thresh

alerts[[
    "asof_ts",
    "atm_iv", "atm_iv_change", "atm_alert",
    "skew_90_110", "skew_change", "skew_alert",
    "atm_strike", "k90", "k110"
]]
</code></pre>
<p>We take the per-snapshot metrics table and compute first differences. Then we compare those changes to thresholds and store boolean flags. The output table keeps both the metrics and the strikes used for the calculations, so any alert is explainable rather than a black box.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/b6805adc-90f6-4c57-8dee-aa6e0ec4d724.png" alt="Alerts dataframe" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this run, the ATM IV alerts are all false, while the skew alert triggers once.</p>
<p>The skew alert fires because the skew proxy jumps by more than the threshold between two snapshots. This is explainable. If you see the table, you can see the strikes used for the proxy changed around the same time (k90 shifts from 340 to 315). Because strikes are discrete, nearest-strike metrics can step even when the surface is not moving dramatically.</p>
<p>To make this easier to read, we also plot the two series and mark alert points.</p>
<pre><code class="language-python">plt.plot(alerts["asof_ts"], alerts["atm_iv"])
for i, r in alerts[alerts["atm_alert"]].iterrows():
    plt.scatter(r["asof_ts"], r["atm_iv"],  s=30, edgecolors="r", alpha=0.6, linewidth=2)
plt.title(f"{symbol} ATM IV with alerts | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.grid(True)
plt.show()

plt.plot(alerts["asof_ts"], alerts["skew_90_110"])
for i, r in alerts[alerts["skew_alert"]].iterrows():
    plt.scatter(r["asof_ts"], r["skew_90_110"], s=30, edgecolors="r", alpha=0.6, linewidth=2)
plt.title(f"{symbol} Skew proxy with alerts | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.grid(True)
plt.show()
</code></pre>
<p>Both plots use the same pattern. Plot the metric as a line, then overlay a marker on any timestamp where the corresponding alert flag is true. This makes it obvious when something crossed the threshold.</p>
<p>This chart represents skew proxy with alerts.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/eff87263-68f0-4132-935d-bdf148e73c82.png" alt="TSLA skew proxy with alerts" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This chart shows one alert marker, which matches what we saw in the table.</p>
<p>The ATM IV plot isn't featured since there are no alert points.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>In this walkthrough, we used SpiderRock MLink's LiveImpliedQuote feed for TSLA and turned it into a small internal database you can query. We stored every snapshot in an append-only history table, maintained a latest view keyed by a stable option identifier, then used that stored data to rebuild a smile, track ATM surface IV and a simple skew proxy, and add a basic alert rule on top.</p>
<p>This fits well in B2B workflows because it turns live analytics into something operational: a dataset you can audit, replay, and monitor. The same pattern works whether you're building an internal dashboard, running routine surface checks for a desk, or doing a quick post-event review without relying on screenshots and one-off notebook runs.</p>
<p>If you want to extend it, the most practical next steps are longer capture windows, tracking multiple symbols, and moving from SQLite to Postgres once the data volume grows. If metric stability becomes important, you can also standardize the slice you track per poll or interpolate IV to fixed moneyness points so skew measures don't step when nearest strikes change.</p>
<p>With that being said, you've reached the end of the article. Hope you learned something new and useful.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Migrate to S3 Native State Locking in Terraform ]]>
                </title>
                <description>
                    <![CDATA[ If you've been running Terraform on AWS for any length of time, you know the setup: an S3 bucket for state storage, a DynamoDB table for state locking, and a handful of IAM policies tying them togethe ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-migrate-to-s3-native-state-locking-in-terraform/</link>
                <guid isPermaLink="false">69fd19239f93a850a430069b</guid>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Terraform ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Infrastructure as code ]]>
                    </category>
                
                    <category>
                        <![CDATA[ S3 ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tolani Akintayo ]]>
                </dc:creator>
                <pubDate>Thu, 07 May 2026 22:58:43 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/9619ad45-15c5-4be7-9221-ed4b76bc2b24.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've been running Terraform on AWS for any length of time, you know the setup: an S3 bucket for state storage, a DynamoDB table for state locking, and a handful of IAM policies tying them together. It works. It has worked for years.</p>
<p>But it has always carried a cost that rarely gets discussed openly. That cost isn't just money, though a DynamoDB table with on-demand billing adds up across multiple teams and environments.</p>
<p>The real cost is complexity. Every new AWS environment needs both resources provisioned before Terraform can manage anything else. Every engineer who sets up their first Terraform backend has to understand why two completely different AWS services are responsible for what is logically one thing: storing and protecting state. And every incident involving a stuck lock has required someone to manually delete a record from DynamoDB to unblock the team.</p>
<p>In November 2024, AWS announced that S3 now supports native object locking for Terraform state files, meaning <strong>DynamoDB is no longer required for state locking</strong>. Terraform 1.10 added support for this feature, and it's now generally available.</p>
<p>In this tutorial, you'll learn:</p>
<ul>
<li><p>What S3 native locking is and how it works</p>
</li>
<li><p>How to set it up from scratch if you're starting a new project</p>
</li>
<li><p>How to migrate an existing S3 + DynamoDB setup to S3 native locking safely</p>
</li>
<li><p>How to verify locking is working and handle edge cases</p>
</li>
</ul>
<p>By the end, you'll have a simpler, cleaner Terraform backend with one fewer AWS resource to manage.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-terraform-state-locking">What Is Terraform State Locking?</a></p>
</li>
<li><p><a href="#heading-what-is-s3-native-state-locking">What Is S3 Native State Locking?</a></p>
</li>
<li><p><a href="#heading-how-s3-native-locking-compares-to-the-s3-dynamodb-approach">How S3 Native Locking Compares to the S3 + DynamoDB Approach</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-part-1-fresh-setup-how-to-configure-s3-native-locking-from-scratch">Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch</a></p>
<ul>
<li><p><a href="#heading-step-1-create-the-s3-bucket-with-versioning-and-encryption">Step 1: Create the S3 Bucket with Versioning and Encryption</a></p>
</li>
<li><p><a href="#heading-step-2-configure-the-terraform-backend-with-native-locking">Step 2: Configure the Terraform Backend with Native Locking</a></p>
</li>
<li><p><a href="#heading-step-3-initialize-and-verify">Step 3: Initialize and Verify</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-part-2-migration-how-to-move-from-s3-dynamodb-to-s3-native-locking">Part 2: Migration – How to Move from S3 + DynamoDB to S3 Native Locking</a></p>
<ul>
<li><p><a href="#heading-step-1-verify-your-current-setup">Step 1: Verify Your Current Setup</a></p>
</li>
<li><p><a href="#heading-step-2-enable-object-lock-on-the-existing-s3-bucket">Step 2: Enable Object Lock on the Existing S3 Bucket</a></p>
</li>
<li><p><a href="#heading-step-3-update-the-terraform-backend-configuration">Step 3: Update the Terraform Backend Configuration</a></p>
</li>
<li><p><a href="#heading-step-4-reinitialize-terraform">Step 4: Reinitialize Terraform</a></p>
</li>
<li><p><a href="#heading-step-5-verify-the-migration">Step 5: Verify the Migration</a></p>
</li>
<li><p><a href="#heading-step-6-clean-up-the-dynamodb-table">Step 6: Clean Up the DynamoDB Table</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-how-to-verify-that-locking-is-working">How to Verify That Locking Is Working</a></p>
</li>
<li><p><a href="#heading-how-to-handle-a-stuck-lock">How to Handle a Stuck Lock</a></p>
</li>
<li><p><a href="#heading-rollback-plan-if-something-goes-wrong">Rollback Plan: If Something Goes Wrong</a></p>
</li>
<li><p><a href="#heading-security-best-practices-for-your-state-bucket">Security Best Practices for Your State Bucket</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-references">References</a></p>
</li>
</ul>
<h2 id="heading-what-is-terraform-state-locking">What is Terraform State Locking?</h2>
<p>Before looking at the new approach, it helps to understand what state locking is solving.</p>
<p>Terraform stores everything it knows about your infrastructure in a <strong>state file</strong> – a JSON document that maps your configuration to real AWS resources. When you run <code>terraform apply</code>, Terraform reads this file, calculates the difference between the current state and your configuration, and makes the necessary changes.</p>
<p>The problem arises when two engineers or two CI/CD pipelines run and try to apply changes at the same time. If both read the state file simultaneously, calculate changes independently, and both try to write back, you get a <strong>race condition</strong>. The second write overwrites changes from the first, and your state is now out of sync with reality. This is a serious problem that can cause resources to be untracked, doubled, or destroyed unexpectedly.</p>
<p><strong>State locking</strong> solves this by creating a lock when any operation starts that could modify state. If a lock already exists, Terraform refuses to proceed and reports who holds the lock and when it was acquired. Only one operation can hold the lock at a time. When the operation completes, the lock is released.</p>
<pre><code class="language-plaintext">Terraform Run A                 State File / Lock                Terraform Run B
(User 1)                         (S3/DynamoDB)                   (User 2)

   |                                   |                            |
   |------- 1. Acquire Lock ----------&gt;|                            |
   |                                   |                            |
   |&lt;------ 2. Lock Granted -----------|                            |
   |                                   |                            |
   |                                   |------- 3. Acquire Lock ---&gt;|
   |            [PROCESSING]           |                            |
   |      (Modifying Infrastructure)   |&lt;------ 4. Lock Denied -----|
   |                                   |        (Wait / Retry)      |
   |                                   |                            |
   |------- 5. Release Lock ----------&gt;|                            |
   |                                   |                            |
   |           [COMPLETED]             |&lt;------ 6. Lock Granted ----|
   |                                   |                            |
   |                                   |       [PROCESSING]         |
   |                                   | (Modifying Infrastructure) |              
   |                                   |                            |
</code></pre>
<h2 id="heading-what-is-s3-native-state-locking">What Is S3 Native State Locking?</h2>
<p>Previously, Terraform's S3 backend used a DynamoDB table as the locking mechanism. When a lock was needed, Terraform wrote a record to DynamoDB with a <code>LockID</code> primary key. DynamoDB's conditional writes guaranteed that only one process could create that record, which is what made the locking atomic.</p>
<p>S3 native locking uses <strong>S3 Object Lock</strong> instead. S3 Object Lock is an S3 feature originally designed to enforce WORM (Write Once, Read Many) compliance for regulatory requirements. AWS extended this capability to support Terraform's state locking workflow.</p>
<p>When S3 native locking is enabled in your Terraform backend:</p>
<ol>
<li><p>Terraform writes your state to an <code>.tfstate</code> object in S3 (as before)</p>
</li>
<li><p>To acquire a lock, Terraform uses <strong>S3's conditional write operations</strong> – specifically the <code>if-none-match</code> conditional header to create a lock file atomically</p>
</li>
<li><p>If the lock file already exists, S3 rejects the write, and Terraform reports that a lock is held</p>
</li>
<li><p>When the operation completes, Terraform deletes the lock file to release the lock.</p>
</li>
</ol>
<p>The key difference from DynamoDB: the entire locking mechanism lives inside S3. No second service. No second set of IAM permissions. No second resource to provision.</p>
<p><strong>Note:</strong> This feature requires Terraform version <strong>1.10.0 or later</strong> and an S3 bucket with <strong>Object Lock enabled</strong>. Object Lock must be enabled at bucket creation time. You can't enable it on an existing bucket through the console or CLI. But there is a supported workaround for existing buckets, which we'll cover in Part 2.</p>
<h2 id="heading-how-s3-native-locking-compares-to-the-s3-dynamodb-approach">How S3 Native Locking Compares to the S3 + DynamoDB Approach</h2>
<table>
<thead>
<tr>
<th><strong>Aspect</strong></th>
<th><strong>S3 + DynamoDB (Old)</strong></th>
<th><strong>S3 Native Locking (New)</strong></th>
</tr>
</thead>
<tbody><tr>
<td><strong>AWS services required</strong></td>
<td>S3 + DynamoDB</td>
<td>S3 only</td>
</tr>
<tr>
<td><strong>IAM permissions needed</strong></td>
<td>S3 + DynamoDB permissions</td>
<td>S3 permissions only</td>
</tr>
<tr>
<td><strong>Terraform version</strong></td>
<td>Any</td>
<td>1.10.0 or later</td>
</tr>
<tr>
<td><strong>Setup complexity</strong></td>
<td>Two resources, two IAM scopes</td>
<td>One resource</td>
</tr>
<tr>
<td><strong>Stuck lock resolution</strong></td>
<td>Delete DynamoDB record</td>
<td>Delete S3 lock file</td>
</tr>
<tr>
<td><strong>Cost</strong></td>
<td>S3 storage + DynamoDB on-demand</td>
<td>S3 storage only</td>
</tr>
<tr>
<td><strong>Object Lock requirement</strong></td>
<td>Not required</td>
<td>Required on S3 bucket</td>
</tr>
<tr>
<td><strong>Locking mechanism</strong></td>
<td>DynamoDB conditional writes</td>
<td>S3 conditional writes (<code>if-none-match</code>)</td>
</tr>
<tr>
<td><strong>State versioning</strong></td>
<td>S3 Versioning (recommended)</td>
<td>S3 Versioning (required for full safety)</td>
</tr>
</tbody></table>
<p>The functional behavior from Terraform's perspective is identical. Locking works the same way. The lock information displayed when a lock is held has the same structure. The only difference is what happens under the hood.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have the following in place:</p>
<ul>
<li><strong>Terraform 1.10.0 or later</strong> installed. Check your version:</li>
</ul>
<pre><code class="language-shell">terraform version
</code></pre>
<p>If you need to upgrade, follow the <a href="https://developer.hashicorp.com/terraform/install">official upgrade guide</a>.</p>
<ul>
<li><strong>AWS CLI</strong> installed and configured with credentials that have permission to create and manage S3 buckets.</li>
</ul>
<pre><code class="language-shell">aws --version
aws sts get-caller-identity   # confirm you're authenticated
</code></pre>
<ul>
<li><p><strong>IAM permissions</strong> to perform the following S3 actions:</p>
<ul>
<li><p><code>s3:CreateBucket</code></p>
</li>
<li><p><code>s3:PutBucketVersioning</code></p>
</li>
<li><p><code>s3:PutBucketEncryption</code></p>
</li>
<li><p><code>s3:PutObjectLegalHold</code></p>
</li>
<li><p><code>s3:PutObjectRetention</code></p>
</li>
<li><p><code>s3:GetObject</code></p>
</li>
<li><p><code>s3:PutObject</code></p>
</li>
<li><p><code>s3:DeleteObject</code></p>
</li>
<li><p><code>s3:ListBucket</code></p>
</li>
</ul>
</li>
<li><p>For the <strong>migration path</strong>: access to your existing Terraform project and the S3 bucket and DynamoDB table currently in use.</p>
</li>
</ul>
<h2 id="heading-part-1-fresh-setup-how-to-configure-s3-native-locking-from-scratch">Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch</h2>
<p>Follow this section if you're starting a new Terraform project and want to use S3 native locking from the beginning.</p>
<h3 id="heading-step-1-create-the-s3-bucket-with-versioning-and-encryption">Step 1: Create the S3 Bucket with Versioning and Encryption</h3>
<p>Object Lock <strong>must be enabled at bucket creation time</strong>. You can't add it afterward through the standard console flow. Create the bucket using the AWS CLI with Object Lock enabled:</p>
<pre><code class="language-shell">aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region us-east-1 \
  --object-lock-enabled-for-bucket
</code></pre>
<p><strong>Note:</strong> For regions other than <code>us-east-1</code>, add the <code>--create-bucket-configuration</code> flag.</p>
<pre><code class="language-shell">aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1 \
  --object-lock-enabled-for-bucket
</code></pre>
<p>Now enable versioning on the bucket. Versioning is required alongside Object Lock and allows Terraform to recover previous state versions if something goes wrong:</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-project-terraform-state \
  --versioning-configuration Status=Enabled
</code></pre>
<p>Enable server-side encryption so your state files are encrypted at rest:</p>
<pre><code class="language-shell">aws s3api put-bucket-encryption \
  --bucket your-project-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
</code></pre>
<p>Block all public access to the bucket. A Terraform state file contains resource IDs, IP addresses, and potentially sensitive values. It should never be publicly accessible:</p>
<pre><code class="language-shell">aws s3api put-public-access-block \
  --bucket your-project-terraform-state \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
</code></pre>
<p>Verify the bucket configuration:</p>
<pre><code class="language-shell"># Confirm Object Lock is enabled
aws s3api get-object-lock-configuration \
  --bucket your-project-terraform-state
 
# Confirm versioning is enabled
aws s3api get-bucket-versioning \
  --bucket your-project-terraform-state
 
# Confirm encryption is configured
aws s3api get-bucket-encryption \
  --bucket your-project-terraform-state
</code></pre>
<p>Expected output for the Object Lock check:</p>
<pre><code class="language-json">{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/2b2e56cf-687f-4932-a61e-ed7cc33ea6f1.png" alt="Terminal showing AWS CLI verification commands confirming S3 bucket is configured correctly with Object Lock, versioning, and encryption enabled" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-2-configure-the-terraform-backend-with-native-locking">Step 2: Configure the Terraform Backend with Native Locking</h3>
<p>In your Terraform project, create or update your <code>backend.tf</code> file:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket = "your-project-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
 
    # Enable S3 native state locking
    # Requires Terraform 1.10.0+ and a bucket with Object Lock enabled
    use_lockfile = true
 
    # Encryption at rest
    encrypt = true
  }
}
</code></pre>
<p>The critical difference from the old configuration is the <code>use_lockfile = true</code> parameter. Notice what is <strong>absent</strong>: there's no <code>dynamodb_table</code> argument. No DynamoDB table. No second service.</p>
<p>Here's a direct comparison of the old and new configurations:</p>
<p><strong>Old configuration (S3 + DynamoDB):</strong></p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket         = "your-project-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # this goes away
  }
}
</code></pre>
<p><strong>New configuration (S3 native locking):</strong></p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket       = "your-project-terraform-state"
    key          = "production/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true   # this replaces dynamodb_table
  }
}
</code></pre>
<h3 id="heading-step-3-initialize-and-verify">Step 3: Initialize and Verify</h3>
<p>Run <code>terraform init</code> to initialize the backend:</p>
<pre><code class="language-shell">terraform init
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
 
Terraform has been successfully initialized!
</code></pre>
<p>Run a plan to confirm everything is working end-to-end:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>If locking is working, you'll see a brief pause while Terraform acquires the lock before the plan output appears. You'll also see the lock information if you look at the S3 bucket&nbsp;– a <code>.tflock</code> file will appear temporarily alongside your state file during the operation and disappear when it completes.</p>
<h2 id="heading-part-2-migration-how-to-move-from-s3-dynamodb-to-s3-native-locking">Part 2: Migration&nbsp;– How to Move from S3 + DynamoDB to S3 Native Locking</h2>
<p>Follow this section if you have an <strong>existing Terraform setup</strong> using an S3 bucket and DynamoDB table for state locking, and you want to migrate to S3 native locking.</p>
<p><strong>Important:</strong> Migration requires a maintenance window or at minimum a period where no Terraform operations are running. You're changing the backend configuration, which means <strong>all team members and CI/CD pipelines must stop running</strong> <code>terraform plan</code> <strong>or</strong> <code>terraform apply</code> <strong>during the migration</strong>. The migration itself takes under 10 minutes.</p>
<h3 id="heading-step-1-verify-your-current-setup">Step 1: Verify Your Current Setup</h3>
<p>Before making any changes, document your existing backend configuration and confirm the state file is accessible:</p>
<pre><code class="language-shell"># Confirm your state file is in S3
aws s3 ls s3://your-existing-bucket/path/to/terraform.tfstate
 
# Confirm the DynamoDB table exists
aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table \
  --query 'Table.TableStatus'
</code></pre>
<p>Check your current <code>backend.tf</code> and note the exact values:</p>
<pre><code class="language-shell"># Your current backend.tf - note these values before changing anything
terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"       # note this
    key            = "path/to/terraform.tfstate"   # note this
    region         = "us-east-1"                   # note this
    encrypt        = true
    dynamodb_table = "your-dynamodb-lock-table"    # this will be removed
  }
}
</code></pre>
<p>Run one final plan to confirm the current state is clean and there are no unexpected changes pending:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>If the plan shows no changes, you're in a safe state to proceed.</p>
<h3 id="heading-step-2-enable-object-lock-on-the-existing-s3-bucket">Step 2: Enable Object Lock on the Existing S3 Bucket</h3>
<p>This is the most important step in the migration. Object Lock can't normally be enabled on an existing bucket. It's a setting that must be configured at creation time.</p>
<p>But AWS provides a way to enable Object Lock on an existing bucket through a support request or through a direct API call that's not exposed in the standard console UI. AWS has officially documented this path for the Terraform migration use case.</p>
<p>Run the following AWS CLI command to enable Object Lock on your <strong>existing</strong> bucket:</p>
<pre><code class="language-bash">aws s3api put-object-lock-configuration \
  --bucket your-existing-bucket \
  --object-lock-configuration '{"ObjectLockEnabled": "Enabled"}'
</code></pre>
<p><strong>Note:</strong> This command enables Object Lock in <strong>governance mode with no default retention</strong>, meaning it enables the locking capability without setting a default retention period on all objects. This is exactly what Terraform's native locking needs: the ability to create and delete lock files, not permanent object retention.</p>
<p>Verify Object Lock is now enabled:</p>
<pre><code class="language-shell">aws s3api get-object-lock-configuration \
  --bucket your-existing-bucket
</code></pre>
<p>Expected output:</p>
<pre><code class="language-json">{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}
</code></pre>
<p>Also verify that versioning is already enabled (it should be if you are running a production Terraform setup):</p>
<pre><code class="language-shell">aws s3api get-bucket-versioning \
  --bucket your-existing-bucket
</code></pre>
<p>Expected output:</p>
<pre><code class="language-json">{
    "Status": "Enabled"
}
</code></pre>
<p>If versioning isn't enabled, enable it before proceeding:</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-existing-bucket \
  --versioning-configuration Status=Enabled
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/cd17df01-3d0a-4f93-9250-3f51627e91c8.png" alt="Terminal output showing successful Object Lock enablement on an existing S3 bucket using the AWS CLI" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-3-update-the-terraform-backend-configuration">Step 3: Update the Terraform Backend Configuration</h3>
<p>Update your <code>backend.tf</code> to remove the <code>dynamodb_table</code> argument and add <code>use_lockfile = true</code>:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket = "your-existing-bucket"
    key    = "path/to/terraform.tfstate"
    region = "us-east-1"
    encrypt = true
 
    # Add this:
    use_lockfile = true
 
    # Remove this line entirely:
    # dynamodb_table = "your-dynamodb-lock-table"
  }
}
</code></pre>
<p>Your updated <code>backend.tf</code> should look like this:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket       = "your-existing-bucket"
    key          = "path/to/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}
</code></pre>
<h3 id="heading-step-4-reinitialize-terraform">Step 4: Reinitialize Terraform</h3>
<p>Run <code>terraform init</code> with the <code>-reconfigure</code> flag. This flag tells Terraform that the backend configuration has changed intentionally and to reinitialize without prompting you to copy state (the state is already in the same bucket):</p>
<pre><code class="language-shell">terraform init -reconfigure
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
 
Terraform has been successfully initialized!
</code></pre>
<p><strong>If you see an error here:</strong> The most common cause is that Object Lock wasn't successfully enabled on the bucket. Re-run the verification from Step 2 before proceeding.</p>
<h3 id="heading-step-5-verify-the-migration">Step 5: Verify the Migration</h3>
<p>Run a plan to confirm Terraform is working correctly with the new backend configuration:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>The plan should:</p>
<ul>
<li><p>Complete successfully</p>
</li>
<li><p>Show the same result as the plan you ran in Step 1 (no changes, or the same changes as before)</p>
</li>
<li><p>NOT mention DynamoDB anywhere in its output</p>
</li>
</ul>
<p>To confirm that locking is actually using S3 instead of DynamoDB, open a second terminal and run a plan while the first one is running. You should see the second terminal output a lock error that mentions S3, not DynamoDB:</p>
<pre><code class="language-plaintext">╷
│ Error: Error acquiring the state lock
│
│Error message: operation error S3: PutObject, https response       error StatusCode: 409,
│ RequestID: ..., api error Conflict: Object lock already exists for this key.
│
│ Lock Info:
│   ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
│   Path:      your-existing-bucket/path/to/terraform.tfstate.tflock
│   Operation: OperationTypePlan
│   Who:       user@hostname
│   Version:   1.10.0
│   Created:   2026-05-06 14:22:01 UTC
│   Info:
╵
</code></pre>
<p>The <code>Path</code> field shows <code>.tfstate.tflock</code>, a file in your S3 bucket, not a DynamoDB record. This confirms that locking is now handled entirely by S3.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/e9abb703-af6e-429c-83bb-2ea2dac43a3a.png" alt="Two terminals showing concurrent terraform plan commands, the second one displays a lock error confirming S3 native locking is working" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-6-clean-up-the-dynamodb-table">Step 6: Clean Up the DynamoDB Table</h3>
<p>Once you've confirmed the migration is working correctly and your team has run at least one successful <code>plan</code> and <code>apply</code> cycle using the new backend, you can remove the DynamoDB table.</p>
<p><strong>Wait at least 24-48 hours before deleting the DynamoDB table</strong> if you have CI/CD pipelines or multiple team members. This gives time to catch any pipeline that wasn't updated with the new backend configuration.</p>
<p>When you're ready, delete the DynamoDB table:</p>
<pre><code class="language-shell">aws dynamodb delete-table \
  --table-name your-dynamodb-lock-table
</code></pre>
<p>Confirm the deletion:</p>
<pre><code class="language-shell">aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">An error occurred (ResourceNotFoundException) when calling the DescribeTable operation:
Requested resource not found
</code></pre>
<p>This error confirms that the table is gone. The migration is complete.</p>
<p>If you provisioned the DynamoDB table using Terraform (which is the recommended pattern), remove the resource from your Terraform configuration and run <code>terraform apply</code> to destroy it via Terraform rather than the CLI directly. This keeps your state clean:</p>
<pre><code class="language-hcl"># Remove this entire block from your Terraform configuration:
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
 
  attribute {
    name = "LockID"
    type = "S"
  }
}
</code></pre>
<p>After removing the block, run:</p>
<pre><code class="language-bash">terraform apply
</code></pre>
<p>Terraform will detect that the DynamoDB table resource has been removed from configuration and will destroy the table.</p>
<h2 id="heading-how-to-verify-that-locking-is-working">How to Verify That Locking Is Working</h2>
<p>After completing either the fresh setup or the migration, use this procedure to independently verify that locking is functioning correctly.</p>
<h3 id="heading-method-1-observe-the-lock-file-during-an-operation">Method 1: Observe the lock file during an operation</h3>
<p>In one terminal, start a long-running plan against a configuration with many resources:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>While it's running, in a second terminal, check for the lock file in S3:</p>
<pre><code class="language-shell">aws s3 ls s3://your-bucket/path/to/ | grep tflock
</code></pre>
<p>You should see a file like:</p>
<pre><code class="language-plaintext">2026-05-06 14:22:01        512 terraform.tfstate.tflock
</code></pre>
<p>After the plan completes, run the same command again. The <code>.tflock</code> file should be gone.</p>
<h3 id="heading-method-2-read-the-lock-file-contents">Method 2: Read the lock file contents</h3>
<p>While a plan is running, download and read the lock file to see its contents:</p>
<pre><code class="language-shell">aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/current.lock &amp;&amp; cat /tmp/current.lock
</code></pre>
<p>Expected output (formatted for readability):</p>
<pre><code class="language-json">{
  "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "Operation": "OperationTypePlan",
  "Info": "",
  "Who": "tolani@dev-machine",
  "Version": "1.10.0",
  "Created": "2026-05-06T14:22:01.123456789Z",
  "Path": "your-bucket/path/to/terraform.tfstate"
}
</code></pre>
<p>This is the same lock information that Terraform displays when a lock is held. It's now a JSON file in S3 rather than a record in DynamoDB.</p>
<h2 id="heading-how-to-handle-a-stuck-lock">How to Handle a Stuck Lock</h2>
<p>With the DynamoDB backend, resolving a stuck lock meant deleting a record from the DynamoDB table. With S3 native locking, it means deleting the <code>.tflock</code> file from S3.</p>
<p>A lock can get stuck if:</p>
<ul>
<li><p>A <code>terraform apply</code> or <code>plan</code> process was killed mid-execution</p>
</li>
<li><p>A CI/CD pipeline runner crashed during a Terraform operation</p>
</li>
<li><p>A network interruption prevented the lock release from completing</p>
</li>
</ul>
<p>Here's how you can check for a stuck lock:</p>
<pre><code class="language-shell">aws s3 ls s3://your-bucket/path/to/ | grep tflock
</code></pre>
<p>If a <code>.tflock</code> file exists and no Terraform operation is currently running, it is a stuck lock.</p>
<p>You can also read the lock to understand who held it:</p>
<pre><code class="language-shell">aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/stuck.lock &amp;&amp; cat /tmp/stuck.lock
</code></pre>
<p>This tells you who (<code>Who</code> field) was running the operation, what operation it was (<code>Operation</code> field), and when it was acquired (<code>Created</code> field).</p>
<p>And you can force-unlock using Terraform like this:</p>
<pre><code class="language-shell">terraform force-unlock LOCK-ID
</code></pre>
<p>Replace <code>LOCK-ID</code> with the <code>ID</code> value from the lock file contents. For example:</p>
<pre><code class="language-shell">terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890
</code></pre>
<p>Terraform will confirm:</p>
<pre><code class="language-plaintext">Do you really want to force-unlock?
  Terraform will remove the lock on the remote state.
  This will allow local Terraform commands to modify this state, even though it
  may be still be in use. Only 'yes' will be accepted to confirm.
 
  Enter a value: yes
 
Terraform state has been successfully unlocked!
</code></pre>
<p>An alternative is to delete the lock file directly via CLI. If <code>terraform force-unlock</code> doesn't work (for example, because you are running in a CI environment without Terraform available), delete the lock file directly:</p>
<pre><code class="language-shell">aws s3 rm s3://your-bucket/path/to/terraform.tfstate.tflock
</code></pre>
<p><strong>Only delete the lock file if you are certain no Terraform operation is currently running.</strong> Deleting a lock that is actively held by a running operation will allow a second concurrent operation to start, which is exactly the race condition locking is designed to prevent.</p>
<h2 id="heading-rollback-plan-if-something-goes-wrong">Rollback Plan: If Something Goes Wrong</h2>
<p>If you encounter problems after migrating, you can roll back to the S3 + DynamoDB setup with these steps.</p>
<p><strong>Step 1: Stop all Terraform operations</strong> in your team and CI/CD pipelines.</p>
<p><strong>Step 2: Recreate the DynamoDB table</strong> if you already deleted it:</p>
<pre><code class="language-shell">aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
</code></pre>
<p><strong>Step 3: Revert</strong> <code>backend.tf</code> to the previous configuration:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"
    key            = "path/to/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # restored
    # Remove: use_lockfile = true
  }
}
</code></pre>
<p><strong>Step 4: Reinitialize:</strong></p>
<pre><code class="language-shell">terraform init -reconfigure
</code></pre>
<p><strong>Step 5: Verify:</strong></p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>The state file hasn't moved, so there's no data loss during a rollback. The only change is which locking mechanism Terraform uses.</p>
<p><strong>Note:</strong> Object Lock being enabled on the S3 bucket doesn't prevent the rollback. Object Lock and DynamoDB locking can coexist, Object Lock simply adds a capability to the bucket. Using <code>dynamodb_table</code> in your backend config tells Terraform to use DynamoDB regardless of whether Object Lock is enabled on the bucket.</p>
<h2 id="heading-security-best-practices-for-your-state-bucket">Security Best Practices for Your State Bucket</h2>
<p>Migrating to S3 native locking is a good opportunity to review the overall security configuration of your state bucket. Here are the practices every production Terraform state bucket should implement:</p>
<h3 id="heading-enable-versioning-required">Enable Versioning (Required)</h3>
<p>Versioning is a hard requirement for S3 native locking to work safely. It ensures that if a state file is accidentally overwritten or corrupted, you can restore a previous version.</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-state-bucket \
  --versioning-configuration Status=Enabled
</code></pre>
<h3 id="heading-block-all-public-access-non-negotiable">Block All Public Access (Non-Negotiable)</h3>
<p>Your state file contains resource ARNs, IP addresses, and may contain sensitive values passed through Terraform variables. It must never be publicly accessible.</p>
<pre><code class="language-shell">aws s3api put-public-access-block \
  --bucket your-state-bucket \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
</code></pre>
<h3 id="heading-enable-server-side-encryption">Enable Server-Side Encryption</h3>
<p>Always encrypt state files at rest. AES256 is the minimum. If your organization requires KMS key management:</p>
<pre><code class="language-shell">aws s3api put-bucket-encryption \
  --bucket your-state-bucket \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/your-kms-key-id"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
</code></pre>
<h3 id="heading-apply-least-privilege-iam-permissions">Apply Least-Privilege IAM Permissions</h3>
<p>The role or user that Terraform uses to access the state bucket should have only the permissions it needs. Here's a minimal IAM policy for S3 native locking:</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TerraformStateAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-state-bucket",
        "arn:aws:s3:::your-state-bucket/*"
      ]
    },
    {
      "Sid": "TerraformStateLocking",
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectLegalHold",
        "s3:PutObjectLegalHold",
        "s3:GetObjectRetention",
        "s3:PutObjectRetention"
      ],
      "Resource": "arn:aws:s3:::your-state-bucket/*.tflock"
    }
  ]
}
</code></pre>
<p>Notice what is absent: there are no DynamoDB permissions. This is a cleaner, smaller permission set than the old approach required.</p>
<h3 id="heading-enable-access-logging">Enable Access Logging</h3>
<p>Log all access to your state bucket in CloudTrail or S3 server access logs. This gives you an audit trail of every time state was read, written, or locked:</p>
<pre><code class="language-shell">aws s3api put-bucket-logging \
  --bucket your-state-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "your-logging-bucket",
      "TargetPrefix": "terraform-state-access/"
    }
  }'
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>AWS S3 native state locking removes the need for a DynamoDB table from your Terraform backend setup. The result is simpler infrastructure, a smaller IAM permission surface, and one fewer service to provision, monitor, and pay for across every environment your team manages.</p>
<p>Here's a summary of what you accomplished:</p>
<ul>
<li><p>Understood what state locking is and why it's required for safe Terraform operations</p>
</li>
<li><p>Compared S3 native locking to the existing S3 + DynamoDB approach</p>
</li>
<li><p>Set up a fresh Terraform backend using S3 native locking with correct bucket configuration</p>
</li>
<li><p>Migrated an existing backend from S3 + DynamoDB to S3 native locking safely</p>
</li>
<li><p>Learned how to verify locking, handle stuck locks, and roll back if needed</p>
</li>
<li><p>Applied security best practices to the state bucket</p>
</li>
</ul>
<p>This pattern – using S3 native locking – is the recommended approach for all new Terraform projects on AWS going forward. If you're managing a large estate with multiple Terraform backends, consider automating the migration using a script or Terraform module that applies the pattern across all your state buckets.</p>
<p><em>If you are building or optimizing cloud infrastructure for a startup and want a complete reference for production-ready Terraform modules, CI/CD pipeline patterns, and infrastructure runbooks, check out</em> <a href="https://coachli.co/tolani-akintayo/PR-H4oQS">The Startup DevOps Field Guide</a><em>. It covers the full lifecycle of AWS infrastructure from initial setup to production reliability.</em></p>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a href="https://developer.hashicorp.com/terraform/language/backend/s3#use_lockfile">HashiCorp - S3 Backend Configuration: use_lockfile</a></p>
</li>
<li><p><a href="https://github.com/hashicorp/terraform/releases/tag/v1.10.0">HashiCorp: Terraform 1.10 Release Notes</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html">AWS Docs: S3 Object Lock Overview</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectLockConfiguration.html">AWS Docs: PutObjectLockConfiguration API</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-requests.html">AWS Docs: S3 Conditional Writes</a></p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/language/state/locking">HashiCorp: Backend State Locking</a></p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/cli/commands/force-unlock">HashiCorp: terraform force-unlock Command</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html">AWS Docs: Enabling S3 Versioning</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html">AWS Docs: S3 Server-Side Encryption</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
