<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Testing - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Testing - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Wed, 27 May 2026 16:21:18 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/testing/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How I Tested Malaysia's Open Data Portals with Plain English ]]>
                </title>
                <description>
                    <![CDATA[ Most end-to-end test suites drive a real browser and click through an app like a user. They check whether a page renders and whether elements appear. But they don't check whether the numbers on those  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-i-tested-malaysia-s-open-data-portals-with-plain-english/</link>
                <guid isPermaLink="false">69eaad32904b915438ce46f9</guid>
                
                    <category>
                        <![CDATA[ postmark ]]>
                    </category>
                
                    <category>
                        <![CDATA[ playwright ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation testing  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ breakingappshackathon ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tech With RJ ]]>
                </dc:creator>
                <pubDate>Thu, 23 Apr 2026 23:37:22 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/605584805f8d5121697263ca/d4859bd4-15d5-4bb7-ba9e-d4693c90163d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most end-to-end test suites drive a real browser and click through an app like a user. They check whether a page renders and whether elements appear.</p>
<p>But they don't check whether the numbers on those elements are correct. A data-pipeline bug that shows Malaysia's population as 3.4 million instead of the real 34 million slips past every selector test in the suite.</p>
<p>The element still exists. A number still renders. The page still looks right. But the bug ships and sits there until a human notices.</p>
<p>I work as a full-stack engineer. Writing end-to-end (E2E) tests with <a href="https://playwright.dev">Playwright</a> and unit tests with <a href="https://jestjs.io">Jest</a> is part of my day job. I also use <a href="https://github.com/microsoft/playwright-mcp">Playwright MCP</a>, the bridge between AI assistants like Claude and a running browser, when I need to generate first-draft test code or debug a flow.</p>
<p>None of that tooling closes the maintenance tax on selector-based suites. Every E2E suite I keep alive at work accumulates <code>data-testid</code> selectors, <code>waitForSelector</code> calls, and tests that break because someone renamed a button.</p>
<p>Bug0's <a href="https://hashnode.com/hackathons/breaking-things">Breaking Apps Hackathon</a> gave me a pretext to try something different. Over a weekend, <a href="https://github.com/LeeRenJie/passmark-hackathon">I built an automated regression suite</a> for Malaysia's three public open data portals, <a href="https://data.gov.my">data.gov.my</a>, <a href="https://open.dosm.gov.my">OpenDOSM</a>, and <a href="https://data.moh.gov.my">KKMNow</a>, using <a href="https://github.com/bug0inc/passmark">Passmark</a>, Bug0's open-source AI-driven Playwright library.</p>
<p>The tests are written in plain English. Two AI models verify each assertion. A third arbitrates disagreements.</p>
<h3 id="heading-what-youll-find-below">What You'll Find Below:</h3>
<ul>
<li><p>How to write an E2E test that checks whether a dashboard's numbers are correct, not only whether the page renders</p>
</li>
<li><p>A specific assertion pattern (range-bounded KPIs) that catches an entire class of data-pipeline bug that selector tests miss, with working examples ready to copy</p>
</li>
<li><p>A cross-field math assertion that takes one sentence in Passmark and around a hundred lines of code without it</p>
</li>
<li><p>How Passmark's own failure explanations became my debugging loop (the single biggest shift in how I'll write E2E tests going forward)</p>
</li>
<li><p>The real limits: a 14% cache-hit rate, a dependency on OpenRouter, and what two-model voting fails to catch</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-why-malaysias-open-data-portals">Why Malaysia's Open Data Portals</a>?</p>
</li>
<li><p><a href="#heading-what-is-passmark">What Is Passmark</a>?</p>
</li>
<li><p><a href="#heading-the-hero-spec-range-bounded-assertions">The Hero Spec: Range-Bounded Assertions</a></p>
<ul>
<li><a href="#heading-what-two-model-voting-doesnt-catch">What Two-Model Voting Doesn't Catch</a></li>
</ul>
</li>
<li><p><a href="#heading-going-further-cross-field-math">Going Further: Cross-Field Math</a></p>
</li>
<li><p><a href="#heading-what-i-found-across-three-runs">What I Found Across Three Runs</a></p>
<ul>
<li><p><a href="#heading-the-debugging-loop">The Debugging Loop</a></p>
</li>
<li><p><a href="#heading-the-two-specs-that-still-fail-are-the-most-interesting">The Two Specs That Still Fail Are the Most Interesting</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-what-it-cost-and-why-cache-rate-is-cost-rate">What It Cost, and Why Cache Rate Is Cost Rate</a></p>
</li>
<li><p><a href="#heading-the-pattern-worth-stealing">The Pattern Worth Stealing</a></p>
</li>
<li><p><a href="#heading-honest-verdict">Honest Verdict</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ul>
<h2 id="heading-why-malaysias-open-data-portals">Why Malaysia's Open Data Portals?</h2>
<p>The hackathon suggested targets like Vercel Commerce, Cal.com, and Hashnode. These all would've been solid picks.</p>
<p>But I wanted to test something local and closer to my day-to-day work instead. I also wanted a data-heavy site where the numbers on screen have to be accurate, since I work with numbers on a daily basis too.</p>
<p>Malaysia has three public open-data portals:</p>
<ul>
<li><p><a href="https://data.gov.my">data.gov.my</a>, run by MAMPU, the government's digital transformation agency</p>
</li>
<li><p><a href="https://open.dosm.gov.my">OpenDOSM</a>, run by the Department of Statistics</p>
</li>
<li><p><a href="https://data.moh.gov.my">KKMNow</a>, run by the Ministry of Health</p>
</li>
</ul>
<p>They're public, require no authentication, and have documented APIs. That seemed like a good fit for an automated test suite. The data on them is what Malaysians use every day, so accuracy isn't optional.</p>
<h2 id="heading-what-is-passmark">What Is Passmark?</h2>
<p>Passmark is a Playwright library where the tests read like specs. Here's an example:</p>
<pre><code class="language-typescript">await runSteps({
  page,
  userFlow: "population dashboard smoke",
  steps: [
    { description: "Navigate to https://data.gov.my/dashboard/kawasanku" },
    {
      description: "Wait for the country-level Malaysia view to render",
      waitUntil: "A headline population number is visible",
    },
  ],
  assertions: [
    {
      assertion:
        "The page shows Malaysia's total population as a number greater than 20 million and less than 40 million",
    },
  ],
  test,
  expect,
});
</code></pre>
<p>There are no selectors, no <code>data-testid</code>, and no <code>page.locator()</code>. The assertion expresses what I care about, in the words I would use with a colleague.</p>
<p>On the first run, an AI agent drives the page and caches the resolved Playwright action to Redis. Every run after that replays at native Playwright speed with zero model calls.</p>
<p>When the UI changes and a cached action fails, the AI re-engages only for that step. Two assertion models (Claude and Gemini) vote. A third model arbitrates disagreements.</p>
<h2 id="heading-the-hero-spec-range-bounded-assertions">The Hero Spec: Range-Bounded Assertions</h2>
<p>Range-bounded assertions were the first shape of test I wrote, and the one I came back to most across the suite.</p>
<p>The idea is straightforward: check that a number on the page falls inside a sensible range, not that a specific element exists.</p>
<p>The image below is the Playwright report from the population spec, with all four range-bounded assertions passing.</p>
<img src="https://cdn.hashnode.com/uploads/covers/605584805f8d5121697263ca/4a5f70e6-8a75-489b-8d0d-7d4226653b1a.png" alt="Playwright HTML report detail for the population spec. Passmark's annotation reads: &quot;Total Population (2025) with a value of 34.2 million, which is between 20 million and 40 million.&quot; All four range-bounded assertions pass." style="display:block;margin:0 auto" width="1019" height="1569" loading="lazy">

<p>The range-bounded population test is the one that shows Passmark's real value.</p>
<p>Traditional Playwright asserts DOM structure. It confirms that an element with class <code>kpi-total</code> contains the text <code>34.2 million</code>. That tells you the page rendered, not whether the number makes sense.</p>
<p>A bug that shows Malaysia's population as <code>3.42 million</code> sails past any selector test. The DOM is correct. The number renders. Nothing breaks in the conventional sense.</p>
<p>Passmark reads the page, evaluates the claim, and fails because <code>3.42 million</code> falls outside the sane range. Two models vote. A hallucination by one model alone produces no false pass.</p>
<h3 id="heading-what-two-model-voting-doesnt-catch">What Two-Model Voting Doesn't Catch</h3>
<p>Voting defends against one model misreading the page. It doesn't defend against both models misreading the page the same way. If the page shows a buggy "3.24 million" and Claude and Gemini both parse it as "32.4 million" because of the same unusual spacing in the DOM, they agree, they vote pass, and the bug ships.</p>
<p>The mitigation is assertion design. Write assertions that are hard to misread. A range check ("between 20 million and 40 million") is harder for a model to get wrong than a prose check ("roughly 34 million"). Numerical bounds leave less room for interpretation than adjectives. The more your assertion looks like a unit test written in English, the less room the models have to disagree.</p>
<h2 id="heading-going-further-cross-field-math">Going Further: Cross-Field Math</h2>
<p>Range-bounded assertions are a good first step. They catch "is this number in the right ballpark?" But they don't catch "do these numbers agree with each other?"</p>
<p>For that, you need cross-field math. If a dashboard shows a total population and a breakdown by gender, those two things are supposed to agree. Male plus female should equal total. Ethnicity breakdown percentages should sum to 100.</p>
<pre><code class="language-typescript">test("Cross-field math: sex breakdown sums to total population", async ({ page }) =&gt; {
  test.setTimeout(180_000);
  await runSteps({
    page,
    userFlow: "population sex breakdown consistency",
    steps: [
      { description: "Navigate to https://data.gov.my/dashboard/kawasanku" },
      {
        description: "Wait for the Malaysia country-level view with breakdown data",
        waitUntil:
          "A headline total population figure is visible and a breakdown by sex is shown on the page",
      },
    ],
    assertions: [
      {
        assertion:
          "The male and female population values shown on the page add up to approximately the headline total population, within a 5% margin",
      },
      {
        assertion:
          "Any percentage-based breakdowns visible on the page (by sex, age, or ethnicity) sum to approximately 100% within a 2 percentage-point margin",
      },
      {
        assertion: "No breakdown value is negative or greater than the headline total",
      },
    ],
    test,
    expect,
  });
});
</code></pre>
<p>Try writing that in vanilla Playwright. You need selectors for the headline number, selectors for the breakdown components, number parsing with a comma-aware regex, and a margin calculation. Seventy to a hundred lines of code to verify three invariants a primary school student would call obvious.</p>
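<p>For comparison, here is a minimal sketch of only the arithmetic half of that hand-written version, assuming the values have already been located and parsed. The <code>PopulationBreakdown</code> shape and the function names are hypothetical, the margins mirror the ones in the spec above, and the selector and parsing work that makes up most of the line count is left out.</p>

```typescript
// Hypothetical shape for values already scraped and parsed from a dashboard.
interface PopulationBreakdown {
  total: number;
  male: number;
  female: number;
  percentages: number[]; // shares of one breakdown, in percent
}

// True when `actual` is within `marginFraction` of `expected` (0.05 means 5%).
function withinRelativeMargin(actual: number, expected: number, marginFraction: number): boolean {
  return Math.abs(actual - expected) <= Math.abs(expected) * marginFraction;
}

// Returns human-readable violations; an empty list means every invariant holds.
function checkBreakdown(data: PopulationBreakdown): string[] {
  const problems: string[] = [];

  // Invariant 1: male + female is within 5% of the headline total.
  if (!withinRelativeMargin(data.male + data.female, data.total, 0.05)) {
    problems.push("male + female is not within 5% of total");
  }

  // Invariant 2: percentages sum to 100 within 2 percentage points.
  const percentSum = data.percentages.reduce((sum, p) => sum + p, 0);
  if (Math.abs(percentSum - 100) > 2) {
    problems.push("percentages do not sum to 100 within 2 points");
  }

  // Invariant 3: no part is negative or larger than the total.
  if ([data.male, data.female].some((v) => v < 0 || v > data.total)) {
    problems.push("a breakdown value is negative or exceeds the total");
  }

  return problems;
}
```

<p>Returning a list of violations instead of a single boolean keeps a little of the explanatory value of a plain-English failure message, although it still says nothing about where on the page the numbers came from.</p>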
<p>The Passmark version is one spec. I ran it against <a href="https://data.gov.my/dashboard/kawasanku">Kawasanku's</a> live country view. All three assertions passed in 1.4 minutes. Passmark's annotation, verbatim:</p>
<blockquote>
<p><em>"The headline total population figure 'Malaysia has a population of 32,447,385 people.' is visible, and 'Gender And Age Distribution' is shown, which implies a breakdown by sex (male, female) will be available."</em></p>
</blockquote>
<p>Two models read the page, extract the numbers, do the arithmetic, and agree. When the dashboard changes layout in three months, the same assertion still works, because it never named a selector.</p>
<p>This is the class of test I want running against every dashboard product that I touch. Financial totals matching their line items. Percentages that sum to 100. Inventory counts equal to the sum of warehouse locations. This rarely gets checked today, because writing the check by hand outweighs the perceived value of running it.</p>
<h2 id="heading-what-i-found-across-three-runs">What I Found Across Three Runs</h2>
<table>
<thead>
<tr>
<th>Run</th>
<th>Passed</th>
<th>Key change</th>
</tr>
</thead>
<tbody><tr>
<td>1</td>
<td>4 of 13 (31%)</td>
<td>Baseline. Wrote specs without looking at the target pages</td>
</tr>
<tr>
<td>2</td>
<td>8 of 13 (62%)</td>
<td>Rewrote five over-specified assertions using Passmark's own feedback</td>
</tr>
<tr>
<td>3</td>
<td>12 of 13 (92%)</td>
<td>Dropped one more wrong assertion, bumped timeouts, added retry, installed WebKit</td>
</tr>
</tbody></table>
<img src="https://cdn.hashnode.com/uploads/covers/605584805f8d5121697263ca/0181e92e-3341-4f1a-9830-cd4acd305e1c.png" alt="Playwright HTML report overview page showing the final run. 11 tests passed, 2 failed, total time 21.1 minutes across 13 specs." style="display:block;margin:0 auto" width="1012" height="1460" loading="lazy">

<p>Every passing spec after run 1 came from Passmark telling me, in plain English, why my assertion didn't match the page.</p>
<p>Here are three examples from run 1:</p>
<p>For <code>dataset-detail.spec.ts</code>, I asserted "an API usage snippet (curl or JS) is shown," knowing that the page uses Python, because I wanted to see what the result would be. Passmark replied:</p>
<blockquote>
<p><em>"The page contains API usage snippets, but they are specifically for Python using the requests library. There are no snippets provided in curl or JavaScript formats."</em></p>
</blockquote>
<p>The page had snippets. I asked for the wrong languages. Fix: accept any language.</p>
<p>For <code>dashboard-population.spec.ts</code>, I asserted "a chart visualizing population by age or ethnicity is rendered." Passmark replied:</p>
<blockquote>
<p><em>"The current page displays charts for vital statistics such as Live Births, Deaths, and Natural Increase over time, but there is no chart visualizing population specifically by age groups or ethnicity."</em></p>
</blockquote>
<p>The charts are there. Not the slice I guessed. Fix: accept any chart about population.</p>
<p>For <code>kkmnow/hospital-utilisation.spec.ts</code>, I asserted a "headline bed-utilisation percentage." Passmark replied:</p>
<blockquote>
<p><em>"While there are multiple bed-utilisation percentages listed in tables and rankings further down the page, there is no prominent, top-level headline KPI figure displaying the overall bed-utilisation percentage."</em></p>
</blockquote>
<p>The numbers are there. I had asked for a layout the designers didn't build.</p>
<p><strong>This is the killer feature:</strong> Passmark's failure messages aren't stack traces. They're explanations. The AI read the page, compared it against my words, and pointed me at the fix. Nothing like a selector-based test throwing <code>TimeoutError: waiting for locator</code>.</p>
<h3 id="heading-the-debugging-loop">The Debugging Loop</h3>
<p>Once I saw the pattern, the loop became my main technique. Here's the procedure:</p>
<ol>
<li><p>Read the failure message word for word. Don't skim.</p>
</li>
<li><p>Trust it as a description of what is on the page. The AI has read the page. Your assertion has not.</p>
</li>
<li><p>Rewrite the assertion so it matches what's on the page. Broaden, narrow, or restate.</p>
</li>
<li><p>Run it again.</p>
</li>
</ol>
<p>The discipline is to not argue with the tool. The page is what the page is. Your assertion is what is wrong. Every time I tried to "fix" the page (convinced my assertion was right and the site was broken), I lost some time. Every time I took the failure message at face value and rewrote, the test passed on the next run.</p>
<p>This is one of the changes in how I'll write E2E tests going forward. The feedback loop is the tool. Every failed assertion is a draft of the correct one.</p>
<h3 id="heading-the-two-specs-that-still-fail-are-the-most-interesting">The Two Specs That Still Fail Are the Most Interesting</h3>
<h4 id="heading-1-the-two-models-disagreed-and-the-arbiter-call-failed">1. The two models disagreed and the arbiter call failed.</h4>
<p>On <code>catalogue-search.spec.ts</code>, Claude voted fail (72% confidence) and Gemini voted pass (100% confidence) on the same assertion. I had written the assertion in a way that could be read two ways.</p>
<p>Passmark escalated to an arbiter model through OpenRouter. The call came back with a 504 from Cloudflare. The arbiter never ran. The suite failed the spec.</p>
<p>This is an honest limit, not a fluke. Any CI that runs Passmark depends on OpenRouter's availability. External gateway errors happen. My fix for the final run was a global retry wrapper around the OpenRouter call, and the 504 stopped being a problem in practice.</p>
<p>If you bring this to production CI, plan for retries and treat OpenRouter outages as a first-class failure mode in your runbook.</p>
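<p>A retry wrapper of this kind could look roughly like the sketch below. It is not the code from my repo or anything Passmark ships: <code>withRetry</code>, <code>backoffDelayMs</code>, and the default attempt count and delay are all assumptions for illustration, and the wrapped operation stands in for whatever call reaches the external gateway.</p>

```typescript
// Exponential backoff delay for a zero-based attempt index.
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

// Runs `operation` up to `maxAttempts` times, waiting longer after each failure.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Only sleep when another attempt is still to come.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseDelayMs);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

<p>This version retries on any thrown error. In a real runbook you would probably retry only on gateway-style failures such as the 504 above, and let genuine assertion failures surface immediately.</p>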
<img src="https://cdn.hashnode.com/uploads/covers/605584805f8d5121697263ca/9e08d49a-00e6-4803-a86e-decfa0534308.png" alt="Playwright HTML report detail for the catalogue-search failure. Shows Claude and Gemini returning different verdicts on the same assertion, Passmark escalating to an arbiter model, and the arbiter call aborting with a 504 from Cloudflare." style="display:block;margin:0 auto" width="995" height="1259" loading="lazy">

<p>This failure taught me something about assertion design: my wording was ambiguous. Claude's reading was reasonable. Gemini's reading was reasonable. When you write tests in English, being precise about what you mean is part of writing a good test.</p>
<h4 id="heading-2-the-wait-condition-fired-too-early">2. The wait condition fired too early.</h4>
<p>On the KKMNow spec, I had <code>waitUntil: "A utilisation metric is visible"</code>. The page showed the section label "Hospital Bed Utilisation (%)" before the numbers finished loading. The wait step saw the label, decided the condition was met, and moved on. By the time the numbers rendered, the test had run out of time. Once the page was fully loaded, the range assertions would have passed on content.</p>
<blockquote>
<p><em>"The page displays multiple bed-utilisation percentages within the specified range (0% to 120%). For example, the ranked list shows Perlis at 93.1% and Melaka at 88.2%."</em></p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/605584805f8d5121697263ca/2a9c8ade-f2ab-4c58-a56b-f7714bcabef5.png" alt="Playwright HTML report detail for the KKMNow spec. The test times out on the initial waitUntil, but Passmark's annotations show the range and state-selector assertions passed on content once the dashboard hydrated. Example quoted: &quot;Perlis at 93.1% and Melaka at 88.2%.&quot;" style="display:block;margin:0 auto" width="1000" height="1152" loading="lazy">

<p>The lesson: your <code>waitUntil</code> wording needs the same care as your assertion wording. Both are read by AI. A vague wait is as bad as a vague assertion.</p>
<h2 id="heading-what-it-cost-and-why-cache-rate-is-cost-rate">What It Cost, and Why Cache Rate Is Cost Rate</h2>
<p>Each of the three runs took about 20 minutes on 13 specs with a single worker. The hackathon's pooled OpenRouter key covered the AI costs, so I have no personal dollar figure to report.</p>
<p>The more useful cost finding is what gets cached.</p>
<pre><code class="language-bash">$ docker exec passmark-redis redis-cli DBSIZE
5
</code></pre>
<p>Five steps out of roughly 35 were cached across three runs. A 14% cache-hit rate. The Passmark README explains why:</p>
<blockquote>
<p><em>Only steps that produced a single tool call get cached. Multi-step sequences are considered non-deterministic.</em></p>
</blockquote>
<p>Most of my steps described multi-tool sequences. "Open the area selector and choose Selangor, then wait for navigation" becomes click, wait, verify. Those don't cache by design.</p>
<p>This matters for your budget. An 86% miss rate means 86% of your steps call a model on every run. The cost model is per-tool-call via OpenRouter.</p>
<p>To estimate your own bill: count non-atomic steps in your suite, multiply by your chosen model's per-call price at current OpenRouter rates, and the product is your recurring cost per run. Cache rate is cost rate.</p>
<p>The fix is authoring discipline. Split compound descriptions into atomic steps. Treat cache fill rate as a metric you track, not an implementation detail to ignore. A suite with 80% atomic steps pays for 20% of its steps on each run instead of 86%, so it costs roughly a quarter of what a suite with 14% does.</p>
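<p>The arithmetic behind that estimate can be sketched in a few lines. This is a simplification that assumes one billable model call per uncached step, which understates steps that fan out into several tool calls. <code>pricePerCall</code> is a placeholder you would fill in from current OpenRouter rates, not a figure from my runs.</p>

```typescript
// Fraction of steps that miss the cache and so call a model on every run.
function missRate(totalSteps: number, cachedSteps: number): number {
  return (totalSteps - cachedSteps) / totalSteps;
}

// Recurring model cost per run, assuming one billable call per uncached step.
function costPerRun(totalSteps: number, cachedSteps: number, pricePerCall: number): number {
  return (totalSteps - cachedSteps) * pricePerCall;
}

const observed = missRate(35, 5); // 30 of 35 steps uncached
const disciplined = missRate(35, 28); // 80% cached, 7 of 35 uncached
const ratio = disciplined / observed; // relative cost of the disciplined suite

console.log(observed.toFixed(2), disciplined.toFixed(2), ratio.toFixed(2));
// prints "0.86 0.20 0.23"
```

<p>Because the price per call cancels out of the ratio, the comparison between two authoring styles holds whichever model you pick.</p>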
<h2 id="heading-the-pattern-worth-stealing">The Pattern Worth Stealing</h2>
<p>The idea here is bigger than Passmark.</p>
<p><strong>Check that the numbers on your dashboards make sense.</strong> Most teams don't. They should.</p>
<p>A one-line assertion like "the headline number is between 20 million and 40 million" catches several classes of bug regular tests miss.</p>
<p>Here are four common ones:</p>
<ul>
<li><p>The data pipeline divided by the wrong factor, so the number on screen is ten times too small.</p>
</li>
<li><p>A timezone bug made yesterday's total show up under tomorrow's date.</p>
</li>
<li><p>The data never refreshed, so users are looking at last week's numbers.</p>
</li>
<li><p>A locale flip swapped commas and decimals, so 1,234,567 is now reading as 1.234567.</p>
</li>
</ul>
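<p>The stale-data and wrong-date cases in that list reduce to a freshness check. Here is a minimal sketch, assuming the dashboard exposes a "last updated" timestamp you can parse into a <code>Date</code>. The function name and the <code>maxAgeDays</code> threshold are hypothetical.</p>

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when `lastUpdated` is older than `maxAgeDays` relative to `now`,
// or lies in the future (a hint of a timezone or date-handling bug).
function isStaleOrFuture(lastUpdated: Date, now: Date, maxAgeDays: number): boolean {
  const ageMs = now.getTime() - lastUpdated.getTime();
  return ageMs < 0 || ageMs > maxAgeDays * DAY_MS;
}
```

<p>Passing <code>now</code> in as a parameter, instead of calling <code>new Date()</code> inside the function, keeps the check deterministic when you test it.</p>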
<p>Civic portals were my target. The pattern applies anywhere a dashboard shows numbers. Fintech reports, SaaS analytics, healthcare metrics, e-commerce admin panels. Any screen where a number is supposed to mean something.</p>
<p>Most of these numbers never get tested. Writing the check by hand is tedious. You need a selector to find the number, code to parse it, code to handle units, and a margin calculation. Fifty lines for one check. Nobody bothers.</p>
<p>You don't need Passmark to steal the idea. The same check works in plain Playwright with <code>page.evaluate</code> and number parsing. The Passmark version is just more efficient to write and readable by anyone on the team, not only engineers.</p>
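<p>As a rough illustration of the number-parsing half of that plain Playwright version, here is a sketch that turns displayed text into a number and range-checks it. The helper names are hypothetical, and the parser assumes English formatting (commas group thousands, a dot marks decimals), so it would need adapting for other locales.</p>

```typescript
// Multipliers for the unit words this sketch understands.
const UNIT_MULTIPLIERS: Record<string, number> = {
  thousand: 1e3,
  million: 1e6,
  billion: 1e9,
};

// Parses strings such as "34.2 million" or "32,447,385" into a number.
function parseDisplayedNumber(text: string): number {
  const match = text
    .trim()
    .toLowerCase()
    .match(/^([\d,]*\.?\d+)\s*(thousand|million|billion)?/);
  if (!match) {
    throw new Error(`Could not parse a number from "${text}"`);
  }
  const value = Number(match[1].replace(/,/g, ""));
  const multiplier = match[2] ? UNIT_MULTIPLIERS[match[2]] : 1;
  return value * multiplier;
}

// Exclusive bounds, matching "greater than 20 million and less than 40 million".
function isWithinRange(value: number, min: number, max: number): boolean {
  return value > min && value < max;
}

console.log(isWithinRange(parseDisplayedNumber("34.2 million"), 20e6, 40e6)); // prints true
console.log(isWithinRange(parseDisplayedNumber("3.42 million"), 20e6, 40e6)); // prints false
```

<p>In a real test the input string would come from the page, for example from <code>await page.locator(".kpi-total").innerText()</code>, where <code>.kpi-total</code> is whatever selector your dashboard actually uses.</p>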
<h2 id="heading-honest-verdict">Honest Verdict</h2>
<p>Passmark works. Across three runs I went from 4 of 13 passing to 12 of 13 without touching a selector, guided by the tool's own feedback.</p>
<p>Still, the caveats are real:</p>
<ul>
<li><p>On a cold cache, every step waits for a model. Budget more wall-clock time than a selector suite.</p>
</li>
<li><p>In my suite only 14% of steps cached. The other 86% pays model cost on every run. Authoring discipline (atomic steps) is the difference between cents and dollars per run.</p>
</li>
<li><p>Two-model voting doesn't protect against both models misreading the same way. Write assertions that are hard to misread.</p>
</li>
<li><p>Every assertion depends on OpenRouter's availability. External gateway errors need a retry strategy before this runs in CI.</p>
</li>
</ul>
<p>What stuck with me: Passmark didn't make me better at Playwright. It made me write tests I would have skipped otherwise.</p>
<p>What I imagine myself doing at work:</p>
<ul>
<li><p>Run a small nightly Passmark suite against the critical dashboards, focused on range and freshness checks.</p>
</li>
<li><p>Keep traditional Playwright and Jest for everything that has to be fast and deterministic.</p>
</li>
<li><p>Treat every Passmark failure message as a specification of the page, not an error to argue with.</p>
</li>
</ul>
<p>Try this, even if you never touch Passmark. Pick a number on a dashboard you work with. Write a test that fails if the number is outside a sane range. See what breaks. That is the whole pattern and purpose of this article.</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p>Repo: <a href="https://github.com/LeeRenJie/passmark-hackathon">github.com/LeeRenJie/passmark-hackathon</a></p>
</li>
<li><p>Passmark: <a href="https://github.com/bug0inc/passmark">github.com/bug0inc/passmark</a></p>
</li>
<li><p>Breaking Apps Hackathon: <a href="https://hashnode.com/hackathons/breaking-things">hashnode.com/hackathons/breaking-things</a></p>
</li>
<li><p>Test targets: <a href="https://data.gov.my">data.gov.my</a>, <a href="https://open.dosm.gov.my">OpenDOSM</a>, <a href="https://data.moh.gov.my">KKMNow</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Data Quality Handbook: Data Errors, the Developer's Role, and Validation Layers Explained. ]]>
                </title>
                <description>
                    <![CDATA[ In August 2012, Knight Capital, a major trading firm in the United States, deployed faulty trading software to its production system. The system used this incorrect configuration data and it triggered ]]>
                </description>
                <link>https://www.freecodecamp.org/news/data-quality-handbook-data-errors-the-developer-s-role-validation-layers/</link>
                <guid isPermaLink="false">69dea3b491716f3cfb75fd9d</guid>
                
                    <category>
                        <![CDATA[ data ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Validation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Great John ]]>
                </dc:creator>
                <pubDate>Tue, 14 Apr 2026 20:29:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/4f0c9085-cb4f-4255-b7a0-e146eafc32c9.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In August 2012, Knight Capital, a major trading firm in the United States, deployed faulty trading software to its production system. The system ran with incorrect configuration data, which triggered millions of unintended stock trades.</p>
<p>The company lost about $440 million in just 45 minutes. Knight Capital nearly collapsed and had to be rescued by investors. It was later acquired by another firm.</p>
<p>When Target expanded into Canada, the company relied on a new supply chain system that contained incorrect product and inventory data. Product information in the database was incomplete and inaccurate. Prices, sizes, and product descriptions were entered incorrectly.</p>
<p>Inventory systems reported items in stock that were actually unavailable. Customers found empty shelves in stores despite the system showing stock. The company lost over $2 billion in the Canadian market. Target eventually shut down all Canadian stores in 2015.</p>
<p>One employee put it this way: “Even though we had a great supply chain system on paper, we didn’t have accurate data. Bad data leads to bad decisions.”</p>
<p>Another famous example of data-related engineering failures involves the Mars Climate Orbiter spacecraft. One engineering team used metric units (newtons). Another team used imperial units (pounds-force). The system failed to convert the data correctly. The spacecraft entered Mars' atmosphere at the wrong altitude. The mission failed and the spacecraft was destroyed. The loss was about $125 million.</p>
<p>In this article, we'll delve deep into what data quality truly means, the types of data errors that silently break systems, the developer’s responsibility in preventing them, and the validation layers that work together to keep bad data out of production.</p>
<h3 id="heading-what-well-cover">What We'll Cover:</h3>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-the-importance-of-data-quality">The Importance of Data Quality</a></p>
<ul>
<li><p><a href="#heading-how-does-bad-data-happen-in-the-first-place">How Does Bad Data Happen in the First Place?</a></p>
</li>
<li><p><a href="#heading-the-cost-of-bad-data">The Cost of Bad Data</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-types-of-data-errors">Types of Data Errors</a></p>
<ul>
<li><p><a href="#heading-required-field-errors">Required Field Errors</a></p>
</li>
<li><p><a href="#heading-format-validation-errors">Format Validation Errors</a></p>
</li>
<li><p><a href="#heading-range-and-limit-errors">Range and Limit Errors</a></p>
</li>
<li><p><a href="#heading-logical-consistency-errors">Logical Consistency Errors</a></p>
</li>
<li><p><a href="#heading-duplicate-and-data-integrity-errors">Duplicate and Data Integrity Errors</a></p>
</li>
<li><p><a href="#heading-relational-errors-reference-integrity">Relational Errors (Reference Integrity)</a></p>
</li>
<li><p><a href="#heading-structural-errors-dropdowns-radio-buttons-enums">Structural Errors (Dropdowns, Radio Buttons, Enums)</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-what-makes-good-data">What Makes Good Data?</a></p>
<ul>
<li><p><a href="#heading-completeness">Completeness:</a></p>
</li>
<li><p><a href="#heading-uniqueness">Uniqueness:</a></p>
</li>
<li><p><a href="#heading-validity">Validity:</a></p>
</li>
<li><p><a href="#heading-timeliness">Timeliness:</a></p>
</li>
<li><p><a href="#heading-accuracy">Accuracy:</a></p>
</li>
<li><p><a href="#heading-consistency">Consistency:</a></p>
</li>
<li><p><a href="#heading-fitness-for-purpose">Fitness for Purpose:</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-data-validation-layers">Data Validation Layers</a></p>
<ul>
<li><p><a href="#heading-frontend-layer-protect-the-user-not-the-system">Frontend Layer — “Protect the User, Not the System”</a></p>
</li>
<li><p><a href="#heading-backend-validation-the-real-gatekeeper">Backend Validation — “The Real Gatekeeper”</a></p>
</li>
<li><p><a href="#heading-database-layer-protect-the-data-at-rest">Database Layer — “Protect the Data at Rest”</a></p>
</li>
<li><p><a href="#heading-service-layer-business-logic-validate-real-world-rules">Service Layer / Business Logic — “Validate Real-World Rules”</a></p>
</li>
<li><p><a href="#heading-jobs-queues-data-ingestion-validate-external-data">Jobs / Queues / Data Ingestion — “Validate External Data”</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-testing-strategies-to-protect-data-quality">Testing Strategies to Protect Data Quality</a></p>
<ul>
<li><p><a href="#heading-unit-testing-the-schema-amp-constraint-check">Unit Testing: The Schema &amp; Constraint Check</a></p>
</li>
<li><p><a href="#heading-integration-testing-the-flow-amp-lineage-check">Integration Testing: The Flow &amp; Lineage Check</a></p>
</li>
<li><p><a href="#heading-functional-testing-the-business-rule-check">Functional Testing: The Business Rule Check</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>A basic understanding of what data is</p>
</li>
<li><p>A basic understanding of data structures</p>
</li>
<li><p>An understanding of what an API is</p>
</li>
<li><p>An understanding of what a database is and what it does</p>
</li>
</ul>
<h2 id="heading-the-importance-of-data-quality">The Importance of Data Quality</h2>
<p>As you can see from just these few examples, the quality of the data you're working with really matters.</p>
<p>Gartner reports that organisations attribute <a href="https://www.forbes.com/councils/forbestechcouncil/2021/10/14/flying-blind-how-bad-data-undermines-business/"><strong>around $15 million in annual losses</strong></a> to poor‑quality data. The same research also shows that <a href="https://www.forbes.com/councils/forbestechcouncil/2021/10/14/flying-blind-how-bad-data-undermines-business/"><strong>nearly 60% of companies have no clear idea what bad data actually costs them</strong></a>, largely because they don’t track or measure data‑quality issues at all.</p>
<p>A 2016 study by IBM is even more eye-popping. IBM found that <a href="https://community.sap.com/t5/technology-blog-posts-by-sap/bad-data-costs-the-u-s-3-trillion-per-year/ba-p/13575387">poor data quality strips $3.1 trillion from the U.S. economy annually</a> due to lower productivity, system outages, and higher maintenance costs.</p>
<p>Bad data is, and will continue to be, the kryptonite of any organisation. This is even more concerning as more organisations now depend on data for strategy execution than ever before.</p>
<p>When data is wrong, incomplete, duplicated, or inconsistent, the consequences ripple outward. Incorrect dashboards mislead teams, misled teams make poor decisions, and those decisions turn into faulty strategy and policy.</p>
<p>Eventually, the organisation pays the price, financially, operationally, and reputationally. And while money can be recovered, reputation rarely bounces back so easily.</p>
<h3 id="heading-how-does-bad-data-happen-in-the-first-place">How Does Bad Data Happen in the First Place?</h3>
<p>Form fields are usually the first place where data enters an application, so they’re often where bad data begins. This is why the developer’s role is so critical.</p>
<p>Many of the most damaging data errors don’t originate from malicious users or complex edge cases – they come from simple oversights that the system should never have allowed in the first place.</p>
<p>But it's equally important to recognise that data quality issues often originate <em>before</em> the data ever reaches an application. Upstream processes — how data is collected, measured, recorded, or pre‑validated — can introduce inaccuracies long before the system receives it.</p>
<p>For example, a nurse might weigh a patient using an uncalibrated mechanical scale, record the incorrect value on a paper form, and later have that value transcribed into the hospital system. By the time the data enters the application, the error is already embedded.</p>
<p>This means that maintaining data quality requires attention both to upstream data collection practices and to the system-level validation that developers control.</p>
<p>When the UI, backend, or API layer permits invalid, incomplete, inconsistent, or logically impossible data to enter the pipeline, the organisation inherits a long‑term liability. Even small choices — such as allowing empty fields, ignoring duplicates, or failing to enforce validation rules — can introduce errors that may only surface months later in reports or dashboards, leading to confusion and inaccurate insights.</p>
<h3 id="heading-the-cost-of-bad-data">The Cost of Bad Data</h3>
<p>Data quality can also be impacted at any stage of the data pipeline: before ingestion, in production, or even during analysis.</p>
<p>In terms of cost, bad data caught in the UI is almost free to fix. Caught at the API layer, it's still cheap. Caught in the database, the cost is moderate. Caught months later in a report or ML model, it's expensive and sometimes irreversible.</p>
<p>A key principle in modern data management is: the cheapest and safest place to catch bad data is at the source, and that is before ingestion. <a href="https://www.matillion.com/blog/the-1-10-100-rule-of-data-quality-a-critical-review-for-data-professionals">The well-known 1-10-100 Rule</a>, introduced by George Labovitz and Yu Sang Chang in 1992, clearly illustrates this idea.</p>
<p>According to the rule, it costs about $1 to validate data at the point of entry, $10 to correct it after it has entered the system, and $100 per record if the error goes unnoticed and causes problems further down the line.</p>
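<p>To make the arithmetic concrete, here's a minimal sketch that applies the rule's three figures to a batch of bad records. The stage names (<code>prevention</code>, <code>correction</code>, <code>failure</code>) and the <code>estimateCost</code> function are labels made up for this illustration, and the dollar amounts are the rule's rough ratios rather than measured costs.</p>
<pre><code class="language-javascript">// Illustrative per-record costs from the 1-10-100 rule (in dollars).
const COST_PER_RECORD = { prevention: 1, correction: 10, failure: 100 };

// Returns the estimated cost of handling `badRecords` at a given stage.
function estimateCost(badRecords, stage) {
  if (!Object.hasOwn(COST_PER_RECORD, stage)) {
    throw new Error(`Unknown stage: ${stage}`);
  }
  return badRecords * COST_PER_RECORD[stage];
}

console.log(estimateCost(500, "prevention")); // 500
console.log(estimateCost(500, "correction")); // 5000
console.log(estimateCost(500, "failure"));    // 50000
</code></pre>
<p>For the same 500 records, the estimate grows a hundredfold between the first and last stage, which is the whole argument for validating early.</p>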
<p>As the saying goes, an ounce of prevention is worth a pound of cure – and this is especially true when it comes to maintaining high-quality data.</p>
<p>To make this concrete, I’ve categorised the errors and oversights that developers can and should prevent before they ever reach the database, analytics layer, or reporting systems.</p>
<h2 id="heading-types-of-data-errors">Types of Data Errors</h2>
<h3 id="heading-required-field-errors">Required Field Errors</h3>
<p>If you build a registration form that lets a user submit with important fields left empty (like first name, last name, email address, phone number, date of birth, or address), you're letting incomplete data enter the system.</p>
<p>I remember a scenario from my time as a data analyst where I was analysing a dataset containing different types of alarms triggered across several buildings. These alarms fell into categories such as aquarium alarms, intruder alarms, fire alarms, and maintenance alarms.</p>
<p>The purpose of the analysis was simple: identify which buildings had the highest frequency of alarms so that maintenance, resources, or investigations could be allocated appropriately.</p>
<p>Whenever an alarm went off, the security team recorded it using a software system. By the end of each month, we could view the cumulative alarms and generate insights.</p>
<p>But I encountered a major data quality issue. The security officers often selected the alarm category but failed to submit the building where the alarm occurred — and the system allowed this incomplete record to be saved into the database.</p>
<p>Every alarm had to occur in a specific building. Yet during analysis, I would see entries like “20 fire alarms” with no building information attached. Since I couldn’t determine where these alarms happened, the data became unusable. I had no choice but to delete those records because they provided no actionable value.</p>
<p>This is a classic example of poor data validation. If the developer had implemented proper constraints, the system would never allow an alarm to be submitted without a building name.</p>
<p>Required fields should be enforced at the UI and backend levels to prevent missing data from entering the system in the first place. These gaps lead to missing or unusable data in the database, often forcing teams to delete or manually repair records later.</p>
<p>To prevent these errors, you can use required‑field validation, disable the submit button until all mandatory fields are completed, and visually highlight missing fields with inline error messages.</p>
<p>Here's an example of a form with no required checks:</p>
<pre><code class="language-plaintext">&lt;form id="signup"&gt;
  &lt;input type="text" id="name" placeholder="Full name"&gt;
  &lt;input type="email" id="email" placeholder="Email"&gt;
  &lt;button type="submit"&gt;Sign up&lt;/button&gt;
&lt;/form&gt;

&lt;script&gt;
document.getElementById("signup").addEventListener("submit", e =&gt; {
  const name = document.getElementById("name").value;
  const email = document.getElementById("email").value;
  console.log("Submitted:", { name, email });
});
&lt;/script&gt;
</code></pre>
<p>From the above code snippet, the core problem is that the form doesn't enforce required input. Neither HTML‑level validation (using the <code>required</code> attribute) nor JavaScript‑based checks are implemented. This omission allows users to submit the form without providing necessary information, making the form unreliable for collecting valid and complete user data.</p>
<p>From a usability and data quality perspective, this is problematic. Forms are typically designed to collect meaningful and complete information, and fields such as “Full name” and “Email” are usually essential. Without marking these inputs as required or validating them programmatically, we risk receiving blank or invalid submissions, which can compromise the quality of stored data and any processes that depend on it.</p>
<p>Here's an example of a better version (UI prevents empty submission):</p>
<pre><code class="language-plaintext">&lt;form id="signup"&gt;
  &lt;input type="text" id="name" placeholder="Full name" required&gt;
  &lt;input type="email" id="email" placeholder="Email" required&gt;
  &lt;button type="submit"&gt;Sign up&lt;/button&gt;
&lt;/form&gt;

&lt;script&gt;
document.getElementById("signup").addEventListener("submit", e =&gt; {
  if (!e.target.checkValidity()) {
    e.preventDefault();
    alert("Please fill in all required fields.");
  }
});
&lt;/script&gt;
</code></pre>
<p>In this revised version of the code, the addition of the <code>required</code> attribute to both the name and email input elements ensures that the browser won't allow the form to be submitted unless these fields are filled. This is an important step toward maintaining data completeness and improving the overall reliability of the form.</p>
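<p>The handler also calls <code>e.target.checkValidity()</code>. Browsers normally run native validation before the <code>submit</code> event fires, so an invalid form is blocked before this handler runs. The check matters as a fallback when native validation is switched off, for example with the <code>novalidate</code> attribute on the form.</p>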
<p>Another positive aspect is the conditional use of <code>e.preventDefault()</code>. When the form is invalid, the default submission behavior is stopped, preventing incomplete or incorrect data from being sent.</p>
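<p>Since required fields should be enforced on the backend as well as in the UI, the same rule can be written as a plain function that runs before a record is saved. This is a minimal sketch based on the alarm example above. The field names <code>category</code> and <code>building</code>, and the helper <code>findMissingFields</code>, are assumptions for illustration.</p>
<pre><code class="language-javascript">// Hypothetical list of fields an alarm record must contain.
const REQUIRED_ALARM_FIELDS = ["category", "building"];

// Returns the names of required fields that are missing or blank.
function findMissingFields(record, requiredFields) {
  return requiredFields.filter((field) =&gt; {
    const value = record[field];
    return value === undefined || value === null || String(value).trim() === "";
  });
}

const missing = findMissingFields(
  { category: "fire", building: "  " },
  REQUIRED_ALARM_FIELDS
);
console.log(missing); // ["building"]
</code></pre>
<p>If the returned list isn't empty, the backend can reject the request and name the missing fields, so a record like “fire alarm, no building” never reaches the database.</p>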
<h3 id="heading-format-validation-errors">Format Validation Errors</h3>
<p>If your form accepts an email without an @ symbol, an email without a domain, a phone number containing letters, or a postcode/ZIP code in the wrong format, you're allowing invalid data to enter the system.</p>
<p>The same applies when you allow a user to submit an impossible date (32/15/2025) or a credit card number with the wrong length.</p>
<p>These issues force the data analyst to spend more time cleaning the data, if it can be cleaned at all. Incorrect inputs also create unreliable data that breaks downstream processes and increases cleanup costs.</p>
<p>To prevent these types of errors, you can use regex validation, input masks, and field‑type restrictions (for example, numeric‑only fields for phone numbers) to enforce correct formats before submission.</p>
<p>Here's a bad example of allowing format validation errors:</p>
<pre><code class="language-plaintext">&lt;input id="phone" placeholder="Phone number"&gt;
&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
function save() {
  const phone = document.getElementById("phone").value;
  console.log("Saving phone:", phone);
}
&lt;/script&gt;
</code></pre>
<p>This code doesn't perform any checks on the format or structure of the phone number. The function simply retrieves whatever value exists –&nbsp;whether valid, invalid, or blank –&nbsp;and logs it to the console without any condition.</p>
<p>Here's the fixed version:</p>
<pre><code class="language-plaintext">&lt;input id="phone" placeholder="Phone number" required&gt;
&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
function save() {
  const phone = document.getElementById("phone").value;

  if (!/^\d+$/.test(phone)) {
    alert("Phone number must contain digits only.");
    return;
  }

  console.log("Saving phone:", phone);
}
&lt;/script&gt;
</code></pre>
<p>This version fixes the earlier mistake by introducing a clear validation rule. Before the system accepts the phone number, it checks whether the input contains only digits. The regular expression <code>^\d+$</code> ensures that the value is made up entirely of numbers, with no letters or symbols allowed. If the user enters anything invalid, the function stops and displays an error message instead of saving bad data.</p>
<p>This approach prevents the format error that occurred in the previous example. Instead of blindly trusting whatever the user types, the code now enforces a rule that matches the expected format of a phone number. This is what a responsible developer should do: verify the input before using it.</p>
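<p>A digits-only rule still accepts a one-digit “phone number” and rejects common formatting such as spaces or a leading plus sign. Here's a slightly stricter sketch that strips separators first and then checks the length. The 7 to 15 digit range is an assumption for illustration (15 is the maximum length in the E.164 numbering standard), and real rules vary by country.</p>
<pre><code class="language-javascript">// Removes spaces, hyphens, dots, and parentheses that people commonly type.
function normalisePhone(input) {
  return String(input).trim().replace(/[\s().-]/g, "");
}

// Assumed rule: an optional leading "+", then 7 to 15 digits.
function isValidPhone(input) {
  return /^\+?\d{7,15}$/.test(normalisePhone(input));
}

console.log(isValidPhone("020 7946 0958"));      // true
console.log(isValidPhone("+44 (20) 7946-0958")); // true
console.log(isValidPhone("12ab34"));             // false
console.log(isValidPhone(""));                   // false
</code></pre>
<p>Storing the normalised value rather than the raw input also keeps the column in one consistent format.</p>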
<h3 id="heading-range-and-limit-errors">Range and Limit Errors</h3>
<p>Allowing users to enter values outside acceptable limits – such as negative ages, quantities below zero, discounts above 100%, or measurements far beyond realistic ranges – lets data that violates business rules into the system. These errors distort analytics, break calculations, and create operational inconsistencies.</p>
<p>To mitigate these errors, you can apply min/max constraints, sliders, steppers, and numeric boundaries to ensure values fall within valid ranges.</p>
<p>Here's a bad example of allowing range and limit errors:</p>
<pre><code class="language-plaintext">&lt;input id="age" type="number"&gt;
&lt;button onclick="submitAge()"&gt;Submit&lt;/button&gt;

&lt;script&gt;
function submitAge() {
  console.log("Age:", document.getElementById("age").value);
}
&lt;/script&gt;
</code></pre>
<p>As seen above, we've created an input field for age but haven't specified any limits or constraints. The user can type any number, including values that make no sense, such as negative ages, extremely large ages, or decimals. The JavaScript function simply reads the value and logs it without checking whether the age is realistic.</p>
<p>Here's a better version:</p>
<pre><code class="language-plaintext">&lt;input id="age" type="number" min="0" max="120" required&gt;
&lt;button onclick="submitAge()"&gt;Submit&lt;/button&gt;

&lt;script&gt;
function submitAge() {
  const ageInput = document.getElementById("age");
  if (!ageInput.checkValidity()) {
    alert("Age must be between 0 and 120.");
    return;
  }
  console.log("Age:", ageInput.value);
}
&lt;/script&gt;
</code></pre>
<p>Now in this version, the inclusion of the <code>min="0"</code> and <code>max="120"</code> attributes sets clear boundaries for acceptable input values. This ensures that only realistic age values within a defined range are allowed, preventing invalid entries such as negative numbers or excessively large ages.</p>
<p>The JavaScript function further enhances this validation by using the <code>checkValidity()</code> method. This method checks whether the input satisfies all defined constraints, including the required condition and the specified numeric range. If the input doesn't meet these conditions, the function prevents further execution and displays an alert message, informing the user that the entered age must fall within the allowed range.</p>
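<p>The <code>min</code> and <code>max</code> attributes only protect the browser form, so the same boundaries are worth repeating wherever the value is received. This is a minimal sketch of that check as a standalone function. The 0 to 120 range mirrors the example above, and the names <code>MIN_AGE</code>, <code>MAX_AGE</code>, and <code>isValidAge</code> are made up for illustration.</p>
<pre><code class="language-javascript">// Assumed business rule: age is a whole number from 0 to 120.
const MIN_AGE = 0;
const MAX_AGE = 120;

function isValidAge(raw) {
  const text = String(raw).trim();
  // Reject blank input explicitly, because Number("") evaluates to 0.
  if (text === "") return false;
  const age = Number(text);
  return Number.isInteger(age) &amp;&amp; age &gt;= MIN_AGE &amp;&amp; age &lt;= MAX_AGE;
}

console.log(isValidAge("35"));   // true
console.log(isValidAge("-4"));   // false
console.log(isValidAge("35.5")); // false
console.log(isValidAge("500"));  // false
</code></pre>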
<h3 id="heading-logical-consistency-errors">Logical Consistency Errors</h3>
<p>If you allow a user to select an end date before the start date, choose a checkout date earlier than check‑in at a hotel, or enter a delivery date before the order date, this will result in logically impossible data. The same applies when you allow a user to enter a graduation year earlier than their admission to a program, or submit working hours that exceed 24 hours in a day.</p>
<p>You can mitigate this by implementing cross‑field validation, business‑rule checks, and conditional logic that ensures related fields remain consistent.</p>
<p>Here's a bad example of a logical consistency error:</p>
<pre><code class="language-plaintext">&lt;input type="date" id="start"&gt;
&lt;input type="date" id="end"&gt;
&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
function save() {
  console.log({
    start: document.getElementById("start").value,
    end: document.getElementById("end").value
  });
}
&lt;/script&gt;
</code></pre>
<p>In the code above, the core issue is the complete absence of validation. Although the inputs use <code>type="date"</code>, which provides a structured way for users to select dates, the code doesn't enforce that either field is required. This means the user can leave one or both date fields empty, and the <code>save()</code> function will still run and log the values. As a result, the system may end up processing incomplete or meaningless data.</p>
<p>Beyond missing required checks, the code also fails to validate the logical relationship between the two dates. In any scenario involving a start date and an end date, it's expected that the start date shouldn't occur after the end date. But this code performs no such comparison.</p>
<p>This means that the user can select a start date that's later than the end date, and the system will accept it without warning. This leads to inconsistent or impossible data being recorded.</p>
<p>Also, the function simply logs the values without providing any feedback to the user. There's no mechanism to alert the user when a field is empty or when the dates are logically incorrect. This reduces usability and makes it difficult for users to understand or correct their mistakes.</p>
<p>Here's the fixed version:</p>
<pre><code class="language-plaintext">&lt;input type="date" id="start" required&gt;
&lt;input type="date" id="end" required&gt;
&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
function save() {
  const startValue = document.getElementById("start").value;
  const endValue = document.getElementById("end").value;

  // Extra safety: check empties (in case required is bypassed)
  if (!startValue || !endValue) {
    alert("Both start and end dates are required.");
    return;
  }

  const start = new Date(startValue);
  const end = new Date(endValue);

  if (end &lt; start) {
    alert("End date cannot be before start date.");
    return;
  }

  console.log({ start, end });
}
&lt;/script&gt;
</code></pre>
<p>In this improved version, both date fields now include the <code>required</code> attribute. Because the inputs aren't inside a <code>&lt;form&gt;</code>, the browser won't block the click on its own, so the function also checks explicitly that neither value is empty.</p>
<p>Second, we've added a logical validation check to ensure that the relationship between the two dates is correct. After retrieving the values, the function converts them into <code>Date</code> objects and compares them to verify that the end date doesn't occur before the start date. If this condition is violated, the function stops execution and displays an alert informing the user of the error.</p>
<p>This prevents inconsistent or impossible date ranges from being accepted.</p>
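<p>The same cross-field rule can be pulled out of the click handler into a function that returns a result instead of showing an alert, which makes it reusable on the backend and easy to test. This is a sketch under the assumption that dates arrive as <code>YYYY-MM-DD</code> strings, the format a date input produces. The names <code>parseIsoDate</code> and <code>checkDateRange</code> are hypothetical.</p>
<pre><code class="language-javascript">// Parses a "YYYY-MM-DD" string into a timestamp.
// Returns null if the value is missing, malformed, or unparseable.
function parseIsoDate(value) {
  const text = String(value);
  if (!/^\d{4}-\d{2}-\d{2}$/.test(text)) return null;
  const time = Date.parse(text);
  return Number.isNaN(time) ? null : time;
}

// Returns { ok, reason } so the caller decides how to report the problem.
function checkDateRange(startValue, endValue) {
  const start = parseIsoDate(startValue);
  const end = parseIsoDate(endValue);
  if (start === null || end === null) {
    return { ok: false, reason: "Both dates must be valid YYYY-MM-DD values." };
  }
  if (end &lt; start) {
    return { ok: false, reason: "End date cannot be before start date." };
  }
  return { ok: true, reason: null };
}

console.log(checkDateRange("2025-01-05", "2025-01-10").ok); // true
console.log(checkDateRange("2025-01-10", "2025-01-05").ok); // false
</code></pre>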
<h3 id="heading-duplicate-and-data-integrity-errors">Duplicate and Data Integrity Errors</h3>
<p>When you let a user submit an email that's already registered, choose a username that's already taken, or enter a duplicate employee ID or student number, this results in identity conflicts and duplicate records. Problems also arise when you allow users to upload unsupported file types, oversized files, or corrupted images.</p>
<p>Security risks can emerge when users are able to enter HTML/script tags (XSS), SQL‑injection patterns, or disallowed special characters. These issues compromise data quality, system integrity, and security.</p>
<p>You can prevent these types of issues by using uniqueness checks, file‑type and size validation, and input sanitization to block duplicates, invalid uploads, and malicious inputs.</p>
<p>Here's an example of a duplicate error:</p>
<pre><code class="language-plaintext">&lt;input id="email" placeholder="Enter email" required&gt;
&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
const savedEmails = [];

function save() {
  const email = document.getElementById("email").value;
  savedEmails.push(email);
  console.log("Saved emails:", savedEmails);
}
&lt;/script&gt;
</code></pre>
<p>This code blindly pushes every email into the <code>savedEmails</code> array without checking whether the email already exists. Because there is no duplicate detection, the user can enter the same email multiple times.</p>
<p>Here is the fixed version:</p>
<pre><code class="language-plaintext">&lt;input id="email" placeholder="Enter email" required&gt;
&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
const savedEmails = [];

function save() {
  const email = document.getElementById("email").value.trim();

  // Check if the field is empty
  if (!email) {
    alert("Please enter an email before saving.");
    return;
  }

  // Check for duplicate
  if (savedEmails.includes(email)) {
    alert("This email has already been saved.");
    return;
  }

  savedEmails.push(email);
  console.log("Saved emails:", savedEmails);
}
&lt;/script&gt;

</code></pre>
<p>In this improved version of the code, we've implemented proper validation steps to prevent duplicate email entries. Before saving the email, the function checks whether the value already exists in the <code>savedEmails</code> array using the <code>includes()</code> method. If the email is found, the function stops execution and displays an alert informing the user that the email has already been saved. This ensures that each email is stored only once, maintaining the uniqueness and integrity of the data.</p>
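<p>One gap remains in an exact-match check: <code>Ada@Example.com</code> and <code>ada@example.com</code> are treated as different values. Here's a minimal sketch that normalises each email before comparing and uses a <code>Set</code> for the lookup. Treating emails as case-insensitive is an assumption that suits most applications, and the names <code>normaliseEmail</code> and <code>createEmailRegistry</code> are made up for this example.</p>
<pre><code class="language-javascript">// Lowercases and trims so " Ada@Example.com " and "ada@example.com" match.
function normaliseEmail(email) {
  return String(email).trim().toLowerCase();
}

// Returns a small registry that stores each normalised email once.
function createEmailRegistry() {
  const seen = new Set();
  return {
    // Returns true if the email was added, false if blank or already present.
    add(email) {
      const key = normaliseEmail(email);
      if (key === "" || seen.has(key)) return false;
      seen.add(key);
      return true;
    },
    size() {
      return seen.size;
    },
  };
}

const registry = createEmailRegistry();
console.log(registry.add("ada@example.com"));   // true
console.log(registry.add(" Ada@Example.com ")); // false
console.log(registry.size());                   // 1
</code></pre>
<p>An in-memory check like this disappears on page reload, so in a real system the same rule would also need a uniqueness constraint where the data is stored.</p>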
<h3 id="heading-relational-errors-reference-integrity">Relational Errors (Reference Integrity)</h3>
<p>If you let a user select a city that doesn’t belong to the chosen country, a product ID that no longer exists, a retired SKU, or a shipping method unavailable in the selected region, this can result in broken references.</p>
<p>The same applies when users can select a manager from a different department, choose a fully booked time slot, or pick options that their role and permissions shouldn't allow. These errors break relationships between tables and corrupt downstream joins and reports.</p>
<p>Here, you can use dependent dropdowns, real‑time lookups, and foreign‑key validation to help ensure that users can only select valid, existing, and compatible options.</p>
<p>Here's a bad example of a relational error:</p>
<pre><code class="language-plaintext">&lt;select id="country"&gt;
  &lt;option value="uk"&gt;United Kingdom&lt;/option&gt;
  &lt;option value="usa"&gt;United States&lt;/option&gt;
&lt;/select&gt;

&lt;select id="city"&gt;
  &lt;option value="london"&gt;London&lt;/option&gt;
  &lt;option value="manchester"&gt;Manchester&lt;/option&gt;
  &lt;option value="newyork"&gt;New York&lt;/option&gt;
  &lt;option value="losangeles"&gt;Los Angeles&lt;/option&gt;
&lt;/select&gt;

&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
function save() {
  const country = document.getElementById("country").value;
  const city = document.getElementById("city").value;

  console.log("Saving:", { country, city });
}
&lt;/script&gt;
</code></pre>
<p>From the above, the mistake in this code is that we've treated country and city as completely independent fields, even though one is supposed to depend on the other. By presenting all cities regardless of the selected country, the interface allows users to create combinations that make no sense — such as choosing “United Kingdom” with “New York” or “United States” with “Manchester.”</p>
<p>Also, because the <code>save()</code> function performs no validation and simply logs whatever the user selects, the system ends up accepting and storing relationships that should never exist. This breaks the logical link between the two fields and leads to invalid, inconsistent data that can corrupt downstream joins and reports.</p>
<p>Here's the fixed version:</p>
<pre><code class="language-plaintext">&lt;select id="country" onchange="loadCities()" required&gt;
  &lt;option value=""&gt;Select country&lt;/option&gt;
  &lt;option value="uk"&gt;United Kingdom&lt;/option&gt;
  &lt;option value="usa"&gt;United States&lt;/option&gt;
&lt;/select&gt;

&lt;select id="city" required disabled&gt;
  &lt;option value=""&gt;Select city&lt;/option&gt;
&lt;/select&gt;

&lt;button onclick="save()"&gt;Save&lt;/button&gt;

&lt;script&gt;
const citiesByCountry = {
  uk: ["London", "Manchester"],
  usa: ["New York", "Los Angeles"]
};

function loadCities() {
  const country = document.getElementById("country").value;
  const citySelect = document.getElementById("city");

  // Reset city dropdown
  citySelect.innerHTML = '&lt;option value=""&gt;Select city&lt;/option&gt;';

  // Disable if no country selected
  if (!country) {
    citySelect.disabled = true;
    return;
  }

  // Enable dropdown
  citySelect.disabled = false;

  // Load cities safely
  (citiesByCountry[country] || []).forEach(city =&gt; {
    const option = document.createElement("option");
    option.value = city.toLowerCase().replace(/\s+/g, ""); // remove ALL spaces
    option.textContent = city;
    citySelect.appendChild(option);
  });
}

function save() {
  const country = document.getElementById("country").value;
  const city = document.getElementById("city").value;

  // Required validation
  if (!country || !city) {
    alert("Please select both a country and a city.");
    return;
  }

  // Build list of valid cities for this country
  const validCities = (citiesByCountry[country] || [])
    .map(c =&gt; c.toLowerCase().replace(/\s+/g, ""));

  // Relational validation
  if (!validCities.includes(city)) {
    alert("Selected city does not belong to the chosen country.");
    return;
  }

  console.log("Saving:", { country, city });
}
&lt;/script&gt;
</code></pre>
<p>This improved code turns the country–city form into a controlled, relationship‑aware flow instead of two loose dropdowns.</p>
<p>When the user selects a country, the <code>loadCities()</code> function runs. It first clears the city dropdown and, if no country is selected, keeps the city field disabled so the user can't choose a city on its own.</p>
<p>Once a valid country is chosen, the city dropdown is enabled and populated only with the cities that belong to that specific country, using the <code>citiesByCountry</code> mapping. Also, the city values are normalised (lowercased and stripped of spaces) so they’re consistent and safe to compare.</p>
<p>When the user clicks “Save,” the <code>save()</code> function checks that both a country and a city have been selected. If either is missing, it shows an alert and stops. It then rebuilds the list of valid city values for the chosen country and verifies that the selected city is actually in that list.</p>
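<p>The same idea applies to the other broken references mentioned earlier, such as a product ID that no longer exists or a retired SKU. Here's a minimal sketch of a lookup-based check. The catalogue contents, the <code>active</code> flag, and the function <code>checkProductReference</code> are all hypothetical, and in a real system the lookup would go to the database rather than an in-memory <code>Map</code>.</p>
<pre><code class="language-javascript">// Hypothetical catalogue keyed by product ID.
const products = new Map([
  ["P-100", { name: "Desk lamp", active: true }],
  ["P-200", { name: "Old kettle", active: false }], // retired product
]);

// Returns an error message, or null when the reference is valid.
function checkProductReference(productId) {
  const product = products.get(productId);
  if (!product) return `Unknown product ID: ${productId}`;
  if (!product.active) return `Product ${productId} has been retired.`;
  return null;
}

console.log(checkProductReference("P-100")); // null
console.log(checkProductReference("P-200")); // "Product P-200 has been retired."
console.log(checkProductReference("P-999")); // "Unknown product ID: P-999"
</code></pre>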
<h3 id="heading-structural-errors-dropdowns-radio-buttons-enums">Structural Errors (Dropdowns, Radio Buttons, Enums)</h3>
<p>If users can type a country as “U.S.A”, “USA”, “United States”, or “us”, enter gender as “male”, “Male”, “M”, or “man”, or type a department as “Engineering”, “Eng”, or “engineer”, this can result in inconsistent categorical data.</p>
<p>The same applies to currencies typed as “usd”, “USD”, “US Dollars”, product categories spelled differently, status values like “active”, “Active”, “ACT”, “enabled”, or boolean values like “yes”, “Yes”, “Y”, “1”.</p>
<p>These inconsistencies make analytics, grouping, and reporting unreliable, and the analyst will spend time cleaning and standardizing these values.</p>
<p>You should replace free‑text fields with dropdowns, radio buttons, and enums to enforce standardized categorical values.</p>
<p>Here's a bad example of a structural error:</p>
<pre><code class="language-plaintext">&lt;form id="profile"&gt;
  &lt;label&gt;Country&lt;/label&gt;
  &lt;input type="text" id="country" placeholder="Enter country"&gt;
  &lt;button type="submit"&gt;Save&lt;/button&gt;
&lt;/form&gt;

&lt;script&gt;
document.getElementById("profile").addEventListener("submit", e =&gt; {
  e.preventDefault();
  const country = document.getElementById("country").value;
  console.log("Saving:", country);
});
&lt;/script&gt;
</code></pre>
<p>The problem with this code is that it pretends to save a country value without doing any real validation or enforcing any rules, which makes the form unreliable and prone to bad data.</p>
<p>The form uses a plain text input for “country,” meaning the user can type anything they want — misspellings, random characters, invalid countries, or even leave it blank. Because the input isn’t marked as required and the JavaScript doesn’t check whether the field contains a meaningful value, the form will happily “save” an empty string or nonsense text.</p>
<p>The <code>submit</code> handler prevents the default form submission but does nothing beyond logging whatever the user typed, so the system accepts invalid, incomplete, or malformed data without question. In short, the code collects input but doesn't validate it, doesn't enforce correctness, and doesn't protect the system from bad or unusable values.</p>
<p>Here's the fixed version:</p>
<pre><code class="language-plaintext">&lt;form id="profile"&gt;
  &lt;label&gt;Country&lt;/label&gt;
  &lt;select id="country" required&gt;
    &lt;option value=""&gt;Select country&lt;/option&gt;
    &lt;option value="uk"&gt;United Kingdom&lt;/option&gt;
    &lt;option value="usa"&gt;United States&lt;/option&gt;
    &lt;option value="canada"&gt;Canada&lt;/option&gt;
  &lt;/select&gt;

  &lt;button type="submit"&gt;Save&lt;/button&gt;
&lt;/form&gt;

&lt;script&gt;
document.getElementById("profile").addEventListener("submit", e =&gt; {
  e.preventDefault();

  const country = document.getElementById("country").value;

  // Required validation
  if (!country) {
    alert("Please select a country before saving.");
    return;
  }

  console.log("Saving:", country);
});
&lt;/script&gt;
</code></pre>
<p>The biggest improvement is that we're no longer relying on a free‑text field for the country. By switching to a dropdown, the form now limits the user to a controlled set of valid options. This prevents misspellings, random text, or invalid country names from ever entering the system.</p>
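<p>A dropdown isn't always available, for example when values arrive from an import or from older free-text records. In that case the variants can be mapped to one canonical code and anything unrecognised rejected. This is a minimal sketch. The alias list and the function <code>toCountryCode</code> are assumptions for illustration, and a real list would be much longer.</p>
<pre><code class="language-javascript">// Hypothetical mapping from known spellings to one canonical code.
const COUNTRY_ALIASES = new Map([
  ["usa", "usa"],
  ["us", "usa"],
  ["united states", "usa"],
  ["uk", "uk"],
  ["united kingdom", "uk"],
]);

// Returns the canonical code, or null if the value is not recognised.
function toCountryCode(input) {
  const key = String(input).trim().toLowerCase().replace(/\./g, "");
  return COUNTRY_ALIASES.get(key) ?? null;
}

console.log(toCountryCode("U.S.A"));         // "usa"
console.log(toCountryCode("United States")); // "usa"
console.log(toCountryCode("us"));            // "usa"
console.log(toCountryCode("Narnia"));        // null
</code></pre>
<p>Returning <code>null</code> for unknown values, rather than passing them through, keeps new spellings from quietly creating new categories.</p>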
<p>These are the main types of data errors you might come across in your work. Now that we've discussed what causes them and some key fixes/preventative measures you can take, let's move on to data quality itself.</p>
<h2 id="heading-what-makes-good-data">What Makes Good Data?</h2>
<p>So what, in fact, is data quality? <a href="https://www.ibm.com/products/tutorials/6-pillars-of-data-quality-and-how-to-improve-your-data">IBM defines it</a> as the degree of accuracy, consistency, completeness, reliability, and relevance of the data collected, stored, and used within an organization or a specific context.</p>
<p>Let's look at each of these features of quality data a bit more closely to understand what they entail.</p>
<h3 id="heading-completeness">Completeness:</h3>
<p>Completeness measures how much of the required data is actually present. When large portions of fields are missing, the dataset stops representing reality and any analysis built on it becomes unreliable.</p>
<p>An example would be a sign‑up form that stores users, but half of them are missing an email address. If you run an analysis on “email engagement,” your results will be skewed because a big chunk of users can’t even receive emails. This means that this data is incomplete.</p>
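<p>As a rough illustration, completeness for a single field can be expressed as the share of rows that actually hold a value. The sketch below assumes rows are plain JavaScript objects. The <code>users</code> sample and the <code>completeness</code> helper are hypothetical names made up for this example, and treating an empty dataset as fully complete is a design choice you may want to change.</p>

```javascript
// Hypothetical sample rows; in practice these would come from your users table.
const users = [
  { id: 1, email: "ada@example.com" },
  { id: 2, email: "" },
  { id: 3, email: null },
  { id: 4, email: "grace@example.com" },
];

// Returns the share of rows (0 to 1) where `field` holds a non-blank value.
function completeness(rows, field) {
  if (rows.length === 0) return 1; // assumption: nothing is missing from an empty set
  const present = rows.filter((row) => {
    const value = row[field];
    return value !== null && value !== undefined && String(value).trim() !== "";
  });
  return present.length / rows.length;
}

console.log(completeness(users, "email")); // 0.5
```

<p>A check like this could run before any “email engagement” analysis and stop the report when the ratio falls below a threshold your team agrees on.</p>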
<h3 id="heading-uniqueness">Uniqueness:</h3>
<p>Uniqueness checks whether each real‑world entity appears only once in the dataset. Duplicate records inflate counts, break joins, and distort metrics.</p>
<p>An example would be a customer table containing two rows for the same person with the same customer ID. When calculating “active customers,” the system counts them twice, inflating revenue projections.</p>
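<p>A minimal sketch of a uniqueness check follows, assuming the rows are already loaded in memory. The <code>customers</code> data and the <code>findDuplicateKeys</code> helper are hypothetical and only meant to show the idea of counting each key once.</p>

```javascript
// Hypothetical customer rows; customer 17 appears twice.
const customers = [
  { customerId: 17, name: "Sam Lee" },
  { customerId: 42, name: "Priya Nair" },
  { customerId: 17, name: "Sam Lee" },
];

// Returns the key values that occur more than once, in first-seen order.
function findDuplicateKeys(rows, key) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) ?? 0) + 1);
  }
  return [...counts].filter(([, count]) => count > 1).map(([value]) => value);
}

console.log(findDuplicateKeys(customers, "customerId")); // [ 17 ]

// Counting distinct IDs gives 2 active customers, not 3.
console.log(new Set(customers.map((c) => c.customerId)).size); // 2
```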
<h3 id="heading-validity">Validity:</h3>
<p>Validity evaluates whether data follows the expected format, type, or business rules. This includes correct data types, allowed ranges, and patterns defined by the system.</p>
<p>An example would be a field meant to store dates that contains values like “32/99/2025” or “tomorrow.” These invalid entries break downstream ETL jobs that expect a proper date format.</p>
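<p>One way to catch such values is to check both the shape of the string and whether it names a real calendar date. The sketch below assumes a <code>DD/MM/YYYY</code> format, which is an assumption for illustration only. <code>isValidDate</code> is a hypothetical helper, not part of any library.</p>

```javascript
// Returns true only for real calendar dates written as DD/MM/YYYY.
// The DD/MM/YYYY format is an assumption for this sketch.
function isValidDate(text) {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!match) return false; // wrong shape, e.g. "tomorrow"

  const [day, month, year] = match.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));

  // Date rolls impossible values over (32 January becomes 1 February),
  // so confirm that every part survived the round trip unchanged.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isValidDate("28/02/2025")); // true
console.log(isValidDate("32/99/2025")); // false
console.log(isValidDate("tomorrow")); // false
```

<p>The round-trip comparison matters because a pattern match alone would accept “32/99/2025”: it has the right shape but doesn't describe a real date.</p>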
<h3 id="heading-timeliness">Timeliness:</h3>
<p>Timeliness reflects whether data is available when it’s needed. Even accurate data becomes useless if it arrives too late for the process that depends on it. For example, after a customer places an order, the system should generate an order ID instantly.</p>
<h3 id="heading-accuracy">Accuracy:</h3>
<p>Accuracy measures how closely data matches the real‑world truth. When multiple systems report the same metric, one must be designated as the authoritative source to avoid conflicting values.</p>
<h3 id="heading-consistency">Consistency:</h3>
<p>Consistency checks whether data aligns across different datasets or within related fields. If two systems describe the same concept, their values shouldn't contradict each other.</p>
<p>For example, a company’s HR system reports 50 employees in Engineering, but the payroll system lists only 42. Since both describe the same group, the mismatch signals a data quality issue.</p>
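<p>A reconciliation check for this kind of mismatch might look like the following sketch. It assumes each system can export a simple map of department to head count. The <code>hrCounts</code> and <code>payrollCounts</code> objects and the <code>findMismatches</code> helper are hypothetical.</p>

```javascript
// Hypothetical head counts per department from two systems.
const hrCounts = { Engineering: 50, Sales: 12 };
const payrollCounts = { Engineering: 42, Sales: 12 };

// Lists every key whose value differs between the two sources,
// including keys that exist in only one of them.
function findMismatches(sourceA, sourceB) {
  const keys = new Set([...Object.keys(sourceA), ...Object.keys(sourceB)]);
  const mismatches = [];
  for (const key of keys) {
    if (sourceA[key] !== sourceB[key]) {
      mismatches.push({ key, a: sourceA[key], b: sourceB[key] });
    }
  }
  return mismatches;
}

console.log(findMismatches(hrCounts, payrollCounts));
// [ { key: 'Engineering', a: 50, b: 42 } ]
```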
<h3 id="heading-fitness-for-purpose">Fitness for Purpose:</h3>
<p>Fitness for purpose assesses whether the data is suitable for the specific business task at hand. Even complete, accurate, and timely data may be unhelpful if it doesn’t answer the intended question.</p>
<p>A dataset of website clicks might be perfect for analysing user engagement, for example, but it’s useless for forecasting revenue because it contains no purchase or pricing information.</p>
<h2 id="heading-data-validation-layers">Data Validation Layers</h2>
<p>Now that we've highlighted the characteristics that ensure quality data, it's important to discuss the layers of data validation.</p>
<p>There are five layers you'll need to check to enforce data quality.</p>
<h3 id="heading-frontend-layer-protect-the-user-not-the-system">Frontend Layer — “Protect the User, Not the System”</h3>
<p>Frontend validation plays an important role in enhancing the user experience – but it doesn't provide real protection for a system.</p>
<p>Since frontend logic operates within the user’s environment, we can't trust it as a mechanism for enforcing data quality. Any code executed in the browser is ultimately under the user’s control, meaning it can be disabled, modified, intercepted, or bypassed entirely.</p>
<p>For instance, a user can simply open browser developer tools, remove validation rules, and submit invalid or malicious data without restriction.</p>
<p>Frontend validation also can't be relied on to enforce complex business rules. Constraints such as ensuring that a discounted price is lower than the original price, validating that a start date precedes an end date, preventing stock levels from becoming negative, or confirming that a product belongs to a valid category within the database require deeper system-level checks.</p>
<p>At the frontend level, you typically validate required fields, email format, password strength, address fields, and payment input format.</p>
<p>So frontend validation doesn't guarantee data quality or security, as it can be bypassed through API tools (like Postman), disabled JavaScript, malicious bots, and third-party integrations.</p>
<p>Because of this, it's best to treat the front-end as a usability layer, not a trust layer.</p>
<h3 id="heading-backend-validation-the-real-gatekeeper">Backend Validation — “The Real Gatekeeper”</h3>
<p>You can only guarantee true data quality and system integrity at the backend and database layers.</p>
<p>The backend is responsible for enforcing request validation, implementing business logic, and managing authentication and authorization.</p>
<p>If validation fails here, invalid data is rejected before it can propagate. Without this layer, data corruption begins at ingestion.</p>
<p>For example:</p>
<pre><code class="language-plaintext">$request-&gt;validate([
   'name' =&gt; 'required|string|max:255',
   'price' =&gt; 'required|numeric|min:0',
   'stock' =&gt; 'required|integer|min:0',
   'category_id' =&gt; 'required|exists:categories,id',
]);
</code></pre>
<p>The code snippet above demonstrates how you can use request validation in Laravel to ensure that incoming data meets specific requirements before it's processed or stored in the database. This is an essential practice in web development, as it helps maintain data integrity, prevents errors, and enhances application security.</p>
<p>In this example, we're using the <code>$request-&gt;validate()</code> method to define a set of validation rules for four input fields: <code>name</code>, <code>price</code>, <code>stock</code>, and <code>category_id</code>. Each field is assigned a series of constraints that the incoming data must satisfy.</p>
<p>The name field is marked as required, meaning it must be included in the request and can't be empty. It must also be a string, ensuring that only textual data is accepted, and it's limited to a maximum length of 255 characters using <code>max:255</code>. This prevents excessively long inputs that could potentially cause issues in the database or user interface.</p>
<p>Similarly, the price field is required and must be numeric, allowing only numbers such as integers or decimal values. The rule <code>min:0</code> ensures that the price can't be negative, which is logically consistent for most product pricing scenarios.</p>
<p>The stock field is also required and must be an integer, meaning it can only accept whole numbers. This is appropriate for counting physical items. Like the price field, it includes a <code>min:0</code> rule to prevent negative stock values, which would not make sense in an inventory system.</p>
<p>Finally, the category_id field is validated to ensure it is both present and valid. The <code>required</code> rule ensures that a category is selected, while the <code>exists:categories,id</code> rule checks that the provided value corresponds to an existing id in the categories database table. This prevents invalid or non-existent category references, thereby preserving relational integrity within the database.</p>
<p>This layer validates null values, data types and formats, allowed ranges, and referential integrity (exists).</p>
<h3 id="heading-database-layer-protect-the-data-at-rest">Database Layer — “Protect the Data at Rest”</h3>
<p>Validation at the application level is insufficient on its own. You'll also need to enforce database-level constraints like NOT NULL constraints, UNIQUE constraints (email, SKU, order number), foreign keys (orders.user_id → users.id), and check constraints (for example, price &gt;= 0).</p>
<p>This layer is critical because application bugs may bypass validation, background jobs and imports may skip controllers, and malicious actors may attempt direct access.</p>
<p>The database layer acts as the final line of defense, ensuring structural integrity regardless of application failures. Database constraints are the last hard stop: they enforce correctness even when code is bypassed.</p>
<h3 id="heading-service-layer-business-logic-validate-real-world-rules">Service Layer / Business Logic — “Validate Real-World Rules”</h3>
<p>The service layer is where the application stops asking “Is this data shaped correctly?” and starts asking “Is this allowed to happen in the real world?”</p>
<p>This layer enforces domain‑specific rules that can't be captured by simple request validation or database constraints. These rules reflect business truth, not structural correctness.</p>
<p><strong>Example:</strong></p>
<pre><code class="language-plaintext">if ($product-&gt;stock &lt; $quantity) {
   throw new OutOfStockException();
}
</code></pre>
<p>This prevents overselling and ensures the system reflects physical reality.</p>
<pre><code class="language-plaintext">if ($cartTotal !== $calculatedTotal) {
   throw new PriceMismatchException();
}
</code></pre>
<p>This protects revenue and prevents tampering.</p>
<p>In this layer, you enforce real‑world business rules by ensuring inventory correctness, recalculating totals, applying discount logic, and checking user‑specific limits.</p>
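<p>To show what “recalculating totals” can mean in practice, here is a language-agnostic sketch written in JavaScript rather than Laravel. It assumes prices are stored as whole pence so that totals can be compared exactly. The <code>priceList</code> map and the <code>verifyCartTotal</code> function are hypothetical names for this illustration.</p>

```javascript
// Hypothetical trusted prices in pence; a real service would load them from the database.
const priceList = new Map([
  ["sku-1", 4000],
  ["sku-2", 1250],
]);

// Recomputes the total from trusted prices and rejects a client total that differs.
function verifyCartTotal(items, clientTotal) {
  let calculated = 0;
  for (const { sku, quantity } of items) {
    if (!priceList.has(sku)) throw new Error(`Unknown product: ${sku}`);
    if (!Number.isInteger(quantity) || quantity < 1) {
      throw new Error(`Invalid quantity for ${sku}`);
    }
    calculated += priceList.get(sku) * quantity;
  }
  if (calculated !== clientTotal) throw new Error("Price mismatch");
  return calculated;
}

const cart = [
  { sku: "sku-1", quantity: 2 },
  { sku: "sku-2", quantity: 1 },
];
console.log(verifyCartTotal(cart, 9250)); // 9250
```

<p>The important design choice is that the total sent by the client is only ever compared against the server's own calculation. It's never used as the amount to charge.</p>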
<h3 id="heading-jobs-queues-data-ingestion-validate-external-data">Jobs / Queues / Data Ingestion — “Validate External Data”</h3>
<p>When importing or processing external data (for example, supplier feeds), validation must occur before processing. You'll need to ensure schema conformity, that the required columns are present, that you have the correct data types, that the JSON structure is valid, and that you're detecting duplicate batches.</p>
<p>This is because external data sources are a major source of data quality issues. Without validation here, corrupted data can silently enter the system at scale.</p>
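<p>The checks listed above can be sketched as a small gate that runs before any row is processed. This example assumes the feed has already been parsed into an object with a <code>batchId</code> and an array of <code>rows</code>. The column names, the in-memory <code>seenBatches</code> set, and the <code>validateBatch</code> function are all hypothetical.</p>

```javascript
// Column names and types expected from a hypothetical supplier feed.
const requiredColumns = { sku: "string", price: "number", stock: "number" };

// Batch IDs already imported; a real pipeline would persist these.
const seenBatches = new Set();

// Returns a list of problems; an empty list means the batch is safe to process.
function validateBatch(batch) {
  if (seenBatches.has(batch.batchId)) {
    return [`Batch ${batch.batchId} was already imported`];
  }
  const errors = [];
  batch.rows.forEach((row, index) => {
    for (const [column, type] of Object.entries(requiredColumns)) {
      if (!(column in row)) {
        errors.push(`Row ${index}: missing column ${column}`);
      } else if (typeof row[column] !== type || Number.isNaN(row[column])) {
        errors.push(`Row ${index}: ${column} should be a ${type}`);
      }
    }
  });
  // Only remember batches that passed, so a corrected file can be retried.
  if (errors.length === 0) seenBatches.add(batch.batchId);
  return errors;
}

const badBatch = {
  batchId: "2025-01-15-a",
  rows: [
    { sku: "A1", price: 9.99, stock: 3 },
    { sku: "B2", price: "free" }, // wrong type for price, stock is missing
  ],
};
console.log(validateBatch(badBatch)); // two problems, both reported for row 1
```

<p>Collecting every problem instead of stopping at the first one makes it easier to send a complete report back to the supplier.</p>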
<p>Now that we've discussed the layers of a modern application stack, it should be clear that data quality isn't something you “check once” at the UI.</p>
<p>It must be enforced repeatedly, at multiple depths of the system. Each layer catches a different class of defects, and together they form a defensive wall that prevents bad data from ever reaching storage, analytics, or downstream consumers.</p>
<h2 id="heading-testing-strategies-to-protect-data-quality">Testing Strategies to Protect Data Quality</h2>
<p>To wrap up, here are the three foundational testing strategies every developer should apply to protect data quality.</p>
<h3 id="heading-unit-testing">Unit Testing</h3>
<p>Unit tests are the first line of defense in data quality. In this context, a “unit” refers to a single column, a single transformation, or a single validation rule.</p>
<p>The purpose is straightforward: verify that the smallest building blocks of your data logic behave exactly as intended. This matters because if these low‑level rules are not tested and validated, incorrect or inconsistent data will flow into the database and contaminate everything built on top of it.</p>
<p>By isolating each rule or transformation, you can guarantee that schema constraints, field‑level assumptions, and low‑level logic remain correct before data ever flows into larger pipelines or business processes.</p>
<p>Typical questions answered at this layer include:</p>
<ol>
<li><p>Does this column allow nulls?</p>
</li>
<li><p>Does this regex correctly strip whitespace from email strings?</p>
</li>
<li><p>Does this transformation produce the expected output for a single row?</p>
</li>
</ol>
<p>This is where you can verify that the data contract is sound. If a column must be non‑null, unique, or follow a specific pattern, the unit test enforces it. When these rules fail here, they fail cheaply – before they can corrupt a table or mislead a dashboard.</p>
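<p>For the whitespace question in the list above, a framework-free sketch could look like this. It assumes a hypothetical rule that emails are stored trimmed and lower-cased, and it uses <code>trim()</code> rather than a regex for brevity. <code>normalizeEmail</code> and <code>expectEqual</code> are made-up names. In a real project, the checks would live in your test framework of choice.</p>

```javascript
// The rule under test (hypothetical): emails are stored trimmed and lower-cased.
function normalizeEmail(raw) {
  return raw.trim().toLowerCase();
}

// A tiny stand-in for a test framework's assertion.
function expectEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error(`Expected "${expected}" but got "${actual}"`);
  }
}

// One rule, one expectation per check, with no database or HTTP involved.
expectEqual(normalizeEmail("  Ada@Example.com "), "ada@example.com");
expectEqual(normalizeEmail("grace@example.com"), "grace@example.com");
expectEqual(normalizeEmail("\tlinus@example.com\n"), "linus@example.com");
console.log("all checks passed");
```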
<p>To make this concrete, here’s what a unit test looks like in a real codebase. Even though this example comes from Laravel, the testing principle is identical to data‑quality unit tests: one rule, one expectation, isolated from everything else.</p>
<h4 id="heading-example-testing-a-discount-calculation-rule">Example: Testing a Discount Calculation Rule</h4>
<p>Imagine your e‑commerce shop has this rule:</p>
<ul>
<li><p>If a product costs more than £100, apply a 10% discount.</p>
</li>
<li><p>Otherwise, apply no discount.</p>
</li>
</ul>
<p>Let's say this is your discount logic:</p>
<pre><code class="language-plaintext">&lt;?php

namespace App\Services;

class DiscountService
{
    public function calculate(float $price): float
    {
        if ($price &gt; 100) {
            return $price * 0.10; // 10% discount
        }

        return 0;
    }
}
</code></pre>
<p>The unit test for this logic will be:</p>
<pre><code class="language-plaintext">&lt;?php

namespace Tests\Unit;

use Tests\TestCase;
use App\Services\DiscountService;

class DiscountServiceTest extends TestCase
{
    /** @test */
    public function it_applies_10_percent_discount_when_price_is_above_100()
    {
        $service = new DiscountService();

        $discount = $service-&gt;calculate(200);

        $this-&gt;assertEquals(20, $discount);
    }

    /** @test */
    public function it_applies_no_discount_when_price_is_100_or_below()
    {
        $service = new DiscountService();

        $discount = $service-&gt;calculate(100);

        $this-&gt;assertEquals(0, $discount);
    }
}
</code></pre>
<p>The <code>DiscountService</code> contains a simple rule: if a price is greater than 100, a 10% discount is applied. Otherwise, no discount is applied. The unit test verifies this rule in isolation, without involving controllers, databases, or HTTP requests. By testing the service directly, the developer ensures that the core calculation behaves exactly as intended.</p>
<p>The first test checks the positive case — a price of 200 should produce a discount of 20. The second test checks the boundary condition — a price of 100 should produce no discount. Together, these tests confirm both sides of the rule and protect against regressions if the logic changes in the future.</p>
<p>Since this is a Laravel example, it's worth noting that Laravel tests help you verify both your logic (unit tests) and your full application behaviour (feature tests). You can run them using <code>php artisan test</code>, which runs the suite under the <code>testing</code> environment. As long as that environment is configured with its own database (for example, in <code>phpunit.xml</code> or <code>.env.testing</code>), your real data remains unaffected.</p>
<h3 id="heading-integration-testing-the-flow-amp-lineage-check">Integration Testing: The Flow &amp; Lineage Check</h3>
<p>While unit tests validate the correctness of individual rules, integration tests validate the movement of data across components. Integration testing verifies that multiple layers work together as a single data flow.</p>
<p>In the example later in this section, the controller receives an order, calls the discount service, applies the transformation, and persists the result to the database. That interaction across layers is what elevates this from a unit test to an integration test. This is where you test the real‑world flow:</p>
<ol>
<li><p>Controller → Service → Repository → MySQL</p>
</li>
<li><p>Check if MySQL migrations run correctly</p>
</li>
<li><p>Check foreign keys enforce relationships</p>
</li>
<li><p>Check to ensure services interact with the database as expected</p>
</li>
<li><p>Check to ensure models and repositories behave consistently</p>
</li>
</ol>
<p>Integration tests reveal issues that only appear when components interact: incorrect joins, broken migrations, mismatched field names, or subtle type mismatches that unit tests cannot detect.</p>
<p>This is the layer where you catch the bugs that would otherwise silently corrupt data lineage.</p>
<p><strong>Here's an example:</strong></p>
<pre><code class="language-plaintext">&lt;?php

namespace Tests\Feature;

use Tests\TestCase;
use App\Models\Order;
use Illuminate\Foundation\Testing\RefreshDatabase;

class ApplyDiscountTest extends TestCase
{
    use RefreshDatabase;

    /** @test */
    public function check_it_persists_the_correct_discounted_total_to_the_database()
    {
        $order = Order::factory()-&gt;create(['subtotal' =&gt; 150]);

        $response = $this-&gt;postJson("/orders/{$order-&gt;id}/apply-discount");

        $response-&gt;assertStatus(200);

        $this-&gt;assertDatabaseHas('orders', [
            'id' =&gt; $order-&gt;id,
            'grand_total' =&gt; 135, // 150 - 10% discount
            'discount_total' =&gt; 15
        ]);
    }
}
</code></pre>
<p>This represents a full flow rather than a single rule:</p>
<ul>
<li><p>Controller → Service</p>
</li>
<li><p>Service → Calculation</p>
</li>
<li><p>Controller → Database write</p>
</li>
<li><p>Database → Final state</p>
</li>
</ul>
<p>This test begins by creating an order using an Eloquent factory. It immediately steps beyond the boundaries of a unit test, since it interacts with the database and relies on Laravel’s model layer to persist real data.</p>
<p>From there, the test sends an actual HTTP POST request to the <code>/orders/{id}/apply-discount</code> endpoint, which means it's not calling a method directly, but instead it's traveling through Laravel’s routing layer, invoking the controller responsible for handling the request, and triggering whatever business logic is responsible for calculating and applying the discount.</p>
<p>This movement through multiple layers (routing, controller, service logic, and model persistence) is precisely what defines integration testing: the goal is to verify that these components work together correctly as a system.</p>
<p>Once the request is processed, the test asserts that the response returns a successful status code, which confirms that the HTTP layer behaved as expected.</p>
<p>But the most important part comes afterward, when the test checks the database to ensure that the correct <code>grand_total</code> and <code>discount_total</code> were saved. This final assertion proves that the discount logic was executed, the model was updated, and the changes were successfully written to the database.</p>
<p>In other words, the test isn't merely checking whether a calculation is correct. It's also checking whether the entire pipeline –&nbsp;from receiving the request to updating the database –&nbsp;functions as a coherent whole.</p>
<h3 id="heading-functional-testing-the-business-rule-check">Functional Testing: The Business Rule Check</h3>
<p>Functional tests validate the entire user experience, from the moment a request enters the system to the moment a response is returned. This includes:</p>
<ul>
<li><p>HTTP requests</p>
</li>
<li><p>Controller logic</p>
</li>
<li><p>Validation rules</p>
</li>
<li><p>Service operations</p>
</li>
<li><p>Database writes</p>
</li>
<li><p>Redirects or rendered views</p>
</li>
</ul>
<p>This is where you test the business rules that govern real‑world behaviour:</p>
<p>“A student can't register for two exams at the same time.”</p>
<p>“A cart can't have negative quantities.”</p>
<p>“A user can't update their profile without a valid email.”</p>
<p>Functional tests ensure that the system behaves correctly from the perspective of the user and the business, not just the code.</p>
<h4 id="heading-heres-an-example-functional-test">Example: Functional Test</h4>
<pre><code class="language-plaintext">&lt;?php

namespace Tests\Feature;

use Tests\TestCase;
use App\Models\Product;
use Illuminate\Foundation\Testing\RefreshDatabase;

class CartQuantityFunctionalTest extends TestCase
{
    use RefreshDatabase;

    /** @test */
    public function a_user_cannot_set_a_negative_cart_quantity()
    {
        // Arrange: create a product
        $product = Product::factory()-&gt;create(['price' =&gt; 40]);

        // Simulate existing cart
        $this-&gt;withSession([
            'cart' =&gt; [
                $product-&gt;id =&gt; ['quantity' =&gt; 2]
            ]
        ]);

        // Act: user tries to update quantity to a negative number
        $response = $this-&gt;post('/cart/update', [
            'product_id' =&gt; $product-&gt;id,
            'quantity' =&gt; -5
        ]);

        // Assert: system rejects invalid business behaviour
        $response-&gt;assertStatus(302); // redirect back with errors
        $response-&gt;assertSessionHasErrors(['quantity']);

        // Assert: cart remains unchanged (business rule preserved)
        $this-&gt;assertEquals(2, session('cart')[$product-&gt;id]['quantity']);
    }
}
</code></pre>
<p>The test begins by creating a realistic environment in which a user interacts with a shopping cart. This is essential for understanding the behaviour the system is meant to enforce.</p>
<p>First, it generates a real product in the database using a factory, giving the product a price so that it resembles an item a customer might genuinely add to their cart.</p>
<p>Once the product exists, the test manually seeds the session with a cart containing that product and a quantity of two. This simulates a user who has already added the item to their cart in a previous interaction, and it establishes the baseline state the system must preserve if the user attempts an invalid update.</p>
<p>With the environment prepared, the test then imitates a user action by sending a POST request to the <code>/cart/update</code> endpoint. Instead of calling a method directly, it uses Laravel’s HTTP layer to reproduce the exact behaviour of a browser submitting a form. The request includes the product ID and a deliberately invalid quantity of negative five.</p>
<p>This is the heart of the scenario: the user is attempting something that violates the business rules of the application, and the test is designed to confirm that the system responds appropriately.</p>
<p>Now, when the request is processed, the test expects the application to reject the input, redirect the user back, and attach validation errors to the session. The assertion that the response has a 302 status code and contains validation errors confirms that the validation layer is functioning correctly and that the controller is enforcing the rule that quantities can't be negative.</p>
<p>The final part of the test is where the business rule is truly verified. After the failed update attempt, the test inspects the session to ensure that the cart remains unchanged. This is crucial because rejecting invalid input is only half of the requirement: the system must also protect the integrity of the existing cart data.</p>
<p>Functional tests answer questions like:</p>
<ul>
<li><p>Does the system prevent invalid real‑world behaviour?</p>
</li>
<li><p>Does the user get the correct feedback?</p>
</li>
<li><p>Does the data remain consistent after the request?</p>
</li>
<li><p>Does the final output match the business expectation?</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Data quality is never the result of a single check or a single team. It emerges from a disciplined, layered approach where each testing level catches a different category of defects.</p>
<p>Unit tests safeguard the smallest rules, integration tests validate the flow of data across components, and functional tests enforce the business logic that governs real‑world behaviour.</p>
<p>When these layers operate together, bad data has nowhere to hide. When they don’t, even a minor oversight can slip through the cracks and escalate into a costly downstream failure.</p>
<p>So as you can see, your role in data quality is fundamentally proactive, not reactive. By designing systems with validation, integrity, and monitoring in mind, you ensure that data flowing through the pipeline is accurate, timely, complete, unique, and fit for purpose – supporting reliable analytics, reporting, and intelligent systems.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Software Testing with Playwright ]]>
                </title>
                <description>
                    <![CDATA[ Testing is the unsung hero of software development because shipping features is only half the battle. We just published a comprehensive course on the freeCodeCamp.org YouTube channel that will teach y ]]>
                </description>
                <link>https://www.freecodecamp.org/news/software-testing-with-playwright/</link>
                <guid isPermaLink="false">69bc3c86b238fd45a32512ba</guid>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Thu, 19 Mar 2026 18:12:22 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5f68e7df6dfc523d0a894e7c/d64667fd-3a46-4b34-8dc9-89c5a56e59d3.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Testing is the unsung hero of software development because shipping features is only half the battle.</p>
<p>We just published a comprehensive course on the <a href="http://freeCodeCamp.org">freeCodeCamp.org</a> YouTube channel that will teach you all about why and how to test software.</p>
<p>You will learn about the foundational Testing Pyramid and how to balance fast unit tests with complex end-to-end journeys. And you will learn how to use Playwright to test an e-commerce application. The course also explores the future of the industry by showcasing KaneAI, an AI-powered agent that allows you to author stable, auto-healing tests using plain English instructions.</p>
<p>This course will give you the practical skills to automate your workflow and ensure your code remains production-ready.</p>
<p>Here are the sections in this course:</p>
<ul>
<li><p>Course Introduction and Overview</p>
</li>
<li><p>Why Software Testing Matters</p>
</li>
<li><p>Case Studies: Knight Capital &amp; Therac-25</p>
</li>
<li><p>The Boeing 737 Max &amp; The Cost of Everyday Bugs</p>
</li>
<li><p>Testing as "Insurance" for Your Code</p>
</li>
<li><p>The Testing Pyramid: Unit, Integration, &amp; E2E</p>
</li>
<li><p>Test-Driven Development (TDD) Explained</p>
</li>
<li><p>Hands-on: Setting Up the TechMart Sample App</p>
</li>
<li><p>Playwright Framework Installation &amp; Setup</p>
</li>
<li><p>Understanding Playwright Test Structure &amp; Assertions</p>
</li>
<li><p>Writing a Search Functionality Test from Scratch</p>
</li>
<li><p>Strategic Locators: Finding Elements Effectively</p>
</li>
<li><p>Testing Complex Shopping Cart Logic</p>
</li>
<li><p>Login Forms, Validations, &amp; Error Handling</p>
</li>
<li><p>Full End-to-End Checkout Flow Walkthrough</p>
</li>
<li><p>Direct API Testing with Playwright</p>
</li>
<li><p>Debugging Tests in Headed and UI Interactive Modes</p>
</li>
<li><p>Testing Edge Cases and Security (XSS) Vulnerabilities</p>
</li>
<li><p>Mocking API Responses and Simulating Slow Networks</p>
</li>
<li><p>Accessibility Testing for Screen Readers &amp; Keyboards</p>
</li>
<li><p>Challenges: Learning Curves and Maintenance Burdens</p>
</li>
<li><p>Introduction to AI-Powered Software Testing</p>
</li>
<li><p>Hands-on with KaneAI: Authoring Tests in Plain English</p>
</li>
<li><p>Natural Language Code Generation &amp; Auto-Healing Tests</p>
</li>
<li><p>Executing API Tests Using AI Agents</p>
</li>
<li><p>Professional Best Practices: CI/CD &amp; Page Objects</p>
</li>
<li><p>Final Takeaways: When to Use Manual vs. AI Tools</p>
</li>
</ul>
<p>Watch the full course <a href="https://youtu.be/jydYq7oAtD8">on the freeCodeCamp.org YouTube channel</a> (1-hour watch).</p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/jydYq7oAtD8" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Test a Complex Full-Stack App: Manual Approach vs AI-Assisted Testing ]]>
                </title>
                <description>
                    <![CDATA[ A few days ago, I ran an experiment with an AI-powered testing agent that lets you write test cases in plain English instead of code. I opened its natural language interface and typed four simple sent ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-test-a-complex-full-stack-app-manual-vs-ai-assisted-testing/</link>
                <guid isPermaLink="false">69b843852ad6ae5184d6fa75</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ full stack ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ajay Yadav ]]>
                </dc:creator>
                <pubDate>Mon, 16 Mar 2026 17:53:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/3970744b-194e-4573-b49a-c057a4632d8c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A few days ago, I ran an experiment with an AI-powered testing agent that lets you write test cases in plain English instead of code. I opened its natural language interface and typed four simple sentences to test google.com:</p>
<pre><code class="language-plaintext">1. Go to google.com
2. There should be a long input field on the page
3. Type something and verify suggestions appear in a dropdown
4. The input field should not have any placeholder text
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/6198d3da5bb9cc256fc69512/24f353d9-8c98-49a9-ba81-3e236546dab2.png" alt="KaneAI's natural language test authoring interface showing a text input field with the prompt &quot;What do you want to test today?&quot;" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A real browser opened Google, found the search bar, typed a query, checked for the autocomplete dropdown, and verified there was no placeholder, all from those four lines.</p>
<p>No Playwright selectors. No <code>page.getByRole()</code>. No CSS class names. Just plain English describing what a user would do.</p>
<p>That made me curious: what happens if I try this on something actually complex? So I tested my own full-stack app's auth endpoint the same way:</p>
<blockquote>
<p><em><strong>Send a GET request to /api/auth/status without any session cookie. Verify it returns 401.</strong></em></p>
</blockquote>
<p>Within 15 seconds, done.</p>
<p>The same test took me an hour to set up manually: I had to build a session helper, separate my Express app from the server startup, and seed a test database, just so I could write five lines of Supertest code.</p>
<p>I ended up testing my entire application both ways: the traditional manual approach and the AI-assisted approach. Same endpoints, same assertions, completely different experience. This article is about what I learned.</p>
<p>But before I get into how I tested it, let's talk about what actually matters: the testing concepts themselves. Because no approach, manual or automated, will save you time or energy if you don't understand what you're testing and why.</p>
<h3 id="heading-what-well-cover">What we'll cover:</h3>
<ol>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-how-testing-actually-works-in-full-stack-apps">How Testing Actually Works in Full-Stack Apps</a></p>
</li>
<li><p><a href="#heading-what-made-this-hard">What Made This Hard</a></p>
</li>
<li><p><a href="#heading-the-manual-approach">The Manual Approach</a></p>
</li>
<li><p><a href="#heading-the-ai-assisted-approach">The AI-Assisted Approach</a></p>
</li>
<li><p><a href="#heading-when-to-use-which-approach">When to Use Which Approach</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most out of this article, you should have a basic understanding of JavaScript and Node.js, along with some familiarity with React and Express.</p>
<p>Experience writing simple tests with any JavaScript testing framework like Jest or Vitest will be helpful, though I'll explain the core testing concepts as we go.</p>
<p>You should also have Node.js installed on your machine. If you want to follow along with the manual testing examples, you'll need Vitest (or Jest) for unit and API tests, Supertest for HTTP endpoint testing, and Playwright for end-to-end browser tests. For the AI-assisted approach, I used KaneAI by LambdaTest, which you can explore through their platform.</p>
<h2 id="heading-how-testing-actually-works-in-full-stack-apps">How Testing Actually Works in Full-Stack Apps</h2>
<p>If you've only tested isolated React components or written a few unit tests for utility functions, full-stack testing feels like a different sport. The concepts are the same, but the complexity jumps dramatically. Here's what you actually need to know.</p>
<h3 id="heading-three-layers-three-different-jobs">Three Layers, Three Different Jobs</h3>
<p>Every full-stack application has three natural testing layers, and trying to cover everything with just one of them leads to either fragile tests or blind spots.</p>
<h4 id="heading-unit-tests">Unit Tests</h4>
<p>Unit tests check that individual functions return the right output for a given input. They don't touch the database, the network, or the browser.</p>
<p>They run in milliseconds. If your function takes a string and returns a formatted slug, a unit test calls that function and checks the result. That's it.</p>
<pre><code class="language-ts">it("converts a title to a slug", () =&gt; {
  expect(slugify("My First Post")).toBe("my-first-post");
});
</code></pre>
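<p>If you don't have a <code>slugify</code> helper handy, here's a minimal sketch of one that would satisfy this test. The exact rules (lowercase everything, collapse anything non-alphanumeric into a single hyphen) are an assumption for illustration, not a standard:</p>
<pre><code class="language-ts">// Hypothetical helper: turns a title into a URL-friendly slug.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, ""); // drop leading and trailing hyphens
}
</code></pre>
<p>Because the function is pure, the test needs no setup or teardown, which is exactly why this layer is the cheapest to cover.</p>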
<h4 id="heading-api-tests">API Tests</h4>
<p>API tests check that your backend endpoints return the right responses. They send real HTTP requests to your Express (or Next.js) app and verify the status codes, response shapes, and error handling.</p>
<p>If your <code>/api/auth/status</code> endpoint should return 401 without a session cookie, an API test confirms that contract.</p>
<pre><code class="language-ts">it("returns 401 without session cookie", async () =&gt; {
  const res = await request(app).get("/api/auth/status");
  expect(res.status).toBe(401);
});
</code></pre>
<h4 id="heading-end-to-end-e2e-tests">End-to-end (E2E) Tests</h4>
<p>End-to-end (E2E) tests open a real browser and interact with your app the way a user would. They click buttons, fill forms, navigate pages, and check that the right things appear on screen.</p>
<p>If your login flow should redirect to a dashboard after authentication, an E2E test walks through that entire journey.</p>
<pre><code class="language-ts">test("login redirects to dashboard", async ({ page }) =&gt; {
  await page.goto("/");
  await page.getByTestId("username-input").fill("ajay");
  await page.getByTestId("password-input").fill("password123");
  await page.getByTestId("login-button").click();
  await expect(page.getByTestId("dashboard")).toBeVisible();
});
</code></pre>
<h3 id="heading-the-pain-points-nobody-warns-you-about">The Pain Points Nobody Warns You About</h3>
<p>Tutorials make all three layers look straightforward. In practice, each one has a trap.</p>
<p>First, we have the session cookie problem. Most real apps have authentication. To test any authenticated endpoint, you need a valid session.</p>
<p>That means you need a helper function that logs in a test user, extracts the session cookie from the <code>Set-Cookie</code> header, and returns it for future requests.</p>
<p>This sounds simple. It took me an hour to build one that actually works with express-session. Every project reinvents this wheel.</p>
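<p>The fiddly part of that helper is pulling the cookie out of the login response. Here's a minimal sketch of just that step. It assumes the <code>Set-Cookie</code> values arrive as an array of strings (which is how Supertest typically exposes <code>res.headers["set-cookie"]</code>) and that you kept express-session's default cookie name, <code>connect.sid</code>. The function name <code>extractSessionCookie</code> is hypothetical:</p>
<pre><code class="language-ts">// Hypothetical helper: given the Set-Cookie header values from a login
// response, return the "name=value" pair to send back in a Cookie header.
function extractSessionCookie(
  setCookie: string[] | undefined,
  name = "connect.sid", // express-session's default cookie name
): string {
  const match = (setCookie ?? []).find((c) =&gt; c.startsWith(`${name}=`));
  if (!match) {
    throw new Error(`No ${name} cookie in response; did the login succeed?`);
  }
  // Drop attributes such as Path, HttpOnly, and Expires.
  const end = match.indexOf(";");
  return end === -1 ? match : match.slice(0, end);
}
</code></pre>
<p>The returned string is what you pass to <code>.set("Cookie", ...)</code> on later requests. Throwing when the cookie is missing makes a broken login fail loudly instead of producing a confusing 401 three tests later.</p>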
<p>Then we have the app vs. server separation issue. <a href="https://github.com/forwardemail/supertest#readme">Supertest</a> (the most popular API testing library) needs to import your Express app without starting a real server.</p>
<p>If your <code>app.ts</code> file has <code>app.listen(3000)</code> at the bottom, importing it in a test starts a real server on port 3000, and your tests will crash with a port conflict when test files run in parallel.</p>
<p>You have to separate your app definition from the server startup. <code>app.ts</code> exports the Express instance, <code>server.ts</code> calls <code>.listen()</code>. It's a three-minute refactor, but nobody tells you about it until your tests fail.</p>
<p>You also have the SSE and real-time nightmare. If your app uses Server-Sent Events (SSE) or WebSockets, you're testing time-dependent behavior.</p>
<p>You open a connection, trigger an action, and wait for an event to arrive. If the event takes too long, your test times out. If you don't set a timeout, the test hangs forever. You end up writing 30 lines of Promise wrappers, timeout handlers, and cleanup logic for a single assertion.</p>
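<p>Most of that boilerplate is the timeout logic. As a rough sketch, a generic wrapper like this rejects if a promise hasn't settled in time and clears its timer either way. The name <code>withTimeout</code> is hypothetical:</p>
<pre><code class="language-ts">// Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
function withTimeout&lt;T&gt;(work: Promise&lt;T&gt;, ms: number): Promise&lt;T&gt; {
  return new Promise&lt;T&gt;((resolve, reject) =&gt; {
    const timer = setTimeout(
      () =&gt; reject(new Error(`Timed out after ${ms}ms`)),
      ms,
    );
    // Clear the timer once the work settles so it can't keep the test run alive.
    work.then(
      (value) =&gt; {
        clearTimeout(timer);
        resolve(value);
      },
      (err) =&gt; {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
</code></pre>
<p>You'd wrap the promise that resolves on the first SSE event with it. Closing the connection itself still has to happen in your test's cleanup step; this wrapper only bounds the wait.</p>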
<p>Finally, there's the selector fragility trap. E2E tests that use CSS selectors (<code>.btn-primary</code>, <code>.card-title</code>) break every time you rename a class.</p>
<p>The fix is using <code>data-testid</code> attributes, stable identifiers that exist solely for testing and don't change during refactors. But retrofitting them into an existing app means touching dozens of components.</p>
<h3 id="heading-schema-validation-the-hidden-time-sink">Schema Validation: The Hidden Time Sink</h3>
<p>Here's something nobody tells you about API testing. Writing the assertion for "does this endpoint return 200" takes one line.</p>
<p>Writing assertions that verify the shape of the response (every field exists, every field has the right type, every enum value is valid) takes 15 to 20 lines per endpoint. Multiply that across a dozen endpoints and you're spending hours writing boilerplate like:</p>
<pre><code class="language-ts">expect(res.body[0]).toHaveProperty("title");
expect(typeof res.body[0].title).toBe("string");
expect(res.body[0]).toHaveProperty("status");
expect(["open", "closed", "merged"]).toContain(res.body[0].status);
</code></pre>
<p>It's important work, though: schema validation catches real bugs when your backend changes a response shape. But the repetitiveness is what makes it a good candidate for automation, which I'll get to later.</p>
<p>These aren't edge cases. These are the everyday realities of testing a full-stack app. Knowing them upfront saves you from the "why is this so much harder than the tutorial??" frustration.</p>
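<p>To make the schema-validation boilerplate concrete: the repeated shape checks can be folded into one small helper. This is a sketch, not a library API. <code>checkShape</code>, the <code>FieldRule</code> type, and <code>pullRequestRules</code> are names made up for illustration, and in a real project you might reach for a schema library instead:</p>
<pre><code class="language-ts">// Each rule is either a primitive type name or a list of allowed enum values.
type FieldRule = "string" | "number" | "boolean" | readonly string[];

// Hypothetical helper: returns a list of problems, empty when the item matches.
function checkShape(
  item: Record&lt;string, unknown&gt;,
  rules: Record&lt;string, FieldRule&gt;,
): string[] {
  const problems: string[] = [];
  for (const [field, rule] of Object.entries(rules)) {
    const value = item[field];
    if (value === undefined) {
      problems.push(`missing field: ${field}`);
    } else if (typeof rule === "string") {
      if (typeof value !== rule) problems.push(`${field} should be a ${rule}`);
    } else if (typeof value !== "string" || !rule.includes(value)) {
      problems.push(`${field} should be one of: ${rule.join(", ")}`);
    }
  }
  return problems;
}

// Rules matching the hand-written assertions shown earlier.
const pullRequestRules: Record&lt;string, FieldRule&gt; = {
  title: "string",
  status: ["open", "closed", "merged"],
};
</code></pre>
<p>A test then shrinks to a single line such as <code>expect(checkShape(res.body[0], pullRequestRules)).toEqual([])</code>, and a failure message lists every mismatched field at once.</p>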
<h2 id="heading-what-made-this-hard">What Made This Hard</h2>
<p>A few months ago, I wrote a <a href="https://www.freecodecamp.org/news/how-to-test-javascript-apps-from-unit-tests-to-ai-augmented-qa/">freeCodeCamp article</a> about testing JavaScript apps from unit tests to AI-augmented QA. That article covered testing fundamentals with clean, simple examples.</p>
<p>After publishing it, I kept thinking: what happens when you apply all of this to something messy?</p>
<p>I had the perfect candidate. <strong>Creoper</strong> (code name) is an AI-powered project management tool I built that connects GitHub with Discord.</p>
<p>Teams can monitor repositories, track pull requests, and query project status using natural language, all without leaving their chat platform.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/6198d3da5bb9cc256fc69512/57f5c35a-20fc-483e-b871-e1f55632b683.png" alt="Ajay Yadav receiving &quot;The Visionary&quot; trophy at the Hatch&amp;Hype hackathon hosted at Montrose Golf Resort and Spa, alongside a close-up of the award celebrating bold innovation with the CreoWis logo" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>I built it across two internal hackathons at <a href="https://www.creowis.com/">CreoWis</a>, and it won both times. What started as a simple GitHub-Discord automation bot evolved into a full product with five interconnected components:</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/6198d3da5bb9cc256fc69512/b4881ec0-b5bf-4b80-b85d-ffd400240b41.png" alt="Architecture diagram of Creoper showing six interconnected components: React dashboard, Express backend, Discord bot, PostgreSQL database, GitHub webhook handlers, and LLM layer." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>It has a React dashboard with GitHub OAuth. An Express backend with REST APIs and SSE. A Discord bot that processes natural language through an LLM intent detection layer. PostgreSQL with Prisma. GitHub webhook handlers.</p>
<p>But here's the thing: despite winning two hackathons, Creoper had <strong>zero test cases</strong>. The app wasn't even deployed yet. I'd been stuck on Railway monorepo deployment issues for weeks.</p>
<p>So I was staring at a system that had every real-world testing challenge I'd just written about, auth flows, real-time events, multiple integration points, complex business logic, and no safety net at all.</p>
<p>I decided to test it two different ways and document what actually happened. If you want to explore the full project, I've written two separate <a href="https://www.creowis.com/blog/building-an-ai-powered-project-management-tool">blogs</a> about how I built it.</p>
<h2 id="heading-the-manual-approach">The Manual Approach</h2>
<p>I mapped pure logic components like the intent parser and embed builder to unit tests, since they deal with straightforward input-output behavior. I assigned Express endpoints to API tests using Supertest, which let me send real HTTP requests and verify response codes and shapes.</p>
<p>I planned to cover the React dashboard with end-to-end tests using Playwright, simulating actual user interactions in a real browser. As for Discord bot interactions and webhook delivery, those couldn't be automated reliably yet, so I documented them and tested them manually.</p>
<p>Here's what each layer looked like in practice.</p>
<h3 id="heading-unit-tests-the-easy-win">Unit Tests: The Easy Win</h3>
<p>Creoper has a function that classifies Discord messages into structured intents. If someone types "list prs," it should return <code>LIST_PRS</code> with a high confidence score.</p>
<p>If the message is gibberish, it should return <code>UNKNOWN</code> with zero confidence. The confidence score matters because anything below a threshold triggers a safe fallback instead of executing an action.</p>
<pre><code class="language-ts">it("detects LIST_PRS intent", () =&gt; {
  const result = parseIntent("list prs");
  expect(result.action).toBe("LIST_PRS");
  expect(result.confidence).toBeGreaterThan(0.8);
});

it("returns low confidence when repo name is missing", () =&gt; {
  const result = parseIntent("set active repo");
  expect(result.confidence).toBeLessThan(0.8);
});
</code></pre>
<p>Notice these aren't just <strong>"does it work"</strong> checks. They're testing a safety mechanism, the threshold between executing an action and falling back.</p>
<p>These are exactly the kinds of tests that need to be written by hand because you have to understand the business logic behind the numbers.</p>
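<p>To make the mechanism under test concrete, here's a sketch of the kind of gate those assertions protect. It's an assumption about the shape, not Creoper's actual code: <code>decideAction</code>, the <code>ParsedIntent</code> type, and the <code>FALLBACK</code> value are hypothetical, and 0.8 is simply the threshold used in the tests above:</p>
<pre><code class="language-ts">type ParsedIntent = { action: string; confidence: number };

const CONFIDENCE_THRESHOLD = 0.8;

// Hypothetical gate: only act on intents the parser is confident about.
function decideAction(intent: ParsedIntent): string {
  if (intent.action === "UNKNOWN" || intent.confidence &lt; CONFIDENCE_THRESHOLD) {
    return "FALLBACK"; // respond safely instead of executing anything
  }
  return intent.action;
}
</code></pre>
<p>A test at exactly the threshold value is worth adding too, since that's where a <code>&lt;</code> versus <code>&lt;=</code> mistake would hide.</p>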
<p>I also tested the Discord embed builder the same way. Give it push event data, check that the formatted message contains the right repo name, author, branch, and commit messages.</p>
<p>Pure input, pure output, no external dependencies. Unit tests ran in milliseconds and caught edge cases like empty commit arrays immediately.</p>
<h3 id="heading-api-tests-where-the-friction-starts">API Tests: Where the Friction Starts</h3>
<p>Testing the Express endpoints required the infrastructure work I described earlier. I separated <code>app.ts</code> from <code>server.ts</code>, built the <code>createTestSession()</code> helper, and set up an in-memory test database so tests wouldn't touch real data.</p>
<pre><code class="language-ts">it("returns 401 without session cookie", async () =&gt; {
  const res = await request(app).get("/api/auth/status");
  expect(res.status).toBe(401);
  expect(res.body).toHaveProperty("error");
});

it("returns user data with valid session", async () =&gt; {
  const cookie = await createTestSession();
  const res = await request(app)
    .get("/api/auth/status")
    .set("Cookie", cookie);
  expect(res.status).toBe(200);
  expect(res.body).toHaveProperty("username");
  expect(res.body).not.toHaveProperty("accessToken");
});
</code></pre>
<p>Five lines of test code, one hour of infrastructure to make those five lines work.</p>
<p>Then I had to repeat this pattern across every endpoint: repos, pull requests, issues, active repo configuration, each with happy path, error cases, and the tedious schema validation I mentioned earlier.</p>
<p>The SSE test was the worst. I needed a Promise wrapper, an EventSource connection, a timeout handler, an <code>onopen</code> callback to trigger the change, an event listener to catch the response, and cleanup for both the connection and the server. About 30 lines for a single assertion, and it took three attempts to get the timing right.</p>
<h3 id="heading-e2e-tests-the-full-journey">E2E Tests: The Full Journey</h3>
<p>Playwright's E2E tests were actually pleasant to write once I added <code>data-testid</code> attributes to the React components. The login flow, note creation, editing, and deletion all followed a predictable pattern.</p>
<pre><code class="language-ts">test("login shows the username", async ({ page }) =&gt; {
  await page.goto("/");
  await page.getByTestId("username-input").fill("ajay");
  await page.getByTestId("password-input").fill("password123");
  await page.getByTestId("login-button").click();
  await expect(page.getByTestId("username-display")).toContainText("ajay");
});
</code></pre>
<p>The real cost wasn't writing the tests — it was maintaining them. Midway through development, I renamed a CSS class from <code>.repo-list-item</code> to <code>.repository-card</code>. Two Playwright tests broke immediately. I found the references, updated them, re-ran. Ten minutes for a CSS rename. I can see this becoming death-by-a-thousand-cuts as the UI evolves.</p>
<h2 id="heading-the-ai-assisted-approach">The AI-Assisted Approach</h2>
<p>Now here's the same project, tested with a fundamentally different workflow.</p>
<p>Instead of writing test code, you describe what you want to test in natural language. An AI agent interprets your intent, interacts with the actual application, generates assertions, and produces exportable test code.</p>
<p>The tool I used is <a href="https://www.testmuai.com/">KaneAI</a>, a GenAI-native testing agent that covers web UIs, APIs, and mobile apps through natural language test authoring with real browser execution. That's the only background you need. Let me show you the workflow.</p>
<h3 id="heading-api-testing-describing-instead-of-coding">API Testing: Describing Instead of Coding</h3>
<p>Instead of writing Supertest code, I opened the slash command menu, selected API, and pasted a curl command:</p>
<pre><code class="language-bash">curl -X GET http://localhost:3000/api/auth/status
</code></pre>
<p>It fired the request through a tunnel to my local server, showed the 401 response, and I added it to my test steps. For the authenticated version, I pasted the same command with a session cookie from DevTools. No <code>createTestSession()</code> helper. No test database. No app separation.</p>
<p>For the repository endpoints, I described the flow in plain English:</p>
<pre><code class="language-plaintext">1. Set active repository to "atechajay/no-javascript" via POST to /api/repos/active
2. Verify the response confirms the repository is active
3. Fetch open pull requests via GET to /api/repos/pulls
4. Verify each item has title, author, url, and status fields
5. Try an invalid repository name, verify 400 error
</code></pre>
<p>It generated assertions for the happy path and added schema validation I didn't ask for, checking that <code>title</code> is a string, <code>labels</code> is an array, and <code>status</code> is one of the expected values. That's the tedious work that ate up hours in the manual approach, generated in seconds.</p>
<h3 id="heading-e2e-testing-plain-english-real-browser">E2E Testing: Plain English, Real Browser</h3>
<p>For the React dashboard, instead of Playwright selectors, I described:</p>
<pre><code class="language-plaintext">1. Navigate to localhost:3001
2. Click "Go to Dashboard"
3. Verify redirect to GitHub OAuth
4. After auth, verify the dashboard loads
5. Verify the username appears in the sidebar
</code></pre>
<p>It executed each step in a real cloud browser connected to my localhost. No <code>page.getByRole()</code>, no <code>page.waitForURL()</code>, no selector debugging.</p>
<p>After each test, I exported the generated code. It came with wait conditions and assertion logic baked in.</p>
<p>It wasn't perfect copy-paste: I updated environment variables, adjusted base URLs, and fixed a few field name mismatches where it expected <code>pullRequestUrl</code> instead of my actual <code>url</code> field. But it gave me roughly 70–80% of the foundation.</p>
<h3 id="heading-the-feature-that-surprised-me">The Feature That Surprised Me</h3>
<p>Midway through testing, I renamed that CSS class from <code>.repo-list-item</code> to <code>.repository-card</code>. My manual Playwright tests broke immediately.</p>
<p>But the AI tool's auto-healing detected the selector change, found the closest matching element based on the test's original intent, and continued the test with a review flag. No code changes needed.</p>
<p>For a rapidly changing MVP where class names are still in flux, that alone saved significant maintenance time.</p>
<h2 id="heading-when-to-use-which-approach">When to Use Which Approach</h2>
<p>After testing the same project both ways, here's my honest take.</p>
<p>Write tests by hand when you're testing business logic that requires domain understanding. For Creoper's intent parser, I needed to think about what "low confidence" means in the context of the application's safety mechanism.</p>
<p>An AI tool can generate assertions, but it can't understand why a confidence score of 0.5 should trigger a fallback instead of an action. Pure logic with meaningful edge cases is where hand-written tests earn their keep.</p>
<p>You should also write tests by hand when they need to run in CI without external dependencies. Vitest tests with mocked dependencies are self-contained. They run in milliseconds and don't need a tunnel, a cloud browser, or a third-party account.</p>
<p>Hand-written tests are also best when the team needs to maintain them, because they're transparent. Generated code, even when exported, can feel opaque to someone who wasn't there when it was authored.</p>
<p>Reach for AI-assisted testing, on the other hand, when your UI changes frequently. For an MVP where CSS classes and component structure are still in flux, auto-healing prevents the "my tests broke because I renamed a div" problem. You spend less time fixing selectors and more time shipping features.</p>
<p>AI-assisted testing is also helpful when you need coverage fast and plan to refine later. The 70–80% foundation is a real boost when you're the only developer and you need coverage now. You can always hand-tune the exported code later.</p>
<p>Never rely solely on either approach to understand your system. No tool knows that an SSE connection drops after 30 seconds if the heartbeat isn't configured. No tool understands that a Discord bot should never execute a write action when confidence is below 0.8. No tool realizes the OAuth callback silently fails if the <code>redirect_uri</code> doesn't match precisely.</p>
<p>The strategy relies on you knowing which endpoints are crucial, identifying dangerous edge cases, and understanding what should occur during failures. The tool simply accelerates how quickly you can articulate and implement that strategy.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>My full-stack app won two hackathons. But without tests, it was a house of cards. One renamed CSS class, one changed API response, and the whole system could silently break.</p>
<p>Testing it both ways taught me that the manual vs. AI question is the wrong question. The real skill is matching the approach to the problem.</p>
<p>Write unit tests by hand for business logic. Use AI-assisted testing when you're drowning in repetitive schema validation across a dozen endpoints.</p>
<p>Use auto-healing for E2E tests on a fast-changing UI. And for the things you can't automate yet, like Discord bot interactions or webhook delivery, document them and test them manually until you can.</p>
<p>If you're building something complex and thinking <strong>"I'll add tests after I deploy"</strong>, flip that. Test what you can now. Document what you can't. When deployment day comes, you'll ship with confidence instead of anxiety.</p>
<h2 id="heading-before-we-end"><strong>Before We End</strong></h2>
<p>I hope you found this article insightful. I’m Ajay Yadav, a software developer and content creator.</p>
<p>You can connect with me on:</p>
<ul>
<li><p><a href="https://x.com/atechajay">Twitter/X</a> and <a href="https://www.linkedin.com/in/atechajay/">LinkedIn</a>, where I share insights to help you improve 0.01% each day.</p>
</li>
<li><p>Check out my <a href="https://github.com/ATechAjay">GitHub</a> for more projects.</p>
</li>
<li><p>Check out my <a href="https://thedivsoup.com">Medium</a> page for more blogs.</p>
</li>
<li><p>I also run a <a href="http://youtube.com/@atechajay">YouTube Channel</a> where I share content about careers, software engineering, and technical writing.</p>
</li>
</ul>
<p>See you in the next article — until then, keep learning!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How Does Kubernetes Self-Healing Work? Understand Self-Healing By Breaking a Real Cluster ]]>
                </title>
                <description>
                    <![CDATA[ I have noticed that many engineers who run Kubernetes in production have never actually watched it heal itself. They know it does. They have read the docs. But they have never seen a ReplicaSet contro ]]>
                </description>
                <link>https://www.freecodecamp.org/news/kubernetes-self-healing-explained/</link>
                <guid isPermaLink="false">69aae80e78c5adcd0e1c63bc</guid>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Security ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Osomudeya Zudonu ]]>
                </dc:creator>
                <pubDate>Fri, 06 Mar 2026 14:43:26 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/ef1ba178-622f-4a28-b58a-7fb8a58be964.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>I have noticed that many engineers who run Kubernetes in production have never actually watched it heal itself. They know it does. They have read the docs. But they have never seen a ReplicaSet controller fire, an OOMKill from <code>kubectl describe</code>, or watched pod endpoints go empty during a cascading failure. That's where 3 am incidents find you. This tutorial puts you on the other side of it.</p>
<p>You will clone one repo, spin up a real 3-node cluster, break it seven different ways, and watch it fix itself each time. No simulated output or fake clusters. Real Kubernetes, real failures, and real recovery. By the end, you will recognize these failure patterns when they show up in your production environment.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-kubelab">What is KubeLab?</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-how-to-get-the-lab-running">How to Get the Lab Running</a></p>
</li>
<li><p><a href="#heading-simulation-1-kill-random-pod">Simulation 1 — Kill Random Pod</a></p>
</li>
<li><p><a href="#heading-simulation-2-drain-a-worker-node">Simulation 2 — Drain a Worker Node</a></p>
</li>
<li><p><a href="#heading-simulation-3-cpu-stress-and-throttling">Simulation 3 — CPU Stress and Throttling</a></p>
</li>
<li><p><a href="#heading-simulation-4-memory-stress-and-oomkill">Simulation 4 — Memory Stress and OOMKill</a></p>
</li>
<li><p><a href="#heading-simulation-5-database-failure">Simulation 5 — Database Failure</a></p>
</li>
<li><p><a href="#heading-simulation-6-cascading-pod-failure">Simulation 6 — Cascading Pod Failure</a></p>
</li>
<li><p><a href="#heading-simulation-7-readiness-probe-failure">Simulation 7 — Readiness Probe Failure</a></p>
</li>
<li><p><a href="#heading-how-to-read-the-signals-in-grafana">How to Read the Signals in Grafana</a></p>
</li>
<li><p><a href="#heading-how-to-use-this-for-production-debugging">How to Use This for Production Debugging</a></p>
</li>
</ul>
<h2 id="heading-what-is-kubelab"><strong>What is KubeLab?</strong></h2>
<p>KubeLab is an open-source Kubernetes failure simulation lab. It runs a real Node.js backend, a PostgreSQL database, Prometheus and Grafana, all inside a real cluster. When you click "Kill Pod", the backend calls the Kubernetes API and deletes an actual running pod. Nothing is fake.</p>
<table>
<thead>
<tr>
<th>Simulation</th>
<th>What it teaches</th>
</tr>
</thead>
<tbody><tr>
<td>Kill Random Pod</td>
<td>ReplicaSet self-healing, pod immutability</td>
</tr>
<tr>
<td>Drain Worker Node</td>
<td>Zero-downtime maintenance, PodDisruptionBudgets</td>
</tr>
<tr>
<td>CPU Stress</td>
<td>Throttling vs crashing, invisible latency</td>
</tr>
<tr>
<td>Memory Stress</td>
<td>OOMKill, exit code 137, silent restart loops</td>
</tr>
<tr>
<td>Database Failure</td>
<td>StatefulSets, PVC persistence</td>
</tr>
<tr>
<td>Cascading Pod Failure</td>
<td>Why replicas: 2 isn't enough</td>
</tr>
<tr>
<td>Readiness Probe Failure</td>
<td>Liveness vs readiness, traffic control</td>
</tr>
</tbody></table>
<p>Plan about 90 minutes for the full path. Or jump directly to any simulation if you have a specific production problem you want to reproduce.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/698d563262d4ce66226a844a/1cd2a06d-7a7a-4250-ab5d-8a78d24af7b5.png" alt="KubeLab cluster map — pods grouped by node, color-coded by status. During simulations, chips change color and move between nodes in real time." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>You need basic familiarity with Docker and comfort with the command line, but no prior Kubernetes experience is required.</p>
<p><strong>Hardware:</strong> 8 GB RAM minimum, 16 GB recommended. The lab can run on Mac, Linux, or Windows with WSL2.</p>
<p>You'll need to install three tools: Multipass, which spins up Ubuntu VMs for the cluster; kubectl, the Kubernetes CLI you will use for every simulation; and Git, to clone the repo. If you cannot run three VMs, the repo includes a Docker Compose preview at <a href="https://github.com/Osomudeya/kubelab/blob/main/setup/docker-compose-preview.md">setup/docker-compose-preview.md</a>, which gives you the full UI with mock data and needs no real cluster.</p>
<h2 id="heading-how-to-get-the-lab-running"><strong>How to Get the Lab Running</strong></h2>
<p>Full cluster setup lives at <a href="https://github.com/Osomudeya/kubelab/blob/main/setup/k8s-cluster-setup.md">setup/k8s-cluster-setup.md</a> in the repo. It walks through creating three VMs with Multipass, installing MicroK8s, joining the worker nodes, and deploying KubeLab. Follow it until all eleven pods show Running:</p>
<pre><code class="language-bash">kubectl get pods -n kubelab
# All 11 pods should show STATUS: Running
</code></pre>
<p>Then open two port-forwards in separate terminal tabs and keep them running for the entire tutorial:</p>
<pre><code class="language-bash"># Tab 1 — KubeLab UI at http://localhost:8080
kubectl port-forward -n kubelab svc/frontend 8080:80

# Tab 2 — Grafana at http://localhost:3000
kubectl port-forward -n kubelab svc/grafana 3000:3000
</code></pre>
<p>Grafana login: <code>admin</code> / <code>kubelab-grafana-2026</code>.</p>
<blockquote>
<p>Position the KubeLab UI and Grafana side by side. Left half of the screen is the app. Right half is Grafana. You will watch both simultaneously from Simulation 3 onward.</p>
</blockquote>
<h2 id="heading-simulation-1-kill-random-pod"><strong>Simulation 1: Kill Random Pod</strong></h2>
<p>This simulation deletes a running backend pod via the Kubernetes API. Without Kubernetes, you would SSH to the server, find the crashed process, and restart it manually, usually after a user alert wakes you at 3 am.</p>
<p><strong>Before you click:</strong> Run <code>kubectl get pods -n kubelab -w</code>. Watch for a pod to go Terminating, then for a new one to appear.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/698d563262d4ce66226a844a/3d3cb733-407a-482f-82e7-cbeea496157b.png" alt="Terminals running side by side before clicking Run, events streaming, pod watch, frontend and grafana port forwarding." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-bash">kubectl get pods -n kubelab -w
# backend-abc123  1/1   Terminating   0   2m
# backend-xyz789  1/1   Running       0   0s   ← ReplicaSet created a replacement
</code></pre>
<p><strong>What happened:</strong> The ReplicaSet controller noticed that the actual pod count (1) no longer matched the desired count (2) and created a replacement in parallel with the shutdown. The terminating pod was also removed from the Service's endpoints as its shutdown began, so new traffic was routed to the healthy pod instead of the dying one.</p>
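<p>If the reconciliation idea is new to you, here is a toy model of that comparison. It's a conceptual sketch, not Kubernetes code: the <code>Pod</code> type and the <code>podsToCreate</code> function are invented for illustration.</p>
<pre><code class="language-ts">type Pod = { name: string; terminating: boolean };

// Toy model: how many replacements a controller would create to reach
// the desired replica count. Terminating pods do not count as "actual".
function podsToCreate(desired: number, pods: Pod[]): number {
  const actual = pods.filter((p) =&gt; !p.terminating).length;
  return Math.max(0, desired - actual);
}
</code></pre>
<p>With <code>desired = 2</code> and one of two pods terminating, the result is 1, which is why the replacement appears while the old pod is still shutting down.</p>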
<p><strong>The production trap:</strong> A missing readiness probe means the new pod receives traffic before it has opened a DB connection. You get 500s on every deployment for 2–3 seconds.</p>
<p><strong>The fix:</strong> Set <code>replicas: 2</code>, add a readiness probe, and set <code>terminationGracePeriodSeconds</code> to match your longest request timeout.</p>
<h2 id="heading-simulation-2-drain-a-worker-node"><strong>Simulation 2: Drain a Worker Node</strong></h2>
<p>This simulation cordons a worker node, then evicts all its pods to the remaining node.</p>
<p>To <em><strong>"cordon"</strong></em> a worker node means to mark it as unschedulable. When you run <code>kubectl cordon &lt;node-name&gt;</code>, the Kubernetes control plane adds the <code>node.kubernetes.io/unschedulable:NoSchedule</code> taint to the node. (A <strong>taint</strong> is a marker that tells the scheduler to avoid placing pods on that node unless they have a matching "toleration.") This tells the scheduler to stop placing any new pods onto that node. It does <strong>not</strong> affect the pods that are already running there.</p>
<p>Cordoning is the first, safe step in preparing a node for maintenance. It ensures that while you are draining the node, the scheduler isn't simultaneously trying to schedule new workloads onto it, which would defeat the purpose of the drain.</p>
<p>Without Kubernetes you would drain the server manually, guess when in-flight requests finish, patch it, and bring it back. The window of downtime is unpredictable.</p>
<p><strong>Before you click:</strong> Run <code>kubectl get pods -n kubelab -o wide -w</code>. Watch which node each pod runs on.</p>
<pre><code class="language-bash">kubectl get pods -n kubelab -o wide -w
</code></pre>
<pre><code class="language-plaintext">NAME                     NODE               STATUS
backend-abc123-xk2qp    kubelab-worker-1   Terminating   ← evicted
backend-abc123-n7mw3    kubelab-worker-2   Running       ← rescheduled
</code></pre>
<p>In <code>kubectl get nodes</code> the node shows <code>Ready,SchedulingDisabled</code> until you run <code>kubectl uncordon</code>.</p>
<p><strong>What happened:</strong> The node spec got <code>spec.unschedulable=true</code>. The Eviction API then ran once per pod. Unlike a raw <code>kubectl delete pod</code>, an eviction goes through PodDisruptionBudget policy checks before it proceeds, which is why draining with <code>kubectl drain</code> is safer than deleting pods manually during maintenance.</p>
<p><strong>The production trap:</strong> Two replicas with no pod anti-affinity often land on the same node. Drain that node and both pods evict at once. Complete downtime despite <code>replicas: 2</code>.</p>
<p><strong>The fix:</strong> Use pod anti-affinity with <code>topologyKey: kubernetes.io/hostname</code> and a PodDisruptionBudget with <code>minAvailable: 1</code>.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/698d563262d4ce66226a844a/1161cbf9-2482-41c7-9b5c-751762d3baaa.png" alt="Node drain CLI output: cordoned node shows Ready,SchedulingDisabled; pods reschedule to the other node." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-simulation-3-cpu-stress-and-throttling"><strong>Simulation 3: CPU Stress and Throttling</strong></h2>
<p>This simulation burns CPU inside a backend pod for 60 seconds, hitting the 200m limit. Without Kubernetes, one runaway process can consume all CPU on the host and starve every other service.</p>
<p><strong>Before you click:</strong> Run <code>watch -n 2 kubectl top pods -n kubelab</code> and open the Grafana CPU Usage panel.</p>
<pre><code class="language-bash">kubectl top pods -n kubelab
# backend-abc123   200m   ← pegged at limit for 60s; the other pod stays ~15m
</code></pre>
<p><strong>What happened:</strong> The Linux CFS scheduler enforces the cgroup limit by granting 20ms of CPU per 100ms period then freezing all processes in the cgroup for 80ms. The pod is not slow because it is broken. It is slow because it is frozen 80% of the time.</p>
<p><strong>The production trap:</strong> <code>kubectl top</code> shows the pod using 95-150m, which looks normal. The metric reports CPU usage, not the throttle rate. Teams spend hours checking application code for a latency bug that is actually a CPU limit set too low.</p>
<p><strong>The fix:</strong> For latency-sensitive workloads, set CPU requests but remove CPU limits. Requests tell the scheduler where to place the pod without throttling at runtime. Confirm throttling with <code>rate(container_cpu_cfs_throttled_seconds_total{namespace="kubelab"}[5m])</code>.</p>
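<p>The 20ms-per-100ms figure follows directly from the limit. Here is a rough sketch of the arithmetic, assuming the default 100ms CFS period and a single-threaded process that wants CPU the whole time. The function name is made up for illustration.</p>

```python
def cfs_budget_ms(cpu_limit_millicores, period_ms=100.0):
    """Return (runnable_ms, throttled_ms) per CFS period for one always-busy thread."""
    # The quota is the limit expressed as a share of each period: 1000m = one full period.
    quota_ms = period_ms * cpu_limit_millicores / 1000.0
    # A single thread cannot use more than the period itself, even with a larger quota.
    runnable_ms = min(quota_ms, period_ms)
    return runnable_ms, period_ms - runnable_ms


for limit in (200, 500, 1000):
    runnable, throttled = cfs_budget_ms(limit)
    print(f"{limit}m limit: {runnable:.0f}ms runnable, {throttled:.0f}ms throttled per period")
```

<p>With the lab's 200m limit this gives 20ms of run time and 80ms of waiting in every period, which is where the 80% figure comes from.</p>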
<img src="https://cdn.hashnode.com/uploads/covers/698d563262d4ce66226a844a/5e3fd49b-c9a0-4271-9be7-b7fec3122c1a.png" alt="One backend pod flatlined at exactly 95-150m for 60 seconds. A healthy pod's CPU fluctuates, this flat ceiling is the throttle." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-simulation-4-memory-stress-and-oomkill"><strong>Simulation 4: Memory Stress and OOMKill</strong></h2>
<p>This simulation allocates memory in 50MB chunks inside a backend pod until the kernel kills it. Without Kubernetes the process dies, the server goes down, and someone gets paged.</p>
<p><strong>Before you click:</strong> Run <code>kubectl get pods -n kubelab -l app=backend -w</code> and open the Grafana Memory Usage panel.</p>
<pre><code class="language-bash">kubectl get pods -n kubelab -l app=backend -w
# backend-abc123   0/1   OOMKilled   3   5m   ← no Terminating phase; SIGKILL bypasses graceful shutdown
</code></pre>
<p><strong>What happened:</strong> The container's memory usage crossed its 256Mi cgroup limit. The Linux kernel OOM killer scored processes in the container's cgroup and sent SIGKILL (exit code 137) to the top consumer. Not Kubernetes, the kernel. SIGKILL cannot be caught or handled, so no preStop hook runs and in-memory data or open transactions can be lost. Kubernetes only observed the exit, labeled it OOMKilled, and started a fresh container.</p>
<p><strong>The production trap:</strong> The pod runs fine for 8 hours, OOMKills, and restarts. Memory resets to zero and everything looks healthy again. This repeats every 8 hours. The restart count climbs to 7, then 15, then 30, but no alert fires because the metrics look normal between crashes. You find out when a user emails saying the app has been "a bit glitchy lately."</p>
<p><strong>The fix:</strong> Alert on <code>increase(kube_pod_container_status_restarts_total{namespace="kubelab"}[1h]) &gt; 3</code> before users notice.<br>The Prometheus expression means: look at the restart counter for containers in the <code>kubelab</code> namespace, calculate how much it grew over the last hour, and fire an alert if a container restarted more than 3 times in that window. (<code>rate()</code> over the same window returns restarts per second, so comparing a raw <code>rate()</code> result against 3 would almost never fire.) A healthy pod rarely restarts. Several restarts in an hour usually means the container is hitting its memory limit, dying, and coming back in a loop, so this alert catches the silent OOMKill pattern before users do.</p>
<p>Confirm it happened:</p>
<pre><code class="language-bash">kubectl describe pod -n kubelab &lt;pod-name&gt; | grep -A 5 "Last State:"
# Reason: OOMKilled
# Exit Code: 137
</code></pre>
<p>To see the last output before the kernel killed the process, run <code>kubectl logs -n kubelab &lt;pod-name&gt; --previous</code>. The log stream stops abruptly with no shutdown message, because SIGKILL leaves no time for cleanup or final logs.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698d563262d4ce66226a844a/8ced107b-9d14-4d40-b6d6-7ae0fe35b1b7.png" alt="One backend pod's memory climbs, then the line drops at the OOMKill and reappears as the container restarts. The other pod's line stays flat the whole time" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-simulation-5-database-failure"><strong>Simulation 5: Database Failure</strong></h2>
<p>This simulation scales the PostgreSQL StatefulSet to 0 replicas. The pod terminates completely. Without Kubernetes, the database server crashes and data recovery depends on whether backups exist and when they ran.</p>
<p><strong>Before you click:</strong> Run <code>kubectl get pods,pvc -n kubelab</code>. Note that the PVC exists before you start.</p>
<pre><code class="language-bash">kubectl get pods,pvc -n kubelab
# postgres-0   (gone)
# postgres-data-postgres-0   Bound   ← PVC stays; data lives on the volume
</code></pre>
<p>A PVC, or PersistentVolumeClaim, is a request for storage by a user. Think of it as a pod's way of saying, "I need a certain amount of durable, persistent storage." In the context of a stateful application like PostgreSQL, the PVC is critical. When the database pod is deleted, the PVC (and the underlying PersistentVolume it is bound to) remains. This is where the actual database files are stored. When a new <code>postgres-0</code> pod is created, the StatefulSet knows to re-attach the same PVC, ensuring the new pod has access to all the old data, preventing data loss.</p>
<p><strong>What happened:</strong> The StatefulSet controller deleted the pod but left the PersistentVolumeClaim untouched. StatefulSets guarantee stable names and stable PVC binding. <code>postgres-0</code> always mounts <code>postgres-data-postgres-0</code>. When you restore, the same pod name comes back and reattaches the same volume. PostgreSQL replays WAL to reach a consistent state.</p>
<p><strong>The production trap:</strong> Apps without connection retry logic return 500s and stay broken even after PostgreSQL restores. Connection pools that do not validate on acquire hold dead connections forever.</p>
<p><strong>The fix:</strong> Add connection retry with exponential backoff in your app. Use network-attached storage (EBS, GCE PD) in production so the pod can reschedule to any node.</p>
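<p>A minimal sketch of connection retry with exponential backoff, using only the standard library. The <code>connect</code> callable, the delay values, and the fake database below are assumptions for illustration. In a real app <code>connect</code> would wrap your database driver, and you would catch that driver's own connection error type.</p>

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Double the ceiling each time: 0.5s, 1s, 2s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many pods from reconnecting at the same instant.
            sleep(random.uniform(0, ceiling))


# Stand-in for a database that is down for the first two attempts.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] > 2:
        return "connected"
    raise ConnectionError("database not accepting connections yet")


waits = []
print(connect_with_backoff(flaky_connect, sleep=waits.append))  # succeeds on the third attempt
print(len(waits))  # two waits were recorded before the successful attempt
```

<p>Passing <code>sleep</code> as a parameter lets the example record the waits instead of actually pausing. Leave it at the default in real code.</p>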
<h2 id="heading-simulation-6-cascading-pod-failure"><strong>Simulation 6: Cascading Pod Failure</strong></h2>
<p>This simulation deletes both backend replicas at the same time. Without Kubernetes, if everything is down, you'd have to restart every service manually and hope they come up in the right order.</p>
<p><strong>Before you click:</strong> Run <code>kubectl get endpoints -n kubelab backend-service -w</code>. Watch the IP list.</p>
<pre><code class="language-bash">kubectl get endpoints -n kubelab backend-service -w
# ENDPOINTS   &lt;none&gt;   ← every request in this window gets Connection refused
</code></pre>
<p><strong>What happened:</strong> Both pods were deleted. The Service had zero endpoints. The ReplicaSet created two replacements in parallel, but traffic stayed broken until at least one of them passed its readiness probe. The endpoint list went empty and came back. You can see the exact downtime window in Grafana's HTTP Request Rate panel.</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/698d563262d4ce66226a844a/6cae14e0-faf2-4d42-90f4-32d00a1b4119.png" alt="The 5xx spike during Cascading Failure, 5 to 15 seconds of real downtime with the exact window timestamped" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>The production trap:</strong> <code>replicas: 2</code> protects you from one pod dying at a time, nothing more.<br>If both replicas land on the same node and that node goes down, you have zero replicas and full downtime.<br>Check right now with <code>kubectl get pods -n kubelab -o wide | grep backend</code>, and if both pods show the same NODE, you are one node failure away from an outage.</p>
<p><strong>The fix:</strong> Use pod anti-affinity to force replicas onto different nodes and a PodDisruptionBudget with <code>minAvailable: 1</code> to block any voluntary action that would leave zero replicas.</p>
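<p>The same-node check can also be scripted. This is a small sketch, assuming you have already collected (pod name, node name) pairs from <code>kubectl get pods -o wide</code>. The function names and the sample placements are hypothetical.</p>

```python
from collections import defaultdict


def replicas_by_node(pods):
    """Group pod names by the node they run on."""
    grouped = defaultdict(list)
    for pod_name, node_name in pods:
        grouped[node_name].append(pod_name)
    return dict(grouped)


def shares_single_node(pods):
    """True when there are several replicas and all of them sit on one node."""
    return len(pods) > 1 and len(replicas_by_node(pods)) == 1


# Hypothetical placements, in the (NAME, NODE) shape of `kubectl get pods -o wide`.
risky = [("backend-abc123-xk2qp", "kubelab-worker-1"), ("backend-abc123-n7mw3", "kubelab-worker-1")]
spread = [("backend-abc123-xk2qp", "kubelab-worker-1"), ("backend-abc123-n7mw3", "kubelab-worker-2")]

print(shares_single_node(risky))   # True: one node failure removes every replica
print(shares_single_node(spread))  # False: replicas are on different nodes
```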
<h2 id="heading-simulation-7-readiness-probe-failure"><strong>Simulation 7: Readiness Probe Failure</strong></h2>
<p>This simulation makes one backend pod fail its readiness probe for 120 seconds without restarting it. Without Kubernetes, you'd have no way to take a pod out of traffic rotation without killing it. This is what happens in production when your app connects to a database on startup but the DB is slow. The pod is alive, but it's not ready. Kubernetes holds it out of rotation until it is.</p>
<p><strong>Before you click:</strong> Run <code>kubectl get pods -n kubelab -w</code> in one tab and <code>kubectl get endpoints -n kubelab backend-service -w</code> in another.</p>
<pre><code class="language-bash"># Pods tab: STATUS Running, RESTARTS 0 — almost nothing changes
# Endpoints tab: one IP disappears — the pod is alive but not receiving traffic
</code></pre>
<p><strong>What happened:</strong> <code>/ready</code> returned 503. The kubelet marked the pod <code>Ready=False</code>. The Endpoints controller removed its IP from the Service. The liveness probe (<code>/health</code>) still returned 200, so no restart. After 120 seconds <code>/ready</code> recovered and the pod rejoined. Run <code>kubectl logs -n kubelab &lt;failing-pod&gt; -f</code> to see the app log 503s for the readiness endpoint while the pod stays Running and receives no traffic.</p>
<p><strong>The production trap:</strong> Readiness probes that check external dependencies (database, cache, downstream API) will remove all pods from rotation when that dependency goes down. Instead of degrading gracefully, your entire app goes offline.</p>
<p><strong>The fix:</strong> Readiness probes should test only what the pod itself controls. Use a separate deep health endpoint for dependency checks and never tie readiness to external service availability.</p>
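<p>One way to keep the two concerns apart is to route them to different paths. The sketch below is an assumption-heavy illustration, not KubeLab's actual backend: <code>/ready</code> matches the probe path used above, while <code>/health/deep</code>, <code>APP_STATE</code>, and <code>database_reachable</code> are hypothetical names.</p>

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical in-process state; a real app would set this once startup work finishes.
APP_STATE = {"initialized": True}


def database_reachable():
    """Placeholder dependency check; here it simulates a database outage."""
    return False


def status_for(path):
    """Map a probe path to an HTTP status code."""
    if path == "/ready":
        # Readiness looks only at what this pod controls.
        return 200 if APP_STATE["initialized"] else 503
    if path == "/health/deep":
        # The deep check reports dependency trouble for dashboards and alerts.
        return 200 if database_reachable() else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(status_for(self.path))
        self.end_headers()


# During the simulated outage the pod stays in rotation while the deep check fails.
print(status_for("/ready"), status_for("/health/deep"))
```

<p>To serve these routes, pass <code>ProbeHandler</code> to <code>http.server.HTTPServer</code>. Point the readiness probe at <code>/ready</code> only, and let monitoring poll the deep endpoint.</p>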
<h2 id="heading-4-how-to-read-the-signals-in-grafana"><strong>4. How to Read the Signals in Grafana</strong></h2>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/698d563262d4ce66226a844a/e6709c25-2d80-489c-b7fb-418ef303b7e2.png" alt="A screenshot showing my grafana dashboards" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><code>kubectl</code> shows current state. Grafana shows what happened over time. That history is essential when you are debugging something that started 4 hours ago.</p>
<h3 id="heading-the-four-panels-that-matter"><strong>The Four Panels that Matter</strong></h3>
<p><strong>Pod Restarts:</strong> A flat line is good. A step up every few hours is a silent OOMKill loop — the most common invisible production failure.</p>
<p><strong>CPU Usage:</strong> A healthy pod's CPU fluctuates. A throttled pod's CPU is unnaturally flat at its limit. That flat ceiling is the signal, not the number.</p>
<p><strong>Memory Usage:</strong> Watch for a line that climbs steadily then disappears. That disappearance is an OOMKill. The line reappearing from zero is the restart.</p>
<p><strong>HTTP Request Rate:</strong> During Cascading Failure you see a spike of 5xx for 5–15 seconds, the exact downtime window, timestamped.</p>
<h2 id="heading-5-how-to-read-the-terminal-signals"><strong>5. How to Read the Terminal Signals</strong></h2>
<p>What you see in the terminal during and after each simulation tells you things Grafana cannot. Five commands matter.</p>
<p>The <code>-w</code> flag on <code>kubectl get pods -n kubelab -w</code> streams changes in real time. The columns that matter are READY, STATUS, and RESTARTS. READY shows containers ready vs total: <code>1/2</code> means one of the pod's two containers is not passing its readiness probe. STATUS shows the pod's current state: Running, Pending, Terminating, OOMKilled. RESTARTS is the most important column in production. A number climbing silently over days is a memory leak or a crash loop nobody has noticed yet.</p>
<p><code>kubectl get events -n kubelab --sort-by=.lastTimestamp</code> is the control plane's diary. Every action the cluster took is here: Killing, SuccessfulCreate, Scheduled, Pulled, Started, OOMKilling, BackOff. When something breaks and you do not know why, read the events. The timestamp gap between a Killing event and the next Started event is your actual downtime window — not an estimate, the exact number.</p>
<p><code>kubectl describe pod -n kubelab &lt;pod-name&gt;</code> is the deepest single-pod view. Three sections matter: Conditions (Ready: True/False tells you if the pod is in the Service endpoints), Last State (shows the previous container's exit reason — OOMKilled, exit code 137, or a crash), and Events at the bottom (the scheduler's reasoning for every placement decision). This is the first command to run when a pod is misbehaving.</p>
<p><code>kubectl get endpoints -n kubelab backend-service</code> shows which pod IPs are actually receiving traffic right now. A pod can show Running in <code>kubectl get pods</code> and be completely absent from this list. That is a readiness probe failure. If this list is empty, no request to that Service will succeed regardless of how many pods show Running. Check this whenever users report errors but pods look healthy.</p>
<p><code>kubectl logs -n kubelab &lt;pod-name&gt;</code> shows the container's stdout and stderr. Use <code>-f</code> to follow the stream. After a container restarts, use <code>--previous</code> to see the logs from the container that just exited. This is essential when you need to know what the app was doing right before an OOMKill or crash. Logs are per container and are gone once the pod is replaced, so grab them before the ReplicaSet creates a new pod with a new name.</p>
<p>A full event sequence during Kill Pod recovery looks like this:</p>
<pre><code class="language-bash">kubectl get events -n kubelab --sort-by=.lastTimestamp | tail -10
</code></pre>
<pre><code class="language-plaintext">REASON            MESSAGE
Killing           Stopping container backend          ← SIGTERM sent
SuccessfulCreate  Created pod backend-xyz789          ← ReplicaSet fired
Scheduled         Successfully assigned to worker-2   ← Scheduler placed it
Pulled            Container image already present     ← no pull delay
Started           Started container backend           ← running
</code></pre>
<p>The time between the Killing and Started events is your actual recovery time. In a healthy cluster with a cached image it is 3–8 seconds. If it takes longer, check the Scheduled line. The scheduler may have struggled to find a node.</p>
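<p>If you export event reasons and timestamps, the recovery window can be computed rather than eyeballed. This is a minimal sketch with made-up timestamps. It assumes the events are already sorted oldest first and that timestamps use an explicit <code>+00:00</code> offset, which <code>datetime.fromisoformat</code> accepts on older Python versions as well.</p>

```python
from datetime import datetime


def recovery_seconds(events):
    """Seconds from the first Killing event to the first Started event that follows it."""
    killed_at = None
    for reason, timestamp in events:
        when = datetime.fromisoformat(timestamp)
        if reason == "Killing" and killed_at is None:
            killed_at = when
        elif reason == "Started" and killed_at is not None:
            return (when - killed_at).total_seconds()
    return None  # no complete Killing-to-Started pair in the input


# Made-up timestamps for illustration, already sorted oldest first.
events = [
    ("Killing", "2026-01-01T12:00:00+00:00"),
    ("SuccessfulCreate", "2026-01-01T12:00:00+00:00"),
    ("Scheduled", "2026-01-01T12:00:01+00:00"),
    ("Pulled", "2026-01-01T12:00:02+00:00"),
    ("Started", "2026-01-01T12:00:05+00:00"),
]
print(recovery_seconds(events))  # 5.0
```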
<h3 id="heading-two-prometheus-queries-worth-memorizing"><strong>Two Prometheus Queries Worth Memorizing</strong></h3>
<p><strong>First query: silent restart loop.</strong> <code>rate(kube_pod_container_status_restarts_total{namespace="kubelab"}[1h])</code> looks at how many times containers in that namespace have restarted over the last hour and expresses it as a rate (restarts per second). A healthy workload rarely restarts. If this rate is high (for example more than 3 restarts per hour, which is roughly 0.0008 per second), something is killing the container repeatedly, often an OOMKill or a crash. Alert when it exceeds a threshold so you see the pattern before users report errors.</p>
<p><strong>Second query: invisible CPU throttling.</strong> <code>rate(container_cpu_cfs_throttled_seconds_total{namespace="kubelab"}[5m])</code> measures how much time, per second, the Linux scheduler spent throttling containers in that namespace over the last 5 minutes. A result of 0.25 means the container was frozen 25% of the time. High latency with no restarts and "normal" CPU usage in <code>kubectl top</code> often means the CPU limit is too low and the kernel is throttling the process. Alert when this rate exceeds about 0.25 (25% throttled).</p>
<pre><code class="language-plaintext"># Silent restart loop — alert when this exceeds 3 per hour
rate(kube_pod_container_status_restarts_total{namespace="kubelab"}[1h])

# Invisible throttling — alert when this exceeds 25%
rate(container_cpu_cfs_throttled_seconds_total{namespace="kubelab"}[5m])
</code></pre>
<p>Run these against your own cluster. Not just KubeLab. These are production queries.</p>
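<p>Both queries return per-second values, which makes thresholds easy to misread. Here is a small sketch of the unit arithmetic, with made-up counter samples and hypothetical function names:</p>

```python
def restarts_per_hour(rate_per_second):
    """Convert a rate() result on the restart counter into restarts per hour."""
    return rate_per_second * 3600


def throttled_share(throttled_seconds_start, throttled_seconds_end, window_seconds):
    """Fraction of the window spent throttled, from two samples of the throttle counter."""
    return (throttled_seconds_end - throttled_seconds_start) / window_seconds


# Made-up samples: the restart counter grew from 10 to 13 over one hour.
restart_rate = (13 - 10) / 3600
print(round(restarts_per_hour(restart_rate), 2))

# Made-up samples: the throttle counter grew by 75 seconds over a 5 minute window.
print(throttled_share(120.0, 195.0, 300))
```

<p>The first value works out to 3 restarts per hour and the second to 0.25, which correspond to the two alert thresholds suggested above.</p>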
<h2 id="heading-6-how-to-use-this-for-production-debugging"><strong>6. How to Use This for Production Debugging</strong></h2>
<p>The repo includes <a href="https://github.com/Osomudeya/kubelab/blob/main/docs/diagnose.md">docs/diagnose.md</a>, a symptom-to-simulation map. Find the simulation that reproduces your issue, run it in KubeLab, and understand the mechanics before you touch production.</p>
<p><strong>Exit code 137, pods restarting.</strong> Run the Memory Stress simulation. Confirm with <code>kubectl describe pod | grep -A 5 "Last State:"</code> and look for <code>Reason: OOMKilled</code>. Raise limits or find the leak. The simulation shows both.</p>
<p><strong>High latency, pods look healthy, zero restarts.</strong> Run the CPU Stress simulation. Check <code>container_cpu_cfs_throttled_seconds_total</code> in Prometheus. If it climbs, your CPU limit is too low and the pod is frozen by CFS.</p>
<p><strong>503 on some requests, pods show Running.</strong> Run the Readiness Probe Failure simulation. Check <code>kubectl get endpoints</code> — one pod IP is missing despite Running. The pod gets zero traffic.</p>
<p><strong>Pods stuck Pending after a node went down.</strong> Run the Drain Node simulation. Run <code>kubectl describe pod &lt;pending-pod&gt;</code> and read Events. The scheduler will state why it cannot place the pod, often insufficient capacity or a PVC on the failed node.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>You just broke a real Kubernetes cluster seven ways and watched it fix itself each time. You have seen the ReplicaSet controller fire, read an OOMKill from <code>kubectl describe</code>, watched endpoints go empty during a cascading failure, and understood why a pod can be Running and receiving zero traffic at the same time.</p>
<p>What you practiced here applies to other clusters too, including staging or production environments that you can read but not safely break. That muscle memory (events, endpoints, restart counter) is what you reach for at 3 am when something is wrong. KubeLab is the safe place to build that reflex.</p>
<p>The repo holds more than this article covered. Explore mode lets you run simulations without the guided flow. The full interview prep doc at <a href="https://github.com/Osomudeya/kubelab/blob/main/docs/interview-prep.md">docs/interview-prep.md</a> has answers to the 13 most common Kubernetes interview questions. The observability guide at <a href="https://github.com/Osomudeya/kubelab/blob/main/docs/observability.md">docs/observability.md</a> covers Prometheus and Grafana setup in detail.</p>
<p>If this helped you, star the repo at <a href="https://github.com/Osomudeya/kubelab">https://github.com/Osomudeya/kubelab</a> and share it with someone who is learning Kubernetes the hard way.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Disaster Recovery Testing? Explained with Practical Examples ]]>
                </title>
                <description>
                    <![CDATA[ Most teams are confident they can recover from a major outage until they actually have to. Backups exist, architectures are redundant and a recovery plan is documented somewhere, yet real incidents of ]]>
                </description>
                <link>https://www.freecodecamp.org/news/disaster-recovery-testing/</link>
                <guid isPermaLink="false">69a5614ffc6453a5f17ca809</guid>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cybersecurity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Security ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Alex Tray ]]>
                </dc:creator>
                <pubDate>Mon, 02 Mar 2026 10:07:11 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/57c1e51b-867c-444e-90f0-e6551284fe0a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most teams are confident they can recover from a major outage until they actually have to. Backups exist, architectures are redundant and a recovery plan is documented somewhere, yet real incidents often reveal critical gaps.</p>
<p>Disaster recovery testing is what separates assumed resilience from proven recovery, but it’s still skipped, rushed or treated as a checkbox exercise. For developers and technical teams, that gap can turn a manageable failure into a prolonged outage.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-disaster-recovery-testing">What is Disaster Recovery Testing?</a></p>
</li>
<li><p><a href="#heading-how-disaster-recovery-testing-works-in-practice">How Disaster Recovery Testing Works in Practice</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-testing-methods-developers-should-know">Disaster Recovery Testing Methods Developers Should Know</a></p>
</li>
<li><p><a href="#heading-what-technology-disaster-recovery-testing-evaluates">What Technology Disaster Recovery Testing Evaluates</a></p>
</li>
<li><p><a href="#heading-how-to-test-a-disaster-recovery-plan">How to Test a Disaster Recovery Plan</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-test-scenarios-practical-examples">Disaster Recovery Test Scenarios: Practical Examples</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-test-report-turning-tests-into-improvements">Disaster Recovery Test Report: Turning Tests Into Improvements</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-audits-and-continuous-validation">Disaster Recovery Audits and Continuous Validation</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-disaster-recovery-testing"><strong>What is Disaster Recovery Testing?</strong></h2>
<p>Disaster recovery (DR) testing is the process of validating that systems, data and applications can be restored after a disruptive event within defined recovery objectives. It generally evaluates:</p>
<ul>
<li><p><strong>Recovery Time Objective (RTO)</strong>: How quickly systems must be restored.</p>
</li>
<li><p><strong>Recovery Point Objective (RPO)</strong>: How much data loss is acceptable.</p>
</li>
<li><p><strong>Operational readiness</strong>: Whether teams know what to do during an incident.</p>
</li>
</ul>
<p>A disaster recovery test plan documents how these elements are tested, who is responsible and what success looks like. Without testing, DR plans are assumptions, not guarantees.</p>
<h2 id="heading-how-disaster-recovery-testing-works-in-practice"><strong>How Disaster Recovery Testing Works in Practice</strong></h2>
<p>In real environments, disaster recovery testing is used to check all <a href="https://www.nakivo.com/blog/components-disaster-recovery-plan-checklist/">elements of the disaster recovery plan</a> and is rarely a single event. It’s a structured exercise that simulates failure, observes system behavior and measures outcomes against expectations.</p>
<p>A typical DR test involves:</p>
<ol>
<li><p><strong>Defining scope</strong> – Which applications, services, or data sets are included.</p>
</li>
<li><p><strong>Selecting a scenario</strong> – Outage, corruption, ransomware, region failure, and so on.</p>
</li>
<li><p><strong>Executing recovery actions</strong> – Restore data, fail over systems, reconfigure dependencies.</p>
</li>
<li><p><strong>Measuring results</strong> – Time to recovery, data consistency, service availability.</p>
</li>
<li><p><strong>Documenting findings</strong> – What worked, what failed, what needs improvement.</p>
</li>
</ol>
<p>For developers, the key shift is recognizing that DR testing isn’t just an ops exercise. Application architecture, data handling and deployment patterns all influence recovery outcomes.</p>
<p>Importantly, regulatory pressure is also reshaping how organizations approach recovery validation. Frameworks such as the <a href="https://heimdalsecurity.com/nis-2-directive">NIS2 Directive</a> require essential and important entities in the EU to implement robust cybersecurity risk management measures, including incident response and business continuity capabilities.</p>
<h2 id="heading-disaster-recovery-testing-methods-developers-should-know"><strong>Disaster Recovery Testing Methods Developers Should Know</strong></h2>
<p>Different testing methods provide different levels of confidence. Mature teams use more than one. Each method has a place, but relying only on low-impact testing creates blind spots that surface during real incidents.</p>
<h3 id="heading-checklist-testing"><strong>Checklist Testing</strong></h3>
<p>The simplest method: Teams review documented recovery steps without executing them. This helps validate documentation completeness but does not confirm real-world recoverability.</p>
<h3 id="heading-tabletop-exercises"><strong>Tabletop Exercises</strong></h3>
<p>Stakeholders walk through a simulated disaster scenario and discuss responses. Tabletop tests are useful for identifying communication gaps and unclear responsibilities, especially for cross-team coordination.</p>
<h3 id="heading-partial-or-component-testing"><strong>Partial or Component Testing</strong></h3>
<p>Specific systems, such as databases or backup restores, are tested in isolation. Developers often encounter this when validating recovery procedures for individual services or environments.</p>
<h3 id="heading-full-scale-testing"><strong>Full-scale Testing</strong></h3>
<p>This is the most comprehensive method. It involves actual failover or full recovery in production-like environments. While disruptive, full-scale tests provide the highest confidence.</p>
<h2 id="heading-what-technology-disaster-recovery-testing-evaluates"><strong>What Technology Disaster Recovery Testing Evaluates</strong></h2>
<p>Modern environments are complex, and disaster recovery testing must validate more than just data restores.</p>
<p>DR testing evaluates:</p>
<ul>
<li><p><strong>Backup integrity</strong> – Are backups usable, consistent and complete?</p>
</li>
<li><p><strong>Application dependencies</strong> – Do services come back in the correct order?</p>
</li>
<li><p><strong>Infrastructure recovery</strong> – Can compute, storage and networking be re-provisioned?</p>
</li>
<li><p><strong>Identity and access</strong> – Do credentials, secrets and permissions still function?</p>
</li>
<li><p><strong>Automation and scripts</strong> – Do recovery workflows still match current architectures?</p>
</li>
</ul>
<p>For developers, this often reveals hidden coupling between services, outdated scripts or environment-specific assumptions that were never documented.</p>
<h2 id="heading-how-to-test-a-disaster-recovery-plan"><strong>How to Test a Disaster Recovery Plan</strong></h2>
<p>Testing a disaster recovery plan doesn’t require shutting down production on day one. A practical, incremental approach works best.</p>
<ol>
<li><p><strong>Start with a single application</strong>: Pick a service with well-defined data and dependencies. Avoid starting with your most complex system.</p>
</li>
<li><p><strong>Validate backup restores</strong>: Restore data into a non-production environment and confirm application functionality, not just file presence.</p>
</li>
<li><p><strong>Measure RTO and RPO</strong>: Time the recovery process and compare results to stated objectives. At this stage, many teams discover that their objectives were unrealistic.</p>
</li>
<li><p><strong>Test failure assumptions</strong>: Simulate real-world issues like missing credentials, expired certificates or partial data loss.</p>
</li>
<li><p><strong>Document gaps immediately</strong>: Update the disaster recovery test plan while findings are fresh. Untested fixes are just new assumptions.</p>
</li>
</ol>
<p>This approach makes disaster recovery testing part of standard processes rather than a once-a-year compliance task.</p>
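<p>The measurement step lends itself to a small script. This is a sketch under stated assumptions: the function name, field names, and drill timings are invented for illustration, and the RPO is approximated as the gap between the last usable backup and the start of the outage.</p>

```python
from datetime import datetime, timedelta


def evaluate_drill(outage_start, service_restored, last_backup, rto_target, rpo_target):
    """Compare measured recovery time and data-loss window against the stated objectives."""
    measured_rto = service_restored - outage_start  # how long the service was down
    measured_rpo = outage_start - last_backup       # how much recent data the restore lacks
    return {
        "measured_rto": measured_rto,
        "measured_rpo": measured_rpo,
        "rto_met": rto_target >= measured_rto,
        "rpo_met": rpo_target >= measured_rpo,
    }


# Made-up drill timings for illustration.
report = evaluate_drill(
    outage_start=datetime(2026, 3, 2, 10, 0),
    service_restored=datetime(2026, 3, 2, 11, 30),
    last_backup=datetime(2026, 3, 2, 9, 45),
    rto_target=timedelta(hours=1),
    rpo_target=timedelta(minutes=30),
)
print(report["rto_met"], report["rpo_met"])  # False True: recovery took 90 minutes against a 60-minute RTO
```

<p>Recording these numbers for every drill shows whether recovery is getting faster or slower over time.</p>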
<h3 id="heading-automating-restore-validation"><strong>Automating Restore Validation</strong></h3>
<p>One of the most common gaps in disaster recovery testing is stopping at “restore completed” instead of validating that the application actually works. A restored database that can’t serve queries or contains incomplete data doesn’t meet recovery objectives.</p>
<p>Teams can reduce this risk by automating post-restore validation. For example, after restoring a PostgreSQL database into a staging or isolated DR environment, a simple validation script can confirm connectivity and basic data integrity:</p>
<pre><code class="language-python">import sys

import psycopg2


def validate_restore():
    """Connect to the restored database and confirm it holds data."""
    conn = None
    try:
        # Placeholder connection details; load real credentials from a secret store.
        conn = psycopg2.connect(
            host="restored-db.internal",
            database="appdb",
            user="dr_test_user",
            password="securepassword",
        )
        cur = conn.cursor()
        # Run a real query, not just a connection check.
        cur.execute("SELECT COUNT(*) FROM users;")
        result = cur.fetchone()

        if result and result[0] &gt; 0:
            print("Restore validation successful.")
        else:
            print("Restore validation failed: No data found.")
            sys.exit(1)
    except Exception as e:
        print(f"Restore validation error: {e}")
        sys.exit(1)
    finally:
        # Close the connection on every path, including failures.
        if conn is not None:
            conn.close()


if __name__ == "__main__":
    validate_restore()
</code></pre>
<p>This script does three important things:</p>
<ul>
<li><p>Confirms the database is reachable</p>
</li>
<li><p>Executes a real query, not just a connection check</p>
</li>
<li><p>Fails explicitly if the expected data is missing</p>
</li>
</ul>
<p>In practice, teams can integrate scripts like this into CI/CD pipelines or scheduled recovery drills. The goal isn’t to test every edge case, but to move from “backup exists” to “restore is functionally verified.” Over time, these automated checks become part of the disaster recovery test plan, helping teams measure RTO accurately and detect configuration drift before a real incident exposes it.</p>
<h2 id="heading-disaster-recovery-test-scenarios-practical-examples"><strong>Disaster Recovery Test Scenarios: Practical Examples</strong></h2>
<p>Effective disaster recovery testing focuses on realistic failures, not idealized outages.</p>
<h3 id="heading-accidental-deletion-or-misconfiguration"><strong>Accidental Deletion or Misconfiguration</strong></h3>
<p>A dropped database table, deleted storage bucket or bad configuration change tests how quickly teams can restore specific data without rolling back entire systems. These everyday incidents often reveal slow or overly manual recovery processes.</p>
<h3 id="heading-data-corruption-and-application-failure"><strong>Data Corruption and Application Failure</strong></h3>
<p>Buggy releases can silently corrupt data while systems remain online. This scenario validates point-in-time recovery and whether teams can identify when corruption started, not just restore the latest backup.</p>
<h3 id="heading-ransomware-simulation"><strong>Ransomware Simulation</strong></h3>
<p>Ransomware testing checks whether clean, uncompromised backups can be restored in isolation. It often exposes gaps in backup immutability, credential handling and realistic recovery times.</p>
<h3 id="heading-infrastructure-or-platform-outage"><strong>Infrastructure or Platform Outage</strong></h3>
<p>Simulating the loss of a cluster, availability zone or region tests automation and infrastructure-as-code maturity. In virtualized environments, most commonly <a href="https://www.nakivo.com/vmware-disaster-recovery/">VMware disaster recovery</a>, testing involves restoring virtual machines at a secondary site and validating networking and application dependencies.</p>
<h3 id="heading-credential-and-access-failure"><strong>Credential and Access Failure</strong></h3>
<p>Recovery can stall if credentials, certificates or secret keys are unavailable. Testing this scenario validates identity systems and whether recovery procedures rely on fragile access assumptions.</p>
<h2 id="heading-disaster-recovery-test-report-turning-tests-into-improvements"><strong>Disaster Recovery Test Report: Turning Tests Into Improvements</strong></h2>
<p>Testing without documentation is wasted effort. A disaster recovery test report turns results into actionable improvements.</p>
<p>A valuable DR test report includes:</p>
<ul>
<li><p>Test scope and scenario</p>
</li>
<li><p>Expected vs. actual RTO/RPO</p>
</li>
<li><p>Recovery steps executed</p>
</li>
<li><p>Failures, delays and root causes</p>
</li>
<li><p>Recommended changes</p>
</li>
</ul>
<p>For developers, this often results in concrete action items: refactoring startup dependencies, adding health checks, improving automation or adjusting data protection policies. The report should feed directly into backlog planning.</p>
<h2 id="heading-disaster-recovery-audits-and-continuous-validation"><strong>Disaster Recovery Audits and Continuous Validation</strong></h2>
<p>Audits often expose what teams already suspect: Disaster recovery plans exist, but haven’t been tested recently (or at all).</p>
<p>Rather than treating audits as one-time events, teams should adopt continuous validation:</p>
<ul>
<li><p>Regular restore tests integrated into CI/CD pipelines.</p>
</li>
<li><p>Scheduled DR tests tied to major architecture changes.</p>
</li>
<li><p>Automated alerts when recovery objectives drift.</p>
</li>
</ul>
<p>This shifts disaster recovery testing from an annual obligation to an ongoing practice that evolves alongside the environment.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Disaster recovery testing is not about pessimism; it’s about realism. Systems and people change, and failure modes evolve faster than documentation. Without testing, even the best-designed recovery plan can become outdated.</p>
<p>For developers and technical teams, practicing disaster recovery testing builds confidence rooted in evidence, not assumptions. It exposes hidden dependencies, validates data protection strategies and ensures that when something goes wrong, recovery is predictable instead of chaotic.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The AI Coding Loop: How to Guide AI With Rules and Tests ]]>
                </title>
                <description>
                    <![CDATA[ Building great software isn't about perfect prompts, it's about a disciplined process. In this guide, I'll share my workflow for shipping secure code: defining clear goals, mapping edge cases, and bui ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-guide-ai-with-rules-and-tests/</link>
                <guid isPermaLink="false">699e41d20daf99859e60d319</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation testing  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sumit Saha ]]>
                </dc:creator>
                <pubDate>Wed, 25 Feb 2026 00:26:58 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/b757ebbf-c9e9-44ec-b92c-7a38a8616e68.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building great software isn't about perfect prompts, it's about a disciplined process. In this guide, I'll share my workflow for shipping secure code: defining clear goals, mapping edge cases, and building incrementally with runnable tests.</p>
<p>Using a Node.js shopping cart example, I'll show why server-side validation and test-driven development beat "one-shot" AI outputs every time. Let's dive into how to make AI your most reliable collaborator.</p>
<h2 id="heading-some-background">Some Background</h2>
<p>Last week I did something that felt amazing for about… five seconds. I opened an AI tool, typed one sentence, and it generated a whole shopping cart module for an e-commerce app. Lots of files, lots of code, even folders and patterns. It looked professional.</p>
<p>And then I realized something: the problem was not "how fast AI wrote code." The problem was "how do I know this code is correct?"</p>
<p>Here's the truth: a big pile of code that you didn't write is not a shortcut. For most developers, it's actually extra work. You have to read it, understand it, and still catch the hidden mistakes.</p>
<p>So today I'm not going to give you another "AI is coming" talk. Instead, I'll show you a simple loop that any developer can follow – beginner, mid-level, or senior – to get better results from AI, step by step, without getting trapped. And I'll show it with a real example you can run in one file.</p>
<h2 id="heading-heres-what-well-cover">Here’s What We’ll Cover:</h2>
<ul>
<li><p><a href="#heading-the-5-second-high-and-the-real-problem">The 5-second high (and the real problem)</a></p>
</li>
<li><p><a href="#heading-the-golden-rule-never-trust-user-prices">The golden rule: never trust user prices</a></p>
</li>
<li><p><a href="#heading-the-mindset-shift-stop-asking-for-the-whole-app">The mindset shift: stop asking for the whole app</a></p>
</li>
<li><p><a href="#heading-the-ai-coding-loop-the-7-step-workflow">The AI coding loop (the 7-step workflow)</a></p>
</li>
<li><p><a href="#heading-apply-the-loop-a-server-side-cart-total-calculator">Apply the loop: a server-side cart total calculator</a></p>
<ul>
<li><a href="#heading-the-prompt-small-piece-strong-constraints">The prompt (small piece, strong constraints)</a></li>
</ul>
</li>
<li><p><a href="#heading-one-file-runnable-example-with-a-wrong-version-on-purpose">One-file runnable example (with a wrong version on purpose)</a></p>
<ul>
<li><a href="#heading-what-you-should-notice-here">What you should notice here</a></li>
</ul>
</li>
<li><p><a href="#heading-how-to-use-failing-tests-as-a-flashlight">How to use failing tests as a flashlight</a></p>
</li>
<li><p><a href="#heading-copy-paste-prompt-template">Copy-paste prompt template</a></p>
</li>
<li><p><a href="#heading-a-calm-hype-check-why-fundamentals-matter-more-now">A calm hype check: why fundamentals matter more now</a></p>
<ul>
<li><a href="#heading-a-simple-exercise-do-this-once-and-youll-feel-the-skill">A simple exercise (do this once and you'll feel the skill)</a></li>
</ul>
</li>
<li><p><a href="#heading-recap">Recap</a></p>
</li>
</ul>
<h2 id="heading-the-5-second-high-and-the-real-problem">The 5-Second High (and the Real Problem)</h2>
<p>A lot of people misunderstand AI coding. They think the main job is typing code. But the main job is thinking clearly. Typing is cheap now. Thinking is expensive.</p>
<p>When AI produces a "perfect-looking" module in one shot, the real work doesn't disappear. It moves downstream:</p>
<ul>
<li><p>You still need to understand what it generated</p>
</li>
<li><p>You still need to verify it matches your rules</p>
</li>
<li><p>You still need to catch the mistakes that hide inside "nice looking code"</p>
</li>
</ul>
<p>If you can't verify it, you don't own it. And if you don't own it, you can't safely ship it.</p>
<p><strong>Tip:</strong> Treat AI output like code from a stranger on the internet: useful, but untrusted until proven.</p>
<h2 id="heading-the-golden-rule-never-trust-user-prices">The Golden Rule: Never Trust User Prices</h2>
<p>I started exactly like a beginner would start. I opened AI and wrote a vague prompt:</p>
<blockquote>
<p>Design and develop an e-commerce shopping cart module for me.</p>
</blockquote>
<p>AI replied with a big output. It looked clean. If you're new, you might think:</p>
<blockquote>
<p>Wow, it solved it.</p>
</blockquote>
<p>But then I asked myself:</p>
<blockquote>
<p>What is the easiest way this can go wrong in real life?</p>
</blockquote>
<p>And the answer is also simple: “money can be stolen”. Because a shopping cart has one golden rule: never trust prices coming from the user.</p>
<p>If the browser sends you “T-shirt price is $1” and you accept it, someone can pay $1 for a $20 product. And when AI generates a big module quickly, that kind of mistake can easily hide inside "nice looking code."</p>
<p><strong>Warning:</strong> Any system that accepts client-sent prices is basically inviting price tampering.</p>
<h2 id="heading-the-mindset-shift-stop-asking-for-the-whole-app">The Mindset Shift: Stop Asking for the Whole App</h2>
<p>So instead of accepting the big AI output, I changed my approach. I said:</p>
<blockquote>
<p>I'm not going to ask AI to build the whole app. I will break the big thing into small parts, and I will guide AI like a real engineer.</p>
</blockquote>
<p>That is the first mindset shift. In the AI era, your value is not how fast you type. Your value is how well you can do three things:</p>
<ul>
<li><p>define the problem clearly</p>
</li>
<li><p>break it into small pieces</p>
</li>
<li><p>prove the result is correct</p>
</li>
</ul>
<p>Big systems are built from small correct pieces. That's not "prompt engineering." That's engineering.</p>
<h2 id="heading-the-ai-coding-loop-the-7-step-workflow">The AI Coding Loop (the 7-Step Workflow)</h2>
<p>Here's the loop I use. It's simple English. You can copy it and use it for any project:</p>
<ul>
<li><p>Write the goal in one sentence</p>
</li>
<li><p>Write the rules (what must be true)</p>
</li>
<li><p>Write two examples (input → output)</p>
</li>
<li><p>Write two bad situations (weird cases)</p>
</li>
<li><p>Ask AI for a small piece, not the whole thing</p>
</li>
<li><p>Ask for tests, then run them</p>
</li>
<li><p>If something fails, improve the prompt and repeat</p>
</li>
</ul>
<p>That's it. That's the loop. Here it is in visual form:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771426083349/2f126c6b-9e17-469e-881f-68e3c6c384a9.png" alt="AI coding loop workflow" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>Tip:</strong> The loop is the skill. Tools will change. The loop will still work.</p>
<h2 id="heading-apply-the-loop-a-server-side-cart-total-calculator">Apply the Loop: a Server-side Cart Total Calculator</h2>
<p>Now let's apply it to the shopping cart example. Instead of "build me a cart module," I wrote a tiny requirement note:</p>
<blockquote>
<p>We need a cart total calculator on the server. User sends <code>productId</code> and <code>quantity</code>. We must ignore any <code>price</code> from the user. We must use our own product list. We must handle unknown products and invalid <code>quantity</code>. We must calculate <code>subtotal</code>, <code>discount</code>, <code>tax</code>, and final <code>total</code>. We must round money correctly. We must have tests.</p>
</blockquote>
<p>This is not a large or complex requirements specification - just a clear and concise note.</p>
<p>And then I asked AI for only one small piece:</p>
<ul>
<li><p>Not the UI</p>
</li>
<li><p>Not the database</p>
</li>
<li><p>Not the entire architecture</p>
</li>
<li><p>Just one function, with tests</p>
</li>
</ul>
<p>Because the fastest way to build something real is to prove one brick at a time.</p>
<p>The requirement note captures everything we discussed in words. It also helps to sketch the same ideas as a simple diagram for our own reference. Together, the note and the diagram make a clean, well-documented requirement specification that we can keep in our project's GitHub <code>README.md</code> file.</p>
<p>In the diagram below, we can have a browser on the left and the server on the right. The browser/user is an untrusted input source. The user may send <code>productId</code>, <code>qty</code>, and even a fake <code>price</code>, but the server must treat only <code>productId</code> and <code>qty</code> as input and must ignore any client-sent price. The server then looks up the real price from its own trusted product catalog, validates the quantity, and calculates totals from server-side data. This is the trust boundary: prices come from the server, not from the client.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771426306275/e21c0f2c-3eda-42f9-bd44-65d74b2aa10e.png" alt="Trust boundary and price tampering" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">
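<p>To make the boundary concrete, here's a minimal sketch of the idea. The helper name <code>toTrustedItem</code> is made up for illustration: it copies only <code>productId</code> and <code>qty</code> from whatever the browser sent, so a client-sent <code>price</code> never reaches the calculation.</p>
<pre><code class="language-js">const assert = require("node:assert/strict");

// Hypothetical helper: keep only the fields the server accepts as input.
function toTrustedItem(rawItem) {
    const { productId, qty } = rawItem || {};

    return { productId, qty };
}

// The browser tries to sneak in its own price.
const fromBrowser = { productId: "tshirt", qty: 2, price: 1 };

const item = toTrustedItem(fromBrowser);

// The price field is gone before any total is calculated.
assert.deepEqual(item, { productId: "tshirt", qty: 2 });

assert.equal("price" in item, false);

console.log(item);
</code></pre>
<p>Validation of <code>productId</code> and <code>qty</code> still has to happen afterwards. This step only guarantees that the price can't come from the client.</p>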

<h3 id="heading-the-prompt-small-piece-strong-constraints">The prompt (small piece, strong constraints)</h3>
<p>This is the shape of the prompt I used:</p>
<p>Create a single JavaScript file I can run with Node.</p>
<p><strong>Goal:</strong></p>
<p>Calculate shopping cart totals.</p>
<p><strong>Rules:</strong></p>
<ul>
<li><p>Input items have productId and qty.</p>
</li>
<li><p>Do NOT trust price from user input.</p>
</li>
<li><p>Use my product catalog.</p>
</li>
<li><p>qty must be at least 1.</p>
</li>
<li><p>discountPercent and taxPercent must not be negative.</p>
</li>
<li><p>discount first, then tax.</p>
</li>
<li><p>round money to 2 decimals.</p>
</li>
</ul>
<p><strong>Examples:</strong></p>
<ul>
<li><p>2 T-shirts (20 each) + 1 mug (12.50) =&gt; subtotal 52.50</p>
</li>
<li><p>discount 10%, tax 8% =&gt; discount first, then tax</p>
</li>
</ul>
<p><strong>Deliver:</strong></p>
<ul>
<li><p>one function</p>
</li>
<li><p>simple tests using Node's built-in assert</p>
</li>
<li><p>print one example output</p>
</li>
</ul>
<p>One small change makes a massive difference: “rules + examples + tests”. AI still tries to help fast, but now it has guardrails. And if it still makes a mistake, you can catch it, because you asked for proof.</p>
<p>Here is a visual representation of the "Cart Totals Pipeline" that covers all the use cases involved in the cart totals calculation process.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771426340852/b9a1d43c-3e89-468b-b809-7c9d7ad53932.png" alt="Cart totals pipeline (discount then tax)" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In the diagram, the cart total calculation follows a fixed pipeline. First, validate inputs (known <code>productId</code>, valid <code>qty</code>, non-negative discount/tax). Next, compute <code>subtotal</code> from the trusted product catalog. Then apply the discount to get the discounted amount. After that, calculate tax on the discounted amount (not on the original subtotal). Finally, round values correctly and return the result (<code>subtotal</code>, <code>discount</code>, <code>tax</code>, and <code>total</code>). The key rule is the order: discount first, then tax.</p>
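<p>To see why the order matters, here's a small sketch that runs the prompt's example numbers both ways. It's an illustration under my own naming (<code>percentOf</code> is a hypothetical helper), and it assumes money is held as integer cents.</p>
<pre><code class="language-js">const assert = require("node:assert/strict");

// Hypothetical helper: a percentage of an amount in cents, rounded to whole cents.
function percentOf(cents, percent) {
    return Math.round(cents * (percent / 100));
}

const subtotalCents = 5250; // 2 T-shirts (20.00 each) + 1 mug (12.50)

const discountCents = percentOf(subtotalCents, 10);

// Pipeline order: tax is charged on the discounted amount.
const afterDiscountCents = subtotalCents - discountCents;

const taxCents = percentOf(afterDiscountCents, 8);

const totalCents = afterDiscountCents + taxCents;

// Wrong order: tax is charged on the full subtotal.
const taxOnSubtotalCents = percentOf(subtotalCents, 8);

const wrongTotalCents = subtotalCents - discountCents + taxOnSubtotalCents;

assert.equal(totalCents, 5103); // 51.03

assert.equal(wrongTotalCents, 5145); // 51.45

console.log("Discount then tax:", totalCents, "Tax on subtotal:", wrongTotalCents);
</code></pre>
<p>Same cart, same percentages, but the totals differ by 42 cents. That's exactly the kind of rule you want written down and tested.</p>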
<h2 id="heading-one-file-runnable-example-with-a-wrong-version-on-purpose">One-File Runnable Example (with a Wrong Version on Purpose)</h2>
<p>Now here's the one-file example you can run right now. No setup. Just Node. Create a file named <code>cart.js</code>, paste in the below code, and run <code>node cart.js</code>.</p>
<p>It includes two versions:</p>
<ul>
<li><p>a wrong version that trusts user price (this is the mistake we want to learn from)</p>
</li>
<li><p>a correct version that uses a trusted catalog</p>
</li>
</ul>
<pre><code class="language-js">// cart.js

// Run: node cart.js

const assert = require("node:assert/strict");

// Trusted product catalog (server-side truth)

const PRODUCTS = {
    tshirt: { name: "T-shirt", priceCents: 2000 }, // $20.00

    mug: { name: "Mug", priceCents: 1250 }, // $12.50

    book: { name: "Book", priceCents: 1599 }, // $15.99
};

function money(cents) {
    return (cents / 100).toFixed(2);
}

// WRONG: trusts user price

function cartTotal_WRONG(cartItems, discountPercent = 0, taxPercent = 0) {
    let subtotalCents = 0;

    for (const item of cartItems) {
        const priceCents = Math.round((item.price ?? 0) * 100); // user can cheat

        subtotalCents += priceCents * item.qty;
    }

    const discountCents = Math.round(subtotalCents * (discountPercent / 100));

    const afterDiscount = subtotalCents - discountCents;

    const taxCents = Math.round(afterDiscount * (taxPercent / 100));

    const totalCents = afterDiscount + taxCents;

    return totalCents;
}

// Correct: uses trusted catalog + checks

function cartTotal(cartItems, discountPercent = 0, taxPercent = 0) {
    if (!Array.isArray(cartItems))
        throw new Error("cartItems must be an array");

    // Number.isFinite also rejects NaN, Infinity, and non-number values
    if (!Number.isFinite(discountPercent) || discountPercent &lt; 0)
        throw new Error("discountPercent must be non-negative");

    if (!Number.isFinite(taxPercent) || taxPercent &lt; 0)
        throw new Error("taxPercent must be non-negative");

    let subtotalCents = 0;

    for (const item of cartItems) {
        const { productId, qty } = item || {};

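        // Object.hasOwn avoids matching inherited keys such as "constructor"
        if (typeof productId !== "string" || !Object.hasOwn(PRODUCTS, productId)) {
            throw new Error("Unknown productId: " + productId);
        }

        // Number.isInteger rejects NaN, Infinity, and fractional quantities
        if (!Number.isInteger(qty) || qty &lt; 1) {
            throw new Error("qty must be a whole number of at least 1");
        }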

        subtotalCents += PRODUCTS[productId].priceCents * qty;
    }

    const discountCents = Math.round(subtotalCents * (discountPercent / 100));

    let afterDiscountCents = subtotalCents - discountCents;

    if (afterDiscountCents &lt; 0) afterDiscountCents = 0;

    const taxCents = Math.round(afterDiscountCents * (taxPercent / 100));

    const totalCents = afterDiscountCents + taxCents;

    return { subtotalCents, discountCents, taxCents, totalCents };
}

function runTests() {
    // Normal example

    const cart = [
        { productId: "tshirt", qty: 2 },

        { productId: "mug", qty: 1 },
    ];

    const r = cartTotal(cart, 10, 8);

    assert.equal(r.subtotalCents, 5250); // 52.50

    assert.equal(r.discountCents, 525); // 10% of 52.50

    assert.equal(r.taxCents, 378); // 8% of 47.25

    assert.equal(r.totalCents, 5103); // 51.03

    // Attack example: user tries to cheat with price = 1

    const attackerCart = [
        { productId: "tshirt", qty: 2, price: 1 },

        { productId: "mug", qty: 1, price: 1 },
    ];

    const wrong = cartTotal_WRONG(attackerCart, 0, 0);

    assert.equal(money(wrong), "3.00"); // totally wrong in real life

    const safe = cartTotal(attackerCart, 0, 0);

    assert.equal(money(safe.totalCents), "52.50"); // correct, ignores user price

    // Edge cases

    assert.throws(() =&gt; cartTotal([{ productId: "unknown", qty: 1 }], 0, 0));

    assert.throws(() =&gt; cartTotal([{ productId: "tshirt", qty: 0 }], 0, 0));

    assert.throws(() =&gt; cartTotal(cart, -1, 0));

    assert.throws(() =&gt; cartTotal(cart, 0, -1));
}

runTests();

console.log("All tests passed.");

const example = cartTotal(
    [
        { productId: "tshirt", qty: 1 },

        { productId: "book", qty: 2 },
    ],

    15,

    5,
);

console.log("Example subtotal:", money(example.subtotalCents));

console.log("Example discount:", money(example.discountCents));

console.log("Example tax:", money(example.taxCents));

console.log("Example total:", money(example.totalCents));
</code></pre>
<p>In this code, we didn't do a magic trick. We did some engineering:</p>
<ul>
<li><p>We took a big problem and broke it into a small piece</p>
</li>
<li><p>We wrote rules so the AI doesn't guess</p>
</li>
<li><p>We wrote examples so the AI understands</p>
</li>
<li><p>We asked for tests so we can prove it</p>
</li>
<li><p>We ran the tests so we can trust it</p>
</li>
</ul>
<p>That is the loop you can reuse for any project.</p>
<h2 id="heading-how-to-use-failing-tests-as-a-flashlight">How to Use Failing Tests as a Flashlight</h2>
<p>This is the part many developers skip. They ask for code, but they don't ask for proof. When you run the tests, one of two things happens:</p>
<ul>
<li><p>Tests pass: great, you earned confidence</p>
</li>
<li><p>Tests fail: even better, you earned clarity</p>
</li>
</ul>
<p>A failing test is a flashlight. It shows you the exact place where your thinking (or your prompt) needs improvement. Instead of "AI is wrong," you get a real question:</p>
<blockquote>
<p>Which rule was unclear, missing, or contradictory?</p>
</blockquote>
<p>Then you adjust:</p>
<ul>
<li><p>add a stricter rule</p>
</li>
<li><p>add an example that removes ambiguity</p>
</li>
<li><p>add an edge case that forces the correct behavior</p>
</li>
<li><p>regenerate only the small piece, not the whole codebase</p>
</li>
</ul>
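<p>Here's a small sketch of that adjustment in action. It's a made-up example, not output from any AI tool: <code>subtotalDraft</code> and <code>subtotalCents</code> are hypothetical functions. The first test fails and points at floating-point money, so the rule gets stricter ("money is integer cents") and the test is restated to match.</p>
<pre><code class="language-js">const assert = require("node:assert/strict");

// Hypothetical first draft: adds prices as floating-point dollars.
function subtotalDraft(prices) {
    return prices.reduce(function (sum, price) {
        return sum + price;
    }, 0);
}

// Revised after the failing test: adds integer cents instead.
function subtotalCents(pricesInCents) {
    return pricesInCents.reduce(function (sum, cents) {
        return sum + cents;
    }, 0);
}

// The failing test is the flashlight: it reports the exact mismatch.
let failureMessage = null;

try {
    assert.equal(subtotalDraft([0.1, 0.2]), 0.3);
} catch (error) {
    failureMessage = error.message;
}

assert.notEqual(failureMessage, null); // the draft really does fail

// Same case, restated under the stricter rule "money is integer cents".
assert.equal(subtotalCents([10, 20]), 30);

console.log("Draft failed as expected; cents version passed.");
</code></pre>
<p>Notice that only one small function changed. The failing test told us which rule to tighten, and nothing else had to be regenerated.</p>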
<h2 id="heading-copy-paste-prompt-template">Copy-Paste Prompt Template</h2>
<p>Here is a copy-paste prompt template you can reuse from today (see below the image):</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771426376813/8b9c29e0-7681-433d-989b-f4c693ad4fb4.png" alt="Copy-paste prompt template" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-txt">
Build ONE small piece, not the full app.

Goal:

(One sentence)

Rules:

(3 to 7 bullets)

Examples:

(2 examples: input -&gt; output)

Edge cases:

(2 cases that can break it)

Deliver:

- one runnable file

- include tests using Node assert

- print one example output

Then ask:

Before giving code, list the possible mistakes and confirm the rules.
</code></pre>
<p>That last line is powerful. It forces the AI to think about failure before writing code.</p>
<h2 id="heading-a-calm-hype-check-why-fundamentals-matter-more-now">A Calm Hype Check: Why Fundamentals Matter More Now</h2>
<p>A lot of content online makes it sound like: "AI codes now, so you don't need to learn coding." That idea is a trap. Because yes, AI can type code. But AI cannot replace your responsibilities as a developer and engineer.</p>
<p>If you ship a broken cart, you can lose money. If you ship insecure code, you can get hacked. If you ship unreliable software, users leave. And in real life, nobody will accept the excuse: "The AI wrote it."</p>
<p>In the AI era, learning coding isn't less important. It's more important, just in a different way. The goal isn't to become a fast typist. The goal is to become a strong thinker.</p>
<p>Fundamentals matter more than before:</p>
<ul>
<li><p>how data flows through a system</p>
</li>
<li><p>how to break big problems into small parts</p>
</li>
<li><p>how to write clear rules and requirements</p>
</li>
<li><p>how to test and verify</p>
</li>
<li><p>how to notice edge cases</p>
</li>
<li><p>how to think about security</p>
</li>
<li><p>how to understand the tools you use, not just copy answers</p>
</li>
</ul>
<p>Average software will be everywhere. It will be cheap. It will be copied. It will be easy to make. So the only software that matters will be software that is truly valuable: safe, reliable, high quality, and built with real understanding.</p>
<p>That's good news for serious learners. Because the best engineers will become even more valuable, not less.</p>
<h3 id="heading-a-simple-exercise-do-this-once-and-youll-feel-the-skill">A Simple Exercise (do this once and you'll feel the skill)</h3>
<p>Add one more rule to the cart, like:</p>
<ul>
<li><p>qty cannot be more than 10</p>
</li>
</ul>
<p>Write the test first. Then ask AI to update the function. Run the tests.</p>
<p>That's how you train the real AI skill: not prompting, but guiding and verifying.</p>
<ul>
<li><p>Let AI type the code.</p>
</li>
<li><p>You do the thinking.</p>
</li>
<li><p>You do the breaking down.</p>
</li>
<li><p>You do the proof.</p>
</li>
</ul>
<h2 id="heading-recap">Recap</h2>
<ul>
<li><p>Don't ask AI to build the whole app</p>
</li>
<li><p>Break the problem into one small piece</p>
</li>
<li><p>Write rules, examples, and edge cases so AI doesn't guess</p>
</li>
<li><p>Always ask for tests and run them</p>
</li>
<li><p>Treat failing tests as a flashlight</p>
</li>
<li><p>Repeat the loop until you can trust what you ship</p>
</li>
</ul>
<p>That's the game now. And if you play it well, you're not behind, you're ahead.</p>
<h2 id="heading-final-words">Final Words</h2>
<p>If you found the information here valuable, feel free to share it with others who might benefit from it.</p>
<p>I’d really appreciate your thoughts – mention me on X <a href="https://x.com/sumit_analyzen">@sumit_analyzen</a> or on Facebook <a href="https://facebook.com/sumit.analyzen">@sumit.analyzen</a>, <a href="https://youtube.com/@logicBaseLabs">watch my coding tutorials</a>, or simply <a href="https://www.linkedin.com/in/sumitanalyzen/">connect with me on LinkedIn</a>.</p>
<p>You can also check out my official website <a href="https://www.sumitsaha.me">sumitsaha.me</a> for details about me.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Test React Applications with Vitest ]]>
                </title>
                <description>
                    <![CDATA[ Testing is one of those things that every developer knows they should do, but many put off until problems start appearing in production. If you’re building React applications with Vite, there's a testing framework that fits so naturally into your wor... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-test-react-applications-with-vitest/</link>
                <guid isPermaLink="false">698bb499f3de8b702a26aec1</guid>
                
                    <category>
                        <![CDATA[ unit testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ vitest ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Aiyedogbon Abraham ]]>
                </dc:creator>
                <pubDate>Tue, 10 Feb 2026 22:43:37 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770763375195/82544dec-aec2-4de9-b7f8-f90349394e81.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Testing is one of those things that every developer knows they should do, but many put off until problems start appearing in production. If you’re building React applications with Vite, there's a testing framework that fits so naturally into your workflow that you might actually enjoy writing tests. That framework is Vitest.</p>
<p>In this tutorial, you’ll learn how to set up Vitest in a React project, write effective tests for your components and hooks, and understand the testing patterns that will help you build more reliable applications.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-vitest-and-why-should-you-use-it">What is Vitest and Why Should You Use It?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-vitest-in-your-react-project">How to Set Up Vitest in Your React Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-write-your-first-test">How to Write Your First Test</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-test-react-components">How to Test React Components</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-test-user-interactions">How to Test User Interactions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-test-custom-hooks">How to Test Custom Hooks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-mock-api-calls">How to Mock API Calls</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices-for-testing-react-components">Best Practices for Testing React Components</a></p>
</li>
</ul>
<h2 id="heading-what-is-vitest-and-why-should-you-use-it">What is Vitest and Why Should You Use It?</h2>
<p>Vitest is a testing framework built on top of Vite. It uses Vite’s development server and plugin pipeline to transform and load files during testing. This means your tests use the same configuration and plugins as your app (for example, the React plugin, TypeScript support, and so on), so you don’t need a separate build or compile step.</p>
<p>Vitest runs test files in parallel across worker processes or threads, and during local development it starts in “watch” mode by default, rerunning only the tests affected by the files you change (similar in spirit to Vite’s HMR). Vitest also supports modern JavaScript out of the box: it handles ESM, TypeScript, and JSX through Vite’s transform pipeline (esbuild, or Oxc in newer Rolldown-based versions of Vite).</p>
<p>Because Vitest provides a Jest-compatible API, you can continue to use familiar testing libraries (for example, React Testing Library, jest-dom matchers, user-event, and so on) without extra setup.</p>
<p>In short, Vitest tightly integrates with your Vite-powered stack (or can even run standalone) and lets you plug in existing testing tools seamlessly.</p>
<p>Here is why Vitest has become popular in the React ecosystem:</p>
<ul>
<li><p><strong>Speed</strong>: Vitest can run tests several times faster than Jest in some scenarios, although the gain depends on the project. This speed comes from Vite's on-demand transforms, its caching, and an HMR-like watch mode that reruns only the affected tests.</p>
</li>
<li><p><strong>Zero configuration</strong>: Unlike Jest, which typically needs Babel integration, <code>ts-jest</code> setup, and several extra dependencies for a TypeScript or JSX project, Vitest works out of the box. It reuses your existing Vite configuration, eliminating the need to configure a separate test pipeline.</p>
</li>
<li><p><strong>Native TypeScript support</strong>: Vitest handles TypeScript and JSX natively through Vite's transformer (esbuild or Oxc, depending on your Vite version), with no additional configuration needed.</p>
</li>
<li><p><strong>Modern JavaScript</strong>: Vitest offers native support for ES modules out of the box, making it ideal for modern JavaScript stacks.</p>
</li>
<li><p><strong>Familiar API</strong>: If you know Jest, you already know most of Vitest. The API is intentionally compatible, making migration straightforward.</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along with this tutorial, you should have:</p>
<ul>
<li><p>Basic knowledge of React and JavaScript</p>
</li>
<li><p>Understanding of React Hooks</p>
</li>
<li><p>Node.js installed (a current LTS release; recent versions of Vite require Node.js 20.19+ or 22.12+)</p>
</li>
<li><p>A React project created with Vite (or you can create one as we go)</p>
</li>
</ul>
<h2 id="heading-how-to-set-up-vitest-in-your-react-project">How to Set Up Vitest in Your React Project</h2>
<p>Let's start by creating a new React project with Vite and setting up Vitest.</p>
<h3 id="heading-step-1-create-a-react-project-with-vite">Step 1: Create a React Project with Vite</h3>
<p>If you don't have an existing project, create one with the following command:</p>
<pre><code class="lang-bash">npm create vite@latest my-react-app -- --template react
<span class="hljs-built_in">cd</span> my-react-app
npm install
</code></pre>
<p>This creates a React project with Vite as the build tool.</p>
<h3 id="heading-step-2-install-vitest-and-testing-dependencies">Step 2: Install Vitest and Testing Dependencies</h3>
<p>Install Vitest along with the React Testing Library and other necessary dependencies:</p>
<pre><code class="lang-bash">npm install --save-dev vitest @testing-library/react @testing-library/jest-dom @testing-library/user-event jsdom
</code></pre>
<p>Here's what each package does:</p>
<ul>
<li><p><strong>vitest</strong>: The testing framework itself</p>
</li>
<li><p><strong>@testing-library/react</strong>: Provides utilities for testing React components</p>
</li>
<li><p><strong>@testing-library/jest-dom</strong>: Adds custom matchers for DOM assertions</p>
</li>
<li><p><strong>@testing-library/user-event</strong>: Simulates user interactions</p>
</li>
<li><p><strong>jsdom</strong>: Provides a DOM environment for testing</p>
</li>
</ul>
<h3 id="heading-step-3-configure-vitest">Step 3: Configure Vitest</h3>
<p>Create a <code>vitest.config.js</code> file in your project root:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { defineConfig } <span class="hljs-keyword">from</span> <span class="hljs-string">'vitest/config'</span>;
<span class="hljs-keyword">import</span> react <span class="hljs-keyword">from</span> <span class="hljs-string">'@vitejs/plugin-react'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> defineConfig({
  <span class="hljs-attr">plugins</span>: [react()],
  <span class="hljs-attr">test</span>: {
    <span class="hljs-attr">globals</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">environment</span>: <span class="hljs-string">'jsdom'</span>,
    <span class="hljs-attr">setupFiles</span>: <span class="hljs-string">'./src/test/setup.js'</span>,
  },
});
</code></pre>
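<p>Setting <code>globals: true</code> exposes Vitest's test APIs (such as <code>describe</code>, <code>it</code>, <code>expect</code>, and <code>vi</code>) globally, so you don't need to import them in every test file. The <code>environment: 'jsdom'</code> setting tells Vitest to use jsdom to simulate a browser environment, and <code>setupFiles</code> points to a file that runs before each test file (you'll create it in the next step).</p>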
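<p>When a <code>vitest.config.js</code> file exists, Vitest reads it in preference to <code>vite.config.js</code>, which is why the React plugin is repeated in the config for this step. If you would rather not duplicate your app's settings, one option is to merge the two files. The fragment below is a sketch that assumes your <code>vite.config.js</code> exports a plain config object, as the Vite React template does:</p>

```javascript
// vitest.config.js: a sketch that assumes a vite.config.js sits next to it
// and exports a config object (not a function).
import { defineConfig, mergeConfig } from 'vitest/config';
import viteConfig from './vite.config.js';

export default mergeConfig(
  viteConfig,
  defineConfig({
    test: {
      globals: true,
      environment: 'jsdom',
      setupFiles: './src/test/setup.js',
    },
  })
);
```

<p>With this approach, plugins and aliases defined for the app are picked up by the tests automatically, and only the <code>test</code> block lives in the Vitest config.</p>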
<h3 id="heading-step-4-create-the-test-setup-file">Step 4: Create the Test Setup File</h3>
<p>Create a file at <code>src/test/setup.js</code>:</p>
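<pre><code class="lang-javascript"><span class="hljs-comment">// Runs before each test file (see setupFiles in vitest.config.js)</span>
<span class="hljs-keyword">import</span> { afterEach } <span class="hljs-keyword">from</span> <span class="hljs-string">'vitest'</span>;
<span class="hljs-keyword">import</span> { cleanup } <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/react'</span>;
<span class="hljs-comment">// Registers jest-dom matchers such as toBeInTheDocument() on Vitest's expect</span>
<span class="hljs-keyword">import</span> <span class="hljs-string">'@testing-library/jest-dom/vitest'</span>;

<span class="hljs-comment">// Unmount rendered components after every test</span>
afterEach(<span class="hljs-function">() =&gt;</span> {
  cleanup();
});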
</code></pre>
<p>The <code>cleanup()</code> function runs after each test to clean up the DOM, ensuring tests don't interfere with each other.</p>
<h3 id="heading-step-5-add-test-scripts">Step 5: Add Test Scripts</h3>
<p>Add the following scripts to your <code>package.json</code>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"dev"</span>: <span class="hljs-string">"vite"</span>,
    <span class="hljs-attr">"build"</span>: <span class="hljs-string">"vite build"</span>,
    <span class="hljs-attr">"test"</span>: <span class="hljs-string">"vitest"</span>,
    <span class="hljs-attr">"test:ui"</span>: <span class="hljs-string">"vitest --ui"</span>,
    <span class="hljs-attr">"coverage"</span>: <span class="hljs-string">"vitest --coverage"</span>
  }
}
</code></pre>
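<p>Now you can run tests with <code>npm test</code>. The <code>test:ui</code> and <code>coverage</code> scripts rely on the optional <code>@vitest/ui</code> and <code>@vitest/coverage-v8</code> packages, which are not part of the install command in Step 2.</p>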
<h2 id="heading-how-to-write-your-first-test">How to Write Your First Test</h2>
<p>Let's write a simple test to make sure everything is working. Create a file called <code>sum.test.js</code> in your <code>src</code> directory:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { expect, test } <span class="hljs-keyword">from</span> <span class="hljs-string">'vitest'</span>;

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">sum</span>(<span class="hljs-params">a, b</span>) </span>{
  <span class="hljs-keyword">return</span> a + b;
}

test(<span class="hljs-string">'adds 1 + 2 to equal 3'</span>, <span class="hljs-function">() =&gt;</span> {
  expect(sum(<span class="hljs-number">1</span>, <span class="hljs-number">2</span>)).toBe(<span class="hljs-number">3</span>);
});
</code></pre>
<p>Run <code>npm test</code> and you should see your test pass. A test in Vitest passes if it doesn't throw an error.</p>
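<p>To make that pass/fail rule concrete, here is a minimal sketch of how a runner can decide a test's outcome. It is a toy model rather than Vitest's actual implementation, and <code>runTest</code> and <code>miniExpect</code> are hypothetical names invented for this illustration:</p>

```javascript
// Toy model of a test runner: a test passes unless its callback throws.
// runTest and miniExpect are illustrative names, not part of Vitest.
function miniExpect(actual) {
  return {
    toBe(expected) {
      // Compare with Object.is and throw on mismatch, like a failed assertion.
      if (!Object.is(actual, expected)) {
        throw new Error(`expected ${String(expected)} but received ${String(actual)}`);
      }
    },
  };
}

function runTest(name, fn) {
  try {
    fn();
    return { name, status: 'passed' };
  } catch (error) {
    return { name, status: 'failed', message: error.message };
  }
}

function sum(a, b) {
  return a + b;
}

const passing = runTest('adds 1 + 2 to equal 3', () => {
  miniExpect(sum(1, 2)).toBe(3);
});

const failing = runTest('adds 1 + 2 to equal 4', () => {
  miniExpect(sum(1, 2)).toBe(4);
});

console.log(passing.status); // passed
console.log(failing.status, '-', failing.message);
```

<p>The second test fails only because the assertion throws. This is why an assertion library and a test runner can work together without knowing much about each other.</p>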
<h2 id="heading-how-to-test-react-components">How to Test React Components</h2>
<p>Now let's test an actual React component. We'll start with a simple component and gradually build up to more complex scenarios.</p>
<h3 id="heading-testing-a-simple-component">Testing a Simple Component</h3>
<p>Create a component called <code>Greeting.jsx</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Greeting</span>(<span class="hljs-params">{ name }</span>) </span>{
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>Hello, {name}!<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Welcome to our application<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  );
}
</code></pre>
<p>Now create a test file <code>Greeting.test.jsx</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { render, screen } <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/react'</span>;
<span class="hljs-keyword">import</span> { Greeting } <span class="hljs-keyword">from</span> <span class="hljs-string">'./Greeting'</span>;

describe(<span class="hljs-string">'Greeting Component'</span>, <span class="hljs-function">() =&gt;</span> {
  it(<span class="hljs-string">'should render the greeting with the provided name'</span>, <span class="hljs-function">() =&gt;</span> {
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Greeting</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"Alice"</span> /&gt;</span></span>);

    <span class="hljs-keyword">const</span> heading = screen.getByRole(<span class="hljs-string">'heading'</span>, { <span class="hljs-attr">level</span>: <span class="hljs-number">1</span> });
    expect(heading).toHaveTextContent(<span class="hljs-string">'Hello, Alice!'</span>);
  });

  it(<span class="hljs-string">'should render the welcome message'</span>, <span class="hljs-function">() =&gt;</span> {
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Greeting</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"Bob"</span> /&gt;</span></span>);

    <span class="hljs-keyword">const</span> paragraph = screen.getByText(<span class="hljs-string">'Welcome to our application'</span>);
    expect(paragraph).toBeInTheDocument();
  });
});
</code></pre>
<p>The <code>describe</code> function groups related tests into a suite, and each <code>it</code> call defines one test case. Neither is imported here because <code>globals: true</code> makes them available in every test file.</p>
<p>The <code>render</code> function from React Testing Library renders your component in a test environment. The <code>screen</code> object provides query methods to find elements in the rendered output.</p>
<h3 id="heading-understanding-query-functions">Understanding Query Functions</h3>
<p>React Testing Library provides three families of query functions: <code>getBy</code>, <code>queryBy</code>, and <code>findBy</code>.</p>
<p><strong>getBy queries</strong>: Throw an error if the element isn't found. Use these when you expect the element to be present.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> button = screen.getByRole(<span class="hljs-string">'button'</span>, { <span class="hljs-attr">name</span>: <span class="hljs-regexp">/click me/i</span> });
</code></pre>
<p><strong>queryBy queries</strong>: Return <code>null</code> if the element isn't found. Use these when you want to assert that an element doesn't exist.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> errorMessage = screen.queryByText(<span class="hljs-string">'Error'</span>);
expect(errorMessage).not.toBeInTheDocument();
</code></pre>
<p><strong>findBy queries</strong>: Return a promise and wait for the element to appear. Use these for asynchronous operations.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> loadedData = <span class="hljs-keyword">await</span> screen.findByText(<span class="hljs-string">'Data loaded'</span>);
</code></pre>
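<p>The difference between the three families is easier to see in a simplified model. The sketch below uses a plain array of strings in place of a rendered DOM. The helper functions share their names with the real queries but are illustrative stand-ins, not Testing Library's implementation, and the <code>timeout</code> and <code>interval</code> options are assumptions made for the example:</p>

```javascript
// Simplified model of the three query families, using an array of strings
// in place of a rendered DOM. These helpers are illustrative only.
function queryByText(nodes, text) {
  const match = nodes.find((node) => node === text);
  return match === undefined ? null : match;
}

function getByText(nodes, text) {
  const match = queryByText(nodes, text);
  if (match === null) {
    throw new Error(`Unable to find an element with the text: ${text}`);
  }
  return match;
}

function findByText(nodes, text, { timeout = 1000, interval = 50 } = {}) {
  const startedAt = Date.now();
  return new Promise((resolve, reject) => {
    const check = () => {
      const match = queryByText(nodes, text);
      if (match !== null) {
        resolve(match);
      } else if (Date.now() - startedAt >= timeout) {
        reject(new Error(`Timed out waiting for text: ${text}`));
      } else {
        // Not there yet: try again shortly.
        setTimeout(check, interval);
      }
    };
    check();
  });
}

const screenText = ['Loading...'];

console.log(queryByText(screenText, 'Data loaded')); // null

// Simulate data arriving later, then wait for it to appear.
setTimeout(() => screenText.push('Data loaded'), 100);
findByText(screenText, 'Data loaded').then((found) => console.log(found));
```

<p>The real <code>getBy</code> and <code>findBy</code> queries also throw when more than one element matches, which this sketch leaves out.</p>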
<h3 id="heading-testing-a-counter-component">Testing a Counter Component</h3>
<p>Let's test a more interactive component. Create <code>Counter.jsx</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { useState } <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Counter</span>(<span class="hljs-params">{ initialCount = <span class="hljs-number">0</span> }</span>) </span>{
  <span class="hljs-keyword">const</span> [count, setCount] = useState(initialCount);

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Count: {count}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count + 1)}&gt;Increment<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(count - 1)}&gt;Decrement<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">onClick</span>=<span class="hljs-string">{()</span> =&gt;</span> setCount(0)}&gt;Reset<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>
  );
}
</code></pre>
<p>Create the test file <code>Counter.test.jsx</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { render, screen } <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/react'</span>;
<span class="hljs-keyword">import</span> userEvent <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/user-event'</span>;
<span class="hljs-keyword">import</span> { Counter } <span class="hljs-keyword">from</span> <span class="hljs-string">'./Counter'</span>;

describe(<span class="hljs-string">'Counter Component'</span>, <span class="hljs-function">() =&gt;</span> {
  it(<span class="hljs-string">'should render with initial count of 0'</span>, <span class="hljs-function">() =&gt;</span> {
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Counter</span> /&gt;</span></span>);

    expect(screen.getByText(<span class="hljs-string">'Count: 0'</span>)).toBeInTheDocument();
  });

  it(<span class="hljs-string">'should render with custom initial count'</span>, <span class="hljs-function">() =&gt;</span> {
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Counter</span> <span class="hljs-attr">initialCount</span>=<span class="hljs-string">{5}</span> /&gt;</span></span>);

    expect(screen.getByText(<span class="hljs-string">'Count: 5'</span>)).toBeInTheDocument();
  });

  it(<span class="hljs-string">'should increment count when increment button is clicked'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> user = userEvent.setup();
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Counter</span> /&gt;</span></span>);

    <span class="hljs-keyword">const</span> incrementButton = screen.getByRole(<span class="hljs-string">'button'</span>, { <span class="hljs-attr">name</span>: <span class="hljs-regexp">/increment/i</span> });
    <span class="hljs-keyword">await</span> user.click(incrementButton);

    expect(screen.getByText(<span class="hljs-string">'Count: 1'</span>)).toBeInTheDocument();
  });

  it(<span class="hljs-string">'should decrement count when decrement button is clicked'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> user = userEvent.setup();
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Counter</span> <span class="hljs-attr">initialCount</span>=<span class="hljs-string">{5}</span> /&gt;</span></span>);

    <span class="hljs-keyword">const</span> decrementButton = screen.getByRole(<span class="hljs-string">'button'</span>, { <span class="hljs-attr">name</span>: <span class="hljs-regexp">/decrement/i</span> });
    <span class="hljs-keyword">await</span> user.click(decrementButton);

    expect(screen.getByText(<span class="hljs-string">'Count: 4'</span>)).toBeInTheDocument();
  });

  it(<span class="hljs-string">'should reset count to 0 when reset button is clicked'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> user = userEvent.setup();
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Counter</span> <span class="hljs-attr">initialCount</span>=<span class="hljs-string">{10}</span> /&gt;</span></span>);

    <span class="hljs-keyword">const</span> resetButton = screen.getByRole(<span class="hljs-string">'button'</span>, { <span class="hljs-attr">name</span>: <span class="hljs-regexp">/reset/i</span> });
    <span class="hljs-keyword">await</span> user.click(resetButton);

    expect(screen.getByText(<span class="hljs-string">'Count: 0'</span>)).toBeInTheDocument();
  });
});
</code></pre>
<p>In these Counter tests, we first use <code>render(&lt;Counter /&gt;)</code> to mount the component into the jsdom document. We then query the output using Testing Library’s <code>screen</code> object. For example, <code>screen.getByText('Count: 0')</code> finds the element displaying the initial count of 0, and <code>expect(...).toBeInTheDocument()</code> asserts that it is present. The <code>getByText</code> query will throw an error if the text isn’t found, immediately failing the test.</p>
<p>For interactive tests, we create a <code>user</code> with <code>const user = userEvent.setup()</code> and then call <code>await user.click(...)</code> on the increment/decrement/reset buttons. The <code>user.click</code> method simulates a real user click (dispatching the sequence of events a browser would fire). We locate buttons by their accessible role and name (for example, <code>getByRole('button', { name: /increment/i })</code>), following best practices for accessible queries.</p>
<p>After each click, we assert that the DOM updates accordingly (for example, the count text changes to “Count: 1”). Using <code>async/await</code> with <code>user.click</code> ensures the test waits for any state changes. In this way, each test checks the user-visible behavior: that clicking the Increment button increases the count, the Decrement button decreases it, and the Reset button sets it back to zero, without depending on the component’s internal implementation.</p>
<h2 id="heading-how-to-test-user-interactions">How to Test User Interactions</h2>
<p>User interactions are a critical part of testing React applications. The <code>@testing-library/user-event</code> library provides a more realistic simulation of user behavior than simple event dispatching.</p>
<h3 id="heading-testing-form-inputs">Testing Form Inputs</h3>
<p>Create a <code>LoginForm.jsx</code> component:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { useState } <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">LoginForm</span>(<span class="hljs-params">{ onSubmit }</span>) </span>{
  <span class="hljs-keyword">const</span> [email, setEmail] = useState(<span class="hljs-string">''</span>);
  <span class="hljs-keyword">const</span> [password, setPassword] = useState(<span class="hljs-string">''</span>);
  <span class="hljs-keyword">const</span> [error, setError] = useState(<span class="hljs-string">''</span>);

  <span class="hljs-keyword">const</span> handleSubmit = <span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> {
    e.preventDefault();

    <span class="hljs-keyword">if</span> (!email || !password) {
      setError(<span class="hljs-string">'Both fields are required'</span>);
      <span class="hljs-keyword">return</span>;
    }

    setError(<span class="hljs-string">''</span>);
    onSubmit({ email, password });
  };

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">form</span> <span class="hljs-attr">onSubmit</span>=<span class="hljs-string">{handleSubmit}</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">htmlFor</span>=<span class="hljs-string">"email"</span>&gt;</span>Email<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
          <span class="hljs-attr">id</span>=<span class="hljs-string">"email"</span>
          <span class="hljs-attr">type</span>=<span class="hljs-string">"email"</span>
          <span class="hljs-attr">value</span>=<span class="hljs-string">{email}</span>
          <span class="hljs-attr">onChange</span>=<span class="hljs-string">{(e)</span> =&gt;</span> setEmail(e.target.value)}
        /&gt;
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">htmlFor</span>=<span class="hljs-string">"password"</span>&gt;</span>Password<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
          <span class="hljs-attr">id</span>=<span class="hljs-string">"password"</span>
          <span class="hljs-attr">type</span>=<span class="hljs-string">"password"</span>
          <span class="hljs-attr">value</span>=<span class="hljs-string">{password}</span>
          <span class="hljs-attr">onChange</span>=<span class="hljs-string">{(e)</span> =&gt;</span> setPassword(e.target.value)}
        /&gt;
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      {error &amp;&amp; <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">role</span>=<span class="hljs-string">"alert"</span>&gt;</span>{error}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>}
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span>&gt;</span>Log In<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">form</span>&gt;</span></span>
  );
}
</code></pre>
<p>Create the test file <code>LoginForm.test.jsx</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { render, screen } <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/react'</span>;
<span class="hljs-keyword">import</span> userEvent <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/user-event'</span>;
<span class="hljs-keyword">import</span> { LoginForm } <span class="hljs-keyword">from</span> <span class="hljs-string">'./LoginForm'</span>;

describe(<span class="hljs-string">'LoginForm Component'</span>, <span class="hljs-function">() =&gt;</span> {
  it(<span class="hljs-string">'should render email and password inputs'</span>, <span class="hljs-function">() =&gt;</span> {
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">LoginForm</span> <span class="hljs-attr">onSubmit</span>=<span class="hljs-string">{()</span> =&gt;</span> {}} /&gt;</span>);

    expect(screen.getByLabelText(<span class="hljs-regexp">/email/i</span>)).toBeInTheDocument();
    expect(screen.getByLabelText(<span class="hljs-regexp">/password/i</span>)).toBeInTheDocument();
  });

  it(<span class="hljs-string">'should update input values when user types'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> user = userEvent.setup();
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">LoginForm</span> <span class="hljs-attr">onSubmit</span>=<span class="hljs-string">{()</span> =&gt;</span> {}} /&gt;</span>);

    <span class="hljs-keyword">const</span> emailInput = screen.getByLabelText(<span class="hljs-regexp">/email/i</span>);
    <span class="hljs-keyword">const</span> passwordInput = screen.getByLabelText(<span class="hljs-regexp">/password/i</span>);

    <span class="hljs-keyword">await</span> user.type(emailInput, <span class="hljs-string">'test@example.com'</span>);
    <span class="hljs-keyword">await</span> user.type(passwordInput, <span class="hljs-string">'password123'</span>);

    expect(emailInput).toHaveValue(<span class="hljs-string">'test@example.com'</span>);
    expect(passwordInput).toHaveValue(<span class="hljs-string">'password123'</span>);
  });

  it(<span class="hljs-string">'should show error when form is submitted empty'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> user = userEvent.setup();
    <span class="hljs-keyword">const</span> mockSubmit = vi.fn();
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">LoginForm</span> <span class="hljs-attr">onSubmit</span>=<span class="hljs-string">{mockSubmit}</span> /&gt;</span></span>);

    <span class="hljs-keyword">const</span> submitButton = screen.getByRole(<span class="hljs-string">'button'</span>, { <span class="hljs-attr">name</span>: <span class="hljs-regexp">/log in/i</span> });
    <span class="hljs-keyword">await</span> user.click(submitButton);

    expect(screen.getByRole(<span class="hljs-string">'alert'</span>)).toHaveTextContent(<span class="hljs-string">'Both fields are required'</span>);
    expect(mockSubmit).not.toHaveBeenCalled();
  });

  it(<span class="hljs-string">'should call onSubmit with form data when valid'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> user = userEvent.setup();
    <span class="hljs-keyword">const</span> mockSubmit = vi.fn();
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">LoginForm</span> <span class="hljs-attr">onSubmit</span>=<span class="hljs-string">{mockSubmit}</span> /&gt;</span></span>);

    <span class="hljs-keyword">await</span> user.type(screen.getByLabelText(<span class="hljs-regexp">/email/i</span>), <span class="hljs-string">'test@example.com'</span>);
    <span class="hljs-keyword">await</span> user.type(screen.getByLabelText(<span class="hljs-regexp">/password/i</span>), <span class="hljs-string">'password123'</span>);
    <span class="hljs-keyword">await</span> user.click(screen.getByRole(<span class="hljs-string">'button'</span>, { <span class="hljs-attr">name</span>: <span class="hljs-regexp">/log in/i</span> }));

    expect(mockSubmit).toHaveBeenCalledWith({
      <span class="hljs-attr">email</span>: <span class="hljs-string">'test@example.com'</span>,
      <span class="hljs-attr">password</span>: <span class="hljs-string">'password123'</span>,
    });
  });
});
</code></pre>
<p>The LoginForm tests similarly use <code>render</code> and <code>screen</code> to interact with the component. We use <code>screen.getByLabelText(/email/i)</code> and <code>screen.getByLabelText(/password/i)</code> to find the input fields by their associated labels, mimicking how users identify form fields.</p>
<p>To simulate typing, we use <code>await user.type(input, text)</code>, which dispatches the keyboard and input events that real typing would produce (via user-event). After typing, we assert the input’s value with <code>expect(input).toHaveValue(...)</code> (a custom matcher from jest-dom).</p>
<p>When submitting the form empty, clicking the <strong>Log In</strong> button triggers the form’s validation and displays an error message. We find this error by querying <code>getByRole('alert')</code> and check its text content. We also assert that the mock <code>onSubmit</code> handler was <em>not</em> called.</p>
<p>In the valid submission test, we fill both fields and click <strong>Log In</strong>; then <code>expect(mockSubmit).toHaveBeenCalledWith({...})</code> verifies the submit handler received the correct <code>{ email, password }</code> object.</p>
<p>These tests focus on user actions and outcomes: typing and clicking drive the form logic, and our assertions confirm the expected outputs (visible error text or the callback arguments).</p>
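<p>If you're wondering what a mock function such as <code>vi.fn()</code> gives you, the core idea is that it records the arguments of every call so assertions can inspect them later. The sketch below shows that idea in plain JavaScript. <code>createMockFn</code> and <code>submitLogin</code> are hypothetical helpers written for this illustration (the latter copies the form's validation rule without React), and only the <code>mock.calls</code> shape mirrors Vitest:</p>

```javascript
// Minimal stand-in for a mock function: it records every call's arguments.
// createMockFn is a hypothetical helper, not Vitest's vi.fn implementation.
function createMockFn() {
  const calls = [];
  const mockFn = (...args) => {
    calls.push(args);
  };
  mockFn.mock = { calls };
  return mockFn;
}

// Same validation rule as the LoginForm submit handler, without React.
function submitLogin({ email, password }, onSubmit) {
  if (!email || !password) {
    return 'Both fields are required';
  }
  onSubmit({ email, password });
  return '';
}

const onSubmit = createMockFn();

const emptyError = submitLogin({ email: '', password: '' }, onSubmit);
console.log(emptyError); // Both fields are required
console.log(onSubmit.mock.calls.length); // 0

submitLogin({ email: 'test@example.com', password: 'password123' }, onSubmit);
console.log(onSubmit.mock.calls.length); // 1
console.log(onSubmit.mock.calls[0][0].email); // test@example.com
```

<p>Matchers like <code>toHaveBeenCalled</code> and <code>toHaveBeenCalledWith</code> are assertions over this kind of recorded call list.</p>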
<h2 id="heading-how-to-test-custom-hooks">How to Test Custom Hooks</h2>
<p>Custom hooks encapsulate reusable logic, and they need testing just like components. React Testing Library provides a <code>renderHook</code> function specifically for this purpose.</p>
<h3 id="heading-creating-and-testing-a-custom-hook">Creating and Testing a Custom Hook</h3>
<p>Create a custom hook <code>useFetch.js</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { useState, useEffect } <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">useFetch</span>(<span class="hljs-params">url</span>) </span>{
  <span class="hljs-keyword">const</span> [data, setData] = useState(<span class="hljs-literal">null</span>);
  <span class="hljs-keyword">const</span> [loading, setLoading] = useState(<span class="hljs-literal">true</span>);
  <span class="hljs-keyword">const</span> [error, setError] = useState(<span class="hljs-literal">null</span>);

  useEffect(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> fetchData = <span class="hljs-keyword">async</span> () =&gt; {
      <span class="hljs-keyword">try</span> {
        setLoading(<span class="hljs-literal">true</span>);
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(url);

        <span class="hljs-keyword">if</span> (!response.ok) {
          <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Network response was not ok'</span>);
        }

        <span class="hljs-keyword">const</span> json = <span class="hljs-keyword">await</span> response.json();
        setData(json);
        setError(<span class="hljs-literal">null</span>);
      } <span class="hljs-keyword">catch</span> (err) {
        setError(err.message);
        setData(<span class="hljs-literal">null</span>);
      } <span class="hljs-keyword">finally</span> {
        setLoading(<span class="hljs-literal">false</span>);
      }
    };

    fetchData();
  }, [url]);

  <span class="hljs-keyword">return</span> { data, loading, error };
}
</code></pre>
<p>Create the test file <code>useFetch.test.js</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { renderHook, waitFor } <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/react'</span>;
<span class="hljs-keyword">import</span> { useFetch } <span class="hljs-keyword">from</span> <span class="hljs-string">'./useFetch'</span>;

describe(<span class="hljs-string">'useFetch Hook'</span>, <span class="hljs-function">() =&gt;</span> {
  beforeEach(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">global</span>.fetch = vi.fn();
  });

  afterEach(<span class="hljs-function">() =&gt;</span> {
    vi.restoreAllMocks();
  });

  it(<span class="hljs-string">'should return loading state initially'</span>, <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">global</span>.fetch.mockImplementation(<span class="hljs-function">() =&gt;</span> 
      <span class="hljs-built_in">Promise</span>.resolve({
        <span class="hljs-attr">ok</span>: <span class="hljs-literal">true</span>,
        <span class="hljs-attr">json</span>: <span class="hljs-keyword">async</span> () =&gt; ({ <span class="hljs-attr">data</span>: <span class="hljs-string">'test'</span> }),
      })
    );

    <span class="hljs-keyword">const</span> { result } = renderHook(<span class="hljs-function">() =&gt;</span> useFetch(<span class="hljs-string">'https://api.example.com/data'</span>));

    expect(result.current.loading).toBe(<span class="hljs-literal">true</span>);
    expect(result.current.data).toBe(<span class="hljs-literal">null</span>);
    expect(result.current.error).toBe(<span class="hljs-literal">null</span>);
  });

  it(<span class="hljs-string">'should return data when fetch succeeds'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> mockData = { <span class="hljs-attr">id</span>: <span class="hljs-number">1</span>, <span class="hljs-attr">title</span>: <span class="hljs-string">'Test Post'</span> };

    <span class="hljs-built_in">global</span>.fetch.mockImplementation(<span class="hljs-function">() =&gt;</span>
      <span class="hljs-built_in">Promise</span>.resolve({
        <span class="hljs-attr">ok</span>: <span class="hljs-literal">true</span>,
        <span class="hljs-attr">json</span>: <span class="hljs-keyword">async</span> () =&gt; mockData,
      })
    );

    <span class="hljs-keyword">const</span> { result } = renderHook(<span class="hljs-function">() =&gt;</span> useFetch(<span class="hljs-string">'https://api.example.com/posts/1'</span>));

    <span class="hljs-keyword">await</span> waitFor(<span class="hljs-function">() =&gt;</span> expect(result.current.loading).toBe(<span class="hljs-literal">false</span>));

    expect(result.current.data).toEqual(mockData);
    expect(result.current.error).toBe(<span class="hljs-literal">null</span>);
  });

  it(<span class="hljs-string">'should return error when fetch fails'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-built_in">global</span>.fetch.mockImplementation(<span class="hljs-function">() =&gt;</span>
      <span class="hljs-built_in">Promise</span>.resolve({
        <span class="hljs-attr">ok</span>: <span class="hljs-literal">false</span>,
      })
    );

    <span class="hljs-keyword">const</span> { result } = renderHook(<span class="hljs-function">() =&gt;</span> useFetch(<span class="hljs-string">'https://api.example.com/posts/1'</span>));

    <span class="hljs-keyword">await</span> waitFor(<span class="hljs-function">() =&gt;</span> expect(result.current.loading).toBe(<span class="hljs-literal">false</span>));

    expect(result.current.data).toBe(<span class="hljs-literal">null</span>);
    expect(result.current.error).toBe(<span class="hljs-string">'Network response was not ok'</span>);
  });
});
</code></pre>
<p>The <code>renderHook</code> function from React Testing Library renders custom hooks, and <code>waitFor</code> is used to wait for asynchronous state updates in the hook.</p>
<h2 id="heading-how-to-mock-api-calls">How to Mock API Calls</h2>
<p>When testing components that make API calls, you don't want to hit real endpoints. Mocking ensures your tests are fast, reliable, and don't depend on network conditions.</p>
<h3 id="heading-mocking-with-vitest">Mocking with Vitest</h3>
<p>Unlike Jest, Vitest doesn’t pick up mocks from a <code>__mocks__</code> directory automatically: a module is only mocked once you call <code>vi.mock()</code> for it explicitly. Let's see how to mock an Axios call.</p>
<p>Create a <code>PostsList.jsx</code> component:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { useState, useEffect } <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;
<span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">PostsList</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> [posts, setPosts] = useState([]);
  <span class="hljs-keyword">const</span> [loading, setLoading] = useState(<span class="hljs-literal">true</span>);
  <span class="hljs-keyword">const</span> [error, setError] = useState(<span class="hljs-literal">null</span>);

  useEffect(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> fetchPosts = <span class="hljs-keyword">async</span> () =&gt; {
      <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> axios.get(<span class="hljs-string">'https://api.example.com/posts'</span>);
        setPosts(response.data);
      } <span class="hljs-keyword">catch</span> (err) {
        setError(err.message);
      } <span class="hljs-keyword">finally</span> {
        setLoading(<span class="hljs-literal">false</span>);
      }
    };

    fetchPosts();
  }, []);

  <span class="hljs-keyword">if</span> (loading) <span class="hljs-keyword">return</span> <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Loading...<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span></span>;
  <span class="hljs-keyword">if</span> (error) <span class="hljs-keyword">return</span> <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Error: {error}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span></span>;

  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">ul</span>&gt;</span>
      {posts.map((post) =&gt; (
        <span class="hljs-tag">&lt;<span class="hljs-name">li</span> <span class="hljs-attr">key</span>=<span class="hljs-string">{post.id}</span>&gt;</span>{post.title}<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
      ))}
    <span class="hljs-tag">&lt;/<span class="hljs-name">ul</span>&gt;</span></span>
  );
}
</code></pre>
<p>Create the test file <code>PostsList.test.jsx</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { render, screen, waitFor } <span class="hljs-keyword">from</span> <span class="hljs-string">'@testing-library/react'</span>;
<span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;
<span class="hljs-keyword">import</span> { PostsList } <span class="hljs-keyword">from</span> <span class="hljs-string">'./PostsList'</span>;

vi.mock(<span class="hljs-string">'axios'</span>);

describe(<span class="hljs-string">'PostsList Component'</span>, <span class="hljs-function">() =&gt;</span> {
  beforeEach(<span class="hljs-function">() =&gt;</span> {
    vi.clearAllMocks();
  });

  it(<span class="hljs-string">'should display loading state initially'</span>, <span class="hljs-function">() =&gt;</span> {
    axios.get.mockImplementation(<span class="hljs-function">() =&gt;</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">() =&gt;</span> {}));
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">PostsList</span> /&gt;</span></span>);

    expect(screen.getByText(<span class="hljs-string">'Loading...'</span>)).toBeInTheDocument();
  });

  it(<span class="hljs-string">'should display posts when API call succeeds'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> mockPosts = [
      { <span class="hljs-attr">id</span>: <span class="hljs-number">1</span>, <span class="hljs-attr">title</span>: <span class="hljs-string">'First Post'</span> },
      { <span class="hljs-attr">id</span>: <span class="hljs-number">2</span>, <span class="hljs-attr">title</span>: <span class="hljs-string">'Second Post'</span> },
    ];

    axios.get.mockResolvedValue({ <span class="hljs-attr">data</span>: mockPosts });
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">PostsList</span> /&gt;</span></span>);

    <span class="hljs-keyword">await</span> waitFor(<span class="hljs-function">() =&gt;</span> {
      expect(screen.queryByText(<span class="hljs-string">'Loading...'</span>)).not.toBeInTheDocument();
    });

    expect(screen.getByText(<span class="hljs-string">'First Post'</span>)).toBeInTheDocument();
    expect(screen.getByText(<span class="hljs-string">'Second Post'</span>)).toBeInTheDocument();
  });

  it(<span class="hljs-string">'should display error when API call fails'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    axios.get.mockRejectedValue(<span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Network error'</span>));
    render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">PostsList</span> /&gt;</span></span>);

    <span class="hljs-keyword">await</span> waitFor(<span class="hljs-function">() =&gt;</span> {
      expect(screen.queryByText(<span class="hljs-string">'Loading...'</span>)).not.toBeInTheDocument();
    });

    expect(screen.getByText(<span class="hljs-regexp">/error/i</span>)).toBeInTheDocument();
  });
});
</code></pre>
<p>In these tests, we verify specific UI states: the “loading” test checks that a loading indicator shows while data is being fetched, the “success” test confirms that post items render when the API returns data, and the “error” test makes sure an error message appears if the call fails.</p>
<p>We mock Axios by calling <code>vi.mock('axios')</code> and then using methods like <code>mockResolvedValue(...)</code> on <code>axios.get</code> to simulate a successful response (and <code>mockRejectedValue(...)</code> to simulate a failure). This kind of mocking isolates our tests from real network calls (making them fast and reliable) and lets us control exactly what data or error the component receives.</p>
<p>We use <code>await waitFor(...)</code> to hold the test until the asynchronous state updates have landed: <code>waitFor</code> re-runs its callback until the assertion inside stops throwing or the timeout expires. Finally, we use <code>screen.getByText(...)</code> to find elements that should be present (it will throw an error if they’re missing) and <code>screen.queryByText(...)</code> to check that elements aren’t present (it returns null if the element is not in the DOM).</p>
<h3 id="heading-mocking-specific-module-functions">Mocking Specific Module Functions</h3>
<p>Sometimes you only want to mock specific functions while keeping the rest of a module's behaviour intact. Here's how to do that:</p>
<pre><code class="lang-javascript">vi.mock(<span class="hljs-string">'date-fns'</span>, <span class="hljs-keyword">async</span> () =&gt; {
  <span class="hljs-keyword">const</span> original = <span class="hljs-keyword">await</span> vi.importActual(<span class="hljs-string">'date-fns'</span>);
  <span class="hljs-keyword">return</span> {
    ...original,
    <span class="hljs-attr">format</span>: vi.fn(<span class="hljs-function">() =&gt;</span> <span class="hljs-string">'2025-01-01'</span>),
  };
});
</code></pre>
<p>In Vitest, you use <code>vi.importActual</code> to retain all original methods while mocking only the <code>format</code> method.</p>
<h2 id="heading-best-practices-for-testing-react-components">Best Practices for Testing React Components</h2>
<p>Now that you know how to write tests, let's talk about how to write good tests.</p>
<h3 id="heading-test-user-behaviour-not-implementation">Test User Behaviour, Not Implementation</h3>
<p>Focus on testing what users see and do, not internal component details. If you refactor your component's implementation without changing its behaviour, your tests shouldn't break.</p>
<p><strong>Bad test (testing implementation):</strong></p>
<pre><code class="lang-javascript">it(<span class="hljs-string">'should set isOpen state to true'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> { result } = renderHook(<span class="hljs-function">() =&gt;</span> useState(<span class="hljs-literal">false</span>));
  <span class="hljs-comment">// Testing internal state directly</span>
});
</code></pre>
<p><strong>Good test (testing behaviour):</strong></p>
<pre><code class="lang-javascript">it(<span class="hljs-string">'should show menu when button is clicked'</span>, <span class="hljs-keyword">async</span> () =&gt; {
  <span class="hljs-keyword">const</span> user = userEvent.setup();
  render(<span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Menu</span> /&gt;</span></span>);

  <span class="hljs-keyword">await</span> user.click(screen.getByRole(<span class="hljs-string">'button'</span>, { <span class="hljs-attr">name</span>: <span class="hljs-regexp">/menu/i</span> }));
  expect(screen.getByRole(<span class="hljs-string">'navigation'</span>)).toBeVisible();
});
</code></pre>
<h3 id="heading-use-accessible-queries">Use Accessible Queries</h3>
<p>React Testing Library encourages you to query elements the way users find them. Prefer queries in roughly this order, from most to least user-facing:</p>
<ol>
<li><p><code>getByRole</code> (best for interactive elements)</p>
</li>
<li><p><code>getByLabelText</code> (for form fields)</p>
</li>
<li><p><code>getByPlaceholderText</code></p>
</li>
<li><p><code>getByText</code></p>
</li>
<li><p><code>getByTestId</code> (last resort)</p>
</li>
</ol>
<h3 id="heading-keep-tests-simple-and-focused">Keep Tests Simple and Focused</h3>
<p>Each test should verify one thing. If your test needs a lot of setup or has many assertions, consider splitting it into multiple tests.</p>
<h3 id="heading-clean-up-between-tests">Clean Up Between Tests</h3>
<p>Use <code>afterEach</code> to clean up the DOM after each test run, ensuring tests don't interfere with each other. This is already handled if you followed the setup steps earlier.</p>
<h3 id="heading-use-descriptive-test-names">Use Descriptive Test Names</h3>
<p>Test names should clearly describe what they're testing and what the expected outcome is.</p>
<p>Good test names:</p>
<pre><code class="lang-javascript">it(<span class="hljs-string">'should display error message when form is submitted empty'</span>);
it(<span class="hljs-string">'should call onSubmit with email and password when form is valid'</span>);
it(<span class="hljs-string">'should disable submit button while request is pending'</span>);
</code></pre>
<h3 id="heading-mock-external-dependencies">Mock External Dependencies</h3>
<p>Always mock API calls, timers, and other external dependencies. Your tests should be isolated and not depend on network conditions or external services.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You've now learned how to set up Vitest in a React project and write effective tests for components, user interactions, custom hooks, and API calls. Vitest provides a powerful and efficient way to test React applications, especially when combined with modern tools like Vite.</p>
<p>Testing is about building confidence in your code, documenting expected behaviour, and enabling safe refactoring. Vitest's speed makes testing feel less like a chore and more like a natural part of development.</p>
<p>Start small. Add tests for critical user flows. Test the components that change frequently. As you build the habit, you will find that tests actually make development faster, not slower. The code will still be there tomorrow. But the bugs you catch today won't be.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Unit Testing in Go - A Beginner's Guide ]]>
                </title>
                <description>
                    <![CDATA[ If you're learning Go and you’re already familiar with the idea of unit testing, the main challenge is usually not why to test, but how to test in Go. Go takes a deliberately minimal approach to testing. There are no built-in assertions, no annotatio... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/unit-testing-in-go-a-beginners-guide/</link>
                <guid isPermaLink="false">696535ad7a48c374647910f2</guid>
                
                    <category>
                        <![CDATA[ Go Language ]]>
                    </category>
                
                    <category>
                        <![CDATA[ golang ]]>
                    </category>
                
                    <category>
                        <![CDATA[ unit testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Software Testing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Gabor Koos ]]>
                </dc:creator>
                <pubDate>Mon, 12 Jan 2026 17:55:57 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768240528981/73c9c9f6-4942-4c39-9e62-87f540fd2233.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you're learning Go and you’re already familiar with the idea of <strong>unit testing</strong>, the main challenge is usually not <em>why</em> to test, but <em>how</em> to test in Go.</p>
<p>Go takes a deliberately minimal approach to testing. There are no built-in assertions, no annotations, and no special syntax. Instead, tests are written as regular Go code using a small standard library package, and run with a single command. This can feel unusual at first if you're coming from ecosystems with richer testing frameworks, but it quickly becomes predictable and easy to reason about.</p>
<p>In this article, we'll look at how unit testing works in Go in practice. We'll write a few small tests, run them from the command line, and cover the most common patterns you'll see in real Go codebases, such as table-driven tests and testing functions that return errors. We'll focus on the essentials and won't cover more advanced topics like mocks or external frameworks.</p>
<p>The goal is to show how familiar testing concepts translate into idiomatic Go. By the end, you should feel comfortable reading and writing basic unit tests and integrating them into your regular Go workflow.</p>
<h2 id="heading-what-well-cover">What We'll Cover:</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-writing-your-first-test">Writing Your First Test</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-running-your-test">Running Your Test</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-divide-by-zero">Divide by Zero</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-terrorf-vs-tfatalf">t.Errorf vs t.Fatalf</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-table-driven-tests">Table-Driven Tests</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-table-driven-add-test">Table-Driven Add Test</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-table-driven-divide-test">Table-Driven Divide Test</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-exercise">Exercise</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-functions-that-return-errors">Testing Functions That Return Errors</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-safe-divide-function">Safe Divide Function</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-writing-tests-for-safedivide">Writing Tests for SafeDivide()</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-exercise-1">Exercise</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices-and-tips">Best Practices and Tips</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-name-tests-clearly">Name Tests Clearly</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-keep-tests-small-and-focused">Keep Tests Small and Focused</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-use-table-driven-tests-for-repetitive-cases">Use Table-Driven Tests for Repetitive Cases</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-check-errors-explicitly">Check Errors Explicitly</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-avoid-panics-when-possible">Avoid Panics When Possible</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-run-tests-frequently">Run Tests Frequently</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-keep-tests-in-the-same-package">Keep Tests in the Same Package</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-use-tfatalf-vs-terrorf-appropriately">Use t.Fatalf vs t.Errorf Appropriately</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-solutions-to-exercises">Solutions to Exercises</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-subtract-function-and-tests">Subtract Function and Tests</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-safesubtract-function-and-tests">SafeSubtract Function and Tests</a></p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, you should be comfortable with:</p>
<ul>
<li><p>Writing and running basic Go programs</p>
</li>
<li><p>Defining and calling functions in Go</p>
</li>
<li><p>Understanding basic Go types (int, string, bool, and so on)</p>
</li>
<li><p>Using the Go command-line tool (go run, go build)</p>
</li>
<li><p>Basic understanding of unit tests: what a test is and why it's useful</p>
</li>
<li><p>Familiarity with Test-Driven Development concepts like testing before or alongside writing code</p>
</li>
<li><p>Awareness of common testing ideas such as assertions, test coverage, and checking error conditions</p>
</li>
</ul>
<p>You don't need prior experience with Go's <code>testing</code> package or Go-specific test patterns, as this guide will cover all of that.</p>
<h2 id="heading-writing-your-first-test">Writing Your First Test</h2>
<p>Let's start with a simple function to test. Imagine you have a small <code>calc</code> package with an <code>Add</code> function:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-comment">// Add returns the sum of two integers</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Add</span><span class="hljs-params">(a, b <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">int</span></span> {
    <span class="hljs-keyword">return</span> a + b
}
</code></pre>
<p>To test this function, create a new file named <code>calc_test.go</code> in the same package. In Go, test files must end with <code>_test.go</code> to be recognized by the testing tool.</p>
<p>Inside <code>calc_test.go</code>, you write a test function:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc_test.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-keyword">import</span> <span class="hljs-string">"testing"</span>

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestAdd</span><span class="hljs-params">(t *testing.T)</span></span> {
    got := Add(<span class="hljs-number">2</span>, <span class="hljs-number">3</span>)
    want := <span class="hljs-number">5</span>
    <span class="hljs-keyword">if</span> got != want {
        t.Errorf(<span class="hljs-string">"Add(2, 3) = %d; want %d"</span>, got, want)
    }
}
</code></pre>
<p>Here's what's happening:</p>
<ul>
<li><p>The function name starts with <code>Test</code> and takes a single <code>*testing.T</code> parameter. Go automatically discovers and runs any function that follows this convention.</p>
</li>
<li><p>The <code>t.Errorf</code> call reports a test failure. Unlike some frameworks, Go doesn't provide special assertions – you simply check a condition and call <code>t.Errorf</code> or <code>t.Fatalf</code> if it fails.</p>
</li>
<li><p>Each test is a standalone function. You can write as many as you like, and Go will run them all.</p>
</li>
</ul>
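<p>As a minimal sketch of the naming convention, the file below lives in a hypothetical <code>greet</code> package that is not part of this article's <code>calc</code> example. <code>Greet</code> and <code>assertGreeting</code> are made-up names, and <code>Greet</code> is defined inside the test file only to keep the snippet self-contained. <code>go test</code> would run <code>TestGreet</code>, while <code>assertGreeting</code> is only ever called by it:</p>
<pre><code class="lang-go">// greet_test.go (hypothetical example, separate from the calc package)
package greet

import "testing"

// Greet is a stand-in for the function under test.
func Greet(name string) string {
    return "Hello, " + name
}

// TestGreet starts with "Test" and takes *testing.T, so go test runs it.
func TestGreet(t *testing.T) {
    assertGreeting(t, "Go", "Hello, Go")
}

// assertGreeting does not start with "Test", so go test never calls it directly.
func assertGreeting(t *testing.T, name, want string) {
    t.Helper() // report failures at the caller's line, not in this helper
    if got := Greet(name); got != want {
        t.Errorf("Greet(%q) = %q; want %q", name, got, want)
    }
}
</code></pre>
<p>The <code>t.Helper()</code> call marks <code>assertGreeting</code> as a helper, so a failure is reported at the line in <code>TestGreet</code> that called it rather than inside the helper.</p>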
<h3 id="heading-running-your-test">Running Your Test</h3>
<p>Once the file is saved, you can run your test with:</p>
<pre><code class="lang-bash">go <span class="hljs-built_in">test</span>
</code></pre>
<p>This runs tests for the current package (files ending with <code>_test.go</code>). If you want to run tests recursively in all subdirectories of your project, use:</p>
<pre><code class="lang-bash">go <span class="hljs-built_in">test</span> ./...
</code></pre>
<p>The <code>./...</code> pattern is shorthand for "run tests in this directory and all subdirectories". This is especially useful in larger projects where your code is spread across multiple packages.</p>
<p>If everything is working, you should see output indicating that the test passed:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span>
PASS
ok      _/C_/projects/Articles/Go_Testing       0.334s
</code></pre>
<p>You can add the <code>-v</code> flag for verbose output:</p>
<pre><code class="lang-bash">go <span class="hljs-built_in">test</span> -v
</code></pre>
<p>This will show you the names of the tests as they run:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span> -v
=== RUN   TestAdd
--- PASS: TestAdd (0.00s)
PASS
ok      _/C_/projects/Articles/Go_Testing       0.356s
</code></pre>
<p>Not much difference for a single test, but it becomes useful as you add more tests.</p>
<p>Now let's see what happens if the test fails. Change the expected value in <code>calc_test.go</code> to an incorrect one:</p>
<pre><code class="lang-go">  ...
    want := <span class="hljs-number">6</span> <span class="hljs-comment">// Incorrect expected value</span>
  ...
</code></pre>
<p>Run the tests again:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span>
--- FAIL: TestAdd (0.00s)
    calc_test.go:9: Add(2, 3) = 5; want 6
FAIL
<span class="hljs-built_in">exit</span> status 1
FAIL    _/C_/projects/Articles/Go_Testing       0.340s
</code></pre>
<p>or with verbose output:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span> -v
=== RUN   TestAdd
    calc_test.go:9: Add(2, 3) = 5; want 6
--- FAIL: TestAdd (0.00s)
FAIL
<span class="hljs-built_in">exit</span> status 1
FAIL    _/C_/projects/Articles/Go_Testing       0.337s
</code></pre>
<p>Of course, your tests should always check for the correct expected values! A failing (but correct) test is a sign that your code needs to be fixed.</p>
<p>We only created one test file and one test function with one assertion here, but Go's testing tool can handle many files and functions at once. Behind the scenes, Go will automatically:</p>
<ul>
<li><p>Find <strong>all</strong> <code>_test.go</code> files in the specified packages (for example, current directory for <code>go test</code>, or recursively in all subdirectories with <code>go test ./...</code>).</p>
</li>
<li><p>Identify functions whose names start with <code>Test</code> (followed by a character that is not a lowercase letter) and that have the signature <code>func(t *testing.T)</code>.</p>
</li>
<li><p>Compile them together with your package into a temporary test binary.</p>
</li>
<li><p>Execute each test function and report the results.</p>
</li>
</ul>
<p>To prove this, let's quickly add a <code>Divide</code> function to our package:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc.go</span>
...
<span class="hljs-comment">// Divide returns the result of dividing a by b</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Divide</span><span class="hljs-params">(a, b <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">int</span></span> {
    <span class="hljs-keyword">return</span> a / b
}
</code></pre>
<p>(Note that this is an <strong>integer division</strong>, so fractional parts are discarded. <code>Divide(5, 2)</code> would return <code>2</code>.)</p>
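<p>To see the truncation in isolation, here is a small standalone program. It is only a sketch and not part of the <code>calc</code> package: it repeats <code>Divide</code> so that it can be run on its own with <code>go run</code>. Go's integer division truncates toward zero, which also matters for negative operands:</p>
<pre><code class="lang-go">// main.go (standalone sketch, separate from the calc package)
package main

import "fmt"

// Divide mirrors the function from calc.go.
func Divide(a, b int) int {
    return a / b
}

func main() {
    fmt.Println(Divide(5, 2))  // 2: the fractional part is discarded
    fmt.Println(Divide(-5, 2)) // -2: truncation is toward zero, not down to -3
    fmt.Println(5 % 2)         // 1: the remainder that the division dropped
}
</code></pre>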
<p>And another test file with a corresponding test:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc_2_test.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-keyword">import</span> <span class="hljs-string">"testing"</span>

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestDivide</span><span class="hljs-params">(t *testing.T)</span></span> {
    got := Divide(<span class="hljs-number">10</span>, <span class="hljs-number">2</span>)
    want := <span class="hljs-number">5</span>
    <span class="hljs-keyword">if</span> got != want {
        t.Errorf(<span class="hljs-string">"Divide(10, 2) = %d; want %d"</span>, got, want)
    }
}
</code></pre>
<p>Now when you run <code>go test</code>, both <code>TestAdd</code> and <code>TestDivide</code> will be executed:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span>
PASS
ok      _/C_/projects/Articles/Go_Testing       0.325s
</code></pre>
<p>Or:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span> -v
=== RUN   TestAdd
--- PASS: TestAdd (0.00s)
=== RUN   TestDivide
--- PASS: TestDivide (0.00s)
PASS
ok      _/C_/projects/Articles/Go_Testing       0.323s
</code></pre>
<h3 id="heading-divide-by-zero">Divide by Zero</h3>
<p>What happens if we try to <code>Divide</code> by zero? Let's add another test case for that:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc_test.go</span>
...
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestDivideByZero</span><span class="hljs-params">(t *testing.T)</span></span> {
    <span class="hljs-keyword">defer</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">if</span> r := <span class="hljs-built_in">recover</span>(); r == <span class="hljs-literal">nil</span> { <span class="hljs-comment">// recover() returns nil when no panic occurred</span>
            t.Errorf(<span class="hljs-string">"Divide did not panic on division by zero"</span>)
        }
    }()
    Divide(<span class="hljs-number">10</span>, <span class="hljs-number">0</span>) <span class="hljs-comment">// This should cause a panic</span>
}
</code></pre>
<p>This test checks that the <code>Divide</code> function panics when dividing by zero. When you run the tests again, you'll see that this new test also passes:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span> -v
=== RUN   TestAdd
--- PASS: TestAdd (0.00s)
=== RUN   TestDivide
--- PASS: TestDivide (0.00s)
=== RUN   TestDivideByZero
--- PASS: TestDivideByZero (0.00s)
PASS
ok      _/C_/projects/Articles/Go_Testing       0.312s
</code></pre>
<p>(Note that in real-world Go code, it's better to return <code>(int, error)</code> for unsafe operations instead of panicking.)</p>
<p>Feel free to experiment by adding more test cases, changing expected values, and exploring how Go's testing framework handles different scenarios.</p>
<h3 id="heading-terrorf-vs-tfatalf"><code>t.Errorf</code> vs <code>t.Fatalf</code></h3>
<p>In the examples above, we used <code>t.Errorf</code> to report test failures. This function logs the error and marks the test as failed, but lets the test function keep running. This is useful when you want to check multiple conditions in a single test function.</p>
<p>In contrast, <code>t.Fatalf</code> logs the error and immediately stops the execution of the current test. Use <code>t.Fatalf</code> when continuing the test after a failure doesn't make sense or could cause misleading results.</p>
<p>For example, in the <code>TestDivideByZero</code> test, the <code>t.Errorf</code> call sits inside the deferred function, which is the last thing the test runs. There is nothing left to skip, so <code>t.Errorf</code> is enough. But if we had additional checks that depended on an earlier step succeeding, we might want to use <code>t.Fatalf</code> to stop execution immediately upon failure.</p>
<p>While <code>t.Errorf</code> and <code>t.Fatalf</code> use <code>fmt</code>-style formatting, for simple messages without formatting, you can also use <code>t.Error</code> and <code>t.Fatal</code>, respectively.</p>
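<p>To make the difference concrete, here is a minimal sketch. The <code>ParsePair</code> helper is hypothetical (it isn't part of the calculator code in this article) and exists only to give the test a setup step that can fail. If the setup fails, the later checks would be meaningless, so the test stops with <code>t.Fatalf</code>. The two value checks are independent of each other, so they use <code>t.Errorf</code>.</p>
<pre><code class="lang-go">// pair_test.go
package calc

import (
    "fmt"
    "strconv"
    "strings"
    "testing"
)

// ParsePair is a hypothetical helper for this sketch: it parses "a,b" into two ints.
func ParsePair(s string) (int, int, error) {
    parts := strings.Split(s, ",")
    if len(parts) != 2 {
        return 0, 0, fmt.Errorf("want 2 fields, got %d", len(parts))
    }
    a, err := strconv.Atoi(strings.TrimSpace(parts[0]))
    if err != nil {
        return 0, 0, err
    }
    b, err := strconv.Atoi(strings.TrimSpace(parts[1]))
    if err != nil {
        return 0, 0, err
    }
    return a, b, nil
}

func TestParsePair(t *testing.T) {
    a, b, err := ParsePair("10, 2")
    if err != nil {
        // Setup failed: a and b are meaningless, so stop this test now.
        t.Fatalf("ParsePair returned unexpected error: %v", err)
    }
    // Independent checks: report each failure and keep going.
    if a != 10 {
        t.Errorf("a = %d; want 10", a)
    }
    if b != 2 {
        t.Errorf("b = %d; want 2", b)
    }
}
</code></pre>
<p>With <code>t.Errorf</code> in both value checks, a single run reports every wrong value instead of only the first one.</p>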
<p>In the next section, we'll look at <em>table-driven tests</em>, a common Go pattern for testing multiple cases efficiently.</p>
<h2 id="heading-table-driven-tests">Table-Driven Tests</h2>
<p>In Go, it's common to want to run the same test logic for multiple inputs and expected outputs. Rather than writing a separate test function for each case, Go developers often use <strong>table-driven tests</strong>. This pattern keeps your tests concise, readable, and easy to extend.</p>
<h3 id="heading-table-driven-add-test">Table-Driven <code>Add</code> Test</h3>
<p>Let's rewrite our <code>Add</code> test using a table-driven approach (and delete <code>calc_2_test.go</code> for clarity):</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc_test.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-keyword">import</span> <span class="hljs-string">"testing"</span>

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestAddTableDriven</span><span class="hljs-params">(t *testing.T)</span></span> {
    tests := []<span class="hljs-keyword">struct</span> { <span class="hljs-comment">// Define a struct for each test case and create a slice of them</span>
        name <span class="hljs-keyword">string</span>
        a, b <span class="hljs-keyword">int</span>
        want <span class="hljs-keyword">int</span>
    }{
        {<span class="hljs-string">"both positive"</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">5</span>},
        {<span class="hljs-string">"positive + zero"</span>, <span class="hljs-number">5</span>, <span class="hljs-number">0</span>, <span class="hljs-number">5</span>},
        {<span class="hljs-string">"negative + positive"</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">4</span>, <span class="hljs-number">3</span>},
        {<span class="hljs-string">"both negative"</span>, <span class="hljs-number">-2</span>, <span class="hljs-number">-3</span>, <span class="hljs-number">-5</span>},
    }

    <span class="hljs-keyword">for</span> _, tt := <span class="hljs-keyword">range</span> tests { <span class="hljs-comment">// Loop over each test case</span>
        t.Run(tt.name, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(t *testing.T)</span></span> { <span class="hljs-comment">// Run each case as a subtest</span>
            got := Add(tt.a, tt.b)
            <span class="hljs-keyword">if</span> got != tt.want { <span class="hljs-comment">// Check the result</span>
                t.Errorf(<span class="hljs-string">"Add(%d, %d) = %d; want %d"</span>, tt.a, tt.b, got, tt.want) <span class="hljs-comment">// Report failure if it doesn't match</span>
            }
        })
    }
}
</code></pre>
<p>Here's how it works:</p>
<ul>
<li><p>We define a <strong>slice of structs</strong>, each representing a test case.</p>
</li>
<li><p>Each struct contains the test name, input values, and the expected result.</p>
</li>
<li><p>We loop over the slice and call <code>t.Run(tt.name, func(t *testing.T) { ... })</code> to run each test as a <strong>subtest</strong>.</p>
</li>
<li><p>If a subtest fails, you can see which one by its name in the output.</p>
</li>
</ul>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span>
PASS
ok      _/C_/projects/Articles/Go_Testing       0.452s
</code></pre>
<p>Or to see detailed output:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span> -v
=== RUN   TestAddTableDriven
=== RUN   TestAddTableDriven/both_positive
=== RUN   TestAddTableDriven/positive_+_zero
=== RUN   TestAddTableDriven/negative_+_positive
=== RUN   TestAddTableDriven/both_negative
--- PASS: TestAddTableDriven (0.00s)
    --- PASS: TestAddTableDriven/both_positive (0.00s)
    --- PASS: TestAddTableDriven/positive_+_zero (0.00s)
    --- PASS: TestAddTableDriven/negative_+_positive (0.00s)
    --- PASS: TestAddTableDriven/both_negative (0.00s)
PASS
ok      _/C_/projects/Articles/Go_Testing       0.385s
</code></pre>
<h3 id="heading-table-driven-divide-test">Table-Driven <code>Divide</code> Test</h3>
<p>We can apply the same pattern to <code>Divide</code>, including checking for divide-by-zero:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc_test.go</span>
...
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestDivideTableDriven</span><span class="hljs-params">(t *testing.T)</span></span> {
    tests := []<span class="hljs-keyword">struct</span> { <span class="hljs-comment">// Define test cases</span>
        name      <span class="hljs-keyword">string</span>
        a, b      <span class="hljs-keyword">int</span>
        want      <span class="hljs-keyword">int</span>
        wantPanic <span class="hljs-keyword">bool</span>
    }{
        {<span class="hljs-string">"normal division"</span>, <span class="hljs-number">10</span>, <span class="hljs-number">2</span>, <span class="hljs-number">5</span>, <span class="hljs-literal">false</span>},
        {<span class="hljs-string">"division by zero"</span>, <span class="hljs-number">10</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">true</span>},
    }

    <span class="hljs-keyword">for</span> _, tt := <span class="hljs-keyword">range</span> tests { <span class="hljs-comment">// Loop over each test case</span>
        t.Run(tt.name, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(t *testing.T)</span></span> { <span class="hljs-comment">// Run subtest</span>
            <span class="hljs-keyword">if</span> tt.wantPanic { <span class="hljs-comment">// Check for expected panic</span>
                <span class="hljs-keyword">defer</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> { <span class="hljs-comment">// Recover from panic</span>
                    <span class="hljs-keyword">if</span> r := <span class="hljs-built_in">recover</span>(); r == <span class="hljs-literal">nil</span> {
                        t.Errorf(<span class="hljs-string">"Divide(%d, %d) did not panic"</span>, tt.a, tt.b)
                    }
                }()
            }
            got := Divide(tt.a, tt.b) <span class="hljs-comment">// Panics when b is 0; the deferred func above recovers</span>
            <span class="hljs-keyword">if</span> !tt.wantPanic &amp;&amp; got != tt.want {
                t.Errorf(<span class="hljs-string">"Divide(%d, %d) = %d; want %d"</span>, tt.a, tt.b, got, tt.want)
            }
        })
    }
}
</code></pre>
<p>This example shows how to handle both normal and panic cases in a single table-driven test:</p>
<ul>
<li><p>The <code>wantPanic</code> field tells the test whether we expect a panic.</p>
</li>
<li><p>We use <code>defer</code> and <code>recover</code> to check for a panic when needed.</p>
</li>
<li><p>Normal test cases still check the result as usual.</p>
</li>
</ul>
<p>Run all tests as before:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span> -v
=== RUN   TestAddTableDriven
=== RUN   TestAddTableDriven/both_positive
=== RUN   TestAddTableDriven/positive_+_zero
=== RUN   TestAddTableDriven/negative_+_positive
=== RUN   TestAddTableDriven/both_negative
--- PASS: TestAddTableDriven (0.00s)
    --- PASS: TestAddTableDriven/both_positive (0.00s)
    --- PASS: TestAddTableDriven/positive_+_zero (0.00s)
    --- PASS: TestAddTableDriven/negative_+_positive (0.00s)
    --- PASS: TestAddTableDriven/both_negative (0.00s)
=== RUN   TestDivideTableDriven
=== RUN   TestDivideTableDriven/normal_division
=== RUN   TestDivideTableDriven/division_by_zero
--- PASS: TestDivideTableDriven (0.00s)
    --- PASS: TestDivideTableDriven/normal_division (0.00s)
    --- PASS: TestDivideTableDriven/division_by_zero (0.00s)
PASS
ok      _/C_/projects/Articles/Go_Testing       0.321s
</code></pre>
<p>Subtest names make it easy to see which case passed or failed.</p>
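<p>If several tests need the same panic check, the <code>defer</code> and <code>recover</code> logic can be pulled into a small helper. The sketch below shows one way to do that. The names <code>didPanic</code> and <code>mustBeNonZero</code> are hypothetical and used only for illustration; they are not part of the calculator code above.</p>
<pre><code class="lang-go">// panic_test.go
package calc

import "testing"

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
    defer func() {
        // recover returns a non-nil value only if f panicked.
        if r := recover(); r != nil {
            panicked = true
        }
    }()
    f()
    return false
}

// mustBeNonZero is a hypothetical function that panics on bad input.
func mustBeNonZero(n int) int {
    if n == 0 {
        panic("n must not be zero")
    }
    return n
}

func TestMustBeNonZero(t *testing.T) {
    if !didPanic(func() { mustBeNonZero(0) }) {
        t.Error("mustBeNonZero(0) did not panic")
    }
    if didPanic(func() { mustBeNonZero(3) }) {
        t.Error("mustBeNonZero(3) panicked unexpectedly")
    }
}
</code></pre>
<p>Keeping the recovery in one place means each test case reads as a single condition, and the helper can be reused across table rows.</p>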
<h3 id="heading-exercise">Exercise</h3>
<p>Try creating your own table-driven test for a new function, <code>Subtract(a, b int) int</code>. Include at least four test cases:</p>
<ul>
<li><p>Both positive numbers</p>
</li>
<li><p>Positive minus zero</p>
</li>
<li><p>Negative minus positive</p>
</li>
<li><p>Both negative</p>
</li>
</ul>
<p>Then run your tests and verify the output.</p>
<h2 id="heading-testing-functions-that-return-errors">Testing Functions That Return Errors</h2>
<p>Many Go functions return an error as the last return value. Writing tests for these functions is slightly different from testing pure functions like our <code>Add</code> or <code>Divide</code>, because you need to check both the result and whether an error occurred.</p>
<h3 id="heading-safe-divide-function">Safe Divide Function</h3>
<p>Let's add a <code>SafeDivide</code> function to return an error instead of panicking:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc.go</span>
...
<span class="hljs-keyword">import</span> <span class="hljs-string">"fmt"</span>
...
<span class="hljs-comment">// SafeDivide returns the result of dividing a by b.</span>
<span class="hljs-comment">// It returns an error if b is zero.</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">SafeDivide</span><span class="hljs-params">(a, b <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(<span class="hljs-keyword">int</span>, error)</span></span> {
    <span class="hljs-keyword">if</span> b == <span class="hljs-number">0</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>, fmt.Errorf(<span class="hljs-string">"cannot divide by zero"</span>)
    }
    <span class="hljs-keyword">return</span> a / b, <span class="hljs-literal">nil</span>
}
</code></pre>
<h3 id="heading-writing-tests-for-safedivide">Writing Tests for <code>SafeDivide()</code></h3>
<p>We can use a table-driven test again:</p>
<pre><code class="lang-go"><span class="hljs-comment">// calc_test.go</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestSafeDivide</span><span class="hljs-params">(t *testing.T)</span></span> {
    tests := []<span class="hljs-keyword">struct</span> {
        name      <span class="hljs-keyword">string</span>
        a, b      <span class="hljs-keyword">int</span>
        want      <span class="hljs-keyword">int</span>
        wantError <span class="hljs-keyword">bool</span>
    }{
        {<span class="hljs-string">"normal division"</span>, <span class="hljs-number">10</span>, <span class="hljs-number">2</span>, <span class="hljs-number">5</span>, <span class="hljs-literal">false</span>},
        {<span class="hljs-string">"division by zero"</span>, <span class="hljs-number">10</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">true</span>},
    }

    <span class="hljs-keyword">for</span> _, tt := <span class="hljs-keyword">range</span> tests {
        t.Run(tt.name, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(t *testing.T)</span></span> {
            got, err := SafeDivide(tt.a, tt.b)
            <span class="hljs-keyword">if</span> tt.wantError {
                <span class="hljs-keyword">if</span> err == <span class="hljs-literal">nil</span> {
                    t.Errorf(<span class="hljs-string">"SafeDivide(%d, %d) expected error, got nil"</span>, tt.a, tt.b)
                }
                <span class="hljs-keyword">return</span> <span class="hljs-comment">// stop here, no need to check `got`</span>
            }
            <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
                t.Errorf(<span class="hljs-string">"SafeDivide(%d, %d) unexpected error: %v"</span>, tt.a, tt.b, err)
            }
            <span class="hljs-keyword">if</span> got != tt.want {
                t.Errorf(<span class="hljs-string">"SafeDivide(%d, %d) = %d; want %d"</span>, tt.a, tt.b, got, tt.want)
            }
        })
    }
}
</code></pre>
<p>What's happening here:</p>
<ul>
<li><p>We added a <code>wantError</code> field to indicate whether the test expects an error.</p>
</li>
<li><p>If an error is expected, we fail the test when <code>err</code> is <code>nil</code>, then return early without checking <code>got</code>.</p>
</li>
<li><p>If no error is expected, we check both the returned value (<code>got</code>) and that <code>err == nil</code>.</p>
</li>
<li><p>Using <code>t.Run</code> subtests keeps everything organized and readable.</p>
</li>
</ul>
<p>Running the tests again:</p>
<pre><code class="lang-bash">$ go <span class="hljs-built_in">test</span> -v
...
=== RUN   TestSafeDivide
=== RUN   TestSafeDivide/normal_division
=== RUN   TestSafeDivide/division_by_zero
--- PASS: TestSafeDivide (0.00s)
    --- PASS: TestSafeDivide/normal_division (0.00s)
    --- PASS: TestSafeDivide/division_by_zero (0.00s)
PASS
ok      _/C_/projects/Articles/Go_Testing       0.323s
</code></pre>
<p>The output shows that both the normal and the error cases are handled correctly.</p>
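<p>The <code>wantError</code> flag only tells you that some error came back. When a test should also confirm which error it was, one common Go approach is to compare against a sentinel error with <code>errors.Is</code>. The sketch below assumes a hypothetical variant named <code>SafeDivideStrict</code> and a hypothetical <code>ErrDivideByZero</code> variable. Neither is part of the code above.</p>
<pre><code class="lang-go">// sentinel_test.go
package calc

import (
    "errors"
    "testing"
)

// ErrDivideByZero is a hypothetical sentinel error for this sketch.
var ErrDivideByZero = errors.New("cannot divide by zero")

// SafeDivideStrict is a hypothetical variant of SafeDivide that returns
// the sentinel error, so callers can test for it with errors.Is.
func SafeDivideStrict(a, b int) (int, error) {
    if b == 0 {
        return 0, ErrDivideByZero
    }
    return a / b, nil
}

func TestSafeDivideStrict(t *testing.T) {
    tests := []struct {
        name    string
        a, b    int
        want    int
        wantErr error // nil means no error is expected
    }{
        {"normal division", 10, 2, 5, nil},
        {"division by zero", 10, 0, 0, ErrDivideByZero},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got, err := SafeDivideStrict(tt.a, tt.b)
            // errors.Is(nil, nil) is true, so one check covers both cases.
            if !errors.Is(err, tt.wantErr) {
                t.Fatalf("SafeDivideStrict(%d, %d) error = %v; want %v", tt.a, tt.b, err, tt.wantErr)
            }
            if got != tt.want {
                t.Errorf("SafeDivideStrict(%d, %d) = %d; want %d", tt.a, tt.b, got, tt.want)
            }
        })
    }
}
</code></pre>
<p>Storing the expected error in the table, rather than a boolean, lets the same comparison handle both the success and the failure rows.</p>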
<h3 id="heading-exercise-1">Exercise</h3>
<p>Update your <code>Subtract(a, b int) int</code> function to a <code>SafeSubtract(a, b int) (int, error)</code> variant that returns an error if the result would be negative. Then write a table-driven test that covers:</p>
<ul>
<li><p>A positive result</p>
</li>
<li><p>Zero result</p>
</li>
<li><p>A negative result (should return an error)</p>
</li>
</ul>
<h2 id="heading-best-practices-and-tips">Best Practices and Tips</h2>
<p>Writing tests in Go is straightforward, but there are a few conventions and tips that make your tests more readable, maintainable, and idiomatic:</p>
<h3 id="heading-name-tests-clearly">Name Tests Clearly</h3>
<p>First, make sure you use descriptive names for test functions and subtests. A good name explains what you're testing and under what conditions.</p>
<p>Here’s an example:</p>
<pre><code class="lang-go">t.Run(<span class="hljs-string">"Divide positive numbers"</span>, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(t *testing.T)</span></span> { ... })
t.Run(<span class="hljs-string">"Divide by zero returns error"</span>, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(t *testing.T)</span></span> { ... })
</code></pre>
<h3 id="heading-keep-tests-small-and-focused">Keep Tests Small and Focused</h3>
<p>Each subtest should verify one thing, and each test function should cover a single function or method.</p>
<p>Try to avoid combining multiple unrelated checks in the same test function. Table-driven tests help keep multiple similar checks concise without losing clarity.</p>
<h3 id="heading-use-table-driven-tests-for-repetitive-cases">Use Table-Driven Tests for Repetitive Cases</h3>
<p>If you find yourself writing multiple similar test functions, switch to a table-driven pattern. It makes it easier to add new cases, reduces duplicated code, and keeps output organized with <code>t.Run</code>.</p>
<h3 id="heading-check-errors-explicitly">Check Errors Explicitly</h3>
<p>In Go, functions often return <code>error</code>. So make sure you always check for errors in tests, even if you expect <code>nil</code>.</p>
<p>You can use the <code>wantError</code> pattern in table-driven tests for clarity.</p>
<pre><code class="lang-go"><span class="hljs-keyword">if</span> tt.wantError {
    <span class="hljs-keyword">if</span> err == <span class="hljs-literal">nil</span> {
        t.Errorf(<span class="hljs-string">"expected error, got nil"</span>)
    }
}
</code></pre>
<h3 id="heading-avoid-panics-when-possible">Avoid Panics When Possible</h3>
<p>Panics are fine for some internal checks, but in production code, prefer returning an error.</p>
<p>Your tests can check for panics using <code>defer</code> and <code>recover</code>, but this should be the exception rather than the norm.</p>
<h3 id="heading-run-tests-frequently">Run Tests Frequently</h3>
<p>Try to make running tests a habit: <code>go test -v ./...</code>. Frequent testing helps catch mistakes early and reinforces TDD practices.</p>
<h3 id="heading-keep-tests-in-the-same-package">Keep Tests in the Same Package</h3>
<p>By convention, tests live in the same package as the code they test. You can create <code>_test.go</code> files for testing, and Go automatically recognizes them.</p>
<p>Only use a separate <code>package calc_test</code> if you want to test your code from the outside, like a consumer. External test packages (just like every other external package) cannot access unexported identifiers.</p>
<h3 id="heading-use-tfatalf-vs-terrorf-appropriately">Use <code>t.Fatalf</code> vs <code>t.Errorf</code> Appropriately</h3>
<ul>
<li><p><code>t.Errorf</code> reports a failure but continues running the test.</p>
</li>
<li><p><code>t.Fatalf</code> stops the test immediately, which is useful if subsequent code depends on successful setup.</p>
</li>
</ul>
<p>These tips will help you write clean, maintainable, and idiomatic Go tests that are easy to read and extend. Following these practices early in your Go journey will make testing less intimidating and more effective.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Unit testing in Go may feel different at first, especially if you're coming from ecosystems with heavy frameworks and assertion libraries. But the simplicity of Go's testing tools is one of its strengths: once you understand the conventions, writing, running, and organizing tests becomes predictable and intuitive.</p>
<p>In this guide, you've seen how to:</p>
<ul>
<li><p>Write basic test functions with the <code>testing</code> package</p>
</li>
<li><p>Run tests from the command line and interpret the results</p>
</li>
<li><p>Use table-driven tests to cover multiple cases efficiently</p>
</li>
<li><p>Handle functions that return errors and check for expected failures</p>
</li>
</ul>
<p>Beyond these fundamentals, testing is not just about verifying correctness. It's also about confidence. Well-tested code allows you to refactor, experiment, and add new features with less fear of breaking existing functionality.</p>
<p>As you continue writing Go code, try to integrate testing early, follow the idiomatic patterns you've learned, and explore more advanced topics such as:</p>
<ul>
<li><p>Using <em>mocks</em> or <em>interfaces</em> to isolate dependencies</p>
</li>
<li><p>Benchmark tests with <code>testing.B</code></p>
</li>
<li><p>Coverage analysis with <code>go test -cover</code></p>
</li>
</ul>
<p>The key takeaway is that testing in Go is accessible, flexible, and powerful, even without fancy frameworks. By building these habits now, you'll write code that's more reliable, maintainable, and enjoyable to work with.</p>
<h2 id="heading-solutions-to-exercises">Solutions to Exercises</h2>
<h3 id="heading-subtract-function-and-tests">Subtract Function and Tests</h3>
<pre><code class="lang-go"><span class="hljs-comment">// calc.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Subtract</span><span class="hljs-params">(a, b <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">int</span></span> {
    <span class="hljs-keyword">return</span> a - b
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-comment">// calc_test.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-keyword">import</span> <span class="hljs-string">"testing"</span>

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestSubtractTableDriven</span><span class="hljs-params">(t *testing.T)</span></span> {
    tests := []<span class="hljs-keyword">struct</span> {
        name <span class="hljs-keyword">string</span>
        a, b <span class="hljs-keyword">int</span>
        want <span class="hljs-keyword">int</span>
    }{
        {<span class="hljs-string">"both positive"</span>, <span class="hljs-number">5</span>, <span class="hljs-number">3</span>, <span class="hljs-number">2</span>},
        {<span class="hljs-string">"positive minus zero"</span>, <span class="hljs-number">5</span>, <span class="hljs-number">0</span>, <span class="hljs-number">5</span>},
        {<span class="hljs-string">"negative minus positive"</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">4</span>, <span class="hljs-number">-5</span>},
        {<span class="hljs-string">"both negative"</span>, <span class="hljs-number">-3</span>, <span class="hljs-number">-2</span>, <span class="hljs-number">-1</span>},
    }

    <span class="hljs-keyword">for</span> _, tt := <span class="hljs-keyword">range</span> tests {
        t.Run(tt.name, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(t *testing.T)</span></span> {
            got := Subtract(tt.a, tt.b)
            <span class="hljs-keyword">if</span> got != tt.want {
                t.Errorf(<span class="hljs-string">"Subtract(%d, %d) = %d; want %d"</span>, tt.a, tt.b, got, tt.want)
            }
        })
    }
}
</code></pre>
<h3 id="heading-safesubtract-function-and-tests">SafeSubtract Function and Tests</h3>
<pre><code class="lang-go"><span class="hljs-comment">// calc.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-keyword">import</span> <span class="hljs-string">"fmt"</span>

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">SafeSubtract</span><span class="hljs-params">(a, b <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(<span class="hljs-keyword">int</span>, error)</span></span> {
    result := a - b
    <span class="hljs-keyword">if</span> result &lt; <span class="hljs-number">0</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>, fmt.Errorf(<span class="hljs-string">"result would be negative"</span>)
    }
    <span class="hljs-keyword">return</span> result, <span class="hljs-literal">nil</span>
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-comment">// calc_test.go</span>
<span class="hljs-keyword">package</span> calc

<span class="hljs-keyword">import</span> <span class="hljs-string">"testing"</span>

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">TestSafeSubtract</span><span class="hljs-params">(t *testing.T)</span></span> {
    tests := []<span class="hljs-keyword">struct</span> {
        name      <span class="hljs-keyword">string</span>
        a, b      <span class="hljs-keyword">int</span>
        want      <span class="hljs-keyword">int</span>
        wantError <span class="hljs-keyword">bool</span>
    }{
        {<span class="hljs-string">"positive result"</span>, <span class="hljs-number">5</span>, <span class="hljs-number">3</span>, <span class="hljs-number">2</span>, <span class="hljs-literal">false</span>},
        {<span class="hljs-string">"zero result"</span>, <span class="hljs-number">3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">false</span>},
        {<span class="hljs-string">"negative result"</span>, <span class="hljs-number">2</span>, <span class="hljs-number">5</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">true</span>},
    }

    <span class="hljs-keyword">for</span> _, tt := <span class="hljs-keyword">range</span> tests {
        t.Run(tt.name, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(t *testing.T)</span></span> {
            got, err := SafeSubtract(tt.a, tt.b)
            <span class="hljs-keyword">if</span> tt.wantError {
                <span class="hljs-keyword">if</span> err == <span class="hljs-literal">nil</span> {
                    t.Errorf(<span class="hljs-string">"SafeSubtract(%d, %d) expected error, got nil"</span>, tt.a, tt.b)
                }
                <span class="hljs-keyword">return</span>
            }
            <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
                t.Errorf(<span class="hljs-string">"SafeSubtract(%d, %d) unexpected error: %v"</span>, tt.a, tt.b, err)
            }
            <span class="hljs-keyword">if</span> got != tt.want {
                t.Errorf(<span class="hljs-string">"SafeSubtract(%d, %d) = %d; want %d"</span>, tt.a, tt.b, got, tt.want)
            }
        })
    }
}
</code></pre>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Test and Improve AI Applications with an Evaluation Flywheel ]]>
                </title>
                <description>
                    <![CDATA[ In traditional programming, developers rely on unit tests to catch mistakes in applications. But when building AI products, that safety net doesn't exist. Responses can shift with model updates, data changes, and subtle fluctuations in prompts or ret... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-test-and-improve-ai-applications-with-an-evaluation-flywheel/</link>
                <guid isPermaLink="false">69491adc842069e2b48bbae7</guid>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ optimization ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Yemi Ojedapo ]]>
                </dc:creator>
                <pubDate>Mon, 22 Dec 2025 10:18:04 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766082262126/bc54e004-7acc-49fc-b228-24524f250427.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In traditional programming, developers rely on unit tests to catch mistakes in applications. But when building AI products, that safety net doesn't exist. Responses can shift with model updates, data changes, and subtle fluctuations in prompts or retrieval results. The usual testing methods, such as unit tests with Pytest or Jest, integration tests, and CI pipelines, fail to catch accuracy drops, hallucinations, or regressions, and these silent failures can become real production risks.</p>
<p>In this article, you’ll learn why traditional testing methods fall short for AI systems and how an evaluation flywheel can be used as a practical approach to testing and improving AI applications. The sections below break the evaluation flywheel down step by step, from identifying the problem to implementing a repeatable evaluation loop.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-does-traditional-testing-fail-for-ai-applications">Why Does Traditional Testing Fail for AI applications?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-the-evaluation-flywheel">What is the Evaluation Flywheel?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-drawing-parallels-to-familiar-practices">Drawing Parallels to Familiar Practices</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-silent-failures-matter-a-real-world-example">Why Silent Failures Matter: A Real-World Example</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-an-evaluation-flywheel">How to Create an Evaluation Flywheel</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tools-and-frameworks-you-can-use-for-evaluation">Tools and Frameworks you can use for evaluation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-a-complete-evaluation-loop-looks-like-in-practice">What a Complete Evaluation Loop Looks Like in Practice</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-takeaways">Key Takeaways</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-why-does-traditional-testing-fail-for-ai-applications">Why Does Traditional Testing Fail for AI applications?</h2>
<p>In standard programming, tests assume deterministic behavior. This means the same input is expected to always produce the same output. For example:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">authenticate_user_age</span>(<span class="hljs-params">age: int</span>) -&gt; str:</span>
    limit = <span class="hljs-number">18</span>

    <span class="hljs-keyword">if</span> age &gt;= limit:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Access granted"</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"User doesn't meet the age limit"</span>

<span class="hljs-comment"># Test </span>
<span class="hljs-keyword">assert</span> authenticate_user_age(<span class="hljs-number">20</span>) == <span class="hljs-string">"Access granted"</span>
<span class="hljs-keyword">assert</span> authenticate_user_age(<span class="hljs-number">16</span>) == <span class="hljs-string">"User doesn't meet the age limit"</span>
</code></pre>
<p>The response from this function is always predictable. You can write the tests once and trust them to keep catching regressions for as long as the logic stays the same.</p>
<p>However, AI models don’t behave the same way every time. They generate output based on probabilities. A query like “best programming practices” may produce strong guidance one day, and outdated or incomplete advice the next. This shift can happen because of changes in the underlying model, updates to retrieval components, or gradual data drift. Without a structured evaluation process in place, these inconsistencies slip into production unnoticed and can quietly weaken the system’s performance.</p>
<h2 id="heading-what-is-the-evaluation-flywheel">What is the Evaluation Flywheel?</h2>
<p>The evaluation flywheel is a continuous improvement system where test cases representing real user behavior are passed through multiple evaluation steps to assess the output of AI models. The results don't just tell you whether the system passed or failed; they feed directly into the next cycle of improvement.</p>
<pre><code class="lang-plaintext">┌─────────────┐
│   Collect   │
│ Test Cases  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│     Run     │
│ Evaluations │
└──────┬──────┘
       │
       ▼
┌─────────────┐      ┌─────────────┐
│  Identify   │─────▶│   Improve   │
│  Failures   │      │   System    │
└─────────────┘      └──────┬──────┘
                            │
                            ▼
                       ┌─────────────┐
                       │   Repeat    │
                       └─────────────┘
</code></pre>
<p>Here's how it works in practice:</p>
<ul>
<li><p><strong>Collect test cases</strong> — Gather examples from real user interactions or create synthetic scenarios. These should reflect the kind of tasks and input your system needs to handle.</p>
</li>
<li><p><strong>Run evaluations</strong> — Pass each test case through a series of checks. The check can either be programmatic (automated metrics like relevance scores or hallucination detectors) or require manual review (like verifying legal advice accuracy or brand voice consistency).</p>
</li>
<li><p><strong>Identify failures</strong> — Detect where the model goes wrong. This can include hallucinations, irrelevant responses, or mistakes on corner cases.</p>
</li>
<li><p><strong>Improve the system</strong> — Based on those failures, refine prompts, improve training or retrieval data, or adjust architectural components.</p>
</li>
<li><p><strong>Repeat the cycle</strong> — Re-run the updated system on the existing and newly collected cases. Over time, this grows and strengthens your evaluation suite and boosts system reliability.</p>
</li>
</ul>
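<p>The steps above can be sketched as a single function that runs one cycle and returns the failures to work on. Everything here is a simplified assumption: <code>run_flywheel_cycle</code>, <code>toy_system</code>, and <code>contains_expected</code> are hypothetical names, and a real check would be richer than a substring match.</p>

```python
from typing import Callable

def run_flywheel_cycle(
    system: Callable[[str], str],
    test_cases: list[dict],
    check: Callable[[str, dict], bool],
) -> list[dict]:
    """Run every test case once and return the cases that failed."""
    failures = []
    for case in test_cases:
        output = system(case["input"])
        if not check(output, case):
            # Keep the output next to the case so it can be analyzed later.
            failures.append({**case, "output": output})
    return failures

# Toy system and check, for illustration only.
def toy_system(question: str) -> str:
    if "refund" in question.lower():
        return "Refunds are available."
    return "I don't know."

def contains_expected(output: str, case: dict) -> bool:
    return case["expected"].lower() in output.lower()

cases = [
    {"input": "What is your refund policy?", "expected": "refund"},
    {"input": "How do I reset my password?", "expected": "reset"},
]

failures = run_flywheel_cycle(toy_system, cases, contains_expected)
for failure in failures:
    print("Needs improvement:", failure["input"])
```

<p>After improving the system, you would call the same function again with the original cases plus any newly collected ones, which is the "repeat" step.</p>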
<h2 id="heading-drawing-parallels-to-familiar-practices">Drawing Parallels to Familiar Practices</h2>
<p>If you've written software before, the evaluation flywheel will feel familiar. It mirrors patterns that are already used in engineering. For instance:</p>
<p><strong>Unit tests → Evaluation datasets</strong><br>Unit tests confirm a function returns the right output. Evaluation datasets play the same role for AI: they're ground-truth queries and answers that guard against regressions.</p>
<p><strong>Test-driven development (TDD) → Evaluation-driven development (EDD)</strong><br>In TDD, you write tests before code. In EDD, you write evaluation cases before shipping prompts or updating models. This replaces assumptions with verifiable results.</p>
<p><strong>CI/CD pipelines → Continuous evaluation pipelines</strong><br>CI/CD runs checks automatically on every code change. Continuous evaluation does the same for models: it runs automated quality checks every time you tweak a prompt, retrain, or swap out a component.</p>
<p>The key difference is subtle but important. Traditional software tests check whether a function returns the right value or type. AI evaluation tests check whether the system produces the right <em>meaning</em>. That's harder to measure, but the principle is the same: build a safety net that grows stronger with every cycle.</p>
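<p>As a rough illustration of value versus meaning, compare an exact-match check with a check that only requires the key facts to be present. The helper names are hypothetical, and substring matching is only a crude proxy for meaning.</p>

```python
def exact_match(output: str, expected: str) -> bool:
    """Traditional style: the output must equal the expected value."""
    return output == expected

def covers_key_facts(output: str, key_facts: list[str]) -> bool:
    """Crude proxy for 'same meaning': every key fact must appear."""
    text = output.lower()
    return all(fact.lower() in text for fact in key_facts)

reference = "You can request a full refund within 30 days."
paraphrase = "Within 30 days of purchase, we offer a full refund."

print(exact_match(paraphrase, reference))                       # False
print(covers_key_facts(paraphrase, ["30 days", "full refund"]))  # True
```

<p>The paraphrase is a perfectly good answer, yet the exact-match test rejects it. Evaluation checks are written to tolerate wording differences while still catching missing facts.</p>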
<h2 id="heading-why-silent-failures-matter-a-real-world-example">Why Silent Failures Matter: A Real-World Example</h2>
<p>AI systems often behave differently in production than they do in development. A model that seems solid in testing can drift, hallucinate, or silently fail when facing real-world input.</p>
<p><strong>Case in point</strong>: A fraud detection model passed all monitoring metrics yet missed a spike in fraud. An ML engineer shared how their production monitoring dashboards tracked latency, throughput, and error rates, and everything showed green. But fraudulent transactions were slipping through at twice the normal rate. Nobody noticed because existing observability tools focused on pipeline health, not prediction quality.</p>
<p>This silent failure caused significant losses for the company. By traditional metrics, the system seemed fine. The monitoring measured system performance (latency, throughput, uptime) but ignored what mattered most: prediction accuracy. As fraudsters adapted their tactics, the model drifted, and without proper evaluation loops, the degradation went undetected for weeks.</p>
<p>Source: <a target="_blank" href="https://insightfinder.com/blog/model-drift-ai-observability/">InsightFinder</a>.</p>
<h3 id="heading-why-this-example-matters">Why This Example Matters</h3>
<ul>
<li><p><strong>Silent failures aren't always bugs</strong> — They often stem from models failing to adapt to shifting patterns in the real world.</p>
</li>
<li><p><strong>Static evaluation isn't enough</strong> — You need continuous, real-world feedback loops to detect when assumptions no longer hold.</p>
</li>
<li><p><strong>Data drift has business impact</strong> — Model degradation isn't just a technical problem. It translates directly into revenue loss, security breaches, or damaged user trust.</p>
</li>
</ul>
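<p>One way to watch prediction quality rather than pipeline health is to track a quality metric on labeled samples and alert when it drops. The sketch below uses recall on hypothetical fraud labels. The data, the 0.10 threshold, and the function names are assumptions for illustration and are not taken from the incident described above.</p>

```python
def recall(predictions: list[bool], labels: list[bool]) -> float:
    """Share of actual fraud cases that the model flagged."""
    true_positives = sum(p and y for p, y in zip(predictions, labels))
    actual_positives = sum(labels)
    return true_positives / actual_positives if actual_positives else 1.0

def quality_alert(baseline: float, current: float, max_drop: float = 0.10) -> bool:
    """Return True when recall fell by more than max_drop versus the baseline."""
    return (baseline - current) > max_drop

# Hypothetical labeled samples: True means fraud.
last_month_labels = [True, True, True, True, False, False]
last_month_preds = [True, True, True, True, False, False]
this_week_labels = [True, True, True, True, False, False]
this_week_preds = [True, True, False, False, False, False]

baseline = recall(last_month_preds, last_month_labels)
current = recall(this_week_preds, this_week_labels)

if quality_alert(baseline, current):
    print(f"Recall dropped from {baseline:.2f} to {current:.2f}")
```

<p>Latency and error rates would look identical in both periods here. Only a metric computed against ground-truth labels reveals that half of the fraud cases are now being missed.</p>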
<h2 id="heading-how-to-create-an-evaluation-flywheel">How to Create an Evaluation Flywheel</h2>
<p>To show how to build a flywheel and how it works, let's create one for a customer support chatbot that answers questions about a SaaS product.</p>
<h3 id="heading-step-1-build-your-ai-system">Step 1: Build Your AI System</h3>
<p>Create your initial product: prompts, retrieval logic, and integrations. For our chatbot:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">answer_support_question</span>(<span class="hljs-params">question: str</span>) -&gt; str:</span>
    <span class="hljs-comment"># Retrieve relevant docs from knowledge base</span>
    context = retrieve_docs(question, top_k=<span class="hljs-number">5</span>)

    <span class="hljs-comment"># Generate answer using LLM</span>
    prompt = <span class="hljs-string">f"""You are a helpful customer support agent.

Context: <span class="hljs-subst">{context}</span>

Question: <span class="hljs-subst">{question}</span>

Provide a clear, accurate answer based on the context."""</span>

    response = llm.generate(prompt)
    <span class="hljs-keyword">return</span> response
</code></pre>
<p><strong>How this works:</strong> This function defines the core chat logic: it takes a customer’s question and returns an AI-generated answer. First, it searches your knowledge base to find the five most relevant documents using <code>retrieve_docs()</code>. These documents provide context about your product or policies. Next, it constructs a prompt that includes this context and the user's question, then sends it to a language model. The LLM reads the context and generates a relevant answer, which the function returns.</p>
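<p>The snippet above relies on <code>retrieve_docs</code> and <code>llm</code>, which are not defined in this article. If you want to run the examples locally without a vector store or an API key, you could start with stand-ins like the ones below. They are assumptions for experimentation only: the retrieval is plain word overlap, and <code>EchoLLM</code> is a fake model that simply returns the context it was given.</p>

```python
KNOWLEDGE_BASE = [
    "To reset your password, open the settings page and request a reset link by email.",
    "Refunds: contact support within 30 days for a full refund.",
    "You can export data to CSV with the export button on the dashboard.",
]

def retrieve_docs(question: str, top_k: int = 5) -> list[str]:
    """Rank documents by how many words they share with the question.

    Punctuation is not stripped, so this is far cruder than embedding search.
    """
    question_words = set(question.lower().split())
    scored = [
        (len(question_words & set(doc.lower().split())), doc)
        for doc in KNOWLEDGE_BASE
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

class EchoLLM:
    """Fake model: returns the first context line instead of calling an API."""

    def generate(self, prompt: str) -> str:
        for line in prompt.splitlines():
            if line.startswith("Context: "):
                return line.removeprefix("Context: ")
        return ""

llm = EchoLLM()
docs = retrieve_docs("How do I reset my password?", top_k=1)
answer = llm.generate(f"Context: {docs[0]}\n\nQuestion: How do I reset my password?")
print(answer)
```

<p>Swapping these stubs for a real retriever and model client later does not change the evaluation code, as long as the function names and return types stay the same.</p>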
<h3 id="heading-step-2-identify-test-cases">Step 2: Identify Test Cases</h3>
<p>Build an evaluation set that reflects real user behavior. The more representative your test cases are, including common cases, edge cases, and ambiguous inputs, the better your evaluation set can catch failures before they reach production.</p>
<p><strong>Sources for test cases:</strong></p>
<ul>
<li><p>Previous customer support tickets</p>
</li>
<li><p>Common FAQ topics</p>
</li>
<li><p>Edge cases discovered in beta testing</p>
</li>
<li><p>Synthetic scenarios (hypothetical but realistic queries)</p>
</li>
</ul>
<p>Example test cases:</p>
<pre><code class="lang-python">test_cases = [
    {
        <span class="hljs-string">"question"</span>: <span class="hljs-string">"How do I reset my password?"</span>,
        <span class="hljs-string">"expected_elements"</span>: [<span class="hljs-string">"settings page"</span>, <span class="hljs-string">"reset link"</span>, <span class="hljs-string">"email"</span>],
        <span class="hljs-string">"category"</span>: <span class="hljs-string">"account_management"</span>
    },
    {
        <span class="hljs-string">"question"</span>: <span class="hljs-string">"What's your refund policy?"</span>,
        <span class="hljs-string">"expected_elements"</span>: [<span class="hljs-string">"30 days"</span>, <span class="hljs-string">"full refund"</span>, <span class="hljs-string">"contact support"</span>],
        <span class="hljs-string">"category"</span>: <span class="hljs-string">"billing"</span>
    },
    {
        <span class="hljs-string">"question"</span>: <span class="hljs-string">"Can I export my data to CSV?"</span>,
        <span class="hljs-string">"expected_elements"</span>: [<span class="hljs-string">"yes"</span>, <span class="hljs-string">"export button"</span>, <span class="hljs-string">"dashboard"</span>],
        <span class="hljs-string">"category"</span>: <span class="hljs-string">"features"</span>
    },
    {
        <span class="hljs-string">"question"</span>: <span class="hljs-string">"Does your API support webhooks?"</span>,
        <span class="hljs-string">"expected_elements"</span>: [<span class="hljs-string">"yes"</span>, <span class="hljs-string">"webhook endpoints"</span>, <span class="hljs-string">"documentation"</span>],
        <span class="hljs-string">"category"</span>: <span class="hljs-string">"technical"</span>
    }
]
</code></pre>
<p><strong>How this works:</strong> Here, we define a set of representative test cases to evaluate the AI system. Each test case includes the user’s question, a list of key elements expected in the answer, and a category for organization. These cases help ensure the chatbot is tested against real-world scenarios, edge cases, and important information that should appear in responses.</p>
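<p>As the test set grows and more people contribute to it, malformed entries become likely. A small validation helper can catch them early. This is a sketch that assumes the three fields used above; <code>validate_test_case</code> is a hypothetical name.</p>

```python
REQUIRED_FIELDS = {"question": str, "expected_elements": list, "category": str}

def validate_test_case(case: dict) -> list[str]:
    """Return a list of problems; an empty list means the case is well formed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in case:
            problems.append(f"missing field: {field}")
        elif not isinstance(case[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    # A case with nothing to look for can never fail, so flag it.
    elements = case.get("expected_elements")
    if isinstance(elements, list) and not elements:
        problems.append("expected_elements is empty")
    return problems

good = {
    "question": "What's your refund policy?",
    "expected_elements": ["30 days"],
    "category": "billing",
}
bad = {"question": "Can I export my data to CSV?", "expected_elements": []}

print(validate_test_case(good))  # []
print(validate_test_case(bad))
```

<p>Running this over the whole list before each evaluation run keeps a typo in the dataset from silently turning into a passing or crashing test.</p>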
<h3 id="heading-step-3-evaluate-outputs">Step 3: Evaluate Outputs</h3>
<p>Define evaluation criteria based on what matters for your use case: accuracy, faithfulness, safety, relevance, tone. Then measure the output against these criteria.</p>
<p>Evaluation happens in two main ways:</p>
<h4 id="heading-automated-evaluation">Automated Evaluation</h4>
<p>Use programmatic metrics and LLM-as-judge patterns:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">evaluate_response</span>(<span class="hljs-params">question: str, response: str, expected_elements: list</span>) -&gt; dict:</span>
    scores = {}

    <span class="hljs-comment"># 1. Faithfulness: Does response contain expected elements?</span>
    scores[<span class="hljs-string">'contains_key_info'</span>] = all(
        elem.lower() <span class="hljs-keyword">in</span> response.lower() 
        <span class="hljs-keyword">for</span> elem <span class="hljs-keyword">in</span> expected_elements
    )

    <span class="hljs-comment"># 2. Relevance: Semantic similarity to question</span>
    scores[<span class="hljs-string">'relevance'</span>] = calculate_semantic_similarity(question, response)

    <span class="hljs-comment"># 3. Safety: Check for problematic content</span>
    scores[<span class="hljs-string">'is_safe'</span>] = <span class="hljs-keyword">not</span> contains_harmful_content(response)

    <span class="hljs-comment"># 4. Helpfulness: Use LLM-as-judge</span>
    judge_prompt = <span class="hljs-string">f"""Rate the helpfulness of this support response on a scale of 1-5.

Question: <span class="hljs-subst">{question}</span>
Response: <span class="hljs-subst">{response}</span>

Score (1-5):"""</span>

    scores[<span class="hljs-string">'helpfulness'</span>] = int(llm.generate(judge_prompt))

    <span class="hljs-keyword">return</span> scores

<span class="hljs-comment"># Run evaluation</span>
<span class="hljs-keyword">for</span> test_case <span class="hljs-keyword">in</span> test_cases:
    response = answer_support_question(test_case[<span class="hljs-string">'question'</span>])
    scores = evaluate_response(
        test_case[<span class="hljs-string">'question'</span>],
        response,
        test_case[<span class="hljs-string">'expected_elements'</span>]
    )
    test_case[<span class="hljs-string">'scores'</span>] = scores
    test_case[<span class="hljs-string">'response'</span>] = response
</code></pre>
<p><strong>How this works:</strong> The <code>evaluate_response()</code> function applies four different checks to each AI response:</p>
<ul>
<li><p>First, it verifies faithfulness by checking if all expected elements appear in the response using simple string matching.</p>
</li>
<li><p>Second, it calculates semantic similarity, a measure of how closely the response's meaning matches the intent of the question, using embeddings.</p>
</li>
<li><p>Third, it runs a safety check to flag any problematic content.</p>
</li>
<li><p>Fourth, it uses an LLM as a judge by asking a more powerful model (like GPT-4) to rate the helpfulness of the response on a 1-5 scale.</p>
</li>
</ul>
<p>The loop then runs the evaluation for every test case. It generates a response for each question, evaluates it using the <code>evaluate_response</code> function, and then stores both the scores and the response back in the test case. This creates a complete dataset of test results for analysis and further improvements.</p>
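<p>One fragile spot in the code above is <code>int(llm.generate(judge_prompt))</code>: it raises a <code>ValueError</code> as soon as the judge replies with anything other than a bare number, such as "Score: 4". A more forgiving parser could look like the sketch below. <code>parse_judge_score</code> is a hypothetical helper, not part of any library.</p>

```python
import re
from typing import Optional

def parse_judge_score(raw: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Extract the first standalone integer in [low, high]; None if absent."""
    for match in re.finditer(r"\b\d+\b", raw):
        value = int(match.group())
        if low <= value <= high:
            return value
    return None

print(parse_judge_score("4"))                    # 4
print(parse_judge_score("Score: 5 out of 5"))    # 5
print(parse_judge_score("I cannot rate this."))  # None
```

<p>Returning <code>None</code> lets the caller decide what to do with an unparseable reply, for example by sending the case to human review. The parser takes the first in-range number it sees, so it still helps to instruct the judge to answer with the number only.</p>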
<p><strong>Common automated metrics:</strong></p>
<ul>
<li><p><strong>Semantic similarity (0.0–1.0):</strong> This is measured by converting the question and response into vector embeddings and calculating cosine similarity. The score shows how closely the response matches the intent of the question, even if the wording differs.</p>
</li>
<li><p><strong>ROUGE / BLEU scores:</strong> The model’s output is compared to reference answers by checking n-gram overlap. These metrics help spot regressions, though scores can be modest for open-ended answers.</p>
</li>
<li><p><strong>LLM-as-judge:</strong> A stronger model (like GPT-4 or Claude) can rate the response on a fixed scale, such as 1–5. These ratings give a sense of quality and are useful for tracking improvements or drops over time.</p>
</li>
<li><p><strong>Retrieval metrics (Precision@k, Recall@k):</strong> For retrieval-based systems, these metrics calculate how many relevant documents appear in the top-k results. Precision shows accuracy of the retrieved set, and recall indicates completeness.</p>
</li>
<li><p><strong>Custom validators:</strong> Simple rule-based checks, like regex patterns, keywords, or length limits, ensure responses meet hard requirements. These help catch issues that score-based metrics might miss.</p>
</li>
</ul>
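<p>Three of the metrics above are short enough to write by hand. The sketch below uses only the standard library. The vectors and document IDs are made-up inputs, and in practice the embeddings would come from an embedding model.</p>

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved[:k]
    return sum(doc in relevant for doc in top_k) / k if k else 0.0

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k."""
    top_k = retrieved[:k]
    return sum(doc in relevant for doc in top_k) / len(relevant) if relevant else 0.0

# Hypothetical retrieval result and ground truth.
retrieved = ["refund-policy", "payment-processing", "billing-faq", "api-docs"]
relevant = {"refund-policy", "billing-faq", "cancellation"}

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(precision_at_k(retrieved, relevant, k=2))   # 0.5
print(recall_at_k(retrieved, relevant, k=3))      # two of three relevant docs found
```

<p>Precision and recall pull in different directions: raising <code>k</code> usually improves recall while lowering precision, so it helps to track both when tuning retrieval.</p>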
<h4 id="heading-manual-evaluation">Manual Evaluation</h4>
<p>Automated metrics can't capture everything. Subjective qualities like tone, empathy, and brand voice require human judgment, as do small factual errors that slip past keyword checks and similarity scores.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Flag cases for human review</span>
needs_review = [
    case <span class="hljs-keyword">for</span> case <span class="hljs-keyword">in</span> test_cases 
    <span class="hljs-keyword">if</span> case[<span class="hljs-string">'scores'</span>][<span class="hljs-string">'helpfulness'</span>] &lt; <span class="hljs-number">3</span> 
    <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> case[<span class="hljs-string">'scores'</span>][<span class="hljs-string">'contains_key_info'</span>]
]

<span class="hljs-comment"># SMEs review and annotate</span>
<span class="hljs-keyword">for</span> case <span class="hljs-keyword">in</span> needs_review:
    annotation = get_sme_feedback(case)
    case[<span class="hljs-string">'human_rating'</span>] = annotation[<span class="hljs-string">'rating'</span>]
    case[<span class="hljs-string">'improvement_notes'</span>] = annotation[<span class="hljs-string">'notes'</span>]
</code></pre>
<p>This code filters test cases to find responses that need human attention: those scoring below 3 for helpfulness or missing important information. Subject matter experts review these flagged cases and provide ratings with helpful feedback. Their input helps you spot patterns that automated metrics miss and shows you where to improve your prompts, retrieval setup, or system settings.</p>
<p><strong>When to use manual evaluation:</strong></p>
<ul>
<li><p>Assessing tone, empathy, or brand voice</p>
</li>
<li><p>Detecting subtle hallucinations automated checks miss</p>
</li>
<li><p>Validating edge cases with domain-specific nuance</p>
</li>
<li><p>Creating ground truth labels for training evaluation models</p>
</li>
</ul>
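<p>Human ratings are also a way to check how far the LLM judge can be trusted. A simple sketch, reusing the <code>scores['helpfulness']</code> and <code>human_rating</code> fields from the code above, measures how often the two agree. The function name, the tolerance of one point, and the sample data are assumptions.</p>

```python
def judge_agreement(cases: list[dict], tolerance: int = 1) -> float:
    """Share of human-reviewed cases where judge and human differ by <= tolerance."""
    reviewed = [c for c in cases if "human_rating" in c]
    if not reviewed:
        return 0.0
    agreeing = sum(
        abs(c["scores"]["helpfulness"] - c["human_rating"]) <= tolerance
        for c in reviewed
    )
    return agreeing / len(reviewed)

# Hypothetical reviewed cases.
reviewed_cases = [
    {"scores": {"helpfulness": 2}, "human_rating": 2},
    {"scores": {"helpfulness": 2}, "human_rating": 4},
    {"scores": {"helpfulness": 1}, "human_rating": 2},
    {"scores": {"helpfulness": 4}},  # not reviewed by a human, ignored
]

print(f"{judge_agreement(reviewed_cases):.2f}")
```

<p>If agreement is low, the judge prompt or rubric probably needs work before its scores are used as a quality gate.</p>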
<h3 id="heading-step-4-learn-and-improve">Step 4: Learn and Improve</h3>
<p>Once you've identified failures, adjust the controllable parts of your AI system (the "configs"):</p>
<p><strong>Common configuration levers:</strong></p>
<ul>
<li><p><strong>Prompts</strong> — Add instructions, examples, constraints</p>
</li>
<li><p><strong>Retrieval</strong> — Change chunk size, top-k, reranking strategy</p>
</li>
<li><p><strong>Model</strong> — Switch models, adjust temperature, max tokens</p>
</li>
<li><p><strong>Context</strong> — Modify system instructions, add memory</p>
</li>
<li><p><strong>Post-processing</strong> — Add validation, formatting, safety filters</p>
</li>
</ul>
<p><strong>Example improvement cycle:</strong></p>
<pre><code class="lang-python"><span class="hljs-comment"># Problem discovered: Chatbot missing key details</span>
failing_case = {
    <span class="hljs-string">"question"</span>: <span class="hljs-string">"What's your refund policy?"</span>,
    <span class="hljs-string">"response"</span>: <span class="hljs-string">"We offer refunds in certain cases."</span>,
    <span class="hljs-string">"issue"</span>: <span class="hljs-string">"Too vague, missing 30-day window and process"</span>
}

<span class="hljs-comment"># Root cause: Retrieval returning wrong docs</span>
retrieved_docs = retrieve_docs(failing_case[<span class="hljs-string">'question'</span>], top_k=<span class="hljs-number">5</span>)
<span class="hljs-comment"># Docs about "payment processing" ranked higher than "refund policy"</span>

<span class="hljs-comment"># Solution 1: Improve retrieval with reranking</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">retrieve_docs_v2</span>(<span class="hljs-params">question: str, top_k: int</span>) -&gt; list:</span>
    <span class="hljs-comment"># Initial retrieval</span>
    candidates = vector_search(question, top_k=<span class="hljs-number">20</span>)

    <span class="hljs-comment"># Rerank by relevance</span>
    reranked = rerank_by_relevance(question, candidates)

    <span class="hljs-keyword">return</span> reranked[:top_k]

<span class="hljs-comment"># Solution 2: Update prompt to require specificity</span>
prompt_v2 = <span class="hljs-string">f"""You are a helpful customer support agent.

Context: <span class="hljs-subst">{context}</span>

Question: <span class="hljs-subst">{question}</span>

Provide a clear, accurate answer based on the context. Include specific details like:
- Time windows (e.g., "within 30 days")
- Step-by-step processes
- Relevant links or contact methods

Answer:"""</span>

<span class="hljs-comment"># Re-evaluate</span>
new_response = answer_support_question_v2(failing_case[<span class="hljs-string">'question'</span>])
new_scores = evaluate_response(
    failing_case[<span class="hljs-string">'question'</span>],
    new_response,
    [<span class="hljs-string">"30 days"</span>, <span class="hljs-string">"full refund"</span>, <span class="hljs-string">"contact support"</span>]
)

<span class="hljs-comment"># Verify improvement</span>
<span class="hljs-keyword">assert</span> new_scores[<span class="hljs-string">'contains_key_info'</span>] == <span class="hljs-literal">True</span>
<span class="hljs-keyword">assert</span> new_scores[<span class="hljs-string">'helpfulness'</span>] &gt;= <span class="hljs-number">4</span>
</code></pre>
<p><strong>How this works:</strong> In this example, the chatbot's refund answer was too vague. After checking what went wrong, the problem was that the system retrieved docs about payment processing instead of the refund policy.</p>
<p>To resolve this, two changes can be made. First, retrieval is improved by grabbing twenty documents, then picking the best five. Second, the prompt is updated to ask for specific details like dates and steps.</p>
<p>After making these changes, the test runs again to confirm it works: the response now has all the key info and scores at least 4 out of 5. This process turns problems into fixes you can measure.</p>
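<p>Fixing one case can break another, so it is worth comparing the whole suite before and after a change rather than re-checking only the failing case. The sketch below assumes each run is summarized as a mapping from a test case ID to pass or fail. <code>compare_runs</code> and the IDs are hypothetical.</p>

```python
def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Classify each test case by how its pass/fail status changed."""
    report = {"fixed": [], "regressed": [], "still_failing": []}
    for case_id, passed_before in before.items():
        # A case missing from the new run is treated as failing.
        passed_after = after.get(case_id, False)
        if not passed_before and passed_after:
            report["fixed"].append(case_id)
        elif passed_before and not passed_after:
            report["regressed"].append(case_id)
        elif not passed_before and not passed_after:
            report["still_failing"].append(case_id)
    return report

before = {"refund_policy": False, "password_reset": True, "csv_export": True}
after = {"refund_policy": True, "password_reset": True, "csv_export": False}

report = compare_runs(before, after)
print(report)
```

<p>In this made-up run the refund question is fixed, but the CSV export question regressed, which a single-case re-test would not have shown.</p>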
<h3 id="heading-step-5-automate-and-repeat">Step 5: Automate and Repeat</h3>
<p>Integrate evaluation into your development workflow using CI/CD:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># .github/workflows/eval.yml</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">Continuous</span> <span class="hljs-string">Evaluation</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">pull_request:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span> [<span class="hljs-string">main</span>]

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">evaluate:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Run</span> <span class="hljs-string">evaluation</span> <span class="hljs-string">suite</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">python</span> <span class="hljs-string">run_evals.py</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Check</span> <span class="hljs-string">pass</span> <span class="hljs-string">rate</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          PASS_RATE=$(python calculate_pass_rate.py)
          if (( $(echo "$PASS_RATE &lt; 0.85" | bc -l) )); then
            echo "Pass rate $PASS_RATE below threshold"
            exit 1
          fi
</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Upload</span> <span class="hljs-string">results</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/upload-artifact@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">name:</span> <span class="hljs-string">eval-results</span>
          <span class="hljs-attr">path:</span> <span class="hljs-string">results/</span>
</code></pre>
<p><strong>Explanation:</strong> This GitHub Actions workflow runs your evaluation process on every code change. The workflow triggers whenever someone opens a pull request or pushes code to the main branch. It checks out your code, runs your full evaluation suite using <code>run_evals.py</code>, then calculates what percentage of test cases passed. If the pass rate drops below 85%, the workflow fails and blocks the code from being merged, preventing quality regressions from reaching production.</p>
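<p>The workflow calls <code>calculate_pass_rate.py</code>, which this article does not show. One possible shape is sketched below. It assumes <code>run_evals.py</code> writes a JSON list of records with a boolean <code>passed</code> field, and both that file format and the function name are assumptions. The demo writes a temporary file so the sketch runs on its own.</p>

```python
import json
import tempfile
from pathlib import Path

def calculate_pass_rate(results_path: Path) -> float:
    """Read a JSON list of {"passed": bool} records and return the pass fraction."""
    results = json.loads(results_path.read_text(encoding="utf-8"))
    if not results:
        return 0.0
    return sum(1 for record in results if record["passed"]) / len(results)

# Demo with a temporary file standing in for a real results file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    path.write_text(
        json.dumps([
            {"id": "refund_policy", "passed": True},
            {"id": "password_reset", "passed": True},
            {"id": "csv_export", "passed": False},
            {"id": "webhooks", "passed": True},
        ]),
        encoding="utf-8",
    )
    pass_rate = calculate_pass_rate(path)

# The workflow step captures this printed value from stdout.
print(f"{pass_rate:.2f}")
```

<p>Printing only the number keeps the shell comparison in the workflow simple. With the sample data, three of four cases pass, which is below the 0.85 threshold, so the job would fail.</p>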
<p><strong>Key practices for automation:</strong></p>
<ul>
<li><p><strong>Version your test cases</strong> — Track them in Git alongside code</p>
</li>
<li><p><strong>Set quality gates</strong> — Block deployments if pass rate drops below threshold</p>
</li>
<li><p><strong>Monitor trends</strong> — Track metrics over time to catch gradual drift</p>
</li>
<li><p><strong>Alert on regressions</strong> — Notify team when specific test cases start failing</p>
</li>
<li><p><strong>Sample production traffic</strong> — Continuously add real queries to eval dataset</p>
</li>
</ul>
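<p>The last practice, sampling production traffic, can start as a small script that picks unseen queries and turns them into unlabeled test cases for a human to complete. This is a sketch under assumptions: queries arrive as plain strings, and <code>sample_new_cases</code> is a hypothetical helper.</p>

```python
import random

def sample_new_cases(
    production_queries: list[str],
    existing_cases: list[dict],
    sample_size: int,
    seed: int = 0,
) -> list[dict]:
    """Pick production queries not yet in the eval set and wrap them as test cases."""
    known = {case["question"].strip().lower() for case in existing_cases}
    unseen = sorted(
        {q.strip() for q in production_queries if q.strip().lower() not in known}
    )
    rng = random.Random(seed)
    chosen = rng.sample(unseen, min(sample_size, len(unseen)))
    # expected_elements stays empty until a reviewer labels the case.
    return [
        {"question": q, "expected_elements": [], "category": "unlabeled"}
        for q in chosen
    ]

existing = [{"question": "How do I reset my password?"}]
queries = [
    "how do i reset my password?",
    "Can I pay by invoice?",
    "Is there a dark mode?",
    "Can I pay by invoice?",
]

new_cases = sample_new_cases(queries, existing, sample_size=5)
for case in new_cases:
    print(case["question"])
```

<p>New cases are marked <code>unlabeled</code> on purpose. They should not count toward the pass rate until someone has filled in what a good answer must contain.</p>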
<h2 id="heading-tools-and-frameworks-you-can-use-for-evaluation">Tools and Frameworks You Can Use for Evaluation</h2>
<p>Several platforms can help implement continuous evaluation. The one you choose depends on your stack and needs:</p>
<p><strong>If you're building with LLMs:</strong> Try LangSmith or Braintrust first. Both handle prompt versioning, evaluation datasets, and tracing out of the box.</p>
<p><strong>If you're doing traditional ML:</strong> Weights &amp; Biases is a widely used choice for experiment tracking. If you're in the Microsoft ecosystem, Prompt flow integrates well with Azure.</p>
<p><strong>If you want full control:</strong> Build a custom setup with pytest for test execution and MLflow for tracking results. It takes more setup, but you own the entire pipeline.</p>
<h2 id="heading-what-a-complete-evaluation-loop-looks-like-in-practice">What a Complete Evaluation Loop Looks Like in Practice</h2>
<p>This walkthrough shows how a support chatbot improves after running a single cycle of evaluations. Each stage shows how evaluation signals guide improvements and lock in quality for the next release.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Stage</td><td>Before</td><td>After</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Test Case</strong></td><td>"Can I use your API on the free plan?"</td><td>Same question</td></tr>
<tr>
<td><strong>Model Response</strong></td><td>"Yes, you can access our API."</td><td>"Yes, you can access our API on the free plan with a rate limit of 100 requests per day. For higher limits, upgrade to Pro or Enterprise."</td></tr>
<tr>
<td><strong>Evaluation Scores</strong></td><td>contains_key_info=False, helpfulness=2/5</td><td>contains_key_info=True, helpfulness=5/5</td></tr>
<tr>
<td><strong>Issue Identified</strong></td><td>Missing crucial detail: free plan rate limits</td><td>N/A (issue resolved)</td></tr>
<tr>
<td><strong>Analysis / Root Cause</strong></td><td>Retrieval returned general API docs; prompt didn’t emphasize limitations</td><td>N/A (analysis led to fix)</td></tr>
<tr>
<td><strong>Fixes Applied</strong></td><td>1. Improved retrieval to fetch plan comparison docs 2. Updated prompt: "Always mention plan-specific restrictions" 3. Added validation: Response must mention rate limits if asked</td><td>N/A (fix implemented)</td></tr>
<tr>
<td><strong>Outcome</strong></td><td>Test failed, regression not prevented</td><td>Test passes, regression prevented</td></tr>
<tr>
<td><strong>Next Cycle Actions</strong></td><td>N/A</td><td>1. Add this test case to permanent suite 2. Look for similar issues (other plan-related questions) 3. Monitor production queries for this pattern</td></tr>
</tbody>
</table>
</div><p><strong>Next cycle:</strong></p>
<ul>
<li><p>Add this test case to permanent suite</p>
</li>
<li><p>Look for similar issues (other plan-related questions)</p>
</li>
<li><p>Monitor if this pattern appears in production queries</p>
</li>
</ul>
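<p>The validation fix in the table ("Response must mention rate limits if asked") is an example of a custom validator. A minimal sketch is shown below. The regular expressions and the function name are assumptions about how such a rule might be written, and a production rule would likely need broader patterns.</p>

```python
import re

# When the question touches these topics, the rule applies.
RATE_LIMIT_QUESTION = re.compile(r"\b(api|rate limit|free plan)\b", re.IGNORECASE)
# A concrete limit looks like "100 requests per day".
RATE_LIMIT_ANSWER = re.compile(
    r"\b\d+\s+requests\s+per\s+(second|minute|hour|day)\b", re.IGNORECASE
)

def passes_rate_limit_rule(question: str, response: str) -> bool:
    """If the question is about API or plan limits, the answer must state a limit."""
    if not RATE_LIMIT_QUESTION.search(question):
        return True  # rule does not apply to this question
    return bool(RATE_LIMIT_ANSWER.search(response))

question = "Can I use your API on the free plan?"
before = "Yes, you can access our API."
after = (
    "Yes, you can access our API on the free plan with a rate limit of "
    "100 requests per day. For higher limits, upgrade to Pro or Enterprise."
)

print(passes_rate_limit_rule(question, before))  # False
print(passes_rate_limit_rule(question, after))   # True
```

<p>Because the rule is deterministic, it can run on every response in the suite at almost no cost, alongside the slower LLM-based checks.</p>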
<h2 id="heading-key-takeaways">Key Takeaways</h2>
<ul>
<li><p><strong>AI systems need continuous evaluation, not one-time testing</strong> — Models drift, data changes, and silent failures accumulate without ongoing checks.</p>
</li>
<li><p><strong>Build evaluation into your workflow from day one</strong> — Don't wait until production failures force you to retrofit evaluation.</p>
</li>
<li><p><strong>Start simple, then scale</strong> — Begin with 10-20 test cases and basic metrics. Grow your suite as you encounter edge cases.</p>
</li>
<li><p><strong>Automate what you can, involve humans for what you can't</strong> — Use programmatic checks for speed, SME review for nuance.</p>
</li>
<li><p><strong>Treat evaluation datasets as first-class artifacts</strong> — Version control them, review changes, and grow them over time.</p>
</li>
<li><p><strong>Make evaluation a team sport</strong> — Product, engineering, and domain experts should all contribute test cases and evaluation criteria.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Every developer has felt the relief of seeing "all tests passing." In AI systems, that reassurance is often misleading. A model can deploy successfully, meet performance benchmarks, and still produce incorrect, incomplete, or misleading outputs in ways traditional tests miss.</p>
<p>The evaluation flywheel addresses this gap by making model behavior testable in practice. Instead of assuming correctness, it forces the system to answer real questions, measures the quality of those answers, and highlights where performance degrades over time. This shifts evaluation from a one-off validation step into an ongoing part of development.</p>
<p>Evaluation won't eliminate uncertainty completely, but it makes failures visible before they reach users. With failures clearly exposed, teams stop guessing and start fixing based on results. This might mean adjusting prompts, improving retrieval logic, or refining evaluation criteria. Over time, this leads to AI systems that evolve in controlled ways rather than breaking silently.</p>
<p><strong>Resources for further reading</strong></p>
<ul>
<li><p><strong>Anthropic's eval guide</strong>: <a target="_blank" href="https://docs.anthropic.com/en/docs/build-with-claude/develop-tests">https://docs.anthropic.com/en/docs/build-with-claude/develop-tests</a></p>
</li>
<li><p><strong>OpenAI's evals framework</strong>: <a target="_blank" href="https://github.com/openai/evals">https://github.com/openai/evals</a></p>
</li>
<li><p><strong>LangChain evaluation</strong>: <a target="_blank" href="https://python.langchain.com/docs/guides/evaluation">https://python.langchain.com/docs/guides/evaluation</a></p>
</li>
<li><p><strong>Arize AI blog</strong>: Comprehensive resources on ML observability</p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build Your First Dynamic Performance Test in Apache JMeter ]]>
                </title>
                <description>
                    <![CDATA[ As a QA engineer, I have always found performance testing to be one of the most exciting and underrated parts of software testing. Yes, functional testing is important, but it’s of little use if users have to wait for 5 seconds for each page to load.... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-your-first-dynamic-performance-test-in-apache-jmeter/</link>
                <guid isPermaLink="false">6900f3ca65a053299e38eab3</guid>
                
                    <category>
                        <![CDATA[ Scale Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Performance Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ jmeter ]]>
                    </category>
                
                    <category>
                        <![CDATA[ scalability ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Quality Assurance ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Mah Noor ]]>
                </dc:creator>
                <pubDate>Tue, 28 Oct 2025 16:48:10 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761335397152/cb105a44-4c18-4998-9ffb-d520df0e6510.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>As a QA engineer, I have always found performance testing to be one of the most exciting and underrated parts of software testing. Yes, functional testing is important, but it’s of little use if users have to wait for 5 seconds for each page to load.</p>
<p>For me personally, there is a deep satisfaction in seeing your product come alive under load and finding out how it will actually behave in production when thousands of people are using it.</p>
<p>Performance testing is about discovering how your system performs under real-world pressure in terms of load, concurrency, and throughput. One of the key aspects of performance testing is ensuring that the APIs can endure the expected load. You can do this using tools like Apache JMeter and k6.</p>
<p>In this tutorial, you’ll build your first end-to-end performance test in Apache JMeter. You’ll learn to create a test suite that is dynamic (it can run with any test data) and one-click executable (it can be run from the GUI as well as the CLI).</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-introduction-to-apache-jmeter">Introduction to Apache JMeter</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-create-a-new-test-plan">Step 1: Create a New Test Plan</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-configure-the-thread-group">Step 2: Configure the Thread Group</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-add-http-request-defaults">Step 3: Add HTTP Request Defaults</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-add-a-csv-data-set-config-dynamic-input">Step 4: Add a CSV Data Set Config (Dynamic Input)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-add-the-http-request-sampler">Step 5: Add the HTTP Request Sampler</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-add-a-json-extractor">Step 6: Add a JSON Extractor</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-add-an-assertion">Step 7: Add an Assertion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-8-add-listeners">Step 8: Add Listeners</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-9-run-your-test">Step 9: Run Your Test</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-10-chain-another-request-optional">Step 10: Chain Another Request (Optional)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-11-analyze-the-results">Step 11: Analyze the Results</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pro-tips">Pro Tips</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-example-folder-structure">Example Folder Structure:</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have:</p>
<ul>
<li><p><a target="_blank" href="https://jmeter.apache.org/download_jmeter.cgi"><strong>Apache JMeter (5.5 or above)</strong></a> installed.</p>
</li>
<li><p><a target="_blank" href="https://www.java.com/en/download/manual.jsp"><strong>Java 8 or later</strong></a> configured on your system.</p>
</li>
</ul>
<p>You can check if JMeter is installed by running the command below:</p>
<pre><code class="lang-plaintext">jmeter -v
</code></pre>
<p><strong>Note:</strong> This tutorial will use the <a target="_blank" href="https://jsonplaceholder.typicode.com/">JSONPlaceholder</a> public API. You’ll learn how to fetch a post by its ID, extract the <code>userId</code> from the response, and use it in a chained request to get that user’s details.</p>
<p>Let’s get started.</p>
<h2 id="heading-introduction-to-apache-jmeter">Introduction to Apache JMeter</h2>
<p>Apache JMeter is an open-source load and stress testing tool. It supports a wide range of protocols and services, including HTTP, HTTPS, FTP, JDBC, and SOAP and REST web services.</p>
<p>JMeter helps you answer critical questions about your APIs, like:</p>
<ul>
<li><p>How does my API perform under heavy load?</p>
</li>
<li><p>What’s the maximum number of users it can handle before it starts failing?</p>
</li>
<li><p>Which requests or endpoints are slowing things down?</p>
</li>
</ul>
<p>Let’s go through the step-by-step process of building a dynamic load testing suite with JMeter.</p>
<h3 id="heading-step-1-create-a-new-test-plan">Step 1: Create a New Test Plan</h3>
<p>Once JMeter opens, you’ll see an empty Test Plan. Think of this as your main workspace, which holds everything: test configuration, users, requests, assertions, and results.</p>
<p>Right-click on <strong>Test Plan → Add → Threads (Users) → Thread Group</strong> to add a thread group. A thread group defines a set of virtual users and holds the requests they run, so you can think of it as a test suite containing your test cases.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761045558747/ad3a2fe3-de59-420f-ba9d-1a36323e1d9e.png" alt="Add Thread Group" width="1920" height="1009" loading="lazy"></p>
<h3 id="heading-step-2-configure-the-thread-group">Step 2: Configure the Thread Group</h3>
<p>To configure the thread group, fill out the following input fields:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Setting</td><td>Value</td><td>Description</td></tr>
</thead>
<tbody>
<tr>
<td>Number of Threads (Users)</td><td>5</td><td>The number of concurrent virtual users. Here, that’s 5.</td></tr>
<tr>
<td>Ramp-up Period (seconds)</td><td>10</td><td>The time JMeter takes to start all the threads. With 5 threads and a 10-second ramp-up, a new thread starts roughly every 2 seconds.</td></tr>
<tr>
<td>Loop Count</td><td>2</td><td>The number of times each thread runs the requests in the thread group.</td></tr>
</tbody>
</table>
</div><p>You’ve now created a small, controlled load test of 10 total requests (5 users × 2 loops).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761049951497/8221336c-5f10-4161-81fa-d0ad27c7164f.png" alt="Thread Group" class="image--center mx-auto" width="1920" height="982" loading="lazy"></p>
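<p>To make the arithmetic behind these settings concrete, here is a minimal TypeScript sketch. It is not part of JMeter: the <code>describeLoad</code> helper is hypothetical, and it assumes that thread starts are spread evenly across the ramp-up period.</p>
<pre><code class="lang-typescript">// Hypothetical helper, not a JMeter API: models a Thread Group's basic arithmetic.
function describeLoad(threads: number, rampUpSeconds: number, loops: number, samplersPerLoop: number) {
  // Each thread runs every sampler once per loop.
  const totalRequests = threads * loops * samplersPerLoop;
  // Assumption: thread starts are spread evenly across the ramp-up period.
  const secondsBetweenStarts = rampUpSeconds / threads;
  const startOffsets: number[] = [];
  for (let i = 0; i !== threads; i++) {
    startOffsets.push(i * secondsBetweenStarts);
  }
  return { totalRequests, secondsBetweenStarts, startOffsets };
}

const load = describeLoad(5, 10, 2, 1);
console.log(load.totalRequests); // 10
console.log(load.startOffsets.join(", ")); // 0, 2, 4, 6, 8
</code></pre>
<p>If you later add a second sampler to the same thread group, <code>samplersPerLoop</code> becomes 2 and the same settings produce 20 requests.</p>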
<h3 id="heading-step-3-add-http-request-defaults">Step 3: Add HTTP Request Defaults</h3>
<p>When you’re building a suite with hundreds of API requests, you don’t need to repeat the same connection details in every sampler. JMeter lets you set them once, for every request in scope, using a config element called HTTP Request Defaults. To add this element, follow the steps below:</p>
<ol>
<li><p>Right-click on <strong>Thread Group → Add → Config Element → HTTP Request Defaults.</strong></p>
</li>
<li><p>Enter the following:</p>
<ul>
<li><p><strong>Protocol:</strong> <code>https</code></p>
</li>
<li><p><strong>Server Name or IP:</strong> <a target="_blank" href="http://jsonplaceholder.typicode.com"><code>jsonplaceholder.typicode.com</code></a></p>
</li>
</ul>
</li>
</ol>
<p>This means all requests in this test will automatically use this base URL.</p>
<h3 id="heading-step-4-add-a-csv-data-set-config-dynamic-input">Step 4: Add a CSV Data Set Config (Dynamic Input)</h3>
<p>In real projects, APIs rarely use static inputs. Take as an example a login API that you want to run for 100 concurrent users. In a real-world scenario, every login request will have a different username and password.</p>
<p>To replicate this in JMeter, you need to run your test with 100 different login credentials. This means that your test should be <strong>data-driven</strong>. You can build a data-driven test in JMeter using a <strong>CSV file</strong>:</p>
<ol>
<li><p>Create a file named <code>data.csv</code> with the following content:</p>
<pre><code class="lang-plaintext">post_id
1
2
3
4
5
</code></pre>
</li>
<li><p>Save it in your JMeter project folder.</p>
</li>
<li><p>In JMeter, right-click on <strong>Thread Group → Add → Config Element → CSV Data Set Config.</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761048312824/4558aae4-23c8-446d-89d0-237aca29619d.png" alt="Add CSV Data Set Config" class="image--center mx-auto" width="1169" height="974" loading="lazy"></p>
</li>
<li><p>Fill in the following fields:</p>
<ul>
<li><p><strong>Filename:</strong> <code>data.csv</code></p>
</li>
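<li><p><strong>Variable Names:</strong> <code>post_id</code></p>
</li>
<li><p><strong>Ignore first line:</strong> <code>True</code> (the first line of <code>data.csv</code> is the <code>post_id</code> header, not data)</p>
</li>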
<li><p><strong>Recycle on EOF:</strong> <code>True</code></p>
</li>
<li><p><strong>Stop thread on EOF:</strong> <code>False</code></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761048167041/eae27f5c-6e23-4c7d-8890-b3eb5943bb66.png" alt="CSV Data Set Config" class="image--center mx-auto" width="1437" height="642" loading="lazy"></p>
</li>
</ul>
</li>
</ol>
<p>Now each user will pick a new <code>post_id</code> for every iteration from the CSV file.</p>
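<p>The effect of <strong>Recycle on EOF</strong> can be pictured with a small TypeScript sketch. This is a simplified model, not JMeter code: <code>pickRow</code> is a hypothetical helper, and it assumes rows are handed out in file order, one per request.</p>
<pre><code class="lang-typescript">// Hypothetical model of CSV Data Set Config row selection; not JMeter code.
function pickRow(rows: string[], requestIndex: number, recycleOnEof: boolean): string | undefined {
  if (recycleOnEof) {
    // Wrap around to the first row after the last one has been used.
    return rows[requestIndex % rows.length];
  }
  // Without recycling, there is no row left once the file is exhausted.
  return rows[requestIndex];
}

const postIds = ["1", "2", "3", "4", "5"];
const picked: (string | undefined)[] = [];
for (let request = 0; request !== 10; request++) {
  picked.push(pickRow(postIds, request, true));
}
console.log(picked.join(",")); // 1,2,3,4,5,1,2,3,4,5
</code></pre>
<p>With 5 rows and 10 requests, every <code>post_id</code> is used twice. In a real run, which thread receives which row depends on timing.</p>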
<h3 id="heading-step-5-add-the-http-request-sampler">Step 5: Add the HTTP Request Sampler</h3>
<p>Now let’s add the actual API call we'll test under load. To do this, follow the steps below:</p>
<ol>
<li><p>Right-click on <strong>Thread Group → Add → Sampler → HTTP Request.</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761051865320/92bf89d0-616c-4d07-9531-3985265e07d7.png" alt="Add an HTTP Request" class="image--center mx-auto" width="1920" height="1017" loading="lazy"></p>
</li>
<li><p>Rename it to <strong>Get Post Data.</strong></p>
</li>
<li><p>Set the following fields:</p>
<ul>
<li><p><strong>Method:</strong> GET</p>
</li>
<li><p><strong>Path:</strong> <code>/posts/${post_id}</code></p>
</li>
</ul>
</li>
</ol>
<p>Here <code>${post_id}</code> dynamically takes its value from your CSV file. The <strong>Protocol</strong> and <strong>Server Name or IP</strong> fields can stay empty: they are inherited from the HTTP Request Defaults config element that you added in Step 3.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761049282841/a420139c-4622-4d7a-ac7d-4308bb9a1dbc.png" alt="Add a GET Request" class="image--center mx-auto" width="1920" height="904" loading="lazy"></p>
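<p>As a rough illustration of how a <code>${...}</code> placeholder in the path gets filled in, here is a short TypeScript sketch. The <code>resolvePath</code> helper is hypothetical and only mimics the substitution; it is not how JMeter is implemented.</p>
<pre><code class="lang-typescript">// Hypothetical helper that mimics how ${name} placeholders are filled in; not JMeter code.
function resolvePath(template: string, variables: { [name: string]: string }): string {
  return template.replace(/\$\{(\w+)\}/g, function (placeholder: string, name: string): string {
    const value = variables[name];
    // Leave unknown placeholders untouched so a missing variable is easy to spot.
    return value === undefined ? placeholder : value;
  });
}

console.log(resolvePath("/posts/${post_id}", { post_id: "3" })); // /posts/3
console.log(resolvePath("/users/${user_id}", {})); // /users/${user_id}
</code></pre>
<p>If a request in <strong>View Results Tree</strong> shows a literal <code>${post_id}</code> in its URL, the variable was most likely never set, so check the CSV Data Set Config first.</p>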
<h3 id="heading-step-6-add-a-json-extractor">Step 6: Add a JSON Extractor</h3>
<p>When the API returns a response, we can extract a value (like <code>userId</code>) from it and use it later. This is how you implement an end-to-end flow, where data returned by one request (for example, a GET) is passed to the next request (such as a POST or DELETE).</p>
<p>For our API, below is the example response:</p>
<pre><code class="lang-plaintext">{
  "userId": 1,
  "id": 3,
  "title": "fugiat veniam minus",
  "body": "This is an example post body"
}
</code></pre>
<p>To extract <code>userId</code>:</p>
<ol>
<li><p>Right-click on <strong>Get Post Data → Add → Post Processors → JSON Extractor.</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761051791176/b7888a78-efbb-48d3-8aba-fcd21edfd8f2.png" alt="Add JSON Extractor" class="image--center mx-auto" width="1920" height="1018" loading="lazy"></p>
</li>
<li><p>Set the variables below in the JSON Extractor:</p>
<ul>
<li><p><strong>Name:</strong> Extract User ID</p>
</li>
<li><p><strong>Variable Name:</strong> <code>user_id</code></p>
</li>
<li><p><strong>JSON Path Expression:</strong> <code>$.userId</code></p>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761049324410/8a163733-8925-4557-9ace-124b08167f8e.png" alt="JSON Extractor" class="image--center mx-auto" width="1920" height="971" loading="lazy"></p>
<p>Now you can use <code>${user_id}</code> in the next request, making your test fully dynamic.</p>
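<p>Conceptually, the extractor parses the response body, reads the field that <code>$.userId</code> points to, and stores it as a string variable. The TypeScript sketch below shows that idea. <code>extractUserId</code> and the <code>NOT_FOUND</code> default are hypothetical names used for illustration, not JMeter APIs.</p>
<pre><code class="lang-typescript">// Hypothetical stand-in for a JSON Extractor with the expression $.userId; not JMeter code.
function extractUserId(responseBody: string, defaultValue: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(responseBody);
  } catch (error) {
    return defaultValue; // the body was not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) {
    return defaultValue; // the body was JSON, but not an object
  }
  const userId = (parsed as { userId?: unknown }).userId;
  // Variables are stored as strings, so convert the number 1 to "1".
  return userId === undefined ? defaultValue : String(userId);
}

const sampleResponse = '{"userId": 1, "id": 3, "title": "fugiat veniam minus", "body": "This is an example post body"}';
const variables = { user_id: extractUserId(sampleResponse, "NOT_FOUND") };
console.log(variables.user_id); // 1
</code></pre>
<p>Setting a recognizable default value makes a failed extraction easy to spot in the next request’s URL.</p>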
<h3 id="heading-step-7-add-an-assertion">Step 7: Add an Assertion</h3>
<p>Assertions help you verify that your API responds correctly even under load. You can assert on the API response code, response time, or even the response payload. To add an assertion, follow the steps below:</p>
<ol>
<li><p>Right-click <strong>Get Post Data → Add → Assertions → Response Assertion.</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761049384591/a0293eef-74a0-4d55-b0c4-232d5c5eaa0c.png" alt="Add Response Assertion" class="image--center mx-auto" width="1920" height="1020" loading="lazy"></p>
</li>
<li><p>Configure as:</p>
<ul>
<li><p><strong>Response Field to Test:</strong> <em>Response Code –</em> This will add an assertion for the response code.</p>
</li>
<li><p><strong>Pattern Matching Rules:</strong> <em>Contains</em></p>
</li>
<li><p><strong>Pattern to Test:</strong> 200</p>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761050184412/5a52f600-74f6-48c7-a975-7e39df47afdb.png" alt="Add Response Assertion" class="image--center mx-auto" width="1920" height="1017" loading="lazy"></p>
<p>This ensures JMeter only counts the request as successful if the response code contains <code>200</code>.</p>
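<p>The pattern matching rule decides how the pattern is compared with the tested field. The TypeScript sketch below models the four common rules as I understand them: <em>Contains</em> and <em>Matches</em> treat the pattern as a regular expression, while <em>Equals</em> and <em>Substring</em> compare plain text. <code>checkPattern</code> is a hypothetical helper, not a JMeter API.</p>
<pre><code class="lang-typescript">// Hypothetical model of Response Assertion pattern matching rules; not JMeter code.
type Rule = "contains" | "matches" | "equals" | "substring";

function checkPattern(text: string, pattern: string, rule: Rule): boolean {
  // Contains: the regular expression matches somewhere in the text.
  if (rule === "contains") return new RegExp(pattern).test(text);
  // Matches: the regular expression must match the whole text.
  if (rule === "matches") return new RegExp("^(?:" + pattern + ")$").test(text);
  // Equals: plain, exact string comparison.
  if (rule === "equals") return text === pattern;
  // Substring: plain text search, no regular expression.
  return text.indexOf(pattern) !== -1;
}

console.log(checkPattern("200", "200", "contains")); // true
console.log(checkPattern("404", "200", "contains")); // false
console.log(checkPattern("201", "20\\d", "matches")); // true
console.log(checkPattern("201", "20\\d", "equals")); // false
</code></pre>
<p>For a status code check, <em>Equals</em> is the strictest option. <em>Contains</em> works here too, because a three-digit code that contains <code>200</code> can only be <code>200</code>.</p>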
<h3 id="heading-step-8-add-listeners">Step 8: Add Listeners</h3>
<p>We’ll add listeners to display our test results in different forms, such as visually or in a summary. Let’s add two essential ones:</p>
<ol>
<li><p><strong>View Results Tree</strong>: to view and debug individual requests.</p>
</li>
<li><p><strong>Summary Report</strong>: to view performance metrics like response time, error rate, and throughput.</p>
</li>
</ol>
<p>Add them via <strong>Thread Group → Add → Listener → [Choose Listener]</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761049568483/0daa916c-503d-4f91-ad17-1d2bd29a9f72.png" alt="Add Listener in JMeter" class="image--center mx-auto" width="1920" height="1020" loading="lazy"></p>
<h3 id="heading-step-9-run-your-test">Step 9: Run Your Test</h3>
<p>Hit the green <strong>Start</strong> button at the top. JMeter will start sending requests to your API using the dynamic post IDs from your CSV file.</p>
<p>As the test runs:</p>
<ul>
<li><p>Green checkmarks in <strong>View Results Tree</strong> mean successful responses.</p>
</li>
<li><p>Assertion failures will appear in red.</p>
</li>
<li><p><strong>Summary Report</strong> will aggregate key metrics.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761050151356/d8c72408-cf91-4c9d-8663-0a65b6943f5b.png" alt="JMeter View Results Tree" class="image--center mx-auto" width="1920" height="1013" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761050211424/532dd999-b870-4cf8-ad1e-1a692119b0e0.png" alt="JMeter Summary Report" class="image--center mx-auto" width="1920" height="1024" loading="lazy"></p>
<h3 id="heading-step-10-chain-another-request-optional">Step 10: Chain Another Request (Optional)</h3>
<p>Let’s take it one step further: we’ll use the extracted <code>user_id</code> from the first response to get user details from the <a target="_blank" href="https://jsonplaceholder.typicode.com/users">GET users call</a>. To do this, follow the steps below:</p>
<ol>
<li><p>Right-click <strong>Thread Group → Add → Sampler → HTTP Request.</strong></p>
</li>
<li><p>Rename to <strong>Get User Details.</strong></p>
</li>
<li><p>Set:</p>
<ul>
<li><p><strong>Method:</strong> GET</p>
</li>
<li><p><strong>Path:</strong> <code>/users/${user_id}</code></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761050384264/dcc1c333-4e06-4dd9-8dca-9af823fedabd.png" alt="GET Users API" class="image--center mx-auto" width="1920" height="1015" loading="lazy"></p>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761050365937/736b9954-6d01-45a6-8c16-f6d2ceb60e10.png" alt="Test Execution in JMeter" class="image--center mx-auto" width="1920" height="1021" loading="lazy"></p>
<h3 id="heading-step-11-analyze-the-results">Step 11: Analyze the Results</h3>
<p>Once the test completes, open the <strong>Summary Report</strong>. You’ll see:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>Description</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Sample Count</strong></td><td>Number of total requests sent</td></tr>
<tr>
<td><strong>Average</strong></td><td>Mean response time per request</td></tr>
<tr>
<td><strong>Min/Max</strong></td><td>Fastest and slowest response times</td></tr>
<tr>
<td><strong>Error %</strong></td><td>Percentage of failed requests</td></tr>
<tr>
<td><strong>Throughput</strong></td><td>Requests handled per second</td></tr>
</tbody>
</table>
</div><p>If the error percentage is 0%, throughput is stable, and the average and maximum response times stay within your targets, your system handled this load well.</p>
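<p>To see where these numbers come from, here is a minimal TypeScript sketch that computes the same metrics from a list of samples. The <code>Sample</code> shape and the <code>summarize</code> function are hypothetical. The sketch assumes a non-empty list and measures throughput from the start of the first sample to the end of the last one.</p>
<pre><code class="lang-typescript">// Hypothetical sample shape; JMeter's own result files contain more fields.
interface Sample {
  startMs: number;   // when the request started
  elapsedMs: number; // response time
  success: boolean;  // false if the request or an assertion failed
}

function summarize(samples: Sample[]) {
  let total = 0;
  let min = Infinity;
  let max = 0;
  let errors = 0;
  let firstStart = Infinity;
  let lastEnd = 0;
  for (const s of samples) {
    total += s.elapsedMs;
    min = Math.min(min, s.elapsedMs);
    max = Math.max(max, s.elapsedMs);
    if (!s.success) errors += 1;
    firstStart = Math.min(firstStart, s.startMs);
    lastEnd = Math.max(lastEnd, s.startMs + s.elapsedMs);
  }
  const count = samples.length;
  const durationSeconds = (lastEnd - firstStart) / 1000;
  return {
    count,
    averageMs: total / count,
    minMs: min,
    maxMs: max,
    errorPercent: (errors / count) * 100,
    // Assumption: throughput = samples / time from first start to last finish.
    throughputPerSecond: count / durationSeconds,
  };
}

const report = summarize([
  { startMs: 0, elapsedMs: 100, success: true },
  { startMs: 500, elapsedMs: 300, success: true },
  { startMs: 1000, elapsedMs: 200, success: false },
  { startMs: 1800, elapsedMs: 200, success: true },
]);
console.log(report.averageMs, report.errorPercent, report.throughputPerSecond); // 200 25 2
</code></pre>
<p>Note that throughput depends on the whole test duration, including ramp-up and any timers, so it is not simply the inverse of the average response time.</p>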
<h3 id="heading-pro-tips">Pro Tips</h3>
<ul>
<li><p><strong>Parameterize everything.</strong> Use multiple CSVs for realistic test flows (users, IDs, tokens).</p>
</li>
<li><p><strong>Add timers</strong> (like <em>Constant Timer</em>) to simulate think time between user actions.</p>
</li>
<li><p><strong>Use Assertions wisely.</strong> Don’t add extra assertions; focus on key validations such as response time and API status code.</p>
</li>
<li><p><strong>Generate HTML reports using the command below:</strong></p>
<pre><code class="lang-plaintext">  jmeter -n -t test-plan.jmx -l results.jtl -e -o report
</code></pre>
</li>
</ul>
<h3 id="heading-example-folder-structure">Example Folder Structure:</h3>
<p>Follow the folder structure below for an organized test suite.</p>
<pre><code class="lang-plaintext">performance-test/
├── data.csv
├── test-plan.jmx
└── results/
    ├── summary.csv
    └── report.html
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Performance testing is an essential element of a production readiness checklist for any product. It helps you ensure that your product can handle the expected user load and scale gracefully.</p>
<p>This guide is your first step towards writing end-to-end performance test cases and bridging the gap between being a functional test engineer and a full-stack QA Engineer who understands both quality and scalability.</p>
<p>I hope you found this tutorial helpful. If you want to stay connected or learn more about performance testing, follow me on <a target="_blank" href="https://www.linkedin.com/in/mah-noorqa/">LinkedIn</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Test JavaScript Apps: From Unit Tests to AI-Augmented QA ]]>
                </title>
                <description>
                    <![CDATA[ As a software engineer, you should always be open to the challenges this field brings. Two months ago, my project manager assigned me a task: write test cases for an API. I was super excited because it meant I got to learn something new beyond just c... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-test-javascript-apps-from-unit-tests-to-ai-augmented-qa/</link>
                <guid isPermaLink="false">68e68c3655c4d79b6db4f4c4</guid>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ajay Yadav ]]>
                </dc:creator>
                <pubDate>Wed, 08 Oct 2025 16:07:18 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759939599135/507c5e9a-954b-497b-b3b8-c8d89b2d1a03.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>As a software engineer, you should always be open to the challenges this field brings. Two months ago, my project manager assigned me a task: write test cases for an API. I was super excited because it meant I got to learn something new beyond just coding features.</p>
<p>Now, if you’re thinking “writing test cases isn’t my job as a frontend or backend developer”, then you’re missing the point. That mindset holds you back.</p>
<p>At the very least, every engineer should understand Unit Testing and Integration Testing. Writing test cases isn’t rocket science: it reads like plain English and feels very similar to writing JavaScript code.</p>
<p>That said, if you’ve ever tried setting up testing in a JavaScript application, you probably know how complicated and frustrating it can get.</p>
<p>The JavaScript ecosystem is massive, with endless libraries and frameworks. Things shift constantly, new tools replace old ones, and community standards evolve almost overnight. That’s exactly why I decided to write this article.</p>
<p>In it, we’ll explore a modern approach to JavaScript testing, covering practical patterns, workflows, and even how AI-assisted tools are changing the game.</p>
<p>Let’s dive in.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-evolution-of-testing">The Evolution of Testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-core-layers-of-testing">The Core Layers of Testing</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-unit-testing">Unit testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-integration-testing">Integration testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-end-to-end-testing">End-to-End testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ai-augmented-testing">AI-Augmented testing</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-future-of-javascript-testing">Future of JavaScript Testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-before-we-end">Before We End</a></p>
</li>
</ul>
<h2 id="heading-the-evolution-of-testing">The Evolution of Testing</h2>
<p>Software testing has been around for as long as software itself. According to IBM (2016), testing started right alongside the very first programs. After World War II, three computer scientists wrote what’s considered to be the <a target="_blank" href="https://en.wikipedia.org/wiki/Manchester_Baby">first piece of software</a>.</p>
<p>It ran on June 21, 1948, at the University of Manchester in England, performing mathematical calculations with basic machine code instructions.</p>
<p>Since then, testing methods and principles have continuously evolved. As software became more complex and development cycles got faster, the need for reliable and systematic testing grew stronger.</p>
<p>In the early days, the concept of the <strong>Testing Pyramid</strong> became popular. At the base, you had unit tests, in the middle integration tests, and at the very top a thin layer of end-to-end (E2E) tests. This approach worked well for simpler applications.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759395722389/0067bc6e-f038-40a6-905c-61406f41e430.png" alt="Image of the testing pyramid showing the different layers" class="image--center mx-auto" width="994" height="618" loading="lazy"></p>
<p>But as apps grew more dynamic and interconnected, the pyramid approach began to show its limits. That’s where the <strong>Testing Trophy model</strong> came in. Instead of overloading with unit tests, it puts greater emphasis on integration testing while still keeping E2E tests and unit tests in balance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759395841713/b92ea402-5002-4c48-be7c-aee6f1dfacfd.png" alt="Diagram of a &quot;Testing Trophy&quot; pyramid. Top to bottom: &quot;End-to-End Tests&quot; (Slow, Few, Expensive), &quot;Integration Tests&quot; (Moderate Speed, Fewer, Moderate Cost), &quot;Unit Tests&quot; (Fast, Numerous, Cheap), &quot;Static Analysis&quot; (Instant, Numerous, Cheapest). Left axis: Confidence increases up, Speed decreases down. Right axis: Cost increases up, Frequency decreases down." class="image--center mx-auto" width="993" height="676" loading="lazy"></p>
<p>Now, with the rise of AI in QA, testing has entered a new phase. AI-driven tools don’t just run tests: they help generate, maintain, and even self-heal them. This shift points toward testing workflows designed to handle the complexity of modern software in 2025 and beyond.</p>
<h2 id="heading-the-core-layers-of-testing">The Core Layers of Testing</h2>
<p>Testing is not just about finding bugs, but also ensuring reliability, scalability, and user satisfaction. Every testing strategy should cover four main layers:</p>
<h3 id="heading-unit-testing">Unit Testing</h3>
<p>Unit testing is a method where you test individual components or units of software in isolation to make sure they work as expected. A unit can be a simple function, a React component, or even a utility module.</p>
<p>When building JavaScript apps, we usually create separate modules or components that later get combined. If any one of those small pieces is broken, the entire application can fail. That’s why unit tests are essential: they catch problems early and ensure reliability before integration.</p>
<p>In the JavaScript ecosystem, there are several tools you can use for writing unit tests:</p>
<ul>
<li><p><a target="_blank" href="https://vitest.dev/"><strong>Vitest</strong></a> – a modern, fast, and developer-friendly testing framework built to work seamlessly with Vite projects.</p>
</li>
<li><p><a target="_blank" href="https://jestjs.io/"><strong>Jest</strong></a> – one of the most widely used testing frameworks, great for React apps among others.</p>
</li>
</ul>
<p>For this section, we’ll focus on <strong>Vitest</strong>, because it’s lightweight, super-fast, and feels very natural for modern frontend development. Let’s write a test case for a small module.</p>
<p>Imagine we have a simple utility function that adds two numbers:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// sum.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> sum = <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">a: <span class="hljs-built_in">number</span>, b: <span class="hljs-built_in">number</span></span>) </span>{
  <span class="hljs-keyword">return</span> a + b;
};
</code></pre>
<p>Every test typically has 3 parts:</p>
<ol>
<li><p>A description (string).</p>
</li>
<li><p>The code execution.</p>
</li>
<li><p>The assertion.</p>
</li>
</ol>
<p>Now, let’s write a unit test for the above function using Vitest.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// sum.test.ts</span>
<span class="hljs-keyword">import</span> { describe, expect, it } <span class="hljs-keyword">from</span> <span class="hljs-string">"vitest"</span>;
<span class="hljs-keyword">import</span> { sum } <span class="hljs-keyword">from</span> <span class="hljs-string">"./sum"</span>;

describe(<span class="hljs-string">"sum function"</span>, <span class="hljs-function">() =&gt;</span> {
  it(<span class="hljs-string">"should return the sum of two numbers"</span>, <span class="hljs-function">() =&gt;</span> { <span class="hljs-comment">// 1. description</span>
    <span class="hljs-keyword">const</span> result = sum(<span class="hljs-number">2</span>, <span class="hljs-number">3</span>); <span class="hljs-comment">// 2. code execution</span>
    expect(result).toBe(<span class="hljs-number">5</span>);   <span class="hljs-comment">// 3. assertion</span>
  });

  <span class="hljs-comment">// ... other test cases</span>
});

<span class="hljs-comment">// ... other describe blocks</span>
</code></pre>
<p>Breaking it down:</p>
<ul>
<li><p><code>describe</code> groups related test cases together. Here, we group everything about the <code>sum</code> function.</p>
</li>
<li><p><code>it</code> (or <code>test</code>) defines a single test case. In this example: “should return the sum of two numbers.”</p>
</li>
<li><p><code>expect</code> makes the actual assertion. It checks if the result from <code>sum(2,3)</code> equals <code>5</code>.</p>
</li>
</ul>
<p>When you run this test, Vitest will quickly execute it and show you whether the function passed or failed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759399251713/3c051bbb-4813-40ed-8656-d1bd2730dc38.png" alt="Command line interface showing test results using &quot;vitest&quot; in a development environment. Two test files, &quot;sum.test.ts&quot; and &quot;App.test.tsx&quot;, have passed successfully. Total test duration was 828ms." class="image--center mx-auto" width="1020" height="307" loading="lazy"></p>
<p>If the function works, you’ll see <code>1 passed</code> in green. If it fails, the output will be red with details about what went wrong.</p>
<h3 id="heading-integration-testing">Integration Testing</h3>
<p>Now that we’ve covered unit testing, let’s move one step up to integration testing. While unit tests focus on testing individual pieces in isolation, integration tests ensure those pieces work together as expected.</p>
<p>Think of it like assembling Lego blocks: each piece might work fine on its own, but when you connect them, something might not fit right. Integration testing helps you catch those issues early.</p>
<p>In simple terms, integration testing checks how components and modules interact with each other.</p>
<p>Let’s say we have a React component that fetches user data from an API and displays it on the screen.<br>We’re no longer just testing one function – we’re testing how the component behaves when it calls an API, manages loading states, and renders data dynamically.</p>
<p>Here’s a simple example:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { useEffect, useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;

<span class="hljs-keyword">const</span> User = <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> [users, setUsers] = useState&lt;{ name: <span class="hljs-built_in">string</span>; email: <span class="hljs-built_in">string</span> }[]&gt;([]);
  <span class="hljs-keyword">const</span> [loading, setLoading] = useState(<span class="hljs-literal">false</span>);

  <span class="hljs-keyword">const</span> fetchUsers = <span class="hljs-keyword">async</span> () =&gt; {
    setLoading(<span class="hljs-literal">true</span>);
    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"https://api.escuelajs.co/api/v1/users"</span>);
      <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> res.json();
      setUsers(data);
    } <span class="hljs-keyword">catch</span> (e) {
      <span class="hljs-built_in">console</span>.log(e);
    } <span class="hljs-keyword">finally</span> {
      setLoading(<span class="hljs-literal">false</span>);
    }
  };

  useEffect(<span class="hljs-function">() =&gt;</span> {
    fetchUsers();
  }, []);

  <span class="hljs-keyword">return</span> (
    &lt;&gt;
      {loading ? (
        &lt;h2&gt;Loading...&lt;/h2&gt;
      ) : (
        &lt;div&gt;
          {users.map(<span class="hljs-function">(<span class="hljs-params">user, index</span>) =&gt;</span> (
            &lt;p key={index}&gt;
              {user.name}: {user.email}
            &lt;/p&gt;
          ))}
        &lt;/div&gt;
      )}
    &lt;/&gt;
  );
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> User;
</code></pre>
<p>This component does a few things:</p>
<ul>
<li><p>Calls an external API when the component mounts.</p>
</li>
<li><p>Sets a loading state while fetching data.</p>
</li>
<li><p>Renders the fetched users on the screen once the data is ready.</p>
</li>
</ul>
<p>Now, our job is to test the complete flow, from the API call to the rendered UI, using Vitest and <a target="_blank" href="https://testing-library.com/">React Testing Library</a>.</p>
<p>Here’s what the test file looks like:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { render, screen, waitFor } <span class="hljs-keyword">from</span> <span class="hljs-string">"@testing-library/react"</span>;
<span class="hljs-keyword">import</span> User <span class="hljs-keyword">from</span> <span class="hljs-string">"../components/User"</span>;
<span class="hljs-keyword">import</span> { describe, test, expect } <span class="hljs-keyword">from</span> <span class="hljs-string">"vitest"</span>;

describe(<span class="hljs-string">"User Component"</span>, <span class="hljs-function">() =&gt;</span> {
  test(<span class="hljs-string">"fetches and displays users successfully"</span>, <span class="hljs-keyword">async</span> () =&gt; {
    render(&lt;User /&gt;);

    <span class="hljs-comment">// 1. Initially shows loading</span>
    expect(screen.getByText(<span class="hljs-string">"Loading..."</span>)).toBeInTheDocument();

    <span class="hljs-comment">// 2. Wait for API response and UI update</span>
    <span class="hljs-keyword">await</span> waitFor(<span class="hljs-function">() =&gt;</span> {
      expect(
        screen.getByText(<span class="hljs-string">"Ajay Yadav: ajay.yadav@example.com"</span>)
      ).toBeInTheDocument();
      expect(
        screen.getByText(<span class="hljs-string">"Jane Smith: jane.smith@example.com"</span>)
      ).toBeInTheDocument();
    });

    <span class="hljs-comment">// 3. Loading should disappear</span>
    expect(screen.queryByText(<span class="hljs-string">"Loading..."</span>)).not.toBeInTheDocument();
  });
});
</code></pre>
<p>This test looks simple, but it covers the entire flow of our component. Let’s understand it step-by-step:</p>
<ul>
<li><p><strong>Render the component:</strong> Render the <code>&lt;User /&gt;</code> component inside the test environment.</p>
</li>
<li><p><strong>Check the loading state:</strong> As soon as the component mounts, the <strong>“Loading…”</strong> text should appear, indicating that data is being fetched.</p>
</li>
<li><p><strong>Wait for the data to load:</strong> Since the API call is asynchronous, use <code>waitFor()</code> to wait until the users are fetched and displayed.</p>
</li>
<li><p><strong>Verify the data:</strong> Once the API resolves, check if the user names and emails are correctly rendered on the screen.</p>
</li>
<li><p><strong>Confirm loading disappears:</strong> Finally, ensure that the “Loading…” text is removed once the data is displayed, confirming a proper state update.</p>
</li>
</ul>
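<p>You can also test how your component behaves when the API fails. For example, you can mock the <code>fetch()</code> call to reject and then verify that an error message appears on the screen. Note that the <code>User</code> component above only logs the error to the console, so you’d first need to add an error state that renders a message.</p>
<p>Vitest and React Testing Library make it easy to mock responses and simulate both success and failure cases, helping you ensure that your app handles real-world scenarios gracefully. Mocking also keeps the success test deterministic, because otherwise it depends on the live API returning those exact users.</p>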
<h3 id="heading-end-to-end-testing">End-to-End Testing</h3>
<p>Now that we’ve seen how integration testing ensures that different components work together, let’s move to the third layer, End-to-End (E2E) testing.</p>
<p>While unit and integration tests run in isolated or simulated environments, E2E tests mimic how real users interact with your app.</p>
<p>They open a browser and perform actions like clicking buttons, typing in fields, and verifying what appears on the screen, exactly like a real person would.</p>
<p>Think of E2E testing as putting your entire app on stage and watching if it performs flawlessly in front of the audience. In simple words, E2E testing verifies the full user journey from start to finish.</p>
<p>Let’s take a common example, a login flow. As a developer, you’ve probably built dozens of login forms, but how do you know if they truly work under real conditions? That’s where E2E testing comes in.</p>
<p>Tools like <a target="_blank" href="https://playwright.dev/">Playwright</a> and <a target="_blank" href="https://www.cypress.io/">Cypress</a> are popular, capable choices for E2E testing.</p>
<p>We can simulate a real browser, fill out the login form, submit it, and confirm that the user is redirected to the dashboard. Here’s what a simple E2E test looks like using Playwright:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// tests/login.e2e.ts</span>
<span class="hljs-keyword">import</span> { test, expect } <span class="hljs-keyword">from</span> <span class="hljs-string">"@playwright/test"</span>;

test(<span class="hljs-string">"should login successfully"</span>, <span class="hljs-keyword">async</span> ({ page }) =&gt; {
  <span class="hljs-comment">// 1. Visit the login page</span>
  <span class="hljs-keyword">await</span> page.goto(<span class="hljs-string">"http://localhost:3000/login"</span>);

  <span class="hljs-comment">// 2. Fill in the form</span>
  <span class="hljs-keyword">await</span> page.fill(<span class="hljs-string">'input[name="email"]'</span>, <span class="hljs-string">"user@example.com"</span>);
  <span class="hljs-keyword">await</span> page.fill(<span class="hljs-string">'input[name="password"]'</span>, <span class="hljs-string">"password123"</span>);

  <span class="hljs-comment">// 3. Click login button</span>
  <span class="hljs-keyword">await</span> page.click(<span class="hljs-string">'button[type="submit"]'</span>);

  <span class="hljs-comment">// 4. Wait for navigation and verify success message or dashboard</span>
  <span class="hljs-keyword">await</span> expect(page).toHaveURL(<span class="hljs-string">"http://localhost:3000/dashboard"</span>);
  <span class="hljs-keyword">await</span> expect(page.getByText(<span class="hljs-string">"Welcome back!"</span>)).toBeVisible();
});
</code></pre>
<p>Let’s understand what’s happening here step-by-step:</p>
<ul>
<li><p><strong>Visit the page:</strong> The test opens your web app in a real browser. It navigates to <code>http://localhost:3000/login</code>.</p>
</li>
<li><p><strong>Simulate user input:</strong> Playwright fills in the email and password fields, just like a real user typing into the form.</p>
</li>
<li><p><strong>Perform actions:</strong> It clicks the login button, triggering all the same logic your frontend and backend would normally handle.</p>
</li>
<li><p><strong>Verify the outcome:</strong> Once the user logs in, check if the URL changes to <code>/dashboard</code> and whether a welcome message appears on the screen.</p>
</li>
</ul>
<p>That’s it: you just automated your first user journey from login to dashboard. Playwright and Cypress achieve the same goal, ensuring your app behaves correctly in a real browser, not just in isolated tests.</p>
<h3 id="heading-ai-augmented-testing">AI-Augmented Testing</h3>
<p>As testing evolves, a new layer has emerged: <strong>AI-Augmented QA</strong>. This isn’t just another tool in the developer’s toolkit. It’s a complete transformation in how software quality is managed.</p>
<p>Traditionally, testing has been a manual process. Engineers wrote, maintained, and updated test cases whenever the product changed. But with AI entering the scene, that manual burden is decreasing.</p>
<p>AI models can now analyze your codebase, understand logic, and generate relevant test cases almost instantly, covering edge cases you might never think of. Tools like <a target="_blank" href="https://github.com/features/copilot">GitHub Copilot</a> and <a target="_blank" href="https://www.codium.ai/qodo/">CodiumAI</a> already assist in generating smart test suites, while continuously learning from your coding style and past patterns.</p>
<p>Beyond code suggestions, complete AI QA platforms are changing automation itself. For example, an AI QA agent like <a target="_blank" href="https://bug0.com/">Bug0</a> can adjust to UI changes automatically. If a button label or DOM structure changes, its self-healing tests find elements visually instead of depending on fixed selectors.</p>
<p>It also produces real-time test reports with detailed logs and video recordings, helping developers pinpoint UI or data changes causing failures.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759925194041/7b3a5b82-6313-4ce8-8ae8-6d80dafbc5be.png" alt="A screenshot of a code editor displaying a test script, including code snippets for page navigation and URL checks. Below the code, there is a section labeled &quot;Videos&quot; with a video player showing" class="image--center mx-auto" width="800" height="921" loading="lazy"></p>
<p>With CI/CD integrations like GitHub or GitLab, it can automatically start and validate test runs for every pull request, updating PR checks just like a human QA engineer would.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759924826000/ff55cf75-0b8d-4d01-9f12-2f1920be6862.png" alt="A screenshot of a GitHub interface showing a failed Vercel deployment, a skipped public API test, and six successful checks. An arrow points to the successful &quot;Bug0 QA Agent&quot; test. Notifications also indicate that a review is required, and the branch is out-of-date with the base branch." class="image--center mx-auto" width="800" height="537" loading="lazy"></p>
<p>While AI-assisted testing is powerful, it’s not a full replacement for human judgment. Developers still play a vital role in the following ways:</p>
<ul>
<li><p>Deciding what truly matters for business logic and user experience. AI can generate test cases, but humans must set the priorities.</p>
</li>
<li><p>Reviewing AI-generated tests to ensure they are relevant and to avoid false positives.</p>
</li>
<li><p>Interpreting failures in context, that is, understanding whether a test failure indicates a real bug or an expected change.</p>
</li>
<li><p>Maintaining ethical and data-safe workflows by not exposing sensitive data when using cloud-based AI tools.</p>
</li>
</ul>
<p>When used responsibly, AI becomes a testing partner, automating the tedious tasks while leaving creative problem-solving, decision-making, and domain understanding to developers.</p>
<p>This shift marks the beginning of intelligent, autonomous QA. AI isn’t just automating repetitive testing; it’s turning the process into a continuous, adaptive feedback loop that can help predict failures and, in some cases, resolve them automatically.</p>
<p>In the coming years, expect testing to evolve into a collaborative process between human engineers and AI copilots, ensuring every release is not just faster, but smarter and more reliable than ever before.</p>
<h2 id="heading-future-of-javascript-testing">Future of JavaScript Testing</h2>
<p>JavaScript testing is changing faster than ever. A few years ago, developers had to deal with tons of testing libraries and confusing setups. Now, things are becoming much more unified, smarter, and easier to work with.</p>
<p>In the future, testing will move from being reactive to proactive. That means instead of catching bugs after they happen, tools will be smart enough to predict and prevent them before they appear.</p>
<p>With AI-powered test generation and real-time monitoring, every commit you make could be automatically checked for reliability and performance without you even running a command.</p>
<p>Frameworks like <code>Vitest</code>, <code>Playwright</code>, and <code>React Testing Library</code> will still be the core tools, but the real progress will come from how they integrate and learn.</p>
<p>We’ll also see tighter CI/CD integrations, where pipelines can automatically adjust based on your test coverage and code risk. Testing won’t feel like an extra step anymore. It’ll become a natural part of development, powered by both human logic and machine intelligence.</p>
<p>In short, the future of JavaScript testing is about speed, intelligence, and automation: a world where developers spend more time building and less time debugging.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Testing isn’t just about preventing bugs. It’s about building confidence: confidence that your code works, your features scale, and your users have a seamless experience.</p>
<p>Whether it’s unit tests ensuring logic, integration tests validating flow, E2E tests simulating real behavior, or AI-enhanced automation managing it all, testing is the silent force that makes great software possible.</p>
<p>As a developer, understanding how testing fits into your workflow is no longer optional. Rather, it’s a skill that sets you apart. The more you test, the better you code and the faster you ship with peace of mind.</p>
<p>So, the next time someone says <strong>writing tests isn’t your job</strong>, you’ll know the truth: Testing isn’t extra work. Instead, it’s part of writing better, more reliable software.</p>
<h2 id="heading-before-we-end"><strong>Before We End</strong></h2>
<p>I hope you found this article insightful. I’m Ajay Yadav, a software developer and content creator.</p>
<p>You can connect with me on:</p>
<ul>
<li><p><a target="_blank" href="https://x.com/atechajay">Twitter/X</a> and <a target="_blank" href="https://www.linkedin.com/in/atechajay/">LinkedIn</a>, where I share insights to help you improve 0.01% each day.</p>
</li>
<li><p>Check out my <a target="_blank" href="https://github.com/ATechAjay">GitHub</a> for more projects.</p>
</li>
<li><p>I also run a <a target="_blank" href="http://youtube.com/@atechajay">YouTube Channel</a> where I share content about careers, software engineering, and technical writing.</p>
</li>
</ul>
<p>See you in the next article — until then, keep learning!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use pytest: A Simple Guide to Testing in Python ]]>
                </title>
                <description>
                    <![CDATA[ With the recent advancements in AI, tools like ChatGPT have made the development process faster and more accessible. Developers can now write code and build web apps with some well-articulated prompts and careful code reviews. While this brings an in... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-pytest-a-guide-to-testing-in-python/</link>
                <guid isPermaLink="false">686d82b56332ba136ecc139e</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ pytest ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TDD (Test-driven development) ]]>
                    </category>
                
                    <category>
                        <![CDATA[ unit testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ guide ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Olowo Jude ]]>
                </dc:creator>
                <pubDate>Tue, 08 Jul 2025 20:42:29 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752007334998/e196493e-f3e0-4e63-b6eb-ce66c5481d9c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>With the recent advancements in AI, tools like ChatGPT have made the development process faster and more accessible. Developers can now write code and build web apps with some well-articulated prompts and careful code reviews.</p>
<p>While this brings an increase in productivity, there's a growing downside. AI-generated code is prone to errors, unexpected bugs, or poor integration with the rest of your code.</p>
<p>Because of these risks, it’s more important than ever to establish robust testing practices to make sure your code is high quality and properly functioning. Various testing tools are available to help solve these challenges, and pytest stands out in the Python ecosystem for its simplicity, flexibility, and powerful features.</p>
<p>In this article, we'll explore the following topics:</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-use-pytest">Why Use pytest?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-write-your-first-tests-with-pytest">How to Write Your First Tests with pytest</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-run-pytest-tests">How to Run pytest Tests</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-interpret-pytest-results">How to Interpret pytest Results</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-handle-exceptions-in-pytest">How to Handle Exceptions in pytest</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-advanced-pytest-features">Advanced pytest Features</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-pytest-markers">1. pytest Markers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-pytest-fixtures">2. pytest Fixtures</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-parametrization">3. Parametrization</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-4-pytest-plugins">4. pytest Plugins</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<p>By the end of this article, you will have a comprehensive knowledge of pytest and be able to use it in your Python development process.</p>
<h2 id="heading-pre-requisites"><strong>Pre-requisites</strong></h2>
<ul>
<li><p>Python installed on your machine</p>
</li>
<li><p>An understanding of the Python programming language</p>
</li>
</ul>
<h2 id="heading-why-use-pytest">Why Use pytest?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751221734601/f5d6093a-37d2-4d49-85f2-c1a41a98ab67.png" alt="An image of pytest logo." class="image--center mx-auto" width="742" height="677" loading="lazy"></p>
<p>pytest is a popular testing framework for Python that makes it easy to write and run tests. Unlike unittest and other Python testing frameworks, pytest’s simple syntax allows developers to write tests directly as functions or within classes. This lets you write clean, readable code without complexities.</p>
<p>pytest also supports popular Python frameworks like Flask, Django, and more. Combined with other rich features, pytest equips you with the tools you need to ship reliable software in today’s AI-driven era.</p>
<p>Key features of pytest that make it a preferred testing tool include:</p>
<ul>
<li><p><strong>Flexibility:</strong> it provides flexibility in test structure by supporting tests for functions, classes, and modules.</p>
</li>
<li><p><strong>Detailed test output:</strong> it provides a detailed and readable test output, making it easy to understand test failures and errors.</p>
</li>
<li><p><strong>Automatic test discovery:</strong> it automatically discovers tests by looking for files named <code>test_*.py</code> or <code>*_test.py</code>. This eliminates the need to specify test files manually.</p>
</li>
<li><p><strong>Parameterization:</strong> it supports parameterized tests, which allow you to run a single test function with multiple sets of inputs.</p>
</li>
<li><p><strong>Fixtures:</strong> its fixtures let you define reusable setup and teardown logic, which helps prevent code repetition. This enables you to set up baseline conditions for your tests and clean them up after each test.</p>
</li>
<li><p><strong>Plugins and extensions:</strong> it has a rich ecosystem of plugins and extensions that add extra functionality, such as detailed test reporting and integration with other tools and Python frameworks like Django and Flask.</p>
</li>
<li><p><strong>Compatibility:</strong> it can run tests written for other testing frameworks like <code>unittest</code>, allowing you to migrate existing test suites gradually and run them seamlessly.</p>
</li>
</ul>
<h2 id="heading-how-to-write-your-first-tests-with-pytest">How to Write Your First Tests with pytest</h2>
<p>This section will guide you through writing your first set of tests using the pytest framework.</p>
<p>pytest is a Python package, and you’ll need to install it before using it. You can do that with the following command:</p>
<pre><code class="lang-python">pip install pytest
</code></pre>
<p><strong>NOTE:</strong> Following Python's best practices, it’s recommended you install pytest within a virtual environment. <a target="_blank" href="https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/">Here's a guide</a> to help you set it up.</p>
<p>Next, create a Python file where you will write your tests and import pytest into it using:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pytest
</code></pre>
<p>pytest supports two basic ways of writing tests:</p>
<ul>
<li><p><strong>The function-based method:</strong> This method is straightforward for writing tests because you write the tests in individual functions.</p>
<p>  <strong>Note:</strong> Each function name must be prefixed with the word <code>test_</code> for pytest to discover and run these tests automatically.</p>
<p>  Here’s an example of a function-based test:</p>
<pre><code class="lang-python">  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_addition</span>():</span>
      <span class="hljs-keyword">assert</span> <span class="hljs-number">1</span> + <span class="hljs-number">1</span> == <span class="hljs-number">2</span>
</code></pre>
<p>  <strong>Note:</strong> The <code>assert</code> in the code above is Python’s built-in <code>assert</code> statement. It’s more convenient than specific methods like <code>assertEqual</code> and <code>assertTrue</code>, which are common with unittest. Another advantage is that pytest rewrites <code>assert</code> statements so that a failing assertion reports the values involved, which gives you more detailed error messages.</p>
</li>
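<li><p><strong>Class-based method:</strong> This method is similar to the way of writing tests in <code>unittest</code>, except that your test class does not need to inherit from a base class such as <code>unittest.TestCase</code>. By default, the class name must start with <code>Test</code> for pytest to collect it. An example is shown below:</p>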
<pre><code class="lang-python">  <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TestMathOperations</span>:</span>
      <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_addition</span>(<span class="hljs-params">self</span>):</span>
          <span class="hljs-keyword">assert</span> <span class="hljs-number">1</span> + <span class="hljs-number">1</span> == <span class="hljs-number">2</span>
</code></pre>
<p>  This method of writing tests in pytest is useful when you want to group related tests together.</p>
</li>
</ul>
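<p>As a slightly fuller sketch of both styles, the example below tests a small helper called <code>apply_discount</code>. The helper and the file name <code>test_discount.py</code> are hypothetical and exist only for illustration. They are not part of pytest.</p>

```python
# test_discount.py (hypothetical file name; the test_ prefix lets pytest discover it)


def apply_discount(price, percent):
    """Hypothetical helper under test: reduce a price by a percentage."""
    return round(price * (1 - percent / 100), 2)


# Function-based style: a plain function whose name starts with test_
def test_apply_discount_reduces_price():
    assert apply_discount(200, 25) == 150.0


# Class-based style: related tests grouped in a class whose name starts with Test
class TestApplyDiscount:
    def test_zero_percent_keeps_price(self):
        assert apply_discount(80, 0) == 80

    def test_full_discount_is_free(self):
        assert apply_discount(80, 100) == 0
```

<p>In a real project, the helper would usually live in its own module and be imported into the test file. Keeping both in one file here just keeps the sketch self-contained.</p>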
<h2 id="heading-how-to-run-pytest-tests">How to Run pytest Tests</h2>
<p>Running pytest differs slightly from the normal convention of running regular Python scripts.</p>
<p>The general method of running pytest tests is by running the <code>pytest</code> command in your terminal. pytest will automatically look for and run all files of the form <code>test_*.py</code> or <code>*_test.py</code> in the current directory and subdirectories. But while this may be a great way to run tests, pytest offers more flexibility beyond this general method of running tests.</p>
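<p>To get a feel for which file names those two patterns cover, here is a small illustration using Python’s standard <code>fnmatch</code> module. This is not pytest’s actual discovery code, and the helper name <code>looks_like_test_file</code> is made up for this sketch. It only mimics the default naming patterns.</p>

```python
from fnmatch import fnmatchcase

# The two default file name patterns described above
DEFAULT_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(filename):
    """Return True if filename matches one of the default patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_PATTERNS)


for name in ["test_example.py", "example_test.py", "example.py", "tests.py"]:
    print(name, looks_like_test_file(name))
# test_example.py True
# example_test.py True
# example.py False
# tests.py False
```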
<p>Depending on your needs, you can narrow down which tests pytest runs:</p>
<ol>
<li><p><strong>To run a specific test file</strong>: To run tests in a specific file, use the <code>pytest</code> command followed by the file name. For example: <code>pytest test_example.py</code>.</p>
</li>
<li><p><strong>To run tests in a directory:</strong> Let’s say you have a directory named Tests that contains some test files. To run all the tests in that directory, use the <code>pytest</code> command followed by the directory and a forward slash. For example: <code>pytest Tests/</code>.</p>
</li>
<li><p><strong>To run tests using specific keywords:</strong> To run tests based on a certain keyword, use the command <code>pytest -k "keyword"</code>. pytest will run the tests whose function, class, or file names match that keyword in the current directory and its subdirectories. To restrict the keyword match to a specific file, add the file name to the command. For example: <code>pytest test_example.py -k "keyword"</code>.</p>
</li>
<li><p><strong>To run a specific test within a test file:</strong> To run only a specific test inside a test file, use the command <code>pytest test_example.py::test_addition</code>. This will run only the <code>test_addition</code> test function within the <code>test_example.py</code> module.</p>
</li>
<li><p><strong>To run all test methods in a specific class</strong>: To run all the tests within a specific class, use <code>pytest test_example.py::TestClass</code>. This command would run all the test methods inside the <code>TestClass</code> class in the <code>test_example.py</code> module.</p>
</li>
<li><p><strong>To run a specific test method inside a specific class:</strong> To run a specific test inside a specific class, use <code>pytest test_example.py::TestClass::test_addition</code>. This command would run the specific <code>test_addition</code> method within the <code>TestClass</code> class in the <code>test_example.py</code> module.</p>
</li>
</ol>
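<p>To make these selection commands concrete, here is a sketch of what a <code>test_example.py</code> file could contain. The names <code>test_addition</code> and <code>TestClass</code> mirror the ones used in the commands above. The remaining tests and all the test bodies are assumptions added for illustration. The comments note which command would select each test.</p>

```python
# test_example.py (illustrative contents)


def test_addition():
    # Selected by: pytest test_example.py::test_addition
    # Also matched by: pytest -k "addition"
    assert 1 + 1 == 2


def test_subtraction():
    # Runs with: pytest test_example.py
    # Not matched by: pytest -k "addition"
    assert 5 - 3 == 2


class TestClass:
    # Every method below runs with: pytest test_example.py::TestClass

    def test_addition(self):
        # Selected by: pytest test_example.py::TestClass::test_addition
        # Also matched by: pytest -k "addition"
        assert 2 + 2 == 4

    def test_multiplication(self):
        assert 2 * 3 == 6
```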
<h2 id="heading-how-to-interpret-pytest-results">How to Interpret pytest Results</h2>
<p>One major advantage pytest has over other Python testing frameworks is the rich output it provides, which gives very detailed information about the status of your tests.</p>
<p>Let’s use a basic test to understand how to interpret pytest’s output:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pytest

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_addition</span>():</span>
    <span class="hljs-keyword">assert</span> <span class="hljs-number">1</span> + <span class="hljs-number">1</span> == <span class="hljs-number">3</span>
</code></pre>
<p>Run this test, and we get an output similar to the one below:</p>
<pre><code class="lang-python">============================== test session starts ====================================
platform win32 -- Python <span class="hljs-number">3.10</span><span class="hljs-number">.5</span>, pytest<span class="hljs-number">-8.4</span><span class="hljs-number">.1</span>, pluggy<span class="hljs-number">-1.6</span><span class="hljs-number">.0</span>
rootdir: C:\Users\hp\Desktop\TDD pytest
collected <span class="hljs-number">1</span> item

test_example.py F                                                                 [<span class="hljs-number">100</span>%]

===================================== FAILURES =========================================
____________________________________ test_addition _____________________________________

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_addition</span>():</span>
&gt;       <span class="hljs-keyword">assert</span> <span class="hljs-number">1</span> + <span class="hljs-number">1</span> == <span class="hljs-number">3</span>
E       <span class="hljs-keyword">assert</span> (<span class="hljs-number">1</span> + <span class="hljs-number">1</span>) == <span class="hljs-number">3</span>

test_example.py:<span class="hljs-number">4</span>: AssertionError
============================== short test summary info =================================
FAILED test_example.py::test_addition - <span class="hljs-keyword">assert</span> (<span class="hljs-number">1</span> + <span class="hljs-number">1</span>) == <span class="hljs-number">3</span>
================================== <span class="hljs-number">1</span> failed <span class="hljs-keyword">in</span> <span class="hljs-number">0.13</span>s ==================================
</code></pre>
<p>The above output is divided into several sections. Here’s a breakdown of what each section means:</p>
<ol>
<li><p>Test session information:</p>
<pre><code class="lang-python"> =============================== test session starts ===============================
 platform win32 -- Python <span class="hljs-number">3.10</span><span class="hljs-number">.5</span>, pytest<span class="hljs-number">-8.4</span><span class="hljs-number">.1</span>, pluggy<span class="hljs-number">-1.6</span><span class="hljs-number">.0</span>
 rootdir: C:\Users\hp\Desktop\TDD pytest
 collected <span class="hljs-number">1</span> item
</code></pre>
<ul>
<li><p>This section displays a summary of the test environment. It begins with a line marker that indicates the beginning of the test session.</p>
</li>
<li><p>Below the marker, pytest displays information about the operating system, along with the installed versions of Python, pytest and pluggy. (Pluggy is a pytest dependency used to manage plugins.)</p>
</li>
<li><p>The next line indicates the root directory where the test is being run.</p>
</li>
<li><p>The last line in this section displays the number of tests found in this directory.</p>
</li>
</ul>
</li>
<li><p>Test status:</p>
<pre><code class="lang-python"> test_example.py F                                                              [<span class="hljs-number">100</span>%]

 ================================== FAILURES =========================================
 ________________________________ test_addition ______________________________________

     <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_addition</span>():</span>
 &gt;       <span class="hljs-keyword">assert</span> <span class="hljs-number">1</span> + <span class="hljs-number">1</span> == <span class="hljs-number">3</span>
 E       <span class="hljs-keyword">assert</span> (<span class="hljs-number">1</span> + <span class="hljs-number">1</span>) == <span class="hljs-number">3</span>

 test_example.py:<span class="hljs-number">4</span>: AssertionError
</code></pre>
<ul>
<li><p>This section displays information about the status of our tests.</p>
</li>
<li><p>The first line in this section specifies the test file which is being run, followed by the status (F in this case, which indicates a test failure).</p>
</li>
<li><p>The next set of lines gives specific information about the failed tests. This includes the function where the failure occurred (<code>test_addition</code>), and the exact line of code responsible for the error.</p>
</li>
<li><p>The last line gives a concise summary of this section. It indicates that the error occurred in <code>test_example.py</code> on line <code>4</code> and it was an <code>AssertionError</code>.</p>
</li>
</ul>
</li>
<li><p>Test summary:</p>
<pre><code class="lang-python"> ============================= short test summary info =============================
 FAILED test_example.py::test_addition - <span class="hljs-keyword">assert</span> (<span class="hljs-number">1</span> + <span class="hljs-number">1</span>) == <span class="hljs-number">3</span>
 ================================ <span class="hljs-number">1</span> failed <span class="hljs-keyword">in</span> <span class="hljs-number">0.13</span>s ================================
</code></pre>
<ul>
<li><p>This section provides an overall summary of the test.</p>
</li>
<li><p>It indicates that the failed test is the <code>test_addition</code> function in the <code>test_example.py</code> file, and that it failed because the assertion <code>(1 + 1) == 3</code> isn’t true.</p>
</li>
</ul>
</li>
</ol>
<p>Edit the code to use the correct assertion, <code>assert 1 + 1 == 2</code>, and rerun the test. This time the test passes, and the output looks different:</p>
<pre><code class="lang-python">=============================== test session starts ==================================
platform win32 -- Python <span class="hljs-number">3.10</span><span class="hljs-number">.5</span>, pytest<span class="hljs-number">-8.3</span><span class="hljs-number">.2</span>, pluggy<span class="hljs-number">-1.5</span><span class="hljs-number">.0</span>
rootdir: C:\\Users\\hp\\Desktop\\TDD pytest
collected <span class="hljs-number">1</span> items

test_example.py .                                                               [<span class="hljs-number">100</span>%]

=============================== <span class="hljs-number">1</span> passed <span class="hljs-keyword">in</span> <span class="hljs-number">0.01</span>s =================================
</code></pre>
<h3 id="heading-how-to-handle-exceptions-in-pytest">How to Handle Exceptions in pytest</h3>
<p>Exceptions are errors raised while code runs, and an unhandled exception stops the code from completing as expected. Sometimes raising an exception is the correct behaviour, for example when a function receives invalid input, and a test should confirm that it happens. pytest offers several built-in mechanisms for working with exceptions (but we’ll just cover one of them in this article).</p>
<p>The <code>pytest.raises</code> <strong>context manager</strong> checks that a block of code raises a specific exception. If the specified exception is raised, the test passes, confirming that the expected error occurred. But if the specified exception is not raised, the test fails.</p>
<p><strong>Usage Examples of</strong> <code>pytest.raises</code></p>
<ol>
<li><p><strong>Checking for</strong> <code>ValueError</code>: In Python, a <code>ValueError</code> is raised when a function receives an argument with an incorrect value. In the example below, we can verify that a <code>ValueError</code> is raised when attempting to calculate the square root of a negative number.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> pytest
 <span class="hljs-keyword">import</span> math

 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calculate_square_root</span>(<span class="hljs-params">value</span>):</span>
     <span class="hljs-keyword">if</span> value &lt; <span class="hljs-number">0</span>:
         <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Cannot calculate the square root of a negative number"</span>)
     <span class="hljs-keyword">return</span> math.sqrt(value)

 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_calculate_square_root</span>():</span>
     <span class="hljs-keyword">with</span> pytest.raises(ValueError):
         calculate_square_root(<span class="hljs-number">-1</span>)
</code></pre>
</li>
<li><p><strong>Checking for</strong> <code>ZeroDivisionError</code>: Dividing a number by zero raises a <code>ZeroDivisionError</code>. In this example, we check that this error is raised when dividing a number by zero.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> pytest

 def divide_numbers(numerator, denominator):
     <span class="hljs-keyword">return</span> numerator / denominator

 def test_divide_numbers():
     <span class="hljs-keyword">with</span> pytest.raises(ZeroDivisionError):
         divide_numbers(<span class="hljs-number">10</span>, <span class="hljs-number">0</span>)
</code></pre>
</li>
<li><p><strong>Checking for</strong> <code>TypeError</code>: A <code>TypeError</code> is raised when an operation is applied to an object of an inappropriate type. Here, we check that this error is raised when adding incompatible data types, such as the string and the integer in the example.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> pytest

 def add_numbers(a, b):
     <span class="hljs-keyword">return</span> a + b

 def test_add_numbers():
     <span class="hljs-keyword">with</span> pytest.raises(<span class="hljs-built_in">TypeError</span>):
         add_numbers(<span class="hljs-string">"10"</span>, <span class="hljs-number">5</span>)
</code></pre>
</li>
<li><p><strong>Checking for</strong> <code>KeyError</code>: A <code>KeyError</code> is raised when we try to access a dictionary key that doesn’t exist. We can verify that this error is raised using the following code:</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> pytest

 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_value</span>(<span class="hljs-params">dictionary, key</span>):</span>
     <span class="hljs-keyword">return</span> dictionary[key]

 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_get_value</span>():</span>
     <span class="hljs-keyword">with</span> pytest.raises(KeyError):
         get_value({<span class="hljs-string">"name"</span>: <span class="hljs-string">"Alice"</span>}, <span class="hljs-string">"age"</span>)
</code></pre>
</li>
</ol>
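<p>Beyond checking the exception type, <code>pytest.raises</code> can also check the error message. The sketch below reuses the <code>calculate_square_root</code> function from the first example and shows two options: the <code>match</code> argument, which is treated as a regular expression searched for in the message, and <code>as excinfo</code>, which captures the raised exception for further assertions. The test names are only illustrative.</p>

```python
import math

import pytest


def calculate_square_root(value):
    if value < 0:
        raise ValueError("Cannot calculate the square root of a negative number")
    return math.sqrt(value)


def test_square_root_error_message():
    # match= is a regular expression searched for in the exception message
    with pytest.raises(ValueError, match="negative number"):
        calculate_square_root(-1)


def test_square_root_exception_details():
    # "as excinfo" captures information about the raised exception
    with pytest.raises(ValueError) as excinfo:
        calculate_square_root(-4)
    # excinfo.value is the exception instance itself
    assert "square root" in str(excinfo.value)
```

<p>Checking the message as well as the type helps when a function can raise the same exception type for more than one reason.</p>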
<h2 id="heading-advanced-pytest-features">Advanced pytest Features</h2>
<p>As a robust testing framework, pytest offers some advanced features that help you manage complex test scenarios. In this section, we will explore some of these advanced features at a beginner-friendly level and demonstrate how you can start applying them in your tests.</p>
<h3 id="heading-1-pytest-markers">1. pytest Markers</h3>
<p>When working with a large codebase, sometimes running every single test can be time-consuming. This is where pytest markers come in handy.</p>
<p>A marker is just like a label that you can attach to a test function to categorise it. Once a test is labelled, you can instruct pytest to run only tests with certain markers. For example, you may label some tests as "slow" if they take longer to execute and run them separately from the faster ones.</p>
<p>One advantage of markers is that they let you run specific tests based on categories or parameters, and skip tests when certain conditions aren’t met.</p>
<p>pytest comes along with some built-in markers that can be quite useful:</p>
<ol>
<li><p><code>@pytest.mark.skip</code>: This marker allows you to skip a test unconditionally, and can be useful when you know a test will fail due to an external issue or incomplete code.</p>
<p> <strong>Example:</strong></p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> pytest

<span class="hljs-meta"> @pytest.mark.skip(reason="Feature not yet implemented")</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_feature</span>():</span>
     <span class="hljs-keyword">pass</span>
</code></pre>
</li>
<li><p><code>@pytest.mark.skipif</code>: This marker skips a test only when a given condition is true.</p>
<p> <strong>Example:</strong></p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> sys
 <span class="hljs-keyword">import</span> pytest

<span class="hljs-meta"> @pytest.mark.skipif(sys.platform == "win32", reason="does not run on windows")</span>
 <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TestClass</span>:</span>
     <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_function</span>(<span class="hljs-params">self</span>):</span>
         <span class="hljs-string">"This test will not run under 'win32' platform"</span>
</code></pre>
</li>
<li><p><code>@pytest.mark.xfail</code>: This marker is attached to tests that are expected to fail, probably due to a known bug or an incomplete feature. When pytest runs such tests and they fail, it reports them as <code>xfailed</code> instead of counting them as failures.</p>
<p> <strong>Example:</strong></p>
<pre><code class="lang-python"><span class="hljs-meta"> @pytest.mark.xfail(reason="division by zero not handled yet")</span>
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_divide_by_zero</span>():</span>
     <span class="hljs-keyword">assert</span> divide(<span class="hljs-number">10</span>, <span class="hljs-number">0</span>) == <span class="hljs-number">0</span>
</code></pre>
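<p> <strong>Note:</strong> Detailed information about skipped and xfailed tests is not shown by default to avoid cluttering the output. Pass the <code>-r</code> option, for example <code>pytest -rxXs</code>, to list them in the short test summary.</p>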
</li>
</ol>
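<p>The <code>xfail</code> example above calls a <code>divide</code> function that isn’t shown. Here is a minimal self-contained sketch, assuming a hypothetical <code>divide</code> that has no special handling for zero yet. It also passes <code>raises=ZeroDivisionError</code> so that only that exception counts as the expected failure.</p>

```python
import pytest


def divide(numerator, denominator):
    # Hypothetical function under test: division by zero is not handled yet
    return numerator / denominator


@pytest.mark.xfail(raises=ZeroDivisionError, reason="division by zero not handled yet")
def test_divide_by_zero():
    # Raises ZeroDivisionError today, so pytest reports this test as xfailed
    assert divide(10, 0) == 0


@pytest.mark.skip(reason="Feature not yet implemented")
def test_feature():
    # Never executed by pytest; reported as skipped
    pass
```

<p>If the test fails with any other exception, pytest reports it as a regular failure, which keeps the marker from hiding unrelated bugs.</p>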
<p>While pytest comes with some built-in markers, you can also create your own custom markers (but we won’t cover that in this tutorial). Refer to the documentation for more information on <a target="_blank" href="https://docs.pytest.org/en/stable/example/markers.html">working with custom markers</a>.</p>
<h3 id="heading-2-pytest-fixtures">2. pytest Fixtures</h3>
<p>In pytest, fixtures allow you to create reusable default data that can be shared across multiple tests. By using fixtures, you can reduce code repetition, making your tests cleaner and more maintainable.</p>
<p>Fixtures are defined with the <code>@pytest.fixture</code> decorator.</p>
<p>Let’s say we have several tests that rely on a list of user data. Instead of repeating the same data in each test, we can create a fixture to hold this data and pass it to every test that needs it, as shown in the example below:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pytest

<span class="hljs-meta">@pytest.fixture</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">user_data</span>():</span>
    <span class="hljs-keyword">return</span> [
        {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">30</span>},
        {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">25</span>},
        {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Charlie"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">35</span>}
    ]

<span class="hljs-comment"># Test function to check for a specific user by name and age</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_user_exists</span>(<span class="hljs-params">user_data</span>):</span>
    user = {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">30</span>}

    <span class="hljs-comment"># Check if the target user is in the list</span>
    <span class="hljs-keyword">assert</span> user <span class="hljs-keyword">in</span> user_data

<span class="hljs-comment"># Test average age of users</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_average_age</span>(<span class="hljs-params">user_data</span>):</span>
    ages = [user[<span class="hljs-string">"age"</span>] <span class="hljs-keyword">for</span> user <span class="hljs-keyword">in</span> user_data]
    avg_age = sum(ages) / len(ages)
    <span class="hljs-keyword">assert</span> avg_age == <span class="hljs-number">30</span>
</code></pre>
<p><strong>Note:</strong> The <code>@pytest.fixture</code> decorator in the code above marks the <code>user_data</code> function as a fixture. A test requests the fixture by listing its name (<code>user_data</code>) as a parameter, and pytest passes in the fixture’s return value. This lets multiple test functions share the same setup without repeating code.</p>
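<p>One detail worth knowing: with the default <code>function</code> scope, pytest calls the fixture again for every test that requests it, so each test works on its own copy of the data. The sketch below illustrates this under a couple of assumptions: <code>make_users</code> is a hypothetical helper introduced only for this example, and the test names are illustrative.</p>

```python
import pytest


def make_users():
    # Hypothetical plain helper that builds the sample data
    return [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
    ]


@pytest.fixture
def user_data():
    # Default scope is "function": pytest calls this once per requesting test
    return make_users()


def test_remove_user(user_data):
    # Mutating the list only affects this test's copy
    user_data.pop()
    assert len(user_data) == 1


def test_user_count(user_data):
    # Gets a fresh list, unaffected by test_remove_user
    assert len(user_data) == 2
```

<p>Because each test gets fresh data, the tests stay independent and can run in any order.</p>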
<h3 id="heading-3-parametrization">3. Parametrization</h3>
<p>Parametrization is a pytest feature that lets you run the same test function against several sets of inputs.</p>
<p>For example: Let’s say you have a function that calculates the square of a number. To provide enough coverage while testing, you would want to test the function with zero, positive, and negative numbers.</p>
<p>Instead of writing a separate test function for each scenario, you can parametrize a single test. This approach is more concise and reduces code duplication.</p>
<p>To use parametrization in pytest, we use the <code>@pytest.mark.parametrize</code> decorator as shown in the example below:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pytest

<span class="hljs-comment"># Function to calculate the square of a number</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">square_numbers</span>(<span class="hljs-params">num</span>):</span>
    <span class="hljs-keyword">return</span> num * num

<span class="hljs-comment"># Parametrize decorator to test the square function with different inputs</span>
<span class="hljs-meta">@pytest.mark.parametrize("input_value, expected_output", [</span>
    (<span class="hljs-number">2</span>, <span class="hljs-number">4</span>),
    (<span class="hljs-number">-3</span>, <span class="hljs-number">9</span>),
    (<span class="hljs-number">0</span>, <span class="hljs-number">0</span>),
])
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_square</span>(<span class="hljs-params">input_value, expected_output</span>):</span>
    <span class="hljs-keyword">assert</span> square_numbers(input_value) == expected_output
</code></pre>
<p>In the example above, the different input values and expected values are listed in the <code>@pytest.mark.parametrize</code> decorator. We’re testing the <code>square_numbers()</code> function with three different input values: <code>2</code>, <code>-3</code>, and <code>0</code>.</p>
<p>For each value, pytest calls the <code>test_square()</code> function and compares the result of <code>square_numbers(input_value)</code> to <code>expected_output</code>.</p>
<p>This approach is more efficient and ensures the function behaves as expected across a variety of cases.</p>
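<p>When a parametrized test fails, it helps to know which case failed. As a small sketch of one way to do that, each case can be wrapped in <code>pytest.param</code> with an <code>id</code>. The id strings below (<code>positive</code>, <code>negative</code>, <code>zero</code>) are just labels chosen for this example.</p>

```python
import pytest


def square_numbers(num):
    return num * num


@pytest.mark.parametrize(
    "input_value, expected_output",
    [
        # Each id becomes part of the test name in pytest's report
        pytest.param(2, 4, id="positive"),
        pytest.param(-3, 9, id="negative"),
        pytest.param(0, 0, id="zero"),
    ],
)
def test_square(input_value, expected_output):
    assert square_numbers(input_value) == expected_output
```

<p>Running <code>pytest -v</code> then lists each case under its own name, such as <code>test_square[negative]</code>, instead of an auto-generated id.</p>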
<h3 id="heading-4-pytest-plugins">4. pytest Plugins</h3>
<p>Plugins are an extension mechanism that allows you to add new functionality to pytest or modify its existing behaviour. These plugins work by providing additional features that extend pytest’s capabilities, which can be useful, especially in complex test scenarios.</p>
<p>pytest has a vast ecosystem of plugins that cover many different testing needs. Plugins are published on <a target="_blank" href="https://pypi.org/">PyPI</a>, and you can browse the full list in the <a target="_blank" href="https://docs.pytest.org/en/stable/reference/plugin_list.html#plugin-list">pytest Plugin List</a>.</p>
<p>To use a plugin, install it with <code>pip</code>. You can remove it again with <code>pip uninstall</code>.</p>
<p><strong>For example:</strong></p>
<pre><code class="lang-bash">pip install pytest-NAME
pip uninstall pytest-NAME
</code></pre>
<p><strong>Note:</strong> <code>NAME</code> in the code above should be replaced with the name of the plugin you want to install.</p>
<p>After installing a plugin, pytest automatically finds and integrates it. There’s no need for any additional configuration.</p>
<p>In this section, we explored some of pytest's advanced features. By leveraging these features, you can now significantly improve the quality of your tests by ensuring they’re more efficient, scalable, and easier to maintain over time.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, you’ve learned the basics of testing with pytest, from writing and interpreting tests to handling exceptions and using advanced features like fixtures and parametrization.</p>
<p>Whether your code is written manually or generated by AI, learning how to write tests empowers you to detect bugs early and build more reliable software. Testing acts as a safety net that boosts your confidence during development and helps ensure your code works as expected.</p>
<p>If you're ready to go a step further, I’ve written an in-depth article on <a target="_blank" href="https://judeolowo.hashnode.dev/test-driven-development-in-python-a-complete-guide-to-unittest">Test Driven Development in Python</a>. It is a powerful approach where writing tests guides your entire coding process.</p>
<p>If you found this helpful, let me know, share it with your network, or give it a like to help others discover it too.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Logic, Philosophy, and Science of Software Testing – A Handbook for Developers ]]>
                </title>
                <description>
                    <![CDATA[ In an age of information overload, AI assistance, and rapid technological change, the ability to think clearly and reason soundly has never been more valuable. This handbook takes you on a journey from fundamental logical principles to their practica... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-logic-philosophy-and-science-of-software-testing-handbook-for-developers/</link>
                <guid isPermaLink="false">6851b75a6fd83aa331a8943b</guid>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ debugging ]]>
                    </category>
                
                    <category>
                        <![CDATA[ logic ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Software Engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Science  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Han Qi ]]>
                </dc:creator>
                <pubDate>Tue, 17 Jun 2025 18:43:38 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750176539544/965a99ef-8aad-467c-ae6b-4a144e2d1117.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In an age of information overload, AI assistance, and rapid technological change, the ability to think clearly and reason soundly has never been more valuable.</p>
<p>This handbook takes you on a journey from fundamental logical principles to their practical applications in software development, scientific reasoning, and critical thinking.</p>
<p>Whether you're a high school student learning to think more clearly, a professional debugging complex systems, or simply someone curious about how sound reasoning works, this handbook provides tools for sharper, more reliable thinking.</p>
<h2 id="heading-what-well-cover">What We’ll Cover:</h2>
<h3 id="heading-part-i-foundational-theory"><strong>Part I: Foundational Theory</strong></h3>
<p>We start with the bedrock of formal logic – understanding implications, truth tables, and the core rules of reasoning.</p>
<p>You'll learn the scaffolding for everything that follows:</p>
<ul>
<li><p>How "if-then" statements actually work (spoiler: it's not always intuitive!)</p>
</li>
<li><p>The power of truth tables to map all possible scenarios</p>
</li>
<li><p>Why some arguments are valid while others are logical fallacies</p>
</li>
<li><p>The elegant relationship between <strong>Modus Ponens, Modus Tollens, and Contrapositives</strong></p>
</li>
</ul>
<h3 id="heading-part-ii-practical-applications"><strong>Part II: Practical Applications</strong></h3>
<p>Here's where logic comes alive in tangible ways:</p>
<p><strong>In Software Development:</strong></p>
<ul>
<li><p>How debugging mirrors logical reasoning, and why your tests might be lying to you</p>
</li>
<li><p>The logic behind Test-Driven Development and Mutation Testing</p>
</li>
</ul>
<p><strong>In Scientific Thinking:</strong></p>
<ul>
<li><p>Karl Popper's falsification principle and why it matters beyond academia</p>
</li>
<li><p>How <strong>Hypothesis Testing</strong> is just statistics meets <strong>Modus Tollens</strong></p>
</li>
</ul>
<p><strong>In Everyday Reasoning:</strong></p>
<ul>
<li><p>Spotting logical fallacies in arguments, media, and your thinking</p>
</li>
<li><p>The art of considering multiple causal paths instead of jumping to conclusions</p>
</li>
</ul>
<h3 id="heading-part-iii-philosophical-depths"><strong>Part III: Philosophical Depths</strong></h3>
<p>The final section confronts the beautiful complexity of applying pure logic to an impure world:</p>
<ul>
<li><p>Why perfect "<strong>if-and-only-if</strong>" relationships are the goal but rarely achievable</p>
</li>
<li><p>How modern software systems hide their complexity</p>
</li>
<li><p>The butterfly effect of bugs and why root cause analysis is often harder than it seems</p>
</li>
<li><p>Formal verification tools: from <strong>Prolog</strong> to <strong>Coq</strong> to <strong>TLA+</strong></p>
</li>
</ul>
<h2 id="heading-what-youll-gain">What You'll Gain</h2>
<h3 id="heading-for-students"><strong>For Students:</strong></h3>
<ul>
<li><p><strong>Critical thinking superpowers</strong>: Learn to spot flawed reasoning in arguments, social media, and news</p>
</li>
<li><p><strong>Academic advantage</strong>: These concepts appear in debates, philosophy, computer science, mathematics, and statistics</p>
</li>
</ul>
<h3 id="heading-for-software-engineers"><strong>For Software Engineers:</strong></h3>
<ul>
<li><p><strong>Debugging mastery</strong>: <em>Modus Tollens</em> for debugging: "If the output is wrong, what could cause it?"</p>
</li>
<li><p><strong>Testing philosophy</strong>: Move beyond "make the tests pass" to "prove the code is correct"</p>
</li>
<li><p><strong>Problem analysis</strong>: Avoid jumping to solutions before understanding the real problem</p>
</li>
<li><p><strong>System design</strong>: Think more rigorously about failure modes and edge cases, evaluate cause-and-effect relationships in complex systems</p>
</li>
<li><p><strong>Communication and career growth</strong>: Present arguments more clearly and persuasively, gain logical thinking skills that separate senior engineers from juniors</p>
</li>
</ul>
<h3 id="heading-for-scientists"><strong>For Scientists:</strong></h3>
<ul>
<li><p><strong>Experimental design</strong>: Strengthen your understanding of hypothesis testing and falsifiability</p>
</li>
<li><p><strong>Peer review</strong>: Better evaluate the logical soundness of research claims</p>
</li>
<li><p><strong>Grant writing</strong>: Structure arguments more persuasively using solid logical foundations</p>
</li>
</ul>
<h2 id="heading-pre-requisites">Pre-requisites</h2>
<p>I’ll introduce code samples starting in the second half of the article, so knowing a programming language would be helpful. The concepts in this article are programming language-agnostic, but I’ve used Python throughout for readability.</p>
<p>No prior formal logic or philosophy background is strictly necessary, but the following will let you reap the most benefits from this article:</p>
<ul>
<li><p>Experience in testing and debugging during software development.</p>
</li>
<li><p>Familiarity with a REPL (Read-Eval-Print Loop), if you want to try the Proof Assistants.</p>
</li>
<li><p>Knowledge of logical operators (NOT, AND, OR), and the fact that they take 1 or 2 boolean values as input and return a single boolean value as output.</p>
</li>
<li><p>Basic Algebraic Thinking: representing statements as variables (P, Q), the concept of NOT (¬) as an inversion of statements, and the concept that different input combinations can reach the same output.</p>
</li>
<li><p>Exposure to deductive reasoning, where inferences are made based on some facts, and fallacies, which are some ways arguments can be flawed.</p>
</li>
<li><p>Willingness to engage in conceptual back-and-forth between concrete English examples and abstract logical symbols.</p>
</li>
<li><p>Holding possibly conflicting ideas between the ideal logic world and the impure real world.</p>
</li>
<li><p>Openness to challenging intuition and following logical rules before applying your real-world experience.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-an-introduction-to-logic">An Introduction to Logic</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-truth-tables-mapping-all-possibilities">Truth Tables: Mapping All Possibilities</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-contrapositives-modus-ponens-modus-tollens">Contrapositives, Modus Ponens, Modus Tollens</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-origin-of-pq-science-and-reality">The Origin of P⟹Q: Science and Reality</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-revisiting-argument-forms-valid-inferences-and-common-fallacies">Revisiting Argument Forms: Valid Inferences and Common Fallacies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-denying-the-antecedent-a-database-example">Denying the Antecedent: A Database Example</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-assigning-real-world-meanings-to-logic">Assigning Real-World Meanings to Logic</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-applying-logic-to-software-testing">Applying Logic to Software Testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-closer-look-at-testing">A Closer Look at Testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-revisiting-the-four-statements-for-coding">Revisiting the Four Statements for Coding</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-missing-ingredient-if-and-only-if">The Missing Ingredient - If and Only If</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-mutation-testing-testing-the-tests">Mutation Testing: Testing the Tests</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-toward-if-and-only-if-confidence">Toward If-and-Only-If Confidence</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-real-world-challenges">Real-World Challenges</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-glimmers-of-hope-tools-and-practices-for-clarity">Glimmers of Hope: Tools and Practices for Clarity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-power-of-falsification-in-testing">The Power of Falsification in Testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-proof-assistants">Proof Assistants</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-food-for-thought">Food for Thought</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-qed-the-enduring-power-of-logic-in-an-uncertain-world">Q.E.D.: The Enduring Power of Logic in an Uncertain World</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-resources">Resources</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-glossary">Glossary</a></p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749064487021/b0404a1e-3257-4815-bc42-517b2ea955d0.jpeg" alt="man standing at edge of lake looking into the distance" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-an-introduction-to-logic">An Introduction to Logic</h2>
<p>Imagine that the following statement is True:</p>
<p><strong>If you are a coding instructor, then you have a job.</strong></p>
<p>Now, do these make sense?</p>
<ol>
<li><p>You have no job, so you are not a coding instructor</p>
</li>
<li><p>You have a job, so you are a coding instructor</p>
</li>
<li><p>You are not a coding instructor, so you have no job</p>
</li>
</ol>
<h3 id="heading-interpretations">Interpretations</h3>
<p>Based on logic:</p>
<ul>
<li><p>Statement 1 is correct.</p>
</li>
<li><p>Statement 2 is wrong because you may have other jobs without being a coding instructor.</p>
</li>
<li><p>Statement 3 is wrong because you may or may not have a job, and as before, you may have other jobs without being a coding instructor.</p>
</li>
</ul>
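<p>These three verdicts can be checked mechanically. The sketch below is one possible approach, not part of any library: it assumes the standard definition of "if P then Q" (false only when P is true and Q is false) and treats an argument as valid when no combination of truth values makes every premise true and the conclusion false. The helper names <code>implies</code>, <code>rule</code>, and <code>is_valid</code> are made up for this example.</p>

```python
from itertools import product


def implies(p, q):
    # "If P then Q" is false only when P is true and Q is false
    return (not p) or q


def rule(p, q):
    # P = "you are a coding instructor", Q = "you have a job"
    return implies(p, q)


def is_valid(premises, conclusion):
    # Valid: in every scenario where all premises hold, the conclusion holds too
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False
    return True


# Statement 1: no job, so not a coding instructor
statement_1 = is_valid([rule, lambda p, q: not q], lambda p, q: not p)
# Statement 2: has a job, so a coding instructor
statement_2 = is_valid([rule, lambda p, q: q], lambda p, q: p)
# Statement 3: not a coding instructor, so no job
statement_3 = is_valid([rule, lambda p, q: not p], lambda p, q: not q)

print(statement_1, statement_2, statement_3)  # True False False
```

<p>Statements 2 and 3 both fail on the same scenario: someone who has a job but is not a coding instructor.</p>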
<h3 id="heading-growing-complexity">Growing complexity</h3>
<p>These statements grow increasingly complex due to:</p>
<ul>
<li><p>Moving from valid statements (the given implication and statement 1) to two invalid conclusions (2, 3).</p>
</li>
<li><p>Moving from a clear job status (1, 2) to uncertainty about job existence or type (3).</p>
</li>
</ul>
<p>Let’s get familiar with some notation before seeing how <strong>Truth tables</strong> help manage this complexity.</p>
<h3 id="heading-notations">Notations</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Notation</td><td>Meaning</td><td>Example (if P="It's raining", Q="The ground is wet")</td></tr>
</thead>
<tbody>
<tr>
<td><strong>P, Q</strong></td><td>Propositions</td><td>P, Q</td></tr>
<tr>
<td><strong>⟹</strong></td><td>Implies / If...then...</td><td>P⟹Q ("If it's raining, then the ground is wet")</td></tr>
<tr>
<td><strong>¬</strong></td><td>Not</td><td>¬P ("It's not raining")</td></tr>
<tr>
<td><strong>∧</strong></td><td>And (conjunction)</td><td>P∧Q ("It's raining and the ground is wet")</td></tr>
<tr>
<td><strong>∨</strong></td><td>Or (disjunction)</td><td>P∨Q ("It's raining or the ground is wet")</td></tr>
<tr>
<td><strong>⟺</strong></td><td>If and only if (biconditional)</td><td>P⟺Q ("It's raining if and only if the ground is wet")</td></tr>
<tr>
<td>∴</td><td>Therefore</td><td>P ⟹ Q: If it's raining, then the ground is wet; P: It's raining; ∴ Q: <strong>Therefore</strong>, the ground is wet</td></tr>
</tbody>
</table>
</div><h2 id="heading-truth-tables-mapping-all-possibilities">Truth Tables: Mapping All Possibilities</h2>
<h3 id="heading-what-is-a-truth-table"><strong>What is a Truth Table?</strong></h3>
<p>A truth table is a powerful tool in logic that helps us determine the overall truth or falsity of a compound logical statement. It does this by systematically listing <strong>all possible combinations</strong> of truth values (True or False) for its individual component propositions.</p>
<p>For every way the "inputs" (our propositions like P and Q) can be true or false, the truth table shows you the precise "output" (the truth value of the entire logical statement, such as P⟹Q).</p>
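<p>As a rough sketch of that systematic listing, the snippet below enumerates every combination of truth values for P and Q and computes the output for each one. It assumes the standard definition of implication (false only when P is true and Q is false), and the function names are chosen just for this example.</p>

```python
from itertools import product


def implies(p, q):
    # "If P then Q" is false only when P is true and Q is false
    return (not p) or q


def truth_table(statement):
    # One row per combination of truth values for P and Q
    return [(p, q, statement(p, q)) for p, q in product([True, False], repeat=2)]


for p, q, result in truth_table(implies):
    print(p, q, result)
```

<p>Because <code>truth_table</code> accepts any two-input function, the same helper works for other connectives, for example <code>truth_table(lambda p, q: p and q)</code>.</p>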
<h3 id="heading-why-are-truth-tables-helpful"><strong>Why are Truth Tables Helpful?</strong></h3>
<p>Truth tables offer critical benefits for clear thinking:</p>
<ul>
<li><p><strong>Clarity and precision:</strong> They eliminate ambiguity by explicitly showing the outcome for every single scenario.</p>
</li>
<li><p><strong>Systematic analysis:</strong> They ensure no possible combination is missed, which is vital for sound reasoning.</p>
</li>
<li><p><strong>Foundation for understanding:</strong> They define how logical rules work, forming the bedrock for analyzing more complex arguments in any domain.</p>
</li>
</ul>
<h3 id="heading-how-to-read-our-first-truth-table"><strong>How to Read Our First Truth Table:</strong></h3>
<p>Let's examine the truth table for the implication P⟹Q ("If P then Q").</p>
<p>Each row represents a unique scenario, combining the truth values of P and Q to show the resulting truth value of P⟹Q.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>P</td><td>Q</td><td>P⟹Q (If P then Q)</td><td>Used In</td></tr>
</thead>
<tbody>
<tr>
<td>True</td><td>True</td><td>True</td><td>Modus Ponens ✅</td></tr>
<tr>
<td>True</td><td>False</td><td>False</td><td>Falsifiability 🚨</td></tr>
<tr>
<td>False</td><td>True</td><td>True</td><td>No Inference</td></tr>
<tr>
<td>False</td><td>False</td><td>True</td><td>Modus Tollens ✅</td></tr>
</tbody>
</table>
</div><p>Let's break down each row:</p>
<ul>
<li><p><strong>P and Q Columns:</strong> These show the input truth values (True or False) for our two propositions. Since each can be one of two values, we have 2×2 = 4 unique combinations, filling all four rows.</p>
</li>
<li><p><strong>P ⟹ Q Column:</strong> This is the output truth value of the "If P then Q" statement for each combination of inputs P and Q.</p>
<ul>
<li><p><strong>Row 1: P is True, Q is True.</strong></p>
<ul>
<li><p>If P is true <strong>(you are a coding instructor)</strong> and Q is also true <strong>(you have a job)</strong>, then the implication P⟹Q is <strong>True</strong>. (The "If...then..." statement holds.)</p>
</li>
<li><p>This row is key for <strong>Modus Ponens</strong>.</p>
</li>
</ul>
</li>
<li><p><strong>Row 2: P is True, Q is False</strong></p>
<ul>
<li><p>If P is true <strong>(you are a coding instructor)</strong> but Q is false <strong>(you have no job)</strong>, then the implication P⟹Q is <strong>False</strong>. This is the only scenario that disproves an "if-then" statement.</p>
</li>
<li><p>This row is key for <strong>Falsifiability</strong>.</p>
</li>
</ul>
</li>
<li><p><strong>Row 3: P is False, Q is True.</strong></p>
<ul>
<li><p>If P is False <strong>(you are not a coding instructor)</strong> but Q is True <strong>(you have a job)</strong>, then the implication P⟹Q is still considered <strong>True</strong>. This can seem counter-intuitive.</p>
</li>
<li><p>The reason is that the implication statement <em>only</em> makes a claim about what happens when P is true. If P is false, the implication's claim isn't tested, so it is considered <a target="_blank" href="https://en.wikipedia.org/wiki/Vacuous_truth">vacuously true</a>.</p>
</li>
</ul>
</li>
<li><p><strong>Row 4: P is False, Q is False.</strong></p>
<ul>
<li><p>If P is False <strong>(you are not a coding instructor)</strong> and Q is False <strong>(you have no job)</strong>, then the implication P⟹Q is also considered <strong>True</strong>.</p>
</li>
<li><p>Similar to Row 3, since the initial condition (P) was false, the implication's truth value remains True, as it hasn't been disproven.</p>
</li>
<li><p>This row is key for <strong>Modus Tollens</strong>.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>The "Used In" column serves as a preview of the specific logical arguments or concepts that rely on each row's behavior, which we will explore in detail later.</p>
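<p>To see how "all possible combinations" are enumerated in practice, here is a minimal Python sketch. The names <code>IMPLICATION_TABLE</code> and <code>all_rows</code> are made up for this illustration. The lookup simply copies the P⟹Q column from the table above.</p>

```python
from itertools import product

# The four rows of the truth table, written as a lookup:
# (P, Q) maps to the truth value of "If P then Q".
IMPLICATION_TABLE = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}


def all_rows(num_propositions):
    """Return every combination of truth values, starting with all True."""
    return list(product([True, False], repeat=num_propositions))


# Two propositions give 2 x 2 = 4 rows, in the same order as the table.
for p, q in all_rows(2):
    print(p, q, IMPLICATION_TABLE[(p, q)])
```

<p>Each extra proposition doubles the number of rows, so <code>all_rows(3)</code> would return 8 combinations.</p>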
<h3 id="heading-understanding-the-implication-pq-deeper">Understanding the Implication (P⟹Q) Deeper</h3>
<p>Most programmers are familiar with truth tables from logical operators like <strong>AND (∧)</strong>, <strong>OR (∨)</strong>, and <strong>NOT (¬)</strong>, where they define the output based on combinations of inputs.</p>
<p>The implication (P⟹Q) works the same way: its output is defined by the rules of propositional logic, not by any real-world causal relationship or your “common sense”. For any given pair of inputs for P and Q, the result of P⟹Q is fixed.</p>
<p>If this feels counter-intuitive, consider that mathematical logic, like any formal system, is built upon agreed-upon <strong>axioms</strong>. These basic accepted truths allow us to construct complex systems of ideas. If later found ineffective or contradictory, these axioms can be redefined, or a new system can be developed.</p>
<p>In formal logic, this implication is also defined as being logically equivalent to <strong>"NOT P OR Q" (¬P∨Q)</strong>.</p>
<p>This is the fundamental logical rule that dictates why, <strong>if P is False, P⟹Q is always True, regardless of Q's truth value</strong>. You can also understand this using the <strong>NOT P OR Q</strong> form.</p>
<ul>
<li><p>If P is False, that means NOT P is True.</p>
</li>
<li><p>Using the rules of the logical OR operation:</p>
<ul>
<li><p>True (Not P) OR True (Q) is True (<strong>NOT P OR Q</strong>)</p>
</li>
<li><p>True (Not P) OR False (Q) is True (<strong>NOT P OR Q</strong>)</p>
</li>
<li><p><strong>NOT P OR Q</strong> is True regardless of what Q is.</p>
</li>
</ul>
</li>
</ul>
<p>The above explains rows 3 and 4 of the truth table from the <strong>NOT P OR Q</strong> form. As an exercise, you can apply the inputs (P, Q) from the first two rows of the truth table to NOT P OR Q to arrive at the same results defined in the P⟹Q column.</p>
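<p>If you would rather let the computer do that exercise, the sketch below compares the <strong>NOT P OR Q</strong> form against the P⟹Q column for all four rows. The helper name <code>implies</code> is an assumption of this example, not a Python built-in.</p>

```python
from itertools import product


def implies(p, q):
    """Material implication, written in its NOT P OR Q form."""
    return (not p) or q


# Expected outputs copied from the P implies Q column of the truth table.
expected = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

# Every row of the NOT P OR Q form must match the table.
for p, q in product([True, False], repeat=2):
    assert implies(p, q) == expected[(p, q)]
```

<p>The loop finishes without an <code>AssertionError</code>, which means the two definitions agree on every row.</p>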
<p>This formal definition allows us to use implication to reason in powerful ways, not just in the "forward" direction (P⟹Q, leading to Modus Ponens), but also in a crucial "backward" direction.</p>
<p>This backward form (<strong>Contrapositive</strong>) involves swapping and negating the propositions (¬Q⟹¬P).</p>
<p>For example, if "If you are a coding instructor, then you have a job" is true, then it must also be true that "If you have no job (¬Q), then you are not a coding instructor (¬P)."</p>
<p>This "backward" way of reasoning, which underpins Modus Tollens, is a powerful tool for inferring conclusions from observed outcomes.</p>
<p>We'll explore the <strong>Contrapositive</strong> and two argument forms (<strong>Modus Ponens, Modus Tollens</strong>) in detail next.</p>
<h2 id="heading-contrapositives-modus-ponens-modus-tollens">Contrapositives, Modus Ponens, Modus Tollens</h2>
<p>We've explored the fundamental implication (P⟹Q) and how truth tables reveal its behavior.</p>
<p>Now, we explore reasoning tools that build upon this foundation: <strong>Modus Ponens</strong>, <strong>Modus Tollens</strong>, and the concept of <strong>Contrapositives</strong>. These are bedrock principles of valid argument and efficient logical thought.</p>
<h3 id="heading-what-is-logical-equivalence">What is Logical Equivalence?</h3>
<p>Before we dive into these specific concepts, let's clarify what <strong>logical equivalence</strong> means. Two statements are <strong>logically equivalent</strong> if they always have the same truth value under all possible circumstances. In simpler terms, if one statement is true, the other is <em>always</em> true. If one is false, the other is <em>always</em> false. They are, in essence, different ways of saying the same logical thing.</p>
<p>Understanding logical equivalence is incredibly useful. It:</p>
<ul>
<li><p><strong>Simplifies logic:</strong> It allows us to substitute one statement for another without changing the truth of an argument, which simplifies complex proofs and reasoning.</p>
</li>
<li><p><strong>Reduces complexity:</strong> In fields like circuit design, it can lead to fewer physical gates.</p>
</li>
<li><p><strong>Maintains software correctness:</strong> In programming, it helps maintain code's correctness during refactoring and debugging, especially when simplifying conditional statements, by ensuring the transformed code still behaves identically to the original under all conditions.</p>
</li>
</ul>
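<p>As a small illustration of the refactoring point, the sketch below rewrites a condition using De Morgan's law and then checks every input combination to confirm the two versions behave identically. The deploy scenario and all function names are hypothetical.</p>

```python
from itertools import product


def can_deploy_original(tests_pass, has_approval):
    # Original condition: block the deploy unless both checks are true.
    if not (tests_pass and has_approval):
        return False
    return True


def can_deploy_refactored(tests_pass, has_approval):
    # Refactored with De Morgan's law:
    # not (A and B) is equivalent to (not A) or (not B).
    if (not tests_pass) or (not has_approval):
        return False
    return True


def behaves_identically(f, g, num_inputs):
    """Compare two boolean functions on every combination of inputs."""
    return all(
        f(*row) == g(*row)
        for row in product([True, False], repeat=num_inputs)
    )


assert behaves_identically(can_deploy_original, can_deploy_refactored, 2)
```

<p>Because there are only four input combinations, the exhaustive check is a complete proof of equivalence for these two functions. It is a truth table in code form.</p>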
<h3 id="heading-the-contrapositive-an-equivalent-implication">The Contrapositive: An Equivalent Implication</h3>
<p>One of the most important logical equivalences involves the <strong>Contrapositive</strong> of an implication. The contrapositive of an "If P then Q" (P⟹Q) statement is <strong>"If not Q, then not P"</strong> (¬Q⟹¬P).</p>
<p>You might intuitively question how "<strong>If P then Q</strong>" could be logically the same as "<strong>If not Q then not P</strong>." Let's demonstrate this using a truth table.</p>
<p>We'll start with our familiar P and Q columns and the P⟹Q implication. Then, we'll add columns for ¬P (Not P) and ¬Q (Not Q), and finally, the implication for the contrapositive, ¬Q⟹¬P.</p>
<p>Let's look at how the truth table explicitly shows this equivalence:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747584857181/2732a798-da1d-48d9-aa92-c1ca3459b169.png" alt="Truth Table of columns P, Q, P->Q, not P, not Q, not Q -> not P" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-explanation-of-the-table">Explanation of the table</h3>
<ol>
<li><p><strong>P, Q, P ⟹ Q (Columns 1-3):</strong> These are our standard propositions and the implication we've already defined.</p>
</li>
<li><p><strong>¬P (Column 4):</strong> This column simply shows the negation (opposite truth value) of the P column. If P is True, ¬P is False, and vice-versa.</p>
</li>
<li><p><strong>¬Q (Column 5):</strong> Similarly, this column shows the negation of the Q column.</p>
</li>
<li><p><strong>¬Q ⟹ ¬P (Column 6):</strong> This is the contrapositive. We apply the same rules for implication that we learned earlier, but now using ¬Q as our "if" part and ¬P as our "then" part. For example, in Row 2, ¬Q is True and ¬P is False. According to the implication rule (True ⟹ False yields False), the result for ¬Q⟹¬P is False.</p>
</li>
<li><p><strong>The Proof of Equivalence:</strong> Now, compare <strong>Column 3 (P⟹Q)</strong> with <strong>Column 6 (¬Q⟹¬P)</strong>. You'll notice that for every single row, their truth values are identical! When P⟹Q is True, ¬Q⟹¬P is also True. When P⟹Q is False, ¬Q⟹¬P is also False. This perfectly illustrates why they are <strong>logically equivalent</strong>.</p>
</li>
</ol>
<p>So, "If you are a coding instructor, then you have a job" (P⟹Q) is logically the same as saying "If you have no job, then you are not a coding instructor" (¬Q⟹¬P). They convey the same information about the relationship between being a coding instructor and having a job.</p>
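<p>The same column-by-column comparison can be sketched in a few lines of Python. The helper <code>implies</code> is an assumed name that uses the NOT P OR Q form of implication.</p>

```python
from itertools import product


def implies(p, q):
    """Material implication in its NOT P OR Q form."""
    return (not p) or q


rows = []
for p, q in product([True, False], repeat=2):
    original = implies(p, q)                # column 3: P implies Q
    contrapositive = implies(not q, not p)  # column 6: not Q implies not P
    rows.append((p, q, original, contrapositive))

# Logical equivalence: the two columns agree in every row.
assert all(original == contra for _, _, original, contra in rows)
```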
<h3 id="heading-how-modus-ponens-and-modus-tollens-relate-to-implication">How Modus Ponens and Modus Tollens Relate to Implication</h3>
<p>Having defined logical equivalence and the contrapositive, we can now precisely understand two of the most fundamental and valid forms of deductive argument: <strong>Modus Ponens</strong> and <strong>Modus Tollens</strong>. Both of these argument forms rely on a core premise that an implication (P⟹Q) is true, and then use additional information to draw a valid conclusion.</p>
<ol>
<li><p><strong>Modus Ponens (Affirming the Antecedent):</strong> This is often considered the most intuitive and direct form of logical inference. It works in the "forward" direction of the implication.</p>
<ul>
<li><p><strong>Premise 1:</strong> We are given that the implication is true: If P, then Q (P⟹Q).</p>
</li>
<li><p><strong>Premise 2:</strong> We are also given that the "if" part, the antecedent, is true: P is true.</p>
</li>
<li><p><strong>Conclusion:</strong> Therefore, we can validly infer that the "then" part, the consequent, must also be true: Q is true.</p>
</li>
</ul>
</li>
</ol>
<p>    <em>Example:</em></p>
<ul>
<li><p>Premise 1: If it is raining (P), then the ground is wet (Q).</p>
</li>
<li><p>Premise 2: It is raining (P).</p>
</li>
<li><p>Conclusion: Therefore, the ground is wet (Q).</p>
</li>
</ul>
<p>    This directly corresponds to <strong>Row 1 (True, True)</strong> of our truth table for P⟹Q.</p>
<ol start="2">
<li><p><strong>Modus Tollens (Denying the Consequent):</strong> This argument form works in the "backward" direction and relies directly on the logical equivalence of an implication and its contrapositive.</p>
<ul>
<li><p><strong>Premise 1:</strong> We are given that the implication is true: If P, then Q (P⟹Q).</p>
</li>
<li><p><strong>Premise 2</strong>: We are also given that the "then" part, the consequent, is false: Not Q (¬Q).</p>
</li>
<li><p><strong>Conclusion</strong>: Therefore, we can validly infer that the "if" part, the antecedent, must also be false: Not P (¬P).</p>
</li>
</ul>
</li>
</ol>
<p>    <em>Example:</em></p>
<ul>
<li><p>Premise 1: If it is raining (P), then the ground is wet (Q).</p>
</li>
<li><p>Premise 2: The ground is <strong>not</strong> wet (¬Q).</p>
</li>
<li><p>Conclusion: Therefore, it is <strong>not</strong> raining (¬P).</p>
</li>
</ul>
<p>    Modus Tollens is valid because if P⟹Q is true, its contrapositive (¬Q⟹¬P) must also be true. Applying Modus Ponens to this contrapositive (with ¬Q as our second premise) directly leads to the conclusion ¬P. This corresponds to <strong>Row 4 (False, False)</strong> of our original truth table for P⟹Q, where P and Q are both false but the implication is still true.</p>
<p>These two argument forms are central to rigorous deductive reasoning, allowing us to draw certain conclusions based on the truth of implications and related facts.</p>
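<p>One way to check that an argument form is valid is to search the truth table for a row where every premise is true but the conclusion is false. If no such row exists, the form is valid. The sketch below applies that check to both forms. The names <code>implies</code> and <code>is_valid</code> are assumptions of this example.</p>

```python
from itertools import product


def implies(p, q):
    """Material implication in its NOT P OR Q form."""
    return (not p) or q


def is_valid(premises, conclusion):
    """An argument form is valid if no row makes every premise true
    while making the conclusion false."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False
    return True


# Modus Ponens: premises (P implies Q) and P, conclusion Q.
modus_ponens = is_valid([implies, lambda p, q: p], lambda p, q: q)

# Modus Tollens: premises (P implies Q) and not Q, conclusion not P.
modus_tollens = is_valid([implies, lambda p, q: not q], lambda p, q: not p)

print(modus_ponens, modus_tollens)
```

<p>For Modus Ponens, the only row that satisfies both premises is Row 1. For Modus Tollens, it is Row 4. In each case the conclusion holds in that row.</p>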
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749063972374/e3eaf8a6-8eb1-4fa2-9e97-703b547a81bd.jpeg" alt="Title Page of Book by Charles Darwin: On the Origin of Species" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-the-origin-of-pq-science-and-reality">The Origin of P⟹Q: Science and Reality</h2>
<p>In science, hypotheses often take the form "<strong>If P, then Q</strong>", where P is a cause and Q is its predicted effect. For example: "If a drug is given (P), then symptoms improve (Q)."</p>
<p>Ideally, P is controllable, as in experimental studies, but even in observational studies, P must be clearly defined and measurable.</p>
<p>Each experiment yields one observation, reflecting one of four possible truth-value combinations of P and Q.</p>
<h3 id="heading-the-falsifying-case-in-science-and-logic">The Falsifying Case in Science and Logic</h3>
<p>What matters for the hypothesis is which of the four combinations of P and Q an observation falls into:</p>
<ul>
<li><p>If P=True, Q=False is observed (row 2 of the truth table), the hypothesis is <strong>falsified</strong>.</p>
</li>
<li><p>In all other cases, the hypothesis is <strong>not falsified</strong> (yet).</p>
</li>
</ul>
<p>Thus:</p>
<ul>
<li><p>If all observations fall in the 3 truth-preserving rows, the hypothesis remains viable.</p>
</li>
<li><p>If at least one experiment yields P=True, Q=False, we either:</p>
<ul>
<li><p>Conclude falsification, or</p>
</li>
<li><p>Re-examine the experiment and attempt replication before accepting falsification.</p>
</li>
</ul>
</li>
</ul>
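<p>The decision rule above can be sketched as a small function that scans a list of (P, Q) observations. The trial data and function names are hypothetical, and real studies would add replication and statistical checks before accepting a falsification.</p>

```python
def falsifies(p_observed, q_observed):
    """Only P=True with Q=False (row 2) contradicts 'If P, then Q'."""
    return p_observed and not q_observed


def hypothesis_status(observations):
    """Summarise a list of (P, Q) observations."""
    if any(falsifies(p, q) for p, q in observations):
        return "falsifying case observed: re-examine, replicate, or reject"
    return "not falsified (yet)"


# Hypothetical trial records: (drug given, symptoms improved).
trial = [(True, True), (False, False), (False, True)]
print(hypothesis_status(trial))
```

<p>Adding a single <code>(True, False)</code> record to <code>trial</code> is enough to change the status, no matter how many supporting observations came before it.</p>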
<h3 id="heading-the-power-of-the-falsifying-case">The Power of the Falsifying Case</h3>
<h4 id="heading-in-the-logical-world">In the Logical World</h4>
<p>The falsifying case is not useful for inference with Modus Ponens or Modus Tollens because these two argument forms require starting with <strong>P⟹Q = True</strong>. We'll revisit both argument forms in detail later.</p>
<p>But the falsifying case is useful for showing counterexamples to disprove the implication, or proof by contradiction.</p>
<h4 id="heading-in-the-real-scientific-world">In the Real Scientific World</h4>
<p>The falsifying case embodies <strong>Falsifiability</strong> – a crucial concept in Science.</p>
<blockquote>
<p>In so far as a scientific statement speaks about reality, it must be falsifiable: and in so far as it is not falsifiable, it does not speak about reality.</p>
<p><strong>— Karl R. Popper, The Logic of Scientific Discovery</strong></p>
</blockquote>
<p>Scientific theories come about through hypotheses that are continually tested and survive attempts at falsification.</p>
<h3 id="heading-popperian-falsification-and-hypothesis-testing">Popperian Falsification and Hypothesis Testing</h3>
<p>These two approaches, one philosophical and one statistical, are distinct but complementary in the scientific method.</p>
<ul>
<li><p><strong>Popperian Falsification</strong> starts with a scientific hypothesis (for example, "P has an effect on Q"). Its core aim is to actively seek evidence that would disprove this hypothesis. If such disproving evidence is found, the hypothesis is falsified.</p>
</li>
<li><p><strong>Statistical Hypothesis Testing</strong> begins with a null hypothesis, H<sub>0</sub> (for example, "P has no effect on Q"). Its goal is to determine whether the collected data provides sufficiently extreme evidence to reject this null hypothesis.</p>
</li>
</ul>
<p>If the null hypothesis is rejected, it provides statistical support for the alternative hypothesis (that P <em>does</em> have an effect on Q). This statistically supported hypothesis then becomes a stronger candidate, continually subjected to further Popperian attempts at falsification through new experiments and observations.</p>
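<p>To make "sufficiently extreme evidence" concrete, here is a minimal sketch of an exact one-sided binomial test. Everything in it is an assumption for illustration: the data (9 of 10 patients improve), the null model (each patient has a 50% chance of improving if the drug does nothing), and the 0.05 threshold.</p>

```python
from math import comb


def one_sided_p_value(successes, trials, null_probability=0.5):
    """Probability of seeing at least `successes` out of `trials`
    if the null hypothesis (no effect) were true."""
    return sum(
        comb(trials, k)
        * null_probability ** k
        * (1 - null_probability) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 9 of 10 patients improved after taking the drug.
p_value = one_sided_p_value(9, 10)

ALPHA = 0.05  # conventional threshold, chosen here for illustration
reject_null = ALPHA > p_value

print(p_value, reject_null)
```

<p>Here the p-value works out to 11/1024 (about 0.011), so the null hypothesis would be rejected at the 0.05 level. That supports the alternative hypothesis but does not prove it, which is why further attempts at falsification still matter.</p>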
<h3 id="heading-the-nuance-implication-is-not-causality">The Nuance: Implication is Not Causality</h3>
<p>P⟹Q does <strong>not</strong> inherently imply that P causes Q.</p>
<p>Consider these examples:</p>
<ul>
<li><p>"If the fire alarm is sounding, then there is smoke." The alarm doesn't <em>cause</em> the smoke.</p>
</li>
<li><p>"If a colleague screams during code review, then the code is bad." Does the screaming <em>cause</em> the bad code, or merely reveal it? (Perhaps sometimes both! 😰)</p>
</li>
</ul>
<p><strong>Causality</strong> is a real-world concept crucial for making informed decisions, predicting outcomes, and inferring the underlying reasons for events.</p>
<p>It's often central to predictive modeling and supervised learning in data science, where the target variable is the effect and the predictors are proposed causes. A common pitfall here is <strong>data leakage</strong>, where predictors are inadvertently influenced by (or are themselves effects of) the target, violating the causal assumption.</p>
<p>Logic, however, doesn't model time, mechanisms, or interventions. It only cares about <strong>truth values and formal structure</strong>. Logic defines what is true based on premises, not what <em>makes</em> something true in a causal sense.</p>
<h2 id="heading-revisiting-argument-forms-valid-inferences-and-common-fallacies">Revisiting Argument Forms: Valid Inferences and Common Fallacies</h2>
<p>We've now established the rules of implication, understood logical equivalence, and learned about two powerful, valid argument forms: <strong>Modus Ponens</strong> and <strong>Modus Tollens</strong>. But when we try to reason using "if-then" statements, it's easy to fall into common logical traps.</p>
<p>In this section, we'll systematically revisit the four common ways we might try to draw conclusions from an implication <strong>P⟹Q (If you are a coding instructor, then you have a job)</strong> introduced at the start of the handbook.</p>
<p>Two are valid arguments (Modus Ponens and Modus Tollens), and two are common logical fallacies. Understanding the differences is crucial for sound reasoning.</p>
<p>First, let's quickly define the parts of an "if-then" condition:</p>
<ul>
<li><p><strong>Antecedent:</strong> The "if" part of the condition (P).</p>
</li>
<li><p><strong>Consequent:</strong> The "then" part of the condition (Q).</p>
</li>
</ul>
<p>Now, let's examine these four argument forms, using our knowledge of truth tables and the coding instructor example.</p>
<h3 id="heading-affirming-the-antecedent-modus-ponens">Affirming the Antecedent (Modus Ponens)</h3>
<p>This is the first valid argument form we discussed. It's called "affirming the antecedent" because it asserts the truth of the "if" part (the antecedent, P) to conclude the "then" part (the consequent, Q).</p>
<ul>
<li><p><strong>Argument Form:</strong></p>
<ol>
<li><p>If P, then Q (P⟹Q)</p>
</li>
<li><p>P is true.</p>
</li>
<li><p>Therefore, Q is true.</p>
</li>
</ol>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li><p>You are a coding instructor (P), so you have a job (Q).</p>
</li>
<li><p>You provided invalid input data (P), so the code will show an error (Q).</p>
</li>
</ul>
</li>
<li><p><strong>Interpretation:</strong> This argument directly aligns with <strong>Row 1 (P=True, Q=True)</strong> of our truth table, where the implication holds true. It's often the most intuitive form of logical deduction. In programming, it's natural to expect bad input to lead to error messages if the code is designed correctly.</p>
</li>
</ul>
<h3 id="heading-denying-the-consequent-modus-tollens">Denying the Consequent (Modus Tollens)</h3>
<p>This is the second valid argument form. It's called "denying the consequent" because it asserts the falsity of the "then" part (the consequent, ¬Q) to conclude the falsity of the "if" part (the antecedent, ¬P). As we learned, Modus Tollens derives its validity from the logical equivalence of P⟹Q and its contrapositive (¬Q⟹¬P).</p>
<ul>
<li><p><strong>Argument Form:</strong></p>
<ol>
<li><p>If P, then Q (P⟹Q)</p>
</li>
<li><p>Not Q is true (¬Q).</p>
</li>
<li><p>Therefore, Not P is true (¬P).</p>
</li>
</ol>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li><p>You have no job (¬Q), so you are not a coding instructor (¬P).</p>
</li>
<li><p>There are no error messages (¬Q), so the input data is valid (¬P).</p>
</li>
</ul>
</li>
<li><p><strong>Interpretation:</strong> This argument corresponds to <strong>Row 4 (P=False, Q=False)</strong> of our truth table, where P⟹Q is true, and both P and Q are false. This form of reasoning is critical for skillful debugging: it lets you infer reliable conclusions about the cause (P) from observations of the outcome (Q), as long as your program logic (P⟹Q) holds true.</p>
</li>
</ul>
<h3 id="heading-affirming-the-consequent-fallacy">Affirming the Consequent (Fallacy)</h3>
<p>Now we move to the common pitfalls. This is an <strong>invalid argument form</strong> where we attempt to conclude that the antecedent (P) is true simply because the consequent (Q) is true. It's a fallacy because the truth of Q does not guarantee the truth of P, as Q could have been caused by something other than P.</p>
<ul>
<li><p><strong>Argument Form (Invalid):</strong></p>
<ol>
<li><p>If P, then Q (P⟹Q)</p>
</li>
<li><p>Q is true.</p>
</li>
<li><p>Therefore, P is true. (<strong>Incorrect inference!</strong> 🚨)</p>
</li>
</ol>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li><p>You have a job (Q), so you are a coding instructor (P).</p>
<ul>
<li>Incorrect: You could have many other jobs.</li>
</ul>
</li>
<li><p>The code showed an error (Q), so you provided invalid data (P).</p>
<ul>
<li>Incorrect: Other things besides invalid data can cause errors.</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Interpretation:</strong> This fallacy highlights the difference between a one-to-one and a many-to-one relationship. Looking at our truth table, when P⟹Q is True and Q is True, P could be <strong>True (Row 1)</strong> or <strong>False (Row 3)</strong>. The argument mistakenly concludes that P must always be True. The uncertainty arises because observing Q as True doesn't uniquely point to P as the cause: there could be many other reasons or paths that lead to Q.</p>
<ul>
<li>Think of walking down a forest path, unaware that another trail has merged into yours from behind you. When retracing your steps in reverse, you encounter a split (Q) at that merge and feel disoriented, unsure which path leads back to your start point (P). Just as multiple paths can converge on the same point, multiple causes can produce the same outcome.</li>
</ul>
</li>
</ul>
<h3 id="heading-denying-the-antecedent-fallacy">Denying the Antecedent (Fallacy)</h3>
<p>This is another <strong>invalid argument form</strong>. Here, we attempt to conclude that the consequent (Q) is false simply because the antecedent (P) is false. It's a fallacy because P being false does not guarantee that Q will also be false. Q could still be true for other reasons, or the implication might not cover all scenarios where Q occurs.</p>
<ul>
<li><p><strong>Argument Form (Invalid):</strong></p>
<ol>
<li><p>If P, then Q (P⟹Q)</p>
</li>
<li><p>Not P is true (¬P).</p>
</li>
<li><p>Therefore, Not Q is true (¬Q). (<strong>Incorrect inference!</strong> 🚨)</p>
</li>
</ol>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li><p>You are not a coding instructor (¬P), so you have no job (¬Q).</p>
<ul>
<li>Incorrect: You could have a different job.</li>
</ul>
</li>
<li><p>You provided valid data (¬P), so you have no error (¬Q).</p>
<ul>
<li>Incorrect: Valid data doesn't guarantee no error. Other factors like network issues, memory leaks, or non-idempotent operations can still cause errors.</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Interpretation:</strong> Similar to Affirming the Consequent, this fallacy stems from incorrectly assuming a unique relationship. From our truth table, when P⟹Q is True and P is False, Q could be <strong>True (Row 3)</strong> or <strong>False (Row 4)</strong>. The argument mistakenly concludes Q must always be False.</p>
</li>
</ul>
<p>Both of these fallacies (<strong>Affirming the Consequent</strong> and <strong>Denying the Antecedent</strong>) creep into our thinking when we prematurely assume a single cause for an effect. In complex real-world systems, many factors can lead to an outcome, and narrowing your thinking too soon can lead to missed bugs or incorrect conclusions.</p>
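<p>A truth table can also pinpoint exactly where each fallacy breaks. The sketch below lists the rows in which every premise is true but the conclusion is false. The helper names <code>implies</code> and <code>counterexamples</code> are assumptions of this example.</p>

```python
from itertools import product


def implies(p, q):
    """Material implication in its NOT P OR Q form."""
    return (not p) or q


def counterexamples(premises, conclusion):
    """Rows where every premise is true but the conclusion is false."""
    return [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises) and not conclusion(p, q)
    ]


# Affirming the Consequent: premises (P implies Q) and Q, conclusion P.
affirming_the_consequent = counterexamples(
    [implies, lambda p, q: q], lambda p, q: p
)

# Denying the Antecedent: premises (P implies Q) and not P, conclusion not Q.
denying_the_antecedent = counterexamples(
    [implies, lambda p, q: not p], lambda p, q: not q
)

print(affirming_the_consequent, denying_the_antecedent)
```

<p>Both searches return the same single row, P=False and Q=True (Row 3): you have a job, but you are not a coding instructor. One counterexample is enough to make an argument form invalid.</p>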
<h3 id="heading-fallacies-and-implication-a-prerequisite">Fallacies and Implication: A Prerequisite</h3>
<p>Both the fallacy of affirming the consequent and denying the antecedent assume the underlying implication (P⟹Q) is true.</p>
<p>If this implication is false from the start, there's no logical argument to be made, and thus, no fallacy to speak of.</p>
<h3 id="heading-exercise-identifying-an-argument-form">Exercise: Identifying an Argument Form</h3>
<p>Which of the 4 forms of argument is this?</p>
<ul>
<li><strong>Penguins can’t fly. I can’t fly. Therefore, I’m a penguin.</strong></li>
</ul>
<p><em>Hint: Rephrase the first statement into an if-then form</em>.</p>
<h2 id="heading-denying-the-antecedent-a-database-example">Denying the Antecedent: A Database Example</h2>
<p>We just saw that Denying the Antecedent is a logical fallacy, meaning that even if the initial implication (P⟹Q) is true, concluding ¬Q from ¬P is not a valid inference. To make this abstract concept concrete, and to illustrate why this fallacy can be particularly dangerous in real-world systems like software, let's explore a practical example involving a database.</p>
<p>The implication: <strong>If the database is down (P), we’ll see a connection timeout error (Q).</strong></p>
<p>Now, applying the fallacy of Denying the Antecedent, we might incorrectly conclude: <strong>If the database is not down (¬P), we will not see a connection timeout error (¬Q). ❌</strong></p>
<p>But even if the database itself is perfectly operational and "not down," you might still encounter a connection timeout error. This could happen due to a variety of other, independent reasons, such as:</p>
<ul>
<li><p>Network problems</p>
</li>
<li><p>Firewall rules</p>
</li>
<li><p>The database is up but extremely slow</p>
</li>
<li><p>The query engine is stuck</p>
</li>
</ul>
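<p>The list above can be turned into a toy model. This is only a sketch: the function, its parameters, and the idea that each cause independently triggers a timeout are simplifying assumptions, not a description of any real database client.</p>

```python
def connection_times_out(database_down, network_ok=True,
                         firewall_allows=True, database_slow=False):
    """Toy model: a timeout has several independent causes."""
    return (
        database_down
        or not network_ok
        or not firewall_allows
        or database_slow
    )


# P implies Q holds: whenever the database is down, we see a timeout.
assert connection_times_out(database_down=True)

# Denying the antecedent fails: the database is up (not P),
# yet a blocking firewall rule still produces a timeout (Q).
timeout_with_db_up = connection_times_out(
    database_down=False, firewall_allows=False
)
print(timeout_with_db_up)
```

<p>In this model, "database is not down" tells you nothing about the timeout until every other input has been checked as well.</p>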
<p>This specific example of multiple potential causes for a "timeout" highlights a broader, critical skill in software development: <strong>thorough case analysis</strong>.</p>
<p>This is precisely why technical assessments, especially in areas like algorithms and system design, frequently demand that you consider exhaustive possibilities. For instance, you are often asked to handle <strong>base and recursive cases in dynamic programming</strong>, or to ensure <strong>mutually exclusive and collectively exhaustive coverage when grouping multiple scenarios in problems like interval merging.</strong></p>
<p>Such strong case analysis is vital for minimizing bugs and cultivating an open-minded approach to considering multiple causal paths, driven by experience, curiosity, and a dedication to craftsmanship.</p>
<p>But even perfect case analysis doesn't guarantee a correct implementation. Weak language mastery or mistaken assumptions can still lead to errors, making tests a crucial last line of defense.</p>
<p>Before jumping into applying logic to software testing, let’s practice our agility in conceptually switching between real-world concepts in English and symbols in logic.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750012280729/731cd405-1a5c-45c1-8d16-9e6b28837979.jpeg" alt="kitten in front of computer screen full of code" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-assigning-real-world-meanings-to-logic">Assigning Real-World Meanings to Logic</h2>
<p>We must define what P, Q, and P⟹Q refer to when applying logical theory to real-world concepts.</p>
<p>How we define these variables affects our truth tables.</p>
<p>For example:</p>
<ul>
<li><p>If <strong>P means "valid input,"</strong> then ¬P means "invalid input."</p>
</li>
<li><p>If <strong>P means "invalid input,"</strong> then ¬P means "valid input."</p>
</li>
</ul>
<p>Imagine we define <strong>P = "Good input"</strong> and <strong>Q = "No Error."</strong></p>
<ul>
<li><p>When testing the <strong>happy path</strong>, we are verifying that the implication <strong>P⟹Q (If input is good, then no error)</strong> holds true.</p>
</li>
<li><p>When testing the <strong>unhappy path</strong> (mutation testing, more details later), we are verifying that <strong>¬P⟹¬Q (If input is not good, then an error occurs)</strong> holds true.</p>
</li>
</ul>
<p>In any test, a failure indicates that the tested implication is false. This warrants investigation into whether the issue lies with the specification's interpretation, the implementation, or even the test itself.</p>
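<p>Here is a minimal sketch of both paths under the definitions P = "Good input" and Q = "No Error". The function <code>parse_age</code> and its rules (whole numbers from 0 to 150) are hypothetical, invented only to have something to test.</p>

```python
def parse_age(text):
    """Hypothetical function under test: accepts whole numbers 0 to 150."""
    if not text.isdigit():
        raise ValueError(f"not a whole number: {text!r}")
    age = int(text)
    if age > 150:
        raise ValueError(f"age out of range: {age}")
    return age


def raises_error(text):
    """Return True if parse_age rejects the input."""
    try:
        parse_age(text)
    except ValueError:
        return True
    return False


# Happy path, P implies Q: good input, so no error.
assert not raises_error("42")

# Unhappy path, not P implies not Q: bad input, so an error occurs.
assert raises_error("forty-two")
assert raises_error("999")
```

<p>Note that the two paths test two different implications. Passing the happy-path check says nothing about whether bad input is rejected, so both kinds of test are needed.</p>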
<h2 id="heading-applying-logic-to-software-testing">Applying Logic to Software Testing</h2>
<p>Software development relies on constructing systems that behave predictably. <strong>Software testing</strong> is our primary tool for validating these behaviors. At its core, testing is a process deeply rooted in logical implications, where we propose a hypothesis about our code and then run an experiment (the test) to check its truth.</p>
<p>A test case is carefully designed to evaluate a specific piece of code. This involves:</p>
<ol>
<li><p><strong>Setting up Preconditions and Inputs:</strong> Before executing the code under test, we meticulously establish a specific environment and provide particular inputs. This includes:</p>
<ul>
<li><p><strong>Function/Method Arguments:</strong> The precise values passed into the code being tested.</p>
</li>
<li><p><strong>System State:</strong> Setting up relevant data in a database, preparing the content of a file system, configuring an object's instance variables, or dictating the responses of external services (often through "mocks" or "stubs").</p>
</li>
<li><p><strong>Environmental Factors:</strong> Controlling elements like the current time, specific network conditions, or user permissions relevant to the code's execution. This precise setup ensures that the code runs under defined conditions, allowing us to evaluate its behavior consistently.</p>
</li>
</ul>
</li>
</ol>
<p>Once the setup is complete, the code under test is executed, and its output or behavior is observed. This observation is then compared against an <strong>expected result</strong>.</p>
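<p>The sketch below shows those three kinds of setup in one small test, using Python's standard <code>unittest.mock.Mock</code>. The function <code>greeting</code>, the <code>user_store</code> service, and its <code>get_name</code> method are hypothetical names made up for this example.</p>

```python
from datetime import datetime
from unittest.mock import Mock


def greeting(user_id, user_store, now):
    """Hypothetical code under test: greet a user based on the hour."""
    name = user_store.get_name(user_id)
    part_of_day = "morning" if 12 > now.hour else "afternoon"
    return f"Good {part_of_day}, {name}!"


def test_greeting_in_the_morning():
    # Preconditions and inputs
    user_store = Mock()                         # system state: stubbed service
    user_store.get_name.return_value = "Ada"
    fixed_now = datetime(2025, 1, 1, 9, 0)      # environment: controlled time

    # Execute the code under test (function argument: user_id=7)
    actual = greeting(7, user_store, fixed_now)

    # Compare the observation against the expected result
    assert actual == "Good morning, Ada!"
    user_store.get_name.assert_called_once_with(7)


test_greeting_in_the_morning()
```

<p>Passing the current time in as an argument, instead of calling <code>datetime.now()</code> inside <code>greeting</code>, is a design choice that keeps the test repeatable at any hour of the day.</p>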
<p>To precisely analyze test outcomes, let's establish our specific logical mapping:</p>
<ul>
<li><p><strong>P: The code under test is correct for the specific scenario defined by the test.</strong> This refers to the <em>actual, objective state</em> of the code's internal logic and implementation when presented with the test's preconditions and inputs. If P is True, the code is without defect for this case. If P is False, there is a bug or deviation.</p>
</li>
<li><p><strong>Q: The test passes.</strong> This means the actual output or behavior observed from the code precisely matches the expected outcome defined in our test case. If they do not match, the test fails.</p>
</li>
<li><p><strong>P⟹Q: If the code under test is correct for this specific scenario, then the test will pass.</strong> In pure propositional logic, the truth value of P⟹Q is indeed defined by the truth values of P and Q. But in the context of software testing, P⟹Q represents our <strong>hypothesis or desired specification</strong> for how the code <em>should</em> behave. We don't directly "know" P's truth value beforehand. Instead, the test's execution provides empirical data (the actual Q) that allows us to <strong>evaluate whether this hypothesis holds true in practice</strong>, and thereby infer the actual state of P.</p>
</li>
</ul>
<p>Understanding this mapping is vital for interpreting test results. Let's examine the different outcomes of a test run, referencing the truth table for P⟹Q:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750280931102/bc300c03-ce17-456d-9a7e-47c8e649cfd6.png" alt="Truth table - explained in the text below" width="600" height="400" loading="lazy"></p>
<ul>
<li><p><strong>Row 1: P is True (Code is correct), Q is True (Test passes)</strong></p>
<ul>
<li><p><strong>Interpretation in Testing: Ideal State/Validation</strong></p>
<ul>
<li><p>This is the desired outcome and strengthens our confidence that the code adheres to its specification.</p>
</li>
<li><p>This outcome is consistent with our hypothesis (P⟹Q), although a single passing run supports the hypothesis rather than proving it in general.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Row 2: P is True (Code is correct), Q is False (Test fails)</strong></p>
<ul>
<li><p><strong>Interpretation in Testing: Logical Contradiction / Falsification of Hypothesis</strong></p>
<ul>
<li><p>This row means our overall hypothesis P⟹Q is <em>false</em> for this specific instance.</p>
</li>
<li><p>This demands investigation: either our initial assumption that P <em>was</em> True (meaning the code was correct) is wrong (i.e., there's an actual bug, so P is actually False), or the test itself is flawed (its inputs/expectations are incorrect), or the specification is wrong.</p>
</li>
<li><p>This is where we rethink the P⟹Q hypothesis itself.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Row 3: P is False (Code is incorrect), Q is True (Test passes)</strong></p>
<ul>
<li><p><strong>Interpretation in Testing: False Positive / Inadequate Test</strong></p>
<ul>
<li><p>This is a problematic scenario. It implies the test is not robust enough to detect the defect in the code, or the test's expectation is flawed.</p>
</li>
<li><p>While P⟹Q remains true vacuously, this outcome is misleading and means the test is not effectively verifying code correctness.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Row 4: P is False (Code is incorrect), Q is False (Test fails)</strong></p>
<ul>
<li><p><strong>Interpretation in Testing: Bug Found / Confirmation of Incorrectness</strong></p>
<ul>
<li><p>This is a beneficial outcome, as the test has successfully identified a defect.</p>
</li>
<li><p>When P is truly False, P⟹Q is vacuously true.</p>
</li>
<li><p>This row can represent either a known, intended 'P is False' state (e.g., TDD Red phase) or the <em>actual state discovered</em> via deduction (explained below in Scenario 1).</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
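<p>The four rows above can also be generated mechanically. The following is a minimal sketch, not part of any testing framework: the helper names <code>implies</code> and <code>truth_table</code> are made up for illustration, and the labels are copied from the interpretations listed above.</p>

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# Labels taken from the four rows discussed above.
INTERPRETATION = {
    (True, True): "Ideal state / validation",
    (True, False): "Contradiction / hypothesis falsified",
    (False, True): "False positive / inadequate test",
    (False, False): "Bug found",
}

def truth_table():
    """Return (P, Q, P implies Q, interpretation) for all four rows."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, implies(p, q), INTERPRETATION[(p, q)]))
    return rows

for p, q, pq, meaning in truth_table():
    print(f"P={p!s:5} Q={q!s:5} P=>Q={pq!s:5} {meaning}")
```

<p>Only the second row (P True, Q False) makes the implication false, which is why that row is the one that forces us to rethink the hypothesis.</p>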
<h3 id="heading-note-on-this-contextualized-truth-table-and-probabilistic-nature"><strong>Note on this Contextualized Truth Table and Probabilistic Nature</strong></h3>
<p>This truth table differs from a purely abstract logical truth table by being explicitly contextualized for software testing.</p>
<ul>
<li><p><strong>Specific Definitions:</strong> Unlike a generic P and Q, here they have precise meanings within the domain of code correctness and test outcomes.</p>
</li>
<li><p><strong>"Interpretation in Testing" Column:</strong> This is the key distinguishing feature. It translates the raw logical outcomes of (P, Q, and P⟹Q) into actionable insights and common debugging/development scenarios for software engineers. It explains <em>what it means</em> when a particular row is observed in the context of testing.</p>
</li>
<li><p><strong>Probabilistic Confidence:</strong> While formal logic operates in binary (True/False), real-world software testing often involves <strong>probabilistic confidence</strong>. A test doesn't provide absolute logical proof of correctness (for example, a passing test doesn't guarantee P is 100% True due to the possibility of undiscovered bugs or false positives). Instead, test results <em>increase our confidence</em> that the code is correct, or <em>provide strong evidence</em> that it is incorrect. Testing is fundamentally about reducing uncertainty and increasing the probability that our code functions as intended.</p>
</li>
</ul>
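<p>One way to make "probabilistic confidence" concrete is Bayes' rule. The sketch below is only an illustration: the function name <code>posterior_correct</code> and all three input probabilities are assumptions chosen for the example, not measurements from any real project.</p>

```python
def posterior_correct(prior: float, pass_if_correct: float, pass_if_buggy: float) -> float:
    """P(code correct | test passed) by Bayes' rule."""
    evidence = prior * pass_if_correct + (1 - prior) * pass_if_buggy
    return prior * pass_if_correct / evidence

# Illustrative numbers only: a 50% prior, a test that always passes on
# correct code, and a 20% chance that buggy code slips through.
print(round(posterior_correct(0.5, 1.0, 0.2), 3))  # 0.833
```

<p>With these assumed numbers, a passing test raises our confidence from 50% to about 83%, not to 100%. If buggy code were just as likely to pass as correct code (<code>pass_if_buggy</code> equal to <code>pass_if_correct</code>), the posterior would equal the prior, and the passing test would tell us nothing.</p>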
<p>Let's now explore how these logical outcomes are interpreted in two common testing scenarios:</p>
<h3 id="heading-scenario-1-debugging-an-unexpected-defect-applying-modus-tollens">Scenario 1: Debugging an Unexpected Defect (Applying Modus Tollens)</h3>
<p>This scenario occurs when a test that was previously passing, or a newly written test that we strongly trust as a precise and correct specification, unexpectedly fails. In this context, we assume the validity of the implication P⟹Q for this specific test case, treating it as an unbreakable rule for how correct code <em>should</em> behave.</p>
<ol>
<li><p><strong>Our Core Premise (Trusted Specification):</strong> We operate under the assumption that the implication "P⟹Q" ("If the code is correct for this scenario, then the test passes") is <strong>True</strong> for this specific test. Our confidence stems from the test's meticulous design, its history of passing, or its role in a well-established regression suite.</p>
</li>
<li><p><strong>Test Execution and Observation:</strong> We run the test, which has its preconditions and inputs set.</p>
<ul>
<li><p><strong>If the Test Fails (Q is False):</strong> This is the key observation. Since we <strong>trust our premise that P⟹Q is True</strong>, and we observe ¬Q (the test fails), we are logically compelled to deduce that our initial belief about P (the code being correct for this scenario) must be false.</p>
<ul>
<li><p><strong>Application of Modus Tollens:</strong></p>
<ul>
<li><p>Premise 1: If the code is correct for this scenario (P), then the test passes (Q). (P⟹Q, assumed true as a trusted specification).</p>
</li>
<li><p>Premise 2: The test did not pass (¬Q).</p>
</li>
<li><p>Conclusion: Therefore, the <strong>code is not correct for this scenario (¬P).</strong></p>
</li>
</ul>
</li>
<li><p><strong>Outcome:</strong> This inference directly points us to a defect in the code. The test's failure, given its trusted nature, <em>reveals</em> that the actual state of the code for this scenario is <strong>P is False</strong>. This effectively places the scenario in <strong>Row 4 (P False, Q False)</strong> of our truth table, confirming the presence of a bug that needs fixing. This is typical in <strong>regression testing</strong>, where a previously correct feature suddenly breaks.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
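<p>We can check by brute force that this modus tollens deduction is valid, and that the tempting reverse inference is not. This is a small sketch with hypothetical helper names (<code>implies</code>, <code>is_valid</code>): an argument form counts as valid when its conclusion holds in every truth assignment that satisfies all of its premises.</p>

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def is_valid(premises, conclusion):
    """An argument form is valid if the conclusion holds in every
    truth assignment where all premises hold."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False
    return True

# Premises: P implies Q, and not Q. Conclusion: not P.
modus_tollens = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: not q],
    conclusion=lambda p, q: not p,
)
# Premises: P implies Q, and Q. Conclusion: P.
affirming_the_consequent = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: q],
    conclusion=lambda p, q: p,
)
print(modus_tollens, affirming_the_consequent)  # True False
```

<p>The deduction is only as strong as its first premise. If the trusted test turns out to be wrong, the argument form is still valid, but its conclusion no longer follows from true premises.</p>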
<h3 id="heading-scenario-2-validatingrefining-the-specification-falsifying-pq-or-confirming-known-incorrectness">Scenario 2: Validating/Refining the Specification (Falsifying P⟹Q or Confirming Known Incorrectness)</h3>
<p>This scenario arises when a test fails, and our primary focus is not immediately on debugging the code as if it's a regression. Instead, it's on understanding <em>why</em> the P⟹Q relationship (our hypothesis for this specific behavior) isn't holding, or simply confirming an expected failure. This can involve questioning the test itself, the underlying requirements, or confirming a deliberately incorrect state of the code.</p>
<ol>
<li><p><strong>Our Hypothesis (Being Challenged or Confirmed):</strong> We are either actively evaluating the validity of the implication "P⟹Q" for a specific behavior, or we are running a test against code we know is incomplete or incorrect.</p>
</li>
<li><p><strong>Test Execution and Observation:</strong> We run the test with its defined preconditions and inputs.</p>
</li>
<li><p><strong>If the Test Fails (Q is False):</strong> The interpretation here depends on our prior knowledge or intent about the code's state (P):</p>
<ul>
<li><p><strong>Sub-scenario 2A: Falsifying P⟹Q and Rethinking Specification (Corresponds to Row 2: P True, Q False):</strong></p>
<ul>
<li><p>We observe Q is False (the test fails).</p>
</li>
<li><p>If we then examine the code and the requirements, and we conclude that the code <em>should</em> have been correct for this scenario (meaning, our expectation/belief was P is True), then the test result means <strong>the specific instance of our hypothesis "P⟹Q" is FALSE.</strong></p>
</li>
<li><p>This direct falsification reveals a contradiction. We must then investigate:</p>
<ul>
<li><p>Is our initial belief that P was True mistaken (that is, is there a genuine bug in the code that makes P actually False, moving this to a Row 4 scenario)?</p>
</li>
<li><p>Or, is the test itself incorrect (its inputs or expected output are wrong), meaning our P⟹Q premise needs to be re-evaluated and corrected?</p>
</li>
<li><p>Or, have the underlying requirements changed or been misunderstood?</p>
</li>
</ul>
</li>
<li><p><strong>Outcome:</strong> This critical outcome prompts us to "rethink" – either the code needs fixing, or the test needs adjusting, or the specification needs clarification. This is common in <strong>exploratory testing</strong> or when working with new/evolving features where the exact behavior is still being defined.</p>
</li>
</ul>
</li>
<li><p><strong>Sub-scenario 2B: Confirming Known Incorrectness (Corresponds to Row 4: P False, Q False):</strong></p>
<ul>
<li><p>We observe Q is False (the test fails).</p>
</li>
<li><p>We <em>already know or intentionally designed</em> the code to be incorrect for this scenario (that is, we are actively developing a feature and haven't written the full code yet, or we're running a test against a known, un-fixed bug, so our expectation is P is False).</p>
</li>
<li><p>The test result simply <strong>confirms our prior knowledge that P is False</strong>. The test correctly highlights the missing or incorrect behavior. In this case, the P⟹Q implication is vacuously true, and the test effectively served its purpose of showing the existing defect.</p>
</li>
<li><p><strong>Outcome:</strong> This is typical in Test-Driven Development (TDD) in the Red phase, where a failing test for a not-yet-implemented feature confirms the "P is False" state, guiding development to make P True. It also applies when verifying that a bug fix indeed works: the test initially fails (confirming the bug), and then passes after the fix (confirming P is now True).</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749063701013/bc574591-90ec-4439-9b47-f0737d5a5384.jpeg" alt="girl looking into microscope" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-a-closer-look-at-testing">A Closer Look at Testing</h2>
<h3 id="heading-the-illusion-of-correctness-affirming-the-consequent">The Illusion of Correctness: Affirming the Consequent</h3>
<p>Consider a common scenario where a test passes, seemingly validating our code:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_user_role</span>(<span class="hljs-params">user_id</span>):</span>
    <span class="hljs-keyword">if</span> user_id == <span class="hljs-number">42</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"admin"</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"guest"</span>

<span class="hljs-comment"># test</span>
<span class="hljs-keyword">assert</span> get_user_role(<span class="hljs-number">42</span>) == <span class="hljs-string">"admin"</span>
</code></pre>
<p>Here, our implicit claim (the specification) is: <strong>If the code is correct (P), then the output will match the expectation (Q).</strong></p>
<p>In this example, the test passes – the output is "admin" <strong>(Q)</strong>, but can we definitively conclude that the function is correct <strong>(P)</strong>? Not necessarily.</p>
<p>This scenario often exemplifies the logical fallacy of <strong>affirming the consequent</strong>. We see the desired outcome (Q) and mistakenly assume that our specific intended cause (P, the correctness of <em>our specific implementation path</em>) was the reason.</p>
<p><strong>The Problem:</strong> What if the real condition for an "admin" role should be checking a database, but we have temporarily hardcoded the value for testing? The test would pass, but the correctness is illusory. If we see P as false because the code did not implement the behaviour from the full specification, this corresponds to Row 3 (P False, Q True: False Positive) in our truth table.</p>
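<p>As mentioned earlier, deliberately leaving the code incorrect (¬P) is useful when the test then fails (¬Q), as in the TDD Red phase. It is unhelpful, or even misleading, when the test passes anyway (Q), because the passing result hides the missing behavior.</p>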
<p>Even without hardcoding, the output might match by coincidence, or because of factors outside the direct logic we intended to test. This can happen due to:</p>
<ul>
<li><p><strong>Default behavior:</strong> A broader system default might produce the expected output.</p>
</li>
<li><p><strong>Caching:</strong> A previous successful operation might have cached the result, bypassing the actual logic.</p>
</li>
<li><p><strong>Fallback logic:</strong> An unintended fallback mechanism produces the correct output despite an error in the primary path.</p>
</li>
<li><p><strong>Test harness bugs:</strong> Flaws in the testing setup itself might obscure real issues.</p>
</li>
</ul>
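<p>To see how a test can pass for the wrong reason, compare the hardcoded function with a lookup-based one. This is a sketch under stated assumptions: the <code>ROLES</code> dictionary stands in for the database mentioned above, and both function names are invented for this illustration.</p>

```python
# Hypothetical role store standing in for the "database" mentioned above.
ROLES = {42: "admin", 7: "admin", 99: "guest"}

def get_user_role_hardcoded(user_id):
    """The shortcut from the example: only user 42 is ever an admin."""
    if user_id == 42:
        return "admin"
    return "guest"

def get_user_role_from_store(user_id, roles=ROLES):
    """What the full specification is assumed to require: a lookup."""
    return roles.get(user_id, "guest")

# The original assertion cannot tell the two implementations apart.
assert get_user_role_hardcoded(42) == "admin"
assert get_user_role_from_store(42) == "admin"

# A second admin exposes the shortcut (Row 3 becomes Row 4).
print(get_user_role_hardcoded(7))   # guest
print(get_user_role_from_store(7))  # admin
```

<p>Adding one more input is enough to separate the implementations. The original test was not wrong, but it was too weak to distinguish correct code from the shortcut.</p>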
<h3 id="heading-the-role-and-risks-of-test-doubles">The Role and Risks of Test Doubles</h3>
<p>The challenges highlighted above are particularly relevant when using <strong>test doubles</strong>, such as Stubs and Mocks. These are artificial components that replace real dependencies (for example, databases, external APIs, time-sensitive operations) during testing.</p>
<ul>
<li><p><strong>Stubs</strong> focus on <strong>state</strong>: they provide pre-programmed fake data or return values so that the rest of the code under test behaves predictably, much like the hardcoded return value in the <code>get_user_role</code> example.</p>
</li>
<li><p><strong>Mocks</strong> focus on <strong>behavior</strong>: they allow you to verify interactions, such as the number of calls made to a certain API, or how control passes through specific parts of the system.</p>
</li>
</ul>
<p>Both remove external dependencies, allowing you to isolate and focus on the internal logic of the code without noise or side effects. But using them without understanding their limitations can lead to <strong>false confidence</strong>.</p>
<p>If a test double simulates a "correct" response, but the real dependency it replaces has a bug, or the way the main code interacts with that dependency is flawed, the test will pass (Q is True) – yet P (the code's overall correctness in a real environment) might be False, leading to a dangerous false positive.</p>
<p>Whether you encounter such logical fallacies in your testing depends on precisely what behavior or state you are attempting to verify, and whether you are over-interpreting the test results.</p>
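<p>Here is a minimal sketch of that false-confidence risk using Python's standard <code>unittest.mock</code>. The function <code>is_admin</code>, the <code>fetch</code> method, and the <code>RealisticDirectory</code> class are all hypothetical. In particular, the assumption that the real service returns a <code>"roles"</code> list rather than a single <code>"role"</code> string is invented to show a mismatch between the double and the real dependency.</p>

```python
from unittest.mock import Mock

def is_admin(user_id, directory):
    """Code under test: asks a directory service for the user's record."""
    record = directory.fetch(user_id)
    return record["role"] == "admin"

# Stub: returns the shape we *believe* the real service uses.
stub_directory = Mock()
stub_directory.fetch.return_value = {"role": "admin"}
assert is_admin(42, stub_directory)  # Q is True
stub_directory.fetch.assert_called_once_with(42)  # mock-style interaction check

class RealisticDirectory:
    """Hypothetical stand-in for the real service, which is assumed here to
    return a list under "roles" rather than a single "role" string."""
    def fetch(self, user_id):
        return {"roles": ["admin"]}

try:
    is_admin(42, RealisticDirectory())
except KeyError as exc:
    print("integration failure, missing key:", exc)
```

<p>The test with the stub passes, yet the same code raises <code>KeyError</code> against the more realistic dependency. The double verified our assumption about the dependency, not the dependency itself.</p>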
<h3 id="heading-test-scope-and-interpretation">Test Scope and Interpretation</h3>
<p>The choice of testing scope – from narrowly focused unit tests to broader integration tests, system tests, user acceptance tests (UAT), and even testing in production – represents a continuum. On this spectrum, various trade-offs are involved, especially concerning the effort-reward ratio. This effort is influenced by factors like individual developer skill, company engineering practices (for example, responsibility split between feature developer and dedicated tester roles), and industry regulations.</p>
<p>Generally:</p>
<ul>
<li><p><strong>Smaller-scoped tests</strong> (for example, unit tests) have fewer assumptions baked in and a shorter chain of logical implications. This translates to less risk of committing fallacies in both test implementation and test result interpretation. They are excellent for quickly verifying isolated units of code.</p>
</li>
<li><p><strong>Larger-scoped tests</strong> (for example, end-to-end integration tests) incorporate more real-world complexities and dependencies. While providing higher confidence in the system's overall behavior, they inherently increase the potential for confounding factors that can lead to false positives or make debugging more challenging.</p>
</li>
</ul>
<p>Being acutely aware of the assumptions implicit in each test, at every scope level, is paramount. Passing tests for the wrong reasons will inevitably cause problems down the road.</p>
<h3 id="heading-debugging-observability-and-mental-models">Debugging, Observability, and Mental Models</h3>
<p>Failing tests are not failures of the testing process but are, in fact, incredibly valuable learning moments. They represent opportunities to:</p>
<ul>
<li><p>Run focused debugging experiments to pinpoint the exact cause of the failure.</p>
</li>
<li><p>Refine your <strong>mental model of the code-to-outcome (P⟹Q) link</strong>. A failing test (where Q is False) tells you that your current understanding of P, or of the P⟹Q relationship, is flawed. Use this feedback to update your understanding of the code's actual behavior.</p>
</li>
<li><p>Improve both the code and the tests themselves.</p>
</li>
</ul>
<p>Failing tests are also a prompt to enhance system <strong>observability</strong> so that outcomes (Q) are easier to detect and confirm. The more clearly we can observe Q, from multiple angles and through diverse methods (for example, logs, metrics, tracing, output inspection), the more confident we can be in its causes and, by extension, in the actual state of P.</p>
<p>Crucially, avoid blindly fixing tests just to make them pass. Always ensure you thoroughly understand why a test failed and update your P⟹Q model accordingly. The ultimate goal is not just to fix current bugs, but to prevent them in the future by continually strengthening both the correctness of the code and the verifiability of its behavior.</p>
<h3 id="heading-falsifiable-tests-reveal-regressions">Falsifiable Tests Reveal Regressions</h3>
<p>Beyond avoiding false positives (where the code is incorrect but the test passes), a good test must also be <strong>falsifiable</strong>. This means the test must be genuinely capable of failing under certain (incorrect) conditions. An unfalsifiable test is a broken test – it cannot serve its purpose of revealing regressions or confirming the presence of bugs.</p>
<p>While we strive for the implication P⟹Q to hold true for all the scenarios we care about, it may not be true for all cases due to unforeseen or mistaken assumptions, or simply because the code is incorrect. The test's ability to demonstrate this incorrectness by failing under specific, well-defined conditions makes it profoundly valuable.</p>
<p>Some common culprits for unfalsifiable or "bad" tests include:</p>
<ul>
<li><p><strong>Vague or Untestable Specifications:</strong> Statements like "The system should behave well under most conditions," "It shouldn't crash randomly," or "The algorithm is robust" lack clear, measurable criteria. It's impossible to design a test that definitively passes or fails against such statements, thus rendering them effectively unfalsifiable.</p>
</li>
<li><p><strong>Broken Implementations of the Test Suite:</strong> The test code itself might be flawed, perhaps due to logical errors or control flow issues that prevent assertions from ever being reached or correctly evaluated, inadvertently taking the same passing path regardless of the code under test.</p>
</li>
<li><p><strong>Insufficient Test Data or Edge Cases:</strong> If tests only cover "happy path" scenarios and fail to include challenging inputs or boundary conditions, they might pass for incorrect code that only breaks under specific, untested circumstances.</p>
</li>
</ul>
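<p>The "broken implementation of the test suite" case is easy to reproduce. In this sketch, all function names are hypothetical and the tests are written as plain functions returning <code>True</code> or <code>False</code> rather than using a real test runner, so that both can be run against a correct and a deliberately wrong implementation.</p>

```python
def total_price(prices):
    return sum(prices)

def buggy_total_price(prices):
    return 0  # deliberately wrong

def unfalsifiable_test(func):
    """Broken test: the assertion error is swallowed, so it always 'passes'."""
    try:
        assert func([1, 2, 3]) == 6
    except Exception:
        pass  # hides AssertionError along with everything else
    return True

def falsifiable_test(func):
    """Same check, but a wrong result is reported as a failure."""
    return func([1, 2, 3]) == 6

for impl in (total_price, buggy_total_price):
    print(impl.__name__, unfalsifiable_test(impl), falsifiable_test(impl))
```

<p>The broken test reports success for both implementations, so its result carries no information about P. The falsifiable version passes only for the correct one.</p>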
<p>A robust specification clearly defines what constitutes success and failure. Correspondingly, a good test suite correctly implements that specification, making its tests both accurate and truly falsifiable.</p>
<h3 id="heading-take-a-step-back">Take a Step Back</h3>
<p>Critical thinkers might observe that the application of the four fundamental logical argument forms to coding scenarios, as initially presented, could be misleading in the complexities of real-world software.</p>
<p>The next section shows some nuances that arise when we transition from the clear-cut rules of formal logic to the often messy reality of software development.</p>
<p>Specifically:</p>
<ul>
<li><p>The first two points below show why the seemingly valid arguments of Modus Ponens and Modus Tollens may not always lead to reliable conclusions when applied to coding scenarios.</p>
</li>
<li><p>The last two points below show why the two common logical fallacies, Affirming the Consequent and Denying the Antecedent, may actually provide correct insights under specific real-world coding conditions.</p>
</li>
</ul>
<h2 id="heading-revisiting-the-four-statements-for-coding">Revisiting the Four Statements for Coding</h2>
<p>Here are the four arguments and their associated coding examples:</p>
<ol>
<li><p><strong>Modus Ponens:</strong> If you provide invalid input data (P), the code will show an error (Q).</p>
</li>
<li><p><strong>Modus Tollens:</strong> There are no error messages (¬Q), so the input data is valid (¬P).</p>
</li>
<li><p><strong>Affirming the Consequent (Fallacy):</strong> The code showed an error (Q), so you provided invalid data (P).</p>
</li>
<li><p><strong>Denying the Antecedent (Fallacy):</strong> You provided valid data (¬P), so you have no error (¬Q).</p>
</li>
</ol>
<p>Now, let's dive into the nuances of each:</p>
<h3 id="heading-modus-ponens">Modus Ponens</h3>
<ul>
<li><p><strong>Our coding example:</strong> If you provide invalid input data (P), then the code will show an error (Q).</p>
</li>
<li><p><strong>Why it may not always hold:</strong> This application of Modus Ponens assumes that either your code or any third-party code it relies upon will <em>always</em> properly detect and explicitly raise exceptions or show errors on bad data. In reality, systems might automatically fix or sanitize bad input, silence errors, or simply proceed with unexpected behavior without explicitly signaling an error, leading to a passing (or non-failing) state (¬Q) even when P (invalid input) was true.</p>
</li>
</ul>
<h3 id="heading-modus-tollens">Modus Tollens</h3>
<ul>
<li><p><strong>Our coding example:</strong> There are no error messages (¬Q), so the input data is valid (¬P).</p>
</li>
<li><p><strong>Why it may not always hold:</strong> This application of Modus Tollens assumes there are no automatic mechanisms within the system to fix or silence bad input <em>before</em> errors are typically displayed. If such "silent correction" or "error suppression" occurs, you might observe no error messages (¬Q), but the input data could still be invalid (P), rendering the conclusion (¬P) false despite the premise (¬Q) being true. This highlights the dangers of incomplete observability.</p>
</li>
</ul>
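<p>The "silent correction" problem described in the two sections above can be sketched with a pair of hypothetical parsers. Neither function comes from a real library. They only illustrate how swallowing an exception breaks the link between invalid input (P) and a visible error (Q).</p>

```python
def parse_age_lenient(raw):
    """Silently 'fixes' bad input: invalid data (P) produces no error."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 0  # error swallowed and replaced by a default

def parse_age_strict(raw):
    """Raises ValueError on bad input, so P reliably leads to Q."""
    try:
        return int(raw)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"invalid age: {raw!r}") from exc

print(parse_age_lenient("abc"))  # 0, with no error shown
try:
    parse_age_strict("abc")
except ValueError as exc:
    print("error:", exc)
```

<p>With the lenient parser, observing no error tells us nothing about whether the input was valid, so modus tollens on "no error, therefore valid input" is unsafe. With the strict parser, the premise P⟹Q actually holds, and the inference becomes trustworthy again.</p>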
<h3 id="heading-affirming-the-consequent-fallacy-1">Affirming the Consequent (Fallacy)</h3>
<ul>
<li><p><strong>Our coding example:</strong> The code showed an error (Q), so you provided invalid data (P).</p>
</li>
<li><p><strong>Why it may actually be correct:</strong> While logically a fallacy, in specific, highly constrained real-world conditions, this inference can gain practical validity. If the error message is so uniquely and specifically defined that it can <em>only</em> be caused by invalid input data (P) and no other known factor, then this statement can become reliable. This is rare and typically requires meticulous error handling design where each error message maps unambiguously to a single root cause.</p>
</li>
</ul>
<h3 id="heading-denying-the-antecedent-fallacy-1">Denying the Antecedent (Fallacy)</h3>
<ul>
<li><p><strong>Our coding example:</strong> You provided valid data (¬P), so you have no error (¬Q).</p>
</li>
<li><p><strong>Why it may actually be correct:</strong> Although a fallacy in general logic, this inference can hold with a high degree of practical confidence under certain programming paradigms, notably <strong>functional programming</strong>. If the code is sufficiently simple, purely functional (meaning outputs depend <em>only</em> on inputs and have no side effects), and has no external dependencies (like network or database interactions), then the absence of invalid data (¬P) can indeed make us reasonably confident that there will be no errors (¬Q). The lack of external variables and internal state makes the code's behavior highly predictable and directly tied to its inputs.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749061917858/db44dba5-2184-427a-8e28-27fc59904c49.jpeg" alt="dog with head tilted" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You may now be thinking: what’s the point of studying logic if it has so many loopholes and edge cases when applied to coding?</p>
<h2 id="heading-the-missing-ingredient-if-and-only-if">The Missing Ingredient – If and Only If</h2>
<p>In our exploration of logical implications, we've focused primarily on the <strong>unidirectional relationship</strong> P⟹Q ("If P, then Q"). This statement tells us what happens <em>if</em> P is true, but it remains silent on whether Q <em>only</em> happens when P is true. It's like saying, "If it rains, the ground gets wet." This is true, but the ground can also get wet if a sprinkler is on, even if it's not raining.</p>
<p>But in many critical contexts, especially in rigorous scientific theories and robust software systems, we often seek a much stronger relationship: one where the truth of Q absolutely <em>depends</em> on the truth of P, and vice versa. This powerful <strong>bidirectional relationship</strong> is captured by the phrase "<strong>If and Only If</strong>" (P⟺Q).</p>
<h3 id="heading-what-if-and-only-if-means-a-stronger-statement">What "If and Only If" Means: A Stronger Statement</h3>
<p>When we assert "P⟺Q", we're making two distinct claims simultaneously:</p>
<ol>
<li><p><strong>If P, then Q</strong> (P⟹Q): P is a sufficient condition for Q. Whenever P is true, Q must also be true.</p>
</li>
<li><p><strong>If Q, then P</strong> (Q⟹P): P is also a necessary condition for Q. Whenever Q is true, P must also be true. In other words, Q cannot be true without P being true.</p>
</li>
</ol>
<p>Notice the <strong>significant increase in the strength</strong> of the statement. "If P, then Q" merely states a consequence. "P⟺Q" declares a <strong>definitive equivalence</strong>, where P and Q are inextricably linked. They rise and fall together – one cannot be true without the other being true, and one cannot be false without the other being false.</p>
<h3 id="heading-bidirectional-truth-table-unambiguous-relationships">Bidirectional Truth Table: Unambiguous Relationships</h3>
<p>Let's construct the truth table for P⟺Q to clearly see this strong relationship.</p>
<p>P⟺Q is logically equivalent to (P⟹Q)∧(Q⟹P).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747678444501/8d498249-eec2-46ca-a5c1-85801eb1b350.png" alt="Truth table with columns P, Q, P->Q, Q->P, P<->Q" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-creating-the-table-columns-4-and-5-are-new">Creating the Table (columns 4 and 5 are new):</h4>
<ul>
<li><p><strong>Q⟹P (Column 4):</strong> We apply the standard implication rules, but with Q as our "if" and P as our "then." For instance, in Row 3, Q is True and P is False, so Q⟹P is False.</p>
</li>
<li><p><strong>P⟺Q (Column 5):</strong> This is the logical <strong>AND</strong> of the P⟹Q and Q⟹P columns. For P⟺Q to be True, both component implications must be True, which explains why you see fewer Trues in the bidirectional implication than in either of the unidirectional implications.</p>
</li>
</ul>
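<p>The same five columns can be computed in a few lines. This is an illustrative sketch with made-up helper names (<code>implies</code>, <code>biconditional_rows</code>). It builds column 5 exactly as described above, as the AND of the two implications.</p>

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    return (not a) or b

def biconditional_rows():
    """Rows of (P, Q, P implies Q, Q implies P, P iff Q)."""
    rows = []
    for p, q in product([True, False], repeat=2):
        forward = implies(p, q)
        backward = implies(q, p)
        rows.append((p, q, forward, backward, forward and backward))
    return rows

print("P      Q      P=>Q   Q=>P   P iff Q")
for row in biconditional_rows():
    print("  ".join(f"{value!s:5}" for value in row))
```

<p>The last column is True only in the rows where P and Q have the same truth value, which is another way of saying that they "rise and fall together."</p>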
<h3 id="heading-implications-for-the-two-common-fallacies">Implications for the Two Common Fallacies</h3>
<p>The clarity provided by "If and Only If" is particularly powerful in preventing the very logical fallacies we discussed earlier: Affirming the Consequent and Denying the Antecedent. These fallacies arise from the incorrect assumption that an "if-then" statement implies an "if and only if" relationship.</p>
<p>Let's revisit them through the lens of <strong>P⟺Q: the code will show an error (Q) if and only if you provided invalid data (P)</strong>:</p>
<h4 id="heading-affirming-the-consequent-no-more-ambiguity">Affirming the Consequent: No More Ambiguity</h4>
<ul>
<li><p><strong>The Fallacy (assuming unidirectional P⟹Q):</strong></p>
<ul>
<li><p>If the code showed an error (Q), then you provided invalid data (P).</p>
</li>
<li><p>Previously, when P⟹Q was True and Q was True, P could be True (Row 1) or False (Row 3). This ambiguity led to the fallacy.</p>
</li>
</ul>
</li>
<li><p><strong>With P⟺Q:</strong></p>
<ul>
<li><p>Now, look at the P⟺Q column in the table. When P⟺Q is True and Q is True (Row 1), P is <strong>unambiguously True</strong>. The confusion from Row 3 is gone: if Q were True while P was False, P⟺Q would be False (as Q⟹P would be False), so that row is excluded once we accept P⟺Q as a premise. Inferring P from Q is then simply modus ponens applied to Q⟹P.</p>
</li>
<li><p>In a system designed with P⟺Q in mind, knowing that Q is True (observing an error) would <strong>force</strong> the conclusion that P is True (invalid data is the cause), assuming the "if and only if" relationship holds true for that specific system design.</p>
</li>
</ul>
</li>
</ul>
<h4 id="heading-denying-the-antecedent-unmistakable-consequences">Denying the Antecedent: Unmistakable Consequences</h4>
<ul>
<li><p><strong>The Fallacy (assuming unidirectional P⟹Q):</strong></p>
<ul>
<li><p>You provided valid data (¬P), so you have no error (¬Q).</p>
</li>
<li><p>Previously, when P⟹Q was True and P was False, Q could be True (Row 3) or False (Row 4). This ambiguity led to the fallacy.</p>
</li>
</ul>
</li>
<li><p><strong>With P⟺Q:</strong></p>
<ul>
<li><p>Now, when P⟺Q is True and P is False (Row 4), Q is <strong>unambiguously False</strong>. The problematic scenario from Row 3 (where P was False but Q was True) is irrelevant here because P⟺Q would be False in that case (specifically, Q⟹P would be False).</p>
</li>
<li><p>If your system genuinely adheres to "P⟺Q", then knowing that P is False (valid data provided) <strong>guarantees</strong> that Q is False (no error messages).</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-practical-mitigation-in-coding">Practical Mitigation in Coding</h3>
<p>The insights from "If and Only If" are more than just theoretical. Practically, both fallacies (Affirming the Consequent and Denying the Antecedent) can be mitigated by striving for conditions that approximate an "if and only if" relationship in your code and tests.</p>
<h4 id="heading-focused-unit-tests">Focused Unit Tests</h4>
<p>Design unit tests that are so granular and isolated that they effectively aim to establish an "if and only if" scenario for a tiny piece of logic. By thoroughly mocking or controlling all external dependencies and environmental factors, you reduce the impact of "other causes."</p>
<p>If your test for a specific input passes, you want to be as confident as possible that it passed <em>only</em> because the code handled that specific input correctly, and not due to some irrelevant side effect. Similarly, if it fails, you want to be sure that the failure points directly to the intended logical path.</p>
<h4 id="heading-exception-handling-and-specificity">Exception Handling and Specificity</h4>
<p>Instead of catching broad <code>Exception</code> types, catch and handle specific exceptions. This helps differentiate between various "causes" (P1, P2, …) that might lead to a generic "error" (Q). The more precise your error handling, the closer you get to a scenario where "If X error, then Y specific cause," moving towards a bidirectional understanding of error conditions.</p>
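<p>As a rough sketch of this idea, the hypothetical function below (<code>diagnose_port_config</code>, invented for this illustration) reads a port number from a JSON string and maps each specific exception to one named cause. It relies only on the standard <code>json</code> module. Note that <code>json.JSONDecodeError</code> is a subclass of <code>ValueError</code>, so it has to be caught first.</p>

```python
import json

def diagnose_port_config(text):
    """Parse a JSON config and describe the outcome, one cause per message."""
    try:
        config = json.loads(text)
        if not isinstance(config, dict):
            return "cause: top-level value is not an object"
        return f"ok: {int(config['port'])}"
    except json.JSONDecodeError:
        return "cause: malformed JSON"
    except KeyError:
        return "cause: 'port' key missing"
    except (TypeError, ValueError):
        return "cause: port is not an integer"

for sample in ('{"port": 8080}', '{port', '{}', '{"port": "http"}'):
    print(repr(sample), "gives", diagnose_port_config(sample))
```

<p>Because each message is produced by exactly one branch, seeing a given message lets us reason backwards to its cause. A single bare <code>except Exception</code> would collapse all four situations into one indistinguishable Q.</p>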
<h4 id="heading-test-driven-development-tdd-and-mutation-testing">Test-Driven Development (TDD) and Mutation Testing</h4>
<p>These methodologies inherently push towards P⟺Q thinking. TDD encourages writing a failing test <em>first</em> (¬Q), which <em>then</em> necessitates a specific code change (P) to make it pass.</p>
<p>Mutation testing, which we'll explore below, goes a step further by checking that your tests are sensitive enough to <em>fail</em> when code is subtly altered (that is, showing that ¬P leads to ¬Q, and thus, that the original P was indeed necessary for Q).</p>
<p>By consciously aiming for "if and only if" relationships in your code's design and your testing strategies, you can build systems that are not only predictable but also much easier to debug and reason about, moving beyond mere correlation to a deeper understanding of cause and effect.</p>
<h3 id="heading-callback-to-mutation-testing">Callback to Mutation Testing</h3>
<p>In the earlier section on <strong>Assigning Real-World Meanings to Logic</strong>, we discussed:</p>
<blockquote>
<p>When testing the <strong>happy path</strong>, we are verifying that the implication <strong>P</strong>⟹<strong>Q (If input is good, then no error)</strong> holds true.</p>
<p>When testing the <strong>unhappy path (mutation testing)</strong>, we are verifying that <strong>¬P</strong>⟹<strong>¬Q (If input is not good, then an error occurs)</strong> holds true.</p>
</blockquote>
<p>This dual view is key to understanding how mutation testing contributes to software correctness.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749063165908/e1e3736c-75dd-4f1f-81bb-fd7d4f4f7837.jpeg" alt="artistic representation of molecular structures" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-mutation-testing-testing-the-tests">Mutation Testing: Testing the Tests</h2>
<p>Mutation testing deliberately introduces small faults (mutations) in the code and checks whether the test suite detects them by failing. This process assesses not the <em>code</em>, but the <em>tests themselves</em>.</p>
<p>In a robust test suite, we strive for two ideal conditions:</p>
<ul>
<li><p>All <strong>correct</strong> implementations should <strong>pass</strong> the tests.</p>
</li>
<li><p>All <strong>incorrect</strong> implementations should <strong>fail</strong> the tests.</p>
</li>
</ul>
<p>If a mutated (wrong) version of the code is introduced and causes no test failures, that defeats the fundamental purpose of testing. It means your tests aren't sensitive enough to catch a deviation from correctness. Mutations reveal hidden assumptions or gaps in your test coverage, acting as a sensitivity probe for your test suite.</p>
<p><strong>Example code mutations:</strong></p>
<ul>
<li><p>Changing an arithmetic or comparison operator (<code>+</code> to <code>-</code>, <code>&gt;</code> to <code>&gt;=</code>).</p>
</li>
<li><p>Flipping a boolean condition (<code>true</code> to <code>false</code>).</p>
</li>
<li><p>Deleting or duplicating a statement.</p>
</li>
<li><p>Modifying a constant value.</p>
</li>
</ul>
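<p>To make the idea of a surviving mutant concrete, here is a small hand-written sketch. The function <code>is_adult</code>, its mutant, and the two miniature "suites" are hypothetical. A real tool would generate the mutant automatically.</p>

```python
def is_adult(age):
    """Original: adults are 18 or older."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: '>=' changed to '>' (a boundary mutation)."""
    return age > 18


def weak_suite(fn):
    """Checks only values far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Adds the boundary case that separates the original from the mutant."""
    return weak_suite(fn) and fn(18) is True


print(weak_suite(is_adult), weak_suite(is_adult_mutant))      # True True  (mutant survives)
print(strong_suite(is_adult), strong_suite(is_adult_mutant))  # True False (mutant killed)
```

<p>The weak suite passes for both versions, so it cannot tell correct code from incorrect code at the boundary. Adding the single case <code>age == 18</code> is enough to kill this particular mutant.</p>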
<p><strong>Common Python mutation testing tools:</strong></p>
<ul>
<li><p><strong>mutmut</strong> uses Python’s built-in <code>ast</code> module.</p>
</li>
<li><p><strong>cosmic-ray</strong> uses <code>parso</code>, which provides a more complete AST.</p>
</li>
</ul>
<p>These tools rely on abstract syntax trees to surgically mutate code.</p>
<p>You can even swap out underlying AST libraries for different precision or completeness: <a target="_blank" href="https://github.com/boxed/mutmut/issues/281">https://github.com/boxed/mutmut/issues/281</a></p>
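<p>To illustrate what "mutating via a syntax tree" can mean, the sketch below uses only Python's standard <code>ast</code> module to turn every <code>&gt;=</code> into <code>&gt;</code>. It is a hand-rolled illustration, not how mutmut or cosmic-ray are implemented, and the names <code>GtEToGt</code> and <code>load_function</code> are made up for this example. <code>ast.unparse</code> requires Python 3.9 or later.</p>

```python
import ast

SOURCE = """
def is_adult(age):
    return age >= 18
"""


class GtEToGt(ast.NodeTransformer):
    """Replace every '>=' comparison operator with '>'."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def load_function(tree, name):
    """Compile a module AST and return one function defined in it."""
    namespace = {}
    exec(compile(tree, "sketch", "exec"), namespace)
    return namespace[name]


original = load_function(ast.parse(SOURCE), "is_adult")

mutant_tree = ast.fix_missing_locations(GtEToGt().visit(ast.parse(SOURCE)))
mutant = load_function(mutant_tree, "is_adult")

print(ast.unparse(mutant_tree))    # the mutated source, with '>' in place of '>='
print(original(18), mutant(18))    # True False
```

<p>Working on the tree rather than on raw text means the change targets one specific operator node, and the mutated program is still syntactically valid.</p>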
<h3 id="heading-logic-behind-mutation-testing">Logic Behind Mutation Testing</h3>
<p>Let's formalize the logical mapping of mutation testing, recalling our definitions:</p>
<ul>
<li><p>Let P: Code is correct.</p>
</li>
<li><p>Let Q: Tests pass.</p>
</li>
</ul>
<p>Standard <strong>happy path testing</strong> primarily checks that P⟹Q – "if the code is correct, then tests pass."</p>
<p><strong>Mutation testing</strong> focuses on the other side of the coin: we intentionally make ¬P true (by introducing a fault), and then we expect ¬Q (the tests should fail). This process rigorously checks whether the implication ¬P⟹¬Q ("if the code is <em>not</em> correct, then the tests <em>fail</em>") holds true for your test suite.</p>
<p>But there's a deeper, more powerful logical implication here:</p>
<p>As we learned earlier, the statement ¬P⟹¬Q is <strong>logically equivalent</strong> to its <strong>contrapositive</strong>, Q⟹P.</p>
<p>So, by successfully verifying that introducing a fault (¬P) leads to a test failure (¬Q), we are simultaneously validating the contrapositive: <code>if tests pass (Q), then the code must be correct (P)</code>.</p>
<p>This is incredibly significant! It moves us much closer to establishing a <strong>bidirectional guarantee</strong> between our code and our tests: P⟺Q (code correctness is tightly coupled with test success). Mutation testing helps us confidently eliminate false positives in the test suite – situations where Q is true (the test passes) but P is false (the code is actually incorrect).</p>
<p>In a world where LLMs help us write and refactor code quickly, having this "if and only if" confidence in our test suite is invaluable for ensuring the generated or refactored code truly meets expectations.</p>
<h3 id="heading-clarifying-the-kinds-of-failures"><strong>Clarifying the Kinds of Failures</strong></h3>
<p>In software, we typically categorize errors into three main types:</p>
<ul>
<li><p><strong>Syntax errors:</strong> Violations of the language's grammatical rules (for example, missing colon, invalid keyword). These prevent the code from running at all.</p>
</li>
<li><p><strong>Runtime errors:</strong> Errors that occur during program execution, often due to unexpected conditions (for example, <code>TypeError</code>, <code>AttributeError</code>, <code>ZeroDivisionError</code>).</p>
</li>
<li><p><strong>Logic errors:</strong> The program runs without crashing, but it produces an incorrect result or behaves in a way that doesn't match the intended specification (for example, wrong algorithm, wrong return value).</p>
</li>
</ul>
<p>Mutation testing focuses on <strong>logic errors</strong> – failures where the program runs, but produces incorrect results. These are usually caught via <code>AssertionError</code> in the "Assert" phase of the Arrange–Act–Assert (AAA) testing pattern.</p>
<p>You could argue pedantically that <code>AssertionError</code> is a runtime error, but in testing, we treat it as a <strong>signal for logical failure</strong>:</p>
<blockquote>
<p><em>"The function ran, but the output didn’t match the expected behavior."</em></p>
</blockquote>
<p>Mutation testing assumes that syntax and runtime errors are already handled. Its purpose is to validate whether the test suite reliably catches logical misbehavior.</p>
<h3 id="heading-a-deeper-falsification-perspective">A Deeper Falsification Perspective</h3>
<p>Now, let's connect mutation testing back to <strong>Karl Popper's principle of falsification</strong>, which we introduced earlier in the context of scientific reasoning. Recall that Popper argued scientific theories gain strength not by being "proven," but by <em>surviving rigorous attempts to disprove them</em>. The core idea of falsification logic is that to disprove an implication like P⟹Q, you only need to find one instance where P is True and Q is False.</p>
<p>Mutation testing applies this same powerful principle, but to our test suite's effectiveness:</p>
<p>Instead of trying to <em>prove</em> directly that our tests are perfect, mutation testing takes a falsification approach to the implication <strong>¬P⟹¬Q ("If the code is incorrect, then the tests fail").</strong> It actively tries to <strong>falsify</strong> this crucial relationship.</p>
<p>If we introduce a mutation (making ¬P true, that is, the code is now incorrect) but the existing test suite <em>still passes</em> (meaning Q is true), then we have found an instance where:</p>
<ol>
<li><p>¬P is True (the code is incorrect due to the mutation).</p>
</li>
<li><p>Q is True (the test still passes).</p>
</li>
</ol>
<p>In this scenario, the implication <strong>¬P⟹¬Q is falsified</strong> because we have a True antecedent (¬P) leading to a False consequent (¬Q is false, because Q is true).</p>
<p>And, critically, if ¬P⟹¬Q is falsified, then its logically equivalent contrapositive, Q⟹P ("If the tests pass, then the code is correct"), is <em>also</em> falsified. This means we can no longer trust that a passing test suite reliably indicates correct code. Our desired P⟺Q relationship is broken – <strong>the test suite is no longer fully effective</strong> at guaranteeing correctness.</p>
<p>By pushing for zero surviving mutants, mutation testing forces us to minimize the surface area of these "hidden assumptions" in our test suite. It demands highly sensitive and specific tests that can pinpoint even subtle logical flaws, thereby moving us closer to building truly resilient systems.</p>
<h3 id="heading-comparing-tdd-red-phase-and-mutation-testing">Comparing TDD (Red Phase) and Mutation Testing</h3>
<p>Both methodologies, albeit through different means and at different stages of the development cycle, aim to establish confidence in the <strong>¬P ⟹ ¬Q</strong> relationship.</p>
<p><strong>Key Differences Summarized:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>TDD (Red Phase)</td><td>Mutation Testing</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Primary Goal</strong></td><td>Drive new code development. Confirm a bug/feature.</td><td>Evaluate the quality/completeness of existing tests.</td></tr>
<tr>
<td><strong>Code State</strong></td><td>Production code is incomplete or buggy.</td><td>Production code is (assumed to be) correct.</td></tr>
<tr>
<td><strong>Test State</strong></td><td>The <em>new</em> test is expected to fail.</td><td><em>Existing</em> tests are expected to fail (due to mutants).</td></tr>
<tr>
<td><strong>Initiator</strong></td><td>Developer wanting to add functionality/fix bug.</td><td>Tool that inserts artificial bugs into code.</td></tr>
<tr>
<td><strong>"Bugs"</strong></td><td>Actual, intended bugs or missing features.</td><td>Artificial, subtle changes to the code.</td></tr>
</tbody>
</table>
</div><h2 id="heading-toward-if-and-only-if-confidence">Toward If-and-Only-If Confidence</h2>
<p>Ultimately, the goal in software development is to establish if-and-only-if relationships whenever possible, both in the code implementation and especially in the sensitivity of the test suite to the code under test.</p>
<p>This means <strong>if a certain condition (P) is true, then a specific outcome (Q) <em>must</em> occur, and if Q occurs, then P <em>must</em> have been the cause</strong>. Achieving this level of clarity comes from:</p>
<ul>
<li><p>A deep understanding of the problem.</p>
</li>
<li><p>Aligned expectations during requirements gathering.</p>
</li>
<li><p>Logical analysis and interpretation of well-designed experiments.</p>
</li>
<li><p>Adherence to the Single Responsibility Principle (the "S" in SOLID).</p>
</li>
<li><p>Rigorous tests with meaningful coverage.</p>
</li>
</ul>
<p>This allows us to understand how <strong>control flow</strong> and <strong>data flow</strong> work with greater depth and confidence, leading to better inferences throughout the entire software development lifecycle.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749062596293/9bfb566a-5e3c-4fec-ac42-326aa22532c8.jpeg" alt="Monarch Butterfly resting on butterfly bush flower" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-real-world-challenges">Real-World Challenges</h2>
<p>While striving for perfect "if-and-only-if" relationships provides a powerful logical ideal, the messy reality of modern software development presents significant hurdles. The very characteristics that make large systems powerful and scalable – their intricate interconnections and inherent dynamism – simultaneously obscure clear cause-and-effect relationships, making precise logical reasoning and debugging an ongoing battle.</p>
<h3 id="heading-a-web-of-complexity">A Web of Complexity</h3>
<h4 id="heading-fan-in-fan-out-the-nature-of-modern-systems">Fan-In, Fan-Out: The Nature of Modern Systems</h4>
<p>Any reasonably large software system rarely operates through purely linear control and data flows. Fan-out and fan-in patterns – where many components are called and then their results merged – are inevitable.</p>
<p>For example:</p>
<ul>
<li><p>In <strong>ETL pipelines</strong>, data may be ingested from multiple sources (external APIs, CSVs) and logged to multiple destinations (files, databases).</p>
</li>
<li><p>In <strong>concurrent programming</strong>, Python’s <code>ProcessPoolExecutor</code> splits data into chunks processed in parallel, then recombines the results.</p>
</li>
</ul>
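<p>The fan-out/fan-in shape can be sketched in a few lines. This example uses <code>ThreadPoolExecutor</code> so that it runs anywhere without extra setup. <code>ProcessPoolExecutor</code> exposes the same <code>Executor</code> interface, so it could be passed in instead, typically under an <code>if __name__ == "__main__":</code> guard. The helper names are hypothetical.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_squares(chunk):
    """Work done independently on each chunk."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(numbers, chunk_size=3, executor_cls=ThreadPoolExecutor):
    chunks = chunked(numbers, chunk_size)            # fan-out: split the work
    with executor_cls() as pool:
        partials = list(pool.map(sum_squares, chunks))
    return sum(partials)                             # fan-in: merge the results


print(parallel_sum_of_squares(list(range(1, 11))))  # 385
```

<p>Even in this tiny case, a wrong total could originate in the splitting, in any one worker, or in the merge, which is why such patterns make cause and effect harder to trace.</p>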
<h4 id="heading-srp-meets-real-world-boundaries">SRP Meets Real-World Boundaries</h4>
<p>Just as functional programming must eventually perform I/O, the <strong>Single Responsibility Principle (SRP)</strong> runs into real-world boundaries, whether conceptual or infrastructural. At some point, something must glue these isolated units together.</p>
<p>Orchestration logic might live in a single function, span multiple files, or even distribute across microservices and machines communicating over networks. While this decomposition enhances modularity, it also increases surface area for bugs involving:</p>
<ul>
<li><p><strong>Side effects:</strong> Unintended changes to system state outside a component's explicit outputs.</p>
</li>
<li><p><strong>Circular dependencies:</strong> Components relying on each other in a loop, leading to difficult-to-trace behavior.</p>
</li>
<li><p><strong>Interface drift:</strong> Changes in one component's input/output expectations not being correctly reflected elsewhere.</p>
</li>
<li><p><strong>Race conditions:</strong> Timing-dependent bugs in concurrent operations.</p>
</li>
<li><p><strong>Serialization issues:</strong> Problems translating data between different formats or systems.</p>
</li>
<li><p><strong>Network unreliability:</strong> Unpredictable latency, packet loss, or disconnections in distributed systems.</p>
</li>
</ul>
<h4 id="heading-the-double-edged-sword-of-abstraction">The Double-Edged Sword of Abstraction</h4>
<p>This web of dependencies is the price of progress, made manageable only through better tooling and abstractions.</p>
<ul>
<li><p>If boundaries are <strong>well-designed, observable, and testable</strong>, they enable asynchronous collaboration, improve long-term maintainability, and increase developer confidence. (See GitHub Playbook in References)</p>
</li>
<li><p>If systems <strong>lack architectural coherence</strong> or fall behind evolving needs, they calcify into technical debt that demoralizes even the most motivated teams.</p>
</li>
</ul>
<h4 id="heading-clean-code-is-contextual">Clean Code Is Contextual</h4>
<p>While abstractions and orchestration help manage complexity, overusing design patterns or creating unnecessary class layers can introduce needless indirection. This is a common counterargument to architectural purism.</p>
<p>Ultimately, what counts as "clean code" is context-dependent. It varies with programmer skill, the tooling at hand (linters, tests, Copilot), and whether the project is a throwaway script or a multi-year infrastructure investment. Architectural practices like SRP should evolve alongside those constraints.</p>
<h3 id="heading-the-butterfly-effect-of-bugs">The Butterfly Effect of Bugs</h3>
<h4 id="heading-from-srp-to-reasoning-chains">From SRP to Reasoning Chains</h4>
<p>Previously, we focused on simple, direct cause-effect logic (P ⟹ Q), but real-world systems are messier.</p>
<p>The more we adhere to SRP through small, focused functions, the more we create longer chains of logic. This improves separation of concerns but also extends the reasoning required to debug behavior.</p>
<h4 id="heading-debugging-in-a-causal-fog">Debugging in a Causal Fog</h4>
<p>A seemingly minor trigger (O) can cascade through a chain like O⟹P⟹Q⟹R, which we may not fully understand due to knowledge silos, evolving requirements, or runtime dynamism.</p>
<p>Even when we understand the components, precisely identifying “P” is hard, much like how redefining a research question shifts the statistical population being studied. In complex systems with <strong>feedback loops</strong> (such as recommender engines), there might not be a single "root cause" at all.</p>
<h4 id="heading-short-term-triage-vs-long-term-insight">Short-Term Triage vs. Long-Term Insight</h4>
<p>Finding the true origin of a bug often demands experimentation, telemetry, and broad system insight. These investigations produce robust, future-proof fixes but take time.</p>
<p>In on-call scenarios, however, urgency reshapes priorities. Fast mitigations and clear communication often take precedence over deep diagnosis.</p>
<h3 id="heading-masked-by-design-and-debt">Masked by Design and Debt</h3>
<p>As systems scale, failure stops looking like a crash. Instead, it shows up as a retry spike, a slow metric drift, or silent fallback behavior.</p>
<p>Modern fault-tolerant systems, built with retries, failovers, circuit breakers, and autoscaling, are designed to recover quickly. This resilience often masks deeper problems, delaying detection for weeks and making root cause analysis harder.</p>
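<p>A minimal sketch of how a retry wrapper can hide a problem, and how a simple counter can surface it again, is shown below. The names <code>call_with_retries</code>, <code>make_flaky</code>, and the <code>failed_attempts</code> key are assumptions made for illustration.</p>

```python
def call_with_retries(operation, attempts=3, stats=None):
    """Call `operation` up to `attempts` times, counting every failed attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            if stats is not None:
                stats["failed_attempts"] = stats.get("failed_attempts", 0) + 1
    raise last_error


def make_flaky(failures_before_success):
    """Build an operation that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] > failures_before_success:
            return "ok"
        raise ConnectionError("transient failure")

    return operation


stats = {}
print(call_with_retries(make_flaky(2), stats=stats))  # ok
print(stats)                                          # {'failed_attempts': 2}
```

<p>From the caller's point of view the call simply succeeded. Without the counter, the two underlying failures would leave no trace, which is how resilience can delay detection.</p>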
<p>Operating in <strong>non-deterministic environments</strong> with flaky networks, race conditions, or dynamic routing adds further ambiguity. Small symptoms become harder to link back to specific causes.</p>
<p>Compounding this, <strong>technical debt</strong>, whether driven by weak technical leadership, shifting priorities, or time pressure, weakens the system’s observability and test coverage. Teams inherit brittle, poorly understood code, making it hard to draw clean lines between cause and effect.</p>
<p>Even the best engineers struggle in such conditions. When a system resists clarity, it doesn’t just block debugging. It erodes trust, slows learning, and fuels long-term burnout.</p>
<h2 id="heading-glimmers-of-hope-tools-and-practices-for-clarity">Glimmers of Hope: Tools and Practices for Clarity</h2>
<p>Despite these challenges, several strategies and practices offer a path toward more robust and understandable software.</p>
<h3 id="heading-leveraging-design-patterns">Leveraging Design Patterns</h3>
<p>Design patterns offer a shared vocabulary and time-tested strategies for structuring systems. When applied well, they tame complexity, reduce technical debt, and make behavior more predictable.</p>
<p>They also tend to concentrate similar failure modes. The same bug might appear across companies or industries, creating a wealth of prior art and solution playbooks. Familiarity with patterns can accelerate debugging and deepen shared understanding across teams.</p>
<h3 id="heading-nurturing-expert-mentorship">Nurturing Expert Mentorship</h3>
<p>Promoting mentors based on real technical impact instead of tenure builds stronger teams and avoids the <strong>Peter Principle</strong> (people in a hierarchy tend to rise to their level of incompetence).</p>
<p>Great mentors teach more than skills – they model falsifiability, independent thinking, and an ability to reason under uncertainty.</p>
<p>They help others challenge assumptions, navigate tradeoffs, and grow both technically and interpersonally. In systems where root causes are murky, this kind of leadership is essential.</p>
<p>One of the most powerful techniques that scales from mentorship to code is <strong>falsification</strong>: the disciplined search for counterexamples. Whether applied in design reviews, debugging sessions, or automated tests, this mindset anchors reasoning in reality.</p>
<h2 id="heading-the-power-of-falsification-in-testing">The Power of Falsification in Testing</h2>
<p>The deliberate search for counterexamples is core to building reliable systems.</p>
<ul>
<li><p>In algorithm design, testing edge cases is just falsification in disguise: finding where your logic breaks.</p>
</li>
<li><p>In code, <strong>fuzz testing</strong> (for example, with Atheris) throws diverse inputs at functions to expose falsifying examples.</p>
</li>
<li><p><strong>Property-based testing</strong> (for example, with Hypothesis) goes further by generating inputs that satisfy certain rules and then shrinking any failing input to a minimal form. This greatly improves reproducibility and can help when stress-testing concurrency issues.</p>
</li>
</ul>
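<p>The generate-then-shrink loop can be approximated with the standard library alone, as in the rough sketch below. It is a hand-rolled stand-in and does not use the Hypothesis API. Hypothesis performs both generation and shrinking far more thoroughly. The buggy <code>dedupe</code> function and the helper names are hypothetical.</p>

```python
import random


def dedupe(items):
    """Buggy on purpose: removes duplicates but also loses the original order."""
    return sorted(set(items))


def keeps_first_occurrence_order(items):
    """Property: the result equals the input with later duplicates dropped."""
    expected = list(dict.fromkeys(items))
    return dedupe(items) == expected


def shrink(prop, failing):
    """Greedily remove single elements while the property still fails."""
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(failing)):
            smaller = failing[:i] + failing[i + 1:]
            if not prop(smaller):
                failing, shrunk = smaller, True
                break
    return failing


def find_counterexample(prop, trials=200, seed=0):
    """Try random small lists and return a shrunk failing input, if any."""
    rng = random.Random(seed)
    for _ in range(trials):
        candidate = [rng.randint(0, 5) for _ in range(rng.randint(0, 6))]
        if not prop(candidate):
            return shrink(prop, candidate)
    return None


print(find_counterexample(keeps_first_occurrence_order))
```

<p>For this bug, the shrunk counterexample is a two-element list whose first value is larger than its second, which makes the cause (sorting instead of preserving order) much easier to spot than the original random input would.</p>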
<p>The more rigorously we attempt to falsify our assumptions, the more confidently we can reason about behavior using tools like Modus Ponens and Modus Tollens.</p>
<p>Assumptions are always present in software to simplify complexity. The question is whether they're <strong>explicitly codified in tests</strong> or <strong>left hidden and fragile</strong>.</p>
<p>Of course, no test is ever bulletproof: our assumptions could be mistaken, or the world could change. That’s why critical thinking, discerning "what should be" versus "what is", remains essential as newer generations increasingly rely on AI tools like Large Language Models.</p>
<p>This deliberate, <strong>falsification-driven approach</strong> is paramount for building reliable software. It underpins sophisticated testing techniques designed to expose hidden assumptions and break our logical chains.</p>
<p>While testing helps us uncover where our reasoning might falter, some domains demand an even higher degree of certainty. For those critical systems, we turn to the ultimate tools for logical rigor: <strong>Proof Assistants</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749062895395/f92ed2e7-f1fd-4351-a9d3-12c436c989f1.jpeg" alt="row of dominos" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-proof-assistants">Proof Assistants</h2>
<p>While traditional testing and fuzzing are powerful for finding bugs, they fundamentally cannot guarantee correctness for all possible inputs or scenarios. They can only prove the <em>presence</em> of bugs, not their <em>absence</em>.</p>
<p>To achieve formal, mathematically verified proofs of program behavior – providing the strongest possible guarantees – we turn to <strong>proof assistants</strong>. These tools allow us to build step-by-step logical proofs, ensuring that a program or system design adheres to its specification with absolute rigor.</p>
<h3 id="heading-prolog"><strong>Prolog</strong></h3>
<p>Prolog offers a relatively straightforward entry point into the world of logic programming and theorem proving. <strong>SWI-Prolog</strong> is a widely used Prolog implementation, and its interactive top level (<code>swipl</code>) works as a <strong>REPL</strong>, or Read-Eval-Print Loop.</p>
<p>You interact with Prolog by providing it with a knowledge base composed of <code>facts</code> and <code>rules</code> (which are a type of logical clause called <strong>Horn clauses</strong>). You then pose <code>queries</code>.</p>
<h4 id="heading-installing-swi-prolog">Installing SWI-Prolog</h4>
<p>You can download SWI-Prolog from its official website: <a target="_blank" href="https://www.swi-prolog.org/download/stable">https://www.swi-prolog.org/download/stable</a><br>Follow the instructions for your operating system (Windows, macOS, or Linux).</p>
<p>On Ubuntu/Debian, you can usually install it via:</p>
<pre><code class="lang-bash">sudo apt update
sudo apt install swi-prolog
</code></pre>
<h4 id="heading-using-prolog-repl-vs-file">Using Prolog: REPL vs. File</h4>
<ul>
<li><p><strong>REPL (</strong><code>swipl</code>) is best for: Quick, interactive tests of single facts or rules, and posing queries to an <em>already loaded</em> knowledge base.</p>
</li>
<li><p><strong>A File (</strong><code>.pl</code> extension) is best for: Defining your <strong>entire knowledge base</strong> (multiple facts and rules) and storing your program for reusability. This is the standard way to work with Prolog for anything beyond a few lines.</p>
</li>
</ul>
<h4 id="heading-example-a-simple-knowledge-base">Example: A Simple Knowledge Base</h4>
<p>Let's define a knowledge base to represent who has a job and who is a coding instructor.</p>
<p><strong>1. Create a file</strong> named <code>knowledge.pl</code> with the following content:</p>
<pre><code class="lang-haskell">% knowledge.pl
% <span class="hljs-type">This</span> file defines a small knowledge base <span class="hljs-keyword">in</span> <span class="hljs-type">Prolog</span>.
% <span class="hljs-type">In</span> <span class="hljs-type">Prolog</span>, all statements (facts and rules) about the same predicate
% (identified by its name <span class="hljs-type">AND</span> number <span class="hljs-keyword">of</span> arguments, e.g., 'has_job' with <span class="hljs-number">1</span> argument is 'has_job/<span class="hljs-number">1</span>')
% must be written consecutively without other predicate definitions <span class="hljs-keyword">in</span> between.

% <span class="hljs-comment">--- Definitions for the 'has_job' predicate (takes 1 argument) ---</span>

% <span class="hljs-type">Fact</span>: <span class="hljs-type">Alice</span> has a job.
<span class="hljs-title">has_job</span>(alice).

% <span class="hljs-type">Fact</span>: <span class="hljs-type">Bob</span> has a job.
<span class="hljs-title">has_job</span>(bob).

% <span class="hljs-type">Rule</span>: <span class="hljs-type">Anyone</span> (represented by variable <span class="hljs-type">X</span>) has a job <span class="hljs-type">IF</span> they are a coding instructor.
% ':-' means '<span class="hljs-keyword">if</span>'. '<span class="hljs-type">X'</span> is a variable (starts with uppercase).
<span class="hljs-title">has_job</span>(<span class="hljs-type">X</span>) :- is_coding_instructor(<span class="hljs-type">X</span>).

% <span class="hljs-comment">--- Definitions for the 'is_coding_instructor' predicate (takes 1 argument) ---</span>

% <span class="hljs-type">Fact</span>: <span class="hljs-type">Alice</span> is a coding instructor.
<span class="hljs-title">is_coding_instructor</span>(alice).
</code></pre>
<p><strong>What each line does:</strong></p>
<ul>
<li><p>Lines starting with <code>%</code>: These are comments for human readability, ignored by Prolog. They explain the file's purpose and key rules like predicate grouping.</p>
</li>
<li><p><code>has_job(alice).</code> / <code>has_job(bob).</code>: These are facts. They assert simple truths, like "Alice has a job." The <code>.</code> at the end is mandatory for every statement.</p>
</li>
<li><p><code>has_job(X) :- is_coding_instructor(X).</code>: This is a rule. It states a conditional truth: "For any <code>X</code>, <code>X</code> has a job <em>if</em> <code>X</code> is a coding instructor." <code>X</code> is a variable (always starts with an uppercase letter), and <code>:-</code> means "if." This rule allows Prolog to deduce new information.</p>
</li>
<li><p><code>is_coding_instructor(alice).</code>: Another fact, asserting "Alice is a coding instructor." It's placed after all <code>has_job/1</code> clauses to satisfy Prolog's grouping rule.</p>
</li>
</ul>
<p><strong>2. Load and Query in the REPL:</strong></p>
<p>Open your terminal and type <code>swipl</code>. Once at the <code>?-</code> prompt, load the file and then pose your queries:</p>
<pre><code class="lang-bash">$ swipl
?- [knowledge].    % Load 'knowledge.pl' (omit .pl, use square brackets, and end with a period).
true.              % SWI-Prolog answers with a lowercase 'true.' once the file has loaded.

?- has_job(alice). % Query: Does Alice have a job?
true .             % Yes, because it is a stored fact. Prolog pauses here because the rule offers
                   % a second proof: press Enter to stop, or ';' to ask for the alternative.

?- has_job(carol). % Query: Does Carol have a job?
false.             % No: Prolog cannot prove it from its knowledge base.

?- has_job(X).     % Query: Who has a job? (Find values for X)
X = alice ;        % First solution (a fact). Press ';' to ask for the next one.
X = bob ;          % Second solution (a fact). Press ';' again.
X = alice.         % Alice again, this time deduced via the rule and is_coding_instructor(alice).
                   % The closing period indicates that no further solutions remain.

?- halt.           % Type 'halt.' to exit the Prolog REPL cleanly.
% Alternatively, Ctrl+D (hold Ctrl, then press D) exits most REPLs.
</code></pre>
<p><strong>The Prolog example clearly demonstrates:</strong></p>
<ul>
<li><p><strong>"Is P(X) true for a specific X?"</strong>: Shown by <code>?- has_job(alice).</code> (returns <code>true.</code>) and <code>?- has_job(carol).</code> (returns <code>false.</code>).</p>
</li>
<li><p><strong>"Is there an X for which P(X) is true?"</strong>: Shown by <code>?- has_job(X).</code> (provides solutions like <code>X = alice</code>, <code>X = bob</code>).</p>
</li>
</ul>
<h4 id="heading-prolog-limitations">Prolog Limitations</h4>
<p>Prolog's limitations become evident when attempting to reason about falsity or non-existence. <strong>You cannot directly ask "Is there any X for which P(X) is false?"</strong></p>
<p>Instead, Prolog operates on the principle of negation as failure. This means that if Prolog cannot prove a statement, it considers that statement false.</p>
<p>For example, if you ask <code>?- \+ has_job(carol).</code> (meaning "Is it not provable that Carol has a job?"), Prolog answers <code>true.</code>, because it simply cannot find any proof that Carol has a job in its knowledge base.</p>
<p>This is a significant distinction: it doesn't mean Carol definitely doesn't have a job, nor does Prolog provide a formal counterexample. It merely reflects a lack of provable information.</p>
<p>This fundamental constraint means Prolog, while powerful for logic programming, falls short of being a full-fledged proof assistant for comprehensive formal verification.</p>
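<p>The gap between "not provable" and "proven false" can be mimicked in a few lines of Python. This is only an analogy for negation as failure, mirroring the facts and the single rule from <code>knowledge.pl</code>. The function names are hypothetical.</p>

```python
# Facts from knowledge.pl, stored as (predicate, person) pairs.
FACTS = {
    ("has_job", "alice"),
    ("has_job", "bob"),
    ("is_coding_instructor", "alice"),
}


def provable(predicate, person):
    """Return True if the goal follows from the facts or from the one rule."""
    if (predicate, person) in FACTS:
        return True
    # Rule: has_job(X) :- is_coding_instructor(X).
    if predicate == "has_job":
        return provable("is_coding_instructor", person)
    return False


def negation_as_failure(predicate, person):
    """Succeeds when no proof is found, which is weaker than proving falsity."""
    return not provable(predicate, person)


print(provable("has_job", "alice"))             # True
print(negation_as_failure("has_job", "carol"))  # True, but only because carol is unknown
```

<p>The second result says nothing about Carol's actual employment. Adding <code>("has_job", "carol")</code> to the facts would flip the answer, which shows that the "negation" depends entirely on what happens to be recorded.</p>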
<h3 id="heading-coq"><strong>Coq</strong></h3>
<p>After experimenting with Prolog and seeing its limitations, you can move on to a more powerful proof assistant like <strong>Coq</strong>. Coq is employed in <strong>safety-critical domains</strong> where absolute mathematical certainty is paramount. <code>coqtop</code> is the standard REPL for Coq.</p>
<p>A fundamental difference from Prolog is Coq's lack of a <strong>Closed World Assumption</strong>. In Coq, anything not explicitly proven is simply <strong>unknown</strong>, not automatically false.</p>
<p>Unlike Prolog, Coq's primary purpose isn't solving computational problems by searching a knowledge base. Its true power lies in its ability to <strong>construct and verify formal mathematical proofs and programs with absolute rigor</strong>. Its interaction involves managing a <strong>proof state</strong> (your remaining goals) and applying <strong>tactics</strong> (logical inference steps) until the proof is complete.</p>
<h4 id="heading-installing-coq">Installing Coq</h4>
<p>Coq can be installed in several ways, often via package managers or a tool called <code>opam</code> (the OCaml package manager, as Coq is written in OCaml).</p>
<ul>
<li><p><strong>Official Downloads:</strong> Visit the Coq website for detailed instructions for your OS: <a target="_blank" href="https://coq.inria.fr/download">https://coq.inria.fr/download</a></p>
</li>
<li><p><strong>Using a system package manager (for example, Ubuntu/Debian):</strong></p>
<pre><code class="lang-bash">  sudo apt update
  sudo apt install coq
</code></pre>
</li>
</ul>
<h4 id="heading-using-coq-repl-vs-file">Using Coq: REPL vs. File</h4>
<ul>
<li><p><strong>REPL (</strong><code>coqtop</code>) is best for: Trying out single tactics, inspecting the current proof state, or learning basic syntax for very short commands.</p>
</li>
<li><p><strong>A File (</strong><code>.v</code> extension) is best for: <strong>Almost all Coq development and proof construction.</strong> This is how complex proofs and verified programs are structured and managed.</p>
</li>
</ul>
<h4 id="heading-coqs-comprehensive-question-answering">Coq's Comprehensive Question Answering</h4>
<p>Unlike Prolog, Coq can directly address all three types of logical questions we've discussed, providing robust answers backed by formal proof:</p>
<ul>
<li><p><strong>"Is P(X) true for a specific X?"</strong>: Coq allows you to define a precise statement (a <strong>theorem</strong>) like "Alice has a job." You then build a step-by-step logical <strong>proof</strong> that formally confirms whether this statement is true based on your definitions. If the proof succeeds, Coq formally verifies it; if it fails, Coq shows which goal remains unproven, so you can see where your logic breaks down.</p>
</li>
<li><p><strong>"Is there an X for which P(X) is true?"</strong>: Coq handles questions of existence. If you ask, "Does someone have a job?", you can construct a proof by explicitly providing an example (like "Alice") and then proving that your chosen example indeed satisfies the condition ("Alice has a job").</p>
</li>
<li><p><strong>"Is there any X for which P(X) is false?"</strong>: This is a key capability where Coq excels over Prolog. Coq allows you to formally prove that a statement is false, or that a counterexample exists. For instance, you could prove "Carol does not have a job" by showing it contradicts the definition, or prove "there exists someone who doesn't have a job" by explicitly identifying such a person and proving that they indeed lack a job. This direct ability to reason about negation and provide formal counterexamples (or prove their non-existence) is what makes Coq a <strong>full-fledged proof assistant</strong>.</p>
</li>
</ul>
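<p>As a rough illustration of the three question types, here is a minimal sketch written in Lean 4 rather than Coq (the ideas carry over, but the syntax differs). The <code>Person</code> type, the <code>HasJob</code> predicate, and the assumption that Alice and Bob are the only people with jobs are all made up for this example:</p>

```lean
-- Hypothetical world: three people, and only Alice and Bob have jobs.
inductive Person where
  | alice
  | bob
  | carol

-- HasJob p holds exactly when one of these constructors builds a proof of it.
inductive HasJob : Person → Prop where
  | alice : HasJob Person.alice
  | bob : HasJob Person.bob

-- "Is P(X) true for a specific X?"
theorem alice_has_job : HasJob Person.alice := HasJob.alice

-- "Is there an X for which P(X) is true?" (a witness plus a proof about it)
theorem someone_has_job : ∃ p : Person, HasJob p :=
  ⟨Person.alice, HasJob.alice⟩

-- "Carol does not have a job": no constructor can produce HasJob Person.carol,
-- so case analysis on a supposed proof leaves nothing to show.
theorem carol_has_no_job : ¬ HasJob Person.carol := by
  intro h
  cases h

-- "Is there any X for which P(X) is false?" (an explicit counterexample)
theorem someone_has_no_job : ∃ p : Person, ¬ HasJob p :=
  ⟨Person.carol, carol_has_no_job⟩
```

<p>Unlike negation as failure, the last two theorems are actual proofs: the checker accepts them only because the definition of <code>HasJob</code> rules out a job for Carol, not because a search came back empty.</p>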
<p>While Coq's core doesn't automatically generate counterexamples when a proof fails, plugins like QuickChick can be integrated for property-based testing to find falsifying examples.</p>
<p>QuickChick is a Coq library that lets you specify properties about your Coq definitions and then <strong>randomly generates inputs</strong> to try to find a counterexample that falsifies your property.</p>
<p>This is a powerful way to <em>find bugs early</em> in your formalization before you invest a lot of time trying to prove a false theorem.</p>
<h3 id="heading-tla-isabelle-and-lean-a-spectrum-of-formal-verification">TLA+, Isabelle, and Lean: A Spectrum of Formal Verification</h3>
<p>Beyond Prolog and Coq, other proof assistants and formal specification languages cater to different needs and paradigms:</p>
<ul>
<li><p><strong>TLA+:</strong> This is a formal <strong>specification language</strong> developed by Leslie Lamport. It focuses on modeling and verifying <strong>system designs</strong> (especially concurrent and distributed ones) using <strong>temporal logic</strong>, rather than proving low-level code. It helps ensure critical properties like safety (nothing bad ever happens) and liveness (something good eventually happens). Its practicality and accessibility make it popular in industry, notably at Amazon and Microsoft for robust system design.</p>
</li>
<li><p><strong>Isabelle and Lean:</strong> These are modern, highly advanced proof assistants.</p>
<ul>
<li><p><strong>Isabelle</strong>, grounded in higher-order logic, is widely used by researchers and institutions (for example, in projects like the seL4 verified microkernel) for formal theorem proving and software verification in academic and <strong>safety-critical domains</strong> demanding extreme rigor.</p>
</li>
<li><p><strong>Lean</strong>, based on dependent type theory, is favored by mathematicians for <strong>formalizing proofs in pure mathematics</strong> (for example, number theory, algebra). It's known for its powerful automation and active community.</p>
</li>
</ul>
</li>
</ul>
<p>These tools represent the pinnacle of applying formal logic to ensure the correctness and reliability of both mathematical theories and complex software systems.</p>
<p>Now that you have a good lay of the land in both theory and practice, here are some thought experiments to enrich your education.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749063042362/b94ec237-0aca-46d8-8921-80dfe1f5f051.jpeg" alt="nuts on a table, like almond, cashew " class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-food-for-thought">Food for Thought</h2>
<p>The journey into formal logic and its intersection with practical domains like software and science offers many avenues for deeper exploration.</p>
<h3 id="heading-hypothesis-testing-in-science-and-the-implication-truth-table">Hypothesis Testing in Science and the Implication Truth Table</h3>
<p>Statistical hypothesis testing uses a probabilistic form of Modus Tollens. We start with a <strong>null hypothesis (H0) and the premise: "If H0 is true, then the data will probably fall within the range H0 predicts."</strong> We then observe data that would be highly unlikely if H0 were true (that is, a small p-value). This serves as our <strong>probabilistic "not Q."</strong> Therefore, we conclude that H0 is likely not true (we reject H0). This is our <strong>probabilistic "∴¬P."</strong></p>
<p>Here, the <strong>"truthiness" of P⟹Q is being tested</strong>, rather than simply assumed to be true for developing arguments, as in Modus Ponens or Modus Tollens. There's no absolute truth or anything to "prove" definitively.</p>
<p>Inferences are drawn from prior experiments (which inform the test data distribution) and context-specific experiment setups (which determine the significance level α), together defining the threshold (critical value) for what is considered an unlikely observation of Q.</p>
<p>The experiment's result is a rejection of H0 or a failure to reject it, not a definitive proof that H0 is true or false.</p>
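<p>To make the probabilistic "not Q" concrete, here is a small sketch of a one-sided test on a coin. The scenario (10 flips, 9 heads, α = 0.05) and the class and method names are assumptions chosen for illustration, not something taken from a statistics library:</p>

```java
final class PValueSketch {
    // Probability of exactly k heads in n fair coin flips: C(n, k) / 2^n.
    static double probabilityOfHeads(int n, int k) {
        double coefficient = 1.0;
        for (int i = 1; i <= k; i++) {
            coefficient = coefficient * (n - k + i) / i;
        }
        return coefficient / Math.pow(2, n);
    }

    // One-sided p-value: chance of at least `observed` heads if the coin is fair (H0).
    static double pValue(int flips, int observed) {
        double total = 0.0;
        for (int k = observed; k <= flips; k++) {
            total += probabilityOfHeads(flips, k);
        }
        return total;
    }

    // Probabilistic "not Q": the data is too unlikely under H0, so we reject H0.
    static boolean rejectNull(double pValue, double alpha) {
        return pValue < alpha;
    }

    public static void main(String[] args) {
        double p = pValue(10, 9);
        System.out.println("p-value: " + p);
        System.out.println("Reject H0 at alpha = 0.05? " + rejectNull(p, 0.05));
    }
}
```

<p>With 9 heads out of 10, the p-value is 11/1024 (about 0.011), which is below 0.05, so H0 ("the coin is fair") is rejected. With 6 heads out of 10 the p-value is about 0.377, so H0 is not rejected, which is not the same as proving the coin fair.</p>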
<h3 id="heading-inductive-reasonings-relationship-to-deductive-arguments">Inductive Reasoning's Relationship to Deductive Arguments</h3>
<ul>
<li><p><strong>Induction</strong> generates general rules (for example, "P is always followed by Q") from specific observations or cases.</p>
</li>
<li><p><strong>Deduction</strong> then tests or applies those general rules in new situations.</p>
</li>
</ul>
<p>If deduction leads to wrong predictions (that is, a rule is falsified), induction may need to revise the original rule, which forms a continuous <strong>feedback loop</strong> that refines our understanding.</p>
<h3 id="heading-necessity-and-sufficiency-in-implication">Necessity and Sufficiency in Implication</h3>
<p>The implication <strong>P⟹Q ("If you crossed the border, you must have had a passport")</strong> unpacks into two fundamental logical concepts:</p>
<ul>
<li><p><strong>P is sufficient for Q:</strong> Crossing the border <strong>guarantees</strong> you had a passport. (P alone is enough for Q.)</p>
</li>
<li><p><strong>Q is necessary for P:</strong> If you <strong>didn't have a passport (¬Q), you couldn't have crossed (¬P)</strong>. (Q is required for P to happen.)</p>
</li>
</ul>
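<p>Both readings follow from the same implication. As a minimal sketch in Lean 4 (the theorem names are made up for this example), sufficiency is just applying P⟹Q, and necessity is its contrapositive:</p>

```lean
-- P: "you crossed the border", Q: "you had a passport".

-- Sufficiency: given P ⟹ Q, knowing P is enough to conclude Q.
theorem sufficient {P Q : Prop} (h : P → Q) (hp : P) : Q :=
  h hp

-- Necessity: given P ⟹ Q, lacking Q rules out P (the contrapositive).
theorem necessary {P Q : Prop} (h : P → Q) (hnq : ¬Q) : ¬P :=
  fun hp => hnq (h hp)
```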
<h2 id="heading-qed-the-enduring-power-of-logic-in-an-uncertain-world">Q.E.D.: The Enduring Power of Logic in an Uncertain World</h2>
<p>Throughout this handbook, we’ve journeyed from the foundational concepts of propositional logic and truth tables to the powerful argument forms of Modus Ponens and Modus Tollens. We explored how these tools enable valid deductions and identified common logical fallacies like Affirming the Consequent and Denying the Antecedent, understanding why they lead to incorrect inferences when an "if-then" relationship isn't a strict "if and only if." We learned the profound importance of falsifiability – the ability for a statement or hypothesis to be disproven – a cornerstone of both scientific inquiry and robust software testing.</p>
<p>We then delved into the practical application of these logical principles in software development, mapping code correctness to test outcomes. We discovered how a failing test, when trusted, becomes a powerful application of Modus Tollens, pinpointing defects. We also confronted the "illusion of correctness" that arises from the affirming the consequent fallacy when tests pass for the wrong reasons, especially when using test doubles.</p>
<p>Crucially, we introduced the "If and Only If" (P⟺Q) relationship, highlighting its unparalleled power in establishing unambiguous connections between cause and effect. This bidirectional guarantee is the ideal we strive for in test suite quality, moving beyond mere correlation to a deeper understanding of causality. We saw how mutation testing rigorously pushes us towards this "if and only if" confidence by actively trying to falsify the assumption that "incorrect code leads to failing tests," thereby strengthening its contrapositive: "passing tests guarantee correct code."</p>
<p>We also acknowledged the "messy reality" of modern software. Large systems are webs of complexity, with fan-in/fan-out patterns, side effects, and unforeseen interactions that can obscure clear logical chains. Technical debt and the double-edged sword of abstraction often mask the true origins of bugs, turning debugging into a "causal fog."</p>
<h3 id="heading-logic-as-your-compass">Logic as Your Compass</h3>
<p>Despite these formidable challenges, the logical principles we've explored remain your most vital tools. They provide the mental framework to navigate uncertainty.</p>
<p>When confronted with a bug, your ability to reason logically allows you to formulate hypotheses, design focused experiments (your tests), and interpret their outcomes with precision. Whether you're debugging a complex microservice or reasoning about a simple function, applying Modus Tollens to a failing test or designing tests that aim for P⟺Q clarity helps you cut through the noise.</p>
<p>We also touched upon formal tools such as the logic programming language Prolog, the specification language TLA+, and the proof assistants Coq, Isabelle, and Lean. Proof assistants in particular represent the pinnacle of applying formal logic to guarantee system correctness – a testament to the enduring power of logical rigor in critical domains.</p>
<p>In the intricate dance between theory and practice, the principles of logic stand as an unshakeable foundation. They are the "rocks" upon which you can meticulously build your understanding and your systems. The more consistently you apply this critical thinking, driven by curiosity and a commitment to rigorous validation, the clearer your path becomes.</p>
<p>This clarity is not just about fixing today’s bugs; it’s about continually refining your mental models, fostering trust in your codebase, and equipping yourself to build increasingly robust and predictable systems in an ever-evolving technological landscape.</p>
<p>If you love problem solving or critical thinking, or have a story about fixing an issue that turned out to be different from how it first appeared, feel free to connect with me at <a target="_blank" href="https://linkedin.com/in/hanqi91">https://linkedin.com/in/hanqi91</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749064755840/c7646f6a-a8ba-4cf5-9647-0488e24705aa.jpeg" alt="man kayaking and readying for a drop down a waterfall" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-resources">Resources</h2>
<ol>
<li><p>Article that motivated this handbook: <a target="_blank" href="https://thoughtbot.com/blog/classical-reasoning-and-debugging">Classical Reasoning and Debugging</a></p>
</li>
<li><p>3 Formal proofs of modus tollens: <a target="_blank" href="https://en.wikipedia.org/wiki/Modus_tollens">https://en.wikipedia.org/wiki/Modus_tollens</a></p>
</li>
<li><p>Table of 24 syllogisms: <a target="_blank" href="https://en.wikipedia.org/wiki/Syllogism">https://en.wikipedia.org/wiki/Syllogism</a></p>
</li>
<li><p>Challenging Assumptions: <a target="_blank" href="https://thoughtbot.com/blog/falsehoods-software-teams-believe-about-user-feedback">Falsehoods software teams believe about user feedback</a></p>
</li>
<li><p>How assumptions and software evolve beyond your control: <a target="_blank" href="https://www.tdda.info/why-code-rusts">https://www.tdda.info/why-code-rusts</a></p>
</li>
<li><p>Relationship to Hypothesis Testing: <a target="_blank" href="https://sites.google.com/view/reasonedwriting/home/FRAMEWORK_FOR_SCIENTIFIC_PAPERS/HYPOTHESES/HOW_TO_TEST_HYPOTHESES/MODUS_TOLLENS">https://sites.google.com/view/reasonedwriting/home/FRAMEWORK_FOR_SCIENTIFIC_PAPERS/HYPOTHESES/HOW_TO_TEST_HYPOTHESES/MODUS_TOLLENS</a></p>
</li>
<li><p>The Troubleshooting Mindset: <a target="_blank" href="https://www.autodidacts.io/troubleshooting/">https://www.autodidacts.io/troubleshooting/</a></p>
</li>
<li><p>Causal Diagrams from The Effect Book: <a target="_blank" href="https://theeffectbook.net/ch-CausalDiagrams.html">https://theeffectbook.net/ch-CausalDiagrams.html</a></p>
</li>
<li><p>A systematic guide to the mindsets and practices of debugging: <a target="_blank" href="https://www.amazon.sg/Debug-Find-Repair-Prevent-Bugs/dp/193435628X">https://www.amazon.sg/Debug-Find-Repair-Prevent-Bugs/dp/193435628X</a></p>
</li>
<li><p>Constructing P in a way to ensure software correctness: <a target="_blank" href="https://www.hillelwayne.com/post/constructive/">https://www.hillelwayne.com/post/constructive/</a></p>
</li>
<li><p>Fail Fast by explicitly representing assumptions as assertions: <a target="_blank" href="https://www.martinfowler.com/ieeeSoftware/failFast.pdf">https://www.martinfowler.com/ieeeSoftware/failFast.pdf</a></p>
</li>
<li><p>Deterministic Simulation Testing to tackle complex systems: <a target="_blank" href="https://pierrezemb.fr/posts/learn-about-dst/">https://pierrezemb.fr/posts/learn-about-dst/</a></p>
</li>
<li><p>GitHub’s Engineering System Success Playbook (ESSP) - Quality, Velocity, Developer Happiness on Business Outcomes: <a target="_blank" href="https://assets.ctfassets.net/wfutmusr1t3h/us6AUuwawrtNGTlwlT9Ac/f0fce86712054fc87f10db28b20f303b/GitHub-ESSP.pdf">https://assets.ctfassets.net/wfutmusr1t3h/us6AUuwawrtNGTlwlT9Ac/f0fce86712054fc87f10db28b20f303b/GitHub-ESSP.pdf</a></p>
</li>
<li><p>Closed-world assumption: <a target="_blank" href="https://en.wikipedia.org/wiki/Closed-world_assumption">https://en.wikipedia.org/wiki/Closed-world_assumption</a></p>
</li>
</ol>
<h2 id="heading-glossary">Glossary</h2>
<ul>
<li><p><strong>Axiom:</strong> A fundamental truth or rule accepted as a starting point for a logical or mathematical system, without requiring proof.</p>
</li>
<li><p><strong>Contrapositive:</strong> A logically equivalent form of an "if-then" statement (P⟹Q), which is ¬Q⟹¬P ("If not Q, then not P").</p>
</li>
<li><p><strong>Deductive Reasoning:</strong> A type of logical reasoning where a conclusion is necessarily true if its premises are true.</p>
</li>
<li><p><strong>Falsification:</strong> The principle, especially in science (from Karl Popper), that a hypothesis or theory must be capable of being proven false by empirical observation or experiment.</p>
</li>
<li><p><strong>Formal Logic:</strong> The study of abstract systems of reasoning and arguments based on their structure, independent of content.</p>
</li>
<li><p><strong>Hypothesis Testing:</strong> A statistical method for making inferences about a population based on sample data, typically by testing a null hypothesis (e.g., "P has no effect on Q") against an alternative hypothesis.</p>
</li>
<li><p><strong>Logical Fallacy:</strong> A flaw in the structure or content of an argument that makes it unsound or invalid, even if its conclusion might seem plausible.</p>
<ul>
<li><p><strong>Affirming the Consequent (Fallacy):</strong> An invalid argument form that mistakenly assumes if P⟹Q is true, and Q is true, then P must be true.</p>
</li>
<li><p><strong>Denying the Antecedent (Fallacy):</strong> An invalid argument form that mistakenly assumes if P⟹Q is true, and P is false, then Q must be false.</p>
</li>
</ul>
</li>
<li><p><strong>Modus Ponens:</strong> A valid argument form: If P⟹Q is true and P is true, then Q must be true.</p>
</li>
<li><p><strong>Modus Tollens:</strong> A valid argument form: If P⟹Q is true and ¬Q is true, then ¬P must be true.</p>
</li>
<li><p><strong>Mutation Testing:</strong> A software testing technique that involves deliberately introducing small, single-point faults (mutations) into code to assess the effectiveness and coverage of a test suite.</p>
</li>
<li><p><strong>Propositional Logic:</strong> A branch of logic that deals with propositions and their relationships using logical operators.</p>
</li>
<li><p><strong>Test-Driven Development (TDD):</strong> A software development methodology where tests are written <em>before</em> the code, guiding the development process and ensuring correctness.</p>
</li>
<li><p><strong>Truth Table:</strong> A table that systematically lists all possible truth values for a set of propositions and shows the resulting truth value of a complex logical statement.</p>
</li>
<li><p><strong>Vacuously True:</strong> Describes an implication (P⟹Q) that is considered true simply because its antecedent (P) is false.</p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Testing Framework for E-Commerce Checkout and Payments ]]>
                </title>
                <description>
                    <![CDATA[ When I first started working on E-commerce applications, I assumed testing checkout flows and payments would be straightforward. My expectation was simple: users select items, provide an address, pay, and receive confirmation. But I quickly learned t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-testing-framework-for-e-commerce-checkout-and-payments/</link>
                <guid isPermaLink="false">68308f32f4205e3b843cfb37</guid>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Automation Test Framework ]]>
                    </category>
                
                    <category>
                        <![CDATA[ checkoutpage ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Venkata Sai Sandeep ]]>
                </dc:creator>
                <pubDate>Fri, 23 May 2025 15:07:30 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748007727163/0fc1a849-6309-4d37-9415-844f9691de40.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When I first started working on E-commerce applications, I assumed testing checkout flows and payments would be straightforward. My expectation was simple: users select items, provide an address, pay, and receive confirmation. But I quickly learned that each step in the checkout process is filled with hidden complexities, edge cases, and unexpected behaviors.</p>
<p>The reason I’m sharing my experience is simple: I struggled initially to find detailed resources that described real-world checkout testing challenges. I want this article to be what I wish I had when I began – a clear, structured guide to building a robust checkout and payment testing framework that anticipates and handles real-world scenarios effectively.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-why-this-is-important-and-challenging">Why This is Important and Challenging</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-getting-started">Getting Started</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-the-checkout-flow">Testing the Checkout Flow</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-cart-state-and-validation">Step 1: Cart State and Validation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-address-and-shipping-details">Step 2: Address and Shipping Details</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-payment-method-selection-and-validation">Step 3: Payment Method Selection and Validation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-payment-processing-and-error-handling">Step 4: Payment Processing and Error Handling</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-order-confirmation">Step 5: Order Confirmation</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-personal-challenges-and-lessons-learned">Personal Challenges &amp; Lessons Learned</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ol>
<h2 id="heading-why-this-is-important-and-challenging">Why This is Important and Challenging</h2>
<p>Testing checkout and payment flows is crucial because they’re directly tied to customer trust and business revenue. Each mistake or oversight can lead to lost sales, security vulnerabilities, or damaged reputation.</p>
<p>The complexity arises because checkout processes involve multiple integrated components (carts, addresses, payments, and confirmations), each of which can fail or behave unpredictably. Robust testing ensures the system reliably handles real-world customer behaviors and system anomalies, safeguarding both user experience and business success.</p>
<h2 id="heading-getting-started">Getting Started</h2>
<p>To follow along with this guide, you'll need basic experience in Java (8 or later), object-oriented programming concepts like interfaces and classes, and familiarity with a text editor or IDE such as IntelliJ, Eclipse, or VS Code.</p>
<p>This article is beginner-friendly but touches on real-world use cases that are beneficial to experienced engineers. You'll work with simulated inputs rather than real APIs, making it safe to explore and experiment.</p>
<h3 id="heading-defining-some-terms">Defining Some Terms:</h3>
<p>In this context, a "testing framework" refers to a modular, logic-driven structure for validating key business rules in the checkout pipeline.</p>
<p>Instead of relying on external libraries like JUnit or Selenium, this approach embeds rule-based validations directly into the control flow. Each component (for example, cart, address, payment) is treated as a testable unit with clear preconditions and response logic, reflecting how a lightweight internal QA harness might enforce system integrity.</p>
<p>For example, verifying that a cart has items with quantity &gt; 0, or that an address includes required fields like postal code, simulates the validation engine that would exist in production-grade systems.</p>
<p>We'll also use the term "Assertion Steps" throughout this article to describe the key validation points your framework should enforce at each stage of the checkout flow. These aren't formal assertions from a test library, but are rather logical checks built into the control flow that verify specific conditions like ensuring a cart isn’t empty or a payment method is supported.</p>
<p>When I began building frameworks, I often focused on getting things to work, but missed defining what "working" meant. Adding clear, meaningful assertions to each step transformed my process. They became not only guardrails for correctness, but also checkpoints that made my test code more maintainable, predictable, and easier to extend.</p>
<h2 id="heading-testing-the-checkout-flow">Testing the Checkout Flow</h2>
<p>Now that we understand why checkout testing is important and what we’ll be doing here, let’s walk through the key parts of the flow. Each stage represents a critical checkpoint where real-world issues can emerge and where your test framework should be ready to catch them.</p>
<h3 id="heading-step-1-cart-state-and-validation">Step 1: Cart State and Validation</h3>
<p>Before testing payments, I learned the hard way that ensuring the cart’s state is critical. Users frequently modify carts during checkout, or their session might expire.</p>
<p>The cart is where every checkout begins. It might look simple, but it’s surprisingly fragile. Users can remove items mid-flow, reload stale pages, or even send malformed data. Your framework should validate both the cart’s structure and the legitimacy of its contents before allowing checkout to proceed.</p>
<pre><code class="lang-java">Map&lt;String, Integer&gt; cartItems = getCartItems();

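<span class="hljs-comment">// allMatch() is vacuously true for an empty map, so reject empty carts explicitly.</span>
<span class="hljs-keyword">boolean</span> isCartValid = !cartItems.isEmpty() &amp;&amp; cartItems.values().stream()
    .allMatch(quantity -&gt; quantity != <span class="hljs-keyword">null</span> &amp;&amp; quantity &gt; <span class="hljs-number">0</span>);

<span class="hljs-keyword">if</span> (isCartValid) {
    proceedToCheckout();
} <span class="hljs-keyword">else</span> {
    showError(<span class="hljs-string">"Cart validation failed: the cart is empty or one or more items have invalid quantities."</span>);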
}
</code></pre>
<p><strong>Assertion Steps:</strong></p>
<p>We’re validating that this logic enforces key conditions, ensuring that only valid cart states proceed and failures are clearly reported. This helps isolate issues early and improves confidence in the checkout pipeline:</p>
<ul>
<li><p>Verify error messages appear when the cart validation fails (<code>showError(…)</code> line).</p>
</li>
<li><p>Confirm the checkout process advances only if the cart is valid (<code>proceedToCheckout()</code> line).</p>
</li>
</ul>
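<p>One way to exercise these assertion steps with simulated inputs is sketched below. The class <code>CartValidationSketch</code> and its <code>checkout</code> method are hypothetical stand-ins for <code>proceedToCheckout()</code> and <code>showError(…)</code>; they return a message instead of touching a real UI. The sketch also rejects empty and null carts, since <code>allMatch</code> on its own returns true when there are no items to check:</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CartValidationSketch {
    // A cart is valid when it is non-empty and every quantity is positive.
    static boolean isCartValid(Map<String, Integer> cartItems) {
        if (cartItems == null || cartItems.isEmpty()) {
            return false; // allMatch alone would accept an empty cart
        }
        return cartItems.values().stream()
                .allMatch(quantity -> quantity != null && quantity > 0);
    }

    // Hypothetical stand-in for the checkout step: returns the message a user would see.
    static String checkout(Map<String, Integer> cartItems) {
        return isCartValid(cartItems)
                ? "Proceeding to checkout"
                : "Cart validation failed";
    }

    public static void main(String[] args) {
        Map<String, Integer> validCart = new LinkedHashMap<>();
        validCart.put("sku-1", 2);

        Map<String, Integer> staleCart = new LinkedHashMap<>();
        staleCart.put("sku-1", 2);
        staleCart.put("sku-2", 0); // simulates an item removed mid-flow

        System.out.println(checkout(validCart));
        System.out.println(checkout(staleCart));
        System.out.println(checkout(new LinkedHashMap<>()));
    }
}
```

<p>Returning the message rather than printing it inside <code>checkout</code> keeps the rule easy to check from any test harness.</p>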
<h3 id="heading-step-2-address-and-shipping-details">Step 2: Address and Shipping Details</h3>
<p>I encountered many edge cases such as incomplete addresses, international formats, and unexpected API failures from shipping providers.</p>
<p>To handle these issues, validate the shipping address before moving on. This ensures that the order has a destination and that the destination is reachable. Incomplete fields, invalid formats, or API glitches can lead to fulfillment failures, so your test logic should enforce address completeness and formatting before progressing.</p>
<pre><code class="lang-java">Map&lt;String, String&gt; addressFields = address.getAddressFields();

<span class="hljs-keyword">boolean</span> isAddressComplete = Stream.of(<span class="hljs-string">"street"</span>, <span class="hljs-string">"city"</span>, <span class="hljs-string">"postalCode"</span>)
    .allMatch(field -&gt; addressFields.getOrDefault(field, <span class="hljs-string">""</span>).trim().length() &gt; <span class="hljs-number">0</span>);

<span class="hljs-keyword">if</span> (isAddressComplete) {
    confirmShippingDetails(address);
} <span class="hljs-keyword">else</span> {
    showError(<span class="hljs-string">"Invalid or incomplete address provided."</span>);
}
</code></pre>
<p><strong>Assertion Steps:</strong></p>
<p>This validation ensures the system doesn’t proceed with incomplete address data. The stream logic checks for required fields, and depending on the result, either confirms the shipping or triggers an error message.</p>
<ul>
<li><p>Confirm the system rejects incomplete or invalid addresses (the conditional check in the <code>isAddressComplete</code> stream logic).</p>
</li>
<li><p>Ensure clear error messages are displayed if address validation fails (<code>showError(…)</code> line).</p>
</li>
</ul>
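<p>A minimal sketch of checking these conditions against simulated addresses might look like this. The class name <code>AddressValidationSketch</code> and the sample values are made up for the example. It also treats a field that is explicitly mapped to <code>null</code> as missing, a case where calling <code>trim()</code> on the result of <code>getOrDefault</code> would throw instead:</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

final class AddressValidationSketch {
    // Every required field must be present and contain more than whitespace.
    static boolean isAddressComplete(Map<String, String> fields) {
        return Stream.of("street", "city", "postalCode")
                .allMatch(field -> {
                    String value = fields.get(field);
                    return value != null && !value.trim().isEmpty();
                });
    }

    public static void main(String[] args) {
        Map<String, String> address = new HashMap<>();
        address.put("street", "1 Example Road");
        address.put("city", "Sampletown");
        address.put("postalCode", "   "); // whitespace only: should be rejected

        System.out.println("Whitespace postal code accepted? " + isAddressComplete(address));

        address.put("postalCode", "12345");
        System.out.println("Complete address accepted? " + isAddressComplete(address));
    }
}
```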
<h3 id="heading-step-3-payment-method-selection-and-validation">Step 3: Payment Method Selection and Validation</h3>
<p>Payment methods like credit cards, debit cards, digital wallets, and gift cards require different validation rules and logic flows.</p>
<p>This step ensures that only valid and supported payment methods can be used. From credit cards to mobile wallets, each method requires its own validation logic. Testing here prevents users from attempting transactions with incomplete or unverified payment inputs.</p>
<pre><code class="lang-java">Set&lt;String&gt; supportedMethods = <span class="hljs-keyword">new</span> HashSet&lt;&gt;(Arrays.asList(<span class="hljs-string">"CreditCard"</span>, <span class="hljs-string">"DebitCard"</span>, <span class="hljs-string">"PayPal"</span>, <span class="hljs-string">"Wallet"</span>));

<span class="hljs-keyword">if</span> (supportedMethods.contains(paymentMethod.getType()) &amp;&amp; paymentMethod.detailsAreValid()) {
    processPayment(paymentMethod);
} <span class="hljs-keyword">else</span> {
    showError(<span class="hljs-string">"Selected payment method is invalid or unsupported."</span>);
}
</code></pre>
<p><strong>Assertion Steps:</strong></p>
<p>This logic ensures that only supported and valid payment types can proceed to processing. The <code>contains(…)</code> check confirms the method is allowed, while <code>detailsAreValid()</code> guards against incomplete or incorrect data. Combined, these help isolate bad inputs early in the flow:</p>
<ul>
<li><p>Confirm unsupported payment types trigger the appropriate error (<code>showError(…)</code> line).</p>
</li>
<li><p>Ensure the payment processing proceeds only with valid and supported methods (<code>processPayment(paymentMethod)</code> line).</p>
</li>
</ul>
<p><strong>Common Payment Method Validations:</strong></p>
<p>Different payment methods have unique validation requirements. Here are examples of some key tests:</p>
<ul>
<li><p><strong>Credit Card:</strong> Validate card number format (for example, starts with 4 for Visa, correct length), CVV (3-digit), and expiry date validity.</p>
<pre><code class="lang-java">  <span class="hljs-keyword">if</span> (paymentMethod.getType().equals(<span class="hljs-string">"CreditCard"</span>) &amp;&amp; paymentMethod.getCardNumber().matches(<span class="hljs-string">"^4[0-9]{12}(?:[0-9]{3})?$"</span>)) {
      processPayment(paymentMethod);
  } <span class="hljs-keyword">else</span> {
      showError(<span class="hljs-string">"Invalid credit card details."</span>);
  }
</code></pre>
</li>
<li><p><strong>PayPal:</strong> Confirm linked account is verified.</p>
<pre><code class="lang-java">  <span class="hljs-keyword">if</span> (paymentMethod.getType().equals(<span class="hljs-string">"PayPal"</span>) &amp;&amp; paymentMethod.isAccountVerified()) {
      processPayment(paymentMethod);
  } <span class="hljs-keyword">else</span> {
      showError(<span class="hljs-string">"Unverified PayPal account."</span>);
  }
</code></pre>
</li>
<li><p><strong>Digital Wallet</strong>: Validate secure token is correctly formed and active.</p>
<pre><code class="lang-java">  <span class="hljs-keyword">if</span> (paymentMethod.getType().equals(<span class="hljs-string">"Wallet"</span>) &amp;&amp; paymentMethod.isTokenValid()) {
      processPayment(paymentMethod);
  } <span class="hljs-keyword">else</span> {
      showError(<span class="hljs-string">"Invalid or expired wallet token."</span>);
  }
</code></pre>
</li>
</ul>
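<p>The credit card bullet mentions three checks (number format, CVV, and expiry), while the snippet only covers the first. Here is a sketch that combines all three. The class <code>CardValidationSketch</code>, its method names, and the test card numbers are assumptions for illustration. The current month is passed in as a parameter so the check stays deterministic under test:</p>

```java
import java.time.YearMonth;

final class CardValidationSketch {
    // Same Visa pattern as above: starts with 4, 13 or 16 digits in total.
    static boolean isVisaNumber(String number) {
        return number != null && number.matches("^4[0-9]{12}(?:[0-9]{3})?$");
    }

    // CVV must be exactly three digits.
    static boolean isCvvValid(String cvv) {
        return cvv != null && cvv.matches("[0-9]{3}");
    }

    // Assumes a card stays usable through the end of its expiry month.
    static boolean isNotExpired(YearMonth expiry, YearMonth currentMonth) {
        return expiry != null && !expiry.isBefore(currentMonth);
    }

    static boolean isCardValid(String number, String cvv, YearMonth expiry, YearMonth currentMonth) {
        return isVisaNumber(number) && isCvvValid(cvv) && isNotExpired(expiry, currentMonth);
    }

    public static void main(String[] args) {
        YearMonth currentMonth = YearMonth.of(2025, 5); // fixed date for a repeatable run
        System.out.println(isCardValid("4111111111111111", "123", YearMonth.of(2027, 1), currentMonth));
        System.out.println(isCardValid("4111111111111111", "12", YearMonth.of(2027, 1), currentMonth));
        System.out.println(isCardValid("4111111111111111", "123", YearMonth.of(2025, 4), currentMonth));
    }
}
```

<p>A format check like this only filters out obviously malformed input. It says nothing about whether the card exists or has funds, which is why the gateway response in the next step still needs its own tests.</p>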
<h3 id="heading-step-4-payment-processing-and-error-handling">Step 4: Payment Processing and Error Handling</h3>
<p>Even when payment details are valid, payment gateways can fail unpredictably due to network issues, bank declines, or incorrect transaction formats.</p>
<p>This step tests whether the system handles payment failures gracefully, reports them clearly, and processes orders only after the gateway confirms the payment.</p>
<pre><code class="lang-java">PaymentResponse response = paymentGateway.process(transactionDetails);
<span class="hljs-keyword">if</span> (response.isSuccessful()) {
    confirmOrder(response);
} <span class="hljs-keyword">else</span> {
    handlePaymentError(response.getError());
}
</code></pre>
<p><strong>Assertion Steps:</strong></p>
<p>This logic focuses on how the system handles responses from the payment gateway. The <code>isSuccessful()</code> check ensures only confirmed transactions trigger order creation, while any failure path is routed to <code>handlePaymentError()</code>, allowing you to test error flows like declines or timeouts clearly.</p>
<ul>
<li><p>Confirm errors from payment processing (<code>handlePaymentError(response.getError())</code> line) are handled gracefully.</p>
</li>
<li><p>Common errors your framework should simulate and verify include:</p>
<ul>
<li><p><strong>Timeouts</strong>: when the gateway service is delayed or unreachable.</p>
</li>
<li><p><strong>Insufficient Funds</strong>: valid card but not enough balance.</p>
</li>
<li><p><strong>Card Declined</strong>: blocked or expired cards.</p>
</li>
<li><p><strong>Malformed Requests</strong>: missing fields or invalid transaction payloads.</p>
</li>
</ul>
</li>
<li><p>Ensure successful transactions are always followed by order confirmations (<code>confirmOrder(response)</code> line).</p>
</li>
</ul>
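<p>One way to exercise these failure paths without calling a real gateway is to script the responses yourself. The sketch below is illustrative only: <code>GatewayError</code>, <code>FakeResponse</code>, and <code>CheckoutProbe</code> are hypothetical stand-ins invented for this example, and the user-facing messages are placeholders.</p>

```java
/** Hypothetical error categories mirroring the list above. */
enum GatewayError { TIMEOUT, INSUFFICIENT_FUNDS, CARD_DECLINED, MALFORMED_REQUEST }

/** Minimal stand-in for a gateway response; a null error means success. */
final class FakeResponse {
    private final GatewayError error;

    FakeResponse(GatewayError error) {
        this.error = error;
    }

    boolean isSuccessful() {
        return error == null;
    }

    GatewayError getError() {
        return error;
    }
}

/** Records what the checkout did so that a test can assert on it. */
final class CheckoutProbe {
    boolean orderConfirmed = false;
    String shownMessage = null;

    /** Same branching as the snippet above, run against a scripted response. */
    void handle(FakeResponse response) {
        if (response.isSuccessful()) {
            orderConfirmed = true;
        } else {
            shownMessage = messageFor(response.getError());
        }
    }

    /** Placeholder messages; the real wording is a product decision. */
    static String messageFor(GatewayError error) {
        switch (error) {
            case TIMEOUT:
                return "The payment service did not respond. Please try again.";
            case INSUFFICIENT_FUNDS:
                return "Insufficient funds. Please use another payment method.";
            case CARD_DECLINED:
                return "Your card was declined.";
            case MALFORMED_REQUEST:
                return "We could not process this payment request.";
            default:
                throw new IllegalArgumentException("Unknown error: " + error);
        }
    }
}
```

<p>A test can then loop over <code>GatewayError.values()</code>, feed each value to a fresh <code>CheckoutProbe</code>, and assert that no order was confirmed and that a message was shown. A response built with a <code>null</code> error covers the success path.</p>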
<h3 id="heading-step-5-order-confirmation">Step 5: Order Confirmation</h3>
<p>Order confirmation accuracy and timing are crucial. Issues can occur if confirmation happens prematurely or email notifications are delayed.</p>
<p>This final step validates that orders are confirmed only after successful payment. Confirming too early can result in orders without revenue or in duplicate transactions. The framework should check for payment settlement before confirming the order and notifying the user.</p>
<pre><code class="lang-java"><span class="hljs-keyword">if</span> (payment.isSettled()) {
    order.createRecord();
    notifyCustomer(order);
} <span class="hljs-keyword">else</span> {
    showError(<span class="hljs-string">"Order cannot be confirmed until payment settles."</span>);
}
</code></pre>
<p><strong>Assertion Steps:</strong></p>
<p>This logic ensures confirmation and notification only happen after payment settlement. The <code>payment.isSettled()</code> check guards against premature actions, allowing order creation and customer notifications only when the transaction is fully complete:</p>
<ul>
<li><p>Validate that emails are sent only after payment settlement (the <code>notifyCustomer(order)</code> line, which follows the successful payment check).</p>
</li>
<li><p>Confirm that orders are created accurately after payments (<code>order.createRecord()</code> line).</p>
</li>
</ul>
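<p>These assertions can be sketched with a small recording class that stands in for the order store and the email sender. <code>ConfirmationProbe</code> and its fields are hypothetical names invented for this example. The duplicate guard is also an assumption, added because a settlement callback may arrive more than once.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical confirmation step that records its side effects for tests. */
final class ConfirmationProbe {
    final List<String> createdOrders = new ArrayList<>();
    final List<String> notifiedOrders = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    /** Mirrors the snippet above: create the record and notify only when settled. */
    void confirm(String orderId, boolean paymentSettled) {
        if (!paymentSettled) {
            errors.add("Order cannot be confirmed until payment settles.");
            return;
        }
        // Assumption: a repeated callback for the same order must not create a duplicate.
        if (createdOrders.contains(orderId)) {
            return;
        }
        createdOrders.add(orderId);
        notifiedOrders.add(orderId);
    }
}
```

<p>A test calls <code>confirm</code> with an unsettled payment and checks that both lists stay empty. It then calls <code>confirm</code> twice with a settled payment and checks that exactly one order and one notification were recorded.</p>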
<h2 id="heading-personal-challenges-amp-lessons-learned">Personal Challenges &amp; Lessons Learned</h2>
<ul>
<li><p>Users behave unpredictably: design your tests to mimic real-world behavior as closely as possible.</p>
</li>
<li><p>Simulate external service failures proactively: don’t wait for production to expose them.</p>
</li>
<li><p>Maintain detailed logs: they help pinpoint issues faster during debugging.</p>
</li>
<li><p>Communicate clearly and promptly: users value transparency when issues arise.</p>
</li>
</ul>
<p>These challenges reinforced that technical correctness alone is not sufficient. An effective testing framework must account for unpredictable user behavior, proactively simulate third-party service failures, and offer traceability through detailed logs.</p>
<p>By building for resilience and maintaining clear communication, you can ensure your e-commerce system operates reliably and builds lasting user trust even under stress.</p>
<h2 id="heading-key-takeaways">Key Takeaways</h2>
<ul>
<li><p>Always validate backend logic separately from UI.</p>
</li>
<li><p>Include negative and edge-case scenarios in your tests.</p>
</li>
<li><p>Expect API failures and handle them gracefully.</p>
</li>
</ul>
<h2 id="heading-lessons-from-the-journey">Lessons from the Journey</h2>
<p>Testing e-commerce checkouts taught me that robust frameworks account for human behavior, expect the unexpected, and rigorously validate each step. By sharing my journey, I hope to shorten the learning curve for others facing similar challenges.</p>
<p>Remember: effective testing isn’t about getting to zero defects immediately. It’s about continuous refinement and learning from every scenario. Keep building, keep testing, and let your code reflect real-world reliability.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
