<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Jessica Patel - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Jessica Patel - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Wed, 20 May 2026 15:58:39 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/jesspat103/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Design a Type-Safe, Lazy, and Secure Plugin Architecture in React ]]>
                </title>
                <description>
                    <![CDATA[ Modern web applications increasingly need to evolve faster than a single team can maintain a monolithic codebase. Product teams often want to add features independently, experiment with new capabiliti ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-design-a-type-safe-lazy-and-secure-plugin-architecture-in-react/</link>
                <guid isPermaLink="false">69caa5bc9fffa747404dbd51</guid>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                    <category>
                        <![CDATA[ TypeScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ plugin ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Frontend Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jessica Patel ]]>
                </dc:creator>
                <pubDate>Mon, 30 Mar 2026 15:00:00 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/5cdc3448-3bf8-456e-b316-33c6bcb98690.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Modern web applications increasingly need to evolve faster than a single team can maintain a monolithic codebase. Product teams often want to add features independently, experiment with new capabilities, or deploy domain-specific functionality without modifying the core application every time. This is where a plugin architecture becomes valuable.</p>
<p>A plugin architecture allows an application to load external modules that extend its functionality at runtime. Instead of embedding every feature directly in the core application, the system exposes a controlled interface (the host API) that plugins use to integrate with the platform. These plugins can register UI components, contribute functionality, or interact with application services while remaining isolated from the core codebase.</p>
<p>This architectural pattern is widely used across software ecosystems. Platforms such as IDEs, content management systems, and browser extensions rely on plugins to allow third-party developers to extend their functionality without compromising stability.</p>
<p>In a web application context, a similar approach allows large frontend systems to evolve modularly, enabling multiple teams to ship features independently.</p>
<p>In this tutorial, you'll learn how to design a type-safe, lazy-loaded, and secure plugin architecture in React — complete with lifecycle management, independent bundling, hot-loading, and real TypeScript examples.</p>
<p>By the end, you'll have everything you need to transform your React application into a modular platform capable of hosting independent extensions without sacrificing maintainability, performance, or security.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-a-common-pain-point-scaling-frontend-platforms">A Common Pain Point: Scaling Frontend Platforms</a></p>
</li>
<li><p><a href="#heading-what-this-article-will-cover">What This Article Will Cover</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-why-a-plugin-architecture">Why a Plugin Architecture?</a></p>
</li>
<li><p><a href="#heading-core-concepts-of-a-react-plugin-architecture">Core Concepts of a React Plugin Architecture</a></p>
</li>
<li><p><a href="#heading-high-level-architecture-of-a-react-plugin-system">High-Level Architecture of a React Plugin System</a></p>
</li>
<li><p><a href="#heading-real-typescript-example-a-chat-plugin">Real TypeScript Example: A Chat Plugin</a></p>
<ul>
<li><p><a href="#heading-chat-plugin-implementation">Chat Plugin Implementation</a></p>
</li>
<li><p><a href="#heading-host-application-usage">Host Application Usage</a></p>
</li>
<li><p><a href="#heading-1-how-to-define-the-host-api">1. How to Define the Host API</a></p>
</li>
<li><p><a href="#heading-2-how-to-define-the-plugin-lifecycle">2. How to Define the Plugin Lifecycle</a></p>
</li>
<li><p><a href="#heading-3-how-to-bundle-plugins-separately">3. How to Bundle Plugins Separately</a></p>
</li>
<li><p><a href="#heading-4-how-to-lazy-load-plugins">4. How to Lazy-Load Plugins</a></p>
</li>
<li><p><a href="#heading-5-security-amp-permission-model">5. Security &amp; Permission Model</a></p>
</li>
<li><p><a href="#heading-6-plugin-hot-loading">6. Plugin Hot-loading</a></p>
</li>
<li><p><a href="#heading-7-ci-amp-deployment-considerations">7. CI &amp; Deployment Considerations</a></p>
</li>
<li><p><a href="#heading-putting-it-all-together">Putting It All Together</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-best-practices">Best Practices</a></p>
</li>
<li><p><a href="#heading-when-not-to-use-a-plugin-architecture">When NOT to Use a Plugin Architecture</a></p>
<ul>
<li><p><a href="#heading-small-or-single-team-applications">Small or Single-Team Applications</a></p>
</li>
<li><p><a href="#heading-tightly-coupled-features">Tightly Coupled Features</a></p>
</li>
<li><p><a href="#heading-performance-critical-systems">Performance-Critical Systems</a></p>
</li>
<li><p><a href="#heading-limited-security-controls">Limited Security Controls</a></p>
</li>
<li><p><a href="#heading-early-stage-products">Early-Stage Products</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-future-enhancements">Future Enhancements</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-a-common-pain-point-scaling-frontend-platforms">A Common Pain Point: Scaling Frontend Platforms</h2>
<p>Consider a large internal admin dashboard used by multiple teams across an organization. Each team wants to add its own functionality, like analytics dashboards, workflow management tools, user administration panels, and domain-specific reporting modules.</p>
<p>If all these features are implemented directly in the main React application, several problems quickly emerge. Merge conflicts in the core repository become frequent, unrelated features grow tightly coupled, and release cycles slow down because every change requires redeploying the entire application. Worse, adding new features carries a constant risk of breaking existing functionality.</p>
<p>A plugin architecture solves this problem by allowing each feature to be developed as an independent plugin. The host application provides a stable platform and a controlled API, while teams can ship their own plugins without modifying the core system.</p>
<h2 id="heading-what-this-article-will-cover">What This Article Will Cover</h2>
<p>This guide walks you through how to design a type-safe, lazy-loaded, and secure plugin architecture in React using TypeScript. You'll learn how to design a host API that plugins can safely interact with, how to define a plugin lifecycle for initialization, mounting, updates, and cleanup, and how to bundle plugins independently so they can be developed and deployed separately.</p>
<p>You'll also learn how to lazy-load plugins at runtime to improve performance, how to implement a security model that prevents plugins from accessing sensitive application state, and how to enable hot-loading during development while enforcing safety through CI/CD pipelines.</p>
<p>By the end of this article, you'll understand how to build a flexible plugin system that allows your React application to grow into a modular platform capable of hosting independent extensions without sacrificing maintainability, performance, or security.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before following along with this guide, you should be familiar with several core technologies and concepts used throughout the examples.</p>
<p><strong>React Fundamentals</strong><br>A basic understanding of React components, hooks, and JSX is required. The examples assume familiarity with functional components, <code>useState</code>, and <code>useEffect</code>.</p>
<p><strong>TypeScript Basics</strong><br>Since the plugin architecture relies heavily on type contracts between the host application and plugins, you should understand TypeScript interfaces, generics, and module exports.</p>
<p><strong>Modern JavaScript Modules</strong><br>Knowledge of ES modules (<code>import</code> / <code>export</code>) and dynamic imports will help when working with lazy-loaded plugins.</p>
<p><strong>React Tooling (Vite or Webpack)</strong><br>The examples reference modern frontend build tools such as Vite. Familiarity with how bundlers compile React applications and manage dependencies will help when configuring plugin builds.</p>
<p><strong>Basic Web Security Concepts</strong><br>Some sections discuss sandboxing and restricted APIs. A general understanding of browser security concepts such as iframes, same-origin policies, and API boundaries is helpful but not strictly required.</p>
<h2 id="heading-why-a-plugin-architecture">Why a Plugin Architecture?</h2>
<p>Imagine you're building an internal admin platform where multiple teams need to ship independent features as plugins without risking the core application. A plugin architecture allows each team to contribute functionality safely, while the host maintains type safety, security, and performance.</p>
<p>This guide targets React/TypeScript engineers who want to design a plugin system capable of hosting third-party extensions without compromising maintainability.</p>
<p>The benefits of this approach are significant. Extensibility means developers or third parties can add features without touching core code. Isolation allows plugins to be sandboxed so they can't affect unrelated parts of the application. Lazy loading ensures only the features a user actually needs are fetched, keeping the application fast. TypeScript enforces a strict contract between plugins and the host, catching errors at compile time rather than at runtime. Finally, controlled APIs and permission boundaries prevent malicious or poorly written plugins from interfering with the rest of the system.</p>
<p>A well-architected plugin system balances all of these qualities – flexibility, safety, and maintainability – without forcing unnecessary trade-offs between them.</p>
<h2 id="heading-core-concepts-of-a-react-plugin-architecture">Core Concepts of a React Plugin Architecture</h2>
<p>Before diving into code, it helps to understand the key building blocks that make up a React plugin system.</p>
<p>At a high level, a plugin architecture in React revolves around five concerns.</p>
<ol>
<li><p>The <strong>Host API</strong> is the interface the core application exposes to plugins.</p>
</li>
<li><p>The <strong>Plugin Lifecycle</strong> defines methods for initialization, mounting, updating, and cleanup.</p>
</li>
<li><p><strong>Bundling</strong> means compiling each plugin separately to avoid coupling it to the host.</p>
</li>
<li><p>The <strong>Security Model</strong> covers permissions and sandboxing to prevent misuse.</p>
</li>
<li><p>Finally, <strong>Hot-loading and CI</strong> streamline the development and deployment experience.</p>
</li>
</ol>
<p>We'll explore each of these concepts in detail in the sections that follow. First, let's look at how they fit together visually.</p>
<h2 id="heading-high-level-architecture-of-a-react-plugin-system">High-Level Architecture of a React Plugin System</h2>
<p>The following diagram illustrates how the host application interacts with independently bundled plugins. The host exposes a controlled API, loads plugins dynamically, and manages their lifecycle while maintaining security boundaries.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6979762ba2442d262dacf388/5b7ef89a-62ae-4ea0-97f5-34a2423650d3.png" alt="React Plugin Architecture" style="display:block;margin:0 auto" width="680" height="545" loading="lazy">

<p>The core application serves as the runtime environment for all plugins, housing the plugin loader, lifecycle manager, and the host API.</p>
<p>The plugin loader dynamically imports plugin bundles at runtime using <code>import()</code>, while the host API ensures plugins interact with the application through a controlled interface rather than accessing internal state directly.</p>
<p>Each plugin is compiled as a separate bundle and registers itself with the host during initialization. A dedicated security layer enforces all of these boundaries, ensuring plugins cannot directly manipulate internal state or sensitive resources.</p>
<p>Together, these pieces ensure that plugins remain independent, lazy-loadable, and secure, while the host application retains full control over lifecycle management and platform stability.</p>
<h2 id="heading-real-typescript-example-a-chat-plugin">Real TypeScript Example: A Chat Plugin</h2>
<p>Now that you have a mental model of the architecture, let's look at a minimal working example before diving into each concept individually. This example demonstrates how a plugin registers itself with the host application and exposes a UI component through the host API.</p>
<p>The following plugin implements a simple chat feature that registers a React component with the host platform.</p>
<h3 id="heading-chat-plugin-implementation">Chat Plugin Implementation</h3>
<pre><code class="language-typescript">// plugins/chat-plugin/src/plugin.ts

import { createElement } from 'react';
import type { HostAPI } from '../../../src/plugins/host';
import type { Plugin } from '../../../src/plugins/plugin';

const ChatPlugin: Plugin = {
  name: 'ChatPlugin',
  version: '1.0.0',
  init(host: HostAPI) {
    // This file has a .ts extension, so it calls createElement instead of using JSX.
    host.registerComponent('Chat', () =&gt;
      createElement('div', null, 'Welcome to the Chat Plugin!')
    );
    host.log('ChatPlugin initialized');
  },
};

export default ChatPlugin;
</code></pre>
<h3 id="heading-host-application-usage">Host Application Usage</h3>
<p>The host application loads the plugin and renders the component it registered.</p>
<pre><code class="language-typescript">const Chat = hostAPI.getComponent('Chat');

return (
  &lt;div&gt;
    {Chat ? &lt;Chat /&gt; : 'Loading Chat Plugin...'}
  &lt;/div&gt;
);
</code></pre>
<p>In this example, the plugin doesn't directly modify the host application. Instead, it interacts through the Host API, registering a component that the host can render dynamically. The sections below break down exactly how each piece of this system is built.</p>
<h3 id="heading-1-how-to-define-the-host-api">1. How to Define the Host API</h3>
<p>The <strong>host API</strong> is the contract between the core application and its plugins: it defines exactly which functionality plugins can access. Before plugins can do anything useful, the host must expose this controlled interface.</p>
<p><strong>Example: TypeScript Host API</strong></p>
<pre><code class="language-typescript">// src/plugins/host.ts

import type * as React from 'react';

export interface HostAPI {
  // Using ComponentType instead of FC&lt;any&gt; reinforces type-safety while allowing class/function components
  registerComponent: (name: string, component: React.ComponentType&lt;any&gt;) =&gt; void;
  getComponent: (name: string) =&gt; React.ComponentType&lt;any&gt; | undefined;
  log: (message: string) =&gt; void;
}

// Note: We still use `any` for props here for extensibility; plugins can define stricter props locally if needed.

// Declared before hostAPI so the registry exists by the time any method runs.
const componentRegistry: Record&lt;string, React.ComponentType&lt;any&gt;&gt; = {};

export const hostAPI: HostAPI = {
  registerComponent(name, component) {
    console.log(`Registered component: ${name}`);
    componentRegistry[name] = component;
  },
  getComponent(name) {
    return componentRegistry[name];
  },
  log(message) {
    console.log(`[PLUGIN LOG]: ${message}`);
  },
};
</code></pre>
<p>This API allows plugins to register UI components and log messages, without giving them unrestricted access to the application state.</p>
<h3 id="heading-2-how-to-define-the-plugin-lifecycle">2. How to Define the Plugin Lifecycle</h3>
<p>A plugin lifecycle ensures consistent behavior across all extensions. Once the host API exists, plugins need a structured way to initialize, render, and clean up resources.</p>
<p><strong>Lifecycle Interface</strong></p>
<pre><code class="language-typescript">// src/plugins/plugin.ts

import { HostAPI } from './host';

export interface Plugin {
  name: string;
  version: string;
  init: (host: HostAPI) =&gt; void;
  mount?: () =&gt; void;
  update?: () =&gt; void;
  unmount?: () =&gt; void;
}

// Typically, the host calls mount/update/unmount based on route changes, feature flags, or user interactions.
</code></pre>
<p>The <code>init</code> method is the only required hook. It is called when the plugin is first loaded and receives the host API as its argument. The remaining hooks are optional: <code>mount</code> is called when the plugin's UI is displayed, and <code>update</code> is triggered when props or state change.</p>
<p>When a plugin is removed, <code>unmount</code> is called to clean up any resources the plugin was holding, preventing memory leaks and side effects in the host application.</p>
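<p>The lifecycle interface says which hooks exist, but not who calls them or in what order. The sketch below shows one way a host might enforce that order. <code>LifecycleManager</code> and <code>PluginState</code> are hypothetical names that don't appear elsewhere in this article, and the host API is reduced to a logger so the example runs without React.</p>

```typescript
// Reduced copies of the article's interfaces so the sketch is self-contained.
interface HostAPI {
  log: (message: string) => void;
}

interface Plugin {
  name: string;
  version: string;
  init: (host: HostAPI) => void;
  mount?: () => void;
  update?: () => void;
  unmount?: () => void;
}

type PluginState = 'initialized' | 'mounted' | 'unmounted';

// Hypothetical helper: tracks each plugin's state and only calls hooks in a valid order.
class LifecycleManager {
  private readonly host: HostAPI;
  private readonly plugins = new Map<string, Plugin>();
  private readonly states = new Map<string, PluginState>();

  constructor(host: HostAPI) {
    this.host = host;
  }

  register(plugin: Plugin): void {
    if (this.plugins.has(plugin.name)) {
      throw new Error(`Plugin "${plugin.name}" is already registered`);
    }
    plugin.init(this.host);
    this.plugins.set(plugin.name, plugin);
    this.states.set(plugin.name, 'initialized');
  }

  mount(name: string): void {
    const plugin = this.getPlugin(name);
    if (this.states.get(name) === 'mounted') return; // already displayed
    plugin.mount?.();
    this.states.set(name, 'mounted');
  }

  update(name: string): void {
    const plugin = this.getPlugin(name);
    // Updates are only delivered while the plugin's UI is displayed.
    if (this.states.get(name) !== 'mounted') return;
    plugin.update?.();
  }

  unmount(name: string): void {
    const plugin = this.getPlugin(name);
    if (this.states.get(name) !== 'mounted') return; // nothing to clean up
    plugin.unmount?.();
    this.states.set(name, 'unmounted');
  }

  private getPlugin(name: string): Plugin {
    const plugin = this.plugins.get(name);
    if (!plugin) throw new Error(`Unknown plugin "${name}"`);
    return plugin;
  }
}
```

<p>Keeping the state machine in the host means a plugin never sees <code>unmount</code> without a preceding <code>mount</code>, and repeated calls from route changes or feature flags become harmless no-ops.</p>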
<h3 id="heading-3-how-to-bundle-plugins-separately">3. How to Bundle Plugins Separately</h3>
<p>Each plugin should be packaged as an independent module so that it can be developed, versioned, and deployed without tightly coupling it to the host application.</p>
<p>Modern build tools such as Vite or Webpack make it possible to compile plugins into standalone bundles that the host can load dynamically at runtime.</p>
<p><strong>Example Vite Configuration for a Plugin</strong></p>
<pre><code class="language-typescript">// vite.config.ts

import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  build: {
    lib: {
      entry: 'src/plugin.ts',
      name: 'MyPlugin',
      fileName: 'my-plugin',
      formats: ['es'],
    },
    rollupOptions: {
      external: ['react', 'react-dom'],
    },
  },
});
</code></pre>
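<p>The <code>external</code> option keeps React out of the plugin bundle so the plugin can share the host's copy instead of shipping its own, preventing duplicate React versions in memory. Because the output then contains bare <code>react</code> imports, the host must make those specifiers resolvable at runtime, for example with an import map.</p>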
<h3 id="heading-4-how-to-lazy-load-plugins">4. How to Lazy-Load Plugins</h3>
<p>Even when plugins are bundled independently, loading all of them during application startup would significantly increase initial load time. Instead, plugins should be loaded on demand using dynamic imports so that functionality is only fetched when the user actually needs it.</p>
<pre><code class="language-typescript">// src/plugins/loader.ts

import type { Plugin } from './plugin';

export async function loadPlugin(url: string): Promise&lt;Plugin&gt; {
  // The @vite-ignore comment is needed because the URL is dynamic and cannot be statically analyzed by Vite.
  // Tradeoff: the plugin cannot be pre-bundled; ensure URLs are trusted to avoid security risks.
  const module = await import(/* @vite-ignore */ url);
  return module.default as Plugin;
}
</code></pre>
<p><strong>Usage in React:</strong></p>
<pre><code class="language-typescript">const [plugin, setPlugin] = React.useState&lt;Plugin | null&gt;(null);

React.useEffect(() =&gt; {
  loadPlugin('/plugins/my-plugin.js').then((p) =&gt; {
    p.init(hostAPI);
    setPlugin(p);
  });
}, []);
</code></pre>
<p>This pattern allows applications to scale without preloading all plugins, improving initial load time.</p>
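<p>When several components request the same plugin at once, a bare dynamic import can trigger duplicate work. The following sketch caches in-flight loads by URL. <code>createPluginLoader</code> and <code>loadPluginOnce</code> are hypothetical names, and the importer is injected as a parameter (an assumption made so the cache can be exercised without a network).</p>

```typescript
// Reduced copy of the article's Plugin interface so the sketch is self-contained.
interface Plugin {
  name: string;
  version: string;
  init: (host: unknown) => void;
}

type PluginModule = { default: Plugin };
type Importer = (url: string) => Promise<PluginModule>;

// Hypothetical factory: wraps an importer with a per-URL cache.
function createPluginLoader(importer: Importer) {
  // The cache stores promises, not results, so concurrent callers share one request.
  const inFlight = new Map<string, Promise<Plugin>>();

  return function loadPluginOnce(url: string): Promise<Plugin> {
    const cached = inFlight.get(url);
    if (cached) return cached;

    const pending = importer(url)
      .then((mod) => mod.default)
      .catch((error: unknown) => {
        // Forget failed loads so a later call can retry instead of replaying the rejection.
        inFlight.delete(url);
        throw error;
      });

    inFlight.set(url, pending);
    return pending;
  };
}

// In the browser, the importer would wrap the dynamic import from loader.ts:
// const loadPluginOnce = createPluginLoader((url) => import(/* @vite-ignore */ url));
```

<p>Caching the promise rather than the resolved plugin is the key design choice here: a second caller that arrives while the first request is still pending receives the same promise instead of starting another download.</p>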
<h3 id="heading-5-security-amp-permission-model">5. Security &amp; Permission Model</h3>
<p>Because plugins run code that originates outside the core application, security boundaries are essential. Even though plugins interact through the host API, the platform must still restrict what capabilities they can access in order to prevent misuse or accidental interference with application state.</p>
<p><strong>Example: Restricted API</strong></p>
<pre><code class="language-typescript">export interface SecureHostAPI {
  log: (message: string) =&gt; void;
  registerComponent: (name: string, component: React.ComponentType&lt;any&gt;) =&gt; void;
  fetchData?: (endpoint: string) =&gt; Promise&lt;any&gt;; // Only if allowed
}
</code></pre>
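<p>An interface alone doesn't decide which plugin receives which capability. One possible approach is a factory that builds a separate API object per plugin from the permissions the host has granted. In this sketch, <code>createSecureHostAPI</code>, <code>Permission</code>, <code>HostServices</code>, and the <code>/api/</code> prefix rule are all assumptions for illustration, and <code>registerComponent</code> is left out so the example runs without React.</p>

```typescript
// Extend this union as the host grows; only one capability is modeled here.
type Permission = 'fetchData';

interface SecureHostAPI {
  log: (message: string) => void;
  fetchData?: (endpoint: string) => Promise<unknown>; // only present if granted
}

// Hypothetical internal services that the host never hands to plugins directly.
interface HostServices {
  writeLog: (line: string) => void;
  fetchJson: (endpoint: string) => Promise<unknown>;
}

function createSecureHostAPI(
  pluginName: string,
  granted: ReadonlySet<Permission>,
  services: HostServices,
): SecureHostAPI {
  const api: SecureHostAPI = {
    // Logging is always available and tagged with the plugin name for auditing.
    log: (message) => services.writeLog(`[${pluginName}] ${message}`),
  };

  if (granted.has('fetchData')) {
    api.fetchData = (endpoint) => {
      // Assumed rule: plugins may only call paths under /api/.
      // A real host would normalize the URL before checking it.
      if (!endpoint.startsWith('/api/')) {
        return Promise.reject(new Error(`Endpoint not allowed: ${endpoint}`));
      }
      return services.fetchJson(endpoint);
    };
  }

  // Freezing stops a plugin from replacing methods on the object it was given.
  return Object.freeze(api);
}
```

<p>A plugin without the permission never receives a <code>fetchData</code> function at all, so the check happens once at construction time rather than on every call.</p>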
<p>You can enhance security further using <strong>iframe sandboxing</strong> or <strong>Web Workers</strong> for heavier isolation.</p>
<pre><code class="language-typescript">// Example of a sandboxed iframe plugin

&lt;iframe
  src="/plugins/my-plugin.html"
  sandbox="allow-scripts"
  style={{ width: '100%', height: '400px', border: 'none' }}
/&gt;

// Advanced isolation notes:
// - You can define different SecureHostAPI shapes for internal vs. third-party plugins,
//   exposing more capabilities to trusted plugins while restricting untrusted ones.
// - For stronger isolation, use message passing (postMessage) with iframes or Web Workers
//   so plugins cannot access the DOM or host state directly.
</code></pre>
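<p>With <code>sandbox="allow-scripts"</code> and no <code>allow-same-origin</code>, the iframe runs in an opaque origin, so the plugin can't reach the host's DOM, cookies, or storage. The sandbox doesn't block network requests on its own, so pair it with a restrictive Content Security Policy on the plugin document if plugins must also be kept off the network.</p>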
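<p>For the message-passing approach mentioned in the notes above, the host has to treat everything arriving from the iframe as untrusted input. The sketch below validates incoming messages against a small, made-up protocol. The <code>PluginRequest</code> shapes, <code>parsePluginRequest</code>, and the 2000-pixel cap are assumptions chosen for illustration.</p>

```typescript
// Messages a sandboxed plugin is allowed to send (hypothetical protocol).
type PluginRequest =
  | { type: 'log'; message: string }
  | { type: 'resize'; height: number };

const MAX_FRAME_HEIGHT = 2000; // assumed upper bound, in pixels

// Narrows untrusted data to a known request shape; anything else is rejected.
function parsePluginRequest(data: unknown): PluginRequest | null {
  if (typeof data !== 'object' || data === null) return null;
  const { type, message, height } = data as Record<string, unknown>;

  if (type === 'log' && typeof message === 'string') {
    return { type: 'log', message };
  }
  if (type === 'resize' && typeof height === 'number' && Number.isFinite(height) && height > 0) {
    // Clamp so a plugin cannot grow its frame without bound.
    return { type: 'resize', height: Math.min(height, MAX_FRAME_HEIGHT) };
  }
  return null;
}

// In the host, the parser would sit inside a message listener, for example:
// window.addEventListener('message', (event) => {
//   if (event.source !== iframe.contentWindow) return; // ignore other frames
//   const request = parsePluginRequest(event.data);
//   if (request) handle(request);
// });
```

<p>Checking <code>event.source</code> rather than <code>event.origin</code> matters in this setup, because a sandboxed frame without <code>allow-same-origin</code> reports its origin as <code>"null"</code>.</p>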
<h3 id="heading-6-plugin-hot-loading">6. Plugin Hot-loading</h3>
<p>Hot-loading is essential for developer productivity. Tools like Vite's HMR let you see plugin updates immediately, speeding up iteration and reducing friction.</p>
<p><strong>React Example with HMR:</strong></p>
<pre><code class="language-typescript">if (import.meta.hot) {
  import.meta.hot.accept('/plugins/my-plugin.js', (newModule) =&gt; {
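    // Vite passes undefined here when the updated module failed to load (for example, on a syntax error).
    if (!newModule) return;
    const updatedPlugin = newModule.default as Plugin;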
    updatedPlugin.init(hostAPI);
    setPlugin(updatedPlugin);
  });
}
</code></pre>
<p>With hot-loading, developers can update plugins without restarting the host app.</p>
<h3 id="heading-7-ci-amp-deployment-considerations">7. CI &amp; Deployment Considerations</h3>
<p>To deploy safely, plugins must be verified and tested. CI/CD pipelines enforce type safety, bundling, and security checks automatically. For a production-grade plugin system, continuous integration pipelines should:</p>
<ol>
<li><p>Lint and type-check each plugin using TypeScript.</p>
</li>
<li><p>Run automated tests to ensure plugin compliance.</p>
</li>
<li><p>Bundle plugins independently with versioned outputs.</p>
</li>
<li><p>Deploy plugins to a secure CDN or internal repository.</p>
</li>
<li><p>Verify signatures or hashes to prevent tampering.</p>
</li>
</ol>
<p><strong>GitHub Actions Example for Plugin CI:</strong></p>
<pre><code class="language-plaintext">name: Build Plugin

on:
  push:
    paths:
      - 'plugins/**'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 20
      - run: npm install 
      - run: npm run build --workspace plugins/my-plugin 
      - run: npm run test --workspace plugins/my-plugin
      # Optional: sign plugin artifacts or generate a checksum to verify integrity before loading in the host
</code></pre>
<p>This ensures every plugin is type-safe, tested, and ready for deployment.</p>
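<p>The optional integrity step in the workflow can be sketched as a small Node script that CI runs after the build. It uses Node's built-in <code>node:crypto</code> module. The <code>PluginManifestEntry</code> shape and the function names are hypothetical, and how the manifest is published is left open.</p>

```typescript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 digest of a bundle's contents.
function sha256Hex(contents: string | Uint8Array): string {
  return createHash('sha256').update(contents).digest('hex');
}

// Hypothetical manifest entry that CI would publish next to each bundle.
interface PluginManifestEntry {
  file: string;
  sha256: string;
}

// Run in CI after the build step to record the bundle's digest.
function createManifestEntry(file: string, contents: string | Uint8Array): PluginManifestEntry {
  return { file, sha256: sha256Hex(contents) };
}

// Run before deployment (or by a loader) to confirm the bundle is unchanged.
function verifyBundle(entry: PluginManifestEntry, contents: string | Uint8Array): boolean {
  return sha256Hex(contents) === entry.sha256;
}
```

<p>A checksum only detects tampering if the manifest itself arrives through a channel the host trusts. When it doesn't, a signature over the manifest is the stronger option.</p>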
<h3 id="heading-putting-it-all-together">Putting It All Together</h3>
<p>At this point, you have walked through each architectural layer independently. Here's how all the pieces map to a real project structure:</p>
<pre><code class="language-plaintext">src/
└── plugins/
    ├── host.ts    ← Host API definition
    ├── plugin.ts  ← Plugin lifecycle interface
    └── loader.ts  ← Dynamic plugin loader
plugins/
└── chat-plugin/
    └── src/
        └── plugin.ts ← Chat plugin implementation
</code></pre>
<p>Each file has a single, clear responsibility. <code>host.ts</code> owns the contract, <code>plugin.ts</code> owns the lifecycle shape, <code>loader.ts</code> handles runtime importing, and the plugin itself lives entirely outside the core <code>src/</code> tree – deployable and versioned independently.</p>
<h2 id="heading-best-practices">Best Practices</h2>
<p>At this point, you have a host API, a well-defined plugin lifecycle, isolated bundles, lazy-loading, and a security model. These foundations ensure plugins are robust, type-safe, and maintainable — ready to be extended with versioning, testing, and CI/CD pipelines.</p>
<ol>
<li><p><strong>Type safety:</strong> Always define TypeScript interfaces for host APIs and plugin contracts.</p>
</li>
<li><p><strong>Lazy loading:</strong> Only load plugins when required.</p>
</li>
<li><p><strong>Security:</strong> Expose a minimal API and avoid giving plugins unrestricted access.</p>
</li>
<li><p><strong>Isolated state:</strong> Keep plugin state isolated to prevent accidental interference.</p>
</li>
<li><p><strong>Versioning:</strong> Maintain plugin versions to ensure compatibility with the host.</p>
</li>
<li><p><strong>Testing:</strong> Unit-test plugins against host API mocks.</p>
</li>
<li><p><strong>CI/CD:</strong> Automate linting, testing, and bundling for plugins.</p>
</li>
</ol>
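<p>The versioning practice above can be made concrete with a compatibility check that the host runs before calling <code>init</code>. This is a minimal sketch under two assumptions: the plugin declares the host API version it was built against (a field the <code>Plugin</code> interface in this article doesn't have), and versions are plain <code>MAJOR.MINOR.PATCH</code> strings with a major version of 1 or higher.</p>

```typescript
type Version = { major: number; minor: number; patch: number };

// Parses plain "MAJOR.MINOR.PATCH" strings; pre-release tags are out of scope here.
function parseVersion(text: string): Version | null {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) return null;
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// A plugin built against host API version `required` is accepted on host API
// version `actual` when the major versions match and the host is at least as new.
function isCompatible(required: string, actual: string): boolean {
  const need = parseVersion(required);
  const have = parseVersion(actual);
  if (!need || !have) return false; // unparseable versions are rejected
  if (need.major !== have.major) return false; // breaking change
  if (have.minor !== need.minor) return have.minor > need.minor;
  return have.patch >= need.patch;
}
```

<p>Rejecting on a major-version mismatch lets the host evolve its API additively within a major version while still refusing plugins that expect a contract it no longer provides.</p>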
<h2 id="heading-when-not-to-use-a-plugin-architecture">When NOT to Use a Plugin Architecture</h2>
<p>In some cases, introducing a plugin system can add unnecessary complexity without delivering meaningful benefits.</p>
<h3 id="heading-small-or-single-team-applications">Small or Single-Team Applications</h3>
<p>If a project is maintained by a small team and the feature set is relatively stable, a plugin architecture may be excessive. A simpler modular structure within the main codebase is usually easier to maintain and reason about.</p>
<h3 id="heading-tightly-coupled-features">Tightly Coupled Features</h3>
<p>Plugin systems work best when features can operate independently. If new functionality requires deep access to application state or tightly integrated workflows, forcing it into a plugin model may introduce unnecessary abstractions and complexity rather than solving a real problem.</p>
<h3 id="heading-performance-critical-systems">Performance-Critical Systems</h3>
<p>Although lazy-loading can mitigate performance issues, plugin architectures still introduce additional runtime complexity. Applications with strict performance constraints may benefit from a more tightly optimized architecture rather than dynamic plugin loading.</p>
<h3 id="heading-limited-security-controls">Limited Security Controls</h3>
<p>Allowing external code to run inside an application always introduces security risks. If the platform can't enforce strong API boundaries, sandboxing, or validation of plugins, it may be safer to avoid a plugin architecture altogether.</p>
<h3 id="heading-early-stage-products">Early-Stage Products</h3>
<p>In early product development, requirements often change rapidly. Designing a plugin system too early can slow development because engineers must maintain abstraction layers before the product's core architecture has stabilized. It's usually better to wait until the platform's boundaries are well understood before introducing this level of extensibility.</p>
<h2 id="heading-future-enhancements">Future Enhancements</h2>
<p>As the platform matures, there are several directions worth exploring.</p>
<p>Dynamic permissions would allow plugins to explicitly request capabilities, with the host deciding whether to grant them. This makes the security model more granular and auditable.</p>
<p>A plugin marketplace could serve as a central registry of verified plugins, making discovery and distribution easier for teams.</p>
<p>For use cases that require stronger isolation, Web Workers or iframes offer more robust sandboxing than API boundaries alone.</p>
<p>An event bus is another useful addition, allowing plugins to communicate with each other through a shared message system rather than direct API calls, which keeps inter-plugin dependencies loose and manageable.</p>
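<p>As a rough illustration of the event bus idea, the sketch below lets plugins publish and subscribe by topic name without holding references to each other. <code>PluginEventBus</code> and the topic naming are hypothetical, and payloads are typed as <code>unknown</code> so each subscriber has to validate what it receives.</p>

```typescript
type Handler = (payload: unknown) => void;

// Hypothetical event bus owned by the host and exposed to plugins through the host API.
class PluginEventBus {
  private readonly handlers = new Map<string, Set<Handler>>();

  // Returns an unsubscribe function, suitable for calling from a plugin's unmount hook.
  subscribe(topic: string, handler: Handler): () => void {
    const existing = this.handlers.get(topic);
    const subscribers = existing ?? new Set<Handler>();
    if (!existing) this.handlers.set(topic, subscribers);
    subscribers.add(handler);
    return () => {
      subscribers.delete(handler);
    };
  }

  publish(topic: string, payload: unknown): void {
    const subscribers = this.handlers.get(topic);
    if (!subscribers) return;
    // Copy first so handlers that unsubscribe during delivery do not disturb iteration.
    for (const handler of Array.from(subscribers)) {
      try {
        handler(payload);
      } catch (error) {
        // One failing plugin must not stop delivery to the others.
        console.error(`Handler for "${topic}" failed`, error);
      }
    }
  }
}
```

<p>Returning the unsubscribe function from <code>subscribe</code> pairs naturally with the lifecycle: a plugin can subscribe in <code>mount</code> and release the subscription in <code>unmount</code>.</p>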
<h2 id="heading-conclusion">Conclusion</h2>
<p>Designing a plugin architecture in React is ultimately about treating your application as a platform rather than a single codebase. By defining clear contracts between the host application and its extensions, you enable teams to ship features independently while preserving stability, security, and performance.</p>
<p>If you are building a system that multiple teams (or even third-party developers) need to extend, start by establishing a minimal host API and plugin contract. Focus on strong TypeScript interfaces, clear lifecycle boundaries, and strict API access rules. These foundations ensure that plugins remain predictable and safe as the ecosystem grows.</p>
<p>As your platform evolves, you can gradually introduce more advanced capabilities such as plugin versioning, capability-based permissions, sandboxed execution environments, or an internal plugin marketplace.</p>
<p>Observability and monitoring also become increasingly important as the number of plugins grows, allowing you to detect compatibility issues or performance regressions early.</p>
<p>The key takeaway is to start simple but intentional. A small, well-defined plugin interface combined with lazy loading and secure API boundaries is often enough to support the first generation of extensions. From there, your architecture can expand naturally into a full ecosystem where features are delivered as modular, independently deployable plugins.</p>
<p>When implemented thoughtfully, a React plugin architecture transforms a single application into a scalable, extensible platform capable of supporting long-term growth and collaboration across teams.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build End-to-End LLM Observability in FastAPI with OpenTelemetry ]]>
                </title>
                <description>
                    <![CDATA[ This article shows how to build end-to-end, code-first LLM observability in a FastAPI application using the OpenTelemetry Python SDK. Instead of relying on vendor-specific agents or opaque SDKs, we wi ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-end-to-end-llm-observability-in-fastapi-with-opentelemetry/</link>
                <guid isPermaLink="false">69b4379c6e27dd07d920f14c</guid>
                
                    <category>
                        <![CDATA[ observability ]]>
                    </category>
                
                    <category>
                        <![CDATA[ OpenTelemetry ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ FastAPI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jessica Patel ]]>
                </dc:creator>
                <pubDate>Fri, 13 Mar 2026 16:13:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c69a589a-2dce-46a1-ac49-a0d0e2c23c6e.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>This article shows how to build end-to-end, code-first LLM observability in a FastAPI application using the OpenTelemetry Python SDK.</p>
<p>Instead of relying on vendor-specific agents or opaque SDKs, we will manually design traces, spans, and semantic attributes that capture the full lifecycle of an LLM-powered request.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-introduction">Introduction</a></p>
</li>
<li><p><a href="#heading-prerequisites-and-technical-context">Prerequisites and Technical Context</a></p>
</li>
<li><p><a href="#heading-why-llm-observability-is-fundamentally-different">Why LLM Observability Is Fundamentally Different</a></p>
</li>
<li><p><a href="#heading-reference-architecture-a-traceable-rag-request">Reference Architecture: A Traceable RAG Request</a></p>
</li>
<li><p><a href="#heading-reference-architecture-explained">Reference Architecture Explained</a></p>
</li>
<li><p><a href="#heading-why-this-design-is-better-than-simpler-alternatives">Why This Design Is Better Than Simpler Alternatives</a></p>
</li>
<li><p><a href="#heading-llm-models-that-work-best-for-this-architecture">LLM Models That Work Best for This Architecture</a></p>
</li>
<li><p><a href="#heading-opentelemetry-primer-llm-relevant-concepts-only">OpenTelemetry Primer (LLM-Relevant Concepts Only)</a></p>
</li>
<li><p><a href="#heading-designing-llm-aware-spans">Designing LLM-Aware Spans</a></p>
</li>
<li><p><a href="#heading-fastapi-example-end-to-end-llm-spans-complete-and-explained">FastAPI Example: End-to-End LLM Spans (Complete and Explained)</a></p>
</li>
<li><p><a href="#heading-semantic-attributes-best-practices-for-llm-observability">Semantic Attributes: Best Practices for LLM Observability</a></p>
</li>
<li><p><a href="#heading-evaluation-hooks-inside-traces">Evaluation Hooks Inside Traces</a></p>
</li>
<li><p><a href="#heading-exporting-and-visualizing-traces-where-this-fits-with-vendor-tooling">Exporting and Visualizing Traces (Where This Fits with Vendor Tooling)</a></p>
</li>
<li><p><a href="#heading-operational-patterns-and-anti-patterns">Operational Patterns and Anti-Patterns</a></p>
</li>
<li><p><a href="#heading-extending-the-system">Extending the System</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-introduction">Introduction</h2>
<p>Large Language Models (LLMs) are rapidly becoming a core component of modern software systems. Applications that once relied on deterministic APIs are now incorporating LLM-powered features such as conversational assistants, document summarization, intelligent search, and retrieval-augmented generation (RAG).</p>
<p>While these capabilities unlock new user experiences, they also introduce operational complexity that traditional monitoring approaches were never designed to handle.</p>
<p>Unlike conventional software services, LLM systems are probabilistic by nature. The same request may produce different responses depending on prompt structure, model configuration, retrieval context, and sampling parameters such as temperature or top-p.</p>
<p>In addition, LLM workloads introduce entirely new operational dimensions such as token consumption, prompt construction latency, inference cost, context window limits, and response quality.</p>
<p>These factors mean that a request can appear technically successful from an infrastructure perspective while still producing an incorrect, hallucinated, or low-quality result.</p>
<p>Traditional observability tools typically focus on infrastructure-level signals such as latency, error rate, and throughput. While these metrics remain important, they are insufficient for understanding how an LLM application behaves in production.</p>
<p>Engineers must also understand what prompt was constructed, which documents were retrieved, how many tokens were consumed, which model configuration was used, and how the final response was evaluated. Without this visibility, debugging LLM behavior becomes extremely difficult and operational costs can quickly spiral out of control.</p>
<p>This is where LLM observability becomes essential. Observability for LLM systems extends beyond infrastructure monitoring. It captures the full lifecycle of an AI-driven request — from user input and context retrieval to prompt construction, model inference, post-processing, and quality evaluation.</p>
<p>When implemented correctly, observability allows teams to answer why the model generated a particular response, which retrieval results influenced the output, how much a request cost in terms of tokens, where latency occurred within the request pipeline, and whether the response passed basic quality or safety checks.</p>
<p>This article demonstrates how to implement end-to-end LLM observability in a FastAPI application using OpenTelemetry. Instead of relying on proprietary monitoring agents or opaque vendor SDKs, we take a code-first approach to instrumentation. By explicitly designing traces, spans, and semantic attributes, we gain precise control over how LLM interactions are observed and analyzed.</p>
<p>Throughout the guide, we will walk through a practical architecture for tracing a retrieval-augmented generation (RAG) workflow, where each stage of the request lifecycle is represented as a trace span. We will explore how to design meaningful span boundaries, capture prompt and model metadata safely, record token usage and cost signals, and attach evaluation results directly to traces.</p>
<p>The article also explains how this instrumentation can be exported to any OpenTelemetry-compatible backend such as Jaeger, Grafana Tempo, or LLM-specific platforms like Phoenix.</p>
<p>By the end of this guide, you will understand how to:</p>
<ul>
<li><p>Structure traces so that each user request maps to a single end-to-end LLM interaction</p>
</li>
<li><p>Design span hierarchies that reflect the logical stages of an LLM pipeline</p>
</li>
<li><p>Capture prompt metadata, model configuration, and token usage safely</p>
</li>
<li><p>Attach evaluation and quality signals to traces for deeper analysis</p>
</li>
<li><p>Export observability data to different backends without changing instrumentation</p>
</li>
</ul>
<p>Most importantly, the goal of this article is not simply to demonstrate how to add telemetry to an application. Instead, it aims to show how to think about observability when building LLM-powered systems.</p>
<p>When LLM operations are treated as first-class components within a distributed system, traces become a powerful tool for debugging, optimization, cost management, and continuous improvement of model behavior.</p>
<h2 id="heading-prerequisites-and-technical-context">Prerequisites and Technical Context</h2>
<p>Before following this guide, you should be familiar with the Python programming language, basic web API concepts, and general microservice architecture. Below are some key tools and concepts used in this article.</p>
<h3 id="heading-fastapi-web-framework">FastAPI (Web Framework)</h3>
<p>FastAPI is used as the primary web framework for the application. It is a modern Python framework designed for building high-performance APIs using standard Python type hints. FastAPI simplifies request validation, serialization, and API documentation while remaining lightweight and fast.</p>
<h3 id="heading-large-language-models-llms">Large Language Models (LLMs)</h3>
<p>Large Language Models (LLMs) are the computational core of the example system. An LLM is a model trained on vast amounts of text data to generate or transform language in ways that resemble human communication. In production environments, LLMs are commonly used for tasks such as conversational interfaces, summarization, and question answering.</p>
<h3 id="heading-observability-concept">Observability (Concept)</h3>
<p>Observability is the overarching concept that connects all the technical pieces in this article. At a high level, observability refers to the ability to understand a system's internal behavior by examining the data it produces during execution. Rather than asking whether a system is simply "up" or "down," observability helps answer deeper questions about why a request behaved a certain way, where latency was introduced, or how different components interacted.</p>
<h3 id="heading-opentelemetry-instrumentation-standard">OpenTelemetry (Instrumentation Standard)</h3>
<p>OpenTelemetry is the mechanism used to implement observability within the application. It is an open, vendor-neutral standard for generating telemetry data such as traces, metrics, and logs. By instrumenting key parts of the LLM workflow, we can observe how requests flow through the system, how long each step takes, and what contextual data influenced the final outcome. OpenTelemetry serves as the foundation for collecting this information in a consistent and portable way, independent of any specific monitoring backend.</p>
<h2 id="heading-why-llm-observability-is-fundamentally-different">Why LLM Observability Is Fundamentally Different</h2>
<p>Traditional observability assumes deterministic behavior: the same input produces the same output. LLM systems violate this assumption. The same request can vary due to prompt template changes, retrieval differences, sampling parameters (temperature, top-p), model version upgrades, and context window truncation.</p>
<p>As a result, teams need visibility into what the model saw, how it was configured, what it retrieved, how long it took, and how much it cost, all correlated to a single user request. Logs alone are hard to correlate across pipeline stages, and aggregate metrics cannot carry per-request detail such as the prompt template or the retrieved documents. Distributed traces are the backbone of LLM observability.</p>
<h2 id="heading-reference-architecture-a-traceable-rag-request">Reference Architecture: A Traceable RAG Request</h2>
<p>A typical FastAPI-based RAG service follows this flow:</p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/6979762ba2442d262dacf388/50e7fda4-7407-43d6-8f12-045b8e73c7eb.png" alt="FastAPI Based RAG Service" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Each step is observable, but only if we deliberately instrument it. The goal is one trace per user request, with child spans representing each logical LLM step.</p>
<h2 id="heading-reference-architecture-explained">Reference Architecture Explained</h2>
<h3 id="heading-client-sends-a-request-to-chat">Client Sends a Request to /chat</h3>
<p>The architecture begins when a client sends a request to the <code>/chat</code> endpoint. This request typically contains the user's query along with any session or conversation context required by the application.</p>
<p>Keeping the client interface minimal and well-defined is intentional: it ensures the backend receives a predictable input shape and prevents application-specific logic from leaking into downstream LLM processing.</p>
<p>From an observability perspective, this request marks the start of a single end-to-end trace, allowing every subsequent operation to be correlated back to the original user action.</p>
<h3 id="heading-fastapi-validates-input-and-authenticates-the-user">FastAPI Validates Input and Authenticates the User</h3>
<p>Once the request reaches the service, FastAPI performs schema validation and authentication. Validation guarantees that only well-formed inputs proceed through the pipeline, while authentication ensures that expensive LLM operations are only executed for authorized users.</p>
<p>Placing this step early reduces unnecessary computation and protects the system from abuse. It also improves trace quality by ensuring that all observed requests represent legitimate execution paths rather than malformed or rejected traffic.</p>
<h3 id="heading-retriever-queries-the-vector-database">Retriever Queries the Vector Database</h3>
<p>After validation, the system queries a vector database to retrieve documents relevant to the user's request. This retrieval step is the foundation of retrieval-augmented generation (RAG). By grounding the LLM in external knowledge, the system improves factual accuracy and reduces hallucinations.</p>
<p>Separating retrieval from generation allows teams to tune similarity thresholds, embedding models, and top-k values independently, and it makes it easier to diagnose whether poor responses are caused by bad retrieval or model behavior.</p>
<h3 id="heading-prompt-is-assembled-using-retrieved-documents">Prompt Is Assembled Using Retrieved Documents</h3>
<p>With relevant documents in hand, the system constructs the final prompt that will be sent to the LLM. This step combines the user query, retrieved context, system instructions, and formatting rules into a single structured prompt.</p>
<p>Making prompt assembly an explicit stage enables prompt versioning, experimentation, and observability. It also provides a natural place to detect issues such as context window overflows or excessive prompt size before invoking the model.</p>
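<p>One way to act on this is a budget check that runs right after prompt assembly. The sketch below is illustrative only: <code>check_prompt_budget</code>, the default 8,000-token context window, and the whitespace-based token estimate are assumptions made for this example. A real service would count tokens with the tokenizer that matches its model.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptBudget:
    """Result of comparing an assembled prompt against a token budget."""
    estimated_tokens: int
    max_prompt_tokens: int
    overflow: bool


def estimate_tokens(text):
    # Rough stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def check_prompt_budget(prompt, context_window=8000, reserved_for_completion=1000):
    """Flag prompts that would leave too little room for the completion."""
    max_prompt_tokens = context_window - reserved_for_completion
    estimated = estimate_tokens(prompt)
    return PromptBudget(
        estimated_tokens=estimated,
        max_prompt_tokens=max_prompt_tokens,
        overflow=estimated > max_prompt_tokens,
    )


budget = check_prompt_budget("Context: some retrieved text Question: what is tracing?")
print(budget)
```

<p>The resulting fields could be recorded as span attributes before the model is invoked, so that oversized prompts show up in traces rather than as unexplained truncation.</p>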
<h3 id="heading-llm-api-is-invoked">LLM API Is Invoked</h3>
<p>The LLM API call is the most expensive and least deterministic operation in the pipeline, which is why it occurs only after all preparatory work is complete. At this stage, the model receives a fully constructed prompt and produces a response based on its configuration parameters.</p>
<p>This step is the primary focus of latency, cost, and reliability controls such as retries, timeouts, and circuit breakers. From an observability standpoint, this span becomes the anchor for token usage, cost attribution, and prompt-level debugging.</p>
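<p>As a rough illustration of the retry and timeout controls mentioned above, the following sketch wraps an async call with a per-attempt timeout and exponential backoff. The helper name <code>call_with_retries</code>, its defaults, and the choice of which exceptions count as retryable are assumptions for this example, not part of any provider SDK. Circuit breaking is left out to keep the sketch short.</p>

```python
import asyncio


async def call_with_retries(make_call, attempts=3, timeout_s=10.0, base_delay_s=0.5):
    """Run an async LLM call with a per-attempt timeout and exponential backoff.

    make_call is a zero-argument function returning a fresh coroutine per attempt.
    Returns (result, attempts_used); re-raises the last error if every attempt fails.
    """
    attempts = max(1, attempts)
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = await asyncio.wait_for(make_call(), timeout=timeout_s)
            return result, attempt
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt != attempts:
                # Back off before the next attempt: base, 2x base, 4x base, ...
                await asyncio.sleep(base_delay_s * 2 ** (attempt - 1))
    raise last_error
```

<p>The <code>attempts_used</code> value returned here is the kind of signal that can be attached to the LLM span, so that slow requests caused by retries are distinguishable from slow single calls.</p>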
<h3 id="heading-response-is-post-processed-and-returned">Response Is Post-Processed and Returned</h3>
<p>After the LLM returns a response, the system performs post-processing before sending the result back to the client. This may include formatting, filtering, validation, or enrichment of the output. Post-processing acts as a final safeguard against malformed or low-quality responses and ensures consistency with application requirements. It also provides a clean boundary for attaching evaluation signals, such as response length, relevance scores, or truncation indicators, before the request completes.</p>
<h2 id="heading-why-this-design-is-better-than-simpler-alternatives">Why This Design Is Better Than Simpler Alternatives</h2>
<p>This architecture intentionally avoids coupling responsibilities together. Validation, retrieval, prompt construction, model execution, and response handling are all distinct steps. This separation makes the system easier to test, easier to observe, and easier to evolve. When something fails, engineers can identify <em>where</em> and <em>why</em> rather than treating the LLM as a black box.</p>
<p>Compared to a monolithic "send user input directly to the LLM" approach, this design improves correctness by grounding responses in retrieved context, lowers cost by rejecting invalid requests and oversized prompts before the model is called, and improves resilience because each stage can fail, retry, or fall back independently. It also aligns naturally with distributed tracing, since each block maps cleanly to a trace span with a clear semantic purpose. As the system grows, additional features such as caching, fallback models, or policy enforcement can be added without destabilizing the entire flow.</p>
<p>Most importantly, this architecture treats the LLM as one component in a larger system, not the system itself. That mindset is essential for building reliable production applications.</p>
<h2 id="heading-llm-models-that-work-best-for-this-architecture">LLM Models That Work Best for This Architecture</h2>
<p>This architecture is model-agnostic, but certain model characteristics work particularly well with retrieval-augmented workflows.</p>
<p>Models with strong instruction-following and reasoning capabilities tend to perform best, especially when prompts include structured context from retrieved documents. General-purpose models such as GPT-4-class systems perform well when accuracy and reasoning depth are critical.</p>
<p>For lower-latency or cost-sensitive use cases, smaller instruction-tuned models can be effective when paired with high-quality retrieval. Open-source models such as LLaMA-derived or Mistral-based systems also fit well into this architecture, particularly when deployed behind a private inference endpoint.</p>
<p>The key requirement is not the model itself, but how it is used. Models that can reliably ground their responses in provided context, respect system instructions, and produce stable outputs under varying prompts integrate most cleanly into this design. Because retrieval and prompt construction are explicit stages, models can be swapped or compared without changing the overall system structure.</p>
<h2 id="heading-opentelemetry-primer-llm-relevant-concepts-only">OpenTelemetry Primer (LLM-Relevant Concepts Only)</h2>
<p>OpenTelemetry defines three core types of telemetry data: traces, metrics, and logs. For LLM systems, traces are the most important. To make them useful, you need to understand a few building blocks:</p>
<ul>
<li><p>a <strong>trace</strong> represents a single end-to-end request</p>
</li>
<li><p>a <strong>span</strong> is a timed operation within that trace</p>
</li>
<li><p><strong>attributes</strong> are key–value metadata attached to spans</p>
</li>
<li><p><strong>events</strong> are time-stamped annotations</p>
</li>
<li><p><strong>context propagation</strong> ensures child spans attach to the correct parent.</p>
</li>
</ul>
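<p>FastAPI’s async nature makes correct context propagation essential. The OpenTelemetry Python API tracks the active span with <code>contextvars</code>, so child spans attach to the right parent across <code>await</code> points as long as each span is started as the current span, for example with <code>tracer.start_as_current_span</code>.</p>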
<p>With those concepts in place, the next step is to wire OpenTelemetry into the app. Start by configuring the OpenTelemetry SDK in FastAPI: define a <code>TracerProvider</code>, attach a <code>Resource</code> (service name and environment), configure an exporter (Jaeger, Tempo, Phoenix, and so on), and enable FastAPI auto-instrumentation.</p>
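<p>A minimal sketch of that setup might look like the following. It assumes the <code>opentelemetry-sdk</code> and <code>opentelemetry-instrumentation-fastapi</code> packages are installed. The function name <code>configure_tracing</code>, the service name, and the environment value are placeholders chosen for this example, and the console exporter is used only so the sketch runs without a backend.</p>

```python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter


def configure_tracing(app, service_name="rag-service", environment="dev"):
    """Wire a TracerProvider, Resource, exporter, and FastAPI instrumentation."""
    # The Resource describes the service that emits the telemetry.
    resource = Resource.create(
        {"service.name": service_name, "deployment.environment": environment}
    )
    provider = TracerProvider(resource=resource)

    # ConsoleSpanExporter prints spans to stdout. Swap in an OTLP exporter
    # to send them to Jaeger, Tempo, Phoenix, or another backend.
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    # Creates a server span for every incoming HTTP request.
    FastAPIInstrumentor.instrument_app(app)
    return provider


app = FastAPI()
configure_tracing(app)
```

<p>Because the exporter is the only backend-specific piece, the spans created elsewhere in the application do not need to change when the destination does.</p>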
<h2 id="heading-designing-llm-aware-spans">Designing LLM-Aware Spans</h2>
<h3 id="heading-span-taxonomy">Span Taxonomy</h3>
<p>A clean span hierarchy is critical. In this guide, a single <code>http.request</code> span (usually auto-generated) acts as the root, and it contains child spans such as <code>rag.retrieval</code>, <code>rag.prompt.build</code>, <code>llm.call</code>, <code>llm.postprocess</code>, and, optionally, <code>llm.eval</code>. Each of these spans represents a logical unit of work rather than an implementation detail.</p>
<h3 id="heading-span-boundaries">Span Boundaries</h3>
<p>Getting span boundaries right is just as important as picking the right span names. Avoid extremes like wrapping the entire LLM workflow in one giant span, creating a separate span for every token, or dumping all data into logs.</p>
<p>Instead, aim for a few coarse-grained spans that each represent a meaningful step in the request, enrich them with well-chosen attributes, and use events to mark important milestones within a span rather than splitting everything into smaller spans.</p>
<h3 id="heading-instrumenting-the-llm-call">Instrumenting the LLM Call</h3>
<p>When instrumenting the LLM call, treat it as the most critical span in the trace. Whether you are calling OpenAI, Anthropic, or another provider, start the span immediately before the API request and end it only after the full response (or stream) is complete.</p>
<p>Within that span, capture retries, timeouts, and errors so it becomes the central place for latency analysis, cost attribution, and prompt debugging.</p>
<p>For streaming responses, you can emit events for each chunk to track progress, but avoid creating separate child spans unless you truly need fine-grained timing.</p>
<h2 id="heading-fastapi-example-end-to-end-llm-spans-complete-and-explained">FastAPI Example: End-to-End LLM Spans (Complete and Explained)</h2>
<pre><code class="language-python">from fastapi import FastAPI, Request
from opentelemetry import trace
from opentelemetry.trace import Tracer
from typing import List
import asyncio
import hashlib

# Obtain a tracer instance from OpenTelemetry.
# All spans created with this tracer will be part of the same distributed
# tracing system and exported to the configured backend.
tracer: Tracer = trace.get_tracer(__name__)

# Initialize the FastAPI application.
app = FastAPI()

# Helper functions used by the observable endpoint
async def retrieve_documents(query: str) -&gt; List[str]:
    """
    Simulate document retrieval (e.g., vector search or knowledge base lookup).
    This function represents the retrieval stage in a RAG pipeline.
    In a real system, this might query a vector database or search index.
    """
    await asyncio.sleep(0.05)  # Simulate I/O latency
    return [
        "FastAPI enables high-performance async APIs.",
        "OpenTelemetry provides vendor-neutral observability.",
        "LLM observability requires tracing prompts and tokens.",
    ]


def build_prompt(query: str, documents: List[str]) -&gt; str:
    """
    Construct the final prompt from retrieved documents and the user query.
    Prompt construction is kept separate so it can be observed or modified
    independently if needed (for example, to measure prompt assembly latency).
    """
    context = "\n".join(documents)
    return f"""
Context:
{context}

Question:
{query}
"""


class LLMResponse:
    """
    Minimal abstraction for an LLM response.
    This keeps the example self-contained while still allowing us to attach
    token usage and other metadata for observability.
    """

    def __init__(self, text: str, prompt_tokens: int, completion_tokens: int):
        self.text = text
        self.prompt_tokens = prompt_tokens
        self.completion_tokens = completion_tokens
    
    @property
    def total_tokens(self) -&gt; int:
        return self.prompt_tokens + self.completion_tokens

async def call_llm(prompt: str) -&gt; LLMResponse:
    """
    Simulate an LLM API call.
    In a real implementation, this would call OpenAI, Anthropic, or another
    provider. The artificial delay represents model latency.
    """
    await asyncio.sleep(0.2)  # Simulate inference time
    response_text = "FastAPI and OpenTelemetry enable end-to-end LLM observability."
    # Token count is approximated here for demonstration purposes.
    prompt_tokens = len(prompt.split())
    completion_tokens = len(response_text.split())
    return LLMResponse(response_text, prompt_tokens, completion_tokens)


def summarize_response(response: LLMResponse) -&gt; str:
    """
    Example post-processing step.
    Post-processing is separated into its own phase so any additional latency
    or errors are not incorrectly attributed to the LLM itself.
    """
    return response.text


# Observable FastAPI endpoint
@app.post("/query")
async def rag_query(request: Request, query: str):
    """
    Handle a single RAG-style request with explicit OpenTelemetry spans.
    This endpoint demonstrates how to create one trace per request, with child
    spans for retrieval, LLM invocation, and post-processing.
    """

    # Create a top-level span for the HTTP request.
    # Even if FastAPI auto-instrumentation is enabled, defining this explicitly
    # allows us to attach domain-specific metadata.
    with tracer.start_as_current_span("http.request") as http_span:
        http_span.set_attribute("http.method", "POST")
        http_span.set_attribute("http.route", "/query")

        # Retrieval phase
        # This span isolates the retrieval step so that relevance issues can be
        # debugged independently of LLM behavior.
        with tracer.start_as_current_span("rag.retrieval") as retrieval_span:
            retrieval_span.set_attribute("rag.top_k", 5)
            retrieval_span.set_attribute("rag.similarity_threshold", 0.8)
            documents = await retrieve_documents(query)

            # Record how many documents were returned.
            # This is a key signal when diagnosing hallucinations
            # or missing context in the final response.
            retrieval_span.set_attribute(
                "rag.documents_returned",
                len(documents),
            )

        # LLM invocation phase
        # This span wraps the actual LLM call and is the primary anchor for
        # latency, cost, and prompt-related analysis.
        with tracer.start_as_current_span("llm.call") as llm_span:
            llm_span.set_attribute("llm.provider", "example")
            llm_span.set_attribute("llm.model", "example-llm")
            llm_span.set_attribute("llm.temperature", 0.7)
            llm_span.set_attribute("llm.prompt_template_id", "rag_v1")

            # Build the final prompt using retrieved context.
            # The raw prompt is intentionally not stored as a span attribute.
            prompt = build_prompt(query, documents)
            
            # Prompt metadata
            prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
            llm_span.set_attribute("llm.prompt_hash", prompt_hash)
            llm_span.set_attribute("llm.prompt_length", len(prompt))

            response = await call_llm(prompt)

            # Hash the response instead of storing raw text.
            # This allows correlation across traces without exposing content.
            response_hash = hashlib.sha256(
                response.text.encode()
            ).hexdigest()
            llm_span.set_attribute("llm.response_hash", response_hash)

            # Record token usage to enable cost attribution
            # and capacity planning.
            llm_span.set_attribute("llm.usage.prompt_tokens", response.prompt_tokens)
            llm_span.set_attribute("llm.usage.completion_tokens", response.completion_tokens)
            llm_span.set_attribute("llm.usage.total_tokens", response.total_tokens)
            
            # example price per token
            estimated_cost = response.total_tokens * 0.000002
            llm_span.set_attribute("llm.cost_estimated_usd", estimated_cost)

        # Post-processing phase
        # Any transformation after the LLM response is captured here,
        # ensuring inference latency is not overstated.
        with tracer.start_as_current_span("llm.postprocess") as post_span:
            summary = summarize_response(response)
            post_span.set_attribute(
                "llm.summary_length",
                len(summary),
            )

    # Return the final response to the client.
    # All spans above belong to the same distributed trace.
    return {"summary": summary}
</code></pre>
<p>Before walking through the code above span by span, it helps to understand how the instrumentation relates to the observability principles described earlier in this article.</p>
<p>The goal of the example is not simply to show how to create spans, but to demonstrate how a single user request can be represented as a structured trace containing meaningful metadata about each stage of the LLM pipeline.</p>
<p>At a high level, the code follows three key design ideas:</p>
<ol>
<li><p>One trace per user request</p>
</li>
<li><p>One span per logical LLM workflow stage</p>
</li>
<li><p>Semantic attributes attached to spans for debugging, cost tracking, and analysis</p>
</li>
</ol>
<p>Each of these concepts directly corresponds to the observability practices discussed earlier.</p>
<h3 id="heading-top-level-request-span">Top-Level Request Span</h3>
<p>The FastAPI endpoint begins by creating a top-level span called <code>http.request</code>. This span represents the entire lifecycle of the incoming request and serves as the root span for the trace.</p>
<pre><code class="language-python">with tracer.start_as_current_span("http.request") as http_span:
</code></pre>
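<p>Although FastAPI can generate HTTP spans automatically through OpenTelemetry auto-instrumentation, explicitly creating this span allows the application to attach domain-specific metadata such as route names or user identifiers. Note that if auto-instrumentation is also enabled, this explicit span becomes a child of the auto-generated server span rather than the root of the trace.</p>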
<p>Attributes such as the HTTP method and route are attached here:</p>
<pre><code class="language-python">http_span.set_attribute("http.method", "POST")
http_span.set_attribute("http.route", "/query")
</code></pre>
<p>This ensures that every trace can be easily filtered by endpoint when analyzing production traffic.</p>
<h3 id="heading-retrieval-span">Retrieval Span</h3>
<p>The next span captures the retrieval phase of the RAG pipeline:</p>
<pre><code class="language-python">with tracer.start_as_current_span("rag.retrieval") as retrieval_span:
</code></pre>
<p>This span isolates the vector search or knowledge retrieval step from the rest of the pipeline. If users report irrelevant answers, engineers can inspect this span to determine whether the issue originates from poor retrieval results rather than model behavior.</p>
<p>Several semantic attributes are attached here:</p>
<ul>
<li><p><code>rag.top_k</code> – number of documents requested</p>
</li>
<li><p><code>rag.similarity_threshold</code> – similarity cutoff used for filtering results</p>
</li>
<li><p><code>rag.documents_returned</code> – number of documents actually retrieved</p>
</li>
</ul>
<p>These attributes correspond to the retrieval settings described in the reference architecture, where similarity thresholds and top-k values are tuned independently of the model.</p>
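<p>The example endpoint hard-codes <code>rag.top_k</code> and <code>rag.similarity_threshold</code>. In a real retriever these values usually drive the filtering itself, so the same numbers can be reported on the span. The helper below is a sketch under that assumption: <code>select_documents</code> and the <code>(text, similarity)</code> input shape are hypothetical and not tied to any particular vector database.</p>

```python
def select_documents(scored_docs, top_k=5, similarity_threshold=0.8):
    """Keep the best-scoring documents and describe the result as span attributes.

    scored_docs is a list of (text, similarity) pairs from a vector search.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    kept = [text for text, score in ranked[:top_k] if score >= similarity_threshold]
    attributes = {
        "rag.top_k": top_k,
        "rag.similarity_threshold": similarity_threshold,
        "rag.documents_returned": len(kept),
    }
    return kept, attributes


docs, attrs = select_documents([("a", 0.95), ("b", 0.5), ("c", 0.85)], top_k=2)
print(docs, attrs)
```

<p>Each entry in the returned dictionary can then be passed to <code>retrieval_span.set_attribute</code>. A gap between <code>rag.top_k</code> and <code>rag.documents_returned</code> is often the first hint that the threshold is filtering out too much context.</p>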
<h3 id="heading-llm-invocation-span">LLM Invocation Span</h3>
<p>The most important span in the trace is the <code>llm.call</code> span, which wraps the actual model invocation.</p>
<pre><code class="language-python">with tracer.start_as_current_span("llm.call") as llm_span:
</code></pre>
<p>This span captures the latency, configuration, and token usage associated with the LLM request. In production systems, it becomes the primary location for analyzing model behavior and cost.</p>
<p>Key attributes recorded in this span include:</p>
<ul>
<li><p><code>llm.provider</code> – the model provider (OpenAI, Anthropic, etc.)</p>
</li>
<li><p><code>llm.model</code> – the specific model version</p>
</li>
<li><p><code>llm.temperature</code> – sampling parameter controlling response randomness</p>
</li>
<li><p><code>llm.prompt_template_id</code> – identifier for the prompt template used</p>
</li>
</ul>
<p>These attributes make it possible to correlate changes in model configuration with downstream quality or cost changes.</p>
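<p>When model settings live in a configuration object, a small mapping function keeps attribute names consistent across endpoints. The sketch below assumes a hypothetical <code>LLMConfig</code> dataclass, and the optional <code>llm.top_p</code> attribute is added here purely for illustration. Unset values are skipped because span attribute values must be primitives such as strings, numbers, or booleans, not <code>None</code>.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str
    temperature: float
    prompt_template_id: str
    top_p: Optional[float] = None  # Not every request sets top-p.


def llm_span_attributes(config):
    """Map an LLMConfig to span attributes, skipping unset values."""
    candidates = {
        "llm.provider": config.provider,
        "llm.model": config.model,
        "llm.temperature": config.temperature,
        "llm.prompt_template_id": config.prompt_template_id,
        "llm.top_p": config.top_p,
    }
    # Drop None entries so only valid attribute values reach the span.
    return {key: value for key, value in candidates.items() if value is not None}


print(llm_span_attributes(LLMConfig("example", "example-llm", 0.7, "rag_v1")))
```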
<h3 id="heading-prompt-handling-and-privacy">Prompt Handling and Privacy</h3>
<p>Instead of storing the full prompt or response text directly in the trace, the example demonstrates a safer practice: hashing sensitive data.</p>
<pre><code class="language-python">response_hash = hashlib.sha256(response.text.encode()).hexdigest()
</code></pre>
<p>The resulting hash is stored as a span attribute:</p>
<pre><code class="language-python">llm_span.set_attribute("llm.response_hash", response_hash)
</code></pre>
<p>This approach allows engineers to correlate repeated responses across traces without exposing potentially sensitive content in observability systems.</p>
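<p>One caveat is that a plain SHA-256 digest of a short or predictable string can be confirmed by hashing candidate inputs. The sketch below shows an optional variation, not something the example above does: the hypothetical helper <code>content_fingerprint</code> falls back to the same plain hash by default but switches to a keyed HMAC when a secret key is supplied.</p>

```python
import hashlib
import hmac


def content_fingerprint(text, secret_key=None):
    """Return a hex digest that identifies text without storing it.

    With secret_key (bytes), an HMAC is used so short or guessable inputs
    cannot be confirmed by hashing candidate strings offline.
    """
    data = text.encode("utf-8")
    if secret_key is None:
        return hashlib.sha256(data).hexdigest()
    return hmac.new(secret_key, data, hashlib.sha256).hexdigest()


print(content_fingerprint("What is OpenTelemetry?"))
print(content_fingerprint("What is OpenTelemetry?", secret_key=b"example-key"))
```

<p>Either form still supports the correlation use case, because identical inputs produce identical digests as long as the key stays the same.</p>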
<h3 id="heading-token-usage-tracking">Token Usage Tracking</h3>
<p>The <code>llm.call</code> span also records token usage:</p>
<pre><code class="language-python">llm_span.set_attribute(
    "llm.usage.total_tokens",
    response.total_tokens
)
</code></pre>
<p>Capturing token usage at the span level is critical for monitoring cost and efficiency, since token consumption directly determines billing for most LLM providers.</p>
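<p>The endpoint above multiplies total tokens by a single example price. Providers often price prompt and completion tokens differently, so a slightly more detailed estimate might look like the sketch below. The function name <code>estimate_cost_usd</code> and both prices are hypothetical values chosen for illustration, not real provider rates.</p>

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens.
# Real prices vary by provider and model.
PROMPT_PRICE_PER_MILLION = Decimal("2.00")
COMPLETION_PRICE_PER_MILLION = Decimal("6.00")
ONE_MILLION = Decimal(1_000_000)


def estimate_cost_usd(prompt_tokens, completion_tokens):
    """Estimate request cost, pricing prompt and completion tokens separately."""
    prompt_cost = Decimal(prompt_tokens) * PROMPT_PRICE_PER_MILLION / ONE_MILLION
    completion_cost = Decimal(completion_tokens) * COMPLETION_PRICE_PER_MILLION / ONE_MILLION
    # Span attributes accept floats, so convert at the boundary.
    return float(prompt_cost + completion_cost)


print(estimate_cost_usd(prompt_tokens=1000, completion_tokens=500))
```

<p>Doing the arithmetic in <code>Decimal</code> avoids accumulating floating-point error when many small per-request costs are later summed.</p>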
<h3 id="heading-post-processing-span">Post-Processing Span</h3>
<p>Finally, the example includes a <code>llm.postprocess</code> span:</p>
<pre><code class="language-python">with tracer.start_as_current_span("llm.postprocess") as post_span:
    ...  # post-processing of the model output happens inside this block
</code></pre>
<p>This span represents any transformation applied after the model generates its response. Separating post-processing from the LLM call ensures that additional latency — such as formatting, filtering, or validation — is not incorrectly attributed to the model itself.</p>
<p>An attribute such as response length is recorded here:</p>
<pre><code class="language-python">post_span.set_attribute("llm.summary_length", len(summary))
</code></pre>
<p>This can be useful when diagnosing issues such as unexpectedly short or truncated outputs.</p>
<h3 id="heading-how-the-spans-form-a-complete-trace">How the Spans Form a Complete Trace</h3>
<p>When the request finishes, all spans belong to the same distributed trace:</p>
<pre><code class="language-plaintext">http.request
 ├── rag.retrieval
 ├── llm.call
 └── llm.postprocess
</code></pre>
<p>This hierarchy reflects the logical workflow of a retrieval-augmented LLM system. Because each span contains structured metadata, engineers can quickly answer questions such as:</p>
<ul>
<li><p>Was the latency caused by retrieval or model inference?</p>
</li>
<li><p>How many documents influenced the prompt?</p>
</li>
<li><p>Which model configuration produced the response?</p>
</li>
<li><p>How many tokens were consumed?</p>
</li>
<li><p>Was the response post-processed or truncated?</p>
</li>
</ul>
<p>This structured trace design is what transforms observability from simple monitoring into a practical debugging and optimization tool for LLM systems.</p>
<h2 id="heading-semantic-attributes-best-practices-for-llm-observability">Semantic Attributes: Best Practices for LLM Observability</h2>
<p>The goal is not to capture every possible detail, but to record the minimal set of stable, high-signal attributes that enable effective debugging, cost control, and quality analysis in production. Poor attribute design leads to noisy traces, privacy risks, and dashboards that are impossible to reason about.</p>
<h3 id="heading-prompt-response-and-model-metadata">Prompt, Response, and Model Metadata</h3>
<p>Storing raw prompts is often unsafe and expensive, so it is better to record minimal, structured metadata instead. In practice, this means attaching a stable template identifier with <code>llm.prompt_template_id</code>, a hashed version of the final prompt using <code>llm.prompt_hash</code> (to avoid storing raw text), and a size indicator such as <code>llm.prompt_length</code>, which captures the number of tokens or characters.</p>
<p>You should also always record key inference parameters: <code>llm.provider</code> (for example, "openai" or "anthropic"), <code>llm.model</code> (for example, "gpt-4.1"), <code>llm.temperature</code> and <code>llm.top_p</code> (sampling parameters), <code>llm.max_tokens</code> (the maximum tokens allowed), and <code>llm.stream</code> to indicate whether streaming was enabled, while staying within your organization’s privacy and compliance requirements.</p>
<pre><code class="language-python">with tracer.start_as_current_span("llm.call") as llm_span:
    llm_span.set_attribute("llm.provider", "example")
    llm_span.set_attribute("llm.model", "example-llm")
    llm_span.set_attribute("llm.temperature", 0.7)
    llm_span.set_attribute("llm.top_p", 0.9)
    llm_span.set_attribute("llm.max_tokens", 512)
    llm_span.set_attribute("llm.stream", False)
    llm_span.set_attribute("llm.prompt_template_id", "rag_v1")

    # Build the final prompt using retrieved context.
    # The raw prompt is intentionally not stored as a span attribute.
    prompt = build_prompt(query, documents)

    # Prompt metadata
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
    llm_span.set_attribute("llm.prompt_hash", prompt_hash)
    llm_span.set_attribute("llm.prompt_length", len(prompt))
</code></pre>
<h3 id="heading-token-usage-and-cost-why-this-matters-in-practice">Token Usage and Cost (Why This Matters in Practice)</h3>
<p>Token usage is one of the most common blind spots in LLM systems. Many teams monitor latency and error rates but discover runaway costs only after invoices spike. Because token consumption varies significantly by prompt structure, retrieved context, and model configuration, it must be captured explicitly at the span level.</p>
<p>The most important practice is to record token usage at the end of the LLM span, once the model has completed inference. This ensures that the values reflect the full request rather than partial or streamed output.</p>
<p>At minimum, capture these attributes: <code>llm.usage.prompt_tokens</code>, <code>llm.usage.completion_tokens</code>, and <code>llm.usage.total_tokens</code>.</p>
<pre><code class="language-python">import asyncio


class LLMResponse:
    """Container for the model output and its token counts."""

    def __init__(self, text: str, prompt_tokens: int, completion_tokens: int):
        self.text = text
        self.prompt_tokens = prompt_tokens
        self.completion_tokens = completion_tokens

    @property
    def total_tokens(self) -&gt; int:
        # Total is derived so it can never drift from the two parts.
        return self.prompt_tokens + self.completion_tokens


async def call_llm(prompt: str) -&gt; LLMResponse:
    """
    Simulate an LLM API call.
    In a real implementation, this would call OpenAI, Anthropic, or another
    provider. The artificial delay represents model latency.
    """
    await asyncio.sleep(0.2)  # Simulate inference time
    response_text = "FastAPI and OpenTelemetry enable end-to-end LLM observability."
    # Token count is approximated here for demonstration purposes.
    prompt_tokens = len(prompt.split())
    completion_tokens = len(response_text.split())
    return LLMResponse(response_text, prompt_tokens, completion_tokens)
</code></pre>
<p>These values allow you to distinguish between requests that are expensive because of large prompts (often caused by excessive retrieval or poor prompt construction) versus those that are expensive because of long model-generated outputs.</p>
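<p>As a rough sketch of how these attributes support that distinction, the snippet below builds the three usage attributes and computes the fraction of tokens that came from the prompt. The helper names <code>token_usage_attributes</code> and <code>prompt_share</code>, and the sample token counts, are assumptions made for illustration. They are not part of the sample application.</p>
<pre><code class="language-python">def token_usage_attributes(prompt_tokens, completion_tokens):
    """Build the usage attributes to record once inference has finished."""
    return {
        "llm.usage.prompt_tokens": prompt_tokens,
        "llm.usage.completion_tokens": completion_tokens,
        "llm.usage.total_tokens": prompt_tokens + completion_tokens,
    }


def prompt_share(usage):
    """Fraction of all tokens that came from the prompt (0.0 when nothing was used)."""
    total = usage["llm.usage.total_tokens"]
    return usage["llm.usage.prompt_tokens"] / total if total else 0.0


# Illustrative counts: a retrieval-heavy request with a short answer.
usage = token_usage_attributes(prompt_tokens=1800, completion_tokens=200)
share = prompt_share(usage)

# With a live span, each pair would be recorded via
# llm_span.set_attribute(key, value) at the end of the llm.call block.
</code></pre>
<p>A share close to 1.0 suggests looking at retrieval or prompt construction first, while a low share points toward long completions.</p>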
<p>Where possible, also attach an estimated cost as <code>llm.cost_estimated_usd</code>.</p>
<pre><code class="language-python">    # example price per token
    estimated_cost = response.total_tokens * 0.000002
    llm_span.set_attribute("llm.cost_estimated_usd", estimated_cost)
</code></pre>
<p>This value is typically derived by multiplying token counts by the model's published pricing. Even if the estimate is approximate, it enables powerful analysis. For example, you can identify which endpoints, prompt templates, or user flows are responsible for the highest cumulative cost, rather than relying on coarse, account-level billing dashboards.</p>
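<p>Many providers price prompt and completion tokens differently, so a slightly more detailed estimate can price the two separately. The following is a minimal sketch under that assumption. The <code>TokenPricing</code> class, the <code>estimate_cost_usd</code> helper, and the prices are hypothetical placeholders, not real provider rates.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    prompt_usd_per_million: float
    completion_usd_per_million: float


def estimate_cost_usd(prompt_tokens, completion_tokens, pricing):
    """Estimate request cost, pricing prompt and completion tokens separately."""
    prompt_cost = prompt_tokens * pricing.prompt_usd_per_million / 1_000_000
    completion_cost = completion_tokens * pricing.completion_usd_per_million / 1_000_000
    # Round to keep the attribute compact; the value is only an estimate.
    return round(prompt_cost + completion_cost, 8)


# Placeholder prices chosen only to make the arithmetic easy to follow.
EXAMPLE_PRICING = TokenPricing(prompt_usd_per_million=2.0, completion_usd_per_million=8.0)

cost = estimate_cost_usd(1_500, 500, EXAMPLE_PRICING)

# In the instrumented handler this value would be attached with
# llm_span.set_attribute("llm.cost_estimated_usd", cost).
</code></pre>
<p>Keeping the prices in one small object makes it easier to update them when pricing changes, without touching the instrumentation itself.</p>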
<p>Once spans carry the right attributes, the next step is to connect them to output quality, not just system health.</p>
<h2 id="heading-evaluation-hooks-inside-traces">Evaluation Hooks Inside Traces</h2>
<p>This section describes an additional pattern you can layer on top of the core instrumentation in this guide. It is optional and not implemented in the sample code, but it shows how to attach quality signals directly to your traces.</p>
<p>Observability is not just about whether the system stayed up. It is also about whether the model produced a useful answer. Evaluation hooks inside traces let you attach lightweight quality signals directly to the same spans you use for latency and cost.</p>
<p>Inline evaluations are the simplest approach. You can run quick checks synchronously and record the results as span attributes, such as <code>llm.eval.passed</code> for a simple boolean check, <code>llm.eval.relevance_score</code> for an optional numerical score, or flags like <code>llm.eval.hallucination_detected</code> and <code>llm.eval.refusal_detected</code>. These attributes travel with the trace, so you can filter and aggregate on them in your observability backend just like any other field.</p>
<p>For higher accuracy, you can introduce model-based evaluation as a separate step. In this pattern, an evaluator LLM runs asynchronously on the original prompt and response, and its work is captured in a child span (for example, <code>llm.eval</code>) that shares the same trace ID as the main <code>llm.call</code> span. You then attach scores such as relevance, faithfulness, or toxicity to that evaluation span.</p>
<p>Because the evaluation span shares the same trace ID, you can correlate quality regressions with changes in prompts or retrieval.</p>
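<p>To make the inline variant more concrete, here is a minimal sketch of a synchronous check that returns span-ready attributes. The <code>evaluate_response</code> function and the refusal phrases are assumptions for illustration only. A real check would be tuned to your model and domain, and a phrase list like this is a crude heuristic rather than a reliable detector.</p>
<pre><code class="language-python"># Hypothetical phrases that often signal a refusal; adjust for your own model.
REFUSAL_MARKERS = ("i cannot", "i can't", "i am unable", "i'm unable")


def evaluate_response(response_text):
    """Run cheap synchronous checks and return them as span attributes."""
    normalized = response_text.strip().lower()
    refusal_detected = any(marker in normalized for marker in REFUSAL_MARKERS)
    is_empty = not normalized
    return {
        "llm.eval.refusal_detected": refusal_detected,
        # The response passes only if it is non-empty and not a refusal.
        "llm.eval.passed": not (refusal_detected or is_empty),
    }


attributes = evaluate_response("I cannot help with that request.")

# With a live span, each pair would be recorded via
# llm_span.set_attribute(key, value) before the llm.call span ends.
</code></pre>
<p>Because the function returns a plain dictionary, it can be unit-tested without a tracer and reused by an asynchronous evaluation span later.</p>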
<h2 id="heading-exporting-and-visualizing-traces-where-this-fits-with-vendor-tooling">Exporting and Visualizing Traces (Where This Fits with Vendor Tooling)</h2>
<p>This code-first observability design is vendor-agnostic. Once traces are emitted using OpenTelemetry, they can be exported to different backends without changing instrumentation.</p>
<p>General-purpose tracing systems like Jaeger and Grafana Tempo help engineers debug latency, errors, and request flow across retrieval, prompting, and model calls, answering how the system behaved. LLM-focused platforms such as Arize Phoenix use the same data but add model-specific insights like prompt clustering, token analysis, and quality correlation.</p>
<p>Because instrumentation stays OpenTelemetry-native, you maintain full control over attributes and trace structure while still using vendor dashboards, and you can switch backends as your needs evolve without touching the application code.</p>
<h2 id="heading-operational-patterns-and-anti-patterns">Operational Patterns and Anti-Patterns</h2>
<p>Effective LLM observability requires disciplined practices. High-volume systems should sample traces to limit overhead, and prompts or responses should be hashed by default to reduce storage and privacy risk. Traces must be treated as production data, with proper access control and retention policies.</p>
<p>Common pitfalls include relying only on vendor SDK traces, logging prompts without trace correlation, or ignoring evaluation signals. These issues fragment visibility and hide quality regressions, especially when observability focuses only on agents instead of full application context.</p>
<h2 id="heading-extending-the-system">Extending the System</h2>
<p>Once traces are reliable, they support advanced capabilities. Metrics like p95 latency can be derived from spans, logs can be linked using trace IDs, and historical traces can power offline evaluation or prompt testing.</p>
<p>By following OpenTelemetry conventions, the observability stack also stays aligned with emerging LLM semantic standards, keeping the system flexible and future-proof.</p>
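<p>As one illustration of deriving a metric from spans, the sketch below computes a nearest-rank p95 latency for <code>llm.call</code> spans. The span dictionaries and the <code>duration_ms</code> field are a hypothetical export format chosen for the example. Real backends expose span durations through their own schemas and query languages.</p>
<pre><code class="language-python">def percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct percent of samples."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped to a valid 1-based rank.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical exported spans: 100 model calls plus one slow retrieval.
spans = [{"name": "llm.call", "duration_ms": ms} for ms in range(1, 101)]
spans.append({"name": "rag.retrieval", "duration_ms": 5000})

# Filter by span name so retrieval latency does not distort the model metric.
llm_durations = [s["duration_ms"] for s in spans if s["name"] == "llm.call"]
p95_ms = percentile(llm_durations, 95)
</code></pre>
<p>Filtering by span name before aggregating is what keeps retrieval latency from being attributed to the model, which mirrors the reason the spans were separated in the first place.</p>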
<h2 id="heading-conclusion">Conclusion</h2>
<p>End-to-end LLM observability is not achieved by installing another agent. It is achieved through intentional span design, meaningful semantic attributes, and, where needed, lightweight evaluation hooks.</p>
<p>By treating LLM calls as first-class operations within distributed traces, you gain faster debugging, controlled costs, safer deployments, and measurable quality improvements. The backend (Jaeger, Tempo, or Phoenix) is interchangeable. The instrumentation strategy is not.</p>
<p>A well-designed trace is the most valuable artifact in a production LLM system.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build Your Own Circuit Breaker in Spring Boot – and Really Understand Resilience4j ]]>
                </title>
                <description>
                    <![CDATA[ This article explains how to design and implement your own circuit breaker in Spring Boot using explicit failure tracking, a scheduler-driven recovery model, and clear state transitions. Instead of relying solely on Resilience4j, we’ll walk through t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-your-own-circuit-breaker-in-spring-boot-and-really-understand-resilience4j/</link>
                <guid isPermaLink="false">69938789dce780a9836b8f09</guid>
                
                    <category>
                        <![CDATA[ Springboot ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Resilience ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Java ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Circuit breaker pattern ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Microservices ]]>
                    </category>
                
                    <category>
                        <![CDATA[ fault tolerance ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Backend Engineering ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jessica Patel ]]>
                </dc:creator>
                <pubDate>Mon, 16 Feb 2026 21:09:29 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771276149217/e55b0a5c-53e2-4d2c-a004-1467a7c2b17a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>This article explains how to design and implement your own circuit breaker in Spring Boot using explicit failure tracking, a scheduler-driven recovery model, and clear state transitions.</p>
<p>Instead of relying solely on Resilience4j, we’ll walk through the internal mechanics so you understand how circuit breakers actually work.</p>
<h2 id="heading-what-well-cover">What We’ll Cover</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-prerequisites-and-technical-context">Prerequisites and Technical Context</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-circuit-breaker-in-distributed-systems">What Is a Circuit Breaker in Distributed Systems?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-circuit-breakers-matter-in-spring-boot">Why Circuit Breakers Matter in Spring Boot</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-circuit-breakers-are-foundational">Why Circuit Breakers Are Foundational</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-problem-circuit-breakers-solve-that-timeouts-and-retries-do-not">What Problem Circuit Breakers Solve That Timeouts and Retries Do Not</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-circuit-breaker-state-model">The Circuit Breaker State Model</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-not-just-use-resilience4j">Why Not Just Use Resilience4j?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-design-goals-for-a-custom-circuit-breaker">Design Goals for a Custom Circuit Breaker</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-a-minimal-working-circuitbreaker-class">How to Build a Minimal Working CircuitBreaker Class</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-concurrency-and-state-transition-guarantees">Concurrency and State Transition Guarantees</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-explaining-the-state-model-in-the-class">Explaining the State Model in the Class</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-failure-tracking-inside-the-class">Failure Tracking Inside the Class</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-closed-state-transitions-to-open">How Closed State Transitions to Open</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-open-state-behavior-in-the-class">OPEN State Behavior in the Class</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-schedulerdriven-recovery-entering-halfopen">Scheduler‑Driven Recovery: Entering HALF_OPEN</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-spring-boot-scheduler-example">Spring Boot Scheduler Example</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-this-connects-to-execution-flow">How This Connects to Execution Flow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scheduler-design-and-thread-safety">Scheduler Design and Thread Safety</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-we-avoid-scheduled-for-this-design">Why We Avoid @Scheduled for This Design</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-all-together">Bringing It All Together</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-observability-making-the-breaker-understandable">Observability: Making the Breaker Understandable</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-handling-different-failure-types">Handling Different Failure Types</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-custom-breaker-vs-resilience4j">Custom Breaker vs Resilience4j</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-you-should-not-build-your-own">When You Should Not Build Your Own</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-extending-the-design">Extending the Design</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-common-mistakes">Common Mistakes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-prerequisites-and-technical-context">Prerequisites and Technical Context</h2>
<p>This article assumes you are comfortable with core Spring Boot and Java concepts. We won’t cover framework fundamentals or basic concurrency principles in depth. Here’s what you’ll need to know:</p>
<h3 id="heading-spring-boot-basics">Spring Boot Basics</h3>
<p>You should be comfortable with how dependency injection works in Spring, how to define <code>@Configuration</code> classes and <code>@Bean</code> definitions, and the basic service-layer structure of a Spring application. In this tutorial, we’ll treat the circuit breaker as a plain Java component and wire it into Spring through configuration classes rather than annotations.</p>
<h3 id="heading-java-concurrency-fundamentals">Java Concurrency Fundamentals</h3>
<p>You don’t need to be a concurrency expert, but you should be comfortable with Java’s basic concurrency tools. The implementation uses <code>AtomicInteger</code>, volatile fields, a <code>ScheduledExecutorService</code>, and simple synchronization, so you should understand why shared mutable state is dangerous, how atomic operations differ from synchronized blocks, and why state transitions in a shared state machine must be serialized.</p>
<h3 id="heading-functional-interfaces">Functional Interfaces</h3>
<p>The circuit breaker exposes an <code>execute(Supplier&lt;T&gt;)</code> method, so you should be comfortable using <code>Supplier&lt;T&gt;</code>, writing simple lambda expressions, and wrapping outbound service calls inside a function you can pass to the breaker.</p>
<h3 id="heading-resilience4j-basics">Resilience4j Basics</h3>
<p>You don’t need hands-on Resilience4j experience, but you should know that it’s a lightweight Java fault-tolerance library that offers circuit breakers, retries, rate limiters, and bulkheads, and that it is commonly used in Spring Boot via annotations or configuration. In this article we’ll only reference Resilience4j for comparison, not for actual configuration or usage.</p>
<h2 id="heading-what-is-a-circuit-breaker-in-distributed-systems">What Is a Circuit Breaker in Distributed Systems?</h2>
<p>A circuit breaker is a fault-tolerance pattern that stops a system from repeatedly attempting operations that are likely to fail.</p>
<p>The name comes from electrical engineering. In a physical circuit, a breaker “opens” when the current becomes unsafe, preventing damage. After a cooldown period, it allows current to flow again to test whether the issue has been resolved.</p>
<p>In software, the same principle applies. When Service A depends on Service B, and Service B becomes slow or unavailable, naïvely retrying every request can:</p>
<ul>
<li><p>Exhaust thread pools</p>
</li>
<li><p>Saturate connection pools</p>
</li>
<li><p>Increase latency across the system</p>
</li>
<li><p>Trigger cascading failures</p>
</li>
<li><p>Bring down otherwise healthy services</p>
</li>
</ul>
<p>Instead of continuing to send requests to a failing dependency, a circuit breaker:</p>
<ul>
<li><p>Detects repeated failures</p>
</li>
<li><p>Opens the circuit and blocks calls</p>
</li>
<li><p>Fails fast without attempting the operation</p>
</li>
<li><p>Periodically tests whether the dependency has recovered</p>
</li>
</ul>
<p>This turns uncontrolled failure into controlled degradation.</p>
<h3 id="heading-why-circuit-breakers-matter-in-spring-boot">Why Circuit Breakers Matter in Spring Boot</h3>
<p>Because circuit breakers are a foundational resilience pattern in distributed systems, most Spring Boot teams reach immediately for Resilience4j or legacy Hystrix‑style abstractions – and for good reason. These libraries are mature, well-tested, and production-proven.</p>
<p>However, treating circuit breakers as black boxes often leads to:</p>
<ul>
<li><p>Misconfigured thresholds</p>
</li>
<li><p>Incorrect assumptions about failure handling</p>
</li>
<li><p>Difficulty extending behavior beyond library defaults</p>
</li>
<li><p>Debugging issues where “the breaker opened, but we don’t know why”</p>
</li>
</ul>
<p>Building your own circuit breaker – even if you never ship it to production – forces you to understand the mechanics that actually protect your system. In some cases, a custom implementation also provides flexibility that general-purpose libraries cannot.</p>
<h3 id="heading-why-circuit-breakers-are-foundational">Why Circuit Breakers Are Foundational</h3>
<p>Circuit breakers are a foundational resilience pattern because they protect your scarcest resources (like threads, network and database connections, and CPU time) from being exhausted by a failing dependency.</p>
<p>Without a breaker, a single slow service can gradually consume all of those resources and turn a local problem into a system-wide outage.</p>
<p>Circuit breakers enforce isolation boundaries between services and sit alongside timeouts, retries, bulkheads, and rate limiters, but they make one crucial strategic choice that simple retries do not: they <strong>stop trying for now</strong>. That decision is what prevents cascading collapse.</p>
<h3 id="heading-what-problem-circuit-breakers-solve-that-timeouts-and-retries-do-not">What Problem Circuit Breakers Solve That Timeouts and Retries Do Not</h3>
<p>Timeouts and retries are reactive: timeouts cap how long you wait, and retries try the same operation again in the hope it succeeds.</p>
<p>A circuit breaker is proactive. It monitors failure patterns and, once a threshold is crossed, temporarily disables the failing integration point so new requests are rejected immediately instead of timing out. This dramatically reduces resource waste and stabilizes the system under stress.</p>
<h3 id="heading-the-circuit-breaker-state-model">The Circuit Breaker State Model</h3>
<p>Any circuit breaker, library-based or custom, follows the same conceptual state machine.</p>
<ol>
<li><p><strong>Closed:</strong> In the Closed state, all requests are allowed and failures are simply monitored.</p>
</li>
<li><p><strong>Open:</strong> When failures cross a configured threshold, the breaker moves to Open, blocks new requests, and makes them fail immediately.</p>
</li>
<li><p><strong>Half-Open:</strong> After a cooldown period, it enters Half-Open, where it lets a small number of trial requests through to test whether the dependency has recovered; based on those results, it either returns to Closed or goes back to Open.</p>
</li>
</ol>
<p>The complexity lies not in the states themselves, but in <strong>how and when transitions occur</strong>.</p>
<h3 id="heading-why-not-just-use-resilience4j">Why Not Just Use Resilience4j?</h3>
<p>Resilience4j is excellent, but there are valid reasons to build your own:</p>
<ul>
<li><p>You want non-standard failure logic (for example, domain-aware errors).</p>
</li>
<li><p>You need custom recovery strategies.</p>
</li>
<li><p>You want state persisted or shared differently.</p>
</li>
<li><p>You need tight integration with business metrics.</p>
</li>
<li><p>You want to understand the internals for tuning and debugging.</p>
</li>
</ul>
<p>More importantly, understanding the internals prevents misuse. Many production incidents stem from misconfigured circuit breakers rather than missing ones.</p>
<h2 id="heading-design-goals-for-a-custom-circuit-breaker">Design Goals for a Custom Circuit Breaker</h2>
<p>Before writing any code, we need to be clear about what “correct” behavior looks like. A circuit breaker seems simple in theory, but subtle design mistakes can introduce race conditions, false openings, or silent failures where it stops protecting the system.</p>
<p>The following goals shape a predictable and production-safe implementation.</p>
<h3 id="heading-thread-safe-and-low-overhead">Thread-Safe and Low Overhead</h3>
<p>The breaker sits on the hot path of outbound calls, so every protected request passes through it. If it introduces lock contention or heavy synchronization, it quickly becomes a bottleneck.</p>
<p>The implementation needs to avoid coarse-grained locking, use atomic primitives carefully, and serialize state transitions without blocking execution more than necessary. Thread safety is non‑negotiable: a circuit breaker that misbehaves under concurrency is worse than having no breaker at all.</p>
<h3 id="heading-predictable-state-transitions">Predictable State Transitions</h3>
<p>Circuit breakers are state machines. If their transitions are inconsistent or prone to races, you end up with split‑brain behavior – one thread believes the breaker is OPEN while another believes it is CLOSED – and your protection becomes undefined.</p>
<p>To avoid this, every transition (CLOSED → OPEN → HALF_OPEN → CLOSED) must be explicit, atomic, and deterministic, all guarded by a single transition mechanism. In this design, predictability matters far more than cleverness.</p>
<h3 id="heading-explicit-failure-tracking">Explicit Failure Tracking</h3>
<p>Not every failure should open the breaker. If you blindly count every exception, you risk opening the breaker on client validation errors, treating business rule violations as infrastructure failures, and hiding real domain bugs behind resilience logic.</p>
<p>Failure classification has to be deliberate: the breaker should react only to infrastructure‑level problems such as timeouts, connection errors, and 5xx responses, not to domain logic errors. Keeping that separation ensures your resilience layer stays aligned with actual failure modes.</p>
<h3 id="heading-time-based-recovery-using-a-scheduler">Time-Based Recovery Using a Scheduler</h3>
<p>Some implementations check timestamps on every request to decide when to move from OPEN to HALF_OPEN, adding extra branching to the hot path.</p>
<p>Instead, this design uses a scheduler: when the breaker opens, it schedules a recovery attempt, keeps the OPEN state purely fail‑fast, and avoids request‑driven polling. That approach reduces branching and contention under load. Recovery should be controlled and predictable – not opportunistic.</p>
<h3 id="heading-framework-agnostic-core-logic">Framework-Agnostic Core Logic</h3>
<p>The breaker itself should be plain Java – no Spring annotations, no AOP, and no direct framework coupling. That choice makes unit testing easier, keeps the component portable, and preserves a clean separation of concerns with less hidden magic. Spring should wrap the breaker, not define it, so your resilience strategy is not trapped inside any one framework’s abstractions.</p>
<h3 id="heading-easy-integration-into-spring-boot">Easy Integration into Spring Boot</h3>
<p>Although the core logic is framework‑agnostic, it still needs to plug cleanly into a Spring application. That means wiring it via <code>@Configuration</code>, supporting dependency injection, and calling it from clear execution points in your service layer. Resilience behavior should be obvious in code reviews. Hiding it behind annotations often leads to confusion when you are debugging production issues.</p>
<h2 id="heading-how-to-build-a-minimal-working-circuitbreaker-class">How to Build a Minimal Working CircuitBreaker Class</h2>
<p>Now let’s turn the conceptual components into a single cohesive class. This is still a minimal implementation, but it’s complete enough to demonstrate state, failure tracking, scheduling, and execution logic in one place.</p>
<p>A minimal circuit breaker consists of:</p>
<ol>
<li><p>State holder</p>
</li>
<li><p>Failure tracker</p>
</li>
<li><p>Transition rules</p>
</li>
<li><p>Scheduler for recovery</p>
</li>
<li><p>Execution guard</p>
</li>
</ol>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-keyword">final</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CircuitBreaker</span> </span>{

    <span class="hljs-class"><span class="hljs-keyword">enum</span> <span class="hljs-title">State</span> </span>{
        CLOSED,
        OPEN,
        HALF_OPEN
    }

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> ScheduledExecutorService scheduler;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span> failureThreshold;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span> halfOpenTrialLimit;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> Duration openCooldown;

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> AtomicInteger failureCount = <span class="hljs-keyword">new</span> AtomicInteger(<span class="hljs-number">0</span>);
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> AtomicInteger halfOpenTrials = <span class="hljs-keyword">new</span> AtomicInteger(<span class="hljs-number">0</span>);

    <span class="hljs-comment">// All transitions go through this field, guarded by `synchronized` blocks.</span>
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">volatile</span> State state = State.CLOSED;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">CircuitBreaker</span><span class="hljs-params">(
            ScheduledExecutorService scheduler,
            <span class="hljs-keyword">int</span> failureThreshold,
            <span class="hljs-keyword">int</span> halfOpenTrialLimit,
            Duration openCooldown
    )</span> </span>{
        <span class="hljs-keyword">this</span>.scheduler = scheduler;
        <span class="hljs-keyword">this</span>.failureThreshold = failureThreshold;
        <span class="hljs-keyword">this</span>.halfOpenTrialLimit = halfOpenTrialLimit;
        <span class="hljs-keyword">this</span>.openCooldown = openCooldown;
    }

    <span class="hljs-keyword">public</span> &lt;T&gt; <span class="hljs-function">T <span class="hljs-title">execute</span><span class="hljs-params">(Supplier&lt;T&gt; action)</span> </span>{
        <span class="hljs-comment">// 1. Guard the call based on the breaker's current state.</span>
        <span class="hljs-comment">// The synchronized block keeps the state check and trial accounting thread-safe,</span>
        <span class="hljs-comment">// so another thread cannot change the state in the middle of the check.</span>
        State current;
        <span class="hljs-keyword">synchronized</span> (<span class="hljs-keyword">this</span>) {
            current = state;

            <span class="hljs-keyword">if</span> (current == State.OPEN) {
                <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"Circuit breaker is OPEN. Call rejected."</span>);
            }

            <span class="hljs-keyword">if</span> (current == State.HALF_OPEN) {
                <span class="hljs-keyword">int</span> trials = halfOpenTrials.incrementAndGet();
                <span class="hljs-keyword">if</span> (trials &gt; halfOpenTrialLimit) {
                    <span class="hljs-comment">// Too many trial requests; fail fast.</span>
                    halfOpenTrials.decrementAndGet();
                    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"Circuit breaker is HALF_OPEN. Trial limit exceeded."</span>);
                }
            }
        }

        <span class="hljs-comment">// 2. Execute the protected operation, e.g., an API call to another system.</span>
        <span class="hljs-keyword">try</span> {
            T result = action.get();
            <span class="hljs-comment">// 3. Record success</span>
            onSuccess();
            <span class="hljs-keyword">return</span> result;
        } <span class="hljs-keyword">catch</span> (Throwable t) {
            <span class="hljs-comment">// 3. Record failure</span>
            onFailure(t);
            <span class="hljs-comment">// 4. Propagate to caller</span>
            <span class="hljs-keyword">if</span> (t <span class="hljs-keyword">instanceof</span> RuntimeException re) {
                <span class="hljs-keyword">throw</span> re;
            }
            <span class="hljs-keyword">if</span> (t <span class="hljs-keyword">instanceof</span> Error e) {
                <span class="hljs-keyword">throw</span> e;
            }
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(t);
        }
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">onSuccess</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">synchronized</span> (<span class="hljs-keyword">this</span>) {
            failureCount.set(<span class="hljs-number">0</span>);

            <span class="hljs-keyword">if</span> (state == State.HALF_OPEN) {
                <span class="hljs-comment">// A successful trial closes the breaker.</span>
                transitionToClosed();
            }
        }
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">onFailure</span><span class="hljs-params">(Throwable t)</span> </span>{
        <span class="hljs-comment">// Example: only count "server-side" failures.</span>
        <span class="hljs-keyword">boolean</span> breakerRelevant = <span class="hljs-keyword">true</span>; <span class="hljs-comment">// placeholder for domain-specific checks</span>

        <span class="hljs-keyword">if</span> (!breakerRelevant) {
            <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-keyword">synchronized</span> (<span class="hljs-keyword">this</span>) {
            <span class="hljs-keyword">int</span> failures = failureCount.incrementAndGet();
            <span class="hljs-keyword">if</span> (state == State.CLOSED &amp;&amp; failures &gt;= failureThreshold) {
                transitionToOpen();
            } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (state == State.HALF_OPEN) {
                <span class="hljs-comment">// Any failure in HALF_OPEN sends us back to OPEN.</span>
                transitionToOpen();
            }
        }
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">transitionToOpen</span><span class="hljs-params">()</span> </span>{
        state = State.OPEN;
        <span class="hljs-comment">// Reset counters so the next CLOSED phase starts clean.</span>
        failureCount.set(<span class="hljs-number">0</span>);
        halfOpenTrials.set(<span class="hljs-number">0</span>);
        scheduleHalfOpen();
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">transitionToHalfOpen</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">synchronized</span> (<span class="hljs-keyword">this</span>) {
            state = State.HALF_OPEN;
            halfOpenTrials.set(<span class="hljs-number">0</span>);
        }
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">transitionToClosed</span><span class="hljs-params">()</span> </span>{
        state = State.CLOSED;
        failureCount.set(<span class="hljs-number">0</span>);
        halfOpenTrials.set(<span class="hljs-number">0</span>);
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">scheduleHalfOpen</span><span class="hljs-params">()</span> </span>{
        scheduler.schedule(
                <span class="hljs-keyword">this</span>::transitionToHalfOpen,
                openCooldown.toMillis(),
                TimeUnit.MILLISECONDS
        );
    }
}
</code></pre>
<p>Now we’ll walk through each responsibility in that class: why the fields exist, how state transitions work, where concurrency guarantees matter, how execution is guarded, and how the scheduler drives recovery.</p>
<p>Each subsection maps directly back to part of this class – we’re not introducing new concepts, just explaining the behavior implemented within the code above.</p>
<h3 id="heading-concurrency-and-state-transition-guarantees">Concurrency and State Transition Guarantees</h3>
<p>Although the breaker uses atomic primitives for counters and a volatile state field, this only works because <strong>all state transitions are guarded consistently</strong>.</p>
<p>In practice, every transition (CLOSED → OPEN, OPEN → HALF_OPEN, HALF_OPEN → CLOSED, and HALF_OPEN → OPEN) must be performed under the same synchronization mechanism: either a single lock or a CAS-based state machine. This class uses a single lock, the breaker's own monitor, as the guard from <code>execute()</code> shows below. Mixing unsynchronized state writes with atomic counters can lead to split-brain behavior (for example, one thread reopening the breaker while another closes it).</p>
<pre><code class="lang-java"><span class="hljs-keyword">synchronized</span> (<span class="hljs-keyword">this</span>) {
            current = state;

            <span class="hljs-keyword">if</span> (current == State.OPEN) {
                <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"Circuit breaker is OPEN. Call rejected."</span>);
            }

            <span class="hljs-keyword">if</span> (current == State.HALF_OPEN) {
                <span class="hljs-keyword">int</span> trials = halfOpenTrials.incrementAndGet();
                <span class="hljs-keyword">if</span> (trials &gt; halfOpenTrialLimit) {
                    <span class="hljs-comment">// Too many trial requests; fail fast.</span>
                    halfOpenTrials.decrementAndGet();
                    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"Circuit breaker is HALF_OPEN. Trial limit exceeded."</span>);
                }
            }
        }
</code></pre>
<p>The rule is simple: <strong>reads may be optimistic, but writes and transitions must be serialized</strong>.</p>
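<p>The class above takes the single-lock route. For comparison, here is a minimal sketch of the CAS-based alternative, assuming the state is held in an <code>AtomicReference</code>. The names <code>CasStateMachine</code> and <code>transition</code> are hypothetical and not part of the class above; the sketch only illustrates that <code>compareAndSet</code> lets exactly one thread win a given transition.</p>

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free variant of the breaker's state field (not the article's class).
class CasStateMachine {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);

    // Attempts the transition from -> to.
    // Returns true only for the single caller that actually performed it.
    boolean transition(State from, State to) {
        return state.compareAndSet(from, to);
    }

    State current() {
        return state.get();
    }
}
```

<p>In a design like this, only the thread for which <code>transition(State.CLOSED, State.OPEN)</code> returns <code>true</code> would go on to schedule the recovery task, so duplicate scheduling is avoided without a lock.</p>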
<h3 id="heading-explaining-the-state-model-in-the-class">Explaining the State Model in the Class</h3>
<p>At the core of the implementation is a simple but strict state machine represented by the <code>State</code> enum: <code>CLOSED</code>, <code>OPEN</code>, and <code>HALF_OPEN</code>.</p>
<p>The <code>state</code> field is declared <code>volatile</code> so that a write by one thread is visible to every later read by other threads. When one thread moves the breaker to a new state, other threads do not keep acting on a stale value.</p>
<p>Alongside the state, the class maintains the <code>failureCount</code> and <code>halfOpenTrials</code> counters as <code>AtomicInteger</code> fields (see the full class above). These track how failures accumulate and how many recovery attempts we have made. Because every update in this class also happens inside a <code>synchronized</code> block, the atomic types act as a safety margin rather than as the primary guard.</p>
<p>The key design idea is separation of responsibilities: the <code>enum</code> captures the current mode of operation, while the atomic counters hold the metrics that influence state transitions. Atomic increments alone do not guarantee safe transitions, though, so all updates to the state still follow a consistent serialization strategy to avoid race conditions.</p>
<pre><code class="lang-java"><span class="hljs-class"><span class="hljs-keyword">enum</span> <span class="hljs-title">State</span> </span>{
        CLOSED,
        OPEN,
        HALF_OPEN
    }
</code></pre>
<p>This structure gives us a clear foundation: a small, explicit state machine with observable transition boundaries.</p>
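<p>If you want those transition boundaries to be checkable rather than implicit, one option is to encode the legal transitions in the enum itself. The following is only a sketch under that assumption: <code>BreakerState</code>, <code>allowedTargets</code>, and <code>canTransitionTo</code> are hypothetical names, and the allowed targets simply mirror the transitions the class above performs.</p>

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical extension of the State enum that records which transitions are legal.
enum BreakerState {
    CLOSED, OPEN, HALF_OPEN;

    // Legal targets for each state, mirroring the transitions in the article's class.
    Set<BreakerState> allowedTargets() {
        switch (this) {
            case CLOSED:    return EnumSet.of(OPEN);
            case OPEN:      return EnumSet.of(HALF_OPEN);
            case HALF_OPEN: return EnumSet.of(CLOSED, OPEN);
            default:        throw new AssertionError("unknown state " + this);
        }
    }

    boolean canTransitionTo(BreakerState target) {
        return allowedTargets().contains(target);
    }
}
```

<p>A transition method could call <code>canTransitionTo</code> first and log or reject anything unexpected, which tends to surface concurrency bugs early in tests.</p>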
<h3 id="heading-failure-tracking-inside-the-class">Failure Tracking Inside the Class</h3>
<pre><code class="lang-java"><span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">onFailure</span><span class="hljs-params">(Throwable t)</span> </span>{
        <span class="hljs-comment">// Example: only count "server-side" failures.</span>
        <span class="hljs-keyword">boolean</span> breakerRelevant = <span class="hljs-keyword">true</span>; <span class="hljs-comment">// placeholder for domain-specific checks</span>

        <span class="hljs-keyword">if</span> (!breakerRelevant) {
            <span class="hljs-keyword">return</span>;
        }

        <span class="hljs-keyword">synchronized</span> (<span class="hljs-keyword">this</span>) {
            <span class="hljs-keyword">int</span> failures = failureCount.incrementAndGet();
            <span class="hljs-keyword">if</span> (state == State.CLOSED &amp;&amp; failures &gt;= failureThreshold) {
                transitionToOpen();
            } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (state == State.HALF_OPEN) {
                <span class="hljs-comment">// Any failure in HALF_OPEN sends us back to OPEN.</span>
                transitionToOpen();
            }
        }
    }
</code></pre>
<p>In this implementation, failure tracking is intentionally simple: we count <strong>consecutive</strong> failures. Each time a protected call throws an exception we classify as breaker‑relevant, <code>failureCount</code> is incremented. On a successful call, the counter resets.</p>
<p>I chose consecutive failures for clarity rather than sophistication. More advanced strategies, like sliding time windows or failure ratios, introduce extra state and timing complexity. When you’re learning how a breaker works, a simple counter makes the transition rules easy to reason about and easy to test.</p>
<p>Equally important, the breaker should not treat every exception the same. Domain validation errors, client misuse, and business rule violations shouldn’t affect the breaker’s state. Only infrastructure‑level problems (like timeouts, connection failures, or 5xx responses) should move the breaker toward OPEN. That separation keeps the breaker focused on dependency instability, not application bugs or bad inputs.</p>
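<p>As a rough illustration of what could replace the <code>breakerRelevant = true</code> placeholder, the sketch below classifies failures by exception type. <code>FailureClassifier</code> is a hypothetical helper, and the choice of <code>IOException</code> and <code>TimeoutException</code> as the "infrastructure" markers is an assumption; how 5xx responses surface depends on the HTTP client you use.</p>

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical classifier for the breakerRelevant placeholder in onFailure().
final class FailureClassifier {
    private FailureClassifier() {}

    static boolean isBreakerRelevant(Throwable t) {
        Throwable cur = t;
        // Walk a bounded part of the cause chain, because I/O failures are often
        // wrapped in runtime exceptions by client libraries.
        for (int depth = 0; cur != null && depth < 10; depth++) {
            if (cur instanceof IOException || cur instanceof TimeoutException) {
                return true; // assumption: these indicate an infrastructure-level problem
            }
            cur = cur.getCause();
        }
        return false; // validation errors, business rule violations, programming bugs
    }
}
```

<p>With a helper like this, <code>onFailure</code> could start with <code>boolean breakerRelevant = FailureClassifier.isBreakerRelevant(t);</code> and leave the rest of the method unchanged.</p>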
<h3 id="heading-how-closed-state-transitions-to-open">How Closed State Transitions to Open</h3>
<p>When the breaker is in the CLOSED state, all requests flow through normally. In this phase the breaker is purely observational: it monitors outcomes and increments <code>failureCount</code> whenever a breaker‑relevant exception occurs.</p>
<p>Inside the <code>onFailure</code> method (shown in the previous section), once <code>failureCount</code> reaches the configured threshold, the breaker transitions to OPEN. This transition must be atomic and serialized – otherwise, multiple threads could try to open the breaker at the same time, leading to inconsistent scheduling or duplicate recovery tasks.</p>
<pre><code class="lang-java"><span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">transitionToOpen</span><span class="hljs-params">()</span> </span>{
        state = State.OPEN;
        <span class="hljs-comment">// Reset counters so the next CLOSED phase starts clean.</span>
        failureCount.set(<span class="hljs-number">0</span>);
        halfOpenTrials.set(<span class="hljs-number">0</span>);
        scheduleHalfOpen();
    }
</code></pre>
<p>Moving to OPEN immediately changes system behavior. From that point on, new requests are rejected without attempting the protected operation, which shields downstream services and preserves local resources such as threads and connection pools.</p>
<h3 id="heading-open-state-behavior-in-the-class">OPEN State Behavior in the Class</h3>
<p>The OPEN state represents pure fail‑fast behavior. While the breaker is open, no protected calls are executed. The <code>execute()</code> method immediately throws an exception indicating that the circuit is open.</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> &lt;T&gt; <span class="hljs-function">T <span class="hljs-title">execute</span><span class="hljs-params">(Supplier&lt;T&gt; action)</span> </span>{
        <span class="hljs-comment">// 1. Guard the call based on the current state.</span>
        <span class="hljs-comment">// The synchronized block ensures that no other thread changes the state</span>
        <span class="hljs-comment">// while we check it and claim a HALF_OPEN trial slot.</span>
        State current;
        <span class="hljs-keyword">synchronized</span> (<span class="hljs-keyword">this</span>) {
            current = state;

            <span class="hljs-keyword">if</span> (current == State.OPEN) {
                <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> IllegalStateException(<span class="hljs-string">"Circuit breaker is OPEN. Call rejected."</span>);
            }
        <span class="hljs-comment">// ... remainder of execute() omitted; see the full class above</span>
}
</code></pre>
<p>This behavior is not about improving latency – it is about resource protection. Letting calls continue and simply “wait for timeouts” would still tie up threads and connections. The value of the OPEN state is that it refuses to participate in propagating failure at all.</p>
<p>In this state, the breaker has a single responsibility: wait for the scheduled recovery attempt. It doesn’t check timestamps on each request or poll in the hot path. Its behavior is deterministic: reject immediately and let the scheduler decide when to try again.</p>
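<p>One practical detail: the class above signals rejection with <code>IllegalStateException</code>, which the protected action itself could also throw. A possible refinement, shown here only as a sketch, is a dedicated exception type so that callers can tell "rejected by the breaker" apart from "the call failed" and serve a fallback. <code>CircuitOpenException</code> and <code>Fallbacks.withFallback</code> are hypothetical names, not part of the class above.</p>

```java
import java.util.function.Supplier;

// Hypothetical exception type; the article's class throws IllegalStateException instead.
class CircuitOpenException extends RuntimeException {
    CircuitOpenException(String message) {
        super(message);
    }
}

final class Fallbacks {
    private Fallbacks() {}

    // Runs the protected call and uses the fallback only when the breaker rejected it.
    // Any other exception still propagates to the caller unchanged.
    static <T> T withFallback(Supplier<T> protectedCall, Supplier<T> fallback) {
        try {
            return protectedCall.get();
        } catch (CircuitOpenException rejected) {
            return fallback.get();
        }
    }
}
```

<p>A caller could then wrap <code>circuitBreaker.execute(...)</code> in <code>withFallback</code> and return a cached or default value while the breaker is open, assuming the breaker were changed to throw the dedicated type.</p>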
<h3 id="heading-schedulerdriven-recovery-entering-halfopen">Scheduler‑Driven Recovery: Entering HALF_OPEN</h3>
<p>When the breaker transitions to OPEN, it immediately schedules a delayed task using the injected <code>ScheduledExecutorService</code>. After the configured cooldown period elapses, that task transitions the breaker to HALF_OPEN.</p>
<pre><code class="lang-java"><span class="hljs-comment">// The following methods come from the full class above.</span>

<span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">transitionToOpen</span><span class="hljs-params">()</span> </span>{
        state = State.OPEN;
        <span class="hljs-comment">// Reset counters so the next CLOSED phase starts clean.</span>
        failureCount.set(<span class="hljs-number">0</span>);
        halfOpenTrials.set(<span class="hljs-number">0</span>);
        scheduleHalfOpen(); <span class="hljs-comment">// schedule the delayed HALF_OPEN task once the state is OPEN</span>
    }

<span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">void</span> <span class="hljs-title">scheduleHalfOpen</span><span class="hljs-params">()</span> </span>{
        scheduler.schedule(
                <span class="hljs-keyword">this</span>::transitionToHalfOpen,
                openCooldown.toMillis(),
                TimeUnit.MILLISECONDS
        );
    }
</code></pre>
<p>This design keeps time-based logic out of the request execution path. Rather than checking elapsed time on every call, the breaker delegates recovery timing to a dedicated scheduler thread. This reduces conditional logic under load and keeps the <code>execute()</code> method focused on guarding execution.</p>
<p>The scheduler must be reliable and isolated. A single-threaded executor is typically sufficient because transitions are rare and lightweight. More importantly, transitions should be idempotent so that unexpected rescheduling does not corrupt state.</p>
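<p>As one way to keep the scheduler isolated and easy to spot in thread dumps, the sketch below builds a single-threaded scheduler with a named daemon thread. <code>BreakerSchedulers</code> and the thread name are hypothetical; making the thread a daemon is an assumption that suits recovery timers, which should not keep the JVM alive on their own.</p>

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

// Hypothetical factory for a dedicated breaker scheduler.
final class BreakerSchedulers {
    private BreakerSchedulers() {}

    static ScheduledExecutorService newBreakerScheduler() {
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "circuit-breaker-scheduler");
            thread.setDaemon(true); // do not keep the JVM alive just for recovery timers
            return thread;
        };
        // One thread is enough: transitions are rare, short, and should not overlap.
        return Executors.newSingleThreadScheduledExecutor(factory);
    }
}
```

<p>The returned executor can be passed to the <code>CircuitBreaker</code> constructor in place of a default one; you would still call <code>shutdown()</code> on it when the application stops.</p>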
<h2 id="heading-spring-boot-scheduler-example">Spring Boot Scheduler Example</h2>
<p>In Spring Boot, you can wire a dedicated <code>ScheduledExecutorService</code> bean to drive state transitions instead of using plain Java threads.</p>
<pre><code class="lang-java"><span class="hljs-meta">@Configuration</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CircuitBreakerConfig</span> </span>{

    <span class="hljs-comment">// First bean </span>
    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function">ScheduledExecutorService <span class="hljs-title">circuitBreakerScheduler</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> Executors.newSingleThreadScheduledExecutor();
    }

    <span class="hljs-comment">// Second bean </span>
    <span class="hljs-meta">@Bean</span>
    <span class="hljs-function">CircuitBreaker <span class="hljs-title">circuitBreaker</span><span class="hljs-params">(ScheduledExecutorService circuitBreakerScheduler)</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> CircuitBreaker(
                circuitBreakerScheduler,
                <span class="hljs-number">5</span>,                     <span class="hljs-comment">// failureThreshold</span>
                <span class="hljs-number">2</span>,                     <span class="hljs-comment">// halfOpenTrialLimit</span>
                Duration.ofSeconds(<span class="hljs-number">30</span>) <span class="hljs-comment">// openCooldown</span>
        );
    }
}
</code></pre>
<p>The configuration class above wires the circuit breaker into the Spring container without introducing framework coupling into the breaker itself.</p>
<p>The first bean <code>circuitBreakerScheduler()</code> defines a dedicated <code>ScheduledExecutorService</code>. This executor is responsible exclusively for time-based state transitions. When the breaker moves to OPEN, it uses this scheduler to queue a delayed task that transitions the state to HALF_OPEN.</p>
<p>Using a single-threaded executor is intentional. Circuit breaker transitions are lightweight and infrequent, so parallel scheduling is unnecessary. A single thread guarantees serialized transition execution and avoids overlapping recovery attempts.</p>
<p>The second bean constructs the <code>CircuitBreaker</code> itself. Here we inject the scheduler and configure three things: a failure threshold of 5 consecutive errors, a half‑open trial limit of 2 test requests, and a 30‑second cooldown before we attempt recovery again. This configuration makes the breaker’s behavior explicit and easy to reason about – there are no hidden properties files or annotations, because everything that affects resilience is defined in one place.</p>
<p>At this point, the breaker is a fully managed Spring bean that you can inject into services and use programmatically.</p>
<h3 id="heading-how-this-connects-to-execution-flow">How This Connects to Execution Flow</h3>
<p>Once registered as a bean, the breaker becomes part of the application’s dependency graph. A typical service might inject it and wrap outbound calls:</p>
<pre><code class="lang-java"><span class="hljs-meta">@Service</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ExternalApiService</span> </span>{

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> CircuitBreaker circuitBreaker;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> RestTemplate restTemplate;

    ExternalApiService(CircuitBreaker circuitBreaker, RestTemplate restTemplate) {
        <span class="hljs-keyword">this</span>.circuitBreaker = circuitBreaker;
        <span class="hljs-keyword">this</span>.restTemplate = restTemplate;
    }

    <span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">callExternal</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> circuitBreaker.execute(() -&gt;
                restTemplate.getForObject(<span class="hljs-string">"http://external/api"</span>, String.class)
        );
    }
}
</code></pre>
<p>Every outbound call to the external system flows through the breaker’s <code>execute()</code> method, which enforces the current state rules before allowing the call to proceed. That makes resilience behavior explicit at the integration boundary: anyone reviewing the service can immediately see that the call is protected. There is no hidden interception layer and no AOP proxy quietly changing behavior at runtime.</p>
<h3 id="heading-scheduler-design-and-thread-safety">Scheduler Design and Thread Safety</h3>
<p>The scheduler’s only responsibility is delayed state transition. It doesn’t execute business logic and it doesn’t evaluate request outcomes. Its purpose is narrowly scoped: move the breaker from OPEN to HALF_OPEN after a cooldown.</p>
<p>Because the executor is single-threaded, scheduled tasks cannot overlap. But this doesn’t eliminate concurrency concerns entirely. Request threads may still attempt transitions at the same time the scheduler fires. For this reason, transition methods such as <code>transitionToHalfOpen()</code> and <code>transitionToOpen()</code> must remain serialized and idempotent.</p>
<p>In other words, even though the scheduler simplifies time-based recovery, it doesn’t replace the need for careful state management.</p>
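<p>To illustrate what "serialized and idempotent" could look like, here is a trimmed-down sketch in which each transition checks the current state before acting. <code>GuardedTransitions</code> is a hypothetical class, and the <code>scheduledRecoveries</code> counter merely stands in for the call to <code>scheduler.schedule(...)</code>.</p>

```java
// Hypothetical, trimmed-down breaker showing only guarded (idempotent) transitions.
class GuardedTransitions {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;   // guarded by the monitor of `this`
    private int scheduledRecoveries = 0;  // stand-in for scheduler.schedule(...)

    synchronized boolean transitionToOpen() {
        if (state == State.OPEN) {
            return false; // already open: do not schedule a second recovery task
        }
        state = State.OPEN;
        scheduledRecoveries++;
        return true;
    }

    synchronized boolean transitionToHalfOpen() {
        if (state != State.OPEN) {
            return false; // stale or duplicate timer: ignore it
        }
        state = State.HALF_OPEN;
        return true;
    }

    synchronized State state() {
        return state;
    }

    synchronized int scheduledRecoveries() {
        return scheduledRecoveries;
    }
}
```

<p>With guards like these, a timer that fires twice or a second thread that tries to reopen an already open breaker becomes a no-op instead of a source of duplicate recovery tasks.</p>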
<p>The architectural separation looks like this:</p>
<ul>
<li><p>Request threads → enforce execution rules and record outcomes</p>
</li>
<li><p>Scheduler thread → handle time-based recovery transitions</p>
</li>
</ul>
<p>Keeping these responsibilities separate reduces complexity in the hot path and improves predictability under load.</p>
<h3 id="heading-why-we-avoid-scheduled-for-this-design">Why We Avoid @Scheduled for This Design</h3>
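<p>Spring provides <code>@Scheduled</code> as an alternative mechanism for time-based tasks. While convenient, it is declared statically on a method and, by default, runs on a task scheduler shared with every other <code>@Scheduled</code> job in the application, which reduces isolation.</p>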
<p>By using a dedicated <code>ScheduledExecutorService</code> for the breaker, we avoid interference with other scheduled jobs, keep lifecycle control explicit, and tie scheduling logic directly to breaker transitions.</p>
<p>This design reinforces the principle that resilience components should be isolated and predictable.</p>
<h3 id="heading-bringing-it-all-together">Bringing It All Together</h3>
<p>At this stage, the full interaction looks like this:</p>
<ol>
<li><p>A service wraps its dependency call with <code>circuitBreaker.execute()</code>.</p>
</li>
<li><p>If the breaker is CLOSED, the call proceeds and any relevant failures are counted.</p>
</li>
<li><p>When consecutive failures reach the threshold, the breaker moves to OPEN and schedules a recovery attempt.</p>
</li>
<li><p>While OPEN, calls fail immediately without hitting the downstream system.</p>
</li>
<li><p>After the cooldown period, the scheduler transitions the breaker to HALF_OPEN.</p>
</li>
<li><p>A small number of trial calls then decide whether the breaker returns to CLOSED or goes back to OPEN.</p>
</li>
</ol>
<p>Nothing is hidden: every transition is visible in code, every configuration value is explicit, and each thread involved has a single responsibility. That clarity is what makes a custom implementation useful for learning – and safe when it is designed correctly.</p>
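<p>The six steps above can be traced with a deliberately simplified, single-threaded miniature. This is a sketch for following the lifecycle, not a replacement for the class above: <code>MiniBreaker</code> is a hypothetical name, it has no locking, and <code>cooldownElapsed()</code> is an assumption that stands in for the scheduler firing.</p>

```java
import java.util.function.Supplier;

// Hypothetical single-threaded miniature; cooldownElapsed() stands in for the scheduler.
class MiniBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private State state = State.CLOSED;
    private int failures = 0;

    MiniBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    State state() {
        return state;
    }

    void cooldownElapsed() {
        if (state == State.OPEN) {
            state = State.HALF_OPEN; // step 5: the cooldown is over, allow trial calls
        }
    }

    <T> T execute(Supplier<T> action) {
        if (state == State.OPEN) {
            // step 4: fail fast without touching the downstream system
            throw new IllegalStateException("Circuit breaker is OPEN. Call rejected.");
        }
        try {
            T result = action.get();
            failures = 0;
            state = State.CLOSED; // no-op when CLOSED, closes the breaker when HALF_OPEN
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN; // step 3, or a failed trial in step 6
                failures = 0;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        MiniBreaker breaker = new MiniBreaker(2);
        for (int i = 0; i < 2; i++) {
            try {
                breaker.execute(() -> { throw new RuntimeException("downstream failure"); });
            } catch (RuntimeException expected) {
                // the failure was counted by the breaker
            }
        }
        System.out.println(breaker.state()); // OPEN
        breaker.cooldownElapsed();
        System.out.println(breaker.state()); // HALF_OPEN
        breaker.execute(() -> "ok");
        System.out.println(breaker.state()); // CLOSED
    }
}
```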
<h3 id="heading-observability-making-the-breaker-understandable">Observability: Making the Breaker Understandable</h3>
<p>A circuit breaker without observability is risky. At a minimum you should expose the current state, the failure count, the time of the last transition, and how long the breaker has been open.</p>
<p>On the metrics side, track how often the breaker opens, how many calls are rejected per second, and the success rate of recovery attempts.</p>
<p>Your logs should record state transitions at INFO level and failure classification decisions at DEBUG. With that level of visibility, your custom breaker is often easier to understand and tune than what many libraries provide out of the box.</p>
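<p>As a starting point, a small holder like the one sketched below could collect the numbers mentioned above. Everything in it is an assumption for illustration: <code>BreakerMetrics</code>, its method names, and the logger name are hypothetical, and in a real service you would more likely publish these values through your metrics library of choice.</p>

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical metrics holder; the breaker would call it from its transition methods.
class BreakerMetrics {
    private static final System.Logger LOG = System.getLogger("circuit-breaker");

    private final LongAdder openedCount = new LongAdder();
    private final LongAdder rejectedCalls = new LongAdder();
    private volatile Instant lastTransition = Instant.now();

    // Records a state change and logs it at INFO level.
    void recordTransition(String from, String to, Instant when) {
        if ("OPEN".equals(to)) {
            openedCount.increment();
        }
        lastTransition = when;
        LOG.log(System.Logger.Level.INFO, "Circuit breaker transition {0} -> {1}", from, to);
    }

    // Records a call that was rejected because the breaker was not accepting traffic.
    void recordRejected() {
        rejectedCalls.increment();
    }

    long openedCount() {
        return openedCount.sum();
    }

    long rejectedCalls() {
        return rejectedCalls.sum();
    }

    // Time elapsed since the last transition, e.g. how long the breaker has been open.
    Duration sinceLastTransition(Instant now) {
        return Duration.between(lastTransition, now);
    }
}
```

<p>Passing the timestamps in explicitly, rather than reading the clock inside the class, keeps the holder easy to unit test.</p>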
<h3 id="heading-handling-different-failure-types">Handling Different Failure Types</h3>
<p>Not all failures are equal.</p>
<ul>
<li><p>API Response Timeouts → breaker‑relevant</p>
</li>
<li><p>API 5xx responses → breaker‑relevant</p>
</li>
<li><p>API 4xx responses → usually not</p>
</li>
<li><p>Any data or business validation errors → never</p>
</li>
</ul>
<p>A custom breaker lets you apply this kind of <strong>business‑aware classification</strong>, which is often hard to express cleanly with generic libraries.</p>
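<p>For HTTP dependencies, the list above could be expressed as a small status-code policy. The sketch below is one possible reading of it: <code>HttpFailurePolicy</code> is a hypothetical name, and treating 429 as breaker-relevant is my assumption about what "usually not" might exclude, so adjust it to your own dependency's behavior.</p>

```java
// Hypothetical mapping from HTTP status codes to breaker relevance.
final class HttpFailurePolicy {
    private HttpFailurePolicy() {}

    static boolean isBreakerRelevant(int statusCode) {
        if (statusCode >= 500 && statusCode <= 599) {
            return true;  // 5xx: the dependency itself is failing
        }
        if (statusCode == 429) {
            return true;  // assumption: throttling signals an overloaded dependency
        }
        return false;     // other 4xx and all non-error responses leave the breaker alone
    }
}
```

<p>Timeouts never produce a status code, so a policy like this would sit alongside an exception-based check rather than replace it.</p>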
<h2 id="heading-custom-breaker-vs-resilience4j">Custom Breaker vs Resilience4j</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Aspect</strong></td><td><strong>Custom Breaker</strong></td><td><strong>Resilience4j</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Learning value</td><td>High</td><td>Low</td></tr>
<tr>
<td>Flexibility</td><td>High</td><td>Medium</td></tr>
<tr>
<td>Time to implement</td><td>Medium</td><td>Low</td></tr>
<tr>
<td>Operational maturity</td><td>Depends</td><td>High</td></tr>
<tr>
<td>Custom failure logic</td><td>Easy</td><td>Limited</td></tr>
<tr>
<td>Tooling / metrics</td><td>You wire metrics, logs, observability manually</td><td>Built-in metrics, logging, and integrations</td></tr>
</tbody>
</table>
</div><p>The choice is not binary. Many teams prototype with a custom breaker and later replace it with Resilience4j – now correctly configured.</p>
<h2 id="heading-when-you-should-not-build-your-own">When You Should Not Build Your Own</h2>
<p>Do not build a custom breaker if:</p>
<ul>
<li><p>You lack observability.</p>
</li>
<li><p>You do not understand concurrency.</p>
</li>
<li><p>You need advanced features immediately.</p>
</li>
<li><p>Your system is safety-critical.</p>
</li>
</ul>
<p>For example, if you are building a <strong>payments platform</strong> with strict SLAs and cannot afford to battle-test a custom breaker, stick with a mature library like Resilience4j. The risk of subtle concurrency bugs, misclassified failures, or scheduler misconfigurations is too high to experiment in production.</p>
<h2 id="heading-extending-the-design">Extending the Design</h2>
<p>Once you understand the core, you can add:</p>
<ul>
<li><p>Sliding window metrics.</p>
</li>
<li><p>Adaptive thresholds.</p>
</li>
<li><p>Persistent breaker state.</p>
</li>
<li><p>Distributed breakers (per dependency).</p>
</li>
<li><p>Integration with feature flags.</p>
</li>
</ul>
<p>These extensions are much easier when you control the internals.</p>
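<p>To give a feel for the first extension, here is a minimal sketch of a count-based sliding window that tracks the failure rate over the last N calls instead of consecutive failures. <code>SlidingWindowFailureRate</code> is a hypothetical class, and the ring-buffer layout is just one straightforward way to implement the idea.</p>

```java
// Hypothetical count-based sliding window over the outcomes of the last N calls.
class SlidingWindowFailureRate {
    private final boolean[] failed; // ring buffer: true means the call failed
    private int next = 0;           // index of the slot to overwrite next
    private int size = 0;           // number of recorded outcomes, at most failed.length
    private int failures = 0;       // failures currently inside the window

    SlidingWindowFailureRate(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.failed = new boolean[windowSize];
    }

    synchronized void record(boolean failure) {
        if (size == failed.length) {
            // Window is full: the slot at `next` holds the oldest outcome, so evict it.
            if (failed[next]) {
                failures--;
            }
        } else {
            size++;
        }
        failed[next] = failure;
        if (failure) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    // Fraction of failed calls in the window, or 0.0 when nothing was recorded yet.
    synchronized double failureRate() {
        return size == 0 ? 0.0 : (double) failures / size;
    }
}
```

<p>A breaker built on this would open when <code>failureRate()</code> crosses a configured ratio, ideally only after a minimum number of calls has been recorded so that a single early failure does not trip it.</p>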
<h2 id="heading-common-mistakes">Common Mistakes</h2>
<p>Common mistakes when working with circuit breakers include:</p>
<ul>
<li><p>Opening the breaker on the first failure.</p>
</li>
<li><p>Blocking threads while OPEN.</p>
</li>
<li><p>Allowing unlimited HALF_OPEN requests.</p>
</li>
<li><p>Treating all exceptions equally.</p>
</li>
<li><p>Ignoring observability.</p>
</li>
</ul>
<p>Most of these happen when using libraries without understanding them.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Resilience libraries are powerful, but they are not magic. A circuit breaker is fundamentally a state machine with failure tracking and time-based transitions. Building your own – even once – forces you to internalize this reality.</p>
<p>In Spring Boot systems, a custom circuit breaker:</p>
<ul>
<li><p>Clarifies failure semantics.</p>
</li>
<li><p>Improves debugging.</p>
</li>
<li><p>Enables domain-specific resilience.</p>
</li>
<li><p>Makes you a better user of Resilience4j.</p>
</li>
</ul>
<p>You may never deploy your own breaker to production. But after building one, you will never configure a circuit breaker blindly again.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
