<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Fri, 05 Jun 2026 10:26:02 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How Attribute-Based Access Control Helps You Write Better Authorization Rules ]]>
                </title>
                <description>
                    <![CDATA[ Every application that handles user data eventually hits the same problem: not all users should see the same things. A junior nurse should not be able to access every patient record in the hospital. A ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-attribute-based-access-control-helps-you-write-better-authorization-rules/</link>
                <guid isPermaLink="false">6a21b44e09761aac249579f9</guid>
                
                    <category>
                        <![CDATA[ Security ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Security ]]>
                    </category>
                
                    <category>
                        <![CDATA[ authorization ]]>
                    </category>
                
                    <category>
                        <![CDATA[ access control ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Aiyedogbon Abraham ]]>
                </dc:creator>
                <pubDate>Thu, 04 Jun 2026 17:22:22 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/1bcd9989-cf38-4375-a0ed-03cf1bd3c3b8.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Every application that handles user data eventually hits the same problem: not all users should see the same things.</p>
<p>A junior nurse should not be able to access every patient record in the hospital. A contractor should not be able to read internal financial reports. An employee logged in from an unrecognized device at 2 AM probably should not be editing production configuration files.</p>
<p>Simple role-based systems handle obvious cases well. But as applications grow and access rules become more nuanced, those systems start to crack. You end up creating more and more specific roles, like <code>finance_viewer</code>, <code>finance_viewer_us_only</code>, <code>finance_viewer_us_only_readonly</code>, until the roles themselves become unmanageable.</p>
<p>Attribute-Based Access Control (ABAC) was designed to solve exactly this problem. It shifts from "what role does this user have?" to "what do we know about this user, this resource, and this situation?" and makes access decisions based on all of those factors together.</p>
<p>In this guide, you'll learn how ABAC works, how it evolved from earlier access control models, how policies are structured, how to implement it in code, and when to use it.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-how-access-control-has-evolved">How Access Control Has Evolved</a></p>
</li>
<li><p><a href="#heading-what-is-attribute-based-access-control">What is Attribute-Based Access Control?</a></p>
</li>
<li><p><a href="#heading-the-four-building-blocks-of-abac">The Four Building Blocks of ABAC</a></p>
</li>
<li><p><a href="#heading-how-an-abac-decision-is-made">How an ABAC Decision is Made</a></p>
</li>
<li><p><a href="#heading-how-to-write-abac-policies">How to Write ABAC Policies</a></p>
</li>
<li><p><a href="#heading-how-to-implement-abac-in-code">How to Implement ABAC in Code</a></p>
</li>
<li><p><a href="#heading-abac-vs-rbac-when-to-use-which">ABAC vs RBAC: When to Use Which</a></p>
</li>
<li><p><a href="#heading-real-world-use-cases">Real-World Use Cases</a></p>
</li>
<li><p><a href="#heading-enterprise-abac-considerations">Enterprise ABAC Considerations</a></p>
</li>
<li><p><a href="#heading-limitations-and-challenges">Limitations and Challenges</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-glossary">Glossary</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most from this article, you should have:</p>
<ul>
<li><p>A basic understanding of web authentication (logins, sessions, tokens)</p>
</li>
<li><p>Familiarity with how users and resources relate in applications</p>
</li>
<li><p>Some experience reading JavaScript or pseudocode</p>
</li>
</ul>
<p>No prior knowledge of access control theory is required.</p>
<h2 id="heading-how-access-control-has-evolved">How Access Control Has Evolved</h2>
<p>To understand why ABAC exists, it helps to understand what came before it and why each generation fell short.</p>
<h3 id="heading-discretionary-and-mandatory-access-control">Discretionary and Mandatory Access Control</h3>
<p>Early access control models emerged from Department of Defense applications in the 1960s and 1970s. According to NIST Special Publication 800-162, these were Discretionary Access Control (DAC) and Mandatory Access Control (MAC).</p>
<p>In DAC, the owner of a resource decides who can access it. Think of a file on your computer where you choose who can read or edit it. In MAC, access is governed by a central authority using labels like "Classified" or "Top Secret." The system enforces these labels, not individual owners.</p>
<p>Both worked for their original purposes but didn't scale well to the complexity of modern networked systems.</p>
<h3 id="heading-identity-based-access-control-and-access-control-lists">Identity-Based Access Control and Access Control Lists</h3>
<p>As networks grew, identity-based access control (IBAC) became common. The most familiar implementation is the Access Control List (ACL), a list of users or groups attached to a resource, specifying what each can do.</p>
<p>ACLs are simple and transparent, but they create a management burden as systems grow. Every new user needs to be added to every relevant list. Every permission change means hunting through lists across multiple resources. And when someone leaves the organization, you need to find and remove them everywhere.</p>
<p>Failure to do this consistently leads to users accumulating privileges they should no longer have.</p>
<h3 id="heading-role-based-access-control">Role-Based Access Control</h3>
<p>Role-Based Access Control (RBAC) was a major step forward. Instead of assigning permissions directly to users, RBAC assigns them to roles. Users are then assigned roles. A hospital might have roles like <code>nurse</code>, <code>doctor</code>, <code>admin</code>, and <code>billing_staff</code>, each with different permissions.</p>
<p>This made administration much more manageable. Adding a new employee means assigning them appropriate roles. Removing an employee means removing their roles. Changing what nurses can do means updating the nurse role once.</p>
<p>RBAC became widely adopted and is still the right choice for many applications. But it has a structural weakness: as permission requirements become more granular, you have to create more specific roles. A nurse who can only see patients on their floor, only during their shift, or only for certain record types, needs a very specific role, or a combination of roles that interacts in complicated ways.</p>
<p>This proliferation is called "role explosion." The roles multiply until they are as difficult to manage as the individual permissions RBAC was supposed to replace.</p>
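<p>To see how quickly this grows, here is a rough sketch with made-up numbers. The job titles, floors, and shifts are illustrative assumptions, not figures from a real hospital. If every combination needs its own role, the role count is the product of the dimensions:</p>
<pre><code class="language-javascript">// Hypothetical dimensions that a hospital's access rules might depend on
const jobTitles = ["nurse", "doctor", "admin", "billing_staff"];
const floors = ["floor_1", "floor_2", "floor_3", "floor_4", "floor_5"];
const shifts = ["day", "evening", "night"];

// One role per combination, for example "nurse_floor_3_night"
const roles = [];
for (const title of jobTitles) {
  for (const floor of floors) {
    for (const shift of shifts) {
      roles.push(`${title}_${floor}_${shift}`);
    }
  }
}

console.log(roles.length); // 60 (4 titles x 5 floors x 3 shifts)
console.log(roles[0]);     // nurse_floor_1_day
</code></pre>
<p>Adding one more dimension, such as record type, multiplies the total again.</p>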
<h3 id="heading-attribute-based-access-control">Attribute-Based Access Control</h3>
<p>ABAC emerged as a response to role explosion. Instead of assigning roles that bundle fixed permissions, ABAC evaluates the actual characteristics of the user, the resource, and the context at the moment of every access request.</p>
<p>A nurse gets access to a patient record not because they have the <code>nurse</code> role, but because their job title is "Nurse Practitioner," the patient is on their assigned floor, it's currently their shift, and the record type is within their scope of care. Change any of those facts, and the access decision changes accordingly.</p>
<p>As NIST SP 800-162 defines it, ABAC is:</p>
<blockquote>
<p>"an access control method where subject requests to perform operations on objects are granted or denied based on assigned attributes of the subject, assigned attributes of the object, environment conditions, and a set of policies that are specified in terms of those attributes and conditions."</p>
</blockquote>
<h2 id="heading-what-is-attribute-based-access-control">What is Attribute-Based Access Control?</h2>
<p>ABAC is a logical access control model where every access decision is made by evaluating a set of rules against the current values of attributes. Nothing is pre-computed or cached in role assignments. Every time a user tries to do something, the system asks: given what we know about this user, this resource, and this moment, should this be allowed?</p>
<p>This makes ABAC highly precise and highly dynamic. Permissions don't accumulate over time. They don't need manual cleanup when someone's role changes. The system simply evaluates the current state of attributes every time.</p>
<p>The model is formally described in NIST's guide to ABAC as being capable of enforcing both Discretionary Access Control and Mandatory Access Control concepts, making it more expressive than models that only support one or the other.</p>
<p>ABAC vendors such as Axiomatics, along with government agencies and large enterprises that share data across organizational boundaries, use ABAC to apply consistent security policies across complex environments.</p>
<h2 id="heading-the-four-building-blocks-of-abac">The Four Building Blocks of ABAC</h2>
<p>Every ABAC system is built from four types of information. Understanding these clearly is the key to understanding how ABAC works.</p>
<h3 id="heading-1-subject-attributes">1. Subject Attributes</h3>
<p>The subject is whoever or whatever is requesting access. This is usually a user, but it can also be a service, an application, or an automated system. NIST calls such a subject a Non-Person Entity (NPE).</p>
<p>Subject attributes describe who the subject is:</p>
<pre><code class="language-plaintext">user.jobTitle         = "Nurse Practitioner"
user.department       = "Cardiology"
user.clearanceLevel   = "Confidential"
user.employmentStatus = "Active"
user.assignedFloor    = "Floor 3"
user.shiftActive      = true
</code></pre>
<p>These attributes are typically sourced from an identity provider, HR system, or user directory. They're facts about the user that can be used in policies.</p>
<h3 id="heading-2-object-attributes-resource-attributes">2. Object Attributes (Resource Attributes)</h3>
<p>The object is whatever the subject is trying to access. This could be a file, a database record, an API endpoint, a service, or any other protected resource.</p>
<p>Object attributes describe what the resource is:</p>
<pre><code class="language-plaintext">record.type           = "PatientMedical"
record.floor          = "Floor 3"
record.sensitivity    = "High"
record.owner          = "Dr. Williams"
record.department     = "Cardiology"
</code></pre>
<p>Object attributes are typically assigned when a resource is created and updated throughout its lifecycle. They're facts about the resource that determine who should be able to access it.</p>
<h3 id="heading-3-action-attributes">3. Action Attributes</h3>
<p>The action is what the subject is trying to do to the object. Common actions include read, write, edit, delete, copy, execute, and share.</p>
<p>In many ABAC implementations, the action itself has attributes:</p>
<pre><code class="language-plaintext">action.type           = "read"
action.bulk           = false
</code></pre>
<p>Policies can restrict which actions are allowed independently of the other attributes. A user might be able to read a document but not delete it, even if all their other attributes match.</p>
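<p>As a minimal sketch of how action attributes could feed a policy, the following allows single-record reads and copies but blocks bulk operations and destructive actions. The function name <code>isActionAllowed</code> and the rule itself are illustrative assumptions:</p>
<pre><code class="language-javascript">// Hypothetical policy: the decision depends on the action's own attributes
function isActionAllowed(user, action) {
  if (user.employmentStatus !== "Active") {
    return false;
  }
  // Bulk operations are never allowed under this illustrative rule
  if (action.bulk === true) {
    return false;
  }
  // Only non-destructive action types are allowed
  return ["read", "copy"].includes(action.type);
}

const user = { employmentStatus: "Active" };

console.log(isActionAllowed(user, { type: "read", bulk: false }));   // true
console.log(isActionAllowed(user, { type: "read", bulk: true }));    // false
console.log(isActionAllowed(user, { type: "delete", bulk: false })); // false
</code></pre>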
<h3 id="heading-4-environment-conditions">4. Environment Conditions</h3>
<p>Environment conditions are contextual factors that don't belong to either the subject or the object, but that should influence the access decision. NIST describes these as "dynamic factors, independent of subject and object, that may be used as attributes at decision time to influence an access decision."</p>
<p>Examples include:</p>
<pre><code class="language-plaintext">environment.time           = "14:30"
environment.dayOfWeek      = "Wednesday"
environment.userLocation   = "Corporate Office"
environment.ipAddress      = "192.168.1.10"
environment.deviceStatus   = "compliant"
environment.threatLevel    = "low"
</code></pre>
<p>Environment conditions are what make ABAC truly dynamic. The same user, the same resource, and the same action might be allowed during business hours on a trusted device but denied at midnight from an unknown IP address.</p>
<h2 id="heading-how-an-abac-decision-is-made">How an ABAC Decision is Made</h2>
<p>When a subject tries to perform an action on an object, the ABAC system runs through a specific process:</p>
<h3 id="heading-step-1-collect-attributes">Step 1: Collect Attributes</h3>
<p>The system gathers current attributes for the subject, object, action, and environment. This might involve querying a user directory, reading resource metadata, and checking current time and location.</p>
<h3 id="heading-step-2-find-applicable-policies">Step 2: Find Applicable Policies</h3>
<p>The system identifies which policies apply to this particular request. A request to read a patient record might have several policies that apply: one about clinical staff access, one about after-hours access, and one about record sensitivity levels.</p>
<h3 id="heading-step-3-evaluate-each-policy">Step 3: Evaluate Each Policy</h3>
<p>Each applicable policy evaluates the collected attributes and returns permit or deny.</p>
<h3 id="heading-step-4-reconcile-conflicts">Step 4: Reconcile Conflicts</h3>
<p>If multiple policies apply and they conflict, the system uses predefined combining rules. Common approaches are "deny-overrides" (if any policy says deny, the request is denied) and "permit-overrides" (if any policy says permit, the request is permitted).</p>
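<p>The two combining rules can be sketched as small functions over a list of individual policy decisions. This is a simplification for illustration: the function names are assumptions, and policy languages such as XACML also define results like "not applicable" and "indeterminate," which are ignored here.</p>
<pre><code class="language-javascript">// Each decision is the string "permit" or "deny" returned by one policy

// Deny-overrides: a single "deny" wins, no matter how many policies permit
function denyOverrides(decisions) {
  // With no applicable policies, default to deny rather than permit
  if (decisions.length === 0) {
    return "deny";
  }
  return decisions.includes("deny") ? "deny" : "permit";
}

// Permit-overrides: a single "permit" wins, no matter how many policies deny
function permitOverrides(decisions) {
  return decisions.includes("permit") ? "permit" : "deny";
}

const decisions = ["permit", "deny", "permit"];

console.log(denyOverrides(decisions));   // deny
console.log(permitOverrides(decisions)); // permit
</code></pre>
<p>Deny-overrides is the more conservative choice, because one failing rule is enough to block the request.</p>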
<h3 id="heading-step-5-enforce-the-decision">Step 5: Enforce the Decision</h3>
<p>The system grants or denies access based on the final decision.</p>
<p>This process happens every time an access request is made. There's no caching of role assignments or pre-computed permission tables. The decision reflects the current state of all attributes at the moment of the request.</p>
<h2 id="heading-how-to-write-abac-policies">How to Write ABAC Policies</h2>
<p>Policies are the logic at the heart of ABAC. They're written as conditional rules that reference attributes. A well-written policy reads like a business rule, because that's exactly what it is.</p>
<h3 id="heading-simple-boolean-policy">Simple Boolean Policy</h3>
<p>The most basic form evaluates whether certain attributes match:</p>
<pre><code class="language-javascript">// Policy: Only active employees can access internal resources
function canAccessInternalResource(user) {
  return user.employmentStatus === "Active";
}
</code></pre>
<p><strong>What this does:</strong> Checks a single attribute, employment status, before allowing access. Any inactive, suspended, or terminated user is denied, regardless of their roles or past access history.</p>
<h3 id="heading-multi-attribute-policy">Multi-Attribute Policy</h3>
<p>Real policies typically combine multiple attributes:</p>
<pre><code class="language-javascript">// Policy: A nurse can read a patient record
// if the patient is on their assigned floor
// and the nurse is on an active shift

function canReadPatientRecord(user, record) {
  const isNurse = user.jobTitle === "Nurse Practitioner";
  const isAssignedFloor = user.assignedFloor === record.floor;
  const isActiveDuty = user.shiftActive === true;

  return isNurse &amp;&amp; isAssignedFloor &amp;&amp; isActiveDuty;
}
</code></pre>
<p><strong>What this does:</strong> Combines three conditions using AND logic. All three must be true for access to be granted. Change the nurse's floor assignment, and they immediately lose access to records on the previous floor, without any manual intervention.</p>
<h3 id="heading-environment-aware-policy">Environment-Aware Policy</h3>
<p>Adding environment conditions makes policies context-sensitive:</p>
<pre><code class="language-javascript">// Policy: Users can only access sensitive financial records
// during business hours from the corporate network

function canAccessSensitiveFinancialRecord(user, record, environment) {
  const isFinanceStaff = user.department === "Finance";
  const isHighSensitivity = record.sensitivity === "High";
  
  // If this is a high-sensitivity record, apply time and location controls
  if (isHighSensitivity) {
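    // Note: getHours() uses the server's local time zone, so convert
    // the timestamp first if the office is in a different zone
    const currentHour = new Date(environment.timestamp).getHours();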
    const isBusinessHours = currentHour &gt;= 9 &amp;&amp; currentHour &lt; 17;
    const isCorporateNetwork = environment.ipRange === "corporate";

    return isFinanceStaff &amp;&amp; isBusinessHours &amp;&amp; isCorporateNetwork;
  }

  // Lower sensitivity records only require finance department membership
  return isFinanceStaff;
}
</code></pre>
<p><strong>What this does:</strong> Applies stricter controls to higher-sensitivity resources. The same user gets access to low-sensitivity records at any time, but high-sensitivity records require them to be on the corporate network during business hours. The policy logic mirrors the actual business rule: sensitive data needs more protection.</p>
<h3 id="heading-ownership-based-policy">Ownership-Based Policy</h3>
<p>ABAC can also implement discretionary ownership rules:</p>
<pre><code class="language-javascript">// Policy: A user can edit a document
// if they own it, or if they have editor permissions
// and the document isn't locked

function canEditDocument(user, document) {
  const isOwner = document.ownerId === user.id;
  // Treat a missing permissions list as "no permissions" instead of throwing
  const hasEditorPermission = (user.permissions ?? []).includes("document.edit");
  const isUnlocked = document.status !== "locked";

  return (isOwner || hasEditorPermission) &amp;&amp; isUnlocked;
}
</code></pre>
<p><strong>What this does:</strong> Combines ownership (an attribute of the relationship between user and document) with explicit permissions and resource state. An editor can't edit a locked document even if they have the edit permission. An owner can edit their own documents but not locked ones.</p>
<h2 id="heading-how-to-implement-abac-in-code">How to Implement ABAC in Code</h2>
<p>Let's build a simple ABAC evaluation engine that puts these pieces together.</p>
<h3 id="heading-step-1-define-the-attribute-structure">Step 1: Define the Attribute Structure</h3>
<p>First, define clear data structures for your attributes:</p>
<pre><code class="language-javascript">// A user (subject) requesting access
const user = {
  id: "user-123",
  name: "Sarah Chen",
  department: "Cardiology",
  jobTitle: "Nurse Practitioner",
  clearanceLevel: 2,
  assignedFloor: "Floor 3",
  shiftActive: true,
  employmentStatus: "Active"
};

// A resource (object) being accessed
const patientRecord = {
  id: "record-456",
  type: "PatientMedical",
  floor: "Floor 3",
  sensitivity: 2,
  ownerId: "doctor-789",
  department: "Cardiology"
};

// Environment conditions
const environment = {
  timestamp: new Date().toISOString(),
  ipAddress: "10.0.1.25",
  ipRange: "corporate",
  deviceCompliant: true
};
</code></pre>
<h3 id="heading-step-2-write-policy-functions">Step 2: Write Policy Functions</h3>
<p>Write individual policies as pure functions that take attributes and return boolean values:</p>
<pre><code class="language-javascript">// policies/patientRecord.js

// Policy 1: User must be active and clinical staff
function isClinicalStaff(user) {
  const clinicalTitles = [
    "Nurse Practitioner",
    "Physician",
    "Resident",
    "Medical Assistant"
  ];
  return (
    user.employmentStatus === "Active" &amp;&amp;
    clinicalTitles.includes(user.jobTitle)
  );
}

// Policy 2: Record must be within the user's assigned area
function isAssignedToRecord(user, record) {
  return (
    user.department === record.department &amp;&amp;
    user.assignedFloor === record.floor
  );
}

// Policy 3: User must be on active shift
function isOnActiveShift(user) {
  return user.shiftActive === true;
}

// Policy 4: High-sensitivity records require compliant devices
function meetsDeviceRequirements(record, environment) {
  if (record.sensitivity &gt;= 3) {
    return environment.deviceCompliant === true;
  }
  return true; // No device requirement for lower sensitivity
}
</code></pre>
<p><strong>What this does:</strong> Each policy is a small, focused function. This makes policies easy to test individually, easy to read, and easy to reuse across different access decisions. A policy for "is this user clinical staff" can be applied to many different resource types.</p>
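<p>Because the policies are pure functions, each one can be checked with plain objects. The sketch below repeats <code>isAssignedToRecord</code> so that it runs on its own. The test cases are made up for illustration.</p>
<pre><code class="language-javascript">// Copied from the policy file above so this snippet runs on its own
function isAssignedToRecord(user, record) {
  return (
    user.department === record.department &amp;&amp;
    user.assignedFloor === record.floor
  );
}

const record = { department: "Cardiology", floor: "Floor 3" };

// Made-up test cases: plain objects stand in for real users
const cases = [
  {
    name: "same department and floor",
    user: { department: "Cardiology", assignedFloor: "Floor 3" },
    expected: true
  },
  {
    name: "different floor",
    user: { department: "Cardiology", assignedFloor: "Floor 5" },
    expected: false
  },
  {
    name: "missing floor attribute",
    user: { department: "Cardiology" },
    expected: false
  }
];

for (const testCase of cases) {
  const actual = isAssignedToRecord(testCase.user, record);
  console.log(`${actual === testCase.expected ? "PASS" : "FAIL"}: ${testCase.name}`);
}
// PASS: same department and floor
// PASS: different floor
// PASS: missing floor attribute
</code></pre>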
<h3 id="heading-step-3-build-an-evaluation-engine">Step 3: Build an Evaluation Engine</h3>
<p>Combine your policies into a decision engine:</p>
<pre><code class="language-javascript">// abac/engine.js

function evaluateAccess(user, resource, action, environment, policies) {
  // Collect all policy results
  const results = policies.map(policy =&gt; {
    try {
      return policy(user, resource, action, environment);
    } catch (error) {
      console.error(`Policy evaluation error: ${error.message}`);
      return false; // Fail closed: deny on error
    }
  });

  // Deny-overrides: if any policy denies, access is denied
  return results.every(result =&gt; result === true);
}

// Assemble policies for reading patient records
const readPatientRecordPolicies = [
  (user) =&gt; isClinicalStaff(user),
  (user, record) =&gt; isAssignedToRecord(user, record),
  (user) =&gt; isOnActiveShift(user),
  (user, record, action, environment) =&gt; meetsDeviceRequirements(record, environment)
];

// Make an access decision
const canRead = evaluateAccess(
  user,
  patientRecord,
  "read",
  environment,
  readPatientRecordPolicies
);

console.log(`Access ${canRead ? "granted" : "denied"}`);
// → Access granted (all conditions met)
</code></pre>
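<p><strong>What this does:</strong> The engine loops through each policy function, passing in the relevant attributes. If all policies return true, access is granted. If any returns false, access is denied, which is the "deny-overrides" combining rule described earlier. The <code>try...catch</code> block ensures that if a policy throws an error, access is denied rather than granted, following the security principle of failing closed. One caveat: <code>every</code> returns <code>true</code> for an empty array, so an empty policy list would grant access. Make sure every protected resource has at least one policy.</p>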
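<p>One possible extension, not part of the engine above, is to report which policies denied the request so that denials can be logged and debugged. The sketch below assumes each policy is registered with a name. Both <code>evaluateAccessWithReasons</code> and the policy names are hypothetical.</p>
<pre><code class="language-javascript">// Each entry pairs a human-readable name with a policy function
function evaluateAccessWithReasons(user, resource, action, environment, namedPolicies) {
  const failed = [];

  for (const { name, check } of namedPolicies) {
    let passed = false;
    try {
      passed = check(user, resource, action, environment) === true;
    } catch (error) {
      passed = false; // Fail closed: a throwing policy counts as a denial
    }
    if (!passed) {
      failed.push(name);
    }
  }

  return { allowed: failed.length === 0, failed };
}

const namedPolicies = [
  { name: "isOnActiveShift", check: (user) =&gt; user.shiftActive === true },
  { name: "sameFloor", check: (user, record) =&gt; user.assignedFloor === record.floor }
];

const decision = evaluateAccessWithReasons(
  { shiftActive: false, assignedFloor: "Floor 3" },
  { floor: "Floor 3" },
  "read",
  {},
  namedPolicies
);

console.log(decision.allowed);           // false
console.log(decision.failed.join(", ")); // isOnActiveShift
</code></pre>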
<h3 id="heading-step-4-add-attribute-collection">Step 4: Add Attribute Collection</h3>
<p>In a real application, attributes come from multiple sources:</p>
<pre><code class="language-javascript">// attributes/collector.js

async function collectAttributes(userId, resourceId, request) {
  // Collect in parallel for performance
  const [user, resource, environment] = await Promise.all([
    fetchUserAttributes(userId),          // From identity provider or HR system
    fetchResourceAttributes(resourceId),  // From resource metadata store
    collectEnvironmentConditions(request) // Time, IP, device status
  ]);

  return { user, resource, environment };
}

async function fetchUserAttributes(userId) {
  // This would query your user directory, LDAP, or identity provider
  const user = await userDirectory.findById(userId);
  const shift = await shiftService.getActiveShift(userId);
  
  return {
    ...user,
    shiftActive: shift != null, // false when the lookup returns null or undefined
    assignedFloor: shift?.floor || null
  };
}

// The incoming HTTP request is passed in explicitly rather than read from a global
async function collectEnvironmentConditions(request) {
  return {
    timestamp: new Date().toISOString(),
    ipAddress: request.ip,
    ipRange: await networkService.classifyIP(request.ip),
    deviceCompliant: await deviceService.checkCompliance(request.deviceId)
  };
}
</code></pre>
<p><strong>What this does:</strong> Attribute collection is separated from policy evaluation. This is an important design decision: it means you can test policies with any attribute values without needing real users or resources. It also means you can swap out the source of attributes (say, moving from an on-premise directory to a cloud identity provider) without changing your policies.</p>
<h3 id="heading-step-5-integrate-with-your-api">Step 5: Integrate with Your API</h3>
<p>Use the evaluation engine in your API handlers:</p>
<pre><code class="language-javascript">// middleware/abac.js

function requireAccess(action, resourceType) {
  return async (req, res, next) =&gt; {
    try {
      const { user, resource, environment } = await collectAttributes(
        req.user.id,
        req.params.id,
        req // Gives the collector access to the request's IP and device ID
      );

      const policies = getPoliciesFor(resourceType, action);
      const allowed = evaluateAccess(user, resource, action, environment, policies);

      if (!allowed) {
        // Log the denial for audit purposes
        auditLog.record({
          userId: req.user.id,
          resourceId: req.params.id,
          action,
          decision: "denied",
          timestamp: new Date()
        });

        return res.status(403).json({ error: "Access denied" });
      }

      next();
    } catch (error) {
      // Fail closed: deny access on unexpected errors, but log them for debugging
      console.error(`ABAC middleware error: ${error.message}`);
      return res.status(403).json({ error: "Access denied" });
    }
  };
}

// Use in route definitions
app.get(
  "/patient-records/:id",
  authenticate(),                               // First verify identity
  requireAccess("read", "patientRecord"),       // Then evaluate ABAC
  patientRecordController.getById               // Then handle the request
);
</code></pre>
<p><strong>What this does:</strong> The ABAC check lives in middleware that runs between authentication and the route handler. Authentication establishes who the user is. ABAC decides whether that user can do what they're trying to do. This separation keeps authorization logic out of your business logic.</p>
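<p>The middleware above calls <code>getPoliciesFor</code>, which the example does not define. One way it might look, assuming policies are registered per resource type and action, is a plain lookup table that falls back to an always-deny policy when nothing is registered. The registry contents here are placeholders.</p>
<pre><code class="language-javascript">// Hypothetical registry: resource type, then action, then a list of policy functions
const policyRegistry = {
  patientRecord: {
    read: [
      (user) =&gt; user.employmentStatus === "Active",
      (user, record) =&gt; user.assignedFloor === record.floor
    ]
  }
};

// A policy that always denies, used when no policies are registered
const denyAll = () =&gt; false;

function getPoliciesFor(resourceType, action) {
  const policies = policyRegistry[resourceType]?.[action];
  // Returning [denyAll] keeps an "every policy must pass" engine fail-closed,
  // because an empty list would otherwise pass trivially
  if (!Array.isArray(policies) || policies.length === 0) {
    return [denyAll];
  }
  return policies;
}

console.log(getPoliciesFor("patientRecord", "read").length);   // 2
console.log(getPoliciesFor("patientRecord", "delete").length); // 1
console.log(getPoliciesFor("invoice", "read")[0]());           // false
</code></pre>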
<h2 id="heading-abac-vs-rbac-when-to-use-which">ABAC vs RBAC: When to Use Which</h2>
<p>RBAC isn't obsolete. It's genuinely the right choice for many applications. The question is which model fits your specific access requirements.</p>
<h3 id="heading-rbac-strengths">RBAC Strengths</h3>
<p>RBAC is simple to understand, simple to implement, and simple to audit. If you can describe your access requirements as a list of roles with fixed permissions, RBAC works well. Many SaaS applications start with RBAC, and it can serve them well for years.</p>
<p>A typical RBAC check looks like:</p>
<pre><code class="language-javascript">// Simple RBAC: does the user have the required role?
function canAccess(user, requiredRole) {
  return user.roles.includes(requiredRole);
}
</code></pre>
<p>It's fast, clear, and easy to debug. When something goes wrong, you check which roles the user has and which roles the resource requires.</p>
<h3 id="heading-where-rbac-breaks-down">Where RBAC Breaks Down</h3>
<p>RBAC struggles when permissions need to depend on factors that aren't captured by a role. If you need to express "finance managers can view financial records, but only for their own region, and only during business hours," you're outside what a role alone can express cleanly.</p>
<p>You either need an extremely specific role (<code>finance_manager_us_east_business_hours</code>) that creates the role explosion problem, or you add conditional logic to your application code that effectively recreates ABAC, just in a less organized way.</p>
<h3 id="heading-rbac-vs-abac-comparison">RBAC vs ABAC Comparison</h3>
<table>
<thead>
<tr>
<th>Factor</th>
<th>RBAC</th>
<th>ABAC</th>
</tr>
</thead>
<tbody><tr>
<td>Logic</td>
<td>Permissions assigned to roles, roles assigned to users</td>
<td>Policies evaluate attributes at decision time</td>
</tr>
<tr>
<td>Granularity</td>
<td>Coarse-grained</td>
<td>Fine-grained and context-aware</td>
</tr>
<tr>
<td>Flexibility</td>
<td>Low, new rules require new roles</td>
<td>High, update policies without changing roles</td>
</tr>
<tr>
<td>Scalability</td>
<td>Role explosion under complexity</td>
<td>Scales with policy complexity, not role count</td>
</tr>
<tr>
<td>Auditability</td>
<td>Simple, check role assignments</td>
<td>Requires logging attributes at decision time</td>
</tr>
<tr>
<td>Complexity</td>
<td>Low</td>
<td>Higher, more moving parts</td>
</tr>
<tr>
<td>Best for</td>
<td>Simple, stable permission structures</td>
<td>Complex, dynamic, or context-dependent permissions</td>
</tr>
</tbody></table>
<h3 id="heading-combining-both-models">Combining Both Models</h3>
<p>RBAC and ABAC work well together. A common pattern is to use RBAC for coarse-grained access control (which sections of your application can this user see?) and ABAC for fine-grained control within those sections (which specific records can they access?).</p>
<p>For example, a role might grant access to the patient records section of a hospital system. Within that section, ABAC policies determine which specific records a user can view or edit based on their department, assigned floor, and active shift.</p>
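<p>A rough sketch of this layering, using hypothetical role names and attributes: the role check decides whether the user may enter the patient records section at all, and the attribute check decides whether they may open a particular record.</p>
<pre><code class="language-javascript">// Coarse-grained RBAC gate: can the user enter this section of the app?
function canEnterSection(user, section) {
  const sectionRoles = { patientRecords: ["nurse", "doctor"] }; // Illustrative mapping
  const allowedRoles = sectionRoles[section] || [];
  return user.roles.some((role) =&gt; allowedRoles.includes(role));
}

// Fine-grained ABAC check: can the user open this specific record?
function canViewRecord(user, record) {
  return (
    user.department === record.department &amp;&amp;
    user.assignedFloor === record.floor &amp;&amp;
    user.shiftActive === true
  );
}

// Both layers must pass
function canViewPatientRecord(user, record) {
  return canEnterSection(user, "patientRecords") &amp;&amp; canViewRecord(user, record);
}

const nurse = {
  roles: ["nurse"],
  department: "Cardiology",
  assignedFloor: "Floor 3",
  shiftActive: true
};

console.log(canViewPatientRecord(nurse, { department: "Cardiology", floor: "Floor 3" })); // true
console.log(canViewPatientRecord(nurse, { department: "Cardiology", floor: "Floor 5" })); // false
</code></pre>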
<h2 id="heading-real-world-use-cases">Real-World Use Cases</h2>
<h3 id="heading-healthcare-records-management">Healthcare Records Management</h3>
<p>Healthcare is one of the clearest examples of why ABAC matters. Patient privacy regulations require precise access control, and patient care requires that the right staff can access records quickly when they need them.</p>
<p>An ABAC policy in a hospital might allow a nurse to view a patient's record only when:</p>
<ol>
<li><p>the patient is currently admitted to the nurse's assigned floor,</p>
</li>
<li><p>the nurse is on an active shift,</p>
</li>
<li><p>the access occurs from within the hospital network,</p>
</li>
<li><p>and the record type is within the nurse's care scope.</p>
</li>
</ol>
<p>According to WorkOS's ABAC analysis, in emergency situations ABAC systems can automatically expand access rights. For example, an ER doctor automatically gains broader access to patient records to provide immediate care, with this access being time-bound and closely monitored.</p>
<p>Expressing all of these rules in RBAC would require many narrowly scoped roles, and even then the roles would struggle to handle the emergency access scenario dynamically.</p>
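<p>One way to model the emergency case is as an environment attribute with an expiry time. The following is only a sketch: the <code>emergencyAccess</code> attribute, its fields, the "ER Physician" title, and the function name are assumptions for illustration, not a description of any particular hospital system.</p>
<pre><code class="language-javascript">// Hypothetical policy: normal floor-based access, plus time-bound emergency access
function canViewRecordWithEmergency(user, record, environment) {
  const normalAccess =
    user.shiftActive === true &amp;&amp; user.assignedFloor === record.floor;
  if (normalAccess) {
    return true;
  }

  // Emergency access applies only to ER physicians and only until it expires
  const emergency = environment.emergencyAccess;
  if (!emergency || user.jobTitle !== "ER Physician") {
    return false;
  }
  // An unparseable date yields NaN, which makes this comparison false (deny)
  return Date.parse(environment.timestamp) &lt; Date.parse(emergency.expiresAt);
}

const erDoctor = { jobTitle: "ER Physician", shiftActive: true, assignedFloor: "ER" };
const floorThreeRecord = { floor: "Floor 3" };

console.log(canViewRecordWithEmergency(erDoctor, floorThreeRecord, {
  timestamp: "2026-06-04T10:00:00Z",
  emergencyAccess: { expiresAt: "2026-06-04T10:30:00Z" }
})); // true

console.log(canViewRecordWithEmergency(erDoctor, floorThreeRecord, {
  timestamp: "2026-06-04T11:00:00Z",
  emergencyAccess: { expiresAt: "2026-06-04T10:30:00Z" }
})); // false
</code></pre>
<p>In a real deployment, each emergency grant would also be written to the audit log so that it can be reviewed afterwards.</p>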
<h3 id="heading-corporate-data-access">Corporate Data Access</h3>
<p>Large enterprises typically have employees across departments, roles, locations, and clearance levels who need different views of the same underlying data. A document might be accessible to finance managers in the US region during business hours, accessible to executives globally at any time, but inaccessible to contractors entirely.</p>
<p>ABAC expresses all of these rules in policies. As employees change departments, go on leave, or change roles, their attributes update in the identity system and their access changes automatically, with no manual ACL updates required.</p>
<h3 id="heading-government-and-classified-information">Government and Classified Information</h3>
<p>The US federal government's adoption of ABAC is described in NIST SP 800-162, which was developed to address the Federal Identity, Credential, and Access Management (FICAM) requirements. Federal agencies deal with information shared across organizational boundaries, with varying classification levels and need-to-know requirements.</p>
<p>ABAC allows an analyst in one agency to access information from another agency without requiring the second agency to pre-provision an account for them. The analyst's clearance attributes, organizational affiliation, and project assignments are evaluated against the resource's classification and access rules at the time of the request.</p>
<h3 id="heading-multi-tenant-saas-applications">Multi-Tenant SaaS Applications</h3>
<p>SaaS applications that serve multiple organizations need to ensure strict data isolation between tenants while supporting complex permission structures within each tenant.</p>
<p>ABAC handles this naturally. A resource attribute like <code>record.tenantId</code> is evaluated against the user attribute <code>user.tenantId</code>, and no cross-tenant access is possible through policy. Within a tenant, ABAC supports as much complexity as the tenant's policies require.</p>
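<p>As a rough sketch of that tenant check, assuming users and records are plain dictionaries and that <code>tenant_policy</code> is a hypothetical callable holding the tenant's own rules:</p>
<pre><code class="language-python">def same_tenant(user, record):
    """Deny unless both tenant IDs are present and equal (fail closed)."""
    user_tenant = user.get("tenantId")
    record_tenant = record.get("tenantId")
    if user_tenant is None or record_tenant is None:
        return False
    return user_tenant == record_tenant


def authorize(user, record, tenant_policy):
    # The tenant check runs first, so tenant-specific rules can only narrow access.
    return same_tenant(user, record) and tenant_policy(user, record)
</code></pre>
<p>Putting the isolation rule in front of every tenant policy means a misconfigured tenant rule can't widen access beyond the tenant boundary.</p>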
<h2 id="heading-enterprise-abac-considerations">Enterprise ABAC Considerations</h2>
<p>Deploying ABAC at enterprise scale introduces several challenges that don't exist in smaller implementations.</p>
<h3 id="heading-policy-administration">Policy Administration</h3>
<p>Policies need to be authored, reviewed, tested, and deployed. According to NIST SP 800-162, this requires a Policy Administration Point (PAP), an interface for creating and managing policies. Without proper tooling, policies become difficult to audit and maintain.</p>
<p>In practice, this means treating policies like code: version control, code review, and automated testing.</p>
<h3 id="heading-attribute-quality-and-freshness">Attribute Quality and Freshness</h3>
<p>ABAC is only as good as the attributes it evaluates. If user attributes are stale (for example, a user changed departments but their directory entry hasn't been updated), the access decisions will be wrong.</p>
<p>NIST warns that "attributes that are not refreshed as often will ultimately be less secure than attributes that are refreshed in real time." Building reliable attribute pipelines from authoritative sources is often the hardest part of ABAC deployment.</p>
<h3 id="heading-performance">Performance</h3>
<p>Evaluating policies on every request has a performance cost. Each evaluation may require fetching attributes from multiple sources. To manage this, many implementations use attribute caching, but caching introduces the staleness problem described above.</p>
<p>The solution is to cache with appropriate TTLs (time-to-live values) based on how quickly each attribute type can change. A user's department changes rarely and can be cached for hours. A user's active shift status might change every 8 hours and needs a shorter cache. Real-time location might not be cacheable at all.</p>
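<p>One way to express per-attribute TTLs is a small cache keyed by user and attribute name. The sketch below is illustrative: the TTL values and the <code>fetch</code> callable are assumptions, and a real deployment would likely use a shared cache rather than an in-process dictionary.</p>
<pre><code class="language-python">import time

# Assumed TTLs in seconds per attribute type; tune these to how fast each value changes.
ATTRIBUTE_TTLS = {
    "department": 4 * 60 * 60,  # changes rarely
    "shift_active": 5 * 60,     # changes at shift boundaries
    "location": 0,              # never cached
}


class AttributeCache:
    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch   # callable(user_id, name) that reads the authoritative source
        self._clock = clock
        self._entries = {}    # (user_id, name) -> (expires_at, value)

    def get(self, user_id, name):
        key = (user_id, name)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = self._fetch(user_id, name)
        ttl = ATTRIBUTE_TTLS.get(name, 0)  # unknown attributes are not cached
        if ttl > 0:
            self._entries[key] = (now + ttl, value)
        return value
</code></pre>
<p>Defaulting unknown attributes to a TTL of zero keeps the cache on the safe side: an attribute is only cached once someone has decided how stale it is allowed to be.</p>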
<h3 id="heading-audit-logging">Audit Logging</h3>
<p>Because ABAC makes decisions dynamically, auditing requires logging the attributes used in each decision, not just the decision itself. A log entry that says "access denied" is only useful if it also captures why access was denied and which attributes failed to satisfy which policies.</p>
<p>NIST notes that without tracking attribute values at decision time, accountability requirements can't be met.</p>
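<p>A decision log entry along these lines might look like the following sketch. The field names and the <code>abac.audit</code> logger name are assumptions for illustration, not a standard schema.</p>
<pre><code class="language-python">import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("abac.audit")


def build_audit_entry(subject, resource, action, environment, decision, failed_conditions):
    """Capture the attribute values the decision was based on, not just the outcome."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "decision": decision,
        "subject": dict(subject),
        "resource": dict(resource),
        "environment": dict(environment),
        "failed_conditions": list(failed_conditions),
    }


def log_decision(entry):
    # One JSON object per line keeps the log easy to search and parse later.
    audit_log.info(json.dumps(entry, sort_keys=True))
</code></pre>
<p>Copying the attribute values into the entry matters because the live attributes may have changed by the time someone reviews the decision.</p>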
<h2 id="heading-limitations-and-challenges">Limitations and Challenges</h2>
<p>ABAC is powerful, but it's not the right solution for every access control problem. It's worth being honest about its limitations before committing to an implementation.</p>
<p><strong>Complexity</strong>: According to NIST SP 800-162, "an ABAC system is more complicated, and therefore more costly to implement and maintain, than simpler access control systems." The flexibility that makes ABAC powerful also makes it harder to reason about. Answering a user who asks "why can't I access this?" requires examining all the attributes that were evaluated and which conditions weren't met.</p>
<p><strong>Policy Conflicts</strong>: In complex systems with many policies, conflicts between policies can occur. Two policies might individually seem correct but together produce unexpected results. Resolving these conflicts requires clear precedence rules and careful policy design.</p>
<p><strong>Attribute Management Overhead</strong>: Maintaining accurate attributes across large user populations requires investment in identity infrastructure. Attributes from different systems need to be normalized, validated, and kept synchronized. As NIST describes it, organizations need an entire attribute management infrastructure, not just a policy engine.</p>
<p><strong>Testing is Hard</strong>: Because access depends on the combination of potentially dozens of attributes, covering edge cases comprehensively takes deliberate test planning. A policy that works correctly for typical cases might behave unexpectedly for unusual attribute combinations.</p>
<p><strong>Not Always Worth the Investment</strong>: For applications with straightforward access requirements, ABAC introduces unnecessary complexity. If your needs can be expressed cleanly as a set of roles with fixed permissions, RBAC is the better choice.</p>
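<p>To make the policy-conflict point concrete, here is a minimal sketch of one possible precedence rule: deny-overrides combining with fail-closed error handling. Policies are modeled as plain functions that return one of three assumed string constants. Real policy engines have richer decision types.</p>
<pre><code class="language-python">PERMIT, DENY, NOT_APPLICABLE = "permit", "deny", "not_applicable"


def combine_deny_overrides(policies, request):
    """Any deny wins; permit needs at least one permit and no deny."""
    decisions = []
    for policy in policies:
        try:
            decisions.append(policy(request))
        except Exception:
            # Fail closed: a policy that cannot be evaluated counts as a deny.
            decisions.append(DENY)
    if DENY in decisions:
        return DENY
    if PERMIT in decisions:
        return PERMIT
    return DENY  # nothing applied, so deny by default
</code></pre>
<p>With a rule like this, two policies that disagree always resolve the same way, which makes unexpected results easier to explain and test.</p>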
<h2 id="heading-conclusion">Conclusion</h2>
<p>Attribute-Based Access Control represents a genuine evolution in how applications manage authorization. Rather than maintaining ever-growing lists of roles and permissions, ABAC evaluates the actual characteristics of users, resources, and context at the moment of every request.</p>
<p>It solves the role explosion problem that plagues complex RBAC implementations. It enables access rules that reflect real business policies rather than technical approximations of them. It handles dynamic scenarios, such as emergencies, time-based restrictions, and cross-organizational access, that are difficult or impossible to express with static roles.</p>
<p>But ABAC isn't universally better. It's more complex to build, harder to debug, and requires investment in attribute management infrastructure that simpler models don't need. Many applications are well-served by RBAC, and some use RBAC and ABAC together.</p>
<p>The right question isn't "should I use ABAC?" It's "are my access requirements complex enough that the investment in ABAC pays off?" If your access rules change frequently, depend on resource or environment context, or need to scale across organizational boundaries, ABAC is worth serious consideration.</p>
<p>Start by identifying where your current access control model is breaking down. If you're creating roles to represent every edge case, if you're writing conditional logic inside route handlers that checks specific attribute values, or if users are accumulating permissions they should no longer have, those are signals that a more expressive model would help.</p>
<p>ABAC is the tool for when roles aren't enough.</p>
<h2 id="heading-glossary">Glossary</h2>
<p><strong>ABAC (Attribute-Based Access Control)</strong>: An access control method where authorization decisions are made by evaluating policies against the attributes of subjects, objects, actions, and environment conditions. Defined by NIST as the approach where "subject requests to perform operations on objects are granted or denied based on assigned attributes."</p>
<p><strong>Subject</strong>: The entity requesting access to a resource. Usually a human user, but can also be a service, automated process, or device. Also called the "requestor."</p>
<p><strong>Object</strong>: The resource being protected, such as a file, database record, API endpoint, service, or any system resource whose access is managed by the ABAC system.</p>
<p><strong>Attribute</strong>: A characteristic of a subject, object, action, or environment expressed as a name-value pair. For example, <code>user.department = "Finance"</code> or <code>record.sensitivity = "High"</code>.</p>
<p><strong>Subject Attributes</strong>: Properties describing the user or service making the request, such as job title, department, clearance level, or current location.</p>
<p><strong>Object Attributes</strong>: Properties describing the resource being accessed, such as its type, owner, sensitivity level, or department.</p>
<p><strong>Environment Conditions</strong>: Contextual factors independent of both subject and object that influence access decisions. Examples include time of day, day of week, IP address, device compliance status, or current threat level.</p>
<p><strong>Policy</strong>: A rule or set of rules that evaluates attribute values to determine whether a specific access request should be permitted or denied. ABAC policies are typically written as logical conditions.</p>
<p><strong>Policy Decision Point (PDP)</strong>: The component of an ABAC system that evaluates policies and attributes to compute an access decision.</p>
<p><strong>Policy Enforcement Point (PEP)</strong>: The component that intercepts access requests and enforces the decisions made by the PDP.</p>
<p><strong>Policy Information Point (PIP)</strong>: The component that retrieves attribute values needed by the PDP to make decisions.</p>
<p><strong>Policy Administration Point (PAP)</strong>: The component that provides an interface for creating, testing, and managing policies.</p>
<p><strong>RBAC (Role-Based Access Control)</strong>: An access control model that assigns permissions to roles and users to roles. Simpler than ABAC but less expressive for complex, dynamic access requirements.</p>
<p><strong>Role Explosion</strong>: The proliferation of increasingly specific roles in an RBAC system as access requirements become more granular, eventually making the roles as difficult to manage as individual permissions.</p>
<p><strong>DAC (Discretionary Access Control)</strong>: An access control model where resource owners control who can access their resources. Common in file systems.</p>
<p><strong>MAC (Mandatory Access Control)</strong>: An access control model where access is governed by a central authority using classification labels, independent of resource owner preferences.</p>
<p><strong>ACL (Access Control List)</strong>: A list associated with a resource that specifies which users or groups have which permissions. Common in identity-based access control systems.</p>
<p><strong>Non-Person Entity (NPE)</strong>: A subject that is not a human user, such as an automated service, application, or network device, that can request access to resources.</p>
<p><strong>Attribute Caching</strong>: Storing previously retrieved attribute values to improve performance, at the cost of potentially using stale data for access decisions.</p>
<p><strong>Deny-Overrides Combining</strong>: A policy combining rule where if any applicable policy returns deny, the overall decision is deny, regardless of other policies that may return permit.</p>
<p><strong>Fail-Closed</strong>: A security design principle where unexpected errors or missing information result in access being denied rather than granted, reducing the risk of unauthorized access.</p>
<p><em>Source: Definitions adapted from NIST Special Publication 800-162, Guide to Attribute Based Access Control (ABAC) Definition and Considerations, January 2014 (with updates through August 2019).</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Backend Challenges Teams Face When Processing Repeat Payments ]]>
                </title>
                <description>
                    <![CDATA[ Modern payment systems look simple from the outside. A user clicks a button, enters payment details, and money moves from one account to another. But once payments happen repeatedly rather than once,  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/backend-challenges-teams-face-when-processing-repeat-payments/</link>
                <guid isPermaLink="false">6a21b39809761aac24951f70</guid>
                
                    <category>
                        <![CDATA[ backend ]]>
                    </category>
                
                    <category>
                        <![CDATA[ payments ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Backend Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Thu, 04 Jun 2026 17:19:20 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/e7d774a7-f80b-4c91-a9f3-46b7d12e758a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Modern payment systems look simple from the outside. A user clicks a button, enters payment details, and money moves from one account to another.</p>
<p>But once payments happen repeatedly rather than once, the backend becomes much more complex. Subscriptions, memberships, SaaS billing, and donation platforms all depend on repeat transactions that happen automatically over time.</p>
<p>Unlike one-time purchases, these systems must keep working long after the user leaves the application.</p>
<p>A payment failure today can become a customer support problem next week. A timing error can create duplicate charges. Small backend issues can quickly turn into lost revenue and unhappy users.</p>
<p>Many teams discover that recurring payment systems involve much more than calling a payment API every month. Behind the scenes, engineers deal with scheduling, retries, state management, event processing, and reliability challenges.</p>
<p>In this article, we'll look at seven backend challenges teams commonly face when building systems that process repeat payments, and how engineering teams usually solve them. We'll also look at simplified Python code that shows how each solution might look in a production system.</p>
<h3 id="heading-what-well-cover">What We'll Cover:</h3>
<ul>
<li><p><a href="#heading-challenge-1-managing-payment-schedules-reliably">Challenge 1: Managing Payment Schedules Reliably</a></p>
</li>
<li><p><a href="#heading-challenge-2-preventing-duplicate-charges">Challenge 2: Preventing Duplicate Charges</a></p>
</li>
<li><p><a href="#heading-challenge-3-handling-failed-payments-gracefully">Challenge 3: Handling Failed Payments Gracefully</a></p>
</li>
<li><p><a href="#heading-challenge-4-keeping-system-state-consistent">Challenge 4: Keeping System State Consistent</a></p>
</li>
<li><p><a href="#heading-challenge-5-processing-webhooks-correctly">Challenge 5: Processing Webhooks Correctly</a></p>
</li>
<li><p><a href="#heading-challenge-6-supporting-different-payment-models">Challenge 6: Supporting Different Payment Models</a></p>
</li>
<li><p><a href="#heading-challenge-7-monitoring-payment-systems-in-real-time">Challenge 7: Monitoring Payment Systems in Real Time</a></p>
</li>
<li><p><a href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h2 id="heading-challenge-1-managing-payment-schedules-reliably">Challenge 1: Managing Payment Schedules Reliably</h2>
<p>The first challenge appears before a payment even starts.</p>
<p>When users subscribe or enroll in a recurring billing flow, the system must remember when future payments should happen. That sounds straightforward at first: you store a date and trigger a job later.</p>
<p>Reality becomes more difficult. Users live across different time zones. Months have different lengths. Leap years exist. Billing cycles change. Daylight Saving adjustments can create unexpected behaviour.</p>
<p>Suppose a customer subscribes on January 31. What happens next month? February doesn't have a 31st day. Now imagine millions of users with different payment schedules.</p>
<p>A simple <a href="https://en.wikipedia.org/wiki/Cron">cron job</a> often proves insufficient.</p>
<p>Large systems usually separate scheduling from business logic.</p>
<p>A common pattern is to store billing schedules in a dedicated scheduler service rather than relying on application cron jobs. The scheduler publishes a "payment due" event when the billing date arrives, and downstream workers handle payment execution.</p>
<p>Teams also store the next billing date after each successful payment rather than calculating future dates on the fly. This prevents errors caused by daylight saving changes, leap years, and month-end edge cases.</p>
<p>Using durable schedulers and workflow engines such as Quartz, <a href="https://temporal.io/">Temporal</a>, or cloud-native schedulers further improves reliability because missed executions can be recovered automatically.</p>
<p>Let's look at a Python example.</p>
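<pre><code class="language-python">def process_due_payments():
    # get_due_subscriptions, publish_event, calculate_next_billing_date, and
    # save_subscription are application-specific helpers that are not shown here.
    subscriptions = get_due_subscriptions()

    for sub in subscriptions:
        # Hand the charge off to the payment workers; the scheduler never bills directly.
        publish_event(
            "payment_due",
            {
                "subscription_id": sub.id,
                "customer_id": sub.customer_id
            }
        )

        # Persist the next billing date so it isn't recomputed on the fly later.
        sub.next_billing_date = calculate_next_billing_date(
            sub.next_billing_date
        )
        save_subscription(sub)
</code></pre>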
<p>In this example, the scheduler doesn't attempt to process the payment itself. Its only responsibility is to identify subscriptions that are due for billing and publish a <code>payment_due</code> event.</p>
<p>A separate payment service can then consume the event and execute the charge. This separation improves reliability because scheduling and payment processing can scale independently, and missed jobs can be recovered from the event queue if a service becomes unavailable.</p>
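<p>The January 31 case mentioned earlier is worth a closer look. One possible approach, sketched below for monthly billing only, is to remember the day of the month the customer originally subscribed on and clamp it to the length of each target month. The function name and the <code>anchor_day</code> parameter are assumptions for this sketch and are not part of the scheduler example above.</p>
<pre><code class="language-python">import calendar
from datetime import date


def next_monthly_billing_date(current, anchor_day):
    """Advance one month, clamping the anchor day to the length of the target month."""
    year = current.year + current.month // 12
    month = current.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(anchor_day, last_day))
</code></pre>
<p>Keeping the anchor day separate from the last billing date means a subscription that started on the 31st bills on the last day of February and then returns to the 31st in March, instead of drifting to the 28th permanently.</p>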
<h2 id="heading-challenge-2-preventing-duplicate-charges">Challenge 2: Preventing Duplicate Charges</h2>
<p>Duplicate payment processing is one of the fastest ways to lose customer trust.</p>
<p>Backend systems can retry requests for many reasons: network failures happen, payment providers time out, and service interruptions occur.</p>
<p>Suppose the application sends a charge request. The payment provider receives it successfully.</p>
<p>But before the provider returns a response, the network connection drops.</p>
<p>Did the charge succeed? The backend system doesn't know.</p>
<p>Some systems immediately retry. But if the original transaction already worked, the user may receive two charges instead of one.</p>
<p>This problem becomes more common in distributed systems where multiple services communicate through APIs and message queues.</p>
<p>Most payment platforms solve this with idempotency keys.</p>
<p>An <a href="https://algomaster.io/learn/system-design/idempotency">idempotency</a> key acts as a unique identifier attached to a payment request. Even if the request arrives multiple times, the payment provider knows it represents the same operation.</p>
<p>Instead of creating duplicate transactions, the system returns the original result. Backend engineers often treat idempotency as a mandatory design principle rather than an optional feature.</p>
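<pre><code class="language-python">import requests

# subscription, billing_period, and customer come from the surrounding billing job.
# The same subscription and period always produce the same key, so retries reuse it.
idempotency_key = f"sub_{subscription.id}_{billing_period}"

response = requests.post(
    "https://api.payment-provider.com/charge",
    json={
        "customer_id": customer.id,
        "amount": 49.00
    },
    headers={
        "Idempotency-Key": idempotency_key
    },
    timeout=10  # without a timeout, requests can wait indefinitely for a response
)
</code></pre>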
<p>Here, each billing period gets its own idempotency key, derived from the subscription ID and the billing period. Every retry for that period reuses the same key, so if the network connection fails after the provider receives the request, the backend can safely retry.</p>
<p>The payment provider recognizes the operation as a duplicate request and returns the original result instead of creating a second charge, protecting customers from accidental double billing.</p>
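<p>To see why reusing the key is safe, it can help to look at the provider's side. The class below is an in-memory stand-in written for illustration. It isn't any real provider's implementation, and a real provider would persist keys durably.</p>
<pre><code class="language-python">import uuid


class FakePaymentProvider:
    """In-memory stand-in for a provider that honours idempotency keys."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored result of the first request
        self.charges_created = 0

    def charge(self, idempotency_key, customer_id, amount_cents):
        if idempotency_key in self._results:
            # Repeat of an earlier request: replay the stored result.
            return self._results[idempotency_key]
        self.charges_created += 1
        result = {
            "charge_id": str(uuid.uuid4()),
            "customer_id": customer_id,
            "amount_cents": amount_cents,
            "status": "succeeded",
        }
        self._results[idempotency_key] = result
        return result
</code></pre>
<p>Sending the same key twice returns the same charge, while a key for the next billing period creates a new one.</p>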
<h2 id="heading-challenge-3-handling-failed-payments-gracefully">Challenge 3: Handling Failed Payments Gracefully</h2>
<p>Not all payment failures mean the same thing.</p>
<p>Cards expire. Banks decline charges. Temporary network issues happen. Users hit spending limits. Fraud systems block transactions.</p>
<p>A payment failing once doesn't automatically mean the customer wants to cancel a service. This creates a difficult backend decision.</p>
<p>Should the system retry immediately? Wait one day? Send a notification? Cancel the subscription?</p>
<p>Teams often build retry strategies known as <a href="https://www.hyperbots.com/glossary/dunning-workflow">dunning workflows</a>.</p>
<p>These workflows determine what happens after a failed payment. Some systems attempt another charge after 24 hours. Others wait several days before trying again.</p>
<img src="https://cdn.hashnode.com/uploads/covers/66c6d8f04fa7fe6a6e337edd/552cc9ae-7885-452f-96bf-e55ba80feae3.png" alt="Dunning Workflow" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A typical dunning workflow categorises failures into temporary and permanent errors.</p>
<p>Temporary failures such as network issues or insufficient funds trigger automatic retries after predefined intervals, for example, after 24 hours, 3 days, and 7 days.</p>
<p>Permanent failures, such as expired cards, pause future retries and immediately request updated payment information from the customer.</p>
<p>Many teams continuously measure retry success rates and adjust retry timing based on historical recovery data.</p>
<pre><code class="language-python">def handle_failed_payment(payment):
    if payment.error_type == "temporary":
        schedule_retry(payment.id, hours=24)

    elif payment.error_type == "permanent":
        notify_customer(
            payment.customer_id,
            "Please update your payment method."
        )
</code></pre>
<p>This example shows a simple dunning workflow. Temporary failures, such as insufficient funds or transient network issues, are scheduled for automatic retry after a delay. Permanent failures, such as an expired payment method, trigger customer notifications instead.</p>
<p>By treating failures differently, the system can recover revenue automatically while avoiding unnecessary retries for charges that cannot succeed without user intervention.</p>
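<p>The retry intervals mentioned above (24 hours, 3 days, and 7 days) can be kept in a small schedule table. This sketch assumes each delay is measured from the original failure and that the caller tracks how many retries have already been made. Both are choices made for the example.</p>
<pre><code class="language-python">from datetime import datetime, timedelta, timezone

# Delays measured from the first failure: 24 hours, 3 days, 7 days.
RETRY_DELAYS = [timedelta(hours=24), timedelta(days=3), timedelta(days=7)]


def next_retry_at(first_failed_at, retries_so_far):
    """Return when to retry next, or None once the schedule is exhausted."""
    if retries_so_far >= len(RETRY_DELAYS):
        return None
    return first_failed_at + RETRY_DELAYS[retries_so_far]
</code></pre>
<p>When the function returns <code>None</code>, the workflow has run out of automatic retries and can move on to notifying the customer or pausing the subscription.</p>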
<h2 id="heading-challenge-4-keeping-system-state-consistent">Challenge 4: Keeping System State Consistent</h2>
<p>Payment systems rarely exist as isolated services. A successful transaction can affect multiple systems at once.</p>
<p>A payment may update billing databases, activate customer access, generate invoices, send notifications, and trigger analytics pipelines.</p>
<p>The challenge appears when one action succeeds, but another fails.</p>
<p>Imagine this sequence: Payment succeeds. Invoice generation succeeds. Customer access update fails.</p>
<p>Now the system enters an inconsistent state. The user paid, but still can't access the service.</p>
<p>Distributed systems make this problem difficult because transactions across services are not always atomic.</p>
<p>Teams often solve this using event-driven architecture.</p>
<img src="https://cdn.hashnode.com/uploads/covers/66c6d8f04fa7fe6a6e337edd/7b0457a9-7df6-4930-86ee-a7f0840621da.jpg" alt="Event Driven Architecture" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>After a payment succeeds, the application stores both the payment result and a corresponding event in the same database transaction. A separate process then publishes the event to downstream systems.</p>
<p>This guarantees that customer access, invoicing, analytics, and notifications eventually receive the same source-of-truth event, reducing the risk of inconsistent states.</p>
<pre><code class="language-python">def complete_payment(payment):

    with database.transaction():

        save_payment(payment)

        save_outbox_event({
            "type": "payment_completed",
            "payment_id": payment.id
        })
</code></pre>
<pre><code class="language-python">def publish_outbox_events():
    events = get_unpublished_events()

    for event in events:
        publish_to_queue(event)
        mark_as_published(event.id)
</code></pre>
<p>This pattern is commonly known as the Outbox Pattern. The payment record and the corresponding event are stored within the same database transaction, ensuring that both succeed or fail together.</p>
<p>Even if downstream systems such as invoicing or access management are temporarily unavailable, the event remains stored and can be published later, preventing inconsistencies where a customer pays successfully but doesn't receive the service they purchased.</p>
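<p>For a version you can run locally, here is a self-contained variation using SQLite from the standard library. The table layout and function signatures are assumptions for this sketch, and the <code>publish</code> argument stands in for whatever message queue client you use.</p>
<pre><code class="language-python">import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, amount_cents INTEGER)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
    )


def complete_payment(conn, payment_id, amount_cents):
    # "with conn" commits both inserts together, or rolls both back on an exception.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount_cents))
        event = {"type": "payment_completed", "payment_id": payment_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def publish_outbox_events(conn, publish):
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays unpublished
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
</code></pre>
<p>Because an event is marked as published only after <code>publish</code> returns, a crash between the two steps can deliver the same event twice. Downstream consumers should therefore treat events as at-least-once and deduplicate on the payment ID.</p>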
<h2 id="heading-challenge-5-processing-webhooks-correctly">Challenge 5: Processing Webhooks Correctly</h2>
<p>Modern payment systems depend heavily on <a href="https://www.redhat.com/en/topics/automation/what-is-a-webhook">webhooks</a>.</p>
<p>Payment providers rarely expect applications to continuously ask whether a payment succeeded. Instead, providers send events to your backend.</p>
<p>For example:</p>
<ul>
<li><p>Payment completed.</p>
</li>
<li><p>Subscription updated.</p>
</li>
<li><p>Card expired.</p>
</li>
<li><p>Refund issued.</p>
</li>
<li><p>Charge failed.</p>
</li>
</ul>
<p>Webhooks sound easy until real-world conditions appear.</p>
<p>Events may arrive late. Events may arrive twice. Events sometimes arrive out of order.</p>
<p>Imagine receiving a “subscription renewed” event before the original payment confirmation. Without careful design, systems can enter invalid states.</p>
<p>Teams commonly solve this with event validation, signature verification, and state reconciliation logic.</p>
<p>Many payment teams introduce a webhook ingestion layer that immediately stores incoming events before processing them. The event identifier becomes the idempotency key, ensuring duplicate webhooks are ignored safely.</p>
<p>Systems then process events asynchronously through a queue. This lets the webhook endpoint acknowledge the provider quickly, which avoids delivery timeouts, and allows failed events to be retried without losing data.</p>
<pre><code class="language-python">def process_webhook(event):

    if event_exists(event["id"]):
        return

    store_event(event)

    queue_event_for_processing(event)
</code></pre>
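<p>This example checks whether an event has already been received before taking any action.</p>
<p>By using the webhook event ID as a unique identifier, the system can safely ignore duplicates so that each legitimate event takes effect only once. Because two copies of the same event can arrive concurrently, the check is only safe when the event store also enforces uniqueness on the event ID, for example with a unique database constraint.</p>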
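<p>Signature verification, mentioned earlier in this section, usually happens before any of this logic runs. The sketch below assumes the provider signs the raw request body with HMAC-SHA256 and sends the hex digest in a header. Header names, encodings, and whether a timestamp is included in the signed data vary between providers, so check your provider's documentation.</p>
<pre><code class="language-python">import hashlib
import hmac


def sign_payload(secret, payload):
    """Hex-encoded HMAC-SHA256 of the raw request body (both arguments are bytes)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_valid_signature(secret, payload, received_signature):
    expected = sign_payload(secret, payload)
    # compare_digest does not stop at the first mismatching character,
    # which reduces the risk of timing side channels.
    return hmac.compare_digest(expected, received_signature)
</code></pre>
<p>The signature must be computed over the exact bytes that were received. Parsing the JSON and serializing it again can change whitespace or key order and make a valid signature fail.</p>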
<h2 id="heading-challenge-6-supporting-different-payment-models">Challenge 6: Supporting Different Payment Models</h2>
<p>Not every repeat payment behaves the same way.</p>
<p>Some subscriptions charge a fixed amount monthly. Others depend on usage.</p>
<p>Membership systems may include annual plans. Donation platforms often allow users to choose flexible amounts.</p>
<p>Systems supporting recurring donations create an interesting example. Unlike traditional subscriptions, users may adjust contribution amounts frequently, pause payments, or donate on custom schedules. This creates additional complexity around billing rules and <a href="https://www.techtarget.com/searchapparchitecture/definition/state-management">state management</a>.</p>
<p>As products evolve, backend systems often inherit multiple payment models simultaneously.</p>
<p>The original architecture may have assumed one billing type. Months later, new requirements appear.</p>
<p>Weekly billing arrives. Trial periods arrive. Prorated upgrades arrive. Usage-based pricing arrives.</p>
<p>Now a simple payment service starts looking like a billing platform.</p>
<p>Many teams eventually redesign their systems around payment abstractions rather than hardcoded workflows.</p>
<p>Instead of embedding billing rules directly into application code, teams often model subscriptions, usage plans, trial periods, and recurring donations as configurable billing entities.</p>
<p>A billing engine evaluates these entities and generates charge requests based on predefined rules. This approach makes it easier to introduce new pricing models without rewriting core payment logic every time the business changes direction.</p>
<pre><code class="language-python">class BillingPlan:

    def calculate_amount(self, customer):
        raise NotImplementedError


class FixedPlan(BillingPlan):

    def calculate_amount(self, customer):
        return 20.00


class UsagePlan(BillingPlan):

    def calculate_amount(self, customer):
        return customer.active_users * 5.00
</code></pre>
<pre><code class="language-python">amount = customer.plan.calculate_amount(customer)
charge_customer(customer, amount)
</code></pre>
<p>Instead of hardcoding billing logic throughout the application, this design encapsulates pricing rules within dedicated billing plan classes. The payment system simply asks the selected plan to calculate the amount due.</p>
<p>As new pricing models such as annual subscriptions, free trials, or usage-based billing are introduced, developers can add new plan types without modifying the core payment workflow.</p>
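<p>As an illustration of adding a plan type, here is a hypothetical trial plan that wraps any other plan. It differs from the classes above in two assumed ways: amounts use <code>Decimal</code> rather than floats to avoid rounding surprises, and <code>calculate_amount</code> takes the billing date as an extra argument.</p>
<pre><code class="language-python">from dataclasses import dataclass
from datetime import date
from decimal import Decimal


class BillingPlan:
    def calculate_amount(self, customer, on_date):
        raise NotImplementedError


@dataclass
class FixedPlan(BillingPlan):
    price: Decimal

    def calculate_amount(self, customer, on_date):
        return self.price


@dataclass
class TrialPlan(BillingPlan):
    """Wraps another plan and charges nothing until the trial ends."""
    inner: BillingPlan
    trial_ends: date  # first day on which the inner plan's price applies

    def calculate_amount(self, customer, on_date):
        if self.trial_ends > on_date:
            return Decimal("0.00")
        return self.inner.calculate_amount(customer, on_date)
</code></pre>
<p>The calling code doesn't need to know a trial exists. It asks the plan for an amount on a given date, exactly as it would for a fixed plan.</p>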
<h2 id="heading-challenge-7-monitoring-payment-systems-in-real-time">Challenge 7: Monitoring Payment Systems in Real Time</h2>
<p>Payment failures become expensive quickly.</p>
<p>If a search feature fails, users might retry later. If payment processing fails, revenue disappears immediately.</p>
<p>This means observability becomes essential. Teams need answers to questions like:</p>
<ul>
<li><p>How many payments failed today?</p>
</li>
<li><p>Did retries increase unexpectedly?</p>
</li>
<li><p>Did webhook processing slow down?</p>
</li>
<li><p>Are certain payment methods failing more often?</p>
</li>
</ul>
<p>Monitoring repeat payment systems requires more than server metrics. Business metrics matter too. Engineering teams often track payment conversion rates, retry success rates, churn indicators, and revenue impact.</p>
<p>Logs alone rarely tell the full story. Modern systems combine <a href="https://www.vmware.com/topics/application-monitoring">application monitoring</a>, event tracing, dashboards, and alerting systems.</p>
<p>When payment issues happen, teams need to identify problems before customers begin filing support tickets.</p>
<p>Fast visibility often becomes the difference between a small incident and a major outage.</p>
<pre><code class="language-python">def process_payment(payment):

    try:
        charge_customer(payment)

        metrics.increment(
            "payments.success"
        )

    except PaymentError:

        metrics.increment(
            "payments.failed"
        )

        raise
</code></pre>
<pre><code class="language-python">if payment_success_rate &lt; 95:
    send_alert(
        "Payment success rate below threshold"
    )
</code></pre>
<p>This example demonstrates how payment systems can capture operational metrics during transaction processing. Every successful and failed charge updates monitoring dashboards, allowing teams to track trends in real time.</p>
<p>If success rates fall below an acceptable threshold, automated alerts notify engineers immediately so they can investigate provider outages, integration issues, or infrastructure problems before significant revenue is affected.</p>
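<p>The <code>payment_success_rate</code> value has to come from somewhere. In production it would normally be computed by your metrics backend, but the idea can be sketched in-process with a sliding window over recent outcomes. The window size and the 95 percent threshold below are assumptions for the example.</p>
<pre><code class="language-python">from collections import deque


class SuccessRateMonitor:
    def __init__(self, window_size=100, threshold=95.0):
        # Only the most recent window_size outcomes are kept.
        self._outcomes = deque(maxlen=window_size)  # True for success, False for failure
        self._threshold = threshold

    def record(self, succeeded):
        self._outcomes.append(succeeded)

    def success_rate(self):
        if not self._outcomes:
            return 100.0  # no data yet, so nothing to alert on
        return 100.0 * sum(self._outcomes) / len(self._outcomes)

    def should_alert(self):
        return self._threshold > self.success_rate()
</code></pre>
<p>A window over recent payments reacts faster than a daily total, because a burst of failures isn't averaged away by hours of earlier successes.</p>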
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Repeat payments look deceptively simple from the user side.</p>
<p>A customer subscribes once and expects everything to work automatically afterwards.</p>
<p>Backend systems carry the real burden. Scheduling, retries, duplicate prevention, state management, webhook processing, and observability all introduce complexity that rarely appears in early prototypes.</p>
<p>Teams often start with straightforward implementations and discover these problems later as scale increases.</p>
<p>The challenge isn't processing one payment successfully. The challenge is processing millions of payments reliably across months or years without creating customer friction.</p>
<p>The most effective payment systems are usually the ones users never think about.</p>
<p>When the backend works properly, everything feels invisible. And in infrastructure engineering, invisible is often the goal.</p>
<p>Hope you enjoyed this article. You can <a href="https://linkedin.com/in/manishmshiva">connect with me on LinkedIn</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Preprocess Medical Images for Machine Learning – A Guide Using Chest X-Rays ]]>
                </title>
                <description>
                    <![CDATA[ Working with healthcare data introduces preprocessing challenges that go beyond those you might encounter with structured data. Some familiar techniques still apply, while others look very different o ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-preprocess-medical-images-for-machine-learning/</link>
                <guid isPermaLink="false">6a21b25709761aac249473c9</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ healthcare ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Medical Imaging ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data-engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Preprocessing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Lakshmi Mahabaleshwara ]]>
                </dc:creator>
                <pubDate>Thu, 04 Jun 2026 17:13:59 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/eab58d7c-f63a-41ae-a01e-52a65b0be17c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Working with healthcare data introduces preprocessing challenges that go beyond those you might encounter with structured data. Some familiar techniques still apply, while others look very different once your data becomes medical images.</p>
<p>In this article, you’ll learn how to prepare a real-world medical imaging dataset for machine learning, from initial data validation to a complete preprocessing pipeline.</p>
<p>We’ll use the Chest X-Ray Pneumonia dataset as our running example, but the lessons apply broadly to healthcare imaging data, including ultrasound, MRI, CT, and dermatology images.</p>
<h2 id="heading-what-youll-learn-in-this-article">What You'll Learn in This Article</h2>
<p>By the end of this article, you'll know how to:</p>
<ul>
<li><p>Approach healthcare data preprocessing differently from preprocessing structured data, and recognize where standard techniques fall short</p>
</li>
<li><p>Validate a medical imaging dataset before training to catch corrupted files, mislabels, and data leakage between train and test</p>
</li>
<li><p>Apply six core preprocessing techniques for medical images</p>
</li>
<li><p>Build a complete preprocessing pipeline for chest X-rays using Python with OpenCV</p>
</li>
</ul>
<h2 id="heading-what-well-cover"><strong>What We'll Cover:</strong></h2>
<ul>
<li><p><a href="#heading-why-preprocessing-data-matters-more-in-healthcare">Why Preprocessing Data Matters More in Healthcare</a></p>
</li>
<li><p><a href="#heading-the-dataset">The Dataset</a></p>
</li>
<li><p><a href="#heading-before-preprocessing-validate-the-dataset">Before Preprocessing: Validate the Dataset</a></p>
</li>
<li><p><a href="#heading-the-six-pillars-of-healthcare-imaging-preprocessing">The Six Pillars of Healthcare Imaging Preprocessing</a></p>
</li>
<li><p><a href="#heading-pillar-1-scaling-making-the-numbers-play-fair">Pillar 1: Scaling — Making the Numbers Play Fair</a></p>
</li>
<li><p><a href="#heading-pillar-2-normalization-centering-the-data">Pillar 2: Normalization — Centering the Data</a></p>
</li>
<li><p><a href="#heading-pillar-3-guiding-the-models-attention">Pillar 3: Guiding the Model's Attention</a></p>
</li>
<li><p><a href="#heading-pillar-4-handling-missing-data">Pillar 4: Handling Missing Data</a></p>
</li>
<li><p><a href="#heading-pillar-5-resizing-amp-resampling-fitting-everything-in-the-same-frame">Pillar 5: Resizing &amp; Resampling — Fitting Everything in the Same Frame</a></p>
</li>
<li><p><a href="#heading-pillar-6-denoising-amp-artifact-handling-cleaning-the-window">Pillar 6: Denoising &amp; Artifact Handling — Cleaning the Window</a></p>
</li>
<li><p><a href="#heading-putting-it-all-together-a-complete-pipeline">Putting it All Together: A Complete Pipeline</a></p>
</li>
<li><p><a href="#heading-try-it-yourself">Try it Yourself</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-why-preprocessing-data-matters-more-in-healthcare">Why Preprocessing Data Matters More in Healthcare</h2>
<p>Imagine handing a toddler a jigsaw puzzle with missing pieces, warped edges, and pieces from three different puzzles mixed together. The toddler can't solve it, but that isn't really the toddler's fault.</p>
<p>The same thing happens when raw, messy data gets fed into a machine learning model, and in healthcare the stakes are higher: a bad prediction on a clinical image can mean a missed diagnosis.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69fd77e89f93a850a46d376f/55671e0b-95ea-4f99-b507-a8742e8981d9.png" alt="Illustration showing a healthcare data preprocessing workflow. Mixed medical images with different sizes, missing labels, noisy scans, and corrupted files enter a preprocessing pipeline and emerge as clean, standardized, model-ready images ready for machine learning." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Healthcare data tends to be messier than what most ML practitioners are used to:</p>
<ul>
<li><p>Images come from different machines, hospitals, and acquisition protocols</p>
</li>
<li><p>Labels are inconsistent, sometimes missing, sometimes wrong</p>
</li>
<li><p>Patient data is incomplete</p>
</li>
<li><p>Image sizes, contrast levels, and orientations vary across sources</p>
</li>
</ul>
<p>Poor preprocessing often leads to models that perform well on benchmark datasets but struggle to generalize to data collected from different hospitals or imaging devices.</p>
<h2 id="heading-the-dataset">The Dataset</h2>
<p>This guide uses the <strong>Chest X-Ray Pneumonia dataset</strong> by Paul Mooney on Kaggle. It's a strong choice for learning preprocessing because:</p>
<ul>
<li><p>It contains around 5,800 pediatric chest X-rays</p>
</li>
<li><p>It has two clear classes — Normal and Pneumonia</p>
</li>
<li><p>It's already organized into train, validation, and test folders</p>
</li>
<li><p>The images are recognizable without specialized medical training</p>
</li>
<li><p>It exhibits almost every preprocessing challenge worth learning</p>
</li>
</ul>
<p>The dataset is available at <a href="https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia">Kaggle: Chest X-Ray Pneumonia</a>.</p>
<h3 id="heading-folder-structure">Folder Structure</h3>
<p>After downloading, the dataset is organized like this:</p>
<pre><code class="language-plaintext">chest_xray/
├── train/
│   ├── NORMAL/
│   └── PNEUMONIA/
├── val/
│   ├── NORMAL/
│   └── PNEUMONIA/
└── test/
    ├── NORMAL/
    └── PNEUMONIA/
</code></pre>
<p>Side-by-side comparison — Normal vs Pneumonia chest X-ray:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69fd77e89f93a850a46d376f/b92e1e14-ac24-4314-afce-bc2c3ce3ea32.png" alt="Side-by-side chest X-ray images showing a normal lung scan on the left and a pneumonia scan on the right. The pneumonia image contains visible cloudy opacities compared with the clearer lung fields in the normal image." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A quick first look at one of the images:</p>
<pre><code class="language-python">import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import cv2

DATA_DIR = "chest_xray"
TRAIN_DIR = os.path.join(DATA_DIR, "train")

# Peek at a sample image
sample_path = os.path.join(TRAIN_DIR, "NORMAL", os.listdir(os.path.join(TRAIN_DIR, "NORMAL"))[0])
sample_image = cv2.imread(sample_path, cv2.IMREAD_GRAYSCALE)

print(f"Image shape: {sample_image.shape}")
print(f"Pixel range: {sample_image.min()} to {sample_image.max()}")
print(f"Data type: {sample_image.dtype}")
</code></pre>
<p>Run this on a few different files and some patterns emerge: most images are large (often around 1500×2000 pixels), pixel values fall in the 0–255 range, and image sizes vary across the dataset. Each of these observations will inform a preprocessing step.</p>
<h2 id="heading-before-preprocessing-validate-the-dataset">Before Preprocessing: Validate the Dataset</h2>
<p>Before applying any transformations, it's worth checking that the data itself is intact. This step alone catches issues that would otherwise cause training to fail silently or produce misleading results.</p>
<p>A simple validation function:</p>
<pre><code class="language-python">def validate_dataset(data_dir):
    """Scan a dataset folder and flag common data quality issues."""
    corrupted = []
    too_small = []
    nearly_black = []
    total = 0
    
    for class_name in os.listdir(data_dir):
        class_path = os.path.join(data_dir, class_name)
        if not os.path.isdir(class_path):
            continue
        for fname in os.listdir(class_path):
            fpath = os.path.join(class_path, fname)
            total += 1
            try:
                img = cv2.imread(fpath, cv2.IMREAD_GRAYSCALE)
                if img is None:
                    corrupted.append(fpath)
                    continue
                if img.shape[0] &lt; 100 or img.shape[1] &lt; 100:
                    too_small.append(fpath)
                if img.mean() &lt; 5:
                    nearly_black.append(fpath)
            except Exception:
                corrupted.append(fpath)
    
    print(f"Total files scanned: {total}")
    print(f"Corrupted: {len(corrupted)}")
    print(f"Too small: {len(too_small)}")
    print(f"Nearly black: {len(nearly_black)}")
    return corrupted, too_small, nearly_black

validate_dataset(TRAIN_DIR)
</code></pre>
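<p>Common issues to check for at this stage (the function above covers the first three; duplicates and mislabels need separate checks, such as file hashing and manual review):</p>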
<ul>
<li><p><strong>Corrupted files</strong> — files that won't open at all</p>
</li>
<li><p><strong>Empty or nearly-black images</strong> — failed acquisitions or saved-as-blank files</p>
</li>
<li><p><strong>Wrong dimensions</strong> — thumbnails or partial downloads mixed in</p>
</li>
<li><p><strong>Duplicate images</strong> — the same scan appearing in both train and test (this causes data leakage)</p>
</li>
<li><p><strong>Mislabeled images</strong> — a normal X-ray placed in the pneumonia folder</p>
</li>
</ul>
<p><strong>⚠️ This step is critical.</strong> One corrupted file can crash a training loop hours into a run. Duplicates shared between train and test can inflate accuracy scores without anyone noticing.</p>
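<p>The validation function above does not look for duplicates across splits. A minimal sketch of one way to do that is to hash every file and report hashes that show up in more than one split. The helper names <code>file_digest</code> and <code>find_cross_split_duplicates</code> are hypothetical, and this approach only finds byte-identical files; re-encoded or resized copies of the same scan would need a different technique.</p>

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_cross_split_duplicates(split_dirs):
    """Return {digest: [(split, path), ...]} for files present in 2+ splits.

    split_dirs: dict such as {"train": "chest_xray/train", "test": "chest_xray/test"}.
    """
    seen = defaultdict(list)  # digest -> list of (split_name, path)
    for split_name, split_dir in split_dirs.items():
        for root, _dirs, files in os.walk(split_dir):
            for fname in files:
                path = os.path.join(root, fname)
                seen[file_digest(path)].append((split_name, path))
    return {
        digest: entries
        for digest, entries in seen.items()
        if len({split for split, _ in entries}) > 1
    }
```

<p>An empty result means no byte-identical file appears in more than one split. It does not rule out several different images of the same patient being spread across splits.</p>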
<h2 id="heading-the-six-pillars-of-healthcare-imaging-preprocessing"><strong>The Six Pillars of Healthcare Imaging Preprocessing</strong></h2>
<p>Preprocessing for medical images can be organized around six core concerns. Two of them carry over directly from preprocessing structured data. Two need to be adapted because the mechanics change when the input is an image. And two are entirely new: they only exist once the data becomes pictures of human bodies.</p>
<h2 id="heading-pillar-1-scaling-making-the-numbers-play-fair">Pillar 1: Scaling — Making the Numbers Play Fair</h2>
<p>Imagine two children comparing their collections. One has 3 seashells. The other has 3,000 stickers. Asking who has more makes the answer seem obvious, but the <em>scales</em> are completely different. Comparing them meaningfully means putting both collections on the same measuring system.</p>
<p>In medical images, pixels usually range from 0 to 255 in 8-bit images, or 0 to 65,535 in some 16-bit medical DICOM images. Neural networks tend to train faster and more reliably when input values are small numbers close to zero.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69fd77e89f93a850a46d376f/1d864b0d-992c-4637-8f43-7ca86c6fd93c.png" alt="Histogram comparison showing chest X-ray pixel values before and after scaling. The left histogram displays values in the 0–255 range, while the right histogram shows the same distribution scaled to the 0–1 range used for machine learning." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>The fix:</strong> Divide every pixel by its maximum possible value, bringing everything into the 0-to-1 range.</p>
<pre><code class="language-python">image = cv2.imread(sample_path, cv2.IMREAD_GRAYSCALE)

# Scale to [0, 1]
image_scaled = image.astype(np.float32) / 255.0

print(f"Before scaling: {image.min()} to {image.max()}")
print(f"After scaling:  {image_scaled.min():.3f} to {image_scaled.max():.3f}")
</code></pre>
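<p>The snippet above hard-codes 255, which fits 8-bit images. For the 16-bit case mentioned earlier, one option is to pick the divisor from the array's dtype. This is a sketch under that assumption; <code>scale_to_unit_range</code> is a hypothetical helper name.</p>

```python
import numpy as np


def scale_to_unit_range(image):
    """Scale an unsigned-integer image to float32 values in [0, 1]."""
    if not np.issubdtype(image.dtype, np.unsignedinteger):
        raise TypeError(f"expected an unsigned integer image, got {image.dtype}")
    max_value = np.iinfo(image.dtype).max  # 255 for uint8, 65535 for uint16
    return image.astype(np.float32) / max_value
```

<p>Dividing by the dtype maximum is only a starting point. Scans stored in a 16-bit container often use fewer bits than that, in which case the values would land in a narrow band near zero and dividing by the actual stored bit depth may be the better choice.</p>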
<p><strong>Takeaway:</strong> Pixel scaling follows the same principle as scaling any numerical feature. The values simply happen to be arranged as an image rather than a column.</p>
<h2 id="heading-pillar-2-normalization-centering-the-data">Pillar 2: Normalization — Centering the Data</h2>
<p>Imagine a teacher asks a class to rate a movie from 1 to 10. One child always gives 9s and 10s. Another spreads ratings evenly from 1 to 10. Comparing their opinions fairly requires adjusting each child's score relative to their own average.</p>
<p>In medical imaging, even after scaling to 0–1, the overall brightness of images can vary because some X-rays are taken with stronger exposure than others. Normalization shifts and rescales pixel values using the training set's mean and standard deviation, so the data as a whole is centered around zero with a standard deviation of about one.</p>
<p><strong>The fix:</strong> Subtract the mean, divide by the standard deviation.</p>
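<pre><code class="language-python"># Compute mean and std from the TRAINING set only, never from validation or test
def compute_train_stats(train_dir, sample_limit=1000):
    """Compute pixel mean and std over up to sample_limit training images."""
    # Gather paths from every class folder so the sample is not drawn from one class only
    paths = []
    for class_name in sorted(os.listdir(train_dir)):
        class_path = os.path.join(train_dir, class_name)
        if not os.path.isdir(class_path):
            continue
        paths += [os.path.join(class_path, f) for f in sorted(os.listdir(class_path))]
    np.random.default_rng(0).shuffle(paths)  # seeded shuffle keeps the sample reproducible

    # Running sums avoid holding every pixel of every image in memory at once
    total, total_sq, n_pixels, count = 0.0, 0.0, 0, 0
    for path in paths:
        if count &gt;= sample_limit:
            break
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        pixels = img.astype(np.float64) / 255.0
        total += pixels.sum()
        total_sq += np.square(pixels).sum()
        n_pixels += pixels.size
        count += 1
    mean = total / n_pixels
    std = np.sqrt(max(total_sq / n_pixels - mean ** 2, 0.0))
    return mean, std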

train_mean, train_std = compute_train_stats(TRAIN_DIR)
image_normalized = (image_scaled - train_mean) / train_std
</code></pre>
<p><strong>⚠️</strong> Avoid this common mistake: Statistics for normalization should be computed from the training set only, never from validation or test. Including those in the calculation leaks information from the evaluation data into the model. The same statistics should then be applied to validation, test, and any new data at inference time.</p>
<p><strong>Takeaway:</strong> Centering and scaling pixel values with the dataset's statistics is the imaging equivalent of standardizing a feature column. The pixels now sit on a common, zero-centered scale. Dataset-level statistics do not remove brightness differences between individual scans, though; per-image normalization or contrast enhancement is what addresses those.</p>
<h2 id="heading-pillar-3-guiding-the-models-attention">Pillar 3: Guiding the Model's Attention</h2>
<p>Imagine a child walking into a crowded pet store. Instead of describing every animal in sight, a parent points to the features that matter: <em>“Look at the soft fur, the fluffy tail, and the nice small size.”</em> The child learns where to focus their attention.</p>
<p>Medical image preprocessing does something similar. It highlights the regions and features most relevant to the diagnostic task.</p>
<ul>
<li><p><strong>Region-of-interest (ROI) cropping</strong> — focus on the lung field and discard the patient's arms, machine borders, and any imprinted text</p>
</li>
<li><p><strong>Contrast enhancement</strong> — use techniques like CLAHE (Contrast Limited Adaptive Histogram Equalization) to make subtle lung textures more visible</p>
</li>
<li><p><strong>Channel selection</strong> — for images stored as RGB but containing grayscale information, convert to single-channel input to reduce noise</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/69fd77e89f93a850a46d376f/54cb1319-e794-472e-9ca4-22a063fd5092.png" alt="Three-panel illustration showing a chest X-ray before and after feature enhancement. The first panel shows the original image, the second highlights the lung region of interest, and the third shows the image after CLAHE contrast enhancement with lung textures appearing more visible." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>CLAHE applied to an X-ray:</p>
<pre><code class="language-python"># CLAHE enhances local contrast — useful for X-rays
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
image_enhanced = clahe.apply(image)

# Visualize the difference
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
axes[0].imshow(image, cmap='gray')
axes[0].set_title('Original')
axes[1].imshow(image_enhanced, cmap='gray')
axes[1].set_title('After CLAHE')
plt.show()
</code></pre>
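<p>The list above also mentions ROI cropping and channel selection, which the CLAHE snippet does not cover. The sketch below shows a deliberately simple version of each: collapsing an RGB image whose three channels are identical, and trimming uniform dark borders. Both function names and the threshold value are assumptions. Trimming dark borders is not the same as isolating the lung field, which would typically need a segmentation model or annotations.</p>

```python
import numpy as np


def to_single_channel(image):
    """Collapse an H x W x 3 image with identical channels into H x W."""
    if image.ndim == 3 and image.shape[2] == 3:
        same = (np.array_equal(image[..., 0], image[..., 1])
                and np.array_equal(image[..., 1], image[..., 2]))
        if not same:
            raise ValueError("channels differ; use a proper color-to-gray conversion")
        return image[..., 0]
    return image


def crop_dark_border(image, threshold=10):
    """Crop a 2D image to the bounding box of pixels brighter than threshold."""
    mask = image > threshold
    if not mask.any():
        return image  # nothing brighter than the threshold; leave the image untouched
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```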
<p><strong>Takeaway:</strong> The goal of teaching the model what to look at hasn't changed. With structured data, the answer is in new columns. With images, the answer is in cropping, enhancement, and emphasizing the regions that carry diagnostic signal.</p>
<h2 id="heading-pillar-4-handling-missing-data">Pillar 4: Handling Missing Data</h2>
<p>Imagine reading a storybook with a few damaged pages. You don’t throw away the entire book. Instead, you decide whether to skip the page, infer what might be missing, or mark it for review.</p>
<p>In medical imaging, missing data can mean corrupted files, missing labels, or incomplete studies rather than empty spreadsheet cells.</p>
<p>The same three strategies — drop, impute, flag — still apply, just with different mechanics:</p>
<pre><code class="language-python"># Strategy 1: Drop — remove unreadable or empty images
def is_valid_image(path):
    try:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            return False
        if img.mean() &lt; 5:           # nearly black
            return False
        if img.shape[0] &lt; 50 or img.shape[1] &lt; 50:  # too small
            return False
        return True
    except Exception:
        return False

# Strategy 2: Impute (rare for images, but possible, e.g., inpainting to fill in missing patches).
#   Generally avoided for diagnostic data.

# Strategy 3: Flag — track which patients are missing which modalities,
#   and let the model condition on availability. Common in multi-modal healthcare ML.
</code></pre>
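<p>Strategy 3 appears only as a comment above. A rough sketch of what flagging could look like: stack the available images per study and return a 0/1 mask alongside them, so a model can tell a genuinely missing view from a blank one. The function name and the <code>"frontal"</code>/<code>"lateral"</code> keys used in the example are hypothetical.</p>

```python
import numpy as np


def stack_with_availability(studies, modalities, shape):
    """Build model inputs plus a 0/1 availability mask per modality.

    studies: list of dicts mapping modality name to a 2D array (absent key means missing).
    Returns (inputs, mask) with shapes (N, M, H, W) and (N, M).
    """
    inputs = np.zeros((len(studies), len(modalities), *shape), dtype=np.float32)
    mask = np.zeros((len(studies), len(modalities)), dtype=np.float32)
    for i, study in enumerate(studies):
        for j, name in enumerate(modalities):
            if study.get(name) is not None:
                inputs[i, j] = study[name]
                mask[i, j] = 1.0  # flag: this modality is present for this study
    return inputs, mask


studies = [
    {"frontal": np.ones((2, 2)), "lateral": np.ones((2, 2))},
    {"frontal": np.ones((2, 2))},  # lateral view missing
]
inputs, mask = stack_with_availability(studies, ["frontal", "lateral"], (2, 2))
print(mask.tolist())  # [[1.0, 1.0], [1.0, 0.0]]
```

<p>Missing slots are left as zeros here. The mask is what carries the information that the slot is empty rather than a truly black image.</p>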
<p><strong>Takeaway:</strong> "Missing" in imaging data is rarely just a NaN. It can be a broken file, an unlabeled scan, an absent modality, or a black corner inside an image. The same three strategies still apply.</p>
<h2 id="heading-pillar-5-resizing-amp-resampling-fitting-everything-in-the-same-frame">Pillar 5: Resizing &amp; Resampling — Fitting Everything in the Same Frame</h2>
<p>Imagine displaying children’s drawings on a classroom wall. If every drawing is a different size, they won’t fit neatly into the display. You resize them while preserving their proportions.</p>
<p>Medical images must often be resized to a common input size, but anatomical structures should retain their original shape.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69fd77e89f93a850a46d376f/d36b6f8c-4be0-41b7-ab7c-5ca30c01b3e0.png" alt="Comparison of two chest X-ray resizing approaches. One image is stretched into a square shape, distorting the lungs, while the second preserves the original aspect ratio by adding padding around the image. The aspect-ratio-preserving approach is highlighted as the preferred method." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>The fix:</strong> Resize all images to a common shape. For medical data, <em>how</em> the resizing is done matters.</p>
<pre><code class="language-python">TARGET_SIZE = (224, 224)

# Simple resize (may distort aspect ratio)
image_resized = cv2.resize(image, TARGET_SIZE)

# Better: preserve aspect ratio with padding
def resize_with_padding(image, target_size):
    h, w = image.shape[:2]
    target_h, target_w = target_size
    scale = min(target_h / h, target_w / w)
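    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    # INTER_AREA tends to give cleaner results when shrinking; INTER_LINEAR is used when enlarging
    interpolation = cv2.INTER_AREA if scale &lt; 1 else cv2.INTER_LINEAR
    resized = cv2.resize(image, (new_w, new_h), interpolation=interpolation)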
    
    pad_h = target_h - new_h
    pad_w = target_w - new_w
    top, bottom = pad_h // 2, pad_h - pad_h // 2
    left, right = pad_w // 2, pad_w - pad_w // 2
    padded = cv2.copyMakeBorder(resized, top, bottom, left, right,
                                 cv2.BORDER_CONSTANT, value=0)
    return padded

image_clean_resize = resize_with_padding(image, TARGET_SIZE)
</code></pre>
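<p>To make the padding arithmetic concrete, here is the same calculation pulled out into a pure-Python helper with no image involved. <code>letterbox_geometry</code> is a hypothetical name, and the 896×1792 input size is chosen only because it keeps the numbers exact.</p>

```python
def letterbox_geometry(h, w, target_h, target_w):
    """Return (new_h, new_w, top, bottom, left, right) for an aspect-preserving resize."""
    scale = min(target_h / h, target_w / w)
    new_h, new_w = int(h * scale), int(w * scale)
    pad_h, pad_w = target_h - new_h, target_w - new_w
    top, left = pad_h // 2, pad_w // 2
    return new_h, new_w, top, pad_h - top, left, pad_w - left


# A wide 896 x 1792 (height x width) image going into a 224 x 224 frame
print(letterbox_geometry(896, 1792, 224, 224))
# (112, 224, 56, 56, 0, 0)
```

<p>The width is the limiting side, so the image shrinks by a factor of 8 to 112×224 and receives 56 rows of padding above and below. The anatomy keeps its 1:2 proportions instead of being stretched to a square.</p>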
<p><strong>⚠️ Why aspect ratio matters in healthcare:</strong> Squishing a chest X-ray horizontally makes the lungs look unnatural. Models trained on distorted anatomy often perform worse on real scans. Preserving aspect ratio is generally the safer choice.</p>
<p><strong>Takeaway:</strong> Models need a consistent input size, but the geometry of the anatomy needs to be preserved. Resize, but resize carefully.</p>
<h2 id="heading-pillar-6-denoising-amp-artifact-handling-cleaning-the-window">Pillar 6: Denoising &amp; Artifact Handling — Cleaning the Window</h2>
<p>Imagine looking through a window with dust and smudges on the glass. Cleaning the window makes the view clearer, but scrubbing too aggressively could scratch the glass.</p>
<p>Similarly, medical images often contain noise and acquisition artifacts that should be reduced carefully without removing clinically important details.</p>
<p>For chest X-rays, the most common issues are mild noise and burned-in text or markers. A gentle median or bilateral filter helps with the first, while cropping or masking helps with the second.</p>
<pre><code class="language-python"># Gentle denoising — careful not to blur away clinical detail
image_denoised = cv2.medianBlur(image, ksize=3)

# Bilateral filter smooths noise while keeping edges sharp
image_bilateral = cv2.bilateralFilter(image, d=5, sigmaColor=50, sigmaSpace=50)
</code></pre>
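<p>The filters above address noise. For burned-in text or markers, the masking mentioned earlier can be as simple as overwriting a known rectangle. This sketch assumes the marker's position is already known, which in practice would have to come from inspection or annotation; <code>mask_region</code> is a hypothetical helper name.</p>

```python
import numpy as np


def mask_region(image, top, left, height, width, fill_value=0):
    """Return a copy of image with one rectangular region overwritten."""
    masked = image.copy()  # copy so the original scan is left untouched
    masked[top:top + height, left:left + width] = fill_value
    return masked
```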
<p><strong>⚠️ A note of caution:</strong> Aggressive denoising can erase the features a model needs to detect a disease. For diagnostic ML, gentle filtering is generally preferred. A useful rule of thumb: if a radiologist can no longer see fine structures that were visible in the original, the filtering has gone too far.</p>
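<p>As a coarse numeric companion to the caution above, you can measure how much a filter changed the image. The sketch below computes the mean absolute per-pixel difference; <code>mean_abs_change</code> is a hypothetical helper, and any cutoff you pick for "too much" is a judgment call rather than an established standard. It does not replace looking at the images.</p>

```python
import numpy as np


def mean_abs_change(original, filtered):
    """Mean absolute per-pixel difference, in the image's own intensity units."""
    # Cast to float first: subtracting uint8 arrays directly would wrap around
    diff = original.astype(np.float32) - filtered.astype(np.float32)
    return float(np.abs(diff).mean())
```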
<p><strong>Takeaway:</strong> Imaging data carries noise that structured data doesn't have. The window can be cleaned, but never so aggressively that the view is wiped away with the smudges.</p>
<h2 id="heading-putting-it-all-together-a-complete-pipeline">Putting it All Together: A Complete Pipeline</h2>
<img src="https://cdn.hashnode.com/uploads/covers/69fd77e89f93a850a46d376f/c532949b-000c-403e-acb9-f9dec689182e.png" alt="Workflow showing a chest X-ray progressing through a healthcare imaging preprocessing pipeline. The image moves through validation, resizing, denoising, contrast enhancement, scaling, and normalization before becoming a model-ready machine learning input." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Here's how the six pillars combine into a single preprocessing function for chest X-ray images:</p>
<pre><code class="language-python">def preprocess_xray(image_path, target_size=(224, 224),
                    train_mean=0.482, train_std=0.236):
    """
    Full preprocessing pipeline for chest X-ray images.
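    Applies all six pillars (in pipeline order, not pillar-number order).
    The train_mean and train_std defaults are example values; pass the
    statistics computed from your own training set.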
    """
    # Pillar 4: Validate first — skip corrupted files
    if not is_valid_image(image_path):
        return None
    
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    
    # Pillar 5: Resize with aspect ratio preserved
    image = resize_with_padding(image, target_size)
    
    # Pillar 6: Gentle denoising
    image = cv2.medianBlur(image, 3)
    
    # Pillar 3: Enhance contrast to highlight lung texture
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    image = clahe.apply(image)
    
    # Pillar 1: Scale to [0, 1]
    image = image.astype(np.float32) / 255.0
    
    # Pillar 2: Normalize using training set statistics
    # (ideally computed on images that went through the same steps above)
    image = (image - train_mean) / train_std
    
    return image
</code></pre>
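<p>A function like the one above returns one image at a time, or <code>None</code> for a rejected file. A minimal sketch of turning a list of paths into a model-ready array might look like this. <code>build_batch</code> is a hypothetical helper, and the trailing channel axis is an assumption: some frameworks expect the channel axis before height and width instead.</p>

```python
import numpy as np


def build_batch(paths, preprocess_fn):
    """Apply preprocess_fn to each path; return (batch, kept_paths, skipped_paths)."""
    images, kept, skipped = [], [], []
    for path in paths:
        image = preprocess_fn(path)
        if image is None:
            skipped.append(path)  # record rejects instead of dropping them silently
            continue
        images.append(image)
        kept.append(path)
    if not images:
        return np.empty((0,), dtype=np.float32), kept, skipped
    # Add a trailing channel axis: (N, H, W) becomes (N, H, W, 1)
    batch = np.stack(images).astype(np.float32)[..., np.newaxis]
    return batch, kept, skipped
```

<p>Passing <code>preprocess_xray</code> as <code>preprocess_fn</code> would apply the full pipeline to each file, and the returned list of skipped paths makes it easy to review what was excluded.</p>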
<h2 id="heading-try-it-yourself">Try it Yourself</h2>
<p>Every code snippet in this article is bundled into a runnable Kaggle notebook: <a href="https://www.kaggle.com/code/lakshmimahabaleshwar/chest-xray-preprocessing-kaggle">Chest X-Ray Preprocessing — Kaggle Notebook</a>. Fork it, attach the dataset, and run all the cells to see each preprocessing pillar in action on real chest X-rays.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Here's a summary of what we've discussed in this article:</p>
<table>
<thead>
<tr>
<th><strong>Pillar</strong></th>
<th><strong>Purpose</strong></th>
<th><strong>Example</strong></th>
</tr>
</thead>
<tbody><tr>
<td>Scaling</td>
<td>Standardize pixel ranges</td>
<td>0-255 → 0-1</td>
</tr>
<tr>
<td>Normalization</td>
<td>Center brightness distributions</td>
<td>z-score normalization</td>
</tr>
<tr>
<td>Attention Guidance</td>
<td>Highlight diagnostic regions</td>
<td>CLAHE</td>
</tr>
<tr>
<td>Missing Data Handling</td>
<td>Remove unusable scans</td>
<td>Corrupted files</td>
</tr>
<tr>
<td>Resizing</td>
<td>Consistent input size</td>
<td>224×224</td>
</tr>
<tr>
<td>Denoising</td>
<td>Reduce acquisition noise</td>
<td>Median filter</td>
</tr>
</tbody></table>
<p>Preprocessing for structured data is about making numbers play fair so a model can see them clearly.</p>
<p>Preprocessing for healthcare imaging is about respecting the messy reality of how medical data is captured, stored, and labeled. Some standard techniques carry over directly. Some need to be adapted. And a few preprocessing concerns only emerge once the data becomes pictures of human bodies.</p>
<p>Stepping back, whether it's a child learning to organize their toy box, or a model learning to spot pneumonia in a chest X-ray, the quality of learning depends on the quality of data preparation. Get the data right.</p>
<p>If this was useful, you can find a related conceptual primer on preprocessing more broadly here: <a href="https://lakshmimahabaleshwara.substack.com/p/data-preprocessing-for-machine-learning">Data Preprocessing for Machine Learning</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Docker Full Course ]]>
                </title>
                <description>
                    <![CDATA[ We just posted a comprehensive Docker course now live on the freeCodeCamp.org YouTube channel! The ability to scale applications instantly and ship software reliably is an important skill. Containeriz ]]>
                </description>
                <link>https://www.freecodecamp.org/news/docker-full-course/</link>
                <guid isPermaLink="false">6a217a7e004b104f5f3c88b8</guid>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Thu, 04 Jun 2026 13:15:42 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5f68e7df6dfc523d0a894e7c/2d702aaa-2eef-48b1-aa2b-f36dcb744501.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>We just posted a comprehensive Docker course now live on the freeCodeCamp.org YouTube channel!</p>
<p>The ability to scale applications instantly and ship software reliably is an important skill. Containerization is at the heart of modern development.</p>
<p>This hands-on, structured course is designed to take you from absolute scratch to becoming job-ready. Taught by instructor Eissa from DolfinEd, who brings over 25 years of industry experience and 21 years of teaching expertise, this course breaks down complex concepts into simple, actionable skills.</p>
<p>This is a complete, step-by-step practical course that covers everything you need to master Docker:</p>
<ul>
<li><p>Foundations: Understand the shift from legacy physical servers to virtual machines and containers.</p>
</li>
<li><p>Core Skills: Master Docker files, image creation, and how to manage repositories using Docker Hub.</p>
</li>
<li><p>Networking &amp; Storage: Learn the gold standards for managing container networking, storage, and volumes.</p>
</li>
<li><p>Orchestration: Move beyond basic containers by learning how to deploy multi-container applications with Docker Compose and get an introduction to Docker Swarm.</p>
</li>
<li><p>Real-World Application: Put your skills to the test with structured quizzes, module assignments, and real-world projects that mirror professional environments.</p>
</li>
</ul>
<p>Watch the full course now and start your journey to becoming a Docker expert (7-hour watch</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Key Technical Design Decisions for Building an Educational App with LLMs  ]]>
                </title>
                <description>
                    <![CDATA[ Recently, I spent time prototyping an educational app using Claude Code. The project is an open-source mobile app for educators to share, discover, and facilitate low-cost creative learning activities ]]>
                </description>
                <link>https://www.freecodecamp.org/news/technical-design-decisions-educational-app-llms/</link>
                <guid isPermaLink="false">6a20a338e4cc400d0c4149bb</guid>
                
                    <category>
                        <![CDATA[ edtech ]]>
                    </category>
                
                    <category>
                        <![CDATA[ creativity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ learning design ]]>
                    </category>
                
                    <category>
                        <![CDATA[ product development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ai experiments ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Srishti Sethi ]]>
                </dc:creator>
                <pubDate>Wed, 03 Jun 2026 21:57:12 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/004bd995-d90e-4589-be82-e66e0be110f5.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Recently, I spent time prototyping an educational app using <a href="https://www.anthropic.com/claude-code">Claude Code</a>. The project is an open-source mobile app for educators to share, discover, and facilitate low-cost creative learning activities.</p>
<p>One of the core features of the app is AI-assisted activity creation. Activity creation has always been a key aspect of the project, and in the earlier desktop version, this was handled through manual long-form activity submission forms.</p>
<p>Given the current AI landscape, it felt important to explore alternative ways to simplify and streamline activity creation using AI, while reducing the amount of manual form-filling required from users.</p>
<p>I started with a blank slate, letting Claude guide me on the technologies to use for the app. The app was eventually built with React Native (Expo) and Firebase, and builds on a web version that is currently in beta.</p>
<p>What stood out to me most during prototyping was the speed: the mobile app went from ideation and mockups to a working prototype in about a month, compared to nearly a year for the original web version.</p>
<p>I haven't been coding heavily in recent years, and most of my professional work today is centered around technical community management. But since I do have a technical background and prior experience working in software development, I found it surprisingly accessible to quickly build a functional app using Claude alongside its reference guides and documentation.</p>
<p>I do think that experience helped me reason through tradeoffs, evaluate architectural decisions, and critically analyze the generated code rather than relying on the LLM blindly.</p>
<p>In this article, I’ll share some of the technical design decisions I made along the way.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6a172a9fbadcd8afcb11f314/3bca2984-4e06-4e6c-b494-d005bd1db319.png" alt="App Screenshots - Activity, Facilitation Step and Profile View" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-which-model-to-choose">Which Model to Choose</a></p>
</li>
<li><p><a href="#heading-choosing-for-geography-and-cost">Choosing for Geography and Cost</a></p>
</li>
<li><p><a href="#heading-choosing-the-programming-framework-and-backend-architecture">Choosing the Programming Framework and Backend Architecture</a></p>
</li>
<li><p><a href="#heading-machine-translation-and-multilingualism">Machine Translation and Multilingualism</a></p>
</li>
<li><p><a href="#heading-create-with-ai-with-humans-in-the-loop">“Create with AI” with Humans in the Loop</a></p>
</li>
<li><p><a href="#heading-optimizing-for-low-bandwidth">Optimizing for Low Bandwidth</a></p>
</li>
<li><p><a href="#heading-producing-a-demo-video">Producing a Demo Video</a></p>
</li>
<li><p><a href="#heading-summary">Summary</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>The key technical decisions I discuss here are reflections that come from my hands-on experimentation. They're intended for others working at the intersection of education and technology, especially developers, community practitioners, or technically curious people looking to prototype and build quickly using AI tools.</p>
<p>You'll need to have some basic familiarity with the React Native framework, how databases and Firebase work, as well as how to use Claude tools, command-line tools, and API integrations.</p>
<p>It also helps to be comfortable making decisions along the way around tradeoffs, such as choosing one infrastructure over another based on cost, geography, multilingual support, scalability, or ease of use.</p>
<h2 id="heading-which-model-to-choose">Which Model to Choose</h2>
<p>Choosing the model to build the app itself was straightforward. I picked <a href="https://www.anthropic.com/news/claude-opus-4-7">Opus 4.7</a> for its advanced capabilities because I needed the model to help architect the app from scratch.</p>
<p>But when it came to choosing the model inside the app, the decision required more consideration.</p>
<p>Before diving into the reasons for picking a model, let’s first understand the context. Some of the features in the app include lesson plan creation and structuring with AI, machine translation of the content into 10 languages, a facilitation mode that guides educators through AI-generated tips for each activity step, educator profiles, and more.</p>
<p>If we break these features down, the model needs to support a few key capabilities: structured JSON generation that follows strict schemas, pedagogical reasoning for activity design, multilingual content generation, and the ability to infer constraints such as time, materials, and age-appropriateness. It also needs to reliably map user inputs into predefined activity categories while maintaining consistency in output structure.</p>
<p>The activity generation workflow is the key AI feature in the app. Since it's an asynchronous, one-shot generation feature, I picked Sonnet among the available Claude models because of the quality and non-generic educational content it was able to generate.</p>
<h2 id="heading-choosing-for-geography-and-cost">Choosing for Geography and Cost</h2>
<p>Latency and network reliability were also important considerations. The app is designed to support educators working in underserved contexts and slower network environments. Although Claude’s <a href="https://www.anthropic.com/claude/haiku">Haiku</a> model would have offered lower latency, it might not be as reliable on slower networks compared to other models.</p>
<p>At the same time, I plan to keep the app free and open source, and I'm not currently planning to market it aggressively. Using Opus for end-user generation would therefore have been expensive, even though it may have produced richer outputs.</p>
<p>For a structured generation task like this, Sonnet felt like the right balance between quality, cost, and response time. Longer generation times with Opus could have negatively impacted user experience.</p>
<p>When it comes to configuring <code>maxTokens</code> in the API setup, I also made decisions keeping cost and generation length in mind.</p>
<p>A typical activity generated for the platform should ideally not exceed roughly 1,500–2,000 words, which translates to around 2,500 output tokens and approximately 30–45 seconds of generation time.</p>
<p>Based on this, I kept the <code>maxTokens</code> value around that range to help control token costs while still allowing enough space for meaningful structured educational content generation.</p>
<pre><code class="language-typescript">/**
 * Claude AI configuration.
 *
 * NOTE: For production, move the API key to a server-side proxy to avoid
 * exposing it in the client bundle. The `baseUrl` can be swapped to your
 * own backend endpoint that forwards requests to Anthropic.
 */
export const aiConfig = {
  apiKey: process.env.EXPO_PUBLIC_CLAUDE_API_KEY ?? '',
  model: 'claude-sonnet-4-6',
  maxTokens: 2500,
  baseUrl: 'https://api.anthropic.com/v1/messages',
  anthropicVersion: '2023-06-01',
};
</code></pre>
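<p>As a rough sketch, a config like this could be turned into a Messages API request along the following lines. The helper name <code>buildActivityRequest</code> and the <code>localConfig</code> stand-in are hypothetical and not taken from the app's source. The header names and body fields follow Anthropic's public API.</p>
<pre><code class="language-typescript">// Sketch only: `AiConfig`, `localConfig`, and `buildActivityRequest`
// are illustrative names, not part of the app's codebase.
type AiConfig = {
  apiKey: string;
  model: string;
  maxTokens: number;
  baseUrl: string;
  anthropicVersion: string;
};

// Local stand-in for the `aiConfig` object, so the sketch runs on its own.
const localConfig: AiConfig = {
  apiKey: 'test-key',
  model: 'claude-sonnet-4-6',
  maxTokens: 2500,
  baseUrl: 'https://api.anthropic.com/v1/messages',
  anthropicVersion: '2023-06-01',
};

// Build the URL and fetch options without sending anything yet.
function buildActivityRequest(config: AiConfig, prompt: string) {
  return {
    url: config.baseUrl,
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': config.apiKey,
        'anthropic-version': config.anthropicVersion,
      },
      body: JSON.stringify({
        model: config.model,
        // Caps the length (and cost) of each generated activity.
        max_tokens: config.maxTokens,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

const request = buildActivityRequest(localConfig, 'Paper circuits for ages 8 and up');
// Sending it would look like: await fetch(request.url, request.init)
</code></pre>
<p>Keeping the request builder separate from the network call also makes it easier to point <code>baseUrl</code> at a server-side proxy later, as the note in the config suggests.</p>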
<h2 id="heading-choosing-the-programming-framework-and-backend-architecture">Choosing the Programming Framework and Backend Architecture</h2>
<p>I wanted to build an app using a framework that could work seamlessly on both Android and iOS devices. React Native seemed like an obvious choice here, both because it directly fit this requirement and because of its simplicity, ease of use, and overall popularity in the ecosystem.</p>
<p>For the database and backend, I wanted to pick a system that felt credible from both a data privacy and security perspective.</p>
<p>I had a slightly unexpected moment during development while discussing architecture choices with Claude Code. It suggested commonly used developer platforms, such as <a href="https://supabase.com/">Supabase</a>. At first, this felt like a reasonable default choice.</p>
<p>But the key here was to not just go with what's commonly suggested, and instead do a quick but thorough check on how reliably these services are accessible in the target user regions. While looking deeper, I came across reports that <a href="https://techcrunch.com/2026/02/27/india-disrupts-access-to-popular-developer-platform-supabase-with-blocking-order/">Supabase access had been restricted in India</a>, likely related to cybersecurity concerns.</p>
<p>That immediately changed my decision. Even though Claude had initially scaffolded the backend setup assuming Supabase, I later switched the architecture to <a href="https://firebase.google.com/">Firebase</a> by creating a project directly in the Firebase Console.</p>
<p>That was one of those small but important reminders that it's not enough to accept AI suggestions at face value. It's useful to actively check for the latest context, especially when it comes to infrastructure and platform availability.</p>
<p>The Firebase setup itself looked fairly straightforward:</p>
<ul>
<li><p>Create a project at <a href="https://console.firebase.google.com/">Firebase Console</a></p>
</li>
<li><p>Enable Authentication (Email/Password)</p>
</li>
<li><p>Create a Firestore database</p>
</li>
<li><p>Enable Storage</p>
</li>
<li><p>Add a Web app and copy the config values</p>
</li>
</ul>
<p>Another pattern I noticed was that AI is very quick to suggest interesting or “modern” infrastructure choices along the way: for example, for video uploads or media handling.</p>
<p>But in practice, thoughtful tradeoff decisions matter much more. Especially at an early stage, when I'm still validating the app idea with a small group of educators, I don't actually need a full-scale video infrastructure. This keeps the system lightweight, reduces implementation complexity, and helps avoid overengineering before the product direction and user needs are fully validated.</p>
<p>The prompt I used reflected this thinking:</p>
<blockquote>
<p>I am in the early validation stage for this app, focusing on feedback from a small group of educators. Therefore, we do not require a scalable and robust video infrastructure yet. Let’s design for easier alternatives, such as users uploading their videos to YouTube and simply copying the URL into a field for embedding on the activity page.</p>
</blockquote>
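<p>To illustrate the "paste a YouTube URL" approach, here is a minimal sketch of turning a pasted link into an embeddable URL. The function name <code>toYouTubeEmbedUrl</code> is hypothetical, and the 11-character video ID format is an assumption based on how YouTube links commonly look. In React Native, the global <code>URL</code> class may need a polyfill.</p>
<pre><code class="language-typescript">// Hypothetical helper: returns an embeddable URL, or null when the
// pasted link is not a recognisable YouTube video link.
function toYouTubeEmbedUrl(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // Not a parseable URL at all.
  }

  const host = url.hostname.replace(/^www\./, '');
  let videoId: string | null = null;

  if (host === 'youtu.be') {
    // Short links: https://youtu.be/VIDEO_ID
    videoId = url.pathname.slice(1);
  } else if (host === 'youtube.com' || host === 'm.youtube.com') {
    if (url.pathname === '/watch') {
      // Standard links: https://www.youtube.com/watch?v=VIDEO_ID
      videoId = url.searchParams.get('v');
    }
  }

  // Assumption: video IDs are 11 characters of letters, digits, "-" or "_".
  if (videoId === null || !/^[\w-]{11}$/.test(videoId)) return null;
  return `https://www.youtube.com/embed/${videoId}`;
}
</code></pre>
<p>Returning <code>null</code> for anything unrecognised lets the form show a validation message instead of saving a link that cannot be embedded.</p>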
<h2 id="heading-machine-translation-and-multilingualism">Machine Translation and Multilingualism</h2>
<p>The earlier version of the app served a multilingual audience, with users from different Indian language communities, and as a small project it was really difficult to get translation coverage for content across all languages. With AI, machine translation is now feasible, at least for popular languages for which training datasets are available.</p>
<p>For the prototyping phase, I'm providing 5 of the world’s most popular languages and 5 popular Indian languages in the language selection. At least for these languages, the machine translation quality is pretty good and AI is reasonably reliable.</p>
<p>Without this, it would have been a cumbersome maintenance effort in the early stages of the app, both to keep translations updated and to recruit contributors to translate content manually.</p>
<p>There are two translation layers in the project: a static layer for interface messages kept in a <code>src/i18n</code> folder, and a dynamic layer for activity content.</p>
<p>For the dynamic part, AI generates translations for activity content using the Google Translate API. This is the same public web endpoint that the Google Translate web widget uses. It's free and no API key is needed.</p>
<p>But the API is unofficial and rate limits aren't guaranteed. For production use, we'll eventually switch to something more commercial and reliable such as the <a href="https://cloud.google.com/translate">Cloud Translation API</a>.</p>
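<p>For the static layer, a lookup with an English fallback could look roughly like this. This is a sketch under assumptions: the catalogue contents, the message keys, and the <code>t</code> helper are all hypothetical, and the real files under <code>src/i18n</code> may be organized differently.</p>
<pre><code class="language-typescript">// Hypothetical message catalogues; the real ones would live under src/i18n.
type Messages = { [key: string]: string };

const catalogues: { [locale: string]: Messages } = {
  en: { 'activity.create': 'Create activity', 'activity.publish': 'Publish' },
  hi: { 'activity.create': 'गतिविधि बनाएं' },
};

// Look up an interface string, falling back to English and then to the
// key itself, so a missing translation never renders as an empty label.
function t(locale: string, key: string): string {
  const localized = catalogues[locale]?.[key];
  if (localized !== undefined) return localized;
  return catalogues['en']?.[key] ?? key;
}
</code></pre>
<p>The fallback chain means a partially translated language is still usable: untranslated labels simply appear in English until someone fills them in.</p>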
<h2 id="heading-create-with-ai-with-humans-in-the-loop">“Create with AI” with Humans in the Loop</h2>
<p>The core idea behind the app is to help educators document and share their creative projects. So making documentation easier through AI while still keeping educators in the loop to maintain ownership over the final published content felt like an essential design choice.</p>
<p>Initially, I experimented with using Claude as a conversational chat partner for activity creation. The idea was that the AI would guide educators through a back-and-forth interaction and gradually build the activity plan through follow-up questions.</p>
<p>But during prototyping, I realized that this often introduced too much friction into the experience. It started to feel like users were being asked too many questions, and the final outputs frequently deviated from the intended structure or became inconsistent across activities.</p>
<p>To make the experience as quick and lightweight as possible, the app now primarily works from a single input. Users can briefly describe their activity idea in natural language and optionally upload media files, after which the AI generates a complete structured activity plan.</p>
<p>Instead of relying on open-ended conversational outputs, the app uses prompts with specific guidelines and schema requirements. The generated output is strictly valid JSON following a predefined structure (for example: 3-6 activity steps, 4-5 facilitation steps depending on complexity, automatic selection of a featured image based on visual relevance, and so on). This allows the generated content to be directly consumed by the app without requiring an additional parsing or mapping layer.</p>
<p>If a user uploads multiple photos, the AI also identifies which images belong to which activity steps. The experience works somewhat similarly to Facebook’s “<a href="https://about.fb.com/news/2026/03/facebook-marketplace-new-meta-ai-tools-make-selling-faster-and-easier/">Create your listing with Meta AI</a>” feature. Users can upload different types of media files, after which the AI generates titles, materials, objectives, activity steps, and facilitation tips.</p>
<p>Importantly, everything remains editable before publishing, so educators can review, refine, and personalize the final content before sharing it with the community.</p>
<pre><code class="language-json">Return ONLY valid JSON (no markdown, no backticks) matching this exact schema:
{
  "title": "string (catchy, max 60 chars)",
  "description": "string (2-3 sentences, educator-facing)",
  "duration_minutes": number,
  "min_age": number,
  "max_age": number or null,
  "category": one of "Art" | "Science" | "Coding" | "Circuits" | "Engineering" | "Storytelling" | "Drama" | "Film" | "Music" | "Nature",
  "materials": [{ "name": "string", "buy_hint": "string (where to find it, e.g. craft store, hardware store, recycled)" }],
  "objectives": ["string (learning objective)"],
  "steps": [{
    "number": 1,
    "title": "string",
    "description": "string (2-3 sentences, detailed instructions for the educator)",
    "duration_minutes": number,
    "tip": "string (practical facilitation tip for educators running this step for the first time)",
    "assignedPhotoIndex": number or null
  }],
  "featured_image_index": number or null,
  "tips": ["string (general facilitation tip)"]
}
</code></pre>
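<p>Even with a strict prompt, it seems prudent to check the model's reply before handing it to the UI. The following is a minimal sketch of such a check, covering only a few of the schema's rules (title length, category, and the 3-6 step range). The names <code>parseActivity</code> and <code>ParseResult</code> are hypothetical and not taken from the app's source.</p>
<pre><code class="language-typescript">const CATEGORIES = ['Art', 'Science', 'Coding', 'Circuits', 'Engineering',
  'Storytelling', 'Drama', 'Film', 'Music', 'Nature'];

type ParseResult =
  | { ok: true; activity: { [key: string]: unknown } }
  | { ok: false; reason: string };

// Hypothetical helper: parse the raw model reply and check a few schema rules.
function parseActivity(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    // Covers replies wrapped in markdown fences or cut off mid-object.
    return { ok: false, reason: 'not valid JSON' };
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return { ok: false, reason: 'expected a JSON object' };
  }
  const activity = data as { [key: string]: unknown };
  const title = activity['title'];
  const category = activity['category'];
  const steps = activity['steps'];

  if (typeof title !== 'string' || title.length > 60) {
    return { ok: false, reason: 'title must be a string of at most 60 characters' };
  }
  if (typeof category !== 'string' || CATEGORIES.indexOf(category) === -1) {
    return { ok: false, reason: 'unknown category' };
  }
  if (!Array.isArray(steps) || 3 > steps.length || steps.length > 6) {
    return { ok: false, reason: 'expected 3-6 steps' };
  }
  return { ok: true, activity };
}
</code></pre>
<p>A failed check can trigger a single regeneration or fall back to the manual form, so a malformed reply never reaches the editing screen.</p>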
<h2 id="heading-optimizing-for-low-bandwidth">Optimizing for Low Bandwidth</h2>
<p>Because the app is intended for users on low-bandwidth networks, I gave Claude these constraints during the initial development phase so that the prototype included only the bare minimum needed to support users on slower connections.</p>
<p>The app loads 10 activities at a time, and uses the Expo module <code>expo-image-manipulator</code> for lightweight image processing tasks such as resizing photos to 1200 px and re-encoding them as JPEGs before upload. As a result, a typical 3–5 MB image can be reduced to ~200 KB.</p>
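<p>The resize decision can be sketched as a small pure function. This assumes the 1200 px limit applies to the longest edge and that small photos are left alone. The helper name <code>planResize</code> is hypothetical. The returned actions use the <code>{ resize: { width } }</code> shape that <code>expo-image-manipulator</code> accepts, though the exact call the app makes isn't shown here.</p>
<pre><code class="language-typescript">type ResizeAction = { resize: { width?: number; height?: number } };

// Hypothetical helper: decide how to shrink a photo so that its longest
// edge is at most `maxEdge` pixels. Returns no actions when the photo is
// already small enough, so images are never upscaled.
function planResize(width: number, height: number, maxEdge = 1200): ResizeAction[] {
  const longest = Math.max(width, height);
  if (maxEdge >= longest) return [];
  // Constrain only the longer edge; the other edge is expected to scale
  // proportionally when just one dimension is given.
  return width >= height
    ? [{ resize: { width: maxEdge } }]
    : [{ resize: { height: maxEdge } }];
}
</code></pre>
<p>The resulting list would then be passed to the image manipulator along with the JPEG output settings.</p>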
<p>The AI calls are also kept text-only. While images are uploaded and stored in Firebase, they're never sent to the model itself, which helps keep requests lightweight and responsive even on slower internet connections.</p>
<h2 id="heading-producing-a-demo-video">Producing a Demo Video</h2>
<p>Finally, this was probably the most fun part of the process. Before my first demo meeting with an educator, I managed to produce a roughly 1-minute demo video from a 4-minute screen recording. I used Claude to identify and cut the most relevant segments, and the <code>ffmpeg</code> command-line tool to convert the final output into the appropriate format.</p>
<p>After trying out numerous AI video generation tools that would exhaust my tokens pretty quickly, I eventually found myself coming back to Claude for this workflow, and it ended up working surprisingly well. 🙂</p>
<h2 id="heading-summary">Summary</h2>
<p>A little over a year ago, I had started implementing a similar version of this app, but never reached a functional prototype. The tooling was still evolving, and I often found myself stuck in loops of agentic errors, spending more time debugging the AI workflow itself than actually building the product. With the recent advancements in AI-assisted development tools, it has genuinely felt empowering to shape and prototype ideas much more quickly.</p>
<p>At the same time, one of the biggest lessons from this experience was that you can't blindly build applications using AI tools. You can't simply ask an agent to do all the work and make decisions while you go on a hike –&nbsp;though perhaps you can do the dishes between prompts.</p>
<p>Each step still needs careful evaluation. The reasoning, suggestions, and discussions generated by the agent need to be read, understood, and refined through follow-up prompts and human input.</p>
<p>A large part of the work involves making thoughtful decisions along the way: what model to choose and why, what tradeoffs matter most for your use case (cost, geography, latency, reasoning capability, multilingual support), what infrastructure choices make sense for hosting and scalability, what API integrations are appropriate, and what decisions should be optimized for the early stages of the app versus long-term growth.</p>
<p>Similarly, design decisions should be grounded in the actual needs and contexts of users rather than simply following what AI tools suggest by default.</p>
<p>That's ultimately what I have tried to document through this article: not just how the educational app was built, but also the reasoning and tradeoffs behind the technical and design decisions made throughout the process.</p>
<p>Hopefully, these reflections are useful to others experimenting with AI-assisted development, especially in educational or community-centered contexts.</p>
<p>And if you have ideas for evolving the app further, feel free to contribute or comment on GitHub :)</p>
<h3 id="heading-resources">Resources</h3>
<ul>
<li><p><a href="https://github.com/unstructuredstudio/zubhub-mobile">Check out the source code repository</a></p>
</li>
<li><p><a href="https://drive.google.com/file/d/1W0rr9O6dBw9_9yCcgufVdCf8nbydHAo1/view?usp=sharing">Watch a demo of the final workflow</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What “Production-Ready” Actually Means in Flutter  ]]>
                </title>
                <description>
                    <![CDATA[ I've been building Flutter apps for a few years now, and I still remember the first time I shipped something I was genuinely proud of. It had a clean UI, smooth animations, and every flow worked exact ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-production-ready-actually-means-in-flutter/</link>
                <guid isPermaLink="false">6a206c1a2a223bf98b13f071</guid>
                
                    <category>
                        <![CDATA[ Flutter ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Dart ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mobile Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Android ]]>
                    </category>
                
                    <category>
                        <![CDATA[ iOS ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Gidudu Nicholas ]]>
                </dc:creator>
                <pubDate>Wed, 03 Jun 2026 18:02:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/82dd0caa-f57c-447b-9a20-4e49f40898f7.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>I've been building Flutter apps for a few years now, and I still remember the first time I shipped something I was genuinely proud of. It had a clean UI, smooth animations, and every flow worked exactly as I intended. I handed it to real users and felt good about it.</p>
<p>Within a week, the bug reports started coming in.</p>
<p>Screens froze. API calls failed silently. Users lost form data they'd spent ten minutes filling out. One user reported that the app just... stopped responding after they walked through a tunnel on the subway. I had never tested that. Why would I? It worked fine on my machine.</p>
<p>That experience taught me something I wish someone had told me earlier: there's a real gap between an app that works and an app that is production-ready.</p>
<p>I've now shipped multiple Flutter apps, and I've hit almost every wall this article covers — network failures, memory leaks, state management that made sense at first and became a nightmare at scale, and performance that felt fine in development and janked badly on a user's old device.</p>
<p>This article is everything I've learned from those experiences. Not theory, but actual patterns that came from actual problems.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-why-it-works-on-my-machine-is-dangerous-in-flutter">Why "It Works on My Machine" is Dangerous in Flutter</a></p>
</li>
<li><p><a href="#heading-development-vs-production-what-actually-changes">Development vs Production: What Actually Changes</a></p>
</li>
<li><p><a href="#heading-network-reliability-and-defensive-request-handling">Network Reliability and Defensive Request Handling</a></p>
</li>
<li><p><a href="#heading-retry-logic-and-the-production-request-lifecycle">Retry Logic and the Production Request Lifecycle</a></p>
</li>
<li><p><a href="#heading-offline-support-and-local-persistence">Offline Support and Local Persistence</a></p>
</li>
<li><p><a href="#heading-state-management-at-scale">State Management at Scale</a></p>
</li>
<li><p><a href="#heading-widget-rebuilds-and-rendering-performance">Widget Rebuilds and Rendering Performance</a></p>
</li>
<li><p><a href="#heading-async-pitfalls-and-the-disposed-widget-problem">Async Pitfalls and the Disposed Widget Problem</a></p>
</li>
<li><p><a href="#heading-memory-leaks-and-lifecycle-management">Memory Leaks and Lifecycle Management</a></p>
</li>
<li><p><a href="#heading-observability-and-crash-reporting">Observability and Crash Reporting</a></p>
</li>
<li><p><a href="#heading-testing-production-flutter-apps">Testing Production Flutter Apps</a></p>
</li>
<li><p><a href="#heading-architecture-and-long-term-maintainability">Architecture and Long-Term Maintainability</a></p>
</li>
<li><p><a href="#heading-end-to-end-example-a-production-grade-profile-feature">End-to-End Example: a Production-Grade Profile Feature</a></p>
</li>
<li><p><a href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h2 id="heading-why-it-works-on-my-machine-is-dangerous-in-flutter">Why "It Works on My Machine" is Dangerous in Flutter</h2>
<p>Here's what your development environment looks like: fast internet, a powerful machine or emulator, a clean app state on every hot reload, APIs that respond in milliseconds, and you, a careful developer who deliberately follows the happy path.</p>
<p>Here's what your users look like: spotty mobile data, old mid-range devices, six other apps running in the background, and zero patience for a screen that stops loading without explanation.</p>
<p>That gap is where production bugs live.</p>
<p>The tricky part is that Flutter makes development feel so smooth that it's easy to mistake "works on my machine" for "ready for users."</p>
<p>I've made that mistake. Most Flutter developers I know have made it too. The app looks polished. The animations are butter. You demo it to a colleague, and everything goes perfectly. Then someone tries to use it while commuting on patchy mobile data, and the whole thing falls apart.</p>
<p>Production-ready Flutter engineering starts with accepting one uncomfortable truth: things will go wrong. Networks will fail. Devices will run low on memory. Users will background your app at the worst possible moment. The question isn't whether these things happen, but rather whether your app handles them gracefully when they do.</p>
<h2 id="heading-development-vs-production-what-actually-changes">Development vs Production: What Actually Changes</h2>
<p>I want to be specific here because "production is different" is easy to say and hard to internalize until you've been burned by it.</p>
<p>In development, a failed API call is something you notice immediately in your terminal, fix in a few minutes, and move on from. In production, that same failed API call happens to a user who sees a blank screen, has no idea why, waits a few seconds, and then either retries or uninstalls. You find out three days later when someone leaves a one-star review.</p>
<p>In development, a widget that rebuilds unnecessarily costs a few milliseconds you never feel. In production, on an older or lower-powered device with several apps running in the background, that same unnecessary rebuild is the thing that pushes a frame over the roughly 16ms budget of a 60fps display and creates a stutter the user notices.</p>
<p>In development, a memory leak that adds 5MB of usage over ten minutes is invisible. I once had a leak in a chat feature, an undisposed stream subscription that was completely undetectable during testing. In production, after an hour of use on a low-memory device, the OS started killing the app mid-session. Users thought it was crashing randomly. It took me an embarrassingly long time to track down.</p>
<p>The pattern is always the same: problems that are invisible at development scale become significant at production scale, and problems that are minor on development hardware become severe on the hardware your actual users own.</p>
<h2 id="heading-network-reliability-and-defensive-request-handling">Network Reliability and Defensive Request Handling</h2>
<p>If I had to pick one category of bug that has bitten me the most across multiple apps, it would be this one. Mobile networks are genuinely unreliable, and Flutter apps are often written as though they're not.</p>
<p>The most common networking pattern I see (and wrote myself for longer than I'd like to admit) looks like this:</p>
<pre><code class="language-dart">final response = await dio.get('/user');

setState(() {
  user = response.data;
});
</code></pre>
<p>This works perfectly in development. But it has four ways to fail in production:</p>
<ol>
<li><p>The request fails due to a network error, and the exception propagates unhandled</p>
</li>
<li><p>The user navigates away before the response arrives and <code>setState</code> is called on a disposed widget</p>
</li>
<li><p>The API returns unexpected data, and the cast throws at runtime</p>
</li>
<li><p>The request hangs indefinitely, and the user stares at a spinner forever</p>
</li>
</ol>
<p>I've hit all four. Here's a version that handles them:</p>
<pre><code class="language-dart">Future&lt;void&gt; loadUser(String userId) async {
  setState(() {
    isLoading = true;
    error = null;
  });

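  try {
    // Cap the total wait so a hung request cannot leave the spinner
    // up forever. On expiry this throws a TimeoutException, which the
    // generic catch below turns into an error state. The 15 seconds
    // here is an illustrative value, not a recommendation.
    final response =
        await dio.get('/user/$userId').timeout(const Duration(seconds: 15));

    // mounted checks whether this widget is still in the widget tree.
    // If the user navigated away while the request was running,
    // mounted is false. Calling setState on a disposed widget throws
    // an error; this one line prevents that entire class of crash.
    if (!mounted) return;

    setState(() {
      user = User.fromJson(response.data as Map&lt;String, dynamic&gt;);
      isLoading = false;
    });
  } on DioException catch (e) {
    if (!mounted) return;

    setState(() {
      // Give the user a message that is actually useful.
      // "Something went wrong" is not helpful. Knowing whether
      // they have no internet vs the server failed lets them
      // decide whether to move or wait.
      error = e.type == DioExceptionType.connectionError
          ? 'No internet connection. Please try again.'
          : 'Failed to load profile. Please try again.';
      isLoading = false;
    });
  } catch (_) {
    // Anything else: a timeout, or a response whose shape did not
    // match what the cast and User.fromJson expect.
    if (!mounted) return;

    setState(() {
      error = 'Could not load profile. Please try again.';
      isLoading = false;
    });
  }
}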
</code></pre>
<h3 id="heading-the-three-states-every-screen-needs">The Three States Every Screen Needs</h3>
<p>I used to design screens for the success case and treat loading and error as afterthoughts. That was a mistake. Every screen that fetches remote data needs all three:</p>
<pre><code class="language-dart">@override
Widget build(BuildContext context) {
  // Loading: never leave users staring at a blank screen.
  // A spinner tells them something is happening.
  if (isLoading) {
    return const Center(child: CircularProgressIndicator());
  }

  // Error: show what went wrong and how to recover.
  // A dead end with no retry button is one of the most
  // frustrating things a user can experience.
  if (error != null) {
    return Center(
      child: Column(
        mainAxisSize: MainAxisSize.min,
        children: [
          Text(error!, style: const TextStyle(color: Colors.red)),
          const SizedBox(height: 16),
          ElevatedButton(
            onPressed: () =&gt; loadUser(widget.userId),
            child: const Text('Try again'),
          ),
        ],
      ),
    );
  }

  // Success: show the data.
  return UserProfileView(user: user!);
}
</code></pre>
<p>The error state with a retry button isn't a nice-to-have. It's the difference between a user who recovers from a network hiccup and a user who thinks your app is broken.</p>
<h2 id="heading-retry-logic-and-the-production-request-lifecycle">Retry Logic and the Production Request Lifecycle</h2>
<p>Mobile networks fail temporarily all the time. A user walks past a dead zone, enters an elevator, or switches from WiFi to mobile data mid-request. The request fails, but if it were retried two seconds later, it would succeed.</p>
<p>Without retry logic, every temporary network failure is a permanent failure from the user's perspective. That's a bad trade.</p>
<pre><code class="language-dart">Future&lt;T&gt; withRetry&lt;T&gt;(
  Future&lt;T&gt; Function() request, {
  int maxAttempts = 3,
  Duration delay = const Duration(seconds: 1),
}) async {
  for (int i = 0; i &lt; maxAttempts; i++) {
    try {
      return await request();
    } catch (e) {
      // On the final attempt, stop retrying and let the
      // error propagate to the caller.
      if (i == maxAttempts - 1) rethrow;

      // Wait before trying again. This gives temporary network
      // issues time to resolve and avoids hammering a server
      // that might already be struggling.
      await Future.delayed(delay);
    }
  }

  throw Exception('Retry failed');
}
</code></pre>
<p>Usage is straightforward:</p>
<pre><code class="language-dart">// dio.get returns a Response, not a parsed model.
final response = await withRetry(
  () =&gt; dio.get('/user/$userId'),
  maxAttempts: 3,
  delay: const Duration(seconds: 2),
);
</code></pre>
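<p>For production apps with heavier traffic, look at <code>dio_smart_retry</code>. It plugs into Dio as an interceptor and lets you set the delay before each retry, so you can back off progressively, for example by doubling the wait each time (exponential backoff). That is much more considerate of server load during actual outages than retrying at a fixed interval.</p>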
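<p>To make the doubling concrete, here is a minimal sketch of exponential backoff in plain Dart. It isn't how <code>dio_smart_retry</code> is implemented; <code>backoffDelay</code> and <code>withBackoff</code> are hypothetical names, and the one-second base and 30-second cap are assumptions you'd tune for your own backend.</p>
<pre><code class="language-dart">import 'dart:async';

/// Hypothetical helper: the wait before retry number [attempt]
/// (0-based), doubling from [base] and capped at [max].
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(seconds: 1),
  Duration max = const Duration(seconds: 30),
}) {
  final delay = base * (1 &lt;&lt; attempt);
  return delay &gt; max ? max : delay;
}

/// Hypothetical variant of withRetry that waits longer after
/// each failed attempt instead of using a fixed delay.
Future&lt;T&gt; withBackoff&lt;T&gt;(
  Future&lt;T&gt; Function() request, {
  int maxAttempts = 4,
}) async {
  for (var attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (_) {
      // Out of attempts: let the error reach the caller.
      if (attempt &gt;= maxAttempts - 1) rethrow;
      await Future&lt;void&gt;.delayed(backoffDelay(attempt));
    }
  }
}
</code></pre>
<p>With the defaults, the waits are 1s, 2s, and 4s before the fourth and final attempt. The cap keeps a long outage from producing absurdly long delays.</p>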
<h2 id="heading-offline-support-and-local-persistence">Offline Support and Local Persistence</h2>
<p>I learned to take offline support seriously after an embarrassing support ticket. A user had filled out a long onboarding form (15 fields), which took them several minutes, and hit submit on a spotty connection. The request failed. The form cleared. All their data was gone. They were furious, and honestly, they had every right to be.</p>
<p>The goal of offline support is not to replicate every feature without internet. It's to make sure users don't lose progress and don't hit dead ends.</p>
<h3 id="heading-caching-remote-data">Caching Remote Data</h3>
<p>The strategy here is simple: every time a network request succeeds, save the result locally. Then, if the next request fails, serve what you saved last time instead of showing an error screen.</p>
<pre><code class="language-dart">class UserRepository {
  final Dio _dio;
  final Box _cache; // Hive box

  UserRepository(this._dio, this._cache);

  Future&lt;User&gt; getUser(String userId) async {
    try {
      final response = await _dio.get('/user/$userId');
      final user = User.fromJson(response.data as Map&lt;String, dynamic&gt;);

      // Save fresh data to the cache every time a request succeeds.
      // This means the next request can fall back to this
      // if the network is unavailable.
      await _cache.put('user_$userId', user.toJson());

      return user;
    } catch (e) {
      // Network failed. See if we have something cached.
      final cached = _cache.get('user_$userId');

      if (cached != null) {
        // Stale data is better than an error screen.
        // The user sees something useful even without internet.
        return User.fromJson(Map&lt;String, dynamic&gt;.from(cached));
      }

      // Nothing cached. We have no choice but to surface the error.
      rethrow;
    }
  }
}
</code></pre>
<h3 id="heading-preserving-user-input">Preserving User Input</h3>
<p>This is the fix for the onboarding ticket I mentioned:</p>
<pre><code class="language-dart">// When the screen opens, restore any saved draft, then start
// saving whatever the user types whenever the field changes.
@override
void initState() {
  super.initState();
  final draft = _cache.get('draft_post') as String?;
  if (draft != null &amp;&amp; draft.isNotEmpty) {
    _contentController.text = draft;
  }
  // Registered after the restore so that restoring the draft
  // does not trigger a redundant save.
  _contentController.addListener(() async {
    await _cache.put('draft_post', _contentController.text);
  });
}

// Clear the draft once the user successfully submits.
Future&lt;void&gt; _submit() async {
  await _repository.createPost(_contentController.text);
  await _cache.delete('draft_post');
}
</code></pre>
<p>A handful of lines that save users from losing their work. This is worth doing in any form that takes more than a minute to fill out.</p>
<p>Packages I use for local persistence:</p>
<ol>
<li><p><strong>Hive</strong> for simple key-value storage</p>
</li>
<li><p><strong>Isar</strong> when I need more powerful queries</p>
</li>
<li><p><strong>sqflite</strong> for relational data</p>
</li>
<li><p><strong>shared_preferences</strong> strictly for user settings, not for anything substantial</p>
</li>
</ol>
<h2 id="heading-state-management-at-scale">State Management at Scale</h2>
<p><code>setState</code> is fine. I want to say that clearly because there's a tendency in the Flutter community to treat it like it's always wrong. For local, simple UI state (a button toggling, a form field showing validation), <code>setState</code> is exactly the right tool.</p>
<p>The problems start when you use it for state that multiple widgets depend on, or for async operations, or for anything that needs to survive navigation. I've done all of these. Here's what goes wrong:</p>
<pre><code class="language-dart">// This setState call lives high in the widget tree.
// Every widget below it rebuilds — including expensive ones
// that have nothing to do with this state change.
setState(() {
  currentUser = updatedUser;
});
</code></pre>
<p>As the app grows, this gets worse. Rebuilds spread. Side effects happen in unexpected order. You start spending more time debugging state than building features.</p>
<h3 id="heading-moving-to-riverpod">Moving to Riverpod</h3>
<p>After hitting these walls in my second app, I switched to Riverpod and haven't looked back. The core idea is simple: state lives outside widgets, and widgets subscribe to exactly the state they need.</p>
<pre><code class="language-dart">@riverpod
class UserNotifier extends _$UserNotifier {
  @override
  AsyncValue&lt;User&gt; build(String userId) {
    _load();
    return const AsyncValue.loading();
  }

  Future&lt;void&gt; _load() async {
    state = const AsyncValue.loading();

    // AsyncValue.guard runs the future and wraps the result
    // in AsyncValue.data on success or AsyncValue.error on failure.
    // It saves you from writing try/catch every single time.
    state = await AsyncValue.guard(
      () =&gt; ref.read(userRepositoryProvider).getUser(userId),
    );
  }

  Future&lt;void&gt; refresh() =&gt; _load();
}
</code></pre>
<p>In the widget:</p>
<pre><code class="language-dart">@override
Widget build(BuildContext context) {
  // ref.watch subscribes this widget to the notifier.
  // It rebuilds only when userAsync changes — not when
  // unrelated state elsewhere in the app changes.
  final userAsync = ref.watch(userNotifierProvider(widget.userId));

  return userAsync.when(
    // when() forces you to handle loading, error, and data.
    // Miss one and it's a compile error, not a runtime surprise.
    loading: () =&gt; const CircularProgressIndicator(),
    error: (e, _) =&gt; Text('Error: $e'),
    data: (user) =&gt; UserProfileView(user: user),
  );
}
</code></pre>
<p>The part I appreciate most: <code>when()</code> makes it a compile error to forget the loading or error state. The compiler enforces what I used to forget.</p>
<h3 id="heading-immutable-state">Immutable State</h3>
<p>One thing that burned me hard in a real-time chat feature: a mutable list shared across multiple parts of the app.</p>
<pre><code class="language-dart">List&lt;Message&gt; messages = [];

// Later, in different places:
messages.add(newMessage);       // socket handler
messages.removeAt(0);          // pagination
messages.insert(0, pinned);    // push notification handler
</code></pre>
<p>When a message appeared twice, or disappeared at random, tracing which mutation caused it was genuinely painful. The fix is to never mutate and always create a new list:</p>
<pre><code class="language-dart">// The old list is unchanged. The new state is a new list.
// Every change is explicit and traceable.
state = [...state, newMessage];
</code></pre>
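<p>The same rule covers the other two mutations from the earlier snippet. A small sketch in plain Dart, with hypothetical helper names, where each operation returns a new list and leaves the old one alone:</p>
<pre><code class="language-dart">// Each helper builds a new list instead of changing its input.
List&lt;T&gt; appended&lt;T&gt;(List&lt;T&gt; list, T item) =&gt; [...list, item];
List&lt;T&gt; withoutFirst&lt;T&gt;(List&lt;T&gt; list) =&gt; list.skip(1).toList();
List&lt;T&gt; prepended&lt;T&gt;(List&lt;T&gt; list, T item) =&gt; [item, ...list];

void main() {
  // An unmodifiable list throws if anything tries to mutate it,
  // which proves the helpers never touch their input.
  final before = List&lt;String&gt;.unmodifiable(['a', 'b']);

  final after = prepended(withoutFirst(appended(before, 'c')), 'pinned');

  print(before); // [a, b]
  print(after); // [pinned, b, c]
}
</code></pre>
<p>In a notifier, each of these becomes a single assignment to <code>state</code>, so every change shows up as one explicit line you can search for.</p>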
<p>It feels like a small thing until you spend two hours debugging a mutation bug. Then it feels very important.</p>
<h2 id="heading-widget-rebuilds-and-rendering-performance">Widget Rebuilds and Rendering Performance</h2>
<p>Flutter is fast. But unnecessary rebuilds accumulate, and on low-end devices the accumulation is noticeable.</p>
<h3 id="heading-const-widgets-skip-rebuilds-entirely">Const Widgets Skip Rebuilds Entirely</h3>
<p>The <code>const</code> keyword tells Dart this widget can be created at compile time and reused indefinitely. Any widget whose content will never change is a candidate.</p>
<pre><code class="language-dart">// Without const: a new Text instance is created on every
// rebuild of the parent, even though the content never changes.
Text('Welcome to the app')

// With const: Flutter reuses the same instance.
// No rebuild work, no allocation.
const Text('Welcome to the app')
</code></pre>
<p>This sounds like a small thing. In a large widget tree with many static elements, the cumulative effect is real. Make it a habit.</p>
<h3 id="heading-keep-the-rebuild-scope-small">Keep the Rebuild Scope Small</h3>
<p>When <code>setState</code> lives high in the widget tree, every widget below it rebuilds — even ones that have nothing to do with the state that changed. The fix is to push state as far down the tree as possible, ideally into its own extracted widget.</p>
<pre><code class="language-dart">// The problem: counter lives in the parent, so every
// setState call rebuilds the entire subtree — including
// ExpensiveListWidget, which has nothing to do with the counter.
class _BadExampleState extends State&lt;BadExample&gt; {
  int _counter = 0;

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        Text('Count: $_counter'),
        ElevatedButton(
          onPressed: () =&gt; setState(() =&gt; _counter++),
          child: const Text('Increment'),
        ),
        ExpensiveListWidget(), // a new instance each build, so it rebuilds too
      ],
    );
  }
}
</code></pre>
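<p>A sketch of the extracted version follows. <code>CounterSection</code> and <code>GoodExample</code> are illustrative names, and the <code>ExpensiveListWidget</code> here is only a stand-in for whatever costly widget you actually have.</p>
<pre><code class="language-dart">import 'package:flutter/material.dart';

// Stand-in for the expensive widget from the example above.
class ExpensiveListWidget extends StatelessWidget {
  const ExpensiveListWidget({super.key});

  @override
  Widget build(BuildContext context) =&gt;
      const Text('Imagine an expensive list here');
}

// The counter owns its own state, so setState here rebuilds
// only this small subtree.
class CounterSection extends StatefulWidget {
  const CounterSection({super.key});

  @override
  State&lt;CounterSection&gt; createState() =&gt; _CounterSectionState();
}

class _CounterSectionState extends State&lt;CounterSection&gt; {
  int _counter = 0;

  @override
  Widget build(BuildContext context) {
    return Column(
      mainAxisSize: MainAxisSize.min,
      children: [
        Text('Count: $_counter'),
        ElevatedButton(
          onPressed: () =&gt; setState(() =&gt; _counter++),
          child: const Text('Increment'),
        ),
      ],
    );
  }
}

// The parent holds no state at all, so it can be stateless
// and its children can be const.
class GoodExample extends StatelessWidget {
  const GoodExample({super.key});

  @override
  Widget build(BuildContext context) {
    return const Column(
      children: [
        CounterSection(),
        ExpensiveListWidget(),
      ],
    );
  }
}
</code></pre>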
<p>Once the counter text and its button live in their own small <code>StatefulWidget</code>, only that widget rebuilds when the count changes. <code>ExpensiveListWidget</code> is untouched.</p>
<h3 id="heading-listviewbuilder-for-anything-of-unknown-length">ListView.builder for Anything of Unknown Length</h3>
<p>A <code>Column</code> with a mapped list builds every item upfront regardless of whether it is visible. On a list of 200 items, that is 200 widgets created before the user has scrolled at all.</p>
<pre><code class="language-dart">// This builds every single item widget upfront.
// With 200 items, 200 widgets are created on first render,
// most of which are immediately off-screen.
Column(
  children: items.map((item) =&gt; ItemCard(item: item)).toList(),
)

// This builds only what is visible, plus a small buffer.
// Scrolling through 10,000 items keeps about as many item
// widgets alive as a list of 10.
ListView.builder(
  itemCount: items.length,
  itemBuilder: (context, index) {
    return ItemCard(item: items[index]);
  },
)
</code></pre>
<p><code>ListView.builder</code> isn't an optimization for large lists. It's the correct default for any list of unknown or variable size. I use <code>Column</code> with a mapped list only when I know for certain the list will always be tiny.</p>
<h2 id="heading-async-pitfalls-and-the-disposed-widget-problem">Async Pitfalls and the Disposed Widget Problem</h2>
<p>This is one of those bugs that's completely invisible during development and shows up constantly in production.</p>
<p>The scenario: an async operation starts, the user navigates away before it finishes, and the operation completes and tries to call <code>setState</code> on a widget that no longer exists.</p>
<pre><code class="language-dart">Future&lt;void&gt; _loadData() async {
  final data = await repository.fetchData();

  // If the user navigated away during the await above,
  // this widget is gone. setState throws:
  // "setState() called after dispose()"
  setState(() =&gt; this.data = data);
}
</code></pre>
<p>The fix is one line:</p>
<pre><code class="language-dart">Future&lt;void&gt; _loadData() async {
  final data = await repository.fetchData();

  // mounted is true while the widget is in the tree,
  // false after dispose() has been called.
  if (!mounted) return;

  setState(() =&gt; this.data = data);
}
</code></pre>
<p>I now write this check automatically after every <code>await</code> that leads to a <code>setState</code>. It becomes muscle memory quickly.</p>
<h3 id="heading-never-create-futures-inside-build">Never Create Futures Inside Build</h3>
<p>This is an easy-to-overlook issue. When you create a Future directly inside the <code>build</code> method, a new Future is created on every rebuild — meaning <code>FutureBuilder</code> treats it as a brand new operation each time and resets to the loading state unnecessarily.</p>
<pre><code class="language-dart">// Bad: a new Future is created on every rebuild.
// FutureBuilder sees a different Future each time
// and resets to loading state unnecessarily.
@override
Widget build(BuildContext context) {
  return FutureBuilder(
    future: repository.fetchUser(userId), // new Future every build
    builder: (context, snapshot) { ... },
  );
}
</code></pre>
<pre><code class="language-dart">// Good: create the Future once in initState.
// FutureBuilder holds the same reference across rebuilds.
late final Future&lt;User&gt; _userFuture;

@override
void initState() {
  super.initState();
  _userFuture = repository.fetchUser(widget.userId);
}

@override
Widget build(BuildContext context) {
  return FutureBuilder(
    future: _userFuture,
    builder: (context, snapshot) { ... },
  );
}
</code></pre>
<h3 id="heading-move-heavy-work-off-the-ui-thread">Move Heavy Work Off the UI Thread</h3>
<p>Flutter runs your Dart code, including building the UI, on the main isolate. Anything CPU-intensive that blocks it causes dropped frames.</p>
<pre><code class="language-dart">// Parsing a large API response synchronously on the main isolate
// can block rendering for 50-200ms on slower devices.
final users = (response.data as List)
    .map((json) =&gt; User.fromJson(json))
    .toList();
</code></pre>
<pre><code class="language-dart">// compute() runs the function in a separate isolate.
// The main isolate stays free to render frames.
// Note: prefer a top-level or static function here. A closure
// brings its captured state along, and anything that crosses
// into another isolate has to be sendable.
final users = await compute(parseUsers, response.data);

List&lt;User&gt; parseUsers(dynamic data) {
  return (data as List)
      .map((json) =&gt; User.fromJson(json as Map&lt;String, dynamic&gt;))
      .toList();
}
</code></pre>
<p>I reach for <code>compute</code> whenever I am parsing a large JSON response, doing image processing, or running anything that feels slow in a quick profile. The threshold in my head is roughly 16ms, the budget for a single frame at 60fps. If an operation might take longer than that, it shouldn't be on the main isolate.</p>
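<p>If you're on Dart 2.19 or later, <code>Isolate.run</code> from <code>dart:isolate</code> covers the same ground without depending on Flutter. A minimal sketch, assuming a made-up JSON payload and a hypothetical <code>parseNames</code> function:</p>
<pre><code class="language-dart">import 'dart:convert';
import 'dart:isolate';

// Hypothetical parser: decodes a JSON array of objects and
// pulls out each "name" field.
List&lt;String&gt; parseNames(String body) {
  final decoded = jsonDecode(body) as List&lt;dynamic&gt;;
  return [
    for (final item in decoded)
      (item as Map&lt;String, dynamic&gt;)['name'] as String,
  ];
}

Future&lt;void&gt; main() async {
  const body = '[{"name": "Ada"}, {"name": "Linus"}]';

  // The closure runs in a fresh isolate. Only the result is
  // sent back, so the calling isolate stays free meanwhile.
  final names = await Isolate.run(() =&gt; parseNames(body));

  print(names); // [Ada, Linus]
}
</code></pre>
<p>For a payload this small the isolate startup costs more than the parsing itself. It only pays off when the work is heavy enough to threaten a frame.</p>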
<h2 id="heading-memory-leaks-and-lifecycle-management">Memory Leaks and Lifecycle Management</h2>
<p>This one cost me the most debugging time across all the apps I've shipped. Memory leaks in Flutter don't crash immediately. They build slowly — a few megabytes per session, every session — until the app starts feeling heavy, the OS starts killing it in the background, and users file bug reports about "random crashes."</p>
<p>The root cause is almost always the same: something created inside a widget keeps running after the widget is gone.</p>
<h3 id="heading-controllers-that-are-never-disposed">Controllers That Are Never Disposed</h3>
<p>The most common source of memory leaks I've seen, including in my own code, is controllers that are created in <code>initState</code> and never released. Flutter doesn't clean these up automatically.</p>
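<pre><code class="language-dart">// SingleTickerProviderStateMixin is what makes `vsync: this` valid.
class _ProfileScreenState extends State&lt;ProfileScreen&gt;
    with SingleTickerProviderStateMixin {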
  late final TextEditingController _nameController;
  late final AnimationController _fadeController;
  late final ScrollController _scrollController;

  @override
  void initState() {
    super.initState();
    _nameController = TextEditingController();
    _fadeController = AnimationController(
      vsync: this,
      duration: const Duration(milliseconds: 300),
    );
    _scrollController = ScrollController();
  }

  @override
  void dispose() {
    // Every controller created in initState needs to be
    // disposed here. This is not optional — it releases
    // native resources and removes listeners that would
    // otherwise keep this widget's memory alive indefinitely.
    _nameController.dispose();
    _fadeController.dispose();
    _scrollController.dispose();
    super.dispose(); // always last
  }
}
</code></pre>
<p>An undisposed <code>AnimationController</code> is particularly bad. While it is animating (a repeating animation, for example), its ticker fires on every frame, so it keeps consuming CPU even after the screen it belonged to is gone. I've seen this cause noticeable battery drain in addition to memory issues.</p>
<h3 id="heading-stream-subscriptions">Stream Subscriptions</h3>
<pre><code class="language-dart">class _ChatScreenState extends State&lt;ChatScreen&gt; {
  StreamSubscription&lt;Message&gt;? _messageSubscription;

  @override
  void initState() {
    super.initState();
    _messageSubscription = messageStream.listen((message) {
      // Without cancellation, this callback keeps firing
      // even after the screen is removed from the tree.
      // It will call setState on a disposed widget and
      // hold message objects in memory that should be freed.
      if (mounted) setState(() =&gt; messages.add(message));
    });
  }

  @override
  void dispose() {
    _messageSubscription?.cancel();
    super.dispose();
  }
}
</code></pre>
<h3 id="heading-timers">Timers</h3>
<pre><code class="language-dart">@override
void dispose() {
  // A timer that fires after dispose will try to run
  // a callback on a widget that no longer exists.
  _dismissTimer?.cancel();
  super.dispose();
}
</code></pre>
<p>A rule I follow without exception: anything created in <code>initState</code> that has a <code>dispose</code>, <code>cancel</code>, or <code>close</code> method gets a corresponding call in <code>dispose</code>. No exceptions, no "I'll add it later."</p>
<h2 id="heading-observability-and-crash-reporting">Observability and Crash Reporting</h2>
<p>Before I integrated crash reporting into my first production app, debugging was genuinely painful. A user would report a crash. I would ask what they were doing. They would say "I just opened it." I would stare at the code looking for anything that could cause that. Half the time I never figured it out.</p>
<p>With crash reporting, that changes completely.</p>
<h3 id="heading-set-it-up-before-launch">Set it Up Before Launch</h3>
<pre><code class="language-dart">Future&lt;void&gt; main() async {
  WidgetsFlutterBinding.ensureInitialized();
  await Firebase.initializeApp();

  // Catch Flutter framework errors — widget build errors,
  // rendering errors, etc.
  FlutterError.onError =
      FirebaseCrashlytics.instance.recordFlutterFatalError;

  // Catch uncaught asynchronous errors that the Flutter framework
  // does not handle, such as errors in futures and timers.
  PlatformDispatcher.instance.onError = (error, stack) {
    FirebaseCrashlytics.instance.recordError(error, stack, fatal: true);
    return true;
  };

  runApp(const MyApp());
}
</code></pre>
<h3 id="heading-never-let-failures-be-silent">Never Let Failures Be Silent</h3>
<pre><code class="language-dart">// This is how I used to write it. If submitOrder throws,
// nothing happens. The user has no idea. I have no idea.
await api.submitOrder(order);
</code></pre>
<pre><code class="language-dart">// This is how I write it now.
try {
  await api.submitOrder(order);
  // The user may have left this screen while the request ran.
  if (!mounted) return;
  setState(() =&gt; orderStatus = OrderStatus.confirmed);
} catch (e, stackTrace) {
  // recordError sends the full exception and stack trace
  // to Crashlytics, with device info and the user's
  // recent session activity attached automatically.
  FirebaseCrashlytics.instance.recordError(e, stackTrace);
  if (!mounted) return;
  setState(() =&gt; orderStatus = OrderStatus.failed);
}
</code></pre>
<h3 id="heading-breadcrumbs">Breadcrumbs</h3>
<p>Raw crash logs tell you what broke. Breadcrumbs tell you what the user was doing when it broke. These aren't the same thing.</p>
<pre><code class="language-dart">FirebaseCrashlytics.instance.log('User opened checkout');
FirebaseCrashlytics.instance.log('Payment sheet presented');
FirebaseCrashlytics.instance.log('User submitted payment');
// crash here — now I know the exact sequence
</code></pre>
<h2 id="heading-testing-production-flutter-apps">Testing Production Flutter Apps</h2>
<p>I'll be honest: I under-tested my first app. I was moving fast, the features worked, and writing tests felt slow. Then I refactored a pricing calculation, introduced a bug that wasn't immediately obvious, and shipped it. A user caught it before I did.</p>
<p>I test more carefully now. Not everything — but the things that matter.</p>
<h3 id="heading-unit-test-business-logic">Unit Test Business Logic</h3>
<pre><code class="language-dart">test('discount applies percentage correctly', () {
  final result = calculateDiscountedPrice(
    price: 100.0,
    discountPercent: 10,
  );

  // 10% off 100.00 should be 90.00
  expect(result, equals(90.0));
});

test('discount throws for negative percentage', () {
  expect(
    () =&gt; calculateDiscountedPrice(price: 100, discountPercent: -5),
    throwsA(isA&lt;ArgumentError&gt;()),
  );
});
</code></pre>
<p>Business logic – pricing, validation, authorization – should be in plain Dart functions with no Flutter dependencies, so they can be tested in milliseconds without any test infrastructure.</p>
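<p>The tests above call <code>calculateDiscountedPrice</code> without showing it. Here is one way it might look, as a sketch: only the name and the two named parameters come from the tests, while the 0 to 100 range check is an assumption.</p>
<pre><code class="language-dart">// Hypothetical implementation. Plain Dart, no Flutter imports,
// so it can run in an ordinary unit test.
double calculateDiscountedPrice({
  required double price,
  required num discountPercent,
}) {
  // Assumed rule: a discount outside 0..100 is a caller bug.
  if (discountPercent &lt; 0 || discountPercent &gt; 100) {
    throw ArgumentError.value(
      discountPercent,
      'discountPercent',
      'must be between 0 and 100',
    );
  }
  return price * (100 - discountPercent) / 100;
}
</code></pre>
<p>Multiplying before dividing keeps the 10% case exact: 100.0 × 90 ÷ 100 is 90.0, which is what the first test expects.</p>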
<h3 id="heading-widget-test-ui-states">Widget Test UI States</h3>
<p>Flutter's widget testing is genuinely one of its best features. You can test loading states, error states, and user interactions without a device or emulator.</p>
<pre><code class="language-dart">testWidgets('shows error state with retry button on load failure',
    (tester) async {
  final mockRepo = MockUserRepository();
  when(mockRepo.getUser(any)).thenThrow(Exception('Network error'));

  await tester.pumpWidget(
    ProviderScope(
      overrides: [
        userRepositoryProvider.overrideWithValue(mockRepo),
      ],
      child: const MaterialApp(home: ProfileScreen(userId: 'test')),
    ),
  );

  // pumpAndSettle keeps pumping frames until none are
  // scheduled, so the error state has rendered before we assert.
  await tester.pumpAndSettle();

  expect(find.text('Failed to load profile. Please try again.'), findsOneWidget);
  expect(find.text('Try again'), findsOneWidget);
});
</code></pre>
<p>What I prioritize testing: core business logic, error and loading states, any flow that involves money or data the user can't recover, and the integration points between my app and the backend. Static UI widgets that contain no logic I generally leave uncovered.</p>
<h2 id="heading-architecture-and-long-term-maintainability">Architecture and Long-Term Maintainability</h2>
<p>The first app I shipped had no real architecture. Everything was in widgets. Business logic sat next to UI code. State was scattered.</p>
<p>It worked fine for six months. Then I needed to add a feature that touched several existing screens, and what should have taken a day took a week because I couldn't change anything without breaking something else.</p>
<p>The second app I was more deliberate about. Features in their own folders. Repositories separate from widgets. State managed outside the UI layer. When requirements changed — and they always change — the changes were contained.</p>
<h3 id="heading-separate-concerns-at-the-layer-boundary">Separate Concerns at the Layer Boundary</h3>
<pre><code class="language-plaintext">lib/
  features/
    profile/
      data/
        profile_repository.dart     # network + cache logic
      domain/
        user.dart                   # clean domain model
      presentation/
        profile_screen.dart         # widget
        profile_notifier.dart       # state
</code></pre>
<p>Widgets shouldn't make network calls. Repositories shouldn't import Flutter. Neither should know anything about the other's internals.</p>
<p>When you need to swap the data source, or test the notifier with a mock, or change the UI without touching the business logic — this separation is what makes that possible.</p>
<h3 id="heading-technical-debt-accumulates-faster-than-you-expect">Technical Debt Accumulates Faster Than You Expect</h3>
<p>A shortcut that saves thirty minutes today tends to cost several hours a month from now. The shortcuts that compound fastest in Flutter:</p>
<ul>
<li><p>Business logic inside widgets (impossible to test, impossible to reuse)</p>
</li>
<li><p><code>dynamic</code> instead of typed models (runtime errors instead of compile-time errors)</p>
</li>
<li><p>Copy-pasted validation logic (change it in one place and forget the others)</p>
</li>
<li><p>Mutable global state without clear ownership</p>
</li>
</ul>
<p>None of these are catastrophic on day one. All of them make the next change harder than it should be, and the change after that harder still.</p>
<h2 id="heading-end-to-end-example-a-production-grade-profile-feature">End-to-End Example: a Production-Grade Profile Feature</h2>
<p>Here's everything from this article assembled into one feature. A repository with caching and retry, a Riverpod notifier with optimistic updates, a widget that handles all three states, and proper lifecycle management throughout.</p>
<h3 id="heading-the-repository">The Repository</h3>
<pre><code class="language-dart">class ProfileRepository {
  final Dio _dio;
  final Box _cache;

  ProfileRepository(this._dio, this._cache);

  Future&lt;User&gt; getUser(String userId) async {
    try {
      final response = await withRetry(
        () =&gt; _dio.get('/users/$userId'),
      );

      final user = User.fromJson(
        response.data as Map&lt;String, dynamic&gt;,
      );

      // Cache successful responses for offline fallback.
      await _cache.put('user_$userId', user.toJson());

      return user;
    } on DioException catch (e) {
      final cached = _cache.get('user_$userId');

      if (cached != null) {
        return User.fromJson(Map&lt;String, dynamic&gt;.from(cached));
      }

      if (e.type == DioExceptionType.connectionError) {
        throw NoInternetException();
      }

      throw ServerException(e.response?.statusCode ?? 0);
    }
  }

  Future&lt;void&gt; updateDisplayName(String userId, String name) async {
    await withRetry(
      () =&gt; _dio.patch('/users/$userId', data: {'displayName': name}),
    );

    // Invalidate cache so the next read fetches fresh data.
    await _cache.delete('user_$userId');
  }
}
</code></pre>
<h3 id="heading-the-notifier">The Notifier</h3>
<pre><code class="language-dart">@riverpod
class ProfileNotifier extends _$ProfileNotifier {
  @override
  AsyncValue&lt;User&gt; build(String userId) {
    _load();
    return const AsyncValue.loading();
  }

  Future&lt;void&gt; _load() async {
    state = const AsyncValue.loading();
    state = await AsyncValue.guard(
      () =&gt; ref.read(profileRepositoryProvider).getUser(userId),
    );
  }

  Future&lt;void&gt; refresh() =&gt; _load();

  Future&lt;void&gt; updateName(String newName) async {
    final current = state.valueOrNull;
    if (current == null) return;

    // Optimistic update: show the new name right away,
    // before the server has confirmed it.
    state = AsyncValue.data(current.copyWith(displayName: newName));

    try {
      await ref
          .read(profileRepositoryProvider)
          .updateDisplayName(userId, newName);
    } catch (e, st) {
      FirebaseCrashlytics.instance.recordError(e, st);
      // Roll back to the previous state if the update fails.
      state = AsyncValue.data(current);
      rethrow;
    }
  }
}
</code></pre>
<h3 id="heading-the-widget">The Widget</h3>
<pre><code class="language-dart">class ProfileScreen extends ConsumerWidget {
  final String userId;
  const ProfileScreen({required this.userId, super.key});

  @override
  Widget build(BuildContext context, WidgetRef ref) {
    final profileAsync = ref.watch(profileNotifierProvider(userId));

    return Scaffold(
      appBar: AppBar(title: const Text('Profile')),
      body: profileAsync.when(
        loading: () =&gt; const Center(child: CircularProgressIndicator()),
        error: (e, _) =&gt; _ErrorView(
          message: e is NoInternetException
              ? 'No internet connection.'
              : 'Failed to load profile.',
          onRetry: () =&gt; ref
              .read(profileNotifierProvider(userId).notifier)
              .refresh(),
        ),
        data: (user) =&gt; _ProfileView(user: user, userId: userId),
      ),
    );
  }
}

class _ProfileView extends ConsumerStatefulWidget {
  final User user;
  final String userId;
  const _ProfileView({required this.user, required this.userId});

  @override
  ConsumerState&lt;_ProfileView&gt; createState() =&gt; _ProfileViewState();
}

class _ProfileViewState extends ConsumerState&lt;_ProfileView&gt; {
  late final TextEditingController _nameController;

  @override
  void initState() {
    super.initState();
    _nameController = TextEditingController(text: widget.user.displayName);
  }

  @override
  void dispose() {
    _nameController.dispose();
    super.dispose();
  }

  Future&lt;void&gt; _saveName() async {
    try {
      await ref
          .read(profileNotifierProvider(widget.userId).notifier)
          .updateName(_nameController.text);

      if (!mounted) return;

      ScaffoldMessenger.of(context).showSnackBar(
        const SnackBar(content: Text('Name updated.')),
      );
    } catch (_) {
      if (!mounted) return;

      ScaffoldMessenger.of(context).showSnackBar(
        const SnackBar(content: Text('Failed to update name.')),
      );
    }
  }

  @override
  Widget build(BuildContext context) {
    return ListView(
      padding: const EdgeInsets.all(16),
      children: [
        TextField(
          controller: _nameController,
          decoration: const InputDecoration(labelText: 'Display name'),
        ),
        const SizedBox(height: 16),
        ElevatedButton(
          onPressed: _saveName,
          child: const Text('Save'),
        ),
      ],
    );
  }
}
</code></pre>
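<p>The listings above reference <code>_ErrorView</code>, <code>NoInternetException</code>, and <code>ServerException</code> without defining them. A minimal sketch of what they might look like; the fields, constructor shapes, and button label are assumptions based only on how the code above uses them.</p>
<pre><code class="language-dart">import 'package:flutter/material.dart';

// Hypothetical exception types. The repository only needs them
// to be distinguishable in the widget's error branch.
class NoInternetException implements Exception {
  @override
  String toString() =&gt; 'NoInternetException';
}

class ServerException implements Exception {
  final int statusCode;
  ServerException(this.statusCode);

  @override
  String toString() =&gt; 'ServerException(statusCode: $statusCode)';
}

// Hypothetical error view: a message plus a retry callback,
// matching the two arguments ProfileScreen passes in.
class _ErrorView extends StatelessWidget {
  final String message;
  final VoidCallback onRetry;
  const _ErrorView({required this.message, required this.onRetry});

  @override
  Widget build(BuildContext context) {
    return Center(
      child: Column(
        mainAxisSize: MainAxisSize.min,
        children: [
          Text(message, style: const TextStyle(color: Colors.red)),
          const SizedBox(height: 16),
          ElevatedButton(
            onPressed: onRetry,
            child: const Text('Try again'),
          ),
        ],
      ),
    );
  }
}
</code></pre>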
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>None of this is particularly advanced. It's mostly habits: checking <code>mounted</code>, disposing controllers, handling the error state, caching for offline use. Each habit prevents one specific category of production failure, and together they add up to an app that users experience as reliable.</p>
<p>I wish I'd written my first app this way. I didn't, because I didn't know what I didn't know yet. That is normal.</p>
<p>But if you're reading this before shipping your first production app, you now have the benefit of what took me multiple shipped apps and a lot of frustrated user feedback to learn.</p>
<p>The best time to add these patterns is at the start of a feature. The second-best time is now.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ AI Paper Review: Training Language Models to Follow Instructions with Human Feedback (InstructGPT) ]]>
                </title>
                <description>
                    <![CDATA[ GPT-3 was a major breakthrough in natural language processing. With 175 billion parameters, it demonstrated remarkable few-shot learning abilities and showed that scaling large language models could u ]]>
                </description>
                <link>https://www.freecodecamp.org/news/ai-paper-review-training-language-models-to-follow-instructions-with-human-feedback-instructgpt/</link>
                <guid isPermaLink="false">6a206bf72a223bf98b13dcfc</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ large language models ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ chatgpt ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Mohammed Fahd Abrah ]]>
                </dc:creator>
                <pubDate>Wed, 03 Jun 2026 18:01:27 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/494c3fa7-d7a0-448b-9983-99575f91836d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>GPT-3 was a major breakthrough in natural language processing. With 175 billion parameters, it demonstrated remarkable few-shot learning abilities and showed that scaling large language models could unlock a wide range of capabilities.</p>
<p>Yet despite its impressive performance, GPT-3 revealed an important limitation: raw capability doesn't automatically create a useful assistant.</p>
<p>A language model can generate fluent text, answer questions, and solve complex tasks while still failing to follow what the user actually wants.</p>
<p>GPT-3 could produce responses that were inconsistent, overly confident, difficult to control, or misaligned with user instructions. It was a powerful prediction engine, but it wasn't designed to reliably act as a helpful assistant.</p>
<p>This challenge motivated one of the most influential papers in modern AI: <em>Training Language Models to Follow Instructions with Human Feedback</em>. Rather than making the model larger, the researchers focused on teaching it how to better follow human intent.</p>
<p>The result was InstructGPT, a system fine-tuned from GPT-3 that demonstrated how human feedback could transform a capable language model into a far more useful and aligned assistant.</p>
<p>The broader problem behind it, getting models to do what users actually intend, is known as alignment, and it became one of the most important problems in modern AI.</p>
<p>Researchers realized that building larger models was only part of the solution. While scaling improved capabilities, it didn't guarantee that models would reliably follow instructions or behave in ways that matched user expectations. The next stage of progress required teaching models how to respond in a more helpful, truthful, and safe manner.</p>
<p>This led to the development of instruction-following systems and Reinforcement Learning from Human Feedback (RLHF). Instead of optimizing models solely to predict the next word, researchers began training them to better align with human preferences and intentions.</p>
<p>This shift marked a major turning point in the evolution of large language models.</p>
<p>GPT-3 demonstrated the power of large-scale language modeling and introduced many people to prompting and few-shot learning.</p>
<p>InstructGPT built on that foundation by showing how human feedback could significantly improve instruction following and model behavior. ChatGPT then brought these ideas to a much broader audience by packaging aligned language models into an accessible conversational interface used by millions of people.</p>
<p>In many ways, language models became capable before they became aligned.</p>
<p>That's why the transition from GPT-3 to InstructGPT represents one of the most important milestones in the history of artificial intelligence. The focus was no longer only on making models more capable. It was also about making them more useful, reliable, and responsive to human intent.</p>
<p>InstructGPT established many of the alignment techniques that later became a core part of systems such as ChatGPT and GPT-4.</p>
<h2 id="heading-paper-overview"><strong>Paper Overview:</strong></h2>
<p>In this article, we’ll mainly focus on the paper <a href="https://arxiv.org/pdf/2203.02155"><strong>Training Language Models to Follow Instructions with Human Feedback</strong></a>, published by OpenAI in 2022.</p>
<p>This paper introduced <strong>InstructGPT</strong>, one of the most important transitions in the history of large language models. While earlier GPT systems focused heavily on scaling model size and improving raw capabilities, this work shifted attention toward something equally important: <strong>alignment</strong>.</p>
<p>The paper explores how language models can be trained to better follow human instructions using reinforcement learning from human feedback (RLHF). Instead of optimizing only for next-token prediction, the model is further optimized to produce responses that humans actually prefer – responses that are more helpful, safer, and more aligned with user intent.</p>
<p>What makes this paper historically important is that it became the foundation for the modern ChatGPT alignment pipeline.</p>
<p>Many of the interaction patterns people now associate with ChatGPT (like instruction following, conversational behavior, refusal handling, and safer responses) can be traced directly back to the ideas introduced here.</p>
<p>Here’s the original paper again if you want to explore it directly: <a href="https://arxiv.org/pdf/2203.02155">Training language models to follow instructions with human feedback</a></p>
<p>And here’s a quick infographic of what we’ll cover throughout this review:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/6986f1fe-7ee5-4bc6-b144-44aad5d2bb3e.png" alt="AI Papers Quick Insights- InstructGPT" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-table-of-contents"><strong>Table of Contents:</strong></h2>
<ul>
<li><p><a href="#heading-executive-summary">Executive Summary</a></p>
</li>
<li><p><a href="#heading-the-core-problem">The Core Problem</a></p>
</li>
<li><p><a href="#heading-why-gpt-3-was-not-enough">Why GPT-3 Was Not Enough</a></p>
</li>
<li><p><a href="#heading-instructgpt-the-birth-of-alignment-centered-llms">InstructGPT: The Birth of Alignment-Centered LLMs</a></p>
</li>
<li><p><a href="#heading-rlhf-pipeline-how-instructgpt-learned-to-behave-like-an-assistant">RLHF Pipeline: How InstructGPT Learned to Behave Like an Assistant</a></p>
<ul>
<li><p><a href="#heading-stage-1-supervised-fine-tuning-sft">Stage 1 — Supervised Fine-Tuning (SFT)</a></p>
</li>
<li><p><a href="#heading-stage-2-reward-model-training">Stage 2 — Reward Model Training</a></p>
</li>
<li><p><a href="#heading-stage-3-ppo-reinforcement-learning">Stage 3 — PPO Reinforcement Learning</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-helpful-honest-harmless">Helpful, Honest, Harmless</a></p>
</li>
<li><p><a href="#heading-human-feedback-as-the-new-scaling-factor">Human Feedback as the New Scaling Factor</a></p>
</li>
<li><p><a href="#heading-why-chatgpt-exploded-globally">Why ChatGPT Exploded Globally</a></p>
</li>
<li><p><a href="#heading-chatgpt-as-an-interface-revolution">ChatGPT as an Interface Revolution</a></p>
</li>
<li><p><a href="#heading-benchmarks-and-results">Benchmarks and Results</a></p>
</li>
<li><p><a href="#heading-truthfulness-and-hallucinations">Truthfulness and Hallucinations</a></p>
</li>
<li><p><a href="#heading-safety-and-refusal-behavior">Safety and Refusal Behavior</a></p>
</li>
<li><p><a href="#heading-limitations">Limitations</a></p>
</li>
<li><p><a href="#heading-historical-importance">Historical Importance</a></p>
</li>
<li><p><a href="#heading-discussion-the-real-shift">Discussion: The Real Shift</a></p>
</li>
<li><p><a href="#heading-connection-to-gpt-4">Connection to GPT-4</a></p>
</li>
<li><p><a href="#heading-gpt-3-vs-instructgpt-vs-chatgpt-vs-gpt-4-key-differences">GPT-3 vs InstructGPT vs ChatGPT vs GPT-4: Key Differences</a></p>
</li>
<li><p><a href="#heading-from-gpt-1-to-gpt-4-a-timeline-of-modern-ai-systems-and-alignment-evolution">From GPT-1 to GPT-4: A Timeline of Modern AI Systems and Alignment Evolution</a></p>
</li>
<li><p><a href="#heading-final-insight">Final Insight</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most out of this breakdown, it helps to already be familiar with a few foundational ideas.</p>
<p>Reading the previous reviews in this series will be especially helpful:</p>
<ul>
<li><p><a href="https://www.freecodecamp.org/news/ai-paper-review-improving-language-understanding-by-generative-pre-training-gpt-1/">AI Paper Review: Improving Language Understanding by Generative Pre-Training (GPT-1)</a></p>
</li>
<li><p><a href="https://www.freecodecamp.org/news/ai-paper-review-language-models-are-unsupervised-multitask-learners-gpt-2/">AI Paper Review: Language Models are Unsupervised Multitask Learners (GPT-2)</a></p>
</li>
<li><p><a href="https://www.freecodecamp.org/news/ai-paper-review-language-models-are-few-shot-learners-gpt-3/">AI Paper Review: Language Models are Few-Shot Learners (GPT-3)</a></p>
</li>
</ul>
<p>Even though GPT-4 was released after InstructGPT, reading the GPT-4 review can still be helpful. It provides a broader view of how alignment techniques evolved and how they were combined with stronger reasoning and multimodal capabilities in later generations of GPT models.</p>
<ul>
<li><a href="https://www.freecodecamp.org/news/ai-paper-review-gpt-4-technical-report/">AI Paper Review: GPT-4 Technical Report (GPT-4)</a></li>
</ul>
<p>It also helps to have:</p>
<ul>
<li><p>A general understanding of natural language processing (NLP) and large language models</p>
</li>
<li><p>A high-level idea of Transformer-based autoregressive models</p>
</li>
<li><p>Familiarity with prompting, few-shot learning, and in-context learning</p>
</li>
<li><p>A basic understanding of reinforcement learning and human feedback systems</p>
</li>
<li><p>General machine learning concepts like training data, fine-tuning, scaling, and inference</p>
</li>
<li><p>Some familiarity with alignment, safety, and AI behavior control concepts</p>
</li>
</ul>
<p>You don't need to be an AI researcher to follow this article, though.</p>
<p>I’ll keep the explanations practical and intuitive, focusing more on understanding how InstructGPT changed modern AI systems rather than getting lost in dense mathematical details or academic terminology.</p>
<h2 id="heading-executive-summary">Executive Summary</h2>
<p>The paper <em>Training Language Models to Follow Instructions with Human Feedback</em> marks one of the biggest turning points in the history of modern AI systems. Instead of asking only how to make language models larger or smarter, OpenAI focused on a different question: how do we make these models actually helpful for real people?</p>
<p>The paper introduces <strong>InstructGPT</strong>, a version of GPT-3 fine-tuned to follow human instructions more accurately using a method called <strong>Reinforcement Learning from Human Feedback (RLHF)</strong>.</p>
<p>The core insight of the paper is simple but extremely important:</p>
<p>Bigger language models don't automatically become better assistants.</p>
<p>Even highly capable models like GPT-3 could still:</p>
<ul>
<li><p>ignore instructions</p>
</li>
<li><p>hallucinate facts</p>
</li>
<li><p>generate toxic or biased outputs</p>
</li>
<li><p>produce responses that were technically fluent but not actually useful to users</p>
</li>
</ul>
<p>To solve this problem, OpenAI built a multi-stage alignment pipeline: humans first demonstrate ideal answers, humans then rank model outputs, and finally the model learns from those preferences using reinforcement learning.</p>
<p>This changed the direction of modern AI development.</p>
<p>The paper shows that alignment and usability can matter more than raw model size itself. One of the most surprising findings was that outputs from the 1.3B InstructGPT model were preferred by human evaluators over those of the original 175B GPT-3 model, even though the smaller model has over 100 times fewer parameters.</p>
<p>The paper also demonstrates improvements in instruction following, truthfulness, toxicity reduction, conversational behavior, and general user preference.</p>
<p>Historically, this paper became the foundation behind modern conversational AI systems.</p>
<p>GPT-3 proved that language models could learn from prompts.</p>
<p>GPT-4 later proved that scaling and multimodal reasoning could unlock even stronger capabilities.</p>
<p>But InstructGPT showed something equally important: AI systems must be aligned with human intent to become truly usable products.</p>
<p>In many ways, this paper represents the transition from raw language modeling to aligned assistants, from capability scaling to behavior shaping, and from research demos to real-world conversational AI systems.</p>
<p>And that transition led to ChatGPT.</p>
<h2 id="heading-the-core-problem">The Core Problem</h2>
<p>One of the most important ideas in this paper is that raw language modeling is not the same thing as building a useful assistant.</p>
<p>Before InstructGPT, models like GPT-3 were trained mainly with a simple objective: predict the next token in a sequence.</p>
<p>That objective made language models extremely powerful at generating fluent text, but it also created a major limitation. The model learned how to continue internet text, not necessarily how to help humans.</p>
<p>This became one of the defining realizations behind modern AI alignment research.</p>
<p>Despite its impressive capabilities, GPT-3 often struggled to behave like a reliable assistant. The model could produce fluent text, but it was not explicitly trained to follow user intent.</p>
<p>Here are some examples that highlight the differences between GPT-3 and InstructGPT in how they respond to user prompts:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/22cfce35-8c0e-4560-9419-15c6e33123ce.png" alt="Comparison of GPT-3 and InstructGPT responses to the same prompts. GPT-3 often continues generating similar prompts instead of completing the requested task, while InstructGPT follows the instruction directly and produces the requested answer, demonstrating stronger instruction-following behavior." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Source: <a href="https://openai.com/index/instruction-following/"><strong>Aligning language models to follow instructions</strong></a></p>
<img src="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/cd366a10-f872-4468-bff3-64d05d0597d6.png" alt="Further example comparing GPT-3 and InstructGPT responses" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Source: <a href="https://openai.com/index/instruction-following/"><strong>Aligning language models to follow instructions</strong></a></p>
<p>These examples reveal the central weakness of early GPT systems. GPT-3 often continued the pattern of the prompt rather than completing the requested task. InstructGPT, by contrast, responded directly to the user's instruction. The difference wasn't a matter of raw intelligence. It was a difference in training objectives.</p>
<p>GPT models were trained on massive internet-scale datasets where the goal was simply to predict what text comes next. As a result, the model optimized for plausibility, continuation, and pattern completion, not necessarily for truthfulness, safety, helpfulness, or alignment with human goals.</p>
<p>This created a major gap between language capability and useful assistant behavior.</p>
<p>For example, if a user asked a harmful, misleading, or nonsensical question, the model might still attempt to continue the pattern naturally instead of recognizing the issue. In many cases, the model behaved more like an internet text simulator than a reliable assistant.</p>
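<p>To make the "text simulator" behavior concrete, here's a minimal sketch in Python. It uses a toy bigram model, which is nothing like GPT-3's architecture, and the tiny corpus and the helper names <code>train_bigram</code> and <code>continue_text</code> are made up for illustration. The model only learns which word tends to follow which, so when it's given an instruction it continues with another instruction instead of an answer:</p>
<pre><code class="language-python">from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def continue_text(counts, prompt_tokens, n_new):
    """Greedily append the most frequent next token, n_new times."""
    out = list(prompt_tokens)
    for _ in range(n_new):
        options = counts[out[-1]]
        if not options:
            break  # no known continuation for this token
        out.append(max(options, key=options.get))
    return out[len(prompt_tokens):]

# Hypothetical "internet text": a page that lists requests, never answers.
corpus = (
    "explain gravity to a child . "
    "explain the moon landing to a child . "
    "explain rain to a child ."
).split()

counts = train_bigram(corpus)
prompt = "explain the moon landing to a child .".split()
continuation = continue_text(counts, prompt, 6)
print(" ".join(continuation))  # prints: explain gravity to a child .
</code></pre>
<p>Nothing in this objective tells the model that an instruction should be answered rather than continued. A real language model is vastly more capable, but the training signal has the same blind spot.</p>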
<p>The paper repeatedly emphasizes that scaling alone couldn't solve this problem.</p>
<p>Better behavior also required stronger instruction following, better alignment with human intent, improved safety behavior, greater truthfulness, and optimization around real user needs.</p>
<h2 id="heading-why-gpt-3-was-not-enough">Why GPT-3 Was Not Enough</h2>
<p>When GPT-3 was released, it felt like a massive leap forward in AI capabilities.</p>
<p>The model could perform few-shot learning, answer questions, summarize text, generate code, translate languages, and even solve certain reasoning tasks, all without traditional fine-tuning. For many researchers, it was the first time a language model started to feel genuinely general-purpose.</p>
<p>Yet using GPT-3 in practice was often less reliable than its benchmark performance suggested.</p>
<p>Getting good results often required careful prompt engineering. Small wording changes could completely change the quality of the response. Sometimes the model followed instructions well, and other times it ignored them entirely.</p>
<p>Users often found themselves rewriting prompts repeatedly to obtain the response they actually wanted.</p>
<p>This became the core motivation behind InstructGPT.</p>
<p>OpenAI responded by exploring ways to make model behavior more consistent, predictable, and useful for users.</p>
<h2 id="heading-instructgpt-the-birth-of-alignment-centered-llms">InstructGPT: The Birth of Alignment-Centered LLMs</h2>
<p>The release of InstructGPT marked one of the biggest shifts in the history of large language models.</p>
<p>Before InstructGPT, most advances in language models came from scaling data, compute, and model size.</p>
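<p>With InstructGPT, the focus shifted toward alignment: building systems that could follow instructions more reliably and behave in ways users actually preferred.</p>
<p>This is where InstructGPT brought one of the most important ideas in modern AI systems to general-purpose language models: Reinforcement Learning from Human Feedback (RLHF). The technique itself came from earlier research, but this paper showed it working across a broad range of instruction-following tasks.</p>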
<p>Instead of optimizing models only to predict internet text, OpenAI started optimizing models based on what humans actually preferred. Human labelers ranked model outputs, and those preferences became part of the training process itself.</p>
<p>This fundamentally changed the objective of language models.</p>
<p>Rather than optimizing solely for next-token prediction, the system was increasingly optimized to produce responses that humans judged to be helpful, safe, and aligned with their intentions.</p>
<p>That distinction may sound subtle, but it completely changed the direction of AI development.</p>
<p>InstructGPT combined instruction-following training with human preference optimization, creating a model whose behavior could be shaped directly through feedback rather than solely through pretraining.</p>
<p>The model was no longer trained only to imitate the internet. It was trained to behave more like an assistant.</p>
<h2 id="heading-rlhf-pipeline-how-instructgpt-learned-to-behave-like-an-assistant">RLHF Pipeline: How InstructGPT Learned to Behave Like an Assistant</h2>
<p>At the center of the InstructGPT paper is a training pipeline that completely changed how modern AI assistants are built.</p>
<p>RLHF was designed to build on traditional language-model pretraining rather than replace it.</p>
<p>The InstructGPT paper introduced a different idea: instead of training models only on internet text, why not train them using human preferences directly?</p>
<p>This led to the development of the RLHF pipeline: Reinforcement Learning from Human Feedback. This approach would later become a standard component of modern conversational AI systems.</p>
<p>The paper’s Figure 2 is especially important because it visualizes the entire alignment pipeline introduced by OpenAI. Rather than relying on a single training stage, the system uses multiple stages where human feedback gradually shapes model behavior.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/d1ccebd1-00b4-48ea-8bc7-e3953bc88fc6.png" alt="RLHF Training Pipeline for InstructGPT" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>Source:</strong> <em>Training Language Models to Follow Instructions with Human Feedback</em> (OpenAI, 2022).</p>
<p>As you can see in the image above, the process happens in three major stages.</p>
<h3 id="heading-stage-1-supervised-fine-tuning-sft">Stage 1 — Supervised Fine-Tuning (SFT)</h3>
<p>The first stage starts with human-written demonstrations.</p>
<p>Labelers are given prompts and asked to write ideal responses – the kinds of answers a helpful assistant should produce. These examples become the initial training dataset for the model.</p>
<p>At this stage, the model learns the basic patterns of assistant-style responses.</p>
<p>This is still traditional supervised learning, but the goal is different from standard language modeling. Instead of learning only from web text, the model now learns from examples of preferred assistant behavior.</p>
<p>This stage creates what the paper calls the Supervised Fine-Tuned model (SFT model).</p>
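<p>As a rough sketch of what this stage optimizes, the Python snippet below computes a supervised loss over a single prompt-plus-demonstration example. Masking the prompt tokens so that only the labeler's response contributes to the loss is a common practice that I'm assuming here, not a detail taken from the paper. The probabilities and the function name <code>sft_loss</code> are invented for illustration:</p>
<pre><code class="language-python">import math

def sft_loss(token_probs, is_response_token):
    """Mean negative log-likelihood over the demonstration tokens only."""
    losses = [
        -math.log(prob)
        for prob, is_response in zip(token_probs, is_response_token)
        if is_response
    ]
    return sum(losses) / len(losses)

# Hypothetical probabilities the model assigned to each correct token.
token_probs = [0.9, 0.8, 0.5, 0.25]
# The first two tokens are the prompt, the last two are the labeler's answer.
is_response_token = [False, False, True, True]

loss = sft_loss(token_probs, is_response_token)
print(round(loss, 4))
</code></pre>
<p>The loss is the same next-token objective used in pretraining. What changes is the data: the targets are now answers a human wrote on purpose, rather than whatever text happened to come next on the web.</p>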
<p>And while this already improves behavior significantly, OpenAI realized something important: human preferences are more complex than simple “correct answers.”</p>
<p>There are often many possible responses to a prompt, but humans may strongly prefer some answers over others.</p>
<p>That leads to the next stage.</p>
<h3 id="heading-stage-2-reward-model-training">Stage 2 — Reward Model Training</h3>
<p>In the second stage, humans no longer write responses directly.</p>
<p>Instead, the model generates multiple answers for the same prompt, and human labelers rank them from best to worst.</p>
<p>For a given prompt, one response may be clearer, another more accurate, and another safer or more appropriate. Human labelers rank these alternatives according to their preferences.</p>
<p>The rankings are then used to train a separate neural network called the Reward Model (RM).</p>
<p>This model learns something extremely important: which outputs humans prefer.</p>
<p>In other words, the system converts human preferences into a trainable reward signal.</p>
<p>This becomes one of the biggest conceptual breakthroughs in the paper. Instead of manually programming behavior rules, OpenAI trains the model to approximate human judgment itself.</p>
<p>The reward model captures patterns in human preferences and turns them into a training signal.</p>
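<p>Here's a minimal sketch of how a ranking can become a training signal. The paper trains its reward model with a pairwise comparison loss of this general form: for every pair of responses in a ranking, the loss is small when the preferred response gets the higher score. The scores below are hypothetical stand-ins for reward model outputs, and <code>pairwise_ranking_loss</code> is a name I've chosen for illustration:</p>
<pre><code class="language-python">import math
from itertools import combinations

def pairwise_ranking_loss(scores_best_to_worst):
    """Average of -log(sigmoid(preferred - rejected)) over all ranked pairs."""
    # combinations keeps list order, so each pair is (preferred, rejected).
    pairs = list(combinations(scores_best_to_worst, 2))
    total = 0.0
    for preferred, rejected in pairs:
        # -log(sigmoid(d)) is the same as log(1 + exp(-d))
        total += math.log(1.0 + math.exp(-(preferred - rejected)))
    return total / len(pairs)

# Hypothetical scores for three responses, listed in the labeler's ranked order.
agrees_with_labeler = [2.0, 0.5, -1.0]     # best response scored highest
disagrees_with_labeler = [-1.0, 0.5, 2.0]  # best response scored lowest

good_loss = pairwise_ranking_loss(agrees_with_labeler)
bad_loss = pairwise_ranking_loss(disagrees_with_labeler)
print(round(good_loss, 4), round(bad_loss, 4))
</code></pre>
<p>Minimizing this loss pushes the reward model to score responses in the same order that human labelers ranked them, which is how preferences become a number the next stage can optimize.</p>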
<p>That reward signal becomes the foundation for the final training stage.</p>
<h3 id="heading-stage-3-ppo-reinforcement-learning">Stage 3 — PPO Reinforcement Learning</h3>
<p>The final stage uses reinforcement learning to optimize the language model against the reward model.</p>
<p>More specifically, the paper uses PPO (Proximal Policy Optimization), a widely used policy-gradient reinforcement learning algorithm that limits how much the model can change in any single update.</p>
<p>At this stage, the model generates responses, receives scores from the reward model, and gradually updates its behavior to maximize those scores.</p>
<p>To keep the model from drifting too far while it chases higher scores, the paper also adds a per-token KL penalty that discourages its output distribution from diverging from the SFT model's.</p>
<p>The key innovation is that optimization now occurs against a learned representation of human preferences rather than only a language-modeling objective.</p>
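<p>The quantity being maximized can be sketched in a few lines of Python. The paper's objective combines the reward model's score with a KL penalty measured against the SFT model, and the snippet below shows only that part, not the PPO update itself. The log-probabilities, the value of <code>beta</code>, and the function name <code>kl_penalized_reward</code> are all hypothetical:</p>
<pre><code class="language-python">def kl_penalized_reward(rm_score, policy_logprobs, sft_logprobs, beta):
    """Reward model score minus a penalty for drifting away from the SFT model."""
    # Log of the ratio between the two models' probabilities for this response.
    log_ratio = sum(policy_logprobs) - sum(sft_logprobs)
    return rm_score - beta * log_ratio

# Hypothetical per-token log-probabilities for one generated response.
policy_logprobs = [-0.2, -0.1, -0.3]  # the model being trained
sft_logprobs = [-0.9, -0.7, -1.0]     # the frozen SFT model

drifted = kl_penalized_reward(1.2, policy_logprobs, sft_logprobs, beta=0.1)
unchanged = kl_penalized_reward(1.2, sft_logprobs, sft_logprobs, beta=0.1)
print(round(drifted, 4), unchanged)  # approximately 1.0, and exactly 1.2
</code></pre>
<p>When the trained model assigns the same probabilities as the SFT model, the penalty is zero and the reward is just the reward model's score. The further it drifts, the more of that score it gives up, which makes it harder for the model to exploit quirks of the reward model.</p>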
<p>According to the paper, this RLHF pipeline significantly improved instruction following and user preference ratings while also reducing toxic and unsafe behavior.</p>
<p>And in many ways, this pipeline became the blueprint for the modern era of conversational AI systems.</p>
<h2 id="heading-helpful-honest-harmless">Helpful, Honest, Harmless</h2>
<p>The authors argue that evaluating language models requires more than measuring capability alone. They should also be evaluated by how they behave around humans.</p>
<p>At the time, this represented a significant shift in how researchers evaluated language models.</p>
<p>That is why the paper repeatedly emphasizes a new alignment philosophy centered around three goals:</p>
<ul>
<li><p>Helpful</p>
</li>
<li><p>Honest</p>
</li>
<li><p>Harmless</p>
</li>
</ul>
<p>These ideas became the conceptual foundation behind modern alignment research and conversational AI systems.</p>
<h3 id="heading-helpful">Helpful</h3>
<p>The first goal is straightforward: the model should genuinely help the user accomplish what they want.</p>
<p>In practice, helpfulness means following instructions clearly, answering questions directly, providing relevant information, and adapting to the user's intent.</p>
<p>This may seem simple, but it fundamentally changes the training objective.</p>
<p>The model is no longer optimized only for linguistic fluency. It's optimized for usefulness.</p>
<h3 id="heading-honest">Honest</h3>
<p>The second goal is honesty.</p>
<p>One of the biggest problems with large language models is that they often produce convincing answers even when those answers are wrong. The models can hallucinate facts, invent references, or respond confidently despite uncertainty.</p>
<p>The paper recognizes that a useful assistant shouldn't merely sound intelligent. It should also behave truthfully and acknowledge uncertainty when necessary.</p>
<p>This is especially important because language models are optimized to generate plausible text, not verified truth.</p>
<p>As a result, earlier models sometimes prioritized sounding coherent over being accurate.</p>
<p>The alignment process introduced in InstructGPT attempts to reduce this behavior through human feedback and preference optimization. Human evaluators consistently prefer responses that are more accurate, transparent, and reliable, and those preferences gradually shape the model during RLHF training.</p>
<p>The paper doesn't claim that hallucinations disappear completely. Far from it. But it marks one of the first large-scale attempts to explicitly optimize language models for truthfulness and reliability rather than pure text generation quality.</p>
<h3 id="heading-harmless">Harmless</h3>
<p>The third goal is harmlessness.</p>
<p>Large language models trained on internet data inevitably absorb toxic, biased, unsafe, or harmful patterns from that data. Without alignment, models may generate dangerous instructions, offensive content, or manipulative behavior.</p>
<p>The paper directly addresses this concern and treats safety as a central part of model development.</p>
<p>Through RLHF and human preference ranking, the model learns to refuse certain harmful requests, avoid toxic generations, produce safer responses, and behave more responsibly during interaction.</p>
<p>This became one of the defining characteristics of modern conversational AI systems.</p>
<p>Instead of maximizing unrestricted generation, the system begins balancing usefulness, safety, and alignment with human values.</p>
<p>But the paper is also honest about limitations.</p>
<p>The authors acknowledge that harmful outputs, biases, and unsafe behavior can still appear. Alignment is imperfect, and human values themselves are complex and difficult to define universally.</p>
<p>Historically, though, this paper marks the moment when safety and alignment became core engineering goals rather than secondary concerns.</p>
<p>Taken together, these three principles (helpful, honest, and harmless) became much more than training objectives. They became the philosophical foundation behind ChatGPT-era AI systems.</p>
<p>Earlier GPT papers mainly explored how to scale intelligence. But InstructGPT explored something deeper: how to make intelligence usable for humans.</p>
<h2 id="heading-human-feedback-as-the-new-scaling-factor">Human Feedback as the New Scaling Factor</h2>
<p>One of the most fascinating ideas behind the InstructGPT paper is that it quietly changed what “scaling” meant in modern AI.</p>
<p>For years, progress in language models was largely measured through scaling.</p>
<p>GPT-1 showed that pretraining works. GPT-2 showed that larger models develop stronger zero-shot behavior. GPT-3 pushed this idea even further by scaling to 175 billion parameters and demonstrating impressive few-shot learning abilities.</p>
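<p>The implicit assumption was that bigger models would simply be better, and to some extent that was true. Larger models became better at reasoning, code generation, language understanding, translation, and generalization. But they didn't automatically become better at doing what users asked.</p>
<p>That is where human feedback became central.</p>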
<p>Instead of relying purely on internet-scale text, OpenAI introduced a training pipeline where human preferences directly shaped model behavior. Human labelers ranked responses, evaluated quality, and guided the system toward outputs people actually preferred.</p>
<p>In many ways, this created a completely new scaling dimension for AI systems:</p>
<ul>
<li><p>scaling human feedback</p>
</li>
<li><p>scaling preference learning</p>
</li>
<li><p>scaling alignment pipelines</p>
</li>
</ul>
<p>Historically, this shifted attention from model scale alone toward the quality of model behavior.</p>
<p>InstructGPT focused on scaling usability. And the results were surprisingly powerful.</p>
<p>According to the paper, a much smaller aligned model was often preferred over the original 175B GPT-3 model by human evaluators.</p>
<p>That finding changed how the industry thought about progress.</p>
<p>The result suggested that improving behavior could sometimes matter as much as increasing scale.</p>
<p>This is why RLHF became one of the defining ideas of the ChatGPT era.</p>
<p>After InstructGPT, modern AI systems were no longer evaluated only by benchmark scores, parameter counts, or scaling curves.</p>
<p>They were increasingly evaluated by usefulness, conversational quality, safety, reliability, and how well they interact with humans.</p>
<p>And that shift fundamentally changed the future direction of large language models.</p>
<h2 id="heading-why-chatgpt-exploded-globally">Why ChatGPT Exploded Globally</h2>
<p>When ChatGPT launched publicly, the reaction was immediate and unlike anything the AI industry had seen before.</p>
<p>Millions of people started using it within days. Developers, students, writers, researchers, businesses, and everyday users suddenly felt like they were interacting with AI in a completely different way.</p>
<p>What made this moment so important was that advanced AI capabilities finally became accessible to ordinary users. After all, the underlying language models were already extremely capable before ChatGPT existed. GPT-3 could generate essays, answer questions, write code, summarize text, and perform impressive few-shot learning tasks.</p>
<p>The challenge was no longer whether language models could perform useful tasks, but whether people could interact with them naturally.</p>
<p>ChatGPT combined powerful language-model capabilities with RLHF-based alignment, conversational interaction, safer behavior, and a user-friendly chat interface.</p>
<p>Earlier systems often required significant prompt experimentation to achieve consistent results. Users had to carefully engineer prompts, retry questions, or work around strange outputs. The models could be brilliant one moment and confusing the next.</p>
<p>ChatGPT changed that experience dramatically.</p>
<p>Thanks to the alignment techniques introduced in the InstructGPT paper, the system became far better at following instructions, maintaining conversational flow, understanding intent, and responding in a way that felt cooperative rather than purely generative.</p>
<p>The conversational interface itself also mattered enormously.</p>
<p>Before ChatGPT, interacting with advanced AI systems often required APIs, coding knowledge, prompt experimentation, or technical understanding.</p>
<p>ChatGPT simplified everything into a familiar chat format: you simply typed naturally, and the system responded naturally.</p>
<p>That design decision may sound small, but historically it was transformative. It turned large language models from research tools into consumer products.</p>
<p>Although imperfect, the system felt substantially more reliable than earlier language-model interfaces, and it communicated in ways that felt more natural and cooperative.</p>
<p>The breakthrough was not simply that the AI became smarter. The breakthrough was that the AI became usable.</p>
<p>And that usability is what transformed large language models from impressive research demonstrations into globally adopted AI assistants.</p>
<h2 id="heading-chatgpt-as-an-interface-revolution">ChatGPT as an Interface Revolution</h2>
<p>One of the most important things about ChatGPT is that it changed how humans interact with computers.</p>
<p>Before ChatGPT, powerful AI systems mostly lived behind APIs, research demos, developer tools, and complex prompting workflows.</p>
<p>Using advanced language models often required technical knowledge. Developers experimented with prompt engineering, API parameters, temperature settings, and carefully structured inputs just to get reliable outputs from the model.</p>
<p>Even GPT-3, despite being extremely powerful, still felt like a research system for many users. You had to learn how to “talk to the model.”</p>
<p>And in many cases, the interaction felt fragile. Slight changes in wording could completely change the quality of the response.</p>
<p>ChatGPT changed that dynamic almost overnight.</p>
<p>Instead of making users adapt to the AI, the AI became much better at adapting to humans.</p>
<p>Natural conversation became the interface.</p>
<p>For decades, human-computer interaction depended on commands, menus, search boxes, forms, programming languages, and specialized software interfaces.</p>
<p>ChatGPT introduced something different: you could simply explain what you wanted in plain language, and the system would usually understand.</p>
<p>This made AI feel accessible to people who had never written code, used APIs, or interacted with machine learning systems before.</p>
<p>In many ways, ChatGPT transformed prompting into a universal interface for computing. And that single shift affected nearly every digital field.</p>
<p>In education, students started using conversational AI to explain difficult concepts, summarize lessons, practice languages, and receive tutoring-style help.</p>
<p>In coding, developers began using AI systems for debugging, code generation, documentation, and learning new frameworks.</p>
<p>This eventually led to the rise of AI coding assistants integrated directly into development environments.</p>
<p>In writing and content creation, conversational AI became a brainstorming partner capable of drafting ideas, rewriting text, organizing articles, and helping people communicate more effectively.</p>
<p>Search behavior also started changing. Instead of searching through lists of links, users increasingly expected direct conversational answers. This fundamentally challenged traditional search-engine interaction models.</p>
<p>And across productivity tools, AI systems began acting less like software features and more like collaborative assistants.</p>
<p>This shift was enabled by advances in conversational AI and interaction design that made dialogue feel natural and useful.</p>
<p>The alignment techniques introduced by InstructGPT were an important part of making these conversational experiences practical.</p>
<p>Historically, this may become one of the most important consequences of the GPT era: earlier software required humans to learn its interfaces. ChatGPT pushed computing toward interfaces that adapt to humans instead.</p>
<h2 id="heading-benchmarks-and-results">Benchmarks and Results</h2>
<p>We've already discussed how one of the biggest improvements didn't come from making the model larger. Instead, it came from making the model better aligned with humans.</p>
<p>This is one of the central findings of the entire paper, and it changed how many researchers thought about progress in large language models.</p>
<p>Before this work, the dominant belief was that scaling was the main path forward, with bigger models, more parameters, more compute, and more data. And GPT-3 seemed to confirm that idea. Larger models consistently showed stronger few-shot learning, reasoning, and generalization abilities.</p>
<p>But the InstructGPT paper introduced a different perspective. The researchers found that outputs from a relatively small 1.3B-parameter InstructGPT model were preferred by human evaluators over outputs from the original 175B-parameter GPT-3 model, even though the smaller model had more than 100 times fewer parameters.</p>
<p>That result was extremely important. It suggested that, for the quality people actually perceive, alignment could matter more than scale.</p>
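<p>To make “preferred by human evaluators” concrete, here is a minimal sketch of how a head-to-head win rate can be computed from pairwise judgments. The function name, the model labels, and the judgment data are all invented for illustration. This is not the paper's evaluation code, and the numbers are not its results.</p>
<pre><code class="language-python">from collections import Counter


def win_rate(judgments, model):
    """Return the share of decided (non-tie) comparisons won by `model`."""
    counts = Counter(judgments)
    decided = sum(n for label, n in counts.items() if label != "tie")
    if decided == 0:
        raise ValueError("no decided comparisons to score")
    return counts[model] / decided


# Hypothetical labeler verdicts: each entry names the preferred model, or "tie".
judgments = ["instruct_1.3b"] * 6 + ["gpt3_175b"] * 3 + ["tie"]

print(f"{win_rate(judgments, 'instruct_1.3b'):.2f}")
</code></pre>
<p>Ties are excluded from the denominator here, which is only one possible convention. A real evaluation would also report how many comparisons were made and how much labelers agreed with each other.</p>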
<p>This became one of the defining insights of the ChatGPT era.</p>
<p>According to the paper, human evaluators consistently preferred InstructGPT responses because they were more helpful, more accurate, safer, and better aligned with what users were actually asking for.</p>
<p>The improvements appeared across several important areas.</p>
<p>One major improvement was instruction following. Earlier GPT models often ignored instructions, drifted off-topic, or generated responses that sounded fluent but failed to solve the user’s actual task. InstructGPT behaved much more like a cooperative assistant and followed prompts more reliably.</p>
<p>The paper also reports improvements in truthfulness. Large language models are known for hallucinating information and confidently generating false statements. Through RLHF and preference optimization, InstructGPT reduced some of these behaviors and produced answers humans judged to be more truthful and reliable.</p>
<p>Another important improvement involved toxicity and harmful outputs. The researchers evaluated the system on the RealToxicityPrompts benchmark and found that, when prompted to be respectful, InstructGPT models generated about 25% fewer toxic outputs than GPT-3.</p>
<p>What makes these findings historically important is that they changed the industry’s understanding of what “better AI” actually meant.</p>
<p>Before InstructGPT, improvement was mostly measured through benchmark scores, scaling curves, and parameter counts.</p>
<p>After InstructGPT, researchers increasingly focused on usability, safety, alignment, conversational quality, and human preference satisfaction.</p>
<p>This was a major shift in AI development philosophy.</p>
<h2 id="heading-truthfulness-and-hallucinations">Truthfulness and Hallucinations</h2>
<p>A major challenge for language models is that fluent responses are not always truthful.</p>
<p>This behavior is now commonly called hallucination.</p>
<p>Hallucinations can take many forms, including invented facts, fabricated references, incorrect explanations, or confident answers that lack factual support.</p>
<p>And because the responses are fluent and natural, the mistakes can sometimes look believable to users. The InstructGPT paper treats this as a serious issue rather than a minor flaw.</p>
<p>The authors note that language models are optimized for plausibility rather than verified truth. This is an important distinction: a language model can generate text that <em>looks</em> correct while still being inaccurate.</p>
<p>This is why the paper places particular emphasis on truthfulness and factual reliability.</p>
<p>Through RLHF and human preference optimization, InstructGPT was trained to produce answers humans judged to be more accurate and trustworthy. Human evaluators generally preferred responses that were more transparent about uncertainty and less likely to contain misleading information.</p>
<p>The paper also evaluates the model on truthfulness benchmarks such as <a href="https://arxiv.org/pdf/2109.07958">TruthfulQA</a>, where InstructGPT models gave truthful and informative answers about twice as often as GPT-3.</p>
<p>But the paper is also careful not to overstate the results. Hallucinations didn't disappear. The aligned models could still make reasoning mistakes, generate false information, misunderstand prompts, or produce overconfident answers.</p>
<p>This nuance is extremely important: the paper doesn't claim that RLHF solved factuality or reasoning completely. Alignment improved the model's behavior, but it did not make the model perfect.</p>
<p>That distinction became increasingly important as ChatGPT and later GPT-4 systems reached millions of users worldwide.</p>
<p>The models became more useful, more truthful, and more aligned, but they still remained probabilistic language models rather than guaranteed fact engines.</p>
<p>In many ways, the InstructGPT paper marks the beginning of large-scale efforts to make AI systems not only intelligent, but also trustworthy enough for real-world human interaction.</p>
<h2 id="heading-safety-and-refusal-behavior">Safety and Refusal Behavior</h2>
<p>As language models became more powerful, researchers realized that safety was becoming a deployment problem.</p>
<p>A model that can generate human-like language at scale can also generate harmful instructions, produce toxic content, spread misinformation, or be manipulated into unsafe behavior.</p>
<p>The InstructGPT paper treats these risks very seriously and frames alignment as a necessary part of deploying large language models responsibly.</p>
<p>One of the biggest changes introduced through RLHF was safer refusal behavior.</p>
<p>Earlier GPT systems often attempted to answer almost anything. As a result, they often responded to unsafe prompts rather than recognizing when a refusal was appropriate.</p>
<p>InstructGPT begins changing that behavior.</p>
<p>Through human feedback and preference optimization, the model learns that some requests shouldn't be answered directly. Human labelers consistently prefer safer responses, refusals for harmful instructions, and outputs that avoid dangerous or toxic behavior.</p>
<p>This leads to systems that are better at refusing unsafe requests, avoiding toxic generations, and behaving more cautiously during interaction.</p>
<p>The paper also evaluates toxicity reduction using safety-related benchmarks and finds that aligned models generally produce fewer harmful outputs than earlier GPT systems.</p>
<p>Another important issue is harmful content filtering. Large language models absorb patterns from massive internet datasets, which inevitably contain biased language, misinformation, unsafe instructions, and toxic behavior.</p>
<p>Without alignment, models may reproduce these patterns surprisingly easily.</p>
<p>RLHF acts as a corrective layer on top of pretraining. Instead of only imitating internet text, the model is further optimized toward responses humans judge to be safer and more appropriate.</p>
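<p>One way to picture this corrective layer is the pairwise comparison loss commonly used to train a reward model from human rankings, which is also the form the InstructGPT paper describes. The sketch below is a simplified, single-comparison version with made-up scores. The function name and the numbers are assumptions for illustration, not the paper's code.</p>
<pre><code class="language-python">import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Loss for one human comparison: -log(sigmoid(preferred - rejected))."""
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); log1p keeps small values precise.
    return math.log1p(math.exp(-margin))


# Hypothetical reward-model scores for a safe answer and an unsafe one.
agrees_with_labelers = pairwise_preference_loss(2.0, 0.0)
disagrees_with_labelers = pairwise_preference_loss(0.0, 2.0)

print(round(agrees_with_labelers, 4), round(disagrees_with_labelers, 4))
</code></pre>
<p>The loss is small when the reward model scores the human-preferred response higher and large when it scores the rejected response higher, so training pushes the scores toward the labelers' rankings. The policy is then optimized against those scores.</p>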
<p>Of course, the paper is also realistic about limitations.</p>
<p>The authors acknowledge that alignment is incomplete and that unsafe outputs can still occur. Models may still be vulnerable to adversarial prompting or attempts to bypass safety behavior (what later became widely known as jailbreaks).</p>
<p>This is an important nuance: alignment reduces risk, but it doesn't eliminate it.</p>
<p>And historically, this realization became incredibly important for the future of large-scale AI deployment.</p>
<p>In many ways, the InstructGPT paper marks the beginning of modern AI safety engineering inside flagship language models.</p>
<p>InstructGPT introduced large-scale behavior alignment. Then GPT-4 expanded this even further with red teaming, adversarial testing, deployment monitoring, and much larger safety evaluation pipelines.</p>
<p>So this paper becomes a direct bridge between early generative language models and the much more safety-focused AI systems that followed in the GPT-4 era.</p>
<h2 id="heading-limitations">Limitations</h2>
<p>One of the strongest aspects of the InstructGPT paper is that it doesn't present alignment as a solved problem.</p>
<p>Even though the results are impressive, the authors are careful and surprisingly honest about the system’s remaining weaknesses and risks.</p>
<p>This balance is important because the paper isn't arguing that RLHF creates perfect AI systems. The authors consistently frame alignment as a work in progress rather than a finished solution.</p>
<p>One major limitation is that the models still hallucinate.</p>
<p>The paper acknowledges that hallucinations remain a significant challenge despite alignment improvements.</p>
<p>RLHF improves truthfulness and instruction adherence, but it doesn't fundamentally solve the probabilistic nature of language models. The system still predicts likely text patterns rather than verifying objective truth.</p>
<p>Another important issue is <a href="https://arxiv.org/pdf/2209.13085">reward hacking</a>.</p>
<p>Because the model is optimized against a learned reward signal, it can sometimes discover shortcuts that maximize reward without genuinely improving reasoning or understanding. In other words, the model may learn behaviors that <em>look</em> aligned to evaluators while still hiding deeper problems underneath.</p>
<p>This is a common challenge in reinforcement learning systems more broadly.</p>
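<p>The InstructGPT paper mitigates over-optimization of the reward model by adding a KL penalty that keeps the tuned policy close to the supervised fine-tuned model. The sketch below is a simplified, sequence-level version of that idea (the paper applies the penalty per token). The function name, the log-probabilities, and the <code>beta</code> value are all illustrative assumptions.</p>
<pre><code class="language-python">def kl_penalized_reward(reward_score, logprob_policy, logprob_reference, beta):
    """Reward-model score minus beta times the log-probability ratio."""
    log_ratio = logprob_policy - logprob_reference
    return reward_score - beta * log_ratio


# Hypothetical numbers: the reward model loves the first output, but the tuned
# policy finds it far more likely than the reference (supervised) model does.
suspicious = kl_penalized_reward(3.0, logprob_policy=-5.0, logprob_reference=-25.0, beta=0.1)
ordinary = kl_penalized_reward(2.5, logprob_policy=-12.0, logprob_reference=-13.0, beta=0.1)

print(suspicious, ordinary)
</code></pre>
<p>With these made-up values, the output that drifts far from the reference model ends up with the lower penalized reward despite its higher raw score. A penalty like this discourages the policy from exploiting quirks of the reward model, but it does not rule out reward hacking.</p>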
<p>The paper also hints at a problem that later became widely discussed in ChatGPT-era systems: <a href="https://arxiv.org/pdf/2406.11717">over-refusal</a> and <a href="https://arxiv.org/pdf/2310.13548">sycophancy</a>.</p>
<p>Sometimes aligned models become too cautious and refuse harmless requests unnecessarily. In other cases, models may become overly agreeable, telling users what they appear to want to hear instead of providing more balanced or truthful responses.</p>
<p>This creates a difficult tension between safety, helpfulness, and honesty.</p>
<p>Another major limitation is bias.</p>
<p>Since these systems are trained on massive internet datasets and further shaped through human labeling, they inevitably inherit biases from both sources. The paper explicitly acknowledges that alignment doesn't remove all harmful or biased behavior.</p>
<p>And perhaps most importantly, the paper emphasizes that RLHF aligns models to labeler preferences, not to universal human values. This is a very important nuance.</p>
<p>The system learns from the judgments of specific human annotators operating within specific cultural and organizational contexts. That means alignment itself is subjective and imperfect.</p>
<p>There is no single universally agreed definition of helpfulness, fairness, safety, or acceptable behavior.</p>
<p>The paper discusses these concerns carefully and recognizes that human feedback introduces its own limitations and assumptions.</p>
<p>The alignment itself is also fragile. Even aligned systems can sometimes be manipulated through adversarial prompting or jailbreak-style attacks that bypass safety behavior. This later became one of the defining challenges of ChatGPT and GPT-4 deployment.</p>
<p>And finally, there's the practical issue of scale.</p>
<p>RLHF requires large amounts of human labeling, ranking, evaluation, and monitoring. Building these alignment pipelines is expensive, time-consuming, and operationally complex. Unlike raw pretraining data scraped automatically from the internet, human feedback doesn't scale nearly as easily.</p>
<p>In many ways, the paper reveals an important truth about modern AI systems: making models intelligent is difficult. But making them reliably aligned with humans may be even harder.</p>
<h2 id="heading-historical-importance">Historical Importance</h2>
<p>Looking back now, it's difficult to overstate how important the InstructGPT paper became for the entire AI industry.</p>
<p>Earlier GPT papers focused mostly on one central question: How do we make language models more capable?</p>
<p>That era was largely driven by larger datasets, larger parameter counts, scaling laws, and benchmark performance.</p>
<p>The models became increasingly impressive at generating text, solving tasks, and demonstrating emergent abilities. But they still behaved primarily like prediction engines trained to continue internet text.</p>
<p>InstructGPT changed the focus completely. For the first time, large-scale AI development began shifting from model-centric AI to interaction-centric AI.</p>
<p>This was a major philosophical transition: the industry realized that users didn't only care about raw intelligence, benchmark scores, or parameter counts.</p>
<p>They cared about usability, conversational quality, safety, trust, and whether the system could actually help them effectively.</p>
<p>This is why ChatGPT felt so different to the public. The underlying language model capabilities were important, but the real breakthrough came from how those capabilities were shaped into a usable human experience.</p>
<p>The interface became conversational. The system became more cooperative. The AI became more aligned with user intent.</p>
<p>That shift fundamentally changed public perception of artificial intelligence.</p>
<p>Before ChatGPT, most people saw AI as research software, technical demos, or specialized tools for experts.</p>
<p>After ChatGPT, millions of people started interacting with AI systems conversationally on a daily basis.</p>
<p>And that changed everything.</p>
<p>Earlier GPT papers focused mainly on discovering what scaling could achieve. InstructGPT introduced a different challenge: How do we safely deploy these systems in the real world?</p>
<p>That shift helped create entirely new areas of research and engineering, including RLHF pipelines, safety tuning, refusal behavior, red teaming, adversarial testing, policy frameworks, and large-scale human-feedback infrastructure.</p>
<p>In many ways, the ChatGPT era began the moment researchers realized that building powerful models was only part of the problem.</p>
<p>The harder challenge was making those systems reliable enough for human interaction at global scale.</p>
<p>It also helps explain why later systems placed much greater emphasis on safety, alignment, deployment practices, and real-world reliability.</p>
<p>The industry was no longer building language models only for research papers. It was building AI systems intended to operate in the real world. And the InstructGPT paper became one of the clearest turning points in that transformation.</p>
<h2 id="heading-discussion-the-real-shift">Discussion: The Real Shift</h2>
<p>The transition from GPT-3 to ChatGPT represents something much deeper than a simple improvement in model performance.</p>
<p>It changed the central question driving the entire AI industry.</p>
<p>During the GPT-3 era, the big question was, “Can language models learn tasks directly from prompts?”</p>
<p>That was the breakthrough introduced by GPT-3.</p>
<p>Research attention shifted toward scaling and emergent capabilities.</p>
<p>But the ChatGPT era introduced a completely different challenge: the question was no longer simply “Can the model perform the task?” Instead, it became, “Can humans actually trust and use these systems every day?”</p>
<p>That shift changed everything.</p>
<p>Once millions of people began interacting with AI systems directly, raw intelligence alone was no longer sufficient. Users needed systems that were understandable, reliable, safe, conversational, and aligned with human expectations.</p>
<p>This is exactly why the InstructGPT paper became so historically important. It introduced the idea that large language models should not only optimize for capability, but also for human interaction quality.</p>
<p>In many ways, the industry moved from “How smart is the model?” to “How usable is the model?”</p>
<p>And that transition fundamentally changed AI development.</p>
<p>After ChatGPT, success was no longer measured only by benchmark scores, parameter counts, or scaling curves.</p>
<p>It was increasingly measured by alignment, conversational quality, safety, and real-world usability.</p>
<p>This also explains why alignment research suddenly became central to modern AI systems.</p>
<p>GPT-3 showed that models could learn from prompts. ChatGPT showed that humans needed models that could cooperate.</p>
<p>That was the real shift.</p>
<p>And it may ultimately become one of the most important turning points in the history of artificial intelligence.</p>
<h2 id="heading-connection-to-gpt-4">Connection to GPT-4</h2>
<p>One of the most important things to understand about GPT-4 is that it didn't appear out of nowhere.</p>
<p>It was built on top of the alignment ideas introduced by InstructGPT and refined through the large-scale deployment experience of ChatGPT.</p>
<p>GPT-4 is often discussed in terms of its reasoning, multimodal abilities, and benchmark performance.</p>
<p>But beneath all of those improvements is something equally important: the alignment pipeline.</p>
<p>Without the work introduced in the InstructGPT paper, GPT-4 would likely feel far less usable as a real-world assistant.</p>
<p>That distinction matters enormously.</p>
<p>Many of GPT-4's alignment techniques can be traced back to ideas introduced by InstructGPT, including RLHF, instruction tuning, conversational alignment, safer refusal behavior, and human preference optimization.</p>
<p>ChatGPT then became the large-scale real-world testing ground for these ideas.</p>
<p>Millions of user interactions exposed weaknesses ranging from hallucinations and jailbreak attempts to broader safety and usability issues.</p>
<p>Those deployment lessons became incredibly valuable.</p>
<p>By the time GPT-4 arrived, OpenAI was no longer simply training a larger language model. It was building a large-scale aligned conversational system shaped by RLHF pipelines, human feedback, safety engineering, adversarial testing, and real-world user interaction.</p>
<p>This is why GPT-4 feels fundamentally different from earlier GPT models.</p>
<p>In many ways, GPT-4 represents the convergence of two major ideas: scaling capability and scaling alignment.</p>
<ul>
<li><p>GPT-3 proved that language models could learn tasks from prompts.</p>
</li>
<li><p>InstructGPT proved that models could be shaped through human feedback.</p>
</li>
<li><p>ChatGPT proved that aligned conversational AI could work at global scale.</p>
</li>
<li><p>GPT-4 combined all of those ideas into a much more capable multimodal system.</p>
</li>
</ul>
<p>That historical progression is important because it shows that modern AI systems aren't built through scaling alone. They're built through the combination of intelligence, alignment, interaction design, and deployment experience.</p>
<p>And the InstructGPT paper became one of the key foundations that made GPT-4 possible.</p>
<h2 id="heading-gpt-3-vs-instructgpt-vs-chatgpt-vs-gpt-4-key-differences">GPT-3 vs InstructGPT vs ChatGPT vs GPT-4: Key Differences</h2>
<p>By this point, we've discussed GPT-3, InstructGPT, ChatGPT, and GPT-4 individually. But it can be helpful to see them side by side.</p>
<p>Although these systems are closely related, each one introduced a different shift in the evolution of modern AI.</p>
<p>GPT-3 focused on capability through scale, InstructGPT focused on alignment through human feedback, ChatGPT focused on conversational usability, and GPT-4 combined these ideas with stronger reasoning and multimodal capabilities.</p>
<p>The table below summarizes the main differences between them and shows how each system built on the progress of the previous generation.</p>
<table style="min-width:125px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Aspect</strong></p></td><td><p><strong>GPT-3</strong></p></td><td><p><strong>InstructGPT</strong></p></td><td><p><strong>ChatGPT</strong></p></td><td><p><strong>GPT-4</strong></p></td></tr><tr><td><p><strong>Core Idea</strong></p></td><td><p>Large-scale language model enabling few-shot and in-context learning</p></td><td><p>Align language models with human instructions using RLHF</p></td><td><p>Conversational AI assistant optimized for dialogue and usability</p></td><td><p>Aligned multimodal foundation model with stronger reasoning and deployment maturity</p></td></tr><tr><td><p><strong>Main Goal</strong></p></td><td><p>Scale capability through massive pretraining</p></td><td><p>Improve instruction following and alignment</p></td><td><p>Deliver usable conversational AI for the public</p></td><td><p>Build reliable multimodal AI systems for real-world deployment</p></td></tr><tr><td><p><strong>Training Objective</strong></p></td><td><p>Predict next token from internet-scale text</p></td><td><p>Optimize outputs using human feedback and preference learning</p></td><td><p>Conversational interaction optimized through RLHF and dialogue tuning</p></td><td><p>Large-scale multimodal pretraining combined with RLHF, safety tuning, and deployment optimization</p></td></tr><tr><td><p><strong>Alignment Focus</strong></p></td><td><p>Minimal explicit alignment</p></td><td><p>Central focus of the paper</p></td><td><p>Strong conversational alignment</p></td><td><p>Advanced alignment and safety engineering</p></td></tr><tr><td><p><strong>RLHF Usage</strong></p></td><td><p>Not central</p></td><td><p>Core innovation of the system</p></td><td><p>Major component of interaction quality</p></td><td><p>Expanded and refined at larger scale</p></td></tr><tr><td><p><strong>Human 
Feedback Role</strong></p></td><td><p>Limited</p></td><td><p>Human rankings shape model behavior directly</p></td><td><p>Human feedback improves conversation flow and usability</p></td><td><p>Human feedback combined with large-scale safety evaluation and red teaming</p></td></tr><tr><td><p><strong>Interaction Style</strong></p></td><td><p>Prompt-based text generation</p></td><td><p>Instruction-following assistant</p></td><td><p>Natural multi-turn conversational assistant</p></td><td><p>Advanced conversational and multimodal assistant</p></td></tr><tr><td><p><strong>Prompting Style</strong></p></td><td><p>Zero-shot, one-shot, and few-shot prompting</p></td><td><p>Instruction prompts become more reliable</p></td><td><p>Conversational prompting becomes primary interface</p></td><td><p>Conversational and multimodal prompting</p></td></tr><tr><td><p><strong>Conversation Memory</strong></p></td><td><p>Limited contextual continuity</p></td><td><p>Better instruction adherence</p></td><td><p>Maintains dialogue flow across interactions</p></td><td><p>Stronger contextual reasoning across longer interactions</p></td></tr><tr><td><p><strong>Instruction Following</strong></p></td><td><p>Often inconsistent</p></td><td><p>Significantly improved</p></td><td><p>Strong conversational instruction following</p></td><td><p>More reliable and nuanced instruction handling</p></td></tr><tr><td><p><strong>Truthfulness</strong></p></td><td><p>Frequent hallucinations and overconfidence</p></td><td><p>Improved factual alignment through RLHF</p></td><td><p>More reliable but still hallucinates</p></td><td><p>Improved reasoning and factual performance, though hallucinations remain</p></td></tr><tr><td><p><strong>Safety Behavior</strong></p></td><td><p>Weak safety control</p></td><td><p>Safer refusal behavior introduced</p></td><td><p>More robust refusal and moderation behavior</p></td><td><p>Advanced safety pipelines and adversarial testing</p></td></tr><tr><td><p><strong>Harmful Output 
Handling</strong></p></td><td><p>Often continues unsafe prompts</p></td><td><p>Learns safer refusals from human feedback</p></td><td><p>Stronger refusal behavior in public deployment</p></td><td><p>More sophisticated alignment and safety systems</p></td></tr><tr><td><p><strong>Reasoning Ability</strong></p></td><td><p>Strong emergent reasoning for its time</p></td><td><p>Similar base capability but behaviorally improved</p></td><td><p>Improved practical reasoning in conversation</p></td><td><p>Major leap in reasoning and problem-solving</p></td></tr><tr><td><p><strong>Multimodal Capability</strong></p></td><td><p>Text only</p></td><td><p>Text only</p></td><td><p>Primarily text-based at launch</p></td><td><p>Text and image multimodal understanding</p></td></tr><tr><td><p><strong>Coding Ability</strong></p></td><td><p>Strong code generation emergence</p></td><td><p>Improved usability for coding tasks</p></td><td><p>Widely used as coding assistant</p></td><td><p>Much stronger coding and debugging performance</p></td></tr><tr><td><p><strong>Context Handling</strong></p></td><td><p>2048-token context window</p></td><td><p>Similar GPT-3-based context limits</p></td><td><p>Improved conversational memory handling</p></td><td><p>Much larger context capabilities</p></td></tr><tr><td><p><strong>Model Size</strong></p></td><td><p>175B parameters</p></td><td><p>Fine-tuned versions of GPT-3 models</p></td><td><p>Based on aligned GPT-3.5/GPT-4 systems</p></td><td><p>Undisclosed by OpenAI</p></td></tr><tr><td><p><strong>Training Data</strong></p></td><td><p>Massive internet-scale text datasets</p></td><td><p>GPT-3 pretraining plus human demonstrations and rankings</p></td><td><p>Large conversational interaction tuning datasets</p></td><td><p>Large-scale multimodal and internet-scale datasets</p></td></tr><tr><td><p><strong>Learning Paradigm</strong></p></td><td><p>In-context learning through scale</p></td><td><p>Human preference learning through RLHF</p></td><td><p>Conversational 
alignment at deployment scale</p></td><td><p>Combined capability scaling and alignment scaling</p></td></tr><tr><td><p><strong>Key Innovation</strong></p></td><td><p>Emergent few-shot learning</p></td><td><p>RLHF-based alignment pipeline</p></td><td><p>Conversational AI interface revolution</p></td><td><p>Multimodal aligned foundation systems</p></td></tr><tr><td><p><strong>User Experience</strong></p></td><td><p>Powerful but difficult to control</p></td><td><p>More cooperative and instruction-aware</p></td><td><p>Feels like talking to an assistant</p></td><td><p>More reliable, capable, and multimodal interaction</p></td></tr><tr><td><p><strong>Reliability</strong></p></td><td><p>Often unstable across prompts</p></td><td><p>More stable instruction behavior</p></td><td><p>Significantly improved usability</p></td><td><p>Stronger robustness and interaction quality</p></td></tr><tr><td><p><strong>Deployment Style</strong></p></td><td><p>Research and API usage</p></td><td><p>Alignment research milestone</p></td><td><p>Mass public deployment</p></td><td><p>Large-scale multimodal deployment</p></td></tr><tr><td><p><strong>Benchmark Emphasis</strong></p></td><td><p>Capability scaling and few-shot tasks</p></td><td><p>Human preference evaluations and alignment</p></td><td><p>Real-world conversational usability</p></td><td><p>Broad multimodal benchmark dominance</p></td></tr><tr><td><p><strong>Main Limitation</strong></p></td><td><p>Poor alignment and hallucinations</p></td><td><p>Alignment still incomplete and subjective</p></td><td><p>Hallucinations and jailbreak vulnerabilities</p></td><td><p>Hallucinations, safety tradeoffs, and lack of transparency</p></td></tr><tr><td><p><strong>Historical Importance</strong></p></td><td><p>Proved scaling produces emergent abilities</p></td><td><p>Introduced modern alignment-centered LLM training</p></td><td><p>Brought conversational AI to mainstream global use</p></td><td><p>Defined the era of aligned multimodal AI 
systems</p></td></tr><tr><td><p><strong>What Changed in AI</strong></p></td><td><p>Prompting became central</p></td><td><p>Alignment became a core research priority</p></td><td><p>AI became a mainstream consumer interface</p></td><td><p>AI became deployable multimodal infrastructure</p></td></tr><tr><td><p><strong>Legacy</strong></p></td><td><p>Foundation of prompt-driven AI</p></td><td><p>Foundation of ChatGPT alignment pipeline</p></td><td><p>Popularized conversational AI globally</p></td><td><p>Established modern multimodal AI ecosystem</p></td></tr></tbody></table>

<h2 id="heading-from-gpt-1-to-gpt-4-a-timeline-of-modern-ai-systems-and-alignment-evolution">From GPT-1 to GPT-4: A Timeline of Modern AI Systems and Alignment Evolution</h2>
<p>Before we wrap up, it's worth stepping back and looking at the bigger picture.</p>
<p>The InstructGPT paper didn't emerge in isolation. It was part of a much larger evolution that transformed GPT models from research-focused language models into the conversational AI systems we use today.</p>
<p>Each generation introduced a new idea that pushed the field forward.</p>
<p>GPT-1 introduced large-scale pretraining, GPT-2 demonstrated zero-shot capabilities, GPT-3 popularized prompting and in-context learning, and InstructGPT introduced alignment through human feedback. ChatGPT then brought these ideas to millions of users through a conversational interface, while GPT-4 combined alignment with stronger reasoning and multimodal capabilities.</p>
<p>The timeline below summarizes the key transitions that shaped the modern AI era.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/6e4cc89c-7772-41e4-b5dc-b61820e1521a.png" alt="From GPT-1 to GPT-4 A Timeline of Modern AI Systems and Alignment Evolution" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<table style="min-width:150px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Year</strong></p></td><td><p><strong>System</strong></p></td><td><p><strong>Main Transition</strong></p></td><td><p><strong>What Changed</strong></p></td><td><p><strong>Key Paper / Release</strong></p></td><td><p><strong>Historical Importance</strong></p></td></tr><tr><td><p><strong>2018</strong></p></td><td><p>GPT-1</p></td><td><p>Pretraining + Fine-Tuning Era</p></td><td><p>Introduced generative pretraining using Transformers before supervised fine-tuning</p></td><td><p><em>Improving Language Understanding by Generative Pre-Training</em></p></td><td><p>Started the modern large-scale NLP pretraining paradigm</p></td></tr><tr><td><p><strong>2019</strong></p></td><td><p>GPT-2</p></td><td><p>Zero-Shot Language Modeling Era</p></td><td><p>Showed that larger language models could perform multiple tasks without task-specific fine-tuning</p></td><td><p><em>Language Models are Unsupervised Multitask Learners</em></p></td><td><p>Shifted AI toward general-purpose generative models</p></td></tr><tr><td><p><strong>2020</strong></p></td><td><p>GPT-3</p></td><td><p>In-Context Learning Era</p></td><td><p>Demonstrated few-shot, one-shot, and zero-shot learning at massive scale using prompts alone</p></td><td><p><em>Language Models are Few-Shot Learners</em></p></td><td><p>Made prompting the central interface for AI systems</p></td></tr><tr><td><p><strong>March 2022</strong></p></td><td><p>InstructGPT</p></td><td><p>Alignment and RLHF Era</p></td><td><p>Introduced reinforcement learning from human feedback (RLHF) to align models with user intent</p></td><td><p><em>Training Language Models to Follow Instructions with Human Feedback</em></p></td><td><p>Shifted AI development from raw capability to alignment and 
usability</p></td></tr><tr><td><p><strong>Nov 2022</strong></p></td><td><p>GPT-3.5 / ChatGPT</p></td><td><p>Conversational AI Era</p></td><td><p>Combined GPT-3.5 with RLHF and chat-based interaction for public deployment</p></td><td><p>ChatGPT public release based on GPT-3.5 family</p></td><td><p>Turned LLMs into mainstream conversational assistants used globally</p></td></tr><tr><td><p><strong>2023</strong></p></td><td><p>GPT-4</p></td><td><p>Multimodal Aligned Foundation Model Era</p></td><td><p>Expanded aligned AI into multimodal reasoning across text and images with stronger reliability and safety systems</p></td><td><p>GPT-4 Technical Report</p></td><td><p>Established the modern era of deployable multimodal AI systems</p></td></tr><tr><td><p><strong>2023–Present</strong></p></td><td><p>GPT-4 + ChatGPT Ecosystem</p></td><td><p>AI Assistant Infrastructure Era</p></td><td><p>AI systems evolved into integrated assistants for coding, education, productivity, reasoning, and multimodal interaction</p></td><td><p>GPT-4 deployment ecosystem</p></td><td><p>Transitioned AI from research products into global infrastructure platforms</p></td></tr></tbody></table>

<h2 id="heading-final-insight">Final Insight</h2>
<p>When people look back at the history of modern AI, they often focus on the moments when models became larger, more powerful, or more capable. But the story of the GPT series is not just a story about scale. It is also a story about learning how to make that intelligence useful.</p>
<p>GPT-1 showed that language models could learn surprisingly rich representations from large amounts of text before being adapted to specific tasks.</p>
<p>GPT-2 expanded that idea and revealed that scale itself could unlock new behaviors.</p>
<p>GPT-3 pushed the field into entirely new territory, demonstrating that a single model could perform a wide variety of tasks simply by responding to prompts and examples.</p>
<p>For a moment, it seemed as though scaling might be the answer to everything.</p>
<p>Then InstructGPT arrived and exposed a different challenge.</p>
<p>The problem was no longer whether a model could generate text, answer questions, or complete tasks. Models were already becoming remarkably capable.</p>
<p>The real question was whether people could actually rely on them. Could they follow instructions consistently? Could they respond in ways users found helpful? Could they become something more than sophisticated prediction engines?</p>
<p>That was the breakthrough at the heart of InstructGPT.</p>
<p>Rather than focusing solely on making models smarter, the paper focused on making them behave better.</p>
<p>Human feedback became part of the training process itself.</p>
<p>Alignment moved from a research concern to a core design principle. For the first time, improving the relationship between humans and AI became just as important as improving the model's raw capabilities.</p>
<p>The impact of that shift extended far beyond a single paper.</p>
<p>It laid the groundwork for ChatGPT, which introduced millions of people to conversational AI. Suddenly, interacting with advanced language models no longer required APIs, research expertise, or carefully engineered prompts. People could simply ask questions, seek advice, explore ideas, or learn something new through natural conversation.</p>
<p>That change transformed AI from a research breakthrough into a widely used product.</p>
<p>GPT-4 would later build on this foundation, combining stronger reasoning and broader capabilities with the alignment techniques that began with InstructGPT. But by then, the industry had already learned an important lesson: capability alone was not enough. Intelligence had to be usable.</p>
<p>In hindsight, the lasting significance of the InstructGPT paper is not that it introduced a new training pipeline. It is that it helped redefine the goal of modern AI.</p>
<p>The challenge was no longer just building systems that could generate language.</p>
<p>It was building systems that people could work with, learn from, and trust.</p>
<p>And that may ultimately be the transition that defined this era of artificial intelligence.</p>
<h2 id="heading-resources"><strong>Resources:</strong></h2>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD/Pytorch-Collections/tree/main/GPT">PyTorch Projects for the GPT Series</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2203.02155">Training Language Models to Follow Instructions with Human Feedback</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2005.14165">Language Models are Few-Shot Learners</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2009.01325">Learning to Summarize from Human Feedback</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1909.08593">Fine-Tuning Language Models from Human Preferences</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1706.03741">Deep Reinforcement Learning from Human Preferences</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2008.02275">Aligning AI With Shared Human Values</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2107.05637">Asking for Help on Recursive Decomposition</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2112.09332">WebGPT: Browser-assisted Question-Answering with Human Feedback</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2212.08073">Constitutional AI: Harmlessness from AI Feedback</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2109.07958">TruthfulQA: Measuring How Models Mimic Human Falsehoods</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2009.11462">RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2104.08691">The Power of Scale for Parameter-Efficient Prompt Tuning</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2109.01652">Finetuned Language Models Are Zero-Shot Learners</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2110.08207">Multitask Prompted Training Enables Zero-Shot Task Generalization</a></p>
</li>
</ul>
<p><strong>Contact Me</strong></p>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD"><strong>GitHub</strong></a></p>
</li>
<li><p><a href="https://x.com/programmingoce"><strong>X</strong></a></p>
</li>
<li><p><a href="https://www.linkedin.com/in/mohammed-abrah-6435a63ba/"><strong>LinkedIn</strong></a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Run an LLM Locally on Your Mobile Phone with QVAC and Expo ]]>
                </title>
                <description>
                    <![CDATA[ When I was younger, I remember my mother’s Android phone, a Samsung Galaxy Note 3 that she bought right after losing her BlackBerry. During that time, a phone with 16 GB of storage was considered cutt ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-run-an-llm-locally-on-your-mobile-phone-with-qvac-and-expo/</link>
                <guid isPermaLink="false">6a2061ad78a43e3153aede0d</guid>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mobile Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ local development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Djibril-M🍀 ]]>
                </dc:creator>
                <pubDate>Wed, 03 Jun 2026 17:17:33 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/a5fb9baf-a10d-4e53-9c66-3980919a35b8.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When I was younger, I remember my mother’s Android phone, a Samsung Galaxy Note 3 that she bought right after losing her BlackBerry. During that time, a phone with 16 GB of storage was considered cutting-edge technology. The ability to store five 720p torrented movies on a single phone honestly felt unreal.</p>
<p>Most flagship devices back then shipped with somewhere between 1 and 3 GB of RAM, and GPUs were nowhere near what we carry around today. My mom’s Galaxy Note 3 featured the Qualcomm Adreno 330 GPU with 32 unified shader cores running at up to 578 MHz, a complete powerhouse for its time.</p>
<p>Fast forward to today, and the phones in our pockets are ridiculously more powerful, more efficient, and, honestly, capable of things people would’ve considered science fiction back then.</p>
<p>But enough about my mom’s phone. What I’m really trying to say is this: instead of spending hundreds of dollars every month on AI subscriptions and tokens, we can take advantage of the insanely capable devices we already carry around every day.</p>
<p>Modern smartphones now have dedicated AI acceleration, impressive thermal efficiency, and enough compute power to run lightweight language models locally, completely offline. That means better privacy, full control over your chat history, lower latency, and the ability to use AI without depending entirely on cloud services.</p>
<p>In this article, we’re going to build a React Native application that interacts with an LLM running directly on the device itself. The implementation will revolve around QVAC, a family of inference tools designed specifically for running AI models locally.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-qvac">What is QVAC?</a></p>
</li>
<li><p><a href="#heading-environment-setup">Environment Setup</a></p>
</li>
<li><p><a href="#heading-model-management">Model Management</a></p>
</li>
<li><p><a href="#heading-custom-models">Custom Models</a></p>
</li>
<li><p><a href="#heading-complete-implementation">Complete Implementation</a></p>
</li>
<li><p><a href="#heading-codebase-breakdown">Codebase Breakdown</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-resources-amp-further-reading">Resources &amp; Further Reading</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most out of this article, you should have a basic understanding of front end development and React in general. You don't have to be a mobile developer, but understanding React will help a lot.</p>
<h2 id="heading-what-is-qvac">What is QVAC?</h2>
<p>QVAC (QuantumVerse Automatic Computer) is a local-first AI inference platform developed by Tether. It's designed to move artificial intelligence away from centralized cloud systems and bring computation back to the user’s own device.</p>
<p>Most modern AI tools rely heavily on remote servers, API keys, and cloud infrastructure controlled by a handful of companies. While this makes AI accessible, it also creates major concerns around privacy, censorship, vendor lock-in, internet dependency, and ownership of user data. Every prompt, conversation, or uploaded file often passes through third-party servers that users have little control over.</p>
<p>QVAC was designed to solve that problem by allowing AI models and agents to run directly on devices like smartphones, laptops, and embedded systems, even while completely offline. Instead of sending personal conversations and sensitive data to the cloud, users can process everything locally on their own hardware.</p>
<p>The platform also embraces decentralization through peer-to-peer communication, reducing reliance on centralized infrastructure and eliminating single points of failure. This approach makes AI systems more private, resilient, autonomous, and accessible, especially in environments with limited internet access or strict data privacy requirements.</p>
<p>In simple terms, QVAC exists to make AI truly owned by its users — local-first, private by default, and independent from centralized control.</p>
<h2 id="heading-environment-setup">Environment Setup</h2>
<p>To speed up the process, I prepared a React Native starter project with all the dependencies installed. But we will install and set up QVAC in this article, since that's our main topic. Here's a link to the <a href="https://github.com/DjibrilM/QVAC-offline-Chatbot-Article-Project-">repository</a>.</p>
<p>Or you can run the command below to clone the starter project:</p>
<pre><code class="language-shell">git clone --branch ft-ui-implementation --single-branch https://github.com/DjibrilM/QVAC-offline-Chatbot-Article-Project-
</code></pre>
<h3 id="heading-qvac-installation">QVAC Installation</h3>
<p>Run the following command to install the SDK: <code>npm i @qvac/sdk</code>. Feel free to use any package manager of your choice. As for me, I will keep things simple with <code>npm</code>.</p>
<p>Then add the peer dependencies <code>bare-rpc</code> and <code>react-native-bare-kit</code> to your <code>package.json</code> so that it looks like this:</p>
<pre><code class="language-json">{
  "dependencies": {
    "@qvac/sdk": "^0.7.0",
    "bare-rpc": "^1.0.0",
    "expo": "~54.0.33",
    "expo-status-bar": "~3.0.9",
    "react": "19.1.0",
    "react-native": "0.81.5",
    "react-native-bare-kit": "^0.11.5"
  },
  "devDependencies": {
    "@types/react": "~19.1.0",
    "bare-pack": "^1.5.1", 
    "typescript": "~5.9.2"
  }
}
</code></pre>
<p>Install the following additional dependencies:</p>
<pre><code class="language-shell">npx expo install expo-file-system expo-build-properties expo-device
</code></pre>
<p>Then add <code>@qvac/sdk/expo-plugin</code> to the <code>plugins</code> array in your <code>app.json</code>, alongside any <code>expo-build-properties</code> configuration your project needs:</p>
<pre><code class="language-json">{
  "expo": {
    "plugins": [
      "expo-router",
      "@qvac/sdk/expo-plugin",
      [
        "expo-splash-screen",
        {
          "backgroundColor": "#208AEF",
          "android": {
            "image": "./assets/images/splash-icon.png",
            "imageWidth": 76
          }
        }
      ]
    ]
  }
}
</code></pre>
<p>Run the following command to build the native modules:</p>
<pre><code class="language-shell">npx expo prebuild
</code></pre>
<p><strong>Note:</strong> QVAC uses llama.cpp under the hood. Due to optimization requirements and native hardware dependencies, the QVAC SDK doesn't run on emulators. You'll have to test this with a real physical device with Developer Mode enabled.</p>
<p>To run the app on your physical device, execute:</p>
<pre><code class="language-shell"># For Android:
npx expo run:android --device

# For iOS:
npx expo run:ios --device
</code></pre>
<h2 id="heading-model-management">Model Management</h2>
<p>The QVAC model management system is completely local-first and decentralized. It handles the entire model lifecycle, from downloading files to loading them into memory and releasing them again, behind a small set of utility APIs.</p>
<h3 id="heading-resumable-amp-deduplicated-downloading-downloadasset">Resumable &amp; Deduplicated Downloading (<code>downloadAsset</code>)</h3>
<p><code>downloadAsset</code> writes temporary chunks to local disk as it downloads. If the network drops, the partial file is preserved and the download resumes automatically on the next call. And if multiple components request the same asset at the same time, QVAC serves them all from a single network stream instead of downloading the file more than once.</p>
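<p>The deduplication idea can be sketched in plain TypeScript. This is only an illustration of the general "share one in-flight promise" pattern, not QVAC's actual implementation. The names <code>dedupe</code> and <code>fakeDownload</code> are hypothetical and exist only for this example.</p>
<pre><code class="language-typescript">// Illustrative only: share one in-flight promise per key so that
// concurrent callers reuse the same work. This is not QVAC's real code.
function dedupe&lt;T&gt;(work: (key: string) =&gt; Promise&lt;T&gt;) {
  const inFlight = new Map&lt;string, Promise&lt;T&gt;&gt;();

  return (key: string): Promise&lt;T&gt; =&gt; {
    const existing = inFlight.get(key);
    if (existing) return existing; // reuse the download that is already running

    // Forget the entry once the work settles, whether it succeeded or failed.
    const started = work(key).finally(() =&gt; inFlight.delete(key));
    inFlight.set(key, started);
    return started;
  };
}

// Hypothetical stand-in for a real network download.
let networkCalls = 0;
async function fakeDownload(asset: string): Promise&lt;string&gt; {
  networkCalls += 1;
  return `/local/cache/${asset}`;
}

const download = dedupe(fakeDownload);
</code></pre>
<p>Calling <code>download("model.gguf")</code> twice at the same time triggers a single <code>fakeDownload</code> call, and both callers receive the same result.</p>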
<h3 id="heading-memory-lifecycle-loadmodel-amp-unloadmodel">Memory Lifecycle (<code>loadModel</code> &amp; <code>unloadModel</code>)</h3>
<p><code>loadModel</code> maps the asset file directly into memory, binds it to your hardware target (such as the device GPU), and returns an ephemeral <code>modelId</code>. Because local inference is highly memory-intensive on mobile devices, call <code>unloadModel</code> when you're done with a model: it frees system RAM immediately while preserving the downloaded file on disk.</p>
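<p>One way to make sure memory is always released is to pair loading and unloading in a single helper. The sketch below is a minimal example under stated assumptions: <code>withModel</code>, <code>Load</code>, and <code>Unload</code> are hypothetical names, and the SDK functions are passed in by the caller, so the sketch makes no claim about their exact signatures.</p>
<pre><code class="language-typescript">// Hypothetical shapes: the caller wraps the real SDK functions, so this
// sketch does not assume anything about their exact arguments.
type Load = () =&gt; Promise&lt;string&gt;; // resolves to a modelId
type Unload = (modelId: string) =&gt; Promise&lt;void&gt;;

// Run `task` with a loaded model and always release memory afterwards,
// even if the task throws.
async function withModel&lt;T&gt;(
  load: Load,
  unload: Unload,
  task: (modelId: string) =&gt; Promise&lt;T&gt;,
): Promise&lt;T&gt; {
  const modelId = await load();
  try {
    return await task(modelId);
  } finally {
    await unload(modelId);
  }
}
</code></pre>
<p>In an app like the one in this article, <code>load</code> would wrap the <code>loadModel</code> call and <code>unload</code> would wrap <code>unloadModel</code>. Check the SDK documentation for the arguments each one expects.</p>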
<h3 id="heading-custom-models">Custom Models</h3>
<p>Because QVAC relies on an optimized branch of llama.cpp, it remains highly compatible with the open-source AI ecosystem. If you plan to load custom models, ensure they adhere to these criteria:</p>
<ul>
<li><p><strong>Format:</strong> Must be in the GGUF (<code>.gguf</code>) format.</p>
</li>
<li><p><strong>Quantization:</strong> For mobile and edge deployments, prefer <code>Q4_0</code>, <code>Q4_K_M</code>, or <code>Q8_0</code> quantizations so the model is more likely to fit within the device's RAM. The 4-bit variants need roughly half the memory of <code>Q8_0</code>.</p>
</li>
</ul>
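<p>To get a feel for what these quantization levels mean in practice, here is a rough back-of-the-envelope estimate. It's a sketch, not a precise sizing tool. The <code>Q4_0</code> and <code>Q8_0</code> figures follow from llama.cpp's block layout (32 weights plus a 16-bit scale per block), while the <code>Q4_K_M</code> figure is an approximate average and should be treated as an assumption. The function names and the 50% RAM budget are also made up for this example.</p>
<pre><code class="language-typescript">// Approximate bits per weight. Q4_0 and Q8_0 follow from the block layout
// (32 weights plus a 16-bit scale). Q4_K_M mixes tensor types, so its
// value here is a rough average and an assumption.
const BITS_PER_WEIGHT = { Q4_0: 4.5, Q4_K_M: 4.85, Q8_0: 8.5 } as const;
type Quant = keyof typeof BITS_PER_WEIGHT;

// Estimated size of the weights alone, in mebibytes.
function estimateWeightsMiB(paramCount: number, quant: Quant): number {
  const bytes = (paramCount * BITS_PER_WEIGHT[quant]) / 8;
  return bytes / (1024 * 1024);
}

// Leave headroom for the KV cache, the app itself, and the OS.
// The default budget of 0.5 is a guess, not a measured value.
function likelyFits(
  paramCount: number,
  quant: Quant,
  freeRamMiB: number,
  budget = 0.5,
): boolean {
  return estimateWeightsMiB(paramCount, quant) &lt;= freeRamMiB * budget;
}

const oneBillion = 1_000_000_000;
console.log(Math.round(estimateWeightsMiB(oneBillion, "Q4_0"))); // 536
console.log(Math.round(estimateWeightsMiB(oneBillion, "Q8_0"))); // 1013
</code></pre>
<p>The estimate covers the weights only. Actual memory use at runtime is higher because of the context cache and runtime buffers, which is why the helper keeps a safety margin.</p>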
<h2 id="heading-complete-implementation">Complete Implementation</h2>
<p>Now let's put everything together. The full implementation below combines the UI layout, user interaction state, model lifecycle setup, and real-time inference handling in a single component.</p>
<p>Replace your entry file with the following code:</p>
<pre><code class="language-typescript">import { ChatInput } from "@/components/chat-input";
import { ChatMessage, Message } from "@/components/chat-message";
import { ModelLoader } from "@/components/model-loader";
import { Text } from "@/components/ui/text";

import {
  completion,
  downloadAsset,
  LLAMA_3_2_1B_INST_Q4_0,
  loadModel,
  type ModelProgressUpdate,
  VERBOSITY,
} from "@qvac/sdk";
import { useEffect, useRef, useState } from "react";

import {
  Clipboard,
  KeyboardAvoidingView,
  Platform,
  SafeAreaView,
  ScrollView,
  View,
} from "react-native";

const makeId = () =&gt; Math.random().toString(36).substring(2, 9);

export default function Index() {
  const [messages, setMessages] = useState&lt;Message[]&gt;([]);
  const [input, setInput] = useState("");
  const [isGenerating, setIsGenerating] = useState(false);

  // Model loading state
  const [modelId, setModelId] = useState&lt;string | null&gt;(null);
  const [isModelLoaded, setIsModelLoaded] = useState(false);
  const [isDownloading, setIsDownloading] = useState(false);
  const [downloadProgress, setDownloadProgress] = useState(0);

  const scrollViewRef = useRef&lt;ScrollView&gt;(null);
  const messagesRef = useRef&lt;Message[]&gt;([]);

  useEffect(() =&gt; {
    messagesRef.current = messages;
  }, [messages]);

  const startDownload = () =&gt; {
    setIsDownloading(true);
    setupModel();
  };

  // Automatically scroll to bottom when messages list updates
  useEffect(() =&gt; {
    if (scrollViewRef.current) {
      setTimeout(() =&gt; {
        scrollViewRef.current?.scrollToEnd({ animated: true });
      }, 100);
    }
  }, [messages, isGenerating]);

  const copyToClipboard = (text: string) =&gt; {
    if (Platform.OS === "web") {
      navigator.clipboard.writeText(text);
    } else {
      Clipboard.setString(text);
    }
  };

  const setupModel = async () =&gt; {
    try {
      setIsDownloading(true);
      setDownloadProgress(0);
      
      // 1. Local download path execution
      await downloadAsset({
        assetSrc: LLAMA_3_2_1B_INST_Q4_0,
        onProgress: (progress: ModelProgressUpdate) =&gt; {
          setDownloadProgress(progress.percentage / 100);
        },
      });

      setDownloadProgress(1);

      // 2. Load model into runtime memory
      const loadedModel = await loadModel({
        modelSrc: LLAMA_3_2_1B_INST_Q4_0,
        modelType: "llm",
        modelConfig: {
          device: "gpu",
          ctx_size: 2048,
          verbosity: VERBOSITY.ERROR,
        },
      });

      setModelId(loadedModel);
      setIsModelLoaded(true);
      setIsDownloading(false);
    } catch (e: any) {
      console.error("Error setting up model:", e);
      setIsDownloading(false);
    }
  };

  async function handleSend() {
    // Guard against sending before the model is ready or while generating.
    if (!modelId || isGenerating) return;

    const trimmed = input.trim();
    if (!trimmed) return;

    setInput("");
    setIsGenerating(true);

    // Append user message and a placeholder assistant message for streaming.
    const userMsg: Message = {
      id: makeId(),
      role: "user",
      content: trimmed,
    };

    const assistantId = makeId();

    const assistantMsg: Message = {
      id: assistantId,
      role: "assistant",
      content: "",
    };

    setMessages((prev) =&gt; [...prev, userMsg, assistantMsg]);

    try {
      // Build chat history for the completion request.
      const history = [...messagesRef.current, userMsg].map((m) =&gt; ({
        role: m.role,
        content: m.content,
      }));

      // Run a streaming completion and update the last assistant bubble.
      const result = completion({
        modelId,
        history,
        stream: true,
      });

      let acc = "";

      for await (const token of result.tokenStream) {
        acc += token;

        // Update only the last assistant message content
        setMessages((prev) =&gt;
          prev.map((m) =&gt;
            m.id === assistantId ? { ...m, content: acc } : m
          )
        );
      }

      // Optional: Log completion performance stats
      try {
        const stats = await result.stats;
        console.log("📊 Completion stats:", stats);
      } catch {}

    } catch (e: any) {
      // Show any error in the assistant bubble.
      setMessages((prev) =&gt;
        prev.map((m) =&gt;
          m.id === assistantId
            ? { ...m, content: `❌ Error: ${e?.message ?? String(e)}` }
            : m
        )
      );
    } finally {
      setIsGenerating(false);
    }
  }

  if (!isModelLoaded) {
    return (
      &lt;ModelLoader
        onDownload={startDownload}
        isDownloading={isDownloading}
        progress={downloadProgress}
      /&gt;
    );
  }

  return (
    &lt;SafeAreaView className="flex-1 bg-background"&gt;
      &lt;KeyboardAvoidingView
        behavior={Platform.OS === "ios" ? "padding" : "height"}
        className="flex-1"
      &gt;
        &lt;View className="flex-row items-center justify-between p-4 border-b border-border"&gt;
          &lt;View className="flex-row items-center gap-2"&gt;
            &lt;View className="w-2 h-2 rounded-full bg-emerald-500" /&gt;
            &lt;Text className="font-semibold text-lg"&gt;Local Llama 3.2&lt;/Text&gt;
          &lt;/View&gt;
          &lt;Text className="text-xs text-muted-foreground"&gt;Offline Engine&lt;/Text&gt;
        &lt;/View&gt;

        &lt;ScrollView
          ref={scrollViewRef}
          className="flex-1 px-4"
          contentContainerStyle={{ paddingVertical: 16, gap: 16 }}
        &gt;
          {messages.filter(m =&gt; m.content !== "" || m.role === "assistant").map((msg) =&gt; (
            &lt;ChatMessage
              key={msg.id}
              message={msg}
              onCopy={() =&gt; copyToClipboard(msg.content)}
            /&gt;
          ))}
        &lt;/ScrollView&gt;

        &lt;ChatInput
          value={input}
          onChangeText={setInput}
          onSend={handleSend}
          disabled={isGenerating}
          placeholder={isGenerating ? "Thinking..." : "Type a message..."}
        /&gt;
      &lt;/KeyboardAvoidingView&gt;
    &lt;/SafeAreaView&gt;
  );
}
</code></pre>
<h3 id="heading-codebase-breakdown">Codebase Breakdown</h3>
<p>Let’s look under the hood at how this single component manages the local model workflow and real-time UI streaming.</p>
<h4 id="heading-1-tracking-model-state-amp-asynchronous-synchronization">1. Tracking Model State &amp; Asynchronous Synchronization</h4>
<p>At the root of the component, we track both user-facing interface state and underlying QVAC runtime handles:</p>
<pre><code class="language-typescript">const [messages, setMessages] = useState&lt;Message[]&gt;([]);
const [modelId, setModelId] = useState&lt;string | null&gt;(null);
const [isModelLoaded, setIsModelLoaded] = useState(false);
const [isDownloading, setIsDownloading] = useState(false);
const [downloadProgress, setDownloadProgress] = useState(0);
</code></pre>
<p>React applies state updates asynchronously, and an async function such as the send handler keeps the <code>messages</code> value from the render in which it was created. A long-running call can therefore read a stale copy of the chat history.</p>
<p>To work around this, a mutable <code>messagesRef</code> is updated after every render so that it always points at the latest message list:</p>
<pre><code class="language-typescript">const messagesRef = useRef&lt;Message[]&gt;([]);

useEffect(() =&gt; {
  messagesRef.current = messages;
}, [messages]);
</code></pre>
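<p>The difference between a captured value and a ref is easier to see outside of React. The following is a stripped-down illustration in plain TypeScript. The <code>render</code> function and the <code>latest</code> object are made up for this example and only mimic how closures and refs behave across renders.</p>
<pre><code class="language-typescript">// A tiny stand-in for React's useRef: one object that outlives every render.
const latest = { current: [] as string[] };

// Each "render" gets its own snapshot of the messages.
function render(messages: string[]) {
  latest.current = messages; // what the useEffect in the article does

  return {
    readSnapshot: () =&gt; messages.length, // value captured by the closure
    readRef: () =&gt; latest.current.length, // value looked up at call time
  };
}

const first = render([]);
render(["hi", "hello"]); // a later render with new state

console.log(first.readSnapshot()); // 0: the closure still sees the old array
console.log(first.readRef()); // 2: the ref points at the newest array
</code></pre>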
<h4 id="heading-2-orchestrating-download-amp-memory-instantiation">2. Orchestrating Download &amp; Memory Instantiation</h4>
<p>When the user taps the download button, the app calls <code>setupModel()</code>. The function works in two stages: it first downloads the model file to local storage, then loads it into memory on the target hardware:</p>
<pre><code class="language-typescript">await downloadAsset({
  assetSrc: LLAMA_3_2_1B_INST_Q4_0,
  onProgress: (progress: ModelProgressUpdate) =&gt; {
    setDownloadProgress(progress.percentage / 100);
  },
});
</code></pre>
<ul>
<li><p><strong>Storage Sync:</strong> <code>downloadAsset</code> downloads the model identified by <code>LLAMA_3_2_1B_INST_Q4_0</code> to the device's local storage. The <code>onProgress</code> callback converts the reported percentage into a fraction between 0 and 1 for the progress indicator.</p>
</li>
<li><p><strong>Hardware Binding:</strong> Once the file is on disk, <code>loadModel</code> loads it into the inference runtime:</p>
</li>
</ul>
<pre><code class="language-typescript">const loadedModel = await loadModel({
  modelSrc: LLAMA_3_2_1B_INST_Q4_0,
  modelType: "llm",
  modelConfig: {
    device: "gpu",
    ctx_size: 2048,
    verbosity: VERBOSITY.ERROR,
  },
});
</code></pre>
<p>Passing <code>device: "gpu"</code> tells QVAC to run inference on the phone's GPU, which is typically much faster than running on the CPU alone. <code>ctx_size: 2048</code> sets the context window to 2,048 tokens, and <code>verbosity: VERBOSITY.ERROR</code> limits runtime logging to errors.</p>
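<p>If GPU loading fails on a particular phone, you may want to fall back to another device setting. The sketch below shows one possible way to structure that. It rests on two assumptions: that the SDK accepts a <code>"cpu"</code> device value (check the QVAC docs before relying on it), and that you wrap <code>loadModel</code> in a function like the hypothetical <code>loadOn</code> parameter used here.</p>
<pre><code class="language-typescript">// Hypothetical loader signature. In the app it would wrap the SDK's loadModel.
type Device = "gpu" | "cpu"; // "cpu" is an assumed value, check the SDK docs
type LoadOn = (device: Device) =&gt; Promise&lt;string&gt;;

// Try each device in order and return the first model that loads.
async function loadWithFallback(
  loadOn: LoadOn,
  order: Device[] = ["gpu", "cpu"],
) {
  let lastError: unknown;

  for (const device of order) {
    try {
      return { device, modelId: await loadOn(device) };
    } catch (error) {
      lastError = error; // remember why this device failed, then try the next
    }
  }

  throw lastError ?? new Error("no devices to try");
}
</code></pre>
<p>The returned <code>device</code> field lets the UI show which backend ended up being used.</p>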
<h4 id="heading-3-pipeline-ingest-amp-streaming-generation-loop">3. Pipeline Ingest &amp; Streaming Generation Loop</h4>
<p>After the guard checks pass (a model is loaded, nothing is currently generating, and the input isn't empty), <code>handleSend()</code> appends the user's message along with an empty assistant placeholder that will receive the streamed tokens.</p>
<p>It then maps <code>messagesRef.current</code>, plus the new user message, into the <code>{ role, content }</code> history array and passes it to <code>completion</code>:</p>
<pre><code class="language-typescript">const result = completion({
  modelId,
  history,
  stream: true,
});
</code></pre>
<p>With <code>stream: true</code>, QVAC doesn't make the app wait for the whole response. Instead, <code>result.tokenStream</code> is an async iterable that yields each token as soon as it's generated:</p>
<pre><code class="language-typescript">let acc = "";

for await (const token of result.tokenStream) {
  acc += token;

  setMessages((prev) =&gt;
    prev.map((m) =&gt;
      m.id === assistantId ? { ...m, content: acc } : m
    )
  );
}
</code></pre>
<p>The loop appends each token to the accumulator (<code>acc</code>) and updates only the message whose <code>id</code> matches the placeholder (<code>assistantId</code>), leaving the rest of the list untouched. The result is a live typing effect, produced entirely offline on the user's device.</p>
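<p>The loop in this article updates state on every token. If that causes too many re-renders on slower devices, one option is to batch the updates. The sketch below shows the idea with a fake token source so it can run on its own. <code>fakeTokenStream</code> and <code>streamInto</code> are hypothetical names, and the batching strategy is a suggestion rather than something the SDK requires.</p>
<pre><code class="language-typescript">// Hypothetical token source standing in for the SDK's token stream.
async function* fakeTokenStream(tokens: string[]): AsyncGenerator&lt;string&gt; {
  for (const token of tokens) {
    yield token;
  }
}

// Accumulate tokens and report the growing text once every `everyN` tokens,
// plus one final report so the last tokens are never dropped.
async function streamInto(
  stream: AsyncIterable&lt;string&gt;,
  onUpdate: (text: string) =&gt; void,
  everyN = 1,
): Promise&lt;string&gt; {
  let acc = "";
  let sinceUpdate = 0;

  for await (const token of stream) {
    acc += token;
    sinceUpdate += 1;
    if (sinceUpdate &gt;= everyN) {
      onUpdate(acc);
      sinceUpdate = 0;
    }
  }

  if (sinceUpdate &gt; 0) onUpdate(acc);
  return acc;
}
</code></pre>
<p>In the component, <code>onUpdate</code> would be the <code>setMessages</code> call that patches the assistant placeholder, and <code>stream</code> would be <code>result.tokenStream</code>.</p>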
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building a local-first AI application is no longer a concept confined to high-end desktops or specialized research labs. As we’ve seen, the smartphones we carry in our pockets every day possess more than enough computational muscle and dedicated hardware acceleration to run highly capable language models completely offline.</p>
<p>By leveraging React Native and the QVAC SDK, we successfully bypassed the traditional cloud-dependent architecture. We eliminated the need for complex server infrastructure, API key management, and recurring token subscription fees, all while providing an ultra-private, low-latency, streaming chat experience directly on-device.</p>
<p>As open-source models continue to shrink in size and grow in capabilities, edge inference will become an essential architecture for developers prioritizing privacy, offline resilience, and cost efficiency. The power to compute is back where it belongs: in the hands of the user.</p>
<h3 id="heading-resources-amp-further-reading">Resources &amp; Further Reading</h3>
<p>To dive deeper into local inference, inspect the source code, or explore advanced configurations for your mobile applications, check out the following resources:</p>
<ul>
<li><p><a href="https://docs.qvac.tether.io/tutorials/expo/"><strong>QVAC Expo Integration Tutorial</strong></a> – The official step-by-step documentation for configuring QVAC within the Expo and React Native ecosystems.</p>
</li>
<li><p><a href="https://github.com/DjibrilM/QVAC-offline-Chatbot-Article-Project-"><strong>Project GitHub Repository</strong></a> – Access the complete source code, including the UI layout components, starter themes, and full configuration files used in this guide.</p>
</li>
<li><p><a href="https://github.com/ggml-org/llama.cpp"><strong>Llama.cpp Official Repository</strong></a> – Learn more about the underlying inference engine that powers QVAC's hardware-accelerated local execution.</p>
</li>
<li><p><a href="https://huggingface.co/models?search=gguf"><strong>Hugging Face GGUF Models</strong></a> – Explore thousands of open-source, quantized models that you can download and experiment with inside your local application.</p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Understand the Safe Integer Limit in JavaScript ]]>
                </title>
                <description>
                    <![CDATA[ According to the Stack Overflow technology survey in 2025, JavaScript is one of the most widely used programming languages in the world. We use it to build frontend applications, backend services, pay ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-understand-the-safe-integer-limit-in-javascript/</link>
                <guid isPermaLink="false">6a20610c78a43e3153ae86b2</guid>
                
                    <category>
                        <![CDATA[ BigInt ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ayodele Aransiola ]]>
                </dc:creator>
                <pubDate>Wed, 03 Jun 2026 17:14:52 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/9dde6b7a-ff16-4ab1-bdef-c8c7be8d82e9.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>According to the <a href="https://survey.stackoverflow.co/2025/technology">Stack Overflow technology survey in 2025</a>, JavaScript is one of the most widely used programming languages in the world. We use it to build frontend applications, backend services, payment systems, analytics platforms, blockchain applications, and more.</p>
<p>But JavaScript has an interesting limitation that many developers don't fully understand until it causes a production issue. That limitation is called the <strong>safe integer limit</strong>.</p>
<p>In this article, you'll learn:</p>
<ul>
<li><p>What the safe integer limit is</p>
</li>
<li><p>Why JavaScript has this limitation</p>
</li>
<li><p>How precision errors happen</p>
</li>
<li><p>What <code>BigInt</code> is</p>
</li>
<li><p>How modern systems use <code>BigInt</code></p>
</li>
<li><p>How to use large integers safely in production applications</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-the-safe-integer-limit-in-javascript">What Is the Safe Integer Limit in JavaScript?</a></p>
</li>
<li><p><a href="#heading-why-is-it-called-a-safe-integer">Why Is It Called a “Safe” Integer?</a></p>
</li>
<li><p><a href="#heading-how-can-you-understand-this-problem-if-you-are-new-to-the-game">How Can You Understand This Problem if You Are New to the Game?</a></p>
</li>
<li><p><a href="#heading-how-to-check-if-a-number-is-safe">How to Check if a Number Is Safe</a></p>
</li>
<li><p><a href="#heading-can-unsafe-integers-cause-any-problems">Can Unsafe Integers Cause Any Problems?</a></p>
</li>
<li><p><a href="#heading-introducing-bigint-in-javascript">Introducing BigInt in JavaScript</a></p>
<ul>
<li><p><a href="#heading-how-to-perform-operations-with-bigint">How to Perform Operations with BigInt</a></p>
</li>
<li><p><a href="#heading-how-bigint-differs-from-number">How BigInt Differs from Number</a></p>
</li>
<li><p><a href="#heading-how-modern-software-uses-bigint">How Modern Software Uses BigInt</a></p>
</li>
<li><p><a href="#heading-when-you-should-use-bigint">When You Should Use BigInt</a></p>
</li>
<li><p><a href="#heading-when-you-should-not-use-bigint">When You Should Not Use BigInt</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along with this article, you should have:</p>
<ul>
<li><p>Basic knowledge of JavaScript</p>
</li>
<li><p>A code editor or browser console</p>
</li>
<li><p>Familiarity with variables and functions</p>
</li>
</ul>
<h2 id="heading-what-is-the-safe-integer-limit-in-javascript">What Is the Safe Integer Limit in JavaScript?</h2>
<p>JavaScript uses the <code>Number</code> type to represent numbers.</p>
<p>For example:</p>
<pre><code class="language-jsx">const age = 25
const price = 99.99
const count = 1000
</code></pre>
<p>Under the hood, JavaScript stores numbers using the <a href="https://en.wikipedia.org/wiki/IEEE_754">IEEE 754 double-precision floating-point</a> format. You don't need to memorize the entire specification, but you should understand one important consequence: JavaScript can only represent integers accurately up to a certain point.</p>
<p>That point is:</p>
<pre><code class="language-javascript">console.log(Number.MAX_SAFE_INTEGER) // 9007199254740991
</code></pre>
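<p>This is the largest integer JavaScript can safely represent using the <code>Number</code> type. It equals 2<sup>53</sup> - 1, because a double-precision value has 53 bits of precision for its significant digits.</p>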
<p>The smallest safe integer is:</p>
<pre><code class="language-javascript">console.log(Number.MIN_SAFE_INTEGER) // -9007199254740991
</code></pre>
<h2 id="heading-why-is-it-called-a-safe-integer">Why Is It Called a “Safe” Integer?</h2>
<p>The word “safe” means JavaScript can represent the integer exactly and that no other integer rounds to the same value. Once you go beyond the safe limit, JavaScript rounds some integers to the nearest value it can store, so results become approximate.</p>
<p>Let’s look at an example.</p>
<pre><code class="language-jsx">const max = Number.MAX_SAFE_INTEGER

console.log(max + 1) // 9007199254740992
console.log(max + 2) // 9007199254740992
</code></pre>
<p>This is incorrect: adding <code>1</code> and adding <code>2</code> shouldn't produce the same result. It happens because JavaScript can no longer distinguish between neighboring integers at this size.</p>
<h2 id="heading-how-can-you-understand-this-problem-if-you-are-new-to-the-game">How Can You Understand This Problem if You Are New to the Game?</h2>
<p>Imagine you have a camera. When you zoom in closely, you can see every small detail clearly. But when you zoom out too far, tiny details begin to disappear.</p>
<p>JavaScript numbers behave similarly. Small integers are represented precisely:</p>
<pre><code class="language-jsx">console.log(10)
console.log(100)
console.log(1000)
</code></pre>
<p>But extremely large integers lose detail because JavaScript runs out of precision. At that point, multiple numbers begin collapsing into the same value internally. That is why large integer calculations become unreliable.</p>
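<p>One way to see the detail disappearing is to measure the gap between a number and the next value JavaScript can tell apart from it. The sketch below uses a hypothetical helper, <code>smallestVisibleStep</code> (it isn't a built-in), and assumes a finite, non-negative input:</p>
<pre><code class="language-javascript">// Hypothetical helper for illustration only.
// Doubles a step until adding it to x produces a different Number.
// Assumes x is finite and non-negative.
function smallestVisibleStep(x) {
  let step = 1
  while (x + step === x) {
    step *= 2
  }
  return step
}

console.log(smallestVisibleStep(1000)) // 1
console.log(smallestVisibleStep(2 ** 53)) // 2
console.log(smallestVisibleStep(2 ** 60)) // 256
</code></pre>
<p>Around <code>1000</code>, adding <code>1</code> is visible. Just past the safe limit, the smallest visible change is <code>2</code>, and around 2<sup>60</sup> anything smaller than <code>256</code> is rounded away.</p>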
<h2 id="heading-how-to-check-if-a-number-is-safe">How to Check if a Number Is Safe</h2>
<p>JavaScript provides a built-in method called <code>Number.isSafeInteger()</code>.</p>
<p>Example:</p>
<pre><code class="language-jsx">console.log(Number.isSafeInteger(100)) // true
</code></pre>
<p>Another example:</p>
<pre><code class="language-jsx">console.log(Number.isSafeInteger(Number.MAX_SAFE_INTEGER)) // true
</code></pre>
<p>But the code below returns false:</p>
<pre><code class="language-jsx">console.log(
  Number.isSafeInteger(Number.MAX_SAFE_INTEGER + 1)
) // false
</code></pre>
<p>This method is useful when validating large integers from APIs, databases, or user input.</p>
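<p>As a rough sketch of that kind of validation, you could wrap the check in a small guard that fails loudly instead of letting a rounded value through. The helper name <code>assertSafeInteger</code> and the <code>orderId</code> label are made up for this example:</p>
<pre><code class="language-javascript">// Hypothetical guard: throws if a Number cannot hold the value
// exactly as an integer, otherwise returns it unchanged.
function assertSafeInteger(value, label) {
  if (!Number.isSafeInteger(value)) {
    throw new RangeError(label + " is not a safe integer: " + value)
  }
  return value
}

console.log(assertSafeInteger(1024, "orderId")) // 1024

try {
  assertSafeInteger(2 ** 53, "orderId")
} catch (error) {
  console.log(error.name) // RangeError
}
</code></pre>
<p>Note that <code>Number.isSafeInteger()</code> also returns <code>false</code> for decimals, strings, and <code>NaN</code>, so this guard rejects those too.</p>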
<h2 id="heading-can-unsafe-integers-cause-any-problems">Can Unsafe Integers Cause Any Problems?</h2>
<p>Unsafe integers can create serious production bugs. Take financial calculations: imagine a payment platform processing extremely large transaction records. Precision issues can corrupt balances or reconciliation logic.</p>
<pre><code class="language-jsx">const amount = 9007199254740993

console.log(amount) // 9007199254740992
</code></pre>
<p>The value changes unexpectedly. That's dangerous for financial systems.</p>
<p>Another example is in analytics systems. Large-scale analytics platforms often track billions or trillions of events. Unsafe integers can distort counters and reports.</p>
<p>Also, distributed systems frequently generate very large IDs. Examples include database IDs, event IDs, transaction IDs, and blockchain transaction hashes. If precision is lost, systems may reference the wrong records.</p>
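<p>Here is a small illustration of how an ID can change on its way through an API. The payloads are invented for this example. <code>JSON.parse</code> reads numeric values as <code>Number</code>, so a large ID is rounded during parsing, while the same ID sent as a string arrives intact:</p>
<pre><code class="language-javascript">// The ID arrives as a JSON number, so it is rounded while parsing.
const asNumber = JSON.parse('{"id": 9007199254740993}')
console.log(asNumber.id) // 9007199254740992

// Sent as a string, the same ID survives unchanged.
const asString = JSON.parse('{"id": "9007199254740993"}')
console.log(asString.id) // 9007199254740993
</code></pre>
<p>This is one reason some APIs return large IDs as strings rather than numbers.</p>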
<p>Blockchain systems also commonly use extremely large integers. Ethereum, for example, stores values in <code>wei</code>. One Ether equals:</p>
<pre><code class="language-plaintext">1,000,000,000,000,000,000 wei
</code></pre>
<p>That number exceeds JavaScript’s safe integer limit. Without proper handling, balances become inaccurate.</p>
<h2 id="heading-introducing-bigint-in-javascript">Introducing BigInt in JavaScript</h2>
<p>JavaScript introduced <code>BigInt</code> (standardized in ES2020) to solve this problem. <code>BigInt</code> allows JavaScript to represent integers larger than the safe limit accurately. You can create a <code>BigInt</code> by adding <code>n</code> to the end of an integer literal.</p>
<p>Example:</p>
<pre><code class="language-jsx">const largeNumber = 9007199254740993n

console.log(largeNumber) // 9007199254740993n
</code></pre>
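<p>Notice that the value remains accurate. You can also create <code>BigInt</code> values by calling the <code>BigInt()</code> function. Pass large values as a string, because a numeric literal would already be rounded before <code>BigInt()</code> receives it.</p>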
<pre><code class="language-jsx">const value = BigInt("9007199254740993123123123")

console.log(value)
</code></pre>
<h3 id="heading-how-to-perform-operations-with-bigint">How to Perform Operations with BigInt</h3>
<p>You can use arithmetic operators with <code>BigInt</code>.</p>
<p>Here's an example:</p>
<pre><code class="language-jsx">const a = 1000000000000000000n
const b = 2n

console.log(a + b) // 1000000000000000002n
console.log(a - b) // 999999999999999998n
console.log(a * b) // 2000000000000000000n
console.log(a / b) // 500000000000000000n
</code></pre>
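<p>One detail worth knowing: <code>BigInt</code> values are always whole numbers, so division drops any fractional part. A few quick checks you can run in the console:</p>
<pre><code class="language-javascript">// Division truncates toward zero instead of producing a decimal.
console.log(7n / 2n) // 3n
console.log(-7n / 2n) // -3n

// The remainder operator gives you the part that division dropped.
console.log(7n % 2n) // 1n

// Exponentiation works too, far beyond the safe integer limit.
console.log(2n ** 64n) // 18446744073709551616n
</code></pre>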
<h3 id="heading-how-bigint-differs-from-number">How BigInt Differs from Number</h3>
<p>One important rule is that you can't mix <code>BigInt</code> and <code>Number</code> directly.</p>
<p>This will throw an error:</p>
<pre><code class="language-jsx">const result = 1n + 1 // TypeError
</code></pre>
<p>You must convert explicitly, like this:</p>
<pre><code class="language-jsx">const result = 1n + BigInt(1)

console.log(result)
</code></pre>
<p>Or this:</p>
<pre><code class="language-jsx">const result = Number(1n) + 1

console.log(result)
</code></pre>
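<p>Explicit conversion makes every type change visible in the code, which helps prevent accidental precision loss. Keep in mind that <code>Number()</code> still rounds a <code>BigInt</code> that lies outside the safe range.</p>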
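<p>The short sketch below shows a few more differences you may run into: the two types have different <code>typeof</code> results, strict and loose equality treat them differently, and converting a large <code>BigInt</code> back to <code>Number</code> doesn't round-trip. The variable name <code>big</code> is just for illustration:</p>
<pre><code class="language-javascript">console.log(typeof 1n) // bigint
console.log(typeof 1) // number

// Strict equality compares types as well as values.
console.log(1n === 1) // false
console.log(1n == 1) // true

// Converting back to Number rounds values outside the safe range.
const big = 9007199254740993n
console.log(Number(big)) // 9007199254740992
console.log(BigInt(Number(big)) === big) // false
</code></pre>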
<h3 id="heading-how-modern-software-uses-bigint">How Modern Software Uses BigInt</h3>
<p>Many modern applications rely on <code>BigInt</code>. Let’s look at a practical example. Blockchain applications depend heavily on precise integer calculations.</p>
<p>Example:</p>
<pre><code class="language-jsx">const wei = 1000000000000000000n
const balance = 5000000000000000000n

console.log(balance / wei) // 5n
</code></pre>
<p>Libraries in Ethereum ecosystems often use <code>BigInt</code> internally for token balances and gas calculations.</p>
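<p>To make that concrete, here is a minimal sketch of turning a wei balance into a readable ether string using only <code>BigInt</code> arithmetic. The helper <code>weiToEtherString</code> is hypothetical, not a function from any particular library, and it assumes the amount isn't negative:</p>
<pre><code class="language-javascript">const WEI_PER_ETHER = 10n ** 18n

// Hypothetical helper: formats a non-negative wei amount as an ether string.
function weiToEtherString(wei) {
  const whole = wei / WEI_PER_ETHER // whole ether, fraction dropped
  const fraction = wei % WEI_PER_ETHER // leftover wei
  if (fraction === 0n) {
    return whole.toString()
  }
  // Pad the leftover to 18 digits, then drop trailing zeros.
  const digits = fraction.toString().padStart(18, "0").replace(/0+$/, "")
  return whole.toString() + "." + digits
}

console.log(weiToEtherString(5000000000000000000n)) // 5
console.log(weiToEtherString(1500000000000000000n)) // 1.5
console.log(weiToEtherString(1n)) // 0.000000000000000001
</code></pre>
<p>The value stays a <code>BigInt</code> until the very end and only becomes a string for display, so no step passes through <code>Number</code>.</p>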
<h3 id="heading-when-you-should-use-bigint">When You Should Use BigInt</h3>
<p>Use <code>BigInt</code> when:</p>
<ul>
<li><p>Integer precision matters</p>
</li>
<li><p>Numbers exceed the safe limit</p>
</li>
<li><p>You're building blockchain applications</p>
</li>
<li><p>You're handling financial ledgers</p>
</li>
<li><p>You're processing massive counters</p>
</li>
<li><p>You're working with large database IDs</p>
</li>
</ul>
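<p>One practical wrinkle in several of these cases: <code>JSON.stringify</code> throws a <code>TypeError</code> when it meets a <code>BigInt</code>. A common workaround, sketched below with an invented <code>record</code> object and a hypothetical <code>bigintToString</code> replacer, is to send the value as a string and convert it back on the other side:</p>
<pre><code class="language-javascript">const record = { id: 9007199254740993n, status: "paid" }

try {
  JSON.stringify(record)
} catch (error) {
  console.log(error.name) // TypeError
}

// Hypothetical replacer: writes every BigInt as a decimal string.
function bigintToString(key, value) {
  return typeof value === "bigint" ? value.toString() : value
}

const json = JSON.stringify(record, bigintToString)
console.log(json) // {"id":"9007199254740993","status":"paid"}

// The receiver turns the string back into a BigInt explicitly.
console.log(BigInt(JSON.parse(json).id) === record.id) // true
</code></pre>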
<h3 id="heading-when-you-should-not-use-bigint">When You Should Not Use BigInt</h3>
<p>Avoid <code>BigInt</code> when:</p>
<ul>
<li><p>You need decimal calculations</p>
</li>
<li><p>You're building simple frontend interactions</p>
</li>
<li><p>Precision isn't critical</p>
</li>
<li><p>Performance matters more than huge integer support</p>
</li>
</ul>
<p><code>BigInt</code> operations are slower than normal <code>Number</code> operations because they require arbitrary-precision arithmetic.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>JavaScript’s safe integer limit isn't just a theoretical concept. It affects real-world systems every day. As applications grow larger and more distributed, developers increasingly work with massive integers in payment systems, blockchain platforms, analytics pipelines, databases, event-driven architectures, and so on.</p>
<p>Understanding the safe integer limit helps you avoid subtle production bugs that are often difficult to detect. <code>BigInt</code> gives JavaScript the ability to handle these large integers safely and accurately. But like any powerful tool, it should be used intentionally.</p>
<p>Just keep in mind: Use normal <code>Number</code> values for everyday calculations. Use <code>BigInt</code> when precision becomes critical.</p>
<p>The key lesson is simple: large numbers aren't always safe numbers in JavaScript.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy a Spring Boot App with MySQL on Amazon EKS ]]>
                </title>
                <description>
                    <![CDATA[ If you've been looking to deploy your Spring Boot app to the cloud but feel a little overwhelmed by all the moving pieces, don't worry, you're not alone. Kubernetes can seem intimidating at first, but ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-deploy-a-spring-boot-app-with-mysql-on-amazon-eks/</link>
                <guid isPermaLink="false">6a20609578a43e3153ae5422</guid>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ EKS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Springboot ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Chisom Uma ]]>
                </dc:creator>
                <pubDate>Wed, 03 Jun 2026 17:12:53 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/5a7cd6a7-7850-4e3c-9a45-b577c2f91598.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've been looking to deploy your Spring Boot app to the cloud but feel a little overwhelmed by all the moving pieces, don't worry, you're not alone.</p>
<p>Kubernetes can seem intimidating at first, but Amazon EKS (Elastic Kubernetes Service) makes it much more approachable, especially when you have a step-by-step guide to follow.</p>
<p>In this tutorial, we'll walk through exactly how to get a Spring Boot application with a MySQL database up and running on Amazon EKS. I'll take you from containerizing your app to connecting it to a managed database, all the way to accessing it live in the cloud. Let’s get started.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-application-overview">Application Overview</a></p>
</li>
<li><p><a href="#heading-what-is-amazon-eks">What is Amazon EKS?</a></p>
</li>
<li><p><a href="#heading-how-to-deploy-a-spring-boot-app-with-mysql-on-amazon-eks">How to Deploy a Spring Boot App with MySQL on Amazon EKS</a></p>
<ul>
<li><p><a href="#heading-step-1-create-the-vpc">Step 1: Create the VPC</a></p>
</li>
<li><p><a href="#heading-step-2-set-up-the-mysql-database-in-a-private-subnet">Step 2: Set Up the MySQL Database in a Private Subnet</a></p>
</li>
<li><p><a href="#heading-step-3-deploy-ec2-instance-in-a-public-subnet">Step 3: Deploy EC2 Instance in a Public Subnet</a></p>
</li>
<li><p><a href="#heading-step-4-create-ssh-tunneling-for-the-database">Step 4: Create SSH Tunneling for the Database</a></p>
</li>
<li><p><a href="#heading-step-5-set-up-a-simple-springboot-application-development">Step 5: Set Up a Simple SpringBoot Application Development</a></p>
</li>
<li><p><a href="#heading-step-6-configure-springboot-app-for-database">Step 6: Configure SpringBoot App for Database</a></p>
</li>
<li><p><a href="#heading-step-7-dockerize-the-spring-boot-application">Step 7: Dockerize the Spring Boot Application</a></p>
</li>
<li><p><a href="#heading-step-8-push-the-image-to-elastic-container-registry-ecr">Step 8: Push the Image to Elastic Container Registry (ECR)</a></p>
</li>
<li><p><a href="#heading-step-9-implement-aws-app-load-balancer">Step 9: Implement AWS App Load Balancer</a></p>
</li>
<li><p><a href="#heading-step-10-create-a-cluster-in-eks">Step 10: Create a Cluster in EKS</a></p>
</li>
<li><p><a href="#heading-step-11-install-aws-load-balancing">Step 11: Install AWS Load Balancing</a></p>
</li>
<li><p><a href="#heading-step-12-create-and-deploy-kubernetes">Step 12: Create and Deploy Kubernetes</a></p>
</li>
<li><p><a href="#heading-step-13-delete-cluster">Step 13: Delete Cluster</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you begin, ensure you have the following:</p>
<ul>
<li><p>Basic knowledge of AWS (AWS Console access).</p>
</li>
<li><p>Basic knowledge of containerization.</p>
</li>
<li><p>Working knowledge of Kubernetes.</p>
</li>
<li><p>Basic knowledge of databases.</p>
</li>
<li><p><a href="https://helm.sh/docs/intro/install/">Helm</a> installed</p>
</li>
<li><p><a href="https://kubernetes.io/docs/tasks/tools/">Kubectl</a> installed</p>
</li>
<li><p><a href="https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-eksctl.html">Eksctl</a> installed</p>
</li>
<li><p>An IDE</p>
</li>
</ul>
<h2 id="heading-application-overview">Application Overview</h2>
<p>The application runs inside an AWS VPC spread across two availability zones for high availability. When a user makes a request, it flows through an Internet Gateway into an AWS Application Load Balancer sitting in the public subnet, which handles incoming traffic via an Ingress rule.</p>
<p>The Load Balancer routes requests to the App Service, which distributes them across multiple App Pods running inside AWS EKS (Elastic Kubernetes Service) in the private subnets.</p>
<p>The Docker images for these pods are pulled from AWS ECR (Elastic Container Registry). For data persistence, the app pods connect to Amazon RDS MySQL databases through a MySQL External Service, with an RDS instance in each availability zone to ensure redundancy.</p>
<p>A NAT Gateway in the public subnet allows the private resources to make outbound internet calls without being directly exposed to the internet.</p>
<h2 id="heading-what-is-amazon-eks">What is Amazon EKS?</h2>
<p>If you've ever tried to manage containers manually, you already know it can get messy pretty quickly, tracking which containers are running, restarting ones that crash, scaling up when traffic spikes... It's a lot.</p>
<p>That's exactly the problem Kubernetes was built to solve. It automates the deployment, scaling, and management of containerized applications. But setting up and maintaining your own Kubernetes cluster from scratch? That's a whole other challenge.</p>
<p>That's where <a href="https://aws.amazon.com/pm/eks/">Amazon EKS</a> comes in. EKS is a fully managed Kubernetes service provided by AWS, which means AWS handles the heavy lifting of setting up, securing, and maintaining the Kubernetes control plane for you. You just focus on deploying your application.</p>
<h2 id="heading-how-to-deploy-a-spring-boot-app-with-mysql-on-amazon-eks">How to Deploy a Spring Boot App with MySQL on Amazon EKS</h2>
<p>In this section, we’ll cover the steps for deploying your Spring Boot application with MySQL on Amazon EKS.</p>
<h3 id="heading-step-1-create-the-vpc">Step 1: Create the VPC</h3>
<p>To create a VPC, log in to the <a href="https://signin.aws.amazon.com/signin?redirect_uri=https%3A%2F%2Fus-east-1.console.aws.amazon.com%2Fiam%3Fca-oauth-flow-id%3Df7d2%26hashArgs%3D%2523%26isauthcode%3Dtrue%26oauthStart%3D1777888354778%26region%3Dus-east-1%26state%3DhashArgsFromTB_us-east-1_0481039a94bc47bd&amp;client_id=arn%3Aaws%3Asignin%3A%3A%3Aconsole%2Fiamv2&amp;forceMobileApp=0&amp;code_challenge=USO5m22DxkRMX1kvbC19ZE-zr5Eyzp52MXY5jnbANB8&amp;code_challenge_method=SHA-256">AWS IAM Console</a> and search for “VPC,” then click create VPC.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/9a1f57fd-7665-469f-a0c2-7d548590c20f.png" alt="vpc interface" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Select the "VPC and more" option, and give your VPC a name for your project, for example, spring-demo. Set the IPv4 CIDR block to 10.4.0.0/16. For the NAT gateway configuration, select Zonal, then In 1 AZ.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/960002c0-9d53-481d-90be-79a7092088ce.png" alt="NAT gateway config" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Select None for VPC endpoints configuration. Next, click Create VPC, then click View VPC. This takes you to the VPC resource map.</p>
<h3 id="heading-step-2-set-up-the-mysql-database-in-a-private-subnet">Step 2: Set Up the MySQL Database in a Private Subnet</h3>
<p>First, you need to create the security group for the MySQL and EC2 instance deployment. To do that, navigate to EC2 &gt; Security Groups and click Create security group. Name it all-access-sg, the name this tutorial refers to later. For the inbound rule, select Type: All traffic and Source: Anywhere-IPv4. Then click Create security group.</p>
<p>Next, we’ll create the subnet group for the database. To do that, navigate to Aurora and RDS &gt; Subnet groups and click Create DB subnet group. Next, configure the DB subnet to include:</p>
<ul>
<li><p><strong>Name</strong>: private-subnet-db</p>
</li>
<li><p><strong>Description</strong>: private-subnet-db</p>
</li>
<li><p><strong>VPC</strong>: Select VPC</p>
</li>
<li><p><strong>Add subnets</strong>: Choose <code>us-east-1a</code> and <code>us-east-1b</code> as the availability zones, then select the private and public subnets</p>
</li>
</ul>
<p>Click Create.</p>
<p>Now, navigate to Databases, click Create database, and select Full configuration. Select MySQL as the engine type.</p>
<p>Select the Free tier when choosing a sample template. Next, give your DB a username and a strong password. Choose <code>db.t3.micro</code> as the instance type.</p>
<p>Select your VPC and associated private subnet. Now, uncheck the "Enable auto minor version upgrade" option in the Additional configuration section and click Create database.</p>
<p>While our database initializes, let's create a key pair for the EC2 instance, which will be launched in a public subnet. To do that, navigate to EC2 &gt; Network &amp; Security &gt; Key Pairs and click Create key pair.</p>
<p>Give your key pair a name, for example, ece-db-key-pair. Leave everything else as-is and click Create key pair. This automatically downloads the key pair file to your local machine.</p>
<h3 id="heading-step-3-deploy-ec2-instance-in-a-public-subnet">Step 3: Deploy EC2 Instance in a Public Subnet</h3>
<p>Now it’s time to create an EC2 instance. To do this, navigate to EC2 &gt; Instances and click Launch instances. Select the key pair you just created in the Key pair section.</p>
<p>Next, in the Network section, select the VPC created earlier for the project. For Auto-assign public IP, choose Enable. Next, choose the Select existing security group option and select the all-access-sg security group created earlier. Next, click Launch instance.</p>
<h3 id="heading-step-4-create-ssh-tunneling-for-the-database">Step 4: Create SSH Tunneling for the Database</h3>
<p>For this step, open your terminal and navigate to the folder where your key pair was downloaded. Run the <code>ls</code> command, and you should see your key pair there.</p>
<p>Next, you need to change the permission of the key pair file. Use the command below:</p>
<pre><code class="language-shell">chmod 0400 ece-db-key-pair.pem
</code></pre>
<p>Now, run the SSH tunneling command below:</p>
<pre><code class="language-shell">ssh -i &lt;YOUR-KEY-PAIR&gt;.pem -f -N -L &lt;LOCAL-PORT&gt;:&lt;YOUR-RDS-ENDPOINT&gt;:&lt;RDS-PORT&gt; &lt;EC2-USERNAME&gt;@&lt;YOUR-EC2-PUBLIC-DNS&gt; -v
</code></pre>
<ul>
<li><p><code>&lt;YOUR-KEY-PAIR&gt;.pem</code>: the name of your downloaded key pair file</p>
</li>
<li><p><code>&lt;LOCAL-PORT&gt;</code>: the port on your laptop (3306 for MySQL, 5432 for PostgreSQL)</p>
</li>
<li><p><code>&lt;YOUR-RDS-ENDPOINT&gt;</code>: found in AWS Console &gt; RDS &gt; Your database &gt; Connectivity &amp; Security &gt; Endpoint</p>
</li>
<li><p><code>&lt;RDS-PORT&gt;</code>: the port your database listens on (3306 for MySQL, 5432 for PostgreSQL)</p>
</li>
<li><p><code>&lt;EC2-USERNAME&gt;</code>: usually ec2-user for Amazon Linux, ubuntu for Ubuntu</p>
</li>
<li><p><code>&lt;YOUR-EC2-PUBLIC-DNS&gt;</code>: found in AWS Console &gt; EC2 &gt; Your instance &gt; Public IPv4 DNS</p>
</li>
</ul>
<p>This command lets your laptop or local machine talk directly to your remote database, as if the database were sitting on your own computer.</p>
<p>After running this command, you can open a database tool (like MySQL Workbench, DBeaver, or TablePlus) on your laptop and connect to:</p>
<ul>
<li><p>Host: localhost</p>
</li>
<li><p>Port: 3306</p>
</li>
</ul>
<p>For this tutorial, I’ll be using the community version of DBeaver. Other similar tools work too, but if you'd like to follow along with the same one, install the community version from the official <a href="https://dbeaver.io/download/">DBeaver download page</a>.</p>
<p>After download and installation, open the DBeaver client and click the Connect to a database icon in the top-left corner of the app.</p>
<p>Select MySQL and click Next. On the next window, enter your database username and password, and set Server Host to 127.0.0.1.</p>
<p>Click Test Connection.</p>
<p>You should see a window appear on your screen, indicating that the connection is successful.</p>
<p>Click OK and Finish.</p>
<p>Now, on the left panel, you should see your connection. Expand it to see the database structure.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/5c8115eb-020a-4b8d-9c84-9a1b4c10a071.png" alt="database structure" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now, you have successfully created SSH tunneling for your database.</p>
<h4 id="heading-troubleshooting">Troubleshooting</h4>
<p>While attempting to test the database connection, I initially ran into a “Plugin 'mysql_native_password' is not loaded” error. If you encounter this error, follow the steps below to fix it.</p>
<ol>
<li><p>On the Connection Settings window, navigate to the Driver properties tab.</p>
</li>
<li><p>Look for allowPublicKeyRetrieval and set it to FALSE.</p>
</li>
<li><p>Navigate back to the Main tab and click Test Connection.</p>
</li>
</ol>
<p>Everything should work fine now.</p>
<h3 id="heading-step-5-set-up-a-simple-springboot-application-development">Step 5: Set Up a Simple SpringBoot Application Development</h3>
<p>To get started, head over to the <a href="https://start.spring.io/">Spring Initializr website</a>. Rename Artifact to “springboot-mysql-eks”. Then click ADD DEPENDENCIES… to add dependencies for the REST APIs. Search for the following dependencies:</p>
<ul>
<li><p><strong>Spring Web:</strong> Build web apps, including RESTful applications using Spring MVC. Uses Apache Tomcat as the default embedded container.</p>
</li>
<li><p><strong>Spring Data JPA:</strong> Persist data in SQL stores with the Java Persistence API using Spring Data and Hibernate.</p>
</li>
<li><p><strong>MySQL Driver:</strong> A JDBC driver that provides access to MySQL.</p>
</li>
<li><p><strong>Lombok:</strong> A Java annotation library that helps to reduce boilerplate code.</p>
</li>
</ul>
<p>Next, click GENERATE at the bottom center of the page. This action downloads a zip file to your local machine. Extract it and open the project in an IDE, such as VS Code or IntelliJ IDEA. For this tutorial, I use VS Code. In the <code>build.gradle</code> file, you can see all the added dependencies:</p>
<pre><code class="language-json">dependencies {
   implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
   implementation 'org.springframework.boot:spring-boot-starter-webmvc'
   compileOnly 'org.projectlombok:lombok'
   runtimeOnly 'com.mysql:mysql-connector-j'
   annotationProcessor 'org.projectlombok:lombok'
   testImplementation 'org.springframework.boot:spring-boot-starter-data-jpa-test'
   testImplementation 'org.springframework.boot:spring-boot-starter-webmvc-test'
   testCompileOnly 'org.projectlombok:lombok'
   testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
   testAnnotationProcessor 'org.projectlombok:lombok'
}
</code></pre>
<h4 id="heading-what-were-building">What we're building</h4>
<p>The Spring Boot app is a currency exchange rate and conversion app:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/eef403da-eb1d-47e8-8edd-d0e4d845a3d1.png" alt="image of counter " style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>We'll be inserting the exchange data into the database table.</p>
<p>To continue with this tutorial, you can clone the project repo <a href="https://github.com/ChisomUma/sprint-boot-msql-eks">here</a> to save time.</p>
<p>In main &gt; java &gt; com.. &gt; model &gt; ExchangeRate, you’ll see the code below:</p>
<pre><code class="language-java">package com.example.springbootmysqleks.model;

import jakarta.persistence.*;
import lombok.Getter;
import lombok.Setter;

import java.sql.Date;

@Getter
@Setter
@Entity
@Table(name = "exchange-rate")
public class ExchangeRate {
   @Id
   @GeneratedValue(strategy=GenerationType.AUTO)
   private Integer transactionId;
   private String sourceCurrency;
   private String targetCurrency;
   private double amount;
   private Date lastUpdated;
}
</code></pre>
<p>This class is essentially a blueprint for storing currency exchange rate data in our database. It uses the libraries and dependencies added earlier. Lombok handles all the repetitive getter/setter boilerplate so you don't have to write it yourself, while JPA annotations like <code>@Entity</code> and <code>@Table</code> tell Spring, "hey, this class maps to a database table called exchange-rate."</p>
<p>Inside the class, there are five fields that become database columns:</p>
<ul>
<li><p><code>transactionId</code>: an auto-generated primary key.</p>
</li>
<li><p><code>sourceCurrency</code> and <code>targetCurrency</code>: the currencies being converted.</p>
</li>
<li><p><code>amount</code>: the actual exchange rate.</p>
</li>
<li><p><code>lastUpdated</code>: a date, so you always know how fresh your data is.</p>
</li>
</ul>
<p>To store the data, create a repository file in main &gt; java &gt; com.. &gt; repository &gt; ExchangeRateRepository:</p>
<pre><code class="language-java">package com.example.springbootmysqleks.repository;

import com.example.springbootmysqleks.model.ExchangeRate;
import org.springframework.data.jpa.repository.JpaRepository;

public interface ExchangeRateRepository extends JpaRepository&lt;ExchangeRate, Integer&gt; {
   ExchangeRate findBySourceCurrencyAndTargetCurrency(String sourceCurrency, String targetCurrency);
}
</code></pre>
<p>This file acts as the middleman between your code and the database. By simply extending JpaRepository, you instantly get a whole suite of built-in database operations (like save, delete, findAll, and so on) completely for free, without writing a single SQL query.</p>
<p>The interface is typed to work with the <code>ExchangeRate</code> model we just looked at, using Integer as the primary key type.</p>
<p>The one custom method, <code>findBySourceCurrencyAndTargetCurrency</code>, is where Spring's magic really shines. Just by following a naming convention, Spring automatically figures out the SQL query it needs to run, so you can look up an exchange rate by simply passing in two currency codes like "USD" and "EUR" without writing any query logic yourself.</p>
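<p>To make the naming convention less mysterious, here's a rough, plain-Java sketch of the idea. It is not Spring Data's actual parser (which supports many more keywords), and <code>DerivedQuerySketch</code> and <code>toJpql</code> are hypothetical names used only for illustration. It simply splits the method name on <code>And</code> and builds the kind of JPQL query the convention implies:</p>
<pre><code class="language-java">class DerivedQuerySketch {

    // Simplified illustration only: handles "findBy" plus properties joined by "And".
    static String toJpql(String methodName, String entity) {
        String criteria = methodName.substring("findBy".length());
        String[] parts = criteria.split("And");
        StringBuilder jpql = new StringBuilder("select e from " + entity + " e where ");
        int position = 1;
        for (String part : parts) {
            // "SourceCurrency" becomes the entity property "sourceCurrency".
            String property = Character.toLowerCase(part.charAt(0)) + part.substring(1);
            if (position != 1) {
                jpql.append(" and ");
            }
            jpql.append("e.").append(property).append(" = ?").append(position);
            position++;
        }
        return jpql.toString();
    }

    public static void main(String[] args) {
        // Prints: select e from ExchangeRate e where e.sourceCurrency = ?1 and e.targetCurrency = ?2
        System.out.println(toJpql("findBySourceCurrencyAndTargetCurrency", "ExchangeRate"));
    }
}
</code></pre>
<p>Each method argument is bound to one positional parameter in order, which is why the argument order in the repository method has to match the order of the property names.</p>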
<p>To use the <code>findBySourceCurrencyAndTargetCurrency</code> method, create a service file in main &gt; java &gt; com.. &gt; service &gt; ExchangeRateService with the code below:</p>
<pre><code class="language-java">package com.example.springbootmysqleks.service;

import com.example.springbootmysqleks.model.ExchangeRate;
import com.example.springbootmysqleks.repository.ExchangeRateRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ExchangeRateService {

   @Autowired
   private ExchangeRateRepository exchangeRateRepository;

   public ExchangeRate addExchangeRate(ExchangeRate exchangeRate) {
       return exchangeRateRepository.save(exchangeRate);
   }

   public double getAmount(String sourceCurrency, String targetCurrency) {
       ExchangeRate exchangeRate =  exchangeRateRepository.findBySourceCurrencyAndTargetCurrency(sourceCurrency, targetCurrency);
       return exchangeRate == null ? 0 : exchangeRate.getAmount();
   }
}
</code></pre>
<p>Here, we created a <code>@Service</code> class that interacts with the repository.</p>
<p>The class has two methods. <code>addExchangeRate</code> takes an <code>ExchangeRate</code> object and saves it to the database. <code>getAmount</code> takes a source and target currency, uses our custom repository method to look up the matching record, and returns either the exchange rate amount or a default of 0 if no record is found.</p>
<p>That little ternary check (<code>exchangeRate == null ? 0 : exchangeRate.getAmount()</code>) prevents a <code>NullPointerException</code> if you query a currency pair that doesn't exist in the database yet.</p>
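<p>One trade-off of defaulting to 0 is that a caller can't tell "no rate stored" apart from "the rate is 0". If that distinction matters in your app, one possible alternative is to return an <code>OptionalDouble</code>. The sketch below is plain Java with a hypothetical in-memory store standing in for the repository, so the class and method names are assumptions rather than part of the tutorial's code:</p>
<pre><code class="language-java">import java.util.OptionalDouble;
import java.util.Properties;

class RateLookupSketch {

    // Hypothetical in-memory stand-in for the repository; keys look like "USD-EUR".
    private final Properties rates = new Properties();

    void put(String source, String target, double amount) {
        rates.setProperty(source + "-" + target, Double.toString(amount));
    }

    // Returns an empty OptionalDouble for an unknown pair instead of 0.
    OptionalDouble findAmount(String source, String target) {
        String stored = rates.getProperty(source + "-" + target);
        if (stored == null) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(stored));
    }

    public static void main(String[] args) {
        RateLookupSketch lookup = new RateLookupSketch();
        lookup.put("USD", "EUR", 0.93);
        System.out.println(lookup.findAmount("USD", "EUR").isPresent()); // true
        System.out.println(lookup.findAmount("USD", "XYZ").isPresent()); // false
    }
}
</code></pre>
<p>With this shape, the caller decides what a missing pair means, for example by returning a 404 response instead of a rate of 0.</p>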
<p>In main &gt; java &gt; com.. &gt; controller &gt; ExchangeRateController, we have the following code:</p>
<pre><code class="language-java">package com.example.springbootmysqleks.controller;

import com.example.springbootmysqleks.model.ExchangeRate;
import com.example.springbootmysqleks.service.ExchangeRateService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

@RestController
public class ExchangeRateController {

   @Autowired
   ExchangeRateService exchangeRateService;

   @GetMapping("/getAmount")
   public double getAmount(@RequestParam String sourceCurrency, @RequestParam String targetCurrency) {
       return exchangeRateService.getAmount(sourceCurrency, targetCurrency);
   }

   @PostMapping("/addExchangeRate")
   public ExchangeRate addExchangeRate(@RequestBody ExchangeRate exchangeRate) {
       return exchangeRateService.addExchangeRate(exchangeRate);
   }

   @GetMapping("/")
   public String getHealth() {
       return "up";
   }

}
</code></pre>
<p>The <code>@RestController</code> annotation tells Spring this class will be serving up REST API endpoints, and again <code>@Autowired</code> wires in the service layer automatically.</p>
<p>There are three endpoints:</p>
<ol>
<li><p>a GET request to <code>/getAmount</code> that accepts <code>sourceCurrency</code> and <code>targetCurrency</code> as query parameters and returns the exchange rate amount</p>
</li>
<li><p>a POST request to <code>/addExchangeRate</code> that accepts a full <code>ExchangeRate</code> object as a JSON body and saves it to the database</p>
</li>
<li><p>and finally a simple health check endpoint at <code>/</code> that just returns "up", which is a common pattern in cloud deployments to let load balancers and orchestration tools know the app is alive and running.</p>
</li>
</ol>
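<p>If you'd rather exercise the POST endpoint from code than from an API tool, here's a minimal sketch using the JDK's built-in <code>java.net.http</code> API (Java 11+). The base URL <code>http://localhost:8080</code>, the sample values, and the class name are assumptions for illustration. The sketch only builds the request and doesn't send it:</p>
<pre><code class="language-java">import java.net.URI;
import java.net.http.HttpRequest;

class AddExchangeRateRequestSketch {

    // Builds (but does not send) a POST request for the /addExchangeRate endpoint.
    static HttpRequest build(String baseUrl, String source, String target, double amount) {
        // Field names match the ExchangeRate entity; transactionId is left out
        // because it is generated on save.
        String json = "{\"sourceCurrency\":\"" + source + "\","
                + "\"targetCurrency\":\"" + target + "\","
                + "\"amount\":" + amount + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/addExchangeRate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8080", "USD", "EUR", 0.93);
        // Prints: POST http://localhost:8080/addExchangeRate
        System.out.println(request.method() + " " + request.uri());
    }
}
</code></pre>
<p>To actually send it while the app is running, you could pass the request to <code>HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())</code>.</p>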
<h3 id="heading-step-6-configure-springboot-app-for-database">Step 6: Configure SpringBoot App for Database</h3>
<p>Now, it’s time to configure the application for the database. Navigate to src &gt; main &gt; resources &gt; application.properties, and you should see this:</p>
<pre><code class="language-java">spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://${MYSQL_HOSTNAME}:${MYSQL_PORT}/${MYSQL_DATABASE}?createDatabaseIfNotExist=true
spring.datasource.username=${MYSQL_USERNAME}
spring.datasource.password=${MYSQL_PASSWORD}

spring.jpa.hibernate.ddl-auto=update

spring.jpa.show-sql=true
</code></pre>
<p>These are the configurations that allow your app to connect with the database.</p>
<ul>
<li><p><code>spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver</code>: The driver class for the MySQL database.</p>
</li>
<li><p><code>spring.datasource.url=jdbc:mysql://${MYSQL_HOSTNAME}:${MYSQL_PORT}/${MYSQL_DATABASE}?createDatabaseIfNotExist=true</code>: The data source URL, built from the MySQL hostname (for example, 127.0.0.1), port, and database name. The <code>createDatabaseIfNotExist=true</code> parameter tells the MySQL driver to create the database if it doesn't exist yet.</p>
</li>
<li><p><code>spring.datasource.username=${MYSQL_USERNAME}</code>: your database user name.</p>
</li>
<li><p><code>spring.datasource.password=${MYSQL_PASSWORD}</code>: your database password.</p>
</li>
</ul>
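<p>To picture what happens to the <code>${...}</code> placeholders at startup, here's a small plain-Java sketch that substitutes values into the URL template. Spring does this resolution for you, so this is only an illustration of the effect, not Spring's implementation. The class name, method name, and sample values are assumptions:</p>
<pre><code class="language-java">import java.util.Properties;

class PlaceholderSketch {

    // Replaces each ${NAME} with the value stored under NAME.
    // Unknown names are left untouched so the gap stays visible.
    static String resolve(String template, Properties env) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = template.indexOf("${", pos);
            if (start == -1) {
                out.append(template.substring(pos));
                return out.toString();
            }
            int end = template.indexOf('}', start);
            if (end == -1) {
                out.append(template.substring(pos));
                return out.toString();
            }
            String name = template.substring(start + 2, end);
            String value = env.getProperty(name);
            out.append(template, pos, start);
            out.append(value == null ? template.substring(start, end + 1) : value);
            pos = end + 1;
        }
    }

    public static void main(String[] args) {
        Properties env = new Properties();
        env.setProperty("MYSQL_HOSTNAME", "localhost");
        env.setProperty("MYSQL_PORT", "3306");
        env.setProperty("MYSQL_DATABASE", "exchangedb");
        // Prints: jdbc:mysql://localhost:3306/exchangedb?createDatabaseIfNotExist=true
        System.out.println(resolve(
                "jdbc:mysql://${MYSQL_HOSTNAME}:${MYSQL_PORT}/${MYSQL_DATABASE}?createDatabaseIfNotExist=true",
                env));
    }
}
</code></pre>
<p>The practical takeaway is that the same <code>application.properties</code> file works locally and in the cluster. Only the environment variable values change.</p>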
<p>One thing to note: how you supply these environment variables with your actual credentials depends on the IDE you're using. In IntelliJ IDEA, the process is pretty straightforward. In VS Code, it works differently.</p>
<p>To set the environment variables in VS Code, create a <code>.vscode/launch.json</code> file in your project root folder and paste in the following configuration:</p>
<pre><code class="language-json">{
 "version": "0.2.0",
 "configurations": [
   {
     "type": "java",
     "name": "Spring Boot App",
     "request": "launch",
     "mainClass": "com.example.springbootmysqleks.SpringbootMysqlEksApplication",
     "projectName": "springboot-mysql-eks",
     "env": {
       "MYSQL_HOSTNAME": "localhost",
       "MYSQL_PORT": "3306",
       "MYSQL_DATABASE": "exchangedb",
       "MYSQL_USERNAME": "root",
       "MYSQL_PASSWORD": "CHANGE_ME"
     }
   }
 ]
}
</code></pre>
<p>Replace the placeholder values with your actual credentials.</p>
<p>Now, when you run the app, you should be able to see the newly created <code>exchangedb</code> database in DBeaver:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/9ea1b85c-25a8-4587-a013-4d592d1664eb.png" alt="exchange db image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Use an API testing tool like Postman to send a POST request to the <code>/addExchangeRate</code> endpoint:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/5fa9e950-17ff-4fd6-a650-b6a33b607744.png" alt="postman request image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Next, run the <code>select * from exchange_rate er</code> script in the <code>exchangedb</code> SQL script editor:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/a7a2ffea-f930-4f27-b29a-f5aaf8f8543e.png" alt="sql editor image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>At the bottom of the editor, you should see the row created by the Postman request.</p>
<p>Now, run a GET request to the endpoint below:</p>
<pre><code class="language-json">http://localhost:8080/getAmount?sourceCurrency=USD&amp;targetCurrency=EUR&amp;transactionId=1
</code></pre>
<p>You should get a 200 OK response with the currency exchange value, for example, 0.93.</p>
<h3 id="heading-step-7-dockerize-the-springboot-application">Step 7: Dockerize the SpringBoot Application</h3>
<p>To Dockerize your application, create a file named Dockerfile and paste in the configuration below:</p>
<pre><code class="language-dockerfile">FROM eclipse-temurin:17-jre-jammy
WORKDIR /app
COPY build/libs/springboot-mysql-eks.jar /app
EXPOSE 8080
CMD ["java", "-jar", "springboot-mysql-eks.jar"]
</code></pre>
<p>Our Dockerfile starts by pulling the lightweight <code>eclipse-temurin:17-jre-jammy</code> base image to keep things lean, then sets /app as the working directory inside the container. It copies our compiled Spring Boot JAR file from the local build/libs/ folder into that directory, exposes port 8080 for incoming traffic, and finally runs the app with <code>java -jar</code> when the container starts up.</p>
<p>Next, build the app to create the <code>.jar</code> file. To do that, run the command below:</p>
<pre><code class="language-shell">./gradlew clean assemble 
</code></pre>
<p>You should get a successful build output as shown below:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/bde4395b-bc30-46e4-9687-bd03836f574d.png" alt="bde4395b-bc30-46e4-9687-bd03836f574d" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Navigate to build &gt; libs. You’ll see the <code>springboot-mysql-eks.jar</code> file created.</p>
<p>If you run into an “operation couldn’t be completed” error, Gradle most likely can't find a Java installation. If you’re using a Mac, install a JDK with the command below:</p>
<pre><code class="language-shell">brew install openjdk@21
</code></pre>
<p>Next, run the export commands:</p>
<pre><code class="language-shell">export JAVA_HOME=/opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk/Contents/Home

export PATH=$JAVA_HOME/bin:$PATH
</code></pre>
<p>Then run the <code>./gradlew clean assemble</code> command again.</p>
<h3 id="heading-step-8-push-the-image-to-elastic-container-registry-ecr">Step 8: Push the Image to Elastic Container Registry (ECR)</h3>
<p>In this next step, we’ll create an Amazon ECR repository and push our image to it.</p>
<p>To get started, head back into your AWS Console and search for “ECR”. On the ECR page, click Create. Then, enter a repository name, for example, “springboot-mysql-eks”. Next, click Create.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/21c56eef-c656-49b6-a624-b1da66cf1096.png" alt="ECR image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Next, select the repo and click View push commands at the top of the page. This presents a window with a bunch of commands you can use to push your image to the registry. Open your terminal and run these commands. You'll need to ensure Docker is running on your local machine before running the commands.</p>
<p>After running the commands, you should see that your image has been successfully pushed to the registry.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/b39d0b9c-ea99-4cf3-bfb0-3496ac79dea9.png" alt="ECR image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-9-implement-aws-app-load-balancer">Step 9: Implement AWS App Load Balancer</h3>
<p>Before getting started with this step, make sure you check out the installation steps and link to additional AWS documentation in the project README. This will help you follow along.</p>
<p>Now, to get started, create a new folder in your root directory named <code>cluster</code>. This is where you'll download the AWS IAM policy for the load balancer. To download the policy, go into your terminal and <code>cd</code> into <code>cluster</code>, then run the command below:</p>
<pre><code class="language-shell">curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.14.1/docs/install/iam_policy.json
</code></pre>
<p>This command comes from the <a href="https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html">AWS documentation</a>. Now, when you go to the folder, you’ll see the downloaded iam_policy.json file.</p>
<p>Next, create the IAM policy using the command below:</p>
<pre><code class="language-shell">aws iam create-policy \
    --policy-name AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://iam_policy.json
</code></pre>
<p>The command prints a JSON description of the new policy, including its ARN, in your terminal.</p>

<p>This shows that the IAM policy has been successfully created. To confirm this, head over to the IAM section in your console, navigate to Policies, and search for “AWSLoad…”. You should see the policy created there.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/c022959a-c732-4dfd-bd58-4b80d3011b71.png" alt="load balancer policy image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The next step is creating the Kubernetes service account. But before that, you need to tag your public and private subnets as described in this <a href="https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html">documentation</a>.</p>
<p>Now, head over to the VPC dashboard, navigate to Subnets, click into a subnet, and navigate to Tags. Then, click Manage tags.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/615f3ec2-d266-413e-8936-d0767d03316d.png" alt="tag image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Click Add new tag, then enter the key-value pair from the documentation.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/253abe00-cb7e-4276-81de-79ba0ffc249a.png" alt="tag image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-10-create-a-cluster-in-eks">Step 10: Create a Cluster in EKS</h3>
<p>To create a Kubernetes cluster on EKS, you need the eksctl CLI. Follow the instructions in the <a href="https://docs.aws.amazon.com/eks/latest/eksctl/installation.html">AWS eksctl documentation</a> to install the CLI. Next, you need a <a href="https://docs.aws.amazon.com/eks/latest/eksctl/schema.html">config file schema</a> to create the cluster. To use this schema, create a new file called cluster.yaml in the cluster folder.</p>
<p>Next, paste in the following configurations:</p>
<pre><code class="language-dockerfile">apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: spring-test-cluster
  region: us-east-1
  version: "1.30"

vpc:
  id: "&lt;your-vpc-id&gt;"
  subnets:
    private:
      us-east-1a:
        id: "&lt;your-private-subnet-1a-id&gt;" # spring-demo-subnet-private1-us-east-1a
      us-east-1b:
        id: "&lt;your-private-subnet-1b-id&gt;" # spring-demo-subnet-private2-us-east-1b
    public:
      us-east-1a:
        id: "&lt;your-public-subnet-1a-id&gt;" # spring-demo-subnet-public1-us-east-1a
      us-east-1b:
        id: "&lt;your-public-subnet-1b-id&gt;" # spring-demo-subnet-public2-us-east-1b

nodeGroups:
  - name: ng-1
    labels: { role: backend }
    instanceType: t2.micro
    desiredCapacity: 3
    minSize: 3
    maxSize: 5
    privateNetworking: true
    ssh:
      allow: true
      publicKeyName: &lt;your-ec2-key-name&gt;
    iam:
      withAddonPolicies:
        imageBuilder: true
        awsLoadBalancerController: true
        autoScaler: true
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: aws-load-balancer-controller
        namespace: kube-system
      attachPolicyARNs:
        - arn:aws:iam::&lt;YOUR_AWS_ACCOUNT_ID&gt;:policy/AWSLoadBalancerControllerIAMPolicy
</code></pre>
<p>The <code>ClusterConfig</code> file is used by eksctl to create our EKS cluster called <code>spring-test-cluster</code> in the <code>us-east-1</code> region, running Kubernetes version 1.30. It plugs into our existing VPC, placing the worker nodes across private subnets in two availability zones (<code>us-east-1a</code> and <code>us-east-1b</code>) for high availability, while keeping public subnets available for the load balancer.</p>
<p>The node group spins up t2.micro EC2 instances with a desired count of 3 (scaling up to 5 if needed), all with private networking enabled for security. It also sets up the necessary IAM permissions for the AWS Load Balancer Controller, Auto Scaler, and ECR image access so our cluster has everything it needs to manage traffic and pull our Docker images automatically.</p>
<p>Now, after replacing the placeholders with your own VPC ID, subnet IDs, EC2 key pair name, and AWS account ID, run the command below:</p>
<pre><code class="language-shell">eksctl create cluster -f cluster.yaml
</code></pre>
<p>This creates the cluster. You should see an output like this on your terminal:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/758c6b7d-9cf3-4fa6-a0d5-2276adc82147.png" alt="cluster creation image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now, in your AWS console, navigate to CloudFormation, and you’ll see your cluster creation process in progress.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/4aa96465-45c1-4e43-b8ca-d44a8802e02d.png" alt="stack creation image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now, when you go into the EC2 instance page, you should see the three nodes created.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/d946d993-0c64-4870-90fb-1132df51544f.png" alt="running cluster image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-11-install-aws-load-balancing">Step 11: Install AWS Load Balancing</h3>
<p>The next step is installing the AWS Load Balancer Controller, which provisions the load balancer for our application. To get started, run the command below:</p>
<pre><code class="language-shell">kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller/crds?ref=master"
</code></pre>
<p>This installs <a href="https://www.geeksforgeeks.org/devops/custom-resource-definitions-crds/">custom resource definitions (CRDs)</a> for our controller. Next, run the command below to add the Helm chart repo.</p>
<pre><code class="language-shell">helm repo add eks https://aws.github.io/eks-charts
</code></pre>
<p>Update your local repo to ensure you have the most recent charts:</p>
<pre><code class="language-shell">helm repo update eks
</code></pre>
<p>Next, install the Helm chart:</p>
<pre><code class="language-shell">helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=spring-test-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --version 1.14.0
</code></pre>
<p>Next, verify that the controller is installed:</p>
<pre><code class="language-shell">kubectl get deployment -n kube-system aws-load-balancer-controller
</code></pre>
<p>You should see this on your terminal:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/61d954f3-b09d-4691-9b1c-02134c2d8bf1.png" alt="61d954f3-b09d-4691-9b1c-02134c2d8bf1" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This indicates that your controller is ready.</p>
<h3 id="heading-step-12-create-and-deploy-kubernetes">Step 12: Create and Deploy Kubernetes</h3>
<p>To get started, you'll first need to create the Kubernetes manifest files. For that, we’ll use a <a href="https://www.freecodecamp.org/news/what-is-a-helm-chart-tutorial-for-kubernetes-beginners/">Helm chart</a>:</p>
<pre><code class="language-shell">helm create ytchart
</code></pre>
<p>The command above creates a folder named <code>ytchart</code> with the templates for the components. In this folder, you need to make some configurations for your use case. First, navigate to ytchart &gt; templates and delete the <code>serviceaccount.yaml</code> file, since we already created the service account earlier.</p>
<p>Next, go to values.yaml and make the following changes:</p>
<ul>
<li><p>For <code>repository</code>, navigate to the ECR service page on the AWS Console and copy the image URI.</p>
</li>
<li><p>Set <code>tag</code> to <code>latest</code>.</p>
</li>
<li><p>Add the database name:</p>
</li>
</ul>
<pre><code class="language-dockerfile">mysql:
 databaseName: exchangedb
</code></pre>
<ul>
<li><p>Change service account creation to <code>false</code>.</p>
</li>
<li><p>Scroll down a bit more and change the service <code>type</code> to <code>NodePort</code> and <code>port</code> to <code>8080</code>.</p>
</li>
</ul>
<p>You also need to store the database username and password using secrets. Navigate to the <code>templates</code> folder and go into the file named <code>secrets.yaml</code>. Here, set your database username and password, then comment out the liveness and readiness probe in <code>deployment.yaml</code>.</p>
<p>Next, we’ll create a service to connect to the database. To do that, open the <code>mysql.yaml</code> file and, for <code>externalName</code>, paste in the database endpoint copied from the RDS service page on the AWS console.</p>
<p>Now, in the <code>deployment.yaml</code> file, paste in the following configuration:</p>
<pre><code class="language-dockerfile">          env:
            - name: SPRING_DATASOURCE_URL
              value: jdbc:mysql://spring-mysql:3306/{{ .Values.mysql.databaseName }}?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useUnicode=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true
            - name: SPRING_DATASOURCE_USERNAME
              valueFrom:
                secretKeyRef:
                  name: mysql-username
                  key: username
            - name: SPRING_DATASOURCE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-root-password
                  key: password
</code></pre>
<p>These environment variables read the database credentials from Kubernetes Secrets instead of hardcoding them in the deployment manifest.</p>
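<p>The variable names in this block aren't arbitrary. Spring Boot's relaxed binding maps environment variables onto properties using a documented rule: replace dots with underscores, remove dashes, and convert to uppercase. That's why <code>SPRING_DATASOURCE_URL</code> overrides <code>spring.datasource.url</code> from <code>application.properties</code>. The helper below is a hypothetical illustration of that rule, not a Spring API:</p>
<pre><code class="language-java">import java.util.Locale;

class EnvNameSketch {

    // Applies the documented conversion: dots become underscores,
    // dashes are removed, and the result is uppercased.
    static String toEnvVarName(String propertyName) {
        return propertyName
                .replace(".", "_")
                .replace("-", "")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvVarName("spring.datasource.url"));      // SPRING_DATASOURCE_URL
        System.out.println(toEnvVarName("spring.datasource.username")); // SPRING_DATASOURCE_USERNAME
        System.out.println(toEnvVarName("spring.datasource.password")); // SPRING_DATASOURCE_PASSWORD
    }
}
</code></pre>
<p>Because environment variables take precedence over <code>application.properties</code>, the cluster deployment can point at RDS without any change to the packaged JAR.</p>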
<p>In the <code>ingress.yaml</code> file, paste in the following configuration:</p>
<pre><code class="language-dockerfile">apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "spring-microservice-ingress"
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/load-balancer-name: spring-alb-test
  labels:
    app: spring-microservice
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ include "ytchart.fullname" . }}
                port:
                  number: 8080
</code></pre>
<p>This is your configuration for the Ingress resource.</p>
<p>Run the command below to render the chart templates locally and review the resulting manifests:</p>
<pre><code class="language-shell">helm template ytchart/
</code></pre>
<p>Next, run the command below to deploy the chart:</p>
<pre><code class="language-shell">helm install mychart ytchart
</code></pre>
<p>You should see an output like this on your terminal:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/4178105c-f469-4bd6-94f0-58046d12c080.png" alt="helm chart image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now, when you run <code>kubectl get all</code>, you should see this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/62754329317fc95a74ca62a8/23e91fff-6368-42a1-adda-1af45616e9ef.png" alt="deployment image" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now, navigate to EC2 &gt; Load balancers, copy the DNS name, and enter it into a browser. You should see the “up” text. This indicates that your application is working properly.</p>
<p>Now, send a POST request to the API using the load balancer's DNS name, for example:</p>
<pre><code class="language-shell">http://spring-alb-test-260424558.us-east-1.elb.amazonaws.com/addExchangeRate
</code></pre>
<p>You should get a 200 OK response. Congratulations, you have successfully deployed a Spring Boot app on Kubernetes!</p>
<h3 id="heading-step-13-delete-cluster">Step 13: Delete Cluster</h3>
<p>If you’re familiar with AWS and the cloud, you should already be aware of how costly it can be to leave resources running for extended periods, especially when you’re not using them actively.</p>
<p>Now that we've come to the end of this tutorial, it’s time to delete the resources.</p>
<p>These are the resources to delete:</p>
<ul>
<li><p>The RDS database.</p>
</li>
<li><p>The cluster, using the command <code>eksctl delete cluster -f cluster.yaml</code>.</p>
</li>
<li><p>The NAT Gateway, which you can delete from the VPC dashboard.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Deploying a Spring Boot application with MySQL on Amazon EKS involves a lot of moving parts, but each step builds logically on the last.</p>
<p>In this tutorial, you've gone from setting up a VPC and provisioning a managed database to containerizing your app, pushing it to ECR, and finally orchestrating everything with Kubernetes and an Application Load Balancer.</p>
<p>What you get is a solid foundation for a production-grade setup, with high availability, private networking, credentials stored in Kubernetes Secrets, and room for auto-scaling. This is the kind of infrastructure that would take significant manual effort to replicate without managed services like EKS and RDS.</p>
<p>As a next step, consider adding HTTPS support via AWS Certificate Manager, setting up horizontal pod autoscaling, or integrating a CI/CD pipeline to automate future deployments. And remember to clean up your AWS resources when you're done experimenting. Your wallet will thank you.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
