<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Cloud Computing - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Cloud Computing - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 04:32:40 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/cloud-computing/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Migrate to S3 Native State Locking in Terraform ]]>
                </title>
                <description>
                    <![CDATA[ If you've been running Terraform on AWS for any length of time, you know the setup: an S3 bucket for state storage, a DynamoDB table for state locking, and a handful of IAM policies tying them togethe ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-migrate-to-s3-native-state-locking-in-terraform/</link>
                <guid isPermaLink="false">69fd19239f93a850a430069b</guid>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Terraform ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Infrastructure as code ]]>
                    </category>
                
                    <category>
                        <![CDATA[ S3 ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tolani Akintayo ]]>
                </dc:creator>
                <pubDate>Thu, 07 May 2026 22:58:43 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/9619ad45-15c5-4be7-9221-ed4b76bc2b24.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've been running Terraform on AWS for any length of time, you know the setup: an S3 bucket for state storage, a DynamoDB table for state locking, and a handful of IAM policies tying them together. It works. It has worked for years.</p>
<p>But it has always carried a cost that rarely gets discussed openly. That cost isn't just money, though a DynamoDB table with on-demand billing adds up across multiple teams and environments.</p>
<p>The real cost is complexity. Every new AWS environment needs both resources provisioned before Terraform can manage anything else. Every engineer who sets up their first Terraform backend has to understand why two completely different AWS services are responsible for what is logically one thing: storing and protecting state. And every incident involving a stuck lock has required someone to manually delete a record from DynamoDB to unblock the team.</p>
<p>In 2024, AWS added support for conditional writes in S3, and Terraform 1.10 (released in November 2024) built native S3 state locking on top of that capability, meaning <strong>DynamoDB is no longer required for state locking</strong>. The feature is now generally available.</p>
<p>In this tutorial, you'll learn:</p>
<ul>
<li><p>What S3 native locking is and how it works</p>
</li>
<li><p>How to set it up from scratch if you're starting a new project</p>
</li>
<li><p>How to migrate an existing S3 + DynamoDB setup to S3 native locking safely</p>
</li>
<li><p>How to verify locking is working and handle edge cases</p>
</li>
</ul>
<p>By the end, you'll have a simpler, cleaner Terraform backend with one fewer AWS resource to manage.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-terraform-state-locking">What Is Terraform State Locking?</a></p>
</li>
<li><p><a href="#heading-what-is-s3-native-state-locking">What Is S3 Native State Locking?</a></p>
</li>
<li><p><a href="#heading-how-s3-native-locking-compares-to-the-s3-dynamodb-approach">How S3 Native Locking Compares to the S3 + DynamoDB Approach</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-part-1-fresh-setup-how-to-configure-s3-native-locking-from-scratch">Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch</a></p>
<ul>
<li><p><a href="#heading-step-1-create-the-s3-bucket-with-versioning-and-encryption">Step 1: Create the S3 Bucket with Versioning and Encryption</a></p>
</li>
<li><p><a href="#heading-step-2-configure-the-terraform-backend-with-native-locking">Step 2: Configure the Terraform Backend with Native Locking</a></p>
</li>
<li><p><a href="#heading-step-3-initialize-and-verify">Step 3: Initialize and Verify</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-part-2-migration-how-to-move-from-s3-dynamodb-to-s3-native-locking">Part 2: Migration – How to Move from S3 + DynamoDB to S3 Native Locking</a></p>
<ul>
<li><p><a href="#heading-step-1-verify-your-current-setup">Step 1: Verify Your Current Setup</a></p>
</li>
<li><p><a href="#heading-step-2-enable-object-lock-on-the-existing-s3-bucket">Step 2: Enable Object Lock on the Existing S3 Bucket</a></p>
</li>
<li><p><a href="#heading-step-3-update-the-terraform-backend-configuration">Step 3: Update the Terraform Backend Configuration</a></p>
</li>
<li><p><a href="#heading-step-4-reinitialize-terraform">Step 4: Reinitialize Terraform</a></p>
</li>
<li><p><a href="#heading-step-5-verify-the-migration">Step 5: Verify the Migration</a></p>
</li>
<li><p><a href="#heading-step-6-clean-up-the-dynamodb-table">Step 6: Clean Up the DynamoDB Table</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-how-to-verify-that-locking-is-working">How to Verify That Locking Is Working</a></p>
</li>
<li><p><a href="#heading-how-to-handle-a-stuck-lock">How to Handle a Stuck Lock</a></p>
</li>
<li><p><a href="#heading-rollback-plan-if-something-goes-wrong">Rollback Plan: If Something Goes Wrong</a></p>
</li>
<li><p><a href="#heading-security-best-practices-for-your-state-bucket">Security Best Practices for Your State Bucket</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-references">References</a></p>
</li>
</ul>
<h2 id="heading-what-is-terraform-state-locking">What is Terraform State Locking?</h2>
<p>Before looking at the new approach, it helps to understand what state locking is solving.</p>
<p>Terraform stores everything it knows about your infrastructure in a <strong>state file</strong> – a JSON document that maps your configuration to real AWS resources. When you run <code>terraform apply</code>, Terraform reads this file, calculates the difference between the current state and your configuration, and makes the necessary changes.</p>
<p>The problem arises when two engineers or two CI/CD pipelines try to apply changes at the same time. If both read the state file simultaneously, calculate changes independently, and both try to write back, you get a <strong>race condition</strong>. The second write overwrites changes from the first, and your state is now out of sync with reality. This is a serious problem that can cause resources to become untracked, duplicated, or destroyed unexpectedly.</p>
<p><strong>State locking</strong> solves this by creating a lock when any operation starts that could modify state. If a lock already exists, Terraform refuses to proceed and reports who holds the lock and when it was acquired. Only one operation can hold the lock at a time. When the operation completes, the lock is released.</p>
<pre><code class="language-plaintext">Terraform Run A                 State File / Lock                Terraform Run B
(User 1)                         (S3/DynamoDB)                   (User 2)

   |                                   |                            |
   |------- 1. Acquire Lock ----------&gt;|                            |
   |                                   |                            |
   |&lt;------ 2. Lock Granted -----------|                            |
   |                                   |                            |
   |                                   |------- 3. Acquire Lock ---&gt;|
   |            [PROCESSING]           |                            |
   |      (Modifying Infrastructure)   |&lt;------ 4. Lock Denied -----|
   |                                   |        (Wait / Retry)      |
   |                                   |                            |
   |------- 5. Release Lock ----------&gt;|                            |
   |                                   |                            |
   |           [COMPLETED]             |&lt;------ 6. Lock Granted ----|
   |                                   |                            |
   |                                   |       [PROCESSING]         |
   |                                   | (Modifying Infrastructure) |              
   |                                   |                            |
</code></pre>
<h2 id="heading-what-is-s3-native-state-locking">What Is S3 Native State Locking?</h2>
<p>Previously, Terraform's S3 backend used a DynamoDB table as the locking mechanism. When a lock was needed, Terraform wrote a record to DynamoDB with a <code>LockID</code> primary key. DynamoDB's conditional writes guaranteed that only one process could create that record, which is what made the locking atomic.</p>
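<p>To make that concrete, the lock acquisition boils down to a conditional write like the one below. This is an illustrative sketch using the AWS CLI (Terraform makes the equivalent call through the AWS SDK, and the table name and key value here are placeholders):</p>
<pre><code class="language-shell"># Try to create the lock record; the condition makes the write atomic
aws dynamodb put-item \
  --table-name terraform-state-lock \
  --item '{"LockID": {"S": "your-bucket/path/to/terraform.tfstate"}}' \
  --condition-expression "attribute_not_exists(LockID)"

# If another process already holds the lock, DynamoDB rejects the write
# with a ConditionalCheckFailedException instead of overwriting the record.
</code></pre>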
<p>S3 native locking uses <strong>S3 Object Lock</strong> instead. S3 Object Lock is an S3 feature originally designed to enforce WORM (Write Once, Read Many) compliance for regulatory requirements. AWS extended this capability to support Terraform's state locking workflow.</p>
<p>When S3 native locking is enabled in your Terraform backend:</p>
<ol>
<li><p>Terraform writes your state to an <code>.tfstate</code> object in S3 (as before)</p>
</li>
<li><p>To acquire a lock, Terraform uses <strong>S3's conditional write operations</strong> – specifically the <code>if-none-match</code> conditional header to create a lock file atomically</p>
</li>
<li><p>If the lock file already exists, S3 rejects the write, and Terraform reports that a lock is held</p>
</li>
<li><p>When the operation completes, Terraform deletes the lock file to release the lock.</p>
</li>
</ol>
<p>The key difference from DynamoDB: the entire locking mechanism lives inside S3. No second service. No second set of IAM permissions. No second resource to provision.</p>
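<p>You can see the same primitive directly with the AWS CLI. This is only an illustration of the underlying mechanism, not the exact request Terraform sends – the bucket, key, and body are placeholders, and the <code>--if-none-match</code> flag requires a recent AWS CLI version:</p>
<pre><code class="language-shell"># Create a lock object only if it doesn't already exist
aws s3api put-object \
  --bucket your-project-terraform-state \
  --key production/terraform.tfstate.tflock \
  --body lock-info.json \
  --if-none-match '*'

# If the object already exists, S3 rejects the write with a
# 412 PreconditionFailed error instead of overwriting it.
</code></pre>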
<p><strong>Note:</strong> This feature requires Terraform version <strong>1.10.0 or later</strong> and an S3 bucket with <strong>Object Lock enabled</strong>. Object Lock is normally enabled at bucket creation time, but there is a supported way to turn it on for an existing bucket, which we'll cover in Part 2.</p>
<h2 id="heading-how-s3-native-locking-compares-to-the-s3-dynamodb-approach">How S3 Native Locking Compares to the S3 + DynamoDB Approach</h2>
<table>
<thead>
<tr>
<th><strong>Aspect</strong></th>
<th><strong>S3 + DynamoDB (Old)</strong></th>
<th><strong>S3 Native Locking (New)</strong></th>
</tr>
</thead>
<tbody><tr>
<td><strong>AWS services required</strong></td>
<td>S3 + DynamoDB</td>
<td>S3 only</td>
</tr>
<tr>
<td><strong>IAM permissions needed</strong></td>
<td>S3 + DynamoDB permissions</td>
<td>S3 permissions only</td>
</tr>
<tr>
<td><strong>Terraform version</strong></td>
<td>Any</td>
<td>1.10.0 or later</td>
</tr>
<tr>
<td><strong>Setup complexity</strong></td>
<td>Two resources, two IAM scopes</td>
<td>One resource</td>
</tr>
<tr>
<td><strong>Stuck lock resolution</strong></td>
<td>Delete DynamoDB record</td>
<td>Delete S3 lock file</td>
</tr>
<tr>
<td><strong>Cost</strong></td>
<td>S3 storage + DynamoDB on-demand</td>
<td>S3 storage only</td>
</tr>
<tr>
<td><strong>Object Lock requirement</strong></td>
<td>Not required</td>
<td>Required on S3 bucket</td>
</tr>
<tr>
<td><strong>Locking mechanism</strong></td>
<td>DynamoDB conditional writes</td>
<td>S3 conditional writes (<code>if-none-match</code>)</td>
</tr>
<tr>
<td><strong>State versioning</strong></td>
<td>S3 Versioning (recommended)</td>
<td>S3 Versioning (required for full safety)</td>
</tr>
</tbody></table>
<p>The functional behavior from Terraform's perspective is identical. Locking works the same way. The lock information displayed when a lock is held has the same structure. The only difference is what happens under the hood.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have the following in place:</p>
<ul>
<li><strong>Terraform 1.10.0 or later</strong> installed. Check your version:</li>
</ul>
<pre><code class="language-shell">terraform version
</code></pre>
<p>If you need to upgrade, follow the <a href="https://developer.hashicorp.com/terraform/install">official upgrade guide</a>.</p>
<ul>
<li><strong>AWS CLI</strong> installed and configured with credentials that have permission to create and manage S3 buckets.</li>
</ul>
<pre><code class="language-shell">aws --version
aws sts get-caller-identity   # confirm you're authenticated
</code></pre>
<ul>
<li><p><strong>IAM permissions</strong> to perform the following S3 actions:</p>
<ul>
<li><p><code>s3:CreateBucket</code></p>
</li>
<li><p><code>s3:PutBucketVersioning</code></p>
</li>
<li><p><code>s3:PutBucketEncryption</code></p>
</li>
<li><p><code>s3:PutObjectLegalHold</code></p>
</li>
<li><p><code>s3:PutObjectRetention</code></p>
</li>
<li><p><code>s3:GetObject</code></p>
</li>
<li><p><code>s3:PutObject</code></p>
</li>
<li><p><code>s3:DeleteObject</code></p>
</li>
<li><p><code>s3:ListBucket</code></p>
</li>
</ul>
</li>
<li><p>For the <strong>migration path</strong>: access to your existing Terraform project and the S3 bucket and DynamoDB table currently in use.</p>
</li>
</ul>
<h2 id="heading-part-1-fresh-setup-how-to-configure-s3-native-locking-from-scratch">Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch</h2>
<p>Follow this section if you're starting a new Terraform project and want to use S3 native locking from the beginning.</p>
<h3 id="heading-step-1-create-the-s3-bucket-with-versioning-and-encryption">Step 1: Create the S3 Bucket with Versioning and Encryption</h3>
<p>Object Lock <strong>must be enabled at bucket creation time</strong>. You can't add it afterward through the standard console flow. Create the bucket using the AWS CLI with Object Lock enabled:</p>
<pre><code class="language-shell">aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region us-east-1 \
  --object-lock-enabled-for-bucket
</code></pre>
<p><strong>Note:</strong> For regions other than <code>us-east-1</code>, add the <code>--create-bucket-configuration</code> flag.</p>
<pre><code class="language-shell">aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1 \
  --object-lock-enabled-for-bucket
</code></pre>
<p>Now enable versioning on the bucket. Versioning is required alongside Object Lock and allows Terraform to recover previous state versions if something goes wrong:</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-project-terraform-state \
  --versioning-configuration Status=Enabled
</code></pre>
<p>Enable server-side encryption so your state files are encrypted at rest:</p>
<pre><code class="language-shell">aws s3api put-bucket-encryption \
  --bucket your-project-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
</code></pre>
<p>Block all public access to the bucket. A Terraform state file contains resource IDs, IP addresses, and potentially sensitive values. It should never be publicly accessible:</p>
<pre><code class="language-shell">aws s3api put-public-access-block \
  --bucket your-project-terraform-state \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
</code></pre>
<p>Verify the bucket configuration:</p>
<pre><code class="language-shell"># Confirm Object Lock is enabled
aws s3api get-object-lock-configuration \
  --bucket your-project-terraform-state
 
# Confirm versioning is enabled
aws s3api get-bucket-versioning \
  --bucket your-project-terraform-state
 
# Confirm encryption is configured
aws s3api get-bucket-encryption \
  --bucket your-project-terraform-state
</code></pre>
<p>Expected output for the Object Lock check:</p>
<pre><code class="language-json">{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/2b2e56cf-687f-4932-a61e-ed7cc33ea6f1.png" alt="Terminal showing AWS CLI verification commands confirming S3 bucket is configured correctly with Object Lock, versioning, and encryption enabled" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-2-configure-the-terraform-backend-with-native-locking">Step 2: Configure the Terraform Backend with Native Locking</h3>
<p>In your Terraform project, create or update your <code>backend.tf</code> file:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket = "your-project-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
 
    # Enable S3 native state locking
    # Requires Terraform 1.10.0+ and a bucket with Object Lock enabled
    use_lockfile = true
 
    # Encryption at rest
    encrypt = true
  }
}
</code></pre>
<p>The critical difference from the old configuration is the <code>use_lockfile = true</code> parameter. Notice what is <strong>absent</strong>: there's no <code>dynamodb_table</code> argument. No DynamoDB table. No second service.</p>
<p>Here's a direct comparison of the old and new configurations:</p>
<p><strong>Old configuration (S3 + DynamoDB):</strong></p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket         = "your-project-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # this goes away
  }
}
</code></pre>
<p><strong>New configuration (S3 native locking):</strong></p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket       = "your-project-terraform-state"
    key          = "production/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true   # this replaces dynamodb_table
  }
}
</code></pre>
<h3 id="heading-step-3-initialize-and-verify">Step 3: Initialize and Verify</h3>
<p>Run <code>terraform init</code> to initialize the backend:</p>
<pre><code class="language-shell">terraform init
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
 
Terraform has been successfully initialized!
</code></pre>
<p>Run a plan to confirm everything is working end-to-end:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>If locking is working, you'll see a brief pause while Terraform acquires the lock before the plan output appears. You'll also see the lock information if you look at the S3 bucket&nbsp;– a <code>.tflock</code> file will appear temporarily alongside your state file during the operation and disappear when it completes.</p>
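<p>If you want to watch this happen, you can poll the bucket from a second terminal while the plan runs. This is an optional check – it assumes the <code>watch</code> utility is installed and that your state key matches the path below:</p>
<pre><code class="language-shell"># Poll for the lock object once per second while the plan is running
watch -n 1 'aws s3 ls s3://your-project-terraform-state/production/ | grep tflock'
</code></pre>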
<h2 id="heading-part-2-migration-how-to-move-from-s3-dynamodb-to-s3-native-locking">Part 2: Migration&nbsp;– How to Move from S3 + DynamoDB to S3 Native Locking</h2>
<p>Follow this section if you have an <strong>existing Terraform setup</strong> using an S3 bucket and DynamoDB table for state locking, and you want to migrate to S3 native locking.</p>
<p><strong>Important:</strong> Migration requires a maintenance window or at minimum a period where no Terraform operations are running. You're changing the backend configuration, which means <strong>all team members and CI/CD pipelines must stop running</strong> <code>terraform plan</code> <strong>or</strong> <code>terraform apply</code> <strong>during the migration</strong>. The migration itself takes under 10 minutes.</p>
<h3 id="heading-step-1-verify-your-current-setup">Step 1: Verify Your Current Setup</h3>
<p>Before making any changes, document your existing backend configuration and confirm the state file is accessible:</p>
<pre><code class="language-shell"># Confirm your state file is in S3
aws s3 ls s3://your-existing-bucket/path/to/terraform.tfstate
 
# Confirm the DynamoDB table exists
aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table \
  --query 'Table.TableStatus'
</code></pre>
<p>Check your current <code>backend.tf</code> and note the exact values:</p>
<pre><code class="language-shell"># Your current backend.tf - note these values before changing anything
terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"       # note this
    key            = "path/to/terraform.tfstate"   # note this
    region         = "us-east-1"                   # note this
    encrypt        = true
    dynamodb_table = "your-dynamodb-lock-table"    # this will be removed
  }
}
</code></pre>
<p>Run one final plan to confirm the current state is clean and there are no unexpected changes pending:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>If the plan shows no changes, you're in a safe state to proceed.</p>
<h3 id="heading-step-2-enable-object-lock-on-the-existing-s3-bucket">Step 2: Enable Object Lock on the Existing S3 Bucket</h3>
<p>This is the most important step in the migration. Object Lock is normally configured when a bucket is created, not added afterward through the standard console flow.</p>
<p>However, AWS supports enabling Object Lock on an existing bucket through the <code>PutObjectLockConfiguration</code> API, which is what the following AWS CLI command calls.</p>
<p>Run the following AWS CLI command to enable Object Lock on your <strong>existing</strong> bucket:</p>
<pre><code class="language-bash">aws s3api put-object-lock-configuration \
  --bucket your-existing-bucket \
  --object-lock-configuration '{"ObjectLockEnabled": "Enabled"}'
</code></pre>
<p><strong>Note:</strong> This command enables Object Lock without setting a default retention rule, meaning it turns on the locking capability without applying a retention period to objects in the bucket. This is exactly what Terraform's native locking needs: the ability to create and delete lock files, not permanent object retention.</p>
<p>Verify Object Lock is now enabled:</p>
<pre><code class="language-shell">aws s3api get-object-lock-configuration \
  --bucket your-existing-bucket
</code></pre>
<p>Expected output:</p>
<pre><code class="language-json">{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}
</code></pre>
<p>Also verify that versioning is already enabled (it should be if you are running a production Terraform setup):</p>
<pre><code class="language-shell">aws s3api get-bucket-versioning \
  --bucket your-existing-bucket
</code></pre>
<p>Expected output:</p>
<pre><code class="language-json">{
    "Status": "Enabled"
}
</code></pre>
<p>If versioning isn't enabled, enable it before proceeding:</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-existing-bucket \
  --versioning-configuration Status=Enabled
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/cd17df01-3d0a-4f93-9250-3f51627e91c8.png" alt="Terminal output showing successful Object Lock enablement on an existing S3 bucket using the AWS CLI" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-3-update-the-terraform-backend-configuration">Step 3: Update the Terraform Backend Configuration</h3>
<p>Update your <code>backend.tf</code> to remove the <code>dynamodb_table</code> argument and add <code>use_lockfile = true</code>:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket = "your-existing-bucket"
    key    = "path/to/terraform.tfstate"
    region = "us-east-1"
    encrypt = true
 
    # Add this:
    use_lockfile = true
 
    # Remove this line entirely:
    # dynamodb_table = "your-dynamodb-lock-table"
  }
}
</code></pre>
<p>Your updated <code>backend.tf</code> should look like this:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket       = "your-existing-bucket"
    key          = "path/to/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}
</code></pre>
<h3 id="heading-step-4-reinitialize-terraform">Step 4: Reinitialize Terraform</h3>
<p>Run <code>terraform init</code> with the <code>-reconfigure</code> flag. This flag tells Terraform that the backend configuration has changed intentionally and to reinitialize without prompting you to copy state (the state is already in the same bucket):</p>
<pre><code class="language-shell">terraform init -reconfigure
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
 
Terraform has been successfully initialized!
</code></pre>
<p><strong>If you see an error here:</strong> The most common cause is that Object Lock wasn't successfully enabled on the bucket. Re-run the verification from Step 2 before proceeding.</p>
<h3 id="heading-step-5-verify-the-migration">Step 5: Verify the Migration</h3>
<p>Run a plan to confirm Terraform is working correctly with the new backend configuration:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>The plan should:</p>
<ul>
<li><p>Complete successfully</p>
</li>
<li><p>Show the same result as the plan you ran in Step 1 (no changes, or the same changes as before)</p>
</li>
<li><p>NOT mention DynamoDB anywhere in its output</p>
</li>
</ul>
<p>To confirm that locking is actually using S3 instead of DynamoDB, open a second terminal and run a plan while the first one is running. You should see the second terminal output a lock error that mentions S3, not DynamoDB:</p>
<pre><code class="language-plaintext">╷
│ Error: Error acquiring the state lock
│
│ Error message: operation error S3: PutObject, https response error StatusCode: 409,
│ RequestID: ..., api error Conflict: Object lock already exists for this key.
│
│ Lock Info:
│   ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
│   Path:      your-existing-bucket/path/to/terraform.tfstate.tflock
│   Operation: OperationTypePlan
│   Who:       user@hostname
│   Version:   1.10.0
│   Created:   2026-05-06 14:22:01 UTC
│   Info:
╵
</code></pre>
<p>The <code>Path</code> field shows <code>.tfstate.tflock</code>, a file in your S3 bucket, not a DynamoDB record. This confirms that locking is now handled entirely by S3.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/e9abb703-af6e-429c-83bb-2ea2dac43a3a.png" alt="Two terminals showing concurrent terraform plan commands, the second one displays a lock error confirming S3 native locking is working" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-6-clean-up-the-dynamodb-table">Step 6: Clean Up the DynamoDB Table</h3>
<p>Once you've confirmed the migration is working correctly and your team has run at least one successful <code>plan</code> and <code>apply</code> cycle using the new backend, you can remove the DynamoDB table.</p>
<p><strong>Wait at least 24-48 hours before deleting the DynamoDB table</strong> if you have CI/CD pipelines or multiple team members. This gives time to catch any pipeline that wasn't updated with the new backend configuration.</p>
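<p>If you want extra confidence before deleting, one optional check is to look for recent write activity on the lock table in CloudWatch – a sign that some pipeline is still pointed at the old backend. This is an illustrative command (the <code>date -d</code> syntax is GNU date; adjust it on macOS):</p>
<pre><code class="language-shell"># Sum of write activity on the lock table over the last 48 hours
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ConsumedWriteCapacityUnits \
  --dimensions Name=TableName,Value=your-dynamodb-lock-table \
  --start-time "$(date -u -d '48 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Sum
</code></pre>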
<p>When you're ready, delete the DynamoDB table:</p>
<pre><code class="language-shell">aws dynamodb delete-table \
  --table-name your-dynamodb-lock-table
</code></pre>
<p>Confirm the deletion:</p>
<pre><code class="language-shell">aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">An error occurred (ResourceNotFoundException) when calling the DescribeTable operation:
Requested resource not found
</code></pre>
<p>This error confirms that the table is gone. The migration is complete.</p>
<p>If you provisioned the DynamoDB table using Terraform (which is the recommended pattern), remove the resource from your Terraform configuration and run <code>terraform apply</code> to destroy it via Terraform rather than the CLI directly. This keeps your state clean:</p>
<pre><code class="language-hcl"># Remove this entire block from your Terraform configuration:
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
 
  attribute {
    name = "LockID"
    type = "S"
  }
}
</code></pre>
<p>After removing the block, run:</p>
<pre><code class="language-bash">terraform apply
</code></pre>
<p>Terraform will detect that the DynamoDB table resource has been removed from configuration and will destroy the table.</p>
<h2 id="heading-how-to-verify-that-locking-is-working">How to Verify That Locking Is Working</h2>
<p>After completing either the fresh setup or the migration, use this procedure to independently verify that locking is functioning correctly.</p>
<h3 id="heading-method-1-observe-the-lock-file-during-an-operation">Method 1: Observe the lock file during an operation</h3>
<p>In one terminal, start a long-running plan against a configuration with many resources:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>While it's running, in a second terminal, check for the lock file in S3:</p>
<pre><code class="language-shell">aws s3 ls s3://your-bucket/path/to/ | grep tflock
</code></pre>
<p>You should see a file like:</p>
<pre><code class="language-plaintext">2026-05-06 14:22:01        512 terraform.tfstate.tflock
</code></pre>
<p>After the plan completes, run the same command again. The <code>.tflock</code> file should be gone.</p>
<h3 id="heading-method-2-read-the-lock-file-contents">Method 2: Read the lock file contents</h3>
<p>While a plan is running, download and read the lock file to see its contents:</p>
<pre><code class="language-shell">aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/current.lock &amp;&amp; cat /tmp/current.lock
</code></pre>
<p>Expected output (formatted for readability):</p>
<pre><code class="language-json">{
  "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "Operation": "OperationTypePlan",
  "Info": "",
  "Who": "tolani@dev-machine",
  "Version": "1.10.0",
  "Created": "2026-05-06T14:22:01.123456789Z",
  "Path": "your-bucket/path/to/terraform.tfstate"
}
</code></pre>
<p>This is the same lock information that Terraform displays when a lock is held. It's now a JSON file in S3 rather than a record in DynamoDB.</p>
<h2 id="heading-how-to-handle-a-stuck-lock">How to Handle a Stuck Lock</h2>
<p>With the DynamoDB backend, resolving a stuck lock meant deleting a record from the DynamoDB table. With S3 native locking, it means deleting the <code>.tflock</code> file from S3.</p>
<p>A lock can get stuck if:</p>
<ul>
<li><p>A <code>terraform apply</code> or <code>plan</code> process was killed mid-execution</p>
</li>
<li><p>A CI/CD pipeline runner crashed during a Terraform operation</p>
</li>
<li><p>A network interruption prevented the lock release from completing</p>
</li>
</ul>
<p>Here's how you can check for a stuck lock:</p>
<pre><code class="language-shell">aws s3 ls s3://your-bucket/path/to/ | grep tflock
</code></pre>
<p>If a <code>.tflock</code> file exists and no Terraform operation is currently running, it is a stuck lock.</p>
<p>You can also read the lock to understand who held it:</p>
<pre><code class="language-shell">aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/stuck.lock &amp;&amp; cat /tmp/stuck.lock
</code></pre>
<p>This tells you who (<code>Who</code> field) was running the operation, what operation it was (<code>Operation</code> field), and when it was acquired (<code>Created</code> field).</p>
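<p>If you have <code>jq</code> installed, you can pull just those fields out in one step by streaming the object to stdout instead of saving it:</p>
<pre><code class="language-shell">aws s3 cp s3://your-bucket/path/to/terraform.tfstate.tflock - \
  | jq '{ID, Who, Operation, Created}'
</code></pre>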
<p>And you can force-unlock using Terraform like this:</p>
<pre><code class="language-shell">terraform force-unlock LOCK-ID
</code></pre>
<p>Replace <code>LOCK-ID</code> with the <code>ID</code> value from the lock file contents. For example:</p>
<pre><code class="language-shell">terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890
</code></pre>
<p>Terraform will confirm:</p>
<pre><code class="language-plaintext">Do you really want to force-unlock?
  Terraform will remove the lock on the remote state.
  This will allow local Terraform commands to modify this state, even though it
  may be still be in use. Only 'yes' will be accepted to confirm.
 
  Enter a value: yes
 
Terraform state has been successfully unlocked!
</code></pre>
<p>An alternative is to delete the lock file directly via CLI. If <code>terraform force-unlock</code> doesn't work (for example, because you are running in a CI environment without Terraform available), delete the lock file directly:</p>
<pre><code class="language-shell">aws s3 rm s3://your-bucket/path/to/terraform.tfstate.tflock
</code></pre>
<p><strong>Only delete the lock file if you are certain no Terraform operation is currently running.</strong> Deleting a lock that is actively held by a running operation will allow a second concurrent operation to start, which is exactly the race condition locking is designed to prevent.</p>
<h2 id="heading-rollback-plan-if-something-goes-wrong">Rollback Plan: If Something Goes Wrong</h2>
<p>If you encounter problems after migrating, you can roll back to the S3 + DynamoDB setup with these steps.</p>
<p><strong>Step 1: Stop all Terraform operations</strong> in your team and CI/CD pipelines.</p>
<p><strong>Step 2: Recreate the DynamoDB table</strong> if you already deleted it:</p>
<pre><code class="language-shell">aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
</code></pre>
<p><strong>Step 3: Revert</strong> <code>backend.tf</code> to the previous configuration:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"
    key            = "path/to/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # restored
    # Remove: use_lockfile = true
  }
}
</code></pre>
<p><strong>Step 4: Reinitialize:</strong></p>
<pre><code class="language-shell">terraform init -reconfigure
</code></pre>
<p><strong>Step 5: Verify:</strong></p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>The state file hasn't moved, so there's no data loss during a rollback. The only change is which locking mechanism Terraform uses.</p>
<p><strong>Note:</strong> Object Lock being enabled on the S3 bucket doesn't prevent the rollback. Object Lock and DynamoDB locking can coexist; Object Lock simply adds a capability to the bucket. Using <code>dynamodb_table</code> in your backend config tells Terraform to use DynamoDB regardless of whether Object Lock is enabled on the bucket.</p>
<h2 id="heading-security-best-practices-for-your-state-bucket">Security Best Practices for Your State Bucket</h2>
<p>Migrating to S3 native locking is a good opportunity to review the overall security configuration of your state bucket. Here are the practices every production Terraform state bucket should implement:</p>
<h3 id="heading-enable-versioning-required">Enable Versioning (Required)</h3>
<p>Versioning is a hard requirement for S3 native locking to work safely. It ensures that if a state file is accidentally overwritten or corrupted, you can restore a previous version.</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-state-bucket \
  --versioning-configuration Status=Enabled
</code></pre>
<h3 id="heading-block-all-public-access-non-negotiable">Block All Public Access (Non-Negotiable)</h3>
<p>Your state file contains resource ARNs, IP addresses, and may contain sensitive values passed through Terraform variables. It must never be publicly accessible.</p>
<pre><code class="language-shell">aws s3api put-public-access-block \
  --bucket your-state-bucket \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
</code></pre>
<h3 id="heading-enable-server-side-encryption">Enable Server-Side Encryption</h3>
<p>Always encrypt state files at rest. AES256 is the minimum. If your organization requires KMS key management:</p>
<pre><code class="language-shell">aws s3api put-bucket-encryption \
  --bucket your-state-bucket \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/your-kms-key-id"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
</code></pre>
<h3 id="heading-apply-least-privilege-iam-permissions">Apply Least-Privilege IAM Permissions</h3>
<p>The role or user that Terraform uses to access the state bucket should have only the permissions it needs. Here's a minimal IAM policy for S3 native locking:</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TerraformStateAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-state-bucket",
        "arn:aws:s3:::your-state-bucket/*"
      ]
    },
    {
      "Sid": "TerraformStateLocking",
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectLegalHold",
        "s3:PutObjectLegalHold",
        "s3:GetObjectRetention",
        "s3:PutObjectRetention"
      ],
      "Resource": "arn:aws:s3:::your-state-bucket/*.tflock"
    }
  ]
}
</code></pre>
<p>Notice what is absent: there are no DynamoDB permissions. This is a cleaner, smaller permission set than the old approach required.</p>
<h3 id="heading-enable-access-logging">Enable Access Logging</h3>
<p>Log all access to your state bucket in CloudTrail or S3 server access logs. This gives you an audit trail of every time state was read, written, or locked:</p>
<pre><code class="language-shell">aws s3api put-bucket-logging \
  --bucket your-state-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "your-logging-bucket",
      "TargetPrefix": "terraform-state-access/"
    }
  }'
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>AWS S3 native state locking removes the need for a DynamoDB table from your Terraform backend setup. The result is simpler infrastructure, a smaller IAM permission surface, and one fewer service to provision, monitor, and pay for across every environment your team manages.</p>
<p>Here's a summary of what you accomplished:</p>
<ul>
<li><p>Understood what state locking is and why it's required for safe Terraform operations</p>
</li>
<li><p>Compared S3 native locking to the existing S3 + DynamoDB approach</p>
</li>
<li><p>Set up a fresh Terraform backend using S3 native locking with correct bucket configuration</p>
</li>
<li><p>Migrated an existing backend from S3 + DynamoDB to S3 native locking safely</p>
</li>
<li><p>Learned how to verify locking, handle stuck locks, and roll back if needed</p>
</li>
<li><p>Applied security best practices to the state bucket</p>
</li>
</ul>
<p>This pattern – using S3 native locking – is the recommended approach for all new Terraform projects on AWS going forward. If you're managing a large estate with multiple Terraform backends, consider automating the migration using a script or Terraform module that applies the pattern across all your state buckets.</p>
<p><em>If you are building or optimizing cloud infrastructure for a startup and want a complete reference for production-ready Terraform modules, CI/CD pipeline patterns, and infrastructure runbooks, check out</em> <a href="https://coachli.co/tolani-akintayo/PR-H4oQS">The Startup DevOps Field Guide</a><em>. It covers the full lifecycle of AWS infrastructure from initial setup to production reliability.</em></p>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a href="https://developer.hashicorp.com/terraform/language/backend/s3#use_lockfile">HashiCorp - S3 Backend Configuration: use_lockfile</a></p>
</li>
<li><p><a href="https://github.com/hashicorp/terraform/releases/tag/v1.10.0">HashiCorp: Terraform 1.10 Release Notes</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html">AWS Docs: S3 Object Lock Overview</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectLockConfiguration.html">AWS Docs: PutObjectLockConfiguration API</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-requests.html">AWS Docs: S3 Conditional Writes</a></p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/language/state/locking">HashiCorp: Backend State Locking</a></p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/cli/commands/force-unlock">HashiCorp: terraform force-unlock Command</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html">AWS Docs: Enabling S3 Versioning</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html">AWS Docs: S3 Server-Side Encryption</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Land Your First Cloud or DevOps Role: What Hiring Managers Actually Look For ]]>
                </title>
                <description>
                    <![CDATA[ You've completed three AWS courses. You have notes from a dozen Docker tutorials. You know what Kubernetes is, what CI/CD means, and you can explain Infrastructure as Code without hesitating. And yet  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-land-your-first-cloud-or-devops-role-what-hiring-managers-actually-look-for/</link>
                <guid isPermaLink="false">69f3683c909e64ad07e3b0fc</guid>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Career ]]>
                    </category>
                
                    <category>
                        <![CDATA[ jobs ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tolani Akintayo ]]>
                </dc:creator>
                <pubDate>Thu, 30 Apr 2026 14:33:32 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/374e807b-a67f-4f04-a639-dfa230b0ba5f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You've completed three AWS courses. You have notes from a dozen Docker tutorials. You know what Kubernetes is, what CI/CD means, and you can explain Infrastructure as Code without hesitating.</p>
<p>And yet the applications go out, and nothing comes back.</p>
<p>This is one of the most frustrating experiences in tech. You're genuinely learning, genuinely putting in the time, and you have nothing to show for it in terms of results. You start to wonder if the market is too competitive, if you need one more certification, or if there's some hidden door everyone else found that you're missing.</p>
<p>The truth is simpler and more actionable than any of that: <strong>hiring managers can't see your YouTube watch history. They can see your GitHub.</strong> Most beginners optimize for learning. Hired candidates optimize for proof.</p>
<p>In this guide, you'll get an honest breakdown of the nine factors hiring managers actually evaluate when they look at a junior cloud or DevOps candidate and a concrete 90-day plan to address each one. By the end, you'll know exactly where you stand and exactly what to do next.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-the-three-patterns-that-keep-beginners-stuck">The Three Patterns That Keep Beginners Stuck</a></p>
<ul>
<li><p><a href="#heading-pattern-1-the-tutorial-loop">Pattern 1: The Tutorial Loop</a></p>
</li>
<li><p><a href="#heading-pattern-2--the-theorypractice-gap">Pattern 2: The Theory-Practice Gap</a></p>
</li>
<li><p><a href="#pattern-3-silent-learning">Pattern 3: Silent Learning</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-what-hiring-managers-are-actually-evaluating">What Hiring Managers Are Actually Evaluating</a></p>
</li>
<li><p><a href="#heading-factor-1-proof-of-work-the-non-negotiable">Factor 1: Proof of Work (The Non-Negotiable)</a></p>
<ul>
<li><a href="#heading-the-three-projects-that-cover-everything">The Three Projects That Cover Everything</a></li>
</ul>
</li>
<li><p><a href="#heading-factor-2-system-level-thinking">Factor 2: System-Level Thinking</a></p>
</li>
<li><p><a href="#heading-factor-3-software-engineering-fundamentals">Factor 3: Software Engineering Fundamentals</a></p>
</li>
<li><p><a href="#heading-factor-4-communication-skills">Factor 4: Communication Skills</a></p>
</li>
<li><p><a href="#heading-factor-5-consistency-over-intensity">Factor 5: Consistency Over Intensity</a></p>
</li>
<li><p><a href="#heading-factor-6-networking-and-visibility">Factor 6: Networking and Visibility</a></p>
</li>
<li><p><a href="#heading-factor-7-ownership-mindset">Factor 7: Ownership Mindset</a></p>
</li>
<li><p><a href="#heading-factor-8--business-awareness">Factor 8: Business Awareness</a></p>
</li>
<li><p><a href="#heading-factor-9-learning-agility">Factor 9: Learning Agility</a></p>
</li>
<li><p><a href="#heading-your-90-day-action-plan">Your 90-Day Action Plan</a></p>
</li>
<li><p><a href="#heading-honest-self-assessment-where-do-you-stand">Honest Self-Assessment: Where Do You Stand?</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-references-and-recommended-resources">References and Recommended Resources</a></p>
</li>
</ul>
<h2 id="heading-the-three-patterns-that-keep-beginners-stuck">The Three Patterns That Keep Beginners Stuck</h2>
<h3 id="heading-pattern-1-the-tutorial-loop">Pattern 1: The Tutorial Loop</h3>
<p>Week 1: You watch eight hours of Docker content. Week 2: You start an AWS course and get 70% through. Week 3: A Kubernetes series looks interesting, so you start that instead. Week 4: You open LinkedIn and wonder why you're not getting callbacks.</p>
<p>Watching tutorials feels like progress. It's comfortable, passive, and has no failure state. Nothing breaks. Nothing goes wrong.</p>
<p>The problem is that it produces nothing a hiring manager can evaluate. Courses and certifications tell an employer what you've been exposed to. Your GitHub tells them what you can actually do.</p>
<h3 id="heading-pattern-2-the-theory-practice-gap">Pattern 2: The Theory-Practice Gap</h3>
<p>You can explain CI/CD fluently. You've read the Kubernetes documentation. You understand the conceptual difference between a container and a virtual machine.</p>
<p>But you've never taken a simple application, containerized it, connected it to a pipeline, and deployed it to a cloud server with a real URL that someone can visit.</p>
<p>In an interview, "I understand how it works" and "I have built this and here is the link" are not equivalent answers. Hiring managers hear the first version from hundreds of candidates. The second version gets callbacks.</p>
<h3 id="heading-pattern-3-silent-learning">Pattern 3: Silent Learning</h3>
<p>This one is perhaps the most painful pattern because the learning is real. You're putting in the work every day but nobody knows. No GitHub activity. No LinkedIn posts. No community presence. Just cold applications sent from job boards to ATS systems that filter you out before a human ever sees your name.</p>
<p>The hard truth: people get hired through people. A hiring manager who has seen your LinkedIn post about a problem you solved is significantly more likely to give your résumé serious attention than a stranger who applied through a portal.</p>
<h2 id="heading-what-hiring-managers-are-actually-evaluating">What Hiring Managers Are Actually Evaluating</h2>
<p>I've grouped the nine factors that follow into three buckets: <strong>Mindset</strong>, <strong>Execution</strong>, and <strong>Visibility</strong>. The order matters: mindset shapes how you execute, and execution is what powers visibility.</p>
<table>
<thead>
<tr>
<th>Bucket</th>
<th>Covers</th>
<th>Factors</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Mindset</strong></td>
<td>How you think about problems and your career</td>
<td>Factors 2, 7, 8, 9</td>
</tr>
<tr>
<td><strong>Execution</strong></td>
<td>What you actually build and demonstrate</td>
<td>Factors 1, 3</td>
</tr>
<tr>
<td><strong>Visibility</strong></td>
<td>Whether the right people know you exist</td>
<td>Factors 4, 5, 6</td>
</tr>
</tbody></table>
<p>Let's go through each one.</p>
<h2 id="heading-factor-1-proof-of-work-the-non-negotiable">Factor 1: Proof of Work (The Non-Negotiable)</h2>
<p>If there's one thing to take from this entire article, it's this: <strong>no portfolio means no serious consideration.</strong> The most technically capable candidate in the applicant pool is invisible without proof of work.</p>
<p>This isn't about impressing anyone with complexity. It's about demonstrating that you can take a system from zero to deployed, documented, and working.</p>
<p>Here's the checklist every portfolio project should meet before you consider it done:</p>
<ul>
<li><p><strong>It's deployed</strong>: there's a real URL you can share, not "it works on my machine"</p>
</li>
<li><p><strong>It has a CI/CD pipeline</strong>: code changes are automatically tested and deployed</p>
</li>
<li><p><strong>Infrastructure is defined as code</strong>: not manually clicked together in the AWS console</p>
</li>
<li><p><strong>It has monitoring and alerting</strong>: you know when it breaks before users tell you</p>
</li>
<li><p><strong>It's documented</strong>: a README explains what it does, how to run it, and how it works</p>
</li>
<li><p><strong>It's on GitHub publicly</strong>: with real commit history showing iterative work</p>
</li>
</ul>
<p>If your project meets all six criteria, you have proof of work. If it meets four of six, you have a project in progress. Finish it before you start applying.</p>
<h3 id="heading-the-three-projects-that-cover-everything">The Three Projects That Cover Everything</h3>
<p>You don't need ten projects. You need two to three projects that together demonstrate the full range of DevOps skills.</p>
<h4 id="heading-project-1-the-full-stack-deploy-pipeline">Project 1 : The Full-Stack Deploy Pipeline</h4>
<p>This is the foundational DevOps project every beginner should build first.</p>
<p>Take any simple web application – a Python Flask app, a Node.js API, or even a static site. Containerize it with Docker. Write a CI/CD pipeline that runs tests, builds the Docker image, and deploys to a cloud server automatically on every push to the main branch. You can also set up Nginx as a reverse proxy and add an uptime monitor (UptimeRobot has a free tier).</p>
<p>Tools: GitHub Actions, Docker, AWS EC2 or <a href="http://Render.com">Render.com</a>, Nginx.</p>
<p>Why it matters to a hiring manager: it proves you can automate a full deployment workflow end-to-end. The hiring manager can visit your URL, see it running, and inspect your pipeline history.</p>
<p>This single project puts you ahead of most applicants who only have course completion screenshots.</p>
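<p>If it helps to picture what the pipeline automates, the deploy stage usually boils down to a handful of commands like these. This is a rough sketch – the registry, image name, and server address are placeholders you'd replace with your own:</p>
<pre><code class="language-shell"># Build the image and push it to a container registry
docker build -t registry.example.com/myapp:latest .
docker push registry.example.com/myapp:latest

# On the server: pull the new image and replace the running container
ssh deploy@your-server.example.com '
  docker pull registry.example.com/myapp:latest
  docker rm -f myapp || true
  docker run -d --name myapp -p 127.0.0.1:3000:3000 registry.example.com/myapp:latest
'
</code></pre>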
<h4 id="heading-project-2-infrastructure-as-code-with-terraform">Project 2: Infrastructure as Code with Terraform</h4>
<p>Write Terraform code that provisions a complete environment: a VPC, public and private subnets, an EC2 instance with properly scoped security group rules, and an S3 bucket for remote state. Destroy it and recreate it from scratch to prove the code actually works. Add a GitHub Actions workflow that runs <code>terraform plan</code> on pull requests and <code>terraform apply</code> on merge to main.</p>
<p>Tools: Terraform, AWS (or Azure/GCP), GitHub Actions.</p>
<p>Why it matters: Infrastructure as Code with Terraform is a required skill at almost every company running cloud infrastructure. Showing you can write, version-control, and automate Terraform demonstrates a core professional competency.</p>
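<p>The destroy-and-recreate proof is just a short command cycle you can run once the code is written (shown with auto-approve for brevity – only do that in a sandbox account):</p>
<pre><code class="language-shell">terraform init
terraform apply -auto-approve     # build the environment
terraform destroy -auto-approve   # tear it all down
terraform apply -auto-approve     # prove it rebuilds cleanly from code alone
</code></pre>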
<h4 id="heading-project-3-monitoring-and-observability-stack">Project 3: Monitoring and Observability Stack</h4>
<p>Deploy a monitoring stack using Docker Compose: Prometheus scraping metrics from your application and the host, Grafana dashboards showing CPU, memory, request rates, and error rates, and Alertmanager configured to send alerts to Slack or email when thresholds are crossed. Connect this to your Project 1 application so the pipeline deploys and the monitoring watches it.</p>
<p>Tools: Prometheus, Grafana, Alertmanager, Node Exporter, Docker Compose.</p>
<p>Why it matters: most beginner portfolios have zero observability work. This project immediately signals that you understand production engineering, not just deployment. Any senior DevOps engineer or SRE reviewing your application will notice it and it will set you apart.</p>
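<p>Once the stack is defined, bringing it up and confirming each piece is healthy is a quick sanity test (assuming the default ports in your Compose file):</p>
<pre><code class="language-shell">docker compose up -d

# Each component exposes a simple health endpoint on its default port
curl -s http://localhost:9090/-/healthy      # Prometheus
curl -s http://localhost:9093/-/healthy      # Alertmanager
curl -s http://localhost:3000/api/health     # Grafana
</code></pre>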
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/da9e25be-9b59-48c8-9cf0-9cfdb050c277.png" alt="GitHub profile showing three pinned DevOps portfolio repositories with descriptive names " style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-factor-2-system-level-thinking">Factor 2: System-Level Thinking</h2>
<p>This is the mindset that separates a DevOps engineer from someone who just knows a collection of tools. System-level thinking means you can see the whole picture, not just the part you happen to be working on at any given moment.</p>
<p>Here's the mental test hiring managers are running throughout your interview: <em>can you trace a user request from the moment they click a button to the moment they see a response, and explain what happens at every layer in between?</em></p>
<p>Here's the full journey of a web request, the map of modern infrastructure every DevOps engineer needs to understand:</p>
<table>
<thead>
<tr>
<th>Step</th>
<th>Layer</th>
<th>What's happening and what can go wrong</th>
</tr>
</thead>
<tbody><tr>
<td>1</td>
<td>User's Browser</td>
<td>The user types a URL. The browser needs to find the server.</td>
</tr>
<tr>
<td>2</td>
<td>DNS Resolution</td>
<td>The domain is translated into an IP address. DNS misconfigurations mean users can't reach you at all.</td>
</tr>
<tr>
<td>3</td>
<td>CDN / Edge Network</td>
<td>Traffic hits a CDN (Cloudflare, CloudFront) first. Static assets are served from the nearest edge. SSL terminates here.</td>
</tr>
<tr>
<td>4</td>
<td>Load Balancer</td>
<td>Routes the request to an available application server. If all targets are unhealthy, users get 502/503 errors.</td>
</tr>
<tr>
<td>5</td>
<td>Compute / Application Servers</td>
<td>The application code runs here in containers, on VMs, or in serverless functions. Business logic executes.</td>
</tr>
<tr>
<td>6</td>
<td>Database Layer</td>
<td>The application reads from or writes to a database. Slow queries or a full disk can cause slow responses or outages.</td>
</tr>
<tr>
<td>7</td>
<td>Cache Layer</td>
<td>Redis or Memcached caches frequently-read data. Cache misses cause extra database load.</td>
</tr>
<tr>
<td>8</td>
<td>Response Returns</td>
<td>The response travels back through the stack and the user sees the result.</td>
</tr>
<tr>
<td>9</td>
<td>Logging and Monitoring</td>
<td>Every step above should emit logs and metrics. Good monitoring alerts you before users notice a problem.</td>
</tr>
</tbody></table>
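<p>You can observe several of these layers yourself from a terminal. A small sketch using <code>dig</code> and <code>curl</code>, with example.com standing in for any site you like:</p>
<pre><code class="language-bash"># Step 2: DNS resolution – which IP address does the domain resolve to?
dig +short example.com

# Steps 3–8: timing breakdown of a full HTTPS request
curl -s -o /dev/null \
  -w 'dns: %{time_namelookup}s  connect: %{time_connect}s  tls: %{time_appconnect}s  first byte: %{time_starttransfer}s  total: %{time_total}s\n' \
  https://example.com
</code></pre>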
<p>Why does this matter in an interview? Consider two candidates answering the question: <em>"Tell me about a time something broke in production."</em></p>
<p>Candidate A: "The website was down."</p>
<p>Candidate B: "The load balancer health checks were failing because the app containers were running out of memory due to a memory leak introduced in the previous deploy. We identified it via memory metrics in Grafana, rolled back, and added a memory limit to the container spec."</p>
<p>Same incident. Completely different answer. System-level thinking is what makes the difference.</p>
<h2 id="heading-factor-3-software-engineering-fundamentals">Factor 3: Software Engineering Fundamentals</h2>
<p>Many beginners rush to learn Kubernetes and Terraform before mastering the foundations that make those tools make sense. This creates a knowledge structure that looks impressive but has no solid base underneath it.</p>
<p>Here are the fundamentals that actually matter and what to do if you have a gap in any of them:</p>
<h3 id="heading-1-linux-and-the-command-line">1. Linux and the Command Line</h3>
<p>DevOps tools run on Linux. CI/CD jobs run in Linux containers. SSH is the front door to every server. If the terminal makes you uncomfortable, you're not ready for a production environment. This is not a preference; it's a prerequisite.</p>
<p>Start with daily Linux practice. The <a href="https://training.linuxfoundation.org/training/introduction-to-linux/">Linux Foundation's free introductory materials</a> are a solid starting point. And here's a <a href="https://www.freecodecamp.org/news/learn-the-basics-of-the-linux-operating-system/">solid freeCodeCamp course on Linux basics.</a></p>
<h3 id="heading-2-networking-fundamentals">2. Networking Fundamentals</h3>
<p>DNS, TCP/IP, HTTP/HTTPS, load balancing, firewalls, VPCs, subnets – these concepts appear in every cloud architecture. Without them, Terraform and Kubernetes are magic boxes. Study the request flow in Factor 2 above until you can draw it from memory without looking.</p>
<p>Here's a <a href="https://www.freecodecamp.org/news/computer-networking-fundamentals/">computer networking fundamentals course</a> to get you started.</p>
<h3 id="heading-3-scripting-bash-and-python">3. Scripting: Bash and Python</h3>
<p>CI/CD pipelines are scripts. Automation is scripting. If you cannot write a Bash script that reads a config file, calls an API, and handles errors gracefully, your automation ceiling is very low. Fix this by writing one small, useful script every week. Solve real problems with code.</p>
<p>Here's a helpful tutorial on <a href="https://www.freecodecamp.org/news/shell-scripting-crash-course-how-to-write-bash-scripts-in-linux/">shell scripting in Linux for beginners</a>.</p>
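<p>To make that concrete, here's a small hypothetical example of the kind of script worth practicing: it reads a config file, calls an API with <code>curl</code>, and fails loudly instead of silently. The config path, variable name, and URL are placeholders:</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# check_service.sh – read a config file, call a health endpoint, report clearly.
set -euo pipefail

CONFIG="${1:-./service.conf}"    # the config file is expected to contain: URL=https://example.com/health

if [[ ! -f "$CONFIG" ]]; then
  echo "ERROR: config file '$CONFIG' not found" &gt;&amp;2
  exit 1
fi

source "$CONFIG"                 # loads URL=... into the environment

status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$URL") || {
  echo "ERROR: request to $URL failed" &gt;&amp;2
  exit 1
}

if [[ "$status" == "200" ]]; then
  echo "OK: $URL returned 200"
else
  echo "WARN: $URL returned $status" &gt;&amp;2
  exit 1
fi
</code></pre>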
<h3 id="heading-4-git-and-version-control">4. Git and Version Control</h3>
<p>Not just <code>git commit</code> and <code>git push</code>. Branching strategies, pull requests, merge conflicts, rebasing, and tagging releases are all standard practice in professional DevOps teams. Use Git for everything including your personal learning notes. Practice branching workflows intentionally.</p>
<p>Here's a <a href="https://www.freecodecamp.org/news/gitting-things-done-book/">full book on all the Git basics</a> (and some more advanced topics, too) you need to know.</p>
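<p>If you want a concrete drill, this is the basic branching loop to practice until it's muscle memory (the repository, branch, and tag names are just examples):</p>
<pre><code class="language-bash">git checkout -b feature/add-healthcheck   # work on a branch, never directly on main
# ...edit files...
git add -p                                # stage changes hunk by hunk and review as you go
git commit -m "Add /health endpoint and container healthcheck"
git fetch origin
git rebase origin/main                    # replay your work on top of the latest main
git push -u origin feature/add-healthcheck
# open a pull request, get a review, merge

# After the merge, tag the release
git tag -a v0.2.0 -m "First monitored release"
git push origin v0.2.0
</code></pre>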
<h3 id="heading-5-docker-and-containers">5. Docker and Containers</h3>
<p>Docker is the universal packaging format for modern software. Understanding layers, multi-stage builds, volumes, networking, and container security is the floor, not the ceiling. Every project you build should be containerized. Write your Dockerfiles by hand instead of copying them.</p>
<p>Here's a course on <a href="https://www.freecodecamp.org/news/learn-docker-and-kubernetes-hands-on-course/">Docker and Kubernetes</a> to get you started.</p>
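<p>A quick way to see what "layers" actually means, assuming Docker is installed locally (nginx is used here only as a stand-in for your own image):</p>
<pre><code class="language-bash">docker pull nginx:alpine        # any small public image works for this
docker history nginx:alpine     # each row is a layer: the instruction that created it and its size

# Count the filesystem layers in the image
docker image inspect nginx:alpine --format '{{json .RootFS.Layers}}' | jq length

# When you try a multi-stage build yourself, compare image sizes before and after
docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'
</code></pre>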
<h2 id="heading-factor-4-communication-skills">Factor 4: Communication Skills</h2>
<p>Technical skills set your ceiling. Communication skills determine how fast you reach it. This is the most consistently underestimated factor among beginner DevOps candidates.</p>
<p>Two candidates with identical technical ability will have very different career outcomes based on how clearly they communicate. Here's what that looks like in practice:</p>
<p><strong>Architecture explanation</strong>: Can you describe how your project works to someone who has never seen it? Can you draw the architecture on a whiteboard and walk someone through your design decisions and the trade-offs you made?</p>
<p><strong>Trade-off articulation</strong>: <em>"I chose X over Y because..."</em> is one of the most powerful phrases in a technical interview. It shows you understand that every decision has pros and cons and you made a conscious, reasoned choice rather than just copying a tutorial.</p>
<p><strong>Written documentation</strong>: A README is your project's cover letter. A well-written README with clear setup instructions, an architecture diagram, and documented decisions demonstrates engineering maturity that most beginners don't show.</p>
<p>Here's a quick test: open your most recent project on GitHub and read the README as if you're a hiring manager seeing it for the first time. Does it answer these questions?</p>
<ul>
<li><p>What does this project do, and why did you build it?</p>
</li>
<li><p>What does the architecture look like?</p>
</li>
<li><p>How do I run this locally, and how do I deploy it?</p>
</li>
<li><p>What decisions did you make, and why?</p>
</li>
<li><p>What would you improve if you continued working on it?</p>
</li>
</ul>
<p>If you answered "no" to more than two of those, rewrite the README before applying anywhere. This single action will meaningfully improve your response rate.</p>
<p><strong>Interview communication</strong>: Hiring managers assess communication throughout the entire interview, not just in your answers. Thinking out loud, structuring your responses, and admitting uncertainty honestly are all evaluated.</p>
<h2 id="heading-factor-5-consistency-over-intensity">Factor 5: Consistency Over Intensity</h2>
<p>Hiring managers are pattern recognition machines. They look at your GitHub contribution graph, your LinkedIn activity, and your learning trajectory and form an impression before reading a single word on your résumé.</p>
<p>A binge-learning approach – 10-hour weekends followed by weeks of nothing – produces a GitHub graph that tells the wrong story. Thirty minutes of focused daily practice for six months beats a monthly 10-hour binge. At the six-month mark, the daily practitioner has 90 hours of focused work. The binge learner has 60, with significantly worse retention.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/1315bb8d-9e4e-4f84-836f-4e02b83c75ce.webp" alt="GitHub contribution graph showing 12 months of consistent activity with regular commits across the year" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Here's how to build consistency in practice:</p>
<ul>
<li><p>Pick a time slot in your day that you will protect. Thirty minutes is enough to make progress.</p>
</li>
<li><p>Define a four-week learning sprint with a specific goal, not "learn Terraform" but "build and deploy a VPC with Terraform and write the README."</p>
</li>
<li><p>Keep a private learning journal: date, what you studied, what you built, what confused you.</p>
</li>
<li><p>When the sprint ends, evaluate what you built and plan the next one.</p>
</li>
</ul>
<p>What to avoid: declaring publicly on LinkedIn that you're "grinding DevOps full time" and then disappearing for six weeks. The absence is noticed. Only commit publicly to what you will actually sustain.</p>
<h2 id="heading-factor-6-networking-and-visibility">Factor 6: Networking and Visibility</h2>
<p>This is the factor beginners resist the most, and the one that makes the biggest practical difference in time-to-hire.</p>
<p>Most DevOps jobs are filled through people: referrals, community connections, LinkedIn conversations. A warm introduction from someone who has seen your work outweighs fifty cold applications every time.</p>
<p>Here are three ways to build visibility without it feeling performative:</p>
<h3 id="heading-community-engagement">Community Engagement</h3>
<p>Join communities where DevOps engineers actually talk: AWS User Groups, local DevOps meetups, DevOps Discord servers, Reddit communities like r/devops and r/kubernetes. You don't need to be the expert. Ask specific questions, answer what you genuinely know, and show up consistently. After three to six months, people will recognize your name.</p>
<h3 id="heading-linkedin-content">LinkedIn Content</h3>
<p>Post once per week about something you learned, built, or got stuck on. Not marketing – documentation. A post that says <em>"This week I configured Prometheus alerting for a Docker Compose stack. Here's what tripped me up and how I solved it"</em> attracts recruiters, leads to conversations, and builds a searchable record of your growth over time.</p>
<h3 id="heading-asking-good-questions-in-public">Asking Good Questions in Public</h3>
<p>When you get stuck and figure it out, write it up. Post the solution in the same community where you asked the question. Answer someone else's version of the same question later. You position yourself as a helpful, engaged learner – exactly the kind of person hiring managers want to hire.</p>
<p>Here's a concrete three-month visibility sprint to follow:</p>
<table>
<thead>
<tr>
<th>Timeframe</th>
<th>Action</th>
</tr>
</thead>
<tbody><tr>
<td>Week 1-2</td>
<td>Update your LinkedIn headline: "Cloud / DevOps Engineer in Training │ Building with AWS, Docker, Terraform". Connect with 20 people: DevOps engineers, recruiters, hiring managers. Add a short personal note when connecting.</td>
</tr>
<tr>
<td>Week 3-4</td>
<td>Write your first LinkedIn post. Document something you built or learned this week. Keep it honest and specific. 150–200 words is enough.</td>
</tr>
<tr>
<td>Month 2</td>
<td>Join one community. Introduce yourself. Answer one question per week.</td>
</tr>
<tr>
<td>Month 3</td>
<td>Post consistently once per week. Engage with others' posts. Start appearing in recruiter searches.</td>
</tr>
</tbody></table>
<p>By month three, recruiters searching for "DevOps" in your location will encounter your activity. Some of the best entry-level DevOps opportunities come from exactly this kind of low-pressure visibility.</p>
<h2 id="heading-factor-7-ownership-mindset">Factor 7: Ownership Mindset</h2>
<p>This factor is less about personality type and more about observable behavior. Hiring managers are looking for evidence that you finish what you start, not just that you start things.</p>
<p>Here's what the contrast looks like:</p>
<table>
<thead>
<tr>
<th>What hiring managers frequently see</th>
<th>What hiring managers want to see</th>
</tr>
</thead>
<tbody><tr>
<td>"I started a Kubernetes project and encountered a lot of issues"</td>
<td>"Here is a complete project. It deploys to AWS, has a CI/CD pipeline, is monitored, and you can access it at this URL right now."</td>
</tr>
<tr>
<td>"I was working through a Terraform course, learnt a lot about XYZ."</td>
<td>"I finished it, documented it, and wrote a post about what I learned."</td>
</tr>
</tbody></table>
<p>Ownership mindset has three components. First, finish things: a complete, simple project is worth ten times more than ten incomplete complex ones. Second, take responsibility without blame when something breaks: ownership means identifying the cause, fixing it, and adding monitoring so it doesn't happen again. Third, self-direct your learning: you don't wait for someone to tell you what to learn next. You see a gap, identify how to close it, and close it. This is what "junior who can work independently" actually means in job descriptions.</p>
<h2 id="heading-factor-8-business-awareness">Factor 8: Business Awareness</h2>
<p>Technical skill gets you in the door. Business awareness keeps you there and accelerates your career.</p>
<p>The core question hiring managers are testing is: <em>can you connect your technical decisions to cost, uptime, and user impact?</em> Infrastructure decisions are business decisions. Cloud costs are typically the second-largest engineering expense at most companies after salaries. A misconfigured auto-scaling group or a forgotten large EC2 instance can burn thousands of dollars overnight.</p>
<p>Here are a few benchmark questions worth being able to answer comfortably:</p>
<ul>
<li><p>If your company has a 99.9% SLA, how many minutes of downtime per month is that? (About 43 minutes.)</p>
</li>
<li><p>If you move workloads from on-demand EC2 instances to Reserved Instances, what's the approximate cost saving? (Around 40–60%.)</p>
</li>
<li><p>If your CI/CD pipeline takes 45 minutes per build and you run 20 builds per day, how much developer wait time does that represent weekly? (Worked out just after this list.)</p>
</li>
</ul>
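<p>The third question is worth working out once so the numbers stick. Here's the back-of-the-envelope arithmetic as a tiny script, using the figures from the questions above:</p>
<pre><code class="language-bash"># 99.9% availability: allowed downtime per 30-day month, in minutes
echo "scale=1; 30*24*60*0.001" | bc    # 43.2 minutes

# Pipeline wait: 45-minute builds, 20 builds/day, 5 working days
echo "45*20*5" | bc                    # 4500 minutes, roughly 75 hours of waiting per week
</code></pre>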
<p>Most junior candidates can't answer these fluently in an interview. Candidates who can answer them stand out immediately – not because the questions are hard, but because so few people bother to connect infrastructure to business.</p>
<p>The simple habit to build: whenever you describe a technical decision in your project documentation or in an interview, add the business dimension. "I configured auto-scaling" becomes "I configured auto-scaling to handle traffic spikes, which eliminated the cost of over-provisioning and reduced our estimated monthly cloud spend by approximately $X."</p>
<h2 id="heading-factor-9-learning-agility">Factor 9: Learning Agility</h2>
<p>Everyone claims to be a fast learner. It's the most overused phrase in technology job applications. Here's how to make it actually mean something.</p>
<p>Saying "I'm a fast learner" in an interview is table stakes. The question is whether you can prove it. Proof sounds like this: <em>"I had never used GitHub Actions before. I needed a CI/CD pipeline for a project I was building. In 48 hours, I had a working pipeline that runs tests, builds a Docker image, and deploys to AWS."</em></p>
<p>What makes that credible: it names a specific tool, a specific timeframe, and a specific outcome. There is a GitHub repository with a commit history and a working pipeline that a hiring manager can actually look at.</p>
<p>Learning agility is not about knowing many tools shallowly. It's about picking up new tools quickly because you deeply understand the underlying concepts. Tool names change every few years. Concepts – networking, automation, observability, reliability – do not.</p>
<p>To build a concrete track record of learning agility: once a month, pick one tool you haven't used. Follow its quick-start guide. Build something small. Document what was difficult. Post about it. This is your learning agility portfolio: visible, dated, and specific.</p>
<h2 id="heading-your-90-day-action-plan">Your 90-Day Action Plan</h2>
<p>Here is a concrete, sequential plan that takes you from where you are now to interview-ready for your first DevOps role.</p>
<h3 id="heading-month-1-build-your-foundation">Month 1: Build Your Foundation</h3>
<p>Focus entirely on Project 1 from the Proof of Work section. Build it completely. Deploy it. Get the live URL. Don't start Project 2 until Project 1 meets all six checklist criteria.</p>
<p>Alongside the build: 30 minutes of Linux and Bash scripting practice daily. This isn't optional; it's the foundation everything else runs on.</p>
<h3 id="heading-month-2-expand-your-execution-and-start-your-visibility">Month 2: Expand Your Execution and Start Your Visibility</h3>
<p>Begin Project 2 (Terraform IaC). Write your first LinkedIn post – it doesn't need to be polished, it needs to be specific. Join one community and introduce yourself.</p>
<h3 id="heading-month-3-complete-the-portfolio-and-document-everything">Month 3: Complete the Portfolio and Document Everything</h3>
<p>Finish all three projects to full checklist standard. Polish every README. Add architecture diagrams. Optimize your GitHub profile, pin your three best repos, write a profile README that describes who you are and what you build, and add links to your live project URLs.</p>
<h3 id="heading-month-4-onward-apply-with-strategy">Month 4 Onward: Apply with Strategy</h3>
<p>Don't start applying before month four. Apply with real proof of work in hand. Target five to ten quality applications per week rather than spraying a hundred. Include your GitHub and your best project's live URL in every application. For roles at companies where you have a community connection, reach out to that person before applying.</p>
<p>Track every application in a spreadsheet: company, role, date applied, status, outcome, notes. After thirty applications, you'll have enough data to see what's working and what isn't.</p>
<p>Here's the full 90-day breakdown:</p>
<table>
<thead>
<tr>
<th>Timeframe</th>
<th>Focus</th>
<th>Milestone</th>
</tr>
</thead>
<tbody><tr>
<td>Week 1-2</td>
<td>Linux fundamentals. Set up GitHub profile. Start Project 1.</td>
<td>Foundation</td>
</tr>
<tr>
<td>Week 3-4</td>
<td>Complete Project 1 CI/CD pipeline. Deploy. Get live URL. Write README.</td>
<td>First Proof of Work</td>
</tr>
<tr>
<td>Month 2</td>
<td>Begin Project 2. First LinkedIn post. Join one community.</td>
<td>Visibility begins</td>
</tr>
<tr>
<td>Month 2-3</td>
<td>Complete Project 2. Scaffold monitoring (Project 3). Post weekly on LinkedIn.</td>
<td>Building momentum</td>
</tr>
<tr>
<td>Month 3</td>
<td>Finish all 3 projects to checklist standard. Polish READMEs and GitHub profile.</td>
<td>Portfolio complete</td>
</tr>
<tr>
<td>Month 4+</td>
<td>Apply strategically. Continue posting and community engagement.</td>
<td>Active job search</td>
</tr>
</tbody></table>
<h2 id="heading-honest-self-assessment-where-do-you-stand">Honest Self-Assessment: Where Do You Stand?</h2>
<p>Go through each statement below. Be completely honest: this is for you, not anyone else.</p>
<table>
<thead>
<tr>
<th>Statement</th>
<th>Action if the answer is No</th>
</tr>
</thead>
<tbody><tr>
<td>I can explain a web request end-to-end (DNS → load balancer → compute → database → logs)</td>
<td>Study Factor 2 until you can draw this from memory</td>
</tr>
<tr>
<td>I have at least one deployed project with a live URL</td>
<td>This is Priority 1. Nothing else matters more right now.</td>
</tr>
<tr>
<td>My best project has a CI/CD pipeline that auto-deploys on push</td>
<td>Add this to your existing project this week</td>
</tr>
<tr>
<td>I have written infrastructure as code (Terraform or CloudFormation)</td>
<td>Project 2 is your next build target</td>
</tr>
<tr>
<td>My projects have READMEs that explain architecture and decisions</td>
<td>Spend one hour today rewriting your README</td>
</tr>
<tr>
<td>I have posted about my learning on LinkedIn in the last 30 days</td>
<td>Post something today, document what you built last week</td>
</tr>
<tr>
<td>I am part of at least one DevOps community</td>
<td>Join r/devops or an AWS Discord server this week</td>
</tr>
<tr>
<td>I can write a Bash script that solves a real automation problem</td>
<td>30 minutes of daily scripting practice for the next 30 days</td>
</tr>
<tr>
<td>I can explain what I built, why I made each decision, and what I'd change</td>
<td>Practice saying this out loud about each project until it's fluent</td>
</tr>
</tbody></table>
<p>Count your "no" answers. Each one is a specific, actionable gap, not a vague sense of being behind. That's the difference between this self-assessment and the anxious feeling of "I'm not ready yet." You're not behind. You just have a prioritized list of what to build next.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Here's what you know now that most beginners still don't:</p>
<p>The gap between you and a DevOps job isn't a gap in certifications, a gap in courses completed, or a gap in the number of tools you've heard about. It's a gap in proof of work, visibility, and the consistency with which you execute.</p>
<p>Hiring managers aren't looking for someone who has watched everything. They're looking for someone who has built something, documented it, deployed it, monitored it, and can clearly explain every decision they made along the way.</p>
<p>The path isn't secret. It's just work. Build two to three complete projects that meet the full checklist. Document everything. Show up consistently in communities and on LinkedIn. Apply with strategy. Iterate based on feedback.</p>
<p>If you want a production-grade reference to support your DevOps journey – complete with real Terraform modules, CI/CD workflow templates, infrastructure runbooks, and platform engineering patterns used in real startup environments – <a href="https://coachli.co/tolani-akintayo/PR-H4oQS">The Startup DevOps Field Guide</a> was built for exactly this stage of your career.</p>
<p>The information gap between you and your first DevOps role is smaller than you think. The execution gap is where the work is. Start today.</p>
<h2 id="heading-references-and-recommended-resources">References and Recommended Resources</h2>
<ul>
<li><p><a href="https://roadmap.sh/devops">roadmap.sh/devops</a>: The community-maintained DevOps learning roadmap. Use this to sequence what you learn next and avoid random jumps between topics.</p>
</li>
<li><p><a href="https://dora.dev">DORA State of DevOps Report</a>: Free annual report on what DevOps practices actually improve software delivery performance. Gives you the vocabulary hiring managers speak.</p>
</li>
<li><p><a href="https://training.linuxfoundation.org/training/introduction-to-linux/">Linux Foundation - Introduction to Linux</a>: Free introductory Linux course. If the terminal still makes you nervous, start here.</p>
</li>
<li><p><a href="https://itrevolution.com/product/the-phoenix-project/">The Phoenix Project</a>: A business novel about DevOps transformation. Teaches core concepts through story. Gives you vocabulary for business-aware conversations.</p>
</li>
<li><p><a href="http://ExplainShell.com">ExplainShell.com</a>: Paste any command you find online and see exactly what every part does. Use this constantly while building your projects.</p>
</li>
<li><p><a href="https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes">GitHub - How to Write a Good README</a>: Official GitHub guidance on repository documentation.</p>
</li>
<li><p><a href="https://prometheus.io/docs/introduction/overview/">Prometheus Documentation</a>: Official docs for the monitoring tool used in Project 3.</p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/tutorials/aws-get-started">Terraform Getting Started - AWS</a>: Official step-by-step guide for Project 2.</p>
</li>
<li><p><a href="https://docs.github.com/en/actions">GitHub Actions Documentation</a>: Complete reference for building CI/CD pipelines in Project 1.</p>
</li>
<li><p><a href="https://www.freecodecamp.org/news/learn-linux-for-beginners-book-basic-to-advanced/">freeCodeCamp - Learn Linux for Beginners</a>: Comprehensive Linux guide available on freeCodeCamp.</p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy a Full-Stack Next.js App on Cloudflare Workers with GitHub Actions CI/CD ]]>
                </title>
                <description>
                    <![CDATA[ I typically build my projects using Next.js 14 (App Router) and Supabase for authentication along with Postgres. The default deployment choice for a Next.js app is usually Vercel, and for good reason: ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-deploy-a-full-stack-next-js-app-on-cloudflare-workers-with-github-actions-ci-cd/</link>
                <guid isPermaLink="false">69f2145e6e0124c05e1a5b6e</guid>
                
                    <category>
                        <![CDATA[ Next.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cloudflare ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GitHub Actions ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Md Tarikul Islam ]]>
                </dc:creator>
                <pubDate>Wed, 29 Apr 2026 14:23:26 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/cbb9e559-baa7-452c-992a-3416041712ad.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>I typically build my projects using Next.js 14 (App Router) and Supabase for authentication along with Postgres. The default deployment choice for a Next.js app is usually Vercel, and for good reason: it provides an excellent developer experience.</p>
<p>But after running the same project on both platforms for about a week, I started exploring Cloudflare Workers as an alternative. I noticed improvements in latency (lower TTFB) and found the free tier to be more flexible for my use case.</p>
<p>Deploying Next.js apps on Cloudflare used to be challenging. Earlier solutions like Cloudflare Pages had limitations with full Next.js features, and tools like <code>next-on-pages</code> often lagged behind the latest releases.</p>
<p>That changed with the introduction of <a href="https://opennext.js.org/cloudflare"><code>@opennextjs/cloudflare</code></a>. It allows you to compile a standard Next.js application into a Cloudflare Worker, supporting features like SSR, ISR, middleware, and the Image component – all without requiring major code changes.</p>
<p>In this guide, I’ll walk you through the exact steps I used to deploy my full-stack Next.js + Supabase application to Cloudflare Workers.</p>
<p>This article is the runbook I wish I had when I started.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-why-choose-cloudflare-workers-over-vercel">Why Choose Cloudflare Workers Over Vercel?</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-the-stack">The Stack</a></p>
</li>
<li><p><a href="#heading-step-1-install-the-cloudflare-adapter">Step 1 — Install the Cloudflare Adapter</a></p>
</li>
<li><p><a href="#heading-step-2-wire-opennext-into-next-dev">Step 2 — Wire OpenNext into next dev</a></p>
</li>
<li><p><a href="#heading-step-3-local-environment-setup-with-devvars">Step 3— Local Environment Setup with .dev.vars</a></p>
</li>
<li><p><a href="#heading-step-4-deploy-your-app-from-your-local-machine">Step 4 — Deploy Your App from Your Local Machine</a></p>
</li>
<li><p><a href="#heading-step-5-push-your-secrets-to-the-worker">Step 5 — Push your secrets to the Worker</a></p>
</li>
<li><p><a href="#heading-step-6-set-up-continuous-deployment-with-github-actions">Step 6 — Set Up Continuous Deployment with GitHub Actions</a></p>
</li>
<li><p><a href="#heading-step-7-updating-the-project-the-daily-workflow">Step 7 — Updating the project (the daily workflow)</a></p>
</li>
<li><p><a href="#heading-final-thoughts">Final thoughts</a></p>
</li>
</ul>
<h2 id="heading-why-choose-cloudflare-workers-over-vercel">Why Choose Cloudflare Workers Over Vercel?</h2>
<p>When deploying a Next.js application, Vercel is often the default choice. It offers a smooth developer experience and tight integration with Next.js.</p>
<p>But Cloudflare Workers provides a compelling alternative, especially when you care about global performance and cost efficiency.</p>
<p>Here’s a high-level comparison (at the time of writing):</p>
<table>
<thead>
<tr>
<th>Concern</th>
<th>Vercel (Hobby)</th>
<th>Cloudflare Workers (Free Tier)</th>
</tr>
</thead>
<tbody><tr>
<td>Requests</td>
<td>Fair usage limits</td>
<td>Millions of requests per day</td>
</tr>
<tr>
<td>Cold starts</td>
<td>~100–300 ms (region-based)</td>
<td>Near-zero (V8 isolates)</td>
</tr>
<tr>
<td>Edge locations</td>
<td>Limited regions for SSR</td>
<td>300+ global edge locations</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>~100 GB/month (soft cap)</td>
<td>Generous / no strict cap on free tier</td>
</tr>
<tr>
<td>Custom domains</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr>
<td>Image optimization</td>
<td>Counts toward usage</td>
<td>Available via <code>IMAGES</code> binding</td>
</tr>
<tr>
<td>Pricing beyond free</td>
<td>Starts at ~$20/month</td>
<td>Low-cost, usage-based pricing</td>
</tr>
</tbody></table>
<h3 id="heading-key-takeaways">Key Takeaways</h3>
<ul>
<li><p><strong>Lower latency globally</strong>: Cloudflare runs your app across hundreds of edge locations, reducing response time for users worldwide.</p>
</li>
<li><p><strong>Minimal cold starts</strong>: Thanks to V8 isolates, functions start almost instantly.</p>
</li>
<li><p><strong>Cost efficiency</strong>: The free tier is generous enough for portfolios, blogs, and many small-to-medium apps.</p>
</li>
</ul>
<h3 id="heading-trade-offs-to-consider">Trade-offs to Consider</h3>
<p>Cloudflare Workers use a V8 isolate runtime, not a full Node.js environment. That means:</p>
<ul>
<li><p>Some Node.js APIs like <code>fs</code> or <code>child_process</code> aren't available</p>
</li>
<li><p>Native binaries or certain libraries may not work</p>
</li>
</ul>
<p>That said, for most modern stacks –&nbsp;like Next.js + Supabase + Stripe + Resend – this limitation is rarely an issue.</p>
<p>In short, choose <strong>Vercel</strong> if you want the simplest, plug-and-play Next.js deployment. Choose <strong>Cloudflare Workers</strong> if you want better edge performance and more flexible scaling.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before getting started, make sure you have the following set up. Most of these take only a few minutes:</p>
<ul>
<li><p><strong>Node.js 18+</strong> and <strong>pnpm 9+</strong> (you can also use npm or yarn, but this guide uses pnpm.)</p>
</li>
<li><p>A <strong>Cloudflare account</strong> 👉 <a href="https://dash.cloudflare.com/sign-up">https://dash.cloudflare.com/sign-up</a></p>
</li>
<li><p>A <strong>Supabase account</strong> (if your app uses a database) 👉 <a href="https://supabase.com">https://supabase.com</a></p>
</li>
<li><p>A <strong>GitHub repository</strong> for your project (required later for CI/CD setup)</p>
</li>
<li><p>A <strong>domain name</strong> (optional) – You’ll get a free <code>*.workers.dev</code> URL by default.</p>
</li>
</ul>
<h3 id="heading-install-wrangler-cloudflare-cli">Install Wrangler (Cloudflare CLI)</h3>
<p>We’ll use Wrangler to build and deploy the application:</p>
<pre><code class="language-bash">pnpm add -D wrangler
</code></pre>
<h2 id="heading-the-stack">The Stack</h2>
<p>Here’s the tech stack used in this project:</p>
<ul>
<li><p><strong>Next.js (v14.2.x):</strong> Using the App Router with Edge runtime for both public and dashboard routes</p>
</li>
<li><p><strong>Supabase:</strong> Handles authentication, Postgres database, and Row-Level Security (RLS)</p>
</li>
<li><p><strong>Tailwind CSS</strong> + UI utilities: For styling, along with lightweight animation using Framer Motion</p>
</li>
<li><p><strong>Cloudflare Workers:</strong> Deployment powered by <code>@opennextjs/cloudflare</code> and <code>wrangler</code></p>
</li>
<li><p><strong>GitHub Actions:</strong> Used to automate CI/CD and deployments</p>
</li>
</ul>
<p><strong>Note:</strong> If you're using Next.js <strong>15 or later</strong>, you can remove the<br><code>--dangerouslyUseUnsupportedNextVersion</code> flag from the build script, as it's only required for certain Next.js 14 setups.</p>
<h2 id="heading-step-1-install-the-cloudflare-adapter">Step 1 — Install the Cloudflare Adapter</h2>
<p>From inside your existing Next.js project, install the OpenNext adapter along with Wrangler (Cloudflare’s CLI tool):</p>
<pre><code class="language-bash">pnpm add @opennextjs/cloudflare
pnpm add -D wrangler
</code></pre>
<p>Then add the deploy scripts to <code>package.json</code>:</p>
<pre><code class="language-jsonc">{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "next lint",

    "cloudflare-build": "opennextjs-cloudflare build --dangerouslyUseUnsupportedNextVersion",
    "preview":          "pnpm cloudflare-build &amp;&amp; opennextjs-cloudflare preview",
    "deploy":           "pnpm cloudflare-build &amp;&amp; wrangler deploy",
    "upload":           "pnpm cloudflare-build &amp;&amp; opennextjs-cloudflare upload",
    "cf-typegen":       "wrangler types --env-interface CloudflareEnv cloudflare-env.d.ts"
  }
}
</code></pre>
<p>What each script does:</p>
<table>
<thead>
<tr>
<th>Script</th>
<th>What it does</th>
</tr>
</thead>
<tbody><tr>
<td><code>pnpm cloudflare-build</code></td>
<td>Compiles your Next app into <code>.open-next/</code> (the Worker bundle). No upload.</td>
</tr>
<tr>
<td><code>pnpm preview</code></td>
<td>Builds and runs the Worker locally with <code>wrangler dev</code>. Closest thing to prod.</td>
</tr>
<tr>
<td><code>pnpm deploy</code></td>
<td>Builds and uploads to Cloudflare. <strong>This ships to production.</strong></td>
</tr>
<tr>
<td><code>pnpm upload</code></td>
<td>Builds and uploads a <em>new version</em> without promoting it (for staged rollouts).</td>
</tr>
<tr>
<td><code>pnpm cf-typegen</code></td>
<td>Regenerates <code>cloudflare-env.d.ts</code> types after editing <code>wrangler.jsonc</code>.</td>
</tr>
</tbody></table>
<p><strong>Heads up:</strong> the Pages-based <code>@cloudflare/next-on-pages</code> is a different tool. We are <strong>not</strong> using Pages — we're deploying as a real Worker. Don't mix the two.</p>
<h2 id="heading-step-2-wire-opennext-into-next-dev">Step 2 — Wire OpenNext into <code>next dev</code></h2>
<p>So that <code>pnpm dev</code> can read your Cloudflare bindings (env vars, R2, KV, D1, …) the same way production will, edit <code>next.config.mjs</code>:</p>
<pre><code class="language-js">/** @type {import('next').NextConfig} */
const nextConfig = {};

if (process.env.NODE_ENV !== "production") {
  const { initOpenNextCloudflareForDev } = await import(
    "@opennextjs/cloudflare"
  );
  initOpenNextCloudflareForDev();
}

export default nextConfig;
</code></pre>
<p>We only call it in development so <code>next build</code> stays fast and CI doesn't spin up a Miniflare instance for nothing.</p>
<h2 id="heading-step-3-local-environment-setup-with-devvars">Step 3 — Local Environment Setup with <code>.dev.vars</code></h2>
<p>When working with Cloudflare Workers locally, Wrangler uses a file called <code>.dev.vars</code> to store environment variables (instead of <code>.env.local</code> used by Next.js).</p>
<p>A simple and reliable approach is to keep an example file in your repo and ignore the real one.</p>
<h3 id="heading-example-devvarsexample-committed">Example: <code>.dev.vars.example</code> (committed)</h3>
<pre><code class="language-bash">NEXT_PUBLIC_SUPABASE_URL="https://YOUR-PROJECT-ref.supabase.co"
NEXT_PUBLIC_SUPABASE_ANON_KEY="YOUR-ANON-KEY"
NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL="admin@example.com"
</code></pre>
<h3 id="heading-set-up-your-local-environment">Set Up Your Local Environment</h3>
<p>Run the following commands:</p>
<pre><code class="language-plaintext">cp .dev.vars.example .dev.vars
cp .dev.vars .env.local
</code></pre>
<ul>
<li><p><code>.dev.vars</code> is used by Wrangler (<code>wrangler dev</code>)</p>
</li>
<li><p><code>.env.local</code> is used by Next.js (<code>next dev</code>)</p>
</li>
</ul>
<h3 id="heading-why-use-both-files">Why Use Both Files?</h3>
<ul>
<li><p><code>next dev</code> reads from <code>.env.local</code></p>
</li>
<li><p><code>wrangler dev</code> (used in <code>pnpm preview</code>) reads from <code>.dev.vars</code></p>
</li>
</ul>
<p>Keeping both files in sync ensures your app behaves consistently in development and when running in the Cloudflare runtime.</p>
<h3 id="heading-update-gitignore">Update <code>.gitignore</code></h3>
<p>Make sure these files are ignored:</p>
<pre><code class="language-plaintext">.dev.vars
.env*.local
.open-next
.wrangler
</code></pre>
<h2 id="heading-step-4-deploy-your-app-from-your-local-machine">Step 4 — Deploy Your App from Your Local Machine</h2>
<p>Once <code>pnpm preview</code> is working correctly, you're ready to deploy your application:</p>
<pre><code class="language-bash">pnpm deploy
</code></pre>
<p>Under the hood that runs:</p>
<pre><code class="language-bash">pnpm cloudflare-build &amp;&amp; wrangler deploy
</code></pre>
<p>The first time, Wrangler will:</p>
<ol>
<li><p>Compile your app to <code>.open-next/worker.js</code>.</p>
</li>
<li><p>Upload the script + assets to Cloudflare.</p>
</li>
<li><p>Print your live URL, e.g. <code>https://portfolio.&lt;your-account&gt;.workers.dev</code>.</p>
</li>
</ol>
<p>Open it in a browser. Congratulations — you're on Cloudflare's edge in 330+ cities. The page should be served in <strong>&lt;100 ms</strong> TTFB from anywhere.  </p>
<p><a href="https://portfolio.tarikuldev.workers.dev/">Here's the live version of my own portfolio deployed this way</a></p>
<h2 id="heading-step-5-push-your-secrets-to-the-worker">Step 5 — Push Your Secrets to the Worker</h2>
<p>Local <code>.dev.vars</code> is <strong>not</strong> uploaded by <code>wrangler deploy</code>. You have to push secrets explicitly:</p>
<pre><code class="language-bash">wrangler secret put NEXT_PUBLIC_SUPABASE_URL
wrangler secret put NEXT_PUBLIC_SUPABASE_ANON_KEY
wrangler secret put NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL
</code></pre>
<p>Each command prompts you for the value and stores it encrypted on Cloudflare. Or do it visually:</p>
<blockquote>
<p>Cloudflare Dashboard → <strong>Workers &amp; Pages</strong> → your worker → <strong>Settings</strong> → <strong>Variables and Secrets</strong> → <strong>Add</strong>.</p>
</blockquote>
<p>Important: <code>NEXT_PUBLIC_*</code> vars are inlined into the client bundle at build time, so they also need to be available when <code>pnpm cloudflare-build</code> runs (locally, that's your <code>.env.local</code>; in CI, see Step 6).</p>
<h2 id="heading-step-6-set-up-continuous-deployment-with-github-actions">Step 6 — Set Up Continuous Deployment with GitHub Actions</h2>
<p>Once your local deployment is working, the next step is automating deployments so every push to the <code>main</code> branch updates production automatically.</p>
<p>With this workflow:</p>
<ul>
<li><p>Pull requests will run validation checks</p>
</li>
<li><p>Production deploys only happen after successful builds</p>
</li>
<li><p>Broken code never reaches your live site</p>
</li>
</ul>
<p>Create the following file inside your project:</p>
<p><code>.github/workflows/deploy.yml</code></p>
<pre><code class="language-yaml">name: CI / Deploy to Cloudflare Workers

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: cloudflare-deploy-${{ github.ref }}
  cancel-in-progress: true

jobs:
  verify:
    name: Lint and Build
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install --frozen-lockfile
      - run: pnpm lint
      - run: pnpm build
        env:
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          NEXT_PUBLIC_SUPABASE_ANON_KEY: ${{ secrets.NEXT_PUBLIC_SUPABASE_ANON_KEY }}
          NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL: ${{ secrets.NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL }}

  deploy:
    name: Deploy to Cloudflare Workers
    needs: verify
    if: github.event_name == 'push' &amp;&amp; github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install --frozen-lockfile

      - name: Build and Deploy
        run: pnpm run deploy
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          NEXT_PUBLIC_SUPABASE_ANON_KEY: ${{ secrets.NEXT_PUBLIC_SUPABASE_ANON_KEY }}
          NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL: ${{ secrets.NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL }}
</code></pre>
<h3 id="heading-required-github-repo-secrets">Required GitHub repo secrets</h3>
<p>Go to GitHub repo → Settings → Secrets and variables → Actions → New repository secret and add:</p>
<table>
<thead>
<tr>
<th>Secret</th>
<th>Where to get it</th>
</tr>
</thead>
<tbody><tr>
<td><code>CLOUDFLARE_API_TOKEN</code></td>
<td><a href="https://dash.cloudflare.com/profile/api-tokens">https://dash.cloudflare.com/profile/api-tokens</a> → "Edit Cloudflare Workers" template</td>
</tr>
<tr>
<td><code>CLOUDFLARE_ACCOUNT_ID</code></td>
<td>Cloudflare dashboard → right sidebar, "Account ID"</td>
</tr>
<tr>
<td><code>CLOUDFLARE_ACCOUNT_SUBDOMAIN</code></td>
<td>Your <code>*.workers.dev</code> subdomain (used only for the deployment URL link)</td>
</tr>
<tr>
<td><code>NEXT_PUBLIC_SUPABASE_URL</code></td>
<td>Supabase project settings</td>
</tr>
<tr>
<td><code>NEXT_PUBLIC_SUPABASE_ANON_KEY</code></td>
<td>Supabase project settings</td>
</tr>
<tr>
<td><code>NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL</code></td>
<td>Email pre-filled on <code>/dashboard/login</code></td>
</tr>
</tbody></table>
<p>That's it. Push it to <code>main</code> and it'll go live in about 90 seconds. PRs run lint and build only, so broken code never reaches production.</p>
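<p>If you want to watch a deploy land without opening the dashboard, a couple of commands help – the URL is whatever your Worker's is, and <code>gh</code> (GitHub's CLI) is optional and not part of the setup above:</p>
<pre><code class="language-bash">gh run watch          # follow the latest GitHub Actions run from the terminal
wrangler tail         # stream live logs from the deployed Worker

# Quick smoke test: expect a 200 from your live URL
curl -I https://portfolio.&lt;your-account&gt;.workers.dev
</code></pre>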
<h2 id="heading-step-7-updating-the-project-the-daily-workflow">Step 7 — Updating the Project (the Daily Workflow)</h2>
<p>After the initial setup, the loop is boringly simple — which is the whole point. Here's what I actually do day-to-day:</p>
<h3 id="heading-code-change">Code Change</h3>
<pre><code class="language-bash">git checkout -b feat/new-section
# ...edit files...
pnpm dev                # iterate locally
pnpm preview            # final smoke test on the Worker runtime
git commit -am "feat: add new section"
git push origin feat/new-section
</code></pre>
<p>Open a PR and the <strong>verify</strong> job runs. Review and merge, and the <strong>deploy</strong> job ships to Cloudflare automatically.</p>
<h3 id="heading-updating-env-vars-secrets">Updating env Vars / Secrets</h3>
<pre><code class="language-bash"># Local
nano .dev.vars

# Production
wrangler secret put NEXT_PUBLIC_SUPABASE_URL
# ...etc.
</code></pre>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>When I started this migration, I was nervous about leaving Vercel — the Next.js DX there is genuinely excellent. But the moment you push beyond a hobby site, the comparison isn't close: Cloudflare's economics and edge performance pull ahead.</p>
<p>With <code>@opennextjs/cloudflare</code>, the developer experience has also caught up: my <code>pnpm dev</code> loop is identical, my <code>pnpm preview</code> mimics production, and <code>git push</code> deploys globally in ~90 seconds.</p>
<p>If you've been holding off because the old Cloudflare Pages + Next.js story was rough, that era is over. Try this runbook on a side project this weekend and see for yourself.</p>
<p>If you found this useful, the full repo is <a href="./">here</a> — feel free to clone it as a starter.</p>
<p>Happy shipping.</p>
<p>— <em>Tarikul</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Create a GPU-Optimized Machine Image with HashiCorp Packer on GCP ]]>
                </title>
                <description>
                    <![CDATA[ Every time you spin up GPU infrastructure, you do the same thing: install CUDA drivers, DCGM, apply OS‑level GPU tuning, and fight dependency issues. Same old ritual every single time, wasting expensi ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-create-a-gpu-optimized-machine-image-with-hashicorp-packer-on-gcp/</link>
                <guid isPermaLink="false">69e93606d5f8830e7d9fbad6</guid>
                
                    <category>
                        <![CDATA[ GPU ]]>
                    </category>
                
                    <category>
                        <![CDATA[ VM Image ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GCP ]]>
                    </category>
                
                    <category>
                        <![CDATA[ hashicorp packer ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ mlops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Rasheedat Atinuke Jamiu ]]>
                </dc:creator>
                <pubDate>Wed, 22 Apr 2026 20:30:00 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/fd393878-fe7c-458a-addf-7cd22d8280ac.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Every time you spin up GPU infrastructure, you do the same thing: install CUDA drivers, DCGM, apply OS‑level GPU tuning, and fight dependency issues. Same old ritual every single time, wasting expensive cloud credits and getting frustrated before actual work begins.</p>
<p>In this article, you'll build a reusable GPU-optimized machine image using Packer, pre-loaded with NVIDIA drivers, CUDA Toolkit, NVIDIA Container Toolkit, DCGM, and system-level GPU tuning like persistence mode.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a href="#heading-step-1-install-packer">Step 1: Install Packer</a></p>
</li>
<li><p><a href="#heading-step-2-set-up-project-directory">Step 2: Set Up Project Directory</a></p>
</li>
<li><p><a href="#heading-step-3-install-packers-plugins">Step 3: Install Packer's Plugins</a></p>
</li>
<li><p><a href="#heading-step-4-define-your-source">Step 4: Define Your Source</a></p>
</li>
<li><p><a href="#heading-step-5-writing-the-build-template">Step 5: Writing the Build Template</a></p>
</li>
<li><p><a href="#heading-step-6-writing-the-gpu-provisioning-script">Step 6: Writing the GPU Provisioning Script</a></p>
<ul>
<li><p><a href="#heading-section-1-pre-installation-kernel-headers">section 1: Pre-Installation (Kernel Headers)</a></p>
</li>
<li><p><a href="#heading-section-2-installing-nvidias-apt-repository">Section 2: Installing NVIDIA's Apt Repository</a></p>
</li>
<li><p><a href="#heading-section-3-pinning-nvidia-drivers-version">Section 3: Pinning NVIDIA Drivers Version</a></p>
</li>
<li><p><a href="#heading-section-4-installing-the-driver">Section 4: Installing the Driver</a></p>
</li>
<li><p><a href="#heading-section-5-cuda-toolkit-installation">Section 5: CUDA Toolkit Installation</a></p>
</li>
<li><p><a href="#heading-section-6-nvidia-container-toolkit">Section 6: Nvidia Container Toolkit</a></p>
</li>
<li><p><a href="#heading-section-7-installing-dcgm-data-center-gpu-manager">Section 7: Installing DCGM — Data Center GPU Manager</a></p>
</li>
<li><p><a href="#heading-section-8-enabling-persistence-mode">Section 8: Enabling Persistence Mode</a></p>
</li>
<li><p><a href="#heading-section-9-system-tuning-for-gpu-compute-workloads">Section 9: System Tuning for GPU Compute Workloads</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-step-7assembling-and-running-the-build">Step 7:Assembling and Running the Build</a></p>
</li>
<li><p><a href="#heading-step-8-test-the-image-and-verify-the-gpu-stack">Step 8: Test the Image and Verify the GPU Stack</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-references">References</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p><a href="https://www.packer.io/">HashiCorp Packer</a> &gt;= 1.9</p>
</li>
<li><p><a href="https://github.com/hashicorp/packer-plugin-googlecompute">Google Compute Packer plugin</a> (installed via <code>packer init</code>)</p>
</li>
<li><p>Optionally, the <a href="https://github.com/hashicorp/packer-plugin-amazon">AWS Packer plugin</a> can be used for EC2 builds by adding an <code>amazon-ebs</code> source to <code>node.pkr.hcl</code></p>
</li>
<li><p>GCP project with Compute Engine API enabled (or AWS account with EC2 access)</p>
</li>
<li><p>GCP authentication (<code>gcloud auth application-default login</code>) or AWS credentials</p>
</li>
<li><p>Access to an NVIDIA GPU instance type (For example, A100, H100, L4 on GCP; p4d, p5, G6 on AWS)</p>
</li>
</ul>
<h2 id="heading-project-setup">Project Setup</h2>
<h3 id="heading-step-1-install-packer">Step 1: Install Packer</h3>
<p>To get started, you'll install Packer with the steps below if you're on macOS (or you can follow the official documentation for Linux and Windows installation <a href="https://developer.hashicorp.com/packer/tutorials/docker-get-started/get-started-install-cli#:~:text=Chocolatey%20on%20Windows-,Linux,-HashiCorp%20officially%20maintains">guides</a>).</p>
<p>First, you'll install the official Packer formula from the terminal.</p>
<p>Install the HashiCorp tap, a repository of all HashiCorp packages.</p>
<pre><code class="language-plaintext">$ brew tap hashicorp/tap
</code></pre>
<p>Now, install Packer with <code>hashicorp/tap/packer</code>.</p>
<pre><code class="language-plaintext">$ brew install hashicorp/tap/packer
</code></pre>
<h3 id="heading-step-2-set-up-project-directory">Step 2: Set Up Project Directory</h3>
<p>With Packer installed, you'll create your project directory. For clean code and separation of concerns, split the configuration across the files shown in the structure below. Go ahead and create them in your <code>packer_demo</code> folder using the command below:</p>
<pre><code class="language-plaintext">mkdir -p packer_demo/script &amp;&amp; touch packer_demo/{build.pkr.hcl,source.pkr.hcl,variable.pkr.hcl,local.pkr.hcl,plugins.pkr.hcl,values.pkrvars.hcl} packer_demo/script/base.sh
</code></pre>
<p>Your file directory should look like this:</p>
<pre><code class="language-plaintext">packer_demo
├── build.pkr.hcl        # Build pipeline — provisioner ordering
├── source.pkr.hcl       # GCP source definition (googlecompute)
├── variable.pkr.hcl     # Variable definitions with defaults
├── local.pkr.hcl        # Local values
├── plugins.pkr.hcl      # Packer plugin requirements
├── values.pkrvars.hcl   # Variable values (copy and customize)
└── script/
    └── base.sh          # GPU provisioning script
</code></pre>
<h3 id="heading-step-3-install-packers-plugins">Step 3: Install Packer's Plugins</h3>
<p>In your <code>plugins.pkr.hcl</code> file, define your plugins in the <code>packer</code> block. The <code>packer {}</code> block contains Packer settings, including required plugin versions. Inside it, the <code>required_plugins</code> block specifies all the plugins the template needs to build your image. If you're on Azure or AWS, you can check for the latest plugin <a href="https://developer.hashicorp.com/packer/integrations">here</a>.</p>
<pre><code class="language-hcl">packer {
  required_plugins {
    googlecompute = {
      source  = "github.com/hashicorp/googlecompute"
      version = "~&gt; 1"
    }
  }
}
</code></pre>
<p>Then, initialize your Packer plugin with the command below:</p>
<pre><code class="language-plaintext">packer init .
</code></pre>
<h3 id="heading-step-4-define-your-source">Step 4: Define Your Source</h3>
<p>With your plugin initialized, you can now define your source block. The source block configures a specific builder plugin, which is then invoked by a build block. Source blocks contain your <code>project ID</code>, the zone where your machine will be created, the <code>source_image_family</code> (think of this as your base image, such as Debian, Ubuntu, and so on), and your <code>source_image_project_id</code>.</p>
<p>In GCP, each public OS image family belongs to an image project, such as "ubuntu-os-cloud" for Ubuntu. You'll set the <code>machine_type</code> to a GPU machine type because you're building a base image for GPU machines, so the temporary build instance needs to be able to run your GPU-related commands.</p>
<pre><code class="language-hcl">source "googlecompute" "gpu-node" {
  project_id              = var.project_id
  zone                    = var.zone
  source_image_family     = var.image_family
  source_image_project_id = var.image_project_id
  ssh_username            = var.ssh_username
  machine_type            = var.machine_type



  image_name        = var.image_name
  image_description = var.image_description

  disk_size           = var.disk_size
  on_host_maintenance = "TERMINATE"

  tags = ["gpu-node"]

}
</code></pre>
<p>Setting <code>on_host_maintenance = "TERMINATE"</code> on Google Cloud Compute Engine ensures that a VM instance stops instead of live-migrating during infrastructure maintenance. This is important when using GPUs or specialized hardware that can't migrate, preventing data corruption.</p>
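<p>If you're not sure which GPU machine types or accelerators exist in your zone before filling in the variables, <code>gcloud</code> can list them (the zone below is just an example):</p>
<pre><code class="language-bash"># GPU accelerator types offered in a given zone
gcloud compute accelerator-types list --filter="zone:us-central1-a"

# G2 (NVIDIA L4) machine types available in the same zone
gcloud compute machine-types list --filter="zone:us-central1-a AND name~^g2-"
</code></pre>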
<p>You'll define all your variables in the <code>variable.pkr.hcl</code> file and set their values in <code>values.pkrvars.hcl</code>. Remember to add <code>values.pkrvars.hcl</code> to your <code>.gitignore</code>.</p>
<pre><code class="language-hcl">variable "image_name" {
  type        = string
  description = "The name of the resulting image"
}

variable "image_description" {
  type        = string
  description = "Description of the image"
}

variable "project_id" {
  type        = string
  description = "The GCP project ID where the image will be created"
}

variable "image_family" {
  type        = string
  description = "The image family to which the resulting image belongs"
}

variable "image_project_id" {
  type        = list(string)
  description = "The project ID(s) to search for the source image"
}

variable "zone" {
  type        = string
  description = "The GCP zone where the build instance will be created"
}

variable "ssh_username" {
  type        = string
  description = "The SSH username to use for connecting to the instance"
}
variable "machine_type" {
  type        = string
  description = "The machine type to use for the build instance"
}

variable "cuda_version" {
  type        = string
  description = "CUDA toolkit version"
  default     = "13.1"
}

variable "driver_version" {
  type        = string
  description = "NVIDIA driver version"
  default     = "590.48.01"
}

variable "disk_size" {
  type        = number
  description = "Boot disk size in GB"
  default     = 50
}
</code></pre>
<p><code>values.pkrvars.hcl</code></p>
<pre><code class="language-hcl">image_name        = "base-gpu-image-{{timestamp}}"
image_description = "Ubuntu 24.04 LTS with gpu drivers and health checks"
project_id        = "your gcp project id"
image_family      = "ubuntu-2404-lts-amd64"
image_project_id  = ["ubuntu-os-cloud"]
zone              = "us-central1-a"
ssh_username      = "packer"
machine_type      = "g2-standard-4"
disk_size        = 50
driver_version   = "590.48.01"
cuda_version      = "13.1" 
</code></pre>
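<p>As noted earlier, <code>values.pkrvars.hcl</code> holds environment-specific values and shouldn't be committed. A minimal way to exclude it, assuming you're at the project root:</p>
<pre><code class="language-shell"># Keep environment-specific variable values out of version control
echo "values.pkrvars.hcl" &gt;&gt; .gitignore
</code></pre>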
<h3 id="heading-step-5-writing-the-build-template">Step 5: Writing the Build Template</h3>
<p>Create <code>build.pkr.hcl</code>. The <code>build</code> block creates a temporary instance, runs provisioners, and produces an image.</p>
<p>Provisioners in this template are organized as follows:</p>
<ul>
<li><p><strong>First provisioner</strong> runs system updates and upgrades.</p>
</li>
<li><p><strong>Second provisioner</strong> reboots the instance (<code>expect_disconnect = true</code>).</p>
</li>
<li><p><strong>Third provisioner</strong> waits for the instance to come back (<code>pause_before</code>), then runs <code>script/base.sh</code>. This provisioner sets <code>max_retries</code> to handle transient SSH timeouts and passes environment variables for <code>DRIVER_VERSION</code> and <code>CUDA_VERSION</code>.</p>
</li>
</ul>
<p>Lastly, you have the post-processor to tell you the image ID and completion status:</p>
<pre><code class="language-hcl">build {
  sources = ["source.googlecompute.gpu-node"]

  provisioner "shell" {
    inline = [
      "set -e",
      "sudo apt update",
      "sudo apt -y dist-upgrade"
    ]
  }

  provisioner "shell" {
    expect_disconnect = true
    inline            = ["sudo reboot"]
  }

  # Base: NVIDIA drivers, CUDA, DCGM
  provisioner "shell" {
    pause_before = "60s"
    script       = "script/base.sh"
    max_retries  = 2
    environment_vars = [
      "DRIVER_VERSION=${var.driver_version}",
      "CUDA_VERSION=${var.cuda_version}"
    ]
  }

  post-processor "shell-local" {
    inline = [
      "echo '=== Image Build Complete ==='",
      "echo 'Image ID: ${build.ID}'",
      "date"
    ]
  }
}
</code></pre>
<h3 id="heading-step-6-writing-the-gpu-provisioning-script">Step 6: Writing the GPU Provisioning Script</h3>
<p>Now we'll go through the base script, and break down some parts of it.</p>
<h3 id="heading-section-1-pre-installation-kernel-headers">Section 1: Pre-Installation (Kernel Headers)</h3>
<p>Before installing NVIDIA drivers, the system needs kernel headers and build tools. The NVIDIA driver compiles a kernel module during installation via DKMS, so if the headers for your running kernel aren't present, the build will fail silently, and the driver won't load on boot.</p>
<pre><code class="language-shellscript">log "Installing kernel headers and build tools..."
sudo apt-get install -qq -y \
  "linux-headers-$(uname -r)" \
  build-essential \
  dkms \
  curl \
  wget
</code></pre>
<h3 id="heading-section-2-installing-nvidias-apt-repository">Section 2: Installing NVIDIA's Apt Repository</h3>
<p>This snippet downloads and installs NVIDIA’s official keyring package for your Linux distribution, which adds the trusted signing keys the system needs to verify CUDA packages.</p>
<pre><code class="language-shellscript">log "Adding NVIDIA CUDA apt repository (${DISTRO})..."
wget -q "https://developer.download.nvidia.com/compute/cuda/repos/\({DISTRO}/\){ARCH}/cuda-keyring_1.1-1_all.deb" \
  -O /tmp/cuda-keyring.deb
sudo dpkg -i /tmp/cuda-keyring.deb
rm /tmp/cuda-keyring.deb
sudo apt-get update -qq
</code></pre>
<h3 id="heading-section-3-pinning-nvidia-drivers-version">Section 3: Pinning NVIDIA Drivers Version</h3>
<p>Pinning the NVIDIA driver to a specific version ensures that the system always installs and keeps using exactly that driver version, even when newer drivers appear in the repository.</p>
<p>NVIDIA drivers are tightly coupled with CUDA toolkit versions, kernel versions, and container runtimes like Docker or the NVIDIA Container Toolkit.</p>
<p>A mismatch, such as the system auto‑upgrading to a newer driver, can cause CUDA to stop working, break GPU acceleration, or make the machine image inconsistent across deployments.</p>
<pre><code class="language-shellscript">log "Pinning driver to version ${DRIVER_VERSION}..."
sudo apt-get install -qq -y "nvidia-driver-pinning-${DRIVER_VERSION}"
</code></pre>
<h3 id="heading-section-4-installing-the-driver">Section 4: Installing the Driver</h3>
<p>The <code>libnvidia-compute</code> package installs only the compute‑related user‑space libraries (CUDA driver components), while <code>nvidia-dkms-open</code> installs the <strong>open‑source NVIDIA kernel module</strong>, built locally via DKMS.</p>
<p>Together, these two packages give you a fully functional CUDA driver environment without any GUI or graphics dependencies.</p>
<p>Here, we're using <strong>NVIDIA’s compute‑only driver stack with the open‑source kernel modules</strong>, which deliberately avoids installing any display-related components you don't need.</p>
<p>This DKMS-based installation method is well aligned with Linux distros: it's lightweight and compute-focused.</p>
<pre><code class="language-shellscript">log "Installing NVIDIA compute-only driver (open kernel modules)..."
sudo apt-get -V install -y \
  libnvidia-compute \
  nvidia-dkms-open
</code></pre>
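<p>Since DKMS failures can be silent (as mentioned in Section 1), it's worth confirming the module actually built for the running kernel. A quick check, assuming the packages installed cleanly:</p>
<pre><code class="language-shellscript"># Confirm DKMS built the open NVIDIA kernel module for the running kernel
dkms status | grep -i nvidia

# The module isn't loaded until a reboot (or modprobe), so check that the built files exist
ls /lib/modules/$(uname -r)/updates/dkms/ 2&gt;/dev/null | grep -i nvidia
</code></pre>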
<h3 id="heading-section-5-cuda-toolkit-installation">Section 5: CUDA Toolkit Installation</h3>
<p>This part of the script installs the <strong>CUDA Toolkit</strong> for the specified version and then makes sure that CUDA’s executables and libraries are available system‑wide for every user and every shell session.</p>
<p>It adds CUDA binaries to PATH, so commands like <code>nvcc</code>, <code>cuda-gdb</code>, and <code>cuda-memcheck</code> work without specifying full paths. It also adds CUDA libraries to LD_LIBRARY_PATH, so applications can find CUDA’s shared libraries at runtime.</p>
<pre><code class="language-shellscript">log "Installing CUDA Toolkit ${CUDA_VERSION}..."
sudo apt-get install -qq -y "cuda-toolkit-${CUDA_VERSION}"

# Persist CUDA paths for all users and sessions
cat &lt;&lt;'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}
EOF
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
</code></pre>
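<p>If you want to sanity-check the toolkit inside the same provisioning run, keep in mind the <code>profile.d</code> changes only take effect in new login shells, so call the binaries by their full path. A minimal sketch:</p>
<pre><code class="language-shellscript"># nvcc is not yet on PATH for this shell, so use its absolute path
/usr/local/cuda/bin/nvcc --version

# Confirm the CUDA runtime libraries are registered with the dynamic linker
ldconfig -p | grep -i libcudart
</code></pre>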
<h3 id="heading-section-6-nvidia-container-toolkit">Section 6: NVIDIA Container Toolkit</h3>
<p>This block installs the NVIDIA Container Toolkit and configures it so that containers (Docker or containerd) can access the GPU safely and correctly. It’s a critical step for Kubernetes GPU nodes, Docker GPU workloads, and any system that needs GPU acceleration inside containers.</p>
<pre><code class="language-shellscript">log "Installing NVIDIA Container Toolkit..."
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -qq
sudo apt-get install -qq -y nvidia-container-toolkit

# Configure for containerd (primary Kubernetes runtime)
sudo nvidia-ctk runtime configure --runtime=containerd

# Configure for Docker if present on this image
if systemctl list-unit-files | grep -q "^docker.service"; then
  sudo nvidia-ctk runtime configure --runtime=docker
fi
</code></pre>
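<p>To confirm the toolkit wired itself into the runtime, you can inspect the containerd config (assuming containerd is present on the image) and, if Docker is installed, run a containerized <code>nvidia-smi</code>. The CUDA image tag below is only an example; pick a base image that matches your toolkit version:</p>
<pre><code class="language-shellscript"># The configure step should have added an nvidia runtime entry to containerd's config
grep -i nvidia /etc/containerd/config.toml

# Toolkit CLI version
nvidia-ctk --version

# End-to-end test when Docker is present (swap in a CUDA base image matching your toolkit)
sudo docker run --rm --gpus all nvidia/cuda:13.1.0-base-ubuntu24.04 nvidia-smi
</code></pre>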
<h3 id="heading-section-7-installing-dcgm-data-center-gpu-manager">Section 7: Installing DCGM (Data Center GPU Manager)</h3>
<p>This section covers the installation and validation of NVIDIA DCGM (Data Center GPU Manager), which is NVIDIA’s official management and telemetry framework for data center GPUs.</p>
<p>It offers health monitoring and diagnostics, telemetry (including temperature, clocks, power, and utilization), error reporting, and integration with Kubernetes, Prometheus, and monitoring agents. Your GPU monitoring stack relies on this.</p>
<p>The script extracts the installed version and checks that it meets the <strong>minimum required version</strong> for NVIDIA driver 590+, then enforces that requirement. This prevents a mismatch between the GPU driver and DCGM, which would break monitoring and health checks. It also enables Fabric Manager for NVLink/NVSwitch, if you're on multi‑GPU topologies like A100/H100 DGX or other multi‑GPU servers.</p>
<pre><code class="language-shellscript">log "Installing DCGM..."
sudo apt-get install -qq -y datacenter-gpu-manager

DCGM_VER=$(dpkg -s datacenter-gpu-manager 2&gt;/dev/null | awk '/^Version:/{print $2}' | sed 's/^[0-9]*://')
DCGM_MAJOR=$(echo "${DCGM_VER}" | cut -d. -f1)
DCGM_MINOR=$(echo "${DCGM_VER}" | cut -d. -f2)
if [[ "${DCGM_MAJOR}" -lt 4 ]] || { [[ "${DCGM_MAJOR}" -eq 4 ]] &amp;&amp; [[ "${DCGM_MINOR}" -lt 3 ]]; }; then
  error "DCGM ${DCGM_VER} is below the 4.3 minimum required for driver 590+. Check your CUDA repo."
fi
log "DCGM installed: ${DCGM_VER}"

sudo systemctl enable nvidia-dcgm
sudo systemctl start  nvidia-dcgm

# Fabric Manager — only needed for NVLink/NVSwitch GPUs (A100/H100 multi-GPU nodes)
if systemctl list-unit-files | grep -q "^nvidia-fabricmanager.service"; then
  log "Enabling nvidia-fabricmanager for NVLink GPUs..."
  sudo systemctl enable nvidia-fabricmanager
  sudo systemctl start  nvidia-fabricmanager
fi
</code></pre>
<h3 id="heading-section-8-enabling-persistence-mode">Section 8: Enabling Persistence Mode</h3>
<p>The NVIDIA driver normally unloads itself when the GPU is idle. When a new workload starts, the driver must reload, reinitialize the GPU, and set up memory mappings. This adds a delay of a few hundred milliseconds to several seconds, depending on the GPU and system.</p>
<p>Enabling nvidia‑persistenced keeps the NVIDIA driver loaded in memory even when no GPU workloads are running.</p>
<pre><code class="language-shellscript">log "Enabling nvidia-persistenced..."
sudo systemctl enable nvidia-persistenced
sudo systemctl start  nvidia-persistenced
</code></pre>
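<p>On a VM created from the finished image, you can confirm the daemon took effect; <code>nvidia-smi</code> reports persistence mode directly:</p>
<pre><code class="language-shellscript"># Persistence mode should report as enabled on an instance built from this image
nvidia-smi -q | grep -i "Persistence Mode"
</code></pre>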
<h3 id="heading-section-9-system-tuning-for-gpu-compute-workloads">Section 9: System Tuning for GPU Compute Workloads</h3>
<p>This block applies a set of <strong>system‑level performance and stability tunings</strong> that are standard for high‑performance GPU servers, Kubernetes GPU nodes, and ML/AI workloads.</p>
<p>Each line targets a specific bottleneck or instability pattern that appears in real GPU production environments.</p>
<ul>
<li><p>Swap and memory behavior: Disabling swap and setting <code>vm.swappiness=0</code> prevents the kernel from pushing GPU‑bound processes into swap. GPU workloads are extremely sensitive to latency, and swapping can cause CUDA context resets and GPU driver timeouts.</p>
</li>
<li><p>Hugepages for large memory allocations: Setting <code>vm.nr_hugepages=2048</code> allocates a pool of hugepages, which reduces TLB pressure for large contiguous memory allocations.</p>
<p>CUDA, NCCL, and deep‑learning frameworks frequently allocate large buffers, and hugepages reduce page‑table overhead, improving memory bandwidth and lowering latency for large tensor operations. This is especially useful on multi‑GPU servers.</p>
</li>
<li><p>CPU frequency governor: Installing <code>cpupower</code> and forcing the CPU governor to <code>performance</code> ensures the CPU stays at maximum frequency instead of scaling down.</p>
<p>GPU workloads often become CPU‑bound during data preprocessing, kernel launches, and NCCL communication. Keeping CPUs at full speed reduces jitter and improves throughput.</p>
</li>
<li><p>NUMA and topology tools: Installing <code>numactl</code>, <code>libnuma-dev</code>, and <code>hwloc</code> provides tools for pinning processes to NUMA nodes, understanding CPU–GPU affinity, and optimizing multi‑GPU placement.</p>
</li>
<li><p>Disabling irqbalance: Stopping and disabling <code>irqbalance</code> lets the NVIDIA driver manage interrupt affinity. For GPU servers, irqbalance can incorrectly move GPU interrupts to suboptimal CPUs, causing higher latency and lower throughput.</p>
</li>
</ul>
<pre><code class="language-shell">log "Applying system tuning..."

# Disable swap (critical for Kubernetes scheduler and ML stability)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
echo "vm.swappiness=0"     | sudo tee /etc/sysctl.d/99-gpu-swappiness.conf

# Hugepages — reduces TLB pressure for large memory allocations
echo "vm.nr_hugepages=2048" | sudo tee /etc/sysctl.d/99-gpu-hugepages.conf

# CPU performance governor
sudo apt-get install -qq -y linux-tools-common "linux-tools-$(uname -r)" || true
sudo cpupower frequency-set -g performance || true

# NUMA and topology tools for GPU affinity tuning
sudo apt-get install -qq -y numactl libnuma-dev hwloc

# Disable irqbalance — let NVIDIA driver manage interrupt affinity
sudo systemctl disable irqbalance || true
sudo systemctl stop    irqbalance || true

# Apply all sysctl settings now
sudo sysctl --system
</code></pre>
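<p>A few quick checks you can run on a VM built from the image to confirm the tuning stuck (a sketch, not exhaustive; the CPU governor interface may be unavailable on some cloud VM types):</p>
<pre><code class="language-shell"># Swap should be fully off (no output expected)
swapon --show

# Sysctl values persisted via /etc/sysctl.d should be active
sysctl vm.swappiness vm.nr_hugepages

# CPU governor should read "performance" where the cpufreq interface is exposed
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# irqbalance should be disabled or absent
systemctl is-enabled irqbalance || true
</code></pre>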
<p>Here's the full <code>base.sh</code> script:</p>
<pre><code class="language-shell">#!/bin/bash
set -euo pipefail

log()   { echo "[BASE] $1"; }
error() { echo "[BASE][ERROR] $1" &gt;&amp;2; exit 1; }

###############################################################
# Required environment variables
###############################################################
[[ -z "${DRIVER_VERSION:-}" ]] &amp;&amp; error "DRIVER_VERSION is not set."
[[ -z "${CUDA_VERSION:-}"   ]] &amp;&amp; error "CUDA_VERSION is not set."

log "DRIVER_VERSION : ${DRIVER_VERSION}"
log "CUDA_VERSION   : ${CUDA_VERSION}"

DISTRO=$(. /etc/os-release &amp;&amp; echo "${ID}${VERSION_ID}" | tr -d '.')
ARCH="x86_64"

export DEBIAN_FRONTEND=noninteractive

###############################################################
# 1. System update
###############################################################
log "Updating system packages..."
sudo apt-get update -qq
sudo apt-get upgrade -qq -y

###############################################################
# 2. Pre-installation — kernel headers
#    Source: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/ubuntu.html
###############################################################
log "Installing kernel headers and build tools..."
sudo apt-get install -qq -y \
  "linux-headers-$(uname -r)" \
  build-essential \
  dkms \
  curl \
  wget

###############################################################
# 3. NVIDIA CUDA Network Repository
###############################################################
log "Adding NVIDIA CUDA apt repository (${DISTRO})..."
wget -q "https://developer.download.nvidia.com/compute/cuda/repos/\({DISTRO}/\){ARCH}/cuda-keyring_1.1-1_all.deb" \
  -O /tmp/cuda-keyring.deb
sudo dpkg -i /tmp/cuda-keyring.deb
rm /tmp/cuda-keyring.deb
sudo apt-get update -qq

###############################################################
# 4. Pin driver version BEFORE installation (590+ requirement)
###############################################################
log "Pinning driver to version ${DRIVER_VERSION}..."
sudo apt-get install -qq -y "nvidia-driver-pinning-${DRIVER_VERSION}"

###############################################################
# 5. Compute-only (headless) driver — Open Kernel Modules
#    Source: NVIDIA Driver Installation Guide — Compute-only System (Open Kernel Modules)
#
#    libnvidia-compute  = compute libraries only (no GL/Vulkan/display)
#    nvidia-dkms-open   = open-source kernel module built via DKMS
#
#    Open kernel modules are the NVIDIA-recommended choice for
#    Ampere, Hopper, and Blackwell data centre GPUs (A100, H100, etc.)
###############################################################
log "Installing NVIDIA compute-only driver (open kernel modules)..."
sudo apt-get -V install -y \
  libnvidia-compute \
  nvidia-dkms-open

###############################################################
# 6. CUDA Toolkit
###############################################################
log "Installing CUDA Toolkit ${CUDA_VERSION}..."
sudo apt-get install -qq -y "cuda-toolkit-${CUDA_VERSION}"

# Persist CUDA paths for all users and sessions
cat &lt;&lt;'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}
EOF
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

###############################################################
# 7. NVIDIA Container Toolkit
#    Required for GPU workloads in Docker / containerd / Kubernetes
###############################################################
log "Installing NVIDIA Container Toolkit..."
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -qq
sudo apt-get install -qq -y nvidia-container-toolkit

# Configure for containerd (primary Kubernetes runtime)
sudo nvidia-ctk runtime configure --runtime=containerd

# Configure for Docker if present on this image
if systemctl list-unit-files | grep -q "^docker.service"; then
  sudo nvidia-ctk runtime configure --runtime=docker
fi

###############################################################
# 8. DCGM — DataCenter GPU Manager
###############################################################
log "Installing DCGM..."
sudo apt-get install -qq -y datacenter-gpu-manager
 
DCGM_VER=$(dpkg -s datacenter-gpu-manager 2&gt;/dev/null | awk '/^Version:/{print $2}' | sed 's/^[0-9]*://')
DCGM_MAJOR=$(echo "${DCGM_VER}" | cut -d. -f1)
DCGM_MINOR=$(echo "${DCGM_VER}" | cut -d. -f2)
if [[ "${DCGM_MAJOR}" -lt 4 ]] || { [[ "${DCGM_MAJOR}" -eq 4 ]] &amp;&amp; [[ "${DCGM_MINOR}" -lt 3 ]]; }; then
  error "DCGM ${DCGM_VER} is below the 4.3 minimum required for driver 590+. Check your CUDA repo."
fi
log "DCGM installed: ${DCGM_VER}"

sudo systemctl enable nvidia-dcgm
sudo systemctl start  nvidia-dcgm

# Fabric Manager — only needed for NVLink/NVSwitch GPUs (A100/H100 multi-GPU nodes)
if systemctl list-unit-files | grep -q "^nvidia-fabricmanager.service"; then
  log "Enabling nvidia-fabricmanager for NVLink GPUs..."
  sudo systemctl enable nvidia-fabricmanager
  sudo systemctl start  nvidia-fabricmanager
fi

###############################################################
# 9. NVIDIA Persistence Daemon
#    Keeps the driver loaded between jobs — reduces cold-start
#    latency on the first CUDA call in each new workload
###############################################################
log "Enabling nvidia-persistenced..."
sudo systemctl enable nvidia-persistenced
sudo systemctl start  nvidia-persistenced

###############################################################
# 10. System tuning for GPU compute workloads
###############################################################
log "Applying system tuning..."

# Disable swap (critical for Kubernetes scheduler and ML stability)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
echo "vm.swappiness=0"     | sudo tee /etc/sysctl.d/99-gpu-swappiness.conf

# Hugepages — reduces TLB pressure for large memory allocations
echo "vm.nr_hugepages=2048" | sudo tee /etc/sysctl.d/99-gpu-hugepages.conf

# CPU performance governor
sudo apt-get install -qq -y linux-tools-common "linux-tools-$(uname -r)" || true
sudo cpupower frequency-set -g performance || true

# NUMA and topology tools for GPU affinity tuning
sudo apt-get install -qq -y numactl libnuma-dev hwloc

# Disable irqbalance — let NVIDIA driver manage interrupt affinity
sudo systemctl disable irqbalance || true
sudo systemctl stop    irqbalance || true

# Apply all sysctl settings now
sudo sysctl --system

###############################################################
# Done
###############################################################
log "============================================"
log "Base layer provisioning complete."
log "  OS      : ${DISTRO}"
log "  Driver  : ${DRIVER_VERSION} (open kernel modules, compute-only)"
log "  CUDA    : cuda-toolkit-${CUDA_VERSION}"
log "  DCGM    : ${DCGM_VER}"
log "============================================"
</code></pre>
<h2 id="heading-step-7-assembling-and-running-the-build">Step 7: Assembling and Running the Build</h2>
<p>Validate the template first, then run the build. Validation catches syntax or variable errors early, so the build doesn’t start on a broken config.</p>
<pre><code class="language-shellscript">packer validate -var-file=values.pkrvars.hcl .
</code></pre>
<p>If validation succeeds, you’ll see a short confirmation like <code>The configuration is valid.</code> After that, start the build. You should expect the process to create a temporary VM, run your provisioners, and produce an image:</p>
<pre><code class="language-plaintext">packer build -var-file=values.pkrvars.hcl .
</code></pre>
<p>The build typically takes <strong>15–20 minutes,</strong> depending on network speed and package installs. Watch the Packer log for three key checkpoints:</p>
<ul>
<li><p><strong>Instance creation</strong> — confirms the temporary VM was provisioned.</p>
</li>
<li><p><strong>Provisioner output</strong> — shows each script step (updates, reboot, <code>script/base.sh</code>) and any errors.</p>
</li>
<li><p><strong>Image creation</strong> — indicates the build finished and an image artifact was written.</p>
</li>
</ul>
<p>If the build fails, copy the failing provisioner’s log lines and re-run the build after fixing the script or variables. For quick troubleshooting, re-run the failing provisioner locally on a matching test VM to iterate faster.</p>
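<p>When you need more detail, Packer’s debug logging and the <code>-on-error</code> flag help: verbose logs go to a file, and the temporary VM is kept alive on failure so you can SSH in and inspect it. A minimal sketch:</p>
<pre><code class="language-shell"># Verbose logging to a file, and pause (instead of cleaning up) when a provisioner fails
PACKER_LOG=1 PACKER_LOG_PATH=packer.log \
  packer build -on-error=ask -var-file=values.pkrvars.hcl .
</code></pre>
<p>A successful run looks roughly like this:</p>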
<pre><code class="language-plaintext">googlecompute.gpu-node: output will be in this color.

==&gt; googlecompute.gpu-node: Checking image does not exist...
==&gt; googlecompute.gpu-node: Creating temporary RSA SSH key for instance...
==&gt; googlecompute.gpu-node: no persistent disk to create
==&gt; googlecompute.gpu-node: Using image: ubuntu-2404-noble-amd64-v20260225
==&gt; googlecompute.gpu-node: Creating instance...
==&gt; googlecompute.gpu-node: Loading zone: us-central1-a
==&gt; googlecompute.gpu-node: Loading machine type: g2-standard-4
==&gt; googlecompute.gpu-node: Requesting instance creation...
==&gt; googlecompute.gpu-node: Waiting for creation operation to complete...
==&gt; googlecompute.gpu-node: Instance has been created!
==&gt; googlecompute.gpu-node: Waiting for the instance to become running...
==&gt; googlecompute.gpu-node: IP: 34.58.58.214
==&gt; googlecompute.gpu-node: Using SSH communicator to connect: 34.58.58.214
==&gt; googlecompute.gpu-node: Waiting for SSH to become available...
systemd-logind.service
==&gt; googlecompute.gpu-node:  systemctl restart unattended-upgrades.service
==&gt; googlecompute.gpu-node:
==&gt; googlecompute.gpu-node: No containers need to be restarted.
==&gt; googlecompute.gpu-node:
==&gt; googlecompute.gpu-node: User sessions running outdated binaries:
==&gt; googlecompute.gpu-node:  packer @ session #1: sshd[1535]
==&gt; googlecompute.gpu-node:  packer @ user manager service: systemd[1540]
==&gt; googlecompute.gpu-node: Pausing 1m0s before the next provisioner...
==&gt; googlecompute.gpu-node: Provisioning with shell script: script/base.sh
==&gt; googlecompute.gpu-node: [BASE] DRIVER_VERSION : 590.48.01
==&gt; googlecompute.gpu-node: [BASE] CUDA_VERSION   : 13.1
==&gt; googlecompute.gpu-node: [BASE] Updating system packages...
==&gt; googlecompute.gpu-node: [BASE] Installing kernel headers and build tools...
==&gt; googlecompute.gpu-node: [BASE] Installing CUDA Toolkit 13.1...
==&gt; googlecompute.gpu-node: [BASE] Installing DCGM...
==&gt; googlecompute.gpu-node: [BASE] Enabling nvidia-persistenced...
==&gt; googlecompute.gpu-node: [BASE] Applying system tuning...
==&gt; googlecompute.gpu-node: vm.swappiness=0
==&gt; googlecompute.gpu-node: vm.nr_hugepages=2048
==&gt; googlecompute.gpu-node: Setting cpu: 0
==&gt; googlecompute.gpu-node: Error setting new values. Common errors:
==&gt; googlecompute.gpu-node: [BASE] ============================================
==&gt; googlecompute.gpu-node: [BASE] Base layer provisioning complete.
==&gt; googlecompute.gpu-node: [BASE]   OS      : ubuntu2404
==&gt; googlecompute.gpu-node: [BASE]   Driver  : 590.48.01 (open kernel modules, compute-only)
==&gt; googlecompute.gpu-node: [BASE]   CUDA    : cuda-toolkit-13.1
==&gt; googlecompute.gpu-node: [BASE]   DCGM    : 1:3.3.9
==&gt; googlecompute.gpu-node: [BASE] ============================================
==&gt; googlecompute.gpu-node: Deleting instance...
==&gt; googlecompute.gpu-node: Instance has been deleted!
==&gt; googlecompute.gpu-node: Creating image...
==&gt; googlecompute.gpu-node: Deleting disk...
==&gt; googlecompute.gpu-node: Disk has been deleted!
==&gt; googlecompute.gpu-node: Running post-processor:  (type shell-local)
==&gt; googlecompute.gpu-node (shell-local): Running local shell script: 
==&gt; googlecompute.gpu-node (shell-local): === Image Build Complete ===
==&gt; googlecompute.gpu-node (shell-local): Image ID: packer-69b6c2ee-883a-3602-7bb5-059f1ba27c8b
==&gt; googlecompute.gpu-node (shell-local): Sun Mar 15 15:50:09 WAT 2026
Build 'googlecompute.gpu-node' finished after 17 minutes 55 seconds.

==&gt; Wait completed after 17 minutes 55 seconds

==&gt; Builds finished. The artifacts of successful builds are:
--&gt; googlecompute.gpu-node: A disk image was created in the 'my_project-00000' project: base-gpu-image-1773585134
</code></pre>
<h3 id="heading-step-8-test-the-image-and-verify-the-gpu-stack">Step 8: Test the Image and Verify the GPU Stack</h3>
<p>Confirm the image exists in the GCP Console: <strong>Compute → Storage → Images</strong> and locate your newly created OS image.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5eacc4c926e78ca711dfbbdc/90f304eb-3fe7-4304-b2ad-d86701dde607.png" alt="Your Image information on GCP" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Create a test VM from the image:</p>
<pre><code class="language-plaintext">gcloud compute instances create my-gpu-vm \
  --machine-type=g2-standard-4 \
  --accelerator=count=1,type=nvidia-l4 \
  --image=base-gpu-image-1772718104 \
  --image-project=YOUR_PROJECT_ID \
  --boot-disk-size=50GB \
  --maintenance-policy=TERMINATE \
  --restart-on-failure \
  --zone=us-central1-a

Created [https://www.googleapis.com/compute/v1/projects/my-project-000/zones/us-central1-a/instances/my-gpu-vm].
NAME       ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP    EXTERNAL_IP      STATUS
my-gpu-vm  us-central1-a  g2-standard-4               10.128.15.227  104.154.184.217  RUNNING
</code></pre>
<p>Once the instance is <code>RUNNING</code>, verify the NVIDIA driver and GPU are visible:</p>
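<p>One way to do that is to SSH in with <code>gcloud</code> and run <code>nvidia-smi</code>, using the instance name and zone from the command above:</p>
<pre><code class="language-shell"># SSH into the test VM
gcloud compute ssh my-gpu-vm --zone=us-central1-a

# Then, on the instance: driver version, CUDA version, and persistence mode in one view
nvidia-smi
</code></pre>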
<img src="https://cdn.hashnode.com/uploads/covers/5eacc4c926e78ca711dfbbdc/364df8fc-7584-40df-8ab7-b3fe349d5065.png" alt="Output from the Nvidia-SMI command showing Driver and CUDA Version" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<img src="https://cdn.hashnode.com/uploads/covers/5eacc4c926e78ca711dfbbdc/0912c303-3bb0-47fa-aa34-1c91ff26874f.png" alt="Image verifying the persistence mode is enabled" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>The</strong> <code>nvidia-smi</code> <strong>output confirms:</strong></p>
<ul>
<li><p>Driver 590.48.01 loaded</p>
</li>
<li><p>CUDA 13.1 available</p>
</li>
<li><p>Persistence Mode is <code>On</code></p>
</li>
<li><p>The L4 GPU is detected with 23GB VRAM</p>
</li>
<li><p>Zero ECC errors</p>
</li>
<li><p>No running processes (clean idle state).</p>
</li>
</ul>
<p>This is exactly what a healthy base image should look like. Notice <code>Disp.A: Off</code>? That confirms our compute-only driver choice is working — no display adapter is active.</p>
<p>Confirm the installed CUDA toolkit by running <code>nvcc --version</code>. You can see that version 13.1 was installed as specified.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5eacc4c926e78ca711dfbbdc/cc744624-9408-4348-88d7-61da04b5e1d0.png" alt="Output from the NVCC -Version command" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Let's confirm DCGM installation by running <code>dcgmi discovery -l</code>. Successful output indicates DCGM is running and communicating with the driver.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5eacc4c926e78ca711dfbbdc/114996c6-1f28-43d4-a3fa-13aa7ccd2c82.png" alt="Output from the DCGMI dicovery -l command showing device information" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-conclusion">Conclusion</h2>
<p>You now have a production‑grade, GPU‑optimized base image that includes the NVIDIA compute‑only driver built with open kernel modules, DCGM for monitoring, and the CUDA Toolkit. You also applied OS‑level tuning tailored to GPU compute workloads, providing a consistent, reproducible environment with no manual setup.</p>
<p>From here, you can extend the build by adding an application‑layer script to install frameworks such as PyTorch, TensorFlow, or vLLM, or create an instance template that uses this image to scale your GPU infrastructure.</p>
<p>The full Packer project includes additional scripts for training and inference workloads that you can use to extend your image.</p>
<h2 id="heading-references"><strong>References</strong></h2>
<ul>
<li><p>NVIDIA Driver Installation Guide (Ubuntu): <a href="https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/">https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/</a></p>
</li>
<li><p>NVIDIA CUDA Toolkit Documentation: <a href="https://docs.nvidia.com/cuda/">https://docs.nvidia.com/cuda/</a></p>
</li>
<li><p>NVIDIA Container Toolkit Installation Guide: <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html">https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html</a></p>
</li>
<li><p>NVIDIA DCGM Documentation: <a href="https://docs.nvidia.com/datacenter/dcgm/latest/index.html">https://docs.nvidia.com/datacenter/dcgm/latest/index.html</a></p>
</li>
<li><p>NVIDIA Persistence Daemon: <a href="https://docs.nvidia.com/deploy/driver-persistence/index.html">https://docs.nvidia.com/deploy/driver-persistence/index.html</a></p>
</li>
<li><p>HashiCorp Packer Documentation: <a href="https://developer.hashicorp.com/packer/docs">https://developer.hashicorp.com/packer/docs</a></p>
</li>
<li><p>Packer Google Compute Builder: <a href="https://developer.hashicorp.com/packer/integrations/hashicorp/googlecompute">https://developer.hashicorp.com/packer/integrations/hashicorp/googlecompute</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Why Chrome OS Is the Operating System the AI Era Was Built For ]]>
                </title>
                <description>
                    <![CDATA[ Chrome OS runs on a read-only filesystem. You can't install executables on the host. There's no traditional desktop environment. Everything that interacts with the underlying system does so through a  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/why-chrome-os-is-the-ai-os/</link>
                <guid isPermaLink="false">69e2765cfd22b8ad62611ba8</guid>
                
                    <category>
                        <![CDATA[ Chrome OS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Linux ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cybersecurity ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Fri, 17 Apr 2026 18:05:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c4116a06-9e42-4da5-a152-0fe1433e0857.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Chrome OS runs on a read-only filesystem. You can't install executables on the host. There's no traditional desktop environment. Everything that interacts with the underlying system does so through a sandboxed browser, a containerized Linux terminal, or a cloud connection.</p>
<p>For years, that list of constraints was the reason people dismissed it. But in 2026, it's the reason Chrome OS might be the most correctly designed operating system for what's coming.</p>
<p>The security architecture treats the endpoint as untrusted by default. The containerized Linux environment gives developers a full headless stack without compromising the host. And an upcoming OS-level rewrite, Aluminium, puts Google's on-device AI models directly into the kernel.</p>
<p>This article covers security architecture, the container-based developer environment, cloud-streamed creative tools via AWS NICE DCV, cloud gaming, and what Aluminium OS means for on-device AI.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ol>
<li><p><a href="#heading-security-first-architecture-in-the-era-of-ai-powered-threats">Security-First Architecture in an Era of AI-Powered Threats</a></p>
</li>
<li><p><a href="#heading-a-headless-linux-stack-thats-more-flexible-than-it-looks">A Headless Linux Stack That's More Flexible Than It Looks</a></p>
</li>
<li><p><a href="#heading-aws-nice-dcv-changes-the-creative-tools-conversation">AWS NICE DCV Changes the Creative Tools Conversation</a></p>
</li>
<li><p><a href="#heading-cloud-gaming-works">Cloud Gaming Works</a></p>
</li>
<li><p><a href="#heading-aluminum-os-on-device-models-on-googles-own-architecture">Aluminium OS: On-Device Models on Google's Own Architecture</a></p>
</li>
<li><p><a href="#heading-where-this-lands">Where This Lands</a></p>
</li>
</ol>
<h2 id="heading-security-first-architecture-in-an-era-of-ai-powered-threats">Security-First Architecture in an Era of AI-Powered Threats</h2>
<p>Threat actors are getting better tools. Models like Mythos are lowering the barrier for generating convincing phishing campaigns, crafting polymorphic malware, and automating social engineering at scale.</p>
<p>Traditional operating systems present exactly the attack surface these tools target: writable system files, user-installable executables, patches that sit uninstalled for weeks because someone clicked "remind me later."</p>
<p>Chrome OS sidesteps most of this by design. The root filesystem is read-only and cryptographically verified on every boot through a process called Verified Boot.</p>
<p>If anything has modified the OS files since the last verified state, whether that's malware, a compromised package, or a rogue AI agent that decided to start deleting system files, the device detects it at startup and either self-corrects or refuses to boot.</p>
<p>Persistence across reboots isn't difficult. It's architecturally impossible through software alone.</p>
<p>Updates happen silently. While you're working, the system downloads the next OS version to an inactive partition. On your next reboot, it pivots to the updated version. No prompts, no deferred patches, no exposure window.</p>
<p>Major updates ship every four to six weeks. Security patches land every two to three weeks. The gap between vulnerability discovery and remediation is measured in days.</p>
<p>Chrome OS consistently doesn't appear in the top 50 products by CVE count in the NIST vulnerability database. Windows and the Linux kernel sit near the top every year. When AI is actively being weaponized to find and exploit vulnerabilities faster than humans can patch them, a read-only, verified, automatically updated endpoint is a different category of security posture.</p>
<p>The tradeoff is trust. Chrome OS's security model means trusting Google as the root authority for your entire computing stack: updates, certificate trust, telemetry. Organizations with strict data sovereignty requirements should weigh that dependency carefully.</p>
<h2 id="heading-a-headless-linux-stack-thats-more-flexible-than-it-looks">A Headless Linux Stack That's More Flexible Than It Looks</h2>
<p>Chrome OS is a text-based operating system. There's no native GUI layer. Stop and sit with that for a second, because it's the thing that makes people dismiss Chrome OS and also the thing that makes it work.</p>
<p>The entire graphical interface you interact with IS the Chrome browser. The Ash shell, Chrome's window manager, is the desktop. You don't install applications onto it the way you install .exe files on Windows or drag .app bundles into a macOS Applications folder. If it isn't running in a browser tab, an Android VM, or a Linux container, it doesn't run. That restriction is what keeps the host locked down, and it's what makes everything else possible.</p>
<p>Under the hood, Chrome OS runs a minimal virtual machine called Termina through crosvm, Google's Rust-based VM monitor.</p>
<p>Inside Termina, LXD manages Linux containers. The default container, penguin, is a Debian environment with a special trick: it bridges GUI-based Linux applications directly into the Chrome OS desktop through a Wayland proxy called Sommelier. Install VS Code, GIMP, or LibreOffice in penguin and they show up in your Chrome OS app launcher, running in windows alongside your browser tabs. For a lot of developers, penguin alone covers the daily workflow.</p>
<p>But Termina gives you more than penguin. Through the LXD layer you can spin up independent containers that are fully isolated operating systems: Arch, Alpine, Ubuntu, whatever you need.</p>
<p>These aren't attached to the GUI bridge. They run headless, natively, with their own systemd, their own package managers, their own persistent state. Need a clean Ubuntu environment to test a deployment script without touching your main setup? <code>lxc launch</code> and you're there. Need to blow it away? <code>lxc delete</code> and it's gone. No orphaned files on the host, no cross-contamination between environments.</p>
<p>The key distinction from Docker is that LXD runs system containers (full OS emulation) rather than application containers. You get background services, persistent daemons, the works. You can also run Docker inside any of these LXD containers if you need application-level containerization on top of that.</p>
<p>Snapshot your entire environment with <code>lxc snapshot</code> before a risky dependency install and roll back instantly if something breaks. That kind of safety net is broader than version control alone: it captures your full OS configuration, not just code.</p>
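<p>As a sketch of that workflow (the container and snapshot names here are arbitrary, and the <code>ubuntu:</code> image alias assumes the standard LXD image remote is configured):</p>
<pre><code class="language-bash"># Launch a disposable Ubuntu system container inside Termina
lxc launch ubuntu:24.04 scratch
lxc exec scratch -- bash        # a full OS userspace with its own package manager

# Snapshot before a risky change, restore if it goes wrong
lxc snapshot scratch pre-upgrade
lxc restore scratch pre-upgrade

# Tear the whole environment down when finished
lxc delete scratch --force
</code></pre>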
<p>Pair this with browser-native tools like GitHub Codespaces, Google Colab, AWS CloudShell, or vscode.dev, and the terminal handles your local tooling while the browser handles everything else.</p>
<p>AI coding assistants like Claude and Gemini already operate natively in the browser. The distance between "cloud IDE" and "local IDE" keeps shrinking.</p>
<p>There are friction points: no custom kernel modules inside Crostini. Nested KVM requires Intel Gen 10+ processors. VPN routing into the Linux container from the Chrome OS host can be a headache, with WireGuard requiring userspace workarounds inside the container.</p>
<p>But none of these break the core architecture for cloud-native work. They're just worth knowing about before you commit.</p>
<h2 id="heading-aws-nice-dcv-changes-the-creative-tools-conversation">AWS NICE DCV Changes the Creative Tools Conversation</h2>
<p>One of the longest-standing arguments against Chrome OS has been the absence of professional creative software. There's no Premiere, no DaVinci Resolve, no Blender, no Ableton. For years, this was a dead-end conversation.</p>
<p>AWS NICE DCV (Desktop Cloud Visualization) reopens it. DCV is a high-performance remote display protocol that streams GPU-accelerated desktop sessions from EC2 instances to any device, including a Chromebook running the browser-based DCV client. It supports OpenGL, Vulkan, and DirectX rendering, with adaptive encoding that adjusts to network conditions. On AWS, the DCV license is free. You pay only for the EC2 compute time.</p>
<p>Netflix engineers use DCV to stream content creation applications to remote artists. Volkswagen runs 3D CAD simulations across their engineering division through it. A VFX studio called RVX used it to deliver visual effects for HBO's The Last of Us, streaming Nuke, Maya, Houdini, and Blender to artists distributed across Europe from servers in Iceland. Their team said it was the best remote experience they'd ever worked with.</p>
<p>So: a Chromebook connected to a g5.xlarge EC2 instance (one A10G GPU) can run Blender, DaVinci Resolve, or any other GPU-accelerated creative application with full hardware acceleration. The rendering happens in the data center. DCV streams the pixels. The creative professional gets a responsive, high-fidelity workspace on a $400 machine that couldn't locally render a single frame.</p>
<p>The constraints are connectivity and cost. You need sustained bandwidth (25+ Mbps for 1080p work, more for 4K multi-monitor setups) and leaving a GPU instance running around the clock adds up. But for studios and professionals who already budget for high-end workstations, the math often pencils out, especially when you factor in zero local hardware maintenance and the ability to scale GPU power on demand.</p>
<h2 id="heading-cloud-gaming-works">Cloud Gaming Works</h2>
<p>GeForce NOW survived where Stadia failed because it made a better business decision: bring your own games. Connect your existing Steam, Epic, or Ubisoft library and stream from NVIDIA's server-side hardware. The Ultimate tier now runs on RTX 5080-class infrastructure. 4K at 120fps with ray tracing, on a fanless Chromebook.</p>
<p>Chrome OS has a structural advantage as a cloud gaming client. GeForce NOW runs natively in the Chromium browser via WebRTC, and users consistently report less micro-stuttering and tighter input handling than the standalone Windows desktop app. Under good network conditions, measured total latency runs 13 to 14ms, with sub-3ms ping documented near datacenter proximity. That's below human perceptual threshold for most game types.</p>
<p>Anti-cheat systems like Easy Anti-Cheat and Riot Vanguard are a non-issue in this model. They run on the server where the game executes, not on your local endpoint. On-device gaming isn't viable on Chrome OS and likely never will be. The architecture isn't designed for it, and even projects attempting to bridge local GPUs hit bottlenecks in the container layers. Cloud gaming is the path, and it works.</p>
<p>The limiting factors are network-dependent. Latency spikes above 500ms on bad connections make fast-twitch games unplayable, and NVIDIA's 100-hour monthly cap on the Ultimate tier has drawn criticism. But cloud gaming on Chrome OS has crossed the line from novelty to daily-driver viable for most use cases.</p>
<h2 id="heading-aluminium-os-on-device-models-on-googles-own-architecture">Aluminium OS: On-Device Models on Google's Own Architecture</h2>
<p>The most consequential near-term development for Chrome OS is Project Aluminium, a ground-up rewrite that replaces the current Chrome OS foundation with a native Android kernel. Not another bolted-on compatibility layer: a new operating system built on Android 16, designed to run Android applications natively with direct hardware acceleration instead of routing them through the resource-heavy ARCVM virtual machine that currently eats CPU cycles on even basic app launches.</p>
<p>The AI story is the real story. Aluminium is being built with Gemini models integrated directly into the OS: the file system, the application launcher, the window manager.</p>
<p>Google serving their own proprietary models on their own devices, using an architecture optimized specifically to run them, is a level of vertical integration that no other OS vendor has in the pipeline. Apple has the silicon advantage for local inference. Google has the model-to-OS integration advantage. Those are competing theses about where AI compute should live, and both are worth taking seriously.</p>
<p>The rollout timeline from court documents and leaked roadmaps puts a trusted tester program on select hardware in late 2026, premium tablets by early 2027, and general consumer availability in 2028. Chrome OS Classic gets maintained through existing support obligations until 2033 or 2034.</p>
<p>The launch won't be perfect. Google's track record on platform transitions gives the community earned skepticism. But the ability to iterate a natively AI-integrated OS on hardware they control is the kind of capability that compounds over time.</p>
<h2 id="heading-where-this-lands">Where This Lands</h2>
<p>Two years ago, calling Chrome OS a serious platform for development or creative work would have been a stretch. Today you can run a full Debian environment with systemd daemons, snapshot your workspace, stream Blender from a GPU-backed data center, play AAA games at 4K on hardware you don't own, and do all of it from a verified, read-only endpoint that patches itself while you sleep.</p>
<p>The remaining gaps are real. But they're concentrated in workflows that are themselves moving to the cloud. Chrome OS was designed around assumptions about computing that used to be premature. They're not premature anymore.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Authenticate Users in Kubernetes: x509 Certificates, OIDC, and Cloud Identity ]]>
                </title>
                <description>
                    <![CDATA[ Kubernetes doesn't know who you are. It has no user database, no built-in login system, no password file. When you run kubectl get pods, Kubernetes receives an HTTP request and asks one question: who  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-authenticate-users-in-kubernetes-x509-certificates-oidc-and-cloud-identity/</link>
                <guid isPermaLink="false">69d4182f40c9cabf4484dbdb</guid>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                    <category>
                        <![CDATA[ authentication ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Security ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Destiny Erhabor ]]>
                </dc:creator>
                <pubDate>Mon, 06 Apr 2026 20:31:43 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/36356282-0cfb-43a8-8461-84f20e64b041.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Kubernetes doesn't know who you are.</p>
<p>It has no user database, no built-in login system, no password file. When you run <code>kubectl get pods</code>, Kubernetes receives an HTTP request and asks one question: who signed this, and do I trust that signature? Everything else — what you're allowed to do, which namespaces you can access, whether your request goes through at all — comes after that question is answered.</p>
<p>This surprises most engineers who are new to Kubernetes. They expect something like a database of users with passwords. Instead, they find a pluggable chain of authenticators, each one able to vouch for a request in a different way:</p>
<ul>
<li><p>Client certificates</p>
</li>
<li><p>OIDC tokens from an external identity provider</p>
</li>
<li><p>Cloud provider IAM tokens</p>
</li>
<li><p>Service account tokens projected into pods.</p>
</li>
</ul>
<p>Any of these can be active at the same time.</p>
<p>Understanding this model is what separates engineers who can debug authentication failures from engineers who copy kubeconfig files and hope for the best.</p>
<p>In this article, you'll work through how the Kubernetes authentication chain works from first principles. You'll see how x509 client certificates are used — and why they're a poor choice for human users in production. You'll configure OIDC authentication with Dex, giving your cluster a real browser-based login flow. And you'll see how AWS, GCP, and Azure each plug into the same underlying model.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>A running kind cluster — a fresh one works fine, or reuse an existing one</p>
</li>
<li><p><code>kubectl</code> and <code>helm</code> installed</p>
</li>
<li><p><code>openssl</code> available on your machine (comes pre-installed on macOS and most Linux distros)</p>
</li>
<li><p>Basic familiarity with what a JWT is (a signed JSON object with claims) — you don't need to be able to write one, just recognise one</p>
</li>
</ul>
<p>All demo files are in the <a href="https://github.com/Caesarsage/DevOps-Cloud-Projects/tree/main/intermediate/k8/security">companion GitHub repository</a>.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-how-kubernetes-authentication-works">How Kubernetes Authentication Works</a></p>
<ul>
<li><p><a href="#heading-the-authenticator-chain">The Authenticator Chain</a></p>
</li>
<li><p><a href="#heading-users-vs-service-accounts">Users vs Service Accounts</a></p>
</li>
<li><p><a href="#heading-what-happens-after-authentication">What Happens After Authentication</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-how-to-use-x509-client-certificates">How to Use x509 Client Certificates</a></p>
<ul>
<li><p><a href="#heading-how-the-certificate-maps-to-an-identity">How the Certificate Maps to an Identity</a></p>
</li>
<li><p><a href="#the-cluster-ca">The Cluster CA</a></p>
</li>
<li><p><a href="#heading-the-limits-of-certificate-based-auth">The Limits of Certificate-Based Auth</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-demo-1--create-and-use-an-x509-client-certificate">Demo 1 — Create and Use an x509 Client Certificate</a></p>
</li>
<li><p><a href="#heading-how-to-set-up-oidc-authentication">How to Set Up OIDC Authentication</a></p>
<ul>
<li><p><a href="#heading-how-the-oidc-flow-works-in-kubernetes">How the OIDC Flow Works in Kubernetes</a></p>
</li>
<li><p><a href="#heading-the-api-server-configuration">The API Server Configuration</a></p>
</li>
<li><p><a href="#heading-jwt-claims-kubernetes-uses">JWT Claims Kubernetes Uses</a></p>
</li>
<li><p><a href="#heading-how-kubelogin-works">How kubelogin Works</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-demo-2--configure-oidc-login-with-dex-and-kubelogin">Demo 2 — Configure OIDC Login with Dex and kubelogin</a></p>
</li>
<li><p><a href="#heading-cloud-provider-authentication">Cloud Provider Authentication</a></p>
<ul>
<li><p><a href="#heading-aws-eks">AWS EKS</a></p>
</li>
<li><p><a href="#heading-google-gke">Google GKE</a></p>
</li>
<li><p><a href="#heading-azure-aks">Azure AKS</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-webhook-token-authentication">Webhook Token Authentication</a></p>
</li>
<li><p><a href="#heading-cleanup">Cleanup</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-how-kubernetes-authentication-works">How Kubernetes Authentication Works</h2>
<p>Every request that reaches the Kubernetes API server — whether from <code>kubectl</code>, a pod, a controller, or a CI pipeline — carries a credential of some kind.</p>
<p>The API server passes that credential through a chain of authenticators in sequence. The first authenticator that can verify the credential wins. If none can, the request is treated as anonymous.</p>
<h3 id="heading-the-authenticator-chain">The Authenticator Chain</h3>
<p>Kubernetes supports several authentication strategies simultaneously. You can have client certificate authentication and OIDC authentication active on the same cluster at the same time, which is common in production: cluster administrators use certificates, regular developers use OIDC. The strategies active on a cluster are determined by flags passed to the <code>kube-apiserver</code> process.</p>
<p>The strategies available are x509 client certificates, bearer tokens (static token files — rarely used in production), bootstrap tokens (used during node join operations), service account tokens, OIDC tokens, authenticating proxies, and webhook token authentication. A cluster doesn't have to use all of them, and most don't. But knowing they all exist helps when you're diagnosing an auth failure.</p>
<h3 id="heading-users-vs-service-accounts">Users vs Service Accounts</h3>
<p>There is an important distinction in how Kubernetes thinks about identity. Service accounts are Kubernetes objects — they live in a namespace, get created with <code>kubectl create serviceaccount</code>, and have tokens managed by the cluster itself. Every pod runs as a service account. These are machine identities for workloads.</p>
<p>Users, on the other hand, don't exist as Kubernetes objects at all. There is no <code>kubectl create user</code> command. Kubernetes doesn't manage user accounts. Instead, it trusts external systems to assert user identity — a certificate authority, an OIDC provider, or a cloud provider's IAM system. Kubernetes just verifies the assertion and extracts the username and group memberships from it.</p>
<table>
<thead>
<tr>
<th></th>
<th>Service Account</th>
<th>User</th>
</tr>
</thead>
<tbody><tr>
<td>Kubernetes object?</td>
<td>Yes — lives in a namespace</td>
<td>No — managed externally</td>
</tr>
<tr>
<td>Created with</td>
<td><code>kubectl create serviceaccount</code></td>
<td>External system (CA, IdP, cloud IAM)</td>
</tr>
<tr>
<td>Used by</td>
<td>Pods and workloads</td>
<td>Humans and CI systems</td>
</tr>
<tr>
<td>Token managed by</td>
<td>Kubernetes</td>
<td>External system</td>
</tr>
<tr>
<td>Namespaced?</td>
<td>Yes</td>
<td>No</td>
</tr>
</tbody></table>
<h3 id="heading-what-happens-after-authentication">What Happens After Authentication</h3>
<p>Authentication only answers one question: who is this? Once the API server has a verified identity — a username and zero or more group memberships — it passes the request to the authorisation layer. By default that is RBAC, which checks the identity against Role and ClusterRole bindings to determine what the request is allowed to do.</p>
<p>This is why authentication and authorisation are separate concerns in Kubernetes. A valid certificate gets you past the front door. What you can do inside is RBAC's job. An authenticated user with no RBAC bindings can authenticate successfully but will be denied every API call.</p>
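<p>You can see this split directly with <code>kubectl auth can-i</code>. A sketch, assuming you have impersonation rights on the cluster and using <code>jane</code> as an example identity:</p>
<pre><code class="language-bash"># jane can authenticate, but with no RBAC bindings every call is denied
kubectl auth can-i list pods --as jane
# no

# Grant a read-only role, and the same check now passes
kubectl create rolebinding jane-view --clusterrole=view --user=jane -n default
kubectl auth can-i list pods --as jane -n default
# yes
</code></pre>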
<p>If you want a deep dive into how RBAC rules, roles, and bindings work, check out this handbook on <a href="https://www.freecodecamp.org/news/how-to-secure-a-kubernetes-cluster-handbook/">How to Secure a Kubernetes Cluster: RBAC, Pod Hardening, and Runtime Protection</a>.</p>
<h2 id="heading-how-to-use-x509-client-certificates">How to Use x509 Client Certificates</h2>
<p>x509 client certificate authentication is the oldest and simplest authentication method in Kubernetes. It's how <code>kubectl</code> works out of the box when you create a cluster — the kubeconfig file that <code>kind</code> or <code>kubeadm</code> generates contains an embedded client certificate signed by the cluster's Certificate Authority.</p>
<h3 id="heading-how-the-certificate-maps-to-an-identity">How the Certificate Maps to an Identity</h3>
<p>When the API server receives a request with a client certificate, it validates the certificate against its trusted CA, then reads two fields (the Common Name and the Organization) from the certificate to construct an identity.</p>
<p>The <strong>Common Name (CN)</strong> field becomes the username. The <strong>Organization (O)</strong> field, which can contain multiple values, becomes the list of groups the user belongs to.</p>
<p>So a certificate with <code>CN=jane</code> and <code>O=engineering</code> authenticates as username <code>jane</code> in group <code>engineering</code>. If you want to give <code>jane</code> permissions, you create a RoleBinding that references either the username <code>jane</code> or the group <code>engineering</code> as a subject.</p>
<p>This is the same mechanism behind <code>system:masters</code>. When <code>kind</code> creates a cluster and writes a kubeconfig for you, it generates a certificate with <code>O=system:masters</code>. Kubernetes has a built-in ClusterRoleBinding that grants <code>cluster-admin</code> to anyone in the <code>system:masters</code> group. That's why your default kubeconfig has full admin access — it's not magic, it's a certificate with the right group.</p>
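<p>You can verify this on your own machine by decoding the client certificate embedded in the kind-generated kubeconfig (this assumes the admin user is the first entry in your kubeconfig; adjust the index if you have several):</p>
<pre><code class="language-bash"># Pull the embedded client cert out of the kubeconfig and print its subject
# (on older macOS, replace "base64 -d" with "base64 -D")
kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' \
  | base64 -d | openssl x509 -noout -subject

# Typical output on a kubeadm/kind cluster:
# subject=O = system:masters, CN = kubernetes-admin
</code></pre>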
<h3 id="heading-the-cluster-ca">The Cluster CA</h3>
<p>Every Kubernetes cluster has a root Certificate Authority — a private key and a self-signed certificate that the API server trusts. Any client certificate signed by this CA is trusted by the cluster.</p>
<p>The CA certificate and key are typically stored in <code>/etc/kubernetes/pki/</code> on the control plane node, or in the <code>kube-system</code> namespace as a secret, depending on how the cluster was created.</p>
<p>On kind clusters, you can copy the CA cert and key directly from the control plane container:</p>
<pre><code class="language-bash">docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.crt ./ca.crt
docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.key ./ca.key
</code></pre>
<p>Whoever holds the CA key can issue certificates for any username and any group, including <code>system:masters</code>. This makes the CA key the most sensitive secret in a Kubernetes cluster. Guard it accordingly.</p>
<h3 id="heading-the-limits-of-certificate-based-auth">The Limits of Certificate-Based Auth</h3>
<p>Client certificates work, but they have two fundamental problems that make them a poor choice for human users in production.</p>
<p>The first is that <strong>Kubernetes doesn't check certificate revocation lists (CRLs)</strong>. If a developer's kubeconfig is stolen, the embedded certificate remains valid until it expires — which is typically one year in most Kubernetes setups. There's no way to immediately invalidate it. You can't "log out" a certificate. The only mitigation is to rotate the entire cluster CA, which invalidates every certificate including those belonging to other legitimate users.</p>
<p>The second is <strong>operational overhead</strong>. Certificates must be generated, distributed to users, and rotated before expiry. There's no self-service. In a team of ten engineers, managing certificates is annoying. In a team of a hundred, it's a full-time job.</p>
<p>For human access in production, OIDC is the right answer: short-lived tokens issued by a trusted identity provider, with a central revocation mechanism, and a standard browser-based login flow. Certificates are fine for service accounts and automation, where token management can be automated and rotation is handled programmatically.</p>
<p>That said, understanding certificates isn't optional. Your kubeconfig uses one. Your CI system probably does too. And cert-based auth is what you fall back to when everything else breaks.</p>
<h2 id="heading-demo-1-create-and-use-an-x509-client-certificate">Demo 1 — Create and Use an x509 Client Certificate</h2>
<p>In this section, you'll generate a user certificate signed by the cluster CA, bind it to an RBAC role, and use it to authenticate to the cluster as a different user.</p>
<p><strong>This guide is for local development and learning only.</strong> Manually signing certificates with the cluster CA and storing keys on disk is done here for simplicity.</p>
<p>In production, you should use the Kubernetes CertificateSigningRequest API or cert-manager for certificate issuance, enforce short-lived certificates with automatic rotation, and store private keys in a secrets manager (HashiCorp Vault, AWS Secrets Manager) or hardware security module (HSM) — never distribute the cluster CA key.</p>
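<p>For reference, the CSR API route looks roughly like this. It's a sketch, not part of the demo below, and it assumes a CSR file like the <code>jane.csr</code> you'll generate in Step 2:</p>
<pre><code class="language-bash"># Submit the CSR through the Kubernetes API instead of touching the CA key
# (base64 -w0 is GNU; on macOS use plain base64)
cat &lt;&lt;EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: jane
spec:
  request: $(base64 -w0 &lt; jane.csr)
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400      # one day, deliberately short-lived
  usages:
    - client auth
EOF

# An administrator approves it, then the signed certificate can be fetched from the object
kubectl certificate approve jane
kubectl get csr jane -o jsonpath='{.status.certificate}' | base64 -d &gt; jane.crt
</code></pre>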
<h3 id="heading-step-1-copy-the-ca-cert-and-key-from-the-kind-control-plane">Step 1: Copy the CA cert and key from the kind control plane</h3>
<pre><code class="language-bash">docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.crt ./ca.crt
docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.key ./ca.key
</code></pre>
<p>This will create two files in your current directory: <code>ca.crt</code> and <code>ca.key</code>.</p>
<h3 id="heading-step-2-generate-a-private-key-and-csr-for-a-new-user">Step 2: Generate a private key and CSR for a new user</h3>
<p>You're creating a certificate for a user named <code>jane</code> in the <code>engineering</code> group:</p>
<pre><code class="language-bash"># Generate the private key
openssl genrsa -out jane.key 2048

# Generate a Certificate Signing Request
# CN = username, O = group
openssl req -new \
  -key jane.key \
  -out jane.csr \
  -subj "/CN=jane/O=engineering"
</code></pre>
<h3 id="heading-step-3-sign-the-csr-with-the-cluster-ca">Step 3: Sign the CSR with the cluster CA</h3>
<pre><code class="language-bash">openssl x509 -req \
  -in jane.csr \
  -CA ca.crt \
  -CAkey ca.key \
  -CAcreateserial \
  -out jane.crt \
  -days 365
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Certificate request self-signature ok
subject=CN=jane, O=engineering
</code></pre>
<h3 id="heading-step-4-inspect-the-certificate">Step 4: Inspect the certificate</h3>
<p>Before using it, confirm the identity it carries:</p>
<pre><code class="language-bash">openssl x509 -in jane.crt -noout -subject -dates
</code></pre>
<pre><code class="language-plaintext">subject=CN=jane, O=engineering
notBefore=Mar 20 10:00:00 2024 GMT
notAfter=Mar 20 10:00:00 2025 GMT
</code></pre>
<p>One year from now, this certificate becomes invalid and must be replaced. There's no way to extend it — you have to issue a new one.</p>
<h3 id="heading-step-5-build-a-kubeconfig-entry-for-jane">Step 5: Build a kubeconfig entry for jane</h3>
<pre><code class="language-bash"># Get the cluster API server address from the current context
APISERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')

# Create a kubeconfig for jane
kubectl config set-cluster k8s-security \
  --server=$APISERVER \
  --certificate-authority=ca.crt \
  --embed-certs=true \
  --kubeconfig=jane.kubeconfig

kubectl config set-credentials jane \
  --client-certificate=jane.crt \
  --client-key=jane.key \
  --embed-certs=true \
  --kubeconfig=jane.kubeconfig

kubectl config set-context jane@k8s-security \
  --cluster=k8s-security \
  --user=jane \
  --kubeconfig=jane.kubeconfig

kubectl config use-context jane@k8s-security \
  --kubeconfig=jane.kubeconfig
</code></pre>
<h3 id="heading-step-6-test-authentication-before-rbac">Step 6: Test authentication — before RBAC</h3>
<p>Try to list pods using jane's kubeconfig:</p>
<pre><code class="language-bash">kubectl get pods -n staging --kubeconfig=jane.kubeconfig
</code></pre>
<pre><code class="language-plaintext">Error from server (Forbidden): pods is forbidden: User "jane" cannot list
resource "pods" in API group "" in the namespace "staging"
</code></pre>
<p>This is correct. Jane authenticated successfully — Kubernetes knows who she is. But she has no RBAC bindings, so every API call is denied. Authentication passed, but authorisation failed.</p>
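<p>You can confirm the same decision without issuing a real request by asking the authorisation layer directly:</p>
<pre><code class="language-bash"># SelfSubjectAccessReview is allowed for any authenticated user, so jane can ask about herself
kubectl auth can-i list pods -n staging --kubeconfig=jane.kubeconfig
# Prints "no" until a binding exists
</code></pre>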
<h3 id="heading-step-7-grant-jane-access-with-rbac">Step 7: Grant jane access with RBAC</h3>
<p>RBAC bindings use the username exactly as it appears in the certificate's CN field. If you need a refresher on how Roles, ClusterRoles, and RoleBindings work, this handbook <a href="https://www.freecodecamp.org/news/how-to-secure-a-kubernetes-cluster-handbook/">How to Secure a Kubernetes Cluster: RBAC, Pod Hardening, and Runtime Protection</a> covers the full RBAC model. For now, a simple RoleBinding using the built-in <code>view</code> ClusterRole is enough:</p>
<pre><code class="language-yaml"># jane-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jane-reader
  namespace: staging
subjects:
  - kind: User
    name: jane          # matches the CN in the certificate
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
</code></pre>
<pre><code class="language-bash">kubectl apply -f jane-rolebinding.yaml
kubectl get pods -n staging --kubeconfig=jane.kubeconfig
</code></pre>
<pre><code class="language-plaintext">No resources found in staging namespace.
</code></pre>
<p>No error — jane can now list pods in <code>staging</code>. She can't delete them, create them, or access other namespaces. The certificate got her in. RBAC determines what she can do.</p>
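<p>If your cluster is recent enough, <code>kubectl auth whoami</code> (backed by the SelfSubjectReview API, GA in v1.28) shows the identity exactly as the API server resolved it from the certificate (output abbreviated here):</p>
<pre><code class="language-bash">kubectl auth whoami --kubeconfig=jane.kubeconfig

# ATTRIBUTE   VALUE
# Username    jane
# Groups      [engineering system:authenticated]
</code></pre>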
<h2 id="heading-how-to-set-up-oidc-authentication">How to Set Up OIDC Authentication</h2>
<p>OpenID Connect is an identity layer on top of OAuth 2.0. It's how Kubernetes integrates with enterprise identity providers — Active Directory, Okta, Google Workspace, Keycloak, and any other provider that speaks OIDC. Understanding how Kubernetes uses it requires following the token from the user's browser to the API server's decision.</p>
<h3 id="heading-how-the-oidc-flow-works-in-kubernetes">How the OIDC Flow Works in Kubernetes</h3>
<p>When a developer runs <code>kubectl get pods</code> with OIDC configured, the following happens:</p>
<ol>
<li><p><code>kubectl</code> checks whether the current credential in the kubeconfig is a valid, unexpired OIDC token</p>
</li>
<li><p>If not, it launches <code>kubelogin</code>, a kubectl plugin that opens a browser window</p>
</li>
<li><p>The browser redirects to the OIDC provider (Dex, Okta, your corporate IdP)</p>
</li>
<li><p>The user logs in with their corporate credentials</p>
</li>
<li><p>The OIDC provider issues a signed JWT and returns it to kubelogin</p>
</li>
<li><p>kubelogin caches the token locally (under <code>~/.kube/cache/oidc-login/</code>) and returns it to <code>kubectl</code></p>
</li>
<li><p><code>kubectl</code> sends the token to the API server as a <code>Bearer</code> header</p>
</li>
<li><p>The API server fetches the provider's public keys from its JWKS endpoint and verifies the token signature</p>
</li>
<li><p>If valid, the API server extracts the username and group claims from the token</p>
</li>
<li><p>RBAC takes over from there</p>
</li>
</ol>
<p>The Kubernetes API server never contacts the OIDC provider for each request. It only fetches the provider's public keys periodically to verify signatures locally. This makes OIDC authentication stateless and scalable.</p>
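<p>You can see what that periodic fetch involves by requesting the provider's discovery document yourself. The hostname below is a hypothetical issuer, and the <code>/keys</code> path is Dex's default; other providers advertise their own <code>jwks_uri</code>:</p>
<pre><code class="language-bash"># The discovery document advertises where the signing keys live (the jwks_uri field)
curl -s https://dex.example.com/.well-known/openid-configuration

# For Dex, the keys are served at &lt;issuer&gt;/keys
curl -s https://dex.example.com/keys
</code></pre>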
<h3 id="heading-the-api-server-configuration">The API Server Configuration</h3>
<p>For OIDC to work, the API server needs to know where to find the identity provider and how to interpret the tokens it issues.</p>
<p>In Kubernetes v1.30+, this is configured through an <code>AuthenticationConfiguration</code> file passed via the <code>--authentication-config</code> flag. (In older versions, individual <code>--oidc-*</code> flags were used instead, but these were removed in v1.35.)</p>
<p>The <code>AuthenticationConfiguration</code> defines OIDC providers under the <code>jwt</code> key:</p>
<table>
<thead>
<tr>
<th>Field</th>
<th>What it does</th>
<th>Example</th>
</tr>
</thead>
<tbody><tr>
<td><code>issuer.url</code></td>
<td>The OIDC provider's base URL — must match the <code>iss</code> claim in the token</td>
<td><code>https://dex.example.com</code></td>
</tr>
<tr>
<td><code>issuer.audiences</code></td>
<td>The client IDs the token was issued for — must match the <code>aud</code> claim</td>
<td><code>["kubernetes"]</code></td>
</tr>
<tr>
<td><code>issuer.certificateAuthority</code></td>
<td>CA certificate to trust when contacting the OIDC provider (inlined PEM)</td>
<td><code>-----BEGIN CERTIFICATE-----...</code></td>
</tr>
<tr>
<td><code>claimMappings.username.claim</code></td>
<td>Which JWT claim to use as the Kubernetes username</td>
<td><code>email</code></td>
</tr>
<tr>
<td><code>claimMappings.groups.claim</code></td>
<td>Which JWT claim to use as the Kubernetes group list</td>
<td><code>groups</code></td>
</tr>
<tr>
<td><code>claimMappings.*.prefix</code></td>
<td>Prefix added to the claim value — set to <code>""</code> for no prefix</td>
<td><code>""</code></td>
</tr>
</tbody></table>
<p>On a kind cluster, the <code>--authentication-config</code> flag is set in the cluster configuration before creation, not after. You'll see this in the next demo.</p>
<h3 id="heading-jwt-claims-kubernetes-uses">JWT Claims Kubernetes Uses</h3>
<p>A JWT is a signed JSON object with three sections: a header, a payload, and a signature. The payload is a set of claims – key-value pairs that assert facts about the token. Kubernetes reads specific claims from the payload to build an identity.</p>
<p>The required claims are <code>iss</code> (the issuer URL, must match <code>issuer.url</code> in the <code>AuthenticationConfiguration</code>), <code>sub</code> (the subject, a unique identifier for the user), and <code>aud</code> (the audience, must match the <code>issuer.audiences</code> list). The <code>exp</code> claim (expiry time) is also required as the API server rejects expired tokens.</p>
<p>The most useful optional claim is <code>groups</code> (or whatever you configure via <code>claimMappings.groups.claim</code>). When this claim is present, Kubernetes can map OIDC group memberships directly to RBAC group bindings. A user in the <code>platform-engineers</code> group in your identity provider automatically gets the RBAC permissions you've bound to that group in Kubernetes — no manual user management required.</p>
<h3 id="heading-how-kubelogin-works">How kubelogin Works</h3>
<p>kubelogin (also distributed as <code>kubectl oidc-login</code>) is a kubectl credential plugin. Instead of embedding a static certificate or token in your kubeconfig, you configure a credential plugin that runs a helper binary when <code>kubectl</code> needs a token.</p>
<p>When kubelogin is invoked, it checks its local token cache. If the cached token is still valid, it returns it immediately. If the token has expired, it initiates the OIDC authorization code flow — opens a browser, redirects to the identity provider, receives the token after login, caches it locally, and returns it to <code>kubectl</code>. The whole flow takes about five seconds when it triggers.</p>
<p>This means tokens are short-lived (typically an hour) and rotate automatically. If a developer's machine is compromised, the token expires on its own. There is no long-lived credential sitting in a file somewhere.</p>
<h2 id="heading-demo-2-configure-oidc-login-with-dex-and-kubelogin">Demo 2 — Configure OIDC Login with Dex and kubelogin</h2>
<p>In this section, you'll deploy Dex as a self-hosted OIDC provider, configure a kind cluster to trust it, and log in with a browser. Dex is a good demo vehicle because it runs inside the cluster and doesn't require a cloud account or an external service.</p>
<p><strong>This guide is for local development and learning only.</strong> Self-signed certificates, static passwords, and certs stored on disk are used here for simplicity.</p>
<p>In production, use a managed identity provider (Azure Entra ID, Google Workspace, Okta), automate certificate lifecycle with cert-manager, and store secrets in a secrets manager (HashiCorp Vault, AWS Secrets Manager) or inject them via CSI driver — never commit or store certs as local files.</p>
<h3 id="heading-step-1-create-a-kind-cluster-with-oidc-authentication">Step 1: Create a kind cluster with OIDC authentication</h3>
<p>OIDC authentication for the API server must be configured at cluster creation time on Kind because the API server needs to know which identity provider to trust before it starts accepting requests.</p>
<p><strong>Note:</strong> Kubernetes v1.30+ deprecated the <code>--oidc-*</code> API server flags in favor of the structured <code>AuthenticationConfiguration</code> API (via <code>--authentication-config</code>). In v1.35+ the old flags are removed entirely. This guide uses the new approach.</p>
<p><strong>nip.io</strong> is a wildcard DNS service — <code>dex.127.0.0.1.nip.io</code> resolves to <code>127.0.0.1</code>. This lets us use a real hostname for TLS without editing <code>/etc/hosts</code>.</p>
<p>First, generate a self-signed CA and TLS certificate for Dex:</p>
<pre><code class="language-bash"># Generate a CA for Dex
openssl req -x509 -newkey rsa:4096 -keyout dex-ca.key \
  -out dex-ca.crt -days 365 -nodes \
  -subj "/CN=dex-ca"

# Generate a certificate for Dex signed by that CA
openssl req -newkey rsa:2048 -keyout dex.key \
  -out dex.csr -nodes \
  -subj "/CN=dex.127.0.0.1.nip.io"

openssl x509 -req -in dex.csr \
  -CA dex-ca.crt -CAkey dex-ca.key \
  -CAcreateserial -out dex.crt -days 365 \
  -extfile &lt;(printf "subjectAltName=DNS:dex.127.0.0.1.nip.io")
</code></pre>
<p>Next, generate the <code>AuthenticationConfiguration</code> file. This tells the API server how to validate JWTs — which issuer to trust (<code>url</code>), which audience to expect (<code>audiences</code>), and which JWT claims map to Kubernetes usernames and groups (<code>claimMappings</code>). The CA cert is inlined so the API server can verify Dex's TLS certificate when fetching signing keys:</p>
<pre><code class="language-bash">cat &gt; auth-config.yaml &lt;&lt;EOF
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
  - issuer:
      url: https://dex.127.0.0.1.nip.io:32000
      audiences:
        - kubernetes
      certificateAuthority: |
$(sed 's/^/        /' dex-ca.crt)
    claimMappings:
      username:
        claim: email
        prefix: ""
      groups:
        claim: groups
        prefix: ""
EOF
</code></pre>
<p>The <code>kind-oidc.yaml</code> config uses <code>extraPortMappings</code> to expose Dex's port to your browser, <code>extraMounts</code> to copy files into the Kind node, and a <code>kubeadmConfigPatch</code> to pass <code>--authentication-config</code> to the API server:</p>
<pre><code class="language-yaml"># kind-oidc.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      # Forward port 32000 from the Docker container to localhost,
      # so your browser can reach Dex's login page
      - containerPort: 32000
        hostPort: 32000
        protocol: TCP
    extraMounts:
      # Copy files from your machine into the Kind node's filesystem
      - hostPath: ./dex-ca.crt
        containerPath: /etc/ca-certificates/dex-ca.crt
        readOnly: true
      - hostPath: ./auth-config.yaml
        containerPath: /etc/kubernetes/auth-config.yaml
        readOnly: true
    kubeadmConfigPatches:
      # Patch the API server to enable OIDC authentication
      - |
        kind: ClusterConfiguration
        apiServer:
          extraArgs:
            # Tell the API server to load our AuthenticationConfiguration
            authentication-config: /etc/kubernetes/auth-config.yaml
          extraVolumes:
            # Mount files into the API server pod (it runs as a static pod,
            # so it needs explicit volume mounts even though files are on the node)
            - name: dex-ca
              hostPath: /etc/ca-certificates/dex-ca.crt
              mountPath: /etc/ca-certificates/dex-ca.crt
              readOnly: true
              pathType: File
            - name: auth-config
              hostPath: /etc/kubernetes/auth-config.yaml
              mountPath: /etc/kubernetes/auth-config.yaml
              readOnly: true
              pathType: File
</code></pre>
<p>Create the cluster:</p>
<pre><code class="language-bash">kind create cluster --name k8s-auth --config kind-oidc.yaml
</code></pre>
<h3 id="heading-step-2-deploy-dex">Step 2: Deploy Dex</h3>
<p>Dex is an OIDC-compliant identity provider that acts as a bridge between Kubernetes and upstream identity sources (LDAP, SAML, GitHub, and so on). In this demo it runs inside the cluster with a static password database — two hardcoded users you can log in as.</p>
<p>The API server doesn't talk to Dex directly on every request. It only needs Dex's CA certificate (which you inlined in the <code>AuthenticationConfiguration</code>) to verify the JWT signatures on tokens that Dex issues.</p>
<p>The deployment has four parts: a ConfigMap with Dex's configuration, a Deployment to run Dex, a NodePort Service to expose it on port 32000 (matching the issuer URL), and RBAC resources so Dex can store state using Kubernetes CRDs.</p>
<p>First, create the namespace and load the TLS certificate as a Kubernetes Secret. Dex needs this to serve HTTPS. Without it, your browser and the API server would refuse to connect:</p>
<pre><code class="language-bash">kubectl create namespace dex

kubectl create secret tls dex-tls \
  --cert=dex.crt \
  --key=dex.key \
  -n dex
</code></pre>
<p>Save the following as <code>dex-config.yaml</code>. This configures Dex with a static password connector — two hardcoded users for the demo:</p>
<pre><code class="language-yaml"># dex-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dex-config
  namespace: dex
data:
  config.yaml: |
    # issuer must exactly match the URL in your AuthenticationConfiguration
    issuer: https://dex.127.0.0.1.nip.io:32000

    # Dex stores refresh tokens and auth codes — here it uses Kubernetes CRDs
    storage:
      type: kubernetes
      config:
        inCluster: true

    # Dex's HTTPS listener — serves the login page and token endpoints
    web:
      https: 0.0.0.0:5556
      tlsCert: /etc/dex/tls/tls.crt
      tlsKey: /etc/dex/tls/tls.key

    # staticClients defines which applications can request tokens.
    # "kubernetes" is the client ID that kubelogin uses when authenticating
    staticClients:
      - id: kubernetes
        redirectURIs:
          - http://localhost:8000     # kubelogin listens here to receive the callback
        name: Kubernetes
        secret: kubernetes-secret     # shared secret between kubelogin and Dex

    # Two demo users with the password "password" (bcrypt-hashed).
    # In production, you'd connect Dex to LDAP, SAML, or a social login instead
    enablePasswordDB: true
    staticPasswords:
      - email: "jane@example.com"
        # bcrypt hash of "password" — generate your own with: htpasswd -bnBC 10 "" password
        hash: "\(2a\)10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
        username: "jane"
        userID: "08a8684b-db88-4b73-90a9-3cd1661f5466"
      - email: "admin@example.com"
        hash: "\(2a\)10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
        username: "admin"
        userID: "a8b53e13-7e8c-4f7b-9a33-6c2f4d8c6a1b"
        groups:
          - platform-engineers
</code></pre>
<p>Save the following as <code>dex-deployment.yaml</code>. This creates the Deployment, Service, ServiceAccount, and RBAC that Dex needs to run:</p>
<pre><code class="language-yaml"># dex-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dex
  namespace: dex
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dex
  template:
    metadata:
      labels:
        app: dex
    spec:
      serviceAccountName: dex
      containers:
        - name: dex
          # v2.45.0+ required — earlier versions don't include groups from staticPasswords in tokens
          image: ghcr.io/dexidp/dex:v2.45.0
          command: ["dex", "serve", "/etc/dex/cfg/config.yaml"]
          ports:
            - name: https
              containerPort: 5556
          volumeMounts:
            - name: config
              mountPath: /etc/dex/cfg
            - name: tls
              mountPath: /etc/dex/tls
      volumes:
        - name: config
          configMap:
            name: dex-config
        - name: tls
          secret:
            secretName: dex-tls
---
# NodePort Service — exposes Dex on port 32000 on the Kind node.
# Combined with extraPortMappings, this makes Dex reachable from your browser
apiVersion: v1
kind: Service
metadata:
  name: dex
  namespace: dex
spec:
  type: NodePort
  ports:
    - name: https
      port: 5556
      targetPort: 5556
      nodePort: 32000
  selector:
    app: dex
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dex
  namespace: dex
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dex
rules:
  - apiGroups: ["dex.coreos.com"]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dex
subjects:
  - kind: ServiceAccount
    name: dex
    namespace: dex
roleRef:
  kind: ClusterRole
  name: dex
  apiGroup: rbac.authorization.k8s.io
</code></pre>
<pre><code class="language-bash">kubectl apply -f dex-config.yaml
kubectl apply -f dex-deployment.yaml
kubectl rollout status deployment/dex -n dex
</code></pre>
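<p>Optionally, confirm Dex is up by fetching its OIDC discovery document, the same endpoint the API server and kubelogin will use. This relies on the <code>dex-ca.crt</code> generated in Step 1 and the nip.io hostname resolving to localhost:</p>
<pre><code class="language-bash">curl --cacert dex-ca.crt https://dex.127.0.0.1.nip.io:32000/.well-known/openid-configuration
</code></pre>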
<h3 id="heading-step-3-install-kubelogin">Step 3: Install kubelogin</h3>
<pre><code class="language-bash"># macOS
brew install int128/kubelogin/kubelogin

# Linux
curl -LO https://github.com/int128/kubelogin/releases/latest/download/kubelogin_linux_amd64.zip
unzip -j kubelogin_linux_amd64.zip kubelogin -d /tmp
sudo mv /tmp/kubelogin /usr/local/bin/kubectl-oidc_login
rm kubelogin_linux_amd64.zip
</code></pre>
<p>Confirm it's installed:</p>
<pre><code class="language-bash">kubectl oidc-login --version
</code></pre>
<h3 id="heading-step-4-configure-a-kubeconfig-entry-for-oidc">Step 4: Configure a kubeconfig entry for OIDC</h3>
<p>This creates a new user and context in your kubeconfig. Instead of using a client certificate (like the default Kind admin), it tells kubectl to use kubelogin to get a token from Dex.</p>
<p>The <code>--oidc-extra-scope</code> flags are important: without <code>email</code> and <code>groups</code>, Dex won't include those claims in the JWT, and the API server won't know who you are or what groups you belong to.</p>
<pre><code class="language-bash">kubectl config set-credentials oidc-user \
  --exec-api-version=client.authentication.k8s.io/v1beta1 \
  --exec-command=kubectl \
  --exec-arg=oidc-login \
  --exec-arg=get-token \
  --exec-arg=--oidc-issuer-url=https://dex.127.0.0.1.nip.io:32000 \
  --exec-arg=--oidc-client-id=kubernetes \
  --exec-arg=--oidc-client-secret=kubernetes-secret \
  --exec-arg=--oidc-extra-scope=email \
  --exec-arg=--oidc-extra-scope=groups \
  --exec-arg=--certificate-authority=$(pwd)/dex-ca.crt

kubectl config set-context oidc@k8s-auth \
  --cluster=kind-k8s-auth \
  --user=oidc-user

kubectl config use-context oidc@k8s-auth
</code></pre>
<h3 id="heading-step-5-trigger-the-login-flow">Step 5: Trigger the login flow</h3>
<p>Jane has no RBAC permissions yet, so first grant her read access from the admin context:</p>
<pre><code class="language-bash">kubectl --context kind-k8s-auth create clusterrolebinding jane-view \
  --clusterrole=view --user=jane@example.com
</code></pre>
<p>Now switch to the OIDC context and trigger a login:</p>
<pre><code class="language-bash">kubectl get pods -n default
</code></pre>
<p>Your browser opens and redirects to the Dex login page. Log in as <code>jane@example.com</code> with password <code>password</code>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f2a6b76d7d55f162b5da2ee/44fe0657-b383-4245-9e43-45daea7a3f4f.png" alt="dexidp login screen" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<img src="https://cdn.hashnode.com/uploads/covers/5f2a6b76d7d55f162b5da2ee/4f77442a-3055-47fc-a141-8d881731a1f4.png" alt="dexidp grant access" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>After login, the terminal completes:</p>
<pre><code class="language-plaintext">No resources found in default namespace.
</code></pre>
<p>The browser-based authentication worked. <code>kubectl</code> received the token from Dex, sent it to the API server, the API server validated the JWT signature using the CA certificate from the <code>AuthenticationConfiguration</code>, extracted <code>jane@example.com</code> from the <code>email</code> claim, matched it against the RBAC binding, and authorized the request.</p>
<p>Without the <code>clusterrolebinding</code>, you would see <code>Error from server (Forbidden)</code> — authentication succeeds (the API server knows <em>who</em> you are) but authorization fails (jane has no permissions). This is the distinction between 401 Unauthorized and 403 Forbidden.</p>
<h3 id="heading-step-6-inspect-the-jwt">Step 6: Inspect the JWT</h3>
<p>A JWT (JSON Web Token) is a signed JSON payload that contains claims about the user. kubelogin caches the token locally under <code>~/.kube/cache/oidc-login/</code> so you don't have to log in on every kubectl command.</p>
<p>List the directory to find the cached file:</p>
<pre><code class="language-bash">ls ~/.kube/cache/oidc-login/
</code></pre>
<p>Decode the JWT payload directly from the cache:</p>
<pre><code class="language-bash">cat ~/.kube/cache/oidc-login/$(ls ~/.kube/cache/oidc-login/ | grep -v lock | head -1) | \
  python3 -c "
import json, sys, base64
token = json.load(sys.stdin)['id_token'].split('.')[1]
token += '=' * (4 - len(token) % 4)
print(json.dumps(json.loads(base64.urlsafe_b64decode(token)), indent=2))
"
</code></pre>
<p>You'll see something like:</p>
<pre><code class="language-json">{
  "iss": "https://dex.127.0.0.1.nip.io:32000",
  "sub": "CiQwOGE4Njg0Yi1kYjg4LTRiNzMtOTBhOS0zY2QxNjYxZjU0NjYSBWxvY2Fs",
  "aud": "kubernetes",
  "exp": 1775307910,
  "iat": 1775221510,
  "email": "jane@example.com",
  "email_verified": true
}
</code></pre>
<p>The <code>email</code> claim becomes jane's Kubernetes username because the <code>AuthenticationConfiguration</code> maps <code>username.claim: email</code>. The <code>aud</code> matches the configured <code>audiences</code>. The <code>iss</code> matches the issuer <code>url</code>. This is how the API server validates the token without contacting Dex on every request — it only needs the CA certificate to verify the JWT signature.</p>
<h3 id="heading-step-7-map-oidc-groups-to-rbac">Step 7: Map OIDC groups to RBAC</h3>
<p>The <code>admin@example.com</code> user has a <code>groups</code> claim in the Dex config containing <code>platform-engineers</code>. Instead of creating individual RBAC bindings per user, you can bind permissions to a group — anyone whose JWT contains that group gets the permissions automatically:</p>
<pre><code class="language-yaml"># platform-engineers-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-engineers-admin
subjects:
  - kind: Group
    name: platform-engineers     # matches the groups claim in the JWT
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
</code></pre>
<p>You're currently logged in as <code>jane@example.com</code> via the OIDC context, but jane only has <code>view</code> permissions — she can't create cluster-wide RBAC bindings. Switch back to the admin context to apply this:</p>
<pre><code class="language-bash">kubectl config use-context kind-k8s-auth
kubectl apply -f platform-engineers-binding.yaml
kubectl config use-context oidc@k8s-auth
</code></pre>
<p>Now clear the cached token to log out of jane's session, then trigger a new login as <code>admin@example.com</code>:</p>
<pre><code class="language-bash"># Clear the cached token — this is how you "log out" with kubelogin
rm -rf ~/.kube/cache/oidc-login/

# This will open the browser again for a fresh login
kubectl get pods -n default
</code></pre>
<p>Log in as <code>admin@example.com</code> with password <code>password</code>. This time the JWT will contain <code>"groups": ["platform-engineers"]</code>, which matches the <code>ClusterRoleBinding</code> you just created. The admin user gets full cluster access — without ever being added to a kubeconfig by name.</p>
<p>You can verify by decoding the new token (Step 6) — the <code>groups</code> claim will be present:</p>
<pre><code class="language-json">{
  "email": "admin@example.com",
  "groups": ["platform-engineers"]
}
</code></pre>
<p>This is the real power of OIDC group claims: you manage group membership in your identity provider, and Kubernetes permissions follow automatically. Add someone to the <code>platform-engineers</code> group in Dex (or any upstream IdP), and they get cluster-admin access on their next login — no kubeconfig or RBAC changes needed.</p>
<h2 id="heading-cloud-provider-authentication">Cloud Provider Authentication</h2>
<p>AWS, GCP, and Azure each give Kubernetes clusters a native authentication mechanism that ties into their IAM systems.</p>
<p>The implementations differ in API surface, but they all follow the same underlying pattern: a short-lived token issued by the provider's identity system, verified by the API server, and mapped to a Kubernetes username and groups. Once you understand how Dex works above, these are all variations on the same theme.</p>
<h3 id="heading-aws-eks">AWS EKS</h3>
<p>EKS uses the <code>aws-iam-authenticator</code> to translate AWS IAM identities into Kubernetes identities. When you run <code>kubectl</code> against an EKS cluster, the AWS CLI generates a short-lived token signed with your IAM credentials. The API server passes this token to the aws-iam-authenticator webhook, which verifies it against AWS STS and returns the corresponding username and groups.</p>
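<p>The kubeconfig written by <code>aws eks update-kubeconfig</code> runs a command like this as a credential plugin each time <code>kubectl</code> needs a token (the cluster name is a placeholder):</p>
<pre><code class="language-bash"># Generates a short-lived token backed by a presigned STS GetCallerIdentity request
aws eks get-token --cluster-name demo
</code></pre>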
<p>User access is controlled via the <code>aws-auth</code> ConfigMap in <code>kube-system</code>, which maps IAM role ARNs and IAM user ARNs to Kubernetes usernames and groups. A typical entry looks like this:</p>
<pre><code class="language-yaml"># In kube-system/aws-auth ConfigMap
mapRoles:
  - rolearn: arn:aws:iam::123456789:role/platform-engineers
    username: platform-engineer:{{SessionName}}
    groups:
      - platform-engineers
</code></pre>
<p>AWS is migrating from the <code>aws-auth</code> ConfigMap to a newer Access Entries API, which manages the same mapping through the EKS API rather than a ConfigMap. The underlying authentication mechanism is the same.</p>
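<p>With Access Entries, the same mapping is created through the EKS API instead of by editing a ConfigMap. A sketch, assuming the AWS CLI and a cluster named <code>demo</code>:</p>
<pre><code class="language-bash">aws eks create-access-entry \
  --cluster-name demo \
  --principal-arn arn:aws:iam::123456789:role/platform-engineers \
  --kubernetes-groups platform-engineers
</code></pre>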
<h3 id="heading-google-gke">Google GKE</h3>
<p>GKE integrates with Google Cloud IAM using two different mechanisms, depending on whether you're authenticating as a human user or as a workload.</p>
<p>For human users, GKE accepts standard Google OAuth2 tokens. Running <code>gcloud container clusters get-credentials</code> writes a kubeconfig that uses the <code>gcloud</code> CLI as a credential plugin, generating short-lived tokens from your Google account automatically.</p>
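<p>In practice that is a single command, which writes the credential-plugin configuration for you (cluster name and region are placeholders, and the <code>gke-gcloud-auth-plugin</code> component must be installed):</p>
<pre><code class="language-bash">gcloud container clusters get-credentials my-cluster --region us-central1
</code></pre>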
<p>For pod-level identity — letting a pod assume a Google Cloud IAM role — GKE uses Workload Identity. You annotate a Kubernetes service account to bind it to a Google Service Account, and pods running as that service account can call Google Cloud APIs using the GSA's permissions:</p>
<pre><code class="language-bash"># Bind a Kubernetes SA to a Google Service Account
kubectl annotate serviceaccount my-app \
  --namespace production \
  iam.gke.io/gcp-service-account=my-app@my-project.iam.gserviceaccount.com
</code></pre>
<h3 id="heading-azure-aks">Azure AKS</h3>
<p>AKS integrates with Azure Active Directory. When Azure AD integration is enabled, <code>kubectl</code> requests an Azure AD token on behalf of the user via the Azure CLI, and the AKS API server validates it against Azure AD.</p>
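<p>Fetching credentials follows the same pattern as on the other providers: one CLI command writes a kubeconfig whose credential plugin obtains Azure AD tokens on demand (resource group and cluster name are placeholders):</p>
<pre><code class="language-bash">az aks get-credentials --resource-group my-rg --name my-cluster
</code></pre>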
<p>For pod-level identity, AKS uses Azure Workload Identity, which follows the same OIDC federation pattern as GKE Workload Identity. A Kubernetes service account is annotated with an Azure Managed Identity client ID, and pods can request Azure AD tokens without storing any credentials:</p>
<pre><code class="language-bash"># Annotate a service account with the Azure Managed Identity client ID
kubectl annotate serviceaccount my-app \
  --namespace production \
  azure.workload.identity/client-id=&lt;MANAGED_IDENTITY_CLIENT_ID&gt;
</code></pre>
<p>The underlying pattern across all three providers is the same: a trusted OIDC token is issued by the cloud provider, verified by the Kubernetes API server, and mapped to an identity through a binding (the <code>aws-auth</code> ConfigMap, a GKE Workload Identity binding, or an AKS federated identity credential). The OIDC section in this article is the conceptual foundation for all of them.</p>
<h2 id="heading-webhook-token-authentication">Webhook Token Authentication</h2>
<p>Webhook token authentication is worth knowing about because it appears in several common Kubernetes setups, even if you never configure it yourself.</p>
<p>When a request arrives with a bearer token that no other authenticator recognises, Kubernetes can send that token to an external HTTP endpoint for validation. The endpoint returns a response indicating who the token belongs to.</p>
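<p>The exchange uses the TokenReview schema. You can see what a successful review looks like by posting one directly to the API server from an admin context (a sketch; <code>kubectl create token</code> needs v1.24+):</p>
<pre><code class="language-bash"># Mint a service account token, then ask the API server to review it
TOKEN=$(kubectl create token default -n default)

kubectl create --raw /apis/authentication.k8s.io/v1/tokenreviews -f - &lt;&lt;EOF
{"apiVersion": "authentication.k8s.io/v1", "kind": "TokenReview", "spec": {"token": "$TOKEN"}}
EOF

# The status block of the response carries authenticated: true plus the username and groups.
# A webhook authenticator returns exactly this shape to the API server.
</code></pre>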
<p>This is how EKS authentication works under the hood: the aws-iam-authenticator described earlier is a webhook token authenticator. Bootstrap tokens follow a similar bearer-token pattern during node join operations: a token is generated, embedded in the <code>kubeadm join</code> command, and validated by the bootstrap token authenticator when the new node contacts the API server for the first time.</p>
<p>For most clusters, you'll encounter webhook auth as something already running rather than something you configure. The main thing to know is that it exists and what it looks like when it appears in logs or configuration.</p>
<h2 id="heading-cleanup">Cleanup</h2>
<p>To remove everything created in this article:</p>
<pre><code class="language-bash"># Delete the OIDC demo cluster
kind delete cluster --name k8s-auth

# Remove generated certificate files
rm -f ca.crt ca.key jane.key jane.csr jane.crt jane.kubeconfig
rm -f dex-ca.crt dex-ca.key dex.crt dex.key dex.csr dex-ca.srl auth-config.yaml

# Remove the kubelogin token cache
rm -rf ~/.kube/cache/oidc-login/
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Kubernetes authentication is not a single mechanism — it's a chain of pluggable strategies, each one suited to different use cases. In this article you worked through the most important ones.</p>
<p>x509 client certificates are how Kubernetes works out of the box. The CN field becomes the username, the O field becomes the group, and the cluster CA is the trust anchor. You created a certificate for a new user, bound it to RBAC, and saw exactly how authentication and authorisation interact — authentication gets you in, RBAC determines what you can do.</p>
<p>You also saw the fundamental limitation: Kubernetes doesn't check certificate revocation lists, so a compromised certificate remains valid until it expires. This makes certificates a poor fit for human users in production environments.</p>
<p>OIDC is the production-grade answer. Tokens are short-lived, issued by a trusted identity provider, and map directly to Kubernetes groups through JWT claims. You deployed Dex as a self-hosted OIDC provider, configured the API server to trust it, and set up kubelogin for browser-based authentication.</p>
<p>You then decoded a JWT to see exactly what the API server reads from it, and mapped an OIDC group claim to a Kubernetes ClusterRoleBinding.</p>
<p>Cloud provider authentication — EKS, GKE, AKS — uses the same OIDC foundation with provider-specific wrappers. Understanding how Dex works makes each of those systems immediately readable.</p>
<p>All YAML, certificates, and configuration files from this article are in the <a href="https://github.com/Caesarsage/DevOps-Cloud-Projects/tree/main/intermediate/k8/security">companion GitHub repository</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build AI Agents That Can Control Cloud Infrastructure ]]>
                </title>
                <description>
                    <![CDATA[ Cloud infrastructure has become deeply programmable over the past decade. Nearly every platform exposes APIs that allow developers to create applications, provision databases, configure networking, an ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-ai-agents-that-can-control-cloud-infrastructure/</link>
                <guid isPermaLink="false">69cbefa6c1e86567d7576d3e</guid>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ai agents ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Tue, 31 Mar 2026 16:00:38 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/69bdba8c-6915-4d8c-ab35-1f5d06824f50.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Cloud infrastructure has become deeply programmable over the past decade.</p>
<p>Nearly every platform exposes APIs that allow developers to create applications, provision databases, configure networking, and retrieve metrics.</p>
<p>This shift enabled automation via Infrastructure as Code and CI/CD pipelines, allowing teams to manage systems through scripts rather than dashboards.</p>
<p>Now another layer of automation is emerging. AI agents are starting to participate directly in development workflows. These agents can read codebases, generate implementations, run terminal commands, and help debug systems. The next logical step is to allow them to interact with the infrastructure itself.</p>
<p>Instead of manually inspecting dashboards or remembering complex command-line syntax, developers can ask an AI agent to check system state, deploy services, or retrieve metrics. The agent performs these tasks by interacting with cloud APIs on behalf of the user.</p>
<p>This capability opens the door to a new type of workflow where infrastructure becomes conversational, programmable, and deeply integrated into development environments.</p>
<p>In this article, we will explore how AI agents can interact with cloud infrastructure through APIs, the challenges of exposing large APIs to AI systems, and how architectures like MCP make it possible for agents to discover and execute infrastructure operations safely. We will also look at a practical example of connecting an AI agent to a cloud platform like Sevalla using the search-and-execute pattern.</p>
<p>Familiarity with cloud infrastructure concepts such as APIs, Infrastructure as Code, and CI/CD workflows is recommended to follow along effectively. You should also have a basic understanding of how AI agents or developer assistants interact with code and systems to fully understand the architectures discussed in this article.</p>
<h2 id="heading-what-well-cover">What We'll Cover</h2>
<ul>
<li><p><a href="#heading-ai-agents-are-becoming-part-of-the-development-environment">AI Agents Are Becoming Part of the Development Environment</a></p>
</li>
<li><p><a href="#heading-connecting-ai-agents-to-external-systems">Connecting AI Agents to External Systems</a></p>
</li>
<li><p><a href="#heading-the-challenge-of-large-cloud-apis">The Challenge of Large Cloud APIs</a></p>
</li>
<li><p><a href="#heading-a-simpler-pattern-for-api-access">A Simpler Pattern for API Access</a></p>
</li>
<li><p><a href="#heading-why-sandboxed-code-execution-is-important">Why Sandboxed Code Execution Is Important</a></p>
</li>
<li><p><a href="#heading-practical-example-with-sevalla">Practical Example with Sevalla</a></p>
</li>
<li><p><a href="#heading-what-this-means-for-developers">What This Means for Developers</a></p>
</li>
<li><p><a href="#heading-the-next-evolution-of-infrastructure-automation">The Next Evolution of Infrastructure Automation</a></p>
</li>
</ul>
<h2 id="heading-ai-agents-are-becoming-part-of-the-development-environment">AI Agents Are Becoming Part of the Development Environment</h2>
<p>Modern developer tools increasingly embed AI assistants directly inside coding environments. Editors such as Cursor, Windsurf, and Claude Code allow developers to ask questions about their projects, generate new code, and execute commands without leaving the editor.</p>
<p>Instead of manually navigating documentation or writing boilerplate code, developers can simply describe what they want. The AI interprets the request and produces the necessary actions.</p>
<p>This approach is already common for tasks like writing functions, refactoring code, or debugging errors. However, infrastructure management is still largely handled through dashboards, terminal commands, or external tooling.</p>
<p>If AI agents are going to assist developers effectively, they need access to the same systems developers interact with every day. That means accessing APIs that manage applications, databases, deployments, and other infrastructure resources.</p>
<p>The challenge is providing that access in a structured and scalable way.</p>
<h2 id="heading-connecting-ai-agents-to-external-systems">Connecting AI Agents to External&nbsp;Systems</h2>
<p>AI agents do not inherently know how to interact with external services. They need a framework that allows them to call tools and access data safely.</p>
<p><a href="https://www.freecodecamp.org/news/how-the-model-context-protocol-works/">Model Context Protocol</a>, or MCP, provides one such framework. MCP is designed to let AI assistants connect to external tools in a standardized way.</p>
<p>An MCP server exposes tools that an AI agent can call when it needs information or wants to act. These tools might retrieve data from a database, query logs, interact with APIs, or execute commands on a remote system.</p>
<p>When the AI agent receives a request from the user, it determines which tool to call and executes that tool through the MCP server. The results are returned to the agent, which can then continue reasoning about the problem.</p>
<p>This architecture allows AI assistants to interact with complex systems while maintaining a clear boundary between the agent and the external environment.</p>
<h2 id="heading-the-challenge-of-large-cloud-apis">The Challenge of Large Cloud&nbsp;APIs</h2>
<p>While MCP enables connecting AI agents to infrastructure systems, cloud platforms introduce an additional challenge.</p>
<p>Most cloud platforms expose large APIs with many endpoints. A typical platform might include endpoints for managing applications, databases, storage, networking, domains, metrics, logs, and deployment pipelines.</p>
<p>If an MCP server exposes each endpoint as a separate tool, the number of tools can quickly grow into the hundreds.</p>
<p>This creates several problems. First, the AI agent must understand the purpose and parameters of every available tool before deciding which one to use. This increases the amount of context required for the agent to operate effectively.</p>
<p>Second, maintaining hundreds of tools becomes difficult for developers who build and maintain the MCP server.</p>
<p>Third, the system becomes rigid. Every time a new API endpoint is added, a new tool must also be created and documented.</p>
<p>For large APIs, this approach quickly becomes impractical.</p>
<h2 id="heading-a-simpler-pattern-for-api-access">A Simpler Pattern for API&nbsp;Access</h2>
<p>A different architecture solves this problem by dramatically reducing the number of tools exposed to the AI.</p>
<p>Instead of providing a separate tool for every API endpoint, the MCP server exposes only two capabilities.</p>
<p>The first capability allows the agent to search the API specification. This lets the agent discover available endpoints, understand parameters, and inspect request or response schemas.</p>
<p>The second capability allows the agent to execute code that calls the API.</p>
<p>In this model, the AI agent dynamically generates the code required to call the API. Because the agent can search the specification and write its own API calls, the MCP server does not need to define individual tools for every endpoint.</p>
<p>This pattern drastically reduces the complexity of the integration while still giving the agent full access to the underlying platform.</p>
<h2 id="heading-why-sandboxed-code-execution-is-important">Why Sandboxed Code Execution Is Important</h2>
<p>Allowing AI agents to generate and execute code raises important security considerations.</p>
<p>If the generated code runs unrestricted, it could potentially access sensitive parts of the system or perform unintended operations. To prevent this, the execution environment must be carefully controlled.</p>
<p>A common solution is running the generated code inside a sandboxed environment. In this setup, the code runs in an isolated runtime with limited permissions. The environment exposes only specific functions that allow interaction with the platform’s API.</p>
<p>Because the code cannot access the host system directly, the risk of unintended behavior is greatly reduced. At the same time, the AI agent retains the flexibility to generate custom API calls as needed.</p>
<p>This combination of dynamic code generation and sandboxed execution makes it possible for AI agents to interact with complex APIs safely.</p>
<h2 id="heading-practical-example-with-sevalla">Practical Example with&nbsp;Sevalla</h2>
<p>A practical implementation of this architecture can be seen in the <a href="https://github.com/sevalla-hosting/mcp">Sevalla MCP server</a>, which exposes a cloud platform’s API to AI agents through the search-and-execute pattern.</p>
<p><a href="https://sevalla.com/">Sevalla</a> is a PaaS provider designed for developers shipping production applications. It offers app hosting, database, object storage, and static site hosting for your projects. We also have other options, such as AWS and Azure, that come with their own MCP tools.</p>
<p>Instead of registering hundreds of tools for every API endpoint, the server provides only two tools that allow the AI agent to explore and interact with the entire platform. Find the <a href="https://docs.sevalla.com/quick-starts/coding-agents/overview">full documentation</a> for Sevalla’s MCP server here.</p>
<p>The first tool, <code>search</code>, allows the agent to query the platform’s OpenAPI specification. Through this interface the agent can discover available endpoints, understand parameters, and inspect response schemas.</p>
<img src="https://cdn.hashnode.com/uploads/covers/66c6d8f04fa7fe6a6e337edd/b1030c9d-b944-41f4-b0a0-4cf1f1bc3039.png" alt="MCP client" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Because the API specification is searchable, the agent does not need to know the structure of the platform’s API in advance. It can explore the API dynamically based on the task it needs to perform.</p>
<p>For example, if the user asks the agent to list all applications running in their account, the agent can begin by searching the API specification.</p>
<pre><code class="language-plaintext">const endpoints = await sevalla.search("list all applications")
</code></pre>
<p>The result returns the relevant API definitions, including the correct path and parameters required for the request. Once the agent understands which endpoint to use, it can generate the necessary API call.</p>
<p>The second tool, <code>execute</code>, runs JavaScript inside a sandboxed V8 environment. Within this environment the agent can call the API using a helper function provided by the platform.</p>
<pre><code class="language-plaintext">const apps = await sevalla.request({
  method: "GET",
  path: "/applications"
})
</code></pre>
<p>Because the code runs inside an isolated V8 sandbox, the generated script cannot access the host system. The only permitted interaction is through the API helper function. This ensures that the AI agent can perform infrastructure operations safely while still retaining the flexibility to generate dynamic API calls.</p>
<p>This approach allows an agent to discover and interact with many parts of the platform without requiring predefined tools for each capability. After discovering endpoints through the API specification, the agent can retrieve application data, inspect deployments, query metrics, or manage infrastructure resources through generated API calls.</p>
<p>The design also significantly reduces context usage. Traditional MCP integrations might require hundreds of tools to represent every endpoint of a large API. In contrast, the search-and-execute pattern allows the entire API surface to be accessed through just two tools.</p>
<p>For developers connecting AI assistants to infrastructure platforms, this architecture provides a practical way to expose large APIs while keeping the integration simple and efficient.</p>
<h2 id="heading-what-this-means-for-developers">What This Means for Developers</h2>
<p>Allowing AI agents to interact with infrastructure APIs changes how developers manage systems.</p>
<p>Instead of manually navigating dashboards or writing long sequences of commands, developers can describe what they want in natural language. The AI agent can interpret the request, discover the relevant API endpoints, and execute the required operations.</p>
<p>This approach also improves observability and debugging. When something goes wrong, the agent can query logs, inspect metrics, and retrieve system state without requiring the developer to manually gather information.</p>
<p>Over time, this type of integration could significantly reduce the friction involved in managing complex cloud systems.</p>
<h2 id="heading-the-next-evolution-of-infrastructure-automation">The Next Evolution of Infrastructure Automation</h2>
<p>Infrastructure automation has evolved through several stages. Early cloud systems relied heavily on manual configuration through web interfaces. Infrastructure as Code later allowed teams to define infrastructure using scripts and configuration files.</p>
<p>CI/CD pipelines then automated the process of deploying and updating systems.</p>
<p>AI agents represent the next step in this progression. By combining APIs, MCP integrations, and sandboxed execution environments, developers can allow intelligent systems to reason about infrastructure and interact with it safely.</p>
<p>Instead of static integrations, agents can dynamically discover and call APIs as needed. This makes infrastructure management more flexible and accessible while maintaining the reliability of programmable systems.</p>
<p>As AI tools become more deeply embedded in development environments, the ability for agents to understand and control infrastructure will likely become a standard capability for modern platforms.</p>
<p><em>Hope you enjoyed this article.</em> <a href="https://www.manishmshiva.me/"><em>Visit my blog</em></a> <em>for more practical tutorials.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Infrastructure as Code with APIs: How to Automate Cloud Resources the Developer Way ]]>
                </title>
                <description>
                    <![CDATA[ Modern software development moves fast. Teams deploy code many times a day. New environments appear and disappear constantly. In this world, manual infrastructure setup simply doesn't scale. For years ]]>
                </description>
                <link>https://www.freecodecamp.org/news/iac-with-apis-how-to-automate-cloud-resources/</link>
                <guid isPermaLink="false">69c17f3c30a9b81e3a894704</guid>
                
                    <category>
                        <![CDATA[ #IaC ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ APIs ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Mon, 23 Mar 2026 17:58:20 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/d45b828b-c7b7-4138-a373-6edea786bc65.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Modern software development moves fast. Teams deploy code many times a day. New environments appear and disappear constantly. In this world, manual infrastructure setup simply doesn't scale.</p>
<p>For years, developers logged into dashboards, clicked through forms, and configured servers by hand. This worked for small projects, but it quickly became fragile. Every manual step increased the chance of mistakes. Environments drifted apart. Reproducing the same setup became difficult.</p>
<p><a href="https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac">Infrastructure as Code (IaC)</a> solves this problem. Instead of clicking through interfaces, developers define infrastructure using code. This approach makes infrastructure predictable, repeatable, and easy to automate.</p>
<p>In recent years, another approach has become popular alongside traditional IaC tools: using cloud APIs directly to create and manage infrastructure. This gives developers full control over how resources are provisioned and integrated into workflows.</p>
<p>This article explains what Infrastructure as Code means, why APIs are a powerful way to implement it, and how developers can automate cloud resources using simple scripts.</p>
<p>A basic understanding of cloud platforms, command-line interfaces, and scripting languages like Python, Bash, or JavaScript will help you follow along effectively. Familiarity with APIs, authentication methods, and CI/CD concepts will also make it easier to implement the automation techniques discussed in this article.</p>
<h2 id="heading-heres-what-well-cover">Here's what we'll cover:</h2>
<ul>
<li><p><a href="#heading-what-is-infrastructure-as-code">What Is Infrastructure as Code?</a></p>
</li>
<li><p><a href="#heading-the-limits-of-manual-infrastructure">The Limits of Manual Infrastructure</a></p>
</li>
<li><p><a href="#heading-why-apis-are-a-powerful-iac-tool">Why APIs Are a Powerful IaC Tool</a></p>
</li>
<li><p><a href="#heading-automating-infrastructure-with-scripts">Automating Infrastructure with Scripts</a></p>
</li>
<li><p><a href="#heading-practical-example-with-sevalla">Practical Example with Sevalla</a></p>
<ul>
<li><p><a href="#heading-installing-cli">Installing CLI</a></p>
</li>
<li><p><a href="#heading-working-with-your-infrastructure-using-cli">Working with your Infrastructure using CLI</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-infrastructure-as-code-improves-developer-productivity">Infrastructure as Code Improves Developer Productivity</a></p>
</li>
<li><p><a href="#heading-the-future-of-infrastructure">The Future of Infrastructure</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-infrastructure-as-code">What Is Infrastructure as&nbsp;Code?</h2>
<p>Infrastructure as Code means managing infrastructure using code instead of manual processes.</p>
<p>Instead of setting up servers, databases, and networks by hand, you define them in scripts or configuration files. These files describe the desired state of your infrastructure. A tool or script then creates and maintains that state automatically.</p>
<p>For example, instead of manually creating a database, you might define it in code like this:</p>
<pre><code class="language-plaintext">database:
  name: app_db
  engine: postgres
  version: 16
</code></pre>
<p>Once the code runs, the database is created automatically.</p>
<p>This approach provides several key benefits.</p>
<p>First, it improves consistency. Every environment is created from the same definition. Development, staging, and production environments stay aligned.</p>
<p>Second, it improves repeatability. If infrastructure fails, it can be recreated from code in minutes.</p>
<p>Third, it improves version control. Infrastructure definitions live in the same repositories as application code. Teams can review, track, and roll back changes.</p>
<p>Finally, it enables automation. Infrastructure can be created during deployments, tests, or CI/CD pipelines.</p>
<h2 id="heading-the-limits-of-manual-infrastructure">The Limits of Manual Infrastructure</h2>
<p>Before IaC became common, infrastructure management relied heavily on dashboards and manual configuration.</p>
<p>A developer would open a cloud console and perform steps like:</p>
<ul>
<li><p>Create a server</p>
</li>
<li><p>Attach storage</p>
</li>
<li><p>Configure environment variables</p>
</li>
<li><p>Connect a database</p>
</li>
<li><p>Add a domain</p>
</li>
</ul>
<p>These steps worked, but they introduced problems.</p>
<p>First of all, manual configuration is hard to document. Even if teams write guides, small details are often missed. Over time, environments drift apart.</p>
<p>Manual processes also slow down development. Spinning up a new environment may take hours instead of seconds.</p>
<p>Even worse, manual infrastructure cannot easily be tested. If something breaks, reproducing the same conditions becomes difficult.</p>
<p>Infrastructure as Code removes these problems by turning infrastructure into something that can be scripted, tested, and automated.</p>
<h2 id="heading-why-apis-are-a-powerful-iac-tool">Why APIs Are a Powerful IaC&nbsp;Tool</h2>
<p>Many people associate Infrastructure as Code with tools like Terraform or CloudFormation. These tools are powerful, but they're not the only option.</p>
<p>Every modern cloud platform exposes an API. That API allows developers to create resources programmatically.</p>
<p>This means infrastructure can be controlled directly from code using HTTP requests or command-line interfaces.</p>
<p>Using APIs for IaC has several advantages.</p>
<p>First, it offers maximum flexibility. Developers can integrate infrastructure creation directly into applications, deployment scripts, or internal tools.</p>
<p>Second, it reduces tooling complexity. Instead of learning a specialized IaC language, teams can use languages they already know, such as Python, JavaScript, or Bash.</p>
<p>Third, it enables dynamic infrastructure. Scripts can create resources only when needed, scale them automatically, and remove them when work is complete.</p>
<p>For example, a test suite could automatically create a database, run tests, and delete the database afterwards. This keeps environments clean and reduces costs.</p>
<p>APIs essentially turn the cloud into a programmable platform.</p>
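<p>As a rough sketch of this idea, the script below creates a temporary database over a platform's HTTP API and removes it once the work is done. The base URL, endpoint paths, and response fields are hypothetical placeholders – real platforms document their own endpoints and authentication:</p>
<pre><code class="language-typescript">// Hypothetical example: driving infrastructure through a plain HTTP API.
// BASE_URL, the /databases paths, and the response shape are placeholders.

const BASE_URL = "https://api.example-cloud.com/v1";
const TOKEN = process.env.CLOUD_API_TOKEN!; // credentials come from the environment

async function createTestDatabase(): Promise&lt;string&gt; {
  const res = await fetch(`${BASE_URL}/databases`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name: "test_db", engine: "postgres", version: 16 }),
  });
  if (!res.ok) throw new Error(`Create failed: ${res.status}`);
  const data = await res.json();
  return data.id; // capture the identifier for later steps
}

async function deleteDatabase(id: string): Promise&lt;void&gt; {
  await fetch(`${BASE_URL}/databases/${id}`, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${TOKEN}` },
  });
}

async function main() {
  const dbId = await createTestDatabase();
  try {
    // ... run the test suite against the new database ...
  } finally {
    await deleteDatabase(dbId); // clean up even if the tests fail
  }
}

main();
</code></pre>
<p>Because the whole lifecycle lives in code, the same script can run on a laptop or inside a CI job.</p>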
<h2 id="heading-automating-infrastructure-with-scripts">Automating Infrastructure with&nbsp;Scripts</h2>
<p>Using APIs for infrastructure automation usually follows a simple workflow.</p>
<ol>
<li><p>First, a script authenticates with the cloud platform using an API token or credentials.</p>
</li>
<li><p>Second, the script sends requests to create or modify resources such as applications, databases, or storage.</p>
</li>
<li><p>Third, the script captures identifiers or configuration values from the response.</p>
</li>
<li><p>Finally, those values are used in later steps, such as deployments or integrations.</p>
</li>
</ol>
<p>Because these steps run in code, they can easily be included in CI/CD pipelines.</p>
<p>A typical pipeline might do the following:</p>
<ul>
<li><p>Create infrastructure</p>
</li>
<li><p>Deploy the application</p>
</li>
<li><p>Run tests</p>
</li>
<li><p>Collect metrics</p>
</li>
<li><p>Destroy temporary environments</p>
</li>
</ul>
<p>This approach ensures every deployment follows the same process.</p>
<h2 id="heading-practical-example-with-sevalla">Practical Example with&nbsp;Sevalla</h2>
<p>A practical way to apply Infrastructure as Code through APIs is to use a command-line interface that directly interacts with a cloud platform’s API. This lets you automate infrastructure creation using scripts rather than dashboards.</p>
<p>One example is the <a href="https://github.com/sevalla-hosting/cli">Sevalla CLI</a>, which exposes infrastructure operations as terminal commands that can be executed manually or inside automation pipelines.</p>
<p><a href="https://sevalla.com/">Sevalla</a> is a developer-centric PaaS designed to simplify your workflow. They provide high-performance application hosting, managed databases, object storage, and static sites in one unified platform.</p>
<p>Larger platforms such as AWS and Azure expose similar capabilities, but their CLIs and tooling tend to involve more configuration and DevOps overhead. Sevalla aims for Heroku-like simplicity and ease of use.</p>
<h3 id="heading-installing-cli">Installing CLI</h3>
<p>You can install the CLI using the following shell command:</p>
<pre><code class="language-plaintext">bash &lt;(curl -fsSL https://raw.githubusercontent.com/sevalla-hosting/cli/main/install.sh)
</code></pre>
<p>Once installed, you can view the list of all available commands using the <code>help</code> command:</p>
<img src="https://cdn.hashnode.com/uploads/covers/66c6d8f04fa7fe6a6e337edd/3e6209cd-6c1e-420d-9d60-7376d54849d0.png" alt="Sevalla CLI Help" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The first step is authentication. Make sure you have an account on Sevalla before using the CLI.</p>
<pre><code class="language-plaintext">sevalla login
</code></pre>
<p>For automated environments such as CI/CD pipelines, authentication can be done with an API token. The token is stored in an environment variable so scripts can run without user interaction.</p>
<pre><code class="language-plaintext">export SEVALLA_API_TOKEN="your-api-token"
</code></pre>
<p>Once authenticated, you can quickly view a list of your apps using <code>sevalla apps list</code>:</p>
<img src="https://cdn.hashnode.com/uploads/covers/66c6d8f04fa7fe6a6e337edd/e3590cfa-5a95-4c5a-b0e9-8615070932da.png" alt="Sevalla Apps List" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-working-with-your-infrastructure-using-cli">Working with Your Infrastructure using CLI</h3>
<p>Your infrastructure can now be created directly from the command line. For example, you might start by creating an application service that will run the backend code:</p>
<pre><code class="language-plaintext">sevalla apps create --name myapp --source privateGit --cluster &lt;id&gt;
</code></pre>
<p>This command provisions a new application resource on the platform. Instead of navigating through a web interface and filling out forms, the entire setup is performed through a single command.</p>
<p>Because the command can be stored in scripts or configuration files, it becomes part of the project’s infrastructure definition.</p>
<p>After creating the application, you'll often need a database. You can also provision this programmatically:</p>
<pre><code class="language-plaintext">sevalla databases create \
  --name mydb \
  --type postgresql \
  --db-version 16 \
  --cluster &lt;id&gt; \
  --resource-type &lt;id&gt; \
  --db-name mydb \
  --db-password secret
</code></pre>
<p>This creates a PostgreSQL database with a defined version and credentials. In an automated workflow, the database creation step could run during environment setup for staging or testing.</p>
<p>Once the application and database exist, the next step might be configuring environment variables so the application can connect to the database:</p>
<pre><code class="language-plaintext">sevalla apps env-vars create &lt;app-id&gt; --key DATABASE_URL --value "postgres://..."
</code></pre>
<p>These configuration values can be injected during deployments, ensuring the application always receives the correct settings.</p>
<p>Deployment automation is another key part of Infrastructure as Code. Instead of manually triggering deployments, a script can deploy new code whenever a repository is updated.</p>
<pre><code class="language-plaintext">sevalla apps deployments trigger &lt;app-id&gt; --branch main
</code></pre>
<p>This allows CI/CD systems to deploy new versions of the application automatically after tests pass.</p>
<p>Infrastructure automation also includes scaling and monitoring. For example, if an application needs more instances to handle traffic, you can update the number of running processes programmatically.</p>
<pre><code class="language-plaintext">sevalla apps processes update &lt;process-id&gt; --app-id &lt;app-id&gt; --instances 3
</code></pre>
<p>You can also retrieve metrics through the CLI. This allows monitoring tools or scripts to analyze system performance.</p>
<pre><code class="language-plaintext">sevalla apps processes metrics cpu-usage &lt;app-id&gt; &lt;process-id&gt;
</code></pre>
<p>Similarly, you can also query application metrics such as response time or request rates to detect performance issues.</p>
<p>Another common step in infrastructure automation is configuring domains. Instead of manually linking domains to applications, a script can add them during environment setup.</p>
<pre><code class="language-plaintext">sevalla apps domains add &lt;app-id&gt; --name example.com
</code></pre>
<p>With these commands combined in scripts or pipelines, you can fully automate the lifecycle of your infrastructure. A CI pipeline could create an application, provision a database, configure environment variables, deploy code, attach a domain, and monitor performance – all without human intervention.</p>
<p>Because every command supports JSON output, scripts can also capture values returned by the platform and reuse them in later steps. For example:</p>
<pre><code class="language-plaintext">APP_ID=$(sevalla apps list --json | jq -r '.[0].id')
</code></pre>
<p>This ability to chain commands together makes it easy to build powerful automation workflows.</p>
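<p>As an illustration, a small Node/TypeScript wrapper could chain the commands shown above, capturing JSON output from one step and reusing it in the next. This is only a sketch – verify the exact flags and output fields against the CLI's own help output:</p>
<pre><code class="language-typescript">// Sketch: chaining Sevalla CLI commands from a script. The commands mirror
// the examples above; treat flags and JSON fields as assumptions to verify.
import { execSync } from "node:child_process";

function run(cmd: string): string {
  return execSync(cmd, { encoding: "utf8" }).trim();
}

// Capture the first app's ID from JSON output.
const apps = JSON.parse(run("sevalla apps list --json"));
const appId: string = apps[0].id;

// Reuse the ID in later steps: configure the app, then deploy it.
run(`sevalla apps env-vars create ${appId} --key DATABASE_URL --value "postgres://..."`);
run(`sevalla apps deployments trigger ${appId} --branch main`);
</code></pre>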
<p>In practice, teams often place these commands inside deployment scripts or pipeline steps. Whenever code is pushed to a repository, the pipeline automatically provisions or updates the infrastructure needed to run the application.</p>
<p>This approach demonstrates how APIs and automation tools can turn infrastructure into something you can manage the same way you manage application code: through scripts, version control, and automated workflows.</p>
<h2 id="heading-infrastructure-as-code-improves-developer-productivity">Infrastructure as Code Improves Developer Productivity</h2>
<p>One of the biggest benefits of Infrastructure as Code is developer productivity.</p>
<p>Developers no longer need to wait for infrastructure changes or manually configure environments.</p>
<p>Instead, infrastructure becomes part of the development workflow.</p>
<p>When a new feature requires a service, the developer simply adds the infrastructure definition to the repository. The pipeline then creates it automatically.</p>
<p>This reduces delays and keeps development moving quickly.</p>
<p>It also makes onboarding easier. New team members can spin up a full environment with a single command.</p>
<h2 id="heading-the-future-of-infrastructure">The Future of Infrastructure</h2>
<p>Cloud infrastructure continues to evolve toward automation and programmability.</p>
<p>Platforms increasingly expose APIs that allow every resource to be created, configured, and monitored through code.</p>
<p>This trend aligns naturally with the way developers already work.</p>
<p>Applications are built with code. Deployments are automated with code. It makes sense that infrastructure should also be defined with code.</p>
<p>Infrastructure as Code with APIs takes this idea even further. It allows infrastructure to be embedded directly into development workflows, pipelines, and internal tools.</p>
<p>The result is faster development, fewer configuration errors, and more reliable systems.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Infrastructure as Code has transformed how teams manage cloud environments.</p>
<p>By replacing manual configuration with code, organizations gain consistency, automation, and repeatability.</p>
<p>Using APIs to control infrastructure adds another level of flexibility. Developers can integrate infrastructure directly into scripts, pipelines, and applications.</p>
<p>This approach turns the cloud into a programmable platform.</p>
<p>As systems grow more complex and deployment cycles accelerate, the ability to automate infrastructure will only become more important.</p>
<p>For modern development teams, treating infrastructure as code is no longer optional. It's the foundation of reliable and scalable software delivery.</p>
<p><em>Hope you enjoyed this article. Learn more about me by</em> <a href="https://www.linkedin.com/in/manishmshiva/"><em><strong>visiting my LinkedIn</strong></em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Full-Stack CRUD App with React, AWS Lambda, DynamoDB, and Cognito Auth ]]>
                </title>
                <description>
                    <![CDATA[ Building a web application that works only on your local machine is one thing. Building one that is secure, connected to a real database, and accessible to anyone on the internet is another challenge  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/full-stack-aws-react-lambda-dynamodb-tutorial/</link>
                <guid isPermaLink="false">69b96f7ec22d3eeb8ac3bf81</guid>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                    <category>
                        <![CDATA[ full stack ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Benedicta Onyebuchi ]]>
                </dc:creator>
                <pubDate>Tue, 17 Mar 2026 15:13:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/1a996eff-72f5-4f4d-b8da-cf4d646c3224.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building a web application that works only on your local machine is one thing. Building one that is secure, connected to a real database, and accessible to anyone on the internet is another challenge entirely. And it requires a different set of tools.</p>
<p>Most production web applications share a common set of needs: they store and retrieve data, they expose that data through an API, they require users to authenticate before accessing sensitive operations, and they need to be deployed somewhere reliable and fast.</p>
<p>Meeting all of those needs used to require managing servers, configuring databases, handling authentication infrastructure, and provisioning hosting environments – often as separate, manual processes.</p>
<p>AWS changes that model significantly. With the combination of services you'll use in this tutorial (Lambda, DynamoDB, API Gateway, Cognito, and CloudFront), you can build and deploy a fully functional, secured, globally distributed application without managing a single server.</p>
<p>Each service handles one specific responsibility:</p>
<ul>
<li><p>DynamoDB stores your data</p>
</li>
<li><p>Lambda runs your business logic on demand</p>
</li>
<li><p>API Gateway exposes your functions as a REST API</p>
</li>
<li><p>Cognito manages user authentication</p>
</li>
<li><p>CloudFront delivers your frontend worldwide over HTTPS.</p>
</li>
</ul>
<p>The AWS CDK (Cloud Development Kit) ties all of this together by letting you define every one of those services as TypeScript code. Instead of clicking through the AWS Console to configure each resource manually, you describe your entire infrastructure in a single file and deploy it with one command.</p>
<p>By the end of this tutorial, you will have a fully deployed vendor management dashboard. Users can sign up, log in, and then create, read, and delete vendors, with all data securely stored in AWS DynamoDB and all routes protected by Amazon Cognito authentication.</p>
<h2 id="heading-what-youll-build">What You'll Build</h2>
<p>In this tutorial, you'll build a two-panel web app where authenticated users can:</p>
<ul>
<li><p>Add a new vendor (name, category, contact email)</p>
</li>
<li><p>View all saved vendors in real time</p>
</li>
<li><p>Delete a vendor from the list</p>
</li>
<li><p>Sign in and sign out securely</p>
</li>
</ul>
<p>The frontend is built with Next.js. The backend runs entirely on AWS: DynamoDB stores the data, Lambda functions handle the logic, API Gateway exposes a REST API, Cognito manages authentication, and CloudFront serves the app globally over HTTPS.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-who-this-is-for">Who This Is For</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-architecture-overview">Architecture Overview</a></p>
</li>
<li><p><a href="#heading-part-1-set-up-your-aws-account-and-tools">Part 1: Set Up Your AWS Account and Tools</a></p>
</li>
<li><p><a href="#heading-part-2-set-up-the-project-structure">Part 2: Set Up the Project Structure</a></p>
</li>
<li><p><a href="#heading-part-3-define-the-database-dynamodb">Part 3: Define the Database (DynamoDB)</a></p>
</li>
<li><p><a href="#heading-part-4-write-the-lambda-functions">Part 4: Write the Lambda Functions</a></p>
</li>
<li><p><a href="#heading-part-5-build-the-api-with-api-gateway">Part 5: Build the API with API Gateway</a></p>
</li>
<li><p><a href="#heading-part-6-deploy-the-backend-to-aws">Part 6: Deploy the Backend to AWS</a></p>
</li>
<li><p><a href="#heading-part-7-build-the-react-frontend">Part 7: Build the React Frontend</a></p>
</li>
<li><p><a href="#heading-part-8-add-authentication-with-amazon-cognito">Part 8: Add Authentication with Amazon Cognito</a></p>
</li>
<li><p><a href="#heading-part-9-deploy-the-frontend-with-s3-and-cloudfront">Part 9: Deploy the Frontend with S3 and CloudFront</a></p>
</li>
<li><p><a href="#heading-what-you-built">What You Built</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-who-this-is-for">Who This Is For</h2>
<p>This tutorial is for developers who know basic JavaScript and React but have never used AWS. You don't need any prior backend, cloud, or DevOps experience. I'll explain every AWS concept before we use it.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before starting, make sure you have the following installed and available:</p>
<ul>
<li><p><strong>Node.js 18 or higher</strong>: <a href="https://nodejs.org">Download here</a></p>
</li>
<li><p><strong>npm</strong>: Included with Node.js</p>
</li>
<li><p><strong>A code editor</strong>: I recommend VS Code</p>
</li>
<li><p><strong>A terminal</strong>: Any terminal on macOS, Linux, or Windows (WSL recommended on Windows)</p>
</li>
<li><p><strong>An AWS account</strong>: You will create one in Part 1. A credit card is required, but the Free Tier covers everything in this tutorial.</p>
</li>
<li><p><strong>Basic familiarity with React and TypeScript</strong>: You should understand components, <code>useState</code>, and <code>useEffect</code>.</p>
</li>
</ul>
<h2 id="heading-architecture-overview">Architecture Overview</h2>
<p>Before writing any code, here's a plain-English description of how the pieces fit together.</p>
<p>When a user clicks "Add Vendor" in the React app:</p>
<ol>
<li><p>The frontend reads the user's JWT auth token from the browser session</p>
</li>
<li><p>It sends a <code>POST</code> request to API Gateway, including the token in the request header</p>
</li>
<li><p>API Gateway checks the token against Cognito. If the token is invalid or missing, it rejects the request with a 401 error immediately</p>
</li>
<li><p>If the token is valid, API Gateway passes the request to the createVendor Lambda function</p>
</li>
<li><p>The Lambda function writes the new vendor to DynamoDB</p>
</li>
<li><p>DynamoDB confirms the write, and the Lambda returns a success response</p>
</li>
<li><p>The frontend re-fetches the vendor list and updates the UI</p>
</li>
</ol>
<p>The same flow applies to reading and deleting vendors, with different Lambda functions and HTTP methods.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/70486bdc-f272-45db-be30-f10752916546.png" alt="Architecture diagram of the Vendors Tracker Application" style="display:block;margin:0 auto" width="1100" height="499" loading="lazy">

<p><strong>How the app is deployed:</strong> Your React app is exported as a static site, uploaded to an S3 bucket, and served globally through CloudFront. Your backend infrastructure (Lambda functions, API Gateway, DynamoDB, Cognito) is defined in TypeScript using AWS CDK and deployed with a single command.</p>
<h2 id="heading-part-1-set-up-your-aws-account-and-tools">Part 1: Set Up Your AWS Account and Tools</h2>
<p>Before writing any application code, you need three things in place: an AWS account, the right tools on your machine, and credentials that let those tools communicate with AWS on your behalf.</p>
<h3 id="heading-11-create-your-aws-account">1.1 Create Your AWS Account</h3>
<p>If you don't have an AWS account:</p>
<ol>
<li><p>Go to <a href="https://aws.amazon.com">https://aws.amazon.com</a></p>
</li>
<li><p>Click <strong>Create an AWS Account</strong></p>
</li>
<li><p>Follow the sign-up prompts and add a payment method</p>
</li>
<li><p>Once registered, log in to the AWS Management Console</p>
</li>
</ol>
<p>AWS has a Free Tier that covers all the services used in this tutorial. You won't be charged for normal use while following along.</p>
<h3 id="heading-12-install-the-aws-cli-and-cdk">1.2 Install the AWS CLI and CDK</h3>
<p>The <strong>AWS CLI</strong> is a command-line tool that lets you interact with AWS from your terminal: checking resources, configuring credentials, and more.</p>
<p>The <strong>AWS CDK (Cloud Development Kit)</strong> is the tool you will use to define your entire backend (database, Lambda functions, API) using TypeScript code. Instead of clicking through the AWS Console to create each resource, you describe what you want in a TypeScript file and CDK builds it for you.</p>
<p>Install both:</p>
<pre><code class="language-shell"># Install AWS CLI (macOS)
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

# For Linux, see: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html
# For Windows, see: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-windows.html

# Install AWS CDK globally
npm install -g aws-cdk
</code></pre>
<p>Verify both are installed:</p>
<pre><code class="language-shell">aws --version
cdk --version
</code></pre>
<p>Both commands should print a version number. If they do, you are ready to move on.</p>
<h3 id="heading-13-configure-your-aws-credentials-iam">1.3 Configure Your AWS Credentials (IAM)</h3>
<p>This step is critical. Your terminal needs a set of credentials – like a username and password – to act on your behalf inside AWS.</p>
<p>Think of your root account (the one you signed up with) as the master key to your entire AWS account. You should never use it for day-to-day development. Instead, you will create a separate IAM user with its own set of keys. If those keys are ever exposed, you can delete them without compromising your root account.</p>
<h4 id="heading-phase-1-create-an-iam-user">Phase 1: Create an IAM User</h4>
<ol>
<li><p>Log in to the AWS Console and search for IAM in the top search bar</p>
</li>
<li><p>In the left sidebar, click Users, then click Create user</p>
</li>
<li><p>Name the user <code>cdk-dev</code>. Leave "Provide user access to the AWS Management Console" unchecked – you only need terminal access, not console access</p>
</li>
<li><p>On the permissions screen, choose Attach policies directly</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/d4699108-c1aa-4dd3-957c-b84292c719a2.png" alt="IAM Console showing the “Attach policies directly” screen with AdministratorAccess checked" style="display:block;margin:0 auto" width="1100" height="441" loading="lazy">

<ol start="5">
<li>Search for <code>AdministratorAccess</code> and check the box next to it</li>
</ol>
<p>Note on permissions: in a production environment you would attach a more restricted policy. For this tutorial, administrator access is needed because CDK creates many different types of AWS resources.</p>
<ol start="6">
<li>Click through to the end and click Create user</li>
</ol>
<h4 id="heading-phase-2-generate-access-keys">Phase 2: Generate Access Keys</h4>
<ol>
<li><p>Click on your newly created <code>cdk-dev</code> user from the Users list</p>
</li>
<li><p>Go to the Security credentials tab</p>
</li>
<li><p>Scroll down to Access keys and click Create access key</p>
</li>
<li><p>Select Command Line Interface (CLI), check the acknowledgment box, and click Next</p>
</li>
<li><p>Click Create access key</p>
</li>
</ol>
<p><strong>Important</strong>: Copy both the Access Key ID and the Secret Access Key right now. You will never be able to see the Secret Access Key again after closing this screen. Save both values in a password manager or secure note.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/d85bb4eb-0ecf-4d92-be92-d75af5a534c6.png" alt="IAM Console showing the Create access key screen with the Access Key ID and Secret Access Key" style="display:block;margin:0 auto" width="1100" height="426" loading="lazy">

<h4 id="heading-phase-3-connect-your-terminal-to-aws">Phase 3: Connect Your Terminal to AWS</h4>
<p>Run the following command in your terminal:</p>
<pre><code class="language-shell">aws configure
</code></pre>
<p>You will be prompted for four values:</p>
<pre><code class="language-shell">AWS Access Key ID:     [paste your Access Key ID]
AWS Secret Access Key: [paste your Secret Access Key]
Default region name:   us-east-1
Default output format: json
</code></pre>
<p>Use <code>us-east-1</code> as your region for this tutorial. After this step, every CDK and AWS CLI command you run will use these credentials automatically.</p>
<h2 id="heading-part-2-set-up-the-project-structure">Part 2: Set Up the Project Structure</h2>
<p>You will use a <strong>monorepo</strong> layout – one top-level folder with two sub-projects inside: <code>frontend</code> for your React app and <code>backend</code> for your AWS infrastructure code. They are deployed independently but live side by side.</p>
<h3 id="heading-21-create-the-workspace">2.1 Create the Workspace</h3>
<pre><code class="language-shell">mkdir vendor-tracker &amp;&amp; cd vendor-tracker
mkdir backend frontend
</code></pre>
<h3 id="heading-22-initialize-the-frontend-nextjs">2.2 Initialize the Frontend (Next.js)</h3>
<p>Navigate into the <code>frontend</code> folder and run:</p>
<pre><code class="language-shell">cd frontend
npx create-next-app@latest .
</code></pre>
<p>When prompted, choose the following options:</p>
<ul>
<li><p><strong>TypeScript</strong> --&gt; Yes</p>
</li>
<li><p><strong>ESLint</strong> --&gt; Yes</p>
</li>
<li><p><strong>Tailwind CSS</strong> --&gt; Yes</p>
</li>
<li><p><strong>src/ directory</strong> --&gt; No</p>
</li>
<li><p><strong>App Router</strong> --&gt; Yes</p>
</li>
<li><p><strong>Import alias</strong> --&gt; No</p>
</li>
</ul>
<h3 id="heading-23-initialize-the-backend-cdk">2.3 Initialize the Backend (CDK)</h3>
<p>Navigate into the <code>backend</code> folder and run:</p>
<pre><code class="language-shell">cd ../backend
cdk init app --language typescript
</code></pre>
<p>This generates a boilerplate CDK project. The most important file it creates is <code>backend/lib/backend-stack.ts</code>. This is where you will define all of your AWS infrastructure as TypeScript code.</p>
<p>Also install <code>esbuild</code>, which CDK uses to bundle your Lambda functions:</p>
<pre><code class="language-shell">npm install --save-dev esbuild
</code></pre>
<h3 id="heading-24-understanding-cdk-before-you-write-any-code">2.4 Understanding CDK Before You Write Any Code</h3>
<p>CDK is likely different from most tools you have used. Here is how it works:</p>
<p>Normally, you would create AWS resources by clicking through the AWS Console: create a table here, configure a Lambda function there. CDK lets you do all of that using TypeScript code instead.</p>
<p>When you run <code>cdk deploy</code>, CDK reads your TypeScript file, converts it into an AWS CloudFormation template (an internal AWS format for describing infrastructure), and submits it to AWS. AWS then creates all the resources you described.</p>
<p>A few terms you will see throughout this tutorial:</p>
<ul>
<li><p><strong>Stack</strong>: The collection of all AWS resources you define together. Your <code>BackendStack</code> class is your stack.</p>
</li>
<li><p><strong>Construct</strong>: Each individual AWS resource you create inside a stack (a table, a Lambda function, an API) is called a construct.</p>
</li>
<li><p><strong>Deploy</strong>: Running <code>cdk deploy</code> sends your TypeScript definition to AWS and creates or updates the real resources.</p>
</li>
</ul>
<p>The main file you'll work in is <code>backend/lib/backend-stack.ts</code>. Think of it as the blueprint for your entire backend.</p>
<p>Your final project structure will look like this:</p>
<pre><code class="language-plaintext">vendor-tracker/
├── backend/
│   ├── lambda/
│   │   ├── createVendor.ts
│   │   ├── getVendors.ts
│   │   └── deleteVendor.ts
│   ├── lib/
│   │   └── backend-stack.ts
│   └── package.json
└── frontend/
    ├── app/
    │   ├── layout.tsx
    │   ├── page.tsx
    │   └── providers.tsx
    ├── lib/
    │   └── api.ts
    ├── types/
    │   └── vendor.ts
    └── .env.local
</code></pre>
<h2 id="heading-part-3-define-the-database-dynamodb">Part 3: Define the Database (DynamoDB)</h2>
<p>DynamoDB is AWS's NoSQL database. Think of it as a fast, scalable key-value store in the cloud. Every item in a DynamoDB table must have a unique ID called the <strong>partition key</strong>. For your vendor table, that key will be <code>vendorId</code>.</p>
<p>Open <code>backend/lib/backend-stack.ts</code>. Replace the entire file contents with the following:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: {
        name: 'vendorId',
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY, // For development only
    });
  }
}
</code></pre>
<p><strong>What each line does:</strong></p>
<ul>
<li><p><code>partitionKey</code> tells DynamoDB that <code>vendorId</code> is the unique identifier for every record. No two vendors can share the same <code>vendorId</code>.</p>
</li>
<li><p><code>PAY_PER_REQUEST</code> means you only pay when data is actually read or written. There is no charge when the table is idle, which makes it cost-effective for learning.</p>
</li>
<li><p><code>RemovalPolicy.DESTROY</code> means the table will be deleted when you run <code>cdk destroy</code>. For production apps you would not use this.</p>
</li>
</ul>
<h2 id="heading-part-4-write-the-lambda-functions">Part 4: Write the Lambda Functions</h2>
<p>A Lambda function is your server, but unlike a traditional server, it only runs when it's called. AWS spins it up on demand, runs your code, and shuts it down. You're only charged for the time your code is actually running.</p>
<p>You'll write three Lambda functions:</p>
<ul>
<li><p><code>createVendor.ts</code>: Adds a new vendor to DynamoDB</p>
</li>
<li><p><code>getVendors.ts</code>: Returns all vendors from DynamoDB</p>
</li>
<li><p><code>deleteVendor.ts</code>: Removes a vendor from DynamoDB by ID</p>
</li>
</ul>
<p>Create a new folder inside <code>backend</code> (run this from the <code>vendor-tracker</code> root):</p>
<pre><code class="language-shell">mkdir backend/lambda
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/6330a84b-77c3-4001-9783-5fedc89ae1c0.png" alt="6330a84b-77c3-4001-9783-5fedc89ae1c0" style="display:block;margin:0 auto" width="300" height="185" loading="lazy">

<h3 id="heading-a-note-on-the-aws-sdk">A Note on the AWS SDK</h3>
<p>All three Lambda functions use <strong>AWS SDK v3</strong> (<code>@aws-sdk/client-dynamodb</code> and <code>@aws-sdk/lib-dynamodb</code>). This is the current standard. An older version of the SDK (<code>aws-sdk</code>) exists but is deprecated and not bundled in the Node.js 18 Lambda runtime, which is what you'll use. Stick to v3 throughout.</p>
<h3 id="heading-41-create-vendor-lambda">4.1 Create Vendor Lambda</h3>
<p>Create <code>backend/lambda/createVendor.ts</code>:</p>
<pre><code class="language-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import { randomUUID } from "crypto";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async (event: any) =&gt; {
  try {
    const body = JSON.parse(event.body);

    const item = {
      vendorId: randomUUID(), // Generates a collision-safe unique ID
      name: body.name,
      category: body.category,
      contactEmail: body.contactEmail,
      createdAt: new Date().toISOString(),
    };

    await docClient.send(
      new PutCommand({
        TableName: process.env.TABLE_NAME!,
        Item: item,
      })
    );

    return {
      statusCode: 201,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
      },
      body: JSON.stringify({ message: "Vendor created", vendorId: item.vendorId }),
    };
  } catch (error) {
    console.error("Error creating vendor:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to create vendor" }),
    };
  }
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>randomUUID()</code> generates a universally unique ID using Node's built-in <code>crypto</code> module. No extra package is needed. This is more reliable than <code>Date.now()</code>, which can produce duplicate IDs if two requests arrive within the same millisecond.</p>
</li>
<li><p><code>process.env.TABLE_NAME</code> reads the DynamoDB table name from an environment variable. You'll set this value in the CDK stack. This avoids hardcoding the table name inside your Lambda code.</p>
</li>
<li><p>The <code>headers</code> block is required for CORS (Cross-Origin Resource Sharing). Without <code>Access-Control-Allow-Origin</code>, your browser will block responses from a different domain than your frontend. Without <code>Access-Control-Allow-Headers</code>, the <code>Authorization</code> header you add later for Cognito will be rejected during the browser's preflight check.</p>
</li>
</ul>
<h3 id="heading-42-get-vendors-lambda">4.2 Get Vendors Lambda</h3>
<p>Create <code>backend/lambda/getVendors.ts</code>:</p>
<pre><code class="language-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, ScanCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async () =&gt; {
  try {
    const response = await docClient.send(
      new ScanCommand({
        TableName: process.env.TABLE_NAME!,
      })
    );

    return {
      statusCode: 200,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Content-Type": "application/json",
      },
      body: JSON.stringify(response.Items ?? []),
    };
  } catch (error) {
    console.error("Error fetching vendors:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to fetch vendors" }),
    };
  }
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>ScanCommand</code> reads every item in the table and returns them as an array. For a learning project this is fine. In a production app with millions of rows, you would use a more targeted <code>QueryCommand</code> to avoid reading the entire table on every request (see the sketch after this list).</p>
</li>
<li><p><code>response.Items ?? []</code> returns an empty array if the table is empty, preventing the frontend from crashing when there are no vendors yet.</p>
</li>
</ul>
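<p>For reference, a more targeted handler could look like the sketch below. It assumes a global secondary index on <code>category</code> (called <code>CategoryIndex</code> here), which this tutorial's table does not define, so treat it as an illustration rather than a drop-in replacement:</p>
<pre><code class="language-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

// Sketch only: assumes a GSI named "CategoryIndex" whose partition key is `category`.
export const handler = async () =&gt; {
  const response = await docClient.send(
    new QueryCommand({
      TableName: process.env.TABLE_NAME!,
      IndexName: "CategoryIndex",
      KeyConditionExpression: "category = :c",
      ExpressionAttributeValues: { ":c": "SaaS" }, // only vendors in this category
    })
  );

  return {
    statusCode: 200,
    headers: { "Access-Control-Allow-Origin": "*" },
    body: JSON.stringify(response.Items ?? []),
  };
};
</code></pre>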
<h3 id="heading-43-delete-vendor-lambda">4.3 Delete Vendor Lambda</h3>
<p>Create <code>backend/lambda/deleteVendor.ts</code>:</p>
<pre><code class="language-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, DeleteCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async (event: any) =&gt; {
  try {
    const body = JSON.parse(event.body);
    const { vendorId } = body;

    if (!vendorId) {
      return {
        statusCode: 400,
        headers: { "Access-Control-Allow-Origin": "*" },
        body: JSON.stringify({ error: "vendorId is required" }),
      };
    }

    await docClient.send(
      new DeleteCommand({
        TableName: process.env.TABLE_NAME!,
        Key: { vendorId },
      })
    );

    return {
      statusCode: 200,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
      },
      body: JSON.stringify({ message: "Vendor deleted" }),
    };
  } catch (error) {
    console.error("Error deleting vendor:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to delete vendor" }),
    };
  }
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>DeleteCommand</code> removes the item whose <code>vendorId</code> matches the key you provide. DynamoDB doesn't return an error if the item doesn't exist. It simply does nothing.</p>
</li>
<li><p>The <code>400</code> guard at the top returns a clear error if the caller forgets to send a <code>vendorId</code>, rather than letting DynamoDB throw a confusing internal error.</p>
</li>
</ul>
<h2 id="heading-part-5-build-the-api-with-api-gateway">Part 5: Build the API with API Gateway</h2>
<p>API Gateway is what gives your Lambda functions a public URL. Without it, there's no way for your browser to trigger a Lambda function. Think of it as the front door of your backend: it receives HTTP requests, checks whether the caller is authorized, routes the request to the correct Lambda, and returns the Lambda's response to the caller.</p>
<p>Now you'll wire everything together in <code>backend/lib/backend-stack.ts</code>.</p>
<h3 id="heading-51-add-lambda-functions-and-api-gateway-to-the-stack">5.1 Add Lambda Functions and API Gateway to the Stack</h3>
<p>Replace the entire contents of <code>backend/lib/backend-stack.ts</code> with this complete, assembled file:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table 
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: {
        name: 'vendorId',
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // 2. Lambda Functions
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // 3. Permissions (Least Privilege)
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // 4. API Gateway
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda));
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda));
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda));

    // 5. Outputs
    new cdk.CfnOutput(this, 'ApiEndpoint', {
      value: api.url,
    });
  }
}
</code></pre>
<p><strong>What each section does:</strong></p>
<p><code>NodejsFunction</code> is a special CDK construct that automatically bundles your Lambda code and all its dependencies into a single file using <code>esbuild</code> before uploading it to AWS. This is why you installed <code>esbuild</code> in Part 2.</p>
<p>Always use <code>NodejsFunction</code> instead of the basic <code>lambda.Function</code> construct. The basic version requires you to manually manage bundling, which causes "Module not found" errors at runtime.</p>
<p><strong>Permissions (Least Privilege):</strong> In AWS, no resource can communicate with any other resource by default. A Lambda function has no access to DynamoDB, S3, or anything else unless you explicitly grant it.</p>
<p>This is called the <strong>Least Privilege</strong> principle: each piece of your system gets exactly the permissions it needs, and nothing more. <code>grantWriteData</code> lets a Lambda write and delete items. <code>grantReadData</code> lets a Lambda read items. Using separate grants for each function means the <code>getVendors</code> Lambda can never accidentally delete data.</p>
<p><code>CfnOutput</code> prints a value to your terminal after <code>cdk deploy</code> completes. You'll use the <code>ApiEndpoint</code> URL to configure your frontend.</p>
<h2 id="heading-part-6-deploy-the-backend-to-aws">Part 6: Deploy the Backend to AWS</h2>
<p>Your infrastructure is fully defined in code. Now you'll deploy it to AWS and get a live API URL.</p>
<h3 id="heading-61-bootstrap-your-aws-environment">6.1 Bootstrap Your AWS Environment</h3>
<p>Before your first CDK deployment, AWS needs a small landing zone in your account – an S3 bucket where CDK can upload your Lambda bundles and other assets. This setup step is called <strong>bootstrapping</strong> and only needs to be done once per AWS account per region.</p>
<p>From inside your <code>backend</code> folder, run:</p>
<pre><code class="language-shell">cdk bootstrap
</code></pre>
<p><strong>Important</strong>: Bootstrapping is region-specific. If you ever switch to a different AWS region, you will need to run <code>cdk bootstrap</code> again in that region.</p>
<h3 id="heading-62-deploy">6.2 Deploy</h3>
<p>Run:</p>
<pre><code class="language-shell">cdk deploy
</code></pre>
<p>CDK will display a summary of everything it is about to create and ask for your confirmation. Type <code>y</code> and press Enter.</p>
<p>When the deployment finishes, you'll see an <strong>Outputs</strong> section in your terminal:</p>
<pre><code class="language-plaintext">Outputs:
BackendStack.ApiEndpoint = https://abcdef123.execute-api.us-east-1.amazonaws.com/prod/
</code></pre>
<p>Copy that URL. You'll need it when building the frontend.</p>
<h3 id="heading-63-troubleshooting-how-to-read-aws-error-logs">6.3 Troubleshooting: How to Read AWS Error Logs</h3>
<p>Real deployments rarely go perfectly the first time. If something goes wrong after deploying, here is how to find the actual error message.</p>
<h4 id="heading-error-502-bad-gateway">Error: 502 Bad Gateway</h4>
<p>A <code>502</code> means API Gateway received your request but your Lambda crashed before it could respond. The most common cause is a missing environment variable – for example, if <code>TABLE_NAME</code> was not passed correctly and the Lambda cannot find the table.</p>
<p>To find the actual error message, use <strong>CloudWatch Logs</strong>:</p>
<ol>
<li><p>Log in to the AWS Console and search for CloudWatch</p>
</li>
<li><p>In the left sidebar, click Logs --&gt; Log groups</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/abfb78fc-574b-4a75-a12b-12fb09f041b3.png" alt="CloudWatch left sidebar with log groups, and the search field showing /aws/lambda/" style="display:block;margin:0 auto" width="1915" height="428" loading="lazy">

<ol start="3">
<li><p>Find the group named <code>/aws/lambda/BackendStack-CreateVendorHandler...</code></p>
</li>
<li><p>Click the most recent Log stream</p>
</li>
<li><p>Read the error message. It will tell you exactly what went wrong</p>
</li>
</ol>
<p>Two common messages and their fixes:</p>
<ul>
<li><p><code>Runtime.ImportModuleError</code>: Your Lambda cannot find a module. Make sure you're using <code>NodejsFunction</code> (not <code>lambda.Function</code>) in your CDK stack. <code>NodejsFunction</code> automatically bundles dependencies; <code>lambda.Function</code> does not.</p>
</li>
<li><p><code>AccessDeniedException</code>: Your Lambda tried to access DynamoDB but doesn't have permission. Check that you have the correct <code>grantWriteData</code> or <code>grantReadData</code> call in your stack for that Lambda.</p>
</li>
</ul>
<h2 id="heading-part-7-build-the-react-frontend">Part 7: Build the React Frontend</h2>
<p>Your backend is live. Now you'll build the React UI that talks to it.</p>
<h3 id="heading-71-define-the-vendor-type">7.1 Define the Vendor Type</h3>
<p>Before writing any API or component code, define what a "vendor" looks like in TypeScript. This gives you type safety throughout your frontend code.</p>
<p>Create <code>frontend/types/vendor.ts</code>:</p>
<pre><code class="language-typescript">export interface Vendor {
  vendorId?: string; // Optional when creating — the Lambda generates it
  name: string;
  category: string;
  contactEmail: string;
  createdAt?: string;
}
</code></pre>
<p>The <code>vendorId</code> field is marked optional with <code>?</code> because when you are <em>creating</em> a new vendor, you don't have an ID yet. The <code>createVendor</code> Lambda generates one. When you <em>read</em> vendors back from the API, <code>vendorId</code> will always be present.</p>
<h3 id="heading-72-create-the-api-service-layer">7.2 Create the API Service Layer</h3>
<p>Rather than writing <code>fetch</code> calls directly inside your React components, you'll centralize all your API logic in one file. This pattern is called a <strong>service layer</strong>. It keeps your components clean and makes it easy to update API calls in one place.</p>
<p>First, create a <code>.env.local</code> file inside your <code>frontend</code> folder to store your API URL:</p>
<pre><code class="language-bash"># frontend/.env.local
NEXT_PUBLIC_API_URL=https://abcdef123.execute-api.us-east-1.amazonaws.com/prod
</code></pre>
<p>Replace the URL with the <code>ApiEndpoint</code> value from your <code>cdk deploy</code> output. The <code>NEXT_PUBLIC_</code> prefix is required by Next.js to make an environment variable accessible in the browser.</p>
<p>You might be wondering: <strong>why not hardcode the URL</strong>? If you paste your API URL directly into your code and push it to GitHub, it becomes publicly visible. While an API URL alone does not expose your data (Cognito will protect that), it's good practice to keep URLs and secrets out of source control. Always use <code>.env.local</code> and add it to your <code>.gitignore</code>.</p>
<p>Make sure <code>.env.local</code> is in your <code>.gitignore</code>:</p>
<pre><code class="language-shell">echo ".env.local" &gt;&gt; frontend/.gitignore
</code></pre>
<p>Now create <code>frontend/lib/api.ts</code>:</p>
<pre><code class="language-typescript">import { Vendor } from '@/types/vendor';

const BASE_URL = process.env.NEXT_PUBLIC_API_URL!;

export const getVendors = async (): Promise&lt;Vendor[]&gt; =&gt; {
  const response = await fetch(`${BASE_URL}/vendors`);
  if (!response.ok) throw new Error('Failed to fetch vendors');
  return response.json();
};

export const createVendor = async (vendor: Omit&lt;Vendor, 'vendorId' | 'createdAt'&gt;): Promise&lt;void&gt; =&gt; {
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(vendor),
  });
  if (!response.ok) throw new Error('Failed to create vendor');
};

export const deleteVendor = async (vendorId: string): Promise&lt;void&gt; =&gt; {
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'DELETE',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ vendorId }),
  });
  if (!response.ok) throw new Error('Failed to delete vendor');
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>Omit&lt;Vendor, 'vendorId' | 'createdAt'&gt;</code> means the <code>createVendor</code> function accepts a vendor without an ID or timestamp (those are generated server-side).</p>
</li>
<li><p><code>if (!response.ok) throw new Error(...)</code> ensures that any HTTP error (4xx or 5xx) surfaces as a JavaScript error in your component, where you can show the user a meaningful message instead of silently failing.</p>
</li>
</ul>
<p>You'll update these functions later in Part 8 to include the Cognito auth token.</p>
<h3 id="heading-73-build-the-main-page">7.3 Build the Main Page</h3>
<p>Now create the main page component. It includes a form for adding vendors and a live list that displays all current vendors.</p>
<p>Replace the contents of <code>frontend/app/page.tsx</code> with:</p>
<pre><code class="language-typescript">'use client';

import { useState, useEffect } from 'react';
import { createVendor, getVendors, deleteVendor } from '@/lib/api';
import { Vendor } from '@/types/vendor';

export default function Home() {
  const [vendors, setVendors] = useState&lt;Vendor[]&gt;([]);
  const [form, setForm] = useState({ name: '', category: '', contactEmail: '' });
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');

  const loadVendors = async () =&gt; {
    try {
      const data = await getVendors();
      setVendors(data);
    } catch {
      setError('Failed to load vendors.');
    }
  };

  // Load vendors once when the page first renders
  useEffect(() =&gt; {
    loadVendors();
  }, []);
  // The empty [] means this runs only once. Without it, the effect would
  // run after every render, causing an infinite loop of fetch requests.

  const handleSubmit = async (e: React.FormEvent) =&gt; {
    e.preventDefault(); // Prevent the browser from reloading the page on submit
    setLoading(true);
    setError('');
    try {
      await createVendor(form);
      setForm({ name: '', category: '', contactEmail: '' }); // Reset the form
      await loadVendors(); // Refresh the list from DynamoDB
    } catch {
      setError('Failed to add vendor. Please try again.');
    } finally {
      setLoading(false);
    }
  };

  const handleDelete = async (vendorId: string) =&gt; {
    try {
      await deleteVendor(vendorId);
      await loadVendors(); // Refresh after deleting
    } catch {
      setError('Failed to delete vendor.');
    }
  };

  return (
    &lt;main className="p-10 max-w-5xl mx-auto"&gt;
      &lt;h1 className="text-3xl font-bold mb-2 text-gray-900"&gt;Vendor Tracker&lt;/h1&gt;
      &lt;p className="text-gray-500 mb-8"&gt;Manage your vendors, stored in AWS DynamoDB.&lt;/p&gt;

      {error &amp;&amp; (
        &lt;div className="mb-4 p-3 bg-red-100 text-red-700 rounded"&gt;{error}&lt;/div&gt;
      )}

      &lt;div className="grid grid-cols-1 md:grid-cols-2 gap-10"&gt;

        {/* ── Add Vendor Form ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;Add New Vendor&lt;/h2&gt;
          &lt;form onSubmit={handleSubmit} className="space-y-4"&gt;
            &lt;input
              className="w-full p-2 border rounded text-black focus:outline-none focus:ring-2 focus:ring-orange-400"
              placeholder="Vendor Name"
              value={form.name}
              onChange={e =&gt; setForm({ ...form, name: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black focus:outline-none focus:ring-2 focus:ring-orange-400"
              placeholder="Category (e.g. SaaS, Hardware)"
              value={form.category}
              onChange={e =&gt; setForm({ ...form, category: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black focus:outline-none focus:ring-2 focus:ring-orange-400"
              placeholder="Contact Email"
              type="email"
              value={form.contactEmail}
              onChange={e =&gt; setForm({ ...form, contactEmail: e.target.value })}
              required
            /&gt;
            &lt;button
              type="submit"
              disabled={loading}
              className="w-full bg-orange-500 text-white p-2 rounded hover:bg-orange-600 disabled:bg-gray-400 transition-colors"
            &gt;
              {loading ? 'Saving...' : 'Add Vendor'}
            &lt;/button&gt;
          &lt;/form&gt;
        &lt;/section&gt;

        {/* ── Vendor List ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;
            Current Vendors ({vendors.length})
          &lt;/h2&gt;
          &lt;div className="space-y-3"&gt;
            {vendors.length === 0 ? (
              &lt;p className="text-gray-400 italic"&gt;No vendors yet. Add one using the form.&lt;/p&gt;
            ) : (
              vendors.map(v =&gt; (
                &lt;div
                  key={v.vendorId}
                  className="p-4 border rounded shadow-sm bg-white flex justify-between items-start"
                &gt;
                  &lt;div&gt;
                    &lt;p className="font-semibold text-gray-900"&gt;{v.name}&lt;/p&gt;
                    &lt;p className="text-sm text-gray-500"&gt;{v.category} · {v.contactEmail}&lt;/p&gt;
                  &lt;/div&gt;
                  &lt;button
                    onClick={() =&gt; v.vendorId &amp;&amp; handleDelete(v.vendorId)}
                    className="ml-4 text-sm text-red-500 hover:text-red-700 hover:underline"
                  &gt;
                    Delete
                  &lt;/button&gt;
                &lt;/div&gt;
              ))
            )}
          &lt;/div&gt;
        &lt;/section&gt;

      &lt;/div&gt;
    &lt;/main&gt;
  );
}
</code></pre>
<p><strong>Key points in this component:</strong></p>
<ul>
<li><p><code>'use client'</code> at the top is a Next.js directive. It tells Next.js that this component uses browser APIs (<code>useState</code>, <code>useEffect</code>, event handlers) and must run in the browser, not be pre-rendered on the server.</p>
</li>
<li><p><code>e.preventDefault()</code> inside <code>handleSubmit</code> stops the browser's default form submission behavior, which would cause a full page reload and wipe your React state.</p>
</li>
<li><p>After every <code>createVendor</code> or <code>deleteVendor</code> call, <code>loadVendors()</code> is called again. This re-fetches the latest data from DynamoDB so the UI always matches what is actually stored in the database.</p>
</li>
</ul>
<h3 id="heading-74-test-the-app-locally">7.4 Test the App Locally</h3>
<p>Start your Next.js development server:</p>
<pre><code class="language-shell">cd frontend
npm run dev
</code></pre>
<p>Open <code>http://localhost:3000</code> in your browser. You should see the two-panel layout. Try adding a vendor and confirm it appears in the list.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/281f971a-27b8-49b3-9079-e12601525d80.png" alt="The running Vendor Tracker app at localhost:3000 showing the two-panel layout with the Add Vendor form on the left and an empty vendor list on the right" style="display:block;margin:0 auto" width="1690" height="708" loading="lazy">

<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/88b5dd74-5847-4310-bec3-b1a2b129fbaa.png" alt="The Vendor Tracker app after a vendor has been added, showing the vendor card in the list" style="display:block;margin:0 auto" width="1646" height="598" loading="lazy">

<h4 id="heading-verifying-the-connection-to-aws">Verifying the connection to AWS:</h4>
<p>Open Chrome DevTools (F12) and click the Network tab. When you add a vendor, you should see:</p>
<ul>
<li><p>A <code>POST</code> request to your AWS API URL returning a <strong>201</strong> status code</p>
</li>
<li><p>A <code>GET</code> request returning <strong>200</strong> with the updated vendor list</p>
</li>
</ul>
<p>You can also verify the data was saved by opening the AWS Console, navigating to <strong>DynamoDB --&gt; Tables --&gt; VendorTable --&gt; Explore table items</strong>. Your vendor should appear there.</p>
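<p>If you prefer to check the API without the browser at all, you can call it from any Node script or REPL. A quick sketch (the URL is a placeholder; substitute your own <code>ApiEndpoint</code> value):</p>
<pre><code class="language-typescript">// Sketch: calling the (still unauthenticated) API directly to confirm the
// backend works end to end. The URL below is a placeholder.
const res = await fetch('https://abc123.execute-api.us-east-1.amazonaws.com/prod/vendors');
console.log(res.status);       // expect 200
console.log(await res.json()); // expect an array of vendor objects
</code></pre>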
<h2 id="heading-part-8-add-authentication-with-amazon-cognito">Part 8: Add Authentication with Amazon Cognito</h2>
<p>Right now your API is completely open. Anyone who finds your API URL can add or delete vendors. You'll fix that with <strong>Amazon Cognito</strong>.</p>
<p>Cognito is AWS's authentication service. It manages a User Pool – a database of registered users with usernames and passwords. When a user logs in, Cognito issues a JWT (JSON Web Token): a cryptographically signed string that proves who the user is. Your API Gateway will check for this token on every request. No valid token means no access.</p>
<p><strong>What is a JWT?</strong> A JSON Web Token is a string that looks like <code>eyJhbGci...</code>. It contains encoded information about the user and is signed by Cognito with a private key that only Cognito holds, so nobody else can forge a valid token.</p>
<p>API Gateway can verify the signature without contacting Cognito on every request, which makes token checking fast. Think of it as a tamper-proof badge: anyone can read the name on it, but only Cognito's signature makes it valid.</p>
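<p>If you're curious what's inside a token, the payload is just base64url-encoded JSON. Here's a small sketch (for debugging in a Node context, not something the app needs) of how you could decode it and the kind of claims a Cognito ID token typically carries:</p>
<pre><code class="language-typescript">// Sketch: decode a JWT's payload segment locally. API Gateway does the real
// signature verification; this is only useful for inspecting claims.
function decodeJwtPayload(token: string): Record&lt;string, unknown&gt; {
  const payloadSegment = token.split('.')[1];
  return JSON.parse(Buffer.from(payloadSegment, 'base64url').toString('utf8'));
}

// A Cognito ID token's payload typically includes claims such as:
//   sub        – the user's unique ID
//   email      – the signed-in email address
//   token_use  – "id"
//   iss        – your User Pool URL
//   aud        – your app client ID
//   exp        – expiry as a Unix timestamp
</code></pre>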
<h3 id="heading-81-add-cognito-to-the-cdk-stack">8.1 Add Cognito to the CDK Stack</h3>
<p>Open <code>backend/lib/backend-stack.ts</code> and update it to include Cognito. Here is the complete updated file:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // ─── 1. DynamoDB Table ────────────────────────────────────────────────────
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: { name: 'vendorId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // ─── 2. Lambda Functions ──────────────────────────────────────────────────
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // ─── 3. Permissions ───────────────────────────────────────────────────────
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // ─── 4. Cognito User Pool ─────────────────────────────────────────────────
    const userPool = new cognito.UserPool(this, 'VendorUserPool', {
      selfSignUpEnabled: true,
      signInAliases: { email: true },
      autoVerify: { email: true },
      userVerification: {
        emailStyle: cognito.VerificationEmailStyle.CODE,
      },
    });

    // Required to host Cognito's internal auth endpoints
    userPool.addDomain('VendorUserPoolDomain', {
      cognitoDomain: {
        domainPrefix: `vendor-tracker-${this.account}`,
      },
    });

    const userPoolClient = userPool.addClient('VendorAppClient');

    // ─── 5. API Gateway + Authorizer ──────────────────────────────────────────
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const authorizer = new apigateway.CognitoUserPoolsAuthorizer(
      this,
      'VendorAuthorizer',
      { cognitoUserPools: [userPool] }
    );

    const authOptions = {
      authorizer,
      authorizationType: apigateway.AuthorizationType.COGNITO,
    };

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda), authOptions);
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda), authOptions);
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda), authOptions);

    // ─── 6. Outputs ───────────────────────────────────────────────────────────
    new cdk.CfnOutput(this, 'ApiEndpoint', { value: api.url });
    new cdk.CfnOutput(this, 'UserPoolId', { value: userPool.userPoolId });
    new cdk.CfnOutput(this, 'UserPoolClientId', { value: userPoolClient.userPoolClientId });
  }
}
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/c5e91abf-e6af-429f-bf5b-b14d18233f6c.png" alt="The newly created User Pool (VendorUserPool...) in the User Pools list, with the User Pool ID visible" style="display:block;margin:0 auto" width="1100" height="454" loading="lazy">

<p><strong>What changed:</strong></p>
<ul>
<li><p><code>CognitoUserPoolsAuthorizer</code> tells API Gateway to check every request for a valid Cognito JWT before passing it to any Lambda. If the token is missing or invalid, API Gateway rejects the request with a <code>401 Unauthorized</code> response without ever touching your Lambda.</p>
</li>
<li><p><code>authOptions</code> is applied to all three API methods: GET, POST, and DELETE. All routes are now protected.</p>
</li>
<li><p><code>autoVerify: { email: true }</code> tells Cognito to mark the email attribute as verified after a user confirms via the verification code email. It doesn't skip the verification email – users still receive a code. If you want to skip verification during development, you can manually confirm users in the Cognito console (covered in section 8.6).</p>
</li>
<li><p>Two new <code>CfnOutput</code> values (<code>UserPoolId</code> and <code>UserPoolClientId</code>) will appear in your terminal after the next deployment. Your frontend needs them to connect to Cognito.</p>
</li>
</ul>
<p>Deploy the updated stack:</p>
<pre><code class="language-shell">cd backend
cdk deploy
</code></pre>
<p>After deployment, your terminal output will include three values:</p>
<pre><code class="language-plaintext">Outputs:
BackendStack.ApiEndpoint     = https://abc123.execute-api.us-east-1.amazonaws.com/prod/
BackendStack.UserPoolId      = us-east-1_xxxxxxxx
BackendStack.UserPoolClientId = xxxxxxxxxxxxxxxxxxxx
</code></pre>
<p>Save all three values. You'll use them in the next step.</p>
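<p>If you want to confirm the lockdown right away, repeat the quick check from Part 7.4. This time the unauthenticated request should fail, which is exactly what you want (again, substitute your own <code>ApiEndpoint</code> value):</p>
<pre><code class="language-typescript">// Sketch: with the Cognito authorizer in place, a request with no
// Authorization header is rejected before it ever reaches a Lambda.
const res = await fetch('https://abc123.execute-api.us-east-1.amazonaws.com/prod/vendors');
console.log(res.status); // expect 401 Unauthorized
</code></pre>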
<h3 id="heading-82-install-and-configure-aws-amplify">8.2 Install and Configure AWS Amplify</h3>
<p><strong>AWS Amplify</strong> is a frontend library that handles all the complex authentication logic for you: it manages the login UI, stores tokens in the browser, refreshes expired tokens automatically, and exposes a simple API to read the current user's session.</p>
<p>Install the Amplify libraries inside your <code>frontend</code> folder:</p>
<pre><code class="language-shell">cd frontend
npm install aws-amplify @aws-amplify/ui-react
</code></pre>
<p>Create <code>frontend/app/providers.tsx</code>. This file initializes Amplify with your Cognito configuration. It runs once when the app loads:</p>
<pre><code class="language-typescript">'use client';

import { Amplify } from 'aws-amplify';

Amplify.configure(
  {
    Auth: {
      Cognito: {
        userPoolId: process.env.NEXT_PUBLIC_USER_POOL_ID!,
        userPoolClientId: process.env.NEXT_PUBLIC_USER_POOL_CLIENT_ID!,
      },
    },
  },
  { ssr: true }
);

export function Providers({ children }: { children: React.ReactNode }) {
  return &lt;&gt;{children}&lt;/&gt;;
}
</code></pre>
<p>Add the Cognito IDs to your <code>frontend/.env.local</code> file:</p>
<pre><code class="language-shell">NEXT_PUBLIC_API_URL=https://abc123.execute-api.us-east-1.amazonaws.com/prod
NEXT_PUBLIC_USER_POOL_ID=us-east-1_xxxxxxxx
NEXT_PUBLIC_USER_POOL_CLIENT_ID=xxxxxxxxxxxxxxxxxxxx
</code></pre>
<p>Replace the values with the outputs from your <code>cdk deploy</code>.</p>
<h3 id="heading-83-wire-providers-into-the-app-layout">8.3 Wire Providers into the App Layout</h3>
<p><strong>This step is critical.</strong> Amplify must be initialized before any component tries to use authentication. If you skip this step, <code>fetchAuthSession()</code> will throw an "Amplify not configured" error and nothing will work.</p>
<p>Open <code>frontend/app/layout.tsx</code> and update it to wrap the app in the <code>Providers</code> component:</p>
<pre><code class="language-typescript">import type { Metadata } from 'next';
import './globals.css';
import { Providers } from './providers';

export const metadata: Metadata = {
  title: 'Vendor Tracker',
  description: 'Manage your vendors with AWS',
};

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    &lt;html lang="en"&gt;
      &lt;body&gt;
        &lt;Providers&gt;{children}&lt;/Providers&gt;
      &lt;/body&gt;
    &lt;/html&gt;
  );
}
</code></pre>
<p>By wrapping <code>{children}</code> in <code>&lt;Providers&gt;</code>, you ensure that Amplify is configured once at the root of the app, before any child page or component renders.</p>
<h3 id="heading-84-protect-the-ui-with-withauthenticator">8.4 Protect the UI with withAuthenticator</h3>
<p>Now wrap your <code>Home</code> component so that unauthenticated users see a login screen instead of the dashboard.</p>
<p>Replace the contents of <code>frontend/app/page.tsx</code> with this updated version:</p>
<pre><code class="language-typescript">'use client';

import { useState, useEffect } from 'react';
import { withAuthenticator } from '@aws-amplify/ui-react';
import '@aws-amplify/ui-react/styles.css';
import { getVendors, createVendor, deleteVendor } from '@/lib/api';
import { Vendor } from '@/types/vendor';

// withAuthenticator injects `signOut` and `user` as props automatically
function Home({ signOut, user }: { signOut?: () =&gt; void; user?: any }) {
  const [vendors, setVendors] = useState&lt;Vendor[]&gt;([]);
  const [form, setForm] = useState({ name: '', category: '', contactEmail: '' });
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');

  const loadVendors = async () =&gt; {
    try {
      const data = await getVendors();
      setVendors(data);
    } catch {
      setError('Failed to load vendors.');
    }
  };

  useEffect(() =&gt; {
    loadVendors();
  }, []);

  const handleSubmit = async (e: React.FormEvent) =&gt; {
    e.preventDefault();
    setLoading(true);
    setError('');
    try {
      await createVendor(form);
      setForm({ name: '', category: '', contactEmail: '' });
      await loadVendors();
    } catch {
      setError('Failed to add vendor.');
    } finally {
      setLoading(false);
    }
  };

  const handleDelete = async (vendorId: string) =&gt; {
    try {
      await deleteVendor(vendorId);
      await loadVendors();
    } catch {
      setError('Failed to delete vendor.');
    }
  };

  return (
    &lt;main className="p-10 max-w-5xl mx-auto"&gt;
      {/* ── Header ── */}
      &lt;header className="flex justify-between items-center mb-8 p-4 bg-gray-100 rounded"&gt;
        &lt;div&gt;
          &lt;h1 className="text-xl font-bold text-gray-900"&gt;Vendor Tracker&lt;/h1&gt;
          &lt;p className="text-sm text-gray-500"&gt;Signed in as: {user?.signInDetails?.loginId}&lt;/p&gt;
        &lt;/div&gt;
        &lt;button
          onClick={signOut}
          className="bg-red-500 text-white px-4 py-2 rounded hover:bg-red-600 transition-colors"
        &gt;
          Sign Out
        &lt;/button&gt;
      &lt;/header&gt;

      {error &amp;&amp; (
        &lt;div className="mb-4 p-3 bg-red-100 text-red-700 rounded"&gt;{error}&lt;/div&gt;
      )}

      &lt;div className="grid grid-cols-1 md:grid-cols-2 gap-10"&gt;

        {/* ── Add Vendor Form ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;Add New Vendor&lt;/h2&gt;
          &lt;form onSubmit={handleSubmit} className="space-y-4"&gt;
            &lt;input
              className="w-full p-2 border rounded text-black"
              placeholder="Vendor Name"
              value={form.name}
              onChange={e =&gt; setForm({ ...form, name: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black"
              placeholder="Category (e.g. SaaS, Hardware)"
              value={form.category}
              onChange={e =&gt; setForm({ ...form, category: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black"
              placeholder="Contact Email"
              type="email"
              value={form.contactEmail}
              onChange={e =&gt; setForm({ ...form, contactEmail: e.target.value })}
              required
            /&gt;
            &lt;button
              type="submit"
              disabled={loading}
              className="w-full bg-orange-500 text-white p-2 rounded hover:bg-orange-600 disabled:bg-gray-400"
            &gt;
              {loading ? 'Saving...' : 'Add Vendor'}
            &lt;/button&gt;
          &lt;/form&gt;
        &lt;/section&gt;

        {/* ── Vendor List ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;
            Current Vendors ({vendors.length})
          &lt;/h2&gt;
          &lt;div className="space-y-3"&gt;
            {vendors.length === 0 ? (
              &lt;p className="text-gray-400 italic"&gt;No vendors yet.&lt;/p&gt;
            ) : (
              vendors.map(v =&gt; (
                &lt;div
                  key={v.vendorId}
                  className="p-4 border rounded shadow-sm bg-white flex justify-between items-start"
                &gt;
                  &lt;div&gt;
                    &lt;p className="font-semibold text-gray-900"&gt;{v.name}&lt;/p&gt;
                    &lt;p className="text-sm text-gray-500"&gt;{v.category} · {v.contactEmail}&lt;/p&gt;
                  &lt;/div&gt;
                  &lt;button
                    onClick={() =&gt; v.vendorId &amp;&amp; handleDelete(v.vendorId)}
                    className="ml-4 text-sm text-red-500 hover:text-red-700 hover:underline"
                  &gt;
                    Delete
                  &lt;/button&gt;
                &lt;/div&gt;
              ))
            )}
          &lt;/div&gt;
        &lt;/section&gt;

      &lt;/div&gt;
    &lt;/main&gt;
  );
}

// Wrapping Home with withAuthenticator means any user who is not logged in
// will see Amplify's built-in login/signup screen instead of this component.
export default withAuthenticator(Home);
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/e65a88dc-ea75-4daa-b7cf-eac3406c8060.png" alt="Amplify-generated login screen" style="display:block;margin:0 auto" width="1639" height="805" loading="lazy">

<h3 id="heading-85-pass-the-auth-token-to-api-calls">8.5 Pass the Auth Token to API Calls</h3>
<p>Now that API Gateway requires a JWT on every request, your <code>fetch</code> calls need to include the token in the <code>Authorization</code> header. Without it, every request will return a <code>401 Unauthorized</code> error.</p>
<p>Update <code>frontend/lib/api.ts</code> with a token helper and updated fetch calls:</p>
<pre><code class="language-typescript">import { fetchAuthSession } from 'aws-amplify/auth';
import { Vendor } from '@/types/vendor';

const BASE_URL = process.env.NEXT_PUBLIC_API_URL!;

// Retrieves the current user's JWT token from the active Amplify session
const getAuthToken = async (): Promise&lt;string&gt; =&gt; {
  const session = await fetchAuthSession();
  const token = session.tokens?.idToken?.toString();
  if (!token) throw new Error('No active session. Please sign in.');
  return token;
};

export const getVendors = async (): Promise&lt;Vendor[]&gt; =&gt; {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    headers: { Authorization: token },
  });
  if (!response.ok) throw new Error('Failed to fetch vendors');
  return response.json();
};

export const createVendor = async (
  vendor: Omit&lt;Vendor, 'vendorId' | 'createdAt'&gt;
): Promise&lt;void&gt; =&gt; {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: token,
    },
    body: JSON.stringify(vendor),
  });
  if (!response.ok) throw new Error('Failed to create vendor');
};

export const deleteVendor = async (vendorId: string): Promise&lt;void&gt; =&gt; {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'DELETE',
    headers: {
      'Content-Type': 'application/json',
      Authorization: token,
    },
    body: JSON.stringify({ vendorId }),
  });
  if (!response.ok) throw new Error('Failed to delete vendor');
};
</code></pre>
<p><strong>What</strong> <code>getAuthToken</code> <strong>does:</strong></p>
<p><code>fetchAuthSession()</code> reads the currently logged-in user's session from the browser. Amplify stores the session in memory and <code>localStorage</code> after the user signs in.</p>
<p><code>session.tokens?.idToken</code> is the JWT string that API Gateway's Cognito Authorizer is looking for. Passing it as the <code>Authorization</code> header tells API Gateway: "This request is from an authenticated user."</p>
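<p>The same session object is also handy for debugging. If a request is being rejected and you want to see exactly what the authorizer receives, you can log the decoded claims that Amplify already exposes (a sketch only, assuming the Amplify v6 setup used in this tutorial):</p>
<pre><code class="language-typescript">import { fetchAuthSession } from 'aws-amplify/auth';

// Debug-only sketch: inspect the claims inside the ID token that is sent
// in the Authorization header.
const session = await fetchAuthSession();
console.log(session.tokens?.idToken?.payload); // sub, email, exp, ...
</code></pre>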
<h3 id="heading-86-troubleshooting-cognito">8.6 Troubleshooting Cognito</h3>
<h4 id="heading-unconfirmed-user-error-after-sign-up">"Unconfirmed" user error after sign-up</h4>
<p>When a new user signs up through the Amplify UI, Cognito marks the account as <em>Unconfirmed</em> until the user verifies their email address. A verification code is sent to the user's email. After entering the code, the account becomes confirmed and the user can log in.</p>
<p>If you are testing locally and want to skip the email step, you can manually confirm any account in the AWS Console:</p>
<ol>
<li><p>Open the AWS Console and navigate to Cognito</p>
</li>
<li><p>Click on your User Pool (<code>VendorUserPool...</code>)</p>
</li>
<li><p>Click the Users tab</p>
</li>
<li><p>Click on the user's email address</p>
</li>
<li><p>Open the Actions dropdown and click Confirm account</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/158fb773-9cb1-4c14-9fd7-49e4369ba7e3.png" alt=" Cognito Users list showing a user with &quot;Unconfirmed&quot; status" style="display:block;margin:0 auto" width="1100" height="190" loading="lazy">

<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/5637ac80-ee0c-4fdf-93cf-d4b7d71f6a65.png" alt="Cognito Users list showing a user with &quot;Unconfirmed&quot; status" style="display:block;margin:0 auto" width="1100" height="442" loading="lazy">
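<p>If you'd rather confirm users from a script than click through the console, the AWS SDK exposes the same action. A sketch (the pool ID and username are placeholders, and the caller needs AWS credentials allowed to call <code>AdminConfirmSignUp</code>):</p>
<pre><code class="language-typescript">import {
  CognitoIdentityProviderClient,
  AdminConfirmSignUpCommand,
} from '@aws-sdk/client-cognito-identity-provider';

// Sketch: programmatically confirm a user, equivalent to the
// "Confirm account" action in the console.
const client = new CognitoIdentityProviderClient({ region: 'us-east-1' });

await client.send(
  new AdminConfirmSignUpCommand({
    UserPoolId: 'us-east-1_xxxxxxxx',  // placeholder: your UserPoolId output
    Username: 'user@example.com',      // the username shown in the Users list
  })
);
</code></pre>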

<h4 id="heading-401-unauthorized-errors-after-deployment">401 Unauthorized errors after deployment</h4>
<p>If you are getting 401 errors, check two things:</p>
<ol>
<li><p>Open Chrome DevTools --&gt; Network tab, click the failing request, and look at the <strong>Request Headers</strong>. You should see an <code>Authorization</code> header with a long string of characters. If it is missing, <code>getAuthToken</code> is failing. Check that Amplify is configured correctly in <code>providers.tsx</code> and wired in via <code>layout.tsx</code>.</p>
</li>
<li><p>In your CDK stack, confirm that <code>authorizationType: apigateway.AuthorizationType.COGNITO</code> is present on every protected method definition. If it is missing, API Gateway may not be checking tokens even though the authorizer is defined.</p>
</li>
</ol>
<h2 id="heading-part-9-deploy-the-frontend-with-s3-and-cloudfront">Part 9: Deploy the Frontend with S3 and CloudFront</h2>
<p>Your app works locally. Now you'll deploy it to a real HTTPS URL that anyone in the world can visit.</p>
<p><strong>The strategy:</strong> Next.js will export your React app as a set of static HTML, CSS, and JavaScript files. Those files will be uploaded to an <strong>S3 bucket</strong> (AWS's file storage service). <strong>CloudFront</strong> sits in front of the bucket as a Content Delivery Network (CDN), distributing your files to servers around the world and serving them over HTTPS.</p>
<h3 id="heading-91-configure-nextjs-for-static-export">9.1 Configure Next.js for Static Export</h3>
<p>Open <code>frontend/next.config.js</code> (or <code>next.config.mjs</code>) and add the <code>output: 'export'</code> setting:</p>
<pre><code class="language-javascript">/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'export', // Generates a static /out folder instead of a Node.js server
};

export default nextConfig;
</code></pre>
<p><strong>Note on <code>'use client'</code> and static export</strong>: When <code>output: 'export'</code> is set, Next.js builds every page at compile time. Any component that uses browser-only APIs – like <code>withAuthenticator</code> from Amplify – must have <code>'use client'</code> at the top of the file. This tells Next.js to skip server-side rendering for that component and run it only in the browser.</p>
<p>You already have <code>'use client'</code> in <code>page.tsx</code>. If you ever see a build error mentioning <code>window is not defined</code> or similar, check that the relevant component has <code>'use client'</code> at the top.</p>
<p>Build the frontend:</p>
<pre><code class="language-shell">cd frontend
npm run build
</code></pre>
<p>This generates an <code>/out</code> folder containing your complete website as static files. Verify the folder was created:</p>
<pre><code class="language-shell">ls out
# You should see: index.html, _next/, etc.
</code></pre>
<h3 id="heading-92-add-s3-and-cloudfront-to-the-cdk-stack">9.2 Add S3 and CloudFront to the CDK Stack</h3>
<p>Open <code>backend/lib/backend-stack.ts</code> and add the hosting infrastructure. Here's the complete final version of the file:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
import * as s3deploy from 'aws-cdk-lib/aws-s3-deployment';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table 
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: { name: 'vendorId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // 2. Lambda Functions
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // 3. Permissions
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // 4. Cognito User Pool
    const userPool = new cognito.UserPool(this, 'VendorUserPool', {
      selfSignUpEnabled: true,
      signInAliases: { email: true },
      autoVerify: { email: true },
      userVerification: {
        emailStyle: cognito.VerificationEmailStyle.CODE,
      },
    });

    userPool.addDomain('VendorUserPoolDomain', {
      cognitoDomain: { domainPrefix: `vendor-tracker-${this.account}` },
    });

    const userPoolClient = userPool.addClient('VendorAppClient');

    // 5. API Gateway + Authorizer
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const authorizer = new apigateway.CognitoUserPoolsAuthorizer(
      this,
      'VendorAuthorizer',
      { cognitoUserPools: [userPool] }
    );

    const authOptions = {
      authorizer,
      authorizationType: apigateway.AuthorizationType.COGNITO,
    };

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda), authOptions);
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda), authOptions);
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda), authOptions);

    // 6. S3 Bucket (Frontend Files) 
    const siteBucket = new s3.Bucket(this, 'VendorSiteBucket', {
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

    // 7. CloudFront Distribution (HTTPS + CDN)
    const distribution = new cloudfront.Distribution(this, 'SiteDistribution', {
      defaultBehavior: {
        origin: new origins.S3Origin(siteBucket),
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
      defaultRootObject: 'index.html',
      errorResponses: [
        {
          // Redirect all 404s back to index.html so React can handle routing
          httpStatus: 404,
          responseHttpStatus: 200,
          responsePagePath: '/index.html',
        },
      ],
    });

    // 8. Deploy Frontend Files to S3 
    new s3deploy.BucketDeployment(this, 'DeployWebsite', {
      sources: [s3deploy.Source.asset('../frontend/out')],
      destinationBucket: siteBucket,
      distribution,
      distributionPaths: ['/*'], // Clears CloudFront cache on every deploy
    });

    // 9. Outputs
    new cdk.CfnOutput(this, 'ApiEndpoint', { value: api.url });
    new cdk.CfnOutput(this, 'UserPoolId', { value: userPool.userPoolId });
    new cdk.CfnOutput(this, 'UserPoolClientId', { value: userPoolClient.userPoolClientId });
    new cdk.CfnOutput(this, 'CloudFrontURL', {
      value: `https://${distribution.distributionDomainName}`,
    });
  }
}
</code></pre>
<p><strong>What the hosting infrastructure does:</strong></p>
<ul>
<li><p>The <strong>S3 bucket</strong> stores your static HTML, CSS, and JavaScript files. It is private – users cannot access it directly.</p>
</li>
<li><p><strong>CloudFront</strong> is the CDN that sits in front of S3. It gives you an HTTPS URL and caches your files at edge locations worldwide, so the app loads fast no matter where users are located. <code>REDIRECT_TO_HTTPS</code> automatically upgrades any HTTP request to HTTPS.</p>
</li>
<li><p>The <strong>error response</strong> for 404 returns <code>index.html</code> instead of an error page. This is necessary for single-page apps: if a user navigates directly to a route like <code>/vendors/123</code>, CloudFront cannot find a file at that path, but sending back <code>index.html</code> lets the React app handle the routing correctly.</p>
</li>
<li><p><code>distributionPaths: ['/*']</code> tells CloudFront to invalidate its entire cache after every deployment. This ensures users always see the latest version of your app immediately.</p>
</li>
<li><p><code>BucketDeployment</code> is a CDK construct that automatically uploads the contents of your <code>frontend/out</code> folder to the S3 bucket every time you run <code>cdk deploy</code>.</p>
</li>
</ul>
<h3 id="heading-93-run-the-final-deployment">9.3 Run the Final Deployment</h3>
<p>First, build the frontend with the latest environment variables:</p>
<pre><code class="language-shell">cd frontend
npm run build
</code></pre>
<p>Then deploy everything from the backend folder:</p>
<pre><code class="language-shell">cd ../backend
cdk deploy
</code></pre>
<p>After deployment finishes, copy the <code>CloudFrontURL</code> from the terminal output:</p>
<pre><code class="language-plaintext">Outputs:
BackendStack.CloudFrontURL = https://d1234abcd.cloudfront.net
</code></pre>
<p>Open that URL in your browser. Your app is now live on the internet, served over HTTPS, globally distributed.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/f8e14979-a667-4afc-bdd4-9afe4abd9593.png" alt="The deployed Vendor Tracker app loaded over HTTPS from its CloudFront URL" style="display:block;margin:0 auto" width="1686" height="804" loading="lazy">

<h2 id="heading-what-you-built">What You Built</h2>
<p>You now have a fully deployed, production-style full-stack application. Here is a summary of every piece you built and what it does:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Service</th>
<th>What it does</th>
</tr>
</thead>
<tbody><tr>
<td>Frontend</td>
<td>Next.js + CloudFront</td>
<td>React UI served globally over HTTPS</td>
</tr>
<tr>
<td>Auth</td>
<td>Amazon Cognito + Amplify</td>
<td>User sign-up, login, and JWT token management</td>
</tr>
<tr>
<td>API</td>
<td>API Gateway</td>
<td>Routes HTTP requests, validates auth tokens</td>
</tr>
<tr>
<td>Logic</td>
<td>AWS Lambda (×3)</td>
<td>Creates, reads, and deletes vendors on demand</td>
</tr>
<tr>
<td>Database</td>
<td>DynamoDB</td>
<td>Stores vendor records with no idle cost</td>
</tr>
<tr>
<td>Storage</td>
<td>S3</td>
<td>Holds your built frontend files</td>
</tr>
<tr>
<td>Infrastructure</td>
<td>AWS CDK</td>
<td>Defines and deploys all of the above as code</td>
</tr>
</tbody></table>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You have built and deployed the foundational pattern of almost every cloud application: a secured API backed by a database, deployed with infrastructure as code. Here is everything you accomplished:</p>
<p>You set up a professional AWS development environment with scoped IAM credentials. You defined your entire backend infrastructure as TypeScript code using AWS CDK, which means your database, API, Lambda functions, and authentication system are all version-controlled, repeatable, and deployable with a single command.</p>
<p>You wrote three Lambda functions that handle create, read, and delete operations, each with proper error handling and the correct AWS SDK v3 patterns. You connected them to a REST API through API Gateway and protected every route with Amazon Cognito authentication, so only registered, verified users can interact with your data.</p>
<p>On the frontend, you built a Next.js application with a service layer that cleanly separates API logic from UI components, manages JWTs automatically through AWS Amplify, and gives users a complete sign-up and sign-in flow without you writing a single line of authentication UI code.</p>
<p>Finally, you deployed the entire system: your backend to AWS Lambda and DynamoDB, and your frontend as a static site served globally through CloudFront over HTTPS.</p>
<p>The full source code for this tutorial is available on <a href="https://github.com/BenedictaUche/vendor-tracker">GitHub</a>. Clone it, modify it, and use it as a reference for your own projects.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Serverless RAG Pipeline on AWS That Scales to Zero ]]>
                </title>
                <description>
                    <![CDATA[ Most RAG tutorials end the same way: you've got a working prototype and a bill for a vector database that runs whether anyone's querying it or not. Add an always-on embedding service, a hosted LLM end ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-serverless-rag-pipeline-on-aws-that-scales-to-zero/</link>
                <guid isPermaLink="false">69b1b23c6c896b0519b4eda8</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ RAG  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Wed, 11 Mar 2026 18:19:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c0416d9e-9661-47a3-ba9c-8001f5f91b8c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most RAG tutorials end the same way: you've got a working prototype and a bill for a vector database that runs whether anyone's querying it or not. Add an always-on embedding service, a hosted LLM endpoint, and the usual AWS infrastructure, and you're looking at real money before a single user shows up.</p>
<p>But it doesn't have to work that way. In this tutorial, you'll deploy a fully serverless RAG pipeline that processes documents, images, video, and audio, then scales to zero when nobody's using it.</p>
<p>Everything runs in your AWS account, your data never leaves your infrastructure, and your ongoing monthly cost for a modest knowledge base will be closer to <code>2-3 USD</code> than <code>300 USD</code>.</p>
<p>We'll use <a href="https://github.com/HatmanStack/RAGStack-Lambda">RAGStack-Lambda</a>, an open-source project I built on AWS. By the end, you'll have a deployed pipeline with a dashboard, an AI chat interface with source citations, a drop-in web component you can embed in any app, and an MCP server you can use to feed your assistant context.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ul>
<li><p><a href="#heading-what-this-actually-costs">What This Actually Costs</a></p>
</li>
<li><p><a href="#heading-what-youre-building">What You're Building</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-deploying-from-aws-marketplace">Deploying from AWS Marketplace</a></p>
</li>
<li><p><a href="#heading-deploying-from-source">Deploying from Source</a></p>
</li>
<li><p><a href="#heading-uploading-your-first-documents">Uploading Your First Documents</a></p>
</li>
<li><p><a href="#heading-chatting-with-your-knowledge-base">Chatting With Your Knowledge Base</a></p>
</li>
<li><p><a href="#heading-embedding-the-web-component-in-your-app">Embedding the Web Component in Your App</a></p>
</li>
<li><p><a href="#heading-using-the-mcp-server">Using the MCP Server</a></p>
</li>
<li><p><a href="#heading-what-you-can-build-from-here">What You Can Build From Here</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-what-this-actually-costs">What This Actually Costs</h2>
<p>Before we build anything, let's talk money, because the cost story is the whole point.</p>
<p>RAG pipelines have two cost phases: ingestion (processing your documents once) and operation (querying them over time).</p>
<p>Most platforms charge you a flat monthly rate regardless of which phase you're in. A serverless architecture flips that: ingestion costs something, and then everything scales to zero.</p>
<h3 id="heading-ingestion-the-one-time-hit">Ingestion: The One-Time Hit</h3>
<p>When you upload documents, several things happen: text extraction (OCR for PDFs and images), embedding generation, metadata extraction, and storage. Here's what that actually costs per service:</p>
<p><strong>Textract (OCR):</strong> This is the most expensive part of ingestion, and it only applies to scanned PDFs and images that need text extraction. Plain text, HTML, CSV, and other text-based formats skip this entirely.</p>
<p>Textract charges about <code>1.50 USD</code> per 1,000 pages for standard text detection. If you're uploading 500 pages of scanned PDFs, that's about <code>0.75 USD</code>. A heavy initial load of several thousand scanned pages might run <code>5-10 USD</code>. But once your documents are processed, you never pay this again unless you add new ones.</p>
<p><strong>Bedrock Embeddings (Nova Multimodal):</strong> This is where your content gets converted into vectors for semantic search. The pricing is almost comically cheap:</p>
<ul>
<li><p>Text: <code>0.00002 USD</code> per 1,000 input tokens</p>
</li>
<li><p>Images: <code>0.00115 USD</code> per image</p>
</li>
<li><p>Video/Audio: <code>0.00200 USD</code> per minute</p>
</li>
</ul>
<p>To put that in perspective: if you have 1,500 text documents averaging 2,500 tokens each after chunking, your total embedding cost is about <code>0.08 USD</code>. A knowledge base with 500 images runs <code>0.58 USD</code>. Even a mixed corpus of text, images, and a few hours of video stays well under <code>2 USD</code> for the entire embedding pass. This is a one-time cost – you only re-embed if you add or update documents.</p>
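<p>For a quick sanity check of that estimate, the arithmetic is straightforward (a sketch using the per-1,000-token price quoted above):</p>
<pre><code class="language-typescript">// Back-of-the-envelope check of the text-embedding estimate above.
const docs = 1_500;
const tokensPerDoc = 2_500;
const pricePer1kTokens = 0.00002; // USD for Nova text embeddings

const totalUsd = (docs * tokensPerDoc / 1_000) * pricePer1kTokens;
console.log(totalUsd); // roughly 0.075 USD – the "about 0.08 USD" figure above
</code></pre>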
<p><strong>Bedrock LLM (Metadata Extraction):</strong> RAGStack uses an LLM to analyze each document and extract structured metadata automatically. This is a few inference calls per document using Nova Lite or a similar model. At <code>0.06 USD</code>/<code>0.24 USD</code> per million input/output tokens, processing 1,500 documents costs well under <code>1 USD</code>.</p>
<p><strong>S3 Vectors (Storage):</strong> Storing your embeddings. At <code>0.06 USD</code> per GB/month, a knowledge base of 1,500 documents with 1,024-dimension vectors takes up a trivially small amount of space. We're talking pennies per month.</p>
<p><strong>S3 (Document Storage):</strong> Your source documents in standard S3. Even cheaper, <code>0.023 USD</code> per GB/month.</p>
<p><strong>DynamoDB:</strong> Stores document metadata and processing state. The on-demand pricing model means you pay per request during ingestion, then essentially nothing at rest. A few cents for the initial load.</p>
<p>To put real numbers on it: if you upload 200 text documents (PDFs, HTML, markdown), your total ingestion cost is likely under <code>1 USD</code>. If you upload 1,000 scanned PDFs that need OCR, you might see <code>5-8 USD</code> as a one-time hit. That <code>7-10 USD</code> figure you might see referenced? That's the upper end for a heavy initial load with lots of OCR work.</p>
<h3 id="heading-operation-where-scale-to-zero-shines">Operation: Where Scale-to-Zero Shines</h3>
<p>Once your documents are ingested, the pipeline is waiting. Not running. Waiting. Here's what each query costs:</p>
<p><strong>Lambda:</strong> Invocations are billed per request and duration. The free tier covers 1 million requests/month. For a personal or small-team knowledge base, you may never leave the free tier.</p>
<p><strong>S3 Vectors (Queries):</strong> <code>2.50 USD</code> per million query API calls, plus a per-TB data processing charge. For a small index queried a few hundred times a month, this rounds to effectively zero.</p>
<p><strong>Bedrock (Chat Inference):</strong> This is your main operating cost. Each chat response requires an LLM call. Using Nova Lite at <code>0.06 USD</code> per million input tokens and <code>0.24 USD</code> per million output tokens, a typical RAG query (retrieval context + user question + response) might cost <code>0.001-0.003 USD</code> per query. A hundred queries a month is <code>0.10-0.30 USD</code>.</p>
<p><strong>Step Functions:</strong> Orchestrates the document processing pipeline. Standard workflows charge <code>0.025 USD</code> per 1,000 state transitions. Minimal during operation since it's only active during ingestion.</p>
<p><strong>Cognito:</strong> User authentication. Free for the first 10,000 monthly active users.</p>
<p><strong>CloudFront:</strong> Serves the dashboard UI. Free tier covers 1 TB of data transfer per month.</p>
<p><strong>API Gateway:</strong> Handles GraphQL API requests. Free tier covers 1 million API calls per month.</p>
<p>Add it all up for a knowledge base with 500 documents getting a few hundred queries per month, and your monthly operating cost is somewhere between <code>0.50 USD</code> and <code>3.00 USD</code>. Most of that is the LLM inference for chat responses.</p>
<h3 id="heading-the-comparison-that-matters">The Comparison That Matters</h3>
<p>Here's the same pipeline on a traditional always-on stack:</p>
<table>
<thead>
<tr>
<th>Service</th>
<th>RAGStack-Lambda</th>
<th>Traditional Stack</th>
</tr>
</thead>
<tbody><tr>
<td>Vector Database</td>
<td>S3 Vectors: pennies/mo</td>
<td>Pinecone Starter: <code>70 USD</code>/mo</td>
</tr>
<tr>
<td>Vector Database (alt)</td>
<td>S3 Vectors: pennies/mo</td>
<td>OpenSearch Serverless: about <code>350 USD</code>/mo min</td>
</tr>
<tr>
<td>Compute</td>
<td>Lambda: free tier</td>
<td>EC2 or ECS: <code>50-150 USD</code>/mo</td>
</tr>
<tr>
<td>LLM Inference</td>
<td>Same per-query cost</td>
<td>Same per-query cost</td>
</tr>
<tr>
<td>Total (idle)</td>
<td>about <code>0.50-3.00 USD</code>/mo</td>
<td><code>120-500 USD</code>/mo</td>
</tr>
</tbody></table>
<p>The LLM inference cost per query is roughly the same everywhere – that's Bedrock's on-demand pricing regardless of your architecture. The difference is everything else. Traditional stacks pay a floor cost whether anyone's using them or not. A serverless stack pays only for what it uses, and an idle stack costs essentially nothing.</p>
<h3 id="heading-what-about-transcribe">What About Transcribe?</h3>
<p>If you're uploading video or audio, AWS Transcribe adds cost for speech-to-text conversion. Standard transcription runs about <code>0.024 USD</code> per minute of audio. A 10-minute video costs <code>0.24 USD</code> to transcribe. This is a one-time ingestion cost: once transcribed and embedded, the resulting text chunks are queried like any other document.</p>
<h2 id="heading-what-youre-building">What You're Building</h2>
<p>By the end of this tutorial, you'll have a deployed pipeline that does the following:</p>
<ol>
<li><p>You upload a document (PDF, image, video, audio, HTML, CSV, <a href="https://github.com/HatmanStack/RAGStack-Lambda/blob/main/docs/ARCHITECTURE.md">the full list</a> is extensive) through a web dashboard.</p>
</li>
<li><p>The pipeline detects the file type and routes it to the right processor. Scanned PDFs go through OCR via Textract. Video and audio go through Transcribe for speech-to-text, split into 30-second searchable chunks with speaker identification. Images get visual embeddings and any caption text you provide.</p>
</li>
<li><p>An LLM analyzes each document and extracts structured metadata: topic, document type, date range, people mentioned, whatever's relevant. This happens automatically.</p>
</li>
<li><p>Everything gets embedded using Amazon Nova Multimodal Embeddings and stored in a Bedrock Knowledge Base backed by S3 Vectors.</p>
</li>
<li><p>You (or your users) ask questions through an AI chat interface. The pipeline retrieves relevant documents, passes them as context to a Bedrock LLM, and returns an answer with collapsible source citations, including timestamp links for video and audio that jump to the exact position.</p>
</li>
</ol>
<p>All of this runs in your AWS account. No external control plane, no third-party services beyond AWS itself.</p>
<h3 id="heading-the-architecture">The Architecture</h3>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/45eca6a5-91b4-4f55-8b1a-ba9f59a3e25d.png" alt="The diagram illustrates a flowchart of a buyer's AWS account, detailing the application plane with processes like S3 to Lambda OCR, supported by services like Cognito Auth. It emphasizes Amazon Bedrock's integration for knowledge and chat." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A few things to note about this architecture:</p>
<p><strong>Step Functions orchestrate everything.</strong> When a document is uploaded, a state machine manages the entire processing flow, detecting the file type, routing to the right processor, waiting for async operations like Transcribe jobs, then triggering embedding and metadata extraction.</p>
<p>This is what makes the pipeline reliable without a running server. If a step fails, it retries. You can see exactly where every document is in the processing pipeline.</p>
<p><strong>Lambda does the compute.</strong> Every processing step is a Lambda function. They spin up when needed, run for a few seconds to a few minutes, and shut down. There's no EC2 instance idling at 3 AM.</p>
<p><strong>S3 Vectors is the vector store.</strong> Your embeddings live in S3's purpose-built vector storage rather than in a dedicated vector database like Pinecone or OpenSearch.</p>
<p>This is what makes the "scale to zero" cost possible: you're paying object storage rates for vector data instead of keeping a database cluster warm. It also means your vectors are sitting in your own S3 bucket, not in a third-party managed service that holds your data on their terms.</p>
<p><strong>Cognito handles auth.</strong> The dashboard and API are protected with Cognito user pools. When you deploy, you get a temporary password via email. The web component uses IAM-based authentication, and server-side integrations use API key auth.</p>
<p><strong>CloudFront serves the UI.</strong> The dashboard is a static React app served through CloudFront, so there's no web server to maintain.</p>
<h3 id="heading-two-ways-to-deploy">Two Ways to Deploy</h3>
<p>You have two deployment paths depending on what you want:</p>
<p><strong>AWS Marketplace (the fast path):</strong> click deploy, fill in two fields (stack name and email), and wait about 10 minutes. No local tooling required. This is the path we'll walk through first.</p>
<p><strong>From Source (the developer path):</strong> clone the repo, run <code>publish.py</code>, and deploy via SAM CLI. This is the path for when you want to customize the processing pipeline, modify the UI, or contribute to the project. We'll cover this after the Marketplace walkthrough.</p>
<p>Both paths produce the same stack. The Marketplace version just wraps the CloudFormation template in a one-click deployment.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you deploy, you'll need:</p>
<ul>
<li><p><strong>An AWS account</strong> with permissions to create CloudFormation stacks, Lambda functions, S3 buckets, DynamoDB tables, and Cognito user pools. If you're using an admin account, you're covered.</p>
</li>
<li><p><strong>Bedrock model access:</strong> RAGStack defaults to <code>us-east-1</code> because that's where Nova Multimodal Embeddings is available. Amazon's own models (including Nova) are available by default in Bedrock, no manual enablement required. Just make sure your IAM role has the necessary <code>bedrock:InvokeModel</code> permissions.</p>
</li>
<li><p><strong>For the Marketplace path:</strong> just a web browser.</p>
</li>
<li><p><strong>For the source path:</strong> Python 3.13+, Node.js 24+, AWS CLI and SAM CLI configured, and Docker (for building Lambda layers).</p>
</li>
</ul>
<h2 id="heading-deploying-from-aws-marketplace">Deploying from AWS Marketplace</h2>
<p>This is the fastest path – no local tools, no CLI, no Docker. You'll launch a CloudFormation stack and have a working pipeline in about 10 minutes.</p>
<h3 id="heading-step-1-launch-the-stack">Step 1: Launch the Stack</h3>
<p>Click the <a href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://ragstack-quicklaunch-public.s3.us-east-1.amazonaws.com/ragstack-template.yaml&amp;stackName=my-docs">direct deploy link</a> to open CloudFormation's "Quick create stack" page with the template pre-loaded.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/d354f6bc-dee8-4f44-9b3b-523ea27564c7.png" alt="Screenshot of AWS CloudFormation Quick Create Stack page in dark mode. Sections for template URL, stack name, parameters, and build options are visible." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-2-fill-in-two-fields">Step 2: Fill In Two Fields</h3>
<p>The page has a lot of options, but you only need two:</p>
<ul>
<li><p><strong>Stack name:</strong> Must be lowercase. This becomes the prefix for all your AWS resources (for example, <code>my-docs</code>, <code>team-kb</code>, <code>project-notes</code>). Keep it short.</p>
</li>
<li><p><strong>Admin Email:</strong> Under Required Settings. Cognito will send your temporary login credentials here. Use an email you can access right now.</p>
</li>
</ul>
<p>Everything else – Build Options, Advanced Settings, OCR Backend, model selections – can stay at the defaults. They're there for customization later, but the defaults work out of the box.</p>
<h3 id="heading-step-3-deploy">Step 3: Deploy</h3>
<p>Scroll to the bottom, check the three acknowledgment boxes under "Capabilities and transforms," and click <strong>Create stack</strong>.</p>
<p>Deployment takes roughly 10 minutes. You can watch the progress in the CloudFormation Events tab if you're curious, but there's nothing to do until the stack status flips to <code>CREATE_COMPLETE</code>.</p>
<h3 id="heading-step-4-log-in">Step 4: Log In</h3>
<p>Once the stack finishes, check your email. Cognito sends you the dashboard URL and a temporary password. Log in, set a new password, and you're looking at an empty dashboard ready for documents.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/5ac31b6c-2782-4b66-82a9-0cb962c5dac4.png" alt="A software dashboard interface titled 'Document Pipeline (Demo)' displaying options for uploading, scraping, and searching documents. The screen shows no current documents or scrape jobs, with menu options on the left and a search and filter bar at the center. The overall tone is functional and minimalist." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-deploying-from-source">Deploying from Source</h2>
<p>If you want to customize the pipeline, modify the UI, or contribute to the project, deploy from source instead.</p>
<h3 id="heading-step-1-clone-and-set-up">Step 1: Clone and Set Up</h3>
<pre><code class="language-bash">git clone https://github.com/HatmanStack/RAGStack-Lambda.git
cd RAGStack-Lambda

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
</code></pre>
<h3 id="heading-step-2-deploy">Step 2: Deploy</h3>
<p>The <code>publish.py</code> script handles everything: building the frontend, packaging Lambda functions, and deploying via SAM CLI.</p>
<pre><code class="language-bash">python publish.py \
  --project-name my-docs \
  --admin-email admin@example.com
</code></pre>
<p>This defaults to <code>us-east-1</code> for Nova Multimodal Embeddings. The script will build the React dashboard, build the web component, package all Lambda layers with Docker, and deploy the CloudFormation stack through SAM.</p>
<p>First deploy takes longer (15-20 minutes) because it's building everything from scratch. Subsequent deploys are faster since SAM caches unchanged resources.</p>
<p>If you only want to iterate on the backend and skip UI builds:</p>
<pre><code class="language-bash"># Skip dashboard build (still builds web component)
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui

# Skip ALL UI builds
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui-all
</code></pre>
<p>Once it finishes, you'll get the same Cognito email and dashboard URL as the Marketplace path.</p>
<h2 id="heading-uploading-your-first-documents">Uploading Your First Documents</h2>
<p>The dashboard has tabs for different content types. We'll start with the Documents tab since that's the most common use case.</p>
<h3 id="heading-documents">Documents</h3>
<p>Click the <strong>Documents</strong> tab and upload a file. RAGStack accepts a wide range of formats: PDF, DOCX, XLSX, HTML, CSV, JSON, XML, EML, EPUB, TXT, and Markdown. Drag and drop or use the file picker.</p>
<p>Once uploaded, the document enters the processing pipeline. You'll see the status update in real time:</p>
<ol>
<li><p><strong>UPLOADED:</strong> File received and stored in S3.</p>
</li>
<li><p><strong>PROCESSING:</strong> Step Functions has picked it up and routed it to the right processor. Text-based files (HTML, CSV, Markdown) go through direct extraction. Scanned PDFs and images go through Textract OCR. The LLM then analyzes the content and extracts structured metadata: topic, document type, people mentioned, date ranges, whatever's relevant to the content (an illustrative example of such a record appears below).</p>
</li>
<li><p><strong>INDEXED:</strong> Embeddings generated, vectors stored, document is searchable.</p>
</li>
</ol>
<p>Text documents typically process in 1-5 minutes. OCR-heavy documents (scanned PDFs, images with text) can take 2-15 minutes depending on page count.</p>
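<p>To make the metadata step concrete, here’s a purely illustrative sketch (in Python) of the kind of record a document might carry once it reaches <code>INDEXED</code>. The field names are hypothetical – they’re only meant to show the shape of auto-extracted metadata, not RAGStack’s actual schema.</p>
<pre><code class="language-python"># Hypothetical example of a processed document record.
# Field names are illustrative only; RAGStack's real schema may differ.
document_record = {
    "document_id": "doc-0001",
    "status": "INDEXED",
    "source_file": "q4-planning-notes.pdf",
    "metadata": {
        "topic": "quarterly planning",
        "document_type": "meeting notes",
        "people_mentioned": ["A. Rivera", "J. Chen"],
        "date_range": "2024-10-01 to 2024-12-31",
    },
}

print(document_record["metadata"]["topic"])
</code></pre>
<p>Metadata like this is what powers the filtering described later in this post: you can narrow chat queries to documents whose metadata matches a given topic or date range.</p>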
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/3df05041-2632-41a9-a71c-6d764c503f2a.png" alt="Screenshot of a document upload interface labeled &quot;Document Pipeline (Demo).&quot; Central panel shows a box for drag-and-drop file upload. Sleek, modern design." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-images">Images</h3>
<p>The <strong>Images</strong> tab works differently. Upload a JPG, PNG, GIF, or WebP and you can add a caption. Both the visual content and caption text get embedded using Nova Multimodal Embeddings, so you can search by what's in the image or by your description of it.</p>
<p>This is where multimodal embeddings earn their keep. A traditional text-only RAG pipeline would need you to describe every image manually. Here, the image itself becomes searchable, and since everything stays in your AWS account, you're not sending personal photos or sensitive visual content to an external service to get there.</p>
<h3 id="heading-what-about-video-and-audio">What About Video and Audio?</h3>
<p>Upload video or audio files and RAGStack routes them through AWS Transcribe for speech-to-text conversion. The transcript gets split into 30-second chunks with speaker identification, then embedded like any other document. When chat results reference a video source, you get timestamp links that jump to the exact position in the recording.</p>
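<p>If you’re curious what 30-second chunking looks like in practice, here’s a minimal Python sketch. It is not RAGStack’s actual code, and it assumes the transcript has already been reduced to simple (start_seconds, speaker, text) segments rather than Transcribe’s raw JSON.</p>
<pre><code class="language-python"># Minimal sketch: group transcript segments into 30-second windows with speaker labels.
# Illustrative only -- it assumes a simplified (start_seconds, speaker, text) format.

def chunk_transcript(segments, window=30):
    """Bucket transcript segments into fixed-length time windows."""
    buckets = {}
    for start, speaker, text in segments:
        bucket_start = int(start // window) * window  # 0, 30, 60, ...
        buckets.setdefault(bucket_start, []).append(f"{speaker}: {text}")
    # Each chunk keeps its start offset so chat citations can link back to a timestamp.
    return [
        {"start_seconds": start, "text": " ".join(lines)}
        for start, lines in sorted(buckets.items())
    ]

segments = [
    (2.0, "spk_0", "Welcome to the quarterly review."),
    (17.5, "spk_1", "Let's start with the roadmap."),
    (41.0, "spk_0", "First item is the migration timeline."),
]
for chunk in chunk_transcript(segments):
    print(chunk["start_seconds"], chunk["text"])
</code></pre>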
<h3 id="heading-web-scraping">Web Scraping</h3>
<p>The <strong>Scrape</strong> tab lets you pull websites directly into your knowledge base. Enter a URL and RAGStack crawls the page, extracts the content, and processes it through the same pipeline as uploaded documents: metadata extraction, embedding, indexing.</p>
<p>This is useful for building a knowledge base from existing web content without manually saving and uploading pages. Documentation sites, blog archives, reference material, anything publicly accessible.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/ac2c6239-a323-4770-80f7-31aa7ff3bdfb.png" alt="Web scraping interface with fields for URL, max pages, and depth. A dropdown for scope selection and a 'Start Scrape' button are visible." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-chatting-with-your-knowledge-base">Chatting With Your Knowledge Base</h2>
<p>This is the payoff. Go to the <strong>Chat</strong> tab, type a question, and RAGStack retrieves relevant documents from your knowledge base, passes them as context to a Bedrock LLM, and returns an answer with source citations.</p>
<p>Citations are collapsible: click one to expand it and see which documents informed the answer, with the option to download the source file. For video and audio sources, you get clickable timestamps that jump to the relevant moment.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/760b3cd0-8bb8-493d-97ce-5eb3d0138592.png" alt="Screenshot of a web interface titled &quot;Knowledge Base Chat&quot; with menu options on the left. The central section prompts users to ask document-related questions." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-metadata-filtering">Metadata Filtering</h3>
<p>If you've uploaded enough documents to have meaningful metadata categories, the chat interface lets you filter search results by metadata before querying. RAGStack auto-discovers the metadata structure from your documents, so you don't configure this manually; it just appears as your knowledge base grows.</p>
<p>This is useful when you have a large mixed corpus. Instead of hoping the vector search picks the right context from thousands of documents, you can narrow it down: "only search documents about project X" or "only search content from Q4 2024."</p>
<h2 id="heading-embedding-the-web-component-in-your-app">Embedding the Web Component in Your App</h2>
<p>The dashboard is useful for managing your knowledge base, but the real power is embedding RAGStack's chat in your own application. The web component works with any framework: React, Vue, Angular, Svelte, or plain HTML.</p>
<p>Load the script once from your CloudFront distribution:</p>
<pre><code class="language-html">&lt;script src="https://your-cloudfront-url/ragstack-chat.js"&gt;&lt;/script&gt;
</code></pre>
<p>Then drop the component wherever you want a chat interface:</p>
<pre><code class="language-html">&lt;ragstack-chat
  conversation-id="my-app"
  header-text="Ask About Documents"
&gt;&lt;/ragstack-chat&gt;
</code></pre>
<p>That's it. The component handles authentication (via IAM), manages conversation state, and renders source citations, all self-contained. Your CloudFront URL is in the stack outputs.</p>
<p>For server-side integrations that don't need a UI, the GraphQL API is available with API key authentication. You can find your endpoint and API key in the dashboard under Settings.</p>
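<p>If you want to call the GraphQL API directly, a minimal request looks something like the sketch below. The endpoint and API key come from the dashboard’s Settings page, but the query and field names here are placeholders – check the project’s GraphQL schema for the real operation names.</p>
<pre><code class="language-python"># Minimal sketch of hitting the GraphQL API with API key auth.
# The "search" operation and its fields are hypothetical placeholders.
import requests

ENDPOINT = "https://YOUR_GRAPHQL_ENDPOINT/graphql"  # from the dashboard Settings page
API_KEY = "YOUR_API_KEY"

query = """
query Search($text: String!) {
  search(text: $text) {   # hypothetical operation name
    documentId
    snippet
  }
}
"""

response = requests.post(
    ENDPOINT,
    json={"query": query, "variables": {"text": "authentication docs"}},
    headers={"x-api-key": API_KEY},  # AppSync-style API key header
    timeout=30,
)
response.raise_for_status()
print(response.json())
</code></pre>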
<h2 id="heading-using-the-mcp-server">Using the MCP Server</h2>
<p>RAGStack includes an MCP server that connects your knowledge base to AI assistants like Claude Desktop, Cursor, VS Code, and Amazon Q CLI. Instead of switching to the dashboard to search your documents, you ask your assistant directly.</p>
<p>Install it:</p>
<pre><code class="language-bash">pip install ragstack-mcp
</code></pre>
<p>Then add it to your AI assistant's MCP configuration:</p>
<pre><code class="language-json">{
  "ragstack": {
    "command": "uvx",
    "args": ["ragstack-mcp"],
    "env": {
      "RAGSTACK_GRAPHQL_ENDPOINT": "YOUR_ENDPOINT",
      "RAGSTACK_API_KEY": "YOUR_API_KEY"
    }
  }
}
</code></pre>
<p>Your endpoint and API key are in the dashboard under Settings. Once configured, type <code>@ragstack</code> in your assistant's chat to invoke the MCP server, then ask things like "search my knowledge base for authentication docs" and it queries RAGStack directly.</p>
<p>See the <a href="https://github.com/HatmanStack/RAGStack-Lambda/blob/main/src/ragstack-mcp/README.md">MCP Server docs</a> for the full list of available tools and setup details.</p>
<h2 id="heading-what-you-can-build-from-here">What You Can Build From Here</h2>
<p>You've got a deployed RAG pipeline that costs almost nothing to run and handles text, images, video, and audio. A few directions you might take it:</p>
<p><strong>A searchable personal archive.</strong> Every conference talk you've saved, every PDF textbook, every tutorial video that's sitting in a folder somewhere. Upload it all, and now you have one search interface across years of accumulated material. The multimodal embeddings mean your screenshots and diagrams are searchable too, not just the text.</p>
<p>I built <a href="https://github.com/HatmanStack/family-archive-document-ai">a family archive app</a> this way: scanned letters, old photos, and home videos, with RAGStack deployed as a nested CloudFormation stack so the whole family can search across decades of memories using the chat widget.</p>
<p><strong>A second brain for a client project.</strong> Scrape the client's existing docs, upload the SOW and meeting notes, drop in the codebase documentation. Now you've got a searchable knowledge base scoped to that engagement. Spin it up at the start, tear it down when the contract ends. At these costs, it's disposable infrastructure.</p>
<p><strong>AI chat over a niche dataset.</strong> Recipe collections, legal filings, research papers, local government meeting minutes, any corpus that's too specialized for general-purpose LLMs to know well. The web component means you can ship it as a standalone tool without building a frontend from scratch.</p>
<p><strong>RAG for your MCP workflow.</strong> If you're already using Claude Desktop or Cursor, the MCP server turns your knowledge base into another tool your assistant can reach for. Upload your team's runbooks and architecture docs, and now <code>@ragstack</code> in your editor gives you instant context without tab-switching.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The serverless RAG pipeline you just deployed handles document processing, multimodal embeddings, metadata extraction, and AI chat with source citations, all scaling to zero when idle, all running in your AWS account. Your documents, your vectors, your infrastructure. The traditional approach to this stack costs <code>120-500 USD</code>/month in baseline infrastructure. This one costs pocket change.</p>
<p>The full source is at <a href="https://github.com/HatmanStack/RAGStack-Lambda">github.com/HatmanStack/RAGStack-Lambda</a>. File issues, open PRs, or just poke around the architecture. If you want to go deeper on the technical tradeoffs, particularly how filtered vector search behaves on cost-optimized backends like S3 Vectors, that's a story for the next post.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build and Deploy a Production-Ready WhatsApp Bot with FastAPI, Evolution API, Docker, EasyPanel, and GCP ]]>
                </title>
                <description>
                    <![CDATA[ WhatsApp bots are widely used for customer support, automated replies, notifications, and internal tools. Instead of relying on expensive third-party platforms, you can build and deploy your own self- ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-and-deploy-a-production-ready-whatsapp-bot/</link>
                <guid isPermaLink="false">699877ac3dc17c4862f466c7</guid>
                
                    <category>
                        <![CDATA[ chatbot ]]>
                    </category>
                
                    <category>
                        <![CDATA[ whatsapp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Raju Manoj ]]>
                </dc:creator>
                <pubDate>Fri, 20 Feb 2026 15:03:08 +0000</pubDate>
                <media:content url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/5e1e335a7a1d3fcc59028c64/de480f02-206a-4325-b4c2-788d33d746b1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>WhatsApp bots are widely used for customer support, automated replies, notifications, and internal tools. Instead of relying on expensive third-party platforms, you can build and deploy your own self-hosted WhatsApp bot using modern open-source tools.</p>
<p>In this tutorial, you’ll learn how to build and deploy a production-ready WhatsApp bot using:</p>
<ul>
<li><p>FastAPI</p>
</li>
<li><p>Evolution API</p>
</li>
<li><p>Docker</p>
</li>
<li><p>EasyPanel</p>
</li>
<li><p>Google Cloud Platform (GCP)</p>
</li>
</ul>
<p>By the end of this guide, you will have a fully working WhatsApp bot connected to your own WhatsApp account and deployed on a cloud virtual machine.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a href="#heading-how-the-architecture-works">How the Architecture Works</a></p>
</li>
<li><p><a href="#heading-how-your-whatsapp-bot-works">How Your WhatsApp Bot Works</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-step-1-create-firewall-rules-on-gcp">Step 1: Create Firewall Rules on GCP</a></p>
</li>
<li><p><a href="#heading-step-2-create-a-virtual-machine-ubuntu-2204">Step 2: Create a Virtual Machine (Ubuntu 22.04)</a></p>
</li>
<li><p><a href="#heading-step-3-ssh-into-the-vm">Step 3: SSH into the VM</a></p>
</li>
<li><p><a href="#heading-step-4-install-docker">Step 4: Install Docker</a></p>
</li>
<li><p><a href="#heading-step-5-install-easypanel">Step 5: Install EasyPanel</a></p>
</li>
<li><p><a href="#heading-step-6-open-the-easypanel-dashboard">Step 6: Open the EasyPanel Dashboard</a></p>
</li>
<li><p><a href="#heading-step-7-deploy-evolution-api">Step 7: Deploy Evolution API</a></p>
</li>
<li><p><a href="#heading-step-8-connect-whatsapp">Step 8: Connect WhatsApp</a></p>
</li>
<li><p><a href="#heading-step-9-deploy-the-fastapi-bot">Step 9: Deploy the FastAPI Bot</a></p>
</li>
<li><p><a href="#heading-step-10-connect-the-webhook-telling-evolution-api-where-to-send-messages">Step 10: Connect the Webhook - Telling Evolution API Where to Send Messages</a></p>
</li>
<li><p><a href="#heading-step-11-final-test">Step 11: Final Test</a></p>
</li>
<li><p><a href="#heading-production-considerations">Production Considerations</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-how-the-architecture-works">How the Architecture Works</h2>
<p>Before we start installing anything, let’s understand how the system works.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770444944919/cbcd4016-c98d-4379-9c7c-2d22ab9e309a.png" alt="This diagram shows how a WhatsApp message flows from the user’s WhatsApp app, through WhatsApp servers, into a GCP VM (via firewall, Docker, and EasyPanel) where Evolution API receives it, triggers a FastAPI bot via webhook, processes the logic, and sends the reply back to the user through WhatsApp." width="600" height="400" loading="lazy">

<h2 id="heading-how-your-whatsapp-bot-works">How Your WhatsApp Bot Works</h2>
<p>Before we continue setting things up, let's make sure you understand what's actually happening behind the scenes. Don't worry – no technical experience needed here.</p>
<h3 id="heading-imagine-a-postal-service">Imagine a postal service</h3>
<p>Think of your WhatsApp bot like a very fast, automated postal service:</p>
<ul>
<li><p>Someone sends you a letter (a WhatsApp message)</p>
</li>
<li><p>A postal worker (Evolution API) picks it up and brings it to your office</p>
</li>
<li><p>Your office manager (FastAPI bot) reads it and writes a reply</p>
</li>
<li><p>The postal worker takes the reply back and delivers it</p>
</li>
</ul>
<p>That's it. That's the whole system.</p>
<h3 id="heading-the-7-steps">The 7 steps</h3>
<ol>
<li><p>Someone sends a message to your WhatsApp number – just like texting a friend.</p>
</li>
<li><p>Evolution API notices the message – it's constantly watching your WhatsApp number for new messages, like a receptionist sitting by the phone.</p>
</li>
<li><p>Evolution API passes the message to your bot – it sends the message content to your app and says <em>"hey, you've got a new message!"</em></p>
</li>
<li><p>Your bot reads the message and decides what to say – this is where your code does its job.</p>
</li>
<li><p>Your bot sends the reply back to Evolution API – <em>"okay, send this response."</em></p>
</li>
<li><p>Evolution API delivers the reply through WhatsApp.</p>
</li>
<li><p>The user sees the reply on their phone – usually within seconds.</p>
</li>
</ol>
<h3 id="heading-one-line-summary">One line summary</h3>
<pre><code class="language-plaintext">User → WhatsApp → Evolution API → Your Bot → Evolution API → WhatsApp → User
</code></pre>
<p>Every step in this guide is just setting up one piece of that chain. Once they're all connected, the whole thing runs on its own automatically.</p>
<p>This architecture allows you to automate replies while keeping full control of your infrastructure.</p>
<h3 id="heading-why-these-tools">Why These Tools?</h3>
<p>Let’s briefly understand why we’re using each tool.</p>
<h4 id="heading-fastapi">FastAPI</h4>
<p>FastAPI is a modern Python framework for building APIs. It is fast, lightweight, and ideal for handling webhook requests from Evolution API.</p>
<h4 id="heading-evolution-api">Evolution API</h4>
<p>Evolution API is a self-hosted WhatsApp automation server built on top of Baileys. It connects your personal WhatsApp account without requiring official WhatsApp Business API approval.</p>
<h4 id="heading-docker">Docker</h4>
<p>Docker allows us to run applications in containers. This makes deployments consistent, portable, and production-ready.</p>
<h4 id="heading-easypanel">EasyPanel</h4>
<p>EasyPanel is a graphical platform for managing Docker services. Instead of writing Docker Compose files manually, we use EasyPanel’s UI to deploy and manage our services easily.</p>
<h4 id="heading-google-cloud-platform-gcp">Google Cloud Platform (GCP)</h4>
<p>GCP provides the virtual machine that hosts our infrastructure. We will use an Ubuntu 22.04 server to run Docker, EasyPanel, Evolution API, and our FastAPI bot.</p>
<p>I chose these tools because they are practical, lightweight, and suitable for real-world production deployments.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before starting, make sure you have:</p>
<ul>
<li><p>A Google Cloud, AWS, or Azure account</p>
</li>
<li><p>Billing enabled</p>
</li>
<li><p>A project selected</p>
</li>
<li><p>Access to Cloud Shell</p>
</li>
<li><p>Basic Linux and Docker knowledge</p>
</li>
</ul>
<h2 id="heading-step-1-create-firewall-rules-on-gcp">Step 1: Create Firewall Rules on GCP</h2>
<p>We need to allow traffic to specific ports on our VM. So, we run this command in GCP Cloud Shell:</p>
<pre><code class="language-bash">gcloud compute firewall-rules create easypanel-whatsapp-fw \
 --network default \
 --direction INGRESS \
 --priority 1000 \
 --action ALLOW \
 --rules tcp:22,tcp:80,tcp:443,tcp:3000,tcp:8080,tcp:9000,tcp:5000-5999 \
 --source-ranges 0.0.0.0/0 \
 --description "SSH, EasyPanel, Evolution API, Bot"
</code></pre>
<p>This command:</p>
<ul>
<li><p>Creates a <strong>firewall rule</strong> named <code>easypanel-whatsapp-fw</code></p>
</li>
<li><p>On the <strong>default network</strong></p>
</li>
<li><p>Allows incoming internet traffic (<code>INGRESS</code>)</p>
</li>
<li><p>Opens these ports:</p>
<ul>
<li><p><code>22</code> → SSH (server access)</p>
</li>
<li><p><code>80</code> → HTTP</p>
</li>
<li><p><code>443</code> → HTTPS</p>
</li>
<li><p><code>3000, 8080, 9000</code> → App panels / APIs</p>
</li>
<li><p><code>5000–5999</code> → Custom app range</p>
</li>
</ul>
</li>
<li><p>Allows access from <strong>any IP address</strong> (<code>0.0.0.0/0</code>)</p>
</li>
</ul>
<p>Basically, it opens your server so people (and you) can access your apps and services from the internet. This firewall rule allows external traffic to reach your VM.</p>
<h2 id="heading-step-2-create-a-virtual-machine-ubuntu-2204">Step 2: Create a Virtual Machine (Ubuntu 22.04)</h2>
<p>Now we'll create the server that hosts everything. Run the following command in the GCP Cloud Shell to set up a virtual machine with Ubuntu 22.04.</p>
<pre><code class="language-bash">gcloud compute instances create whatsapp-vm \
  --zone=asia-south1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=30GB \
  --tags=easypanel
</code></pre>
<p>This command creates a new virtual machine (VM) on Google Cloud:</p>
<ul>
<li><p><strong>Name:</strong> <code>whatsapp-vm</code></p>
</li>
<li><p><strong>Location (zone):</strong> <code>asia-south1-a</code> (India region)</p>
</li>
<li><p><strong>Machine size:</strong> <code>e2-medium</code> (2 vCPU, 4GB RAM)</p>
</li>
<li><p><strong>Operating System:</strong> Ubuntu 22.04 LTS</p>
</li>
<li><p><strong>Disk size:</strong> 30GB</p>
</li>
<li><p><strong>Tag:</strong> <code>easypanel</code> (used to apply firewall rules)</p>
</li>
</ul>
<p>This creates a Linux server in Google Cloud that you can use to host EasyPanel, your WhatsApp bot, or your APIs.</p>
<p>Note: Wait about one minute for the instance to start.</p>
<h2 id="heading-step-3-ssh-into-the-vm">Step 3: SSH into the VM</h2>
<p>Connect to your server by using SSH to access the virtual machine you just created on Google Cloud.</p>
<pre><code class="language-bash">gcloud compute ssh whatsapp-vm --zone=asia-south1-a
</code></pre>
<p>This command connects to your virtual machine named <code>whatsapp-vm</code> in the zone <code>asia-south1-a</code> using SSH (secure remote login).</p>
<p>It logs you into your Google Cloud server so you can start installing software and running commands. After running this, you will see a terminal prompt – that means you are now inside your Ubuntu server and ready to go.</p>
<h2 id="heading-step-4-install-docker">Step 4: Install Docker</h2>
<p>Docker is needed to run EasyPanel and the Evolution API.</p>
<p><strong>First update the system:</strong></p>
<pre><code class="language-bash">sudo apt update -y
sudo apt install -y curl
</code></pre>
<p>This does two things:</p>
<ol>
<li><p><code>sudo apt update -y</code>→ Updates your server’s package list (refreshes available software info).</p>
</li>
<li><p><code>sudo apt install -y curl</code>→ Installs <strong>curl</strong>, a tool used to download things from the internet using the terminal.</p>
</li>
</ol>
<p>It prepares your server and installs a tool needed to download and install other software.</p>
<p><strong>Then install Docker:</strong></p>
<pre><code class="language-bash">curl -fsSL https://get.docker.com | sudo sh
</code></pre>
<p>This command uses <code>curl</code> to download Docker’s official installation script. The <code>|</code> (pipe) sends it directly to <code>sudo sh</code>, which runs the script as administrator.</p>
<p>It automatically installs <strong>Docker</strong> on your server.</p>
<p>After this finishes, Docker should be installed.</p>
<p><strong>Enable Docker:</strong></p>
<pre><code class="language-bash">sudo systemctl enable docker
sudo systemctl start docker
</code></pre>
<p>This command does two things:</p>
<ol>
<li><p><code>enable docker</code>→ Makes Docker start automatically every time the server reboots.</p>
</li>
<li><p><code>start docker</code>→ Starts Docker right now.</p>
</li>
</ol>
<p>It turns Docker ON now and makes sure it stays ON after restart.</p>
<p><strong>Allow the Ubuntu user to run Docker:</strong></p>
<pre><code class="language-bash">sudo usermod -aG docker ubuntu
</code></pre>
<p>This command adds the user <code>ubuntu</code> to the Docker group.</p>
<p>This is important: by default, you must use <code>sudo</code> before every Docker command. After running this, the <code>ubuntu</code> user can run Docker without needing sudo every time.</p>
<p>Note: This command assumes your username is <code>ubuntu</code>. On Google Cloud, <code>gcloud compute ssh</code> usually logs you in under your own username, so run <code>whoami</code> and substitute that name if it isn’t <code>ubuntu</code>.</p>
<p><strong>Exit the session and reconnect:</strong></p>
<pre><code class="language-bash">exit
gcloud compute ssh whatsapp-vm --zone=asia-south1-a
</code></pre>
<ol>
<li><p><code>exit</code>→ Logs you out of your current server session.</p>
</li>
<li><p><code>gcloud compute ssh whatsapp-vm --zone=asia-south1-a</code>→ Logs you back into your Google Cloud VM.</p>
</li>
</ol>
<p>Why we do this: After adding the <code>ubuntu</code> user to the Docker group, you must log out and log back in for the permission changes to work.</p>
<p><strong>Test Docker:</strong></p>
<pre><code class="language-bash">docker run hello-world
</code></pre>
<p>This command downloads a small test image called <strong>hello-world</strong>, runs it inside Docker, and prints a success message if Docker is working correctly.</p>
<p>It checks if Docker is installed and working properly. If you see “Hello from Docker!”, Docker is working correctly.</p>
<h2 id="heading-step-5-install-easypanel">Step 5: Install EasyPanel</h2>
<p>EasyPanel provides a user interface for deploying Docker services. Run this command in the VM:</p>
<pre><code class="language-bash">curl -sSL https://get.easypanel.io | sudo bash
</code></pre>
<p>This command:</p>
<ul>
<li><p>Downloads the official <strong>EasyPanel installation script</strong></p>
</li>
<li><p>Runs it with administrator (sudo) permission</p>
</li>
<li><p>Automatically installs and configures EasyPanel on your server</p>
</li>
</ul>
<p>It installs EasyPanel on your VM so you can manage apps using a web dashboard instead of commands. Installation takes about one minute.</p>
<h2 id="heading-step-6-open-the-easypanel-dashboard">Step 6: Open the EasyPanel Dashboard</h2>
<p>First, get your VM’s external IP address – you can see it in the GCP console under Compute Engine → VM instances, or by running <code>gcloud compute instances list</code>. Then open a new tab in your browser and enter it like this:</p>
<pre><code class="language-plaintext">http://&lt;YOUR_PUBLIC_IP&gt;:3000
</code></pre>
<p>For example, if your IP was 34.123.45.67, you would type:</p>
<pre><code class="language-plaintext">http://34.123.45.67:3000
</code></pre>
<p>EasyPanel runs on port 3000 by default – that's why we add :3000 at the end. Without it, your browser won't know which service to open on the server.</p>
<p>When the page loads, you’ll see the EasyPanel setup screen, where you create an admin account and log in.</p>
<p>Click <strong>“Create Admin Account”</strong>.</p>
<p>Fill in:</p>
<ul>
<li><p><strong>Username</strong> (choose something you’ll remember)</p>
</li>
<li><p><strong>Email</strong></p>
</li>
<li><p><strong>Password</strong> (make it strong!)</p>
</li>
<li><p>Submit the form.</p>
</li>
</ul>
<p>You are now logged in as the admin and can start managing apps, APIs, and bots through the EasyPanel dashboard.</p>
<p>You will see a page like the one below:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771478879926/c222ae8c-8714-4af2-a631-c6fead8185e8.png" alt="easypanel page" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-step-7-deploy-evolution-api">Step 7: Deploy Evolution API</h2>
<ol>
<li><p>Create a new project (for example: <code>whatsapp-1</code>)</p>
</li>
<li><p>Go to Services → Templates</p>
</li>
<li><p>Select Evolution API</p>
</li>
<li><p>Deploy the latest version</p>
</li>
</ol>
<p>Wait until all services turn green. You will see a page like the one below.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771479080612/ce8453b2-e20e-4c37-b150-de4e032ba794.png" alt="Deploying evolution-api" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Next, open Environment Variables and locate:</p>
<pre><code class="language-plaintext">AUTHENTICATION_API_KEY
</code></pre>
<p>Copy the AUTHENTICATION_API_KEY.</p>
<p>Next, open the Evolution API dashboard.</p>
<p>Inside EasyPanel, find your Evolution API service. You will see a clickable domain link – it usually looks something like:</p>
<pre><code class="language-plaintext">https://evolution-api.easypanel.host
</code></pre>
<p>Click that link to open it in your browser. You will see a JSON response confirming the service is running.</p>
<p>To proceed with login, copy the <strong>Manager link</strong> displayed in that response. This link opens the management dashboard where you can authenticate and begin using the Evolution API. The screenshot below highlights the manager URL along with version details for easy reference.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771480117778/017524df-f8be-41bd-9d6c-0ba04b499fe9.png" alt="manager URL" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Open the manager link in a new tab, then paste in the <code>AUTHENTICATION_API_KEY</code> you copied in the previous step. The login screen looks like this:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771480570638/9e18ca51-1637-4261-be22-7cd89ee0459f.png" alt="Evolution manager" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Create a new instance:</p>
<ul>
<li><p>Choose channel: <strong>Baileys</strong></p>
</li>
<li><p>Leave phone number blank</p>
</li>
<li><p>Give your instance a name</p>
</li>
</ul>
<p>Save the instance.</p>
<h2 id="heading-step-8-connect-whatsapp">Step 8: Connect WhatsApp</h2>
<p>Inside your instance dashboard:</p>
<ol>
<li><p>Click <strong>Get QR</strong></p>
</li>
<li><p>Scan it using WhatsApp on your phone</p>
</li>
</ol>
<p>Once connected, your chats and contacts will sync automatically. If syncing fails, disconnect and reconnect the session.</p>
<h2 id="heading-step-9-deploy-the-fastapi-bot">Step 9: Deploy the FastAPI Bot</h2>
<p>Now we’ll deploy the bot service.</p>
<h3 id="heading-1-go-to-easypanel">1: Go to EasyPanel</h3>
<p>You’re opening the EasyPanel dashboard you just installed. This is where you can manage apps, servers, and services using a graphical interface instead of terminal commands.</p>
<h3 id="heading-2-create-a-new-project">2: Create a new project</h3>
<p>A “project” is like a container or folder for your bot service. It organizes all files, settings, and deployments for this app.</p>
<h3 id="heading-3-add-an-app-service">3: Add an App service</h3>
<p>“App service” means a running instance of your application. In this case, it will be the WhatsApp bot.</p>
<h3 id="heading-4-choose-git-deployment">4: Choose Git deployment</h3>
<p>Git deployment lets you connect a code repository to EasyPanel. This will automatically download your code from GitHub and run it inside Docker.</p>
<h3 id="heading-5-paste-your-repository-url">5: Paste your repository URL</h3>
<pre><code class="language-plaintext">https://github.com/rajumanoj333/wabot
</code></pre>
<p>This is the GitHub repository containing the WhatsApp bot code. EasyPanel will clone this repo and prepare the app automatically.</p>
<h3 id="heading-6-domains-in-easypanel">6: Domains in EasyPanel</h3>
<p>This section lets you assign a URL or domain name to your app service. Even if you don’t have a custom domain, you can use your server’s public IP. Your WhatsApp bot app runs on port 9000 inside the server.</p>
<h3 id="heading-7-set-the-port-to-9000">7: Set the port to <code>9000</code></h3>
<p>By setting the domain to use port 9000, EasyPanel knows where to send traffic.</p>
<p>Example URL after this step:</p>
<pre><code class="language-plaintext">https://your-project.easypanel.host
</code></pre>
<p>This is the public address people (and other services) will use to reach your bot.</p>
<p>You’re telling EasyPanel:</p>
<blockquote>
<p>“Whenever someone accesses this project, forward them to the bot service running on port 9000.”</p>
</blockquote>
<p>Without this step, the bot service would run but <strong>you wouldn’t be able to access it from your browser or other apps</strong>.</p>
<h3 id="heading-configure-environment-variables">Configure Environment Variables</h3>
<p>Set the following variables:</p>
<pre><code class="language-plaintext">EVOLUTION_API_URL=http://evolution-api:8080
EVOLUTION_API_KEY=YOUR_AUTHENTICATION_API_KEY
INSTANCE_NAME=your_instance_name
</code></pre>
<p>Note: You might notice two different names here – AUTHENTICATION_API_KEY (used in EasyPanel) and EVOLUTION_API_KEY (used in your bot code). They are the same key. Just copy the value from EasyPanel and paste it into both places.</p>
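<p>The repository you deployed already contains the bot code, but to make the flow concrete, here’s a minimal sketch of what a FastAPI bot like this typically looks like. The webhook payload fields and the Evolution API send endpoint can vary between versions, so treat the field names below as assumptions and adjust them to match what your instance actually sends.</p>
<pre><code class="language-python"># Minimal sketch of a FastAPI WhatsApp bot sitting behind Evolution API.
# Payload field names and the send endpoint are assumptions based on typical
# Evolution API / Baileys payloads -- verify them against your own instance.
import os

import httpx
from fastapi import FastAPI, Request

EVOLUTION_API_URL = os.environ["EVOLUTION_API_URL"]  # e.g. http://evolution-api:8080
EVOLUTION_API_KEY = os.environ["EVOLUTION_API_KEY"]  # the AUTHENTICATION_API_KEY value
INSTANCE_NAME = os.environ["INSTANCE_NAME"]

app = FastAPI()

@app.post("/webhook")
async def webhook(request: Request):
    payload = await request.json()
    data = payload.get("data", {})

    # Assumed shape for MESSAGES_UPSERT events: sender JID plus plain-text body.
    sender = data.get("key", {}).get("remoteJid")
    text = (data.get("message") or {}).get("conversation")
    if not sender or not text:
        return {"status": "ignored"}

    # Reply through Evolution API (the exact path may differ by version).
    async with httpx.AsyncClient() as client:
        await client.post(
            f"{EVOLUTION_API_URL}/message/sendText/{INSTANCE_NAME}",
            headers={"apikey": EVOLUTION_API_KEY},
            json={"number": sender, "text": "👋 Hello! Bot is working."},
        )
    return {"status": "ok"}
</code></pre>
<p>Notice how the three environment variables map directly into the code: the base URL tells the bot where Evolution API lives inside EasyPanel’s network, the key authenticates the request, and the instance name tells Evolution API which WhatsApp session should send the reply.</p>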
<h2 id="heading-step-10-connect-the-webhook-telling-evolution-api-where-to-send-messages">Step 10: Connect the Webhook – Telling Evolution API Where to Send Messages</h2>
<p>At this point, you have two separate things running:</p>
<ol>
<li><p><strong>Evolution API</strong>: the service that connects to WhatsApp and handles messages</p>
</li>
<li><p><strong>Your app (FastAPI bot)</strong>: the chatbot brain you deployed in the previous steps</p>
</li>
</ol>
<p>Right now, these two don't know each other exists. They're like two people in different rooms with no way to pass notes between them. A <strong>webhook</strong> fixes that.</p>
<h3 id="heading-so-what-exactly-is-a-webhook">So what exactly is a webhook?</h3>
<p>A webhook is simply a URL (a web address) that you hand to one service so it can automatically notify another service when something happens.</p>
<p>You're going to tell Evolution API <em>"whenever a WhatsApp message arrives, forward it to this address."</em> Your app will be sitting at that address, waiting to receive it, read it, and send a reply.</p>
<p>Think of it like a forwarding address at the post office. When mail (a WhatsApp message) arrives, it gets automatically redirected to your app's door.</p>
<h3 id="heading-lets-set-it-up">Let's set it up</h3>
<h4 id="heading-1-open-your-evolution-api-dashboard">1. Open your Evolution API dashboard.</h4>
<p>You should already have this open from earlier steps. In the left sidebar, click on <strong>Events</strong>, then click on <strong>Webhook</strong>. This is where you control how Evolution API sends data to your app.</p>
<h4 id="heading-2-turn-the-webhook-on">2. Turn the webhook on.</h4>
<p>At the top of the page, you'll see a toggle next to the word <strong>"Enabled"</strong>. Click it so it turns green. This tells Evolution API that you want to start using a webhook.</p>
<h4 id="heading-3-enter-your-apps-webhook-url">3. Enter your app's webhook URL.</h4>
<p>In the <strong>URL</strong> field, type your app's address with <code>/webhook</code> added to the end, like this:</p>
<pre><code class="language-plaintext">https://your-domain.easypanel.host/webhook
</code></pre>
<p>Replace <code>your-domain</code> with the actual domain name you set up when you deployed your app. The <code>/webhook</code> part at the end is important: it's a specific page your app has set up just for receiving these messages. Without it, Evolution API would be knocking on the wrong door.</p>
<h4 id="heading-4-leave-webhook-by-events-and-webhook-base64-turned-off-for-now">4. Leave "Webhook by Events" and "Webhook Base64" turned off for now.</h4>
<p>These are advanced options you won't need for a basic chatbot.</p>
<h4 id="heading-5-scroll-down-to-the-events-section-and-enable-these-two-events">5. Scroll down to the Events section and enable these two events:</h4>
<ul>
<li><p><strong>MESSAGES_UPSERT</strong>: This triggers every time someone sends your WhatsApp number a message. Without this, your app would never know a message arrived.</p>
</li>
<li><p><strong>SEND_MESSAGE</strong>: This triggers when a message is sent <em>out</em>. It helps your app confirm that replies are going through correctly.</p>
</li>
</ul>
<p>You can leave all the other events (like <code>APPLICATION_STARTUP</code>) turned off. They handle things like group chats and contact updates, which aren't needed for what we're building.</p>
<p><strong>6. Click Save.</strong></p>
<h3 id="heading-quick-recap-of-what-you-just-did">Quick recap of what you just did</h3>
<p>You created a direct line between Evolution API and your app. Now, the moment someone messages your WhatsApp number, Evolution API will instantly pass that message along to your app. Your app reads it, figures out a response, and sends one back, all automatically.</p>
<p>This is the step that brings your chatbot to life. Without it, nothing would happen when someone sent you a message. With it, the whole system clicks into place.</p>
<h2 id="heading-step-11-final-test">Step 11: Final Test</h2>
<p>Send a message from a different WhatsApp number (not the connected one).</p>
<p>Send:</p>
<pre><code class="language-plaintext">Hi
</code></pre>
<p>If everything is configured correctly, your bot should reply:</p>
<pre><code class="language-plaintext">👋 Hello! Bot is working.
</code></pre>
<p>Congratulations! Your WhatsApp bot is now live.</p>
<h2 id="heading-production-considerations">Production Considerations</h2>
<p>For real-world deployments, consider:</p>
<ul>
<li><p>Restricting firewall rules instead of allowing 0.0.0.0/0</p>
</li>
<li><p>Using HTTPS with a custom domain</p>
</li>
<li><p>Securing API keys with a secret manager</p>
</li>
<li><p>Monitoring logs and container health</p>
</li>
<li><p>Setting up automatic backups</p>
</li>
</ul>
<p>This tutorial demonstrates the core working system, but these improvements will make your deployment more secure and scalable.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You now have a fully self-hosted WhatsApp bot running on a cloud VM using FastAPI, Evolution API, Docker, EasyPanel, and GCP.</p>
<p>This setup gives you:</p>
<ul>
<li><p>Full control over infrastructure</p>
</li>
<li><p>No dependency on expensive SaaS platforms</p>
</li>
<li><p>Production-ready container deployment</p>
</li>
<li><p>Scalable architecture</p>
</li>
</ul>
<p>From here, you can extend your bot with:</p>
<p>AI integrations: connect your bot to ChatGPT, Gemini, or Claude so it can answer questions intelligently instead of just sending fixed replies.</p>
<p>Database storage: save incoming messages, user details, or conversation history to a database like PostgreSQL or MongoDB.</p>
<p>Custom automation workflows: trigger actions based on keywords, like sending a PDF when someone types "menu" or booking an appointment when they type "schedule".</p>
<p>CRM integrations: connect your bot to tools like HubSpot or Notion to automatically log leads and customer conversations.</p>
<p>Building your own infrastructure is one of the best ways to deeply understand how modern backend systems work together.</p>
<p>Happy building!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Cloud Security Fundamentals in AWS – A Guide for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ Security is a vital part of every system and infrastructure. The word "security" comes from the Latin securitas, which is composed of se- (meaning “without”) and cura (meaning “care” or “worry”). Originally, it meant "without worry." Over time, it ha... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-cloud-security-fundamentals-in-aws-a-guide-for-beginners/</link>
                <guid isPermaLink="false">69377429f6e4912378332463</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cloud security ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #sharedresponsibilitymodel ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ BeginnerFriendly ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ijeoma Igboagu ]]>
                </dc:creator>
                <pubDate>Tue, 09 Dec 2025 00:58:17 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764958826138/12b261c3-9a38-4b0b-b67a-48ce54452d5f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Security is a vital part of every system and infrastructure. The word "security" comes from the Latin <em>securitas</em>, which is composed of <em>se-</em> (meaning “without”) and <em>cura</em> (meaning “care” or “worry”). Originally, it meant "without worry." Over time, it has come to signify being safe or protected.</p>
<p>Today, when we discuss security, we usually refer to protection from harm, danger, or threats, whether in our homes, online, while using online banking, or even across an entire country. Security is important in everything we do.</p>
<p>Cloud providers, such as AWS, are no exception. Their infrastructure must be safeguarded to ensure users’ peace of mind. But on platforms like AWS, security is a <strong>shared responsibility</strong>. This means that both the provider and the user play a role in maintaining security.</p>
<p>Amazon Web Services (AWS) is one of the most popular cloud service providers worldwide. With great power and flexibility comes the responsibility to secure your infrastructure, data, and applications in the cloud.</p>
<p>In this tutorial, we’ll explore the fundamental aspects of cloud security in AWS – especially those that are your responsibility – making it easy to understand if you’re new to cloud computing.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-cloud-security">What is Cloud Security?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-is-cloud-security-important">Why is Cloud Security Important?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-cloud-security-concepts">Key Cloud Security Concepts</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-a-root-user">What is a Root User?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-an-iam-user-for-daily-tasks">How to Create an IAM User for Daily Tasks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-differences-between-the-root-user-and-an-iam-user">Key Differences Between the Root User and an IAM User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-mfa">What is MFA?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-aws-shared-responsibility-model">Understanding the AWS Shared Responsibility Model</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-rds-relational-database-service">RDS (Relational Database Service)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-s3-simple-storage-service">S3 (Simple Storage Service)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-give-a-user-permission">How to give a user permission</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-the-policy">Testing the Policy</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-further-reading">Further Reading</a></li>
</ul>
</li>
</ol>
<h2 id="heading-what-is-cloud-security">What is Cloud Security?</h2>
<p>Cloud security is the set of rules, tools, and practices used to protect your data, apps, and services stored online (in the "cloud"). It helps prevent data loss, hacking, and misuse of information.</p>
<p>Think of cloud security like locking the doors of your house. You wouldn’t leave your doors open for anyone to enter. And in the same way, your cloud account must be secured so that your data remains safe.</p>
<p>If your cloud services aren't secure, hackers could steal your data or cause major damage. Whether you're a business or just someone using cloud apps, keeping your information safe is essential.</p>
<h2 id="heading-why-is-cloud-security-important">Why is Cloud Security Important?</h2>
<p>Cloud security matters because it ensures that only the right people have access to your information. It protects your data from being lost, stolen, or misused. With good security in place, your applications can run safely without being exposed to attacks.</p>
<p>It also helps you keep your personal or business data private. When your cloud environment is well-protected, the risk of data breaches and financial loss is greatly reduced.</p>
<p>Now that you understand why cloud security is important, let’s look at how AWS helps you stay secure and what your own role is in keeping things safe.</p>
<h2 id="heading-key-cloud-security-concepts">Key Cloud Security Concepts</h2>
<p>In AWS, cloud security is the responsibility of both AWS <strong>and</strong> the customer. This model is called the Shared Responsibility Model.</p>
<p>But before learning how AWS divides security duties, you need to understand that while AWS protects its infrastructure, you must protect your own account.</p>
<p>Let’s discuss some key security concepts that are your responsibility, so you know how to do your part in the shared responsibility model.</p>
<h3 id="heading-what-is-a-root-user">What is a Root User?</h3>
<p>When you create an AWS account, the first identity that’s created is the <strong>root user</strong> account. This account has full, unrestricted control. It can delete resources, change ownership, and even close your entire AWS account. Because of this, it’s risky to use it for everyday tasks.</p>
<p>AWS recommends using root only for a few important account-level actions.</p>
<p>Certain tasks require a root user account, so you will need to use it occasionally. Such tasks include:</p>
<ul>
<li><p>Updating billing and payment information</p>
</li>
<li><p>Closing your AWS account</p>
</li>
<li><p>Changing the root account email</p>
</li>
<li><p>Recovering or resetting MFA for the root user</p>
</li>
</ul>
<p>Apart from these few tasks, avoid using the root user entirely. Your everyday work should be done through IAM users, not the root account.</p>
<h3 id="heading-how-to-create-an-iam-user-for-daily-tasks">How to Create an IAM User for Daily Tasks</h3>
<p>Before you start creating any infrastructure in your AWS account, you need an IAM user with the right permissions.</p>
<p>Here’s how to create an IAM user, step by step:</p>
<ul>
<li><p>Open the AWS console.</p>
</li>
<li><p>Search for <strong>IAM</strong>, then select it. This takes you to the <strong>IAM page</strong>.</p>
</li>
<li><p>On the left-hand side, you will see <strong>Users</strong>.</p>
</li>
<li><p>Click on it. This takes you to the <strong>Create user</strong> page.</p>
</li>
<li><p>Click the <strong>Create user</strong> button. It takes you to the “Specify user details” page where you will create an <strong>IAM user</strong>.</p>
</li>
<li><p>Enter a username (for example, <code>adminuser</code>).</p>
</li>
<li><p>Click on “Provide user access to the AWS Management Console”.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764236205525/47c2da54-537f-4242-9048-ced4b7ca6172.png" alt="providing a user access to the AWS console" class="image--center mx-auto" width="1603" height="600" loading="lazy"></p>
<ol start="4">
<li><p>Scroll down and click on “Set a password,” or let AWS generate one for you.</p>
</li>
<li><p>Click <strong>Next</strong> to go to the permissions page.</p>
</li>
<li><p>Select <strong>Attach existing policies directly</strong>.</p>
</li>
<li><p>Choose <code>AdministratorAccess</code>. This permission gives the IAM user full access to perform all administrative tasks in your account.</p>
</li>
<li><p>Click <strong>Create user</strong>.</p>
</li>
</ol>
<p>Once you’ve created this user, sign in with it and use it for your day-to-day tasks. The root user should stay locked down and only be used for rare account-level changes.</p>
<h4 id="heading-video-walkthrough-of-how-to-create-an-iam-user">Video Walkthrough of How to Create an IAM User:</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764235482269/bf5b1f00-13b2-4ff5-b1e4-e6b2fd2796cb.gif" alt="bf5b1f00-13b2-4ff5-b1e4-e6b2fd2796cb" class="image--center mx-auto" width="640" height="323" loading="lazy"></p>
<h3 id="heading-key-differences-between-the-root-user-and-an-iam-user">Key Differences Between the Root User and an IAM User</h3>
<p>Just to be clear, let’s summarise the differences between these two accounts:</p>
<h4 id="heading-root-account">Root Account</h4>
<p>This is the very first account created when you set up AWS. It has unlimited power – literally, everything in the account can be changed, deleted, or closed.</p>
<p>It’s meant for rare, high-level tasks like billing changes, MFA resets, or closing the account. Because it’s so powerful, you shouldn’t use it for daily work.</p>
<h4 id="heading-iam-user-account">IAM User Account</h4>
<p>This is a user you create inside your AWS account for everyday tasks. You can assign specific permissions, like admin or limited access, to this user. It’s much safer because you can control what it can and cannot do.</p>
<p>If something goes wrong or the credentials are compromised, the blast radius is much smaller than for the root user.</p>
<p>In short, the root user is the master key: too powerful for daily use. IAM users are customizable and safer for your regular work.</p>
<p>Here’s a helpful visual to show the differences between the two as well:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764248147852/7e053766-f6ba-4a17-9b63-c20695f2933c.jpeg" alt="differences between the two account" class="image--center mx-auto" width="4096" height="4096" loading="lazy"></p>
<p>Now that you have both your root user and IAM user set up properly, let’s go back to the concept of multi-factor authentication, or MFA.</p>
<h3 id="heading-what-is-mfa">What is MFA?</h3>
<p>MFA adds another layer of security when you sign in. It combines something you know, like your password, with something you have, such as a phone or security device. Even if someone gets your password, they can’t log in without your MFA code.</p>
<p>You can enable MFA in several ways:</p>
<ul>
<li><p>Using a virtual MFA app like Google Authenticator or Authy</p>
</li>
<li><p>Using a physical security key such as a YubiKey</p>
</li>
<li><p>Using a hardware device from Gemalto</p>
</li>
<li><p>For AWS GovCloud users, using an MFA device from SurePassID</p>
</li>
</ul>
<p>Enabling MFA makes sure that even if someone gets your password, they still can’t access your account without the second authentication step.</p>
<p>For this tutorial, we’ll use the <strong>Google Authenticator app</strong>, which you can download for free from the Play Store.</p>
<p><strong>How do I turn this on for my account?</strong></p>
<ul>
<li><p>Go to your AWS account.</p>
</li>
<li><p>At the top right corner, you’ll see a menu with your account username or ID. Click on it to open the drop-down.</p>
</li>
<li><p>You’ll see <strong>Security Credentials</strong>. Click on it. This will take you to the IAM-Security Credentials page.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764342765366/76b8ef01-9be2-4d93-918e-aaadf9e771ed.png" alt="security credentials" width="323" height="225" loading="lazy"></p>
<ul>
<li><p>At the top of the page, you’ll see a button labelled <strong>Assign MFA device</strong>. Click on it.</p>
</li>
<li><p>You’ll be redirected to a new page where you can choose the type of MFA device you want to use. Scroll down and select <strong>Virtual MFA device</strong> (this is what the Google Authenticator app uses).</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764342668287/5867caaf-af5d-4dfe-b026-68a16dfa1ad1.png" alt="MFA page selection" class="image--center mx-auto" width="1019" height="777" loading="lazy"></p>
<p>Then just follow the on-screen instructions:</p>
<ul>
<li><p>Open the <strong>Google Authenticator</strong> app on your phone.</p>
</li>
<li><p>Tap the <strong>+</strong> button and scan the QR code displayed on the AWS screen.</p>
</li>
<li><p>Enter the two codes generated by the app to verify your device.</p>
</li>
</ul>
<p>Once verified, AWS will link the MFA device to your account and take you back to the Security credentials page. If you scroll down, you’ll see your MFA device listed as <strong>assigned</strong>.</p>
<p>The next time you log in to AWS, you’ll be prompted to enter your MFA code from the Google Authenticator app before you can access your console.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764345845580/2eec5886-7ce7-43e0-8612-1035655a0619.gif" alt="MFA verification at login" class="image--center mx-auto" width="1920" height="1080" loading="lazy"></p>
<p>Always enable MFA for both your root user account and your IAM user account, as it’s one of the simplest and most effective ways to protect your AWS account.</p>
<p>Now that you understand these security fundamentals, we can get back to the shared responsibility model.</p>
<h2 id="heading-understanding-the-aws-shared-responsibility-model">Understanding the AWS Shared Responsibility Model</h2>
<p>The AWS Shared Responsibility Model divides responsibilities between AWS and the customer.</p>
<h3 id="heading-1-awss-responsibility-security-of-the-cloud">1. AWS’s Responsibility (Security of the Cloud)</h3>
<p>AWS is responsible for protecting the infrastructure that runs the services offered in the AWS Cloud. This includes physical security, hardware, software, networking, and facilities.</p>
<h3 id="heading-2-customers-responsibility-security-in-the-cloud">2. Customer’s Responsibility (Security in the Cloud)</h3>
<p>The customer is responsible for securing the data, user accounts, applications, and configurations they store in the cloud.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746970123152/016f8eef-fa2c-4bcb-866b-323fe7585b9d.png" alt="The shared responsibility model in AWS" class="image--center mx-auto" width="1212" height="664" loading="lazy"></p>
<p><strong><em>Image source:</em></strong> <a target="_blank" href="https://aws.amazon.com/compliance/shared-responsibility-model/"><em>AWS shared responsibility model</em></a></p>
<p>For example, AWS is responsible for securing its data centres and servers. But customers also have a role to play by properly configuring their accounts and resources.</p>
<p>Let’s take two popular AWS services, <strong>RDS (Relational Database Service)</strong> and <strong>S3 (Simple Storage Service)</strong>, as examples.</p>
<h3 id="heading-rds-relational-database-service">RDS (Relational Database Service)</h3>
<p><strong>AWS responsibilities:</strong></p>
<ul>
<li><p>Automates database patching</p>
</li>
<li><p>Audits and maintains the underlying instance and storage disks</p>
</li>
<li><p>Applies operating system patches automatically</p>
</li>
</ul>
<p><strong>Customer responsibilities (you):</strong></p>
<ul>
<li><p>Manage in-database users, roles, and permissions</p>
</li>
<li><p>Choose whether your database is public or private</p>
</li>
<li><p>Review and control inbound rules, ports, and IP addresses in the database’s security group</p>
</li>
<li><p>Configure database encryption settings</p>
</li>
</ul>
<h3 id="heading-s3-simple-storage-service">S3 (Simple Storage Service)</h3>
<p><strong>AWS responsibilities:</strong></p>
<ul>
<li><p>Ensures encryption options are available for your data</p>
</li>
<li><p>Guarantees virtually unlimited storage capacity</p>
</li>
<li><p>Prevents AWS employees and the public from accessing your data</p>
</li>
<li><p>Keeps each customer’s data separated from others</p>
</li>
</ul>
<p><strong>Customer responsibilities (you):</strong></p>
<ul>
<li><p>Define your S3 bucket policies according to your security standards</p>
</li>
<li><p>Review bucket configuration settings</p>
</li>
<li><p>Create and manage IAM users and roles with the right permissions</p>
</li>
</ul>
<p>Now you understand who’s responsible for what.</p>
<h2 id="heading-how-to-give-a-user-permission">How to Give a User Permission</h2>
<p>Security in the cloud isn’t just about strong passwords or enabling MFA – it’s also about controlling <em>who</em> can access <em>what</em>. One of the most important principles in AWS security is to grant users only the access they actually need, nothing more. That’s how you keep your environment safe and your resources protected.</p>
<p>So here’s a key question: how do we decide who should be allowed or denied access to what in the cloud?</p>
<h3 id="heading-demonstration">Demonstration</h3>
<p>Let’s walk through a simple, real-life example together.</p>
<p>Imagine you have a developer on your team who needs access to an S3 bucket named <strong>demo-test-app-ij</strong>. The goal is to let them upload and view files in the bucket, but not delete anything.</p>
<p>We already created a user earlier in this guide, so we’ll use that same one here.</p>
<p>To get started, go to <strong>IAM</strong> from your AWS Management Console. Then click on <strong>Users</strong> from the left-hand menu.</p>
<p>Select the user we created earlier. If you don’t have one yet, go back and follow the steps I showed you before to create a new IAM user.</p>
<p>Once you click on the user’s name, you’ll be taken to the <strong>Permissions</strong> page. On the permissions page, click on <strong>Add permissions</strong>.</p>
<p>From the dropdown options, select <strong>Create inline policy</strong>. This will open the <strong>Specify permissions</strong> page, where you’ll define the user’s access.</p>
<p>Scroll down through the list of services and select <strong>S3</strong>. In our example, we’re using S3 because we want to control access to a specific bucket.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764317236877/10517fde-3bbd-4af3-9e16-289793894903.png" alt="An example showing the permission page" class="image--center mx-auto" width="1538" height="529" loading="lazy"></p>
<p>Once you select the service you want to define permissions for, the <strong>Actions</strong> and <strong>Resources</strong> sections will appear automatically.</p>
<p>In the Actions section, you’ll see a list of what the user can do with the service. Here, you can toggle the effect button to either “Allow” or “Deny.”</p>
<p>Under Actions, scroll through the list, find <strong>DeleteObject</strong>, and set its effect to <strong>Deny</strong>. This ensures the user won’t be able to delete any files from the bucket.</p>
<p>Next, move on to the Resources section. Here, you’ll specify which bucket these permissions apply to.</p>
<p>Add the following bucket ARN: <code>arn:aws:s3:::demo-test-app-ij/*</code>. This means the rule applies to everything inside the <strong>demo-test-app-ij</strong> bucket.</p>
<p>Once you’ve added the ARN and confirmed the settings, click <strong>Save policy</strong>.</p>
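<p>If you’d rather work in the JSON editor than the visual editor, the resulting inline policy looks roughly like the sketch below. The statement ID is arbitrary, and you could add an <code>Allow</code> statement for actions like <code>GetObject</code> and <code>PutObject</code> in the same way if the user doesn’t already have them:</p>
<pre><code class="lang-json">{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyObjectDeletion",
            "Effect": "Deny",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::demo-test-app-ij/*"
        }
    ]
}
</code></pre>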
<p>Now, let’s put all these instructions together in a practical example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764335991549/1d33af0a-23c8-4790-aab4-b629db4b7ba9.gif" alt="1d33af0a-23c8-4790-aab4-b629db4b7ba9" class="image--center mx-auto" width="600" height="338" loading="lazy"></p>
<h3 id="heading-testing-the-policy">Testing the Policy</h3>
<p>Now it’s time to confirm that our permissions work the way we expect.</p>
<p>Head over to the <strong>S3</strong> service and open the bucket named <strong>demo-test-app-ij</strong>. Try uploading a file; it should upload successfully. Next, try deleting that same file. You’ll see an error message saying <strong>Failed to delete objects</strong>.</p>
<p>That’s exactly what we want! The user can upload and view files, but can’t delete them, because we never permitted them to do so.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764336439667/279c1f4b-7d5b-477a-b782-cf1378204430.gif" alt="279c1f4b-7d5b-477a-b782-cf1378204430" class="image--center mx-auto" width="1920" height="971" loading="lazy"></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Security has always been about peace of mind. Whether it’s your home, your phone, or your cloud account, you’ll want to know your data is safe.</p>
<p>AWS gives you a strong foundation by securing the cloud itself. But your part matters too: things like enabling MFA, using strong passwords, and managing who can access what. These simple habits go a long way in keeping your data protected.</p>
<p>Cloud security isn’t a one-time setup. It’s an ongoing practice. When both AWS and its users stay alert, the cloud becomes a place you can trust to store, build, and grow with confidence.</p>
<p>Now that you have a basic understanding of how security works in AWS, you’re ready to go deeper and start exploring the services that keep it all running smoothly.</p>
<h3 id="heading-further-reading">Further Reading</h3>
<ul>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/cloud-computing-guide-for-beginners/">What is Cloud Computing? A Guide for Beginners</a></p>
</li>
<li><p><a target="_blank" href="https://twitter.com/ijaydimples">How t</a><a target="_blank" href="https://www.freecodecamp.org/news/how-to-deploy-a-kubernetes-app-on-aws-eks/">o D</a><a target="_blank" href="https://www.linkedin.com/in/ijeoma-igboagu/">eploy a</a> <a target="_blank" href="https://www.freecodecamp.org/news/how-to-deploy-a-kubernetes-app-on-aws-eks/">Kub</a><a target="_blank" href="https://github.com/ijayhub">ernete</a><a target="_blank" href="https://twitter.com/ijaydimples">s App o</a><a target="_blank" href="https://www.freecodecamp.org/news/how-to-deploy-a-kubernetes-app-on-aws-eks/">n A</a><a target="_blank" href="https://twitter.com/ijaydimples">WS EKS</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/best-aws-services-for-frontend-deployment/">T</a><a target="_blank" href="https://github.com/ijayhub">he Bes</a><a target="_blank" href="https://www.freecodecamp.org/news/best-aws-services-for-frontend-deployment/">t AW</a><a target="_blank" href="https://twitter.com/ijaydimples">S Servi</a><a target="_blank" href="https://www.freecodecamp.org/news/best-aws-services-for-frontend-deployment/">ces</a> <a target="_blank" href="https://www.linkedin.com/in/ijeoma-igboagu/">to Depl</a><a target="_blank" href="https://www.freecodecamp.org/news/best-aws-services-for-frontend-deployment/">oy</a> <a target="_blank" href="https://github.com/ijayhub">Front-</a><a target="_blank" href="https://www.freecodecamp.org/news/best-aws-services-for-frontend-deployment/">End Applications in 20</a><a target="_blank" href="https://twitter.com/ijaydimples">25</a></p>
</li>
<li><p><a target="_blank" href="https://twitter.com/ijaydimples">W</a><a target="_blank" href="https://www.freecodecamp.org/news/backend-as-a-service-beginners-guide/">hat</a> <a target="_blank" href="https://twitter.com/ijaydimples">is Back</a><a target="_blank" href="https://www.freecodecamp.org/news/backend-as-a-service-beginners-guide/">end</a> <a target="_blank" href="https://github.com/ijayhub">as a</a> <a target="_blank" href="https://www.freecodecamp.org/news/backend-as-a-service-beginners-guide/">Service (BaaS)? A Beginner's Guide</a></p>
</li>
<li><p><a target="_blank" href="https://dev.to/ijay/the-hidden-challenges-of-building-with-aws-8mg">The Hidden Challenges of Building with AWS</a></p>
</li>
</ul>
<p>If you found this article helpful, feel free to share it. And if you prefer learning through videos, I also explain cloud topics in simple terms on my <a target="_blank" href="https://www.youtube.com/@cloudinreallife">YouTube channel</a>.</p>
<p>Stay updated with my projects by following me on <a target="_blank" href="https://twitter.com/ijaydimples">Twitter</a>, <a target="_blank" href="https://www.linkedin.com/in/ijeoma-igboagu/">LinkedIn</a> and <a target="_blank" href="https://github.com/ijayhub">GitHub</a>.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Full-Stack Serverless CRUD App using AWS and React ]]>
                </title>
                <description>
                    <![CDATA[ Imagine running a production application that automatically scales from zero to thousands of users without ever touching a server configuration. That's the power of serverless architecture, and it's easier to implement than you might think. If you're... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-full-stack-serverless-app/</link>
                <guid isPermaLink="false">68f7b6ca9d4df532a83f12f7</guid>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Chisom Uma ]]>
                </dc:creator>
                <pubDate>Tue, 21 Oct 2025 16:37:30 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761064422167/c0a6b8ed-a500-43f2-820f-42fef5d73275.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Imagine running a production application that automatically scales from zero to thousands of users without ever touching a server configuration. That's the power of serverless architecture, and it's easier to implement than you might think.</p>
<p>If you're a junior cloud engineer ready to move beyond theoretical AWS concepts and build something real, this tutorial walks you through creating a complete serverless coffee shop management system.</p>
<p>You'll learn how to architect, deploy, and secure a production-ready application using AWS's most powerful serverless services.</p>
<p>Without further ado, let's get started!</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tools-well-be-using">Tools We’ll be Using</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-we-are-building">What We are Building</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-serverless">Why Serverless?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-architectural-overview">Architectural Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-build-a-serverless-full-stack-app">Build a Serverless Full-Stack App</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-create-a-dynamodb-table">Step 1: Create a DynamoDB table</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-create-an-iam-role-for-the-lambda-function">Step 2: Create an IAM role for the Lambda function</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-create-lambda-layer-and-lambda-functions">Step 3: Create Lambda Layer And Lambda Functions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-create-an-api-gateway-to-expose-lambda-functions">Step 4: Create an API Gateway To Expose Lambda Functions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-set-up-react-application-and-upload-build-to-s3-bucket">Step 5: Set up React Application And Upload Build To S3 Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-set-up-amazon-api-gateway-authorizer">Step 6: Set up Amazon API Gateway Authorizer</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-create-cloudfront-distribution-with-behaviors-for-s3-and-api-gateway">Step 7: Create Cloudfront Distribution With Behaviors For S3 And API Gateway</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-8-set-up-react-application-and-upload-build-to-s3-bucket">Step 8: Set up React Application And Upload Build To S3 Bucket</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-troubleshooting-access-denied-error">Troubleshooting Access Denied Error</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-set-up-origin-access-control-oac">Step 1: Set up Origin Access Control (OAC)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-update-s3-bucket-policy">Step 2: Update S3 Bucket Policy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-set-default-root-object">Step 3: Set Default Root Object</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>Basic knowledge of AWS.</p>
</li>
<li><p>Basic knowledge of AWS serverless services.</p>
</li>
<li><p>Some knowledge of React (helpful, but not required).</p>
</li>
<li><p>Basic knowledge of Postman or other API testing tools.</p>
</li>
</ul>
<h2 id="heading-tools-well-be-using">Tools We’ll be Using</h2>
<ul>
<li><p><a target="_blank" href="https://react.dev/">React.js</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/lambda/">AWS Lambda</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/dynamodb/">DynamoDB</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/api-gateway/">API Gateway</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/pm/cognito/?trk=14a3c368-cab2-4b17-9bd6-1bbec9e89f29&amp;sc_channel=ps&amp;ef_id=Cj0KCQjw9czHBhCyARIsAFZlN8QA0K8iJKGNUsG4QX-JlA1a2EMYyCbYff2A9zo-itZdGqnDcYYJVW4aApzbEALw_wcB:G:s&amp;s_kwcid=AL!4422!3!651541907485!e!!g!!cognito!19835790380!146491699385&amp;gad_campaignid=19835790380&amp;gbraid=0AAAAADjHtp9wvSpEmU_k_hjYPjL8j0lSi&amp;gclid=Cj0KCQjw9czHBhCyARIsAFZlN8QA0K8iJKGNUsG4QX-JlA1a2EMYyCbYff2A9zo-itZdGqnDcYYJVW4aApzbEALw_wcB">Cognito</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/cloudfront/">CloudFront</a></p>
</li>
</ul>
<h2 id="heading-what-we-are-building">What We are Building</h2>
<p>We'll build a complete serverless coffee shop management system using AWS cloud services. Coffee shop owners will securely log in through AWS Cognito authentication and have full control over their inventory, adding new products, updating stock levels, viewing current inventory, and removing discontinued items. To follow along with this tutorial, you can clone the repo <a target="_blank" href="https://github.com/ChisomUma/aws-serverless-arch-project">here</a>.</p>
<p>This is what our user interface (UI) looks like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760784475691/8d9ba162-74dd-447d-b627-3e67b8a944ae.png" alt="image of coffee shop dashboard serverless project" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-why-serverless">Why Serverless?</h2>
<p>AWS serverless services like Lambda, Cognito, and API Gateway automatically scale to zero during quiet periods and instantly ramp up when traffic spikes. While 'serverless' might sound like there are no servers at all, this isn't actually the case. It means that AWS handles all the heavy lifting, provisioning, managing, and scaling of the infrastructure behind the scenes. You only pay for what you use.</p>
<h2 id="heading-architectural-overview">Architectural Overview</h2>
<p>Our architecture uses DynamoDB as the data store, with Lambda functions (enhanced by Lambda layers) handling all API Gateway requests. Cognito secures the API Gateway, while CloudFront CDN delivers everything globally. The React frontend connects directly to the Cognito UserPool and gets hosted on S3 with CloudFront distribution. For production deployments, you can add a custom domain using CloudFlare and AWS Certificate Manager.</p>
<h2 id="heading-build-a-serverless-full-stack-app">Build a Serverless Full-Stack App</h2>
<p>In this section, you’ll build a full-stack serverless architecture.</p>
<h3 id="heading-step-1-create-a-dynamodb-table">Step 1: Create a DynamoDB table</h3>
<p>To create a DynamoDB table, navigate to your AWS console and select the DynamoDB section. You can do this quickly by typing “DynamoDB” into the AWS search bar and clicking on DynamoDB. Next, follow the steps below to complete your table creation:</p>
<ol>
<li><p>Click <strong>Create table</strong>.</p>
</li>
<li><p>Input table name as “CoffeeShop” or anything you want to name it.</p>
</li>
<li><p>Input partition key as “coffeeId” or anything you want to name it.</p>
</li>
<li><p>Click <strong>Create table</strong>.</p>
</li>
</ol>
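<p>The console steps above are all you need, but if you prefer the AWS CLI, the same table can be created with a single command (this sketch assumes on-demand billing so you only pay per request):</p>
<pre><code class="lang-bash">aws dynamodb create-table \
  --table-name CoffeeShop \
  --attribute-definitions AttributeName=coffeeId,AttributeType=S \
  --key-schema AttributeName=coffeeId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
</code></pre>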
<p><strong>Step 1.1: Create items</strong></p>
<p>You need to create items for the table. This helps with testing connectivity to your DynamoDB table.</p>
<p>For our use case, we’ll create an item in the table called “coffee” and give it attributes such as coffeeId, name, price, and availability. To create an item:</p>
<ol>
<li><p>Click <strong>Explore items</strong> on the left navigation pane.</p>
</li>
<li><p>Click <strong>Create items</strong>.</p>
</li>
<li><p>Click the <em>CoffeeShop</em> radio button, then click <strong>Create item</strong>.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760785698166/ee1f5e2d-feef-41de-80d8-eb2c4cad4d04.png" alt="image of dynamodb page" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<ol start="4">
<li>Click <strong>Add new attribute</strong>. This allows you to add different data types such as strings and booleans. The JSON structure below shows the attributes created.</li>
</ol>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"coffeeId"</span>: <span class="hljs-string">"c123"</span>,
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"new cold coffee"</span>,
    <span class="hljs-attr">"price"</span>: <span class="hljs-number">456</span>,
    <span class="hljs-attr">"available"</span>: <span class="hljs-literal">true</span>
}
</code></pre>
<h3 id="heading-step-2-create-an-iam-role-for-the-lambda-function">Step 2: Create an IAM role for the Lambda function</h3>
<p>Next, create a Lambda function that interacts with the DynamoDB table using an IAM role attached to the function. We’ll be setting up an IAM role named "CoffeeShopRole" that serves as a shared execution role for all Lambda functions in the coffee shop application.</p>
<p>This role includes the following permissions:</p>
<ul>
<li><p><strong>CloudWatch Logs</strong>: Full logging capabilities (create, write, and manage log streams)</p>
</li>
<li><p><strong>DynamoDB Access</strong>: Complete read, write, update, and delete operations on the "CoffeeShop" table.</p>
</li>
</ul>
<p>To do this:</p>
<ol>
<li><p>Navigate to the AWS IAM console.</p>
</li>
<li><p>Navigate to <strong>Roles</strong>.</p>
</li>
<li><p>Click <strong>Create role</strong>.</p>
</li>
<li><p>Select the Lambda service.</p>
</li>
<li><p>Search for “AWSLambdaBasicExecutionRole.”</p>
</li>
<li><p>Name your role and click <strong>Create role</strong>.</p>
</li>
</ol>
<p>This is what the role looks like:</p>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"VisualEditor0"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Action"</span>: [
                <span class="hljs-string">"dynamodb:PutItem"</span>,
                <span class="hljs-string">"dynamodb:DeleteItem"</span>,
                <span class="hljs-string">"dynamodb:GetItem"</span>,
                <span class="hljs-string">"dynamodb:Scan"</span>,
                <span class="hljs-string">"dynamodb:UpdateItem"</span>
            ],
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:dynamodb::&lt;DYNAMODB_TABLE_NAME&gt;"</span>
        },
        {
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Action"</span>: [
                <span class="hljs-string">"logs:CreateLogGroup"</span>,
                <span class="hljs-string">"logs:CreateLogStream"</span>,
                <span class="hljs-string">"logs:PutLogEvents"</span>
            ],
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"*"</span>
        }
    ]
}
</code></pre>
<p>The <em>AWSLambdaBasicExecutionRole</em> managed policy covers the CloudWatch Logs permissions. Next, create an <strong>inline policy</strong> to allow communication with DynamoDB. Select the following actions for the table:</p>
<ul>
<li><p>Get</p>
</li>
<li><p>Put</p>
</li>
<li><p>Update</p>
</li>
<li><p>Scan</p>
</li>
<li><p>Delete</p>
</li>
</ul>
<p>Next, connect your table ARN to the policy by navigating to the created table and copying the ARN into the policy.</p>
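<p>For reference, when you selected the Lambda service while creating the role, IAM attached a trust policy like the one below behind the scenes. It’s this standard trust relationship that lets the Lambda service assume the role and use the permissions above:</p>
<pre><code class="lang-json">{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
</code></pre>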
<h3 id="heading-step-3-create-lambda-layer-and-lambda-functions">Step 3: Create Lambda Layer And Lambda Functions</h3>
<p>Now, we need to connect our Lambda function to the DynamoDB table. For this, we’ll need the DynamoDB JavaScript SDK. To get started, create two folders: <code>lambda</code> &gt; <code>get</code> in your IDE, preferably VS Code. Navigate into these folders in your terminal and run the <code>npm init</code> command to initialize your project. Update your <code>package.json</code> file with this:</p>
<pre><code class="lang-json">
{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"get"</span>,
  <span class="hljs-attr">"type"</span>: <span class="hljs-string">"module"</span>,
  <span class="hljs-attr">"version"</span>: <span class="hljs-string">"1.0.0"</span>,
  <span class="hljs-attr">"main"</span>: <span class="hljs-string">"index.js"</span>,
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"test"</span>: <span class="hljs-string">"echo \"Error: no test specified\" &amp;&amp; exit 1"</span>
  },
  <span class="hljs-attr">"author"</span>: <span class="hljs-string">""</span>,
  <span class="hljs-attr">"license"</span>: <span class="hljs-string">"ISC"</span>,
  <span class="hljs-attr">"description"</span>: <span class="hljs-string">""</span>
}
</code></pre>
<p><strong>Note:</strong> We’ll be using <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Glossary/ECMAScript">ECMAScript</a> modules throughout this tutorial.</p>
<p>Next, we have to create a reusable Node.js Lambda layer containing the <a target="_blank" href="https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/javascript_dynamodb_code_examples.html">DynamoDB JavaScript SDK</a> and shared utility functions. This layer acts like a common library that can be attached to multiple Lambda functions, eliminating the need to bundle the same dependencies repeatedly in each function's deployment package.</p>
<p>To use the SDK, create a new file in your directory titled <code>index.mjs</code> and paste in the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-comment">// getCoffee function</span>
<span class="hljs-keyword">import</span> { DynamoDBClient, GetItemCommand } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-dynamodb"</span>; <span class="hljs-comment">// ESM import</span>
<span class="hljs-keyword">const</span> config = {
    region: <span class="hljs-string">"us-east-1"</span>,
};
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> DynamoDBClient(config);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> getCoffee = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> coffeeId = <span class="hljs-string">"c123"</span>;
    <span class="hljs-keyword">const</span> input = {
        TableName: <span class="hljs-string">"CoffeShop"</span>,
        Key: {
            coffeeId: {
                S: coffeeId,
            },
        },
    };
    <span class="hljs-keyword">const</span> command = <span class="hljs-keyword">new</span> GetItemCommand(input);
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> client.send(command);
    <span class="hljs-built_in">console</span>.log(response);
    <span class="hljs-keyword">return</span> response;
}
</code></pre>
<p>The code above is the <code>getCoffee</code> function that connects to the DynamoDB table called <code>CoffeShop</code>, looks up the coffee with the ID <code>c123</code>, and displays its details.</p>
<p>Change <code>region</code> to your specific region.</p>
<p>Next, install the Lambda dependencies for the SDK using the command below:</p>
<pre><code class="lang-bash">
npm i @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
</code></pre>
<p>Then, create a zip file for all the current files using the command below:</p>
<pre><code class="lang-bash">zip -r get.zip ./*
</code></pre>
<p>This creates a zip file in your project directory. Now, navigate to the Lambda function page on your AWS console and upload this zip file.</p>
<p>Click <strong>Test</strong> to test your application. If you run into an error, edit the Runtime settings and change the handler name to <code>index.getCoffee</code>. Deploy and run the code again, and you should get a successful response from DynamoDB as shown below:</p>
<p>Response:</p>
<pre><code class="lang-bash">
{
  <span class="hljs-string">"<span class="hljs-variable">$metadata</span>"</span>: {
    <span class="hljs-string">"httpStatusCode"</span>: 200,
    <span class="hljs-string">"requestId"</span>: <span class="hljs-string">"R14Q5UMTP3K9P9NAF1OGG0IB57VV4KQNSO5AEMVJF66Q9ASUAAJG"</span>,
    <span class="hljs-string">"attempts"</span>: 1,
    <span class="hljs-string">"totalRetryDelay"</span>: 0
  },
  <span class="hljs-string">"Item"</span>: {
    <span class="hljs-string">"available"</span>: {
      <span class="hljs-string">"BOOL"</span>: <span class="hljs-literal">true</span>
    },
    <span class="hljs-string">"price"</span>: {
      <span class="hljs-string">"N"</span>: <span class="hljs-string">"34"</span>
    },
    <span class="hljs-string">"name"</span>: {
      <span class="hljs-string">"S"</span>: <span class="hljs-string">"My New Coffee"</span>
    },
    <span class="hljs-string">"coffeeId"</span>: {
      <span class="hljs-string">"S"</span>: <span class="hljs-string">"c123"</span>
    }
  }
}
</code></pre>
<p>Now, let’s make the necessary changes so our function is ready to sit behind API Gateway. When someone requests a coffee using the <code>/coffee</code> endpoint, we want the app to return a list of all coffees. But if the request is made to <code>/coffee/c123</code> or <code>/coffee/{id}</code>, then the app returns only the details of that specific coffee.</p>
<p>To do this, head back to your <code>index.mjs</code> file and paste in the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { DynamoDBClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-dynamodb"</span>;
<span class="hljs-keyword">import</span> { DynamoDBDocumentClient, GetCommand, ScanCommand } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/lib-dynamodb"</span>;
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> DynamoDBClient({});
<span class="hljs-keyword">const</span> docClient = DynamoDBDocumentClient.from(client);
<span class="hljs-keyword">const</span> tableName = process.env.tableName || <span class="hljs-string">"CoffeShop"</span>;
<span class="hljs-keyword">const</span> createResponse = <span class="hljs-function">(<span class="hljs-params">statusCode, body</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> responseBody = <span class="hljs-built_in">JSON</span>.stringify(body);
    <span class="hljs-keyword">return</span> {
        statusCode,
        headers: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
        body: responseBody,
    };
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> getCoffee = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> { pathParameters } = event;
    <span class="hljs-keyword">const</span> { id } = pathParameters || {};
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">let</span> command;
        <span class="hljs-keyword">if</span> (id) {
            command = <span class="hljs-keyword">new</span> GetCommand({
                TableName: tableName,
                Key: {
                    <span class="hljs-string">"coffeeId"</span>: id,
                },
            });
        }
        <span class="hljs-keyword">else</span> {
            command = <span class="hljs-keyword">new</span> ScanCommand({
                TableName: tableName,
            });
        }
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> docClient.send(command);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">200</span>, response);
    }
    <span class="hljs-keyword">catch</span> (err) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error fetching data from DynamoDB:"</span>, err);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">500</span>, { error: err.message });
    }
}
</code></pre>
<p>Run the <code>zip -r get.zip ./*</code> command again and re-upload the zip file in your Lambda function page.</p>
<p>This AWS Lambda function implements a serverless API endpoint for retrieving coffee data from a DynamoDB table, using the <a target="_blank" href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/dynamodb/command/GetItemCommand/">AWS SDK v3</a> to create a document client that can either fetch a specific coffee item by ID (when an <code>id</code> parameter is provided in the URL path) or return all items from the table (when no ID is specified).</p>
<p>The function extracts the coffee ID from the incoming event's path parameters, constructs the appropriate DynamoDB command (<code>GetCommand</code> for single items or <code>ScanCommand</code> for all items), executes the database operation, and returns a properly formatted HTTP response with JSON headers and appropriate status codes - either a 200 success response with the coffee data or a 500 error response if something goes wrong during the database operation.</p>
<p>Repeat the steps above for the <code>create</code>, <code>update</code>, and <code>delete</code> functions. You can find these functions in your cloned <a target="_blank" href="https://github.com/ChisomUma/aws-serverless-arch-project">project repo</a>.</p>
<h3 id="heading-step-4-create-an-api-gateway-to-expose-lambda-functions">Step 4: Create an API Gateway To Expose Lambda Functions</h3>
<p>To create an API that points to the Lambda function:</p>
<ol>
<li><p>Navigate to <strong>API Gateway</strong> &gt; <strong>Routes</strong> and click <strong>Create.</strong></p>
</li>
<li><p>Create the following endpoints.</p>
</li>
</ol>
<pre><code class="lang-bash">
GET /coffee  -&gt; getCoffee lambda <span class="hljs-keyword">function</span>
GET /coffee/{id}  -&gt; getCoffee lambda <span class="hljs-keyword">function</span>
POST /coffee  -&gt; createCoffee lambda <span class="hljs-keyword">function</span>
PUT /coffee/{id}  -&gt; updateCoffee lambda <span class="hljs-keyword">function</span>
DELETE /coffee/{id}  -&gt; deleteCoffee lambda <span class="hljs-keyword">function</span>
</code></pre>
<ol start="3">
<li>Navigate to <strong>Integrations</strong> and create integrations for these endpoints. To do this, go to the <strong>Manage integrations</strong> tab, click <strong>Create,</strong> and select Lambda as the integration target.</li>
</ol>
<p>Now, in your API Gateway portal, click on <code>API: CoffeeShop...(random numbers)</code> and copy the invoke URL for testing, as shown in the image below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760792772732/1d453e97-ce05-4be2-ae6d-d7eb55f86820.png" alt="image of postman interface during testing" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The <code>get</code> request with an <code>id</code> returns a <code>200 OK</code> response with the matching item from DynamoDB. You can play around with the rest of the endpoints in Postman :)</p>
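<p>If you’d rather test from a terminal than Postman, rough <code>curl</code> equivalents look like this. The invoke URL is a placeholder for your own, and the request body simply mirrors the item attributes we created earlier (check the <code>createCoffee</code> function in the repo for the exact shape it expects):</p>
<pre><code class="lang-bash"># List all coffees
curl https://&lt;api-id&gt;.execute-api.us-east-1.amazonaws.com/coffee

# Fetch a single coffee by its ID
curl https://&lt;api-id&gt;.execute-api.us-east-1.amazonaws.com/coffee/c123

# Create a new coffee
curl -X POST https://&lt;api-id&gt;.execute-api.us-east-1.amazonaws.com/coffee \
  -H "Content-Type: application/json" \
  -d '{"coffeeId": "c456", "name": "iced latte", "price": 5, "available": true}'
</code></pre>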
<p><strong>Adding Lambda Layer to Solve the Dependency Issue</strong></p>
<p>Before we continue with this tutorial, I’d like to address one problem with the previous steps so far. All functions use the same dependencies, but for each function, we had to maintain separate <code>node_modules</code> folders and <code>package.json</code> files. To fix this issue, we’ll be using a <a target="_blank" href="https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html">Lambda Layer</a>. The layer contains all the shared dependencies, while the functions contain only your code.</p>
<p>To get started:</p>
<ol>
<li><p>Create a new folder in your IDE called <code>LambdaWithLayer</code>.</p>
</li>
<li><p>Create two additional folders under the <code>LambdaWithLayer</code> named <code>LambdaFunctionsWithLayer</code> and <code>nodejs</code>.</p>
</li>
</ol>
<p><strong>Note:</strong> You <em>must</em> use the name <code>nodejs</code> for this to work.</p>
<ol start="3">
<li><p>Navigate to the <code>nodejs</code> folder and initialize it using the <code>npm init</code> command.</p>
</li>
<li><p>Install dependencies using the command below:</p>
</li>
</ol>
<pre><code class="lang-bash">npm i @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
</code></pre>
<ol start="5">
<li>Create a new file called <code>utils.mjs</code> under the <code>nodejs</code> folder (the functions will import it from the layer as <code>/opt/nodejs/utils.mjs</code>) and paste in the code below:</li>
</ol>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { DynamoDBClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-dynamodb"</span>;
<span class="hljs-keyword">import</span> {
    DynamoDBDocumentClient,
    ScanCommand,
    GetCommand,
    PutCommand,
    UpdateCommand,
    DeleteCommand
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/lib-dynamodb"</span>;
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> DynamoDBClient({});
<span class="hljs-keyword">const</span> docClient = DynamoDBDocumentClient.from(client);
<span class="hljs-keyword">const</span> createResponse = <span class="hljs-function">(<span class="hljs-params">statusCode, body</span>) =&gt;</span> {
    <span class="hljs-keyword">return</span> {
        statusCode,
        headers: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
        body: <span class="hljs-built_in">JSON</span>.stringify(body),
    };
};
<span class="hljs-keyword">export</span> {
    docClient,
    createResponse,
    ScanCommand,
    GetCommand,
    PutCommand,
    UpdateCommand,
    DeleteCommand
};
</code></pre>
<p>Here, we imported all the commands for our API operations. Now, we can create Lambda Functions without installing the SDK dependencies for each one. For example, you can create a <code>get</code> folder under the <code>LambdaFunctionsWithLayer</code> folder for the <code>get</code> function, then create an <code>index.mjs</code> file under the <code>get</code> folder. Next, paste the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { docClient, GetCommand, ScanCommand, createResponse } <span class="hljs-keyword">from</span> <span class="hljs-string">'/opt/nodejs/utils.mjs'</span>; <span class="hljs-comment">// Import from Layer</span>
<span class="hljs-keyword">const</span> tableName = process.env.tableName || <span class="hljs-string">"CoffeShop"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> getCoffee = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> { pathParameters } = event;
    <span class="hljs-keyword">const</span> { id } = pathParameters || {};
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">let</span> command;
        <span class="hljs-keyword">if</span> (id) {
            command = <span class="hljs-keyword">new</span> GetCommand({
                TableName: tableName,
                Key: {
                    <span class="hljs-string">"coffeeId"</span>: id,
                },
            });
        }
        <span class="hljs-keyword">else</span> {
            command = <span class="hljs-keyword">new</span> ScanCommand({
                TableName: tableName,
            });
        }
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> docClient.send(command);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">200</span>, response);
    }
    <span class="hljs-keyword">catch</span> (err) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error fetching data from DynamoDB:"</span>, err);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">500</span>, { error: err.message });
    }
}
</code></pre>
<p>Now we can see that, in the code, we no longer require dependencies for the <code>get</code> function. We just imported from the layer.</p>
<p>Repeat the above steps for other functions.</p>
<p><strong>Note:</strong> You can find the code for other functions in <a target="_blank" href="https://github.com/ChisomUma/aws-serverless-arch-project">the cloned repo</a>.</p>
<ol start="6">
<li>Create a zip file for each function. You can do this by creating a file called <code>create_zip.sh</code> under the <code>LambdaWithLayer</code> folder (the parent that contains both <code>LambdaFunctionsWithLayer</code> and <code>nodejs</code>, so the paths in the script resolve correctly). Then paste the script below:</li>
</ol>
<pre><code class="lang-bash">
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for layer"</span>
zip -r layer.zip nodejs
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for GET Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/get
zip -r get.zip index.mjs
mv get.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for POST Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/post
zip -r post.zip index.mjs
mv post.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for UPDATE Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/update
zip -r update.zip index.mjs
mv update.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for DELETE Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/delete
zip -r delete.zip index.mjs
mv delete.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Success!"</span>
</code></pre>
<p>Run the script using the <code>sh create_zip.sh</code> command. This creates zip files (including a <code>layer.zip</code> file) that you can upload to your AWS Lambda function Layer page.</p>
<ol start="7">
<li><p>In your AWS Lambda function page, navigate to <strong>Layers</strong> and upload the <code>layer.zip</code> file.</p>
</li>
<li><p>Update the functions by uploading the newly created zip files for each code.</p>
</li>
<li><p>Add the layer to the function by clicking <strong>Layers</strong> in the function view:</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760793962650/8e797256-af9e-445d-8025-2fbd29dfe87f.png" alt="image of get coffee lambda layer" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Next, click <strong>Add a layer,</strong> then select <strong>Custom layers.</strong> Then choose <strong>“DynamoDBLayer”</strong> and version <strong>“1”.</strong></p>
<ol start="10">
<li><p>Click <strong>Add</strong>.</p>
</li>
<li><p>Repeat for all the other functions.</p>
</li>
</ol>
<h3 id="heading-step-5-set-up-react-application-and-upload-build-to-s3-bucket">Step 5: Set up React Application And Upload Build To S3 Bucket</h3>
<p>To set up our React application, navigate to the <code>frontend</code> folder of the cloned repository on your local machine and run <code>npm install</code> to install the dependencies. Then run <code>npm run dev</code> to start your development environment on your local machine. You should see the preview in your browser at: <a target="_blank" href="http://localhost:5173/"><code>http://localhost:5173/</code></a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794168737/0cace684-7f8b-47db-944a-7a642d991ca0.png" alt="image of coffe list ui" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>If you inspect the page using <a target="_blank" href="https://developer.chrome.com/docs/devtools">Chrome DevTools</a>, you’ll see that we ran into some <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS">CORS</a> error:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794416609/0cd6196e-b7cc-4f61-af5f-77995d6139ec.png" alt="image of chrome dev tool console" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, let’s fix this problem. To do that:</p>
<ol>
<li><p>Navigate to your API Gateway page.</p>
</li>
<li><p>Click on <strong>CORS</strong> on the left navigation panel.</p>
</li>
<li><p>Click <strong>Configure</strong>.</p>
</li>
<li><p>Copy your <a target="_blank" href="http://localhost">localhost</a> URL and paste it into the <strong>Access-Control-Allow-Origin</strong> field.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794511537/26a1917b-16ae-48bc-a786-b36c6bb31490.png" alt="image of cors configuration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Be sure to remove the <code>/</code> at the end of your URL, as shown in the image above.</p>
<ol start="5">
<li><p>Click <strong>Add</strong>.</p>
</li>
<li><p>In the <strong>Access-Control-Allow-Headers</strong> field, enter <code>content-type</code> and click <strong>Add</strong>.</p>
</li>
<li><p>Include <code>GET</code>, <code>POST</code>, <code>OPTIONS</code>, <code>PUT</code>, and <code>DELETE</code> in <strong>Access-Control-Allow-Methods.</strong></p>
</li>
<li><p>Click <strong>Save</strong>.</p>
</li>
</ol>
<p>Now it returns our coffee, and the CORS error has been resolved.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794692165/7d53573d-f2e0-456d-a1da-0d06265d78a9.png" alt="image of solved cors error" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When you add a new coffee, you should see the newly created items in your DynamoDB database.</p>
<h3 id="heading-step-6-set-up-amazon-api-gateway-authorizer">Step 6: Set up Amazon API Gateway Authorizer</h3>
<p>AWS Cognito helps you secure your Amazon API Gateway. API Gateway validates the access token with Amazon Cognito to ensure it is valid and has not expired, and grants or denies access based on the result.</p>
<p>To get started:</p>
<ol>
<li><p>Navigate to <strong>Amazon Cognito &gt; User pools</strong>.</p>
</li>
<li><p>Click <strong>Create user pool</strong>.</p>
</li>
<li><p>Select <strong>Single-page application (SPA)</strong>.</p>
</li>
<li><p>Select email as the preferred sign-in and sign-up method.</p>
</li>
<li><p>Use <code>http://localhost:5174/</code> or your own local URL as the return URL.</p>
</li>
<li><p>Click <strong>Create user directory</strong>.</p>
</li>
</ol>
<p>You’ll be presented with a page containing code that we can copy and paste into our app for integration. But before we do that, let's head back to API Gateway and integrate it with Cognito. To do that:</p>
<ol>
<li><p>Go to the Authorization section in API Gateway.</p>
</li>
<li><p>Navigate to <strong>Manage authorizers</strong>.</p>
</li>
<li><p>Click <strong>Create</strong>.</p>
</li>
<li><p>Select JWT and name it “Cognito-CoffeeShop”.</p>
</li>
<li><p>Copy your issuer URL from the Cognito Overview page. Your issuer URL is the <em>Token signing key URL</em> without the <code>/.well-known/jwks.json</code> suffix. If you open the key URL in your browser, you’ll see the keys that’ll be used for verification.</p>
</li>
<li><p>For the Audience, navigate to the Cognito user pool, then to App clients, and select CoffeShopClient. Copy the Client ID.</p>
</li>
<li><p>Click <strong>Create</strong>.</p>
</li>
<li><p>Go to Routes and add authorizations to each endpoint.</p>
</li>
</ol>
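<p>Once the routes are protected, unauthenticated requests are rejected with a <code>401</code>, so any direct testing now needs a token from your user pool in the <code>Authorization</code> header. A rough sketch, where the URL and token are placeholders:</p>
<pre><code class="lang-bash">curl https://&lt;api-id&gt;.execute-api.us-east-1.amazonaws.com/coffee \
  -H "Authorization: Bearer &lt;token-from-your-cognito-user-pool&gt;"
</code></pre>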
<p>Now, to integrate with our front-end app:</p>
<p>Navigate into the frontend folder and run the command below:</p>
<pre><code class="lang-bash">npm install oidc-client-ts react-oidc-context --save
</code></pre>
<ol start="2">
<li><p>Go to the <strong>App clients</strong> section in Cognito user pools to find the readily available code snippets for integration.</p>
</li>
<li><p>Edit your <code>main.jsx</code> file to include the code below:</p>
</li>
</ol>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { createRoot } <span class="hljs-keyword">from</span> <span class="hljs-string">'react-dom/client'</span>
<span class="hljs-keyword">import</span> { BrowserRouter <span class="hljs-keyword">as</span> Router, Route, Routes } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-router-dom"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'./index.css'</span>
<span class="hljs-keyword">import</span> App <span class="hljs-keyword">from</span> <span class="hljs-string">'./App.jsx'</span>
<span class="hljs-keyword">import</span> ItemDetails <span class="hljs-keyword">from</span> <span class="hljs-string">"./ItemDetails"</span>;
<span class="hljs-keyword">import</span> { AuthProvider } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-oidc-context"</span>;
<span class="hljs-keyword">const</span> cognitoAuthConfig = {
  authority: <span class="hljs-string">"https://cognito-idp.us-east-1.amazonaws.com/us-east-1_rXq7q3KLm"</span>,
  client_id: <span class="hljs-string">"6fjfrlaup7oph5lhf1q8q6pnp4"</span>,
  redirect_uri: <span class="hljs-string">"http://localhost:5174"</span>,
  response_type: <span class="hljs-string">"code"</span>,
  scope: <span class="hljs-string">"email openid phone"</span>,
};
createRoot(<span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'root'</span>)).render(
  &lt;AuthProvider {...cognitoAuthConfig}&gt;
    &lt;Router&gt;
      &lt;div&gt;
        &lt;Routes&gt;
          &lt;Route path=<span class="hljs-string">"/"</span> element={&lt;App /&gt;} /&gt;
          &lt;Route path=<span class="hljs-string">"/details/:id"</span> element={&lt;ItemDetails /&gt;} /&gt;
        &lt;/Routes&gt;
      &lt;/div&gt;
    &lt;/Router&gt;
  &lt;/AuthProvider&gt;
)
</code></pre>
<p>Here, we imported <code>AuthProvider</code> from <code>react-oidc-context</code>, then wrapped our app with <code>AuthProvider</code>. Next, move the code in the <code>App.jsx</code> file to a newly created <code>Home.jsx</code> file, and update the <code>App.jsx</code> file with the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { useEffect, useState } <span class="hljs-keyword">from</span> <span class="hljs-string">"react"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">"./App.css"</span>;
<span class="hljs-comment">// App.js</span>
<span class="hljs-keyword">import</span> { useAuth } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-oidc-context"</span>;
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> auth = useAuth();
  <span class="hljs-keyword">const</span> signOutRedirect = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> clientId = <span class="hljs-string">"6fjfrlaup7oph5lhf1q8q6pnp4"</span>;
    <span class="hljs-keyword">const</span> logoutUri = <span class="hljs-string">"http://localhost:5174/"</span>;
    <span class="hljs-keyword">const</span> cognitoDomain = <span class="hljs-string">"https://us-east-1rxq7q3klm.auth.us-east-1.amazoncognito.com"</span>;
    <span class="hljs-built_in">window</span>.location.href = <span class="hljs-string">`<span class="hljs-subst">${cognitoDomain}</span>/logout?client_id=<span class="hljs-subst">${clientId}</span>&amp;logout_uri=<span class="hljs-subst">${<span class="hljs-built_in">encodeURIComponent</span>(logoutUri)}</span>`</span>;
  };
  <span class="hljs-keyword">if</span> (auth.isLoading) {
    <span class="hljs-keyword">return</span> &lt;div&gt;Loading...&lt;/div&gt;;
  }
  <span class="hljs-keyword">if</span> (auth.error) {
    <span class="hljs-keyword">return</span> &lt;div&gt;Encountering error... {auth.error.message}&lt;/div&gt;;
  }
  <span class="hljs-keyword">if</span> (auth.isAuthenticated) {
    <span class="hljs-keyword">return</span> (
      &lt;div&gt;
        &lt;button onClick={<span class="hljs-function">() =&gt;</span> auth.removeUser()}&gt;Sign out&lt;/button&gt;
        &lt;Home /&gt;
      &lt;/div&gt;
    );
  }
  <span class="hljs-keyword">return</span> (
    &lt;div&gt;
      &lt;button onClick={<span class="hljs-function">() =&gt;</span> auth.signinRedirect()}&gt;Sign <span class="hljs-keyword">in</span>&lt;/button&gt;
      &lt;button onClick={<span class="hljs-function">() =&gt;</span> signOutRedirect()}&gt;Sign out&lt;/button&gt;
    &lt;/div&gt;
  );
}
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> App;
</code></pre>
<p>Now, when you run the application again, you should see this login page on your browser:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760795002733/2be7ce35-ecff-41bb-adff-ed13c7a33a32.png" alt="Sign in and Sign out buttons" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When you click Sign in, you’ll be directed to the hosted sign-in page. Click Sign up, and you should see the page below, where you can create your account.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760795104086/41c44c85-881d-482c-ae1f-d84f1ea76fb5.png" alt="Sign in page with a form" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>During sign-up, a verification code is sent to your sign-up email. Once you’re logged in, you can then access your coffee dashboard.</p>
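<p>Since the API routes now require a token, the frontend also has to send the user’s Cognito token with each request. A minimal sketch using <code>react-oidc-context</code> could look like the hook below; the hook name and base URL are placeholders for illustration, not part of the repo:</p>
<pre><code class="lang-typescript">// Hypothetical helper hook: attach the Cognito ID token to requests to the coffee API
import { useAuth } from "react-oidc-context";

const baseUrl = "https://&lt;api-id&gt;.execute-api.us-east-1.amazonaws.com";

export function useCoffeeApi() {
  const auth = useAuth(); // same auth context that already wraps the app

  const listCoffees = async () =&gt; {
    const response = await fetch(`${baseUrl}/coffee`, {
      headers: { Authorization: `Bearer ${auth.user?.id_token}` },
    });
    if (!response.ok) throw new Error(`Request failed: ${response.status}`);
    return response.json();
  };

  return { listCoffees };
}
</code></pre>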
<h3 id="heading-step-7-create-cloudfront-distribution-with-behaviors-for-s3-and-api-gateway"><strong>Step 7: Create Cloudfront Distribution With Behaviors For S3 And API Gateway</strong></h3>
<p>To create a distribution:</p>
<ol>
<li><p>Navigate to <strong>CloudFront</strong>.</p>
</li>
<li><p>Click <strong>Create distribution</strong>.</p>
</li>
<li><p>In the Origin page, select the S3 bucket and browse through your created S3 buckets.</p>
</li>
<li><p>Select your coffee shop bucket.</p>
</li>
<li><p>Set origin path to <code>/dist</code>.</p>
</li>
<li><p>Select <em>Origin access control</em> under <strong>Origin access</strong>.</p>
</li>
<li><p>Update your React code (the redirect URLs) and your Cognito app client’s allowed callback and sign-out URLs with the distribution domain name provided by CloudFront.</p>
</li>
</ol>
<h3 id="heading-step-8-set-up-react-application-and-upload-build-to-s3-bucket"><strong>Step 8: Set up React Application And Upload Build To S3 Bucket</strong></h3>
<p>In this step, we’ll be building our React application and uploading the static files to an Amazon S3 bucket, which is then served from a CloudFront distribution.</p>
<p>To get started:</p>
<ol>
<li><p>Create an S3 bucket and give it a name such as “mycoffeeshop123new”. Bucket names must be lowercase and globally unique across all AWS accounts.</p>
</li>
<li><p>In the frontend folder, run the <code>npm run build</code> command. This creates a <code>dist</code> folder in your directory.</p>
</li>
<li><p>Head back to the S3 bucket and drag-and-drop the <code>dist</code> folder into S3 to upload it.</p>
</li>
<li><p>Click <strong>Upload</strong>.</p>
</li>
</ol>
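<p>Alternatively, instead of drag-and-drop, you can push the build from the terminal. A quick sketch with the AWS CLI, where the bucket name is whatever you chose above:</p>
<pre><code class="lang-bash"># Sync the production build into a dist/ prefix in the bucket,
# matching the /dist origin path configured in CloudFront
aws s3 sync ./dist s3://&lt;your-bucket-name&gt;/dist
</code></pre>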
<p>Now, copy your CloudFront distribution URL and try to access your site in a private browser, for example, Chrome incognito. You should see your site live in the browser.</p>
<h2 id="heading-troubleshooting-access-denied-error"><strong>Troubleshooting Access Denied Error</strong></h2>
<p>You may encounter an access denied error in the browser:</p>
<pre><code class="lang-xml">
<span class="hljs-tag">&lt;<span class="hljs-name">Error</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">Code</span>&gt;</span>AccessDenied<span class="hljs-tag">&lt;/<span class="hljs-name">Code</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">Message</span>&gt;</span>Access Denied<span class="hljs-tag">&lt;/<span class="hljs-name">Message</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">Error</span>&gt;</span>
</code></pre>
<p>This is most likely caused by an S3 + CloudFront configuration issue. Here are the steps to resolve it:</p>
<h3 id="heading-step-1-set-up-origin-access-control-oac">Step 1: Set up Origin Access Control (OAC)</h3>
<ol>
<li><p>Go to <strong>CloudFront &gt; Your Distribution &gt; Origins tab.</strong></p>
</li>
<li><p>Select your S3 origin and click <strong>Edit.</strong></p>
</li>
<li><p>Under <strong>Origin access</strong>, select <strong>Origin access control settings (recommended)</strong></p>
</li>
<li><p>Click <strong>Create new OAC</strong> (or select an existing one).</p>
</li>
<li><p>Click <strong>Save changes.</strong></p>
</li>
</ol>
<h3 id="heading-step-2-update-s3-bucket-policy">Step 2: Update S3 Bucket Policy</h3>
<p>After saving, CloudFront will show you a <strong>"Copy Policy"</strong> button. Click it, then:</p>
<ol>
<li><p>Go to your S3 bucket &gt; <strong>Permissions</strong> tab.</p>
</li>
<li><p>Scroll to <strong>Bucket policy</strong> and click <strong>Edit.</strong></p>
</li>
<li><p>Paste the copied policy (it should look like this):</p>
</li>
</ol>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"AllowCloudFrontServicePrincipal"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Principal"</span>: {
                <span class="hljs-attr">"Service"</span>: <span class="hljs-string">"cloudfront.amazonaws.com"</span>
            },
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:GetObject"</span>,
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:s3:::YOUR-BUCKET-NAME/*"</span>,
            <span class="hljs-attr">"Condition"</span>: {
                <span class="hljs-attr">"StringEquals"</span>: {
                    <span class="hljs-attr">"AWS:SourceArn"</span>: <span class="hljs-string">"arn:aws:cloudfront::YOUR-ACCOUNT-ID:distribution/YOUR-DISTRIBUTION-ID"</span>
                }
            }
        }
    ]
}
</code></pre>
<ol start="4">
<li>Click <strong>Save changes.</strong></li>
</ol>
<h3 id="heading-step-3-set-default-root-object"><strong>Step 3: Set Default Root Object</strong></h3>
<ol>
<li><p>Go back to <strong>CloudFront &gt; Your Distribution &gt; General</strong> tab.</p>
</li>
<li><p>Click <strong>Edit.</strong></p>
</li>
<li><p>Set <strong>Default root object</strong> to <code>index.html</code>.</p>
</li>
<li><p>Save changes.</p>
</li>
</ol>
<p>Now, try accessing the site again. It should work.</p>
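<p>You can also do a quick check from the terminal. The domain below is a placeholder for your own distribution's domain name; a 200 response means CloudFront is serving <code>index.html</code>:</p>
<pre><code class="lang-bash"># Fetch only the response headers from the distribution
curl -I https://d1234example.cloudfront.net/
</code></pre>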
<p>This brings us to the end of this tutorial. I hope you were able to learn a thing or two about building serverless systems :)</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Congratulations! You've just built a production-ready serverless application from the ground up. You've successfully architected a complete CRUD system that automatically scales, stays secure with Cognito authentication, and costs you only what you actually use.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Docker Build Tutorial: Learn Contexts, Architecture, and Performance Optimization Techniques ]]>
                </title>
                <description>
                    <![CDATA[ Docker build is a fundamental concept every developer needs to understand. Whether you're containerizing your first application or optimizing existing Docker workflows, understanding Docker build contexts and Docker build architecture is essential fo... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/docker-build-tutorial-learn-contexts-architecture-and-performance-optimization-techniques/</link>
                <guid isPermaLink="false">68e559d8ac28fbe4acae92be</guid>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Destiny Erhabor ]]>
                </dc:creator>
                <pubDate>Tue, 07 Oct 2025 18:20:08 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759861193876/871b72e7-9673-4572-b788-48f082a6b380.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Docker build is a fundamental concept every developer needs to understand. Whether you're containerizing your first application or optimizing existing Docker workflows, understanding Docker build contexts and Docker build architecture is essential for creating efficient, scalable containerized applications.</p>
<p>This comprehensive guide covers everything from basic concepts to advanced optimization techniques, helping you avoid common pitfalls and build better Docker images.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-docker-build">What is Docker Build?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-docker-build-architecture-how-it-all-works">Docker Build Architecture: How It All Works</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-docker-build-features">Docker Build Features</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-docker-build-context">Docker Build Context</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-types-of-docker-build-contexts">Types of Docker Build Contexts</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-common-docker-build-mistakes-and-how-to-fix-them">Common Docker Build Mistakes (And How to Fix Them)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-optimize-and-monitor-build-performance">How to Optimize and Monitor Build Performance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices-for-docker-build-performance">Best Practices for Docker Build Performance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-troubleshooting-docker-build-issues">Troubleshooting Docker Build Issues</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-what-is-docker-build">What is Docker Build?</h2>
<p>Docker build is the process of creating a Docker image from a Dockerfile and a set of files called the <strong>build context</strong>. When you run <code>docker build</code>, you're instructing Docker to:</p>
<ol>
<li><p>Read your Dockerfile instructions</p>
</li>
<li><p>Gather the necessary files (build context)</p>
</li>
<li><p>Execute each instruction step-by-step</p>
</li>
<li><p>Create a final Docker image</p>
</li>
</ol>
<p>Think of it like following a recipe: the Dockerfile is your recipe, and the build context contains all the ingredients you might need.</p>
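<p>Here's a minimal, self-contained sketch of that recipe/ingredients split (the file names are just examples):</p>
<pre><code class="lang-bash"># The Dockerfile is the recipe...
cat &gt; Dockerfile &lt;&lt;'EOF'
FROM alpine:3.19
COPY hello.txt /hello.txt
CMD ["cat", "/hello.txt"]
EOF

# ...and the current directory (.) is the build context, the ingredients
echo "Hello from the build context" &gt; hello.txt
docker build -t hello-context .
docker run --rm hello-context
</code></pre>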
<h2 id="heading-docker-build-architecture-how-it-all-works">Docker Build Architecture: How It All Works</h2>
<p>Docker Build uses a client-server architecture where two separate components (<strong>Buildx and BuildKit</strong>) work together to build your Docker images. This is different from how many people think Docker works, as it's not just one monolithic program doing everything.</p>
<h3 id="heading-what-is-buildx-the-client">What is Buildx (The Client)?</h3>
<p>Buildx serves as the user interface that you interact with directly whenever you work with Docker builds. When you type <code>docker build .</code> in your terminal, you're actually communicating with Buildx, which acts as the intermediary between you and the actual build engine.</p>
<h4 id="heading-buildxs-primary-jobs">Buildx’s primary jobs:</h4>
<ul>
<li><p>Interprets your build command and options</p>
</li>
<li><p>Sends structured build requests to BuildKit</p>
</li>
<li><p>Manages multiple BuildKit instances (builders)</p>
</li>
<li><p>Handles authentication and secrets</p>
</li>
<li><p>Displays build progress to you</p>
</li>
</ul>
<h3 id="heading-what-is-buildkit-the-serverbuilder">What is BuildKit (The Server/Builder)</h3>
<p>BuildKit functions as the actual build engine that performs all the heavy lifting during the Docker build process. This powerful backend component receives the structured build requests from Buildx and immediately begins reading and interpreting your Dockerfiles line by line.</p>
<h4 id="heading-buildkits-primary-jobs">BuildKit’s primary jobs:</h4>
<ul>
<li><p>Receives build requests from Buildx</p>
</li>
<li><p>Reads and interprets Dockerfiles</p>
</li>
<li><p>Executes build instructions step by step</p>
</li>
<li><p>Manages build cache and layers</p>
</li>
<li><p>Requests only the files it needs from the client</p>
</li>
<li><p>Creates the final Docker image</p>
</li>
</ul>
<h3 id="heading-how-they-communicate">How They Communicate</h3>
<p>Here's what happens when you run <code>docker build .</code>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758733757378/d3322dad-efac-4c4a-b8f8-69f17a4920e8.png" alt="Diagram showing Docker build process with BuildKit, including sending build request with Dockerfile and build arguments, requesting and receiving package.json, running npm install, requesting and receiving src directory files, copying files, completing build, and optionally pushing to registry." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When you run <code>docker build</code>, the command initiates a multi-step process with BuildKit (as illustrated in the above image).</p>
<p>First, it sends a build request containing your Dockerfile, build arguments, export options, and cache options. BuildKit then intelligently requests only the files it needs when it needs them, starting with <code>package.json</code> to run <code>npm install</code> for dependency installation.</p>
<p>After that's complete, it requests the <code>src/</code> directory containing your application code and copies those files into the image with the <code>COPY</code> command.</p>
<p>Once all build steps are finished, BuildKit sends back the completed image. Optionally, you can then push this image to a container registry for distribution or deployment.</p>
<p>This on-demand file transfer approach is one of BuildKit's key optimizations: rather than sending your entire build context upfront, it only requests specific files as each build step needs them, making the build process more efficient.</p>
<h3 id="heading-key-communication-details">Key Communication Details</h3>
<p>Build request contains:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"dockerfile"</span>: <span class="hljs-string">"FROM node:18\nWORKDIR /app\n..."</span>,
  <span class="hljs-attr">"buildArgs"</span>: {<span class="hljs-attr">"NODE_ENV"</span>: <span class="hljs-string">"production"</span>},
  <span class="hljs-attr">"exportOptions"</span>: {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"image"</span>, <span class="hljs-attr">"name"</span>: <span class="hljs-string">"my-app:latest"</span>},
  <span class="hljs-attr">"cacheOptions"</span>: {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"registry"</span>, <span class="hljs-attr">"ref"</span>: <span class="hljs-string">"my-app:cache"</span>}
}
</code></pre>
<p>Resource requests:</p>
<ul>
<li><p>BuildKit asks: "I need the file at <code>./package.json</code>"</p>
</li>
<li><p>Buildx responds: Sends the actual file content</p>
</li>
<li><p>BuildKit asks: "I need the directory <code>./src/</code>"</p>
</li>
<li><p>Buildx responds: Sends all files in that directory</p>
</li>
</ul>
<h3 id="heading-why-this-architecture-exists">Why This Architecture Exists</h3>
<h4 id="heading-1-efficiency">1. Efficiency</h4>
<p>The old Docker builder had a major flaw: it always copied your entire build context upfront, regardless of what was actually needed. Even if your Dockerfile only used a few files, Docker would transfer hundreds of megabytes before starting the build.</p>
<p>BuildKit fixes this through on-demand file transfers. It only requests specific files at each step.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Old Docker Builder (legacy)</span>
<span class="hljs-comment"># Always copied ENTIRE context upfront</span>
$ docker build .
Sending build context to Docker daemon  245.7MB  <span class="hljs-comment"># Everything!</span>

<span class="hljs-comment"># New BuildKit Architecture  </span>
<span class="hljs-comment"># Only requests files when needed</span>
$ docker build .
<span class="hljs-comment">#1 [internal] load build definition from Dockerfile    0.1s</span>
<span class="hljs-comment">#2 [internal] load .dockerignore                       0.1s</span>
<span class="hljs-comment">#3 [1/4] FROM node:18                                  0.5s</span>
<span class="hljs-comment">#4 [internal] load build context                       0.1s</span>
<span class="hljs-comment">#4 transferring context: 234B  # Only package.json initially!</span>
<span class="hljs-comment">#5 [2/4] WORKDIR /app                                  0.2s  </span>
<span class="hljs-comment">#6 [3/4] COPY package*.json ./                         0.1s</span>
<span class="hljs-comment">#7 [4/4] RUN npm install                               5.2s</span>
<span class="hljs-comment">#8 [internal] load build context                       0.3s  </span>
<span class="hljs-comment">#8 transferring context: 2.1MB  # Now requests src/ files</span>
<span class="hljs-comment">#9 [5/4] COPY src/ ./src/                              0.2s</span>
</code></pre>
<h4 id="heading-2-scalability">2. Scalability</h4>
<p>The client-server architecture enables scalability features. Multiple Docker CLI clients can connect to the same BuildKit instance, and BuildKit can run on remote servers instead of your local machine. This means you could execute builds on a cloud server while controlling them from your laptop. Teams can also deploy multiple BuildKit instances for different teams or purposes, scaling from individual developers to large enterprises.</p>
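<p>As a small sketch of that flexibility, Buildx lets you create and target additional builder (BuildKit) instances; the builder name below is just an example:</p>
<pre><code class="lang-bash"># List the builders Buildx currently knows about
docker buildx ls

# Create an additional BuildKit instance and build against it explicitly
docker buildx create --name team-builder
docker buildx build --builder team-builder -t my-app:dev .
</code></pre>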
<h4 id="heading-3-security">3. Security</h4>
<p>Security is improved by only requesting sensitive files when explicitly needed. BuildKit never sees files your Dockerfile doesn't reference, reducing the attack surface. It also handles credentials through separate, secure channels rather than mixing them with your build context, preventing secrets from being embedded in image layers or exposed in build logs.</p>
<h3 id="heading-real-world-example">Real-World Example</h3>
<p>Let's trace through a typical build step by step. You can find the full code available here: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/02-python-cache">02-python-cache</a>.</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.9</span>-slim
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> requirements.txt .</span>
<span class="hljs-keyword">RUN</span><span class="bash"> pip install -r requirements.txt</span>
<span class="hljs-keyword">COPY</span><span class="bash"> src/ ./src/</span>
<span class="hljs-keyword">COPY</span><span class="bash"> main.py .</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"python"</span>, <span class="hljs-string">"main.py"</span>]</span>
</code></pre>
<p>Let’s see what actually happens here:</p>
<ol>
<li><p>You run <code>docker build .</code></p>
</li>
<li><p>Buildx says to BuildKit:</p>
</li>
</ol>
<pre><code class="lang-bash">   <span class="hljs-string">"Here's a build request with this Dockerfile"</span>
</code></pre>
<ol start="3">
<li><p><strong>BuildKit processes</strong>: <code>FROM python:3.9-slim</code></p>
<ul>
<li>No client files needed, pulls base image</li>
</ul>
</li>
<li><p><strong>BuildKit processes</strong>: <code>COPY requirements.txt .</code></p>
<ul>
<li><p>BuildKit to Buildx: "I need <code>requirements.txt</code>"</p>
</li>
<li><p>Buildx to BuildKit: Sends the file content</p>
</li>
</ul>
</li>
<li><p><strong>BuildKit processes</strong>: <code>RUN pip install -r requirements.txt</code></p>
<ul>
<li>No client files needed, runs inside container</li>
</ul>
</li>
<li><p><strong>BuildKit processes</strong>: <code>COPY src/ ./src/</code></p>
<ul>
<li><p>BuildKit to Buildx: "I need all files in <code>src/</code> directory"</p>
</li>
<li><p>Buildx to BuildKit: Sends all files in src/</p>
</li>
</ul>
</li>
<li><p><strong>BuildKit processes</strong>: <code>COPY main.py .</code></p>
<ul>
<li><p>BuildKit to Buildx: "I need <code>main.py</code>"</p>
</li>
<li><p>Buildx to BuildKit: Sends the file</p>
</li>
</ul>
</li>
<li><p>BuildKit to Buildx: "Build complete, here's your image"</p>
</li>
</ol>
<p>From the illustration, you can see that BuildKit only requests what it needs, when it needs it, rather than this entire context:</p>
<pre><code class="lang-bash">
my-app/
├── src/                 <span class="hljs-comment"># ← Only loaded when COPY src/ runs</span>
├── tests/              <span class="hljs-comment"># ← Never requested (not in Dockerfile)</span>
├── docs/               <span class="hljs-comment"># ← Never requested  </span>
├── node_modules/       <span class="hljs-comment"># ← Never requested (in .dockerignore)</span>
├── requirements.txt    <span class="hljs-comment"># ← Loaded early (first COPY)</span>
└── main.py            <span class="hljs-comment"># ← Loaded later (second COPY)</span>
</code></pre>
<h2 id="heading-docker-build-features">Docker Build Features</h2>
<h3 id="heading-named-contexts">Named Contexts</h3>
<p>👉 Demo project: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/07-named-contexts">07-named-contexts</a></p>
<p>Named contexts allow you to include files from multiple sources during a build while keeping them logically separated. This is useful when you need documentation, configuration files, or shared libraries from different directories or repositories in your build.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Build with additional named context</span>
docker build --build-context docs=./documentation .
</code></pre>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Use named context in Dockerfile</span>
<span class="hljs-keyword">FROM</span> alpine
<span class="hljs-keyword">COPY</span><span class="bash"> . /app</span>
<span class="hljs-comment"># Mount files from named context</span>
<span class="hljs-keyword">RUN</span><span class="bash"> --mount=from=docs,target=/docs \
    cp /docs/manual.pdf /app/</span>
</code></pre>
<h3 id="heading-build-secrets">Build Secrets</h3>
<p>👉 Demo project: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/06-build-secrets">06-build-secrets</a></p>
<p>Build secrets let you pass sensitive information (like API keys or passwords) to your build without including them in the final image or build history. The secrets are mounted temporarily during specific <code>RUN</code> commands and are never stored in image layers.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Pass secret to build</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"api_key=secret123"</span> | docker build --secret id=apikey,src=- .
</code></pre>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Use secret in Dockerfile</span>
<span class="hljs-keyword">FROM</span> alpine
<span class="hljs-keyword">RUN</span><span class="bash"> --mount=<span class="hljs-built_in">type</span>=secret,id=apikey \
    <span class="hljs-built_in">export</span> API_KEY=$(cat /run/secrets/apikey) &amp;&amp; \
    curl -H <span class="hljs-string">"Authorization: <span class="hljs-variable">$API_KEY</span>"</span> https://api.example.com/data</span>
</code></pre>
<h2 id="heading-docker-build-context">Docker Build Context</h2>
<h3 id="heading-what-is-a-build-context">What is a Build Context?</h3>
<p>The build context is the collection of files and directories that Docker can access during the build process. It's like gathering all your cooking ingredients on the counter before you start cooking.</p>
<pre><code class="lang-bash">docker build [OPTIONS] CONTEXT
                       ^^^^^^^
                       This is your build context
</code></pre>
<h3 id="heading-why-build-contexts-matter">Why Build Contexts Matter</h3>
<ol>
<li><p><strong>Security</strong>: Only files in the context can be accessed during build</p>
</li>
<li><p><strong>Performance</strong>: Large contexts slow down builds</p>
</li>
<li><p><strong>Functionality</strong>: Your Dockerfile can only COPY/ADD files from the context</p>
</li>
<li><p><strong>Efficiency</strong>: Understanding contexts helps you build faster, leaner images</p>
</li>
</ol>
<h2 id="heading-types-of-docker-build-contexts">Types of Docker Build Contexts</h2>
<h3 id="heading-1-local-directory-context-most-common">1. Local Directory Context (Most Common)</h3>
<p>👉 See code here: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/01-node-local-context">01-node-local-context</a></p>
<p>This is what you'll use in 90% of cases – pointing to a folder on your machine:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Use current directory</span>
docker build .

<span class="hljs-comment"># Use specific directory</span>
docker build /path/to/my/project

<span class="hljs-comment"># Use parent directory</span>
docker build ..
</code></pre>
<p><strong>Example Project Structure:</strong></p>
<pre><code class="lang-bash">my-webapp/
├── src/
│   ├── index.js
│   └── utils.js
├── public/
│   ├── index.html
│   └── styles.css
├── package.json
├── package-lock.json
├── Dockerfile
├── .dockerignore
└── README.md
</code></pre>
<p><strong>Corresponding Dockerfile:</strong></p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>-alpine
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-comment"># Copy package files first for better layer caching</span>
<span class="hljs-keyword">COPY</span><span class="bash"> package*.json ./</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm ci --only=production</span>

<span class="hljs-comment"># Copy application source</span>
<span class="hljs-keyword">COPY</span><span class="bash"> src/ ./src/</span>
<span class="hljs-keyword">COPY</span><span class="bash"> public/ ./public/</span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">3000</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"node"</span>, <span class="hljs-string">"src/index.js"</span>]</span>
</code></pre>
<h3 id="heading-2-remote-git-repository-context">2. Remote Git Repository Context</h3>
<p>You can build directly from Git repositories without cloning locally:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Build from GitHub main branch</span>
docker build https://github.com/&lt;username&gt;/project.git

<span class="hljs-comment"># Build from specific branch</span>
docker build https://github.com/&lt;username&gt;/project.git<span class="hljs-comment">#develop</span>

<span class="hljs-comment"># Build from specific directory in repo</span>
docker build https://github.com/&lt;username&gt;/project.git<span class="hljs-comment">#main:docker</span>

<span class="hljs-comment"># Build with authentication</span>
docker build --ssh default git@github.com:&lt;username&gt;/private-repo.git
</code></pre>
<p>This has various use cases, such as CI/CD pipelines, building open-source projects, ensuring clean builds from source control, automated deployments, and so on.</p>
<h3 id="heading-3-remote-tarball-context">3. Remote Tarball Context</h3>
<p>You can also build from compressed archives hosted on web servers. A remote <strong>tarball</strong> is a <code>.tar.gz</code> or similar compressed archive file accessible via HTTP/HTTPS. This is useful when your source code is packaged and hosted on a web server, artifact repository, or CDN. Docker downloads and extracts the archive automatically, using its contents as the build context.</p>
<p>This approach works well for CI/CD pipelines where build artifacts are stored centrally, or when you want to build images from released versions of your code without cloning entire repositories.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Build from remote tarball</span>
docker build http://server.com/context.tar.gz

<span class="hljs-comment"># BuildKit downloads and extracts automatically</span>
docker build https://example.com/project-v1.2.3.tar.gz
</code></pre>
<h3 id="heading-4-empty-context-advanced">4. Empty Context (Advanced)</h3>
<p>When you don't need any files, you can pipe the Dockerfile directly:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create image without file context</span>
docker build -t hello-world - &lt;&lt;EOF
FROM alpine:latest
RUN <span class="hljs-built_in">echo</span> <span class="hljs-string">"Hello, World!"</span> &gt; /hello.txt
CMD cat /hello.txt
EOF
</code></pre>
<h2 id="heading-common-docker-build-mistakes-and-how-to-fix-them">Common Docker Build Mistakes (And How to Fix Them)</h2>
<h3 id="heading-mistake-1-wrong-context-directory">Mistake 1: Wrong Context Directory</h3>
<p>👉 Reproduced here: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/04-wrong-context">04-wrong-context</a></p>
<p>This mistake occurs when you run <code>docker build</code> from the wrong directory, causing the build context to be different from what your Dockerfile expects.</p>
<p>In the example, running <code>docker build frontend/</code> from the <code>/projects/</code> directory means the context is <code>/projects/frontend/</code>, but the Dockerfile tries to access <code>../shared/utils.js</code>, which is outside this context. Docker can only access files within the build context, so any attempt to reference files outside it will fail.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Project structure</span>
/projects/
├── frontend/
│   ├── Dockerfile
│   ├── src/
│   └── package.json
└── shared/
    └── utils.js

<span class="hljs-comment"># WRONG - Running from projects directory</span>
docker build frontend/
<span class="hljs-comment"># This won't work if Dockerfile tries to COPY ../shared/utils.js</span>
</code></pre>
<h4 id="heading-how-to-fix-wrong-context-directory">How to fix wrong context directory:</h4>
<p>The key is aligning your build context with what your Dockerfile needs.</p>
<ul>
<li><p><strong>Option 1</strong> changes your working directory so the context matches your Dockerfile's expectations. You run the build from inside <code>frontend/</code>, making that directory the context root.</p>
</li>
<li><p><strong>Option 2</strong> keeps you in the parent directory but explicitly sets it as the context (the <code>.</code> argument) while telling Docker where to find the Dockerfile with the <code>-f</code> flag. Now both <code>frontend/</code> and <code>shared/</code> are accessible since they're both within the <code>/projects/</code> context.</p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-comment"># Option 1: Run from correct directory</span>
<span class="hljs-built_in">cd</span> frontend
docker build .

<span class="hljs-comment"># Option 2: Use parent directory as context</span>
docker build -f frontend/Dockerfile .
</code></pre>
<h3 id="heading-mistake-2-including-massive-files">Mistake 2: Including Massive Files</h3>
<p>👉 Optimized version with <code>.dockerignore</code>: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/05-dockerignore-optimization">05-dockerignore-optimization</a></p>
<p>This mistake happens when your build context contains large, unnecessary files that slow down the build process.</p>
<p>Docker must transfer the entire context to the build daemon before starting, so including files like <code>node_modules</code> (which can be hundreds of MB), git history, build artifacts, logs, and database dumps makes builds painfully slow. These files are rarely needed in the final image and should be excluded.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># This context includes everything!</span>
my-app/
├── node_modules/        <span class="hljs-comment"># 200MB+ </span>
├── .git/               <span class="hljs-comment"># Version history</span>
├── dist/               <span class="hljs-comment"># Built files</span>
├── logs/               <span class="hljs-comment"># Log files</span>
├── temp/               <span class="hljs-comment"># Temporary files</span>
├── database.dump       <span class="hljs-comment"># 1GB database backup</span>
└── Dockerfile
</code></pre>
<h4 id="heading-how-to-fix-docker-build-massive-files">How to fix Docker build massive files:</h4>
<p>Use <code>.dockerignore</code> to exclude unnecessary files, dramatically reducing context size and build time. We’ll discuss this in more detail below.</p>
<h3 id="heading-mistake-3-inefficient-layer-caching">Mistake 3: Inefficient Layer Caching</h3>
<p>👉 See good practice code here: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/02-python-cache">02-python-cache</a></p>
<p>This mistake defeats Docker's layer caching by copying frequently-changing files (like source code) before running expensive operations (like <code>npm install</code>). When you modify your source code, Docker invalidates the cache for that layer and all subsequent layers, forcing <code>npm install</code> to run again even though dependencies haven't changed. This can turn a 5-second build into a 5-minute build.</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># BAD - Changes to source code rebuild npm install</span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>
<span class="hljs-keyword">COPY</span><span class="bash"> . /app</span>
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm install</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"npm"</span>, <span class="hljs-string">"start"</span>]</span>
</code></pre>
<h4 id="heading-how-to-fix-docker-build-inefficient-layer-caching">How to fix docker build inefficient layer caching:</h4>
<p>Copy dependency files first, install dependencies, then copy source code. This way, <code>npm install</code> only runs when <code>package.json</code> actually changes:</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># GOOD - npm install only rebuilds when package.json changes</span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> package*.json ./</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm install</span>
<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"npm"</span>, <span class="hljs-string">"start"</span>]</span>
</code></pre>
<h2 id="heading-how-to-optimize-and-monitor-build-performance">How to Optimize and Monitor Build Performance</h2>
<p>Understanding build performance metrics helps you identify bottlenecks and measure improvements.</p>
<h3 id="heading-how-to-optimize-docker-builds-with-dockerignore">How to Optimize Docker Builds with .dockerignore</h3>
<p>The <code>.dockerignore</code> file is your secret weapon for faster, more secure builds. It tells Docker which files to exclude from the build context.</p>
<h4 id="heading-creating-dockerignore-patterns">Creating .dockerignore Patterns</h4>
<p>Create a <code>.dockerignore</code> file in your project root. The syntax is similar to <code>.gitignore</code>, and you can use wildcards (<code>*</code>), match specific file extensions (<code>*.log</code>), exclude entire directories (<code>node_modules/</code>), or use negation patterns (<code>!important.txt</code>) to include files that would otherwise be excluded. Each line represents a pattern, and comments start with <code>#</code>.</p>
<p>Example of a .dockerignore file:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Dependencies</span>
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*

<span class="hljs-comment"># Build outputs</span>
dist/
build/
*.tgz

<span class="hljs-comment"># Version control</span>
.git/
.gitignore
.svn/

<span class="hljs-comment"># IDE and editor files</span>
.vscode/
.idea/
*.swp
*.swo
*~

<span class="hljs-comment"># OS generated files</span>
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

<span class="hljs-comment"># Logs and databases</span>
*.<span class="hljs-built_in">log</span>
*.sqlite
*.db

<span class="hljs-comment"># Environment and secrets</span>
.env
.env.local
.env.*.<span class="hljs-built_in">local</span>
secrets/
*.key
*.pem

<span class="hljs-comment"># Documentation</span>
README.md
docs/
*.md

<span class="hljs-comment"># Test files</span>
<span class="hljs-built_in">test</span>/
tests/
*.test.js
coverage/

<span class="hljs-comment"># Temporary files</span>
tmp/
temp/
*.tmp
</code></pre>
<h3 id="heading-measuring-build-performance">Measuring Build Performance</h3>
<h4 id="heading-analyzing-build-time">Analyzing Build Time</h4>
<p>Understanding where your build spends time helps identify bottlenecks and optimization opportunities. The detailed progress output shows timing for each build step, cache hits/misses, and resource usage.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Enable BuildKit progress output</span>
DOCKER_BUILDKIT=1 docker build --progress=plain .

<span class="hljs-comment"># Use buildx for detailed timing</span>
docker buildx build --progress=plain .
</code></pre>
<h4 id="heading-profiling-context-transfer">Profiling Context Transfer</h4>
<p>Monitor context transfer time to understand how build context size affects overall performance. Profile which directories contribute most to help target <code>.dockerignore</code> optimizations.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Measure context transfer time</span>
time docker build --no-cache .

<span class="hljs-comment"># Profile context size by directory</span>
du -sh */ | sort -hr
</code></pre>
<h4 id="heading-measuring-dockerignore-impact">Measuring .dockerignore Impact</h4>
<p>Before adding a <code>.dockerignore</code>, you'll notice that the <code>transferring context</code> size is 245.7MB in 15.2s:</p>
<pre><code class="lang-bash">$ docker build .
<span class="hljs-comment">#1 [internal] load build context</span>
<span class="hljs-comment">#1 transferring context: 245.7MB in 15.2s</span>
</code></pre>
<p>After adding the <code>.dockerignore</code> file, the context is reduced to 2.1MB in 0.3s:</p>
<pre><code class="lang-bash">$ docker build .
<span class="hljs-comment">#1 [internal] load build context  </span>
<span class="hljs-comment">#1 transferring context: 2.1MB in 0.3s</span>
</code></pre>
<p><strong>Result</strong>: 99% reduction in context size and 50x faster context transfer!</p>
<h2 id="heading-best-practices-for-docker-build-performance">Best Practices for Docker Build Performance</h2>
<p>We've covered several optimization techniques throughout this guide. Here's a quick recap of the key practices, plus some additional strategies:</p>
<ol>
<li><p><strong>Layer Caching</strong> (covered in Mistake 3): Copy dependency files before source code to maximize cache reuse.</p>
</li>
<li><p><strong>Using .dockerignore</strong> (covered in Mistake 2): Exclude unnecessary files to reduce context size and improve build speed.</p>
</li>
<li><p><strong>Choosing the Right Context</strong> (covered earlier): Select appropriate context types (local, Git, tarball) based on your use case.</p>
</li>
</ol>
<p>Now let’s talk about some more ways you can improve performance:</p>
<h3 id="heading-use-multi-stage-builds">Use Multi-Stage Builds</h3>
<p>👉 Demo project: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples/03-multistage-node">03-multistage-node</a></p>
<p>Multi-stage builds let you use one image for building/compiling your application and a different, smaller image for running it. This dramatically reduces your final image size by excluding build tools, source code, and other unnecessary files from the production image.</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Build stage</span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span> AS builder
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> package*.json ./</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm ci</span>
<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm run build</span>

<span class="hljs-comment"># Production stage</span>
<span class="hljs-keyword">FROM</span> nginx:alpine
<span class="hljs-keyword">COPY</span><span class="bash"> --from=builder /app/dist /usr/share/nginx/html</span>
<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">80</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"nginx"</span>, <span class="hljs-string">"-g"</span>, <span class="hljs-string">"daemon off;"</span>]</span>
</code></pre>
<h3 id="heading-use-specific-base-images">Use Specific Base Images</h3>
<p>Generic base images like <code>ubuntu:latest</code> include many packages you don't need, making your images larger and slower to download. Specific images like <code>node:18-alpine</code> or distroless images contain only what's necessary for your application to run.</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Large base image</span>
<span class="hljs-keyword">FROM</span> ubuntu:latest

<span class="hljs-comment"># Smaller, more specific base image  </span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>-alpine

<span class="hljs-comment"># Even smaller distroless image</span>
<span class="hljs-keyword">FROM</span> gcr.io/distroless/nodejs18-debian11
</code></pre>
<h3 id="heading-combine-run-commands">Combine RUN Commands</h3>
<p>Each <code>RUN</code> command creates a new layer in your image. Multiple <code>RUN</code> commands create multiple layers, increasing image size. Combining commands into a single <code>RUN</code> instruction creates just one layer, and you can clean up temporary files in the same step.</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Creates multiple layers</span>
<span class="hljs-keyword">RUN</span><span class="bash"> apt-get update</span>
<span class="hljs-keyword">RUN</span><span class="bash"> apt-get install -y curl</span>
<span class="hljs-keyword">RUN</span><span class="bash"> apt-get clean</span>

<span class="hljs-comment"># Single layer</span>
<span class="hljs-keyword">RUN</span><span class="bash"> apt-get update &amp;&amp; \
    apt-get install -y curl &amp;&amp; \
    apt-get clean &amp;&amp; \
    rm -rf /var/lib/apt/lists/*</span>
</code></pre>
<h2 id="heading-troubleshooting-docker-build-issues">Troubleshooting Docker Build Issues</h2>
<h3 id="heading-issue-copy-failed-no-such-file-or-directory">Issue: "COPY failed: no such file or directory"</h3>
<p><strong>Problem</strong>: File not in build context<br><strong>What’s going wrong</strong>: Docker can only access files within the build context (the directory you specify in <code>docker build</code>). If your Dockerfile tries to <code>COPY</code> a file that doesn't exist in the context directory, the build fails. This often happens when running the build command from the wrong directory or when the file path is incorrect relative to the context root.</p>
<p><strong>Solution</strong>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check what's in your context</span>
ls -la

<span class="hljs-comment"># Verify file path relative to context</span>
docker build -t debug . --progress=plain
</code></pre>
<h3 id="heading-issue-docker-build-is-extremely-slow">Issue: "Docker Build is extremely slow"</h3>
<p><strong>Problem</strong>: Large build context<br><strong>What’s going wrong</strong>: Docker must transfer your entire build context to the BuildKit daemon before building starts. If your context contains large files, directories like <code>node_modules</code>, or unnecessary files, this transfer can take minutes instead of seconds. The larger the context, the slower your builds become.</p>
<p><strong>Solution</strong>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check context size</span>
du -sh .

<span class="hljs-comment"># Add more patterns to .dockerignore</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"large-directory/"</span> &gt;&gt; .dockerignore
<span class="hljs-built_in">echo</span> <span class="hljs-string">"*.zip"</span> &gt;&gt; .dockerignore
</code></pre>
<h3 id="heading-issue-cannot-locate-specified-dockerfile">Issue: "Cannot locate specified Dockerfile"</h3>
<p><strong>Problem</strong>: Dockerfile not in context root<br><strong>What’s going wrong</strong>: By default, Docker looks for a file named <code>Dockerfile</code> in the root of your build context. If your Dockerfile is in a subdirectory or has a different name, Docker can't find it. This is common in monorepo setups where Dockerfiles are organized in separate folders.</p>
<p><strong>Solution</strong>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Specify Dockerfile location</span>
docker build -f path/to/Dockerfile .

<span class="hljs-comment"># Or move Dockerfile to context root</span>
mv path/to/Dockerfile .
</code></pre>
<h3 id="heading-issue-cache-misses-on-unchanged-files">Issue: "Cache misses on unchanged files"</h3>
<p><strong>Problem</strong>: File timestamps or permissions changed<br><strong>What’s going wrong</strong>: Docker's layer caching relies on file checksums and metadata. Even if file content is unchanged, different timestamps or permissions can cause cache misses, forcing unnecessary rebuilds. This often happens after git operations, file system operations, or when files are copied between systems.</p>
<p><strong>Solution</strong>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check file modifications</span>
git status

<span class="hljs-comment"># Reset timestamps</span>
git ls-files -z | xargs -0 touch -r .git/HEAD
</code></pre>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Understanding Docker build contexts and architecture is essential for achieving faster builds. We’ve covered various techniques in this article, like optimized contexts and caching strategies, creating smaller images with efficient layering and multi-stage builds, maintaining better security with proper secret handling and minimal attack surface, and delivering an improved developer experience with faster iteration cycles.</p>
<p>👉 <strong>Full code examples are available on GitHub here:</strong> <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building/tree/main/beginner/docker/docker-build-architecture-examples">Docker build architecture examples</a></p>
<p>As always, I hope you enjoyed the article and learned something new. If you want, you can also follow me on <a target="_blank" href="https://www.linkedin.com/in/destiny-erhabor">LinkedIn</a> or <a target="_blank" href="https://twitter.com/caesar_sage">Twitter</a>.</p>
<p>For more hands-on projects, follow and star this repository: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building">Learn-DevOps-by-building</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Upload Large Objects to S3 with AWS CLI Multipart Upload ]]>
                </title>
                <description>
                    <![CDATA[ Uploading large files to S3 using traditional single-request methods can be quite challenging. If you’re transferring a 5GB database backup, and a network interruption happens, it forces you to restart the entire upload process. This wastes bandwidth... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-upload-large-objects-to-s3-with-aws-cli-multipart-upload/</link>
                <guid isPermaLink="false">688b03991b7d57429769e76b</guid>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Chisom Uma ]]>
                </dc:creator>
                <pubDate>Thu, 31 Jul 2025 05:48:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753940874032/9dae199c-ff29-44c1-aff9-4dad02fdc26d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Uploading large files to S3 using traditional single-request methods can be quite challenging. If you’re transferring a 5GB database backup, and a network interruption happens, it forces you to restart the entire upload process. This wastes bandwidth and time. And this approach becomes increasingly unreliable as file sizes grow.</p>
<p>With a single PUT operation, you can actually upload an object of up to 5 GB. But, when it comes to uploading larger objects (above 5GB), using Amazon S3’s <a target="_blank" href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html">Multipart Upload</a> feature is a better approach.</p>
<p>Multipart upload makes it easier for you to upload larger files and objects by segmenting them into smaller, independent chunks that upload separately and reassemble on S3.</p>
<p>In this guide, you’ll learn how to implement multipart uploads using AWS CLI.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-multipart-uploads-work">How Multipart Uploads Work</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-getting-started">Getting started</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-download-the-aws-cli">Step 1: Download the AWS CLI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-configure-aws-iam-credentials">Step 2: Configure AWS IAM credentials</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-split-object">Step 1: Split Object</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-create-an-amazon-s3-bucket">Step 2: Create an Amazon S3 bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-initiate-multipart-upload">Step 3: Initiate Multipart Upload</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-upload-split-files-to-s3-bucket">Step 4: Upload split files to S3 Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-create-a-json-file-to-compile-etag-values">Step 5: Create a JSON File to Compile ETag Values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-complete-mulitipart-upload-to-s3">Step 6: Complete Mulitipart Upload to S3</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow this guide, you should have:</p>
<ul>
<li><p>An AWS account.</p>
</li>
<li><p>Knowledge of AWS and the S3 service.</p>
</li>
<li><p>The AWS CLI installed on your local machine.</p>
</li>
</ul>
<h2 id="heading-how-multipart-uploads-work">How Multipart Uploads Work</h2>
<p>In a multipart upload, large file transfers are segmented into smaller chunks that get uploaded separately to Amazon S3. After all segments complete their upload process, S3 reassembles them into the complete object.</p>
<p>For example, a 160GB file broken into 1GB segments generates 160 individual upload operations to S3. Each segment receives a distinct identifier while preserving sequence information to guarantee proper file reconstruction.</p>
<p>The system supports configurable retry logic for failed segments and allows upload suspension/resumption functionality. Here’s a diagram that shows what the multipart upload process looks like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753729484012/63a68bcc-8c2d-4110-a007-bd67a3b4b2e4.png" alt="AWS multipart upload process" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-getting-started">Getting Started</h2>
<p>Before you get started with this guide, make sure that you have the AWS CLI installed on your machine. If you don’t already have that installed, follow the steps below.</p>
<h3 id="heading-step-1-download-the-aws-cli"><strong>Step 1: Download the AWS CLI</strong></h3>
<p>To download the CLI, visit the <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html">CLI download documentation</a>. Then, download the CLI based on your operating system (Windows, Linux, macOS). Once the CLI is installed, the next step is to configure your AWS IAM credentials in your terminal.</p>
<h3 id="heading-step-2-configure-aws-iam-credentials"><strong>Step 2: Configure AWS IAM credentials</strong></h3>
<p>To configure your AWS credentials, navigate to your terminal and run the command below:</p>
<pre><code class="lang-bash">aws configure
</code></pre>
<p>This command prompts you to paste in certain credentials, such as your AWS Access Key ID and Secret Access Key. To obtain these credentials, create a new IAM user in your AWS account. To do this, follow the steps below. (You can skip these steps if you already have an IAM user and security credentials.)</p>
<ol>
<li><p><a target="_blank" href="https://eu-north-1.signin.aws.amazon.com/oauth?client_id=arn%3Aaws%3Asignin%3A%3A%3Aconsole%2Fcanvas&amp;code_challenge=gHNgZ4aX7afSDOY05TLWJNFrXDRnWRy-Mn3_TqW2p9o&amp;code_challenge_method=SHA-256&amp;response_type=code&amp;redirect_uri=https%3A%2F%2Fconsole.aws.amazon.com%2Fconsole%2Fhome%3FhashArgs%3D%2523%26isauthcode%3Dtrue%26nc2%3Dh_si%26src%3Dheader-signin%26state%3DhashArgsFromTB_eu-north-1_bcd5b75451c14744">Sign in</a> to your AWS dashboard.</p>
</li>
<li><p>Click on the search bar above your dashboard and search “IAM”.</p>
</li>
<li><p>Click on IAM.</p>
</li>
<li><p>In the left navigation pane, navigate to <strong>Access management</strong> &gt; <strong>Users</strong>.</p>
</li>
<li><p>Click <strong>Create user</strong>.</p>
</li>
<li><p>During IAM user creation, attach a policy directly by selecting <strong>Attach policies directly</strong> in step 2: Set permissions.</p>
</li>
<li><p>Give the user admin access by searching “admin” in the permission policies search bar and selecting AdministratorAccess.</p>
</li>
<li><p>On the next page, click <strong>Create user</strong>.</p>
</li>
<li><p>Click on the created user in the <strong>Users</strong> section and navigate to <strong>Security credentials</strong>.</p>
</li>
<li><p>Scroll down and click <strong>Create access key</strong>.</p>
</li>
<li><p>Select the Command Line Interface (CLI) use case.</p>
</li>
<li><p>On the next page, click <strong>Create access key</strong>.</p>
</li>
</ol>
<p>You will now see your access keys. <strong>Please keep these safe</strong> and do not expose them publicly or share them with anyone.</p>
<p>You can now copy these access keys into your terminal after running the <code>aws configure</code> command.</p>
<p>You will be prompted to include the following details:</p>
<ul>
<li><p><code>AWS Access Key ID</code>: gotten from the created IAM user credentials. See steps above.</p>
</li>
<li><p><code>AWS Secret Access Key</code>: gotten from the created IAM user credentials. See steps above.</p>
</li>
<li><p><code>Default region name</code>: default AWS region name, for example, <code>us-east-1</code>.</p>
</li>
<li><p><code>Default output format</code>: None.</p>
</li>
</ul>
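<p>The interaction looks roughly like this (the key values below are placeholders):</p>
<pre><code class="lang-bash">$ aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: us-east-1
Default output format [None]:
</code></pre>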
<p>Now we’re done with the CLI configuration.</p>
<p>To confirm that you’ve successfully installed the CLI, run the command below:</p>
<pre><code class="lang-bash">aws --version
</code></pre>
<p>You should see the CLI version in your terminal as shown below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753730059205/5164dd17-5f66-40ba-b044-f1bca46a22a0.png" alt="Image of AWS CLI version" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, you are ready for the following main steps for multipart uploads :)</p>
<h2 id="heading-step-1-split-object">Step 1: Split Object</h2>
<p>The first step is to split the object you intend to upload. For this guide, we’ll be splitting a <strong>188MB</strong> video file into smaller chunks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753730190692/bd22598d-0267-4507-864d-f3f7397faf19.png" alt="Image of object size" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Note that this process also works for much larger files.</p>
<p>Next, locate the object you intend to upload on your system. You can use the <code>cd</code> command in your terminal to navigate to the folder where the object is stored.</p>
<p>Then run the split command below:</p>
<pre><code class="lang-bash">split -b &lt;SIZE&gt;mb &lt;filename&gt;
</code></pre>
<p>Replace <code>&lt;SIZE&gt;</code> with your desired chunk size in megabytes (for example, 100, 150, or 200).</p>
<p>For this use case, we’ll split our 188MB video file into 30MB chunks, specifying the chunk size in bytes. Here’s the command:</p>
<pre><code class="lang-bash">
split -b 31457280 videoplayback.mp4
</code></pre>
<p>Next, run the <code>ls -lh</code> command on your terminal. You should get the output below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753730408994/33ef3552-9a92-46e9-9c2a-686e73d23c65.png" alt="Image of split object" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Here, you can see that the 188MB file has been split into multiple 30MB parts, with a smaller 7.9MB part at the end. When you go to the folder where the object is saved on your system, you will see additional files with names that look like this:</p>
<ul>
<li><p><code>xaa</code></p>
</li>
<li><p><code>xab</code></p>
</li>
<li><p><code>xac</code></p>
</li>
</ul>
<p>and so on. These files represent the different parts of your object. For example, <code>xaa</code> is the first part of your file, which will be uploaded first to S3. More on this later in the guide.</p>
<h2 id="heading-step-2-create-an-amazon-s3-bucket">Step 2: Create an Amazon S3 Bucket</h2>
<p>If you don’t already have an S3 bucket created, follow the steps in the AWS <a target="_blank" href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/GetStartedWithS3.html">Get Started with Amazon S3</a> documentation to create one.</p>
<h2 id="heading-step-3-initiate-multipart-upload">Step 3: Initiate Multipart Upload</h2>
<p>The next step is to initiate a multipart upload. To do this, execute the command below:</p>
<pre><code class="lang-bash">
aws s3api create-multipart-upload --bucket DOC-EXAMPLE-BUCKET --key large_test_file
</code></pre>
<p>In this command:</p>
<ul>
<li><p><code>DOC-EXAMPLE-BUCKET</code> is your S3 bucket name.</p>
</li>
<li><p><code>large_test_file</code> is the name of the file, for example, videoplayback.mp4.</p>
</li>
</ul>
<p>You’ll get a JSON response in your terminal, providing you with the <code>UploadId</code>. The response looks like this:</p>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"ServerSideEncryption"</span>: <span class="hljs-string">"AES345"</span>,
    <span class="hljs-attr">"Bucket"</span>: <span class="hljs-string">"s3-multipart-uploads"</span>,
    <span class="hljs-attr">"Key"</span>: <span class="hljs-string">"videoplayback.mp4"</span>,
    <span class="hljs-attr">"UploadId"</span>: <span class="hljs-string">"************************************"</span>
}
</code></pre>
<p>Keep the <code>UploadId</code> somewhere safe on your local machine, as you will need it for later steps.</p>
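<p>If you’d rather not copy the ID around by hand, one option is to capture it in a shell variable when you initiate the upload, using the CLI’s <code>--query</code> and <code>--output</code> flags. This is a small convenience sketch, and the bucket and key names are just examples:</p>
<pre><code class="lang-bash"># Store the UploadId in a variable for use in later commands
UPLOAD_ID=$(aws s3api create-multipart-upload \
  --bucket DOC-EXAMPLE-BUCKET \
  --key videoplayback.mp4 \
  --query UploadId \
  --output text)

echo "$UPLOAD_ID"
</code></pre>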
<h2 id="heading-step-4-upload-split-files-to-s3-bucket">Step 4: Upload Split Files to S3 Bucket</h2>
<p>Remember those extra files saved as xaa, xab, and so on? Well, now it’s time to upload them to your S3 bucket. To do that, execute the command below:</p>
<pre><code class="lang-bash">aws s3api upload-part --bucket DOC-EXAMPLE-BUCKET --key large_test_file --part-number 1 --body large_test_file.001 --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk
</code></pre>
<ul>
<li><p><code>DOC-EXAMPLE-BUCKET</code> is your S3 bucket name.</p>
</li>
<li><p><code>large_test_file</code> is the name of the file, for example, videoplayback.mp4</p>
</li>
<li><p><code>large_test_file.001</code> is the name of the file part, for example, xaa.</p>
</li>
<li><p>For <code>--upload-id</code>, replace the example ID with your saved <code>UploadId</code>.</p>
</li>
</ul>
<p>The command returns a response that contains an <strong>ETag</strong> value for the part of the file that you uploaded.</p>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"ServerSideEncryption"</span>: <span class="hljs-string">"aws:kms"</span>,
    <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"7f9b8c3e2a1d5f4e8c9b2a6d4e8f1c3a\""</span>,
    <span class="hljs-attr">"ChecksumCRC64NVME"</span>: <span class="hljs-string">"mK9xQpD2WnE="</span>
}
</code></pre>
<p>Copy the <strong>ETag</strong> value and save it somewhere on your local machine, as you’ll need it later as a reference.</p>
<p>Continue uploading the remaining file parts by repeating the command above, incrementing both the part number and file name for each subsequent upload. For example: <code>xaa</code> becomes <code>xab</code>, and <code>--part-number 1</code> becomes <code>--part-number 2</code>, and so forth.</p>
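<p>If you have many parts, repeating the command by hand quickly gets tedious. Below is a minimal bash sketch that uploads each part file in order and increments the part number automatically. It assumes the parts still have their default <code>xaa</code>, <code>xab</code>, … names (fewer than 27 parts) and that your upload ID is stored in the <code>UPLOAD_ID</code> variable from earlier:</p>
<pre><code class="lang-bash">BUCKET=DOC-EXAMPLE-BUCKET   # replace with your bucket name
KEY=videoplayback.mp4       # replace with your object key

part=1
for file in xa*; do
  echo "Uploading part $part ($file)..."
  aws s3api upload-part \
    --bucket "$BUCKET" \
    --key "$KEY" \
    --part-number "$part" \
    --body "$file" \
    --upload-id "$UPLOAD_ID"
  part=$((part + 1))
done
</code></pre>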
<p>Note that how long each part takes to upload depends on its size and the speed of your internet connection.</p>
<p>To confirm that all the file parts have been uploaded successfully, run the command below:</p>
<pre><code class="lang-bash">aws s3api list-parts --bucket s3-multipart-uploads --key videoplayback.mp4 --upload-id p0NU3agC3C2tOi4oBmT8lHLebUYqYXmWhEYYt8gc8jXlCStEZYe1_kSx1GjON2ExY_0T.4N4E6pjzPlNcji7VDT6UomtNYUhFkyzpQ7IFKrtA5Dov8YdC20c7UE20Qf0
</code></pre>
<p>Replace the example upload ID with your actual upload ID.</p>
<p>You should get a JSON response like this:</p>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"Parts"</span>: [
        {
            <span class="hljs-attr">"PartNumber"</span>: <span class="hljs-number">1</span>,
            <span class="hljs-attr">"LastModified"</span>: <span class="hljs-string">"2025-07-27T14:22:18+00:00"</span>,
            <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"f7b9c8e4d3a2f6e8c9b5a4d7e6f8c2b1\""</span>,
            <span class="hljs-attr">"Size"</span>: <span class="hljs-number">26214400</span>
        },
        {
            <span class="hljs-attr">"PartNumber"</span>: <span class="hljs-number">2</span>,
            <span class="hljs-attr">"LastModified"</span>: <span class="hljs-string">"2025-07-27T14:25:42+00:00"</span>,
            <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"a8e5d2c7f9b4e6a3c8d5f2e9b7c4a6d3\""</span>,
            <span class="hljs-attr">"Size"</span>: <span class="hljs-number">26214400</span>
        },
        {
            <span class="hljs-attr">"PartNumber"</span>: <span class="hljs-number">3</span>,
            <span class="hljs-attr">"LastModified"</span>: <span class="hljs-string">"2025-07-27T14:28:15+00:00"</span>,
            <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"c4f8e2b6d9a3c7e5f8b2d6a9c3e7f4b8\""</span>,
            <span class="hljs-attr">"Size"</span>: <span class="hljs-number">26214400</span>
        },
        {
            <span class="hljs-attr">"PartNumber"</span>: <span class="hljs-number">4</span>,
            <span class="hljs-attr">"LastModified"</span>: <span class="hljs-string">"2025-07-27T14:31:03+00:00"</span>,
            <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"e9c3f7a5d8b4e6c9f2a7d4b8c6e3f9a2\""</span>,
            <span class="hljs-attr">"Size"</span>: <span class="hljs-number">26214400</span>
        },
        {
            <span class="hljs-attr">"PartNumber"</span>: <span class="hljs-number">5</span>,
            <span class="hljs-attr">"LastModified"</span>: <span class="hljs-string">"2025-07-27T14:33:47+00:00"</span>,
            <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"b6d4a8c7f5e9b3d6a2c8f4e7b9c5d8a6\""</span>,
            <span class="hljs-attr">"Size"</span>: <span class="hljs-number">26214400</span>
        },
        {
            <span class="hljs-attr">"PartNumber"</span>: <span class="hljs-number">6</span>,
            <span class="hljs-attr">"LastModified"</span>: <span class="hljs-string">"2025-07-27T14:36:29+00:00"</span>,
            <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"d7e3c9f6a4b8d2e5c7f9a3b6d4e8c2f5\""</span>,
            <span class="hljs-attr">"Size"</span>: <span class="hljs-number">26214400</span>
        },
        {
            <span class="hljs-attr">"PartNumber"</span>: <span class="hljs-number">7</span>,
            <span class="hljs-attr">"LastModified"</span>: <span class="hljs-string">"2025-07-27T14:38:52+00:00"</span>,
            <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"f2a6d8c4e7b3f6a9c2d5e8b4c7f3a6d9\""</span>,
            <span class="hljs-attr">"Size"</span>: <span class="hljs-number">15728640</span>
        }
    ]
}
</code></pre>
<p>This is how you verify that all parts have been uploaded.</p>
<h2 id="heading-step-5-create-a-json-file-to-compile-etag-values">Step 5: Create a JSON File to Compile ETag Values</h2>
<p>The JSON file we are about to create tells S3 which ETag belongs to which part number, so it can assemble the parts in the correct order. Gather the <strong>ETag</strong> values from each uploaded file part and organize them into a JSON structure.</p>
<p>Sample JSON format:</p>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"Parts"</span>: [{
        <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"example8be9a0268ebfb8b115d4c1fd3"</span>,
        <span class="hljs-attr">"PartNumber"</span>:<span class="hljs-number">1</span>
    },

    ....

    {
        <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"example246e31ab807da6f62802c1ae8"</span>,
        <span class="hljs-attr">"PartNumber"</span>:<span class="hljs-number">4</span>
    }]
}
</code></pre>
<p>Save the created JSON file in the same folder as your object and name it <code>multipart.json</code>. You can use any IDE of your choice to create and save this document.</p>
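<p>Writing this file by hand works, but it’s easy to make a typo in a long ETag. As an alternative, you can let the CLI build the file for you from the <code>list-parts</code> output shown earlier, using a <code>--query</code> expression. Replace the bucket, key, and upload ID with your own values:</p>
<pre><code class="lang-bash"># Build multipart.json directly from the parts S3 already knows about
aws s3api list-parts \
  --bucket s3-multipart-uploads \
  --key videoplayback.mp4 \
  --upload-id "$UPLOAD_ID" \
  --query '{Parts: Parts[*].{ETag: ETag, PartNumber: PartNumber}}' \
  --output json &gt; multipart.json
</code></pre>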
<h2 id="heading-step-6-complete-mulitipart-upload">Step 6: Complete Multipart Upload to S3</h2>
<p>To complete the multipart upload, run the command below:</p>
<pre><code class="lang-bash">aws s3api complete-multipart-upload --multipart-upload file://fileparts.json --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk
</code></pre>
<p>Replace <code>fileparts.json</code> with <code>multipart.json</code> (the file you created in the previous step), and replace the bucket name, key, and upload ID with your own values.</p>
<p>You should get an output like this:</p>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"ServerSideEncryption"</span>: <span class="hljs-string">"AES256"</span>,
    <span class="hljs-attr">"Location"</span>: <span class="hljs-string">"https://s3-multipart-uploads.s3.eu-west-1.amazonaws.com/videoplayback.mp4"</span>,
    <span class="hljs-attr">"Bucket"</span>: <span class="hljs-string">"s3-multipart-uploads"</span>,
    <span class="hljs-attr">"Key"</span>: <span class="hljs-string">"videoplayback.mp4"</span>,
    <span class="hljs-attr">"ETag"</span>: <span class="hljs-string">"\"78298db673a369adf33dd8054bb6bab7-7\""</span>,
    <span class="hljs-attr">"ChecksumCRC64NVME"</span>: <span class="hljs-string">"d1UPkm73mAE="</span>,
    <span class="hljs-attr">"ChecksumType"</span>: <span class="hljs-string">"FULL_OBJECT"</span>
}
</code></pre>
<p>Now, when you go to your S3 bucket and hit refresh, you should see the uploaded object.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753731196794/c7e121d4-f3a7-4d23-95c5-26b1b5e3b699.png" alt="Image of object successfully uploaded to AWS using multipart upload" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Here, you can see the complete file, file name, type, and size.</p>
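<p>You can also confirm the upload from the terminal. The <code>head-object</code> command returns the object’s metadata, including its full size, without downloading it (the bucket and key below are the example values used earlier):</p>
<pre><code class="lang-bash">aws s3api head-object --bucket s3-multipart-uploads --key videoplayback.mp4
</code></pre>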
<h2 id="heading-conclusion">Conclusion</h2>
<p>Multipart uploads transform large file transfers to Amazon S3 from fragile, all-or-nothing operations into robust, resumable processes. By segmenting files into manageable chunks, you gain retry capabilities, better performance, and the ability to handle objects exceeding S3's 5GB single-upload limit.</p>
<p>This approach is essential for production environments dealing with database backups, video files, or any large assets. With the AWS CLI techniques covered in this guide, you're now equipped to handle S3 transfers confidently, regardless of file size or network conditions.</p>
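<p>One last housekeeping tip: if a multipart upload is abandoned partway through, the parts that were already uploaded keep consuming storage until you remove them. A quick way to find and clean up incomplete uploads is sketched below, with the bucket, key, and upload ID as placeholders:</p>
<pre><code class="lang-bash"># List multipart uploads that were started but never completed or aborted
aws s3api list-multipart-uploads --bucket s3-multipart-uploads

# Abort one of them to delete its uploaded parts
aws s3api abort-multipart-upload \
  --bucket s3-multipart-uploads \
  --key videoplayback.mp4 \
  --upload-id "$UPLOAD_ID"
</code></pre>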
<p>Check out this <a target="_blank" href="https://repost.aws/knowledge-center/s3-multipart-upload-cli">documentation from the AWS knowledge center</a> to learn more about multipart uploads using the AWS CLI.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
