<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ AWS - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ AWS - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Mon, 25 May 2026 05:05:42 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/aws/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ Common DevOps Mistakes and How to Avoid Them — Tips for Startups ]]>
                </title>
                <description>
                    <![CDATA[ Most DevOps engineers don't fail because they lack knowledge about tools. They fail because nobody told them what not to do before they got into production. Startup environments make this worse. The p ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-avoid-devops-mistakes/</link>
                <guid isPermaLink="false">6a060c22baf09db7a6253878</guid>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ startup ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tips ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tolani Akintayo ]]>
                </dc:creator>
                <pubDate>Thu, 14 May 2026 17:53:38 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/6fcabd5e-272f-4f1d-b035-8241896e8296.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most DevOps engineers don't fail because they lack knowledge about tools. They fail because nobody told them what <em>not</em> to do before they got into production.</p>
<p>Startup environments make this worse. The pressure to ship fast, the small team sizes, and the absence of senior engineers to review your decisions means mistakes happen quietly until they become outages, data loss events, or security incidents that cost the company thousands of dollars and weeks of recovery time.</p>
<p>This article is a direct breakdown of the ten most costly DevOps mistakes engineers make early in their careers at startups. For each mistake, you will get the real-world scenario, the business impact, and the concrete fix you can apply immediately.</p>
<p>Whether you are setting up your first production environment or auditing an existing one, this guide will help you build systems that are reliable, secure, and aligned with what the business actually needs.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-who-this-article-is-for">Who This Article Is For</a></p>
</li>
<li><p><a href="#heading-why-startups-are-a-different-environment">Why Startups Are a Different Environment</a></p>
</li>
<li><p><a href="#heading-mistake-1-deploying-without-understanding-what-youre-deploying">Mistake 1: Deploying Without Understanding What You're Deploying</a></p>
</li>
<li><p><a href="#heading-mistake-2-using-production-as-a-development-environment">Mistake 2: Using Production as a Development Environment</a></p>
</li>
<li><p><a href="#heading-mistake-3-hardcoding-secrets-and-credentials">Mistake 3: Hardcoding Secrets and Credentials</a></p>
</li>
<li><p><a href="#heading-mistake-4-overengineering-for-problems-you-dont-have-yet">Mistake 4: Overengineering for Problems You Don't Have Yet</a></p>
</li>
<li><p><a href="#heading-mistake-5-no-observability-before-launch">Mistake 5: No Observability Before Launch</a></p>
</li>
<li><p><a href="#heading-mistake-6-treating-security-as-a-final-step">Mistake 6: Treating Security as a Final Step</a></p>
</li>
<li><p><a href="#heading-mistake-7-manual-deployments-in-production">Mistake 7: Manual Deployments in Production</a></p>
</li>
<li><p><a href="#heading-mistake-8-no-disaster-recovery-plan">Mistake 8: No Disaster Recovery Plan</a></p>
</li>
<li><p><a href="#heading-mistake-9-no-documentation-or-runbooks">Mistake 9: No Documentation or Runbooks</a></p>
</li>
<li><p><a href="#heading-mistake-10-solving-technical-problems-without-understanding-the-business">Mistake 10: Solving Technical Problems Without Understanding the Business</a></p>
</li>
<li><p><a href="#heading-the-system-thinking-framework-every-devops-engineer-needs">The System Thinking Framework Every DevOps Engineer Needs</a></p>
</li>
<li><p><a href="#heading-your-production-readiness-checklist">Your Production Readiness Checklist</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-who-this-article-is-for">Who This Article Is For</h2>
<ul>
<li><p><strong>Early-career DevOps and cloud engineers</strong> who are building or maintaining production infrastructure at a startup.</p>
</li>
<li><p><strong>Backend developers</strong> who have recently taken on DevOps responsibilities.</p>
</li>
<li><p><strong>Engineers joining a startup</strong> who want to understand what operational discipline actually looks like in a fast-moving environment.</p>
</li>
</ul>
<p>You do not need to be an expert in any specific tool to follow this article. The focus is on decision-making patterns and operational discipline, not tool configuration.</p>
<h2 id="heading-why-startups-are-a-different-environment">Why Startups Are a Different Environment</h2>
<p>Before getting into the mistakes, you have to understand why startups produce them in the first place.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/f9bec1fa-8938-4144-b934-9e5af4edf4ad.svg" alt="diagram showing the startup DevOps reality, a single engineer handling infra, CI/CD, security, monitoring, and deployment pipelines simultaneously" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In a large company, you typically have dedicated security engineers, an SRE team, a platform team, and multiple reviewers for every infrastructure change. In a startup, you most likely have one engineer responsible for all of that simultaneously.</p>
<p>This creates four specific pressure points:</p>
<ol>
<li><p><strong>Speed pressure.</strong> The business needs features shipped now. Operational discipline gets treated as optional because nobody is watching closely yet.</p>
</li>
<li><p><strong>Budget constraints.</strong> Every infrastructure decision has a direct impact on company runway. Engineers optimize for the cheapest option rather than the most reliable one.</p>
</li>
<li><p><strong>Absent guardrails.</strong> There is no senior engineer reviewing your Terraform plans. There is no security audit before launch. The absence of immediate consequences can make bad decisions feel like good ones.</p>
</li>
<li><p><strong>Constantly changing requirements.</strong> The architecture you design today may need to support a completely different product in six months.</p>
</li>
</ol>
<p>None of these pressures are excuses for poor decisions. But understanding them helps you see why the following mistakes happen so consistently.</p>
<h2 id="heading-mistake-1-deploying-without-understanding-what-youre-deploying">Mistake 1: Deploying Without Understanding What You're Deploying</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A junior engineer is asked to deploy the company's Node.js API to AWS. They find a tutorial for Elastic Beanstalk, follow it, and it works. Two weeks later, traffic increases. They try to scale "the same way as in the tutorial." The application goes down. They cannot debug it because they never understood what the deployment was actually doing.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>When production breaks and the person who deployed the system cannot explain how it works, diagnosis takes hours instead of minutes. The longer the incident runs, the higher the cost in customer trust, team morale, and potentially direct revenue loss.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p>Before you deploy anything to production, you should be able to answer these five questions in writing:</p>
<ol>
<li><p><strong>What compute type is running my code?</strong> (EC2, Lambda, Fargate, container?)</p>
</li>
<li><p><strong>How does a new version replace the old one?</strong> (Rolling? Blue/green? All-at-once?)</p>
</li>
<li><p><strong>Where do configuration and secrets come from?</strong> (SSM? Secrets Manager? Environment file?)</p>
</li>
<li><p><strong>What downstream services depend on this?</strong> (Database connections? Other APIs? Cache?)</p>
</li>
<li><p><strong>How do I roll back in under five minutes if this breaks?</strong></p>
</li>
</ol>
<p>If you cannot answer all five, do not deploy until you can. The tutorial that got it running is not the documentation for how it operates.</p>
<blockquote>
<p>"It is better to spend two hours understanding a system before deploying it than two days debugging it after something breaks."</p>
</blockquote>
<p>Personally, when learning a new technology or tool, or implementing something I have not worked with before, I usually focus on three core questions: What, Why, and How.</p>
<ul>
<li><p><strong>The first question is: What is this technology or concept about?</strong><br>This helps me build a solid foundation by doing deep research, studying the official documentation, understanding the core principles, and sometimes even learning the history behind the tool or technology. I believe having a well-grounded understanding before implementation is very important.</p>
</li>
<li><p><strong>The second question is: Why do we need it?</strong><br>I try to understand the value the technology brings, why it should be implemented, what problem it solves, and how it benefits the team or organization. This helps me make informed technical decisions instead of just implementing tools without understanding their purpose.</p>
</li>
<li><p><strong>The third question is: How should it be implemented?</strong><br>There are usually multiple approaches to solving a problem or implementing a technology, so I focus on understanding the best and most practical approach based on the use case and expected outcome.</p>
</li>
</ul>
<p>This structured approach has helped me learn new technologies quickly, adapt fast, and implement solutions effectively in real-world environments.</p>
<h2 id="heading-mistake-2-using-production-as-a-development-environment">Mistake 2: Using Production as a Development Environment</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>To save time, an engineer tests a new deployment script directly in the production AWS account. They accidentally run a command that terminates the production database instance. Automated backups exist but were misconfigured. Six hours of customer data is unrecoverable.</p>
<p>This scenario happens more often than you would expect. The reasoning is always the same: "It will only take a minute."</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>A single test-in-production incident can result in data loss, hours of downtime, and a customer communication crisis. In a startup, that can permanently damage the company's reputation before it has had the chance to build one.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p>You need at minimum three separate environments and ideally three separate AWS accounts:</p>
<table>
<thead>
<tr>
<th>Environment</th>
<th>Purpose</th>
<th>Access Level</th>
</tr>
</thead>
<tbody><tr>
<td><strong>dev</strong></td>
<td>Break things freely. No real data.</td>
<td>Engineers have broad access</td>
</tr>
<tr>
<td><strong>staging</strong></td>
<td>Mirror of production. Final verification.</td>
<td>Controlled access</td>
</tr>
<tr>
<td><strong>production</strong></td>
<td>Real customers. Real data.</td>
<td>MFA required. No manual deployments.</td>
</tr>
</tbody></table>
<p>Using separate AWS accounts (not just separate VPCs) gives you account-level isolation. A permission error in the dev account cannot accidentally touch production infrastructure at the API level.</p>
<p>Infrastructure as Code (Terraform or CloudFormation) makes this affordable: you write the configuration once and apply it three times with different variable files.</p>
<pre><code class="language-hcl"># terraform/environments/prod/main.tf
module "app" {
  source            = "../../modules/app"
  environment       = "production"
  instance_type     = "t3.medium"
  db_instance_class = "db.t3.medium"
  multi_az          = true
}
</code></pre>
<pre><code class="language-hcl"># terraform/environments/staging/main.tf
module "app" {
  source            = "../../modules/app"
  environment       = "staging"
  instance_type     = "t3.small"
  db_instance_class = "db.t3.small"
  multi_az          = false
}
</code></pre>
<p>The module is the same. The environment-specific variables are different. Separate environments are not a luxury. They are the minimum operating standard for any team running real software.</p>
<h2 id="heading-mistake-3-hardcoding-secrets-and-credentials">Mistake 3: Hardcoding Secrets and Credentials</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A new engineer joins a startup and clones the repository. Inside they find a <code>.env</code> file committed to Git containing the production database password, the Stripe secret key, and an AWS access key with admin permissions. The repository has been public for six months.</p>
<p>GitHub's automated secret scanning never triggered because the secrets were inside a <code>.env</code> file rather than raw in the code. The credentials had been valid and actively used for over six months.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>Automated scanners run by attackers find exposed credentials within minutes of them being pushed to a public repository. A single exposed AWS access key with admin permissions can result in:</p>
<ul>
<li><p>Crypto-mining workloads generating thousands of dollars in cloud bills overnight</p>
</li>
<li><p>Complete exfiltration of customer data from every S3 bucket</p>
</li>
<li><p>Privilege escalation: the attacker creates new admin users and locks you out of your own account</p>
</li>
<li><p>AWS account suspension while the investigation runs</p>
</li>
</ul>
<p>According to <a href="https://github.blog/security/vulnerability-research/securing-millions-of-developers-together/">GitHub's annual security report</a>, millions of secrets are exposed in public repositories every year. The average time to detect a compromised cloud credential is 197 days.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p><strong>Step 1: Never commit secrets to Git.</strong> Not temporarily. Not in a branch. Not in a private repository.</p>
<p><strong>Step 2: Add</strong> <code>.gitignore</code> <strong>before you create the first file.</strong> Check in the <code>.gitignore</code> with the first line of code before any <code>.env</code> files exist.</p>
<pre><code class="language-gitignore"># .gitignore
.env
.env.*
*.pem
*.key
secrets/
</code></pre>
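<p>A <code>.gitignore</code> entry only stops new files from being added. It does not untrack files that were committed earlier. As a minimal sketch, assuming you run it from the repository root, the helper below checks the output of <code>git ls-files</code> against the same patterns. The function names <code>find_tracked_secrets</code> and <code>tracked_files</code> are hypothetical, and the matching is a simplification of Git's own ignore rules:</p>
<pre><code class="language-python"># Sketch: flag tracked files that match the ignore patterns above.
import fnmatch
import subprocess

# Patterns mirror the .gitignore above; "secrets/" is treated as a directory name.
SECRET_PATTERNS = [".env", ".env.*", "*.pem", "*.key"]
SECRET_DIRS = ["secrets"]


def find_tracked_secrets(paths):
    """Return the paths whose file name or parent directory matches a pattern."""
    flagged = []
    for path in paths:
        parts = path.split("/")
        name = parts[-1]
        in_secret_dir = any(part in SECRET_DIRS for part in parts[:-1])
        matches_name = any(fnmatch.fnmatch(name, p) for p in SECRET_PATTERNS)
        if in_secret_dir or matches_name:
            flagged.append(path)
    return flagged


def tracked_files():
    """List the files Git currently tracks in this repository."""
    out = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    )
    return out.stdout.splitlines()
</code></pre>
<p>Calling <code>find_tracked_secrets(tracked_files())</code> returns the tracked paths that need attention. Anything it reports should be removed from tracking and, more importantly, the secret inside it should be rotated.</p>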
<p><strong>Step 3: Use AWS Secrets Manager or SSM Parameter Store for all production secrets.</strong> Your application reads secrets at runtime:</p>
<pre><code class="language-python"># Python example — fetch secret at runtime, never at build time
import boto3
import json
 
def get_secret(secret_name: str, region: str = "us-east-1") -&gt; dict:
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])
 
# Usage
db_config = get_secret("prod/myapp/database")
DATABASE_URL = db_config["connection_string"]
</code></pre>
<p><strong>Step 4: Scan your existing repositories immediately.</strong> You may already have a problem:</p>
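<pre><code class="language-bash"># Install TruffleHog v3 (the Go-based CLI) to scan your repo history for secrets.
# Note: "pip install trufflehog" installs the legacy v2 tool, which does not
# support the "git" and "github" subcommands used below.
brew install trufflehog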
 
# Scan the entire commit history of your repository
trufflehog git file://.
 
# Or scan a remote GitHub repo
trufflehog github --repo https://github.com/your-org/your-repo
</code></pre>
<p><strong>Step 5: Add a pre-commit hook to prevent future accidents:</strong></p>
<pre><code class="language-bash">pip install pre-commit
</code></pre>
<pre><code class="language-yaml"># .pre-commit-config.yaml
repos:
  - repo: https://github.com/awslabs/git-secrets
    rev: master
    hooks:
      - id: git-secrets
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
</code></pre>
<pre><code class="language-bash">pre-commit install
# Now the hook runs before every commit and blocks detected secrets
</code></pre>
<p>Once a database password has been exposed publicly, it cannot be made secret again. The only remedy is to rotate it and investigate what was accessed. The fix takes ten minutes upfront. The incident takes weeks.</p>
<h2 id="heading-mistake-4-overengineering-for-problems-you-dont-have-yet">Mistake 4: Overengineering for Problems You Don't Have Yet</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A five-person startup with 200 users decides to build a microservices architecture on Kubernetes because "Netflix uses it." They spend three months setting up Kubernetes, Istio service mesh, ArgoCD, Vault, Prometheus, and Grafana. Their product has not shipped a new feature in three months. A competitor with a monolith on a single EC2 instance shipped twelve new features in the same period.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>Every layer of infrastructure you add is a layer that can break, a layer that requires expertise to operate, and a layer that slows down every future change. Kubernetes is the right answer for organizations with the scale and team size to operate it. For a five-person startup, it is an expensive distraction.</p>
<p>Premature complexity does not just cost engineering time. It costs the competitive advantage that speed provides in the early stage.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p>Match your infrastructure to your actual stage:</p>
<table>
<thead>
<tr>
<th>Scale</th>
<th>Right Infrastructure</th>
<th>Cost Range</th>
</tr>
</thead>
<tbody><tr>
<td><strong>1–1,000 users</strong></td>
<td>Single EC2 + RDS + Nginx reverse proxy</td>
<td>$20–50/month</td>
</tr>
<tr>
<td><strong>1K–50K users</strong></td>
<td>Auto-scaling group, RDS Multi-AZ, ALB, basic CI/CD</td>
<td>$200–500/month</td>
</tr>
<tr>
<td><strong>50K–500K users</strong></td>
<td>ECS Fargate, RDS read replicas, ElastiCache, full observability</td>
<td>$1K–5K/month</td>
</tr>
<tr>
<td><strong>500K+ users</strong></td>
<td>Multi-region, managed Kubernetes, dedicated SRE</td>
<td>$10K+/month</td>
</tr>
</tbody></table>
<p>The question to ask before every infrastructure decision is: <strong>"What specific, measurable problem does this solve today that my current setup cannot solve?"</strong></p>
<p>Amazon, Netflix, and Uber did not start with microservices. They started with monoliths and extracted services only when the monolith became the actual bottleneck. You are not Netflix. You are solving the problems in front of you today.</p>
<p>Use managed services wherever possible: RDS instead of self-hosted Postgres, Fargate instead of self-managed Kubernetes, ElastiCache instead of self-hosted Redis. Managed services let your team focus on the product instead of the infrastructure.</p>
<h2 id="heading-mistake-5-no-observability-before-launch">Mistake 5: No Observability Before Launch</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A startup's checkout flow breaks on a Friday evening. Users are abandoning their carts and the company is losing revenue. The DevOps engineer finds out 45 minutes later because a customer sent a direct message to the CEO on Twitter.</p>
<p>The engineer has no dashboards, no log aggregation, and no alerting. They SSH into the production server and scroll through raw log files. Two hours later, they find the issue: a database connection pool was exhausted by a memory leak introduced in that morning's deployment.</p>
<h3 id="heading-business-impact">The Business Impact</h3>
<p>Without observability:</p>
<ul>
<li><p>You find out about production problems from users, not from your systems</p>
</li>
<li><p>Incidents take 10x longer to resolve because diagnosis is guesswork</p>
</li>
<li><p>You cannot tell whether a deployment improved or degraded performance</p>
</li>
<li><p>You have no data for making better architecture decisions</p>
</li>
</ul>
<h3 id="heading-the-fix">The Fix</h3>
<p>Implement the four golden signals before any service goes to production. These come from <a href="https://sre.google/sre-book/monitoring-distributed-systems/">Google's Site Reliability Engineering book</a>:</p>
<ol>
<li><p><strong>Latency</strong>: How long requests take to complete (p50, p95, p99)</p>
</li>
<li><p><strong>Traffic</strong>: How many requests per second the system is handling</p>
</li>
<li><p><strong>Errors</strong>: The rate of failed requests (5xx responses per minute)</p>
</li>
<li><p><strong>Saturation</strong>: How close the system is to its limits (CPU, memory, connection pool)</p>
</li>
</ol>
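<p>To make the signals concrete, here is a minimal sketch that derives latency percentiles, traffic, and error rate from raw request records. The record shape (a latency in milliseconds plus a status code) and the function names are assumptions for illustration. Saturation is left out because it comes from resource metrics such as CPU or connection pool usage, not from request logs:</p>
<pre><code class="language-python"># Sketch: derive three of the four golden signals from raw request records.
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def golden_signals(requests, window_seconds):
    """requests: non-empty list of (latency_ms, status_code) tuples in the window."""
    latencies = sorted(latency for latency, _ in requests)
    # Any 5xx status counts as a failed request.
    errors = sum(1 for _, status in requests if status // 100 == 5)
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "latency_p99_ms": percentile(latencies, 99),
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
    }
</code></pre>
<p>In practice your load balancer or metrics agent computes these values for you. The point of the sketch is to show what each number means so that you can read a dashboard critically.</p>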
<p>Here is a minimal CloudWatch alarm setup using the AWS CLI:</p>
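<pre><code class="language-shell"># Alert when targets return 10 or more 5xx responses per minute for 5 consecutive minutes.
# The ALB publishes 5xx responses as a count (HTTPCode_Target_5XX_Count), not a percentage.
# The threshold of 10 is an example value: tune it for your traffic.
# A true percentage alarm needs metric math that divides this count by RequestCount.

aws cloudwatch put-metric-alarm \
  --alarm-name "high-error-rate-production" \
  --alarm-description "10 or more target 5xx responses per minute for 5 minutes" \
  --metric-name "HTTPCode_Target_5XX_Count" \
  --namespace "AWS/ApplicationELB" \
  --statistic "Sum" \
  --period 60 \
  --evaluation-periods 5 \
  --threshold 10 \
  --comparison-operator "GreaterThanOrEqualToThreshold" \
  --treat-missing-data "notBreaching" \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:pagerduty-production" \
  --dimensions Name=LoadBalancer,Value=app/my-alb/1234567890abcdef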
</code></pre>
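<p>The Application Load Balancer reports 5xx responses and total requests as counts, so an alarm on an error <em>percentage</em> needs CloudWatch metric math. The sketch below builds the parameters for boto3's <code>put_metric_alarm</code>. The function name <code>error_rate_alarm_params</code> and the default threshold are assumptions for illustration, so treat it as a starting point and not a definitive configuration:</p>
<pre><code class="language-python"># Sketch: parameters for a percentage-based 5xx alarm using CloudWatch metric math.
def error_rate_alarm_params(load_balancer, sns_topic_arn, threshold_percent=1.0):
    """Build keyword arguments for boto3's cloudwatch.put_metric_alarm."""

    def alb_metric(metric_id, metric_name):
        # Each raw metric is summed per minute and is not alarmed on directly.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": "high-error-rate-production",
        "AlarmDescription": "Target 5xx responses exceeded the error-rate threshold",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "EvaluationPeriods": 5,
        "Threshold": threshold_percent,
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            alb_metric("errors", "HTTPCode_Target_5XX_Count"),
            alb_metric("requests", "RequestCount"),
            {
                "Id": "error_rate",
                # FILL treats minutes with no 5xx datapoint as zero errors.
                "Expression": "100 * FILL(errors, 0) / requests",
                "Label": "5xx error rate (%)",
                "ReturnData": True,
            },
        ],
    }
</code></pre>
<p>You would pass the result to <code>boto3.client("cloudwatch").put_metric_alarm(**params)</code>. Only the expression has <code>ReturnData</code> set to true, because an alarm can evaluate exactly one time series.</p>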
<p>Every application should also expose a <code>/health</code> endpoint that returns <code>200 OK</code> when healthy:</p>
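<pre><code class="language-python"># FastAPI example

import os

from fastapi import FastAPI, Response
from sqlalchemy import create_engine, text

app = FastAPI()
# Assumption: DATABASE_URL is injected at runtime (for example from Secrets Manager)
engine = create_engine(os.environ["DATABASE_URL"])

@app.get("/health")
def health_check(response: Response):
    # Check database connectivity
    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        db_status = "healthy"
    except Exception:
        db_status = "unhealthy"
        # Return 503 so load balancers and uptime monitors see the failure
        response.status_code = 503

    return {
        "status": "healthy" if db_status == "healthy" else "degraded",
        "database": db_status,
        "version": os.getenv("APP_VERSION", "unknown"),
    }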
</code></pre>
<p>Your load balancer checks this endpoint. Your uptime monitor checks it. You check it after every deployment.</p>
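<p>The post-deployment check is easy to automate. As a minimal sketch, assuming the endpoint returns a JSON body with a <code>status</code> field like the example above, the helper below polls the endpoint until it reports healthy. The names <code>fetch_health</code> and <code>wait_until_healthy</code>, and the retry defaults, are hypothetical:</p>
<pre><code class="language-python"># Sketch: poll the health endpoint after a deployment until it reports healthy.
import json
import time
import urllib.request


def fetch_health(url):
    """Return the parsed JSON body of the health endpoint (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(url, attempts=10, delay_seconds=3.0,
                       fetch=fetch_health, sleep=time.sleep):
    """Return True once the endpoint reports status "healthy", else False."""
    for attempt in range(attempts):
        try:
            if fetch(url).get("status") == "healthy":
                return True
        except Exception:
            # Connection errors and error responses count as "not ready yet".
            pass
        if attempt != attempts - 1:
            sleep(delay_seconds)
    return False
</code></pre>
<p>The <code>fetch</code> and <code>sleep</code> parameters exist so the function can be tested without a network. In a pipeline you would call it with only the URL and fail the job when it returns <code>False</code>.</p>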
<blockquote>
<p>You do not get to say a system is working unless you have data to prove it. "Nobody complained" is not the same as "nothing is broken."</p>
</blockquote>
<h2 id="heading-mistake-6-treating-security-as-a-final-step">Mistake 6: Treating Security as a Final Step</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A startup rushes to launch their MVP. Security reviews are "planned for after launch." Six months later, a potential enterprise customer requires a security audit before signing a contract. The audit reveals:</p>
<ul>
<li><p>S3 buckets publicly accessible by default</p>
</li>
<li><p>EC2 instances with port 22 open to <code>0.0.0.0/0</code></p>
</li>
<li><p>IAM users with <code>AdministratorAccess</code> for the entire team</p>
</li>
<li><p>No encryption on the database at rest</p>
</li>
<li><p>JWT secrets hardcoded in environment variables</p>
</li>
</ul>
<p>The audit fails. The enterprise deal worth $120,000 annually is lost. Remediation takes four weeks of engineering time.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>Security debt is the most expensive technical debt you can accumulate. Unlike performance debt that degrades gradually, security vulnerabilities cause sudden, catastrophic events: data breaches, ransomware, account takeovers, and regulatory fines. At a startup, any one of these can end the company.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p>Apply these six security controls before the first line of production code ships:</p>
<p><strong>1. Apply the Principle of Least Privilege so every IAM role gets only what it needs:</strong></p>
<p>One of the most common security mistakes in AWS is granting roles more permissions than they need, either out of convenience (<code>s3:*</code>) or uncertainty about what the service actually requires. This creates unnecessary risk: if a role is compromised, the attacker inherits every permission you granted.</p>
<p>The fix is simple: look at what your service actually does, then write a policy that allows exactly that.</p>
<p>If your app uploads and reads files from a specific S3 bucket, the policy should say exactly that:</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-app-uploads/*"
    }
  ]
}
</code></pre>
<p>Notice the <code>Resource</code> is scoped to <code>my-app-uploads/*</code>, not all S3 buckets. And the <code>Action</code> list covers only <code>GetObject</code> and <code>PutObject</code>, not <code>DeleteObject</code> and not <code>s3:*</code>. If the service gets compromised, the attacker can read and write to that one bucket. That is it. The rest of your account is untouched.</p>
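<p>You can catch the most obvious violations automatically. The sketch below is a deliberately simple check, not a replacement for tools like IAM Access Analyzer: it flags <code>Allow</code> statements that use a wildcard action or a wildcard resource. The function name <code>find_wildcards</code> is hypothetical:</p>
<pre><code class="language-python"># Sketch: flag Allow statements that use wildcard actions or resources.
def find_wildcards(policy):
    """Return human-readable findings for overly broad Allow statements."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            # "*" grants everything; "s3:*" grants every action in one service.
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action}")
        for resource in resources:
            if resource == "*":
                findings.append(f"statement {index}: wildcard resource")
    return findings
</code></pre>
<p>Run against the policy above, the function returns an empty list. Run against a policy that allows <code>s3:*</code> on <code>*</code>, it reports both problems, which makes it a reasonable candidate for a CI check on your policy files.</p>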
<p><strong>2. Block all S3 public access by default:</strong></p>
<p>AWS S3 buckets are private by default when created, but that can be overridden at the bucket level, the object level, or through a bucket policy. Misconfigured S3 buckets are one of the most common causes of data breaches, and they are almost always accidental.</p>
<p>The safest approach is to enable the "Block Public Access" setting, which overrides ACLs and bucket policies and prevents a bucket from being made public even if someone tries. The command below applies it to a single bucket:</p>
<pre><code class="language-bash">aws s3api put-public-access-block \
  --bucket my-app-bucket \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
</code></pre>
<p>Run this for every bucket you create. Better yet, enable it at the AWS account level so it applies automatically to all future buckets by default.</p>
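<p>The account-level setting lives in the S3 Control API, not the regular S3 API. Here is a minimal sketch using boto3's <code>s3control</code> client. The helper names are hypothetical, and the account ID in the usage note is a placeholder:</p>
<pre><code class="language-python"># Sketch: enable S3 Block Public Access for the whole account with boto3.
def block_public_access_config():
    """All four settings enabled, matching the bucket-level command above."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def enable_account_block_public_access(s3control_client, account_id):
    """Apply the configuration account-wide through the S3 Control API."""
    s3control_client.put_public_access_block(
        AccountId=account_id,
        PublicAccessBlockConfiguration=block_public_access_config(),
    )
</code></pre>
<p>You would call it as <code>enable_account_block_public_access(boto3.client("s3control"), "123456789012")</code>. Passing the client in as an argument keeps the function easy to test with a stub.</p>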
<p><strong>3. Never open SSH to the internet. Use AWS Systems Manager Session Manager instead:</strong></p>
<p>Port 22 open to <code>0.0.0.0/0</code> is an attack surface that exists on thousands of AWS instances right now. Brute-force bots scan the internet continuously looking for open SSH ports. Even with a strong key, the exposure is unnecessary because AWS provides a better alternative.</p>
<p>AWS Systems Manager Session Manager gives you full shell access to any EC2 instance without opening a single inbound port on the security group. There is no port to scan, no port to attack, and every session is logged automatically to CloudTrail:</p>
<pre><code class="language-bash"># Start a session on an EC2 instance without port 22 open
aws ssm start-session --target i-0123456789abcdef0
</code></pre>
<p>To use Session Manager, the EC2 instance needs the SSM Agent installed (included by default on Amazon Linux 2 and Ubuntu 20.04+) and an IAM instance profile with the <code>AmazonSSMManagedInstanceCore</code> policy attached. Once that is set up, you can close port 22 on the security group entirely.</p>
<p><strong>4. Enable MFA for all IAM users and enforce it via policy:</strong></p>
<p>A leaked IAM username and password with no MFA is a fully compromised account. Multi-factor authentication is the single most effective control against credential theft, and it costs nothing to enable.</p>
<p>Enforce it through an IAM policy that denies all actions when MFA is not present, except the actions needed to set up MFA in the first place. This means even if a set of credentials is stolen, the attacker cannot do anything without the second factor.</p>
<p>The AWS documentation provides the <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_users-self-manage-mfa-and-creds.html">Complete Deny Without MFA Policy</a>. Attach it to every IAM user or group in your account. This is a one-time setup that permanently raises your account's security baseline.</p>
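<p>To show the shape of that policy, here is an abbreviated sketch of its core statement as a Python dictionary. The action list is shortened for illustration and the AWS documentation remains the authoritative version. Note that a deny-only policy grants nothing by itself, so users still need separate <code>Allow</code> statements for their normal work:</p>
<pre><code class="language-python"># Sketch: the core "deny everything without MFA" statement as a Python dict.
import json

# Actions a user still needs without MFA so they can enrol a device.
# Assumption: this list is abbreviated; check it against the AWS documentation.
MFA_SETUP_ACTIONS = [
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:GetUser",
    "iam:ListMFADevices",
    "iam:ListVirtualMFADevices",
    "iam:ResyncMFADevice",
    "sts:GetSessionToken",
]


def deny_without_mfa_policy():
    """Return a policy document that denies non-setup actions when MFA is absent."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptListedIfNoMFA",
                "Effect": "Deny",
                "NotAction": MFA_SETUP_ACTIONS,
                "Resource": "*",
                # BoolIfExists also denies requests where the MFA key is missing.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }
</code></pre>
<p>Calling <code>json.dumps(deny_without_mfa_policy(), indent=2)</code> produces the JSON document you would attach. The <code>NotAction</code> element is what keeps a new user from being locked out before they have registered a device.</p>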
<p><strong>5. Enable CloudTrail in all regions:</strong></p>
<p>Without CloudTrail, you have no record of who did what in your AWS account. If a credential is compromised, you cannot investigate what the attacker accessed. If an engineer accidentally deletes a resource, you cannot trace it. You are operating blind.</p>
<p>CloudTrail records AWS API activity in your account: who made each call, from which IP, at what time, and what the response was. Enable it across all regions so activity in regions you do not actively use is also captured:</p>
<pre><code class="language-bash">aws cloudtrail create-trail \
  --name production-audit-trail \
  --s3-bucket-name my-cloudtrail-logs \
  --is-multi-region-trail \
  --enable-log-file-validation

# A trail created through the CLI does not record events until logging is started
aws cloudtrail start-logging --name production-audit-trail
</code></pre>
<p>The <code>--enable-log-file-validation</code> flag generates digest files that let you verify the log files have not been tampered with. This is important if you ever need to use these logs in a security investigation or compliance audit. Once the trail is logging, every <code>AssumeRole</code>, every <code>DeleteBucket</code>, and every <code>RunInstances</code> call in your account is permanently recorded.</p>
<p><strong>6. Run AWS Security Hub from day one:</strong></p>
<p>Most teams only discover security misconfigurations after a breach or a compliance audit. Security Hub inverts this: it continuously scans your AWS environment against industry-standard frameworks (CIS AWS Foundations Benchmark, AWS Foundational Security Best Practices) and surfaces findings before they become incidents.</p>
<p>Enabling it takes a single command:</p>
<pre><code class="language-bash">aws securityhub enable-security-hub
</code></pre>
<p>Within minutes, Security Hub gives your account a compliance score and a prioritized list of findings. A finding might tell you that a security group has port 22 open to the world, that an S3 bucket has logging disabled, or that root account credentials were recently used. Each finding includes the affected resource and a remediation guide.</p>
<p>Treat every Security Hub finding the same way you treat a production bug: assign it a priority, assign an owner, and close it. A finding sitting unaddressed for 30 days is a known vulnerability you chose to leave open.</p>
<h2 id="heading-mistake-7-manual-deployments-in-production">Mistake 7: Manual Deployments in Production</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A startup's deployment process is documented in a Notion page that is four months out of date. It involves SSH-ing into the server, running <code>git pull</code>, running <code>npm install</code>, and restarting the PM2 process. Different engineers do it slightly differently. One engineer, rushing a late-night release, skips <code>npm install</code>. The application starts crashing because a new dependency is missing.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>Manual deployment processes are inherently unreliable. Humans under pressure skip steps, perform steps in the wrong order, and remember procedures differently. Every manual step in a production deployment process is a scheduled incident waiting for the right moment of stress.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p>If a deployment step is performed manually more than twice, it needs to be automated. Here is a minimal but complete GitHub Actions deployment workflow for an ECS Fargate service:</p>
<pre><code class="language-yaml"># .github/workflows/deploy.yml
name: Deploy to Production
 
on:
  push:
    branches:
      - main
 
permissions:
  id-token: write   # Required for OIDC authentication with AWS
  contents: read
 
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
 
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
 
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          aws-region: us-east-1
 
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
 
      - name: Build and push Docker image
        id: build
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/my-app:$IMAGE_TAG .
          docker push $ECR_REGISTRY/my-app:$IMAGE_TAG
          echo "image=$ECR_REGISTRY/my-app:$IMAGE_TAG" &gt;&gt; $GITHUB_OUTPUT
 
      - name: Deploy to Amazon ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: task-definition.json
          service: my-app-service
          cluster: production
          wait-for-service-stability: true
</code></pre>
<p>Notice <code>wait-for-service-stability: true</code>. Without this, the workflow reports success the moment ECS accepts the new task definition, before the containers are actually healthy. With it, the workflow fails if the new containers crash. You want to know immediately, not discover it from user reports thirty minutes later.</p>
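<p>The same stability check is available from the AWS CLI, which helps when a rollout is triggered outside this workflow. The sketch below wraps <code>aws ecs wait services-stable</code>, a real CLI waiter, in a hypothetical helper named <code>verify_rollout</code>. The cluster and service names are the ones used in the workflow above.</p>
<pre><code class="language-bash"># Sketch: block until an ECS service settles, and fail loudly if it never does.
# verify_rollout is a hypothetical helper, not part of the AWS CLI.
verify_rollout() {
  cluster="$1"
  service="$2"
  if aws ecs wait services-stable --cluster "$cluster" --services "$service"; then
    echo "stable: $service"
  else
    echo "unstable: $service"
    return 1
  fi
}

# Usage: verify_rollout production my-app-service
</code></pre>
<p>Because the function returns a non-zero status when the waiter gives up, you can chain it into a script that alerts or rolls back instead of silently moving on.</p>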
<h2 id="heading-mistake-8-no-disaster-recovery-plan">Mistake 8: No Disaster Recovery Plan</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A startup's production database runs on a single RDS instance with no Multi-AZ configuration. Automated backups are enabled but have never been tested. The EBS volume backing the instance fails. AWS provisions a new instance from the last snapshot, which is 18 hours old. 18 hours of customer data is permanently lost.</p>
<p>The startup had no disaster recovery plan, no tested recovery procedure, and no communication template ready for customers.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>The question is not whether your infrastructure will fail. It will fail. Every database, every server, every availability zone experiences failures. The question is whether you have a tested plan for when it does.</p>
<p>Data loss of any magnitude is serious. For startups that handle financial data, healthcare data, or anything under GDPR, even partial data loss can trigger regulatory consequences.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p><strong>Define your RTO and RPO before you design anything:</strong></p>
<ul>
<li><p><strong>RTO (Recovery Time Objective):</strong> How long can the business survive without this system? A payment API might have an RTO of 15 minutes. An internal analytics dashboard might have an RTO of 4 hours.</p>
</li>
<li><p><strong>RPO (Recovery Point Objective):</strong> How much data loss is acceptable? Zero means real-time replication. One hour means hourly snapshots are sufficient. This directly determines your backup frequency and architecture.</p>
</li>
</ul>
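<p>To make the RPO definition concrete, here is a small sketch of the arithmetic. It assumes that worst-case data loss is roughly the gap between two backups, which ignores replication and point-in-time recovery. The helper name <code>meets_rpo</code> is illustrative only.</p>
<pre><code class="language-bash"># Sketch: worst-case data loss is roughly the gap between backups.
# meets_rpo is a hypothetical helper; both arguments are in minutes.
meets_rpo() {
  backup_interval_min="$1"
  rpo_min="$2"
  if [ "$backup_interval_min" -le "$rpo_min" ]; then
    echo "ok: worst-case loss ${backup_interval_min}m is within RPO ${rpo_min}m"
  else
    echo "gap: worst-case loss ${backup_interval_min}m exceeds RPO ${rpo_min}m"
    return 1
  fi
}

# Usage: meets_rpo 1080 60
# (an 18-hour-old snapshot checked against a one-hour RPO)
</code></pre>
<p>In the scenario above, an 18-hour gap against a one-hour RPO fails the check immediately, which is the conversation you want to have before the outage rather than after it.</p>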
<p><strong>Enable RDS Multi-AZ for all production databases:</strong></p>
<pre><code class="language-hcl"># Terraform
resource "aws_db_instance" "production" {
  identifier        = "prod-postgres"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
 
  # Multi-AZ: automatic failover to standby in a different AZ
  # No data loss. Automatic failover in ~60-120 seconds.
  multi_az = true
 
  # Encryption at rest — non-negotiable
  storage_encrypted = true
 
  # Automated backups with 7-day retention
  backup_retention_period = 7
  backup_window           = "03:00-04:00"
 
  # Enable deletion protection in production
  deletion_protection = true
 
  tags = {
    Environment = "production"
  }
}
</code></pre>
<p><strong>Test your backups on a schedule.</strong> Create a monthly calendar event: "Restore production backup to staging and verify data integrity." An untested backup is not a backup; it is a hope.</p>
<pre><code class="language-bash"># Restore a snapshot to a test instance and verify
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier recovery-test \
  --db-snapshot-identifier rds:prod-postgres-2025-01-15 \
  --db-instance-class db.t3.medium \
  --no-multi-az
 
# Connect and verify row counts
psql -h recovery-test.xxxx.rds.amazonaws.com -U admin -d mydb \
  -c "SELECT COUNT(*) FROM users; SELECT COUNT(*) FROM orders;"
</code></pre>
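<p>A restore is asynchronous, so the <code>psql</code> check will fail if you run it before the new instance is ready. The sketch below adds the two steps that are easy to forget: waiting for the instance, and deleting it afterwards. The helper names are hypothetical; the <code>aws rds wait db-instance-available</code> and <code>aws rds delete-db-instance</code> commands are part of the AWS CLI.</p>
<pre><code class="language-bash"># Sketch: wait for the restored instance, then remove it once checks pass.
# "recovery-test" matches the identifier used in the restore command above.
finish_restore_test() {
  instance_id="${1:-recovery-test}"
  # Blocks until the instance status is "available"; restores can take a while.
  aws rds wait db-instance-available --db-instance-identifier "$instance_id" || return 1
  echo "ready: $instance_id (run your verification queries now)"
}

cleanup_restore_test() {
  instance_id="${1:-recovery-test}"
  # The test copy needs no final snapshot, so skip it to avoid extra storage cost.
  aws rds delete-db-instance \
    --db-instance-identifier "$instance_id" \
    --skip-final-snapshot
}
</code></pre>
<p>Deleting the test instance promptly matters for cost, and it also keeps a stale copy of production data from lingering in your account.</p>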
<p>For official guidance on RDS backup and restore, refer to the <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html">AWS RDS Backup and Restore documentation</a>.</p>
<h2 id="heading-mistake-9-no-documentation-or-runbooks">Mistake 9: No Documentation or Runbooks</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>The startup's most experienced DevOps engineer takes two weeks of vacation. On day three of their holiday, the staging environment goes down. Nobody else knows how it was built, because the engineer set it up manually over six months with no documentation, no Terraform, and no notes. The team spends four days trying to reconstruct the environment from memory and guesswork. The engineer gets messages on their vacation every day. When they return, they rebuild the environment in four hours.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>Undocumented infrastructure creates single points of failure not in your systems, but in your team. It makes onboarding new engineers take weeks instead of hours. It makes incident response depend on specific people being available. When that person leaves the company, the knowledge walks out with them.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p>Documentation for an engineering team means three specific things:</p>
<ol>
<li><p><strong>Infrastructure as Code is the highest form of documentation.</strong> The Terraform that defines your infrastructure IS the documentation for what exists and how it is configured. If something is not in code, it should not exist in production.</p>
</li>
<li><p><strong>A runbook for every operational task.</strong> A runbook is a step-by-step procedure written well enough that someone in their first week at the company can follow it during an incident:</p>
</li>
</ol>
<pre><code class="language-markdown"># Runbook: Production Database Connection Exhaustion
 
## Symptoms
- Application logs: "too many connections" errors
- 500 error rate spike on database-dependent endpoints
- pg_stat_activity shows max connections reached
 
## Diagnosis
# Check current connection count
psql -h $DB_HOST -U $DB_USER -c "SELECT COUNT(*) FROM pg_stat_activity;"
 
# See connections by application
psql -h $DB_HOST -U $DB_USER \
  -c "SELECT application_name, COUNT(*) FROM pg_stat_activity GROUP BY 1 ORDER BY 2 DESC;"

## Resolution
1. Identify and restart the service causing the connection leak
2. If immediate relief needed: kill idle connections older than 10 minutes
3. Long-term: review connection pool settings in application config

## Escalation
If unresolved in 30 minutes: page the on-call backend engineer.
</code></pre>
<ol start="3">
<li><strong>An architecture README in every repository.</strong> Every engineer who clones your repository should be able to understand what it does, how to run it locally, how to deploy it, and what it depends on without asking anyone.</li>
</ol>
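<p>A runbook is easier to follow under pressure when each step names the exact command. As one illustration, step 2 of the Resolution section in the runbook above ("kill idle connections older than 10 minutes") might be backed by something like the following. The helper name <code>kill_idle_connections</code> is hypothetical, <code>DB_HOST</code> and <code>DB_USER</code> are assumed to be set, and the role you connect with needs permission to terminate other sessions.</p>
<pre><code class="language-bash"># Sketch: terminate connections that have sat idle for longer than a cutoff.
# kill_idle_connections is a hypothetical helper; DB_HOST and DB_USER are assumed to be set.
kill_idle_connections() {
  minutes="${1:-10}"
  psql -h "$DB_HOST" -U "$DB_USER" -c "
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle'
      AND now() - state_change &gt; interval '$minutes minutes'
      AND pid != pg_backend_pid();
  "
}

# Usage: kill_idle_connections 10
</code></pre>
<p>The last condition keeps the query from terminating its own session. This only buys time, so the runbook's long-term step on connection pool settings still applies.</p>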
<h2 id="heading-mistake-10-solving-technical-problems-without-understanding-the-business">Mistake 10: Solving Technical Problems Without Understanding the Business</h2>
<h3 id="heading-the-scenario">The Scenario</h3>
<p>A startup is experiencing slow page loads. A DevOps engineer decides to solve it by migrating to Kubernetes with horizontal pod auto-scaling. The migration takes six weeks. Page loads improve slightly. But 80% of the slowness was caused by unoptimized database queries that had nothing to do with the infrastructure layer. The six-week migration solved 20% of the problem.</p>
<h3 id="heading-the-business-impact">The Business Impact</h3>
<p>Technical solutions to misdiagnosed problems are extraordinarily expensive. Every hour spent building the wrong solution is an hour not spent on the right one. Infrastructure is a tool for delivering business outcomes, not an end in itself.</p>
<h3 id="heading-the-fix">The Fix</h3>
<p>Before making any infrastructure decision, answer these four questions:</p>
<ol>
<li><p><strong>What is the actual, measured bottleneck?</strong> Instrument before you act. The bottleneck is almost never where you assumed it was.</p>
</li>
<li><p><strong>What does success look like, and how will you measure it?</strong> "Pages are faster" is not measurable. "p95 page load time drops below 1.2 seconds" is measurable.</p>
</li>
<li><p><strong>What is the full cost of this solution?</strong> Time to implement, ongoing operational burden, team learning curve. Is this cost justified by the measured impact?</p>
</li>
<li><p><strong>Can a simpler solution solve 80% of the problem in 20% of the time?</strong></p>
</li>
</ol>
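<p>The second question asks for a measurable target such as a p95 value. As a minimal sketch of what that number means, the helper below computes a nearest-rank 95th percentile from a list of latencies. The name <code>p95</code> and the input format (one value per line) are assumptions for illustration; in practice your monitoring tool would report this for you.</p>
<pre><code class="language-bash"># Sketch: nearest-rank p95 from a list of latencies, one value per line on stdin.
# p95 is a hypothetical helper; the values would come from your own logs or metrics.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 95 + 99) / 100)   # ceiling of 95 percent of the sample count
      print v[rank]
    }'
}

# Usage: printf '%s\n' 0.8 0.9 1.1 1.4 2.3 | p95
</code></pre>
<p>Recording this value before and after a change is what turns "pages are faster" into a claim you can check against the 1.2-second target.</p>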
<p>Always profile and measure before you rebuild:</p>
<pre><code class="language-bash"># Check slow queries in PostgreSQL before any infrastructure changes
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "
SELECT
  query,
  calls,
  total_exec_time / calls AS avg_ms,
  rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY avg_ms DESC
LIMIT 10;
"
</code></pre>
<p>Nine times out of ten, slow applications have slow queries, missing indexes, or an N+1 query problem, none of which require a new infrastructure layer to fix.</p>
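<p>For the missing-index case, one quick signal is a table that is read mostly by sequential scans. The sketch below queries PostgreSQL's <code>pg_stat_user_tables</code> view. The helper name <code>find_seq_scan_tables</code> is hypothetical, and <code>DB_HOST</code>, <code>DB_USER</code>, and <code>DB_NAME</code> are assumed to be set as in the previous example.</p>
<pre><code class="language-bash"># Sketch: tables read mostly by sequential scan are candidates for a missing index.
# find_seq_scan_tables is a hypothetical helper; DB_HOST, DB_USER and DB_NAME are assumed to be set.
find_seq_scan_tables() {
  psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "
    SELECT relname, seq_scan, idx_scan, n_live_tup
    FROM pg_stat_user_tables
    ORDER BY seq_scan DESC
    LIMIT 10;
  "
}

# Usage: find_seq_scan_tables
</code></pre>
<p>A large table with a high <code>seq_scan</code> count and a low <code>idx_scan</code> count is worth a closer look with <code>EXPLAIN</code>. Small tables are often scanned sequentially on purpose, so treat the result as a lead rather than a verdict.</p>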
<h2 id="heading-the-system-thinking-framework-every-devops-engineer-needs">The System Thinking Framework Every DevOps Engineer Needs</h2>
<p>Most of the mistakes above share a common root cause: the engineer was thinking about one component in isolation instead of the full system.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/b33035a6-448f-419b-b293-206b7b775594.jpg" alt="A diagram showing a request flowing through a full system: user → CDN → load balancer → application servers → cache → database → logs/monitoring" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A system thinker asks six questions before making any change in production:</p>
<table>
<thead>
<tr>
<th>Question</th>
<th>Why You Ask It</th>
</tr>
</thead>
<tbody><tr>
<td><strong>What does this change?</strong></td>
<td>List every configuration, file, or service that will be different.</td>
</tr>
<tr>
<td><strong>What does this depend on?</strong></td>
<td>What must be true upstream for this component to work correctly?</td>
</tr>
<tr>
<td><strong>What depends on this?</strong></td>
<td>What downstream systems are affected if this changes or fails?</td>
</tr>
<tr>
<td><strong>What is the failure mode?</strong></td>
<td>Does this fail loudly (500 errors) or silently (wrong data)?</td>
</tr>
<tr>
<td><strong>What is the rollback path?</strong></td>
<td>How do you reverse this in under five minutes?</td>
</tr>
<tr>
<td><strong>What does healthy look like after the change?</strong></td>
<td>What metrics confirm everything is working correctly?</td>
</tr>
</tbody></table>
<p>This is not a checklist you run through slowly. It is a thinking habit that becomes automatic with practice. Senior engineers do not spend more time on deployments than junior engineers do. They spend their time on different things, and this is one of them.</p>
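<p>Of the six questions, the rollback path is the easiest to rehearse ahead of time. As a sketch for the ECS service used earlier in this article, the helpers below point a service back at the previous task definition revision. Both function names are hypothetical, and the code assumes the last good release was exactly one revision back, which will not hold in every pipeline.</p>
<pre><code class="language-bash"># Sketch: derive the previous task definition revision from the current one.
# previous_revision is a hypothetical helper. It assumes the last good release
# was exactly one revision back, which is not guaranteed in every pipeline.
previous_revision() {
  current="$1"                 # for example my-app:42
  family="${current%:*}"       # text before the last colon
  revision="${current##*:}"    # text after the last colon
  if [ "$revision" -le 1 ]; then
    echo "no earlier revision for $family"
    return 1
  fi
  echo "${family}:$((revision - 1))"
}

rollback_service() {
  # Points the service at the earlier revision; ECS then replaces the running tasks.
  target="$(previous_revision "$3")" || return 1
  aws ecs update-service --cluster "$1" --service "$2" --task-definition "$target"
}

# Usage: rollback_service production my-app-service my-app:42
</code></pre>
<p>Writing this down before the deployment, and trying it once in staging, is what makes "under five minutes" a realistic answer.</p>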
<h2 id="heading-your-production-readiness-checklist">Your Production Readiness Checklist</h2>
<p>Use this checklist before any production system goes live. Mark each item as done, in progress, or not yet started.</p>
<h3 id="heading-infrastructure">Infrastructure</h3>
<ul>
<li><p>Infrastructure is defined as code (Terraform or CloudFormation) and version-controlled in Git</p>
</li>
<li><p>Separate dev, staging, and production environments exist with separate credentials</p>
</li>
<li><p>All production changes go through an automated CI/CD pipeline, no manual SSH deployments</p>
</li>
<li><p>You can rebuild the entire production environment from code in under two hours</p>
</li>
</ul>
<h3 id="heading-security">Security</h3>
<ul>
<li><p>No secrets, credentials, or API keys exist in any Git repository</p>
</li>
<li><p>All production secrets are in Secrets Manager or SSM Parameter Store</p>
</li>
<li><p>All IAM roles follow the principle of least privilege</p>
</li>
<li><p>S3 buckets have public access blocked by default</p>
</li>
<li><p>Port 22 is not open to <code>0.0.0.0/0</code> on any security group</p>
</li>
<li><p>CloudTrail is enabled in all regions</p>
</li>
<li><p>All IAM users have MFA enabled</p>
</li>
<li><p>AWS Security Hub is enabled and findings are reviewed weekly</p>
</li>
</ul>
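<p>Two of the security items above can be spot-checked from a terminal. The sketch below is illustrative only: the helper names are hypothetical, the key pattern covers long-term access key IDs with the <code>AKIA</code> prefix and nothing else, and the EC2 filters may match a group whose port and CIDR conditions are met by different rules, so confirm any hit by reading the rule itself.</p>
<pre><code class="language-bash"># Sketch: flag files that contain something shaped like an AWS access key ID.
# scan_for_access_keys is a hypothetical helper; a clean result is a hint, not proof.
scan_for_access_keys() {
  dir="${1:-.}"
  if grep -rIlE --exclude-dir=.git 'AKIA[0-9A-Z]{16}' "$dir"; then
    return 1    # at least one suspicious file was listed
  fi
  echo "no access key IDs found in $dir"
}

# Sketch: list security groups that have a rule matching port 22 and 0.0.0.0/0.
# open_ssh_groups is a hypothetical helper built on describe-security-groups.
open_ssh_groups() {
  aws ec2 describe-security-groups \
    --filters Name=ip-permission.from-port,Values=22 \
              Name=ip-permission.to-port,Values=22 \
              Name=ip-permission.cidr,Values=0.0.0.0/0 \
    --query 'SecurityGroups[].GroupId' \
    --output text
}

# Usage: scan_for_access_keys . ; open_ssh_groups
</code></pre>
<p>The key scan only looks at the current working tree. A secret committed in the past still lives in Git history, so a dedicated scanner is the better long-term answer.</p>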
<h3 id="heading-observability">Observability</h3>
<ul>
<li><p>Every service has a <code>/health</code> endpoint that monitoring checks continuously</p>
</li>
<li><p>Alerts fire within five minutes of a production error rate spike</p>
</li>
<li><p>Dashboards exist showing latency, error rate, and resource utilization</p>
</li>
<li><p>Logs are centralized and searchable, not scattered across individual servers</p>
</li>
</ul>
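<p>For the first observability item, this is a minimal sketch of what a health probe does. The helper name <code>check_health</code> and the example URL are placeholders. A real setup would run the probe from a monitoring service rather than by hand.</p>
<pre><code class="language-bash"># Sketch: probe a health endpoint and report whether it answered with HTTP 200.
# check_health is a hypothetical helper; the URL is a placeholder for your own service.
check_health() {
  url="$1"
  # -s silences progress, -o discards the body, -w prints only the status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
  if [ "$status" = "200" ]; then
    echo "healthy: $url"
  else
    echo "unhealthy: $url (status $status)"
    return 1
  fi
}

# Usage: check_health https://example.com/health
</code></pre>
<p>The five-second timeout is an arbitrary choice for the sketch. Pick a value that reflects how long your users would actually wait.</p>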
<h3 id="heading-reliability">Reliability</h3>
<ul>
<li><p>Production database has Multi-AZ enabled</p>
</li>
<li><p>Backup restoration has been tested in the last 30 days</p>
</li>
<li><p>Written runbooks exist for the three most likely failure scenarios</p>
</li>
<li><p>RTO and RPO requirements are documented and the architecture meets them</p>
</li>
</ul>
<h3 id="heading-documentation">Documentation</h3>
<ul>
<li><p>Every repository has a README explaining what it does and how to deploy it</p>
</li>
<li><p>A new engineer could understand the production architecture from documentation alone</p>
</li>
<li><p>No single engineer holds critical knowledge that lives only in their head</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>None of the mistakes in this article require rare misfortune to experience. They are the predictable result of decisions that feel reasonable under startup pressure but accumulate into real operational risk over time.</p>
<p>The good news is that every single one of them is preventable with the right awareness and the right habits applied early.</p>
<p>You do not need a perfect infrastructure from day one. You need a correct one: version-controlled, automated, observable, secure, and documented. Start with that foundation. Add complexity only when a specific, measured problem requires it. Always connect technical decisions to business outcomes.</p>
<p>The goal of DevOps in a startup is not to build impressive infrastructure. It is to build reliable systems that support product growth safely, efficiently, and sustainably, and to make sure that when something does break, you can recover before anyone notices.</p>
<h2 id="heading-want-to-go-deeper">Want to Go Deeper?</h2>
<p>If this article resonated with you, <a href="https://coachli.co/tolani-akintayo/PR-H4oQS"><strong>The Startup DevOps Field Guide</strong></a> covers these principles in full depth with complete infrastructure blueprints, security frameworks, CI/CD pipeline templates, and the end-to-end decision-making playbook for engineers building DevOps practices in startup environments from scratch.</p>
<p>It is written specifically for the engineer who wants to do this right from the beginning, not the one rebuilding everything after the first major incident.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ AWS Certified Cloud Practitioner Study Course – Pass the Exam With This Free 14-Hour Course ]]>
                </title>
                <description>
                    <![CDATA[ Passing the AWS Certified Cloud Practitioner Exam is one of the first steps to a career in cloud development. And freeCodeCamp just published a free 14-hour course that will help you prepare for the e ]]>
                </description>
                <link>https://www.freecodecamp.org/news/aws-certified-cloud-practitioner-study-course-pass-the-exam-with-this-free-13-hour-course/</link>
                <guid isPermaLink="false">66b200a5276d158502db2eb1</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Thu, 14 May 2026 13:00:00 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5f68e7df6dfc523d0a894e7c/f493e192-b126-4291-8884-d2e2ff621df4.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Passing the AWS Certified Cloud Practitioner Exam is one of the first steps to a career in cloud development. And freeCodeCamp just published a free 14-hour course that will help you prepare for the exam.</p>
<p>This course has been updated for 2026.</p>
<p>This exam mostly deals with cloud computing concepts. Even if you are new to coding, you should be able to prepare for this exam and earn the AWS certification. Andrew Brown created this course. He is a popular instructor and the CEO of ExamPro.</p>
<h2 id="heading-what-is-the-aws-certified-cloud-practitioner">What is the AWS Certified Cloud Practitioner?</h2>
<p>The Certified Cloud Practitioner is the entry-level AWS certification that goes through:</p>
<ul>
<li><p>The cloud fundamentals, for example Cloud Concepts, Cloud Architecture, and Cloud Deployment Models</p>
</li>
<li><p>A close look at the AWS Core Services</p>
</li>
<li><p>A quick look at the vast amount of AWS services</p>
</li>
<li><p>Identity, Security, and Governance of the Cloud</p>
</li>
<li><p>Billing, Pricing, and Support of AWS Services</p>
</li>
</ul>
<p>The exam code is CLF-C02, but it's commonly referred to as the CCP.</p>
<p>Amazon Web Services is the leading Cloud Service Provider (CSP) in the world and the AWS Certified Cloud Practitioner is the most common starting point for people breaking into the cloud industry.</p>
<p>Consider the AWS Certified Cloud Practitioner if:</p>
<ul>
<li><p>You are new to cloud and need to learn the fundamentals</p>
</li>
<li><p>You are in the executive, management, or sales level and need to acquire strategic information about cloud for adoption or migration</p>
</li>
<li><p>You are a Senior Cloud Engineer or Solutions Architect who needs to reset or refresh your AWS knowledge after working for multiple years</p>
</li>
</ul>
<p>No matter your path towards a cloud role, the AWS Certified Cloud Practitioner provides fundamental knowledge that you shouldn't skip.</p>
<p>Here are all the sections in this comprehensive course:</p>
<h3 id="heading-introduction">Introduction</h3>
<p>🎤 Is Certified Cloud Practitioner right for me?<br>🎤 Exam Guide<br>🎤 Practice Exam Sample<br>🎤 Case Study Question Type<br>🎤 Validators</p>
<h3 id="heading-cloud-concepts">Cloud Concepts</h3>
<p>🎤 What is Cloud Computing<br>🎤 Evolution of Cloud Hosting<br>🎤 What is Amazon<br>🎤 What is AWS<br>🎤 What is a Cloud Service Provider<br>🎤 Landscape of CSPs<br>🎤 Gartner Magic Quadrant for Cloud<br>🎤 Common Cloud Services<br>🎤 AWS Technology Overview<br>🎤 AWS Services Preview<br>🎤 Evolution of Computing<br>🎤 Types of Cloud Computing<br>🎤 Cloud Computing Deployment Models<br>🎤 Deployment Model Use Cases</p>
<h3 id="heading-getting-started">Getting Started</h3>
<p>🎤 Create an AWS Account<br>🎤 Create IAM User<br>🎤 AWS Region Selector<br>🎤 Overbilling Story<br>🎤 AWS Budgets<br>🎤 AWS Free Tier<br>🎤 Billing Alarm<br>🎤 Turning on MFA</p>
<h3 id="heading-digital-transformation">Digital Transformation</h3>
<p>🎤 Innovation Waves<br>🎤 Burning Platform<br>🎤 Digital Transformation Checklist<br>🎤 Evolution of Computing Power<br>🎤 Amazon Braket</p>
<h3 id="heading-the-benefits-of-cloud">The Benefits of Cloud</h3>
<p>🎤 The Benefits of the Cloud<br>🎤 The Six Advantages of Cloud<br>🎤 The Six Advantages of Cloud Doc Reference<br>🎤 The Seven Advantages of Cloud</p>
<h3 id="heading-global-infrastructure">Global Infrastructure</h3>
<p>🎤 AWS Global Infrastructure Overview<br>🎤 AWS Global Infrastructure Follow Along<br>🎤 Regions<br>🎤 Regional vs Global Services<br>🎤 Availability Zones AZs<br>🎤 Regions and AZ Visualized<br>🎤 Selecting Regions and AZs Follow Along<br>🎤 Fault Tolerance<br>🎤 AWS Global Network<br>🎤 Points of Presence PoP<br>🎤 Tier 1<br>🎤 AWS Services using PoPs<br>🎤 AWS Direct Connect<br>🎤 Direct Connect Locations<br>🎤 AWS Local Zones<br>🎤 Wavelength Zones<br>🎤 Data Residency<br>🎤 AWS for Government<br>🎤 GovCloud<br>🎤 AWS in China<br>🎤 AWS in China Follow Along<br>🎤 Sustainability<br>🎤 Sustainability Follow Along<br>🎤 AWS Ground Station<br>🎤 AWS Outposts</p>
<h3 id="heading-cloud-architecture">Cloud Architecture</h3>
<p>🎤 Cloud Architecture Terminologies<br>🎤 High Availability<br>🎤 High Scalability<br>🎤 High Elasticity<br>🎤 Fault Tolerance<br>🎤 High Durability<br>🎤 Business Continuity Plan<br>🎤 Disaster Recovery Options<br>🎤 RTO Visualized<br>🎤 RPO Visualized<br>🎤 Architectural diagram example<br>🎤 HA Follow Along</p>
<h3 id="heading-management-and-developer-tools">Management and Developer Tools</h3>
<p>🎤 AWS API<br>🎤 AWS API Follow Along<br>🎤 AWS Management Console<br>🎤 AWS Management Console Follow Along<br>🎤 Service Console<br>🎤 Service Console Follow Along<br>🎤 AWS Account ID<br>🎤 AWS Account ID Follow Along<br>🎤 AWS Tools for PowerShell<br>🎤 AWS Tools for PowerShell Follow Along<br>🎤 Amazon Resource Names<br>🎤 ARN Follow Along<br>🎤 AWS CLI<br>🎤 AWS CLI Follow Along<br>🎤 AWS SDK<br>🎤 AWS SDK Follow Along<br>🎤 AWS CloudShell<br>🎤 Infrastructure as Code<br>🎤 CloudFormation<br>🎤 CloudFormation Follow Along<br>🎤 CDK<br>🎤 CDK Follow Along<br>🎤 AWS Toolkit for VSCode<br>🎤 Access Keys<br>🎤 Access Keys Follow Along<br>🎤 AWS Documentation<br>🎤 AWS Documentation Follow Along</p>
<h3 id="heading-shared-responsibility-model">Shared Responsibility Model</h3>
<p>🎤 Introduction to Shared Responsibility Model<br>🎤 AWS Shared Responsibility Model<br>🎤 Types of Cloud Responsibilities<br>🎤 Shared Responsibility for Compute<br>🎤 Shared Responsibility Model Alternate<br>🎤 Shared Responsibility Model Architecture</p>
<h3 id="heading-compute">Compute</h3>
<p>🎤 EC2 Overview<br>🎤 VMs Containers and Serverless<br>🎤 Compute Follow Along<br>🎤 High Performance Computing HPC<br>🎤 HPC Follow Along<br>🎤 Edge and Hybrid<br>🎤 Edge Computing Follow Along<br>🎤 Cost Capacity Management</p>
<h3 id="heading-storage-services">Storage Services</h3>
<p>🎤 Types of Storage Services<br>🎤 Introduction to S3<br>🎤 S3 Storage Classes<br>🎤 AWS Snow Family<br>🎤 Storage Services<br>🎤 S3 Follow Along<br>🎤 EBS Follow Along<br>🎤 EFS Follow Along<br>🎤 Snow Family Follow Along</p>
<h3 id="heading-databases">Databases</h3>
<p>🎤 What is a database<br>🎤 What is a data warehouse<br>🎤 What is a key value store<br>🎤 What is a document database<br>🎤 NoSQL Database Services<br>🎤 Relational Database Services<br>🎤 Other Database Services<br>🎤 DynamoDB Follow Along<br>🎤 RDS Follow Along<br>🎤 Redshift Follow Along</p>
<h3 id="heading-networking">Networking</h3>
<p>🎤 Cloud Native Networking Services<br>🎤 Enterprise Hybrid Networking Services<br>🎤 Virtual Private Cloud VPC Subnets<br>🎤 Security Groups vs NACLs<br>🎤 Security Groups vs NACLs Follow Along<br>🎤 AWS CloudFront</p>
<h3 id="heading-ec2">EC2</h3>
<p>🎤 Introduction to EC2<br>🎤 EC2 Instance Families<br>🎤 EC2 Instance Types<br>🎤 Dedicated Host vs Dedicated Instances<br>🎤 EC2 Tenancy<br>🎤 Launch an EC2 SSH and Sessions Manager<br>🎤 Elastic IP<br>🎤 AMI and Launch Template<br>🎤 Launch an ASG<br>🎤 Launch an ALB<br>🎤 Cleanup</p>
<h3 id="heading-ec2-pricing-models">EC2 Pricing Models</h3>
<p>🎤 EC2 Pricing Models<br>🎤 On Demand<br>🎤 Reserved Instances<br>🎤 RI Attributes<br>🎤 Regional and Zonal RI<br>🎤 RI Limits<br>🎤 Capacity Reservations<br>🎤 Standard vs Convertible RI<br>🎤 RI Marketplace<br>🎤 Spot Instances<br>🎤 Dedicated Instances<br>🎤 Savings Plan</p>
<h3 id="heading-identity">Identity</h3>
<p>🎤 Zero Trust Model<br>🎤 Zero Trust on AWS<br>🎤 Zero Trust on AWS with Third Parties<br>🎤 Directory Service<br>🎤 Active Directory<br>🎤 Identity Providers<br>🎤 Single Sign On<br>🎤 LDAP<br>🎤 Multi Factor Authentication<br>🎤 Security Keys<br>🎤 AWS IAM<br>🎤 Anatomy of an IAM Policy<br>🎤 IAM Policies Follow Along<br>🎤 Principle of Least Privilege<br>🎤 AWS Account Root User<br>🎤 AWS SSO</p>
<h3 id="heading-application-integration">Application Integration</h3>
<p>🎤 Introduction to Application Integration<br>🎤 Queueing and SQS<br>🎤 Streaming and Kinesis<br>🎤 Pub Sub and SNS<br>🎤 API Gateway and Amazon API Gateway<br>🎤 State Machines and AWS Step Functions<br>🎤 Event Bus and Amazon Event Bridge<br>🎤 Application Integration Services</p>
<h3 id="heading-containers">Containers</h3>
<p>🎤 VMs vs Containers<br>🎤 What are Microservices<br>🎤 Kubernetes<br>🎤 Docker<br>🎤 Podman<br>🎤 Container Services</p>
<h3 id="heading-governance">Governance</h3>
<p>🎤 Organizations and Accounts<br>🎤 AWS Control Tower<br>🎤 AWS Config<br>🎤 AWS Config Follow Along<br>🎤 AWS Quick Starts<br>🎤 AWS Quick Starts Follow Along<br>🎤 Tagging<br>🎤 Tag Name Follow Along<br>🎤 Resource Groups<br>🎤 Resource Groups Follow Along<br>🎤 Business Centric Services</p>
<h3 id="heading-provisioning">Provisioning</h3>
<p>🎤 Provisioning Services<br>🎤 AWS Elastic Beanstalk<br>🎤 AWS Elastic Beanstalk Follow Along</p>
<h3 id="heading-serverless">Serverless</h3>
<p>🎤 What is Serverless<br>🎤 Serverless Services</p>
<h3 id="heading-windows-on-aws">Windows on AWS</h3>
<p>🎤 Windows on AWS<br>🎤 EC2 Windows Follow Along<br>🎤 AWS License Manager</p>
<h3 id="heading-logging">Logging</h3>
<p>🎤 Logging Services<br>🎤 AWS Cloud Trail<br>🎤 CloudWatch Alarm<br>🎤 Anatomy of an Alarm<br>🎤 Log Streams and Events<br>🎤 Log Insights<br>🎤 CloudWatch Metrics<br>🎤 AWS CloudTrail Follow Along</p>
<h3 id="heading-ml-ai-bigdata">ML AI BigData</h3>
<p>🎤 Introduction to ML and AI<br>🎤 AI and ML Services<br>🎤 BigData and Analytics Services<br>🎤 Amazon QuickSight<br>🎤 QuickSight Follow Along<br>🎤 Machine Learning and AI Services Extended<br>🎤 Generative AI<br>🎤 ML and DL Frameworks and Tools<br>🎤 Apache MXNet<br>🎤 What is Intel<br>🎤 Intel Xeon Scalable and Intel Gaudi<br>🎤 What is a GPU<br>🎤 What is CUDA</p>
<h3 id="heading-aws-well-architected-framework">AWS Well Architected Framework</h3>
<p>🎤 AWS Well Architected Framework<br>🎤 General Definitions<br>🎤 On Architecture<br>🎤 Amazon Leadership Principles<br>🎤 General Design Principles<br>🎤 Anatomy of a Pillar<br>🎤 Operational Excellence<br>🎤 Security<br>🎤 Reliability<br>🎤 Performance Efficiency<br>🎤 Cost Optimization<br>🎤 AWS Well Architected Tool<br>🎤 Well Architected Framework and Tool Follow Along<br>🎤 AWS Architecture Center</p>
<h3 id="heading-tco-and-migration">TCO and Migration</h3>
<p>🎤 Total Cost of Ownership TCO<br>🎤 CAPEX vs OPEX<br>🎤 Shifting IT Personnel<br>🎤 AWS Pricing Calculator<br>🎤 AWS Pricing Calculator Follow Along<br>🎤 Migration Evaluator<br>🎤 VM Import Export<br>🎤 Database Migration Service<br>🎤 Cloud Adoption Framework</p>
<h3 id="heading-billing-and-pricing">Billing and Pricing</h3>
<p>🎤 AWS Free Services<br>🎤 AWS Support Plans<br>🎤 Technical Account Manager<br>🎤 AWS Support Follow Along<br>🎤 AWS Marketplace<br>🎤 AWS Marketplace Follow Along<br>🎤 Consolidated Billing<br>🎤 Consolidated Billing Volume Discounts<br>🎤 AWS Trusted Advisor<br>🎤 AWS Trusted Advisor Follow Along<br>🎤 SLAs<br>🎤 AWS SLA Examples<br>🎤 AWS SLA Follow Along<br>🎤 Service Health Dashboard<br>🎤 AWS Personal Health Dashboard<br>🎤 AWS Abuse<br>🎤 AWS Abuse Report Follow Along<br>🎤 AWS Free Tier<br>🎤 AWS Credits<br>🎤 AWS Partner Network<br>🎤 AWS Budgets<br>🎤 AWS Budget Reports<br>🎤 AWS Cost and Usage Reports<br>🎤 Cost Allocation Tags<br>🎤 Billing Alarms<br>🎤 AWS Cost Explorer<br>🎤 AWS Cost Explorer Follow Along<br>🎤 Programmatic Pricing APIs<br>🎤 AWS Savings Plan Follow Along</p>
<h3 id="heading-security">Security</h3>
<p>🎤 Defense In Depth<br>🎤 CIA Triad<br>🎤 Vulnerabilities<br>🎤 Encryption<br>🎤 Cyphers<br>🎤 Cryptographic Keys<br>🎤 Hashing and Salting<br>🎤 Digital Signatures and Signing<br>🎤 In Transit vs At Rest Encryption<br>🎤 Compliance Programs<br>🎤 AWS Compliance Programs Follow Along<br>🎤 Pen Testing<br>🎤 Pen Testing Follow Along<br>🎤 AWS Artifact<br>🎤 AWS Artifact Follow Along<br>🎤 AWS Inspector<br>🎤 DDoS<br>🎤 AWS Shield<br>🎤 AWS Guard Duty<br>🎤 AWS Guard Duty Follow Along<br>🎤 Amazon Macie<br>🎤 AWS VPN<br>🎤 AWS WAF<br>🎤 AWS WAF Follow Along<br>🎤 Hardware Security Module<br>🎤 AWS KMS<br>🎤 AWS KMS Follow Along<br>🎤 CloudHSM</p>
<h3 id="heading-variation-study">Variation Study</h3>
<p>🎤 Know Your Initialisms<br>🎤 AWS Config AWS AppConfig<br>🎤 SNS vs SQS<br>🎤 SNS vs SES vs PinPoint vs Workmail<br>🎤 Amazon Inspector vs AWS Trusted Advisor<br>🎤 Connect Named Services<br>🎤 Elastic Transcoder vs MediaConvert<br>🎤 AWS Artifact vs Amazon Inspector<br>🎤 ELB variants</p>
<p>You can watch the entire <a href="https://youtu.be/7HKot-brXFE">course on the freeCodeCamp.org YouTube channel</a> (14-hour course).</p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/7HKot-brXFE" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Migrate to S3 Native State Locking in Terraform ]]>
                </title>
                <description>
                    <![CDATA[ If you've been running Terraform on AWS for any length of time, you know the setup: an S3 bucket for state storage, a DynamoDB table for state locking, and a handful of IAM policies tying them togethe ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-migrate-to-s3-native-state-locking-in-terraform/</link>
                <guid isPermaLink="false">69fd19239f93a850a430069b</guid>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Terraform ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Infrastructure as code ]]>
                    </category>
                
                    <category>
                        <![CDATA[ S3 ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tolani Akintayo ]]>
                </dc:creator>
                <pubDate>Thu, 07 May 2026 22:58:43 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/9619ad45-15c5-4be7-9221-ed4b76bc2b24.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've been running Terraform on AWS for any length of time, you know the setup: an S3 bucket for state storage, a DynamoDB table for state locking, and a handful of IAM policies tying them together. It works. It has worked for years.</p>
<p>But it has always carried a cost that rarely gets discussed openly. That cost isn't just money, though a DynamoDB table with on-demand billing adds up across multiple teams and environments.</p>
<p>The real cost is complexity. Every new AWS environment needs both resources provisioned before Terraform can manage anything else. Every engineer who sets up their first Terraform backend has to understand why two completely different AWS services are responsible for what is logically one thing: storing and protecting state. And every incident involving a stuck lock has required someone to manually delete a record from DynamoDB to unblock the team.</p>
<p>In November 2024, AWS announced that S3 now supports native object locking for Terraform state files, meaning <strong>DynamoDB is no longer required for state locking</strong>. Terraform 1.10 added support for this feature, and it's now generally available.</p>
<p>In this tutorial, you'll learn:</p>
<ul>
<li><p>What S3 native locking is and how it works</p>
</li>
<li><p>How to set it up from scratch if you're starting a new project</p>
</li>
<li><p>How to migrate an existing S3 + DynamoDB setup to S3 native locking safely</p>
</li>
<li><p>How to verify locking is working and handle edge cases</p>
</li>
</ul>
<p>By the end, you'll have a simpler, cleaner Terraform backend with one fewer AWS resource to manage.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-terraform-state-locking">What Is Terraform State Locking?</a></p>
</li>
<li><p><a href="#heading-what-is-s3-native-state-locking">What Is S3 Native State Locking?</a></p>
</li>
<li><p><a href="#heading-how-s3-native-locking-compares-to-the-s3-dynamodb-approach">How S3 Native Locking Compares to the S3 + DynamoDB Approach</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-part-1-fresh-setup-how-to-configure-s3-native-locking-from-scratch">Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch</a></p>
<ul>
<li><p><a href="#heading-step-1-create-the-s3-bucket-with-versioning-and-encryption">Step 1: Create the S3 Bucket with Versioning and Encryption</a></p>
</li>
<li><p><a href="#heading-step-2-configure-the-terraform-backend-with-native-locking">Step 2: Configure the Terraform Backend with Native Locking</a></p>
</li>
<li><p><a href="#heading-step-3-initialize-and-verify">Step 3: Initialize and Verify</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-part-2-migration-how-to-move-from-s3-dynamodb-to-s3-native-locking">Part 2: Migration – How to Move from S3 + DynamoDB to S3 Native Locking</a></p>
<ul>
<li><p><a href="#heading-step-1-verify-your-current-setup">Step 1: Verify Your Current Setup</a></p>
</li>
<li><p><a href="#heading-step-2-enable-object-lock-on-the-existing-s3-bucket">Step 2: Enable Object Lock on the Existing S3 Bucket</a></p>
</li>
<li><p><a href="#heading-step-3-update-the-terraform-backend-configuration">Step 3: Update the Terraform Backend Configuration</a></p>
</li>
<li><p><a href="#heading-step-4-reinitialize-terraform">Step 4: Reinitialize Terraform</a></p>
</li>
<li><p><a href="#heading-step-5-verify-the-migration">Step 5: Verify the Migration</a></p>
</li>
<li><p><a href="#heading-step-6-clean-up-the-dynamodb-table">Step 6: Clean Up the DynamoDB Table</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-how-to-verify-that-locking-is-working">How to Verify That Locking Is Working</a></p>
</li>
<li><p><a href="#heading-how-to-handle-a-stuck-lock">How to Handle a Stuck Lock</a></p>
</li>
<li><p><a href="#heading-rollback-plan-if-something-goes-wrong">Rollback Plan: If Something Goes Wrong</a></p>
</li>
<li><p><a href="#heading-security-best-practices-for-your-state-bucket">Security Best Practices for Your State Bucket</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-references">References</a></p>
</li>
</ul>
<h2 id="heading-what-is-terraform-state-locking">What Is Terraform State Locking?</h2>
<p>Before looking at the new approach, it helps to understand what state locking is solving.</p>
<p>Terraform stores everything it knows about your infrastructure in a <strong>state file</strong> – a JSON document that maps your configuration to real AWS resources. When you run <code>terraform apply</code>, Terraform reads this file, calculates the difference between the current state and your configuration, and makes the necessary changes.</p>
<p>The problem arises when two engineers or two CI/CD pipelines try to apply changes at the same time. If both read the state file simultaneously, calculate changes independently, and then both write back, you get a <strong>race condition</strong>. The second write overwrites the changes from the first, and your state is now out of sync with reality. This is a serious problem that can cause resources to be untracked, duplicated, or destroyed unexpectedly.</p>
<p><strong>State locking</strong> solves this by creating a lock when any operation starts that could modify state. If a lock already exists, Terraform refuses to proceed and reports who holds the lock and when it was acquired. Only one operation can hold the lock at a time. When the operation completes, the lock is released.</p>
<pre><code class="language-plaintext">Terraform Run A                 State File / Lock                Terraform Run B
(User 1)                         (S3/DynamoDB)                   (User 2)

   |                                   |                            |
   |------- 1. Acquire Lock ----------&gt;|                            |
   |                                   |                            |
   |&lt;------ 2. Lock Granted -----------|                            |
   |                                   |                            |
   |                                   |&lt;--- 3. Acquire Lock -------|
   |            [PROCESSING]           |                            |
   |      (Modifying Infrastructure)   |------- 4. Lock Denied ----&gt;|
   |                                   |        (Wait / Retry)      |
   |                                   |                            |
   |------- 5. Release Lock ----------&gt;|                            |
   |                                   |                            |
   |           [COMPLETED]             |------- 6. Lock Granted ---&gt;|
   |                                   |                            |
   |                                   |       [PROCESSING]         |
   |                                   | (Modifying Infrastructure) |
   |                                   |                            |
</code></pre>
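<p>To make the race condition described above concrete, here is a minimal local sketch. It uses a plain temporary file as a stand-in for the state file (an assumption for illustration only; real Terraform state is a JSON document in S3), and two simulated runs that both read before either one writes:</p>

```shell
# Stand-in for the shared state file (hypothetical, local only).
state=$(mktemp)
echo "resources: vpc" > "$state"

# Both runs read the same starting state before either one writes.
seen_by_a=$(cat "$state")
seen_by_b=$(cat "$state")

# Each run adds its own change to the copy it read, then writes it back.
echo "$seen_by_a, bucket" > "$state"   # Run A records a new bucket
echo "$seen_by_b, queue" > "$state"    # Run B overwrites Run A's write

cat "$state"   # prints "resources: vpc, queue" (Run A's bucket is lost)
```

<p>With a lock in place, Run B would not be allowed to read until Run A had finished writing, so both changes would survive.</p>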
<h2 id="heading-what-is-s3-native-state-locking">What Is S3 Native State Locking?</h2>
<p>Previously, Terraform's S3 backend used a DynamoDB table as the locking mechanism. When a lock was needed, Terraform wrote a record to DynamoDB with a <code>LockID</code> primary key. DynamoDB's conditional writes guaranteed that only one process could create that record, which is what made the locking atomic.</p>
<p>S3 native locking uses <strong>S3 Object Lock</strong> instead. S3 Object Lock is an S3 feature originally designed to enforce WORM (Write Once, Read Many) compliance for regulatory requirements. AWS extended this capability to support Terraform's state locking workflow.</p>
<p>When S3 native locking is enabled in your Terraform backend:</p>
<ol>
<li><p>Terraform writes your state to a <code>.tfstate</code> object in S3 (as before)</p>
</li>
<li><p>To acquire a lock, Terraform uses <strong>S3's conditional write operations</strong> – specifically the <code>if-none-match</code> conditional header to create a lock file atomically</p>
</li>
<li><p>If the lock file already exists, S3 rejects the write, and Terraform reports that a lock is held</p>
</li>
<li><p>When the operation completes, Terraform deletes the lock file to release the lock</p>
</li>
</ol>
<p>The key difference from DynamoDB: the entire locking mechanism lives inside S3. No second service. No second set of IAM permissions. No second resource to provision.</p>
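<p>The four steps above can be approximated locally to show why a create-only-if-absent write is enough to build a lock. This is an analogy rather than what Terraform actually runs: the function names <code>acquire_lock</code> and <code>release_lock</code> are hypothetical, and the shell's <code>noclobber</code> option stands in for an S3 write sent with <code>If-None-Match: *</code>.</p>

```shell
# Local analogy only: with noclobber set, ">" fails when the target already
# exists, which mirrors an S3 PutObject sent with "If-None-Match: *".
acquire_lock() {
  local lockfile=$1 who=$2
  ( set -o noclobber; printf '{"Who": "%s"}\n' "$who" > "$lockfile" ) 2>/dev/null
}

# Releasing the lock is just deleting the lock file, as in step 4.
release_lock() {
  rm -f "$1"
}

lock="$(mktemp -d)/terraform.tfstate.tflock"

acquire_lock "$lock" "run-a" && echo "run-a: lock acquired"
acquire_lock "$lock" "run-b" || echo "run-b: lock held by $(cat "$lock")"
release_lock "$lock"
acquire_lock "$lock" "run-b" && echo "run-b: lock acquired"
# prints:
#   run-a: lock acquired
#   run-b: lock held by {"Who": "run-a"}
#   run-b: lock acquired
```

<p>The important property is that checking for the lock and creating it happen in a single operation, so two runs can never both believe they created it.</p>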
<p><strong>Note:</strong> This feature requires Terraform version <strong>1.10.0 or later</strong> and an S3 bucket with <strong>Object Lock enabled</strong>. Object Lock is normally enabled when the bucket is created. Existing buckets need an extra step, which we'll cover in Part 2.</p>
<h2 id="heading-how-s3-native-locking-compares-to-the-s3-dynamodb-approach">How S3 Native Locking Compares to the S3 + DynamoDB Approach</h2>
<table>
<thead>
<tr>
<th><strong>Aspect</strong></th>
<th><strong>S3 + DynamoDB (Old)</strong></th>
<th><strong>S3 Native Locking (New)</strong></th>
</tr>
</thead>
<tbody><tr>
<td><strong>AWS services required</strong></td>
<td>S3 + DynamoDB</td>
<td>S3 only</td>
</tr>
<tr>
<td><strong>IAM permissions needed</strong></td>
<td>S3 + DynamoDB permissions</td>
<td>S3 permissions only</td>
</tr>
<tr>
<td><strong>Terraform version</strong></td>
<td>Any</td>
<td>1.10.0 or later</td>
</tr>
<tr>
<td><strong>Setup complexity</strong></td>
<td>Two resources, two IAM scopes</td>
<td>One resource</td>
</tr>
<tr>
<td><strong>Stuck lock resolution</strong></td>
<td>Delete DynamoDB record</td>
<td>Delete S3 lock file</td>
</tr>
<tr>
<td><strong>Cost</strong></td>
<td>S3 storage + DynamoDB on-demand</td>
<td>S3 storage only</td>
</tr>
<tr>
<td><strong>Object Lock requirement</strong></td>
<td>Not required</td>
<td>Required on S3 bucket</td>
</tr>
<tr>
<td><strong>Locking mechanism</strong></td>
<td>DynamoDB conditional writes</td>
<td>S3 conditional writes (<code>if-none-match</code>)</td>
</tr>
<tr>
<td><strong>State versioning</strong></td>
<td>S3 Versioning (recommended)</td>
<td>S3 Versioning (required for full safety)</td>
</tr>
</tbody></table>
<p>The functional behavior from Terraform's perspective is identical. Locking works the same way. The lock information displayed when a lock is held has the same structure. The only difference is what happens under the hood.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have the following in place:</p>
<ul>
<li><strong>Terraform 1.10.0 or later</strong> installed. Check your version:</li>
</ul>
<pre><code class="language-shell">terraform version
</code></pre>
<p>If you need to upgrade, follow the <a href="https://developer.hashicorp.com/terraform/install">official upgrade guide</a>.</p>
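<p>If you want to enforce the version requirement in a script or pipeline rather than checking it by eye, a small comparison helper is one option. The sketch below assumes bash; <code>version_ge</code> is a hypothetical helper name, and it only handles plain dotted versions such as <code>1.10.0</code> (no pre-release suffixes).</p>

```shell
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Compares dotted numeric versions such as 1.10.0 field by field.
version_ge() {
  local IFS=.
  local -a have want
  read -r -a have <<< "$1"
  read -r -a want <<< "$2"
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    if (( h > w )); then return 0; fi
    if (( h < w )); then return 1; fi
  done
  return 0
}

version_ge 1.11.2 1.10.0 && echo "1.11.2 is new enough"
version_ge 1.9.8 1.10.0 || echo "1.9.8 is too old"

# Against a real installation (assumes terraform and jq are on PATH):
#   current=$(terraform version -json | jq -r '.terraform_version')
#   version_ge "$current" 1.10.0 || echo "Upgrade Terraform first"
```

<p>A field-by-field numeric comparison matters here because a plain string comparison would rank <code>1.9.8</code> above <code>1.10.0</code>.</p>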
<ul>
<li><strong>AWS CLI</strong> installed and configured with credentials that have permission to create and manage S3 buckets.</li>
</ul>
<pre><code class="language-shell">aws --version
aws sts get-caller-identity   # confirm you're authenticated
</code></pre>
<ul>
<li><p><strong>IAM permissions</strong> to perform the following S3 actions:</p>
<ul>
<li><p><code>s3:CreateBucket</code></p>
</li>
<li><p><code>s3:PutBucketVersioning</code></p>
</li>
<li><p><code>s3:PutBucketEncryption</code></p>
</li>
<li><p><code>s3:PutObjectLegalHold</code></p>
</li>
<li><p><code>s3:PutObjectRetention</code></p>
</li>
<li><p><code>s3:GetObject</code></p>
</li>
<li><p><code>s3:PutObject</code></p>
</li>
<li><p><code>s3:DeleteObject</code></p>
</li>
<li><p><code>s3:ListBucket</code></p>
</li>
</ul>
</li>
<li><p>For the <strong>migration path</strong>: access to your existing Terraform project and the S3 bucket and DynamoDB table currently in use.</p>
</li>
</ul>
<h2 id="heading-part-1-fresh-setup-how-to-configure-s3-native-locking-from-scratch">Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch</h2>
<p>Follow this section if you're starting a new Terraform project and want to use S3 native locking from the beginning.</p>
<h3 id="heading-step-1-create-the-s3-bucket-with-versioning-and-encryption">Step 1: Create the S3 Bucket with Versioning and Encryption</h3>
<p>Object Lock <strong>must be enabled at bucket creation time</strong>. You can't add it afterward through the standard console flow. Create the bucket using the AWS CLI with Object Lock enabled:</p>
<pre><code class="language-shell">aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region us-east-1 \
  --object-lock-enabled-for-bucket
</code></pre>
<p><strong>Note:</strong> For regions other than <code>us-east-1</code>, add the <code>--create-bucket-configuration</code> flag.</p>
<pre><code class="language-shell">aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1 \
  --object-lock-enabled-for-bucket
</code></pre>
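<p>If you create state buckets in more than one region, the <code>us-east-1</code> special case is easy to get wrong. The sketch below wraps the two commands above in a hypothetical helper, <code>create_bucket_cmd</code>, that only prints the command it would run, so you can review it before executing anything:</p>

```shell
# Prints the aws s3api create-bucket command for a bucket and region.
# us-east-1 is the one region that must omit --create-bucket-configuration.
create_bucket_cmd() {
  local bucket=$1 region=$2
  local cmd="aws s3api create-bucket --bucket $bucket --region $region"
  if [ "$region" != "us-east-1" ]; then
    cmd="$cmd --create-bucket-configuration LocationConstraint=$region"
  fi
  printf '%s --object-lock-enabled-for-bucket\n' "$cmd"
}

create_bucket_cmd your-project-terraform-state us-east-1
create_bucket_cmd your-project-terraform-state eu-west-1
```

<p>Printing the command instead of running it is a deliberate choice for a sketch: bucket creation with Object Lock is not something you want to trigger by accident.</p>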
<p>Now enable versioning on the bucket. Versioning is required alongside Object Lock and allows Terraform to recover previous state versions if something goes wrong:</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-project-terraform-state \
  --versioning-configuration Status=Enabled
</code></pre>
<p>Enable server-side encryption so your state files are encrypted at rest:</p>
<pre><code class="language-shell">aws s3api put-bucket-encryption \
  --bucket your-project-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
</code></pre>
<p>Block all public access to the bucket. A Terraform state file contains resource IDs, IP addresses, and potentially sensitive values. It should never be publicly accessible:</p>
<pre><code class="language-shell">aws s3api put-public-access-block \
  --bucket your-project-terraform-state \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
</code></pre>
<p>Verify the bucket configuration:</p>
<pre><code class="language-shell"># Confirm Object Lock is enabled
aws s3api get-object-lock-configuration \
  --bucket your-project-terraform-state
 
# Confirm versioning is enabled
aws s3api get-bucket-versioning \
  --bucket your-project-terraform-state
 
# Confirm encryption is configured
aws s3api get-bucket-encryption \
  --bucket your-project-terraform-state
</code></pre>
<p>Expected output for the Object Lock check:</p>
<pre><code class="language-json">{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}
</code></pre>
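<p>To turn these visual checks into something a setup script can assert on, you could pipe each command's JSON into a small matcher. This is a minimal sketch: <code>json_key_enabled</code> is a hypothetical helper, and it uses a plain text match to stay dependency-free (a JSON-aware tool such as <code>jq</code> would be more robust).</p>

```shell
# Succeeds when the JSON on stdin contains the given key with the value
# "Enabled". Plain-text matching only; it does not parse the JSON.
json_key_enabled() {
  local key=$1
  grep -Eq "\"${key}\"[[:space:]]*:[[:space:]]*\"Enabled\""
}

# Demonstration with the expected output shown above.
printf '{"ObjectLockConfiguration": {"ObjectLockEnabled": "Enabled"}}\n' \
  | json_key_enabled ObjectLockEnabled && echo "Object Lock: OK"

# Against a real bucket (assumes the AWS CLI is configured):
#   aws s3api get-object-lock-configuration --bucket your-project-terraform-state \
#     | json_key_enabled ObjectLockEnabled || echo "Object Lock is not enabled"
#   aws s3api get-bucket-versioning --bucket your-project-terraform-state \
#     | json_key_enabled Status || echo "Versioning is not enabled"
```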
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/2b2e56cf-687f-4932-a61e-ed7cc33ea6f1.png" alt="Terminal showing AWS CLI verification commands confirming S3 bucket is configured correctly with Object Lock, versioning, and encryption enabled" style="display:block;margin:0 auto" width="1120" height="616" loading="lazy">

<h3 id="heading-step-2-configure-the-terraform-backend-with-native-locking">Step 2: Configure the Terraform Backend with Native Locking</h3>
<p>In your Terraform project, create or update your <code>backend.tf</code> file:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket = "your-project-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
 
    # Enable S3 native state locking
    # Requires Terraform 1.10.0+ and a bucket with Object Lock enabled
    use_lockfile = true
 
    # Encryption at rest
    encrypt = true
  }
}
</code></pre>
<p>The critical difference from the old configuration is the <code>use_lockfile = true</code> parameter. Notice what is <strong>absent</strong>: there's no <code>dynamodb_table</code> argument. No DynamoDB table. No second service.</p>
<p>Here's a direct comparison of the old and new configurations:</p>
<p><strong>Old configuration (S3 + DynamoDB):</strong></p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket         = "your-project-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # this goes away
  }
}
</code></pre>
<p><strong>New configuration (S3 native locking):</strong></p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket       = "your-project-terraform-state"
    key          = "production/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true   # this replaces dynamodb_table
  }
}
</code></pre>
<h3 id="heading-step-3-initialize-and-verify">Step 3: Initialize and Verify</h3>
<p>Run <code>terraform init</code> to initialize the backend:</p>
<pre><code class="language-shell">terraform init
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
 
Terraform has been successfully initialized!
</code></pre>
<p>Run a plan to confirm everything is working end-to-end:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>If locking is working, you'll see a brief pause while Terraform acquires the lock before the plan output appears. You'll also see the lock information if you look at the S3 bucket&nbsp;– a <code>.tflock</code> file will appear temporarily alongside your state file during the operation and disappear when it completes.</p>
<h2 id="heading-part-2-migration-how-to-move-from-s3-dynamodb-to-s3-native-locking">Part 2: Migration&nbsp;– How to Move from S3 + DynamoDB to S3 Native Locking</h2>
<p>Follow this section if you have an <strong>existing Terraform setup</strong> using an S3 bucket and DynamoDB table for state locking, and you want to migrate to S3 native locking.</p>
<p><strong>Important:</strong> Migration requires a maintenance window, or at minimum a period when no Terraform operations are running. You're changing the backend configuration, which means <strong>all team members and CI/CD pipelines must stop running</strong> <code>terraform plan</code> <strong>or</strong> <code>terraform apply</code> <strong>during the migration</strong>. The migration itself usually takes under 10 minutes.</p>
<h3 id="heading-step-1-verify-your-current-setup">Step 1: Verify Your Current Setup</h3>
<p>Before making any changes, document your existing backend configuration and confirm the state file is accessible:</p>
<pre><code class="language-shell"># Confirm your state file is in S3
aws s3 ls s3://your-existing-bucket/path/to/terraform.tfstate
 
# Confirm the DynamoDB table exists
aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table \
  --query 'Table.TableStatus'
</code></pre>
<p>Check your current <code>backend.tf</code> and note the exact values:</p>
<pre><code class="language-hcl"># Your current backend.tf - note these values before changing anything
terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"       # note this
    key            = "path/to/terraform.tfstate"   # note this
    region         = "us-east-1"                   # note this
    encrypt        = true
    dynamodb_table = "your-dynamodb-lock-table"    # this will be removed
  }
}
</code></pre>
<p>Run one final plan to confirm the current state is clean and there are no unexpected changes pending:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>If the plan shows no changes, you're in a safe state to proceed.</p>
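<p>In a pipeline, you may prefer to check for a clean plan without reading the output. Terraform's <code>-detailed-exitcode</code> flag makes <code>terraform plan</code> exit with 0 when there are no changes, 1 on error, and 2 when changes are pending. The sketch below maps those codes to a message; <code>describe_plan_exit</code> is a hypothetical helper name.</p>

```shell
# Translates the exit code of "terraform plan -detailed-exitcode"
# into a short, human-readable status.
describe_plan_exit() {
  case "$1" in
    0) echo "clean: no changes pending" ;;
    1) echo "error: plan failed" ;;
    2) echo "dirty: changes pending" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

describe_plan_exit 0   # prints "clean: no changes pending"
describe_plan_exit 2   # prints "dirty: changes pending"

# Against a real project:
#   terraform plan -detailed-exitcode -input=false > /dev/null
#   describe_plan_exit $?
```

<p>Recording this status before the migration gives you a baseline to compare against when you re-run the plan afterwards.</p>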
<h3 id="heading-step-2-enable-object-lock-on-the-existing-s3-bucket">Step 2: Enable Object Lock on the Existing S3 Bucket</h3>
<p>This is the most important step in the migration. Object Lock is usually configured when a bucket is created, so an existing state bucket is unlikely to have it enabled.</p>
<p>AWS also lets you enable Object Lock on an existing bucket with a direct API call, as long as versioning is already enabled on that bucket.</p>
<p>Run the following AWS CLI command to enable Object Lock on your <strong>existing</strong> bucket:</p>
<pre><code class="language-bash">aws s3api put-object-lock-configuration \
  --bucket your-existing-bucket \
  --object-lock-configuration '{"ObjectLockEnabled": "Enabled"}'
</code></pre>
<p><strong>Note:</strong> This command turns on Object Lock for the bucket <strong>without a default retention rule</strong>, meaning no retention mode or retention period is applied to objects automatically. This is exactly what Terraform's native locking needs: the ability to create and delete lock files, not permanent object retention.</p>
<p>Verify Object Lock is now enabled:</p>
<pre><code class="language-shell">aws s3api get-object-lock-configuration \
  --bucket your-existing-bucket
</code></pre>
<p>Expected output:</p>
<pre><code class="language-json">{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}
</code></pre>
<p>Also verify that versioning is already enabled (it should be if you are running a production Terraform setup):</p>
<pre><code class="language-shell">aws s3api get-bucket-versioning \
  --bucket your-existing-bucket
</code></pre>
<p>Expected output:</p>
<pre><code class="language-json">{
    "Status": "Enabled"
}
</code></pre>
<p>If versioning isn't enabled, enable it before proceeding:</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-existing-bucket \
  --versioning-configuration Status=Enabled
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/cd17df01-3d0a-4f93-9250-3f51627e91c8.png" alt="Terminal output showing successful Object Lock enablement on an existing S3 bucket using the AWS CLI" style="display:block;margin:0 auto" width="1204" height="320" loading="lazy">

<h3 id="heading-step-3-update-the-terraform-backend-configuration">Step 3: Update the Terraform Backend Configuration</h3>
<p>Update your <code>backend.tf</code> to remove the <code>dynamodb_table</code> argument and add <code>use_lockfile = true</code>:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket  = "your-existing-bucket"
    key     = "path/to/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
 
    # Add this:
    use_lockfile = true
 
    # Remove this line entirely:
    # dynamodb_table = "your-dynamodb-lock-table"
  }
}
</code></pre>
<p>Your updated <code>backend.tf</code> should look like this:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket       = "your-existing-bucket"
    key          = "path/to/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}
</code></pre>
<h3 id="heading-step-4-reinitialize-terraform">Step 4: Reinitialize Terraform</h3>
<p>Run <code>terraform init</code> with the <code>-reconfigure</code> flag. This flag tells Terraform that the backend configuration has changed intentionally and to reinitialize without prompting you to copy state (the state is already in the same bucket):</p>
<pre><code class="language-shell">terraform init -reconfigure
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
 
Terraform has been successfully initialized!
</code></pre>
<p><strong>If you see an error here:</strong> The most common cause is that Object Lock wasn't successfully enabled on the bucket. Re-run the verification from Step 2 before proceeding.</p>
<h3 id="heading-step-5-verify-the-migration">Step 5: Verify the Migration</h3>
<p>Run a plan to confirm Terraform is working correctly with the new backend configuration:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>The plan should:</p>
<ul>
<li><p>Complete successfully</p>
</li>
<li><p>Show the same result as the plan you ran in Step 1 (no changes, or the same changes as before)</p>
</li>
<li><p>NOT mention DynamoDB anywhere in its output</p>
</li>
</ul>
<p>To confirm that locking is actually using S3 instead of DynamoDB, open a second terminal and run a plan while the first one is running. You should see the second terminal output a lock error that mentions S3, not DynamoDB:</p>
<pre><code class="language-plaintext">╷
│ Error: Error acquiring the state lock
│
│ Error message: operation error S3: PutObject, https response error StatusCode: 409,
│ RequestID: ..., api error Conflict: Object lock already exists for this key.
│
│ Lock Info:
│   ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
│   Path:      your-existing-bucket/path/to/terraform.tfstate.tflock
│   Operation: OperationTypePlan
│   Who:       user@hostname
│   Version:   1.10.0
│   Created:   2026-05-06 14:22:01 UTC
│   Info:
╵
</code></pre>
<p>The <code>Path</code> field shows <code>.tfstate.tflock</code>, a file in your S3 bucket, not a DynamoDB record. This confirms that locking is now handled entirely by S3.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/e9abb703-af6e-429c-83bb-2ea2dac43a3a.png" alt="Two terminals showing concurrent terraform plan commands, the second one displays a lock error confirming S3 native locking is working" style="display:block;margin:0 auto" width="1264" height="539" loading="lazy">

<h3 id="heading-step-6-clean-up-the-dynamodb-table">Step 6: Clean Up the DynamoDB Table</h3>
<p>Once you've confirmed the migration is working correctly and your team has run at least one successful <code>plan</code> and <code>apply</code> cycle using the new backend, you can remove the DynamoDB table.</p>
<p><strong>Wait at least 24-48 hours before deleting the DynamoDB table</strong> if you have CI/CD pipelines or multiple team members. This gives time to catch any pipeline that wasn't updated with the new backend configuration.</p>
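<p>During that waiting period, one way to look for configurations that were missed is to scan your repositories for backends that still set <code>dynamodb_table</code>. The following is a rough sketch, not a complete audit: <code>find_dynamodb_backends</code> is a hypothetical helper, and it only inspects <code>.tf</code> files on disk, so values passed through <code>-backend-config</code> or pipeline variables would not be caught.</p>

```shell
# Lists .tf files under a directory that still have an active (uncommented)
# dynamodb_table argument. Prints nothing when none are found.
find_dynamodb_backends() {
  local root=${1:-.}
  grep -rlE --include='*.tf' '^[[:space:]]*dynamodb_table[[:space:]]*=' "$root" | sort
}

# Demonstration with two throwaway files: one migrated, one not.
demo=$(mktemp -d)
printf 'terraform {\n  backend "s3" {\n    dynamodb_table = "terraform-state-lock"\n  }\n}\n' > "$demo/old.tf"
printf 'terraform {\n  backend "s3" {\n    use_lockfile = true\n  }\n}\n' > "$demo/new.tf"

find_dynamodb_backends "$demo"   # lists only old.tf
```

<p>Run it from the root of each repository that contains Terraform code. An empty result is a good sign, though not proof, that nothing still depends on the table.</p>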
<p>When you're ready, delete the DynamoDB table:</p>
<pre><code class="language-shell">aws dynamodb delete-table \
  --table-name your-dynamodb-lock-table
</code></pre>
<p>Confirm the deletion:</p>
<pre><code class="language-shell">aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table
</code></pre>
<p>Expected output:</p>
<pre><code class="language-plaintext">An error occurred (ResourceNotFoundException) when calling the DescribeTable operation:
Requested resource not found
</code></pre>
<p>This error confirms that the table is gone. The migration is complete.</p>
<p>If you provisioned the DynamoDB table using Terraform (which is the recommended pattern), remove the resource from your Terraform configuration and run <code>terraform apply</code> to destroy it via Terraform rather than the CLI directly. This keeps your state clean:</p>
<pre><code class="language-hcl"># Remove this entire block from your Terraform configuration:
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
 
  attribute {
    name = "LockID"
    type = "S"
  }
}
</code></pre>
<p>After removing the block, run:</p>
<pre><code class="language-bash">terraform apply
</code></pre>
<p>Terraform will detect that the DynamoDB table resource has been removed from configuration and will destroy the table.</p>
<h2 id="heading-how-to-verify-that-locking-is-working">How to Verify That Locking Is Working</h2>
<p>After completing either the fresh setup or the migration, use this procedure to independently verify that locking is functioning correctly.</p>
<h3 id="heading-method-1-observe-the-lock-file-during-an-operation">Method 1: Observe the lock file during an operation</h3>
<p>In one terminal, start a long-running plan against a configuration with many resources:</p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>While it's running, in a second terminal, check for the lock file in S3:</p>
<pre><code class="language-shell">aws s3 ls s3://your-bucket/path/to/ | grep tflock
</code></pre>
<p>You should see a file like:</p>
<pre><code class="language-plaintext">2026-05-06 14:22:01        512 terraform.tfstate.tflock
</code></pre>
<p>After the plan completes, run the same command again. The <code>.tflock</code> file should be gone.</p>
<h3 id="heading-method-2-read-the-lock-file-contents">Method 2: Read the lock file contents</h3>
<p>While a plan is running, download and read the lock file to see its contents:</p>
<pre><code class="language-shell">aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/current.lock &amp;&amp; cat /tmp/current.lock
</code></pre>
<p>Expected output (formatted for readability):</p>
<pre><code class="language-json">{
  "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "Operation": "OperationTypePlan",
  "Info": "",
  "Who": "tolani@dev-machine",
  "Version": "1.10.0",
  "Created": "2026-05-06T14:22:01.123456789Z",
  "Path": "your-bucket/path/to/terraform.tfstate"
}
</code></pre>
<p>This is the same lock information that Terraform displays when a lock is held. It's now a JSON file in S3 rather than a record in DynamoDB.</p>
<h2 id="heading-how-to-handle-a-stuck-lock">How to Handle a Stuck Lock</h2>
<p>With the DynamoDB backend, resolving a stuck lock meant deleting a record from the DynamoDB table. With S3 native locking, it means deleting the <code>.tflock</code> file from S3.</p>
<p>A lock can get stuck if:</p>
<ul>
<li><p>A <code>terraform apply</code> or <code>plan</code> process was killed mid-execution</p>
</li>
<li><p>A CI/CD pipeline runner crashed during a Terraform operation</p>
</li>
<li><p>A network interruption prevented the lock release from completing</p>
</li>
</ul>
<p>Here's how you can check for a stuck lock:</p>
<pre><code class="language-shell">aws s3 ls s3://your-bucket/path/to/ | grep tflock
</code></pre>
<p>If a <code>.tflock</code> file exists and no Terraform operation is currently running, it is a stuck lock.</p>
<p>You can also read the lock to understand who held it:</p>
<pre><code class="language-shell">aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/stuck.lock &amp;&amp; cat /tmp/stuck.lock
</code></pre>
<p>This tells you who (<code>Who</code> field) was running the operation, what operation it was (<code>Operation</code> field), and when it was acquired (<code>Created</code> field).</p>
<p>To release the lock, run <code>terraform force-unlock</code> with the lock ID:</p>
<pre><code class="language-shell">terraform force-unlock LOCK-ID
</code></pre>
<p>Replace <code>LOCK-ID</code> with the <code>ID</code> value from the lock file contents. For example:</p>
<pre><code class="language-shell">terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890
</code></pre>
<p>Terraform will confirm:</p>
<pre><code class="language-plaintext">Do you really want to force-unlock?
  Terraform will remove the lock on the remote state.
  This will allow local Terraform commands to modify this state, even though it
  may be still be in use. Only 'yes' will be accepted to confirm.
 
  Enter a value: yes
 
Terraform state has been successfully unlocked!
</code></pre>
<p>If <code>terraform force-unlock</code> isn't an option (for example, because you are working in a CI environment without Terraform available), you can delete the lock file directly with the AWS CLI:</p>
<pre><code class="language-shell">aws s3 rm s3://your-bucket/path/to/terraform.tfstate.tflock
</code></pre>
<p><strong>Only delete the lock file if you are certain no Terraform operation is currently running.</strong> Deleting a lock that is actively held by a running operation will allow a second concurrent operation to start, which is exactly the race condition locking is designed to prevent.</p>
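<p>Before removing a lock, it can help to check how old it is. The following is a minimal Python sketch, not part of Terraform itself. It assumes the lock file has already been downloaded and that its <code>Created</code> field is a UTC timestamp with fractional seconds and a trailing <code>Z</code>. The function name <code>lock_age_minutes</code> and the sample values are hypothetical.</p>
<pre><code class="language-python">import json
from datetime import datetime, timezone


def lock_age_minutes(lock_text: str, now: datetime) -> float:
    """Return how many minutes ago the lock described by lock_text was created."""
    lock = json.loads(lock_text)
    # Assumption: Created looks like 2026-01-15T10:30:00.123456789Z.
    # Drop the trailing Z and the fractional seconds before parsing.
    created_raw = lock["Created"].rstrip("Z").split(".")[0]
    created = datetime.strptime(created_raw, "%Y-%m-%dT%H:%M:%S")
    created = created.replace(tzinfo=timezone.utc)
    return (now - created).total_seconds() / 60


# Hypothetical lock file contents, for illustration only
sample = json.dumps({
    "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "Operation": "OperationTypeApply",
    "Who": "ci-runner@build-01",
    "Created": "2026-01-15T10:30:00.123456789Z",
})
now = datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
print(f"Lock age: {lock_age_minutes(sample, now):.0f} minutes")
</code></pre>
<p>An age well beyond your longest expected run suggests the lock is stale, but it isn't proof. You should still confirm that no pipeline or teammate is mid-operation.</p>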
<h2 id="heading-rollback-plan-if-something-goes-wrong">Rollback Plan: If Something Goes Wrong</h2>
<p>If you encounter problems after migrating, you can roll back to the S3 + DynamoDB setup with these steps.</p>
<p><strong>Step 1: Stop all Terraform operations</strong> in your team and CI/CD pipelines.</p>
<p><strong>Step 2: Recreate the DynamoDB table</strong> if you already deleted it:</p>
<pre><code class="language-shell">aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
</code></pre>
<p><strong>Step 3: Revert</strong> <code>backend.tf</code> to the previous configuration:</p>
<pre><code class="language-hcl">terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"
    key            = "path/to/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # restored
    # Remove: use_lockfile = true
  }
}
</code></pre>
<p><strong>Step 4: Reinitialize:</strong></p>
<pre><code class="language-shell">terraform init -reconfigure
</code></pre>
<p><strong>Step 5: Verify:</strong></p>
<pre><code class="language-shell">terraform plan
</code></pre>
<p>The state file hasn't moved, so there's no data loss during a rollback. The only change is which locking mechanism Terraform uses.</p>
<p><strong>Note:</strong> Object Lock being enabled on the S3 bucket doesn't prevent the rollback. Object Lock and DynamoDB locking can coexist, because Object Lock simply adds a capability to the bucket. Setting <code>dynamodb_table</code> in your backend config tells Terraform to use DynamoDB regardless of whether Object Lock is enabled on the bucket.</p>
<h2 id="heading-security-best-practices-for-your-state-bucket">Security Best Practices for Your State Bucket</h2>
<p>Migrating to S3 native locking is a good opportunity to review the overall security configuration of your state bucket. Here are the practices every production Terraform state bucket should implement:</p>
<h3 id="heading-enable-versioning-required">Enable Versioning (Required)</h3>
<p>Versioning is a hard requirement for S3 native locking to work safely. It ensures that if a state file is accidentally overwritten or corrupted, you can restore a previous version.</p>
<pre><code class="language-shell">aws s3api put-bucket-versioning \
  --bucket your-state-bucket \
  --versioning-configuration Status=Enabled
</code></pre>
<h3 id="heading-block-all-public-access-non-negotiable">Block All Public Access (Non-Negotiable)</h3>
<p>Your state file contains resource ARNs, IP addresses, and may contain sensitive values passed through Terraform variables. It must never be publicly accessible.</p>
<pre><code class="language-shell">aws s3api put-public-access-block \
  --bucket your-state-bucket \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
</code></pre>
<h3 id="heading-enable-server-side-encryption">Enable Server-Side Encryption</h3>
<p>Always encrypt state files at rest. SSE-S3 (AES256) is the minimum. If your organization requires KMS key management, set SSE-KMS as the bucket default instead:</p>
<pre><code class="language-shell">aws s3api put-bucket-encryption \
  --bucket your-state-bucket \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/your-kms-key-id"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
</code></pre>
<h3 id="heading-apply-least-privilege-iam-permissions">Apply Least-Privilege IAM Permissions</h3>
<p>The role or user that Terraform uses to access the state bucket should have only the permissions it needs. Here's a minimal IAM policy for S3 native locking:</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TerraformStateAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-state-bucket",
        "arn:aws:s3:::your-state-bucket/*"
      ]
    },
    {
      "Sid": "TerraformStateLocking",
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectLegalHold",
        "s3:PutObjectLegalHold",
        "s3:GetObjectRetention",
        "s3:PutObjectRetention"
      ],
      "Resource": "arn:aws:s3:::your-state-bucket/*.tflock"
    }
  ]
}
</code></pre>
<p>Notice what is absent: there are no DynamoDB permissions. This is a cleaner, smaller permission set than the old approach required.</p>
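<p>One way to keep this policy from drifting is to check it automatically, for example in CI. The sketch below is an illustration under stated assumptions: the helper <code>disallowed_actions</code> and the <code>ALLOWED_PREFIXES</code> setting are hypothetical, and it only handles the simple policy shape shown above (no <code>NotAction</code>, no conditions).</p>
<pre><code class="language-python">import json

ALLOWED_PREFIXES = ("s3:",)  # Assumption: the state role should only touch S3


def disallowed_actions(policy: dict) -> list[str]:
    """Return allowed actions that are wildcards or fall outside ALLOWED_PREFIXES."""
    flagged = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single string here
            actions = [actions]
        for action in actions:
            normalized = action.lower()  # IAM action names are case-insensitive
            is_wildcard = normalized == "*" or normalized.endswith(":*")
            if is_wildcard or not normalized.startswith(ALLOWED_PREFIXES):
                flagged.append(action)
    return flagged


# Hypothetical policy with a leftover DynamoDB permission
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
     "Resource": "arn:aws:s3:::your-state-bucket/*"},
    {"Effect": "Allow", "Action": "dynamodb:PutItem", "Resource": "*"}
  ]
}
""")
print(disallowed_actions(policy))
</code></pre>
<p>An empty list means the policy grants only specific S3 actions. Anything else is worth a second look before the role reaches production.</p>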
<h3 id="heading-enable-access-logging">Enable Access Logging</h3>
<p>Log all access to your state bucket with CloudTrail data events or S3 server access logs. This gives you an audit trail of every time state was read, written, or locked. The command below enables S3 server access logging:</p>
<pre><code class="language-shell">aws s3api put-bucket-logging \
  --bucket your-state-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "your-logging-bucket",
      "TargetPrefix": "terraform-state-access/"
    }
  }'
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>AWS S3 native state locking removes the need for a DynamoDB table from your Terraform backend setup. The result is simpler infrastructure, a smaller IAM permission surface, and one fewer service to provision, monitor, and pay for across every environment your team manages.</p>
<p>Here's a summary of what you accomplished:</p>
<ul>
<li><p>Understood what state locking is and why it's required for safe Terraform operations</p>
</li>
<li><p>Compared S3 native locking to the existing S3 + DynamoDB approach</p>
</li>
<li><p>Set up a fresh Terraform backend using S3 native locking with correct bucket configuration</p>
</li>
<li><p>Migrated an existing backend from S3 + DynamoDB to S3 native locking safely</p>
</li>
<li><p>Learned how to verify locking, handle stuck locks, and roll back if needed</p>
</li>
<li><p>Applied security best practices to the state bucket</p>
</li>
</ul>
<p>This pattern – using S3 native locking – is the recommended approach for all new Terraform projects on AWS going forward. If you're managing a large estate with multiple Terraform backends, consider automating the migration using a script or Terraform module that applies the pattern across all your state buckets.</p>
<p><em>If you are building or optimizing cloud infrastructure for a startup and want a complete reference for production-ready Terraform modules, CI/CD pipeline patterns, and infrastructure runbooks, check out</em> <a href="https://coachli.co/tolani-akintayo/PR-H4oQS">The Startup DevOps Field Guide</a><em>. It covers the full lifecycle of AWS infrastructure from initial setup to production reliability.</em></p>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a href="https://developer.hashicorp.com/terraform/language/backend/s3#use_lockfile">HashiCorp - S3 Backend Configuration: use_lockfile</a></p>
</li>
<li><p><a href="https://github.com/hashicorp/terraform/releases/tag/v1.10.0">HashiCorp: Terraform 1.10 Release Notes</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html">AWS Docs: S3 Object Lock Overview</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectLockConfiguration.html">AWS Docs: PutObjectLockConfiguration API</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-requests.html">AWS Docs: S3 Conditional Writes</a></p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/language/state/locking">HashiCorp: Backend State Locking</a></p>
</li>
<li><p><a href="https://developer.hashicorp.com/terraform/cli/commands/force-unlock">HashiCorp: terraform force-unlock Command</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html">AWS Docs: Enabling S3 Versioning</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html">AWS Docs: S3 Server-Side Encryption</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Complete SOC 2 Type II Implementation Handbook for Engineers: A Month-by-Month Roadmap with Real Commands ]]>
                </title>
                <description>
                    <![CDATA[ If your team is preparing for a SOC 2 Type II review, this handbook is for you. It's a self-contained guide to the exact 90-day timeline, 14 critical controls, and evidence collection infrastructure t ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-complete-soc-2-type-ii-implementation-guide-for-engineers/</link>
                <guid isPermaLink="false">69fa364da386d7f121c468af</guid>
                
                    <category>
                        <![CDATA[ SOC ]]>
                    </category>
                
                    <category>
                        <![CDATA[ compliance  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cloud security ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ayobami Adejumo ]]>
                </dc:creator>
                <pubDate>Tue, 05 May 2026 18:26:21 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/83d83215-5d73-49f6-a745-d9c6cd0c33f8.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If your team is preparing for a SOC 2 Type II review, this handbook is for you. It's a self-contained guide to the exact 90-day timeline, 14 critical controls, and evidence collection infrastructure that auditors actually check.</p>
<p>Everyone publishes the controls list. But nobody publishes the week-by-week engineering calendar you'll need to follow to make sure your ducks are in a row.</p>
<p>Here is the exact 90-day timeline — including the mistakes that add 60 days (and how to avoid them).</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-what-youll-learn">What You'll Learn</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-weeks-12-the-scope-decision-what-is-in-and-out-of-your-soc2-boundary">Weeks 1–2: The Scope Decision</a></p>
</li>
<li><p><a href="#heading-weeks-36-the-14-controls-that-must-be-active-on-day-1">Weeks 3–6: The 14 Controls That Must Be Active on Day 1</a></p>
</li>
<li><p><a href="#heading-weeks-7-10-the-evidence-collection-infrastructure">Weeks 7–10: The Evidence Collection Infrastructure</a></p>
</li>
<li><p><a href="#heading-weeks-11-14-auditor-selection-and-readiness-assessment">Weeks 11–14: Auditor Selection and Readiness Assessment</a></p>
</li>
<li><p><a href="#heading-weeks-15-18-the-observation-period">Weeks 15–18: The Observation Period</a></p>
</li>
<li><p><a href="#heading-the-90-day-soc2-timeline-at-a-glance">The 90-Day SOC2 Timeline at a Glance</a></p>
</li>
<li><p><a href="#heading-whats-next">What's Next</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ol>
<h2 id="heading-what-youll-learn">What You'll Learn</h2>
<p>By the end of this guide, you'll know:</p>
<ul>
<li><p>How to scope your SOC2 boundary correctly — the decision that determines everything else</p>
</li>
<li><p>The 14 controls that must be active on day 1 of your observation period</p>
</li>
<li><p>How to build evidence collection infrastructure that runs automatically</p>
</li>
<li><p>How to choose an auditor and run a readiness assessment</p>
</li>
<li><p>What happens during the observation period and how to close gaps without restarting the clock</p>
</li>
</ul>
<p>Let's dive in.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before following along, you should have:</p>
<p><strong>Knowledge:</strong></p>
<ul>
<li><p>Basic understanding of AWS services (EC2, RDS, S3, IAM, VPC)</p>
</li>
<li><p>Familiarity with Terraform or another infrastructure as code tool</p>
</li>
<li><p>Comfort reading GitHub Actions YAML workflows</p>
</li>
<li><p>A general understanding of what SOC2 is — if you are starting from scratch, read the <a href="https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services">AICPA's SOC2 overview</a> first</p>
</li>
</ul>
<p><strong>Tools and access:</strong></p>
<ul>
<li><p>An AWS account with administrator access</p>
</li>
<li><p>A GitHub organisation with admin rights</p>
</li>
<li><p>Terraform installed (v1.0 or later)</p>
</li>
<li><p>Python 3.8 or later (for the evidence collector Lambda)</p>
</li>
<li><p>A compliance automation platform — <a href="https://www.vanta.com/">Vanta</a> or <a href="https://drata.com/">Drata</a> — connected to your AWS account and GitHub organisation</p>
</li>
</ul>
<p><strong>Estimated time:</strong> 90 days end-to-end, with active engineering work of approximately 8–12 hours per week in the first six weeks, tapering to 2–4 hours per week during the observation period.</p>
<h2 id="heading-weeks-12-the-scope-decision-what-is-in-and-out-of-your-soc2-boundary">Weeks 1–2: The Scope Decision — What Is In and Out of Your SOC2 Boundary</h2>
<h3 id="heading-what-most-teams-get-wrong">What Most Teams Get Wrong</h3>
<p>Most teams scope their SOC2 boundary too broadly. They include every AWS account, every service, every environment. This is a mistake — and here is exactly why.</p>
<p>A broader scope means more controls to implement, more evidence to collect, and more systems the auditor will examine.</p>
<p>Every system inside your boundary must satisfy all 14 controls. Including your development sandbox means your engineers' experimental environments must have GuardDuty enabled, CloudTrail logging, and branch-protected deployments. That adds weeks of work and months of evidence collection for systems that pose no risk to your customers.</p>
<p>A correctly bounded scope means you include only the systems that store, process, or transmit customer data — and you prove that everything else cannot reach those systems.</p>
<p><strong>Bad scope (over-inclusive):</strong></p>
<pre><code class="language-plaintext">Entire AWS Organization
├── Production (in scope)
├── Staging (in scope)
├── Development (in scope)
├── Sandbox (in scope)
└── CI/CD (in scope)
</code></pre>
<p><strong>Good scope (correctly bounded):</strong></p>
<pre><code class="language-plaintext">SOC2 Boundary
├── Production AWS Account (in scope)
├── Production EKS Cluster (in scope)
├── Production RDS (in scope)
└── Everything else (OUT of scope — proven by network segmentation)
</code></pre>
<p>The correctly bounded scope works because it draws the tightest defensible line around the systems that actually handle customer data. Everything outside that line is excluded — not by assumption, but by technical controls that prevent those systems from reaching anything inside the boundary.</p>
<h3 id="heading-the-scope-decision-framework">The Scope Decision Framework</h3>
<p>For every system in your infrastructure, ask these four questions:</p>
<table>
<thead>
<tr>
<th>Question</th>
<th>If YES</th>
<th>If NO</th>
</tr>
</thead>
<tbody><tr>
<td>Does this system store, process, or transmit customer data?</td>
<td>✅ In scope</td>
<td>❌ Out of scope</td>
</tr>
<tr>
<td>Does this system affect the availability of customer-facing services?</td>
<td>✅ In scope</td>
<td>❌ Out of scope</td>
</tr>
<tr>
<td>Does this system have access to production credentials?</td>
<td>✅ In scope</td>
<td>❌ Out of scope</td>
</tr>
<tr>
<td>Can a compromise of this system lead to a customer data breach?</td>
<td>✅ In scope</td>
<td>❌ Out of scope</td>
</tr>
</tbody></table>
<p>Any system where the answer to even one question is yes belongs inside your boundary.</p>
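<p>The four questions can be applied mechanically across an inventory. Here is a minimal Python sketch of that rule. The <code>SystemProfile</code> class, its field names, and the example systems are all hypothetical and stand in for whatever inventory you already keep.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class SystemProfile:
    name: str
    handles_customer_data: bool
    affects_availability: bool
    has_production_credentials: bool
    compromise_exposes_customer_data: bool


def in_scope(system: SystemProfile) -> bool:
    """A single yes to any of the four questions puts the system in scope."""
    return any([
        system.handles_customer_data,
        system.affects_availability,
        system.has_production_credentials,
        system.compromise_exposes_customer_data,
    ])


# Hypothetical inventory, for illustration only
systems = [
    SystemProfile("production-rds", True, True, True, True),
    SystemProfile("ci-runner", False, False, True, True),
    SystemProfile("dev-sandbox", False, False, False, False),
]
for system in systems:
    label = "IN scope" if in_scope(system) else "out of scope"
    print(f"{system.name}: {label}")
</code></pre>
<p>Note how the hypothetical CI runner lands in scope: it never stores customer data, but it holds production credentials. Removing those credentials (or isolating them) is what would move it back outside the boundary.</p>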
<h3 id="heading-network-segmentation-the-technical-proof-that-your-boundary-holds">Network Segmentation — The Technical Proof That Your Boundary Holds</h3>
<p>Network segmentation is the practice of dividing your infrastructure into isolated zones so that systems in one zone can't communicate with systems in another unless you explicitly allow it.</p>
<p>In the context of SOC2, it's the technical control that proves your out-of-scope systems genuinely can't reach your in-scope systems — not just by policy, but by infrastructure enforcement.</p>
<p>Without network segmentation, the SOC2 auditor can't trust that your boundary is real. A developer in your sandbox environment who can query your production database means the sandbox is effectively in scope, regardless of what your diagram says.</p>
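<p>Here's the Terraform that implements network segmentation between your production and non-production environments. The network access control list (NACL) blocks all inbound traffic from the broader private IP range (10.0.0.0/8) into your in-scope production VPC, and the comment at the top documents the deliberate decision not to create any <code>aws_vpc_peering_connection</code> between environments. Two caveats: the deny rule also matches your production VPC's own addresses if its CIDR falls inside 10.0.0.0/8, so narrow the rule to your non-production ranges in that case. And the NACL only takes effect on subnets you associate with it through <code>subnet_ids</code>.</p>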
<pre><code class="language-hcl"># This account has NO VPC peering to non-production environments.
# The absence of peering is itself the segmentation control.
# Do NOT add peering connections to this account without SOC2 scope review.

resource "aws_network_acl" "deny_non_production" {
  vpc_id = aws_vpc.production.id

  # Block all inbound traffic from non-production IP ranges
  ingress {
    rule_no    = 100
    action     = "deny"
    from_port  = 0
    to_port    = 0
    protocol   = "-1"
    cidr_block = "10.0.0.0/8"
  }

  # Allow legitimate inbound traffic (HTTPS from internet)
  ingress {
    rule_no    = 200
    action     = "allow"
    from_port  = 443
    to_port    = 443
    protocol   = "tcp"
    cidr_block = "0.0.0.0/0"
  }

  # Allow all outbound (tighten this per your architecture)
  egress {
    rule_no    = 100
    action     = "allow"
    from_port  = 0
    to_port    = 0
    protocol   = "-1"
    cidr_block = "0.0.0.0/0"
  }

  tags = {
    Name        = "production-nacl"
    Environment = "production"
    Purpose     = "SOC2 network segmentation"
  }
}
</code></pre>
<p>Verify the segmentation with this command after applying the Terraform:</p>
<pre><code class="language-bash"># Confirm no VPC peering connections exist from production to non-production
aws ec2 describe-vpc-peering-connections \
  --filters Name=status-code,Values=active \
  --query 'VpcPeeringConnections[*].{ID:VpcPeeringConnectionId,Requester:RequesterVpcInfo.VpcId,Accepter:AccepterVpcInfo.VpcId}' \
  --output table
</code></pre>
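<p>The table output still has to be read by a person. As a rough sketch, the same check can be turned into a pass/fail result in Python, assuming you run the command with <code>--output json</code> and without the <code>--query</code> projection so the full response is available. The function name, VPC IDs, and sample response below are hypothetical.</p>
<pre><code class="language-python">def peerings_touching(response: dict, production_vpc_id: str) -> list[str]:
    """Return IDs of peering connections where production is on either side."""
    touching = []
    for conn in response.get("VpcPeeringConnections", []):
        requester = conn.get("RequesterVpcInfo", {}).get("VpcId")
        accepter = conn.get("AccepterVpcInfo", {}).get("VpcId")
        if production_vpc_id in (requester, accepter):
            touching.append(conn["VpcPeeringConnectionId"])
    return touching


# Hypothetical response shaped like describe-vpc-peering-connections JSON output
sample_response = {
    "VpcPeeringConnections": [
        {
            "VpcPeeringConnectionId": "pcx-0aaa1111bbbb2222c",
            "RequesterVpcInfo": {"VpcId": "vpc-0staging000000000"},
            "AccepterVpcInfo": {"VpcId": "vpc-0dev0000000000000"},
        },
        {
            "VpcPeeringConnectionId": "pcx-0ddd3333eeee4444f",
            "RequesterVpcInfo": {"VpcId": "vpc-0dev0000000000000"},
            "AccepterVpcInfo": {"VpcId": "vpc-0prod000000000000"},
        },
    ]
}
violations = peerings_touching(sample_response, "vpc-0prod000000000000")
print(violations or "No peering connections touch production")
</code></pre>
<p>Any ID in the result is a connection that brings a non-production VPC within reach of production and should trigger a scope review.</p>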
<h3 id="heading-the-deliverable-your-soc2-boundary-diagram">The Deliverable: Your SOC2 Boundary Diagram</h3>
<p>At the end of weeks 1–2, you need a boundary diagram — a visual document that shows every in-scope system, every out-of-scope system, and the segmentation controls between them.</p>
<p>Here is what the diagram should contain:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69d00d5be466e2b76263a583/29dfe0c8-f455-44af-8562-8d088f8a111a.png" alt="29dfe0c8-f455-44af-8562-8d088f8a111a" style="display:block;margin:0 auto" width="611" height="686" loading="lazy">

<p>Include every AWS service, every data flow arrow, and a label on the segmentation control. This diagram becomes your primary scope evidence and is typically the first thing an auditor asks for.</p>
<h2 id="heading-weeks-36-the-14-controls-that-must-be-active-on-day-1">Weeks 3–6: The 14 Controls That Must Be Active on Day 1</h2>
<p>These 14 controls must be implemented and actively collecting evidence from day 1 of your observation period. If you add any of them late, the observation period clock for that control restarts from the implementation date — not from day 1 of the audit period.</p>
<p>Think of the observation period as a surveillance camera recording your infrastructure. The auditor watches the footage later. If the camera was not on when a specific event occurred, that event has no record — and the SOC2 control for it has a gap.</p>
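<p>To make the clock arithmetic concrete, here is a small Python sketch that reports how many days of evidence each control would be missing at the start of the observation period. The function name, control names, and dates are hypothetical.</p>
<pre><code class="language-python">from datetime import date


def coverage_gaps(observation_start: date, activated: dict[str, date]) -> dict[str, int]:
    """Map each late control to the number of uncovered days at the start."""
    gaps = {}
    for control, active_from in activated.items():
        missing_days = (active_from - observation_start).days
        if missing_days > 0:
            gaps[control] = missing_days
    return gaps


# Hypothetical dates, for illustration only
observation_start = date(2026, 3, 1)
activated = {
    "CloudTrail": date(2026, 2, 20),     # active before day 1: no gap
    "GuardDuty": date(2026, 3, 1),       # active on day 1: no gap
    "VPC Flow Logs": date(2026, 3, 15),  # enabled two weeks late
}
print(coverage_gaps(observation_start, activated))
</code></pre>
<p>In this example only the late control shows a gap, which is the window for which no evidence exists.</p>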
<h3 id="heading-control-1-mfa-enforcement-cc66">Control 1: MFA Enforcement (CC6.6)</h3>
<p>Multi-Factor Authentication (MFA) requires a user to verify their identity using two independent factors — something they know (a password) and something they have (a phone or hardware key). Without MFA, a stolen password is sufficient to access your production systems.</p>
<p>SOC2 CC6.6 requires that access to systems is restricted to authorized users. MFA is the technical control that makes "authorized" meaningful. Without it, any password compromise is a production access event.</p>
<p>To implement MFA, you can use AWS IAM Identity Center (formerly SSO) connected to your identity provider (Okta, Google Workspace, or Azure AD). MFA is then enforced at the identity provider level — any user without MFA enrolled can't authenticate, regardless of which AWS service they're trying to reach.</p>
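<pre><code class="language-hcl"># IAM Identity Center configuration. MFA itself is enforced at the IdP level;
# this resource only passes the user's email through as a session attribute.
# No IAM user has direct console or CLI access.
# All access goes through SSO sessions (8-hour expiry by default).

# Look up the Identity Center instance referenced below
data "aws_ssoadmin_instances" "this" {}

resource "aws_ssoadmin_instance_access_control_attributes" "mfa" {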
  instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]

  attribute {
    key = "email"
    value {
      source = ["$${path:email}"]
    }
  }
}
</code></pre>
<p>You can verify that no IAM users retain direct console access (which would bypass MFA):</p>
<pre><code class="language-bash"># Any user listed here has direct console access bypassing SSO — investigate immediately
aws iam list-users \
  --query 'Users[?PasswordLastUsed!=`null`].[UserName,PasswordLastUsed]' \
  --output table
</code></pre>
<h3 id="heading-control-2-infrastructure-as-code-cc81">Control 2: Infrastructure as Code (CC8.1)</h3>
<p>Infrastructure as Code (IaC) means defining your cloud infrastructure in version-controlled code files (Terraform, Pulumi, or AWS CDK) rather than creating resources manually through the AWS console. Every infrastructure change is proposed in a pull request, reviewed by a colleague, and applied through an automated pipeline.</p>
<p>SOC2 CC8.1 covers change management — the requirement that every change to your production environment is documented, reviewed, and approved. Manual console changes produce no audit trail. If an engineer opens the AWS console and creates a security group without going through Terraform, that change is invisible to your SOC2 auditor. IaC makes every change reviewable and traceable.</p>
<p>Now let's see how to implement IaC here. This GitHub Actions workflow applies Terraform only from the main branch, after a pull request has been reviewed and approved. The workflow creates an immutable record of every infrastructure change:</p>
<pre><code class="language-yaml"># .github/workflows/terraform-apply.yml
name: Terraform Apply (Production)
on:
  push:
    branches: [main]
    paths: ['terraform/**']

permissions:
  id-token: write   # Required for AWS OIDC authentication
  contents: read

jobs:
  apply:
    name: Apply Infrastructure Changes
    runs-on: ubuntu-latest
    environment: production  # Requires manual approval for production
    defaults:
      run:
        working-directory: terraform  # Run Terraform where the paths filter points

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Configure AWS credentials (OIDC — no long-lived keys)
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/terraform-apply
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: "1.6.0"

      - name: Terraform Plan
        run: |
          terraform init
          terraform plan -out=tfplan -input=false

      - name: Terraform Apply
        run: terraform apply -input=false tfplan
</code></pre>
<p>SOC2 evidence this produces: a GitHub Actions run log for every infrastructure change, showing who triggered it (the engineer whose merge pushed to <code>main</code>), when it was applied, and what changed.</p>
<h3 id="heading-control-3-cloudtrail-enabled-cc71">Control 3: CloudTrail Enabled (CC7.1)</h3>
<p>AWS CloudTrail is a service that records every API call made in your AWS account — who called it, when, from which IP address, and whether it succeeded. Think of it as the complete audit log of everything that has ever happened in your AWS environment.</p>
<p>SOC2 CC7.1 requires monitoring for security events. CloudTrail is the foundational logging layer — without it, you can't detect unauthorized access, investigate incidents, or prove to an auditor that your controls were operating as intended. An auditor who can't see historical AWS API activity can't verify that your access controls were enforced during the observation period.</p>
<p>To implement it, you'll want to enable multi-region CloudTrail so that activity in every AWS region is captured, including global services like IAM. You can ship logs to an S3 bucket with Object Lock enabled (Control 3 in the evidence collection section covers this) so logs can't be modified or deleted:</p>
<pre><code class="language-bash"># Enable CloudTrail with log file validation and multi-region coverage
aws cloudtrail create-trail \
  --name production-audit-trail \
  --s3-bucket-name your-cloudtrail-logs-bucket \
  --is-multi-region-trail \
  --enable-log-file-validation \
  --include-global-service-events

# Start the trail (creation alone does not start logging)
aws cloudtrail start-logging --name production-audit-trail

# Verify the trail is active and logging
aws cloudtrail get-trail-status --name production-audit-trail \
  --query '{IsLogging:IsLogging,LatestDeliveryTime:LatestDeliveryTime}'
</code></pre>
<h3 id="heading-control-4-guardduty-enabled-cc72">Control 4: GuardDuty Enabled (CC7.2)</h3>
<p>AWS GuardDuty is a threat detection service that analyses your CloudTrail logs, VPC Flow Logs, and DNS logs. It uses machine learning to identify suspicious behaviour — things like an EC2 instance communicating with a known malware server, an IAM user logging in from an unusual country, or unusual API call patterns that indicate credential theft.</p>
<p>SOC2 CC7.2 requires the use of detection tools to identify potential security events. GuardDuty is the monitoring layer that tells you when something anomalous is happening, not just what happened after the fact. Without it, you would only discover a compromise when the damage is done.</p>
<p>Here's the implementation:</p>
<pre><code class="language-bash"># Enable GuardDuty; updates to existing findings are exported every 15 minutes
aws guardduty create-detector \
  --enable \
  --finding-publishing-frequency FIFTEEN_MINUTES

# Verify GuardDuty is active
aws guardduty list-detectors --query 'DetectorIds' --output table
</code></pre>
<p>You can set up an EventBridge rule to route CRITICAL and HIGH severity GuardDuty findings to your incident response channel immediately. A finding that sits unreviewed for 90 days is the kind of gap an auditor is likely to report as an exception.</p>
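<p>As a sketch of what such a rule could match on, the Python below builds an EventBridge event pattern for GuardDuty findings. It assumes GuardDuty's numeric severity scale, where 7.0 and above covers the High and Critical bands. The function name and the threshold default are hypothetical choices.</p>
<pre><code class="language-python">import json


def guardduty_event_pattern(min_severity: float = 7.0) -> dict:
    """Build an EventBridge pattern for findings at or above min_severity."""
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        # Numeric matching on the finding's severity field
        "detail": {"severity": [{"numeric": [">=", min_severity]}]},
    }


pattern = guardduty_event_pattern()
print(json.dumps(pattern, indent=2))
</code></pre>
<p>The printed JSON can be supplied as the <code>--event-pattern</code> value of <code>aws events put-rule</code>, with the rule's target pointing at whatever notification channel your team uses.</p>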
<h3 id="heading-control-5-vpc-flow-logs-cc61">Control 5: VPC Flow Logs (CC6.1)</h3>
<p>VPC Flow Logs capture information about the IP traffic flowing through your Virtual Private Cloud — every accepted and rejected connection, including source IP, destination IP, port, protocol, and whether the traffic was allowed or denied. They are the network-level audit trail that CloudTrail doesn't provide.</p>
<p>SOC2 CC6.1 requires logical access controls and monitoring. VPC Flow Logs let you verify that your network segmentation is actually working (traffic you denied is showing as rejected in the logs), detect unexpected communication between services, and investigate security events at the network layer.</p>
<pre><code class="language-bash"># Create an IAM role for VPC Flow Logs to deliver to CloudWatch
aws iam create-role \
  --role-name vpc-flow-logs-role \
  --assume-role-policy-document '{
    "Version":"2012-10-17",
    "Statement":[{
      "Effect":"Allow",
      "Principal":{"Service":"vpc-flow-logs.amazonaws.com"},
      "Action":"sts:AssumeRole"
    }]
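  }'

# Grant the role permission to write to CloudWatch Logs (without this, delivery fails)
aws iam put-role-policy \
  --role-name vpc-flow-logs-role \
  --policy-name vpc-flow-logs-delivery \
  --policy-document '{
    "Version":"2012-10-17",
    "Statement":[{
      "Effect":"Allow",
      "Action":["logs:CreateLogGroup","logs:CreateLogStream","logs:PutLogEvents","logs:DescribeLogGroups","logs:DescribeLogStreams"],
      "Resource":"*"
    }]
  }'
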
# Enable VPC Flow Logs for all traffic (ACCEPT and REJECT)
aws ec2 create-flow-logs \
  --resource-ids vpc-YOUR_PRODUCTION_VPC_ID \
  --resource-type VPC \
  --traffic-type ALL \
  --log-group-name /aws/vpc/flow-logs/production \
  --deliver-log-permission-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/vpc-flow-logs-role

# Verify flow logs are active
aws ec2 describe-flow-logs \
  --filter Name=resource-id,Values=vpc-YOUR_PRODUCTION_VPC_ID \
  --query 'FlowLogs[*].{Status:FlowLogStatus,LogGroup:LogGroupName}'
</code></pre>
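<p>Once logs are flowing, you can use them to confirm that segmentation holds. The Python sketch below scans flow log records in the default (version 2) space-separated format and reports any traffic from the non-production range that was accepted. The range in <code>NON_PRODUCTION</code>, the function name, and the sample records are assumptions for illustration, and the example assumes your production VPC sits outside that range.</p>
<pre><code class="language-python">import ipaddress

# Field order of the default (version 2) flow log record format
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
# Assumption: non-production environments live in this range
NON_PRODUCTION = ipaddress.ip_network("10.0.0.0/8")


def accepted_from_non_production(lines: list[str]) -> list[dict]:
    """Return ACCEPT records whose source address is inside NON_PRODUCTION."""
    leaks = []
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # Skip headers or custom-format records
        record = dict(zip(FIELDS, parts))
        if record["action"] != "ACCEPT":
            continue
        try:
            source = ipaddress.ip_address(record["srcaddr"])
        except ValueError:
            continue  # srcaddr is "-" when no data was captured
        if source in NON_PRODUCTION:
            leaks.append(record)
    return leaks


# Hypothetical records, for illustration only
sample = [
    "2 123456789012 eni-0a1b2c3d 10.20.1.5 172.31.4.9 49152 5432 6 3 180 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d 10.20.1.7 172.31.4.9 49153 5432 6 9 540 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 203.0.113.8 172.31.4.9 50000 443 6 12 900 1700000000 1700000060 ACCEPT OK",
]
for record in accepted_from_non_production(sample):
    print(record["srcaddr"], "reached port", record["dstport"])
</code></pre>
<p>An empty result is what you want to see: every connection attempt from the non-production range shows up as <code>REJECT</code>.</p>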
<h3 id="heading-control-6-secrets-manager-cc67">Control 6: Secrets Manager (CC6.7)</h3>
<p>Secrets management means storing credentials (database passwords, API keys, certificates, and other sensitive configuration values) in a dedicated, access-controlled service (like AWS Secrets Manager or HashiCorp Vault) rather than in <code>.env</code> files, GitHub repository secrets, or hardcoded in application code.</p>
<p>SOC2 CC6.7 requires protecting sensitive system components from unauthorized access. A secret stored in an <code>.env</code> file committed to a repository is accessible to every developer with repo access, every CI/CD runner, and every engineer who has ever cloned the repo — including those who have since left the company.</p>
<p>Secrets Manager provides centralised storage, access logging, automatic rotation, and fine-grained IAM permissions so only specific services can retrieve specific secrets.</p>
<p>Let's look at the implementation — storing and rotating a secret:</p>
<pre><code class="language-bash"># Store a database credential with automatic 90-day rotation
aws secretsmanager create-secret \
  --name production/postgresql/credentials \
  --description "Production PostgreSQL credentials — rotated every 90 days" \
  --secret-string '{
    "username": "app_user",
    "password": "REPLACE_WITH_STRONG_PASSWORD",
    "host": "your-rds-endpoint.us-east-1.rds.amazonaws.com",
    "port": 5432,
    "dbname": "production"
  }'

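# Enable automatic rotation every 90 days
# (rotation needs a rotation Lambda; replace the placeholder ARN with yours)
aws secretsmanager rotate-secret \
  --secret-id production/postgresql/credentials \
  --rotation-lambda-arn arn:aws:lambda:us-east-1:YOUR_ACCOUNT_ID:function:YOUR_ROTATION_FUNCTION \
  --rotation-rules AutomaticallyAfterDays=90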
</code></pre>
<p>How your application retrieves the secret at runtime (no hardcoded credentials):</p>
<pre><code class="language-python"># Good: secret retrieved at runtime from Secrets Manager
import boto3
import json

def get_db_credentials():
    client = boto3.client('secretsmanager', region_name='us-east-1')
    response = client.get_secret_value(SecretId='production/postgresql/credentials')
    return json.loads(response['SecretString'])

# Bad: secret hardcoded in application code or .env file
DB_PASSWORD = "my_database_password_123"  # Never do this
</code></pre>
<p>The access log in CloudTrail records every time a secret is retrieved, by which IAM role, at what time. That log is your SOC2 evidence that secrets access is controlled and auditable.</p>
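<p>Rotation only helps if it stays enabled, so it can be worth auditing from time to time. The sketch below is a minimal example, assuming you have already fetched the <code>SecretList</code> array from the Secrets Manager <code>ListSecrets</code> API (for example with boto3's <code>list_secrets()</code>). The function name <code>find_rotation_gaps</code>, the sample secret names, and the 90-day threshold are illustrative assumptions, not part of any AWS API:</p>
<pre><code class="language-python">from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # assumed policy: rotate at least every 90 days


def find_rotation_gaps(secret_list, now=None):
    """Return (name, reason) pairs for secrets with rotation disabled or overdue.

    Each item in secret_list is one entry of the ListSecrets "SecretList" array.
    """
    now = now or datetime.now(timezone.utc)
    gaps = []
    for secret in secret_list:
        last_rotated = secret.get("LastRotatedDate")
        if not secret.get("RotationEnabled"):
            gaps.append((secret["Name"], "rotation disabled"))
        elif last_rotated is None or now - last_rotated > MAX_AGE:
            gaps.append((secret["Name"], "rotation overdue"))
    return gaps


# Hypothetical sample data shaped like ListSecrets output
check_time = datetime(2026, 6, 1, tzinfo=timezone.utc)
sample = [
    {"Name": "production/postgresql/credentials", "RotationEnabled": True,
     "LastRotatedDate": datetime(2026, 5, 1, tzinfo=timezone.utc)},
    {"Name": "production/stripe/api-key", "RotationEnabled": False},
    {"Name": "production/redis/auth", "RotationEnabled": True,
     "LastRotatedDate": datetime(2026, 1, 1, tzinfo=timezone.utc)},
]
print(find_rotation_gaps(sample, now=check_time))
</code></pre>
<p>Saving the returned list alongside your other evidence gives you a dated record that rotation was checked, not just configured once.</p>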
<h3 id="heading-control-7-ebs-encryption-cc61">Control 7: EBS Encryption (CC6.1)</h3>
<p>EBS (Elastic Block Store) encryption ensures that the persistent disks attached to your EC2 instances are encrypted at rest using AES-256. If an AWS employee or an attacker gained physical access to the storage hardware, the data would be unreadable without the encryption key.</p>
<p>SOC2 CC6.1 requires protecting information assets from unauthorised access. Encryption at rest is the control that protects data in the event of physical storage compromise or an improperly decommissioned disk. Enabling it by default means every new EBS volume you create in that region is encrypted automatically, including EKS node volumes and EC2 instance root volumes. The setting is per region, so repeat it in every region you use. RDS storage isn't covered by this setting: RDS encryption is a separate option chosen when the database instance is created.</p>
<pre><code class="language-bash"># Enable EBS encryption by default for all new volumes in this region
aws ec2 enable-ebs-encryption-by-default

# Verify it is enabled
aws ec2 get-ebs-encryption-by-default \
  --query 'EbsEncryptionByDefault'
# Expected output: true

# Check existing volumes — any showing false need to be migrated
aws ec2 describe-volumes \
  --query 'Volumes[?Encrypted==`false`].[VolumeId,Size,VolumeType]' \
  --output table
</code></pre>
<p>Any existing unencrypted volumes must be snapshot-and-replaced. The process: create a snapshot of the unencrypted volume, create a new encrypted volume from the snapshot, and swap it into the instance.</p>
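<p>The snapshot-and-replace steps can be sketched in Python as follows. This is a minimal example, assuming <code>ec2</code> is a boto3 EC2 client that you pass in (for example <code>boto3.client("ec2")</code>). The function name <code>encrypt_volume_copy</code> is hypothetical, and the sketch stops once the encrypted copy exists: stopping the instance, detaching the old volume, and attaching the new one are left out, as are volume types that need extra parameters such as provisioned IOPS.</p>
<pre><code class="language-python">def encrypt_volume_copy(ec2, volume_id):
    """Create an encrypted copy of an EBS volume and return the new volume ID.

    ec2 is a boto3 EC2 client. If the volume is already encrypted,
    the original volume ID is returned unchanged.
    """
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    if volume["Encrypted"]:
        return volume_id  # nothing to migrate

    # Step 1: snapshot the unencrypted volume and wait for it to finish
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Pre-encryption snapshot of {volume_id}",
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

    # Step 2: create an encrypted volume from the snapshot in the same AZ
    new_volume = ec2.create_volume(
        SnapshotId=snapshot["SnapshotId"],
        AvailabilityZone=volume["AvailabilityZone"],
        VolumeType=volume["VolumeType"],
        Encrypted=True,  # default KMS key is used unless KmsKeyId is passed
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume["VolumeId"]])
    return new_volume["VolumeId"]
</code></pre>
<p>Passing the client in as an argument keeps the function easy to exercise against a stub before you point it at a real account.</p>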
<h3 id="heading-control-8-s3-block-public-access-cc61">Control 8: S3 Block Public Access (CC6.1)</h3>
<p>Amazon S3 buckets can be configured to allow public access — meaning anyone on the internet can read their contents without authentication. Block Public Access is an account-level and bucket-level setting that prevents any bucket from being made public, regardless of the bucket's own policy.</p>
<p>A misconfigured S3 bucket is one of the most common causes of data breaches in cloud environments. Block Public Access at the account level means a developer can't accidentally expose a bucket containing customer data, even if they set the wrong bucket policy. It's a guardrail, not just a policy.</p>
<pre><code class="language-bash"># Block public access at the AWS account level — applies to all buckets
aws s3control put-public-access-block \
  --account-id YOUR_ACCOUNT_ID \
  --public-access-block-configuration \
    BlockPublicAcls=true,\
    IgnorePublicAcls=true,\
    BlockPublicPolicy=true,\
    RestrictPublicBuckets=true

# Verify account-level setting is active
aws s3control get-public-access-block \
  --account-id YOUR_ACCOUNT_ID

# Scan for any buckets that have public access enabled (should be zero)
aws s3api list-buckets --query 'Buckets[*].Name' --output text | \
  tr '\t' '\n' | while read bucket; do
    result=$(aws s3api get-public-access-block --bucket "$bucket" 2&gt;/dev/null)
    if echo "$result" | grep -q '"BlockPublicAcls": false'; then
      echo "WARNING: $bucket has public access not fully blocked"
    fi
  done
</code></pre>
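<p>A bucket-level check is more reliable when it looks at all four flags and treats a bucket with no Block Public Access configuration at all as unprotected. The sketch below shows that logic in isolation. It assumes you pass in the <code>PublicAccessBlockConfiguration</code> dictionary returned by <code>GetPublicAccessBlock</code>, or <code>None</code> when the call reports that no configuration exists. The helper name <code>missing_flags</code> is an illustrative choice:</p>
<pre><code class="language-python">REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_flags(config):
    """Return the Block Public Access flags that are not enabled.

    config is a PublicAccessBlockConfiguration dict, or None when the
    bucket has no Block Public Access configuration.
    """
    if config is None:
        return list(REQUIRED_FLAGS)
    return [flag for flag in REQUIRED_FLAGS if not config.get(flag)]


# Hypothetical configurations for three buckets
print(missing_flags({flag: True for flag in REQUIRED_FLAGS}))
print(missing_flags({"BlockPublicAcls": True, "IgnorePublicAcls": True,
                     "BlockPublicPolicy": False, "RestrictPublicBuckets": True}))
print(missing_flags(None))
</code></pre>
<p>An empty list means the bucket is fully blocked. Anything else names the exact settings to fix.</p>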
<h3 id="heading-control-9-branch-protection-cc81">Control 9: Branch Protection (CC8.1)</h3>
<p>Branch protection is a GitHub setting that prevents engineers from pushing code directly to your main branch without going through a pull request that has been reviewed and approved by at least one other team member. It also requires your CI pipeline to pass before any code can be merged.</p>
<p>SOC2 CC8.1 requires change management — the requirement that every change to production systems is documented, reviewed, and approved. Without branch protection, an engineer can push directly to main, which deploys directly to production through your CI/CD pipeline, with no review and no audit trail. Branch protection is the technical enforcement of your change management policy.</p>
<p>The critical setting that most teams miss: the "Do not allow bypassing the above settings" option must be enabled. Without it, administrators can bypass branch protection — and a SOC2 auditor will flag this as a gap because it means your change management control can be circumvented.</p>
<pre><code class="language-yaml"># .github/settings.yml — enforces branch protection via code
# Requires the settings GitHub App: https://github.com/apps/settings

branches:
  - name: main
    protection:
      required_pull_request_reviews:
        required_approving_review_count: 1
        dismiss_stale_reviews: true
        require_code_owner_reviews: false
      required_status_checks:
        strict: true
        contexts:
          - "CI / test"
          - "Security / trivy-scan"
      enforce_admins: true         # Admins cannot bypass — this is critical
      restrictions: null           # No push restriction beyond the above
      allow_force_pushes: false
      allow_deletions: false
</code></pre>
<p>Here's how you can verify that branch protection is enforced and admins can't bypass it:</p>
<pre><code class="language-bash"># Returns the branch protection rules including enforce_admins status
curl -H "Authorization: token YOUR_GITHUB_TOKEN" \
  https://api.github.com/repos/YOUR_ORG/YOUR_REPO/branches/main/protection \
  | jq '{enforce_admins: .enforce_admins.enabled, required_reviews: .required_pull_request_reviews.required_approving_review_count}'
</code></pre>
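<p>If you save that API response, a short script can turn it into a pass/fail list for your evidence folder. This is a rough sketch, assuming the response has already been parsed into a dictionary. The function name <code>branch_protection_gaps</code> and the gap messages are illustrative, while the field names follow the branch protection response used in the <code>jq</code> filter above:</p>
<pre><code class="language-python">def branch_protection_gaps(protection):
    """Return human-readable gaps found in a branch protection API response."""
    gaps = []
    if not (protection.get("enforce_admins") or {}).get("enabled"):
        gaps.append("administrators can bypass branch protection")
    reviews = protection.get("required_pull_request_reviews") or {}
    if not reviews.get("required_approving_review_count"):
        gaps.append("no approving review is required")
    if not (protection.get("required_status_checks") or {}).get("contexts"):
        gaps.append("no status checks are required")
    if (protection.get("allow_force_pushes") or {}).get("enabled"):
        gaps.append("force pushes are allowed")
    return gaps


# Hypothetical response, trimmed to the fields checked above
sample = {
    "enforce_admins": {"enabled": False},
    "required_pull_request_reviews": {"required_approving_review_count": 1},
    "required_status_checks": {"strict": True, "contexts": ["CI / test"]},
    "allow_force_pushes": {"enabled": False},
}
print(branch_protection_gaps(sample))
</code></pre>
<p>An empty list means the settings match the configuration shown earlier. Any entry is something to fix before the auditor asks.</p>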
<h3 id="heading-control-10-container-image-scanning-cc74">Control 10: Container Image Scanning (CC7.4)</h3>
<p>Container image scanning analyses your Docker images before deployment to identify known security vulnerabilities (CVEs) in the operating system packages and application dependencies they contain.</p>
<p>Trivy is an open-source scanner that checks the base image (Ubuntu, Alpine, and so on), all installed OS packages, and language-specific dependencies (npm, pip, Go modules) against vulnerability data aggregated from sources such as the National Vulnerability Database and vendor security advisories.</p>
<p>SOC2 CC7.4 requires monitoring and identifying vulnerabilities. Every container you deploy contains a base image with OS packages — and those packages regularly receive CVE disclosures. A critical CVE left unpatched for 90 days in a production container is a SOC2 finding. Automated scanning in CI means every image is checked before it can deploy.</p>
<pre><code class="language-yaml"># .github/workflows/security-scan.yml
name: Security Scan
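on: [push, pull_request]

permissions:
  contents: read
  security-events: write   # required to upload SARIF results to the Security tab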

jobs:
  trivy-scan:
    name: Container Vulnerability Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build container image
        run: docker build -t app:${{ github.sha }} .

      - name: Scan image for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: app:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH
          exit-code: 1          # Fail the pipeline on CRITICAL or HIGH findings

      - name: Upload results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v2
        if: always()            # Upload even if scan found issues
        with:
          sarif_file: trivy-results.sarif
</code></pre>
<p>The scanner looks for:</p>
<ul>
<li><p>CVEs in base image OS packages (for example, a critical OpenSSL vulnerability in your Ubuntu base)</p>
</li>
<li><p>Vulnerable versions of application dependencies (a known RCE in an npm package your app uses)</p>
</li>
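<li><p>Misconfigurations in the Dockerfile itself (running as root, using <code>latest</code> tags), provided you also run Trivy's misconfiguration scan against the repository, since the image scan above covers vulnerabilities only</p>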
</li>
</ul>
<p>Results appear in the GitHub Security tab for your repository, giving you a historical record of every scan — which is your SOC2 evidence.</p>
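<p>If you also want a compact summary for your evidence bucket, you can summarise a Trivy report yourself. The sketch below assumes a JSON report (for example one produced with <code>trivy image --format json</code>) rather than the SARIF file from the workflow above. The function names and the CVE identifiers in the sample are made up for illustration:</p>
<pre><code class="language-python">from collections import Counter


def severity_counts(report):
    """Count vulnerabilities by severity across every target in a Trivy JSON report."""
    counts = Counter()
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_block(report, blocking=("CRITICAL", "HIGH")):
    """Return True when the report contains any finding at a blocking severity."""
    counts = severity_counts(report)
    return any(counts[severity] for severity in blocking)


# Hypothetical report, trimmed to the fields used above
sample_report = {
    "Results": [
        {"Target": "app (alpine 3.19)", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssl", "Severity": "CRITICAL"},
            {"VulnerabilityID": "CVE-0000-0002", "PkgName": "busybox", "Severity": "LOW"},
        ]},
        {"Target": "package-lock.json"},  # a target with no findings
    ]
}
print(dict(severity_counts(sample_report)), should_block(sample_report))
</code></pre>
<p>The <code>blocking</code> tuple mirrors the <code>severity: CRITICAL,HIGH</code> setting in the workflow, so the summary and the pipeline gate agree on what counts as a failure.</p>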
<h3 id="heading-control-11-incident-response-plan-cc92">Control 11: Incident Response Plan (CC9.2)</h3>
<p>An incident response plan is a written, tested procedure that defines exactly what your team does when a security event occurs — from the moment an alert fires through to customer notification and post-incident review.</p>
<p>SOC2 CC9.2 requires that you have a documented process for responding to security events and that you've tested it. The auditor will ask for the written runbook and evidence that a tabletop exercise (a simulated incident walkthrough) has been conducted within the observation period.</p>
<p>Your incident response runbook must include:</p>
<ol>
<li><p><strong>Severity classification:</strong> Definitions of P1 (production down, customer data at risk), P2 (degraded service, potential risk), and P3 (minor issue, no customer impact) — and the response SLA for each.</p>
</li>
<li><p><strong>Escalation path:</strong> Exactly who gets paged at each severity level, with contact details. Not "the on-call engineer" — specific names and a backup if the first person doesn't respond within 10 minutes.</p>
</li>
<li><p><strong>First 15 minutes:</strong> The specific steps to take immediately — isolate the affected system, assess the scope, notify the incident channel, begin the timeline log.</p>
</li>
<li><p><strong>Communication templates:</strong> Pre-written Slack messages, customer email templates, and regulatory notification templates (GDPR requires notification within 72 hours, HIPAA within 60 days).</p>
</li>
<li><p><strong>Post-incident review:</strong> The blameless postmortem process, the <a href="https://www.freecodecamp.org/news/from-symptoms-to-root-cause-how-to-use-the-5-whys-technique/">5-why</a> root cause analysis template, and the action item tracking process.</p>
</li>
</ol>
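<p>The regulatory deadlines in the communication templates are easy to miscalculate under pressure, so some teams compute them as soon as an incident is logged. A minimal sketch, assuming the clock starts at the moment your team became aware of the incident and using only the two windows mentioned above (the names <code>NOTIFICATION_WINDOWS</code> and <code>notification_deadlines</code> are illustrative, and your legal counsel should confirm which regimes apply to you):</p>
<pre><code class="language-python">from datetime import datetime, timedelta, timezone

# Notification windows taken from the runbook text above
NOTIFICATION_WINDOWS = {
    "GDPR": timedelta(hours=72),
    "HIPAA": timedelta(days=60),
}


def notification_deadlines(became_aware_at, windows=NOTIFICATION_WINDOWS):
    """Return the latest allowed notification time for each regime."""
    return {name: became_aware_at + window for name, window in windows.items()}


# Hypothetical incident detected at midnight UTC on 1 January 2026
aware = datetime(2026, 1, 1, tzinfo=timezone.utc)
for regime, deadline in notification_deadlines(aware).items():
    print(regime, deadline.isoformat())
</code></pre>
<p>Recording these timestamps in the incident timeline log also gives the post-incident review a clear reference point.</p>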
<p>Conduct a tabletop exercise at least once during your observation period: gather your engineering team for 45 minutes, simulate a realistic scenario (for example, "an AWS access key was committed to a public GitHub repo"), and walk through the runbook together. Document the meeting date, attendees, scenario, gaps found, and remediation actions. This document is your evidence.</p>
<h3 id="heading-control-12-access-reviews-cc63">Control 12: Access Reviews (CC6.3)</h3>
<p>An access review is a quarterly audit of who has access to what in your production systems — AWS accounts, GitHub repositories, production databases, and every SaaS tool that touches customer data. You verify that every person on the list still works at the company and still needs the access their role grants them.</p>
<p>SOC2 CC6.3 requires that access is revoked when it's no longer needed. Former employees who retain access to production AWS accounts represent a genuine security risk and a definitive SOC2 finding.</p>
<p>In every access review I've conducted, at least 3–5 former employees or contractors still had active access that should have been revoked.</p>
<p>The quarterly access review checklist:</p>
<pre><code class="language-bash"># 1. IAM users — list all with their last login date
aws iam generate-credential-report
aws iam get-credential-report --output text --query Content \
  | base64 --decode | cut -d',' -f1,5 | column -t -s ','

# 2. IAM roles — list every role with its last-used date, then flag any unused for 90+ days
aws iam get-account-authorization-details \
  --query 'RoleDetailList[*].{Role:RoleName,LastUsed:RoleLastUsed.LastUsedDate}' \
  --output table

# 3. Verify AWS SSO user list matches your current employee list
aws identitystore list-users \
  --identity-store-id YOUR_IDENTITY_STORE_ID \
  --query 'Users[*].{Name:DisplayName,Email:Emails[0].Value}' \
  --output table
</code></pre>
<p>Cross-reference the output against your current employee list in your HR system. Document every change made — access removed, permissions reduced, accounts disabled. The documented changes are the evidence that the review was conducted meaningfully, not just as a checkbox exercise.</p>
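<p>The cross-referencing step can be scripted once both lists are in hand. The sketch below assumes you have the <code>Users</code> array from the Identity Store <code>ListUsers</code> call and a list of current employee email addresses exported from your HR system. The function name <code>accounts_to_review</code> and the sample people are hypothetical:</p>
<pre><code class="language-python">def accounts_to_review(identity_users, current_employee_emails):
    """Return display names of directory users with no matching current employee."""
    current = {email.strip().lower() for email in current_employee_emails}
    flagged = []
    for user in identity_users:
        emails = {e["Value"].lower() for e in user.get("Emails", []) if e.get("Value")}
        # No email in common with the HR list: flag the account for review
        if emails.isdisjoint(current):
            flagged.append(user.get("DisplayName") or user.get("UserName"))
    return flagged


# Hypothetical data shaped like the ListUsers "Users" array
identity_users = [
    {"UserName": "amara", "DisplayName": "Amara Obi",
     "Emails": [{"Value": "Amara@example.com", "Primary": True}]},
    {"UserName": "li", "DisplayName": "Li Wei",
     "Emails": [{"Value": "li@example.com", "Primary": True}]},
    {"UserName": "contractor-7", "DisplayName": "Contractor Seven", "Emails": []},
]
hr_emails = ["amara@example.com"]
print(accounts_to_review(identity_users, hr_emails))
</code></pre>
<p>Accounts with no email on file are flagged as well, because they can't be matched to anyone and deserve a manual look.</p>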
<h3 id="heading-control-13-backup-verification-cc95">Control 13: Backup Verification (CC9.5)</h3>
<p>Backup verification is the process of actually restoring your backups to confirm they work — not just confirming that backups are being created. A backup that has never been tested doesn't exist from a recovery perspective.</p>
<p>SOC2 CC9.5 requires that recovery procedures are tested. If your production database is corrupted and you discover for the first time during the incident that your automated RDS snapshots can't be restored, you have both a disaster recovery failure and a SOC2 finding.</p>
<p>How to test your RDS backup:</p>
<pre><code class="language-bash"># Step 1: Find your most recent production snapshot
aws rds describe-db-snapshots \
  --db-instance-identifier your-production-db \
  --query 'sort_by(DBSnapshots, &amp;SnapshotCreateTime)[-1].DBSnapshotIdentifier' \
  --output text

# Step 2: Restore the snapshot to a test instance
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier backup-verification-test \
  --db-snapshot-identifier YOUR_SNAPSHOT_ID \
  --db-instance-class db.t3.medium \
  --no-publicly-accessible \
  --tags Key=Purpose,Value=backup-verification Key=Environment,Value=test

# Step 3: Wait for the restore to complete (typically 5–15 minutes)
aws rds wait db-instance-available \
  --db-instance-identifier backup-verification-test

# Step 4: Connect and verify data integrity (spot check key tables)
# Run this against the restored instance
psql -h RESTORED_INSTANCE_ENDPOINT -U your_user -d your_database \
  -c "SELECT COUNT(*) FROM users; SELECT MAX(created_at) FROM orders;"

# Step 5: Document the test result and delete the test instance
aws rds delete-db-instance \
  --db-instance-identifier backup-verification-test \
  --skip-final-snapshot
</code></pre>
<p>Document the test date, the snapshot used, the restore time, the data verification query results, and who conducted the test. Run this quarterly at minimum. This documentation is your SOC2 evidence for CC9.5.</p>
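<p>One way to keep that documentation consistent from quarter to quarter is to generate it as a structured record. The following is a sketch only: the function name <code>restore_test_record</code>, the field names, and the one-hour freshness tolerance are assumptions you would adapt, and the inputs are the values you gathered in Steps 1 to 4.</p>
<pre><code class="language-python">from datetime import datetime, timedelta, timezone


def restore_test_record(snapshot_id, snapshot_time, restore_minutes,
                        row_count, latest_row_time, tester,
                        max_lag=timedelta(hours=1)):
    """Build an evidence record for one restore test.

    The test passes when the restored database has rows and its newest
    row is no more than max_lag older than the snapshot.
    """
    lag = snapshot_time - latest_row_time
    passed = row_count > 0 and max_lag >= lag
    return {
        "snapshot_id": snapshot_id,
        "snapshot_time": snapshot_time.isoformat(),
        "restore_minutes": restore_minutes,
        "row_count": row_count,
        "latest_row_time": latest_row_time.isoformat(),
        "data_lag_seconds": int(lag.total_seconds()),
        "tested_by": tester,
        "result": "PASS" if passed else "FAIL",
    }


# Hypothetical values from one quarterly test
record = restore_test_record(
    snapshot_id="rds:your-production-db-2026-03-01-02-00",
    snapshot_time=datetime(2026, 3, 1, 2, 0, tzinfo=timezone.utc),
    restore_minutes=12,
    row_count=48210,
    latest_row_time=datetime(2026, 3, 1, 1, 45, tzinfo=timezone.utc),
    tester="jane.doe",
)
print(record["result"], record["data_lag_seconds"])
</code></pre>
<p>A quiet system may legitimately have no new rows in the hour before a snapshot, so treat the tolerance as something to tune rather than a fixed rule.</p>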
<h3 id="heading-control-14-change-management-log-cc81">Control 14: Change Management Log (CC8.1)</h3>
<p>A change management log is the auditable record of every change made to your production environment — what changed, who approved it, and when it was applied.</p>
<p>SOC2 CC8.1 requires that changes to your production environment are authorized and documented. With IaC and GitOps in place, you already have two separate sources of immutable change history that together satisfy this control.</p>
<p><strong>GitHub Pull Request history</strong> provides the record of every code and infrastructure change: who opened the PR, who reviewed and approved it, what the CI status was, and when it was merged. This is your change management log for application and infrastructure changes.</p>
<p><strong>ArgoCD sync history</strong> provides the record of every deployment to your Kubernetes cluster: which application was synced, from which Git commit, at what time, and whether the sync succeeded.</p>
<p>To export the ArgoCD sync history as evidence:</p>
<pre><code class="language-bash"># Export ArgoCD application sync history as JSON evidence
argocd app history YOUR_APP_NAME --output json &gt; argocd-sync-history-$(date +%Y%m).json

# Upload to your SOC2 evidence bucket
aws s3 cp argocd-sync-history-$(date +%Y%m).json \
  s3://your-soc2-evidence-bucket/change-management/$(date +%Y/%m)/

# For each deployment, the evidence contains:
# - App name, deployed revision (Git commit SHA)
# - Deployment timestamp
# - Initiating user or automated sync
# - Success/failure status
</code></pre>
<p>Together, the GitHub PR history and the ArgoCD sync history give the auditor a complete, tamper-evident record of every change to your production environment during the observation period.</p>
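<p>When the auditor asks for the deployments within the observation period, you can filter the exported history instead of handing over the whole file. A minimal sketch, assuming the export is a JSON array of entries that each carry a <code>revision</code> and an RFC 3339 <code>deployedAt</code> timestamp, as in an ArgoCD Application's <code>status.history</code>. The function names and sample revisions are illustrative:</p>
<pre><code class="language-python">from datetime import datetime, timezone


def parse_time(value):
    """Parse an RFC 3339 timestamp such as 2026-01-15T10:30:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def deployments_in_period(history, start, end):
    """Return (deployed_at, revision) pairs deployed between start and end, oldest first."""
    rows = []
    for entry in history:
        deployed_at = parse_time(entry["deployedAt"])
        if deployed_at >= start and end >= deployed_at:
            rows.append((deployed_at.isoformat(), entry["revision"]))
    return sorted(rows)


# Hypothetical history entries
history = [
    {"id": 3, "revision": "c3c3c3c", "deployedAt": "2026-02-10T09:00:00Z"},
    {"id": 1, "revision": "a1a1a1a", "deployedAt": "2025-12-20T16:30:00Z"},
    {"id": 2, "revision": "b2b2b2b", "deployedAt": "2026-01-15T10:30:00Z"},
]
period_start = datetime(2026, 1, 1, tzinfo=timezone.utc)
period_end = datetime(2026, 3, 31, tzinfo=timezone.utc)
for deployed_at, revision in deployments_in_period(history, period_start, period_end):
    print(deployed_at, revision)
</code></pre>
<p>Each revision in the output is a Git commit SHA, so it can be matched to the pull request that introduced the change.</p>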
<h2 id="heading-weeks-710-the-evidence-collection-infrastructure">Weeks 7–10: The Evidence Collection Infrastructure</h2>
<p>Evidence is the difference between passing and failing SOC2.</p>
<p>You might be wondering: what exactly is evidence? In SOC2 terms, evidence is the documentation that proves a specific control was operating correctly during a specific point in time within the observation period. A policy document says you will do something. Evidence proves you did it — and that you did it continuously, not just the week before the audit.</p>
<p>For example:</p>
<ul>
<li><p>For MFA enforcement (Control 1), evidence is a screenshot of your IAM Identity Center MFA settings taken at a specific date during the observation period, combined with an IAM credential report showing zero IAM users with console access.</p>
</li>
<li><p>For GuardDuty (Control 4), evidence is the GuardDuty console screenshot showing active detectors, plus your documented response to any findings during the period.</p>
</li>
<li><p>For access reviews (Control 12), evidence is the completed access review document with dates, names, and specific access changes made.</p>
</li>
</ul>
<p>The challenge is collecting this evidence continuously across 3–12 months without spending hundreds of hours on manual work. The solution is automated evidence collection infrastructure.</p>
<h3 id="heading-the-evidence-bucket-tamper-proof-storage-for-your-audit-evidence">The Evidence Bucket — Tamper-Proof Storage for Your Audit Evidence</h3>
<p>The evidence bucket is an S3 bucket with Object Lock enabled in GOVERNANCE mode. Object Lock prevents any object from being deleted or modified for the retention period you specify — in this case, 365 days. This means once a piece of evidence is uploaded, it can't be altered, even by a user with administrator access (without explicitly overriding the lock, which itself creates an audit trail).</p>
<p>This tamper-evident property is what gives the auditor confidence that the evidence was not created or modified after the fact.</p>
<pre><code class="language-hcl"># terraform/soc2-evidence-bucket.tf

resource "aws_s3_bucket" "soc2_evidence" {
  bucket = "${var.company_name}-soc2-evidence-${var.environment}"
}

# Block all public access to the evidence bucket
resource "aws_s3_bucket_public_access_block" "soc2_evidence" {
  bucket = aws_s3_bucket.soc2_evidence.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Enable versioning so overwrites create new versions, not replacements
resource "aws_s3_bucket_versioning" "soc2_evidence" {
  bucket = aws_s3_bucket.soc2_evidence.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Object Lock in GOVERNANCE mode — objects cannot be deleted for 365 days
resource "aws_s3_bucket_object_lock_configuration" "soc2_evidence" {
  bucket = aws_s3_bucket.soc2_evidence.id

  rule {
    default_retention {
      mode = "GOVERNANCE"
      days = 365
    }
  }
}

# Encrypt all evidence at rest
resource "aws_s3_bucket_server_side_encryption_configuration" "soc2_evidence" {
  bucket = aws_s3_bucket.soc2_evidence.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
</code></pre>
<h3 id="heading-the-daily-evidence-collector-lambda">The Daily Evidence Collector Lambda</h3>
<p>This Lambda function runs automatically every day and exports the status of each critical control to a time-stamped JSON file in the evidence bucket. Over your 3–12 month observation period, it creates a daily record proving that your controls were active and operating.</p>
<p>The function shown here checks five controls automatically: CloudTrail status, GuardDuty status (including the count of unresolved HIGH and CRITICAL findings), VPC Flow Logs, EBS encryption by default, and the account-level S3 Block Public Access setting. Each daily snapshot is uploaded with Object Lock enabled so it can't be modified.</p>
<pre><code class="language-python"># lambda/evidence-collector/handler.py

import boto3
import json
from datetime import datetime, timedelta, timezone

def lambda_handler(event, context):
    """
    Daily SOC2 evidence collector.
    Runs at 00:00 UTC every day via EventBridge scheduler.
    Exports control status to S3 evidence bucket with Object Lock.
    """
    evidence = {
        'collection_timestamp': datetime.now(timezone.utc).isoformat(),
        'collection_date': datetime.now(timezone.utc).strftime('%Y-%m-%d'),
        'account_id': boto3.client('sts').get_caller_identity()['Account'],
        'controls': {}
    }

    # Control 3: CloudTrail status
    cloudtrail = boto3.client('cloudtrail')
    trails = cloudtrail.describe_trails(includeShadowTrails=False)['trailList']
    # A trail can exist with logging stopped, so confirm IsLogging for each one
    multi_region_trails = [
        t for t in trails
        if t.get('IsMultiRegionTrail')
        and cloudtrail.get_trail_status(Name=t['TrailARN'])['IsLogging']
    ]
    evidence['controls']['cloudtrail'] = {
        'status': 'PASS' if multi_region_trails else 'FAIL',
        'detail': f"{len(multi_region_trails)} multi-region trail(s) logging",
        'trails': [t['Name'] for t in multi_region_trails]
    }

    # Control 4: GuardDuty status
    guardduty = boto3.client('guardduty')
    detectors = guardduty.list_detectors()['DetectorIds']
    # A suspended detector is still listed, so keep only those with Status ENABLED
    enabled_detectors = [
        d for d in detectors
        if guardduty.get_detector(DetectorId=d)['Status'] == 'ENABLED'
    ]
    unresolved_critical = 0
    for detector_id in enabled_detectors:
        # Counts the first page only (list_findings returns up to 50 IDs per call)
        findings = guardduty.list_findings(
            DetectorId=detector_id,
            FindingCriteria={
                'Criterion': {
                    'severity': {'Gte': 7},  # HIGH and CRITICAL only
                    'service.archived': {'Eq': ['false']}
                }
            }
        )
        unresolved_critical += len(findings['FindingIds'])

    evidence['controls']['guardduty'] = {
        'status': 'PASS' if enabled_detectors else 'FAIL',
        'detail': f"{len(enabled_detectors)} detector(s) enabled, {unresolved_critical} unresolved HIGH/CRITICAL findings",
        'unresolved_high_critical': unresolved_critical
    }

    # Control 5: VPC Flow Logs
    ec2 = boto3.client('ec2')
    flow_logs = ec2.describe_flow_logs(
        Filters=[{'Name': 'resource-type', 'Values': ['VPC']},
                 {'Name': 'flow-log-status', 'Values': ['ACTIVE']}]
    )['FlowLogs']
    evidence['controls']['vpc_flow_logs'] = {
        'status': 'PASS' if flow_logs else 'FAIL',
        'detail': f"{len(flow_logs)} active VPC flow log(s)",
        'active_flow_logs': len(flow_logs)
    }

    # Control 7: EBS encryption by default
    ebs_encryption = ec2.get_ebs_encryption_by_default()['EbsEncryptionByDefault']
    evidence['controls']['ebs_encryption_by_default'] = {
        'status': 'PASS' if ebs_encryption else 'FAIL',
        'detail': 'EBS encryption by default is enabled' if ebs_encryption else 'EBS encryption by default is NOT enabled'
    }

    # Control 8: S3 Block Public Access (account level)
    s3control = boto3.client('s3control')
    account_id = boto3.client('sts').get_caller_identity()['Account']
    try:
        pab = s3control.get_public_access_block(AccountId=account_id)['PublicAccessBlockConfiguration']
        all_blocked = all([pab['BlockPublicAcls'], pab['IgnorePublicAcls'],
                           pab['BlockPublicPolicy'], pab['RestrictPublicBuckets']])
        evidence['controls']['s3_block_public_access'] = {
            'status': 'PASS' if all_blocked else 'FAIL',
            'detail': 'All four S3 Block Public Access settings enabled' if all_blocked else 'One or more S3 Block Public Access settings not enabled',
            'configuration': pab
        }
    except Exception as e:
        evidence['controls']['s3_block_public_access'] = {'status': 'FAIL', 'detail': str(e)}

    # Upload evidence to S3 with Object Lock
    s3 = boto3.client('s3')
    evidence_key = f"daily/{evidence['collection_date']}/control-status.json"
    lock_until = datetime.now(timezone.utc) + timedelta(days=365)

    s3.put_object(
        Bucket='YOUR_EVIDENCE_BUCKET_NAME',
        Key=evidence_key,
        Body=json.dumps(evidence, indent=2),
        ContentType='application/json',
        ObjectLockMode='GOVERNANCE',
        ObjectLockRetainUntilDate=lock_until
    )

    # Alert if any control fails
    failed_controls = [k for k, v in evidence['controls'].items() if v['status'] == 'FAIL']
    if failed_controls:
        sns = boto3.client('sns')
        sns.publish(
            TopicArn='YOUR_ALERT_TOPIC_ARN',
            Subject=f'SOC2 Control Failure Detected — {evidence["collection_date"]}',
            Message=f'The following controls failed their daily check:\n\n{json.dumps(failed_controls, indent=2)}'
        )

    return {
        'statusCode': 200,
        'controls_checked': len(evidence['controls']),
        'controls_failed': len(failed_controls),
        'evidence_location': f"s3://YOUR_EVIDENCE_BUCKET_NAME/{evidence_key}"
    }
</code></pre>
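<p>The handler above has no MFA check. One could be added in the same shape as the other controls by parsing the IAM credential report. The sketch below is an assumption about how that might look, not part of the original function: <code>mfa_control_status</code> is a hypothetical helper, and the sample report is trimmed to the three columns it reads (<code>user</code>, <code>password_enabled</code>, <code>mfa_active</code>). In the Lambda, the CSV text would come from decoding the <code>Content</code> field returned by <code>get_credential_report()</code>.</p>
<pre><code class="language-python">import csv
import io


def mfa_control_status(credential_report_csv):
    """Return a control result flagging console users that have no MFA device."""
    reader = csv.DictReader(io.StringIO(credential_report_csv))
    missing = [
        row["user"] for row in reader
        if row["password_enabled"] == "true" and row["mfa_active"] != "true"
    ]
    return {
        "status": "FAIL" if missing else "PASS",
        "detail": f"{len(missing)} console user(s) without MFA",
        "users_without_mfa": missing,
    }


# Hypothetical report, trimmed to the columns used above
sample_report = (
    "user,password_enabled,mfa_active\n"
    "alice,true,true\n"
    "bob,true,false\n"
    "ci-deploy,false,false\n"
)
print(mfa_control_status(sample_report))
</code></pre>
<p>Users without a console password are skipped, since they can't sign in interactively. The returned dictionary can be stored under <code>evidence['controls']</code> like the other results.</p>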
<h3 id="heading-the-github-actions-evidence-workflow">The GitHub Actions Evidence Workflow</h3>
<p>This workflow runs daily and captures evidence that can't be automated through AWS APIs — GitHub-level controls like branch protection status, recent pull request activity, and CI pipeline results. It exports these as JSON files to the same evidence bucket.</p>
<pre><code class="language-yaml"># .github/workflows/soc2-evidence.yml
name: SOC2 Evidence Collection
on:
  schedule:
    - cron: '0 1 * * *'   # 01:00 UTC daily (after the Lambda runs at 00:00)
  workflow_dispatch:        # Allow manual trigger when needed

permissions:
  contents: read
  id-token: write   # required for the OIDC role assumption in configure-aws-credentials

jobs:
  collect-github-evidence:
    name: Collect GitHub Control Evidence
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/evidence-collector
          aws-region: us-east-1

      - name: Collect branch protection status
        run: |
          DATE=$(date +%Y-%m-%d)
          mkdir -p evidence/github

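          # Export branch protection rules for main.
          # Reading branch protection needs admin-level access that the default
          # GITHUB_TOKEN lacks, so this uses a separate token stored as a secret
          # (BRANCH_PROTECTION_TOKEN is a placeholder name).
          curl -s -H "Authorization: token ${{ secrets.BRANCH_PROTECTION_TOKEN }}" \
            "https://api.github.com/repos/${{ github.repository }}/branches/main/protection" \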
            | jq '{
                date: "'$DATE'",
                enforce_admins: .enforce_admins.enabled,
                required_reviews: .required_pull_request_reviews.required_approving_review_count,
                required_status_checks: .required_status_checks.contexts,
                allow_force_pushes: .allow_force_pushes.enabled
              }' &gt; evidence/github/branch-protection-$DATE.json

          echo "Branch protection evidence collected"
          cat evidence/github/branch-protection-$DATE.json

      - name: Upload evidence to S3
        run: |
          DATE=$(date +%Y-%m-%d)
          aws s3 sync evidence/ \
            s3://${{ secrets.SOC2_EVIDENCE_BUCKET }}/daily/$DATE/github/ \
            --no-progress
          echo "Evidence uploaded: s3://${{ secrets.SOC2_EVIDENCE_BUCKET }}/daily/$DATE/github/"
</code></pre>
<h2 id="heading-weeks-1114-auditor-selection-and-readiness-assessment">Weeks 11–14: Auditor Selection and Readiness Assessment</h2>
<h3 id="heading-how-to-choose-a-soc2-auditor">How to Choose a SOC2 Auditor</h3>
<p>Selecting the right auditor is more consequential than most teams realize. SOC2 audits are conducted by CPA firms, specifically firms licensed to issue SOC reports. The right firm has experience with cloud-native SaaS companies of your size. The wrong firm could apply enterprise audit frameworks to a seed-stage startup and generate findings based on controls that aren't appropriate to your context.</p>
<p>Here is what to look for and what to watch out for:</p>
<h4 id="heading-experience-matters-more-than-brand">Experience matters more than brand</h4>
<p>A large Big Four firm isn't necessarily better than a specialist boutique auditor for a 20-person SaaS company.</p>
<p>Ask specifically: "How many SOC2 audits have you completed in the last 12 months for SaaS companies between 10 and 50 employees?" You want a firm where this is common, not exceptional.</p>
<h4 id="heading-verify-familiarity-with-your-compliance-tool">Verify familiarity with your compliance tool</h4>
<p>If you're using Vanta or Drata, confirm that the auditor has experience with evidence produced by those platforms. Some auditors prefer to collect evidence directly and are unfamiliar with automated evidence exports. An auditor who doesn't trust your Vanta evidence will ask you to re-collect everything manually.</p>
<h4 id="heading-understand-what-type-ii-actually-costs">Understand what Type II actually costs</h4>
<p>For a Series A SaaS company, expect $15,000–$30,000 for a SOC2 Type II audit with a 3-month observation period. A quote below $10,000 often means the auditor is cutting corners on the review depth. A quote above $50,000 for a small company typically means the firm is applying enterprise pricing to a startup engagement.</p>
<h4 id="heading-get-references-from-similar-companies">Get references from similar companies</h4>
<p>Ask the auditor for two or three references from SaaS companies they've audited in the last year. Call those references and ask: did the auditor understand cloud infrastructure? Were the findings reasonable? How was the communication during the review?</p>
<p>Here's a summary table of some things to watch out for:</p>
<table>
<thead>
<tr>
<th>Criteria</th>
<th>What to Look For</th>
<th>Red Flag</th>
</tr>
</thead>
<tbody><tr>
<td>Experience</td>
<td>5+ years, 20+ SaaS audits annually</td>
<td>"We have completed several SOC2 audits" (vague)</td>
</tr>
<tr>
<td>Tool familiarity</td>
<td>Has reviewed Vanta/Drata evidence before</td>
<td>Requires manual re-collection of automated evidence</td>
</tr>
<tr>
<td>Company size fit</td>
<td>Has audited companies your size</td>
<td>Only lists enterprise clients as references</td>
</tr>
<tr>
<td>Cost (Type II)</td>
<td>$15K–$30K for a 20-person company</td>
<td>Under $10K or over $50K without clear justification</td>
</tr>
<tr>
<td>References</td>
<td>Can provide SaaS company contacts to call</td>
<td>Cannot provide references</td>
</tr>
</tbody></table>
<h3 id="heading-how-to-run-a-readiness-assessment-mock-audit">How to Run a Readiness Assessment (Mock Audit)</h3>
<p>A readiness assessment is a self-conducted simulation of the real audit, run 2–4 weeks before you engage the auditor. Its purpose is to find and close gaps before the auditor finds them, because gaps found in a mock audit cost you a week of remediation time, while gaps found in the real audit cost you a conditional report and a re-review.</p>
<p>You can run the readiness assessment yourself or hire a consultant to run it. The consultant approach is more valuable because an independent reviewer will find gaps you have rationalised away.</p>
<p>The process:</p>
<ol>
<li><p><strong>Step 1:</strong> Work through every control in the checklist below and attempt to produce the evidence that an auditor would request.</p>
</li>
<li><p><strong>Step 2:</strong> For every control where you can't produce clear, timestamped evidence: that's a gap. Document it.</p>
</li>
<li><p><strong>Step 3:</strong> Prioritise gaps by type. Evidence gaps (missing evidence for an active control) require evidence collection infrastructure fixes. Control gaps (a control that isn't implemented) require engineering work.</p>
</li>
<li><p><strong>Step 4:</strong> Close all gaps before engaging the real auditor.</p>
</li>
</ol>
<table>
<thead>
<tr>
<th>Control</th>
<th>Evidence Required</th>
<th>How to Verify</th>
<th>Ready?</th>
</tr>
</thead>
<tbody><tr>
<td>MFA enforced</td>
<td>IAM credential report + SSO MFA policy screenshot</td>
<td><code>aws iam get-credential-report</code></td>
<td>⬜</td>
</tr>
<tr>
<td>CloudTrail active</td>
<td>Trail status + S3 delivery confirmation</td>
<td><code>aws cloudtrail get-trail-status</code></td>
<td>⬜</td>
</tr>
<tr>
<td>GuardDuty active</td>
<td>Detector list + finding review log</td>
<td><code>aws guardduty list-detectors</code></td>
<td>⬜</td>
</tr>
<tr>
<td>VPC Flow Logs</td>
<td>Active flow log list + sample log entries</td>
<td><code>aws ec2 describe-flow-logs</code></td>
<td>⬜</td>
</tr>
<tr>
<td>Secrets in Secrets Manager</td>
<td>Secret list + rotation policy confirmation</td>
<td><code>aws secretsmanager list-secrets</code></td>
<td>⬜</td>
</tr>
<tr>
<td>EBS encryption by default</td>
<td>Account-level encryption setting</td>
<td><code>aws ec2 get-ebs-encryption-by-default</code></td>
<td>⬜</td>
</tr>
<tr>
<td>S3 Block Public Access</td>
<td>Account-level PAB configuration</td>
<td><code>aws s3control get-public-access-block</code></td>
<td>⬜</td>
</tr>
<tr>
<td>Branch protection (no admin bypass)</td>
<td>GitHub branch protection API response</td>
<td>GitHub API or Settings UI</td>
<td>⬜</td>
</tr>
<tr>
<td>Trivy scanning in CI</td>
<td>GitHub Actions run history showing scans</td>
<td>GitHub Actions logs</td>
<td>⬜</td>
</tr>
<tr>
<td>Incident response runbook</td>
<td>Written runbook + tabletop exercise notes with date</td>
<td>Document review</td>
<td>⬜</td>
</tr>
<tr>
<td>Access review</td>
<td>Quarterly review document with specific changes made</td>
<td>Document review</td>
<td>⬜</td>
</tr>
<tr>
<td>Backup test</td>
<td>RDS restore log + data verification results</td>
<td>Document review</td>
<td>⬜</td>
</tr>
<tr>
<td>Change management log</td>
<td>GitHub PR history + ArgoCD sync history</td>
<td>GitHub and ArgoCD</td>
<td>⬜</td>
</tr>
</tbody></table>
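<p>The triage in Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the <code>ControlCheck</code> record and its two boolean fields are hypothetical names, and you would fill them in by hand as you work through the checklist.</p>

```python
from dataclasses import dataclass


@dataclass
class ControlCheck:
    name: str
    implemented: bool      # is the control actually switched on?
    evidence_found: bool   # could you produce clear, timestamped evidence?


def classify_gap(check):
    """Return 'control gap', 'evidence gap', or None when the control is ready."""
    if not check.implemented:
        return "control gap"      # requires engineering work
    if not check.evidence_found:
        return "evidence gap"     # requires an evidence-collection fix
    return None


# Example results from a mock audit (made-up values)
checks = [
    ControlCheck("MFA enforced", implemented=True, evidence_found=True),
    ControlCheck("VPC Flow Logs", implemented=True, evidence_found=False),
    ControlCheck("Backup test", implemented=False, evidence_found=False),
]

gaps = {c.name: classify_gap(c) for c in checks if classify_gap(c) is not None}
print(gaps)
```

<p>Sorting the resulting gaps by type gives you two separate work queues: one for the evidence pipeline and one for engineering.</p>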
<p><strong>The one thing most teams skip:</strong> Running the readiness assessment against their own evidence bucket. Pull a random day's evidence from the daily Lambda export and verify that it's complete, timestamped, and accurately reflects the control status on that day.</p>
<p>If the evidence file for December 14th shows GuardDuty as PASS but GuardDuty was actually disabled that day, the auditor will find the discrepancy in the AWS account history — and that's a qualified finding.</p>
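<p>A spot check of one day's evidence file might look like the sketch below. The JSON layout (a <code>collected_at</code> timestamp plus a <code>controls</code> map) and the control keys are assumptions made for illustration, so adapt them to whatever your collector actually writes.</p>

```python
from datetime import date, datetime

# Hypothetical control keys; use the ones your collector exports.
REQUIRED_CONTROLS = {"mfa", "cloudtrail", "guardduty"}


def check_evidence(record, expected_day):
    """Return a list of problems found in one day's evidence record."""
    problems = []
    stamp = record.get("collected_at")
    if stamp is None:
        problems.append("missing timestamp")
    elif datetime.fromisoformat(stamp).date() != expected_day:
        problems.append("timestamp does not match the requested day")
    missing = REQUIRED_CONTROLS - set(record.get("controls", {}))
    for name in sorted(missing):
        problems.append(f"no status recorded for {name}")
    return problems


sample = {
    "collected_at": "2024-12-14T02:00:00+00:00",
    "controls": {"mfa": "PASS", "guardduty": "PASS"},
}
problems = check_evidence(sample, date(2024, 12, 14))
print(problems)
```

<p>This only covers completeness and timestamps. Whether a recorded PASS was accurate still has to be confirmed against the AWS account history for that day.</p>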
<h2 id="heading-weeks-1518-the-observation-period">Weeks 15–18: The Observation Period</h2>
<h3 id="heading-how-the-auditor-observes-your-controls">How the Auditor Observes Your Controls</h3>
<p>The SOC2 auditor doesn't physically visit your office or sit inside your AWS console watching your infrastructure in real time. The audit is a remote, documentation-based process conducted entirely through evidence review.</p>
<p>Here is how it actually works:</p>
<p>First, the auditor provides a list of evidence requests — typically 80–150 items for a Type II audit. You upload the evidence to a shared portal (the auditor provides this — it is usually a secure document sharing platform). The auditor reviews the evidence, asks follow-up questions, and identifies gaps where evidence is missing or a control wasn't operating as described.</p>
<p>For automated controls like CloudTrail and GuardDuty, the evidence is your daily Lambda exports — the auditor spot-checks a sample of daily snapshots across the observation period to verify the controls were consistently active.</p>
<p>For manual controls like access reviews and backup tests, the evidence is the documents you produced when you ran those processes.</p>
<p>The practical implication: the auditor is trusting your evidence. This is why Object Lock on your evidence bucket matters. It shows the auditor that each evidence file has not been modified or deleted since it was written.</p>
<h3 id="heading-what-the-auditor-reviews-over-the-observation-period">What the Auditor Reviews Over the Observation Period</h3>
<table>
<thead>
<tr>
<th>What They Check</th>
<th>How Often</th>
<th>What They Are Looking For</th>
</tr>
</thead>
<tbody><tr>
<td>CloudTrail logs</td>
<td>Spot check monthly</td>
<td>Manual console changes that bypassed IaC, gaps in log delivery</td>
</tr>
<tr>
<td>GuardDuty findings</td>
<td>Review quarterly summary</td>
<td>HIGH or CRITICAL findings not remediated within your documented SLA</td>
</tr>
<tr>
<td>Access review completion</td>
<td>Verify each quarterly cycle</td>
<td>Reviews skipped, reviews with no access changes despite employee turnover</td>
</tr>
<tr>
<td>Incident response tests</td>
<td>Verify annually</td>
<td>No tabletop exercise conducted during the observation period</td>
</tr>
<tr>
<td>Evidence collection</td>
<td>Verify continuous coverage</td>
<td>Gaps in daily evidence exports, missing evidence for specific dates</td>
</tr>
<tr>
<td>Change management log</td>
<td>Sample PR/sync history</td>
<td>Deployments with no associated pull request or review</td>
</tr>
</tbody></table>
<h3 id="heading-what-triggers-a-finding">What Triggers a Finding</h3>
<p>A SOC2 finding is the auditor's documented conclusion that a control wasn't operating effectively during the observation period. Findings range from observations (minor issues that don't affect the audit opinion) to qualified opinions (material failures that result in a qualified rather than unqualified report).</p>
<p>Understanding what triggers findings — and which ones restart the observation period — is critical for managing your audit timeline.</p>
<p><strong>Control gaps</strong> occur when a required control isn't implemented or was disabled during the observation period. If you discover in month 2 that MFA wasn't enforced on one IAM user for the first three weeks, you must document the remediation and demonstrate the gap was closed.</p>
<p>Whether this restarts your observation period depends on how long the gap lasted and how the auditor assesses the risk — but a gap of less than 30 days that's immediately remediated and documented typically doesn't restart the clock.</p>
<p><strong>Evidence gaps</strong> are more serious. If your daily Lambda evidence collector failed for two weeks and produced no evidence exports, you have a two-week window with no documented proof that your controls were operating. The auditor can't verify controls they can't see evidence for.</p>
<p>Evidence gaps almost always require extending the observation period because there's no way to retroactively produce evidence for a period that wasn't recorded.</p>
<p><strong>Process failures</strong> occur when a manual control wasn't executed as documented. The most common is an access review that was skipped. Like control gaps, these can typically be remediated without restarting the clock if they're documented promptly and the remediation is clear.</p>
<p><strong>Unpatched critical CVEs</strong> are a special case. If Trivy identifies a CRITICAL vulnerability in a production container and it remains unpatched for more than your documented remediation SLA (typically 30 days for critical, 90 days for high), this is a qualified finding that the auditor will note in the report.</p>
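<p>The SLA check for open vulnerabilities is simple date arithmetic. Here is a minimal sketch that uses the typical 30-day and 90-day windows quoted above; the function name and the example dates are made up, and your documented SLA may differ.</p>

```python
from datetime import date

# SLA windows in days, taken from the typical values quoted above.
SLA_DAYS = {"CRITICAL": 30, "HIGH": 90}


def breaches_sla(severity, detected_on, today):
    """True when a finding has stayed open longer than its SLA window."""
    limit = SLA_DAYS.get(severity)
    if limit is None:
        return False  # severities without a documented SLA are not tracked here
    return (today - detected_on).days > limit


today = date(2024, 4, 15)
detected = date(2024, 3, 10)  # 36 days before "today"
critical_breach = breaches_sla("CRITICAL", detected, today)
high_breach = breaches_sla("HIGH", detected, today)
print(critical_breach, high_breach)
```

<p>Running a check like this on a schedule lets you catch a finding that is approaching its deadline before it becomes an audit issue.</p>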
<h3 id="heading-how-to-close-gaps-without-restarting-the-clock">How to Close Gaps Without Restarting the Clock</h3>
<p>When you discover a gap during the observation period:</p>
<p><strong>For control gaps:</strong></p>
<pre><code class="language-plaintext">1. Fix the control immediately — don't wait
2. Document the fix: screenshot, PR link, or CLI command output with timestamp
3. Note the gap date range in your audit log: "Control gap: 2024-03-10 to 2024-03-14 (4 days). Root cause: [X]. Remediated: [Y]. No customer data accessed during gap period."
4. Notify your auditor proactively — they will find it anyway; proactive disclosure is better than defensive explanation
5. The observation period doesn't restart if the gap was short-lived and promptly remediated
</code></pre>
<p><strong>For evidence gaps:</strong></p>
<pre><code class="language-plaintext">1. Fix the evidence collection infrastructure immediately
2. Understand that you can't retroactively generate evidence for the gap period
3. The observation period for affected controls effectively restarts from the date evidence collection resumed
4. If the gap is early in your observation period, you may be able to extend the period rather than restart — discuss with your auditor
</code></pre>
<p><strong>Pro tip:</strong> Set up a CloudWatch alarm that triggers if the evidence Lambda fails to deliver to S3 on schedule. That way, a missing daily evidence file is caught within 24 hours instead of being discovered during the audit review.</p>
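<p>As a complement to the alarm, you can periodically scan the bucket listing for missing days. The sketch below assumes a hypothetical key layout of <code>evidence/YYYY-MM-DD.json</code>; in practice the list of keys would come from an S3 listing (for example, boto3's <code>list_objects_v2</code>).</p>

```python
from datetime import date, timedelta


def missing_days(keys, start, end, prefix="evidence/", suffix=".json"):
    """Return every date from start to end (inclusive) with no matching key."""
    present = set(keys)
    gaps = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        if f"{prefix}{day.isoformat()}{suffix}" not in present:
            gaps.append(day)
    return gaps


# Example listing with one day missing (made-up keys)
keys = [
    "evidence/2024-03-10.json",
    "evidence/2024-03-11.json",
    "evidence/2024-03-13.json",
]
gaps = missing_days(keys, date(2024, 3, 10), date(2024, 3, 13))
print([d.isoformat() for d in gaps])
```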
<h2 id="heading-the-90-day-soc2-timeline-at-a-glance">The 90-Day SOC2 Timeline at a Glance</h2>
<table>
<thead>
<tr>
<th>Weeks</th>
<th>Focus</th>
<th>Key Deliverables</th>
<th>Common Mistake</th>
</tr>
</thead>
<tbody><tr>
<td>1–2</td>
<td>Scope</td>
<td>Boundary diagram, network segmentation Terraform</td>
<td>Over-scoping to include dev and staging</td>
</tr>
<tr>
<td>3–6</td>
<td>Controls</td>
<td>14 controls implemented and collecting evidence</td>
<td>Starting controls after the observation period begins</td>
</tr>
<tr>
<td>7–10</td>
<td>Evidence</td>
<td>S3 evidence bucket, Lambda daily collector, GitHub Actions workflow</td>
<td>Manual evidence collection with inevitable gaps</td>
</tr>
<tr>
<td>11–14</td>
<td>Readiness</td>
<td>Mock audit, gap remediation, auditor selected</td>
<td>Skipping the mock audit</td>
</tr>
<tr>
<td>15–18</td>
<td>Observation</td>
<td>Daily evidence, quarterly reviews, incident response test</td>
<td>Discovering evidence gaps during the audit rather than before</td>
</tr>
</tbody></table>
<h2 id="heading-whats-next">What's Next?</h2>
<p>Start with Week 1. Define your SOC2 boundary. Apply the four-question framework to every system in your infrastructure. Draw the diagram in Excalidraw. Document the network segmentation controls.</p>
<p>Then implement the 14 controls in order, starting with MFA and CloudTrail — the two that most commonly fail audits when they're missing.</p>
<p>Then build your evidence collection infrastructure before the observation period starts. The automated Lambda and GitHub Actions workflow are the difference between a smooth audit and a 60-day extension.</p>
<p>One thing to remember: SOC2 is 20% controls, 30% evidence, and 50% continuous operation. Start early. Automate everything. Run a mock audit before you call the real one.</p>
<h2 id="heading-resources">Resources</h2>
<p>The following resources are referenced throughout this guide:</p>
<ul>
<li><p><a href="https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services"><strong>AICPA SOC2 Overview</strong></a> — The official SOC2 documentation from the American Institute of CPAs, including the Trust Service Criteria</p>
</li>
<li><p><a href="https://www.vanta.com/"><strong>Vanta</strong></a> — Compliance automation platform that connects to AWS and GitHub to automate evidence collection and track control status</p>
</li>
<li><p><a href="https://drata.com/"><strong>Drata</strong></a> — Alternative compliance automation platform with similar capabilities to Vanta</p>
</li>
<li><p><a href="https://github.com/aquasecurity/trivy"><strong>Trivy by Aqua Security</strong></a> — Open-source container and filesystem vulnerability scanner used in Control 10</p>
</li>
<li><p><a href="https://excalidraw.com/"><strong>Excalidraw</strong></a> — Free, open-source diagram tool for creating the SOC2 boundary diagram</p>
</li>
<li><p><a href="https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html"><strong>AWS IAM Identity Center documentation</strong></a> — Official AWS documentation for setting up SSO and MFA enforcement</p>
</li>
<li><p><a href="https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches"><strong>GitHub branch protection documentation</strong></a> — Official GitHub documentation for configuring branch protection rules</p>
</li>
<li><p><a href="https://argo-cd.readthedocs.io/"><strong>ArgoCD documentation</strong></a> — Official ArgoCD documentation for GitOps deployment and sync history</p>
</li>
</ul>
<p><a href="https://github.com/aayostem">Ayobami Adejumo</a> <em>is a senior platform engineer and FinOps specialist. He writes about SOC2 compliance engineering, Kubernetes cost optimization, and platform engineering.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy a Serverless Spam Classifier Using Scikit-Learn, AWS Lambda, & API Gateway ]]>
                </title>
                <description>
                    <![CDATA[ In today's digital world, spam is no longer just an annoyance - it's a growing security threat. To combat this, developers often turn to machine learning to build intelligent filters that can distingu ]]>
                </description>
                <link>https://www.freecodecamp.org/news/deploying-serverless-spam-classifier/</link>
                <guid isPermaLink="false">69f2e347b18c978233780179</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Architecture ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Rakshath Naik ]]>
                </dc:creator>
                <pubDate>Thu, 30 Apr 2026 05:06:15 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/08672d22-a4df-4b99-8ef7-fffd18f5dc07.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In today's digital world, spam is no longer just an annoyance - it's a growing security threat. To combat this, developers often turn to machine learning to build intelligent filters that can distinguish legitimate emails from malicious ones.</p>
<p>While building a machine learning model in a notebook is relatively straightforward, the real challenge lies in the last mile: deploying that model into a scalable, production-ready system that users can actually interact with.</p>
<p>In this project, I built an end-to-end serverless spam classifier, combining Scikit-learn for model development with AWS Lambda, Amazon S3, and Amazon API Gateway for deployment. The result is a lightweight, scalable API that can classify messages in real time.</p>
<p>The system is designed to be modular and cost-efficient, allowing the model to be retrained and updated independently without affecting the live API. From detecting "free iPhone" scams to identifying phishing attempts, this project demonstrates how to bridge the gap between machine learning experimentation and real-world deployment.</p>
<h3 id="heading-table-of-contents">Table of&nbsp;Contents</h3>
<ul>
<li><p><a href="#heading-1-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-2-building-the-brain-the-model">Building the Brain: The Model</a></p>
</li>
<li><p><a href="#heading-3-deploying-the-model-to-aws">Deploying the Model to AWS</a></p>
</li>
<li><p><a href="#heading-4-how-to-run-the-project-locally">How to Run The Project Locally</a></p>
</li>
<li><p><a href="#heading-5-our-project-architecture">Our Project Architecture</a></p>
</li>
<li><p><a href="#heading-6-conclusion-the-power-of-serverless-ai">Conclusion: The Power of Serverless AI</a></p>
</li>
<li><p><a href="#heading-7-acknowledgment-references">Acknowledgment / References</a></p>
</li>
</ul>
<h2 id="heading-1-prerequisites">1. Prerequisites</h2>
<ol>
<li><p><strong>Fundamental skills:</strong> Basic proficiency in Python and understanding of Machine Learning concepts like classification.</p>
</li>
<li><p><strong>AWS account:</strong> Access to an AWS account with permissions for Lambda, S3, and API Gateway.</p>
</li>
<li><p><strong>Environment:</strong> Python 3.11 installed, along with libraries like scikit-learn, pandas, and joblib.</p>
</li>
<li><p><strong>AWS CLI:</strong> Configured on your local machine for file uploads.</p>
</li>
<li><p><strong>Hugging Face account:</strong> You can download the trained model directly from my Hugging Face page.</p>
</li>
</ol>
<h2 id="heading-2-building-the-brain-the-model">2. Building the Brain: The&nbsp;Model</h2>
<img src="https://cdn.hashnode.com/uploads/covers/6942c2903c5d674e359eaf1e/b43af198-1472-4914-9469-6cd5ca5384e2.png" alt="Demonstrational image to show the brain of AI." style="display:block;margin:0 auto" width="1000" height="563" loading="lazy">

<p><em>Photo by</em> <a href="https://unsplash.com/@steve_j?utm_source=medium&amp;utm_medium=referral"><em>Steve A Johnson</em></a> <em>on</em> <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral"><em>Unsplash</em></a></p>
<p>At the heart of this project lies a supervised learning approach. Instead of simply specifying which words are considered spam, we'll provide the computer with a dataset and an algorithm, enabling it to learn and identify spam patterns on its own.</p>
<h3 id="heading-1-vectorization-turning-text-into-math">1. Vectorization: Turning Text into&nbsp;Math</h3>
<p>Machine learning models can't <strong>read</strong> text. They require numerical input. To solve this, we'll use the <a href="https://www.freecodecamp.org/news/how-to-extract-keywords-from-text-with-tf-idf-and-pythons-scikit-learn-b2a0f3d7e667/">TF-IDF</a> (Term Frequency-Inverse Document Frequency) vectorizer.</p>
<pre><code class="language-python">from sklearn.feature_extraction.text import TfidfVectorizer

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(X_train)
</code></pre>
<p>Here's the mathematical formula:</p>
<p>$$w_{i,j} = tf_{i,j} \times \log \left( \frac{N}{df_i} \right)$$</p>
<p>TF-IDF term definitions:</p>
<ul>
<li><p><strong>wᵢ,ⱼ (Weight):</strong> The final importance score of a specific word in a document.</p>
</li>
<li><p><strong>tfᵢ,ⱼ (Term Frequency):</strong> How often a word appears in a single email.</p>
</li>
<li><p><strong>N (Total Documents):</strong> The total count of all emails in your dataset.</p>
</li>
<li><p><strong>dfᵢ (Document Frequency):</strong> The number of different emails that contain this specific word.</p>
</li>
<li><p><strong>log(N/dfᵢ) (IDF):</strong> A penalty that lowers the score of common words like <strong>the</strong> or <strong>is</strong> that appear everywhere.</p>
</li>
</ul>
<p>The vectorizer cleans the data by removing common English stop words, converts all text to lowercase for consistency, and assigns more weight to rare, meaningful words than to frequently used ones.</p>
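<p>To make the formula concrete, here is a small hand-rolled version of the textbook TF-IDF weight applied to a made-up three-message corpus. It is a sketch of the formula above, not of scikit-learn's internals.</p>

```python
import math


def tf_idf(term, doc, corpus):
    """Textbook TF-IDF weight: tf * log(N / df)."""
    tf = doc.count(term)                        # occurrences in this document
    df = sum(1 for d in corpus if term in d)    # documents containing the term
    if df == 0:
        return 0.0                              # term never seen in the corpus
    return tf * math.log(len(corpus) / df)


# Toy corpus of tokenized messages (made-up data)
corpus = [
    ["win", "free", "prize"],
    ["free", "lunch", "today"],
    ["see", "you", "today"],
]

for term in ("win", "free"):
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

<p>The word that appears in only one message ("win") gets a higher weight than the word that appears in two ("free"). Note that scikit-learn's <code>TfidfVectorizer</code> uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from this textbook version.</p>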
<h3 id="heading-2-training-the-logistic-regression-engine">2. Training: The Logistic Regression Engine</h3>
<p>We'll use <strong>Logistic Regression</strong> here, a classification algorithm that predicts the probability of an outcome.</p>
<p>In this stage, we feed our vectorized training data into the Logistic Regression algorithm. The goal is to establish a mathematical relationship between specific word weights and the <strong>Spam</strong> or <strong>Ham</strong> label.</p>
<p>During training, the model iteratively adjusts its internal parameters to minimize error, eventually learning that words like winner or free correlate highly with spam, while conversational language correlates with legitimate messages.</p>
<pre><code class="language-python">from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train_features, Y_train)
</code></pre>
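<p>In our case, it calculates the probability that an email is spam or ham. Note that in this project's labels, 1 means ham and 0 means spam, as the label mapping in the Lambda function later in this article shows.</p>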
<p>The algorithm uses the Sigmoid function to map any real-valued number into a value between 0 and 1.</p>
<p>$$P(y=1|x) = \frac{1}{1 + e^{-(z)}}$$</p>
<p>where z = β₀ + β₁x₁ +&nbsp;… + βₙxₙ.</p>
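<p>The two formulas above translate directly into Python. The weights, bias, and feature values below are invented for illustration and are not the trained model's parameters.</p>

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Compute z = bias + sum(w_i * x_i) and pass it through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Made-up parameters: z = 0.5 + (2.0 * 0.0) + (-1.0 * 1.5) = -1.0
p = predict_proba([2.0, -1.0], 0.5, [0.0, 1.5])
print(sigmoid(0.0), round(p, 4))
```

<p>A value of z = 0 maps to a probability of exactly 0.5, which is the decision boundary. Positive z pushes the probability toward 1, and negative z pushes it toward 0.</p>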
<h3 id="heading-3-evaluation-testing-the-intelligence">3. Evaluation: Testing the Intelligence</h3>
<p>After training, we need to verify if the brain actually works on data it hasn't seen before.</p>
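<pre><code class="language-python">from sklearn.metrics import accuracy_score
X_test_features = feature_extraction.transform(X_test)  # reuse the fitted vectorizer
prediction_on_test_data = model.predict(X_test_features)
accuracy_on_test_data = accuracy_score(Y_test, prediction_on_test_data)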
</code></pre>
<p>By comparing the model’s predictions against the actual labels in our test set, we calculate an Accuracy Score. This gives us the confidence that the model is ready for the real world (achieving ~94% accuracy in our tests).</p>
<h3 id="heading-4-exporting-the-logic-serialization">4. Exporting the Logic (Serialization)</h3>
<p>To move this brain from our local Python environment to the AWS Cloud, we'll use Joblib to save our work into binary files (.pkl).</p>
<pre><code class="language-python">import joblib

joblib.dump(model, 'spam_model.pkl')
joblib.dump(feature_extraction, 'vectorizer.pkl')
</code></pre>
<p>We use the Pickle format because it allows us to freeze complex Python objects (mathematical weights and word mappings) into a portable binary format that can be instantly re-animated in the cloud.</p>
<p>We need the Vectorizer to translate new user text into the exact numerical coordinates the Model was trained to understand. Using one without the other is like having a key but no lock.</p>
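<p>The sketch below shows why both files have to travel together. It uses toy stand-ins rather than the trained objects: a plain dictionary plays the role of the vectorizer's vocabulary, a list plays the role of the model's weights, and the standard library's <code>pickle</code> stands in for Joblib. All names and values are hypothetical.</p>

```python
import pickle
from io import BytesIO

# Toy stand-ins (not the trained objects): a vocabulary that maps words to
# column indices, and one weight per column.
vocabulary = {"free": 0, "winner": 1, "meeting": 2}
weights = [1.8, 2.1, -1.5]


def vectorize(text, vocab):
    """Count how often each known word appears, in vocabulary column order."""
    vec = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec


# Serialize both objects to bytes, as if they were two separate files
vocab_bytes = pickle.dumps(vocabulary)
weight_bytes = pickle.dumps(weights)

# Reload them from in-memory buffers, similar to reading the S3 object bodies
restored_vocab = pickle.load(BytesIO(vocab_bytes))
restored_weights = pickle.load(BytesIO(weight_bytes))

features = vectorize("FREE winner free", restored_vocab)
score = sum(w * x for w, x in zip(restored_weights, features))
print(features, score)
```

<p>The weights only make sense when the columns are in the order the vocabulary defines, which is why a model loaded with a different vectorizer would produce meaningless scores.</p>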
<p>The trained Logistic Regression model and TF-IDF vectorizer are openly available for the community on Hugging Face here: <a href="https://huggingface.co/rakshath1/mail-spam-detector">Get the model on HuggingFace</a>.</p>
<h2 id="heading-3-deploying-the-model-to-aws">3. Deploying the Model to&nbsp;AWS</h2>
<p>Training a model is science, while deploying it is engineering. To make this classifier accessible to the world, we'll use a serverless stack that scales automatically and requires almost no maintenance.</p>
<h3 id="heading-1-model-storage-amazon-s3">1. Model Storage: Amazon&nbsp;S3</h3>
<p>First, we'll upload our .pkl files to an S3 bucket. By decoupling the model from the code, we can update the AI's intelligence (simply by overwriting the file in S3) without redeploying the backend code. This makes the system highly maintainable.</p>
<h3 id="heading-2-the-production-backend-aws-lambda">2. The Production Backend: AWS&nbsp;Lambda</h3>
<p>To make the AI accessible, we'll move from a local script to a Serverless Cloud Architecture. This ensures the model is always available without the cost of a 24/7 server.</p>
<p>The deployment environment is AWS Lambda (Python 3.11). Since Lambda is a lightweight environment, it doesn't include Scikit-Learn or Joblib. To provide these, we'll package them as a ZIP file, upload it to our S3 bucket, and attach it to the function as a Lambda layer.</p>
<p><strong>Shell commands to build and upload the layer:</strong></p>
<pre><code class="language-shell">
# 1. Create a workspace
mkdir ml_layer &amp;&amp; cd ml_layer

# 2. Install scikit-learn and its dependencies into a folder
pip install \
    --platform manylinux2014_x86_64 \
    --target=python/lib/python3.11/site-packages \
    --implementation cp \
    --python-version 3.11 \
    --only-binary=:all: \
    scikit-learn joblib

# 3. Zip the folder
zip -r sklearn_lib.zip python

# 4. Upload to S3 (Using AWS CLI)
aws s3 cp sklearn_lib.zip s3://YOUR-BUCKET-NAME/
</code></pre>
<p>We upload the layer ZIP through S3 because Lambda limits direct uploads to 50 MB (zipped); the 250 MB unzipped limit for the function plus its layers still applies. Keeping Scikit-Learn in a layer also keeps the function's own code small and quick to redeploy.</p>
<p><strong>The Lambda Function:</strong></p>
<pre><code class="language-python">
import json
import boto3
import os
import sys
from io import BytesIO

# Ensure the custom Lambda layer (containing sklearn/joblib) is on the import path
sys.path.append('/opt/python')

try:
    import joblib
except ImportError:
    # Fallback for specific Scikit-Learn distributions
    from sklearn.utils import _joblib as joblib

# Initialize S3 client
s3 = boto3.client('s3')

# Use placeholders for the article so readers can insert their own values
BUCKET_NAME = 'YOUR_S3_BUCKET_NAME' 
MODEL_KEY = 'spam_model.pkl'
VECTORIZER_KEY = 'vectorizer.pkl'

# Global variables for 'Warm Start' caching (improves performance by keeping model in RAM)
model = None
vectorizer = None

def load_model():
    """Downloads model files from S3 only if they aren't already in RAM"""
    global model, vectorizer
    if model is None or vectorizer is None:
        try:
            # 1. Load the Logistic Regression Model from S3
            m_obj = s3.get_object(Bucket=BUCKET_NAME, Key=MODEL_KEY)
            model = joblib.load(BytesIO(m_obj['Body'].read()))
            
            # 2. Load the TF-IDF Vectorizer directly from S3
            v_obj = s3.get_object(Bucket=BUCKET_NAME, Key=VECTORIZER_KEY)
            vectorizer = joblib.load(BytesIO(v_obj['Body'].read()))
        except Exception as e:
            raise Exception(f"Failed to load .pkl files from S3: {str(e)}")

def lambda_handler(event, context):
    try:
        # Ensure model and vectorizer are ready before processing
        load_model()
        
        # Handles both direct Lambda tests and API Gateway POST requests
        body = event.get('body', event)
        if isinstance(body, str):
            body = json.loads(body)
            
        text = body.get('text', '')
            
        if not text:
            return {
                'statusCode': 400,
                'headers': {'Access-Control-Allow-Origin': '*'},  # errors need CORS too
                'body': json.dumps({'error': 'No text provided.'})
            }

        # 1. Transform input text to numeric features using the trained Vectorizer
        data_vec = vectorizer.transform([text])
        
        # 2. Predict using the Logistic Regression Model 
        prediction = int(model.predict(data_vec)[0])
        
        # 3. Map numeric result to human-readable label
        result_label = "HAM" if prediction == 1 else "SPAM"
        
        # RESPONSE WITH CORS
        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*' # needed for cross-domain web integration
            },
            'body': json.dumps({
                'status': 'success',
                'classification': result_label,
                'input_text': text
            })
        }
        
    except Exception as e:
        return {
            'statusCode': 500,
            'headers': {'Access-Control-Allow-Origin': '*'},
            'body': json.dumps({'error_message': f"Inference Error: {str(e)}"})
        }
</code></pre>
<p>Key features of the Lambda function:</p>
<ol>
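<li><p><strong>Warm start caching:</strong> By defining the model and vectorizer variables outside the lambda_handler, we keep them in the container's memory between invocations. Only the first request in a new container pays the cost of downloading and deserializing the files; subsequent (warm) requests reuse them, which significantly reduces their latency.</p>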
</li>
<li><p><strong>Layer-based dependencies:</strong> Lambda extracts layers under /opt, and the <strong>sys.path.append('/opt/python')</strong> line makes sure that directory is on the import path, so the heavy libraries never have to be bundled with the function code.</p>
</li>
<li><p><strong>Bimodal input handling:</strong> The function is designed to handle both direct JSON testing from the AWS console and stringified payloads sent via API Gateway.</p>
</li>
</ol>
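<p>The bimodal input handling can be isolated into a small helper that is easy to test locally. This is a sketch: the <code>extract_text</code> name is hypothetical, and the extra guard for a non-dictionary body is an addition that is not in the handler above.</p>

```python
import json


def extract_text(event):
    """Return the 'text' field from either event shape, or '' if absent."""
    body = event.get("body", event)   # API Gateway wraps the payload in 'body'
    if isinstance(body, str):
        body = json.loads(body)       # proxy integration sends a JSON string
    if not isinstance(body, dict):
        return ""                     # e.g. a request with an empty body
    return body.get("text", "")


# The same message as a console test event and as an API Gateway proxy event
console_event = {"text": "WIN free iPhone now"}
gateway_event = {"body": json.dumps({"text": "WIN free iPhone now"})}

print(extract_text(console_event) == extract_text(gateway_event))
```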
<h3 id="heading-3-the-api-gateway-the-bridge-to-the-web">3. The API Gateway - The Bridge to the&nbsp;Web</h3>
<img src="https://cdn.hashnode.com/uploads/covers/6942c2903c5d674e359eaf1e/8aa3e8d7-569a-4dd5-a6ac-184922474952.png" alt="Demonstrational image to show the API Gateway." style="display:block;margin:0 auto" width="1000" height="563" loading="lazy">

<p>Photo by <a href="https://unsplash.com/@growtika?utm_source=medium&amp;utm_medium=referral">Growtika</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></p>
<h4 id="heading-creating-the-rest-api">Creating the REST API</h4>
<p>Next, we'll create a REST API with a single POST method. Why POST, you might be wondering? We need to send the user’s text message as a JSON payload, and a POST request carries it in the request body rather than in the URL.</p>
<ol>
<li><p>First navigate to the Amazon API Gateway console and select Create API -&gt; REST API.</p>
</li>
<li><p>Give your API a name, such as EmailSpamPredictor-API, and set the Endpoint Type to Regional.</p>
</li>
<li><p>Then, in the left sidebar, click Resources and create a resource (for example, <strong>/predict</strong>, the name I used).</p>
</li>
<li><p>Next, click Create method, select POST, and choose Lambda function as the integration type.</p>
</li>
<li><p>Ensure Lambda Proxy integration is enabled (this allows the full request to pass through to your code).</p>
</li>
</ol>
<p><strong>The CORS Configuration (The Troubleshooting Hub)</strong><br>This is where many developers encounter the dreaded <strong>Connection Error</strong>. Our API is hosted on AWS, so if your front-end is served from a different origin, the browser’s Same-Origin Policy will block the request by default.</p>
<p>To fix this, we'll enable <strong>CORS:</strong></p>
<ol>
<li><p><strong>Access-Control-Allow-Origin:</strong> Set to * (or specifically to your domain) to tell the browser that the API is allowed to talk to your front-end.</p>
</li>
<li><p><strong>The OPTIONS method:</strong> When you enable CORS in the console, API Gateway creates an OPTIONS method automatically. This handles the preflight request where the browser asks, “Are you allowed to receive data from me?” before sending the actual text.</p>
</li>
<li><p><strong>Access-Control-Allow-Headers:</strong> In the screenshot, you'll notice headers like Content-Type and Authorization are allowed. This ensures that when our JavaScript fetch() call sets the content type to application/json, the API Gateway doesn't reject it.</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/6942c2903c5d674e359eaf1e/cf5c87c6-f374-4dda-8001-77a0aab52672.png" alt="Image illustrates the CORS configuration for our project. " style="display:block;margin:0 auto" width="1487" height="617" loading="lazy">

<p>Image illustrates the CORS configuration for our project. (Image by author)</p>
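<p>With Lambda proxy integration, the headers returned by the function are passed through to the browser, so every response, including error responses, needs to carry the CORS header. One way to guarantee that is a small response builder. The <code>make_response</code> helper below is a hypothetical sketch, not part of the function shown earlier.</p>

```python
import json

CORS_HEADERS = {
    "Content-Type": "application/json",
    "Access-Control-Allow-Origin": "*",  # or your front-end's origin
}


def make_response(status_code, payload):
    """Build a Lambda proxy response that always carries the CORS headers."""
    return {
        "statusCode": status_code,
        "headers": dict(CORS_HEADERS),   # copy so callers can't mutate the default
        "body": json.dumps(payload),
    }


ok = make_response(200, {"classification": "SPAM"})
bad = make_response(400, {"error": "No text provided."})
print(ok["headers"] == bad["headers"])
```

<p>Without the header on an error response, the browser reports a CORS failure and hides the real error message from your JavaScript code.</p>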
<h4 id="heading-deployment-stages">Deployment Stages</h4>
<p>Once the API is deployed to a production stage, AWS generates a permanent Invoke URL. This acts as the public gateway to our model and typically follows this structure: <code>https://[api-id].execute-api.[region].amazonaws.com/prod/classify</code>, where the final path segment is the resource name you created.</p>
<h4 id="heading-connecting-the-frontend-the-javascript-layer">Connecting the Frontend (The JavaScript Layer)</h4>
<p>With the API live, we can now write a simple JavaScript function to talk to our model. This script runs whenever a user clicks the <strong>Analyze</strong> button on your site.</p>
<pre><code class="language-javascript">
async function checkSpam() {
    const message = document.getElementById("userInput").value;
    const apiUrl = "YOUR_API_GATEWAY_INVOKE_URL";

    try {
        const response = await fetch(apiUrl, {
            method: "POST",
            headers: {
                "Content-Type": "application/json"
            },
            body: JSON.stringify({ "text": message })
        });

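        // Treat non-2xx responses as errors instead of reading a missing field
        if (!response.ok) {
            throw new Error(`API returned status ${response.status}`);
        }
        const data = await response.json();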
        
        // Display result on the webpage
        const resultElement = document.getElementById("result");
        resultElement.innerText = `Prediction: ${data.classification}`;
        resultElement.style.color = data.classification === "SPAM" ? "red" : "green";

    } catch (error) {
        console.error("Error:", error);
        alert("Could not connect to the Spam Detector API.");
    }
}
</code></pre>
<h2 id="heading-4-how-to-run-the-project-locally">4. How to Run The Project&nbsp;Locally</h2>
<p>You can save the front-end as an HTML file. Once it's ready, don't just double-click the .html file: pages opened from a <strong>file</strong> URL run under tighter browser security restrictions, which can block the API request. Instead, serve it from a simple local server.</p>
<p><strong>Step 1:</strong> Open the terminal or Command Prompt.</p>
<p><strong>Step 2:</strong> Navigate to your project folder.</p>
<pre><code class="language-shell">cd [PATH_TO_YOUR_FOLDER]
</code></pre>
<p><strong>Step 3:</strong> Start a local Python web server.</p>
<pre><code class="language-shell">python -m http.server 8000
</code></pre>
<p><strong>Step 4:</strong> Access the application.</p>
<p>Open your browser and navigate to:<br><a href="http://localhost:8000/your-file-name.html">http://localhost:8000/your-file-name.html</a></p>
<p><strong>Watch the Demo:</strong></p>
<div class="embed-wrapper"><iframe width="560" height="315" src="https://www.youtube.com/embed/q2X_azntmzY" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>

<h2 id="heading-5-our-project-architecture">5. Our Project Architecture</h2>
<img src="https://cdn.hashnode.com/uploads/covers/6942c2903c5d674e359eaf1e/c17673d4-5dd0-43dc-8e8d-3015bcd31864.png" alt="Image showing the Architecture Diagram of our Project." style="display:block;margin:0 auto" width="1000" height="563" loading="lazy">

<p>The image illustrates the architecture of our project (Building a Serverless Spam Classifier). It shows the process that takes place from the client input to the final model output. (Image by Author)</p>
<ol>
<li><p><strong>Client Front-End Interaction:</strong> The process starts on the far left. A user interacts with the web interface (for example, a website or a desktop app). They input text like <strong>WIN free iPhone now</strong> and trigger a request.</p>
</li>
<li><p><strong>The Entry Point: API Gateway:</strong> The request hits the Amazon API Gateway, which acts as the <strong>security guard</strong> and translator.&nbsp;<br><strong>(a)</strong> CORS OPTIONS handles the pre-flight handshake to ensure the browser has permission to talk to the AWS cloud.&nbsp;<br><strong>(b)</strong> Classification Request (POST) routes the actual message data to your backend logic.</p>
</li>
<li><p><strong>The Engine: AWS Lambda (Python 3.11):</strong>&nbsp;The central “<strong>lightbulb</strong>” represents your Lambda function. This is where the code you wrote lives. It doesn’t run 24/7 – it only wakes up when a request arrives.</p>
</li>
<li><p><strong>Storage &amp; Retrieval: S3 Bucket:</strong> Since Lambda is lightweight, it doesn’t store your heavy Machine Learning files internally.<br><strong>Dependency and Model Download:</strong> The function reaches out to the S3 Bucket to pull in the sklearn_lib.zip (the engine) and the&nbsp;.pkl files (the intelligence).&nbsp;<br><strong>Required Dependency and Model:</strong> These assets are loaded into the Lambda’s temporary memory to prepare for the prediction.</p>
</li>
<li><p><strong>The Inference Pipeline:</strong>&nbsp;Inside the Lambda, a three-step mathematical cycle occurs:<br><strong>(a) Text Vectorizer:</strong> Translates the words into numbers.<br><strong>(b) Logistic Regression:</strong> Calculates the probability of spam based on those numbers.<br><strong>(c) Label:</strong> Assigns a final result (Spam or Ham).</p>
</li>
<li><p><strong>The Result Delivery:</strong> The result is sent back through the API Gateway, including the necessary CORS Headers to ensure the browser accepts it. The front-end then updates to show the “<strong>Result: SPAM</strong>” with a visual indicator.</p>
</li>
</ol>
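<p>The request path through steps 2 to 6 can be sketched as a small Lambda handler. This is a minimal illustration, assuming a REST API with Lambda proxy integration (so the event carries <code>httpMethod</code> and a JSON string <code>body</code>). The <code>predict_label</code> function is a hypothetical keyword-based stand-in for the real vectorizer and Logistic Regression model, used here only so the example runs without scikit-learn.</p>
<pre><code class="language-python">import json

# CORS headers returned with every response. "*" is an assumption for local testing;
# restrict it to your site's origin in production.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type",
    "Access-Control-Allow-Methods": "OPTIONS,POST",
}


def predict_label(text):
    """Hypothetical stand-in for the vectorizer + Logistic Regression step."""
    suspicious = ("win", "free", "prize")
    return "SPAM" if any(word in text.lower() for word in suspicious) else "HAM"


def lambda_handler(event, context):
    # (a) Pre-flight: answer the browser's CORS OPTIONS request with headers only.
    if event.get("httpMethod") == "OPTIONS":
        return {"statusCode": 200, "headers": CORS_HEADERS, "body": ""}

    # (b) Classification request: the POST body arrives as a JSON string.
    body = json.loads(event.get("body") or "{}")
    label = predict_label(body.get("text", ""))
    return {
        "statusCode": 200,
        "headers": CORS_HEADERS,
        "body": json.dumps({"classification": label}),
    }
</code></pre>
<p>The response body uses the same <code>classification</code> key that the front-end reads, and the CORS headers are attached to both the pre-flight and the POST response so the browser accepts the result.</p>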
<h2 id="heading-6-conclusion-the-power-of-serverless-ai">6. Conclusion: The Power of Serverless AI</h2>
<p>By merging the mathematical simplicity of Logistic Regression with the industrial strength of AWS Serverless Architecture, we have transformed a static Python script into a globally accessible, scalable API.</p>
<p>This project demonstrates that you don’t need a massive budget or a 24/7 dedicated server to deploy high-quality Machine Learning.</p>
<p>Using the S3-to-Lambda workaround allowed us to bypass common storage hurdles, ensuring that our Brain (the model) and its Muscle (Scikit-Learn) could function seamlessly within the cloud’s ephemeral environment. It bridges the gap between experimentation and real-world applications, making AI systems practical, efficient, and accessible.</p>
<h2 id="heading-7-acknowledgment-references">7. Acknowledgment / References</h2>
<ul>
<li><p>Pre-trained spam classification model: <a href="https://huggingface.co/rakshath1/mail-spam-detector">rakshath1/mail-spam-detector on Hugging Face</a></p>
</li>
<li><p>Scikit-learn <a href="https://scikit-learn.org/stable/api/index.html">Documentation</a></p>
</li>
<li><p>AWS Lambda <a href="https://docs.aws.amazon.com/lambda/latest/api/welcome.html">Documentation</a></p>
</li>
<li><p>Amazon S3 <a href="https://aws.amazon.com/documentation-overview/s3/">Documentation</a></p>
</li>
<li><p>Amazon API Gateway <a href="https://docs.aws.amazon.com/apigateway/">Documentation</a></p>
</li>
</ul>
<h3 id="heading-connect-with-me">Connect With Me</h3>
<ul>
<li><p><a href="https://medium.com/@rakshathnaik62">Medium</a></p>
</li>
<li><p><a href="https://www.linkedin.com/in/rakshath-/">LinkedIn</a></p>
</li>
</ul>
<p><strong>You may also like</strong></p>
<ol>
<li><p><a href="https://qubrica.com/python-polars-v-s-pandas-libraries-comparison/">How Polars overtook Pandas</a></p>
</li>
<li><p><a href="https://qubrica.com/devops-is-dead-platform-engineering-2026/"><strong>DevOps is Dead. Long Live Platform Engineering</strong></a></p>
</li>
</ol>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Set Up OpenID Connect (OIDC) in GitHub Actions for AWS
 ]]>
                </title>
                <description>
                    <![CDATA[ If you've been storing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as GitHub Secrets to deploy to AWS, you're not alone. It's the most common approach and it's also one of the biggest security risks i ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-set-up-openid-connect-oidc-in-github-actions-for-aws/</link>
                <guid isPermaLink="false">69ef7bbf330a1ad7f7f2d579</guid>
                
                    <category>
                        <![CDATA[ OpenID Connect ]]>
                    </category>
                
                    <category>
                        <![CDATA[ OIDC ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GitHub Actions ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Security ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ci-cd ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tolani Akintayo ]]>
                </dc:creator>
                <pubDate>Mon, 27 Apr 2026 15:07:43 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/83b71e24-b63b-42a4-ac1c-d59e226da6c3.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've been storing <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code> as GitHub Secrets to deploy to AWS, you're not alone. It's the most common approach and it's also one of the biggest security risks in a CI/CD pipeline.</p>
<p>Here's why: static credentials don't expire on their own. If they get leaked through a misconfigured workflow, a public fork, or a compromised repository, an attacker has persistent access to your AWS environment until you manually rotate them. And most teams don't rotate them often enough.</p>
<p>OpenID Connect (OIDC) solves this. Instead of storing long-lived credentials, GitHub Actions obtains a <strong>short-lived token</strong> from GitHub's OIDC provider and exchanges it with AWS for temporary credentials every time your workflow runs. No secrets to rotate. No long-lived credentials to leak. No manual key management.</p>
<p>In this tutorial, you'll learn how to set up OIDC authentication between GitHub Actions and AWS from scratch. By the end, your workflows will authenticate to AWS securely without storing a single access key.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-openid-connect-oidc">What Is OpenID Connect (OIDC)?</a></p>
</li>
<li><p><a href="#heading-how-oidc-works-between-github-actions-and-aws">How OIDC Works Between GitHub Actions and AWS</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-step-1-create-an-iam-oidc-identity-provider-in-aws">Step 1: Create an IAM OIDC Identity Provider in AWS</a></p>
<p><a href="#heading-step-2-create-an-iam-role-with-a-trust-policy">Step 2: Create an IAM Role with a Trust Policy</a></p>
<p><a href="#heading-step-3-attach-permissions-to-the-iam-role">Step 3: Attach Permissions to the IAM Role</a></p>
<p><a href="#heading-step-4-store-the-role-arn-as-a-github-actions-variable">Step 4: Store the Role ARN as a GitHub Actions Variable</a></p>
<p><a href="#heading-step-5-configure-your-github-actions-workflow">Step 5: Configure Your GitHub Actions Workflow</a></p>
<p><a href="#heading-step-6-run-and-verify-your-workflow">Step 6: Run and Verify Your Workflow</a></p>
</li>
<li><p><a href="#heading-security-best-practices">Security Best Practices</a></p>
</li>
<li><p><a href="#heading-troubleshooting-common-errors">Troubleshooting Common Errors</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-references">References</a></p>
</li>
</ul>
<h2 id="heading-what-is-openid-connect-oidc">What Is OpenID Connect (OIDC)?</h2>
<p>OpenID Connect is an identity protocol built on top of OAuth 2.0. It allows systems to verify identity through tokens rather than shared secrets.</p>
<p>In the context of GitHub Actions and AWS:</p>
<ul>
<li><p><strong>GitHub</strong> acts as the <strong>identity provider (IdP)</strong>. It issues a signed JWT (JSON Web Token) for each workflow run.</p>
</li>
<li><p><strong>AWS</strong> acts as the <strong>service provider</strong>. It validates that token against GitHub's public keys and exchanges it for temporary AWS credentials. The credentials AWS returns are short-lived (valid for up to 1 hour by default) and scoped to exactly the IAM role you define. When the workflow ends, those credentials are gone.</p>
</li>
</ul>
<p>This model is called <strong>federated identity</strong>. It's the same concept used when you "Sign in with Google" on a third-party website. The difference is that instead of a user signing in, your workflow is the one authenticating.</p>
<h2 id="heading-how-oidc-works-between-github-actions-and-aws">How OIDC Works Between GitHub Actions and AWS</h2>
<p>Before writing a single line of YAML, it's helpful to understand the flow. This is my personal approach when implementing new technologies or concepts. Here's what happens every time your workflow runs:</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/8b5b39de-f671-4ffe-a2db-96d10ade69b3.jpg" alt="Diagram showing the OIDC authentication flow between GitHub Actions and AWS" style="display:block;margin:0 auto" width="449" height="544" loading="lazy">

<p>The diagram illustrates a secure authentication flow between GitHub Actions and AWS using OpenID Connect (OIDC), eliminating the need to store long-lived AWS credentials in GitHub. Here's what happens step-by-step:</p>
<p><strong>1. Initial Authentication Request</strong></p>
<p>When your GitHub Actions workflow starts, the runner (the virtual machine executing your workflow) requests a JSON Web Token (JWT) from GitHub's OIDC provider located at <code>https://token.actions.githubusercontent.com</code>.</p>
<p><strong>2. Token Issuance</strong></p>
<p>GitHub's OIDC provider generates and signs a JWT containing important claims (metadata) about your workflow. These claims include details like which repository the workflow is running from, which branch triggered it, what environment it's running in, and other contextual information that proves the workflow's identity.</p>
<p><strong>3. Token Validation</strong></p>
<p>The GitHub Actions runner presents this signed JWT to AWS Security Token Service (STS). AWS STS validates the JWT's signature by checking it against GitHub's publicly available cryptographic keys, ensuring the token is authentic and hasn't been tampered with.</p>
<p><strong>4. Trust Policy Verification</strong></p>
<p>AWS STS checks the trust policy configured on your IAM Role. This trust policy specifies which GitHub repositories, branches, or environments are allowed to assume this role. If the claims in the JWT match your trust policy conditions, authentication succeeds.</p>
<p><strong>5. Temporary Credentials Issued</strong></p>
<p>Once validated, AWS STS returns temporary security credentials to the GitHub Actions runner. These credentials include an Access Key ID, Secret Access Key, and Session Token that are valid for a limited time (typically 1 hour by default, configurable up to 12 hours).</p>
<p><strong>6. AWS API Access</strong></p>
<p>The GitHub Actions runner uses these temporary credentials to authenticate API calls to your AWS resources such as pushing Docker images to ECR, updating ECS services, writing to S3 buckets, or invoking Lambda functions.</p>
<p>The key point: <strong>AWS never sees your GitHub credentials, and GitHub never sees your AWS credentials.</strong> The JWT is the only thing exchanged and it's signed, scoped, and short-lived.</p>
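<p>To make the claims from step 2 concrete, here is a minimal sketch that decodes the payload of a JWT with the Python standard library. It builds an unsigned sample token with made-up claim values purely for illustration, and it does not verify the signature. That check is what AWS STS performs in step 3.</p>
<pre><code class="language-python">import base64
import json


def b64url_encode(data):
    """Base64url-encode bytes without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_claims(token):
    """Return the payload (claims) of a JWT WITHOUT verifying its signature."""
    payload_segment = token.split(".")[1]
    padding = "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_segment + padding))


# Build an illustrative, unsigned token. The claim values are made-up examples.
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({
    "iss": "https://token.actions.githubusercontent.com",
    "aud": "sts.amazonaws.com",
    "sub": "repo:your-org/your-repo:ref:refs/heads/main",
}).encode())
sample_token = f"{header}.{payload}.signature-placeholder"

claims = decode_claims(sample_token)
print(claims["sub"])  # repo:your-org/your-repo:ref:refs/heads/main
</code></pre>
<p>The <code>iss</code>, <code>aud</code>, and <code>sub</code> claims are the ones the trust policy conditions in this tutorial rely on.</p>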
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have the following in place:</p>
<ul>
<li><p>An <strong>AWS account</strong> with IAM permissions to create identity providers and roles</p>
</li>
<li><p>A <strong>GitHub repository</strong> (public or private) where your workflows will run</p>
</li>
<li><p>Basic familiarity with <strong>GitHub Actions</strong>, knowing how to write a <code>.yml</code> workflow file</p>
</li>
<li><p>Basic familiarity with <strong>AWS IAM</strong> roles, policies, and permissions</p>
</li>
<li><p>The <strong>AWS CLI</strong> installed and configured (optional, but useful for verification)</p>
</li>
</ul>
<p>You don't need to be an AWS expert. Each step includes the exact console path and the configuration values you need.</p>
<h2 id="heading-step-1-create-an-iam-oidc-identity-provider-in-aws">Step 1: Create an IAM OIDC Identity Provider in AWS</h2>
<p>The first thing you need to do is tell AWS to trust GitHub as an identity provider. This is a one-time setup per AWS account.</p>
<h3 id="heading-how-to-do-it-in-the-aws-console">How to Do It in the AWS Console</h3>
<p>1. Open the <a href="https://console.aws.amazon.com/iam/">AWS IAM Console</a></p>
<p>2. In the left sidebar, click Identity providers</p>
<p>3. Click Add provider</p>
<p>4. For Provider type, select OpenID Connect</p>
<p>5. For Provider URL, enter:</p>
<pre><code class="language-plaintext">https://token.actions.githubusercontent.com
</code></pre>
<p>6. For Audience, enter:</p>
<pre><code class="language-plaintext">sts.amazonaws.com
</code></pre>
<p>7. Click Add provider</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/66f1de9d-36f9-462e-ad0c-090b152be6e5.png" alt="AWS IAM console showing the Add Identity Provider form configured for GitHub Actions OIDC" style="display:block;margin:0 auto" width="1349" height="609" loading="lazy">

<h3 id="heading-how-to-do-it-with-the-aws-cli">How to Do It with the AWS CLI</h3>
<p>If you prefer the terminal, run this command:</p>
<pre><code class="language-shell">aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/4b779fa0-0df2-4bc3-bbf4-9839ef8ce5e6.png" alt="terminal-oidc-connect-created" style="display:block;margin:0 auto" width="966" height="114" loading="lazy">

<p>Once created, you'll see <code>token.actions.githubusercontent.com</code> listed under <strong>Identity providers</strong> in your IAM console. This provider will be referenced in your IAM role's trust policy in the next step.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/eb820487-6553-43d2-b6b7-4e7b08d039ef.png" alt="verify oidc connect in AWS" style="display:block;margin:0 auto" width="1132" height="284" loading="lazy">

<h2 id="heading-step-2-create-an-iam-role-with-a-trust-policy">Step 2: Create an IAM Role with a Trust Policy</h2>
<p>Now you need an IAM role that your GitHub Actions workflow will assume. The trust policy on this role controls which repositories and branches are allowed to request credentials.</p>
<h3 id="heading-how-to-create-the-iam-role-in-the-aws-console">How to Create the IAM Role in the AWS Console</h3>
<p>1. Open the <a href="https://console.aws.amazon.com/iam/">AWS IAM Console</a></p>
<p>2. In the left sidebar, click <strong>Roles</strong></p>
<p>3. Click <strong>Create role</strong></p>
<p>4. For <strong>Trusted entity type</strong>, select <strong>Web identity</strong></p>
<p>5. For <strong>Identity Provider</strong>, choose: <code>token.actions.githubusercontent.com</code> which you created earlier.</p>
<p>6. For Audience, choose <code>sts.amazonaws.com</code></p>
<p>7. For GitHub organization, enter your GitHub username or organization name</p>
<p>8. For GitHub repository, enter the name of your GitHub repository</p>
<p>9. For GitHub branch, enter your branch name (for example, <code>main</code>)</p>
<p>10. Click <strong>Next</strong>, click <strong>Next</strong> again, give the role a name, and click <strong>Create role</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/dca12969-db8a-4ec4-885e-e953f4808f6c.png" alt="create-iam-role-for-github-action-via-the-console" style="display:block;margin:0 auto" width="1351" height="620" loading="lazy">

<p>Note: Creating the IAM role this way already sets up the <strong>Trusted entities</strong> with a trust policy based on steps 4 to 9 above. You can verify this by opening the created role and navigating to <strong>Trust relationships</strong>.</p>
<h3 id="heading-how-to-create-the-iam-role-with-the-aws-cli">How to Create the IAM Role with the AWS CLI</h3>
<p>First, create a trust policy document on your local machine. You can call it <code>trust-policy.json</code>:</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:YOUR_GITHUB_ORG/YOUR_REPO_NAME:*"
        }
      }
    }
  ]
}
</code></pre>
<p>Replace the following placeholders before saving:</p>
<table>
<thead>
<tr>
<th>Placeholder</th>
<th>Replace With</th>
</tr>
</thead>
<tbody><tr>
<td><code>YOUR_ACCOUNT_ID</code></td>
<td>Your 12-digit AWS account ID</td>
</tr>
<tr>
<td><code>YOUR_GITHUB_ORG</code></td>
<td>Your GitHub username or organization name</td>
</tr>
<tr>
<td><code>YOUR_REPO_NAME</code></td>
<td>The name of your GitHub repository</td>
</tr>
</tbody></table>
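<p>If you'd rather not edit the placeholders by hand, the same document can be generated with a short script. This is a minimal sketch: <code>build_trust_policy</code> is an illustrative helper name, and the account ID and repository values are placeholders rather than real identifiers.</p>
<pre><code class="language-python">import json


def build_trust_policy(account_id, org, repo, ref="*"):
    """Return the trust policy as a dict, scoped to one repository."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                    "StringLike": {f"{provider}:sub": f"repo:{org}/{repo}:{ref}"},
                },
            }
        ],
    }


# Example values are placeholders; 123456789012 is not a real account.
policy = build_trust_policy("123456789012", "your-org", "your-repo")
print(json.dumps(policy, indent=2))
</code></pre>
<p>Redirecting the printed output to <code>trust-policy.json</code> gives you the same file as above with the placeholders already filled in.</p>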
<h3 id="heading-how-to-understand-the-sub-condition">How to Understand the <code>sub</code> Condition</h3>
<p>The <code>sub</code> (subject) claim in the JWT tells AWS exactly where the request is coming from. The value <code>repo:your-org/your-repo:*</code> means a workflow run from any branch, tag, pull request, or environment in that repository can assume this role.</p>
<p>You can tighten this further depending on your needs:</p>
<pre><code class="language-shell"># Only the main branch
"token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
 
# Only a specific GitHub Environment
"token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:environment:production"
</code></pre>
<p>Scoping this correctly is one of the most important security decisions in this setup. Here's how to decide:</p>
<ul>
<li><p>Use <code>ref:refs/heads/main</code> if only your main/production branch should deploy to AWS. This is the most restrictive and secure option: feature branches can't accidentally (or maliciously) trigger deployments or modify production resources.</p>
</li>
<li><p>Use <code>environment:production</code> if you're using GitHub Environments with protection rules (required reviewers, deployment gates). This lets you control deployments through GitHub's approval workflow while still restricting which workflows can access AWS.</p>
</li>
<li><p>Use <code>repo:your-org/your-repo:*</code> (wildcard) only if you need any branch to deploy, for example, in development environments where every feature branch deploys to its own isolated stack. Never use this for production roles.</p>
</li>
</ul>
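<p>To see how these patterns behave, the sketch below approximates the <code>StringLike</code> comparison in Python, where <code>*</code> matches any run of characters and <code>?</code> matches exactly one. It is a local illustration of the matching rule, not AWS's implementation, and the <code>sub</code> values are examples.</p>
<pre><code class="language-python">import re


def string_like(pattern, value):
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


main_only = "repo:your-org/your-repo:ref:refs/heads/main"
any_ref = "repo:your-org/your-repo:*"

# Example sub claims a workflow might present (illustrative values).
from_main = "repo:your-org/your-repo:ref:refs/heads/main"
from_feature = "repo:your-org/your-repo:ref:refs/heads/feature-x"
from_other_repo = "repo:your-org/other-repo:ref:refs/heads/main"

print(string_like(main_only, from_main))       # True
print(string_like(main_only, from_feature))    # False
print(string_like(any_ref, from_feature))      # True
print(string_like(any_ref, from_other_repo))   # False
</code></pre>
<p>The comparison is case-sensitive, so <code>Your-Org</code> and <code>your-org</code> are treated as different values.</p>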
<p>Run this command to create the role using your trust policy:</p>
<pre><code class="language-shell">aws iam create-role \
  --role-name GitHubActionsOIDCRole \
  --assume-role-policy-document file://trust-policy.json \
  --description "Role assumed by GitHub Actions via OIDC"
</code></pre>
<p>Take note of the <strong>Role ARN</strong> in the output. It will look like this:</p>
<pre><code class="language-plaintext">arn:aws:iam::YOUR_ACCOUNT_ID:role/GitHubActionsOIDCRole
</code></pre>
<p>You'll store this ARN as a GitHub Actions variable in Step 4 and reference it from your workflow YAML in Step 5.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/6bb154e7-0fb3-4c58-94e1-90116eaea95a.png" alt="terminal output of the AWS CLI create-role command showing the returned Role ARN" style="display:block;margin:0 auto" width="1123" height="615" loading="lazy">

<h2 id="heading-step-3-attach-permissions-to-the-iam-role">Step 3: Attach Permissions to the IAM Role</h2>
<p>The IAM role can now authenticate, but it has no permissions yet. You need to attach a policy that defines what your workflow is actually allowed to do in AWS.</p>
<h3 id="heading-how-to-apply-the-principle-of-least-privilege">How to Apply the Principle of Least Privilege</h3>
<p>Only grant the permissions your workflow genuinely needs. If your workflow deploys to S3, give it S3 permissions. If it pushes images to ECR, give it ECR permissions. Never attach <code>AdministratorAccess</code> to a CI/CD role.</p>
<h4 id="heading-option-1-attach-an-aws-managed-policy-quick-start">Option 1: Attach an AWS managed policy (quick start):</h4>
<pre><code class="language-shell">aws iam attach-role-policy \
  --role-name GitHubActionsOIDCRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
</code></pre>
<h4 id="heading-option-2-create-a-custom-policy-scoped-to-a-specific-s3-bucket-recommended-for-production">Option 2: Create a custom policy scoped to a specific S3 bucket (recommended for production):</h4>
<p>This approach is recommended for production because it limits the blast radius of a security incident. If your workflow credentials are ever compromised, a custom policy scoped to a specific bucket means an attacker can only affect that single bucket, not every S3 bucket in your AWS account. It also prevents accidental misconfigurations in your workflow from impacting unrelated resources.</p>
<p>Create a file called <code>s3-deploy-policy.json</code>:</p>
<pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
</code></pre>
<p>Then create and attach it:</p>
<pre><code class="language-shell">aws iam create-policy \
  --policy-name GitHubActionsS3DeployPolicy \
  --policy-document file://s3-deploy-policy.json
 
aws iam attach-role-policy \
  --role-name GitHubActionsOIDCRole \
  --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/GitHubActionsS3DeployPolicy
</code></pre>
<p>Note: You can also complete <strong>Step 3</strong> in the AWS console.</p>
<p><strong>Reference:</strong> For a full list of available AWS IAM actions, see the <a href="https://docs.aws.amazon.com/service-authorization/latest/reference/reference_policies_actions-resources-contextkeys.html">AWS IAM actions reference</a>.</p>
<h2 id="heading-step-4-store-the-role-arn-as-a-github-actions-variable">Step 4: Store the Role ARN as a GitHub Actions Variable</h2>
<p>Before you configure your workflow, you need to make the Role ARN available to it. You'll store it as a repository variable in GitHub, not a secret, because the ARN itself isn't sensitive data.</p>
<h3 id="heading-how-to-add-the-variable-in-your-repository">How to Add the Variable in Your Repository</h3>
<p>First, open your GitHub repository and click <strong>Settings:</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/b2dd526a-00ca-44eb-8d22-b78dfd220a14.png" alt="GitHub repository top navigation bar with the Settings tab highlighted" style="display:block;margin:0 auto" width="1310" height="307" loading="lazy">

<p>In the left sidebar, scroll down to <strong>Secrets and variables</strong>, then click <strong>Actions:</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/61d67c83-7bbc-4570-93ec-f2ee4207ad6e.png" alt="GitHub repository settings sidebar showing Secrets and variables expanded with Actions selected" style="display:block;margin:0 auto" width="1266" height="325" loading="lazy">

<p>Then click the <strong>Variables</strong> tab (not Secrets). Click New repository variable – you can set the <strong>Name</strong> to:</p>
<pre><code class="language-plaintext">AWS_ROLE_ARN
</code></pre>
<p>Set the <strong>Value</strong> to your Role ARN from Step 2, for example:</p>
<pre><code class="language-plaintext">arn:aws:iam::YOUR_ACCOUNT_ID:role/GitHubActionsOIDCRole
</code></pre>
<p>Click <strong>Add variable:</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/65a5bfab4c73b29396c0b895/71f5468d-d4ab-45c1-aecd-8509f575237a.png" alt="GitHub repository Actions variables tab showing AWS_ROLE_ARN variable added successfully" style="display:block;margin:0 auto" width="1083" height="377" loading="lazy">

<p>You'll reference this variable in your workflow in the next step using <code>${{ vars.AWS_ROLE_ARN }}</code>.</p>
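<p>A stray extra colon or a missing account ID is an easy typo to make when pasting the ARN. As a quick sanity check, the sketch below validates the shape of a role ARN before you save it. The pattern assumes the standard <code>aws</code> partition, and the account ID is a placeholder.</p>
<pre><code class="language-python">import re

# Shape of an IAM role ARN in the standard "aws" partition: the region field is
# empty, followed by a 12-digit account ID and the role name (optionally with a path).
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+")


def is_role_arn(value):
    """Return True if the value has the shape of an IAM role ARN."""
    return ROLE_ARN_PATTERN.fullmatch(value) is not None


print(is_role_arn("arn:aws:iam::123456789012:role/GitHubActionsOIDCRole"))   # True
print(is_role_arn("arn:aws:iam::123456789012::role/GitHubActionsOIDCRole"))  # False
</code></pre>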
<h2 id="heading-step-5-configure-your-github-actions-workflow">Step 5: Configure Your GitHub Actions Workflow</h2>
<p>With AWS and GitHub fully configured, you now need to update your workflow to request an OIDC token and use it to authenticate.</p>
<h3 id="heading-how-to-set-the-required-workflow-permissions">How to Set the Required Workflow Permissions</h3>
<p>Your workflow <strong>must</strong> declare <code>id-token: write</code>. Without this, GitHub won't issue an OIDC token to the runner.</p>
<pre><code class="language-yaml">permissions:
  id-token: write   # Required to request the OIDC JWT
  contents: read    # Required to checkout the repository
</code></pre>
<p><strong>Important:</strong> If you set permissions at the job level, they override any top-level permissions. Make sure <code>id-token: write</code> is present at whichever level your AWS authentication step runs.</p>
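<p>For context on what <code>id-token: write</code> unlocks: when the permission is granted, the runner exposes the <code>ACTIONS_ID_TOKEN_REQUEST_URL</code> and <code>ACTIONS_ID_TOKEN_REQUEST_TOKEN</code> environment variables, which are used to fetch the JWT. The sketch below only builds that request so you can see its shape. In practice the credentials action does this for you, and the function name here is illustrative.</p>
<pre><code class="language-python">import os
import urllib.parse
import urllib.request


def build_token_request(audience="sts.amazonaws.com"):
    """Build (but do not send) the request a runner makes for its OIDC token."""
    # GitHub only sets these two variables when the job has id-token: write.
    url = os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"]
    bearer = os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]

    # Append the audience to whatever query string the URL already carries.
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append(("audience", audience))
    full_url = urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))

    return urllib.request.Request(full_url, headers={"Authorization": f"bearer {bearer}"})
</code></pre>
<p>If the permission is missing, those environment variables are not set, and the lookup fails with a <code>KeyError</code> before any request is made.</p>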
<h3 id="heading-full-workflow-example">Full Workflow Example</h3>
<p>Here's a complete workflow that authenticates to AWS using OIDC and deploys a static site to S3:</p>
<pre><code class="language-yaml">name: Deploy to AWS S3
 
on:
  push:
    branches:
      - main
 
permissions:
  id-token: write
  contents: read
 
jobs:
  deploy:
    name: Deploy
    runs-on: ubuntu-latest
 
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
 
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.AWS_ROLE_ARN }}
          aws-region: us-east-2
 
      - name: Verify AWS identity
        run: aws sts get-caller-identity
 
      - name: Deploy to S3
        run: |
          aws s3 sync ./code s3://your-bucket-name
</code></pre>
<p>Replace the following before committing:</p>
<table>
<thead>
<tr>
<th>Placeholder</th>
<th>Replace With</th>
</tr>
</thead>
<tbody><tr>
<td><code>AWS_ROLE_ARN</code></td>
<td>The variable name for your IAM role ARN in GitHub</td>
</tr>
<tr>
<td><code>us-east-2</code></td>
<td>Your target AWS region</td>
</tr>
<tr>
<td><code>your-bucket-name</code></td>
<td>Your S3 bucket name</td>
</tr>
<tr>
<td><code>./code</code></td>
<td>The local directory containing the files you want to sync to S3</td>
</tr>
</tbody></table>
<p>You can see the code sample in my GitHub Repo <a href="https://github.com/tolani-akintayo/OpenID-Connect-in-GitHub-Actions-for-AWS">here</a>.</p>
<p><strong>Note:</strong> The <code>aws-actions/configure-aws-credentials</code> action handles the entire OIDC token exchange automatically. It requests the JWT from GitHub, calls <code>sts:AssumeRoleWithWebIdentity</code>, and exports the temporary credentials as environment variables for the rest of the job.</p>
<p>See the <a href="https://github.com/aws-actions/configure-aws-credentials">action's official documentation</a> for all available options.</p>
<h2 id="heading-step-6-run-and-verify-your-workflow">Step 6: Run and Verify Your Workflow</h2>
<p>Push your workflow to the <code>main</code> branch and open the <strong>Actions</strong> tab in your repository to watch it run.</p>
<h3 id="heading-what-a-successful-run-looks-like">What a Successful Run Looks Like</h3>
<p>The Configure AWS credentials via OIDC step should show:</p>
<pre><code class="language-plaintext">Assuming role with OIDC: arn:aws:iam::YOUR_ACCOUNT_ID:role/GitHubActionsOIDCRole
</code></pre>
<p>The Verify AWS identity step (<code>aws sts get-caller-identity</code>) should return:</p>
<pre><code class="language-json">{
    "UserId": "AROA...:GitHubActions",
    "Account": "YOUR_ACCOUNT_ID",
    "Arn": "arn:aws:sts::YOUR_ACCOUNT_ID:assumed-role/GitHubActionsOIDCRole/GitHubActions"
}
</code></pre>
<p>If you see an <code>assumed-role</code> ARN in the output, OIDC is working correctly. Your workflow is now authenticating to AWS without a single stored credential.</p>
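<p>If you want to automate that check, for example in a follow-up script, the output can be parsed with a few lines of Python. This is a minimal sketch: the helper name is illustrative, and the sample JSON uses a placeholder account ID and user ID.</p>
<pre><code class="language-python">import json


def assumed_role_name(identity_json):
    """Return the role name if the caller is an assumed role, otherwise None."""
    arn = json.loads(identity_json)["Arn"]
    resource = arn.split(":", 5)[5]  # e.g. "assumed-role/RoleName/SessionName"
    kind, _, rest = resource.partition("/")
    if kind != "assumed-role":
        return None
    return rest.split("/")[0]


sample = json.dumps({
    "UserId": "AROAEXAMPLE:GitHubActions",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/GitHubActionsOIDCRole/GitHubActions",
})
print(assumed_role_name(sample))  # GitHubActionsOIDCRole
</code></pre>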
<h2 id="heading-security-best-practices">Security Best Practices</h2>
<p>Getting OIDC working is step one. Locking it down properly is step two.</p>
<h3 id="heading-scope-the-sub-condition-as-tightly-as-possible">Scope the <code>sub</code> Condition as Tightly as Possible</h3>
<p>Don't use a wildcard like <code>repo:your-org/*:*</code> that allows any repository in your organization to assume the role. Scope it to the exact repository and branch that needs access.</p>
<pre><code class="language-json">"token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
</code></pre>
<h3 id="heading-use-github-environments-for-production-deployments">Use GitHub Environments for Production Deployments</h3>
<p>GitHub Environments let you add manual approval gates and restrict which branches can deploy. When combined with OIDC, you can scope your trust policy to only allow the <code>production</code> environment:</p>
<pre><code class="language-json">"token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:environment:production"
</code></pre>
<h3 id="heading-apply-least-privilege-permissions-to-every-iam-role">Apply Least-Privilege Permissions to Every IAM Role</h3>
<p>Never attach <code>AdministratorAccess</code> or <code>PowerUserAccess</code> to a role used by CI/CD. Define a custom policy with only the actions your workflow actually needs.</p>
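<p>As a rough illustration of what a narrow custom policy can look like, the TypeScript sketch below builds a policy document for uploading a static site to one S3 bucket and checks it for wildcard actions. <code>buildS3DeployPolicy</code>, <code>hasWildcardAction</code>, the bucket name, and the chosen action list are all assumptions for illustration. Your workflow may need a different set of actions.</p>
<pre><code class="language-typescript">// Shape of an IAM policy statement, reduced to the fields this sketch uses.
interface PolicyStatement {
  Effect: string;
  Action: string[];
  Resource: string[];
}

// Hypothetical helper: the action list is an assumed example for a
// static-site upload, not a policy prescribed by this tutorial.
function buildS3DeployPolicy(bucketName: string) {
  const bucketArn = "arn:aws:s3:::" + bucketName;
  const statements: PolicyStatement[] = [
    // Listing applies to the bucket itself.
    { Effect: "Allow", Action: ["s3:ListBucket"], Resource: [bucketArn] },
    // Uploading and deleting apply to the objects inside the bucket.
    { Effect: "Allow", Action: ["s3:PutObject", "s3:DeleteObject"], Resource: [bucketArn + "/*"] },
  ];
  return { Version: "2012-10-17", Statement: statements };
}

// Flags any statement that grants a wildcard action such as "*" or "s3:*".
function hasWildcardAction(statements: PolicyStatement[]): boolean {
  for (const statement of statements) {
    for (const action of statement.Action) {
      if (action.indexOf("*") !== -1) {
        return true;
      }
    }
  }
  return false;
}

const deployPolicy = buildS3DeployPolicy("your-bucket-name");
console.log(JSON.stringify(deployPolicy, null, 2));
console.log(hasWildcardAction(deployPolicy.Statement)); // false
</code></pre>
<p>A check like <code>hasWildcardAction</code> could run in a review script to catch overly broad grants before a policy is attached to the role.</p>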
<h3 id="heading-create-separate-iam-roles-per-environment">Create Separate IAM Roles Per Environment</h3>
<p>A staging role and a production role should have different permission scopes. Your staging deployment role should never have write access to production resources.</p>
<h3 id="heading-enable-aws-cloudtrail">Enable AWS CloudTrail</h3>
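<p>Calls made using the temporary credentials are logged in CloudTrail under the assumed role ARN. Management events are recorded by default, while data events such as S3 object-level operations have to be enabled on a trail. This gives you an audit trail of what your workflow did in AWS.</p>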
<p><strong>Reference:</strong> GitHub's official security hardening guide for OIDC: <a href="https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect">About security hardening with OpenID Connect</a></p>
<h2 id="heading-troubleshooting-common-errors">Troubleshooting Common Errors</h2>
<h3 id="heading-error-not-authorized-to-perform-stsassumerolewithwebidentity">Error: <code>Not authorized to perform sts:AssumeRoleWithWebIdentity</code></h3>
<p>This usually means the trust policy on your IAM role doesn't match the <code>sub</code> claim in the JWT.</p>
<p>Check the following:</p>
<ul>
<li><p>The <code>sub</code> condition exactly matches your repository path (it is case-sensitive)</p>
</li>
<li><p>The <code>aud</code> condition is set to <code>sts.amazonaws.com</code></p>
</li>
<li><p>The <code>Federated</code> principal uses the correct AWS account ID</p>
</li>
</ul>
<p>To inspect the actual token claims your workflow is receiving, add this debug step temporarily:</p>
<pre><code class="language-yaml">- name: Print OIDC token claims
  run: |
    TOKEN=$(curl -s -H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
      "$ACTIONS_ID_TOKEN_REQUEST_URL&amp;audience=sts.amazonaws.com" | jq -r '.value')
    echo "$TOKEN" | cut -d '.' -f2 | tr '_-' '/+' | base64 -d 2&gt;/dev/null | jq .
</code></pre>
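<p>The debug step works because the claims live in the middle segment of the JWT, encoded as base64url. The TypeScript sketch below shows the same decoding outside of a workflow. <code>decodeJwtClaims</code> and <code>makeSampleToken</code> are hypothetical helpers, the sample token is unsigned and built locally, and the code assumes a runtime with global <code>atob</code> and <code>btoa</code> (modern browsers or Node.js 16+). It doesn't verify the signature, so treat it as a debugging aid only.</p>
<pre><code class="language-typescript">// Decodes the payload (middle segment) of a JWT without verifying the signature.
// decodeJwtClaims is a hypothetical helper for debugging only.
function decodeJwtClaims(token: string): { [claim: string]: unknown } {
  const segments = token.split(".");
  if (segments.length !== 3) {
    throw new Error("Expected a JWT with three dot-separated segments");
  }
  // JWTs use base64url: swap the URL-safe characters back and restore padding.
  let base64 = segments[1].replace(/-/g, "+").replace(/_/g, "/");
  while (base64.length % 4 !== 0) {
    base64 += "=";
  }
  // atob returns a binary string, which is fine for ASCII-only claims.
  return JSON.parse(atob(base64));
}

// Builds an unsigned sample token so the sketch runs without calling GitHub.
function makeSampleToken(claims: { [claim: string]: unknown }): string {
  const payload = btoa(JSON.stringify(claims))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
  return "header." + payload + ".signature";
}

const sampleToken = makeSampleToken({
  sub: "repo:your-org/your-repo:ref:refs/heads/main",
  aud: "sts.amazonaws.com",
});
const decodedClaims = decodeJwtClaims(sampleToken);
console.log(decodedClaims.sub);
console.log(decodedClaims.aud);
</code></pre>
<p>Compare the decoded <code>sub</code> and <code>aud</code> values character by character against the conditions in your trust policy.</p>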
<h3 id="heading-error-could-not-load-credentials-from-any-providers">Error: <code>Could not load credentials from any providers</code></h3>
<p>This almost always means <code>id-token: write</code> is missing from your workflow permissions. Double-check that you have:</p>
<pre><code class="language-yaml">permissions:
  id-token: write
  contents: read
</code></pre>
<h3 id="heading-error-accessdenied-when-calling-an-aws-service">Error: <code>AccessDenied</code> When Calling an AWS Service</h3>
<p>Authentication succeeded but the IAM role doesn't have permission to perform the action your workflow is attempting. Check the permissions policy attached to your role and compare it against the specific action in the error message.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You've gone from storing static, long-lived AWS credentials in GitHub Secrets to a fully keyless authentication setup using OIDC. Here's what you accomplished:</p>
<ul>
<li><p>Registered GitHub as a trusted OIDC identity provider in AWS.</p>
</li>
<li><p>Created an IAM role with a scoped trust policy tied to a specific repository.</p>
</li>
<li><p>Attached least-privilege permissions to that role.</p>
</li>
<li><p>Configured your GitHub Actions workflow to request and use short-lived AWS credentials.</p>
</li>
<li><p>Verified the authentication flow end-to-end.</p>
</li>
</ul>
<p>This pattern works across AWS services, including S3, ECS, Lambda, ECR, Secrets Manager, and more. The workflow example here uses S3, but you only need to swap out the permissions policy and the deployment commands to adapt it for any other service.</p>
<p>If you want to go further, explore:</p>
<ul>
<li><p><a href="https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#supported-cloud-providers">Configuring OIDC for multiple cloud providers</a>: Azure, GCP, and HashiCorp Vault.</p>
</li>
<li><p><a href="https://docs.github.com/en/actions/deployment/targeting-different-environments/using-environments-for-deployment">GitHub Environments and deployment protection rules</a>: for multi-stage pipelines with approval gates.</p>
</li>
<li><p><a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/what-is-access-analyzer.html">AWS IAM Access Analyzer</a>: to validate and tighten your role policies automatically.</p>
</li>
</ul>
<p><em>If you're building out your DevOps practice and want a complete, production-ready reference for infrastructure automation, CI/CD, and platform engineering, check out</em> <a href="https://coachli.co/tolani-akintayo/PR-H4oQS"><em><strong>The Startup DevOps Field Guide</strong></em></a><em>. It covers the patterns, templates, and runbooks I've used across real AWS environments.</em></p>
<p><em>You can also connect with me on</em> <a href="https://www.linkedin.com/in/tolani-akintayo"><em>LinkedIn</em></a><em>.</em></p>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a href="https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect">GitHub Docs: About security hardening with OpenID Connect</a></p>
</li>
<li><p><a href="https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services">GitHub Docs: Configuring OpenID Connect in Amazon Web Services</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html">AWS Docs: Creating OpenID Connect (OIDC) identity providers</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html">AWS Docs: AssumeRoleWithWebIdentity API Reference</a></p>
</li>
<li><p><a href="https://github.com/aws-actions/configure-aws-credentials">aws-actions/configure-aws-credentials - GitHub</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/service-authorization/latest/reference/reference_policies_actions-resources-contextkeys.html">AWS IAM Actions Reference</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html">AWS CloudTrail User Guide</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Why Chrome OS Is the Operating System the AI Era Was Built For ]]>
                </title>
                <description>
                    <![CDATA[ Chrome OS runs on a read-only filesystem. You can't install executables on the host. There's no traditional desktop environment. Everything that interacts with the underlying system does so through a  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/why-chrome-os-is-the-ai-os/</link>
                <guid isPermaLink="false">69e2765cfd22b8ad62611ba8</guid>
                
                    <category>
                        <![CDATA[ Chrome OS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Linux ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cybersecurity ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Fri, 17 Apr 2026 18:05:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c4116a06-9e42-4da5-a152-0fe1433e0857.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Chrome OS runs on a read-only filesystem. You can't install executables on the host. There's no traditional desktop environment. Everything that interacts with the underlying system does so through a sandboxed browser, a containerized Linux terminal, or a cloud connection.</p>
<p>For years, that list of constraints was the reason people dismissed it. But in 2026, it's the reason Chrome OS might be the most correctly designed operating system for what's coming.</p>
<p>The security architecture treats the endpoint as untrusted by default. The containerized Linux environment gives developers a full headless stack without compromising the host. And an upcoming OS-level rewrite, Aluminium, puts Google's on-device AI models directly into the kernel.</p>
<p>This article covers security architecture, the container-based developer environment, cloud-streamed creative tools via AWS NICE DCV, cloud gaming, and what Aluminium OS means for on-device AI.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ol>
<li><p><a href="#heading-security-first-architecture-in-an-era-of-ai-powered-threats">Security-First Architecture in an Era of AI-Powered Threats</a></p>
</li>
<li><p><a href="#heading-a-headless-linux-stack-thats-more-flexible-than-it-looks">A Headless Linux Stack That's More Flexible Than It Looks</a></p>
</li>
<li><p><a href="#heading-aws-nice-dcv-changes-the-creative-tools-conversation">AWS NICE DCV Changes the Creative Tools Conversation</a></p>
</li>
<li><p><a href="#heading-cloud-gaming-works">Cloud Gaming Works</a></p>
</li>
<li><p><a href="#heading-aluminium-os-on-device-models-on-googles-own-architecture">Aluminium OS: On-Device Models on Google's Own Architecture</a></p>
</li>
<li><p><a href="#heading-where-this-lands">Where This Lands</a></p>
</li>
</ol>
<h2 id="heading-security-first-architecture-in-an-era-of-ai-powered-threats">Security-First Architecture in an Era of AI-Powered Threats</h2>
<p>Threat actors are getting better tools. Models like Mythos are lowering the barrier for generating convincing phishing campaigns, crafting polymorphic malware, and automating social engineering at scale.</p>
<p>Traditional operating systems present exactly the attack surface these tools target: writable system files, user-installable executables, patches that sit uninstalled for weeks because someone clicked "remind me later."</p>
<p>Chrome OS sidesteps most of this by design. The root filesystem is read-only and cryptographically verified on every boot through a process called Verified Boot.</p>
<p>If anything has modified the OS files since the last verified state, whether that's malware, a compromised package, or a rogue AI agent that decided to start deleting system files, the device detects it at startup and either self-corrects or refuses to boot.</p>
<p>For malware, persistence across reboots isn't just difficult. It's architecturally impossible through software alone.</p>
<p>Updates happen silently. While you're working, the system downloads the next OS version to an inactive partition. On your next reboot, it pivots to the updated version. No prompts, no deferred patches, no exposure window.</p>
<p>Major updates ship every four to six weeks. Security patches land every two to three weeks. The gap between vulnerability discovery and remediation is measured in days.</p>
<p>Chrome OS consistently stays out of the top 50 products by CVE count in the NIST vulnerability database, while Windows and the Linux kernel sit near the top every year. When AI is actively being weaponized to find and exploit vulnerabilities faster than humans can patch them, a read-only, verified, automatically updated endpoint is a different category of security posture.</p>
<p>The tradeoff is trust. Chrome OS's security model means trusting Google as the root authority for your entire computing stack: updates, certificate trust, telemetry. Organizations with strict data sovereignty requirements should weigh that dependency carefully.</p>
<h2 id="heading-a-headless-linux-stack-thats-more-flexible-than-it-looks">A Headless Linux Stack That's More Flexible Than It Looks</h2>
<p>Chrome OS is a text-based operating system. There's no native GUI layer. Stop and sit with that for a second, because it's the thing that makes people dismiss Chrome OS and also the thing that makes it work.</p>
<p>The entire graphical interface you interact with IS the Chrome browser. The Ash shell, Chrome's window manager, is the desktop. You don't install applications onto it the way you install .exe files on Windows or drag .app bundles into a macOS Applications folder. If it isn't running in a browser tab, an Android VM, or a Linux container, it doesn't run. That restriction is what keeps the host locked down, and it's what makes everything else possible.</p>
<p>Under the hood, Chrome OS runs a minimal virtual machine called Termina through crosvm, Google's Rust-based VM monitor.</p>
<p>Inside Termina, LXD manages Linux containers. The default container, penguin, is a Debian environment with a special trick: it bridges GUI-based Linux applications directly into the Chrome OS desktop through a Wayland proxy called Sommelier. Install VS Code, GIMP, or LibreOffice in penguin and they show up in your Chrome OS app launcher, running in windows alongside your browser tabs. For a lot of developers, penguin alone covers the daily workflow.</p>
<p>But Termina gives you more than penguin. Through the LXD layer you can spin up independent containers that are fully isolated operating systems: Arch, Alpine, Ubuntu, whatever you need.</p>
<p>These aren't attached to the GUI bridge. They run headless, natively, with their own systemd, their own package managers, their own persistent state. Need a clean Ubuntu environment to test a deployment script without touching your main setup? <code>lxc launch</code> and you're there. Need to blow it away? <code>lxc delete</code> and it's gone. No orphaned files on the host, no cross-contamination between environments.</p>
<p>The key distinction from Docker is that LXD runs system containers (full OS emulation) rather than application containers. You get background services, persistent daemons, the works. You can also run Docker inside any of these LXD containers if you need application-level containerization on top of that.</p>
<p>Snapshot your entire environment with <code>lxc snapshot</code> before a risky dependency install and roll back instantly if something breaks. That kind of safety net is broader than version control alone: it captures your full OS configuration, not just code.</p>
<p>Pair this with browser-native tools like GitHub Codespaces, Google Colab, AWS CloudShell, or vscode.dev, and the terminal handles your local tooling while the browser handles everything else.</p>
<p>AI coding assistants like Claude and Gemini already operate natively in the browser. The distance between "cloud IDE" and "local IDE" keeps shrinking.</p>
<p>There are friction points. Crostini, Chrome OS's Linux container environment, doesn't allow custom kernel modules. Nested KVM requires Intel Gen 10+ processors. VPN routing into the Linux container from the Chrome OS host can be a headache, with WireGuard requiring userspace workarounds inside the container.</p>
<p>But none of these break the core architecture for cloud-native work. They're just worth knowing about before you commit.</p>
<h2 id="heading-aws-nice-dcv-changes-the-creative-tools-conversation">AWS NICE DCV Changes the Creative Tools Conversation</h2>
<p>One of the longest-standing arguments against Chrome OS has been the absence of professional creative software. There's no Premiere, no DaVinci Resolve, no Blender, no Ableton. For years, this was a dead-end conversation.</p>
<p>AWS NICE DCV (Desktop Cloud Visualization) reopens it. DCV is a high-performance remote display protocol that streams GPU-accelerated desktop sessions from EC2 instances to any device, including a Chromebook running the browser-based DCV client. It supports OpenGL, Vulkan, and DirectX rendering, with adaptive encoding that adjusts to network conditions. On AWS, the DCV license is free. You pay only for the EC2 compute time.</p>
<p>Netflix engineers use DCV to stream content creation applications to remote artists. Volkswagen runs 3D CAD simulations across their engineering division through it. A VFX studio called RVX used it to deliver visual effects for HBO's The Last of Us, streaming Nuke, Maya, Houdini, and Blender to artists distributed across Europe from servers in Iceland. Their team said it was the best remote experience they'd ever worked with.</p>
<p>So: a Chromebook connected to a g5.xlarge EC2 instance (one A10G GPU) can run Blender, DaVinci Resolve, or any other GPU-accelerated creative application with full hardware acceleration. The rendering happens in the data center. DCV streams the pixels. The creative professional gets a responsive, high-fidelity workspace on a $400 machine that couldn't locally render a single frame.</p>
<p>The constraints are connectivity and cost. You need sustained bandwidth (25+ Mbps for 1080p work, more for 4K multi-monitor setups) and leaving a GPU instance running around the clock adds up. But for studios and professionals who already budget for high-end workstations, the math often pencils out, especially when you factor in zero local hardware maintenance and the ability to scale GPU power on demand.</p>
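<p>To make the cost side concrete, here is a small TypeScript sketch of the arithmetic. The hourly rate is a round placeholder rather than a quoted AWS price, and <code>estimateMonthlyCost</code> is a hypothetical helper. Plug in the current on-demand rate for your instance type and region.</p>
<pre><code class="language-typescript">// Estimates a monthly bill for an on-demand instance billed per hour.
// The rate used below is a placeholder, not a quoted AWS price.
function estimateMonthlyCost(hourlyRateUsd: number, hoursPerDay: number, daysPerMonth: number): number {
  return hourlyRateUsd * hoursPerDay * daysPerMonth;
}

const placeholderRateUsd = 1.0;

// Instance left running around the clock for a 30-day month.
const alwaysOnCost = estimateMonthlyCost(placeholderRateUsd, 24, 30);

// Instance started only for 8-hour sessions on 22 working days.
const workdaysOnlyCost = estimateMonthlyCost(placeholderRateUsd, 8, 22);

console.log(alwaysOnCost); // 720
console.log(workdaysOnlyCost); // 176
</code></pre>
<p>At the same rate, stopping the instance outside working hours cuts the bill to roughly a quarter of the always-on figure.</p>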
<h2 id="heading-cloud-gaming-works">Cloud Gaming Works</h2>
<p>GeForce NOW survived where Stadia failed because it made a better business decision: bring your own games. Connect your existing Steam, Epic, or Ubisoft library and stream from NVIDIA's server-side hardware. The Ultimate tier now runs on RTX 5080-class infrastructure. 4K at 120fps with ray tracing, on a fanless Chromebook.</p>
<p>Chrome OS has a structural advantage as a cloud gaming client. GeForce NOW runs natively in the Chromium browser via WebRTC, and users consistently report less micro-stuttering and tighter input handling than the standalone Windows desktop app. Under good network conditions, measured total latency runs 13 to 14ms, with sub-3ms ping documented near datacenter proximity. That's below the human perceptual threshold for most game types.</p>
<p>Anti-cheat systems like Easy Anti-Cheat and Riot Vanguard are a non-issue in this model. They run on the server where the game executes, not on your local endpoint. On-device gaming isn't viable on Chrome OS and likely never will be. The architecture isn't designed for it, and even projects attempting to bridge local GPUs hit bottlenecks in the container layers. Cloud gaming is the path, and it works.</p>
<p>The limiting factors are network-dependent. Latency spikes above 500ms on bad connections make fast-twitch games unplayable, and NVIDIA's 100-hour monthly cap on the Ultimate tier has drawn criticism. But cloud gaming on Chrome OS has crossed the line from novelty to daily-driver viable for most use cases.</p>
<h2 id="heading-aluminium-os-on-device-models-on-googles-own-architecture">Aluminium OS: On-Device Models on Google's Own Architecture</h2>
<p>The most consequential near-term development for Chrome OS is Project Aluminium, a ground-up rewrite that replaces the current Chrome OS foundation with a native Android kernel. Not another bolted-on compatibility layer: a new operating system built on Android 16, designed to run Android applications natively with direct hardware acceleration instead of routing them through the resource-heavy ARCVM virtual machine that currently eats CPU cycles on even basic app launches.</p>
<p>The AI story is the real story. Aluminium is being built with Gemini models integrated directly into the OS: the file system, the application launcher, the window manager.</p>
<p>Google serving their own proprietary models on their own devices, using an architecture optimized specifically to run them, is a level of vertical integration that no other OS vendor has in the pipeline. Apple has the silicon advantage for local inference. Google has the model-to-OS integration advantage. Those are competing theses about where AI compute should live, and both are worth taking seriously.</p>
<p>The rollout timeline from court documents and leaked roadmaps puts a trusted tester program on select hardware in late 2026, premium tablets by early 2027, and general consumer availability in 2028. Chrome OS Classic gets maintained through existing support obligations until 2033 or 2034.</p>
<p>The launch won't be perfect. Google's track record on platform transitions gives the community earned skepticism. But the ability to iterate a natively AI-integrated OS on hardware they control is the kind of capability that compounds over time.</p>
<h2 id="heading-where-this-lands">Where This Lands</h2>
<p>Two years ago, calling Chrome OS a serious platform for development or creative work would have been a stretch. Today you can run a full Debian environment with systemd daemons, snapshot your workspace, stream Blender from a GPU-backed data center, play AAA games at 4K on hardware you don't own, and do all of it from a verified, read-only endpoint that patches itself while you sleep.</p>
<p>The remaining gaps are real. But they're concentrated in workflows that are themselves moving to the cloud. Chrome OS was designed around assumptions about computing that used to be premature. They're not premature anymore.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Full-Stack CRUD App with React, AWS Lambda, DynamoDB, and Cognito Auth ]]>
                </title>
                <description>
                    <![CDATA[ Building a web application that works only on your local machine is one thing. Building one that is secure, connected to a real database, and accessible to anyone on the internet is another challenge  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/full-stack-aws-react-lambda-dynamodb-tutorial/</link>
                <guid isPermaLink="false">69b96f7ec22d3eeb8ac3bf81</guid>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                    <category>
                        <![CDATA[ full stack ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Benedicta Onyebuchi ]]>
                </dc:creator>
                <pubDate>Tue, 17 Mar 2026 15:13:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/1a996eff-72f5-4f4d-b8da-cf4d646c3224.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building a web application that works only on your local machine is one thing. Building one that is secure, connected to a real database, and accessible to anyone on the internet is another challenge entirely. And it requires a different set of tools.</p>
<p>Most production web applications share a common set of needs: they store and retrieve data, they expose that data through an API, they require users to authenticate before accessing sensitive operations, and they need to be deployed somewhere reliable and fast.</p>
<p>Meeting all of those needs used to require managing servers, configuring databases, handling authentication infrastructure, and provisioning hosting environments – often as separate, manual processes.</p>
<p>AWS changes that model significantly. With the combination of services you'll use in this tutorial (Lambda, DynamoDB, API Gateway, Cognito, and CloudFront), you can build and deploy a fully functional, secured, globally distributed application without managing a single server.</p>
<p>Each service handles one specific responsibility:</p>
<ul>
<li><p>DynamoDB stores your data</p>
</li>
<li><p>Lambda runs your business logic on demand</p>
</li>
<li><p>API Gateway exposes your functions as a REST API</p>
</li>
<li><p>Cognito manages user authentication</p>
</li>
<li><p>CloudFront delivers your frontend worldwide over HTTPS</p>
</li>
</ul>
<p>The AWS CDK (Cloud Development Kit) ties all of this together by letting you define every one of those services as TypeScript code. Instead of clicking through the AWS Console to configure each resource manually, you describe your entire infrastructure in a single file and deploy it with one command.</p>
<p>By the end of this tutorial, you will have a fully deployed vendor management dashboard. Users can sign up, log in, and then create, read, and delete vendors, with all data securely stored in AWS DynamoDB and all routes protected by Amazon Cognito authentication.</p>
<h2 id="heading-what-youll-build">What You'll Build</h2>
<p>In this handbook, you'll build a two-panel web app where authenticated users can:</p>
<ul>
<li><p>Add a new vendor (name, category, contact email)</p>
</li>
<li><p>View all saved vendors in real time</p>
</li>
<li><p>Delete a vendor from the list</p>
</li>
<li><p>Sign in and sign out securely</p>
</li>
</ul>
<p>The frontend is built with Next.js. The backend runs entirely on AWS: DynamoDB stores the data, Lambda functions handle the logic, API Gateway exposes a REST API, Cognito manages authentication, and CloudFront serves the app globally over HTTPS.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-who-this-is-for">Who This Is For</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-architecture-overview">Architecture Overview</a></p>
</li>
<li><p><a href="#heading-part-1-set-up-your-aws-account-and-tools">Part 1: Set Up Your AWS Account and Tools</a></p>
</li>
<li><p><a href="#heading-part-2-set-up-the-project-structure">Part 2: Set Up the Project Structure</a></p>
</li>
<li><p><a href="#heading-part-3-define-the-database-dynamodb">Part 3: Define the Database (DynamoDB)</a></p>
</li>
<li><p><a href="#heading-part-4-write-the-lambda-functions">Part 4: Write the Lambda Functions</a></p>
</li>
<li><p><a href="#heading-part-5-build-the-api-with-api-gateway">Part 5: Build the API with API Gateway</a></p>
</li>
<li><p><a href="#heading-part-6-deploy-the-backend-to-aws">Part 6: Deploy the Backend to AWS</a></p>
</li>
<li><p><a href="#heading-part-7-build-the-react-frontend">Part 7: Build the React Frontend</a></p>
</li>
<li><p><a href="#heading-part-8-add-authentication-with-amazon-cognito">Part 8: Add Authentication with Amazon Cognito</a></p>
</li>
<li><p><a href="#heading-part-9-deploy-the-frontend-with-s3-and-cloudfront">Part 9: Deploy the Frontend with S3 and CloudFront</a></p>
</li>
<li><p><a href="#heading-what-you-built">What You Built</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-who-this-is-for">Who This Is For</h2>
<p>This tutorial is for developers who know basic JavaScript and React but have never used AWS. You don't need any prior backend, cloud, or DevOps experience. I'll explain every AWS concept before we use it.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before starting, make sure you have the following installed and available:</p>
<ul>
<li><p><strong>Node.js 18 or higher</strong>: <a href="https://nodejs.org">Download here</a></p>
</li>
<li><p><strong>npm</strong>: Included with Node.js</p>
</li>
<li><p><strong>A code editor</strong>: I recommend VS Code</p>
</li>
<li><p><strong>A terminal</strong>: Any terminal on macOS, Linux, or Windows (WSL recommended on Windows)</p>
</li>
<li><p><strong>An AWS account</strong>: You will create one in Part 1. A credit card is required, but the Free Tier covers everything in this tutorial.</p>
</li>
<li><p><strong>Basic familiarity with React and TypeScript</strong>: You should understand components, <code>useState</code>, and <code>useEffect</code>.</p>
</li>
</ul>
<h2 id="heading-architecture-overview">Architecture Overview</h2>
<p>Before writing any code, here's a plain-English description of how the pieces fit together.</p>
<p>When a user clicks "Add Vendor" in the React app:</p>
<ol>
<li><p>The frontend reads the user's JWT auth token from the browser session</p>
</li>
<li><p>It sends a <code>POST</code> request to API Gateway, including the token in the request header</p>
</li>
<li><p>API Gateway checks the token against Cognito. If the token is invalid or missing, it rejects the request with a 401 error immediately</p>
</li>
<li><p>If the token is valid, API Gateway passes the request to the createVendor Lambda function</p>
</li>
<li><p>The Lambda function writes the new vendor to DynamoDB</p>
</li>
<li><p>DynamoDB confirms the write, and the Lambda returns a success response</p>
</li>
<li><p>The frontend re-fetches the vendor list and updates the UI</p>
</li>
</ol>
<p>The same flow applies to reading and deleting vendors, with different Lambda functions and HTTP methods.</p>
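<p>As a rough illustration of steps 1 and 2, the TypeScript sketch below assembles the request the frontend would send. <code>buildCreateVendorRequest</code>, the <code>/vendors</code> path, the field names, the example URL, and the <code>Bearer</code> prefix are assumptions made for illustration. The actual route and header format depend on how the API and authorizer are configured.</p>
<pre><code class="language-typescript">// Fields a vendor record carries in this sketch (names are assumed).
interface Vendor {
  name: string;
  category: string;
  contactEmail: string;
}

// Hypothetical helper: builds the request without sending it, so the
// shape can be inspected or tested in isolation.
function buildCreateVendorRequest(apiBaseUrl: string, jwt: string, vendor: Vendor) {
  return {
    url: apiBaseUrl + "/vendors",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The header name and Bearer prefix are assumptions; match your authorizer.
      Authorization: "Bearer " + jwt,
    },
    body: JSON.stringify(vendor),
  };
}

const createRequest = buildCreateVendorRequest(
  "https://api.example.com",
  "example-jwt",
  { name: "Acme Supplies", category: "Office", contactEmail: "hello@example.com" }
);
console.log(createRequest.method + " " + createRequest.url);
console.log(createRequest.headers.Authorization);
</code></pre>
<p>In the browser, the result could be handed to <code>fetch(createRequest.url, createRequest)</code>. If the token is missing or invalid, API Gateway answers with a 401 before any Lambda code runs.</p>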
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/70486bdc-f272-45db-be30-f10752916546.png" alt="Architecture diagram of the Vendors Tracker Application" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>How the app is deployed:</strong> Your React app is exported as a static site, uploaded to an S3 bucket, and served globally through CloudFront. Your backend infrastructure (Lambda functions, API Gateway, DynamoDB, Cognito) is defined in TypeScript using AWS CDK and deployed with a single command.</p>
<h2 id="heading-part-1-set-up-your-aws-account-and-tools">Part 1: Set Up Your AWS Account and Tools</h2>
<p>Before writing any application code, you need three things in place: an AWS account, the right tools on your machine, and credentials that let those tools communicate with AWS on your behalf.</p>
<h3 id="heading-11-create-your-aws-account">1.1 Create Your AWS Account</h3>
<p>If you don't have an AWS account:</p>
<ol>
<li><p>Go to <a href="https://aws.amazon.com">https://aws.amazon.com</a></p>
</li>
<li><p>Click <strong>Create an AWS Account</strong></p>
</li>
<li><p>Follow the sign-up prompts and add a payment method</p>
</li>
<li><p>Once registered, log in to the AWS Management Console</p>
</li>
</ol>
<p>AWS has a Free Tier that covers all the services used in this tutorial. You won't be charged for normal use while following along.</p>
<h3 id="heading-12-install-the-aws-cli-and-cdk">1.2 Install the AWS CLI and CDK</h3>
<p>The <strong>AWS CLI</strong> is a command-line tool that lets you interact with AWS from your terminal: checking resources, configuring credentials, and more.</p>
<p>The <strong>AWS CDK (Cloud Development Kit)</strong> is the tool you will use to define your entire backend (database, Lambda functions, API) using TypeScript code. Instead of clicking through the AWS Console to create each resource, you describe what you want in a TypeScript file and CDK builds it for you.</p>
<p>Install both:</p>
<pre><code class="language-shell"># Install AWS CLI (macOS)
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

# For Linux, see: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html
# For Windows, see: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-windows.html

# Install AWS CDK globally
npm install -g aws-cdk
</code></pre>
<p>Verify both are installed:</p>
<pre><code class="language-shell">aws --version
cdk --version
</code></pre>
<p>Both commands should print a version number. If they do, you are ready to move on.</p>
<h3 id="heading-13-configure-your-aws-credentials-iam">1.3 Configure Your AWS Credentials (IAM)</h3>
<p>This step is critical. Your terminal needs a set of credentials – like a username and password – to act on your behalf inside AWS.</p>
<p>Think of your root account (the one you signed up with) as the master key to your entire AWS account. You should never use it for day-to-day development. Instead, you will create a separate IAM user with its own set of keys. If those keys are ever exposed, you can delete them without compromising your root account.</p>
<h4 id="heading-phase-1-create-an-iam-user">Phase 1: Create an IAM User</h4>
<ol>
<li><p>Log in to the AWS Console and search for IAM in the top search bar</p>
</li>
<li><p>In the left sidebar, click Users, then click Create user</p>
</li>
<li><p>Name the user <code>cdk-dev</code>. Leave "Provide user access to the AWS Management Console" unchecked – you only need terminal access, not console access</p>
</li>
<li><p>On the permissions screen, choose Attach policies directly</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/d4699108-c1aa-4dd3-957c-b84292c719a2.png" alt="IAM Console showing the “Attach policies directly” screen with AdministratorAccess checked" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<ol start="5">
<li><p>Search for <code>AdministratorAccess</code> and check the box next to it</p>
</li>
<li><p>Click through to the end and click Create user</p>
</li>
</ol>
<p>Note on permissions: In a production environment you would use a more restricted policy. For this tutorial, Administrator access is needed because CDK creates many different types of AWS resources.</p>
<h4 id="heading-phase-2-generate-access-keys">Phase 2: Generate Access Keys</h4>
<ol>
<li><p>Click on your newly created <code>cdk-dev</code> user from the Users list</p>
</li>
<li><p>Go to the Security credentials tab</p>
</li>
<li><p>Scroll down to Access keys and click Create access key</p>
</li>
<li><p>Select Command Line Interface (CLI), check the acknowledgment box, and click Next</p>
</li>
<li><p>Click Create access key</p>
</li>
</ol>
<p><strong>Important</strong>: Copy both the Access Key ID and the Secret Access Key right now. You will never be able to see the Secret Access Key again after closing this screen. Save both values in a password manager or secure note.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/d85bb4eb-0ecf-4d92-be92-d75af5a534c6.png" alt="IAM Console showing the Create access key screen with the Access Key ID and Secret Access Key" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-phase-3-connect-your-terminal-to-aws">Phase 3: Connect Your Terminal to AWS</h4>
<p>Run the following command in your terminal:</p>
<pre><code class="language-shell">aws configure
</code></pre>
<p>You will be prompted for four values:</p>
<pre><code class="language-shell">AWS Access Key ID:     [paste your Access Key ID]
AWS Secret Access Key: [paste your Secret Access Key]
Default region name:   us-east-1
Default output format: json
</code></pre>
<p>Use <code>us-east-1</code> as your region for this tutorial. After this step, every CDK and AWS CLI command you run will use these credentials automatically.</p>
<h2 id="heading-part-2-set-up-the-project-structure">Part 2: Set Up the Project Structure</h2>
<p>You will use a <strong>monorepo</strong> layout – one top-level folder with two sub-projects inside: <code>frontend</code> for your React app and <code>backend</code> for your AWS infrastructure code. They are deployed independently but live side by side.</p>
<h3 id="heading-21-create-the-workspace">2.1 Create the Workspace</h3>
<pre><code class="language-shell">mkdir vendor-tracker &amp;&amp; cd vendor-tracker
mkdir backend frontend
</code></pre>
<h3 id="heading-22-initialize-the-frontend-nextjs">2.2 Initialize the Frontend (Next.js)</h3>
<p>Navigate into the <code>frontend</code> folder and run:</p>
<pre><code class="language-shell">cd frontend
npx create-next-app@latest .
</code></pre>
<p>When prompted, choose the following options:</p>
<ul>
<li><p><strong>TypeScript</strong> --&gt; Yes</p>
</li>
<li><p><strong>ESLint</strong> --&gt; Yes</p>
</li>
<li><p><strong>Tailwind CSS</strong> --&gt; Yes</p>
</li>
<li><p><strong>src/ directory</strong> --&gt; No</p>
</li>
<li><p><strong>App Router</strong> --&gt; Yes</p>
</li>
<li><p><strong>Import alias</strong> --&gt; No</p>
</li>
</ul>
<h3 id="heading-23-initialize-the-backend-cdk">2.3 Initialize the Backend (CDK)</h3>
<p>Navigate into the <code>backend</code> folder and run:</p>
<pre><code class="language-shell">cd ../backend
cdk init app --language typescript
</code></pre>
<p>This generates a boilerplate CDK project. The most important file it creates is <code>backend/lib/backend-stack.ts</code>. This is where you will define all of your AWS infrastructure as TypeScript code.</p>
<p>Also install <code>esbuild</code>, which CDK uses to bundle your Lambda functions:</p>
<pre><code class="language-shell">npm install --save-dev esbuild
</code></pre>
<h3 id="heading-24-understanding-cdk-before-you-write-any-code">2.4 Understanding CDK Before You Write Any Code</h3>
<p>CDK is likely different from most tools you have used. Here is how it works:</p>
<p>Normally, you would create AWS resources by clicking through the AWS Console: create a table here, configure a Lambda function there. CDK lets you do all of that using TypeScript code instead.</p>
<p>When you run <code>cdk deploy</code>, CDK reads your TypeScript file, converts it into an AWS CloudFormation template (an internal AWS format for describing infrastructure), and submits it to AWS. AWS then creates all the resources you described.</p>
<p>A few terms you will see throughout this tutorial:</p>
<ul>
<li><p><strong>Stack</strong>: The collection of all AWS resources you define together. Your <code>BackendStack</code> class is your stack.</p>
</li>
<li><p><strong>Construct</strong>: Each individual AWS resource you create inside a stack (a table, a Lambda function, an API) is called a construct.</p>
</li>
<li><p><strong>Deploy</strong>: Running <code>cdk deploy</code> sends your TypeScript definition to AWS and creates or updates the real resources.</p>
</li>
</ul>
<p>The main file you'll work in is <code>backend/lib/backend-stack.ts</code>. Think of it as the blueprint for your entire backend.</p>
<p>Your final project structure will look like this:</p>
<pre><code class="language-plaintext">vendor-tracker/
├── backend/
│   ├── lambda/
│   │   ├── createVendor.ts
│   │   ├── getVendors.ts
│   │   └── deleteVendor.ts
│   ├── lib/
│   │   └── backend-stack.ts
│   └── package.json
└── frontend/
    ├── app/
    │   ├── layout.tsx
    │   ├── page.tsx
    │   └── providers.tsx
    ├── lib/
    │   └── api.ts
    ├── types/
    │   └── vendor.ts
    └── .env.local
</code></pre>
<h2 id="heading-part-3-define-the-database-dynamodb">Part 3: Define the Database (DynamoDB)</h2>
<p>DynamoDB is AWS's NoSQL database. Think of it as a fast, scalable key-value store in the cloud. Every item in a DynamoDB table is identified by its primary key. In a table like this one, which has no sort key, the primary key is just the <strong>partition key</strong>, so its value must be unique for each item. For your vendor table, that key will be <code>vendorId</code>.</p>
<p>Open <code>backend/lib/backend-stack.ts</code>. Replace the entire file contents with the following:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: {
        name: 'vendorId',
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY, // For development only
    });
  }
}
</code></pre>
<p><strong>What each line does:</strong></p>
<ul>
<li><p><code>partitionKey</code> tells DynamoDB that <code>vendorId</code> is the unique identifier for every record. No two vendors can share the same <code>vendorId</code>.</p>
</li>
<li><p><code>PAY_PER_REQUEST</code> means you only pay when data is actually read or written. There is no charge when the table is idle, which makes it cost-effective for learning.</p>
</li>
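<li><p><code>RemovalPolicy.DESTROY</code> means the table, along with its data, will be deleted when you run <code>cdk destroy</code>. For production apps you would leave this out so the table keeps its default <code>RETAIN</code> policy and survives stack deletion.</p>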
</li>
</ul>
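<p>To make the uniqueness rule concrete, here is a minimal in-memory sketch of how a table keyed only by <code>vendorId</code> behaves. It is a toy model for illustration, not DynamoDB or CDK code, and <code>ToyVendorTable</code> and its methods are hypothetical names. It mirrors one behavior worth knowing: writing an item whose key already exists replaces the old item instead of adding a second one.</p>
<pre><code class="language-typescript">// Toy model only: a Map keyed by the partition key value.
interface VendorItem {
  vendorId: string;
  name: string;
}

class ToyVendorTable {
  private items = new Map&lt;string, VendorItem&gt;();

  // Like a put: inserts the item, or replaces the one with the same key.
  put(item: VendorItem): void {
    this.items.set(item.vendorId, item);
  }

  // Like a get by key: returns the single matching item, if any.
  get(vendorId: string): VendorItem | undefined {
    return this.items.get(vendorId);
  }

  get size(): number {
    return this.items.size;
  }
}

const table = new ToyVendorTable();
table.put({ vendorId: "v-1", name: "Acme" });
table.put({ vendorId: "v-1", name: "Acme Corp" }); // Same key: replaces the first item
table.put({ vendorId: "v-2", name: "Globex" });
</code></pre>
<p>After these three writes the toy table holds two items, because the second write reused the key <code>v-1</code>. This is one reason the tutorial generates a fresh ID for every new vendor.</p>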
<h2 id="heading-part-4-write-the-lambda-functions">Part 4: Write the Lambda Functions</h2>
<p>A Lambda function is your server, but unlike a traditional server, it only runs when it's called. AWS spins it up on demand, runs your code, and shuts it down. You're only charged for the time your code is actually running.</p>
<p>You'll write three Lambda functions:</p>
<ul>
<li><p><code>createVendor.ts</code>: Adds a new vendor to DynamoDB</p>
</li>
<li><p><code>getVendors.ts</code>: Returns all vendors from DynamoDB</p>
</li>
<li><p><code>deleteVendor.ts</code>: Removes a vendor from DynamoDB by ID</p>
</li>
</ul>
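<p>Create a new <code>lambda</code> folder inside <code>backend</code>. Run this from the project root (<code>vendor-tracker</code>). If your terminal is still inside <code>backend</code> from Part 2, run <code>mkdir lambda</code> instead:</p>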
<pre><code class="language-shell">mkdir backend/lambda
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/6330a84b-77c3-4001-9783-5fedc89ae1c0.png" alt="6330a84b-77c3-4001-9783-5fedc89ae1c0" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-a-note-on-the-aws-sdk">A Note on the AWS SDK</h3>
<p>All three Lambda functions use <strong>AWS SDK v3</strong> (<code>@aws-sdk/client-dynamodb</code> and <code>@aws-sdk/lib-dynamodb</code>). This is the current standard. An older version of the SDK (<code>aws-sdk</code>) exists, but it is deprecated and is not bundled in Lambda's Node.js 18 and later runtimes. Stick to v3 throughout.</p>
<h3 id="heading-41-create-vendor-lambda">4.1 Create Vendor Lambda</h3>
<p>Create <code>backend/lambda/createVendor.ts</code>:</p>
<pre><code class="language-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import { randomUUID } from "crypto";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async (event: any) =&gt; {
  try {
    const body = JSON.parse(event.body);

    const item = {
      vendorId: randomUUID(), // Generates a collision-safe unique ID
      name: body.name,
      category: body.category,
      contactEmail: body.contactEmail,
      createdAt: new Date().toISOString(),
    };

    await docClient.send(
      new PutCommand({
        TableName: process.env.TABLE_NAME!,
        Item: item,
      })
    );

    return {
      statusCode: 201,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
      },
      body: JSON.stringify({ message: "Vendor created", vendorId: item.vendorId }),
    };
  } catch (error) {
    console.error("Error creating vendor:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to create vendor" }),
    };
  }
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>randomUUID()</code> generates a universally unique ID using Node's built-in <code>crypto</code> module. No extra package is needed. This is more reliable than <code>Date.now()</code>, which can produce duplicate IDs if two requests arrive within the same millisecond.</p>
</li>
<li><p><code>process.env.TABLE_NAME</code> reads the DynamoDB table name from an environment variable. You'll set this value in the CDK stack. This avoids hardcoding the table name inside your Lambda code.</p>
</li>
<li><p>The <code>headers</code> block is required for CORS (Cross-Origin Resource Sharing). Without <code>Access-Control-Allow-Origin</code>, your browser will block responses that come from a different domain than your frontend. The <code>Access-Control-Allow-Headers</code> value lists <code>Authorization</code> because you will send that header later for Cognito. The browser's preflight (<code>OPTIONS</code>) request itself is answered by API Gateway, which you configure with the same allowed headers in Part 5.</p>
</li>
</ul>
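<p>The handler above trusts whatever arrives in <code>event.body</code>, so a missing or malformed body ends up as a generic <code>500</code>. One possible way to tighten this is a small validation step that runs before the write. The sketch below is an assumption, not part of the tutorial's code: <code>parseVendorInput</code>, <code>VendorInput</code>, and <code>ParseResult</code> are hypothetical names, and the rules (three required, non-empty strings) are inferred from the fields the handler reads.</p>
<pre><code class="language-typescript">// Hypothetical helper, not part of the tutorial's createVendor.ts.
interface VendorInput {
  name: string;
  category: string;
  contactEmail: string;
}

type ParseResult =
  | { ok: true; value: VendorInput }
  | { ok: false; error: string };

function parseVendorInput(rawBody: string | null): ParseResult {
  if (!rawBody) return { ok: false, error: "Request body is required" };

  let data: unknown;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return { ok: false, error: "Request body must be valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "Request body must be a JSON object" };
  }

  // Every field the handler stores must be a non-empty string.
  const record = data as Record&lt;string, unknown&gt;;
  const fields = ["name", "category", "contactEmail"] as const;
  for (const field of fields) {
    const value = record[field];
    if (typeof value !== "string" || value.trim() === "") {
      return { ok: false, error: `${field} is required` };
    }
  }

  return {
    ok: true,
    value: {
      name: (record.name as string).trim(),
      category: (record.category as string).trim(),
      contactEmail: (record.contactEmail as string).trim(),
    },
  };
}
</code></pre>
<p>In the handler, you could call this with <code>event.body</code> and return a <code>400</code> response carrying <code>result.error</code> whenever <code>result.ok</code> is false, keeping <code>500</code> for genuine server-side failures.</p>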
<h3 id="heading-42-get-vendors-lambda">4.2 Get Vendors Lambda</h3>
<p>Create <code>backend/lambda/getVendors.ts</code>:</p>
<pre><code class="language-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, ScanCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async () =&gt; {
  try {
    const response = await docClient.send(
      new ScanCommand({
        TableName: process.env.TABLE_NAME!,
      })
    );

    return {
      statusCode: 200,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Content-Type": "application/json",
      },
      body: JSON.stringify(response.Items ?? []),
    };
  } catch (error) {
    console.error("Error fetching vendors:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to fetch vendors" }),
    };
  }
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>ScanCommand</code> reads every item in the table and returns them as an array. For a learning project this is fine. In a production app with millions of rows, you would use a more targeted <code>QueryCommand</code> to avoid reading the entire table on every request. Note that a single scan call returns at most 1 MB of data, so a larger table also needs pagination to return everything.</p>
</li>
<li><p><code>response.Items ?? []</code> returns an empty array if the table is empty, preventing the frontend from crashing when there are no vendors yet.</p>
</li>
</ul>
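<p>If the table grows beyond what one scan call returns, the response includes a <code>LastEvaluatedKey</code> that you pass back as the starting point of the next call. The following is a minimal sketch of that loop under stated assumptions: <code>scanAll</code>, <code>ScanPage</code>, and <code>FetchPage</code> are hypothetical names, and the page fetcher is injected so the example runs on its own with a stand-in instead of a real DynamoDB client.</p>
<pre><code class="language-typescript">// One page of results, shaped like the fields a scan response exposes.
interface ScanPage&lt;T&gt; {
  Items?: T[];
  LastEvaluatedKey?: Record&lt;string, unknown&gt;;
}

// Hypothetical signature: fetches one page, optionally starting after a key.
type FetchPage&lt;T&gt; = (startKey?: Record&lt;string, unknown&gt;) =&gt; Promise&lt;ScanPage&lt;T&gt;&gt;;

async function scanAll&lt;T&gt;(fetchPage: FetchPage&lt;T&gt;): Promise&lt;T[]&gt; {
  const items: T[] = [];
  let startKey: Record&lt;string, unknown&gt; | undefined;

  do {
    const page = await fetchPage(startKey);
    items.push(...(page.Items ?? []));
    // A missing LastEvaluatedKey means there are no more pages.
    startKey = page.LastEvaluatedKey;
  } while (startKey);

  return items;
}

// Stand-in for DynamoDB so the sketch runs on its own: two pages of names.
const fakePages: ScanPage&lt;string&gt;[] = [
  { Items: ["Acme", "Globex"], LastEvaluatedKey: { vendorId: "v-2" } },
  { Items: ["Initech"] },
];
let calls = 0;
const fakeFetch: FetchPage&lt;string&gt; = async () =&gt; fakePages[calls++];
</code></pre>
<p>In <code>getVendors.ts</code>, the fetcher could wrap <code>docClient.send(new ScanCommand({ TableName, ExclusiveStartKey: startKey }))</code>. For this tutorial's small table, the single call shown earlier is enough.</p>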
<h3 id="heading-43-delete-vendor-lambda">4.3 Delete Vendor Lambda</h3>
<p>Create <code>backend/lambda/deleteVendor.ts</code>:</p>
<pre><code class="language-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, DeleteCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async (event: any) =&gt; {
  try {
    const body = JSON.parse(event.body ?? "{}"); // A missing body becomes {} so the check below returns 400
    const { vendorId } = body;

    if (!vendorId) {
      return {
        statusCode: 400,
        headers: { "Access-Control-Allow-Origin": "*" },
        body: JSON.stringify({ error: "vendorId is required" }),
      };
    }

    await docClient.send(
      new DeleteCommand({
        TableName: process.env.TABLE_NAME!,
        Key: { vendorId },
      })
    );

    return {
      statusCode: 200,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
      },
      body: JSON.stringify({ message: "Vendor deleted" }),
    };
  } catch (error) {
    console.error("Error deleting vendor:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to delete vendor" }),
    };
  }
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>DeleteCommand</code> removes the item whose <code>vendorId</code> matches the key you provide. DynamoDB doesn't return an error if the item doesn't exist. It simply does nothing.</p>
</li>
<li><p>The <code>400</code> guard at the top returns a clear error if the caller forgets to send a <code>vendorId</code>, rather than letting DynamoDB throw a confusing internal error.</p>
</li>
</ul>
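<p>All three handlers repeat the same CORS headers in every <code>return</code>. If you prefer to keep them in one place, a small shared helper is one option. This is a sketch rather than part of the tutorial's files: <code>jsonResponse</code>, <code>CORS_HEADERS</code>, and <code>ApiResponse</code> are hypothetical names, and the header values are copied from the handlers above.</p>
<pre><code class="language-typescript">// Hypothetical shared helper that the three handlers could import.
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Headers": "Content-Type,Authorization",
  "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
};

interface ApiResponse {
  statusCode: number;
  headers: Record&lt;string, string&gt;;
  body: string;
}

// Builds a Lambda proxy response with the CORS headers and a JSON body.
function jsonResponse(statusCode: number, payload: unknown): ApiResponse {
  return {
    statusCode,
    headers: { ...CORS_HEADERS, "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

const created = jsonResponse(201, { message: "Vendor created" });
const failed = jsonResponse(500, { error: "Failed to create vendor" });
</code></pre>
<p>With a helper like this, each <code>return</code> in a handler shrinks to one line, and a change to the allowed headers only has to be made once.</p>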
<h2 id="heading-part-5-build-the-api-with-api-gateway">Part 5: Build the API with API Gateway</h2>
<p>API Gateway is what gives your Lambda functions a public URL. Without it, there's no way for your browser to trigger a Lambda function. Think of it as the front door of your backend: it receives HTTP requests, checks whether the caller is authorized, routes the request to the correct Lambda, and returns the Lambda's response to the caller.</p>
<p>Now you'll wire everything together in <code>backend/lib/backend-stack.ts</code>.</p>
<h3 id="heading-51-add-lambda-functions-and-api-gateway-to-the-stack">5.1 Add Lambda Functions and API Gateway to the Stack</h3>
<p>Replace the entire contents of <code>backend/lib/backend-stack.ts</code> with this complete, assembled file:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table 
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: {
        name: 'vendorId',
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // 2. Lambda Functions
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // 3. Permissions (Least Privilege)
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // 4. API Gateway
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda));
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda));
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda));

    // 5. Outputs
    new cdk.CfnOutput(this, 'ApiEndpoint', {
      value: api.url,
    });
  }
}
</code></pre>
<p><strong>What each section does:</strong></p>
<p><code>NodejsFunction</code> is a special CDK construct that automatically bundles your Lambda code and all its dependencies into a single file using <code>esbuild</code> before uploading it to AWS. This is why you installed <code>esbuild</code> in Part 2.</p>
<p>For TypeScript handlers like these, use <code>NodejsFunction</code> instead of the basic <code>lambda.Function</code> construct. The basic version leaves compiling and bundling to you, which is a common source of "Module not found" errors at runtime.</p>
<p><strong>Permissions (Least Privilege):</strong> In AWS, no resource can communicate with any other resource by default. A Lambda function has no access to DynamoDB, S3, or anything else unless you explicitly grant it.</p>
<p>This is called the <strong>Least Privilege</strong> principle: each piece of your system gets exactly the permissions it needs, and nothing more. <code>grantWriteData</code> lets a Lambda write and delete items. <code>grantReadData</code> lets a Lambda read items. Using separate grants for each function means the <code>getVendors</code> Lambda can never accidentally delete data.</p>
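<p>One way to picture these grants is as a lookup from each function to the actions it may perform. The sketch below is a toy model, not how CDK or IAM is implemented: <code>isAllowed</code>, <code>GRANTS</code>, and <code>ACTIONS_BY_GRANT</code> are hypothetical names, and the action lists are an illustrative subset of what each grant covers.</p>
<pre><code class="language-typescript">type Grant = "read" | "write";

// Illustrative subset of the DynamoDB actions each kind of grant covers.
const ACTIONS_BY_GRANT: Record&lt;Grant, string[]&gt; = {
  read: ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:Scan"],
  write: ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:DeleteItem"],
};

// Which grant each function received in the stack above.
const GRANTS: Record&lt;string, Grant[]&gt; = {
  createVendor: ["write"],
  getVendors: ["read"],
  deleteVendor: ["write"],
};

// A function with no entry has no grants, so everything is denied by default.
function isAllowed(fn: string, action: string): boolean {
  const grants = GRANTS[fn] ?? [];
  return grants.some((grant) =&gt; ACTIONS_BY_GRANT[grant].includes(action));
}
</code></pre>
<p>In this model, <code>isAllowed("getVendors", "dynamodb:Scan")</code> is true while <code>isAllowed("getVendors", "dynamodb:DeleteItem")</code> is false, which is the guarantee the separate grants give you.</p>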
<p><code>CfnOutput</code> prints a value to your terminal after <code>cdk deploy</code> completes. You'll use the <code>ApiEndpoint</code> URL to configure your frontend.</p>
<h2 id="heading-part-6-deploy-the-backend-to-aws">Part 6: Deploy the Backend to AWS</h2>
<p>Your infrastructure is fully defined in code. Now you'll deploy it to AWS and get a live API URL.</p>
<h3 id="heading-61-bootstrap-your-aws-environment">6.1 Bootstrap Your AWS Environment</h3>
<p>Before your first CDK deployment, AWS needs a small landing zone in your account – an S3 bucket where CDK can upload your Lambda bundles and other assets. This setup step is called <strong>bootstrapping</strong> and only needs to be done once per AWS account per region.</p>
<p>From inside your <code>backend</code> folder, run:</p>
<pre><code class="language-shell">cdk bootstrap
</code></pre>
<p><strong>Important</strong>: Bootstrapping is region-specific. If you ever switch to a different AWS region, you will need to run <code>cdk bootstrap</code> again in that region.</p>
<h3 id="heading-62-deploy">6.2 Deploy</h3>
<p>Run:</p>
<pre><code class="language-shell">cdk deploy
</code></pre>
<p>CDK will display a summary of everything it is about to create and ask for your confirmation. Type <code>y</code> and press Enter.</p>
<p>When the deployment finishes, you'll see an <strong>Outputs</strong> section in your terminal:</p>
<pre><code class="language-plaintext">Outputs:
BackendStack.ApiEndpoint = https://abcdef123.execute-api.us-east-1.amazonaws.com/prod/
</code></pre>
<p>Copy that URL. You'll need it when building the frontend.</p>
<h3 id="heading-63-troubleshooting-how-to-read-aws-error-logs">6.3 Troubleshooting: How to Read AWS Error Logs</h3>
<p>Real deployments rarely go perfectly the first time. If something goes wrong after deploying, here is how to find the actual error message.</p>
<h4 id="heading-error-502-bad-gateway">Error: 502 Bad Gateway</h4>
<p>A <code>502</code> means API Gateway received your request but did not get a valid response back from your Lambda, usually because the function crashed or failed to load before it could respond. Common causes are a module that could not be imported and a missing environment variable – for example, if <code>TABLE_NAME</code> was not passed correctly and the Lambda cannot find the table.</p>
<p>To find the actual error message, use <strong>CloudWatch Logs</strong>:</p>
<ol>
<li><p>Log in to the AWS Console and search for CloudWatch</p>
</li>
<li><p>In the left sidebar, click Logs --&gt; Log groups</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/abfb78fc-574b-4a75-a12b-12fb09f041b3.png" alt="CloudWatch left sidebar with log groups, and the search field showing /aws/lambda/" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<ol start="3">
<li><p>Find the group named <code>/aws/lambda/BackendStack-CreateVendorHandler...</code></p>
</li>
<li><p>Click the most recent Log stream</p>
</li>
<li><p>Read the error message. It will tell you exactly what went wrong</p>
</li>
</ol>
<p>Two common messages and their fixes:</p>
<ul>
<li><p><code>Runtime.ImportModuleError</code>: Your Lambda cannot find a module. Make sure you're using <code>NodejsFunction</code> (not <code>lambda.Function</code>) in your CDK stack. <code>NodejsFunction</code> automatically bundles dependencies; <code>lambda.Function</code> does not.</p>
</li>
<li><p><code>AccessDeniedException</code>: Your Lambda tried to access DynamoDB but doesn't have permission. Check that you have the correct <code>grantWriteData</code> or <code>grantReadData</code> call in your stack for that Lambda.</p>
</li>
</ul>
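<p>A missing environment variable is easier to diagnose when the function says so explicitly. The helper below is a minimal sketch of that idea, not part of the tutorial's code: <code>requireEnv</code> is a hypothetical name, and it takes the environment as a parameter so the example runs on its own. In a Lambda file you would pass <code>process.env</code>.</p>
<pre><code class="language-typescript">// Hypothetical helper: read a required variable or fail with a clear message.
function requireEnv(
  env: Record&lt;string, string | undefined&gt;,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Lambda handler file, the first argument would be process.env.
const tableName = requireEnv({ TABLE_NAME: "VendorTable-demo" }, "TABLE_NAME");
</code></pre>
<p>If the variable is absent, the thrown message names it directly, so the log stream shows which setting is missing instead of a less specific DynamoDB validation error.</p>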
<h2 id="heading-part-7-build-the-react-frontend">Part 7: Build the React Frontend</h2>
<p>Your backend is live. Now you'll build the React UI that talks to it.</p>
<h3 id="heading-71-define-the-vendor-type">7.1 Define the Vendor Type</h3>
<p>Before writing any API or component code, define what a "vendor" looks like in TypeScript. This gives you type safety throughout your frontend code.</p>
<p>Create <code>frontend/types/vendor.ts</code>:</p>
<pre><code class="language-typescript">export interface Vendor {
  vendorId?: string; // Optional when creating — the Lambda generates it
  name: string;
  category: string;
  contactEmail: string;
  createdAt?: string;
}
</code></pre>
<p>The <code>vendorId</code> field is marked optional with <code>?</code> because when you are <em>creating</em> a new vendor, you don't have an ID yet. The <code>createVendor</code> Lambda generates one. When you <em>read</em> vendors back from the API, <code>vendorId</code> will always be present.</p>
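<p>If you would like the compiler to distinguish these two situations, you can derive narrower types from <code>Vendor</code>. This is an optional sketch, not something the rest of the tutorial depends on: <code>NewVendor</code>, <code>SavedVendor</code>, and <code>isSaved</code> are hypothetical names, and the sample values are made up.</p>
<pre><code class="language-typescript">interface Vendor {
  vendorId?: string;
  name: string;
  category: string;
  contactEmail: string;
  createdAt?: string;
}

// Hypothetical derived types: what the form sends vs. what the API returns.
type NewVendor = Omit&lt;Vendor, "vendorId" | "createdAt"&gt;;
type SavedVendor = Vendor &amp; { vendorId: string };

// Type guard: narrows a Vendor to one that definitely has an ID.
function isSaved(vendor: Vendor): vendor is SavedVendor {
  return typeof vendor.vendorId === "string" &amp;&amp; vendor.vendorId.length &gt; 0;
}

const draft: NewVendor = { name: "Acme", category: "SaaS", contactEmail: "ops@acme.test" };
const fromApi: Vendor = { ...draft, vendorId: "v-1", createdAt: "2024-01-01T00:00:00.000Z" };

// Only vendors that pass the guard expose vendorId as a plain string.
const list: Vendor[] = [fromApi, draft];
const savedIds = list.filter(isSaved).map((v) =&gt; v.vendorId);
</code></pre>
<p>Here <code>savedIds</code> contains only <code>"v-1"</code>, because the draft has no ID yet. A guard like this can replace ad hoc checks for a missing <code>vendorId</code> before calling a delete function.</p>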
<h3 id="heading-72-create-the-api-service-layer">7.2 Create the API Service Layer</h3>
<p>Rather than writing <code>fetch</code> calls directly inside your React components, you'll centralize all your API logic in one file. This pattern is called a <strong>service layer</strong>. It keeps your components clean and makes it easy to update API calls in one place.</p>
<p>First, create a <code>.env.local</code> file inside your <code>frontend</code> folder to store your API URL:</p>
<pre><code class="language-bash"># frontend/.env.local
NEXT_PUBLIC_API_URL=https://abcdef123.execute-api.us-east-1.amazonaws.com/prod
</code></pre>
<p>Replace the URL with the <code>ApiEndpoint</code> value from your <code>cdk deploy</code> output, leaving off the trailing slash, because the code in <code>api.ts</code> adds <code>/vendors</code> itself. The <code>NEXT_PUBLIC_</code> prefix is required by Next.js to make an environment variable accessible in the browser.</p>
<p>You might be wondering: <strong>why not hardcode the URL</strong>? If you paste your API URL directly into your code and push it to GitHub, it becomes publicly visible in your repository. While an API URL alone does not expose your data (Cognito will protect that), it's good practice to keep environment-specific URLs and secrets out of source control by putting them in <code>.env.local</code>. Keep in mind that <code>NEXT_PUBLIC_</code> values are embedded in the browser bundle, so never store real secrets under that prefix.</p>
<p>Make sure <code>.env.local</code> is in your <code>.gitignore</code>:</p>
<pre><code class="language-shell">echo ".env.local" &gt;&gt; frontend/.gitignore
</code></pre>
<p>Now create <code>frontend/lib/api.ts</code>:</p>
<pre><code class="language-typescript">import { Vendor } from '@/types/vendor';

const BASE_URL = process.env.NEXT_PUBLIC_API_URL!;

export const getVendors = async (): Promise&lt;Vendor[]&gt; =&gt; {
  const response = await fetch(`${BASE_URL}/vendors`);
  if (!response.ok) throw new Error('Failed to fetch vendors');
  return response.json();
};

export const createVendor = async (vendor: Omit&lt;Vendor, 'vendorId' | 'createdAt'&gt;): Promise&lt;void&gt; =&gt; {
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(vendor),
  });
  if (!response.ok) throw new Error('Failed to create vendor');
};

export const deleteVendor = async (vendorId: string): Promise&lt;void&gt; =&gt; {
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'DELETE',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ vendorId }),
  });
  if (!response.ok) throw new Error('Failed to delete vendor');
};
</code></pre>
<p><strong>What each part does:</strong></p>
<ul>
<li><p><code>Omit&lt;Vendor, 'vendorId' | 'createdAt'&gt;</code> means the <code>createVendor</code> function accepts a vendor without an ID or timestamp (those are generated server-side).</p>
</li>
<li><p><code>if (!response.ok) throw new Error(...)</code> ensures that any HTTP error (4xx or 5xx) surfaces as a JavaScript error in your component, where you can show the user a meaningful message instead of silently failing.</p>
</li>
</ul>
<p>You'll update these functions later in Part 8 to include the Cognito auth token.</p>
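<p>The functions above build URLs with <code>${BASE_URL}/vendors</code>, which produces a double slash if the value in <code>.env.local</code> ends with <code>/</code>. A small join helper makes that harmless. This is a sketch, not part of the tutorial's <code>api.ts</code>: <code>joinUrl</code> is a hypothetical name, and the URLs are the placeholder endpoint used earlier.</p>
<pre><code class="language-typescript">// Hypothetical helper: joins a base URL and a path with exactly one slash.
function joinUrl(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ""); // Drop trailing slashes
  const trimmedPath = path.replace(/^\/+/, ""); // Drop leading slashes
  return `${trimmedBase}/${trimmedPath}`;
}

const withSlash = joinUrl(
  "https://abcdef123.execute-api.us-east-1.amazonaws.com/prod/",
  "/vendors"
);
const withoutSlash = joinUrl(
  "https://abcdef123.execute-api.us-east-1.amazonaws.com/prod",
  "vendors"
);
</code></pre>
<p>Both calls produce the same URL ending in <code>/prod/vendors</code>, so the API functions behave the same whichever form of the endpoint was pasted into <code>.env.local</code>.</p>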
<h3 id="heading-73-build-the-main-page">7.3 Build the Main Page</h3>
<p>Now create the main page component. It includes a form for adding vendors and a live list that displays all current vendors.</p>
<p>Replace the contents of <code>frontend/app/page.tsx</code> with:</p>
<pre><code class="language-typescript">'use client';

import { useState, useEffect } from 'react';
import { createVendor, getVendors, deleteVendor } from '@/lib/api';
import { Vendor } from '@/types/vendor';

export default function Home() {
  const [vendors, setVendors] = useState&lt;Vendor[]&gt;([]);
  const [form, setForm] = useState({ name: '', category: '', contactEmail: '' });
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');

  const loadVendors = async () =&gt; {
    try {
      const data = await getVendors();
      setVendors(data);
    } catch {
      setError('Failed to load vendors.');
    }
  };

  // Load vendors once when the page first renders
  useEffect(() =&gt; {
    loadVendors();
  }, []);
  // The empty [] means this runs only once. Without it, the effect would
  // run after every render, causing an infinite loop of fetch requests.

  const handleSubmit = async (e: React.FormEvent) =&gt; {
    e.preventDefault(); // Prevent the browser from reloading the page on submit
    setLoading(true);
    setError('');
    try {
      await createVendor(form);
      setForm({ name: '', category: '', contactEmail: '' }); // Reset the form
      await loadVendors(); // Refresh the list from DynamoDB
    } catch {
      setError('Failed to add vendor. Please try again.');
    } finally {
      setLoading(false);
    }
  };

  const handleDelete = async (vendorId: string) =&gt; {
    try {
      await deleteVendor(vendorId);
      await loadVendors(); // Refresh after deleting
    } catch {
      setError('Failed to delete vendor.');
    }
  };

  return (
    &lt;main className="p-10 max-w-5xl mx-auto"&gt;
      &lt;h1 className="text-3xl font-bold mb-2 text-gray-900"&gt;Vendor Tracker&lt;/h1&gt;
      &lt;p className="text-gray-500 mb-8"&gt;Manage your vendors, stored in AWS DynamoDB.&lt;/p&gt;

      {error &amp;&amp; (
        &lt;div className="mb-4 p-3 bg-red-100 text-red-700 rounded"&gt;{error}&lt;/div&gt;
      )}

      &lt;div className="grid grid-cols-1 md:grid-cols-2 gap-10"&gt;

        {/* ── Add Vendor Form ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;Add New Vendor&lt;/h2&gt;
          &lt;form onSubmit={handleSubmit} className="space-y-4"&gt;
            &lt;input
              className="w-full p-2 border rounded text-black focus:outline-none focus:ring-2 focus:ring-orange-400"
              placeholder="Vendor Name"
              value={form.name}
              onChange={e =&gt; setForm({ ...form, name: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black focus:outline-none focus:ring-2 focus:ring-orange-400"
              placeholder="Category (e.g. SaaS, Hardware)"
              value={form.category}
              onChange={e =&gt; setForm({ ...form, category: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black focus:outline-none focus:ring-2 focus:ring-orange-400"
              placeholder="Contact Email"
              type="email"
              value={form.contactEmail}
              onChange={e =&gt; setForm({ ...form, contactEmail: e.target.value })}
              required
            /&gt;
            &lt;button
              type="submit"
              disabled={loading}
              className="w-full bg-orange-500 text-white p-2 rounded hover:bg-orange-600 disabled:bg-gray-400 transition-colors"
            &gt;
              {loading ? 'Saving...' : 'Add Vendor'}
            &lt;/button&gt;
          &lt;/form&gt;
        &lt;/section&gt;

        {/* ── Vendor List ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;
            Current Vendors ({vendors.length})
          &lt;/h2&gt;
          &lt;div className="space-y-3"&gt;
            {vendors.length === 0 ? (
              &lt;p className="text-gray-400 italic"&gt;No vendors yet. Add one using the form.&lt;/p&gt;
            ) : (
              vendors.map(v =&gt; (
                &lt;div
                  key={v.vendorId}
                  className="p-4 border rounded shadow-sm bg-white flex justify-between items-start"
                &gt;
                  &lt;div&gt;
                    &lt;p className="font-semibold text-gray-900"&gt;{v.name}&lt;/p&gt;
                    &lt;p className="text-sm text-gray-500"&gt;{v.category} · {v.contactEmail}&lt;/p&gt;
                  &lt;/div&gt;
                  &lt;button
                    onClick={() =&gt; v.vendorId &amp;&amp; handleDelete(v.vendorId)}
                    className="ml-4 text-sm text-red-500 hover:text-red-700 hover:underline"
                  &gt;
                    Delete
                  &lt;/button&gt;
                &lt;/div&gt;
              ))
            )}
          &lt;/div&gt;
        &lt;/section&gt;

      &lt;/div&gt;
    &lt;/main&gt;
  );
}
</code></pre>
<p><strong>Key points in this component:</strong></p>
<ul>
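<li><p><code>'use client'</code> at the top is a Next.js directive. It marks this file as a Client Component, which is required for anything that uses React state, effects, or event handlers (<code>useState</code>, <code>useEffect</code>, <code>onClick</code>). Next.js may still pre-render the initial HTML on the server, but the component's JavaScript is shipped to the browser and runs there.</p>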
</li>
<li><p><code>e.preventDefault()</code> inside <code>handleSubmit</code> stops the browser's default form submission behavior, which would cause a full page reload and wipe your React state.</p>
</li>
<li><p>After every <code>createVendor</code> or <code>deleteVendor</code> call, <code>loadVendors()</code> is called again. This re-fetches the latest data from DynamoDB so the UI always matches what is actually stored in the database.</p>
</li>
</ul>
<h3 id="heading-74-test-the-app-locally">7.4 Test the App Locally</h3>
<p>Start your Next.js development server:</p>
<pre><code class="language-shell">cd frontend
npm run dev
</code></pre>
<p>Open <code>http://localhost:3000</code> in your browser. You should see the two-panel layout. Try adding a vendor and confirm it appears in the list.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/281f971a-27b8-49b3-9079-e12601525d80.png" alt="The running Vendor Tracker app at localhost:3000 showing the two-panel layout with the Add Vendor form on the left and an empty vendor list on the right" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/88b5dd74-5847-4310-bec3-b1a2b129fbaa.png" alt="The Vendor Tracker app after a vendor has been added, showing the vendor card in the list" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-verifying-the-connection-to-aws">Verifying the connection to AWS:</h4>
<p>Open Chrome DevTools (F12) and click the Network tab. When you add a vendor, you should see:</p>
<ul>
<li><p>A <code>POST</code> request to your AWS API URL returning a <strong>201</strong> status code</p>
</li>
<li><p>A <code>GET</code> request returning <strong>200</strong> with the updated vendor list</p>
</li>
</ul>
<p>You can also verify the data was saved by opening the AWS Console, navigating to <strong>DynamoDB --&gt; Tables --&gt; VendorTable --&gt; Explore table items</strong>. Your vendor should appear there.</p>
<h2 id="heading-part-8-add-authentication-with-amazon-cognito">Part 8: Add Authentication with Amazon Cognito</h2>
<p>Right now your API is completely open. Anyone who finds your API URL can add or delete vendors. You'll fix that with <strong>Amazon Cognito</strong>.</p>
<p>Cognito is AWS's authentication service. It manages a User Pool – a database of registered users with usernames and passwords. When a user logs in, Cognito issues a JWT (JSON Web Token): a cryptographically signed string that proves who the user is. Your API Gateway will check for this token on every request. No valid token means no access.</p>
<p><strong>What is a JWT?</strong> A JSON Web Token is a string that looks like <code>eyJhbGci...</code>. It contains encoded information about the user and is signed by Cognito with a private key that only Cognito holds.</p>
<p>API Gateway can verify the signature using the user pool's public keys, without contacting Cognito on every request, which makes token checking fast. Think of it as a tamper-proof badge: anyone can read the name on it, but only Cognito's signature makes it valid.</p>
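<p>To make the "anyone can read the badge" point concrete, here is a minimal sketch that decodes the payload of a JWT without verifying it. The names <code>decodeJwtPayload</code> and <code>isExpired</code> are illustrative and not part of any AWS library. Decoding alone proves nothing about authenticity; the signature check is the part API Gateway performs for you.</p>

```typescript
// Illustrative sketch: reads the claims inside a JWT WITHOUT verifying the signature.
// Never use unverified claims to make security decisions.
type JwtClaims = Record<string, unknown>;

function decodeJwtPayload(token: string): JwtClaims {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('Expected three dot-separated parts: header.payload.signature');
  }
  // JWTs use base64url, so map the URL-safe characters back and restore padding.
  let base64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
  while (base64.length % 4 !== 0) {
    base64 += '=';
  }
  // atob is available in browsers and in recent Node.js versions.
  // This simple version assumes the payload contains only ASCII characters.
  return JSON.parse(atob(base64)) as JwtClaims;
}

// `exp` is the standard expiry claim, expressed in seconds since the Unix epoch.
function isExpired(claims: JwtClaims, nowSeconds: number): boolean {
  return typeof claims.exp === 'number' && claims.exp <= nowSeconds;
}
```

<p>Passing a real ID token copied from the browser's Network tab to <code>decodeJwtPayload</code> would typically show claims such as <code>sub</code> (the user's unique ID) and <code>exp</code> (the expiry time).</p>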
<h3 id="heading-81-add-cognito-to-the-cdk-stack">8.1 Add Cognito to the CDK Stack</h3>
<p>Open <code>backend/lib/backend-stack.ts</code> and update it to include Cognito. Here is the complete updated file:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // ─── 1. DynamoDB Table ────────────────────────────────────────────────────
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: { name: 'vendorId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // ─── 2. Lambda Functions ──────────────────────────────────────────────────
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // ─── 3. Permissions ───────────────────────────────────────────────────────
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // ─── 4. Cognito User Pool ─────────────────────────────────────────────────
    const userPool = new cognito.UserPool(this, 'VendorUserPool', {
      selfSignUpEnabled: true,
      signInAliases: { email: true },
      autoVerify: { email: true },
      userVerification: {
        emailStyle: cognito.VerificationEmailStyle.CODE,
      },
    });

    // Required to host Cognito's internal auth endpoints
    userPool.addDomain('VendorUserPoolDomain', {
      cognitoDomain: {
        domainPrefix: `vendor-tracker-${this.account}`,
      },
    });

    const userPoolClient = userPool.addClient('VendorAppClient');

    // ─── 5. API Gateway + Authorizer ──────────────────────────────────────────
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const authorizer = new apigateway.CognitoUserPoolsAuthorizer(
      this,
      'VendorAuthorizer',
      { cognitoUserPools: [userPool] }
    );

    const authOptions = {
      authorizer,
      authorizationType: apigateway.AuthorizationType.COGNITO,
    };

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda), authOptions);
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda), authOptions);
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda), authOptions);

    // ─── 6. Outputs ───────────────────────────────────────────────────────────
    new cdk.CfnOutput(this, 'ApiEndpoint', { value: api.url });
    new cdk.CfnOutput(this, 'UserPoolId', { value: userPool.userPoolId });
    new cdk.CfnOutput(this, 'UserPoolClientId', { value: userPoolClient.userPoolClientId });
  }
}
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/c5e91abf-e6af-429f-bf5b-b14d18233f6c.png" alt="The newly created User Pool (VendorUserPool...) in the User Pools list, with the User Pool ID visible" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>What changed:</strong></p>
<ul>
<li><p><code>CognitoUserPoolsAuthorizer</code> tells API Gateway to check every request for a valid Cognito JWT before passing it to any Lambda. If the token is missing or invalid, API Gateway rejects the request with a <code>401 Unauthorized</code> response without ever touching your Lambda.</p>
</li>
<li><p><code>authOptions</code> is applied to all three API methods: GET, POST, and DELETE. All routes are now protected.</p>
</li>
<li><p><code>autoVerify: { email: true }</code> tells Cognito to mark the email attribute as verified after a user confirms via the verification code email. It doesn't skip the verification email: users still receive a code. If you want to skip verification during development, you can manually confirm users in the Cognito console (covered in section 8.6).</p>
</li>
<li><p>Two new <code>CfnOutput</code> values (<code>UserPoolId</code> and <code>UserPoolClientId</code>) will appear in your terminal after the next deployment. Your frontend needs them to connect to Cognito.</p>
</li>
</ul>
<p>Deploy the updated stack:</p>
<pre><code class="language-shell">cd backend
cdk deploy
</code></pre>
<p>After deployment, your terminal output will include three values:</p>
<pre><code class="language-plaintext">Outputs:
BackendStack.ApiEndpoint     = https://abc123.execute-api.us-east-1.amazonaws.com/prod/
BackendStack.UserPoolId      = us-east-1_xxxxxxxx
BackendStack.UserPoolClientId = xxxxxxxxxxxxxxxxxxxx
</code></pre>
<p>Save all three values. You'll use them in the next step.</p>
<h3 id="heading-82-install-and-configure-aws-amplify">8.2 Install and Configure AWS Amplify</h3>
<p><strong>AWS Amplify</strong> is a frontend library that handles all the complex authentication logic for you: it manages the login UI, stores tokens in the browser, refreshes expired tokens automatically, and exposes a simple API to read the current user's session.</p>
<p>Install the Amplify libraries inside your <code>frontend</code> folder:</p>
<pre><code class="language-shell">cd frontend
npm install aws-amplify @aws-amplify/ui-react
</code></pre>
<p>Create <code>frontend/app/providers.tsx</code>. This file initializes Amplify with your Cognito configuration. It runs once when the app loads:</p>
<pre><code class="language-typescript">'use client';

import { Amplify } from 'aws-amplify';

Amplify.configure(
  {
    Auth: {
      Cognito: {
        userPoolId: process.env.NEXT_PUBLIC_USER_POOL_ID!,
        userPoolClientId: process.env.NEXT_PUBLIC_USER_POOL_CLIENT_ID!,
      },
    },
  },
  { ssr: true }
);

export function Providers({ children }: { children: React.ReactNode }) {
  return &lt;&gt;{children}&lt;/&gt;;
}
</code></pre>
<p>Add the Cognito IDs to your <code>frontend/.env.local</code> file:</p>
<pre><code class="language-shell">NEXT_PUBLIC_API_URL=https://abc123.execute-api.us-east-1.amazonaws.com/prod
NEXT_PUBLIC_USER_POOL_ID=us-east-1_xxxxxxxx
NEXT_PUBLIC_USER_POOL_CLIENT_ID=xxxxxxxxxxxxxxxxxxxx
</code></pre>
<p>Replace the values with the outputs from your <code>cdk deploy</code>.</p>
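<p>The <code>!</code> assertions in <code>providers.tsx</code> and <code>api.ts</code> only silence TypeScript; a missing variable would still surface later as a confusing runtime error. Below is a minimal sketch of a fail-fast alternative. <code>requireValue</code> and <code>normalizeBaseUrl</code> are hypothetical helpers, not part of Next.js or Amplify. The second one exists because the <code>ApiEndpoint</code> output ends with a slash, while the API helper appends <code>/vendors</code> to the base URL.</p>

```typescript
// Hypothetical helpers (not part of Next.js or Amplify) for validating configuration early.

// Throws a descriptive error instead of letting `undefined` reach Amplify or fetch.
function requireValue(value: string | undefined, name: string): string {
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing ${name}. Add it to frontend/.env.local and restart the dev server.`);
  }
  return value.trim();
}

// Removes trailing slashes so that `${baseUrl}/vendors` never contains "//vendors".
function normalizeBaseUrl(url: string): string {
  return url.replace(/\/+$/, '');
}

// Intended usage inside the app. Next.js inlines NEXT_PUBLIC_ variables only when
// they are referenced by their full literal name, so pass the value in explicitly:
//
//   const BASE_URL = normalizeBaseUrl(
//     requireValue(process.env.NEXT_PUBLIC_API_URL, 'NEXT_PUBLIC_API_URL')
//   );
```

<p>With this in place, a typo in <code>.env.local</code> should fail at startup with the name of the missing variable, rather than as a failed request later on.</p>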
<h3 id="heading-83-wire-providers-into-the-app-layout">8.3 Wire Providers into the App Layout</h3>
<p><strong>This step is critical.</strong> Amplify must be initialized before any component tries to use authentication. If you skip this step, <code>fetchAuthSession()</code> will throw an "Amplify not configured" error and nothing will work.</p>
<p>Open <code>frontend/app/layout.tsx</code> and update it to wrap the app in the <code>Providers</code> component:</p>
<pre><code class="language-typescript">import type { Metadata } from 'next';
import './globals.css';
import { Providers } from './providers';

export const metadata: Metadata = {
  title: 'Vendor Tracker',
  description: 'Manage your vendors with AWS',
};

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    &lt;html lang="en"&gt;
      &lt;body&gt;
        &lt;Providers&gt;{children}&lt;/Providers&gt;
      &lt;/body&gt;
    &lt;/html&gt;
  );
}
</code></pre>
<p>By wrapping <code>{children}</code> in <code>&lt;Providers&gt;</code>, you ensure that Amplify is configured once at the root of the app, before any child page or component renders.</p>
<h3 id="heading-84-protect-the-ui-with-withauthenticator">8.4 Protect the UI with withAuthenticator</h3>
<p>Now wrap your <code>Home</code> component so that unauthenticated users see a login screen instead of the dashboard.</p>
<p>Replace the contents of <code>frontend/app/page.tsx</code> with this updated version:</p>
<pre><code class="language-typescript">'use client';

import { useState, useEffect } from 'react';
import { withAuthenticator } from '@aws-amplify/ui-react';
import '@aws-amplify/ui-react/styles.css';
import { getVendors, createVendor, deleteVendor } from '@/lib/api';
import { Vendor } from '@/types/vendor';

// withAuthenticator injects `signOut` and `user` as props automatically
function Home({ signOut, user }: { signOut?: () =&gt; void; user?: any }) {
  const [vendors, setVendors] = useState&lt;Vendor[]&gt;([]);
  const [form, setForm] = useState({ name: '', category: '', contactEmail: '' });
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');

  const loadVendors = async () =&gt; {
    try {
      const data = await getVendors();
      setVendors(data);
    } catch {
      setError('Failed to load vendors.');
    }
  };

  useEffect(() =&gt; {
    loadVendors();
  }, []);

  const handleSubmit = async (e: React.FormEvent) =&gt; {
    e.preventDefault();
    setLoading(true);
    setError('');
    try {
      await createVendor(form);
      setForm({ name: '', category: '', contactEmail: '' });
      await loadVendors();
    } catch {
      setError('Failed to add vendor.');
    } finally {
      setLoading(false);
    }
  };

  const handleDelete = async (vendorId: string) =&gt; {
    try {
      await deleteVendor(vendorId);
      await loadVendors();
    } catch {
      setError('Failed to delete vendor.');
    }
  };

  return (
    &lt;main className="p-10 max-w-5xl mx-auto"&gt;
      {/* ── Header ── */}
      &lt;header className="flex justify-between items-center mb-8 p-4 bg-gray-100 rounded"&gt;
        &lt;div&gt;
          &lt;h1 className="text-xl font-bold text-gray-900"&gt;Vendor Tracker&lt;/h1&gt;
          &lt;p className="text-sm text-gray-500"&gt;Signed in as: {user?.signInDetails?.loginId}&lt;/p&gt;
        &lt;/div&gt;
        &lt;button
          onClick={signOut}
          className="bg-red-500 text-white px-4 py-2 rounded hover:bg-red-600 transition-colors"
        &gt;
          Sign Out
        &lt;/button&gt;
      &lt;/header&gt;

      {error &amp;&amp; (
        &lt;div className="mb-4 p-3 bg-red-100 text-red-700 rounded"&gt;{error}&lt;/div&gt;
      )}

      &lt;div className="grid grid-cols-1 md:grid-cols-2 gap-10"&gt;

        {/* ── Add Vendor Form ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;Add New Vendor&lt;/h2&gt;
          &lt;form onSubmit={handleSubmit} className="space-y-4"&gt;
            &lt;input
              className="w-full p-2 border rounded text-black"
              placeholder="Vendor Name"
              value={form.name}
              onChange={e =&gt; setForm({ ...form, name: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black"
              placeholder="Category (e.g. SaaS, Hardware)"
              value={form.category}
              onChange={e =&gt; setForm({ ...form, category: e.target.value })}
              required
            /&gt;
            &lt;input
              className="w-full p-2 border rounded text-black"
              placeholder="Contact Email"
              type="email"
              value={form.contactEmail}
              onChange={e =&gt; setForm({ ...form, contactEmail: e.target.value })}
              required
            /&gt;
            &lt;button
              type="submit"
              disabled={loading}
              className="w-full bg-orange-500 text-white p-2 rounded hover:bg-orange-600 disabled:bg-gray-400"
            &gt;
              {loading ? 'Saving...' : 'Add Vendor'}
            &lt;/button&gt;
          &lt;/form&gt;
        &lt;/section&gt;

        {/* ── Vendor List ── */}
        &lt;section&gt;
          &lt;h2 className="text-xl font-semibold mb-4 text-gray-800"&gt;
            Current Vendors ({vendors.length})
          &lt;/h2&gt;
          &lt;div className="space-y-3"&gt;
            {vendors.length === 0 ? (
              &lt;p className="text-gray-400 italic"&gt;No vendors yet.&lt;/p&gt;
            ) : (
              vendors.map(v =&gt; (
                &lt;div
                  key={v.vendorId}
                  className="p-4 border rounded shadow-sm bg-white flex justify-between items-start"
                &gt;
                  &lt;div&gt;
                    &lt;p className="font-semibold text-gray-900"&gt;{v.name}&lt;/p&gt;
                    &lt;p className="text-sm text-gray-500"&gt;{v.category} · {v.contactEmail}&lt;/p&gt;
                  &lt;/div&gt;
                  &lt;button
                    onClick={() =&gt; v.vendorId &amp;&amp; handleDelete(v.vendorId)}
                    className="ml-4 text-sm text-red-500 hover:text-red-700 hover:underline"
                  &gt;
                    Delete
                  &lt;/button&gt;
                &lt;/div&gt;
              ))
            )}
          &lt;/div&gt;
        &lt;/section&gt;

      &lt;/div&gt;
    &lt;/main&gt;
  );
}

// Wrapping Home with withAuthenticator means any user who is not logged in
// will see Amplify's built-in login/signup screen instead of this component.
export default withAuthenticator(Home);
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/e65a88dc-ea75-4daa-b7cf-eac3406c8060.png" alt="Amplify-generated login screen" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-85-pass-the-auth-token-to-api-calls">8.5 Pass the Auth Token to API Calls</h3>
<p>Now that API Gateway requires a JWT on every request, your <code>fetch</code> calls need to include the token in the <code>Authorization</code> header. Without it, every request will return a <code>401 Unauthorized</code> error.</p>
<p>Update <code>frontend/lib/api.ts</code> with a token helper and updated fetch calls:</p>
<pre><code class="language-typescript">import { fetchAuthSession } from 'aws-amplify/auth';
import { Vendor } from '@/types/vendor';

const BASE_URL = process.env.NEXT_PUBLIC_API_URL!;

// Retrieves the current user's JWT token from the active Amplify session
const getAuthToken = async (): Promise&lt;string&gt; =&gt; {
  const session = await fetchAuthSession();
  const token = session.tokens?.idToken?.toString();
  if (!token) throw new Error('No active session. Please sign in.');
  return token;
};

export const getVendors = async (): Promise&lt;Vendor[]&gt; =&gt; {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    headers: { Authorization: token },
  });
  if (!response.ok) throw new Error('Failed to fetch vendors');
  return response.json();
};

export const createVendor = async (
  vendor: Omit&lt;Vendor, 'vendorId' | 'createdAt'&gt;
): Promise&lt;void&gt; =&gt; {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: token,
    },
    body: JSON.stringify(vendor),
  });
  if (!response.ok) throw new Error('Failed to create vendor');
};

export const deleteVendor = async (vendorId: string): Promise&lt;void&gt; =&gt; {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'DELETE',
    headers: {
      'Content-Type': 'application/json',
      Authorization: token,
    },
    body: JSON.stringify({ vendorId }),
  });
  if (!response.ok) throw new Error('Failed to delete vendor');
};
</code></pre>
<p><strong>What</strong> <code>getAuthToken</code> <strong>does:</strong></p>
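<p><code>fetchAuthSession()</code> reads the currently signed-in user's session from the browser. Amplify persists the tokens after sign-in (in <code>localStorage</code> by default, or in cookies when it is configured with <code>ssr: true</code>, as in <code>providers.tsx</code>) and refreshes them when they expire.</p>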
<p><code>session.tokens?.idToken</code> is the ID token, and calling <code>.toString()</code> on it yields the JWT string that API Gateway's Cognito authorizer expects. Passing it in the <code>Authorization</code> header tells API Gateway: "This request is from an authenticated user."</p>
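<p>The three functions in <code>api.ts</code> repeat the same token lookup and header construction. One possible refactor, sketched here under assumptions, is a small request factory. <code>createApiClient</code>, <code>TokenProvider</code>, <code>FetchLike</code>, and <code>MinimalResponse</code> are hypothetical names. The token provider and the fetch function are passed in as arguments, so the sketch runs without Amplify and is easy to test with fakes.</p>

```typescript
// Hypothetical refactor sketch: one place that attaches the token and checks the status.
type TokenProvider = () => Promise<string>;

interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<MinimalResponse>;

function createApiClient(baseUrl: string, getToken: TokenProvider, fetchImpl: FetchLike) {
  return async function request(
    method: string,
    path: string,
    body?: unknown
  ): Promise<MinimalResponse> {
    // Look up a fresh token for every call so refreshed tokens are picked up.
    const token = await getToken();
    const headers: Record<string, string> = { Authorization: token };
    if (body !== undefined) {
      headers['Content-Type'] = 'application/json';
    }
    const response = await fetchImpl(`${baseUrl}${path}`, {
      method,
      headers,
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!response.ok) {
      // Including the status makes 401 (auth) easy to tell apart from 5xx (server).
      throw new Error(`${method} ${path} failed with status ${response.status}`);
    }
    return response;
  };
}
```

<p>In the app, you would pass <code>getAuthToken</code> as the token provider and the browser's <code>fetch</code> as <code>fetchImpl</code>; <code>getVendors</code> would then reduce to a <code>request('GET', '/vendors')</code> call followed by <code>.json()</code>.</p>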
<h3 id="heading-86-troubleshooting-cognito">8.6 Troubleshooting Cognito</h3>
<h4 id="heading-unconfirmed-user-error-after-sign-up">"Unconfirmed" user error after sign-up</h4>
<p>When a new user signs up through the Amplify UI, Cognito marks the account as <em>Unconfirmed</em> until the user verifies their email address. A verification code is sent to the user's email. After entering the code, the account becomes confirmed and the user can log in.</p>
<p>If you are testing locally and want to skip the email step, you can manually confirm any account in the AWS Console:</p>
<ol>
<li><p>Open the AWS Console and navigate to Cognito</p>
</li>
<li><p>Click on your User Pool (<code>VendorUserPool...</code>)</p>
</li>
<li><p>Click the Users tab</p>
</li>
<li><p>Click on the user's email address</p>
</li>
<li><p>Open the Actions dropdown and click Confirm account</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/158fb773-9cb1-4c14-9fd7-49e4369ba7e3.png" alt="Cognito Users list showing a user with &quot;Unconfirmed&quot; status" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/5637ac80-ee0c-4fdf-93cf-d4b7d71f6a65.png" alt="Cognito Users list showing a user with &quot;Unconfirmed&quot; status" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-401-unauthorized-errors-after-deployment">401 Unauthorized errors after deployment</h4>
<p>If you are getting 401 errors, check two things:</p>
<ol>
<li><p>Open Chrome DevTools --&gt; Network tab, click the failing request, and look at the <strong>Request Headers</strong>. You should see an <code>Authorization</code> header with a long string of characters. If it is missing, <code>getAuthToken</code> is failing. Check that Amplify is configured correctly in <code>providers.tsx</code> and wired in via <code>layout.tsx</code>.</p>
</li>
<li><p>In your CDK stack, confirm that <code>authorizationType: apigateway.AuthorizationType.COGNITO</code> is present on every protected method definition. If it is missing, API Gateway may not be checking tokens even though the authorizer is defined.</p>
</li>
</ol>
<h2 id="heading-part-9-deploy-the-frontend-with-s3-and-cloudfront">Part 9: Deploy the Frontend with S3 and CloudFront</h2>
<p>Your app works locally. Now you'll deploy it to a real HTTPS URL that anyone in the world can visit.</p>
<p><strong>The strategy:</strong> Next.js will export your React app as a set of static HTML, CSS, and JavaScript files. Those files will be uploaded to an <strong>S3 bucket</strong> (AWS's file storage service). <strong>CloudFront</strong> sits in front of the bucket as a Content Delivery Network (CDN), distributing your files to servers around the world and serving them over HTTPS.</p>
<h3 id="heading-91-configure-nextjs-for-static-export">9.1 Configure Next.js for Static Export</h3>
<p>Open <code>frontend/next.config.js</code> (or <code>next.config.mjs</code>) and add the <code>output: 'export'</code> setting:</p>
<pre><code class="language-javascript">/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'export', // Generates a static /out folder instead of a Node.js server
};

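// ES module syntax, as used in next.config.mjs. If your file is a CommonJS
// next.config.js and the build complains about `export`, use
// `module.exports = nextConfig;` instead.
export default nextConfig;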
</code></pre>
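<p><strong>Note on <code>'use client'</code> and static export</strong>: When <code>output: 'export'</code> is set, Next.js pre-renders every page to HTML at build time. Any component that relies on state, effects, or browser-only APIs (such as a page wrapped in <code>withAuthenticator</code> from Amplify) must have <code>'use client'</code> at the top of the file, so that Next.js treats it as a Client Component and ships its JavaScript to the browser.</p>
<p>You already have <code>'use client'</code> in <code>page.tsx</code>. If you ever see a build error mentioning <code>window is not defined</code> or similar, check that the relevant component has <code>'use client'</code> at the top, and that any direct access to <code>window</code> happens inside <code>useEffect</code> or an event handler rather than during rendering.</p>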
<p>Build the frontend:</p>
<pre><code class="language-shell">cd frontend
npm run build
</code></pre>
<p>This generates an <code>/out</code> folder containing your complete website as static files. Verify the folder was created:</p>
<pre><code class="language-shell">ls out
# You should see: index.html, _next/, etc.
</code></pre>
<h3 id="heading-92-add-s3-and-cloudfront-to-the-cdk-stack">9.2 Add S3 and CloudFront to the CDK Stack</h3>
<p>Open <code>backend/lib/backend-stack.ts</code> and add the hosting infrastructure. Here's the complete final version of the file:</p>
<pre><code class="language-typescript">import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
import * as s3deploy from 'aws-cdk-lib/aws-s3-deployment';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table 
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: { name: 'vendorId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // 2. Lambda Functions
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // 3. Permissions
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // 4. Cognito User Pool
    const userPool = new cognito.UserPool(this, 'VendorUserPool', {
      selfSignUpEnabled: true,
      signInAliases: { email: true },
      autoVerify: { email: true },
      userVerification: {
        emailStyle: cognito.VerificationEmailStyle.CODE,
      },
    });

    userPool.addDomain('VendorUserPoolDomain', {
      cognitoDomain: { domainPrefix: `vendor-tracker-${this.account}` },
    });

    const userPoolClient = userPool.addClient('VendorAppClient');

    // 5. API Gateway + Authorizer
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const authorizer = new apigateway.CognitoUserPoolsAuthorizer(
      this,
      'VendorAuthorizer',
      { cognitoUserPools: [userPool] }
    );

    const authOptions = {
      authorizer,
      authorizationType: apigateway.AuthorizationType.COGNITO,
    };

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda), authOptions);
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda), authOptions);
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda), authOptions);

    // 6. S3 Bucket (Frontend Files) 
    const siteBucket = new s3.Bucket(this, 'VendorSiteBucket', {
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

    // 7. CloudFront Distribution (HTTPS + CDN)
    const distribution = new cloudfront.Distribution(this, 'SiteDistribution', {
      defaultBehavior: {
        origin: new origins.S3Origin(siteBucket),
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
      defaultRootObject: 'index.html',
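      errorResponses: [
        // Serve index.html (with a 200) for unknown paths so React can handle routing.
        // A private bucket typically answers requests for missing objects with 403
        // rather than 404, so both status codes are mapped.
        {
          httpStatus: 403,
          responseHttpStatus: 200,
          responsePagePath: '/index.html',
        },
        {
          httpStatus: 404,
          responseHttpStatus: 200,
          responsePagePath: '/index.html',
        },
      ],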
    });

    // 8. Deploy Frontend Files to S3 
    new s3deploy.BucketDeployment(this, 'DeployWebsite', {
      sources: [s3deploy.Source.asset('../frontend/out')],
      destinationBucket: siteBucket,
      distribution,
      distributionPaths: ['/*'], // Clears CloudFront cache on every deploy
    });

    // 9. Outputs
    new cdk.CfnOutput(this, 'ApiEndpoint', { value: api.url });
    new cdk.CfnOutput(this, 'UserPoolId', { value: userPool.userPoolId });
    new cdk.CfnOutput(this, 'UserPoolClientId', { value: userPoolClient.userPoolClientId });
    new cdk.CfnOutput(this, 'CloudFrontURL', {
      value: `https://${distribution.distributionDomainName}`,
    });
  }
}
</code></pre>
<p><strong>What the hosting infrastructure does:</strong></p>
<ul>
<li><p>The <strong>S3 bucket</strong> stores your static HTML, CSS, and JavaScript files. It is private – users cannot access it directly.</p>
</li>
<li><p><strong>CloudFront</strong> is the CDN that sits in front of S3. It gives you an HTTPS URL and caches your files at edge locations worldwide, so the app loads fast no matter where users are located. <code>REDIRECT_TO_HTTPS</code> automatically upgrades any HTTP request to HTTPS.</p>
</li>
<li><p>The <strong>error response</strong> for 404 returns <code>index.html</code> instead of an error page. This is necessary for single-page apps: if a user navigates directly to a route like <code>/vendors/123</code>, CloudFront cannot find a file at that path, but sending back <code>index.html</code> lets the React app handle the routing correctly.</p>
</li>
<li><p><code>distributionPaths: ['/*']</code> tells CloudFront to invalidate its entire cache after every deployment, so users get the latest version of your app as soon as the invalidation completes instead of a stale cached copy.</p>
</li>
<li><p><code>BucketDeployment</code> is a CDK construct that automatically uploads the contents of your <code>frontend/out</code> folder to the S3 bucket every time you run <code>cdk deploy</code>.</p>
</li>
</ul>
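<p>To make the 404 fallback concrete, here is a toy model of the rule in Python. It is only a sketch of the behavior described above, not how CloudFront is implemented, and the <code>resolve</code> function and sample file names are made up for illustration.</p>
<pre><code class="language-python">def resolve(path, bucket_keys, root_object="index.html"):
    """Toy model: return (status, key) the distribution would serve for a path."""
    # A request for "/" is mapped to the default root object.
    key = path.lstrip("/") or root_object
    if key in bucket_keys:
        return 200, key
    # Custom error response: an origin 404 is rewritten to 200 + /index.html,
    # so the client-side router can render the requested route.
    return 200, "index.html"


# Hypothetical contents of the deployed bucket.
files = {"index.html", "favicon.ico"}

print(resolve("/", files))             # (200, 'index.html')
print(resolve("/favicon.ico", files))  # (200, 'favicon.ico')
print(resolve("/vendors/123", files))  # (200, 'index.html')
</code></pre>
<p>One side effect worth knowing: a request for a file that is genuinely missing also gets <code>index.html</code> with a 200 status, so the app itself has to render a "not found" view for unknown routes.</p>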
<h3 id="heading-93-run-the-final-deployment">9.3 Run the Final Deployment</h3>
<p>First, build the frontend with the latest environment variables:</p>
<pre><code class="language-shell">cd frontend
npm run build
</code></pre>
<p>Then deploy everything from the backend folder:</p>
<pre><code class="language-shell">cd ../backend
cdk deploy
</code></pre>
<p>After deployment finishes, copy the <code>CloudFrontURL</code> from the terminal output:</p>
<pre><code class="language-plaintext">Outputs:
BackendStack.CloudFrontURL = https://d1234abcd.cloudfront.net
</code></pre>
<p>Open that URL in your browser. Your app is now live on the internet, served over HTTPS and globally distributed.</p>
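<p>If you would rather not copy the URL by hand, <code>cdk deploy</code> accepts an <code>--outputs-file</code> flag that writes the stack outputs to a JSON file keyed by stack name. The sketch below reads the URL from that JSON. The file name <code>cdk-outputs.json</code> and the helper <code>cloudfront_url</code> are arbitrary choices for this example, and the stack name is assumed to be <code>BackendStack</code>, as in the output above.</p>
<pre><code class="language-python">import json


def cloudfront_url(outputs_json, stack_name="BackendStack"):
    """Return the CloudFrontURL output for one stack from CDK outputs JSON."""
    outputs = json.loads(outputs_json)
    return outputs[stack_name]["CloudFrontURL"]


# Sample shaped like the file written by:
#   cdk deploy --outputs-file cdk-outputs.json
# (placeholder values; a real file contains your own outputs)
sample = json.dumps({
    "BackendStack": {
        "ApiEndpoint": "https://example.execute-api.us-east-1.amazonaws.com/prod/",
        "CloudFrontURL": "https://d1234abcd.cloudfront.net",
    }
})

print(cloudfront_url(sample))  # https://d1234abcd.cloudfront.net
</code></pre>
<p>With a real deployment, you would pass the file contents instead of the sample, for example <code>cloudfront_url(open("cdk-outputs.json").read())</code>.</p>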
<img src="https://cdn.hashnode.com/uploads/covers/62d53ab5bc2c7a1dc672b04f/f8e14979-a667-4afc-bdd4-9afe4abd9593.png" alt="f8e14979-a667-4afc-bdd4-9afe4abd9593" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-what-you-built">What You Built</h2>
<p>You now have a fully deployed, production-style full-stack application. Here is a summary of every piece you built and what it does:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Service</th>
<th>What it does</th>
</tr>
</thead>
<tbody><tr>
<td>Frontend</td>
<td>Next.js + CloudFront</td>
<td>React UI served globally over HTTPS</td>
</tr>
<tr>
<td>Auth</td>
<td>Amazon Cognito + Amplify</td>
<td>User sign-up, login, and JWT token management</td>
</tr>
<tr>
<td>API</td>
<td>API Gateway</td>
<td>Routes HTTP requests, validates auth tokens</td>
</tr>
<tr>
<td>Logic</td>
<td>AWS Lambda (×3)</td>
<td>Creates, reads, and deletes vendors on demand</td>
</tr>
<tr>
<td>Database</td>
<td>DynamoDB</td>
<td>Stores vendor records with no idle cost</td>
</tr>
<tr>
<td>Storage</td>
<td>S3</td>
<td>Holds your built frontend files</td>
</tr>
<tr>
<td>Infrastructure</td>
<td>AWS CDK</td>
<td>Defines and deploys all of the above as code</td>
</tr>
</tbody></table>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You have built and deployed the foundational pattern of almost every cloud application: a secured API backed by a database, deployed with infrastructure as code. Here is everything you accomplished:</p>
<p>You set up a professional AWS development environment with scoped IAM credentials. You defined your entire backend infrastructure as TypeScript code using AWS CDK, which means your database, API, Lambda functions, and authentication system are all version-controlled, repeatable, and deployable with a single command.</p>
<p>You wrote three Lambda functions that handle create, read, and delete operations, each with proper error handling and the correct AWS SDK v3 patterns. You connected them to a REST API through API Gateway and protected every route with Amazon Cognito authentication, so only registered, verified users can interact with your data.</p>
<p>On the frontend, you built a Next.js application with a service layer that cleanly separates API logic from UI components, manages JWTs automatically through AWS Amplify, and gives users a complete sign-up and sign-in flow without you writing a single line of authentication UI code.</p>
<p>Finally, you deployed the entire system: your backend to AWS Lambda and DynamoDB, and your frontend as a static site served globally through CloudFront over HTTPS.</p>
<p>The full source code for this tutorial is available on <a href="https://github.com/BenedictaUche/vendor-tracker">GitHub</a>. Clone it, modify it, and use it as a reference for your own projects.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Serverless RAG Pipeline on AWS That Scales to Zero ]]>
                </title>
                <description>
                    <![CDATA[ Most RAG tutorials end the same way: you've got a working prototype and a bill for a vector database that runs whether anyone's querying it or not. Add an always-on embedding service, a hosted LLM end ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-serverless-rag-pipeline-on-aws-that-scales-to-zero/</link>
                <guid isPermaLink="false">69b1b23c6c896b0519b4eda8</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ RAG  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Christopher Galliart ]]>
                </dc:creator>
                <pubDate>Wed, 11 Mar 2026 18:19:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/c0416d9e-9661-47a3-ba9c-8001f5f91b8c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most RAG tutorials end the same way: you've got a working prototype and a bill for a vector database that runs whether anyone's querying it or not. Add an always-on embedding service, a hosted LLM endpoint, and the usual AWS infrastructure, and you're looking at real money before a single user shows up.</p>
<p>But it doesn't have to work that way. In this tutorial, you'll deploy a fully serverless RAG pipeline that processes documents, images, video, and audio, then scales to zero when nobody's using it.</p>
<p>Everything runs in your AWS account, your data never leaves your infrastructure, and your ongoing monthly cost for a modest knowledge base will be closer to <code>2-3 USD</code> than <code>300 USD</code>.</p>
<p>We'll use <a href="https://github.com/HatmanStack/RAGStack-Lambda">RAGStack-Lambda</a>, an open-source project I built on AWS. By the end, you'll have a deployed pipeline with a dashboard, an AI chat interface with source citations, a drop-in web component you can embed in any app, and an MCP server you can use to feed your assistant context.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ul>
<li><p><a href="#heading-what-this-actually-costs">What This Actually Costs</a></p>
</li>
<li><p><a href="#heading-what-youre-building">What You're Building</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-deploying-from-aws-marketplace">Deploying from AWS Marketplace</a></p>
</li>
<li><p><a href="#heading-deploying-from-source">Deploying from Source</a></p>
</li>
<li><p><a href="#heading-uploading-your-first-documents">Uploading Your First Documents</a></p>
</li>
<li><p><a href="#heading-chatting-with-your-knowledge-base">Chatting With Your Knowledge Base</a></p>
</li>
<li><p><a href="#heading-embedding-the-web-component-in-your-app">Embedding the Web Component in Your App</a></p>
</li>
<li><p><a href="#heading-using-the-mcp-server">Using the MCP Server</a></p>
</li>
<li><p><a href="#heading-what-you-can-build-from-here">What You Can Build From Here</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-what-this-actually-costs">What This Actually Costs</h2>
<p>Before we build anything, let's talk money, because the cost story is the whole point.</p>
<p>RAG pipelines have two cost phases: ingestion (processing your documents once) and operation (querying them over time).</p>
<p>Most platforms charge you a flat monthly rate regardless of which phase you're in. A serverless architecture flips that: ingestion costs something, and then everything scales to zero.</p>
<h3 id="heading-ingestion-the-one-time-hit">Ingestion: The One-Time Hit</h3>
<p>When you upload documents, several things happen: text extraction (OCR for PDFs and images), embedding generation, metadata extraction, and storage. Here's what that actually costs per service:</p>
<p><strong>Textract (OCR):</strong> This is the most expensive part of ingestion, and it only applies to scanned PDFs and images that need text extraction. Plain text, HTML, CSV, and other text-based formats skip this entirely.</p>
<p>Textract charges about <code>1.50 USD</code> per 1,000 pages for standard text detection. If you're uploading 500 pages of scanned PDFs, that's about <code>0.75 USD</code>. A heavy initial load of several thousand scanned pages might run <code>5-10 USD</code>. But once your documents are processed, you never pay this again unless you add new ones.</p>
<p><strong>Bedrock Embeddings (Nova Multimodal):</strong> This is where your content gets converted into vectors for semantic search. The pricing is almost comically cheap:</p>
<ul>
<li><p>Text: <code>0.00002 USD</code> per 1,000 input tokens</p>
</li>
<li><p>Images: <code>0.00115 USD</code> per image</p>
</li>
<li><p>Video/Audio: <code>0.00200 USD</code> per minute</p>
</li>
</ul>
<p>To put that in perspective: if you have 1,500 text documents averaging 2,500 tokens each after chunking, your total embedding cost is about <code>0.08 USD</code>. A knowledge base with 500 images runs <code>0.58 USD</code>. Even a mixed corpus of text, images, and a few hours of video stays well under <code>2 USD</code> for the entire embedding pass. This is a one-time cost – you only re-embed if you add or update documents.</p>
<p><strong>Bedrock LLM (Metadata Extraction):</strong> RAGStack uses an LLM to analyze each document and extract structured metadata automatically. This is a few inference calls per document using Nova Lite or a similar model. At <code>0.06 USD</code>/<code>0.24 USD</code> per million input/output tokens, processing 1,500 documents costs well under <code>1 USD</code>.</p>
<p><strong>S3 Vectors (Storage):</strong> Storing your embeddings. At <code>0.06 USD</code> per GB/month, a knowledge base of 1,500 documents with 1,024-dimension vectors takes up a trivially small amount of space. We're talking pennies per month.</p>
<p><strong>S3 (Document Storage):</strong> Your source documents in standard S3. Even cheaper, <code>0.023 USD</code> per GB/month.</p>
<p><strong>DynamoDB:</strong> Stores document metadata and processing state. The on-demand pricing model means you pay per request during ingestion, then essentially nothing at rest. A few cents for the initial load.</p>
<p>To put real numbers on it: if you upload 200 text documents (PDFs, HTML, markdown), your total ingestion cost is likely under <code>1 USD</code>. If you upload 1,000 scanned PDFs that need OCR, you might see <code>5-8 USD</code> as a one-time hit. That <code>7-10 USD</code> figure you might see referenced? That's the upper end for a heavy initial load with lots of OCR work.</p>
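<p>If you want to plug in your own numbers, the OCR and embedding arithmetic above fits in a few lines of Python. This is a rough estimator, not a billing tool: it uses only the rates quoted in this section, the function name is made up for this example, and it leaves out metadata extraction, storage, DynamoDB, and Transcribe.</p>
<pre><code class="language-python"># Rates quoted in this section, in USD. Check current AWS pricing before relying on them.
TEXTRACT_PER_1000_PAGES = 1.50
EMBED_TEXT_PER_1000_TOKENS = 0.00002
EMBED_PER_IMAGE = 0.00115
EMBED_PER_MEDIA_MINUTE = 0.00200


def ingestion_cost_usd(scanned_pages=0, text_tokens=0, images=0, media_minutes=0):
    """Estimate one-time OCR plus embedding cost for a batch of content."""
    ocr = scanned_pages / 1000 * TEXTRACT_PER_1000_PAGES
    text = text_tokens / 1000 * EMBED_TEXT_PER_1000_TOKENS
    visual = images * EMBED_PER_IMAGE
    media = media_minutes * EMBED_PER_MEDIA_MINUTE
    return ocr + text + visual + media


# The three examples from the text above.
print(round(ingestion_cost_usd(scanned_pages=500), 3))        # 0.75
print(round(ingestion_cost_usd(text_tokens=1500 * 2500), 3))  # 0.075
print(round(ingestion_cost_usd(images=500), 3))               # 0.575
</code></pre>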
<h3 id="heading-operation-where-scale-to-zero-shines">Operation: Where Scale-to-Zero Shines</h3>
<p>Once your documents are ingested, the pipeline is waiting. Not running. Waiting. Here's what each query costs:</p>
<p><strong>Lambda:</strong> Invocations are billed per request and duration. The free tier covers 1 million requests/month. For a personal or small-team knowledge base, you may never leave the free tier.</p>
<p><strong>S3 Vectors (Queries):</strong> <code>2.50 USD</code> per million query API calls, plus a per-TB data processing charge. For a small index queried a few hundred times a month, this rounds to effectively zero.</p>
<p><strong>Bedrock (Chat Inference):</strong> This is your main operating cost. Each chat response requires an LLM call. Using Nova Lite at <code>0.06 USD</code> per million input tokens and <code>0.24 USD</code> per million output tokens, a typical RAG query (retrieval context + user question + response) might cost <code>0.001-0.003 USD</code> per query. A hundred queries a month is <code>0.10-0.30 USD</code>.</p>
<p><strong>Step Functions:</strong> Orchestrates the document processing pipeline. Standard workflows charge <code>0.025 USD</code> per 1,000 state transitions. Minimal during operation since it's only active during ingestion.</p>
<p><strong>Cognito:</strong> User authentication. Free for the first 10,000 monthly active users.</p>
<p><strong>CloudFront:</strong> Serves the dashboard UI. Free tier covers 1 TB of data transfer per month.</p>
<p><strong>API Gateway:</strong> Handles GraphQL API requests. Free tier covers 1 million API calls per month.</p>
<p>Add it all up for a knowledge base with 500 documents getting a few hundred queries per month, and your monthly operating cost is somewhere between <code>0.50 USD</code> and <code>3.00 USD</code>. Most of that is the LLM inference for chat responses.</p>
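<p>Since chat inference dominates, here is a quick way to sanity-check it against your own usage. This is a sketch using the Nova Lite rates quoted above. The token counts are assumptions picked for illustration, and real queries will vary with how much context gets retrieved.</p>
<pre><code class="language-python"># Nova Lite rates quoted above, in USD per million tokens.
INPUT_PER_MILLION = 0.06
OUTPUT_PER_MILLION = 0.24


def query_cost_usd(input_tokens, output_tokens):
    """Estimate the Bedrock inference cost of one chat response."""
    total = input_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION
    return total / 1_000_000


# Assumed sizes: 25,000 input tokens (retrieved context plus question)
# and a 1,000-token answer.
per_query = query_cost_usd(25_000, 1_000)
print(round(per_query, 5))        # 0.00174
print(round(per_query * 300, 2))  # 0.52 for 300 queries in a month
</code></pre>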
<h3 id="heading-the-comparison-that-matters">The Comparison That Matters</h3>
<p>Here's the same pipeline on a traditional always-on stack:</p>
<table>
<thead>
<tr>
<th>Service</th>
<th>RAGStack-Lambda</th>
<th>Traditional Stack</th>
</tr>
</thead>
<tbody><tr>
<td>Vector Database</td>
<td>S3 Vectors: pennies/mo</td>
<td>Pinecone Starter: <code>70 USD</code>/mo</td>
</tr>
<tr>
<td>Vector Database (alt)</td>
<td>S3 Vectors: pennies/mo</td>
<td>OpenSearch Serverless: about <code>350 USD</code>/mo min</td>
</tr>
<tr>
<td>Compute</td>
<td>Lambda: free tier</td>
<td>EC2 or ECS: <code>50-150 USD</code>/mo</td>
</tr>
<tr>
<td>LLM Inference</td>
<td>Same per-query cost</td>
<td>Same per-query cost</td>
</tr>
<tr>
<td>Total (idle)</td>
<td>about <code>0.50-3.00 USD</code>/mo</td>
<td><code>120-500 USD</code>/mo</td>
</tr>
</tbody></table>
<p>The LLM inference cost per query is roughly the same everywhere – that's Bedrock's on-demand pricing regardless of your architecture. The difference is everything else. Traditional stacks pay a floor cost whether anyone's using them or not. A serverless stack pays for what it uses, and idle costs essentially nothing.</p>
<h3 id="heading-what-about-transcribe">What About Transcribe?</h3>
<p>If you're uploading video or audio, Amazon Transcribe adds a cost for speech-to-text conversion. Standard transcription runs about <code>0.024 USD</code> per minute of audio, so a 10-minute video costs <code>0.24 USD</code> to transcribe. This is a one-time ingestion cost: once a file is transcribed and embedded, the resulting text chunks are queried like any other document.</p>
<h2 id="heading-what-youre-building">What You're Building</h2>
<p>By the end of this tutorial, you'll have a deployed pipeline that does the following:</p>
<ol>
<li><p>You upload a document (PDF, image, video, audio, HTML, CSV, and more; see <a href="https://github.com/HatmanStack/RAGStack-Lambda/blob/main/docs/ARCHITECTURE.md">the full list</a>) through a web dashboard.</p>
</li>
<li><p>The pipeline detects the file type and routes it to the right processor. Scanned PDFs go through OCR via Textract. Video and audio go through Transcribe for speech-to-text, and the transcript is split into 30-second searchable chunks with speaker identification. Images get visual embeddings, plus embeddings of any caption text you provide.</p>
</li>
<li><p>An LLM analyzes each document and automatically extracts structured metadata: topic, document type, date range, people mentioned, and whatever else is relevant.</p>
</li>
<li><p>Everything gets embedded using Amazon Nova Multimodal Embeddings and stored in a Bedrock Knowledge Base backed by S3 Vectors.</p>
</li>
<li><p>You (or your users) ask questions through an AI chat interface. The pipeline retrieves relevant documents, passes them as context to a Bedrock LLM, and returns an answer with collapsible source citations, including timestamp links for video and audio that jump to the exact position.</p>
</li>
</ol>
<p>All of this runs in your AWS account. No external control plane, no third-party services beyond AWS itself.</p>
<h3 id="heading-the-architecture">The Architecture</h3>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/45eca6a5-91b4-4f55-8b1a-ba9f59a3e25d.png" alt="The diagram illustrates a flowchart of a buyer's AWS account, detailing the application plane with processes like S3 to Lambda OCR, supported by services like Cognito Auth. It emphasizes Amazon Bedrock's integration for knowledge and chat." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A few things to note about this architecture:</p>
<p><strong>Step Functions orchestrate everything.</strong> When a document is uploaded, a state machine manages the entire processing flow: detecting the file type, routing to the right processor, waiting for async operations like Transcribe jobs, then triggering embedding and metadata extraction.</p>
<p>This is what makes the pipeline reliable without a running server. If a step fails, it retries. You can see exactly where every document is in the processing pipeline.</p>
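<p>To give a feel for what routing and retries look like in a state machine, here is a small Amazon States Language definition built as a Python dictionary. It is a simplified sketch and not RAGStack's actual state machine: the state names, the <code>$.fileType</code> input field, and the Lambda ARN are all placeholders, and the wait-for-Transcribe step is left out.</p>
<pre><code class="language-python">import json


def task(next_state=None):
    """Build a Lambda task state that retries on any error."""
    state = {
        "Type": "Task",
        # Placeholder ARN; a real definition points at a deployed function.
        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:placeholder",
        "Retry": [{
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


# Hypothetical state names and input field ($.fileType).
definition = {
    "StartAt": "RouteByType",
    "States": {
        "RouteByType": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.fileType", "StringEquals": "media", "Next": "Transcribe"},
                {"Variable": "$.fileType", "StringEquals": "scanned", "Next": "RunOcr"},
            ],
            "Default": "ExtractText",
        },
        "Transcribe": task("EmbedAndIndex"),
        "RunOcr": task("EmbedAndIndex"),
        "ExtractText": task("EmbedAndIndex"),
        "EmbedAndIndex": task(),
    },
}

print(json.dumps(definition, indent=2))
</code></pre>
<p>The <code>Choice</code> state does the routing and the <code>Retry</code> block on each task handles transient failures, which is why no long-running server has to babysit the pipeline.</p>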
<p><strong>Lambda does the compute.</strong> Every processing step is a Lambda function. They spin up when needed, run for a few seconds to a few minutes, and shut down. There's no EC2 instance idling at 3 AM.</p>
<p><strong>S3 Vectors is the vector store.</strong> Your embeddings live in S3's purpose-built vector storage rather than in a dedicated vector database like Pinecone or OpenSearch.</p>
<p>This is what makes the "scale to zero" cost possible: you're paying object storage rates for vector data instead of keeping a database cluster warm. It also means your vectors are sitting in your own S3 bucket, not in a third-party managed service that holds your data on their terms.</p>
<p><strong>Cognito handles auth.</strong> The dashboard and API are protected with Cognito user pools. When you deploy, you get a temporary password via email. The web component uses IAM-based authentication, and server-side integrations use API key auth.</p>
<p><strong>CloudFront serves the UI.</strong> The dashboard is a static React app served through CloudFront, so there's no web server to maintain.</p>
<h3 id="heading-two-ways-to-deploy">Two Ways to Deploy</h3>
<p>You have two deployment paths depending on what you want:</p>
<p><strong>AWS Marketplace (the fast path):</strong> Click deploy, fill in two fields (stack name and email), and wait about 10 minutes. No local tooling required. This is the path we'll walk through first.</p>
<p><strong>From Source (the developer path):</strong> Clone the repo, run <code>publish.py</code>, and deploy via SAM CLI. This is the path for when you want to customize the processing pipeline, modify the UI, or contribute to the project. We'll cover this after the Marketplace walkthrough.</p>
<p>Both paths produce the same stack. The Marketplace version just wraps the CloudFormation template in a one-click deployment.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you deploy, you'll need:</p>
<ul>
<li><p><strong>An AWS account</strong> with permissions to create CloudFormation stacks, Lambda functions, S3 buckets, DynamoDB tables, and Cognito user pools. If you're using an admin account, you're covered.</p>
</li>
<li><p><strong>Bedrock model access:</strong> RAGStack defaults to <code>us-east-1</code> because that's where Nova Multimodal Embeddings is available. Amazon's own models (including Nova) are available by default in Bedrock, with no manual enablement required. Just make sure your IAM role has the necessary <code>bedrock:InvokeModel</code> permissions.</p>
</li>
<li><p><strong>For the Marketplace path:</strong> just a web browser.</p>
</li>
<li><p><strong>For the source path:</strong> Python 3.13+, Node.js 24+, AWS CLI and SAM CLI configured, and Docker (for building Lambda layers).</p>
</li>
</ul>
<h2 id="heading-deploying-from-aws-marketplace">Deploying from AWS Marketplace</h2>
<p>This is the fastest path – no local tools, no CLI, no Docker. You'll launch a CloudFormation stack and have a working pipeline in about 10 minutes.</p>
<h3 id="heading-step-1-launch-the-stack">Step 1: Launch the Stack</h3>
<p>Click the <a href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://ragstack-quicklaunch-public.s3.us-east-1.amazonaws.com/ragstack-template.yaml&amp;stackName=my-docs">direct deploy link</a> to open CloudFormation's "Quick create stack" page with the template pre-loaded.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/d354f6bc-dee8-4f44-9b3b-523ea27564c7.png" alt="Screenshot of AWS CloudFormation Quick Create Stack page in dark mode. Sections for template URL, stack name, parameters, and build options are visible." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-step-2-fill-in-two-fields">Step 2: Fill In Two Fields</h3>
<p>The page has a lot of options, but you only need two:</p>
<ul>
<li><p><strong>Stack name:</strong> Must be lowercase. This becomes the prefix for all your AWS resources (for example, <code>my-docs</code>, <code>team-kb</code>, <code>project-notes</code>). Keep it short.</p>
</li>
<li><p><strong>Admin Email:</strong> Under Required Settings. Cognito will send your temporary login credentials here. Use an email you can access right now.</p>
</li>
</ul>
<p>Everything else – Build Options, Advanced Settings, OCR Backend, model selections – can stay at the defaults. They're there for customization later, but the defaults work out of the box.</p>
<h3 id="heading-step-3-deploy">Step 3: Deploy</h3>
<p>Scroll to the bottom, check the three acknowledgment boxes under "Capabilities and transforms," and click <strong>Create stack</strong>.</p>
<p>Deployment takes roughly 10 minutes. You can watch the progress in the CloudFormation Events tab if you're curious, but there's nothing to do until the stack status flips to <code>CREATE_COMPLETE</code>.</p>
<h3 id="heading-step-4-log-in">Step 4: Log In</h3>
<p>Once the stack finishes, check your email. Cognito sends you the dashboard URL and a temporary password. Log in, set a new password, and you're looking at an empty dashboard ready for documents.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/5ac31b6c-2782-4b66-82a9-0cb962c5dac4.png" alt="A software dashboard interface titled 'Document Pipeline (Demo)' displaying options for uploading, scraping, and searching documents. The screen shows no current documents or scrape jobs, with menu options on the left and a search and filter bar at the center. The overall tone is functional and minimalist." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-deploying-from-source">Deploying from Source</h2>
<p>If you want to customize the pipeline, modify the UI, or contribute to the project, deploy from source instead.</p>
<h3 id="heading-step-1-clone-and-set-up">Step 1: Clone and Set Up</h3>
<pre><code class="language-bash">git clone https://github.com/HatmanStack/RAGStack-Lambda.git
cd RAGStack-Lambda

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
</code></pre>
<h3 id="heading-step-2-deploy">Step 2: Deploy</h3>
<p>The <code>publish.py</code> script handles everything: building the frontend, packaging Lambda functions, and deploying via SAM CLI.</p>
<pre><code class="language-bash">python publish.py \
  --project-name my-docs \
  --admin-email admin@example.com
</code></pre>
<p>This defaults to <code>us-east-1</code> for Nova Multimodal Embeddings. The script will build the React dashboard, build the web component, package all Lambda layers with Docker, and deploy the CloudFormation stack through SAM.</p>
<p>First deploy takes longer (15-20 minutes) because it's building everything from scratch. Subsequent deploys are faster since SAM caches unchanged resources.</p>
<p>If you only want to iterate on the backend and skip UI builds:</p>
<pre><code class="language-bash"># Skip dashboard build (still builds web component)
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui

# Skip ALL UI builds
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui-all
</code></pre>
<p>Once it finishes, you'll get the same Cognito email and dashboard URL as the Marketplace path.</p>
<h2 id="heading-uploading-your-first-documents">Uploading Your First Documents</h2>
<p>The dashboard has tabs for different content types. We'll start with the Documents tab since that's the most common use case.</p>
<h3 id="heading-documents">Documents</h3>
<p>Click the <strong>Documents</strong> tab and upload a file. RAGStack accepts a wide range of formats: PDF, DOCX, XLSX, HTML, CSV, JSON, XML, EML, EPUB, TXT, and Markdown. Drag and drop or use the file picker.</p>
<p>Once uploaded, the document enters the processing pipeline. You'll see the status update in real time:</p>
<ol>
<li><p><strong>UPLOADED:</strong> File received and stored in S3.</p>
</li>
<li><p><strong>PROCESSING:</strong> Step Functions has picked it up and routed it to the right processor. Text-based files (HTML, CSV, Markdown) go through direct extraction. Scanned PDFs and images go through Textract OCR. The LLM then analyzes the content and extracts structured metadata: topic, document type, people mentioned, date ranges, and whatever else is relevant to the content.</p>
</li>
<li><p><strong>INDEXED:</strong> Embeddings generated, vectors stored, document is searchable.</p>
</li>
</ol>
<p>Text documents typically process in 1-5 minutes. OCR-heavy documents (scanned PDFs, images with text) can take 2-15 minutes depending on page count.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/3df05041-2632-41a9-a71c-6d764c503f2a.png" alt="Screenshot of a document upload interface labeled &quot;Document Pipeline (Demo).&quot; Central panel shows a box for drag-and-drop file upload. Sleek, modern design." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-images">Images</h3>
<p>The <strong>Images</strong> tab works differently. Upload a JPG, PNG, GIF, or WebP and you can add a caption. Both the visual content and caption text get embedded using Nova Multimodal Embeddings, so you can search by what's in the image or by your description of it.</p>
<p>This is where multimodal embeddings earn their keep. A traditional text-only RAG pipeline would need you to describe every image manually. Here, the image itself becomes searchable, and since everything stays in your AWS account, you're not sending personal photos or sensitive visual content to an external service to get there.</p>
<h3 id="heading-what-about-video-and-audio">What About Video and Audio?</h3>
<p>Upload video or audio files and RAGStack routes them through Amazon Transcribe for speech-to-text conversion. The transcript gets split into 30-second chunks with speaker identification, then embedded like any other document. When chat results reference a video source, you get timestamp links that jump to the exact position in the recording.</p>
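<p>The 30-second chunking is easy to picture with a small example. The sketch below groups transcript segments into fixed time windows. It assumes a simplified input of <code>(start_seconds, speaker, text)</code> tuples, which is an illustration format and not the raw Transcribe output or RAGStack's internal code.</p>
<pre><code class="language-python">def chunk_transcript(segments, window_seconds=30):
    """Group (start_seconds, speaker, text) tuples into fixed time windows."""
    chunks = {}
    for start, speaker, text in segments:
        index = int(start // window_seconds)
        chunk = chunks.setdefault(index, {
            "start": index * window_seconds,  # offset a timestamp link could use
            "speakers": [],
            "text": [],
        })
        if speaker not in chunk["speakers"]:
            chunk["speakers"].append(speaker)
        chunk["text"].append(text)
    # Return chunks in time order with the text joined into one string.
    return [
        {**chunk, "text": " ".join(chunk["text"])}
        for _, chunk in sorted(chunks.items())
    ]


# Made-up transcript segments.
segments = [
    (2.0, "spk_0", "Welcome to the demo."),
    (14.5, "spk_1", "Thanks for having me."),
    (31.2, "spk_0", "Here is the architecture."),
]

for chunk in chunk_transcript(segments):
    print(chunk["start"], chunk["speakers"], chunk["text"])
</code></pre>
<p>Each chunk keeps its start offset, which is the piece of information a timestamp link needs to jump to the right spot in the recording.</p>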
<h3 id="heading-web-scraping">Web Scraping</h3>
<p>The <strong>Scrape</strong> tab lets you pull websites directly into your knowledge base. Enter a URL and RAGStack crawls the page, extracts the content, and processes it through the same pipeline as uploaded documents: metadata extraction, embedding, and indexing.</p>
<p>This is useful for building a knowledge base from existing web content without manually saving and uploading pages. Documentation sites, blog archives, reference material, anything publicly accessible.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/ac2c6239-a323-4770-80f7-31aa7ff3bdfb.png" alt="Web scraping interface with fields for URL, max pages, and depth. A dropdown for scope selection and a 'Start Scrape' button are visible." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-chatting-with-your-knowledge-base">Chatting With Your Knowledge Base</h2>
<p>This is the payoff. Go to the <strong>Chat</strong> tab, type a question, and RAGStack retrieves relevant documents from your knowledge base, passes them as context to a Bedrock LLM, and returns an answer with source citations.</p>
<p>The citations are collapsible, so click to expand and see which documents informed the answer, with the option to download the source file. For video and audio sources, you get clickable timestamps that jump to the relevant moment.</p>
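<p>The "pass them as context" step is the heart of any RAG system, so here is a generic sketch of it: number each retrieved chunk, put the chunks in the prompt, and ask the model to cite by number. The <code>build_prompt</code> function, the chunk fields, and the prompt wording are all assumptions for illustration. RAGStack's own prompt may look quite different.</p>
<pre><code class="language-python">def build_prompt(question, chunks):
    """Number each retrieved chunk so the model can cite it as [n]."""
    context = "\n\n".join(
        f"[{number}] ({chunk['source']}) {chunk['text']}"
        for number, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them as [n].\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Made-up retrieval results.
retrieved = [
    {"source": "runbook.pdf", "text": "Restart the worker with the deploy script."},
    {"source": "standup.mp4 @ 00:30", "text": "The worker was moved to Lambda."},
]

print(build_prompt("How do I restart the worker?", retrieved))
</code></pre>
<p>Because every chunk carries its source, the numbers in the answer can be mapped back to files, which is how collapsible citations become possible.</p>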
<img src="https://cdn.hashnode.com/uploads/covers/698f5932352111d3f67030a2/760b3cd0-8bb8-493d-97ce-5eb3d0138592.png" alt="Screenshot of a web interface titled &quot;Knowledge Base Chat&quot; with menu options on the left. The central section prompts users to ask document-related questions." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-metadata-filtering">Metadata Filtering</h3>
<p>If you've uploaded enough documents to have meaningful metadata categories, the chat interface lets you filter search results by metadata before querying. RAGStack auto-discovers the metadata structure from your documents, so you don't configure this manually. The filters just appear as your knowledge base grows.</p>
<p>This is useful when you have a large mixed corpus. Instead of hoping the vector search picks the right context from thousands of documents, you can narrow it down: "only search documents about project X" or "only search content from Q4 2024."</p>
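<p>The dashboard builds these filters for you, but it helps to see what a metadata filter looks like at the Bedrock Knowledge Bases level. The sketch below builds the filter structure that the <code>Retrieve</code> API accepts. The helper name and the metadata keys (<code>project</code>, <code>quarter</code>) are hypothetical, and this is not necessarily how RAGStack constructs its queries internally.</p>
<pre><code class="language-python">def metadata_filter(**conditions):
    """Build a Knowledge Bases retrieval filter from key=value pairs."""
    if not conditions:
        raise ValueError("at least one condition is required")
    clauses = [
        {"equals": {"key": key, "value": value}}
        for key, value in conditions.items()
    ]
    # andAll needs two or more clauses, so a single condition is returned bare.
    return clauses[0] if len(clauses) == 1 else {"andAll": clauses}


# Hypothetical metadata keys; yours depend on what the LLM extracted.
retrieval_configuration = {
    "vectorSearchConfiguration": {
        "numberOfResults": 5,
        "filter": metadata_filter(project="project-x", quarter="Q4 2024"),
    }
}

print(retrieval_configuration)
</code></pre>
<p>With boto3, a dictionary like this would go in the <code>retrievalConfiguration</code> argument of the <code>bedrock-agent-runtime</code> client's <code>retrieve</code> call, alongside your knowledge base ID and query text.</p>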
<h2 id="heading-embedding-the-web-component-in-your-app">Embedding the Web Component in Your App</h2>
<p>The dashboard is useful for managing your knowledge base, but the real power is embedding RAGStack's chat in your own application. The web component works with any framework: React, Vue, Angular, Svelte, or plain HTML.</p>
<p>Load the script once from your CloudFront distribution:</p>
<pre><code class="language-html">&lt;script src="https://your-cloudfront-url/ragstack-chat.js"&gt;&lt;/script&gt;
</code></pre>
<p>Then drop the component wherever you want a chat interface:</p>
<pre><code class="language-html">&lt;ragstack-chat
  conversation-id="my-app"
  header-text="Ask About Documents"
&gt;&lt;/ragstack-chat&gt;
</code></pre>
<p>That's it. The component handles authentication (via IAM), manages conversation state, and renders source citations, all self-contained. Your CloudFront URL is in the stack outputs.</p>
<p>For server-side integrations that don't need a UI, the GraphQL API is available with API key authentication. You can find your endpoint and API key in the dashboard under Settings.</p>
<h2 id="heading-using-the-mcp-server">Using the MCP Server</h2>
<p>RAGStack includes an MCP server that connects your knowledge base to AI assistants like Claude Desktop, Cursor, VS Code, and Amazon Q CLI. Instead of switching to the dashboard to search your documents, you ask your assistant directly.</p>
<p>Install it:</p>
<pre><code class="language-bash">pip install ragstack-mcp
</code></pre>
<p>Then add it to your AI assistant's MCP configuration:</p>
<pre><code class="language-json">{
  "ragstack": {
    "command": "uvx",
    "args": ["ragstack-mcp"],
    "env": {
      "RAGSTACK_GRAPHQL_ENDPOINT": "YOUR_ENDPOINT",
      "RAGSTACK_API_KEY": "YOUR_API_KEY"
    }
  }
}
</code></pre>
<p>Your endpoint and API key are in the dashboard under Settings. Once configured, type <code>@ragstack</code> in your assistant's chat to invoke the MCP server, then ask things like "search my knowledge base for authentication docs" and it queries RAGStack directly.</p>
<p>See the <a href="https://github.com/HatmanStack/RAGStack-Lambda/blob/main/src/ragstack-mcp/README.md">MCP Server docs</a> for the full list of available tools and setup details.</p>
<h2 id="heading-what-you-can-build-from-here">What You Can Build From Here</h2>
<p>You've got a deployed RAG pipeline that costs almost nothing to run and handles text, images, video, and audio. A few directions you might take it:</p>
<p><strong>A searchable personal archive.</strong> Every conference talk you've saved, every PDF textbook, every tutorial video that's sitting in a folder somewhere. Upload it all, and now you have one search interface across years of accumulated material. The multimodal embeddings mean your screenshots and diagrams are searchable too, not just the text.</p>
<p>I built <a href="https://github.com/HatmanStack/family-archive-document-ai">a family archive app</a> this way (scanned letters, old photos, home videos), with RAGStack deployed as a nested CloudFormation stack so the whole family can search across decades of memories using the chat widget.</p>
<p><strong>A second brain for a client project.</strong> Scrape the client's existing docs, upload the SOW and meeting notes, drop in the codebase documentation. Now you've got a searchable knowledge base scoped to that engagement. Spin it up at the start, tear it down when the contract ends. At these costs, it's disposable infrastructure.</p>
<p><strong>AI chat over a niche dataset.</strong> Recipe collections, legal filings, research papers, local government meeting minutes, any corpus that's too specialized for general-purpose LLMs to know well. The web component means you can ship it as a standalone tool without building a frontend from scratch.</p>
<p><strong>RAG for your MCP workflow.</strong> If you're already using Claude Desktop or Cursor, the MCP server turns your knowledge base into another tool your assistant can reach for. Upload your team's runbooks and architecture docs, and now <code>@ragstack</code> in your editor gives you instant context without tab-switching.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>The serverless RAG pipeline you just deployed handles document processing, multimodal embeddings, metadata extraction, and AI chat with source citations, all scaling to zero when idle, all running in your AWS account. Your documents, your vectors, your infrastructure. The traditional approach to this stack costs <code>120-500 USD</code>/month in baseline infrastructure. This one costs pocket change.</p>
<p>The full source is at <a href="https://github.com/HatmanStack/RAGStack-Lambda">github.com/HatmanStack/RAGStack-Lambda</a>. File issues, open PRs, or just poke around the architecture. If you want to go deeper on the technical tradeoffs, particularly how filtered vector search behaves on cost-optimized backends like S3 Vectors, that's a story for the next post.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy a MERN Stack Notes App on AWS ]]>
                </title>
                <description>
                    <![CDATA[ Platforms like Vercel, Netlify, and Render simplify deployment by handling infrastructure for you. In this tutorial, we’ll step one layer deeper and work directly with AWS to understand the building blocks behind these platforms. You'll take a small ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-deploy-mern-stack-notes-app-aws/</link>
                <guid isPermaLink="false">696af32341a3a861f59ed367</guid>
                
                    <category>
                        <![CDATA[ deployment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ full stack ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Umair Mirza ]]>
                </dc:creator>
                <pubDate>Sat, 17 Jan 2026 02:25:39 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768616328012/274a3de8-32bb-4b56-9f71-0f0723541c7d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Platforms like Vercel, Netlify, and Render simplify deployment by handling infrastructure for you. In this tutorial, we’ll step one layer deeper and work directly with AWS to understand the building blocks behind these platforms.</p>
<p>You'll take a small React and Express notes app and ship it straight to AWS. We'll use EC2 for the API, RDS Postgres for the database, and S3 (optionally CloudFront) for the frontend. If you're new to AWS, you can turn on the Free Tier first: <a target="_blank" href="https://aws.amazon.com/free">https://aws.amazon.com/free</a>.</p>
<p>If you’ve mostly used one-click deployments before, this guide will help you understand what’s happening behind the scenes. You’ll work directly with the core AWS services involved, focusing only on the pieces that matter so you can see how everything fits together. This will also enable you to have more control over cost, security, and scaling.</p>
<p>If you just want to grab the finished code, it's all in this public repo: <a target="_blank" href="https://github.com/umair-mirza/mern-notes-aws">umair-mirza/mern-notes-aws</a>. You can clone or fork it and follow along without creating a new project from scratch.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-youll-build">What You’ll Build</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-mental-map">Mental Map</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-free-tier-basics">Free Tier Basics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-environment-variables">Environment Variables</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-run-it-locally-first">Step #1 - Run It Locally First</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-push-to-github-so-ec2-can-pull">Step #2 - Push to GitHub (So EC2 Can Pull)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-create-aws-resources-quick-path">Step #3 - Create AWS Resources (Quick Path)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-configure-the-ec2-box">Step #4 - Configure the EC2 Box</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-build-and-upload-the-frontend">Step #5 - Build and Upload the Frontend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-quick-troubleshooting">Step #6 - Quick Troubleshooting</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-secure-and-save">Step #7 - Secure and Save</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-8-verify-end-to-end">Step #8 - Verify End-to-End</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-next-steps">Next Steps</a></p>
</li>
</ul>
<h2 id="heading-what-youll-build"><strong>What You’ll Build</strong></h2>
<p>Before touching any buttons in AWS, it's helpful to know the exact pieces you're trying to build. At the end of this guide, you'll have a classic three-tier web app: a browser-based frontend, a backend API, and a database, all talking to each other over a network.</p>
<ul>
<li><p>API (Express/Node) on EC2</p>
</li>
<li><p>Postgres on RDS (Free Tier eligible)</p>
</li>
<li><p>React/Vite frontend on S3 (CloudFront optional for CDN/HTTPS)</p>
</li>
<li><p>Health check at <code>/api/health</code> and CRUD at <code>/api/notes</code></p>
</li>
</ul>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>You don't need to be a DevOps expert to follow along, but you should be comfortable running basic commands in a terminal and editing some config files. If you've ever used <code>npm install</code> before, then you're in the right place.</p>
<ul>
<li><p>AWS account + AWS CLI configured (<code>aws configure</code>) – see <a target="_blank" href="https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-creating.html">AWS account setup</a> and <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html">AWS CLI install</a>.</p>
</li>
<li><p>Node.js 18+ and npm – get them from <a target="_blank" href="https://nodejs.org/">nodejs.org</a>.</p>
</li>
<li><p>Git + GitHub repo – see <a target="_blank" href="https://docs.github.com/en/get-started">GitHub getting started</a>.</p>
</li>
<li><p>(Optional) Route 53 domain for a clean URL – <a target="_blank" href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/domain-register.html">Route 53 domains</a>.</p>
</li>
</ul>
<h2 id="heading-mental-map"><strong>Mental Map</strong></h2>
<p>AWS throws a lot of jargon at you (VPCs, security groups, subnets). This section is the story version of what happens when someone opens your app in the browser, without any buzzwords. If you can picture this flow, the later AWS screens will feel less scary.</p>
<ul>
<li><p>Browser loads the built React app from S3 (or CloudFront -&gt; S3)</p>
</li>
<li><p>Browser calls the API on EC2 over HTTP/HTTPS</p>
</li>
<li><p>EC2 talks to RDS Postgres on port 5432 inside your VPC</p>
</li>
<li><p>Security groups: allow 80/443 to EC2; allow 5432 only from the EC2 SG to RDS</p>
</li>
</ul>
<h2 id="heading-free-tier-basics"><strong>Free Tier Basics</strong></h2>
<p>AWS can be cheap if you use the free tier, but it can also surprise you with bills if you accidentally overprovision or leave things running. Here are the main knobs that affect cost for this tutorial and what to watch out for.</p>
<ul>
<li><p>EC2: <code>t2.micro</code> or <code>t3.micro</code> ~750 hours/month</p>
</li>
<li><p>RDS: <code>db.t3.micro</code> Postgres/MySQL with ~20 GB storage</p>
</li>
<li><p>S3/CloudFront: Small sites cost pennies - free tier includes some egress</p>
</li>
<li><p>Save money: Stop EC2 when idle. Delete unused buckets/DBs</p>
</li>
</ul>
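<p>To see why one always-on instance fits the allowance but a second one does not, here's a quick back-of-the-envelope check in Python. It assumes the roughly 750 free hours are pooled across all eligible instances in the account, and the function name is purely illustrative:</p>
<pre><code class="lang-python"># Rough check of how many instance-hours a month of uptime consumes.
# Assumption: the ~750 free hours are shared by all eligible instances.
FREE_HOURS_PER_MONTH = 750


def hours_over_allowance(instances, days_in_month=31, hours_per_day=24):
    """Return billable hours beyond the free allowance (0 if within it)."""
    used = instances * days_in_month * hours_per_day
    return max(0, used - FREE_HOURS_PER_MONTH)


print(hours_over_allowance(1))  # one instance, 24/7 for 31 days
print(hours_over_allowance(2))  # two instances, 24/7 for 31 days
</code></pre>
<p>One instance running all month uses 744 hours and stays inside the allowance. Two instances use 1,488 hours, which leaves 738 billable hours under this assumption.</p>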
<h2 id="heading-environment-variables"><strong>Environment Variables</strong></h2>
<p>Environment variables are just configuration values that live outside your code: ports, database URLs, and allowed origins. They keep secrets (like DB passwords) out of your Git repo and let the same code run in different places (local, staging, production) with different settings.</p>
<ul>
<li><p>Backend: <code>PORT</code>, <code>DATABASE_URL</code> (your RDS endpoint), <code>DATABASE_SSL</code> (<code>true</code> on RDS), <code>CORS_ORIGIN</code></p>
</li>
<li><p>Frontend: <code>VITE_API_URL</code> (API base, for example, <code>https://api.example.com/api</code>)</p>
</li>
</ul>
<h2 id="heading-step-1-run-it-locally-first"><strong>Step #1 - Run It Locally First</strong></h2>
<p>Before touching AWS, you want to prove the app actually works on your own machine. This removes a whole category of "Is it AWS or my code?" debugging later. In this step you just install dependencies and run both backend and frontend in dev mode.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> mern-notes-aws

<span class="hljs-comment"># Backend</span>
<span class="hljs-built_in">cd</span> backend
npm install
cp .env.example .env   <span class="hljs-comment"># set DATABASE_URL to RDS (or local Postgres), DATABASE_SSL=true for RDS</span>
npm run dev            <span class="hljs-comment"># API on http://localhost:4000</span>

<span class="hljs-comment"># Frontend (new terminal)</span>
<span class="hljs-built_in">cd</span> frontend
npm install
cp .env.example .env   <span class="hljs-comment"># keep API URL at http://localhost:4000/api for local dev</span>
npm run dev            <span class="hljs-comment"># SPA on http://localhost:5173</span>
</code></pre>
<p>Open <code>http://localhost:5173</code>, add a note, and check that it persists after a refresh. <code>/api/health</code> should return <code>{ "status": "ok" }</code>. If something is broken here, pause and fix it before moving on. AWS will only make debugging harder.</p>
<h2 id="heading-step-2-push-to-github-so-ec2-can-pull"><strong>Step #2 - Push to GitHub (So EC2 Can Pull)</strong></h2>
<p>Your EC2 server in AWS needs a place to pull your code from. Using GitHub is the simplest option: you push your code once, then the EC2 instance clones that repo. You can also reuse this repo later with CI/CD if you decide to automate deployments.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> mern-notes-aws
git init
git add .
git commit -m <span class="hljs-string">"feat: mern notes app"</span>
git branch -M main
git remote add origin https://github.com/&lt;you&gt;/mern-notes-aws.git
git push -u origin main
</code></pre>
<p>If you're following along with my example repo instead of creating your own, you can simply fork <a target="_blank" href="https://github.com/umair-mirza/mern-notes-aws">umair-mirza/mern-notes-aws</a> and use that as your remote.</p>
<p>Before pushing, make sure your <code>.env</code> file is <strong>not committed to GitHub</strong>. Add it to your <code>.gitignore</code> so secrets like database passwords never end up in version control:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">".env"</span> &gt;&gt; .gitignore
</code></pre>
<p>If you’ve already created a <code>.env</code> file locally, double-check it doesn’t appear in <code>git status</code> before committing.</p>
<h2 id="heading-step-3-create-aws-resources-quick-path"><strong>Step #3 - Create AWS Resources (Quick Path)</strong></h2>
<h3 id="heading-rds-postgres-free-tier-template"><strong>RDS (Postgres, Free Tier template)</strong></h3>
<p>RDS (Relational Database Service) is AWS's way of running managed databases for you. Instead of installing Postgres manually on a VM, you click a few options and AWS handles backups, patching, and high availability. For this app we only need a small, free tier–eligible Postgres instance.</p>
<p>For more background, you can skim the official <a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html">Amazon RDS for PostgreSQL docs</a>.</p>
<p>We’ll start by creating the database layer. The settings below are the minimum you need for a small, production-style Postgres setup that stays within the AWS Free Tier while still following basic best practices.</p>
<ul>
<li><p>RDS -&gt; Create database -&gt; Postgres -&gt; Free Tier template.</p>
</li>
<li><p>Class <code>db.t3.micro</code>, storage 20 GB gp2/gp3.</p>
</li>
<li><p>Set master user/pass. You'll need them for <code>DATABASE_URL</code>.</p>
</li>
<li><p>Public access: No.</p>
</li>
<li><p>Security group: allow 5432 only from the EC2 security group.</p>
</li>
<li><p>Enable backups and Require SSL. Download the RDS CA if you want strict cert validation.</p>
</li>
</ul>
<h3 id="heading-s3-bucket-for-the-frontend"><strong>S3 Bucket for the Frontend</strong></h3>
<p>S3 is AWS's "infinite hard drive" for files. A React/Vite app builds down to plain HTML, CSS, and JavaScript files, which are perfect to host from S3. Think of S3 as a very simple web server that just serves static files.</p>
<p>If you want to see more options, check the <a target="_blank" href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteHosting.html">Hosting a static website on Amazon S3 guide</a>.</p>
<p>Now, we’ll create an S3 bucket to host the React frontend. These options configure the bucket for static website hosting while keeping it simple and inexpensive.</p>
<ul>
<li><p>Create bucket <code>mern-notes-aws-frontend-&lt;suffix&gt;</code>.</p>
</li>
<li><p>For simple hosting, enable static website hosting and allow public reads, or keep private and use CloudFront + OAC.</p>
</li>
<li><p>Turn on versioning if you want rollback safety.</p>
</li>
</ul>
<h3 id="heading-ec2-for-the-api"><strong>EC2 for the API</strong></h3>
<p>EC2 is "a computer in the cloud" that you control. You'll install Node.js on it, pull your code, and run <code>server.js</code> so that your backend API is always on. The security group attached to this instance works like a firewall.</p>
<p>If you've never launched an instance before, the <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html">Getting started with Amazon EC2</a> guide walks through the console screens you'll see.</p>
<p>Finally, we’ll provision a small EC2 instance to run the Express API. The configuration below focuses on a free tier–eligible setup that’s secure enough for learning and easy to extend later.</p>
<ul>
<li><p>Launch Amazon Linux 2023, size <code>t3.micro</code>.</p>
</li>
<li><p>Inbound SG: 22 (your IP), 80 (world), 443 if you add HTTPS on the instance/ALB.</p>
</li>
<li><p>Attach this SG as the allowed source to RDS.</p>
</li>
</ul>
<h3 id="heading-optional-cloudfront-route-53"><strong>Optional: CloudFront + Route 53</strong></h3>
<p>CloudFront is AWS's CDN (content delivery network), and Route 53 is their DNS service. You don't strictly need them to get your app working, but they make it faster and nicer: your app loads from edge locations close to users and can live behind a friendly domain like <code>app.example.com</code>.</p>
<p>For more details, see <a target="_blank" href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/GettingStarted.html">Getting started with Amazon CloudFront</a> and the <a target="_blank" href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Welcome.html">Route 53 DNS developer guide</a>.</p>
<ul>
<li><p>Origin: the S3 bucket. Default root <code>index.html</code>. Add OAC if bucket is private.</p>
</li>
<li><p>Request an ACM cert in <code>us-east-1</code>, then create a Route 53 A/AAAA alias to the distribution.</p>
</li>
</ul>
<h2 id="heading-step-4-configure-the-ec2-box"><strong>Step #4 - Configure the EC2 Box</strong></h2>
<p>Once your EC2 instance is running, you treat it like a clean Linux machine. The commands below install the tools your API needs, pull your code from GitHub, configure environment variables, and run the server in a production-safe way.</p>
<p>Install basics:</p>
<pre><code class="lang-bash">sudo dnf update -y
</code></pre>
<p>This command updates all system packages to the latest versions. It's a good first step on any new Linux server.</p>
<pre><code class="lang-bash">sudo dnf install -y git
</code></pre>
<p>Installs Git so the EC2 instance can clone your repository from GitHub.</p>
<pre><code class="lang-bash">curl -fsSL https://rpm.nodesource.com/setup_20.x | sudo bash -
</code></pre>
<p>Adds the official NodeSource repository so you can install a modern version of Node.js (v20). Amazon Linux doesn’t ship with recent Node versions by default.</p>
<pre><code class="lang-bash">sudo dnf install -y nodejs
</code></pre>
<p>Installs Node.js and npm, which are required to run your Express API.</p>
<pre><code class="lang-bash">sudo npm install -g pm2
</code></pre>
<p>Installs PM2, a lightweight process manager that keeps your Node app running in the background and restarts it if it crashes or the server reboots.</p>
<p>Pull code and set environment variables:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/&lt;you&gt;/mern-notes-aws.git
<span class="hljs-built_in">cd</span> mern-notes-aws/backend
npm install

cat &lt;&lt;<span class="hljs-string">'EOF'</span> &gt; .env
PORT=80
DATABASE_URL=postgres://&lt;user&gt;:&lt;password&gt;@&lt;rds-endpoint&gt;:5432/&lt;dbname&gt;
DATABASE_SSL=<span class="hljs-literal">true</span>
CORS_ORIGIN=https://&lt;your-frontend-domain&gt;
EOF
</code></pre>
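<p>If your database password contains characters such as <code>@</code>, <code>:</code>, or <code>/</code>, it typically has to be percent-encoded before it goes into <code>DATABASE_URL</code>, otherwise the URL is split in the wrong place. Here's a minimal sketch using Python's standard library. The helper name and sample values are made up for illustration:</p>
<pre><code class="lang-python">from urllib.parse import quote, unquote, urlsplit


def build_database_url(user, password, host, dbname, port=5432):
    """Build a postgres:// URL, percent-encoding the user and password."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgres://{safe_user}:{safe_password}@{host}:{port}/{dbname}"


# Hypothetical values; substitute your own RDS endpoint and credentials.
url = build_database_url(
    "notes_admin", "p@ss:w/rd", "mydb.example.rds.amazonaws.com", "notes"
)
print(url)

# Parsing the URL back shows the host and password survive intact.
parts = urlsplit(url)
print(parts.hostname, parts.port, unquote(parts.password))
</code></pre>
<p>Paste the printed URL into <code>.env</code> as the value of <code>DATABASE_URL</code>.</p>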
<p>Start the API with PM2:</p>
<pre><code class="lang-bash">pm2 start server.js --name mern-notes-api
pm2 save
pm2 startup systemd -u ec2-user --hp /home/ec2-user
</code></pre>
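<p>When run as a regular user, <code>pm2 startup</code> prints the command that registers the boot service, so copy that printed command and run it with <code>sudo</code>. Then test on the box with <code>curl http://localhost/api/health</code>, and from your laptop with <code>http://&lt;ec2-public-dns&gt;/api/health</code> (make sure the security group allows 80/443). If the API fails to start with an <code>EACCES</code> error, that's because Linux normally requires root privileges to bind ports below 1024. In that case, run the API on a higher port such as 4000 and put a reverse proxy or load balancer on port 80 in front of it.</p>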
<h2 id="heading-step-5-build-and-upload-the-frontend"><strong>Step #5 - Build and Upload the Frontend</strong></h2>
<p>In development, Vite serves your React app from memory, but in production you want a set of static files that any web server (or S3) can host. <code>npm run build</code> creates an optimized <code>dist/</code> folder that you sync to S3 so the browser can load it.</p>
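<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> frontend
<span class="hljs-built_in">export</span> VITE_API_URL=<span class="hljs-string">"https://&lt;ec2-or-api-domain&gt;/api"</span>   <span class="hljs-comment"># PowerShell: $env:VITE_API_URL = "https://&lt;ec2-or-api-domain&gt;/api"</span>
npm run build
</code></pre>
<p>This sets an environment variable called <code>VITE_API_URL</code> for the current shell session so the build can read it. (The Windows <code>setx</code> command only affects terminals opened afterwards, so a build run in the same window would not see the value.) Vite only exposes environment variables to the frontend if they start with the <code>VITE_</code> prefix, and it inlines their values at build time.</p>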
<p>Upload:</p>
<pre><code class="lang-bash">aws s3 sync dist/ s3://mern-notes-aws-frontend-&lt;suffix&gt;/ --delete
</code></pre>
<p>This uploads your compiled frontend (<code>dist/</code>) to S3 and removes old files that no longer exist locally, ensuring the bucket reflects the current version of the app.</p>
<p>Open the S3 website URL or your CloudFront URL.</p>
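<p>To make the effect of <code>--delete</code> in the upload step concrete, the sketch below models a sync as a comparison of two file listings. It's a simplification written for illustration, not the AWS CLI's actual implementation: it compares only names and sizes, and the sample listings are invented.</p>
<pre><code class="lang-python">def plan_sync(local_files, remote_files, delete=False):
    """Return (uploads, deletions) for a one-way local-to-bucket sync.

    Both arguments map a relative path to a file size in bytes.
    """
    # Upload anything that is missing remotely or whose size differs.
    uploads = sorted(
        path for path, size in local_files.items()
        if remote_files.get(path) != size
    )
    # With delete enabled, remove remote files that no longer exist locally.
    deletions = sorted(set(remote_files) - set(local_files)) if delete else []
    return uploads, deletions


# Invented example: a new build changes the hashed bundle name.
local = {"index.html": 512, "assets/app.9f2c.js": 48000}
remote = {"index.html": 498, "assets/app.1a7b.js": 47000}

print(plan_sync(local, remote))
print(plan_sync(local, remote, delete=True))
</code></pre>
<p>Without the flag, the stale <code>assets/app.1a7b.js</code> bundle would stay in the bucket forever. With it, the old bundle is scheduled for removal.</p>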
<h2 id="heading-step-6-quick-troubleshooting"><strong>Step #6 - Quick Troubleshooting</strong></h2>
<p>If something doesn't work the first time, that's normal, especially with networking and AWS permissions. This section gives you a few quick places to look before you start randomly changing settings in the console.</p>
<ul>
<li><p>API 500s: Check <code>pm2 logs mern-notes-api</code>. The cause is often a bad <code>DATABASE_URL</code> or SSL flag.</p>
</li>
<li><p>DB connect issues: The RDS security group must allow inbound 5432 from the EC2 security group, and <code>DATABASE_URL</code> must point at the RDS endpoint.</p>
</li>
<li><p>CORS errors: <code>CORS_ORIGIN</code> must match your frontend origin exactly.</p>
</li>
<li><p>403 from S3: If you’re using static website hosting, allow public reads. With CloudFront, keep bucket private and use OAC.</p>
</li>
<li><p>Blank page: Confirm that you’ve uploaded <code>dist/</code> to the right bucket.</p>
</li>
</ul>
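<p>For the CORS item above, remember that an origin is just scheme, host, and port, so a trailing slash or path in <code>CORS_ORIGIN</code> is a common cause of mismatches. The following sketch (the helper name is hypothetical) derives the origin string a browser would send for a given page URL so you can compare it with your setting:</p>
<pre><code class="lang-python">from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return scheme://host[:port], the form browsers send in the Origin header."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""
    port = parts.port
    # Default ports are omitted from a serialized origin.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"


page_url = "https://app.example.com/notes/"
cors_origin = "https://app.example.com/"  # trailing slash: will not match

print(origin_of(page_url))
print(origin_of(page_url) == cors_origin)
</code></pre>
<p>If the comparison prints <code>False</code>, update <code>CORS_ORIGIN</code> to the exact string printed on the first line.</p>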
<h2 id="heading-step-7-secure-and-save"><strong>Step #7 - Secure and Save</strong></h2>
<p>Once everything works, you don't want to accidentally expose your database to the internet or burn through free tier hours. These are simple, beginner-friendly hardening steps that make your setup safer and cheaper without turning you into a full-time security engineer.</p>
<ul>
<li><p>Turn off SSH after setup or switch to SSM Session Manager.</p>
</li>
<li><p>Use HTTPS (CloudFront + ACM or ALB + ACM).</p>
</li>
<li><p>Keep RDS private and use SSM port forwarding if needed.</p>
</li>
<li><p>Ship PM2 logs with CloudWatch Agent and add alarms for CPU/status checks.</p>
</li>
<li><p>Snapshot RDS daily and stop EC2 when idle to save hours.</p>
</li>
</ul>
<h2 id="heading-step-8-verify-end-to-end"><strong>Step #8 - Verify End-to-End</strong></h2>
<p>Before you celebrate, run through the app like a real user: open it in the browser, create notes, refresh, and make sure everything behaves as expected. This confirms your frontend, API, and database are all wired together correctly.</p>
<ul>
<li><p>Load the frontend (S3 or CloudFront).</p>
</li>
<li><p>Create and delete notes. They should persist in RDS.</p>
</li>
<li><p>Hit <code>/api/health</code> for a quick liveness check.</p>
</li>
</ul>
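<p>The liveness check is easy to script. The sketch below polls <code>/api/health</code> with Python's standard library. It assumes the endpoint returns JSON with a <code>status</code> field equal to <code>ok</code>, as described in Step #1, and the base URL is a placeholder:</p>
<pre><code class="lang-python">import json
from urllib.request import urlopen


def is_healthy(base_url, timeout=5):
    """Return True if GET {base_url}/api/health reports status 'ok'."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/api/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Covers connection errors, HTTP errors, timeouts, and invalid JSON.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"


if __name__ == "__main__":
    # Placeholder URL; replace with your EC2 public DNS or API domain.
    print(is_healthy("http://localhost:4000"))
</code></pre>
<p>You can run this from your laptop after each deploy, or later from a scheduled job, to catch a dead API before your users do.</p>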
<h2 id="heading-next-steps"><strong>Next Steps</strong></h2>
<p>Once you're comfortable with this manual setup, you can start layering on more advanced tools. The ideas are the same (frontend, API, and database), but you get more automation, safety, and scalability.</p>
<ul>
<li><p>Add Prisma + migrations for stronger schemas.</p>
</li>
<li><p>Add auth (Cognito/Auth0) and per-user notes.</p>
</li>
<li><p>Containerize and run on ECS/Fargate or add an ALB in front of EC2.</p>
</li>
<li><p>Use Terraform/CDK to recreate this stack with one command.</p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Manage Blue-Green Deployments on AWS ECS with Database Migrations: Complete Implementation Guide ]]>
                </title>
                <description>
                    <![CDATA[ Blue-green deployments are celebrated for enabling zero-downtime releases and instant rollbacks. You deploy your new version (green) alongside the current one (blue), switch traffic over, and if something goes wrong, you switch back. Simple, right? N... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-manage-blue-green-deployments-on-aws-ecs-with-database-migrations/</link>
                <guid isPermaLink="false">69693109596ef11a775126fb</guid>
                
                    <category>
                        <![CDATA[ deployment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Blue/Green deployment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Destiny Erhabor ]]>
                </dc:creator>
                <pubDate>Thu, 15 Jan 2026 18:25:13 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768497873258/be1ce2a3-c95f-488e-913a-a772007a0d2a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Blue-green deployments are celebrated for enabling zero-downtime releases and instant rollbacks. You deploy your new version (green) alongside the current one (blue), switch traffic over, and if something goes wrong, you switch back. Simple, right?</p>
<p>Not quite. While blue-green deployments work beautifully for stateless applications, they become significantly more complex when you introduce databases and stateful services into the equation. The moment your blue and green environments need to share a database, you're facing a fundamental challenge: how do you evolve your schema and data without breaking either version?</p>
<p>In this article, we'll tackle the real-world complexities of implementing blue-green deployments on Amazon ECS when your application depends on shared state. You'll learn practical strategies for handling database migrations, managing sessions, and maintaining data consistency across application versions.</p>
<p>💡 <strong>Complete Working Example</strong>: All code examples in this article are available in the <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs">bluegreen-deployment-ecs repository on GitHub</a>. You can clone it and deploy the entire infrastructure to your AWS account.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-problem-with-state-in-blue-green-deployments">The Problem with State in Blue-Green Deployments</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-database-migration-strategies-for-blue-green">Database Migration Strategies for Blue-Green</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-handling-stateful-services-in-ecs">Handling Stateful Services in ECS</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-complete-implementation-end-to-end-example">Complete Implementation: End-to-End Example</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-rollback-strategies">Rollback Strategies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-monitoring-during-deployments">Monitoring During Deployments</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices">Best Practices</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-not-to-use-blue-green">When NOT to Use Blue-Green</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-alternative-deployment-strategies">Alternative Deployment Strategies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-cleanup">Cleanup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-further-resources">Further Resources</a></p>
</li>
</ul>
<h2 id="heading-the-problem-with-state-in-blue-green-deployments">The Problem with State in Blue-Green Deployments</h2>
<p>The elegance of blue-green deployments starts to crumble when you consider databases. Here's why: your blue environment runs application version 1, your green environment runs version 2, but they both connect to the same RDS instance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768056130585/109ceff8-4500-45d7-aaa0-5e259b4a7b11.png" alt="Figure 1: The blue-green dilemma - both environments share the same database but expect different schemas" class="image--center mx-auto" width="1579" height="1131" loading="lazy"></p>
<p>Consider this scenario: you're adding a new feature that requires a new database column. Version 2 of your application expects this column to exist. You deploy green, run your migration to add the column, and switch traffic.</p>
<p>Everything works great until you need to roll back. Now version 1 is receiving traffic, but it knows nothing about that new column, so its writes can fail (for example, if the column is <code>NOT NULL</code> with no default). Worse, if your migration removed or renamed a column that version 1 depends on, your rollback will fail catastrophically.</p>
<p>Here are the specific challenges you'll face:</p>
<ul>
<li><p><strong>Schema versioning conflicts</strong>: Your blue environment expects schema version N, while green expects version N+1. Any breaking schema change will cause one environment to fail.</p>
</li>
<li><p><strong>Data inconsistencies</strong>: If version 2 writes data in a new format that version 1 can't read, switching back to blue will result in errors or data corruption.</p>
</li>
<li><p><strong>Irreversible migrations</strong>: Some database changes are inherently destructive. Dropping a column, changing data types, or restructuring tables can't be easily undone.</p>
</li>
<li><p><strong>Failed rollbacks</strong>: The promise of instant rollback becomes hollow when your database has evolved beyond what the blue environment can handle.</p>
</li>
</ul>
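<p>A small, self-contained simulation makes the failed-rollback case concrete. It uses an in-memory SQLite database as a stand-in for the shared RDS instance, and the table, column, and function names are invented for this illustration:</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute("INSERT INTO users (user_name) VALUES ('ada')")


def v1_get_name(user_id):
    """Query issued by the blue environment (version 1)."""
    row = conn.execute(
        "SELECT user_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


print(v1_get_name(1))  # works before the migration

# Breaking migration shipped with version 2: the table is rebuilt with the
# column renamed, so the name version 1 depends on no longer exists.
conn.executescript(
    """
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, username TEXT);
    INSERT INTO users_new (id, username) SELECT id, user_name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    """
)

try:
    v1_get_name(1)  # traffic rolled back to blue
except sqlite3.OperationalError as exc:
    print("rollback broke version 1:", exc)
</code></pre>
<p>The data is still there, but the blue code can no longer reach it. Switching traffic back doesn't undo the schema change.</p>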
<p>Let's explore the strategies that solve these problems.</p>
<h2 id="heading-database-migration-strategies-for-blue-green">Database Migration Strategies for Blue-Green</h2>
<h3 id="heading-strategy-1-the-expand-contract-pattern-recommended">Strategy 1: The Expand-Contract Pattern (Recommended)</h3>
<p>The expand-contract pattern is the most practical approach for blue-green deployments with shared databases. It works by breaking schema changes into three phases, ensuring backwards compatibility throughout.</p>
<h4 id="heading-phase-1-expand">Phase 1: Expand</h4>
<p>In this phase, you add new schema elements while keeping old ones intact. If you're renaming a column, add the new column without removing the old one. If you're changing table structure, create new tables alongside existing ones.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Example: Renaming 'user_name' to 'username'</span>
<span class="hljs-comment">-- Phase 1: Expand - Add new column</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> username <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>);

<span class="hljs-comment">-- Populate new column from old column</span>
<span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">SET</span> username = user_name <span class="hljs-keyword">WHERE</span> username <span class="hljs-keyword">IS</span> <span class="hljs-literal">NULL</span>;
</code></pre>
<p>At this point, your database supports both the old schema (used by blue) and the new schema (used by green). Your application code needs to handle both as well.</p>
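<p>To see why the expand step is safe, here's a minimal sketch that replays it against an in-memory SQLite database. SQLite is only a stand-in for the shared database (an assumption for illustration), and the sample rows are made up:</p>
<pre><code class="lang-python">import sqlite3

# In-memory SQLite stands in for the shared production database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute("INSERT INTO users (user_name) VALUES ('alice'), ('bob')")

# Expand: add the new column and backfill it, leaving user_name untouched.
conn.execute("ALTER TABLE users ADD COLUMN username TEXT")
conn.execute("UPDATE users SET username = user_name WHERE username IS NULL")

# A blue-style query (old column) and a green-style query (new column) both work.
blue_rows = conn.execute("SELECT user_name FROM users ORDER BY id").fetchall()
green_rows = conn.execute("SELECT username FROM users ORDER BY id").fetchall()
print(blue_rows == green_rows)
</code></pre>
<p>Both queries return the same rows, which is the property the expand phase is meant to guarantee: neither version has to know the other exists.</p>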
<h4 id="heading-phase-2-deploy">Phase 2: Deploy</h4>
<p>Now, deploy your green environment with code that uses the new schema. But this code should still write to both old and new columns to maintain compatibility.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Version 2 code - writes to both columns</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update_user</span>(<span class="hljs-params">user_id, username</span>):</span>
    db.execute(
        <span class="hljs-string">"UPDATE users SET username = %s, user_name = %s WHERE id = %s"</span>,
        (username, username, user_id)
    )
</code></pre>
<p>Traffic shifts from blue to green. Both environments work because the database supports both schemas. If you need to roll back, blue still functions perfectly because the old columns are intact.</p>
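<p>Dual writes cover what green writes, but blue may still insert rows while both environments are live, and those rows won't have the new column filled in. One way to cope is a read path that prefers the new column and falls back to the old one. This is a sketch under that assumption: <code>resolve_username</code> is a hypothetical helper, and rows are modeled as plain dictionaries.</p>
<pre><code class="lang-python">def resolve_username(row):
    """Prefer the new column; fall back to the old one when it is empty."""
    new_value = row.get("username")
    if new_value is not None:
        return new_value
    # Rows inserted by blue after the backfill only have user_name set.
    return row.get("user_name")


written_by_green = {"id": 1, "user_name": "alice", "username": "alice"}
written_by_blue = {"id": 2, "user_name": "bob", "username": None}

print(resolve_username(written_by_green))
print(resolve_username(written_by_blue))
</code></pre>
<p>Note that this only helps with rows where the new column is still empty. If blue updates an existing row, the new column holds a stale value rather than <code>NULL</code>, so you'd need to re-run the backfill or sync the columns in the database.</p>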
<h4 id="heading-phase-3-contract">Phase 3: Contract</h4>
<p>After you're confident green is stable and you've decommissioned blue, remove the old schema elements in a separate deployment.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Phase 3: Contract - Remove old column</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">DROP</span> <span class="hljs-keyword">COLUMN</span> user_name;
</code></pre>
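<p>Before you run this migration, deploy application code that no longer reads or writes the old column. Otherwise, every statement that still references <code>user_name</code> will fail once the column is gone. That code is version 3, deployed as a standard release.</p>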
<p><strong>When to use</strong>: This should be your default approach for most schema changes including adding/removing columns, renaming fields, changing constraints, and restructuring tables.</p>
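<p>Because the contract step can't be undone, it's worth checking that the old and new columns agree before you drop anything. Here's a minimal sketch of that check, again using in-memory SQLite as a stand-in. The <code>count_divergent_rows</code> helper is hypothetical.</p>
<pre><code class="lang-python">import sqlite3


def count_divergent_rows(conn):
    """Return how many rows hold different values in the old and new columns."""
    # SQLite's IS NOT is a NULL-safe comparison; PostgreSQL uses IS DISTINCT FROM.
    query = "SELECT COUNT(*) FROM users WHERE username IS NOT user_name"
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, username TEXT)"
)
conn.execute("INSERT INTO users (user_name, username) VALUES ('alice', 'alice')")
conn.execute("INSERT INTO users (user_name, username) VALUES ('bob', NULL)")

divergent = count_divergent_rows(conn)
print(f"{divergent} row(s) would lose data if user_name were dropped now")
</code></pre>
<p>Any count other than zero means the backfill or the dual writes missed something, and the drop should wait.</p>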
<h3 id="heading-strategy-2-parallel-schemas-or-databases">Strategy 2: Parallel Schemas or Databases</h3>
<p>For major breaking changes where backwards compatibility is impractical, you might maintain entirely separate database versions. Version 1 connects to database A, version 2 connects to database B. This approach requires data synchronization between databases. AWS Database Migration Service (DMS) can replicate data in near real-time, or you can build custom replication logic using change data capture.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Configuration for version-specific database connections</span>
DATABASE_CONFIG = {
    <span class="hljs-string">'v1'</span>: {
        <span class="hljs-string">'host'</span>: <span class="hljs-string">'blue-db.cluster-xxxxx.us-east-1.rds.amazonaws.com'</span>,
        <span class="hljs-string">'database'</span>: <span class="hljs-string">'app_v1'</span>
    },
    <span class="hljs-string">'v2'</span>: {
        <span class="hljs-string">'host'</span>: <span class="hljs-string">'green-db.cluster-yyyyy.us-east-1.rds.amazonaws.com'</span>,
        <span class="hljs-string">'database'</span>: <span class="hljs-string">'app_v2'</span>
    }
}
</code></pre>
<p>During the transition period, you run DMS to keep both databases synchronized, with the understanding that writes go to the active version's database.</p>
<p>The challenge is that you're now managing data synchronization, dealing with replication lag, and paying for two databases. Eventually, you need to consolidate back to one database, which requires another migration. This is expensive and complex, which is why it's the "nuclear option."</p>
<p><strong>When to use</strong>: Only for major architectural changes, complete data model redesigns, or when migrating between database types (for example, MySQL to PostgreSQL). If expand-contract can possibly work, use that instead.</p>
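<p>If you do go down this road, each environment needs a way to pick its own database. Here's one possible sketch: it reads a hypothetical <code>APP_DB_VERSION</code> environment variable and looks up the matching entry. The host names are placeholders, not real endpoints.</p>
<pre><code class="lang-python">import os

# Placeholder hosts; the real endpoints come from your own infrastructure.
DATABASE_CONFIG = {
    "v1": {"host": "blue-db.example.internal", "database": "app_v1"},
    "v2": {"host": "green-db.example.internal", "database": "app_v2"},
}


def get_database_config(environ=None):
    """Pick connection settings from the hypothetical APP_DB_VERSION variable."""
    environ = os.environ if environ is None else environ
    version = environ.get("APP_DB_VERSION", "v1")
    if version not in DATABASE_CONFIG:
        raise ValueError(f"Unknown database version: {version}")
    return DATABASE_CONFIG[version]


print(get_database_config({"APP_DB_VERSION": "v2"}))
</code></pre>
<p>Failing loudly on an unknown version is deliberate: silently falling back to the wrong database is much harder to notice than a task that refuses to start.</p>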
<h3 id="heading-strategy-3-feature-flags-for-gradual-rollout">Strategy 3: Feature Flags for Gradual Rollout</h3>
<p>Feature flags allow you to decouple deployment from release. Both blue and green run the same codebase, but features are toggled on or off via configuration. This shifts the problem from schema compatibility to code-level compatibility.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_user</span>(<span class="hljs-params">user_data</span>):</span>
    config = get_feature_config()
    <span class="hljs-keyword">if</span> config[<span class="hljs-string">'use_new_user_schema'</span>]:
        <span class="hljs-keyword">return</span> create_user_v2(user_data)
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> create_user_v1(user_data)
</code></pre>
<p>Instead of having two separate deployments (blue and green), you have ONE deployment with conditional logic. The "switch" from old to new behavior happens via configuration change, not infrastructure change. This is technically not pure blue-green, but it's a powerful hybrid approach.</p>
<h4 id="heading-how-it-works">How it works</h4>
<p>Your application checks AWS AppConfig (or similar service) for feature flags before executing code paths. When a flag is off, it uses the old schema/logic. When on, it uses the new schema/logic. You can even enable features for a percentage of users (5% get new behavior, 95% get old behavior) for gradual rollout.</p>
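<p>Percentage rollouts usually rely on assigning each user to a stable bucket, so the same user always gets the same behavior. Your flag service may handle this for you. The sketch below only illustrates the idea, and <code>is_enabled_for</code> is a hypothetical helper rather than an AppConfig API.</p>
<pre><code class="lang-python">import hashlib


def is_enabled_for(user_id, rollout_percent, flag_name="use_new_user_schema"):
    """Deterministically place a user in one of 100 buckets for a flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    # Buckets 0..rollout_percent-1 get the new behavior.
    return bucket in range(rollout_percent)


# The same user gets the same answer on every call.
print(is_enabled_for("user-42", 5) == is_enabled_for("user-42", 5))
</code></pre>
<p>Hashing the flag name together with the user ID means a user who lands in the first 5% for one flag isn't automatically in the first 5% for every other flag.</p>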
<p>The tradeoff is that your codebase temporarily contains both old and new logic with conditional branches everywhere. This increases complexity and requires disciplined cleanup after the feature is fully released. However, you gain fine-grained control and can toggle features on/off instantly without deploying new infrastructure.</p>
<p><strong>When to use:</strong> For large features with uncertain stability, gradual rollouts to monitor impact, or when you want instant rollback capability without touching infrastructure. Also useful when combined with expand-contract for extra safety.</p>
<h2 id="heading-handling-stateful-services-in-ecs">Handling Stateful Services in ECS</h2>
<p>Beyond databases, several other stateful components require careful consideration during blue-green deployments.</p>
<h3 id="heading-session-management">Session Management</h3>
<p>It’s a good idea to store sessions in ElastiCache or DynamoDB rather than application memory:</p>
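<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3

<span class="hljs-comment"># These keys are Flask-Session settings; its DynamoDB backend expects a boto3 resource, not a client</span>
app.config[<span class="hljs-string">'SESSION_TYPE'</span>] = <span class="hljs-string">'dynamodb'</span>
app.config[<span class="hljs-string">'SESSION_DYNAMODB'</span>] = boto3.resource(<span class="hljs-string">'dynamodb'</span>)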
</code></pre>
<h3 id="heading-shared-resources">Shared Resources</h3>
<p>Beyond databases and sessions, your application likely depends on other stateful components that need coordination during blue-green deployments:</p>
<h4 id="heading-1-s3-buckets">S3 buckets</h4>
<p>If your application stores files or data in S3, changes to object metadata or file formats can cause compatibility issues between versions. Enabling S3 versioning helps by preserving earlier versions of each object, so an overwrite by version 2 doesn't destroy what version 1 wrote. To keep both formats readable during the transition, pair it with an explicit format version.</p>
<p>For example, if version 2 writes JSON files with a new structure, version 1 should still be able to read the old format. You can include a version prefix in object keys (like <code>v1/user-data.json</code> and <code>v2/user-data.json</code>) or embed version metadata in the objects themselves.</p>
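<p>The key-prefix approach can be sketched in a few lines. Both helpers below are hypothetical, and treating keys without a prefix as version 1 is an assumption about how your existing objects are named:</p>
<pre><code class="lang-python">def versioned_key(schema_version, name):
    """Build an object key with a version prefix, such as v2/user-data.json."""
    return f"v{schema_version}/{name}"


def split_versioned_key(key):
    """Return (schema_version, name); keys without a prefix count as version 1."""
    prefix, separator, rest = key.partition("/")
    if separator and prefix.startswith("v") and prefix[1:].isdigit():
        return int(prefix[1:]), rest
    return 1, key


print(versioned_key(2, "user-data.json"))
print(split_versioned_key("v2/user-data.json"))
print(split_versioned_key("user-data.json"))
</code></pre>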
<h4 id="heading-message-queues-sqssns">Message queues (SQS/SNS)</h4>
<p>Messages sent by one version must be readable by the other during the transition. You can use versioned message schemas with a <code>schema_version</code> field in your message payload. Both blue and green should be able to parse messages from either version, even if they only produce messages in their preferred format. Consider using a schema registry or validation library to ensure compatibility.</p>
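<p>Here's a minimal sketch of a consumer that dispatches on <code>schema_version</code>. The payload shapes and parser names are hypothetical, and it assumes messages without the field were produced by version 1:</p>
<pre><code class="lang-python">import json


def parse_v1(payload):
    # Hypothetical v1 payload: a single free-text address.
    return {"customer_id": payload["customer_id"], "address": payload["address"]}


def parse_v2(payload):
    # Hypothetical v2 payload: structured fields, flattened to the common shape.
    parts = [
        payload["street_address"],
        payload["city"],
        payload["state"],
        payload["zip_code"],
    ]
    return {"customer_id": payload["customer_id"], "address": ", ".join(parts)}


PARSERS = {1: parse_v1, 2: parse_v2}


def handle_message(body):
    """Dispatch on schema_version; messages without the field count as v1."""
    payload = json.loads(body)
    version = payload.get("schema_version", 1)
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"Unsupported schema_version: {version}")
    return parser(payload)


old_message = json.dumps({"customer_id": 7, "address": "123 Main St, New York, NY, 10001"})
print(handle_message(old_message))
</code></pre>
<p>Rejecting unknown versions explicitly gives you a clear error (and a candidate for a dead-letter queue) instead of a confusing <code>KeyError</code> deep inside your handler.</p>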
<h4 id="heading-cache-layers-elasticacheredis">Cache layers (ElastiCache/Redis)</h4>
<p>Cached data structure changes can cause deserialization errors when switching between versions. Try versioning your cache keys by including the schema version in each key, as in the snippet below. This ensures blue and green maintain separate cache namespaces, preventing cross-contamination. When you fully migrate to green, you can flush the old cache keys or let them expire naturally.</p>
<pre><code class="lang-python">CACHE_VERSION = <span class="hljs-string">'v2'</span>
cache_key = <span class="hljs-string">f"user:<span class="hljs-subst">{CACHE_VERSION}</span>:<span class="hljs-subst">{user_id}</span>"</span>
</code></pre>
<h2 id="heading-implementation-end-to-end-example">Implementation: End-to-End Example</h2>
<p>Let's walk through a complete blue-green deployment with ECS, handling a database schema change using the <strong>expand-contract pattern</strong>. We'll migrate from a single <code>address</code> text field to structured <code>street_address</code>, <code>city</code>, <code>state</code>, and <code>zip_code</code> fields.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768052075044/fdb732dd-cf3d-473f-a22c-f5ab98870625.png" alt="Figure 2: The three phases of expand-contract migration ensuring continuous compatibility" class="image--center mx-auto" width="3444" height="624" loading="lazy"></p>
<p><strong>Here’s the scenario:</strong> You're running an e-commerce application on ECS. The current version (blue) stores customer addresses in a single <code>address</code> text field. Version 2 (green) splits this into structured fields: <code>street_address</code>, <code>city</code>, <code>state</code>, and <code>zip_code</code>.</p>
<h3 id="heading-architecture-setup">Architecture Setup</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768087707691/ff19ce97-b745-4aa8-8b39-4d835fd781cd.png" alt="Figure 3: Complete AWS architecture for blue-green ECS deployment with shared RDS database" class="image--center mx-auto" width="2479" height="3679" loading="lazy"></p>
<p>Your infrastructure includes:</p>
<ul>
<li><p>ECS cluster running Fargate tasks</p>
</li>
<li><p>Application Load Balancer with two target groups (blue and green)</p>
</li>
<li><p>RDS PostgreSQL database (shared between environments)</p>
</li>
<li><p>CodeDeploy for managing traffic shifts</p>
</li>
<li><p>Parameter Store for database connection strings</p>
</li>
</ul>
<p>💡 <strong>Implementation Note</strong>: The complete Terraform code for this architecture is available in the <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/tree/main/terraform">companion GitHub repository</a>.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>Before starting, make sure that you have the following tools installed and your AWS credentials properly configured:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Required tools</span>
aws --version      <span class="hljs-comment"># AWS CLI</span>
terraform --version <span class="hljs-comment"># Terraform &gt;= 1.0</span>
docker --version   <span class="hljs-comment"># Docker</span>
psql --version     <span class="hljs-comment"># PostgreSQL client</span>

<span class="hljs-comment"># Configure AWS credentials</span>
aws configure
aws sts get-caller-identity  <span class="hljs-comment"># Verify your identity</span>
</code></pre>
<h3 id="heading-step-1-deploy-infrastructure-and-blue-environment">Step 1: Deploy Infrastructure and Blue Environment</h3>
<p>We’ll start by setting up the entire AWS infrastructure from scratch using Terraform, then deploying the initial version of our application (blue environment).</p>
<p>First, clone the repository and set up your environment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Clone the repository</span>
git <span class="hljs-built_in">clone</span> https://github.com/Caesarsage/bluegreen-deployment-ecs.git
<span class="hljs-built_in">cd</span> bluegreen-deployment-ecs

<span class="hljs-comment"># Create terraform variables</span>
<span class="hljs-built_in">cd</span> terraform
cat &gt; terraform.tfvars &lt;&lt;EOF
aws_region         = <span class="hljs-string">"us-east-1"</span>
project_name       = <span class="hljs-string">"ecommerce-bluegreen"</span>
environment        = <span class="hljs-string">"production"</span>
vpc_cidr           = <span class="hljs-string">"10.0.0.0/16"</span>

<span class="hljs-comment"># Database credentials (CHANGE THESE!)</span>
db_username = <span class="hljs-string">"dbadmin"</span>
db_password = <span class="hljs-string">"ChangeThisPassword123!"</span>

<span class="hljs-comment"># Container configuration</span>
container_image = <span class="hljs-string">"PLACEHOLDER"</span>  <span class="hljs-comment"># Will update after building image</span>
container_port  = 8080

<span class="hljs-comment"># Scaling configuration</span>
desired_count = 2
cpu           = <span class="hljs-string">"256"</span>
memory        = <span class="hljs-string">"512"</span>

<span class="hljs-comment"># Notifications</span>
notification_email = <span class="hljs-string">"your-email@example.com"</span>
EOF
</code></pre>
<p><strong>Security Note:</strong> Never commit <code>terraform.tfvars</code> to Git. It's already in <code>.gitignore</code>.</p>
<p>Next, initialize Terraform and create the ECR repository:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Initialize Terraform</span>
terraform init
terraform validate

<span class="hljs-comment"># Create ECR repository</span>
terraform apply -target=aws_ecr_repository.app

<span class="hljs-comment"># Get ECR repository URL</span>
<span class="hljs-built_in">export</span> ECR_REPO=$(terraform output -raw ecr_repository_url)
<span class="hljs-built_in">echo</span> <span class="hljs-string">"ECR Repository: <span class="hljs-variable">$ECR_REPO</span>"</span>
</code></pre>
<p>We create the ECR repository first because we need somewhere to push our Docker image. Then we'll build the image, push it, and finally deploy the rest of the infrastructure that depends on that image existing.</p>
<p>Build and push the initial application like this:</p>
<pre><code class="lang-bash">
<span class="hljs-built_in">cd</span> ..  <span class="hljs-comment"># Back to project root</span>

<span class="hljs-comment"># Set variables</span>
<span class="hljs-built_in">export</span> AWS_REGION=us-east-1
<span class="hljs-built_in">export</span> AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
<span class="hljs-built_in">export</span> ECR_REPOSITORY=ecommerce-bluegreen
<span class="hljs-built_in">export</span> IMAGE_TAG=v1.0.0

<span class="hljs-comment"># Login to ECR</span>
aws ecr get-login-password --region <span class="hljs-variable">$AWS_REGION</span> | \
    docker login --username AWS --password-stdin <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com

<span class="hljs-comment"># Build the image</span>
docker build --platform linux/amd64 -t <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> -f docker/Dockerfile .

<span class="hljs-comment"># Tag and push to ECR</span>
docker tag <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> \
    <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>

docker push <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>

<span class="hljs-comment"># Replace the PLACEHOLDER image in terraform.tfvars (appending a second container_image line would make Terraform reject the file)</span>
sed -i.bak <span class="hljs-string">"s|^container_image .*|container_image = \"<span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>\"|"</span> terraform/terraform.tfvars
rm terraform/terraform.tfvars.bak  <span class="hljs-comment"># The backup copy also contains your database password</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768137809806/820d7005-b924-4224-9b58-de5701466c1f.png" alt="Figure 4: ECR Private repository for Docker image" class="image--center mx-auto" width="2442" height="632" loading="lazy"></p>
<p>The <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/tree/main/app">application code</a> is a Flask application that handles both old and new schema formats based on the <code>APP_VERSION</code> environment variable.</p>
<p>Now deploy the complete infrastructure:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> terraform
terraform apply  <span class="hljs-comment"># Takes ~15-20 minutes</span>

<span class="hljs-comment"># Get outputs</span>
<span class="hljs-built_in">export</span> ALB_URL=$(terraform output -raw alb_url)
<span class="hljs-built_in">export</span> TEST_URL=$(terraform output -raw test_url)
<span class="hljs-built_in">export</span> DB_ENDPOINT=$(terraform output -raw db_endpoint)
<span class="hljs-built_in">export</span> ECR_URL=$(terraform output -raw ecr_repository_url)
<span class="hljs-built_in">export</span> BASTION_IP=$(terraform output -raw bastion_public_ip)

<span class="hljs-built_in">echo</span> <span class="hljs-string">"Application URL: <span class="hljs-variable">$ALB_URL</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Test URL: <span class="hljs-variable">$TEST_URL</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Database Endpoint: <span class="hljs-variable">$DB_ENDPOINT</span>"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768141033921/07c2e9b9-c652-4cec-91ae-2de956d8655d.png" alt="Application Load Balancer with two target groups (blue and green)" class="image--center mx-auto" width="2504" height="844" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768142296716/9963c779-e0a8-4418-8d69-9bc8fcbbc553.png" alt="Figure 5: Application Load Balancer with two target groups (blue and green)" class="image--center mx-auto" width="2553" height="458" loading="lazy"></p>
<p>The production listener (port 80) is what your users hit. The test listener (port 8080) lets you test the green environment before shifting production traffic to it. This is crucial for validation.</p>
<p>You can see the complete Terraform configuration in <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/tree/main/terraform"><code>terraform</code></a>.</p>
<h3 id="heading-step-2-initialize-database-schema">Step 2: Initialize Database Schema</h3>
<p>Now you’ll need to initialize the database with the schema for version 1 (blue). We'll go through the bastion host for secure access:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Copy the migration files to the bastion host from your local machine</span>

scp -i ~/.ssh/id_rsa docker/init.sql ec2-user@<span class="hljs-variable">$BASTION_IP</span>:/tmp/
scp -i ~/.ssh/id_rsa migrations/*.sql ec2-user@<span class="hljs-variable">$BASTION_IP</span>:/tmp/

<span class="hljs-comment"># Then SSH into the bastion and run the migrations</span>
ssh -i ~/.ssh/id_rsa ec2-user@<span class="hljs-variable">$BASTION_IP</span>

<span class="hljs-comment"># Inside the bastion (DB_ENDPOINT isn't set there, so export it first):</span>
<span class="hljs-built_in">export</span> DB_ENDPOINT=<span class="hljs-string">""</span> <span class="hljs-comment"># from terraform output</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f /tmp/init.sql

<span class="hljs-comment"># Verify</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>

<span class="hljs-comment"># Exit the bastion</span>
<span class="hljs-built_in">exit</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768089062401/8f23655e-b50b-4b24-af98-b195e29da9c7.png" alt="Figure 6: Database schema - the customers table with the original columns" class="image--center mx-auto" width="1298" height="402" loading="lazy"></p>
<h3 id="heading-step-3-verify-blue-environment">Step 3: Verify Blue Environment</h3>
<p>We’ll want to test that everything works before we start the migration. This is your baseline: you want to confirm that the current system is healthy before introducing changes.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check health</span>
curl <span class="hljs-variable">$ALB_URL</span>/health | jq

<span class="hljs-comment"># Expected response:</span>
<span class="hljs-comment"># {</span>
<span class="hljs-comment">#   "status": "healthy",</span>
<span class="hljs-comment">#   "version": "blue",</span>
<span class="hljs-comment">#   "environment": "production",</span>
<span class="hljs-comment">#   "database": "connected",</span>
<span class="hljs-comment">#   "schema": "compatible"</span>
<span class="hljs-comment"># }</span>

<span class="hljs-comment"># Create a customer with the old schema (single address field)</span>
curl -X POST <span class="hljs-variable">$ALB_URL</span>/api/customers \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{
      "name": "John Doe",
      "email": "john@example.com",
      "address": "123 Main St, New York, NY, 10001"
    }'</span> | jq

<span class="hljs-comment"># List customers</span>
curl <span class="hljs-variable">$ALB_URL</span>/api/customers | jq
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768138569485/b7455a6e-b101-4cdb-83b8-40e0dbafb0b0.png" alt="Figure 7: Blue Environment Verification" class="image--center mx-auto" width="1068" height="434" loading="lazy"></p>
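<p>If you'd rather script this baseline check than eyeball the JSON, here's a minimal sketch that validates a <code>/health</code> response body. The <code>check_health</code> helper is hypothetical, and the field names are taken from the expected response shown above:</p>
<pre><code class="lang-python">import json


def check_health(body, expected_version):
    """Return a list of problems found in a /health response body."""
    payload = json.loads(body)
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}")
    if payload.get("version") != expected_version:
        problems.append(f"version is {payload.get('version')!r}")
    if payload.get("database") != "connected":
        problems.append(f"database is {payload.get('database')!r}")
    return problems


sample = json.dumps({
    "status": "healthy",
    "version": "blue",
    "environment": "production",
    "database": "connected",
    "schema": "compatible",
})
print(check_health(sample, "blue"))
</code></pre>
<p>An empty list means the environment looks healthy. The same function can be pointed at the test listener later by passing <code>"green"</code> as the expected version.</p>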
<h3 id="heading-step-4-expand-phase-add-new-columns">Step 4: Expand Phase – Add New Columns</h3>
<p>This is the first phase of expand-contract. We're adding the new columns WITHOUT removing the old one, creating a database schema that supports both blue and green simultaneously.</p>
<p>Run the expand migration (<a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/blob/main/migrations/001_expand_address.sql"><code>migrations/001_expand_address.sql</code></a>):</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Migration: 001_expand_address.sql</span>
<span class="hljs-keyword">BEGIN</span>;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> customers 
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> street_address <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>),
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> city <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>),
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> state <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">2</span>),
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> zip_code <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">10</span>);

<span class="hljs-comment">-- Populate new columns from existing data</span>
<span class="hljs-comment">-- This uses a simple parsing strategy; yours might be more sophisticated</span>

<span class="hljs-keyword">UPDATE</span> customers 
<span class="hljs-keyword">SET</span> 
  street_address = SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">1</span>),
  city = <span class="hljs-keyword">TRIM</span>(SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">2</span>)),
  state = <span class="hljs-keyword">TRIM</span>(SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">3</span>)),
  zip_code = <span class="hljs-keyword">TRIM</span>(SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">4</span>))
<span class="hljs-keyword">WHERE</span> address <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>;

<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p><strong>Critical observation:</strong> We're NOT dropping the <code>address</code> column. It's still there. Blue continues reading and writing to it, completely unaware that new columns exist. This is what makes the migration safe – nothing breaks.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># SSH into the bastion and run the migration</span>
ssh -i ~/.ssh/id_rsa ec2-user@<span class="hljs-variable">$BASTION_IP</span>

<span class="hljs-comment"># Inside the bastion:</span>
<span class="hljs-built_in">export</span> DB_ENDPOINT=<span class="hljs-string">""</span> <span class="hljs-comment"># from terraform output</span>

psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f /tmp/001_expand_address.sql

<span class="hljs-comment"># Verify new columns exist</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>

<span class="hljs-built_in">exit</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768089194050/e053dee3-382b-4ccd-a0e0-8c17003e9832.png" alt="Figure 8: Database schema evolution - the customers table during expand phase with both old and new columns" class="image--center mx-auto" width="1638" height="694" loading="lazy"></p>
<p><strong>Verification:</strong> The <code>\d customers</code> command shows the table structure. You should see BOTH the old <code>address</code> column AND the new <code>street_address</code>, <code>city</code>, <code>state</code>, <code>zip_code</code> columns. This confirms the expand phase worked.</p>
<p>The database now supports both old (blue) and new (green) schemas. Blue is still running and working perfectly, and nothing has changed from its perspective.</p>
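<p>The <code>SPLIT_PART</code> backfill in the expand migration assumes every address follows the "street, city, state, zip" pattern. Real data rarely does, so it can help to dry-run the parsing before you migrate. Here's a rough Python equivalent: the helpers are hypothetical, the length limits are copied from the migration, and Python's <code>strip()</code> removes slightly more whitespace than PostgreSQL's default <code>TRIM</code>.</p>
<pre><code class="lang-python">def split_part(text, delimiter, index):
    """Mimic PostgreSQL SPLIT_PART: 1-based index, empty string if missing."""
    parts = text.split(delimiter)
    return parts[index - 1] if index in range(1, len(parts) + 1) else ""


def parse_address(address):
    """Apply the same field-by-field split the migration performs."""
    return {
        "street_address": split_part(address, ",", 1),
        "city": split_part(address, ",", 2).strip(),
        "state": split_part(address, ",", 3).strip(),
        "zip_code": split_part(address, ",", 4).strip(),
    }


def fits_new_columns(parsed):
    """Check the parsed values against the VARCHAR lengths in the migration."""
    limits = {"street_address": 255, "city": 100, "state": 2, "zip_code": 10}
    return all(len(parsed[name]) in range(limit + 1) for name, limit in limits.items())


for address in ["123 Main St, New York, NY, 10001", "1 Elm St, Austin, Texas, 73301"]:
    parsed = parse_address(address)
    print(address, fits_new_columns(parsed))
</code></pre>
<p>The second address spells out the state name, which is longer than <code>VARCHAR(2)</code> allows. In PostgreSQL, a row like that would make the <code>UPDATE</code> fail and roll back the whole transaction, so it's worth finding those rows first.</p>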
<h3 id="heading-step-5-build-and-deploy-green-environment">Step 5: Build and Deploy Green Environment</h3>
<p>Now we’ll build version 2 of our application that knows how to work with the new structured address fields, while maintaining backwards compatibility with the old schema.</p>
<p>Start by building version 2 with structured address support:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> ..  <span class="hljs-comment"># Back to project root</span>

<span class="hljs-comment"># Build new version</span>
<span class="hljs-built_in">export</span> IMAGE_TAG=v2.0.0

docker build --platform linux/amd64 -t <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> -f docker/Dockerfile .

docker tag <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> \
    <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>

docker push <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>
</code></pre>
<p>What’s different is that the v2 <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/blob/main/app/models.py">application code</a> now has logic that:</p>
<ul>
<li><p><strong>Reads</strong> from the new structured columns (<code>street_address</code>, <code>city</code>, and so on)</p>
</li>
<li><p><strong>Writes</strong> to BOTH new columns AND the old <code>address</code> column</p>
</li>
<li><p>Accepts API requests with structured address format</p>
</li>
</ul>
<p><strong>Why write to both:</strong> This is crucial. Even though green prefers the new format, it maintains the old format, too. If you need to roll back to blue, all the data blue needs is there and up-to-date. Without this, rollback would be impossible: blue would see empty or stale <code>address</code> fields.</p>
<p>Now create and register the green task definition:</p>
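<p>As a rough illustration of that dual write, here's how green might derive the legacy <code>address</code> value from the structured fields. This is a sketch, not the code from the companion repository: both helper names are hypothetical, and the comma-separated format is assumed from the sample request in Step 3.</p>
<pre><code class="lang-python">def to_legacy_address(street_address, city, state, zip_code):
    """Join structured fields into the comma-separated format blue expects."""
    return ", ".join([street_address, city, state, zip_code])


def build_customer_row(name, email, street_address, city, state, zip_code):
    """Return the column values green would write: new fields plus the old one."""
    return {
        "name": name,
        "email": email,
        "street_address": street_address,
        "city": city,
        "state": state,
        "zip_code": zip_code,
        # Kept in sync so blue can still serve this customer after a rollback.
        "address": to_legacy_address(street_address, city, state, zip_code),
    }


row = build_customer_row(
    "John Doe", "john@example.com", "123 Main St", "New York", "NY", "10001"
)
print(row["address"])
</code></pre>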
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> terraform

<span class="hljs-comment"># Get necessary ARNs</span>
EXECUTION_ROLE_ARN=$(terraform output -raw ecs_task_execution_role_arn)
TASK_ROLE_ARN=$(terraform output -raw ecs_task_role_arn)
DB_SECRET_ARN=$(terraform output -raw db_secret_arn)

<span class="hljs-comment"># Create task definition</span>
cat &gt; task-def-green.json &lt;&lt;EOF
{
  <span class="hljs-string">"family"</span>: <span class="hljs-string">"ecommerce-bluegreen"</span>,
  <span class="hljs-string">"networkMode"</span>: <span class="hljs-string">"awsvpc"</span>,
  <span class="hljs-string">"requiresCompatibilities"</span>: [<span class="hljs-string">"FARGATE"</span>],
  <span class="hljs-string">"cpu"</span>: <span class="hljs-string">"256"</span>,
  <span class="hljs-string">"memory"</span>: <span class="hljs-string">"512"</span>,
  <span class="hljs-string">"executionRoleArn"</span>: <span class="hljs-string">"<span class="hljs-variable">${EXECUTION_ROLE_ARN}</span>"</span>,
  <span class="hljs-string">"taskRoleArn"</span>: <span class="hljs-string">"<span class="hljs-variable">${TASK_ROLE_ARN}</span>"</span>,
  <span class="hljs-string">"containerDefinitions"</span>: [{
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"app"</span>,
    <span class="hljs-string">"image"</span>: <span class="hljs-string">"<span class="hljs-variable">${AWS_ACCOUNT_ID}</span>.dkr.ecr.<span class="hljs-variable">${AWS_REGION}</span>.amazonaws.com/<span class="hljs-variable">${ECR_REPOSITORY}</span>:<span class="hljs-variable">${IMAGE_TAG}</span>"</span>,
    <span class="hljs-string">"essential"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-string">"portMappings"</span>: [{
      <span class="hljs-string">"containerPort"</span>: 8080,
      <span class="hljs-string">"protocol"</span>: <span class="hljs-string">"tcp"</span>
    }],
    <span class="hljs-string">"environment"</span>: [
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"APP_VERSION"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"green"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"ENVIRONMENT"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"production"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"AWS_REGION"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"<span class="hljs-variable">${AWS_REGION}</span>"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_HOST"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"<span class="hljs-variable">${DB_ENDPOINT}</span>"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_PORT"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"5432"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_NAME"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"ecommerce"</span>}
    ],
    <span class="hljs-string">"secrets"</span>: [
      {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_USER"</span>,
        <span class="hljs-string">"valueFrom"</span>: <span class="hljs-string">"<span class="hljs-variable">${DB_SECRET_ARN}</span>:username::"</span>
      },
      {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_PASSWORD"</span>,
        <span class="hljs-string">"valueFrom"</span>: <span class="hljs-string">"<span class="hljs-variable">${DB_SECRET_ARN}</span>:password::"</span>
      }
    ],
    <span class="hljs-string">"logConfiguration"</span>: {
      <span class="hljs-string">"logDriver"</span>: <span class="hljs-string">"awslogs"</span>,
      <span class="hljs-string">"options"</span>: {
        <span class="hljs-string">"awslogs-group"</span>: <span class="hljs-string">"/ecs/ecommerce-bluegreen"</span>,
        <span class="hljs-string">"awslogs-region"</span>: <span class="hljs-string">"<span class="hljs-variable">${AWS_REGION}</span>"</span>,
        <span class="hljs-string">"awslogs-stream-prefix"</span>: <span class="hljs-string">"ecs"</span>
      }
    },
    <span class="hljs-string">"healthCheck"</span>: {
      <span class="hljs-string">"command"</span>: [<span class="hljs-string">"CMD-SHELL"</span>, <span class="hljs-string">"curl -f http://localhost:8080/health || exit 1"</span>],
      <span class="hljs-string">"interval"</span>: 30,
      <span class="hljs-string">"timeout"</span>: 5,
      <span class="hljs-string">"retries"</span>: 3,
      <span class="hljs-string">"startPeriod"</span>: 60
    }
  }]
}
EOF

<span class="hljs-comment"># Register the task definition</span>
aws ecs register-task-definition --cli-input-json file://task-def-green.json
</code></pre>
<p>This JSON tells ECS everything about how to run your container:</p>
<ul>
<li><p>Which Docker image to use (the v2.0.0 we just built)</p>
</li>
<li><p>How much CPU/memory to allocate (256 CPU units = 0.25 vCPU)</p>
</li>
<li><p>Environment variables (notice <code>APP_VERSION</code> is set to "green")</p>
</li>
<li><p>Secrets (database credentials pulled from AWS Secrets Manager)</p>
</li>
<li><p>Health check configuration (curl the /health endpoint every 30 seconds)</p>
</li>
<li><p>Logging configuration (send logs to CloudWatch)</p>
</li>
</ul>
<p><strong>Key detail:</strong> The <code>APP_VERSION</code> environment variable is how the application knows whether to behave as blue or green. Same codebase, different behavior based on configuration.</p>
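<p>As a rough illustration of configuration-driven behavior, the sketch below picks which columns to write based on <code>APP_VERSION</code>. The helper name <code>columns_for_version</code> and the legacy address string format are assumptions made for this example, not code taken from the sample application:</p>
<pre><code class="lang-python">import os


def columns_for_version(app_version, address):
    """Return the column values to write for a customer's address.

    Hypothetical helper: blue writes only the legacy column, while green
    dual-writes the structured columns plus the legacy one.
    """
    legacy = "{street}, {city}, {state} {zip}".format(**address)
    if app_version == "green":
        return {
            "street_address": address["street"],
            "city": address["city"],
            "state": address["state"],
            "zip_code": address["zip"],
            "address": legacy,  # kept so blue can still read the row
        }
    return {"address": legacy}


# ECS injects APP_VERSION from the task definition; default to blue locally.
APP_VERSION = os.environ.get("APP_VERSION", "blue")
</code></pre>
<p>Keeping the branch in one small function makes it easy to delete later, once the dual-write is no longer needed.</p>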
<h3 id="heading-step-6-execute-blue-green-deployment">Step 6: Execute Blue-Green Deployment</h3>
<p>Now create the AppSpec file and trigger the deployment:</p>
<pre><code class="lang-bash">TASK_DEF_ARN=$(aws ecs describe-task-definition \
  --task-definition ecommerce-bluegreen \
  --query <span class="hljs-string">'taskDefinition.taskDefinitionArn'</span> \
  --output text)

cat &gt; appspec.json &lt;&lt;EOF
{
  <span class="hljs-string">"version"</span>: 0.0,
  <span class="hljs-string">"Resources"</span>: [{
    <span class="hljs-string">"TargetService"</span>: {
      <span class="hljs-string">"Type"</span>: <span class="hljs-string">"AWS::ECS::Service"</span>,
      <span class="hljs-string">"Properties"</span>: {
        <span class="hljs-string">"TaskDefinition"</span>: <span class="hljs-string">"<span class="hljs-variable">${TASK_DEF_ARN}</span>"</span>,
        <span class="hljs-string">"LoadBalancerInfo"</span>: {
          <span class="hljs-string">"ContainerName"</span>: <span class="hljs-string">"app"</span>,
          <span class="hljs-string">"ContainerPort"</span>: 8080
        }
      }
    }
  }]
}
EOF

<span class="hljs-comment"># Deploy</span>
APPSPEC=$(cat appspec.json | jq -c .)
aws deploy create-deployment \
  --application-name ecommerce-bluegreen \
  --deployment-group-name ecommerce-bluegreen-deployment-group \
  --deployment-config-name CodeDeployDefault.ECSLinear10PercentEvery3Minutes \
  --description <span class="hljs-string">"Blue-green deployment to structured address schema"</span> \
  --cli-input-json <span class="hljs-string">"{
    \"revision\": {
      \"revisionType\": \"AppSpecContent\",
      \"appSpecContent\": {
        \"content\": <span class="hljs-subst">$(printf '%s' "$APPSPEC" | jq -Rs .)</span>
      }
    }
  }"</span>

DEPLOYMENT_ID=$(aws deploy list-deployments \
    --application-name ecommerce-bluegreen \
    --deployment-group-name ecommerce-bluegreen-deployment-group \
    --query <span class="hljs-string">'deployments[0]'</span> --output text)
</code></pre>
<p>Monitor the deployment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Watch status</span>
watch -n 10 <span class="hljs-string">"aws deploy get-deployment --deployment-id <span class="hljs-variable">$DEPLOYMENT_ID</span> \
    --query 'deploymentInfo.status' --output text"</span>

<span class="hljs-comment"># Monitor traffic distribution</span>
<span class="hljs-keyword">while</span> <span class="hljs-literal">true</span>; <span class="hljs-keyword">do</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Production: <span class="hljs-subst">$(curl -s $ALB_URL/health | jq -r '.version')</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Test: <span class="hljs-subst">$(curl -s $TEST_URL/health | jq -r '.version')</span>"</span>
    sleep 30
<span class="hljs-keyword">done</span>
</code></pre>
<p>With <code>CodeDeployDefault.ECSLinear10PercentEvery3Minutes</code>, the deployment shifts 10% of traffic to green every 3 minutes and completes in roughly 30 minutes.</p>
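<p>To see what that schedule looks like minute by minute, here is a small sketch of the arithmetic. It assumes the first 10% increment is applied as soon as the traffic shift starts; <code>linear_shift_schedule</code> is a name invented for this example and is not part of any AWS SDK:</p>
<pre><code class="lang-python">def linear_shift_schedule(step_percent, interval_minutes):
    """Return (minute, green_percent) pairs for a linear traffic shift.

    Assumes the first increment is applied at minute 0.
    """
    if step_percent not in range(1, 101):
        raise ValueError("step_percent must be an integer from 1 to 100")
    schedule = []
    green_percent = 0
    minute = 0
    while green_percent != 100:
        green_percent = min(green_percent + step_percent, 100)
        schedule.append((minute, green_percent))
        minute += interval_minutes
    return schedule


for minute, percent in linear_shift_schedule(10, 3):
    print(f"minute {minute:2d}: green {percent}%")
</code></pre>
<p>Under that assumption, green reaches 50% at minute 12 and 100% at minute 27, which is in line with the roughly 30 minutes quoted above.</p>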
<h3 id="heading-step-7-validate-green-environment">Step 7: Validate Green Environment</h3>
<p>As soon as the deployment begins, validate that the green environment handles the new structured address format correctly, before the bulk of production traffic shifts to it.</p>
<p>The CodeDeploy console below shows the traffic migration and deployment status:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768093087711/fc1b869c-7fae-421e-8d98-45769300cb0a.png" alt="Monitoring in CodeDeploy" class="image--center mx-auto" width="2282" height="1460" loading="lazy"></p>
<p>We can also test through the test listener (port 8080), which provides isolated access to green tasks:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Test new structured address API</span>
curl -X POST <span class="hljs-variable">$TEST_URL</span>/api/customers \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{
      "name": "Jane Smith",
      "email": "jane@example.com",
      "address": {
        "street": "456 Oak Ave",
        "city": "Los Angeles",
        "state": "CA",
        "zip": "90001"
      }
    }'</span> | jq

curl <span class="hljs-variable">$ALB_URL</span>/api/customers | jq
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768140730325/57c6a047-994f-4b5e-8e19-4d6fb25ad44e.png" alt="Validate Green environment response" class="image--center mx-auto" width="1422" height="672" loading="lazy"></p>
<p>What you're validating:</p>
<ul>
<li><p>The green environment accepts the new structured address format</p>
</li>
<li><p>Data is correctly written to both new columns (street_address, city, state, zip_code) and the old address column for backwards compatibility</p>
</li>
<li><p>The API response matches expectations for the new schema</p>
</li>
<li><p>Existing data from the blue environment is still accessible and readable</p>
</li>
</ul>
<p>If any of these tests fail, you can stop the deployment before most production traffic reaches green, limiting customer impact.</p>
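<p>The response check can also be automated rather than done by eye. The sketch below is one possible way to do it, assuming the API echoes the address back as an object with the same <code>street</code>, <code>city</code>, <code>state</code>, and <code>zip</code> keys used in the request. The function name <code>address_problems</code> is hypothetical:</p>
<pre><code class="lang-python">REQUIRED_ADDRESS_FIELDS = ("street", "city", "state", "zip")


def address_problems(customer):
    """Return a list of problems found in a customer API response."""
    address = customer.get("address")
    if not isinstance(address, dict):
        return ["address is not a structured object"]
    return [
        f"missing or empty field: {field}"
        for field in REQUIRED_ADDRESS_FIELDS
        if not address.get(field)
    ]
</code></pre>
<p>An empty list means the response has the expected shape. Anything else is a reason to stop the deployment and investigate.</p>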
<h3 id="heading-step-8-post-deployment-validation">Step 8: Post-Deployment Validation</h3>
<p>Once CodeDeploy completes the traffic shift, all production requests route to green. This is your opportunity to verify that the deployment was successful and that the new version is handling real production traffic correctly.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Verify all production traffic goes to green</span>
<span class="hljs-comment"># Running this multiple times confirms consistent routing</span>
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> {1..10}; <span class="hljs-keyword">do</span>
    curl -s <span class="hljs-variable">$ALB_URL</span>/health | jq -r <span class="hljs-string">'.version'</span>
<span class="hljs-keyword">done</span>
<span class="hljs-comment"># Expected output: "green" for all 10 requests</span>

<span class="hljs-comment"># Test complete CRUD operations with the new API</span>
<span class="hljs-comment"># Create a customer with structured address</span>
CUSTOMER_ID=$(curl -s -X POST <span class="hljs-variable">$ALB_URL</span>/api/customers \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{"name": "Test User", "email": "test@example.com",
         "address": {"street": "789 Test St", "city": "Test City", 
         "state": "TX", "zip": "75001"}}'</span> | jq -r <span class="hljs-string">'.id'</span>)

<span class="hljs-comment"># Read the customer back to verify data persistence</span>
curl <span class="hljs-variable">$ALB_URL</span>/api/customers/<span class="hljs-variable">$CUSTOMER_ID</span> | jq

<span class="hljs-comment"># Update the customer to test modification</span>
curl -X PUT <span class="hljs-variable">$ALB_URL</span>/api/customers/<span class="hljs-variable">$CUSTOMER_ID</span> \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{"address": {"street": "999 Updated Ave", "city": "Test City", 
         "state": "TX", "zip": "75001"}}'</span> | jq

<span class="hljs-comment"># Delete the test customer for cleanup</span>
curl -X DELETE <span class="hljs-variable">$ALB_URL</span>/api/customers/<span class="hljs-variable">$CUSTOMER_ID</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768140850962/a31273e9-cbc1-4d09-9f6d-7248b402f712.png" alt="Verify all production traffic goes to green" class="image--center mx-auto" width="846" height="270" loading="lazy"></p>
<p>What you're validating:</p>
<ul>
<li><p>Traffic routing is 100% to green with no requests reaching blue</p>
</li>
<li><p>Create operations work with the new structured address format</p>
</li>
<li><p>Read operations return correct data with proper address structure</p>
</li>
<li><p>Update operations successfully modify existing records</p>
</li>
<li><p>Delete operations work without errors</p>
</li>
<li><p>The application correctly writes to both new columns and old address column (enabling potential rollback)</p>
</li>
</ul>
<p>Check your CloudWatch logs and metrics during this validation period for any unexpected errors, increased latency, or database connection issues.</p>
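<p>If you collect the <code>version</code> values returned by repeated <code>/health</code> calls, a few lines of Python can summarize where traffic is going. This is only a sketch, and <code>version_share</code> is a name made up for this example:</p>
<pre><code class="lang-python">from collections import Counter


def version_share(samples):
    """Return the percentage of sampled responses served by each version."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {version: 100 * count / total for version, count in counts.items()}
</code></pre>
<p>After a completed shift, you would expect the result to contain only <code>green</code> at 100%. Any <code>blue</code> entry means some requests are still reaching the old tasks.</p>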
<h3 id="heading-step-9-contract-phase-after-24-72-hours">Step 9: Contract Phase (After 24-72 Hours)</h3>
<p>This is the final phase of expand-contract. We're removing the old <code>address</code> column now that we're confident green is stable. This is the point of no return.</p>
<p><strong>CRITICAL</strong>: Only proceed after green has been stable for your confidence period!</p>
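<pre><code class="lang-bash"><span class="hljs-comment"># Backup database first; compute the identifier once so both commands use the same value</span>
SNAPSHOT_ID=pre-contract-$(date +%Y%m%d-%H%M%S)
aws rds create-db-snapshot \
    --db-instance-identifier ecommerce-bluegreen-db \
    --db-snapshot-identifier <span class="hljs-string">"<span class="hljs-variable">$SNAPSHOT_ID</span>"</span>

<span class="hljs-comment"># Wait for snapshot</span>
aws rds <span class="hljs-built_in">wait</span> db-snapshot-completed \
    --db-snapshot-identifier <span class="hljs-string">"<span class="hljs-variable">$SNAPSHOT_ID</span>"</span>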

<span class="hljs-comment"># Run contract migration</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f /tmp/002_contract_address.sql

<span class="hljs-comment"># Verify old column is gone</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>
</code></pre>
<p>The contract migration (<a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/blob/main/migrations/002_contract_address.sql"><code>migrations/002_contract_address.sql</code></a>) removes the old <code>address</code> column.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768140955991/d6f6f287-09e5-4693-a4e9-77c1d9080466.png" alt="d6f6f287-09e5-4693-a4e9-77c1d9080466" class="image--center mx-auto" width="1506" height="444" loading="lazy"></p>
<p><strong>Why wait 24-72 hours:</strong> You want to be absolutely certain green is stable before making irreversible changes. During this waiting period:</p>
<ul>
<li><p>All your monitoring should show green performing normally</p>
</li>
<li><p>You've seen the system handle multiple daily traffic patterns (morning peak, evening peak, overnight)</p>
</li>
<li><p>Scheduled batch jobs have run successfully (extend the wait if some only run weekly)</p>
</li>
<li><p>You've verified third-party integrations work</p>
</li>
<li><p>No unusual errors or performance degradation</p>
</li>
</ul>
<p>It’s important to snapshot first because once you drop that column, there's no undo button. The snapshot is your safety net. If you discover a critical issue after contracting, you can restore this snapshot and get back to a state where rollback is possible. Without it, you're gambling.</p>
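<p>Because the snapshot identifier embeds a timestamp, it needs to be computed once and reused, otherwise the create and wait commands can end up referring to different names. A minimal sketch of that idea, where <code>make_snapshot_id</code> is a hypothetical helper and UTC is an arbitrary choice for the example:</p>
<pre><code class="lang-python">from datetime import datetime, timezone


def make_snapshot_id(now, prefix="pre-contract"):
    """Build a snapshot identifier such as pre-contract-20260111-093000."""
    return f"{prefix}-{now.strftime('%Y%m%d-%H%M%S')}"


# Compute the identifier once, then pass the same value to every command.
snapshot_id = make_snapshot_id(datetime.now(timezone.utc))
create_args = ["aws", "rds", "create-db-snapshot",
               "--db-instance-identifier", "ecommerce-bluegreen-db",
               "--db-snapshot-identifier", snapshot_id]
wait_args = ["aws", "rds", "wait", "db-snapshot-completed",
             "--db-snapshot-identifier", snapshot_id]
</code></pre>
<p>Record the identifier somewhere durable as well, since the restore command needs the exact same value.</p>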
<p><strong>What the contract migration does:</strong></p>
<pre><code class="lang-sql"><span class="hljs-comment">-- migrations/002_contract_address.sql</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> customers <span class="hljs-keyword">DROP</span> <span class="hljs-keyword">COLUMN</span> address;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>It's simple but permanent. The old <code>address</code> column is gone. The blue environment will no longer work with this database, because it expects that column to exist. This is fine because blue has been decommissioned (no traffic, tasks terminated).</p>
<p><strong>What to update:</strong> Plan a version 3 of your application that removes the dual-write logic. Version 2 (green) still writes to both the new columns and the old <code>address</code> column, and any statement that names a dropped column will fail, so deploy version 3 before running the contract migration unless your application tolerates the column's absence.</p>
<p>Your migration is now complete!</p>
<h2 id="heading-rollback-strategies">Rollback Strategies</h2>
<h3 id="heading-during-deployment-safe-window">During Deployment (Safe Window)</h3>
<p>Use this strategy when you detect issues <strong>during the traffic shift</strong>, before all traffic has moved to green. CodeDeploy is still managing the deployment, which means it can automatically revert traffic distribution to the previous state.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Immediate rollback</span>
aws deploy stop-deployment \
    --deployment-id <span class="hljs-variable">$DEPLOYMENT_ID</span> \
    --auto-rollback-enabled
</code></pre>
<p>You should use this strategy when you notice increased error rates, degraded performance, or functional issues during the canary or linear traffic shift. CodeDeploy automatically shifts all traffic back to blue, and green tasks are terminated. This is the safest and fastest rollback option.</p>
<p>This works because the database still contains the old <code>address</code> column (expand phase), so blue can function normally. No data has been lost or made incompatible.</p>
<h3 id="heading-after-deployment-before-contract">After Deployment (Before Contract)</h3>
<p>Use this when the deployment completed successfully, but you discover issues hours or days later during the monitoring period, before you've run the contract migration. Both blue and green environments still exist, and the database supports both schemas.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Manual listener update</span>
aws elbv2 modify-listener \
    --listener-arn $(terraform output -raw alb_listener_arn) \
    --default-actions Type=forward,TargetGroupArn=$(terraform output -raw blue_target_group_arn)
</code></pre>
<p>Or use the provided script:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> scripts
./rollback.sh
</code></pre>
<p>Use this when you discover bugs in green that weren't caught during initial testing, business metrics show unexpected changes (conversion rates drop, customer complaints increase), or third-party integration issues emerge.</p>
<p>This works because the database still has both old and new schema elements. Blue tasks still exist and can serve traffic immediately. Because green was writing to both old and new columns, blue sees all the latest data.</p>
<p>With this, the traffic immediately shifts from green back to blue. Green continues running for observability, but serves no traffic. You can debug green in place without customer impact.</p>
<h3 id="heading-after-contract-phase">After Contract Phase</h3>
<p>Use this as a <strong>last resort</strong> when you've already removed the old address column, and blue can no longer function with the current database schema. This is significantly more complex and time-consuming than the previous two strategies.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Restore from snapshot</span>
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier ecommerce-bluegreen-db-restored \
    --db-snapshot-identifier pre-contract-YYYYMMDD-HHMMSS
</code></pre>
<p>Only use this strategy when you discover a critical, production-breaking issue after the contract phase, and you have no other option but to return to the previous version.</p>
<p><strong>Why it's painful</strong>:</p>
<ul>
<li><p>Database restore takes 10-30 minutes depending on size</p>
</li>
<li><p>You lose all data written after the snapshot was taken</p>
</li>
<li><p>Requires updating connection strings to point to the restored instance</p>
</li>
<li><p>Need to re-deploy blue environment</p>
</li>
<li><p>Must communicate downtime to users</p>
</li>
</ul>
<p>This is why you wait 24-72 hours before contracting, and take a snapshot immediately before the contract migration. The lengthy waiting period allows you to catch most issues while the safer rollback strategies are still available.</p>
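<p>The three strategies above boil down to two questions: is CodeDeploy still shifting traffic, and has the contract migration already run? The sketch below encodes that decision. The function name and the returned descriptions are illustrative only:</p>
<pre><code class="lang-python">def choose_rollback(deployment_in_progress, contract_applied):
    """Map the current migration state to one of the three rollback strategies."""
    if contract_applied:
        # Last resort: the old column is gone, so blue needs the old schema back.
        return "restore the pre-contract RDS snapshot and redeploy blue"
    if deployment_in_progress:
        # Safe window: CodeDeploy can revert the traffic shift itself.
        return "stop the CodeDeploy deployment with automatic rollback"
    # Deployment finished, but the database still supports both versions.
    return "point the ALB listener back at the blue target group"
</code></pre>
<p>Writing the decision down like this, even in a runbook rather than in code, helps avoid reaching for the snapshot restore when a cheaper option is still available.</p>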
<h2 id="heading-monitoring-during-deployments">Monitoring During Deployments</h2>
<h3 id="heading-essential-metrics">Essential Metrics</h3>
<p>During a blue-green deployment, you need to monitor both environments simultaneously to detect issues early and make informed decisions about proceeding or rolling back.</p>
<p>For each target group (blue and green), track these CloudWatch metrics:</p>
<h4 id="heading-1-targetresponsetime">1. TargetResponseTime</h4>
<p>Measures latency from when the load balancer sends a request to when it receives a response. You're looking for sudden spikes or gradual degradation. Green should have similar response times to blue (within 10-20%). If green's latency is significantly higher, you may have performance regressions, inefficient queries with the new schema, or resource constraints.</p>
<h4 id="heading-2-requestcount">2. RequestCount</h4>
<p>Shows traffic volume hitting each target group. During the deployment, you should see blue's count decreasing while green's increases proportionally. If the numbers don't add up (total requests drop significantly), users might be experiencing errors and not retrying. If green receives traffic but shows zero requests, health checks might be failing.</p>
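<p>One way to check that the numbers add up is to compare green's observed share of requests with the percentage CodeDeploy should have shifted so far. The helpers below are a sketch with hypothetical names. The request counts would come from the <code>RequestCount</code> metric of each target group:</p>
<pre><code class="lang-python">def green_share(blue_requests, green_requests):
    """Return green's share of total requests as a percentage."""
    total = blue_requests + green_requests
    if total == 0:
        return 0.0
    return 100 * green_requests / total


def share_drift(blue_requests, green_requests, expected_percent):
    """Return how far green's observed share is from the expected shift."""
    return abs(green_share(blue_requests, green_requests) - expected_percent)
</code></pre>
<p>A drift of a few percentage points is normal with uneven traffic. A large or growing drift suggests that one environment is rejecting or dropping requests.</p>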
<h4 id="heading-3-httpcodetarget5xxcount">3. HTTPCode_Target_5XX_Count</h4>
<p>Server errors indicate application problems. Even a single 5XX error during deployment warrants investigation. Green should have zero 5XX errors during the initial traffic shift. Any errors could indicate incompatibility issues with the new schema, missing environment variables, or database connection problems.</p>
<h4 id="heading-4-databaseconnections-from-rds-metrics">4. DatabaseConnections (from RDS metrics)</h4>
<p>Shows active database connections from both environments. Watch for connection pool exhaustion, which manifests as a sudden spike or plateau at your max connections limit. If green uses more connections than blue did, you might have connection leaks or inefficient connection handling in the new code.</p>
<h4 id="heading-5-cpuutilization">5. CPUUtilization</h4>
<p>Monitor both ECS task CPU and RDS CPU. Green tasks should use similar CPU to blue tasks for the same request volume. Higher CPU might indicate less efficient code or more complex queries. RDS CPU spikes during deployment often indicate poorly optimized new queries or missing indexes for the new schema.</p>
<p><strong>What to expect</strong>:</p>
<ul>
<li><p>First few minutes: Green receives 10% of traffic, and its metrics should closely match blue's baseline</p>
</li>
<li><p>Around 12-15 minutes: Green is at roughly 50-60% of traffic, and both environments should show stable metrics</p>
</li>
<li><p>Around 27-30 minutes: Green is at 100% of traffic, and metrics should stabilize at historical levels</p>
</li>
<li><p>Any divergence from these patterns warrants stopping the deployment and investigating</p>
</li>
</ul>
<p><strong>Custom application metrics</strong>: Beyond infrastructure metrics, monitor business-critical metrics like checkout completion rates, API success rates, and user sign-up flows. Sometimes technical metrics look fine but user-facing functionality is broken.</p>
<h2 id="heading-best-practices">Best Practices</h2>
<h3 id="heading-test-migrations-in-staging">Test Migrations in Staging</h3>
<p>Always run your database migrations against a staging environment that mirrors production scale and complexity before touching production. Copy a recent production snapshot to staging and execute your expand migration there first.</p>
<p><strong>Why this matters</strong>: Migrations that work fine on small datasets can timeout or lock tables on production-scale data. You might discover that adding an index to a 50-million-row table takes 2 hours, or that your column population query needs optimization.</p>
<p><strong>What to test</strong>:</p>
<ul>
<li><p>Migration execution time (should complete in seconds/minutes, not hours)</p>
</li>
<li><p>Table locks and their impact (can reads/writes continue during migration?)</p>
</li>
<li><p>Query performance with new schema (are your indexes still effective?)</p>
</li>
<li><p>Rollback procedures (can you undo the migration if needed?)</p>
</li>
</ul>
<h3 id="heading-use-migration-tools">Use Migration Tools</h3>
<p>Don't write raw SQL migrations manually. Use Flyway, Liquibase, Alembic (for Python), or your framework's built-in migration tools (Rails migrations, Django migrations, Entity Framework migrations).</p>
<p><strong>Why this matters</strong>: Migration tools provide version tracking, rollback capabilities, checksums to prevent tampering, and a standardized way to manage schema changes across environments.</p>
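<p>To make the checksum idea concrete, here is a simplified sketch of how a tool might detect that an already-applied migration file was edited afterwards. It illustrates the concept only: real tools use their own checksum algorithms and history tables, and the function names here are invented for the example:</p>
<pre><code class="lang-python">import hashlib


def migration_checksum(sql_text):
    """Return a stable fingerprint of a migration's contents."""
    return hashlib.sha256(sql_text.encode("utf-8")).hexdigest()


def changed_migrations(applied, files):
    """Return names of applied migrations whose file contents no longer match.

    applied maps migration name to the checksum recorded when it ran;
    files maps migration name to the current SQL text on disk.
    """
    return sorted(
        name
        for name, checksum in applied.items()
        if name in files and migration_checksum(files[name]) != checksum
    )
</code></pre>
<p>A non-empty result means someone changed history, which is exactly the situation these tools refuse to proceed on.</p>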
<h3 id="heading-configure-health-checks-properly">Configure Health Checks Properly</h3>
<p>Your health check endpoint should verify that the application can actually function, not just that the process is running. A comprehensive health check validates database connectivity, schema compatibility, and dependent service availability.</p>
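<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask, jsonify

app = Flask(__name__)

<span class="hljs-comment"># check_database, check_cache_connection, and db are assumed to be defined elsewhere in the app</span>
<span class="hljs-meta">@app.route('/health')</span>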
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">health_check</span>():</span>
    checks = {
        <span class="hljs-string">'database'</span>: check_database(),
        <span class="hljs-string">'schema'</span>: check_schema_compatibility(),
        <span class="hljs-string">'cache'</span>: check_cache_connection()
    }

    <span class="hljs-keyword">if</span> all(checks.values()):
        <span class="hljs-keyword">return</span> jsonify(checks), <span class="hljs-number">200</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> jsonify(checks), <span class="hljs-number">503</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_schema_compatibility</span>():</span>
    <span class="hljs-string">"""Verify expected schema elements exist"""</span>
    <span class="hljs-keyword">try</span>:
        result = db.query(<span class="hljs-string">"""
            SELECT column_name 
            FROM information_schema.columns 
            WHERE table_name = 'customers'
            AND column_name IN ('street_address', 'city', 'state', 'zip_code')
        """</span>)
        <span class="hljs-keyword">return</span> len(result) == <span class="hljs-number">4</span>
    <span class="hljs-keyword">except</span> Exception:  <span class="hljs-comment"># treat any query failure as an incompatible schema</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
</code></pre>
<p>For ALB health checks specifically, make sure you configure appropriate thresholds in your target group settings. A healthy threshold of 2 means the target must pass 2 consecutive health checks before receiving traffic. An unhealthy threshold of 3 means it must fail 3 consecutive checks before being removed. Set your interval to 30 seconds and timeout to 5 seconds to balance responsiveness with stability.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Terraform health check settings for the green target group (other required arguments omitted)</span>
resource <span class="hljs-string">"aws_lb_target_group"</span> <span class="hljs-string">"green"</span> {
  health_check {
    enabled             = <span class="hljs-literal">true</span>
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    path                = <span class="hljs-string">"/health"</span>
    matcher             = <span class="hljs-string">"200"</span>
  }
}
</code></pre>
<p>This configuration ensures that ECS tasks aren't marked healthy prematurely (preventing traffic to broken tasks) while also not being overly sensitive to transient issues (preventing unnecessary task replacements).</p>
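<p>As a rough check on what these thresholds mean in wall-clock terms, the arithmetic can be sketched as follows. This is an approximation that ignores the timeout and any registration delay, and the function name is made up for the example:</p>
<pre><code class="lang-python">def seconds_until_state_change(threshold, interval_seconds):
    """Approximate time for enough consecutive checks to flip a target's state."""
    return threshold * interval_seconds


HEALTHY_THRESHOLD = 2
UNHEALTHY_THRESHOLD = 3
INTERVAL_SECONDS = 30

time_to_healthy = seconds_until_state_change(HEALTHY_THRESHOLD, INTERVAL_SECONDS)
time_to_unhealthy = seconds_until_state_change(UNHEALTHY_THRESHOLD, INTERVAL_SECONDS)
print(f"healthy after about {time_to_healthy}s, unhealthy after about {time_to_unhealthy}s")
</code></pre>
<p>With the values above, a new task needs about a minute of passing checks before it receives traffic, and a failing task is removed after about a minute and a half.</p>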
<h3 id="heading-plan-the-contract-phase">Plan the Contract Phase</h3>
<p>The contract phase is irreversible, so treat it with appropriate caution. Wait at least 24 hours, and ideally up to 72, after the green deployment before removing old schema elements. This waiting period isn't arbitrary: it ensures you've observed the system under various conditions.</p>
<p><strong>What to verify before contracting</strong>:</p>
<ul>
<li><p>Green has handled multiple daily traffic patterns (morning rush, evening peak, overnight batch jobs)</p>
</li>
<li><p>All scheduled jobs and cron tasks have run successfully with the new schema</p>
</li>
<li><p>Weekly reports or analytics pipelines have completed (extend the wait if they have not yet run)</p>
</li>
<li><p>Third-party integrations (payment processors, shipping APIs, analytics tools) are working</p>
</li>
<li><p>No unusual error patterns in logs</p>
</li>
<li><p>Business metrics (conversions, sign-ups, purchases) remain stable</p>
</li>
<li><p>Customer support hasn't reported related issues</p>
</li>
</ul>
<p>The pre-contract checklist:</p>
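<pre><code class="lang-bash"><span class="hljs-comment"># 1. Create a final snapshot and wait for it to complete</span>
SNAPSHOT_ID=pre-contract-$(date +%Y%m%d-%H%M%S)
aws rds create-db-snapshot \
    --db-instance-identifier ecommerce-bluegreen-db \
    --db-snapshot-identifier <span class="hljs-string">"<span class="hljs-variable">$SNAPSHOT_ID</span>"</span>
aws rds <span class="hljs-built_in">wait</span> db-snapshot-completed \
    --db-snapshot-identifier <span class="hljs-string">"<span class="hljs-variable">$SNAPSHOT_ID</span>"</span>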

<span class="hljs-comment"># 2. Document current state</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Green tasks: <span class="hljs-subst">$(aws ecs describe-services --cluster ecommerce --services ecommerce-green | jq '.services[0].runningCount')</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Error rate: <span class="hljs-subst">$(aws cloudwatch get-metric-statistics --namespace AWS/ApplicationELB --metric-name HTTPCode_Target_5XX_Count --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)</span> --end-time <span class="hljs-subst">$(date -u +%Y-%m-%dT%H:%M:%S)</span> --period 3600 --statistics Sum)"</span>

<span class="hljs-comment"># 3. Notify team</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Running contract migration at <span class="hljs-subst">$(date)</span>"</span>

<span class="hljs-comment"># 4. Run migration</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f migrations/002_contract_address.sql

<span class="hljs-comment"># 5. Verify</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>
</code></pre>
<h3 id="heading-version-your-apis">Version Your APIs</h3>
<p>When changing data formats, maintain backward compatibility by supporting both old and new API versions simultaneously. This allows API consumers (mobile apps, third-party integrations, other services) to migrate at their own pace without coordinating releases.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Support both API versions during transition</span>
<span class="hljs-meta">@app.route('/api/v1/customers/&lt;id&gt;')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_customer_v1</span>(<span class="hljs-params">id</span>):</span>
    customer = Customer.find(id)
    <span class="hljs-keyword">return</span> jsonify({
        <span class="hljs-string">'id'</span>: customer.id,
        <span class="hljs-string">'name'</span>: customer.name,
        <span class="hljs-string">'address'</span>: customer.address  <span class="hljs-comment"># Old format</span>
    })

<span class="hljs-meta">@app.route('/api/v2/customers/&lt;id&gt;')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_customer_v2</span>(<span class="hljs-params">id</span>):</span>
    customer = Customer.find(id)
    <span class="hljs-keyword">return</span> jsonify({
        <span class="hljs-string">'id'</span>: customer.id,
        <span class="hljs-string">'name'</span>: customer.name,
        <span class="hljs-string">'address'</span>: {  <span class="hljs-comment"># New structured format</span>
            <span class="hljs-string">'street'</span>: customer.street_address,
            <span class="hljs-string">'city'</span>: customer.city,
            <span class="hljs-string">'state'</span>: customer.state,
            <span class="hljs-string">'zip'</span>: customer.zip_code
        }
    })
</code></pre>
<p>To implement this, first deploy both endpoints with blue-green, then monitor usage of the v1 endpoint over time. Once v1 traffic drops below 1% (meaning clients have migrated), deprecate it formally. Remove the v1 endpoint in a subsequent release, not during the blue-green deployment itself.</p>
<p>Announce the new API version to consumers with a migration timeline. Give them 2-3 months to update their integrations. Send reminder emails at the halfway point and 2 weeks before v1 shutdown.</p>
<h3 id="heading-monitor-both-environments">Monitor Both Environments</h3>
<p>During the transition period, both blue and green are production environments serving real traffic. Monitor them separately to detect version-specific issues.</p>
<p>Set up separate CloudWatch dashboards for blue and green target groups with the same metrics arranged identically. This makes it easy to spot differences at a glance. If green's response time is 200ms while blue's is 50ms, that's a red flag.</p>
<h4 id="heading-alert-on-metric-divergence">Alert on metric divergence</h4>
<p>Create alarms that trigger when green's metrics deviate significantly from blue's baseline. For example, if green's error rate is more than 2x blue's historical average, trigger an alert. If green's database query time is 50% higher, investigate before shifting more traffic.</p>
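<p>The two rules above can be expressed as a small comparison function. This is only a sketch of the decision logic, assuming you have already pulled the metric values from your monitoring system. The function name and parameters are hypothetical, and in practice you would likely express the same thresholds as CloudWatch alarms.</p>

```python
def green_divergence_reasons(blue_error_rate: float, green_error_rate: float,
                             blue_query_ms: float, green_query_ms: float) -> list[str]:
    """Return the reasons green looks worse than blue's baseline, if any."""
    reasons = []
    # Rule 1: green's error rate is more than 2x blue's historical average
    if green_error_rate > 2 * blue_error_rate:
        reasons.append("error rate is more than 2x blue's baseline")
    # Rule 2: green's database query time is at least 50% higher than blue's
    if green_query_ms >= 1.5 * blue_query_ms:
        reasons.append("database query time is 50% or more above blue's")
    return reasons


# Hypothetical readings: error rate tripled, query time roughly unchanged
print(green_divergence_reasons(0.01, 0.03, 40.0, 45.0))
```

<p>An empty list means it is reasonable to keep shifting traffic. Any entry is a signal to pause and investigate.</p>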
<h4 id="heading-log-aggregation">Log aggregation</h4>
<p>Ensure logs from both environments are tagged with their environment (<code>environment: blue</code> or <code>environment: green</code>) so you can filter and compare them. Use CloudWatch Logs Insights queries to spot patterns.</p>
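<p>One way to attach the environment tag is a JSON log formatter. The sketch below uses only Python's standard <code>logging</code> module. The class and function names are hypothetical, and it assumes your application writes structured logs to a stream that is shipped to CloudWatch.</p>

```python
import io
import json
import logging


class EnvironmentJsonFormatter(logging.Formatter):
    """Format each record as one JSON line that carries the environment tag."""

    def __init__(self, environment: str):
        super().__init__()
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "environment": self.environment,
            "level": record.levelname,
            "message": record.getMessage(),
        })


def build_logger(environment: str, stream) -> logging.Logger:
    """Create a logger whose output is tagged with 'blue' or 'green'."""
    logger = logging.getLogger(f"app.{environment}")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # Avoid duplicate handlers on repeated calls
    logger.propagate = False  # Keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(EnvironmentJsonFormatter(environment))
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer here; a real service would log to stdout
buffer = io.StringIO()
build_logger("green", buffer).info("order created")
print(buffer.getvalue().strip())
```

<p>Because every line is JSON with an <code>environment</code> field, you can filter on that field when comparing blue and green.</p>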
<h2 id="heading-when-not-to-use-blue-green">When NOT to Use Blue-Green</h2>
<p>Blue-green isn't always the right choice. Avoid it when you have:</p>
<ul>
<li><p><strong>Very large database migrations</strong>: If your migration takes hours or requires significant locks, use a traditional maintenance window.</p>
</li>
<li><p><strong>Highly stateful applications</strong>: Real-time collaboration tools or WebSocket applications with complex in-memory state may need rolling deployments instead.</p>
</li>
<li><p><strong>Cost constraints</strong>: Running two environments doubles costs. Consider canary deployments for cost-sensitive applications.</p>
</li>
<li><p><strong>Complex data model redesigns</strong>: Use the strangler fig pattern to gradually migrate functionality to a new service.</p>
</li>
</ul>
<h3 id="heading-alternative-deployment-strategies">Alternative Deployment Strategies</h3>
<h4 id="heading-canary-deployments">Canary Deployments</h4>
<p>Route a small percentage of traffic (5-10%) to the new version first, then shift the rest once it proves healthy:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"trafficRouting"</span>: {
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"TimeBasedCanary"</span>,
    <span class="hljs-attr">"timeBasedCanary"</span>: {
      <span class="hljs-attr">"canaryPercentage"</span>: <span class="hljs-number">10</span>,
      <span class="hljs-attr">"canaryInterval"</span>: <span class="hljs-number">5</span>
    }
  }
}
</code></pre>
<h4 id="heading-rolling-deployments">Rolling Deployments</h4>
<p>Gradually replace old tasks with new ones:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"deploymentConfiguration"</span>: {
    <span class="hljs-attr">"maximumPercent"</span>: <span class="hljs-number">200</span>,
    <span class="hljs-attr">"minimumHealthyPercent"</span>: <span class="hljs-number">100</span>
  }
}
</code></pre>
<h2 id="heading-cleanup">Cleanup</h2>
<p>After you've successfully completed your blue-green deployment, validated the green environment, and run the contract phase, you need to clean up the AWS resources to avoid unnecessary costs and resource sprawl.</p>
<p><strong>What you're removing</strong>:</p>
<ul>
<li><p>The entire infrastructure stack (VPC, subnets, NAT gateways, load balancer, ECS cluster, RDS database, and all associated resources)</p>
</li>
<li><p>This is appropriate for a tutorial/testing scenario where you deployed everything from scratch</p>
</li>
</ul>
<p>Important considerations before cleanup:</p>
<ul>
<li><p>Ensure you have backups if you need to reference any data later</p>
</li>
<li><p>Export any logs or metrics you want to retain</p>
</li>
<li><p>Document lessons learned from the deployment</p>
</li>
<li><p>Verify no production traffic is still using these resources</p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> terraform

<span class="hljs-comment"># Terraform will prompt you to confirm with "yes"</span>
<span class="hljs-comment"># Review the destruction plan carefully before confirming</span>
terraform destroy  <span class="hljs-comment"># Takes ~10-15 minutes</span>
</code></pre>
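<p><strong>Partial cleanup</strong>: If you want to keep certain resources (like the RDS instance, for reference), you can remove them from Terraform state before destroying. Terraform then no longer manages them, so you'll need to delete them manually later. Resources the retained instance still depends on, such as its subnet group or security groups, may fail to delete until the instance itself is gone:</p>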
<pre><code class="lang-bash"><span class="hljs-comment"># Remove RDS from Terraform management before destroying</span>
terraform state rm aws_db_instance.main
terraform destroy  <span class="hljs-comment"># Now destroys everything except RDS</span>
</code></pre>
<p>For production environments, you would NOT destroy everything. Instead, you'd decommission the blue environment specifically after confirming green is stable:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Production scenario - remove only blue environment</span>
terraform destroy -target=aws_ecs_service.blue
terraform destroy -target=aws_lb_target_group.blue
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Blue-green deployments with databases require careful planning, but the expand-contract pattern makes it manageable.</p>
<p>Here are some key takeaways:</p>
<ol>
<li><p><strong>Use expand-contract as default</strong> – Maintains backwards compatibility and safe rollbacks.</p>
</li>
<li><p><strong>Externalize state</strong> – Sessions, caches, and storage should use external services.</p>
</li>
<li><p><strong>Plan for three phases</strong> – Don't rush to the contract phase.</p>
</li>
<li><p><strong>Test everything in staging</strong> – Mirror production scale and complexity.</p>
</li>
<li><p><strong>Monitor aggressively</strong> – Track technical and business metrics for both environments.</p>
</li>
<li><p><strong>Know when to use alternatives</strong> – Blue-green isn't always the answer.</p>
</li>
<li><p><strong>Document rollback procedures</strong> – Everyone should know the rollback process before deployment.</p>
</li>
</ol>
<p>The expand-contract pattern requires more work upfront, but this investment pays dividends in reduced risk and maintained uptime. With the strategies and complete implementation provided here, you can successfully deploy even complex, stateful applications with confidence.</p>
<p>As always, I hope you enjoyed this guide and learned something. If you want to stay connected or see more hands-on DevOps content, you can follow me on <a target="_blank" href="https://www.linkedin.com/in/destiny-erhabor">LinkedIn</a>.</p>
<p>For more practical hands-on Cloud/DevOps projects like this one, follow and star this repository: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building">Learn-DevOps-by-building</a>.</p>
<h2 id="heading-further-resources">Further Resources</h2>
<ul>
<li><p>Complete Code: <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs">github.com/Caesarsage/bluegreen-deployment-ecs</a></p>
</li>
<li><p>Learn DevOps by Building: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building">GitHub repo</a></p>
</li>
<li><p>AWS ECS Blue/Green Documentation: <a target="_blank" href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-bluegreen.html">AWS Docs</a></p>
</li>
<li><p>AWS CodeDeploy for ECS: <a target="_blank" href="https://docs.aws.amazon.com/codedeploy/latest/userguide/deployment-steps-ecs.html">AWS Docs</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Run a Docker Container in AWS Lambda ]]>
                </title>
                <description>
                    <![CDATA[ While containers are quite lightweight and provide various benefits, it can be challenging to decide how best to deploy them. There are a number of ways to deploy and run Docker containers. But some are best for orchestrating and managing containers,... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-run-a-docker-container-in-aws-lambda/</link>
                <guid isPermaLink="false">694c7990b7478745bce04604</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ lambda ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ containerization ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ecr ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Agnes Olorundare ]]>
                </dc:creator>
                <pubDate>Wed, 24 Dec 2025 23:38:56 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766599506861/86c07e37-7838-4186-971e-29722ccec785.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>While containers are quite lightweight and provide various benefits, it can be challenging to decide how best to deploy them. There are a number of ways to deploy and run Docker containers. But some are best for orchestrating and managing containers, and may not suit a simple use case of running just one container.</p>
<p>In this article, I’ll teach you how you can deploy a single Docker container using a serverless service on AWS called Lambda.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisite-requirements">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-serverless-with-aws-lambda">Serverless with AWS Lambda</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-run-and-test-a-container-locally">How to Build, Run, and Test a Container Locally</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-push-your-image-to-amazon-elastic-container-registry-ecr">How to Push Your Image to Amazon Elastic Container Registry (ECR)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-deploy-your-docker-image-to-lambda">How to Deploy Your Docker Image to Lambda</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-cleanup">Cleanup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisite-requirements">Prerequisites</h2>
<p>The following tools and skills are necessary for following along with this tutorial:</p>
<ul>
<li><p>Basic knowledge of Docker, with Docker installed locally</p>
</li>
<li><p>An AWS account and credentials that allow you to make API calls via the CLI. Administrative privileges will work, but best practice is to limit the permissions to exactly what this tutorial needs.</p>
</li>
<li><p>AWS CLI installed locally</p>
</li>
<li><p>A Python virtual environment manager <a target="_blank" href="https://github.com/astral-sh/uv">such as uv</a> (optional)</p>
</li>
</ul>
<h2 id="heading-serverless-with-aws-lambda">Serverless with AWS Lambda</h2>
<p>Containers provide a lightweight, consistent, and resource-friendly way of running applications. Serverless takes away the overhead of managing the underlying infrastructure on which the container runs. Combining the two lets you deploy applications while focusing on business logic, performance, and whatever gives your product a competitive edge.</p>
<p>One AWS tool that enables you to go serverless is Lambda. With Lambda, you’re only billed for the number of times the code in the function runs, the memory you selected at the time of provisioning the service, and the duration of each invocation of the function.</p>
<p>In addition to removing operational overhead, Lambda can also help you save money since you won’t have to deal with idle resources. The function only comes alive when triggered by a request sent to it.</p>
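<p>To make the billing model concrete, here is a rough cost estimate based on the three factors above: number of invocations, configured memory, and duration. It is a sketch, not a pricing calculator. The function name is hypothetical, the two rates passed in are example values for illustration, and the free tier is ignored, so check the current AWS Lambda pricing page for your region and architecture.</p>

```python
def estimate_lambda_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                         price_per_million_requests: float,
                         price_per_gb_second: float) -> float:
    """Estimate a bill from request count plus memory multiplied by duration."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Compute is billed in GB-seconds: duration (s) x memory (GB), per invocation
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


# Example: 1 million invocations, 100 ms each, 512 MB of memory.
# The rates below are assumed example values, not quoted prices.
cost = estimate_lambda_cost(
    invocations=1_000_000,
    avg_duration_ms=100,
    memory_mb=512,
    price_per_million_requests=0.20,
    price_per_gb_second=0.0000166667,
)
print(f"Estimated cost: ${cost:.2f}")
```

<p>With these example rates, one million short invocations cost roughly a dollar, and a function that is never invoked costs nothing for compute.</p>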
<h2 id="heading-how-to-build-run-and-test-a-container-locally">How to Build, Run, and Test a Container Locally</h2>
<p>Docker is a tool that helps you package applications into portable, standardized, and shareable units called containers. A container includes everything the application needs to run, such as libraries, the runtime, system tools, and the application code.</p>
<p>In this section, I’ll walk you through building the Docker image, running the container, and testing it after it’s running.</p>
<p>You can find the project that you’ll be using here in this <a target="_blank" href="https://github.com/Agnes4Him/freecodecamp-lambda-docker">GitHub repository</a>.</p>
<h3 id="heading-build-the-docker-image">Build the Docker Image</h3>
<p>To run a Docker container, you first need to build an image. The image becomes the template or <code>class</code> from which you create the container or <code>instance of the class</code>.</p>
<p>The application code that goes into the image is in <code>lambda_function.py</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># lambda_function.py</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lambda_handler</span>(<span class="hljs-params">event, context</span>):</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-comment"># Read the input inside the try block so a missing "name" key returns a 400</span>
        name = event[<span class="hljs-string">"name"</span>]
        message = <span class="hljs-string">f"Hello, <span class="hljs-subst">{name}</span>!"</span>

        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"statusCode"</span>: <span class="hljs-number">200</span>,
            <span class="hljs-string">"body"</span>: message
        }
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"statusCode"</span>: <span class="hljs-number">400</span>,
            <span class="hljs-string">"body"</span>: {<span class="hljs-string">"error"</span>: str(e)}
        }
</code></pre>
<p>As you can see from the code above, this is a very basic Python application. The handler expects a JSON event that contains the key <code>name</code> and a corresponding value (when testing locally, you’ll send this JSON in the body of a <code>POST</code> request). The code then returns a greeting containing the name it has received. The application has just a single function, which also serves as its entry point.</p>
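<p>Because the handler is a plain Python function, you can also call it directly before building any image. The sketch below repeats the handler so the block is self-contained, with the <code>name</code> lookup placed inside the <code>try</code> block so that a missing key produces the 400 response. Passing <code>None</code> as the context is an assumption that works here only because the handler never uses it.</p>

```python
def lambda_handler(event, context):
    try:
        # A missing "name" key raises KeyError, which is caught below
        name = event["name"]
        return {"statusCode": 200, "body": f"Hello, {name}!"}
    except Exception as e:
        return {"statusCode": 400, "body": {"error": str(e)}}


# Call the handler the way Lambda would, with an event and a context
ok = lambda_handler({"name": "Janet"}, None)
missing = lambda_handler({}, None)

print(ok)       # Success case
print(missing)  # Error case: the event has no "name" key
```

<p>The first call returns a 200 status with the greeting, and the second returns a 400 status with an error describing the missing key.</p>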
<p>To build a Docker image, you’ll need a Dockerfile to provide the blueprint for the image. For this specific case, the Dockerfile you’ll use is also very basic. Each command in a Dockerfile (such as <code>FROM</code>, <code>COPY</code>, or <code>CMD</code>) is called an <code>instruction</code>, and it tells Docker what to do when creating the image. So building a Docker image means creating a template for a container by following the instructions in the Dockerfile.</p>
<pre><code class="lang-plaintext"># Dockerfile

FROM public.ecr.aws/lambda/python:3.12

# Copy function code... LAMBDA_TASK_ROOT is /var/task, the working directory set in the base image
COPY lambda_function.py ${LAMBDA_TASK_ROOT}

# Set the CMD to your handler - lambda_handler
CMD ["lambda_function.lambda_handler"]
</code></pre>
<p>A Dockerfile usually starts with a base image. To deploy an application as a Docker container in AWS Lambda, the simplest option is an AWS-provided Lambda base image that matches the application’s runtime. For this case, you’ll need the Python runtime, so the base image is <code>public.ecr.aws/lambda/python:3.12</code>. It’s okay to use a different Python version.</p>
<p>The next instruction copies the <code>lambda_function.py</code> file to a specific path in the base image. That path is referenced using an environment variable that has already been defined in the base image and points to <code>/var/task</code>. This is the directory your code will be running from.</p>
<p>The last instruction sets <code>CMD</code> to the handler in <code>module.function</code> form. The base image’s entrypoint uses this value to locate and invoke <code>lambda_handler</code> when the container runs.</p>
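<p>Now, you can run the build command from the project’s root directory. The rest of this tutorial assumes the image is named <code>lambda-docker</code> with the tag <code>1.0.0</code>:</p>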
<pre><code class="lang-bash">docker build -t &lt;IMAGE_NAME&gt;:&lt;IMAGE_TAG&gt; .
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766415846066/f128b7fc-f3a0-4770-b361-3f27c36a6ec4.png" alt="Running docker build command on the terminal" class="image--center mx-auto" width="3710" height="891" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766415895836/d4653144-51b2-437d-8d73-4aaa42651206.png" alt="Output of docker images command showing a list of all existing images" class="image--center mx-auto" width="3710" height="891" loading="lazy"></p>
<h3 id="heading-run-the-docker-container">Run the Docker Container</h3>
<p>Next, let’s create a running container from this image.</p>
<pre><code class="lang-bash">docker run -it --rm -p 8080:8080 lambda-docker:1.0.0
</code></pre>
<p>The command above creates a container and runs it in interactive mode so you can see the logs generated by the application. The <code>-p 8080:8080</code> flag publishes port 8080 on the host and maps it to port 8080 in the container, which is where the Lambda Runtime Interface Emulator bundled in the AWS base image listens. The <code>--rm</code> flag removes the container automatically once you stop the process with CTRL + C.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766416250857/62584a3c-bf5e-4cd9-b8d5-fc6734c50075.png" alt="Showing docker run command in interactive mode" class="image--center mx-auto" width="3710" height="891" loading="lazy"></p>
<h3 id="heading-test-the-running-container">Test the Running Container</h3>
<p>Now confirm that the application running within the container can receive and process requests. To do this, use the code in the <code>test.py</code> file:</p>
<pre><code class="lang-python"><span class="hljs-comment"># test.py</span>

<span class="hljs-keyword">import</span> requests

url = <span class="hljs-string">"http://localhost:8080/2015-03-31/functions/function/invocations"</span>

data = {
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"Janet"</span>
}

response = requests.post(url, json=data)

print(<span class="hljs-string">"Status Code:"</span>, response.status_code)
print(<span class="hljs-string">"Response Body:"</span>, response.json())
</code></pre>
<p>You can use the Python <code>requests</code> library to make this call. Install the library in a virtual environment to isolate the project’s dependencies from your system Python. This helps prevent conflicts between the library versions that different applications need.</p>
<p>If you’re using uv to manage your virtual environment, simply run the command:</p>
<pre><code class="lang-bash">uv add requests
</code></pre>
<p>Then run the code in <code>test.py</code> from within the virtual environment:</p>
<pre><code class="lang-bash">uv run python3 test.py
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766419713310/1ebc3435-3826-46fb-93f3-4218c367e280.png" alt="Testing that the running docker container is working by running test.py file" class="image--center mx-auto" width="3710" height="891" loading="lazy"></p>
<p>You should see a <code>200</code> status code and a response body containing <code>Hello, Janet!</code> on the terminal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766419866358/8f0c2867-64c6-4b16-a5a7-5a0eedf9470f.png" alt="Docker container logs in real time" class="image--center mx-auto" width="3710" height="891" loading="lazy"></p>
<h2 id="heading-how-to-push-your-image-to-amazon-elastic-container-registry-ecr">How to Push Your Image to Amazon Elastic Container Registry (ECR)</h2>
<p>Now that you have a working Docker image to deploy to Lambda, the next step is to push the image to a Docker registry. For this use case, your image has to be pushed to Amazon ECR, a container registry for storing Docker images.</p>
<p>To push your Docker image, you first need to tag the image, which simply means naming the image in a specific way.</p>
<p>Currently, the image is tagged <code>lambda-docker:1.0.0</code>. ECR expects the image name to include the registry address, in the form <code>&lt;ACCOUNT_ID&gt;.dkr.ecr.&lt;AWS_REGION&gt;.amazonaws.com/&lt;REPO_NAME&gt;:&lt;TAG&gt;</code>, so you first need an ECR repository to point at. Let’s create one with the AWS CLI (this requires you to configure your AWS credentials locally by running the <code>aws configure</code> command).</p>
<h3 id="heading-setup-environment-variables">Set Up Environment Variables</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Set AWS profile</span>
<span class="hljs-built_in">export</span> AWS_PROFILE=&lt;PROFILE_NAME&gt;
</code></pre>
<pre><code class="lang-bash"><span class="hljs-comment"># Set other variables</span>

AWS_REGION=&lt;AWS_REGION&gt;
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REPO_NAME=lambda-docker
TAG=1.0.0
</code></pre>
<p>The above commands set the <code>AWS_PROFILE</code> for the CLI to target the right AWS account for API calls. The other variables specify the region, account ID, and the ECR repository name and tag.</p>
<h3 id="heading-create-ecr-repository-and-authenticate">Create ECR Repository and Authenticate</h3>
<p>Now, create the ECR repository:</p>
<pre><code class="lang-bash">aws ecr create-repository \
  --repository-name <span class="hljs-string">"<span class="hljs-variable">$REPO_NAME</span>"</span> \
  --region <span class="hljs-string">"<span class="hljs-variable">$AWS_REGION</span>"</span>
</code></pre>
<p>Authenticate to Amazon ECR:</p>
<pre><code class="lang-bash">aws ecr get-login-password --region <span class="hljs-string">"<span class="hljs-variable">$AWS_REGION</span>"</span> \
  | docker login \
  --username AWS \
  --password-stdin <span class="hljs-string">"<span class="hljs-variable">$ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com"</span>
</code></pre>
<h3 id="heading-tag-and-push-the-docker-image">Tag and Push the Docker Image</h3>
<p>Now, tag the Docker image:</p>
<pre><code class="lang-bash">docker tag <span class="hljs-variable">$REPO_NAME</span>:<span class="hljs-variable">$TAG</span> \
  <span class="hljs-variable">$ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$REPO_NAME</span>:<span class="hljs-variable">$TAG</span>
</code></pre>
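<p>The target name in the <code>docker tag</code> command follows a fixed pattern: registry address, repository name, then tag. As a small illustration, this sketch builds the same string in Python. The helper name is hypothetical, and <code>123456789012</code> and <code>us-east-1</code> are placeholder values rather than details from this tutorial.</p>

```python
def ecr_image_uri(account_id: str, region: str, repo_name: str, tag: str) -> str:
    """Build the full ECR image name used by docker tag and docker push."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repo_name}:{tag}"


# Placeholder account ID and region, with the repository and tag from above
print(ecr_image_uri("123456789012", "us-east-1", "lambda-docker", "1.0.0"))
```

<p>The same URI is what you select later when creating the Lambda function from the image.</p>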
<p>Push the image to the ECR repository you created:</p>
<pre><code class="lang-bash">docker push <span class="hljs-variable">$ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$REPO_NAME</span>:<span class="hljs-variable">$TAG</span>
</code></pre>
<p>And that’s it! Your image is now in ECR.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766420761622/5a18e41b-be41-4660-8d6c-59b12aebb4de.jpeg" alt="Image of Amazon ECR showing the repository created earlier" class="image--center mx-auto" width="1920" height="1037" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766420810814/9f65af4b-a509-45e3-be8f-0bed08cfe6b2.png" alt="Image of the docker image pushed to the existing ECR repository" class="image--center mx-auto" width="3710" height="1996" loading="lazy"></p>
<h2 id="heading-how-to-deploy-your-docker-image-to-lambda">How to Deploy Your Docker Image to Lambda</h2>
<p>With your image now in ECR, you can create a Lambda function. Navigate to the Lambda console, and click <code>Create a Function</code>.</p>
<h3 id="heading-create-lambda-function">Create Lambda Function</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766421062231/19bae74d-a6d5-4e73-8cca-102be40be214.png" alt="AWS Lambda Console" class="image--center mx-auto" width="3710" height="1996" loading="lazy"></p>
<p>Select <code>Container Image</code>, then search for the ECR repository you created.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766421207358/25ae6eb2-1b1b-43c7-86dc-6dcd512ddc81.jpeg" alt="Select ECR repository to create a Lambda function" class="image--center mx-auto" width="3710" height="1996" loading="lazy"></p>
<p>Next, select the image:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766421335963/ab7d9103-0ea6-4e25-be8c-139344acb5c5.png" alt="Select the existing Docker image from ECR" class="image--center mx-auto" width="3710" height="1996" loading="lazy"></p>
<p>Leave the other configurations at their defaults and click Create.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766421506518/2f6e631a-a0c7-4f20-966f-2ef87f91bfb7.jpeg" alt="Hit the Create button to create a Lambda function" class="image--center mx-auto" width="1920" height="1033" loading="lazy"></p>
<p>After the function is created, navigate to it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766421673261/71c60ac4-35e7-4458-b4a7-1be2440b9e16.jpeg" alt="The newly created Lambda function dashboard/ overview" class="image--center mx-auto" width="3710" height="1996" loading="lazy"></p>
<h3 id="heading-test-deployment">Test Deployment</h3>
<p>Now, let’s test the deployment. For this, simply use the Lambda <code>Test</code> tab. Provide the details needed, including the JSON event payload (for example, <code>{"name": "Janet"}</code>).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766421769909/008473e4-bb28-4fdd-8c5b-7e1f3489a3a0.png" alt="Create a new test instance to test the Lambda function" class="image--center mx-auto" width="3710" height="1996" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766421889043/86f6dbe6-be94-4dca-973e-9e7b68064ff3.png" alt="The output of testing Lambda function" class="image--center mx-auto" width="3710" height="1996" loading="lazy"></p>
<p>And that’s it. You’ve successfully deployed a Docker container on AWS by leveraging ECR and Lambda. You can go a step further by integrating API Gateway and making the function accessible from the internet.</p>
<h2 id="heading-cleanup">Cleanup</h2>
<p>Remember to delete the resources you’ve created, namely the Lambda function and the ECR repository (including the images in it), to avoid extra charges.</p>
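<p>If you prefer to script the cleanup, the sketch below shows one way to do it with boto3. It assumes boto3 is installed and your credentials are configured. The function names and the <code>function_name</code> argument are hypothetical, since the name you gave your Lambda function in the console is not fixed by this tutorial.</p>

```python
def cleanup(lambda_client, ecr_client, function_name: str, repo_name: str) -> None:
    """Delete the Lambda function, then the ECR repository and its images."""
    lambda_client.delete_function(FunctionName=function_name)
    # force=True deletes the repository even though it still contains images
    ecr_client.delete_repository(repositoryName=repo_name, force=True)


def run_cleanup(function_name: str, repo_name: str, region: str) -> None:
    """Create real AWS clients and run the cleanup against your account."""
    import boto3  # Imported here so cleanup() can be exercised without AWS access

    cleanup(
        boto3.client("lambda", region_name=region),
        boto3.client("ecr", region_name=region),
        function_name,
        repo_name,
    )
```

<p>Calling <code>run_cleanup</code> with your function name, <code>lambda-docker</code>, and your region removes both resources. The deletion is permanent, so double-check the names first.</p>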
<h2 id="heading-conclusion">Conclusion</h2>
<p>Deploying your Docker container on AWS Lambda is an efficient way to get your application running quickly without being bothered by managing servers or platforms.</p>
<p>Thanks for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Cloud Security Fundamentals in AWS – A Guide for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ Security is a vital part of every system and infrastructure. The word "security" comes from the Latin securitas, which is composed of se- (meaning “without”) and cura (meaning “care” or “worry”). Originally, it meant "without worry." Over time, it ha... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-cloud-security-fundamentals-in-aws-a-guide-for-beginners/</link>
                <guid isPermaLink="false">69377429f6e4912378332463</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cloud security ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #sharedresponsibilitymodel ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ BeginnerFriendly ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ijeoma Igboagu ]]>
                </dc:creator>
                <pubDate>Tue, 09 Dec 2025 00:58:17 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764958826138/12b261c3-9a38-4b0b-b67a-48ce54452d5f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Security is a vital part of every system and infrastructure. The word "security" comes from the Latin <em>securitas</em>, which is composed of <em>se-</em> (meaning “without”) and <em>cura</em> (meaning “care” or “worry”). Originally, it meant "without worry." Over time, it has come to signify being safe or protected.</p>
<p>Today, when we discuss security, we usually refer to protection from harm, danger, or threats, whether in our homes, online, while using online banking, or even across an entire country. Security is important in everything we do.</p>
<p>Cloud providers, such as AWS, are no exception. Their infrastructure must be safeguarded to ensure users’ peace of mind. But on platforms like AWS, security is a <strong>shared responsibility</strong>. This means that both the provider and the user play a role in maintaining security.</p>
<p>Amazon Web Services (AWS) is one of the most popular cloud service providers worldwide. With great power and flexibility comes the responsibility to secure your infrastructure, data, and applications in the cloud.</p>
<p>In this tutorial, we’ll explore the fundamental aspects of cloud security in AWS – especially those that are your responsibility – making it easy to understand if you’re new to cloud computing.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-cloud-security">What is Cloud Security?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-is-cloud-security-important">Why is Cloud Security Important?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-cloud-security-concepts">Key Cloud Security Concepts</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-a-root-user">What is a Root User?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-an-iam-user-for-daily-tasks">How to Create an IAM User for Daily Tasks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-differences-between-the-root-user-and-an-iam-user">Key Differences Between the Root User and an IAM User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-mfa">What is MFA?</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-aws-shared-responsibility-model">Understanding the AWS Shared Responsibility Model</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-rds-relational-database-service">RDS (Relational Database Service)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-s3-simple-storage-service">S3 (Simple Storage Service)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-give-a-user-permission">How to give a user permission</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-the-policy">Testing the Policy</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-further-reading">Further Reading</a></li>
</ul>
</li>
</ol>
<h2 id="heading-what-is-cloud-security">What is Cloud Security?</h2>
<p>Cloud security is the set of rules, tools, and practices used to protect your data, apps, and services stored online (in the "cloud"). It helps prevent data loss, hacking, and misuse of information.</p>
<p>Think of cloud security like locking the doors of your house. You wouldn’t leave your doors open for anyone to enter. And in the same way, your cloud account must be secured so that your data remains safe.</p>
<p>If your cloud services aren't secure, hackers could steal your data or cause major damage. Whether you're a business or just someone using cloud apps, keeping your information safe is essential.</p>
<h2 id="heading-why-is-cloud-security-important">Why is Cloud Security Important?</h2>
<p>Cloud security matters because it ensures that only the right people have access to your information. It protects your data from being lost, stolen, or misused. With good security in place, your applications can run safely without being exposed to attacks.</p>
<p>It also helps you keep your personal or business data private. When your cloud environment is well-protected, the risk of data breaches and financial loss is greatly reduced.</p>
<p>Now that you understand why cloud security is important, let’s look at how AWS helps you stay secure and what your own role is in keeping things safe.</p>
<h2 id="heading-key-cloud-security-concepts">Key Cloud Security Concepts</h2>
<p>In AWS, cloud security is the responsibility of both AWS <strong>and</strong> the customer. This model is called the Shared Responsibility Model.</p>
<p>But before learning how AWS divides security duties, you need to understand that while AWS protects its infrastructure, you must protect your own account.</p>
<p>Let’s discuss some key security concepts that are your responsibility, so you know how to do your part in the shared responsibility model.</p>
<h3 id="heading-what-is-a-root-user">What is a Root User?</h3>
<p>When you create an AWS account, the first identity that’s created is the <strong>root user</strong>. This identity has full, unrestricted control. It can delete resources, change ownership, and even close your entire AWS account. Because of this, it’s risky to use it for everyday tasks.</p>
<p>AWS recommends using root only for a few important account-level actions.</p>
<p>Certain tasks require a root user account, so you will need to use it occasionally. Such tasks include:</p>
<ul>
<li><p>Updating billing and payment information</p>
</li>
<li><p>Closing your AWS account</p>
</li>
<li><p>Changing the root account email</p>
</li>
<li><p>Recovering or resetting MFA for the root user</p>
</li>
</ul>
<p>Apart from these few tasks, avoid using the root user completely. Your everyday work should be done through IAM users, not the root user.</p>
<h3 id="heading-how-to-create-an-iam-user-for-daily-tasks">How to Create an IAM User for Daily Tasks</h3>
<p>Before you start creating any infrastructure in your AWS account, you need an IAM user with the right permissions.</p>
<p>Here’s how to create an IAM user, step by step:</p>
<ul>
<li><p>Open the AWS console.</p>
</li>
<li><p>Search for <strong>IAM</strong>, then select it. This takes you to the <strong>IAM page</strong>.</p>
</li>
<li><p>On the left-hand side, you will see <strong>Users</strong>.</p>
</li>
<li><p>Click on it. This takes you to the <strong>Users</strong> page.</p>
</li>
<li><p>Click the <strong>Create user</strong> button. This opens the <strong>Specify user details</strong> page, where you will create the <strong>IAM user</strong>.</p>
</li>
<li><p>Enter a username (for example, <code>adminuser</code>).</p>
</li>
<li><p>Click on “Provide user access to the AWS Management Console”.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764236205525/47c2da54-537f-4242-9048-ced4b7ca6172.png" alt="providing a user access to the AWS console" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<ol start="4">
<li><p>Scroll down and click on “Set a password,” or let AWS generate one for you.</p>
</li>
<li><p>Click <strong>Next</strong> to go to the permissions page.</p>
</li>
<li><p>Select <strong>Attach policies directly</strong>.</p>
</li>
<li><p>Choose <code>AdministratorAccess</code>. This permission gives the IAM user full access to perform all administrative tasks in your account.</p>
</li>
<li><p>Click <strong>Create user</strong>.</p>
</li>
</ol>
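<p>The console steps above can also be scripted. Here is a minimal sketch, assuming Python with the boto3 SDK and credentials that are already configured. The function name <code>create_admin_user</code> is illustrative, and the IAM client is passed in as a parameter so the snippet stays self-contained.</p>

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def create_admin_user(iam_client, username, temporary_password):
    """Create a console-enabled IAM user and attach AdministratorAccess.

    iam_client is expected to be boto3.client("iam"); it is passed in so the
    function can also be exercised with a stand-in object.
    """
    # Equivalent of entering the username on the "Specify user details" page.
    iam_client.create_user(UserName=username)
    # Equivalent of "Provide user access to the AWS Management Console".
    iam_client.create_login_profile(
        UserName=username,
        Password=temporary_password,
        PasswordResetRequired=True,  # force a new password at first sign-in
    )
    # Equivalent of choosing the AdministratorAccess managed policy.
    iam_client.attach_user_policy(UserName=username, PolicyArn=ADMIN_POLICY_ARN)
    return username
```

<p>With real credentials, you would call it as <code>create_admin_user(boto3.client("iam"), "adminuser", temporary_password)</code>, where <code>temporary_password</code> is a value of your choice that meets your account's password policy.</p>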
<p>Once you’ve created this user, sign in with it and use it for your day-to-day tasks. The root user should stay locked down and only be used for rare account-level changes.</p>
<h4 id="heading-video-walkthrough-of-how-to-create-an-iam-user">Video Walkthrough of How to Create an IAM User:</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764235482269/bf5b1f00-13b2-4ff5-b1e4-e6b2fd2796cb.gif" alt="bf5b1f00-13b2-4ff5-b1e4-e6b2fd2796cb" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-key-differences-between-the-root-user-and-an-iam-user">Key Differences Between the Root User and an IAM User</h3>
<p>Just to be clear, let’s summarise the differences between these two accounts:</p>
<h4 id="heading-root-account">Root Account</h4>
<p>This is the very first account created when you set up AWS. It has unlimited power – literally, everything in the account can be changed, deleted, or closed.</p>
<p>It’s meant for rare, high-level tasks like billing changes, MFA resets, or closing the account. Because it’s so powerful, you shouldn’t use it for daily work.</p>
<h4 id="heading-iam-user-account">IAM User Account</h4>
<p>This is a user you create inside your AWS account for everyday tasks. You can assign specific permissions, like admin or limited access, to this user. It’s much safer because you can control what it can and cannot do.</p>
<p>If something goes wrong or the credentials are compromised, the blast radius is much smaller than for the root user.</p>
<p>In short, the root user is the master key: too powerful for daily use. IAM users are customizable and safer for your regular work.</p>
<p>Here’s a helpful visual to show the differences between the two as well:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764248147852/7e053766-f6ba-4a17-9b63-c20695f2933c.jpeg" alt="differences between the two account" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now that you have both your root user and IAM user set up properly, let’s go back to the concept of multi-factor authentication, or MFA.</p>
<h3 id="heading-what-is-mfa">What is MFA?</h3>
<p>MFA adds another layer of security when you sign in. It combines something you know, like your password, with something you have, such as a phone or security device. Even if someone gets your password, they can’t log in without your MFA code.</p>
<p>You can enable MFA in several ways:</p>
<ul>
<li><p>Using a virtual MFA app like Google Authenticator or Authy</p>
</li>
<li><p>Using a physical security key such as a YubiKey</p>
</li>
<li><p>Using a hardware device from Gemalto</p>
</li>
<li><p>For AWS GovCloud users, using an MFA device from SurePassID</p>
</li>
</ul>
<p>Enabling MFA makes sure that even if someone gets your password, they still can’t access your account without the second authentication step.</p>
<p>For this tutorial, we’ll use the <strong>Google Authenticator app</strong>, which you can download for free from Google Play or the Apple App Store.</p>
<p><strong>How do I turn this on for my account?</strong></p>
<ul>
<li><p>Go to your AWS account.</p>
</li>
<li><p>At the top right corner, you’ll see a menu with your account username or ID. Click on it to open the drop-down.</p>
</li>
<li><p>You’ll see <strong>Security Credentials</strong>. Click on it. This will take you to the IAM-Security Credentials page.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764342765366/76b8ef01-9be2-4d93-918e-aaadf9e771ed.png" alt="security credentials" width="600" height="400" loading="lazy"></p>
<ul>
<li><p>At the top of the page, you’ll see a button labelled <strong>Assign MFA device</strong>. Click on it.</p>
</li>
<li><p>You’ll be redirected to a new page where you can choose the type of MFA device you want to use. Scroll down and select <strong>Virtual MFA device</strong> (this is what the Google Authenticator app uses).</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764342668287/5867caaf-af5d-4dfe-b026-68a16dfa1ad1.png" alt="MFA page selection" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Then just follow the on-screen instructions:</p>
<ul>
<li><p>Open the <strong>Google Authenticator</strong> app on your phone.</p>
</li>
<li><p>Tap the <strong>+</strong> button and scan the QR code displayed on the AWS screen.</p>
</li>
<li><p>Enter two consecutive codes generated by the app to verify your device.</p>
</li>
</ul>
<p>Once verified, AWS will link the MFA device to your account and take you back to the Security credentials page. If you scroll down, you’ll see your MFA device listed as <strong>assigned</strong>.</p>
<p>The next time you log in to AWS, you’ll be prompted to enter your MFA code from the Google Authenticator app before you can access your console.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764345845580/2eec5886-7ce7-43e0-8612-1035655a0619.gif" alt="MFA verification at login" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
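<p>If you're curious where those six-digit codes come from, virtual MFA apps implement the time-based one-time password (TOTP) algorithm from RFC 6238. The sketch below is a simplified illustration using only Python's standard library. The secret is the public test value from the RFC, not a real one. In practice, the secret is shared through the QR code you scan.</p>

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    counter = int(at_time) // step                 # number of elapsed 30-second windows
    message = struct.pack(">Q", counter)           # counter as 8 big-endian bytes
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                     # dynamic truncation (RFC 4226)
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Test secret from RFC 6238; a real secret comes from the QR code.
demo_secret = b"12345678901234567890"
now = time.time()
# Two consecutive codes, like the pair AWS asks for during setup.
print(totp(demo_secret, now), totp(demo_secret, now + 30))
```

<p>Because the code depends on both the shared secret and the current time window, a stolen password alone isn't enough to sign in.</p>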
<p>Always enable MFA for both your root user account and your IAM user account, as it’s one of the simplest and most effective ways to protect your AWS account.</p>
<p>Now that you understand these security fundamentals, we can get back to the shared responsibility model.</p>
<h2 id="heading-understanding-the-aws-shared-responsibility-model">Understanding the AWS Shared Responsibility Model</h2>
<p>The AWS Shared Responsibility Model divides responsibilities between AWS and the customer.</p>
<h3 id="heading-1-awss-responsibility-security-of-the-cloud">1. AWS’s Responsibility (Security of the Cloud)</h3>
<p>AWS is responsible for protecting the infrastructure that runs the services offered in the AWS Cloud. This includes physical security, hardware, software, networking, and facilities.</p>
<h3 id="heading-2-customers-responsibility-security-in-the-cloud">2. Customer’s Responsibility (Security in the Cloud)</h3>
<p>The customer is responsible for securing the data, user accounts, applications, and configurations they store in the cloud.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746970123152/016f8eef-fa2c-4bcb-866b-323fe7585b9d.png" alt="The shared responsibility model in AWS" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><strong><em>Image source:</em></strong> <a target="_blank" href="https://aws.amazon.com/compliance/shared-responsibility-model/"><em>AWS shared responsibility model</em></a></p>
<p>For example, AWS is responsible for securing its data centres and servers. But customers also have a role to play by properly configuring their accounts and resources.</p>
<p>Let’s take two popular AWS services, <strong>RDS (Relational Database Service)</strong> and <strong>S3 (Simple Storage Service)</strong>, as examples.</p>
<h3 id="heading-rds-relational-database-service">RDS (Relational Database Service)</h3>
<p><strong>AWS responsibilities:</strong></p>
<ul>
<li><p>Automates database patching</p>
</li>
<li><p>Audits and maintains the underlying instance and storage disks</p>
</li>
<li><p>Applies operating system patches automatically</p>
</li>
</ul>
<p><strong>Customer responsibilities (you):</strong></p>
<ul>
<li><p>Manage in-database users, roles, and permissions</p>
</li>
<li><p>Choose whether your database is public or private</p>
</li>
<li><p>Review and control inbound rules, ports, and IP addresses in the database’s security group</p>
</li>
<li><p>Configure database encryption settings</p>
</li>
</ul>
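<p>To make the customer side more concrete, here is a small sketch of how you might review inbound rules for a database's security group. It assumes Python, and it works on a hand-written dictionary shaped like one entry from the EC2 <code>describe_security_groups</code> response. The group ID, the port (3306, the MySQL default), and the function name are all illustrative.</p>

```python
def find_open_ingress(security_group: dict, port: int) -> list:
    """Return the CIDR ranges that expose `port` to the whole internet."""
    open_ranges = []
    for rule in security_group.get("IpPermissions", []):
        from_port = rule.get("FromPort")
        to_port = rule.get("ToPort")
        # Rules with IpProtocol "-1" (all traffic) carry no port range.
        covers_port = from_port is None or (from_port <= port <= to_port)
        if not covers_port:
            continue
        for ip_range in rule.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                open_ranges.append(ip_range["CidrIp"])
    return open_ranges


# Hand-written sample, not real account data.
sample_group = {
    "GroupId": "sg-0123456789abcdef0",  # hypothetical ID
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.10/32"}]},
    ],
}
print(find_open_ingress(sample_group, 3306))
```

<p>A non-empty result for the database port is a sign that the inbound rule should be narrowed to known IP addresses.</p>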
<h3 id="heading-s3-simple-storage-service">S3 (Simple Storage Service)</h3>
<p><strong>AWS responsibilities:</strong></p>
<ul>
<li><p>Ensures encryption options are available for your data</p>
</li>
<li><p>Guarantees virtually unlimited storage capacity</p>
</li>
<li><p>Prevents AWS employees and the public from accessing your data</p>
</li>
<li><p>Keeps each customer’s data separated from others</p>
</li>
</ul>
<p><strong>Customer responsibilities (you):</strong></p>
<ul>
<li><p>Define your S3 bucket policies according to your security standards</p>
</li>
<li><p>Review bucket configuration settings</p>
</li>
<li><p>Create and manage IAM users and roles with the right permissions</p>
</li>
</ul>
<p>Now you understand who’s responsible for what.</p>
<h2 id="heading-how-to-give-a-user-permission">How to Give a User Permission</h2>
<p>Security in the cloud isn’t just about strong passwords or enabling MFA – it’s also about controlling <em>who</em> can access <em>what</em>. One of the most important principles in AWS security is to grant users only the access they actually need, nothing more. That’s how you keep your environment safe and your resources protected.</p>
<p>So here’s a key question: how do we decide who should be allowed or denied access in the cloud?</p>
<h3 id="heading-demonstration">Demonstration</h3>
<p>Let’s walk through a simple, real-life example together.</p>
<p>Imagine you have a developer on your team who needs access to an S3 bucket named <strong>demo-test-app-ij</strong>. The goal is to let them upload and view files in the bucket, but not delete anything.</p>
<p>We already created a user earlier in this guide, so we’ll use that same one here.</p>
<p>To get started, go to <strong>IAM</strong> from your AWS Management Console. Then click on <strong>Users</strong> from the left-hand menu.</p>
<p>Select the user we created earlier. If you don’t have one yet, go back and follow the steps I showed you before to create a new IAM user.</p>
<p>Once you click on the user’s name, you’ll be taken to the <strong>Permissions</strong> page. On the permissions page, click on <strong>Add permissions</strong>.</p>
<p>From the dropdown options, select <strong>Create inline policy</strong>. This will open the <strong>Specify permissions</strong> page, where you’ll define the user’s access.</p>
<p>Scroll down through the list of services and select <strong>S3</strong>. In our example, we’re using S3 because we want to control access to a specific bucket.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764317236877/10517fde-3bbd-4af3-9e16-289793894903.png" alt="An example showing the permission page" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Once you select the service you want to define permissions for, the <strong>Actions</strong> and <strong>Resources</strong> sections will appear automatically.</p>
<p>In the Actions section, you’ll see a list of what the user can do with the service. Here, you can toggle the effect button to either “Allow” or “Deny.”</p>
<p>Under Actions, scroll through the list and find <strong>DeleteObject</strong>. Select it with the effect set to <strong>Deny</strong>. This ensures the user won’t be able to delete any files from the bucket.</p>
<p>Next, move on to the Resources section. Here, you’ll specify which bucket these permissions apply to.</p>
<p>Add the following resource ARN: <code>arn:aws:s3:::demo-test-app-ij/*</code>. The trailing <code>/*</code> means the rule applies to every object inside the <strong>demo-test-app-ij</strong> bucket.</p>
<p>Once you’ve added the ARN and confirmed the settings, click <strong>Save policy</strong>.</p>
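<p>Behind the visual editor, the policy is just a JSON document. The sketch below, written in Python for illustration, builds what the steps above should roughly produce. The <code>Sid</code> value is a made-up label, and the exact JSON the console generates may differ slightly.</p>

```python
import json

BUCKET = "demo-test-app-ij"

deny_delete_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyObjectDeletion",  # illustrative statement ID
            "Effect": "Deny",
            "Action": "s3:DeleteObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",  # every object in the bucket
        }
    ],
}

policy_json = json.dumps(deny_delete_policy, indent=2)
print(policy_json)
```

<p>You could paste the printed JSON into the JSON tab of the policy editor, or pass it as the <code>PolicyDocument</code> argument of boto3's IAM <code>put_user_policy</code> call.</p>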
<p>Now, let’s put all these instructions together in a practical example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764335991549/1d33af0a-23c8-4790-aab4-b629db4b7ba9.gif" alt="1d33af0a-23c8-4790-aab4-b629db4b7ba9" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-testing-the-policy">Testing the Policy</h3>
<p>Now it’s time to confirm that our permissions work the way we expect.</p>
<p>Head over to the <strong>S3</strong> service and open the bucket named <strong>demo-test-app-ij</strong>. Try uploading a file; it should upload successfully. Next, try deleting that same file. You’ll see an error message saying <strong>Failed to delete objects</strong>.</p>
<p>That’s exactly what we want! The user can upload and view files, but can’t delete them, because the explicit Deny we added overrides the Allow that comes from <code>AdministratorAccess</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764336439667/279c1f4b-7d5b-477a-b782-cf1378204430.gif" alt="279c1f4b-7d5b-477a-b782-cf1378204430" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
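<p>One way to picture this result: in IAM, an explicit Deny always beats an Allow, and anything not allowed is denied by default. The toy evaluator below, in Python, is a heavily simplified model of that rule. It isn't how AWS implements it, and it ignores conditions, principals, and other policy types. The object name <code>report.pdf</code> is made up.</p>

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Simplified IAM decision: explicit Deny beats Allow, default is deny."""
    allowed = False
    for stmt in statements:
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        resources = stmt["Resource"] if isinstance(stmt["Resource"], list) else [stmt["Resource"]]
        matches = any(fnmatchcase(action, a) for a in actions) and any(
            fnmatchcase(resource, r) for r in resources
        )
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit Deny always wins
        allowed = True    # remember the Allow, keep looking for a Deny
    return allowed


statements = [
    # What AdministratorAccess grants: every action on every resource.
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    # The inline policy from this demonstration.
    {"Effect": "Deny", "Action": "s3:DeleteObject",
     "Resource": "arn:aws:s3:::demo-test-app-ij/*"},
]
obj = "arn:aws:s3:::demo-test-app-ij/report.pdf"
print(is_allowed(statements, "s3:PutObject", obj))
print(is_allowed(statements, "s3:DeleteObject", obj))
```

<p>The upload is allowed by the broad Allow statement, while the delete matches the Deny statement and is rejected.</p>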
<h2 id="heading-conclusion">Conclusion</h2>
<p>Security has always been about peace of mind. Whether it’s your home, your phone, or your cloud account, you’ll want to know your data is safe.</p>
<p>AWS gives you a strong foundation by securing the cloud itself. But your part matters too: things like enabling MFA, using strong passwords, and managing who can access what. These simple habits go a long way in keeping your data protected.</p>
<p>Cloud security isn’t a one-time setup. It’s an ongoing practice. When both AWS and its users stay alert, the cloud becomes a place you can trust to store, build, and grow with confidence.</p>
<p>Now that you have a basic understanding of how security works in AWS, you’re ready to go deeper and start exploring the services that keep it all running smoothly.</p>
<h3 id="heading-further-reading">Further Reading</h3>
<ul>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/cloud-computing-guide-for-beginners/">What is Cloud Computing? A Guide for Beginners</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/how-to-deploy-a-kubernetes-app-on-aws-eks/">How to Deploy a Kubernetes App on AWS EKS</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/best-aws-services-for-frontend-deployment/">The Best AWS Services to Deploy Front-End Applications in 2025</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/backend-as-a-service-beginners-guide/">What is Backend as a Service (BaaS)? A Beginner's Guide</a></p>
</li>
<li><p><a target="_blank" href="https://dev.to/ijay/the-hidden-challenges-of-building-with-aws-8mg">The Hidden Challenges of Building with AWS</a></p>
</li>
</ul>
<p>If you found this article helpful, feel free to share it. And if you prefer learning through videos, I also explain cloud topics in simple terms on my <a target="_blank" href="https://www.youtube.com/@cloudinreallife">YouTube channel</a>.</p>
<p>Stay updated with my projects by following me on <a target="_blank" href="https://twitter.com/ijaydimples">Twitter</a>, <a target="_blank" href="https://www.linkedin.com/in/ijeoma-igboagu/">LinkedIn</a> and <a target="_blank" href="https://github.com/ijayhub">GitHub</a>.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build an AI-Powered RAG Chatbot with Amazon Lex, Bedrock, and S3 ]]>
                </title>
                <description>
                    <![CDATA[ Chatbots are widely adopted among software companies, especially those that interact heavily with customers. It is typically used for tasks such as customer support, answering questions, and providing information on websites, apps, and messaging plat... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-an-ai-powered-rag-chatbot/</link>
                <guid isPermaLink="false">6930baf2becf9d70ca80fcd0</guid>
                
                    <category>
                        <![CDATA[ chatbot ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Chisom Uma ]]>
                </dc:creator>
                <pubDate>Wed, 03 Dec 2025 22:34:26 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764801036311/e1bb9ed8-f64e-433f-916f-fd3079aac4d3.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Chatbots are widely adopted among software companies, especially those that interact heavily with customers. They are typically used for tasks such as customer support, answering questions, and providing information on websites, apps, and messaging platforms.</p>
<p>These days, as expected, some chatbots are AI-powered and can generate answers to queries through Retrieval-Augmented Generation (RAG). I was curious about how this works, so I built one myself. In this article, we’ll look at how to build an AI-powered RAG chatbot.</p>
<p>For this tutorial, you’ll build a RAG chatbot that answers queries about travel policies to Mars. The chatbot retrieves its answers from our own data source (travel policy documents) stored in an S3 bucket. These documents serve as the internal data source the chatbot references when generating responses.</p>
<p>Instead of scripted responses from pre-trained data, the chatbot will pull contextual answers directly from the knowledge base.</p>
<p>Let's get started :)</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-retrieval-augmented-generation-rag">What is Retrieval-Augmented Generation (RAG)?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-amazon-bedrock">What is Amazon Bedrock?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-getting-started-access-models-on-bedrock">Getting Started: Access models on Bedrock</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-upload-travel-policy-documents-to-the-s3-bucket">Step 1: Upload Travel Policy Documents to the S3 Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-create-a-knowledge-base-in-amazon-bedrock">Step 2: Create a Knowledge base in Amazon Bedrock</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-create-an-amazon-lex-chatbot">Step 3: Create an Amazon Lex Chatbot</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-add-a-welcome-intent-to-your-chatbot">Step 4: Add a Welcome Intent to Your Chatbot</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-build-the-chatbot">Step 5: Build the Chatbot</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-adding-amazon-qnaintent">Step 6: Adding Amazon QnAIntent</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>An AWS account, logged in as an IAM user with admin privileges</p>
</li>
<li><p>Access to the Amazon Titan Embeddings G1 - Text model on Amazon Bedrock</p>
</li>
<li><p>Access to Anthropic Claude 3.5 Sonnet on Amazon Bedrock.</p>
</li>
<li><p>Access to travel policy documents. You can download these from Google Drive <a target="_blank" href="https://drive.google.com/file/d/1kyewU4eCFnaYS3wQ7Fyv22G3ycthbfJb/view">here</a>.</p>
</li>
<li><p>Experience using the AWS console.</p>
</li>
<li><p>No coding required.</p>
</li>
</ul>
<h2 id="heading-what-is-retrieval-augmented-generation-rag">What is Retrieval-Augmented Generation (RAG)?</h2>
<p>Large Language Models (LLMs) like GPT-4 and Claude are basically everywhere. They get some things amazingly right and others very interestingly wrong, like hallucinations, where the model generates factually incorrect or fabricated information. This brings us to the idea of RAG.</p>
<p><em>Marina Danilevsky</em>, <em>Senior Research Scientist at IBM</em>, in a <a target="_blank" href="https://youtu.be/T-D1OfcDW1M?si=YFYcEeulZpXf9AXN">lecture</a>, referred to RAG as a “framework” for helping LLMs be more accurate and up-to-date.</p>
<p>Before going into the full scope of RAG, let’s talk briefly about the “generation” part. Generation, in the context of RAG, refers to LLMs that generate texts in response to a user query, referred to as a prompt.</p>
<p>These LLMs can sometimes give incorrect answers due to limited context or outdated information, especially because they draw only on their training data. Imagine you're asked how many Grammy Awards your favorite artist has, and you give an answer you read in a magazine four years ago. You might be correct, but there are two problems with this answer: first, you didn't cite a source, and second, it's outdated.</p>
<p>This is the problem LLMs have traditionally had. The answers were outdated, and no credible sources were cited.</p>
<p>Now, imagine if you had looked up the answer first, from a reputable source on Google. Your answer would be more accurate and factual, and if there was ever a doubt from the person who asked the question, you could easily share the link to the reputable source on Google, and there would be no further doubts or questions.</p>
<p>What does this have to do with LLMs and RAG? Without RAG, the LLM answers only from its training data and risks giving outdated answers. With RAG, it first retrieves relevant information from a content store, which could comprise external sources, such as the internet, or internal sources, such as documents (which will be used in this tutorial). It then generates its answer from that retrieved content, so the answer is more accurate.</p>
<p>RAG helps the LLM stay up to date by further retrieving information from other sources rather than solely from its pre-trained data.</p>
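<p>The retrieve-then-generate flow can be sketched in a few lines of Python. This is a toy illustration only: it ranks passages by shared words, whereas the setup in this tutorial relies on an embedding model for retrieval, and it stops at building the prompt that would be sent to an LLM. The passages are made up and stand in for the travel policy documents.</p>

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents):
    """Augment the user's question with the retrieved passages."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up passages standing in for the travel policy documents.
docs = [
    "Passengers may bring up to 20 kg of luggage on flights to Mars.",
    "Refunds are available up to 30 days before departure.",
]
print(build_prompt("How much luggage can I bring to Mars?", docs))
```

<p>The key idea is that the model receives the relevant passage alongside the question, so it can ground its answer in that passage instead of relying only on what it learned during training.</p>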
<h2 id="heading-what-is-amazon-bedrock">What is Amazon Bedrock?</h2>
<p><a target="_blank" href="https://aws.amazon.com/bedrock/">Amazon Bedrock</a> is AWS's managed service that gives you access to foundation models, essentially the core AI engines that power generative AI applications. The beauty of Bedrock is that it handles all the heavy lifting for you. No need to provision GPUs, set up model pipelines, or deal with infrastructure headaches.</p>
<p>It's a single platform where you can experiment with, customize, and deploy top-tier AI models from providers like <em>Anthropic</em>, <em>Stability AI</em>, and Amazon's own <em>Titan</em> models (used in this tutorial).</p>
<p>Here's a practical example: let’s say you're building a customer support chatbot. With Bedrock, you simply pick a language model that fits your needs, fine-tune it for your specific use case, and integrate it into your app, all without touching server configuration or infrastructure code.</p>
<h2 id="heading-getting-started-access-models-on-bedrock">Getting Started: Access models on Bedrock</h2>
<p>To get access to models on AWS via Bedrock:</p>
<ul>
<li><p>Log in to your AWS account as an IAM user with admin privileges.</p>
</li>
<li><p>Navigate to <strong>Amazon Bedrock &gt; Model catalog.</strong></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764493545469/839771fb-c3aa-47d4-b1b9-60cd67ba26c5.png" alt="Image of Amazon bedrock model catalog page" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<ul>
<li>Locate the “Titan Embeddings G1 - Text” and “Claude 3.5 Sonnet” models.</li>
</ul>
<p>When you click these models, you are directed to a page with more details. You don’t need to do anything on this page. We will be using these models later in this tutorial. In the following sections, we’ll walk through the steps to build the chatbot.</p>
<h2 id="heading-step-1-upload-travel-policy-documents-to-the-s3-bucket">Step 1: Upload Travel Policy Documents to the S3 Bucket</h2>
<p>To upload documents, navigate to the Amazon S3 page in your AWS console, then create a bucket. For more details on creating a bucket, refer to the <a target="_blank" href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html">AWS documentation</a>. Next, upload the downloaded document to the S3 bucket.</p>
<p>Note that the document is zipped; you will need to unzip it before uploading.</p>
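<p>No code is needed for this step, but if you would rather script the upload, here is a minimal sketch in Python. It assumes the boto3 SDK, whose S3 client provides <code>upload_file</code>. The function name, folder path, and bucket name are placeholders, and the client is passed in as a parameter so the snippet stays self-contained.</p>

```python
from pathlib import Path


def upload_pdfs(s3_client, folder, bucket):
    """Upload every PDF in `folder` to `bucket`, using the file name as the key."""
    uploaded = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        s3_client.upload_file(str(pdf), bucket, pdf.name)  # Filename, Bucket, Key
        uploaded.append(pdf.name)
    return uploaded
```

<p>With real credentials, you would call it as <code>upload_pdfs(boto3.client("s3"), "path/to/unzipped/folder", "your-bucket-name")</code>.</p>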
<h2 id="heading-step-2-create-a-knowledge-base-in-amazon-bedrock">Step 2: Create a Knowledge base in Amazon Bedrock</h2>
<p>Now that we have created our S3 bucket and uploaded our documents, we can’t just hook up our Lex chatbot directly to the bucket. S3 isn’t really “smart” from an AI perspective. To get the AI capabilities needed to make this work, we need Amazon Bedrock.</p>
<p>First, we need to create a knowledge base in Amazon Bedrock.</p>
<p>To get started, head back to the Bedrock page opened up earlier and navigate to <strong>Build</strong> &gt; <strong>Knowledge Bases</strong>. Click <strong>Create</strong>. From the dropdown, choose “Knowledge base with vector store.” Leave IAM permissions as “Create and use a new service role”. This is what allows Bedrock to access other services. Choose “Amazon S3” as the data source type. Click <strong>Next</strong>.</p>
<p>Next, click <strong>Browse S3</strong> and select the created bucket with the uploaded documents. Click <strong>Next</strong>. On the next page, click <strong>Select model</strong> to choose an embedding model. Select “Titan Embeddings G1 - Text”, then select “Amazon OpenSearch Serverless” as the vector store and click <strong>Apply</strong>.</p>
<p>Leave everything else the same and click <strong>Next</strong>. On the next page, click <strong>Create Knowledge Base</strong>. Note that this takes some time (a few minutes), so you need to be patient with this step. Once your knowledge base is created, you’ll be taken to a new page with a message like one in the image below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764493716382/fd92d1d9-51ef-470f-bae9-b1cb636a260e.png" alt="image of successful Bedrock knowledge creation " class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The second message tells you that you need to sync the knowledge base with data sources. To do this, scroll down to the <em>Data source</em> section, select the data source, then click <strong>Sync</strong>. Wait a few seconds and everything syncs.</p>
<p><strong>Note:</strong> If you have more data than we have in this tutorial (just four PDFs), syncing may take longer.</p>
<p>Now, we have our Bedrock knowledge base set up. The knowledge base connects to the S3 bucket containing the travel documents.</p>
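<p>To build some intuition for what the embedding model and vector store do at query time, here is a toy TypeScript sketch of similarity search. It is not Bedrock's implementation. The three-number vectors and sample passages are invented for illustration (a real embedding model produces much longer vectors from your text), and <code>cosineSimilarity</code> and <code>topMatch</code> are hypothetical helper names.</p>

```typescript
// Toy illustration of vector retrieval. The vectors and passages below are
// invented; a real embedding model derives the vectors from the text.
type Chunk = { text: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
const cosineSimilarity = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const length = (v: number[]) => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (length(a) * length(b));
};

// Return the stored chunk whose vector is most similar to the query vector.
const topMatch = (query: number[], chunks: Chunk[]): Chunk =>
  chunks.reduce((best, chunk) =>
    cosineSimilarity(query, chunk.vector) > cosineSimilarity(query, best.vector) ? chunk : best
  );

const chunks: Chunk[] = [
  { text: "Meals during the trip can be expensed.", vector: [0.9, 0.1, 0.0] },
  { text: "Pets are not allowed on the shuttle.", vector: [0.0, 0.2, 0.9] },
];

// Pretend this vector is the embedding of a question about expenses.
const queryVector = [0.8, 0.2, 0.1];
console.log(topMatch(queryVector, chunks).text);
```

<p>Syncing the data source is what fills the vector store, which is why new documents can't be found until a sync has run.</p>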
<p>It's now time to create the chatbot. For this, we’ll use <a target="_blank" href="https://aws.amazon.com/lex/">Amazon Lex</a>.</p>
<h2 id="heading-step-3-create-an-amazon-lex-chatbot">Step 3: Create an Amazon Lex Chatbot</h2>
<p>In your AWS console, navigate to Amazon Lex, then click <strong>Create bot</strong>. Select <strong>Create a blank bot</strong> under the <em>Traditional</em> creation method. For the bot name, you can call it “Mars travel bot” or any name you prefer.</p>
<p>Under the “<em>IAM permissions</em>” section, select <strong>Create a role with basic Amazon Lex permissions</strong>. Under the “<em>Children’s Online Privacy Protection Act (COPPA)</em>” section, select <strong>No,</strong> since our bot isn't subject to COPPA, and click <strong>Next</strong>.</p>
<p>On the next page, enter a short description in the <em>Description</em> text field. Select your preferred voice interaction option available for text-to-speech. This is the voice your users will hear when they use the chatbot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764493888022/29ad9758-c0b7-42f4-8308-8255137c4649.png" alt="Image of Amazon Lex bot voices" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The cool thing about Lex is that you can play a voice sample for each voice. This can help you make the best decision for your business. Next, click <strong>Done</strong>.</p>
<h2 id="heading-step-4-add-a-welcome-intent-to-your-chatbot">Step 4: Add a Welcome Intent to Your Chatbot</h2>
<p>After hitting the <strong>Done</strong> button, you should see a page for creating an intent next. An intent is basically an action that fulfils a user's request.</p>
<p>Let's start with creating a welcome intent. To get started, change Intent name to “WelcomeIntent”. Then scroll down to the “<em>Sample utterances</em>” section and add utterances. These are example texts that you expect a user to type or speak when they start using your chatbot. So, if the user says “Hi” the chatbot responds with a welcome message. For this tutorial, I added the following expected utterances:</p>
<ul>
<li><p>“Hi”</p>
</li>
<li><p>“Hey”</p>
</li>
<li><p>“hello”</p>
</li>
</ul>
<p>You can add as many as you want.</p>
<p>In the “<em>Initial response</em>” section, you can provide a response to the user's utterance. Under the <strong>Message group</strong> dropdown, you can type in something like “Hi! welcome! How can I help you today?” Next, click the <em>Advanced options</em> button. This reveals a dialog box. Under <em>Set values</em>, select the “Wait for users input” option. You can select other options, but for this tutorial, we are going with this. Click <strong>Update options</strong>.</p>
<p>When you navigate back to the <em>Intents</em> page, you’ll notice that a “FallbackIntent” intent was generated for you automatically. Lex invokes this intent when a user's input doesn't match the utterances of any other intent, such as the welcome intent.</p>
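<p>The relationship between utterances, intents, and the fallback can be sketched in a few lines of TypeScript. This is a simplified illustration only: it matches text exactly (ignoring case and surrounding spaces), whereas Lex uses trained language models and also recognizes close variations. <code>resolveIntent</code> is a hypothetical helper, not a Lex API.</p>

```typescript
// Toy intent matcher: exact, case-insensitive comparison against sample utterances.
type Intent = { name: string; utterances: string[] };

const intents: Intent[] = [
  { name: "WelcomeIntent", utterances: ["Hi", "Hey", "hello"] },
];

const normalize = (text: string): string => text.trim().toLowerCase();

// Return the first intent with a matching utterance, or FallbackIntent when none match.
const resolveIntent = (input: string): string => {
  const match = intents.find((intent) =>
    intent.utterances.some((utterance) => normalize(utterance) === normalize(input))
  );
  return match ? match.name : "FallbackIntent";
};

console.log(resolveIntent("hello"));
console.log(resolveIntent("What is the baggage limit?"));
```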
<h2 id="heading-step-5-build-the-chatbot">Step 5: Build the Chatbot</h2>
<p>In the previous step, we built an intent for the bot. Now it's time to build the actual chatbot that bundles up all of this configuration into something usable.</p>
<p>To get started, click <strong>Build</strong> at the top-right side of your screen.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764494047899/cac4a609-f551-4b3e-9714-1330da3e7908.png" alt="Image of building bot" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Once the build completes, you’ll get a message at the top of the page. Now it’s time to test the bot. Click <strong>Test</strong> at the top-right side of your screen.</p>
<p>You get a pre-built chat window for testing your implementation. Enter an utterance, for example “Hi”, and you get the initial response. Remember, the utterance and initial response were set in the previous step.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764494109571/391fbcc9-b3b0-498c-85f9-43e28d12b58d.png" alt="Image of interaction with bot" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When you click <strong>Inspect</strong>, you’ll see the current intent, which in this case is WelcomeIntent.</p>
<p>At this point, we haven’t fully integrated the AI capabilities required to get answers about travel policies to Mars.</p>
<h2 id="heading-step-6-adding-amazon-qnaintent">Step 6: Adding Amazon QnAIntent</h2>
<p>The Amazon QnAIntent introduces GenAI capabilities to our bot. It is a built-in intent that uses Generative AI to fulfill Frequently Asked Questions (FAQ) requests by querying the authorized knowledge content.</p>
<p>To get started, navigate to <strong>Add intent &gt; Use built-in intent</strong> on the Intents page. Select the QnAIntent option, as shown in the image below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764494196339/d28020cb-c3a2-4595-aae0-f02463d422a7.png" alt="Image of built-in intent" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Give it a name of your choice. Click <strong>Add</strong>. You’ll be directed to the intent page. In the “<em>QnA configuration</em>” section, select <strong>Claude 3.5 Sonnet</strong> as the desired model.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764494285444/9d49daed-02ed-44a6-a7e6-763611f3a214.png" alt="Image of model and knowledge base config in Lex" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>For the ID, navigate back to <strong>Amazon Bedrock</strong> &gt; <strong>Knowledge Bases</strong>, copy the <em>Knowledge Base ID</em> of the knowledge base we created earlier, and paste it into the “Knowledge base for Amazon Bedrock Id” field. Click <strong>Save intent</strong>. Before testing your changes, click <strong>Build</strong> to build the bot again.</p>
<p>Now, let’s run a little test with the chatbot. I will be prompting it about items I can expense for my trip.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764494425584/65ec0a99-e438-44d3-ad62-965d82200142.png" alt="Image of AI implementation and interaction with chatbot, retrieving answers to queries from S3 via Bedrock knowledge base" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The image above shows me having a conversation with the chatbot. I sent an utterance for the welcome intent, and it responded with a welcome message. When I asked the chatbot about what items I can expense for the trip, it pulled the information from the Bedrock knowledge base, which is connected to the S3 bucket housing the travel policy documents.</p>
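<p>Conceptually, this is retrieval-augmented generation (RAG): passages retrieved from the knowledge base are placed in front of the user's question before the model answers. The TypeScript sketch below illustrates that idea only. The prompt wording and the <code>buildPrompt</code> helper are assumptions made for illustration, and the passage is a placeholder. The prompt that QnAIntent actually sends is not shown in this tutorial.</p>

```typescript
// Conceptual sketch of RAG prompt assembly.
// The prompt wording is an assumption for illustration only.
const buildPrompt = (question: string, passages: string[]): string => {
  // Number each retrieved passage so the model can tell them apart.
  const context = passages.map((passage, i) => `[${i + 1}] ${passage}`).join("\n");
  return [
    "Answer the question using only the context below.",
    "Context:",
    context,
    `Question: ${question}`,
  ].join("\n");
};

const prompt = buildPrompt("What items can I expense?", [
  "Placeholder passage retrieved from the knowledge base.",
]);
console.log(prompt);
```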
<p>Try experimenting with other questions like “How much does my trip cost?” or “Can I bring my pets?”</p>
<p>Want to add a proper web UI to your bot? Follow the step-by-step instructions in this <a target="_blank" href="https://github.com/aws-samples/aws-lex-web-ui">GitHub repository</a>.</p>
<p><strong>Note:</strong> When you're done, delete resources such as your knowledge base, S3 bucket, and vector store (navigate to <strong>Amazon OpenSearch Service</strong> &gt; <strong>Serverless</strong> &gt; <strong>Dashboard</strong> and delete the knowledge base vector collection) to avoid incurring any unwanted charges from AWS.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You've just built an AI-powered chatbot that pulls answers from your own data sources. No more generic responses or outdated information. By combining Amazon Lex, Bedrock, S3, and RAG, you've created a system that actually understands your documentation/knowledge base and delivers accurate, contextual answers.</p>
<p>The real power here isn't just in the technology stack. It's in what you can do with it. Scale this approach to handle customer support queries, internal HR questions, product documentation, or any scenario where you need instant, accurate responses from your own knowledge base.</p>
<p>This is just the beginning. Experiment with different foundation models in Bedrock, expand your knowledge base with more documents, or refine your intents to handle more complex conversations. The infrastructure is built. Now it's time to customize it for your specific use case.</p>
<p>If you found this tutorial helpful, consider sharing it with your team or fellow developers.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Full-Stack Serverless CRUD App using AWS and React ]]>
                </title>
                <description>
                    <![CDATA[ Imagine running a production application that automatically scales from zero to thousands of users without ever touching a server configuration. That's the power of serverless architecture, and it's easier to implement than you might think. If you're... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-full-stack-serverless-app/</link>
                <guid isPermaLink="false">68f7b6ca9d4df532a83f12f7</guid>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Chisom Uma ]]>
                </dc:creator>
                <pubDate>Tue, 21 Oct 2025 16:37:30 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761064422167/c0a6b8ed-a500-43f2-820f-42fef5d73275.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Imagine running a production application that automatically scales from zero to thousands of users without ever touching a server configuration. That's the power of serverless architecture, and it's easier to implement than you might think.</p>
<p>If you're a junior cloud engineer ready to move beyond theoretical AWS concepts and build something real, this tutorial walks you through creating a complete serverless coffee shop management system.</p>
<p>You'll learn how to architect, deploy, and secure a production-ready application using AWS's most powerful serverless services.</p>
<p>Without further ado, let's get started!</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tools-well-be-using">Tools We’ll be Using</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-we-are-building">What We are Building</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-serverless">Why Serverless?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-architectural-overview">Architectural Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-build-a-serverless-full-stack-app">Build a Serverless Full-Stack App</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-create-a-dynamodb-table">Step 1: Create a DynamoDB table</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-create-an-iam-role-for-the-lambda-function">Step 2: Create an IAM role for the Lambda function</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-create-lambda-layer-and-lambda-functions">Step 3: Create Lambda Layer And Lambda Functions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-create-an-api-gateway-to-expose-lambda-functions">Step 4: Create an API Gateway To Expose Lambda Functions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-set-up-react-application-and-upload-build-to-s3-bucket">Step 5: Set up React Application And Upload Build To S3 Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-set-up-amazon-api-gateway-authorizer">Step 6: Set up Amazon API Gateway Authorizer</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-create-cloudfront-distribution-with-behaviors-for-s3-and-api-gateway">Step 7: Create Cloudfront Distribution With Behaviors For S3 And API Gateway</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-8-set-up-react-application-and-upload-build-to-s3-bucket">Step 8: Set up React Application And Upload Build To S3 Bucket</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-troubleshooting-access-denied-error">Troubleshooting Access Denied Error</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-set-up-origin-access-control-oac">Step 1: Set up Origin Access Control (OAC)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-update-s3-bucket-policy">Step 2: Update S3 Bucket Policy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-set-default-root-object">Step 3: Set Default Root Object</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>Basic knowledge of AWS.</p>
</li>
<li><p>Basic knowledge of AWS serverless services.</p>
</li>
<li><p>Knowledge of React (helpful, but not required).</p>
</li>
<li><p>Basic knowledge of Postman or other API testing tools.</p>
</li>
</ul>
<h2 id="heading-tools-well-be-using">Tools We’ll be Using</h2>
<ul>
<li><p><a target="_blank" href="https://react.dev/">React.js</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/lambda/">AWS Lambda</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/dynamodb/">DynamoDB</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/api-gateway/">API Gateway</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/pm/cognito/">Cognito</a></p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/cloudfront/">CloudFront</a></p>
</li>
</ul>
<h2 id="heading-what-we-are-building">What We are Building</h2>
<p>We'll build a complete serverless coffee shop management system using AWS cloud services. Coffee shop owners will securely log in through AWS Cognito authentication and have full control over their inventory, adding new products, updating stock levels, viewing current inventory, and removing discontinued items. To follow along with this tutorial, you can clone the repo <a target="_blank" href="https://github.com/ChisomUma/aws-serverless-arch-project">here</a>.</p>
<p>This is what our user interface (UI) looks like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760784475691/8d9ba162-74dd-447d-b627-3e67b8a944ae.png" alt="image of coffee shop dashboard serverless project" class="image--center mx-auto" width="375" height="357" loading="lazy"></p>
<h2 id="heading-why-serverless">Why Serverless?</h2>
<p>AWS serverless services like Lambda, Cognito, and API Gateway automatically scale to zero during quiet periods and ramp up quickly when traffic spikes. “Serverless” doesn't mean there are no servers. It means that AWS does the heavy lifting of provisioning, managing, and scaling the infrastructure behind the scenes, and you only pay for what you use.</p>
<h2 id="heading-architectural-overview">Architectural Overview</h2>
<p>Our architecture uses DynamoDB as the data store, with Lambda functions (enhanced by Lambda layers) handling all API Gateway requests. Cognito secures the API Gateway, while the CloudFront CDN delivers everything globally. The React frontend connects directly to the Cognito user pool and is hosted on S3 behind a CloudFront distribution. For production deployments, you can add a custom domain using Cloudflare and AWS Certificate Manager.</p>
<h2 id="heading-build-a-serverless-full-stack-app">Build a Serverless Full-Stack App</h2>
<p>In this section, you’ll build a full-stack serverless architecture.</p>
<h3 id="heading-step-1-create-a-dynamodb-table">Step 1: Create a DynamoDB table</h3>
<p>To create a DynamoDB table, navigate to your AWS console and select the DynamoDB section. You can do this quickly by typing “DynamoDB” into the AWS search bar and clicking on DynamoDB. Next, follow the steps below to complete your table creation:</p>
<ol>
<li><p>Click <strong>Create table</strong>.</p>
</li>
<li><p>Input table name as “CoffeeShop” or anything you want to name it.</p>
</li>
<li><p>Input partition key as “coffeeId” or anything you want to name it.</p>
</li>
<li><p>Click <strong>Create table</strong>.</p>
</li>
</ol>
<p><strong>Step 1.1: Create items</strong></p>
<p>You need to create items for the table. This helps with testing connectivity to your DynamoDB table.</p>
<p>For our use case, we’ll create a coffee item in the table with attributes such as <code>coffeeId</code>, <code>name</code>, <code>price</code>, and <code>available</code>. To create an item:</p>
<ol>
<li><p>Click <strong>Explore items</strong> on the left navigation pane.</p>
</li>
<li><p>Click <strong>Create items</strong>.</p>
</li>
<li><p>Click the <em>CoffeeShop</em> radio button, then click <strong>Create item</strong>.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760785698166/ee1f5e2d-feef-41de-80d8-eb2c4cad4d04.png" alt="image of dynamodb page" class="image--center mx-auto" width="1584" height="731" loading="lazy"></p>
<ol start="4">
<li>Click <strong>Add new attribute</strong>. This allows you to add different data types such as strings and booleans. The JSON structure below shows the attributes created.</li>
</ol>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"coffeeId"</span>: <span class="hljs-string">"c123"</span>,
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"new cold coffee"</span>,
    <span class="hljs-attr">"price"</span>: <span class="hljs-number">456</span>,
    <span class="hljs-attr">"available"</span>: <span class="hljs-literal">true</span>
}
</code></pre>
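<p>DynamoDB stores each attribute together with a type tag (<code>S</code> for strings, <code>N</code> for numbers, <code>BOOL</code> for booleans), which is the form you'll see later in raw API responses. As a rough sketch, the conversion from the plain JSON above looks like this in TypeScript. The <code>toAttributeValue</code> and <code>toDynamoItem</code> helpers are hypothetical and cover only these three types. In a real project, the <code>marshall</code> function from <code>@aws-sdk/util-dynamodb</code> handles this for you.</p>

```typescript
// Sketch: convert a plain object into DynamoDB's typed attribute-value format.
// Handles only strings, numbers, and booleans, the types used in this tutorial.
type PlainValue = string | number | boolean;
type AttributeValue = { S: string } | { N: string } | { BOOL: boolean };

const toAttributeValue = (value: PlainValue): AttributeValue => {
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) }; // numbers are sent as strings
  return { BOOL: value };
};

const toDynamoItem = (item: { [key: string]: PlainValue }) =>
  Object.fromEntries(
    Object.entries(item).map(([key, value]) => [key, toAttributeValue(value)])
  );

const coffee = { coffeeId: "c123", name: "new cold coffee", price: 456, available: true };
console.log(JSON.stringify(toDynamoItem(coffee), null, 2));
```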
<h3 id="heading-step-2-create-an-iam-role-for-the-lambda-function">Step 2: Create an IAM role for the Lambda function</h3>
<p>Next, create the IAM role that the Lambda functions will use to interact with the DynamoDB table. We’ll set up an IAM role named "CoffeeShopRole" that serves as a shared execution role for all Lambda functions in the coffee shop application.</p>
<p>This role includes the following permissions:</p>
<ul>
<li><p><strong>CloudWatch Logs</strong>: Full logging capabilities (create, write, and manage log streams)</p>
</li>
<li><p><strong>DynamoDB Access</strong>: Complete read, write, update, and delete operations on the "CoffeeShop" table.</p>
</li>
</ul>
<p>To do this:</p>
<ol>
<li><p>Navigate to the AWS IAM console.</p>
</li>
<li><p>Navigate to <strong>Roles</strong>.</p>
</li>
<li><p>Click <strong>Create role</strong>.</p>
</li>
<li><p>Select the Lambda service.</p>
</li>
<li><p>Search for “AWSLambdaBasicExecutionRole” and select it.</p>
</li>
<li><p>Name your role and click <strong>Create role</strong>.</p>
</li>
</ol>
<p>This is what the role's combined permissions look like:</p>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"VisualEditor0"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Action"</span>: [
                <span class="hljs-string">"dynamodb:PutItem"</span>,
                <span class="hljs-string">"dynamodb:DeleteItem"</span>,
                <span class="hljs-string">"dynamodb:GetItem"</span>,
                <span class="hljs-string">"dynamodb:Scan"</span>,
                <span class="hljs-string">"dynamodb:UpdateItem"</span>
            ],
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:dynamodb:&lt;REGION&gt;:&lt;ACCOUNT_ID&gt;:table/&lt;DYNAMODB_TABLE_NAME&gt;"</span>
        },
        {
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Action"</span>: [
                <span class="hljs-string">"logs:CreateLogGroup"</span>,
                <span class="hljs-string">"logs:CreateLogStream"</span>,
                <span class="hljs-string">"logs:PutLogEvents"</span>
            ],
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"*"</span>
        }
    ]
}
</code></pre>
<p>The <code>AWSLambdaBasicExecutionRole</code> managed policy provides the CloudWatch Logs permissions. Next, create an <strong>inline policy</strong> to allow communication with DynamoDB. Select the following actions for the table:</p>
<ul>
<li><p>Get</p>
</li>
<li><p>Put</p>
</li>
<li><p>Update</p>
</li>
<li><p>Scan</p>
</li>
<li><p>Delete</p>
</li>
</ul>
<p>Next, connect your table ARN to the policy by navigating to the created table and copying the ARN into the policy.</p>
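<p>If it helps to see how the table ARN and the selected actions fit together, here is a small TypeScript sketch that assembles the inline policy document. The region and account ID are placeholders, and <code>tableArn</code> and <code>dynamoPolicy</code> are hypothetical helpers. The console's visual editor produces an equivalent document for you.</p>

```typescript
// Sketch: build the table ARN and an inline policy scoped to that single table.
// The region and account ID used below are placeholders, not real values.
const tableArn = (region: string, accountId: string, tableName: string): string =>
  `arn:aws:dynamodb:${region}:${accountId}:table/${tableName}`;

const dynamoPolicy = (arn: string) => ({
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:Scan",
        "dynamodb:DeleteItem",
      ],
      Resource: arn, // limits the actions to this one table
    },
  ],
});

const arn = tableArn("us-east-1", "123456789012", "CoffeeShop");
console.log(JSON.stringify(dynamoPolicy(arn), null, 2));
```

<p>Scoping <code>Resource</code> to the table ARN, rather than <code>*</code>, keeps the role from touching other tables in the account.</p>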
<h3 id="heading-step-3-create-lambda-layer-and-lambda-functions">Step 3: Create Lambda Layer And Lambda Functions</h3>
<p>Now, we need to connect our Lambda function to the DynamoDB table. For this, we’ll need the DynamoDB JavaScript SDK. To get started, create a <code>lambda</code> folder with a <code>get</code> folder inside it in your IDE, preferably VS Code. Navigate into the <code>get</code> folder in your terminal and run the <code>npm init</code> command to initialize your project. Update your <code>package.json</code> file with this:</p>
<pre><code class="lang-json">
{
  <span class="hljs-attr">"name"</span>: <span class="hljs-string">"get"</span>,
  <span class="hljs-attr">"type"</span>: <span class="hljs-string">"module"</span>,
  <span class="hljs-attr">"version"</span>: <span class="hljs-string">"1.0.0"</span>,
  <span class="hljs-attr">"main"</span>: <span class="hljs-string">"index.js"</span>,
  <span class="hljs-attr">"scripts"</span>: {
    <span class="hljs-attr">"test"</span>: <span class="hljs-string">"echo \"Error: no test specified\" &amp;&amp; exit 1"</span>
  },
  <span class="hljs-attr">"author"</span>: <span class="hljs-string">""</span>,
  <span class="hljs-attr">"license"</span>: <span class="hljs-string">"ISC"</span>,
  <span class="hljs-attr">"description"</span>: <span class="hljs-string">""</span>
}
</code></pre>
<p><strong>Note:</strong> We’ll be using <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Glossary/ECMAScript">ECMAScript</a> module syntax throughout this tutorial, which is why <code>"type"</code> is set to <code>"module"</code> above.</p>
<p>Next, we have to create a reusable Node.js Lambda layer containing the <a target="_blank" href="https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/javascript_dynamodb_code_examples.html">DynamoDB JavaScript SDK</a> and shared utility functions. This layer acts like a common library that can be attached to multiple Lambda functions, eliminating the need to bundle the same dependencies repeatedly in each function's deployment package.</p>
<p>To use the SDK, create a new file in your <code>get</code> directory named <code>index.mjs</code> and paste in the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-comment">// getCoffee function</span>
<span class="hljs-keyword">import</span> { DynamoDBClient, GetItemCommand } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-dynamodb"</span>; <span class="hljs-comment">// ESM import</span>
<span class="hljs-keyword">const</span> config = {
    region: <span class="hljs-string">"us-east-1"</span>,
};
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> DynamoDBClient(config);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> getCoffee = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> coffeeId = <span class="hljs-string">"c123"</span>;
    <span class="hljs-keyword">const</span> input = {
        TableName: <span class="hljs-string">"CoffeeShop"</span>, <span class="hljs-comment">// must match the table name from Step 1</span>
        Key: {
            coffeeId: {
                S: coffeeId,
            },
        },
    };
    <span class="hljs-keyword">const</span> command = <span class="hljs-keyword">new</span> GetItemCommand(input);
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> client.send(command);
    <span class="hljs-built_in">console</span>.log(response);
    <span class="hljs-keyword">return</span> response;
}
</code></pre>
<p>The code above is the <code>getCoffee</code> function. It connects to the DynamoDB table called <code>CoffeeShop</code>, looks up the coffee with the ID <code>c123</code>, then logs and returns its details.</p>
<p>Change <code>region</code> to your specific region.</p>
<p>Next, install the Lambda dependencies for the SDK using the command below:</p>
<pre><code class="lang-bash">
npm i @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
</code></pre>
<p>Then, create a zip file for all the current files using the command below:</p>
<pre><code class="lang-bash">zip -r get.zip ./*
</code></pre>
<p>This creates a zip file in your project directory. Now, navigate to the Lambda function page on your AWS console and upload this zip file.</p>
<p>Click <strong>Test</strong> to test your function. If you run into an error, edit the Runtime settings and change the handler name to <code>index.getCoffee</code>. Deploy and run the code again. You should get a successful response from DynamoDB, as shown below:</p>
<p>Response:</p>
<pre><code class="lang-json">
{
  <span class="hljs-attr">"$metadata"</span>: {
    <span class="hljs-attr">"httpStatusCode"</span>: <span class="hljs-number">200</span>,
    <span class="hljs-attr">"requestId"</span>: <span class="hljs-string">"R14Q5UMTP3K9P9NAF1OGG0IB57VV4KQNSO5AEMVJF66Q9ASUAAJG"</span>,
    <span class="hljs-attr">"attempts"</span>: <span class="hljs-number">1</span>,
    <span class="hljs-attr">"totalRetryDelay"</span>: <span class="hljs-number">0</span>
  },
  <span class="hljs-attr">"Item"</span>: {
    <span class="hljs-attr">"available"</span>: {
      <span class="hljs-attr">"BOOL"</span>: <span class="hljs-literal">true</span>
    },
    <span class="hljs-attr">"price"</span>: {
      <span class="hljs-attr">"N"</span>: <span class="hljs-string">"34"</span>
    },
    <span class="hljs-attr">"name"</span>: {
      <span class="hljs-attr">"S"</span>: <span class="hljs-string">"My New Coffee"</span>
    },
    <span class="hljs-attr">"coffeeId"</span>: {
      <span class="hljs-attr">"S"</span>: <span class="hljs-string">"c123"</span>
    }
  }
}
</code></pre>
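<p>Notice that every attribute in <code>Item</code> is wrapped in a type tag, and that the number arrives as the string <code>"34"</code>. The TypeScript sketch below shows roughly how such an item can be flattened into a plain object. The <code>fromAttributeValue</code> and <code>fromDynamoItem</code> helpers are hypothetical and handle only the <code>S</code>, <code>N</code>, and <code>BOOL</code> types. In practice, <code>DynamoDBDocumentClient</code> from <code>@aws-sdk/lib-dynamodb</code> performs this conversion automatically.</p>

```typescript
// Sketch: flatten DynamoDB's typed attribute values into a plain object.
// Covers only the S, N, and BOOL types that appear in the response above.
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };

const fromAttributeValue = (value: AttributeValue): string | number | boolean | undefined => {
  if (value.S !== undefined) return value.S;
  if (value.N !== undefined) return Number(value.N); // numbers arrive as strings
  return value.BOOL;
};

const fromDynamoItem = (item: { [key: string]: AttributeValue }) =>
  Object.fromEntries(
    Object.entries(item).map(([key, value]) => [key, fromAttributeValue(value)])
  );

const item = {
  available: { BOOL: true },
  price: { N: "34" },
  name: { S: "My New Coffee" },
  coffeeId: { S: "c123" },
};
console.log(fromDynamoItem(item));
```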
<p>Now, let’s update the function so it can sit behind API Gateway. When someone sends a request to the <code>/coffee</code> endpoint, we want the app to return a list of all coffees. But if the request is made to <code>/coffee/{id}</code> (for example, <code>/coffee/c123</code>), the app returns only the details of that specific coffee.</p>
<p>To do this, head back to your <code>index.mjs</code> file and replace its contents with the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { DynamoDBClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-dynamodb"</span>;
<span class="hljs-keyword">import</span> { DynamoDBDocumentClient, GetCommand, ScanCommand } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/lib-dynamodb"</span>;
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> DynamoDBClient({});
<span class="hljs-keyword">const</span> docClient = DynamoDBDocumentClient.from(client);
<span class="hljs-keyword">const</span> tableName = process.env.tableName || <span class="hljs-string">"CoffeeShop"</span>;
<span class="hljs-keyword">const</span> createResponse = <span class="hljs-function">(<span class="hljs-params">statusCode, body</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> responseBody = <span class="hljs-built_in">JSON</span>.stringify(body);
    <span class="hljs-keyword">return</span> {
        statusCode,
        headers: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
        body: responseBody,
    };
};
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> getCoffee = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> { pathParameters } = event;
    <span class="hljs-keyword">const</span> { id } = pathParameters || {};
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">let</span> command;
        <span class="hljs-keyword">if</span> (id) {
            command = <span class="hljs-keyword">new</span> GetCommand({
                TableName: tableName,
                Key: {
                    <span class="hljs-string">"coffeeId"</span>: id,
                },
            });
        }
        <span class="hljs-keyword">else</span> {
            command = <span class="hljs-keyword">new</span> ScanCommand({
                TableName: tableName,
            });
        }
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> docClient.send(command);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">200</span>, response);
    }
    <span class="hljs-keyword">catch</span> (err) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error fetching data from DynamoDB:"</span>, err);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">500</span>, { error: err.message });
    }
}
</code></pre>
<p>Run the <code>zip -r get.zip ./*</code> command again and re-upload the zip file in your Lambda function page.</p>
<p>This Lambda function implements a serverless API endpoint for retrieving coffee data from a DynamoDB table. It uses the <a target="_blank" href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/dynamodb/command/GetItemCommand/">AWS SDK v3</a> to create a document client that either fetches a specific coffee item by ID (when an <code>id</code> parameter is provided in the URL path) or returns all items from the table (when no ID is specified).</p>
<p>The function extracts the coffee ID from the incoming event's path parameters, constructs the appropriate DynamoDB command (<code>GetCommand</code> for a single item or <code>ScanCommand</code> for all items), and executes the database operation. It then returns an HTTP response with JSON headers and a status code: <code>200</code> with the coffee data on success, or <code>500</code> with the error message if the database operation fails.</p>
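<p>One way to check the branching logic before uploading anything is to isolate it from the AWS calls and run it locally against simulated events. The TypeScript sketch below does that. <code>planQuery</code> is a hypothetical helper that is not part of the tutorial's code, and the event objects include only the <code>pathParameters</code> field that <code>getCoffee</code> reads.</p>

```typescript
// Sketch: the branching logic from getCoffee, isolated so it can run without AWS.
type ApiEvent = { pathParameters?: { id?: string } | null };
type QueryPlan =
  | { operation: "get"; TableName: string; Key: { coffeeId: string } }
  | { operation: "scan"; TableName: string };

const planQuery = (event: ApiEvent, tableName: string): QueryPlan => {
  // pathParameters is missing or null when the route has no {id} segment.
  const id = event.pathParameters?.id;
  return id
    ? { operation: "get", TableName: tableName, Key: { coffeeId: id } }
    : { operation: "scan", TableName: tableName };
};

// Simulated events for GET /coffee/c123 and GET /coffee.
console.log(planQuery({ pathParameters: { id: "c123" } }, "CoffeeShop"));
console.log(planQuery({ pathParameters: null }, "CoffeeShop"));
```

<p>Keep in mind that a scan reads the entire table, so it suits a small demo table like this one better than a large dataset.</p>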
<p>Repeat the steps above for the <code>create</code>, <code>update</code>, and <code>delete</code> functions. You can find these functions in your cloned <a target="_blank" href="https://github.com/ChisomUma/aws-serverless-arch-project">project repo</a>.</p>
<h3 id="heading-step-4-create-an-api-gateway-to-expose-lambda-functions">Step 4: Create an API Gateway To Expose Lambda Functions</h3>
<p>To create an API that points to the Lambda function:</p>
<ol>
<li><p>Navigate to <strong>API Gateway</strong> &gt; <strong>Routes</strong> and click <strong>Create.</strong></p>
</li>
<li><p>Create the following endpoints.</p>
</li>
</ol>
<pre><code class="lang-bash">
GET /coffee  -&gt; getCoffee lambda <span class="hljs-keyword">function</span>
GET /coffee/{id}  -&gt; getCoffee lambda <span class="hljs-keyword">function</span>
POST /coffee  -&gt; createCoffee lambda <span class="hljs-keyword">function</span>
PUT /coffee/{id}  -&gt; updateCoffee lambda <span class="hljs-keyword">function</span>
DELETE /coffee/{id}  -&gt; deleteCoffee lambda <span class="hljs-keyword">function</span>
</code></pre>
<ol start="3">
<li>Navigate to <strong>Integrations</strong> and create integrations for these endpoints. To do this, go to the <strong>Manage integrations</strong> tab, click <strong>Create,</strong> and select Lambda as the integration target.</li>
</ol>
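<p>If it helps to see the routing as data, the sketch below writes the same route list as a lookup table. The route keys follow the <code>METHOD /path</code> form used in the list above. The <code>Map</code> and the <code>functionFor</code> helper are illustrative assumptions, not something API Gateway asks you to write:</p>
<pre><code class="lang-javascript">
// Route keys in "METHOD /path" form, each pointing at the Lambda function it integrates with
const routes = new Map([
    ["GET /coffee", "getCoffee"],
    ["GET /coffee/{id}", "getCoffee"],
    ["POST /coffee", "createCoffee"],
    ["PUT /coffee/{id}", "updateCoffee"],
    ["DELETE /coffee/{id}", "deleteCoffee"],
]);

// Hypothetical helper: look up the function for a route key, or null if no route exists
function functionFor(routeKey) {
    return routes.get(routeKey) ?? null;
}

console.log(functionFor("PUT /coffee/{id}")); // updateCoffee
console.log(functionFor("PATCH /coffee")); // null
</code></pre>
<p>Notice that both <code>GET</code> routes point at the same function, which is why the handler checks whether an <code>id</code> is present.</p>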
<p>Now, in your API Gateway portal, click on <code>API: CoffeeShop...(random numbers)</code> and copy the invoke URL for testing, as shown in the image below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760792772732/1d453e97-ce05-4be2-ae6d-d7eb55f86820.png" alt="image of postman interface during testing" class="image--center mx-auto" width="747" height="644" loading="lazy"></p>
<p>A <code>GET</code> request with an <code>id</code> returns a <code>200 OK</code> response containing the matching item from DynamoDB. You can try the rest of the endpoints in Postman :)</p>
<p><strong>Adding a Lambda Layer to Solve the Dependency Issue</strong></p>
<p>Before we continue with this tutorial, I’d like to address one problem with the previous steps. All functions use the same dependencies, but for each function, we had to maintain a separate <code>node_modules</code> folder and <code>package.json</code> file. To fix this issue, we’ll use a <a target="_blank" href="https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html">Lambda layer.</a> The layer contains the shared dependencies, while each function contains only its own code.</p>
<p>To get started:</p>
<ol>
<li><p>Create a new folder in your IDE called <code>LambdaWithLayer</code>.</p>
</li>
<li><p>Create two additional folders under <code>LambdaWithLayer</code> named <code>LambdaFunctionsWithLayer</code> and <code>nodejs</code>.</p>
</li>
</ol>
<p><strong>Note:</strong> You <em>must</em> use the name <code>nodejs</code> for this to work.</p>
<ol start="3">
<li><p>Navigate to the <code>nodejs</code> folder and initialize it using the <code>npm init</code> command.</p>
</li>
<li><p>Install dependencies using the command below:</p>
</li>
</ol>
<pre><code class="lang-bash">npm i @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
</code></pre>
<ol start="5">
<li>Create a new file called <code>utils.mjs</code> under the <code>nodejs</code> folder and paste in the code below:</li>
</ol>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { DynamoDBClient } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-dynamodb"</span>;
<span class="hljs-keyword">import</span> {
    DynamoDBDocumentClient,
    ScanCommand,
    GetCommand,
    PutCommand,
    UpdateCommand,
    DeleteCommand
} <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/lib-dynamodb"</span>;
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> DynamoDBClient({});
<span class="hljs-keyword">const</span> docClient = DynamoDBDocumentClient.from(client);
<span class="hljs-keyword">const</span> createResponse = <span class="hljs-function">(<span class="hljs-params">statusCode, body</span>) =&gt;</span> {
    <span class="hljs-keyword">return</span> {
        statusCode,
        headers: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> },
        body: <span class="hljs-built_in">JSON</span>.stringify(body),
    };
};
<span class="hljs-keyword">export</span> {
    docClient,
    createResponse,
    ScanCommand,
    GetCommand,
    PutCommand,
    UpdateCommand,
    DeleteCommand
};
</code></pre>
<p>Here, we imported all the commands for our API operations and re-exported them together with the shared document client and response helper. Now, we can create Lambda functions without installing the SDK dependencies for each one. For example, create a <code>get</code> folder under the <code>LambdaFunctionsWithLayer</code> folder for the <code>get</code> function, then create an <code>index.mjs</code> file under the <code>get</code> folder. Next, paste in the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { docClient, GetCommand, ScanCommand, createResponse } <span class="hljs-keyword">from</span> <span class="hljs-string">'/opt/nodejs/utils.mjs'</span>; <span class="hljs-comment">// Import from Layer</span>
<span class="hljs-keyword">const</span> tableName = process.env.tableName || <span class="hljs-string">"CoffeShop"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> getCoffee = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> { pathParameters } = event;
    <span class="hljs-keyword">const</span> { id } = pathParameters || {};
    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">let</span> command;
        <span class="hljs-keyword">if</span> (id) {
            command = <span class="hljs-keyword">new</span> GetCommand({
                TableName: tableName,
                Key: {
                    <span class="hljs-string">"coffeeId"</span>: id,
                },
            });
        }
        <span class="hljs-keyword">else</span> {
            command = <span class="hljs-keyword">new</span> ScanCommand({
                TableName: tableName,
            });
        }
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> docClient.send(command);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">200</span>, response);
    }
    <span class="hljs-keyword">catch</span> (err) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error fetching data from DynamoDB:"</span>, err);
        <span class="hljs-keyword">return</span> createResponse(<span class="hljs-number">500</span>, { error: err.message });
    }
}
</code></pre>
<p>The <code>get</code> function no longer bundles its own dependencies. Everything it needs is imported from the layer.</p>
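<p>The <code>/opt/nodejs/utils.mjs</code> import path works because Lambda extracts a layer's contents under <code>/opt</code> in the execution environment. The sketch below only illustrates that path mapping. <code>runtimePath</code> is a hypothetical helper, not a Lambda API:</p>
<pre><code class="lang-javascript">
// Lambda extracts the contents of a layer zip under /opt
const LAYER_ROOT = "/opt";

// Hypothetical helper: where an entry inside layer.zip ends up at runtime
function runtimePath(zipEntry) {
    return LAYER_ROOT + "/" + zipEntry;
}

console.log(runtimePath("nodejs/utils.mjs")); // /opt/nodejs/utils.mjs
console.log(runtimePath("nodejs/node_modules")); // /opt/nodejs/node_modules
</code></pre>
<p>Since <code>node_modules</code> sits next to <code>utils.mjs</code> inside the layer, the SDK imports in <code>utils.mjs</code> resolve from the layer rather than from each function's zip file.</p>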
<p>Repeat the above steps for other functions.</p>
<p><strong>Note:</strong> You can find the code for other functions in <a target="_blank" href="https://github.com/ChisomUma/aws-serverless-arch-project">the cloned repo</a>.</p>
<ol start="6">
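<li>Create a zip file for each function. You can do this by creating a file called <code>create_zip.sh</code> under the <code>LambdaWithLayer</code> folder, since the script expects to find both the <code>nodejs</code> and <code>LambdaFunctionsWithLayer</code> folders in its working directory. Then paste in the script below:</li>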
</ol>
<pre><code class="lang-bash">
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for layer"</span>
zip -r layer.zip nodejs
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for GET Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/get
zip -r get.zip index.mjs
mv get.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for POST Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/post
zip -r post.zip index.mjs
mv post.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for UPDATE Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/update
zip -r update.zip index.mjs
mv update.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating zip for DELETE Function"</span>
<span class="hljs-built_in">cd</span> LambdaFunctionsWithLayer/delete
zip -r delete.zip index.mjs
mv delete.zip ../../
<span class="hljs-built_in">cd</span> ../..
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Success!"</span>
</code></pre>
<p>Run the script from the <code>LambdaWithLayer</code> folder using the <code>sh create_zip.sh</code> command. This creates one zip file per function plus a <code>layer.zip</code> file, all of which you can upload to AWS Lambda.</p>
<ol start="7">
<li><p>In the AWS Lambda console, navigate to <strong>Layers</strong>, create a layer (named “DynamoDBLayer” in this tutorial), and upload the <code>layer.zip</code> file.</p>
</li>
<li><p>Update the functions by uploading the newly created zip file for each function.</p>
</li>
<li><p>Add the layer to the function by clicking <strong>Layers</strong> in the function view:</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760793962650/8e797256-af9e-445d-8025-2fbd29dfe87f.png" alt="image of get coffee lambda layer" class="image--center mx-auto" width="407" height="125" loading="lazy"></p>
<p>Next, click <strong>Add a layer,</strong> then select <strong>Custom layers.</strong> Then choose <strong>“DynamoDBLayer”</strong> and version <strong>“1”.</strong></p>
<ol start="10">
<li><p>Click <strong>Add</strong>.</p>
</li>
<li><p>Repeat for all the other functions.</p>
</li>
</ol>
<h3 id="heading-step-5-set-up-react-application-and-upload-build-to-s3-bucket">Step 5: Set up React Application And Upload Build To S3 Bucket</h3>
<p>To set up our React application, navigate to the <code>frontend</code> folder of the cloned repository on your local machine and run <code>npm install</code> to install the dependencies. Then run <code>npm run dev</code> to start your development environment on your local machine. You should see the preview in your browser at: <a target="_blank" href="http://localhost:5173/"><code>http://localhost:5173/</code></a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794168737/0cace684-7f8b-47db-944a-7a642d991ca0.png" alt="image of coffe list ui" class="image--center mx-auto" width="375" height="357" loading="lazy"></p>
<p>If you inspect the page using <a target="_blank" href="https://developer.chrome.com/docs/devtools">Chrome DevTools</a>, you’ll see that the requests to the API fail with a <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CORS">CORS</a> error:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794416609/0cd6196e-b7cc-4f61-af5f-77995d6139ec.png" alt="image of chrome dev tool console" class="image--center mx-auto" width="541" height="322" loading="lazy"></p>
<p>Now, let’s fix this problem. To do that:</p>
<ol>
<li><p>Navigate to your API Gateway page.</p>
</li>
<li><p>Click on <strong>CORS</strong> on the left navigation panel.</p>
</li>
<li><p>Click <strong>Configure</strong>.</p>
</li>
<li><p>Copy your <a target="_blank" href="http://localhost">localhost</a> URL and paste it into the <strong>Access-Control-Allow-Origin</strong> field.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794511537/26a1917b-16ae-48bc-a786-b36c6bb31490.png" alt="image of cors configuration" class="image--center mx-auto" width="696" height="196" loading="lazy"></p>
<p>Make sure to remove the trailing <code>/</code> from your URL, as shown in the image above.</p>
<ol start="5">
<li><p>Click <strong>Add</strong>.</p>
</li>
<li><p>In the <strong>Access-Control-Allow-Headers</strong> field, enter <code>content-type</code> and click <strong>Add</strong>.</p>
</li>
<li><p>Include <code>GET</code>, <code>POST</code>, <code>OPTIONS</code>, <code>PUT</code>, and <code>DELETE</code> in <strong>Access-Control-Allow-Methods.</strong></p>
</li>
<li><p>Click <strong>Save</strong>.</p>
</li>
</ol>
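<p>The trailing slash matters because a browser sends an origin (scheme, host, and port) in the <code>Origin</code> header, never a path. The sketch below is a simplified model of that comparison, not API Gateway's actual implementation. <code>toOrigin</code> and <code>isAllowed</code> are hypothetical helper names:</p>
<pre><code class="lang-javascript">
// Hypothetical helper: reduce a page URL to the origin a browser sends in the Origin header
function toOrigin(pageUrl) {
    // URL.origin is scheme + host + port, with no path and no trailing slash
    return new URL(pageUrl).origin;
}

const allowedOrigins = [toOrigin("http://localhost:5173/")];

// Simplified check: the request's Origin must match an allowed origin exactly
function isAllowed(requestOrigin) {
    return allowedOrigins.includes(requestOrigin);
}

console.log(allowedOrigins[0]); // http://localhost:5173
console.log(isAllowed("http://localhost:5173")); // true
console.log(isAllowed("http://localhost:5174")); // false
</code></pre>
<p>The last line shows that a different port counts as a different origin, so the allowed origin has to match the port your dev server actually uses.</p>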
<p>The app now loads the coffee list, and the CORS error has been resolved.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760794692165/7d53573d-f2e0-456d-a1da-0d06265d78a9.png" alt="image of solved cors error" class="image--center mx-auto" width="859" height="578" loading="lazy"></p>
<p>When you add a new coffee, you should see the newly created items in your DynamoDB database.</p>
<h3 id="heading-step-6-set-up-amazon-api-gateway-authorizer">Step 6: Set up Amazon API Gateway Authorizer</h3>
<p>Amazon Cognito helps you secure your Amazon API Gateway endpoints. API Gateway validates the access token issued by Amazon Cognito to ensure it is valid and has not expired, then grants or denies access based on token validity.</p>
<p>To get started:</p>
<ol>
<li><p>Navigate to <strong>Amazon Cognito &gt; User pools</strong>.</p>
</li>
<li><p>Click <strong>Create user pool</strong>.</p>
</li>
<li><p>Select <strong>Single-page application (SPA)</strong>.</p>
</li>
<li><p>Select email as the preferred sign-in and sign-up method.</p>
</li>
<li><p>Use <code>http://localhost:5174/</code> or your own local URL as the return URL.</p>
</li>
<li><p>Click <strong>Create user directory</strong>.</p>
</li>
</ol>
<p>You’ll be presented with a page containing code that we can copy and paste into our app for integration. But before we do that, let's head back to API Gateway and integrate it with Cognito. To do that:</p>
<ol>
<li><p>Go to the Authorization section in API Gateway.</p>
</li>
<li><p>Navigate to <strong>Manage authorizers</strong>.</p>
</li>
<li><p>Click <strong>Create</strong>.</p>
</li>
<li><p>Select JWT and name it “Cognito-CoffeeShop”.</p>
</li>
<li><p>Copy your issuer URL from the Cognito user pool <strong>Overview</strong> page. It is the <em>Token signing key URL</em> without the trailing <code>/.well-known/jwks.json</code>. If you open the full token signing key URL in your browser, you’ll see the public keys that are used for verification.</p>
</li>
<li><p>For the Audience, navigate to the Cognito user pool, then to App clients, and select CoffeShopClient. Copy the Client ID and paste it in as the audience.</p>
</li>
<li><p>Click <strong>Create</strong>.</p>
</li>
<li><p>Go to Routes and attach the authorizer to each route.</p>
</li>
</ol>
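<p>To see what the issuer and audience settings are compared against, here is a simplified sketch of the claim checks a JWT authorizer performs. It deliberately skips signature verification, which API Gateway does using the issuer's public keys, and it assumes a single string audience. The token is an unsigned demo value with placeholder claims, and <code>decodePayload</code> and <code>checkClaims</code> are hypothetical helper names:</p>
<pre><code class="lang-javascript">
// Decode the payload (middle segment) of a JWT. This does NOT verify the signature.
function decodePayload(token) {
    const payloadPart = token.split(".")[1];
    return JSON.parse(Buffer.from(payloadPart, "base64url").toString("utf8"));
}

// Simplified claim checks: issuer, audience (aud or client_id), and expiry
function checkClaims(claims, issuer, clientId, nowSeconds) {
    if (claims.iss !== issuer) return "wrong issuer";
    const audience = claims.aud || claims.client_id;
    if (audience !== clientId) return "wrong audience";
    if (typeof claims.exp !== "number") return "missing exp";
    // exp is in seconds since the epoch; the token is expired once now reaches exp
    if (Math.max(claims.exp, nowSeconds) === nowSeconds) return "expired";
    return "ok";
}

// Build an unsigned demo token with placeholder values (not a real Cognito token)
function encode(obj) {
    return Buffer.from(JSON.stringify(obj)).toString("base64url");
}
const issuer = "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE";
const demoToken = [
    encode({ alg: "none" }),
    encode({ iss: issuer, client_id: "example-client-id", exp: 2000 }),
    "signature-placeholder",
].join(".");

const claims = decodePayload(demoToken);
console.log(checkClaims(claims, issuer, "example-client-id", 1000)); // ok
console.log(checkClaims(claims, issuer, "example-client-id", 3000)); // expired
console.log(checkClaims(claims, issuer, "other-client", 1000)); // wrong audience
</code></pre>
<p>If the issuer URL still includes the <code>/.well-known/jwks.json</code> suffix, it will not match the <code>iss</code> claim in the token, so requests are rejected.</p>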
<p>Now, to integrate with our front-end app:</p>
<ol>
<li>Navigate into the <code>frontend</code> folder and run the command below:</li>
</ol>
<pre><code class="lang-bash">npm install oidc-client-ts react-oidc-context --save
</code></pre>
<ol start="2">
<li><p>Go to the <strong>App clients</strong> section in Cognito user pools to find the readily available code snippets for integration.</p>
</li>
<li><p>Edit your <code>main.jsx</code> file to include the code below:</p>
</li>
</ol>
<pre><code class="lang-typescript">
<span class="hljs-keyword">import</span> { createRoot } <span class="hljs-keyword">from</span> <span class="hljs-string">'react-dom/client'</span>
<span class="hljs-keyword">import</span> { BrowserRouter <span class="hljs-keyword">as</span> Router, Route, Routes } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-router-dom"</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'./index.css'</span>
<span class="hljs-keyword">import</span> App <span class="hljs-keyword">from</span> <span class="hljs-string">'./App.jsx'</span>
<span class="hljs-keyword">import</span> ItemDetails <span class="hljs-keyword">from</span> <span class="hljs-string">"./ItemDetails"</span>;
<span class="hljs-keyword">import</span> { AuthProvider } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-oidc-context"</span>;
<span class="hljs-keyword">const</span> cognitoAuthConfig = {
  authority: <span class="hljs-string">"https://cognito-idp.us-east-1.amazonaws.com/us-east-1_rXq7q3KLm"</span>,
  client_id: <span class="hljs-string">"6fjfrlaup7oph5lhf1q8q6pnp4"</span>,
  redirect_uri: <span class="hljs-string">"http://localhost:5174"</span>,
  response_type: <span class="hljs-string">"code"</span>,
  scope: <span class="hljs-string">"email openid phone"</span>,
};
createRoot(<span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'root'</span>)).render(
  &lt;AuthProvider {...cognitoAuthConfig}&gt;
    &lt;Router&gt;
      &lt;div&gt;
        &lt;Routes&gt;
          &lt;Route path=<span class="hljs-string">"/"</span> element={&lt;App /&gt;} /&gt;
          &lt;Route path=<span class="hljs-string">"/details/:id"</span> element={&lt;ItemDetails /&gt;} /&gt;
        &lt;/Routes&gt;
      &lt;/div&gt;
    &lt;/Router&gt;
  &lt;/AuthProvider&gt;
)
</code></pre>
<p>Here, we imported <code>AuthProvider</code> from <code>react-oidc-context</code>, then wrapped our app with <code>AuthProvider</code>. Replace the <code>authority</code>, <code>client_id</code>, and <code>redirect_uri</code> values with the ones from your own user pool. Next, move the existing code in the <code>App.jsx</code> file to a newly created <code>Home.jsx</code> file, and update the <code>App.jsx</code> file with the code below:</p>
<pre><code class="lang-typescript">
<span class="hljs-comment">// App.jsx</span>
<span class="hljs-keyword">import</span> <span class="hljs-string">"./App.css"</span>;
<span class="hljs-keyword">import</span> { useAuth } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-oidc-context"</span>;
<span class="hljs-comment">// Assumes Home.jsx default-exports the component that holds the original App.jsx code</span>
<span class="hljs-keyword">import</span> Home <span class="hljs-keyword">from</span> <span class="hljs-string">"./Home"</span>;
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> auth = useAuth();
  <span class="hljs-keyword">const</span> signOutRedirect = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> clientId = <span class="hljs-string">"6fjfrlaup7oph5lhf1q8q6pnp4"</span>;
    <span class="hljs-keyword">const</span> logoutUri = <span class="hljs-string">"http://localhost:5174/"</span>;
    <span class="hljs-keyword">const</span> cognitoDomain = <span class="hljs-string">"https://us-east-1rxq7q3klm.auth.us-east-1.amazoncognito.com"</span>;
    <span class="hljs-built_in">window</span>.location.href = <span class="hljs-string">`<span class="hljs-subst">${cognitoDomain}</span>/logout?client_id=<span class="hljs-subst">${clientId}</span>&amp;logout_uri=<span class="hljs-subst">${<span class="hljs-built_in">encodeURIComponent</span>(logoutUri)}</span>`</span>;
  };
  <span class="hljs-keyword">if</span> (auth.isLoading) {
    <span class="hljs-keyword">return</span> &lt;div&gt;Loading...&lt;/div&gt;;
  }
  <span class="hljs-keyword">if</span> (auth.error) {
    <span class="hljs-keyword">return</span> &lt;div&gt;Encountering error... {auth.error.message}&lt;/div&gt;;
  }
  <span class="hljs-keyword">if</span> (auth.isAuthenticated) {
    <span class="hljs-keyword">return</span> (
      &lt;div&gt;
        &lt;button onClick={<span class="hljs-function">() =&gt;</span> auth.removeUser()}&gt;Sign out&lt;/button&gt;
        &lt;Home /&gt;
      &lt;/div&gt;
    );
  }
  <span class="hljs-keyword">return</span> (
    &lt;div&gt;
      &lt;button onClick={<span class="hljs-function">() =&gt;</span> auth.signinRedirect()}&gt;Sign <span class="hljs-keyword">in</span>&lt;/button&gt;
      &lt;button onClick={<span class="hljs-function">() =&gt;</span> signOutRedirect()}&gt;Sign out&lt;/button&gt;
    &lt;/div&gt;
  );
}
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> App;
</code></pre>
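<p>The <code>signOutRedirect</code> function assembles the Cognito logout URL by hand. As an alternative sketch, the same URL can be built with the standard <code>URL</code> API, which handles the encoding for you. The domain and client ID below are placeholders, and <code>buildLogoutUrl</code> is a hypothetical helper name:</p>
<pre><code class="lang-javascript">
// Hypothetical helper: build the Cognito logout URL from its parts
function buildLogoutUrl(cognitoDomain, clientId, logoutUri) {
    const url = new URL("/logout", cognitoDomain);
    // searchParams percent-encodes values, so no manual encodeURIComponent call is needed
    url.searchParams.set("client_id", clientId);
    url.searchParams.set("logout_uri", logoutUri);
    return url.toString();
}

const logoutUrl = buildLogoutUrl(
    "https://example.auth.us-east-1.amazoncognito.com", // placeholder domain
    "example-client-id", // placeholder client ID
    "http://localhost:5174/"
);

console.log(new URL(logoutUrl).pathname); // /logout
console.log(new URL(logoutUrl).searchParams.get("logout_uri")); // http://localhost:5174/
</code></pre>
<p>Whichever way you build it, the <code>logout_uri</code> value needs to match a sign-out URL that is allowed for your app client in Cognito.</p>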
<p>Now, when you run the application again, you should see this login page in your browser:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760795002733/2be7ce35-ecff-41bb-adff-ed13c7a33a32.png" alt="Sign in and Sign out buttons" class="image--center mx-auto" width="272" height="231" loading="lazy"></p>
<p>When you click <strong>Sign in</strong>, you’ll be redirected to the sign-in page. Click <strong>Sign up</strong>. You should see the page below, where you can create your account.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760795104086/41c44c85-881d-482c-ae1f-d84f1ea76fb5.png" alt="Sign in page with a form" class="image--center mx-auto" width="478" height="493" loading="lazy"></p>
<p>During sign-up, a verification code is sent to your sign-up email. Once you’re logged in, you can then access your coffee dashboard.</p>
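<p>Because the routes now have an authorizer attached, API requests need to carry the signed-in user's access token. The sketch below shows one way to build the request headers from the <code>user</code> object returned by <code>useAuth()</code>. <code>buildHeaders</code> is a hypothetical helper, and the frontend in the repo may already handle this differently:</p>
<pre><code class="lang-javascript">
// Hypothetical helper: build request headers from the user object exposed by react-oidc-context
function buildHeaders(user) {
    const headers = { "Content-Type": "application/json" };
    if (user?.access_token) {
        // JWT authorizers read the token from the Authorization header by default
        headers.Authorization = "Bearer " + user.access_token;
    }
    return headers;
}

console.log(buildHeaders({ access_token: "demo-token" }).Authorization); // Bearer demo-token
console.log(buildHeaders(null).Authorization); // undefined
</code></pre>
<p>In a component, you could pass <code>buildHeaders(auth.user)</code> as the <code>headers</code> option of a <code>fetch</code> call to your invoke URL. If the browser then reports a CORS error for the <code>Authorization</code> header, you will likely need to add <code>authorization</code> to <strong>Access-Control-Allow-Headers</strong> in API Gateway as well.</p>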
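<h3 id="heading-step-7-create-cloudfront-distribution-with-behaviors-for-s3-and-api-gateway"><strong>Step 7: Create CloudFront Distribution With Behaviors For S3 And API Gateway</strong></h3>
<p>To create a distribution (this step assumes the S3 bucket described in Step 8 already exists, so create and populate it first if you haven't):</p>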
<ol>
<li><p>Navigate to <strong>CloudFront</strong>.</p>
</li>
<li><p>Click <strong>Create distribution</strong>.</p>
</li>
<li><p>In the Origin page, select the S3 bucket and browse through your created S3 buckets.</p>
</li>
<li><p>Select your coffee shop bucket.</p>
</li>
<li><p>Set origin path to <code>/dist</code>.</p>
</li>
<li><p>Select <em>Origin access control</em> under <strong>Origin access</strong>.</p>
</li>
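<li><p>Once the distribution is created, copy its distribution domain name from CloudFront. Then update the URLs in your React code, and the allowed URLs in the <strong>Login pages</strong> tab of your Cognito app client, to use that domain name.</p>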
</li>
</ol>
<h3 id="heading-step-8-set-up-react-application-and-upload-build-to-s3-bucket"><strong>Step 8: Set up React Application And Upload Build To S3 Bucket</strong></h3>
<p>In this step, we’ll be building our React application and uploading the static files to an Amazon S3 bucket, which is then served from a CloudFront distribution.</p>
<p>To get started:</p>
<ol>
<li><p>Create an S3 bucket and give it a name such as “mycoffeeshop123new”. Bucket names must be lowercase and globally unique across all AWS accounts.</p>
</li>
<li><p>In the frontend folder, run the <code>npm run build</code> command. This creates a <code>dist</code> folder in your directory.</p>
</li>
<li><p>Head back to the S3 bucket and drag-and-drop the <code>dist</code> folder into S3 to upload it.</p>
</li>
<li><p>Click <strong>Upload</strong>.</p>
</li>
</ol>
<p>Now, copy your CloudFront distribution URL and try to access your site in a private browser, for example, Chrome incognito. You should see your site live in the browser.</p>
<h2 id="heading-troubleshooting-access-denied-error"><strong>Troubleshooting Access Denied Error</strong></h2>
<p>You may encounter an access denied error in the browser:</p>
<pre><code class="lang-xml">
<span class="hljs-tag">&lt;<span class="hljs-name">Error</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">Code</span>&gt;</span>AccessDenied<span class="hljs-tag">&lt;/<span class="hljs-name">Code</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">Message</span>&gt;</span>Access Denied<span class="hljs-tag">&lt;/<span class="hljs-name">Message</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">Error</span>&gt;</span>
</code></pre>
<p>This is most likely caused by a misconfiguration between S3 and CloudFront. Here are the steps to resolve this issue:</p>
<h3 id="heading-step-1-set-up-origin-access-control-oac">Step 1: Set up Origin Access Control (OAC)</h3>
<ol>
<li><p>Go to <strong>CloudFront &gt; Your Distribution &gt; Origins tab.</strong></p>
</li>
<li><p>Select your S3 origin and click <strong>Edit.</strong></p>
</li>
<li><p>Under <strong>Origin access</strong>, select <strong>Origin access control settings (recommended).</strong></p>
</li>
<li><p>Click <strong>Create new OAC</strong> (or select an existing one).</p>
</li>
<li><p>Click <strong>Save changes.</strong></p>
</li>
</ol>
<h3 id="heading-step-2-update-s3-bucket-policy">Step 2: Update S3 Bucket Policy</h3>
<p>After saving, CloudFront will show you a <strong>"Copy Policy"</strong> button. Click it, then:</p>
<ol>
<li><p>Go to your S3 bucket &gt; <strong>Permissions</strong> tab.</p>
</li>
<li><p>Scroll to <strong>Bucket policy</strong> and click <strong>Edit.</strong></p>
</li>
<li><p>Paste the copied policy (it should look like this):</p>
</li>
</ol>
<pre><code class="lang-json">
{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"AllowCloudFrontServicePrincipal"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Principal"</span>: {
                <span class="hljs-attr">"Service"</span>: <span class="hljs-string">"cloudfront.amazonaws.com"</span>
            },
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:GetObject"</span>,
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:s3:::YOUR-BUCKET-NAME/*"</span>,
            <span class="hljs-attr">"Condition"</span>: {
                <span class="hljs-attr">"StringEquals"</span>: {
                    <span class="hljs-attr">"AWS:SourceArn"</span>: <span class="hljs-string">"arn:aws:cloudfront::YOUR-ACCOUNT-ID:distribution/YOUR-DISTRIBUTION-ID"</span>
                }
            }
        }
    ]
}
</code></pre>
<ol start="4">
<li>Click <strong>Save changes.</strong></li>
</ol>
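<p>If you ever need to write this policy by hand instead of copying it, the sketch below fills in the three placeholders from the policy above. The bucket name, account ID, and distribution ID passed in are illustrative values only, and <code>buildBucketPolicy</code> is a hypothetical helper name:</p>
<pre><code class="lang-javascript">
// Hypothetical helper: fill in the OAC bucket policy for one bucket and one distribution
function buildBucketPolicy(bucketName, accountId, distributionId) {
    return {
        Version: "2012-10-17",
        Statement: [
            {
                Sid: "AllowCloudFrontServicePrincipal",
                Effect: "Allow",
                Principal: { Service: "cloudfront.amazonaws.com" },
                Action: "s3:GetObject",
                // Every object in the bucket
                Resource: "arn:aws:s3:::" + bucketName + "/*",
                Condition: {
                    StringEquals: {
                        // Only this distribution may read the objects
                        "AWS:SourceArn": "arn:aws:cloudfront::" + accountId + ":distribution/" + distributionId,
                    },
                },
            },
        ],
    };
}

// Placeholder values for illustration only
const policy = buildBucketPolicy("mycoffeeshop123new", "111122223333", "EDFDVBD6EXAMPLE");
console.log(JSON.stringify(policy, null, 4));
</code></pre>
<p>The <code>Condition</code> block is what restricts access to your distribution, so check that the distribution ID in it matches the one shown in CloudFront.</p>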
<h3 id="heading-step-3-set-default-root-object"><strong>Step 3: Set Default Root Object</strong></h3>
<ol>
<li><p>Go back to <strong>CloudFront &gt; Your Distribution &gt; General</strong> tab.</p>
</li>
<li><p>Click <strong>Edit.</strong></p>
</li>
<li><p>Set <strong>Default root object</strong> to <code>index.html</code>.</p>
</li>
<li><p>Save changes.</p>
</li>
</ol>
<p>Now, try accessing the site again. It should work.</p>
<p>This brings us to the end of this tutorial. I hope you were able to learn a thing or two about building serverless systems :)</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Congratulations! You've just built a production-ready serverless application from the ground up. You've successfully architected a complete CRUD system that automatically scales, stays secure with Cognito authentication, and costs you only what you actually use.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
