Cloud Computing - freeCodeCamp.org

How to Migrate to S3 Native State Locking in Terraform

Tolani Akintayo — Thu, 07 May 2026 22:58:43 +0000

If you've been running Terraform on AWS for any length of time, you know the setup: an S3 bucket for state storage, a DynamoDB table for state locking, and a handful of IAM policies tying them together. It works. It has worked for years.

But it has always carried a cost that rarely gets discussed openly. That cost isn't just money, though a DynamoDB table with on-demand billing adds up across multiple teams and environments.

The real cost is complexity. Every new AWS environment needs both resources provisioned before Terraform can manage anything else. Every engineer who sets up their first Terraform backend has to understand why two completely different AWS services are responsible for what is logically one thing: storing and protecting state. And every incident involving a stuck lock has required someone to manually delete a record from DynamoDB to unblock the team.

In November 2024, Terraform 1.10 added native state locking to the S3 backend. It builds on the conditional writes that S3 introduced earlier that year, which means DynamoDB is no longer required for state locking. The feature was experimental in 1.10 and became generally available in Terraform 1.11.

In this tutorial, you'll learn:

What S3 native locking is and how it works
How to set it up from scratch if you're starting a new project
How to migrate an existing S3 + DynamoDB setup to S3 native locking safely
How to verify locking is working and handle edge cases

By the end, you'll have a simpler, cleaner Terraform backend with one fewer AWS resource to manage.

What Is Terraform State Locking?
What Is S3 Native State Locking?
How S3 Native Locking Compares to the S3 + DynamoDB Approach
Prerequisites
Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch
Part 2: Migration – How to Move from S3 + DynamoDB to S3 Native Locking
How to Verify That Locking Is Working
How to Handle a Stuck Lock
Rollback Plan: If Something Goes Wrong
Security Best Practices for Your State Bucket
Conclusion
References

What Is Terraform State Locking?

Before looking at the new approach, it helps to understand what state locking is solving.

Terraform stores everything it knows about your infrastructure in a state file – a JSON document that maps your configuration to real AWS resources. When you run terraform apply, Terraform reads this file, calculates the difference between the current state and your configuration, and makes the necessary changes.

The problem arises when two engineers or two CI/CD pipelines try to apply changes at the same time. If both read the state file simultaneously, calculate changes independently, and both try to write back, you get a race condition. The second write overwrites changes from the first, and your state is now out of sync with reality. This is a serious problem that can cause resources to be untracked, duplicated, or destroyed unexpectedly.

State locking solves this by creating a lock when any operation starts that could modify state. If a lock already exists, Terraform refuses to proceed and reports who holds the lock and when it was acquired. Only one operation can hold the lock at a time. When the operation completes, the lock is released.

Terraform Run A                 State File / Lock                Terraform Run B
(User 1)                         (S3/DynamoDB)                   (User 2)

   |                                   |                            |
   |------- 1. Acquire Lock ---------->|                            |
   |                                   |                            |
   |<------ 2. Lock Granted -----------|                            |
   |                                   |                            |
   |                                   |<------ 3. Acquire Lock ----|
   |            [PROCESSING]           |                            |
   |      (Modifying Infrastructure)   |------- 4. Lock Denied ---->|
   |                                   |        (Wait / Retry)      |
   |                                   |                            |
   |------- 5. Release Lock ---------->|                            |
   |                                   |                            |
   |           [COMPLETED]             |------- 6. Lock Granted --->|
   |                                   |                            |
   |                                   |       [PROCESSING]         |
   |                                   | (Modifying Infrastructure) |
   |                                   |                            |
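
To see the acquire, deny, and release sequence from the diagram without touching AWS, here's a minimal local sketch. It isn't how Terraform implements locking: it uses the shell's noclobber option as a stand-in for any "create only if absent" primitive, and the lock path and holder names (run-a, run-b) are made up for the demo.

```shell
#!/usr/bin/env bash
# Toy lock on the local filesystem. With noclobber set, the ">" redirection
# refuses to overwrite an existing file, so only one caller can create it.

# acquire_lock LOCK_FILE HOLDER
# Returns 0 if the lock file was created, non-zero if it already exists.
acquire_lock() {
  local lock_file="$1" holder="$2"
  # The subshell keeps noclobber from leaking into the caller's shell.
  ( set -o noclobber; printf '%s\n' "$holder" > "$lock_file" ) 2>/dev/null
}

# release_lock LOCK_FILE
release_lock() {
  rm -f "$1"
}

demo_lock="$(mktemp -u)"   # hypothetical lock path; -u only prints a name

if acquire_lock "$demo_lock" "run-a"; then
  echo "run-a: lock granted"
fi
if ! acquire_lock "$demo_lock" "run-b"; then
  echo "run-b: lock denied, held by $(cat "$demo_lock")"
fi
release_lock "$demo_lock"
if acquire_lock "$demo_lock" "run-b"; then
  echo "run-b: lock granted"
fi
release_lock "$demo_lock"
```

The second caller is refused for as long as the first holder's file exists, which is the same guarantee a state lock gives two Terraform runs.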

What Is S3 Native State Locking?

Previously, Terraform's S3 backend used a DynamoDB table as the locking mechanism. When a lock was needed, Terraform wrote a record to DynamoDB with a LockID primary key. DynamoDB's conditional writes guaranteed that only one process could create that record, which is what made the locking atomic.

S3 native locking keeps the lock in S3 itself, as a small lock file stored next to the state object. Despite the similar name, this isn't S3 Object Lock, the feature that enforces WORM (Write Once, Read Many) retention for regulatory compliance. Terraform relies on S3 conditional writes, which AWS introduced in 2024.

When S3 native locking is enabled in your Terraform backend:

Terraform writes your state to a .tfstate object in S3 (as before)
To acquire a lock, Terraform uses S3's conditional write operations, specifically the If-None-Match header, to create a lock file atomically
If the lock file already exists, S3 rejects the write, and Terraform reports that a lock is held
When the operation completes, Terraform deletes the lock file to release the lock
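
The lock steps above can be approximated with the AWS CLI. This is a sketch of the idea, not Terraform's actual implementation: the try_lock and unlock function names are hypothetical, and it assumes an AWS CLI v2 release recent enough to support the --if-none-match option on put-object.

```shell
#!/usr/bin/env bash
# Sketch of lock acquire/release built on an S3 conditional write.

# try_lock BUCKET STATE_KEY LOCK_BODY_FILE
# Returns 0 if the lock object was created. Returns non-zero if the object
# already exists (S3 rejects the conditional write) or if the call fails.
try_lock() {
  local bucket="$1" state_key="$2" body="$3"
  aws s3api put-object \
    --bucket "$bucket" \
    --key "${state_key}.tflock" \
    --body "$body" \
    --if-none-match '*' >/dev/null
}

# unlock BUCKET STATE_KEY
# Deletes the lock object so the next caller can create it again.
unlock() {
  local bucket="$1" state_key="$2"
  aws s3api delete-object \
    --bucket "$bucket" \
    --key "${state_key}.tflock" >/dev/null
}
```

For example, try_lock your-project-terraform-state production/terraform.tfstate lock.json should succeed once and then fail on every later call until unlock removes the object.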

The key difference from DynamoDB: the entire locking mechanism lives inside S3. No second service. No second set of IAM permissions. No second resource to provision.

Note: This feature requires Terraform version 1.10.0 or later. It doesn't require S3 Object Lock: the lock file is an ordinary S3 object created with a conditional write, so any S3 bucket works, including the one you already use for state. Bucket versioning isn't needed for locking either, but it's strongly recommended so you can recover earlier versions of the state.

How S3 Native Locking Compares to the S3 + DynamoDB Approach

Aspect	S3 + DynamoDB (Old)	S3 Native Locking (New)
AWS services required	S3 + DynamoDB	S3 only
IAM permissions needed	S3 + DynamoDB permissions	S3 permissions only
Terraform version	Any	1.10.0 or later
Setup complexity	Two resources, two IAM scopes	One resource
Stuck lock resolution	Delete DynamoDB record	Delete S3 lock file
Cost	S3 storage + DynamoDB on-demand	S3 storage only
Object Lock requirement	Not required	Not required
Locking mechanism	DynamoDB conditional writes	S3 conditional writes (If-None-Match)
State versioning	S3 Versioning (recommended)	S3 Versioning (recommended)

The functional behavior from Terraform's perspective is identical. Locking works the same way. The lock information displayed when a lock is held has the same structure. The only difference is what happens under the hood.

Prerequisites

Before you start, make sure you have the following in place:

Terraform 1.10.0 or later installed. Check your version:

terraform version

If you need to upgrade, follow the official upgrade guide.
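
If you want a script or CI job to enforce the version requirement, a small helper can do the comparison. This is a sketch: the function names are hypothetical, it assumes the first line of terraform version looks like "Terraform v1.10.0", and it relies on a sort that supports -V (GNU and recent BSD versions do).

```shell
#!/usr/bin/env bash
# version_at_least CURRENT MINIMUM
# Succeeds when CURRENT >= MINIMUM, using version-aware sorting.
version_at_least() {
  local current="$1" minimum="$2"
  # If MINIMUM sorts first (or ties), CURRENT is at least MINIMUM.
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# terraform_supports_lockfile
# Reads the installed version and compares it against 1.10.0.
terraform_supports_lockfile() {
  local installed
  # Strip the "Terraform v" prefix from the first line of output.
  installed="$(terraform version | head -n 1 | sed 's/^Terraform v//')"
  version_at_least "$installed" "1.10.0"
}

# Usage:
#   terraform_supports_lockfile || { echo "Upgrade Terraform first" >&2; exit 1; }
```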

AWS CLI installed and configured with credentials that have permission to create and manage S3 buckets.

aws --version
aws sts get-caller-identity   # confirm you're authenticated

IAM permissions to perform the following S3 actions:
- s3:CreateBucket
- s3:PutBucketVersioning
- s3:PutEncryptionConfiguration
- s3:PutBucketPublicAccessBlock
- s3:GetObject
- s3:PutObject
- s3:DeleteObject
- s3:ListBucket

For the migration path: access to your existing Terraform project and the S3 bucket and DynamoDB table currently in use.

Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch

Follow this section if you're starting a new Terraform project and want to use S3 native locking from the beginning.

Step 1: Create the S3 Bucket with Versioning and Encryption

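Create the bucket using the AWS CLI. The --object-lock-enabled-for-bucket flag below is optional: Terraform's native locking doesn't depend on Object Lock, so keep the flag only if you also want Object Lock's retention features on this bucket. Enabling Object Lock also turns on versioning for the bucket and can't be undone later.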

aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region us-east-1 \
  --object-lock-enabled-for-bucket

Note: For regions other than us-east-1, add the --create-bucket-configuration flag.

aws s3api create-bucket \
  --bucket your-project-terraform-state \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1 \
  --object-lock-enabled-for-bucket

Now make sure versioning is enabled on the bucket. Versioning lets you recover previous state versions if something goes wrong. Buckets created with Object Lock already have it turned on, in which case this command changes nothing:

aws s3api put-bucket-versioning \
  --bucket your-project-terraform-state \
  --versioning-configuration Status=Enabled

Enable server-side encryption so your state files are encrypted at rest:

aws s3api put-bucket-encryption \
  --bucket your-project-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'

Block all public access to the bucket. A Terraform state file contains resource IDs, IP addresses, and potentially sensitive values. It should never be publicly accessible:

aws s3api put-public-access-block \
  --bucket your-project-terraform-state \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

Verify the bucket configuration:

# Confirm Object Lock is enabled
aws s3api get-object-lock-configuration \
  --bucket your-project-terraform-state
 
# Confirm versioning is enabled
aws s3api get-bucket-versioning \
  --bucket your-project-terraform-state
 
# Confirm encryption is configured
aws s3api get-bucket-encryption \
  --bucket your-project-terraform-state

Expected output for the Object Lock check:

{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}
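
The verification commands above can be wrapped in one helper that reports a pass or fail per setting. This is a sketch with a hypothetical function name. It checks versioning, default encryption, and the public access block, and for the last two it only confirms that a configuration exists, not what it contains.

```shell
#!/usr/bin/env bash
# check_state_bucket BUCKET
# Prints one line per check and returns non-zero if any check fails.
check_state_bucket() {
  local bucket="$1" versioning failures=0

  # --query and --output text reduce the JSON response to the bare Status value.
  versioning="$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --query 'Status' --output text 2>/dev/null || true)"
  if [ "$versioning" = "Enabled" ]; then
    echo "ok: versioning enabled"
  else
    echo "FAIL: versioning status is '${versioning:-unknown}'"
    failures=$((failures + 1))
  fi

  # These calls fail when the bucket has no such configuration.
  if aws s3api get-bucket-encryption --bucket "$bucket" >/dev/null 2>&1; then
    echo "ok: default encryption configured"
  else
    echo "FAIL: no default encryption configuration"
    failures=$((failures + 1))
  fi

  if aws s3api get-public-access-block --bucket "$bucket" >/dev/null 2>&1; then
    echo "ok: public access block configured"
  else
    echo "FAIL: no public access block configuration"
    failures=$((failures + 1))
  fi

  [ "$failures" -eq 0 ]
}

# Usage:
#   check_state_bucket your-project-terraform-state
```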

Step 2: Configure the Terraform Backend with Native Locking

In your Terraform project, create or update your backend.tf file:

terraform {
  backend "s3" {
    bucket = "your-project-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
 
    # Enable S3 native state locking
    # Requires Terraform 1.10.0+
    use_lockfile = true
 
    # Encryption at rest
    encrypt = true
  }
}

The critical difference from the old configuration is the use_lockfile = true parameter. Notice what is absent: there's no dynamodb_table argument. No DynamoDB table. No second service.

Here's a direct comparison of the old and new configurations:

Old configuration (S3 + DynamoDB):

terraform {
  backend "s3" {
    bucket         = "your-project-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # this goes away
  }
}

New configuration (S3 native locking):

terraform {
  backend "s3" {
    bucket       = "your-project-terraform-state"
    key          = "production/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true   # this replaces dynamodb_table
  }
}

Step 3: Initialize and Verify

Run terraform init to initialize the backend:

terraform init

Expected output:

Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
 
Terraform has been successfully initialized!

Run a plan to confirm everything is working end-to-end:

terraform plan

If locking is working, you'll see a brief pause while Terraform acquires the lock before the plan output appears. You'll also see the lock information if you look at the S3 bucket – a .tflock file will appear temporarily alongside your state file during the operation and disappear when it completes.

Part 2: Migration – How to Move from S3 + DynamoDB to S3 Native Locking

Follow this section if you have an existing Terraform setup using an S3 bucket and DynamoDB table for state locking, and you want to migrate to S3 native locking.

Important: Migration requires a maintenance window or at minimum a period where no Terraform operations are running. You're changing the backend configuration, which means all team members and CI/CD pipelines must stop running terraform plan or terraform apply during the migration. The migration itself takes under 10 minutes.

Step 1: Verify Your Current Setup

Before making any changes, document your existing backend configuration and confirm the state file is accessible:

# Confirm your state file is in S3
aws s3 ls s3://your-existing-bucket/path/to/terraform.tfstate
 
# Confirm the DynamoDB table exists
aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table \
  --query 'Table.TableStatus'

Check your current backend.tf and note the exact values:

# Your current backend.tf - note these values before changing anything
terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"       # note this
    key            = "path/to/terraform.tfstate"   # note this
    region         = "us-east-1"                   # note this
    encrypt        = true
    dynamodb_table = "your-dynamodb-lock-table"    # this will be removed
  }
}

Run one final plan to confirm the current state is clean and there are no unexpected changes pending:

terraform plan

If the plan shows no changes, you're in a safe state to proceed.
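
In a script, you can make the "no changes" check explicit with the -detailed-exitcode flag, which makes terraform plan exit with 0 when there are no changes, 1 on error, and 2 when changes are pending. A minimal sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# plan_status
# Prints "clean", "changes-pending", or "error" based on the plan's exit code,
# and returns 0 only when the plan is clean.
plan_status() {
  local rc=0
  terraform plan -detailed-exitcode -input=false >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "clean" ;;
    2) echo "changes-pending" ;;
    *) echo "error" ;;
  esac
  [ "$rc" -eq 0 ]
}

# Usage:
#   plan_status || { echo "Resolve pending changes before migrating" >&2; exit 1; }
```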

Step 2: Enable Object Lock on the Existing S3 Bucket

Terraform's use_lockfile setting doesn't depend on Object Lock: the lock is an ordinary object created with a conditional write, so you can skip the Object Lock command in this step if you don't need it. The part of this step that matters for state safety is the versioning check further down.

If you do want Object Lock on the state bucket for retention or compliance reasons, AWS has supported enabling it on existing buckets since late 2023, provided versioning is already enabled. Keep in mind that Object Lock can't be disabled once it's on.

To enable Object Lock on your existing bucket, run the following AWS CLI command:

aws s3api put-object-lock-configuration \
  --bucket your-existing-bucket \
  --object-lock-configuration '{"ObjectLockEnabled": "Enabled"}'

Note: This command turns on the Object Lock capability without a default retention mode or period, so no object is retained automatically. Terraform can still create and delete its lock files as usual.

Verify Object Lock is now enabled:

aws s3api get-object-lock-configuration \
  --bucket your-existing-bucket

Expected output:

{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled"
    }
}

Also verify that versioning is already enabled (it should be if you are running a production Terraform setup):

aws s3api get-bucket-versioning \
  --bucket your-existing-bucket

Expected output:

{
    "Status": "Enabled"
}

If versioning isn't enabled, enable it before proceeding:

aws s3api put-bucket-versioning \
  --bucket your-existing-bucket \
  --versioning-configuration Status=Enabled

Step 3: Update the Terraform Backend Configuration

Update your backend.tf to remove the dynamodb_table argument and add use_lockfile = true:

terraform {
  backend "s3" {
    bucket = "your-existing-bucket"
    key    = "path/to/terraform.tfstate"
    region = "us-east-1"
    encrypt = true
 
    # Add this:
    use_lockfile = true
 
    # Remove this line entirely:
    # dynamodb_table = "your-dynamodb-lock-table"
  }
}

Your updated backend.tf should look like this:

terraform {
  backend "s3" {
    bucket       = "your-existing-bucket"
    key          = "path/to/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}

Step 4: Reinitialize Terraform

Run terraform init with the -reconfigure flag. This flag tells Terraform that the backend configuration has changed intentionally and to reinitialize without prompting you to copy state (the state is already in the same bucket):

terraform init -reconfigure

Expected output:

Initializing the backend...
 
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
 
Terraform has been successfully initialized!

If you see an error here: Common causes are a Terraform version older than 1.10.0, which doesn't recognize use_lockfile, or credentials that can't reach the bucket. Check terraform version and your AWS credentials before proceeding.

Step 5: Verify the Migration

Run a plan to confirm Terraform is working correctly with the new backend configuration:

terraform plan

The plan should:

Complete successfully
Show the same result as the plan you ran in Step 1 (no changes, or the same changes as before)
NOT mention DynamoDB anywhere in its output

To confirm that locking is actually using S3 instead of DynamoDB, open a second terminal and run a plan while the first one is running. You should see the second terminal output a lock error that mentions S3, not DynamoDB:

╷
│ Error: Error acquiring the state lock
│
│ Error message: operation error S3: PutObject, https response error StatusCode: 412,
│ RequestID: ..., api error PreconditionFailed: At least one of the pre-conditions you specified did not hold
│
│ Lock Info:
│   ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
│   Path:      your-existing-bucket/path/to/terraform.tfstate.tflock
│   Operation: OperationTypePlan
│   Who:       user@hostname
│   Version:   1.10.0
│   Created:   2026-05-06 14:22:01 UTC
│   Info:
╵

The Path field shows .tfstate.tflock, a file in your S3 bucket, not a DynamoDB record. This confirms that locking is now handled entirely by S3.

Step 6: Clean Up the DynamoDB Table

Once you've confirmed the migration is working correctly and your team has run at least one successful plan and apply cycle using the new backend, you can remove the DynamoDB table.

Wait at least 24-48 hours before deleting the DynamoDB table if you have CI/CD pipelines or multiple team members. This gives time to catch any pipeline that wasn't updated with the new backend configuration.
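
During that waiting period, a quick scan of your repositories can surface backends that still point at DynamoDB. The sketch below uses a hypothetical function name and assumes your Terraform code is checked out under a single directory. It skips .terraform directories and ignores commented-out lines.

```shell
#!/usr/bin/env bash
# find_dynamodb_backends DIR
# Prints file:line:content for every .tf file under DIR that still sets
# dynamodb_table on a non-comment line. Returns 1 when nothing is found.
find_dynamodb_backends() {
  local dir="$1" hits
  hits="$(find "$dir" -type f -name '*.tf' ! -path '*/.terraform/*' -exec \
    grep -nHE '^[[:space:]]*dynamodb_table[[:space:]]*=' {} + 2>/dev/null || true)"
  [ -n "$hits" ] && printf '%s\n' "$hits"
}

# Usage:
#   find_dynamodb_backends ~/code/infrastructure
```

An empty result is a reasonable signal that every checked-out configuration has moved to use_lockfile, though it can't see pipelines that build their backend settings elsewhere, such as with -backend-config flags.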

When you're ready, delete the DynamoDB table:

aws dynamodb delete-table \
  --table-name your-dynamodb-lock-table

Confirm the deletion:

aws dynamodb describe-table \
  --table-name your-dynamodb-lock-table

Expected output:

An error occurred (ResourceNotFoundException) when calling the DescribeTable operation:
Requested resource not found

This error confirms that the table is gone. The migration is complete.

If you provisioned the DynamoDB table using Terraform (which is the recommended pattern), remove the resource from your Terraform configuration and run terraform apply to destroy it via Terraform rather than the CLI directly. This keeps your state clean:

# Remove this entire block from your Terraform configuration:
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
 
  attribute {
    name = "LockID"
    type = "S"
  }
}

After removing the block, run:

terraform apply

Terraform will detect that the DynamoDB table resource has been removed from configuration and will destroy the table.

How to Verify That Locking Is Working

After completing either the fresh setup or the migration, use this procedure to independently verify that locking is functioning correctly.

Method 1: Observe the lock file during an operation

In one terminal, start a long-running plan against a configuration with many resources:

terraform plan

While it's running, in a second terminal, check for the lock file in S3:

aws s3 ls s3://your-bucket/path/to/ | grep tflock

You should see a file like:

2026-05-06 14:22:01        512 terraform.tfstate.tflock

After the plan completes, run the same command again. The .tflock file should be gone.

Method 2: Read the lock file contents

While a plan is running, download and read the lock file to see its contents:

aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/current.lock && cat /tmp/current.lock

Expected output (formatted for readability):

{
  "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "Operation": "OperationTypePlan",
  "Info": "",
  "Who": "tolani@dev-machine",
  "Version": "1.10.0",
  "Created": "2026-05-06T14:22:01.123456789Z",
  "Path": "your-bucket/path/to/terraform.tfstate"
}

This is the same lock information that Terraform displays when a lock is held. It's now a JSON file in S3 rather than a record in DynamoDB.

How to Handle a Stuck Lock

With the DynamoDB backend, resolving a stuck lock meant deleting a record from the DynamoDB table. With S3 native locking, it means deleting the .tflock file from S3.

A lock can get stuck if:

A terraform apply or plan process was killed mid-execution
A CI/CD pipeline runner crashed during a Terraform operation
A network interruption prevented the lock release from completing

Here's how you can check for a stuck lock:

aws s3 ls s3://your-bucket/path/to/ | grep tflock

If a .tflock file exists and no Terraform operation is currently running, it is a stuck lock.

You can also read the lock to understand who held it:

aws s3 cp \
  s3://your-bucket/path/to/terraform.tfstate.tflock \
  /tmp/stuck.lock && cat /tmp/stuck.lock

This tells you who (Who field) was running the operation, what operation it was (Operation field), and when it was acquired (Created field).

And you can force-unlock using Terraform like this:

terraform force-unlock LOCK-ID

Replace LOCK-ID with the ID value from the lock file contents. For example:

terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890

Terraform will confirm:

Do you really want to force-unlock?
  Terraform will remove the lock on the remote state.
  This will allow local Terraform commands to modify this state, even though it
  may be still be in use. Only 'yes' will be accepted to confirm.
 
  Enter a value: yes
 
Terraform state has been successfully unlocked!
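
If you'd rather not copy the lock ID by hand, you can pull it out of the lock file. The helper below is a sed-based sketch with a hypothetical name. It assumes simple "key": "value" pairs with no escaped quotes, and jq would be the sturdier choice where it's installed.

```shell
#!/usr/bin/env bash
# lock_field FIELD
# Reads a Terraform lock file (JSON) on stdin and prints the string value
# of FIELD, for example ID, Who, or Created.
lock_field() {
  local field="$1"
  # Join the JSON onto one line, then capture the quoted value after "FIELD":
  tr -d '\n' | sed -n "s/.*\"${field}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Usage (aws s3 cp with "-" as the destination streams the object to stdout):
#   lock_id="$(aws s3 cp s3://your-bucket/path/to/terraform.tfstate.tflock - | lock_field ID)"
#   terraform force-unlock "$lock_id"
```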

An alternative is to delete the lock file directly via CLI. If terraform force-unlock doesn't work (for example, because you are running in a CI environment without Terraform available), delete the lock file directly:

aws s3 rm s3://your-bucket/path/to/terraform.tfstate.tflock

Only delete the lock file if you are certain no Terraform operation is currently running. Deleting a lock that is actively held by a running operation will allow a second concurrent operation to start, which is exactly the race condition locking is designed to prevent.

Rollback Plan: If Something Goes Wrong

If you encounter problems after migrating, you can roll back to the S3 + DynamoDB setup with these steps.

Step 1: Stop all Terraform operations in your team and CI/CD pipelines.

Step 2: Recreate the DynamoDB table if you already deleted it:

aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

Step 3: Revert backend.tf to the previous configuration:

terraform {
  backend "s3" {
    bucket         = "your-existing-bucket"
    key            = "path/to/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"   # restored
    # Remove: use_lockfile = true
  }
}

Step 4: Reinitialize:

terraform init -reconfigure

Step 5: Verify:

terraform plan

The state file hasn't moved, so there's no data loss during a rollback. The only change is which locking mechanism Terraform uses.

Note: If you enabled Object Lock on the S3 bucket, it doesn't prevent the rollback. Object Lock is a bucket-level capability that's independent of Terraform's locking. Using dynamodb_table in your backend config tells Terraform to lock through DynamoDB regardless of whether Object Lock is enabled on the bucket.

Security Best Practices for Your State Bucket

Migrating to S3 native locking is a good opportunity to review the overall security configuration of your state bucket. Here are the practices every production Terraform state bucket should implement:

Enable Versioning (Strongly Recommended)

Versioning isn't what makes S3 native locking work, but you should treat it as mandatory for any state bucket. It ensures that if a state file is accidentally overwritten or corrupted, you can restore a previous version.

aws s3api put-bucket-versioning \
  --bucket your-state-bucket \
  --versioning-configuration Status=Enabled

Block All Public Access (Non-Negotiable)

Your state file contains resource ARNs, IP addresses, and may contain sensitive values passed through Terraform variables. It must never be publicly accessible.

aws s3api put-public-access-block \
  --bucket your-state-bucket \
  --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

Enable Server-Side Encryption

Always encrypt state files at rest. AES256 is the minimum. If your organization requires KMS key management:

aws s3api put-bucket-encryption \
  --bucket your-state-bucket \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/your-kms-key-id"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'

Apply Least-Privilege IAM Permissions

The role or user that Terraform uses to access the state bucket should have only the permissions it needs. Here's a minimal IAM policy for S3 native locking:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TerraformStateAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-state-bucket",
        "arn:aws:s3:::your-state-bucket/*"
      ]
    }
  ]
}

Notice what is absent: there are no DynamoDB permissions. This is a cleaner, smaller permission set than the old approach required.

Enable Access Logging

Log all access to your state bucket in CloudTrail or S3 server access logs. This gives you an audit trail of every time state was read, written, or locked:

aws s3api put-bucket-logging \
  --bucket your-state-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "your-logging-bucket",
      "TargetPrefix": "terraform-state-access/"
    }
  }'

Conclusion

AWS S3 native state locking removes the need for a DynamoDB table from your Terraform backend setup. The result is simpler infrastructure, a smaller IAM permission surface, and one fewer service to provision, monitor, and pay for across every environment your team manages.

Here's a summary of what you accomplished:

Understood what state locking is and why it's required for safe Terraform operations
Compared S3 native locking to the existing S3 + DynamoDB approach
Set up a fresh Terraform backend using S3 native locking with correct bucket configuration
Migrated an existing backend from S3 + DynamoDB to S3 native locking safely
Learned how to verify locking, handle stuck locks, and roll back if needed
Applied security best practices to the state bucket

This pattern – using S3 native locking – is the recommended approach for all new Terraform projects on AWS going forward. If you're managing a large estate with multiple Terraform backends, consider automating the migration using a script or Terraform module that applies the pattern across all your state buckets.
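
As a starting point for that kind of automation, here's a sketch of the file-rewriting step only. The function name is hypothetical, and it assumes each backend sets dynamodb_table on its own line, as in the examples in this tutorial. It keeps a .bak copy of every file it changes.

```shell
#!/usr/bin/env bash
# migrate_backend_file FILE
# Swaps the dynamodb_table argument in FILE for use_lockfile = true and keeps
# the original at FILE.bak. Returns 1 if FILE has no dynamodb_table argument.
migrate_backend_file() {
  local file="$1" tmp
  local ddb='^[[:space:]]*dynamodb_table[[:space:]]*='

  grep -qE "$ddb" "$file" || return 1
  cp "$file" "${file}.bak"
  tmp="$(mktemp)"

  if grep -qE '^[[:space:]]*use_lockfile[[:space:]]*=' "$file"; then
    # use_lockfile is already set: just drop the DynamoDB line.
    sed -E "/${ddb}/d" "$file" > "$tmp"
  else
    # Replace the DynamoDB line, preserving its indentation.
    sed -E 's/^([[:space:]]*)dynamodb_table[[:space:]]*=.*$/\1use_lockfile = true/' "$file" > "$tmp"
  fi

  # Write back through the original path so file permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage:
#   for f in */backend.tf; do migrate_backend_file "$f" && echo "updated $f"; done
```

Rewriting the file is only half the job: each project still needs terraform init -reconfigure and a verification plan afterward, as described in Part 2.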

If you are building or optimizing cloud infrastructure for a startup and want a complete reference for production-ready Terraform modules, CI/CD pipeline patterns, and infrastructure runbooks, check out The Startup DevOps Field Guide. It covers the full lifecycle of AWS infrastructure from initial setup to production reliability.

References

How to Land Your First Cloud or DevOps Role: What Hiring Managers Actually Look For

Tolani Akintayo — Thu, 30 Apr 2026 14:33:32 +0000

You've completed three AWS courses. You have notes from a dozen Docker tutorials. You know what Kubernetes is, what CI/CD means, and you can explain Infrastructure as Code without hesitating.

And yet the applications go out, and nothing comes back.

This is one of the most frustrating experiences in tech. You're genuinely learning, genuinely putting in the time, and you have nothing to show for it in terms of results. You start to wonder if the market is too competitive, if you need one more certification, or if there's some hidden door everyone else found that you're missing.

The truth is simpler and more actionable than any of that: hiring managers can't see your YouTube watch history. They can see your GitHub. Most beginners optimize for learning. Hired candidates optimize for proof.

In this guide, you'll get an honest breakdown of the nine factors hiring managers actually evaluate when they look at a junior cloud or DevOps candidate and a concrete 90-day plan to address each one. By the end, you'll know exactly where you stand and exactly what to do next.

The Three Patterns That Keep Beginners Stuck
What Hiring Managers Are Actually Evaluating
Factor 1: Proof of Work (The Non-Negotiable)
- The Three Projects That Cover Everything
Factor 2: System-Level Thinking
Factor 3: Software Engineering Fundamentals
Factor 4: Communication Skills
Factor 5: Consistency Over Intensity
Factor 6: Networking and Visibility
Factor 7: Ownership Mindset
Factor 8: Business Awareness
Factor 9: Learning Agility
Your 90-Day Action Plan
Honest Self-Assessment: Where Do You Stand?
Conclusion
References and Recommended Resources

The Three Patterns That Keep Beginners Stuck

Pattern 1: The Tutorial Loop

Week 1: You watch eight hours of Docker content. Week 2: You start an AWS course and get 70% through. Week 3: A Kubernetes series looks interesting, so you start that instead. Week 4: You open LinkedIn and wonder why you're not getting callbacks.

Watching tutorials feels like progress. It's comfortable, passive, and has no failure state. Nothing breaks. Nothing goes wrong.

The problem is that it produces nothing a hiring manager can evaluate. Courses and certifications tell an employer what you've been exposed to. Your GitHub tells them what you can actually do.

Pattern 2: The Theory-Practice Gap

You can explain CI/CD fluently. You've read the Kubernetes documentation. You understand the conceptual difference between a container and a virtual machine.

But you've never taken a simple application, containerized it, connected it to a pipeline, and deployed it to a cloud server with a real URL that someone can visit.

In an interview, "I understand how it works" and "I have built this and here is the link" are not equivalent answers. Hiring managers hear the first version from hundreds of candidates. The second version gets callbacks.

Pattern 3: Silent Learning

This one is perhaps the most painful pattern because the learning is real. You're putting in the work every day but nobody knows. No GitHub activity. No LinkedIn posts. No community presence. Just cold applications sent from job boards to ATS systems that filter you out before a human ever sees your name.

The hard truth: people get hired through people. A hiring manager who has seen your LinkedIn post about a problem you solved is significantly more likely to give your résumé serious attention than the résumé of a stranger who applied through a portal.

What Hiring Managers Are Actually Evaluating

I've grouped the nine factors that follow into three buckets: Mindset, Execution, and Visibility. The order matters: mindset shapes how you execute, and execution is what powers visibility.

Bucket	Covers	Factors
Mindset	How you think about problems and your career	Factors 2, 7, 8, 9
Execution	What you actually build and demonstrate	Factors 1, 3
Visibility	Whether the right people know you exist	Factors 4, 5, 6

Let's go through each one.

Factor 1: Proof of Work (The Non-Negotiable)

If there's one thing to take from this entire article, it's this: no portfolio means no serious consideration. The most technically capable candidate in the applicant pool is invisible without proof of work.

This isn't about impressing anyone with complexity. It's about demonstrating that you can take a system from zero to deployed, documented, and working.

Here's the checklist every portfolio project should meet before you consider it done:

It's deployed: there's a real URL you can share, not "it works on my machine"
It has a CI/CD pipeline: code changes are automatically tested and deployed
Infrastructure is defined as code: not manually clicked together in the AWS console
It has monitoring and alerting: you know when it breaks before users tell you
It's documented: a README explains what it does, how to run it, and how it works
It's on GitHub publicly: with real commit history showing iterative work

If your project meets all six criteria, you have proof of work. If it meets four of six, you have a project in progress. Finish it before you start applying.

The Three Projects That Cover Everything

You don't need ten projects. You need two to three projects that together demonstrate the full range of DevOps skills.

Project 1: The Full-Stack Deploy Pipeline

This is the foundational DevOps project every beginner should build first.

Take any simple web application – a Python Flask app, a Node.js API, or even a static site. Containerize it with Docker. Write a CI/CD pipeline that runs tests, builds the Docker image, and deploys to a cloud server automatically on every push to the main branch. You can also set up Nginx as a reverse proxy and add an uptime monitor (UptimeRobot has a free tier).

Tools: GitHub Actions, Docker, AWS EC2 or Render.com, Nginx.

Why it matters to a hiring manager: it proves you can automate a full deployment workflow end-to-end. The hiring manager can visit your URL, see it running, and inspect your pipeline history.

This single project puts you ahead of most applicants who only have course completion screenshots.

Project 2: Infrastructure as Code with Terraform

Write Terraform code that provisions a complete environment: a VPC, public and private subnets, an EC2 instance with properly scoped security group rules, and an S3 bucket for remote state. Destroy it and recreate it from scratch to prove the code actually works. Add a GitHub Actions workflow that runs terraform plan on pull requests and terraform apply on merge to main.

Tools: Terraform, AWS (or Azure/GCP), GitHub Actions.

Why it matters: Infrastructure as Code with Terraform is a required skill at almost every company running cloud infrastructure. Showing you can write, version-control, and automate Terraform demonstrates a core professional competency.

Project 3: Monitoring and Observability Stack

Deploy a monitoring stack using Docker Compose: Prometheus scraping metrics from your application and the host, Grafana dashboards showing CPU, memory, request rates, and error rates, and Alertmanager configured to send alerts to Slack or email when thresholds are crossed. Connect this to your Project 1 application so the pipeline deploys and the monitoring watches it.

Tools: Prometheus, Grafana, Alertmanager, Node Exporter, Docker Compose.

Why it matters: most beginner portfolios have zero observability work. This project immediately signals that you understand production engineering, not just deployment. Any senior DevOps engineer or SRE reviewing your application will notice it and it will set you apart.

Factor 2: System-Level Thinking

This is the mindset that separates a DevOps engineer from someone who just knows a collection of tools. System-level thinking means you can see the whole picture, not just the part you happen to be working on at any given moment.

Here's the mental test hiring managers are running throughout your interview: can you trace a user request from the moment they click a button to the moment they see a response, and explain what happens at every layer in between?

Here's the full journey of a web request, the map of modern infrastructure every DevOps engineer needs to understand:

Step	Layer	What's happening and what can go wrong
1	User's Browser	The user types a URL. The browser needs to find the server.
2	DNS Resolution	The domain is translated into an IP address. DNS misconfigurations mean users can't reach you at all.
3	CDN / Edge Network	Traffic hits a CDN (Cloudflare, CloudFront) first. Static assets are served from the nearest edge. SSL terminates here.
4	Load Balancer	Routes the request to an available application server. If all targets are unhealthy, users get 502/503 errors.
5	Compute / Application Servers	The application code runs here in containers, on VMs, or in serverless functions. Business logic executes.
6	Database Layer	The application reads from or writes to a database. Slow queries or a full disk cause slow responses or outages.
7	Cache Layer	Redis or Memcached caches frequently-read data. Cache misses cause extra database load.
8	Response Returns	The response travels back through the stack and the user sees the result.
9	Logging and Monitoring	Every step above should emit logs and metrics. Good monitoring alerts you before users notice a problem.

Why does this matter in an interview? Consider two candidates answering the question: "Tell me about a time something broke in production."

Candidate A: "The website was down."

Candidate B: "The load balancer health checks were failing because the app containers were running out of memory due to a memory leak introduced in the previous deploy. We identified it via memory metrics in Grafana, rolled back, and added a memory limit to the container spec."

Same incident. Completely different answer. System-level thinking is what makes the difference.

Factor 3: Software Engineering Fundamentals

Many beginners rush to learn Kubernetes and Terraform before mastering the foundations that make those tools make sense. This creates a knowledge structure that looks impressive but has no solid base underneath it.

Here are the fundamentals that actually matter and what to do if you have a gap in any of them:

1. Linux and the Command Line

DevOps tools run on Linux. CI/CD jobs run in Linux containers. SSH is the front door to every server. If the terminal makes you uncomfortable, you're not ready for a production environment. This is not a preference, it's a prerequisite.

Start with daily Linux practice. The Linux Foundation's free introductory materials are a solid starting point. And here's a solid freeCodeCamp course on Linux basics.

2. Networking Fundamentals

DNS, TCP/IP, HTTP/HTTPS, load balancing, firewalls, VPCs, subnets: these concepts appear in every cloud architecture. Without them, Terraform and Kubernetes are magic boxes. Study the request flow in Factor 2 above until you can draw it from memory without looking.

Here's a computer networking fundamentals course to get you started.

3. Scripting: Bash and Python

CI/CD pipelines are scripts. Automation is scripting. If you cannot write a Bash script that reads a config file, calls an API, and handles errors gracefully, your automation ceiling is very low. Fix this by writing one small, useful script every week. Solve real problems with code.

Here's a helpful tutorial on shell scripting in Linux for beginners.
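
As a rough illustration of the "reads a config file, calls an API, and handles errors gracefully" exercise, here is a minimal sketch in Python, the other language this section names. The file name `healthcheck.json` and the keys `url` and `timeout_seconds` are hypothetical, chosen only for the example. It uses only the standard library.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path


def load_config(path):
    """Read a JSON config file and validate the one key this script needs."""
    try:
        config = json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        raise SystemExit(f"config file not found: {path}")
    except json.JSONDecodeError as exc:
        raise SystemExit(f"config file is not valid JSON: {exc}")
    if "url" not in config:  # "url" is a hypothetical key for this sketch
        raise SystemExit("config is missing required key: url")
    return config


def check_endpoint(url, timeout=5.0, opener=urllib.request.urlopen):
    """Call the URL and return (ok, detail) instead of raising on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return False, f"request failed: {exc}"


def main(config_path="healthcheck.json"):
    """Return 0 when the endpoint is healthy, 1 otherwise."""
    config = load_config(config_path)
    ok, detail = check_endpoint(config["url"], config.get("timeout_seconds", 5.0))
    print(f"{config['url']}: {detail}")
    return 0 if ok else 1
```

To use it as a command-line tool, call `sys.exit(main("healthcheck.json"))` so the exit code can fail a CI step. The `opener` parameter exists only so the network call can be swapped out in tests.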

4. Git and Version Control

Not just git commit and git push. Branching strategies, pull requests, merge conflicts, rebasing, and tagging releases are all standard practice in professional DevOps teams. Use Git for everything including your personal learning notes. Practice branching workflows intentionally.

Here's a full book on all the Git basics (and some more advanced topics, too) you need to know.

5. Docker and Containers

Docker is the universal packaging format for modern software. Understanding layers, multi-stage builds, volumes, networking, and container security is the floor, not the ceiling. Every project you build should be containerized. Write your Dockerfiles by hand instead of copying them.

Here's a course on Docker and Kubernetes to get you started.

Factor 4: Communication Skills

Technical skills set your ceiling. Communication skills determine how fast you reach it. This is the most consistently underestimated factor among beginner DevOps candidates.

Two candidates with identical technical ability will have very different career outcomes based on how clearly they communicate. Here's what that looks like in practice:

Architecture explanation: Can you describe how your project works to someone who has never seen it? Can you draw the architecture on a whiteboard and walk someone through your design decisions and the trade-offs you made?

Trade-off articulation: "I chose X over Y because..." is one of the most powerful phrases in a technical interview. It shows you understand that every decision has pros and cons and you made a conscious, reasoned choice rather than just copying a tutorial.

Written documentation: A README is your project's cover letter. A well-written README with clear setup instructions, an architecture diagram, and documented decisions demonstrates engineering maturity that most beginners don't show.

Here's a quick test: open your most recent project on GitHub and read the README as if you're a hiring manager seeing it for the first time. Does it answer these questions?

What does this project do, and why did you build it?
What does the architecture look like?
How do I run this locally, and how do I deploy it?
What decisions did you make, and why?
What would you improve if you continued working on it?

If you answered "no" to more than two of those, rewrite the README before applying anywhere. This single action will meaningfully improve your response rate.

Interview communication: Hiring managers assess communication throughout the entire interview, not just your answers. Thinking out loud, structuring your responses, and admitting uncertainty honestly are all evaluated.

Factor 5: Consistency Over Intensity

Hiring managers are pattern recognition machines. They look at your GitHub contribution graph, your LinkedIn activity, and your learning trajectory and form an impression before reading a single word on your résumé.

A binge-learning approach (10-hour weekends followed by weeks of nothing) produces a GitHub graph that tells the wrong story. Thirty minutes of focused daily practice for six months beats a monthly 10-hour binge. At the six-month mark, the daily practitioner has about 90 hours of focused work (0.5 hours × roughly 180 days). The binge learner has 60 (10 hours × 6 months), with significantly worse retention.

Here's how to build consistency in practice:

Pick a time slot in your day that you will protect. Thirty minutes is enough to make progress.
Define a four-week learning sprint with a specific goal, not "learn Terraform" but "build and deploy a VPC with Terraform and write the README."
Keep a private learning journal: date, what you studied, what you built, what confused you.
When the sprint ends, evaluate what you built and plan the next one.

What to avoid: declaring publicly on LinkedIn that you're "grinding DevOps full time" and then disappearing for six weeks. The absence is noticed. Only commit publicly to what you will actually sustain.

Factor 6: Networking and Visibility

This is the factor beginners resist most, and the one that makes the biggest practical difference in time-to-hire.

Most DevOps jobs are filled through people: referrals, community connections, LinkedIn conversations. A warm introduction from someone who has seen your work outweighs fifty cold applications every time.

Here are three ways to build visibility without it feeling performative:

Community Engagement

Join communities where DevOps engineers actually talk: AWS User Groups, local DevOps meetups, DevOps Discord servers, Reddit communities like r/devops and r/kubernetes. You don't need to be the expert. Ask specific questions, answer what you genuinely know, and show up consistently. After three to six months, people will recognize your name.

LinkedIn Content

Post once per week about something you learned, built, or got stuck on. Not marketing – documentation. A post that says "This week I configured Prometheus alerting for a Docker Compose stack. Here's what tripped me up and how I solved it" attracts recruiters, leads to conversations, and builds a searchable record of your growth over time.

Asking Good Questions in Public

When you get stuck and figure it out, write it up. Post the solution in the same community where you asked the question. Answer someone else's version of the same question later. You position yourself as a helpful, engaged learner, exactly who hiring managers want to hire.

Here's a concrete three-month visibility sprint to follow:

Timeframe	Action
Week 1-2	Update your LinkedIn headline: "Cloud / DevOps Engineer in Training │ Building with AWS, Docker, Terraform". Connect with 20 people in DevOps: engineers, recruiters, hiring managers. Add a short personal note when connecting.
Week 3-4	Write your first LinkedIn post. Document something you built or learned this week. Keep it honest and specific. 150–200 words is enough.
Month 2	Join one community. Introduce yourself. Answer one question per week.
Month 3	Post consistently once per week. Engage with others' posts. Start appearing in recruiter searches.

By month three, recruiters searching for "DevOps" in your location will encounter your activity. Some of the best entry-level DevOps opportunities come from exactly this kind of low-pressure visibility.

Factor 7: Ownership Mindset

This factor is less about personality type and more about observable behavior. Hiring managers are looking for evidence that you finish what you start, not just that you start things.

Here's what the contrast looks like:

What hiring managers frequently see	What hiring managers want to see
"I started a Kubernetes project and encountered a lot of issues"	"Here is a complete project. It deploys to AWS, has a CI/CD pipeline, is monitored, and you can access it at this URL right now."
"I was working through a Terraform course, learnt a lot about XYZ."	"I finished it, documented it, and wrote a post about what I learned."

Ownership mindset has three components. First, finish things: a complete, simple project is worth ten times more than ten incomplete complex ones. Second, take responsibility without blame when something breaks: ownership means identifying the cause, fixing it, and adding monitoring so it doesn't happen again. Third, self-direct your learning: you don't wait for someone to tell you what to learn next. You see a gap, identify how to close it, and close it. This is what "junior who can work independently" actually means in job descriptions.

Factor 8: Business Awareness

Technical skill gets you in the door. Business awareness keeps you there and accelerates your career.

The core question hiring managers are testing is: can you connect your technical decisions to cost, uptime, and user impact? Infrastructure decisions are business decisions. Cloud costs are typically the second-largest engineering expense at most companies after salaries. A misconfigured auto-scaling group or a forgotten large EC2 instance can burn thousands of dollars overnight.

Here are a few benchmark questions worth being able to answer comfortably:

If your company has a 99.9% SLA, how many minutes of downtime per month is that? (About 43 minutes.)
If you move workloads from on-demand EC2 instances to Reserved Instances, what's the approximate cost saving? (Around 40–60%.)
If your CI/CD pipeline takes 45 minutes per build and you run 20 builds per day, how much developer wait time does that represent weekly?
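
The arithmetic behind the first and third questions can be sketched in a few lines of Python. The 30-day month, the 5-day work week, and the idea that one developer waits on each build are assumptions made for this example, not figures from the article.

```python
# Worked arithmetic for two of the benchmark questions above.
# Assumptions: a 30-day month, a 5-day work week, and one developer
# waiting on every build.

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime an availability SLA permits over `days` days."""
    total_minutes = days * MINUTES_PER_DAY
    return total_minutes * (100 - sla_percent) / 100


def weekly_build_wait_hours(minutes_per_build, builds_per_day, workdays_per_week=5):
    """Total hours per week spent waiting on builds."""
    return minutes_per_build * builds_per_day * workdays_per_week / 60


if __name__ == "__main__":
    # 43,200 minutes in a 30-day month x 0.1% = about 43.2 minutes
    print(f"99.9% SLA: {allowed_downtime_minutes(99.9):.1f} min/month")
    # 45 min x 20 builds x 5 days = 4,500 minutes = 75 hours
    print(f"CI wait: {weekly_build_wait_hours(45, 20):.0f} hours/week")
```

Under those assumptions the pipeline question works out to 75 hours of waiting per week, which is the kind of number that turns "the build is slow" into a business case.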

Most junior candidates can't answer these fluently in an interview. Candidates who can stand out immediately, not because the questions are hard, but because so few people bother to connect infrastructure and business.

The simple habit to build: whenever you describe a technical decision in your project documentation or in an interview, add the business dimension. "I configured auto-scaling" becomes "I configured auto-scaling to handle traffic spikes, which eliminated the cost of over-provisioning and reduced our estimated monthly cloud spend by approximately $X."

Factor 9: Learning Agility

Everyone claims to be a fast learner. It's the most overused phrase in technology job applications. Here's how to make it actually mean something.

Saying "I'm a fast learner" in an interview is table stakes. The question is whether you can prove it. Proof sounds like this: "I had never used GitHub Actions before. I needed a CI/CD pipeline for a project I was building. In 48 hours, I had a working pipeline that runs tests, builds a Docker image, and deploys to AWS."

What makes that credible: it names a specific tool, a specific timeframe, and a specific outcome. There is a GitHub repository with a commit history and a working pipeline that a hiring manager can actually look at.

Learning agility is not about knowing many tools shallowly. It's about picking up new tools quickly because you deeply understand the underlying concepts. Tool names change every few years. Concepts (networking, automation, observability, reliability) do not.

To build a concrete track record of learning agility: once a month, pick one tool you haven't used. Follow its quick-start guide. Build something small. Document what was difficult. Post about it. This is your learning agility portfolio: visible, dated, and specific.

Your 90-Day Action Plan

Here is a concrete, sequential plan that takes you from where you are now to being ready for your first DevOps interview.

Month 1: Build Your Foundation

Focus entirely on Project 1 from the Proof of Work section. Build it completely. Deploy it. Get the live URL. Don't start Project 2 until Project 1 meets all six checklist criteria.

Alongside the build: 30 minutes of Linux and Bash scripting practice daily. This isn't optional, it's the foundation everything else runs on.

Month 2: Expand Your Execution and Start Your Visibility

Begin Project 2 (Terraform IaC). Write your first LinkedIn post. It doesn't need to be polished, but it does need to be specific. Join one community and introduce yourself.

Month 3: Complete the Portfolio and Document Everything

Finish all three projects to full checklist standard. Polish every README. Add architecture diagrams. Optimize your GitHub profile: pin your three best repos, write a profile README that describes who you are and what you build, and add links to your live project URLs.

Month 4 Onward: Apply with Strategy

Don't start applying before month four. Apply with real proof of work in hand. Target five to ten quality applications per week rather than spraying a hundred. Include your GitHub and your best project's live URL in every application. For roles at companies where you have a community connection, reach out to that person before applying.

Track every application in a spreadsheet: company, role, date applied, status, outcome, notes. After thirty applications, you'll have enough data to see what's working and what isn't.

Here's the full 90-day breakdown:

Timeframe	Focus	Milestone
Week 1-2	Linux fundamentals. Set up GitHub profile. Start Project 1.	Foundation
Week 3-4	Complete Project 1 CI/CD pipeline. Deploy. Get live URL. Write README.	First Proof of Work
Month 2	Begin Project 2. First LinkedIn post. Join one community.	Visibility begins
Month 2-3	Complete Project 2. Scaffold monitoring (Project 3). Post weekly on LinkedIn.	Building momentum
Month 3	Finish all 3 projects to checklist standard. Polish READMEs and GitHub profile.	Portfolio complete
Month 4+	Apply strategically. Continue posting and community engagement.	Active job search

Honest Self-Assessment: Where Do You Stand?

Go through each statement below. Be completely honest: this is for you, not anyone else.

Statement	Action if the answer is No
I can explain a web request end-to-end (DNS → load balancer → compute → database → logs)	Study Factor 2 until you can draw this from memory
I have at least one deployed project with a live URL	This is Priority 1. Nothing else matters more right now.
My best project has a CI/CD pipeline that auto-deploys on push	Add this to your existing project this week
I have written infrastructure as code (Terraform or CloudFormation)	Project 2 is your next build target
My projects have READMEs that explain architecture and decisions	Spend one hour today rewriting your README
I have posted about my learning on LinkedIn in the last 30 days	Post something today: document what you built last week
I am part of at least one DevOps community	Join r/devops or an AWS Discord server this week
I can write a Bash script that solves a real automation problem	30 minutes of daily scripting practice for the next 30 days
I can explain what I built, why I made each decision, and what I'd change	Practice saying this out loud about each project until it's fluent

Count your "no" answers. Each one is a specific, actionable gap, not a vague sense of being behind. That's the difference between this self-assessment and the anxious feeling of "I'm not ready yet." You're not behind. You just have a prioritized list of what to build next.

Conclusion

Here's what you know now that most beginners still don't:

The gap between you and a DevOps job isn't a gap in certifications, a gap in courses completed, or a gap in the number of tools you've heard about. It's a gap in proof of work, visibility, and the consistency with which you execute.

Hiring managers aren't looking for someone who has watched everything. They're looking for someone who has built something, documented it, deployed it, monitored it, and can clearly explain every decision they made along the way.

The path isn't secret. It's just work. Build two to three complete projects that meet the full checklist. Document everything. Show up consistently in communities and on LinkedIn. Apply with strategy. Iterate based on feedback.

If you want a production-grade reference to support your DevOps journey, complete with real Terraform modules, CI/CD workflow templates, infrastructure runbooks, and platform engineering patterns used in real startup environments, The Startup DevOps Field Guide was built for exactly this stage of your career.

The information gap between you and your first DevOps role is smaller than you think. The execution gap is where the work is. Start today.

References and Recommended Resources

roadmap.sh/devops: The community-maintained DevOps learning roadmap. Use this to sequence what you learn next and avoid random jumps between topics.
DORA State of DevOps Report: Free annual report on what DevOps practices actually improve software delivery performance. Gives you the vocabulary hiring managers speak.
Linux Foundation - Introduction to Linux: Free introductory Linux course. If the terminal still makes you nervous, start here.
The Phoenix Project: A business novel about DevOps transformation. Teaches core concepts through story. Gives you vocabulary for business-aware conversations.
ExplainShell.com: Paste any command you find online and see exactly what every part does. Use this constantly while building your projects.
GitHub - How to Write a Good README: Official GitHub guidance on repository documentation.
Prometheus Documentation: Official docs for the monitoring tool used in Project 3.
Terraform Getting Started - AWS: Official step-by-step guide for Project 2.
GitHub Actions Documentation: Complete reference for building CI/CD pipelines in Project 1.
freeCodeCamp - Learn Linux for Beginners: Comprehensive Linux guide available on freeCodeCamp.

How to Deploy a Full-Stack Next.js App on Cloudflare Workers with GitHub Actions CI/CD

Md Tarikul Islam — Wed, 29 Apr 2026 14:23:26 +0000

I typically build my projects using Next.js 14 (App Router) and Supabase for authentication along with Postgres. The default deployment choice for a Next.js app is usually Vercel, and for good reason: it provides an excellent developer experience.

But after running the same project on both platforms for about a week, I started exploring Cloudflare Workers as an alternative. I noticed improvements in latency (lower TTFB) and found the free tier to be more flexible for my use case.

Deploying Next.js apps on Cloudflare used to be challenging. Earlier solutions like Cloudflare Pages had limitations with full Next.js features, and tools like next-on-pages often lagged behind the latest releases.

That changed with the introduction of @opennextjs/cloudflare. It allows you to compile a standard Next.js application into a Cloudflare Worker, supporting features like SSR, ISR, middleware, and the Image component – all without requiring major code changes.

In this guide, I’ll walk you through the exact steps I used to deploy my full-stack Next.js + Supabase application to Cloudflare Workers.

This article is the runbook I wish I had when I started.

Why Choose Cloudflare Workers Over Vercel?
Prerequisites
The Stack
Step 1 — Install the Cloudflare Adapter
Step 2 — Wire OpenNext into next dev
Step 3— Local Environment Setup with .dev.vars
Step 4 — Deploy Your App from Your Local Machine
Step 5 — Push your secrets to the Worker
Step 6 — Set Up Continuous Deployment with GitHub Actions
Step 7 — Updating the project (the daily workflow)
Final thoughts

Why Choose Cloudflare Workers Over Vercel?

When deploying a Next.js application, Vercel is often the default choice. It offers a smooth developer experience and tight integration with Next.js.

But Cloudflare Workers provides a compelling alternative, especially when you care about global performance and cost efficiency.

Here’s a high-level comparison (at the time of writing):

Concern	Vercel (Hobby)	Cloudflare Workers (Free Tier)
Requests	Fair usage limits	Millions of requests per day
Cold starts	~100–300 ms (region-based)	Near-zero (V8 isolates)
Edge locations	Limited regions for SSR	300+ global edge locations
Bandwidth	~100 GB/month (soft cap)	Generous / no strict cap on free tier
Custom domains	Supported	Supported
Image optimization	Counts toward usage	Available via `IMAGES` binding
Pricing beyond free	Starts at ~$20/month	Low-cost, usage-based pricing

Key Takeaways

Lower latency globally: Cloudflare runs your app across hundreds of edge locations, reducing response time for users worldwide.
Minimal cold starts: Thanks to V8 isolates, functions start almost instantly.
Cost efficiency: The free tier is generous enough for portfolios, blogs, and many small-to-medium apps.

Trade-offs to Consider

Cloudflare Workers use a V8 isolate runtime, not a full Node.js environment. That means:

Some Node.js APIs like fs or child_process aren't available
Native binaries or certain libraries may not work

That said, for most modern stacks – like Next.js + Supabase + Stripe + Resend – this limitation is rarely an issue.

In short, choose Vercel if you want the simplest, plug-and-play Next.js deployment. Choose Cloudflare Workers if you want better edge performance and more flexible scaling.

Prerequisites

Before getting started, make sure you have the following set up. Most of these take only a few minutes:

Node.js 18+ and pnpm 9+ (you can also use npm or yarn, but this guide uses pnpm)
A Cloudflare account 👉 https://dash.cloudflare.com/sign-up
A Supabase account (if your app uses a database) 👉 https://supabase.com
A GitHub repository for your project (required later for CI/CD setup)
A domain name (optional) – You’ll get a free *.workers.dev URL by default.

Install Wrangler (Cloudflare CLI)

We’ll use Wrangler to build and deploy the application:

pnpm add -D wrangler

The Stack

Here’s the tech stack used in this project:

Next.js (v14.2.x): Using the App Router with Edge runtime for both public and dashboard routes
Supabase: Handles authentication, Postgres database, and Row-Level Security (RLS)
Tailwind CSS + UI utilities: For styling, along with lightweight animation using Framer Motion
Cloudflare Workers: Deployment powered by @opennextjs/cloudflare and wrangler
GitHub Actions: Used to automate CI/CD and deployments

Note: If you're using Next.js 15 or later, you can remove the --dangerouslyUseUnsupportedNextVersion flag from the build script, as it's only required for certain Next.js 14 setups.

Step 1 — Install the Cloudflare Adapter

From inside your existing Next.js project, install the OpenNext adapter along with Wrangler (Cloudflare’s CLI tool):

pnpm add @opennextjs/cloudflare
pnpm add -D wrangler

Then add the deploy scripts to package.json:

{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "next lint",

    "cloudflare-build": "opennextjs-cloudflare build --dangerouslyUseUnsupportedNextVersion",
    "preview":          "pnpm cloudflare-build && opennextjs-cloudflare preview",
    "deploy":           "pnpm cloudflare-build && wrangler deploy",
    "upload":           "pnpm cloudflare-build && opennextjs-cloudflare upload",
    "cf-typegen":       "wrangler types --env-interface CloudflareEnv cloudflare-env.d.ts"
  }
}

What each script does:

Script	What it does
`pnpm cloudflare-build`	Compiles your Next app into `.open-next/` (the Worker bundle). No upload.
`pnpm preview`	Builds and runs the Worker locally with `wrangler dev`. Closest thing to prod.
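`pnpm run deploy`	Builds and uploads to Cloudflare. This ships to production. (Use `run`: plain `pnpm deploy` invokes pnpm's built-in deploy command instead of this script.)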
`pnpm upload`	Builds and uploads a new version without promoting it (for staged rollouts).
`pnpm cf-typegen`	Regenerates `cloudflare-env.d.ts` types after editing `wrangler.jsonc`.

Heads up: the Pages-based @cloudflare/next-on-pages is a different tool. We are not using Pages — we're deploying as a real Worker. Don't mix the two.

Step 2 — Wire OpenNext into `next dev`

So that pnpm dev can read your Cloudflare bindings (env vars, R2, KV, D1, …) the same way production will, edit next.config.mjs:

/** @type {import('next').NextConfig} */
const nextConfig = {};

if (process.env.NODE_ENV !== "production") {
  const { initOpenNextCloudflareForDev } = await import(
    "@opennextjs/cloudflare"
  );
  initOpenNextCloudflareForDev();
}

export default nextConfig;

We only call it in development so next build stays fast and CI doesn't spin up a Miniflare instance for nothing.

Step 3 — Local Environment Setup with `.dev.vars`

When working with Cloudflare Workers locally, Wrangler uses a file called .dev.vars to store environment variables (instead of .env.local used by Next.js).

A simple and reliable approach is to keep an example file in your repo and ignore the real one.

Example: `.dev.vars.example` (committed)

NEXT_PUBLIC_SUPABASE_URL="https://YOUR-PROJECT-ref.supabase.co"
NEXT_PUBLIC_SUPABASE_ANON_KEY="YOUR-ANON-KEY"
NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL="admin@example.com"

Set Up Your Local Environment

Run the following commands:

cp .dev.vars.example .dev.vars
cp .dev.vars .env.local

.dev.vars is used by Wrangler (wrangler dev)
.env.local is used by Next.js (next dev)

Why Use Both Files?

next dev reads from .env.local
wrangler dev (used in pnpm preview) reads from .dev.vars

Keeping both files in sync ensures your app behaves consistently in development and when running in the Cloudflare runtime.
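
Because the two files are copied by hand, they can drift apart. Below is a minimal sketch of a parity check, written in Python only to keep it dependency-free (the same idea could be a small Node script). The parser is deliberately simplified and assumes plain KEY=VALUE lines with optional surrounding quotes, like the example file above. The function names are hypothetical.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def env_drift(first, second):
    """Return sorted keys that are missing on one side or differ in value."""
    return sorted(
        key for key in first.keys() | second.keys()
        if first.get(key) != second.get(key)
    )


def check_files(dev_vars=".dev.vars", env_local=".env.local"):
    """Compare the two files on disk and return the keys that drifted."""
    a = parse_env_text(Path(dev_vars).read_text(encoding="utf-8"))
    b = parse_env_text(Path(env_local).read_text(encoding="utf-8"))
    return env_drift(a, b)
```

An empty list from `check_files()` means both files define the same keys with the same values. Anything else is a list of the variables to reconcile before running `pnpm preview`.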

Update `.gitignore`

Make sure these files are ignored:

.dev.vars
.env*.local
.open-next
.wrangler

Step 4 — Deploy Your App from Your Local Machine

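Once pnpm preview is working correctly, you're ready to deploy your application. Use pnpm run deploy rather than pnpm deploy, because deploy on its own is a built-in pnpm command and won't run your package.json script:

pnpm run deploy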

Under the hood that runs:

pnpm cloudflare-build && wrangler deploy

The first time, Wrangler will:

Compile your app to .open-next/worker.js.
Upload the script + assets to Cloudflare.
Print your live URL, e.g. https://porfolio..workers.dev.

Open it in a browser. Congratulations — you're on Cloudflare's edge in 330+ cities. The page should be served in <100 ms TTFB from anywhere.
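
If you want to check the TTFB figure yourself, curl can report time to first byte through its time_starttransfer write-out variable. Here's a minimal sketch. The helper names and the example URL are placeholders, not anything from this project.

```shell
# Measure time to first byte (in seconds) for a URL.
measure_ttfb() {
  curl -o /dev/null -s -w '%{time_starttransfer}' "$1"
}

# Succeed when a measurement (seconds) is below a threshold (seconds).
ttfb_under() {
  awk -v t="$1" -v max="$2" 'BEGIN { exit !(t + 0 < max + 0) }'
}

# Example (hypothetical URL, use the one Wrangler printed for you):
#   ttfb="$(measure_ttfb "https://example.workers.dev")"
#   ttfb_under "$ttfb" 0.100 && echo "TTFB ${ttfb}s is under 100 ms"
```

Keep in mind that time_starttransfer includes DNS lookup and the TLS handshake, so a first cold request will usually read higher than repeat requests.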

Here's the live version of my own portfolio deployed this way

Step 5 — Push Your Secrets to the Worker

Local .dev.vars is not uploaded by wrangler deploy. You have to push secrets explicitly:

wrangler secret put NEXT_PUBLIC_SUPABASE_URL
wrangler secret put NEXT_PUBLIC_SUPABASE_ANON_KEY
wrangler secret put NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL

Each command prompts you for the value and stores it encrypted on Cloudflare. Or do it visually:

Cloudflare Dashboard → Workers & Pages → your worker → Settings → Variables and Secrets → Add.

Important: NEXT_PUBLIC_* vars are inlined into the client bundle at build time, so they also need to be available when pnpm cloudflare-build runs (locally, that's your .env.local; in CI, see Step 6).
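
Typing each value by hand gets tedious as the list grows. As a rough sketch, and assuming wrangler secret put reads the value from standard input when it's piped in (worth confirming against your Wrangler version), a loop over .dev.vars could push them all. push_secrets is a hypothetical helper name.

```shell
# Push every KEY="value" pair from an env-style file as a Worker secret.
# Assumption: `wrangler secret put NAME` accepts the value on stdin.
push_secrets() {
  local file="$1" key value
  while IFS='=' read -r key value || [ -n "$key" ]; do
    # Skip blank lines and comment lines.
    case "$key" in ''|\#*) continue ;; esac
    # Strip one pair of surrounding double quotes, if present.
    value="${value%\"}"
    value="${value#\"}"
    printf '%s' "$value" | wrangler secret put "$key"
  done < "$file"
}

# Example: push_secrets .dev.vars
```

The loop only handles plain single-line KEY="value" entries, which is what the example file in Step 3 uses.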

Step 6 — Set Up Continuous Deployment with GitHub Actions

Once your local deployment is working, the next step is automating deployments so every push to the main branch updates production automatically.

With this workflow:

Pull requests will run validation checks
Production deploys only happen after successful builds
Broken code never reaches your live site

Create the following file inside your project:

.github/workflows/deploy.yml

name: CI / Deploy to Cloudflare Workers

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: cloudflare-deploy-${{ github.ref }}
  cancel-in-progress: true

jobs:
  verify:
    name: Lint and Build
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install --frozen-lockfile
      - run: pnpm lint
      - run: pnpm build
        env:
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          NEXT_PUBLIC_SUPABASE_ANON_KEY: ${{ secrets.NEXT_PUBLIC_SUPABASE_ANON_KEY }}
          NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL: ${{ secrets.NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL }}

  deploy:
    name: Deploy to Cloudflare Workers
    needs: verify
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install --frozen-lockfile

      - name: Build and Deploy
        run: pnpm run deploy
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          NEXT_PUBLIC_SUPABASE_ANON_KEY: ${{ secrets.NEXT_PUBLIC_SUPABASE_ANON_KEY }}
          NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL: ${{ secrets.NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL }}

Required GitHub repo secrets

Go to GitHub repo → Settings → Secrets and variables → Actions → New repository secret and add:

Secret	Where to get it
`CLOUDFLARE_API_TOKEN`	https://dash.cloudflare.com/profile/api-tokens → "Edit Cloudflare Workers" template
`CLOUDFLARE_ACCOUNT_ID`	Cloudflare dashboard → right sidebar, "Account ID"
`CLOUDFLARE_ACCOUNT_SUBDOMAIN`	Your `*.workers.dev` subdomain (used only for the deployment URL link)
`NEXT_PUBLIC_SUPABASE_URL`	Supabase project settings
`NEXT_PUBLIC_SUPABASE_ANON_KEY`	Supabase project settings
`NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL`	Email pre-filled on `/dashboard/login`

That's it. Push it to main and it'll go live in about 90 seconds. PRs run lint and build only, so broken code never reaches production.
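
A missing secret tends to show up late, as a failed upload or an empty value in the client bundle, so it can help to fail fast before the build starts. The sketch below is one possible guard and isn't part of the workflow above. require_env is a hypothetical helper name.

```shell
# Fail when any of the named environment variables is unset or empty.
# Every missing name is reported, not just the first one.
require_env() {
  local name value missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_env CLOUDFLARE_API_TOKEN CLOUDFLARE_ACCOUNT_ID && pnpm run deploy
```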

Step 7 — Updating the Project (the Daily Workflow)

After the initial setup, the loop is boringly simple — which is the whole point. Here's what I actually do day-to-day:

Code Change

git checkout -b feat/new-section
# ...edit files...
pnpm dev                # iterate locally
pnpm preview            # final smoke test on the Worker runtime
git commit -am "feat: add new section"
git push origin feat/new-section

Open a PR and the verify job runs. Then review and merge, and the deploy job ships to Cloudflare automatically.

Updating env Vars / Secrets

# Local
nano .dev.vars

# Production
wrangler secret put NEXT_PUBLIC_SUPABASE_URL
# ...etc.

Final Thoughts

When I started this migration, I was nervous about leaving Vercel, because the Next.js DX there is genuinely excellent. But the moment you push beyond a hobby site, the comparison on economics and edge performance isn't close: Cloudflare wins.

With @opennextjs/cloudflare, the developer experience has also caught up: my pnpm dev loop is identical, my pnpm preview mimics production, and git push deploys globally in ~90 seconds.

If you've been holding off because the old Cloudflare Pages + Next.js story was rough, that era is over. Try this runbook on a side project this weekend and see for yourself.

If you found this useful, the full repo is here — feel free to clone it as a starter.

Happy shipping.

— Tarikul

How to Create a GPU-Optimized Machine Image with HashiCorp Packer on GCP

Rasheedat Atinuke Jamiu — Wed, 22 Apr 2026 20:30:00 +0000

Every time you spin up GPU infrastructure, you do the same thing: install CUDA drivers and DCGM, apply OS-level GPU tuning, and fight dependency issues. It's the same ritual every single time, wasting expensive cloud credits and leaving you frustrated before the actual work begins.

In this article, you'll build a reusable GPU-optimized machine image using Packer, pre-loaded with NVIDIA drivers, CUDA Toolkit, NVIDIA Container Toolkit, DCGM, and system-level GPU tuning like persistence mode.

Prerequisites
Project Setup
Step 1: Install Packer
Step 2: Set Up Project Directory
Step 3: Install Packer's Plugins
Step 4: Define Your Source
Step 5: Writing the Build Template
Step 6: Writing the GPU Provisioning Script
Step 7: Assembling and Running the Build
Step 8: Test the Image and Verify the GPU Stack
Conclusion
References

Prerequisites

HashiCorp Packer >= 1.9
Google Compute Packer plugin (installed via packer init)
Optionally, the AWS Packer plugin can be used for EC2 builds by adding an amazon-ebs source to source.pkr.hcl
GCP project with Compute Engine API enabled (or AWS account with EC2 access)
GCP authentication (gcloud auth application-default login) or AWS credentials
Access to an NVIDIA GPU instance type (for example, A100, H100, L4 on GCP; p4d, p5, G6 on AWS)

Project Setup

Step 1: Install Packer

To get started, you'll install Packer with the steps below if you're on macOS (or you can follow the official documentation for Linux and Windows installation guides).

First, you'll install the official Packer formula from the terminal.

Install the HashiCorp tap, a repository of all HashiCorp packages.

$ brew tap hashicorp/tap

Now, install Packer with hashicorp/tap/packer.

$ brew install hashicorp/tap/packer

Step 2: Set Up Project Directory

With Packer installed, you'll create your project directory. For clean code and separation of concerns, your project directory should follow the layout shown below. Go ahead and create these files in your packer_demo folder using this command:

mkdir -p packer_demo/script && touch packer_demo/{build.pkr.hcl,source.pkr.hcl,variable.pkr.hcl,local.pkr.hcl,plugins.pkr.hcl,values.pkrvars.hcl} packer_demo/script/base.sh

Your file directory should look like this:

packer_demo
├── build.pkr.hcl          # Build pipeline: provisioner ordering
├── source.pkr.hcl         # GCP source definition (googlecompute)
├── variable.pkr.hcl       # Variable definitions with defaults
├── local.pkr.hcl          # Local values
├── plugins.pkr.hcl        # Packer plugin requirements
├── values.pkrvars.hcl     # Variable values (copy and customize)
└── script/
    └── base.sh            # GPU provisioning script

Step 3: Install Packer's Plugins

In your plugins.pkr.hcl file, define your plugins in the packer block. The packer {} block contains Packer settings, including the required plugin versions. Inside it, the required_plugins block lists every plugin the template needs to build your image. If you're on Azure or AWS, you can check for the latest plugin here.

packer {
  required_plugins {
    googlecompute = {
      source  = "github.com/hashicorp/googlecompute"
      version = "~> 1"
    }
  }
}

Then, initialize your Packer plugin with the command below:

packer init .

Step 4: Define Your Source

With your plugin initialized, you can now define your source block. The source block configures a specific builder plugin, which is then invoked by a build block. Source blocks contain your project ID, the zone where your machine will be created, the source_image_family (think of this as your base image, such as Debian, Ubuntu, and so on), and your source_image_project_id.

In GCP, each public image family lives in an image project, such as "ubuntu-os-cloud" for Ubuntu. You'll set the machine type to a GPU machine type because you're building a base image for GPU machines, so the temporary build VM needs to be able to run your GPU setup commands.

source "googlecompute" "gpu-node" {
  project_id              = var.project_id
  zone                    = var.zone
  source_image_family     = var.image_family
  source_image_project_id = var.image_project_id
  ssh_username            = var.ssh_username
  machine_type            = var.machine_type



  image_name        = var.image_name
  image_description = var.image_description

  disk_size           = var.disk_size
  on_host_maintenance = "TERMINATE"

  tags = ["gpu-node"]

}

Setting on_host_maintenance = "TERMINATE" tells Compute Engine to stop the VM instead of live-migrating it during host maintenance. GPU instances can't be live-migrated, so this is the setting to use for any VM with an attached GPU.

You'll define all your variables in the variable.pkr.hcl file, and set the values in values.pkrvars.hcl. Remember to always add your values.pkrvars.hcl file to .gitignore.

variable "image_name" {
  type        = string
  description = "The name of the resulting image"
}

variable "image_description" {
  type        = string
  description = "Description of the image"
}

variable "project_id" {
  type        = string
  description = "The GCP project ID where the image will be created"
}

variable "image_family" {
  type        = string
  description = "The image family to which the resulting image belongs"
}

variable "image_project_id" {
  type        = list(string)
  description = "The project ID(s) to search for the source image"
}

variable "zone" {
  type        = string
  description = "The GCP zone where the build instance will be created"
}

variable "ssh_username" {
  type        = string
  description = "The SSH username to use for connecting to the instance"
}

variable "machine_type" {
  type        = string
  description = "The machine type to use for the build instance"
}

variable "cuda_version" {
  type        = string
  description = "CUDA toolkit version"
  default     = "13.1"
}

variable "driver_version" {
  type        = string
  description = "NVIDIA driver version"
  default     = "590.48.01"
}

variable "disk_size" {
  type        = number
  description = "Boot disk size in GB"
  default     = 50
}
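
Because values.pkrvars.hcl holds project-specific values, it's worth keeping it out of version control. One way to do that without ending up with duplicate entries is a small idempotent helper. This is only a sketch, and ignore_once is a made-up name.

```shell
# Append an entry to an ignore file only if that exact line is not there yet.
# Arguments: entry to add, ignore file (defaults to .gitignore).
ignore_once() {
  local entry="$1" file="${2:-.gitignore}"
  touch "$file"
  grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

# Example, run from the packer_demo folder:
#   ignore_once "values.pkrvars.hcl"
```

Running it twice leaves a single entry, so it's safe to keep in a setup script.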

values.pkrvars.hcl

image_name        = "base-gpu-image-{{timestamp}}"
image_description = "Ubuntu 24.04 LTS with gpu drivers and health checks"
project_id        = "your gcp project id"
image_family      = "ubuntu-2404-lts-amd64"
image_project_id  = ["ubuntu-os-cloud"]
zone              = "us-central1-a"
ssh_username      = "packer"
machine_type      = "g2-standard-4"
disk_size         = 50
driver_version    = "590.48.01"
cuda_version      = "13.1"
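
The {{timestamp}} placeholder in image_name expands to the current Unix time, which keeps every build's image name unique. Compute Engine resource names must be 1 to 63 characters of lowercase letters, digits, and hyphens, start with a letter, and not end with a hyphen. If you change the prefix, a quick check like the sketch below can catch an invalid name before a build starts. valid_image_name is a hypothetical helper, not a Packer feature.

```shell
# Check a name against Compute Engine's resource naming rule.
valid_image_name() {
  [ "${#1}" -le 63 ] &&
    printf '%s\n' "$1" | grep -Eq '^[a-z]([-a-z0-9]*[a-z0-9])?$'
}

# Build a name the same way the template does and validate it.
name="base-gpu-image-$(date +%s)"
valid_image_name "$name" && echo "OK: $name"
```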

Step 5: Writing the Build Template

Create build.pkr.hcl. The build block creates a temporary instance, runs provisioners, and produces an image.

Provisioners in this template are organized as follows:

The first provisioner runs system updates and upgrades.
The second provisioner reboots the instance (expect_disconnect = true).
The third provisioner waits for the instance to come back (pause_before), then runs script/base.sh. It sets max_retries to handle transient SSH timeouts and passes DRIVER_VERSION and CUDA_VERSION as environment variables.

Lastly, you have the post-processor to tell you the image ID and completion status:

build {
  sources = ["source.googlecompute.gpu-node"]

  provisioner "shell" {
    inline = [
      "set -e",
      "sudo apt update",
      "sudo apt -y dist-upgrade"
    ]
  }

  provisioner "shell" {
    expect_disconnect = true
    inline            = ["sudo reboot"]
  }

  # Base: NVIDIA drivers, CUDA, DCGM
  provisioner "shell" {
    pause_before = "60s"
    script       = "script/base.sh"
    max_retries  = 2
    environment_vars = [
      "DRIVER_VERSION=${var.driver_version}",
      "CUDA_VERSION=${var.cuda_version}"
    ]
  }

  post-processor "shell-local" {
    inline = [
      "echo '=== Image Build Complete ==='",
      "echo 'Image ID: ${build.ID}'",
      "date"
    ]
  }
}

Step 6: Writing the GPU Provisioning Script

Now we'll go through the base script, and break down some parts of it.

Section 1: Pre-Installation (Kernel Headers)

Before installing NVIDIA drivers, the system needs kernel headers and build tools. The NVIDIA driver compiles a kernel module during installation via DKMS, so if the headers for your running kernel aren't present, the build will fail silently, and the driver won't load on boot.

log "Installing kernel headers and build tools..."
sudo apt-get install -qq -y \
  "linux-headers-$(uname -r)" \
  build-essential \
  dkms \
  curl \
  wget
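
This is also why the reboot in Step 5 comes before this script: after a dist-upgrade, uname -r only reports the new kernel once the instance has rebooted into it. If you want the script to stop early instead of failing later inside DKMS, a guard like the sketch below could follow the install. It assumes the usual Ubuntu layout, where the headers package provides /lib/modules/<release>/build. headers_present is a made-up name.

```shell
# Succeed when the kernel build directory for a release exists.
# Arguments: kernel release (defaults to the running kernel),
#            optional root prefix (handy for testing against a chroot).
headers_present() {
  local kver="${1:-$(uname -r)}" root="${2:-}"
  [ -e "${root}/lib/modules/${kver}/build" ]
}

# Example:
#   headers_present || { echo "Kernel headers missing for $(uname -r)" >&2; exit 1; }
```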

Section 2: Installing NVIDIA's Apt Repository

This snippet downloads and installs NVIDIA's official keyring package for your Linux distribution, which adds the trusted signing keys the system needs to verify CUDA packages.

log "Adding NVIDIA CUDA apt repository (${DISTRO})..."
wget -q "https://developer.download.nvidia.com/compute/cuda/repos/${DISTRO}/${ARCH}/cuda-keyring_1.1-1_all.deb" \
  -O /tmp/cuda-keyring.deb
sudo dpkg -i /tmp/cuda-keyring.deb
rm /tmp/cuda-keyring.deb
sudo apt-get update -qq
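
The DISTRO value in that URL comes from /etc/os-release: the full script joins ID and VERSION_ID and drops the dot, so Ubuntu 24.04 becomes ubuntu2404. The sketch below shows the same derivation as two small functions so you can see the URL a given distro would produce. The function names are invented for this example.

```shell
# Build NVIDIA's repo directory name from os-release fields,
# e.g. ID=ubuntu and VERSION_ID=24.04 give "ubuntu2404".
cuda_repo_distro() {
  printf '%s%s\n' "$1" "$2" | tr -d '.'
}

# Assemble the keyring URL for a distro directory and architecture.
cuda_keyring_url() {
  local distro="$1" arch="${2:-x86_64}"
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-keyring_1.1-1_all.deb\n' \
    "$distro" "$arch"
}

# Example:
#   . /etc/os-release
#   cuda_keyring_url "$(cuda_repo_distro "$ID" "$VERSION_ID")"
```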

Section 3: Pinning NVIDIA Drivers Version

Pinning the NVIDIA driver to a specific version ensures that the system always installs and keeps using exactly that driver version, even when newer drivers appear in the repository.

NVIDIA drivers are tightly coupled with CUDA toolkit versions, kernel versions, and container runtimes like Docker or NVIDIA Container Toolkit.

A mismatch, such as the system auto‑upgrading to a newer driver, can cause CUDA to stop working, break GPU acceleration, or make the machine image inconsistent across deployments.

log "Pinning driver to version ${DRIVER_VERSION}..."
sudo apt-get install -qq -y "nvidia-driver-pinning-${DRIVER_VERSION}"

Section 4: Installing the Driver

The libnvidia-compute package installs only the compute-related user-space libraries (CUDA driver components), while the nvidia-dkms-open package installs the open-source NVIDIA kernel module, built locally via DKMS.

Together, these two packages give you a fully functional CUDA driver environment without any GUI or graphics dependencies.

Here, we're using NVIDIA's compute-only driver stack with the open-source kernel modules, which deliberately avoids installing display-related components you don't need.

The DKMS-based install is lightweight, compute-focused, and fits well with how Linux distros manage kernel modules.

log "Installing NVIDIA compute-only driver (open kernel modules)..."
sudo apt-get -V install -y \
  libnvidia-compute \
  nvidia-dkms-open

Section 5: CUDA Toolkit Installation

This part of the script installs the CUDA Toolkit for the specified version and then makes sure that CUDA’s executables and libraries are available system‑wide for every user and every shell session.

It adds CUDA binaries to PATH, so commands like nvcc, cuda-gdb, and compute-sanitizer work without specifying full paths. It also adds CUDA libraries to LD_LIBRARY_PATH, so applications can find CUDA's shared libraries at runtime.

log "Installing CUDA Toolkit ${CUDA_VERSION}..."
sudo apt-get install -qq -y "cuda-toolkit-${CUDA_VERSION}"

# Persist CUDA paths for all users and sessions
cat <<'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}
EOF
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
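
One detail worth double-checking: NVIDIA's apt repository names versioned toolkit packages with hyphens, such as cuda-toolkit-13-1. A dotted name like cuda-toolkit-13.1 appears to resolve only through apt-get's regular-expression fallback for names that contain a dot, which can match more than one package. If you'd rather request the exact package, a small helper can convert the version first. This is a sketch, and cuda_toolkit_pkg is a hypothetical name.

```shell
# Turn a dotted CUDA version into NVIDIA's hyphenated apt package name.
cuda_toolkit_pkg() {
  printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Example:
#   sudo apt-get install -qq -y "$(cuda_toolkit_pkg "${CUDA_VERSION}")"
```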

Section 6: NVIDIA Container Toolkit

This block installs the NVIDIA Container Toolkit and configures it so that containers (Docker or containerd) can access the GPU safely and correctly. It’s a critical step for Kubernetes GPU nodes, Docker GPU workloads, and any system that needs GPU acceleration inside containers.

log "Installing NVIDIA Container Toolkit..."
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -qq
sudo apt-get install -qq -y nvidia-container-toolkit

# Configure for containerd (primary Kubernetes runtime)
sudo nvidia-ctk runtime configure --runtime=containerd

# Configure for Docker if present on this image
if systemctl list-unit-files | grep -q "^docker.service"; then
  sudo nvidia-ctk runtime configure --runtime=docker
fi

Section 7: Installing DCGM (Data Center GPU Manager)

This section covers the installation and validation of NVIDIA DCGM (Data Center GPU Manager), which is NVIDIA’s official management and telemetry framework for data center GPUs.

It offers health monitoring and diagnostics, telemetry (including temperature, clocks, power, and utilization), error reporting, and integration with Kubernetes, Prometheus, and monitoring agents. Your GPU monitoring stack relies on this.

The script extracts the installed version, checks it against the minimum required for NVIDIA driver 590+, and stops the build if it's too old. This prevents a mismatch between the GPU driver and DCGM, which would break monitoring and health checks. It also enables Fabric Manager for NVLink/NVSwitch if you're on a multi-GPU topology like A100/H100 DGX or other multi-GPU servers.

log "Installing DCGM..."
sudo apt-get install -qq -y datacenter-gpu-manager

DCGM_VER=$(dpkg -s datacenter-gpu-manager 2>/dev/null | awk '/^Version:/{print $2}' | sed 's/^[0-9]*://')
DCGM_MAJOR=$(echo "${DCGM_VER}" | cut -d. -f1)
DCGM_MINOR=$(echo "${DCGM_VER}" | cut -d. -f2)
if [[ "${DCGM_MAJOR}" -lt 4 ]] || { [[ "${DCGM_MAJOR}" -eq 4 ]] && [[ "${DCGM_MINOR}" -lt 3 ]]; }; then
  error "DCGM ${DCGM_VER} is below the 4.3 minimum required for driver 590+. Check your CUDA repo."
fi
log "DCGM installed: ${DCGM_VER}"

sudo systemctl enable nvidia-dcgm
sudo systemctl start  nvidia-dcgm

# Fabric Manager — only needed for NVLink/NVSwitch GPUs (A100/H100 multi-GPU nodes)
if systemctl list-unit-files | grep -q "^nvidia-fabricmanager.service"; then
  log "Enabling nvidia-fabricmanager for NVLink GPUs..."
  sudo systemctl enable nvidia-fabricmanager
  sudo systemctl start  nvidia-fabricmanager
fi
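
The major/minor comparison above works, but it gets verbose if you ever need to compare a third component. As an alternative sketch, and not what the script uses, GNU sort -V can do the ordering for you. strip_epoch and version_ge are made-up helper names.

```shell
# Drop a Debian epoch prefix such as "1:" from a version string.
strip_epoch() {
  printf '%s\n' "${1#*:}"
}

# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   version_ge "$(strip_epoch "${DCGM_VER}")" 4.3 || echo "DCGM too old" >&2
```

On Debian and Ubuntu, dpkg --compare-versions "$ver" ge 4.3 is another option that also understands epochs and package revisions.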

Section 8: Enabling Persistence Mode

By default, the NVIDIA driver tears down its GPU state once the last client process exits. When a new workload starts, the driver must reinitialize the GPU and set up memory mappings again. This adds a delay of a few hundred milliseconds to several seconds, depending on the GPU and system.

Enabling nvidia‑persistenced keeps the NVIDIA driver loaded in memory even when no GPU workloads are running.

log "Enabling nvidia-persistenced..."
sudo systemctl enable nvidia-persistenced
sudo systemctl start  nvidia-persistenced

Section 9: System Tuning for GPU Compute Workloads

This block applies a set of system‑level performance and stability tunings that are standard for high‑performance GPU servers, Kubernetes GPU nodes, and ML/AI workloads.

Each line targets a specific bottleneck or instability pattern that appears in real GPU production environments.

Swap and memory behavior: Disabling swap and setting vm.swappiness=0 prevents the kernel from pushing GPU-bound processes into swap. GPU workloads are extremely sensitive to latency, and swapping can cause CUDA context resets and GPU driver timeouts.
Hugepages for large memory allocations: Setting vm.nr_hugepages=2048 allocates a pool of hugepages, which reduces TLB pressure for large contiguous memory allocations. CUDA, NCCL, and deep-learning frameworks frequently allocate large buffers, and hugepages reduce page-table overhead, improving memory bandwidth and lowering latency for large tensor operations. This is especially useful on multi-GPU servers.
CPU frequency governor: Installing cpupower and forcing the CPU governor to performance ensures the CPU stays at maximum frequency instead of scaling down. GPU workloads often become CPU-bound during data preprocessing, kernel launches, and NCCL communication. Keeping CPUs at full speed reduces jitter and improves throughput.
NUMA and topology tools: Installing numactl, libnuma-dev, and hwloc provides tools for pinning processes to NUMA nodes, understanding CPU–GPU affinity, and optimizing multi-GPU placement.
Disabling irqbalance: Stopping and disabling irqbalance lets the NVIDIA driver manage interrupt affinity. On GPU servers, irqbalance can move GPU interrupts to suboptimal CPUs, causing higher latency and lower throughput.

log "Applying system tuning..."

# Disable swap (critical for Kubernetes scheduler and ML stability)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
echo "vm.swappiness=0"     | sudo tee /etc/sysctl.d/99-gpu-swappiness.conf

# Hugepages — reduces TLB pressure for large memory allocations
echo "vm.nr_hugepages=2048" | sudo tee /etc/sysctl.d/99-gpu-hugepages.conf

# CPU performance governor
sudo apt-get install -qq -y linux-tools-common "linux-tools-$(uname -r)" || true
sudo cpupower frequency-set -g performance || true

# NUMA and topology tools for GPU affinity tuning
sudo apt-get install -qq -y numactl libnuma-dev hwloc

# Disable irqbalance — let NVIDIA driver manage interrupt affinity
sudo systemctl disable irqbalance || true
sudo systemctl stop    irqbalance || true

# Apply all sysctl settings now
sudo sysctl --system
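
Before baking vm.nr_hugepages=2048 into an image, it helps to know how much memory that pool actually reserves. With the common 2 MiB hugepage size on x86_64 (an assumption here, so check Hugepagesize in /proc/meminfo on your target), 2048 pages come to 4 GiB that ordinary allocations can no longer use. The sketch below does the arithmetic. hugepage_reserved_mib is a made-up helper name.

```shell
# Memory reserved by a hugepage pool, in MiB.
# Arguments: number of pages, hugepage size in kB (as shown in /proc/meminfo).
hugepage_reserved_mib() {
  echo $(( $1 * $2 / 1024 ))
}

# Example: read the page size on the target machine, then size the pool.
#   size_kb="$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)"
#   hugepage_reserved_mib 2048 "$size_kb"
```

On smaller machine types, that reservation is a noticeable share of total RAM, so you may want to scale the page count to the instance size.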

Full base.sh script here:

#!/bin/bash
set -euo pipefail

log()   { echo "[BASE] $1"; }
error() { echo "[BASE][ERROR] $1" >&2; exit 1; }

###############################################################
# Validate required inputs
###############################################################
[[ -z "${DRIVER_VERSION:-}" ]] && error "DRIVER_VERSION is not set."
[[ -z "${CUDA_VERSION:-}"   ]] && error "CUDA_VERSION is not set."

log "DRIVER_VERSION : ${DRIVER_VERSION}"
log "CUDA_VERSION   : ${CUDA_VERSION}"

DISTRO=$(. /etc/os-release && echo "${ID}${VERSION_ID}" | tr -d '.')
ARCH="x86_64"

export DEBIAN_FRONTEND=noninteractive

###############################################################
# 1. System update
###############################################################
log "Updating system packages..."
sudo apt-get update -qq
sudo apt-get upgrade -qq -y

###############################################################
# 2. Pre-installation — kernel headers
#    Source: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/ubuntu.html
###############################################################
log "Installing kernel headers and build tools..."
sudo apt-get install -qq -y \
  "linux-headers-$(uname -r)" \
  build-essential \
  dkms \
  curl \
  wget

###############################################################
# 3. NVIDIA CUDA Network Repository
###############################################################
log "Adding NVIDIA CUDA apt repository (${DISTRO})..."
wget -q "https://developer.download.nvidia.com/compute/cuda/repos/${DISTRO}/${ARCH}/cuda-keyring_1.1-1_all.deb" \
  -O /tmp/cuda-keyring.deb
sudo dpkg -i /tmp/cuda-keyring.deb
rm /tmp/cuda-keyring.deb
sudo apt-get update -qq

###############################################################
# 4. Pin driver version BEFORE installation (590+ requirement)
###############################################################
log "Pinning driver to version ${DRIVER_VERSION}..."
sudo apt-get install -qq -y "nvidia-driver-pinning-${DRIVER_VERSION}"

###############################################################
# 5. Compute-only (headless) driver — Open Kernel Modules
#    Source: NVIDIA Driver Installation Guide — Compute-only System (Open Kernel Modules)
#
#    libnvidia-compute  = compute libraries only (no GL/Vulkan/display)
#    nvidia-dkms-open   = open-source kernel module built via DKMS
#
#    Open kernel modules are the NVIDIA-recommended choice for
#    Ampere, Hopper, and Blackwell data centre GPUs (A100, H100, etc.)
###############################################################
log "Installing NVIDIA compute-only driver (open kernel modules)..."
sudo apt-get -V install -y \
  libnvidia-compute \
  nvidia-dkms-open

###############################################################
# 6. CUDA Toolkit
###############################################################
log "Installing CUDA Toolkit ${CUDA_VERSION}..."
sudo apt-get install -qq -y "cuda-toolkit-${CUDA_VERSION}"

# Persist CUDA paths for all users and sessions
cat <<'EOF' | sudo tee /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}
EOF
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

###############################################################
# 7. NVIDIA Container Toolkit
#    Required for GPU workloads in Docker / containerd / Kubernetes
###############################################################
log "Installing NVIDIA Container Toolkit..."
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -qq
sudo apt-get install -qq -y nvidia-container-toolkit

# Configure for containerd (primary Kubernetes runtime)
sudo nvidia-ctk runtime configure --runtime=containerd

# Configure for Docker if present on this image
if systemctl list-unit-files | grep -q "^docker.service"; then
  sudo nvidia-ctk runtime configure --runtime=docker
fi

###############################################################
# 8. DCGM — DataCenter GPU Manager
###############################################################
log "Installing DCGM..."
sudo apt-get install -qq -y datacenter-gpu-manager
 
DCGM_VER=$(dpkg -s datacenter-gpu-manager 2>/dev/null | awk '/^Version:/{print $2}' | sed 's/^[0-9]*://')
DCGM_MAJOR=$(echo "${DCGM_VER}" | cut -d. -f1)
DCGM_MINOR=$(echo "${DCGM_VER}" | cut -d. -f2)
if [[ "${DCGM_MAJOR}" -lt 4 ]] || { [[ "${DCGM_MAJOR}" -eq 4 ]] && [[ "${DCGM_MINOR}" -lt 3 ]]; }; then
  error "DCGM ${DCGM_VER} is below the 4.3 minimum required for driver 590+. Check your CUDA repo."
fi
log "DCGM installed: ${DCGM_VER}"

sudo systemctl enable nvidia-dcgm
sudo systemctl start  nvidia-dcgm

# Fabric Manager — only needed for NVLink/NVSwitch GPUs (A100/H100 multi-GPU nodes)
if systemctl list-unit-files | grep -q "^nvidia-fabricmanager.service"; then
  log "Enabling nvidia-fabricmanager for NVLink GPUs..."
  sudo systemctl enable nvidia-fabricmanager
  sudo systemctl start  nvidia-fabricmanager
fi

###############################################################
# 9. NVIDIA Persistence Daemon
#    Keeps the driver loaded between jobs — reduces cold-start
#    latency on the first CUDA call in each new workload
###############################################################
log "Enabling nvidia-persistenced..."
sudo systemctl enable nvidia-persistenced
sudo systemctl start  nvidia-persistenced

###############################################################
# 10. System tuning for GPU compute workloads
###############################################################
log "Applying system tuning..."

# Disable swap (critical for Kubernetes scheduler and ML stability)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
echo "vm.swappiness=0"     | sudo tee /etc/sysctl.d/99-gpu-swappiness.conf

# Hugepages — reduces TLB pressure for large memory allocations
echo "vm.nr_hugepages=2048" | sudo tee /etc/sysctl.d/99-gpu-hugepages.conf

# CPU performance governor
sudo apt-get install -qq -y linux-tools-common "linux-tools-$(uname -r)" || true
sudo cpupower frequency-set -g performance || true

# NUMA and topology tools for GPU affinity tuning
sudo apt-get install -qq -y numactl libnuma-dev hwloc

# Disable irqbalance — let NVIDIA driver manage interrupt affinity
sudo systemctl disable irqbalance || true
sudo systemctl stop    irqbalance || true

# Apply all sysctl settings now
sudo sysctl --system

###############################################################
# Done
###############################################################
log "============================================"
log "Base layer provisioning complete."
log "  OS      : ${DISTRO}"
log "  Driver  : ${DRIVER_VERSION} (open kernel modules, compute-only)"
log "  CUDA    : cuda-toolkit-${CUDA_VERSION}"
log "  DCGM    : ${DCGM_VER}"
log "============================================"

Step 7: Assembling and Running the Build

Validate the template first, then run the build. Validation catches syntax or variable errors early, so the build doesn’t start on a broken config.

packer validate -var-file=values.pkrvars.hcl .

If validation succeeds, you'll see a short confirmation like "The configuration is valid." After that, start the build. Expect the process to create a temporary VM, run your provisioners, and produce an image:

packer build -var-file=values.pkrvars.hcl .
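
If you run these two commands often, you could wrap them so the build never starts on a template that fails validation. A minimal sketch, where build_image is a hypothetical function name and the default file name matches the one used in this article:

```shell
# Validate first, and only build when validation passes.
# Argument: variables file (defaults to values.pkrvars.hcl).
build_image() {
  local var_file="${1:-values.pkrvars.hcl}"
  packer validate -var-file="$var_file" . || return 1
  packer build -var-file="$var_file" .
}

# Example, run from the packer_demo folder:
#   build_image values.pkrvars.hcl
```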

The build typically takes 15–20 minutes, depending on network speed and package installs. Watch the Packer log for three key checkpoints:

Instance creation — confirms the temporary VM was provisioned.
Provisioner output — shows each script step (updates, reboot, script/base.sh) and any errors.
Image creation — indicates the build finished and an image artifact was written.

If the build fails, copy the failing provisioner's log lines and re-run the build after fixing the script or variables. For quicker troubleshooting, run the failing script by hand on a matching test VM so you can iterate without rebuilding from scratch each time.

googlecompute.gpu-node: output will be in this color.

==> googlecompute.gpu-node: Checking image does not exist...
==> googlecompute.gpu-node: Creating temporary RSA SSH key for instance...
==> googlecompute.gpu-node: no persistent disk to create
==> googlecompute.gpu-node: Using image: ubuntu-2404-noble-amd64-v20260225
==> googlecompute.gpu-node: Creating instance...
==> googlecompute.gpu-node: Loading zone: us-central1-a
==> googlecompute.gpu-node: Loading machine type: g2-standard-4
==> googlecompute.gpu-node: Requesting instance creation...
==> googlecompute.gpu-node: Waiting for creation operation to complete...
==> googlecompute.gpu-node: Instance has been created!
==> googlecompute.gpu-node: Waiting for the instance to become running...
==> googlecompute.gpu-node: IP: 34.58.58.214
==> googlecompute.gpu-node: Using SSH communicator to connect: 34.58.58.214
==> googlecompute.gpu-node: Waiting for SSH to become available...
systemd-logind.service
==> googlecompute.gpu-node:  systemctl restart unattended-upgrades.service
==> googlecompute.gpu-node:
==> googlecompute.gpu-node: No containers need to be restarted.
==> googlecompute.gpu-node:
==> googlecompute.gpu-node: User sessions running outdated binaries:
==> googlecompute.gpu-node:  packer @ session #1: sshd[1535]
==> googlecompute.gpu-node:  packer @ user manager service: systemd[1540]
==> googlecompute.gpu-node: Pausing 1m0s before the next provisioner...
==> googlecompute.gpu-node: Provisioning with shell script: script/base.sh
==> googlecompute.gpu-node: [BASE] DRIVER_VERSION : 590.48.01
==> googlecompute.gpu-node: [BASE] CUDA_VERSION   : 13.1
==> googlecompute.gpu-node: [BASE] Updating system packages...
==> googlecompute.gpu-node: [BASE] Installing kernel headers and build tools...
==> googlecompute.gpu-node: [BASE] Installing CUDA Toolkit 13.1...
==> googlecompute.gpu-node: [BASE] Installing DCGM...
==> googlecompute.gpu-node: [BASE] Enabling nvidia-persistenced...
==> googlecompute.gpu-node: [BASE] Applying system tuning...
==> googlecompute.gpu-node: vm.swappiness=0
==> googlecompute.gpu-node: vm.nr_hugepages=2048
==> googlecompute.gpu-node: Setting cpu: 0
==> googlecompute.gpu-node: Error setting new values. Common errors:
==> googlecompute.gpu-node: [BASE] ============================================
==> googlecompute.gpu-node: [BASE] Base layer provisioning complete.
==> googlecompute.gpu-node: [BASE]   OS      : ubuntu2404
==> googlecompute.gpu-node: [BASE]   Driver  : 590.48.01 (open kernel modules, compute-only)
==> googlecompute.gpu-node: [BASE]   CUDA    : cuda-toolkit-13.1
==> googlecompute.gpu-node: [BASE]   DCGM    : 1:3.3.9
==> googlecompute.gpu-node: [BASE] ============================================
==> googlecompute.gpu-node: Deleting instance...
==> googlecompute.gpu-node: Instance has been deleted!
==> googlecompute.gpu-node: Creating image...
==> googlecompute.gpu-node: Deleting disk...
==> googlecompute.gpu-node: Disk has been deleted!
==> googlecompute.gpu-node: Running post-processor:  (type shell-local)
==> googlecompute.gpu-node (shell-local): Running local shell script: 
==> googlecompute.gpu-node (shell-local): === Image Build Complete ===
==> googlecompute.gpu-node (shell-local): Image ID: packer-69b6c2ee-883a-3602-7bb5-059f1ba27c8b
==> googlecompute.gpu-node (shell-local): Sun Mar 15 15:50:09 WAT 2026
Build 'googlecompute.gpu-node' finished after 17 minutes 55 seconds.

==> Wait completed after 17 minutes 55 seconds

==> Builds finished. The artifacts of successful builds are:
--> googlecompute.gpu-node: A disk image was created in the 'my_project-00000' project: base-gpu-image-1773585134

Step 8: Test the Image and Verify the GPU Stack

Confirm the image exists in the GCP Console under Compute Engine → Storage → Images, and locate your newly created image.

Create a test VM from the image:

gcloud compute instances create my-gpu-vm \
  --machine-type=g2-standard-4 \
  --accelerator=count=1,type=nvidia-l4 \
  --image=base-gpu-image-1773585134 \
  --image-project=YOUR_PROJECT_ID \
  --boot-disk-size=50GB \
  --maintenance-policy=TERMINATE \
  --restart-on-failure \
  --zone=us-central1-a

Created [https://www.googleapis.com/compute/v1/projects/my-project-000/zones/us-central1-a/instances/my-gpu-vm].
NAME       ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP    EXTERNAL_IP      STATUS
my-gpu-vm  us-central1-a  g2-standard-4               10.128.15.227  104.154.184.217  RUNNING

Once the instance is RUNNING, SSH into it and run nvidia-smi to verify that the NVIDIA driver and GPU are visible.

The nvidia-smi output confirms:

Driver 590.48.01 loaded
CUDA 13.1 available
Persistence Mode is On
The L4 GPU is detected with 23GB VRAM
Zero ECC errors
No running processes (clean idle state).

This is exactly what a healthy base image should look like. Notice Disp.A: Off? That confirms our compute-only driver choice is working — no display adapter is active.

Confirm the installed CUDA toolkit by running nvcc --version. The output should show that version 13.1 was installed as specified.

Let's confirm DCGM installation by running dcgmi discovery -l. Successful output indicates DCGM is running and communicating with the driver.

Conclusion

You now have a production‑grade, GPU‑optimized base image that includes the NVIDIA compute‑only driver built with open kernel modules, DCGM for monitoring, and the CUDA Toolkit. You also applied OS‑level tuning tailored to GPU compute workloads, providing a consistent, reproducible environment with no manual setup.

From here, you can extend the build by adding an application‑layer script to install frameworks such as PyTorch, TensorFlow, or vLLM, or create an instance template that uses this image to scale your GPU infrastructure.

The full Packer project includes additional scripts for training and inference workloads that you can use to extend your image.

References

NVIDIA Driver Installation Guide (Ubuntu): https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/
NVIDIA CUDA Toolkit Documentation: https://docs.nvidia.com/cuda/
NVIDIA Container Toolkit Installation Guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
NVIDIA DCGM Documentation: https://docs.nvidia.com/datacenter/dcgm/latest/index.html
NVIDIA Persistence Daemon: https://docs.nvidia.com/deploy/driver-persistence/index.html
HashiCorp Packer Documentation: https://developer.hashicorp.com/packer/docs
Packer Google Compute Builder: https://developer.hashicorp.com/packer/integrations/hashicorp/googlecompute

Why Chrome OS Is the Operating System the AI Era Was Built For

Christopher Galliart — Fri, 17 Apr 2026 18:05:16 +0000

Chrome OS runs on a read-only filesystem. You can't install executables on the host. There's no traditional desktop environment. Everything that interacts with the underlying system does so through a sandboxed browser, a containerized Linux terminal, or a cloud connection.

For years, that list of constraints was the reason people dismissed it. But in 2026, it's the reason Chrome OS might be the most correctly designed operating system for what's coming.

The security architecture treats the endpoint as untrusted by default. The containerized Linux environment gives developers a full headless stack without compromising the host. And an upcoming OS-level rewrite, Aluminium, puts Google's on-device AI models directly into the kernel.

This article covers security architecture, the container-based developer environment, cloud-streamed creative tools via AWS NICE DCV, cloud gaming, and what Aluminium OS means for on-device AI.

Here's what we'll cover:

Security-First Architecture in an Era of AI-Powered Threats
A Headless Linux Stack That's More Flexible Than It Looks
AWS NICE DCV Changes the Creative Tools Conversation
Cloud Gaming Works
Aluminium OS: On-Device Models on Google's Own Architecture
Where This Lands

Security-First Architecture in an Era of AI-Powered Threats

Threat actors are getting better tools. Models like Mythos are lowering the barrier for generating convincing phishing campaigns, crafting polymorphic malware, and automating social engineering at scale.

Traditional operating systems present exactly the attack surface these tools target: writable system files, user-installable executables, patches that sit uninstalled for weeks because someone clicked "remind me later."

Chrome OS sidesteps most of this by design. The root filesystem is read-only and cryptographically verified on every boot through a process called Verified Boot.

If anything has modified the OS files since the last verified state, whether that's malware, a compromised package, or a rogue AI agent that decided to start deleting system files, the device detects it at startup and either self-corrects or refuses to boot.

For malware, persistence across reboots isn't merely difficult. It's architecturally impossible through software alone.

Updates happen silently. While you're working, the system downloads the next OS version to an inactive partition. On your next reboot, it pivots to the updated version. No prompts, no deferred patches, no exposure window.

Major updates ship every four to six weeks. Security patches land every two to three weeks. The gap between vulnerability discovery and remediation is measured in days.

Chrome OS consistently stays out of the top 50 products by CVE count in the NIST vulnerability database. Windows and the Linux kernel sit near the top every year. When AI is actively being weaponized to find and exploit vulnerabilities faster than humans can patch them, a read-only, verified, automatically updated endpoint is a different category of security posture.

The tradeoff is trust. Chrome OS's security model means trusting Google as the root authority for your entire computing stack: updates, certificate trust, telemetry. Organizations with strict data sovereignty requirements should weigh that dependency carefully.

A Headless Linux Stack That's More Flexible Than It Looks

Chrome OS is a text-based operating system. There's no native GUI layer. Stop and sit with that for a second, because it's the thing that makes people dismiss Chrome OS and also the thing that makes it work.

The entire graphical interface you interact with IS the Chrome browser. The Ash shell, Chrome's window manager, is the desktop. You don't install applications onto it the way you install .exe files on Windows or drag .app bundles into a macOS Applications folder. If it isn't running in a browser tab, an Android VM, or a Linux container, it doesn't run. That restriction is what keeps the host locked down, and it's what makes everything else possible.

Under the hood, Chrome OS runs a minimal virtual machine called Termina through crosvm, Google's Rust-based VM monitor.

Inside Termina, LXD manages Linux containers. The default container, penguin, is a Debian environment with a special trick: it bridges GUI-based Linux applications directly into the Chrome OS desktop through a Wayland proxy called Sommelier. Install VS Code, GIMP, or LibreOffice in penguin and they show up in your Chrome OS app launcher, running in windows alongside your browser tabs. For a lot of developers, penguin alone covers the daily workflow.

But Termina gives you more than penguin. Through the LXD layer you can spin up independent containers that are fully isolated operating systems: Arch, Alpine, Ubuntu, whatever you need.

These aren't attached to the GUI bridge. They run headless, natively, with their own systemd, their own package managers, their own persistent state. Need a clean Ubuntu environment to test a deployment script without touching your main setup? lxc launch and you're there. Need to blow it away? lxc delete and it's gone. No orphaned files on the host, no cross-contamination between environments.

The key distinction from Docker is that LXD runs system containers (full OS emulation) rather than application containers. You get background services, persistent daemons, the works. You can also run Docker inside any of these LXD containers if you need application-level containerization on top of that.

Snapshot your entire environment with lxc snapshot before a risky dependency install and roll back instantly if something breaks. That kind of safety net is broader than version control alone: it captures your full OS configuration, not just code.
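
The launch, snapshot, roll back, and delete cycle described above can be sketched as a short script. This is a minimal illustration under stated assumptions: the container name, snapshot name, and image alias are hypothetical, the image alias depends on which remotes your LXD installation has configured, and the commands need to run wherever the lxc client is available (inside Termina on Chrome OS). The script defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (the default in this sketch) prints each command instead of
# running it, so the sequence can be reviewed without an LXD daemon.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

CONTAINER="scratch-ubuntu"   # hypothetical container name
SNAPSHOT="before-deps"       # hypothetical snapshot name
IMAGE="ubuntu:24.04"         # assumed image alias; depends on configured remotes

scratch_container_demo() {
  # Create and start an isolated, headless system container.
  run lxc launch "$IMAGE" "$CONTAINER"
  # Checkpoint the whole container before a risky change.
  run lxc snapshot "$CONTAINER" "$SNAPSHOT"
  # Stand-in for the risky change (for example, a dependency install).
  run lxc exec "$CONTAINER" -- apt-get update
  # Roll the container back to the checkpoint.
  run lxc restore "$CONTAINER" "$SNAPSHOT"
  # Remove the container entirely when it is no longer needed.
  run lxc delete --force "$CONTAINER"
}

scratch_container_demo
```

Setting DRY_RUN=0 executes the same sequence for real, so it is worth reviewing the printed plan first.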

Pair this with browser-native tools like GitHub Codespaces, Google Colab, AWS CloudShell, or vscode.dev, and the terminal handles your local tooling while the browser handles everything else.

AI coding assistants like Claude and Gemini already operate natively in the browser. The distance between "cloud IDE" and "local IDE" keeps shrinking.

There are friction points: no custom kernel modules inside Crostini. Nested KVM requires Intel Gen 10+ processors. VPN routing into the Linux container from the Chrome OS host can be a headache, with WireGuard requiring userspace workarounds inside the container.

But none of these break the core architecture for cloud-native work. They're just worth knowing about before you commit.

AWS NICE DCV Changes the Creative Tools Conversation

One of the longest-standing arguments against Chrome OS has been the absence of professional creative software. There's no Premiere, no DaVinci Resolve, no Blender, no Ableton. For years, this was a dead-end conversation.

AWS NICE DCV (Desktop Cloud Visualization) reopens it. DCV is a high-performance remote display protocol that streams GPU-accelerated desktop sessions from EC2 instances to any device, including a Chromebook running the browser-based DCV client. It supports OpenGL, Vulkan, and DirectX rendering, with adaptive encoding that adjusts to network conditions. On AWS, the DCV license is free. You pay only for the EC2 compute time.

Netflix engineers use DCV to stream content creation applications to remote artists. Volkswagen runs 3D CAD simulations across their engineering division through it. A VFX studio called RVX used it to deliver visual effects for HBO's The Last of Us, streaming Nuke, Maya, Houdini, and Blender to artists distributed across Europe from servers in Iceland. Their team said it was the best remote experience they'd ever worked with.

So: a Chromebook connected to a g5.xlarge EC2 instance (one A10G GPU) can run Blender, DaVinci Resolve, or any other GPU-accelerated creative application with full hardware acceleration. The rendering happens in the data center. DCV streams the pixels. The creative professional gets a responsive, high-fidelity workspace on a $400 machine that couldn't locally render a single frame.

The constraints are connectivity and cost. You need sustained bandwidth (25+ Mbps for 1080p work, more for 4K multi-monitor setups) and leaving a GPU instance running around the clock adds up. But for studios and professionals who already budget for high-end workstations, the math often pencils out, especially when you factor in zero local hardware maintenance and the ability to scale GPU power on demand.

Cloud Gaming Works

GeForce NOW survived where Stadia failed because it made a better business decision: bring your own games. Connect your existing Steam, Epic, or Ubisoft library and stream from NVIDIA's server-side hardware. The Ultimate tier now runs on RTX 5080-class infrastructure. 4K at 120fps with ray tracing, on a fanless Chromebook.

Chrome OS has a structural advantage as a cloud gaming client. GeForce NOW runs natively in the Chromium browser via WebRTC, and users consistently report less micro-stuttering and tighter input handling than the standalone Windows desktop app. Under good network conditions, measured total latency runs 13 to 14ms, with sub-3ms ping documented near datacenter proximity. That's below the human perceptual threshold for most game types.

Anti-cheat systems like Easy Anti-Cheat and Riot Vanguard are a non-issue in this model. They run on the server where the game executes, not on your local endpoint. On-device gaming isn't viable on Chrome OS and likely never will be. The architecture isn't designed for it, and even projects attempting to bridge local GPUs hit bottlenecks in the container layers. Cloud gaming is the path, and it works.

The limiting factors are network-dependent. Latency spikes above 500ms on bad connections make fast-twitch games unplayable, and NVIDIA's 100-hour monthly cap on the Ultimate tier has drawn criticism. But cloud gaming on Chrome OS has crossed the line from novelty to daily-driver viable for most use cases.

Aluminium OS: On-Device Models on Google's Own Architecture

The most consequential near-term development for Chrome OS is Project Aluminium, a ground-up rewrite that replaces the current Chrome OS foundation with a native Android kernel. Not another bolted-on compatibility layer: a new operating system built on Android 16, designed to run Android applications natively with direct hardware acceleration instead of routing them through the resource-heavy ARCVM virtual machine that currently eats CPU cycles on even basic app launches.

The AI story is the real story. Aluminium is being built with Gemini models integrated directly into the OS: the file system, the application launcher, the window manager.

Google serving their own proprietary models on their own devices, using an architecture optimized specifically to run them, is a level of vertical integration that no other OS vendor has in the pipeline. Apple has the silicon advantage for local inference. Google has the model-to-OS integration advantage. Those are competing theses about where AI compute should live, and both are worth taking seriously.

The rollout timeline from court documents and leaked roadmaps puts a trusted tester program on select hardware in late 2026, premium tablets by early 2027, and general consumer availability in 2028. Chrome OS Classic gets maintained through existing support obligations until 2033 or 2034.

The launch won't be perfect. Google's track record on platform transitions gives the community earned skepticism. But the ability to iterate a natively AI-integrated OS on hardware they control is the kind of capability that compounds over time.

Where This Lands

Two years ago, calling Chrome OS a serious platform for development or creative work would have been a stretch. Today you can run a full Debian environment with systemd daemons, snapshot your workspace, stream Blender from a GPU-backed data center, play AAA games at 4K on hardware you don't own, and do all of it from a verified, read-only endpoint that patches itself while you sleep.

The remaining gaps are real. But they're concentrated in workflows that are themselves moving to the cloud. Chrome OS was designed around assumptions about computing that used to be premature. They're not premature anymore.

How to Authenticate Users in Kubernetes: x509 Certificates, OIDC, and Cloud Identity

Destiny Erhabor — Mon, 06 Apr 2026 20:31:43 +0000

Kubernetes doesn't know who you are.

It has no user database, no built-in login system, no password file. When you run kubectl get pods, Kubernetes receives an HTTP request and asks one question: who signed this, and do I trust that signature? Everything else — what you're allowed to do, which namespaces you can access, whether your request goes through at all — comes after that question is answered.

This surprises most engineers who are new to Kubernetes. They expect something like a database of users with passwords. Instead, they find a pluggable chain of authenticators, each one able to vouch for a request in a different way:

Client certificates
OIDC tokens from an external identity provider
Cloud provider IAM tokens
Service account tokens projected into pods.

Any of these can be active at the same time.

Understanding this model is what separates engineers who can debug authentication failures from engineers who copy kubeconfig files and hope for the best.

In this article, you'll work through how the Kubernetes authentication chain works from first principles. You'll see how x509 client certificates are used — and why they're a poor choice for human users in production. You'll configure OIDC authentication with Dex, giving your cluster a real browser-based login flow. And you'll see how AWS, GCP, and Azure each plug into the same underlying model.

Prerequisites

A running kind cluster — a fresh one works fine, or reuse an existing one
kubectl and helm installed
openssl available on your machine (comes pre-installed on macOS and most Linux distros)
Basic familiarity with what a JWT is (a signed JSON object with claims) — you don't need to be able to write one, just recognise one

All demo files are in the companion GitHub repository.

How Kubernetes Authentication Works
How to Use x509 Client Certificates
Demo 1 — Create and Use an x509 Client Certificate
How to Set Up OIDC Authentication
Demo 2 — Configure OIDC Login with Dex and kubelogin
Cloud Provider Authentication
Webhook Token Authentication
Cleanup
Conclusion

How Kubernetes Authentication Works

Every request that reaches the Kubernetes API server — whether from kubectl, a pod, a controller, or a CI pipeline — carries a credential of some kind.

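The API server passes that credential through a chain of authenticators in sequence. The first authenticator that can verify the credential wins. A request that presents no credential at all is treated as anonymous when anonymous access is enabled, while a credential that every authenticator rejects results in a 401 Unauthorized response.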

The Authenticator Chain

Kubernetes supports several authentication strategies simultaneously. You can have client certificate authentication and OIDC authentication active on the same cluster at the same time, which is common in production: cluster administrators use certificates, regular developers use OIDC. The strategies active on a cluster are determined by flags passed to the kube-apiserver process.

The strategies available are x509 client certificates, bearer tokens (static token files — rarely used in production), bootstrap tokens (used during node join operations), service account tokens, OIDC tokens, authenticating proxies, and webhook token authentication. A cluster doesn't have to use all of them, and most don't. But knowing they all exist helps when you're diagnosing an auth failure.

Users vs Service Accounts

There is an important distinction in how Kubernetes thinks about identity. Service accounts are Kubernetes objects — they live in a namespace, get created with kubectl create serviceaccount, and have tokens managed by the cluster itself. Every pod runs as a service account. These are machine identities for workloads.

Users, on the other hand, don't exist as Kubernetes objects at all. There is no kubectl create user command. Kubernetes doesn't manage user accounts. Instead, it trusts external systems to assert user identity — a certificate authority, an OIDC provider, or a cloud provider's IAM system. Kubernetes just verifies the assertion and extracts the username and group memberships from it.

	Service Account	User
Kubernetes object?	Yes — lives in a namespace	No — managed externally
Created with	`kubectl create serviceaccount`	External system (CA, IdP, cloud IAM)
Used by	Pods and workloads	Humans and CI systems
Token managed by	Kubernetes	External system
Namespaced?	Yes	No

What Happens After Authentication

Authentication only answers one question: who is this? Once the API server has a verified identity — a username and zero or more group memberships — it passes the request to the authorisation layer. By default that is RBAC, which checks the identity against Role and ClusterRole bindings to determine what the request is allowed to do.

This is why authentication and authorisation are separate concerns in Kubernetes. A valid certificate gets you past the front door. What you can do inside is RBAC's job. An authenticated user with no RBAC bindings can authenticate successfully but will be denied every API call.

If you want a deep dive into how RBAC rules, roles, and bindings work, check out this handbook on How to Secure a Kubernetes Cluster: RBAC, Pod Hardening, and Runtime Protection.

How to Use x509 Client Certificates

x509 client certificate authentication is the oldest and simplest authentication method in Kubernetes. It's how kubectl works out of the box when you create a cluster — the kubeconfig file that kind or kubeadm generates contains an embedded client certificate signed by the cluster's Certificate Authority.

How the Certificate Maps to an Identity

When the API server receives a request with a client certificate, it validates the certificate against its trusted CA, then reads two fields from the certificate, the Common Name and the Organization, to construct an identity.

The Common Name (CN) field becomes the username. The Organization (O) field, which can contain multiple values, becomes the list of groups the user belongs to.

So a certificate with CN=jane and O=engineering authenticates as username jane in group engineering. If you want to give jane permissions, you create a RoleBinding that references either the username jane or the group engineering as a subject.
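
To see the mapping without touching a cluster, the sketch below signs a certificate with a throwaway CA and reads the subject back the way the text describes: CN as the username, each O value as a group. Everything here is illustrative. The demo-ca files are a stand-in and not the cluster CA, the second group name oncall is hypothetical, and the parsing is a simple approximation of what the API server extracts, not its actual implementation.

```shell
#!/usr/bin/env bash
set -euo pipefail

WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT
cd "$WORKDIR"

# Throwaway CA for illustration only. This is NOT the cluster CA.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout demo-ca.key -out demo-ca.crt \
  -subj "/CN=demo-ca" -days 1 2>/dev/null

# User key and CSR: one CN (username) and two O values (groups).
openssl genrsa -out jane.key 2048 2>/dev/null
openssl req -new -key jane.key -out jane.csr \
  -subj "/CN=jane/O=engineering/O=oncall"

# Sign the CSR with the throwaway CA.
openssl x509 -req -in jane.csr \
  -CA demo-ca.crt -CAkey demo-ca.key -CAcreateserial \
  -out jane.crt -days 1 2>/dev/null

# RFC2253 formatting prints the subject as one comma-separated line.
SUBJECT=$(openssl x509 -in jane.crt -noout -subject -nameopt RFC2253)
SUBJECT=${SUBJECT#subject=}
SUBJECT=${SUBJECT# }

# CN becomes the username; every O value becomes a group.
K8S_USER=$(printf '%s\n' "$SUBJECT" | tr ',' '\n' | sed -n 's/^CN=//p')
K8S_GROUPS=$(printf '%s\n' "$SUBJECT" | tr ',' '\n' | sed -n 's/^O=//p' | sort | paste -sd, -)

echo "username: $K8S_USER"
echo "groups:   $K8S_GROUPS"
```

A RoleBinding could then reference either the username or any one of the groups as its subject.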

This is the same mechanism behind system:masters. When kind creates a cluster and writes a kubeconfig for you, it generates a certificate with O=system:masters. Kubernetes has a built-in ClusterRoleBinding that grants cluster-admin to anyone in the system:masters group. That's why your default kubeconfig has full admin access — it's not magic, it's a certificate with the right group.

The Cluster CA

Every Kubernetes cluster has a root Certificate Authority — a private key and a self-signed certificate that the API server trusts. Any client certificate signed by this CA is trusted by the cluster.

The CA certificate and key are typically stored in /etc/kubernetes/pki/ on the control plane node, or in the kube-system namespace as a secret, depending on how the cluster was created.

On kind clusters, you can copy the CA cert and key directly from the control plane container:

docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.crt ./ca.crt
docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.key ./ca.key

Whoever holds the CA key can issue certificates for any username and any group, including system:masters. This makes the CA key the most sensitive secret in a Kubernetes cluster. Guard it accordingly.

The Limits of Certificate-Based Auth

Client certificates work, but they have two fundamental problems that make them a poor choice for human users in production.

The first is that Kubernetes doesn't check certificate revocation lists (CRLs). If a developer's kubeconfig is stolen, the embedded certificate remains valid until it expires, which is typically one year after issuance. There's no way to immediately invalidate it. You can't "log out" a certificate. The only mitigation is to rotate the entire cluster CA, which invalidates every certificate, including those belonging to legitimate users.

The second is operational overhead. Certificates must be generated, distributed to users, and rotated before expiry. There's no self-service. In a team of ten engineers, managing certificates is annoying. In a team of a hundred, it's a full-time job.

For human access in production, OIDC is the right answer: short-lived tokens issued by a trusted identity provider, with a central revocation mechanism, and a standard browser-based login flow. Certificates are fine for machine identities and automation, where issuance can be automated and rotation is handled programmatically.

That said, understanding certificates isn't optional. Your kubeconfig uses one. Your CI system probably does too. And cert-based auth is what you fall back to when everything else breaks.

Demo 1 — Create and Use an x509 Client Certificate

In this section, you'll generate a user certificate signed by the cluster CA, bind it to an RBAC role, and use it to authenticate to the cluster as a different user.

This guide is for local development and learning only. Manually signing certificates with the cluster CA and storing keys on disk is done here for simplicity.

In production, you should use the Kubernetes CertificateSigningRequest API or cert-manager for certificate issuance, enforce short-lived certificates with automatic rotation, and store private keys in a secrets manager (HashiCorp Vault, AWS Secrets Manager) or hardware security module (HSM) — never distribute the cluster CA key.

Step 1: Copy the CA cert and key from the kind control plane

docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.crt ./ca.crt
docker cp k8s-security-control-plane:/etc/kubernetes/pki/ca.key ./ca.key

This creates two files in your current directory: ca.crt and ca.key.

Step 2: Generate a private key and CSR for a new user

You're creating a certificate for a user named jane in the engineering group:

# Generate the private key
openssl genrsa -out jane.key 2048

# Generate a Certificate Signing Request
# CN = username, O = group
openssl req -new \
  -key jane.key \
  -out jane.csr \
  -subj "/CN=jane/O=engineering"

Step 3: Sign the CSR with the cluster CA

openssl x509 -req \
  -in jane.csr \
  -CA ca.crt \
  -CAkey ca.key \
  -CAcreateserial \
  -out jane.crt \
  -days 365

Expected output:

Certificate request self-signature ok
subject=CN=jane, O=engineering

Step 4: Inspect the certificate

Before using it, confirm the identity it carries:

openssl x509 -in jane.crt -noout -subject -dates

subject=CN=jane, O=engineering
notBefore=Mar 20 10:00:00 2024 GMT
notAfter=Mar 20 10:00:00 2025 GMT

One year from now, this certificate becomes invalid and must be replaced. There's no way to extend it — you have to issue a new one.
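
Because the certificate can't be extended, it helps to check how close it is to expiry before it stops working. The sketch below uses openssl's -checkend option for that. The expires_within helper is hypothetical, and a 10-day self-signed certificate stands in for jane.crt so the example runs on its own; the 30-day and 5-day windows are arbitrary choices.

```shell
#!/usr/bin/env bash
set -euo pipefail

WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

# Stand-in for jane.crt: a self-signed certificate valid for 10 days.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$WORKDIR/demo.key" -out "$WORKDIR/demo.crt" \
  -subj "/CN=jane/O=engineering" -days 10 2>/dev/null

# Hypothetical helper: succeeds if the certificate expires within DAYS days.
# openssl's -checkend N exits 0 when the certificate is still valid N seconds
# from now, so the result is inverted here.
expires_within() {
  local cert="$1" days="$2"
  if openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null; then
    return 1
  fi
  return 0
}

if expires_within "$WORKDIR/demo.crt" 30; then RENEW_30="yes"; else RENEW_30="no"; fi
if expires_within "$WORKDIR/demo.crt" 5;  then RENEW_5="yes";  else RENEW_5="no";  fi

echo "expires within 30 days: $RENEW_30"
echo "expires within 5 days:  $RENEW_5"
```

Pointing expires_within at jane.crt in a scheduled job is one way to get a reminder to issue a replacement before the old certificate lapses.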

Step 5: Build a kubeconfig entry for jane

# Get the cluster API server address from the current context
APISERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')

# Create a kubeconfig for jane
kubectl config set-cluster k8s-security \
  --server="$APISERVER" \
  --certificate-authority=ca.crt \
  --embed-certs=true \
  --kubeconfig=jane.kubeconfig

kubectl config set-credentials jane \
  --client-certificate=jane.crt \
  --client-key=jane.key \
  --embed-certs=true \
  --kubeconfig=jane.kubeconfig

kubectl config set-context jane@k8s-security \
  --cluster=k8s-security \
  --user=jane \
  --kubeconfig=jane.kubeconfig

kubectl config use-context jane@k8s-security \
  --kubeconfig=jane.kubeconfig

Step 6: Test authentication — before RBAC

Try to list pods using jane's kubeconfig:

kubectl get pods -n staging --kubeconfig=jane.kubeconfig

Error from server (Forbidden): pods is forbidden: User "jane" cannot list
resource "pods" in API group "" in the namespace "staging"

This is correct. Jane authenticated successfully — Kubernetes knows who she is. But she has no RBAC bindings, so every API call is denied. Authentication passed, but authorisation failed.

Step 7: Grant jane access with RBAC

RBAC bindings use the username exactly as it appears in the certificate's CN field. If you need a refresher on how Roles, ClusterRoles, and RoleBindings work, the handbook How to Secure a Kubernetes Cluster: RBAC, Pod Hardening, and Runtime Protection covers the full RBAC model. For now, a simple RoleBinding using the built-in view ClusterRole is enough:

# jane-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jane-reader
  namespace: staging
subjects:
  - kind: User
    name: jane          # matches the CN in the certificate
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

kubectl apply -f jane-rolebinding.yaml
kubectl get pods -n staging --kubeconfig=jane.kubeconfig

No resources found in staging namespace.

No error — jane can now list pods in staging. She can't delete them, create them, or access other namespaces. The certificate got her in. RBAC determines what she can do.

How to Set Up OIDC Authentication

OpenID Connect is an identity layer on top of OAuth 2.0. It's how Kubernetes integrates with enterprise identity providers — Active Directory, Okta, Google Workspace, Keycloak, and any other provider that speaks OIDC. Understanding how Kubernetes uses it requires following the token from the user's browser to the API server's decision.

How the OIDC Flow Works in Kubernetes

When a developer runs kubectl get pods with OIDC configured, the following happens:

kubectl checks whether the current credential in the kubeconfig is a valid, unexpired OIDC token
If not, it launches kubelogin, a kubectl plugin that opens a browser window
The browser redirects to the OIDC provider (Dex, Okta, your corporate IdP)
The user logs in with their corporate credentials
The OIDC provider issues a signed JWT and returns it to kubelogin
kubelogin caches the token locally (under ~/.kube/cache/oidc-login/) and returns it to kubectl
kubectl sends the token to the API server as a Bearer header
The API server fetches the provider's public keys from its JWKS endpoint and verifies the token signature
If valid, the API server extracts the username and group claims from the token
RBAC takes over from there

The Kubernetes API server never contacts the OIDC provider for each request. It only fetches the provider's public keys periodically to verify signatures locally. This makes OIDC authentication stateless and scalable.
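
The "fetch keys occasionally, verify locally" idea can be sketched as a small cache. This is only an illustration: `SigningKeyCache`, `FakeProvider`, and the one-hour TTL are assumptions, and the actual signature check (which needs a crypto library) is left out.

```python
import time
from typing import Callable, Dict, Optional


class SigningKeyCache:
    """Caches a provider's public keys and refetches only after a TTL."""

    def __init__(self, fetch_keys: Callable[[], Dict[str, str]], ttl_seconds: float):
        self._fetch_keys = fetch_keys
        self._ttl = ttl_seconds
        self._keys: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def get(self, key_id: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            # The only point where the identity provider is contacted.
            self._keys = self._fetch_keys()
            self._fetched_at = now
        return self._keys.get(key_id)


class FakeProvider:
    """Stand-in for an HTTPS call to the provider's JWKS endpoint."""

    def __init__(self) -> None:
        self.calls = 0

    def fetch(self) -> Dict[str, str]:
        self.calls += 1
        return {"key-1": "public-key-material"}


provider = FakeProvider()
cache = SigningKeyCache(provider.fetch, ttl_seconds=3600)
for second in range(1000):  # 1,000 simulated requests inside the TTL window
    cache.get("key-1", now=float(second))
print(provider.calls)
```

However many requests arrive inside the TTL window, the provider is contacted once, which is why token verification does not depend on the provider being reachable for each API call.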

The API Server Configuration

For OIDC to work, the API server needs to know where to find the identity provider and how to interpret the tokens it issues.

In Kubernetes v1.30+, this is configured through an AuthenticationConfiguration file passed via the --authentication-config flag. (In older versions, individual --oidc-* flags were used instead, but these were removed in v1.35.)

The AuthenticationConfiguration defines OIDC providers under the jwt key:

Field	What it does	Example
`issuer.url`	The OIDC provider's base URL — must match the `iss` claim in the token	`https://dex.example.com`
`issuer.audiences`	The client IDs the token was issued for — must match the `aud` claim	`["kubernetes"]`
`issuer.certificateAuthority`	CA certificate to trust when contacting the OIDC provider (inlined PEM)	`-----BEGIN CERTIFICATE-----...`
`claimMappings.username.claim`	Which JWT claim to use as the Kubernetes username	`email`
`claimMappings.groups.claim`	Which JWT claim to use as the Kubernetes group list	`groups`
`claimMappings.*.prefix`	Prefix added to the claim value — set to `""` for no prefix	`""`

On a kind cluster, the --authentication-config flag is set in the cluster configuration before creation, not after. You'll see this in the next demo.

JWT Claims Kubernetes Uses

A JWT is a signed JSON object with three sections: a header, a payload, and a signature. The payload is a set of claims – key-value pairs that assert facts about the token. Kubernetes reads specific claims from the payload to build an identity.

The required claims are iss (the issuer URL, must match issuer.url in the AuthenticationConfiguration), sub (the subject, a unique identifier for the user), and aud (the audience, must match the issuer.audiences list). The exp claim (expiry time) is also required as the API server rejects expired tokens.

The most useful optional claim is groups (or whatever you configure via claimMappings.groups.claim). When this claim is present, Kubernetes can map OIDC group memberships directly to RBAC group bindings. A user in the platform-engineers group in your identity provider automatically gets the RBAC permissions you've bound to that group in Kubernetes — no manual user management required.
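
As a rough sketch of the checks described above, the following Python function validates `iss`, `aud`, and `exp` and then maps claims to a username and group list. The function name and parameters are assumptions for illustration, and signature verification is deliberately omitted; the API server performs it before trusting any claim.

```python
import time
from typing import Any, Dict, List, Optional


def identity_from_claims(
    claims: Dict[str, Any],
    issuer_url: str,
    audiences: List[str],
    username_claim: str = "email",
    groups_claim: str = "groups",
    now: Optional[float] = None,
) -> Dict[str, Any]:
    """Check iss/aud/exp, then map claims to a username and group list."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer_url:
        raise ValueError("iss does not match the configured issuer URL")
    # aud may be a single string or a list of strings.
    aud = claims.get("aud")
    token_audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if not set(token_audiences) & set(audiences):
        raise ValueError("aud does not match any configured audience")
    if "exp" not in claims or claims["exp"] <= now:
        raise ValueError("token is expired or has no exp claim")
    if username_claim not in claims:
        raise ValueError(f"token has no {username_claim!r} claim")
    return {
        "username": claims[username_claim],
        "groups": list(claims.get(groups_claim, [])),
    }


claims = {
    "iss": "https://dex.127.0.0.1.nip.io:32000",
    "aud": "kubernetes",
    "exp": 1775307910,
    "email": "admin@example.com",
    "groups": ["platform-engineers"],
}
identity = identity_from_claims(
    claims,
    issuer_url="https://dex.127.0.0.1.nip.io:32000",
    audiences=["kubernetes"],
    now=1775221510,
)
print(identity)
```

A token without a groups claim still authenticates; it simply yields an empty group list, so only user-level RBAC bindings can match.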

How kubelogin Works

kubelogin (also distributed as kubectl oidc-login) is a kubectl credential plugin. Instead of embedding a static certificate or token in your kubeconfig, you configure a credential plugin that runs a helper binary when kubectl needs a token.

When kubelogin is invoked, it checks its local token cache. If the cached token is still valid, it returns it immediately. If the token has expired, it initiates the OIDC authorization code flow — opens a browser, redirects to the identity provider, receives the token after login, caches it locally, and returns it to kubectl. The whole flow takes about five seconds when it triggers.

This means tokens are short-lived (typically an hour) and rotate automatically. If a developer's machine is compromised, the token expires on its own. There is no long-lived credential sitting in a file somewhere.
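
The cache-or-login decision can be sketched in a few lines of Python. This is a simplified model, not kubelogin's actual code: `get_token`, `jwt_expiry`, and `make_unsigned_token` are hypothetical helpers, the demo token is unsigned, and refresh tokens are ignored.

```python
import base64
import json
import time
from typing import Callable, Optional


def jwt_expiry(token: str) -> int:
    """Read the exp claim from a JWT payload without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))["exp"]


def get_token(cached: Optional[str], login: Callable[[], str],
              now: Optional[float] = None) -> str:
    """Return the cached token while it is valid, otherwise run the login flow."""
    now = time.time() if now is None else now
    if cached is not None and jwt_expiry(cached) > now:
        return cached
    return login()  # in the real plugin this is where the browser would open


def make_unsigned_token(exp: int) -> str:
    """Build a fake, unsigned JWT for the demo (not a usable credential)."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{b64({'alg': 'none'})}.{b64({'exp': exp})}.signature"


cached_token = make_unsigned_token(exp=2000)
print(get_token(cached_token, login=lambda: "fresh-token", now=1000) == cached_token)
print(get_token(cached_token, login=lambda: "fresh-token", now=3000))
```

The first call reuses the cached token because its `exp` is still in the future. The second falls through to the login callback.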

In this section, you'll deploy Dex as a self-hosted OIDC provider, configure a kind cluster to trust it, and log in with a browser. Dex is a good demo vehicle because it runs inside the cluster and doesn't require a cloud account or an external service.

This guide is for local development and learning only. Self-signed certificates, static passwords, and certs stored on disk are used here for simplicity.

In production, use a managed identity provider (Microsoft Entra ID, Google Workspace, Okta), automate certificate lifecycle with cert-manager, and store secrets in a secrets manager (HashiCorp Vault, AWS Secrets Manager) or inject them via a CSI driver. Never commit certificates or keep them as local files.

Step 1: Create a kind cluster with OIDC authentication

OIDC authentication for the API server must be configured at cluster creation time on Kind because the API server needs to know which identity provider to trust before it starts accepting requests.

Note: Kubernetes v1.30+ deprecated the --oidc-* API server flags in favor of the structured AuthenticationConfiguration API (via --authentication-config). In v1.35+ the old flags are removed entirely. This guide uses the new approach.

nip.io is a wildcard DNS service — dex.127.0.0.1.nip.io resolves to 127.0.0.1. This lets us use a real hostname for TLS without editing /etc/hosts.

First, generate a self-signed CA and TLS certificate for Dex:

# Generate a CA for Dex
openssl req -x509 -newkey rsa:4096 -keyout dex-ca.key \
  -out dex-ca.crt -days 365 -nodes \
  -subj "/CN=dex-ca"

# Generate a certificate for Dex signed by that CA
openssl req -newkey rsa:2048 -keyout dex.key \
  -out dex.csr -nodes \
  -subj "/CN=dex.127.0.0.1.nip.io"

openssl x509 -req -in dex.csr \
  -CA dex-ca.crt -CAkey dex-ca.key \
  -CAcreateserial -out dex.crt -days 365 \
  -extfile <(printf "subjectAltName=DNS:dex.127.0.0.1.nip.io")

Next, generate the AuthenticationConfiguration file. This tells the API server how to validate JWTs — which issuer to trust (url), which audience to expect (audiences), and which JWT claims map to Kubernetes usernames and groups (claimMappings). The CA cert is inlined so the API server can verify Dex's TLS certificate when fetching signing keys:

cat > auth-config.yaml <
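
The heredoc above is cut off in this copy. As a rough sketch of what such a file could contain, the Python helper below renders an AuthenticationConfiguration from the fields listed in the table earlier. The function name, the `apiVersion` value (it depends on your Kubernetes version), and the placeholder certificate are assumptions; this is not the author's original file.

```python
import textwrap


def render_auth_config(
    issuer_url: str,
    audience: str,
    ca_pem: str,
    api_version: str = "apiserver.config.k8s.io/v1beta1",  # assumption: version-dependent
) -> str:
    """Render an AuthenticationConfiguration with one JWT issuer."""
    # The PEM must sit deeper than the certificateAuthority key (8 spaces here).
    ca_block = textwrap.indent(ca_pem.strip(), " " * 8)
    return (
        f"apiVersion: {api_version}\n"
        "kind: AuthenticationConfiguration\n"
        "jwt:\n"
        "  - issuer:\n"
        f"      url: {issuer_url}\n"
        "      audiences:\n"
        f"        - {audience}\n"
        "      certificateAuthority: |\n"
        f"{ca_block}\n"
        "    claimMappings:\n"
        "      username:\n"
        "        claim: email\n"
        '        prefix: ""\n'
        "      groups:\n"
        "        claim: groups\n"
        '        prefix: ""\n'
    )


# Placeholder PEM; in practice this would be the contents of dex-ca.crt.
placeholder_ca = (
    "-----BEGIN CERTIFICATE-----\n"
    "(contents of dex-ca.crt)\n"
    "-----END CERTIFICATE-----\n"
)
config = render_auth_config(
    issuer_url="https://dex.127.0.0.1.nip.io:32000",
    audience="kubernetes",
    ca_pem=placeholder_ca,
)
print(config)
```

To produce a usable file, you would read the real `dex-ca.crt` and write the rendered text to `auth-config.yaml`. The issuer URL must match the `issuer` value in Dex's own configuration exactly.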


The kind-oidc.yaml config uses extraPortMappings to expose Dex's port to your browser, extraMounts to copy files into the Kind node, and a kubeadmConfigPatch to pass --authentication-config to the API server:
# kind-oidc.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      # Forward port 32000 from the Docker container to localhost,
      # so your browser can reach Dex's login page
      - containerPort: 32000
        hostPort: 32000
        protocol: TCP
    extraMounts:
      # Copy files from your machine into the Kind node's filesystem
      - hostPath: ./dex-ca.crt
        containerPath: /etc/ca-certificates/dex-ca.crt
        readOnly: true
      - hostPath: ./auth-config.yaml
        containerPath: /etc/kubernetes/auth-config.yaml
        readOnly: true
    kubeadmConfigPatches:
      # Patch the API server to enable OIDC authentication
      - |
        kind: ClusterConfiguration
        apiServer:
          extraArgs:
            # Tell the API server to load our AuthenticationConfiguration
            authentication-config: /etc/kubernetes/auth-config.yaml
          extraVolumes:
            # Mount files into the API server pod (it runs as a static pod,
            # so it needs explicit volume mounts even though files are on the node)
            - name: dex-ca
              hostPath: /etc/ca-certificates/dex-ca.crt
              mountPath: /etc/ca-certificates/dex-ca.crt
              readOnly: true
              pathType: File
            - name: auth-config
              hostPath: /etc/kubernetes/auth-config.yaml
              mountPath: /etc/kubernetes/auth-config.yaml
              readOnly: true
              pathType: File

Create the cluster:
kind create cluster --name k8s-auth --config kind-oidc.yaml

Step 2: Deploy Dex

Dex is an OIDC-compliant identity provider that acts as a bridge between Kubernetes and upstream identity sources (LDAP, SAML, GitHub, and so on). In this demo it runs inside the cluster with a static password database — two hardcoded users you can log in as.
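The API server doesn't talk to Dex on every request. It periodically fetches Dex's public signing keys and uses them to verify the JWT signatures on tokens that Dex issues. Dex's CA certificate (which you inlined in the AuthenticationConfiguration) lets the API server trust the TLS connection when it fetches those keys.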
The deployment has four parts: a ConfigMap with Dex's configuration, a Deployment to run Dex, a NodePort Service to expose it on port 32000 (matching the issuer URL), and RBAC resources so Dex can store state using Kubernetes CRDs.
First, create the namespace and load the TLS certificate as a Kubernetes Secret. Dex needs this to serve HTTPS. Without it, your browser and the API server would refuse to connect:
kubectl create namespace dex

kubectl create secret tls dex-tls \
  --cert=dex.crt \
  --key=dex.key \
  -n dex

Save the following as dex-config.yaml. This configures Dex with a static password connector — two hardcoded users for the demo:
# dex-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dex-config
  namespace: dex
data:
  config.yaml: |
    # issuer must exactly match the URL in your AuthenticationConfiguration
    issuer: https://dex.127.0.0.1.nip.io:32000

    # Dex stores refresh tokens and auth codes — here it uses Kubernetes CRDs
    storage:
      type: kubernetes
      config:
        inCluster: true

    # Dex's HTTPS listener — serves the login page and token endpoints
    web:
      https: 0.0.0.0:5556
      tlsCert: /etc/dex/tls/tls.crt
      tlsKey: /etc/dex/tls/tls.key

    # staticClients defines which applications can request tokens.
    # "kubernetes" is the client ID that kubelogin uses when authenticating
    staticClients:
      - id: kubernetes
        redirectURIs:
          - http://localhost:8000     # kubelogin listens here to receive the callback
        name: Kubernetes
        secret: kubernetes-secret     # shared secret between kubelogin and Dex

    # Two demo users with the password "password" (bcrypt-hashed).
    # In production, you'd connect Dex to LDAP, SAML, or a social login instead
    enablePasswordDB: true
    staticPasswords:
      - email: "jane@example.com"
        # bcrypt hash of "password" — generate your own with: htpasswd -bnBC 10 "" password
        hash: "$2a$10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
        username: "jane"
        userID: "08a8684b-db88-4b73-90a9-3cd1661f5466"
      - email: "admin@example.com"
        hash: "$2a$10$2b2cU8CPhOTaGrs1HRQuAueS7JTT5ZHsHSzYiFPm1leZck7Mc8T4W"
        username: "admin"
        userID: "a8b53e13-7e8c-4f7b-9a33-6c2f4d8c6a1b"
        groups:
          - platform-engineers

Save the following as dex-deployment.yaml. This creates the Deployment, Service, ServiceAccount, and RBAC that Dex needs to run:
# dex-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dex
  namespace: dex
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dex
  template:
    metadata:
      labels:
        app: dex
    spec:
      serviceAccountName: dex
      containers:
        - name: dex
          # v2.45.0+ required — earlier versions don't include groups from staticPasswords in tokens
          image: ghcr.io/dexidp/dex:v2.45.0
          command: ["dex", "serve", "/etc/dex/cfg/config.yaml"]
          ports:
            - name: https
              containerPort: 5556
          volumeMounts:
            - name: config
              mountPath: /etc/dex/cfg
            - name: tls
              mountPath: /etc/dex/tls
      volumes:
        - name: config
          configMap:
            name: dex-config
        - name: tls
          secret:
            secretName: dex-tls
---
# NodePort Service — exposes Dex on port 32000 on the Kind node.
# Combined with extraPortMappings, this makes Dex reachable from your browser
apiVersion: v1
kind: Service
metadata:
  name: dex
  namespace: dex
spec:
  type: NodePort
  ports:
    - name: https
      port: 5556
      targetPort: 5556
      nodePort: 32000
  selector:
    app: dex
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dex
  namespace: dex
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dex
rules:
  - apiGroups: ["dex.coreos.com"]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dex
subjects:
  - kind: ServiceAccount
    name: dex
    namespace: dex
roleRef:
  kind: ClusterRole
  name: dex
  apiGroup: rbac.authorization.k8s.io

kubectl apply -f dex-config.yaml
kubectl apply -f dex-deployment.yaml
kubectl rollout status deployment/dex -n dex
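
Once the rollout completes, a useful sanity check is that Dex's OIDC discovery document (served at `/.well-known/openid-configuration` under the issuer URL) reports exactly the issuer the API server expects. The Python sketch below checks an already-fetched document. The function name is hypothetical, and the sample document is hand-written for illustration rather than captured from a running Dex.

```python
from typing import Any, Dict, List


def discovery_problems(doc: Dict[str, Any], expected_issuer: str) -> List[str]:
    """Return human-readable problems found in an OIDC discovery document."""
    problems: List[str] = []
    if doc.get("issuer") != expected_issuer:
        problems.append(
            f"issuer is {doc.get('issuer')!r}, expected {expected_issuer!r}"
        )
    if not str(doc.get("jwks_uri", "")).startswith("https://"):
        problems.append("jwks_uri is missing or not served over HTTPS")
    return problems


expected = "https://dex.127.0.0.1.nip.io:32000"

# Illustrative documents; the jwks_uri path is an assumption.
good_doc = {"issuer": expected, "jwks_uri": expected + "/keys"}
bad_doc = {"issuer": expected + "/", "jwks_uri": expected + "/keys"}

print(discovery_problems(good_doc, expected))
print(discovery_problems(bad_doc, expected))
```

The second document fails only because of a trailing slash, which is the kind of mismatch that otherwise shows up later as rejected tokens.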

Step 3: Install kubelogin

# macOS
brew install int128/kubelogin/kubelogin

# Linux
curl -LO https://github.com/int128/kubelogin/releases/latest/download/kubelogin_linux_amd64.zip
unzip -j kubelogin_linux_amd64.zip kubelogin -d /tmp
sudo mv /tmp/kubelogin /usr/local/bin/kubectl-oidc_login
rm kubelogin_linux_amd64.zip

Confirm it's installed:
kubectl oidc-login --version

Step 4: Configure a kubeconfig entry for OIDC

This creates a new user and context in your kubeconfig. Instead of using a client certificate (like the default Kind admin), it tells kubectl to use kubelogin to get a token from Dex.
The --oidc-extra-scope flags are important: without email and groups, Dex won't include those claims in the JWT, and the API server won't know who you are or what groups you belong to.
kubectl config set-credentials oidc-user \
  --exec-api-version=client.authentication.k8s.io/v1beta1 \
  --exec-command=kubectl \
  --exec-arg=oidc-login \
  --exec-arg=get-token \
  --exec-arg=--oidc-issuer-url=https://dex.127.0.0.1.nip.io:32000 \
  --exec-arg=--oidc-client-id=kubernetes \
  --exec-arg=--oidc-client-secret=kubernetes-secret \
  --exec-arg=--oidc-extra-scope=email \
  --exec-arg=--oidc-extra-scope=groups \
  --exec-arg=--certificate-authority=$(pwd)/dex-ca.crt

kubectl config set-context oidc@k8s-auth \
  --cluster=kind-k8s-auth \
  --user=oidc-user

kubectl config use-context oidc@k8s-auth

Step 5: Trigger the login flow

Jane has no RBAC permissions yet, so first grant her read access from the admin context:
kubectl --context kind-k8s-auth create clusterrolebinding jane-view \
  --clusterrole=view --user=jane@example.com

The OIDC context is still the active one (you switched to it in Step 4), so any kubectl command now triggers a login:
kubectl get pods -n default

Your browser opens and redirects to the Dex login page. Log in as jane@example.com with password password.




After login, the terminal completes:
No resources found in default namespace.

The browser-based authentication worked. kubectl received the token from Dex and sent it to the API server. The API server verified the JWT signature with Dex's public signing keys (fetched over TLS, which it trusts through the CA certificate in the AuthenticationConfiguration), extracted jane@example.com from the email claim, matched it against the RBAC binding, and authorized the request.
Without the clusterrolebinding, you would see Error from server (Forbidden) — authentication succeeds (the API server knows who you are) but authorization fails (jane has no permissions). This is the distinction between 401 Unauthorized and 403 Forbidden.
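The 401 versus 403 distinction can be reduced to a two-step decision. The Python sketch below is a toy model with a hypothetical `http_status` function and an in-memory permission set, not how the API server is implemented.

```python
from typing import Optional, Set, Tuple

Permission = Tuple[str, str, str]  # (username, verb, resource)


def http_status(identity: Optional[str], allowed: Set[Permission],
                verb: str, resource: str) -> int:
    """Return the status code for a request, checking authn before authz."""
    if identity is None:
        return 401  # authentication failed: the server does not know who you are
    if (identity, verb, resource) not in allowed:
        return 403  # authenticated, but no binding grants this action
    return 200


allowed = {("jane@example.com", "list", "pods")}

print(http_status(None, allowed, "list", "pods"))
print(http_status("jane@example.com", allowed, "delete", "pods"))
print(http_status("jane@example.com", allowed, "list", "pods"))
```

A request with no valid identity stops at the first check, which is why a missing or expired token never produces a Forbidden error.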

Step 6: Inspect the JWT

A JWT (JSON Web Token) is a signed JSON payload that contains claims about the user. kubelogin caches the token locally under ~/.kube/cache/oidc-login/ so you don't have to log in on every kubectl command.
List the directory to find the cached file:
ls ~/.kube/cache/oidc-login/

Decode the JWT payload directly from the cache:
cat ~/.kube/cache/oidc-login/$(ls ~/.kube/cache/oidc-login/ | grep -v lock | head -1) | \
  python3 -c "
import json, sys, base64
token = json.load(sys.stdin)['id_token'].split('.')[1]
# JWTs strip base64 padding; add back only what is missing (0 to 3 characters)
token += '=' * (-len(token) % 4)
print(json.dumps(json.loads(base64.urlsafe_b64decode(token)), indent=2))
"

You'll see something like:
{
  "iss": "https://dex.127.0.0.1.nip.io:32000",
  "sub": "CiQwOGE4Njg0Yi1kYjg4LTRiNzMtOTBhOS0zY2QxNjYxZjU0NjYSBWxvY2Fs",
  "aud": "kubernetes",
  "exp": 1775307910,
  "iat": 1775221510,
  "email": "jane@example.com",
  "email_verified": true
}

The email claim becomes jane's Kubernetes username because the AuthenticationConfiguration maps username.claim: email. The aud matches the configured audiences. The iss matches the issuer url. This is how the API server validates the token without contacting Dex on every request: it verifies the JWT signature locally with Dex's cached public keys, and it needs the CA certificate only to trust the TLS connection when fetching those keys.

Step 7: Map OIDC groups to RBAC

The admin@example.com user has a groups claim in the Dex config containing platform-engineers. Instead of creating individual RBAC bindings per user, you can bind permissions to a group — anyone whose JWT contains that group gets the permissions automatically:
# platform-engineers-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-engineers-admin
subjects:
  - kind: Group
    name: platform-engineers     # matches the groups claim in the JWT
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

You're currently logged in as jane@example.com via the OIDC context, but jane only has view permissions — she can't create cluster-wide RBAC bindings. Switch back to the admin context to apply this:
kubectl config use-context kind-k8s-auth
kubectl apply -f platform-engineers-binding.yaml
kubectl config use-context oidc@k8s-auth

Now clear the cached token to log out of jane's session, then trigger a new login as admin@example.com:
# Clear the cached token — this is how you "log out" with kubelogin
rm -rf ~/.kube/cache/oidc-login/

# This will open the browser again for a fresh login
kubectl get pods -n default

Log in as admin@example.com with password password. This time the JWT will contain "groups": ["platform-engineers"], which matches the ClusterRoleBinding you just created. The admin user gets full cluster access — without ever being added to a kubeconfig by name.
You can verify by decoding the new token (Step 6) — the groups claim will be present:
{
  "email": "admin@example.com",
  "groups": ["platform-engineers"]
}

This is the real power of OIDC group claims: you manage group membership in your identity provider, and Kubernetes permissions follow automatically. Add someone to the platform-engineers group in Dex (or any upstream IdP), and they get cluster-admin access on their next login — no kubeconfig or RBAC changes needed.
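How a group binding matches a user can be sketched as a simple subject check. The function below is a simplified Python model with a hypothetical name; real RBAC evaluation also considers namespaces, roles, and rules.

```python
from typing import Dict, Iterable, List


def binding_applies(subjects: List[Dict[str, str]], username: str,
                    groups: Iterable[str]) -> bool:
    """Return True if any subject matches the user by name or by group."""
    group_set = set(groups)
    for subject in subjects:
        if subject["kind"] == "User" and subject["name"] == username:
            return True
        if subject["kind"] == "Group" and subject["name"] in group_set:
            return True
    return False


# Subjects from the platform-engineers ClusterRoleBinding in this step.
subjects = [{"kind": "Group", "name": "platform-engineers"}]

print(binding_applies(subjects, "admin@example.com", ["platform-engineers"]))
print(binding_applies(subjects, "jane@example.com", []))
```

The binding never names admin@example.com. The match comes entirely from the groups claim in the token.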

Cloud Provider Authentication

AWS, GCP, and Azure each give Kubernetes clusters a native authentication mechanism that ties into their IAM systems.
The implementations differ in API surface, but they follow the same underlying idea: a short-lived token from a trusted identity system is presented to the API server, verified, and mapped to a Kubernetes identity. Once you understand how Dex works above, these are all variations on the same theme.

AWS EKS

EKS uses the aws-iam-authenticator to translate AWS IAM identities into Kubernetes identities. When you run kubectl against an EKS cluster, the AWS CLI generates a short-lived token signed with your IAM credentials. The API server passes this token to the aws-iam-authenticator webhook, which verifies it against AWS STS and returns the corresponding username and groups.
User access is controlled via the aws-auth ConfigMap in kube-system, which maps IAM role ARNs and IAM user ARNs to Kubernetes usernames and groups. A typical entry looks like this:
# In kube-system/aws-auth ConfigMap
mapRoles:
  - rolearn: arn:aws:iam::123456789012:role/platform-engineers
    username: platform-engineer:{{SessionName}}
    groups:
      - platform-engineers

AWS is migrating from the aws-auth ConfigMap to a newer Access Entries API, which manages the same mapping through the EKS API rather than a ConfigMap. The underlying authentication mechanism is the same.
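The role-to-identity mapping can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name and the sample ARNs are hypothetical, the caller is assumed to present an STS assumed-role ARN, and details such as IAM role paths are ignored.

```python
import re
from typing import Any, Dict, List, Optional

# Assumed-role ARNs look like arn:aws:sts::<account>:assumed-role/<role>/<session>.
ASSUMED_ROLE = re.compile(
    r"^arn:aws:sts::(?P<account>\d{12}):assumed-role/(?P<role>[^/]+)/(?P<session>.+)$"
)


def map_identity(caller_arn: str, map_roles: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Map an assumed-role ARN to a username and groups using mapRoles-style entries."""
    match = ASSUMED_ROLE.match(caller_arn)
    if match is None:
        return None
    # Entries are keyed by the IAM role ARN, not the STS session ARN.
    role_arn = f"arn:aws:iam::{match['account']}:role/{match['role']}"
    for entry in map_roles:
        if entry["rolearn"] == role_arn:
            username = entry["username"].replace("{{SessionName}}", match["session"])
            return {"username": username, "groups": list(entry["groups"])}
    return None


map_roles = [
    {
        "rolearn": "arn:aws:iam::123456789012:role/platform-engineers",
        "username": "platform-engineer:{{SessionName}}",
        "groups": ["platform-engineers"],
    }
]

identity = map_identity(
    "arn:aws:sts::123456789012:assumed-role/platform-engineers/jane", map_roles
)
print(identity)
```

The resulting groups are what RBAC bindings match against, exactly as with the OIDC groups claim in the Dex demo.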

Google GKE

GKE integrates with Google Cloud IAM using two different mechanisms, depending on whether you're authenticating as a human user or as a workload.
For human users, GKE accepts standard Google OAuth2 tokens. Running gcloud container clusters get-credentials writes a kubeconfig that uses the gcloud CLI as a credential plugin, generating short-lived tokens from your Google account automatically.
For pod-level identity — letting a pod assume a Google Cloud IAM role — GKE uses Workload Identity. You annotate a Kubernetes service account to bind it to a Google Service Account, and pods running as that service account can call Google Cloud APIs using the GSA's permissions:
# Bind a Kubernetes SA to a Google Service Account
kubectl annotate serviceaccount my-app \
  --namespace production \
  iam.gke.io/gcp-service-account=my-app@my-project.iam.gserviceaccount.com

Azure AKS

AKS integrates with Azure Active Directory (now Microsoft Entra ID). When Azure AD integration is enabled, kubectl requests an Azure AD token on behalf of the user via the Azure CLI, and the AKS API server validates it against Azure AD.
For pod-level identity, AKS uses Azure Workload Identity, which follows the same OIDC federation pattern as GKE Workload Identity. A Kubernetes service account is annotated with an Azure Managed Identity client ID, and pods can request Azure AD tokens without storing any credentials:
# Annotate a service account with the Azure Managed Identity client ID
kubectl annotate serviceaccount my-app \
  --namespace production \
  azure.workload.identity/client-id=

The underlying pattern across all three providers is the same: a short-lived token is issued by the cloud provider's identity system, verified on behalf of the Kubernetes API server, and mapped to an identity through a binding (the aws-auth ConfigMap, a GKE Workload Identity binding, or an AKS federated identity credential). The OIDC section in this article is the conceptual foundation for all of them.

Webhook Token Authentication

Webhook token authentication is worth knowing about because it appears in several common Kubernetes setups, even if you never configure it yourself.
When a request arrives with a bearer token that no other authenticator recognises, Kubernetes can send that token to an external HTTP endpoint for validation. The endpoint returns a response indicating who the token belongs to.
This is the mechanism behind the EKS flow described earlier: the API server forwards the bearer token to the aws-iam-authenticator webhook, which validates it and returns the identity. Bootstrap tokens used during node join look similar from the outside (a token is generated and embedded in the kubeadm join command), but they are validated by a built-in bootstrap token authenticator against Secrets in kube-system rather than by a webhook.
For most clusters, you'll encounter webhook auth as something already running rather than something you configure. The main thing to know is that it exists and what it looks like when it appears in logs or configuration.
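For recognising it in logs or configuration, it helps to know the shape of the exchange: the API server posts a TokenReview object containing the token, and the webhook answers with a TokenReview whose status says whether the token is valid and who it belongs to. The Python sketch below models only that request and response shape. The `review_token` function and the in-memory token table are illustrative assumptions, not a production webhook.

```python
from typing import Any, Dict


def review_token(token_review: Dict[str, Any],
                 known_tokens: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build a TokenReview response for the token in the request."""
    token = token_review.get("spec", {}).get("token", "")
    user = known_tokens.get(token)
    status: Dict[str, Any] = {"authenticated": user is not None}
    if user is not None:
        status["user"] = {"username": user["username"], "groups": user["groups"]}
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "status": status,
    }


# Hypothetical token table; a real webhook would consult an identity system.
known_tokens = {
    "demo-token": {"username": "jane@example.com", "groups": ["developers"]},
}

request = {
    "apiVersion": "authentication.k8s.io/v1",
    "kind": "TokenReview",
    "spec": {"token": "demo-token"},
}
response = review_token(request, known_tokens)
print(response["status"])
```

An unknown token produces `"authenticated": False` with no user block, and the API server then treats the request as unauthenticated.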

Cleanup

To remove everything created in this article:
# Delete the OIDC demo cluster
kind delete cluster --name k8s-auth

# Remove generated certificate files
rm -f ca.crt ca.key jane.key jane.csr jane.crt jane.kubeconfig
rm -f dex-ca.crt dex-ca.key dex.crt dex.key dex.csr dex-ca.srl auth-config.yaml

# Remove the kubelogin token cache
rm -rf ~/.kube/cache/oidc-login/

Conclusion

Kubernetes authentication is not a single mechanism — it's a chain of pluggable strategies, each one suited to different use cases. In this article you worked through the most important ones.
x509 client certificates are how Kubernetes works out of the box. The CN field becomes the username, the O field becomes the group, and the cluster CA is the trust anchor. You created a certificate for a new user, bound it to RBAC, and saw exactly how authentication and authorisation interact — authentication gets you in, RBAC determines what you can do.
You also saw the fundamental limitation: Kubernetes doesn't check certificate revocation lists, so a compromised certificate remains valid until it expires. This makes certificates a poor fit for human users in production environments.
OIDC is the production-grade answer. Tokens are short-lived, issued by a trusted identity provider, and map directly to Kubernetes groups through JWT claims. You deployed Dex as a self-hosted OIDC provider, configured the API server to trust it, and set up kubelogin for browser-based authentication.
You then decoded a JWT to see exactly what the API server reads from it, and mapped an OIDC group claim to a Kubernetes ClusterRoleBinding.
Cloud provider authentication — EKS, GKE, AKS — uses the same OIDC foundation with provider-specific wrappers. Understanding how Dex works makes each of those systems immediately readable.
All YAML, certificates, and configuration files from this article are in the companion GitHub repository.



How to Build AI Agents That Can Control Cloud Infrastructure

Manish Shivanandhan — Tue, 31 Mar 2026 16:00:38 +0000

Cloud infrastructure has become deeply programmable over the past decade.
Nearly every platform exposes APIs that allow developers to create applications, provision databases, configure networking, and retrieve metrics.
This shift enabled automation via Infrastructure as Code and CI/CD pipelines, allowing teams to manage systems through scripts rather than dashboards.
Now another layer of automation is emerging. AI agents are starting to participate directly in development workflows. These agents can read codebases, generate implementations, run terminal commands, and help debug systems. The next logical step is to allow them to interact with the infrastructure itself.
Instead of manually inspecting dashboards or remembering complex command-line syntax, developers can ask an AI agent to check system state, deploy services, or retrieve metrics. The agent performs these tasks by interacting with cloud APIs on behalf of the user.
This capability opens the door to a new type of workflow where infrastructure becomes conversational, programmable, and deeply integrated into development environments.
In this article, we will explore how AI agents can interact with cloud infrastructure through APIs, the challenges of exposing large APIs to AI systems, and how architectures like MCP make it possible for agents to discover and execute infrastructure operations safely. We will also look at a practical example of connecting an AI agent to a cloud platform like Sevalla using the search-and-execute pattern.
Familiarity with cloud infrastructure concepts such as APIs, Infrastructure as Code, and CI/CD workflows is recommended to follow along effectively. You should also have a basic understanding of how AI agents or developer assistants interact with code and systems to fully understand the architectures discussed in this article.
What We'll Cover

AI Agents Are Becoming Part of the Development Environment
Connecting AI Agents to External Systems
The Challenge of Large Cloud APIs
A Simpler Pattern for API Access
Why Sandboxed Code Execution Is Important
Practical Example with Sevalla
What This Means for Developers
The Next Evolution of Infrastructure Automation

AI Agents Are Becoming Part of the Development Environment

Modern developer tools increasingly embed AI assistants directly inside coding environments. Tools such as Cursor, Windsurf, and Claude Code allow developers to ask questions about their projects, generate new code, and execute commands without leaving their development environment.
Instead of manually navigating documentation or writing boilerplate code, developers can simply describe what they want. The AI interprets the request and produces the necessary actions.
This approach is already common for tasks like writing functions, refactoring code, or debugging errors. However, infrastructure management is still largely handled through dashboards, terminal commands, or external tooling.
If AI agents are going to assist developers effectively, they need access to the same systems developers interact with every day. That means accessing APIs that manage applications, databases, deployments, and other infrastructure resources.
The challenge is providing that access in a structured and scalable way.

Connecting AI Agents to External Systems

AI agents do not inherently know how to interact with external services. They need a framework that allows them to call tools and access data safely.
Model Context Protocol, or MCP, provides one such framework. MCP is designed to let AI assistants connect to external tools in a standardized way.
An MCP server exposes tools that an AI agent can call when it needs information or wants to act. These tools might retrieve data from a database, query logs, interact with APIs, or execute commands on a remote system.
When the AI agent receives a request from the user, it determines which tool to call and executes that tool through the MCP server. The results are returned to the agent, which can then continue reasoning about the problem.
This architecture allows AI assistants to interact with complex systems while maintaining a clear boundary between the agent and the external environment.
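The tool-calling boundary can be illustrated with a small registry. The Python sketch below is not the MCP SDK or wire protocol. `ToolServer`, `query_logs`, and the fake log data are assumptions that only model the idea of an agent calling named tools through a server.

```python
from typing import Any, Callable, Dict, List


class ToolServer:
    """Minimal stand-in for a server that exposes named tools to an agent."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers a function under a tool name."""
        def register(func: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = func
            return func
        return register

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **arguments: Any) -> Any:
        # The agent can only reach functions that were explicitly registered.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**arguments)


server = ToolServer()


@server.tool("query_logs")
def query_logs(service: str, limit: int = 2) -> List[str]:
    """Return the first few log lines for a service (canned data for the demo)."""
    fake_logs = {"api": ["started", "ready", "request served"]}
    return fake_logs.get(service, [])[:limit]


print(server.list_tools())
print(server.call("query_logs", service="api"))
```

The agent sees only tool names and their results. Everything behind `call` stays on the server side of the boundary.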

The Challenge of Large Cloud APIs

While MCP enables connecting AI agents to infrastructure systems, cloud platforms introduce an additional challenge.
Most cloud platforms expose large APIs with many endpoints. A typical platform might include endpoints for managing applications, databases, storage, networking, domains, metrics, logs, and deployment pipelines.
If an MCP server exposes each endpoint as a separate tool, the number of tools can quickly grow into the hundreds.
This creates several problems. First, the AI agent must understand the purpose and parameters of every available tool before deciding which one to use. This increases the amount of context required for the agent to operate effectively.
Second, maintaining hundreds of tools becomes difficult for developers who build and maintain the MCP server.
Third, the system becomes rigid. Every time a new API endpoint is added, a new tool must also be created and documented.
For large APIs, this approach quickly becomes impractical.

A Simpler Pattern for API Access

A different architecture solves this problem by dramatically reducing the number of tools exposed to the AI.
Instead of providing a separate tool for every API endpoint, the MCP server exposes only two capabilities.
The first capability allows the agent to search the API specification. This lets the agent discover available endpoints, understand parameters, and inspect request or response schemas.
The second capability allows the agent to execute code that calls the API.
In this model, the AI agent dynamically generates the code required to call the API. Because the agent can search the specification and write its own API calls, the MCP server does not need to define individual tools for every endpoint.
This pattern drastically reduces the complexity of the integration while still giving the agent full access to the underlying platform.
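A toy version of the two capabilities might look like the Python sketch below. Everything here is an assumption made for illustration: the three-entry `API_SPEC`, the keyword matching in `search`, and the canned response. In a real server, `execute` would run agent-generated code inside a sandbox; here a plain Python callable stands in for that code.

```python
from typing import Any, Callable, Dict, List

# Hypothetical, hand-written slice of an API specification.
API_SPEC: List[Dict[str, str]] = [
    {"method": "GET", "path": "/applications", "summary": "List all applications"},
    {"method": "GET", "path": "/databases", "summary": "List all databases"},
    {"method": "POST", "path": "/applications", "summary": "Create an application"},
]


def search(query: str) -> List[Dict[str, str]]:
    """Capability 1: find operations whose summary contains every query word."""
    words = query.lower().split()
    return [op for op in API_SPEC
            if all(word in op["summary"].lower() for word in words)]


def execute(script: Callable[[Callable[[str, str], Any]], Any]) -> Any:
    """Capability 2: run a script that may only use the provided request helper."""
    def request(method: str, path: str) -> Dict[str, Any]:
        known = any(op["method"] == method and op["path"] == path for op in API_SPEC)
        if not known:
            raise ValueError(f"no such operation: {method} {path}")
        return {"method": method, "path": path, "items": []}  # canned response
    return script(request)


matches = search("list applications")
endpoint = matches[0]
apps = execute(lambda request: request(endpoint["method"], endpoint["path"]))
print(apps)
```

Adding a new endpoint to the specification makes it discoverable through `search` without registering another tool, which is the flexibility this pattern is after.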

Why Sandboxed Code Execution Is Important

Allowing AI agents to generate and execute code raises important security considerations.
If the generated code runs unrestricted, it could potentially access sensitive parts of the system or perform unintended operations. To prevent this, the execution environment must be carefully controlled.
A common solution is running the generated code inside a sandboxed environment. In this setup, the code runs in an isolated runtime with limited permissions. The environment exposes only specific functions that allow interaction with the platform’s API.
Because the code cannot access the host system directly, the risk of unintended behavior is greatly reduced. At the same time, the AI agent retains the flexibility to generate custom API calls as needed.
This combination of dynamic code generation and sandboxed execution makes it possible for AI agents to interact with complex APIs safely.
Practical Example with Sevalla
A practical implementation of this architecture can be seen in the Sevalla MCP server, which exposes a cloud platform’s API to AI agents through the search-and-execute pattern.
Sevalla is a PaaS provider designed for developers shipping production applications. It offers app hosting, databases, object storage, and static site hosting for your projects. Other platforms, such as AWS and Azure, come with their own MCP tools.
Instead of registering hundreds of tools, one for every API endpoint, the server provides only two tools that allow the AI agent to explore and interact with the entire platform. The full details are in the documentation for Sevalla’s MCP server.
The first tool, search, allows the agent to query the platform’s OpenAPI specification. Through this interface the agent can discover available endpoints, understand parameters, and inspect response schemas.


Because the API specification is searchable, the agent does not need to know the structure of the platform’s API in advance. It can explore the API dynamically based on the task it needs to perform.
For example, if the user asks the agent to list all applications running in their account, the agent can begin by searching the API specification.
const endpoints = await sevalla.search("list all applications")

The result returns the relevant API definitions, including the correct path and parameters required for the request. Once the agent understands which endpoint to use, it can generate the necessary API call.
The second tool, execute, runs JavaScript inside a sandboxed V8 environment. Within this environment the agent can call the API using a helper function provided by the platform.
const apps = await sevalla.request({
  method: "GET",
  path: "/applications"
})

Because the code runs inside an isolated V8 sandbox, the generated script cannot access the host system. The only permitted interaction is through the API helper function. This ensures that the AI agent can perform infrastructure operations safely while still retaining the flexibility to generate dynamic API calls.
This approach allows an agent to discover and interact with many parts of the platform without requiring predefined tools for each capability. After discovering endpoints through the API specification, the agent can retrieve application data, inspect deployments, query metrics, or manage infrastructure resources through generated API calls.
The design also significantly reduces context usage. Traditional MCP integrations might require hundreds of tools to represent every endpoint of a large API. In contrast, the search-and-execute pattern allows the entire API surface to be accessed through just two tools.
For developers connecting AI assistants to infrastructure platforms, this architecture provides a practical way to expose large APIs while keeping the integration simple and efficient.
What This Means for Developers
Allowing AI agents to interact with infrastructure APIs changes how developers manage systems.
Instead of manually navigating dashboards or writing long sequences of commands, developers can describe what they want in natural language. The AI agent can interpret the request, discover the relevant API endpoints, and execute the required operations.
This approach also improves observability and debugging. When something goes wrong, the agent can query logs, inspect metrics, and retrieve system state without requiring the developer to manually gather information.
Over time, this type of integration could significantly reduce the friction involved in managing complex cloud systems.
The Next Evolution of Infrastructure Automation
Infrastructure automation has evolved through several stages. Early cloud systems relied heavily on manual configuration through web interfaces. Infrastructure as Code later allowed teams to define infrastructure using scripts and configuration files.
CI/CD pipelines then automated the process of deploying and updating systems.
AI agents represent the next step in this progression. By combining APIs, MCP integrations, and sandboxed execution environments, developers can allow intelligent systems to reason about infrastructure and interact with it safely.
Instead of static integrations, agents can dynamically discover and call APIs as needed. This makes infrastructure management more flexible and accessible while maintaining the reliability of programmable systems.
As AI tools become more deeply embedded in development environments, the ability for agents to understand and control infrastructure will likely become a standard capability for modern platforms.
Hope you enjoyed this article. Visit my blog for more practical tutorials.
 


 Infrastructure as Code with APIs: How to Automate Cloud Resources the Developer Way 
Manish Shivanandhan — Mon, 23 Mar 2026 17:58:20 +0000
 Modern software development moves fast. Teams deploy code many times a day. New environments appear and disappear constantly. In this world, manual infrastructure setup simply doesn't scale.
For years, developers logged into dashboards, clicked through forms, and configured servers by hand. This worked for small projects, but it quickly became fragile. Every manual step increased the chance of mistakes. Environments drifted apart. Reproducing the same setup became difficult.
Infrastructure as Code (IaC) solves this problem. Instead of clicking through interfaces, developers define infrastructure using code. This approach makes infrastructure predictable, repeatable, and easy to automate.
In recent years, another approach has become popular alongside traditional IaC tools: using cloud APIs directly to create and manage infrastructure. This gives developers full control over how resources are provisioned and integrated into workflows.
This article explains what Infrastructure as Code means, why APIs are a powerful way to implement it, and how developers can automate cloud resources using simple scripts.
A basic understanding of cloud platforms, command-line interfaces, and scripting languages like Python, Bash, or JavaScript will help you follow along effectively. Familiarity with APIs, authentication methods, and CI/CD concepts will also make it easier to implement the automation techniques discussed in this article.
Here's what we'll cover:

What Is Infrastructure as Code?

The Limits of Manual Infrastructure

Why APIs Are a Powerful IaC Tool

Automating Infrastructure with Scripts

Practical Example with Sevalla

Installing CLI

Working with your Infrastructure using CLI



Infrastructure as Code Improves Developer Productivity

The Future of Infrastructure

Conclusion


What Is Infrastructure as Code?
Infrastructure as Code means managing infrastructure using code instead of manual processes.
Instead of setting up servers, databases, and networks by hand, you define them in scripts or configuration files. These files describe the desired state of your infrastructure. A tool or script then creates and maintains that state automatically.
For example, instead of manually creating a database, you might define it in code like this:
database:
  name: app_db
  engine: postgres
  version: 16

Once the code runs, the database is created automatically.
This approach provides several key benefits.
First, it improves consistency. Every environment is created from the same definition. Development, staging, and production environments stay aligned.
Second, it improves repeatability. If infrastructure fails, it can be recreated from code in minutes.
Third, it improves version control. Infrastructure definitions live in the same repositories as application code. Teams can review, track, and roll back changes.
Finally, it enables automation. Infrastructure can be created during deployments, tests, or CI/CD pipelines.
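The "desired state" idea described above can be sketched as a small planning function. This is an illustrative sketch, not how any particular IaC tool works internally: the `Database` shape loosely mirrors the YAML example, and `plan` is a hypothetical helper that compares what the code declares with what currently exists.

```typescript
// Hypothetical resource shape, loosely mirroring the YAML example above.
type Database = { name: string; engine: string; version: number };

type Action =
  | { kind: "create"; resource: Database }
  | { kind: "update"; resource: Database }
  | { kind: "delete"; name: string };

// Compare the declared state with the current state and
// return the actions needed to make the two match.
function plan(desired: Database[], actual: Database[]): Action[] {
  const actions: Action[] = [];
  const actualByName = new Map(actual.map((db) => [db.name, db] as const));
  const desiredNames = new Set(desired.map((db) => db.name));

  for (const want of desired) {
    const have = actualByName.get(want.name);
    if (!have) {
      actions.push({ kind: "create", resource: want });
    } else if (have.engine !== want.engine || have.version !== want.version) {
      actions.push({ kind: "update", resource: want });
    }
  }
  for (const have of actual) {
    if (!desiredNames.has(have.name)) {
      actions.push({ kind: "delete", name: have.name });
    }
  }
  return actions;
}

const desired: Database[] = [{ name: "app_db", engine: "postgres", version: 16 }];
const actual: Database[] = [
  { name: "app_db", engine: "postgres", version: 15 },
  { name: "old_db", engine: "postgres", version: 14 },
];
// Yields an update for app_db followed by a delete for old_db.
const actions = plan(desired, actual);
```

Running `plan` again once the actions have been applied returns an empty list, which is what makes the process repeatable.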
The Limits of Manual Infrastructure
Before IaC became common, infrastructure management relied heavily on dashboards and manual configuration.
A developer would open a cloud console and perform steps like:

Create a server

Attach storage

Configure environment variables

Connect a database

Add a domain


These steps worked, but they introduced problems.
First of all, manual configuration is hard to document. Even if teams write guides, small details are often missed. Over time, environments drift apart.
Manual processes also slow down development. Spinning up a new environment may take hours instead of seconds.
Even worse, manual infrastructure cannot easily be tested. If something breaks, reproducing the same conditions becomes difficult.
Infrastructure as Code removes these problems by turning infrastructure into something that can be scripted, tested, and automated.
Why APIs Are a Powerful IaC Tool
Many people associate Infrastructure as Code with tools like Terraform or CloudFormation. These tools are powerful, but they're not the only option.
Every modern cloud platform exposes an API. That API allows developers to create resources programmatically.
This means infrastructure can be controlled directly from code using HTTP requests or command-line interfaces.
Using APIs for IaC has several advantages.
First, it offers maximum flexibility. Developers can integrate infrastructure creation directly into applications, deployment scripts, or internal tools.
Second, it reduces tooling complexity. Instead of learning a specialized IaC language, teams can use languages they already know, such as Python, JavaScript, or Bash.
Third, it enables dynamic infrastructure. Scripts can create resources only when needed, scale them automatically, and remove them when work is complete.
For example, a test suite could automatically create a database, run tests, and delete the database afterwards. This keeps environments clean and reduces costs.
APIs essentially turn the cloud into a programmable platform.
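The test-suite example can be sketched as a create, run, and clean-up wrapper. `FakeCloud` below is a made-up in-memory stand-in for a cloud API client, and the calls are synchronous to keep the sketch short. A real client would make asynchronous HTTP requests.

```typescript
// Hypothetical in-memory stand-in for a cloud API client.
class FakeCloud {
  private databases = new Set<string>();
  private nextId = 1;

  createDatabase(name: string): string {
    const id = `${name}-${this.nextId++}`;
    this.databases.add(id);
    return id;
  }
  deleteDatabase(id: string): void {
    this.databases.delete(id);
  }
  count(): number {
    return this.databases.size;
  }
}

// Create a database, run the tests against it, and always clean up,
// even when the tests throw.
function withTemporaryDatabase<T>(cloud: FakeCloud, run: (dbId: string) => T): T {
  const dbId = cloud.createDatabase("test-db");
  try {
    return run(dbId);
  } finally {
    cloud.deleteDatabase(dbId);
  }
}

const cloud = new FakeCloud();
const passed = withTemporaryDatabase(cloud, (dbId) => dbId.startsWith("test-db"));
```

Putting the delete call in `finally` is the design choice that matters here: a failing test run still removes the database, so temporary resources do not pile up.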
Automating Infrastructure with Scripts
Using APIs for infrastructure automation usually follows a simple workflow.

First, a script authenticates with the cloud platform using an API token or credentials.

Second, the script sends requests to create or modify resources such as applications, databases, or storage.

Third, the script captures identifiers or configuration values from the response.

Finally, those values are used in later steps, such as deployments or integrations.
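The four steps above can be sketched as one function. The request paths, the `Send` transport type, and the returned `id` field are all assumptions made for illustration. They do not describe any specific platform's API, and the transport is a stub so the sketch runs without network access.

```typescript
// Hypothetical transport: a real script would send HTTPS requests here.
type Send = (req: { method: string; path: string; token: string; body?: unknown }) => { id: string };

function provision(token: string | undefined, send: Send): { appId: string; dbId: string } {
  // Step 1: authenticate with an API token.
  if (!token) throw new Error("Missing API token");

  // Step 2: send requests that create resources.
  const app = send({ method: "POST", path: "/applications", token, body: { name: "myapp" } });
  const db = send({ method: "POST", path: "/databases", token, body: { name: "mydb" } });

  // Step 3: capture identifiers from the responses.
  const ids = { appId: app.id, dbId: db.id };

  // Step 4: reuse those identifiers in a later request.
  send({
    method: "POST",
    path: `/applications/${ids.appId}/env-vars`,
    token,
    body: { databaseId: ids.dbId },
  });
  return ids;
}

// Stub transport that records calls and returns made-up IDs.
const calls: string[] = [];
const stub: Send = (req) => {
  calls.push(`${req.method} ${req.path}`);
  return { id: `id-${calls.length}` };
};
const ids = provision("example-token", stub);
```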


Because these steps run in code, they can easily be included in CI/CD pipelines.
A typical pipeline might do the following:

Create infrastructure

Deploy the application

Run tests

Collect metrics

Destroy temporary environments


This approach ensures every deployment follows the same process.
Practical Example with Sevalla
A practical way to apply Infrastructure as Code through APIs is to use a command-line interface that directly interacts with a cloud platform’s API. This lets you automate infrastructure creation using scripts rather than dashboards.
One example is the Sevalla CLI, which exposes infrastructure operations as terminal commands that can be executed manually or inside automation pipelines.
Sevalla is a developer-centric PaaS designed to simplify your workflow. It provides high-performance application hosting, managed databases, object storage, and static sites in one unified platform.
Other options include AWS and Azure, which can require more complex CLI tooling and more DevOps overhead. Sevalla aims for simplicity and ease of use, similar to Heroku.
Installing CLI
You can install the CLI using the following shell command:
bash <(curl -fsSL https://raw.githubusercontent.com/sevalla-hosting/cli/main/install.sh)

Once installed, you can view the list of all available commands using the help command:


The first step is authentication. Make sure you have an account on Sevalla before using the CLI.
sevalla login

For automated environments such as CI/CD pipelines, authentication can be done with an API token. The token is stored in an environment variable so scripts can run without user interaction.
export SEVALLA_API_TOKEN="your-api-token"

Once authenticated, you can quickly view a list of your apps using sevalla apps list


Working with Your Infrastructure using CLI
Your infrastructure can now be created directly from the command line. For example, you might start by creating an application service that will run the backend code:
sevalla apps create --name myapp --source privateGit --cluster 

This command provisions a new application resource on the platform. Instead of navigating through a web interface and filling out forms, the entire setup is performed through a single command.
Because the command can be stored in scripts or configuration files, it becomes part of the project’s infrastructure definition.
After creating the application, you'll often need a database. You can also provision this programmatically:
sevalla databases create \
  --name mydb \
  --type postgresql \
  --db-version 16 \
  --cluster  \
  --resource-type  \
  --db-name mydb \
  --db-password secret

This creates a PostgreSQL database with a defined version and credentials. In an automated workflow, the database creation step could run during environment setup for staging or testing.
Once the application and database exist, the next step might be configuring environment variables so the application can connect to the database:
sevalla apps env-vars create  --key DATABASE_URL --value "postgres://..."

These configuration values can be injected during deployments, ensuring the application always receives the correct settings.
Deployment automation is another key part of Infrastructure as Code. Instead of manually triggering deployments, a script can deploy new code whenever a repository is updated.
sevalla apps deployments trigger  --branch main

This allows CI/CD systems to deploy new versions of the application automatically after tests pass.
Infrastructure automation also includes scaling and monitoring. For example, if an application needs more instances to handle traffic, you can update the number of running processes programmatically.
sevalla apps processes update  --app-id  --instances 3

You can also retrieve metrics through the CLI. This allows monitoring tools or scripts to analyze system performance.
sevalla apps processes metrics cpu-usage  

Similarly, you can also query application metrics such as response time or request rates to detect performance issues.
Another common step in infrastructure automation is configuring domains. Instead of manually linking domains to applications, a script can add them during environment setup.
sevalla apps domains add  --name example.com

With these commands combined in scripts or pipelines, you can fully automate the lifecycle of your infrastructure. A CI pipeline could create an application, provision a database, configure environment variables, deploy code, attach a domain, and monitor performance – all without human intervention.
Because every command supports JSON output, scripts can also capture values returned by the platform and reuse them in later steps. For example:
APP_ID=$(sevalla apps list --json | jq -r '.[0].id')

This ability to chain commands together makes it easy to build powerful automation workflows.
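If the calling script is written in TypeScript rather than Bash, the same value can be pulled out of the JSON output with a small parser. The output shape is an assumption inferred from the jq filter above (an array of objects with an `id` field, assumed here to be a string), and the sample data is made up.

```typescript
// Assumed output shape: a JSON array of objects that each carry an `id` field,
// which is what the jq filter '.[0].id' implies.
function firstAppId(jsonOutput: string): string {
  const parsed: unknown = JSON.parse(jsonOutput);
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("Expected a non-empty JSON array of apps");
  }
  const first: unknown = parsed[0];
  if (typeof first !== "object" || first === null || !("id" in first)) {
    throw new Error("First app has no id field");
  }
  const id = (first as { id: unknown }).id;
  if (typeof id !== "string") throw new Error("App id is not a string");
  return id;
}

// Made-up sample data for illustration only.
const sample = '[{"id":"app-123","name":"myapp"},{"id":"app-456","name":"other"}]';
const appId = firstAppId(sample);
```

Failing loudly on an empty or unexpected response keeps a later pipeline step from running with a blank identifier.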
In practice, teams often place these commands inside deployment scripts or pipeline steps. Whenever code is pushed to a repository, the pipeline automatically provisions or updates the infrastructure needed to run the application.
This approach demonstrates how APIs and automation tools can turn infrastructure into something you can manage the same way you manage application code: through scripts, version control, and automated workflows.
Infrastructure as Code Improves Developer Productivity
One of the biggest benefits of Infrastructure as Code is developer productivity.
Developers no longer need to wait for infrastructure changes or manually configure environments.
Instead, infrastructure becomes part of the development workflow.
When a new feature requires a service, the developer simply adds the infrastructure definition to the repository. The pipeline then creates it automatically.
This reduces delays and keeps development moving quickly.
It also makes onboarding easier. New team members can spin up a full environment with a single command.
The Future of Infrastructure
Cloud infrastructure continues to evolve toward automation and programmability.
Platforms increasingly expose APIs that allow every resource to be created, configured, and monitored through code.
This trend aligns naturally with the way developers already work.
Applications are built with code. Deployments are automated with code. It makes sense that infrastructure should also be defined with code.
Infrastructure as Code with APIs takes this idea even further. It allows infrastructure to be embedded directly into development workflows, pipelines, and internal tools.
The result is faster development, fewer configuration errors, and more reliable systems.
Conclusion
Infrastructure as Code has transformed how teams manage cloud environments.
By replacing manual configuration with code, organizations gain consistency, automation, and repeatability.
Using APIs to control infrastructure adds another level of flexibility. Developers can integrate infrastructure directly into scripts, pipelines, and applications.
This approach turns the cloud into a programmable platform.
As systems grow more complex and deployment cycles accelerate, the ability to automate infrastructure will only become more important.
For modern development teams, treating infrastructure as code is no longer optional. It's the foundation of reliable and scalable software delivery.
Hope you enjoyed this article. Learn more about me by visiting my LinkedIn.
 


 How to Build a Full-Stack CRUD App with React, AWS Lambda, DynamoDB, and Cognito Auth 
Benedicta Onyebuchi — Tue, 17 Mar 2026 15:13:02 +0000
 Building a web application that works only on your local machine is one thing. Building one that is secure, connected to a real database, and accessible to anyone on the internet is another challenge entirely. And it requires a different set of tools.
Most production web applications share a common set of needs: they store and retrieve data, they expose that data through an API, they require users to authenticate before accessing sensitive operations, and they need to be deployed somewhere reliable and fast.
Meeting all of those needs used to require managing servers, configuring databases, handling authentication infrastructure, and provisioning hosting environments – often as separate, manual processes.
AWS changes that model significantly. With the combination of services you'll use in this tutorial (Lambda, DynamoDB, API Gateway, Cognito, and CloudFront), you can build and deploy a fully functional, secured, globally distributed application without managing a single server.
Each service handles one specific responsibility:

DynamoDB stores your data

Lambda runs your business logic on demand

API Gateway exposes your functions as a REST API

Cognito manages user authentication

CloudFront delivers your frontend worldwide over HTTPS


The AWS CDK (Cloud Development Kit) ties all of this together by letting you define every one of those services as TypeScript code. Instead of clicking through the AWS Console to configure each resource manually, you describe your entire infrastructure in a single file and deploy it with one command.
By the end of this tutorial, you will have a fully deployed vendor management dashboard. Users can sign up, log in, and then create, read, and delete vendors, with all data securely stored in Amazon DynamoDB and all routes protected by Amazon Cognito authentication.
What You'll Build
In this tutorial, you'll build a two-panel web app where authenticated users can:

Add a new vendor (name, category, contact email)

View all saved vendors in real time

Delete a vendor from the list

Sign in and sign out securely


The frontend is built with Next.js. The backend runs entirely on AWS: DynamoDB stores the data, Lambda functions handle the logic, API Gateway exposes a REST API, Cognito manages authentication, and CloudFront serves the app globally over HTTPS.
Table of Contents

Who This Is For

Prerequisites

Architecture Overview

Part 1: Set Up Your AWS Account and Tools

Part 2: Set Up the Project Structure

Part 3: Define the Database (DynamoDB)

Part 4: Write the Lambda Functions

Part 5: Build the API with API Gateway

Part 6: Deploy the Backend to AWS

Part 7: Build the React Frontend

Part 8: Add Authentication with Amazon Cognito

Part 9: Deploy the Frontend with S3 and CloudFront

What You Built

Conclusion


Who This Is For
This tutorial is for developers who know basic JavaScript and React but have never used AWS. You don't need any prior backend, cloud, or DevOps experience. I'll explain every AWS concept before we use it.
Prerequisites
Before starting, make sure you have the following installed and available:

Node.js 18 or higher: Download here

npm: Included with Node.js

A code editor: I recommend VS Code

A terminal: Any terminal on macOS, Linux, or Windows (WSL recommended on Windows)

An AWS account: You will create one in Part 1. A credit card is required, but the Free Tier covers everything in this tutorial.

Basic familiarity with React and TypeScript: You should understand components, useState, and useEffect.


Architecture Overview
Before writing any code, here's a plain-English description of how the pieces fit together.
When a user clicks "Add Vendor" in the React app:

The frontend reads the user's JWT auth token from the browser session

It sends a POST request to API Gateway, including the token in the request header

API Gateway checks the token against Cognito. If the token is invalid or missing, it rejects the request with a 401 error immediately

If the token is valid, API Gateway passes the request to the createVendor Lambda function

The Lambda function writes the new vendor to DynamoDB

DynamoDB confirms the write, and the Lambda returns a success response

The frontend re-fetches the vendor list and updates the UI


The same flow applies to reading and deleting vendors, with different Lambda functions and HTTP methods.
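The request flow can be simulated in a few lines to show the order of the checks. This is a toy model only: the `validTokens` set stands in for Cognito, the `table` map stands in for DynamoDB, and `handlePost` plays the role of API Gateway. None of these names come from AWS.

```typescript
type Vendor = { vendorId: string; name: string; category: string; contactEmail: string };
type HttpResponse = { statusCode: number; body: string };

// Stand-ins for Cognito and DynamoDB, for illustration only.
const validTokens = new Set(["valid-token"]);
const table = new Map<string, Vendor>();

// Plays the role of the createVendor Lambda: write the item, report success.
function createVendor(vendor: Vendor): HttpResponse {
  table.set(vendor.vendorId, vendor);
  return { statusCode: 201, body: JSON.stringify({ vendorId: vendor.vendorId }) };
}

// Plays the role of API Gateway: check the token first, then forward the request.
function handlePost(token: string | undefined, vendor: Vendor): HttpResponse {
  if (!token || !validTokens.has(token)) {
    return { statusCode: 401, body: JSON.stringify({ message: "Unauthorized" }) };
  }
  return createVendor(vendor);
}

const vendor: Vendor = {
  vendorId: "v-1",
  name: "Acme",
  category: "Supplies",
  contactEmail: "sales@acme.example",
};
const rejected = handlePost(undefined, vendor); // no token: rejected before any write
const accepted = handlePost("valid-token", vendor); // valid token: vendor is stored
```

The point to notice is that the rejected request never reaches the function that writes data, which mirrors how the authorizer sits in front of the Lambda functions.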


How the app is deployed: Your React app is exported as a static site, uploaded to an S3 bucket, and served globally through CloudFront. Your backend infrastructure (Lambda functions, API Gateway, DynamoDB, Cognito) is defined in TypeScript using AWS CDK and deployed with a single command.
Part 1: Set Up Your AWS Account and Tools
Before writing any application code, you need three things in place: an AWS account, the right tools on your machine, and credentials that let those tools communicate with AWS on your behalf.
1.1 Create Your AWS Account
If you don't have an AWS account:

Go to https://aws.amazon.com

Click Create an AWS Account

Follow the sign-up prompts and add a payment method

Once registered, log in to the AWS Management Console


AWS has a Free Tier that covers all the services used in this tutorial. You won't be charged for normal use while following along.
1.2 Install the AWS CLI and CDK
The AWS CLI is a command-line tool that lets you interact with AWS from your terminal: checking resources, configuring credentials, and more.
The AWS CDK (Cloud Development Kit) is the tool you will use to define your entire backend (database, Lambda functions, API) using TypeScript code. Instead of clicking through the AWS Console to create each resource, you describe what you want in a TypeScript file and CDK builds it for you.
Install both:
# Install AWS CLI (macOS)
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

# For Linux, see: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html
# For Windows, see: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-windows.html

# Install AWS CDK globally
npm install -g aws-cdk

Verify both are installed:
aws --version
cdk --version

Both commands should print a version number. If they do, you are ready to move on.
1.3 Configure Your AWS Credentials (IAM)
This step is critical. Your terminal needs a set of credentials – like a username and password – to act on your behalf inside AWS.
Think of your root account (the one you signed up with) as the master key to your entire AWS account. You should never use it for day-to-day development. Instead, you will create a separate IAM user with its own set of keys. If those keys are ever exposed, you can delete them without compromising your root account.
Phase 1: Create an IAM User

Log in to the AWS Console and search for IAM in the top search bar

In the left sidebar, click Users, then click Create user

Name the user cdk-dev. Leave "Provide user access to the AWS Management Console" unchecked – you only need terminal access, not console access

On the permissions screen, choose Attach policies directly





Search for AdministratorAccess and check the box next to it

Note on permissions: In a production environment, you would use a more restrictive policy. For this tutorial, AdministratorAccess is needed because CDK creates many different types of AWS resources.

Click through to the end and click Create user
Phase 2: Generate Access Keys

Click on your newly created cdk-dev user from the Users list

Go to the Security credentials tab

Scroll down to Access keys and click Create access key

Select Command Line Interface (CLI), check the acknowledgment box, and click Next

Click Create access key


Important: Copy both the Access Key ID and the Secret Access Key right now. You will never be able to see the Secret Access Key again after closing this screen. Save both values in a password manager or secure note.


Phase 3: Connect Your Terminal to AWS
Run the following command in your terminal:
aws configure

You will be prompted for four values:
AWS Access Key ID:     [paste your Access Key ID]
AWS Secret Access Key: [paste your Secret Access Key]
Default region name:   us-east-1
Default output format: json

Use us-east-1 as your region for this tutorial. After this step, every CDK and AWS CLI command you run will use these credentials automatically.
Part 2: Set Up the Project Structure
You will use a monorepo layout – one top-level folder with two sub-projects inside: frontend for your React app and backend for your AWS infrastructure code. They are deployed independently but live side by side.
2.1 Create the Workspace
mkdir vendor-tracker && cd vendor-tracker
mkdir backend frontend

2.2 Initialize the Frontend (Next.js)
Navigate into the frontend folder and run:
cd frontend
npx create-next-app@latest .

When prompted, choose the following options:

TypeScript --> Yes

ESLint --> Yes

Tailwind CSS --> Yes

src/ directory --> No

App Router --> Yes

Import alias --> No


2.3 Initialize the Backend (CDK)
Navigate into the backend folder and run:
cd ../backend
cdk init app --language typescript

This generates a boilerplate CDK project. The most important file it creates is backend/lib/backend-stack.ts. This is where you will define all of your AWS infrastructure as TypeScript code.
Also install esbuild, which CDK uses to bundle your Lambda functions:
npm install --save-dev esbuild

2.4 Understanding CDK Before You Write Any Code
CDK is likely different from most tools you have used. Here is how it works:
Normally, you would create AWS resources by clicking through the AWS Console: create a table here, configure a Lambda function there. CDK lets you do all of that using TypeScript code instead.
When you run cdk deploy, CDK reads your TypeScript file, converts it into an AWS CloudFormation template (an internal AWS format for describing infrastructure), and submits it to AWS. AWS then creates all the resources you described.
A few terms you will see throughout this tutorial:

Stack: The collection of all AWS resources you define together. Your BackendStack class is your stack.

Construct: Each individual AWS resource you create inside a stack (a table, a Lambda function, an API) is called a construct.

Deploy: Running cdk deploy sends your TypeScript definition to AWS and creates or updates the real resources.


The main file you'll work in is backend/lib/backend-stack.ts. Think of it as the blueprint for your entire backend.
Your final project structure will look like this:
vendor-tracker/
├── backend/
│   ├── lambda/
│   │   ├── createVendor.ts
│   │   ├── getVendors.ts
│   │   └── deleteVendor.ts
│   ├── lib/
│   │   └── backend-stack.ts
│   └── package.json
└── frontend/
    ├── app/
    │   ├── layout.tsx
    │   ├── page.tsx
    │   └── providers.tsx
    ├── lib/
    │   └── api.ts
    ├── types/
    │   └── vendor.ts
    └── .env.local

Part 3: Define the Database (DynamoDB)
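DynamoDB is AWS's NoSQL database. Think of it as a fast, scalable key-value store in the cloud. Every item in a DynamoDB table is identified by a primary key, which must be unique within the table. This tutorial uses a simple primary key made of a single attribute, called the partition key. For your vendor table, that key will be vendorId.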
Open backend/lib/backend-stack.ts. Replace the entire file contents with the following:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: {
        name: 'vendorId',
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY, // For development only
    });
  }
}

What each line does:

partitionKey tells DynamoDB that vendorId is the unique identifier for every record. No two vendors can share the same vendorId.

PAY_PER_REQUEST means you only pay when data is actually read or written. There is no charge when the table is idle, which makes it cost-effective for learning.

RemovalPolicy.DESTROY means the table will be deleted when you run cdk destroy. For production apps you would not use this.
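To see what "unique identifier" means in practice, here is a toy in-memory model of a table keyed on vendorId. `FakeTable` is a made-up class, not part of the AWS SDK. It only mimics one behavior: writing an item whose key already exists replaces the stored item instead of adding a second one.

```typescript
type Item = { vendorId: string; name: string };

// In-memory stand-in for a table whose primary key is the partition key alone.
class FakeTable {
  private items = new Map<string, Item>();

  // Like a put operation: writing an existing key replaces the stored item.
  put(item: Item): void {
    this.items.set(item.vendorId, item);
  }
  // Like a get operation: look up a single item by its key.
  get(vendorId: string): Item | undefined {
    return this.items.get(vendorId);
  }
  size(): number {
    return this.items.size;
  }
}

const vendors = new FakeTable();
vendors.put({ vendorId: "v-1", name: "Acme" });
vendors.put({ vendorId: "v-2", name: "Globex" });
vendors.put({ vendorId: "v-1", name: "Acme Ltd" }); // same key: replaces the first item
```

After the three writes the table holds two items, which is why each new vendor needs a freshly generated vendorId.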


Part 4: Write the Lambda Functions
A Lambda function is your server, but unlike a traditional server, it only runs when it's called. AWS spins it up on demand, runs your code, and shuts it down. You're only charged for the time your code is actually running.
You'll write three Lambda functions:

createVendor.ts: Adds a new vendor to DynamoDB

getVendors.ts: Returns all vendors from DynamoDB

deleteVendor.ts: Removes a vendor from DynamoDB by ID


Create a new lambda folder inside backend. Run this from the vendor-tracker root folder:
mkdir backend/lambda



A Note on the AWS SDK
All three Lambda functions use AWS SDK v3 (@aws-sdk/client-dynamodb and @aws-sdk/lib-dynamodb). This is the current standard. An older version of the SDK (aws-sdk) exists but is deprecated and not bundled in the Node.js 18 Lambda runtime, which is what you'll use. Stick to v3 throughout.
4.1 Create Vendor Lambda
Create backend/lambda/createVendor.ts:
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import { randomUUID } from "crypto";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async (event: any) => {
  try {
    const body = JSON.parse(event.body);

    const item = {
      vendorId: randomUUID(), // Generates a collision-safe unique ID
      name: body.name,
      category: body.category,
      contactEmail: body.contactEmail,
      createdAt: new Date().toISOString(),
    };

    await docClient.send(
      new PutCommand({
        TableName: process.env.TABLE_NAME!,
        Item: item,
      })
    );

    return {
      statusCode: 201,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
      },
      body: JSON.stringify({ message: "Vendor created", vendorId: item.vendorId }),
    };
  } catch (error) {
    console.error("Error creating vendor:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to create vendor" }),
    };
  }
};

What each part does:

randomUUID() generates a universally unique ID using Node's built-in crypto module. No extra package is needed. This is more reliable than Date.now(), which can produce duplicate IDs if two requests arrive within the same millisecond.

process.env.TABLE_NAME reads the DynamoDB table name from an environment variable. You'll set this value in the CDK stack. This avoids hardcoding the table name inside your Lambda code.

The headers block is required for CORS (Cross-Origin Resource Sharing). Without Access-Control-Allow-Origin, your browser will block responses from a different domain than your frontend. Without Access-Control-Allow-Headers, the Authorization header you add later for Cognito will be rejected during the browser's preflight check.
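The handler above writes whatever the caller sends, so a request with a missing name still creates a vendor. Here is a minimal sketch of validating the body before the PutCommand runs. The names parseVendorInput, VendorInput, and ParseResult are hypothetical helpers for illustration, not part of the tutorial's code:

```typescript
// Hypothetical helper: validate the raw request body before writing to DynamoDB.
interface VendorInput {
  name: string;
  category: string;
  contactEmail: string;
}

type ParseResult =
  | { ok: true; value: VendorInput }
  | { ok: false; error: string };

const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v.trim().length > 0;

function parseVendorInput(rawBody: string | null | undefined): ParseResult {
  if (!rawBody) return { ok: false, error: "Request body is required" };

  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { ok: false, error: "Request body must be valid JSON" };
  }

  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "Request body must be a JSON object" };
  }

  const { name, category, contactEmail } = parsed as Record<string, unknown>;
  if (!isNonEmptyString(name)) return { ok: false, error: "name is required" };
  if (!isNonEmptyString(category)) return { ok: false, error: "category is required" };
  if (!isNonEmptyString(contactEmail)) return { ok: false, error: "contactEmail is required" };

  return {
    ok: true,
    value: {
      name: name.trim(),
      category: category.trim(),
      contactEmail: contactEmail.trim(),
    },
  };
}
```

Inside the handler, you could call parseVendorInput(event.body) first and return a 400 response with the error message when ok is false, mirroring the guard used in the delete handler.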


4.2 Get Vendors Lambda
Create backend/lambda/getVendors.ts:
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, ScanCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async () => {
  try {
    const response = await docClient.send(
      new ScanCommand({
        TableName: process.env.TABLE_NAME!,
      })
    );

    return {
      statusCode: 200,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Content-Type": "application/json",
      },
      body: JSON.stringify(response.Items ?? []),
    };
  } catch (error) {
    console.error("Error fetching vendors:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to fetch vendors" }),
    };
  }
};

What each part does:

ScanCommand reads every item in the table. For a learning project this is fine, but note that a single Scan call returns at most 1 MB of data, so a larger table requires following LastEvaluatedKey across several calls. In a production app with millions of rows, you would use a more targeted QueryCommand to avoid reading the entire table on every request.

response.Items ?? [] returns an empty array if the table is empty, preventing the frontend from crashing when there are no vendors yet.
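If the table grows beyond what one Scan call returns, the handler would need to keep requesting pages. The sketch below shows that loop in isolation. collectAllPages, Page, and the in-memory fakePager are hypothetical names used only for illustration, and the fake pager stands in for DynamoDB so the example runs without AWS access:

```typescript
// One page of results, shaped like the relevant parts of a Scan response.
interface Page<T, K> {
  Items?: T[];
  LastEvaluatedKey?: K;
}

// Keeps requesting pages until no LastEvaluatedKey is returned.
async function collectAllPages<T, K>(
  fetchPage: (startKey: K | undefined) => Promise<Page<T, K>>
): Promise<T[]> {
  const items: T[] = [];
  let startKey: K | undefined = undefined;
  do {
    const page: Page<T, K> = await fetchPage(startKey);
    items.push(...(page.Items ?? []));
    startKey = page.LastEvaluatedKey;
  } while (startKey !== undefined);
  return items;
}

// In-memory stand-in for the table: returns two items per "page".
const fakeTable = ["v1", "v2", "v3", "v4", "v5"];
const fakePager = async (
  startKey: number | undefined
): Promise<Page<string, number>> => {
  const start = startKey ?? 0;
  const end = Math.min(start + 2, fakeTable.length);
  return {
    Items: fakeTable.slice(start, end),
    LastEvaluatedKey: end < fakeTable.length ? end : undefined,
  };
};
```

In the real Lambda, the fetchPage argument would wrap docClient.send(new ScanCommand({ TableName, ExclusiveStartKey: startKey })) instead of the fake pager.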


4.3 Delete Vendor Lambda
Create backend/lambda/deleteVendor.ts:
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, DeleteCommand } from "@aws-sdk/lib-dynamodb";

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

export const handler = async (event: any) => {
  try {
    const body = JSON.parse(event.body);
    const { vendorId } = body;

    if (!vendorId) {
      return {
        statusCode: 400,
        headers: { "Access-Control-Allow-Origin": "*" },
        body: JSON.stringify({ error: "vendorId is required" }),
      };
    }

    await docClient.send(
      new DeleteCommand({
        TableName: process.env.TABLE_NAME!,
        Key: { vendorId },
      })
    );

    return {
      statusCode: 200,
      headers: {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
      },
      body: JSON.stringify({ message: "Vendor deleted" }),
    };
  } catch (error) {
    console.error("Error deleting vendor:", error);
    return {
      statusCode: 500,
      headers: { "Access-Control-Allow-Origin": "*" },
      body: JSON.stringify({ error: "Failed to delete vendor" }),
    };
  }
};

What each part does:

DeleteCommand removes the item whose vendorId matches the key you provide. DynamoDB doesn't return an error if the item doesn't exist. It simply does nothing.

The 400 guard at the top returns a clear error if the caller forgets to send a vendorId, rather than letting DynamoDB throw a confusing internal error.
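All three handlers repeat the same CORS headers in every return statement. One possible way to reduce that duplication is a small shared helper. This is a sketch only: jsonResponse, ApiResponse, and CORS_HEADERS are hypothetical names, not something the tutorial defines:

```typescript
// Shape of the object the handlers in this tutorial return to API Gateway.
interface ApiResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// The same CORS headers used by the handlers above, defined once.
const CORS_HEADERS: Record<string, string> = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Headers": "Content-Type,Authorization",
  "Access-Control-Allow-Methods": "OPTIONS,POST,GET,DELETE",
};

// Builds a JSON response with the CORS headers attached.
function jsonResponse(statusCode: number, payload: unknown): ApiResponse {
  return {
    statusCode,
    headers: { ...CORS_HEADERS, "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

With this in place, the 400 guard could shrink to return jsonResponse(400, { error: "vendorId is required" }), and error responses would carry the same headers as successful ones.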


Part 5: Build the API with API Gateway
API Gateway is what gives your Lambda functions a public URL. Without it, there's no way for your browser to trigger a Lambda function. Think of it as the front door of your backend: it receives HTTP requests, checks whether the caller is authorized, routes the request to the correct Lambda, and returns the Lambda's response to the caller.
Now you'll wire everything together in backend/lib/backend-stack.ts.
5.1 Add Lambda Functions and API Gateway to the Stack
Replace the entire contents of backend/lib/backend-stack.ts with this complete, assembled file:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table 
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: {
        name: 'vendorId',
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // 2. Lambda Functions
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // 3. Permissions (Least Privilege)
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // 4. API Gateway
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda));
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda));
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda));

    // 5. Outputs
    new cdk.CfnOutput(this, 'ApiEndpoint', {
      value: api.url,
    });
  }
}

What each section does:
NodejsFunction is a special CDK construct that automatically bundles your Lambda code and all its dependencies into a single file using esbuild before uploading it to AWS. This is why you installed esbuild in Part 2.
For TypeScript handlers like these, prefer NodejsFunction over the basic lambda.Function construct. The basic version leaves compiling and bundling to you, and a missed step typically shows up as a "Module not found" error at runtime.
Permissions (Least Privilege): In AWS, no resource can communicate with any other resource by default. A Lambda function has no access to DynamoDB, S3, or anything else unless you explicitly grant it.
This is called the Least Privilege principle: each piece of your system gets exactly the permissions it needs, and nothing more. grantWriteData lets a Lambda write and delete items. grantReadData lets a Lambda read items. Using separate grants for each function means the getVendors Lambda can never accidentally delete data.
CfnOutput prints a value to your terminal after cdk deploy completes. You'll use the ApiEndpoint URL to configure your frontend.
Part 6: Deploy the Backend to AWS
Your infrastructure is fully defined in code. Now you'll deploy it to AWS and get a live API URL.
6.1 Bootstrap Your AWS Environment
Before your first CDK deployment, AWS needs a small landing zone in your account – an S3 bucket where CDK can upload your Lambda bundles and other assets. This setup step is called bootstrapping and only needs to be done once per AWS account per region.
From inside your backend folder, run:
cdk bootstrap

Important: Bootstrapping is region-specific. If you ever switch to a different AWS region, you will need to run cdk bootstrap again in that region.
6.2 Deploy
Run:
cdk deploy

CDK will display a summary of everything it is about to create and ask for your confirmation. Type y and press Enter.
When the deployment finishes, you'll see an Outputs section in your terminal:
Outputs:
BackendStack.ApiEndpoint = https://abcdef123.execute-api.us-east-1.amazonaws.com/prod/

Copy that URL. You'll need it when building the frontend.
6.3 Troubleshooting: How to Read AWS Error Logs
Real deployments rarely go perfectly the first time. If something goes wrong after deploying, here is how to find the actual error message.
Error: 502 Bad Gateway
A 502 means API Gateway received your request but your Lambda crashed before it could respond. The most common cause is a missing environment variable – for example, if TABLE_NAME was not passed correctly and the Lambda cannot find the table.
To find the actual error message, use CloudWatch Logs:

Log in to the AWS Console and search for CloudWatch

In the left sidebar, click Logs --> Log groups





Find the group named /aws/lambda/BackendStack-CreateVendorHandler...

Click the most recent Log stream

Read the error message. It will tell you exactly what went wrong


Two common messages and their fixes:

Runtime.ImportModuleError: Your Lambda cannot find a module. Make sure you're using NodejsFunction (not lambda.Function) in your CDK stack. NodejsFunction automatically bundles dependencies; lambda.Function does not.

AccessDeniedException: Your Lambda tried to access DynamoDB but doesn't have permission. Check that you have the correct grantWriteData or grantReadData call in your stack for that Lambda.
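Since a missing TABLE_NAME is described above as the most common cause of a 502, it can help to fail with an explicit message rather than a vague DynamoDB error. The following is a minimal sketch under that assumption. requireEnv is a hypothetical helper, and it takes the environment as a parameter so it can be exercised without a real Lambda:

```typescript
// Hypothetical helper: read a required variable or fail with a clear message.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Lambda file you might call it once at module load, for example:
//   const TABLE_NAME = requireEnv(process.env, "TABLE_NAME");
```

If the variable is absent, the thrown message should appear in the Lambda's CloudWatch log stream, which makes the cause easier to spot than a generic failure.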


Part 7: Build the React Frontend
Your backend is live. Now you'll build the React UI that talks to it.
7.1 Define the Vendor Type
Before writing any API or component code, define what a "vendor" looks like in TypeScript. This gives you type safety throughout your frontend code.
Create frontend/types/vendor.ts:
export interface Vendor {
  vendorId?: string; // Optional when creating — the Lambda generates it
  name: string;
  category: string;
  contactEmail: string;
  createdAt?: string;
}

The vendorId? is marked optional with ? because when you are creating a new vendor, you don't have an ID yet. The createVendor Lambda generates one. When you read vendors back from the API, vendorId will always be present.
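TypeScript interfaces disappear at runtime, so the Vendor type alone cannot guarantee that the API actually returned objects of this shape. A minimal sketch of a runtime check follows. The isVendor function is a hypothetical addition, and the interface is repeated here only so the example is self-contained:

```typescript
interface Vendor {
  vendorId?: string;
  name: string;
  category: string;
  contactEmail: string;
  createdAt?: string;
}

// Hypothetical type guard: returns true only if the value has the Vendor shape.
function isVendor(value: unknown): value is Vendor {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const optionalString = (x: unknown): boolean =>
    x === undefined || typeof x === "string";
  return (
    typeof v.name === "string" &&
    typeof v.category === "string" &&
    typeof v.contactEmail === "string" &&
    optionalString(v.vendorId) &&
    optionalString(v.createdAt)
  );
}
```

A caller could filter a parsed response with data.filter(isVendor) before putting it into React state, so a malformed item never reaches the UI.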
7.2 Create the API Service Layer
Rather than writing fetch calls directly inside your React components, you'll centralize all your API logic in one file. This pattern is called a service layer. It keeps your components clean and makes it easy to update API calls in one place.
First, create a .env.local file inside your frontend folder to store your API URL:
# frontend/.env.local
NEXT_PUBLIC_API_URL=https://abcdef123.execute-api.us-east-1.amazonaws.com/prod

Replace the URL with the ApiEndpoint value from your cdk deploy output. The NEXT_PUBLIC_ prefix is required by Next.js to make an environment variable accessible in the browser.
You might be wondering: why not hardcode the URL? Keeping environment-specific values in .env.local means your code doesn't change when the URL does, and the value stays out of your GitHub repository. Keep in mind that NEXT_PUBLIC_ variables are embedded in the JavaScript sent to the browser, so the URL itself isn't a secret; Cognito is what will protect your data. Real secrets should never use the NEXT_PUBLIC_ prefix. Always use .env.local and add it to your .gitignore.
Make sure .env.local is in your .gitignore:
echo ".env.local" >> frontend/.gitignore

Now create frontend/lib/api.ts:
import { Vendor } from '@/types/vendor';

const BASE_URL = process.env.NEXT_PUBLIC_API_URL!;

export const getVendors = async (): Promise<Vendor[]> => {
  const response = await fetch(`${BASE_URL}/vendors`);
  if (!response.ok) throw new Error('Failed to fetch vendors');
  return response.json();
};

export const createVendor = async (vendor: Omit<Vendor, 'vendorId' | 'createdAt'>): Promise<void> => {
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(vendor),
  });
  if (!response.ok) throw new Error('Failed to create vendor');
};

export const deleteVendor = async (vendorId: string): Promise<void> => {
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'DELETE',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ vendorId }),
  });
  if (!response.ok) throw new Error('Failed to delete vendor');
};

What each part does:

Omit<Vendor, 'vendorId' | 'createdAt'> means the createVendor function accepts a vendor without an ID or timestamp (those are generated server-side).

if (!response.ok) throw new Error(...) ensures that any HTTP error (4xx or 5xx) surfaces as a JavaScript error in your component, where you can show the user a meaningful message instead of silently failing.


You'll update these functions later in Part 8 to include the Cognito auth token.
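One small pitfall: the ApiEndpoint printed by cdk deploy ends with a trailing slash, while the template strings above add their own slash before vendors. If the URL is pasted into .env.local with the trailing slash, requests go to a path containing a double slash. The sketch below shows one way to make the join tolerant of either form. joinUrl is a hypothetical helper, not part of the tutorial's api.ts:

```typescript
// Hypothetical helper: join a base URL and a path with exactly one slash.
function joinUrl(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes
  const trimmedPath = path.replace(/^\/+/, ""); // drop leading slashes
  return `${trimmedBase}/${trimmedPath}`;
}
```

With this helper, the fetch calls could use joinUrl(BASE_URL, 'vendors') and behave the same whether or not the configured URL ends in a slash.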
7.3 Build the Main Page
Now create the main page component. It includes a form for adding vendors and a live list that displays all current vendors.
Replace the contents of frontend/app/page.tsx with:
'use client';

import { useState, useEffect } from 'react';
import { createVendor, getVendors, deleteVendor } from '@/lib/api';
import { Vendor } from '@/types/vendor';

export default function Home() {
  const [vendors, setVendors] = useState<Vendor[]>([]);
  const [form, setForm] = useState({ name: '', category: '', contactEmail: '' });
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');

  const loadVendors = async () => {
    try {
      const data = await getVendors();
      setVendors(data);
    } catch {
      setError('Failed to load vendors.');
    }
  };

  // Load vendors once when the page first renders
  useEffect(() => {
    loadVendors();
  }, []);
  // The empty [] means this runs only once. Without it, the effect would
  // run after every render, causing an infinite loop of fetch requests.

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault(); // Prevent the browser from reloading the page on submit
    setLoading(true);
    setError('');
    try {
      await createVendor(form);
      setForm({ name: '', category: '', contactEmail: '' }); // Reset the form
      await loadVendors(); // Refresh the list from DynamoDB
    } catch {
      setError('Failed to add vendor. Please try again.');
    } finally {
      setLoading(false);
    }
  };

  const handleDelete = async (vendorId: string) => {
    try {
      await deleteVendor(vendorId);
      await loadVendors(); // Refresh after deleting
    } catch {
      setError('Failed to delete vendor.');
    }
  };

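  return (
    <main>
      <h1>Vendor Tracker</h1>
      <p>Manage your vendors, stored in AWS DynamoDB.</p>

      {error && (
        <p role="alert">{error}</p>
      )}

      <div style={{ display: 'grid', gridTemplateColumns: '1fr 1fr', gap: '2rem' }}>

        {/* ── Add Vendor Form ── */}
        <section>
          <h2>Add New Vendor</h2>
          <form onSubmit={handleSubmit}>
            <input
              type="text" placeholder="Name" value={form.name}
              onChange={e => setForm({ ...form, name: e.target.value })}
              required
            />
            <input
              type="text" placeholder="Category" value={form.category}
              onChange={e => setForm({ ...form, category: e.target.value })}
              required
            />
            <input
              type="email" placeholder="Contact email" value={form.contactEmail}
              onChange={e => setForm({ ...form, contactEmail: e.target.value })}
              required
            />
            <button type="submit" disabled={loading}>{loading ? 'Adding...' : 'Add Vendor'}</button>
          </form>
        </section>

        {/* ── Vendor List ── */}
        <section>
          <h2>
            Current Vendors ({vendors.length})
          </h2>
          <div>
            {vendors.length === 0 ? (
              <p>No vendors yet. Add one using the form.</p>
            ) : (
              vendors.map(v => (
                <div key={v.vendorId}>
                  <div>
                    <p>{v.name}</p>
                    <p>{v.category} · {v.contactEmail}</p>
                  </div>
                  <button onClick={() => handleDelete(v.vendorId!)}>Delete</button>
                </div>
              ))
            )}
          </div>
        </section>

      </div>
    </main>
  );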
}

Key points in this component:

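'use client' at the top is a Next.js directive. It marks this file as a Client Component, which is required for anything that uses React state, effects, or event handlers (useState, useEffect, onSubmit). Next.js can still pre-render the initial HTML, but the component's JavaScript is shipped to the browser, where the hooks and handlers run.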

e.preventDefault() inside handleSubmit stops the browser's default form submission behavior, which would cause a full page reload and wipe your React state.

After every createVendor or deleteVendor call, loadVendors() is called again. This re-fetches the latest data from DynamoDB so the UI always matches what is actually stored in the database.


7.4 Test the App Locally
Start your Next.js development server:
cd frontend
npm run dev

Open http://localhost:3000 in your browser. You should see the two-panel layout. Try adding a vendor and confirm it appears in the list.




Verifying the connection to AWS:
Open Chrome DevTools (F12) and click the Network tab. When you add a vendor, you should see:

A POST request to your AWS API URL returning a 201 status code

A GET request returning 200 with the updated vendor list


You can also verify the data was saved by opening the AWS Console, navigating to DynamoDB --> Tables --> VendorTable --> Explore table items. Your vendor should appear there.
Part 8: Add Authentication with Amazon Cognito
Right now your API is completely open. Anyone who finds your API URL can add or delete vendors. You'll fix that with Amazon Cognito.
Cognito is AWS's authentication service. It manages a User Pool – a database of registered users with usernames and passwords. When a user logs in, Cognito issues a JWT (JSON Web Token): a cryptographically signed string that proves who the user is. Your API Gateway will check for this token on every request. No valid token means no access.
What is a JWT? A JSON Web Token is a string that looks like eyJhbGci.... It contains encoded information about the user and is signed by Cognito using a private key that only Cognito holds.
API Gateway can verify the signature using Cognito's published public keys, without contacting Cognito on every request, which makes token checking fast. Think of it as a tamper-proof badge: anyone can read the name on it, but only Cognito's signature makes it valid.
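To make the "anyone can read the name on it" point concrete, here is a minimal sketch that decodes the payload of a token without verifying it. It assumes a Node.js environment (for Buffer), and decodeJwtPayload, toBase64Url, and the made-up sampleToken are hypothetical names for illustration. The sample token is unsigned and would never pass API Gateway's check:

```typescript
// Decodes the payload (middle segment) of a JWT WITHOUT verifying the signature.
// Reading claims this way proves nothing about who issued the token.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const segments = token.split(".");
  if (segments.length !== 3) {
    throw new Error("A JWT has exactly three dot-separated segments");
  }
  const payload = segments[1] ?? "";
  const json = Buffer.from(payload, "base64url").toString("utf8");
  return JSON.parse(json) as Record<string, unknown>;
}

// Build a made-up token purely for demonstration (placeholder signature).
const toBase64Url = (obj: object): string =>
  Buffer.from(JSON.stringify(obj)).toString("base64url");

const sampleToken = [
  toBase64Url({ alg: "RS256", typ: "JWT" }),
  toBase64Url({ email: "user@example.com", exp: 1700000000 }),
  "signature-placeholder",
].join(".");

const claims = decodeJwtPayload(sampleToken);
console.log(claims.email); // the email claim is readable by anyone holding the token
```

This is why a JWT should never carry sensitive data: the payload is only encoded, not encrypted. The signature is what prevents someone from changing the claims undetected.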
8.1 Add Cognito to the CDK Stack
Open backend/lib/backend-stack.ts and update it to include Cognito. Here is the complete updated file:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // ─── 1. DynamoDB Table ────────────────────────────────────────────────────
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: { name: 'vendorId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // ─── 2. Lambda Functions ──────────────────────────────────────────────────
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // ─── 3. Permissions ───────────────────────────────────────────────────────
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // ─── 4. Cognito User Pool ─────────────────────────────────────────────────
    const userPool = new cognito.UserPool(this, 'VendorUserPool', {
      selfSignUpEnabled: true,
      signInAliases: { email: true },
      autoVerify: { email: true },
      userVerification: {
        emailStyle: cognito.VerificationEmailStyle.CODE,
      },
    });

    // Required to host Cognito's internal auth endpoints
    userPool.addDomain('VendorUserPoolDomain', {
      cognitoDomain: {
        domainPrefix: `vendor-tracker-${this.account}`,
      },
    });

    const userPoolClient = userPool.addClient('VendorAppClient');

    // ─── 5. API Gateway + Authorizer ──────────────────────────────────────────
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const authorizer = new apigateway.CognitoUserPoolsAuthorizer(
      this,
      'VendorAuthorizer',
      { cognitoUserPools: [userPool] }
    );

    const authOptions = {
      authorizer,
      authorizationType: apigateway.AuthorizationType.COGNITO,
    };

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda), authOptions);
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda), authOptions);
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda), authOptions);

    // ─── 6. Outputs ───────────────────────────────────────────────────────────
    new cdk.CfnOutput(this, 'ApiEndpoint', { value: api.url });
    new cdk.CfnOutput(this, 'UserPoolId', { value: userPool.userPoolId });
    new cdk.CfnOutput(this, 'UserPoolClientId', { value: userPoolClient.userPoolClientId });
  }
}



What changed:

CognitoUserPoolsAuthorizer tells API Gateway to check every request for a valid Cognito JWT before passing it to any Lambda. If the token is missing or invalid, API Gateway rejects the request with a 401 Unauthorized response without ever touching your Lambda.

authOptions is applied to all three API methods: GET, POST, and DELETE. All routes are now protected.

autoVerify: { email: true } tells Cognito to send a verification code to the user's email address at sign-up and to mark the email attribute as verified once the user enters that code. It doesn't skip the verification email. If you want to skip verification during development, you can manually confirm users in the Cognito console (covered in section 8.6).

Two new CfnOutput values (UserPoolId and UserPoolClientId) will appear in your terminal after the next deployment. Your frontend needs them to connect to Cognito.


Deploy the updated stack:
cd backend
cdk deploy

After deployment, your terminal output will include three values:
Outputs:
BackendStack.ApiEndpoint     = https://abc123.execute-api.us-east-1.amazonaws.com/prod/
BackendStack.UserPoolId      = us-east-1_xxxxxxxx
BackendStack.UserPoolClientId = xxxxxxxxxxxxxxxxxxxx

Save all three values. You'll use them in the next step.
8.2 Install and Configure AWS Amplify
AWS Amplify is a frontend library that handles all the complex authentication logic for you: it manages the login UI, stores tokens in the browser, refreshes expired tokens automatically, and exposes a simple API to read the current user's session.
Install the Amplify libraries inside your frontend folder:
cd frontend
npm install aws-amplify @aws-amplify/ui-react

Create frontend/app/providers.tsx. This file initializes Amplify with your Cognito configuration. It runs once when the app loads:
'use client';

import { Amplify } from 'aws-amplify';

Amplify.configure(
  {
    Auth: {
      Cognito: {
        userPoolId: process.env.NEXT_PUBLIC_USER_POOL_ID!,
        userPoolClientId: process.env.NEXT_PUBLIC_USER_POOL_CLIENT_ID!,
      },
    },
  },
  { ssr: true }
);

export function Providers({ children }: { children: React.ReactNode }) {
  return <>{children}</>;
}

Add the Cognito IDs to your frontend/.env.local file:
NEXT_PUBLIC_API_URL=https://abc123.execute-api.us-east-1.amazonaws.com/prod
NEXT_PUBLIC_USER_POOL_ID=us-east-1_xxxxxxxx
NEXT_PUBLIC_USER_POOL_CLIENT_ID=xxxxxxxxxxxxxxxxxxxx

Replace the values with the outputs from your cdk deploy.
8.3 Wire Providers into the App Layout
This step is critical. Amplify must be initialized before any component tries to use authentication. If you skip this step, fetchAuthSession() will throw an "Amplify not configured" error and nothing will work.
Open frontend/app/layout.tsx and update it to wrap the app in the Providers component:
import type { Metadata } from 'next';
import './globals.css';
import { Providers } from './providers';

export const metadata: Metadata = {
  title: 'Vendor Tracker',
  description: 'Manage your vendors with AWS',
};

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body>
        <Providers>{children}</Providers>
      </body>
    </html>
  );
}

By wrapping {children} in <Providers>, you ensure that Amplify is configured once at the root of the app, before any child page or component renders.
8.4 Protect the UI with withAuthenticator
Now wrap your Home component so that unauthenticated users see a login screen instead of the dashboard.
Replace the contents of frontend/app/page.tsx with this updated version:
'use client';

import { useState, useEffect } from 'react';
import { withAuthenticator } from '@aws-amplify/ui-react';
import '@aws-amplify/ui-react/styles.css';
import { getVendors, createVendor, deleteVendor } from '@/lib/api';
import { Vendor } from '@/types/vendor';

// withAuthenticator injects `signOut` and `user` as props automatically
function Home({ signOut, user }: { signOut?: () => void; user?: any }) {
  const [vendors, setVendors] = useState<Vendor[]>([]);
  const [form, setForm] = useState({ name: '', category: '', contactEmail: '' });
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');

  const loadVendors = async () => {
    try {
      const data = await getVendors();
      setVendors(data);
    } catch {
      setError('Failed to load vendors.');
    }
  };

  useEffect(() => {
    loadVendors();
  }, []);

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    setLoading(true);
    setError('');
    try {
      await createVendor(form);
      setForm({ name: '', category: '', contactEmail: '' });
      await loadVendors();
    } catch {
      setError('Failed to add vendor.');
    } finally {
      setLoading(false);
    }
  };

  const handleDelete = async (vendorId: string) => {
    try {
      await deleteVendor(vendorId);
      await loadVendors();
    } catch {
      setError('Failed to delete vendor.');
    }
  };

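  return (
    <main>
      {/* ── Header ── */}
      <header>
        <div>
          <h1>Vendor Tracker</h1>
          <p>Signed in as: {user?.signInDetails?.loginId}</p>
        </div>
        <button onClick={signOut}>Sign out</button>
      </header>

      {error && (
        <p role="alert">{error}</p>
      )}

      <div style={{ display: 'grid', gridTemplateColumns: '1fr 1fr', gap: '2rem' }}>

        {/* ── Add Vendor Form ── */}
        <section>
          <h2>Add New Vendor</h2>
          <form onSubmit={handleSubmit}>
            <input
              type="text" placeholder="Name" value={form.name}
              onChange={e => setForm({ ...form, name: e.target.value })}
              required
            />
            <input
              type="text" placeholder="Category" value={form.category}
              onChange={e => setForm({ ...form, category: e.target.value })}
              required
            />
            <input
              type="email" placeholder="Contact email" value={form.contactEmail}
              onChange={e => setForm({ ...form, contactEmail: e.target.value })}
              required
            />
            <button type="submit" disabled={loading}>{loading ? 'Adding...' : 'Add Vendor'}</button>
          </form>
        </section>

        {/* ── Vendor List ── */}
        <section>
          <h2>
            Current Vendors ({vendors.length})
          </h2>
          <div>
            {vendors.length === 0 ? (
              <p>No vendors yet.</p>
            ) : (
              vendors.map(v => (
                <div key={v.vendorId}>
                  <div>
                    <p>{v.name}</p>
                    <p>{v.category} · {v.contactEmail}</p>
                  </div>
                  <button onClick={() => handleDelete(v.vendorId!)}>Delete</button>
                </div>
              ))
            )}
          </div>
        </section>

      </div>
    </main>
  );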
}

// Wrapping Home with withAuthenticator means any user who is not logged in
// will see Amplify's built-in login/signup screen instead of this component.
export default withAuthenticator(Home);



8.5 Pass the Auth Token to API Calls
Now that API Gateway requires a JWT on every request, your fetch calls need to include the token in the Authorization header. Without it, every request will return a 401 Unauthorized error.
Update frontend/lib/api.ts with a token helper and updated fetch calls:
import { fetchAuthSession } from 'aws-amplify/auth';
import { Vendor } from '@/types/vendor';

const BASE_URL = process.env.NEXT_PUBLIC_API_URL!;

// Retrieves the current user's JWT token from the active Amplify session
const getAuthToken = async (): Promise<string> => {
  const session = await fetchAuthSession();
  const token = session.tokens?.idToken?.toString();
  if (!token) throw new Error('No active session. Please sign in.');
  return token;
};

export const getVendors = async (): Promise<Vendor[]> => {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    headers: { Authorization: token },
  });
  if (!response.ok) throw new Error('Failed to fetch vendors');
  return response.json();
};

export const createVendor = async (
  vendor: Omit<Vendor, 'vendorId' | 'createdAt'>
): Promise<void> => {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: token,
    },
    body: JSON.stringify(vendor),
  });
  if (!response.ok) throw new Error('Failed to create vendor');
};

export const deleteVendor = async (vendorId: string): Promise<void> => {
  const token = await getAuthToken();
  const response = await fetch(`${BASE_URL}/vendors`, {
    method: 'DELETE',
    headers: {
      'Content-Type': 'application/json',
      Authorization: token,
    },
    body: JSON.stringify({ vendorId }),
  });
  if (!response.ok) throw new Error('Failed to delete vendor');
};

What getAuthToken does:
fetchAuthSession() reads the currently signed-in user's session. Amplify persists the tokens in browser storage after sign-in: localStorage by default, or cookies when ssr: true is set, as it is in providers.tsx.
session.tokens?.idToken?.toString() produces the JWT string that API Gateway's Cognito authorizer is looking for. Passing it as the Authorization header tells API Gateway: "This request is from an authenticated user."
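The three functions above each build nearly the same headers object. One possible refactor is to build it in a single place. The sketch below assumes nothing beyond the header names already used in api.ts, and buildAuthHeaders is a hypothetical helper name:

```typescript
// Hypothetical helper: builds the headers for an authenticated request.
// Content-Type is added only when the request carries a JSON body.
function buildAuthHeaders(
  token: string,
  hasJsonBody: boolean = false
): Record<string, string> {
  const headers: Record<string, string> = { Authorization: token };
  if (hasJsonBody) {
    headers["Content-Type"] = "application/json";
  }
  return headers;
}
```

getVendors could then pass buildAuthHeaders(token), while createVendor and deleteVendor pass buildAuthHeaders(token, true), so a future change to the header format happens in one function.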
8.6 Troubleshooting Cognito
"Unconfirmed" user error after sign-up
When a new user signs up through the Amplify UI, Cognito marks the account as Unconfirmed until the user verifies their email address. A verification code is sent to the user's email. After entering the code, the account becomes confirmed and the user can log in.
If you are testing locally and want to skip the email step, you can manually confirm any account in the AWS Console:

Open the AWS Console and navigate to Cognito

Click on your User Pool (VendorUserPool...)

Click the Users tab

Click on the user's email address

Open the Actions dropdown and click Confirm account






401 Unauthorized errors after deployment
If you are getting 401 errors, check two things:

Open Chrome DevTools --> Network tab, click the failing request, and look at the Request Headers. You should see an Authorization header with a long string of characters. If it is missing, getAuthToken is failing. Check that Amplify is configured correctly in providers.tsx and wired in via layout.tsx.

In your CDK stack, confirm that authorizationType: apigateway.AuthorizationType.COGNITO is present on every protected method definition. If it is missing, API Gateway may not be checking tokens even though the authorizer is defined.


Part 9: Deploy the Frontend with S3 and CloudFront
Your app works locally. Now you'll deploy it to a real HTTPS URL that anyone in the world can visit.
The strategy: Next.js will export your React app as a set of static HTML, CSS, and JavaScript files. Those files will be uploaded to an S3 bucket (AWS's file storage service). CloudFront sits in front of the bucket as a Content Delivery Network (CDN), distributing your files to servers around the world and serving them over HTTPS.
9.1 Configure Next.js for Static Export
Open frontend/next.config.js (or next.config.mjs) and add the output: 'export' setting:
/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'export', // Generates a static /out folder instead of a Node.js server
};

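// ES module syntax works in next.config.mjs. In a CommonJS next.config.js,
// use `module.exports = nextConfig;` instead.
export default nextConfig;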

Note on 'use client' and static export: When output: 'export' is set, Next.js pre-renders every page to static HTML at build time. Any component that uses hooks, event handlers, or browser-only APIs – like withAuthenticator from Amplify – must have 'use client' at the top of the file. This marks it as a Client Component, so Next.js ships its JavaScript to the browser and runs its interactive logic there instead of treating it as a Server Component.
You already have 'use client' in page.tsx. If you ever see a build error mentioning window is not defined or similar, check that the relevant component has 'use client' at the top.
Build the frontend:
cd frontend
npm run build

This generates an /out folder containing your complete website as static files. Verify the folder was created:
ls out
# You should see: index.html, _next/, etc.

9.2 Add S3 and CloudFront to the CDK Stack
Open backend/lib/backend-stack.ts and add the hosting infrastructure. Here's the complete final version of the file:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
import * as s3deploy from 'aws-cdk-lib/aws-s3-deployment';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class BackendStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // 1. DynamoDB Table 
    const vendorTable = new dynamodb.Table(this, 'VendorTable', {
      partitionKey: { name: 'vendorId', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // 2. Lambda Functions
    const lambdaEnv = { TABLE_NAME: vendorTable.tableName };

    const createVendorLambda = new NodejsFunction(this, 'CreateVendorHandler', {
      entry: 'lambda/createVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const getVendorsLambda = new NodejsFunction(this, 'GetVendorsHandler', {
      entry: 'lambda/getVendors.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    const deleteVendorLambda = new NodejsFunction(this, 'DeleteVendorHandler', {
      entry: 'lambda/deleteVendor.ts',
      handler: 'handler',
      environment: lambdaEnv,
    });

    // 3. Permissions
    vendorTable.grantWriteData(createVendorLambda);
    vendorTable.grantReadData(getVendorsLambda);
    vendorTable.grantWriteData(deleteVendorLambda);

    // 4. Cognito User Pool
    const userPool = new cognito.UserPool(this, 'VendorUserPool', {
      selfSignUpEnabled: true,
      signInAliases: { email: true },
      autoVerify: { email: true },
      userVerification: {
        emailStyle: cognito.VerificationEmailStyle.CODE,
      },
    });

    userPool.addDomain('VendorUserPoolDomain', {
      cognitoDomain: { domainPrefix: `vendor-tracker-${this.account}` },
    });

    const userPoolClient = userPool.addClient('VendorAppClient');

    // 5. API Gateway + Authorizer
    const api = new apigateway.RestApi(this, 'VendorApi', {
      restApiName: 'Vendor Service',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
        allowHeaders: ['Content-Type', 'Authorization'],
      },
    });

    const authorizer = new apigateway.CognitoUserPoolsAuthorizer(
      this,
      'VendorAuthorizer',
      { cognitoUserPools: [userPool] }
    );

    const authOptions = {
      authorizer,
      authorizationType: apigateway.AuthorizationType.COGNITO,
    };

    const vendors = api.root.addResource('vendors');
    vendors.addMethod('GET', new apigateway.LambdaIntegration(getVendorsLambda), authOptions);
    vendors.addMethod('POST', new apigateway.LambdaIntegration(createVendorLambda), authOptions);
    vendors.addMethod('DELETE', new apigateway.LambdaIntegration(deleteVendorLambda), authOptions);

    // 6. S3 Bucket (Frontend Files) 
    const siteBucket = new s3.Bucket(this, 'VendorSiteBucket', {
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

    // 7. CloudFront Distribution (HTTPS + CDN)
    const distribution = new cloudfront.Distribution(this, 'SiteDistribution', {
      defaultBehavior: {
        origin: new origins.S3Origin(siteBucket),
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
      defaultRootObject: 'index.html',
      errorResponses: [
        {
          // Redirect all 404s back to index.html so React can handle routing
          httpStatus: 404,
          responseHttpStatus: 200,
          responsePagePath: '/index.html',
        },
      ],
    });

    // 8. Deploy Frontend Files to S3 
    new s3deploy.BucketDeployment(this, 'DeployWebsite', {
      sources: [s3deploy.Source.asset('../frontend/out')],
      destinationBucket: siteBucket,
      distribution,
      distributionPaths: ['/*'], // Clears CloudFront cache on every deploy
    });

    // 9. Outputs
    new cdk.CfnOutput(this, 'ApiEndpoint', { value: api.url });
    new cdk.CfnOutput(this, 'UserPoolId', { value: userPool.userPoolId });
    new cdk.CfnOutput(this, 'UserPoolClientId', { value: userPoolClient.userPoolClientId });
    new cdk.CfnOutput(this, 'CloudFrontURL', {
      value: `https://${distribution.distributionDomainName}`,
    });
  }
}

What the hosting infrastructure does:

The S3 bucket stores your static HTML, CSS, and JavaScript files. It is private – users cannot access it directly.

CloudFront is the CDN that sits in front of S3. It gives you an HTTPS URL and caches your files at edge locations worldwide, so the app loads fast no matter where users are located. REDIRECT_TO_HTTPS automatically upgrades any HTTP request to HTTPS.

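The error response for 404 returns index.html instead of an error page. This is necessary for single-page apps: if a user navigates directly to a route like /vendors/123, CloudFront cannot find a file at that path, but sending back index.html lets the React app handle the routing correctly. Because the bucket is private, S3 can answer a request for a missing file with 403 rather than 404. If deep links show an Access Denied page, add a matching errorResponses entry for httpStatus: 403.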

distributionPaths: ['/*'] tells CloudFront to invalidate its entire cache after every deployment. This ensures users always see the latest version of your app immediately.

BucketDeployment is a CDK construct that automatically uploads the contents of your frontend/out folder to the S3 bucket every time you run cdk deploy.
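To make the defaultRootObject and error-response behavior concrete, here is a simplified model of how a request path maps to a file. It is an illustration only, not CloudFront's actual algorithm, and resolveRequest, EdgeResponse, and the sample keys are all hypothetical names.

```typescript
// Simplified model of the distribution configured above. NOT CloudFront's
// real implementation: it only mirrors defaultRootObject and the
// "missing file -> 200 + /index.html" error response.
interface EdgeResponse {
  status: number;
  servedKey: string;
}

function resolveRequest(bucketKeys: string[], requestPath: string): EdgeResponse {
  // defaultRootObject: a request for "/" is served from index.html.
  const key = requestPath === '/' ? 'index.html' : requestPath.replace(/^\//, '');

  if (bucketKeys.some((existing) => existing === key)) {
    return { status: 200, servedKey: key };
  }

  // errorResponses: the origin's "not found" is rewritten to 200 + /index.html
  // so the client-side router can render the route.
  return { status: 200, servedKey: 'index.html' };
}

const deployedKeys = ['index.html', '_next/static/app.js'];
const rootResponse = resolveRequest(deployedKeys, '/');                // servedKey: 'index.html'
const deepLinkResponse = resolveRequest(deployedKeys, '/vendors/123'); // servedKey: 'index.html'
```

The deep link gets the same HTML as the root, and the React router then reads the URL to decide what to show.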


9.3 Run the Final Deployment
First, build the frontend with the latest environment variables:
cd frontend
npm run build

Then deploy everything from the backend folder:
cd ../backend
cdk deploy

After deployment finishes, copy the CloudFrontURL from the terminal output:
Outputs:
BackendStack.CloudFrontURL = https://d1234abcd.cloudfront.net

Open that URL in your browser. Your app is now live on the internet, served over HTTPS, globally distributed.


What You Built
You now have a fully deployed, production-style full-stack application. Here is a summary of every piece you built and what it does:



| Layer | Service | What it does |
| --- | --- | --- |
| Frontend | Next.js + CloudFront | React UI served globally over HTTPS |
| Auth | Amazon Cognito + Amplify | User sign-up, login, and JWT token management |
| API | API Gateway | Routes HTTP requests, validates auth tokens |
| Logic | AWS Lambda (×3) | Creates, reads, and deletes vendors on demand |
| Database | DynamoDB | Stores vendor records with no idle cost |
| Storage | S3 | Holds your built frontend files |
| Infrastructure | AWS CDK | Defines and deploys all of the above as code |


Conclusion
You have built and deployed the foundational pattern of almost every cloud application: a secured API backed by a database, deployed with infrastructure as code. Here is everything you accomplished:
You set up a professional AWS development environment with scoped IAM credentials. You defined your entire backend infrastructure as TypeScript code using AWS CDK, which means your database, API, Lambda functions, and authentication system are all version-controlled, repeatable, and deployable with a single command.
You wrote three Lambda functions that handle create, read, and delete operations, each with proper error handling and the correct AWS SDK v3 patterns. You connected them to a REST API through API Gateway and protected every route with Amazon Cognito authentication, so only registered, verified users can interact with your data.
On the frontend, you built a Next.js application with a service layer that cleanly separates API logic from UI components, manages JWTs automatically through AWS Amplify, and gives users a complete sign-up and sign-in flow without you writing a single line of authentication UI code.
Finally, you deployed the entire system: your backend to AWS Lambda and DynamoDB, and your frontend as a static site served globally through CloudFront over HTTPS.
The full source code for this tutorial is available on GitHub. Clone it, modify it, and use it as a reference for your own projects.
 


 How to Build a Serverless RAG Pipeline on AWS That Scales to Zero 
Christopher Galliart — Wed, 11 Mar 2026 18:19:40 +0000
 Most RAG tutorials end the same way: you've got a working prototype and a bill for a vector database that runs whether anyone's querying it or not. Add an always-on embedding service, a hosted LLM endpoint, and the usual AWS infrastructure, and you're looking at real money before a single user shows up.
But it doesn't have to work that way. In this tutorial, you'll deploy a fully serverless RAG pipeline that processes documents, images, video, and audio, then scales to zero when nobody's using it.
Everything runs in your AWS account, your data never leaves your infrastructure, and your ongoing monthly cost for a modest knowledge base will be closer to 2-3 USD than 300 USD.
We'll use RAGStack-Lambda, an open-source project I built on AWS. By the end, you'll have a deployed pipeline with a dashboard, an AI chat interface with source citations, a drop-in web component you can embed in any app, and an MCP server you can use to feed your assistant context.
Here's what we'll cover:

What This Actually Costs

What You're Building

Prerequisites

Deploying from AWS Marketplace

Deploying from Source

Uploading Your First Documents

Chatting With Your Knowledge Base

Embedding the Web Component in Your App

Using the MCP Server

What You Can Build From Here

Wrapping Up


What This Actually Costs
Before we build anything, let's talk money, because the cost story is the whole point.
RAG pipelines have two cost phases: ingestion (processing your documents once) and operation (querying them over time).
Most platforms charge you a flat monthly rate regardless of which phase you're in. A serverless architecture flips that: ingestion costs something, and then everything scales to zero.
Ingestion: The One-Time Hit
When you upload documents, several things happen: text extraction (OCR for PDFs and images), embedding generation, metadata extraction, and storage. Here's what that actually costs per service:
Textract (OCR): This is the most expensive part of ingestion, and it only applies to scanned PDFs and images that need text extraction. Plain text, HTML, CSV, and other text-based formats skip this entirely.
Textract charges about 1.50 USD per 1,000 pages for standard text detection. If you're uploading 500 pages of scanned PDFs, that's about 0.75 USD. A heavy initial load of several thousand scanned pages might run 5-10 USD. But once your documents are processed, you never pay this again unless you add new ones.
Bedrock Embeddings (Nova Multimodal): This is where your content gets converted into vectors for semantic search. The pricing is almost comically cheap:

Text: 0.00002 USD per 1,000 input tokens

Images: 0.00115 USD per image

Video/Audio: 0.00200 USD per minute


To put that in perspective: if you have 1,500 text documents averaging 2,500 tokens each after chunking, your total embedding cost is about 0.08 USD. A knowledge base with 500 images runs 0.58 USD. Even a mixed corpus of text, images, and a few hours of video stays well under 2 USD for the entire embedding pass. This is a one-time cost – you only re-embed if you add or update documents.
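The arithmetic behind those figures can be sketched as a small estimator. The per-unit prices are the ones quoted above. The function and field names are hypothetical, and the prices should be checked against current Bedrock pricing before you rely on them.

```typescript
// Per-unit embedding prices quoted in this section (USD).
const EMBEDDING_PRICES_USD = {
  textPer1kTokens: 0.00002,
  perImage: 0.00115,
  perMediaMinute: 0.002,
};

// Hypothetical description of a corpus to embed.
interface CorpusToEmbed {
  textDocuments: number;
  avgTokensPerDocument: number;
  images: number;
  mediaMinutes: number;
}

function estimateEmbeddingCostUsd(corpus: CorpusToEmbed): number {
  const textTokens = corpus.textDocuments * corpus.avgTokensPerDocument;
  const textCost = (textTokens / 1000) * EMBEDDING_PRICES_USD.textPer1kTokens;
  const imageCost = corpus.images * EMBEDDING_PRICES_USD.perImage;
  const mediaCost = corpus.mediaMinutes * EMBEDDING_PRICES_USD.perMediaMinute;
  return textCost + imageCost + mediaCost;
}

// 1,500 documents x 2,500 tokens = 3.75M tokens, about 0.075 USD
const textOnlyCost = estimateEmbeddingCostUsd({
  textDocuments: 1500, avgTokensPerDocument: 2500, images: 0, mediaMinutes: 0,
});

// 500 images, about 0.575 USD
const imagesOnlyCost = estimateEmbeddingCostUsd({
  textDocuments: 0, avgTokensPerDocument: 0, images: 500, mediaMinutes: 0,
});
```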
Bedrock LLM (Metadata Extraction): RAGStack uses an LLM to analyze each document and extract structured metadata automatically. This is a few inference calls per document using Nova Lite or a similar model. At 0.06 USD/0.24 USD per million input/output tokens, processing 1,500 documents costs well under 1 USD.
S3 Vectors (Storage): Storing your embeddings. At 0.06 USD per GB/month, a knowledge base of 1,500 documents with 1,024-dimension vectors takes up a trivially small amount of space. We're talking pennies per month.
S3 (Document Storage): Your source documents in standard S3. Even cheaper, 0.023 USD per GB/month.
DynamoDB: Stores document metadata and processing state. The on-demand pricing model means you pay per request during ingestion, then essentially nothing at rest. A few cents for the initial load.
To put real numbers on it: if you upload 200 text documents (PDFs, HTML, markdown), your total ingestion cost is likely under 1 USD. If you upload 1,000 scanned PDFs that need OCR, you might see 5-8 USD as a one-time hit. That 7-10 USD figure you might see referenced? That's the upper end for a heavy initial load with lots of OCR work.
Operation: Where Scale-to-Zero Shines
Once your documents are ingested, the pipeline is waiting. Not running. Waiting. Here's what each query costs:
Lambda: Invocations are billed per request and duration. The free tier covers 1 million requests/month. For a personal or small-team knowledge base, you may never leave the free tier.
S3 Vectors (Queries): 2.50 USD per million query API calls, plus a per-TB data processing charge. For a small index queried a few hundred times a month, this rounds to effectively zero.
Bedrock (Chat Inference): This is your main operating cost. Each chat response requires an LLM call. Using Nova Lite at 0.06 USD per million input tokens and 0.24 USD per million output tokens, a typical RAG query (retrieval context + user question + response) might cost 0.001-0.003 USD per query. A hundred queries a month is 0.10-0.30 USD.
Step Functions: Orchestrates the document processing pipeline. Standard workflows charge 0.025 USD per 1,000 state transitions. Minimal during operation since it's only active during ingestion.
Cognito: User authentication. Free for the first 10,000 monthly active users.
CloudFront: Serves the dashboard UI. Free tier covers 1 TB of data transfer per month.
API Gateway: Handles GraphQL API requests. Free tier covers 1 million API calls per month.
Add it all up for a knowledge base with 500 documents getting a few hundred queries per month, and your monthly operating cost is somewhere between 0.50 USD and 3.00 USD. Most of that is the LLM inference for chat responses.
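Since chat inference dominates, a rough estimator for that line item looks like the sketch below. The Nova Lite rates are the ones quoted above. The token counts per query are assumptions chosen for illustration, and the function names are hypothetical.

```typescript
// Nova Lite on-demand rates quoted above (USD per million tokens).
const NOVA_LITE_USD_PER_MILLION_TOKENS = { input: 0.06, output: 0.24 };

function chatQueryCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1e6) * NOVA_LITE_USD_PER_MILLION_TOKENS.input;
  const outputCost = (outputTokens / 1e6) * NOVA_LITE_USD_PER_MILLION_TOKENS.output;
  return inputCost + outputCost;
}

function monthlyChatCostUsd(
  queriesPerMonth: number,
  inputTokensPerQuery: number,
  outputTokensPerQuery: number,
): number {
  return queriesPerMonth * chatQueryCostUsd(inputTokensPerQuery, outputTokensPerQuery);
}

// Assumed query shape: 15,000 input tokens (retrieved context + question)
// and 1,000 output tokens. Real numbers depend on chunk size and answer length.
const assumedPerQueryCost = chatQueryCostUsd(15000, 1000);
const assumedMonthlyCost = monthlyChatCostUsd(100, 15000, 1000);
```

With those assumed token counts, a query lands near the low end of the 0.001-0.003 USD range, and a hundred queries a month stays in the 0.10-0.30 USD band.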
The Comparison That Matters
Here's the same pipeline on a traditional always-on stack:



| Service | RAGStack-Lambda | Traditional Stack |
| --- | --- | --- |
| Vector Database | S3 Vectors: pennies/mo | Pinecone Starter: 70 USD/mo |
| Vector Database (alt) | S3 Vectors: pennies/mo | OpenSearch Serverless: about 350 USD/mo min |
| Compute | Lambda: free tier | EC2 or ECS: 50-150 USD/mo |
| LLM Inference | Same per-query cost | Same per-query cost |
| Total (idle) | about 0.50-3.00 USD/mo | 120-500 USD/mo |


The LLM inference cost per query is roughly the same everywhere – that's Bedrock's on-demand pricing regardless of your architecture. The difference is everything else. Traditional stacks pay a floor cost whether anyone's using them or not. A serverless stack pays for what it uses, and idle costs essentially nothing.
What About Transcribe?
If you're uploading video or audio, Amazon Transcribe adds cost for speech-to-text conversion. Standard transcription runs about 0.024 USD per minute of audio, so a 10-minute video costs 0.24 USD to transcribe. This is a one-time ingestion cost: once transcribed and embedded, the resulting text chunks are queried like any other document.
What You're Building
By the end of this tutorial, you'll have a deployed pipeline that does the following:

You upload a document (PDF, image, video, audio, HTML, CSV, and more; the full list is extensive) through a web dashboard.

The pipeline detects the file type and routes it to the right processor. Scanned PDFs go through OCR via Textract. Video and audio go through Transcribe for speech-to-text, split into 30-second searchable chunks with speaker identification. Images get visual embeddings and any caption text you provide.

An LLM analyzes each document and extracts structured metadata: topic, document type, date range, people mentioned, and whatever else is relevant. This happens automatically.

Everything gets embedded using Amazon Nova Multimodal Embeddings and stored in a Bedrock Knowledge Base backed by S3 Vectors.

You (or your users) ask questions through an AI chat interface. The pipeline retrieves relevant documents, passes them as context to a Bedrock LLM, and returns an answer with collapsible source citations, including timestamp links for video and audio that jump to the exact position.
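The routing step in the list above can be pictured as a function from file type to processor. This is a simplified sketch, not RAGStack's actual routing code: the processor names and extension lists are hypothetical, and whether a PDF is scanned is passed in as a flag because it cannot be inferred from the file name.

```typescript
// Hypothetical processor labels, for illustration only.
type Processor = 'textract-ocr' | 'transcribe' | 'image-embedding' | 'direct-text';

function chooseProcessor(fileName: string, isScanned = false): Processor {
  const extension = (fileName.split('.').pop() ?? '').toLowerCase();

  switch (extension) {
    case 'mp4':
    case 'mov':
    case 'mp3':
    case 'wav':
      return 'transcribe'; // speech-to-text first, then embedding
    case 'jpg':
    case 'jpeg':
    case 'png':
    case 'gif':
    case 'webp':
      return 'image-embedding'; // visual embedding plus optional caption
    case 'pdf':
      return isScanned ? 'textract-ocr' : 'direct-text';
    default:
      return 'direct-text'; // HTML, CSV, Markdown, and other text formats
  }
}
```

In the deployed pipeline this decision is made inside the Step Functions workflow; the sketch only shows the shape of the branching.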


All of this runs in your AWS account. No external control plane, no third-party services beyond AWS itself.
The Architecture


A few things to note about this architecture:
Step Functions orchestrate everything. When a document is uploaded, a state machine manages the entire processing flow: detecting the file type, routing to the right processor, waiting for async operations like Transcribe jobs, then triggering embedding and metadata extraction.
This is what makes the pipeline reliable without a running server. If a step fails, it retries. You can see exactly where every document is in the processing pipeline.
Lambda does the compute. Every processing step is a Lambda function. They spin up when needed, run for a few seconds to a few minutes, and shut down. There's no EC2 instance idling at 3 AM.
S3 Vectors is the vector store. Your embeddings live in S3's purpose-built vector storage rather than in a dedicated vector database like Pinecone or OpenSearch.
This is what makes the "scale to zero" cost possible: you're paying object storage rates for vector data instead of keeping a database cluster warm. It also means your vectors are sitting in your own S3 bucket, not in a third-party managed service that holds your data on their terms.
Cognito handles auth. The dashboard and API are protected with Cognito user pools. When you deploy, you get a temporary password via email. The web component uses IAM-based authentication, and server-side integrations use API key auth.
CloudFront serves the UI. The dashboard is a static React app served through CloudFront, so there's no web server to maintain.
Two Ways to Deploy
You have two deployment paths depending on what you want:
AWS Marketplace (the fast path): Click deploy, fill in two fields (stack name and email), and wait about 10 minutes. No local tooling required. This is the path we'll walk through first.
From Source (the developer path): Clone the repo, run publish.py, and deploy via SAM CLI. This is the path for when you want to customize the processing pipeline, modify the UI, or contribute to the project. We'll cover this after the Marketplace walkthrough.
Both paths produce the same stack. The Marketplace version just wraps the CloudFormation template in a one-click deployment.
Prerequisites
Before you deploy, you'll need:

An AWS account with permissions to create CloudFormation stacks, Lambda functions, S3 buckets, DynamoDB tables, and Cognito user pools. If you're using an admin account, you're covered.

Bedrock model access: RAGStack defaults to us-east-1 because that's where Nova Multimodal Embeddings is available. Amazon's own models (including Nova) are available by default in Bedrock, no manual enablement required. Just make sure your IAM role has the necessary bedrock:InvokeModel permissions.

For the Marketplace path: just a web browser.

For the source path: Python 3.13+, Node.js 24+, AWS CLI and SAM CLI configured, and Docker (for building Lambda layers).


Deploying from AWS Marketplace
This is the fastest path – no local tools, no CLI, no Docker. You'll launch a CloudFormation stack and have a working pipeline in about 10 minutes.
Step 1: Launch the Stack
Click the direct deploy link to open CloudFormation's "Quick create stack" page with the template pre-loaded.


Step 2: Fill In Two Fields
The page has a lot of options, but you only need two:

Stack name: Must be lowercase. This becomes the prefix for all your AWS resources (for example, my-docs, team-kb, project-notes). Keep it short.

Admin Email: Under Required Settings. Cognito will send your temporary login credentials here. Use an email you can access right now.


Everything else – Build Options, Advanced Settings, OCR Backend, model selections – can stay at the defaults. They're there for customization later, but the defaults work out of the box.
Step 3: Deploy
Scroll to the bottom, check the three acknowledgment boxes under "Capabilities and transforms," and click Create stack.
Deployment takes roughly 10 minutes. You can watch the progress in the CloudFormation Events tab if you're curious, but there's nothing to do until the stack status flips to CREATE_COMPLETE.
Step 4: Log In
Once the stack finishes, check your email. Cognito sends you the dashboard URL and a temporary password. Log in, set a new password, and you're looking at an empty dashboard ready for documents.


Deploying from Source
If you want to customize the pipeline, modify the UI, or contribute to the project, deploy from source instead.
Step 1: Clone and Set Up
git clone https://github.com/HatmanStack/RAGStack-Lambda.git
cd RAGStack-Lambda

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Step 2: Deploy
The publish.py script handles everything: building the frontend, packaging Lambda functions, and deploying via SAM CLI.
python publish.py \
  --project-name my-docs \
  --admin-email admin@example.com

This defaults to us-east-1 for Nova Multimodal Embeddings. The script will build the React dashboard, build the web component, package all Lambda layers with Docker, and deploy the CloudFormation stack through SAM.
First deploy takes longer (15-20 minutes) because it's building everything from scratch. Subsequent deploys are faster since SAM caches unchanged resources.
If you only want to iterate on the backend and skip UI builds:
# Skip dashboard build (still builds web component)
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui

# Skip ALL UI builds
python publish.py --project-name my-docs --admin-email admin@example.com --skip-ui-all

Once it finishes, you'll get the same Cognito email and dashboard URL as the Marketplace path.
Uploading Your First Documents
The dashboard has tabs for different content types. We'll start with the Documents tab since that's the most common use case.
Documents
Click the Documents tab and upload a file. RAGStack accepts a wide range of formats: PDF, DOCX, XLSX, HTML, CSV, JSON, XML, EML, EPUB, TXT, and Markdown. Drag and drop or use the file picker.
Once uploaded, the document enters the processing pipeline. You'll see the status update in real time:

UPLOADED: File received and stored in S3.

PROCESSING: Step Functions has picked it up and routed it to the right processor. Text-based files (HTML, CSV, Markdown) go through direct extraction. Scanned PDFs and images go through Textract OCR. The LLM analyzes the content and extracts structured metadata: topic, document type, people mentioned, date ranges, and whatever else is relevant to the content.

INDEXED: Embeddings generated, vectors stored, document is searchable.


Text documents typically process in 1-5 minutes. OCR-heavy documents (scanned PDFs, images with text) can take 2-15 minutes depending on page count.


Images
The Images tab works differently. Upload a JPG, PNG, GIF, or WebP and you can add a caption. Both the visual content and caption text get embedded using Nova Multimodal Embeddings, so you can search by what's in the image or by your description of it.
This is where multimodal embeddings earn their keep. A traditional text-only RAG pipeline would need you to describe every image manually. Here, the image itself becomes searchable, and since everything stays in your AWS account, you're not sending personal photos or sensitive visual content to an external service to get there.
What About Video and Audio?
Upload video or audio files and RAGStack routes them through Amazon Transcribe for speech-to-text conversion. The transcript gets split into 30-second chunks with speaker identification, then embedded like any other document. When chat results reference a video source, you get timestamp links that jump to the exact position in the recording.
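The 30-second chunking can be sketched as grouping timed words into fixed windows. This is an illustration under stated assumptions, not RAGStack's implementation: the TranscriptWord shape is hypothetical (it is not Amazon Transcribe's output format), and the words are assumed to arrive sorted by start time.

```typescript
// Hypothetical input shape: one entry per spoken word, sorted by start time.
interface TranscriptWord {
  startSeconds: number;
  speaker: string;
  text: string;
}

interface TranscriptChunk {
  startSeconds: number; // usable as the timestamp link target
  endSeconds: number;
  speakers: string[];
  text: string;
}

function chunkTranscript(words: TranscriptWord[], windowSeconds = 30): TranscriptChunk[] {
  const chunks: TranscriptChunk[] = [];

  for (const word of words) {
    // Each word belongs to the window that contains its start time.
    const windowStart = Math.floor(word.startSeconds / windowSeconds) * windowSeconds;

    let current = chunks[chunks.length - 1];
    if (!current || current.startSeconds !== windowStart) {
      current = {
        startSeconds: windowStart,
        endSeconds: windowStart + windowSeconds,
        speakers: [],
        text: '',
      };
      chunks.push(current);
    }

    if (current.speakers.indexOf(word.speaker) === -1) {
      current.speakers.push(word.speaker);
    }
    current.text = current.text ? `${current.text} ${word.text}` : word.text;
  }

  return chunks;
}
```

Each chunk keeps its start time, which is what lets a citation link jump to the right position in the recording.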
Web Scraping
The Scrape tab lets you pull websites directly into your knowledge base. Enter a URL and RAGStack crawls the page, extracts the content, and processes it through the same pipeline as uploaded documents: metadata extraction, embedding, and indexing.
This is useful for building a knowledge base from existing web content without manually saving and uploading pages. Documentation sites, blog archives, reference material, anything publicly accessible.


Chatting With Your Knowledge Base
This is the payoff. Go to the Chat tab, type a question, and RAGStack retrieves relevant documents from your knowledge base, passes them as context to a Bedrock LLM, and returns an answer with source citations.
The citations are collapsible, so click to expand and see which documents informed the answer, with the option to download the source file. For video and audio sources, you get clickable timestamps that jump to the relevant moment.


Metadata Filtering
If you've uploaded enough documents to have meaningful metadata categories, the chat interface lets you filter search results by metadata before querying. RAGStack auto-discovers the metadata structure from your documents, so you don't configure this manually; it just appears as your knowledge base grows.
This is useful when you have a large mixed corpus. Instead of hoping the vector search picks the right context from thousands of documents, you can narrow it down: "only search documents about project X" or "only search content from Q4 2024."
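Conceptually, filtering narrows the candidate set before similarity ranking. The sketch below shows that idea in memory. It is not how Bedrock Knowledge Bases or S3 Vectors implement filtering, and IndexedChunk, filteredSearch, and the sample metadata keys are hypothetical.

```typescript
// Hypothetical in-memory representation of an indexed chunk.
interface IndexedChunk {
  id: string;
  embedding: number[];
  metadata: Record<string, string>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, i) => {
    const other = b[i] ?? 0;
    dot += value * other;
    normA += value * value;
    normB += other * other;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function filteredSearch(
  chunks: IndexedChunk[],
  queryEmbedding: number[],
  filter: Record<string, string>,
  topK = 3,
): IndexedChunk[] {
  return chunks
    // 1. Keep only chunks whose metadata matches every filter key.
    .filter((chunk) => Object.keys(filter).every((key) => chunk.metadata[key] === filter[key]))
    // 2. Rank the survivors by similarity to the query.
    .map((chunk) => ({ chunk, score: cosineSimilarity(chunk.embedding, queryEmbedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}
```

A call such as filteredSearch(chunks, queryEmbedding, { project: 'x' }) never considers chunks from other projects, however similar their embeddings are.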
Embedding the Web Component in Your App
The dashboard is useful for managing your knowledge base, but the real power is embedding RAGStack's chat in your own application. The web component works with any framework: React, Vue, Angular, Svelte, or plain HTML.
Load the script once from your CloudFront distribution:


Then drop the component wherever you want a chat interface:


That's it. The component handles authentication (via IAM), manages conversation state, and renders source citations, all self-contained. Your CloudFront URL is in the stack outputs.
For server-side integrations that don't need a UI, the GraphQL API is available with API key authentication. You can find your endpoint and API key in the dashboard under Settings.
Using the MCP Server
RAGStack includes an MCP server that connects your knowledge base to AI assistants like Claude Desktop, Cursor, VS Code, and Amazon Q CLI. Instead of switching to the dashboard to search your documents, you ask your assistant directly.
Install it:
pip install ragstack-mcp

Then add it to your AI assistant's MCP configuration:
{
  "ragstack": {
    "command": "uvx",
    "args": ["ragstack-mcp"],
    "env": {
      "RAGSTACK_GRAPHQL_ENDPOINT": "YOUR_ENDPOINT",
      "RAGSTACK_API_KEY": "YOUR_API_KEY"
    }
  }
}

Your endpoint and API key are in the dashboard under Settings. Once configured, type @ragstack in your assistant's chat to invoke the MCP server, then ask things like "search my knowledge base for authentication docs" and it queries RAGStack directly.
See the MCP Server docs for the full list of available tools and setup details.
What You Can Build From Here
You've got a deployed RAG pipeline that costs almost nothing to run and handles text, images, video, and audio. A few directions you might take it:
A searchable personal archive. Every conference talk you've saved, every PDF textbook, every tutorial video that's sitting in a folder somewhere. Upload it all, and now you have one search interface across years of accumulated material. The multimodal embeddings mean your screenshots and diagrams are searchable too, not just the text.
I built a family archive app this way (scanned letters, old photos, home videos), with RAGStack deployed as a nested CloudFormation stack so the whole family can search across decades of memories using the chat widget.
A second brain for a client project. Scrape the client's existing docs, upload the SOW and meeting notes, drop in the codebase documentation. Now you've got a searchable knowledge base scoped to that engagement. Spin it up at the start, tear it down when the contract ends. At these costs, it's disposable infrastructure.
AI chat over a niche dataset. Recipe collections, legal filings, research papers, local government meeting minutes: any corpus that's too specialized for general-purpose LLMs to know well. The web component means you can ship it as a standalone tool without building a frontend from scratch.
RAG for your MCP workflow. If you're already using Claude Desktop or Cursor, the MCP server turns your knowledge base into another tool your assistant can reach for. Upload your team's runbooks and architecture docs, and now @ragstack in your editor gives you instant context without tab-switching.
Wrapping Up
The serverless RAG pipeline you just deployed handles document processing, multimodal embeddings, metadata extraction, and AI chat with source citations, all scaling to zero when idle, all running in your AWS account. Your documents, your vectors, your infrastructure. The traditional approach to this stack costs 120-500 USD/month in baseline infrastructure. This one costs pocket change.
The full source is at github.com/HatmanStack/RAGStack-Lambda. File issues, open PRs, or just poke around the architecture. If you want to go deeper on the technical tradeoffs, particularly how filtered vector search behaves on cost-optimized backends like S3 Vectors, that's a story for the next post.
 


 How to Build and Deploy a Production-Ready WhatsApp Bot with FastAPI, Evolution API, Docker, EasyPanel, and GCP 
Raju Manoj — Fri, 20 Feb 2026 15:03:08 +0000
 WhatsApp bots are widely used for customer support, automated replies, notifications, and internal tools. Instead of relying on expensive third-party platforms, you can build and deploy your own self-hosted WhatsApp bot using modern open-source tools.
In this tutorial, you’ll learn how to build and deploy a production-ready WhatsApp bot using:

FastAPI

Evolution API

Docker

EasyPanel

Google Cloud Platform (GCP)


By the end of this guide, you will have a fully working WhatsApp bot connected to your own WhatsApp account and deployed on a cloud virtual machine.
Table of Contents

How the Architecture Works

How Your WhatsApp Bot Works

Prerequisites

Step 1: Create Firewall Rules on GCP

Step 2: Create a Virtual Machine (Ubuntu 22.04)

Step 3: SSH into the VM

Step 4: Install Docker

Step 5: Install EasyPanel

Step 6: Open the EasyPanel Dashboard

Step 7: Deploy Evolution API

Step 8: Connect WhatsApp

Step 9: Deploy the FastAPI Bot

Step 10: Connect the Webhook - Telling Evolution API Where to Send Messages

Step 11: Final Test

Production Considerations

Conclusion


How the Architecture Works
Before we start installing anything, let’s understand how the system works.


How Your WhatsApp Bot Works
Before we continue setting things up, let's make sure you understand what's actually happening behind the scenes. Don't worry – no technical experience needed here.
Imagine a postal service
Think of your WhatsApp bot like a very fast, automated postal service:

Someone sends you a letter (a WhatsApp message)

A postal worker (Evolution API) picks it up and brings it to your office

Your office manager (FastAPI bot) reads it and writes a reply

The postal worker takes the reply back and delivers it


That's it. That's the whole system.
The 7 steps

Someone sends a message to your WhatsApp number – just like texting a friend.

Evolution API notices the message – it's constantly watching your WhatsApp number for new messages, like a receptionist sitting by the phone.

Evolution API passes the message to your bot – it sends the message content to your app and says "hey, you've got a new message!"

Your bot reads the message and decides what to say – this is where your code does its job.

Your bot sends the reply back to Evolution API – "okay, send this response."

Evolution API delivers the reply through WhatsApp.

The user sees the reply on their phone – usually within seconds.


One line summary
User → WhatsApp → Evolution API → Your Bot → Evolution API → WhatsApp → User

Every step in this guide is just setting up one piece of that chain. Once they're all connected, the whole thing runs on its own automatically.
This architecture allows you to automate replies while keeping full control of your infrastructure.
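To make the chain concrete, here is a minimal sketch in plain Python that imitates the hand-offs. It does not contact WhatsApp or Evolution API. The function names (evolution_receive, bot_decide_reply, evolution_send) and the dictionary fields are hypothetical stand-ins chosen for illustration, not the real services' interfaces.

```python
# Toy simulation of:
# User -> WhatsApp -> Evolution API -> Your Bot -> Evolution API -> WhatsApp -> User
# All names and message fields are hypothetical; no real service is contacted.


def evolution_receive(sender: str, text: str) -> dict:
    """Evolution API notices a WhatsApp message and packages it for the bot."""
    return {"from": sender, "text": text}


def bot_decide_reply(message: dict) -> str:
    """The bot reads the message and decides what to say."""
    if message["text"].strip().lower() == "hi":
        return "👋 Hello! Bot is working."
    return "Sorry, I only understand 'Hi' for now."


def evolution_send(recipient: str, reply: str) -> dict:
    """Evolution API delivers the reply back through WhatsApp."""
    return {"to": recipient, "text": reply}


def handle_round_trip(sender: str, text: str) -> dict:
    """Run one message through the whole chain and return what the user sees."""
    incoming = evolution_receive(sender, text)
    reply = bot_decide_reply(incoming)
    return evolution_send(incoming["from"], reply)


if __name__ == "__main__":
    print(handle_round_trip("user-123", "Hi"))
```

Only bot_decide_reply corresponds to code you write yourself. In the real deployment, the other two roles are played by Evolution API.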
Why These Tools?
Let’s briefly understand why we’re using each tool.
FastAPI
FastAPI is a modern Python framework for building APIs. It is fast, lightweight, and ideal for handling webhook requests from Evolution API.
Evolution API
Evolution API is a self-hosted WhatsApp automation server built on top of Baileys. It connects your personal WhatsApp account without requiring official WhatsApp Business API approval.
Docker
Docker allows us to run applications in containers. This makes deployments consistent, portable, and production-ready.
EasyPanel
EasyPanel is a graphical platform for managing Docker services. Instead of writing Docker Compose files manually, we use EasyPanel’s UI to deploy and manage our services easily.
Google Cloud Platform (GCP)
GCP provides the virtual machine that hosts our infrastructure. We will use an Ubuntu 22.04 server to run Docker, EasyPanel, Evolution API, and our FastAPI bot.
I chose these tools because they are practical, lightweight, and suitable for real-world production deployments.
Prerequisites
Before starting, make sure you have:

A Google Cloud account (the commands in this guide use gcloud and Cloud Shell, so they are GCP-specific)

Billing enabled

A project selected

Access to Cloud Shell

Basic Linux and Docker knowledge


Step 1: Create Firewall Rules on GCP
We need to allow traffic to specific ports on our VM. So, we run this command in GCP Cloud Shell:
gcloud compute firewall-rules create easypanel-whatsapp-fw \
 --network default \
 --direction INGRESS \
 --priority 1000 \
 --action ALLOW \
 --rules tcp:22,tcp:80,tcp:443,tcp:3000,tcp:8080,tcp:9000,tcp:5000-5999 \
 --source-ranges 0.0.0.0/0 \
 --description "SSH, EasyPanel, Evolution API, Bot"

This command:

Creates a firewall rule named easypanel-whatsapp-fw

On the default network

Allows incoming internet traffic (INGRESS)

Opens these ports:

22 → SSH (server access)

80 → HTTP

443 → HTTPS

3000, 8080, 9000 → App panels / APIs

5000–5999 → Custom app range



Allows access from any IP address (0.0.0.0/0)


In short, this firewall rule opens your server so that you and others can reach your apps and services from the internet.
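If you want to double-check which ports a rule like this exposes, the following Python sketch parses the same --rules value and tests individual ports. It is an offline illustration only: it does not query GCP, and the helper names parse_rules and is_open are made up for this example.

```python
# Parse a gcloud-style --rules value and check whether a port is opened by it.
# Offline illustration only: nothing here talks to Google Cloud.

RULES = "tcp:22,tcp:80,tcp:443,tcp:3000,tcp:8080,tcp:9000,tcp:5000-5999"


def parse_rules(rules: str) -> list[tuple[str, int, int]]:
    """Turn 'tcp:22,tcp:5000-5999' into [(protocol, low_port, high_port), ...]."""
    parsed = []
    for item in rules.split(","):
        protocol, _, ports = item.strip().partition(":")
        low, _, high = ports.partition("-")
        # A single port such as "22" has no upper bound, so reuse the lower one.
        parsed.append((protocol, int(low), int(high or low)))
    return parsed


def is_open(rules: str, port: int, protocol: str = "tcp") -> bool:
    """Return True if the given port falls inside any rule for that protocol."""
    return any(
        proto == protocol and low <= port <= high
        for proto, low, high in parse_rules(rules)
    )


if __name__ == "__main__":
    for port in (22, 3000, 5432, 6379):
        print(port, "open" if is_open(RULES, port) else "closed")
```

Note that the 5000-5999 range also covers 5432, the default PostgreSQL port, which is one reason to narrow the rule later.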
Step 2: Create a Virtual Machine (Ubuntu 22.04)
Now we'll create the server that hosts everything. Run the following command in the GCP Cloud Shell to set up a virtual machine with Ubuntu 22.04.
gcloud compute instances create whatsapp-vm \
  --zone=asia-south1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=30GB \
  --tags=easypanel

This command creates a new virtual machine (VM) on Google Cloud:

Name: whatsapp-vm

Location (zone): asia-south1-a (India region)

Machine size: e2-medium (2 vCPU, 4GB RAM)

Operating System: Ubuntu 22.04 LTS

Disk size: 30GB

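Tag: easypanel (a network tag; the rule from Step 1 has no --target-tags, so it applies to every VM on the default network, but the tag lets you scope firewall rules to this VM later)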


This gives you a Linux server in Google Cloud that can host EasyPanel, the WhatsApp bot, and your APIs.
Note: Wait about one minute for the instance to start.
Step 3: SSH into the VM
Connect to your server by using SSH to access the virtual machine you just created on Google Cloud.
gcloud compute ssh whatsapp-vm --zone=asia-south1-a

This command connects to your virtual machine named whatsapp-vm in the zone asia-south1-a using SSH (secure remote login).
It logs you into your Google Cloud server so you can start installing software and running commands. After running this, you will see a terminal prompt – that means you are now inside your Ubuntu server and ready to go.
Step 4: Install Docker
Docker is needed to run EasyPanel and the Evolution API.
First update the system:
sudo apt update -y
sudo apt install -y curl

This does two things:

sudo apt update -y → Updates your server’s package list (refreshes available software info).

sudo apt install -y curl → Installs curl, a tool used to download things from the internet using the terminal.


It prepares your server and installs a tool needed to download and install other software.
Then install Docker:
curl -fsSL https://get.docker.com | sudo sh

This command uses curl to download Docker’s official installation script. The | (pipe) sends it directly to sudo sh, which runs the script as administrator.
When the script finishes, Docker is installed on your server.
Enable Docker:
sudo systemctl enable docker
sudo systemctl start docker

These commands do two things:

enable docker → Makes Docker start automatically every time the server reboots.

start docker → Starts Docker right now.


It turns Docker ON now and makes sure it stays ON after restart.
Allow the Ubuntu user to run Docker:
sudo usermod -aG docker ubuntu

This command adds the user ubuntu to the Docker group.
This is important: by default, you must use sudo before every Docker command. After running this, the ubuntu user can run Docker without sudo.
Note: This command assumes your username is ubuntu. When you connect with gcloud compute ssh, the username is often derived from your Google account instead, so run whoami to check and replace ubuntu with your actual username.
Exit the session and reconnect:
exit
gcloud compute ssh whatsapp-vm --zone=asia-south1-a


exit → Logs you out of your current server session.

gcloud compute ssh whatsapp-vm --zone=asia-south1-a → Logs you back into your Google Cloud VM.


Why we do this: After adding the ubuntu user to the Docker group, you must log out and log back in for the permission changes to work.
Test Docker:
docker run hello-world

This command downloads a small test image called hello-world and runs it inside Docker.
If you see “Hello from Docker!”, Docker is installed and working correctly.
Step 5: Install EasyPanel
EasyPanel provides a user interface for deploying Docker services. Run this command in the VM:
curl -sSL https://get.easypanel.io | sudo bash

This command:

Downloads the official EasyPanel installation script

Runs it with administrator (sudo) permission

Automatically installs and configures EasyPanel on your server


It installs EasyPanel on your VM so you can manage apps using a web dashboard instead of commands. Installation takes about one minute.
Step 6: Open the EasyPanel Dashboard
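To find your VM's external IP address, check the EXTERNAL_IP column in the output of the gcloud compute instances create command from Step 2, or look at the VM instances page in the Google Cloud console. Then open a new tab in your browser and type it in like this:
http://YOUR_VM_EXTERNAL_IP:3000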

For example, if your IP was 34.123.45.67, you would type:
http://34.123.45.67:3000

EasyPanel runs on port 3000 by default – that's why we add :3000 at the end. Without it, your browser won't know which service to open on the server.
The EasyPanel setup page will appear, where you create an admin account.
Click “Create Admin Account”.
Fill in:

Username (choose something you’ll remember)

Email

Password (make it strong!)

Submit the form.


You are now logged in as the admin and can start managing apps, APIs, and bots through the EasyPanel dashboard.
You will see a page like the one below:


Step 7: Deploy Evolution API

Create a new project (for example: whatsapp-1)

Go to Services → Templates

Select Evolution API

Deploy the latest version


Wait until all services turn green. You will see a page like the one below.


Next, open Environment Variables and locate:
AUTHENTICATION_API_KEY

Copy the AUTHENTICATION_API_KEY.
Open the Evolution API dashboard
Inside EasyPanel, find your Evolution API service. You will see a clickable domain link – it usually looks something like:
https://evolution-api.easypanel.host

Click that link to open it in your browser. You will see a JSON response confirming the service is running.
The response also includes a Manager link, which opens the management dashboard where you can authenticate and begin using the Evolution API. The screenshot below highlights the manager URL along with the version details.


Open the manager link in a new tab, then paste the AUTHENTICATION_API_KEY you copied in the previous step to log in. The login screen looks like this:


Create a new instance:

Choose channel: Baileys

Leave phone number blank

Give your instance a name


Save the instance.
Step 8: Connect WhatsApp
Inside your instance dashboard:

Click Get QR

Scan it using WhatsApp on your phone


Once connected, your chats and contacts will sync automatically. If syncing fails, disconnect and reconnect the session.
Step 9: Deploy the FastAPI Bot
Now we’ll deploy the bot service.
1: Go to EasyPanel
You’re opening the EasyPanel dashboard you just installed. This is where you can manage apps, servers, and services using a graphical interface instead of terminal commands.
2: Create a new project
A “project” is like a container or folder for your bot service. It organizes all files, settings, and deployments for this app.
3: Add an App service
“App service” means a running instance of your application. In this case, it will be the WhatsApp bot.
4: Choose Git deployment
Git deployment lets you connect a code repository to EasyPanel. EasyPanel will automatically download your code from GitHub and run it inside Docker.
5: Paste your repository URL
https://github.com/rajumanoj333/wabot

This is the GitHub repository containing the WhatsApp bot code. EasyPanel will clone this repo and prepare the app automatically.
6: Domains in EasyPanel
This section lets you assign a URL or domain name to your app service. Even if you don’t have a custom domain, you can use your server’s public IP. Your WhatsApp bot app runs on port 9000 inside the server.
7: Set the port to 9000
By setting the domain to use port 9000, EasyPanel knows where to send traffic.
Example URL after this step:
https://your-project.easypanel.host

This is the public address people (and other services) will use to reach your bot.
You’re telling EasyPanel:

“Whenever someone accesses this project, forward them to the bot service running on port 9000.”

Without this step, the bot service would run but you wouldn’t be able to access it from your browser or other apps.
Configure Environment Variables
Set the following variables:
EVOLUTION_API_URL=http://evolution-api:8080
EVOLUTION_API_KEY=YOUR_AUTHENTICATION_API_KEY
INSTANCE_NAME=your_instance_name

Note: You might notice two different names here – AUTHENTICATION_API_KEY (used in EasyPanel) and EVOLUTION_API_KEY (used in your bot code). They are the same key. Just copy the value from EasyPanel and paste it into both places.
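As an illustration of how a Python service could read these three variables at startup, here is a minimal sketch. How the wabot repository actually loads its settings is not shown in this guide, so BotConfig and load_config are hypothetical names, and failing fast on missing values is a design choice of this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BotConfig:
    """Hypothetical container for the three settings from this step."""
    evolution_api_url: str
    evolution_api_key: str
    instance_name: str


def load_config(env=os.environ) -> BotConfig:
    """Read the variables and fail early, with a clear message, if any is missing."""
    required = ("EVOLUTION_API_URL", "EVOLUTION_API_KEY", "INSTANCE_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return BotConfig(
        # Strip a trailing slash so later URL joins do not produce "//".
        evolution_api_url=env["EVOLUTION_API_URL"].rstrip("/"),
        evolution_api_key=env["EVOLUTION_API_KEY"],
        instance_name=env["INSTANCE_NAME"],
    )


if __name__ == "__main__":
    # Demo with an explicit dict instead of the real process environment.
    demo_env = {
        "EVOLUTION_API_URL": "http://evolution-api:8080/",
        "EVOLUTION_API_KEY": "YOUR_AUTHENTICATION_API_KEY",
        "INSTANCE_NAME": "your_instance_name",
    }
    print(load_config(demo_env))
```

Failing at startup makes a forgotten variable obvious in the EasyPanel logs, instead of surfacing later as a confusing request error.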
Step 10: Connect the Webhook – Telling Evolution API Where to Send Messages
At this point, you have two separate things running:

Evolution API: the service that connects to WhatsApp and handles messages

Your app (FastAPI bot): the chatbot brain you deployed in the previous steps


Right now, neither of these knows the other exists. They're like two people in different rooms with no way to pass notes between them. A webhook fixes that.
So what exactly is a webhook?
A webhook is simply a URL (a web address) that you hand to one service so it can automatically notify another service when something happens.
You're going to tell Evolution API "whenever a WhatsApp message arrives, forward it to this address." Your app will be sitting at that address, waiting to receive it, read it, and send a reply.
Think of it like a forwarding address at the post office. When mail (a WhatsApp message) arrives, it gets automatically redirected to your app's door.
Let's set it up
1. Open your Evolution API dashboard.
You should already have this open from earlier steps. In the left sidebar, click on Events, then click on Webhook. This is where you control how Evolution API sends data to your app.
2. Turn the webhook on.
At the top of the page, you'll see a toggle next to the word "Enabled". Click it so it turns green. This tells Evolution API that you want to start using a webhook.
3. Enter your app's webhook URL.
In the URL field, type your app's address with /webhook added to the end, like this:
https://your-domain.easypanel.host/webhook

Replace your-domain with the actual domain name you set up when you deployed your app. The /webhook part at the end is important: it's a specific page your app has set up just for receiving these messages. Without it, Evolution API would be knocking on the wrong door.
4. Leave "Webhook by Events" and "Webhook Base64" turned off for now.
These are advanced options you won't need for a basic chatbot.
5. Scroll down to the Events section and enable these two events:

MESSAGES_UPSERT: This triggers every time someone sends your WhatsApp number a message. Without this, your app would never know a message arrived.

SEND_MESSAGE: This triggers when a message is sent out. It helps your app confirm that replies are going through correctly.


You can leave all the other events (like APPLICATION_STARTUP) turned off. They handle things like group chats and contact updates, which aren't needed for what we're building.
6. Click Save.
Quick recap of what you just did
You created a direct line between Evolution API and your app. Now, the moment someone messages your WhatsApp number, Evolution API will instantly pass that message along to your app. Your app reads it, figures out a response, and sends one back, all automatically.
This is the step that brings your chatbot to life. Without it, nothing would happen when someone sent you a message. With it, the whole system clicks into place.
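On the receiving side, the /webhook endpoint has to decide which incoming events deserve a reply. The sketch below shows that filtering step as a plain Python function. The payload layout it expects (an event name plus data.key and data.message fields) is an assumption made for illustration, and extract_incoming_text is a hypothetical helper, so inspect a real request in your logs before relying on any field name.

```python
from typing import Optional

# ASSUMPTION: the payload layout used here (event / data.key / data.message)
# is an illustrative guess at a MESSAGES_UPSERT webhook body, not a documented
# contract. Verify the field names against a real request from your instance.


def extract_incoming_text(payload: dict) -> Optional[tuple[str, str]]:
    """Return (sender, text) for an incoming text message, or None to ignore it."""
    if payload.get("event") != "messages.upsert":
        return None  # e.g. confirmations for messages that were sent out
    data = payload.get("data", {})
    key = data.get("key", {})
    if key.get("fromMe"):
        return None  # skip the bot's own messages to avoid reply loops
    text = data.get("message", {}).get("conversation")
    sender = key.get("remoteJid")
    if not text or not sender:
        return None  # not a plain text message
    return sender, text


if __name__ == "__main__":
    sample = {
        "event": "messages.upsert",
        "data": {
            "key": {"remoteJid": "15550001111@s.whatsapp.net", "fromMe": False},
            "message": {"conversation": "Hi"},
        },
    }
    print(extract_incoming_text(sample))
```

Ignoring messages flagged as sent by the bot itself matters once SEND_MESSAGE events are enabled, because otherwise a bot can end up answering its own replies.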
Step 11: Final Test
Send a message from a different WhatsApp number (not the connected one).
Send:
Hi

If everything is configured correctly, your bot should reply:
👋 Hello! Bot is working.

Congratulations! Your WhatsApp bot is now live.
Production Considerations
For real-world deployments, consider:

Restricting firewall rules instead of allowing 0.0.0.0/0

Using HTTPS with a custom domain

Securing API keys with a secret manager

Monitoring logs and container health

Setting up automatic backups


This tutorial demonstrates the core working system, but these improvements will make your deployment more secure and scalable.
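To see what restricting the source range means in practice, this short Python sketch checks whether a client IP falls inside a list of allowed CIDR ranges, which is the same kind of test a firewall applies to --source-ranges. The addresses come from documentation-reserved ranges and the function name is_source_allowed is hypothetical.

```python
import ipaddress


def is_source_allowed(client_ip: str, allowed_ranges: list[str]) -> bool:
    """Return True if client_ip belongs to at least one allowed CIDR range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


if __name__ == "__main__":
    wide_open = ["0.0.0.0/0"]          # what the tutorial's rule uses
    office_only = ["203.0.113.0/24"]   # hypothetical office network

    for ip in ("203.0.113.7", "198.51.100.9"):
        print(ip, is_source_allowed(ip, wide_open), is_source_allowed(ip, office_only))
```

With 0.0.0.0/0 every IPv4 address passes, while the narrower range only admits clients from that one network.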
Conclusion
You now have a fully self-hosted WhatsApp bot running on a cloud VM using FastAPI, Evolution API, Docker, EasyPanel, and GCP.
This setup gives you:

Full control over infrastructure

No dependency on expensive SaaS platforms

Production-ready container deployment

Scalable architecture


From here, you can extend your bot with:
AI integrations: connect your bot to ChatGPT, Gemini, or Claude so it can answer questions intelligently instead of just sending fixed replies.
Database storage: save incoming messages, user details, or conversation history to a database like PostgreSQL or MongoDB.
Custom automation workflows: trigger actions based on keywords, like sending a PDF when someone types "menu" or booking an appointment when they type "schedule".
CRM integrations: connect your bot to tools like HubSpot or Notion to automatically log leads and customer conversations.
Building your own infrastructure is one of the best ways to deeply understand how modern backend systems work together.
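The keyword-based automation idea from the list above could start as small as the following Python sketch. The route table and the action names (send_menu_pdf, start_booking, send_fallback_reply) are hypothetical placeholders for whatever your bot would actually do.

```python
# Hypothetical keyword router: maps the first word of a message to an action name.
ROUTES = {
    "menu": "send_menu_pdf",
    "schedule": "start_booking",
}

FALLBACK = "send_fallback_reply"


def route(text: str) -> str:
    """Pick an action name based on the first word of the message."""
    words = text.strip().lower().split()
    if not words:
        return FALLBACK
    return ROUTES.get(words[0], FALLBACK)


if __name__ == "__main__":
    for message in ("menu", "Schedule for Friday", "hello"):
        print(repr(message), "->", route(message))
```

Keeping the mapping in a dictionary means adding a new keyword is a one-line change rather than another if branch.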
Happy building!
 


 Learn Cloud Security Fundamentals in AWS – A Guide for Beginners 
Ijeoma Igboagu — Tue, 09 Dec 2025 00:58:17 +0000
 Security is a vital part of every system and infrastructure. The word "security" comes from the Latin securitas, which is composed of se- (meaning “without”) and cura (meaning “care” or “worry”). Originally, it meant "without worry." Over time, it has come to signify being safe or protected.
Today, when we discuss security, we usually refer to protection from harm, danger, or threats, whether in our homes, online, while using online banking, or even across an entire country. Security is important in everything we do.
Cloud providers, such as AWS, are no exception. Their infrastructure must be safeguarded to ensure users’ peace of mind. But on platforms like AWS, security is a shared responsibility. This means that both the provider and the user play a role in maintaining security.
Amazon Web Services (AWS) is one of the most popular cloud service providers worldwide. With great power and flexibility comes the responsibility to secure your infrastructure, data, and applications in the cloud.
In this tutorial, we’ll explore the fundamental aspects of cloud security in AWS – especially those that are your responsibility – making it easy to understand if you’re new to cloud computing.
Table of Contents

What is Cloud Security?

Why is Cloud Security Important?

Key Cloud Security Concepts

What is a Root User?

How to Create an IAM User for Daily Tasks

Key Differences Between the Root User and an IAM User

What is MFA?



Understanding the AWS Shared Responsibility Model

RDS (Relational Database Service)

S3 (Simple Storage Service)

How to give a user permission

Testing the Policy



Conclusion

Further Reading



What is Cloud Security?
Cloud security is the set of rules, tools, and practices used to protect your data, apps, and services stored online (in the "cloud"). It helps prevent data loss, hacking, and misuse of information.
Think of cloud security like locking the doors of your house. You wouldn’t leave your doors open for anyone to enter. And in the same way, your cloud account must be secured so that your data remains safe.
If your cloud services aren't secure, hackers could steal your data or cause major damage. Whether you're a business or just someone using cloud apps, keeping your information safe is essential.
Why is Cloud Security Important?
Cloud security matters because it ensures that only the right people have access to your information. It protects your data from being lost, stolen, or misused. With good security in place, your applications can run safely without being exposed to attacks.
It also helps you keep your personal or business data private. When your cloud environment is well-protected, the risk of data breaches and financial loss is greatly reduced.
Now that you understand why cloud security is important, let’s look at how AWS helps you stay secure and what your own role is in keeping things safe.
Key Cloud Security Concepts
In AWS, cloud security is the responsibility of both AWS and the customer. This model is called the Shared Responsibility Model.
But before learning how AWS divides security duties, you need to understand that while AWS protects its infrastructure, you must protect your own account.
Let’s discuss some key security concepts that are your responsibility, so you know how to do your part in the shared responsibility model.
What is a Root User?
When you create an AWS account, the first identity that’s created is the root user. This identity has full, unrestricted control. It can delete resources, change ownership, and even close your entire AWS account. Because of this, it’s risky to use it for everyday tasks.
AWS recommends using root only for a few important account-level actions.
Certain tasks require a root user account, so you will need to use it occasionally. Such tasks include:

Updating billing and payment information

Closing your AWS account

Changing the root account email

Recovering or resetting MFA for the root user


Apart from these few tasks, avoid using the root user completely. Your everyday work should be done through IAM users, not the root account.
How to Create an IAM User for Daily Tasks
Before you start creating any infrastructure in your AWS account, you need an IAM user with the right permissions.
Here’s how to create an IAM user, step by step:

Open the AWS console.

Search for IAM, then select it. This takes you to the IAM page.

On the left-hand side, you will see Users.

Click on it. This takes you to the Users page.

Click the Create user button. It takes you to the “specify user details page” where you will create an IAM user.

Enter a username (for example, adminuser).

Click on “Provide user access to the AWS Management Console”.




Scroll down and click on “Set a password,” or let AWS generate one for you.

Click Next to go to the permissions page.

Select Attach existing policies directly.

Choose AdministratorAccess. This permission gives the IAM user full access to perform all administrative tasks in your account.

Click Create user.


Once you’ve created this user, sign in with it and use it for your day-to-day tasks. The root user should stay locked down and only be used for rare account-level changes.
Video Walkthrough of How to Create an IAM User:

Key Differences Between the Root User and an IAM User
Just to be clear, let’s summarise the differences between these two accounts:
Root Account
This is the very first account created when you set up AWS. It has unlimited power – literally, everything in the account can be changed, deleted, or closed.
It’s meant for rare, high-level tasks like billing changes, MFA resets, or closing the account. Because it’s so powerful, you shouldn’t use it for daily work.
IAM User Account
This is a user you create inside your AWS account for everyday tasks. You can assign specific permissions, like admin or limited access, to this user. It’s much safer because you can control what it can and cannot do.
If something goes wrong or the credentials are compromised, the blast radius is much smaller than for the root user.
In short, the root user is the master key: too powerful for daily use. IAM users are customizable and safer for your regular work.
Here’s a helpful visual to show the differences between the two as well:

Now that you have both your root user and IAM user set up properly, let’s look at multi-factor authentication, or MFA, which came up briefly in the list of root user tasks.
What is MFA?
MFA adds another layer of security when you sign in. It combines something you know, like your password, with something you have, such as a phone or security device. Even if someone gets your password, they can’t log in without your MFA code.
You can enable MFA in several ways:

Using a virtual MFA app like Google Authenticator or Authy

Using a physical security key such as a YubiKey

Using a hardware device from Gemalto

For AWS GovCloud users, using an MFA device from SurePassID


Enabling MFA makes sure that even if someone gets your password, they still can’t access your account without the second authentication step.
For this tutorial, we’ll use the Google Authenticator app, which you can download for free from the Play Store or the App Store.
How do I turn this on for my account?

Go to your AWS account.

At the top right corner, you’ll see a menu with your account username or ID. Click on it to open the drop-down.

You’ll see Security Credentials. Click on it. This will take you to the IAM-Security Credentials page.




At the top of the page, you’ll see a button labelled Assign MFA device. Click on it.

You’ll be redirected to a new page where you can choose the type of MFA device you want to use. Scroll down and select Virtual MFA device (this is what the Google Authenticator app uses).



Then just follow the on-screen instructions:

Open the Google Authenticator app on your phone.

Tap the + button and scan the QR code displayed on the AWS screen.

Enter two consecutive codes generated by the app to verify your device.


Once verified, AWS will link the MFA device to your account and take you back to the Security credentials page. If you scroll down, you’ll see your MFA device listed as assigned.
The next time you log in to AWS, you’ll be prompted to enter your MFA code from the Google Authenticator app before you can access your console.
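If you're curious where those six-digit codes come from, authenticator apps generally implement the time-based one-time password algorithm (TOTP, RFC 6238), which builds on HOTP (RFC 4226). The Python sketch below implements that public standard using only the standard library. It is not AWS's own verification code, and the secret shown is the published RFC test value, not a real key.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password, as specified in RFC 4226."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low 4 bits pick the offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the number of 30-second steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


if __name__ == "__main__":
    rfc_secret = b"12345678901234567890"  # test secret published in the RFCs
    print(totp(rfc_secret, at=59))        # 287082
    print(totp(rfc_secret))               # changes every 30 seconds
```

Because the code depends on the current 30-second window, asking for two consecutive codes during setup is a way to confirm that the app and the server share both the secret and roughly the same clock.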

Always enable MFA for both your root user account and your IAM user account, as it’s one of the simplest and most effective ways to protect your AWS account.
Now that you understand these security fundamentals, we can get back to the shared responsibility model.
Understanding the AWS Shared Responsibility Model
The AWS Shared Responsibility Model divides responsibilities between AWS and the customer.
1. AWS’s Responsibility (Security of the Cloud)
AWS is responsible for protecting the infrastructure that runs the services offered in the AWS Cloud. This includes physical security, hardware, software, networking, and facilities.
2. Customer’s Responsibility (Security in the Cloud)
The customer is responsible for securing the data, user accounts, applications, and configurations they store in the cloud.

Image source: AWS shared responsibility model
For example, AWS is responsible for securing its data centres and servers. But customers also have a role to play by properly configuring their accounts and resources.
Let’s take two popular AWS services, RDS (Relational Database Service) and S3 (Simple Storage Service), as examples.
RDS (Relational Database Service)
AWS responsibilities:

Automates database patching

Audits and maintains the underlying instance and storage disks

Applies operating system patches automatically


Customer responsibilities (you):

Manage in-database users, roles, and permissions

Choose whether your database is public or private

Review and control inbound rules, ports, and IP addresses in the database’s security group

Configure database encryption settings


S3 (Simple Storage Service)
AWS responsibilities:

Ensures encryption options are available for your data

Guarantees virtually unlimited storage capacity

Prevents AWS employees and the public from accessing your data

Keeps each customer’s data separated from others


Customer responsibilities (you):

Define your S3 bucket policies according to your security standards

Review bucket configuration settings

Create and manage IAM users and roles with the right permissions


Now you understand who’s responsible for what.
How to Give a User Permission
Security in the cloud isn’t just about strong passwords or enabling MFA – it’s also about controlling who can access what. One of the most important principles in AWS security is to grant users only the access they actually need, nothing more. That’s how you keep your environment safe and your resources protected.
So here’s a key question: how do we decide who should be allowed or denied access in the cloud?
Demonstration
Let’s walk through a simple, real-life example together.
Imagine you have a developer on your team who needs access to an S3 bucket named demo-test-app-ij. The goal is to let them upload and view files in the bucket, but not delete anything.
We already created a user earlier in this guide, so we’ll use that same one here.
To get started, go to IAM from your AWS Management Console. Then click on Users from the left-hand menu.
Select the user we created earlier. If you don’t have one yet, go back and follow the steps I showed you before to create a new IAM user.
Once you click on the user’s name, you’ll be taken to the Permissions page. On the permissions page, click on Add permissions.
From the dropdown options, select Create inline policy. This will open the Specify permissions page, where you’ll define the user’s access.
Scroll down through the list of services and select S3. In our example, we’re using S3 because we want to control access to a specific bucket.

Once you select the service you want to define permissions for, the Actions and Resources sections will appear automatically.
In the Actions section, you’ll see a list of what the user can do with the service. Here, you can toggle the effect button to either “Allow” or “Deny.”
Under Actions, switch the effect to Deny, then scroll through the list and select DeleteObject. This ensures the user won’t be able to delete any files from the bucket.
Next, move on to the Resources section. Here, you’ll specify which bucket these permissions apply to.
Add the following ARN: arn:aws:s3:::demo-test-app-ij/*. The /* at the end matches every object inside the demo-test-app-ij bucket, so the rule applies to all files stored in it.
Once you’ve added the ARN and confirmed the settings, click Save policy.
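Behind the visual editor, the console steps above produce a JSON policy document. The Python sketch below builds what that document would roughly look like and prints it as JSON. The Sid value is a hypothetical label added for readability, and the exact JSON the console generates may differ slightly.

```python
import json

# Sketch of the inline policy described by the console steps.
# "DenyObjectDeletion" is a hypothetical, human-readable statement ID.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyObjectDeletion",
            "Effect": "Deny",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::demo-test-app-ij/*",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```

You can paste the printed JSON into the JSON tab of the policy editor instead of clicking through the visual editor.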
Now, let’s put all these instructions together in a practical example:

Testing the Policy
Now it’s time to confirm that our permissions work the way we expect.
Head over to the S3 service and open the bucket named demo-test-app-ij. Try uploading a file; it should upload successfully. Next, try deleting that same file. You’ll see an error message saying Failed to delete objects.
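That’s exactly what we want! The user can upload and view files, but can’t delete them. Assuming the user still has the AdministratorAccess policy from earlier, uploads are allowed by that policy, while the explicit Deny on DeleteObject in our inline policy overrides it, because in IAM an explicit Deny always wins over an Allow.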
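The outcome of this test follows from how IAM combines statements: an explicit Deny wins over any Allow, and anything not allowed is denied by default. The Python sketch below is a heavily simplified model of that rule. It ignores conditions, permission boundaries, resource policies, and organization-level controls, and the statements assume the user has both an administrator-style Allow and the Deny from this walkthrough.

```python
from fnmatch import fnmatchcase

# Simplified stand-ins for the user's two policies (assumed, for illustration).
ADMIN_ALLOW = {"Effect": "Allow", "Action": "*", "Resource": "*"}
DENY_DELETE = {
    "Effect": "Deny",
    "Action": "s3:DeleteObject",
    "Resource": "arn:aws:s3:::demo-test-app-ij/*",
}


def applies(statement: dict, action: str, resource: str) -> bool:
    """Wildcard-match the request against one statement."""
    return (fnmatchcase(action, statement["Action"])
            and fnmatchcase(resource, statement["Resource"]))


def is_allowed(statements: list[dict], action: str, resource: str) -> bool:
    """Explicit Deny beats Allow; with no matching Allow the request is denied."""
    matching = [s for s in statements if applies(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)


if __name__ == "__main__":
    statements = [ADMIN_ALLOW, DENY_DELETE]
    obj = "arn:aws:s3:::demo-test-app-ij/report.pdf"
    print("upload:", is_allowed(statements, "s3:PutObject", obj))     # True
    print("delete:", is_allowed(statements, "s3:DeleteObject", obj))  # False
```

The order of the statements does not matter in this model: a single matching Deny is enough to block the request.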

Conclusion
Security has always been about peace of mind. Whether it’s your home, your phone, or your cloud account, you’ll want to know your data is safe.
AWS gives you a strong foundation by securing the cloud itself. But your part matters too: things like enabling MFA, using strong passwords, and managing who can access what. These simple habits go a long way in keeping your data protected.
Cloud security isn’t a one-time setup. It’s an ongoing practice. When both AWS and its users stay alert, the cloud becomes a place you can trust to store, build, and grow with confidence.
Now that you have a basic understanding of how security works in AWS, you’re ready to go deeper and start exploring the services that keep it all running smoothly.
Further Reading

What is Cloud Computing? A Guide for Beginners

How to Deploy a Kubernetes App on AWS EKS

The Best AWS Services to Deploy Front-End Applications in 2025

What is Backend as a Service (BaaS)? A Beginner's Guide

The Hidden Challenges of Building with AWS


If you found this article helpful, feel free to share it. And if you prefer learning through videos, I also explain cloud topics in simple terms on my YouTube channel.
Stay updated with my projects by following me on Twitter, LinkedIn and GitHub.
Thank you for reading!
 


 How to Build a Full-Stack Serverless CRUD App using AWS and React 
Chisom Uma — Tue, 21 Oct 2025 16:37:30 +0000
 Imagine running a production application that automatically scales from zero to thousands of users without ever touching a server configuration. That's the power of serverless architecture, and it's easier to implement than you might think.
If you're a junior cloud engineer ready to move beyond theoretical AWS concepts and build something real, this tutorial walks you through creating a complete serverless coffee shop management system.
You'll learn how to architect, deploy, and secure a production-ready application using AWS's most powerful serverless services.
Without further ado, let's get started!
Table of Contents

Prerequisites

Tools We’ll be Using

What We are Building

Why Serverless?

Architectural Overview

Build a Serverless Full-Stack App

Step 1: Create a DynamoDB table

Step 2: Create an IAM role for the Lambda function

Step 3: Create Lambda Layer And Lambda Functions

Step 4: Create an API Gateway To Expose Lambda Functions

Step 5: Set up React Application And Upload Build To S3 Bucket

Step 6: Set up Amazon API Gateway Authorizer

Step 7: Create Cloudfront Distribution With Behaviors For S3 And API Gateway

Step 8: Set up React Application And Upload Build To S3 Bucket



Troubleshooting Access Denied Error

Step 1: Set up Origin Access Control (OAC)

Step 2: Update S3 Bucket Policy

Step 3: Set Default Root Object



Conclusion


Prerequisites

Basic knowledge of AWS.

Basic knowledge of AWS serverless services.

Knowledge of React (helpful, but not required).

Basic knowledge of Postman or other API testing tools.


Tools We’ll be Using

React.js

AWS Lambda

DynamoDB

API Gateway

Cognito

CloudFront


What We are Building
We'll build a complete serverless coffee shop management system using AWS cloud services. Coffee shop owners will securely log in through AWS Cognito authentication and have full control over their inventory, adding new products, updating stock levels, viewing current inventory, and removing discontinued items. To follow along with this tutorial, you can clone the repo here.
This is what our user interface (UI) looks like:

Why Serverless?
AWS serverless services like Lambda, Cognito, and API Gateway automatically scale to zero during quiet periods and instantly ramp up when traffic spikes. While 'serverless' might sound like there are no servers at all, this isn't actually the case. It means that AWS handles all the heavy lifting, provisioning, managing, and scaling of the infrastructure behind the scenes. You only pay for what you use.
Architectural Overview
Our architecture uses DynamoDB as the data store, with Lambda functions (enhanced by Lambda layers) handling all API Gateway requests. Cognito secures the API Gateway, while the CloudFront CDN delivers everything globally. The React frontend connects directly to the Cognito user pool and is hosted on S3 behind a CloudFront distribution. For production deployments, you can add a custom domain using Cloudflare and AWS Certificate Manager.
Build a Serverless Full-Stack App
In this section, you’ll build a full-stack serverless architecture.
Step 1: Create a DynamoDB table
To create a DynamoDB table, navigate to your AWS console and select the DynamoDB section. You can do this quickly by typing “DynamoDB” into the AWS search bar and clicking on DynamoDB. Next, follow the steps below to complete your table creation:

Click Create table.

Input table name as “CoffeeShop” or anything you want to name it.

Input partition key as “coffeeId” or anything you want to name it.

Click Create table.


Step 1.1: Create items
You need to create items for the table. This helps with testing connectivity to your DynamoDB table.
For our use case, we’ll create a coffee item in the table with attributes such as coffeeId, name, price, and available. To create an item:

Click Explore items on the left navigation pane.

Click Create items.

Click the CoffeeShop radio button, then click Create item.




Click Add new attribute. This allows you to add different data types such as strings and booleans. The JSON structure below shows the attributes created.


{
    "coffeeId": "c123",
    "name": "new cold coffee",
    "price": 456,
    "available": true
}
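
DynamoDB stores every attribute together with a type descriptor, which is why the raw GetItemCommand response later in this tutorial shows values such as { "S": "c123" } instead of plain JSON. The sketch below is a simplified, hand-written converter for illustration only: toAttributeValue is a hypothetical helper (not part of the AWS SDK) and it handles just strings, numbers, booleans, and nested objects. In real code, the marshall helper from @aws-sdk/util-dynamodb does this job.

```javascript
// Simplified illustration of DynamoDB's attribute-value format.
// toAttributeValue is a hypothetical helper, not part of the AWS SDK.
function toAttributeValue(value) {
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) }; // numbers travel as strings
  if (typeof value === "boolean") return { BOOL: value };
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    const map = {};
    for (const [key, inner] of Object.entries(value)) {
      map[key] = toAttributeValue(inner);
    }
    return { M: map };
  }
  throw new TypeError(`Unsupported value: ${JSON.stringify(value)}`);
}

const item = { coffeeId: "c123", name: "new cold coffee", price: 456, available: true };

// Convert each top-level attribute, as the low-level client expects.
const marshalled = Object.fromEntries(
  Object.entries(item).map(([key, value]) => [key, toAttributeValue(value)])
);

console.log(JSON.stringify(marshalled, null, 2));
```

The document client used later in this tutorial performs this conversion for you, so you can pass and receive plain JavaScript objects.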

Step 2: Create an IAM role for the Lambda function
Next, the Lambda functions need permission to interact with the DynamoDB table, which they get through an IAM role attached to each function. We’ll set up an IAM role named "CoffeeShopRole" that serves as a shared execution role for all Lambda functions in the coffee shop application.
This role includes the following permissions:

CloudWatch Logs: Full logging capabilities (create, write, and manage log streams)

DynamoDB Access: Complete read, write, update, and delete operations on the "CoffeeShop" table.


To do this:

Navigate to the AWS IAM console.

Navigate to Roles.

Click Create role.

Select the Lambda service.

Search for “AWSLambdaBasicExecutionRole” and select it.

Name your role and click Create role.


This is what the role looks like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb::"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

The AWSLambdaBasicExecutionRole managed policy allows the functions to write CloudWatch logs. To produce the DynamoDB statement shown above, create an inline policy that allows communication with DynamoDB. Select the following actions for the table:

Get

Put

Update

Scan

Delete


Next, connect your table ARN to the policy by navigating to the created table and copying the ARN into the policy.
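
To make the shape of that inline policy concrete, here is a minimal sketch that assembles it from a table ARN. The helper name buildTablePolicy is hypothetical, and the region and account ID are placeholders, so substitute the ARN you copied from your own table.

```javascript
// Builds the inline DynamoDB policy for the Lambda role.
// buildTablePolicy and the region/account values below are placeholders for illustration.
function buildTablePolicy({ region, accountId, tableName }) {
  const tableArn = `arn:aws:dynamodb:${region}:${accountId}:table/${tableName}`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "dynamodb:GetItem",
          "dynamodb:PutItem",
          "dynamodb:UpdateItem",
          "dynamodb:Scan",
          "dynamodb:DeleteItem",
        ],
        Resource: tableArn, // only this table, not every table in the account
      },
    ],
  };
}

const policy = buildTablePolicy({
  region: "us-east-1",
  accountId: "123456789012", // placeholder account ID
  tableName: "CoffeeShop",
});
console.log(JSON.stringify(policy, null, 2));
```

Scoping Resource to a single table ARN keeps the role from touching other tables in the same account.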
Step 3: Create Lambda Layer And Lambda Functions
Now, we need to connect our Lambda function to the DynamoDB table. For this, we’ll need the DynamoDB JavaScript SDK. To get started, create two folders: lambda > get in your IDE, preferably VS Code. Navigate into these folders in your terminal and run the npm init command to initialize your project. Update your package.json file with this:

{
  "name": "get",
  "type": "module",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "description": ""
}

Note that we’ll be using ECMAScript modules (ESM) throughout this tutorial.
Next, we have to create a reusable Node.js Lambda layer containing the DynamoDB JavaScript SDK and shared utility functions. This layer acts like a common library that can be attached to multiple Lambda functions, eliminating the need to bundle the same dependencies repeatedly in each function's deployment package.
To use the SDK, create a new file in your directory titled index.mjs and paste in the code below:

// getCoffee function
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb"; // ESM import
const config = {
    region: "us-east-1", // change this to your table's region
};
const client = new DynamoDBClient(config);
export const getCoffee = async (event) => {
    // Hard-coded ID for a first connectivity test
    const coffeeId = "c123";
    const input = {
        TableName: "CoffeeShop", // must match the table name from Step 1
        Key: {
            coffeeId: {
                S: coffeeId, // "S" marks the value as a DynamoDB string
            },
        },
    };
    const command = new GetItemCommand(input);
    const response = await client.send(command);
    console.log(response);
    return response;
}

The code above is the getCoffee function. It connects to the DynamoDB table called CoffeeShop, looks up the coffee with the ID c123, then logs and returns its details.
Change region to your specific region.
Next, install the Lambda dependencies for the SDK using the command below:

npm i @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb

Then, create a zip file for all the current files using the command below:
zip -r get.zip ./*

This creates a zip file in your project directory. Now, navigate to the Lambda function page on your AWS console and upload this zip file.
Click Test to test your application. If you run into an error, edit the Runtime settings and change the handler name to index.getCoffee. Then deploy and run the code again. You should get a successful response from DynamoDB, as shown below:
Response:

{
  "$metadata": {
    "httpStatusCode": 200,
    "requestId": "R14Q5UMTP3K9P9NAF1OGG0IB57VV4KQNSO5AEMVJF66Q9ASUAAJG",
    "attempts": 1,
    "totalRetryDelay": 0
  },
  "Item": {
    "available": {
      "BOOL": true
    },
    "price": {
      "N": "34"
    },
    "name": {
      "S": "My New Coffee"
    },
    "coffeeId": {
      "S": "c123"
    }
  }
}

Now, let’s make the changes needed to get our function ready for API Gateway. When someone requests the /coffee endpoint, we want the app to return a list of all coffees. But if the request is made to /coffee/c123 (that is, /coffee/{id}), the app should return only the details of that specific coffee.
To do this, head back to your index.mjs file and paste in the code below:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand, ScanCommand } from "@aws-sdk/lib-dynamodb";
const client = new DynamoDBClient({});
// The document client accepts and returns plain JavaScript values
const docClient = DynamoDBDocumentClient.from(client);
// Falls back to the table name from Step 1 when no environment variable is set
const tableName = process.env.tableName || "CoffeeShop";
// Wraps a status code and payload in the response shape API Gateway expects
const createResponse = (statusCode, body) => {
    const responseBody = JSON.stringify(body);
    return {
        statusCode,
        headers: { "Content-Type": "application/json" },
        body: responseBody,
    };
};
export const getCoffee = async (event) => {
    const { pathParameters } = event;
    const { id } = pathParameters || {};
    try {
        let command;
        if (id) {
            // /coffee/{id}: fetch a single item by its partition key
            command = new GetCommand({
                TableName: tableName,
                Key: {
                    "coffeeId": id,
                },
            });
        }
        else {
            // /coffee: return every item in the table
            command = new ScanCommand({
                TableName: tableName,
            });
        }
        const response = await docClient.send(command);
        return createResponse(200, response);
    }
    catch (err) {
        console.error("Error fetching data from DynamoDB:", err);
        return createResponse(500, { error: err.message });
    }
}

Run the zip -r get.zip ./* command again and re-upload the zip file in your Lambda function page.
This Lambda function implements a serverless API endpoint for retrieving coffee data from a DynamoDB table. It uses the AWS SDK v3 document client to either fetch a specific coffee item by ID (when an id parameter is provided in the URL path) or return all items from the table (when no ID is specified).
The function extracts the coffee ID from the incoming event's path parameters, constructs the appropriate DynamoDB command (GetCommand for a single item or ScanCommand for all items), executes the database operation, and returns a formatted HTTP response with JSON headers: a 200 response with the coffee data on success, or a 500 response with the error message if the database operation fails.
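
The branching described above can be tried out without touching DynamoDB. The sketch below isolates it in a hypothetical helper called chooseOperation, and the two event objects are trimmed-down stand-ins that keep only the field the handler reads.

```javascript
// Mirrors the branching in getCoffee without touching DynamoDB.
// chooseOperation is a hypothetical helper used only for this illustration.
function chooseOperation(event, tableName = "CoffeeShop") {
  const { id } = event.pathParameters || {};
  return id
    ? { operation: "Get", TableName: tableName, Key: { coffeeId: id } }
    : { operation: "Scan", TableName: tableName };
}

// Event shapes trimmed down to the one field the handler reads.
const listEvent = { pathParameters: undefined };        // GET /coffee
const singleEvent = { pathParameters: { id: "c123" } }; // GET /coffee/c123

console.log(chooseOperation(listEvent));
console.log(chooseOperation(singleEvent));
```

Keep in mind that a Scan reads the whole table, which is fine for a small demo table but gets slower and more expensive as the table grows.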
Repeat the steps above for the create, update, and delete functions. You can find these functions in your cloned project repo.
Step 4: Create an API Gateway To Expose Lambda Functions
To create an API that points to the Lambda function:

Navigate to API Gateway > Routes and click Create.

Create the following endpoints.



GET /coffee  -> getCoffee lambda function
GET /coffee/{id}  -> getCoffee lambda function
POST /coffee  -> createCoffee lambda function
PUT /coffee/{id}  -> updateCoffee lambda function
DELETE /coffee/{id}  -> deleteCoffee lambda function


Navigate to Integrations and create integrations for these endpoints. To do this, go to the Manage integrations tab, click Create, and select Lambda as the integration target.

Now, in your API Gateway portal, click on API: CoffeeShop...(random numbers) and copy the invoke URL for testing, as shown in the image below:

The get request with an id returns a 200 OK response with the created items in DynamoDB. You can play around with the rest of the endpoints on Postman :)
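
If you'd rather script these checks than click through Postman, the sketch below builds the fetch arguments for each route. INVOKE_URL is a placeholder for the invoke URL you copied, buildRequest is a hypothetical helper, and the POST and PUT bodies are assumptions based on the item attributes from Step 1, so adjust them to whatever your create and update functions expect.

```javascript
// Builds fetch() arguments for each CoffeeShop route.
// INVOKE_URL is a placeholder; replace it with the invoke URL you copied.
const INVOKE_URL = "https://example.execute-api.us-east-1.amazonaws.com";

function buildRequest(method, { id, body } = {}) {
  const path = id ? `/coffee/${encodeURIComponent(id)}` : "/coffee";
  const options = { method, headers: {} };
  if (body !== undefined) {
    options.headers["Content-Type"] = "application/json";
    options.body = JSON.stringify(body);
  }
  return { url: `${INVOKE_URL}${path}`, options };
}

const requests = [
  buildRequest("GET"),
  buildRequest("GET", { id: "c123" }),
  buildRequest("POST", { body: { coffeeId: "c124", name: "latte", price: 5, available: true } }),
  buildRequest("PUT", { id: "c124", body: { price: 6 } }),
  buildRequest("DELETE", { id: "c124" }),
];

for (const { url, options } of requests) {
  console.log(options.method, url);
  // To actually send a request: const res = await fetch(url, options);
}
```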
Adding Lambda Layer to Solve the Dependency Issue
Before we continue with this tutorial, I’d like to address one problem with the steps so far. All functions use the same dependencies, but for each function, we had to maintain a separate node_modules folder and package.json file. To fix this, we’ll use a Lambda layer. The layer contains all the dependencies, while the functions contain only your code.
To get started:

Create a new folder in your IDE called LambdaWithLayer.

Create two additional folders under the LambdaWithLayer named LambdaFunctionsWithLayer and nodejs.


Note: You must use the name nodejs for this to work.

Navigate to the nodejs folder and initialize using the npm init command.

Install dependencies using the command below:


npm i @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb


Create a new file called utils.mjs under the nodejs folder and paste in the code below:


import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
    DynamoDBDocumentClient,
    ScanCommand,
    GetCommand,
    PutCommand,
    UpdateCommand,
    DeleteCommand
} from "@aws-sdk/lib-dynamodb";
const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);
const createResponse = (statusCode, body) => {
    return {
        statusCode,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
    };
};
export {
    docClient,
    createResponse,
    ScanCommand,
    GetCommand,
    PutCommand,
    UpdateCommand,
    DeleteCommand
};

Here, we imported all the commands for our API operations. Now, we can create Lambda Functions without installing the SDK dependencies for each one. For example, you can create a get folder under the LambdaFunctionsWithLayer folder for the get function, then create an index.mjs file under the get folder. Next, paste the code below:

import { docClient, GetCommand, ScanCommand, createResponse } from '/opt/nodejs/utils.mjs'; // Import from Layer
const tableName = process.env.tableName || "CoffeeShop"; // table name from Step 1
export const getCoffee = async (event) => {
    const { pathParameters } = event;
    const { id } = pathParameters || {};
    try {
        let command;
        if (id) {
            command = new GetCommand({
                TableName: tableName,
                Key: {
                    "coffeeId": id,
                },
            });
        }
        else {
            command = new ScanCommand({
                TableName: tableName,
            });
        }
        const response = await docClient.send(command);
        return createResponse(200, response);
    }
    catch (err) {
        console.error("Error fetching data from DynamoDB:", err);
        return createResponse(500, { error: err.message });
    }
}

Now we can see that, in the code, we no longer require dependencies for the get function. We just imported from the layer.
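
The /opt/nodejs/utils.mjs import path follows from how Lambda unpacks layers: the contents of the layer zip are extracted under /opt in the function's runtime environment. The sketch below only illustrates that mapping, and layerRuntimePath is a hypothetical helper rather than anything Lambda provides.

```javascript
// Lambda extracts a layer's zip contents under /opt in the function's runtime.
// layerRuntimePath is a hypothetical helper that just illustrates the mapping.
function layerRuntimePath(zipEntry) {
  return `/opt/${zipEntry.replace(/^\.?\//, "")}`;
}

console.log(layerRuntimePath("nodejs/utils.mjs"));
console.log(layerRuntimePath("nodejs/node_modules/@aws-sdk/client-dynamodb"));
```

Because utils.mjs sits next to the layer's node_modules folder, its own SDK imports resolve from the layer rather than from each function's package.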
Repeat the above steps for other functions.
Note: You can find the code for other functions in the cloned repo.

Create a zip file for each function. You can do this by creating a file called create_zip.sh under the LambdaWithLayer folder, since the script's paths are relative to that folder. Then paste the script below:


#!/bin/sh
# Stop at the first failing command
set -e
echo "Creating zip for layer"
zip -r layer.zip nodejs
# Zip each function's index.mjs and place the archive next to layer.zip
for fn in get post update delete; do
  echo "Creating zip for $fn function"
  (cd "LambdaFunctionsWithLayer/$fn" && zip -r "../../$fn.zip" index.mjs)
done
echo "Success!"

Run the script using the sh create_zip.sh command. This creates zip files (including a layer.zip file) that you can upload to your AWS Lambda function Layer page.

In your AWS Lambda function page, navigate to Layers and upload the layer.zip file.

Update the functions by uploading the newly created zip file for each function.

Add the layer to the function by clicking Layers in the function view:



Next, click Add a layer, then select Custom layers. Then choose “DynamoDBLayer” and version “1”.

Click Add.

Repeat for all the other functions.


Step 5: Set up React Application And Upload Build To S3 Bucket
To set up our React application, navigate to the frontend folder of the cloned repository on your local machine and run npm install to install the dependencies. Then run npm run dev to start your development environment on your local machine. You should see the preview in your browser at: http://localhost:5173/.

If you inspect the page using Chrome DevTools, you’ll see that we ran into a CORS error:

Now, let’s fix this problem. To do that:

Navigate to your API Gateway page.

Click on CORS on the left navigation panel.

Click Configure.

Copy your localhost URL and paste it into the Access-Control-Allow-Origin field.



Make sure to remove the trailing / from your URL, as shown in the image above.

Click Add.

In the Access-Control-Allow-Headers field, enter content-type and click Add.

Include GET, POST, OPTIONS, PUT, and DELETE in Access-Control-Allow-Methods.

Click Save.


Now it returns our coffee, and the CORS error has been resolved.
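
To see why the trailing slash matters, it helps to model the check: the browser sends an Origin header without a trailing slash, and that value has to match an allowed origin exactly. The sketch below is a simplified model of that comparison, not API Gateway's actual implementation, and corsHeadersFor and the config object are hypothetical names.

```javascript
// Simplified model of the origin check behind a CORS configuration.
// corsHeadersFor and the config object are illustrative, not an API Gateway API.
const corsConfig = {
  allowOrigins: ["http://localhost:5173"],
  allowMethods: ["GET", "POST", "OPTIONS", "PUT", "DELETE"],
  allowHeaders: ["content-type"],
};

function corsHeadersFor(requestOrigin, config) {
  // Origins are compared as exact strings, so a trailing "/" never matches.
  if (!config.allowOrigins.includes(requestOrigin)) return null;
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Methods": config.allowMethods.join(","),
    "Access-Control-Allow-Headers": config.allowHeaders.join(","),
  };
}

// The browser sends the Origin header without a trailing slash.
console.log(corsHeadersFor("http://localhost:5173", corsConfig));

// A config entry saved with a trailing slash would not match that header.
const badConfig = { ...corsConfig, allowOrigins: ["http://localhost:5173/"] };
console.log(corsHeadersFor("http://localhost:5173", badConfig)); // null
```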

When you add a new coffee, you should see the newly created items in your DynamoDB database.
Step 6: Set up Amazon API Gateway Authorizer
Amazon Cognito helps you secure your Amazon API Gateway. API Gateway validates the access token with Amazon Cognito to ensure it is valid and has not expired, then grants or denies access based on the result.
To get started:

Navigate to Amazon Cognito > User pools.

Click Create user pool.

Select Single-page application (SPA).

Select email as the preferred sign-in and sign-up method.

Use http://localhost:5174/ or your own local URL as the return URL.

Click Create user directory.


You’ll be presented with a page containing code that we can copy and paste into our app for integration. But before we do that, let's head back to API Gateway and integrate it with Cognito. To do that:

Go to the Authorization section in API Gateway.

Navigate to Manage authorizers.

Click Create.

Select JWT and name it “Cognito-CoffeeShop”

Copy your issuer URL from the Cognito Overview page. The issuer URL is the Token signing key URL without the trailing /.well-known/jwks.json path. If you open the full Token signing key URL in your browser, you’ll see the public keys used to verify tokens.

For the Audience, navigate to the Cognito user pool, then to App clients, and select CoffeShopClient. Copy the Client ID.

Click Create.

Go to Routes and add authorizations to each endpoint.
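
To get a feel for what the JWT authorizer checks, the sketch below decodes a token's payload and compares the issuer, audience, and expiry claims. It is a simplified illustration that deliberately skips signature verification (which the real authorizer performs using the signing keys), and the issuer, client ID, and token values are placeholders.

```javascript
// Decodes a JWT payload and checks the claims a JWT authorizer cares about.
// This sketch does NOT verify the signature, so never use it to authorize requests.
function decodePayload(token) {
  const [, payload] = token.split(".");
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

function checkClaims(token, { issuer, audience, now = Date.now() }) {
  const claims = decodePayload(token);
  if (claims.iss !== issuer) return "wrong issuer";
  // Cognito access tokens carry the app client ID in client_id rather than aud.
  if (claims.aud !== audience && claims.client_id !== audience) return "wrong audience";
  if (claims.exp * 1000 <= now) return "expired";
  return "ok";
}

// Build an unsigned sample token with placeholder values for demonstration.
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const issuer = "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE";
const sampleToken = [
  encode({ alg: "none" }),
  encode({ iss: issuer, client_id: "example-client-id", exp: 2000000000 }),
  "signature-placeholder",
].join(".");

console.log(checkClaims(sampleToken, { issuer, audience: "example-client-id", now: 1700000000000 }));
```

Decoding a token like this is handy for debugging 401 responses, since a mismatched issuer or client ID is a common cause.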


Now, to integrate with our front-end app:
Navigate into the frontend folder and run the command below:
npm install oidc-client-ts react-oidc-context --save


Go to the App clients section in Cognito user pools to find the readily available code snippets for integration.

Edit your main.jsx file to include the code below:



import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import { BrowserRouter as Router, Route, Routes } from "react-router-dom";
import './index.css'
import App from './App.jsx'
import ItemDetails from "./ItemDetails";
import { AuthProvider } from "react-oidc-context";
// Replace these values with the ones from your own Cognito app client
const cognitoAuthConfig = {
  authority: "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_rXq7q3KLm",
  client_id: "6fjfrlaup7oph5lhf1q8q6pnp4",
  redirect_uri: "http://localhost:5174",
  response_type: "code",
  scope: "email openid phone",
};
createRoot(document.getElementById('root')).render(
  <StrictMode>
    <AuthProvider {...cognitoAuthConfig}>
      <Router>
        <Routes>
          <Route path="/" element={<App />} />
          <Route path="/details/:id" element={<ItemDetails />} />
        </Routes>
      </Router>
    </AuthProvider>
  </StrictMode>
)

Here, we imported AuthProvider from react-oidc-context and wrapped our app with it. Next, move the code in the App.jsx file to a newly created Home.jsx file, and update the App.jsx file with the code below:

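import "./App.css";
// App.jsx
import { useAuth } from "react-oidc-context";
import Home from "./Home";
function App() {
  const auth = useAuth();
  // Ends the Cognito session by redirecting to the hosted logout endpoint
  const signOutRedirect = () => {
    const clientId = "6fjfrlaup7oph5lhf1q8q6pnp4";
    const logoutUri = "http://localhost:5174/";
    const cognitoDomain = "https://us-east-1rxq7q3klm.auth.us-east-1.amazoncognito.com";
    window.location.href = `${cognitoDomain}/logout?client_id=${clientId}&logout_uri=${encodeURIComponent(logoutUri)}`;
  };
  if (auth.isLoading) {
    return <div>Loading...</div>;
  }
  if (auth.error) {
    return <div>Encountering error... {auth.error.message}</div>;
  }
  // Signed in: show the coffee dashboard that now lives in Home.jsx
  if (auth.isAuthenticated) {
    return (
      <div>
        <Home />
        <button onClick={() => auth.removeUser()}>Sign out</button>
      </div>
    );
  }
  // Signed out: offer the Cognito sign-in redirect
  return (
    <div>
      <button onClick={() => auth.signinRedirect()}>Sign in</button>
      <button onClick={() => signOutRedirect()}>Sign out</button>
    </div>
  );
}
export default App;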

Now, when you run the application again, you should see this login page on your browser:

When you click on Sign in, you’ll get directed to the Sign in page. Click Sign up. You should see the page below to create your account.

During sign-up, a verification code is sent to your sign-up email. Once you’re logged in, you can then access your coffee dashboard.
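
Once the routes require authorization, requests from the dashboard need to carry the signed-in user's access token. The sketch below shows one way to build those headers from the user object that react-oidc-context exposes as auth.user. The authHeaders helper and the apiUrl name in the final comment are hypothetical, and the token value is a stand-in.

```javascript
// Builds request headers from the user object that react-oidc-context exposes
// as auth.user. authHeaders is a hypothetical helper for this illustration.
function authHeaders(user) {
  if (!user || !user.access_token) {
    throw new Error("Not signed in: no access token available");
  }
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${user.access_token}`,
  };
}

// Stand-in for auth.user; a real token comes from Cognito after sign-in.
const fakeUser = { access_token: "example-access-token" };
console.log(authHeaders(fakeUser));
// In a component: fetch(`${apiUrl}/coffee`, { headers: authHeaders(auth.user) })
```

If you send this header from the browser, you will likely also need to add authorization to Access-Control-Allow-Headers in the CORS settings from Step 5.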
Step 7: Create Cloudfront Distribution With Behaviors For S3 And API Gateway
To create a distribution:

Navigate to CloudFront.

Click Create distribution.

In the Origin page, select the S3 bucket and browse through your created S3 buckets.

Select your coffee shop bucket.

Set origin path to /dist.

Select Origin access control under Origin access.

Update your React code and the callback URLs in your Cognito app client (under the Login pages tab) with the distribution domain name that CloudFront provides.


Step 8: Set up React Application And Upload Build To S3 Bucket
In this step, we’ll be building our React application and uploading the static files to an Amazon S3 bucket, which is then served from a CloudFront distribution.
To get started:

Create an S3 bucket and give it a name such as “mycoffeeshop123new”. Bucket names must be lowercase and globally unique across all AWS accounts.

In the frontend folder, run the npm run build command. This creates a dist folder in your directory.

Head back to the S3 bucket and drag-and-drop the dist folder into S3 to upload it.

Click Upload.


Now, copy your CloudFront distribution URL and try to access your site in a private browser, for example, Chrome incognito. You should see your site live in the browser.
Troubleshooting Access Denied Error
You may encounter an access denied error in the browser:

<Error>
    <Code>AccessDenied</Code>
    <Message>Access Denied</Message>
</Error>

This is most likely caused by a misconfiguration between S3 and CloudFront. Here are the steps to resolve it:
Step 1: Set up Origin Access Control (OAC)

Go to CloudFront > Your Distribution > Origins tab.

Select your S3 origin and click Edit.

Under Origin access, select Origin access control settings (recommended)

Click Create new OAC (or select an existing one).

Click Save changes.


Step 2: Update S3 Bucket Policy
After saving, CloudFront will show you a "Copy Policy" button. Click it, then:

Go to your S3 bucket > Permissions tab.

Scroll to Bucket policy and click Edit.

Paste the copied policy (it should look like this):



{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontServicePrincipal",
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudfront.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*",
            "Condition": {
                "StringEquals": {
                    "AWS:SourceArn": "arn:aws:cloudfront::YOUR-ACCOUNT-ID:distribution/YOUR-DISTRIBUTION-ID"
                }
            }
        }
    ]
}


Click Save changes.

Step 3: Set Default Root Object

Go back to CloudFront > Your Distribution > General tab.

Click Edit.

Set Default root object to index.html.

Save changes.


Now, try accessing the site again. It should work.
This brings us to the end of this tutorial. I hope you were able to learn a thing or two about building serverless systems :)
Conclusion
Congratulations! You've just built a production-ready serverless application from the ground up. You've successfully architected a complete CRUD system that automatically scales, stays secure with Cognito authentication, and costs you only what you actually use.
 


 Docker Build Tutorial: Learn Contexts, Architecture, and Performance Optimization Techniques 
Destiny Erhabor — Tue, 07 Oct 2025 18:20:08 +0000
 Docker build is a fundamental concept every developer needs to understand. Whether you're containerizing your first application or optimizing existing Docker workflows, understanding Docker build contexts and Docker build architecture is essential for creating efficient, scalable containerized applications.
This comprehensive guide covers everything from basic concepts to advanced optimization techniques, helping you avoid common pitfalls and build better Docker images.
Table of Contents

What is Docker Build?

Docker Build Architecture: How It All Works

Docker Build Features

Docker Build Context

Types of Docker Build Contexts

Common Docker Build Mistakes (And How to Fix Them)

How to Optimize and Monitor Build Performance

Best Practices for Docker Build Performance

Troubleshooting Docker Build Issues

Conclusion


What is Docker Build?
Docker build is the process of creating a Docker image from a Dockerfile and a set of files called the build context. When you run docker build, you're instructing Docker to:

Read your Dockerfile instructions

Gather the necessary files (build context)

Execute each instruction step-by-step

Create a final Docker image


Think of it like following a recipe: the Dockerfile is your recipe, and the build context contains all the ingredients you might need.
Docker Build Architecture: How It All Works
Docker Build uses a client-server architecture where two separate components (Buildx and BuildKit) work together to build your Docker images. This is different from how many people think Docker works, as it's not just one monolithic program doing everything.
What is Buildx (The Client)?
Buildx serves as the user interface that you interact with directly whenever you work with Docker builds. When you type docker build . in your terminal, you're actually communicating with Buildx, which acts as the intermediary between you and the actual build engine.
Buildx’s primary jobs:

Interprets your build command and options

Sends structured build requests to BuildKit

Manages multiple BuildKit instances (builders)

Handles authentication and secrets

Displays build progress to you


What is BuildKit (The Server/Builder)?
BuildKit functions as the actual build engine that performs all the heavy lifting during the Docker build process. This powerful backend component receives the structured build requests from Buildx and immediately begins reading and interpreting your Dockerfiles line by line.
BuildKit’s primary jobs:

Receives build requests from Buildx

Reads and interprets Dockerfiles

Executes build instructions step by step

Manages build cache and layers

Requests only the files it needs from the client

Creates the final Docker image


How They Communicate
Here's what happens when you run docker build .:

When you run docker build, the command initiates a multi-step process with BuildKit (as illustrated in the above image).
First, it sends a build request containing your Dockerfile, build arguments, export options, and cache options. BuildKit then intelligently requests only the files it needs when it needs them, starting with package.json to run npm install for dependency installation.
After that's complete, it requests the src/ directory containing your application code and copies those files into the image with the COPY command.
Once all build steps are finished, BuildKit sends back the completed image. Optionally, you can then push this image to a container registry for distribution or deployment.
This on-demand file transfer approach is one of BuildKit's key optimizations: rather than sending your entire build context upfront, it only requests specific files as each build step needs them, making the build process more efficient.
Key Communication Details
Build request contains:
{
  "dockerfile": "FROM node:18\nWORKDIR /app\n...",
  "buildArgs": {"NODE_ENV": "production"},
  "exportOptions": {"type": "image", "name": "my-app:latest"},
  "cacheOptions": {"type": "registry", "ref": "my-app:cache"}
}

Resource requests:

BuildKit asks: "I need the file at ./package.json"

Buildx responds: Sends the actual file content

BuildKit asks: "I need the directory ./src/"

Buildx responds: Sends all files in that directory


Why This Architecture Exists
1. Efficiency
The old Docker builder had a major flaw: it always copied your entire build context upfront, regardless of what was actually needed. Even if your Dockerfile only used a few files, Docker would transfer hundreds of megabytes before starting the build.
BuildKit fixes this through on-demand file transfers. It only requests specific files at each step.
# Old Docker Builder (legacy)
# Always copied ENTIRE context upfront
$ docker build .
Sending build context to Docker daemon  245.7MB  # Everything!

# New BuildKit Architecture
# Only requests files when needed
$ docker build .
#1 [internal] load build definition from Dockerfile    0.1s
#2 [internal] load .dockerignore                       0.1s
#3 [1/5] FROM node:18                                  0.5s
#4 [internal] load build context                       0.1s
#4 transferring context: 234B  # Only package.json initially!
#5 [2/5] WORKDIR /app                                  0.2s
#6 [3/5] COPY package*.json ./                         0.1s
#7 [4/5] RUN npm install                               5.2s
#8 [internal] load build context                       0.3s
#8 transferring context: 2.1MB  # Now requests src/ files
#9 [5/5] COPY src/ ./src/                              0.2s

2. Scalability
The client-server architecture enables scalability features. Multiple Docker CLI clients can connect to the same BuildKit instance, and BuildKit can run on remote servers instead of your local machine. This means you could execute builds on a cloud server while controlling them from your laptop. Teams can also deploy multiple BuildKit instances for different teams or purposes, scaling from individual developers to large enterprises.
3. Security
Security is improved by only requesting sensitive files when explicitly needed. BuildKit never sees files your Dockerfile doesn't reference, reducing the attack surface. It also handles credentials through separate, secure channels rather than mixing them with your build context, preventing secrets from being embedded in image layers or exposed in build logs.
Real-World Example
Let's trace through a typical build step by step. You can find the full code available here: 02-python-cache.
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src/
COPY main.py .
CMD ["python", "main.py"]

Let’s see what actually happens here:

You run docker build .

Buildx says to BuildKit:


   "Here's a build request with this Dockerfile"


BuildKit processes: FROM python:3.9-slim

No client files needed, pulls base image


BuildKit processes: COPY requirements.txt .

BuildKit to Buildx: "I need requirements.txt"

Buildx to BuildKit: Sends the file content



BuildKit processes: RUN pip install -r requirements.txt

No client files needed, runs inside container


BuildKit processes: COPY src/ ./src/

BuildKit to Buildx: "I need all files in src/ directory"

Buildx to BuildKit: Sends all files in src/



BuildKit processes: COPY main.py .

BuildKit to Buildx: "I need main.py"

Buildx to BuildKit: Sends the file



BuildKit to Buildx: "Build complete, here's your image"


From this walkthrough, you can see that BuildKit requests only what it needs, when it needs it, rather than loading this entire context up front:

my-app/
├── src/                 # ← Only loaded when COPY src/ runs
├── tests/              # ← Never requested (not in Dockerfile)
├── docs/               # ← Never requested  
├── node_modules/       # ← Never requested (in .dockerignore)
├── requirements.txt    # ← Loaded early (first COPY)
└── main.py            # ← Loaded later (second COPY)
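
To get a rough idea of which context paths a build will ask for, you can list the sources named in a Dockerfile's COPY instructions. The sketch below is a simplified illustration, not how BuildKit itself works: `copy_sources` is a hypothetical helper name, and it assumes shell-form COPY lines with no wildcards or line continuations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the context paths that a Dockerfile's COPY
# instructions reference. Simplified on purpose: shell-form COPY only.
copy_sources() {
  awk '
    toupper($1) == "COPY" {
      # COPY --from=... reads from another stage or image, not the client context
      for (i = 2; i <= NF; i++) if ($i ~ /^--from=/) next
      # Every non-flag field before the last one (the destination) is a source
      for (i = 2; i < NF; i++) if ($i !~ /^--/) print $i
    }
  ' "$1"
}
```

Running `copy_sources Dockerfile` against the Python example above prints `requirements.txt`, `src/`, and `main.py`, which matches the three file requests in the trace.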

Docker Build Features
Named Contexts
👉 Demo project: 07-named-contexts
Named contexts allow you to include files from multiple sources during a build while keeping them logically separated. This is useful when you need documentation, configuration files, or shared libraries from different directories or repositories in your build.
# Build with additional named context
docker build --build-context docs=./documentation .

# Use named context in Dockerfile
FROM alpine
COPY . /app
# Mount files from named context
RUN --mount=from=docs,target=/docs \
    cp /docs/manual.pdf /app/

Build Secrets
👉 Demo project: 06-build-secrets
Build secrets let you pass sensitive information (like API keys or passwords) to your build without including them in the final image or build history. The secrets are mounted temporarily during specific RUN commands and are never stored in image layers.
# Pass secret to build (read from an environment variable)
API_KEY=secret123 docker build --secret id=apikey,env=API_KEY .

# Use secret in Dockerfile
FROM alpine
# alpine does not ship curl, so install it first
RUN apk add --no-cache curl
RUN --mount=type=secret,id=apikey \
    API_KEY=$(cat /run/secrets/apikey) && \
    curl -H "Authorization: $API_KEY" https://api.example.com/data

Docker Build Context
What is a Build Context?
The build context is the collection of files and directories that Docker can access during the build process. It's like gathering all your cooking ingredients on the counter before you start cooking.
docker build [OPTIONS] CONTEXT
                       ^^^^^^^
                       This is your build context

Why Build Contexts Matter

Security: Only files in the context can be accessed during build

Performance: Large contexts slow down builds

Functionality: Your Dockerfile can only COPY/ADD files from the context

Efficiency: Understanding contexts helps you build faster, leaner images


Types of Docker Build Contexts
1. Local Directory Context (Most Common)
👉 See code here: 01-node-local-context
This is what you'll use in most cases: pointing to a folder on your machine.
# Use current directory
docker build .

# Use specific directory
docker build /path/to/my/project

# Use parent directory
docker build ..

Example Project Structure:
my-webapp/
├── src/
│   ├── index.js
│   └── utils.js
├── public/
│   ├── index.html
│   └── styles.css
├── package.json
├── package-lock.json
├── Dockerfile
├── .dockerignore
└── README.md

Corresponding Dockerfile:
FROM node:18-alpine
WORKDIR /app

# Copy package files first for better layer caching
COPY package*.json ./
RUN npm ci --only=production

# Copy application source
COPY src/ ./src/
COPY public/ ./public/

EXPOSE 3000
CMD ["node", "src/index.js"]

2. Remote Git Repository Context
You can build directly from Git repositories without cloning locally:
# Build from GitHub main branch
docker build https://github.com//project.git

# Build from specific branch
docker build https://github.com//project.git#develop

# Build from specific directory in repo
docker build https://github.com//project.git#main:docker

# Build with authentication
docker build --ssh default git@github.com:/private-repo.git

This has various use cases, such as CI/CD pipelines, building open-source projects, ensuring clean builds from source control, and automated deployments.
3. Remote Tarball Context
You can also build from compressed archives hosted on web servers. A remote tarball is a .tar.gz or similar compressed archive file accessible via HTTP/HTTPS. This is useful when your source code is packaged and hosted on a web server, artifact repository, or CDN. Docker downloads and extracts the archive automatically, using its contents as the build context.
This approach works well for CI/CD pipelines where build artifacts are stored centrally, or when you want to build images from released versions of your code without cloning entire repositories.
# Build from remote tarball
docker build http://server.com/context.tar.gz

# BuildKit downloads and extracts automatically
docker build https://example.com/project-v1.2.3.tar.gz

4. Empty Context (Advanced)
When you don't need any files, you can pipe the Dockerfile directly:
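# Create image without file context
# (alpine is used here as an example base image; any image with a shell works)
docker build -t hello-world - <<EOF
FROM alpine
RUN echo "Hello, World!" > /hello.txt
CMD cat /hello.txt
EOF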

Common Docker Build Mistakes (And How to Fix Them)
Mistake 1: Wrong Context Directory
👉 Reproduced here: 04-wrong-context
This mistake occurs when you run docker build from the wrong directory, causing the build context to be different from what your Dockerfile expects.
In the example, running docker build frontend/ from the /projects/ directory means the context is /projects/frontend/, but the Dockerfile tries to access ../shared/utils.js, which is outside this context. Docker can only access files within the build context, so any attempt to reference files outside it will fail.
# Project structure
/projects/
├── frontend/
│   ├── Dockerfile
│   ├── src/
│   └── package.json
└── shared/
    └── utils.js

# WRONG - Running from projects directory
docker build frontend/
# This won't work if Dockerfile tries to COPY ../shared/utils.js

How to fix wrong context directory:
The key is aligning your build context with what your Dockerfile needs.

Option 1 changes your working directory so the context matches your Dockerfile's expectations. You run the build from inside frontend/, making that directory the context root.

Option 2 keeps you in the parent directory but explicitly sets it as the context (the . argument) while telling Docker where to find the Dockerfile with the -f flag. Now both frontend/ and shared/ are accessible since they're both within the /projects/ context.


# Option 1: Run from correct directory
cd frontend
docker build .

# Option 2: Use parent directory as context
docker build -f frontend/Dockerfile .

Mistake 2: Including Massive Files
👉 Optimized version with .dockerignore: 05-dockerignore-optimization
This mistake happens when your build context contains large, unnecessary files that slow down the build process.
Docker must transfer the entire context to the build daemon before starting, so including files like node_modules (which can be hundreds of MB), git history, build artifacts, logs, and database dumps makes builds painfully slow. These files are rarely needed in the final image and should be excluded.
# This context includes everything!
my-app/
├── node_modules/        # 200MB+ 
├── .git/               # Version history
├── dist/               # Built files
├── logs/               # Log files
├── temp/               # Temporary files
├── database.dump       # 1GB database backup
└── Dockerfile

How to fix an oversized build context:
Use .dockerignore to exclude unnecessary files, dramatically reducing context size and build time. We’ll discuss this in more detail below.
Mistake 3: Inefficient Layer Caching
👉 See good practice code here: 02-python-cache
This mistake wastes Docker's layer caching system by copying frequently-changing files (like source code) before running expensive operations (like npm install). When you modify your source code, Docker invalidates the cache for that layer and all subsequent layers, forcing npm install to run again even though dependencies haven't changed. This can turn a 5-second build into a 5-minute build.
# BAD - Changes to source code rebuild npm install
FROM node:18
COPY . /app
WORKDIR /app
RUN npm install
CMD ["npm", "start"]

How to fix inefficient layer caching:
Copy dependency files first, install dependencies, then copy source code. This way, npm install only runs when package.json actually changes:
# GOOD - npm install only rebuilds when package.json changes
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]

How to Optimize and Monitor Build Performance
Understanding build performance metrics helps you identify bottlenecks and measure improvements.
How to Optimize Docker Builds with .dockerignore
The .dockerignore file is your secret weapon for faster, more secure builds. It tells Docker which files to exclude from the build context.
Creating .dockerignore Patterns
Create a .dockerignore file in your project root. The syntax is similar to .gitignore, and you can use wildcards (*), match specific file extensions (*.log), exclude entire directories (node_modules/), or use negation patterns (!important.txt) to include files that would otherwise be excluded. Each line represents a pattern, and comments start with #.
Example of a .dockerignore file:
# Dependencies
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Build outputs
dist/
build/
*.tgz

# Version control
.git/
.gitignore
.svn/

# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Logs and databases
*.log
*.sqlite
*.db

# Environment and secrets
.env
.env.local
.env.*.local
secrets/
*.key
*.pem

# Documentation
README.md
docs/
*.md

# Test files
test/
tests/
*.test.js
coverage/

# Temporary files
tmp/
temp/
*.tmp

Measuring Build Performance
Analyzing Build Time
Understanding where your build spends time helps identify bottlenecks and optimization opportunities. The detailed progress output shows timing for each build step, cache hits/misses, and resource usage.
# Enable BuildKit progress output
DOCKER_BUILDKIT=1 docker build --progress=plain .

# Use buildx for detailed timing
docker buildx build --progress=plain .

Profiling Context Transfer
Monitor context transfer time to understand how build context size affects overall performance. Profile which directories contribute most to help target .dockerignore optimizations.
# Measure context transfer time
time docker build --no-cache .

# Profile context size by directory
du -sh */ | sort -hr

Measuring .dockerignore Impact
Before adding a .dockerignore file, the build transfers 245.7MB of context in 15.2s:
$ docker build .
#1 [internal] load build context
#1 transferring context: 245.7MB in 15.2s

After adding the .dockerignore file, the context shrinks to 2.1MB, transferred in 0.3s:
$ docker build .
#1 [internal] load build context  
#1 transferring context: 2.1MB in 0.3s

Result: 99% reduction in context size and 50x faster context transfer!
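
If you want to check these figures for your own builds, the arithmetic is easy to script. This is a small sketch using awk for the floating-point math; `percent_reduction` and `speedup` are hypothetical helper names, and the inputs are the numbers from the two build logs above.

```shell
#!/usr/bin/env bash
# percent_reduction BEFORE AFTER -> percentage by which AFTER is smaller than BEFORE
percent_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# speedup BEFORE AFTER -> how many times faster AFTER is than BEFORE
speedup() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", before / after }'
}

percent_reduction 245.7 2.1   # context size in MB, prints 99.1
speedup 15.2 0.3              # transfer time in seconds, prints 50.7
```

Swap in the sizes and times from your own `transferring context` lines to measure the effect of each .dockerignore change.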
Best Practices for Docker Build Performance
We've covered several optimization techniques throughout this guide. Here's a quick recap of the key practices, plus some additional strategies:

Layer Caching (covered in Mistake 3): Copy dependency files before source code to maximize cache reuse.

Using .dockerignore (covered in Mistake 2): Exclude unnecessary files to reduce context size and improve build speed.

Choosing the Right Context (covered earlier): Select appropriate context types (local, Git, tarball) based on your use case.


Now let’s talk about some more ways you can improve performance:
Use Multi-Stage Builds
👉 Demo project: 03-multistage-node
Multi-stage builds let you use one image for building/compiling your application and a different, smaller image for running it. This dramatically reduces your final image size by excluding build tools, source code, and other unnecessary files from the production image.
# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Use Specific Base Images
Generic base images like ubuntu:latest include many packages you don't need, making your images larger and slower to download. Specific images like node:18-alpine or distroless images contain only what's necessary for your application to run.
# Large base image
FROM ubuntu:latest

# Smaller, more specific base image  
FROM node:18-alpine

# Even smaller distroless image
FROM gcr.io/distroless/nodejs18-debian11

Combine RUN Commands
Each RUN command creates a new layer in your image. Multiple RUN commands create multiple layers, increasing image size. Combining commands into a single RUN instruction creates just one layer, and you can clean up temporary files in the same step.
# Creates multiple layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# Single layer
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Troubleshooting Docker Build Issues
Issue: "COPY failed: no such file or directory"
Problem: File not in build context
What’s going wrong: Docker can only access files within the build context (the directory you specify in docker build). If your Dockerfile tries to COPY a file that doesn't exist in the context directory, the build fails. This often happens when running the build command from the wrong directory or when the file path is incorrect relative to the context root.
Solution:
# Check what's in your context
ls -la

# Verify file path relative to context
docker build -t debug . --progress=plain

Issue: "Docker Build is extremely slow"
Problem: Large build context
What’s going wrong: Docker must transfer your entire build context to the BuildKit daemon before building starts. If your context contains large files, directories like node_modules, or unnecessary files, this transfer can take minutes instead of seconds. The larger the context, the slower your builds become.
Solution:
# Check context size
du -sh .

# Add more patterns to .dockerignore
echo "large-directory/" >> .dockerignore
echo "*.zip" >> .dockerignore

Issue: "Cannot locate specified Dockerfile"
Problem: Dockerfile not in context root
What’s going wrong: By default, Docker looks for a file named Dockerfile in the root of your build context. If your Dockerfile is in a subdirectory or has a different name, Docker can't find it. This is common in monorepo setups where Dockerfiles are organized in separate folders.
Solution:
# Specify Dockerfile location
docker build -f path/to/Dockerfile .

# Or move Dockerfile to context root
mv path/to/Dockerfile .

Issue: "Cache misses on unchanged files"
Problem: File timestamps or permissions changed
What’s going wrong: Docker's layer caching relies on file checksums and metadata. Even if file content is unchanged, different timestamps or permissions can cause cache misses, forcing unnecessary rebuilds. This often happens after git operations, file system operations, or when files are copied between systems.
Solution:
# Check file modifications
git status

# Reset timestamps
git ls-files -z | xargs -0 touch -r .git/HEAD

Conclusion
Understanding Docker build contexts and architecture is essential for achieving faster builds. We’ve covered various techniques in this article, like optimized contexts and caching strategies, creating smaller images with efficient layering and multi-stage builds, maintaining better security with proper secret handling and minimal attack surface, and delivering an improved developer experience with faster iteration cycles.
👉 Full code examples are available on GitHub here: Docker build architecture examples
As always, I hope you enjoyed the article and learned something new. If you want, you can also follow me on LinkedIn or Twitter.
For more hands-on projects, follow and star this repository: Learn-DevOps-by-building
 


 How to Upload Large Objects to S3 with AWS CLI Multipart Upload 
Chisom Uma — Thu, 31 Jul 2025 05:48:09 +0000
Uploading large files to S3 in a single request is fragile. If a network interruption happens while you're transferring a 5GB database backup, you have to restart the entire upload, which wastes bandwidth and time. The approach becomes increasingly unreliable as file sizes grow.
A single PUT operation can upload an object of up to 5 GB. Anything larger must use Amazon S3’s Multipart Upload feature, and AWS recommends considering it once objects reach about 100 MB, because a failed part can be retried without restarting the whole transfer.
Multipart upload makes it easier for you to upload larger files and objects by segmenting them into smaller, independent chunks that upload separately and reassemble on S3.
In this guide, you’ll learn how to implement multipart uploads using AWS CLI.
Table of Contents

Prerequisites

How Multipart Uploads Work

Getting started

Step 1: Download the AWS CLI

Step 2: Configure AWS IAM credentials



Step 1: Split Object

Step 2: Create an Amazon S3 bucket

Step 3: Initiate Multipart Upload

Step 4: Upload split files to S3 Bucket

Step 5: Create a JSON File to Compile ETag Values

Step 6: Complete Multipart Upload to S3

Conclusion


Prerequisites
To follow this guide, you should have:

An AWS account.

Knowledge of AWS and the S3 service.

The AWS CLI installed on your local machine.


How Multipart Uploads Work
In a multipart upload, large file transfers are segmented into smaller chunks that get uploaded separately to Amazon S3. After all segments complete their upload process, S3 reassembles them into the complete object.
For example, a 160GB file broken into 1GB segments generates 160 individual upload operations to S3. Each segment receives a distinct identifier while preserving sequence information to guarantee proper file reconstruction.
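
The part count is a ceiling division of the file size by the part size. The sketch below computes it and checks the plan against S3's documented multipart limits (at most 10,000 parts, and at least 5 MiB for every part except the last). The function names `part_count` and `check_part_plan` are hypothetical, chosen for this illustration.

```shell
#!/usr/bin/env bash
# part_count TOTAL_BYTES PART_BYTES -> number of parts (ceiling division)
part_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# check_part_plan TOTAL_BYTES PART_BYTES
# Prints the part count if the plan fits S3's limits, otherwise returns 1.
check_part_plan() {
  local total="$1" part="$2"
  local min_part=$(( 5 * 1024 * 1024 ))   # 5 MiB minimum, except for the last part
  local max_parts=10000                   # S3 allows at most 10,000 parts per upload
  local count
  count=$(part_count "$total" "$part")
  if [ "$count" -gt 1 ] && [ "$part" -lt "$min_part" ]; then
    echo "part size is below the 5 MiB minimum" >&2
    return 1
  fi
  if [ "$count" -gt "$max_parts" ]; then
    echo "too many parts: $count" >&2
    return 1
  fi
  echo "$count"
}
```

For example, `check_part_plan 197132288 31457280` (a 188 MiB file in 30 MiB parts) prints `7`: six full parts plus a smaller final one.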
The system supports configurable retry logic for failed segments and allows upload suspension/resumption functionality. Here’s a diagram that shows what the multipart upload process looks like:

Getting Started
Before you get started with this guide, make sure that you have the AWS CLI installed on your machine. If you don’t already have that installed, follow the steps below.
Step 1: Download the AWS CLI
To download the CLI, visit the CLI download documentation. Then, download the CLI based on your operating system (Windows, Linux, macOS). Once the CLI is installed, the next step is to configure your AWS IAM credentials in your terminal.
Step 2: Configure AWS IAM credentials
To configure your AWS credentials, navigate to your terminal and run the command below:
aws configure

This command prompts you to paste in your credentials: the AWS Access Key ID and the AWS Secret Access Key. To obtain these credentials, create a new IAM user in your AWS account by following the steps below. (You can skip these steps if you already have an IAM user and security credentials.)

Sign in to your AWS dashboard.

Click on the search bar above your dashboard and search “IAM”.

Click on IAM.

In the left navigation pane, navigate to Access management > Users.

Click Create user.

During IAM user creation, attach a policy directly by selecting Attach policies directly in step 2: Set permissions.

Give the user admin access by searching “admin” in the permission policies search bar and selecting AdministratorAccess.

On the next page, click Create user.

Click on the created user in the Users section and navigate to Security credentials.

Scroll down and click Create access key.

Select the Command Line Interface (CLI) use case.

On the next page, click Create access key.


You will now see your access keys. Please keep these safe and do not expose them publicly or share them with anyone.
You can now copy these access keys into your terminal after running the aws configure command.
You will be prompted to include the following details:

AWS Access Key ID: obtained from the IAM user credentials you created in the steps above.

AWS Secret Access Key: obtained from the IAM user credentials you created in the steps above.

Default region name: default AWS region name, for example, us-east-1.

Default output format: None.


Now we’re done with the CLI configuration.
To confirm that you’ve successfully installed the CLI, run the command below:
aws --version

You should see the CLI version in your terminal as shown below:

Now, you are ready for the following main steps for multipart uploads :)
Step 1: Split Object
The first step is to split the object you intend to upload. For this guide, we’ll be splitting a 188MB video file into smaller chunks.

Note that this process also works for much larger files.
Next, locate the object you intend to upload in your system. You can use the cd command to locate the object in its stored folder using your terminal.
Then run the split command below:
split -b <chunk-size>m <file-name>

Replace <chunk-size> with your desired chunk size in megabytes (for example, 150, 100, 200) and <file-name> with the file you want to split.
For this use case, we’ll split our 188MB video file into 30MB chunks, with the size given in bytes (31457280 bytes = 30 × 1024 × 1024). Here’s the command:

split -b 31457280 videoplayback.mp4

Next, run the ls -lh command on your terminal. You should get the output below:

Here, you can see that the 188MB file has been split into multiple parts: 30MB each, plus a final 7.9MB part. When you go to the folder where the object is saved in your system files, you will see additional files with names that look like this:

xaa

xab

xac


and so on. These files represent the different parts of your object. For example, xaa is the first part of your file, which will be uploaded first to S3. More on this later in the guide.
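
Before uploading, it can be worth confirming that the chunks reassemble into the original file. The sketch below concatenates the chunks in alphabetical order and compares the result byte for byte with the source. `verify_chunks` is a hypothetical helper name, and it assumes no unrelated files in the folder share the chunk prefix.

```shell
#!/usr/bin/env bash
# verify_chunks ORIGINAL PREFIX
# Succeeds (exit 0) if the PREFIX* chunks, concatenated in glob order,
# are byte-for-byte identical to ORIGINAL.
verify_chunks() {
  cat "$2"* | cmp -s - "$1"
}
```

With the default names that split produces, you would run `verify_chunks videoplayback.mp4 x && echo "chunks match"`. The check works because the shell expands `x*` in sorted order, which is the same order split used to write the chunks.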
Step 2: Create an Amazon S3 Bucket
If you don’t already have an S3 bucket created, follow the steps in the AWS Get Started with Amazon S3 documentation to create one.
Step 3: Initiate Multipart Upload
The next step is to initiate a multipart upload. To do this, execute the command below:

aws s3api create-multipart-upload --bucket DOC-EXAMPLE-BUCKET --key large_test_file

In this command:

DOC-EXAMPLE-BUCKET is your S3 bucket name.

large_test_file is the name of the file, for example, videoplayback.mp4.


You’ll get a JSON response in your terminal, providing you with the UploadId. The response looks like this:

{
    "ServerSideEncryption": "AES256",
    "Bucket": "s3-multipart-uploads",
    "Key": "videoplayback.mp4",
    "UploadId": "************************************"
}

Keep the UploadId somewhere safe in your local machine, as you will need it for later steps.
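
Instead of copying the UploadId by hand, you can capture it in a shell variable. This sketch relies on the AWS CLI's global `--query` and `--output text` options to print only that field; `start_upload` is a hypothetical wrapper name, and the bucket and key in the usage line are the example values from this guide.

```shell
#!/usr/bin/env bash
# start_upload BUCKET KEY -> prints only the UploadId of a new multipart upload
start_upload() {
  aws s3api create-multipart-upload \
    --bucket "$1" --key "$2" \
    --query UploadId --output text
}

# Usage (uncomment to run against your own bucket):
# UPLOAD_ID=$(start_upload s3-multipart-uploads videoplayback.mp4)
# echo "$UPLOAD_ID"
```

Keeping the ID in a variable avoids copy-paste mistakes in the later steps, since the same value has to be passed to every part upload.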
Step 4: Upload Split Files to S3 Bucket
Remember those extra files saved as xaa, xab, and so on? Well, now it’s time to upload them to your S3 bucket. To do that, execute the command below:
aws s3api upload-part --bucket DOC-EXAMPLE-BUCKET --key large_test_file --part-number 1 --body large_test_file.001 --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk


DOC-EXAMPLE-BUCKET is your S3 bucket name.

large_test_file is the name of the file, for example, videoplayback.mp4

large_test_file.001 is the name of the file part, for example, xaa.

--upload-id is the UploadId you saved in the previous step. Replace the example ID with it.


The command returns a response that contains an ETag value for the part of the file that you uploaded.

{
    "ServerSideEncryption": "aws:kms",
    "ETag": "\"7f9b8c3e2a1d5f4e8c9b2a6d4e8f1c3a\"",
    "ChecksumCRC64NVME": "mK9xQpD2WnE="
}

Copy the ETag value and save it somewhere on your local machine, as you’ll need it later as a reference.
Continue uploading the remaining file parts by repeating the command above, incrementing both the part number and file name for each subsequent upload. For example: xaa becomes xab, and --part-number 1 becomes --part-number 2, and so forth.
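
Repeating the command by hand gets tedious with many parts, so here is a minimal loop that does the same thing. It is a sketch under a few assumptions: `upload_parts` is a hypothetical function name, the chunks share a common prefix such as `x`, and no other files in the folder start with that prefix.

```shell
#!/usr/bin/env bash
# upload_parts BUCKET KEY UPLOAD_ID PREFIX
# Uploads every PREFIX* chunk in alphabetical order, numbering parts from 1.
# Prints one "<part-number> <etag>" line per uploaded part.
upload_parts() {
  local bucket="$1" key="$2" upload_id="$3" prefix="$4"
  local part=1 chunk etag
  for chunk in "$prefix"*; do
    etag=$(aws s3api upload-part \
      --bucket "$bucket" --key "$key" \
      --part-number "$part" --body "$chunk" \
      --upload-id "$upload_id" \
      --query ETag --output text) || return 1   # stop at the first failed part
    echo "$part $etag"
    part=$((part + 1))
  done
}

# Usage (uncomment and fill in your own values):
# upload_parts DOC-EXAMPLE-BUCKET videoplayback.mp4 "<your-upload-id>" x
```

The loop stops at the first failure, so you can fix the problem and re-run the failed part on its own rather than starting over.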
Note that upload time depends on the size of the object and the speed of your internet connection.
To confirm that all the file parts have been uploaded successfully, run the command below:
aws s3api list-parts --bucket s3-multipart-uploads --key videoplayback.mp4 --upload-id p0NU3agC3C2tOi4oBmT8lHLebUYqYXmWhEYYt8gc8jXlCStEZYe1_kSx1GjON2ExY_0T.4N4E6pjzPlNcji7VDT6UomtNYUhFkyzpQ7IFKrtA5Dov8YdC20c7UE20Qf0

Replace the example upload ID with your actual upload ID.
You should get a JSON response like this:

{
    "Parts": [
        {
            "PartNumber": 1,
            "LastModified": "2025-07-27T14:22:18+00:00",
            "ETag": "\"f7b9c8e4d3a2f6e8c9b5a4d7e6f8c2b1\"",
            "Size": 26214400
        },
        {
            "PartNumber": 2,
            "LastModified": "2025-07-27T14:25:42+00:00",
            "ETag": "\"a8e5d2c7f9b4e6a3c8d5f2e9b7c4a6d3\"",
            "Size": 26214400
        },
        {
            "PartNumber": 3,
            "LastModified": "2025-07-27T14:28:15+00:00",
            "ETag": "\"c4f8e2b6d9a3c7e5f8b2d6a9c3e7f4b8\"",
            "Size": 26214400
        },
        {
            "PartNumber": 4,
            "LastModified": "2025-07-27T14:31:03+00:00",
            "ETag": "\"e9c3f7a5d8b4e6c9f2a7d4b8c6e3f9a2\"",
            "Size": 26214400
        },
        {
            "PartNumber": 5,
            "LastModified": "2025-07-27T14:33:47+00:00",
            "ETag": "\"b6d4a8c7f5e9b3d6a2c8f4e7b9c5d8a6\"",
            "Size": 26214400
        },
        {
            "PartNumber": 6,
            "LastModified": "2025-07-27T14:36:29+00:00",
            "ETag": "\"d7e3c9f6a4b8d2e5c7f9a3b6d4e8c2f5\"",
            "Size": 26214400
        },
        {
            "PartNumber": 7,
            "LastModified": "2025-07-27T14:38:52+00:00",
            "ETag": "\"f2a6d8c4e7b3f6a9c2d5e8b4c7f3a6d9\"",
            "Size": 15728640
        }
    ]
}

This is how you verify that all parts have been uploaded.
Step 5: Create a JSON File to Compile ETag Values
The document we are about to create tells S3 which ETag belongs to which part number, so it can assemble the parts in the right order. Gather the ETag values from each uploaded file part and organize them into a JSON structure.
Sample JSON format:

{
    "Parts": [{
        "ETag": "example8be9a0268ebfb8b115d4c1fd3",
        "PartNumber":1
    },

    ....

    {
        "ETag": "example246e31ab807da6f62802c1ae8",
        "PartNumber":4
    }]
}

Save the created JSON file in the same folder as your object and name it multipart.json. You can use any IDE of your choice to create and save this document.
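
If you would rather not write the file by hand, you can generate it from a list of part numbers and ETags. The sketch below is one possible approach: `build_parts_json` is a hypothetical helper that reads lines of the form `<part-number> <etag>` and prints JSON in the sample format above, stripping the double quotes that surround each ETag.

```shell
#!/usr/bin/env bash
# build_parts_json: reads "<part-number> <etag>" lines on stdin and prints
# the Parts document in the sample format shown above.
build_parts_json() {
  awk '
    BEGIN { printf "{\n    \"Parts\": [" }
    NF >= 2 {
      gsub(/"/, "", $2)   # drop the quotes that surround the ETag value
      printf "%s\n        {\"ETag\": \"%s\", \"PartNumber\": %d}", (n++ ? "," : ""), $2, $1
    }
    END { printf "\n    ]\n}\n" }
  '
}

# Usage: one line per part, then redirect the result into multipart.json
# printf '1 "7f9b8c3e2a1d5f4e8c9b2a6d4e8f1c3a"\n' | build_parts_json > multipart.json
```

Generating the file avoids the two most common mistakes here, which are a missing comma between entries and a part number that does not match its ETag.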
Step 6: Complete Multipart Upload to S3
To complete the multipart upload, run the command below:
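aws s3api complete-multipart-upload --multipart-upload file://multipart.json --bucket DOC-EXAMPLE-BUCKET --key large_test_file --upload-id exampleTUVGeKAk3Ob7qMynRKqe3ROcavPRwg92eA6JPD4ybIGRxJx9R0VbgkrnOVphZFK59KCYJAO1PXlrBSW7vcH7ANHZwTTf0ovqe6XPYHwsSp7eTRnXB1qjx40Tk

Replace DOC-EXAMPLE-BUCKET, large_test_file, and the example upload ID with your own bucket name, object key, and saved UploadId. The file://multipart.json argument points to the JSON file you created in Step 5.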
You should get an output like this:

{
    "ServerSideEncryption": "AES256",
    "Location": "https://s3-multipart-uploads.s3.eu-west-1.amazonaws.com/videoplayback.mp4",
    "Bucket": "s3-multipart-uploads",
    "Key": "videoplayback.mp4",
    "ETag": "\"78298db673a369adf33dd8054bb6bab7-7\"",
    "ChecksumCRC64NVME": "d1UPkm73mAE=",
    "ChecksumType": "FULL_OBJECT"
}

Now, when you go to your S3 bucket and hit refresh, you should see the uploaded object.

Here, you can see the complete file, file name, type, and size.
Conclusion
Multipart uploads transform large file transfers to Amazon S3 from fragile, all-or-nothing operations into robust, resumable processes. By segmenting files into manageable chunks, you gain retry capabilities, better performance, and the ability to handle objects exceeding S3's 5GB single-upload limit.
This approach is essential for production environments dealing with database backups, video files, or any large assets. With the AWS CLI techniques covered in this guide, you're now equipped to handle S3 transfers confidently, regardless of file size or network conditions.
Check out this documentation from the AWS knowledge center to learn more about multi-part uploads using AWS CLI.

Layer	Service	What it does
Frontend	Next.js + CloudFront	React UI served globally over HTTPS
Auth	Amazon Cognito + Amplify	User sign-up, login, and JWT token management
API	API Gateway	Routes HTTP requests, validates auth tokens
Logic	AWS Lambda (×3)	Creates, reads, and deletes vendors on demand
Database	DynamoDB	Stores vendor records with no idle cost
Storage	S3	Holds your built frontend files
Infrastructure	AWS CDK	Defines and deploys all of the above as code

Service	RAGStack-Lambda	Traditional Stack
Vector Database	S3 Vectors: pennies/mo	Pinecone Starter: `70 USD`/mo
Vector Database (alt)	S3 Vectors: pennies/mo	OpenSearch Serverless: about `350 USD`/mo min
Compute	Lambda: free tier	EC2 or ECS: `50-150 USD`/mo
LLM Inference	Same per-query cost	Same per-query cost
Total (idle)	about `0.50-3.00 USD`/mo	`120-500 USD`/mo

How to Migrate to S3 Native State Locking in Terraform

Table of Contents

What Is Terraform State Locking?

What Is S3 Native State Locking?

How S3 Native Locking Compares to the S3 + DynamoDB Approach

Prerequisites

Part 1: Fresh Setup – How to Configure S3 Native Locking from Scratch

Step 1: Create the S3 Bucket with Versioning and Encryption

Step 2: Configure the Terraform Backend with Native Locking

Step 3: Initialize and Verify

Part 2: Migration – How to Move from S3 + DynamoDB to S3 Native Locking

Step 1: Verify Your Current Setup

Step 2: Enable Object Lock on the Existing S3 Bucket

Step 3: Update the Terraform Backend Configuration

Step 4: Reinitialize Terraform

Step 5: Verify the Migration

Step 6: Clean Up the DynamoDB Table

How to Verify That Locking Is Working

Method 1: Observe the lock file during an operation

Method 2: Read the lock file contents

How to Handle a Stuck Lock

Rollback Plan: If Something Goes Wrong

Security Best Practices for Your State Bucket

Enable Versioning (Required)

Block All Public Access (Non-Negotiable)

Enable Server-Side Encryption

Apply Least-Privilege IAM Permissions

Enable Access Logging

Conclusion

References

How to Land Your First Cloud or DevOps Role: What Hiring Managers Actually Look For

Table of Contents

The Three Patterns That Keep Beginners Stuck

Pattern 1: The Tutorial Loop

Pattern 2: The Theory-Practice Gap

Pattern 3: Silent Learning

What Hiring Managers Are Actually Evaluating

Factor 1: Proof of Work (The Non-Negotiable)

The Three Projects That Cover Everything

Project 1: The Full-Stack Deploy Pipeline

Project 2: Infrastructure as Code with Terraform

Project 3: Monitoring and Observability Stack

Factor 2: System-Level Thinking

Factor 3: Software Engineering Fundamentals

1. Linux and the Command Line

2. Networking Fundamentals

3. Scripting: Bash and Python

4. Git and Version Control

5. Docker and Containers

Factor 4: Communication Skills

Factor 5: Consistency Over Intensity

Factor 6: Networking and Visibility

Community Engagement

LinkedIn Content

Asking Good Questions in Public

Factor 7: Ownership Mindset

Factor 8: Business Awareness

Factor 9: Learning Agility

Your 90-Day Action Plan

Month 1: Build Your Foundation

Month 2: Expand Your Execution and Start Your Visibility

Month 3: Complete the Portfolio and Document Everything

Month 4 Onward: Apply with Strategy

Honest Self-Assessment: Where Do You Stand?

Conclusion

References and Recommended Resources

How to Deploy a Full-Stack Next.js App on Cloudflare Workers with GitHub Actions CI/CD

Table of Contents

Why Choose Cloudflare Workers Over Vercel?

Key Takeaways

Trade-offs to Consider

Prerequisites

Install Wrangler (Cloudflare CLI)

The Stack

Step 1 — Install the Cloudflare Adapter

Step 2 — Wire OpenNext into next dev

Step 3 — Local Environment Setup with .dev.vars

Example: .dev.vars.example (committed)

Set Up Your Local Environment

Step 2 — Wire OpenNext into `next dev`

Step 3 — Local Environment Setup with `.dev.vars`

Example: `.dev.vars.example` (committed)

Update `.gitignore`