<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Prince Onukwili - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Prince Onukwili - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 04:32:20 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/onukwilip/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
<![CDATA[ How to Deploy Your Own CockroachDB Instance on Kubernetes [Full Book for Devs] ]]>
                </title>
                <description>
                    <![CDATA[ Developers are smart, wonderful people, and they’re some of the most logical thinkers you’ll ever meet. But we’re pretty terrible at naming things 😂 Like, what in the world – out of every other possible name, they decided to name a database after a ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/deploy-your-own-cockroach-db-instance-on-kubernetes-full-book-for-devs/</link>
                <guid isPermaLink="false">6925e482ccc8b29b82c002c5</guid>
                
                    <category>
                        <![CDATA[ cockroachdb ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Tue, 25 Nov 2025 17:16:50 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764088553942/496bf5f4-f059-4873-b6c1-419a86e594ef.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Developers are smart, wonderful people, and they’re some of the most logical thinkers you’ll ever meet. But we’re pretty terrible at naming things 😂</p>
<p>Like, what in the world – out of every other possible name, they decided to name a database after a <em>literal cockroach</em>? 🤣</p>
<p>I mean, I get it: cockroaches are known for being resilient, and the devs were probably trying to say “our database never dies”… but still… a cockroach?</p>
<p>The name aside, out of all the databases out there, you might be wondering why would you choose CockroachDB? And if you did choose it, where would you even start when trying to host and deploy it? Would you go for a managed cloud service? Or could you actually self-manage it?</p>
<p>If you ever thought of doing it yourself – maybe in a dev environment, or even introducing it to your company – how would you go about it?</p>
<p>Well, just calm your nerves 😄</p>
<p>In this book, we’ll explore everything you need to know about <strong>deploying and managing CockroachDB on Kubernetes</strong>. We’ll dive deep into:</p>
<ul>
<li><p>Understanding how CockroachDB’s masterless (multi-primary) architecture actually works</p>
</li>
<li><p>Setting up and deploying CockroachDB on a Kubernetes cluster</p>
</li>
<li><p>Automating backups to Google Cloud Storage using just a few queries in the CockroachDB cluster</p>
</li>
<li><p>Managing service accounts and authentication securely</p>
</li>
<li><p>Tuning CockroachDB’s memory settings for stable performance</p>
</li>
<li><p>Scaling the cluster horizontally and vertically without downtime</p>
</li>
<li><p>Monitoring and maintaining the database like a pro</p>
</li>
</ul>
<p>By the end, you’ll not only understand how CockroachDB works, you’ll be confident enough to deploy and manage your own resilient, production-ready instance. 🚀</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-even-is-cockroachdb">What Even Is CockroachDB? 🤔</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-simple-definition">Simple Definition</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-who-made-cockroachdb-when-was-it-released">Who Made CockroachDB? When Was it Released?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-problems-does-cockroachdb-try-to-solve">What Problems Does CockroachDB Try to Solve?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-terms-you-should-know-in-plain-language">Key Terms You Should Know (in plain language):</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-the-name-cockroachdb">Why the name “CockroachDB”? 😅</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-why-choose-cockroachdb-over-postgresql-or-mongodb">Why Choose CockroachDB Over PostgreSQL or MongoDB 🤷🏾‍♂️?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-fault-tolerance-is-handled-in-postgresql-and-mongodb">How Fault Tolerance is Handled in PostgreSQL and MongoDB</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-handles-it-differently">How CockroachDB Handles It Differently</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-works-behind-the-scenes">How CockroachDB Works Behind the Scenes ⚙️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-ranges-the-small-pieces-of-data">Ranges: The Small Pieces of Data</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-replication-many-copies-for-safety">Replication: Many Copies for Safety</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-raft-consensus-how-all-copies-agree">Raft Consensus: How All Copies Agree</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multiraft-keeping-raft-efficient-when-things-scale">MultiRaft: Keeping Raft Efficient When Things Scale</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-rebalancing-movement-for-balance">Rebalancing: Movement for Balance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-distributed-transactions-doing-work-across-multiple-ranges">Distributed Transactions: Doing Work Across Multiple Ranges</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-it-all-fits-together-read-write-flow-what-happens-when-you-use-it">How It All Fits Together: Read + Write Flow (What Happens When You Use It)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-this-all-matters-putting-it-in-plain-english">Why This All Matters (Putting It in Plain English)</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-where-and-how-should-you-host-cockroachdb">Where (and How) Should You Host CockroachDB? ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-option-1-cockroachdb-cloud-fully-managed-by-cockroach-labs">Option 1: CockroachDB Cloud (fully managed by Cockroach Labs)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-2-bring-your-own-cloud-byoc">Option 2: Bring Your Own Cloud (BYOC)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-3-use-cloud-marketplaces-aws-gcp-azure">Option 3: Use Cloud Marketplaces (AWS, GCP, Azure)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-4-my-favorite-self-hosting-especially-using-kubernetes">Option 4 (My Favorite 😁): Self-Hosting — Especially Using Kubernetes</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-setting-up-your-local-environment">Setting Up Your Local Environment 🧑‍💻</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-these-tools">Why these tools?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-install-minikube">Step 1: Install Minikube</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-install-kubectl">Step 2: Install kubectl</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-install-helm">Step 3: Install Helm</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-deploying-cockroachdb-on-minikube-the-fun-part-begins">Deploying CockroachDB on Minikube (The Fun Part Begins 😁!)</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-visit-artifacthub">Step 1: Visit ArtifactHub</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-explore-the-helm-chart">Step 2: Explore the Helm Chart</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-copy-the-default-values">Step 3: Copy the Default Values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-create-a-folder-for-our-project">Step 4: Create a Folder for Our Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-understanding-the-key-configurations">Step 5: Understanding the Key Configurations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-create-a-simplified-values-config-for-the-cockroachdb-helm-chart">Step 6: Create a Simplified Values Config for the CockroachDB Helm Chart</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-overview-of-the-yaml-values">Overview of the YAML values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-install-the-cockroachdb-cluster-using-helm">🚀 Step 7: Install the CockroachDB Cluster Using Helm</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-accessing-the-cockroachdb-console-amp-viewing-metrics">Accessing the CockroachDB Console &amp; Viewing Metrics</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-locate-the-cockroachdb-public-service">Step 1: Locate the CockroachDB Public Service</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-learn-more-about-the-service">Step 2: Learn More About the Service</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-access-the-cockroachdb-dashboard">Step 3: Access the CockroachDB Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-visit-the-dashboard">Step 4: Visit the Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-exploring-the-metrics-dashboard">Step 5: Exploring the Metrics Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-creating-a-little-load-on-the-cockroachdb-cluster">Step 6: Creating a Little Load on the CockroachDB Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-viewing-the-metrics-from-the-load">Step 7: Viewing the Metrics from the Load</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-8-view-the-list-of-created-items-in-the-database">Step 8: View the List of Created Items in the Database</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-backing-up-cockroachdb-to-google-cloud-storage">Backing Up CockroachDB to Google Cloud Storage ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-backups-are-absolutely-critical">Why Backups Are Absolutely Critical</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-db-installing-beekeeper-studio">Connecting to Our DB – Installing Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-install-beekeeper-studio">How to Install Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-beekeeper-studio-to-cockroachdb">Connecting Beekeeper Studio to CockroachDB</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-exposing-the-cluster-for-local-access">Exposing the Cluster for Local Access</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-beekeeper-studio">🐝 Connecting via Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-verify-the-connection">Verify the Connection</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-google-cloud-account">Creating a Google Cloud Account</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-google-cloud-storage-bucket">Creating a Google Cloud Storage Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-giving-cockroachdb-access-to-the-bucket">Giving CockroachDB Access to the Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-attaching-the-key-to-our-cockroachdb-cluster">Attaching the Key to Our CockroachDB Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-our-backup-disaster-recovery-time">Testing Our Backup — Disaster Recovery Time</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-managing-resources-amp-optimizing-memory-usage">Managing Resources &amp; Optimizing Memory Usage</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-uses-memory">How CockroachDB Uses Memory</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-memory-usage-formula-you-must-follow">The Memory Usage Formula You Must Follow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-where-you-find-these-settings">Where You Find These Settings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-concrete-example-step-by-step">Concrete Example (Step-by-Step)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-on-requests-vs-limits-in-kubernetes">⚠️ On Requests vs Limits in Kubernetes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-overriding-the-default-fractions">Overriding the Default Fractions</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-scaling-cockroachdb-the-right-way">Scaling CockroachDB the Right Way</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-key-metrics-to-understand">Key Metrics to Understand</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-and-what-to-scale-based-on-your-metrics">When (and What) to Scale Based on Your Metrics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-disk-bound-situations-what-to-do-when-your-disk-is-the-limiting-factor">Disk-Bound Situations — What to Do When Your Disk Is the Limiting Factor</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-memory-pressure-what-to-do-when-your-database-hits-the-limit">Memory Pressure — What to Do When Your Database Hits the Limit</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-queries-are-slow-but-everything-else-cpu-memory-amp-disk-looks-fine">When Queries Are Slow but Everything Else (CPU, Memory &amp; Disk) Looks “Fine”</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-disk-speed-iops-amp-throughput-across-cloud-providers">Understanding Disk Speed (IOPS &amp; Throughput) Across Cloud Providers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-downsizing-the-cluster-reducing-replicas">Downsizing the Cluster (Reducing Replicas)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-wrong-way-to-downscale">⚠️ The Wrong Way to Downscale</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-decommissioning-a-node-before-scaling-down-the-cluster">Decommissioning a Node Before Scaling Down the Cluster</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-to-consider-when-deploying-cockroachdb-on-google-kubernetes-engine-gke">What to Consider When Deploying CockroachDB on Google Kubernetes Engine (GKE) ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-creating-your-gke-cluster">Creating Your GKE Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-your-gke-cluster">Connecting to your GKE cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-deploying-cockroachdb-in-production-on-gke">Deploying CockroachDB in Production (on GKE)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-configuration">Understanding the Configuration</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-installing-the-cockroachdb-cluster-on-gke">Installing the CockroachDB Cluster on GKE</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-cockroachdb-cluster-now-that-tls-mtls-are-enabled">Connecting to Our CockroachDB Cluster (Now That TLS + mTLS Are Enabled)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-mutual-tls-mtls-why-we-need-a-certificate-for-our-root-user">Connecting via Mutual TLS (mTLS) — Why We Need a Certificate for Our root User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-lets-explore-our-clusters-certificate">Let’s Explore Our Cluster’s Certificate</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-certificate-sections-explained-super-simply">Understanding the Certificate Sections (Explained Super Simply)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-client-certificate-so-we-can-finally-connect-to-cockroachdb">Creating a Client Certificate (So We Can Finally Connect to CockroachDB)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-cockroachdb-cluster-securely-using-mtls">Connecting to Our CockroachDB Cluster Securely (Using mTLS)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-restoring-our-previous-database-into-the-new-gke-cockroachdb-cluster-without-sa-keys">Restoring Our Previous Database into the New GKE CockroachDB Cluster (without SA keys)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-restoring-our-previous-database-from-google-cloud-storage">Restoring Our Previous Database from Google Cloud Storage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-now-lets-restore-the-data">Now, Let’s Restore the Data 🎉</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-the-database-with-a-new-user">Connecting to the Database with a New User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-with-passwordless-authentication-mutual-tls">Connecting with Passwordless Authentication (Mutual TLS)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-mutual-tls-mtls-from-our-apps-on-kubernetes">Connecting via Mutual TLS (mTLS) from Our Apps on Kubernetes</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-get-a-cockroachdb-enterprise-license-for-free">How to Get a CockroachDB Enterprise License for FREEE!</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-three-types-of-licenses">Three Types of Licenses</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-apply-for-the-free-enterprise-license">How to Apply for the Free Enterprise License</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-your-license-to-the-cockroachdb-cluster">Adding Your License to the CockroachDB Cluster</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion-amp-next-steps">Conclusion &amp; Next Steps ✨</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-about-the-author">About the Author 👨🏾‍💻</a></li>
</ul>
</li>
</ol>
<h2 id="heading-what-even-is-cockroachdb">What Even Is CockroachDB? 🤔</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760416037885/c67edcbb-be85-4614-bdf3-104942048eea.jpeg" alt="An image summarizing what CockroachDB is" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Hey! Before we jump into setting up our Kubernetes cluster and deploying our CockroachDB cluster, let’s get grounded in what CockroachDB really is. (Because if you don’t understand the why and how, the implementation and practical sessions will just feel like magic 😅.)</p>
<h3 id="heading-simple-definition">Simple Definition</h3>
<p>CockroachDB is a distributed SQL database. This means it gives you the features of a relational database (tables, SQL queries, JOINs, transactions) but automatically copies your data across multiple replicas (servers, nodes, instances) – no manual sharding needed. 😃</p>
<p>It’s built to survive failures, scale easily (compared to other SQL databases), and keep your data consistent no matter what (across all the instances).</p>
<h3 id="heading-who-made-cockroachdb-when-was-it-released">Who Made CockroachDB? When Was it Released?</h3>
<p>CockroachDB was created by <a target="_blank" href="https://www.cockroachlabs.com/"><strong>Cockroach Labs</strong></a>, founded by Spencer Kimball, Peter Mattis, and Ben Darnell. The idea first started taking shape around 2014, and by 2015 Cockroach Labs was formally founded.</p>
<p>Its 1.0 “production-ready” version was announced in 2017, marking its transition from beta to being suitable for real-world use.</p>
<h3 id="heading-what-problems-does-cockroachdb-try-to-solve">What Problems Does CockroachDB Try to Solve?</h3>
<p>Traditional relational databases are great, but they run into real challenges when your app grows. CockroachDB was built to solve those. Here are the key pain points and how CockroachDB addresses them:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pain Point</td><td>What usually happens</td><td>How CockroachDB fixes it</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Single primary bottleneck</strong></td><td>Only one “primary” node handles writes, updates, and deletes, and that single node is difficult to scale (to keep up with DB usage) without downtime.</td><td>CockroachDB is <strong>multi-primary</strong>, meaning every node can accept reads and writes. No single “primary” for the entire cluster.</td></tr>
<tr>
<td><strong>Manual sharding complexity</strong></td><td>You have to split data (shard) by hand, decide which piece goes where, and handle cross-shard queries – lots of headaches 😖.</td><td>CockroachDB automatically partitions data into smaller units (called <em>ranges</em>) and moves them around to balance load.</td></tr>
<tr>
<td><strong>Failover downtime</strong></td><td>If the primary node fails, you need to promote a replica (read-only instance) and switch over. During that time, your app might be down.</td><td>Because there’s no single primary, if one of the instances fails, the others take over seamlessly (via consensus) without a big outage.</td></tr>
<tr>
<td><strong>Geographic scaling &amp; latency</strong></td><td>Serving users in different regions is hard — either data is far away (slow) or you must build complex replication logic.</td><td>CockroachDB lets you distribute nodes across regions. You can serve local reads/writes while keeping global consistency.</td></tr>
</tbody>
</table>
</div><p>So instead of fighting your database as it grows, CockroachDB handles much of the hard work for you.</p>
<h3 id="heading-key-terms-you-should-know-in-plain-language">Key Terms You Should Know (in plain language):</h3>
<ul>
<li><p><strong>Node:</strong> a single running instance (server) of your database. Together, the nodes make up the cluster and hold copies of your data, also known as replicas. A replica can be read-only (data can only be read from it, for example using SELECT statements), OR read-write (data can be read, created, updated, and deleted).</p>
</li>
<li><p><strong>Replication</strong>: making copies of data on multiple nodes. If one node fails, others still have the data.</p>
</li>
<li><p><strong>Raft (consensus algorithm)</strong>: a system that ensures copies (replicas) agree on changes in a safe, reliable way. For example, when you want to write data, Raft ensures that most copies agree before it’s accepted.</p>
</li>
<li><p><strong>Sharding / Ranges</strong>: Instead of putting all your data in one big blob, CockroachDB splits it into smaller chunks called <em>ranges</em>. Each range is replicated and can move between nodes.</p>
</li>
<li><p><strong>Distributed transaction</strong>: a transaction (series of operations) that might touch data stored in different nodes. CockroachDB manages this, so you still get ACID (atomic, consistent, isolated, durable) properties.</p>
</li>
</ul>
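<p>To make the Raft bullet above concrete, here is a toy Python sketch of the majority rule – a write only commits once a quorum of replicas has acknowledged it. This is illustrative only, not CockroachDB’s actual implementation:</p>

```python
# Toy sketch of Raft-style quorum (not CockroachDB's real code):
# a write is accepted only once a majority of replicas acknowledge it.

def quorum_size(replicas: int) -> int:
    """Smallest majority for a given replica count."""
    return replicas // 2 + 1

def write_accepted(acks: int, replicas: int = 3) -> bool:
    """A write commits when a majority of replicas have acknowledged it."""
    return acks >= quorum_size(replicas)

# With the default 3-way replication, 2 acks are enough -- so one node
# can be down (or slow) and writes still succeed.
print(write_accepted(acks=2, replicas=3))  # True
print(write_accepted(acks=2, replicas=5))  # False: 5 replicas need 3 acks
```

<p>This is why losing a single node in a 3-replica setup doesn’t stop writes: the remaining two still form a majority.</p>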
<h3 id="heading-why-the-name-cockroachdb">Why the name “CockroachDB”? 😅</h3>
<p>You might wonder: <em>Why name a database after a cockroach?</em> It sounds weird at first, but there's a reason:</p>
<p>Cockroaches are known for surviving harsh conditions: radiation, natural disasters, and so on. The founders wanted a database that feels almost “impossible to kill,” that can survive node failures, outages, and network splits. The name is a tongue-in-cheek nod to resilience.</p>
<h2 id="heading-why-choose-cockroachdb-over-postgresql-or-mongodb">Why Choose CockroachDB Over PostgreSQL or MongoDB 🤷🏾‍♂️?</h2>
<p>Let’s compare the classic setup (Postgres / MongoDB) to CockroachDB, especially why you might want to go with CockroachDB, and how it helps ease scaling. I’ll also explain some terms to make sure you’re following.</p>
<p>In many setups, when you use Postgres or MongoDB, you’ll often have one “primary” node that handles all writes (that is, inserts, updates, deletes).</p>
<p>Then you have multiple “read replicas” that copy the primary’s data and serve read requests (selects). That works okay – reads can be spread out – but all write traffic goes to that one primary node.</p>
<p>Usually, the primary eventually gets stressed when the write volume grows (for example, more customers create accounts and products on your platform).</p>
<p>You can add more read replicas (horizontal scaling for reads, for example customers trying to view their accounts, or previously created products on your site), but scaling the primary is much harder.</p>
<p>To scale the primary, you often resort to upgrading its resources (CPU, RAM, disk) – that’s vertical scaling – which often needs downtime (shut down the primary database, increase its CPU and RAM, then spin it back up).</p>
<p>Or you’d have to manually shard (split) your data across multiple primaries, route traffic carefully, and manage complexity.</p>
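<p>To see why manual sharding is such a headache, here’s a hypothetical Python sketch of the routing logic your application would have to own if you split data across multiple primaries (the shard names are made up):</p>

```python
import hashlib

# Hypothetical sketch of the manual sharding CockroachDB spares you from:
# the application itself must decide which shard (primary) owns each key.
SHARDS = ["users-shard-0", "users-shard-1", "users-shard-2"]  # made-up names

def shard_for(user_id: str) -> str:
    """Route a key to a shard by hashing it. Every query, migration, and
    cross-shard JOIN now has to go through logic like this."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

print(shard_for("user-42"))  # always the same shard for the same key
```

<p>Notice that adding a fourth shard would silently remap most keys (the modulo changes), so rebalancing becomes yet another migration you’d have to manage by hand.</p>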
<h3 id="heading-how-fault-tolerance-is-handled-in-postgresql-and-mongodb">How Fault Tolerance is Handled in PostgreSQL and MongoDB</h3>
<p>When you try to make Postgres (or MongoDB) highly available and fault tolerant in a self-managed setup, you often need one primary and two or more read replicas.</p>
<p>The tricky part is handling what happens when the primary fails (or is taken down temporarily for an upgrade). You need something that can promote a replica to a primary automatically.</p>
<p>In Postgres land, that’s often handled by <a target="_blank" href="https://github.com/patroni/patroni"><strong>Patroni</strong></a> or <a target="_blank" href="https://www.repmgr.org/"><strong>repmgr</strong></a> (tools that handle cluster management, failover, leader election, and so on).</p>
<p>In MongoDB, such logic is part of the <strong>replica set</strong> behavior: it does automatic elections among replicas.</p>
<p>Here are some of the core challenges with that classic model, along with the extra routing plumbing it demands:</p>
<ul>
<li><p>Every write must go to a single primary. If that primary fails or is overloaded, your whole system suffers.</p>
</li>
<li><p>Scaling reads is easy (add more replicas), but scaling writes is hard.</p>
</li>
<li><p>Vertical scaling (give more resources to one server) has its cons. If the primary node needs more resources, you might experience some downtime when it’s being scaled up.</p>
</li>
<li><p>Manual sharding is messy: you decide which piece of data goes to which shard, handle cross-shard queries, and build routing logic. That’s a lot of maintenance and can lead to unexpected issues if not handled properly.</p>
</li>
<li><p>One service (or load balancer/proxy) points to the primary (for ALL write queries).</p>
</li>
<li><p>Another service or routing logic handles read queries and can share reads across replicas.</p>
</li>
<li><p>You might use <strong>HAProxy</strong>, <strong>pgpool-II</strong>, or <strong>pgBouncer</strong> for Postgres to route traffic, do read/write splitting, or manage connection pooling. These are external (not part of the database core) tools you have to configure.</p>
</li>
</ul>
<p>So when the primary fails, Patroni (or repmgr, and so on) will detect it and promote one of the read replicas to be the new primary.</p>
<p>But that promotion, reconfiguration, and traffic rerouting often cause a brief window of downtime (when your primary database node becomes unavailable).</p>
<h3 id="heading-how-cockroachdb-handles-it-differently">How CockroachDB Handles It Differently</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760416070693/af1ade70-19bb-4e9f-82ec-9711c13d8079.jpeg" alt="A brief look at CockroachDB properties" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>CockroachDB changes the rules:</p>
<ul>
<li><p><strong>All replicas are equal</strong> for reads <em>and</em> writes. You don’t have a special “primary” that handles writes. Every node in the cluster can accept write requests.</p>
</li>
<li><p>CockroachDB breaks your data into small chunks (ranges) and replicates them across nodes. If you add a new node, data moves around automatically to balance the load.</p>
</li>
<li><p>Every write is automatically copied to other replicas, and consistency is managed by a protocol (Raft), so you don’t have to build this yourself.</p>
</li>
<li><p>No manual sharding needed. Because the database handles how data is split and moved, you don’t need to decide how to shard by hand.</p>
</li>
<li><p>You <strong>don’t need a special service</strong> to route write vs read queries. Any node can accept both reads <strong>and</strong> writes.</p>
</li>
<li><p>During scaling, you don’t have to worry about which node is the primary – because <em>there is no primary</em>.</p>
</li>
<li><p>You can scale your nodes one at a time (rollout style). When one node is being upgraded, the others continue to serve traffic. You won’t hit a downtime window just because you're scaling the “primary.”</p>
</li>
<li><p>Because there's no replica promotion logic to fight with, there's no moment where a replica needs to be “elevated” to primary – it’s all just nodes continuing to serve.</p>
</li>
</ul>
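<p>Because every node is an equal entry point, a client can simply spread its connections across all of them. A minimal Python sketch, assuming three nodes listening on CockroachDB’s default SQL port 26257 (the hostnames here are hypothetical):</p>

```python
import itertools

# Sketch: with no primary to locate, a client can round-robin its
# connections across every node -- each one accepts reads AND writes.
nodes = ["cockroachdb-0:26257", "cockroachdb-1:26257", "cockroachdb-2:26257"]
next_node = itertools.cycle(nodes).__next__

print(next_node())  # cockroachdb-0:26257
print(next_node())  # cockroachdb-1:26257 -- writes can land here too
```

<p>In practice you’d usually put a Kubernetes Service in front of the pods to do this balancing for you, but the idea is the same: any node will do.</p>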
<h2 id="heading-how-cockroachdb-works-behind-the-scenes">How CockroachDB Works Behind the Scenes ⚙️</h2>
<p>In CockroachDB, there are many moving parts behind the scenes. But they work together, so you don’t have to babysit them. The core ideas, which we’ve mostly already touched on, are:</p>
<ul>
<li><p>Splitting data into pieces (<strong>ranges</strong>)</p>
</li>
<li><p>Keeping multiple copies of each piece (<strong>replicas/replication</strong>)</p>
</li>
<li><p>Making sure all copies agree via <strong>Raft consensus</strong></p>
</li>
<li><p>Moving pieces around to balance the load (<strong>automatic rebalancing/distribution</strong>)</p>
</li>
<li><p>Coordinating transactions that might touch many pieces</p>
</li>
</ul>
<p>Let’s go through each of those, one by one.</p>
<h3 id="heading-ranges-the-small-pieces-of-data">Ranges: The Small Pieces of Data</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760413105037/984f8b5c-bd53-4850-9704-57ce1dcedb80.png" alt="A little depiction of CockroachDB ranges" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Imagine you have a giant book of recipes. If you try to carry the whole thing, it’s heavy. So you split the book into smaller booklets, each covering recipes for a certain range of meals: breakfasts, lunches, dinners, desserts.</p>
<p>In CockroachDB, data is split into ranges, which are like those smaller booklets:</p>
<ul>
<li><p>Each range covers a certain block of data (like “all users whose ID is 1-1000”)</p>
</li>
<li><p>When a range gets too big (like having too many recipes in one booklet), it’s split into two smaller ones. That makes each piece easier to manage.</p>
</li>
<li><p>If two neighboring ranges have become very small (few recipes), they might be merged (joined) back together so you’re not keeping too many tiny booklets.</p>
</li>
<li><p>These splits and merges happen automatically, behind the scenes, so the database stays smooth as things grow or shrink.</p>
</li>
</ul>
<p>This chopping helps the system in many ways: moving pieces, copying them, balancing load, and recovering from node failures all become easier.</p>
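<p>To picture the split rule, here’s a toy shell sketch (an illustration only, not real CockroachDB logic – the real threshold is a size in bytes, configurable per zone):</p>
<pre><code class="lang-bash"># Toy sketch: a "range" covering user IDs 1-1000 splits in half once it
# grows past a threshold.
range_start=1; range_end=1000; max_keys=600
keys=$(( range_end - range_start + 1 ))
if [ "$keys" -gt "$max_keys" ]; then
  mid=$(( (range_start + range_end) / 2 ))
  echo "split: [$range_start-$mid] and [$(( mid + 1 ))-$range_end]"
else
  echo "no split needed"
fi
</code></pre>
<p>This prints <code>split: [1-500] and [501-1000]</code> – after a split, each half can live on a different node and be moved around independently.</p>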
<h3 id="heading-replication-many-copies-for-safety">Replication: Many Copies for Safety</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760413678362/a0066780-1360-4511-8fd0-466f54ea2135.jpeg" alt="Replication of Ranges across multiple Nodes (databases) in CockroachDB" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Nobody likes losing their work, so you keep backup copies. CockroachDB does this for data as well.</p>
<p>For each range, there are usually 3 copies (replicas) stored on different machines (nodes), so if one machine dies, you still have others (see the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/architecture/replication-layer">replication layer docs</a>). These copies are always kept in sync: when you write something (for example, insert or update), the change is propagated to the other copies.</p>
<p>The database also tolerates failures. If one node goes down, the system detects it and eventually makes a new copy elsewhere to replace it. So the target number of copies is maintained. This gives you fault tolerance: your data stays safe even when parts of your system fail.</p>
<h3 id="heading-raft-consensus-how-all-copies-agree">Raft Consensus: How All Copies Agree</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760415307117/79859a4b-4341-46eb-91d9-cccc3bde9a66.jpeg" alt="79859a4b-4341-46eb-91d9-cccc3bde9a66" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Having copies is useful, but you also need them to agree with each other – like making sure every copy of a recipe booklet has the same content. The Raft protocol is a way to make sure that happens reliably.</p>
<p>Here’s how Raft works in simple terms:</p>
<ul>
<li><p>Each range has a group of replicas. One of these replicas acts as the <strong>leader</strong>. Others are <strong>followers</strong>.</p>
</li>
<li><p>All write requests for that range go through the leader. The leader gets the request, then tells followers to record the same change.</p>
</li>
<li><p>Once most of the copies (a majority) say “yep, we got it,” the change is considered final (committed). Then the leader tells the client, “Done.”</p>
</li>
<li><p>If the leader stops working (the machine dies or the network fails), the followers notice it (they stop getting regular “I’m alive” messages), then they hold an election to pick a new leader, and the show goes on.</p>
</li>
<li><p>This way, the system ensures everyone has the same final data and no conflicting changes happen.</p>
</li>
</ul>
<p>So Raft is the agreement protocol that keeps all copies in sync and safe.</p>
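<p>To make the “majority” idea concrete, here’s a tiny shell sketch (just arithmetic, not CockroachDB code) that computes the quorum size for common replication factors:</p>
<pre><code class="lang-bash"># A write commits once a majority (quorum) of replicas confirm it:
# quorum = floor(n/2) + 1, so 3 replicas tolerate 1 failure, 5 tolerate 2.
for n in 3 5; do
  quorum=$(( n / 2 + 1 ))
  echo "replicas=$n quorum=$quorum failures_tolerated=$(( n - quorum ))"
done
</code></pre>
<p>Running this prints <code>replicas=3 quorum=2 failures_tolerated=1</code> and <code>replicas=5 quorum=3 failures_tolerated=2</code> – which is why the default of 3 replicas survives a single node going down.</p>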
<h3 id="heading-multiraft-keeping-raft-efficient-when-things-scale">MultiRaft: Keeping Raft Efficient When Things Scale</h3>
<p>When you have many ranges (many pieces of the booklets), each range has its own Raft group. That can mean a lot of “are you alive?” messages between nodes, and a lot of overhead. MultiRaft is the trick CockroachDB uses to make this efficient.</p>
<p>MultiRaft groups together Raft work for many ranges that share nodes, so overhead is reduced. Instead of sending separate heartbeat (are you alive?) messages for each range, some of the messages are bundled.</p>
<p>This reduces network chatter and resource waste and helps the database scale smoothly when you have tons of data and many pieces.</p>
<h3 id="heading-rebalancing-movement-for-balance">Rebalancing: Movement for Balance</h3>
<p>When your ranges are not evenly spread across nodes (machines), some machines are doing way too much work, and some hardly any. That’s not good. So CockroachDB automatically moves pieces around to balance things.</p>
<ul>
<li><p>The system watches how busy each node is (how many ranges it holds, how much data, how much read/write traffic).</p>
</li>
<li><p>If one node is overloaded, it will move some ranges to other nodes.</p>
</li>
<li><p>If a node dies, the system notices and makes sure that ranges that were on that node get copied somewhere else so safety (replica count) is maintained.</p>
</li>
<li><p>If you add a new node, the system starts moving ranges to the new node so its resources are used.</p>
</li>
</ul>
<p>This happens without you having to manually decide “move this here, move that there.”</p>
<h3 id="heading-distributed-transactions-doing-work-across-multiple-ranges">Distributed Transactions: Doing Work Across Multiple Ranges</h3>
<p>Often, an operation touches multiple ranges. For example, “transfer money from account A (in range 1) to account B (in range 2)”. That must be handled carefully so that either both parts succeed, or neither does.</p>
<p>CockroachDB supports <strong>distributed transactions</strong>, meaning a single transaction can work across many ranges. It uses “intent” writes (temporary placeholders) and once everything is ready, it commits the transaction so it becomes permanent. If something fails, it aborts (cancels) the whole thing. The system ensures atomic behavior: all or nothing.</p>
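<p>As a toy analogy (plain shell, nothing CockroachDB-specific), you can think of intent writes as staged files that only become permanent in a single commit step at the end:</p>
<pre><code class="lang-bash"># Toy "all or nothing" sketch: stage both sides of a transfer as intent
# files, then make them permanent together only if every intent was staged.
tmp=$(mktemp -d)
echo "account A: -100" | tee "$tmp/intent_a"
echo "account B: +100" | tee "$tmp/intent_b"

intents=$(ls "$tmp" | wc -l)
if [ "$intents" -eq 2 ]; then
  # commit phase: every intent exists, so apply them as one unit
  cat "$tmp/intent_a" "$tmp/intent_b" | tee "$tmp/committed"
  echo "transaction committed"
else
  # abort phase: throw away all intents, leaving no partial change
  rm -rf "$tmp"
  echo "transaction aborted"
fi
</code></pre>
<p>The real protocol is far more involved (timestamps, intent resolution, retries), but the shape is the same: nothing becomes visible until the single commit decision.</p>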
<h3 id="heading-how-it-all-fits-together-read-write-flow-what-happens-when-you-use-it">How It All Fits Together: Read + Write Flow (What Happens When You Use It)</h3>
<p>Let’s picture a write, step by step:</p>
<ol>
<li><p>Your app sends a write (for example, “add new user”) to any node in the CockroachDB cluster.</p>
</li>
<li><p>That node figures out which range(s) are involved (which pieces hold the data you want to write).</p>
</li>
<li><p>For each range, the write goes to that range’s leader.</p>
</li>
<li><p>The leader writes the change to its own copy, then tells the followers to do the same.</p>
</li>
<li><p>Once most copies confirm they have the change, the leader declares it “committed” and tells your app, “yes, write done.”</p>
</li>
<li><p>If a node is busy or down, others still handle traffic.</p>
</li>
</ol>
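<p>Steps 4 and 5 above boil down to counting acknowledgements. Here’s a tiny shell sketch of that decision (an illustration only, not real CockroachDB internals):</p>
<pre><code class="lang-bash"># The leader commits once a majority of the 3 replicas acknowledge the write.
confirmed=0
for ack in ok ok timeout; do   # leader + one follower answered; one is slow
  if [ "$ack" = "ok" ]; then
    confirmed=$(( confirmed + 1 ))
  fi
done
if [ "$confirmed" -ge 2 ]; then   # quorum of 3 replicas is 2
  echo "write committed ($confirmed of 3 acks)"
else
  echo "no quorum yet - keep waiting or retry"
fi
</code></pre>
<p>Notice that the slow follower doesn’t block the commit: two out of three acknowledgements are enough.</p>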
<p>Read flow:</p>
<ul>
<li><p>Your app sends a read (for example “get user by ID”) to any node.</p>
</li>
<li><p>That node checks its copies. If it has a fresh copy, it answers. If not, it asks the node that does.</p>
</li>
</ul>
<p>Everything works together so data stays correct, up to date, and reliably available even if machines fail or the network lags.</p>
<h3 id="heading-why-this-all-matters-putting-it-in-plain-english">Why This All Matters (Putting It in Plain English)</h3>
<p>All these mechanisms matter for several key reasons. First of all, because data is chopped into ranges and replicated, no single node is a bottleneck. Also, Raft ensures consensus, so you can trust that data is consistent across all working replicas.</p>
<p>Beyond this, rebalancing is automatic, so you don’t have to micromanage shards or worry about nodes drowning in load. And because transactions that touch multiple ranges are coordinated, you can trust ACID properties even in a distributed setup.</p>
<h2 id="heading-where-and-how-should-you-host-cockroachdb">Where (and How) Should You Host CockroachDB? ☁️</h2>
<p>There isn’t just one “right” way to host CockroachDB. There are a few paths you can pick, each with pros and cons. What you pick depends on cost, control, ease of use, and your risk tolerance.</p>
<p>In this section, we’ll explore:</p>
<ul>
<li><p>Cockroach Labs’ own managed cloud (CockroachDB Cloud)</p>
</li>
<li><p>“Bring Your Own Cloud” (BYOC) – letting Cockroach Labs manage it inside <em>your</em> cloud account</p>
</li>
<li><p>Hosting via cloud marketplaces (AWS, GCP, Azure)</p>
</li>
<li><p>Self-hosting / Kubernetes / your own infrastructure</p>
</li>
<li><p>And notes on DigitalOcean support</p>
</li>
</ul>
<p>Let’s dive in.</p>
<h3 id="heading-option-1-cockroachdb-cloud-fully-managed-by-cockroach-labs">Option 1: CockroachDB Cloud (fully managed by Cockroach Labs)</h3>
<p>This is the easiest option if you want to offload operations. You don’t manage nodes (computers, virtual machines, and so on), upgrades, or backups – Cockroach Labs handles all that.</p>
<p><strong>What it offers:</strong></p>
<ul>
<li><p>You sign up and click “create cluster.”</p>
</li>
<li><p>Automatic scaling, zero-downtime upgrades, and managed backups.</p>
</li>
<li><p>It supports multiple cloud providers behind the scenes (you pick region(s)).</p>
</li>
<li><p>You get tools, APIs, and Terraform integration to automate it.</p>
</li>
<li><p>They often give free credits to get started.</p>
</li>
</ul>
<p><strong>Tradeoffs:</strong></p>
<ul>
<li><p>You have less control over underlying infrastructure, for example Virtual Machines, networking, disks, and so on (you trade control for convenience).</p>
</li>
<li><p>You pay for the managed service premium.</p>
</li>
<li><p>You rely on Cockroach Labs’ SLAs, uptime, and support.</p>
</li>
</ul>
<p>If you want, you can check it out here: <a target="_blank" href="https://www.cockroachlabs.com/product/cloud/">CockroachDB Cloud (managed by Cockroach Labs)</a>.</p>
<h3 id="heading-option-2-bring-your-own-cloud-byoc">Option 2: Bring Your Own Cloud (BYOC)</h3>
<p>This is a middle ground: you keep your cloud environment, but let Cockroach Labs manage the database. It gives you control over infrastructure, billing, network, and so on, while still offloading operational complexity.</p>
<p><strong>How it works:</strong></p>
<ul>
<li><p>You run CockroachDB Cloud inside your cloud account (AWS, GCP, and so on).</p>
</li>
<li><p>Cockroach Labs still handles provisioning, upgrades, backups, and observability. You manage roles, networking, and logs.</p>
</li>
<li><p>Useful for complying with regulations, keeping data within your own cloud account, and using your existing cloud discounts.</p>
</li>
</ul>
<p><strong>Tradeoffs:</strong></p>
<ul>
<li><p>You still need to set up cloud aspects (VPCs, IAM, roles) correctly.</p>
</li>
<li><p>There’s more complexity than pure managed, but more control as well.</p>
</li>
<li><p>Cockroach Labs needs access to certain parts of your account (permissions).</p>
</li>
</ul>
<p>If you want to explore BYOC, you can read more here: <a target="_blank" href="https://www.cockroachlabs.com/product/cloud/bring-your-own-cloud/">CockroachDB Bring Your Own Cloud</a>.</p>
<h3 id="heading-option-3-use-cloud-marketplaces-aws-gcp-azure">Option 3: Use Cloud Marketplaces (AWS, GCP, Azure)</h3>
<p>If you already use a cloud provider, sometimes the easiest way is to deploy via their marketplace offerings. It gives you familiarity, billing simplicity, and so on.</p>
<ul>
<li><p><strong>GCP Marketplace</strong> – CockroachDB is available on the Google Cloud Marketplace, making it easier to deploy within your GCP environment. You can learn more here: <a target="_blank" href="https://console.cloud.google.com/marketplace/product/cockroachdb-public/cockroachdb">GCP Marketplace</a>.</p>
</li>
<li><p><strong>AWS Marketplace</strong> – CockroachDB is listed there: <a target="_blank" href="https://aws.amazon.com/marketplace/pp/prodview-n3xpypxea63du">AWS Marketplace</a>.</p>
</li>
<li><p><strong>Azure Marketplace</strong> – Also supported for Azure deployments (SaaS/managed listings): <a target="_blank" href="https://marketplace.microsoft.com/en-us/product/saas/cockroachlabs1586448087626.cockroachdb-azure?tab=overview">Azure Marketplace</a>.</p>
</li>
<li><p><strong>DigitalOcean</strong> – There is support for CockroachDB deployment on DigitalOcean using their infrastructure: <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-on-digital-ocean">Deploy CockroachDB on DigitalOcean</a>.</p>
</li>
</ul>
<p>These options let you stay in your cloud console, use your existing cloud accounts, and integrate with other resources you already have.</p>
<p>But you're still responsible for certain operational tasks (networking, security, monitoring, backups) depending on how the marketplace offering is configured.</p>
<h3 id="heading-option-4-my-favorite-self-hosting-especially-using-kubernetes">Option 4 (My Favorite 😁): Self-Hosting — Especially Using Kubernetes</h3>
<p>If you self-host CockroachDB, you get <strong>full control</strong>. You’re the boss of everything: the machines, storage, networking, backups, upgrades, monitoring – all of it.</p>
<p>What’s even better is that using Kubernetes means your setup isn’t tied to one cloud provider. You can run it on AWS, GCP, Azure, or even on-premises later, with very little change. Kubernetes gives you a “portable infra” layer.</p>
<p>Managed CockroachDB services charge you extra for maintenance, upgrades, backups, and so on – those are baked into the price. When you self-host, you accept that burden yourself, but you also avoid paying that extra margin. You pay only for compute, disks, network, and your time/ops work.</p>
<p>You can also self-host in the cloud (using cloud VMs) while still managing every layer: disks, network, security, and so on. Using Kubernetes here is a sweet middle ground: you get cloud-grade reliability for the VMs, but you fully control everything above them.</p>
<h4 id="heading-why-kubernetes-beats-tools-like-docker-swarm-or-hashicorp-nomad-for-databases">Why Kubernetes Beats Tools Like Docker Swarm or Hashicorp Nomad for Databases</h4>
<p>Because CockroachDB is a <strong>stateful</strong> system (it holds data), you need strong support for “data that stays even when a pod restarts or moves.” Kubernetes is designed with good primitives for that. Other tools don’t always shine there.</p>
<p>Here’s the comparison in simple terms:</p>
<ul>
<li><p><strong>Docker Swarm / Docker Compose:</strong> Great for stateless apps (web servers, APIs), but when it comes to databases, it struggles. Swarm doesn’t natively support persistent volume claims at a cluster level, so if a container (database replica) moves to a different node (VM), it might lose access to its storage. Devs often pin containers to specific nodes manually to avoid this.</p>
</li>
<li><p><strong>Nomad:</strong> More flexible and simpler in some ways, but it’s not as rich in features around connectivity, storage management, and built-in tooling for containers. It works well in mixed workloads, but handling complex databases usually means you need to build extra layers.</p>
</li>
<li><p><strong>Kubernetes:</strong> It has built-in support for stateful workloads:</p>
<ul>
<li><p><strong>StatefulSets (Properly managing data for each database):</strong> This ensures that each CockroachDB replica (pod) keeps its identity and storage intact even if the pod restarts. So the database replica doesn’t lose its “name” or data when things change.</p>
</li>
<li><p><strong>Persistent volumes and persistent volume claims (external disks):</strong> These are like dedicated hard drives or disks attached to pods (database replicas). Even if a pod moves, crashes, or restarts, the disk (data) stays. Kubernetes makes sure the data stays safe.</p>
</li>
<li><p><strong>StorageClasses (choose your disk):</strong> You can choose the type of disk your data will be stored on, for example:</p>
<ul>
<li><p>HDD (most affordable, but slower)</p>
</li>
<li><p>Balanced disk (SSD-backed, a balance between cost and speed)</p>
</li>
<li><p>Fast SSD (very fast, recommended by the CockroachDB team, but a bit more expensive than a balanced disk)</p>
</li>
</ul>
</li>
<li><p><strong>Anti-affinity (high availability, fault tolerance):</strong> You can tell Kubernetes, “don’t put more than one CockroachDB replica on the same VM or physical machine.” That way, if one VM goes bad, the other replicas are safe.</p>
</li>
<li><p><strong>Rolling updates (no downtime):</strong> These let you update one replica at a time (configuration, version, resources) without bringing down the whole cluster. While one replica updates, the others serve traffic, which helps you avoid downtime.</p>
</li>
<li><p>Kubernetes also starts and stops replicas in a predictable order (via StatefulSets), so changes stay safe and orderly.</p>
</li>
<li><p><strong>Vertical vs horizontal scaling (earlier talk – reminder)</strong><br>  You remember we talked about scaling in prior sections:</p>
<ul>
<li><p><strong>Horizontal scaling</strong> means adding more replicas (more pods, more nodes) so load spreads out.</p>
</li>
<li><p><strong>Vertical scaling</strong> means increasing the resources (CPU, RAM, disk) of existing nodes/replicas.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>In tools like Nomad or Docker Swarm, vertical scaling tends to be harder: it often involves stopping services, shutting things down, and restarting VMs, which causes downtime.</p>
<p>Kubernetes makes vertical and horizontal scaling easier at the pod level (you can resize one pod’s CPU + RAM) and manages rolling upgrades so you don’t take everything down at once.</p>
<p>You can also easily add more database replicas to the cluster (to balance load and make the database process queries faster), and the data is automatically copied to the new replica (replication), especially when you use the official CockroachDB Helm chart.</p>
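<p>As a sketch, the anti-affinity rule described above looks roughly like this in raw Kubernetes YAML (the label key here is illustrative – the official CockroachDB Helm chart can generate an equivalent rule for you):</p>
<pre><code class="lang-yaml"># Illustrative pod anti-affinity: never schedule two CockroachDB pods
# on the same VM (node).
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: cockroachdb   # illustrative label
        topologyKey: kubernetes.io/hostname       # one pod per hostname
</code></pre>
<p>With a rule like this, losing one VM takes down at most one replica, and the surviving replicas keep serving traffic.</p>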
<h4 id="heading-why-other-tools-swarm-nomad-docker-compose-dont-match-up-here">Why Other Tools (Swarm / Nomad / Docker Compose) Don’t Match Up Here</h4>
<p>Docker Swarm and Docker Compose are simpler to use and are good when you don’t have much complexity. But they lack robust features for stable storage, default support for replication, vertical scaling, horizontal scaling of stateful services, and so on. For example, Swarm doesn’t have built-in StatefulSets or dynamic volume provisioning like Kubernetes.</p>
<p>Nomad is more flexible than Swarm in some ways, but many users find its storage plugins (CSI) weaker than what Kubernetes offers. It also has less built-in support for ordered startup and rolling updates of stateful apps.</p>
<p>So while these tools work fine for simpler apps (stateless services, small apps), when you run a distributed, stateful SQL database like CockroachDB, Kubernetes gives you more safety, more control, and less chance of data loss or misconfiguration.</p>
<p>Because of all this, running CockroachDB on Kubernetes gives you the tools you need baked in, reducing how much custom plumbing you must write yourself.</p>
<h4 id="heading-trade-offhttpswwwredditcomrhashicorpcomments1ivtuo5utmsourcechatgptcoms-things-to-watch-out-for">Trade-offs (things to watch out for)</h4>
<ul>
<li><p>You have to manage everything: backups, monitoring the ENTIRE CockroachDB cluster, withstanding failures (fault tolerance), and upgrades. That’s work 🥲.</p>
</li>
<li><p>You need to know your way around infra (VMs, disks, networking, and inter-node connections) and operations (or have teammates who do – DevOps Engineers, Cloud Architects, Site Reliability Engineers).</p>
</li>
<li><p>Using managed Kubernetes (like GKE, EKS, AKS) helps as you offload the control plane. You still manage the nodes, storage, and higher layers.</p>
</li>
<li><p>But even with that, you avoid paying for “database management as a service” markup – you're only paying for infrastructure plus your time.</p>
</li>
</ul>
<h2 id="heading-setting-up-your-local-environment"><strong>Setting Up Your Local Environment 🧑‍💻</strong></h2>
<p>Alright, we’ve learned quite a bit so far: what CockroachDB is, how it works behind the scenes, and where you can host it. Now, it’s time to roll up our sleeves and get our hands dirty with some practical setup.</p>
<p>Before we deploy CockroachDB, we need a safe “playground” where we can test and experiment without touching the cloud or spending a dime.</p>
<h3 id="heading-why-these-tools">Why these tools?</h3>
<p>Before we jump into running commands, here’s a quick overview of the tools we’ll use and why:</p>
<ul>
<li><p><strong>Minikube</strong>: A tool that runs a small Kubernetes cluster on your computer. It gives you a local “mini cloud” where you can deploy and experiment.</p>
</li>
<li><p><strong>Kubectl</strong>: The command line tool you’ll use to talk to your Kubernetes cluster to deploy apps, check status, and manage resources.</p>
</li>
<li><p><strong>Helm</strong>: A package manager for Kubernetes. It helps you install complex applications (like CockroachDB) with fewer manual steps.</p>
</li>
</ul>
<h3 id="heading-step-1-install-minikube">Step 1: Install Minikube</h3>
<p><strong>What is Minikube?</strong><br>Minikube is a lightweight tool that helps you run a small Kubernetes cluster on your personal computer.</p>
<p>Think of it as your own mini-cloud environment where you can test, deploy, and learn Kubernetes (and in our case, CockroachDB) locally. It’s perfect for learning and experimenting before deploying on the cloud.</p>
<p>Here’s how to get it on different operating systems:</p>
<h4 id="heading-windows">🪟 Windows</h4>
<ol>
<li><p>Make sure you have a hypervisor (VirtualBox, Hyper-V) or Docker installed.</p>
</li>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> choco install minikube
</code></pre>
<p> or use:</p>
<pre><code class="lang-bash"> winget install minikube
</code></pre>
</li>
<li><p>After installation, check the version:</p>
<pre><code class="lang-bash"> minikube version
</code></pre>
<p> If it returns a version number, you’re good 👍🏾</p>
</li>
</ol>
<p>If you don’t have the <code>choco</code> or <code>winget</code> package manager, you can install Minikube via PowerShell by following the steps in the <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/?arch=%2Fwindows%2Fx86-64%2Fstable%2F.exe+download">docs</a>.</p>
<h4 id="heading-macos">🍎 macOS</h4>
<ol>
<li><p>Ensure you have Homebrew installed.</p>
</li>
<li><p>In Terminal, run:</p>
<pre><code class="lang-bash"> brew install minikube
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> minikube version
</code></pre>
</li>
</ol>
<h4 id="heading-linux">🐧 Linux</h4>
<ol>
<li><p>Ensure you’re on a supported distribution (Ubuntu, Fedora, and so on) and virtualization (Docker, KVM, and so on) is enabled.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
 sudo install minikube-linux-amd64 /usr/<span class="hljs-built_in">local</span>/bin/minikube
 rm minikube-linux-amd64
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> minikube status
</code></pre>
</li>
</ol>
<p>✅ At this point you should have a local Kubernetes cluster up and running on your machine! Next, we’ll install Kubectl so you can talk to the cluster from your command line.</p>
<h3 id="heading-step-2-install-kubectl">Step 2: Install kubectl</h3>
<p><strong>What kubectl does:</strong><br>kubectl is the command-line tool that lets you talk to your Kubernetes cluster. Using it, you can deploy applications, check your cluster’s health, and manage resources inside your cluster.</p>
<p>You’ll use it a lot when working with Kubernetes on Minikube and later when you deploy CockroachDB.</p>
<p>Here’s how to install it on Windows, macOS, and Linux:</p>
<h4 id="heading-windows-1">🪟 Windows</h4>
<ol>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> choco install kubernetes-cli
</code></pre>
<p> or if you prefer:</p>
<pre><code class="lang-bash"> choco install kubectl
</code></pre>
</li>
<li><p>Then check the version:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
<p> If it prints a version number, you’re good.</p>
</li>
</ol>
<h4 id="heading-macos-1">🍎 macOS</h4>
<ol>
<li><p>Open Terminal.</p>
</li>
<li><p>If you have Homebrew installed, run:</p>
<pre><code class="lang-bash"> brew install kubectl
</code></pre>
</li>
<li><p>Check the version:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
<p> That should show something like “Client Version: v1.x.x”.</p>
</li>
</ol>
<h4 id="heading-linux-1">🐧 Linux</h4>
<ol>
<li><p>Open your terminal.</p>
</li>
<li><p>Download the latest kubectl binary:</p>
<pre><code class="lang-bash"> curl -LO <span class="hljs-string">"https://dl.k8s.io/release/<span class="hljs-subst">$(curl -L -s https://dl.k8s.io/release/stable.txt)</span>/bin/linux/amd64/kubectl"</span>
</code></pre>
</li>
<li><p>Make it executable and move it into your PATH:</p>
<pre><code class="lang-bash"> chmod +x ./kubectl
 sudo mv ./kubectl /usr/<span class="hljs-built_in">local</span>/bin/kubectl
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
</li>
</ol>
<p>After this, you’ll have kubectl installed and ready to use with your local Minikube cluster. Next up we’ll install Helm, which will make deploying CockroachDB much easier.</p>
<h3 id="heading-step-3-install-helm">Step 3: Install Helm</h3>
<p>Helm is basically the package manager for Kubernetes. Think of it like how you use <code>apt</code>, <code>yum</code>, or <code>brew</code> to install software on your computer. Helm does something similar for Kubernetes apps.</p>
<p>With Kubernetes, deploying a full app often means writing lots of configs (manifests – Deployments, Services, PersistentVolumes, ConfigMaps, and so on). Helm lets us bundle all of that into a single “package” (called a chart) so we don’t have to create each resource one after the other (which could be hectic to manage, btw 😖).</p>
<p>Because our goal is to deploy a pretty complex system (CockroachDB) on Kubernetes – which includes stateful nodes, persistent storage, networking, SSL/TLS, and so on – using a Helm chart makes it <em>so much easier</em> than crafting dozens of YAML files from scratch.</p>
<p>So before we install CockroachDB, we’ll install Helm. This gives us the toolkit to deploy and manage our cluster much more easily.</p>
<p>Let’s install Helm on each platform. After this, you’ll have the <code>helm</code> command ready to deploy apps into your Kubernetes cluster.</p>
<h4 id="heading-windows-2">🪟 Windows</h4>
<ol>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>If you have Chocolatey installed, run:</p>
<pre><code class="lang-bash"> choco install kubernetes-helm
</code></pre>
<p> Alternatively:</p>
<pre><code class="lang-bash"> choco install helm
</code></pre>
</li>
<li><p>Confirm installation:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
<p> You should see something like <code>version.BuildInfo{Version:"v3.x.x",…}</code>.</p>
</li>
</ol>
<h4 id="heading-macos-2">🍎 macOS</h4>
<ol>
<li><p>Open Terminal.</p>
</li>
<li><p>With Homebrew installed, run:</p>
<pre><code class="lang-bash"> brew install helm
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
<p> If you see version info, you’re good.</p>
</li>
</ol>
<h4 id="heading-linux-2">🐧 Linux</h4>
<ol>
<li><p>Open your terminal.</p>
</li>
<li><p>Download and install the binary (example for the latest version):</p>
<pre><code class="lang-bash"> curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
 chmod 700 get_helm.sh
 ./get_helm.sh
</code></pre>
<p> Or you can directly download the binary and move it into your <code>PATH</code>.</p>
</li>
<li><p>Check version:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
</li>
</ol>
<p>✅ After this, you have <code>helm</code> installed and you’re ready to use it.</p>
<p>In the next part, we’ll use Helm to install CockroachDB into your local Minikube cluster. We’ll add the CockroachDB chart, configure it, and spin up a multi-node replica setup right on your PC.</p>
<h2 id="heading-deploying-cockroachdb-on-minikube-the-fun-part-begins">Deploying CockroachDB on Minikube (The Fun Part Begins 😁!)</h2>
<p>Before we go to the cloud, we’ll deploy CockroachDB locally on Minikube using Helm.</p>
<p>This process will help us:</p>
<ul>
<li><p>Understand how CockroachDB runs in a cluster</p>
</li>
<li><p>Learn how Kubernetes manages database replicas</p>
</li>
<li><p>Gain hands-on experience before deploying to the cloud</p>
</li>
</ul>
<h3 id="heading-step-1-visit-artifacthub">Step 1: Visit ArtifactHub</h3>
<p><strong>ArtifactHub</strong> is like an App Store for Kubernetes Helm Charts – a huge collection of open-source Helm charts and packages you can easily install.</p>
<ol>
<li><p>Go to <a target="_blank" href="https://artifacthub.io">https://artifacthub.io</a></p>
</li>
<li><p>In the search bar, type <strong>CockroachDB</strong></p>
</li>
<li><p>Click the <strong>CockroachDB Helm chart</strong> result (you’ll see it published by <em>Cockroach Labs</em>).</p>
</li>
</ol>
<p>You’ll see something like this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848079912/1778bbcf-088a-4919-80bb-ca24241ffa85.png" alt="The official CockroachDB Helm chart" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-2-explore-the-helm-chart">Step 2: Explore the Helm Chart</h3>
<p>You’ll notice a lot of information on the page:</p>
<ul>
<li><p><strong>README</strong> – the documentation for installing and customizing CockroachDB</p>
</li>
<li><p><strong>Default Values</strong> – all the settings that define how the database runs</p>
</li>
</ul>
<p>Don’t worry if it looks overwhelming. We’ll walk through it together 😉</p>
<h3 id="heading-step-3-copy-the-default-values">Step 3: Copy the Default Values</h3>
<p>Every Helm chart has a <em>default configuration</em> file. These defaults are usually too advanced or too heavy for local setups, so we’ll create our own lighter version. But first, let’s copy the original for reference.</p>
<ol>
<li><p>On the CockroachDB chart page, click the <strong>Default Values</strong> button.</p>
</li>
<li><p>A modal window will pop up showing a long YAML file.</p>
</li>
<li><p>Click the <strong>Copy</strong> icon in the top-right corner to copy all the default values.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848210119/17cd734b-6d7c-40dc-a8c3-f01c85edd7a7.png" alt="The Default Values button description" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848520060/1e1ce249-0cf0-46cb-abbc-00efb3ea1343.png" alt="Copy the default values" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-4-create-a-folder-for-our-project">Step 4: Create a Folder for Our Project</h3>
<p>We’ll keep everything organized in a single folder.</p>
<pre><code class="lang-bash">mkdir cockroachdb-tutorial
<span class="hljs-built_in">cd</span> cockroachdb-tutorial
</code></pre>
<p>Inside this folder, create a new file called:</p>
<pre><code class="lang-bash">nano cockroachdb-original-values.yml
</code></pre>
<p>Now paste all the default values you copied earlier (in most terminals: right-click, or <code>Ctrl+Shift+V</code>), then save and exit (<code>Ctrl+O</code>, <code>Enter</code> to confirm, then <code>Ctrl+X</code> in nano).</p>
<p>If you’re on Windows, just open Notepad/VSCode, paste the content, and save the file in the same folder.</p>
<h3 id="heading-step-5-understanding-the-key-configurations">Step 5: Understanding the Key Configurations</h3>
<p>Let’s break down a few important values you’ll notice in the file.</p>
<h4 id="heading-statefulsetreplicas">🧩 <code>statefulset.replicas</code></h4>
<p>This tells CockroachDB how many database nodes (replicas) to run in the cluster. By default, it’s set to 3, meaning you’ll have 3 independent database instances that can all read and write data.</p>
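<p>Why 3? CockroachDB replicates data using the Raft consensus protocol, which needs a <em>majority</em> of replicas to stay available. A quick back-of-the-envelope sketch (plain Python, just for illustration):</p>

```python
# Raft-style consensus needs a majority of replicas alive.
# With n replicas, the cluster survives floor((n - 1) / 2) failed nodes.
def failures_tolerated(replicas: int) -> int:
    return (replicas - 1) // 2

for n in (1, 3, 5):
    print(n, "replicas ->", failures_tolerated(n), "node failure(s) tolerated")
```

<p>So 3 replicas is the smallest cluster that can lose a node and keep serving reads and writes, which is why it’s the chart’s default.</p>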
<h4 id="heading-statefulsetresourcesrequests-and-statefulsetresourceslimits">⚙️ <code>statefulset.resources.requests</code> and <code>statefulset.resources.limits</code></h4>
<p>These settings tell Kubernetes how much CPU and memory to give CockroachDB.</p>
<ul>
<li><p><code>requests</code>: the minimum guaranteed amount</p>
</li>
<li><p><code>limits</code>: the maximum allowed amount</p>
</li>
</ul>
<p>CockroachDB can be a bit greedy with memory 😅, so limits make sure it doesn’t take everything and leave no room for other apps.</p>
<h4 id="heading-storagepersistentvolumesize">💾 <code>storage.persistentVolume.size</code></h4>
<p>This defines how much disk space each CockroachDB node gets. For example, if you set it to <code>10Gi</code> and you have 3 replicas, total usage = <code>30Gi</code>.</p>
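<p>The arithmetic above is simply the per-node size multiplied by the replica count. A tiny sketch (illustrative only, and it only handles <code>Gi</code> sizes):</p>

```python
def total_storage_gi(per_node: str, replicas: int) -> int:
    """Multiply a per-node volume size like '10Gi' by the replica count."""
    assert per_node.endswith("Gi"), "this sketch only handles Gi sizes"
    return int(per_node[:-2]) * replicas

print(total_storage_gi("10Gi", 3))  # -> 30
print(total_storage_gi("5Gi", 3))   # -> 15
```

<p>Keep this in mind when sizing your local disk: the value you set is <em>per node</em>, not for the whole cluster.</p>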
<h4 id="heading-storagepersistentvolumestorageclass">💽 <code>storage.persistentVolume.storageClass</code></h4>
<p>This defines the type of disk to use. The available class names depend on your cluster provider; on Google Cloud (GKE), for example:</p>
<ul>
<li><p><code>standard</code>: HDD (cheap but slow)</p>
</li>
<li><p><code>standard-rwo</code>: SSD (faster and affordable)</p>
</li>
<li><p><code>pd-ssd</code> or <code>fast-ssd</code>: NVMe (super fast but pricey)</p>
</li>
</ul>
<p>You can check available storage classes in your Minikube cluster using:</p>
<pre><code class="lang-bash">kubectl get sc
</code></pre>
<p>On Minikube, the default storage class is usually <code>standard</code>.</p>
<p>You can learn more about <a target="_blank" href="https://cloud.google.com/kubernetes-engine/docs/concepts/storage-overview">Google Cloud storage classes here</a>.</p>
<h4 id="heading-tlsenabled">🔐 <code>tls.enabled</code></h4>
<p>This controls whether CockroachDB requires <strong>TLS certificates</strong> for secure connections.</p>
<p>If <code>true</code>, you’ll need to generate certificates for any app or client that connects to your cluster (instead of using a username and password). This is <strong>strongly recommended for production</strong>, but for our local Minikube setup, we’ll disable it so it’s easier to play around and test connections.</p>
<h3 id="heading-step-6-create-a-simplified-values-config-for-the-cockroachdb-helm-chart">Step 6: Create a Simplified Values Config for the CockroachDB Helm Chart</h3>
<p>We’ll now create a new config file with lighter resource settings for our local test environment.</p>
<p>In the same folder, create:</p>
<pre><code class="lang-bash">nano cockroachdb-values.yml
</code></pre>
<p>Then paste this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">podSecurityContext:</span>
    <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span> <span class="hljs-comment"># You should have 3GB+ of RAM free on your device; else, you can reduce this to 500Mi (this will result in your PC needing just 1.5 GB of RAM free)</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>  <span class="hljs-comment"># The same with this, you can reduce it to 500m CPU if you don't have up to 3 CPU cores (1 CPU core * 3 replicas)</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">podAntiAffinity:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
  <span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span> <span class="hljs-comment"># Make sure you have 15GB+ of free storage on your local machine, if not, you can reduce it to 2 - 3 Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Setting the <code>requests</code> and <code>limits</code> to the same value gives the pods the <em>Guaranteed</em> QoS class, which makes Kubernetes far less likely to evict CockroachDB pods when the node comes under memory or CPU pressure.</p>
<p>You can <a target="_blank" href="https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/">read more about this here</a>.</p>
<h3 id="heading-overview-of-the-yaml-values">Overview of the YAML values</h3>
<p>Now, let’s walk through the contents of the <code>cockroachdb-values.yml</code> file together.</p>
<p><code>podSecurityContext</code> – why you needed it on Minikube:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">podSecurityContext:</span>
  <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
</code></pre>
<p>This block sets the Linux user and group IDs that the CockroachDB process runs as inside the container, and the group ownership for mounted files.</p>
<p>Why this matters, simply:</p>
<ul>
<li><p>The CockroachDB process runs as <strong>UID 1000</strong> inside the container. If the disk mount (the persistent volume) is owned by a different UID, Cockroach can’t create files there and fails with <code>permission denied</code>.</p>
</li>
<li><p><code>runAsUser</code> and <code>runAsGroup</code> make the container process run as UID/GID 1000.</p>
</li>
<li><p><code>fsGroup</code> sets the group ownership of the mounted volume, so the process can write to <code>/cockroach/cockroach-data</code>.</p>
</li>
</ul>
<p>In short, these lines make sure the DB process has permission to create and write files on the mounted disk (volume), which is especially important on Minikube and other local setups where host-mounted storage can have odd permissions.</p>
<p><code>podAntiAffinity</code> and <code>nodeSelector</code> – what they do:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">podAntiAffinity:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>

<span class="hljs-attr">nodeSelector:</span>
  <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
</code></pre>
<p><code>podAntiAffinity</code> is enabled by default in this chart. Normally it tells Kubernetes to <em>spread</em> pods across different nodes (VMs), so replicas don’t all land on the same physical machine. This is good for high availability, because one node failing won’t take down multiple replicas.</p>
<p>By setting <code>type: ""</code> (empty), you <strong>disabled</strong> that spreading rule, so Kubernetes can place multiple CockroachDB replicas on the same node.</p>
<p><code>nodeSelector</code> tells Kubernetes to schedule pods only on nodes that match the label you set (here <code>kubernetes.io/hostname: minikube</code>). That forces all pods to run on the node named <code>minikube</code>.</p>
<p>Quick summary of the effect:</p>
<ul>
<li><p>Good for local testing on a multi-node Minikube cluster, when only one node has properly mounted writable storage.</p>
</li>
<li><p><strong>Not recommended for production</strong>, because it places all replicas on the same machine (single point of failure).</p>
</li>
</ul>
<p>PS: If you’re using another Kubernetes cluster provider, for example K3s or Kind, the pods may never get scheduled (they’ll sit in <code>Pending</code>) because no node carries the <code>kubernetes.io/hostname: minikube</code> label the <code>nodeSelector</code> targets. In that case, I'd advise removing the <code>nodeSelector</code> property entirely:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
<span class="hljs-string">...</span>
</code></pre>
<p>✅ <strong>At this point</strong>, we’ve:</p>
<ul>
<li><p>Copied the default CockroachDB Helm chart configuration</p>
</li>
<li><p>Created a lightweight version for Minikube</p>
</li>
<li><p>Learned what each key property means</p>
</li>
</ul>
<h3 id="heading-step-7-install-the-cockroachdb-cluster-using-helm">🚀 Step 7: Install the CockroachDB Cluster Using Helm</h3>
<p>Great job so far! You’ve created your <code>cockroachdb-values.yml</code> file and set up your custom configuration for Minikube. Now we’ll actually deploy the cluster.</p>
<p><strong>What we’re going to do:</strong><br>We’ll use Helm to install the official CockroachDB Helm chart using our custom values. This will spin up your 3-node cluster locally so you can play with it.</p>
<p><strong>Command to run:</strong></p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Here:</p>
<ul>
<li><p><code>crdb</code> is the name we’re giving this release (you can pick something else if you like).</p>
</li>
<li><p><code>cockroachdb/cockroachdb</code> tells Helm which chart to use.</p>
</li>
<li><p><code>-f cockroachdb-values.yml</code> tells Helm to use our custom file instead of default values.</p>
</li>
</ul>
<h4 id="heading-after-the-command-runs">After the command runs:</h4>
<p>After a short while, the command completes and you’ll see output listing the resources that were created (pods, services, persistent volume claims, and so on).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761386160496/babc3e67-1ea9-4aa1-b6a7-516fe3a9972a.png" alt="The CockroachDB Helm Chart post-installation message" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now to check if everything is working, do this:</p>
<pre><code class="lang-bash">kubectl get pods | grep -i crdb
</code></pre>
<p>This filters pods with “crdb” in the name (our release prefix).</p>
<p>You should see something like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761386195190/21469ce5-c909-4336-ba5f-a4c4a776a470.png" alt="The CockroachDB replicas running successfully" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The three primary pods (<code>0</code>, <code>1</code>, <code>2</code>) should be in <code>Running</code> state. The <code>init</code> job or pod (<code>crdb-cockroachdb-init-xxx</code>) should show <code>Completed</code>. This means the initialization tasks (cluster bootstrap) succeeded.</p>
<p>If you see that, congratulations! You’ve got your local CockroachDB cluster up and running! 🎉</p>
<h2 id="heading-accessing-the-cockroachdb-console-amp-viewing-metrics">Accessing the CockroachDB Console &amp; Viewing Metrics</h2>
<p>Alright! Now that our CockroachDB cluster is up and running, let’s take a peek behind the scenes and explore the CockroachDB Admin Console. It’s a beautiful web dashboard that helps us visualize everything happening in our database cluster.</p>
<p>In this section, we’ll learn how to:</p>
<ul>
<li><p>Access the CockroachDB admin console right from your browser 🖥️</p>
</li>
<li><p>Understand what each built-in dashboard shows (CPU, memory, disk, SQL performance)</p>
</li>
<li><p>Confirm that our cluster is healthy and that all 3 nodes are working together perfectly</p>
</li>
</ul>
<h3 id="heading-step-1-locate-the-cockroachdb-public-service">Step 1: Locate the CockroachDB Public Service</h3>
<p>CockroachDB automatically creates a <strong>public service</strong> that allows us to connect to the database and also access its dashboard.</p>
<p>Let’s check it out by running:</p>
<pre><code class="lang-bash">kubectl get svc | grep -i crdb
</code></pre>
<p>You should see a line similar to:</p>
<pre><code class="lang-bash">crdb-cockroachdb-public   ClusterIP   10.x.x.x   &lt;none&gt;   26257/TCP,8080/TCP   ...
</code></pre>
<p>This service (<code>crdb-cockroachdb-public</code>) is what we’ll use to connect to both:</p>
<ul>
<li><p>The <strong>database</strong> itself (via port 26257)</p>
</li>
<li><p>The <strong>dashboard UI</strong> (via port 8080)</p>
</li>
</ul>
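<p>Since we disabled TLS in our values file, clients connect with an ordinary PostgreSQL-style connection string once port 26257 is reachable (for example, after a <code>kubectl port-forward</code> to it). Here’s a hypothetical helper for building one; the <code>root</code> user and <code>defaultdb</code> database are CockroachDB defaults, and everything else is an assumption about our local setup:</p>

```python
def crdb_url(host: str = "localhost", port: int = 26257,
             user: str = "root", db: str = "defaultdb",
             secure: bool = False) -> str:
    # With tls.enabled: false in the Helm values, the driver must be
    # told to skip TLS entirely via sslmode=disable.
    sslmode = "verify-full" if secure else "disable"
    return f"postgresql://{user}@{host}:{port}/{db}?sslmode={sslmode}"

print(crdb_url())
# -> postgresql://root@localhost:26257/defaultdb?sslmode=disable
```

<p>Any standard PostgreSQL driver should accept a URL in this shape, since CockroachDB speaks the PostgreSQL wire protocol.</p>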
<h3 id="heading-step-2-learn-more-about-the-service">Step 2: Learn More About the Service</h3>
<p>Let’s dig a little deeper to understand it:</p>
<pre><code class="lang-bash">kubectl describe svc crdb-cockroachdb-public
</code></pre>
<p>Here’s what you’ll notice:</p>
<ul>
<li><p><strong>Port 26257</strong> is the <strong>SQL/gRPC port</strong>: applications connect here to send and receive SQL queries (using the PostgreSQL wire protocol), and nodes use it for internal gRPC traffic.</p>
</li>
<li><p><strong>Port 8080</strong> is used for the <strong>web dashboard</strong>, where we can view metrics and monitor performance.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387757614/dab8cfd0-2d89-45b0-a54f-41e530f1a6ab.png" alt="Description of the crdb-cockroachdb-public service" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-3-access-the-cockroachdb-dashboard">Step 3: Access the CockroachDB Dashboard</h3>
<p>Now, let’s make the dashboard available on your local computer. Run this command:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 8080:8080
</code></pre>
<p>This command simply tells Kubernetes:</p>
<blockquote>
<p>“Hey, please open a tunnel from my local computer’s port 8080 to the CockroachDB service’s port 8080 in the cluster.”</p>
</blockquote>
<p>Once you see something like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387838362/186ff222-c643-4e67-b0a4-dbaff8777977.png" alt="Result of port-forwarding the crdb-cockroachdb-public service on port 8080" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>...you’re good to go!</p>
<h3 id="heading-step-4-visit-the-dashboard">Step 4: Visit the Dashboard</h3>
<p>Now, open your browser and go to http://localhost:8080.</p>
<p>You’ll see the CockroachDB Admin Console. This is your central command center for monitoring your cluster.</p>
<p>Here, you’ll be able to view:</p>
<ul>
<li><p><strong>Number of replicas (nodes)</strong>: You should see 3 in our setup.</p>
</li>
<li><p><strong>RAM usage</strong> per node: Helps track how much memory each CockroachDB instance is using.</p>
</li>
<li><p><strong>CPU usage</strong>: Useful to know when your database is getting busy.</p>
</li>
<li><p><strong>Disk space</strong>: Shows how much data your cluster is storing and how much free space remains.</p>
</li>
</ul>
<p>Here’s what your dashboard might look like 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387968743/327288e5-4811-42bf-8fd8-74ed187792a4.png" alt="The CockroachDB dashboard UI on http://localhost:8080" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-5-exploring-the-metrics-dashboard">Step 5: Exploring the Metrics Dashboard</h3>
<p>Now that you’re inside the CockroachDB Admin Console (<a target="_blank" href="http://localhost:8080">http://localhost:8080</a>), let’s take things a step further by exploring the <strong>Metrics</strong> section. This is where CockroachDB really shines.</p>
<p>On the left-hand side, click on “Metrics.” Here, you’ll find a collection of dashboards showing how your database is performing behind the scenes, things like query activity, performance, memory use, and much more.</p>
<p>These metrics help you understand what’s happening inside your cluster and make data-driven decisions – like when to scale up, optimize queries, or add more nodes.</p>
<p>We’ll start by focusing on some of the most insightful ones, such as:</p>
<ul>
<li><p><strong>SQL Queries Per Second</strong> – how busy your database is</p>
</li>
<li><p><strong>Service Latency (SQL Statements, 99th percentile)</strong> – how fast or slow your queries are</p>
</li>
</ul>
<p>Then, we’ll also look at others like SQL Contention, Replicas per Node, and Capacity to get a complete view of your CockroachDB cluster’s health.</p>
<p>Here’s what each of these metrics means in simple, everyday terms 👇🏾</p>
<h4 id="heading-sql-queries-per-second">SQL Queries Per Second</h4>
<p>This metric shows the number of SQL commands (like <code>SELECT</code>, <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>) your database cluster is handling every second. In simpler words, it’s how busy your database is. Imagine cars passing through a toll booth – this is the count of cars per second.</p>
<p>This is useful to know because if this number is steadily climbing, your system is handling more traffic or work, and you may need to scale up (more nodes, more resources) or optimize queries. If it drops suddenly, something might be wrong (an application outage, a connectivity issue, and so on).</p>
<p>Look for a stable or expected value for your workload. Spikes or sustained high values mean you should check performance.</p>
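<p>To make “queries per second” concrete, here’s a toy calculation (nothing CockroachDB-specific): count the query events observed in a window and divide by the window length:</p>

```python
def queries_per_second(query_timestamps: list[float], window_seconds: float) -> float:
    """Average QPS over a measurement window, given one timestamp per query."""
    return len(query_timestamps) / window_seconds

# 120 queries observed over a 10-second window:
timestamps = [i * 10 / 120 for i in range(120)]
print(queries_per_second(timestamps, 10.0))  # -> 12.0
```

<p>The dashboard does essentially this for you, continuously, over short sliding windows.</p>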
<h4 id="heading-service-latency-sql-statements-99th-percentile">Service Latency: SQL Statements, 99th percentile</h4>
<p>This metric shows the time it takes, for the slowest ~1% of queries, from when the database receives a request until it finishes executing it. Think of waiting in a queue: the 99th percentile is what the slowest people (1 in 100) experienced.</p>
<p>You’ll want to know this because if the slowest queries are taking too long, it might signal a bottleneck (CPU, disk, network, and so on). Low latency = good user experience.</p>
<p>So keep an eye out: if this value rises (gets worse) over time, investigate what’s slowing down. If it stays low and stable, you’re in good shape.</p>
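<p>“99th percentile” just means the value that 99% of samples fall at or below. A minimal nearest-rank sketch (this is one of several common percentile definitions, not necessarily the exact one CockroachDB uses internally):</p>

```python
import math

def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile: the value 99% of samples fall at or below."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.99 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# 98 fast queries at 5 ms, plus two slow outliers:
samples = [5.0] * 98 + [200.0, 250.0]
print(p99(samples))  # -> 200.0
```

<p>Notice how a handful of slow queries dominate the p99 value even when the vast majority are fast; that’s exactly why dashboards track it.</p>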
<h4 id="heading-sql-statement-contention">SQL Statement Contention</h4>
<p>Statement contention shows the number of SQL queries that got “stuck” or had to wait because other queries were using the same data or resources. It’s like two people trying to grab the same book – one has to wait. That waiting is contention.</p>
<p>High contention means your queries are competing for the same rows, waiting on locks or resources, which slows everything down. So you’ll want to keep this number as low as possible. If it starts rising, you might need to revisit your schema, your queries, or how you scale.</p>
<h4 id="heading-replicas-per-node">Replicas per Node</h4>
<p>This tells you how many copies (“replicas”) of data ranges live on each database node. If you imagine your data is like documents saved in several safes (nodes), this shows how many copies are in each safe.</p>
<p>This matters, because you want balanced replicas so no node is overloaded with too many copies (which can slow it down or put it at risk).</p>
<p>To check on this, make sure nodes have roughly equal replica counts. If one node has many more replicas, you might need to rebalance or add nodes.</p>
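<p>A toy balance check, assuming you’ve read the per-node replica counts off the dashboard (the node names and the tolerance of 5 here are made up for illustration):</p>

```python
def is_balanced(replicas_per_node: dict[str, int], tolerance: int = 5) -> bool:
    """Roughly balanced if the busiest and idlest nodes differ by <= tolerance ranges."""
    counts = replicas_per_node.values()
    return max(counts) - min(counts) <= tolerance

healthy = {"node1": 38, "node2": 40, "node3": 39}
skewed  = {"node1": 70, "node2": 25, "node3": 24}
print(is_balanced(healthy))  # -> True
print(is_balanced(skewed))   # -> False
```

<p>In practice CockroachDB rebalances ranges automatically, but a skew like the second example is worth investigating.</p>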
<h4 id="heading-capacity">Capacity</h4>
<p>Capacity shows how much disk/storage your cluster has (total), how much is used, and how much is free. Imagine a warehouse: it’s like how many boxes you can store, how many you’ve filled, and how much empty space remains.</p>
<p>You’ll need to know this because if capacity is nearly full, you risk running out of space, which can cause downtime or performance issues.</p>
<p>Keep usage below a healthy threshold (~80% used is a common rule of thumb). If you cross it, plan to add storage or nodes.</p>
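<p>That rule of thumb is easy to turn into a check. A quick sketch (the ~80% threshold comes from the guideline above, not from CockroachDB itself):</p>

```python
def capacity_warning(used_gib: float, total_gib: float, threshold: float = 0.80) -> bool:
    """True when disk usage crosses the ~80% rule-of-thumb threshold."""
    return used_gib / total_gib > threshold

print(capacity_warning(12.5, 15.0))  # ~83% used -> True
print(capacity_warning(6.0, 15.0))   # 40% used -> False
```

<p>With our values file (3 replicas × 5Gi), total capacity is 15Gi, so a warning around 12Gi used would give you time to react.</p>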
<h4 id="heading-why-these-matter-together">Why These Matter Together</h4>
<p>When you combine these metrics, you get a clear picture:</p>
<ul>
<li><p>High Queries Per Second + high latency = maybe you're under-powered.</p>
</li>
<li><p>High contention = your workload design might be fighting itself.</p>
</li>
<li><p>Imbalanced replicas or full capacity = infrastructure issues.</p>
</li>
<li><p>Stable low latency + balanced replicas + plenty of capacity = sounds like a healthy cluster.</p>
</li>
</ul>
<p>So by keeping an eye on these, you make data-driven decisions: when to scale, when to optimize, when to tweak configs.</p>
<h3 id="heading-step-6-creating-a-little-load-on-the-cockroachdb-cluster">Step 6: Creating a Little Load on the CockroachDB Cluster</h3>
<p>So far, we’ve explored the CockroachDB dashboard and understood what each metric means. Now, let’s make things a bit more fun. 🎉</p>
<p>In this part, we’ll run a simple Python app that connects to our CockroachDB cluster and performs a few database operations (creating, updating, deleting, and retrieving some records). This will help us generate a small load on the database so we can actually see the metrics in action.</p>
<p>Here’s what we’ll be doing step-by-step 👇🏾</p>
<h4 id="heading-step-61-create-a-configmap-for-our-books-data">Step 6.1: Create a ConfigMap for Our Books Data</h4>
<p>We’ll first create a list of 20 books that our Python script will interact with. Each book will have basic info like name, author, genre, pages, and price.</p>
<ol>
<li><p>Create a new file called <code>books.json</code></p>
<ul>
<li><p>On Linux:</p>
<pre><code class="lang-bash">  nano books.json
</code></pre>
<p>  Paste the below JSON content into it.</p>
<pre><code class="lang-json">  [
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Bright Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Ava Hart"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783218196000"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2020</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">234</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">10.99</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Library"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Liam Stone"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783863794026"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1993</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">358</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">30.2</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Shadow Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Maya Chen"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781615594078"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2001</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">404</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"History"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">16.21</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Bright Voyage"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Noah Rivers"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785931034133"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1987</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">507</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">13.14</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Shadow Garden"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Zara Malik"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785534192834"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2004</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">404</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Sci-Fi"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">28.13</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Ethan Brooks"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785030564135"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2009</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">508</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">20.79</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Atomic Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Iris Park"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9787242388493"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2025</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">442</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">18.5</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The First Library"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Caleb Nguyen"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9787101226911"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2017</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">528</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">24.47</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal River"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Sofia Diaz"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781845146276"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2004</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">599</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">31.15</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Jude Bennett"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784893252883"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1996</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">632</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">40.47</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Last Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Nina Volkova"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784303911713"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2018</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">451</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"History"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">29.53</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Garden"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Omar Haddad"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784896383461"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1988</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">251</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Thriller"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">36.38</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Silent Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Priya Kapoor"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781509839308"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2008</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">649</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">28.05</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Felix Romero"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781834738291"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2025</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">180</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">19.15</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Lost Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Tara Quinn"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781165667017"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2010</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">368</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">41.37</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Last Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Hana Sato"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783387262476"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2005</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">467</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">42.01</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Leo Fischer"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9780801326776"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1984</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">573</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">42.31</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Mila Novak"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784746872343"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2005</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">180</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">16.58</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Arthur Wells"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9780097882086"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1983</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">713</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">39.42</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Silent Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Selene Ortiz"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781939909169"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1991</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">190</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">33.79</span>
    }
  ]
</code></pre>
<p>  To save and close the file in nano:</p>
<ul>
<li><p>Press <code>CTRL + O</code> → then <code>ENTER</code> (to save)</p>
</li>
<li><p>Press <code>CTRL + X</code> (to exit the editor)</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Then create a ConfigMap from the file:</p>
<pre><code class="lang-bash"> kubectl create configmap books-json --from-file=books.json
</code></pre>
</li>
</ol>
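<p>Before baking the file into a ConfigMap, it can be worth sanity-checking that it's valid JSON — a malformed file will only surface as an error later, inside the job. A minimal stdlib-only sketch (the helper name and usage path are our own, not part of the tutorial's manifests):</p>

```python
import json

def validate_books(text: str) -> int:
    """Return the number of book records if `text` is a well-formed JSON array."""
    books = json.loads(text)  # raises json.JSONDecodeError on malformed input
    if not isinstance(books, list):
        raise ValueError("expected a top-level JSON array of books")
    return len(books)

# Usage (path is an assumption — point it at wherever you saved books.json):
# with open("books.json") as f:
#     print(validate_books(f.read()))
```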
<h4 id="heading-step-62-create-the-python-script-configmap">Step 6.2: Create the Python Script ConfigMap</h4>
<p>Next, we’ll create a simple Python script that:</p>
<ul>
<li><p>Creates a new table for books</p>
</li>
<li><p>Inserts 20 records</p>
</li>
<li><p>Updates 7 of them</p>
</li>
<li><p>Deletes 5</p>
</li>
<li><p>Retrieves 15 books from the database</p>
</li>
</ul>
<p>It’s like simulating a small library app. 📚</p>
<p>Create a new file called <code>books-script.yml</code> and paste the content below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-script</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">run.py:</span> <span class="hljs-string">|
    #!/usr/bin/env python3
    import argparse
    import json
    import os
    import sys
    import time
    from typing import List, Dict
</span>
    <span class="hljs-string">import</span> <span class="hljs-string">psycopg</span>
    <span class="hljs-string">from</span> <span class="hljs-string">psycopg.rows</span> <span class="hljs-string">import</span> <span class="hljs-string">dict_row</span>

    <span class="hljs-string">DDL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    CREATE TABLE IF NOT EXISTS books (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name STRING NOT NULL,
        author STRING NOT NULL,
        isbn STRING UNIQUE,
        published_year INT4,
        pages INT4,
        genre STRING,
        price DECIMAL(10,2),
        created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">INSERT_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    INSERT INTO books (name, author, isbn, published_year, pages, genre, price)
    VALUES (%s, %s, %s, %s, %s, %s, %s);
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">UPDATE_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    UPDATE books
    SET price = %s, pages = %s
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">DELETE_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    DELETE FROM books
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">GET_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    SELECT id, name, author, isbn, published_year, pages, genre, price, created_at
    FROM books
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">def</span> <span class="hljs-string">load_books(path:</span> <span class="hljs-string">str)</span> <span class="hljs-string">-&gt;</span> <span class="hljs-string">List[Dict]:</span>
        <span class="hljs-string">with</span> <span class="hljs-string">open(path,</span> <span class="hljs-string">"r"</span><span class="hljs-string">)</span> <span class="hljs-attr">as f:</span>
            <span class="hljs-string">return</span> <span class="hljs-string">json.load(f)</span>

    <span class="hljs-string">def</span> <span class="hljs-string">connect_with_retry(dsn:</span> <span class="hljs-string">str,</span> <span class="hljs-attr">attempts:</span> <span class="hljs-string">int</span> <span class="hljs-string">=</span> <span class="hljs-number">30</span><span class="hljs-string">,</span> <span class="hljs-attr">delay:</span> <span class="hljs-string">float</span> <span class="hljs-string">=</span> <span class="hljs-number">2.0</span><span class="hljs-string">):</span>
        <span class="hljs-string">last_exc</span> <span class="hljs-string">=</span> <span class="hljs-string">None</span>
        <span class="hljs-string">for</span> <span class="hljs-string">_</span> <span class="hljs-string">in</span> <span class="hljs-string">range(attempts):</span>
            <span class="hljs-attr">try:</span>
                <span class="hljs-string">conn</span> <span class="hljs-string">=</span> <span class="hljs-string">psycopg.connect(dsn,</span> <span class="hljs-string">autocommit=False)</span>
                <span class="hljs-string">return</span> <span class="hljs-string">conn</span>
            <span class="hljs-attr">except Exception as e:</span>
                <span class="hljs-string">last_exc</span> <span class="hljs-string">=</span> <span class="hljs-string">e</span>
                <span class="hljs-string">time.sleep(delay)</span>
        <span class="hljs-string">raise</span> <span class="hljs-string">last_exc</span>

    <span class="hljs-string">def</span> <span class="hljs-string">main():</span>
        <span class="hljs-string">ap</span> <span class="hljs-string">=</span> <span class="hljs-string">argparse.ArgumentParser()</span>
        <span class="hljs-string">ap.add_argument("--dsn",</span> <span class="hljs-string">required=True,</span> <span class="hljs-string">help="Postgres/CockroachDB</span> <span class="hljs-string">DSN")</span>
        <span class="hljs-string">ap.add_argument("--json",</span> <span class="hljs-string">default="/app/books.json",</span> <span class="hljs-string">help="Path</span> <span class="hljs-string">to</span> <span class="hljs-string">books</span> <span class="hljs-string">JSON")</span>
        <span class="hljs-string">args</span> <span class="hljs-string">=</span> <span class="hljs-string">ap.parse_args()</span>

        <span class="hljs-string">books</span> <span class="hljs-string">=</span> <span class="hljs-string">load_books(args.json)</span>
        <span class="hljs-string">print(f"Loaded</span> {<span class="hljs-string">len(books)</span>} <span class="hljs-string">books")</span>

        <span class="hljs-string">conn</span> <span class="hljs-string">=</span> <span class="hljs-string">connect_with_retry(args.dsn)</span>
        <span class="hljs-string">conn.row_factory</span> <span class="hljs-string">=</span> <span class="hljs-string">dict_row</span>
        <span class="hljs-attr">try:</span>
            <span class="hljs-attr">with conn:</span>
                <span class="hljs-string">with</span> <span class="hljs-string">conn.cursor()</span> <span class="hljs-attr">as cur:</span>
                    <span class="hljs-string">print("Creating</span> <span class="hljs-string">table...")</span>
                    <span class="hljs-string">cur.execute(DDL)</span>

                    <span class="hljs-string">print("Inserting</span> <span class="hljs-number">20</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:20]:</span>
                        <span class="hljs-string">cur.execute(INSERT_SQL,</span> <span class="hljs-string">(</span>
                            <span class="hljs-string">b["name"],</span> <span class="hljs-string">b["author"],</span> <span class="hljs-string">b["isbn"],</span>
                            <span class="hljs-string">b.get("published_year"),</span> <span class="hljs-string">b.get("pages"),</span>
                            <span class="hljs-string">b.get("genre"),</span> <span class="hljs-string">b.get("price"),</span>
                        <span class="hljs-string">))</span>

                    <span class="hljs-string">print("Updating</span> <span class="hljs-number">7</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:7]:</span>
                        <span class="hljs-string">new_price</span> <span class="hljs-string">=</span> <span class="hljs-string">round(float(b.get("price",</span> <span class="hljs-number">10</span><span class="hljs-string">))</span> <span class="hljs-string">+</span> <span class="hljs-number">1.23</span><span class="hljs-string">,</span> <span class="hljs-number">2</span><span class="hljs-string">)</span>
                        <span class="hljs-string">new_pages</span> <span class="hljs-string">=</span> <span class="hljs-string">int(b.get("pages",</span> <span class="hljs-number">100</span><span class="hljs-string">))</span> <span class="hljs-string">+</span> <span class="hljs-number">5</span>
                        <span class="hljs-string">cur.execute(UPDATE_SQL,</span> <span class="hljs-string">(new_price,</span> <span class="hljs-string">new_pages,</span> <span class="hljs-string">b["isbn"]))</span>

                    <span class="hljs-string">print("Deleting</span> <span class="hljs-number">5</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[-5:]:</span>
                        <span class="hljs-string">cur.execute(DELETE_SQL,</span> <span class="hljs-string">(b["isbn"],))</span>

                    <span class="hljs-string">print("Performing</span> <span class="hljs-number">15</span> <span class="hljs-string">retrievals...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:15]:</span>
                        <span class="hljs-string">cur.execute(GET_SQL,</span> <span class="hljs-string">(b["isbn"],))</span>
                        <span class="hljs-string">row</span> <span class="hljs-string">=</span> <span class="hljs-string">cur.fetchone()</span>
                        <span class="hljs-attr">if row:</span>
                            <span class="hljs-string">print(f"GET</span> {<span class="hljs-string">b</span>[<span class="hljs-string">'isbn'</span>]}<span class="hljs-string">:</span> {<span class="hljs-string">row</span>[<span class="hljs-string">'name'</span>]} <span class="hljs-string">by</span> {<span class="hljs-string">row</span>[<span class="hljs-string">'author'</span>]} <span class="hljs-string">(${row['price']})")</span>
                        <span class="hljs-attr">else:</span>
                            <span class="hljs-string">print(f"GET</span> {<span class="hljs-string">b</span>[<span class="hljs-string">'isbn'</span>]}<span class="hljs-string">:</span> <span class="hljs-string">not</span> <span class="hljs-string">found</span> <span class="hljs-string">(possibly</span> <span class="hljs-string">deleted)")</span>

            <span class="hljs-string">print("All</span> <span class="hljs-string">operations</span> <span class="hljs-string">completed.")</span>
        <span class="hljs-attr">finally:</span>
            <span class="hljs-string">conn.close()</span>

    <span class="hljs-string">if</span> <span class="hljs-string">__name__</span> <span class="hljs-string">==</span> <span class="hljs-attr">"__main__":</span>
        <span class="hljs-string">main()</span>
</code></pre>
<p>This script connects to the CockroachDB cluster, creates a table (if it doesn’t exist), and performs all those operations in sequence.</p>
<p>It runs around 50 SQL queries in total – a mix of <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>, and <code>SELECT</code> statements.</p>
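<p>The count is easy to tally from the steps the script performs:</p>

```python
# Tallying the statements the script issues (counts taken from the steps above):
inserts, updates, deletes, selects = 20, 7, 5, 15
ddl = 1  # the one CREATE TABLE IF NOT EXISTS statement
total = inserts + updates + deletes + selects + ddl
print(total)  # 48 statements — hence "around 50"
```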
<p>Now apply it:</p>
<pre><code class="lang-bash">kubectl apply -f books-script.yml
</code></pre>
<h4 id="heading-step-63-create-the-job-to-run-the-script">Step 6.3: Create the Job to Run the Script</h4>
<p>Next, let’s create a Kubernetes Job that will actually run our Python script inside a container.</p>
<p>Create a file called <code>books-job.yml</code> and paste the manifest below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Job</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-job</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">runner</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">python:3.12-slim</span>
          <span class="hljs-attr">env:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">CRDB_DSN</span>
              <span class="hljs-attr">value:</span> <span class="hljs-string">"postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable"</span>
          <span class="hljs-attr">command:</span> [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-lc"</span>]
          <span class="hljs-attr">args:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">|
              pip install --no-cache-dir "psycopg[binary]&gt;=3.1,&lt;3.3" &amp;&amp; \
              python /app/run.py --dsn "$CRDB_DSN" --json /app/books.json
</span>          <span class="hljs-attr">volumeMounts:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
              <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/app/run.py</span>
              <span class="hljs-attr">subPath:</span> <span class="hljs-string">run.py</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
              <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/app/books.json</span>
              <span class="hljs-attr">subPath:</span> <span class="hljs-string">books.json</span>
      <span class="hljs-attr">volumes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
          <span class="hljs-attr">configMap:</span>
            <span class="hljs-attr">name:</span> <span class="hljs-string">books-script</span>
            <span class="hljs-attr">defaultMode:</span> <span class="hljs-number">0555</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
          <span class="hljs-attr">configMap:</span>
            <span class="hljs-attr">name:</span> <span class="hljs-string">books-json</span>
</code></pre>
<p>Here’s what’s happening:</p>
<ul>
<li><p>The Job runs a container based on Python 3.12-slim.</p>
</li>
<li><p>It connects to CockroachDB using the connection string <code>postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable</code>. Notice the <code>sslmode=disable</code> parameter: it’s there because we disabled TLS in our Helm values earlier.</p>
</li>
<li><p>The Job mounts the two ConfigMaps we created earlier (<code>books-json</code> and <code>books-script</code>) as <strong>volumes</strong> inside the container. Think of volumes like small external drives that the container can read from.</p>
</li>
</ul>
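<p>If that DSN format is new to you, its pieces map onto standard URL components. A quick sketch using Python's stdlib to pull them apart:</p>

```python
from urllib.parse import urlparse, parse_qs

# The connection string our Job passes to the script:
dsn = "postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable"
u = urlparse(dsn)

print(u.username)          # root — the SQL user
print(u.hostname)          # crdb-cockroachdb-public — the Kubernetes Service name
print(u.port)              # 26257 — CockroachDB's SQL port
print(u.path.lstrip("/"))  # defaultdb — the target database
print(parse_qs(u.query)["sslmode"][0])  # disable — because TLS is off
```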
<p>Apply it:</p>
<pre><code class="lang-bash">kubectl apply -f books-job.yml
</code></pre>
<h4 id="heading-step-64-check-if-the-job-ran-successfully">Step 6.4: Check if the Job Ran Successfully</h4>
<p>After a minute or two, check your pods:</p>
<pre><code class="lang-bash">kubectl get po
</code></pre>
<p>If you see <code>books-job-xxx</code> with the status <strong>Completed</strong>, then your script ran successfully 🎉</p>
<p>That means our database just got a nice little workout – some records were created, updated, deleted, and read.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460118429/99ed49a3-52e9-4357-ba2b-9295f0dfbdc8.png" alt="The Completed state of the Books Job" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-7-viewing-the-metrics-from-the-load">Step 7: Viewing the Metrics from the Load</h3>
<p>Now that we’ve generated a small load, let’s jump back to the CockroachDB dashboard.</p>
<p>Head to the Metrics section, and under SQL Queries Per Second, you should see a little spike: this shows the activity from our Python job.👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460175366/6c1e129e-c8bd-4f41-89de-60a1a753026e.png" alt="The SQL Queries Per Second Metric" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Hover your mouse over the graph lines to see exact numbers.</p>
<p>Do the same for Service Latency: SQL Statements (99th percentile). You’ll notice a few bumps showing how long some of the queries took.👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460224971/8ba9d5ed-0724-4dc6-82f4-7e5d0d05be82.png" alt="The Service Latency Metric" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This small experiment gives you a real feel for how CockroachDB reacts under activity, even a tiny one.</p>
<p>To explore more metrics and dashboards, check out the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/ui-overview-dashboard">official CockroachDB documentation here</a>.</p>
<h3 id="heading-step-8-view-the-list-of-created-items-in-the-database">Step 8: View the List of Created Items in the Database</h3>
<p>Now that our Python job ran and touched the database (creating, updating, deleting, retrieving records), let’s check the content of our <code>books</code> table just to verify everything really happened.</p>
<p>First, we’ll create another Kubernetes job (or pod) that connects to our CockroachDB cluster and runs a simple SQL query <code>SELECT * FROM books;</code>. This pulls out all the remaining records in the table.</p>
<p>Here’s the manifest to use. Create a file named <code>view-books.yml</code> and paste the below content inside it:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Job</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">view-books</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">client</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.2</span>
          <span class="hljs-attr">command:</span> [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-lc"</span>]
          <span class="hljs-attr">args:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">|
              cockroach sql \
                --insecure \
                --host=crdb-cockroachdb-public:26257 \
                --database=defaultdb \
                --format=records \
                --execute="SELECT * FROM public.books;"</span>
</code></pre>
<p>Note: we pass the <code>--insecure</code> flag (the CLI equivalent of <code>sslmode=disable</code>) because we turned off TLS in our Minikube config. This job mounts nothing fancy. It just spins up, connects to the database, runs the <code>SELECT</code>, and displays the result.</p>
<p>Run the job:</p>
<pre><code class="lang-bash">kubectl apply -f view-books.yml
</code></pre>
<p>Wait a minute, then check the pod status:</p>
<pre><code class="lang-bash">kubectl get po
</code></pre>
<p>Look for something like <code>view-books-xxx</code> in the <strong>Completed</strong> state.</p>
<p>Finally, view the job logs to see the actual records:</p>
<pre><code class="lang-bash">kubectl logs job/view-books
</code></pre>
<p>You’ll see output similar to the below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761462270132/c881eca7-18b0-4647-a6b1-2841e7774969.png" alt="The list of created books in the books table in the CockroachDB database" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-backing-up-cockroachdb-to-google-cloud-storage">Backing Up CockroachDB to Google Cloud Storage ☁️</h2>
<p>In this section, we’ll explain how you can automate backups of your CockroachDB cluster using simple SQL commands, a service account (for authenticating to Google Cloud), and a Google Cloud Storage bucket (where the data will be stored).</p>
<h3 id="heading-why-backups-are-absolutely-critical">Why Backups Are Absolutely Critical</h3>
<p>Imagine you’ve built your cluster on Kubernetes, and everything’s humming along for weeks or months. You’ve got tens or hundreds of gigabytes of data and 10k+ users relying on it.</p>
<p>Then <strong>BAM!</strong> Something happens. Maybe someone accidentally overwrote the Helm release (<code>helm upgrade --install …</code> with the same release name, for example <code>crdb</code>), or a cloud disk got deleted, or a critical node failed and you lost the majority of your data replicas. That’s the nightmare we all dread 😭.</p>
<p>Mistakes happen, even if you’re super careful. What matters most is: How fast and easily could you recover?</p>
<p>That’s why we’ll set up <strong>daily backups</strong> of our CockroachDB cluster, targeting a Google Cloud Storage bucket. (Quick note: Google Cloud Storage is a service where you can store large amounts of data in the cloud as “objects”. You can store and retrieve data from it, just like Google Drive or iCloud. 😃)</p>
<p>With your backups going into a storage bucket, if disaster strikes, you can restore the entire cluster (or specific databases/tables) in minutes or hours – instead of days or losing data forever.</p>
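<p>To preview where we're headed: CockroachDB can schedule those backups itself with a single SQL statement. Here's a hedged sketch of what that statement looks like, composed in Python for clarity — the bucket name is a hypothetical placeholder, and the authentication details are covered later in this section:</p>

```python
# A sketch of CockroachDB's scheduled-backup SQL (you'd run the printed
# statement from any SQL client connected to the cluster).
bucket = "my-crdb-backups"  # assumption: replace with your own GCS bucket name
stmt = (
    f"CREATE SCHEDULE FOR BACKUP INTO 'gs://{bucket}/backups?AUTH=implicit' "
    "RECURRING '@daily';"
)
print(stmt)
```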
<h3 id="heading-connecting-to-our-db-installing-beekeeper-studio">Connecting to Our DB – Installing Beekeeper Studio</h3>
<p>So far, we’ve been connecting to our database programmatically, running commands from pods or jobs inside Kubernetes. But what if there was a <em>more visual</em> and <em>user-friendly</em> way to explore our data?</p>
<p>Well, meet my friend <strong>Beekeeper Studio.</strong> 🙂</p>
<p>Beekeeper Studio is a sleek, open-source database management tool that lets you connect to a wide range of databases like PostgreSQL, MySQL, SQLite, and (most importantly for us) CockroachDB.</p>
<p>It comes with a simple, modern interface for running queries, browsing tables, and viewing data – no need to jump into pods or remember command-line flags 😄</p>
<h3 id="heading-how-to-install-beekeeper-studio">How to Install Beekeeper Studio</h3>
<ol>
<li><p>Visit the official Beekeeper Studio download page here: <a target="_blank" href="https://www.beekeeperstudio.io/get">https://www.beekeeperstudio.io/get</a></p>
</li>
<li><p>Click the “Skip to the download” link. You’ll see something like this:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761542821015/2e7a0fd5-7047-4090-97fb-46b81a3dd638.png" alt="Finding the button to skip to the download page on the Beekeeper Studio website" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
<li><p>You’ll be redirected to a page listing download options for different operating systems.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761542877590/6034dcf0-d9b0-447b-bd2b-089458729db7.png" alt="Page to select download option according to the user OS" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
<li><p>Choose your OS and download the correct installer.</p>
</li>
<li><p>Afterwards, install the downloaded Beekeeper Studio package following the instructions for your OS.</p>
</li>
</ol>
<h3 id="heading-connecting-beekeeper-studio-to-cockroachdb">Connecting Beekeeper Studio to CockroachDB</h3>
<p>Now that we’ve installed Beekeeper Studio, it’s time to connect it to our CockroachDB cluster running inside Minikube.</p>
<p>But before we jump in, here’s something important to note:👇🏾</p>
<p>Our CockroachDB cluster is running INSIDE Kubernetes, and by default, it’s not accessible from outside the cluster.</p>
<p>To confirm this, run:</p>
<pre><code class="lang-bash">kubectl get svc crdb-cockroachdb-public
</code></pre>
<p>You should see something like this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544640270/2cf9f8f1-15f1-459b-acd0-63b1c361fa54.png" alt="The CockroachDB service being of type ClusterIP" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Notice the <strong>TYPE</strong> column showing <code>ClusterIP</code>. That means the service can only be accessed by other pods INSIDE the Minikube cluster – not from your laptop or external apps.</p>
<h3 id="heading-exposing-the-cluster-for-local-access">Exposing the Cluster for Local Access</h3>
<p>To make our database accessible from your local machine (so Beekeeper Studio can reach it), we’ll use <strong>Kubernetes Port Forwarding</strong>.</p>
<p>In a new terminal tab, run:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26257
</code></pre>
<p>This command tells Kubernetes to forward your local port 26257 to the CockroachDB service’s port 26257 inside the cluster.</p>
<p>Once it’s running, your CockroachDB instance will now be accessible from <a target="_blank" href="http://localhost:26257"><code>localhost:26257</code></a>.<br>(Note: it’s not accessible via your browser because this isn’t an HTTP endpoint 😅)</p>
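<p>Before pointing a GUI at it, you can confirm the forwarded port is actually open from your machine. A small stdlib check (the helper is our own; it assumes the port-forward from above is still running in another terminal):</p>

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With `kubectl port-forward` running, this should report True:
print(port_open("localhost", 26257))
```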
<h3 id="heading-connecting-via-beekeeper-studio">🐝 Connecting via Beekeeper Studio</h3>
<ol>
<li><p>Open Beekeeper Studio.</p>
</li>
<li><p>Click on the dropdown that says “Select a connection type…”.</p>
</li>
<li><p>Choose CockroachDB from the list.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544886889/98443b46-574d-4bcc-a41c-d2daa7412201.png" alt="Selecting CockroachDB as a connection type in Beekeeper Studio" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
<li><p>In the connection window that pops up:</p>
<ul>
<li><p>Disable the <code>Enable SSL</code> option.</p>
</li>
<li><p>Set User to <code>root</code></p>
</li>
<li><p>Set Default Database to <code>defaultdb</code></p>
</li>
<li><p>Host to <a target="_blank" href="http://localhost"><code>localhost</code></a></p>
</li>
<li><p>Port to <code>26257</code></p>
</li>
</ul>
</li>
<li><p>Now click <strong>Test</strong> (bottom right corner). You should see a success message like <em>Connection looks good</em>.</p>
</li>
</ol>
<p>Your setup should look like this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544818021/0248173e-9969-433c-a9d4-e83684bf34cf.png" alt="Connecting to the CockroachDB cluster from the Beekeeper Studio software" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Finally, click Connect (right beside the Test button).</p>
<h3 id="heading-verify-the-connection">Verify the Connection</h3>
<p>Once connected, you’ll land on a clean workspace where you can run SQL queries.</p>
<p>To confirm you’re connected to the right cluster, run:</p>
<pre><code class="lang-sql">SELECT * FROM books;
</code></pre>
<p>You should see a table containing about 15 books (the same ones we inserted earlier):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761545094817/99ef4415-bd0d-4452-817f-380996485397.png" alt="List of books in the CockroachDB database" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>And there you go. You’ve now connected Beekeeper Studio to your CockroachDB running inside Minikube! 🚀</p>
<h3 id="heading-creating-a-google-cloud-account">Creating a Google Cloud Account</h3>
<p>Before we can back up our CockroachDB data to Google Cloud Storage, we need to have a Google Cloud account ready.</p>
<h4 id="heading-step-1-visit-the-google-cloud-console">Step 1: Visit the Google Cloud Console</h4>
<p>Head over to 👉🏾 <a target="_blank" href="https://console.cloud.google.com">https://console.cloud.google.com</a></p>
<p>If you don’t have a Google account yet, don’t worry. The process is simple and self-explanatory once you visit the site :). You’ll be guided to create a Google account first, and then your Google Cloud account.</p>
<h4 id="heading-step-2-create-or-use-a-project">Step 2: Create or Use a Project</h4>
<p>Once you’re in the Google Cloud Console, you’ll either:</p>
<ul>
<li><p>Use the <strong>default project</strong> that was automatically created for you, <strong>or</strong></p>
</li>
<li><p>Create a new one by clicking on <strong>“New Project”</strong> and naming it <code>crdb-tutorial</code>.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761546797213/295c7b09-9bb8-4c34-85cf-8701242b2768.png" alt="Creating a new Project in our Google Cloud account" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Projects are like folders that contain all your Google Cloud resources: compute instances, storage buckets, databases, and more.</p>
<h4 id="heading-step-3-link-a-billing-account-optional-but-recommended">Step 3: Link a Billing Account (Optional but Recommended)</h4>
<p>If you already have a billing account, link it to your project.</p>
<p>If not, you can easily create one by <a target="_blank" href="https://docs.cloud.google.com/billing/docs/how-to/create-billing-account">following Google’s instructions here</a>. (You’ll need a valid Debit or Credit card.)</p>
<p>Don’t worry if your card doesn’t link right away. Sometimes Google’s billing system can be picky. 😅</p>
<p>Here’s a quick fix that usually works:</p>
<ol>
<li><p>Add your card to Google Pay first.</p>
</li>
<li><p>Then go to Google Subscriptions in your Google account, and link it to your Google Billing Account.</p>
</li>
</ol>
<p>To add your card via Google Subscriptions, <a target="_blank" href="https://myaccount.google.com/payments-and-subscriptions">visit here</a>. (You need to have a Google account first. Don’t worry, the site will direct you on what to do if you don’t.)</p>
<p>You’ll see a page like this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761546938934/9e983134-dd7e-49b1-85a7-cd12bd01bf67.png" alt="Adding a card to Google Subscriptions" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Click Manage payment methods, then add your card details.</p>
<p>Once you’ve done that, refresh your Google Billing Account page – you should now see your card as one of the available options.</p>
<h3 id="heading-creating-a-google-cloud-storage-bucket">Creating a Google Cloud Storage Bucket</h3>
<p>Now that we’ve set up our Google Cloud account and enabled billing, let’s create a Cloud Storage Bucket. This is simply a location (like an online folder) where our CockroachDB backup files will be stored.</p>
<p>In your Google Cloud console, type “storage” in the search bar at the top. From the dropdown results, click on “Cloud Storage”:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089121918/c737c3e1-e45f-48e1-aed9-99e273583425.png" alt="Navigating to the Cloud Storage page" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>On the new page, click on the “Buckets” link in the side menu, then click the “Create Bucket” button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089164660/8b9336fc-c0c3-4811-ab98-d3538596ee5a.png" alt="Creating a new Bucket in Cloud Storage" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Give your bucket a unique name by appending a few random characters, for example <em>cockroachdb-backup-i8wu</em> or <em>cockroachdb-backup-7gw8u</em>. The random suffix ensures your bucket name is globally unique (no other Google Cloud user will have the same name).</p>
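<p>If you’d rather not invent the random characters yourself, a quick shell snippet can generate them. (This is just an illustrative sketch – the suffix length and character set here are arbitrary; any lowercase-letters-and-digits suffix works for bucket names.)</p>

```shell
# Build a bucket name with a random 4-character lowercase suffix.
# The prefix "cockroachdb-backup-" matches the naming used in this tutorial.
SUFFIX=$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 4)
BUCKET_NAME="cockroachdb-backup-${SUFFIX}"
echo "$BUCKET_NAME"
```

<p>Copy the printed name into the bucket creation form (or keep it in your shell session for the later backup commands).</p>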
<p>Scroll to the bottom and click “Create” to create your bucket.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089287083/a376f695-81b8-4f5a-80a7-cd563c8b4c81.png" alt="Creating your Bucket in Google Cloud Storage" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You’ll see a pop-up asking you to <strong>confirm public access prevention</strong>. This means that only you (and people you explicitly give access to) can view or edit your bucket. Make sure the “Enforce public access prevention on this bucket” checkbox is checked, then click “Confirm.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089404876/38c8e6b5-0de0-4771-9bed-9334f8f8c43a.png" alt="Preventing random users from accessing your bucket" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Perfect! 🎉 You’ve now created a storage bucket where your CockroachDB backups will live.</p>
<h3 id="heading-giving-cockroachdb-access-to-the-bucket">Giving CockroachDB Access to the Bucket</h3>
<p>Our next goal is to let the CockroachDB cluster upload and read files from this bucket. To do this, we’ll create something called a <strong>Service Account</strong> using <strong>Google IAM</strong>.</p>
<p><strong>What’s IAM?</strong><br>IAM stands for <em>Identity and Access Management.</em> It’s basically Google Cloud’s way of managing who can access what in your project.</p>
<p>With IAM, we can create a service account (like a “digital employee”) and give it permission to interact with our bucket instead of using our personal Google account.</p>
<h4 id="heading-creating-a-service-account">Creating a Service Account</h4>
<p>Type “service account” in the search bar and click on “Service Accounts” in the results.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089569066/2855b7fa-d896-4249-825d-4ec590499ca8.png" alt="Navigating the Service Accounts page" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Click “Create Service Account” at the top of the page. On the new page, type <em>cockroachdb-backup</em> as the service account name, then click “Create and Continue.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089677768/05c9f9ed-257f-44c6-89b5-3880c8af017d.png" alt="Creating a new Service Account for the CockroachDB cluster, to give it access to our Cloud Storage Bucket" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now we’ll give this service account permission to work with our storage bucket. In the <em>Permissions</em> section, type “storage object creator” in the filter box and select it from the dropdown.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089744927/64ed65df-88ee-43c9-8be4-892a41a24989.png" alt="Providing our Service Account with the necessary permissions to access the bucket" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Repeat the same for “storage object viewer” and “storage object user”.</p>
<p>At the end, you should see three roles assigned:</p>
<ul>
<li><p>Storage Object Creator</p>
</li>
<li><p>Storage Object Viewer</p>
</li>
<li><p>Storage Object User</p>
</li>
</ul>
<p>Click “Continue”, then “Done.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762092953125/0419abe8-a1ff-4f1c-b367-f9e203bdf6ff.png" alt="The necessary permissions to be assigned to the Service Account" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You’ve now created a service account that can create and read files in your bucket.</p>
<h4 id="heading-downloading-the-service-account-key">Downloading the Service Account Key</h4>
<p>To let our CockroachDB cluster use this service account, we’ll generate a <strong>key file</strong>.</p>
<p><strong>What’s a key file?</strong><br>It’s just a small <strong>JSON file</strong> containing secret information your app (CockroachDB) can use to authenticate securely with Google Cloud – like an ID card.</p>
<p><strong>But be careful ⚠️</strong> If this key gets into the wrong hands, anyone could use it to access your Google Cloud resources. <strong>Never share or upload this file</strong> to your GitHub, BitBucket, or GitLab repository, or any other online repositories.</p>
<p>On the Service Accounts page, find your <code>cockroachdb-backup</code> account, click the three dots (⋮) under the Actions column, then select “Manage Keys.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090008411/11c4b373-87b0-416d-bf14-1a9ccd15c452.png" alt="Finding the newly created service account, and creating a key" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>On the new page, click “Add Key” then “Create new key.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090059309/ebe17228-e2a8-4abe-b41b-7378013570d5.png" alt="Creating a new key for the new service account" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>A dialog box will pop up. Choose JSON as the key type, and click “Create.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090115728/5ed82664-f57a-4489-af08-be85c2ad42e9.png" alt="Selecting the Key Type as JSON" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Google will automatically download a file named something like <code>cockroachdb-backup-1234567890abcdef.json</code>.</p>
<p>We’ll use this key soon when we configure our CockroachDB backup job.</p>
<h3 id="heading-attaching-the-key-to-our-cockroachdb-cluster">Attaching the Key to Our CockroachDB Cluster</h3>
<p>Now that we’ve downloaded the service account key, we need to attach it to our CockroachDB cluster so that the DB can upload and read backups from our Google Cloud Storage bucket.</p>
<p><strong>Why this is needed:</strong><br>Our Minikube cluster (and even any managed Kubernetes cluster like GKE, EKS, or AKS) <strong>doesn’t have direct access</strong> to the files on your computer. So, we’ll upload the key file to Kubernetes as a Secret, and then mount it inside our CockroachDB pods as a volume.</p>
<h4 id="heading-step-1-create-a-kubernetes-secret">Step 1: Create a Kubernetes Secret</h4>
<p>Run the command below in your terminal👇🏾 Replace <code>&lt;PATH_TO_KEY&gt;</code> with the path to your downloaded key file:</p>
<pre><code class="lang-bash">kubectl create secret generic gcs-key --from-file=key.json=&lt;PATH_TO_KEY&gt;
</code></pre>
<p>This command creates a <strong>Kubernetes Secret</strong> named <code>gcs-key</code> that securely stores your Google Cloud key.</p>
<h4 id="heading-step-2-mount-the-secret-to-the-cockroachdb-cluster">Step 2: Mount the Secret to the CockroachDB Cluster</h4>
<p>Now, let’s tell Kubernetes to use this secret inside our CockroachDB cluster.</p>
<p>Open your <code>cockroachdb-values.yml</code> file and scroll to the <code>statefulset:</code> section. Add the following lines under it:👇🏾</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-string">...</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>

  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>

  <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Here’s what this does:</p>
<ul>
<li><p>The <code>volumes</code> section tells Kubernetes to create a volume from the secret we just made.</p>
</li>
<li><p>The <code>volumeMounts</code> section attaches that volume inside the CockroachDB container.</p>
</li>
<li><p>The <code>GOOGLE_APPLICATION_CREDENTIALS</code> environment variable points CockroachDB to our key file so it knows where to find it when connecting to Google Cloud.</p>
</li>
</ul>
<p>Your final file should look like this:👇🏾</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">podSecurityContext:</span>
    <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">podAntiAffinity:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
  <span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>
  <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Now, apply the update using Helm:👇🏾</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<h4 id="heading-step-3-confirm-the-key-exists-in-the-cluster">Step 3: Confirm the Key Exists in the Cluster</h4>
<p>Once the upgrade is complete, run this command to confirm the key is now inside your CockroachDB pods:</p>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> -it crdb-cockroachdb-1 -- cat /var/run/gcp/key.json
</code></pre>
<p>You should see something similar to this:👇🏾</p>
<pre><code class="lang-bash">prince@DESKTOP-QHVTAUD:~/programming/cockroachdb-tutorial$ kubectl <span class="hljs-built_in">exec</span> -it crdb-cockroachdb-1 -- cat /var/run/gcp/key.json
{
  <span class="hljs-string">"type"</span>: <span class="hljs-string">"service_account"</span>,
  <span class="hljs-string">"project_id"</span>: ***,
  <span class="hljs-string">"private_key_id"</span>: ***,
  <span class="hljs-string">"private_key"</span>: ***,
  <span class="hljs-string">"client_email"</span>: ***,
  <span class="hljs-string">"client_id"</span>: ***,
  <span class="hljs-string">"auth_uri"</span>: <span class="hljs-string">"https://accounts.google.com/o/oauth2/auth"</span>,
  <span class="hljs-string">"token_uri"</span>: <span class="hljs-string">"https://oauth2.googleapis.com/token"</span>,
  <span class="hljs-string">"auth_provider_x509_cert_url"</span>: <span class="hljs-string">"https://www.googleapis.com/oauth2/v1/certs"</span>,
  <span class="hljs-string">"client_x509_cert_url"</span>: ***,
  <span class="hljs-string">"universe_domain"</span>: <span class="hljs-string">"googleapis.com"</span>
}
</code></pre>
<p>Nice! That means our cluster now has access to the Google Cloud key.</p>
<h4 id="heading-step-4-creating-the-backup-schedule">Step 4: Creating the Backup Schedule</h4>
<p>CockroachDB makes backups super convenient. It can automatically back up your database <strong>on a schedule</strong> (without you needing to manually create Kubernetes CronJobs).</p>
<p>To create an automatic backup schedule, run this SQL command inside the CockroachDB SQL shell 👇🏾(Replace the BUCKET_NAME placeholder with the name of your Google Cloud Storage bucket):</p>
<pre><code class="lang-sql">CREATE SCHEDULE backup_cluster
FOR BACKUP INTO 'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'
WITH revision_history
RECURRING '@hourly'
FULL BACKUP '@daily'
WITH SCHEDULE OPTIONS first_run = 'now';
</code></pre>
<p>Here’s what each part means:</p>
<ul>
<li><p><code>AUTH=implicit</code> tells CockroachDB to use the Google key we mounted (<code>GOOGLE_APPLICATION_CREDENTIALS</code>) for authentication.</p>
</li>
<li><p><code>FULL BACKUP '@daily'</code> creates a complete backup of the entire database every day.</p>
</li>
<li><p><code>RECURRING '@hourly'</code> creates smaller, incremental backups every hour, capturing just the changes since the last backup.</p>
</li>
<li><p><code>WITH SCHEDULE OPTIONS first_run = 'now'</code> starts the first backup immediately after running the command.</p>
</li>
</ul>
<p>After running it, CockroachDB will return two rows:</p>
<ul>
<li><p>The first is for the <strong>recurring incremental backup</strong> (hourly updates)</p>
</li>
<li><p>The second is for the <strong>full backup</strong> (daily snapshot)</p>
</li>
</ul>
<p>You can read more about full and incremental backups in the official docs here 👉🏾<a target="_blank" href="https://www.cockroachlabs.com/docs/stable/take-full-and-incremental-backups">CockroachDB Backups Guide</a>.</p>
<h4 id="heading-step-5-checking-backup-status">Step 5: Checking Backup Status</h4>
<p>To see the status of your backups, copy the <strong>Job ID</strong> from the second row (the <code>id</code> column) and run this command:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762103549260/742fc309-9c4d-4967-9436-91539851a9b9.png" alt="The job ID to copy" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<pre><code class="lang-sql">SHOW JOBS FOR SCHEDULE &lt;YOUR_JOB_ID&gt;;
</code></pre>
<p>Replace <code>&lt;YOUR_JOB_ID&gt;</code> with the ID you copied.</p>
<p>You’ll see output similar to this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762103606748/8627d561-0b54-4e6d-9109-ba7e1c7a85c3.png" alt="Getting the status of the backup job" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, do the same for the recurring backup job (the ID in the first row of the previous result).</p>
<p>If both statuses show <code>succeeded</code>, that means your full and recurring backups worked perfectly! If either is still running, just give it a few minutes – backups can take a bit of time :)</p>
<h3 id="heading-testing-our-backup-disaster-recovery-time">Testing Our Backup — Disaster Recovery Time</h3>
<p>Woohoo! We’ve successfully created a backup of our CockroachDB cluster to Google Cloud Storage. That’s a huge milestone. But let’s be honest: how can we be <em>sure</em> it works if we’ve never tried restoring it?</p>
<p>So, in true brave-developer fashion, we’re going to do the unthinkable: <strong>destroy our entire database</strong>...yes, everything! 😬</p>
<p>Why would we do that?! Because in real life, disasters happen. A node crashes, data gets wiped, or an upgrade goes sideways. The question is: <em>Can we recover?</em> Let’s find out.</p>
<h4 id="heading-step-1-uninstall-the-helm-chart">Step 1: Uninstall the Helm Chart</h4>
<p>First, let’s remove the CockroachDB Helm release. This deletes the cluster resources like StatefulSets, pods, and secrets:</p>
<pre><code class="lang-bash">helm uninstall crdb
</code></pre>
<p>This removes the running cluster, but <strong>not the actual data</strong>, which is stored on Persistent Volumes (PVs).</p>
<h4 id="heading-step-2-delete-persistent-volume-claims-pvcs">Step 2: Delete Persistent Volume Claims (PVCs)</h4>
<p>Each CockroachDB node stores its data in a <strong>Persistent Volume Claim</strong> (PVC). These PVCs remain even after uninstalling the Helm release, so let’s manually delete them:</p>
<pre><code class="lang-bash">kubectl delete pvc datadir-crdb-cockroachdb-0
kubectl delete pvc datadir-crdb-cockroachdb-1
kubectl delete pvc datadir-crdb-cockroachdb-2
</code></pre>
<h4 id="heading-step-3-delete-the-persistent-volumes-pvs">Step 3: Delete the Persistent Volumes (PVs)</h4>
<p>Next, list all the Persistent Volumes:</p>
<pre><code class="lang-bash">kubectl get pv
</code></pre>
<p>You’ll see a list of volumes similar to this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762107818554/01defffd-543b-486a-aa19-4bbf6f768270.png" alt="List existing Persistent Volumes for CockroachDB" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Look for the PVs that are <strong>bound to the PVCs</strong> you just deleted. Then delete them manually using:</p>
<pre><code class="lang-bash">kubectl delete pv &lt;PV_NAME&gt;
</code></pre>
<p>At this point, you’ve completely wiped out your database like it never existed 🥲. Don’t worry: this is all part of the plan.</p>
<h4 id="heading-step-4-reinstall-the-cluster">Step 4: Reinstall the Cluster</h4>
<p>Let’s bring CockroachDB back to life (an empty one for now):</p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Once the installation is done, expose the cluster locally again:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26257
</code></pre>
<h4 id="heading-step-5-check-whats-left">Step 5: Check What’s Left</h4>
<p>Connect Beekeeper Studio to your DB (if you aren’t still connected), and try running the query below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>You’ll get an error saying the <code>books</code> table doesn’t exist, because this is a <em>brand new</em> database.</p>
<h4 id="heading-step-6-restore-from-google-cloud-storage">Step 6: Restore from Google Cloud Storage</h4>
<p>Now for the magic part, let’s bring our data back from the backup we created earlier 😃!</p>
<p>Run this query on the new cluster:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">RESTORE</span> <span class="hljs-keyword">FROM</span> LATEST <span class="hljs-keyword">IN</span> <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>;
</code></pre>
<p>Replace <code>&lt;BUCKET_NAME&gt;</code> with your actual Google Cloud Storage bucket name (for example: <code>cockroachdb-backup-7gw8u</code>).</p>
<p>CockroachDB will begin restoring your data. This can take a few seconds or minutes depending on your backup size. When it’s done, you’ll see a response showing a success status:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762108106557/0da98d45-d8f4-48ed-b852-9f76209fb20f.png" alt="Database restored successfully" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-step-7-confirm-the-restoration">Step 7: Confirm the Restoration</h4>
<p>Now, run the same query again:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom 💥 your books are back 😁! That means your backup and restore process works perfectly. You just performed a full disaster recovery test.</p>
<p>Congrats! You’ve done something many real-world teams fail to test: a <strong>full backup and restore cycle</strong>. You’ve now proven that your database setup is resilient, even in a worst-case scenario.</p>
<h2 id="heading-managing-resources-amp-optimizing-memory-usage">Managing Resources &amp; Optimizing Memory Usage</h2>
<p>In this section, we’ll learn how CockroachDB handles memory internally (for things like caching and SQL query work), and how to tune these settings so you avoid OOM kills or evictions – that is, Kubernetes killing or stopping the database because it used more memory than it was allocated.</p>
<h3 id="heading-how-cockroachdb-uses-memory">How CockroachDB Uses Memory</h3>
<p>When you deploy CockroachDB nodes (each replica) via Kubernetes, each pod (node) needs memory for multiple things. At a high level, there are two major internal uses:</p>
<ul>
<li><p><strong>Cache</strong> (<code>conf.cache</code>): This is the space CockroachDB uses to keep frequently accessed data in memory so queries can run faster without hitting the disk.</p>
</li>
<li><p><strong>SQL Memory</strong> (<code>conf.max-sql-memory</code>): This is the memory used when running SQL queries (things like sorting, joins, buffering numbers, and temporary data).</p>
</li>
</ul>
<p>Together, they need to be sized appropriately relative to the total memory you give the pod, so there’s room for these internal operations <em>plus</em> other overhead (networking, logging, background tasks).</p>
<h3 id="heading-the-memory-usage-formula-you-must-follow">The Memory Usage Formula You Must Follow</h3>
<p>Here’s the golden rule you should <strong>never forget</strong>:</p>
<pre><code>(2 × max-sql-memory) + cache  ≤  80% of the memory limit
</code></pre>
<p>What this means:</p>
<ul>
<li><p>You take the <code>max-sql-memory</code> value and multiply it by 2 (because SQL work may need space for both input and output, among other things)</p>
</li>
<li><p>Add your <code>cache</code> value</p>
</li>
<li><p>That total must be <strong>less than or equal to 80%</strong> of the pod’s memory limit (<code>statefulset.resources.limits.memory</code>)</p>
</li>
<li><p>The remaining ~20% (or more) is free space for <em>other internal CockroachDB processes</em> like background jobs, metrics, network, and so on</p>
</li>
</ul>
<p>If you give CockroachDB too little “free” memory beyond these two settings, you risk OOM kills (pod gets killed by Kubernetes because it used more memory than allowed) or performance issues.</p>
<h3 id="heading-where-you-find-these-settings">Where You Find These Settings</h3>
<p>If you go to the Helm chart docs on ArtifactHub, <a target="_blank" href="https://artifacthub.io/packages/helm/cockroachdb/cockroachdb">CockroachDB Helm Chart on ArtifactHub</a>, and scroll down to the <strong>Configuration</strong> section (or press Ctrl-F for <code>conf.cache</code>), you’ll see:</p>
<ul>
<li><p><code>conf.cache</code> (cache size)</p>
</li>
<li><p><code>conf.max-sql-memory</code> (SQL memory size)</p>
</li>
<li><p>It states that each of these is by default set to roughly 25% of the memory allocation you set in the <code>resources.limits.memory</code> for the statefulset.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762235290740/bd176882-43bd-4abd-94e0-cce083335d64.png" alt="Artifacthub docs for the CockroachDB Helm chart" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-concrete-example-step-by-step">Concrete Example (Step-by-Step)</h3>
<p>Let’s do the math with numbers in our Minikube environment.</p>
<ul>
<li><p>Suppose we set <code>statefulset.resources.limits.memory</code> = <strong>2 GiB</strong> for each CockroachDB pod (adjust the math below if you used a different limit).</p>
</li>
<li><p>The Helm default of ¼ (25%) rule means:</p>
<ul>
<li><p><code>conf.cache</code> = ¼ × 2 GiB = <strong>512 MiB</strong></p>
</li>
<li><p><code>conf.max-sql-memory</code> = ¼ × 2 GiB = <strong>512 MiB</strong></p>
</li>
</ul>
</li>
<li><p>Apply the formula: <code>(2 × 512 MiB) + 512 MiB = 1,536 MiB</code></p>
</li>
<li><p>Calculate 80% of the memory limit: <code>80% of 2 GiB = 1,638 MiB</code> (approximately)</p>
</li>
<li><p>Compare: 1,536 MiB ≤ 1,638 MiB – so we’re within the safe zone ✅</p>
</li>
<li><p>That means in this configuration, CockroachDB expects to use <strong>~1,536 MiB</strong> for its cache + SQL memory. This leaves <strong>~512 MiB</strong> (20%) of the 2 GiB limit for other internal processes.</p>
</li>
</ul>
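<p>The arithmetic above can be turned into a quick shell sanity check (values in MiB, assuming the Helm defaults of 25% for both <code>conf.cache</code> and <code>conf.max-sql-memory</code> – adapt <code>LIMIT_MIB</code> to your own pod size):</p>

```shell
# Memory-formula sanity check for a 2 GiB (2048 MiB) pod memory limit.
LIMIT_MIB=2048
CACHE_MIB=$(( LIMIT_MIB / 4 ))        # Helm default: conf.cache = 25% of limit
SQL_MEM_MIB=$(( LIMIT_MIB / 4 ))      # Helm default: conf.max-sql-memory = 25% of limit

PLANNED_MIB=$(( 2 * SQL_MEM_MIB + CACHE_MIB ))  # (2 x max-sql-memory) + cache
BUDGET_MIB=$(( LIMIT_MIB * 80 / 100 ))          # 80% of the memory limit

echo "planned=${PLANNED_MIB}MiB budget=${BUDGET_MIB}MiB"
if [ "$PLANNED_MIB" -le "$BUDGET_MIB" ]; then
  echo "within the safe zone"
else
  echo "over budget - lower cache or max-sql-memory"
fi
```

<p>For our 2 GiB example this prints <code>planned=1536MiB budget=1638MiB</code> followed by <code>within the safe zone</code>, matching the numbers worked out above.</p>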
<p>That leftover memory is for things like internal bookkeeping (range rebalancing, replication metadata), communication among database replicas, metric collection, logging, garbage collection, and temporary or unexpected memory spikes.</p>
<p>If you don’t leave this free space, your node might struggle even during normal operations. And on Kubernetes, if the pod uses more memory than <code>limits.memory</code> allows, it can get OOM-killed, which causes downtime or restarts.</p>
<h3 id="heading-on-requests-vs-limits-in-kubernetes">⚠️ On Requests vs Limits in Kubernetes</h3>
<p>Important nuance: Kubernetes schedules pods based on <strong>requests</strong> (what you ask for) but enforces limits based on <strong>limits</strong> (what you allow).</p>
<ul>
<li><p><code>statefulset.resources.requests.memory</code> = what the scheduler guarantees the pod will have.</p>
</li>
<li><p><code>statefulset.resources.limits.memory</code> = the maximum the pod can use before Kubernetes will kill it for excess memory.</p>
</li>
</ul>
<p>Because CockroachDB’s internal memory computations (cache + SQL memory) use the <strong>limit</strong> value to calculate sizing, if you set requests &lt; limits you’ll get a mismatch. Example:</p>
<ul>
<li><p>Suppose requests = 1 GiB, limits = 2 GiB</p>
</li>
<li><p>Kubernetes may schedule the pod on a node that has (at least) 1 GiB free</p>
</li>
<li><p>But internally, CockroachDB will plan for ~1.5 GiB usage (based on the 2 GiB limit)</p>
</li>
<li><p>The node may not actually have that much free memory available</p>
</li>
<li><p>The pod might try to use more memory than the node reserved for it, risking eviction because less memory is left for other pods</p>
</li>
</ul>
<p>✅ <strong>Best practice:</strong> Set requests = limits for memory and CPU for CockroachDB pods. That way the scheduler reserves enough space for what CockroachDB will use internally.</p>
<h3 id="heading-overriding-the-default-fractions">Overriding the Default Fractions</h3>
<p>If you want to set static <code>conf.cache</code> or <code>conf.max-sql-memory</code> values (rather than relying on 25% of limit) you <em>can</em> – but you must still obey the memory usage formula.</p>
<p>For example, if you set:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">conf:</span>
  <span class="hljs-attr">cache:</span> <span class="hljs-string">"1Gi"</span>
  <span class="hljs-attr">max-sql-memory:</span> <span class="hljs-string">"1Gi"</span>
<span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
</code></pre>
<p>With this configuration, your pod’s memory request and limit are both <strong>3 GiB</strong>. Now apply the formula:</p>
<pre><code class="lang-yaml"><span class="hljs-string">(2</span> <span class="hljs-string">×</span> <span class="hljs-string">1Gi)</span> <span class="hljs-string">+</span> <span class="hljs-string">1Gi</span> <span class="hljs-string">=</span> <span class="hljs-string">3Gi</span>
<span class="hljs-number">80</span><span class="hljs-string">%</span> <span class="hljs-string">of</span> <span class="hljs-string">3Gi</span> <span class="hljs-string">=</span> <span class="hljs-string">~2.4Gi</span>
</code></pre>
<p>Here <strong>3Gi &gt; 2.4Gi</strong>, so you’d be violating the rule. This is a risky setup.</p>
<p>So you’ll need to either reduce cache or SQL memory, for example to 768Mi (or increase the memory limit, for example 4Gi) so that your formula results in ≤ 80% of the limit.</p>
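<p>For instance, one way to satisfy the rule while keeping the 3 GiB limit is to shrink both values to 768 MiB (a sketch reusing the keys from the example above):</p>
<pre><code class="lang-yaml">conf:
  cache: "768Mi"           # (2 × 768Mi) + 768Mi = 2,304Mi
  max-sql-memory: "768Mi"  # 2,304Mi ≤ 80% of 3Gi (~2,458Mi) ✅
statefulset:
  resources:
    requests:
      memory: "3Gi"
      cpu: 1
    limits:
      memory: "3Gi"
      cpu: 1
</code></pre>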
<h2 id="heading-scaling-cockroachdb-the-right-way">Scaling CockroachDB the Right Way</h2>
<p>In this section we’ll look at when and how you should grow your CockroachDB cluster – whether that means adding more replicas (horizontal scale), giving each node more CPU/RAM (vertical scale), or giving them more storage.</p>
<p>I’ll explain everything in simple terms and cover what metrics to watch, what decisions to make, and how to scale safely.</p>
<p>What we’ll discuss:</p>
<ul>
<li><p>How you can tell it’s time to “grow” your cluster</p>
</li>
<li><p>How to safely add more nodes or upgrade what you already have</p>
</li>
<li><p>How to decide whether you need more nodes, bigger nodes, or bigger disks</p>
</li>
<li><p>How to do all this without causing downtime or stress</p>
</li>
</ul>
<h3 id="heading-key-metrics-to-understand">Key Metrics to Understand</h3>
<p>Before we dive into how to scale our cluster, we need to understand what certain metrics mean. These metrics will help us make calculated decisions, knowing what to scale and when.</p>
<h4 id="heading-read-bytessecond-amp-write-bytessecond-throughput">Read bytes/second &amp; Write bytes/second (Throughput)</h4>
<p>Read bytes/second is how much data (in bytes) the disk is <strong>reading</strong> every second – that is, data passing from the disk to the database application.</p>
<p>Write bytes/second is how much data is being <strong>written</strong> to the disk per second, that is, moving from the database to the disk.</p>
<p>This matters because your database is an application that stores data on disk. If your app needs to read a lot of data (reads) or write a lot of data (writes), this metric shows the <strong>volume</strong> of data flowing to/from disk.</p>
<p>To keep an eye on it, go to your CockroachDB dashboard and navigate to the “Metrics” link on the sidebar. Under the “Metrics” title, click the “Dashboard:…” drop-down and select “Hardware” from the options.</p>
<p>Now, scroll down a bit till you see “Disk Read Bytes/s” and “Disk Write Bytes/s”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762325396257/553ac9d4-4927-40f3-b654-8b19a0b2aef8.png" alt="The Disk Read &amp; Write Bytes/s metrics" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-read-iops-amp-write-iops">Read IOPS &amp; Write IOPS</h4>
<p><strong>IOPS</strong> = “Input/Output Operations Per Second”. Here, Read IOPS = how many <strong>read operations</strong> the disk is performing per second. Write IOPS = how many <strong>write operations</strong> per second.</p>
<p>This is different from throughput because throughput is about how many bytes (data) are being transferred. IOPS, on the other hand, is about <strong>how many operations</strong> are happening (regardless of size).</p>
<p>Here’s an example: 10 read operations/sec of 1 MiB each = 10 MiB/sec throughput, 10 IOPS. Another scenario: 100 reads/sec of 10 KiB each = ~1 MiB/sec throughput, but 100 IOPS (a higher operation count despite the lower data volume).</p>
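<p>You can sanity-check the relationship yourself – throughput is roughly IOPS × average operation size. A quick shell sketch of the two scenarios above:</p>
<pre><code class="lang-bash"># throughput (KiB/s) = operations per second × size of each operation (KiB)
OPS_A=10;  SIZE_A=1024   # 10 reads/sec of 1 MiB each
OPS_B=100; SIZE_B=10     # 100 reads/sec of 10 KiB each

THROUGHPUT_A=$((OPS_A * SIZE_A))   # 10,240 KiB/s ≈ 10 MiB/s at only 10 IOPS
THROUGHPUT_B=$((OPS_B * SIZE_B))   # 1,000 KiB/s ≈ 1 MiB/s at 100 IOPS

echo "A: ${OPS_A} IOPS, ${THROUGHPUT_A} KiB/s"
echo "B: ${OPS_B} IOPS, ${THROUGHPUT_B} KiB/s"
</code></pre>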
<p>Scroll down a bit more to view the IOPS metrics:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762325699278/dd549ac3-16cf-4373-9637-5a1e798bf5db.png" alt="Illustrating the IOPS metrics on the dashboard" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-sql-p99-latency-99th-percentile-latency">SQL p99 Latency (99th percentile latency)</h4>
<p>P99 latency is the time it takes for the <strong>slowest 1% of queries</strong> to finish.</p>
<p>For example, let’s say you run 1,000 queries. p99 is the time within which the fastest 990 of them completed – only the slowest 10 took longer.</p>
<p>This matters because it’s not about the average query, but about the tail (worst cases). If your p99 is high, it means some queries are seriously lagging. All other queries might be fine, but some are dragging.</p>
<p>So if p99 jumps up (for example, from 10 ms → 300 ms), you should investigate: maybe big joins, missing indexes, contention, or writes taking too long to reach the disk.</p>
<p>To access the SQL P99 Latency metrics, simply click the “Dashboard:…” select field, and choose the “Overview” option from the dropdown.</p>
<p>PS: The higher the p99 latency, the bigger the problem (slower queries).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762326088120/e6f39e6e-942b-4db9-b808-cb228c1e0cc5.png" alt="The SQL p99 latency metric" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-disk-ops-in-progress-queue-depth">Disk Ops In Progress (Queue Depth)</h4>
<p>This shows how many disk reads and writes are waiting <em>in line</em> (queued) because the storage system is busy.</p>
<p>A queue depth of 0–5 is generally OK. If it frequently goes into double-digits (10+), that means storage is struggling and latency may spike. If you see this number high and staying high, you may need faster storage or more database replicas.</p>
<p>Simple rule: if “Ops In Progress” stays above ~9 for an extended time, that’s a bad sign. Time to check disks and I/O.</p>
<p>To access the “Disk Ops In Progress“ metric, return to the “Hardware“ dashboard, and scroll down:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762488796957/b2a215fd-ec51-4ee3-9056-a5fa6d511c61.png" alt="Accessing the Disk Ops In Progress metrics on the CockroachDB dashboard" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>By monitoring these, you can choose:</p>
<ul>
<li><p>“I need <strong>more nodes</strong>” (horizontal scale)</p>
</li>
<li><p>“I need <strong>bigger nodes or faster storage</strong>” (vertical scale)</p>
</li>
<li><p>“I need <strong>better query/index tuning</strong>” (optimize rather than scale)</p>
</li>
</ul>
<h3 id="heading-when-and-what-to-scale-based-on-your-metrics">When (and What) to Scale Based on Your Metrics</h3>
<p>So, let’s imagine you’re watching your CockroachDB dashboard and notice this pattern:</p>
<ul>
<li><p>The <strong>SQL P99 latency</strong> (the slowest 1% of your queries) is high, meaning your queries are taking too long.</p>
</li>
<li><p>The <strong>CPU usage</strong> for your CockroachDB pods (under <em>Cockroach process CPU%</em>) is above <strong>80%</strong> consistently.</p>
</li>
</ul>
<p>That’s a classic sign your cluster is running out of CPU power and the database is struggling to process queries fast enough because the CPU is maxed out.</p>
<p>Here’s how to fix it 👇🏾</p>
<h4 id="heading-step-1-add-more-cpu-power">Step 1: Add More CPU Power</h4>
<p>You can scale up your CPUs directly through the <strong>Helm chart values file</strong>, <code>cockroachdb-values.yml</code>.</p>
<p>In that file, look for the section where CPU and memory requests/limits are defined under <code>statefulset.resources</code>. Then, increase the CPU allocations. For example:</p>
<pre><code class="lang-yaml">statefulset:
  resources:
    requests:
      cpu: <span class="hljs-string">"3"</span>
      memory: <span class="hljs-string">"6Gi"</span>
    limits:
      cpu: <span class="hljs-string">"3"</span>
      memory: <span class="hljs-string">"6Gi"</span>
</code></pre>
<p>This means each CockroachDB pod (replica) will now <em>request</em> 3 vCPUs (guaranteed). Save the file, then apply the update with the Helm command:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Once the upgrade is done, give it 30 minutes to 1 hour to stabilize. The CockroachDB dashboard will automatically start showing you updated metrics.</p>
<p>If you see that the CPU usage drops below 70% and the SQL P99 latency improves, you’re good. 👍🏾</p>
<h4 id="heading-step-2-add-another-replica-new-node">Step 2: Add Another Replica (New Node)</h4>
<p>But…what if the latency is <strong>still high</strong> even after adding more CPU? That likely means the cluster is still overloaded, and it’s time to add another node (replica) to distribute the load.</p>
<p>Here’s why that works: CockroachDB is horizontally scalable, meaning it automatically spreads out your data (remember <strong>ranges</strong>?) and balances reads/writes across all replicas. So, the more nodes you add, the more evenly your cluster can share the work.</p>
<p>To add another replica, simply increase the <code>replicas</code> value in your Helm config:</p>
<pre><code class="lang-yaml">statefulset:
  replicas: 4  <span class="hljs-comment"># If it was 3 before</span>
</code></pre>
<p>Then, redeploy again:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>This adds a new pod (a new CockroachDB node) to your cluster. CockroachDB will automatically rebalance your data across nodes – no manual migration needed.</p>
<p>💡 <strong>Tip:</strong> Try to keep one CockroachDB pod (replica) per VM. For example, if you have 3 replicas, you should ideally have 3 separate VMs (worker nodes). This ensures better fault tolerance and performance.</p>
<p>Luckily, the official CockroachDB Helm chart already helps with this by managing <strong>Pod</strong> <strong>anti-affinity rules</strong>, so pods are automatically spread across nodes safely.</p>
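<p>For reference, the chart drives that spreading behavior through values roughly like these. Treat this as a sketch – the exact key names and defaults can differ between chart versions, so check your chart’s <code>values.yaml</code>:</p>
<pre><code class="lang-yaml">statefulset:
  podAntiAffinity:
    topologyKey: kubernetes.io/hostname  # spread replicas across distinct worker nodes
    type: soft    # "hard" strictly forbids co-locating two replicas; "soft" only prefers not to
    weight: 100   # how strongly the scheduler honors the "soft" preference
</code></pre>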
<h3 id="heading-disk-bound-situations-what-to-do-when-your-disk-is-the-limiting-factor">Disk-Bound Situations — What to Do When Your Disk Is the Limiting Factor</h3>
<p>If you’re seeing this kind of pattern in your CockroachDB dashboard and Kubernetes cluster:</p>
<ul>
<li><p>SQL P99 latency is high (queries are slow)</p>
</li>
<li><p>“Disk Ops In Progress” (queue depth) stays above ~9-10 – meaning many disk I/O operations are waiting to be processed</p>
</li>
<li><p>Disk “Read bytes/sec” or “Write bytes/sec” (throughput) are high <strong>or</strong> “Read IOPS” or “Write IOPS” are high (even though CPU looks okay)</p>
</li>
</ul>
<p>Then you’re very likely <strong>disk-bound</strong>, meaning your storage is the bottleneck.</p>
<p>Here’s how to fix it (and yes, it’s a bit more complex than just “add more RAM”)…</p>
<h4 id="heading-step-1-increase-disk-size-in-your-helm-values">Step 1: Increase Disk Size in Your Helm Values</h4>
<p>Often the first problem is that the disk size is too small. Here’s how you can increase it:</p>
<ol>
<li><p>Open your <code>cockroachdb-values.yml</code> (the Helm chart values file)</p>
</li>
<li><p>Look for the storage section, for example:</p>
</li>
</ol>
<pre><code class="lang-yaml">storage:
  persistentVolume:
    size: 5Gi  <span class="hljs-comment"># current size</span>
</code></pre>
<ol start="3">
<li>Update it to a larger size, like:</li>
</ol>
<pre><code class="lang-yaml">storage:
  persistentVolume:
    size: 15Gi  <span class="hljs-comment"># increased size</span>
</code></pre>
<ol start="4">
<li>Save the file and run:</li>
</ol>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p><strong>N.B.</strong> If this doesn’t work, or Helm complains that it can’t modify certain values (this is normal), resize the disk directly instead 👇🏾 (replace the PVC_NAME and SIZE placeholders accordingly):</p>
<pre><code class="lang-bash">kubectl patch pvc &lt;PVC_NAME&gt; \
  -p <span class="hljs-string">'{"spec":{"resources":{"requests":{"storage":"&lt;SIZE&gt;"}}}}'</span>
</code></pre>
<p>Do that for each PVC (<code>datadir-crdb-cockroachdb-0</code>, <code>datadir-crdb-cockroachdb-1</code>, and so on).</p>
<p><strong>Important:</strong> Increasing size <em>may help</em>, but often alone is not enough because your disk speed (IOPS/throughput) also depends on factors beyond just size.</p>
<p>Let’s break down why that’s the case, and what really affects your disk performance (especially on Google Cloud, which is what I’m using, too).</p>
<h4 id="heading-why-disk-speed-can-vary">Why Disk Speed Can Vary</h4>
<p>Your CockroachDB cluster uses <strong>external disks</strong> provided by your cloud provider (like Google, AWS, or Azure). The speed of those disks – that is, how fast they can read/write data – isn’t fixed. It depends on a few key factors.</p>
<p>On Google Cloud, disk performance depends on three main things:</p>
<ol>
<li><p><strong>Disk type</strong>: HDD, SSD, or fast SSD (pd-ssd) (the faster the disk type, the faster it can handle data operations)</p>
</li>
<li><p><strong>Disk size</strong>: larger disks usually come with higher speed limits (the bigger, the faster)</p>
</li>
<li><p><strong>VM’s vCPU count</strong>: more CPUs mean higher quotas for both</p>
<ul>
<li><p>read/write operations per second (<strong>IOPS</strong>), and</p>
</li>
<li><p>how much data can flow to/from the disk per second (<strong>throughput</strong>)</p>
</li>
</ul>
</li>
</ol>
<h4 id="heading-the-recommended-disk-type-for-cockroachdb">The Recommended Disk Type for CockroachDB</h4>
<p>The pd-ssd (Google’s fast SSD) is the recommended type for CockroachDB.</p>
<ul>
<li><p>Each pd-ssd disk starts with a minimum of 6,000 IOPS (read or write operations per second).</p>
</li>
<li><p>It also has around 240 MiB/s (~252 MB/s) of read/write throughput.</p>
</li>
</ul>
<p>In simple terms, that means your CockroachDB disk can handle up to 6,000 read/write operations EVERY SECOND, and move 250+ MB of data in and out every second. That’s pretty impressive!</p>
<p>But here’s the catch: those numbers can still vary depending on your <strong>VM family</strong> and <strong>CPU count</strong>.</p>
<h4 id="heading-how-vm-family-affects-disk-speed-e2-example">How VM Family Affects Disk Speed (E2 Example)</h4>
<p>If your CockroachDB is running on an E2 VM family (one of Google Cloud’s general-purpose VM types):</p>
<ul>
<li><p>A VM with 2–7 vCPUs can handle up to:</p>
<ul>
<li><p>15k IOPS (read/write operations per second)</p>
</li>
<li><p>250+ MiB/s throughput (which is already far more than many databases ever use 😅)</p>
</li>
</ul>
</li>
<li><p>A VM with 8–15 vCPUs still allows 15k IOPS, but throughput jumps up to ~800 MiB/s 😮 –<br>  meaning your disk can push nearly 0.8 GB of data in/out EVERY SECOND.</p>
</li>
</ul>
<p>The more vCPUs you have, the higher these limits grow, both for IOPS and throughput.</p>
<h4 id="heading-putting-it-all-together">Putting It All Together</h4>
<p>So, if you notice high SQL P99 latency (queries taking long), and disk read and write IOPS or throughput (read &amp; write bytes) usage close to their limits, then your disk may be maxing out, not your database itself.</p>
<p>Here’s what you can do:</p>
<ul>
<li><p>Check your current VM’s vCPU count and disk performance limit for that CPU.</p>
</li>
<li><p>If you’re using E2 with low vCPUs (for example, 2–4), try increasing it to <strong>8 vCPUs or more</strong>. That’ll immediately lift your IOPS and throughput ceiling.</p>
</li>
</ul>
<h4 id="heading-example-e2-vm-family-iopsthroughput-table">Example: E2 VM Family IOPS/Throughput Table</h4>
<pre><code class="lang-bash">E2 per-VM caps (pd-ssd):

e2-medium:     10k write / 12k <span class="hljs-built_in">read</span> IOPS, 200/200 MiB/s
2–7 vCPUs:     15k / 15k IOPS, 240/240 MiB/s
8–15 vCPUs:    15k / 15k IOPS, 800/800 MiB/s
16–31 vCPUs:   25k / 25k IOPS, 1,000 write / 1,200 <span class="hljs-built_in">read</span> MiB/s
32 vCPUs:      60k / 60k IOPS, 1,000 write / 1,200 <span class="hljs-built_in">read</span> MiB/s
</code></pre>
<p>The rule is simple — the higher the CPU tier (2–7, 8–15, and so on), the higher the disk speed cap.</p>
<h4 id="heading-but-what-if-youre-still-seeing-slow-queries">⚠️ But What If You’re Still Seeing Slow Queries?</h4>
<p>If your CockroachDB queries are <em>still</em> slow, but your metrics show that you’re not fully using your disk capacity (based on your VM’s CPU range), then your <strong>disk size</strong> might be the actual limitation.</p>
<p>In that case:</p>
<ul>
<li><p>Gradually increase your disk size, for example from <code>50Gi</code> to <code>70Gi</code> to <code>100Gi</code>.</p>
</li>
<li><p>Each increase lets your disk move more data in and out (especially with pd-ssd).</p>
</li>
<li><p>Remember: once you increase disk size on Google Cloud, <strong>you can’t shrink it back down</strong>, so grow it slowly and observe improvements before scaling again.</p>
</li>
</ul>
<p>This step helps you pinpoint <em>exactly</em> whether the slowdown is coming from insufficient IOPS, throughput, or just a disk that’s too small for CockroachDB’s workload 💪🏾</p>
<h3 id="heading-memory-pressure-what-to-do-when-your-database-hits-the-limit">Memory Pressure — What to Do When Your Database Hits the Limit</h3>
<p>There are some signs you can look out for that tell you your database is getting close to its memory limit. Pods (database replicas) might be getting <strong>OOMKilled</strong> (out of memory) or evicted by Kubernetes, or your memory usage might stay above ~75–80% for a while.</p>
<p>If either of these is the case, you’re often dealing with <strong>memory pressure</strong> (you can check memory usage on the CockroachDB overview dashboard).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762584827011/e7828548-7ed7-4a87-b6b2-fff52c6f6df1.png" alt="Accessing your Cluster memory usage" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-why-this-happens">Why this happens</h4>
<p>If you didn’t set memory requests and limits properly for each replica, the pod might not have enough headroom for all of its internal work (cache, SQL memory, background jobs), and Kubernetes kills it or it crashes.</p>
<p>Also, as you increase load (lots of queries, many users), your database needs more memory for two internal areas:</p>
<ul>
<li><p><code>--cache</code> (or <code>conf.cache</code>): in-memory data caching</p>
</li>
<li><p><code>--max-sql-memory</code> (or <code>conf.max-sql-memory</code>): memory for running SQL queries (joins, sorts, and so on).<br>  And yes, we covered the formula earlier <code>(2 × max-sql-memory) + cache ≤ ~ 80% of RAM limit</code>.</p>
</li>
</ul>
<h4 id="heading-what-to-do">What to do:</h4>
<p>First, you can increase the DB memory. In your Helm chart values (<code>cockroachdb-values.yml</code>), bump up the <code>statefulset.resources.limits.memory</code> and <code>statefulset.resources.requests.memory</code>. Or you can modify <code>conf.cache</code> and <code>conf.max-sql-memory</code> values (if you’re comfortable) but only if the total RAM limit is sufficient to support them.</p>
<p>Because the defaults (when you installed) set each to ~25% of RAM limit, they will scale automatically when you increase RAM.</p>
<p>For example:</p>
<ul>
<li><p>If RAM limit per pod = <strong>5 GiB</strong>, then cache ≈ <strong>1.25 GiB</strong>, max-sql-memory ≈ <strong>1.25 GiB</strong></p>
</li>
<li><p>If you raise RAM limit to <strong>8 GiB</strong>, these become ≈ <strong>2 GiB</strong> each. This keeps you inside the formula and avoids memory crashes.</p>
</li>
</ul>
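<p>Because the defaults track the limit, those numbers fall out of simple arithmetic. A quick shell sketch (values in MiB; the <code>quarter</code> helper is just for illustration):</p>
<pre><code class="lang-bash"># Default sizing: cache and max-sql-memory each get 25% of the pod's RAM limit
quarter() { echo $(( $1 * 25 / 100 )); }   # 25% of a MiB value

echo "5 GiB limit: cache/max-sql-memory = $(quarter 5120) MiB each"   # 1280 MiB ≈ 1.25 GiB
echo "8 GiB limit: cache/max-sql-memory = $(quarter 8192) MiB each"   # 2048 MiB = 2 GiB
</code></pre>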
<h4 id="heading-quick-yaml-snippet-example">Quick YAML snippet example:</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"8Gi"</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"8Gi"</span>
<span class="hljs-attr">conf:</span>
  <span class="hljs-attr">cache:</span> <span class="hljs-string">"25%"</span>
  <span class="hljs-attr">max-sql-memory:</span> <span class="hljs-string">"25%"</span>
</code></pre>
<p>After editing your values file, remember to apply it:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<h3 id="heading-when-queries-are-slow-but-everything-else-cpu-memory-amp-disk-looks-fine">When Queries Are Slow but Everything Else (CPU, Memory &amp; Disk) Looks “Fine”</h3>
<p>Sometimes you’ll see that your resource metrics (CPU, memory, disk I/O) all seem healthy. But your queries are still slow.</p>
<p>What then? One important cause: <strong>hotspots</strong> – especially “hot ranges” or “hot nodes” in CockroachDB.</p>
<p>A <strong>hot range</strong> is a portion of data (in CockroachDB, a range is a section of data from a table) that’s receiving much more traffic (reads or writes) than others.</p>
<p>A <strong>hot node</strong>, on the other hand, is a node/replica in the cluster which has significantly more load compared to the other nodes – often because it holds one or more hot ranges.</p>
<p>Because most of the traffic (queries) goes to a range that lives on a specific node, performance still suffers locally even though your overall CPU / memory / disk metrics might look “okay”: queries are funneled into that specific range, creating a “hotspot”.</p>
<p>Learn more about Hotspots <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/understand-hotspots">here</a>.</p>
<h4 id="heading-why-a-high-write-workload-can-slow-reads">Why A High Write Workload Can Slow Reads</h4>
<p>When you have lots of write queries, they may overload specific ranges or nodes (especially if the keyspace is skewed). Writes tend to:</p>
<ul>
<li><p>Acquire locks or latches on rows or ranges</p>
</li>
<li><p>Cause contention among transactions</p>
</li>
<li><p>Require coordination (for example, via Raft consensus) which impacts performance.</p>
</li>
</ul>
<p>When writes dominate a range, read queries that hit the same ranges may get queued behind these write operations, or suffer longer wait times.</p>
<p>Since reads and writes share the same underlying data/ranges, too many writes can delay reads by creating bottlenecks. The docs describe this as part of “write hotspots”.</p>
<h4 id="heading-key-signs-you-might-have-a-hotspot">Key Signs You Might Have a Hotspot</h4>
<ul>
<li><p>One node’s CPU % is much higher than the others (even though overall resources seem fine)</p>
</li>
<li><p>On the Hot Ranges page in the CockroachDB UI, some ranges show very high QPS (queries per second) compared to others.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762586236835/aeb3b0ea-b280-48d3-b12f-4cfe78d11dc1.png" alt="The Hot Ranges page in the CockroachDB dashboard UI" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
<li><p>You observe that increasing overall resources (more CPU, more nodes) didn’t resolve the slowness. This suggests the problem isn’t “not enough resources” but “resource imbalance”.</p>
</li>
</ul>
<h4 id="heading-what-you-can-do">What You Can Do</h4>
<p>There are a few things you can do to prevent hotspots:</p>
<ul>
<li><p>Use the <strong>Hot Ranges</strong> UI page (go to the Database Console and then to Hot Ranges) to identify the range IDs and table/indexes causing the issue.</p>
</li>
<li><p>Examine how the key space is being used. If your table/index primary key is monotonically increasing (for example, timestamps or serial IDs), the writes may target a narrow portion of the data, causing a hotspot. The docs suggest using hash-sharded indexes or distributing writes across the key-space.</p>
</li>
<li><p>Ensure load is balanced across nodes: avoid “one node doing most of the work”. If needed, add nodes or ensure range distribution/lease-holder movement is happening.</p>
</li>
<li><p>Monitor your write-versus-read workload. If writes are heavy, they may cause queuing for reads even when resources appear OK. So look at write-heavy traffic patterns and try reducing the volume of writes (if possible).</p>
</li>
</ul>
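<p>To make the hash-sharded index suggestion concrete, here’s a sketch in CockroachDB SQL. The <code>events</code> table and its columns are hypothetical, not part of this book’s deployment:</p>
<pre><code class="lang-sql">-- A timestamp-led primary key normally funnels all inserts into the newest range.
-- USING HASH shards the index so sequential writes fan out across ranges.
CREATE TABLE events (
  event_id UUID NOT NULL DEFAULT gen_random_uuid(),
  ts TIMESTAMPTZ NOT NULL DEFAULT now(),
  payload STRING,
  PRIMARY KEY (ts, event_id) USING HASH
);
</code></pre>
<p>The trade-off: hash-sharding spreads writes but makes ordered range scans on <code>ts</code> more expensive, so use it only where the hotspot is real.</p>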
<h4 id="heading-note">⚠️ Note</h4>
<p>Learning everything about hotspots, key visualizers, and range splitting is a bit advanced. For those wanting to dive deeper: see the CockroachDB <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/performance-recipes">Performance Recipes page</a>.</p>
<h3 id="heading-understanding-disk-speed-iops-amp-throughput-across-cloud-providers">Understanding Disk Speed (IOPS &amp; Throughput) Across Cloud Providers</h3>
<p>So far, we’ve talked about how disk speed affects CockroachDB’s performance – especially how Google Cloud measures it. But it’s important to know that <strong>each cloud provider has its own way of measuring and limiting disk performance</strong> (IOPS and throughput).</p>
<p>So, while our earlier examples focused on Google Cloud, similar logic applies to AWS, Azure, and even DigitalOcean, just with different formulas and limits.</p>
<h4 id="heading-for-google-cloud">For Google Cloud:</h4>
<p>These guides break down how disk performance works:</p>
<ul>
<li><p><a target="_blank" href="https://cloud.google.com/compute/docs/disks/performance">Persistent Disk performance overview</a>: explains how baseline IOPS and throughput are calculated and the per-instance caps.</p>
</li>
<li><p><a target="_blank" href="https://docs.cloud.google.com/compute/docs/disks/persistent-disks">About Persistent Disks</a>: quick definitions of <code>pd-standard</code> (HDD), <code>pd-balanced</code> (SSD), and <code>pd-ssd</code> (SSD).</p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/compute/docs/disks/optimizing-pd-performance">Optimize PD performance</a>: shows how disk size, machine series, and tuning can affect performance.</p>
</li>
</ul>
<h4 id="heading-for-aws-ebs">For AWS (EBS):</h4>
<p>AWS’s Elastic Block Store (EBS) has several disk types:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html">EBS volume types</a>: overview of all SSD and HDD types (<code>gp3</code>, <code>gp2</code>, <code>io2</code>, and so on).</p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html">General Purpose SSD (gp3)</a>: lets you provision custom IOPS and throughput for your disks (about 0.25 MiB/s per IOPS, up to 2,000 MiB/s).</p>
</li>
</ul>
<h4 id="heading-for-azure-managed-disks">For Azure (Managed Disks):</h4>
<p>Azure disks also vary by type and size:</p>
<ul>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types">Disk types overview</a>: compares Standard HDD, Standard SSD, Premium SSD, Premium SSD v2, and Ultra Disk.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-deploy-premium-v2">Premium SSD v2</a>: lets you independently set IOPS and throughput for your disks.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-performance">VM &amp; disk performance</a>: lists per-VM IOPS and throughput caps.</p>
</li>
</ul>
<h4 id="heading-for-digitalocean">For DigitalOcean:</h4>
<p>DigitalOcean offers simpler storage setups:</p>
<ul>
<li><p><a target="_blank" href="https://docs.digitalocean.com/products/volumes/">Volumes overview</a>: explains block storage and NVMe details.</p>
</li>
<li><p><a target="_blank" href="https://docs.digitalocean.com/products/volumes/details/limits/">Volume Limits</a>: shows per-Droplet IOPS and throughput caps (including burst windows).</p>
</li>
</ul>
<h3 id="heading-downsizing-the-cluster-reducing-replicas">Downsizing the Cluster (Reducing Replicas)</h3>
<p>Now that we’ve seen how to scale up our CockroachDB cluster, let’s look at how to scale it down safely and correctly.</p>
<p>Let’s assume we scaled our cluster from 3 replicas to 5 replicas earlier (to handle more workload).</p>
<p>PS: If your CockroachDB pods were crashing often, you might need to increase the CPU and memory limits in the Helm chart configuration, like this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">5</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"2Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span> <span class="hljs-comment"># We can keep the memory requests and limits inconsistent for now, since we're in a development environment</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Then, you update the cluster using:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>After a few minutes, you can confirm the newly added replicas with <code>kubectl get pods</code>. You should now see five CockroachDB pods running.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762612478598/dee9f9e7-6b31-4b06-aed3-e2b0b97268fd.png" alt="The newly added CockroachDB replicas" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Also, check your CockroachDB Admin UI – the new nodes should now appear in the cluster overview.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762612539734/30e01a7d-3d2b-4160-be90-2988a161d87d.png" alt="Newly added nodes in the cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>P.S: You might experience some issues when upscaling your cluster, especially if you don’t have sufficient memory and CPU on your PC or wherever you’re running your Kubernetes cluster.</p>
<h3 id="heading-the-wrong-way-to-downscale">⚠️ The Wrong Way to Downscale</h3>
<p>Now, what if your workload reduces and you’d like to cut costs by scaling down from 5 replicas back to 3?</p>
<p>You might think, <em>“Oh, I’ll just reduce the number of replicas in the Helm chart from 5 to 3 and redeploy.”</em> But hold on, that’s very wrong! 😅</p>
<p>Scaling up CockroachDB is simple, but scaling down must be done carefully, for reasons we’ll explain below.</p>
<h3 id="heading-decommissioning-a-node-before-scaling-down-the-cluster">Decommissioning a Node Before Scaling Down the Cluster</h3>
<p>Before you go ahead and reduce the number of replicas in your CockroachDB cluster, it’s important to follow the right process.</p>
<p>You <em>can’t</em> just go from 5 replicas down to 3 and expect everything to go smoothly. There are steps you must take.</p>
<h4 id="heading-why-you-cant-just-scale-from-5-to-3-instantly">Why you can’t just scale from 5 to 3 instantly</h4>
<p>If you reduce your cluster size too quickly, you might:</p>
<ul>
<li><p>Lose data redundancy or fail to meet the required replication factor.</p>
</li>
<li><p>Cause data rebalancing to happen under heavy load, which can slow queries.</p>
</li>
<li><p>Put your cluster into a state where certain ranges or data replicas don’t have enough copies to remain fault-tolerant.</p>
</li>
</ul>
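<p>You can check for exactly this situation before and after any scaling operation. As a hedged example (run it from a pod or machine that has the <code>cockroach</code> binary, using the service name from this tutorial), the <code>--ranges</code> flag adds range-health columns to the node status output:</p>
<pre><code class="lang-bash"># Show per-node range health, including the ranges_underreplicated column
./cockroach node status --ranges --insecure --host crdb-cockroachdb-public
</code></pre>
<p>If <code>ranges_underreplicated</code> is non-zero on any node, wait for the cluster to finish rebalancing before removing another node.</p>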
<h4 id="heading-the-correct-approach-decommission-first-then-scale-down-one-node-at-a-time">✅ The correct approach: Decommission first, then scale down one node at a time</h4>
<p>Here’s the safe way to downscale:</p>
<ol>
<li><p><strong>Decommission</strong> the node you plan to remove.</p>
</li>
<li><p>Once decommissioning is complete, <strong>reduce the replica count</strong> (for example, from 5 to 4).</p>
</li>
<li><p>Delete the disk/PVC tied to that removed node.</p>
</li>
<li><p>Repeat the process (remove one node at a time) until you reach your target size (for example, down to 3 replicas).</p>
</li>
</ol>
<h4 id="heading-step-by-step-decommission-the-5th-node-before-scaling-5-to-4">Step-by-step: Decommission the 5th node (before scaling 5 to 4)</h4>
<ol>
<li><p><strong>Create a client pod</strong> to run CockroachDB commands.<br> Create a file named <code>cockroachdb-client.yml</code> with this content:</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
 <span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
 <span class="hljs-attr">metadata:</span>
   <span class="hljs-attr">name:</span> <span class="hljs-string">cockroachdb-client</span>
 <span class="hljs-attr">spec:</span>
   <span class="hljs-attr">serviceAccountName:</span> <span class="hljs-string">&lt;SA&gt;</span>
   <span class="hljs-attr">containers:</span>
     <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cockroachdb-client</span>
       <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
       <span class="hljs-attr">imagePullPolicy:</span> <span class="hljs-string">IfNotPresent</span>
       <span class="hljs-attr">command:</span>
         <span class="hljs-bullet">-</span> <span class="hljs-string">sleep</span>
         <span class="hljs-bullet">-</span> <span class="hljs-string">"2147483648"</span>
   <span class="hljs-attr">terminationGracePeriodSeconds:</span> <span class="hljs-number">300</span>
</code></pre>
<p> Replace <code>&lt;SA&gt;</code> with your CockroachDB service account name (find it via <code>kubectl get sa -l app.kubernetes.io/name=cockroachdb</code>).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620657038/34d5eb4b-de16-4e8a-b85c-1e7bf6b76172.png" alt="The CockroachDB service account details" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
<li><p>Apply the manifest:</p>
<pre><code class="lang-bash"> kubectl apply -f cockroachdb-client.yml
</code></pre>
</li>
<li><p>Confirm the pod is running:</p>
<pre><code class="lang-bash"> kubectl get pods
</code></pre>
<p> You should see <code>cockroachdb-client</code>.</p>
</li>
<li><p>Exec into the client pod:</p>
<pre><code class="lang-bash"> kubectl exec -it cockroachdb-client -- bash
</code></pre>
</li>
<li><p>Get the list of nodes and IDs:</p>
<pre><code class="lang-bash"> ./cockroach node status --insecure --host &lt;SERVICE_NAME&gt;
</code></pre>
<p> Find your service name: <code>kubectl get svc -l app.kubernetes.io/component=cockroachdb</code>. In our case it’s <code>crdb-cockroachdb-public</code>.</p>
<p> You’ll see nodes with IDs 1, 2, 3, 4, 5. Each maps to a replica pod like <code>crdb-cockroachdb-0</code>, <code>-1</code>, <code>-2</code>, <code>-3</code>, <code>-4</code>.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620790692/af8d382e-71db-4eab-af7a-a3491d98c8a8.png" alt="The nodes in the CockroachDB cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
<li><p><strong>Decommission the node with the highest index</strong> (since Kubernetes will remove the highest-numbered replica when scaling down).<br> For example, if you’re removing the pod <code>crdb-cockroachdb-4…</code>, and the node ID is 5:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620838125/b51856cb-2fbb-4b24-ba41-21f572c7678c.png" alt="The node to be decommissioned" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p> Run the command below to decommission the 5th node.</p>
<pre><code class="lang-bash"> ./cockroach node decommission 5 --host crdb-cockroachdb-public --insecure
</code></pre>
</li>
<li><p>Navigate to the CockroachDB dashboard, and monitor until the node status shows as <code>decommissioned</code>.<br> In the CockroachDB Console’s Cluster Overview page, you’ll see formerly removed nodes under “Recently Decommissioned Nodes”.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620923692/e678b21b-e2cc-4fe5-bd5b-46c4b0248958.png" alt="e678b21b-e2cc-4fe5-bd5b-46c4b0248958" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
<li><p><strong>Scale down the replicas</strong> in your Helm values file:</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">statefulset:</span>
   <span class="hljs-attr">replicas:</span> <span class="hljs-number">4</span>
 <span class="hljs-string">...</span>
</code></pre>
<p> Then run:</p>
<pre><code class="lang-bash"> helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
</li>
<li><p>Verify pods:</p>
<pre><code class="lang-bash"> kubectl get pods
</code></pre>
<p> You should now see 4 CockroachDB replica pods.</p>
</li>
<li><p><strong>Delete the PVC</strong> for the removed node (to avoid paying for storage you’re no longer using):</p>
</li>
</ol>
<pre><code class="lang-bash">kubectl delete pvc datadir-crdb-cockroachdb-4
</code></pre>
<ol start="11">
<li>Repeat the process for the next node if you want to go from 4 to 3 replicas: decommission node #4 next, scale to 3, delete its PVC, and so on.</li>
</ol>
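<p>The whole procedure above can be condensed into one script. This is a hedged sketch, not a drop-in tool: it assumes the release name <code>crdb</code>, the <code>cockroachdb-client</code> pod, and the service name used in this tutorial, and that node IDs map to pod ordinals in order (node 5 = pod <code>crdb-cockroachdb-4</code>), which you should verify with <code>node status</code> first:</p>
<pre><code class="lang-bash"># Hedged sketch: scale from 5 replicas down to 3, one node at a time
for node_id in 5 4; do
  pod_index=$((node_id - 1))
  # Decommission the node; this command waits until decommissioning finishes
  kubectl exec cockroachdb-client -- \
    ./cockroach node decommission "$node_id" --host crdb-cockroachdb-public --insecure
  # Drop the replica count by one
  helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml \
    --set statefulset.replicas="$pod_index"
  # Delete the disk of the removed pod so you stop paying for it
  kubectl delete pvc "datadir-crdb-cockroachdb-$pod_index"
done
</code></pre>
<p>In practice, you should still confirm each node shows as <code>decommissioned</code> in the Admin UI between iterations, as described above.</p>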
<p>After you’re done, you’ll have the target state (for example, 3 nodes) safely and cleanly without causing cluster instability or data loss.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762621007089/cf7fce07-a3a6-4b01-9536-1d5476c2119e.png" alt="Scaling down to 3 nodes, the nodes status on the CockroachDB dashboard" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>To learn more about scaling down your CockroachDB nodes, visit the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/scale-cockroachdb-kubernetes?filters=helm#remove-nodes">official CockroachDB docs</a>.</p>
<p>Note that you should <strong>NOT</strong> use Horizontal Pod Autoscalers for scaling up and down your CockroachDB cluster.</p>
<p>Remember, before scaling down, you need to <strong>DECOMMISSION THE NODES FIRST</strong>, and <strong>scale down ONE AT A TIME</strong>!</p>
<p>However, Horizontal Pod Autoscalers do NOT obey this. So if you intend to auto-scale your CockroachDB cluster, it's best to keep a fixed number of replicas, for example, 3, 5, or 7.</p>
<p>Then set up a Vertical Pod Autoscaler to scale their CPU and RAM (remember to set the memory and CPU requests and limits to the same values to prevent eviction, as explained earlier).</p>
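<p>As an illustration, a VerticalPodAutoscaler manifest targeting the StatefulSet might look roughly like this. This is a hedged sketch: it assumes the VPA custom resources are installed in your cluster and that the StatefulSet is named <code>crdb-cockroachdb</code> (check yours with <code>kubectl get statefulset</code>):</p>
<pre><code class="lang-yaml">apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: crdb-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: crdb-cockroachdb
  updatePolicy:
    updateMode: "Auto" # VPA evicts and recreates pods with updated resources
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 1
          memory: 2Gi
        maxAllowed:
          cpu: 4
          memory: 8Gi
</code></pre>
<p>Because the VPA preserves the request-to-limit ratio when it resizes pods, keeping requests equal to limits in your values file means the resized pods stay in that safe configuration too.</p>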
<h2 id="heading-what-to-consider-when-deploying-cockroachdb-on-google-kubernetes-engine-gke">What to Consider When Deploying CockroachDB on Google Kubernetes Engine (GKE) ☁️</h2>
<p>Up until now we’ve been working in a <strong>development environment</strong> (using Minikube, local setups), testing and learning.</p>
<p>Now we’re ready to move into <strong>production mode 🤓</strong>. And one of the best places to host CockroachDB in production is on GKE.</p>
<p>In this section, we’ll cover GKE-specific considerations, such as storage classes, load balancers, networking, and how to secure our CockroachDB cluster on GKE using mTLS for authenticating our clients and encrypting any data sent to and from our CockroachDB cluster.</p>
<h3 id="heading-creating-your-gke-cluster">Creating Your GKE Cluster</h3>
<p>To get started, head over to the <a target="_blank" href="https://console.cloud.google.com/"><strong>Google Cloud Console</strong></a>.</p>
<p>In the search bar at the top, type “Kubernetes” and click on “Kubernetes Engine” from the dropdown.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836788168/0d509529-69fb-4308-ba05-6a1426ee7fe1.png" alt="Searching the Kubernetes Engine resource" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You’ll be taken to the Kubernetes Engine page. On the left sidebar, click “Clusters.” Then click the “Create” button at the top.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836843514/fc6d59a2-5b9d-4dee-9fea-7bbb7fc2a023.png" alt="Creating a new cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>💡 <strong>Note:</strong> You’ll need to enable the <strong>Compute Engine API</strong> before you can create a GKE cluster. If you haven’t done that yet, Google Cloud will automatically redirect you to a page where you can enable it. Just click “Enable”, then return to the cluster page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763998084001/3ecbe47c-3def-4f9c-bc80-dabe2c0002c8.png" alt="Enabling the Compute Engine API" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You can also learn more about enabling APIs in Google Cloud here: <a target="_blank" href="https://docs.cloud.google.com/endpoints/docs/openapi/enable-api">Enable APIs in Google Cloud</a>.</p>
<p>Once you’re back, you’ll see the cluster creation page. If it defaults to Autopilot, click “Switch to Standard cluster” in the top-right corner. This gives you more control over node settings.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836938958/a2c35e79-6404-4c3a-a821-94d4ce926839.png" alt="Switching to Standard Cluster settings" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Under Cluster basics, give your cluster a name – something like <code>cockroachdb-tutorial</code> works great! Then, set Location type to Zonal (that’s fine for now).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836985443/eb7b1f79-66e3-4ca4-bfe3-842c5571509b.png" alt="Configuring Zonal clusters" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>On the left sidebar, go to “Node pools.” You’ll see a default pool already added.</p>
<ul>
<li><p>Keep the name as is.</p>
</li>
<li><p>Set the Number of nodes to 1.</p>
</li>
<li><p>Enable the Cluster autoscaler option (so it can scale up automatically later).</p>
</li>
<li><p>Set the Maximum number of Nodes to 10, and the minimum to 0.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762918866561/89a00b2c-46e8-440d-8662-77386cc2cf0e.png" alt="Modifying our default node pool, the cluster autoscaler, etc" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
</ul>
<p>Next, click the dropdown arrow beside “default-pool” and select “Nodes.” Here, set up your node specifications:</p>
<ul>
<li><p><strong>VM family:</strong> <code>E2</code></p>
</li>
<li><p><strong>Machine type:</strong> <code>Custom</code></p>
</li>
<li><p><strong>vCPUs:</strong> 2</p>
</li>
<li><p><strong>Memory:</strong> 7 GB</p>
</li>
<li><p><strong>Boot disk type:</strong> Standard persistent disk</p>
</li>
<li><p><strong>Disk size:</strong> 50 GB</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762837157043/89da8297-8ecc-4369-aef5-c3b0e75e37be.png" alt="Configuring the E2 Machine type" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762920102117/173a1d66-d31b-49e3-835b-436ec2781b49.png" alt="Configuring our default pool CPU, RAM, and disk" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
</ul>
<p>When all that’s set, click “Create.” Your cluster will start provisioning.</p>
<h3 id="heading-connecting-to-your-gke-cluster">Connecting to your GKE cluster</h3>
<p>Once your GKE cluster creation is complete (this might take a few minutes), you’ll see something like this in the console:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844143298/042cc870-82ae-4981-b7c8-d80b187f37a9.png" alt="Accessing out new cluster page" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Next, click the “Connect” link at the top of the page. A modal will pop up. Copy the CLI command you see.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844213835/119b603c-26c3-46ee-83e1-8feba78031a7.png" alt="Getting the command to access the cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>It’ll look something like:</p>
<pre><code class="lang-bash">gcloud container clusters get-credentials cockroachdb-tutorial --zone us-central1-a --project &lt;PROJECT_NAME&gt;
</code></pre>
<p>📌 <strong>Note:</strong> To run this command successfully, you need to have the <code>gcloud</code> CLI tool installed. If you don’t have it yet, visit <a target="_blank" href="https://docs.cloud.google.com/sdk/docs/install">Install Google Cloud SDK</a> and pick the steps for your OS.</p>
<p>After installing the <code>gcloud</code> CLI, run:</p>
<pre><code class="lang-bash">gcloud auth login
</code></pre>
<p>This authenticates your terminal with your Google Cloud account so you can access the cluster securely.</p>
<p>After authenticating your terminal with access to Google Cloud, run the command you copied earlier. You should see something like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844890936/12e6d8a7-b0ae-44d1-a77c-aeb118ba269b.png" alt="The command to connect your terminal to the newly created Kubernetes cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now run the command to retrieve your pods, <code>kubectl get po</code>. This will retrieve the pods from your new cluster on Google Kubernetes Engine, not Minikube.</p>
<p>We haven’t deployed anything yet, so the namespace should be empty.</p>
<p>But we should have at least 1 worker node available. Run the <code>kubectl get nodes</code> command to view it. You should see something similar to this (GKE takes care of our control plane for us, so when we view the nodes, we’ll only see the worker nodes).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762917947091/c29eb598-1723-43d0-a77f-c6611d04d3d8.png" alt="The available nodes in the GKE cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-deploying-cockroachdb-in-production-on-gke">Deploying CockroachDB in Production (on GKE)</h3>
<p>Now that we’ve successfully created our Google Kubernetes Engine (GKE) cluster, it’s time to deploy our CockroachDB cluster in it – this time, in production mode.</p>
<p>Unlike our earlier Minikube setup (which we used for local development), deploying to GKE introduces new considerations like security, storage classes, and authentication methods – all tailored for a real-world production environment.</p>
<p>To get started, create a new file called <code>cockroachdb-production.yml</code>, and paste the following configuration inside:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">serviceAccount:</span>
    <span class="hljs-attr">create:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"crdb-cockroachdb"</span>
    <span class="hljs-attr">annotations:</span>
      <span class="hljs-attr">iam.gke.io/gcp-service-account:</span> <span class="hljs-string">&lt;GOOGLE_SERVICE_ACCOUNT&gt;</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">10Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">premium-rwo</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app.kubernetes.io/component:</span> <span class="hljs-string">init</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Replace the placeholder <code>&lt;GOOGLE_SERVICE_ACCOUNT&gt;</code> with the <strong>CockroachDB backup service account</strong> you created earlier (in the “Backing Up CockroachDB to Google Cloud Storage” section). It should look something like this: <code>cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</code>.</p>
<h3 id="heading-understanding-the-configuration">Understanding the Configuration</h3>
<p>Let’s break down what’s happening in this production Helm values configuration and how it differs from the one we used in Minikube.👇🏽</p>
<h4 id="heading-1-modified-the-statefulset-configuration">1. Modified the <code>statefulset</code> Configuration</h4>
<p>We’re allocating 3 GiB of RAM and 1 vCPU to each replica, both as requests and limits.</p>
<p>This ensures that each node has guaranteed resources and prevents Kubernetes from evicting pods for using more than they requested.</p>
<p>We also defined a <strong>service account</strong> and annotated it with a GCP service account using the <code>iam.gke.io/gcp-service-account</code> annotation.</p>
<p>This annotation allows CockroachDB to securely access Google Cloud services (like Google Cloud Storage) without using static JSON key files (key.json), thanks to a GKE feature called <strong>Workload Identity</strong>.</p>
<p>In production, we let GKE handle authentication to Google services instead of mounting key files.</p>
<h4 id="heading-2-removed-podsecuritycontext">2. Removed <code>podSecurityContext</code></h4>
<p>In Minikube, we included this section:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">podSecurityContext:</span>
  <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
<span class="hljs-string">...</span>
</code></pre>
<p>We did that to give CockroachDB permission to access our local disk for persistent storage. But in GKE, this isn’t needed. Google Cloud handles storage mounting securely on our behalf, so we can safely omit this part.</p>
<h4 id="heading-3-removed-podantiaffinity-and-nodeselector">3. Removed <code>podAntiAffinity</code> and <code>nodeSelector</code></h4>
<p>In our Minikube deployment, we used:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">podAntiAffinity:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
<span class="hljs-attr">nodeSelector:</span>
  <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
<span class="hljs-string">...</span>
</code></pre>
<p>That was just to <strong>force all CockroachDB instances to run on the same node</strong> on Minikube.</p>
<p>But in production, we <em>want</em> each replica on a different VM. This ensures high availability, even if one VM fails, only one CockroachDB replica is affected, and the cluster stays active.</p>
<p>Since our cluster uses a replication factor of 3, at least 2 replicas (a quorum) must remain active for the database to stay online – otherwise, it becomes unavailable 🥲.</p>
<h4 id="heading-4-removed-env-volumes-and-volumemounts">4. Removed <code>env</code>, <code>volumes</code>, and <code>volumeMounts</code></h4>
<p>In Minikube, we had to manually mount the Service Account key:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">env:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
    <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>
<span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
    <span class="hljs-attr">secret:</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>
<span class="hljs-attr">volumeMounts:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
    <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
    <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This was needed so CockroachDB could access our Google Cloud Storage bucket for backups.</p>
<p>But in production, we don’t use key files. Instead, we use a GKE feature called Workload Identity.</p>
<p>It securely binds a Kubernetes Service Account to a Google Service Account, giving our CockroachDB pods the same permissions as the GCP account: no keys, no secrets, and much safer 🔒</p>
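<p>Note that the annotation alone isn’t enough: Workload Identity must be enabled on the GKE cluster, and the two accounts must actually be bound together on the Google Cloud side. As a hedged example (the placeholders and the <code>default</code> namespace are assumptions from this tutorial’s setup), the binding is created with:</p>
<pre><code class="lang-bash"># Allow the Kubernetes service account (default/crdb-cockroachdb)
# to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[default/crdb-cockroachdb]"
</code></pre>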
<h4 id="heading-5-updated-storagepersistentvolumestorageclass">5. Updated <code>storage.persistentVolume.storageClass</code></h4>
<p>In Minikube, we used a standard disk:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>
<span class="hljs-string">...</span>
</code></pre>
<p>But for production, we’re switching to a faster SSD:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">10Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">premium-rwo</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This uses Google Cloud’s <code>pd-ssd</code> disk type which is the recommended choice for CockroachDB due to its <strong>high IOPS</strong> (read/write operations per second) and <strong>throughput</strong>. This gives our cluster faster read and write speeds under load, leading to better performance.</p>
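<p>GKE ships <code>premium-rwo</code> out of the box, so you don’t need to create anything yourself. But for reference, an equivalent custom StorageClass would look roughly like this (a hedged sketch using the GKE Persistent Disk CSI driver; the name <code>fast-ssd</code> is made up):</p>
<pre><code class="lang-yaml">apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd # Google Cloud SSD persistent disk
volumeBindingMode: WaitForFirstConsumer # Provision the disk in the same zone as the pod
reclaimPolicy: Delete
</code></pre>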
<h4 id="heading-6-enabled-tls-for-secure-communication">6. Enabled TLS for Secure Communication</h4>
<p>In development, we disabled TLS:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>
</code></pre>
<p>That made it easier and simpler to connect without dealing with certificates.</p>
<p>But in production, security is non-negotiable. We’re enabling TLS to ensure that all communication with CockroachDB is encrypted in transit, and that only clients with <strong>valid certificates</strong> (signed by the same authority) can connect. This is <strong>mutual TLS (mTLS)</strong> authentication.</p>
<p>mTLS ensures that both sides (client and server) prove who they are, preventing impersonation or man-in-the-middle attacks. It’s one of the strongest ways to secure a production database connection.</p>
<p>To learn more about TLS and mTLS encryption, check out:</p>
<ul>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/understanding-website-encryption/">Understanding Website Encryption (freeCodeCamp)</a></p>
</li>
<li><p><a target="_blank" href="https://medium.com/@LukV/mutual-tls-mtls-a-deep-dive-into-secure-client-server-communication-bbb83f463292">Mutual TLS Deep Dive (Medium)</a></p>
</li>
</ul>
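<p>To give you a feel for what an mTLS connection looks like in practice, here’s a hedged example of connecting with the <code>cockroach</code> CLI through a PostgreSQL-style connection URL. It assumes the CA certificate, client certificate, and client key files already exist locally (the file names follow CockroachDB’s conventions for the <code>root</code> user):</p>
<pre><code class="lang-bash"># verify-full: encrypt traffic, verify the server's certificate,
# and present our own client certificate for mTLS
./cockroach sql --url "postgresql://root@localhost:26259/defaultdb?sslmode=verify-full&amp;sslrootcert=ca.crt&amp;sslcert=client.root.crt&amp;sslkey=client.root.key"
</code></pre>
<p>The same <code>sslmode</code>, <code>sslrootcert</code>, <code>sslcert</code>, and <code>sslkey</code> parameters work in most PostgreSQL-compatible clients, including GUI tools.</p>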
<h3 id="heading-installing-the-cockroachdb-cluster-on-gke">Installing the CockroachDB Cluster on GKE</h3>
<p>We’ll use the values file you created (<code>cockroachdb-production.yml</code>) and deploy our CockroachDB cluster in our GKE cluster using Helm.</p>
<h4 id="heading-deploy-the-cluster">Deploy the cluster</h4>
<p>Run the following command:</p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-production.yml
</code></pre>
<p>This command tells Helm to install a release named <code>crdb</code> using the <code>cockroachdb/cockroachdb</code> chart with your custom production-values file.</p>
<p>This step will take a few minutes. GKE will spin up 3 (or more) worker nodes to host the CockroachDB replicas.</p>
<p>Thanks to pod anti-affinity rules, you’ll typically see <strong>one replica pod per VM</strong> (which improves fault tolerance).</p>
<h4 id="heading-verify-the-pods">Verify the pods</h4>
<p>Once provisioning is done, check the pods:</p>
<pre><code class="lang-bash">kubectl get pods
</code></pre>
<p>You should see three CockroachDB replica pods (for example: <code>crdb-cockroachdb-0</code>, <code>crdb-cockroachdb-1</code>, <code>crdb-cockroachdb-2</code>) in <code>Running</code> status.</p>
<h4 id="heading-verify-the-storage-class-ssd">Verify the storage class (SSD)</h4>
<p>Now check the persistent volume claims to confirm they’re using the fast SSD storage class you requested:</p>
<pre><code class="lang-bash">kubectl get pvc
</code></pre>
<p>Look for your PVCs (persistent volume claims) and check the <code>STORAGECLASS</code> column. You should see something like <code>premium-rwo</code> instead of <code>standard</code> or <code>standard-rwo</code>. This confirms that your replicas are using the high-performance disk type you configured.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762928441524/d7e3d17f-c144-468f-8cc5-d71628ac6a3b.png" alt="The CockroachDB replicas and disks in production" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>📌 This is important, because in production you want good disk IOPS and throughput. Slower disks can bottleneck the database.</p>
<h3 id="heading-connecting-to-our-cockroachdb-cluster-now-that-tls-mtls-are-enabled">Connecting to Our CockroachDB Cluster (Now That TLS + mTLS Are Enabled)</h3>
<p>Now that we’ve enabled TLS encryption and mTLS authentication, let’s actually try connecting to the cluster so you can <em>see</em> what this security setup looks like in action.</p>
<p>We’ll break down in more detail what TLS and mTLS mean shortly. But for now, let’s jump straight into trying to connect – because once you see the behavior, the explanation becomes much easier to understand.</p>
<h4 id="heading-step-1-expose-the-cockroachdb-cluster-to-your-local-pc-using-port-forwarding">Step 1: Expose the CockroachDB Cluster to Your Local PC (Using Port Forwarding)</h4>
<p>Just like we've been doing from the start, we’ll expose our CockroachDB cluster through <strong>port-forwarding</strong>.</p>
<p>Open a new terminal window and run:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26259:26257
</code></pre>
<p>What this means:</p>
<ul>
<li><p>The first port (26259) is the port on your computer.</p>
</li>
<li><p>The second port (26257) is the port inside the CockroachDB cluster.</p>
</li>
<li><p>Format is: <code>&lt;YOUR_COMPUTER_PORT&gt;</code> <strong>:</strong> <code>&lt;COCKROACHDB_PORT&gt;</code></p>
</li>
</ul>
<p>So now, CockroachDB will be reachable locally at <code>localhost:26259</code>.</p>
<h4 id="heading-step-2-open-beekeeper-studio-and-create-a-fresh-connection">Step 2: Open Beekeeper Studio and Create a Fresh Connection</h4>
<p>If Beekeeper Studio is still connected to our old Minikube cluster, or you're not seeing the “new connection” screen, just press <code>Ctrl + Shift + N</code>. This opens a new connection window instantly.</p>
<h4 id="heading-step-3-enter-the-connection-details">Step 3: Enter the Connection Details</h4>
<p>Now fill in these fields:</p>
<ul>
<li><p><strong>Port:</strong> <code>26259</code></p>
</li>
<li><p><strong>User:</strong> <code>root</code></p>
</li>
<li><p><strong>Default Database:</strong> <code>defaultdb</code></p>
</li>
</ul>
<p>Now click Test Connection.</p>
<p>And boom! You should see a message telling you something like:</p>
<blockquote>
<p>“This cluster is running in secure mode. You must use SSL to connect.”</p>
</blockquote>
<p>It’ll look similar to this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763193779864/f3e7abcb-34b0-4c21-8652-48a03e4ff6c9.png" alt="Trying to connect to the new CockroachDB cluster in insecure mode" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This is good: it means our CockroachDB cluster is officially in <strong>secure mode</strong>, and it’s rejecting any connection that doesn’t include proper TLS certificates.</p>
<h3 id="heading-connecting-via-mutual-tls-mtls-why-we-need-a-certificate-for-our-root-user">Connecting via Mutual TLS (mTLS) — Why We Need a Certificate for Our <code>root</code> User</h3>
<p>Now that our CockroachDB cluster is officially running in secure mode, we can’t just connect to it with a username and port anymore. CockroachDB won’t accept that.</p>
<p>To talk to it, <strong>we must connect using Mutual TLS (mTLS)</strong>.</p>
<p>Why? Because TLS alone only protects the connection in one direction (you verifying the server). mTLS protects the connection in both directions (you verify the server, and the server also verifies <em>you</em>).</p>
<p>Let’s break this down in simple, everyday English 👇🏾</p>
<h4 id="heading-why-tls-exists-in-the-first-place">Why TLS Exists in the First Place</h4>
<p>Whenever you send anything to CockroachDB – a query, a connection attempt, a password – it’s all data moving over a network, such as the internet.</p>
<p>Without protection, anyone could intercept that traffic and read the data being sent to your DB while it’s on its way.<br>TLS fixes that. :)</p>
<p>✔️ The CockroachDB cluster has its own <strong>public key + private key</strong><br>✔️ It has a <strong>certificate</strong> that carries its public key<br>✔️ When you connect, the cluster sends you this certificate<br>✔️ Your database tool (for example, Beekeeper) uses the public key to set up the encryption for all the traffic you send to the DB<br>✔️ Only CockroachDB can decrypt it, with the help of its private key</p>
<p>This gives you encryption and proof you’re really talking to CockroachDB, not some fake service pretending to be it.</p>
<h4 id="heading-why-mtls-exists-mutual-tls">Why mTLS Exists (Mutual TLS)</h4>
<p>TLS protects the server – CockroachDB. mTLS protects <strong>both sides</strong> – you and CockroachDB.</p>
<p>So CockroachDB also wants YOU to send your certificate.</p>
<p>But not just any certificate. Your certificate must be:</p>
<ul>
<li><p>Signed by <strong>THE SAME Certificate Authority (CA)</strong></p>
</li>
<li><p>Trusted by the CockroachDB cluster</p>
</li>
<li><p>Mapped to a CockroachDB user (like <code>root</code>)</p>
</li>
</ul>
<p>This is how CockroachDB says:</p>
<blockquote>
<p>“Let me see your certificate so I know you’re someone I should allow in.”</p>
</blockquote>
<p>And we reply:</p>
<blockquote>
<p>“Here is my certificate, signed by the same CA that signed yours.”</p>
</blockquote>
<p>At that point, both sides trust each other.</p>
<p>If this still feels abstract, <a target="_blank" href="https://www.youtube.com/watch?v=EnY6fSng3Ew">watch this video</a>. It explains TLS beautifully.</p>
<h3 id="heading-lets-explore-our-clusters-certificate">Let’s Explore Our Cluster’s Certificate</h3>
<p>Remember that the Helm chart automatically created:</p>
<ul>
<li><p>The CockroachDB Certificate Authority</p>
</li>
<li><p>The CockroachDB node certificates</p>
</li>
<li><p>The keypairs used for encryption</p>
</li>
</ul>
<p>You can list all the CockroachDB-related Kubernetes secrets with:</p>
<pre><code class="lang-bash">kubectl get secrets
</code></pre>
<p>The one we're interested in is:</p>
<pre><code class="lang-bash">crdb-cockroachdb-node-secret
</code></pre>
<p>If you inspect this secret, you’ll see three keys inside:</p>
<ul>
<li><p><code>ca.crt</code>: the CA’s public certificate</p>
</li>
<li><p><code>tls.key</code>: the CockroachDB node’s private key</p>
</li>
<li><p><code>tls.crt</code>: the CockroachDB node certificate</p>
</li>
</ul>
<p>Now let’s decode the CockroachDB node certificate.</p>
<p>Run this:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-node-secret -o jsonpath=<span class="hljs-string">'{.data.tls\.crt}'</span> | base64 -d &gt; crdb-node.crt
</code></pre>
<p>This gives you the raw certificate (which looks like gibberish):</p>
<pre><code class="lang-bash">-----BEGIN CERTIFICATE-----
MIIEGDCCAwCgAwIBAgIQWgOPJa4OLoZZjcXLgDF3bjANBgkqhkiG9w0BAQsFADAr
...
-----END CERTIFICATE-----
</code></pre>
<p>Let’s decode it into something readable:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> ./crdb-node.crt -text -noout &gt; crdb-node.crt.decoded
</code></pre>
<p>Open the <code>crdb-node.crt.decoded</code> file. This is the <strong>human-readable</strong> CockroachDB cluster certificate.</p>
<p><strong>N.B.:</strong> You need the <code>openssl</code> tool installed to make the certificate human-readable. If you don’t have it, <a target="_blank" href="https://github.com/openssl/openssl#download">install it by following this guide</a>.</p>
<h3 id="heading-understanding-the-certificate-sections-explained-super-simply">Understanding the Certificate Sections (Explained Super Simply)</h3>
<h4 id="heading-1-issuer">1. Issuer</h4>
<p>You’ll see something like:</p>
<pre><code class="lang-bash">Issuer: O = Cockroach, CN = Cockroach CA
</code></pre>
<p>This tells us:</p>
<ul>
<li><p>The certificate was signed by a Certificate Authority created by the Helm chart</p>
</li>
<li><p>The <strong>Organization (O)</strong> is “Cockroach”</p>
</li>
<li><p>The <strong>Common Name (CN)</strong> is “Cockroach CA”</p>
</li>
</ul>
<p>This basically means:</p>
<blockquote>
<p>“This certificate comes from the CockroachDB internal CA.”</p>
</blockquote>
<h4 id="heading-2-subject">2. Subject</h4>
<p>You’ll also see this:</p>
<pre><code class="lang-bash">Subject: O = Cockroach, CN = node
</code></pre>
<p>What does this mean?</p>
<p><strong>Organization = Cockroach</strong></p>
<ul>
<li><p>This simply groups all CockroachDB-generated certificates under one “organization label.”</p>
</li>
<li><p>It doesn’t refer to the company. It’s just a logical grouping created by CockroachDB’s built-in toolset.</p>
</li>
</ul>
<p><strong>Common Name = node</strong></p>
<ul>
<li><p>This tells CockroachDB that this certificate belongs to a <strong>cluster node</strong>, not a user or a client machine.</p>
</li>
<li><p>In CockroachDB, node certificates are used for:</p>
<ol>
<li><p>DB-to-DB communication</p>
</li>
<li><p>cluster gossip</p>
</li>
<li><p>handling incoming connections from clients (you)</p>
</li>
</ol>
</li>
</ul>
<p>So this certificate is saying:</p>
<blockquote>
<p>“Hi, I’m a CockroachDB node. Please trust me as part of the cluster.”</p>
</blockquote>
<h4 id="heading-3-extended-key-usage-eku">3. Extended Key Usage (EKU)</h4>
<p>Scroll down and you’ll see:</p>
<pre><code class="lang-bash">X509v3 Extended Key Usage:
    TLS Web Server Authentication
    TLS Web Client Authentication
</code></pre>
<p>This is <em>super important</em>, because it defines <strong>how</strong> this certificate is allowed to be used.</p>
<p>Let’s simplify it:</p>
<h4 id="heading-tls-web-server-authentication">TLS Web Server Authentication</h4>
<p>This means:</p>
<blockquote>
<p>“This certificate can be presented <strong>by a server</strong> to prove its identity.”</p>
</blockquote>
<p>In our case, the CockroachDB node uses this certificate to prove to you (the client) that it is the real CockroachDB server. Think of it like flashing an ID card before letting you in.</p>
<h4 id="heading-tls-web-client-authentication">TLS Web Client Authentication</h4>
<p>This means:</p>
<blockquote>
<p>“This certificate can also be used <strong>as a client certificate</strong>.”</p>
</blockquote>
<p>Why would a server have a client certificate? Well, because in CockroachDB, nodes (DBs) talk to each other. When node A connects to node B, node A is a <strong>client</strong>, and node B is a <strong>server</strong>.</p>
<p>So the same certificate serves two roles. Your local machine will use a different certificate, created specifically for your <code>root</code> user. We’ll generate that soon.</p>
<h3 id="heading-creating-a-client-certificate-so-we-can-finally-connect-to-cockroachdb">Creating a Client Certificate (So We Can Finally Connect to CockroachDB)</h3>
<p>Now that we’ve seen how the CockroachDB node certificate works, let’s generate our client certificate – the one we’ll use to connect from Beekeeper Studio.</p>
<p>Remember: CockroachDB is running in secure mode, so it won’t accept any connection that doesn’t come with a valid, signed certificate.</p>
<p>To fix that, let’s build a tiny Kubernetes pod whose only job is to create a certificate for our <code>root</code> SQL user.</p>
<h4 id="heading-step-1-create-a-file-called-gen-root-certyml">Step 1: Create a File Called <code>gen-root-cert.yml</code></h4>
<p>Paste this into it:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gen-root-cert</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gen</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sh"</span>, <span class="hljs-string">"-ec"</span>]
      <span class="hljs-attr">args:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">|
          mkdir -p /out
</span>
          <span class="hljs-comment"># Copy the CockroachDB cluster Certificate Authority certificate file `ca.crt` (for Mutual TLS authentication)</span>
          <span class="hljs-string">cp</span> <span class="hljs-string">/ca/ca.crt</span> <span class="hljs-string">/out/ca.crt</span>

          <span class="hljs-comment"># Create the client certificate and key pair for the SQL user 'root' using the CockroachDB cluster Certificate Authority private key `ca.key`</span>
          <span class="hljs-string">/cockroach/cockroach</span> <span class="hljs-string">cert</span> <span class="hljs-string">create-client</span> <span class="hljs-string">root</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--certs-dir=/out</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--ca-key=/ca/ca.key</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--lifetime=5h</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--overwrite</span>

          <span class="hljs-comment"># List the generated files</span>
          <span class="hljs-string">ls</span> <span class="hljs-string">-al</span> <span class="hljs-string">/out</span>

          <span class="hljs-comment"># Keep the pod alive so we can kubectl cp the files</span>
          <span class="hljs-string">sleep</span> <span class="hljs-number">3600</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> { <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>, <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/ca</span>, <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span> }
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"500Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
</code></pre>
<p>So how does this work?</p>
<p>We previously mentioned that the Helm chart created a secret, <code>crdb-cockroachdb-ca-secret</code>.</p>
<p>This secret contains:</p>
<ul>
<li><p>The Certificate Authority public certificate</p>
</li>
<li><p>The private key (used for signing)</p>
</li>
<li><p>The CA metadata</p>
</li>
</ul>
<p>CockroachDB requires that the server certificate (node cert) and the client certificate (your root cert) be signed by <strong>THE SAME CA</strong>, because this is what ensures both sides trust each other.</p>
<p>So what do we do?</p>
<p>We mount the CA secret into the pod:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
    <span class="hljs-attr">secret:</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
</code></pre>
<p>This gives the pod access to:</p>
<ul>
<li><p><code>/ca/ca.crt</code>: CA public certificate</p>
</li>
<li><p><code>/ca/ca.key</code>: CA <em>private</em> key</p>
</li>
</ul>
<p>And with these, we can sign new client certificates inside the cluster.</p>
<p>The important command inside the pod:</p>
<pre><code class="lang-bash">/cockroach/cockroach cert create-client root \
  --certs-dir=/out \
  --ca-key=/ca/ca.key \
  --lifetime=5h \
  --overwrite
</code></pre>
<p>What this does:</p>
<ul>
<li><p>Generates a brand new public/private key pair for the <code>root</code> SQL user</p>
</li>
<li><p>Uses the CA private key to <strong>sign the client certificate</strong></p>
</li>
<li><p>Places everything inside <code>/out</code></p>
</li>
<li><p>Makes the certificate valid for <strong>5 hours</strong></p>
</li>
</ul>
<p>If we passed <code>demo</code> instead of <code>root</code>, then the certificate CN would be <code>demo</code>, and CockroachDB would treat anyone using that certificate as the <code>demo</code> SQL user.</p>
<p>That’s how CockroachDB identifies and authenticates users when running in secure mode.</p>
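<p>If the “CN = SQL user” idea still feels abstract, you can reproduce it in isolation with plain <code>openssl</code> – using a throwaway demo CA, not your cluster’s CA. This sketch mimics what <code>cockroach cert create-client root</code> does under the hood:</p>

```shell
# 1. Create a throwaway Certificate Authority (self-signed).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout demo-ca.key -out demo-ca.crt -subj "/O=Cockroach/CN=Cockroach CA"

# 2. Create a client key + certificate signing request with CN = root.
openssl req -newkey rsa:2048 -nodes \
  -keyout demo-client.key -out demo-client.csr -subj "/O=Cockroach/CN=root"

# 3. Sign the client certificate with the demo CA's private key.
openssl x509 -req -in demo-client.csr -CA demo-ca.crt -CAkey demo-ca.key \
  -CAcreateserial -days 1 -out demo-client.crt

# 4. The Subject CN is the identity a server would authenticate you as.
openssl x509 -in demo-client.crt -noout -subject
```

<p>The last command should show <code>CN = root</code> in the subject – exactly the field CockroachDB reads to decide which SQL user you are.</p>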
<h4 id="heading-step-2-deploy-the-pod">Step 2: Deploy the Pod</h4>
<p>Run:</p>
<pre><code class="lang-bash">kubectl apply -f gen-root-cert.yml
</code></pre>
<p>Give it a minute to start and generate the files.</p>
<h4 id="heading-step-3-copy-the-certificates-to-your-local-pc">Step 3: Copy the Certificates to Your Local PC</h4>
<p>We need three files:</p>
<ul>
<li><p><code>client.root.crt</code>: client certificate</p>
</li>
<li><p><code>client.root.key</code>: private key</p>
</li>
<li><p><code>ca.crt</code>: CA certificate</p>
</li>
</ul>
<p>Copy them from the pod to your machine:</p>
<pre><code class="lang-bash">kubectl cp default/gen-root-cert:/out/client.root.crt ./client.root.crt
kubectl cp default/gen-root-cert:/out/client.root.key ./client.root.key
kubectl cp default/gen-root-cert:/out/ca.crt ./ca.crt
</code></pre>
<p>Now your folder should contain:</p>
<pre><code class="lang-bash">client.root.crt
client.root.key
ca.crt
</code></pre>
<p>These are the files Beekeeper Studio needs for mTLS.</p>
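<p>For reference, these same three files map directly onto a standard PostgreSQL-style connection URL, which most app drivers (and the <code>cockroach</code> CLI) accept. A sketch, assuming the forwarded port and file locations from above:</p>

```shell
# verify-full: encrypt the connection AND verify the server cert against ca.crt.
CERTS_DIR="$PWD"
DB_URL="postgresql://root@localhost:26259/defaultdb?sslmode=verify-full&sslrootcert=${CERTS_DIR}/ca.crt&sslcert=${CERTS_DIR}/client.root.crt&sslkey=${CERTS_DIR}/client.root.key"
echo "$DB_URL"
```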
<h4 id="heading-step-4-decode-the-client-certificate-just-like-we-did-for-the-node-certificate">Step 4: Decode the Client Certificate (Just Like We Did for the Node Certificate)</h4>
<p>Run:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> client.root.crt -text -noout &gt; crdb-root.crt.decoded
</code></pre>
<p>Open the <code>crdb-root.crt.decoded</code> file and look at the contents.</p>
<h4 id="heading-understanding-the-client-certificate">Understanding the Client Certificate</h4>
<ol>
<li><strong>Issuer</strong></li>
</ol>
<p>You'll see <code>Issuer: O = Cockroach, CN = Cockroach CA</code></p>
<p>This is the same Issuer as the CockroachDB node certificate.</p>
<p>This confirms that both certificates were signed by the <em>same</em> Certificate Authority, that they trust each other, and that mTLS will work perfectly.</p>
<ol start="2">
<li><strong>Subject</strong></li>
</ol>
<p>You’ll see: <code>Subject: O = Cockroach, CN = root</code></p>
<p>This means that the Organization is just a label grouping CockroachDB identities, and that the Common Name is <code>root</code>. This is VERY important.</p>
<p>The CN of a client certificate literally tells CockroachDB:</p>
<blockquote>
<p>“This connection belongs to the SQL user named <code>root</code>.”</p>
</blockquote>
<p>If CN was <code>demo</code>, CockroachDB would authenticate you as the <code>demo</code> SQL user.</p>
<h4 id="heading-extended-key-usage-eku">Extended Key Usage (EKU)</h4>
<p>You should see: <code>TLS Web Client Authentication</code>.</p>
<p>This is exactly what we want. It tells CockroachDB:</p>
<blockquote>
<p>“This certificate is only for clients connecting to the database.”</p>
</blockquote>
<p>Unlike node certificates, you will NOT see: <code>TLS Web Server Authentication</code>.</p>
<p>Why?</p>
<p>Because:</p>
<ul>
<li><p><strong>Server Authentication</strong> = for certificates the SERVER SHOWS TO THE CLIENT. For example: CockroachDB nodes proving they are legitimate.</p>
</li>
<li><p><strong>Client Authentication</strong> = for certificates THE CLIENT SENDS TO THE SERVER. For example: You proving you are the real <code>root</code> user.</p>
</li>
</ul>
<h4 id="heading-why-your-client-certificate-cannot-be-used-as-a-server-certificate">Why your client certificate <strong>cannot</strong> be used as a server certificate</h4>
<p>Because a server certificate says:</p>
<blockquote>
<p>“Trust me, I AM the CockroachDB server.”</p>
</blockquote>
<p>But your client certificate says:</p>
<blockquote>
<p>“Trust me, I am an authenticated user.”</p>
</blockquote>
<p>Two very different identities. And CockroachDB will <em>reject</em> any certificate used in the wrong role.</p>
<p>So having only TLS Web Client Authentication in your certificate is perfect for our use case. :)</p>
<h3 id="heading-connecting-to-our-cockroachdb-cluster-securely-using-mtls">Connecting to Our CockroachDB Cluster Securely (Using mTLS)</h3>
<p>Now that we’ve successfully generated the certificates and key pairs we need, it's time to use them to securely connect to our CockroachDB cluster from Beekeeper Studio.</p>
<p>Remember: CockroachDB is running in secure mode, so without these certificates, it will <em>reject all incoming connections</em>, even if you enter the correct username and password.</p>
<p>Let’s walk through the steps. 👇🏾</p>
<h4 id="heading-step-1-make-sure-port-forwarding-is-still-running">Step 1: Make Sure Port Forwarding Is Still Running</h4>
<p>Before connecting, ensure that your CockroachDB cluster is still exposed to your PC.</p>
<p>If you already closed the previous terminal window, simply re-run this:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26259:26257
</code></pre>
<p>This makes your CockroachDB node reachable at: <code>localhost:26259</code>. If this step isn’t active, <em>Beekeeper Studio will not be able to connect</em>.</p>
<h4 id="heading-step-2-open-beekeeper-studio-and-set-up-the-connection">Step 2: Open Beekeeper Studio and Set Up the Connection</h4>
<p>Launch Beekeeper Studio and open a fresh connection window (<code>Ctrl + Shift + N</code> if needed).</p>
<p>Now fill in the fields like this:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Connection Type</strong></td><td>CockroachDB</td></tr>
<tr>
<td><strong>Host</strong></td><td><code>localhost</code></td></tr>
<tr>
<td><strong>Port</strong></td><td><code>26259</code></td></tr>
<tr>
<td><strong>User</strong></td><td><code>root</code></td></tr>
<tr>
<td><strong>Default Database</strong></td><td><code>defaultdb</code></td></tr>
</tbody>
</table>
</div><p>Now enable the <strong>“Enable SSL”</strong> option. Once enabled, expand the SSL section and set the following three fields:</p>
<ul>
<li><p><strong>CA Cert:</strong> Set this to the location of: <code>ca.crt</code>. This is the root Certificate Authority file you copied earlier using: <code>kubectl cp default/gen-root-cert:/out/ca.crt ./ca.crt</code>. It should still be in your project’s root directory (for example, <code>cockroachdb-tutorial/</code>).</p>
</li>
<li><p><strong>Certificate:</strong> Set this to the location of: <code>client.root.crt</code></p>
</li>
<li><p><strong>Key File:</strong> Set this to the location of: <code>client.root.key</code></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763389469459/bbdb17c5-1c3b-4163-932f-3cd5382160f4.png" alt="Connecting to the CockroachDB cluster from Beekeeper Studio in &quot;Secure&quot; mode" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-step-3-click-connect">Step 3: Click “Connect”</h4>
<p>Once all the fields are set properly, click <strong>Connect</strong>.</p>
<p>If everything was done correctly, you should now be connected to your CockroachDB cluster securely over Mutual TLS.</p>
<p>If the connection fails:</p>
<ul>
<li><p>Double-check your certificate paths</p>
</li>
<li><p>Ensure port-forwarding is running</p>
</li>
<li><p>Verify the user is <code>root</code></p>
</li>
<li><p>Confirm the selected connection type is <code>CockroachDB</code></p>
</li>
</ul>
<h4 id="heading-step-4-run-your-first-secure-query">Step 4: Run Your First Secure Query</h4>
<p>Now that you're connected, let’s verify everything works by running:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> <span class="hljs-keyword">users</span>;
</code></pre>
<p>You should see two users automatically created by CockroachDB:</p>
<ul>
<li><p><strong>admin</strong></p>
</li>
<li><p><strong>root</strong></p>
</li>
</ul>
<p>In the next subsection, we’ll create a <strong>new SQL user</strong> and generate a certificate for that user (just like we did for the <code>root</code> user) so you’ll understand how CockroachDB handles user authentication in production environments.</p>
<h3 id="heading-restoring-our-previous-database-into-the-new-gke-cockroachdb-cluster-without-sa-keys">Restoring Our Previous Database into the New GKE CockroachDB Cluster (without SA keys)</h3>
<p>Now that our CockroachDB cluster is up and running on GKE – fully secured with TLS encryption and mTLS authentication – it’s time to bring back the data from our previous setup.</p>
<p>Remember how we backed up our CockroachDB database (running on Minikube) to Google Cloud Storage?</p>
<p>Well, now we’re going to restore that same backup into our new production cluster on GKE. But before CockroachDB can access our bucket, we must give it permission – securely.</p>
<p>And here’s the cool part: <strong>we don’t need to use Service Account keys anymore.</strong></p>
<h4 id="heading-why-we-dont-need-service-account-keys-on-gke">Why We Don’t Need Service Account Keys on GKE</h4>
<p>Earlier, in the backup section, we generated a Service Account key on our PC and mounted it into our Minikube cluster.</p>
<p>But for GKE, we intentionally left out the following fields in our <code>cockroachdb-production.yml</code>:</p>
<ul>
<li><p><code>env</code></p>
</li>
<li><p><code>volumes</code></p>
</li>
<li><p><code>volumeMounts</code></p>
</li>
</ul>
<p>The reason? GKE supports something called <strong>Workload Identity</strong>.</p>
<p>Workload Identity lets us securely connect Kubernetes Service Accounts (KSAs) to Google Cloud Service Accounts (GSAs), without storing or mounting any secret keys. The authentication happens “implicitly” thanks to Google’s metadata server.</p>
<p>💡 Workload Identity works easily when your cluster is running on GKE. It’s more complex to set up on Minikube, Kind, EKS, AKS, or any other non-GKE cluster.</p>
<h4 id="heading-step-1-linking-the-google-service-account-to-our-kubernetes-service-account">Step 1: Linking the Google Service Account to Our Kubernetes Service Account</h4>
<p>We already touched on this when deploying our cluster, but let’s look at the specific line again.</p>
<p>Open your <code>cockroachdb-production.yml</code> Helm values file and scroll to the <code>serviceAccount</code> section. You should see something like this:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">serviceAccount:</span>
    <span class="hljs-attr">create:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"crdb-cockroachdb"</span>
    <span class="hljs-attr">annotations:</span>
      <span class="hljs-attr">iam.gke.io/gcp-service-account:</span> <span class="hljs-string">cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Replace the <code>&lt;PROJECT_ID&gt;</code> placeholder with your real Google Cloud project ID.</p>
<p>If you’re unsure of the ID, go to Google Cloud Console, then to IAM &amp; Admin, and finally to Service Accounts. Search for <code>cockroachdb-backup</code> and copy the project ID from there.</p>
<p>This annotation instructs GKE to automatically authenticate our CockroachDB pods as the <code>cockroachdb-backup</code> Google Service Account – no keys needed.</p>
<h4 id="heading-step-2-binding-ksa-gsa-using-workload-identity">Step 2: Binding KSA ↔️ GSA Using Workload Identity</h4>
<p>Annotating the Service Account isn’t enough. We still need to explicitly allow our KSA to “impersonate” the GSA.</p>
<p>Run this command to set the active project:</p>
<pre><code class="lang-bash">gcloud config <span class="hljs-built_in">set</span> project &lt;PROJECT_ID&gt;
</code></pre>
<p>Now, apply the IAM policy binding:</p>
<pre><code class="lang-bash">gcloud iam service-accounts add-iam-policy-binding \
  &lt;GOOGLE_SERVICE_ACCOUNT&gt; \
  --role roles/iam.workloadIdentityUser \
  --member <span class="hljs-string">"serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[&lt;NAMESPACE&gt;/&lt;KUBERNETES_SERVICE_ACCOUNT&gt;]"</span>
</code></pre>
<p>Replace the placeholders with:</p>
<ul>
<li><p><code>&lt;GOOGLE_SERVICE_ACCOUNT&gt;</code> with <code>cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</code></p>
</li>
<li><p><code>&lt;PROJECT_ID&gt;</code> with your GCP project ID</p>
</li>
<li><p><code>&lt;NAMESPACE&gt;</code> with where CockroachDB runs (<code>default</code>)</p>
</li>
<li><p><code>&lt;KUBERNETES_SERVICE_ACCOUNT&gt;</code> with <code>crdb-cockroachdb</code></p>
</li>
</ul>
<p>After a few seconds, you should see something like:</p>
<pre><code class="lang-yaml"><span class="hljs-string">Updated</span> <span class="hljs-string">IAM</span> <span class="hljs-string">policy</span> <span class="hljs-string">for</span> <span class="hljs-string">serviceAccount</span> [<span class="hljs-string">cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</span>]<span class="hljs-string">.</span>
<span class="hljs-attr">bindings:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">members:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[default/crdb-cockroachdb]</span>
  <span class="hljs-attr">role:</span> <span class="hljs-string">roles/iam.workloadIdentityUser</span>
<span class="hljs-attr">etag:</span> <span class="hljs-string">***</span>
<span class="hljs-attr">version:</span> <span class="hljs-number">1</span>
</code></pre>
<p>Perfect. Your KSA can now access Google Cloud Storage automatically.</p>
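<p>Before running the restore, you can optionally sanity-check the binding with a short-lived pod that uses the same KSA and asks Google who it is. This is only a hedged sketch – the <code>wi-test</code> name and the <code>google/cloud-sdk:slim</code> image are illustrative choices, not part of the Helm chart:</p>

```yaml
# wi-test.yml -- a one-off pod to confirm Workload Identity is wired up
apiVersion: v1
kind: Pod
metadata:
  name: wi-test
spec:
  restartPolicy: Never
  serviceAccountName: crdb-cockroachdb # the KSA we annotated earlier
  containers:
    - name: test
      image: google/cloud-sdk:slim
      # With Workload Identity working, this should list the
      # cockroachdb-backup Google Service Account as the active account.
      command: ["gcloud", "auth", "list"]
```

<p>Apply it with <code>kubectl apply -f wi-test.yml</code>, check the output with <code>kubectl logs wi-test</code>, then clean up with <code>kubectl delete pod wi-test</code>.</p>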
<h3 id="heading-restoring-our-previous-database-from-google-cloud-storage">Restoring Our Previous Database from Google Cloud Storage</h3>
<p>Now that authentication is set up, let’s restore the backup we previously created in the Minikube cluster.</p>
<p>Open Beekeeper Studio and reconnect to your CockroachDB cluster (the one running on GKE).</p>
<p>Before restoring anything, let’s check if the <code>books</code> table exists:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>You should see an error saying the table doesn’t exist. Don’t worry, that’s expected.</p>
<h3 id="heading-now-lets-restore-the-data">Now, Let’s Restore the Data 🎉</h3>
<p>Run this command:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">RESTORE</span> <span class="hljs-keyword">FROM</span> LATEST <span class="hljs-keyword">IN</span> <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>;
</code></pre>
<p>Replace <code>&lt;BUCKET_NAME&gt;</code> with the name of the bucket you created earlier (for example: <code>cockroachdb-backup-7gw8u</code>).</p>
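<p>Note that <code>RESTORE FROM LATEST IN ...</code> restores the entire cluster backup. CockroachDB also supports more granular restores from the same backup collection – for example, restoring a single table (the target table must not already exist). A hedged sketch:</p>

```sql
-- Restore only the books table from the latest backup in the collection
RESTORE TABLE defaultdb.books
  FROM LATEST IN 'gs://<BUCKET_NAME>/cluster?AUTH=implicit';
```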
<p>CockroachDB will now:</p>
<ul>
<li><p>Authenticate using Workload Identity</p>
</li>
<li><p>Find the latest backup inside your bucket</p>
</li>
<li><p>Restore all tables, schemas, and data into your new GKE cluster</p>
</li>
</ul>
<p>After a couple of minutes, you should get a Success message.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763393752870/f95d76c0-3722-491a-a97c-a1b8a79bdc79.png" alt="Successfully restored CockroachDB database" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, run the query again:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! Your books from the Minikube cluster should now appear inside the new CockroachDB cluster running on GKE 😃.</p>
<h3 id="heading-connecting-to-the-database-with-a-new-user">Connecting to the Database with a New User</h3>
<p>So far, we’ve been connecting to our CockroachDB cluster using the <code>root</code> user. While this is super convenient for tutorials, it’s not recommended for real apps.</p>
<p>This is because the <code>root</code> user has advanced privileges – basically, full access to your entire cluster. If an attacker got hold of these credentials, or your application was compromised, they could do <strong>A LOT</strong> of damage. 😬</p>
<p>Instead, it’s best practice to create a user with <strong>limited permissions</strong> for your apps. This way, even if the user is compromised, the damage is contained.</p>
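<p>“Limited permissions” in practice means granting only what the app needs. Here’s a hedged sketch – the <code>app_user</code> name and the specific grants are illustrative, not part of this tutorial’s setup:</p>

```sql
-- A hypothetical app user that can read and write books, and nothing else:
CREATE USER app_user WITH PASSWORD 'change-me';
GRANT SELECT, INSERT, UPDATE ON TABLE defaultdb.public.books TO app_user;
```

<p>If <code>app_user</code> is ever compromised, the attacker can touch the <code>books</code> table, but can’t drop databases, read other tables, or create new admin users.</p>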
<h4 id="heading-authentication-options-for-users">Authentication Options for Users</h4>
<p>CockroachDB is flexible when it comes to authentication:</p>
<ol>
<li><p><strong>Password Authentication:</strong> Create a user with a password and connect using just username + password (no client certificates required).</p>
</li>
<li><p><strong>Passwordless / Mutual TLS Authentication:</strong> Create a user without a password, then connect using client certificates signed by the same CA (like we did for <code>root</code>).</p>
</li>
<li><p><strong>Both Password + Mutual TLS:</strong> Create a user with a password and also connect using client certificates. This adds an extra layer of security.</p>
</li>
</ol>
<p>In this subsection, we’ll start simple and use password authentication.</p>
<h4 id="heading-step-1-create-the-new-user">Step 1: Create the New User</h4>
<p>Open your current connection in Beekeeper Studio (signed in as <code>root</code>) and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> password_auth <span class="hljs-keyword">WITH</span> <span class="hljs-keyword">PASSWORD</span> <span class="hljs-string">'supersecret'</span>;
</code></pre>
<p>You should see a message confirming the user was created successfully.</p>
<h4 id="heading-step-2-connect-as-the-new-user">Step 2: Connect as the New User</h4>
<p>Open a new Beekeeper Studio window (Ctrl + Shift + N). <strong>DO NOT</strong> exit/close the old window, as we’ll need it later.</p>
<p>Fill in the connection fields:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Connection Type</strong></td><td>CockroachDB</td></tr>
<tr>
<td><strong>Host</strong></td><td><code>localhost</code></td></tr>
<tr>
<td><strong>Port</strong></td><td><code>26259</code></td></tr>
<tr>
<td><strong>Database</strong></td><td><code>defaultdb</code></td></tr>
<tr>
<td><strong>User</strong></td><td><code>password_auth</code></td></tr>
<tr>
<td><strong>Password</strong></td><td><code>huh</code> (for now, we’ll try a wrong password to see it fail)</td></tr>
</tbody>
</table>
</div><p>Click Connect.</p>
<p>❌ You’ll see an error about SSL connection being required.</p>
<p>Even though we’re connecting with a password instead of certificates, <strong>enabling SSL is still important</strong>. It encrypts the data between Beekeeper Studio and CockroachDB.</p>
<p>Without it, sensitive info like passwords and queries could be intercepted (man-in-the-middle attacks).</p>
<h4 id="heading-step-3-enable-ssl-amp-ca-verification">Step 3: Enable SSL &amp; CA Verification</h4>
<ul>
<li><p>Tick <strong>Enable SSL</strong></p>
</li>
<li><p>Click the <strong>CA Cert</strong> field and select the <code>ca.crt</code> file in your project root (<code>cockroachdb-tutorial/</code>)</p>
</li>
</ul>
<p>This ensures that Beekeeper Studio verifies it’s really talking to our CockroachDB cluster and protects against attackers trying to intercept the connection.</p>
<p>Now, click Connect again.</p>
<p>❌ Initially, you’ll still see a <strong>Password authentication failed</strong> error because we intentionally entered the wrong password.</p>
<h4 id="heading-step-4-connect-with-the-correct-password">Step 4: Connect With the Correct Password</h4>
<p>Replace the password with <code>supersecret</code>, then click Connect.</p>
<p>You are now signed in as the <code>password_auth</code> user!</p>
<h4 id="heading-step-5-check-permissions">Step 5: Check Permissions</h4>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>❌ You should see an error stating that <code>password_auth</code> does not have permission to access the <code>books</code> table.</p>
<p>This is expected, as it confirms that our limited-access user can <strong>only access what we explicitly grant it</strong>. Even if compromised, the attacker can’t modify our entire database.</p>
<h4 id="heading-step-6-granting-access-to-specific-tables">Step 6: Granting Access to Specific Tables</h4>
<p>To allow <code>password_auth</code> to work with the <code>books</code> table, switch back to the <code>root</code> connection Beekeeper Studio window and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">USAGE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">SCHEMA</span> defaultdb.public <span class="hljs-keyword">TO</span> password_auth;
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">SELECT</span>, <span class="hljs-keyword">INSERT</span>, <span class="hljs-keyword">UPDATE</span>, <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">TABLE</span> defaultdb.public.books <span class="hljs-keyword">TO</span> password_auth;
</code></pre>
<p>This gives the user read and write access to the <code>books</code> table only.</p>
<h4 id="heading-step-7-verify-the-new-user-access">Step 7: Verify the New User Access</h4>
<p>Go back to the Beekeeper Studio window where you’re signed in as <code>password_auth</code> and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! You should now see the list of books from your restored database.</p>
<p>Our new user is fully functional with <strong>limited privileges</strong>, making it safe for use in real applications.</p>
<h3 id="heading-connecting-with-passwordless-authentication-mutual-tls">Connecting with Passwordless Authentication (Mutual TLS)</h3>
<p>We’ve already seen how to connect to the database using a user that authenticates with a password, and without any client certificates.</p>
<p>Now, let’s look at the opposite scenario: passwordless authentication via Mutual TLS (mTLS).</p>
<p>This is one of the strongest forms of authentication because instead of a password, the database verifies you using a <strong>cryptographically signed certificate</strong>.</p>
<p>Let’s walk through it.</p>
<h4 id="heading-step-1-create-the-mtlsauth-user">Step 1: Create the <code>mtls_auth</code> User</h4>
<p>Navigate back to the Beekeeper Studio window where you're currently signed in as the <code>root</code> user. Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> mtls_auth;
</code></pre>
<p>You should see a success message confirming that the user has been created.</p>
<p><strong>N.B.:</strong> If this query fails, there’s a good chance your <code>root</code> client certificate has expired. Remember that we set a <strong>5-hour lifetime</strong> when generating it earlier.</p>
<p>If this happens, delete the certificate-generation pod:</p>
<pre><code class="lang-bash">kubectl delete po/gen-root-cert
</code></pre>
<p>Then re-apply the <code>gen-root-cert.yml</code> manifest. Copy the newly generated <code>client.root.crt</code>, <code>client.root.key</code>, and <code>ca.crt</code> back to your PC. Then try creating the user again.</p>
<h4 id="heading-step-2-attempt-signing-in-as-mtlsauth-expect-failure">Step 2: Attempt Signing In as <code>mtls_auth</code> (Expect Failure)</h4>
<p>Open a new Beekeeper Studio window (Ctrl + Shift + N).</p>
<p>Try filling in the connection settings using:</p>
<ul>
<li><p>User: <code>mtls_auth</code></p>
</li>
<li><p>SSL enabled</p>
</li>
<li><p>CA Cert: <code>ca.crt</code></p>
</li>
<li><p>Client Cert: <code>client.root.crt</code></p>
</li>
<li><p>Client Key: <code>client.root.key</code></p>
</li>
</ul>
<p>Click Connect.</p>
<p>You’ll see an error message similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763444971964/93f41787-425b-4e36-86da-4b688cef672f.png" alt="Connecting as the mtls_auth user with the wrong certificate and key-pair" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Why does this fail?</p>
<ol>
<li><p>The user has no password, so password login is impossible.</p>
</li>
<li><p>You’re using the <em>root</em> certificate, not a certificate belonging to <code>mtls_auth</code>. CockroachDB is strict: each user must authenticate using <em>their own</em> certificate.</p>
</li>
</ol>
<p>So let's fix that by generating a new certificate + key pair for the <code>mtls_auth</code> user.</p>
<h4 id="heading-step-3-create-certificate-key-for-mtlsauth">Step 3: Create Certificate + Key for <code>mtls_auth</code></h4>
<p>Just like we generated certificates for the <code>root</code> user earlier, we’ll do the same for <code>mtls_auth</code>.</p>
<p>Create a new manifest named <code>gen-mtls_auth-cert.yml</code>.</p>
<p>Paste in this content:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gen-mtls-auth-cert</span> 
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span> 
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gen</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sh"</span>, <span class="hljs-string">"-ec"</span>]
      <span class="hljs-attr">args:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">|
          mkdir -p /out
</span>
          <span class="hljs-comment"># Copy the CA certificate</span>
          <span class="hljs-string">cp</span> <span class="hljs-string">/ca/ca.crt</span> <span class="hljs-string">/out/ca.crt</span>

          <span class="hljs-comment"># Create the client certificate and key pair for user 'mtls_auth'</span>
          <span class="hljs-string">/cockroach/cockroach</span> <span class="hljs-string">cert</span> <span class="hljs-string">create-client</span> <span class="hljs-string">mtls_auth</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--certs-dir=/out</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--ca-key=/ca/ca.key</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--lifetime=5h</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--overwrite</span>

          <span class="hljs-comment"># List generated files</span>
          <span class="hljs-string">ls</span> <span class="hljs-string">-al</span> <span class="hljs-string">/out</span>

          <span class="hljs-comment"># Keep pod alive for kubectl cp</span>
          <span class="hljs-string">sleep</span> <span class="hljs-number">3600</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> { <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>, <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/ca</span>, <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span> }
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"500Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
</code></pre>
<p>Apply this file, wait for the pod to start, then copy the generated files:</p>
<pre><code class="lang-bash">kubectl cp default/gen-mtls-auth-cert:/out/client.mtls_auth.crt ./client.mtls_auth.crt 
kubectl cp default/gen-mtls-auth-cert:/out/client.mtls_auth.key ./client.mtls_auth.key
kubectl cp default/gen-mtls-auth-cert:/out/ca.crt ./ca.crt
</code></pre>
<p>Now we have the correct certificate + key pair for our new user.</p>
<h4 id="heading-step-4-connect-as-mtlsauth">Step 4: Connect as <code>mtls_auth</code></h4>
<p>Go back to the new Beekeeper Studio window and update the SSL fields:</p>
<ul>
<li><p><strong>CA Cert:</strong> <code>ca.crt</code></p>
</li>
<li><p><strong>Certificate:</strong> <code>client.mtls_auth.crt</code></p>
</li>
<li><p><strong>Key File:</strong> <code>client.mtls_auth.key</code></p>
</li>
</ul>
<p>Click Connect.</p>
<p>This time, the connection should succeed instantly.</p>
<h4 id="heading-step-5-inspect-the-certificate">Step 5: Inspect the Certificate</h4>
<p>To understand how CockroachDB links certificates to users, decode the certificate:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> client.mtls_auth.crt -text -noout &gt; client.mtls_auth.crt.decoded
</code></pre>
<p>Open the file, scroll to the Subject field, and you’ll see:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">Subject:</span> <span class="hljs-string">O</span> <span class="hljs-string">=</span> <span class="hljs-string">Cockroach,</span> <span class="hljs-string">CN</span> <span class="hljs-string">=</span> <span class="hljs-string">mtls_auth</span>
<span class="hljs-string">...</span>
</code></pre>
<p>The <code>CN</code> (Common Name) is the username CockroachDB uses to authenticate the session.</p>
<p>This is how CockroachDB knows you’re connecting as the <code>mtls_auth</code> user without any password at all. :)</p>
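<p>As a quick illustration, here’s a tiny Python sketch (a hypothetical helper, not part of the tutorial’s app) that pulls the CN out of a decoded <code>Subject</code> line like the one above:</p>
<pre><code class="lang-python"># Sketch: extract the CN (the CockroachDB username) from an
# `openssl x509 -text` Subject line. Illustrative helper only.
import re

def common_name(subject_line):
    """Return the CN value from a line like 'Subject: O = Cockroach, CN = mtls_auth'."""
    match = re.search(r"CN\s*=\s*([^,\s]+)", subject_line)
    return match.group(1) if match else None

print(common_name("Subject: O = Cockroach, CN = mtls_auth"))  # mtls_auth
</code></pre>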
<h4 id="heading-step-6-try-reading-the-books-table">Step 6: Try Reading the Books Table</h4>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>❌ You’ll get a permission error, just like we did earlier with the <code>password_auth</code> user.</p>
<p>This is expected because <code>mtls_auth</code> has <em>no</em> privileges yet. Perfect!</p>
<h4 id="heading-step-7-grant-permissions-to-mtlsauth">Step 7: Grant Permissions to <code>mtls_auth</code></h4>
<p>Switch to the Beekeeper Studio window where you're signed in as <code>root</code>, and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">USAGE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">SCHEMA</span> defaultdb.public <span class="hljs-keyword">TO</span> mtls_auth;
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">SELECT</span>, <span class="hljs-keyword">INSERT</span>, <span class="hljs-keyword">UPDATE</span>, <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">TABLE</span> defaultdb.public.books <span class="hljs-keyword">TO</span> mtls_auth;
</code></pre>
<p>You should see a success message.</p>
<p>Now return to the <code>mtls_auth</code> session and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! You should now see your previously restored list of books.</p>
<p>You’ve successfully connected using passwordless, certificate-based authentication and granted controlled permissions to the new user. :)</p>
<h3 id="heading-connecting-via-mutual-tls-mtls-from-our-apps-on-kubernetes">Connecting via Mutual TLS (mTLS) from Our Apps on Kubernetes</h3>
<p>So far, we’ve been connecting to our CockroachDB cluster <em>securely</em> using Beekeeper Studio thanks to our TLS certificates and mTLS authentication.</p>
<p>But…what happens when we have applications running inside our Kubernetes cluster that need to talk to CockroachDB as well?</p>
<p>Exactly: those apps also need to authenticate using client certificates.</p>
<p>And that brings us to a very important point…</p>
<h4 id="heading-why-we-should-not-generate-client-certificates-using-pods-the-dangerous-way">Why We Should <em>Not</em> Generate Client Certificates Using Pods (The Dangerous Way)</h4>
<p>Up until now, we’ve been generating our client certificates using Kubernetes Pods like:</p>
<ul>
<li><p><code>gen-root-cert</code></p>
</li>
<li><p><code>gen-mtls-auth-cert</code></p>
</li>
</ul>
<p>They <em>work</em>, yes…but they’re not safe for production.</p>
<p>Why? Because these jobs <strong>mount our Certificate Authority (CA) key</strong> inside the pod:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This is a <em>big</em> security risk!</p>
<p>If an attacker ever gains access to that pod?</p>
<p>🔥 Your CA key is exposed<br>🔥 They can generate <em>their own trusted certificates</em><br>🔥 They can impersonate ANY client/user, including the <code>root</code> and <code>admin</code> users<br>🔥 They’ll have full access to your CockroachDB cluster</p>
<p>And they’ll keep that access <strong>forever</strong>, until you rotate the CA key (which is painful and disruptive).</p>
<p>This is why CockroachDB strongly advises against mounting CA keys into Pods.</p>
<h4 id="heading-the-right-way-using-cert-manager-recommended-by-cockroachdb">The Right Way: Using Cert Manager (Recommended by CockroachDB)</h4>
<p>CockroachDB’s <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/secure-cockroachdb-kubernetes?filters=helm#deploy-cert-manager-for-mtls">official docs recommend</a> managing client certificates using <strong>cert-manager</strong>.</p>
<p>This is because instead of YOU exposing your CA key inside Pods, cert-manager handles everything <em>internally and securely:</em></p>
<ul>
<li><p>Cert-manager stores and protects your CA key</p>
</li>
<li><p>It generates client certificates for you</p>
</li>
<li><p>It issues private keys <em>without ever exposing your CA key</em></p>
</li>
<li><p>It auto-renews certificates before they expire</p>
</li>
<li><p>And it gives you production-grade certificate lifecycle management</p>
</li>
</ul>
<h4 id="heading-but-wait-dont-we-need-the-ca-key-to-generate-client-certificates">But Wait: Don’t We Need the CA Key to Generate Client Certificates?</h4>
<p>Great question.</p>
<p>Yes, normally you need the CA key to sign client certificates…but <strong>cert-manager takes care of that for us</strong>.</p>
<p>You simply:</p>
<ol>
<li><p>Create an Issuer (or ClusterIssuer)</p>
</li>
<li><p>Tell cert-manager to use your CockroachDB CA</p>
</li>
<li><p>Request a Certificate</p>
</li>
</ol>
<p>Then cert-manager automatically:</p>
<ol>
<li><p>Signs it</p>
</li>
<li><p>Stores it in a Kubernetes Secret (where it’s safe)</p>
</li>
<li><p>Rotates it before expiry</p>
</li>
<li><p>Keeps your CA key completely secure</p>
</li>
</ol>
<p>No more exposing the CA key in Pods. No more writing custom Kubernetes Pods.</p>
<h4 id="heading-certificate-rotation-another-huge-win">Certificate Rotation: Another Huge Win</h4>
<p>Let’s talk about expirations.</p>
<p>Right now:</p>
<ul>
<li><p>The <code>mtls_auth</code> client cert we generated manually has <strong>5 hours</strong> validity</p>
</li>
<li><p>After 5 hours, it expires</p>
</li>
<li><p>Your apps will fail all DB connections</p>
</li>
<li><p>You’d need to regenerate a new certificate manually</p>
</li>
<li><p>Or worse: create a CronJob to regenerate them every 4 hours</p>
</li>
</ul>
<p>This is messy and unsafe.</p>
<p>With cert-manager?</p>
<ul>
<li><p>Certificates are automatically rotated</p>
</li>
<li><p>Renewed before expiration</p>
</li>
<li><p>No downtime</p>
</li>
<li><p>No manual intervention</p>
</li>
<li><p>Apps easily reload the new certificates</p>
</li>
</ul>
<h4 id="heading-alright-lets-install-cert-manager">Alright, Let’s Install Cert Manager</h4>
<p>To start using cert-manager, install it using the Helm chart:</p>
<pre><code class="lang-bash">helm repo add cert-manager https://charts.jetstack.io

helm install cert-manager cert-manager/cert-manager \
  --<span class="hljs-built_in">set</span> crds.enabled=<span class="hljs-literal">true</span> \
  --create-namespace \
  -n cert-manager \
  --version 1.19.1
</code></pre>
<p>Once cert-manager is installed, we’ll:</p>
<ol>
<li><p>Create a <strong>ClusterIssuer</strong> that uses our CockroachDB CA</p>
</li>
<li><p>Create a <strong>Certificate</strong> for our <code>mtls_auth</code> user</p>
</li>
<li><p>Mount that Certificate into our application Pods</p>
</li>
<li><p>Connect securely to CockroachDB via mTLS from inside Kubernetes</p>
</li>
</ol>
<p>That’s what we’ll walk through next.</p>
<p>Before cert-manager can issue our certificates, it needs an <strong>Issuer</strong>. And before creating an Issuer, we need a secret that contains our CA certificate and CA key using the correct key names.</p>
<h4 id="heading-creating-a-ca-secret-for-the-issuer">Creating a CA Secret for the Issuer</h4>
<p>cert-manager’s <code>Issuer</code> is a bit picky about the secret format. It expects the secret to contain two keys:</p>
<ul>
<li><p><code>tls.crt</code>: the CA certificate</p>
</li>
<li><p><code>tls.key</code>: the CA private key</p>
</li>
</ul>
<p>But the CockroachDB Helm chart automatically generates a secret named <code>crdb-cockroachdb-ca-secret</code>, which uses different key names:</p>
<ul>
<li><p><code>ca.crt</code></p>
</li>
<li><p><code>ca.key</code></p>
</li>
</ul>
<p>So even though this secret contains exactly what we need, cert-manager won’t accept it because the keys are not named the way it expects.</p>
<p>To fix this, we’ll re-create a new secret with the correct key names. First, copy the existing CA files from Kubernetes to your local machine:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-ca-secret -o jsonpath=<span class="hljs-string">'{.data.ca\.crt}'</span> | base64 -d &gt; ca.crt
</code></pre>
<p>If you get a “permission denied” error, first delete any existing <code>ca.crt</code> file in your project directory, then re-run the command.</p>
<p>Now copy the key:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-ca-secret -o jsonpath=<span class="hljs-string">'{.data.ca\.key}'</span> | base64 -d &gt; ca.key
</code></pre>
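<p>If you’re wondering why the <code>base64 -d</code> step is needed: Kubernetes stores every value inside a Secret base64-encoded. Here’s a small sketch (with a snipped, made-up PEM string) of what the decode undoes:</p>
<pre><code class="lang-python"># Kubernetes base64-encodes Secret values; `base64 -d` reverses that.
import base64

pem = "-----BEGIN CERTIFICATE-----\nMIIB...snipped...\n-----END CERTIFICATE-----\n"
encoded = base64.b64encode(pem.encode()).decode()  # what kubectl stores/shows
decoded = base64.b64decode(encoded).decode()       # what `base64 -d` gives you

print(decoded == pem)  # True
</code></pre>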
<p>Next, create the properly formatted secret:</p>
<pre><code class="lang-bash">kubectl create secret tls crdb-ca-issuer-secret --cert=ca.crt --key=ca.key
</code></pre>
<p>If you describe it:</p>
<pre><code class="lang-bash">kubectl describe secret crdb-ca-issuer-secret
</code></pre>
<p>You should now see <code>tls.crt</code> and <code>tls.key</code> in the <code>Data</code> section – exactly what cert-manager needs.</p>
<h4 id="heading-creating-the-issuer">Creating the Issuer</h4>
<p>Now that we have a properly formatted CA secret, we can create the Issuer that cert-manager will use to sign our client certificates.</p>
<p>Create a file called <code>crdb-issuer.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cert-manager.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Issuer</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-issuer</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">ca:</span>
    <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-ca-issuer-secret</span>
</code></pre>
<p>Apply it:</p>
<pre><code class="lang-bash">kubectl apply -f crdb-issuer.yml
</code></pre>
<p>Confirm that it’s ready:</p>
<pre><code class="lang-bash">kubectl get issuer crdb-issuer
</code></pre>
<p>The <code>Ready</code> column should display <code>True</code>.</p>
<h4 id="heading-creating-the-certificate-manifest">Creating the Certificate Manifest</h4>
<p>Now we’ll define a Certificate object. This doesn’t create the client certificate instantly – instead, it tells cert-manager <strong>what kind</strong> of certificate we need. cert-manager then generates and stores the certificate automatically.</p>
<p>Create a file named <code>crdb-mtls_auth-certificate.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cert-manager.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Certificate</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span> <span class="hljs-comment"># Secret that will hold the cert+key</span>
  <span class="hljs-attr">commonName:</span> <span class="hljs-string">mtls_auth</span> <span class="hljs-comment"># MUST match Cockroach SQL role</span>
  <span class="hljs-attr">duration:</span> <span class="hljs-string">24h</span> <span class="hljs-comment"># 1 day</span>
  <span class="hljs-attr">renewBefore:</span> <span class="hljs-string">20h</span> <span class="hljs-comment"># renew 4 hours before expiry</span>
  <span class="hljs-attr">privateKey:</span>
    <span class="hljs-attr">algorithm:</span> <span class="hljs-string">RSA</span>
    <span class="hljs-attr">size:</span> <span class="hljs-number">2048</span>
    <span class="hljs-attr">encoding:</span> <span class="hljs-string">PKCS8</span>
  <span class="hljs-attr">usages:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">client</span> <span class="hljs-string">auth</span> <span class="hljs-comment"># important: client certificate</span>
  <span class="hljs-attr">issuerRef:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-issuer</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Issuer</span>
    <span class="hljs-attr">group:</span> <span class="hljs-string">cert-manager.io</span>
</code></pre>
<p>Let’s look at the important properties so we can understand what the Certificate workload does:</p>
<ul>
<li><p><strong>secretName:</strong> The Kubernetes secret where cert-manager will store the generated certificate, key, and CA certificate. This is where your apps will later mount the certificate files from.</p>
</li>
<li><p><strong>commonName:</strong> Very important! This must match the <strong>CockroachDB SQL user</strong> (<code>mtls_auth</code>), because CockroachDB uses the certificate’s Common Name to identify the connecting user.</p>
</li>
<li><p><strong>duration</strong> and <strong>renewBefore:</strong> <code>duration</code> defines how long the certificate is valid. <code>renewBefore</code> ensures cert-manager renews it early, preventing the certificate from expiring before it gets renewed (avoiding downtime).</p>
</li>
<li><p><strong>usages:</strong> Tells cert-manager what the certificate is for. <code>client auth</code> ensures this certificate is only used by clients connecting to servers, not the other way around.</p>
</li>
<li><p><strong>issuerRef:</strong> Points to the Issuer we created earlier. This tells cert-manager <em>who</em> should sign the certificate.</p>
</li>
</ul>
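<p>A quick sanity check on the timing: with <code>duration: 24h</code> and <code>renewBefore: 20h</code>, cert-manager starts renewal once only 20 hours of validity remain, which is roughly 4 hours after issuance:</p>
<pre><code class="lang-python"># When does renewal kick in? renew_at = issued + duration - renewBefore
from datetime import datetime, timedelta

issued = datetime(2026, 1, 1, 0, 0)   # example issuance time
duration = timedelta(hours=24)        # spec.duration
renew_before = timedelta(hours=20)    # spec.renewBefore

renew_at = issued + duration - renew_before
print(renew_at)  # 2026-01-01 04:00:00
</code></pre>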
<p>Apply the manifest:</p>
<pre><code class="lang-bash">kubectl apply -f crdb-mtls_auth-certificate.yml
</code></pre>
<p>After a few seconds, cert-manager will generate the certificate.</p>
<p>Check the secret:</p>
<pre><code class="lang-bash">kubectl get secret crdb-mtls-auth-certificate
</code></pre>
<p>Describe it to view the keys:</p>
<pre><code class="lang-bash">kubectl describe secret crdb-mtls-auth-certificate
</code></pre>
<p>You should see:</p>
<ul>
<li><p><code>tls.crt</code></p>
</li>
<li><p><code>tls.key</code></p>
</li>
<li><p><code>ca.crt</code></p>
</li>
</ul>
<p>These are the files the application will use.</p>
<p>If you copy the content of <code>tls.crt</code> to your local machine and decode it using the <code>openssl x509...</code> command from earlier, you’ll see details similar to the <code>client.mtls_auth.crt</code> certificate we generated previously, with the Common Name (CN) still being <code>mtls_auth</code>.</p>
<h4 id="heading-creating-a-pod-that-connects-using-the-client-certificate">Creating a Pod That Connects Using the Client Certificate</h4>
<p>Now let’s create a simple Pod that uses our new client certificate to connect to CockroachDB.</p>
<p>Create a file called <code>books-pod.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-pod</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-certs</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span>
        <span class="hljs-comment"># Make secret files readable only by the owner: 0400 (Without this, the Python app will throw an error). However, this isn't compulsory for all apps, just the one used in this tutorial :)</span>
        <span class="hljs-attr">defaultMode:</span> <span class="hljs-number">0400</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">prince2006/cockroachdb-tutorial-python-app:new</span>
      <span class="hljs-attr">imagePullPolicy:</span> <span class="hljs-string">Always</span>
      <span class="hljs-attr">env:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DATABASE_URL</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">&gt;-
            postgresql://mtls_auth@crdb-cockroachdb-public.default:26257/defaultdb?sslmode=verify-full&amp;sslrootcert=/crdb-certs/ca.crt&amp;sslcert=/crdb-certs/tls.crt&amp;sslkey=/crdb-certs/tls.key
</span>      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-certs</span>
          <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/crdb-certs</span>
          <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"100Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
</code></pre>
<p>Here’s what’s happening:</p>
<ul>
<li><p>We mount the generated certificate secret into <code>/crdb-certs</code>.</p>
</li>
<li><p>The Python app uses those certificate files (<code>tls.crt</code>, <code>tls.key</code>, <code>ca.crt</code>) to authenticate.</p>
</li>
<li><p>The connection string does <strong>NOT</strong> include a password. CockroachDB authenticates the user entirely via the certificate’s Common Name.</p>
</li>
</ul>
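<p>To see what that certificate-only connection string actually encodes, here’s a small Python sketch (standard library only – this parsing helper is our own illustration, not part of the tutorial app) that unpacks the same DSN and confirms there’s no password, only certificate paths:</p>

```python
from urllib.parse import urlsplit, parse_qs

# The same DSN the Pod passes to the app via DATABASE_URL
dsn = ("postgresql://mtls_auth@crdb-cockroachdb-public.default:26257/defaultdb"
       "?sslmode=verify-full&sslrootcert=/crdb-certs/ca.crt"
       "&sslcert=/crdb-certs/tls.crt&sslkey=/crdb-certs/tls.key")

parts = urlsplit(dsn)
params = {k: v[0] for k, v in parse_qs(parts.query).items()}

print(parts.username)     # mtls_auth
print(parts.password)     # None -> no password at all
print(params["sslmode"])  # verify-full
print(params["sslcert"])  # /crdb-certs/tls.crt
```

<p>Note how the password is <code>None</code> – the user’s identity comes entirely from <code>sslcert</code>/<code>sslkey</code>, which is exactly what mTLS means.</p>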
<p>Apply the Pod:</p>
<pre><code class="lang-bash">kubectl apply -f books-pod.yml
</code></pre>
<p>After about a minute, view the logs:</p>
<pre><code class="lang-bash">kubectl logs books-pod
</code></pre>
<p>Or if the Pod already restarted:</p>
<pre><code class="lang-bash">kubectl logs -p books-pod
</code></pre>
<p>You should see a successful connection to CockroachDB using the <code>mtls_auth</code> user, and a list of books:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763534354156/60114f7b-ba62-4706-a0b7-7629e20bfaaa.png" alt="List of books from our books-pod logs" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>If you remove the certificate files or try connecting without them, the app will fail – as expected.</p>
<p><strong>Congratulations!</strong></p>
<p>You’ve officially built a fully secure, production-ready CockroachDB cluster on Kubernetes – complete with:</p>
<ul>
<li><p>End-to-end encryption (TLS)</p>
</li>
<li><p>Mutual TLS authentication (mTLS) for users and apps</p>
</li>
<li><p>Automated, daily backups to Google Cloud Storage</p>
</li>
<li><p>Proper certificate rotation with cert-manager</p>
</li>
</ul>
<h2 id="heading-how-to-get-a-cockroachdb-enterprise-license-for-free">How to Get a CockroachDB Enterprise License for Free</h2>
<p>Okay, so here’s a thing: even though you’ve built a super professional CockroachDB cluster, there’s one small catch: <strong>without a license, your cluster might be “throttled.”</strong></p>
<p>We know this because, when we open our dashboard, we see a message warning that the cluster is being throttled.</p>
<p>That means things slow down: queries take longer, performance gets worse, and scaling up won’t magically make it faster. Yeah, it’s real. 🥲</p>
<p>Why does this happen? Because CockroachDB’s “full feature set” is under a special license. If you don’t set a valid license, it limits how many SQL transactions you can run at a time.</p>
<h3 id="heading-three-types-of-licenses">Three Types of Licenses</h3>
<p>Here’s a breakdown of the different kinds of CockroachDB licenses and what they mean for you:</p>
<ol>
<li><p><strong>Trial License</strong></p>
<ul>
<li><p>Valid for <strong>30 days</strong>.</p>
</li>
<li><p>Lets you try all the “Enterprise” features.</p>
</li>
<li><p>You <em>must</em> send telemetry (more on that soon) while the trial is active.</p>
</li>
</ul>
</li>
<li><p><strong>Enterprise License (Paid)</strong></p>
<ul>
<li><p>This is CockroachDB’s “premium / fully paid” version.</p>
</li>
<li><p>You can pick the kind of license based on your environment: “Production”, “Pre-production”, or “Development.”</p>
</li>
<li><p>Companies with more than <strong>$10 million in annual revenue</strong> need to pay for this license.</p>
</li>
<li><p>There <em>are</em> discounts, startup perks, or “free” versions for smaller companies (more below).</p>
</li>
</ul>
</li>
<li><p><strong>Enterprise Free License</strong></p>
<ul>
<li><p>This is the magic one for early-stage companies or startups: it has exactly the same features as the paid Enterprise license. But it’s free if your business makes <strong>under $10 million per year</strong>.</p>
</li>
<li><p>You <em>do</em> need to renew it each year.</p>
</li>
<li><p>Support for this “Free” license is <strong>community-level</strong> (forums, docs), not paid enterprise.</p>
</li>
</ul>
</li>
</ol>
<p><strong>N.B.:</strong> To keep your free license active and <em>not</em> get throttled, CockroachDB requires telemetry. Telemetry means your cluster sends some usage data back to Cockroach Labs. And no, they’re not “stealing your data”. Here’s what that actually means:</p>
<ul>
<li><p>Telemetry includes basic usage stats, cluster health info, and configuration metrics.</p>
</li>
<li><p>It does NOT send your business data, queries, or personal customer data.</p>
</li>
<li><p>It helps Cockroach Labs <em>make sure the free license is used responsibly</em>, and helps them build better features.</p>
</li>
<li><p>If you stop sending telemetry, your cluster will be throttled (slowed down) after 7 days.</p>
</li>
</ul>
<h3 id="heading-how-to-apply-for-the-free-enterprise-license">How to Apply for the Free Enterprise License</h3>
<p>Here’s how you can try to get that free enterprise license:</p>
<ol>
<li><p>Go to the CockroachDB Cloud Console (sign up if you don’t have an account). Then click the “Organization” link in the menu and select “Enterprise Licenses” from the dropdown.</p>
</li>
<li><p>Click the Create License button → Enable the “Find out if my company qualifies for an Enterprise Free license” option.</p>
</li>
<li><p>Fill in the form: your name, company name, job function, and the intended use of the license.</p>
</li>
<li><p>Click “Continue”.</p>
</li>
</ol>
<p>You should see the success message “Based on your company's intended use, you qualify for an Enterprise Free license.” Now agree to the terms and conditions, then click the “Generate License key” button.</p>
<p>Learn more about CockroachDB licenses here 👉🏾 <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/licensing-faqs">https://www.cockroachlabs.com/docs/stable/licensing-faqs</a></p>
<h3 id="heading-adding-your-license-to-the-cockroachdb-cluster">Adding Your License to the CockroachDB Cluster</h3>
<p>Now that you’ve gotten your shiny new CockroachDB license (whether it’s the Free one or the Enterprise one), the next step is…actually <em>using it</em>.</p>
<p>Let’s add it to your CockroachDB cluster so it stops shouting “THROTTLED!” at you every time you open the dashboard :)</p>
<p>We’ll do this by updating our CockroachDB Helm configuration.</p>
<h4 id="heading-step-1-update-your-cockroachdb-productionyml">Step 1: Update Your <code>cockroachdb-production.yml</code></h4>
<p>Open your production Helm values file, and inside the <code>init</code> section, add the following:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">init:</span>
<span class="hljs-string">...</span>
    <span class="hljs-attr">provisioning:</span>
        <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">clusterSettings:</span>
          <span class="hljs-attr">cluster.organization:</span> <span class="hljs-string">"'&lt;ORGANIZATION&gt;'"</span> <span class="hljs-comment"># Enter the name of your organization here </span>
          <span class="hljs-attr">enterprise.license:</span> <span class="hljs-string">"'&lt;LICENSE&gt;'"</span> <span class="hljs-comment"># Enter your CockroachDB Enterprise license key here</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Now replace:</p>
<ul>
<li><p><code>&lt;ORGANIZATION&gt;</code> with the name of your startup, business, project, or company</p>
</li>
<li><p><code>&lt;LICENSE&gt;</code> with the exact license string CockroachDB gave you</p>
</li>
</ul>
<p>That’s it – super simple.</p>
<h4 id="heading-step-2-apply-the-changes-with-helm">Step 2: Apply the Changes With Helm</h4>
<p>Run your usual Helm upgrade command:</p>
<pre><code class="lang-bash">helm upgrade cockroachdb -f cockroachdb-production.yml cockroachdb/cockroachdb
</code></pre>
<h4 id="heading-step-3-confirm-the-license-was-added-correctly">Step 3: Confirm the License Was Added Correctly</h4>
<p>Now let’s double-check everything worked.</p>
<ol>
<li><p>Connect as the <code>root</code> user: You can connect using Beekeeper Studio (like we’ve been doing).</p>
</li>
<li><p>Run this query to check your license:</p>
</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> CLUSTER SETTING enterprise.license;
</code></pre>
<p>If everything went well, you should see your license key printed out in the results.</p>
<h4 id="heading-step-4-make-sure-telemetry-is-enabled-important">Step 4: Make Sure Telemetry Is Enabled (Important!)</h4>
<p>Remember: without telemetry enabled, your cluster will still get throttled, even if you have a valid license 🥲</p>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> CLUSTER SETTING diagnostics.reporting.enabled;
</code></pre>
<p>If the result says “true”, you're good! Telemetry is on, CockroachDB can verify your license, and your cluster will behave normally without slowing down.</p>
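<p>If it comes back as “false” instead, you can turn telemetry back on as the <code>root</code> user with the matching <code>SET</code> statement:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SET</span> CLUSTER SETTING diagnostics.reporting.enabled = <span class="hljs-literal">true</span>;
</code></pre>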
<h2 id="heading-conclusion-amp-next-steps"><strong>Conclusion &amp; Next Steps ✨</strong></h2>
<p>Throughout this book, you’ve gone from “What even is CockroachDB?” to actually running your <strong>own secure, production-ready database</strong> on Kubernetes – and that’s a BIG deal. 🎉</p>
<p>You learned why CockroachDB is special, how it avoids downtime, and why it’s different from the usual databases everyone talks about.</p>
<p>Then you set up your own local environment, practiced everything safely on Minikube, and gradually built your way to a full production setup on GKE.</p>
<p>You explored CockroachDB’s dashboard, checked your cluster’s health, backed up your data to the cloud, and even learned how to keep your database fast, stable, and ready to grow when needed.</p>
<p>Finally, you deployed it on Google Cloud, secured it with encryption and certificates, and connected to it from your own PC – all step-by-step.</p>
<p>By now, you’ve basically gone from curious learner to “I can actually run this thing in production.” 🚀</p>
<p>You’ve covered a lot – and you’ve built something powerful, modern, and production-worthy. Amazing job 👏🏾😁!! And thanks for reading.</p>
<h3 id="heading-about-the-author">About the Author 👨🏾‍💻</h3>
<p>Hi, I’m Prince! I’m a DevOps engineer and Cloud architect passionate about building, deploying, architecting, and managing applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this book, you can learn more about me by exploring more of my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile</a>, and reach out to me on <a target="_blank" href="https://x.com/POnukwili">Twitter (X)</a>. You can find more of my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">articles here</a> or on <a target="_blank" href="https://www.freecodecamp.org/news/author/onukwilip/">my freeCodeCamp blog</a>.</p>
<p>You can also <a target="_blank" href="https://prince-onuk.vercel.app">visit my website</a>. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Load Balancing with Azure Application Gateway and Azure Load Balancer – When to Use Each One ]]>
                </title>
                <description>
                    <![CDATA[ You’ve probably heard someone mention load balancing when talking about cloud apps. Maybe even names like Azure Load Balancer, Azure Application Gateway, or something about Virtual Machines and Scale Sets. 😵‍💫 It all sounds important...but also a l... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/load-balancing-with-azure-application-gateway-and-azure-load-balancer/</link>
                <guid isPermaLink="false">6824f10a7d203c180e5ea4b2</guid>
                
                    <category>
                        <![CDATA[ Load Balancing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Azure ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Azure Application Gateway ]]>
                    </category>
                
                    <category>
                        <![CDATA[ virtual machine ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #virtual machine scale set ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Load Balancer ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Wed, 14 May 2025 19:37:46 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747235455030/cb82bfb4-8d7b-47e5-ab31-126906f60b40.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You’ve probably heard someone mention load balancing when talking about cloud apps. Maybe even names like Azure Load Balancer, Azure Application Gateway, or something about Virtual Machines and Scale Sets. 😵‍💫</p>
<p>It all sounds important...but also a little confusing. Like, why are there so many moving parts? And what do they actually do?</p>
<p>In this guide, we’re going to break it all down – step by step – using real examples and simple language.</p>
<p>You’ll learn:</p>
<ul>
<li><p>What load balancers are (and why apps even need them)</p>
</li>
<li><p>How apps were deployed before load balancers existed (hint: everything lived on one lonely server)</p>
</li>
<li><p>How Azure Virtual Machines work – and how they let you scale up your apps</p>
</li>
<li><p>What Virtual Machine Scale Sets are, and how they help handle sudden traffic spikes</p>
</li>
<li><p>The differences between Azure Load Balancer and Azure Application Gateway, and when to use each</p>
</li>
</ul>
<p>By the end, you won’t just understand what these tools do – you’ll know <em>when</em> and <em>why</em> to use them in real-world scenarios.</p>
<p>Whether you’re a curious beginner, a hands-on builder, or someone just trying to wrap their head around Azure’s ecosystem, this guide is for you.</p>
<p>Ready to untangle the cloud spaghetti? Let’s go! 🍝🚀</p>
<h2 id="heading-table-of-contents">📚 Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-are-load-balancers">🧊 What Are Load Balancers?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-applications-were-deployed-before-load-balancers">🖥️ How Applications Were Deployed Before Load Balancers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-virtual-machines-vms-the-building-blocks">⚙️ Azure Virtual Machines (VMs) – The Building Blocks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-need-for-scaling-vertical-vs-horizontal">📈 The Need for Scaling – Vertical vs Horizontal</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-virtual-machine-scale-sets-vmss-scaling-made-simple">🔁 Azure Virtual Machine Scale Sets (VMSS) – Scaling Made Simple</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-load-balancer-spreading-the-traffic">📦 Azure Load Balancer – Spreading the Traffic</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-application-gateway-smart-routing-for-modern-apps">🍴 Azure Application Gateway – Smart Routing for Modern Apps</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-load-balancer-vs-azure-application-gateway">🔍 Azure Load Balancer vs Azure Application Gateway</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-use-cases-when-to-use-what">🧭</a> <a class="post-section-overview" href="#heading-use-cases-when-to-use-each-one">Use Cases: When to Use Each One</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">✅ Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-study-further">Study Further 📚</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-about-the-author">About the Author 👨‍💻</a></p>
</li>
</ol>
<h2 id="heading-what-are-load-balancers">🧊 What Are Load Balancers?</h2>
<p>Imagine you're running a small restaurant with just one chef in the kitchen. Everything goes smoothly when you have a few customers – each order is prepared one after the other, and everyone leaves satisfied.</p>
<p>But what happens when 50 people walk in all at once?</p>
<p>🍽️ One chef can’t handle that many orders at the same time.<br>⏳ People start waiting longer.<br>😤 Some customers leave.<br>💥 The chef gets overwhelmed – and eventually burns out.</p>
<p>This is what can happen to a server (the computer running your app) when too many users try to access it at the same time.</p>
<h3 id="heading-so-what-does-a-load-balancer-do">So, What Does a Load Balancer Do?</h3>
<p>A <strong>load balancer</strong> is like a smart restaurant manager. But instead of food orders, it handles user requests – the things people do when they open your app, click buttons, or load data.</p>
<p>Let’s say you now have three chefs (servers) instead of one. The load balancer’s job is to:</p>
<ul>
<li><p>👀 Watch for incoming orders (user requests)</p>
</li>
<li><p>🧠 Decide which chef (server) is available or least busy</p>
</li>
<li><p>🍽️ Send that request to the right one</p>
</li>
<li><p>🔁 Repeat this over and over, making sure things stay fast and smooth</p>
</li>
</ul>
<p>So in simple terms, a load balancer takes all the incoming traffic to your app and distributes it across multiple servers so no single server gets overloaded – cool, right? 🙂</p>
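<p>If you want to picture the simplest version of this in code, here’s a tiny Python sketch of round-robin balancing (the “chef” names are made up, and a real load balancer like Azure’s does this at the network level, not in application code):</p>

```python
from itertools import cycle

# Three "chefs" (backend servers) and an endless round-robin rotation
servers = ["chef-1", "chef-2", "chef-3"]
rotation = cycle(servers)

# Six incoming "orders" (user requests) get spread evenly across them
assignments = [next(rotation) for _ in range(6)]
print(assignments)
# ['chef-1', 'chef-2', 'chef-3', 'chef-1', 'chef-2', 'chef-3']
```

<p>Round-robin is only one strategy – real load balancers can also pick the least-busy backend or route by health checks.</p>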
<h3 id="heading-why-were-load-balancers-introduced">Why Were Load Balancers Introduced?</h3>
<p>Back in the early days, many applications were hosted on just one machine – called a Single Server Deployment.</p>
<p>That was okay when you had a small number of users. But once things started to grow – more users, more actions, more data – single servers became a bottleneck:</p>
<ul>
<li><p>They could only handle a limited number of requests.</p>
</li>
<li><p>If they went down, your entire app would stop working.</p>
</li>
<li><p>Scaling (adding more power) was expensive and manual.</p>
</li>
</ul>
<p>💡 Enter <strong>load balancers</strong> – designed to solve this by making it possible to:</p>
<ul>
<li><p>Spread traffic across multiple servers (so no one server crashes under pressure),</p>
</li>
<li><p>Replace or restart servers without downtime,</p>
</li>
<li><p>Add or remove servers as needed, depending on how busy your app is (this is called <strong>scaling</strong>).</p>
</li>
</ul>
<h3 id="heading-a-simple-use-case-scenario">A Simple Use-Case Scenario</h3>
<p>Let’s say you're building an online store – your own mini Amazon. At first, you host your app on one Azure Virtual Machine. Things are great. But one day, you run a huge promo and suddenly…thousands of people flood in to browse, shop, and check out.</p>
<p>Your single VM starts lagging.</p>
<p>Orders fail. People complain. Your dream app? Crashing fast. 💥</p>
<p>So what do you do?</p>
<p>You spin up two more VMs to help out – but now you’ve got another problem: <em>How do you divide the traffic between the three?</em></p>
<p>This is where the load balancer steps in. It:</p>
<ul>
<li><p>Looks at every incoming user request</p>
</li>
<li><p>Figures out which VM is available and least busy</p>
</li>
<li><p>Sends the request there</p>
</li>
<li><p>Keeps rotating requests in real-time</p>
</li>
</ul>
<p>And the result?<br>✅ No single VM gets overwhelmed<br>✅ Your app stays fast and responsive<br>✅ Users are happy (and buying stuff again!)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746980088916/41be330b-8d5b-4709-b07d-3f1a19d641e7.png" alt="Load balancer illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-applications-were-deployed-before-load-balancers">🖥️ How Applications Were Deployed Before Load Balancers</h2>
<p>Before cloud tools like load balancers came along, the typical way to run an application was pretty simple: You’d deploy the entire app on a single server, like running a small business from one tiny shop.</p>
<h3 id="heading-first-things-first-whats-a-server">First Things First: What’s a Server?</h3>
<p>Think of a server as a special computer that’s always connected to the internet. Its job is to “serve” your app to people when they visit your website, open your app, or use your service.</p>
<p>In cloud platforms like Azure, we usually call these Virtual Machines (VMs) – basically, software-powered servers you can spin up with a few clicks.</p>
<h3 id="heading-monoliths-vs-microservices">Monoliths vs Microservices</h3>
<p>Now, applications come in different “shapes.” The two most common are:</p>
<ul>
<li><p><strong>Monoliths</strong>: Everything is bundled together into one big app. All the code – from user login to shopping cart to checkout – lives in a single unit.</p>
</li>
<li><p><strong>Microservices</strong>: The app is broken into smaller, independent apps (services). Each service does one job – like login, payments, orders – and runs separately.</p>
</li>
</ul>
<h4 id="heading-how-were-these-apps-deployed">How Were These Apps Deployed?</h4>
<p>Whether it was a monolith or a bunch of microservices, they were all usually deployed on a single server (VM).</p>
<p>For monoliths, you just ran the entire app directly on the server. For microservices: you'd run each service in a separate space on that same server, using <strong>containers</strong>.</p>
<h4 id="heading-wait-whats-a-container">Wait — What’s a Container?</h4>
<p>A container is like a mini-computer <em>inside</em> a computer. It has everything an app needs to run – code, tools, settings – and it keeps each app isolated from the others.</p>
<p>Why use containers?</p>
<ul>
<li><p>You can run multiple services on the same server without their underlying software (software needed for each app to run) interfering with each other.</p>
</li>
<li><p>It’s faster and more efficient than installing everything directly on the server.</p>
</li>
<li><p>They make moving apps between environments (for example, test → production) super smooth (no more “But it works on my machine…”).</p>
</li>
</ul>
<p>Popular tools like Docker make working with containers easy.</p>
<h4 id="heading-connecting-it-all-together-domains-subdomains-and-reverse-proxies">Connecting It All Together: Domains, Subdomains, and Reverse Proxies</h4>
<p>When your app lives on a server, you want people to be able to reach it. That’s where <strong>domain names</strong> come in.</p>
<ul>
<li><p>Your server has a public IP address – a set of numbers like <code>102.80.1.23</code>, that gives it a unique identifier on the public internet</p>
</li>
<li><p>But instead of asking users to type numbers, you link that IP to a domain name, like <code>mycoolapp.com</code></p>
</li>
</ul>
<p>If your app has microservices, you might even assign <strong>subdomains</strong> like:</p>
<ul>
<li><p><code>api.mycoolapp.com</code> for the backend</p>
</li>
<li><p><code>dashboard.mycoolapp.com</code> for the user interface</p>
</li>
<li><p><code>payments.mycoolapp.com</code> for payments</p>
</li>
</ul>
<p>To manage all this, you’d use a <strong>reverse proxy</strong> (like Nginx or Apache). It listens on the main domain and subdomains, and forwards traffic to the right app or service.</p>
<p>Example:</p>
<ul>
<li><p>Someone visits <code>dashboard.mycoolapp.com</code></p>
</li>
<li><p>The reverse proxy checks the domain and forwards the request to the correct container running the dashboard service</p>
</li>
</ul>
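<p>At its core, that forwarding step is just a lookup from host name to backend. Here’s a toy Python sketch of the decision a reverse proxy makes (the subdomains match the examples above, but the container ports are hypothetical):</p>

```python
# Map each (sub)domain to the container that should receive the request
routes = {
    "mycoolapp.com": "monolith container on port 3000",
    "api.mycoolapp.com": "backend container on port 8000",
    "dashboard.mycoolapp.com": "dashboard container on port 8080",
    "payments.mycoolapp.com": "payments container on port 9000",
}

def forward(host: str) -> str:
    """Pick a backend for the incoming Host header, like Nginx's server blocks do."""
    return routes.get(host, "404: no such site")

print(forward("dashboard.mycoolapp.com"))  # dashboard container on port 8080
```

<p>A real reverse proxy like Nginx does the same lookup via its <code>server_name</code> configuration, plus TLS termination, buffering, and more.</p>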
<p>And to help with all of this setup – from deploying containers to configuring reverse proxies – there are developer-friendly tools like <a target="_blank" href="https://coolify.io">Coolify</a>. Coolify is an open-source platform that makes it super easy for developers and DevOps teams to:</p>
<ul>
<li><p>Deploy apps in containers</p>
</li>
<li><p>Set up domains and subdomains</p>
</li>
<li><p>Configure reverse proxies – all from a clean dashboard, no complex terminal commands needed</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746979943646/a6525a09-f44a-4e00-a945-7bded3483b0d.jpeg" alt="Coolify dashboard example" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>All this was set up on ONE SERVER/VM. But here’s the catch: when that one server got overloaded or went down…💥 everything stopped.</p>
<p>That’s why we needed a better way. And that's where <strong>scaling</strong> and <strong>load balancing</strong> came in – to keep apps running smoothly, no matter the traffic.</p>
<h2 id="heading-azure-virtual-machines-vms-the-building-blocks">⚙️ Azure Virtual Machines (VMs) – The Building Blocks</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746980948928/eb6a7fb2-7432-42ed-8cbd-bff6c8250d4e.jpeg" alt="Virtual Machine illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When it comes to running apps in the cloud, <strong>Virtual Machines (VMs)</strong> are the basic building blocks – kind of like renting an apartment in a giant digital skyscraper.</p>
<p>You don’t need to buy the whole building (aka physical servers), you just rent the space you need, when you need it.</p>
<h3 id="heading-what-exactly-is-a-virtual-machine">What Exactly Is a Virtual Machine?</h3>
<p>A Virtual Machine is a software-based computer that runs inside a real, physical computer (a server) – hosted in a data center, like those run by Microsoft Azure.</p>
<p>It looks and behaves like a normal computer:</p>
<ul>
<li><p>It has an operating system (Windows, Linux)</p>
</li>
<li><p>You can install apps</p>
</li>
<li><p>It has memory (RAM), storage (disks), and CPU</p>
</li>
</ul>
<p>But the best part? You don’t need to worry about the hardware. Azure takes care of that behind the scenes – all you do is say:</p>
<blockquote>
<p>“Hey Azure, give me a Linux VM with 4GB RAM and 2 CPUs.”</p>
</blockquote>
<p>And boom 💥 – it spins up in minutes.</p>
<h3 id="heading-why-use-a-vm">Why Use a VM?</h3>
<p>Let’s say you’ve built a web app – it’s just a simple blog. You want to deploy it and make it accessible to the world.</p>
<p>Here's what you can do with a VM:</p>
<ul>
<li><p>Set it up with your favorite OS (for example, Ubuntu)</p>
</li>
<li><p>Install web servers like Nginx or Apache</p>
</li>
<li><p>Deploy your app</p>
</li>
<li><p>Bind it to your domain name</p>
</li>
<li><p>Let the world visit your blog at <a target="_blank" href="http://myawesomeblog.com"><code>myawesomeblog.com</code></a></p>
</li>
</ul>
<p>It’s your own personal environment – no sharing, full control.</p>
<h2 id="heading-the-need-for-scaling-vertical-vs-horizontal">📈 The Need for Scaling – Vertical vs Horizontal</h2>
<p>Imagine your app is growing. At first, it’s just a few users. Then a few hundred. Then thousands are logging in, placing orders, chatting, uploading photos – all at once 😮</p>
<p>Suddenly, your server (VM) is under pressure. It’s like trying to pour a flood through a straw.</p>
<h3 id="heading-so-what-do-you-do-when-one-server-isnt-enough">So, What Do You Do When One Server Isn’t Enough?</h3>
<p>This is where scaling comes in – the art of upgrading your app’s infrastructure to keep up with traffic.</p>
<p>There are two main ways to scale:</p>
<h4 id="heading-option-1-vertical-scaling-aka-scaling-up">🧱 Option 1: Vertical Scaling (aka Scaling Up)</h4>
<p>You take your existing VM and give it more power:</p>
<ul>
<li><p>Add more CPUs 🧠</p>
</li>
<li><p>Increase RAM 🧵</p>
</li>
<li><p>Add faster disks ⚡</p>
</li>
</ul>
<p>Think of it like upgrading from a regular car to a sports car. It’s the same vehicle, just faster and stronger.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Simple to do</p>
</li>
<li><p>No major changes to your app setup</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>There’s a limit to how much you can upgrade</p>
</li>
<li><p>Still a single point of failure: if the VM crashes, everything goes down 😬</p>
</li>
</ul>
<h4 id="heading-option-2-horizontal-scaling-aka-scaling-out">🧩 Option 2: Horizontal Scaling (aka Scaling Out)</h4>
<p>Instead of boosting one server, you add more servers – multiple VMs running copies of your app.</p>
<p>Now:</p>
<ul>
<li><p>Users can be distributed across all these VMs</p>
</li>
<li><p>If one goes down, others keep serving traffic</p>
</li>
<li><p>You can <em>dynamically</em> add or remove VMs based on traffic</p>
</li>
</ul>
<p>It’s like opening more checkout counters in a busy supermarket 🛒</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>The load is evenly distributed. For example, if one server previously handled 100% of the traffic, adding two more servers would result in the traffic being split into approximately 33% to 34% for each server.</p>
</li>
<li><p>Improves both performance and reliability</p>
</li>
<li><p>You can scale based on real-time demand (that is, traffic inflow)</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>Needs something to split traffic between VMs – Load Balancers</p>
</li>
<li><p>More expensive. You pay the price of one VM multiplied by the number of VMs you run – for example, 3 VMs at $30 each comes to $90 at the end of the month</p>
</li>
</ul>
<h3 id="heading-quick-real-world-example">Quick Real-World Example</h3>
<p>Let’s say you’ve launched an e-commerce site for sneakers 👟 Traffic spikes during a big sale? Your vertical scaling (bigger VM) might choke.</p>
<p>But with horizontal scaling:</p>
<ul>
<li><p>You spin up 5 VMs across different regions</p>
</li>
<li><p>Traffic is shared between them</p>
</li>
<li><p>If one VM slows down, others handle the load</p>
</li>
</ul>
<h4 id="heading-so-remember">So, remember 👇🏾</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Scaling Type</td><td>Description</td><td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>🧱 Vertical Scaling</td><td>Make 1 VM more powerful (adding more CPU power, SSD, RAM, bandwidth, and so on)</td><td>Easy setup, fewer changes</td><td>Hardware limits, 1 point of failure - If that 1 server/VM goes down, so does your app :(</td></tr>
<tr>
<td>🧩 Horizontal Scaling</td><td>Add more VMs to handle traffic</td><td>Flexible, reliable</td><td>Needs traffic distribution logic (Load Balancer). Usually more expensive (the price of 1 VM times the number of VMs)</td></tr>
</tbody>
</table>
</div><h2 id="heading-azure-virtual-machine-scale-sets-vmss-scaling-made-simple">🔁 Azure Virtual Machine Scale Sets (VMSS) – Scaling Made Simple</h2>
<p>Okay – so we’ve talked about <strong>horizontal scaling</strong>: adding multiple VMs to handle growing traffic. Sounds great, right?</p>
<p>But here’s the thing: manually spinning up and configuring 5, 10, or 100 VMs... every time your app gets busy? Yeah, that’s not fun 🙃</p>
<h3 id="heading-enter-virtual-machine-scale-sets-vmss">Enter: Virtual Machine Scale Sets (VMSS)</h3>
<p>VMSS is Azure’s way of automating horizontal scaling. Instead of creating each VM one by one, you define a template, and Azure takes care of the rest:</p>
<ul>
<li><p>How many VMs to start with</p>
</li>
<li><p>How to configure them (OS, apps, settings) ⚙️</p>
</li>
<li><p>When to add or remove VMs based on traffic 📈📉</p>
</li>
</ul>
<h3 id="heading-a-simple-analogy">A Simple Analogy 🧃</h3>
<p>Think of VMSS like a juice dispenser at a party:</p>
<ul>
<li><p>At first, it pours into 2 cups (VMs)</p>
</li>
<li><p>If 10 guests show up? It starts filling 5 cups</p>
</li>
<li><p>Party slows down? Back to 2 cups again</p>
</li>
</ul>
<p>You never have to refill manually – the dispenser adjusts on its own. 🎉</p>
<h3 id="heading-how-it-works-without-the-jargon">How It Works (Without the Jargon 😌)</h3>
<ol>
<li><p><strong>You set the rules:</strong> “If CPU usage goes above 70%, add 2 more VMs.”</p>
</li>
<li><p><strong>Azure watches traffic and adjusts the number of VMs</strong> automatically.</p>
</li>
<li><p><strong>All VMs are identical</strong> – like clones, all running the same app setup.</p>
</li>
<li><p><strong>It works with Azure Load Balancer</strong> to spread traffic across all these VMs smoothly.</p>
</li>
</ol>
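<p>To make step 1 concrete, here’s a tiny sketch of an autoscale rule written as a decision function. The thresholds, step sizes, and function name are invented for illustration – in Azure you declare rules like this declaratively, and the platform evaluates them for you:</p>

```python
def autoscale_decision(cpu_percent, current_vms, min_vms=2, max_vms=10):
    """Toy version of a VMSS autoscale rule:
    scale out by 2 VMs above 70% CPU, scale in by 1 below 30%."""
    if cpu_percent > 70 and current_vms < max_vms:
        return min(current_vms + 2, max_vms)  # add up to 2 VMs
    if cpu_percent < 30 and current_vms > min_vms:
        return max(current_vms - 1, min_vms)  # remove 1 VM
    return current_vms                        # steady state

print(autoscale_decision(85, 3))  # high lunch-hour load → 5
print(autoscale_decision(20, 5))  # quiet afternoon → 4
```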
<h3 id="heading-real-life-example-food-delivery-app">Real-Life Example: Food Delivery App 🍕📱</h3>
<p>You’ve built an app where users order food. During lunch and dinner, traffic explodes.</p>
<p>💡 With VMSS:</p>
<ul>
<li><p>You start with 3 VMs in the morning</p>
</li>
<li><p>At 12PM, Azure sees high CPU usage, so it spins up 5 more VMs</p>
</li>
<li><p>At 3PM, traffic drops, so Azure removes the extra VMs</p>
</li>
</ul>
<p>You only pay for what you use. And users get a smooth experience – no delays, no crashes 👌🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746982520998/7fe3c997-fc8f-418a-861b-e999905ca43c.png" alt="Auto-scaling illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-azure-load-balancer-spreading-the-traffic">📦 Azure Load Balancer – Spreading the Traffic</h2>
<p>By now, you know that your app can live on multiple Virtual Machines (VMs), and that you can scale them easily using Virtual Machine Scale Sets (VMSS).</p>
<p>But here's the big question: when users start accessing your app – hundreds, even thousands at once – how do you make sure that all that traffic is fairly and efficiently distributed across those VMs?</p>
<p>You don’t want one VM to be overwhelmed while others are just chilling. You need a middleman – something smart enough to balance the load.</p>
<p>That’s where <strong>Azure Load Balancer</strong> steps in. It’s Azure’s way of saying, “Don’t worry, I got this” when traffic starts rolling in.</p>
<h3 id="heading-so-what-is-azure-load-balancer">🏢 So, What Is Azure Load Balancer?</h3>
<p>Azure Load Balancer is a <strong>traffic director</strong>. It takes incoming traffic from the internet (or even internal sources within your network) and intelligently spreads it across multiple backend machines – usually VMs.</p>
<p>It's like having a well-trained receptionist who routes every customer to the next available agent, so no one waits too long and no one gets overwhelmed 😃.</p>
<p>And the best part? This entire process happens in the background – fast, silent, and seamless. Users visiting your app have no idea a traffic manager is working behind the scenes. They just see a fast, responsive experience.</p>
<h3 id="heading-the-frontend-ip-your-apps-public-face">🌐 The Frontend IP – Your App’s Public Face</h3>
<p>Every Azure Load Balancer is tied to a <strong>Frontend IP</strong> – the entry-point IP address of your application, the one users connect to when they open <code>www.yourapp.com</code>.</p>
<p>This IP acts as the entry point. All user traffic comes through it first. But the Load Balancer doesn’t actually run your app. Instead, it accepts the traffic and forwards it to one of the VMs in the backend pool (we’ll get to that shortly).</p>
<p>You can configure this Frontend IP to be either public (accessible over the internet) or private (used for internal traffic within your cloud network – say, between microservices or internal tools).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747055268951/5afbb738-d00d-4f49-9709-2fa1fe7cffdd.png" alt="Frontend IP address illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-backend-pool-where-the-magic-happens">🗂️ Backend Pool – Where the Magic Happens</h3>
<p>Behind every Azure Load Balancer is a <strong>backend pool</strong> – a group of VMs (or VM Scale Set instances) where your actual app is running. These are the real workers, doing all the heavy lifting.</p>
<p>When traffic hits the Frontend IP, the Load Balancer takes that request and hands it off to one of the VMs in the backend pool.</p>
<p>But it doesn’t just randomly pick one. It checks a few things first – like whether the VM is healthy, whether it's already busy, and what rules you’ve set.</p>
<p>Each VM in the pool typically runs the same app or service. This means any of them can handle any incoming request, which is what makes load balancing possible in the first place.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747055337014/e831056d-7c0c-49d9-b05a-6d3dbe3edc76.png" alt="Backend pool illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
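<p>To sketch how that hand-off works: Azure Load Balancer’s default distribution mode hashes each connection’s 5-tuple (source IP, source port, destination IP, destination port, protocol) to pick a backend. The snippet below is an illustrative approximation of that idea, not Azure’s actual algorithm – all names and addresses are made up:</p>

```python
import hashlib

def pick_backend(src_ip, src_port, dst_ip, dst_port, proto, healthy_vms):
    """Pick a VM by hashing the connection's 5-tuple, so packets from the
    same flow keep landing on the same VM while different flows spread
    across the pool (illustrative, not Azure's real implementation)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return healthy_vms[int.from_bytes(digest[:4], "big") % len(healthy_vms)]

pool = ["vm-0", "vm-1", "vm-2"]
# The same flow always maps to the same VM:
first = pick_backend("203.0.113.7", 51000, "52.160.100.5", 80, "tcp", pool)
again = pick_backend("203.0.113.7", 51000, "52.160.100.5", 80, "tcp", pool)
assert first == again and first in pool
```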
<h3 id="heading-health-probes-keeping-tabs-on-the-vms">🩺 Health Probes – Keeping Tabs on the VMs</h3>
<p>Now, how does the Load Balancer know which VM is healthy or not? This is where <strong>health probes</strong> come in. Think of them as regular check-ups.</p>
<p>You configure the Load Balancer to periodically "ping" each VM – maybe by hitting a specific URL (like <code>/health</code>) or a certain port (like 80 for HTTP). If a VM doesn’t respond correctly, Azure marks it as unhealthy and temporarily removes it from the rotation.</p>
<p>This ensures users never get routed to a broken or unresponsive instance of your app. And once the VM becomes healthy again, it's automatically added back to the pool.</p>
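<p>Here’s a hedged sketch of that check-up logic. The function name and the two-failures threshold are invented for illustration – Azure lets you configure the probe interval and unhealthy threshold, but doesn’t expose this exact function:</p>

```python
def update_pool(probe_results, fail_threshold=2):
    """Given recent probe results per VM (True = responded OK),
    return the VMs that should stay in rotation. A VM is pulled out
    once it fails `fail_threshold` probes in a row; a VM with fewer
    probes than the threshold is kept until we know more."""
    in_rotation = []
    for vm, results in probe_results.items():
        recent = results[-fail_threshold:]
        if len(recent) < fail_threshold or any(recent):
            in_rotation.append(vm)  # still healthy enough to serve traffic
    return in_rotation

probes = {
    "vm-0": [True, True, True],    # healthy
    "vm-1": [True, False, False],  # 2 failures in a row → removed
    "vm-2": [False, True, True],   # recovered → back in the pool
}
print(update_pool(probes))  # → ['vm-0', 'vm-2']
```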
<h3 id="heading-load-balancing-rules-who-gets-what">⚖️ Load Balancing Rules – Who Gets What?</h3>
<p>Next, we have <strong>Load Balancing Rules</strong>. These are the instructions that tell Azure Load Balancer exactly how to behave.</p>
<p>You can define rules like:</p>
<ul>
<li><p>“Forward all HTTP (port 80) traffic to backend pool VMs on port 80”</p>
</li>
<li><p>“Forward HTTPS (port 443) traffic to VMs on port 443”</p>
</li>
<li><p>“Only route traffic to healthy VMs”</p>
</li>
</ul>
<p>These rules make Azure Load Balancer highly customizable. You get to decide how traffic flows, which protocols to support, and how to handle backend ports. It's like customizing the rules of a relay race – who gets the baton and when.</p>
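<p>Conceptually, each rule is just a mapping from a frontend protocol and port to a backend port. A minimal sketch – the names are illustrative, not the actual Azure API:</p>

```python
# Each rule maps a frontend (protocol, port) pair to a backend port.
rules = {
    ("tcp", 80):  80,   # HTTP straight through
    ("tcp", 443): 443,  # HTTPS straight through
}

def forward(protocol, frontend_port):
    """Return the backend port a request should be forwarded to,
    or None if no rule matches (the connection is simply dropped)."""
    return rules.get((protocol, frontend_port))

print(forward("tcp", 80))    # → 80
print(forward("tcp", 8080))  # → None
```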
<h3 id="heading-real-world-example-sneaker-sale-rush">👟 Real-World Example: Sneaker Sale Rush</h3>
<p>Imagine you're running an online sneaker store at <code>www.sneakerblast.com</code>. You’re launching a flash sale, and thousands of users are hitting your website all at once.</p>
<p>Thanks to your Azure Load Balancer, here’s what happens:</p>
<ol>
<li><p>All those users land on your Frontend IP, the public face of your site.</p>
</li>
<li><p>The Load Balancer accepts the traffic and checks the health probes of all VMs in the backend pool.</p>
</li>
<li><p>Based on its rules, it forwards each user to a healthy, available VM.</p>
</li>
<li><p>One VM might serve a user in Lagos, another in Nairobi, another in Accra – all seamlessly.</p>
</li>
</ol>
<p>If one VM crashes or lags? The Load Balancer detects it instantly and stops routing traffic to it until it’s back online.</p>
<p>That’s smooth traffic management without any manual effort.</p>
<h2 id="heading-azure-application-gateway-smart-routing-for-modern-apps">🍴 Azure Application Gateway – Smart Routing for Modern Apps</h2>
<p>So far, we’ve seen how Azure Load Balancer helps you split traffic across multiple VMs running a single service – like a monolithic app or a web frontend.</p>
<p>Let’s say you have a web application deployed on a VM. It listens on port 80, and you’ve scaled it into 3 instances. The Azure Load Balancer takes requests from the internet and spreads them across all 3 instances of the same service. Easy, right?</p>
<p>You can even link the Load Balancer’s public IP address to your domain – like <code>mydomain.com</code> – so users can visit your site normally.</p>
<h3 id="heading-but-what-if-you-have-multiple-services">🧠 But What If You Have <em>Multiple</em> Services?</h3>
<p>Now imagine you’ve gone beyond just one app. You’re building something more modern, like a set of microservices.</p>
<p>You now have:</p>
<ul>
<li><p>A payment service listening on port 5000</p>
</li>
<li><p>An authentication service on port 6000</p>
</li>
<li><p>A purchase service on port 7000</p>
</li>
</ul>
<p>All deployed across the same VMs (or Virtual Machine Scale Set), just on different ports.</p>
<p>Here’s the problem: an Azure Load Balancer is designed to route traffic to <em>one</em> backend pool – basically one service – on one port. If you tie it to <code>mydomain.com</code>, it can only send traffic to one of your microservices. 😬</p>
<p>So… what do you do?</p>
<p>You might think: “Let me just create a separate Load Balancer for each service!” 🤕</p>
<p>But that means:</p>
<ul>
<li><p>You’ll have to pay for multiple load balancers</p>
</li>
<li><p>You’ll end up managing 3–5 public IP addresses</p>
</li>
<li><p>You might even need to buy multiple domains like <code>mypayment.com</code>, <code>myauth.com</code>, and so on to route users properly</p>
</li>
</ul>
<p>Yikes. That’s impractical, messy, <em>and</em> expensive 😖💸</p>
<h3 id="heading-enter-azure-application-gateway">🎉 Enter Azure Application Gateway</h3>
<p><strong>Azure Application Gateway</strong> solves this problem beautifully. It’s designed to route traffic intelligently – not just to one service, but to multiple services using just one gateway.</p>
<p>It works like this:</p>
<ol>
<li><p>You create one public-facing frontend IP (like <code>52.160.100.5</code>)</p>
</li>
<li><p>You link that IP address to your main domain, for example <code>mydomain.com</code></p>
</li>
<li><p>Then, you define multiple backend pools – one for each service:</p>
<ul>
<li><p>Payment service (port 5000)</p>
</li>
<li><p>Auth service (port 6000)</p>
</li>
<li><p>Purchase service (port 7000)</p>
</li>
</ul>
</li>
<li><p>Next, you set up routing rules that decide how to forward each request.</p>
</li>
</ol>
<h3 id="heading-two-ways-to-route-with-application-gateway">✨ Two Ways to Route with Application Gateway</h3>
<p>You can configure <strong>smart routing</strong> based on:</p>
<ul>
<li><p><strong>URL paths</strong>:</p>
<ul>
<li><p><code>mydomain.com/payment</code> → Payment service</p>
</li>
<li><p><code>mydomain.com/auth</code> → Auth service</p>
</li>
</ul>
</li>
<li><p><strong>Subdomains</strong> (host headers):</p>
<ul>
<li><p><code>payment.mydomain.com</code> → Payment service</p>
</li>
<li><p><code>auth.mydomain.com</code> → Auth service</p>
</li>
</ul>
</li>
</ul>
<p>This way, all your services share one public IP and one domain – super clean, super efficient 🙌🏾</p>
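<p>Both routing styles can be sketched in a single lookup – host (subdomain) rules checked first, then URL-path prefixes. The pool names and ports mirror the example above, but the function itself is invented for illustration; the real gateway evaluates routing rules you configure, not Python:</p>

```python
def route(host, path):
    """Toy Application Gateway routing: match on subdomain (host header)
    first, then fall back to URL-path prefixes."""
    host_rules = {
        "payment.mydomain.com": ("payment-pool", 5000),
        "auth.mydomain.com": ("auth-pool", 6000),
    }
    path_rules = {
        "/payment": ("payment-pool", 5000),
        "/auth": ("auth-pool", 6000),
    }
    if host in host_rules:
        return host_rules[host]
    for prefix, pool in path_rules.items():
        if path.startswith(prefix):
            return pool
    return ("default-pool", 80)  # fallback backend

print(route("mydomain.com", "/auth/login"))  # → ('auth-pool', 6000)
print(route("payment.mydomain.com", "/"))    # → ('payment-pool', 5000)
```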
<h3 id="heading-real-life-scenario-lets-break-it-down">🤓 Real-Life Scenario (Let’s Break It Down)</h3>
<p>Let’s say you’re building a startup platform that has three key microservices:</p>
<ul>
<li><p><strong>Payment service</strong> that handles transactions</p>
</li>
<li><p><strong>Authentication service</strong> that handles login and user identity</p>
</li>
<li><p><strong>Purchase service</strong> that manages product ordering</p>
</li>
</ul>
<p>Each service is containerized and deployed on the same VM (or across several VMs using a VM Scale Set). But – and this is key – they all listen on <strong>different ports</strong> inside the VMs:</p>
<ul>
<li><p>Payment → port 5000</p>
</li>
<li><p>Auth → port 6000</p>
</li>
<li><p>Purchase → port 7000</p>
</li>
</ul>
<p>Now, without a smart routing solution, you’d be stuck trying to expose just one of these services using a standard Azure Load Balancer. But you need all three to be accessible from the internet – and you don’t want to pay for or manage 3 different Load Balancers 😅</p>
<p>So, what do you do?</p>
<h3 id="heading-using-azure-application-gateway-to-route-traffic-intelligently">🧠 Using Azure Application Gateway to Route Traffic Intelligently</h3>
<p>Here's how you can fix this using <strong>one</strong> Application Gateway:</p>
<ol>
<li><p>Deploy your microservices inside each VM:</p>
<ul>
<li><p>Each service runs on a specific port</p>
</li>
<li><p>All VMs in your scale set are identical (they contain all three services)</p>
</li>
</ul>
</li>
<li><p>Create backend pools in Application Gateway:</p>
<ul>
<li><p>A backend pool for the payment service (pointing to port 5000 on all VMs)</p>
</li>
<li><p>One for the auth service (port 6000)</p>
</li>
<li><p>Another for the purchase service (port 7000)</p>
</li>
</ul>
</li>
<li><p>Create routing rules:</p>
<ul>
<li><p>Option A (Path-based routing):</p>
<ul>
<li><p>Requests to <code>mydomain.com/payment</code> → go to the payment backend pool</p>
</li>
<li><p>Requests to <code>mydomain.com/auth</code> → go to the auth backend pool</p>
</li>
<li><p>Requests to <code>mydomain.com/purchase</code> → go to the purchase backend pool</p>
</li>
</ul>
</li>
<li><p>Option B (Subdomain-based routing):</p>
<ul>
<li><p><code>payment.mydomain.com</code> → payment service</p>
</li>
<li><p><code>auth.mydomain.com</code> → auth service</p>
</li>
<li><p><code>purchase.mydomain.com</code> → purchase service</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>You just tell the Application Gateway: “Hey, if a request comes in for this URL or subdomain, send it to this port on these VMs.” And it does just that – consistently and intelligently 🔁</p>
<h3 id="heading-so-whats-really-happening">📦 So, What’s Really Happening?</h3>
<p>Imagine a user visits <code>mydomain.com/auth</code>. Here’s what goes on behind the scenes:</p>
<ol>
<li><p>DNS resolves <code>mydomain.com</code> to your Application Gateway’s public IP</p>
</li>
<li><p>The Gateway receives the request</p>
</li>
<li><p>It checks your routing rules</p>
</li>
<li><p>It sees that <code>/auth</code> should go to the backend pool for port 6000</p>
</li>
<li><p>It forwards the request to one of the VMs running the auth service</p>
</li>
<li><p>The response goes back to the user – fast and seamless ✨</p>
</li>
</ol>
<p>This happens in milliseconds, for every request. And because the Application Gateway is aware of multiple ports and services, it can handle routing logic that a regular Load Balancer just can’t do.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747056436345/7ea97231-d2ee-4f63-aff1-50595e7c06e0.png" alt="Application Gateway Illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-azure-load-balancer-vs-azure-application-gateway">🔍 Azure Load Balancer vs Azure Application Gateway</h2>
<p>By now, you've seen how both tools help route traffic in Azure – but they solve different problems.</p>
<p>Let’s break down how they compare, and when you should use one over the other 👇🏾</p>
<h3 id="heading-1-routing-logic">🛣️ 1. <strong>Routing Logic</strong></h3>
<p><strong>Azure Load Balancer</strong><br>It simply distributes incoming traffic evenly across a pool of VMs. It doesn’t care <em>what</em> the request is – it just balances the load.  </p>
<p>Imagine a delivery guy who doesn't ask questions – he just drops each package at the next available house.  </p>
<p>That’s what Azure Load Balancer does: it sends traffic to one of your servers without looking inside the request.</p>
<p><strong>Azure Application Gateway</strong><br>This is the smart one. It looks at <em>what’s inside</em> each request (like the URL path or domain) and makes intelligent decisions.</p>
<p>Just like a smarter delivery guy who looks at the address and decides where to go: "Oh! This one is for the payment office, not the main office."  </p>
<p>That’s what Application Gateway does: it reads the request (like the URL or domain name) and sends it to the right place according to the routing rules.</p>
<h3 id="heading-2-protocols-handled">🌐 2. <strong>Protocols Handled</strong></h3>
<p><strong>Load Balancer</strong><br>Works at the transport layer (Layer 4 in the OSI model). It deals with raw TCP/UDP traffic, forwarding packets for anything running over those protocols – web requests, video streams, game traffic, and so on – without inspecting them.</p>
<p><strong>Application Gateway</strong><br>Works at the application layer (Layer 7). It handles web traffic only – like websites and apps (HTTP/HTTPS) – and it can actually read what's being asked, like:</p>
<ul>
<li><p>“Go to /login”</p>
</li>
<li><p>“Go to <a target="_blank" href="http://payment.mydomain.com">payment.mydomain.com</a>”.</p>
</li>
</ul>
<p>TL;DR: Load Balancer just pushes packets. App Gateway actually <em>reads</em> your web requests.</p>
<h3 id="heading-3-use-case-scenarios">🔁 3. <strong>Use Case Scenarios</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Situation</td><td>Best Choice</td></tr>
</thead>
<tbody>
<tr>
<td>You have one big app and just want to spread users across servers</td><td>✅ Load Balancer</td></tr>
<tr>
<td>You have multiple services (like login, payment, and so on) and need to send users to the right one</td><td>✅ Application Gateway</td></tr>
<tr>
<td>You want to use subdomains (like <a target="_blank" href="http://login.mysite.com">login.mysite.com</a>)</td><td>✅ Application Gateway</td></tr>
<tr>
<td>You want to secure your website with HTTPS and Web Application Firewall (WAF)</td><td>✅ Application Gateway</td></tr>
<tr>
<td>You want the simplest setup and lowest cost</td><td>✅ Load Balancer</td></tr>
</tbody>
</table>
</div><h3 id="heading-4-ssl-termination-amp-security-features">🔐 4. <strong>SSL Termination &amp; Security Features</strong></h3>
<p><strong>Load Balancer</strong> doesn’t handle security stuff. You’ll need to secure each server yourself (for example, set up HTTPS on each one).</p>
<p><strong>Application Gateway</strong> can secure everything in one place – you upload your SSL certificate once and it takes care of HTTPS for all services.</p>
<p>It can also protect you from hackers and bad traffic with something called <strong>WAF (Web Application Firewall)</strong>, which protects your app from threats like SQL injection, XSS, and so on (you need to set this up manually).</p>
<h3 id="heading-5-pricing-and-complexity">💰 5. <strong>Pricing and Complexity</strong></h3>
<p><strong>Load Balancer</strong> is cheaper and easier to set up. Great when you don’t need anything fancy.</p>
<p><strong>Application Gateway</strong> costs more, but gives you more control and less headache when working with complex apps and microservices.</p>
<p>Trying to use Load Balancer for multiple services? You’ll need to create one Load Balancer per service, which becomes costly and impractical.</p>
<h3 id="heading-summary-table">🧠 Summary Table</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Load Balancer</td><td>Application Gateway</td></tr>
</thead>
<tbody>
<tr>
<td>Can it understand the request?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Can it route based on URL or subdomain?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Can it terminate HTTPS (SSL offload)?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Is it good for simple apps?</td><td>✅ Yes</td><td>✅ Yes</td></tr>
<tr>
<td>Is it good for complex apps with many services?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Cost</td><td>💲 Lower</td><td>💰 Higher</td></tr>
</tbody>
</table>
</div><h2 id="heading-use-cases-when-to-use-each-one">🧭 Use Cases: When to Use Each One</h2>
<p>There’s no one-size-fits-all when it comes to hosting apps in the cloud. The right setup depends on what you’re building, how much traffic you expect, and how complex your app is.</p>
<p>Let’s walk through 4 different use-case scenarios, starting from the most basic setup all the way to a fully auto-scaled and smartly routed architecture.</p>
<h3 id="heading-1-single-vm-instance-for-small-projects-or-internal-tools">1️⃣ <strong>Single VM Instance – For Small Projects or Internal Tools</strong></h3>
<p><strong>Use this when:</strong><br>You're just getting started. You’ve built a small app – maybe a portfolio, a blog, or a side project – and you want to make it live, or you’re a startup that just launched.</p>
<p><strong>How it works:</strong><br>You spin up one Azure VM, install your app on it, and open the port it listens on (for example, port 80 for a web server). You can then attach a public IP to the VM and bind it to a custom domain like <code>myawesomeapp.com</code>.</p>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A developer hosting a portfolio website or blog</p>
</li>
<li><p>A startup testing a new product with only a few users</p>
</li>
<li><p>An internal company tool for a small team</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Super simple setup</p>
</li>
<li><p>Low cost</p>
</li>
<li><p>Full control of your environment</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>If the VM goes down, your app goes down</p>
</li>
<li><p>No auto-scaling – performance may drop with traffic spikes (the only way to adapt to increased CPU/memory usage is to manually scale the VM vertically)</p>
</li>
<li><p>You manually maintain and monitor everything</p>
</li>
</ul>
<h3 id="heading-2-manual-horizontal-scaling-for-apps-with-medium-predictable-traffic">2️⃣ <strong>Manual Horizontal Scaling – For Apps With Medium, Predictable Traffic</strong></h3>
<p><strong>Use this when:</strong><br>Your app is growing – maybe you have a few thousand users now, and performance matters. You want more than one server so your app doesn’t crash during busy hours.</p>
<p><strong>How it works:</strong><br>You manually create 2 or 3 Azure VMs with the same app setup. You then add a Load Balancer in front to split traffic evenly across them.</p>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A business with a customer portal</p>
</li>
<li><p>A school website that handles regular logins, lecture video streaming, and so on during class hours</p>
</li>
<li><p>An app that gets traffic mostly during the day (predictable load)</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Better performance and availability</p>
</li>
<li><p>Load is shared across multiple VMs</p>
</li>
<li><p>You can scale manually when needed</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>You must manually add or remove VMs – which takes effort</p>
</li>
<li><p>Still need to monitor performance manually</p>
</li>
<li><p>No built-in automation or auto-healing</p>
</li>
</ul>
<h3 id="heading-3-auto-scaling-with-vm-scale-sets-azure-load-balancer-for-apps-with-spiky-or-unpredictable-traffic">3️⃣ <strong>Auto-Scaling with VM Scale Sets + Azure Load Balancer – For Apps With Spiky or Unpredictable Traffic</strong></h3>
<p><strong>Use this when:</strong><br>You’re building something more serious – traffic comes in waves (for example, a fitness/coach booking app), and you don’t want to sit around scaling VMs all day. You want Azure to automatically scale your infrastructure for you.</p>
<p><strong>How it works:</strong><br>You set up a Virtual Machine Scale Set (VMSS) that can automatically create more VMs when needed (like during high traffic), and remove them when things are calm — saving money. A Load Balancer distributes traffic across all those VMs.</p>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A media platform where people upload videos or photos</p>
</li>
<li><p>A shopping site that gets surges during promotions, for example, Black Friday</p>
</li>
<li><p>A booking platform with peak traffic in evenings/weekends</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Automatic scaling – saves time and money</p>
</li>
<li><p>High availability: VMs can be replaced if one fails</p>
</li>
<li><p>Easy to grow as your user base grows</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>Works best if your app is monolithic (one big service)</p>
</li>
<li><p>No support for routing traffic to specific services – just spreads traffic across VMs</p>
</li>
<li><p>Load Balancer can’t look at URL paths or subdomains</p>
</li>
</ul>
<h3 id="heading-4-vm-scale-set-azure-application-gateway-for-microservices-or-complex-web-apps">4️⃣ <strong>VM Scale Set + Azure Application Gateway – For Microservices or Complex Web Apps</strong></h3>
<p><strong>Use this when:</strong><br>You have a modern, multi-service app – maybe built with microservices. Each service (like payments, authentication, search, and so on) lives on a different port or even in a container.</p>
<p>You want to route traffic smartly – like <code>/login</code> goes to the auth service, <code>/pay</code> to payments, and <code>/search</code> to the search service – all on the same domain.</p>
<p><strong>How it works:</strong><br>You still use a VM Scale Set for auto-scaling, but instead of a basic Load Balancer, you add an Application Gateway. It can inspect each request and send it to the right service based on things like:</p>
<ul>
<li><p>URL path (for example, <code>/payments</code>, <code>/orders</code>)</p>
</li>
<li><p>Subdomain (for example, <code>payments.mydomain.com</code>, <code>auth.mydomain.com</code>)</p>
</li>
</ul>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A full-blown SaaS product with multiple services</p>
</li>
<li><p>An e-commerce site with checkout, account, orders, and admin dashboards</p>
</li>
<li><p>A business migrating from a monolith to a microservices setup</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Smart routing based on path or subdomain</p>
</li>
<li><p>Everything runs under one public IP and one domain</p>
</li>
<li><p>Secure HTTPS handling + optional Web Application Firewall (WAF)</p>
</li>
<li><p>Auto-scaling and high availability</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>More complex setup</p>
</li>
<li><p>Slightly higher cost due to Application Gateway</p>
</li>
<li><p>Needs planning around port numbers and backend pools</p>
</li>
</ul>
<h3 id="heading-quick-summary-table">🧠 Quick Summary Table</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Setup</td><td>Best For</td><td>Scaling</td><td>Routing Logic</td><td>Cost</td><td>Ease</td></tr>
</thead>
<tbody>
<tr>
<td>☁️ Single VM</td><td>Small sites, personal apps</td><td>❌ (Manual)</td><td>❌ One app only</td><td>💲 (Lowest)</td><td>⭐⭐⭐⭐</td></tr>
<tr>
<td>🧱 Manual Horizontal Scaling + Load Balancer</td><td>Mid-size apps, predictable traffic</td><td>✅ (Manual)</td><td>❌ One app only</td><td>💲💲💲 (due to multiple VMs running at once without down-scaling — even with no traffic)</td><td>⭐⭐ (due to manual scaling)</td></tr>
<tr>
<td>🔁 VMSS + Load Balancer</td><td>Busy apps, spiky traffic</td><td>✅ (Auto)</td><td>❌ One app only</td><td>💲💲</td><td>⭐⭐⭐</td></tr>
<tr>
<td>🍴 VMSS + App Gateway</td><td>Microservices, modern apps</td><td>✅ (Auto)</td><td>✅ Smart routing (involving multiple microservices)</td><td>💲💲💲💲(Highest)</td><td>⭐⭐</td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion">✅ Conclusion</h2>
<p>By now, you’ve gone from simply hearing the words “load balancer” or “scale set” to understanding exactly how they work, when to use them, and what problems they solve. Whether you’re just launching a small app or scaling up a high-traffic service, Azure gives you flexible, powerful tools to grow with confidence.</p>
<p>We started from the very beginning – a single virtual machine. It’s simple and great for small apps, but it quickly becomes a bottleneck as traffic grows.</p>
<p>That’s where scaling comes in. We explored:</p>
<ul>
<li><p>🧱 <strong>Vertical scaling</strong> – Upgrading the same VM (quick fix, but limited)</p>
</li>
<li><p>🧩 <strong>Horizontal scaling</strong> – Adding more VMs to handle traffic better</p>
</li>
</ul>
<p>Then we introduced Azure Virtual Machine Scale Sets (VMSS) – which bring auto-scaling to life. No more manual intervention – Azure can scale your servers up and down based on demand.</p>
<p>But where things really get smart is with load balancers:</p>
<ul>
<li><p>📦 <strong>Azure Load Balancer</strong> helps spread traffic across your VMs — great for single-service apps</p>
</li>
<li><p>🍴 <strong>Azure Application Gateway</strong> takes it further by routing requests based on URL paths or subdomains — perfect for multi-service or microservice apps</p>
</li>
</ul>
<h3 id="heading-tldr-what-should-you-use">🎯 TL;DR – What Should You Use?</h3>
<ul>
<li><p><strong>Single VM</strong>: For side projects, portfolios, or internal tools</p>
</li>
<li><p><strong>Manual scaling + Load Balancer</strong>: For medium apps with predictable load</p>
</li>
<li><p><strong>VMSS + Load Balancer</strong>: For monolithic apps with auto-scaling needs</p>
</li>
<li><p><strong>VMSS + Application Gateway</strong>: Also includes auto-scaling but for microservices or smart routing needs</p>
</li>
</ul>
<h3 id="heading-final-thoughts">💡 Final Thoughts</h3>
<p>Cloud apps grow – fast. And with growth comes complexity. But with the right Azure setup, you can stay one step ahead of your traffic, serve users better, and keep costs under control.</p>
<p>Remember: you don’t need to start big. Start small, understand your app's traffic patterns, and scale only when you need to. Tools like Azure VM Scale Sets, Load Balancer, and Application Gateway give you the control and power to build scalable, modern applications without over-engineering.</p>
<p>Thanks for sticking with me through this deep dive. I hope this made things clearer, simpler, and maybe even a little fun 😊</p>
<h2 id="heading-study-further"><strong>Study Further 📚</strong></h2>
<p>If you would like to learn more about Azure Virtual Machines, Scale Sets, Load Balancer, and Application Gateway, you can check out the courses below:</p>
<ul>
<li><p><a target="_blank" href="https://www.coursera.org/specializations/microsoft-azure-fundamentals-az900-exam-prep">Microsoft Azure Fundamentals AZ-900 Exam Prep Specialization</a> — Microsoft, Coursera</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/QOv_-xBXkpo?si=kSijmQdev5cQbRKl">Azure Virtual Machine Tutorial | Creating A Virtual Machine In Azure | Azure Training | Simplilearn</a> — YouTube</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/wN4lRWHUHA0?si=kWBGXhXZTnVgzuEj">Virtual machine scale sets</a> — YouTube</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/VqBGjddK5VY?si=diLGQfuW5i0lxbse">Azure Load Balancer | Azure Load Balancer Tutorial | All About Load Balancer | Edureka</a> — YouTube</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/V9EP4jAg4QM?si=t7EqQjw1eNHqOtjK">Azure Application Gateway Deep dive | Step by step explained</a> — YouTube</p>
</li>
</ul>
<h2 id="heading-about-the-author"><strong>About the Author 👨‍💻</strong></h2>
<p>Hi, I’m Prince! I’m a DevOps engineer and Cloud architect passionate about building, deploying, and managing scalable applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this article, you can learn more about me by exploring more of my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile.</a> You can find my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">LinkedIn articles here</a>. You can also <a target="_blank" href="https://prince-onuk.vercel.app/achievements#articles">visit my website</a> to read more of my articles as well. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Kubernetes – Full Handbook for Developers, Startups, and Businesses ]]>
                </title>
                <description>
<![CDATA[ You’ve probably heard the word Kubernetes floating around, or its cooler nickname k8s (pronounced “kates”). Maybe in a job post, a tech podcast, or from that one DevOps friend who always brings it up like it’s the secret sauce to everything 😅. It s... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-kubernetes-handbook-devs-startups-businesses/</link>
                <guid isPermaLink="false">68150214fd424d0874293171</guid>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                    <category>
                        <![CDATA[ containers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Fri, 02 May 2025 17:34:12 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746205417767/d9d6b0d3-f2a5-44eb-83b5-d1a614bead9f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You’ve probably heard the word Kubernetes floating around, or its cooler nickname k8s (pronounced “kates”). Maybe in a job post, a tech podcast, or from that one DevOps friend who always brings it up like it’s the secret sauce to everything 😅. It sounds important, but also... kinda mysterious.</p>
<p>So what is Kubernetes, really? Why is it everywhere? And should you care?</p>
<p>In this handbook, we’ll unpack Kubernetes in a way that actually makes sense. No buzzwords. No overwhelming tech-speak. Just straight talk. You’ll learn what Kubernetes is, how it came about, and why it became such a big deal – especially for teams building and running huge apps with millions of users.</p>
<p>We’ll rewind a bit to see how things were done before Kubernetes showed up (spoiler: it wasn’t pretty), and walk through the real problems it was designed to solve.</p>
<p>By the end, you’ll not only understand the purpose of Kubernetes, but you’ll also know how to deploy a simple app on a Kubernetes cluster – even if you’re just getting started.</p>
<p>Yep, by the time we’re done, you’ll go from <em>“I keep hearing about Kubernetes”</em> to <em>“Hey, I kinda get it now!”</em> 😄</p>
<h2 id="heading-table-of-contents">📚 Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-kubernetes">What is Kubernetes?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-applications-were-deployed-before-kubernetes">How Applications Were Deployed Before Kubernetes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-problem-kubernetes-solves">The Problem Kubernetes Solves 🧠</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-kubernetes-works-components-of-a-kubernetes-environment">How Kubernetes Works – Components of a Kubernetes Environment 🧑‍🔧</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-kubernetes-workloads-pods-deployments-services-amp-more">Kubernetes Workloads 🛠️ – Pods, Deployments, Services, &amp; More</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-a-kubernetes-cluster-in-a-demo-environment-with-play-with-k8s">How to Create a Kubernetes Cluster in a Demo Environment with play-with-k8s</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-sign-in-to-play-with-kubernetes">Sign in to Play with Kubernetes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-create-your-kubernetes-cluster">Create Your Kubernetes Cluster</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-deploy-an-application-on-your-kubernetes-cluster">How to Deploy an Application on Your Kubernetes Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-advantages-of-using-kubernetes-in-business">✅ Advantages of Using Kubernetes in Business</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-disadvantages-of-using-kubernetes">😬 Disadvantages of Using Kubernetes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-use-cases-when-and-when-not-to-use-kubernetes">Use Cases: When (and When Not) to Use Kubernetes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-study-further">Study Further 📚</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-about-the-author">About the Author 👨‍💻</a></p>
</li>
</ol>
<h2 id="heading-what-is-kubernetes"><strong>What is Kubernetes?</strong></h2>
<p>Imagine you're building a huge software platform, like a banking app. This app needs many features, like user onboarding, depositing money, withdrawals, payments, and so on. These features are so big and complex that it’s easier to split them into separate applications. These individual applications are called microservices.</p>
<p><strong>So what are Microservices</strong>? Think of them like little building blocks that work together to create a bigger platform. So, you might have:</p>
<ul>
<li><p>One microservice for user onboarding</p>
</li>
<li><p>Another for processing deposits</p>
</li>
<li><p>Another for handling payments</p>
</li>
<li><p>And many, many more!</p>
</li>
</ul>
<p>To the user, it still looks like they’re using one smooth, unified banking app. But behind the scenes, it’s like a bunch of little apps working together to make everything run.</p>
<h3 id="heading-but-heres-where-things-get-tricky">But here’s where things get tricky...</h3>
<p>When you have dozens (or even hundreds) of these microservices, managing them becomes a nightmare. You might need to:</p>
<ul>
<li><p><strong>Deploy</strong> each one separately</p>
</li>
<li><p><strong>Monitor</strong> them individually (to ensure they don’t crash/become slow due to too much load)</p>
</li>
<li><p><strong>Scale</strong> them (make them bigger to handle more users) as traffic surges, one by one</p>
</li>
</ul>
<p>So, if your banking app suddenly gets millions of users, you'd have to manually tweak and update each microservice to keep it running smoothly. 😖 It’s a lot of work, and if something goes wrong, you’re in deep trouble.</p>
<h3 id="heading-this-is-where-kubernetes-comes-to-the-rescue">This is where Kubernetes comes to the rescue! 🚀</h3>
<p>Kubernetes is like a super-efficient manager for all these microservices. It’s a platform that helps you:</p>
<ul>
<li><p><strong>Automate</strong> the deployment (getting the apps up and running)</p>
</li>
<li><p><strong>Scale</strong> the microservices (making them bigger or smaller as needed based on the inflow of traffic – your customers)</p>
</li>
<li><p><strong>Monitor</strong> them (keeping an eye on their health)</p>
</li>
<li><p><strong>Ensure reliability</strong> (so if one microservice breaks/fails, k8s replaces it immediately)</p>
</li>
</ul>
<p>In simple terms, Kubernetes takes all your little microservices and organizes them, ensuring they run smoothly together, no matter how much traffic your app gets. It handles everything behind the scenes, like a conductor leading an orchestra, so your microservices work together without chaos.</p>
<h2 id="heading-how-applications-were-deployed-before-kubernetes"><strong>How Applications Were Deployed Before Kubernetes</strong></h2>
<p>Before Kubernetes came into the picture, software teams had quite the juggling act when it came to deploying applications – especially when they were made up of lots of microservices.</p>
<p>One popular method was using a <strong>distributed system</strong> setup. Here’s what that looked like:</p>
<p>Imagine each microservice (like your user onboarding, payments, deposits, and so on) being installed on separate servers (physical computers or virtual machines). Each of these servers had to be carefully prepared:</p>
<ul>
<li><p>The microservice itself needed to be installed.</p>
</li>
<li><p>The software dependencies it needed (like programming languages, libraries, tools) also had to be installed.</p>
</li>
<li><p>Everything had to be configured manually ON EACH server.</p>
</li>
</ul>
<p>And all of these servers had to talk to each other – sometimes over the public internet, or via private networks like VPNs.</p>
<p>Sounds like a lot of work, right? 😮 It was! Managing updates, fixing bugs, scaling up during traffic spikes, and keeping things from crashing could turn into a full-time headache for developers and system admins. 😖</p>
<h3 id="heading-then-came-containers">Then Came Containers 🚢</h3>
<p>A more modern solution that eased the pain (a little) was using containers.</p>
<p><strong>So, what are containers?</strong></p>
<p>Think of a container like a lunchbox for your microservice. Instead of installing the microservice and its supporting tools directly on a server, you pack everything it needs – code, settings, software libraries – into this single, neat container. Wherever the container goes, the microservice runs exactly the same way. No surprises!</p>
<p>Tools like <a target="_blank" href="https://www.docker.com/">Docker</a> made this super easy. Once your microservice was packed into a container, you could deploy it on:</p>
<ul>
<li><p>A single server</p>
</li>
<li><p>Multiple servers</p>
</li>
<li><p>Or cloud platforms like AWS Elastic Beanstalk, Azure App Service, or Google Cloud Run.</p>
</li>
</ul>
<h2 id="heading-the-problem-kubernetes-solves"><strong>The Problem Kubernetes Solves</strong> 🧠</h2>
<p>At first, when containers arrived on the scene, it felt like developers had struck gold.</p>
<p>You could package a microservice into a neat little container and run it anywhere – no more installing the same software on every server again and again. Tools like Docker and Docker Compose made this smooth for small projects.</p>
<p>But the real world? That’s where it got messy.</p>
<h3 id="heading-the-growing-headache-of-managing-containers">The Growing Headache of Managing Containers 💡</h3>
<p>When you have just a few microservices, you can manually deploy and manage their containers without much stress. But when your app grows – and you suddenly have dozens or even hundreds of microservices – managing them becomes an uphill battle:</p>
<ul>
<li><p>You had to deploy each container manually.</p>
</li>
<li><p>You had to restart them if one crashed.</p>
</li>
<li><p>You had to scale them one by one when more users started flooding in.</p>
</li>
</ul>
<p>Docker and Docker Compose were great for a small playground or startups, but not for an enterprise application with high traffic inflow.</p>
<h3 id="heading-cloud-managed-services-helped-but-only-up-to-a-point">Cloud-Managed Services Helped... But Only Up To a Point 🧑‍💻</h3>
<p>Cloud services like AWS Elastic Beanstalk, Azure App Service, and Google Cloud Run offered a shortcut. They let you deploy containers without worrying about setting up servers.</p>
<p>You could:</p>
<ul>
<li><p>Deploy each container on its own managed cloud instance.</p>
</li>
<li><p>Scale them automatically based on traffic.</p>
</li>
</ul>
<p>BUT there were still some big headaches:</p>
<h4 id="heading-grouping-microservices-was-awkward-and-expensive">📦 Grouping microservices was awkward and expensive</h4>
<p>Sure, you could organize containers by environment (like “testing” or “production”) or even by team (like “Finance” or “HR”). But each new microservice usually needed its own cloud instance – for example, a separate Azure App Service or Elastic Beanstalk environment FOR EVERY SINGLE CONTAINER.</p>
<p>Imagine this:</p>
<ul>
<li><p>Each App Service instance costs ~$50 per month.</p>
</li>
<li><p>You’ve got 10 microservices.</p>
</li>
<li><p>That’s $500/month... even if they’re barely used. 💸 Yikes!</p>
</li>
</ul>
<h3 id="heading-kubernetes-smarter-leaner-and-more-flexible">Kubernetes: Smarter, Leaner, and More Flexible 💪</h3>
<p>With Kubernetes, you don’t need to spin up a separate server for each microservice. You can start with just one or two servers (VMs) – and Kubernetes will automatically decide which container goes where based on available space and resources.</p>
<p>No stress, no waste! 💡</p>
<h3 id="heading-kubernetes-lets-you-customize-everything">🧑‍🍳 <strong>Kubernetes Lets You Customize Everything</strong></h3>
<ol>
<li><p>You can assign resources to each microservice container.<br> 👉 Example: If you have a "Payment" microservice that’s lightweight, you might give it 0.5 vCPUs and 512MB of memory. If you have a "Data Analytics" microservice that’s resource-hungry, you could give it 2 vCPUs and 4GB of memory.</p>
</li>
<li><p>You can set a minimum number of instances for each microservice.<br> 👉 Example: If you want at least 2 copies of your "Login" service always running (so your app doesn’t break if one fails), Kubernetes makes sure you always have 2 live copies at all times.</p>
</li>
<li><p>You can group your containers however you like:<br> 👉 By teams (Finance, HR, DevOps) or by environments (Testing, Staging, Production). Kubernetes makes this grouping super clean and logical.</p>
</li>
<li><p>You can automatically scale individual containers.<br> 👉 When more users flood your app, Kubernetes can create extra copies (called “replicas”) of only the containers that are under pressure. No more wasting resources on containers that don’t need it.</p>
</li>
<li><p>You can even scale your servers!<br> 👉 Kubernetes can automatically increase the number of servers (VMs) in your environment – called a <strong>Cluster</strong> – when traffic grows. So you could start with 2 VMs at $30 each ($60/month) and let Kubernetes add more servers only when necessary, rather than locking yourself into high fixed costs like $500/month for cloud-managed services.</p>
</li>
</ol>
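<p>To make these customizations concrete, here’s a rough sketch of a manifest expressing points 1–3 for a hypothetical “payment” microservice (the names, namespace, image, and values are all illustrative, not from a real project):</p>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment
  namespace: finance          # grouping by team (point 3)
spec:
  replicas: 2                 # minimum number of live copies (point 2)
  selector:
    matchLabels:
      app: payment
  template:
    metadata:
      labels:
        app: payment
    spec:
      containers:
        - name: payment
          image: nginx:1.27   # placeholder image
          resources:          # resources assigned to this container (point 1)
            requests:
              cpu: "500m"     # 0.5 vCPUs
              memory: 512Mi
</code></pre>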
<p>Also, Kubernetes works <strong>the same way everywhere</strong>. Whether you deploy your containers on AWS, Google Cloud, Azure, or even your own laptop – Kubernetes doesn’t care. Your setup stays the same.</p>
<p>Compare that to managed services like Elastic Beanstalk or Azure App Service – which tie you to their platform, making it super hard to switch later.</p>
<p>✅ <strong>In short:</strong> Kubernetes saves you money, time, and a whole lot of headaches. It lets you run, scale, and organize your microservices without being chained to a single cloud provider — and without drowning in manual work.</p>
<h2 id="heading-how-kubernetes-works-components-of-a-kubernetes-environment"><strong>How Kubernetes Works — Components of a Kubernetes Environment</strong> 🧑‍🔧</h2>
<p>So by now you’ve seen the problem: running dozens (or hundreds!) of microservices manually is like juggling too many balls – you’re bound to drop some.</p>
<p>That’s why Kubernetes was created. But... how does it actually do all this magic? Let’s first break it down with the technical definition (simple but sharp – perfect for interviews) and then the layperson’s analogy (so it sticks in your head!).</p>
<h3 id="heading-1-cluster">1️⃣ <strong>Cluster 🏰</strong></h3>
<p>A Kubernetes Cluster is the entire setup of machines (physical or cloud-based) where Kubernetes runs. It’s made of one or more Master Nodes and Worker Nodes, working together to deploy and manage containerized applications.</p>
<p>Think of a Kubernetes Cluster as your entire playground. This is the environment where all your microservices live, grow, and play together.</p>
<p>A cluster is made up of two types of computers (called nodes):</p>
<ul>
<li><p>Master Node (nowadays often called the Control Plane)</p>
</li>
<li><p>Worker Nodes</p>
</li>
</ul>
<h3 id="heading-2-master-node-control-plane">2️⃣ <strong>Master Node (Control Plane) 👑</strong></h3>
<p>The Master Node is like the brain of Kubernetes. It manages and coordinates the whole cluster – deciding which applications run where, monitoring health, and scaling things up or down as needed.</p>
<p>It’s like the boss of the entire cluster. It doesn’t run your applications directly. Instead, it:</p>
<ul>
<li><p>Watches over the worker nodes</p>
</li>
<li><p>Decides which microservice (container) goes where</p>
</li>
<li><p>Makes sure everything runs smoothly and fairly</p>
</li>
</ul>
<p>Think of it like a factory manager who tells machines what to do, when to start, when to stop, and where to send the next package.</p>
<p>Inside the Master Node are a few clever mini-components that handle the real work.</p>
<h3 id="heading-3-api-server">3️⃣ <strong>API Server 💌</strong></h3>
<p>The API Server is the front door to Kubernetes. It handles communication between users and the system, taking commands and feeding them into the cluster.</p>
<p>This is where you (or your team) give Kubernetes instructions. Whether you're deploying a new app or scaling an existing one, you "talk" to the API Server first. It's like submitting a request at the front desk – the API server passes it on to the right people (or machines).</p>
<h3 id="heading-4-scheduler">4️⃣ <strong>Scheduler 📅</strong></h3>
<p>The Scheduler assigns Pods (applications) to Worker Nodes based on available resources and needs.</p>
<p>Imagine you’ve asked Kubernetes to launch a new microservice. The Scheduler checks:</p>
<ul>
<li><p>Which worker node has enough space?</p>
</li>
<li><p>Which node has enough memory and CPU?</p>
</li>
<li><p>Where would this service run best?</p>
</li>
</ul>
<p>It makes the decision and assigns the microservice to the perfect spot. Smart, huh?</p>
<h3 id="heading-5-controller-manager">5️⃣ <strong>Controller Manager 🎛️</strong></h3>
<p>The Controller Manager runs controllers that watch over the cluster and ensures that the system’s actual state matches the desired state.</p>
<p>This component watches over the system like a hawk. Let’s say you told Kubernetes:<br><em>"Hey, I want 3 copies of my payment microservice running at all times."</em></p>
<p>If one of them crashes, the Controller Manager sees that and spins up a new one to replace it automatically. It makes sure the reality always matches the plan.</p>
<h3 id="heading-6-etcd">6️⃣ <strong>etcd 📚</strong></h3>
<p>etcd is Kubernetes' memory – a distributed key-value store where cluster data is saved: config files, state, and metadata.</p>
<p>Imagine a notebook where all rules, records, and plans are written down. Without etcd, Kubernetes would forget everything.</p>
<h3 id="heading-7-worker-nodes">7️⃣ <strong>Worker Nodes 💪</strong></h3>
<p>Worker Nodes are the servers that run the actual application containers, doing the heavy lifting in the cluster.</p>
<p>These are the machines where your microservices actually live and run. The Master Node gives orders, but the Worker Nodes do the heavy lifting – they run your containers!</p>
<p>Each worker node has a few helpers to manage its microservices:</p>
<ul>
<li><p>The Kubelet</p>
</li>
<li><p>The Kube Proxy</p>
</li>
</ul>
<h3 id="heading-8-kubelet">8️⃣ <strong>Kubelet 📢</strong></h3>
<p>The Kubelet is the agent that lives on each Worker Node and makes sure containers are healthy and running as expected.</p>
<p>It listens to the Master Node’s instructions. If the Master Node says <em>"Hey, run this container!"</em>, the Kubelet makes it happen and keeps it running. If something goes wrong, the Kubelet reports back to the Master Node.</p>
<h3 id="heading-9-kube-proxy">9️⃣ <strong>Kube Proxy 🚦</strong></h3>
<p>Kube Proxy handles network traffic, ensuring that Pods can talk to each other and to the outside world.</p>
<p>Imagine your banking app’s login service needs to talk to the payments service. The Kube Proxy handles the routing so the request reaches the right place. It also handles load balancing, so no single microservice gets overwhelmed.</p>
<p>So, to summarize:</p>
<ul>
<li><p>The Master Node is the boss – it plans, watches, and assigns tasks.</p>
</li>
<li><p>The Worker Nodes do the actual work – running your microservices.</p>
</li>
<li><p>Components like etcd, Kubelet, Scheduler, Controller Manager, and Kube Proxy all work together like parts of a well-oiled machine.</p>
</li>
</ul>
<p>Kubernetes is designed to handle your microservices automatically – keeping them alive, scaling them up, moving them around, and restarting them if they crash – so you don’t have to babysit them yourself.</p>
<h2 id="heading-kubernetes-workloads-pods-deployments-services-amp-more">Kubernetes Workloads 🛠️ — Pods, Deployments, Services, &amp; More</h2>
<p>Kubernetes workloads are the objects you use to manage and run your applications. Think of them as blueprints 📐 that tell Kubernetes <strong>what</strong> to run and <strong>how</strong> to run it – whether it’s a single app container, a group of containers, a database, or a batch job. Here are some of the workloads in Kubernetes:</p>
<h3 id="heading-1-pods">1️⃣ <strong>Pods</strong></h3>
<p>A <strong>Pod</strong> is the smallest and simplest unit in the Kubernetes object model. It represents a single instance of a running process in your cluster and can contain one or more containers that share storage and network resources. ​</p>
<p>Think of a Pod as a wrapper around one or more containers that need to work together. They share the same network IP and storage, allowing them to communicate easily and share data. Pods are ephemeral – short-lived and easily replaced. If a Pod dies, Kubernetes can create a new one to replace it almost instantly.</p>
<p>Say you have an application that’s split into two parts – a frontend and a backend. The frontend will run in a container in Pod A, while the backend will run in a container in another Pod, Pod B.</p>
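<p>As a minimal sketch, a Pod manifest for that frontend might look like this (the name and image are placeholders):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    - name: frontend
      image: nginx:1.27       # placeholder image for the frontend app
      ports:
        - containerPort: 80   # port the container listens on
</code></pre>
<p>You’d create it with <code>kubectl apply -f pod.yaml</code>. In practice, though, you rarely create bare Pods directly – you let a Deployment manage them.</p>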
<h3 id="heading-2-deployments">2️⃣ <strong>Deployments</strong></h3>
<p>A <strong>Deployment</strong> provides declarative updates for Pods and ReplicaSets. You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate.</p>
<p>Deployments manage the lifecycle of your application Pods. They ensure that the specified number of Pods are running and can handle updates, rollbacks, and scaling. If a Pod fails, the Deployment automatically replaces it to maintain the desired state.​</p>
<p>Imagine you're managing a store. A Deployment is like the store manager – you tell it how many workers (Pods) you want, and it makes sure they’re always present. If one doesn't show up for work, the manager finds a replacement automatically. You can also tell it to hire more workers or fire some when needed.</p>
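<p>Here’s an illustrative Deployment manifest for that store analogy – three “workers” (Pods) of a hypothetical app (names and image are placeholders):</p>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: store
spec:
  replicas: 3               # how many "workers" (Pods) you want
  selector:
    matchLabels:
      app: store            # must match the Pod template's labels
  template:                 # the Pod template the Deployment stamps out
    metadata:
      labels:
        app: store
    spec:
      containers:
        - name: store
          image: nginx:1.27 # placeholder image
</code></pre>
<p>Scaling is then just a matter of changing <code>replicas</code> and re-applying the file, or running <code>kubectl scale deployment store --replicas=5</code>.</p>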
<h3 id="heading-3-services">3️⃣ <strong>Services</strong></h3>
<p>A <strong>Service</strong> in Kubernetes defines a way to access/communicate with Pods. Services enable communication between different Pods (for example, your frontend Pod A can communicate with your backend Pod B via a service) and can expose your application to external traffic (for example the public internet). ​</p>
<p>Services act as a stable endpoint to access a set of Pods. Even if the underlying Pods change, the Service's IP and DNS name remain constant, ensuring communication between the Pods within the cluster or with the internet.</p>
<p>A Service is like the front door to your app. No matter which worker (Pod) is behind it, people always use the same entrance to access it. It hides the messy stuff happening behind the scenes and gives users a simple way to connect to your app.</p>
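<p>A minimal Service manifest might look like this (assuming Pods carrying an illustrative <code>app: store</code> label):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Service
metadata:
  name: store
spec:
  selector:
    app: store        # routes traffic to any Pod carrying this label
  ports:
    - port: 80        # the stable port clients connect to
      targetPort: 80  # the port the container listens on
  type: ClusterIP     # internal-only; use NodePort or LoadBalancer to expose externally
</code></pre>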
<h3 id="heading-4-replicasets">4️⃣ <strong>ReplicaSets</strong></h3>
<p>A <strong>ReplicaSet</strong> ensures that a specified number of identical Pods are running at any given time. It is often used to guarantee the availability of a specified number of Pods (horizontal scaling). ​</p>
<p>ReplicaSets maintain a stable set of running Pods. If a Pod crashes or is deleted, the ReplicaSet automatically creates a new one to replace it, ensuring your application remains available.​</p>
<p>Think of a ReplicaSet like a robot that counts how many copies of your app are running. If one goes missing, it automatically makes a new one. It keeps the number steady, just like you told it to.</p>
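<p>A ReplicaSet manifest looks almost identical to a Deployment’s (illustrative names again). In practice you rarely create one directly – Deployments create and manage ReplicaSets for you – but it helps to see the shape:</p>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: store-rs
spec:
  replicas: 3             # the count the "robot" keeps steady
  selector:
    matchLabels:
      app: store
  template:
    metadata:
      labels:
        app: store
    spec:
      containers:
        - name: store
          image: nginx:1.27 # placeholder image
</code></pre>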
<h3 id="heading-5-daemonsets">5️⃣ <strong>DaemonSets</strong></h3>
<p>A <strong>DaemonSet</strong> ensures that all (or some) Nodes run an instance (a copy) of a specific Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are also removed. ​</p>
<p>DaemonSets are used to deploy a Pod on every node in the cluster. This is useful for running background tasks like log collection or monitoring agents on all nodes (for example to get the CPU, memory, and disk usage of each node).​</p>
<p>A DaemonSet is like saying, “I want this helper app to run on <strong>every single computer</strong> we have.” As mentioned earlier, it’s great for things like log collectors or security checkers – small helpers that every machine should have.</p>
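<p>Here’s a sketch of a DaemonSet for a hypothetical log-collecting helper (the name and image are illustrative). Note that there’s no <code>replicas</code> field – the node count itself determines how many copies run:</p>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:                 # one copy of this Pod runs on every node
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: log-collector
          image: fluent/fluentd:v1.16  # example log-collection agent
</code></pre>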
<h3 id="heading-6-statefulsets">6️⃣ <strong>StatefulSets</strong></h3>
<p>A <strong>StatefulSet</strong> is the workload API object used to manage stateful applications (applications that store data, for example in their filesystem – databases). It manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods.</p>
<p>StatefulSets are designed for applications that require persistent storage and stable network identities, like databases.</p>
<p>Let’s say you’re running a database or anything that needs to save info. A StatefulSet is like giving each app a name tag and a personal drawer to store their stuff. Even if you restart them, they come back with the same name and same drawer.</p>
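<p>Here’s a sketch of a StatefulSet for a small database (names, image, and sizes are illustrative). The <code>volumeClaimTemplates</code> section is the “personal drawer”: each Pod gets its own persistent volume that survives restarts:</p>
<pre><code class="lang-yaml">apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db          # headless Service giving Pods stable names (db-0, db-1, ...)
  replicas: 2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:    # each Pod gets its own persistent volume
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
</code></pre>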
<h3 id="heading-7-jobs">7️⃣ <strong>Jobs</strong></h3>
<p>A <strong>Job</strong> creates one or more Pods and ensures that a specified number of them successfully terminate. As Pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the Job is complete. ​</p>
<p>A Job is like a one-time task. Imagine sending out a batch of emails or processing a report. You want the task to run, finish, and then stop. That’s exactly what a Job does.</p>
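<p>A minimal Job manifest for a one-off task might look like this (the name and command are placeholders that just simulate work):</p>
<pre><code class="lang-yaml">apiVersion: batch/v1
kind: Job
metadata:
  name: send-report
spec:
  completions: 1            # how many successful runs count as "done"
  backoffLimit: 3           # how many retries before marking the Job failed
  template:
    spec:
      restartPolicy: Never  # Job Pods should finish, not restart forever
      containers:
        - name: send-report
          image: busybox:1.36
          command: ["sh", "-c", "echo processing report"]
</code></pre>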
<h3 id="heading-8-cronjobs">8️⃣ <strong>CronJobs</strong></h3>
<p>A <strong>CronJob</strong> creates Jobs on a time-based schedule. It runs a Job periodically on a given schedule, written in Cron format.</p>
<p>A CronJob is like setting a reminder or alarm. It tells your app (in this case the Job) to do something every night at 2 AM, every Monday morning, or once a month – whatever schedule you give it.</p>
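<p>And a CronJob sketch that runs such a task every night at 2 AM (schedule, name, and command are illustrative):</p>
<pre><code class="lang-yaml">apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 2 * * *"     # Cron format: minute hour day-of-month month day-of-week
  jobTemplate:              # the Job to create on each tick of the schedule
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: busybox:1.36
              command: ["sh", "-c", "echo running nightly cleanup"]
</code></pre>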
<h2 id="heading-how-to-create-a-kubernetes-cluster-in-a-demo-environment-with-play-with-k8s">🛠️ How to Create a Kubernetes Cluster in a Demo Environment with <code>play-with-k8s</code></h2>
<p>As we've discussed earlier, a Kubernetes cluster is a set of machines (called nodes) that run containerized applications.</p>
<p>Setting up a Kubernetes cluster locally or in the cloud can be complex and expensive. To simplify the learning process, Docker provides a free, browser-based platform called <a target="_blank" href="https://labs.play-with-k8s.com/">Play with Kubernetes</a>. This environment allows you to create and interact with a Kubernetes cluster without installing anything on your local machine. It's an excellent tool for beginners to get hands-on experience with Kubernetes.​</p>
<h3 id="heading-sign-in-to-play-with-kubernetes">🔐 Sign in to Play with Kubernetes</h3>
<ol>
<li><p><strong>Visit the platform</strong> at <a target="_blank" href="https://labs.play-with-k8s.com/">https://labs.play-with-k8s.com/</a>.​</p>
</li>
<li><p><strong>Authenticate:</strong></p>
<ul>
<li><p>Click on the "Login" button.</p>
</li>
<li><p>You can sign in using your Docker Hub or GitHub account.</p>
</li>
<li><p>If you don't have an account, you can create one for free on <a target="_blank" href="https://hub.docker.com/">Docker Hub</a> or <a target="_blank" href="https://github.com/">GitHub</a>.​</p>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746083007442/a038ee6c-b471-4880-ba17-2e8927678780.png" alt="Sign in to Play with k8s" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-create-your-kubernetes-cluster">🚀 Create Your Kubernetes Cluster</h3>
<p>Once signed in, follow these steps to set up your cluster:</p>
<h4 id="heading-step-1-start-a-new-session">Step 1: Start a New Session:</h4>
<p>Click on the <strong>"Start"</strong> button to initiate a new session. This gives you about 4 hours of play time, after which the cluster and its resources will be automatically terminated.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746083204331/8410e18b-4ed4-4374-8d4f-44f0fefa1623.png" alt="Play with k8s timed session" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-step-2-add-instances">Step 2: Add Instances:</h4>
<p>Then click on <strong>"+ Add New Instance"</strong> to create a new node (Virtual Machine).  </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746083280594/740d963a-c70f-43c6-8354-e6ea0c3d7f41.png" alt="Create new master node (VM)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This will open a terminal window where you can run commands.​  </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746083304493/ffd34d73-e5cd-41d0-908a-2240924e7ad0.png" alt="Terminal of newly created node" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-step-3-initialize-the-master-node">Step 3: Initialize the Master Node:</h4>
<p>In the terminal, run the following command to initialize the master node:​</p>
<pre><code class="lang-bash">kubeadm init --apiserver-advertise-address $(hostname -i) --pod-network-cidr &lt;SPECIFIED_IP_ADDRESS&gt;
</code></pre>
<p>You can find the exact command printed in the terminal. In my case, the CIDR range is <code>10.5.0.0/16</code>. Replace the <code>&lt;SPECIFIED_IP_ADDRESS&gt;</code> placeholder with the CIDR range shown in your terminal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746083865451/fdf18710-c987-4221-bc02-369cd709a849.png" alt="Initialize the master node and the control plane" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This process will set up the control plane of your Kubernetes cluster.​</p>
<h4 id="heading-step-4-add-worker-nodes">Step 4: Add Worker Nodes:</h4>
<p>If you want to add worker nodes, in the master node terminal, you'll find a <code>kubeadm join...</code> command after running the <code>kubeadm init --apiserver-advertise-address $(hostname -i) --pod-network-cidr &lt;SPECIFIED_IP_ADDRESS&gt;</code> command.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746084559142/6e539ef6-0219-40da-95e7-42abc9f1af8c.png" alt="Command to add worker node to control plane" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Click on <strong>"+ Add New Instance"</strong> to create another node just as you did earlier.</p>
<p>Run this command in the new node's terminal to join it to the cluster:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746084666411/78f07ba1-7f1f-402e-9ed8-c4d6054bdcab.png" alt="Add worker node to control plane" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-step-5-configure-the-clusters-networking">Step 5: Configure the Cluster’s networking:</h4>
<p>Navigate to the master node, and run the command below to configure the cluster’s networking.</p>
<pre><code class="lang-bash">kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kubeadm-kuberouter.yaml
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746085296963/ba35966c-5dd1-4e17-b4b5-85639cb3a80d.png" alt="Configure networking in the cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-step-6-verify-the-cluster">Step 6: Verify the Cluster:</h4>
<p>In the master node terminal (the first node with the highlighted user profile), run:​</p>
<pre><code class="lang-bash">kubectl get nodes
</code></pre>
<p>You should see a list of nodes in your cluster, including the master and any worker nodes you've added.​</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746085583418/45e55418-4b0f-461f-98d8-3b0c8f19b839.png" alt="Nodes in the cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Congratulations! You just created your very own Kubernetes cluster with two VMs: the master node (where the control plane resides) and a worker node (where Kubernetes workloads, such as Pods, are deployed).</p>
<h2 id="heading-how-to-deploy-an-application-on-your-kubernetes-cluster">🚀 How to Deploy an Application on Your Kubernetes Cluster</h2>
<p>Now that we've set up our Kubernetes cluster using Play with Kubernetes, it's time to deploy the application and make it accessible over the internet.</p>
<h3 id="heading-understanding-imperative-vs-declarative-approaches-in-kubernetes">🧠 Understanding Imperative vs. Declarative Approaches in Kubernetes</h3>
<p>Before we proceed, it's essential to grasp the two primary methods for managing resources in Kubernetes: <strong>Imperative</strong> and <strong>Declarative</strong>.</p>
<h3 id="heading-imperative-approach">🖋️ Imperative Approach</h3>
<p>In the imperative approach, you directly issue commands to the Kubernetes API to create or modify resources. Each command specifies the desired action, and Kubernetes executes it immediately.​</p>
<p>Imagine telling someone, "Turn on the light." You're giving a direct command, and the action happens right away. Similarly, with imperative commands, you instruct Kubernetes step-by-step on what to do.</p>
<p><strong>Example:</strong><br>To create a Pod running an NGINX container, run the command below in the terminal of the master node:</p>
<pre><code class="lang-bash">kubectl run nginx-pod --image=nginx
</code></pre>
<p>Now wait a few seconds and run the command below to check the status of the pod:</p>
<pre><code class="lang-bash">kubectl get pods
</code></pre>
<p>You should get a response similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746087463204/52ef26e5-96df-4d91-8a2d-7527a38786d2.png" alt="Get pods running in the cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now let’s expose our Pod to the internet by creating a <strong>Service.</strong> Run the command below to expose the Pod:</p>
<pre><code class="lang-bash">kubectl expose pod nginx-pod --<span class="hljs-built_in">type</span>=NodePort --port=80
</code></pre>
<p>To get the IP address of the Cluster so we can access our Pod, run the command below:</p>
<pre><code class="lang-bash">kubectl get svc
</code></pre>
<p>The command displays the IP address from which we can access our service. You should get an output similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746088678881/a4f3bdbc-c7eb-4696-ba6e-587637be5792.png" alt="Get service IP address" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, copy the IP address for the <code>nginx-pod</code> service and run the command below to make a request to your Pod:</p>
<pre><code class="lang-bash">curl &lt;YOUR-SERVICE-IP-ADDRESS&gt;
</code></pre>
<p>Replace the <code>&lt;YOUR-SERVICE-IP-ADDRESS&gt;</code> placeholder with the IP address of your <code>nginx-pod</code> service. In my case, it’s <code>10.98.108.173</code>.</p>
<p>You should get a response from your <code>nginx-pod</code> Pod:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746088937046/8b86cd63-21f0-45d3-9ab5-59bd630fb37c.png" alt="Make a request to the Nginx Pod running in the Cluster" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>We couldn’t access the Pod from the internet (that is, from our browser) because our cluster isn’t connected to a cloud service like AWS or Google Cloud, which could provision an external load balancer for us.</p>
<p>Now let’s try doing the same thing but using the Declarative method.</p>
<h3 id="heading-declarative-approach">🚀 Declarative Approach</h3>
<p>So far, we used the imperative approach, where we typed commands like <code>kubectl run</code> or <code>kubectl expose</code> directly into the terminal to make Kubernetes do something immediately.</p>
<p>But Kubernetes has another (and often better) way to do things: the declarative approach.</p>
<h4 id="heading-what-is-the-declarative-approach">🧾 What Is the Declarative Approach?</h4>
<p>Instead of giving Kubernetes instructions step-by-step like a chef in a kitchen, you give it a full recipe – a file that describes exactly what you want (for example, what app to run, how many copies of it, how to expose it, and so on).</p>
<p>This recipe is written in a file called a <strong>manifest</strong>.</p>
<h4 id="heading-whats-a-manifest">📘 What’s a Manifest?</h4>
<p>A manifest is a file (usually written in YAML format) that describes a Kubernetes object – like a Pod, a Deployment, or a Service.</p>
<p>It’s like writing down what you want, handing it over to Kubernetes, and saying: “Hey, please make sure this exists exactly how I described it.”</p>
<p>We’ll use two manifests:</p>
<ol>
<li><p>One to deploy our application</p>
</li>
<li><p>Another to expose it to the internet</p>
</li>
</ol>
<p>Let’s walk through it!</p>
<h4 id="heading-step-1-clone-the-github-repo">📁 Step 1: Clone the GitHub Repo</h4>
<p>We already have a GitHub repo that contains the two manifest files we need. Let’s clone it into our Kubernetes environment.</p>
<p>Run this in the terminal (on your master node):</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/onukwilip/simple-kubernetes-app
</code></pre>
<p>Now, let’s go into the folder:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> simple-kubernetes-app
</code></pre>
<p>You should see two files:</p>
<ul>
<li><p><code>deployment.yaml</code></p>
</li>
<li><p><code>service.yaml</code></p>
</li>
</ul>
<h4 id="heading-step-2-understanding-the-deployment-manifest-deploymentyaml">📦 Step 2: Understanding the Deployment Manifest (<code>deployment.yaml</code>)</h4>
<p>This manifest will tell Kubernetes to deploy our app and ensure it’s always running.</p>
<p>Here’s what’s inside:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-deployment</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nginx</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">nginx</span>
</code></pre>
<p>Now, let’s break this down:</p>
<ul>
<li><p><code>apiVersion: apps/v1</code>: This tells Kubernetes which version of the API we’re using to define this object.</p>
</li>
<li><p><code>kind: Deployment</code>: This means we’re creating a Deployment (a controller that manages Pods).</p>
</li>
<li><p><code>metadata.name</code>: We’re giving our Deployment a name: <code>nginx-deployment</code>.</p>
</li>
<li><p><code>spec.replicas: 3</code>: We’re telling Kubernetes: “Please run 3 copies (replicas) of this app.”</p>
</li>
<li><p><code>selector.matchLabels</code>: Kubernetes will use this label to find which Pods this Deployment is managing.</p>
</li>
<li><p><code>template.metadata.labels</code> &amp; <code>spec.containers</code>: This section describes the Pods that the Deployment should create – each Pod will run a container using the official <code>nginx</code> image.</p>
</li>
</ul>
<p>✅ In plain terms: We're asking Kubernetes to create and maintain 3 copies of an app that runs NGINX, and automatically restart them if any fails.</p>
<h4 id="heading-step-3-understanding-the-service-manifest-serviceyaml">🌐 Step 3: Understanding the Service Manifest (<code>service.yaml</code>)</h4>
<p>This file tells Kubernetes to expose our NGINX app to the outside world using a Service.</p>
<p>Here’s the file – let’s break this down, too:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nginx-service</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">NodePort</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">nginx</span>
  <span class="hljs-attr">ports:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
    <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
    <span class="hljs-attr">targetPort:</span> <span class="hljs-number">80</span>
</code></pre>
<ul>
<li><p><code>apiVersion: v1</code>: We’re using version 1 of the Kubernetes API.</p>
</li>
<li><p><code>kind: Service</code>: We’re creating a Service object.</p>
</li>
<li><p><code>metadata.name: nginx-service</code>: Giving it a name.</p>
</li>
<li><p><code>spec.type: NodePort</code>: We’re exposing it through a port on the node (so we can access it via the node's IP address).</p>
</li>
<li><p><code>selector.app: nginx</code>: This tells Kubernetes to connect this Service to Pods with the label <code>app: nginx</code>.</p>
</li>
<li><p><code>ports.port</code> and <code>targetPort</code>: The Service will listen on port 80 and forward traffic to port 80 on the Pod.</p>
</li>
</ul>
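<p>One optional tweak worth knowing about: by default, Kubernetes assigns a <code>NodePort</code> Service a random port in the 30000–32767 range. If you’d rather pin it to a predictable port, you can add a <code>nodePort</code> field under <code>ports</code>. This is a variation on the repo’s <code>service.yaml</code>, not part of it – the port value here is just an example:</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080 # optional; must fall within 30000-32767
</code></pre>
<p>If you omit the field (as the repo’s file does), Kubernetes picks the port for you, which is why we’ll look it up with <code>kubectl get svc</code> later.</p>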
<p>✅ In plain terms: This file says, “Expose our NGINX app through the cluster’s network so we can access it from the outside world.”</p>
<h4 id="heading-step-4-clean-up-previous-resources">🧹 Step 4: Clean Up Previous Resources</h4>
<p>If you’re still running the Pod and Service we created using the imperative approach, let’s delete them to avoid conflicts:</p>
<pre><code class="lang-bash">kubectl delete pod nginx-pod
kubectl delete service nginx-pod
</code></pre>
<h4 id="heading-step-5-apply-the-manifests">📥 Step 5: Apply the Manifests</h4>
<p>Now let’s deploy the NGINX app and expose it – this time using the <strong>declarative</strong> way.</p>
<p>From inside the <code>simple-kubernetes-app</code> folder, run:</p>
<pre><code class="lang-bash">kubectl apply -f deployment.yaml
</code></pre>
<p>Then:</p>
<pre><code class="lang-bash">kubectl apply -f service.yaml
</code></pre>
<p>This will create the Deployment and the Service described in the files. 🎉</p>
<h4 id="heading-step-6-check-that-its-running">🔍 Step 6: Check That It’s Running</h4>
<p>Let’s see if the Pods were created:</p>
<pre><code class="lang-bash">kubectl get pods
</code></pre>
<p>You should see 3 Pods running!</p>
<p>And let’s check the service:</p>
<pre><code class="lang-bash">kubectl get svc
</code></pre>
<p>Look for the <code>nginx-service</code>. You’ll see something like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746092825896/617084f1-3a71-4cfd-a287-9f7a9ac08810.png" alt="Access service NodePort" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Note the <strong>NodePort</strong> (for example, <code>30001</code>), as we’ll use it to access the app.</p>
<h4 id="heading-step-7-access-the-app">🌍 Step 7: Access the App</h4>
<p>You can now send a request to your app like this:</p>
<pre><code class="lang-bash">curl http://&lt;YOUR-NODE-IP&gt;:&lt;NODE-PORT&gt;
</code></pre>
<blockquote>
<p>Replace <code>&lt;YOUR-NODE-IP&gt;</code> with the IP of your master node (you’ll usually find this in Play With Kubernetes at the top of your terminal), and <code>&lt;NODE-PORT&gt;</code> with the NodePort shown in the <code>kubectl get svc</code> command.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746092570586/b33cabc0-ea1e-4a70-ab55-9f3a0761bec0.png" alt="Get master node IP address" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You should see the HTML content of the NGINX welcome page printed out.</p>
<p>Now terminate the cluster environment by clicking the <strong>CLOSE SESSION</strong> button:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746093081895/79139f75-5e6b-4991-be74-38ecbbf2ef66.png" alt="79139f75-5e6b-4991-be74-38ecbbf2ef66" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-why-declarative-is-better-in-most-cases">🆚 Why Declarative Is Better (In Most Cases)</h3>
<ul>
<li><p>🔁 <strong>Reusable</strong>: You can use the same files again and again.</p>
</li>
<li><p>📦 <strong>Version-controlled</strong>: You can push these files to GitHub and track changes over time.</p>
</li>
<li><p>🛠️ <strong>Fixes mistakes easily</strong>: Want to change 3 replicas to 5? Just update the file and re-apply!</p>
</li>
<li><p>🧠 <strong>Easier to maintain</strong>: Especially when you have many resources to manage.</p>
</li>
</ul>
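<p>To make that third point concrete: scaling from 3 replicas to 5 means changing a single line in <code>deployment.yaml</code>. Here’s a sketch of just the edited part of the file, not a new manifest:</p>
<pre><code class="lang-yaml">spec:
  replicas: 5 # was 3; Kubernetes reconciles the difference automatically
</code></pre>
<p>Then run <code>kubectl apply -f deployment.yaml</code> again – Kubernetes compares the desired state in the file with what’s actually running and creates the two extra Pods for you.</p>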
<h2 id="heading-advantages-of-using-kubernetes-in-business">💼 Advantages of Using Kubernetes in Business</h2>
<p>Kubernetes isn’t just a developer tool – it’s a business enabler as well. It helps companies deliver products faster, more reliably, and with reduced operational overhead.</p>
<p>Let’s break down how Kubernetes translates to real-world business benefits:</p>
<h3 id="heading-1-better-use-of-cloud-resources-cost-savings">1️⃣ <strong>Better Use of Cloud Resources = Cost Savings</strong></h3>
<p>Before Kubernetes, deploying many microservices for a single application often meant creating separate cloud resources (like one Azure App Service per microservice), which could rack up huge costs quickly. Imagine $50/month per service × 10 services = $500/month 😬.</p>
<p><strong>With Kubernetes:</strong><br>You can run multiple microservices on fewer virtual machines (VMs) while Kubernetes automatically decides the most efficient way to use the available servers. That means you pay for fewer servers and get more out of them 💸.</p>
<h3 id="heading-2-high-availability-and-uptime-happy-customers">2️⃣ <strong>High Availability and Uptime = Happy Customers</strong></h3>
<p>Kubernetes watches your apps like a hawk 👀. If one of them crashes or fails, Kubernetes restarts or replaces it <em>immediately</em> – automatically.</p>
<p><strong>For your business:</strong><br>This means less downtime, fewer support tickets, and happier customers who don’t even notice when things go wrong in the background.</p>
<h3 id="heading-3-easy-scaling-during-high-demand">3️⃣ <strong>Easy Scaling During High Demand</strong></h3>
<p>Manually scaling apps during high traffic (like Black Friday) can be a nightmare 😰. And if you don't act fast, customers experience slowness or crashes.</p>
<p><strong>With Kubernetes:</strong><br>You can configure each microservice to scale automatically – adding more instances of that service <em>only when needed</em> (for example, when a surge of users hits your site to purchase different products) and scaling back down when traffic drops. This ensures your app stays responsive and you only pay for what you use.</p>
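<p>In practice, this kind of automatic scaling is usually configured with a <strong>HorizontalPodAutoscaler</strong>. Here’s a rough sketch of one – the Deployment name and thresholds are illustrative, and your cluster needs a metrics server installed for it to work:</p>
<pre><code class="lang-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout # the microservice to scale (illustrative name)
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80 # add Pods when average CPU crosses 80%
</code></pre>
<p>Kubernetes then adds Pods during the Black Friday rush and removes them afterwards – no one has to be on call clicking buttons.</p>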
<h3 id="heading-4-faster-deployment-faster-time-to-market">4️⃣ <strong>Faster Deployment = Faster Time to Market</strong></h3>
<p>Kubernetes supports automation and repeatability. Teams can deploy new features or microservices faster without worrying about infrastructure setup every time.</p>
<p><strong>For business:</strong><br>This means faster product updates, quicker response to market demands, and competitive advantage 🚀.</p>
<h3 id="heading-5-consistent-environments-fewer-bugs">5️⃣ <strong>Consistent Environments = Fewer Bugs</strong></h3>
<p>Each microservice in Kubernetes is containerized, meaning it runs with all its dependencies in a self-contained package. You can run the exact same app setup in:</p>
<ul>
<li><p>Development</p>
</li>
<li><p>Testing</p>
</li>
<li><p>Production</p>
</li>
</ul>
<p>This reduces bugs caused by "it works on my machine" issues 🤦‍♂️ and helps teams build with confidence.</p>
<h3 id="heading-6-vendor-independence-bye-bye-to-vendor-lock-in">6️⃣ <strong>Vendor Independence (Bye-bye to Vendor lock-in)</strong></h3>
<p>When you use cloud-managed services (like AWS Elastic Beanstalk or Azure App Service), it’s often hard to move to another provider because everything is tailored to that specific platform.</p>
<p><strong>With Kubernetes:</strong><br>It works the same way on AWS, Azure, GCP, or even your own data center. This means you can switch cloud providers easily and avoid being locked into one vendor – aka cloud freedom! ☁️🕊️</p>
<h3 id="heading-7-organizational-clarity">7️⃣ <strong>Organizational Clarity</strong></h3>
<p>Kubernetes lets you organize your apps clearly. You can group workloads by:</p>
<ul>
<li><p>Team (for example, Finance, HR)</p>
</li>
<li><p>Environment (for example, testing, staging, production)</p>
</li>
</ul>
<p>This structure helps large teams collaborate better, stay organized, and manage resources efficiently.</p>
<h2 id="heading-disadvantages-of-using-kubernetes">😬 Disadvantages of Using Kubernetes</h2>
<p>Kubernetes isn’t all rainbows and rockets 🚀. Like any other tool, it has its pros and cons. And it’s super important for startup founders, product managers, and even CEOs to know when Kubernetes is the right fit – and when it’s just overkill.</p>
<p>Let’s break down the main disadvantages in a simple, honest way:</p>
<h3 id="heading-1-youll-likely-need-a-devops-engineer-or-team">👨‍🔧 1. You’ll Likely Need a DevOps Engineer or Team</h3>
<p>Kubernetes is powerful, yes. But that power comes with great responsibility 😅.</p>
<p>In simple terms:</p>
<ul>
<li><p>You don't just "click a button" and your app is magically running.</p>
</li>
<li><p>Kubernetes needs someone who understands how to set it up, keep it running, and fix issues when they pop up. This person (or team) is usually called a DevOps Engineer, Site Reliability Engineer, or Cloud Engineer.</p>
</li>
</ul>
<p>Here’s what they’ll typically handle:</p>
<ul>
<li><p>Creating the cluster (the environment where your apps will run)</p>
</li>
<li><p>Defining how your app containers should behave (how many should run, how much memory they need, when they should restart, and so on)</p>
</li>
<li><p>Monitoring the apps and making sure they’re healthy</p>
</li>
<li><p>Ensuring security rules are followed</p>
</li>
<li><p>Handling automated scaling, deployment rollouts, backups, and so on.</p>
</li>
</ul>
<p>💡 <strong>In short:</strong> You’ll need someone skilled to manage this tool. If you’re a solo founder or a small team with no DevOps experience, Kubernetes might be too much upfront.</p>
<h3 id="heading-2-kubernetes-can-be-expensive-if-used-prematurely">💰 2. Kubernetes Can Be Expensive (If Used Prematurely)</h3>
<p>Kubernetes saves money at scale – but can cost more if you adopt it too early or for the wrong use case.</p>
<p>Here's why:</p>
<ul>
<li><p>Kubernetes is meant for managing multiple applications or microservices. If your business only has one small app, you’re using a rocket to deliver a pizza 🍕 – it’s just not necessary.</p>
</li>
<li><p>Kubernetes is also best when you have high or unpredictable traffic. It can automatically scale up your services when traffic spikes...but if your traffic is steady and small, you won’t benefit much from that power.</p>
</li>
</ul>
<p>Let’s say:</p>
<ul>
<li><p>You have one app with moderate traffic.</p>
</li>
<li><p>You deploy it on Kubernetes (which requires at least 1–2 VMs + setup).</p>
</li>
<li><p>You hire a DevOps engineer to manage it.</p>
</li>
<li><p>You pay for cloud compute + storage + monitoring.</p>
</li>
</ul>
<p>You could end up spending $300–$800/month or more... for something that could’ve been hosted on a simple service like <a target="_blank" href="https://render.com">Render</a>, <a target="_blank" href="https://www.heroku.com">Heroku</a>, or a basic VM for a fraction of the cost.</p>
<p>So when <strong>should</strong> you consider Kubernetes?</p>
<ul>
<li><p>When your platform is made up of multiple services (for example, separate services for user auth, payments, analytics, notifications, and so on)</p>
</li>
<li><p>When you’re expecting traffic spikes (for example, launching in new countries, going viral, or seasonal demand like Black Friday)</p>
</li>
<li><p>When you want flexibility in managing your infrastructure across cloud providers (AWS, GCP, Azure) or even on-premises</p>
</li>
</ul>
<h2 id="heading-use-cases-when-and-when-not-to-use-kubernetes">🧭 Use Cases: When (and When Not) to Use Kubernetes</h2>
<p>Kubernetes is an incredibly powerful tool – but it’s not always the right solution from day one.</p>
<p>Let’s break down when it makes sense to use Kubernetes and when it might be overkill 👇</p>
<h3 id="heading-when-you-should-use-kubernetes">✅ When You Should Use Kubernetes</h3>
<p>Kubernetes becomes essential in these scenarios:</p>
<h4 id="heading-1-your-application-is-made-of-many-microservices">1. Your Application Is Made of Many Microservices</h4>
<p>If your app is broken down into multiple microservices – like user authentication, payments, orders, notifications, and more – it’s a good sign that Kubernetes might eventually help.</p>
<p>Kubernetes can:</p>
<ul>
<li><p>Help manage each microservice independently</p>
</li>
<li><p>Automatically scale each one based on demand</p>
</li>
<li><p>Restart failed services automatically</p>
</li>
<li><p>Make it easier to roll out updates to specific parts of the application</p>
</li>
</ul>
<h4 id="heading-2-youre-getting-steady-and-high-traffic">2. You’re Getting <em>Steady and High</em> Traffic</h4>
<p>It’s not just about complexity – it’s about demand.</p>
<p>If your app receives a consistent, high volume of users (like hundreds or thousands every day), and you start seeing signs that your servers are getting overloaded, Kubernetes shines here. It can:</p>
<ul>
<li><p>Automatically increase resources when traffic surges</p>
</li>
<li><p>Balance the load across multiple servers</p>
</li>
<li><p>Prevent downtime due to traffic spikes</p>
</li>
</ul>
<h4 id="heading-3-you-want-portability-and-cloud-independence">3. You Want Portability and Cloud Independence</h4>
<p>If your business doesn’t want to be locked into just one cloud provider (for example, only AWS), Kubernetes gives you flexibility. You can move your application between AWS, GCP, Azure – or even to your own data center – with fewer changes.</p>
<h4 id="heading-4-your-devops-team-is-growing">4. Your DevOps Team Is Growing</h4>
<p>When you have multiple developers or teams working on different parts of the app, Kubernetes helps:</p>
<ul>
<li><p>Organize and isolate workloads per team</p>
</li>
<li><p>Improve collaboration and consistency</p>
</li>
<li><p>Provide easy access control and monitoring</p>
</li>
</ul>
<h3 id="heading-when-you-should-not-use-kubernetes">❌ When You Should Not Use Kubernetes</h3>
<p>Let’s be honest: Kubernetes is not for everyone, especially not at the beginning.</p>
<h4 id="heading-1-you-just-launched-your-app">1. You Just Launched Your App</h4>
<p>In the early days of your product, when you’ve just launched and traffic is still low, Kubernetes is <em>overkill</em>. You don’t need its complexity (yet).</p>
<p>👉 Instead, deploy your app or each microservice on a simple virtual machine (VM). It’s cheaper and faster to get started.</p>
<h4 id="heading-2-you-dont-need-auto-scaling-yet">2. You Don’t Need Auto-scaling (Yet)</h4>
<p>If traffic to your app is still small and manageable, a single server (or a few of them) can easily handle the load. In that case, it’s better to:</p>
<ul>
<li><p>Deploy your microservices manually or with Docker Compose</p>
</li>
<li><p>Monitor and scale manually when needed</p>
</li>
<li><p>Keep things simple until the need for automation becomes obvious</p>
</li>
</ul>
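<p>As a point of comparison, here’s what that simpler Docker Compose setup might look like for a couple of small services. The service names and the <code>my-api</code> image are hypothetical – this is just a sketch of the lighter-weight alternative, not a file from this tutorial’s repo:</p>
<pre><code class="lang-yaml">version: "3.8"
services:
  web:
    image: nginx # hypothetical front-end container
    ports:
      - "80:80"
  api:
    image: my-api:latest # hypothetical application image
    restart: unless-stopped # restart the container if it crashes
</code></pre>
<p>One file, one server, no cluster to operate – which is often all an early-stage product needs until traffic says otherwise.</p>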
<h4 id="heading-3-you-dont-have-a-devops-team">3. You Don’t Have a DevOps Team</h4>
<p>Kubernetes is powerful – but it needs expertise to set up and maintain. If you don’t have a DevOps engineer or someone who understands Kubernetes, it may cause more problems than it solves.</p>
<p>Hiring a DevOps team can be expensive, and setting up Kubernetes incorrectly can lead to outages, security risks, or wasted resources 💸</p>
<h3 id="heading-when-to-move-to-kubernetes">📈 When to Move to Kubernetes</h3>
<p>So, what’s the best path forward?</p>
<p>Here’s a simple roadmap:</p>
<ol>
<li><p><strong>Start small</strong>: Deploy your app (or microservices) on one or a few VMs</p>
</li>
<li><p><strong>Watch traffic</strong>: As user demand grows, increase VM size or replicate the app manually</p>
</li>
<li><p><strong>Track pain points</strong>: If scaling becomes too manual, or if services crash under load...</p>
</li>
<li><p><strong>Then adopt Kubernetes</strong> 🧠</p>
</li>
</ol>
<p>It’s not about how complex your app is – it’s about when the traffic and growth demand an upgrade in how you manage things.</p>
<h3 id="heading-tldr-for-founders-and-devops-teams">🎯 TL;DR for Founders and DevOps Teams</h3>
<ul>
<li><p>Don’t jump to Kubernetes just because it’s trendy</p>
</li>
<li><p>Use it only when traffic grows steadily and auto-scaling becomes necessary</p>
</li>
<li><p>Kubernetes is most valuable when you want to scale reliably and efficiently</p>
</li>
<li><p>Before that point, stick to simple deployments – it’ll save you time, money, and stress</p>
</li>
</ul>
<h2 id="heading-conclusion">🎉 Conclusion</h2>
<p>Wow! What a journey we’ve been on 😄</p>
<p>We started by answering the big question – <strong>What is Kubernetes?</strong> We discovered that it’s not some mythical beast, but a powerful orchestration tool that helps us manage, deploy, scale, and maintain containerized applications in a smarter way.</p>
<p>Then, we took a step back in time to see how applications were deployed before Kubernetes – the headaches of manually installing software on servers, spinning up separate cloud instances for every microservice, and racking up huge cloud bills just to stay afloat. We also saw how containers simplified things, but even they had their own limitations when managed at scale.</p>
<p>That’s where Kubernetes came to the rescue.</p>
<p>We explored:</p>
<ul>
<li><p><strong>The problems Kubernetes solves</strong> – like auto-scaling, efficient resource management, cost savings, and seamless container grouping.</p>
</li>
<li><p><strong>Kubernetes architecture and components</strong> – breaking down complex terms like the cluster, master node, worker nodes, Pods, Services, Kubelet, and more, into simple, easy-to-digest ideas.</p>
</li>
<li><p><strong>Kubernetes workloads</strong> like Deployments, Pods, Services, DaemonSets, and StatefulSets, and what they do behind the scenes to keep our apps running reliably.</p>
</li>
</ul>
<p>From theory to practice, we even got our hands dirty:</p>
<ul>
<li><p>We created a free Kubernetes cluster using Play with Kubernetes 🧪</p>
</li>
<li><p>Deployed a real application using both imperative (direct command) and declarative (manifest file) approaches</p>
</li>
<li><p>Understood why the declarative method makes our infrastructure easier to manage, especially when our systems grow.</p>
</li>
</ul>
<p>Then we took a business lens 🔍 and looked at:</p>
<ul>
<li><p>The advantages of Kubernetes – from auto-scaling during traffic surges, to cost efficiency, and cloud-agnostic deployment.</p>
</li>
<li><p>And also the disadvantages – like needing experienced DevOps engineers and not being ideal for every stage of a product's lifecycle.</p>
</li>
</ul>
<p>Finally, we wrapped up with real-life use cases, highlighting when Kubernetes is a must-have, and when it’s better to wait – especially for early-stage startups still trying to find their audience.</p>
<p>So, whether you're a DevOps newbie, a startup founder, or just someone curious about how modern tech keeps your favorite apps online – you now have a strong foundational understanding of Kubernetes 🙌</p>
<p>Kubernetes is powerful, but it doesn't have to be overwhelming. With a solid grasp of the basics (which you now have 💪), you're well on your way to managing scalable applications like a pro.</p>
<p>Start simple. Grow smart. And when the time is right – Kubernetes will be your best friend.</p>
<h2 id="heading-study-further"><strong>Study Further 📚</strong></h2>
<p>If you would like to learn more about Kubernetes, you can check out the courses below:</p>
<ul>
<li><p><a target="_blank" href="https://www.udemy.com/course/docker-kubernetes-the-practical-guide/">Docker &amp; Kubernetes: The Practical Guide (Academind - Udemy)</a></p>
</li>
<li><p><a target="_blank" href="https://www.coursera.org/specializations/certified-kubernetes-application-developer-ckad-course">Certified Kubernetes Application Developer (CKAD) Specialization (Coursera)</a></p>
</li>
</ul>
<h2 id="heading-about-the-author"><strong>About the Author 👨‍💻</strong></h2>
<p>Hi, I’m Prince! I’m a DevOps engineer and Cloud architect passionate about building, deploying, and managing scalable applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this article, you can learn more about me by exploring more of my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile.</a> You can find my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">LinkedIn articles here</a>. You can also <a target="_blank" href="https://prince-onuk.vercel.app/achievements#articles">visit my website</a> to read more of my articles as well. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Serverless Architecture Handbook: How to Publish a Node Js Docker Image to AWS ECR and Deploy the Container to AWS Lambda ]]>
                </title>
                <description>
                    <![CDATA[ Imagine you’re tasked with building a web application that can handle incoming traffic surges as your users grow without accumulating too much cost. Sounds like a dream, right? But here’s the thing: traditionally, to do this, you would have to manage... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/serverless-architecture-with-aws-lambda/</link>
                <guid isPermaLink="false">68006521c1f51bf42a74f4b0</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ aws lambda ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Node.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ serverless ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ecr ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Thu, 17 Apr 2025 02:19:13 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1744843935296/c359998f-1657-482f-adf4-5ab023cb1c02.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Imagine you’re tasked with building a web application that can handle incoming traffic surges as your users grow without accumulating too much cost. Sounds like a dream, right?</p>
<p>But here’s the thing: traditionally, to do this, you would have to manage lots of infrastructure – resources on which your application will be deployed – which can be a real headache. You’d have servers (VM instances or physical computers) to configure, databases to scale, load balancers to monitor...it’s a whole lot 😩</p>
<p>This is where Serverless architecture comes to the rescue. With the Serverless model, you can deploy applications that serve thousands of users without having to worry about runaway costs or about managing infrastructure, servers, networking, and so on.</p>
<p>In this article, you’ll learn about Serverless Architecture: what it’s all about, and how to deploy your very own application using AWS Lambda. We’ll walk through the entire process step-by-step:</p>
<ul>
<li><p>How to clone your application repository using Git.</p>
</li>
<li><p>How to build an image of your application using Docker.</p>
</li>
<li><p>How to install the AWS CLI on your local machine and create AWS IAM users with the right permissions to push your Docker image to AWS Elastic Container Registry (ECR).</p>
</li>
</ul>
<p>Once the image is stored on ECR, we’ll connect it to AWS Lambda and deploy the container for a fully serverless experience. 💡✨</p>
<p>Ready to go serverless? Let’s get started! 🚀</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-serverless-architecture">What is Serverless Architecture?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-differences-between-serverless-and-other-deployment-models">Differences Between Serverless and Other Deployment Models ⚡</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites-what-you-should-know-before-following-along">🧠 Prerequisites — What You Should Know Before Following Along!</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-the-application-using-git">How to Set Up the Application Using Git 🐙</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-codebase">Understanding the Codebase 🔎</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-a-docker-image-of-the-application">How to Create a Docker Image of the Application 🐋</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-a-container-registry-on-aws-elastic-container-registry-ecr">How to Create a Container Registry on AWS Elastic Container Registry (ECR) 📁</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-iam-with-aws-how-to-create-a-user-on-aws-iam-to-allow-access-to-your-aws-ecr">IAM with AWS: How to Create a User on AWS IAM to Allow Access to Your AWS ECR 👤🔐</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-upload-your-docker-image-to-the-aws-ecr-repository">How to Upload Your Docker Image to the AWS ECR repository ⬆️</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-deploy-the-application-container-to-aws-lambda-from-the-image-on-aws-ecr">How to Deploy the Application Container to AWS Lambda from the Image on AWS ECR 🚀</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-advantages-of-adopting-the-serverless-model-in-businesses">Advantages of Adopting the Serverless Model in Businesses 💼</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-disadvantages-of-the-serverless-model">Disadvantages of the Serverless Model 🚫</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-to-adopt-the-serverless-model">When to Adopt the Serverless Model 🤔</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion 📝</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-about-the-author">About the Author 👨‍💻</a></p>
</li>
</ol>
<h2 id="heading-what-is-serverless-architecture">What is Serverless Architecture?</h2>
<p>Before we dive deeper, let’s break down what we mean by Servers. In the tech world, servers are powerful computers that store, process, and manage data. Think of them as the behind-the-scenes workhorses that:</p>
<ul>
<li><p><strong>Store your data:</strong> Like a central filing cabinet for your digital documents.</p>
</li>
<li><p><strong>Run your applications:</strong> They execute the code that keeps your app or website running.</p>
</li>
<li><p><strong>Handle requests:</strong> Servers respond to user requests – like loading a webpage or processing a login.</p>
</li>
</ul>
<p>Alright, now let’s talk about Serverless Architecture – but first, let’s clear up a common misconception. When most people hear the word "Serverless", they immediately think, "Wait… no servers? How does that even work?!" 😅</p>
<p>Here’s the truth: Serverless doesn’t mean there are no servers involved (surprise, surprise! 😉). Instead, it means you, as a developer, don’t have to worry about managing the servers that your application runs on. The server-side infrastructure is fully handled by the cloud provider – in this case, AWS Lambda. You just focus on writing code and deploying it, and AWS takes care of the rest.</p>
<h3 id="heading-so-whats-the-big-deal-with-serverless">So, What’s the Big Deal with Serverless?</h3>
<p>In a traditional setup, when you deploy your application, you’re responsible for things like:</p>
<ul>
<li><p><strong>Provisioning servers</strong> (how many servers do you need? What size?)</p>
</li>
<li><p><strong>Scaling resources</strong> (how do you handle traffic spikes without overpaying?)</p>
</li>
<li><p><strong>Monitoring</strong> and keeping everything running smoothly.</p>
</li>
</ul>
<p>Sounds like a lot, right? 🤯 Well, Serverless Architecture simplifies all of that by letting you focus purely on your application code. With Lambda, you can run code in response to events (like an HTTP request, a file upload, or a database change) without worrying about the infrastructure behind it. AWS automatically scales the compute resources as needed, charging you only for the time your code is actually running. ⏱️💸</p>
<p>Imagine you’re at a restaurant. Instead of running the kitchen yourself (like managing your own servers), you just place an order (your code) and the chef (AWS Lambda) makes it for you, on-demand, based on what you need. 🍽️🍴</p>
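<p>To make this concrete, here’s a minimal sketch of a Lambda handler in Node.js (the <code>handler</code> name and response shape follow Lambda’s Node.js convention; the greeting payload is just an illustration):</p>
<pre><code class="lang-javascript">// index.js: a minimal Lambda handler sketch.
// Lambda calls this function with an event object each time a trigger fires,
// then the function "goes back to sleep" until the next event.
exports.handler = async function (event) {
  // Read an optional name from the event payload, defaulting to "world".
  const name = event.name ? event.name : "world";
  return {
    statusCode: 200,
    body: JSON.stringify({ message: "Hello, " + name + "!" }),
  };
};
</code></pre>
<p>AWS spins this function up on demand, runs it, and tears it down afterwards – you never touch the machine it ran on.</p>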
<h2 id="heading-differences-between-serverless-and-other-deployment-models">Differences Between Serverless and Other Deployment Models ⚡</h2>
<p>Now that you understand how Serverless works, let’s take a little detour and explore the other models used to deploy applications. After all, Serverless isn’t the only kid on the block, and this will give you some important perspective when choosing the right model for your use case. 👀</p>
<p>When you build an app, you need somewhere to host it – a home for your code to live and run. Over the years, the tech world has come up with different ways to handle this, and each one gives you a different level of control (and responsibility) over your servers.</p>
<p>Let’s break it down.</p>
<h3 id="heading-infrastructure-as-a-service-iaas">🏠 Infrastructure as a Service (IaaS)</h3>
<p>With IaaS, cloud providers like AWS, Google Cloud, or Microsoft Azure give you the building blocks – virtual servers (also called instances), storage, and networking tools – but it’s still your job to set everything up.</p>
<p>It’s like renting an empty apartment. You get the walls, the doors, and the roof, but you still have to bring your own furniture, set up your Wi-Fi, and clean the place regularly. 🏡🧹</p>
<p>When you choose IaaS, you’re responsible for:</p>
<ul>
<li><p>Configuring the servers (choosing the size, the operating system, and installing software).</p>
</li>
<li><p>Handling updates, patches, and security.</p>
</li>
<li><p>Scaling up or down when traffic changes.</p>
</li>
</ul>
<p><strong>Example:</strong> Amazon EC2 (Elastic Compute Cloud) is a classic IaaS service. You rent a virtual machine, set it up yourself, and manage it like a digital landlord.</p>
<h3 id="heading-platform-as-a-service-paas">🎯 Platform as a Service (PaaS)</h3>
<p>Next up, we’ve got PaaS – a more polished setup.</p>
<p>In this model, the cloud provider takes care of the infrastructure and the underlying operating system, so you don’t have to. You just upload your code, configure a few settings, and the platform runs your app.</p>
<p>It’s like moving into a fully furnished apartment — the kitchen works, the lights are on, and the Wi-Fi is already connected. You just show up with your bags and get to work! 🧳✨</p>
<p><strong>Example:</strong> AWS Elastic Beanstalk, Heroku, or Google App Engine.</p>
<h3 id="heading-serverless-the-special-paas">🌩️ Serverless: The Special PaaS</h3>
<p>Now here’s where things get interesting: Serverless actually falls under the PaaS umbrella, but it deserves its own spotlight. Why? Because it takes the convenience of PaaS and pushes it to the next level.</p>
<p>In a traditional PaaS model (like AWS Fargate or Heroku), your application is running 24/7, whether you have visitors using it or not. You pay for the reserved space and compute power all month long, just like renting an apartment. Even if you didn’t sleep there the entire month, the bill still comes at the end. 💸🏡</p>
<p>But with Serverless, the rules change. You only pay when your code is actually being used.</p>
<h4 id="heading-how-applications-run-in-the-serverless-model">How Applications Run in the Serverless Model ⚙️</h4>
<p>In a Serverless model, your application isn’t just sitting there running all day. It “wakes up” only when it’s needed. But what exactly causes it to wake up? That’s where triggers come in.</p>
<p>Triggers are events that tell your Serverless application, “Hey, it’s time to do something!” These events could be all sorts of things, like:</p>
<ul>
<li><p>A user visiting your website and clicking a button.</p>
</li>
<li><p>Someone uploading a file to your cloud storage (like an image or document).</p>
</li>
<li><p>A new row being added to a database.</p>
</li>
<li><p>An automated schedule (like a reminder that runs every day at 8 AM).</p>
</li>
</ul>
<p>When one of these events happens, your application instantly comes to life, runs the exact task you programmed, and then goes back to “sleep” until the next trigger. This is how Serverless keeps your cloud costs low and your resources efficient – no constant running in the background, only action when there’s actually something to do! ⚡😎</p>
<p>For example, if a user sends a request that triggers your application to run for just 10 seconds, those 10 seconds (at the memory size you allocated) are all you pay for — the exact time and resources consumed.</p>
<p>No users? No requests? No payment. Now that’s a smart way to save money. 🧠💰</p>
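<p>In practice, Lambda-style billing multiplies the memory you allocate by how long your code runs. Here’s a tiny sketch of that arithmetic (the rate below is a made-up illustration, not AWS’s current price list):</p>
<pre><code class="lang-javascript">// Pay-per-use cost sketch. The rate is an assumption for illustration
// only, NOT current AWS pricing.
const PRICE_PER_GB_SECOND = 0.0000166667; // hypothetical USD rate

function estimateCost(memoryMb, durationSeconds, invocations) {
  // Billing multiplies allocated memory (in GB) by total running time.
  const gbSeconds = (memoryMb / 1024) * durationSeconds * invocations;
  return gbSeconds * PRICE_PER_GB_SECOND;
}

// 1,000 requests, each running 10 seconds with 128MB allocated:
const monthlyCost = estimateCost(128, 10, 1000);
</code></pre>
<p>And <code>estimateCost(128, 10, 0)</code> returns 0 – the “no users, no payment” property in one line.</p>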
<h3 id="heading-quick-comparison-paas-vs-serverless">💡 Quick Comparison: PaaS vs Serverless</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Traditional PaaS (example: AWS Fargate)</strong></td><td><strong>Serverless PaaS (example: AWS Lambda)</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Server Configuration</td><td>You select compute size &amp; limits.</td><td>No need — AWS handles it all.</td></tr>
<tr>
<td>Scaling</td><td>You configure scaling policies.</td><td>Automatic, event-driven scaling (based on incoming traffic). The higher the traffic, the more compute power is added to your application, and vice versa. 😃</td></tr>
<tr>
<td>Billing</td><td>Charged for running instances 24/7, even when idle.</td><td>Charged only when your code runs. ⏱️💸</td></tr>
<tr>
<td>Deployment</td><td>Deploy full applications.</td><td>Deploy small chunks of code (functions). You can also deploy microservices and full-scale web applications.</td></tr>
</tbody>
</table>
</div><hr>
<h2 id="heading-prerequisites-what-you-should-know-before-following-along">🧠 Prerequisites — What You Should Know Before Following Along</h2>
<p>Before we dive in, here’s the best part: I wrote this article to be super beginner-friendly and detailed, so even if you have little to no programming background, you’ll still be able to follow along.</p>
<p>Whether you’re a developer, a tech-curious startup, or a business leader trying to understand modern cloud solutions, this guide was written for you.</p>
<p>That said, having some light knowledge in these areas will make the ride even smoother:</p>
<ul>
<li><p>🧑‍💻 Basic Programming Concepts – like how Node.js apps run and what a server does.</p>
</li>
<li><p>💡 Familiarity with Common Tech Terms – words like “deploy,” “application,” “CPU,” and “software” will pop up, but don’t worry: I’ve done my best to break these down into simple, relatable explanations.</p>
</li>
</ul>
<p>No prior cloud experience? No problem! This guide holds your hand all the way from setup to deployment – all in plain language, no jargon.</p>
<p>So buckle up, and let’s proceed with deploying your very own application to AWS Lambda. 😁</p>
<h2 id="heading-how-to-set-up-the-application-using-git">How to Set Up the Application Using Git 🐙</h2>
<p>Before we jump into writing code or deploying anything, the very first step is to grab the application we’ll be working with — and for that, we’ll be using Git.</p>
<p>But wait... what’s Git? It’s a Version Control System (VCS) that helps developers track changes to their code, collaborate with teammates without stepping on each other’s toes, and safely store their work in a central place — like GitHub.</p>
<h3 id="heading-clone-the-application-repository">Clone the Application Repository 🧑‍💻</h3>
<p>I’ve already created a simple project for us to use in this tutorial — it’s sitting pretty on GitHub, waiting for you.</p>
<p>To clone the project onto your local machine, open up your terminal and run:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/onukwilip/lambda-tutorial.git
</code></pre>
<p>This command will download all the code from the <code>lambda-tutorial</code> repository into a folder on your computer. 📁</p>
<p>Once the cloning is done, navigate into the project directory like this:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> lambda-tutorial
</code></pre>
<p>Boom — just like that, your local machine is now set up with the same code that’s stored in the GitHub repo. 🏡</p>
<h2 id="heading-understanding-the-codebase">Understanding the Codebase 🔎</h2>
<h3 id="heading-open-the-codebase-in-your-favorite-ide">Open the Codebase in Your Favorite IDE 🧑‍💻</h3>
<p>For this tutorial, we’ll be using Visual Studio Code (VS Code), but feel free to use any editor you’re comfortable with.</p>
<p>Once you open the <code>lambda-tutorial</code> project folder, you’ll notice it’s a simple Node.js web server. Nothing too fancy — just a server that can handle requests and respond with some data.</p>
<p>Now, it’s important to understand what’s going on inside our codebase, especially if you’re coming from deploying on platforms like Render, Vercel, or Google Cloud Run.</p>
<h3 id="heading-deploying-to-lambda-vs-other-serverless-platforms"><strong>Deploying to Lambda vs Other Serverless Platforms ⚡</strong></h3>
<p>When you deploy to platforms like Vercel, Render, or Google Cloud Run, you usually package your web server just the way you wrote it – whether it’s a Node.js Express server or a Next.js app – and the platform handles it pretty much as-is.</p>
<p>Those platforms run your server like a mini container (or microservice) that’s always ready to handle incoming traffic, just like a waiter standing by at your table, waiting for your order.</p>
<p>But AWS Lambda works a little differently.</p>
<p>Lambda expects your code to be organized around functions – not full web servers. Think of Lambda as a chef that only shows up the moment an order is placed, cooks the food, and disappears once the job is done. 👨‍🍳🍽️</p>
<p>So if you’ve got a full-blown Node.js Express server, you’ll need to do a tiny bit of “translation” to fit Lambda’s expectations – and that’s where the <code>lambda.js</code> file comes in.</p>
<h4 id="heading-the-lambdajs-file-your-lambda-translator">The <code>lambda.js</code> File — Your Lambda Translator 🔀</h4>
<p>Here’s what the file looks like:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> serverless = <span class="hljs-built_in">require</span>(<span class="hljs-string">"serverless-http"</span>);
<span class="hljs-keyword">const</span> app = <span class="hljs-built_in">require</span>(<span class="hljs-string">"./app"</span>);

<span class="hljs-keyword">const</span> handler = serverless(app);
<span class="hljs-built_in">module</span>.exports.handler = handler;
</code></pre>
<p>Let’s break it down:</p>
<ul>
<li><p><code>const serverless = require("serverless-http");</code>: This imports a handy little library called <code>serverless-http</code>, which our app needs in order to run properly on AWS Lambda. It acts like a translator: it takes your regular Express app and wraps it so that AWS Lambda can understand it.</p>
</li>
<li><p><code>const handler = serverless(app);</code>: Here’s the magic. This wraps your Express app into a Lambda-compatible function.</p>
</li>
<li><p><code>module.exports.handler = handler;</code>: This exports your wrapped function so AWS Lambda can call it when the application is triggered.</p>
</li>
</ul>
<p>So, instead of starting your server like this:</p>
<pre><code class="lang-javascript">app.listen(<span class="hljs-number">5000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Server running on port 5000"</span>);
});
</code></pre>
<p>You’re handing your app over to Lambda and letting it handle incoming requests, scale, and run the app only when it’s needed.</p>
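<p>Under the hood, <code>serverless-http</code> is doing a translation job. Here’s a stripped-down sketch of the idea – not the real library, just a toy illustration of the pattern (all names are invented for the example):</p>
<pre><code class="lang-javascript">// A toy "wrap" function showing what serverless-http does conceptually:
// turn a Lambda event into a request the app understands, and turn the
// app's answer back into the { statusCode, body } shape Lambda expects.
function wrap(app) {
  return async function handler(event) {
    const request = {
      method: event.httpMethod ? event.httpMethod : "GET",
      path: event.path ? event.path : "/",
    };
    const response = app(request); // the app decides what to answer
    return { statusCode: response.status, body: response.body };
  };
}

// A trivial stand-in for an Express app:
function app(request) {
  return { status: 200, body: "You asked for " + request.path };
}

const handler = wrap(app);
</code></pre>
<p>The real library handles far more (headers, body parsing, binary data), but the shape is the same: event in, HTTP-style response out.</p>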
<h4 id="heading-the-appjs-file-your-classic-express-app">The <code>app.js</code> File — Your Classic Express App 💻</h4>
<p>Your <code>app.js</code> is where the main application logic lives. Here is usually where you:</p>
<ul>
<li><p>Set up Express.</p>
</li>
<li><p>Define routes (like <code>/api</code>, <code>/users</code>, <code>/hello</code>).</p>
</li>
<li><p>Apply middleware (like JSON parsing, logging, CORS, and so on).</p>
</li>
<li><p>Handle HTTP requests and send back responses.</p>
</li>
</ul>
<p>In a normal deployment (Render, Google Cloud Run, DigitalOcean, or your own server), you’d start the server using <code>app.listen(PORT)</code> at the bottom of this file.</p>
<p>But since we’re deploying to Lambda, you don’t directly start the server here. Instead, you export the <code>app</code> like this:</p>
<pre><code class="lang-javascript"><span class="hljs-built_in">module</span>.exports = app;
</code></pre>
<p>This way, your application stays “server-agnostic” – it’s not hardcoded to run on a traditional server. Lambda (via the <code>lambda.js</code> file) takes care of starting and stopping your app whenever it’s triggered by an event (like an HTTP request). Smart, right? 💡</p>
<p>Why this setup? 🤔</p>
<p>This little separation gives you flexibility:</p>
<ul>
<li><p>You can write your Node.js app like you always would (using <code>Express</code>) inside <code>app.js</code>.</p>
</li>
<li><p>And you only tweak the entry point (via <code>lambda.js</code>) to fit AWS Lambda’s expectations.</p>
</li>
</ul>
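<p>The same separation can be sketched in plain Node (no Express, invented names) to show why the app stays server-agnostic:</p>
<pre><code class="lang-javascript">// app.js equivalent: pure request-handling logic. Note there is no
// listen() call here, so nothing ties this code to a long-running server.
function app(path) {
  if (path === "/hello") return "Hi there!";
  return "Not found";
}
module.exports = app;

// In a traditional deployment you would add a server entry point:
//   require("http").createServer(...).listen(5000);

// lambda.js equivalent: the serverless entry point exports a handler instead.
module.exports.handler = async function (event) {
  return { statusCode: 200, body: app(event.path) };
};
</code></pre>
<p>Same application logic, two interchangeable entry points – that’s the flexibility this setup buys you.</p>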
<h2 id="heading-how-to-create-a-docker-image-of-the-application">How to Create a Docker Image of the Application 🐋</h2>
<p>Now that we’ve had a good look at the code, let’s package it up the smart way — using Docker.</p>
<h3 id="heading-what-is-docker">What is Docker? 🐳</h3>
<p>Now, you might be wondering, <em>"Why are we using Docker?"</em></p>
<p>Docker is a tool for creating images of your applications and running those images as containers. Just like real-world shipping containers hold goods securely, Docker containers hold your app, bundled with everything it needs to run: its code, libraries, dependencies, and settings. Everything is wrapped up neatly, so your app runs the same way everywhere, whether on your laptop, AWS Lambda, or even your friend’s machine.</p>
<h3 id="heading-lets-take-a-look-at-the-dockerfile">Let’s Take a Look at the Dockerfile 🔍</h3>
<p>Inside your project folder, you’ll find a file named <code>Dockerfile</code>. This is basically the recipe that Docker uses to build your app’s container image.</p>
<p>Here’s what it looks like:</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>-slim AS builder

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-keyword">COPY</span><span class="bash"> package.json .</span>

<span class="hljs-keyword">RUN</span><span class="bash"> npm i -f</span>

<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>

<span class="hljs-keyword">USER</span> root

<span class="hljs-keyword">FROM</span> amazon/aws-lambda-nodejs

<span class="hljs-keyword">ENV</span> PORT=<span class="hljs-number">5000</span>

<span class="hljs-keyword">COPY</span><span class="bash"> --from=builder /app/ <span class="hljs-variable">${LAMBDA_TASK_ROOT}</span></span>
<span class="hljs-keyword">COPY</span><span class="bash"> --from=builder /app/node_modules <span class="hljs-variable">${LAMBDA_TASK_ROOT}</span>/node_modules</span>
<span class="hljs-keyword">COPY</span><span class="bash"> --from=builder /app/package.json <span class="hljs-variable">${LAMBDA_TASK_ROOT}</span></span>
<span class="hljs-keyword">COPY</span><span class="bash"> --from=builder /app/package-lock.json <span class="hljs-variable">${LAMBDA_TASK_ROOT}</span></span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">5000</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [ <span class="hljs-string">"lambda.handler"</span> ]</span>
</code></pre>
<p>Let’s break down the important steps – in plain English: 😎</p>
<ul>
<li><p><code>FROM node:18-slim AS builder</code>: We start by using a lightweight version of Node.js called <code>node:18-slim</code> and give it a tag named <code>builder</code> (think of it as Stage 1). This gives us the tools we need to build a Node.js app, without extra stuff that makes the image heavy. The <code>builder</code> tag lets us reuse the content of this stage in the next one.</p>
</li>
<li><p><code>WORKDIR /app</code>: We set the working directory inside the container to <code>/app</code>. Think of this as telling Docker: <em>"Hey, this is the folder where I’ll be working from!"</em></p>
</li>
<li><p><code>COPY package.json .</code>: This copies the <code>package.json</code> file (which lists your app’s dependencies) into the <code>/app</code> folder inside the container.</p>
</li>
<li><p><code>RUN npm i -f</code>: This installs all the Node.js dependencies (the packages your app needs to work).<br>  The <code>-f</code> flag forces npm to resolve conflicts if any pop up.</p>
</li>
<li><p><code>COPY . .</code>: This copies the rest of your project files from your computer into the container.</p>
</li>
<li><p><code>USER root</code>: This sets the user to root (administrator level) inside the container. Useful when extra permissions are needed for certain tasks.</p>
</li>
<li><p><code>FROM amazon/aws-lambda-nodejs</code>: Now here’s the switch: we swap to the official AWS Lambda base image for Node.js! That is, Stage 2. This image is designed to work smoothly when deploying containers to Lambda.</p>
</li>
<li><p><code>ENV PORT=5000</code>: We set an environment variable for the server port. Our app will listen on port 5000.</p>
</li>
<li><p><code>COPY --from=builder /app/ ${LAMBDA_TASK_ROOT}</code>: This grabs all the files from the builder stage and copies them into Lambda’s special working directory (<code>${LAMBDA_TASK_ROOT}</code>).</p>
</li>
<li><p><code>COPY --from=builder /app/node_modules ${LAMBDA_TASK_ROOT}/node_modules</code>: Same thing, but this one specifically copies the node_modules folder (all your installed dependencies) into Lambda’s working directory.</p>
</li>
<li><p><code>COPY --from=builder /app/package.json ${LAMBDA_TASK_ROOT}</code>: Copies the <code>package.json</code> file into Lambda’s working directory.</p>
</li>
<li><p><code>COPY --from=builder /app/package-lock.json ${LAMBDA_TASK_ROOT}</code>: Copies the lock file for your dependencies – so Lambda knows exactly which versions of libraries to use.</p>
</li>
<li><p><code>EXPOSE 5000</code>: This tells Docker, <em>“Hey, my app is going to listen for requests on port 5000!"</em> (Though Lambda doesn’t use this directly, it’s useful for local testing.)</p>
</li>
<li><p><code>CMD [ "lambda.handler" ]</code>: This tells AWS Lambda which function to run when the container starts.<br>  In this case, it’s looking for a <code>handler</code> function inside your app – that’s the entry point!</p>
</li>
</ul>
<h3 id="heading-how-to-create-our-own-docker-image">How to Create Our Own Docker Image</h3>
<p>Before we proceed, you need to have Docker running on your machine. If you haven’t installed Docker yet, check out the official installation guide here: <a target="_blank" href="https://docs.docker.com/engine/install/">Docker Installation Tutorial</a>. It’s a great resource to get Docker up and running.</p>
<h4 id="heading-ensure-docker-is-running">Ensure Docker is Running</h4>
<p>Make sure Docker Desktop is installed and running. You can usually tell by the Docker icon in your system tray. If it’s not running, start it up before proceeding.</p>
<h4 id="heading-build-the-docker-image">Build the Docker Image</h4>
<p>Now, it’s time to create a Docker image of our application. In your terminal, navigate to the root directory of your project (where your Dockerfile is located). Then run the following command:</p>
<pre><code class="lang-bash">docker build -t demo-lambda-project:latest .
</code></pre>
<ul>
<li><p>The <code>docker build</code> command tells Docker to create an image.</p>
</li>
<li><p>The <code>-t demo-lambda-project:latest</code> flag assigns a tag (or name) to your image (we’ll change this later to the image naming convention supported by AWS Elastic Container Registry – ECR).</p>
<ul>
<li>Here, <code>demo-lambda-project</code> is the name, and <code>latest</code> is the tag indicating the most recent build.</li>
</ul>
</li>
<li><p>The <code>.</code> at the end tells Docker to look for the Dockerfile in the current directory.</p>
</li>
</ul>
<h4 id="heading-what-this-does">What This Does</h4>
<p>Docker will now follow the instructions in your Dockerfile step-by-step. It starts by building your Node.js app (using the lightweight Node 18 image), installs the dependencies, and then copies everything over to an AWS Lambda-ready image. Once done, you have a neat image tagged as <code>demo-lambda-project:latest</code> that’s ready for deployment.</p>
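<p>Before pushing it anywhere, you can sanity-check the image locally. AWS’s Lambda base images ship with a built-in runtime interface emulator, so a quick local test looks like this (the port mapping and invocation URL follow the emulator’s convention):</p>
<pre><code class="lang-bash"># Start the container; the Lambda base image launches its runtime
# interface emulator, which listens on port 8080 inside the container.
docker run -p 9000:8080 demo-lambda-project:latest

# In a second terminal, invoke the function the way Lambda would:
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
</code></pre>
<p>If you get a response back, your image is wired up correctly and ready for ECR.</p>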
<h2 id="heading-how-to-create-a-container-registry-on-aws-elastic-container-registry-ecr">How to Create a Container Registry on AWS Elastic Container Registry (ECR) 📁</h2>
<p>Okay, let’s dive into creating an image registry on AWS Elastic Container Registry (ECR). Follow these steps closely to set up your repository named <code>lambda-practice</code>:</p>
<h3 id="heading-step-1-sign-in-and-navigate-to-aws-ecr">Step 1: Sign In and Navigate to AWS ECR</h3>
<p>Log in to your AWS Management Console: <a target="_blank" href="https://console.aws.amazon.com/console/home">https://console.aws.amazon.com/console/home</a>.</p>
<p>In the search bar at the top, type "ECR". You should see Amazon ECR pop up in the dropdown results. Click on it to navigate to the Elastic Container Registry section.</p>
<h3 id="heading-step-2-start-creating-your-repository">Step 2: Start Creating Your Repository</h3>
<p>Once you’re in the ECR section, look for a button that says "Create repository". Click this button to start setting up your new container registry.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744649904087/615bbd21-c6ed-4243-9a18-10042eec9634.png" alt="Create new AWS ECR repository" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-3-configuring-the-repository-details">Step 3: Configuring the Repository Details</h3>
<p>You’ll need to add some info like:</p>
<ul>
<li><p><strong>Repository name:</strong> In the form that appears, enter <code>lambda-practice</code> as the repository name. This name will be used to reference your repository later when uploading your Docker image.</p>
</li>
<li><p><strong>Tag mutability:</strong> You’ll also see an option for Tag Mutability. For this tutorial, set it to Mutable. This means that if you need to update or change a tag on your image later, you can do so. (Keep in mind that in some scenarios, you might want immutable tags for images used in production environments – but mutable tags are great for testing and development, especially since we want to use the tag <code>latest</code> for our images.)</p>
</li>
</ul>
<p>When you’re happy with the settings, click the "Create repository" button at the bottom of the form.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744650070919/3010590f-f2e3-4d52-9631-8c5d4e1a5239.png" alt="Configure AWS ECR repository" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-repository-created-now-lets-take-a-look">Repository Created – Now Let’s Take a Look</h3>
<p>After creating the repository, AWS will redirect you to the page listing your repositories.</p>
<p>Find the repository named <code>lambda-practice</code> in the list. This is your newly created container registry where you can push Docker images.</p>
<p>Copy the <code>lambda-practice</code> repository URI, which we’ll need later when we push our image from our local machine. The URI should be in a format similar to this: <code>&lt;aws_account_id&gt;.dkr.ecr.&lt;region&gt;.amazonaws.com/lambda-practice</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744650192129/67d724c7-15da-4ff1-8e38-638c3a8d1aa4.png" alt="Completed creation of AWS ECR repository" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>And that’s it! You’ve now successfully created a container registry on AWS ECR and have your repository (<code>lambda-practice</code>) ready to receive your Docker image. 🚀</p>
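<p>Hold on to that URI – it becomes part of your image’s name. As a quick preview of the naming convention (the <code>ECR_REPOSITORY_URI</code> variable below is a stand-in for the URI you just copied):</p>
<pre><code class="lang-bash"># Re-tag the local image as repository_uri:tag, for example
#   123456789012.dkr.ecr.us-east-1.amazonaws.com/lambda-practice:latest
ECR_REPOSITORY_URI="123456789012.dkr.ecr.us-east-1.amazonaws.com/lambda-practice"
docker tag demo-lambda-project:latest "$ECR_REPOSITORY_URI:latest"
</code></pre>
<p>We’ll walk through the actual tagging and pushing in detail once our permissions are set up.</p>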
<h2 id="heading-iam-with-aws-how-to-create-a-user-on-aws-iam-to-allow-access-to-your-aws-ecr">IAM with AWS: How to Create a User on AWS IAM to Allow Access to Your AWS ECR 👤🔐</h2>
<p>Now that we’ve successfully created our AWS ECR container registry (the home for our Docker image), it’s time to make sure our local machine has the necessary permissions to interact with that registry. Without proper authorization, we won’t be able to upload our image.</p>
<p>To do that, we’ll create an IAM user with the appropriate permissions.</p>
<h3 id="heading-step-1-access-the-iam-console">Step 1: Access the IAM Console</h3>
<p>Start by logging in to your AWS Management Console: <a target="_blank" href="https://console.aws.amazon.com/console/home">https://console.aws.amazon.com/console/home</a>.</p>
<p>In the search bar at the top, type "IAM" and select the IAM service from the dropdown. This brings you to the IAM dashboard where you can manage users, roles, policies, and more.</p>
<h3 id="heading-step-2-navigate-to-the-users-section">Step 2: Navigate to the Users Section</h3>
<p>On the left sidebar of the IAM dashboard, click on "Users". Here you'll see a list of existing users, and this is where you'll add a new one.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744651384601/085a25ca-82eb-447b-8106-46df32264a85.png" alt="Create AWS IAM User" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-3-create-a-new-user">Step 3: Create a New User</h3>
<p>Click the "Add users" button at the top. In the "Set user details" step, enter the username as <code>lambda-practice</code>.</p>
<h3 id="heading-step-4-attach-permissions-directly">Step 4: Attach Permissions Directly</h3>
<p>In the "Set permissions" step, choose "Attach policies directly". In the search box, type <code>AmazonEC2ContainerRegistryPowerUser</code>. Select the <code>AmazonEC2ContainerRegistryPowerUser</code> policy by ticking its checkbox. This policy grants the necessary permissions to work with AWS ECR, such as pushing and pulling Docker images.</p>
<p>Click Next, and verify that the username is <code>lambda-practice</code> and that the AmazonEC2ContainerRegistryPowerUser policy is attached. If everything looks good, click "Create user".</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744651476901/c6d91c8c-9757-4cc6-a00f-c23d3a72de59.png" alt="Add policy to AWS IAM User" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-5-generate-access-keys-for-the-user">Step 5: Generate Access Keys for the User</h3>
<p>Once the user is created, you’ll be redirected to the page listing all IAM users. Locate and click on the user <code>lambda-practice</code>. This action will take you to the user’s summary page.</p>
<ul>
<li><p>Navigate to the "Security credentials" tab.</p>
</li>
<li><p>Under "Access keys", click the "Create access key" button.</p>
</li>
<li><p>A page will appear for configuring the new access key.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744652284582/f6a586e9-d09e-467f-ad12-81ccf538bc34.png" alt="Create Access key for AWS IAM User" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In the "Access key best practices &amp; alternatives" step, select "Command Line Interface (CLI)".</p>
<p><strong>Why should you select this option?</strong> Choosing CLI ensures that the generated access key is optimized for use with the AWS CLI and other command-line tools (like Docker commands that push images to ECR), which is exactly what we need for our workflow.</p>
<p>Leave the other configurations as their default settings, and then click "Create access key".</p>
<p>Once the key is created, you’ll see the new Access key ID and Secret access key. Make sure to copy and store these credentials securely. They are essential for authorizing your local machine to access AWS ECR and perform operations with the permissions assigned to the <code>lambda-practice</code> user.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744652339772/c3d94e2a-f823-4d73-9a46-ab4d829289e9.png" alt="Completed creation of Access key for AWS IAM User" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
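<p>For reference, the whole user setup above can also be scripted with the AWS CLI. This is a hedged sketch, not a replacement for the console flow – it assumes you already have an administrator profile configured on your machine:</p>
<pre><code class="lang-bash"># Create the user, attach the ECR power-user policy, and generate an access key
aws iam create-user --user-name lambda-practice

aws iam attach-user-policy \
    --user-name lambda-practice \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser

# Prints the Access key ID and Secret access key – store them securely
aws iam create-access-key --user-name lambda-practice
</code></pre>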
<h3 id="heading-how-to-authorize-your-local-pc-to-publish-images-to-the-aws-ecr-repository"><strong>How to Authorize Your Local PC to Publish Images to the AWS ECR Repository</strong></h3>
<p>Now that we have our IAM user set up and the access keys in hand, it’s time to authenticate our local PC so we can securely push our Docker images to AWS ECR using the AWS CLI. Follow these steps:</p>
<h4 id="heading-step-1-install-the-aws-cli">Step 1: Install the AWS CLI</h4>
<p>If you haven’t installed the AWS CLI on your machine yet, download and install it using the official guide here: <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html">Install the AWS CLI</a>.</p>
<p>This tool allows you to interact with your AWS account right from the command line, which is essential for pushing images to ECR.</p>
<h4 id="heading-step-2-configure-your-aws-cli-credentials">Step 2: Configure Your AWS CLI Credentials</h4>
<p>Once installed, you need to configure your AWS CLI to use the credentials associated with the <code>lambda-practice</code> user. Open your terminal and run the following command to set up a new profile named <code>lambda</code>:</p>
<pre><code class="lang-bash">aws configure --profile lambda
</code></pre>
<p>You’ll be prompted to enter the following details:</p>
<ul>
<li><p><strong>AWS Access Key ID:</strong> Paste the access key ID that you generated for the <code>lambda-practice</code> user.</p>
</li>
<li><p><strong>AWS Secret Access Key:</strong> Paste the corresponding secret access key.</p>
</li>
<li><p><strong>Default region name:</strong> Enter your preferred AWS region (for example, <code>us-east-1</code> or your relevant region).</p>
</li>
<li><p><strong>Default output format:</strong> You can leave this as <code>json</code> or choose your preferred format.</p>
</li>
</ul>
<p>This command configures a new CLI profile called <code>lambda</code> with the credentials of our IAM user.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744652931837/650c93af-25f0-4d7b-a202-50d825a6b77a.png" alt="Authenticate and authorize AWS CLI with AWS IAM User Access key" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-step-3-verify-the-configuration">Step 3: Verify the Configuration</h4>
<p>To ensure everything is set up correctly, run:</p>
<pre><code class="lang-bash">aws sts get-caller-identity --profile lambda
</code></pre>
<p>This command will return details about the IAM user configured for the <code>lambda</code> profile, confirming that your local PC is now authenticated correctly.</p>
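<p>If the profile is configured correctly, the command returns a small JSON document along these lines (the IDs and ARN below are placeholders – yours will show your own account ID and user):</p>
<pre><code class="lang-json">{
    "UserId": "AIDAEXAMPLEUSERID",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/lambda-practice"
}
</code></pre>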
<p>Now you’re all set! Your AWS CLI is configured with the <code>lambda</code> profile, meaning your local machine has the right credentials to interact with your AWS ECR repository and push Docker images using the permissions assigned to your <code>lambda-practice</code> IAM user.</p>
<h2 id="heading-how-to-upload-your-docker-image-to-the-aws-ecr-repository">How to Upload Your Docker Image to the AWS ECR repository ⬆️</h2>
<p>Uploading your Docker image to AWS ECR is the moment when your hard work gets sent off to your repository so AWS Lambda can later grab and run your container. Now that your PC is authorized to talk to ECR, let’s take a look at how to upload the image.</p>
<h3 id="heading-step-1-log-in-to-ecr-with-docker">Step 1: Log in to ECR with Docker</h3>
<p>Before you can push your image, you need to authenticate Docker to your AWS ECR account. You do this by running a command that gets an authentication token from AWS and pipes it to Docker. For example:</p>
<pre><code class="lang-bash">aws ecr get-login-password --region &lt;YOUR_REGION&gt; --profile lambda | docker login --username AWS --password-stdin &lt;YOUR_AWS_ACCOUNT_ID&gt;.dkr.ecr.&lt;YOUR_REGION&gt;.amazonaws.com
</code></pre>
<p>Let’s break it down:</p>
<ul>
<li><p><code>aws ecr get-login-password --region &lt;YOUR_REGION&gt; --profile lambda</code>: This part uses the AWS CLI to get a temporary login password for ECR. Be sure to replace <code>&lt;YOUR_REGION&gt;</code> with the region in which your ECR repository was created (for example, <code>us-east-1</code>).</p>
</li>
<li><p><code>| docker login --username AWS --password-stdin &lt;YOUR_AWS_ACCOUNT_ID&gt;.dkr.ecr.&lt;YOUR_REGION&gt;.amazonaws.com</code>: The pipe (<code>|</code>) takes the password from the AWS CLI command and passes it as input to <code>docker login</code>. The login command then logs Docker into ECR using the provided username (<code>AWS</code>) and the password. Replace <code>&lt;YOUR_AWS_ACCOUNT_ID&gt;</code> with your actual AWS account ID.</p>
</li>
</ul>
<h3 id="heading-step-2-environment-considerations">Step 2: Environment Considerations</h3>
<p>This command works in shell environments such as PowerShell, Zsh, and Bash.</p>
<p><strong>Windows Users (CMD)</strong>:<br>If you’re using the classic Windows Command Prompt (CMD), the piping syntax might not work the same way. In that case, you might consider using Windows PowerShell or Git Bash. Alternatively, you can run the command in an environment like Windows Subsystem for Linux (WSL).</p>
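<p>If you’d rather avoid piping entirely (for example in classic CMD), you can log in manually in two steps. Note that the token printed in step 1 is temporary (valid for 12 hours), but it will sit in your terminal history, so prefer the piped form where possible:</p>
<pre><code class="lang-bash"># 1) Print a temporary login token, then copy it:
aws ecr get-login-password --region &lt;YOUR_REGION&gt; --profile lambda

# 2) Log in, pasting the copied token when Docker prompts for a password:
docker login --username AWS &lt;YOUR_AWS_ACCOUNT_ID&gt;.dkr.ecr.&lt;YOUR_REGION&gt;.amazonaws.com
</code></pre>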
<h4 id="heading-why-use-the-correct-region">Why Use the Correct Region?</h4>
<p>It is crucial to use the exact region where your ECR repository was created. The region is a part of your repository URI. If you use the wrong region, the login will fail because it won’t find the correct repository endpoint.</p>
<h4 id="heading-how-to-check-the-region">How to Check the Region:</h4>
<p>Log in to your AWS Console, navigate to the ECR section, and select your repository. The URI will look similar to this: <code>&lt;YOUR_AWS_ACCOUNT_ID&gt;.dkr.ecr.&lt;YOUR_REGION&gt;.amazonaws.com/lambda-practice</code>. Here, <code>&lt;YOUR_REGION&gt;</code> is the region you must use in your login command.</p>
<h3 id="heading-step-3-build-your-docker-image-with-the-correct-tag">Step 3: Build Your Docker Image with the Correct Tag</h3>
<p>Before pushing the image to ECR, you need to build it on your local machine and tag it with your repository’s name. In your terminal, navigate to your project’s root folder (where your Dockerfile is located), then run the following command, replacing the <code>&lt;YOUR_AWS_ACCOUNT_ID&gt;</code> and <code>&lt;YOUR_REGION&gt;</code> placeholders with your AWS account ID and ECR repository region. Note the trailing dot – it sets the build context to the current directory:</p>
<pre><code class="lang-bash">docker build -t &lt;YOUR_AWS_ACCOUNT_ID&gt;.dkr.ecr.&lt;YOUR_REGION&gt;.amazonaws.com/lambda-practice:latest .
</code></pre>
<h3 id="heading-step-4-push-your-docker-image-to-aws-ecr">Step 4: Push Your Docker Image to AWS ECR</h3>
<p>Once your image is built and tagged, it’s time to push it to your remote ECR repository. Run the following command:</p>
<pre><code class="lang-bash">docker push &lt;YOUR_AWS_ACCOUNT_ID&gt;.dkr.ecr.&lt;YOUR_REGION&gt;.amazonaws.com/lambda-practice:latest
</code></pre>
<p>This command tells Docker to upload (or “push”) your image to the repository you created earlier.</p>
<ul>
<li><p>Make sure the repository URI and tag match what you used in the build command.</p>
</li>
<li><p>Remember, if you use a different region than the one in your repository URI, the push will fail because AWS won’t recognize the repository endpoint.</p>
</li>
</ul>
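<p>Once the push completes, you can confirm the image actually landed in the repository. A quick check using the same CLI profile we configured earlier:</p>
<pre><code class="lang-bash">aws ecr describe-images \
    --repository-name lambda-practice \
    --region &lt;YOUR_REGION&gt; \
    --profile lambda
</code></pre>
<p>The output should list one image whose <code>imageTags</code> array contains <code>latest</code>.</p>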
<h2 id="heading-how-to-deploy-the-application-container-to-aws-lambda-from-the-image-on-aws-ecr">How to Deploy the Application Container to AWS Lambda from the Image on AWS ECR 🚀</h2>
<p>You can deploy your function on AWS Lambda in several ways, each catering to different use cases. Here’s a quick rundown:</p>
<ol>
<li><p><strong>ZIP file upload:</strong> Simply compress your code and dependencies into a ZIP file, then upload it directly via the AWS Lambda console. This traditional method is great for small codebases that don’t require custom runtimes.</p>
</li>
<li><p><strong>Direct editing in the console:</strong> Write or edit your function code directly in the AWS Lambda code editor. Handy for quick tweaks, but not ideal for larger projects.</p>
</li>
<li><p><strong>Container image:</strong> Package your application as a Docker container image and deploy it. This approach is particularly useful if you have complex dependencies, need a custom runtime, or want consistent environments across development and production.</p>
</li>
</ol>
<p>In this tutorial, we’re taking the container image route because it offers flexibility, consistency, and scalability – all while letting us reuse our existing Docker configuration. Let’s walk through the steps for deploying your containerized application to AWS Lambda:</p>
<h3 id="heading-step-1-access-the-aws-lambda-console">Step 1: Access the AWS Lambda Console</h3>
<p>Log into your AWS Management Console. In the search bar at the top, type "Lambda" and select the AWS Lambda service from the dropdown results.</p>
<h3 id="heading-step-2-create-a-new-lambda-function">Step 2: Create a New Lambda Function</h3>
<p>Once on the Lambda page, click the "Create function" button. You’ll see multiple function creation options. For our purposes, select the "Container image" option. This choice tells AWS that you’ll be deploying a containerized application instead of uploading a ZIP file.</p>
<h3 id="heading-step-3-name-your-function">Step 3: Name Your Function</h3>
<p>In the function setup screen, enter <code>lambda-practice</code> as the name of your new Lambda function. This name identifies your function in AWS.</p>
<h3 id="heading-step-4-configure-the-container-image">Step 4: Configure the Container Image</h3>
<p>Under the “Container image” settings, click the "Browse images" button. A new window should appear, listing your available images from AWS Elastic Container Registry (ECR).</p>
<p>Select the repository you previously created (for instance, the one named <code>lambda-practice</code>), and pick the image tagged as <code>latest</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744655907615/df0e3576-5fe6-43a7-8da5-d2964b36a2af.png" alt="Create AWS Lambda function" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744655978526/fafd6b35-579a-4439-b15e-dd5e3dba2acf.png" alt="Connect AWS ECR image to AWS lambda Function" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744656031049/3de3bcc1-2034-4518-acb6-84adb6136752.png" alt="Select Image from AWS ECR repository" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-5-finalize-and-create">Step 5: Finalize and Create</h3>
<p>Now you’ll want to review the basic settings. In this step, you might also configure additional options such as memory allocation, timeout limits, and environment variables, depending on your application needs.</p>
<p>Once everything is set, click "Create function" to finalize the deployment.</p>
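<p>For reference, the same deployment can be done from the CLI. This is a hedged sketch – it assumes you’ve already created a Lambda execution role and have its ARN at hand (the console creates this role for you automatically):</p>
<pre><code class="lang-bash">aws lambda create-function \
    --function-name lambda-practice \
    --package-type Image \
    --code ImageUri=&lt;YOUR_AWS_ACCOUNT_ID&gt;.dkr.ecr.&lt;YOUR_REGION&gt;.amazonaws.com/lambda-practice:latest \
    --role &lt;YOUR_EXECUTION_ROLE_ARN&gt; \
    --region &lt;YOUR_REGION&gt; \
    --profile lambda
</code></pre>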
<h3 id="heading-how-to-enable-access-to-your-lambda-function">How to Enable Access to Your Lambda Function</h3>
<p>Awesome – hurray, you’ve successfully deployed your image from AWS ECR to AWS Lambda! Now the next step is to make sure your function is up and running and can be triggered properly. But you might be wondering, “How do I actually access my Lambda function to see if it’s working?” Let's break it down:</p>
<h4 id="heading-understanding-lambda-function-triggers">Understanding Lambda Function Triggers</h4>
<p>There are several ways to invoke a Lambda function, and AWS supports multiple trigger options. Here are a few:</p>
<ul>
<li><p><strong>Event Source Mapping:</strong> Automatically triggers your function in response to changes in services like DynamoDB, Kinesis, or S3.</p>
</li>
<li><p><strong>Scheduled Events:</strong> Set up cron-like scheduled invocations via Amazon EventBridge (formerly CloudWatch Events).</p>
</li>
<li><p><strong>API Gateway:</strong> Create RESTful APIs that call your function.</p>
</li>
<li><p><strong>AWS SDK/CLI:</strong> Directly invoke the function using the AWS SDK or CLI commands.</p>
</li>
<li><p><strong>Function URLs:</strong> A simple way to expose your function over HTTPS, giving you a public URL that users or applications can call directly.</p>
</li>
</ul>
<p>In this tutorial, we’re going to use a Function URL to trigger our Lambda function via an HTTP event. This method allows you to invoke your function from the public internet and is perfect for testing or building public-facing APIs.</p>
<h3 id="heading-how-to-create-a-function-url-for-your-lambda-function">How to Create a Function URL for Your Lambda Function</h3>
<p>Now that you're on your Lambda function's details page, here’s how to create a Function URL step-by-step:</p>
<p>First, on your Lambda function’s page, click the "Configuration" tab at the top. Within the Configuration section, find and select the "Function URL" sub-tab. This is where you manage the public URL for your function.</p>
<p>Click on the "Create Function URL" button. This will open a new configuration screen for setting up your Function URL.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744656877335/835422c5-8c88-418a-b1f2-3650360069c3.png" alt="Create Function URL for AWS Lambda Function" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<ul>
<li><p><strong>Authentication type:</strong> Set the Auth type to NONE. This setting allows public, unauthenticated access to your function from the internet, which means anyone with the URL can invoke it. (This is great for testing or building public services, but be cautious with security in production environments!)</p>
</li>
<li><p><strong>Additional settings:</strong> Under the Additional Settings section, enable Configure cross-origin resource sharing (CORS). This is useful if you plan to call your function from client-side applications hosted on different domains. Think of it as opening a window for your app to communicate with other web pages or services.</p>
</li>
</ul>
<p>After configuring your settings, click the appropriate button to create or save the Function URL.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744656860868/cd98ce34-7fdf-4cb6-be85-a25d3718e2e6.png" alt="Configure AWS Function URL for AWS Lambda Function" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-verify-your-function-url">Verify Your Function URL</h4>
<p>Once configured, you’ll see the Function URL displayed on the same page. You can now copy this URL.</p>
<p>Paste the URL into a browser or use tools like <code>curl</code> or Postman to send an HTTP request, triggering your Lambda function and verifying that it works as expected.</p>
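<p>For example, with <code>curl</code> (the URL shape below is the format Lambda generates – substitute the actual Function URL you copied):</p>
<pre><code class="lang-bash">curl https://&lt;url-id&gt;.lambda-url.&lt;YOUR_REGION&gt;.on.aws/
</code></pre>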
<p>You should get a response just like this on your browser:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1744656939019/fcda2621-8057-438b-8d5a-8ac8936b6322.png" alt="Deployed application on AWS Lambda" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>And that’s it! You’ve successfully set up a public HTTP endpoint that triggers your AWS Lambda function. Whether you're testing your deployment or building a public-facing API, the Function URL makes it easy for anyone to interact with your function.</p>
<h3 id="heading-congrats-you-did-it"><strong>Congrats — You did it!</strong></h3>
<p>You've just walked through the entire journey of deploying a Node.js web server, containerized with Docker, all the way to AWS Lambda using AWS ECR as your image repository. 🚀</p>
<p>From writing and containerizing your Node.js application, creating an AWS ECR repository, setting up IAM users and access keys, pushing your Docker image to ECR, to deploying it on Lambda – you’ve covered it all like a pro. 💪</p>
<p>Not only that, but you also configured a public-facing Function URL so your serverless app can now handle requests from anywhere in the world 🌍.</p>
<p>You’ve just combined modern cloud-native workflows with serverless deployment – giving you flexibility, scalability, and lightning-fast response times without the headache of managing servers 😁.</p>
<p>👏 Give yourself a pat on the back. You’ve officially containerized and deployed your Node.js web server to AWS Lambda!</p>
<h2 id="heading-advantages-of-adopting-the-serverless-model-in-businesses">Advantages of Adopting the Serverless Model in Businesses 💼</h2>
<p>When it comes to deploying applications in the cloud, the serverless model has truly flipped the old playbook and helped businesses save on cloud costs. Let’s break it down in simple, real-world terms.</p>
<h3 id="heading-cost-efficiency"><strong>Cost-Efficiency 💰</strong></h3>
<p>For most businesses – especially startups – serverless offers a major financial advantage. Here’s why:</p>
<p>In traditional models like IaaS (Infrastructure as a Service) and PaaS (Platform as a Service), such as using AWS EC2 or AWS Elastic Beanstalk, you provision resources upfront.</p>
<p>For example: You spin up a server with 4 GB RAM and 4 vCPUs, and AWS charges you $100/month (this covers 730 hours – the whole month). Even if your app barely does anything – say it only serves real requests for 120 hours, and uses just 1 GB of memory – you still pay the full $100, because the resources were reserved and waiting for traffic 24/7.</p>
<p>But with Serverless:</p>
<ul>
<li><p>You don’t pre-allocate or reserve compute power.</p>
</li>
<li><p>Your application only runs when someone actually needs it (for example, when a user makes an HTTP request).</p>
</li>
<li><p>You only pay for the actual execution time and the resources used.</p>
</li>
</ul>
<p>For instance, if your function only runs for 50 hours in a month and uses 1.5 GB RAM, you might pay something like $30, compared to the flat $100 you'd have paid on EC2 or Elastic Beanstalk.</p>
<h3 id="heading-scalability-without-stress"><strong>Scalability Without Stress 📈</strong></h3>
<p>Serverless platforms like AWS Lambda automatically handle:</p>
<ul>
<li><p>Scaling up during high demand.</p>
</li>
<li><p>Scaling down to zero when idle.</p>
</li>
</ul>
<p>This means your team won’t need to predict or provision for resources during traffic surges. Whether 1 or 1 million users visit your app, the cloud provider handles the rest.</p>
<h3 id="heading-simplified-operations"><strong>Simplified Operations ⚙️</strong></h3>
<p>For your software team:</p>
<ul>
<li><p>No more babysitting servers, patching security updates, or worrying about load balancers.</p>
</li>
<li><p>You focus purely on writing the business logic and shipping code.</p>
</li>
<li><p>The cloud provider handles the infrastructure behind the scenes.</p>
</li>
</ul>
<p>This frees up your team’s time, cuts maintenance tasks, and speeds up development times.</p>
<h3 id="heading-better-return-on-investment-roi"><strong>Better Return on Investment (ROI) 📊</strong></h3>
<p>Because you only pay for what you use, the cost-to-value ratio improves significantly. Startups and businesses can:</p>
<ul>
<li><p>Launch faster.</p>
</li>
<li><p>Experiment without financial risk.</p>
</li>
<li><p>Scale without surprise bills.</p>
</li>
<li><p>Avoid overpaying for idle resources.</p>
</li>
</ul>
<h2 id="heading-disadvantages-of-the-serverless-model">Disadvantages of the Serverless Model 🚫</h2>
<p>As exciting and cost-friendly as the serverless model seems, the golden rule in tech still applies:<br>every solution comes with trade-offs.</p>
<p>Let’s walk through a few important downsides you should consider:</p>
<h3 id="heading-no-built-in-support-for-background-jobs"><strong>No Built-in Support for Background Jobs ⏰</strong></h3>
<p>Unlike traditional servers where you can run background processes – like sending out newsletters at midnight or cleaning up databases at scheduled times – serverless platforms such as AWS Lambda don’t natively support background tasks or recurring jobs.</p>
<p>For example, let’s say you wanted your app to automatically generate reports every day at 3 AM. In a typical server setup, you’d just write a cron job and call it a day.</p>
<p>But with Lambda or serverless, you can’t do this directly inside your deployed function. Instead, you need external tools like:</p>
<ul>
<li><p>AWS EventBridge (for scheduling and triggering Lambda functions)</p>
</li>
<li><p>Or other cloud-native schedulers.</p>
</li>
</ul>
<p>This adds a bit of extra setup, management, and sometimes extra cost.</p>
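<p>As a rough sketch, the 3 AM report example above could be wired up with two EventBridge CLI commands. The rule name here is arbitrary, the function ARN is a placeholder, and the cron expression runs in UTC:</p>
<pre><code class="lang-bash"># Fire every day at 03:00 UTC
aws events put-rule \
    --name daily-report \
    --schedule-expression "cron(0 3 * * ? *)"

# Point the rule at your Lambda function
aws events put-targets \
    --rule daily-report \
    --targets "Id"="1","Arn"="&lt;YOUR_LAMBDA_FUNCTION_ARN&gt;"
</code></pre>
<p>You’d also need to grant EventBridge permission to invoke the function (via <code>aws lambda add-permission</code>) – exactly the kind of extra wiring this section is describing.</p>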
<h3 id="heading-unpredictable-cloud-costs"><strong>Unpredictable Cloud Costs 💸</strong></h3>
<p>One of the biggest selling points of serverless is “pay-as-you-use” – but this can also become a financial blind spot, because:</p>
<ul>
<li><p>Costs depend on traffic volume and resource usage.</p>
</li>
<li><p>If your app suddenly goes viral or experiences a traffic spike, your cloud bill could skyrocket without warning.</p>
</li>
</ul>
<p>For example, an app that runs stable at $30/month for low traffic could unexpectedly hit $1000+ if a marketing campaign or external event drives huge numbers of users to your service. While this means your app is succeeding, your budget might take a hit.</p>
<p>In contrast, with traditional models like AWS EC2 or Elastic Beanstalk, your costs are usually predictable – even if your server sits idle all month.</p>
<h2 id="heading-when-to-adopt-the-serverless-model">When to Adopt the Serverless Model 🤔</h2>
<p>So, is Serverless always the right choice? Not necessarily!</p>
<p>If you expect:</p>
<ul>
<li><p><strong>Steady, predictable workloads,</strong> EC2 or Elastic Beanstalk might offer more cost certainty.</p>
</li>
<li><p><strong>Long-running background tasks</strong>, serverless isn’t ideal without extra services.</p>
</li>
<li><p><strong>Real-time control over resource limits</strong>, traditional servers give you more flexibility.</p>
</li>
</ul>
<p>But if your app has burst traffic (users come and go), event-driven logic (like APIs or webhooks), or you want minimal ops overhead, then Serverless can save time, effort, and money.</p>
<h3 id="heading-when-serverless-is-the-perfect-fit-a-startup-building-an-event-driven-api"><strong>When Serverless is the Perfect Fit: A Startup Building an Event-Driven API</strong></h3>
<p>Imagine you’re running a small tech startup that just launched an app for booking fitness classes. Your team is small, budgets are tight, and traffic is unpredictable – some days you have 50 users, some days 5,000.</p>
<p>In this case:</p>
<ul>
<li><p>Your backend mostly handles HTTP requests: new sign-ups, class bookings, cancellations, and payments.</p>
</li>
<li><p>Traffic spikes during lunch breaks and weekends, but is quiet at night.</p>
</li>
<li><p>You don’t want to hire a full-time DevOps engineer just to manage servers.</p>
</li>
</ul>
<p>👉 <strong>Why Serverless is perfect in this case:</strong></p>
<ul>
<li><p>You only pay when people use your app.</p>
</li>
<li><p>No need to manage or provision servers.</p>
</li>
<li><p>AWS Lambda auto-scales based on demand.</p>
</li>
<li><p>Fast to deploy, easy to connect to other AWS services (like DynamoDB for your database, S3 for images, and SES for emails).</p>
</li>
</ul>
<p>By using Serverless in this case, you can save money, scale automatically, and stay laser-focused on features – not infrastructure.</p>
<h3 id="heading-when-serverless-is-not-a-good-fit-a-video-streaming-platform"><strong>When Serverless is Not a Good Fit: A Video Streaming Platform</strong></h3>
<p>Now imagine you’re building the next YouTube-like service for a niche audience – say, education-based content for universities.</p>
<p>In this case:</p>
<ul>
<li><p>Your platform requires continuous background processing: encoding videos, generating thumbnails, and pushing them to CDN.</p>
</li>
<li><p>Users stream content 24/7, meaning your app is always under load.</p>
</li>
<li><p>Background jobs like recommendation engine updates or nightly reports need to run frequently.</p>
</li>
</ul>
<p>👉 <strong>Why Serverless might be a bad idea:</strong></p>
<ul>
<li><p>Functions like AWS Lambda have a timeout limit (for example 15 minutes max per execution).</p>
</li>
<li><p>Continuous processing or streaming doesn’t fit the on-demand, short-lived nature of serverless.</p>
</li>
<li><p>Costs could skyrocket since the app runs almost all the time, making it more expensive than a dedicated EC2 or Kubernetes cluster.</p>
</li>
</ul>
<p><strong>Better alternative:</strong><br>For this kind of use case, a traditional server-based setup – like EC2 or container orchestration via ECS or Kubernetes – would offer more control, predictable pricing, and support for long-running processes.</p>
<p>✅ <strong>Bottom line:</strong><br>Serverless is fantastic for modern apps, but like any tool, it’s best used when its strengths match your project’s needs.</p>
<h2 id="heading-conclusion">Conclusion 📝</h2>
<p>Congratulations on making it to the end of this tutorial! 🚀</p>
<p>In this article, we explored the power of serverless computing by walking step-by-step through the process of deploying a Node.js web server using Docker and AWS Lambda.</p>
<p>From building your container image, pushing it to AWS ECR, and finally deploying it on Lambda – you’ve now seen how easy it is to get an app running without the hassle of provisioning servers.</p>
<p>We also discussed the advantages of adopting the serverless model for deploying your applications, its disadvantages, and real-world use cases in which you should (or shouldn’t) adopt the serverless approach.</p>
<h2 id="heading-about-the-author"><strong>About the Author 👨‍💻</strong></h2>
<p>Hi, I’m Prince! I’m a DevOps engineer and Cloud architect passionate about building, deploying, and managing scalable applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this article, you can learn more about me by exploring more of my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile</a>. You can find my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">LinkedIn articles here</a>. You can also <a target="_blank" href="https://prince-onuk.vercel.app/achievements#articles">visit my website</a> to read more of my articles as well. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The CI/CD Handbook: Learn Continuous Integration and Delivery with GitHub Actions, Docker, and Google Cloud Run ]]>
                </title>
                <description>
                    <![CDATA[ Hey everyone! 🌟 If you’re in the tech space, chances are you’ve come across terms like Continuous Integration (CI), Continuous Delivery (CD), and Continuous Deployment. You’ve probably also heard about automation pipelines, staging environments, pro... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-continuous-integration-delivery-and-deployment/</link>
                <guid isPermaLink="false">6751d2f856661d3d5a501466</guid>
                
                    <category>
                        <![CDATA[ Continuous Integration ]]>
                    </category>
                
                    <category>
                        <![CDATA[ continuous delivery ]]>
                    </category>
                
                    <category>
                        <![CDATA[ continuous deployment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GitHub Actions ]]>
                    </category>
                
                    <category>
                        <![CDATA[ CI/CD ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Thu, 05 Dec 2024 16:21:12 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734119999570/cfbf3375-1e95-41df-b5b0-8fbb8b827f59.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Hey everyone! 🌟 If you’re in the tech space, chances are you’ve come across terms like <strong>Continuous Integration (CI)</strong>, <strong>Continuous Delivery (CD)</strong>, and <strong>Continuous Deployment</strong>. You’ve probably also heard about automation pipelines, staging environments, production environments, and concepts like testing workflows.</p>
<p>These terms might seem complex or interchangeable at first glance, leaving you wondering: What do they actually mean? How do they differ from one another? 🤔</p>
<p>In this handbook, I’ll break down these concepts in a clear and approachable way, drawing on relatable analogies to make each term easier to understand. 🧠💡 Beyond just theory, we’ll dive into a hands-on tutorial where you’ll learn how to set up a CI/CD workflow step by step.</p>
<p>Together, we’ll:</p>
<ul>
<li><p>Set up a Node.js project. ✨</p>
</li>
<li><p>Implement automated tests using Jest and Supertest. 🛠️</p>
</li>
<li><p>Set up a CI/CD workflow using GitHub Actions, triggered on pushes and pull requests, or after a new release. ⚙️</p>
</li>
<li><p>Build and publish a Docker image of your application to Docker Hub. 📦</p>
</li>
<li><p>Deploy your application to a staging environment for testing. 🚀</p>
</li>
<li><p>Finally, roll it out to a production environment, making it live! 🌐</p>
</li>
</ul>
<p>By the end of this guide, not only will you understand the difference between CI/CD concepts, but you’ll also have practical experience in building your own automated pipeline. 😃</p>
<h3 id="heading-table-of-contents">Table of Contents</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-continuous-integration-deployment-and-delivery"><strong>What is Continuous Integration, Deployment, and Delivery?</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-differences-between-continuous-integration-continuous-delivery-and-continuous-deployment"><strong>Differences Between Continuous Integration, Continuous Delivery, and Continuous Deployment</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-a-nodejs-project-with-a-web-server-and-automated-tests"><strong>How to Set Up a Node.js Project with a Web Server and Automated Tests</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-a-github-repository-to-host-your-codebase"><strong>How to Create a GitHub Repository to Host Your Codebase</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-the-ci-and-cd-workflows-within-your-project"><strong>How to Set Up the CI and CD Workflows Within Your Project</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-set-up-a-docker-hub-repository-for-the-projects-image-and-generate-an-access-token-for-publishing-the-image"><strong>Set Up a Docker Hub Repository for the Project's Image and Generate an Access Token for Publishing the Image</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-create-a-google-cloud-account-project-and-billing-account"><strong>Create a Google Cloud Account, Project, and Billing Account</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-create-a-google-cloud-service-account-to-enable-deployment-of-the-nodejs-application-to-google-cloud-run-via-the-cd-pipeline"><strong>Create a Google Cloud Service Account to Enable Deployment of the Node.js Application to Google Cloud Run via the CD Pipeline</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-create-the-staging-branch-and-merge-the-feature-branch-into-it-continuous-integration-and-continuous-delivery"><strong>Create the Staging Branch and Merge the Feature Branch into It (Continuous Integration and Continuous Delivery)</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-merge-the-staging-branch-into-the-main-branch-continuous-integration-and-continuous-deployment"><strong>Merge the Staging Branch into the Main Branch (Continuous Integration and Continuous Deployment)</strong></a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion"><strong>Conclusion</strong></a></p>
</li>
</ol>
<h2 id="heading-what-is-continuous-integration-deployment-and-delivery"><strong>What is Continuous Integration, Deployment, and Delivery?</strong> 🤔</h2>
<h3 id="heading-continuous-integration-ci"><strong>Continuous Integration (CI)</strong></h3>
<p>Imagine you’re part of a team of six developers, all working on the same project. Without a proper system, chaos would ensue.</p>
<p>Let’s say Mr. A is building a new login feature, Mrs. B is fixing a bug in the search bar, and Mr. C is tweaking the dashboard UI—all at the same time. If everyone is editing the same "folder" or codebase directly, things could go horribly wrong: <em>"Hey! Who just broke the app?!"</em> 😱</p>
<p>To keep everything in order, teams use <strong>Version Control Systems (VCS)</strong> like GitHub, GitLab, or Bitbucket. Think of it as a digital workspace where everyone can safely collaborate without stepping on each other’s toes. 🗂️✨</p>
<p>Here’s how Continuous Integration fits into this process step-by-step:</p>
<h4 id="heading-1-the-main-branch-the-general-folder">1. <strong>The Main Branch: The General Folder</strong> ✨</h4>
<p>At the heart of every project is the <strong>main branch</strong>—the ultimate source of truth. It contains the stable codebase that powers your live app. It’s where every team member contributes their work, but with one important rule: only tested and approved code gets merged here. 🚀</p>
<h4 id="heading-2-feature-branches-personal-workspaces">2. <strong>Feature Branches: Personal Workspaces</strong> 🔨</h4>
<p>When someone like Mr. A wants to work on a new feature, they create a <strong>feature branch</strong>. This branch is essentially a personal copy of the main branch where they can tinker, write code, and test without affecting others. Mrs. B and Mr. C are also working on their own branches. Everyone’s experiments stay neatly organized. 🧪💡</p>
<h4 id="heading-3-merging-changes-the-ci-workflow">3. <strong>Merging Changes: The CI Workflow</strong> 🎉</h4>
<p>When Mr. A is satisfied with his feature, he doesn’t just shove it into the main branch—CI ensures it’s done safely:</p>
<ul>
<li><p><strong>Automated Tests</strong>: Before merging, CI tools automatically run tests on Mr. A’s code to check for bugs or errors. Think of it as a bouncer guarding the main branch, ensuring no bad code gets in. 🕵️‍♂️</p>
</li>
<li><p><strong>Build Verification</strong>: The feature branch code is also "built" (converted into a deployable version of the app) to confirm it works as intended.</p>
</li>
</ul>
<p>Once these checks pass, Mr. A’s feature branch is merged into the main branch. This frequent merging of changes is what we call <strong>Continuous Integration</strong>.</p>
<h3 id="heading-continuous-delivery-cd">Continuous Delivery (CD)</h3>
<p>Continuous Delivery (CD) often gets mixed up with Continuous Deployment, and while they share similarities, they serve distinct purposes in the development lifecycle. Let’s break it down! 🧐</p>
<h4 id="heading-the-need-for-a-staging-area">The Need for a <code>Staging</code> Area 🌉</h4>
<p>In the Continuous Integration (CI) process we discussed above, we primarily dealt with <strong>feature branches</strong> and the <strong>main branch</strong>. But directly merging changes from feature branches into the main branch (which powers the live product) can be risky. Why? 🛑</p>
<p>While automated tests and builds catch many errors, they’re not foolproof. Some edge cases or bugs might slip through unnoticed. This is where the <strong>staging branch</strong> and <strong>staging environment</strong> come into play! 🎭</p>
<p>Think of the staging branch as a “trial run.” Before unleashing changes to real customers, the codebase from feature branches is merged into the staging branch and deployed to a <strong>staging environment</strong>. This environment is an exact replica of the production environment, but it’s used exclusively by the <strong>Quality Assurance (QA) team</strong> for testing.</p>
<p>The QA team takes the role of a “test driver,” running the platform through its paces just as a real user would. They check for usability issues, edge cases, or bugs that automated tests might miss, and provide feedback to developers for fixes. 🚦 If everything passes, the codebase is cleared for deployment to production.</p>
<h4 id="heading-continuous-delivery-in-action">Continuous Delivery in Action 📦</h4>
<p>The process of merging changes into the staging branch and deploying them to the <strong>staging environment</strong> is what we call <strong>Continuous Delivery</strong>. 🛠️ It ensures that the application is always in a deployable state, ready for the next step in the pipeline.</p>
<p>Unlike Continuous Deployment (which we’ll discuss later), Continuous Delivery doesn’t automatically push changes to production (live platform). Instead, it pauses to let humans—namely the QA team or stakeholders—decide when to proceed. This adds an extra layer of quality assurance, reducing the chances of errors making it to the live product. 🕵️‍♂️</p>
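<p>As an aside, GitHub Actions can model this human pause with a protected <em>environment</em>. Here is a sketch, assuming an environment named <code>staging</code> has been configured with required reviewers under the repository’s Settings → Environments (the job and environment names are illustrative):</p>
<pre><code class="lang-yaml">jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    # If the "staging" environment requires reviewers, the job
    # waits here until a human approves the deployment.
    environment: staging
    steps:
      - name: Deploy to staging
        run: echo "Deployment commands go here"
</code></pre>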
<h3 id="heading-continuous-deployment-cd">Continuous Deployment (CD)</h3>
<p>Continuous Deployment (CD) takes automation to its peak. While it shares similarities with Continuous Delivery, the key difference lies in the <strong>final step</strong>: there’s no manual approval required. In Continuous Delivery, a human (the QA testers or the team lead, for example) signs off on the final step of merging the codebase and deploying it live for end users; in Continuous Deployment, that step happens automatically.</p>
<p>Let’s explore what makes Continuous Deployment so powerful (and a little scary)! 😅</p>
<h4 id="heading-the-last-mile-of-the-cicd-pipeline">The Last Mile of the CI/CD Pipeline 🛣️</h4>
<p>Imagine you’ve gone through the rigorous process of Continuous Integration: your teammates have merged their feature branches, automated tests have passed, and the codebase has been deployed to the staging environment during Continuous Delivery.</p>
<p>Now, you’re confident that the application is free of bugs and ready to shine in the production environment—the live version of your platform used by real customers.</p>
<p>In <strong>Continuous Deployment</strong>, this final step of deploying changes to the live environment happens <strong>automatically</strong>. The pipeline triggers whenever specific events occur, such as:</p>
<ul>
<li><p>A <strong>Pull Request (PR)</strong> is merged into the <strong>main branch</strong>.</p>
</li>
<li><p>A new <strong>release version</strong> is created.</p>
</li>
<li><p>A <strong>commit</strong> is pushed directly to the production branch (though this is rare for most teams).</p>
</li>
</ul>
<p>Once triggered, the pipeline springs into action, building, testing, and finally deploying the updated codebase to the production environment. 📡</p>
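<p>Expressed as a workflow trigger, the events above map onto an <code>on</code> block roughly like this (a sketch only; the exact triggers your team uses will vary):</p>
<pre><code class="lang-yaml">on:
  push:
    branches:
      - main            # fires when a PR is merged into main (or a direct push)
  release:
    types: [published]  # fires when a new release version is created
</code></pre>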
<h2 id="heading-differences-between-continuous-integration-continuous-delivery-and-continuous-deployment"><strong>Differences Between Continuous Integration, Continuous Delivery, and Continuous Deployment</strong> 🔍</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td>Continuous Integration (CI)</td><td>Continuous Delivery (CD)</td><td>Continuous Deployment (CD)</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Primary Focus</strong></td><td>Merging feature branches into the main/general codebase or into the staging codebase.</td><td>Deploying the tested code to a staging environment for QA testing and approval.</td><td>Automatically deploying the code to the live production environment.</td></tr>
<tr>
<td><strong>Automation Level</strong></td><td>Automates testing and building processes for feature branches.</td><td>Automates deployment to staging/test environments after successful testing.</td><td>Fully automates the deployment to production with no manual approval.</td></tr>
<tr>
<td><strong>Testing Scope</strong></td><td>Automated tests run on feature branches to ensure code quality before merging into the main or staging branch.</td><td>Includes automated tests before deployment to staging and allows QA testers to perform manual testing in a controlled environment.</td><td>May include automated tests as a final check, ensuring the production environment is stable before deployment.</td></tr>
<tr>
<td><strong>Branch Involved</strong></td><td>Feature branches merging into the main/general or staging branch.</td><td>Staging branch used as an intermediate step before merging into the main branch.</td><td>Main/general branch deployed directly to production.</td></tr>
<tr>
<td><strong>Environment Target</strong></td><td>Ensures integration and testing within a local environment or build pipeline.</td><td>Deploys to staging/test environments where QA testers validate features.</td><td>Deploys to production/live environment accessed by end users.</td></tr>
<tr>
<td><strong>Key Goal</strong></td><td>Prevent integration conflicts and ensure new changes don’t break the existing codebase.</td><td>Provide a stable, near-production environment for thorough QA testing before final deployment.</td><td>Ensure that new features and updates reach users as soon as possible with minimal delays.</td></tr>
<tr>
<td><strong>Approval Process</strong></td><td>No approval needed. Feature branches are tested and merged upon passing criteria.</td><td>QA team or lead provides feedback/approval before changes are merged into the main branch for production.</td><td>No manual approval. Deployment is entirely automated.</td></tr>
<tr>
<td><strong>Example Trigger</strong></td><td>A developer merges a feature branch into the main branch.</td><td>The staging branch passes automated tests (during PR) and is ready for deployment to the testing environment.</td><td>A new release is created or a pull request is merged into the main branch, triggering an automatic production deployment.</td></tr>
</tbody>
</table>
</div><p>Now that we’ve untangled the mysteries of Continuous Integration, Continuous Delivery, and Continuous Deployment, it’s time to roll up our sleeves and put theory into practice 😁.</p>
<h2 id="heading-how-to-set-up-a-nodejs-project-with-a-web-server-and-automated-tests"><strong>How to Set Up a Node.js Project with a Web Server and Automated Tests</strong> ✨</h2>
<p>In this hands-on section, we’ll build a Node.js web server with automated tests using Jest. From there, we’ll create a CI/CD pipeline with GitHub Actions that automates testing for every <strong>pull request to the staging and main branches</strong>. Finally, we’ll publish an image of our application to Docker Hub and deploy the image to <strong>Google Cloud Run</strong>, first to a staging environment for testing and later to the production environment for live use.</p>
<p>Ready to bring your project to life? Let’s get started! 🚀✨</p>
<h3 id="heading-step-1-install-nodejs">Step 1: Install Node.js 📥</h3>
<p>To get started, you’ll need to have <strong>Node.js</strong> installed on your machine. Node.js provides the JavaScript runtime we’ll use to create our web server.</p>
<ol>
<li><p>Visit <a target="_blank" href="https://nodejs.org/en/download/package-manager">https://nodejs.org/en/download/package-manager</a></p>
</li>
<li><p>Choose your operating system (Windows, macOS, or Linux) and download the installer.</p>
</li>
<li><p>Follow the installation instructions to complete the setup.</p>
</li>
</ol>
<p>To verify that Node.js was installed successfully, open your terminal and run <code>node -v</code>. This should display the installed version of Node.js.</p>
<h3 id="heading-step-2-clone-the-starter-repository">Step 2: Clone the Starter Repository 📂</h3>
<p>The next step is to grab the starter code from GitHub. If you don’t have Git installed, you can download it at <a target="_blank" href="https://git-scm.com/downloads">https://git-scm.com/downloads</a>. Choose your OS and follow the instructions to install Git. Once you’re set, it’s time to clone the repository.</p>
<p>Run the following command in your terminal to clone the boilerplate code:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> --single-branch --branch initial https://github.com/onukwilip/ci-cd-tutorial
</code></pre>
<p>This will download the project files from the <code>initial</code> branch, which contains the starter template for our Node.js web server.</p>
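<p>For orientation, the starter server is conceptually along these lines. This is an illustrative sketch, not the repository’s actual code: the route, message, and function names are made up, and the request handling is factored into a plain function so automated tests (with Jest, for example) can exercise it without a live server:</p>
<pre><code class="lang-javascript">const http = require("http");

// A plain function holding the routing logic, so tests can call it directly.
function handleRequest(url) {
  if (url === "/") return { status: 200, body: "Hello from the CI/CD tutorial!" };
  return { status: 404, body: "Not found" };
}

const server = http.createServer((req, res) => {
  const { status, body } = handleRequest(req.url);
  res.writeHead(status, { "Content-Type": "text/plain" });
  res.end(body);
});

// The tutorial app listens on PORT (5000 by default). unref() lets
// short-lived scripts and tests exit without an explicit server.close().
server.listen(process.env.PORT || 5000).unref();

module.exports = { handleRequest, server };
</code></pre>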
<p>Navigate into the project directory:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> ci-cd-tutorial
</code></pre>
<h3 id="heading-step-3-install-dependencies">Step 3: Install Dependencies 📦</h3>
<p>Once you’re in the project directory, install the required dependencies for the Node.js project. These are the packages that power the application:</p>
<pre><code class="lang-bash">npm install --force
</code></pre>
<p>This will download and set up all the libraries specified in the project. Alright, dependencies installed? You’re one step closer!</p>
<h3 id="heading-step-4-run-automated-tests">Step 4: Run Automated Tests ✅</h3>
<p>Before diving into the code, let’s confirm that the automated tests are functioning correctly. Run:</p>
<pre><code class="lang-bash">npm <span class="hljs-built_in">test</span>
</code></pre>
<p>You should see two successful test results in your terminal. This indicates that the starter project is correctly configured with working automated tests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733074280408/93b4ea86-1dfa-42eb-a163-b97c19c2a053.png" alt="Successful test run" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-5-start-the-web-server">Step 5: Start the Web Server 🌐</h3>
<p>Finally, let’s start the web server and see it in action. Run the following command:</p>
<pre><code class="lang-bash">npm start
</code></pre>
<p>Wait for the application to start running. Open your browser and visit <a target="_blank" href="http://localhost:5000/">http://localhost:5000</a>. 🎉 You should see the starter web server up and running, ready for your CI/CD magic:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733074667521/7b80bb21-1f43-430e-8a56-2bff8b81ddad.png" alt="Successful project run" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-create-a-github-repository-to-host-your-codebase"><strong>How to Create a GitHub Repository to Host Your Codebase 📂</strong></h2>
<h3 id="heading-step-1-sign-in-to-github">Step 1: Sign In to GitHub</h3>
<ol>
<li><p><strong>Go to GitHub</strong>: Open your browser and visit GitHub - <a target="_blank" href="https://github.com/">https://github.com</a>.</p>
</li>
<li><p><strong>Sign In</strong>: Click on the <strong>Sign In</strong> button in the top-right corner and enter your username and password to log in, OR create an account if you don’t have one by clicking the <strong>Sign up</strong> button.</p>
</li>
</ol>
<h3 id="heading-step-2-create-a-new-repository">Step 2: Create a New Repository</h3>
<p>Once you're signed in, on the main GitHub page, you’ll see a "+" sign in the top-right corner next to your profile picture. Click on it, and select <strong>“New repository”</strong> from the dropdown.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733130465203/dac28dee-74da-4fd4-8a96-bc90aef01207.png" alt="New GitHub repository" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now it’s time to set the repository details. You’ll include:</p>
<ul>
<li><p><strong>Repository Name</strong>: Choose a name for your repository. For example, you can call it <code>ci-cd-tutorial</code>.</p>
</li>
<li><p><strong>Description</strong> (Optional): You can add a short description, like “A tutorial project for CI/CD with Docker and GitHub Actions.”</p>
</li>
<li><p><strong>Visibility</strong>: Choose whether you want your repository to be <strong>public</strong> (accessible by anyone) or <strong>private</strong> (only accessible by you and those you invite). For the sake of this tutorial, make it <strong>public</strong>.</p>
</li>
<li><p><strong>Do Not Check the Add a README File Box</strong>: <strong>Important</strong>: Make sure you <strong>do not check</strong> the option to <strong>Add a README file</strong>. Checking it would automatically create a <code>README.md</code> file in your repository, which could cause conflicts later when you push your local files. We'll add a README manually later if needed.</p>
</li>
</ul>
<p>After filling out the details, click on <strong>“Create repository”</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733130890582/04e09ac8-0ee6-4d26-a9f2-007c0e6ca08f.png" alt="Create GitHub repository" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-3-change-the-remote-destination-and-push-to-your-new-repository">Step 3: Change the Remote Destination and Push to Your New Repository</h3>
<h4 id="heading-update-the-remote-repository-url"><strong>Update the Remote Repository URL</strong>:</h4>
<p>Since you've already cloned the codebase from my repository, you need to update the remote destination to point to your newly created GitHub repository.</p>
<p>Copy your repository URL (the URL of the page you were redirected to after creating the repository). It should look similar to this: <code>https://github.com/&lt;username&gt;/&lt;repo-name&gt;</code>.</p>
<p>Open your terminal in the project directory and run the following commands:</p>
<pre><code class="lang-bash">git remote set-url origin &lt;your-repo-url&gt;
</code></pre>
<p>Replace <code>&lt;your-repo-url&gt;</code> with your GitHub repository URL which you copied earlier.</p>
<h4 id="heading-rename-the-current-branch-to-main"><strong>Rename the Current Branch to</strong> <code>main</code>:</h4>
<p>If your branch is named something other than <code>main</code>, you can rename it to <code>main</code> using:</p>
<pre><code class="lang-bash">git branch -M main
</code></pre>
<h4 id="heading-push-to-your-new-repository"><strong>Push to Your New Repository</strong>:</h4>
<p>Finally, commit any changes you’ve made and push your local repository to the new remote GitHub repository by running:</p>
<pre><code class="lang-bash">git add .
git commit -m <span class="hljs-string">'Created boilerplate'</span>
git push -u origin main
</code></pre>
<p>Now your local codebase is linked to your new GitHub repository, and the files are successfully pushed there. You can verify by visiting your repository on GitHub.</p>
<h2 id="heading-how-to-set-up-the-ci-and-cd-workflows-within-your-project">How to Set Up the CI and CD Workflows Within Your Project ⚙️</h2>
<p>Now it’s time to create the <strong>CI and CD workflows</strong> for our project! These workflows won’t run on your local PC but will be automatically triggered and executed in the cloud once you push your changes to the remote repository. GitHub Actions will detect these workflows and run them based on the triggers you define.</p>
<h3 id="heading-step-1-prepare-the-workflow-directory">Step 1: Prepare the Workflow Directory 📂</h3>
<p>Before adding the CI/CD pipelines, it's a good practice to first create a feature branch. This step mirrors the workflow commonly used in teams, where new features or changes are made in separate branches before they are merged into the main codebase.</p>
<p>To create and switch to a new branch, run the following command:</p>
<pre><code class="lang-bash">git checkout -b feature/ci-cd-pipeline
</code></pre>
<p>This will create a new branch called <code>feature/ci-cd-pipeline</code> and switch to it. Now, you can safely add and test the CI/CD workflows without affecting the main branch.</p>
<p>Once you finish, you’ll be able to merge this feature branch back into <code>main</code> or <code>staging</code> as part of the pull request process.</p>
<p>In the project’s root directory, create a folder named <code>.github</code>. Inside <code>.github</code>, create another folder called <code>workflows</code>.</p>
<p>Any YAML file placed in the <code>.github/workflows</code> directory is automatically recognized as a GitHub Actions workflow. These workflows will execute based on specific triggers, such as pull requests, pushes, or releases.</p>
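<p>On macOS, Linux, or Git Bash on Windows, both folders can be created in one command from the project root:</p>
<pre><code class="lang-bash">mkdir -p .github/workflows
</code></pre>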
<h3 id="heading-step-2-create-the-continuous-integration-workflow">Step 2: Create the Continuous Integration Workflow 🚀</h3>
<p>We’ll now create a CI workflow that automatically tests the application whenever a pull request is made to the <code>main</code> or <code>staging</code> branches.</p>
<p>First, inside the <code>workflows</code> directory, create a file named <code>ci-pipeline.yml</code>.</p>
<p>Paste the following code into the file:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">CI</span> <span class="hljs-string">Pipeline</span> <span class="hljs-string">to</span> <span class="hljs-string">staging/production</span> <span class="hljs-string">environment</span>
<span class="hljs-attr">on:</span>
  <span class="hljs-attr">pull_request:</span>
    <span class="hljs-attr">branches:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">staging</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>
<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">test:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">Setup,</span> <span class="hljs-string">test,</span> <span class="hljs-string">and</span> <span class="hljs-string">build</span> <span class="hljs-string">project</span>
    <span class="hljs-attr">env:</span>
      <span class="hljs-attr">PORT:</span> <span class="hljs-number">5001</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v3</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Install</span> <span class="hljs-string">dependencies</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">ci</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Test</span> <span class="hljs-string">application</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">test</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> <span class="hljs-string">application</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          echo "Run command to build the application if present"
          npm run build --if-present</span>
</code></pre>
<h4 id="heading-explanation-of-the-ci-workflow">Explanation of the CI Workflow</h4>
<p>Here’s a breakdown of each section in the workflow:</p>
<ol>
<li><p><code>name: CI Pipeline to staging/production environment</code>: This is the title of your workflow. It helps you identify this pipeline in GitHub Actions.</p>
</li>
<li><p><code>on</code>: The <code>on</code> parameter determines the events that trigger your workflow. When the workflow YAML file is pushed to the remote GitHub repository, GitHub Actions automatically registers the workflow using the configured triggers in the <code>on</code> field. These triggers act as event listeners that tell GitHub when to execute the workflow.</p>
<p> <strong>For example:</strong></p>
<p> If we set <code>pull_request</code> as the value for the <code>on</code> parameter and specify the branches we want to monitor using the <code>branches</code> key, GitHub sets up event listeners for pull requests to those branches.</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">on:</span>
   <span class="hljs-attr">pull_request:</span>
     <span class="hljs-attr">branches:</span>
       <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>
       <span class="hljs-bullet">-</span> <span class="hljs-string">staging</span>
</code></pre>
<p> This configuration means that GitHub will trigger the workflow whenever a pull request is made to the <code>main</code> or <code>staging</code> branches.</p>
<p> <strong>Multiple Triggers</strong>:<br> You can define multiple event listeners in the <code>on</code> parameter. For instance, in addition to pull requests, you can add a listener for push events.</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">on:</span>
   <span class="hljs-attr">pull_request:</span>
     <span class="hljs-attr">branches:</span>
       <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>
       <span class="hljs-bullet">-</span> <span class="hljs-string">staging</span>
   <span class="hljs-attr">push:</span>
     <span class="hljs-attr">branches:</span>
       <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>
</code></pre>
<p> This configuration ensures that the workflow is triggered when:</p>
<ul>
<li><p>A pull request is made to either the <code>main</code> or <code>staging</code> branch.</p>
</li>
<li><p>A push is made directly to the <code>main</code> branch.</p>
</li>
</ul>
</li>
</ol>
<p>    📘 <strong>Learn more about triggers:</strong> Check out the <a target="_blank" href="https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows">official GitHub documentation here</a>.</p>
<ol start="3">
<li><p><code>jobs</code>: The <code>jobs</code> section outlines the specific tasks (or jobs) that the workflow will execute. Each job is an independent unit of work that runs on a separate virtual machine (VM). This isolation ensures a clean, unique environment for every job, avoiding potential conflicts between tasks.</p>
<p> <strong>Key Points About Jobs:</strong></p>
<ol>
<li><p><strong>Clean VM for Each Job</strong>: When GitHub Actions runs a workflow, it assigns a dedicated VM instance to each job. This means the environment is reset for every job, ensuring there’s no overlap or interference between tasks.</p>
</li>
<li><p><strong>Multiple Jobs</strong>: Workflows can have multiple jobs, each responsible for a specific task. For example:</p>
<ul>
<li><p>A <strong>Test</strong> job to install dependencies and run automated tests.</p>
</li>
<li><p>A <strong>Build</strong> job to compile the application.</p>
</li>
</ul>
</li>
<li><p><strong>Job Organization</strong>: Jobs can be organized to run:</p>
<ul>
<li><p><strong>Sequentially</strong>: Ensures one job completes before the next starts; for example, the Test job must finish before the Build job begins. This sequential flow mimics the "pipeline" structure.</p>
</li>
<li><p><strong>Simultaneously</strong>: Multiple jobs can run in parallel to save time, especially if the jobs are independent of one another.</p>
</li>
</ul>
</li>
<li><p><strong>Single Job in This Workflow</strong>: In our current workflow, there is only one job, <code>test</code>, which:</p>
<ul>
<li><p>Installs dependencies.</p>
</li>
<li><p>Runs automated tests.</p>
</li>
<li><p>Builds the application.</p>
</li>
</ul>
</li>
</ol>
</li>
</ol>
<p>    📘 <strong>Learn more about jobs:</strong> Dive into the <a target="_blank" href="https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/using-jobs-in-a-workflow">GitHub Actions jobs documentation here</a>.</p>
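<p>As a side note, sequential ordering between jobs is expressed with the <code>needs</code> keyword. A sketch with hypothetical job names:</p>
<pre><code class="lang-yaml">jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Run the automated tests"
  build:
    runs-on: ubuntu-latest
    needs: test   # build starts only after the test job succeeds
    steps:
      - run: echo "Build the application"
</code></pre>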
<ol start="4">
<li><p><code>runs-on: ubuntu-latest</code>: Specifies the operating system the job will run on. GitHub provides pre-configured virtual environments, and we’re using the latest Ubuntu image.</p>
</li>
<li><p><code>env</code>: Sets environment variables for the job. Here, we define the <strong>PORT</strong> variable used by our application.</p>
</li>
<li><p><strong>Steps</strong>: Steps define the individual actions to execute within a job:</p>
<ul>
<li><p><code>Checkout</code>: Uses the <code>actions/checkout</code> action to clone the feature branch of the repository into the virtual machine's environment. This step ensures the pipeline has access to the project files.</p>
</li>
<li><p><code>Install dependencies</code>: Runs <code>npm ci</code> to install the required Node.js packages.</p>
</li>
<li><p><code>Test application</code>: Runs the automated tests using the <code>npm test</code> command. This validates the codebase for errors or failing test cases.</p>
</li>
<li><p><code>Build application</code>: Builds the application if a build script is defined in the <code>package.json</code>. The <code>--if-present</code> flag ensures this step doesn’t fail if no build script is present.</p>
</li>
</ul>
</li>
</ol>
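<p>As a sketch, the last two steps above map to lines like these in the workflow file (step names are illustrative):</p>
<pre><code class="lang-yaml">      - name: Test application
        run: npm test

      - name: Build application
        # --if-present skips this step gracefully when package.json
        # does not define a "build" script
        run: npm run build --if-present
</code></pre>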
<p>Now that we’ve completed the CI pipeline, which runs on pull requests to the <code>main</code> or <code>staging</code> branches, let’s move on to setting up the <strong>Continuous Delivery (CD)</strong> and <strong>Continuous Deployment</strong> pipelines. 🚀</p>
<h3 id="heading-step-3-the-continuous-delivery-and-deployment-workflow">Step 3: The Continuous Delivery and Deployment Workflow</h3>
<p><strong>First, create the Pipeline File</strong>:<br>In the <code>.github/workflows</code> folder, create a new file called <code>cd-pipeline.yml</code>. This file will define the workflows for automating delivery and deployment.</p>
<p><strong>Next, paste the configuration</strong>:<br>Copy and paste the following configuration into the <code>cd-pipeline.yml</code> file:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">CD</span> <span class="hljs-string">Pipeline</span> <span class="hljs-string">to</span> <span class="hljs-string">Google</span> <span class="hljs-string">Cloud</span> <span class="hljs-string">Run</span> <span class="hljs-string">(staging</span> <span class="hljs-string">and</span> <span class="hljs-string">production)</span>
<span class="hljs-attr">on:</span>
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">staging</span>
  <span class="hljs-attr">workflow_dispatch:</span> {}
  <span class="hljs-attr">release:</span>
    <span class="hljs-attr">types:</span> <span class="hljs-string">published</span>

<span class="hljs-attr">env:</span>
  <span class="hljs-attr">PORT:</span> <span class="hljs-number">5001</span>
  <span class="hljs-attr">IMAGE:</span> <span class="hljs-string">${{vars.IMAGE}}:${{github.sha}}</span>
<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">test:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">Setup,</span> <span class="hljs-string">test,</span> <span class="hljs-string">and</span> <span class="hljs-string">build</span> <span class="hljs-string">project</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v3</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Install</span> <span class="hljs-string">dependencies</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">ci</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Test</span> <span class="hljs-string">application</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">test</span>
  <span class="hljs-attr">build:</span>
    <span class="hljs-attr">needs:</span> <span class="hljs-string">test</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">Setup</span> <span class="hljs-string">project,</span> <span class="hljs-string">Authorize</span> <span class="hljs-string">GitHub</span> <span class="hljs-string">Actions</span> <span class="hljs-string">to</span> <span class="hljs-string">GCP</span> <span class="hljs-string">and</span> <span class="hljs-string">Docker</span> <span class="hljs-string">Hub,</span> <span class="hljs-string">and</span> <span class="hljs-string">deploy</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v3</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Authenticate</span> <span class="hljs-string">for</span> <span class="hljs-string">GCP</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">gcp-auth</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">google-github-actions/auth@v0</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">credentials_json:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.GCP_SERVICE_ACCOUNT</span> <span class="hljs-string">}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Set</span> <span class="hljs-string">up</span> <span class="hljs-string">Cloud</span> <span class="hljs-string">SDK</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">google-github-actions/setup-gcloud@v0</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Authenticate</span> <span class="hljs-string">for</span> <span class="hljs-string">Docker</span> <span class="hljs-string">Hub</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">docker-auth</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">D_USER:</span> <span class="hljs-string">${{secrets.DOCKER_USER}}</span>
          <span class="hljs-attr">D_PASS:</span> <span class="hljs-string">${{secrets.DOCKER_PASSWORD}}</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          docker login -u $D_USER -p $D_PASS
</span>      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> <span class="hljs-string">and</span> <span class="hljs-string">tag</span> <span class="hljs-string">Image</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          docker build -t ${{env.IMAGE}} .
</span>      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Push</span> <span class="hljs-string">the</span> <span class="hljs-string">image</span> <span class="hljs-string">to</span> <span class="hljs-string">Docker</span> <span class="hljs-string">hub</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          docker push ${{env.IMAGE}}
</span>      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Enable</span> <span class="hljs-string">the</span> <span class="hljs-string">Billing</span> <span class="hljs-string">API</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          gcloud services enable cloudbilling.googleapis.com --project=${{secrets.GCP_PROJECT_ID}}
</span>      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">to</span> <span class="hljs-string">GCP</span> <span class="hljs-string">Run</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Production</span> <span class="hljs-string">environment</span> <span class="hljs-string">(If</span> <span class="hljs-string">a</span> <span class="hljs-string">new</span> <span class="hljs-string">release</span> <span class="hljs-string">was</span> <span class="hljs-string">published</span> <span class="hljs-string">from</span> <span class="hljs-string">the</span> <span class="hljs-string">master</span> <span class="hljs-string">branch)</span>
        <span class="hljs-attr">if:</span> <span class="hljs-string">github.event_name</span> <span class="hljs-string">==</span> <span class="hljs-string">'release'</span> <span class="hljs-string">&amp;&amp;</span> <span class="hljs-string">github.event.action</span> <span class="hljs-string">==</span> <span class="hljs-string">'published'</span> <span class="hljs-string">&amp;&amp;</span> <span class="hljs-string">github.event.release.target_commitish</span> <span class="hljs-string">==</span> <span class="hljs-string">'main'</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          gcloud run deploy ${{vars.GCR_PROJECT_NAME}} \
          --region ${{vars.GCR_REGION}} \
          --image ${{env.IMAGE}} \
          --platform "managed" \
          --allow-unauthenticated \
          --tag production
</span>      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">to</span> <span class="hljs-string">GCP</span> <span class="hljs-string">Run</span> <span class="hljs-bullet">-</span> <span class="hljs-string">Staging</span> <span class="hljs-string">environment</span>
        <span class="hljs-attr">if:</span> <span class="hljs-string">github.ref</span> <span class="hljs-type">!=</span> <span class="hljs-string">'refs/heads/main'</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          echo "Deploying to staging environment"
          # Deploy the service to the staging environment
          gcloud run deploy ${{vars.GCR_STAGING_PROJECT_NAME}} \
          --region ${{vars.GCR_REGION}} \
          --image ${{env.IMAGE}} \
          --platform "managed" \
          --allow-unauthenticated \
          --tag staging</span>
</code></pre>
<p>The <strong>CD pipeline</strong> configuration combines Continuous Delivery and Continuous Deployment workflows into a single file for simplicity. It builds on the concepts of CI/CD we discussed earlier, automating testing, building, and deploying the application to Google Cloud Run.</p>
<h4 id="heading-explanation-of-the-cd-pipeline">Explanation of the CD pipeline:</h4>
<ol>
<li><h4 id="heading-workflow-triggers-on">Workflow Triggers (<code>on</code>)</h4>
</li>
</ol>
<ul>
<li><p><code>push</code>: Workflow triggers on pushes to the <code>staging</code> branch.</p>
</li>
<li><p><code>workflow_dispatch</code>: Enables manual execution of the workflow via the GitHub Actions interface.</p>
</li>
<li><p><code>release</code>: Triggers when a new release is published.<br>  Example: When a release is published from the <code>main</code> branch, the app deploys to the production environment.</p>
</li>
</ul>
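<p>You can publish a release from the GitHub UI (Releases → “Draft a new release”), or, if you have the GitHub CLI installed, from the terminal. The tag name below is illustrative:</p>
<pre><code class="lang-bash"># Publishes release v1.0.0 targeting the main branch,
# which fires the release: published trigger in the workflow
gh release create v1.0.0 --target main --title "v1.0.0" --notes "First production release"
</code></pre>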
<ol start="2">
<li><p><strong>Job 1 – Testing the Codebase:</strong> The first job in the pipeline, Test, ensures the codebase is functional and error-free before proceeding with delivery or deployment.</p>
</li>
<li><p><strong>Job 2 – Building and Deploying the Application:</strong> Aha! Moment ✨: These jobs run sequentially. 😃 The <strong>Build</strong> job begins only after the <strong>Test</strong> job is completed successfully. It prepares the application for deployment and manages the actual deployment process.</p>
<p> Here's what happens:</p>
<ul>
<li><p><strong>Authorization for GCP and Docker Hub</strong>: The workflow authenticates with both Google Cloud Platform (GCP) and Docker Hub. For GCP, it uses the <code>google-github-actions/auth@v0</code> action to handle service account credentials stored as secrets. Similarly, it logs into Docker Hub with stored credentials to enable image uploads.</p>
</li>
<li><p><strong>Build and Push Docker Image</strong>: The application is built into a Docker image and tagged with a unique identifier (<code>${{env.IMAGE}}</code>). This image is then pushed to Docker Hub, making it accessible for deployment.</p>
</li>
<li><p><strong>Deploy to Google Cloud Run</strong>: Based on the event that triggered the workflow, the application is <strong>deployed to either the staging or production environment</strong> in Google Cloud Run. A <strong>push</strong> to the <code>staging</code> branch deploys to the staging environment (Continuous Delivery), while a <strong>release</strong> from the <code>main</code> branch deploys to production (Continuous Deployment).</p>
</li>
</ul>
</li>
</ol>
<p>To ensure the security and flexibility of our pipeline, we rely on external variables and secrets rather than hardcoding sensitive information directly into the workflow file.</p>
<p>Why? Workflow configuration files are part of your repository and accessible to anyone with access to the codebase. If sensitive data, like API keys or passwords, is exposed here, it can be easily compromised. 😨</p>
<p>Instead, we use GitHub’s <strong>Secrets</strong> to securely store and access this information. Secrets allow us to define variables that are encrypted and only accessible by our workflows. For example:</p>
<ul>
<li><p><strong>DockerHub Credentials</strong>: We’ll add a Docker username and access token to the repository’s secrets. These are essential for authenticating with DockerHub to upload the built Docker images.</p>
</li>
<li><p><strong>Google Cloud Service Account Key</strong>: This key will grant the pipeline the necessary permissions to deploy the application on <strong>Google Cloud Run</strong> securely.</p>
</li>
</ul>
<p>We'll set up these variables and secrets incrementally as we proceed, ensuring each step is fully secure and functional. 🎯</p>
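<p>Once stored, secrets and repository variables are referenced in the workflow with the <code>${{ }}</code> expression syntax, as our pipeline already does. A small fragment as a reminder:</p>
<pre><code class="lang-yaml">env:
  D_PASS: ${{ secrets.DOCKER_PASSWORD }}     # encrypted secret, masked in logs
  IMAGE: ${{ vars.IMAGE }}:${{ github.sha }} # plain repository variable
</code></pre>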
<h2 id="heading-set-up-a-docker-hub-repository-for-the-projects-image-and-generate-an-access-token-for-publishing-the-image"><strong>Set Up a Docker Hub Repository for the Project's Image and Generate an Access Token for Publishing the Image</strong> 📦</h2>
<p>Before we dive into the steps, let’s quickly go over what we’re about to do. In this section, you’ll learn how to create a Docker Hub repository, which acts like an online storage space for your application’s container image.</p>
<p>Think of a container image as a snapshot of your application, ready to be deployed anywhere. To ensure smooth and secure access, we’ll also generate a special access token, kind of like a revocable password that our CI/CD pipeline can use to upload your app’s image to Docker Hub. Let’s get started! 🚀</p>
<h3 id="heading-step-1-sign-up-for-docker-hub">Step 1: Sign Up for Docker Hub</h3>
<p>Here are the steps to follow to sign up for Docker Hub:</p>
<ol>
<li><p><strong>Go to the Docker Hub website</strong>: Open your web browser and visit Docker Hub - <a target="_blank" href="https://hub.docker.com/">https://hub.docker.com/</a>.</p>
</li>
<li><p><strong>Create an account</strong>: On the Docker Hub homepage, you’ll see a button labelled <strong>"Sign Up"</strong> in the top-right corner. Click on it.</p>
</li>
<li><p><strong>Fill in your details</strong>: You'll be asked to provide a few details like your username, email address, and password. Choose a strong password that you can remember.</p>
</li>
<li><p><strong>Agree to the terms</strong>: You’ll need to check a box to agree to Docker’s terms of service. After that, click <strong>“Sign Up”</strong> to create your account.</p>
</li>
<li><p><strong>Verify your email</strong>: Docker Hub will send you an email to verify your account. Open that email and click on the verification link to complete your account creation.</p>
</li>
</ol>
<h3 id="heading-step-2-sign-in-to-docker-hub">Step 2: Sign In to Docker Hub</h3>
<p>After verifying your email, go back to Docker Hub, and click on <strong>"Sign In"</strong> at the top right. Then you can use the credentials you just created to log in.</p>
<h3 id="heading-step-3-generate-an-access-token-for-the-cicd-pipeline">Step 3: Generate an Access Token (for the CI/CD pipeline)</h3>
<p>Now that you have an account, you can create an access token. This token will allow your GitHub Actions workflow to securely sign into Docker Hub and upload Docker images.</p>
<p>Once you’re logged into Docker Hub, click on your profile picture (or avatar) in the top right corner. This will open a menu. From the menu, click “Account Settings”.</p>
<p>Then in the left-hand menu of your account settings, scroll to the <strong>"Security"</strong> tab. This section is where you manage your tokens and passwords.</p>
<p>Now you’ll need to create a new access token. In the Security tab, you’ll see a link labelled <strong>“Personal access tokens”</strong> – click on it. Click the button labelled <strong>“Generate new token”</strong>.</p>
<p>You’ll be asked to give your token a description. You can name it something like "GitHub Actions CI/CD" so that you know what it's for.</p>
<p>After giving it a description, click on the <strong>Access permissions</strong> dropdown and select <strong>“Read &amp; Write”</strong> or <strong>“Read, Write, Delete”</strong>. Then click <strong>Generate</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733129374816/c725f041-c0ef-49a0-b8ef-ca62acafc1ee.png" alt="Create Docker access token" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Now, you need to copy the credentials. After clicking the generate button, Docker Hub will create an access token. <strong>Immediately copy this token along with your username</strong> and save it somewhere safe, like in a file (don’t worry, we’ll add it to our GitHub secrets). You won’t be able to see this token again, so make sure you save it!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733133363382/33dbf334-a7ec-4151-8639-5368c3ccaedb.png" alt="Copy Docker username + access token" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-4-add-the-token-to-github-as-a-secret">Step 4: Add the Token to GitHub as a Secret</h3>
<p>To do this, open your GitHub repository where the codebase is hosted. In the GitHub repo, click on the <strong>Settings</strong> tab (located near the top of your repo page).</p>
<p>Then on the left sidebar, scroll down and click on <strong>“Secrets and Variables”</strong>, then choose <strong>“Actions”</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733133003023/75c3bd35-1a5b-46fa-845a-0f4fd8305d53.png" alt="Open GitHub Actions Secrets" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Here are the steps to create and manage your new secret:</p>
<ol>
<li><p><strong>Add a new secret</strong>: Click on the <strong>“New repository secret”</strong> button.</p>
</li>
<li><p><strong>Set up the secret</strong>:</p>
<ul>
<li><p>In the <strong>Name</strong> field, type <code>DOCKER_PASSWORD</code>.</p>
</li>
<li><p>In the <strong>Value</strong> field, paste the access token you copied earlier.</p>
</li>
</ul>
</li>
<li><p><strong>Save the secret</strong>: Finally, click <strong>Add secret</strong> to save your Docker access token securely in GitHub.</p>
</li>
</ol>
<p>Then you’ll repeat the process for your Docker username. Create a new secret called <code>DOCKER_USER</code> and add your Docker username that you copied earlier.</p>
<p>And that’s it! Now your CI/CD pipeline can use this token to securely log in to Docker Hub and upload images automatically when triggered. 🎉</p>
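<p>If you’d like to confirm the token works before wiring it into the pipeline, you can test it locally (assuming you have the Docker CLI installed; replace the placeholders with your own values):</p>
<pre><code class="lang-bash"># Reads the token from stdin so it never lands in your shell history
echo "YOUR_ACCESS_TOKEN" | docker login -u YOUR_DOCKER_USERNAME --password-stdin
</code></pre>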
<h3 id="heading-step-5-creating-the-dockerfile-for-the-project"><strong>Step 5: Creating the Dockerfile for the Project</strong></h3>
<p>Before you can build and publish the Docker image to Docker Hub, you need to create a <code>Dockerfile</code> that contains the necessary instructions to build your application.</p>
<p>Follow the steps below to create the <code>Dockerfile</code> in the root folder of your project:</p>
<ol>
<li><p>Navigate to your project’s root folder.</p>
</li>
<li><p>Create a new file named <code>Dockerfile</code>.</p>
</li>
<li><p>Open the <strong>Dockerfile</strong> in a text editor and paste the following content into it:</p>
</li>
</ol>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> node:<span class="hljs-number">18</span>-slim

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-keyword">COPY</span><span class="bash"> package.json .</span>

<span class="hljs-keyword">RUN</span><span class="bash"> npm install -f</span>

<span class="hljs-keyword">COPY</span><span class="bash"> . .</span>

<span class="hljs-keyword">EXPOSE</span> <span class="hljs-number">5001</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"npm"</span>, <span class="hljs-string">"start"</span>]</span>
</code></pre>
<h4 id="heading-explanation-of-the-dockerfile">Explanation of the Dockerfile:</h4>
<ul>
<li><p><code>FROM node:18-slim</code>: This sets the base image for the Docker container, which is a slim version of the official Node.js image based on version 18.</p>
</li>
<li><p><code>WORKDIR /app</code>: Sets the working directory for the application inside the container to <code>/app</code>.</p>
</li>
<li><p><code>COPY package.json .</code>: Copies the <code>package.json</code> file into the working directory.</p>
</li>
<li><p><code>RUN npm install -f</code>: Installs the project dependencies using <code>npm</code>. The <code>-f</code> (<code>--force</code>) flag tells npm to proceed even when it would normally stop, for example on peer-dependency conflicts.</p>
</li>
<li><p><code>COPY . .</code>: Copies the rest of the project files into the container.</p>
</li>
<li><p><code>EXPOSE 5001</code>: This tells Docker to expose port <code>5001</code>, which is the port our app will run on inside the container.</p>
</li>
<li><p><code>CMD ["npm", "start"]</code>: This sets the default command to start the application when the container is run, using <code>npm start</code>.</p>
</li>
</ul>
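<p>Before handing the build over to the pipeline, you can sanity-check the Dockerfile locally (the image name below is illustrative):</p>
<pre><code class="lang-bash"># Build the image from the Dockerfile in the current directory
docker build -t my-node-app .

# Run it, mapping the container's port 5001 to localhost:5001
docker run -p 5001:5001 my-node-app
</code></pre>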
<h2 id="heading-create-a-google-cloud-account-project-and-billing-account"><strong>Create a Google Cloud Account, Project, and Billing Account</strong> ☁️</h2>
<p>In this section, we’re laying the foundation for deploying our application to Google Cloud. First, we’ll set up a Google Cloud account (don’t worry, it’s free to get started!). Then, we’ll create a new project where all the resources for your app will live.</p>
<p>Finally, we’ll enable billing so you can unlock the cloud services needed for deployment. Think of this as setting up your workspace in the cloud—organized, ready, and secure! Let’s dive in! ☁️</p>
<h3 id="heading-step-1-create-or-sign-in-to-a-google-cloud-account">Step 1: Create or Sign in to a Google Cloud Account 🌐</h3>
<p>First, go to <a target="_blank" href="https://console.cloud.google.com">Google Cloud Console</a>. If you don’t have a Google Cloud account, you’ll need to create one.</p>
<p>To do this, click on <strong>Get Started for Free</strong> and follow the steps to set up your account (you’ll need to provide payment information, but Google offers $300 in free credits to get started). If you already have a Google account, simply sign in using your credentials.</p>
<p>Once you’ve signed in, you’ll be taken to your Google Cloud dashboard. This is where you can manage all your cloud projects and resources.</p>
<h3 id="heading-step-2-create-a-new-google-cloud-project">Step 2: Create a New Google Cloud Project 🏗️</h3>
<p>At the top left of the Google Cloud Console, you’ll see a drop-down menu beside the Google Cloud logo. Click on this drop-down to display your current projects.</p>
<p>Now it’s time to create a new project. In the top-left corner of the pop-up modal, click on the <strong>New Project</strong> button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733134260252/6769909a-cf9c-4c91-9d79-7676500f3981.webp" alt="Create Google Cloud Project" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>You’ll be redirected to a page where you’ll need to provide some basic details for your new project. So now enter the following information:</p>
<ul>
<li><p><strong>Project Name:</strong> Enter a name of your choice for the project (for example, <code>gcr-ci-cd-project</code>).</p>
</li>
<li><p><strong>Location:</strong> Select a location for your project. You can leave it as the default "No organization" if you're just getting started.</p>
</li>
</ul>
<p>Once you've entered the project name, click the <strong>Create</strong> button. Google Cloud will now start creating your new project. It may take a few seconds.</p>
<h3 id="heading-step-3-access-your-new-project">Step 3: Access Your New Project 🛠️</h3>
<p>After a few seconds, you’ll be redirected to your <strong>Google Cloud dashboard</strong>.</p>
<p>Click on the drop-down menu beside the Google Cloud logo again, and you should now see your newly created project listed in the modal where you can select it.</p>
<p>Then click on the project name (for example, <code>gcr-ci-cd-project</code>) to enter your project’s dashboard.</p>
<h3 id="heading-step-4-link-a-billing-account-to-your-project">Step 4: Link A Billing Account To Your Project 💳</h3>
<p>To access the billing page, in the Google Cloud Console, find the <strong>Navigation Menu</strong> (the three horizontal lines) at the top left of the screen. Click on it to open a list of options. Scroll down and click on <strong>Billing</strong>. This will take you to the billing section of your Google Cloud account.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733134747962/745c8a0e-13c5-4dde-849b-303c1200f495.png" alt="Navigate to Google Cloud Billing dashboard/section " class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>If you haven't set up a billing account yet, you'll be prompted to do so. Click on the <strong>"Link a billing account"</strong> button to start the process.</p>
<p>Now you can create a new billing account (if you don’t have one). You’ll be redirected to a page where you can either select an existing billing account or create a new one. If you don't already have a billing account, click on <strong>"Create a billing account"</strong>.</p>
<p>Provide the necessary details, including:</p>
<ul>
<li><p><strong>Account name</strong> (for example, "Personal Billing Account" or your business name).</p>
</li>
<li><p><strong>Country</strong>: Choose the country where your business or account is based.</p>
</li>
<li><p><strong>Currency</strong>: Choose the currency in which you want to be billed.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733135153425/1287ab53-e9c5-45b5-a09d-3d3a13840ca4.png" alt="Create Google Cloud billing account" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
</ul>
<p>Next, enter your payment information (credit card or bank account details). Google Cloud will verify your payment method, so make sure the information is correct.</p>
<p>Read and agree to the Google Cloud Terms of Service and Billing Account Terms. Once you’ve done this, click <strong>"Start billing"</strong> to finish setting up your billing account.</p>
<p>After setting up your billing account, you’ll be taken to a page that asks you to <strong>link</strong> it to your project. Select the billing account you just created or an existing billing account you want to use. Click <strong>Set Account</strong> to link the billing account to your project.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733337276189/b80702dd-2ff6-42db-a325-c2082e8059e5.png" alt="Link Google Cloud billing account to project" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>After you’ve linked your billing account to your project, you should see a confirmation message indicating that billing has been successfully enabled for your project.</p>
<p>You can always verify this by returning to the Billing section in the Google Cloud Console, where you’ll see your billing account listed.</p>
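<p>If you have the gcloud CLI installed, recent SDK versions also let you confirm the link from the terminal (substitute your own project ID):</p>
<pre><code class="lang-bash"># Shows billingEnabled: true when a billing account is linked
gcloud billing projects describe gcr-ci-cd-project
</code></pre>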
<h2 id="heading-create-a-google-cloud-service-account-to-enable-deployment-of-the-nodejs-application-to-google-cloud-run-via-the-cd-pipeline"><strong>Create a Google Cloud Service Account to Enable Deployment of the Node.js Application to Google Cloud Run via the CD Pipeline</strong> 🚀</h2>
<h3 id="heading-why-do-we-need-a-service-account-and-key">Why Do We Need a Service Account and Key? 🤔</h3>
<p>A <strong>service account</strong> allows our CI/CD pipeline to authenticate and interact with Google Cloud services programmatically. By assigning specific roles (permissions), we ensure the service account can only perform tasks related to deployment, such as managing Google Cloud Run.</p>
<p>The <strong>service account key</strong> is a JSON file containing the credentials used for authentication. We securely store this key as a GitHub secret to protect sensitive information.</p>
<h3 id="heading-step-1-open-the-service-accounts-page">Step 1: Open the Service Accounts Page</h3>
<p>Here are the steps you can follow to set up your service account and get your key:</p>
<p>First, visit the Google Cloud Console at <a target="_blank" href="https://console.cloud.google.com/">https://console.cloud.google.com/</a>. Ensure you’ve selected the correct project (for example, <code>gcr-ci-cd-project</code>). To change projects, click the drop-down menu next to the Google Cloud logo at the top-left corner and select your project.</p>
<p>Then navigate to the Navigation Menu (three horizontal lines in the top-left corner) and click on <strong>IAM &amp; Admin &gt; Service Accounts</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733147553088/e3647442-ca8e-4197-ab5f-91cee5a6d6b0.png" alt="Navigate to Google Cloud IAM - Service Account" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-2-create-a-new-service-account">Step 2: Create a New Service Account</h3>
<p>Click on the "Create Service Account" button. This will open a form where you’ll define your service account details.</p>
<p>Next, enter the Service Account details:</p>
<ul>
<li><p><strong>Name</strong>: Enter a descriptive name (for example, <code>ci-cd-sa</code>).</p>
</li>
<li><p><strong>ID</strong>: This will auto-fill based on the name.</p>
</li>
<li><p><strong>Description</strong>: Add a description to help identify its purpose, such as “Used for deploying Node.js app to Cloud Run.”</p>
</li>
<li><p>Click <strong>Create and Continue</strong> to proceed.</p>
</li>
</ul>
<h3 id="heading-step-3-assign-necessary-roles-permissions">Step 3: Assign Necessary Roles (Permissions)</h3>
<p>On the next screen, you’ll assign roles to the service account. Add the following roles one by one:</p>
<ul>
<li><p><strong>Cloud Run Admin</strong>: Allows management of Cloud Run services.</p>
</li>
<li><p><strong>Service Account User</strong>: Grants the ability to use service accounts.</p>
</li>
<li><p><strong>Service Usage Admin</strong>: Enables control over enabling APIs.</p>
</li>
<li><p><strong>Viewer</strong>: Provides read-only access to view resources.</p>
</li>
</ul>
<p>To add a role:</p>
<ul>
<li><p>Click on <strong>"Select a Role"</strong>.</p>
</li>
<li><p>Use the search bar to type the role name (for example, "Cloud Run Admin") and select it.</p>
</li>
<li><p>Repeat for all four roles.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733147870701/393833c9-c320-49e3-8743-dbc0d739b99b.png" alt="Create Google Cloud Service Account - Add role to a service account during creation" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Your screen should look similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733147949148/c509c810-767d-4900-aa44-a737cc1c8dc1.png" alt="Create a Google Cloud service account (SA) - Done assigning all roles to SA" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>After assigning the roles, click <strong>Continue</strong>.</p>
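<p>If you prefer working from the terminal, the service account creation and role assignments above can also be done with the <code>gcloud</code> CLI. The following is a sketch that assumes the CLI is installed and authenticated (<code>gcloud auth login</code>) and uses the example project and account names from this tutorial:</p>
<pre><code class="lang-bash">PROJECT_ID="gcr-ci-cd-project"

# Create the service account (Step 2)
gcloud iam service-accounts create ci-cd-sa \
  --project="$PROJECT_ID" \
  --display-name="ci-cd-sa" \
  --description="Used for deploying Node.js app to Cloud Run"

# Assign the four roles (Step 3)
for role in roles/run.admin roles/iam.serviceAccountUser \
            roles/serviceusage.serviceUsageAdmin roles/viewer; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:ci-cd-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="$role"
done
</code></pre>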
<h3 id="heading-step-4-skip-granting-users-access-to-the-service-account">Step 4: Skip Granting Users Access to the Service Account</h3>
<p>On the next screen, you’ll see an option to grant additional users access to this service account. Click <strong>Done</strong> to complete the creation process.</p>
<h3 id="heading-step-5-generate-a-service-account-key">Step 5: Generate a Service Account Key 🔑</h3>
<p>You should now see your newly created service account in the list. Find the row for your service account (for example, <code>ci-cd-sa</code>) and click the three vertical dots under the “Actions” column. Select <strong>"Manage Keys"</strong> from the drop-down menu.</p>
<p>To add a new key:</p>
<ul>
<li><p>Click on <strong>"Add Key" &gt; "Create New Key"</strong>.</p>
</li>
<li><p>In the pop-up dialog, select <strong>JSON</strong> as the key type.</p>
</li>
<li><p>Click <strong>Create</strong>.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733148120618/c7014982-ae7d-40ed-bbfb-0c8f5c4b8090.png" alt="Create Google Cloud service account key" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
</ul>
<p>Now, download the key file. A JSON file will automatically be downloaded to your computer. This file contains the credentials needed to authenticate with Google Cloud.</p>
<p>Make sure you keep the key secure and store it in a safe location. Don’t share it – treat it as sensitive information.</p>
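<p>If you created the service account with the <code>gcloud</code> CLI, you can also generate the JSON key from the terminal instead of the console. A sketch, assuming the account from the earlier steps exists (the filename <code>ci-cd-sa-key.json</code> is just an example):</p>
<pre><code class="lang-bash">PROJECT_ID="gcr-ci-cd-project"

# Creates a new key and writes it to a local JSON file
gcloud iam service-accounts keys create ci-cd-sa-key.json \
  --iam-account="ci-cd-sa@${PROJECT_ID}.iam.gserviceaccount.com"
</code></pre>
<p>The same warning applies: this key file grants access to your project, so keep it out of version control.</p>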
<h3 id="heading-step-6-add-the-service-account-key-to-github-secrets">Step 6: Add the Service Account Key to GitHub Secrets 🔒</h3>
<p>Start by opening the downloaded JSON file using a text editor (like Notepad or VS Code). Then select and copy the entire contents of the file.</p>
<p>Then navigate to the repository you created for this project on GitHub. Click on the <strong>Settings</strong> tab at the top of the repository. Scroll down and find the <strong>Secrets and variables &gt; Actions</strong> section.</p>
<p>Now you need to add a new secret. Click the <strong>"New repository secret"</strong> button. In the <strong>Name</strong> field, enter <code>GCP_SERVICE_ACCOUNT</code>. In the <strong>Value</strong> field, paste the JSON content you copied earlier. Click <strong>Add secret</strong> to save it.</p>
<p>Do the same for the <code>GCP_PROJECT_ID</code> secret, but now add your Google Project ID as the value. To get your project ID, follow these steps:</p>
<ol>
<li><p><strong>Navigate to the Google Cloud Console</strong>: Open Google Cloud Console at <a target="_blank" href="https://console.cloud.google.com/">https://console.cloud.google.com/</a>.</p>
</li>
<li><p><strong>Locate the Project Dropdown</strong>: At the top-left of the screen, next to the <strong>Google Cloud logo</strong>, you will see a drop-down that shows the name of your current project.</p>
</li>
<li><p><strong>View the Project ID</strong>: Click the drop-down, and you'll see a list of all your projects. Your <strong>Project ID</strong> will be displayed next to the project name. It is a unique identifier used by Google Cloud.</p>
</li>
<li><p><strong>Copy the Project ID</strong>: Copy the <strong>Project ID</strong> that is displayed, and add it as the value of the <code>GCP_PROJECT_ID</code> secret.</p>
</li>
</ol>
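<p>As an alternative to the web UI, both secrets can be added with the GitHub CLI. This sketch assumes <code>gh</code> is authenticated, that you run it from your repository clone, and that <code>ci-cd-sa-key.json</code> is the key file you downloaded in Step 5 (an example name):</p>
<pre><code class="lang-bash"># Set the service account secret from the downloaded key file
gh secret set GCP_SERVICE_ACCOUNT &lt; ci-cd-sa-key.json

# Set the project ID secret from the currently active gcloud project
gh secret set GCP_PROJECT_ID --body "$(gcloud config get-value project)"
</code></pre>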
<h3 id="heading-step-7-adding-external-variables-to-the-github-repository">Step 7: Adding External Variables to the GitHub Repository 🔧</h3>
<p>Before proceeding with deployment, we need to define some external variables that were referenced in the CD workflow. These variables ensure that the pipeline knows critical details about your Google Cloud Run services and Docker container registry.</p>
<p>Here are the steps you’ll need to follow to do this:</p>
<ol>
<li><p>First, go to your repository on GitHub.</p>
</li>
<li><p>Click the <strong>Settings</strong> tab at the top of the repository. Scroll down to <strong>Secrets and variables &gt; Actions</strong>.</p>
</li>
<li><p>Click on the <strong>Variables</strong> tab next to <strong>Secrets</strong>. Click <strong>"New repository variable"</strong> for each variable. Then you’ll need to define these variables:</p>
<ul>
<li><p><code>GCR_PROJECT_NAME</code>: Set this to the name of your Cloud Run service for the production/live environment. For example, <code>gcr-ci-cd-app</code>.</p>
</li>
<li><p><code>GCR_STAGING_PROJECT_NAME</code>: Set this to the name of your Cloud Run service for the staging/test environment. For example, <code>gcr-ci-cd-staging</code>.</p>
</li>
<li><p><code>GCR_REGION</code>: Enter the region where you’d like to deploy the services. For this tutorial, set it to <code>us-central1</code>.</p>
</li>
<li><p><code>IMAGE</code>: Specify the name of the Docker image/container registry where the published image will be uploaded. For example, <code>&lt;dockerhub-username&gt;/ci-cd-tutorial-app</code>.</p>
</li>
</ul>
</li>
<li><p>After entering each variable name and value, click <strong>Add variable</strong>.</p>
</li>
</ol>
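<p>The same four variables can also be defined from the terminal with the GitHub CLI (the <code>gh variable</code> command requires a reasonably recent <code>gh</code> release). A sketch using the example values above:</p>
<pre><code class="lang-bash">gh variable set GCR_PROJECT_NAME --body "gcr-ci-cd-app"
gh variable set GCR_STAGING_PROJECT_NAME --body "gcr-ci-cd-staging"
gh variable set GCR_REGION --body "us-central1"
gh variable set IMAGE --body "&lt;dockerhub-username&gt;/ci-cd-tutorial-app"
</code></pre>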
<h3 id="heading-enabling-the-service-usage-api-on-the-google-cloud-project">Enabling the Service Usage API on the Google Cloud Project 🌐</h3>
<p>To deploy your application, the <strong>Service Usage API</strong> must be enabled in your Google Cloud project. This API allows you to manage Google Cloud services programmatically, including enabling/disabling APIs and monitoring their usage.</p>
<p>Follow these steps to enable it:</p>
<ol>
<li><p>First, visit the Google Cloud Console at <a target="_blank" href="https://console.cloud.google.com/">https://console.cloud.google.com/</a>.</p>
</li>
<li><p>Then make sure you’re in the correct project. Click the project drop-down menu near the <strong>Google Cloud logo</strong> at the top-left corner. Select <code>gcr-ci-cd-project</code>, or the name you gave your project, from the list of projects.</p>
</li>
<li><p>Next you’ll need to access the API library. Open the <strong>Navigation Menu</strong> (three horizontal lines in the top-left corner). Select <strong>APIs &amp; Services &gt; Library</strong> from the menu.</p>
</li>
<li><p>In the API Library, use the search bar to search for <strong>"Service Usage API"</strong>.</p>
</li>
<li><p>Click on the <strong>Service Usage API</strong> from the search results. On the API’s details page, click <strong>Enable</strong>.</p>
</li>
<li><p>To verify, go to <strong>APIs &amp; Services &gt; Enabled APIs &amp; Services</strong> in the Google Cloud Console. Confirm that the <strong>Service Usage API</strong> appears in the list of enabled APIs.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733150269757/00a4e20b-72ac-4bd4-b05f-af6e61600e09.png" alt="Enable the Google Cloud &quot;Service Usage API&quot; in the project" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
</ol>
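<p>You can also enable and verify the API from the terminal, assuming the <code>gcloud</code> CLI is authenticated against your project:</p>
<pre><code class="lang-bash">PROJECT_ID="gcr-ci-cd-project"

# Enable the Service Usage API
gcloud services enable serviceusage.googleapis.com --project="$PROJECT_ID"

# Verify it appears among the enabled services
gcloud services list --enabled \
  --filter="config.name=serviceusage.googleapis.com" \
  --project="$PROJECT_ID"
</code></pre>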
<h2 id="heading-create-the-staging-branch-and-merge-the-feature-branch-into-it-continuous-integration-and-continuous-delivery"><strong>Create the Staging Branch and Merge the Feature Branch into It (Continuous Integration and Continuous Delivery) 🌟</strong></h2>
<p>When changes from the <code>feature/ci-cd-pipeline</code> branch are merged into the <code>staging</code> branch, we complete the <strong>Continuous Integration (CI)</strong> process, and the workflow <code>ci-pipeline.yml</code> will run. This ensures that the changes made in the feature branch are tested and integrated into a shared branch.</p>
<p>Once the pull request (PR) is merged into <code>staging</code>, the <strong>Continuous Delivery (CD)</strong> pipeline automatically triggers, deploying the application to the staging environment. This simulates how updates are tested in a safe environment before being pushed to production.</p>
<h3 id="heading-create-the-staging-branch-on-the-remote-repository">Create the <code>staging</code> Branch on the Remote Repository</h3>
<p>To enable the CI/CD pipeline, we’ll first create a <code>staging</code> branch on the remote GitHub repository. This branch will serve as the test environment where changes are deployed before they reach the production environment.</p>
<p>To create the <code>staging</code> branch directly on GitHub, follow these steps:</p>
<ol>
<li><p>First, navigate to your repository on GitHub. Open your web browser and go to the GitHub repository where you want to create the new <code>staging</code> branch.</p>
</li>
<li><p>Then, switch to the <code>main</code> branch. On the top of the repository page, locate the <strong>Branch</strong> dropdown (usually labelled as <code>main</code> or the current branch name). Click on the dropdown and make sure you are on the <code>main</code> branch.</p>
</li>
<li><p>Next, create the <code>staging</code> branch. In the same dropdown where you see the <code>main</code> branch, type <code>staging</code> into the text box. Once you start typing, GitHub will offer you the option to create a new branch called <code>staging</code>. Select the <strong>Create branch: staging</strong> option from the dropdown.</p>
</li>
<li><p>Finally, verify the branch. After creating the <code>staging</code> branch, GitHub will automatically switch to it. You should now see <code>staging</code> in the branch dropdown, confirming the new branch was created.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733152232155/e6215137-5e3b-474b-88f8-af03269eccc2.png" alt="Create a new Staging branch in the GitHub repository" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
</li>
</ol>
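<p>If you prefer the command line, the <code>staging</code> branch can equally be created locally and pushed to GitHub:</p>
<pre><code class="lang-bash">git checkout main
git pull origin main        # make sure main is up to date
git checkout -b staging     # create the branch locally
git push -u origin staging  # publish it and set the upstream
</code></pre>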
<h3 id="heading-merge-your-feature-branch-into-the-staging-branch-via-a-pull-request-pr"><strong>Merge Your Feature Branch into the Staging Branch via a Pull Request (PR)</strong></h3>
<p>This process combines both Continuous Integration (CI) and Continuous Delivery (CD). You will commit changes from your feature branch, push them to the remote feature branch, and then open a PR to merge those changes into the <code>staging</code> branch. Here's how to do it:</p>
<h4 id="heading-step-1-commit-local-changes-on-your-feature-branch"><strong>Step 1: Commit Local Changes on Your Feature Branch</strong></h4>
<p>First, you’ll want to make sure that you are on the correct branch (the feature branch) by running:</p>
<pre><code class="lang-bash">git status
</code></pre>
<p>If you are not on the <code>feature/ci-cd-pipeline</code> branch, switch to it by running:</p>
<pre><code class="lang-bash">git checkout feature/ci-cd-pipeline
</code></pre>
<p>Now, stage the changes you made for the commit:</p>
<pre><code class="lang-bash">git add .
</code></pre>
<p>This stages all changes, including new files, modified files, and deleted files.</p>
<p>Next, commit your changes with a clear and descriptive message:</p>
<pre><code class="lang-bash">git commit -m <span class="hljs-string">"Set up CI/CD pipelines for the project"</span>
</code></pre>
<p>Then you can verify your commit by running:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">log</span>
</code></pre>
<p>This will display your most recent commits, and you should see the commit message you just added.</p>
<h4 id="heading-step-2-push-your-feature-branch-changes-to-the-remote-repository"><strong>Step 2: Push Your Feature Branch Changes to the Remote Repository</strong></h4>
<p>After committing your changes, push them to the remote repository:</p>
<pre><code class="lang-bash">git push origin feature/ci-cd-pipeline
</code></pre>
<p>This pushes your local changes on the <code>feature/ci-cd-pipeline</code> branch to the remote GitHub repository.</p>
<p>Once the push is successful, visit your GitHub repository in a web browser, and confirm that the <code>feature/ci-cd-pipeline</code> branch is updated with your new commit.</p>
<h4 id="heading-step-3-create-a-pull-request-to-merge-the-feature-branch-into-staging"><strong>Step 3: Create a Pull Request to Merge the Feature Branch into Staging</strong></h4>
<p>Go to your repository on GitHub and ensure that you are on the main page of the repository.</p>
<p>You should see an alert at the top of the page suggesting you create a pull request for the recently pushed branch (<code>feature/ci-cd-pipeline</code>). Click the <strong>Compare &amp; Pull Request</strong> button next to the alert.</p>
<p>Now, it’s time to choose the base and compare branches. On the PR creation page, make sure the <strong>base</strong> branch is set to <code>staging</code> (this is the branch you want to merge your changes into). The <strong>compare</strong> branch should already be set to <code>feature/ci-cd-pipeline</code> (the branch you just pushed). If they’re not selected correctly, use the dropdowns to change them.</p>
<p>You’ll want to come up with a good PR description for this. Write a clear title and description for the pull request, explaining what changes you're merging and why. For example:</p>
<ul>
<li><p><strong>Title</strong>: "Merge CI/CD setup changes from feature branch"</p>
</li>
<li><p><strong>Description</strong>: "This pull request adds the CI/CD pipelines for GitHub Actions and Docker Hub integration to the project. It includes the configurations for both CI and CD workflows."</p>
</li>
</ul>
<p>Now GitHub will show a list of all the changes that will be merged. Take a moment to review them and ensure everything looks correct.</p>
<p>If all looks good after reviewing, click on the <strong>Create pull request</strong> button. This will create the PR and notify team members (if any) that changes are ready to be reviewed and merged.</p>
<p>Wait a few seconds, and you should see a message indicating that all the checks have passed. Click on the link with the description "<strong>CI Pipeline to staging/production environment...</strong>". This should direct you to the Continuous Integration workflow, where you can view the steps that ran.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733153444873/6ecdb277-0a45-44ec-981c-c7ee671cd2f0.png" alt="Create a new pull request (PR) from the feature to the staging branch" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733153637817/e12fefde-9259-41a3-9bd1-63b5da1d88ea.png" alt="CI workflow run from PR (feature to staging branch)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-the-continuous-integration-ci-process">The Continuous Integration (CI) Process</h4>
<p>The CI process begins when a Pull Request is made to the <code>staging</code> branch. It triggers the GitHub Actions workflow defined in the <code>.github/workflows/ci-pipeline.yml</code> file. The workflow runs the necessary steps to set up the environment, install dependencies, and build the Node.js application.</p>
<p>It then runs automated tests (using <code>npm test</code>) to ensure that the changes do not break any functionality in the codebase. If all these steps are completed successfully, the CI pipeline confirms that the feature branch is stable and ready to be merged into the <code>staging</code> branch for further testing and deployment.</p>
<h4 id="heading-step-4-merge-the-pull-request"><strong>Step 4: Merge the Pull Request</strong></h4>
<p>If your team or collaborators are part of the project, they may review your PR. This step may involve discussing any changes or improvements. If everything looks good, a reviewer will merge the PR.</p>
<p>Once the PR has been reviewed and approved, you can merge the PR. To do this, just click on the <strong>Merge pull request</strong> button. Choose <strong>Confirm merge</strong> when prompted.</p>
<p>After merging, you can go to the <code>staging</code> branch to verify that the changes were successfully merged.</p>
<h3 id="heading-navigating-to-the-actions-page-after-merging-the-pr"><strong>Navigating to the Actions Page After Merging the PR</strong></h3>
<p>Once you have successfully merged your pull request from the <code>feature/ci-cd-pipeline</code> branch into the <code>staging</code> branch, the Continuous Delivery (CD) pipeline will be triggered. To view the progress of the CD pipeline, navigate to the <strong>Actions</strong> tab in your GitHub repository. Here's how to do it:</p>
<ol>
<li><p>Go to your GitHub repository.</p>
</li>
<li><p>At the top of the page, you will see the <strong>Actions</strong> tab next to the <strong>Code</strong> tab. Click on it.</p>
</li>
<li><p>On the Actions page, you will see a list of workflows that have been triggered. Look for the one labelled <strong>CD Pipeline to Google Cloud Run (staging and production)</strong>. It should appear as a new run after the PR merge.</p>
</li>
<li><p>Click on the workflow run to view its progress and see the detailed logs for each step.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733154575368/96e236a2-ae66-494b-b544-f96955a18ac9.png" alt="Continuous Delivery workflow from merge to staging (feature to staging)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733159329441/cb7e26a9-7a20-4b1b-9869-e00facc695c1.png" alt="Continuous Delivery workflow Jobs from merge to staging (feature to staging)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733160506355/4682afe3-bb04-405d-af4e-fd9bd3494659.png" alt="Continuous Delivery workflow steps from merge to staging (feature to staging)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>This will allow you to monitor the status of the CD pipeline and check if there are any issues during deployment.</p>
<p>If you look at the CD steps and workflow, you'll see that the step to deploy the application to the <strong>production</strong> environment was skipped, while the step to deploy to the <strong>staging</strong> environment was executed.</p>
<h4 id="heading-continuous-delivery-cd-pipeline-whats-going-on"><strong>Continuous Delivery (CD) pipeline – what’s going on:</strong></h4>
<p>The <strong>Continuous Delivery (CD) Pipeline</strong> automates the process of deploying the application to Google Cloud Run (testing environment). This workflow is triggered by a push to the <code>staging</code> branch, which happens after the changes from the feature branch are merged into <code>staging</code>. It can also be manually triggered via <code>workflow_dispatch</code> or upon a new release being published.</p>
<p>The pipeline consists of multiple stages:</p>
<ol>
<li><p><strong>Test Job:</strong> The pipeline begins by setting up the environment and running tests using the <code>npm test</code> command. If the tests pass, the process moves forward.</p>
</li>
<li><p><strong>Build Job:</strong> The next step builds the Docker image of the Node.js application, tags it, and then pushes it to Docker Hub.</p>
</li>
<li><p><strong>Deployment to GCP:</strong> After the image is pushed, the workflow authenticates to Google Cloud and deploys the application. If the event is a release (that is, a push to the <code>main</code> branch), the application is deployed to the production environment. If the event is a push to <code>staging</code>, the app is deployed to the staging environment.</p>
</li>
</ol>
<p>The CD process ensures that any changes made to the <code>staging</code> branch are automatically tested, built, and deployed to the staging environment, ready for further validation. When a release is published, it will trigger deployment to production, ensuring your app is always up to date.</p>
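<p>To make the trigger and branching logic above concrete, the relevant parts of such a workflow might look roughly like this. This is an illustrative sketch, not the exact CD workflow file built earlier in the book:</p>
<pre><code class="lang-yaml">on:
  push:
    branches: [staging]   # deploy to staging on merge
  release:
    types: [published]    # deploy to production on release
  workflow_dispatch:      # allow manual runs

jobs:
  deploy-staging:
    if: github.ref == 'refs/heads/staging'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying ${{ vars.GCR_STAGING_PROJECT_NAME }}"

  deploy-production:
    if: github.event_name == 'release'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying ${{ vars.GCR_PROJECT_NAME }}"
</code></pre>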
<h3 id="heading-accessing-the-deployed-application-in-the-staging-environment-on-google-cloud-run">Accessing the Deployed Application in the Staging Environment on Google Cloud Run 🌐</h3>
<p>Once the deployment to Google Cloud Run is successfully completed, you'll want to access your application running in the <strong>staging</strong> environment. Follow these steps to find and visit your deployed application:</p>
<h4 id="heading-1-navigate-to-the-google-cloud-console">1. <strong>Navigate to the Google Cloud Console</strong></h4>
<p>Open the Google Cloud Console in your browser by visiting <a target="_blank" href="https://console.cloud.google.com">https://console.cloud.google.com</a>. If you're not already signed in, make sure you log in with your Google account.</p>
<h4 id="heading-2-go-to-the-cloud-run-dashboard">2. <strong>Go to the Cloud Run Dashboard</strong></h4>
<p>In the Google Cloud Console, use the Search bar at the top or navigate through the left-hand menu: Go to <strong>Cloud Run</strong> (you can type this into the search bar, or find it under <strong>Products &amp; services</strong> &gt; <strong>Compute</strong> &gt; <strong>Cloud Run</strong>). Click on <strong>Cloud Run</strong> to open the Cloud Run dashboard.</p>
<h4 id="heading-3-select-your-staging-service">3. <strong>Select Your Staging Service</strong></h4>
<p>In the <strong>Cloud Run dashboard</strong>, you should see a list of all your services deployed across various environments. Find the service associated with the staging environment. The name should be similar to what you defined in your workflow (for example, <code>gcr-ci-cd-staging</code>).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733159635861/4ac895d2-5071-4d3f-9ed1-5af2bcca8835.png" alt="Google Cloud Run service for the staging environment" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h4 id="heading-4-access-the-service-url">4. <strong>Access the Service URL</strong></h4>
<p>Once you've selected your staging service, you’ll be taken to the <strong>Service details page</strong>. This page provides all the important information about your deployed service.<br>On this page, look for the <strong>URL</strong> section under the <strong>Service URL</strong> heading. The URL will look something like: <code>https://gcr-ci-cd-staging-&lt;unique-id&gt;.run.app</code>.</p>
<h4 id="heading-5-visit-the-application">5. <strong>Visit the Application</strong></h4>
<p>Click on the <strong>Service URL</strong>, and it will open your staging environment in a new tab in your browser. You can now interact with your application as if it were live, but in the <strong>staging environment</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733160050763/b097e647-bf6d-442e-87df-fc7d82d3585c.png" alt="Google Cloud Run service URL for the staging environment" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
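<p>If you'd rather not click through the console, the service URL can also be fetched with a single <code>gcloud</code> command, assuming the service and region names used in this tutorial:</p>
<pre><code class="lang-bash"># Print the staging service URL
gcloud run services describe gcr-ci-cd-staging \
  --region us-central1 \
  --format="value(status.url)"
</code></pre>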
<h2 id="heading-merge-the-staging-branch-into-the-main-branch-continuous-integration-and-continuous-deployment"><strong>Merge the Staging Branch into the Main Branch (Continuous Integration and Continuous Deployment) 🌐</strong></h2>
<p>In this section, we'll take the updates in the staging branch, merge them into the main branch, and trigger the CI/CD pipeline. This process not only ensures your changes are production-ready but also deploys them to the production/live environment. 🚀</p>
<h3 id="heading-step-1-push-local-changes-and-open-a-pull-request">Step 1: Push Local Changes and Open a Pull Request</h3>
<p><strong>Why?</strong> The first step involves merging the staging branch into the main branch. Just like in the previous Continuous Delivery process, this ensures the integration of thoroughly tested updates.</p>
<p>Here’s how to do it:</p>
<p>First, visit the GitHub repository where your project is hosted.</p>
<p>Then go to the <strong>Pull Requests</strong> tab and click <strong>New Pull Request</strong>. Choose <strong>main</strong> as the base (target) branch and <strong>staging</strong> as the compare (source) branch. Add a clear title and description for the pull request, explaining why these updates are ready for production deployment.</p>
<h3 id="heading-step-2-continuous-integration-ci-pipeline-execution">Step 2: Continuous Integration (CI) Pipeline Execution</h3>
<p>After you open the pull request, the <strong>Continuous Integration (CI)</strong> pipeline will automatically run to validate that the changes are still stable when integrated into the <strong>main branch</strong>.</p>
<h4 id="heading-pipeline-steps">Pipeline Steps:</h4>
<ul>
<li><p><strong>Code Checkout</strong>: The workflow fetches the latest code from the <strong>main branch</strong>.</p>
</li>
<li><p><strong>Dependency Installation</strong>: The pipeline installs all required dependencies.</p>
</li>
<li><p><strong>Testing</strong>: Automated tests are run to validate the application's stability.</p>
</li>
</ul>
<h3 id="heading-step-3-create-a-new-release">Step 3: Create a New Release</h3>
<p>The Continuous Deployment (CD) workflow to deploy to the production environment is triggered by the creation of a new release from the main branch.</p>
<p>Let’s walk through the steps to create a release.</p>
<p>On your GitHub repository page, click on the <strong>Releases</strong> section (located under the <strong>Code</strong> tab).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733338781623/c21e7f03-5381-47f9-8807-b5a3360245ad.png" alt="Navigate to the Release page in the GitHub repo" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Next, click <strong>Draft a new release</strong>. Set the <strong>Target</strong> branch to <strong>main</strong>. Enter a <strong>Tag version</strong> (for example, <code>v1.0.0</code>) following semantic versioning. Add a <strong>Release title</strong> and an optional description of the changes.</p>
<p>Then, click <strong>Publish Release</strong> to finalize.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733161473858/6e14214c-31fb-49b3-9dff-a719b9ec1d40.png" alt="Create a new release in the GitHub repo" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
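<p>The release can also be created and published from the terminal with the GitHub CLI, which triggers the production deployment in the same way. A sketch, assuming <code>gh</code> is authenticated:</p>
<pre><code class="lang-bash">gh release create v1.0.0 \
  --target main \
  --title "v1.0.0" \
  --notes "First production release of the CI/CD tutorial app"
</code></pre>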
<h4 id="heading-why-run-the-continuous-deployment-pipeline-on-release-instead-of-on-push">Why run the Continuous Deployment pipeline on release instead of on push? 🤔</h4>
<p>In our setup, we decided not to trigger the Continuous Deployment (CD) pipeline every time changes are pushed to the main branch. Instead, we trigger it only when a new release is created. This gives the team more control over when updates are deployed to the production environment.</p>
<p>Imagine a scenario where developers are working on new features—they may push changes to the main branch as part of their regular workflow, but these features might not be complete or ready for users yet. Automatically deploying every push could accidentally expose unfinished features to your users, which can be confusing or disruptive.</p>
<p>By requiring a release to trigger the deployment, the team gets a chance to finalize and polish all changes before they go live.</p>
<p>For example, developers can test new features in the staging environment, fix any issues, and merge those changes into the main branch without worrying about them immediately appearing in production. This workflow ensures that only well-tested and complete features make their way to your end users.</p>
<p>Ultimately, this approach helps maintain a smooth user experience. Instead of seeing half-built features or unexpected changes, users only see updates that are ready and functional. It also gives the team the flexibility to push changes to the main branch frequently—preventing merge conflicts and making collaboration easier—while keeping control over what gets deployed live. 🚀</p>
<h3 id="heading-step-4-navigate-to-the-actions-page">Step 4: Navigate to the Actions Page</h3>
<p>After the release is published, the CD pipeline for the production environment is triggered. To monitor it, repeat the process you used for the Continuous Delivery workflow:</p>
<ol>
<li><p><strong>Go to the GitHub Actions tab</strong>: In your GitHub repository, click on the <strong>Actions</strong> tab.</p>
</li>
<li><p><strong>Locate the deployment workflow</strong>: Look for the <strong>CD Pipeline to Google Cloud Run (staging and production)</strong> workflow. You’ll notice that the workflow has been triggered on the <strong>main branch</strong> by the release event.</p>
</li>
<li><p><strong>Open the workflow details</strong>: Click on the workflow to view detailed steps, logs, and statuses for each part of the deployment process.</p>
</li>
</ol>
<p>This time, the Continuous Deployment workflow deploys the application to the <strong>production</strong>/<strong>live</strong> environment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733164741827/303cd415-5bb9-4149-aa5d-7088d0eab582.png" alt="Continuous Deployment workflow from merge to main (staging to main)" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-5-access-the-live-application">Step 5: Access the Live Application</h3>
<p>Once the deployment is complete, go to Google Cloud Console at <a target="_blank" href="https://console.cloud.google.com">https://console.cloud.google.com</a>.</p>
<p>Navigate to <strong>Cloud Run</strong> from the menu. Select the service corresponding to the <strong>production environment</strong> (for example, <code>gcr-ci-cd-app</code>).</p>
<p>Locate the <strong>Service URL</strong> in the service details page. Open the URL in your browser to access the live application.</p>
<p>And now, congratulations – you’re done!</p>
<h2 id="heading-conclusion">Conclusion 🌟</h2>
<p>In this article, we explored how to build and automate a CI/CD pipeline for a Node.js application, using GitHub Actions, Docker Hub, and Google Cloud Run.</p>
<p>We set up workflows to handle Continuous Integration by testing and integrating code changes and Continuous Delivery to deploy those changes to a staging environment. We also containerized our app using Docker and deployed it seamlessly to Google Cloud Run.</p>
<p>Finally, we implemented Continuous Deployment, ensuring updates to the production environment happen only when a release is created from the main branch.</p>
<p>This approach gives teams the flexibility to push and test incomplete features without impacting end users. By following these steps, you've built a robust pipeline that makes deploying your application smoother, faster, and more reliable.</p>
<h3 id="heading-study-further">Study Further 📚</h3>
<p>If you would like to learn more about Continuous Integration, Delivery, and Deployment, you can check out the courses below:</p>
<ul>
<li><p><a target="_blank" href="https://www.coursera.org/learn/continuous-integration-and-continuous-delivery-ci-cd"><strong>Continuous Integration and Continuous Delivery (CI/CD)</strong></a> (from IBM on Coursera)</p>
</li>
<li><p><a target="_blank" href="https://www.udemy.com/course/github-actions-the-complete-guide/?couponCode=CMCPSALE24"><strong>GitHub Actions - The Complete Guide</strong></a> (from Udemy)</p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/what-is-ci-cd/"><strong>Learn CI/CD by building a project</strong></a> (freeCodeCamp tutorial)</p>
</li>
</ul>
<h3 id="heading-about-the-author">About the Author 👨‍💻</h3>
<p>Hi, I’m Prince! I’m a software engineer passionate about building scalable applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this article, you can learn more about me by exploring more of my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile</a>. You can find my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">LinkedIn articles here</a>. And you can <a target="_blank" href="https://prince-onuk.vercel.app/achievements#articles">visit my website</a> to read more of my articles as well. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
