<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ google cloud - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ google cloud - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Tue, 26 May 2026 16:23:05 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/google-cloud/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build and Deploy Multi-Architecture Docker Apps on Google Cloud Using ARM Nodes (Without QEMU)
 ]]>
                </title>
                <description>
                    <![CDATA[ If you've bought a laptop in the last few years, there's a good chance it's running an ARM processor. Apple's M-series chips put ARM on the map for developers, but the real revolution is happening ins ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-and-deploy-multi-architecture-docker-apps-on-google-cloud-using-arm-nodes/</link>
                <guid isPermaLink="false">69dcf2c3f57346bc1e05a01d</guid>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ARM ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Amina Lawal ]]>
                </dc:creator>
                <pubDate>Mon, 13 Apr 2026 13:42:27 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/e89ae65a-4b3a-44b7-94d8-d0638f017bf6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've bought a laptop in the last few years, there's a good chance it's running an ARM processor. Apple's M-series chips put ARM on the map for developers, but the real revolution is happening inside cloud data centers.</p>
<p>Google Cloud Axion is Google's own custom ARM-based chip, built to handle the demands of modern cloud workloads. The performance and cost numbers are striking: Google claims Axion delivers up to 60% better energy efficiency and up to 65% better price-performance compared to comparable x86 machines.</p>
<p>AWS has Graviton. Azure has Cobalt. ARM is no longer niche. It's the direction the entire cloud industry is moving.</p>
<p>But there's a problem that catches almost every team off guard when they start this transition: <strong>container architecture mismatch</strong>.</p>
<p>If you build a Docker image on your M-series Mac and push it to an x86 server, the container crashes on startup with a cryptic <code>exec format error</code>.</p>
<p>The server isn't broken. It just can't read the compiled instructions inside your image. An ARM binary and an x86 binary are written in fundamentally different languages at the machine level. The CPU literally can't execute instructions it wasn't designed for.</p>
<p>We're going to solve this problem completely in this tutorial. You'll build a single Docker image tag that automatically serves the correct binary on both ARM and x86 machines — no separate pipelines, no separate tags. Then you'll provision Google Cloud ARM nodes in GKE and configure your Kubernetes deployment to route workloads precisely to those cost-efficient nodes.</p>
<p><strong>Here's what you'll build, step by step:</strong></p>
<ul>
<li><p>A Go HTTP server that reports the CPU architecture it's running on at runtime</p>
</li>
<li><p>A multi-stage Dockerfile that cross-compiles for both <code>linux/amd64</code> and <code>linux/arm64</code> without slow QEMU emulation</p>
</li>
<li><p>A multi-arch image in Google Artifact Registry that acts as a single entry point for any architecture</p>
</li>
<li><p>A GKE cluster with two node pools: a standard x86 pool and an ARM Axion pool</p>
</li>
<li><p>A Kubernetes Deployment that pins your workload exclusively to the ARM nodes</p>
</li>
</ul>
<p>By the end, you'll hit a live endpoint and see the word <code>arm64</code> staring back at you from a Google Cloud ARM node. Let's get into it.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-step-1-set-up-your-google-cloud-project">Step 1: Set Up Your Google Cloud Project</a></p>
</li>
<li><p><a href="#heading-step-2-create-the-gke-cluster">Step 2: Create the GKE Cluster</a></p>
</li>
<li><p><a href="#heading-step-3-write-the-application">Step 3: Write the Application</a></p>
</li>
<li><p><a href="#heading-step-4-enable-multi-arch-builds-with-docker-buildx">Step 4: Enable Multi-Arch Builds with Docker Buildx</a></p>
</li>
<li><p><a href="#heading-step-5-write-the-dockerfile">Step 5: Write the Dockerfile</a></p>
</li>
<li><p><a href="#heading-step-6-build-and-push-the-multi-arch-image">Step 6: Build and Push the Multi-Arch Image</a></p>
</li>
<li><p><a href="#heading-step-7-add-the-axion-arm-node-pool">Step 7: Add the Axion ARM Node Pool</a></p>
</li>
<li><p><a href="#heading-step-8-deploy-the-app-to-the-arm-node-pool">Step 8: Deploy the App to the ARM Node Pool</a></p>
</li>
<li><p><a href="#heading-step-9-verify-the-deployment">Step 9: Verify the Deployment</a></p>
</li>
<li><p><a href="#heading-step-10-cost-savings-and-tradeoffs">Step 10: Cost Savings and Tradeoffs</a></p>
</li>
<li><p><a href="#heading-cleanup">Cleanup</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-project-file-structure">Project File Structure</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have the following ready:</p>
<ul>
<li><p><strong>A Google Cloud project</strong> with billing enabled. If you don't have one, create it at <a href="https://console.cloud.google.com">console.cloud.google.com</a>. The total cost to follow this tutorial is around $5–10.</p>
</li>
<li><p><code>gcloud</code> <strong>CLI</strong> installed and authenticated. Run <code>gcloud auth login</code> to sign in and <code>gcloud config set project YOUR_PROJECT_ID</code> to point it at your project.</p>
</li>
<li><p><strong>Docker Desktop</strong>, or Docker Engine 19.03 or later with the Buildx plugin. Docker Buildx (the tool we'll use for multi-arch builds) ships bundled with Docker Desktop.</p>
</li>
<li><p><code>kubectl</code> installed. This is the CLI for interacting with Kubernetes clusters.</p>
</li>
<li><p>Basic familiarity with <strong>Docker</strong> (images, layers, Dockerfile) and <strong>Kubernetes</strong> (pods, deployments, services). You don't need to be an expert, but you should know what these things are.</p>
</li>
</ul>
<h2 id="heading-step-1-set-up-your-google-cloud-project">Step 1: Set Up Your Google Cloud Project</h2>
<p>Before writing a single line of application code, let's get the cloud infrastructure side ready. This is the foundation everything else will build on.</p>
<h3 id="heading-enable-the-required-apis">Enable the Required APIs</h3>
<p>Most Google Cloud APIs are disabled by default in a new project. Run this command to turn on the three APIs we'll need:</p>
<pre><code class="language-bash">gcloud services enable \
  artifactregistry.googleapis.com \
  container.googleapis.com \
  containeranalysis.googleapis.com
</code></pre>
<p>Here's what each one does:</p>
<ul>
<li><p><code>artifactregistry.googleapis.com</code> — enables <strong>Artifact Registry</strong>, where we'll store our Docker images</p>
</li>
<li><p><code>container.googleapis.com</code> — enables <strong>Google Kubernetes Engine (GKE)</strong>, where our cluster will run</p>
</li>
<li><p><code>containeranalysis.googleapis.com</code> — enables vulnerability scanning for images stored in Artifact Registry</p>
</li>
</ul>
<h3 id="heading-create-a-docker-repository-in-artifact-registry">Create a Docker Repository in Artifact Registry</h3>
<p>Artifact Registry is Google Cloud's managed container image store — the place where our built images will live before being deployed to the cluster. Create a dedicated repository for this tutorial:</p>
<pre><code class="language-bash">gcloud artifacts repositories create multi-arch-repo \
  --repository-format=docker \
  --location=us-central1 \
  --description="Multi-arch tutorial images"
</code></pre>
<p>Breaking down the flags:</p>
<ul>
<li><p><code>--repository-format=docker</code> — tells Artifact Registry this repository stores Docker images (as opposed to npm packages, Maven artifacts, and so on)</p>
</li>
<li><p><code>--location=us-central1</code> — the Google Cloud region where your images will be stored. Use a region that's close to where your cluster will run to minimize image pull latency. Run <code>gcloud artifacts locations list</code> to see all options.</p>
</li>
<li><p><code>--description</code> — a human-readable label for the repository, shown in the console.</p>
</li>
</ul>
<h3 id="heading-authenticate-docker-to-push-to-artifact-registry">Authenticate Docker to Push to Artifact Registry</h3>
<p>Docker needs credentials before it can push images to Google Cloud. Run this command to wire up authentication automatically:</p>
<pre><code class="language-bash">gcloud auth configure-docker us-central1-docker.pkg.dev
</code></pre>
<p>This adds a credential helper entry to your <code>~/.docker/config.json</code> file. What that means in practice: any time Docker tries to push or pull from a URL under <code>us-central1-docker.pkg.dev</code>, it will automatically call <code>gcloud</code> to get a valid auth token. You won't need to run <code>docker login</code> manually.</p>
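<p>If you want to confirm the entry is there, a few lines of Go can look it up. This is a minimal sketch, not part of the tutorial project: the <code>dockerConfig</code> type and <code>helperFor</code> function are illustrative names, and the sample JSON assumes the usual <code>credHelpers</code> layout of <code>config.json</code>. Your real file may contain other keys as well.</p>
<pre><code class="language-go">package main

import (
    "encoding/json"
    "fmt"
)

// dockerConfig models only the credHelpers section of ~/.docker/config.json.
type dockerConfig struct {
    CredHelpers map[string]string `json:"credHelpers"`
}

// helperFor returns the credential helper registered for a registry host,
// or an empty string if the JSON is invalid or no helper is configured.
func helperFor(raw []byte, host string) string {
    cfg := new(dockerConfig)
    if err := json.Unmarshal(raw, cfg); err != nil {
        return ""
    }
    return cfg.CredHelpers[host]
}

func main() {
    // Assumed shape of the entry added for this registry host.
    sample := []byte(`{"credHelpers": {"us-central1-docker.pkg.dev": "gcloud"}}`)
    fmt.Println("helper:", helperFor(sample, "us-central1-docker.pkg.dev"))
}
</code></pre>
<p>To check your own machine, you could read the real file with <code>os.ReadFile</code> and pass its contents to <code>helperFor</code> instead of the sample.</p>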
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/31fd020f-ffa2-40bd-9057-57b16a61b325.png" alt="Terminal output of the gcloud artifacts repositories list command, showing a row for multi-arch-repo with format DOCKER, location us-central1" style="display:block;margin:0 auto" width="2870" height="1512" loading="lazy">

<h2 id="heading-step-2-create-the-gke-cluster">Step 2: Create the GKE Cluster</h2>
<p>With Artifact Registry ready to receive images, let's create the Kubernetes cluster. We'll start with a standard cluster using x86 nodes and add an ARM node pool later once we have an image to deploy.</p>
<pre><code class="language-bash">gcloud container clusters create axion-tutorial-cluster \
  --zone=us-central1-a \
  --num-nodes=2 \
  --machine-type=e2-standard-2 \
  --workload-pool=PROJECT_ID.svc.id.goog
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your actual Google Cloud project ID.</p>
<p>What each flag does:</p>
<ul>
<li><p><code>--zone=us-central1-a</code> — creates a zonal cluster in a single availability zone. A regional cluster (using <code>--region</code>) would spread nodes across three zones for higher resilience, but for this tutorial a single zone keeps things simple and avoids capacity issues that can affect specific zones. If <code>us-central1-a</code> is unavailable, try <code>us-central1-b</code>.</p>
</li>
<li><p><code>--num-nodes=2</code> — two x86 nodes in this zone. We need at least 2 to have enough capacity alongside our ARM node pool later.</p>
</li>
<li><p><code>--machine-type=e2-standard-2</code> — the machine type for this default node pool. <code>e2-standard-2</code> is a cost-effective x86 machine with 2 vCPUs and 8 GB of memory, good for general workloads.</p>
</li>
<li><p><code>--workload-pool=PROJECT_ID.svc.id.goog</code> — enables <strong>Workload Identity</strong>, which is Google's recommended way for pods to authenticate with Google Cloud APIs. It avoids the need to download and store service account key files inside your cluster.</p>
</li>
</ul>
<p>This command takes a few minutes. While it runs, you can move on to writing the application. We'll come back to the cluster in Step 7.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/332250a8-3f99-4eb1-849f-51ab054c9567.png" alt="GCP Console Kubernetes Engine Clusters page showing axion-tutorial-cluster with a green checkmark status, the zone us-central1-a, and Kubernetes version in the table." style="display:block;margin:0 auto" width="1457" height="720" loading="lazy">

<h2 id="heading-step-3-write-the-application">Step 3: Write the Application</h2>
<p>We need an application to containerize. We'll use <strong>Go</strong> for three specific reasons:</p>
<ol>
<li><p>Go compiles into a single, statically-linked binary. There's no runtime to install, no interpreter — just the binary. This makes for extremely lean container images.</p>
</li>
<li><p>Go has first-class, built-in cross-compilation support. We can compile an ARM64 binary from an x86 Mac, or vice versa, by setting two environment variables. This will matter a lot when we get to the Dockerfile.</p>
</li>
<li><p>Go exposes the architecture the binary was compiled for via <code>runtime.GOARCH</code>. Our server will report this at runtime, giving us hard proof that the correct binary is running on the correct hardware.</p>
</li>
</ol>
<p>Start by creating the project directories:</p>
<pre><code class="language-bash">mkdir -p hello-axion/app hello-axion/k8s
cd hello-axion/app
</code></pre>
<p>Initialize the Go module from inside <code>app/</code>. This creates <code>go.mod</code> in the current directory:</p>
<pre><code class="language-bash">go mod init hello-axion
</code></pre>
<p><code>go mod init</code> is Go's built-in command for starting a new module. It writes a <code>go.mod</code> file that declares the module name (<code>hello-axion</code>) and the minimum Go version required. Every modern Go project needs this file — without it, the compiler doesn't know how to resolve packages.</p>
<p>Now create the application at <code>app/main.go</code>:</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "net/http"
    "os"
    "runtime"
)

func handler(w http.ResponseWriter, r *http.Request) {
    hostname, _ := os.Hostname()
    fmt.Fprintf(w, "Hello from freeCodeCamp!\n")
    fmt.Fprintf(w, "Architecture : %s\n", runtime.GOARCH)
    fmt.Fprintf(w, "OS           : %s\n", runtime.GOOS)
    fmt.Fprintf(w, "Pod hostname : %s\n", hostname)
}

func healthz(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    fmt.Fprintln(w, "ok")
}

func main() {
    http.HandleFunc("/", handler)
    http.HandleFunc("/healthz", healthz)
    fmt.Println("Server starting on port 8080...")
    if err := http.ListenAndServe(":8080", nil); err != nil {
        fmt.Fprintf(os.Stderr, "server error: %v\n", err)
        os.Exit(1)
    }
}
</code></pre>
<p>Verify both files were created:</p>
<pre><code class="language-bash">ls -la
</code></pre>
<p>You should see <code>go.mod</code> and <code>main.go</code> listed.</p>
<p>Let's walk through what this code does:</p>
<ul>
<li><p><code>import "runtime"</code> — imports Go's built-in <code>runtime</code> package, which exposes information about the Go runtime environment, including the CPU architecture.</p>
</li>
<li><p><code>runtime.GOARCH</code> — returns a string like <code>"arm64"</code> or <code>"amd64"</code> representing the architecture this binary was compiled for. When we deploy to an ARM node, this value will be <code>arm64</code>. This is the core of our proof.</p>
</li>
<li><p><code>os.Hostname()</code> — returns the pod's hostname, which Kubernetes sets to the pod name. This lets us see which specific pod responded when we test the app later.</p>
</li>
<li><p><code>handler</code> — the main HTTP handler, registered on the root path <code>/</code>. It writes the architecture, OS, and hostname to the response.</p>
</li>
<li><p><code>healthz</code> — a separate handler registered on <code>/healthz</code>. It returns HTTP 200 with the text <code>ok</code>. Kubernetes will use this endpoint to check whether the container is alive and ready to serve traffic — we'll wire this up in the deployment manifest later.</p>
</li>
<li><p><code>http.ListenAndServe(":8080", nil)</code> — starts the server on port 8080. If it fails to start (for example, if the port is already in use), it prints the error and exits with a non-zero code so Kubernetes knows something went wrong.</p>
</li>
</ul>
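<p>Before containerizing anything, you may want to exercise the handler in memory. The sketch below uses Go's standard <code>net/http/httptest</code> package. It is not part of the original project: <code>checkHandler</code> is a hypothetical helper, and the <code>handler</code> shown here is a shortened copy of the one in <code>main.go</code> so the snippet compiles on its own.</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
    "runtime"
    "strings"
)

// handler is a shortened copy of the handler from main.go.
func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Architecture : %s\n", runtime.GOARCH)
}

// checkHandler (hypothetical helper) calls the handler without opening a
// network port and reports whether the response names the compiled architecture.
func checkHandler() bool {
    rec := httptest.NewRecorder()
    req := httptest.NewRequest(http.MethodGet, "/", nil)
    handler(rec, req)
    if rec.Code != http.StatusOK {
        return false
    }
    return strings.Contains(rec.Body.String(), runtime.GOARCH)
}

func main() {
    fmt.Println("handler reports architecture:", checkHandler())
}
</code></pre>
<p>In a real project this logic would more naturally live in a <code>main_test.go</code> file and run with <code>go test</code>. It is written as a standalone program here only to keep the example self-contained.</p>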
<h2 id="heading-step-4-enable-multi-arch-builds-with-docker-buildx">Step 4: Enable Multi-Arch Builds with Docker Buildx</h2>
<p>Before we write the Dockerfile, we need to understand a fundamental constraint, because it directly shapes how the Dockerfile must be written.</p>
<h3 id="heading-why-your-docker-images-are-architecture-specific-by-default">Why Your Docker Images Are Architecture-Specific By Default</h3>
<p>A CPU only understands instructions written for its specific <strong>Instruction Set Architecture (ISA)</strong>. ARM64 and x86_64 are different ISAs — different vocabularies of machine-level operations. When you compile a Go program, the compiler translates your source code into binary instructions for exactly one ISA. That binary can't run on a different ISA.</p>
<p>When you build a Docker image the normal way (<code>docker build</code>), the binary inside that image is compiled for your local machine's ISA. If you're on an Apple Silicon Mac, you get an ARM64 binary. Push that image to an x86 server, and when Docker tries to execute the binary, the kernel rejects it:</p>
<pre><code class="language-shell">standard_init_linux.go:228: exec user process caused: exec format error
</code></pre>
<p>That's the operating system saying: "This binary was written for a different processor. I have no idea what to do with it."</p>
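<p>You can see which ISA a Linux binary targets by reading its ELF header. The following is a minimal sketch using Go's standard <code>debug/elf</code> package. The function name <code>machineToGoArch</code> is illustrative, and the path to inspect is whatever binary you pass on the command line.</p>
<pre><code class="language-go">package main

import (
    "debug/elf"
    "fmt"
    "os"
)

// machineToGoArch (illustrative helper) maps the ELF machine field to the
// architecture names that Go and Docker use.
func machineToGoArch(m elf.Machine) string {
    switch m {
    case elf.EM_X86_64:
        return "amd64"
    case elf.EM_AARCH64:
        return "arm64"
    default:
        return "other (" + m.String() + ")"
    }
}

func main() {
    if len(os.Args) != 2 {
        fmt.Println("usage: elfarch PATH_TO_LINUX_BINARY")
        return
    }
    f, err := elf.Open(os.Args[1])
    if err != nil {
        fmt.Fprintf(os.Stderr, "not a readable ELF file: %v\n", err)
        os.Exit(1)
    }
    defer f.Close()
    // The machine field is fixed at compile time. The kernel checks it
    // before it agrees to execute the file.
    fmt.Println("compiled for:", machineToGoArch(f.Machine))
}
</code></pre>
<p>Pointing this at a binary built with <code>GOOS=linux GOARCH=arm64</code> should report <code>arm64</code>, regardless of the machine you run the check on.</p>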
<h3 id="heading-the-solution-a-single-image-tag-that-serves-any-architecture">The Solution: A Single Image Tag That Serves Any Architecture</h3>
<p>Docker solves this with a structure called a <strong>Manifest List</strong> (also called a multi-arch image index). Instead of one image, a Manifest List is a pointer table. It holds multiple image references — one per architecture — all under the same tag.</p>
<p>When a server pulls <code>hello-axion:v1</code>, here's what actually happens:</p>
<ol>
<li><p>Docker contacts the registry and requests the manifest for <code>hello-axion:v1</code></p>
</li>
<li><p>The registry returns the Manifest List, which looks like this internally:</p>
</li>
</ol>
<pre><code class="language-json">{
  "manifests": [
    { "digest": "sha256:a1b2...", "platform": { "architecture": "amd64", "os": "linux" } },
    { "digest": "sha256:c3d4...", "platform": { "architecture": "arm64", "os": "linux" } }
  ]
}
</code></pre>
<ol start="3">
<li>Docker checks the current machine's architecture, finds the matching entry, and pulls only that specific image. The x86 image never downloads onto your ARM server, and vice versa.</li>
</ol>
<p>One tag, two actual images. Completely transparent to your deployment manifests.</p>
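<p>The selection logic can be sketched in a few lines of Go. This illustrates the idea only and is not Docker's actual implementation: the type names and <code>pickDigest</code> are hypothetical, and the JSON reuses the abbreviated example with placeholder digests.</p>
<pre><code class="language-go">package main

import (
    "encoding/json"
    "fmt"
    "runtime"
)

type platform struct {
    Architecture string `json:"architecture"`
    OS           string `json:"os"`
}

type entry struct {
    Digest   string   `json:"digest"`
    Platform platform `json:"platform"`
}

type manifestList struct {
    Manifests []entry `json:"manifests"`
}

// sample is an abbreviated manifest list with placeholder digests.
const sample = `{
  "manifests": [
    { "digest": "sha256:a1b2...", "platform": { "architecture": "amd64", "os": "linux" } },
    { "digest": "sha256:c3d4...", "platform": { "architecture": "arm64", "os": "linux" } }
  ]
}`

// pickDigest returns the digest whose platform matches the given OS and architecture.
func pickDigest(raw, goos, goarch string) (string, error) {
    list := new(manifestList)
    if err := json.Unmarshal([]byte(raw), list); err != nil {
        return "", err
    }
    for _, m := range list.Manifests {
        if m.Platform.OS == goos {
            if m.Platform.Architecture == goarch {
                return m.Digest, nil
            }
        }
    }
    return "", fmt.Errorf("no image for %s/%s", goos, goarch)
}

func main() {
    // A Linux node would pass its own architecture here.
    digest, err := pickDigest(sample, "linux", runtime.GOARCH)
    fmt.Println(digest, err)
}
</code></pre>
<p>The error branch covers the case where a tag has no variant for the requesting platform, for example an image that was only ever built for x86.</p>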
<h3 id="heading-set-up-docker-buildx">Set Up Docker Buildx</h3>
<p><strong>Docker Buildx</strong> is the CLI tool that builds these Manifest Lists. It's powered by the <strong>BuildKit</strong> engine and ships bundled with Docker Desktop. Run the following to create and activate a new builder instance:</p>
<pre><code class="language-bash">docker buildx create --name multiarch-builder --use
</code></pre>
<ul>
<li><p><code>--name multiarch-builder</code> — gives this builder a memorable name. You can have multiple builders. This command creates a new one named <code>multiarch-builder</code>.</p>
</li>
<li><p><code>--use</code> — immediately sets this new builder as the active one, so all future <code>docker buildx build</code> commands use it.</p>
</li>
</ul>
<p>Now boot the builder and confirm it supports the platforms we need:</p>
<pre><code class="language-bash">docker buildx inspect --bootstrap
</code></pre>
<ul>
<li><code>--bootstrap</code> — starts the builder container if it isn't already running, and prints its full configuration.</li>
</ul>
<p>You should see output like this:</p>
<pre><code class="language-plaintext">Name:          multiarch-builder
Driver:        docker-container
Platforms:     linux/amd64, linux/arm64, linux/arm/v7, linux/386, ...
</code></pre>
<p>The <code>Platforms</code> line lists every architecture this builder can produce images for. As long as you see <code>linux/amd64</code> and <code>linux/arm64</code> in that list, you're ready to build for both x86 and ARM.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/1c19aca1-30c4-406d-9c37-679ee4f2928f.png" alt="Terminal output showing the multiarch-builder details with Name, Driver set to docker-container, and a Platforms list that includes linux/amd64 and linux/arm64 highlighted." style="display:block;margin:0 auto" width="2188" height="1258" loading="lazy">

<h2 id="heading-step-5-write-the-dockerfile">Step 5: Write the Dockerfile</h2>
<p>Now we can write the Dockerfile. We'll use two techniques together: a <strong>multi-stage build</strong> to keep the final image tiny, and a <strong>cross-compilation trick</strong> to avoid slow CPU emulation.</p>
<p>Create <code>app/Dockerfile</code> with the following content:</p>
<pre><code class="language-dockerfile"># -----------------------------------------------------------
# Stage 1: Build
# -----------------------------------------------------------
# $BUILDPLATFORM = the machine running this build (your laptop)
# $TARGETOS / $TARGETARCH = the platform we are building FOR
# -----------------------------------------------------------
FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS builder

ARG TARGETOS
ARG TARGETARCH

WORKDIR /app

COPY go.mod .
RUN go mod download

COPY main.go .

RUN GOOS=$TARGETOS GOARCH=$TARGETARCH go build -ldflags="-w -s" -o server main.go

# -----------------------------------------------------------
# Stage 2: Runtime
# -----------------------------------------------------------

FROM alpine:latest

RUN addgroup -S appgroup &amp;&amp; adduser -S appuser -G appgroup
USER appuser

WORKDIR /app
COPY --from=builder /app/server .

EXPOSE 8080
CMD ["./server"]
</code></pre>
<p>There's a lot happening here. Let's go through it carefully.</p>
<h3 id="heading-stage-1-the-builder">Stage 1: The Builder</h3>
<p><code>FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS builder</code></p>
<p>This is the most important line in the file. <code>$BUILDPLATFORM</code> is a special build argument that Docker Buildx automatically injects. It equals the platform of the machine <em>running the build</em> (your laptop). By pinning the builder stage to <code>$BUILDPLATFORM</code>, the Go compiler always runs natively on your machine, not inside a CPU emulator. This is what makes multi-arch builds fast.</p>
<p>Without <code>--platform=$BUILDPLATFORM</code>, Buildx would have to use <strong>QEMU</strong> — a full CPU emulator — to run an ARM64 build environment on your x86 machine (or vice versa). QEMU works, but it's typically 5–10 times slower than native execution. For a project with many dependencies, that's the difference between a 2-minute build and a 20-minute build.</p>
<p><code>ARG TARGETOS</code> <strong>and</strong> <code>ARG TARGETARCH</code></p>
<p>These two lines declare that our Dockerfile expects build arguments named <code>TARGETOS</code> and <code>TARGETARCH</code>. Buildx injects these automatically based on the <code>--platform</code> flag you pass at build time. For a <code>linux/arm64</code> target, <code>TARGETOS</code> will be <code>linux</code> and <code>TARGETARCH</code> will be <code>arm64</code>.</p>
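<p>To make that mapping concrete, here is a rough Go sketch of how a platform string breaks down into those values. It only illustrates the naming convention: <code>splitPlatform</code> is a hypothetical helper, not a Buildx API.</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "strings"
)

// splitPlatform breaks "os/arch" or "os/arch/variant" into its parts,
// mirroring the values Buildx exposes as TARGETOS, TARGETARCH and TARGETVARIANT.
// It returns empty strings for any other shape.
func splitPlatform(p string) (targetOS, targetArch, targetVariant string) {
    parts := strings.Split(p, "/")
    if len(parts) == 2 {
        return parts[0], parts[1], ""
    }
    if len(parts) == 3 {
        return parts[0], parts[1], parts[2]
    }
    return "", "", ""
}

func main() {
    for _, p := range []string{"linux/amd64", "linux/arm64", "linux/arm/v7"} {
        tOS, tArch, tVariant := splitPlatform(p)
        fmt.Printf("%-13s TARGETOS=%s TARGETARCH=%s TARGETVARIANT=%s\n", p, tOS, tArch, tVariant)
    }
}
</code></pre>
<p>Our Dockerfile only needs the first two values, because neither <code>amd64</code> nor <code>arm64</code> carries a variant in this tutorial.</p>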
<p><code>COPY go.mod .</code> <strong>and</strong> <code>RUN go mod download</code></p>
<p>We copy <code>go.mod</code> first, before copying the rest of the source code. Docker builds images layer by layer and caches each layer. By copying only the module file first, we create a cached layer for <code>go mod download</code>.</p>
<p>On future builds, as long as <code>go.mod</code> hasn't changed, Docker skips the download step entirely — even if the source code changed. This speeds up iterative development significantly.</p>
<p><code>RUN GOOS=$TARGETOS GOARCH=$TARGETARCH go build -ldflags="-w -s" -o server main.go</code></p>
<p>This is the cross-compilation step. <code>GOOS</code> and <code>GOARCH</code> are Go's built-in cross-compilation environment variables. Setting them tells the Go compiler to produce a binary for a different target than the machine it's running on. We set them from the <code>$TARGETOS</code> and <code>$TARGETARCH</code> build args injected by Buildx.</p>
<p>The <code>-ldflags="-w -s"</code> flag strips the debug symbol table and the DWARF debugging information from the binary. This has no effect on runtime behavior but reduces the binary size by roughly 30%.</p>
<h3 id="heading-stage-2-the-runtime-image">Stage 2: The Runtime Image</h3>
<p><code>FROM alpine:latest</code></p>
<p>This starts a brand-new image from Alpine Linux — a minimal Linux distribution that weighs about 5 MB. Critically, <code>alpine:latest</code> is itself a multi-arch image, so Docker automatically selects the <code>arm64</code> or <code>amd64</code> Alpine variant depending on which platform this stage is built for.</p>
<p>Everything from Stage 1 — the Go toolchain, the source files, the intermediate object files — is discarded. The final image contains <em>only</em> Alpine Linux plus our binary. Compared to a naive single-stage Go image (~300 MB), this approach produces an image under 15 MB.</p>
<p><code>RUN addgroup -S appgroup &amp;&amp; adduser -S appuser -G appgroup</code> and <code>USER appuser</code></p>
<p>These two lines create a non-root user and set it as the active user for the container. Running containers as root is a security risk — if an attacker exploits a vulnerability in your application, they gain root access inside the container. Running as a non-root user limits the blast radius.</p>
<p><code>COPY --from=builder /app/server .</code></p>
<p>This is how multi-stage builds work: the <code>--from=builder</code> flag tells Docker to copy files from the <code>builder</code> stage (Stage 1), not from your local disk. Only the compiled binary (<code>server</code>) makes it into the final image.</p>
<h2 id="heading-step-6-build-and-push-the-multi-arch-image">Step 6: Build and Push the Multi-Arch Image</h2>
<p>With the application and Dockerfile in place, we can now build images for both architectures and push them to Artifact Registry — all in a single command.</p>
<p>From inside the <code>app/</code> directory, run:</p>
<pre><code class="language-bash">docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1 \
  --push \
  .
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your actual GCP project ID.</p>
<p>Here's what each part of this command does:</p>
<ul>
<li><p><code>docker buildx build</code> — uses the Buildx CLI instead of the standard <code>docker build</code>. Buildx is required for multi-platform builds.</p>
</li>
<li><p><code>--platform linux/amd64,linux/arm64</code> — instructs Buildx to build the image twice: once targeting x86 Intel/AMD machines, and once targeting ARM64. Both builds run in parallel. Because our Dockerfile uses the <code>$BUILDPLATFORM</code> cross-compilation trick, both builds run natively on your machine without QEMU emulation.</p>
</li>
<li><p><code>-t us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1</code> — the full image path in Artifact Registry. The format is always <code>REGION-docker.pkg.dev/PROJECT_ID/REPO_NAME/IMAGE_NAME:TAG</code>.</p>
</li>
<li><p><code>--push</code> — multi-arch images can't be loaded into your local Docker daemon (which only understands single-architecture images). This flag tells Buildx to skip local storage and push the completed Manifest List — with both architecture variants — directly to the registry.</p>
</li>
<li><p><code>.</code> — the build context, the directory Docker scans for the Dockerfile and any files the build needs.</p>
</li>
</ul>
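<p>If you script your builds, it can help to assemble the image path in one place so the region, repository, and tag never drift apart. The sketch below follows the <code>REGION-docker.pkg.dev/PROJECT_ID/REPO_NAME/IMAGE_NAME:TAG</code> format. The <code>imageRef</code> function is a hypothetical helper, and <code>my-project</code> is a placeholder project ID.</p>
<pre><code class="language-go">package main

import "fmt"

// imageRef assembles REGION-docker.pkg.dev/PROJECT_ID/REPO_NAME/IMAGE_NAME:TAG.
func imageRef(region, project, repo, image, tag string) string {
    return fmt.Sprintf("%s-docker.pkg.dev/%s/%s/%s:%s", region, project, repo, image, tag)
}

func main() {
    // "my-project" is a placeholder. Substitute your own project ID.
    fmt.Println(imageRef("us-central1", "my-project", "multi-arch-repo", "hello-axion", "v1"))
}
</code></pre>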
<p>Watch the output as the build runs. You'll see BuildKit working on both platforms simultaneously:</p>
<pre><code class="language-plaintext"> =&gt; [linux/amd64 builder 1/5] FROM golang:1.23-alpine
 =&gt; [linux/arm64 builder 1/5] FROM golang:1.23-alpine
 ...
 =&gt; pushing manifest for us-central1-docker.pkg.dev/.../hello-axion:v1
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/dc88f558-b4ee-4100-bfe1-eaa943bec9bc.png" alt="Terminal showing docker buildx build output with two parallel build tracks labeled linux/amd64 and linux/arm64, and a final line reading pushing manifest for the Artifact Registry image path." style="display:block;margin:0 auto" width="2188" height="1258" loading="lazy">

<h3 id="heading-verify-the-multi-arch-image-in-artifact-registry">Verify the Multi-Arch Image in Artifact Registry</h3>
<p>Once the push completes, navigate to <strong>GCP Console → Artifact Registry → Repositories → multi-arch-repo</strong> and click on <code>hello-axion</code>.</p>
<p>You won't see a single image — you'll see something labelled <strong>"Image Index"</strong>. That's the Manifest List we created. Click into it, and you'll find two child images with separate digests, one for <code>linux/amd64</code> and one for <code>linux/arm64</code>.</p>
<p>You can also inspect this from the command line:</p>
<pre><code class="language-bash">docker buildx imagetools inspect \
  us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/28d0e4a4-1d45-4c0b-ac47-34dc3b72c11d.png" alt="Google Cloud Artifact Registry console showing hello-axion as an Image Index with two child images: one labeled linux/amd64 and one labeled linux/arm64, each with its own digest and size." style="display:block;margin:0 auto" width="2188" height="1258" loading="lazy">

<p>The output lists every manifest inside the image index. You'll see entries for <code>linux/amd64</code> and <code>linux/arm64</code>, which are our two real images. You'll also see two entries with <code>Platform: unknown/unknown</code> labelled as <code>attestation-manifest</code>. These are <strong>build provenance records</strong> that Docker Buildx attaches by default to record how and where the image was built (a supply chain security feature based on SLSA provenance attestations).</p>
<p>The two entries you care about are <code>linux/amd64</code> and <code>linux/arm64</code>. Note the digest for the <code>arm64</code> entry — we'll use it in the verification step to confirm the cluster pulled the right variant.</p>
<h2 id="heading-step-7-add-the-axion-arm-node-pool">Step 7: Add the Axion ARM Node Pool</h2>
<p>We have a universal image. Now we need somewhere to run it.</p>
<p>Recall the cluster we created in Step 2 — it's running <code>e2-standard-2</code> x86 machines. We're going to add a second node pool running ARM machines. This is the key architectural move: a <strong>mixed-architecture cluster</strong> where different workloads can be routed to different hardware.</p>
<h3 id="heading-choosing-your-arm-machine-type">Choosing Your ARM Machine Type</h3>
<p>Google Cloud currently offers two ARM-based machine series in GKE:</p>
<table>
<thead>
<tr>
<th>Series</th>
<th>Example type</th>
<th>What it is</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Tau T2A</strong></td>
<td><code>t2a-standard-2</code></td>
<td>First-gen Google ARM (Ampere Altra). Broadly available across regions. Great for getting started.</td>
</tr>
<tr>
<td><strong>Axion (C4A)</strong></td>
<td><code>c4a-standard-2</code></td>
<td>Google's custom ARM chip (Arm Neoverse V2 core). Newest generation, best price-performance. Still expanding availability.</td>
</tr>
</tbody></table>
<p>This tutorial uses <code>t2a-standard-2</code> because it's widely available. The commands are identical for <code>c4a-standard-2</code> — just swap the <code>--machine-type</code> value. If <code>t2a-standard-2</code> isn't available in your zone, GKE will tell you immediately when you run the node pool creation command below, and you can try a neighbouring zone.</p>
<h3 id="heading-create-the-arm-node-pool">Create the ARM Node Pool</h3>
<p>Add the ARM node pool to your existing cluster:</p>
<pre><code class="language-bash">gcloud container node-pools create axion-pool \
  --cluster=axion-tutorial-cluster \
  --zone=us-central1-a \
  --machine-type=t2a-standard-2 \
  --num-nodes=2 \
  --node-labels=workload-type=arm-optimized
</code></pre>
<p>What each flag does:</p>
<ul>
<li><p><code>--cluster=axion-tutorial-cluster</code> — the name of the cluster we created in Step 2. Node pools are always added to an existing cluster.</p>
</li>
<li><p><code>--zone=us-central1-a</code> — must match the zone you used when creating the cluster.</p>
</li>
<li><p><code>--machine-type=t2a-standard-2</code> — GKE detects this is an ARM machine type and automatically provisions the nodes with an ARM-compatible version of Container-Optimized OS (COS). You don't need to configure anything special for ARM at the OS level.</p>
</li>
<li><p><code>--num-nodes=2</code> — two ARM nodes in the zone, enough to schedule our 3-replica deployment alongside other cluster overhead.</p>
</li>
<li><p><code>--node-labels=workload-type=arm-optimized</code>: attaches a custom label to every node in this pool. The deployment manifest in Step 8 selects on the automatic <code>kubernetes.io/arch=arm64</code> label, so this custom label isn't strictly required here. Adding a descriptive label is still good practice in real clusters because it communicates the <em>intent</em> of the pool, not just its hardware, and the taint example later in this tutorial reuses the same key and value.</p>
</li>
</ul>
<p>This command takes a few minutes. Once it completes, let's confirm our cluster now has both node pools:</p>
<pre><code class="language-bash">gcloud container clusters get-credentials axion-tutorial-cluster --zone=us-central1-a

kubectl get nodes --label-columns=kubernetes.io/arch
</code></pre>
<p>The <code>get-credentials</code> command configures <code>kubectl</code> to authenticate with your new cluster. The <code>get nodes</code> command then lists all nodes and adds a column showing the <code>kubernetes.io/arch</code> label.</p>
<p>You should see something like:</p>
<pre><code class="language-plaintext">NAME                                    STATUS   ARCH    AGE
gke-...default-pool-abc...              Ready    amd64   15m
gke-...default-pool-def...              Ready    amd64   15m
gke-...axion-pool-jkl...                Ready    arm64   3m
gke-...axion-pool-mno...                Ready    arm64   3m
</code></pre>
<p><code>amd64</code> for the default x86 pool, <code>arm64</code> for our new Axion pool. This <code>kubernetes.io/arch</code> label is applied automatically by GKE — you don't set it, it's derived from the hardware.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/6389f4c6-17fe-4086-982f-39d94dbfa252.png" alt="Terminal output of kubectl get nodes with an ARCH column showing amd64 for two default-pool nodes and arm64 for two axion-pool nodes." style="display:block;margin:0 auto" width="2330" height="646" loading="lazy">

<h2 id="heading-step-8-deploy-the-app-to-the-arm-node-pool">Step 8: Deploy the App to the ARM Node Pool</h2>
<p>We have a multi-arch image and a mixed-architecture cluster. Here's something important to understand before writing the deployment manifest: <strong>Kubernetes doesn't know or care about image architecture by default</strong>.</p>
<p>If you applied a standard Deployment right now, the scheduler would look for any available node with enough CPU and memory and place pods there — potentially landing on x86 nodes instead of your ARM Axion nodes. The multi-arch Manifest List would handle this gracefully (the right binary would run regardless), but you'd lose the cost efficiency you provisioned Axion nodes for in the first place.</p>
<p>To guarantee that pods land on ARM nodes and only ARM nodes, we use a <code>nodeSelector</code>.</p>
<h3 id="heading-how-nodeselector-works">How nodeSelector Works</h3>
<p>A <code>nodeSelector</code> is a set of key-value pairs in your pod spec. Before the Kubernetes scheduler places a pod, it checks every available node's labels and skips any node that doesn't carry all the labels in the <code>nodeSelector</code>. If no node qualifies, the pod stays in <code>Pending</code> state rather than landing on the wrong node.</p>
<p>This is a hard constraint, which is exactly what we want for cost optimization. Contrast this with Node Affinity's soft preference mode (<code>preferredDuringSchedulingIgnoredDuringExecution</code>), which says "try to use ARM, but fall back to x86 if needed." Soft preferences are useful for resilience, but they undermine the whole point of dedicated ARM pools. We want the hard constraint.</p>
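<p>The matching rule is simple enough to sketch in a few lines of Python. This is only an illustration of the label check, not the scheduler's actual code; the node names and the <code>matches_node_selector</code> helper are hypothetical.</p>
<pre><code class="language-python">def matches_node_selector(node_labels, node_selector):
    """A node qualifies only if it carries every key-value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical nodes modelled on the cluster in this tutorial.
nodes = {
    "default-pool-node": {"kubernetes.io/arch": "amd64"},
    "axion-pool-node": {
        "kubernetes.io/arch": "arm64",
        "workload-type": "arm-optimized",
    },
}
selector = {"kubernetes.io/arch": "arm64"}

eligible = [name for name, labels in nodes.items() if matches_node_selector(labels, selector)]
print(eligible)  # ['axion-pool-node']
</code></pre>
<p>If <code>eligible</code> came back empty, that would correspond to the pod staying in <code>Pending</code>: a hard constraint never falls back to a non-matching node.</p>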
<h3 id="heading-write-the-deployment-manifest">Write the Deployment Manifest</h3>
<p>Create <code>k8s/deployment.yaml</code>:</p>
<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-axion
  labels:
    app: hello-axion
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-axion
  template:
    metadata:
      labels:
        app: hello-axion
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64

      containers:
      - name: hello-axion
        image: us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 5
        resources:
          requests:
            cpu: "250m"
            memory: "64Mi"
          limits:
            cpu: "500m"
            memory: "128Mi"
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your project ID. Here's what the key sections do:</p>
<p><code>replicas: 3</code>: tells Kubernetes to keep three instances of this pod running at all times. If a pod crashes or a node goes down, the Deployment creates a replacement and the scheduler places it on a healthy node. Because <code>axion-pool</code> has two nodes in a single zone (<code>us-central1-a</code>), the three replicas share those two nodes, which protects against a single node failure but not a zone outage.</p>
<p><code>selector.matchLabels</code> and <code>template.metadata.labels</code> — these two blocks must match. The <code>selector</code> tells the Deployment which pods it "owns," and the <code>template.metadata.labels</code> is what those pods will be tagged with. If they don't match, Kubernetes won't be able to manage the pods.</p>
<p><code>nodeSelector: kubernetes.io/arch: arm64</code> — this is the pin. The Kubernetes scheduler filters out every node that doesn't carry this label before considering resource availability. Since GKE automatically applies <code>kubernetes.io/arch=arm64</code> to all ARM nodes, our pods will schedule only onto the <code>axion-pool</code> nodes.</p>
<p><code>livenessProbe</code> — periodically calls <code>GET /healthz</code>. If this check fails a certain number of times in a row (indicating the container has deadlocked or is otherwise unresponsive), Kubernetes restarts the container. <code>initialDelaySeconds: 5</code> gives the server 5 seconds to start up before the first check.</p>
<p><code>readinessProbe</code> — similar to the liveness probe, but with a different purpose. While the readiness probe is failing, Kubernetes removes the pod from the service's load balancer, so no traffic is sent to it. This is important during startup — the pod won't receive traffic until it signals it's ready.</p>
<p><code>resources.requests</code> — reserves <code>250m</code> (25% of a CPU core) and <code>64Mi</code> of memory on the node for this pod. The scheduler uses these numbers to decide whether a node has enough room for the pod. Setting requests is required for sensible bin-packing. Without them, nodes can be silently overcommitted.</p>
<p><code>resources.limits</code>: caps the container at <code>500m</code> CPU and <code>128Mi</code> memory. If the container tries to use more CPU than its limit, it is throttled. If it exceeds the memory limit, it is killed (<code>OOMKilled</code>) and restarted. This prevents a single misbehaving pod from starving other workloads on the same node.</p>
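<p>To see what these quantities add up to across the Deployment, here is a small Python sketch that converts the request values from the manifest into plain numbers. The helper functions are hypothetical and handle only the suffixes used in this tutorial, not every Kubernetes quantity format.</p>
<pre><code class="language-python">def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as '250m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_to_bytes(quantity):
    """Convert a memory quantity with a Ki/Mi/Gi suffix to bytes."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


replicas = 3
total_cpu = replicas * cpu_to_millicores("250m")
total_memory = replicas * memory_to_bytes("64Mi")

print(total_cpu)                  # 750 millicores requested in total
print(total_memory // 1024 ** 2)  # 192 MiB requested in total
</code></pre>
<p>Two <code>t2a-standard-2</code> nodes provide 4 vCPUs between them before system reservations, so a total request of 750 millicores leaves plenty of headroom.</p>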
<h3 id="heading-a-note-on-taints-and-tolerations">A Note on Taints and Tolerations</h3>
<p>Once you're comfortable with <code>nodeSelector</code>, the next step in production clusters is adding a <strong>taint</strong> to your ARM node pool. A taint is a repellent — any pod without an explicit <strong>toleration</strong> for that taint is blocked from landing on the tainted node.</p>
<p>This means other workloads in your cluster can't accidentally consume your ARM capacity. You'd add the taint when creating the pool:</p>
<pre><code class="language-bash"># Add --node-taints to the pool creation command:
--node-taints=workload-type=arm-optimized:NoSchedule
</code></pre>
<p>And a matching toleration in the pod spec:</p>
<pre><code class="language-yaml">tolerations:
- key: "workload-type"
  operator: "Equal"
  value: "arm-optimized"
  effect: "NoSchedule"
</code></pre>
<p>We're not doing this in the tutorial to keep things simple, but it's the pattern production multi-tenant clusters use to enforce hard separation between workload types.</p>
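<p>For intuition, the interaction between the taint and the toleration above can be sketched in Python. This is a simplified model that only considers the <code>NoSchedule</code> effect; <code>tolerates</code> and <code>can_schedule</code> are hypothetical helpers, not Kubernetes APIs.</p>
<pre><code class="language-python">def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An omitted effect on the toleration matches any taint effect.
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    # 'Exists' ignores the value; an omitted key matches every taint key.
    if toleration.get("operator", "Equal") == "Exists":
        return toleration.get("key") in (None, taint["key"])
    return toleration.get("key") == taint["key"] and toleration.get("value") == taint["value"]


def can_schedule(pod_tolerations, node_taints):
    """A pod is blocked if any NoSchedule taint has no matching toleration."""
    return all(
        any(tolerates(t, taint) for t in pod_tolerations)
        for taint in node_taints
        if taint["effect"] == "NoSchedule"
    )


arm_taint = {"key": "workload-type", "value": "arm-optimized", "effect": "NoSchedule"}
our_toleration = {"key": "workload-type", "operator": "Equal",
                  "value": "arm-optimized", "effect": "NoSchedule"}

print(can_schedule([our_toleration], [arm_taint]))  # True
print(can_schedule([], [arm_taint]))                # False
</code></pre>
<p>Note that a toleration only <em>permits</em> scheduling onto the tainted node. You still need the <code>nodeSelector</code> to steer the pod there.</p>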
<h3 id="heading-write-the-service-manifest">Write the Service Manifest</h3>
<p>We also need a Kubernetes Service to expose the pods over the network. Create <code>k8s/service.yaml</code>:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  name: hello-axion-svc
spec:
  selector:
    app: hello-axion
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
</code></pre>
<ul>
<li><p><code>selector: app: hello-axion</code> — the Service discovers pods using labels. Any pod with <code>app: hello-axion</code> on it will be added to this Service's load balancer pool.</p>
</li>
<li><p><code>port: 80</code> — the port the Service is reachable on from outside the cluster.</p>
</li>
<li><p><code>targetPort: 8080</code> — the port on the pod that traffic gets forwarded to. Our Go server listens on port 8080, so this must match.</p>
</li>
<li><p><code>type: LoadBalancer</code> — tells GKE to provision an external Google Cloud load balancer and assign it a public IP. This is what makes the Service reachable from the internet.</p>
</li>
</ul>
<h3 id="heading-apply-both-manifests">Apply Both Manifests</h3>
<pre><code class="language-bash">kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
</code></pre>
<p><code>kubectl apply</code> reads each manifest file and creates or updates the resources described in it. If the resources don't exist yet, they're created. If they already exist, Kubernetes only applies the diff — it won't restart pods unnecessarily.</p>
<p>Watch the pods come up in real time:</p>
<pre><code class="language-bash">kubectl get pods -w
</code></pre>
<p>The <code>-w</code> flag watches for changes and prints updates as they happen. You should see pods transition from <code>Pending</code> → <code>ContainerCreating</code> → <code>Running</code>. Once all three show <code>Running</code>, press <code>Ctrl+C</code> to stop watching.</p>
<h2 id="heading-step-9-verify-the-deployment">Step 9: Verify the Deployment</h2>
<p>Everything is running. Now we need evidence — not just that pods are up, but that they're on the right nodes and serving the right binary.</p>
<h3 id="heading-confirm-pod-placement">Confirm Pod Placement</h3>
<pre><code class="language-bash">kubectl get pods -o wide
</code></pre>
<p>The <code>-o wide</code> flag adds extra columns to the output, including the name of the node each pod was scheduled on. Look at the <code>NODE</code> column:</p>
<pre><code class="language-plaintext">NAME                          READY   STATUS    NODE
hello-axion-7b8d9f-abc12      1/1     Running   gke-axion-tutorial-axion-pool-a-...
hello-axion-7b8d9f-def34      1/1     Running   gke-axion-tutorial-axion-pool-b-...
hello-axion-7b8d9f-ghi56      1/1     Running   gke-axion-tutorial-axion-pool-c-...
</code></pre>
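<p>All three pods should show node names containing <code>axion-pool</code>, and none should show <code>default-pool</code>. Because the pool has only two nodes, at least two of your pods will share a node, so your <code>NODE</code> column will list at most two distinct names.</p>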
<h3 id="heading-confirm-the-nodes-are-arm">Confirm the Nodes Are ARM</h3>
<p>Take one of those node names and verify its architecture label:</p>
<pre><code class="language-bash">kubectl get node NODE_NAME --show-labels | grep kubernetes.io/arch
</code></pre>
<p>Replace <code>NODE_NAME</code> with one of the node names from the previous command. You should see:</p>
<pre><code class="language-plaintext">kubernetes.io/arch=arm64
</code></pre>
<p>That's the automatic label GKE applied when it provisioned the ARM hardware. Our <code>nodeSelector</code> matched on this label to pin the pods here.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/815312ea-e2bf-4106-863e-55cd0bdad5f7.png" alt="Terminal split into two sections: the top showing kubectl get pods -o wide with all pods scheduled on nodes containing axion-pool in the name, and the bottom showing kubectl get node with kubernetes.io/arch=arm64 in the labels output." style="display:block;margin:0 auto" width="2848" height="1500" loading="lazy">

<h3 id="heading-ask-the-application-itself">Ask the Application Itself</h3>
<p>This is the most satisfying verification step. Our Go server reports the architecture of the binary that's running. Let's ask it directly.</p>
<p>Use <code>kubectl port-forward</code> to open a tunnel from port 8080 on your local machine to port 8080 on one pod of the Deployment (when you target a Deployment, <code>kubectl</code> selects a single pod to forward to):</p>
<pre><code class="language-bash">kubectl port-forward deployment/hello-axion 8080:8080
</code></pre>
<p>This command stays running in the foreground — open a <strong>second terminal window</strong> and run:</p>
<pre><code class="language-bash">curl http://localhost:8080
</code></pre>
<p>You should see:</p>
<pre><code class="language-plaintext">Hello from freeCodeCamp!
Architecture : arm64
OS           : linux
Pod hostname : hello-axion-7b8d9f-abc12
</code></pre>
<p><code>Architecture : arm64</code>. That's our Go binary confirming that it was compiled for ARM64 and is executing on an ARM64 CPU. The single image tag we built does the right thing automatically.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/114ff82d-950f-4059-a1fa-89baffb90b6c.png" alt="Terminal output of curl http://localhost:8080 showing the four-line response: Hello from freeCodeCamp, Architecture: arm64, OS: linux, and the pod hostname." style="display:block;margin:0 auto" width="1042" height="292" loading="lazy">

<h3 id="heading-the-bonus-see-the-manifest-list-in-action">The Bonus: See the Manifest List in Action</h3>
<p>Want to see the multi-arch image indexing at work? Stop the port-forward, then run:</p>
<pre><code class="language-bash">docker buildx imagetools inspect \
  us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your actual Google Cloud project ID.</p>
<p>You'll see four entries in the manifest list. Two are real images — <code>Platform: linux/amd64</code> and <code>Platform: linux/arm64</code>. The other two show <code>Platform: unknown/unknown</code> with an <code>attestation-manifest</code> annotation. These are <strong>build provenance records</strong> that Docker Buildx automatically attaches to every image — a supply chain security feature (SLSA attestation) that proves how and where the image was built.</p>
<p>You may notice that if you check the image digest recorded in a running pod:</p>
<pre><code class="language-bash">kubectl get pod POD_NAME \
  -o jsonpath='{.status.containerStatuses[0].imageID}'
</code></pre>
<p>Replace <code>POD_NAME</code> with one of the pod names from earlier.</p>
<p>The digest returned matches the <strong>top-level manifest list digest</strong>, not the <code>arm64</code>-specific one. This is expected behaviour. Modern Kubernetes (using containerd) records the manifest list digest, not the resolved platform digest. The platform resolution already happened when the node pulled the correct image variant.</p>
<p>The definitive proof that the right binary is running is what you already have: the node labeled <code>kubernetes.io/arch=arm64</code> and the application reporting <code>Architecture: arm64</code>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/7dffe0c8-28cf-4a5d-8459-1e8db3da7dc0.png" alt="top-level manifest list digest" style="display:block;margin:0 auto" width="2302" height="1000" loading="lazy">

<h2 id="heading-step-10-cost-savings-and-tradeoffs">Step 10: Cost Savings and Tradeoffs</h2>
<p>The hands-on work is done. Let's talk about why any of this is worth the effort.</p>
<h3 id="heading-the-cost-math">The Cost Math</h3>
<p>At the time of writing, here's how ARM compares to equivalent x86 machines on Google Cloud (prices are approximate and change over time — check the <a href="https://cloud.google.com/compute/vm-instance-pricing">official pricing page</a> before making decisions):</p>
<table>
<thead>
<tr>
<th>Instance</th>
<th>vCPU</th>
<th>Memory</th>
<th>Approx. $/hour</th>
</tr>
</thead>
<tbody><tr>
<td><code>n2-standard-4</code> (x86)</td>
<td>4</td>
<td>16 GB</td>
<td>~$0.19</td>
</tr>
<tr>
<td><code>t2a-standard-4</code> (Tau ARM)</td>
<td>4</td>
<td>16 GB</td>
<td>~$0.14</td>
</tr>
<tr>
<td><code>c4a-standard-4</code> (Axion)</td>
<td>4</td>
<td>16 GB</td>
<td>~$0.15</td>
</tr>
</tbody></table>
<p>Based on the table, that's a raw reduction of roughly 21% (C4A) to 26% (T2A) in compute cost per node. Factor in Google's published claim of up to 65% better price-performance for Axion on relevant workloads, which means you may need fewer nodes to handle the same traffic, and the savings compound further.</p>
<p>Here's how that looks at scale, for a service running 20 nodes continuously for a year:</p>
<ul>
<li><p>20 × <code>n2-standard-4</code> × $0.19 × 8,760 hours = <strong>$33,288/year</strong></p>
</li>
<li><p>20 × <code>t2a-standard-4</code> × $0.14 × 8,760 hours = <strong>$24,528/year</strong></p>
</li>
</ul>
<p>That's roughly <strong>$8,760 saved annually</strong> on compute, before committed use discounts (which further widen the gap).</p>
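<p>If you want to rerun this arithmetic with your own node counts or current prices, a short Python sketch like the following works. The hourly rates are the approximate figures from the table above, expressed in whole cents so the result isn't affected by floating-point rounding; <code>annual_cost_cents</code> is a hypothetical helper.</p>
<pre><code class="language-python">HOURS_PER_YEAR = 8_760  # 24 hours x 365 days


def annual_cost_cents(nodes, cents_per_hour):
    """Annual cost in whole cents, using integers to avoid float rounding."""
    return nodes * cents_per_hour * HOURS_PER_YEAR


x86 = annual_cost_cents(20, 19)  # n2-standard-4 at roughly $0.19/hour
arm = annual_cost_cents(20, 14)  # t2a-standard-4 at roughly $0.14/hour

print(f"x86: ${x86 / 100:,.0f}")            # x86: $33,288
print(f"ARM: ${arm / 100:,.0f}")            # ARM: $24,528
print(f"Saved: ${(x86 - arm) / 100:,.0f}")  # Saved: $8,760
</code></pre>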
<h3 id="heading-when-arm-is-the-right-choice">When ARM Is the Right Choice</h3>
<p>ARM works best for:</p>
<ul>
<li><p><strong>Stateless API servers and web applications</strong> — like the app we built. ARM excels at high-throughput, low-latency network workloads.</p>
</li>
<li><p><strong>Background workers and queue processors</strong> — long-running services that don't depend on x86-specific binaries.</p>
</li>
<li><p><strong>Microservices written in Go, Rust, or Python</strong>: these languages have excellent ARM64 support. Go and Rust can cross-compile for ARM64 with their standard toolchains, and pure-Python code runs unchanged on an ARM64 interpreter.</p>
</li>
</ul>
<h3 id="heading-when-to-proceed-carefully">When to Proceed Carefully</h3>
<ul>
<li><p><strong>Native library dependencies</strong> — some older C libraries, proprietary SDKs, or compiled ML model-serving runtimes don't have ARM64 builds. Always audit your dependency tree before migrating.</p>
</li>
<li><p><strong>CI pipelines need ARM too</strong> — your automated tests should run on ARM, not just x86. An image that silently fails only on ARM is harder to debug than one that never claimed ARM support.</p>
</li>
<li><p><strong>Profile before optimizing</strong> — the cost savings are real, but measure your actual workload behavior on ARM before committing. Not every workload benefits equally.</p>
</li>
</ul>
<h2 id="heading-cleanup">Cleanup</h2>
<p>When you're done, clean up to avoid ongoing charges:</p>
<pre><code class="language-bash"># Remove the Kubernetes resources from the cluster
kubectl delete -f k8s/

# Delete the ARM node pool
gcloud container node-pools delete axion-pool \
  --cluster=axion-tutorial-cluster \
  --zone=us-central1-a

# Delete the cluster itself
gcloud container clusters delete axion-tutorial-cluster \
  --zone=us-central1-a

# Delete the images from Artifact Registry (optional — storage costs are minimal)
gcloud artifacts docker images delete \
  us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Let's recap what you built and why each part matters.</p>
<p>You started with a Go application, a Dockerfile, and a <code>docker buildx build</code> command that produced two images — one for x86, one for ARM64 — wrapped in a single Manifest List tag. Any server that pulls that tag gets the right binary automatically, without you maintaining separate pipelines or separate tags.</p>
<p>You provisioned a GKE cluster with two node pools running different CPU architectures, then used <code>nodeSelector</code> to make sure your ARM-optimized workload lands only on the ARM Axion nodes — not on x86 by accident. The result is a deployment that's both architecture-correct and cost-efficient.</p>
<p>The patterns you practiced here don't stop at this demo. The same Dockerfile technique works for any language with cross-compilation support. The same <code>nodeSelector</code> approach works for any workload you want to pin to ARM. As more teams migrate services to ARM over the coming years, having these skills will be a real asset.</p>
<p><strong>Where to go from here:</strong></p>
<ul>
<li><p>Add a GitHub Actions workflow that runs <code>docker buildx build --platform linux/amd64,linux/arm64</code> on every push, automating this entire process in CI.</p>
</li>
<li><p>Audit one of your existing stateless services for ARM compatibility and try migrating it.</p>
</li>
<li><p>Explore <strong>Node Affinity</strong> as a softer alternative to <code>nodeSelector</code> for workloads that can run on either architecture but prefer ARM.</p>
</li>
<li><p>Look into <strong>GKE Autopilot</strong>, which now supports ARM nodes and handles node pool management automatically.</p>
</li>
</ul>
<p>Happy building.</p>
<h2 id="heading-project-file-structure">Project File Structure</h2>
<pre><code class="language-plaintext">hello-axion/
├── app/
│   ├── main.go          — Go HTTP server
│   ├── go.mod           — Go module definition
│   └── Dockerfile       — Multi-stage Dockerfile
└── k8s/
    ├── deployment.yaml  — Deployment with nodeSelector and probes
    └── service.yaml     — LoadBalancer Service
</code></pre>
<p>All source files for this tutorial are available in the companion GitHub repository: <a href="https://github.com/Amiynarh/multi-arch-docker-gke-arm">https://github.com/Amiynarh/multi-arch-docker-gke-arm</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build and Deploy a Production-Ready WhatsApp Bot with FastAPI, Evolution API, Docker, EasyPanel, and GCP ]]>
                </title>
                <description>
                    <![CDATA[ WhatsApp bots are widely used for customer support, automated replies, notifications, and internal tools. Instead of relying on expensive third-party platforms, you can build and deploy your own self- ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-and-deploy-a-production-ready-whatsapp-bot/</link>
                <guid isPermaLink="false">699877ac3dc17c4862f466c7</guid>
                
                    <category>
                        <![CDATA[ chatbot ]]>
                    </category>
                
                    <category>
                        <![CDATA[ whatsapp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Raju Manoj ]]>
                </dc:creator>
                <pubDate>Fri, 20 Feb 2026 15:03:08 +0000</pubDate>
                <media:content url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/5e1e335a7a1d3fcc59028c64/de480f02-206a-4325-b4c2-788d33d746b1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>WhatsApp bots are widely used for customer support, automated replies, notifications, and internal tools. Instead of relying on expensive third-party platforms, you can build and deploy your own self-hosted WhatsApp bot using modern open-source tools.</p>
<p>In this tutorial, you’ll learn how to build and deploy a production-ready WhatsApp bot using:</p>
<ul>
<li><p>FastAPI</p>
</li>
<li><p>Evolution API</p>
</li>
<li><p>Docker</p>
</li>
<li><p>EasyPanel</p>
</li>
<li><p>Google Cloud Platform (GCP)</p>
</li>
</ul>
<p>By the end of this guide, you will have a fully working WhatsApp bot connected to your own WhatsApp account and deployed on a cloud virtual machine.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a href="#heading-how-the-architecture-works">How the Architecture Works</a></p>
</li>
<li><p><a href="#heading-how-your-whatsapp-bot-works">How Your WhatsApp Bot Works</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-step-1-create-firewall-rules-on-gcp">Step 1: Create Firewall Rules on GCP</a></p>
</li>
<li><p><a href="#heading-step-2-create-a-virtual-machine-ubuntu-2204">Step 2: Create a Virtual Machine (Ubuntu 22.04)</a></p>
</li>
<li><p><a href="#heading-step-3-ssh-into-the-vm">Step 3: SSH into the VM</a></p>
</li>
<li><p><a href="#heading-step-4-install-docker">Step 4: Install Docker</a></p>
</li>
<li><p><a href="#heading-step-5-install-easypanel">Step 5: Install EasyPanel</a></p>
</li>
<li><p><a href="#heading-step-6-open-the-easypanel-dashboard">Step 6: Open the EasyPanel Dashboard</a></p>
</li>
<li><p><a href="#heading-step-7-deploy-evolution-api">Step 7: Deploy Evolution API</a></p>
</li>
<li><p><a href="#heading-step-8-connect-whatsapp">Step 8: Connect WhatsApp</a></p>
</li>
<li><p><a href="#heading-step-9-deploy-the-fastapi-bot">Step 9: Deploy the FastAPI Bot</a></p>
</li>
<li><p><a href="#heading-step-10-connect-the-webhook-telling-evolution-api-where-to-send-messages">Step 10: Connect the Webhook - Telling Evolution API Where to Send Messages</a></p>
</li>
<li><p><a href="#heading-step-11-final-test">Step 11: Final Test</a></p>
</li>
<li><p><a href="#heading-production-considerations">Production Considerations</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-how-the-architecture-works">How the Architecture Works</h2>
<p>Before we start installing anything, let’s understand how the system works.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770444944919/cbcd4016-c98d-4379-9c7c-2d22ab9e309a.png" alt="This diagram shows how a WhatsApp message flows from the user’s WhatsApp app, through WhatsApp servers, into a GCP VM (via firewall, Docker, and EasyPanel) where Evolution API receives it, triggers a FastAPI bot via webhook, processes the logic, and sends the reply back to the user through WhatsApp." width="2619" height="4286" loading="lazy">

<h2 id="heading-how-your-whatsapp-bot-works">How Your WhatsApp Bot Works</h2>
<p>Before we continue setting things up, let's make sure you understand what's actually happening behind the scenes. Don't worry – no technical experience needed here.</p>
<h3 id="heading-imagine-a-postal-service">Imagine a postal service</h3>
<p>Think of your WhatsApp bot like a very fast, automated postal service:</p>
<ul>
<li><p>Someone sends you a letter (a WhatsApp message)</p>
</li>
<li><p>A postal worker (Evolution API) picks it up and brings it to your office</p>
</li>
<li><p>Your office manager (FastAPI bot) reads it and writes a reply</p>
</li>
<li><p>The postal worker takes the reply back and delivers it</p>
</li>
</ul>
<p>That's it. That's the whole system.</p>
<h3 id="heading-the-7-steps">The 7 steps</h3>
<ol>
<li><p>Someone sends a message to your WhatsApp number – just like texting a friend.</p>
</li>
<li><p>Evolution API notices the message – it's constantly watching your WhatsApp number for new messages, like a receptionist sitting by the phone.</p>
</li>
<li><p>Evolution API passes the message to your bot – it sends the message content to your app and says <em>"hey, you've got a new message!"</em></p>
</li>
<li><p>Your bot reads the message and decides what to say – this is where your code does its job.</p>
</li>
<li><p>Your bot sends the reply back to Evolution API – <em>"okay, send this response."</em></p>
</li>
<li><p>Evolution API delivers the reply through WhatsApp.</p>
</li>
<li><p>The user sees the reply on their phone – usually within seconds.</p>
</li>
</ol>
<h3 id="heading-one-line-summary">One line summary</h3>
<pre><code class="language-plaintext">User → WhatsApp → Evolution API → Your Bot → Evolution API → WhatsApp → User
</code></pre>
<p>Every step in this guide is just setting up one piece of that chain. Once they're all connected, the whole thing runs on its own automatically.</p>
<p>This architecture allows you to automate replies while keeping full control of your infrastructure.</p>
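<p>Before wiring up the real services, it can help to see the middle of that chain as plain Python. The sketch below is a stand-in only: <code>FakeEvolutionClient</code>, <code>handle_webhook</code>, and the payload fields <code>from</code> and <code>text</code> are hypothetical names chosen for illustration, not the actual Evolution API schema or the bot built later in this guide.</p>
<pre><code class="language-python">class FakeEvolutionClient:
    """Hypothetical stand-in for the Evolution API; it only records replies."""

    def __init__(self):
        self.sent = []

    def send_text(self, number, text):
        self.sent.append((number, text))


def decide_reply(text):
    """Step 4: the bot's logic. Here, a trivial keyword match."""
    if text.strip().lower() in ("hi", "hello"):
        return "Hello! How can I help you today?"
    return "Sorry, I didn't understand that."


def handle_webhook(payload, client):
    """Steps 3 to 5: receive the message, decide on a reply, hand it back."""
    reply = decide_reply(payload["text"])
    client.send_text(payload["from"], reply)
    return reply


client = FakeEvolutionClient()
handle_webhook({"from": "15550100", "text": "Hi"}, client)
print(client.sent)  # [('15550100', 'Hello! How can I help you today?')]
</code></pre>
<p>In the deployed system, the webhook handler is an HTTP endpoint and the client makes real requests to Evolution API, but the shape of the exchange is the same.</p>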
<h3 id="heading-why-these-tools">Why These Tools?</h3>
<p>Let’s briefly understand why we’re using each tool.</p>
<h4 id="heading-fastapi">FastAPI</h4>
<p>FastAPI is a modern Python framework for building APIs. It is fast, lightweight, and ideal for handling webhook requests from Evolution API.</p>
<h4 id="heading-evolution-api">Evolution API</h4>
<p>Evolution API is a self-hosted WhatsApp automation server built on top of Baileys. It connects your personal WhatsApp account without requiring official WhatsApp Business API approval.</p>
<h4 id="heading-docker">Docker</h4>
<p>Docker allows us to run applications in containers. This makes deployments consistent, portable, and production-ready.</p>
<h4 id="heading-easypanel">EasyPanel</h4>
<p>EasyPanel is a graphical platform for managing Docker services. Instead of writing Docker Compose files manually, we use EasyPanel’s UI to deploy and manage our services easily.</p>
<h4 id="heading-google-cloud-platform-gcp">Google Cloud Platform (GCP)</h4>
<p>GCP provides the virtual machine that hosts our infrastructure. We will use an Ubuntu 22.04 server to run Docker, EasyPanel, Evolution API, and our FastAPI bot.</p>
<p>I chose these tools because they are practical, lightweight, and suitable for real-world production deployments.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before starting, make sure you have:</p>
<ul>
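<li><p>A Google Cloud account (the commands in this guide use <code>gcloud</code> and Cloud Shell, so on AWS or Azure you would need to adapt them)</p>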
</li>
<li><p>Billing enabled</p>
</li>
<li><p>A project selected</p>
</li>
<li><p>Access to Cloud Shell</p>
</li>
<li><p>Basic Linux and Docker knowledge</p>
</li>
</ul>
<h2 id="heading-step-1-create-firewall-rules-on-gcp">Step 1: Create Firewall Rules on GCP</h2>
<p>We need to allow traffic to specific ports on our VM. So, we run this command in GCP Cloud Shell:</p>
<pre><code class="language-bash">gcloud compute firewall-rules create easypanel-whatsapp-fw \
 --network default \
 --direction INGRESS \
 --priority 1000 \
 --action ALLOW \
 --rules tcp:22,tcp:80,tcp:443,tcp:3000,tcp:8080,tcp:9000,tcp:5000-5999 \
 --source-ranges 0.0.0.0/0 \
 --description "SSH, EasyPanel, Evolution API, Bot"
</code></pre>
<p>This command:</p>
<ul>
<li><p>Creates a <strong>firewall rule</strong> named <code>easypanel-whatsapp-fw</code></p>
</li>
<li><p>On the <strong>default network</strong></p>
</li>
<li><p>Allows incoming internet traffic (<code>INGRESS</code>)</p>
</li>
<li><p>Opens these ports:</p>
<ul>
<li><p><code>22</code> → SSH (server access)</p>
</li>
<li><p><code>80</code> → HTTP</p>
</li>
<li><p><code>443</code> → HTTPS</p>
</li>
<li><p><code>3000, 8080, 9000</code> → App panels / APIs</p>
</li>
<li><p><code>5000-5999</code> → Custom app range</p>
</li>
</ul>
</li>
<li><p>Allows access from <strong>any IP address</strong> (<code>0.0.0.0/0</code>)</p>
</li>
</ul>
<p>In short, this rule opens these ports so that you and your users can reach the apps and services on your VM from the internet.</p>
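<p>To get a feel for what the <code>--rules</code> value covers, here is a small, hypothetical Python sketch that parses the same string and checks whether a given port falls inside it. The helper names <code>parse_rules</code> and <code>is_allowed</code> are made up for illustration and are not part of <code>gcloud</code>. The sketch only mirrors the port list, not the full behavior of GCP firewalls.</p>
<pre><code class="language-python"># The --rules value from the gcloud command in Step 1.
RULES = "tcp:22,tcp:80,tcp:443,tcp:3000,tcp:8080,tcp:9000,tcp:5000-5999"


def parse_rules(rules):
    """Turn 'tcp:22,tcp:5000-5999' into a list of (protocol, low, high) tuples."""
    parsed = []
    for item in rules.split(","):
        protocol, _, ports = item.strip().partition(":")
        low, _, high = ports.partition("-")
        # A single port such as '22' is treated as the range 22-22.
        parsed.append((protocol, int(low), int(high or low)))
    return parsed


def is_allowed(rules, protocol, port):
    """Return True if the protocol/port pair falls inside any parsed rule."""
    return any(
        protocol == rule_protocol and port in range(low, high + 1)
        for rule_protocol, low, high in parse_rules(rules)
    )


if __name__ == "__main__":
    for port in (22, 3000, 5432, 6000):
        print(port, is_allowed(RULES, "tcp", port))
</code></pre>
<p>Note that a port such as 5432 counts as open because it sits inside the <code>5000-5999</code> range, while 6000 does not. Wide ranges like this are convenient for a tutorial, but they are worth narrowing later.</p>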
<h2 id="heading-step-2-create-a-virtual-machine-ubuntu-2204">Step 2: Create a Virtual Machine (Ubuntu 22.04)</h2>
<p>Now we'll create the server that hosts everything. Run the following command in the GCP Cloud Shell to set up a virtual machine with Ubuntu 22.04.</p>
<pre><code class="language-bash">gcloud compute instances create whatsapp-vm \
  --zone=asia-south1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=30GB \
  --tags=easypanel
</code></pre>
<p>This command creates a new virtual machine (VM) on Google Cloud:</p>
<ul>
<li><p><strong>Name:</strong> <code>whatsapp-vm</code></p>
</li>
<li><p><strong>Location (zone):</strong> <code>asia-south1-a</code> (India region)</p>
</li>
<li><p><strong>Machine size:</strong> <code>e2-medium</code> (2 vCPU, 4GB RAM)</p>
</li>
<li><p><strong>Operating System:</strong> Ubuntu 22.04 LTS</p>
</li>
<li><p><strong>Disk size:</strong> 30GB</p>
</li>
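<li><p><strong>Tag:</strong> <code>easypanel</code> (a network tag that firewall rules can target with <code>--target-tags</code>; the rule from Step 1 does not set target tags, so it applies to every VM on the default network)</p>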
</li>
</ul>
<p>This creates a Linux server in Google Cloud that you can use to host EasyPanel, the WhatsApp bot, or your APIs.</p>
<p>Note: Wait about one minute for the instance to start.</p>
<h2 id="heading-step-3-ssh-into-the-vm">Step 3: SSH into the VM</h2>
<p>Connect to your server by using SSH to access the virtual machine you just created on Google Cloud.</p>
<pre><code class="language-bash">gcloud compute ssh whatsapp-vm --zone=asia-south1-a
</code></pre>
<p>This command connects to your virtual machine named <code>whatsapp-vm</code> in the zone <code>asia-south1-a</code> using SSH (secure remote login).</p>
<p>It logs you into your Google Cloud server so you can start installing software and running commands. After running this, you will see a terminal prompt – that means you are now inside your Ubuntu server and ready to go.</p>
<h2 id="heading-step-4-install-docker">Step 4: Install Docker</h2>
<p>Docker is needed to run EasyPanel and the Evolution API.</p>
<p><strong>First update the system:</strong></p>
<pre><code class="language-bash">sudo apt update -y
sudo apt install -y curl
</code></pre>
<p>This does two things:</p>
<ol>
<li><p><code>sudo apt update -y</code> → Updates your server’s package list (refreshes available software info).</p>
</li>
<li><p><code>sudo apt install -y curl</code> → Installs <strong>curl</strong>, a tool used to download things from the internet using the terminal.</p>
</li>
</ol>
<p>It prepares your server and installs a tool needed to download and install other software.</p>
<p><strong>Then install Docker:</strong></p>
<pre><code class="language-bash">curl -fsSL https://get.docker.com | sudo sh
</code></pre>
<p>This command uses <code>curl</code> to download Docker’s official installation script. The <code>|</code> (pipe) sends it directly to <code>sudo sh</code>, which runs the script as administrator.</p>
<p>When the script finishes, <strong>Docker</strong> is installed on your server.</p>
<p><strong>Enable Docker:</strong></p>
<pre><code class="language-bash">sudo systemctl enable docker
sudo systemctl start docker
</code></pre>
<p>These commands do two things:</p>
<ol>
<li><p><code>enable docker</code> → Makes Docker start automatically every time the server reboots.</p>
</li>
<li><p><code>start docker</code> → Starts Docker right now.</p>
</li>
</ol>
<p>It turns Docker ON now and makes sure it stays ON after restart.</p>
<p><strong>Allow your user to run Docker without sudo:</strong></p>
<pre><code class="language-bash">sudo usermod -aG docker $USER
</code></pre>
<p>This command adds the user you're logged in as (<code>$USER</code>) to the <code>docker</code> group.</p>
<p>This is important: by default, you must use <code>sudo</code> before every Docker command. After running this, your user can run Docker without <code>sudo</code>.</p>
<p>Note: When you connect with <code>gcloud compute ssh</code>, your username on the VM usually comes from your Google account or local username rather than being <code>ubuntu</code>, which is why the command uses <code>$USER</code> instead of a hard-coded name. You can check your username with <code>whoami</code>.</p>
<p><strong>Exit the session and reconnect:</strong></p>
<pre><code class="language-bash">exit
gcloud compute ssh whatsapp-vm --zone=asia-south1-a
</code></pre>
<ol>
<li><p><code>exit</code> → Logs you out of your current server session.</p>
</li>
<li><p><code>gcloud compute ssh whatsapp-vm --zone=asia-south1-a</code> → Logs you back into your Google Cloud VM.</p>
</li>
</ol>
<p>Why we do this: after adding your user to the <code>docker</code> group, you must log out and log back in for the new group membership to take effect.</p>
<p><strong>Test Docker:</strong></p>
<pre><code class="language-bash">docker run hello-world
</code></pre>
<p>This command downloads a small test image called <strong>hello-world</strong>, runs it inside Docker, and prints a success message if Docker is working correctly.</p>
<p>If you see “Hello from Docker!”, Docker is installed and working correctly.</p>
<h2 id="heading-step-5-install-easypanel">Step 5: Install EasyPanel</h2>
<p>EasyPanel provides a user interface for deploying Docker services. Run this command in the VM:</p>
<pre><code class="language-bash">curl -sSL https://get.easypanel.io | sudo bash
</code></pre>
<p>This command:</p>
<ul>
<li><p>Downloads the official <strong>EasyPanel installation script</strong></p>
</li>
<li><p>Runs it with administrator (sudo) permission</p>
</li>
<li><p>Automatically installs and configures EasyPanel on your server</p>
</li>
</ul>
<p>It installs EasyPanel on your VM so you can manage apps using a web dashboard instead of commands. Installation takes about one minute.</p>
<h2 id="heading-step-6-open-the-easypanel-dashboard">Step 6: Open the EasyPanel Dashboard</h2>
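<p>First, find your VM's external IP address: it's shown in the <code>EXTERNAL_IP</code> column of the output from Step 2 and on the VM instances page in the Google Cloud console. Then open a new tab in your browser and type it in like this:</p>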
<pre><code class="language-plaintext">http://&lt;YOUR_PUBLIC_IP&gt;:3000
</code></pre>
<p>For example, if your IP was 34.123.45.67, you would type:</p>
<pre><code class="language-plaintext">http://34.123.45.67:3000
</code></pre>
<p>EasyPanel runs on port 3000 by default, which is why we add :3000 at the end. Without it, your browser would try the default HTTP port (80) and wouldn't reach the dashboard.</p>
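<p>If the page doesn't load, it helps to check whether port 3000 is reachable at all before digging into EasyPanel itself. The following is a minimal sketch using Python's standard library. The function name <code>is_port_open</code> is an assumption made for this example, and a <code>True</code> result only means something accepted the TCP connection, not that EasyPanel is healthy.</p>
<pre><code class="language-python">import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
</code></pre>
<p>For example, <code>is_port_open("34.123.45.67", 3000)</code> would test the example address used above. Replace it with your VM's external IP. If it returns <code>False</code>, the firewall rule from Step 1 or the EasyPanel installation is the first place to look.</p>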
<p>The EasyPanel login page will appear. Since this is a fresh install, you need to create an admin account first.</p>
<p>Click <strong>“Create Admin Account”</strong>.</p>
<p>Fill in:</p>
<ul>
<li><p><strong>Username</strong> (choose something you’ll remember)</p>
</li>
<li><p><strong>Email</strong></p>
</li>
<li><p><strong>Password</strong> (make it strong!)</p>
</li>
<li><p>Submit the form.</p>
</li>
</ul>
<p>You are now logged in as the admin and can start managing apps, APIs, and bots through the EasyPanel dashboard.</p>
<p>You will see a page like the one below:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771478879926/c222ae8c-8714-4af2-a631-c6fead8185e8.png" alt="easypanel page" style="display:block;margin:0 auto" width="1908" height="1060" loading="lazy">

<h2 id="heading-step-7-deploy-evolution-api">Step 7: Deploy Evolution API</h2>
<ol>
<li><p>Create a new project (for example: <code>whatsapp-1</code>)</p>
</li>
<li><p>Go to Services → Templates</p>
</li>
<li><p>Select Evolution API</p>
</li>
<li><p>Deploy the latest version</p>
</li>
</ol>
<p>Wait until all services turn green. You will see a page like the one below.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771479080612/ce8453b2-e20e-4c37-b150-de4e032ba794.png" alt="Deploying evolution-api" style="display:block;margin:0 auto" width="1919" height="995" loading="lazy">

<p>Next, open Environment Variables and locate:</p>
<pre><code class="language-plaintext">AUTHENTICATION_API_KEY
</code></pre>
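<p>Copy the value of AUTHENTICATION_API_KEY. You'll need it to log in to the Evolution API manager and again when you configure the bot.</p>
<p>Next, open the Evolution API dashboard.</p>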
<p>Inside EasyPanel, find your Evolution API service. You will see a clickable domain link – it usually looks something like:</p>
<pre><code class="language-plaintext">https://evolution-api.easypanel.host
</code></pre>
<p>Click that link to open it in your browser. You will see a JSON response confirming the service is running.</p>
<p>To proceed with login, copy the <strong>Manager link</strong> displayed in the response. This link opens the management dashboard where you can authenticate and begin using the Evolution API. The screenshot below highlights the manager URL along with version details for easy reference.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771480117778/017524df-f8be-41bd-9d6c-0ba04b499fe9.png" alt="manager URL" style="display:block;margin:0 auto" width="840" height="396" loading="lazy">

<p>Open the manager link in a new tab, then paste the AUTHENTICATION_API_KEY you copied earlier to log in. The manager looks like this:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771480570638/9e18ca51-1637-4261-be22-7cd89ee0459f.png" alt="Evolution manager" style="display:block;margin:0 auto" width="1919" height="831" loading="lazy">

<p>Create a new instance:</p>
<ul>
<li><p>Choose channel: <strong>Baileys</strong></p>
</li>
<li><p>Leave phone number blank</p>
</li>
<li><p>Give your instance a name</p>
</li>
</ul>
<p>Save the instance.</p>
<h2 id="heading-step-8-connect-whatsapp">Step 8: Connect WhatsApp</h2>
<p>Inside your instance dashboard:</p>
<ol>
<li><p>Click <strong>Get QR</strong></p>
</li>
<li><p>Scan it using WhatsApp on your phone</p>
</li>
</ol>
<p>Once connected, your chats and contacts will sync automatically. If syncing fails, disconnect and reconnect the session.</p>
<h2 id="heading-step-9-deploy-the-fastapi-bot">Step 9: Deploy the FastAPI Bot</h2>
<p>Now we’ll deploy the bot service.</p>
<h3 id="heading-1-go-to-easypanel">1: Go to EasyPanel</h3>
<p>You’re opening the EasyPanel dashboard you just installed. This is where you can manage apps, servers, and services using a graphical interface instead of terminal commands.</p>
<h3 id="heading-2-create-a-new-project">2: Create a new project</h3>
<p>A “project” is like a container or folder for your bot service. It organizes all files, settings, and deployments for this app.</p>
<h3 id="heading-3-add-an-app-service">3: Add an App service</h3>
<p>“App service” means a running instance of your application. In this case, it will be the WhatsApp bot.</p>
<h3 id="heading-4-choose-git-deployment">4: Choose Git deployment</h3>
<p>Git deployment lets you connect a code repository to EasyPanel. EasyPanel will then download your code from GitHub and run it inside Docker.</p>
<h3 id="heading-5-paste-your-repository-url">5: Paste your repository URL</h3>
<pre><code class="language-plaintext">https://github.com/rajumanoj333/wabot
</code></pre>
<p>This is the GitHub repository containing the WhatsApp bot code. EasyPanel will clone this repo and prepare the app automatically.</p>
<h3 id="heading-6-domains-in-easypanel">6: Domains in EasyPanel</h3>
<p>This section lets you assign a URL or domain name to your app service. Even if you don’t have a custom domain, you can use your server’s public IP. Your WhatsApp bot app runs on port 9000 inside the server.</p>
<h3 id="heading-7-set-the-port-to-9000">7: Set the port to <code>9000</code></h3>
<p>Setting the domain's port to 9000 tells EasyPanel where to send traffic.</p>
<p>Example URL after this step:</p>
<pre><code class="language-plaintext">https://your-project.easypanel.host
</code></pre>
<p>This is the public address people (and other services) will use to reach your bot.</p>
<p>You’re telling EasyPanel:</p>
<blockquote>
<p>“Whenever someone accesses this project, forward them to the bot service running on port 9000.”</p>
</blockquote>
<p>Without this step, the bot service would run but <strong>you wouldn’t be able to access it from your browser or other apps</strong>.</p>
<h3 id="heading-configure-environment-variables">Configure Environment Variables</h3>
<p>Set the following variables:</p>
<pre><code class="language-plaintext">EVOLUTION_API_URL=http://evolution-api:8080
EVOLUTION_API_KEY=YOUR_AUTHENTICATION_API_KEY
INSTANCE_NAME=your_instance_name
</code></pre>
<p>Note: You might notice two different names here: AUTHENTICATION_API_KEY (set on the Evolution API service in EasyPanel) and EVOLUTION_API_KEY (read by your bot). They hold the same key. Copy the AUTHENTICATION_API_KEY value from the Evolution API service and paste it as the value of EVOLUTION_API_KEY in the bot service.</p>
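<p>How the bot in the repository reads these variables isn't shown in this guide, but a service like it would typically load them once at startup and fail early if one is missing. Here is a minimal sketch under that assumption. The function name <code>load_settings</code> and the returned dictionary keys are hypothetical.</p>
<pre><code class="language-python">import os

# The three variable names set in EasyPanel above.
REQUIRED = ("EVOLUTION_API_URL", "EVOLUTION_API_KEY", "INSTANCE_NAME")


def load_settings(env=None):
    """Read the three variables and fail early if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        # Strip a trailing slash so paths can be appended safely later.
        "api_url": env["EVOLUTION_API_URL"].rstrip("/"),
        "api_key": env["EVOLUTION_API_KEY"],
        "instance": env["INSTANCE_NAME"],
    }
</code></pre>
<p>Failing at startup with a clear message is usually easier to debug than a bot that starts fine and then silently fails to reply because its API key is empty.</p>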
<h2 id="heading-step-10-connect-the-webhook-telling-evolution-api-where-to-send-messages">Step 10: Connect the Webhook – Telling Evolution API Where to Send Messages</h2>
<p>At this point, you have two separate things running:</p>
<ol>
<li><p><strong>Evolution API</strong>: the service that connects to WhatsApp and handles messages</p>
</li>
<li><p><strong>Your app (the FastAPI bot)</strong>: the chatbot brain you deployed in the previous steps</p>
</li>
</ol>
<p>Right now, neither of them knows the other exists. They're like two people in different rooms with no way to pass notes between them. A <strong>webhook</strong> fixes that.</p>
<h3 id="heading-so-what-exactly-is-a-webhook">So what exactly is a webhook?</h3>
<p>A webhook is simply a URL (a web address) that you hand to one service so it can automatically notify another service when something happens.</p>
<p>You're going to tell Evolution API <em>"whenever a WhatsApp message arrives, forward it to this address."</em> Your app will be sitting at that address, waiting to receive it, read it, and send a reply.</p>
<p>Think of it like a forwarding address at the post office. When mail (a WhatsApp message) arrives, it gets automatically redirected to your app's door.</p>
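<p>To make the idea concrete, here is a framework-free sketch of a webhook receiver written with Python's standard library instead of FastAPI, so it runs without extra packages. It listens for <code>POST</code> requests on <code>/webhook</code> and stores whatever JSON arrives. The path and port 9000 come from this guide, while the names <code>WebhookHandler</code>, <code>RECEIVED</code>, and <code>make_server</code> are made up for the example. The bot in the repository may be structured quite differently.</p>
<pre><code class="language-python">import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # events collected by the handler, newest last


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the /webhook path accepts events; anything else is "the wrong door".
        if self.path != "/webhook":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            self.send_error(400)
            return
        RECEIVED.append(event)
        # Acknowledge quickly; the sender only needs to know the event arrived.
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port=9000):
    """Create (but do not start) a server listening on the given port."""
    return HTTPServer(("0.0.0.0", port), WebhookHandler)
</code></pre>
<p>Calling <code>make_server().serve_forever()</code> would start listening. In the real bot, FastAPI plays the role of this handler: it exposes a route at <code>/webhook</code> and hands the parsed JSON to your code.</p>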
<h3 id="heading-lets-set-it-up">Let's set it up</h3>
<h4 id="heading-1-open-your-evolution-api-dashboard">1. Open your Evolution API dashboard.</h4>
<p>You should already have this open from earlier steps. In the left sidebar, click on <strong>Events</strong>, then click on <strong>Webhook</strong>. This is where you control how Evolution API sends data to your app.</p>
<h4 id="heading-2-turn-the-webhook-on">2. Turn the webhook on.</h4>
<p>At the top of the page, you'll see a toggle next to the word <strong>"Enabled"</strong>. Click it so it turns green. This tells Evolution API that you want to start using a webhook.</p>
<h4 id="heading-3-enter-your-apps-webhook-url">3. Enter your app's webhook URL.</h4>
<p>In the <strong>URL</strong> field, type your app's address with <code>/webhook</code> added to the end, like this:</p>
<pre><code class="language-plaintext">https://your-domain.easypanel.host/webhook
</code></pre>
<p>Replace <code>your-domain</code> with the actual domain name you set up when you deployed your app. The <code>/webhook</code> part at the end is important: it's the specific endpoint your app exposes just for receiving these messages. Without it, Evolution API would be knocking on the wrong door.</p>
<h4 id="heading-4-leave-webhook-by-events-and-webhook-base64-turned-off-for-now">4. Leave "Webhook by Events" and "Webhook Base64" turned off for now.</h4>
<p>These are advanced options you won't need for a basic chatbot.</p>
<h4 id="heading-5-scroll-down-to-the-events-section-and-enable-these-two-events">5. Scroll down to the Events section and enable these two events:</h4>
<ul>
<li><p><strong>MESSAGES_UPSERT</strong>: This triggers every time someone sends your WhatsApp number a message. Without this, your app would never know a message arrived.</p>
</li>
<li><p><strong>SEND_MESSAGE</strong>: This triggers when a message is sent <em>out</em>. It helps your app confirm that replies are going through correctly.</p>
</li>
</ul>
<p>You can leave all the other events (like <code>APPLICATION_STARTUP</code>) turned off. They cover things like application startup, group updates, and contact updates, which aren't needed for what we're building.</p>
<h4 id="heading-6-click-save">6. Click Save.</h4>
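<p>Because both incoming and outgoing events can reach the same URL, the bot has to decide which ones deserve a reply. The sketch below shows one way to do that. It assumes the webhook body carries an <code>event</code> field named <code>messages.upsert</code> and a Baileys-style <code>data</code> object with <code>key.remoteJid</code>, <code>key.fromMe</code>, and the message text. These field names are assumptions for illustration, so check them against a real payload from your Evolution API version. The function name <code>extract_text</code> is hypothetical.</p>
<pre><code class="language-python">def extract_text(payload):
    """Return (sender, text) for an incoming text message, or None to ignore it."""
    # Assumed event name for the MESSAGES_UPSERT toggle.
    if payload.get("event") != "messages.upsert":
        return None
    data = payload.get("data") or {}
    key = data.get("key") or {}
    # Skip messages sent by the connected account, otherwise the bot could answer itself.
    if key.get("fromMe"):
        return None
    message = data.get("message") or {}
    extended = message.get("extendedTextMessage") or {}
    text = message.get("conversation") or extended.get("text")
    sender = key.get("remoteJid")
    if not text or not sender:
        return None
    return sender, text
</code></pre>
<p>Ignoring anything marked <code>fromMe</code> is a simple guard against reply loops, where the bot would otherwise treat its own outgoing messages as new input.</p>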
<h3 id="heading-quick-recap-of-what-you-just-did">Quick recap of what you just did</h3>
<p>You created a direct line between Evolution API and your app. Now, the moment someone messages your WhatsApp number, Evolution API will instantly pass that message along to your app. Your app reads it, figures out a response, and sends one back, all automatically.</p>
<p>This is the step that brings your chatbot to life. Without it, nothing would happen when someone sent you a message. With it, the whole system clicks into place.</p>
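<p>The "sends one back" half of that loop is an HTTP request from the bot to Evolution API. The sketch below builds such a request without sending it. It assumes a <code>/message/sendText/&lt;instance&gt;</code> endpoint, an <code>apikey</code> header, and a JSON body with <code>number</code> and <code>text</code> fields. Treat all three as assumptions to verify against the documentation for your Evolution API version. The function name <code>build_send_text_request</code> is hypothetical.</p>
<pre><code class="language-python">import json
import urllib.parse
import urllib.request


def build_send_text_request(api_url, api_key, instance, number, text):
    """Build (but do not send) the request asking Evolution API to deliver a reply."""
    # Assumed endpoint and body shape; check them against your Evolution API version.
    path = "/message/sendText/" + urllib.parse.quote(instance, safe="")
    url = api_url.rstrip("/") + path
    body = json.dumps({"number": number, "text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "apikey": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
</code></pre>
<p>Passing the result to <code>urllib.request.urlopen(request, timeout=10)</code> would actually send it. The three arguments <code>api_url</code>, <code>api_key</code>, and <code>instance</code> correspond to the <code>EVOLUTION_API_URL</code>, <code>EVOLUTION_API_KEY</code>, and <code>INSTANCE_NAME</code> variables from Step 9.</p>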
<h2 id="heading-step-11-final-test">Step 11: Final Test</h2>
<p>Send a message from a different WhatsApp number (not the connected one).</p>
<p>Send:</p>
<pre><code class="language-plaintext">Hi
</code></pre>
<p>If everything is configured correctly, your bot should reply:</p>
<pre><code class="language-plaintext">👋 Hello! Bot is working.
</code></pre>
<p>Congratulations! Your WhatsApp bot is now live.</p>
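<p>The repository's actual reply logic isn't shown in this guide, but a rule that produces the greeting above could be as small as the following hypothetical sketch. The function name <code>choose_reply</code> and the list of accepted greetings are assumptions.</p>
<pre><code class="language-python">GREETING_REPLY = "👋 Hello! Bot is working."


def choose_reply(text):
    """Return the reply for an incoming message, or None to stay silent."""
    # Hypothetical rule: answer greetings, ignore everything else.
    if text.strip().lower() in ("hi", "hello"):
        return GREETING_REPLY
    return None
</code></pre>
<p>Keeping this decision in its own function makes it easy to swap the fixed reply for something smarter later without touching the webhook handling.</p>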
<h2 id="heading-production-considerations">Production Considerations</h2>
<p>For real-world deployments, consider:</p>
<ul>
<li><p>Restricting firewall rules instead of allowing 0.0.0.0/0</p>
</li>
<li><p>Using HTTPS with a custom domain</p>
</li>
<li><p>Securing API keys with a secret manager</p>
</li>
<li><p>Monitoring logs and container health</p>
</li>
<li><p>Setting up automatic backups</p>
</li>
</ul>
<p>This tutorial demonstrates the core working system, but these improvements will make your deployment more secure and scalable.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You now have a fully self-hosted WhatsApp bot running on a cloud VM using FastAPI, Evolution API, Docker, EasyPanel, and GCP.</p>
<p>This setup gives you:</p>
<ul>
<li><p>Full control over infrastructure</p>
</li>
<li><p>No dependency on expensive SaaS platforms</p>
</li>
<li><p>Production-ready container deployment</p>
</li>
<li><p>Scalable architecture</p>
</li>
</ul>
<p>From here, you can extend your bot with:</p>
<ul>
<li><p><strong>AI integrations:</strong> connect your bot to ChatGPT, Gemini, or Claude so it can answer questions intelligently instead of just sending fixed replies.</p>
</li>
<li><p><strong>Database storage:</strong> save incoming messages, user details, or conversation history to a database like PostgreSQL or MongoDB.</p>
</li>
<li><p><strong>Custom automation workflows:</strong> trigger actions based on keywords, like sending a PDF when someone types "menu" or booking an appointment when they type "schedule".</p>
</li>
<li><p><strong>CRM integrations:</strong> connect your bot to tools like HubSpot or Notion to automatically log leads and customer conversations.</p>
</li>
</ul>
<p>Building your own infrastructure is one of the best ways to deeply understand how modern backend systems work together.</p>
<p>Happy building!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy Your Own Cockroach DB  Instance on Kubernetes [Full Book for Devs] ]]>
                </title>
                <description>
                    <![CDATA[ Developers are smart, wonderful people, and they’re some of the most logical thinkers you’ll ever meet. But we’re pretty terrible at naming things 😂 Like, what in the world – out of every other possible name, they decided to name a database after a ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/deploy-your-own-cockroach-db-instance-on-kubernetes-full-book-for-devs/</link>
                <guid isPermaLink="false">6925e482ccc8b29b82c002c5</guid>
                
                    <category>
                        <![CDATA[ cockroachdb ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Tue, 25 Nov 2025 17:16:50 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764088553942/496bf5f4-f059-4873-b6c1-419a86e594ef.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Developers are smart, wonderful people, and they’re some of the most logical thinkers you’ll ever meet. But we’re pretty terrible at naming things 😂</p>
<p>Like, what in the world – out of every other possible name, they decided to name a database after a <em>literal cockroach</em>? 🤣</p>
<p>I mean, I get it: cockroaches are known for being resilient, and the devs were probably trying to say “our database never dies”… but still…a cockroach?</p>
<p>The name aside, out of all the databases out there, you might be wondering: why would you choose CockroachDB? And if you did choose it, where would you even start when trying to host and deploy it? Would you go for a managed cloud service? Or could you actually self-manage it?</p>
<p>If you ever thought of doing it yourself – maybe in a dev environment, or even introducing it to your company – how would you go about it?</p>
<p>Well, just calm your nerves 😄</p>
<p>In this book, we’ll explore everything you need to know about <strong>deploying and managing CockroachDB on Kubernetes</strong>. We’ll dive deep into:</p>
<ul>
<li><p>Understanding how CockroachDB’s masterless (multi-primary) architecture actually works</p>
</li>
<li><p>Setting up and deploying CockroachDB on a Kubernetes cluster</p>
</li>
<li><p>Automating backups to Google Cloud Storage using just a few queries in the CockroachDB cluster</p>
</li>
<li><p>Managing service accounts and authentication securely</p>
</li>
<li><p>Tuning CockroachDB’s memory settings for stable performance</p>
</li>
<li><p>Scaling the cluster horizontally and vertically without downtime</p>
</li>
<li><p>Monitoring and maintaining the database like a pro</p>
</li>
</ul>
<p>By the end, you’ll not only understand how CockroachDB works, you’ll be confident enough to deploy and manage your own resilient, production-ready instance. 🚀</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-even-is-cockroachdb">What Even Is CockroachDB? 🤔</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-simple-definition">Simple Definition</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-who-made-cockroachdb-when-was-it-released">Who Made CockroachDB? When Was it Released?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-problems-does-cockroachdb-try-to-solve">What Problems Does CockroachDB Try to Solve?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-terms-you-should-know-in-plain-language">Key Terms You Should Know (in plain language):</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-the-name-cockroachdb">Why the name “CockroachDB”? 😅</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-why-choose-cockroachdb-over-postgresql-or-mongodb">Why Choose CockroachDB Over PostgreSQL or MongoDB 🤷🏾‍♂️?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-fault-tolerance-is-handled-in-postgresql-and-mongodb">How Fault Tolerance is Handled in PostgreSQL and MongoDB</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-handles-it-differently">How CockroachDB Handles It Differently</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-works-behind-the-scenes">How CockroachDB Works Behind the Scenes ⚙️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-ranges-the-small-pieces-of-data">Ranges: The Small Pieces of Data</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-replication-many-copies-for-safety">Replication: Many Copies for Safety</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-raft-consensus-how-all-copies-agree">Raft Consensus: How All Copies Agree</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multiraft-keeping-raft-efficient-when-things-scale">MultiRaft: Keeping Raft Efficient When Things Scale</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-rebalancing-movement-for-balance">Rebalancing: Movement for Balance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-distributed-transactions-doing-work-across-multiple-ranges">Distributed Transactions: Doing Work Across Multiple Ranges</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-it-all-fits-together-read-write-flow-what-happens-when-you-use-it">How It All Fits Together: Read + Write Flow (What Happens When You Use It)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-this-all-matters-putting-it-in-plain-english">Why This All Matters (Putting It in Plain English)</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-where-and-how-should-you-host-cockroachdb">Where (and How) Should You Host CockroachDB? ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-option-1-cockroachdb-cloud-fully-managed-by-cockroach-labs">Option 1: CockroachDB Cloud (fully managed by Cockroach Labs)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-2-bring-your-own-cloud-byoc">Option 2: Bring Your Own Cloud (BYOC)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-3-use-cloud-marketplaces-aws-gcp-azure">Option 3: Use Cloud Marketplaces (AWS, GCP, Azure)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-4-my-favorite-self-hosting-especially-using-kubernetes">Option 4 (My Favorite 😁): Self-Hosting — Especially Using Kubernetes</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-setting-up-your-local-environment">Setting Up Your Local Environment 🧑‍💻</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-these-tools">Why these tools?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-install-minikube">Step 1: Install Minikube</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-install-kubectl">Step 2: Install kubectl</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-install-helm">Step 3: Install Helm</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-deploying-cockroachdb-on-minikube-the-fun-part-begins">Deploying CockroachDB on Minikube (The Fun Part Begins 😁!)</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-visit-artifacthub">Step 1: Visit ArtifactHub</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-explore-the-helm-chart">Step 2: Explore the Helm Chart</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-copy-the-default-values">Step 3: Copy the Default Values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-create-a-folder-for-our-project">Step 4: Create a Folder for Our Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-understanding-the-key-configurations">Step 5: Understanding the Key Configurations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-create-a-simplified-values-config-for-the-cockroachdb-helm-chart">Step 6: Create a Simplified Values Config for the CockroachDB Helm Chart</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-overview-of-the-yaml-values">Overview of the YAML values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-install-the-cockroachdb-cluster-using-helm">🚀 Step 7: Install the CockroachDB Cluster Using Helm</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-accessing-the-cockroachdb-console-amp-viewing-metrics">Accessing the CockroachDB Console &amp; Viewing Metrics</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-locate-the-cockroachdb-public-service">Step 1: Locate the CockroachDB Public Service</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-learn-more-about-the-service">Step 2: Learn More About the Service</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-access-the-cockroachdb-dashboard">Step 3: Access the CockroachDB Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-visit-the-dashboard">Step 4: Visit the Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-exploring-the-metrics-dashboard">Step 5: Exploring the Metrics Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-creating-a-little-load-on-the-cockroachdb-cluster">Step 6: Creating a Little Load on the CockroachDB Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-viewing-the-metrics-from-the-load">Step 7: Viewing the Metrics from the Load</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-8-view-the-list-of-created-items-in-the-database">Step 8: View the List of Created Items in the Database</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-backing-up-cockroachdb-to-google-cloud-storage">Backing Up CockroachDB to Google Cloud Storage ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-backups-are-absolutely-critical">Why Backups Are Absolutely Critical</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-db-installing-beekeeper-studio">Connecting to Our DB – Installing Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-install-beekeeper-studio">How to Install Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-beekeeper-studio-to-cockroachdb">Connecting Beekeeper Studio to CockroachDB</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-exposing-the-cluster-for-local-access">Exposing the Cluster for Local Access</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-beekeeper-studio">🐝 Connecting via Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-verify-the-connection">Verify the Connection</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-google-cloud-account">Creating a Google Cloud Account</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-google-cloud-storage-bucket">Creating a Google Cloud Storage Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-giving-cockroachdb-access-to-the-bucket">Giving CockroachDB Access to the Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-attaching-the-key-to-our-cockroachdb-cluster">Attaching the Key to Our CockroachDB Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-our-backup-disaster-recovery-time">Testing Our Backup — Disaster Recovery Time</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-managing-resources-amp-optimizing-memory-usage">Managing Resources &amp; Optimizing Memory Usage</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-uses-memory">How CockroachDB Uses Memory</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-memory-usage-formula-you-must-follow">The Memory Usage Formula You Must Follow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-where-you-find-these-settings">Where You Find These Settings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-concrete-example-step-by-step">Concrete Example (Step-by-Step)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-on-requests-vs-limits-in-kubernetes">⚠️ On Requests vs Limits in Kubernetes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-overriding-the-default-fractions">Overriding the Default Fractions</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-scaling-cockroachdb-the-right-way">Scaling CockroachDB the Right Way</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-key-metrics-to-understand">Key Metrics to Understand</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-and-what-to-scale-based-on-your-metrics">When (and What) to Scale Based on Your Metrics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-disk-bound-situations-what-to-do-when-your-disk-is-the-limiting-factor">Disk-Bound Situations — What to Do When Your Disk Is the Limiting Factor</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-memory-pressure-what-to-do-when-your-database-hits-the-limit">Memory Pressure — What to Do When Your Database Hits the Limit</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-queries-are-slow-but-everything-else-cpu-memory-amp-disk-looks-fine">When Queries Are Slow but Everything Else (CPU, Memory &amp; Disk) Looks “Fine”</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-disk-speed-iops-amp-throughput-across-cloud-providers">Understanding Disk Speed (IOPS &amp; Throughput) Across Cloud Providers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-downsizing-the-cluster-reducing-replicas">Downsizing the Cluster (Reducing Replicas)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-wrong-way-to-downscale">⚠️ The Wrong Way to Downscale</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-decommissioning-a-node-before-scaling-down-the-cluster">Decommissioning a Node Before Scaling Down the Cluster</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-to-consider-when-deploying-cockroachdb-on-google-kubernetes-engine-gke">What to Consider When Deploying CockroachDB on Google Kubernetes Engine (GKE) ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-creating-your-gke-cluster">Creating Your GKE Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-your-gke-cluster">Connecting to your GKE cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-deploying-cockroachdb-in-production-on-gke">Deploying CockroachDB in Production (on GKE)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-configuration">Understanding the Configuration</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-installing-the-cockroachdb-cluster-on-gke">Installing the CockroachDB Cluster on GKE</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-cockroachdb-cluster-now-that-tls-mtls-are-enabled">Connecting to Our CockroachDB Cluster (Now That TLS + mTLS Are Enabled)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-mutual-tls-mtls-why-we-need-a-certificate-for-our-root-user">Connecting via Mutual TLS (mTLS) — Why We Need a Certificate for Our root User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-lets-explore-our-clusters-certificate">Let’s Explore Our Cluster’s Certificate</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-certificate-sections-explained-super-simply">Understanding the Certificate Sections (Explained Super Simply)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-client-certificate-so-we-can-finally-connect-to-cockroachdb">Creating a Client Certificate (So We Can Finally Connect to CockroachDB)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-cockroachdb-cluster-securely-using-mtls">Connecting to Our CockroachDB Cluster Securely (Using mTLS)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-restoring-our-previous-database-into-the-new-gke-cockroachdb-cluster-without-sa-keys">Restoring Our Previous Database into the New GKE CockroachDB Cluster (without SA keys)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-restoring-our-previous-database-from-google-cloud-storage">Restoring Our Previous Database from Google Cloud Storage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-now-lets-restore-the-data">Now, Let’s Restore the Data 🎉</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-the-database-with-a-new-user">Connecting to the Database with a New User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-with-passwordless-authentication-mutual-tls">Connecting with Passwordless Authentication (Mutual TLS)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-mutual-tls-mtls-from-our-apps-on-kubernetes">Connecting via Mutual TLS (mTLS) from Our Apps on Kubernetes</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-get-a-cockroachdb-enterprise-license-for-free">How to Get a CockroachDB Enterprise License for FREEE!</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-three-types-of-licenses">Three Types of Licenses</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-apply-for-the-free-enterprise-license">How to Apply for the Free Enterprise License</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-your-license-to-the-cockroachdb-cluster">Adding Your License to the CockroachDB Cluster</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion-amp-next-steps">Conclusion &amp; Next Steps ✨</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-about-the-author">About the Author 👨🏾‍💻</a></li>
</ul>
</li>
</ol>
<h2 id="heading-what-even-is-cockroachdb">What Even Is CockroachDB? 🤔</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760416037885/c67edcbb-be85-4614-bdf3-104942048eea.jpeg" alt="An image summarizing what CockroachDB is" class="image--center mx-auto" width="1307" height="1697" loading="lazy"></p>
<p>Hey! Before we jump into setting up our Kubernetes cluster and deploying our CockroachDB cluster, let’s get grounded in what CockroachDB really is. (Because if you don’t understand the why and how, the implementation and practical session will just feel like magic 😅.)</p>
<h3 id="heading-simple-definition">Simple Definition</h3>
<p>CockroachDB is a distributed SQL database. This means it gives you the features of a relational database (tables, SQL queries, JOINs, transactions) but copies data across multiple replicas (servers, nodes, instances). No need to shard manually. 😃</p>
<p>It’s built to survive failures, scale easily (compared to other SQL databases), and keep your data consistent no matter what (across all the instances).</p>
<h3 id="heading-who-made-cockroachdb-when-was-it-released">Who Made CockroachDB? When Was it Released?</h3>
<p>CockroachDB was created by <a target="_blank" href="https://www.cockroachlabs.com/"><strong>Cockroach Labs</strong></a>, founded by Spencer Kimball, Peter Mattis, and Ben Darnell. The idea first started taking shape around 2014, and by 2015 Cockroach Labs was formally founded.</p>
<p>Its 1.0 “production-ready” version was announced in 2017, marking its transition from beta to being suitable for real-world use.</p>
<h3 id="heading-what-problems-does-cockroachdb-try-to-solve">What Problems Does CockroachDB Try to Solve?</h3>
<p>Traditional relational databases are great, but they run into real challenges when your app grows. CockroachDB was built to solve those. Here are the key pain points and how CockroachDB addresses them:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pain Point</td><td>What usually happens</td><td>How CockroachDB fixes it</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Single primary bottleneck</strong></td><td>ONLY ONE “primary” node handles writes, updates, and deletes. That node can become difficult to scale (adapt to the DB usage) without downtime</td><td>CockroachDB is <strong>multi-primary</strong>, meaning every node can accept reads and writes. No single “primary” for the entire cluster.</td></tr>
<tr>
<td><strong>Manual sharding complexity</strong></td><td>You have to split data (shard) by hand, decide which piece goes where, and handle cross-shard queries. Lots of headaches 😖.</td><td>CockroachDB automatically partitions data into smaller units (called <em>ranges</em>) and moves them around to balance load.</td></tr>
<tr>
<td><strong>Failover downtime</strong></td><td>If the primary node fails, you need to promote a replica (read-only instance) and switch over. During that time, your app might be down.</td><td>Because there’s no single primary, if one of the instances fails, others take over seamlessly (via consensus) without a big outage.</td></tr>
<tr>
<td><strong>Geographic scaling &amp; latency</strong></td><td>Serving users in different regions is hard — either data is far away (slow) or you must build complex replication logic.</td><td>CockroachDB lets you distribute nodes across regions. You can serve local reads/writes while keeping global consistency.</td></tr>
</tbody>
</table>
</div><p>So instead of fighting your database as it grows, CockroachDB handles much of the hard work for you.</p>
<h3 id="heading-key-terms-you-should-know-in-plain-language">Key Terms You Should Know (in plain language):</h3>
<ul>
<li><p><strong>Node</strong>: a single running instance (server) of your database within a cluster. In traditional setups, nodes that hold copies of the data are also known as replicas, and they can be read-only (data can only be read, for example using SELECT statements) or read-write (data can be read, created, updated, and deleted). In CockroachDB, every node is read-write.</p>
</li>
<li><p><strong>Replication</strong>: making copies of data on multiple nodes. If one node fails, others still have the data.</p>
</li>
<li><p><strong>Raft (consensus algorithm)</strong>: a system that ensures copies (replicas) agree on changes in a safe, reliable way. For example, when you want to write data, Raft ensures that most copies agree before it’s accepted.</p>
</li>
<li><p><strong>Sharding / Ranges</strong>: Instead of putting all your data in one big blob, CockroachDB splits it into smaller chunks called <em>ranges</em>. Each range is replicated and can move between nodes.</p>
</li>
<li><p><strong>Distributed transaction</strong>: a transaction (series of operations) that might touch data stored in different nodes. CockroachDB manages this, so you still get ACID (atomic, consistent, isolated, durable) properties.</p>
</li>
</ul>
<h3 id="heading-why-the-name-cockroachdb">Why the name “CockroachDB”? 😅</h3>
<p>You might wonder: <em>Why name a database after a cockroach?</em> It sounds weird at first, but there's a reason:</p>
<p>Cockroaches are known for surviving harsh conditions: radiation, natural disasters, and so on. The founders wanted a database that feels almost “impossible to kill,” that can survive node failures, outages, and network splits. The name is a tongue-in-cheek nod to resilience.</p>
<h2 id="heading-why-choose-cockroachdb-over-postgresql-or-mongodb">Why Choose CockroachDB Over PostgreSQL or MongoDB 🤷🏾‍♂️?</h2>
<p>Let’s compare the classic setup (Postgres / MongoDB) to CockroachDB, especially why you might want to go with CockroachDB, and how it helps ease scaling. I’ll also explain some terms to make sure you’re following.</p>
<p>In many setups, when you use Postgres or MongoDB, you’ll often have one “primary” node that handles all writes (that is, inserts, updates, deletes).</p>
<p>Then you have multiple “read replicas” that copy the primary’s data and serve read requests (selects). That works okay – reads can be spread out – but all write traffic goes to that one primary node.</p>
<p>Usually, the primary eventually gets stressed when the write volume grows (for example, more customers create accounts and products on your platform).</p>
<p>You can add more read replicas (horizontal scaling for reads, for example customers trying to view their accounts, or previously created products on your site), but scaling the primary is much harder.</p>
<p>To scale the primary, you often resort to upgrading its resources (CPU, RAM, disk) – that’s vertical scaling – which often needs downtime (shut down the primary database, increase its CPU and RAM, then spin it back up).</p>
<p>Or you’d have to manually shard (split) your data across multiple primaries, route traffic carefully, and manage complexity.</p>
<h3 id="heading-how-fault-tolerance-is-handled-in-postgresql-and-mongodb">How Fault Tolerance is Handled in PostgreSQL and MongoDB</h3>
<p>When you try to make Postgres (or MongoDB) highly available and fault tolerant in a self-managed setup, you often need one primary and two or more read replicas.</p>
<p>The tricky part is handling what happens when the primary fails (or is taken down temporarily for an upgrade). You need something that can promote a replica to a primary automatically.</p>
<p>In Postgres land, that’s often handled by <a target="_blank" href="https://github.com/patroni/patroni"><strong>Patroni</strong></a> or <a target="_blank" href="https://www.repmgr.org/"><strong>repmgr</strong></a> (tools that handle cluster management, failover, leader election, and so on).</p>
<p>In MongoDB, such logic is part of the <strong>replica set</strong> behavior: it does automatic elections among replicas.</p>
<p>Here are some of the core challenges with that classic model:</p>
<ul>
<li><p>Every write must go to a single primary. If that primary fails or is overloaded, your whole system suffers.</p>
</li>
<li><p>Scaling reads is easy (add more replicas), but scaling writes is hard.</p>
</li>
<li><p>Vertical scaling (give more resources to one server) has its cons. If the primary node needs more resources, you might experience some downtime when it’s being scaled up.</p>
</li>
<li><p>Manual sharding is messy: you decide which piece of data goes to which shard, handle cross-shard queries, and build routing logic. That’s a lot of maintenance and can lead to unexpected issues if not handled properly.</p>
</li>
</ul>
<p>On top of that, you have to run routing infrastructure in front of the database:</p>
<ul>
<li><p>One service (or load balancer/proxy) points to the primary (for ALL write queries).</p>
</li>
<li><p>Another service or routing logic handles read queries and can share reads across replicas.</p>
</li>
<li><p>You might use <strong>HAProxy</strong>, <strong>Pgpool-II</strong>, or <strong>PgBouncer</strong> for Postgres to route traffic, do read/write splitting, or manage connection pooling. These are external tools (not part of the database core) that you have to configure.</p>
</li>
</ul>
<p>So when the primary fails, Patroni (or repmgr, and so on) will detect it and promote one of the read replicas to be the new primary.</p>
<p>But that promotion, reconfiguration, and traffic rerouting often cause a brief window of downtime (when your primary database node becomes unavailable).</p>
<h3 id="heading-how-cockroachdb-handles-it-differently">How CockroachDB Handles It Differently</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760416070693/af1ade70-19bb-4e9f-82ec-9711c13d8079.jpeg" alt="A brief look at CockroachDB properties" class="image--center mx-auto" width="800" height="800" loading="lazy"></p>
<p>CockroachDB changes the rules:</p>
<ul>
<li><p><strong>All replicas are equal</strong> for reads <em>and</em> writes. You don’t have a special “primary” that handles writes. Every node in the cluster can accept write requests.</p>
</li>
<li><p>CockroachDB breaks your data into small chunks (ranges) and replicates them across nodes. If you add a new node, data moves around automatically to balance the load.</p>
</li>
<li><p>Every write is automatically copied to other replicas, and consistency is managed by a protocol (Raft), so you don’t have to build this yourself.</p>
</li>
<li><p>No manual sharding needed. Because the database handles how data is split and moved, you don’t need to decide how to shard by hand.</p>
</li>
<li><p>You <strong>don’t need a special service</strong> to route write queries separately from read queries. Any node can accept both reads <strong>and</strong> writes.</p>
</li>
<li><p>During scaling, you don’t have to worry about which node is the primary – because <em>there is no primary</em>.</p>
</li>
<li><p>You can scale your nodes one at a time (rollout style). When one node is being upgraded, the others continue to serve traffic. You won’t hit a downtime window just because you're scaling the “primary.”</p>
</li>
<li><p>Because there's no replica promotion logic to fight with, there's no moment where a replica needs to be “elevated” to primary – it’s all just nodes continuing to serve.</p>
</li>
</ul>
<h2 id="heading-how-cockroachdb-works-behind-the-scenes">How CockroachDB Works Behind the Scenes ⚙️</h2>
<p>In CockroachDB, there are many moving parts behind the scenes. But they work together, so you don’t have to babysit them. The core ideas, which we’ve mostly already touched on, are:</p>
<ul>
<li><p>Splitting data into pieces (<strong>ranges</strong>)</p>
</li>
<li><p>Keeping multiple copies of each piece (<strong>replicas/replication</strong>)</p>
</li>
<li><p>Making sure all copies agree via <strong>Raft consensus</strong></p>
</li>
<li><p>Moving pieces around to balance the load (<strong>automatic rebalancing/distribution</strong>)</p>
</li>
<li><p>Coordinating transactions that might touch many pieces</p>
</li>
</ul>
<p>Let’s go through each of those, one by one.</p>
<h3 id="heading-ranges-the-small-pieces-of-data">Ranges: The Small Pieces of Data</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760413105037/984f8b5c-bd53-4850-9704-57ce1dcedb80.png" alt="A little depiction of CockroachDB ranges" class="image--center mx-auto" width="977" height="445" loading="lazy"></p>
<p>Imagine you have a giant book of recipes. If you try to carry the whole thing, it’s heavy. So you split the book into smaller booklets, each covering recipes for a certain range of meals: breakfasts, lunches, dinners, desserts.</p>
<p>In CockroachDB, data is split into ranges, which are like those smaller booklets:</p>
<ul>
<li><p>Each range covers a certain block of data (like “all users whose ID is 1-1000”)</p>
</li>
<li><p>When a range gets too big (like having too many recipes in one booklet) it’s cut/split into two smaller ones. That makes each piece easier to manage.</p>
</li>
<li><p>If two neighboring ranges have become very small (few recipes), they might be merged (joined) back together so you’re not keeping too many tiny booklets.</p>
</li>
<li><p>These splits and merges happen automatically, behind the scenes, so the database stays smooth as things grow or shrink.</p>
</li>
</ul>
<p>This chopping helps the system in many ways: moving pieces, copying them, balancing load, and recovering from node failures all become easier.</p>
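<p>To make the idea of ranges concrete, here is a minimal Python sketch (not CockroachDB’s actual code). It assumes each range is identified by its first key, and the names <code>range_starts</code>, <code>find_range</code>, and <code>split_at</code> are made up for illustration. Real ranges split automatically based on size, not on hand-picked keys.</p>
<pre><code class="language-python">import bisect

# Hypothetical layout: each range covers the keys from its start key
# up to (but not including) the next range's start key.
range_starts = [1, 1001, 2001]


def find_range(user_id):
    """Return the index of the range that holds user_id."""
    return bisect.bisect_right(range_starts, user_id) - 1


def split_at(split_key):
    """Split one range in two by adding a new boundary at split_key."""
    bisect.insort(range_starts, split_key)
</code></pre>
<p>With this layout, <code>find_range(750)</code> returns <code>0</code>. After <code>split_at(501)</code>, the same call returns <code>1</code>, because the first booklet was cut into two smaller ones.</p>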
<h3 id="heading-replication-many-copies-for-safety">Replication: Many Copies for Safety</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760413678362/a0066780-1360-4511-8fd0-466f54ea2135.jpeg" alt="Replication of Ranges across multiple Nodes (databases) in CockroachDB" class="image--center mx-auto" width="1024" height="448" loading="lazy"></p>
<p>Nobody likes losing their work, so you keep backup copies. CockroachDB does this for data as well.</p>
<p>For each range, there are usually 3 copies (replicas) stored on different machines (nodes), as described in the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/architecture/replication-layer">replication layer documentation</a>. If one machine dies, you still have the others. These copies are always kept in sync: when you write something (for example, insert or update), the change is propagated to the other copies.</p>
<p>The database also tolerates failures. If one node goes down, the system detects it and eventually makes a new copy elsewhere to replace it. So the target number of copies is maintained. This gives you fault tolerance: your data stays safe even when parts of your system fail.</p>
<h3 id="heading-raft-consensus-how-all-copies-agree">Raft Consensus: How All Copies Agree</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760415307117/79859a4b-4341-46eb-91d9-cccc3bde9a66.jpeg" alt="Raft consensus among the replicas of a range in CockroachDB" class="image--center mx-auto" width="1024" height="448" loading="lazy"></p>
<p>Having copies is useful, but you also need them to agree with each other – like making sure every copy of a recipe booklet has the same content. The Raft protocol is a way to make sure that happens reliably.</p>
<p>Here’s how Raft works in simple terms:</p>
<ul>
<li><p>Each range has a group of replicas. One of these replicas acts as the <strong>leader</strong>. Others are <strong>followers</strong>.</p>
</li>
<li><p>All write requests for that range go through the leader. The leader gets the request, then tells followers to record the same change.</p>
</li>
<li><p>Once most of the copies (a majority) say “yep, we got it,” the change is considered final (committed). Then the leader tells the client, “Done.”</p>
</li>
<li><p>If the leader stops working (the machine dies or the network fails), the followers notice it (they stop getting regular “I’m alive” messages), then they hold an election to pick a new leader, and the show goes on.</p>
</li>
<li><p>This way, the system ensures everyone has the same final data and no conflicting changes happen.</p>
</li>
</ul>
<p>So Raft is the agreement protocol that keeps all copies in sync and safe.</p>
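<p>The “most of the copies” rule is simple arithmetic. The sketch below is only an illustration of majority voting, not Raft itself (it leaves out leader election, logs, and heartbeats), and the function names are hypothetical.</p>
<pre><code class="language-python">def quorum(replicas):
    """Smallest number of replicas that makes up a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be lost while a majority is still reachable."""
    return replicas - quorum(replicas)


def is_committed(acks, replicas):
    """A write is committed once a majority of replicas acknowledged it."""
    return acks in range(quorum(replicas), replicas + 1)
</code></pre>
<p>With 3 replicas, <code>quorum(3)</code> is <code>2</code> and <code>tolerated_failures(3)</code> is <code>1</code>: a write needs two “yep, we got it” answers, and the range keeps working if one replica is down. With 5 replicas, it can survive two failures.</p>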
<h3 id="heading-multiraft-keeping-raft-efficient-when-things-scale">MultiRaft: Keeping Raft Efficient When Things Scale</h3>
<p>When you have many ranges (many pieces of the booklets), each range has its own Raft group. That can mean a lot of “are you alive?” messages between nodes, and a lot of overhead. MultiRaft is the trick CockroachDB uses to make this efficient.</p>
<p>MultiRaft groups together Raft work for many ranges that share nodes, so overhead is reduced. Instead of sending separate heartbeat (are you alive?) messages for each range, some of the messages are bundled.</p>
<p>This reduces network chatter and resource waste and helps the database scale smoothly when you have tons of data and many pieces.</p>
<h3 id="heading-rebalancing-movement-for-balance">Rebalancing: Movement for Balance</h3>
<p>When your ranges are not evenly spread across nodes (machines), some machines are doing way too much work, and some hardly any. That’s not good. So CockroachDB automatically moves pieces around to balance things.</p>
<ul>
<li><p>The system watches how busy each node is (how many ranges it holds, how much data, how much read/write traffic).</p>
</li>
<li><p>If one node is overloaded, it will move some ranges to other nodes.</p>
</li>
<li><p>If a node dies, the system notices and makes sure that ranges that were on that node get copied somewhere else so safety (replica count) is maintained.</p>
</li>
<li><p>If you add a new node, the system starts moving ranges to the new node so its resources are used.</p>
</li>
</ul>
<p>This happens without you having to manually decide “move this here, move that there.”</p>
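<p>Here is a rough Python sketch of that balancing idea. It is a deliberate simplification: it only counts ranges per node, while the real system also weighs data size, traffic, and where the other replicas of a range live. The node names and the <code>rebalance</code> function are invented for this example.</p>
<pre><code class="language-python">def rebalance(nodes):
    """Move ranges from the busiest node to the idlest one until the
    range counts differ by at most one.

    nodes is a non-empty dict that maps a node name to a list of range IDs.
    """
    while True:
        busiest = max(nodes, key=lambda name: len(nodes[name]))
        idlest = min(nodes, key=lambda name: len(nodes[name]))
        if len(nodes[busiest]) - len(nodes[idlest]) in (0, 1):
            return nodes
        # Hand one range over to the least loaded node.
        nodes[idlest].append(nodes[busiest].pop())
</code></pre>
<p>For example, if <code>n1</code> holds six ranges and you add two empty nodes <code>n2</code> and <code>n3</code>, calling <code>rebalance</code> leaves each node with two ranges.</p>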
<h3 id="heading-distributed-transactions-doing-work-across-multiple-ranges">Distributed Transactions: Doing Work Across Multiple Ranges</h3>
<p>Often, an operation touches multiple ranges. For example, “transfer money from account A (in range 1) to account B (in range 2)”. That must be handled carefully so that either both parts succeed, or neither does.</p>
<p>CockroachDB supports <strong>distributed transactions</strong>, meaning a single transaction can work across many ranges. It uses “intent” writes (temporary placeholders) and once everything is ready, it commits the transaction so it becomes permanent. If something fails, it aborts (cancels) the whole thing. The system ensures atomic behavior: all or nothing.</p>
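<p>The all-or-nothing behavior can be sketched in a few lines of Python. This is a toy model under loud assumptions: intents live in a plain in-memory dictionary, there is only one transaction at a time, and the <code>Transaction</code> class and <code>transfer</code> function are hypothetical names, not part of any CockroachDB API.</p>
<pre><code class="language-python">class Transaction:
    def __init__(self, store):
        self.store = store   # committed key/value data
        self.intents = {}    # provisional writes, invisible until commit

    def write(self, key, value):
        """Stage a provisional write (an intent) without touching the store."""
        self.intents[key] = value

    def commit(self):
        """Make every intent permanent in one step."""
        self.store.update(self.intents)
        self.intents = {}

    def abort(self):
        """Throw away every intent, leaving the store untouched."""
        self.intents = {}


def transfer(store, src, dst, amount):
    """Move amount from src to dst, or change nothing at all."""
    txn = Transaction(store)
    try:
        txn.write(src, store[src] - amount)
        txn.write(dst, store[dst] + amount)  # KeyError if dst is missing
    except KeyError:
        txn.abort()
        return False
    txn.commit()
    return True
</code></pre>
<p>If the destination account does not exist, the debit that was already staged for the source account is discarded too, so the balances never end up half-updated.</p>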
<h3 id="heading-how-it-all-fits-together-read-write-flow-what-happens-when-you-use-it">How It All Fits Together: Read + Write Flow (What Happens When You Use It)</h3>
<p>Let’s picture a write, step by step:</p>
<ol>
<li><p>Your app sends a write (for example, “add new user”) to any node in the CockroachDB cluster.</p>
</li>
<li><p>That node figures out which range(s) are involved (which pieces hold the data you want to write).</p>
</li>
<li><p>For each range, the write goes to that range’s leader.</p>
</li>
<li><p>The leader writes the change to its own copy, then tells followers to do the same.</p>
</li>
<li><p>Once most copies confirm they have the change, the leader declares it “committed” and tells your app, “yes, write done.”</p>
</li>
<li><p>If a node is busy or down, others still handle traffic.</p>
</li>
</ol>
<p>Read flow:</p>
<ul>
<li><p>Your app sends a read (for example “get user by ID”) to any node.</p>
</li>
<li><p>That node checks its copies. If it has a fresh copy, it answers. If not, it asks the node that does.</p>
</li>
</ul>
<p>Together, these steps keep data correct, up to date, and reliably available even if machines fail or the network lags.</p>
<h3 id="heading-why-this-all-matters-putting-it-in-plain-english">Why This All Matters (Putting It in Plain English)</h3>
<p>All these mechanisms matter for several key reasons. First of all, because data is chopped into ranges and replicated, no single node is a bottleneck. Also, Raft ensures consensus, so you can trust that data is consistent across all working replicas.</p>
<p>Beyond this, rebalancing is automatic, so you don’t have to micromanage shards or worry about nodes drowning in load. And because transactions that touch multiple ranges are coordinated, you can trust ACID properties even in a distributed setup.</p>
<h2 id="heading-where-and-how-should-you-host-cockroachdb">Where (and How) Should You Host CockroachDB? ☁️</h2>
<p>There isn’t just one “right” way to host CockroachDB. There are a few paths you can pick, each with pros and cons. What you pick depends on cost, control, ease of use, and your risk tolerance.</p>
<p>In this section, we’ll explore:</p>
<ul>
<li><p>Cockroach Labs’ own managed cloud (CockroachDB Cloud)</p>
</li>
<li><p>“Bring Your Own Cloud” (BYOC) – letting Cockroach Labs manage it inside <em>your</em> cloud account</p>
</li>
<li><p>Hosting via cloud marketplaces (AWS, GCP, Azure)</p>
</li>
<li><p>Self-hosting / Kubernetes / your own infrastructure</p>
</li>
<li><p>And notes on DigitalOcean support</p>
</li>
</ul>
<p>Let’s dive in.</p>
<h3 id="heading-option-1-cockroachdb-cloud-fully-managed-by-cockroach-labs">Option 1: CockroachDB Cloud (fully managed by Cockroach Labs)</h3>
<p>This is the easiest option if you want to offload operations. You don’t manage nodes (computers, virtual machines, and so on), upgrades, or backups, as Cockroach Labs handles all that.</p>
<p><strong>What it offers:</strong></p>
<ul>
<li><p>You sign up and click “create cluster.”</p>
</li>
<li><p>Automatic scaling, zero-downtime upgrades, and managed backups.</p>
</li>
<li><p>It supports multiple cloud providers behind the scenes (you pick region(s)).</p>
</li>
<li><p>You get tools, APIs, and Terraform integration to automate it.</p>
</li>
<li><p>They often give free credits to get started.</p>
</li>
</ul>
<p><strong>Tradeoffs:</strong></p>
<ul>
<li><p>You have less control over the underlying infrastructure, for example virtual machines, networking, disks, and so on (you trade control for convenience).</p>
</li>
<li><p>You pay for the managed service premium.</p>
</li>
<li><p>You rely on Cockroach Labs’ SLAs, uptime, and support.</p>
</li>
</ul>
<p>If you want, you can check it out here: <a target="_blank" href="https://www.cockroachlabs.com/product/cloud/">CockroachDB Cloud (managed by Cockroach Labs)</a>.</p>
<h3 id="heading-option-2-bring-your-own-cloud-byoc">Option 2: Bring Your Own Cloud (BYOC)</h3>
<p>This is a middle ground: you keep your cloud environment, but let Cockroach Labs manage the database. It gives you control over infrastructure, billing, network, and so on, while still offloading operational complexity.</p>
<p><strong>How it works:</strong></p>
<ul>
<li><p>You run CockroachDB Cloud inside your cloud account (AWS, GCP, and so on).</p>
</li>
<li><p>Cockroach Labs still handles provisioning, upgrades, backups, and observability. You manage roles, networking, and logs.</p>
</li>
<li><p>Useful for complying with regulations, keeping data within your cloud folder/account, and using your cloud discounts.</p>
</li>
</ul>
<p><strong>Tradeoffs:</strong></p>
<ul>
<li><p>You still need to set up cloud aspects (VPCs, IAM, roles) correctly.</p>
</li>
<li><p>There’s more complexity than pure managed, but more control as well.</p>
</li>
<li><p>Cockroach Labs needs access to certain parts of your account (permissions).</p>
</li>
</ul>
<p>If you want to explore BYOC, you can read more here: <a target="_blank" href="https://www.cockroachlabs.com/product/cloud/bring-your-own-cloud/">CockroachDB Bring Your Own Cloud</a>.</p>
<h3 id="heading-option-3-use-cloud-marketplaces-aws-gcp-azure">Option 3: Use Cloud Marketplaces (AWS, GCP, Azure)</h3>
<p>If you already use a cloud provider, sometimes the easiest way is to deploy via their marketplace offerings. It gives you familiarity, billing simplicity, and so on.</p>
<ul>
<li><p><strong>GCP Marketplace</strong> – CockroachDB is available on the Google Cloud Marketplace, making it easier to deploy within your GCP environment. You can learn more here: <a target="_blank" href="https://console.cloud.google.com/marketplace/product/cockroachdb-public/cockroachdb">GCP Marketplace</a>.</p>
</li>
<li><p><strong>AWS Marketplace</strong> – CockroachDB is listed there: <a target="_blank" href="https://aws.amazon.com/marketplace/pp/prodview-n3xpypxea63du">AWS Marketplace</a>.</p>
</li>
<li><p><strong>Azure Marketplace</strong> – Also supported for Azure deployments (SaaS/managed listings): <a target="_blank" href="https://marketplace.microsoft.com/en-us/product/saas/cockroachlabs1586448087626.cockroachdb-azure?tab=overview">Azure Marketplace</a>.</p>
</li>
<li><p><strong>DigitalOcean</strong> – There is support for CockroachDB deployment on DigitalOcean using their infrastructure: <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-on-digital-ocean">Deploy CockroachDB on DigitalOcean</a>.</p>
</li>
</ul>
<p>These options let you stay in your cloud console, use your existing cloud accounts, and integrate with other resources you already have.</p>
<p>But you're still responsible for certain operational tasks (networking, security, monitoring, backups) depending on how the marketplace offering is configured.</p>
<h3 id="heading-option-4-my-favorite-self-hosting-especially-using-kubernetes">Option 4 (My Favorite 😁): Self-Hosting — Especially Using Kubernetes</h3>
<p>If you self-host CockroachDB, you get <strong>full control</strong>. You’re the boss of everything: the machines, storage, networking, backups, upgrades, monitoring – all of it.</p>
<p>What’s even better is that using Kubernetes means your setup isn’t tied to one cloud provider. You can run it on AWS, GCP, Azure, or even on-premises later, with very little change. Kubernetes gives you a “portable infra” layer.</p>
<p>Managed CockroachDB services charge extra for maintenance, upgrades, backups, and so on; those costs are baked into the price. When you self-host, you take on that burden yourself but avoid paying the extra margin. You pay for compute, disks, network, and your own time and ops work.</p>
<p>You can also self-host in the cloud (using cloud VMs) but still manage every layer: disks, network, security, and so on. Using Kubernetes, there is a sweet middle ground: you get cloud reliability for VMs, but you fully control everything above that.</p>
<h4 id="heading-why-kubernetes-beats-tools-like-docker-swarm-or-hashicorp-nomad-for-databases">Why Kubernetes Beats Tools Like Docker Swarm or HashiCorp Nomad for Databases</h4>
<p>Because CockroachDB is a <strong>stateful</strong> system (it holds data), you need strong support for “data that stays even when a pod restarts or moves.” Kubernetes is designed with good primitives for that. Other tools don’t always shine there.</p>
<p>Here’s the comparison in simple terms:</p>
<ul>
<li><p><strong>Docker Swarm / Docker Compose:</strong> Great for stateless apps (web servers, APIs), but when it comes to databases, it struggles. Swarm doesn’t natively support persistent volume claims at a cluster level, so if a container (database replica) moves to a different node (VM), it might lose access to its storage. Devs often pin containers to specific nodes manually to avoid this.</p>
</li>
<li><p><strong>Nomad:</strong> More flexible and simpler in some ways, but it’s not as rich in features around connectivity, storage management, and built-in tooling for containers. It works well in mixed workloads, but handling complex databases usually means you need to build extra layers.</p>
</li>
<li><p><strong>Kubernetes:</strong> It has built-in support for stateful workloads:</p>
<ul>
<li><p><strong>StatefulSets (Properly managing data for each database):</strong> This ensures that each CockroachDB replica (pod) keeps its identity and storage intact even if the pod restarts. So the database replica doesn’t lose its “name” or data when things change.</p>
</li>
<li><p><strong>Persistent volumes and persistent volume claims (external disks):</strong> These are like dedicated hard drives or disks attached to pods (database replicas). Even if a pod moves, crashes, or restarts, the disk (data) stays. Kubernetes makes sure the data stays safe.</p>
</li>
<li><p><strong>StorageClasses (choose your disk):</strong> You can choose the type of disk your data is stored on, for example:</p>
<ul>
<li><p>HDD (most affordable, but slower),</p>
</li>
<li><p>Balanced Disk (SSD enabled, a balance between costs and speed),</p>
</li>
<li><p>Fast SSD (Very fast, recommended by the CockroachDB team, but a bit more expensive than a Balanced Disk).</p>
</li>
</ul>
</li>
<li><p><strong>Rolling updates and anti-affinity (no downtime, high availability, fault tolerance):</strong></p>
<ul>
<li><p>Anti-affinity means you can tell Kubernetes, “don’t put more than one CockroachDB replica on the same VM or physical machine.” If one VM goes bad, the other replicas are still safe.</p>
</li>
<li><p>Rolling updates let you update one replica at a time (configuration, version, resources) without bringing down the whole cluster. While one replica updates, the others keep serving traffic. That helps avoid downtime.</p>
</li>
<li><p>Kubernetes also has ordered start/stop for replicas (via StatefulSets), so things are predictable and safe.</p>
</li>
</ul>
</li>
<li><p><strong>Vertical vs horizontal scaling (a quick reminder):</strong><br>  We covered scaling in earlier sections:</p>
<ul>
<li><p><strong>Horizontal scaling</strong> means adding more replicas (more pods, more nodes) so load spreads out.</p>
</li>
<li><p><strong>Vertical scaling</strong> means increasing the resources (CPU, RAM, disk) of existing nodes/replicas.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>In tools like Nomad or Docker Swarm, vertical scaling tends to be harder. It often involves stopping services, shutting things down, and restarting VMs, which causes downtime.</p>
<p>Kubernetes makes vertical and horizontal scaling easier at the pod level (you can resize a single pod’s CPU and RAM) and manages rolling upgrades so you don’t take everything down at once.</p>
<p>You can also easily add more database replicas to the cluster (to balance load and help the database process queries faster). CockroachDB then automatically copies data onto the new replica (replication). The official CockroachDB Helm chart makes this especially simple.</p>
<h4 id="heading-why-other-tools-swarm-nomad-docker-compose-dont-match-up-here">Why Other Tools (Swarm / Nomad / Docker Compose) Don’t Match Up Here</h4>
<p>Docker Swarm and Docker Compose are simpler to use and are good when you don’t have much complexity. But they lack robust features for stable storage, default support for replication, vertical scaling, horizontal scaling of stateful services, and so on. For example, Swarm doesn’t have built-in StatefulSets or dynamic volume provisioning like Kubernetes.</p>
<p>Nomad is more flexible than Swarm in some ways, but many users say its storage plugins (CSI) are weaker than what Kubernetes has. It also has less built in for ordered startup and rolling updates of stateful apps.</p>
<p>So while these work fine for simpler apps (stateless services, small apps), when you have a distributed stateful SQL database like CockroachDB, Kubernetes gives you more safety, more control, less chance of data loss or misconfiguration.</p>
<p>Because of all this, running CockroachDB on Kubernetes gives you the tools you need baked in, reducing how much custom plumbing you must write yourself.</p>
<h4 id="heading-trade-offhttpswwwredditcomrhashicorpcomments1ivtuo5utmsourcechatgptcoms-things-to-watch-out-for">Trade-offs (things to watch out for)</h4>
<ul>
<li><p>You have to manage everything: backups, monitoring the ENTIRE CockroachDB cluster, withstanding failures (fault tolerance), and upgrades. That’s work 🥲.</p>
</li>
<li><p>You need to know your way around infra (VMs, disks, networking, and inter-node connections) and operations (or have teammates who do – DevOps Engineers, Cloud Architects, Site Reliability Engineers).</p>
</li>
<li><p>Using managed Kubernetes (like GKE, EKS, AKS) helps as you offload the control plane. You still manage the nodes, storage, and higher layers.</p>
</li>
<li><p>But even with that, you avoid paying for “database management as a service” markup – you're only paying for infrastructure plus your time.</p>
</li>
</ul>
<h2 id="heading-setting-up-your-local-environment"><strong>Setting Up Your Local Environment 🧑‍💻</strong></h2>
<p>Alright, we’ve learned quite a bit so far: what CockroachDB is, how it works behind the scenes, and where you can host it. Now, it’s time to roll up our sleeves and get our hands dirty with some practical setup.</p>
<p>Before we deploy CockroachDB, we need a safe “playground” where we can test and experiment without touching the cloud or spending a dime.</p>
<h3 id="heading-why-these-tools">Why these tools?</h3>
<p>Before we jump into running commands, here’s a quick look at the tools we’ll use and why:</p>
<ul>
<li><p><strong>Minikube</strong>: A tool that runs a small Kubernetes cluster on your computer. It gives you a local “mini cloud” where you can deploy and experiment.</p>
</li>
<li><p><strong>kubectl</strong>: The command-line tool you’ll use to talk to your Kubernetes cluster to deploy apps, check status, and manage resources.</p>
</li>
<li><p><strong>Helm</strong>: A package manager for Kubernetes. It helps you install complex applications (like CockroachDB) with fewer manual steps.</p>
</li>
</ul>
<h3 id="heading-step-1-install-minikube">Step 1: Install Minikube</h3>
<p><strong>What is Minikube?</strong><br>Minikube is a lightweight tool that helps you run a small Kubernetes cluster on your personal computer.</p>
<p>Think of it as your own mini-cloud environment where you can test, deploy, and learn Kubernetes (and in our case, CockroachDB) locally. It’s perfect for learning and experimenting before deploying on the cloud.</p>
<p>Here’s how to get it on different operating systems:</p>
<h4 id="heading-windows">🪟 Windows</h4>
<ol>
<li><p>Make sure you have a hypervisor (VirtualBox, Hyper-V) or Docker installed.</p>
</li>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> choco install minikube
</code></pre>
<p> or use:</p>
<pre><code class="lang-bash"> winget install minikube
</code></pre>
</li>
<li><p>After installation, check the version:</p>
<pre><code class="lang-bash"> minikube version
</code></pre>
<p> If it returns a version number, you’re good 👍🏾 Then run <code>minikube start</code> to start the cluster, just like in the macOS and Linux steps.</p>
</li>
</ol>
<p>If you don’t have the <code>choco</code> or <code>winget</code> package manager, you can install Minikube via PowerShell by following the steps in the <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/?arch=%2Fwindows%2Fx86-64%2Fstable%2F.exe+download">docs</a>.</p>
<h4 id="heading-macos">🍎 macOS</h4>
<ol>
<li><p>Ensure you have Homebrew installed.</p>
</li>
<li><p>In Terminal, run:</p>
<pre><code class="lang-bash"> brew install minikube
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> minikube status
</code></pre>
</li>
</ol>
<h4 id="heading-linux">🐧 Linux</h4>
<ol>
<li><p>Ensure you’re on a supported distribution (Ubuntu, Fedora, and so on) and virtualization (Docker, KVM, and so on) is enabled.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
 sudo install minikube-linux-amd64 /usr/<span class="hljs-built_in">local</span>/bin/minikube
 rm minikube-linux-amd64
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> minikube status
</code></pre>
</li>
</ol>
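<p>The download above is the x86-64 (<code>amd64</code>) build. If your Linux machine has an ARM processor, you’d want the <code>arm64</code> build instead. Here’s a minimal sketch that picks the suffix from <code>uname -m</code> and prints the matching download command. The <code>minikube_arch</code> helper is a hypothetical name used only for illustration, and the sketch assumes the release files keep the <code>minikube-linux-ARCH</code> naming shown above:</p>
<pre><code class="lang-bash"># Hypothetical helper: map the machine name reported by `uname -m`
# to the suffix used in Minikube's Linux release file names.
minikube_arch() {
  case "$1" in
    x86_64) echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported" ;;
  esac
}

ARCH="$(minikube_arch "$(uname -m)")"
# Print (rather than run) the download command so you can review it first.
echo "curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-${ARCH}"
</code></pre>
<p>If it prints <code>unsupported</code>, check the Minikube docs for a build that matches your machine before downloading anything.</p>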
<p>✅ At this point, you should have a local Kubernetes cluster up and running on your machine! Next, we’ll install kubectl so you can talk to the cluster from your command line.</p>
<h3 id="heading-step-2-install-kubectl">Step 2: Install kubectl</h3>
<p><strong>What kubectl does:</strong><br>kubectl is the command-line tool that lets you talk to your Kubernetes cluster. Using it, you can deploy applications, check your cluster’s health, and manage resources inside your cluster.</p>
<p>You’ll use it a lot when working with Kubernetes on Minikube and later when you deploy CockroachDB.</p>
<p>Here’s how to install it on Windows, macOS, and Linux:</p>
<h4 id="heading-windows-1">🪟 Windows</h4>
<ol>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> choco install kubernetes-cli
</code></pre>
<p> or if you prefer:</p>
<pre><code class="lang-bash"> choco install kubectl
</code></pre>
</li>
<li><p>Then check the version:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
<p> If it prints a version number, you’re good.</p>
</li>
</ol>
<h4 id="heading-macos-1">🍎 macOS</h4>
<ol>
<li><p>Open Terminal.</p>
</li>
<li><p>If you have Homebrew installed, run:</p>
<pre><code class="lang-bash"> brew install kubectl
</code></pre>
</li>
<li><p>Check the version:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
<p> That should show something like “Client Version: v1.x.x”.</p>
</li>
</ol>
<h4 id="heading-linux-1">🐧 Linux</h4>
<ol>
<li><p>Open your terminal.</p>
</li>
<li><p>Download the latest kubectl binary:</p>
<pre><code class="lang-bash"> curl -LO <span class="hljs-string">"https://dl.k8s.io/release/<span class="hljs-subst">$(curl -L -s https://dl.k8s.io/release/stable.txt)</span>/bin/linux/amd64/kubectl"</span>
</code></pre>
</li>
<li><p>Make it executable and move it into your PATH:</p>
<pre><code class="lang-bash"> chmod +x ./kubectl
 sudo mv ./kubectl /usr/<span class="hljs-built_in">local</span>/bin/kubectl
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
</li>
</ol>
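<p>Before moving on, it’s worth confirming that kubectl is pointed at your Minikube cluster and not at some other cluster in your kubeconfig. This is a small sketch that assumes you’ve already run <code>minikube start</code> with the default profile, which registers a context named <code>minikube</code>:</p>
<pre><code class="lang-bash"># Assumption: the default Minikube profile, whose kubectl context is "minikube".
EXPECTED_CONTEXT="minikube"
CURRENT_CONTEXT="$(kubectl config current-context)"

if [ "$CURRENT_CONTEXT" = "$EXPECTED_CONTEXT" ]; then
  # List the cluster's nodes; a fresh Minikube cluster has a single node.
  kubectl get nodes
else
  echo "kubectl points at '${CURRENT_CONTEXT}', expected '${EXPECTED_CONTEXT}'"
fi
</code></pre>
<p>If the contexts don’t match, <code>kubectl config use-context minikube</code> switches back to the local cluster.</p>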
<p>After this, you’ll have kubectl installed and ready to use with your local Minikube cluster. Next up we’ll install Helm, which will make deploying CockroachDB much easier.</p>
<h3 id="heading-step-3-install-helm">Step 3: Install Helm</h3>
<p>Helm is basically the package manager for Kubernetes. Think of it like how you use <code>apt</code>, <code>yum</code>, or <code>brew</code> to install software on your computer. Helm does something similar for Kubernetes apps.</p>
<p>With Kubernetes, deploying a full app often means writing lots of configs (manifests – Deployments, Services, PersistentVolumes, ConfigMaps, and so on). Helm lets us bundle all of that into a single “package” (called a chart) so we don’t have to manually create the resources one-after-the-other (which could be hectic to manage btw 😖).</p>
<p>Because our goal is to deploy a pretty complex system (CockroachDB) on Kubernetes – which includes stateful nodes, persistent storage, networking, SSL/TLS, and so on – using a Helm chart makes it <em>so much easier</em> than crafting dozens of YAML files from scratch.</p>
<p>So before we install CockroachDB, we’ll install Helm. This gives us the toolkit to deploy and manage our cluster much more easily.</p>
<p>Let’s install Helm on each platform. After this, you’ll have the <code>helm</code> command ready to deploy apps into your Kubernetes cluster.</p>
<h4 id="heading-windows-2">🪟 Windows</h4>
<ol>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>If you have Chocolatey installed, run:</p>
<pre><code class="lang-bash"> choco install kubernetes-helm
</code></pre>
<p> Alternatively:</p>
<pre><code class="lang-bash"> choco install helm
</code></pre>
</li>
<li><p>Confirm installation:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
<p> You should see something like <code>version.BuildInfo{Version:"v3.x.x",…}</code>.</p>
</li>
</ol>
<h4 id="heading-macos-2">🍎 macOS</h4>
<ol>
<li><p>Open Terminal.</p>
</li>
<li><p>With Homebrew installed, run:</p>
<pre><code class="lang-bash"> brew install helm
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
<p> If you see version info, you’re good.</p>
</li>
</ol>
<h4 id="heading-linux-2">🐧 Linux</h4>
<ol>
<li><p>Open your terminal.</p>
</li>
<li><p>Download and run the official installer script:</p>
<pre><code class="lang-bash"> curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
 chmod 700 get_helm.sh
 ./get_helm.sh
</code></pre>
<p> Or you can directly download the binary and move it into your <code>PATH</code>.</p>
</li>
<li><p>Check version:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
</li>
</ol>
<p>✅ After this, you have <code>helm</code> installed and you’re ready to use it.</p>
<p>In the next part, we’ll use Helm to install CockroachDB into your local Minikube cluster. We’ll add the CockroachDB chart, configure it, and spin up a multi-node replica setup right on your PC.</p>
<h2 id="heading-deploying-cockroachdb-on-minikube-the-fun-part-begins">Deploying CockroachDB on Minikube (The Fun Part Begins 😁!)</h2>
<p>Before we go to the cloud, we’ll deploy CockroachDB locally on Minikube using Helm.</p>
<p>This process will help us:</p>
<ul>
<li><p>Understand how CockroachDB runs in a cluster</p>
</li>
<li><p>Learn how Kubernetes manages database replicas</p>
</li>
<li><p>Gain hands-on experience before deploying to the cloud</p>
</li>
</ul>
<h3 id="heading-step-1-visit-artifacthub">Step 1: Visit ArtifactHub</h3>
<p><strong>ArtifactHub</strong> is like an App Store for Kubernetes Helm Charts – a huge collection of open-source Helm charts and packages you can easily install.</p>
<ol>
<li><p>Go to <a target="_blank" href="https://artifacthub.io">https://artifacthub.io</a></p>
</li>
<li><p>In the search bar, type <strong>CockroachDB</strong></p>
</li>
<li><p>Click the <strong>CockroachDB Helm chart</strong> result (you’ll see it published by <em>Cockroach Labs</em>).</p>
</li>
</ol>
<p>You’ll see something like this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848079912/1778bbcf-088a-4919-80bb-ca24241ffa85.png" alt="The official CockroachDB Helm chart" class="image--center mx-auto" width="918" height="469" loading="lazy"></p>
<h3 id="heading-step-2-explore-the-helm-chart">Step 2: Explore the Helm Chart</h3>
<p>You’ll notice a lot of information on the page:</p>
<ul>
<li><p><strong>README</strong> – the documentation for installing and customizing CockroachDB</p>
</li>
<li><p><strong>Default Values</strong> – all the settings that define how the database runs</p>
</li>
</ul>
<p>Don’t worry if it looks overwhelming. We’ll walk through it together 😉</p>
<h3 id="heading-step-3-copy-the-default-values">Step 3: Copy the Default Values</h3>
<p>Every Helm chart ships with a <em>default configuration</em> (its default values). For CockroachDB, these defaults are more advanced and heavier than a local setup needs, so we’ll create our own lighter version. But first, let’s copy the original for reference.</p>
<ol>
<li><p>On the CockroachDB chart page, click the <strong>Default Values</strong> button.</p>
</li>
<li><p>A modal window will pop up showing a long YAML file.</p>
</li>
<li><p>Click the <strong>Copy</strong> icon in the top-right corner to copy all the default values.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848210119/17cd734b-6d7c-40dc-a8c3-f01c85edd7a7.png" alt="The Default Values button description" class="image--center mx-auto" width="896" height="458" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848520060/1e1ce249-0cf0-46cb-abbc-00efb3ea1343.png" alt="Copy the default values" class="image--center mx-auto" width="781" height="399" loading="lazy"></p>
<h3 id="heading-step-4-create-a-folder-for-our-project">Step 4: Create a Folder for Our Project</h3>
<p>We’ll keep everything organized in a single folder.</p>
<pre><code class="lang-bash">mkdir cockroachdb-tutorial
<span class="hljs-built_in">cd</span> cockroachdb-tutorial
</code></pre>
<p>Inside this folder, create a new file called:</p>
<pre><code class="lang-bash">nano cockroachdb-original-values.yml
</code></pre>
<p>Now paste all the default values you copied earlier (use Ctrl+V or right-click → Paste), then save and exit (<code>Ctrl+O</code>, then <code>Ctrl+X</code> in nano).</p>
<p>If you’re on Windows, just open Notepad/VSCode, paste the content, and save the file in the same folder.</p>
<h3 id="heading-step-5-understanding-the-key-configurations">Step 5: Understanding the Key Configurations</h3>
<p>Let’s break down a few important values you’ll notice in the file.</p>
<h4 id="heading-statefulsetreplicas">🧩 <code>statefulset.replicas</code></h4>
<p>This tells CockroachDB how many database nodes (replicas) to run in the cluster. By default, it’s set to 3, meaning you’ll have 3 independent database instances that can all read and write data.</p>
<h4 id="heading-statefulsetresourcesrequests-and-statefulsetresourceslimits">⚙️ <code>statefulset.resources.requests</code> and <code>statefulset.resources.limits</code></h4>
<p>These settings tell Kubernetes how much CPU and memory to give CockroachDB.</p>
<ul>
<li><p><code>requests</code>: the minimum guaranteed amount</p>
</li>
<li><p><code>limits</code>: the maximum allowed amount</p>
</li>
</ul>
<p>CockroachDB can be a bit greedy with memory 😅, so limits make sure it doesn’t take everything and leave no room for other apps.</p>
<h4 id="heading-storagepersistentvolumesize">💾 <code>storage.persistentVolume.size</code></h4>
<p>This defines how much disk space each CockroachDB node gets. For example, if you set it to <code>10Gi</code> and you have 3 replicas, total usage = <code>30Gi</code>.</p>
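<p>The arithmetic is simple, but it’s easy to forget when sizing a laptop or a cloud budget. Here’s a tiny sketch using the example numbers from above (the variable names are just for illustration):</p>
<pre><code class="lang-bash"># Illustrative values: 3 replicas with a 10Gi volume each.
REPLICAS=3
SIZE_GI=10

# Each replica gets its own PersistentVolumeClaim, so the requests add up.
TOTAL_GI=$((REPLICAS * SIZE_GI))
echo "Total requested storage: ${TOTAL_GI}Gi"
</code></pre>
<p>Swap in your own replica count and volume size to see how much disk the cluster will ask for in total.</p>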
<h4 id="heading-storagepersistentvolumestorageclass">💽 <code>storage.persistentVolume.storageClass</code></h4>
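<p>This defines the type of disk to use. The available names depend on your cluster. On GKE, for example:</p>
<ul>
<li><p><code>standard</code>: HDD (cheap but slow)</p>
</li>
<li><p><code>standard-rwo</code>: balanced SSD (faster and affordable)</p>
</li>
<li><p><code>premium-rwo</code>, or a custom class such as <code>fast-ssd</code> backed by <code>pd-ssd</code> disks: performance SSD (super fast but pricey)</p>
</li>
</ul>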
<p>You can check available storage classes in your Minikube cluster using:</p>
<pre><code class="lang-bash">kubectl get sc
</code></pre>
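<p>On Minikube, the default storage class is usually <code>standard</code>. There, it’s backed by a directory on the Minikube node rather than a cloud disk, so the HDD/SSD distinction doesn’t apply locally.</p>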
<p>You can learn more about <a target="_blank" href="https://cloud.google.com/kubernetes-engine/docs/concepts/storage-overview">Google Cloud storage classes here</a>.</p>
<h4 id="heading-tlsenabled">🔐 <code>tls.enabled</code></h4>
<p>This controls whether CockroachDB requires <strong>TLS certificates</strong> for secure connections.</p>
<p>If <code>true</code>, you’ll need to generate certificates for the nodes and for clients such as the <code>root</code> user (other SQL users can also log in with a password over the encrypted connection). This is <strong>strongly recommended for production</strong>, but for our local Minikube setup, we’ll disable it so it’s easier to play around and test connections.</p>
<h3 id="heading-step-6-create-a-simplified-values-config-for-the-cockroachdb-helm-chart">Step 6: Create a Simplified Values Config for the CockroachDB Helm Chart</h3>
<p>We’ll now create a new config file with lighter resource settings for our local test environment.</p>
<p>In the same folder, create:</p>
<pre><code class="lang-bash">nano cockroachdb-values.yml
</code></pre>
<p>Then paste this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">podSecurityContext:</span>
    <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span> <span class="hljs-comment"># You should have 3GB+ of RAM free on your device; else, you can reduce this to 500Mi (this will result in your PC needing just 1.5 GB of RAM free)</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>  <span class="hljs-comment"># The same with this, you can reduce it to 500m CPU if you don't have up to 3 CPU cores (1 CPU core * 3 replicas)</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">podAntiAffinity:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
  <span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span> <span class="hljs-comment"># Make sure you have 15GB+ of free storage on your local machine, if not, you can reduce it to 2 - 3 Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
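<p>Setting the <code>requests</code> and <code>limits</code> to the same value gives the pods the <strong>Guaranteed</strong> QoS class, so they’re the last to be evicted when the node runs low on resources. Note that a pod that goes over its memory limit can still be killed, and CPU usage above the limit is throttled.</p>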
<p>You can <a target="_blank" href="https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/">read more about this here</a>.</p>
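<p>Once the cluster from Step 7 is running, you can check which QoS class Kubernetes assigned. This sketch assumes the release name <code>crdb</code> used later in this tutorial, and that the chart names its first pod <code>crdb-cockroachdb-0</code> (adjust both if yours differ):</p>
<pre><code class="lang-bash"># Assumption: Helm release "crdb", so the first pod is "crdb-cockroachdb-0".
RELEASE="crdb"
POD="${RELEASE}-cockroachdb-0"

# Print the QoS class recorded in the pod's status.
kubectl get pod "$POD" -o jsonpath='{.status.qosClass}'
</code></pre>
<p>You should see <code>Guaranteed</code> as long as every container in the pod has matching requests and limits.</p>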
<h3 id="heading-overview-of-the-yaml-values">Overview of the YAML values</h3>
<p>Now, let’s go through the content of the <code>cockroachdb-values.yml</code> file together.</p>
<p><code>podSecurityContext</code> – why you need it on Minikube:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">podSecurityContext:</span>
  <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
</code></pre>
<p>This block sets the Linux user and group IDs that the CockroachDB process runs as inside the container, and the group ownership for mounted files.</p>
<p>Why this matters, simply:</p>
<ul>
<li><p>The CockroachDB process runs as <strong>UID 1000</strong> inside the container. If the disk mount (the persistent volume) is owned by a different UID, Cockroach can’t create files there and fails with <code>permission denied</code>.</p>
</li>
<li><p><code>runAsUser</code> and <code>runAsGroup</code> make the container process run as UID/GID 1000.</p>
</li>
<li><p><code>fsGroup</code> makes the mounted volume accessible to that group, so the process can write to <code>/cockroach/cockroach-data</code>.</p>
</li>
</ul>
<p>In short, these lines make sure the DB process has permission to create and write files on the mounted disk (volume), which is especially important on Minikube and other local setups where host-mounted storage can have odd permissions.</p>
<p><code>podAntiAffinity</code> and <code>nodeSelector</code> – what they do:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">podAntiAffinity:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>

<span class="hljs-attr">nodeSelector:</span>
  <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
</code></pre>
<p>Pod anti-affinity is enabled by default in the chart. It tells Kubernetes to <em>spread</em> pods across different nodes (VMs), so replicas don’t run on the same machine. This is good for high availability, because one node failing won’t kill multiple replicas.</p>
<p>By setting <code>type: ""</code> (empty), you <strong>disable</strong> that spreading rule, so Kubernetes can place multiple CockroachDB replicas on the same node.</p>
<p><code>nodeSelector</code> tells Kubernetes to schedule pods only on nodes that match the label you set (here <code>kubernetes.io/hostname: minikube</code>). That forces all pods to run on the node named <code>minikube</code>.</p>
<p>Quick summary of the effect:</p>
<ul>
<li><p>Good for local testing on a multi-node Minikube cluster, when only one node has properly mounted writable storage.</p>
</li>
<li><p><strong>Not recommended for production</strong>, because it places all replicas on the same machine (single point of failure).</p>
</li>
</ul>
<p>PS: If you’re using another local Kubernetes distribution (for example K3s or Kind), the pods will likely get stuck in <code>Pending</code>, because no node there carries the <code>kubernetes.io/hostname: minikube</code> label that the <code>nodeSelector</code> targets. In that case, I'd advise removing the <code>nodeSelector</code> property entirely, that is, deleting these lines:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
<span class="hljs-string">...</span>
</code></pre>
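<p>If you’re not sure what hostname label your nodes carry, you can ask the cluster directly. This is a quick sketch; the <code>LABEL_KEY</code> variable is just for illustration:</p>
<pre><code class="lang-bash"># The label key that the nodeSelector in cockroachdb-values.yml matches on.
LABEL_KEY="kubernetes.io/hostname"

# Show each node with an extra column containing that label's value.
kubectl get nodes -L "$LABEL_KEY"
</code></pre>
<p>On a default Minikube cluster, the extra column should read <code>minikube</code>. Any other value means the <code>nodeSelector</code> above won’t match that node.</p>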
<p>✅ <strong>At this point</strong>, we’ve:</p>
<ul>
<li><p>Copied the default CockroachDB Helm chart configuration</p>
</li>
<li><p>Created a lightweight version for Minikube</p>
</li>
<li><p>Learned what each key property means</p>
</li>
</ul>
<h3 id="heading-step-7-install-the-cockroachdb-cluster-using-helm">🚀 Step 7: Install the CockroachDB Cluster Using Helm</h3>
<p>Great job so far! You’ve created your <code>cockroachdb-values.yml</code> file and set up your custom configuration for Minikube. Now we’ll actually deploy the cluster.</p>
<p><strong>What we’re going to do:</strong><br>We’ll use Helm to install the official CockroachDB Helm chart using our custom values. This will spin up your 3-node cluster locally so you can play with it.</p>
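<p><strong>Commands to run:</strong></p>
<pre><code class="lang-bash">helm repo add cockroachdb https://charts.cockroachdb.com/
helm install crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Here:</p>
<ul>
<li><p><code>helm repo add</code> registers the Cockroach Labs chart repository under the alias <code>cockroachdb</code>. You only need to do this once per machine.</p>
</li>
<li><p><code>crdb</code> is the name we’re giving this release (you can pick something else if you like).</p>
</li>
<li><p><code>cockroachdb/cockroachdb</code> tells Helm which chart to use: the <code>cockroachdb</code> chart from the repository alias we just added.</p>
</li>
<li><p><code>-f cockroachdb-values.yml</code> tells Helm to apply our custom values on top of the chart’s defaults.</p>
</li>
</ul>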
<h4 id="heading-after-the-command-runs">After the command runs:</h4>
<p>After a little while the command completes, and you’ll see output telling you what resources were created (pods, services, persistent volume claims, and so on).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761386160496/babc3e67-1ea9-4aa1-b6a7-516fe3a9972a.png" alt="The CockroachDB Helm Chart post-installation message" class="image--center mx-auto" width="923" height="675" loading="lazy"></p>
<p>Now to check if everything is working, do this:</p>
<pre><code class="lang-bash">kubectl get pods | grep -i crdb
</code></pre>
<p>This filters pods with “crdb” in the name (our release prefix).</p>
<p>You should see something like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761386195190/21469ce5-c909-4336-ba5f-a4c4a776a470.png" alt="The CockroachDB replicas running successfully" class="image--center mx-auto" width="528" height="105" loading="lazy"></p>
<p>The three primary pods (<code>0</code>, <code>1</code>, <code>2</code>) should be in <code>Running</code> state. The <code>init</code> job or pod (<code>crdb-cockroachdb-init-xxx</code>) should show <code>Completed</code>. This means the initialization tasks (cluster bootstrap) succeeded.</p>
<p>If you see that, congratulations! You’ve got your local CockroachDB cluster up and running! 🎉</p>
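<p>If you’d like a second look beyond the pod list, you can also check the StatefulSet and the disks it requested. This sketch assumes the release name <code>crdb</code> and that the chart names the StatefulSet <code>crdb-cockroachdb</code>, matching the pod names above:</p>
<pre><code class="lang-bash"># Assumption: Helm release "crdb", so the StatefulSet is "crdb-cockroachdb".
RELEASE="crdb"

# READY should show 3/3 once every replica is up.
kubectl get statefulset "${RELEASE}-cockroachdb"

# One PersistentVolumeClaim per replica; each should be in the Bound state.
kubectl get pvc | grep -i "$RELEASE"
</code></pre>
<p>Three <code>Bound</code> claims of <code>5Gi</code> each confirm that the storage settings from <code>cockroachdb-values.yml</code> were applied.</p>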
<h2 id="heading-accessing-the-cockroachdb-console-amp-viewing-metrics">Accessing the CockroachDB Console &amp; Viewing Metrics</h2>
<p>Alright! Now that our CockroachDB cluster is up and running, let’s take a peek behind the scenes and explore the CockroachDB Admin Console. It’s a beautiful web dashboard that helps us visualize everything happening in our database cluster.</p>
<p>In this section, we’ll learn how to:</p>
<ul>
<li><p>Access the CockroachDB admin console right from your browser 🖥️</p>
</li>
<li><p>Understand what each built-in dashboard shows (CPU, memory, disk, SQL performance)</p>
</li>
<li><p>Confirm that our cluster is healthy and that all 3 nodes are working together perfectly</p>
</li>
</ul>
<h3 id="heading-step-1-locate-the-cockroachdb-public-service">Step 1: Locate the CockroachDB Public Service</h3>
<p>The CockroachDB Helm chart automatically creates a <strong>public service</strong> that lets us connect to the database and access its dashboard.</p>
<p>Let’s check it out by running:</p>
<pre><code class="lang-bash">kubectl get svc | grep -i crdb
</code></pre>
<p>You should see a line similar to:</p>
<pre><code class="lang-bash">crdb-cockroachdb-public   ClusterIP   10.x.x.x   &lt;none&gt;   26257/TCP,8080/TCP   ...
</code></pre>
<p>This service (<code>crdb-cockroachdb-public</code>) is what we’ll use to connect to both:</p>
<ul>
<li><p>The <strong>database</strong> itself (via port 26257)</p>
</li>
<li><p>The <strong>dashboard UI</strong> (via port 8080)</p>
</li>
</ul>
<h3 id="heading-step-2-learn-more-about-the-service">Step 2: Learn More About the Service</h3>
<p>Let’s dig a little deeper to understand it:</p>
<pre><code class="lang-bash">kubectl describe svc crdb-cockroachdb-public
</code></pre>
<p>Here’s what you’ll notice:</p>
<ul>
<li><p><strong>Port 26257</strong> handles both CockroachDB’s inter-node <strong>gRPC</strong> traffic and SQL client connections, which use the PostgreSQL wire protocol (this is how applications connect to send and receive SQL queries).</p>
</li>
<li><p><strong>Port 8080</strong> is used for the <strong>web dashboard</strong>, where we can view metrics and monitor performance.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387757614/dab8cfd0-2d89-45b0-a54f-41e530f1a6ab.png" alt="Description of the crdb-cockroachdb-public service" class="image--center mx-auto" width="938" height="431" loading="lazy"></p>
<h3 id="heading-step-3-access-the-cockroachdb-dashboard">Step 3: Access the CockroachDB Dashboard</h3>
<p>Now, let’s make the dashboard available on your local computer. Run this command:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 8080:8080
</code></pre>
<p>This command simply tells Kubernetes:</p>
<blockquote>
<p>“Hey, please open a tunnel from my local computer’s port 8080 to the CockroachDB service’s port 8080 in the cluster.”</p>
</blockquote>
<p>Once you see something like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387838362/186ff222-c643-4e67-b0a4-dbaff8777977.png" alt="Result of port-forwarding the crdb-cockroachdb-public service on port 8080" class="image--center mx-auto" width="832" height="59" loading="lazy"></p>
<p>...you’re good to go!</p>
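<p>If the page doesn’t load in the next step, it helps to confirm that something is actually listening on your local port. Here’s a small sketch using only Python’s standard library. It assumes the port-forward is running on <code>127.0.0.1:8080</code>, as in the command above.</p>
<pre><code class="lang-python">import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8080 is the local side of the port-forward.
    print("console reachable:", port_is_open("127.0.0.1", 8080))
</code></pre>
<p>This only tells you the tunnel is open, not that the cluster is healthy. For that, you still need the dashboard itself.</p>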
<h3 id="heading-step-4-visit-the-dashboard">Step 4: Visit the Dashboard</h3>
<p>Now, open your browser and go to http://localhost:8080.</p>
<p>You’ll see the CockroachDB Admin Console. This is your central command center for monitoring your cluster.</p>
<p>Here, you’ll be able to view:</p>
<ul>
<li><p><strong>Number of replicas (nodes)</strong>: You should see 3 in our setup.</p>
</li>
<li><p><strong>RAM usage</strong> per node: Helps track how much memory each CockroachDB instance is using.</p>
</li>
<li><p><strong>CPU usage</strong>: Useful to know when your database is getting busy.</p>
</li>
<li><p><strong>Disk space</strong>: Shows how much data your cluster is storing and how much free space remains.</p>
</li>
</ul>
<p>Here’s what your dashboard might look like 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387968743/327288e5-4811-42bf-8fd8-74ed187792a4.png" alt="The CockroachDB dashboard UI on http://localhost:8080" class="image--center mx-auto" width="1858" height="952" loading="lazy"></p>
<h3 id="heading-step-5-exploring-the-metrics-dashboard">Step 5: Exploring the Metrics Dashboard</h3>
<p>Now that you’re inside the CockroachDB Admin Console (<a target="_blank" href="http://localhost:8080">http://localhost:8080</a>), let’s take things a step further by exploring the <strong>Metrics</strong> section. This is where CockroachDB really shines.</p>
<p>On the left-hand side, click on “Metrics.” Here, you’ll find a collection of dashboards showing how your database is performing behind the scenes: query activity, performance, memory use, and much more.</p>
<p>These metrics help you understand what’s happening inside your cluster and make data-driven decisions – like when to scale up, optimize queries, or add more nodes.</p>
<p>We’ll start by focusing on some of the most insightful ones, such as:</p>
<ul>
<li><p><strong>SQL Queries Per Second</strong> – how busy your database is</p>
</li>
<li><p><strong>Service Latency (SQL Statements, 99th percentile)</strong> – how fast or slow your queries are</p>
</li>
</ul>
<p>Then, we’ll also look at others like SQL Contention, Replicas per Node, and Capacity to get a complete view of your CockroachDB cluster’s health.</p>
<p>Here’s what each of these metrics means in simple, everyday terms 👇🏾</p>
<h4 id="heading-sql-queries-per-second">SQL Queries Per Second</h4>
<p>This metric shows the number of SQL commands (like <code>SELECT</code>, <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>) your database cluster is handling every second. In simpler words, it’s how busy your database is. Imagine cars passing through a toll booth – this is the count of cars per second.</p>
<p>This is useful to know because if this number is steadily climbing, your system is getting more traffic or work. You may need to scale up (more nodes, more resources) or optimize queries. If it drops suddenly, something might be wrong (for example, clients can no longer reach the database).</p>
<p>Look for a stable or expected value for your workload. Spikes or sustained high values mean you should check performance.</p>
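<p>To make “per second” concrete: a rate like this can be derived from two readings of a running total. The sketch below is only an illustration of that arithmetic, with made-up numbers. It isn’t a description of how the dashboard computes its chart.</p>
<pre><code class="lang-python">def queries_per_second(count_start, count_end, elapsed_seconds):
    """Average rate between two readings of a cumulative query counter."""
    if not elapsed_seconds:
        raise ValueError("elapsed_seconds must be non-zero")
    return (count_end - count_start) / elapsed_seconds


# Hypothetical readings taken 10 seconds apart.
print(queries_per_second(12_000, 12_450, 10))
</code></pre>
<p>Here, 450 queries arrived over 10 seconds, so the result is <code>45.0</code> queries per second.</p>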
<h4 id="heading-service-latency-sql-statements-99th-percentile">Service Latency: SQL Statements, 99th percentile</h4>
<p>This metric shows how long the slowest ~1% of SQL statements take, measured from when the database receives the request until it finishes executing it. Put another way, 99% of statements finish at or below this value. Think of waiting in a queue: the 99th percentile is the wait that only the slowest 1 in 100 people exceeded.</p>
<p>You’ll want to know this because if the slowest queries are taking too long, it might signal a bottleneck (CPU, disk, network, and so on). Low latency = good user experience.</p>
<p>So keep an eye out: if this value rises (gets worse) over time, investigate what’s slowing down. If it stays low and stable, you’re in good shape.</p>
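<p>Here’s a small worked example of why the 99th percentile is worth watching separately from the typical case. It uses the nearest-rank method, which is one common way to define a percentile. That method is an assumption for this sketch, not necessarily how CockroachDB calculates its chart, and the latencies are made up.</p>
<pre><code class="lang-python">import math


def percentile_nearest_rank(samples, pct):
    """Smallest sample with at least pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast statements and 2 slow ones.
latencies_ms = [5] * 98 + [250, 900]

print(percentile_nearest_rank(latencies_ms, 50))
print(percentile_nearest_rank(latencies_ms, 99))
</code></pre>
<p>The median comes out as <code>5</code> ms, while the 99th percentile is <code>250</code> ms. A handful of slow statements barely move the median, but they show up clearly at the 99th percentile.</p>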
<h4 id="heading-sql-statement-contention">SQL Statement Contention</h4>
<p>Statement contention shows the number of SQL queries that got “stuck” or had to wait because other queries were using the same data or resources. This is like two people trying to grab the same book: one has to wait. That waiting is contention.</p>
<p>High contention means your database is spending time on conflicts, waiting for locks or resources. This slows things down overall. So you’ll want to keep this number as low as possible. If it starts rising, you might need to revisit your schema, queries, or scale differently.</p>
<h4 id="heading-replicas-per-node">Replicas per Node</h4>
<p>This tells you how many copies (“replicas”) of data ranges live on each database node. If you imagine your data is like documents saved in several safes (nodes), this shows how many copies are in each safe.</p>
<p>This matters because you want balanced replicas, so that no node is overloaded with too many copies (which can slow it down or put it at risk).</p>
<p>To check on this, make sure nodes have roughly equal replica counts. If one node has many more replicas, you might need to rebalance or add nodes.</p>
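<p>“Roughly equal” can be turned into a number. One simple option, sketched below, is the gap between the fullest and emptiest node as a fraction of the average. The node names and counts are made up, and what counts as “too uneven” is a judgment call for your own cluster.</p>
<pre><code class="lang-python">def replica_spread(replicas_per_node):
    """Gap between the fullest and emptiest node, as a fraction of the mean."""
    counts = list(replicas_per_node.values())
    mean = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean


# Hypothetical replica counts for a three-node cluster.
balanced = {"node1": 60, "node2": 61, "node3": 59}
skewed = {"node1": 100, "node2": 40, "node3": 40}

print(round(replica_spread(balanced), 3))
print(round(replica_spread(skewed), 3))
</code></pre>
<p>The balanced example gives <code>0.033</code> (about a 3% gap), while the skewed one gives <code>1.0</code>, meaning the gap is as large as the average itself.</p>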
<h4 id="heading-capacity">Capacity</h4>
<p>Capacity shows how much disk/storage your cluster has (total), how much is used, and how much is free. Imagine a warehouse: it’s like how many boxes you can store, how many you’ve filled, and how much empty space remains.</p>
<p>You’ll need to know this because if capacity is nearly full, you risk running out of space, which can cause downtime or performance issues.</p>
<p>Usage should stay at a healthy level (for example, below ~80% of total capacity). If it crosses that, plan to add storage or nodes.</p>
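<p>As a quick worked example of the ~80% guideline, this sketch computes how much of the capacity is used and how many percentage points are left before the threshold. The byte values are made up, and the 80% default simply mirrors the rule of thumb above.</p>
<pre><code class="lang-python">GIB = 1024 ** 3


def used_percent(used_bytes, capacity_bytes):
    """Share of total capacity currently in use, as a percentage."""
    return 100 * used_bytes / capacity_bytes


def headroom_points(used_bytes, capacity_bytes, threshold_percent=80):
    """Percentage points left before the threshold (negative once past it)."""
    return threshold_percent - used_percent(used_bytes, capacity_bytes)


# Hypothetical cluster with 100 GiB of total capacity.
print(used_percent(30 * GIB, 100 * GIB))
print(headroom_points(30 * GIB, 100 * GIB))
print(headroom_points(90 * GIB, 100 * GIB))
</code></pre>
<p>At 30 GiB used, you’re at <code>30.0</code>% with <code>50.0</code> points of headroom. At 90 GiB, the headroom is <code>-10.0</code>, which is the signal to add storage or nodes.</p>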
<h4 id="heading-why-these-matter-together">Why These Matter Together</h4>
<p>When you combine these metrics, you get a clear picture:</p>
<ul>
<li><p>High Queries Per Second + high latency = maybe you're under-powered.</p>
</li>
<li><p>High contention = your workload design might be fighting itself.</p>
</li>
<li><p>Imbalanced replicas or full capacity = infrastructure issues.</p>
</li>
<li><p>Stable low latency + balanced replicas + plenty of capacity = sounds like a healthy cluster.</p>
</li>
</ul>
<p>So by keeping an eye on these, you make data-driven decisions: when to scale, when to optimize, when to tweak configs.</p>
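<p>The combinations above can be written down as a tiny checklist. This sketch takes yes/no observations that you make yourself from the dashboards and returns plain-language hints. The function name and wording are made up for illustration, and it deliberately avoids hard-coding thresholds, since those depend on your workload.</p>
<pre><code class="lang-python">def diagnose(high_qps=False, high_latency=False, high_contention=False,
             imbalanced_replicas=False, low_capacity=False):
    """Turn yes/no observations from the dashboards into plain-language hints."""
    hints = []
    if high_qps and high_latency:
        hints.append("cluster may be under-powered")
    if high_latency and not high_qps:
        hints.append("latency is high without heavy traffic; look for a bottleneck")
    if high_contention:
        hints.append("workload design may be fighting itself")
    if imbalanced_replicas or low_capacity:
        hints.append("infrastructure needs attention")
    # Low latency, balanced replicas, and free capacity together look healthy.
    if not hints:
        hints.append("looks healthy")
    return hints


print(diagnose(high_qps=True, high_latency=True))
print(diagnose())
</code></pre>
<p>The first call prints <code>['cluster may be under-powered']</code>, and the second prints <code>['looks healthy']</code>.</p>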
<h3 id="heading-step-6-creating-a-little-load-on-the-cockroachdb-cluster">Step 6: Creating a Little Load on the CockroachDB Cluster</h3>
<p>So far, we’ve explored the CockroachDB dashboard and understood what each metric means. Now, let’s make things a bit more fun. 🎉</p>
<p>In this part, we’ll run a simple Python app that connects to our CockroachDB cluster and performs a few database operations (creating, updating, deleting, and retrieving some records). This will help us generate a small load on the database so we can actually see the metrics in action.</p>
<p>Here’s what we’ll be doing step-by-step 👇🏾</p>
<h4 id="heading-step-61-create-a-configmap-for-our-books-data">Step 6.1: Create a ConfigMap for Our Books Data</h4>
<p>We’ll first create a list of 20 books that our Python script will interact with. Each book will have basic info like name, author, genre, pages, and price.</p>
<ol>
<li><p>Create a new file called <code>books.json</code></p>
<ul>
<li><p>On Linux:</p>
<pre><code class="lang-bash">  nano books.json
</code></pre>
<p>  Paste the JSON content below into it.</p>
<pre><code class="lang-json">  [
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Bright Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Ava Hart"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783218196000"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2020</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">234</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">10.99</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Library"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Liam Stone"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783863794026"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1993</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">358</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">30.2</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Shadow Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Maya Chen"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781615594078"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2001</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">404</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"History"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">16.21</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Bright Voyage"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Noah Rivers"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785931034133"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1987</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">507</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">13.14</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Shadow Garden"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Zara Malik"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785534192834"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2004</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">404</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Sci-Fi"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">28.13</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Ethan Brooks"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785030564135"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2009</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">508</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">20.79</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Atomic Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Iris Park"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9787242388493"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2025</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">442</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">18.5</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The First Library"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Caleb Nguyen"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9787101226911"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2017</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">528</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">24.47</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal River"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Sofia Diaz"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781845146276"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2004</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">599</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">31.15</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Jude Bennett"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784893252883"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1996</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">632</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">40.47</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Last Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Nina Volkova"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784303911713"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2018</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">451</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"History"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">29.53</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Garden"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Omar Haddad"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784896383461"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1988</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">251</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Thriller"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">36.38</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Silent Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Priya Kapoor"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781509839308"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2008</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">649</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">28.05</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Felix Romero"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781834738291"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2025</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">180</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">19.15</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Lost Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Tara Quinn"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781165667017"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2010</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">368</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">41.37</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Last Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Hana Sato"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783387262476"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2005</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">467</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">42.01</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Leo Fischer"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9780801326776"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1984</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">573</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">42.31</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Mila Novak"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784746872343"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2005</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">180</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">16.58</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Arthur Wells"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9780097882086"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1983</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">713</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">39.42</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Silent Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Selene Ortiz"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781939909169"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1991</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">190</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">33.79</span>
    }
  ]
</code></pre>
<p>  To save and close the file in nano:</p>
<ul>
<li><p>Press <code>CTRL + O</code> → then <code>ENTER</code> (to save)</p>
</li>
<li><p>Press <code>CTRL + X</code> (to exit the editor)</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Then create a ConfigMap from the file:</p>
<pre><code class="lang-bash"> kubectl create configmap books-json --from-file=books.json
</code></pre>
</li>
</ol>
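<p>Before moving on, it can be worth sanity-checking the data. The table created in the next step declares <code>isbn</code> as <code>UNIQUE</code>, so a duplicated ISBN would make an insert fail. Here’s a minimal sketch of such a check. The <code>check_books</code> helper is made up for this example, and it runs on two entries copied from the file rather than reading <code>books.json</code> itself.</p>
<pre><code class="lang-python">REQUIRED_KEYS = {"name", "author", "isbn", "published_year", "pages", "genre", "price"}


def check_books(books):
    """Return a list of problems found in a parsed books list (empty if none)."""
    problems = []
    seen_isbns = set()
    for index, book in enumerate(books):
        missing = REQUIRED_KEYS - set(book)
        if missing:
            problems.append(f"book {index}: missing {sorted(missing)}")
        isbn = book.get("isbn")
        if isbn in seen_isbns:
            problems.append(f"book {index}: duplicate isbn {isbn}")
        seen_isbns.add(isbn)
    return problems


# Two entries copied from books.json, used here as a small sample.
sample = [
    {"name": "The Bright Signal", "author": "Ava Hart", "isbn": "9783218196000",
     "published_year": 2020, "pages": 234, "genre": "Fantasy", "price": 10.99},
    {"name": "The Hidden Library", "author": "Liam Stone", "isbn": "9783863794026",
     "published_year": 1993, "pages": 358, "genre": "Romance", "price": 30.2},
]

print(check_books(sample))
</code></pre>
<p>An empty list means no problems were found. To check the real file, load it with <code>json.load</code> and pass the result to the same function.</p>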
<h4 id="heading-step-62-create-the-python-script-configmap">Step 6.2: Create the Python Script ConfigMap</h4>
<p>Next, we’ll create a simple Python script that:</p>
<ul>
<li><p>Creates a new table for books</p>
</li>
<li><p>Inserts 20 records</p>
</li>
<li><p>Updates 7 of them</p>
</li>
<li><p>Deletes 5</p>
</li>
<li><p>Retrieves 15 books from the database</p>
</li>
</ul>
<p>It’s like simulating a small library app. 📚</p>
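<p>One detail worth noticing: the script below picks its books with list slices (the first 7 to update, the last 5 to delete, and the first 15 to read back). This short sketch uses placeholder ISBNs instead of the real data to show that the books being read back never overlap with the ones that were deleted.</p>
<pre><code class="lang-python"># Placeholder stand-ins for the 20 entries in books.json, in file order.
isbns = [f"isbn-{i:02d}" for i in range(20)]

updated = isbns[:7]      # first 7 books get a new price and page count
deleted = isbns[-5:]     # last 5 books are removed
retrieved = isbns[:15]   # first 15 books are read back

remaining = [isbn for isbn in isbns if isbn not in deleted]

print(len(remaining))
print(set(retrieved).isdisjoint(deleted))
print(retrieved == remaining)
</code></pre>
<p>This prints <code>15</code>, <code>True</code>, and <code>True</code>: after the deletes, exactly the 15 books that get read back are still in the table, so every retrieval should find its row.</p>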
<p>Create a new file called <code>books-script.yml</code> and paste the content below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-script</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">run.py:</span> <span class="hljs-string">|
    #!/usr/bin/env python3
    import argparse
    import json
    import os
    import sys
    import time
    from typing import List, Dict
</span>
    <span class="hljs-string">import</span> <span class="hljs-string">psycopg</span>
    <span class="hljs-string">from</span> <span class="hljs-string">psycopg.rows</span> <span class="hljs-string">import</span> <span class="hljs-string">dict_row</span>

    <span class="hljs-string">DDL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    CREATE TABLE IF NOT EXISTS books (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name STRING NOT NULL,
        author STRING NOT NULL,
        isbn STRING UNIQUE,
        published_year INT4,
        pages INT4,
        genre STRING,
        price DECIMAL(10,2),
        created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">INSERT_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    INSERT INTO books (name, author, isbn, published_year, pages, genre, price)
    VALUES (%s, %s, %s, %s, %s, %s, %s);
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">UPDATE_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    UPDATE books
    SET price = %s, pages = %s
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">DELETE_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    DELETE FROM books
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">GET_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    SELECT id, name, author, isbn, published_year, pages, genre, price, created_at
    FROM books
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">def</span> <span class="hljs-string">load_books(path:</span> <span class="hljs-string">str)</span> <span class="hljs-string">-&gt;</span> <span class="hljs-string">List[Dict]:</span>
        <span class="hljs-string">with</span> <span class="hljs-string">open(path,</span> <span class="hljs-string">"r"</span><span class="hljs-string">)</span> <span class="hljs-attr">as f:</span>
            <span class="hljs-string">return</span> <span class="hljs-string">json.load(f)</span>

    <span class="hljs-string">def</span> <span class="hljs-string">connect_with_retry(dsn:</span> <span class="hljs-string">str,</span> <span class="hljs-attr">attempts:</span> <span class="hljs-string">int</span> <span class="hljs-string">=</span> <span class="hljs-number">30</span><span class="hljs-string">,</span> <span class="hljs-attr">delay:</span> <span class="hljs-string">float</span> <span class="hljs-string">=</span> <span class="hljs-number">2.0</span><span class="hljs-string">):</span>
        <span class="hljs-string">last_exc</span> <span class="hljs-string">=</span> <span class="hljs-string">None</span>
        <span class="hljs-string">for</span> <span class="hljs-string">_</span> <span class="hljs-string">in</span> <span class="hljs-string">range(attempts):</span>
            <span class="hljs-attr">try:</span>
                <span class="hljs-string">conn</span> <span class="hljs-string">=</span> <span class="hljs-string">psycopg.connect(dsn,</span> <span class="hljs-string">autocommit=False)</span>
                <span class="hljs-string">return</span> <span class="hljs-string">conn</span>
            <span class="hljs-attr">except Exception as e:</span>
                <span class="hljs-string">last_exc</span> <span class="hljs-string">=</span> <span class="hljs-string">e</span>
                <span class="hljs-string">time.sleep(delay)</span>
        <span class="hljs-string">raise</span> <span class="hljs-string">last_exc</span>

    <span class="hljs-string">def</span> <span class="hljs-string">main():</span>
        <span class="hljs-string">ap</span> <span class="hljs-string">=</span> <span class="hljs-string">argparse.ArgumentParser()</span>
        <span class="hljs-string">ap.add_argument("--dsn",</span> <span class="hljs-string">required=True,</span> <span class="hljs-string">help="Postgres/CockroachDB</span> <span class="hljs-string">DSN")</span>
        <span class="hljs-string">ap.add_argument("--json",</span> <span class="hljs-string">default="/app/books.json",</span> <span class="hljs-string">help="Path</span> <span class="hljs-string">to</span> <span class="hljs-string">books</span> <span class="hljs-string">JSON")</span>
        <span class="hljs-string">args</span> <span class="hljs-string">=</span> <span class="hljs-string">ap.parse_args()</span>

        <span class="hljs-string">books</span> <span class="hljs-string">=</span> <span class="hljs-string">load_books(args.json)</span>
        <span class="hljs-string">print(f"Loaded</span> {<span class="hljs-string">len(books)</span>} <span class="hljs-string">books")</span>

        <span class="hljs-string">conn</span> <span class="hljs-string">=</span> <span class="hljs-string">connect_with_retry(args.dsn)</span>
        <span class="hljs-string">conn.row_factory</span> <span class="hljs-string">=</span> <span class="hljs-string">dict_row</span>
        <span class="hljs-attr">try:</span>
            <span class="hljs-attr">with conn:</span>
                <span class="hljs-string">with</span> <span class="hljs-string">conn.cursor()</span> <span class="hljs-attr">as cur:</span>
                    <span class="hljs-string">print("Creating</span> <span class="hljs-string">table...")</span>
                    <span class="hljs-string">cur.execute(DDL)</span>

                    <span class="hljs-string">print("Inserting</span> <span class="hljs-number">20</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:20]:</span>
                        <span class="hljs-string">cur.execute(INSERT_SQL,</span> <span class="hljs-string">(</span>
                            <span class="hljs-string">b["name"],</span> <span class="hljs-string">b["author"],</span> <span class="hljs-string">b["isbn"],</span>
                            <span class="hljs-string">b.get("published_year"),</span> <span class="hljs-string">b.get("pages"),</span>
                            <span class="hljs-string">b.get("genre"),</span> <span class="hljs-string">b.get("price"),</span>
                        <span class="hljs-string">))</span>

                    <span class="hljs-string">print("Updating</span> <span class="hljs-number">7</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:7]:</span>
                        <span class="hljs-string">new_price</span> <span class="hljs-string">=</span> <span class="hljs-string">round(float(b.get("price",</span> <span class="hljs-number">10</span><span class="hljs-string">))</span> <span class="hljs-string">+</span> <span class="hljs-number">1.23</span><span class="hljs-string">,</span> <span class="hljs-number">2</span><span class="hljs-string">)</span>
                        <span class="hljs-string">new_pages</span> <span class="hljs-string">=</span> <span class="hljs-string">int(b.get("pages",</span> <span class="hljs-number">100</span><span class="hljs-string">))</span> <span class="hljs-string">+</span> <span class="hljs-number">5</span>
                        <span class="hljs-string">cur.execute(UPDATE_SQL,</span> <span class="hljs-string">(new_price,</span> <span class="hljs-string">new_pages,</span> <span class="hljs-string">b["isbn"]))</span>

                    <span class="hljs-string">print("Deleting</span> <span class="hljs-number">5</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[-5:]:</span>
                        <span class="hljs-string">cur.execute(DELETE_SQL,</span> <span class="hljs-string">(b["isbn"],))</span>

                    <span class="hljs-string">print("Performing</span> <span class="hljs-number">15</span> <span class="hljs-string">retrievals...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:15]:</span>
                        <span class="hljs-string">cur.execute(GET_SQL,</span> <span class="hljs-string">(b["isbn"],))</span>
                        <span class="hljs-string">row</span> <span class="hljs-string">=</span> <span class="hljs-string">cur.fetchone()</span>
                        <span class="hljs-attr">if row:</span>
                            <span class="hljs-string">print(f"GET</span> {<span class="hljs-string">b</span>[<span class="hljs-string">'isbn'</span>]}<span class="hljs-string">:</span> {<span class="hljs-string">row</span>[<span class="hljs-string">'name'</span>]} <span class="hljs-string">by</span> {<span class="hljs-string">row</span>[<span class="hljs-string">'author'</span>]} <span class="hljs-string">(${row['price']})")</span>
                        <span class="hljs-attr">else:</span>
                            <span class="hljs-string">print(f"GET</span> {<span class="hljs-string">b</span>[<span class="hljs-string">'isbn'</span>]}<span class="hljs-string">:</span> <span class="hljs-string">not</span> <span class="hljs-string">found</span> <span class="hljs-string">(possibly</span> <span class="hljs-string">deleted)")</span>

            <span class="hljs-string">print("All</span> <span class="hljs-string">operations</span> <span class="hljs-string">completed.")</span>
        <span class="hljs-attr">finally:</span>
            <span class="hljs-string">conn.close()</span>

    <span class="hljs-string">if</span> <span class="hljs-string">__name__</span> <span class="hljs-string">==</span> <span class="hljs-attr">"__main__":</span>
        <span class="hljs-string">main()</span>
</code></pre>
<p>This script connects to the CockroachDB cluster, creates a table (if it doesn’t exist), and performs all those operations in sequence.</p>
<p>It runs around 50 SQL queries in total – a mix of <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>, and <code>SELECT</code> statements.</p>
<p>Now apply it:</p>
<pre><code class="lang-bash">kubectl apply -f books-script.yml
</code></pre>
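<p>As a rough sanity check on that “around 50” figure, the sketch below tallies the statements the script above issues. It assumes, hypothetically, that <code>books.json</code> holds exactly 20 books. The <code>isbns</code> list and its placeholder values are made up for illustration.</p>
<pre><code class="lang-python"># Hypothetical stand-ins for the ISBNs in books.json (assumed: exactly 20 books).
isbns = [f"isbn-{i:02d}" for i in range(20)]

# Mirror the slices used by the script: books[:20], books[:7], books[-5:], books[:15].
statements = {
    "CREATE TABLE": 1,
    "INSERT": len(isbns[:20]),
    "UPDATE": len(isbns[:7]),
    "DELETE": len(isbns[-5:]),
    "SELECT": len(isbns[:15]),
}
total = sum(statements.values())

# Which of the 15 lookups target a row that was just deleted?
deleted = set(isbns[-5:])
misses = [isbn for isbn in isbns[:15] if isbn in deleted]

print(statements)
print("total statements:", total)                     # 48
print("lookups hitting deleted rows:", len(misses))   # 0
</code></pre>
<p>Under that assumption, the script sends 48 statements and leaves 15 rows in the table. If your <code>books.json</code> is longer, the last five entries may never have been inserted, so the deletes could match nothing.</p>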
<h4 id="heading-step-63-create-the-job-to-run-the-script">Step 6.3: Create the Job to Run the Script</h4>
<p>Next, let’s create a Kubernetes Job that will actually run our Python script inside a container.</p>
<p>Create a file called <code>books-job.yml</code> and paste the manifest below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Job</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-job</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">runner</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">python:3.12-slim</span>
          <span class="hljs-attr">env:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">CRDB_DSN</span>
              <span class="hljs-attr">value:</span> <span class="hljs-string">"postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable"</span>
          <span class="hljs-attr">command:</span> [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-lc"</span>]
          <span class="hljs-attr">args:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">|
              pip install --no-cache-dir "psycopg[binary]&gt;=3.1,&lt;3.3" &amp;&amp; \
              python /app/run.py --dsn "$CRDB_DSN" --json /app/books.json
</span>          <span class="hljs-attr">volumeMounts:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
              <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/app/run.py</span>
              <span class="hljs-attr">subPath:</span> <span class="hljs-string">run.py</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
              <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/app/books.json</span>
              <span class="hljs-attr">subPath:</span> <span class="hljs-string">books.json</span>
      <span class="hljs-attr">volumes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
          <span class="hljs-attr">configMap:</span>
            <span class="hljs-attr">name:</span> <span class="hljs-string">books-script</span>
            <span class="hljs-attr">defaultMode:</span> <span class="hljs-number">0555</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
          <span class="hljs-attr">configMap:</span>
            <span class="hljs-attr">name:</span> <span class="hljs-string">books-json</span>
</code></pre>
<p>Here’s what’s happening:</p>
<ul>
<li><p>The Job runs a container based on Python 3.12-slim.</p>
</li>
<li><p>It connects to CockroachDB using the connection string <code>postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable</code>. Note the <code>sslmode=disable</code> parameter: it’s there because we disabled TLS in our Helm values earlier.</p>
</li>
<li><p>The Job mounts the two ConfigMaps we created earlier (<code>books-json</code> and <code>books-script</code>) as <strong>volumes</strong> inside the container. Think of volumes like small external drives that the container can read from.</p>
</li>
</ul>
<p>Apply it:</p>
<pre><code class="lang-bash">kubectl apply -f books-job.yml
</code></pre>
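<p>If the connection string in the Job looks dense, it can help to pull it apart. This is a small sketch using only Python’s standard library (it isn’t part of the Job itself) that shows which piece of the DSN is the user, host, port, database, and options.</p>
<pre><code class="lang-python">from urllib.parse import urlsplit, parse_qs

# The same DSN that the Job passes in through the CRDB_DSN environment variable.
dsn = "postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable"
parts = urlsplit(dsn)

print("user:    ", parts.username)            # root
print("host:    ", parts.hostname)            # crdb-cockroachdb-public
print("port:    ", parts.port)                # 26257
print("database:", parts.path.lstrip("/"))    # defaultdb
print("options: ", parse_qs(parts.query))     # {'sslmode': ['disable']}
</code></pre>
<p>The host is the name of the Kubernetes Service, which is why this DSN only resolves from inside the cluster.</p>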
<h4 id="heading-step-64-check-if-the-job-ran-successfully">Step 6.4: Check if the Job Ran Successfully</h4>
<p>After a minute or two, check your pods:</p>
<pre><code class="lang-bash">kubectl get po
</code></pre>
<p>If you see <code>books-job-xxx</code> with the status <strong>Completed</strong>, then your script ran successfully 🎉</p>
<p>That means our database just got a nice little workout – some records were created, updated, deleted, and read.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460118429/99ed49a3-52e9-4357-ba2b-9295f0dfbdc8.png" alt="The Completed state of the Books Job" class="image--center mx-auto" width="530" height="124" loading="lazy"></p>
<h3 id="heading-step-7-viewing-the-metrics-from-the-load">Step 7: Viewing the Metrics from the Load</h3>
<p>Now that we’ve generated a small load, let’s jump back to the CockroachDB dashboard.</p>
<p>Head to the Metrics section, and under SQL Queries Per Second, you should see a little spike: this shows the activity from our Python job.👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460175366/6c1e129e-c8bd-4f41-89de-60a1a753026e.png" alt="The SQL Queries Per Second Metric" class="image--center mx-auto" width="972" height="526" loading="lazy"></p>
<p>Hover your mouse over the graph lines to see exact numbers.</p>
<p>Do the same for Service Latency: SQL Statements (99th percentile). You’ll notice a few bumps showing how long some of the queries took.👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460224971/8ba9d5ed-0724-4dc6-82f4-7e5d0d05be82.png" alt="The Service Latency Metric" class="image--center mx-auto" width="973" height="410" loading="lazy"></p>
<p>This small experiment gives you a real feel for how CockroachDB reacts under activity, even a tiny one.</p>
<p>To explore more metrics and dashboards, check out the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/ui-overview-dashboard">official CockroachDB documentation here</a>.</p>
<h3 id="heading-step-8-view-the-list-of-created-items-in-the-database">Step 8: View the List of Created Items in the Database</h3>
<p>Now that our Python job ran and touched the database (creating, updating, deleting, retrieving records), let’s check the content of our <code>books</code> table just to verify everything really happened.</p>
<p>First, we’ll create another Kubernetes job (or pod) that connects to our CockroachDB cluster and runs a simple SQL query <code>SELECT * FROM books;</code>. This pulls out all the remaining records in the table.</p>
<p>Here’s the manifest to use. Create a file named <code>view-books.yml</code> and paste the below content inside it:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Job</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">view-books</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">client</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.2</span>
          <span class="hljs-attr">command:</span> [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-lc"</span>]
          <span class="hljs-attr">args:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">|
              cockroach sql \
                --insecure \
                --host=crdb-cockroachdb-public:26257 \
                --database=defaultdb \
                --format=records \
                --execute="SELECT * FROM public.books;"</span>
</code></pre>
<p>Note: We use the <code>--insecure</code> flag because we turned off TLS in our Minikube config (it plays the same role as <code>sslmode=disable</code> in the earlier connection string). This job mounts nothing fancy. It just spins up, connects to the database, runs the <code>SELECT</code>, and displays the result.</p>
<p>Run the job:</p>
<pre><code class="lang-bash">kubectl apply -f view-books.yml
</code></pre>
<p>Wait a minute, then check the pod status:</p>
<pre><code class="lang-bash">kubectl get po
</code></pre>
<p>Look for something like <code>view-books-xxx</code> in <strong>Completed</strong> state.</p>
<p>Finally, view the job logs to see the actual records:</p>
<pre><code class="lang-bash">kubectl logs job/view-books
</code></pre>
<p>You’ll see output similar to the below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761462270132/c881eca7-18b0-4647-a6b1-2841e7774969.png" alt="The list of created books in the books table in the CockroachDB database" class="image--center mx-auto" width="631" height="837" loading="lazy"></p>
<h2 id="heading-backing-up-cockroachdb-to-google-cloud-storage">Backing Up CockroachDB to Google Cloud Storage ☁️</h2>
<p>In this section we’ll explain how you can automate backups of your CockroachDB cluster using simple SQL commands, service accounts (for authenticating to Google Cloud), and Google Cloud Storage (where the data will be stored).</p>
<h3 id="heading-why-backups-are-absolutely-critical">Why Backups Are Absolutely Critical</h3>
<p>Imagine you’ve built your cluster on Kubernetes, and everything’s humming along for weeks or months. You’ve got tens or hundreds of gigabytes of data and 10k+ users relying on it.</p>
<p>Then <strong>BAM!</strong> Something happens. Maybe someone accidentally overwrote the Helm release (<code>helm upgrade --install …</code> with the same release name, for example <code>crdb</code>), or a cloud disk got deleted, or a critical node failed and you lost the majority of data replicas. That’s the nightmare we all dread 😭.</p>
<p>Mistakes happen, even if you’re super careful. What matters most is: How fast and easily could you recover?</p>
<p>That’s why we’ll set up <strong>daily backups</strong> of our CockroachDB cluster, targeting a Google Cloud Storage bucket. (Quick note: Google Cloud Storage is a service where you can store large amounts of data in the cloud as “objects”. You can upload, store, and retrieve data from it, much like Google Drive or iCloud. 😃)</p>
<p>With your backups going into a storage bucket, if disaster strikes, you can restore the entire cluster (or specific databases/tables) in minutes or hours – instead of days or losing data forever.</p>
<h3 id="heading-connecting-to-our-db-installing-beekeeper-studio">Connecting to Our DB – Installing Beekeeper Studio</h3>
<p>So far, we’ve been connecting to our database programmatically, running commands from pods or jobs inside Kubernetes. But what if there was a <em>more visual</em> and <em>user-friendly</em> way to explore our data?</p>
<p>Well, meet my friend <strong>Beekeeper Studio.</strong> 🙂</p>
<p>Beekeeper Studio is a sleek, open-source database management tool that lets you connect to a wide range of databases like PostgreSQL, MySQL, SQLite, and (most importantly for us) CockroachDB.</p>
<p>It comes with a simple, modern interface for running queries, browsing tables, and viewing data – no need to jump into pods or remember command-line flags 😄</p>
<h3 id="heading-how-to-install-beekeeper-studio">How to Install Beekeeper Studio</h3>
<ol>
<li><p>Visit the official Beekeeper Studio download page here: <a target="_blank" href="https://www.beekeeperstudio.io/get">https://www.beekeeperstudio.io/get</a></p>
</li>
<li><p>Click the “Skip to the download” link. You’ll see something like this:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761542821015/2e7a0fd5-7047-4090-97fb-46b81a3dd638.png" alt="Finding the button to skip to the download page on the Beekeeper Studio website" class="image--center mx-auto" width="874" height="547" loading="lazy"></p>
</li>
<li><p>You’ll be redirected to a page listing download options for different operating systems.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761542877590/6034dcf0-d9b0-447b-bd2b-089458729db7.png" alt="Page to select download option according to the user OS" class="image--center mx-auto" width="924" height="727" loading="lazy"></p>
</li>
<li><p>Choose your OS and download the correct installer.</p>
</li>
<li><p>Afterwards, install Beekeeper Studio by following the usual steps for your OS.</p>
</li>
</ol>
<h3 id="heading-connecting-beekeeper-studio-to-cockroachdb">Connecting Beekeeper Studio to CockroachDB</h3>
<p>Now that we’ve installed Beekeeper Studio, it’s time to connect it to our CockroachDB cluster running inside Minikube.</p>
<p>But before we jump in, here’s something important to note:👇🏾</p>
<p>Our CockroachDB cluster is running INSIDE Kubernetes, and by default, it’s not accessible from outside the cluster.</p>
<p>To confirm this, run:</p>
<pre><code class="lang-bash">kubectl get svc crdb-cockroachdb-public
</code></pre>
<p>You should see something like this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544640270/2cf9f8f1-15f1-459b-acd0-63b1c361fa54.png" alt="The CockroachDB service being of type ClusterIP" class="image--center mx-auto" width="709" height="63" loading="lazy"></p>
<p>Notice that the service is of type <strong>ClusterIP</strong>. That means the service can only be accessed by other pods INSIDE the Minikube cluster – not from your laptop or external apps.</p>
<h3 id="heading-exposing-the-cluster-for-local-access">Exposing the Cluster for Local Access</h3>
<p>To make our database accessible from your local machine (so Beekeeper Studio can reach it), we’ll use <strong>Kubernetes Port Forwarding</strong>.</p>
<p>In a new terminal tab, run:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26257
</code></pre>
<p>This command tells Kubernetes to forward your local port 26257 to the CockroachDB service’s port 26257 inside the cluster. Keep this terminal open: the forwarding stops as soon as the command exits.</p>
<p>Once it’s running, your CockroachDB instance will be accessible at <code>localhost:26257</code>.<br>(Note: it’s not accessible via your browser because this isn’t an HTTP endpoint 😅)</p>
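<p>If you’d like to confirm the forward is up before opening Beekeeper Studio, a quick TCP check is enough. Here’s a minimal sketch (the helper name <code>port_is_open</code> is made up for this example). It only tests that something is listening on the port and doesn’t speak the database protocol.</p>
<pre><code class="lang-python">import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False

print("localhost:26257 reachable:", port_is_open("localhost", 26257))
</code></pre>
<p>It should print <code>True</code> while <code>kubectl port-forward</code> is running and <code>False</code> once you stop it.</p>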
<h3 id="heading-connecting-via-beekeeper-studio">🐝 Connecting via Beekeeper Studio</h3>
<ol>
<li><p>Open Beekeeper Studio.</p>
</li>
<li><p>Click on the dropdown that says “Select a connection type…”.</p>
</li>
<li><p>Choose CockroachDB from the list.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544886889/98443b46-574d-4bcc-a41c-d2daa7412201.png" alt="Selecting CockroachDB as a connection type in Beekeeper Studio" class="image--center mx-auto" width="694" height="496" loading="lazy"></p>
</li>
<li><p>In the connection window that pops up:</p>
<ul>
<li><p>Disable the <code>Enable SSL</code> option.</p>
</li>
<li><p>Set User to <code>root</code></p>
</li>
<li><p>Set Default Database to <code>defaultdb</code></p>
</li>
<li><p>Set Host to <code>localhost</code></p>
</li>
<li><p>Set Port to <code>26257</code></p>
</li>
</ul>
</li>
<li><p>Now click <strong>Test</strong> (bottom right corner). You should see a success message like <em>Connection looks good</em>.</p>
</li>
</ol>
<p>Your setup should look like this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544818021/0248173e-9969-433c-a9d4-e83684bf34cf.png" alt="Connecting to the CockroachDB cluster from the Beekeeper Studio software" class="image--center mx-auto" width="808" height="709" loading="lazy"></p>
<p>Finally, click Connect (right beside the Test button).</p>
<h3 id="heading-verify-the-connection">Verify the Connection</h3>
<p>Once connected, you’ll land on a clean workspace where you can run SQL queries.</p>
<p>To confirm you’re connected to the right cluster, run:</p>
<pre><code class="lang-sql">SELECT * FROM books;
</code></pre>
<p>You should see a table containing about 15 books (the 20 we inserted earlier, minus the 5 the script deleted):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761545094817/99ef4415-bd0d-4452-817f-380996485397.png" alt="List of books in the CockroachDB database" class="image--center mx-auto" width="851" height="749" loading="lazy"></p>
<p>And there you go. You’ve now connected Beekeeper Studio to your CockroachDB running inside Minikube! 🚀</p>
<h3 id="heading-creating-a-google-cloud-account">Creating a Google Cloud Account</h3>
<p>Before we can back up our CockroachDB data to Google Cloud Storage, we need to have a Google Cloud account ready.</p>
<h4 id="heading-step-1-visit-the-google-cloud-console">Step 1: Visit the Google Cloud Console</h4>
<p>Head over to 👉🏾 <a target="_blank" href="https://console.cloud.google.com">https://console.cloud.google.com</a></p>
<p>If you don’t have a Google account yet, don’t worry. The process is simple and self-explanatory once you visit the site :). You’ll be guided to create a Google account first, and then your Google Cloud account.</p>
<h4 id="heading-step-2-create-or-use-a-project">Step 2: Create or Use a Project</h4>
<p>Once you’re in the Google Cloud Console, you’ll either:</p>
<ul>
<li><p>Use the <strong>default project</strong> that was automatically created for you, <strong>or</strong></p>
</li>
<li><p>Create a new one by clicking on <strong>“New Project”</strong> and naming it <code>crdb-tutorial</code>.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761546797213/295c7b09-9bb8-4c34-85cf-8701242b2768.png" alt="Creating a new Project in our Google Cloud account" class="image--center mx-auto" width="566" height="527" loading="lazy"></p>
<p>Projects are like folders that contain all your Google Cloud resources: compute instances, storage buckets, databases, and more.</p>
<h4 id="heading-step-3-link-a-billing-account-optional-but-recommended">Step 3: Link a Billing Account (Optional but Recommended)</h4>
<p>If you already have a billing account, link it to your project.</p>
<p>If not, you can easily create one by <a target="_blank" href="https://docs.cloud.google.com/billing/docs/how-to/create-billing-account">following Google’s instructions here</a>. (You’ll need a valid debit or credit card.)</p>
<p>Don’t worry if your card doesn’t link right away. Sometimes Google’s billing system can be picky. 😅</p>
<p>Here’s a quick fix that usually works:</p>
<ol>
<li><p>Add your card to Google Pay first.</p>
</li>
<li><p>Then go to Google Subscriptions in your Google account, and link it to your Google Billing Account.</p>
</li>
</ol>
<p>To add your card via Google Subscriptions, <a target="_blank" href="https://myaccount.google.com/payments-and-subscriptions">visit here</a>. (You need to have a Google account first. Don’t worry, the site will direct you on what to do if you don’t.)</p>
<p>You’ll see a page like this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761546938934/9e983134-dd7e-49b1-85a7-cd12bd01bf67.png" alt="Adding a card to Google Subscriptions" class="image--center mx-auto" width="887" height="403" loading="lazy"></p>
<p>Click Manage payment methods, then add your card details.</p>
<p>Once you’ve done that, refresh your Google Billing Account page – you should now see your card as one of the available options.</p>
<h3 id="heading-creating-a-google-cloud-storage-bucket">Creating a Google Cloud Storage Bucket</h3>
<p>Now that we’ve set up our Google Cloud account and enabled billing, let’s create a Cloud Storage Bucket. This is simply a location (like an online folder) where our CockroachDB backup files will be stored.</p>
<p>In your Google Cloud console, type “storage” in the search bar at the top. From the dropdown results, click on “Cloud Storage”:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089121918/c737c3e1-e45f-48e1-aed9-99e273583425.png" alt="Navigating to the Cloud Storage page" class="image--center mx-auto" width="553" height="197" loading="lazy"></p>
<p>On the new page, click on the “Buckets” link in the side menu, then click the “Create Bucket” button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089164660/8b9336fc-c0c3-4811-ab98-d3538596ee5a.png" alt="Creating a new Bucket in Cloud Storage" class="image--center mx-auto" width="749" height="650" loading="lazy"></p>
<p>Give your bucket a unique name, like <em>cockroachdb-backup-</em> followed by a few random characters. For example, <em>cockroachdb-backup-i8wu</em> or <em>cockroachdb-backup-7gw8u</em>. The random characters help make your bucket name globally unique (no other Google Cloud user can have the same name).</p>
<p>Scroll to the bottom and click “Create” to create your bucket.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089287083/a376f695-81b8-4f5a-80a7-cd563c8b4c81.png" alt="Creating your Bucket in Google Cloud Storage" class="image--center mx-auto" width="552" height="889" loading="lazy"></p>
<p>You’ll see a pop-up asking you to <strong>confirm public access prevention</strong>. This means that only you (and people you explicitly give access to) can view or edit your bucket. Make sure the “Enforce public access prevention on this bucket” checkbox is checked, then click “Confirm.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089404876/38c8e6b5-0de0-4771-9bed-9334f8f8c43a.png" alt="Preventing random users from accessing your bucket" class="image--center mx-auto" width="571" height="391" loading="lazy"></p>
<p>Perfect! 🎉 You’ve now created a storage bucket where your CockroachDB backups will live.</p>
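<p>If you’d rather not invent the random suffix by hand, here’s a small sketch that generates a name in the same shape as the examples above. The helper names are made up, and <code>looks_valid</code> is only a loose check of the basic naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; a letter or digit at each end). It can’t tell you whether the name is already taken.</p>
<pre><code class="lang-python">import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits

def make_bucket_name(prefix="cockroachdb-backup", suffix_len=4):
    """Append a random lowercase/digit suffix to the prefix."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_len))
    return f"{prefix}-{suffix}"

def looks_valid(name):
    """Loose check: 3-63 chars, allowed characters, alphanumeric at both ends."""
    allowed = set(ALPHABET + "-_.")
    return (
        len(name) in range(3, 64)
        and set(name).issubset(allowed)
        and name[0].isalnum()
        and name[-1].isalnum()
    )

name = make_bucket_name()
print(name, looks_valid(name))
</code></pre>
<p>Each run prints a different name, so note down the one you actually use. You’ll need it later when pointing the backups at the bucket.</p>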
<h3 id="heading-giving-cockroachdb-access-to-the-bucket">Giving CockroachDB Access to the Bucket</h3>
<p>Our next goal is to let the CockroachDB cluster upload and read files from this bucket. To do this, we’ll create something called a <strong>Service Account</strong> using <strong>Google IAM</strong>.</p>
<p><strong>What’s IAM?</strong><br>IAM stands for <em>Identity and Access Management.</em> It’s basically Google Cloud’s way of managing who can access what in your project.</p>
<p>With IAM, we can create a service account (like a “digital employee”) and give it permission to interact with our bucket instead of using our personal Google account.</p>
<h4 id="heading-creating-a-service-account">Creating a Service Account</h4>
<p>Type “service account” in the search bar and click on “Service Accounts” in the results.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089569066/2855b7fa-d896-4249-825d-4ec590499ca8.png" alt="Navigating the Service Accounts page" class="image--center mx-auto" width="527" height="279" loading="lazy"></p>
<p>Click “Create Service Account” at the top of the page. On the new page, type <em>cockroachdb-backup</em> as the service account name, then click “Create and Continue.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089677768/05c9f9ed-257f-44c6-89b5-3880c8af017d.png" alt="Creating a new Service Account for the CockroachDB cluster, to give it access to our Cloud Storage Bucket" class="image--center mx-auto" width="543" height="597" loading="lazy"></p>
<p>Now we’ll give this service account permission to work with our storage bucket. In the <em>Permissions</em> section, type “storage object creator” in the filter box and select it from the dropdown.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089744927/64ed65df-88ee-43c9-8be4-892a41a24989.png" alt="Providing our Service Account with the necessary permissions to access the bucket" class="image--center mx-auto" width="476" height="590" loading="lazy"></p>
<p>Repeat the same for “storage object viewer” and “storage object user”.</p>
<p>At the end, you should see three roles assigned:</p>
<ul>
<li><p>Storage Object Creator</p>
</li>
<li><p>Storage Object Viewer</p>
</li>
<li><p>Storage Object User</p>
</li>
</ul>
<p>Click “Continue”, then “Done.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762092953125/0419abe8-a1ff-4f1c-b367-f9e203bdf6ff.png" alt="The necessary permissions to be assigned to the Service Account" class="image--center mx-auto" width="520" height="788" loading="lazy"></p>
<p>You’ve now created a service account that can create and read files in your bucket.</p>
<h4 id="heading-downloading-the-service-account-key">Downloading the Service Account Key</h4>
<p>To let our CockroachDB cluster use this service account, we’ll generate a <strong>key file</strong>.</p>
<p><strong>What’s a key file?</strong><br>It’s just a small <strong>JSON file</strong> containing secret information your app (CockroachDB) can use to authenticate securely with Google Cloud – like an ID card.</p>
<p><strong>But be careful ⚠️</strong> If this key gets into the wrong hands, anyone could use it to access your Google Cloud resources. <strong>Never share or upload this file</strong> to a GitHub, Bitbucket, or GitLab repository, or to any other online repository.</p>
<p>In the Service Accounts page, find your <code>cockroachdb-backup</code> account, click the three dots (⋮) under the Action column, then select “Manage Keys.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090008411/11c4b373-87b0-416d-bf14-1a9ccd15c452.png" alt="Finding the newly created service account, and creating a key" class="image--center mx-auto" width="647" height="373" loading="lazy"></p>
<p>On the new page, click “Add Key” then “Create new key.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090059309/ebe17228-e2a8-4abe-b41b-7378013570d5.png" alt="Creating a new key for the new service account" class="image--center mx-auto" width="501" height="571" loading="lazy"></p>
<p>A dialog box will pop up. Choose JSON as the key type, and click “Create.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090115728/5ed82664-f57a-4489-af08-be85c2ad42e9.png" alt="Selecting the Key Type as JSON" class="image--center mx-auto" width="610" height="381" loading="lazy"></p>
<p>Google will automatically download a file named something like <code>cockroachdb-backup-1234567890abcdef.json</code>.</p>
<p>We’ll use this key soon when we configure our CockroachDB backup job.</p>
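<p>Before moving on, you may want to confirm that the downloaded file really is a service account key. The sketch below checks for a few fields that such key files normally contain. The <code>check_key</code> helper and the sample values are made up for illustration, so swap in your real file when you try it.</p>
<pre><code class="lang-python">import json

EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")

def check_key(key):
    """Return a list of problems found in a parsed key file (empty means OK)."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems

# Made-up placeholder values; for the real file use json.load(open(path)) instead.
sample = json.loads("""
{
  "type": "service_account",
  "project_id": "crdb-tutorial",
  "private_key": "REDACTED",
  "client_email": "cockroachdb-backup@crdb-tutorial.iam.gserviceaccount.com"
}
""")
print(check_key(sample))   # []
</code></pre>
<p>An empty list means the file has the expected shape. It doesn’t prove the key is still active or that its roles are correct.</p>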
<h3 id="heading-attaching-the-key-to-our-cockroachdb-cluster">Attaching the Key to Our CockroachDB Cluster</h3>
<p>Now that we’ve downloaded the service account key, we need to attach it to our CockroachDB cluster so that the DB can upload and read backups from our Google Cloud Storage bucket.</p>
<p><strong>Why this is needed:</strong><br>Our Minikube cluster (and even any managed Kubernetes cluster like GKE, EKS, or AKS) <strong>doesn’t have direct access</strong> to the files on your computer. So, we’ll upload the key file to Kubernetes as a Secret, and then mount it inside our CockroachDB pods as a volume.</p>
<h4 id="heading-step-1-create-a-kubernetes-secret">Step 1: Create a Kubernetes Secret</h4>
<p>Run the command below in your terminal.👇🏾 Replace <code>&lt;PATH_TO_KEY&gt;</code> with the path to your downloaded key file:</p>
<pre><code class="lang-bash">kubectl create secret generic gcs-key --from-file=key.json=&lt;PATH_TO_KEY&gt;
</code></pre>
<p>This command creates a <strong>Kubernetes Secret</strong> named <code>gcs-key</code> that securely stores your Google Cloud key.</p>
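<p>It’s worth knowing what “stores” means here. Kubernetes keeps each Secret value as a base64-encoded string, which is an encoding rather than encryption. The sketch below builds roughly the manifest that the command above results in, using a fake key. The <code>secret_manifest</code> helper is hypothetical and isn’t something <code>kubectl</code> provides.</p>
<pre><code class="lang-python">import base64
import json

def secret_manifest(name, filename, content):
    """Build a Secret manifest dict with the file content base64-encoded."""
    encoded = base64.b64encode(content).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {filename: encoded},
    }

fake_key = b'{"type": "service_account"}'  # placeholder, not a real key
manifest = secret_manifest("gcs-key", "key.json", fake_key)
print(json.dumps(manifest, indent=2))

# Decoding needs no password: base64 is an encoding, not encryption.
print(base64.b64decode(manifest["data"]["key.json"]))
</code></pre>
<p>In practice this means anyone who can read Secrets in the namespace can recover the key, so treat access to the cluster as carefully as the key file itself.</p>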
<h4 id="heading-step-2-mount-the-secret-to-the-cockroachdb-cluster">Step 2: Mount the Secret to the CockroachDB Cluster</h4>
<p>Now, let’s tell Kubernetes to use this secret inside our CockroachDB cluster.</p>
<p>Open your <code>cockroachdb-values.yml</code> file and scroll to the <code>statefulset:</code> section. Add the following lines under it:👇🏾</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-string">...</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>

  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>

  <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Here’s what this does:</p>
<ul>
<li><p>The <code>volumes</code> section tells Kubernetes to create a volume from the secret we just made.</p>
</li>
<li><p>The <code>volumeMounts</code> section attaches that volume inside the CockroachDB container.</p>
</li>
<li><p>The <code>GOOGLE_APPLICATION_CREDENTIALS</code> environment variable points CockroachDB to our key file so it knows where to find it when connecting to Google Cloud.</p>
</li>
</ul>
<p>Your final file should look like this:👇🏾</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">podSecurityContext:</span>
    <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">podAntiAffinity:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
  <span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>
  <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Now, apply the update using Helm:👇🏾</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
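<p>Helm returns as soon as it has submitted the changes, while the pods may still be restarting. If you’d like to wait for them before moving on, here’s a small optional helper. It’s only a sketch: it assumes the StatefulSet is named <code>crdb-cockroachdb</code> (inferred from the pod names used in this guide) and that the pods are updated with a rolling update. The name <code>wait_for_rollout</code> is made up for this example:</p>
<pre><code class="lang-bash"># Hypothetical helper: block until every pod of a StatefulSet has been updated.
# The 5-minute timeout is an arbitrary choice, adjust it to your cluster.
wait_for_rollout() {
  kubectl rollout status "statefulset/$1" --timeout=5m
}
</code></pre>
<p>Call it with <code>wait_for_rollout crdb-cockroachdb</code>.</p>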
<h4 id="heading-step-3-confirm-the-key-exists-in-the-cluster">Step 3: Confirm the Key Exists in the Cluster</h4>
<p>Once the upgrade is complete, run this command to confirm the key is now inside your CockroachDB pods:</p>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> -it crdb-cockroachdb-1 -- cat /var/run/gcp/key.json
</code></pre>
<p>You should see something similar to this:👇🏾</p>
<pre><code class="lang-bash">prince@DESKTOP-QHVTAUD:~/programming/cockroachdb-tutorial$ kubectl <span class="hljs-built_in">exec</span> -it crdb-cockroachdb-1 -- cat /var/run/gcp/key.json
{
  <span class="hljs-string">"type"</span>: <span class="hljs-string">"service_account"</span>,
  <span class="hljs-string">"project_id"</span>: ***,
  <span class="hljs-string">"private_key_id"</span>: ***,
  <span class="hljs-string">"private_key"</span>: ***,
  <span class="hljs-string">"client_email"</span>: ***,
  <span class="hljs-string">"client_id"</span>: ***,
  <span class="hljs-string">"auth_uri"</span>: <span class="hljs-string">"https://accounts.google.com/o/oauth2/auth"</span>,
  <span class="hljs-string">"token_uri"</span>: <span class="hljs-string">"https://oauth2.googleapis.com/token"</span>,
  <span class="hljs-string">"auth_provider_x509_cert_url"</span>: <span class="hljs-string">"https://www.googleapis.com/oauth2/v1/certs"</span>,
  <span class="hljs-string">"client_x509_cert_url"</span>: ***,
  <span class="hljs-string">"universe_domain"</span>: <span class="hljs-string">"googleapis.com"</span>
}
</code></pre>
<p>Nice! That means our cluster now has access to the Google Cloud key.</p>
<h4 id="heading-step-4-creating-the-backup-schedule">Step 4: Creating the Backup Schedule</h4>
<p>CockroachDB makes backups super convenient. It can automatically back up your database <strong>on a schedule</strong> (without you needing to manually create Kubernetes CronJobs).</p>
<p>To create an automatic backup schedule, run this SQL command inside the CockroachDB SQL shell 👇🏾 (replace the <code>&lt;BUCKET_NAME&gt;</code> placeholder with the name of your Google Cloud Storage bucket):</p>
<pre><code class="lang-bash">CREATE SCHEDULE backup_cluster
FOR BACKUP INTO <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>
WITH revision_history
RECURRING <span class="hljs-string">'@hourly'</span>
FULL BACKUP <span class="hljs-string">'@daily'</span>
WITH SCHEDULE OPTIONS first_run = <span class="hljs-string">'now'</span>;
</code></pre>
<p>Here’s what each part means:</p>
<ul>
<li><p><code>AUTH=implicit</code> tells CockroachDB to use the Google key we mounted (<code>GOOGLE_APPLICATION_CREDENTIALS</code>) for authentication.</p>
</li>
<li><p><code>FULL BACKUP '@daily'</code> creates a complete backup of the entire database every day.</p>
</li>
<li><p><code>RECURRING '@hourly'</code> creates smaller, incremental backups every hour, capturing just the changes since the last backup.</p>
</li>
<li><p><code>WITH SCHEDULE OPTIONS first_run = 'now'</code> starts the first backup immediately after running the command.</p>
</li>
</ul>
<p>After running it, CockroachDB will return two rows:</p>
<ul>
<li><p>The first is for the <strong>recurring incremental backup</strong> (hourly updates)</p>
</li>
<li><p>The second is for the <strong>full backup</strong> (daily snapshot)</p>
</li>
</ul>
<p>You can read more about full and incremental backups in the official docs here 👉🏾<a target="_blank" href="https://www.cockroachlabs.com/docs/stable/take-full-and-incremental-backups">CockroachDB Backups Guide</a>.</p>
<h4 id="heading-step-5-checking-backup-status">Step 5: Checking Backup Status</h4>
<p>To see the status of your backups, copy the ID from the second row (the <code>id</code> column, which identifies the <strong>schedule</strong> that launches the backup jobs) and run this command:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762103549260/742fc309-9c4d-4967-9436-91539851a9b9.png" alt="The job ID to copy" class="image--center mx-auto" width="1587" height="107" loading="lazy"></p>
<pre><code class="lang-bash">SHOW JOBS FOR SCHEDULE &lt;YOUR_JOB_ID&gt;;
</code></pre>
<p>Replace <code>&lt;YOUR_JOB_ID&gt;</code> with the ID you copied.</p>
<p>You’ll see output similar to this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762103606748/8627d561-0b54-4e6d-9109-ba7e1c7a85c3.png" alt="Getting the status of the backup job" class="image--center mx-auto" width="1152" height="294" loading="lazy"></p>
<p>Now, do the same for the recurring backup schedule (the ID on the first row of the previous result).</p>
<p>If both statuses show <code>succeeded</code>, that means your full and recurring backups worked perfectly! If either is still running, just give it a few minutes – backups can take a bit of time :)</p>
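<p>If you’d rather check from the terminal, the same kind of information is available through the <code>cockroach sql</code> client inside a pod. The sketch below makes a few assumptions: the cluster is insecure (TLS is disabled in our values file), the binary lives at <code>/cockroach/cockroach</code> inside the container, and <code>crdb-cockroachdb-0</code> is running. The function names <code>crdb_sql</code> and <code>show_backup_state</code> are made up for this example:</p>
<pre><code class="lang-bash"># Hypothetical helper: run one SQL statement inside the first CockroachDB pod.
# Assumes an insecure cluster and the binary path /cockroach/cockroach.
crdb_sql() {
  kubectl exec crdb-cockroachdb-0 -- /cockroach/cockroach sql --insecure -e "$1"
}

# List the backup schedules, then the backups stored in the bucket.
# Argument: the name of your Google Cloud Storage bucket.
show_backup_state() {
  crdb_sql "SHOW SCHEDULES FOR BACKUP;"
  crdb_sql "SHOW BACKUPS IN 'gs://$1/cluster?AUTH=implicit';"
}
</code></pre>
<p>Run it as <code>show_backup_state your-bucket-name</code>. Once the first full backup has finished, the second statement should list at least one backup path.</p>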
<h3 id="heading-testing-our-backup-disaster-recovery-time">Testing Our Backup — Disaster Recovery Time</h3>
<p>Woohoo! We’ve successfully created a backup of our CockroachDB cluster to Google Cloud Storage. That’s a huge milestone. But let’s be honest: how can we be <em>sure</em> it works if we’ve never tried restoring it?</p>
<p>So, in true brave-developer fashion, we’re going to do the unthinkable: <strong>destroy our entire database</strong>...yes, everything! 😬</p>
<p>Why would we do that?! Because in real life, disasters happen. A node crashes, data gets wiped, or an upgrade goes sideways. The question is: <em>Can we recover?</em> Let’s find out.</p>
<h4 id="heading-step-1-uninstall-the-helm-chart">Step 1: Uninstall the Helm Chart</h4>
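<p>First, let’s remove the CockroachDB Helm release. This deletes the resources the chart created, like the StatefulSet, pods, and chart-managed secrets. The <code>gcs-key</code> Secret we created with <code>kubectl</code> isn’t part of the release, so it stays in place:</p>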
<pre><code class="lang-bash">helm uninstall crdb
</code></pre>
<p>This removes the running cluster, but <strong>not the actual data</strong>, which is stored on Persistent Volumes (PVs).</p>
<h4 id="heading-step-2-delete-persistent-volume-claims-pvcs">Step 2: Delete Persistent Volume Claims (PVCs)</h4>
<p>Each CockroachDB node stores its data on a volume that it requests through a <strong>Persistent Volume Claim</strong> (PVC). These PVCs remain even after uninstalling the Helm release, so let’s manually delete them:</p>
<pre><code class="lang-bash">kubectl delete pvc datadir-crdb-cockroachdb-0
kubectl delete pvc datadir-crdb-cockroachdb-1
kubectl delete pvc datadir-crdb-cockroachdb-2
</code></pre>
<h4 id="heading-step-3-delete-the-persistent-volumes-pvs">Step 3: Delete the Persistent Volumes (PVs)</h4>
<p>Next, list all the Persistent Volumes:</p>
<pre><code class="lang-bash">kubectl get pv
</code></pre>
<p>You’ll see a list of volumes similar to this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762107818554/01defffd-543b-486a-aa19-4bbf6f768270.png" alt="List existing Persistent Volumes for CockroachDB" class="image--center mx-auto" width="925" height="91" loading="lazy"></p>
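<p>Look for the PVs whose <code>CLAIM</code> column references the <strong>PVCs you just deleted</strong>. Depending on the storage class’s reclaim policy, they may already have been removed automatically. Delete any that remain using:</p>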
<pre><code class="lang-bash">kubectl delete pv &lt;PV_NAME&gt;
</code></pre>
<p>At this point, you’ve completely wiped out your database like it never existed 🥲. Don’t worry: this is all part of the plan.</p>
<h4 id="heading-step-4-reinstall-the-cluster">Step 4: Reinstall the Cluster</h4>
<p>Let’s bring CockroachDB back to life (an empty one for now):</p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Once the installation is done, expose the cluster locally again:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26257
</code></pre>
<h4 id="heading-step-5-check-whats-left">Step 5: Check What’s Left</h4>
<p>Connect Beekeeper Studio to your database if it isn’t connected already, then try running the query below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>You’ll get an error saying the <code>books</code> table doesn’t exist, because this is a <em>brand new</em> database.</p>
<h4 id="heading-step-6-restore-from-google-cloud-storage">Step 6: Restore from Google Cloud Storage</h4>
<p>Now for the magic part: let’s bring our data back from the backup we created earlier 😃!</p>
<p>Run this query on the new cluster:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">RESTORE</span> <span class="hljs-keyword">FROM</span> LATEST <span class="hljs-keyword">IN</span> <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>;
</code></pre>
<p>Replace <code>&lt;BUCKET_NAME&gt;</code> with your actual Google Cloud Storage bucket name (for example: <code>cockroachdb-backup-7gw8u</code>).</p>
<p>CockroachDB will begin restoring your data. This can take a few seconds or minutes depending on your backup size. When it’s done, you’ll see a response showing a success status:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762108106557/0da98d45-d8f4-48ed-b852-9f76209fb20f.png" alt="Database restored successfully" class="image--center mx-auto" width="645" height="422" loading="lazy"></p>
<h4 id="heading-step-7-confirm-the-restoration">Step 7: Confirm the Restoration</h4>
<p>Now, run the same query again:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom 💥 your books are back 😁! That means your backup and restore process works perfectly. You just performed a full disaster recovery test.</p>
<p>Congrats! You’ve done something many real-world teams fail to test: a <strong>full backup and restore cycle</strong>. You’ve now proven that your database setup is resilient, even in a worst-case scenario.</p>
<h2 id="heading-managing-resources-amp-optimizing-memory-usage">Managing Resources &amp; Optimizing Memory Usage</h2>
<p>In this section, we’ll learn how CockroachDB handles memory internally (for things like caching and SQL query work), and how to tune these settings so you avoid OOM kills or evictions, which is when Kubernetes kills or stops the database because it used more memory than was allocated to it.</p>
<h3 id="heading-how-cockroachdb-uses-memory">How CockroachDB Uses Memory</h3>
<p>When you deploy CockroachDB nodes (each replica) via Kubernetes, each pod (node) needs memory for multiple things. At a high level, there are two major internal uses:</p>
<ul>
<li><p><strong>Cache</strong> (<code>conf.cache</code>): This is the space CockroachDB uses to keep frequently accessed data in memory so queries can run faster without hitting the disk.</p>
</li>
<li><p><strong>SQL Memory</strong> (<code>conf.max-sql-memory</code>): This is the memory used when running SQL queries (things like sorting, joins, buffering numbers, and temporary data).</p>
</li>
</ul>
<p>Together, they need to be sized appropriately relative to the total memory you give the pod, so there’s room for these internal operations <em>plus</em> other overhead (networking, logging, background tasks).</p>
<h3 id="heading-the-memory-usage-formula-you-must-follow">The Memory Usage Formula You Must Follow</h3>
<p>Here’s the golden rule you should <strong>never forget</strong>:</p>
<pre><code class="lang-yaml"><span class="hljs-string">(2</span> <span class="hljs-string">×</span> <span class="hljs-string">max-sql-memory)</span> <span class="hljs-string">+</span> <span class="hljs-string">cache</span>  <span class="hljs-string">≤</span>  <span class="hljs-number">80</span><span class="hljs-string">%</span> <span class="hljs-string">of</span> <span class="hljs-string">the</span> <span class="hljs-string">memory</span> <span class="hljs-string">limit</span>
</code></pre>
<p>What this means:</p>
<ul>
<li><p>You take the <code>max-sql-memory</code> value and multiply by 2 (because SQL work may need space for both input and output, etc)</p>
</li>
<li><p>Add your <code>cache</code> value</p>
</li>
<li><p>That total must be <strong>less than or equal to 80%</strong> of the pod’s memory limit (<code>statefulset.resources.limits.memory</code>)</p>
</li>
<li><p>The remaining ~20% (or more) is free space for <em>other internal CockroachDB processes</em> like background jobs, metrics, network, and so on</p>
</li>
</ul>
<p>If you give CockroachDB too little “free” memory beyond these two settings, you risk OOM kills (pod gets killed by Kubernetes because it used more memory than allowed) or performance issues.</p>
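<p>The rule is easy to check with a few lines of shell arithmetic. Here’s a minimal sketch: the function name <code>check_crdb_memory</code> is made up, and it assumes you pass all three values in MiB as whole numbers (the 80% budget is rounded down):</p>
<pre><code class="lang-bash"># Hypothetical helper: check (2 x max-sql-memory) + cache against 80% of the limit.
# Arguments, all in MiB: max-sql-memory, cache, pod memory limit.
check_crdb_memory() {
  sql_mib="$1"; cache_mib="$2"; limit_mib="$3"
  planned=$(( 2 * sql_mib + cache_mib ))
  budget=$(( limit_mib * 80 / 100 ))   # integer MiB, rounded down
  if [ "$planned" -le "$budget" ]; then
    printf 'OK: %s MiB planned, %s MiB budget\n' "$planned" "$budget"
  else
    printf 'TOO HIGH: %s MiB planned, %s MiB budget\n' "$planned" "$budget"
    return 1
  fi
}
</code></pre>
<p>For example, <code>check_crdb_memory 1024 1024 4096</code> prints <code>OK: 3072 MiB planned, 3276 MiB budget</code>, while the same settings with a 3072 MiB limit are reported as too high.</p>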
<h3 id="heading-where-you-find-these-settings">Where You Find These Settings</h3>
<p>If you go to the Helm chart docs on ArtifactHub, <a target="_blank" href="https://artifacthub.io/packages/helm/cockroachdb/cockroachdb">CockroachDB Helm Chart on ArtifactHub</a>, and scroll down to the <strong>Configuration</strong> section (or press Ctrl-F for <code>conf.cache</code>), you’ll see:</p>
<ul>
<li><p><code>conf.cache</code> (cache size)</p>
</li>
<li><p><code>conf.max-sql-memory</code> (SQL memory size)</p>
</li>
<li><p>It states that each of these is by default set to roughly 25% of the memory allocation you set in the <code>resources.limits.memory</code> for the statefulset.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762235290740/bd176882-43bd-4abd-94e0-cce083335d64.png" alt="Artifacthub docs for the CockroachDB Helm chart" class="image--center mx-auto" width="1260" height="489" loading="lazy"></p>
<h3 id="heading-concrete-example-step-by-step">Concrete Example (Step-by-Step)</h3>
<p>Let’s do the math with numbers in our Minikube environment.</p>
<ul>
<li><p>Suppose we set <code>statefulset.resources.limits.memory</code> = <strong>2 GiB</strong> for each CockroachDB pod.</p>
</li>
<li><p>The Helm chart’s default ¼ (25%) rule means:</p>
<ul>
<li><p><code>conf.cache</code> = ¼ × 2 GiB = <strong>512 MiB</strong></p>
</li>
<li><p><code>conf.max-sql-memory</code> = ¼ × 2 GiB = <strong>512 MiB</strong></p>
</li>
</ul>
</li>
<li><p>Apply the formula: <code>(2 × 512 MiB) + 512 MiB = 1,536 MiB</code></p>
</li>
<li><p>Calculate 80% of the memory limit: <code>80% of 2 GiB = 1,638 MiB</code> (approximately)</p>
</li>
<li><p>Compare: 1,536 MiB ≤ 1,638 MiB – so we’re within the safe zone ✅</p>
</li>
<li><p>That means in this configuration, CockroachDB expects to use <strong>~1,536 MiB</strong> for its cache + SQL memory. This leaves <strong>~512 MiB</strong> (25%) of the 2 GiB limit for other internal processes.</p>
</li>
</ul>
<p>That leftover memory is for things like internal bookkeeping (range rebalancing, replication metadata), communication among database replicas, metric collection, logging, garbage collection, and temporary or unexpected memory spikes.</p>
<p>If you don’t leave this free space, your node might struggle even during normal operations. And on Kubernetes, if the pod uses more memory than <code>limits.memory</code> allows, it can get OOM-killed, which causes downtime or restarts.</p>
<h3 id="heading-on-requests-vs-limits-in-kubernetes">⚠️ On Requests vs Limits in Kubernetes</h3>
<p>Important nuance: Kubernetes schedules pods based on <strong>requests</strong> (what you ask for) but caps their usage based on <strong>limits</strong> (what you allow).</p>
<ul>
<li><p><code>statefulset.resources.requests.memory</code> = what the scheduler guarantees the pod will have.</p>
</li>
<li><p><code>statefulset.resources.limits.memory</code> = the maximum the pod can use before Kubernetes will kill it for excess memory.</p>
</li>
</ul>
<p>Because CockroachDB’s internal memory computations (cache + SQL memory) use the <strong>limit</strong> value to calculate sizing, if you set requests &lt; limits you’ll get a mismatch. Example:</p>
<ul>
<li><p>Suppose requests = 1 GiB, limits = 2 GiB</p>
</li>
<li><p>Kubernetes may schedule the pod on a node that has (at least) 1 GiB free</p>
</li>
<li><p>But internally, CockroachDB will plan for ~1.5 GiB usage (based on the 2 GiB limit)</p>
</li>
<li><p>The node may not actually have that much free memory available</p>
</li>
<li><p>The pod might try to use more memory than the node can actually spare, which puts the node under memory pressure and puts the pod at risk of eviction</p>
</li>
</ul>
<p>✅ <strong>Best practice:</strong> Set requests = limits for memory and CPU for CockroachDB pods. That way the scheduler reserves enough space for what CockroachDB will use internally.</p>
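<p>One way to double-check this is to look at the QoS class Kubernetes assigned to a pod. When every container in the pod has CPU and memory requests equal to its limits, Kubernetes reports the class as <code>Guaranteed</code>. The helper below is a sketch, and <code>pod_qos_class</code> is a made-up name:</p>
<pre><code class="lang-bash"># Hypothetical helper: print the QoS class of a pod
# (Guaranteed, Burstable, or BestEffort).
pod_qos_class() {
  kubectl get pod "$1" -o jsonpath='{.status.qosClass}'
}
</code></pre>
<p>Running <code>pod_qos_class crdb-cockroachdb-0</code> should print <code>Guaranteed</code> if requests and limits match for all containers in that pod. If it prints <code>Burstable</code>, at least one request is lower than its limit.</p>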
<h3 id="heading-overriding-the-default-fractions">Overriding the Default Fractions</h3>
<p>If you want to set static <code>conf.cache</code> or <code>conf.max-sql-memory</code> values (rather than relying on 25% of limit) you <em>can</em> – but you must still obey the memory usage formula.</p>
<p>For example, if you set:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">conf:</span>
  <span class="hljs-attr">cache:</span> <span class="hljs-string">"1Gi"</span>
  <span class="hljs-attr">max-sql-memory:</span> <span class="hljs-string">"1Gi"</span>
<span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
</code></pre>
<p>With this configuration, the pod’s memory request and limit are both <strong>3 GiB</strong>. Now calculate:</p>
<pre><code class="lang-yaml"><span class="hljs-string">(2</span> <span class="hljs-string">×</span> <span class="hljs-string">1Gi)</span> <span class="hljs-string">+</span> <span class="hljs-string">1Gi</span> <span class="hljs-string">=</span> <span class="hljs-string">3Gi</span>
<span class="hljs-number">80</span><span class="hljs-string">%</span> <span class="hljs-string">of</span> <span class="hljs-string">3Gi</span> <span class="hljs-string">=</span> <span class="hljs-string">~2.4Gi</span>
</code></pre>
<p>Here <strong>3Gi &gt; 2.4Gi</strong>, so you’d be violating the rule. This is a risky setup.</p>
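<p>So you’ll need to either reduce both cache and SQL memory, for example to 768Mi each (3 × 768Mi = 2,304Mi, which fits under ~2,458Mi), or increase the memory limit, for example to 4Gi, so that the formula results in ≤ 80% of the limit. Reducing only one of the two values to 768Mi would still leave you above the 80% mark.</p>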
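<p>You can also turn the formula around and ask: “what is the smallest memory limit that fits my chosen values?” Here’s a rough sketch of that calculation. The name <code>min_memory_limit</code> is made up, and the inputs are assumed to be whole numbers in MiB:</p>
<pre><code class="lang-bash"># Hypothetical helper: smallest pod memory limit (MiB) for which
# (2 x max-sql-memory) + cache is at most 80% of the limit.
# Arguments, in MiB: max-sql-memory, cache.
min_memory_limit() {
  sql_mib="$1"; cache_mib="$2"
  planned=$(( 2 * sql_mib + cache_mib ))
  # Divide by 0.8 with integers, rounding up: ceil(planned * 100 / 80)
  printf '%s\n' $(( (planned * 100 + 79) / 80 ))
}
</code></pre>
<p><code>min_memory_limit 1024 1024</code> prints <code>3840</code>, so 1Gi of cache plus 1Gi of SQL memory needs a limit of at least 3,840 MiB. <code>min_memory_limit 768 768</code> prints <code>2880</code>, which fits inside a 3Gi limit.</p>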
<h2 id="heading-scaling-cockroachdb-the-right-way">Scaling CockroachDB the Right Way</h2>
<p>In this section we’ll look at when and how you should grow your CockroachDB cluster – whether that means adding more replicas (horizontal scale), giving each node more CPU/RAM (vertical scale), or giving them more storage.</p>
<p>I’ll explain everything in simple terms and cover what metrics to watch, what decisions to make, and how to scale safely.</p>
<p>What we’ll discuss:</p>
<ul>
<li><p>How you can tell it’s time to “grow” your cluster</p>
</li>
<li><p>How to safely add more nodes or upgrade what you already have</p>
</li>
<li><p>How to decide whether you need more nodes, bigger nodes, or bigger disks</p>
</li>
<li><p>How to do all this without causing downtime or stress</p>
</li>
</ul>
<h3 id="heading-key-metrics-to-understand">Key Metrics to Understand</h3>
<p>Before we dive into how to scale our cluster, we need to understand what certain metrics mean. These metrics will help us make calculated decisions about what to scale and when.</p>
<h4 id="heading-read-bytessecond-amp-write-bytessecond-throughput">Read bytes/second &amp; Write bytes/second (Throughput)</h4>
<p>Read bytes/second is how much data (in bytes) is <strong>read</strong> from the disk every second, that is, passed from the disk to the database app.</p>
<p>Write bytes/second is how much data is being <strong>written</strong> to the disk per second, that is, moving from the database to the disk.</p>
<p>This matters because your database is an application that stores data on disk. If your app needs to read a lot of data (reads) or write a lot of data (writes), this metric shows the <strong>volume</strong> of data flowing to/from disk.</p>
<p>To keep an eye on it, go to your CockroachDB dashboard and navigate to the “Metrics” link on the sidebar. Under the “Metrics” title, click the “Dashboard:…” drop-down and select “Hardware” from the options.</p>
<p>Now, scroll down a bit till you see “Disk Read Bytes/s” and “Disk Write Bytes/s”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762325396257/553ac9d4-4927-40f3-b654-8b19a0b2aef8.png" alt="The Disk Read &amp; Write Bytes/s metrics" class="image--center mx-auto" width="1135" height="821" loading="lazy"></p>
<h4 id="heading-read-iops-amp-write-iops">Read IOPS &amp; Write IOPS</h4>
<p><strong>IOPS</strong> = “Input/Output Operations Per Second”. Here, Read IOPS = how many <strong>read operations</strong> the disk is performing per second. Write IOPS = how many <strong>write operations</strong> per second.</p>
<p>This is different from throughput because throughput is about how many bytes (data) are being transferred. IOPS, on the other hand, is about <strong>how many operations</strong> are happening (regardless of size).</p>
<p>Here’s an example: 10 read operations/sec of 1 MiB each = 10 MiB/sec throughput, 10 IOPS. Another scenario: 100 reads/sec of 10 KiB each = ~1 MiB/sec throughput, but 100 IOPS (a higher operation count, even though less data is moved).</p>
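<p>The relationship between the two metrics is simply throughput = IOPS × average operation size. Here’s a tiny sketch of that arithmetic using the two scenarios above (the function name <code>throughput_kib</code> is made up, and sizes are in KiB):</p>
<pre><code class="lang-bash"># Hypothetical helper: throughput in KiB/s from IOPS and average operation size.
# Arguments: operations per second, average operation size in KiB.
throughput_kib() {
  iops="$1"; op_size_kib="$2"
  printf '%s\n' $(( iops * op_size_kib ))
}

throughput_kib 10 1024   # 10 reads/s of 1 MiB each: prints 10240 (10 MiB/s)
throughput_kib 100 10    # 100 reads/s of 10 KiB each: prints 1000 (about 1 MiB/s)
</code></pre>
<p>Ten times more operations, yet roughly a tenth of the data. That’s why it helps to watch both graphs.</p>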
<p>Scroll down a bit more to view the IOPS metrics:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762325699278/dd549ac3-16cf-4373-9637-5a1e798bf5db.png" alt="Illustrating the IOPS metrics on the dashboard" class="image--center mx-auto" width="977" height="813" loading="lazy"></p>
<h4 id="heading-sql-p99-latency-99th-percentile-latency">SQL p99 Latency (99th percentile latency)</h4>
<p>P99 latency is the time within which 99% of queries finish. Only the <strong>slowest 1% of queries</strong> take longer than this.</p>
<p>For example, let’s say you run 1,000 queries. P99 is the latency that 990 of them stayed under, so it tells you the minimum time the slowest 10 took.</p>
<p>This matters because it’s not about the average query, but about the tail (worst cases). If your p99 is high, it means some queries are seriously lagging. All other queries might be fine, but some are dragging.</p>
<p>So if p99 jumps up (for example, from 10 ms → 300 ms), you should investigate: maybe big joins, missing indexes, contention, or slow writes to disk.</p>
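<p>To make the idea concrete, here’s a minimal sketch that computes a p99 from a list of latency samples using the simple “nearest-rank” method. This is only an illustration, and the dashboard’s own calculation may differ. The function name <code>p99</code> is made up, and it expects one latency value (in ms) per line on standard input:</p>
<pre><code class="lang-bash"># Hypothetical helper: nearest-rank p99.
# Sort the samples, then take the value at position ceil(0.99 x N).
p99() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1                 # no samples, nothing to report
      r = int((NR * 99 + 99) / 100)       # ceil(0.99 x N) for whole numbers
      print v[r]
    }'
}
</code></pre>
<p>For example, <code>seq 1 1000 | p99</code> prints <code>990</code>: out of 1,000 samples, 990 are at or below that value and only the slowest 10 are above it.</p>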
<p>To access the SQL P99 Latency metrics, simply click the “Dashboard:…” select field, and choose the “Overview” option from the dropdown.</p>
<p>PS: The higher the p99 latency, the slower your worst queries are, and the more likely it is that something needs attention.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762326088120/e6f39e6e-942b-4db9-b808-cb228c1e0cc5.png" alt="The SQL p99 latency metric" class="image--center mx-auto" width="980" height="479" loading="lazy"></p>
<h4 id="heading-disk-ops-in-progress-queue-depth">Disk Ops In Progress (Queue Depth)</h4>
<p>This shows how many disk reads and writes are waiting <em>in line</em> (queued) because the storage system is busy.</p>
<p>A queue depth of 0–5 is generally OK. If it frequently goes into double-digits (10+), that means storage is struggling and latency may spike. If you see this number high and staying high, you may need faster storage or more database replicas.</p>
<p>Simple rule: if “Ops In Progress” stays above ~9 for an extended period, that’s a bad sign. Time to check disks and I/O.</p>
<p>To access the “Disk Ops In Progress” metric, return to the “Hardware” dashboard, and scroll down:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762488796957/b2a215fd-ec51-4ee3-9056-a5fa6d511c61.png" alt="Accessing the Disk Ops In Progress metrics on the CockroachDB dashboard" class="image--center mx-auto" width="975" height="621" loading="lazy"></p>
<p>By monitoring these, you can choose:</p>
<ul>
<li><p>“I need <strong>more nodes</strong>” (horizontal scale)</p>
</li>
<li><p>“I need <strong>bigger nodes or faster storage</strong>” (vertical scale)</p>
</li>
<li><p>“I need <strong>better query/index tuning</strong>” (optimize rather than scale)</p>
</li>
</ul>
<h3 id="heading-when-and-what-to-scale-based-on-your-metrics">When (and What) to Scale Based on Your Metrics</h3>
<p>So, let’s imagine you’re watching your CockroachDB dashboard and notice this pattern:</p>
<ul>
<li><p>The <strong>SQL P99 latency</strong> (the slowest 1% of your queries) is high, meaning your queries are taking too long.</p>
</li>
<li><p>The <strong>CPU usage</strong> for your CockroachDB pods (under <em>Cockroach process CPU%</em>) is above <strong>80%</strong> consistently.</p>
</li>
</ul>
<p>That’s a classic sign your cluster is running out of CPU power and the database is struggling to process queries fast enough because the CPU is maxed out.</p>
<p>Here’s how to fix it 👇🏾</p>
<h4 id="heading-step-1-add-more-cpu-power">Step 1: Add More CPU Power</h4>
<p>You can scale up your CPUs directly through the <strong>Helm chart values file</strong>, <code>cockroachdb-values.yml</code>.</p>
<p>In that file, look for the section where CPU and memory requests/limits are defined under <code>statefulset.resources</code>. Then, increase the CPU allocations. For example:</p>
<pre><code class="lang-bash">statefulset:
  resources:
    requests:
      cpu: <span class="hljs-string">"3"</span>
      memory: <span class="hljs-string">"6Gi"</span>
    limits:
      cpu: <span class="hljs-string">"3"</span>
      memory: <span class="hljs-string">"6Gi"</span>
</code></pre>
<p>This means each CockroachDB pod (replica) will now <em>request</em> 3 vCPUs (guaranteed). Save the file, then apply the update with the Helm command:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Once the upgrade is done, give it 30 minutes to 1 hour to stabilize. The CockroachDB dashboard will automatically start showing you updated metrics.</p>
<p>If you see that the CPU usage drops below 70% and the SQL P99 latency improves, you’re good. 👍🏾</p>
<h4 id="heading-step-2-add-another-replica-new-node">Step 2: Add Another Replica (New Node)</h4>
<p>But…what if the latency is <strong>still high</strong> even after adding more CPU? That likely means the cluster is still overloaded, and it’s time to add another node (replica) to distribute the load.</p>
<p>Here’s why that works: CockroachDB is horizontally scalable, meaning it automatically spreads out your data (remember <strong>ranges</strong>?) and balances reads/writes across all replicas. So, the more nodes you add, the more evenly your cluster can share the work.</p>
<p>To add another replica, simply increase the <code>replicas</code> value in your Helm config:</p>
<pre><code class="lang-bash">statefulset:
  replicas: 4  <span class="hljs-comment"># If it was 3 before</span>
</code></pre>
<p>Then, redeploy again:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>This adds a new pod (a new CockroachDB node) to your cluster. CockroachDB will automatically rebalance your data across nodes, with no manual migration needed.</p>
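<p>To confirm that the new node has joined, you can ask CockroachDB itself with <code>cockroach node status</code>. The sketch below assumes an insecure cluster (as in our values file) and that the binary lives at <code>/cockroach/cockroach</code> inside the container. The name <code>crdb_node_status</code> is made up:</p>
<pre><code class="lang-bash"># Hypothetical helper: list the nodes the cluster currently knows about.
# Argument: any running CockroachDB pod, for example crdb-cockroachdb-0.
crdb_node_status() {
  kubectl exec "$1" -- /cockroach/cockroach node status --insecure
}
</code></pre>
<p>After scaling from 3 to 4 replicas, <code>crdb_node_status crdb-cockroachdb-0</code> should list four nodes once the new pod is up.</p>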
<p>💡 <strong>Tip:</strong> Try to keep one CockroachDB pod (replica) per VM. For example, if you have 3 replicas, you should ideally have 3 separate VMs (worker nodes). This ensures better fault tolerance and performance.</p>
<p>Luckily, the official CockroachDB Helm chart already helps with this by managing <strong>Pod anti-affinity rules</strong>, so pods are automatically spread across nodes safely.</p>
<h3 id="heading-disk-bound-situations-what-to-do-when-your-disk-is-the-limiting-factor">Disk-Bound Situations — What to Do When Your Disk Is the Limiting Factor</h3>
<p>If you’re seeing this kind of pattern in your CockroachDB dashboard and Kubernetes cluster:</p>
<ul>
<li><p>SQL P99 latency is high (queries are slow)</p>
</li>
<li><p>“Disk Ops In Progress” (queue depth) stays above ~9-10 – meaning many disk I/O operations are waiting to be processed</p>
</li>
<li><p>Disk “Read bytes/sec” or “Write bytes/sec” (throughput) are high <strong>or</strong> “Read IOPS” or “Write IOPS” are high (even though CPU looks okay)</p>
</li>
</ul>
<p>Then you’re very likely <strong>disk-bound</strong>, meaning your storage is the bottleneck.</p>
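<p>To summarize the two patterns we’ve seen so far (CPU-bound and disk-bound), here’s a rough triage sketch. The thresholds are the rules of thumb from this guide, not hard limits, and the function name <code>triage</code> and its arguments are made up for illustration:</p>
<pre><code class="lang-bash"># Hypothetical helper: rough triage using this guide's rules of thumb.
#   CPU consistently above 80%   : CPU-bound
#   Disk Ops In Progress above 9 : disk-bound (when CPU looks okay)
# Arguments: p99_is_high (yes/no), cpu_percent, disk_queue_depth
triage() {
  p99_is_high="$1"; cpu_pct="$2"; queue_depth="$3"
  if [ "$p99_is_high" != "yes" ]; then
    printf 'healthy: no scaling needed\n'
  elif [ "$cpu_pct" -gt 80 ]; then
    printf 'cpu-bound: add CPU, then replicas\n'
  elif [ "$queue_depth" -gt 9 ]; then
    printf 'disk-bound: look at disk size and speed\n'
  else
    printf 'unclear: check queries and indexes\n'
  fi
}
</code></pre>
<p>For example, <code>triage yes 45 14</code> prints <code>disk-bound: look at disk size and speed</code>, which is the situation this section deals with.</p>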
<p>Here’s how to fix it (and yes, it’s a bit more complex than just “add more RAM”)…</p>
<h4 id="heading-step-1-increase-disk-size-in-your-helm-values">Step 1: Increase Disk Size in Your Helm Values</h4>
<p>Often the first problem is that the disk size is too small. Here’s how you can increase it:</p>
<ol>
<li><p>Open your <code>cockroachdb-values.yml</code> (the Helm chart values file)</p>
</li>
<li><p>Look for the storage section, for example:</p>
</li>
</ol>
<pre><code class="lang-bash">storage:
  persistentVolume:
    size: 5Gi  <span class="hljs-comment"># current size</span>
</code></pre>
<ol start="3">
<li>Update it to a larger size, like:</li>
</ol>
<pre><code class="lang-bash">storage:
  persistentVolume:
    size: 15Gi  <span class="hljs-comment"># increased size</span>
</code></pre>
<ol start="4">
<li>Save the file and run:</li>
</ol>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p><strong>N.B.</strong> If the upgrade fails with an error saying that some values can’t be modified (this is normal, because a StatefulSet’s volume claim templates can’t be changed after creation), resize each disk directly instead 👇🏾 (replace the <code>PVC_NAME</code> and <code>SIZE</code> placeholders accordingly):</p>
<pre><code class="lang-bash">kubectl patch pvc &lt;PVC_NAME&gt; \
  -p <span class="hljs-string">'{"spec":{"resources":{"requests":{"storage":"&lt;SIZE&gt;"}}}}'</span>
</code></pre>
<p>Do that for each PVC (<code>datadir-crdb-cockroachdb-0</code>, <code>datadir-crdb-cockroachdb-1</code>, and so on).</p>
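<p>If you have several replicas, a small loop saves some typing. This is only a sketch: it assumes three replicas (ordinals 0 to 2), the PVC names shown above, and a target size of <code>15Gi</code>. It also assumes your StorageClass has <code>allowVolumeExpansion: true</code>, since Kubernetes rejects the resize otherwise.</p>
<pre><code class="lang-bash"># Assumptions: three replicas (ordinals 0-2) and a 15Gi target size.
NEW_SIZE="15Gi"

for i in 0 1 2; do
  # Request the larger size on each replica's data volume.
  kubectl patch pvc "datadir-crdb-cockroachdb-$i" \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$NEW_SIZE\"}}}}"
done

# Check the CAPACITY column once the resize has gone through.
kubectl get pvc
</code></pre>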
<p><strong>Important:</strong> Increasing the size <em>may help</em>, but on its own it’s often not enough, because your disk speed (IOPS/throughput) also depends on factors beyond size.</p>
<p>Let’s break down why that’s the case, and what really affects your disk performance (especially on Google Cloud, which is what I’m using, too).</p>
<h4 id="heading-why-disk-speed-can-vary">Why Disk Speed Can Vary</h4>
<p>Your CockroachDB cluster uses <strong>external disks</strong> provided by your cloud provider (like Google, AWS, or Azure). The speed of those disks – that is, how fast they can read/write data – isn’t fixed. It depends on a few key factors.</p>
<p>On Google Cloud, disk performance depends on three main things:</p>
<ol>
<li><p><strong>Disk type</strong>: HDD (pd-standard), SSD (pd-balanced), or fast SSD (pd-ssd). The faster the disk type, the faster it can handle data operations.</p>
</li>
<li><p><strong>Disk size</strong>: larger disks usually come with higher speed limits (the bigger, the faster)</p>
</li>
<li><p><strong>VM’s vCPU count</strong>: more CPUs mean higher quotas for both</p>
<ul>
<li><p>read/write operations per second (<strong>IOPS</strong>), and</p>
</li>
<li><p>how much data can flow to/from the disk per second (<strong>throughput</strong>)</p>
</li>
</ul>
</li>
</ol>
<h4 id="heading-the-recommended-disk-type-for-cockroachdb">The Recommended Disk Type for CockroachDB</h4>
<p>The pd-ssd (Google’s fast SSD) is the recommended type for CockroachDB.</p>
<ul>
<li><p>Each pd-ssd disk starts with a minimum of 6,000 IOPS (read or write operations per second).</p>
</li>
<li><p>It also has around 240 MiB/s (~252 MB/s) of read/write throughput.</p>
</li>
</ul>
<p>In simple terms, that means even a small pd-ssd disk can handle about 6,000 read/write operations every second and move roughly 250 MB of data in and out every second. That’s pretty impressive!</p>
<p>But here’s the catch: those numbers can still vary depending on your <strong>VM family</strong> and <strong>CPU count</strong>.</p>
<h4 id="heading-how-vm-family-affects-disk-speed-e2-example">How VM Family Affects Disk Speed (E2 Example)</h4>
<p>If your CockroachDB is running on an E2 VM family (one of Google Cloud’s general-purpose VM types):</p>
<ul>
<li><p>A VM with 2–7 vCPUs can handle up to:</p>
<ul>
<li><p>15k IOPS (read/write operations per second)</p>
</li>
<li><p>240 MiB/s throughput, or about 250 MB/s (which is already far more than many databases ever use 😅)</p>
</li>
</ul>
</li>
<li><p>A VM with 8–15 vCPUs still allows 15k IOPS, but throughput jumps up to ~800 MiB/s 😮 –<br>  meaning your disk can push roughly 0.8 GB of data in/out every second.</p>
</li>
</ul>
<p>The more vCPUs you have, the higher these limits grow, both for IOPS and throughput.</p>
<h4 id="heading-putting-it-all-together">Putting It All Together</h4>
<p>So, if you notice high SQL P99 latency (queries taking long), and disk read and write IOPS or throughput (read &amp; write bytes) usage close to their limits, then your disk may be maxing out, not your database itself.</p>
<p>Here’s what you can do:</p>
<ul>
<li><p>Check your current VM’s vCPU count and the disk performance limit for that vCPU tier.</p>
</li>
<li><p>If you’re using E2 with low vCPUs (for example, 2–4), try increasing it to <strong>8 vCPUs or more</strong>. That’ll immediately lift your IOPS and throughput ceiling.</p>
</li>
</ul>
<h4 id="heading-example-e2-vm-family-iopsthroughput-table">Example: E2 VM Family IOPS/Throughput Table</h4>
<pre><code class="lang-bash">E2 per-VM caps (pd-ssd), shown as write / read:

Machine size   IOPS          Throughput (MiB/s)
e2-medium      10k / 12k     200 / 200
2–7 vCPUs      15k / 15k     240 / 240
8–15 vCPUs     15k / 15k     800 / 800
16–31 vCPUs    25k / 25k     1,000 / 1,200
32 vCPUs       60k / 60k     1,000 / 1,200
</code></pre>
<p>The rule is simple — the higher the CPU tier (2–7, 8–15, and so on), the higher the disk speed cap.</p>
<h4 id="heading-but-what-if-youre-still-seeing-slow-queries">⚠️ But What If You’re Still Seeing Slow Queries?</h4>
<p>If your CockroachDB queries are <em>still</em> slow, but your metrics show that you’re not fully using your disk capacity (based on your VM’s CPU range), then your <strong>disk size</strong> might be the actual limitation.</p>
<p>In that case:</p>
<ul>
<li><p>Gradually increase your disk size, for example from <code>50Gi</code> to <code>70Gi</code> to <code>100Gi</code>.</p>
</li>
<li><p>Each increase raises the amount of data your disk can move in and out per second (especially with pd-ssd).</p>
</li>
<li><p>Remember: once you increase disk size on Google Cloud, <strong>you can’t shrink it back down</strong>, so grow it slowly and observe improvements before scaling again.</p>
</li>
</ul>
<p>This step helps you pinpoint <em>exactly</em> whether the slowdown is coming from insufficient IOPS, throughput, or just a disk that’s too small for CockroachDB’s workload 💪🏾</p>
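<p>To make the interaction between disk size and the per-VM cap concrete, here’s a rough sketch of the arithmetic. It assumes pd-ssd performance is the 6,000 IOPS baseline plus about 30 IOPS per GiB of disk (the per-GiB figure is an assumption, so check the Persistent Disk docs for current values), and that the usable figure is whichever is lower: the disk’s own limit or the VM cap from the E2 table. The function name <code>effective_iops</code> is made up for illustration.</p>
<pre><code class="lang-bash"># Assumed pd-ssd figures; confirm them in the Persistent Disk docs.
BASE_IOPS=6000      # baseline IOPS for a pd-ssd disk
IOPS_PER_GIB=30     # assumption: extra IOPS per GiB of disk size

# effective_iops DISK_GIB VM_CAP
# Prints the lower of the disk's own limit and the per-VM cap.
effective_iops() {
  disk_limit=$(( BASE_IOPS + IOPS_PER_GIB * $1 ))
  if [ "$disk_limit" -lt "$2" ]; then
    echo "$disk_limit"
  else
    echo "$2"
  fi
}

effective_iops 100 15000   # 100 GiB disk on a 2-7 vCPU E2 VM
effective_iops 500 15000   # 500 GiB disk on the same VM
</code></pre>
<p>Under these assumed figures, the 100 GiB disk is held back by its size (9,000 IOPS), while the 500 GiB disk runs into the VM cap (15,000 IOPS). In the first case a bigger disk helps. In the second, only a larger VM does.</p>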
<h3 id="heading-memory-pressure-what-to-do-when-your-database-hits-the-limit">Memory Pressure — What to Do When Your Database Hits the Limit</h3>
<p>There are some signs in your cluster you can look out for that’ll tell you your database is getting close to its limit. Pods (database replicas) might be getting <strong>OOMKilled</strong> (out of memory) or being evicted by Kubernetes, or your memory usage might be staying above ~75–80% for a while.</p>
<p>If either of these is the case, you’re likely dealing with <strong>memory pressure</strong> (you can check memory usage on the CockroachDB overview dashboard).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762584827011/e7828548-7ed7-4a87-b6b2-fff52c6f6df1.png" alt="Accessing your Cluster memory usage" class="image--center mx-auto" width="1139" height="900" loading="lazy"></p>
<h4 id="heading-why-this-happens">Why this happens</h4>
<p>If you didn’t set memory requests and limits properly for each replica, the pod might not have enough head-room for all of its internal work (cache, SQL memory, background jobs) and Kubernetes kills it or it crashes.</p>
<p>Also, as you increase load (lots of queries, many users), your database needs more memory for two internal areas:</p>
<ul>
<li><p><code>--cache</code> (or <code>conf.cache</code>): in-memory data caching</p>
</li>
<li><p><code>--max-sql-memory</code> (or <code>conf.max-sql-memory</code>): memory for running SQL queries (joins, sorts, and so on).<br>  And yes, we covered the formula earlier <code>(2 × max-sql-memory) + cache ≤ ~ 80% of RAM limit</code>.</p>
</li>
</ul>
<h4 id="heading-what-to-do">What to do:</h4>
<p>First, you can increase the DB memory. In your Helm chart values (<code>cockroachdb-values.yml</code>), bump up the <code>statefulset.resources.limits.memory</code> and <code>statefulset.resources.requests.memory</code>. Or you can modify <code>conf.cache</code> and <code>conf.max-sql-memory</code> values (if you’re comfortable) but only if the total RAM limit is sufficient to support them.</p>
<p>Because the defaults (when you installed) set each to ~25% of RAM limit, they will scale automatically when you increase RAM.</p>
<p>For example:</p>
<ul>
<li><p>If RAM limit per pod = <strong>5 GiB</strong>, then cache ≈ <strong>1.25 GiB</strong>, max-sql-memory ≈ <strong>1.25 GiB</strong></p>
</li>
<li><p>If you raise RAM limit to <strong>8 GiB</strong>, these become ≈ <strong>2 GiB</strong> each. This keeps you inside the formula and avoids memory crashes.</p>
</li>
</ul>
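<p>If you do decide to change the percentages yourself, it helps to check them against the formula first. Here’s a minimal sketch using whole-MiB integer arithmetic. The function name <code>memory_budget_ok</code> is hypothetical and not part of CockroachDB or Helm.</p>
<pre><code class="lang-bash"># memory_budget_ok RAM_MIB CACHE_PCT SQL_PCT
# Checks (2 x max-sql-memory) + cache against 80% of the RAM limit.
memory_budget_ok() {
  cache_mib=$(( $1 * $2 / 100 ))
  sql_mib=$(( $1 * $3 / 100 ))
  used_mib=$(( 2 * sql_mib + cache_mib ))
  budget_mib=$(( $1 * 80 / 100 ))
  if [ "$used_mib" -le "$budget_mib" ]; then
    echo "ok"
  else
    echo "over"
  fi
}

memory_budget_ok 8192 25 25   # 8 GiB limit with the 25% defaults
memory_budget_ok 8192 35 25   # same limit with a larger cache
</code></pre>
<p>With an 8 GiB limit, the 25% defaults fit inside the budget, but raising the cache to 35% while leaving <code>max-sql-memory</code> at 25% pushes the total over the 80% mark.</p>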
<h4 id="heading-quick-yaml-snippet-example">Quick YAML snippet example:</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"8Gi"</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"8Gi"</span>
<span class="hljs-attr">conf:</span>
  <span class="hljs-attr">cache:</span> <span class="hljs-string">"25%"</span>
  <span class="hljs-attr">max-sql-memory:</span> <span class="hljs-string">"25%"</span>
</code></pre>
<p>After editing your values file, remember to apply it:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<h3 id="heading-when-queries-are-slow-but-everything-else-cpu-memory-amp-disk-looks-fine">When Queries Are Slow but Everything Else (CPU, Memory &amp; Disk) Looks “Fine”</h3>
<p>Sometimes you’ll see that your resource metrics (CPU, memory, disk I/O) all seem healthy. But your queries are still slow.</p>
<p>What then? One important cause: <strong>hotspots</strong> – especially “hot ranges” or “hot nodes” in CockroachDB.</p>
<p>A <strong>hot range</strong> is a portion of data (in CockroachDB, a range is a section of data from a table) that’s receiving much more traffic (reads or writes) than others.</p>
<p>A <strong>hot node</strong>, on the other hand, is a node/replica in the cluster which has significantly more load compared to the other nodes – often because it holds one or more hot ranges.</p>
<p>Because most of the traffic (queries) goes to a range that lives on one specific node, performance suffers locally even though your overall CPU / memory / disk metrics might look “okay”: queries are funneled into that one range, creating a “hotspot”.</p>
<p>Learn more about Hotspots <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/understand-hotspots">here</a>.</p>
<h4 id="heading-why-a-high-write-workload-can-slow-reads">Why A High Write Workload Can Slow Reads</h4>
<p>When you have lots of write queries, they may overload specific ranges or nodes (especially if the keyspace is skewed). Writes tend to:</p>
<ul>
<li><p>Acquire locks or latches on rows or ranges</p>
</li>
<li><p>Cause contention among transactions</p>
</li>
<li><p>Require coordination (for example, via Raft consensus) which impacts performance.</p>
</li>
</ul>
<p>When writes dominate a range, read queries that hit the same ranges may get queued behind these write operations, or suffer longer wait times.</p>
<p>Since reads and writes share the same underlying data/ranges, too many writes can delay reads by creating bottlenecks. The docs refer to this as a “write hotspot”.</p>
<h4 id="heading-key-signs-you-might-have-a-hotspot">Key Signs You Might Have a Hotspot</h4>
<ul>
<li><p>One node’s CPU % is much higher than the others (even though overall resources seem fine)</p>
</li>
<li><p>On the Hot Ranges page in the CockroachDB UI, some ranges show very high QPS (queries per second) compared to others.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762586236835/aeb3b0ea-b280-48d3-b12f-4cfe78d11dc1.png" alt="The Hot Ranges page in the CockroachDB dashboard UI" class="image--center mx-auto" width="1095" height="608" loading="lazy"></p>
</li>
<li><p>You observe that increasing overall resources (more CPU, more nodes) didn’t resolve the slowness. This suggests the problem isn’t “not enough resources” but “resource imbalance”.</p>
</li>
</ul>
<h4 id="heading-what-you-can-do">What You Can Do</h4>
<p>There are a few things you can do to prevent hotspots:</p>
<ul>
<li><p>Use the <strong>Hot Ranges</strong> UI page (go to the Database Console and then to Hot Ranges) to identify the range IDs and table/indexes causing the issue.</p>
</li>
<li><p>Examine how the key space is being used. If your table/index primary key is monotonically increasing (for example, timestamps or serial IDs), the writes may target a narrow portion of the data, causing a hotspot. The docs suggest using hash-sharded indexes or distributing writes across the key-space.</p>
</li>
<li><p>Ensure load is balanced across nodes: avoid “one node doing most of the work”. If needed, add nodes or ensure range distribution/lease-holder movement is happening.</p>
</li>
<li><p>Monitor your write-versus-read workload. If writes are heavy, they may cause queuing for reads even when resources appear OK. So look at write-heavy traffic patterns and try reducing the number of writes (if possible).</p>
</li>
</ul>
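<p>As an illustration of the hash-sharded index suggestion, here’s a small sketch you could run from any pod or machine that has the <code>cockroach</code> binary. The <code>events</code> table and its columns are made up for this example, and the host assumes the <code>crdb-cockroachdb-public</code> service on an insecure development cluster. The table uses a random UUID primary key so inserts don’t all land at the end of the key space, and the index on the ever-increasing timestamp is hash-sharded so its writes are spread across ranges too.</p>
<pre><code class="lang-bash"># Hypothetical table: random UUID primary key plus a hash-sharded
# index on the monotonically increasing created_at column.
cockroach sql --insecure --host crdb-cockroachdb-public -e "
CREATE TABLE IF NOT EXISTS events (
  id UUID NOT NULL DEFAULT gen_random_uuid(),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  payload STRING,
  PRIMARY KEY (id)
);
CREATE INDEX IF NOT EXISTS events_created_at_idx
  ON events (created_at) USING HASH;
"
</code></pre>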
<h4 id="heading-note">⚠️ Note</h4>
<p>Learning everything about hotspots, key visualizers, and range splitting is a bit advanced. For those wanting to dive deeper: see the CockroachDB <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/performance-recipes">Performance Recipes page</a>.</p>
<h3 id="heading-understanding-disk-speed-iops-amp-throughput-across-cloud-providers">Understanding Disk Speed (IOPS &amp; Throughput) Across Cloud Providers</h3>
<p>So far, we’ve talked about how disk speed affects CockroachDB’s performance – especially how Google Cloud measures it. But it’s important to know that <strong>each cloud provider has its own way of measuring and limiting disk performance</strong> (IOPS and throughput).</p>
<p>So, while our earlier examples focused on Google Cloud, similar logic applies to AWS, Azure, and even DigitalOcean, just with different formulas and limits.</p>
<h4 id="heading-for-google-cloud">For Google Cloud:</h4>
<p>These guides break down how disk performance works:</p>
<ul>
<li><p><a target="_blank" href="https://cloud.google.com/compute/docs/disks/performance">Persistent Disk performance overview</a>: explains how baseline IOPS and throughput are calculated and the per-instance caps.</p>
</li>
<li><p><a target="_blank" href="https://docs.cloud.google.com/compute/docs/disks/persistent-disks">About Persistent Disks</a>: quick definitions of <code>pd-standard</code> (HDD), <code>pd-balanced</code> (SSD), and <code>pd-ssd</code> (SSD).</p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/compute/docs/disks/optimizing-pd-performance">Optimize PD performance</a>: shows how disk size, machine series, and tuning can affect performance.</p>
</li>
</ul>
<h4 id="heading-for-aws-ebs">For AWS (EBS):</h4>
<p>AWS’s Elastic Block Store (EBS) has several disk types:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html">EBS volume types</a>: overview of all SSD and HDD types (<code>gp3</code>, <code>gp2</code>, <code>io2</code>, and so on).</p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html">General Purpose SSD (gp3)</a>: lets you provision custom IOPS and throughput for your disks (about 0.25 MiB/s per IOPS, up to 2,000 MiB/s).</p>
</li>
</ul>
<h4 id="heading-for-azure-managed-disks">For Azure (Managed Disks):</h4>
<p>Azure disks also vary by type and size:</p>
<ul>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types">Disk types overview</a>: compares Standard HDD, Standard SSD, Premium SSD, Premium SSD v2, and Ultra Disk.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-deploy-premium-v2">Premium SSD v2</a>: lets you independently set IOPS and throughput for your disks.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-performance">VM &amp; disk performance</a>: lists per-VM IOPS and throughput caps.</p>
</li>
</ul>
<h4 id="heading-for-digitalocean">For DigitalOcean:</h4>
<p>DigitalOcean offers simpler storage setups:</p>
<ul>
<li><p><a target="_blank" href="https://docs.digitalocean.com/products/volumes/">Volumes overview</a>: explains block storage and NVMe details.</p>
</li>
<li><p><a target="_blank" href="https://docs.digitalocean.com/products/volumes/details/limits/">Volume Limits</a>: shows per-Droplet IOPS and throughput caps (including burst windows).</p>
</li>
</ul>
<h3 id="heading-downsizing-the-cluster-reducing-replicas">Downsizing the Cluster (Reducing Replicas)</h3>
<p>Now that we’ve seen how to scale up our CockroachDB cluster, let’s look at how to scale it down safely and correctly.</p>
<p>Let’s assume we scaled our cluster from 3 replicas to 5 replicas earlier (to handle more workload).</p>
<p>PS: If your CockroachDB pods were crashing often, you might need to increase the CPU and memory limits in the Helm chart configuration, like this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">5</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"2Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span> <span class="hljs-comment"># We can keep the memory requests and limits inconsistent for now, since we're in a development environment</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Then, you update the cluster using:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>After a few minutes, you can confirm the newly added replicas by running <code>kubectl get pods</code>. You should now see five CockroachDB pods running.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762612478598/dee9f9e7-6b31-4b06-aed3-e2b0b97268fd.png" alt="The newly added CockroachDB replicas" class="image--center mx-auto" width="526" height="139" loading="lazy"></p>
<p>Also, check your CockroachDB Admin UI – the new nodes should now appear in the cluster overview.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762612539734/30e01a7d-3d2b-4160-be90-2988a161d87d.png" alt="Newly added nodes in the cluster" class="image--center mx-auto" width="1502" height="728" loading="lazy"></p>
<p>P.S: You might experience some issues when upscaling your cluster, especially if you don’t have sufficient memory and CPU on your PC or wherever you’re running your Kubernetes cluster.</p>
<h3 id="heading-the-wrong-way-to-downscale">⚠️ The Wrong Way to Downscale</h3>
<p>Now, what if your workload reduces and you’d like to cut costs by scaling down from 5 replicas back to 3?</p>
<p>You might think, <em>“Oh, I’ll just reduce the number of replicas in the Helm chart from 5 to 3 and redeploy.”</em> But hold on, that’s very wrong! 😅</p>
<p>Scaling up CockroachDB is simple…but scaling down must be done carefully, for reasons I’ll explain next.</p>
<h3 id="heading-decommissioning-a-node-before-scaling-down-the-cluster">Decommissioning a Node Before Scaling Down the Cluster</h3>
<p>Before you go ahead and reduce the number of replicas in your CockroachDB cluster, it’s important to follow the right process.</p>
<p>You <em>can’t</em> just go from 5 replicas down to 3 and expect everything to go smoothly. There are steps you must take.</p>
<h4 id="heading-why-you-cant-just-scale-from-5-to-3-instantly">Why you can’t just scale from 5 to 3 instantly</h4>
<p>If you reduce your cluster size too quickly, you might:</p>
<ul>
<li><p>Lose data redundancy or fail to meet the required replication factor.</p>
</li>
<li><p>Cause data rebalancing to happen under heavy load, which can slow queries.</p>
</li>
<li><p>Put your cluster into a state where certain ranges or data replicas don’t have enough copies to remain fault-tolerant.</p>
</li>
</ul>
<h4 id="heading-the-correct-approach-decommission-first-then-scale-down-one-node-at-a-time">✅ The correct approach: Decommission first, then scale down one node at a time</h4>
<p>Here’s the safe way to downscale:</p>
<ol>
<li><p><strong>Decommission</strong> the node you plan to remove.</p>
</li>
<li><p>Once decommissioning is complete, <strong>reduce the replica count</strong> (for example, from 5 to 4).</p>
</li>
<li><p>Delete the disk/PVC tied to that removed node.</p>
</li>
<li><p>Repeat the process (remove one node at a time) until you reach your target size (for example, down to 3 replicas).</p>
</li>
</ol>
<h4 id="heading-step-by-step-decommission-the-5th-node-before-scaling-5-to-4">Step-by-step: Decommission the 5th node (before scaling 5 to 4)</h4>
<ol>
<li><p><strong>Create a client pod</strong> to run CockroachDB commands.<br> Create a file named <code>cockroachdb-client.yml</code> with this content:</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
 <span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
 <span class="hljs-attr">metadata:</span>
   <span class="hljs-attr">name:</span> <span class="hljs-string">cockroachdb-client</span>
 <span class="hljs-attr">spec:</span>
   <span class="hljs-attr">serviceAccountName:</span> <span class="hljs-string">&lt;SA&gt;</span>
   <span class="hljs-attr">containers:</span>
     <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cockroachdb-client</span>
       <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
       <span class="hljs-attr">imagePullPolicy:</span> <span class="hljs-string">IfNotPresent</span>
       <span class="hljs-attr">command:</span>
         <span class="hljs-bullet">-</span> <span class="hljs-string">sleep</span>
         <span class="hljs-bullet">-</span> <span class="hljs-string">"2147483648"</span>
   <span class="hljs-attr">terminationGracePeriodSeconds:</span> <span class="hljs-number">300</span>
</code></pre>
<p> Replace <code>&lt;SA&gt;</code> with your CockroachDB service account name (find it via <code>kubectl get sa -l app.kubernetes.io/name=cockroachdb</code>).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620657038/34d5eb4b-de16-4e8a-b85c-1e7bf6b76172.png" alt="The CockroachDB service account details" class="image--center mx-auto" width="791" height="55" loading="lazy"></p>
</li>
<li><p>Apply the manifest:</p>
<pre><code class="lang-bash"> kubectl apply -f cockroachdb-client.yml
</code></pre>
</li>
<li><p>Confirm the pod is running:</p>
<pre><code class="lang-bash"> kubectl get pods
</code></pre>
<p> You should see <code>cockroachdb-client</code>.</p>
</li>
<li><p>Exec into the client pod:</p>
<pre><code class="lang-bash"> kubectl exec -it cockroachdb-client -- bash
</code></pre>
</li>
<li><p>Get the list of nodes and IDs:</p>
<pre><code class="lang-bash"> ./cockroach node status --insecure --host &lt;SERVICE_NAME&gt;
</code></pre>
<p> Find your service name: <code>kubectl get svc -l app.kubernetes.io/component=cockroachdb</code>. In our case it’s <code>crdb-cockroachdb-public</code>.</p>
<p> You’ll see nodes with IDs 1, 2, 3, 4, 5. Each maps to a replica pod like <code>crdb-cockroachdb-0</code>, <code>-1</code>, <code>-2</code>, <code>-3</code>, <code>-4</code>.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620790692/af8d382e-71db-4eab-af7a-a3491d98c8a8.png" alt="The nodes in the CockroachDB cluster" class="image--center mx-auto" width="1658" height="299" loading="lazy"></p>
</li>
<li><p><strong>Decommission the node with the highest index</strong> (since Kubernetes will remove the highest-numbered replica when scaling down).<br> For example, if you’re removing the pod <code>crdb-cockroachdb-4…</code>, and the node ID is 5:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620838125/b51856cb-2fbb-4b24-ba41-21f572c7678c.png" alt="The node to be decommissioned" class="image--center mx-auto" width="527" height="38" loading="lazy"></p>
<p> Run the command below to decommission the 5th node.</p>
<pre><code class="lang-bash"> ./cockroach node decommission 5 --host crdb-cockroachdb-public --insecure
</code></pre>
</li>
<li><p>Navigate to the CockroachDB dashboard, and monitor until the node status shows as <code>decommissioned</code>.<br> In the CockroachDB Console’s Cluster Overview page, you’ll see formerly removed nodes under “Recently Decommissioned Nodes”.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620923692/e678b21b-e2cc-4fe5-bd5b-46c4b0248958.png" alt="Recently decommissioned nodes on the CockroachDB Cluster Overview page" class="image--center mx-auto" width="1335" height="734" loading="lazy"></p>
</li>
<li><p><strong>Scale down the replicas</strong> in your Helm values file:</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">statefulset:</span>
   <span class="hljs-attr">replicas:</span> <span class="hljs-number">4</span>
 <span class="hljs-string">...</span>
</code></pre>
<p> Then run:</p>
<pre><code class="lang-bash"> helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
</li>
<li><p>Verify pods:</p>
<pre><code class="lang-bash"> kubectl get pods
</code></pre>
<p> You should now see 4 CockroachDB replica pods.</p>
</li>
<li><p><strong>Delete the PVC</strong> for the removed node (to avoid paying for storage you’re no longer using):</p>
</li>
</ol>
<pre><code class="lang-bash">kubectl delete pvc datadir-crdb-cockroachdb-4
</code></pre>
<ol start="11">
<li>Repeat the process for the next node if you want to go from 4 to 3 replicas: decommission node #4 next, scale to 3, delete its PVC, and so on.</li>
</ol>
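<p>If you’d rather not rely on the dashboard alone when waiting for decommissioning to finish, you can also check from the client pod. This sketch assumes the same insecure development cluster and the <code>crdb-cockroachdb-public</code> service used above. The <code>--decommission</code> flag adds decommissioning-related columns (such as the node’s membership state) to the usual output.</p>
<pre><code class="lang-bash"># Run inside the cockroachdb-client pod.
# Shows each node together with its decommissioning details, so you can
# confirm the node is fully decommissioned before reducing the replicas.
./cockroach node status --decommission --insecure --host crdb-cockroachdb-public
</code></pre>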
<p>After you’re done, you’ll have the target state (for example, 3 nodes) safely and cleanly without causing cluster instability or data loss.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762621007089/cf7fce07-a3a6-4b01-9536-1d5476c2119e.png" alt="Scaling down to 3 nodes, the nodes status on the CockroachDB dashboard" class="image--center mx-auto" width="1314" height="705" loading="lazy"></p>
<p>To learn more about scaling down your CockroachDB nodes, visit the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/scale-cockroachdb-kubernetes?filters=helm#remove-nodes">official CockroachDB docs</a>.</p>
<p>Note that you should <strong>NOT</strong> use Horizontal Pod Autoscalers for scaling up and down your CockroachDB cluster.</p>
<p>Remember, before scaling down, you need to <strong>DECOMMISSION THE NODES FIRST</strong>, and <strong>scale down ONE AT A TIME</strong>!</p>
<p>However, Horizontal Pod Autoscalers do NOT follow this process. So if you intend to auto-scale your CockroachDB cluster, it’s best to keep a fixed number of replicas (for example, 3, 5, or 7).</p>
<p>Then set up a Vertical Pod Autoscaler to scale their CPU and RAM (remember to set the memory and CPU requests and limits to the same quantity to prevent eviction, as explained earlier).</p>
<h2 id="heading-what-to-consider-when-deploying-cockroachdb-on-google-kubernetes-engine-gke">What to Consider When Deploying CockroachDB on Google Kubernetes Engine (GKE) ☁️</h2>
<p>Up until now we’ve been working in a <strong>development environment</strong> (using Minikube, local setups), testing and learning.</p>
<p>Now we’re ready to move into <strong>production mode 🤓</strong>. And one of the best places to host CockroachDB in production is on GKE.</p>
<p>In this section, we’ll cover GKE-specific considerations, such as storage classes, load balancers, networking, and how to secure our CockroachDB cluster on GKE using mTLS for authenticating our clients and encrypting any data sent to and from our CockroachDB cluster.</p>
<h3 id="heading-creating-your-gke-cluster">Creating Your GKE Cluster</h3>
<p>To get started, head over to the <a target="_blank" href="https://console.cloud.google.com/"><strong>Google Cloud Console</strong></a>.</p>
<p>In the search bar at the top, type “Kubernetes” and click on “Kubernetes Engine” from the dropdown.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836788168/0d509529-69fb-4308-ba05-6a1426ee7fe1.png" alt="Searching the Kubernetes Engine resource" class="image--center mx-auto" width="1381" height="183" loading="lazy"></p>
<p>You’ll be taken to the Kubernetes Engine page. On the left sidebar, click “Clusters.” Then click the “Create” button at the top.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836843514/fc6d59a2-5b9d-4dee-9fea-7bbb7fc2a023.png" alt="Creating a new cluster" class="image--center mx-auto" width="1352" height="434" loading="lazy"></p>
<p>💡 <strong>Note:</strong> You’ll need to enable the <strong>Compute Engine API</strong> before you can create a GKE cluster. If you haven’t done that yet, Google Cloud will automatically redirect you to a page where you can enable it. Just click “Enable”, then return to the cluster page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763998084001/3ecbe47c-3def-4f9c-bc80-dabe2c0002c8.png" alt="Enabling the Compute Engine API" class="image--center mx-auto" width="1090" height="537" loading="lazy"></p>
<p>You can also learn more about enabling APIs in Google Cloud here: <a target="_blank" href="https://docs.cloud.google.com/endpoints/docs/openapi/enable-api">Enable APIs in Google Cloud</a>.</p>
<p>Once you’re back, you’ll see the cluster creation page. If it defaults to Autopilot, click “Switch to Standard cluster” in the top-right corner. This gives you more control over node settings.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836938958/a2c35e79-6404-4c3a-a821-94d4ce926839.png" alt="Switching to Standard Cluster settings" class="image--center mx-auto" width="1153" height="676" loading="lazy"></p>
<p>Under Cluster basics, give your cluster a name – something like <code>cockroachdb-tutorial</code> works great! Then, set Location type to Zonal (that’s fine for now).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836985443/eb7b1f79-66e3-4ca4-bfe3-842c5571509b.png" alt="Configuring Zonal clusters" class="image--center mx-auto" width="864" height="803" loading="lazy"></p>
<p>On the left sidebar, go to “Node pools.” You’ll see a default pool already added.</p>
<ul>
<li><p>Keep the name as is.</p>
</li>
<li><p>Set the Number of nodes to 1.</p>
</li>
<li><p>Enable the Cluster autoscaler option (so it can scale up automatically later).</p>
</li>
<li><p>Set the Maximum number of Nodes to 10, and the minimum to 0.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762918866561/89a00b2c-46e8-440d-8662-77386cc2cf0e.png" alt="Modifying our default node pool, the cluster autoscaler, etc" class="image--center mx-auto" width="867" height="822" loading="lazy"></p>
</li>
</ul>
<p>Next, click the dropdown arrow beside “default-pool” and select “Nodes.” Here, set up your node specifications:</p>
<ul>
<li><p><strong>VM family:</strong> <code>E2</code></p>
</li>
<li><p><strong>Machine type:</strong> <code>Custom</code></p>
</li>
<li><p><strong>vCPUs:</strong> 2</p>
</li>
<li><p><strong>Memory:</strong> 7 GB</p>
</li>
<li><p><strong>Boot disk type:</strong> Standard persistent disk</p>
</li>
<li><p><strong>Disk size:</strong> 50 GB</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762837157043/89da8297-8ecc-4369-aef5-c3b0e75e37be.png" alt="Configuring the E2 Machine type" class="image--center mx-auto" width="860" height="617" loading="lazy"></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762920102117/173a1d66-d31b-49e3-835b-436ec2781b49.png" alt="Configuring our default pool CPU, RAM, and disk" class="image--center mx-auto" width="870" height="616" loading="lazy"></p>
</li>
</ul>
<p>When all that’s set, click “Create.” Your cluster will start provisioning.</p>
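<p>For reference, here is a rough command-line equivalent of the console settings above. Treat it as a sketch rather than the exact configuration the console produces: the zone <code>us-central1-a</code> is borrowed from the connect command shown later, <code>&lt;PROJECT_ID&gt;</code> is a placeholder, and <code>e2-custom-2-7168</code> is the machine type name for an E2 custom VM with 2 vCPUs and 7 GB (7168 MB) of memory.</p>
<pre><code class="lang-bash"># Zonal Standard cluster with one autoscaled node pool (0 to 10 nodes)
gcloud container clusters create cockroachdb-tutorial \
  --zone us-central1-a \
  --num-nodes 1 \
  --enable-autoscaling --min-nodes 0 --max-nodes 10 \
  --machine-type e2-custom-2-7168 \
  --disk-type pd-standard \
  --disk-size 50 \
  --project &lt;PROJECT_ID&gt;
</code></pre>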
<h3 id="heading-connecting-to-your-gke-cluster">Connecting to your GKE cluster</h3>
<p>Once your GKE cluster creation is complete (this might take a few minutes), you’ll see something like this in the console:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844143298/042cc870-82ae-4981-b7c8-d80b187f37a9.png" alt="Accessing our new cluster page" class="image--center mx-auto" width="1267" height="537" loading="lazy"></p>
<p>Next, click the “Connect” link at the top of the page. A modal will pop up. Copy the CLI command you see.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844213835/119b603c-26c3-46ee-83e1-8feba78031a7.png" alt="Getting the command to access the cluster" class="image--center mx-auto" width="1258" height="710" loading="lazy"></p>
<p>It’ll look something like:</p>
<pre><code class="lang-bash">gcloud container clusters get-credentials cockroachdb-tutorial --zone us-central1<span class="hljs-_">-a</span> --project &lt;PROJECT_NAME&gt;
</code></pre>
<p>📌 <strong>Note:</strong> To run this command successfully, you need to have the <code>gcloud</code> CLI tool installed. If you don’t have it yet, visit <a target="_blank" href="https://docs.cloud.google.com/sdk/docs/install">Install Google Cloud SDK</a> and pick the steps for your OS.</p>
<p>After installing the <code>gcloud</code> CLI, run:</p>
<pre><code class="lang-bash">gcloud auth login
</code></pre>
<p>This authenticates your terminal with your Google Cloud account so you can access the cluster securely.</p>
<p>Once your terminal is authenticated with Google Cloud, run the command you copied earlier. You should see something like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844890936/12e6d8a7-b0ae-44d1-a77c-aeb118ba269b.png" alt="Output after connecting your terminal to the newly created Kubernetes cluster" class="image--center mx-auto" width="1293" height="54" loading="lazy"></p>
<p>Now run the command to retrieve your pods, <code>kubectl get po</code>. This will retrieve the pods from your new cluster on Google Kubernetes Engine, not Minikube.</p>
<p>We haven’t deployed anything yet, so the default namespace should be empty.</p>
<p>But we should have at least 1 worker node available. Run the <code>kubectl get nodes</code> command to view it. GKE manages the control plane for us, so the output only lists worker nodes. You should see something similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762917947091/c29eb598-1723-43d0-a77f-c6611d04d3d8.png" alt="The available nodes in the GKE cluster" class="image--center mx-auto" width="702" height="55" loading="lazy"></p>
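<p>If you’re unsure whether <code>kubectl</code> is pointing at GKE or still at Minikube, a quick check like the one below can help. It assumes the <code>get-credentials</code> command above succeeded. GKE context names normally start with <code>gke_</code>, followed by the project, zone, and cluster name.</p>
<pre><code class="lang-bash"># Show which cluster kubectl is currently talking to
kubectl config current-context

# List every context in your kubeconfig (Minikube will still be there)
kubectl config get-contexts

# Show the worker nodes with extra details such as internal IPs and OS image
kubectl get nodes -o wide
</code></pre>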
<h3 id="heading-deploying-cockroachdb-in-production-on-gke">Deploying CockroachDB in Production (on GKE)</h3>
<p>Now that we’ve successfully created our Google Kubernetes Engine (GKE) cluster, it’s time to deploy our CockroachDB cluster in it – this time, in production mode.</p>
<p>Unlike our earlier Minikube setup (which we used for local development), deploying to GKE introduces new considerations like security, storage classes, and authentication methods – all tailored for a real-world production environment.</p>
<p>To get started, create a new file called <code>cockroachdb-production.yml</code>, and paste the following configuration inside:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">serviceAccount:</span>
    <span class="hljs-attr">create:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"crdb-cockroachdb"</span>
    <span class="hljs-attr">annotations:</span>
      <span class="hljs-attr">iam.gke.io/gcp-service-account:</span> <span class="hljs-string">&lt;GOOGLE_SERVICE_ACCOUNT&gt;</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">10Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">premium-rwo</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app.kubernetes.io/component:</span> <span class="hljs-string">init</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Replace the placeholder <code>&lt;GOOGLE_SERVICE_ACCOUNT&gt;</code> with the <strong>CockroachDB backup service account</strong> you created earlier (in the “Backing Up CockroachDB to Google Cloud Storage” section). It should look something like this <code>cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</code>.</p>
<h3 id="heading-understanding-the-configuration">Understanding the Configuration</h3>
<p>Let’s break down what’s happening in this production Helm values configuration and how it differs from the one we used in Minikube.👇🏽</p>
<h4 id="heading-1-modified-the-statefulset-configuration">1. Modified the <code>statefulset</code> Configuration</h4>
<p>We’re allocating 3 GiB of RAM and 1 vCPU to each replica, both as requests and limits.</p>
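<p>Setting the requests equal to the limits reserves a fixed amount of CPU and memory for each CockroachDB node. A replica can never use more than it requested, which makes it a much less likely candidate for eviction when a VM runs low on resources.</p>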
<p>We also defined a <strong>service account</strong> and annotated it with a GCP service account using the <code>iam.gke.io/gcp-service-account</code> annotation.</p>
<p>This annotation allows CockroachDB to securely access Google Cloud services (like Google Cloud Storage) without using static JSON key files (key.json), thanks to a GKE feature called <strong>Workload Identity</strong>.</p>
<p>In production, we let GKE handle authentication to Google services instead of mounting key files.</p>
<h4 id="heading-2-removed-podsecuritycontext">2. Removed <code>podSecurityContext</code></h4>
<p>In Minikube, we included this section:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">podSecurityContext:</span>
  <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
<span class="hljs-string">...</span>
</code></pre>
<p>We did that to give CockroachDB permission to access our local disk for persistent storage. But in GKE, this isn’t needed. Google Cloud handles storage mounting securely on our behalf, so we can safely omit this part.</p>
<h4 id="heading-3-removed-podantiaffinity-and-nodeselector">3. Removed <code>podAntiAffinity</code> and <code>nodeSelector</code></h4>
<p>In our Minikube deployment, we used:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">podAntiAffinity:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
<span class="hljs-attr">nodeSelector:</span>
  <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
<span class="hljs-string">...</span>
</code></pre>
<p>That was just to <strong>force all CockroachDB instances to run on the same node</strong> on Minikube.</p>
<p>But in production, we <em>want</em> each replica on a different VM. This ensures high availability: if one VM fails, only one CockroachDB replica is affected, and the cluster stays active.</p>
<p>Since our cluster uses a replication factor of 3, at least 2 replicas (a quorum) need to be active for the database to stay online. If fewer than 2 are available, the database stops serving reads and writes until quorum is restored 🥲.</p>
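<p>The quorum arithmetic is easy to check yourself. The snippet below is just an illustration (the variable names are made up for this example): a quorum is a strict majority of the replicas, and whatever is left over is the number of failures the cluster can tolerate.</p>
<pre><code class="lang-bash"># Quorum is a strict majority of the replicas
replicas=3
quorum=$(( replicas / 2 + 1 ))
tolerated=$(( replicas - quorum ))
echo "quorum=$quorum tolerated_failures=$tolerated"
</code></pre>
<p>With 3 replicas this prints <code>quorum=2 tolerated_failures=1</code>. Change <code>replicas</code> to 5 and you get a quorum of 3 with 2 tolerated failures.</p>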
<h4 id="heading-4-removed-env-volumes-and-volumemounts">4. Removed <code>env</code>, <code>volumes</code>, and <code>volumeMounts</code></h4>
<p>In Minikube, we had to manually mount the Service Account key:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">env:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
    <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>
<span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
    <span class="hljs-attr">secret:</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>
<span class="hljs-attr">volumeMounts:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
    <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
    <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This was needed so CockroachDB could access our Google Cloud Storage bucket for backups.</p>
<p>But in production, we don’t use key files. Instead, we use a GKE feature called Workload Identity.</p>
<p>It securely binds a Kubernetes Service Account to a Google Service Account, giving our CockroachDB pods the same permissions as the GCP account: no keys, no secrets, and much safer 🔒</p>
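<p>For the binding to work, the Google service account has to allow the Kubernetes service account to impersonate it. If that hasn’t been set up yet, a sketch of the IAM binding looks like this. It assumes Workload Identity is enabled on the cluster, that CockroachDB is installed in the <code>default</code> namespace, and that the Google service account is the backup account mentioned earlier. <code>&lt;PROJECT_ID&gt;</code> is a placeholder.</p>
<pre><code class="lang-bash"># Let the Kubernetes service account default/crdb-cockroachdb act as the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[default/crdb-cockroachdb]"
</code></pre>
<p>The part in square brackets is <code>namespace/service-account-name</code>, so adjust it if you install the chart elsewhere or rename the service account.</p>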
<h4 id="heading-5-updated-storagepersistentvolumestorageclass">5. Updated <code>storage.persistentVolume.storageClass</code></h4>
<p>In Minikube, we used a standard disk:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>
<span class="hljs-string">...</span>
</code></pre>
<p>But for production, we’re switching to a faster SSD:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">10Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">premium-rwo</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This uses Google Cloud’s <code>pd-ssd</code> disk type which is the recommended choice for CockroachDB due to its <strong>high IOPS</strong> (read/write operations per second) and <strong>throughput</strong>. This gives our cluster faster read and write speeds under load, leading to better performance.</p>
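<p>You can confirm which disk type backs a storage class directly from the cluster. This is a small sketch that assumes your GKE cluster ships the built-in <code>premium-rwo</code> class. On a typical cluster, the second command should print <code>pd-ssd</code>.</p>
<pre><code class="lang-bash"># List the storage classes available in the cluster
kubectl get storageclass

# Print the disk type behind the premium-rwo class
kubectl get storageclass premium-rwo -o jsonpath='{.parameters.type}'
</code></pre>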
<h4 id="heading-6-enabled-tls-for-secure-communication">6. Enabled TLS for Secure Communication</h4>
<p>In development, we disabled TLS:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>
</code></pre>
<p>That made it simpler to connect without dealing with certificates.</p>
<p>But in production, security is non-negotiable. We’re enabling TLS to ensure that all communication with CockroachDB is encrypted in transit, and that only clients with <strong>valid certificates</strong> (signed by the same authority) can connect. This is <strong>mutual TLS (mTLS)</strong> authentication.</p>
<p>mTLS ensures that both sides (client and server) prove who they are, preventing impersonation or man-in-the-middle attacks. It’s one of the strongest ways to secure a production database connection.</p>
<p>To learn more about TLS and mTLS encryption, check out:</p>
<ul>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/understanding-website-encryption/">Understanding Website Encryption (freeCodeCamp)</a></p>
</li>
<li><p><a target="_blank" href="https://medium.com/@LukV/mutual-tls-mtls-a-deep-dive-into-secure-client-server-communication-bbb83f463292">Mutual TLS Deep Dive (Medium)</a></p>
</li>
</ul>
<h3 id="heading-installing-the-cockroachdb-cluster-on-gke">Installing the CockroachDB Cluster on GKE</h3>
<p>We’ll use the values file you created (<code>cockroachdb-production.yml</code>) and deploy our CockroachDB cluster in our GKE cluster using Helm.</p>
<h4 id="heading-deploy-the-cluster">Deploy the cluster</h4>
<p>Run the following command:</p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-production.yml
</code></pre>
<p>This command tells Helm to install a release named <code>crdb</code> using the <code>cockroachdb/cockroachdb</code> chart with your custom production values file.</p>
<p>This step will take a few minutes. GKE will spin up 3 (or more) worker nodes to host the CockroachDB replicas.</p>
<p>Thanks to pod anti-affinity rules, you’ll typically see <strong>one replica pod per VM</strong> (which improves fault tolerance).</p>
<h4 id="heading-verify-the-pods">Verify the pods</h4>
<p>Once provisioning is done, check the pods:</p>
<pre><code class="lang-bash">kubectl get pods
</code></pre>
<p>You should see three CockroachDB replica pods (for example: <code>crdb-cockroachdb-0</code>, <code>crdb-cockroachdb-1</code>, <code>crdb-cockroachdb-2</code>) in <code>Running</code> status.</p>
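<p>To see the one-replica-per-VM placement for yourself, you can ask <code>kubectl</code> to include the node each pod landed on. This sketch assumes the release is named <code>crdb</code> as above, which makes the StatefulSet name <code>crdb-cockroachdb</code>.</p>
<pre><code class="lang-bash"># Wait until all replicas of the StatefulSet are ready
kubectl rollout status statefulset/crdb-cockroachdb --timeout=10m

# The NODE column shows which VM each replica is running on
kubectl get pods -o wide

# Compare against the list of worker nodes
kubectl get nodes
</code></pre>
<p>Each CockroachDB pod should show a different value in the <code>NODE</code> column.</p>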
<h4 id="heading-verify-the-storage-class-ssd">Verify the storage class (SSD)</h4>
<p>Now check the persistent volume claims to confirm they’re using the fast SSD storage class you requested:</p>
<pre><code class="lang-bash">kubectl get pvc
</code></pre>
<p>Look for your PVCs (persistent volume claims) and check the <code>STORAGECLASS</code> column. You should see something like <code>premium-rwo</code> instead of <code>standard</code> or <code>standard-rwo</code>. This confirms that your replicas are using the high-performance disk type you configured.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762928441524/d7e3d17f-c144-468f-8cc5-d71628ac6a3b.png" alt="The CockroachDB replicas and disk in production" class="image--center mx-auto" width="1008" height="209" loading="lazy"></p>
<p>📌 This is important, because in production you want good disk IOPS and throughput. Slower disks can bottleneck the database.</p>
<h3 id="heading-connecting-to-our-cockroachdb-cluster-now-that-tls-mtls-are-enabled">Connecting to Our CockroachDB Cluster (Now That TLS + mTLS Are Enabled)</h3>
<p>Now that we’ve enabled TLS encryption and mTLS authentication, let’s actually try connecting to the cluster so you can <em>see</em> what this security setup looks like in action.</p>
<p>We’ll break down in more detail what TLS and mTLS mean shortly. But for now, let’s jump straight into trying to connect – because once you see the behavior, the explanation becomes much easier to understand.</p>
<h4 id="heading-step-1-expose-the-cockroachdb-cluster-to-your-local-pc-using-port-forwarding">Step 1: Expose the CockroachDB Cluster to Your Local PC (Using Port Forwarding)</h4>
<p>Just like we've been doing from the start, we’ll expose our CockroachDB cluster through <strong>port-forwarding</strong>.</p>
<p>Open a new terminal window and run:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26259:26257
</code></pre>
<p>What this means:</p>
<ul>
<li><p>The first port (26259) is the port on your computer.</p>
</li>
<li><p>The second port (26257) is the port inside the CockroachDB cluster.</p>
</li>
<li><p>Format is: <code>&lt;YOUR_COMPUTER_PORT&gt;:&lt;COCKROACHDB_PORT&gt;</code></p>
</li>
</ul>
<p>So now, CockroachDB will be reachable locally at <code>localhost:26259</code>.</p>
<h4 id="heading-step-2-open-beekeeper-studio-and-create-a-fresh-connection">Step 2: Open Beekeeper Studio and Create a Fresh Connection</h4>
<p>If Beekeeper Studio is still connected to our old Minikube cluster, or you're not seeing the “new connection” screen, just press <code>Ctrl + Shift + N</code>. This opens a new connection window instantly.</p>
<h4 id="heading-step-3-enter-the-connection-details">Step 3: Enter the Connection Details</h4>
<p>Now fill in these fields:</p>
<ul>
<li><p><strong>Port:</strong> <code>26259</code></p>
</li>
<li><p><strong>User:</strong> <code>root</code></p>
</li>
<li><p><strong>Default Database:</strong> <code>defaultdb</code></p>
</li>
</ul>
<p>Now click Test Connection.</p>
<p>And boom! You should see a message telling you something like:</p>
<blockquote>
<p>“This cluster is running in secure mode. You must use SSL to connect.”</p>
</blockquote>
<p>It’ll look similar to this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763193779864/f3e7abcb-34b0-4c21-8652-48a03e4ff6c9.png" alt="Trying to connect to the new CockroachDB cluster in insecure mode" class="image--center mx-auto" width="562" height="615" loading="lazy"></p>
<p>This is good: it means our CockroachDB cluster is officially in <strong>secure mode</strong>, and it’s rejecting any connection that doesn’t use TLS with proper certificates.</p>
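<p>You can reproduce the same rejection from the terminal. The sketch below assumes you have the <code>psql</code> client installed (CockroachDB speaks the PostgreSQL wire protocol) and that the port-forward from Step 1 is still running. It deliberately turns TLS off, so the cluster should refuse the connection with an error similar to the one Beekeeper Studio showed.</p>
<pre><code class="lang-bash"># Try to connect WITHOUT TLS; a secure cluster should reject this
psql "postgresql://root@localhost:26259/defaultdb?sslmode=disable"
</code></pre>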
<h3 id="heading-connecting-via-mutual-tls-mtls-why-we-need-a-certificate-for-our-root-user">Connecting via Mutual TLS (mTLS) — Why We Need a Certificate for Our <code>root</code> User</h3>
<p>Now that our CockroachDB cluster is officially running in secure mode, we can’t just connect to it with a username and port anymore. CockroachDB won’t accept that.</p>
<p>To talk to it, <strong>we must connect using Mutual TLS (mTLS)</strong>.</p>
<p>Why? Because standard TLS encrypts the traffic but only verifies identity in one direction (you verify the server). mTLS verifies identity in both directions (you verify the server, and the server also verifies <em>you</em>).</p>
<p>Let’s break this down in simple, everyday English 👇🏾</p>
<h4 id="heading-why-tls-exists-in-the-first-place">Why TLS Exists in the First Place</h4>
<p>Whenever you send anything to CockroachDB, like a query, a connection, a password, whatever, it’s all data moving over a network – for example, the internet.</p>
<p>Without protection, anyone on that network could intercept and read the data on its way to your DB.<br>TLS fixes that :)</p>
<p>✔️ The CockroachDB cluster has its own <strong>public key + private key</strong><br>✔️ It has a <strong>certificate</strong> that carries its public key<br>✔️ When you connect, the cluster sends you this certificate<br>✔️ Your database tool, for example Beekeeper, checks the certificate and uses the public key during the TLS handshake to agree on session keys that encrypt all your traffic<br>✔️ Only the holder of the matching private key (CockroachDB) can complete that handshake, so nobody else can read the traffic</p>
<p>This gives you encryption and proof you’re really talking to CockroachDB, not some fake service pretending to be it.</p>
<h4 id="heading-why-mtls-exists-mutual-tls">Why mTLS Exists (Mutual TLS)</h4>
<p>TLS only verifies the server – CockroachDB. mTLS verifies <strong>both sides</strong> – you and CockroachDB.</p>
<p>So CockroachDB also wants YOU to send your certificate.</p>
<p>But not just any certificate. Your certificate must be:</p>
<ul>
<li><p>Signed by <strong>THE SAME Certificate Authority (CA)</strong></p>
</li>
<li><p>Trusted by the CockroachDB cluster</p>
</li>
<li><p>Mapped to a CockroachDB user (like <code>root</code>)</p>
</li>
</ul>
<p>This is how CockroachDB says:</p>
<blockquote>
<p>“Let me see your certificate so I know you’re someone I should allow in.”</p>
</blockquote>
<p>And we reply:</p>
<blockquote>
<p>“Here is my certificate, signed by the same CA that signed yours.”</p>
</blockquote>
<p>At that point, both sides trust each other.</p>
<p>If this still feels abstract, <a target="_blank" href="https://www.youtube.com/watch?v=EnY6fSng3Ew">watch this video</a>. It explains TLS beautifully.</p>
<h3 id="heading-lets-explore-our-clusters-certificate">Let’s Explore Our Cluster’s Certificate</h3>
<p>Remember that the Helm chart automatically created:</p>
<ul>
<li><p>The CockroachDB Certificate Authority</p>
</li>
<li><p>The CockroachDB node certificates</p>
</li>
<li><p>The keypairs used for encryption</p>
</li>
</ul>
<p>You can list the Kubernetes secrets in the current namespace, including the CockroachDB-related ones, with:</p>
<pre><code class="lang-bash">kubectl get secrets
</code></pre>
<p>The one we're interested in is:</p>
<pre><code class="lang-bash">crdb-cockroachdb-node-secret
</code></pre>
<p>If you inspect this secret, you’ll see three keys inside:</p>
<ul>
<li><p><code>ca.crt</code>: the CA’s public certificate</p>
</li>
<li><p><code>tls.key</code>: the CockroachDB node’s private key</p>
</li>
<li><p><code>tls.crt</code>: the CockroachDB node certificate</p>
</li>
</ul>
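<p>One safe way to inspect the secret is <code>kubectl describe</code>, which lists the key names and their sizes without printing the private key itself:</p>
<pre><code class="lang-bash"># Show the keys stored in the secret (values are not printed, only byte sizes)
kubectl describe secret crdb-cockroachdb-node-secret
</code></pre>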
<p>Now let’s decode the CockroachDB node certificate.</p>
<p>Run this:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-node-secret -o jsonpath=<span class="hljs-string">'{.data.tls\.crt}'</span> | base64 -d &gt; crdb-node.crt
</code></pre>
<p>This gives you the raw certificate (which looks like gibberish):</p>
<pre><code class="lang-bash">-----BEGIN CERTIFICATE-----
MIIEGDCCAwCgAwIBAgIQWgOPJa4OLoZZjcXLgDF3bjANBgkqhkiG9w0BAQsFADAr
...
-----END CERTIFICATE-----
</code></pre>
<p>Let’s decode it into something readable:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> ./crdb-node.crt -text -noout &gt; crdb-node.crt.decoded
</code></pre>
<p>Open the <code>crdb-node.crt.decoded</code> file. This is the <strong>human-readable</strong> version of the CockroachDB node certificate.</p>
<p><strong>N.B.:</strong> You need to have the <code>openssl</code> tool installed in order to be able to make the certificate human-readable. If you don’t, <a target="_blank" href="https://github.com/openssl/openssl#download">install it following this tutorial</a>.</p>
<h3 id="heading-understanding-the-certificate-sections-explained-super-simply">Understanding the Certificate Sections (Explained Super Simply)</h3>
<h4 id="heading-1-issuer">1. Issuer</h4>
<p>You’ll see something like:</p>
<pre><code class="lang-bash">Issuer: O = Cockroach, CN = Cockroach CA
</code></pre>
<p>This tells us:</p>
<ul>
<li><p>The certificate was signed by a Certificate Authority created by the Helm chart</p>
</li>
<li><p>The <strong>Organization (O)</strong> is “Cockroach”</p>
</li>
<li><p>The <strong>Common Name (CN)</strong> is “Cockroach CA”</p>
</li>
</ul>
<p>This basically means:</p>
<blockquote>
<p>“This certificate comes from the CockroachDB internal CA.”</p>
</blockquote>
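<p>If you’d like to confirm the signature rather than just read the issuer name, you can check the node certificate against the CA certificate stored in the same secret. This is a sketch that assumes you already saved <code>crdb-node.crt</code> with the earlier command. The file name <code>crdb-ca.crt</code> is just a name chosen for this example.</p>
<pre><code class="lang-bash"># Extract the CA's public certificate from the node secret
kubectl get secret crdb-cockroachdb-node-secret -o jsonpath='{.data.ca\.crt}' | base64 -d &gt; crdb-ca.crt

# Verify that the node certificate was signed by that CA
openssl verify -CAfile crdb-ca.crt crdb-node.crt
</code></pre>
<p>If the signature checks out, <code>openssl</code> should report the file as <code>OK</code>.</p>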
<h4 id="heading-2-subject">2. Subject</h4>
<p>You’ll also see this:</p>
<pre><code class="lang-bash">Subject: O = Cockroach, CN = node
</code></pre>
<p>What does this mean?</p>
<p><strong>Organization = Cockroach</strong></p>
<ul>
<li><p>This simply groups all CockroachDB-generated certificates under one “organization label.”</p>
</li>
<li><p>It doesn’t refer to the company. It’s just a logical grouping created by CockroachDB’s built-in toolset.</p>
</li>
</ul>
<p><strong>Common Name = node</strong></p>
<ul>
<li><p>This tells CockroachDB that this certificate belongs to a <strong>cluster node</strong>, not a user or a client machine.</p>
</li>
<li><p>In CockroachDB, node certificates are used for:</p>
<ol>
<li><p>DB-to-DB communication</p>
</li>
<li><p>cluster gossip</p>
</li>
<li><p>handling incoming connections from clients (you)</p>
</li>
</ol>
</li>
</ul>
<p>So this certificate is saying:</p>
<blockquote>
<p>“Hi, I’m a CockroachDB node. Please trust me as part of the cluster.”</p>
</blockquote>
<h4 id="heading-3-extended-key-usage-eku">3. Extended Key Usage (EKU)</h4>
<p>Scroll down and you’ll see:</p>
<pre><code class="lang-bash">X509v3 Extended Key Usage:
    TLS Web Server Authentication
    TLS Web Client Authentication
</code></pre>
<p>This is <em>super important</em>, because it defines <strong>how</strong> this certificate is allowed to be used.</p>
<p>Let’s simplify it:</p>
<h4 id="heading-tls-web-server-authentication">TLS Web Server Authentication</h4>
<p>This means:</p>
<blockquote>
<p>“This certificate can be presented <strong>by a server</strong> to prove its identity.”</p>
</blockquote>
<p>In our case, the CockroachDB node uses this certificate to prove to you (the client) that it is the real CockroachDB server. Think of it like the server showing you its ID card before you hand over any data.</p>
<h4 id="heading-tls-web-client-authentication">TLS Web Client Authentication</h4>
<p>This means:</p>
<blockquote>
<p>“This certificate can also be used <strong>as a client certificate</strong>.”</p>
</blockquote>
<p>Why would a server have a client certificate? Well, because in CockroachDB, nodes (DBs) talk to each other. When node A connects to node B, node A is a <strong>client</strong>, and node B is a <strong>server</strong>.</p>
<p>So the same certificate serves two roles. Your local machine will use a different certificate, created specifically for your <code>root</code> user. We’ll generate that soon.</p>
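<p>If you don’t want to scroll through the whole decoded file, <code>openssl</code> can print just the fields discussed above. This sketch assumes the <code>crdb-node.crt</code> file from earlier. The <code>-ext</code> option needs OpenSSL 1.1.1 or newer.</p>
<pre><code class="lang-bash"># Print only the issuer, subject, and validity dates
openssl x509 -in ./crdb-node.crt -noout -issuer -subject -dates

# Print only the Extended Key Usage extension
openssl x509 -in ./crdb-node.crt -noout -ext extendedKeyUsage
</code></pre>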
<h3 id="heading-creating-a-client-certificate-so-we-can-finally-connect-to-cockroachdb">Creating a Client Certificate (So We Can Finally Connect to CockroachDB)</h3>
<p>Now that we’ve seen how the CockroachDB node certificate works, let’s generate our client certificate – the one we’ll use to connect from Beekeeper Studio.</p>
<p>Remember: CockroachDB is running in secure mode, so it won’t accept any connection that doesn’t come with a valid, signed certificate.</p>
<p>To fix that, let’s build a tiny Kubernetes pod whose only job is to create a certificate for our <code>root</code> SQL user.</p>
<h4 id="heading-step-1-create-a-file-called-gen-root-certyml">Step 1: Create a File Called <code>gen-root-cert.yml</code></h4>
<p>Paste this into it:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gen-root-cert</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gen</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sh"</span>, <span class="hljs-string">"-ec"</span>]
      <span class="hljs-attr">args:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">|
          mkdir -p /out
</span>
          <span class="hljs-comment"># Copy the CockroachDB cluster Certificate Authority certificate file `ca.crt` (for Mutual TLS authentication)</span>
          <span class="hljs-string">cp</span> <span class="hljs-string">/ca/ca.crt</span> <span class="hljs-string">/out/ca.crt</span>

          <span class="hljs-comment"># Create the client certificate and key pair for the SQL user 'root' using the CockroachDB cluster Certificate Authority private key `ca.key`</span>
          <span class="hljs-string">/cockroach/cockroach</span> <span class="hljs-string">cert</span> <span class="hljs-string">create-client</span> <span class="hljs-string">root</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--certs-dir=/out</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--ca-key=/ca/ca.key</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--lifetime=5h</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--overwrite</span>

          <span class="hljs-comment"># List the generated files</span>
          <span class="hljs-string">ls</span> <span class="hljs-string">-al</span> <span class="hljs-string">/out</span>

          <span class="hljs-comment"># Keep the pod alive so we can kubectl cp the files</span>
          <span class="hljs-string">sleep</span> <span class="hljs-number">3600</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> { <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>, <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/ca</span>, <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span> }
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"500Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
</code></pre>
<p>So how does this work?</p>
<p>Among the secrets the Helm chart created is one for the Certificate Authority: <code>crdb-cockroachdb-ca-secret</code>.</p>
<p>This secret contains:</p>
<ul>
<li><p>The Certificate Authority public certificate</p>
</li>
<li><p>The private key (used for signing)</p>
</li>
<li><p>The CA metadata</p>
</li>
</ul>
<p>CockroachDB requires that the server certificate (node cert) and the client certificate (your root cert) be signed by <strong>THE SAME CA</strong>, because this ensures both sides trust each other.</p>
<p>So what do we do?</p>
<p>We mount the CA secret into the pod:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
    <span class="hljs-attr">secret:</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
</code></pre>
<p>This gives the pod access to:</p>
<ul>
<li><p><code>/ca/ca.crt</code>: CA public certificate</p>
</li>
<li><p><code>/ca/ca.key</code>: CA <em>private</em> key</p>
</li>
</ul>
<p>And with these, we can sign new client certificates inside the cluster.</p>
<p>The important command inside the pod:</p>
<pre><code class="lang-bash">/cockroach/cockroach cert create-client root \
  --certs-dir=/out \
  --ca-key=/ca/ca.key \
  --lifetime=5h \
  --overwrite
</code></pre>
<p>What this does:</p>
<ul>
<li><p>Generates a brand new public/private key pair for the <code>root</code> SQL user</p>
</li>
<li><p>Uses the CA private key to <strong>sign the client certificate</strong></p>
</li>
<li><p>Places everything inside <code>/out</code></p>
</li>
<li><p>Makes the certificate valid for <strong>5 hours</strong></p>
</li>
</ul>
<p>If we passed <code>demo</code> instead of <code>root</code>, then the certificate CN would be <code>demo</code>, and CockroachDB would treat anyone using that certificate as the <code>demo</code> SQL user.</p>
<p>That’s how CockroachDB identifies and authenticates users when running in secure mode.</p>
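<p>To see the "same CA" and "CN = user" ideas in isolation, here's a minimal sketch that uses plain <code>openssl</code> instead of the <code>cockroach</code> binary. It assumes <code>openssl</code> is installed on your machine. The throwaway CA, the temporary directory, and the <code>demo</code> user are made up for illustration, and this is not what <code>cockroach cert create-client</code> does internally (the sketch skips extensions such as Extended Key Usage).</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Sketch only: a throwaway CA and client certificate built with plain openssl.
set -eu

workdir="$(mktemp -d)"

# 1. Self-signed throwaway CA (stands in for the cluster CA).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/O=Cockroach/CN=Cockroach CA" \
  -keyout "$workdir/ca.key" -out "$workdir/ca.crt"

# 2. Key pair and signing request for a hypothetical SQL user named "demo".
openssl req -new -newkey rsa:2048 -nodes \
  -subj "/O=Cockroach/CN=demo" \
  -keyout "$workdir/client.demo.key" -out "$workdir/client.demo.csr"

# 3. Sign the request with the CA private key.
openssl x509 -req -in "$workdir/client.demo.csr" \
  -CA "$workdir/ca.crt" -CAkey "$workdir/ca.key" -CAcreateserial \
  -days 1 -out "$workdir/client.demo.crt"

# 4. Show who the certificate identifies, who signed it, and that it verifies.
openssl x509 -in "$workdir/client.demo.crt" -noout -subject -issuer
openssl verify -CAfile "$workdir/ca.crt" "$workdir/client.demo.crt"
</code></pre>
<p>The last two commands should show a subject whose CN is <code>demo</code>, an issuer whose CN is <code>Cockroach CA</code>, and a successful verification against <code>ca.crt</code>. A certificate signed by any other CA would fail that final check.</p>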
<h4 id="heading-step-2-deploy-the-pod">Step 2: Deploy the Pod</h4>
<p>Run:</p>
<pre><code class="lang-bash">kubectl apply -f gen-root-cert.yml
</code></pre>
<p>Give it a minute to start and generate the files.</p>
<h4 id="heading-step-3-copy-the-certificates-to-your-local-pc">Step 3: Copy the Certificates to Your Local PC</h4>
<p>We need three files:</p>
<ul>
<li><p><code>client.root.crt</code>: client certificate</p>
</li>
<li><p><code>client.root.key</code>: private key</p>
</li>
<li><p><code>ca.crt</code>: CA certificate</p>
</li>
</ul>
<p>Copy them from the pod to your machine:</p>
<pre><code class="lang-bash">kubectl cp default/gen-root-cert:/out/client.root.crt ./client.root.crt
kubectl cp default/gen-root-cert:/out/client.root.key ./client.root.key
kubectl cp default/gen-root-cert:/out/ca.crt             ./ca.crt
</code></pre>
<p>Now your folder should contain:</p>
<pre><code class="lang-bash">client.root.crt
client.root.key
ca.crt
</code></pre>
<p>These are the files Beekeeper Studio needs for mTLS.</p>
<h4 id="heading-step-4-decode-the-client-certificate-just-like-we-did-for-the-node-certificate">Step 4: Decode the Client Certificate (Just Like We Did for the Node Certificate)</h4>
<p>Run:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> client.root.crt -text -noout &gt; crdb-root.crt.decoded
</code></pre>
<p>Open the <code>crdb-root.crt.decoded</code> file and look at the contents.</p>
<h4 id="heading-understanding-the-client-certificate">Understanding the Client Certificate</h4>
<ol>
<li><strong>Issuer</strong></li>
</ol>
<p>You'll see <code>Issuer: O = Cockroach, CN = Cockroach CA</code></p>
<p>This is the same Issuer as the CockroachDB node certificate.</p>
<p>This confirms that both certificates were signed by the <em>same</em> Certificate Authority, so each side can verify the other and mTLS will work.</p>
<ol start="2">
<li><strong>Subject</strong></li>
</ol>
<p>You’ll see: <code>Subject: O = Cockroach, CN = root</code></p>
<p>The Organization (<code>O</code>) is just a label grouping CockroachDB identities. The Common Name (<code>CN</code>) is <code>root</code>, and this is VERY important.</p>
<p>The CN of a client certificate literally tells CockroachDB:</p>
<blockquote>
<p>“This connection belongs to the SQL user named <code>root</code>.”</p>
</blockquote>
<p>If the CN were <code>demo</code>, CockroachDB would authenticate you as the <code>demo</code> SQL user.</p>
<h4 id="heading-extended-key-usage-eku">Extended Key Usage (EKU)</h4>
<p>You should see: <code>TLS Web Client Authentication</code>.</p>
<p>This is exactly what we want. It tells CockroachDB:</p>
<blockquote>
<p>“This certificate is only for clients connecting to the database.”</p>
</blockquote>
<p>Unlike node certificates, you will NOT see: <code>TLS Web Server Authentication</code>.</p>
<p>Why?</p>
<p>Because:</p>
<ul>
<li><p><strong>Server Authentication</strong> = for certificates the SERVER SHOWS TO THE CLIENT. For example: CockroachDB nodes proving they are legitimate.</p>
</li>
<li><p><strong>Client Authentication</strong> = for certificates THE CLIENT SENDS TO THE SERVER. For example: You proving you are the real <code>root</code> user.</p>
</li>
</ul>
<h4 id="heading-why-your-client-certificate-cannot-be-used-as-a-server-certificate">Why your client certificate <strong>cannot</strong> be used as a server certificate</h4>
<p>Because a server certificate says:</p>
<blockquote>
<p>“Trust me, I AM the CockroachDB server.”</p>
</blockquote>
<p>But your client certificate says:</p>
<blockquote>
<p>“Trust me, I am an authenticated user.”</p>
</blockquote>
<p>Two very different identities. And CockroachDB will <em>reject</em> any certificate used in the wrong role.</p>
<p>So having only TLS Web Client Authentication in your certificate is perfect for our use case. :)</p>
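<p>If you'd rather not scroll through the full decoded file, the three fields discussed above can be pulled out directly. This is a small sketch, assuming <code>openssl</code> and <code>grep</code> are available. The <code>cert_summary</code> function name is a hypothetical helper, not part of any tool.</p>
<pre><code class="lang-bash"># Hypothetical helper: print the Subject, Issuer, and Extended Key Usage
# of any PEM-encoded certificate.
cert_summary() {
  local cert="$1"
  # Subject (who the certificate identifies) and Issuer (which CA signed it).
  openssl x509 -in "$cert" -noout -subject -issuer
  # Extended Key Usage and the line after it, if the extension is present.
  openssl x509 -in "$cert" -noout -text | grep -A1 "Extended Key Usage" || true
}
</code></pre>
<p>For example, running <code>cert_summary client.root.crt</code> in the folder where you copied the files should print the same Subject, Issuer, and EKU values described in this section.</p>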
<h3 id="heading-connecting-to-our-cockroachdb-cluster-securely-using-mtls">Connecting to Our CockroachDB Cluster Securely (Using mTLS)</h3>
<p>Now that we’ve successfully generated the certificates and key pairs we need, it's time to use them to securely connect to our CockroachDB cluster from Beekeeper Studio.</p>
<p>Remember: CockroachDB is running in secure mode, so it will <em>reject any connection that doesn't use TLS</em>, even if you enter the correct username and password. To sign in as <code>root</code>, we'll authenticate with the client certificate we just generated.</p>
<p>Let’s walk through the steps.👇🏾</p>
<h4 id="heading-step-1-make-sure-port-forwarding-is-still-running">Step 1: Make Sure Port Forwarding Is Still Running</h4>
<p>Before connecting, ensure that your CockroachDB cluster is still exposed to your PC.</p>
<p>If you already closed the previous terminal window, simply re-run this:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26259:26257
</code></pre>
<p>This makes your CockroachDB cluster reachable at <code>localhost:26259</code>. If the port-forward isn’t running, <em>Beekeeper Studio will not be able to connect</em>.</p>
<h4 id="heading-step-2-open-beekeeper-studio-and-set-up-the-connection">Step 2: Open Beekeeper Studio and Set Up the Connection</h4>
<p>Launch Beekeeper Studio and open a fresh connection window (Ctrl + Shift + N if needed).</p>
<p>Now fill in the fields like this:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Connection Type</strong></td><td>CockroachDB</td></tr>
<tr>
<td><strong>Host</strong></td><td><code>localhost</code></td></tr>
<tr>
<td><strong>Port</strong></td><td><code>26259</code></td></tr>
<tr>
<td><strong>User</strong></td><td><code>root</code></td></tr>
<tr>
<td><strong>Default Database</strong></td><td><code>defaultdb</code></td></tr>
</tbody>
</table>
</div><p>Now enable the <strong>“Enable SSL”</strong> option. Once enabled, expand the SSL section and set the following three fields:</p>
<ul>
<li><p><strong>CA Cert:</strong> Set this to the location of: <code>ca.crt</code>. This is the root Certificate Authority file you copied earlier using: <code>kubectl cp default/gen-root-cert:/out/ca.crt ./ca.crt</code>. It should still be in your project’s root directory (for example, <code>cockroachdb-tutorial/</code>).</p>
</li>
<li><p><strong>Certificate:</strong> Set this to the location of: <code>client.root.crt</code></p>
</li>
<li><p><strong>Key File:</strong> Set this to the location of: <code>client.root.key</code></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763389469459/bbdb17c5-1c3b-4163-932f-3cd5382160f4.png" alt="Connecting to the CockroachDB cluster from Beekeeper Studio in &quot;Secure&quot; mode" class="image--center mx-auto" width="512" height="400" loading="lazy"></p>
<h4 id="heading-step-3-click-connect">Step 3: Click “Connect”</h4>
<p>Once all the fields are set properly, click <strong>Connect</strong>.</p>
<p>If everything was done correctly, you should now be connected to your CockroachDB cluster securely over Mutual TLS.</p>
<p>If the connection fails:</p>
<ul>
<li><p>Double-check your certificate paths</p>
</li>
<li><p>Ensure port-forwarding is running</p>
</li>
<li><p>Verify the user is <code>root</code></p>
</li>
<li><p>Confirm the selected connection type is <code>CockroachDB</code>.</p>
</li>
</ul>
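<p>For the first item on that checklist, a quick script can confirm the three files are where you expect them. This is a minimal sketch. The <code>check_client_certs</code> function is a hypothetical helper, and it only checks that the files exist, not that their contents are valid.</p>
<pre><code class="lang-bash"># Hypothetical helper: confirm that ca.crt, client.USER.crt, and client.USER.key
# exist in a directory. Returns non-zero if any of them is missing.
check_client_certs() {
  local dir="$1" user="$2" missing=0 f
  for f in ca.crt "client.$user.crt" "client.$user.key"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
</code></pre>
<p>Running <code>check_client_certs . root</code> from your project folder prints nothing when all three files are present, and lists any file that's missing otherwise.</p>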
<h4 id="heading-step-4-run-your-first-secure-query">Step 4: Run Your First Secure Query</h4>
<p>Now that you're connected, let’s verify everything works by running:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> <span class="hljs-keyword">users</span>;
</code></pre>
<p>You should see two users automatically created by CockroachDB:</p>
<ul>
<li><p><strong>admin</strong></p>
</li>
<li><p><strong>root</strong></p>
</li>
</ul>
<p>In a later subsection, we’ll create a <strong>new SQL user</strong> and generate a certificate for that user (just like we did for the <code>root</code> user) so you’ll understand how CockroachDB handles user authentication in production environments.</p>
<h3 id="heading-restoring-our-previous-database-into-the-new-gke-cockroachdb-cluster-without-sa-keys">Restoring Our Previous Database into the New GKE CockroachDB Cluster (without SA keys)</h3>
<p>Now that our CockroachDB cluster is up and running on GKE – fully secured with TLS encryption and mTLS authentication – it’s time to bring back the data from our previous setup.</p>
<p>Remember how we backed up our CockroachDB database (running on Minikube) to Google Cloud Storage?</p>
<p>Well, now we’re going to restore that same backup into our new production cluster on GKE. But before CockroachDB can access our bucket, we must give it permission – securely.</p>
<p>And here’s the cool part: <strong>we don’t need to use Service Account keys anymore.</strong></p>
<h4 id="heading-why-we-dont-need-service-account-keys-on-gke">Why We Don’t Need Service Account Keys on GKE</h4>
<p>Earlier, in the backup section, we generated a Service Account key on our PC and mounted it into our Minikube cluster.</p>
<p>But for GKE, we intentionally left out the following fields in our <code>cockroachdb-production.yml</code>:</p>
<ul>
<li><p><code>env</code></p>
</li>
<li><p><code>volumes</code></p>
</li>
<li><p><code>volumeMounts</code></p>
</li>
</ul>
<p>The reason? GKE supports something called <strong>Workload Identity</strong>.</p>
<p>Workload Identity lets us securely connect Kubernetes Service Accounts (KSAs) to Google Cloud Service Accounts (GSAs), without storing or mounting any secret keys. The authentication happens “implicitly” thanks to Google’s metadata server.</p>
<p>💡 Workload Identity works easily when your cluster is running on GKE. It’s more complex to set up on Minikube, Kind, EKS, AKS, or any other non-GKE cluster.</p>
<h4 id="heading-step-1-linking-the-google-service-account-to-our-kubernetes-service-account">Step 1: Linking the Google Service Account to Our Kubernetes Service Account</h4>
<p>We already touched on this when deploying our cluster, but let’s look at the specific line again.</p>
<p>Open your <code>cockroachdb-production.yml</code> Helm values file and scroll to the <code>serviceAccount</code> section. You should see something like this:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">serviceAccount:</span>
    <span class="hljs-attr">create:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"crdb-cockroachdb"</span>
    <span class="hljs-attr">annotations:</span>
      <span class="hljs-attr">iam.gke.io/gcp-service-account:</span> <span class="hljs-string">cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Replace the <code>&lt;PROJECT_ID&gt;</code> placeholder with your real Google Cloud project ID.</p>
<p>If you’re unsure of the ID, go to Google Cloud Console, then to IAM &amp; Admin, and finally to Service Accounts. Search for <code>cockroachdb-backup</code> and copy the project ID from there.</p>
<p>This annotation instructs GKE to automatically authenticate our CockroachDB pods as the <code>cockroachdb-backup</code> Google Service Account – no keys needed.</p>
<h4 id="heading-step-2-binding-ksa-gsa-using-workload-identity">Step 2: Binding KSA ↔️ GSA Using Workload Identity</h4>
<p>Annotating the Service Account isn’t enough. We still need to explicitly allow our KSA to “impersonate” the GSA.</p>
<p>Run this command to set the active project:</p>
<pre><code class="lang-bash">gcloud config <span class="hljs-built_in">set</span> project &lt;PROJECT_ID&gt;
</code></pre>
<p>Now, apply the IAM policy binding:</p>
<pre><code class="lang-bash">gcloud iam service-accounts add-iam-policy-binding \
  &lt;GOOGLE_SERVICE_ACCOUNT&gt; \
  --role roles/iam.workloadIdentityUser \
  --member <span class="hljs-string">"serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[&lt;NAMESPACE&gt;/&lt;KUBERNETES_SERVICE_ACCOUNT&gt;]"</span>
</code></pre>
<p>Replace the placeholders with:</p>
<ul>
<li><p><code>&lt;GOOGLE_SERVICE_ACCOUNT&gt;</code> with <code>cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</code></p>
</li>
<li><p><code>&lt;PROJECT_ID&gt;</code> with your GCP project ID</p>
</li>
<li><p><code>&lt;NAMESPACE&gt;</code> with where CockroachDB runs (<code>default</code>)</p>
</li>
<li><p><code>&lt;KUBERNETES_SERVICE_ACCOUNT&gt;</code> with <code>crdb-cockroachdb</code></p>
</li>
</ul>
<p>After a few seconds, you should see something like:</p>
<pre><code class="lang-yaml"><span class="hljs-string">Updated</span> <span class="hljs-string">IAM</span> <span class="hljs-string">policy</span> <span class="hljs-string">for</span> <span class="hljs-string">serviceAccount</span> [<span class="hljs-string">cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</span>]<span class="hljs-string">.</span>
<span class="hljs-attr">bindings:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">members:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[default/crdb-cockroachdb]</span>
  <span class="hljs-attr">role:</span> <span class="hljs-string">roles/iam.workloadIdentityUser</span>
<span class="hljs-attr">etag:</span> <span class="hljs-string">***</span>
<span class="hljs-attr">version:</span> <span class="hljs-number">1</span>
</code></pre>
<p>Perfect. Your KSA can now act as the <code>cockroachdb-backup</code> GSA, so CockroachDB can access Google Cloud Storage without any keys.</p>
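<p>The <code>--member</code> value used in the binding is easy to mistype, so here's a small sketch that assembles it from its three parts. The <code>wi_member</code> function is a hypothetical helper, and <code>my-project</code> is a placeholder project ID, not a real one.</p>
<pre><code class="lang-bash"># Hypothetical helper: build the Workload Identity member string from
# a project ID, a namespace, and a Kubernetes Service Account name.
wi_member() {
  local project_id="$1" namespace="$2" ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project_id" "$namespace" "$ksa"
}

# Example with a placeholder project ID ("my-project" is an assumption).
wi_member "my-project" "default" "crdb-cockroachdb"
</code></pre>
<p>This prints <code>serviceAccount:my-project.svc.id.goog[default/crdb-cockroachdb]</code>, which has the same shape as the member shown in the <code>gcloud</code> output. You could pass it to the binding command as <code>--member "$(wi_member my-project default crdb-cockroachdb)"</code>.</p>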
<h3 id="heading-restoring-our-previous-database-from-google-cloud-storage">Restoring Our Previous Database from Google Cloud Storage</h3>
<p>Now that authentication is set up, let’s restore the backup we previously created in the Minikube cluster.</p>
<p>Open Beekeeper Studio and reconnect to your CockroachDB cluster (the one running on GKE).</p>
<p>Before restoring anything, let’s check if the <code>books</code> table exists:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>You should see an error saying the table doesn’t exist. Don’t worry, that’s expected.</p>
<h3 id="heading-now-lets-restore-the-data">Now, Let’s Restore the Data 🎉</h3>
<p>Run this command:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">RESTORE</span> <span class="hljs-keyword">FROM</span> LATEST <span class="hljs-keyword">IN</span> <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>;
</code></pre>
<p>Replace <code>&lt;BUCKET_NAME&gt;</code> with the name of the bucket you created earlier (for example: <code>cockroachdb-backup-7gw8u</code>).</p>
<p>CockroachDB will now:</p>
<ul>
<li><p>Authenticate using Workload Identity</p>
</li>
<li><p>Find the latest backup inside your bucket</p>
</li>
<li><p>Restore all tables, schemas, and data into your new GKE cluster</p>
</li>
</ul>
<p>After a couple of minutes, you should get a Success message.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763393752870/f95d76c0-3722-491a-a97c-a1b8a79bdc79.png" alt="Successfully restored CockroachDB database" class="image--center mx-auto" width="587" height="268" loading="lazy"></p>
<p>Now, run the query again:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! Your books from the Minikube cluster should now appear inside the new CockroachDB cluster running on GKE 😃.</p>
<h3 id="heading-connecting-to-the-database-with-a-new-user">Connecting to the Database with a New User</h3>
<p>So far, we’ve been connecting to our CockroachDB cluster using the <code>root</code> user. While this is super convenient for tutorials, it’s not recommended for real apps.</p>
<p>This is because the <code>root</code> user has advanced privileges – basically, full access to your entire cluster. If an attacker got hold of these credentials, or your application was compromised, they could do <strong>A LOT</strong> of damage. 😬</p>
<p>Instead, it’s best practice to create a user with <strong>limited permissions</strong> for your apps. This way, even if the user is compromised, the damage is contained.</p>
<h4 id="heading-authentication-options-for-users">Authentication Options for Users</h4>
<p>CockroachDB is flexible when it comes to authentication:</p>
<ol>
<li><p><strong>Password Authentication:</strong> Create a user with a password and connect using just username + password (no client certificates required).</p>
</li>
<li><p><strong>Passwordless / Mutual TLS Authentication:</strong> Create a user without a password, then connect using client certificates signed by the same CA (like we did for <code>root</code>).</p>
</li>
<li><p><strong>Both Password + Mutual TLS:</strong> Create a user with a password and also connect using client certificates. This adds an extra layer of security.</p>
</li>
</ol>
<p>In this subsection, we’ll start simple and use password authentication.</p>
<h4 id="heading-step-1-create-the-new-user">Step 1: Create the New User</h4>
<p>Open your current connection in Beekeeper Studio (signed in as <code>root</code>) and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> password_auth <span class="hljs-keyword">WITH</span> <span class="hljs-keyword">PASSWORD</span> <span class="hljs-string">'supersecret'</span>;
</code></pre>
<p>You should see a message confirming the user was created successfully.</p>
<h4 id="heading-step-2-connect-as-the-new-user">Step 2: Connect as the New User</h4>
<p>Open a new Beekeeper Studio window (Ctrl + Shift + N). <strong>DO NOT</strong> exit/close the old window, as we’ll need it later.</p>
<p>Fill in the connection fields:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Connection Type</strong></td><td>CockroachDB</td></tr>
<tr>
<td><strong>Host</strong></td><td><code>localhost</code></td></tr>
<tr>
<td><strong>Port</strong></td><td><code>26259</code></td></tr>
<tr>
<td><strong>Database</strong></td><td><code>defaultdb</code></td></tr>
<tr>
<td><strong>User</strong></td><td><code>password_auth</code></td></tr>
<tr>
<td><strong>Password</strong></td><td><code>huh</code> (for now, we’ll try a wrong password to see it fail)</td></tr>
</tbody>
</table>
</div><p>Click Connect.</p>
<p>❌ You’ll see an error about SSL connection being required.</p>
<p>Even though we’re connecting with a password instead of certificates, <strong>enabling SSL is still important</strong>. It encrypts the data between Beekeeper Studio and CockroachDB.</p>
<p>Without it, sensitive info like passwords and queries could be intercepted (man-in-the-middle attacks).</p>
<h4 id="heading-step-3-enable-ssl-amp-ca-verification">Step 3: Enable SSL &amp; CA Verification</h4>
<ul>
<li><p>Tick <strong>Enable SSL</strong></p>
</li>
<li><p>Click the <strong>CA Cert</strong> field and select the <code>ca.crt</code> file in your project root (<code>cockroachdb-tutorial/</code>)</p>
</li>
</ul>
<p>This ensures that Beekeeper Studio verifies it’s really talking to our CockroachDB cluster and protects against attackers trying to intercept the connection.</p>
<p>Now, click Connect again.</p>
<p>❌ This time, you’ll see a <strong>Password authentication failed</strong> error, because we intentionally entered the wrong password.</p>
<h4 id="heading-step-4-connect-with-the-correct-password">Step 4: Connect With the Correct Password</h4>
<p>Replace the password with <code>supersecret</code>, then click Connect.</p>
<p>You are now signed in as the <code>password_auth</code> user!</p>
<h4 id="heading-step-5-check-permissions">Step 5: Check Permissions</h4>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>❌ You should see an error stating that <code>password_auth</code> does not have permission to access the <code>books</code> table.</p>
<p>This is expected, as it confirms that our limited-access user can <strong>only access what we explicitly grant it</strong>. Even if this user is compromised, an attacker can’t read or modify the rest of our database.</p>
<h4 id="heading-step-6-granting-access-to-specific-tables">Step 6: Granting Access to Specific Tables</h4>
<p>To allow <code>password_auth</code> to work with the <code>books</code> table, switch back to the <code>root</code> connection Beekeeper Studio window and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">USAGE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">SCHEMA</span> defaultdb.public <span class="hljs-keyword">TO</span> password_auth;
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">SELECT</span>, <span class="hljs-keyword">INSERT</span>, <span class="hljs-keyword">UPDATE</span>, <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">TABLE</span> defaultdb.public.books <span class="hljs-keyword">TO</span> password_auth;
</code></pre>
<p>This gives the user read and write access to the <code>books</code> table only.</p>
<h4 id="heading-step-7-verify-the-new-user-access">Step 7: Verify the New User Access</h4>
<p>Go back to the Beekeeper Studio window where you’re signed in as <code>password_auth</code> and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! You should now see the list of books from your restored database.</p>
<p>Our new user is fully functional with <strong>limited privileges</strong>, making it safe for use in real applications.</p>
<h3 id="heading-connecting-with-passwordless-authentication-mutual-tls">Connecting with Passwordless Authentication (Mutual TLS)</h3>
<p>We’ve already seen how to connect to the database using a user that authenticates with a password, and without any client certificates.</p>
<p>Now, let’s look at the opposite scenario: passwordless authentication via Mutual TLS (mTLS).</p>
<p>This is one of the strongest forms of authentication because instead of a password, the database verifies you using a <strong>cryptographically signed certificate</strong>.</p>
<p>Let’s walk through it.</p>
<h4 id="heading-step-1-create-the-mtlsauth-user">Step 1: Create the <code>mtls_auth</code> User</h4>
<p>Navigate back to the Beekeeper Studio window where you're currently signed in as the <code>root</code> user. Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> mtls_auth;
</code></pre>
<p>You should see a success message confirming that the user has been created.</p>
<p><strong>N.B.:</strong> If this query fails, there’s a good chance your <code>root</code> client certificate has expired. Remember that we set a <strong>5-hour lifetime</strong> when generating it earlier.</p>
<p>If this happens, delete the certificate-generation pod:</p>
<pre><code class="lang-bash">kubectl delete po/gen-root-cert
</code></pre>
<p>Then re-apply the <code>gen-root-cert.yml</code> manifest. Copy the newly generated <code>client.root.crt</code>, <code>client.root.key</code>, and <code>ca.crt</code> back to your PC. Then try creating the user again.</p>
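<p>Before regenerating anything, you can check whether the certificate has really expired. This is a minimal sketch, assuming <code>openssl</code> is installed locally. The <code>cert_still_valid</code> function is a hypothetical helper.</p>
<pre><code class="lang-bash"># Hypothetical helper: print the expiry date of a PEM certificate and
# succeed only if it has not expired yet.
cert_still_valid() {
  local cert="$1"
  # Print the expiry timestamp (a line starting with notAfter=).
  openssl x509 -in "$cert" -noout -enddate
  # -checkend 0 exits non-zero if the certificate has already expired.
  openssl x509 -in "$cert" -noout -checkend 0
}
</code></pre>
<p>Running <code>cert_still_valid client.root.crt</code> shows the <code>notAfter</code> timestamp and returns a non-zero exit code once the 5-hour lifetime has passed, which tells you it's time to regenerate the files.</p>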
<h4 id="heading-step-2-attempt-signing-in-as-mtlsauth-expect-failure">Step 2: Attempt Signing In as <code>mtls_auth</code> (Expect Failure)</h4>
<p>Open a new Beekeeper Studio window (Ctrl + Shift + N).</p>
<p>Try filling in the connection settings using:</p>
<ul>
<li><p>User: <code>mtls_auth</code></p>
</li>
<li><p>SSL enabled</p>
</li>
<li><p>CA Cert: <code>ca.crt</code></p>
</li>
<li><p>Client Cert: <code>client.root.crt</code></p>
</li>
<li><p>Client Key: <code>client.root.key</code></p>
</li>
</ul>
<p>Click Connect.</p>
<p>You’ll see an error message similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763444971964/93f41787-425b-4e36-86da-4b688cef672f.png" alt="Connecting as the mtls_auth user with the wrong certificate and key-pair" class="image--center mx-auto" width="521" height="813" loading="lazy"></p>
<p>Why does this fail?</p>
<ol>
<li><p>The user has no password, so password login is impossible.</p>
</li>
<li><p>You’re using the <em>root</em> certificate, not a certificate belonging to <code>mtls_auth</code>. CockroachDB is strict: each user must authenticate using <em>their own</em> certificate.</p>
</li>
</ol>
<p>So let's fix that by generating a new certificate + key pair for the <code>mtls_auth</code> user.</p>
<h4 id="heading-step-3-create-certificate-key-for-mtlsauth">Step 3: Create Certificate + Key for <code>mtls_auth</code></h4>
<p>Just like we generated certificates for the <code>root</code> user earlier, we’ll do the same for <code>mtls_auth</code>.</p>
<p>Create a new manifest named <code>gen-mtls_auth-cert.yml</code>.</p>
<p>Paste in this content:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gen-mtls-auth-cert</span> 
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span> 
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gen</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sh"</span>, <span class="hljs-string">"-ec"</span>]
      <span class="hljs-attr">args:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">|
          mkdir -p /out
</span>
          <span class="hljs-comment"># Copy the CA certificate</span>
          <span class="hljs-string">cp</span> <span class="hljs-string">/ca/ca.crt</span> <span class="hljs-string">/out/ca.crt</span>

          <span class="hljs-comment"># Create the client certificate and key pair for user 'mtls_auth'</span>
          <span class="hljs-string">/cockroach/cockroach</span> <span class="hljs-string">cert</span> <span class="hljs-string">create-client</span> <span class="hljs-string">mtls_auth</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--certs-dir=/out</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--ca-key=/ca/ca.key</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--lifetime=5h</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--overwrite</span>

          <span class="hljs-comment"># List generated files</span>
          <span class="hljs-string">ls</span> <span class="hljs-string">-al</span> <span class="hljs-string">/out</span>

          <span class="hljs-comment"># Keep pod alive for kubectl cp</span>
          <span class="hljs-string">sleep</span> <span class="hljs-number">3600</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> { <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>, <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/ca</span>, <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span> }
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"500Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
</code></pre>
<p>Apply this file with <code>kubectl apply -f gen-mtls_auth-cert.yml</code>, wait for the pod to start, then copy the generated files:</p>
<pre><code class="lang-bash">kubectl cp default/gen-mtls-auth-cert:/out/client.mtls_auth.crt ./client.mtls_auth.crt 
kubectl cp default/gen-mtls-auth-cert:/out/client.mtls_auth.key ./client.mtls_auth.key
kubectl cp default/gen-mtls-auth-cert:/out/ca.crt ./ca.crt
</code></pre>
<p>Now we have the correct certificate + key pair for our new user.</p>
<h4 id="heading-step-4-connect-as-mtlsauth">Step 4: Connect as <code>mtls_auth</code></h4>
<p>Go back to the new Beekeeper Studio window and update the SSL fields:</p>
<ul>
<li><p><strong>CA Cert:</strong> <code>ca.crt</code></p>
</li>
<li><p><strong>Certificate:</strong> <code>client.mtls_auth.crt</code></p>
</li>
<li><p><strong>Key File:</strong> <code>client.mtls_auth.key</code></p>
</li>
</ul>
<p>Click Connect.</p>
<p>This time, it should succeed instantly.</p>
<h4 id="heading-step-5-inspect-the-certificate">Step 5: Inspect the Certificate</h4>
<p>To understand how CockroachDB links certificates to users, decode the certificate:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> client.mtls_auth.crt -text -noout &gt; client.mtls_auth.crt.decoded
</code></pre>
<p>Open the file, scroll to the Subject field, and you’ll see:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">Subject:</span> <span class="hljs-string">O</span> <span class="hljs-string">=</span> <span class="hljs-string">Cockroach,</span> <span class="hljs-string">CN</span> <span class="hljs-string">=</span> <span class="hljs-string">mtls_auth</span>
<span class="hljs-string">...</span>
</code></pre>
<p>The <code>CN</code> (Common Name) is the username CockroachDB uses to authenticate the session.</p>
<p>This is how CockroachDB knows you’re connecting as the <code>mtls_auth</code> user without any password at all. :)</p>
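<p>If you only want the identity and expiry rather than the full decoded dump, a shorter check along these lines should work. This is a sketch that assumes <code>client.mtls_auth.crt</code> is in your current directory:</p>
<pre><code class="lang-bash"># Print only the Subject (which contains the CN) and the expiry date
openssl x509 -in client.mtls_auth.crt -noout -subject -enddate
</code></pre>
<p>The subject line should contain <code>CN = mtls_auth</code>, though the exact formatting varies between OpenSSL versions.</p>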
<h4 id="heading-step-6-try-reading-the-books-table">Step 6: Try Reading the Books Table</h4>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>❌ You’ll get a permission error, just like we did earlier with the <code>password_auth</code> user.</p>
<p>This is expected because <code>mtls_auth</code> has <em>no</em> privileges yet. Perfect!</p>
<h4 id="heading-step-7-grant-permissions-to-mtlsauth">Step 7: Grant Permissions to <code>mtls_auth</code></h4>
<p>Switch to the Beekeeper Studio window where you're signed in as <code>root</code>, and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">USAGE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">SCHEMA</span> defaultdb.public <span class="hljs-keyword">TO</span> mtls_auth;
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">SELECT</span>, <span class="hljs-keyword">INSERT</span>, <span class="hljs-keyword">UPDATE</span>, <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">TABLE</span> defaultdb.public.books <span class="hljs-keyword">TO</span> mtls_auth;
</code></pre>
<p>You should see a success message.</p>
<p>Now return to the <code>mtls_auth</code> session and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! You should now see your previously restored list of books.</p>
<p>You’ve successfully connected using passwordless, certificate-based authentication and granted controlled permissions to the new user. :)</p>
<h3 id="heading-connecting-via-mutual-tls-mtls-from-our-apps-on-kubernetes">Connecting via Mutual TLS (mTLS) from Our Apps on Kubernetes</h3>
<p>So far, we’ve been connecting to our CockroachDB cluster <em>securely</em> using Beekeeper Studio thanks to our TLS certificates and mTLS authentication.</p>
<p>But…what happens when we have applications running inside our Kubernetes cluster that need to talk to CockroachDB as well?</p>
<p>Exactly: those apps also need to authenticate using client certificates.</p>
<p>And that brings us to a very important point…</p>
<h4 id="heading-why-we-should-not-generate-client-certificates-using-pods-the-dangerous-way">Why We Should <em>Not</em> Generate Client Certificates Using Pods (The Dangerous Way)</h4>
<p>Up until now, we’ve been generating our client certificates using Kubernetes Pods like:</p>
<ul>
<li><p><code>gen-root-cert</code></p>
</li>
<li><p><code>gen-mtls-auth-cert</code></p>
</li>
</ul>
<p>They <em>work</em>, yes…but they’re not safe for production.</p>
<p>Why? Because these jobs <strong>mount our Certificate Authority (CA) key</strong> inside the pod:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This is a <em>big</em> security risk!</p>
<p>If an attacker ever gains access to that pod?</p>
<p>🔥 Your CA key is exposed<br>🔥 They can generate <em>their own trusted certificates</em><br>🔥 They can impersonate ANY client/user, including the <code>root</code> and <code>admin</code> users<br>🔥 They’ll have full access to your CockroachDB cluster</p>
<p>And they’ll keep that access <strong>forever</strong>, until you rotate the CA key (which is painful and disruptive).</p>
<p>This is why CockroachDB strongly advises against mounting CA keys into Pods.</p>
<h4 id="heading-the-right-way-using-cert-manager-recommended-by-cockroachdb">The Right Way: Using Cert Manager (Recommended by CockroachDB)</h4>
<p>CockroachDB’s <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/secure-cockroachdb-kubernetes?filters=helm#deploy-cert-manager-for-mtls">official docs recommend</a> managing client certificates using <strong>cert-manager</strong>.</p>
<p>This is because instead of YOU exposing your CA key inside Pods, cert-manager handles everything <em>internally and securely:</em></p>
<ul>
<li><p>Cert-manager stores and protects your CA key</p>
</li>
<li><p>It generates client certificates for you</p>
</li>
<li><p>It issues private keys <em>without ever exposing your CA key</em></p>
</li>
<li><p>It auto-renews certificates before they expire</p>
</li>
<li><p>And it gives you production-grade certificate lifecycle management</p>
</li>
</ul>
<h4 id="heading-but-wait-dont-we-need-the-ca-key-to-generate-client-certificates">But Wait: Don’t We Need the CA Key to Generate Client Certificates?</h4>
<p>Great question.</p>
<p>Yes, normally you need the CA key to sign client certificates…but <strong>cert-manager takes care of that for us</strong>.</p>
<p>You simply:</p>
<ol>
<li><p>Create an Issuer (or ClusterIssuer)</p>
</li>
<li><p>Tell cert-manager to use your CockroachDB CA</p>
</li>
<li><p>Request a Certificate</p>
</li>
</ol>
<p>Then cert-manager automatically:</p>
<ol>
<li><p>Signs it</p>
</li>
<li><p>Stores it in a Kubernetes Secret (where it’s safe)</p>
</li>
<li><p>Rotates it before expiry</p>
</li>
<li><p>Keeps your CA key completely secure</p>
</li>
</ol>
<p>No more exposing the CA key in Pods. No more writing custom Kubernetes Pods.</p>
<h4 id="heading-certificate-rotation-another-huge-win">Certificate Rotation — Another Huge Win</h4>
<p>Let’s talk about expirations.</p>
<p>Right now:</p>
<ul>
<li><p>The <code>mtls_auth</code> client cert we generated manually has <strong>5 hours</strong> validity</p>
</li>
<li><p>After 5 hours, it expires</p>
</li>
<li><p>Your apps will fail all DB connections</p>
</li>
<li><p>You’d need to regenerate a new certificate manually</p>
</li>
<li><p>Or worse: create a CronJob to regenerate them every 4 hours</p>
</li>
</ul>
<p>This is messy and unsafe.</p>
<p>With cert-manager?</p>
<ul>
<li><p>Certificates are automatically rotated</p>
</li>
<li><p>Renewed before expiration</p>
</li>
<li><p>No downtime</p>
</li>
<li><p>No manual intervention</p>
</li>
<li><p>Apps easily reload the new certificates</p>
</li>
</ul>
<h4 id="heading-alright-lets-install-cert-manager">Alright — Let’s Install Cert Manager</h4>
<p>To start using cert-manager, install it using the Helm chart:</p>
<pre><code class="lang-bash">helm repo add cert-manager https://charts.jetstack.io

helm install cert-manager cert-manager/cert-manager \
  --<span class="hljs-built_in">set</span> crds.enabled=<span class="hljs-literal">true</span> \
  --create-namespace \
  -n cert-manager \
  --version 1.19.1
</code></pre>
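<p>Before moving on, it can help to confirm that cert-manager actually came up. A minimal check, assuming the <code>cert-manager</code> namespace used in the install command above (the 180-second timeout is an arbitrary choice):</p>
<pre><code class="lang-bash"># List the cert-manager pods (controller, webhook, cainjector)
kubectl get pods -n cert-manager

# Block until every Deployment in the namespace reports Available
kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=180s
</code></pre>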
<p>Once cert-manager is installed, we’ll:</p>
<ol>
<li><p>Create a <strong>ClusterIssuer</strong> that uses our CockroachDB CA</p>
</li>
<li><p>Create a <strong>Certificate</strong> for our <code>mtls_auth</code> user</p>
</li>
<li><p>Mount that Certificate into our application Pods</p>
</li>
<li><p>Connect securely to CockroachDB via mTLS from inside Kubernetes</p>
</li>
</ol>
<p>That’s what we’ll walk through next.</p>
<p>Before cert-manager can issue our certificates, it needs an <strong>Issuer</strong>. And before creating an Issuer, we need a secret that contains our CA certificate and CA key using the correct key names.</p>
<h4 id="heading-creating-a-ca-secret-for-the-issuer">Creating a CA Secret for the Issuer</h4>
<p>cert-manager’s <code>Issuer</code> is a bit picky about the secret format. It expects the secret to contain two keys:</p>
<ul>
<li><p><code>tls.crt</code>: the CA certificate</p>
</li>
<li><p><code>tls.key</code>: the CA private key</p>
</li>
</ul>
<p>But the CockroachDB Helm chart automatically generates a secret named <code>crdb-cockroachdb-ca-secret</code>, which uses different key names:</p>
<ul>
<li><p><code>ca.crt</code></p>
</li>
<li><p><code>ca.key</code></p>
</li>
</ul>
<p>So even though this secret contains exactly what we need, cert-manager won’t accept it because the keys are not named the way it expects.</p>
<p>To fix this, we’ll re-create a new secret with the correct key names. First, copy the existing CA files from Kubernetes to your local machine:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-ca-secret -o jsonpath=<span class="hljs-string">'{.data.ca\.crt}'</span> | base64 -d &gt; ca.crt
</code></pre>
<p>If you get a “permission denied” error, simply delete any existing <code>ca.crt</code> file in your project directory and run the command again.</p>
<p>Now copy the key:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-ca-secret -o jsonpath=<span class="hljs-string">'{.data.ca\.key}'</span> | base64 -d &gt; ca.key
</code></pre>
<p>Next, create the properly formatted secret:</p>
<pre><code class="lang-bash">kubectl create secret tls crdb-ca-issuer-secret --cert=ca.crt --key=ca.key
</code></pre>
<p>If you describe it:</p>
<pre><code class="lang-bash">kubectl describe secret crdb-ca-issuer-secret
</code></pre>
<p>You should now see <code>tls.crt</code> and <code>tls.key</code> in the <code>Data</code> section – exactly what cert-manager needs.</p>
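<p>Since the goal of this section is to keep the CA key from leaking, one optional precaution (a suggestion, not something cert-manager requires) is to delete the local copy of <code>ca.key</code> once the secret exists. Keep <code>ca.crt</code>, since clients still need it to verify the server:</p>
<pre><code class="lang-bash"># Confirm the secret type (kubectl create secret tls produces kubernetes.io/tls)
kubectl get secret crdb-ca-issuer-secret -o jsonpath='{.type}{"\n"}'

# Optional: remove the local copy of the CA private key
rm -f ca.key
</code></pre>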
<h4 id="heading-creating-the-issuer">Creating the Issuer</h4>
<p>Now that we have a properly formatted CA secret, we can create the Issuer that cert-manager will use to sign our client certificates.</p>
<p>Create a file called <code>crdb-issuer.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cert-manager.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Issuer</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-issuer</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">ca:</span>
    <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-ca-issuer-secret</span>
</code></pre>
<p>Apply it:</p>
<pre><code class="lang-bash">kubectl apply -f crdb-issuer.yml
</code></pre>
<p>Confirm that it’s ready:</p>
<pre><code class="lang-bash">kubectl get issuer crdb-issuer
</code></pre>
<p>The <code>Ready</code> column should display <code>True</code>.</p>
<h4 id="heading-creating-the-certificate-manifest">Creating the Certificate Manifest</h4>
<p>Now we’ll define a Certificate object. This doesn’t create the client certificate instantly – instead, it tells cert-manager <strong>what kind</strong> of certificate we need. cert-manager then generates and stores the certificate automatically.</p>
<p>Create a file named <code>crdb-mtls_auth-certificate.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cert-manager.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Certificate</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span> <span class="hljs-comment"># Secret that will hold the cert+key</span>
  <span class="hljs-attr">commonName:</span> <span class="hljs-string">mtls_auth</span> <span class="hljs-comment"># MUST match Cockroach SQL role</span>
  <span class="hljs-attr">duration:</span> <span class="hljs-string">24h</span> <span class="hljs-comment"># 1 day</span>
  <span class="hljs-attr">renewBefore:</span> <span class="hljs-string">20h</span> <span class="hljs-comment"># renew 20 hours before expiry (about 4 hours after issuance)</span>
  <span class="hljs-attr">privateKey:</span>
    <span class="hljs-attr">algorithm:</span> <span class="hljs-string">RSA</span>
    <span class="hljs-attr">size:</span> <span class="hljs-number">2048</span>
    <span class="hljs-attr">encoding:</span> <span class="hljs-string">PKCS8</span>
  <span class="hljs-attr">usages:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">client</span> <span class="hljs-string">auth</span> <span class="hljs-comment"># important: client certificate</span>
  <span class="hljs-attr">issuerRef:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-issuer</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Issuer</span>
    <span class="hljs-attr">group:</span> <span class="hljs-string">cert-manager.io</span>
</code></pre>
<p>Let’s look at the important properties so we can understand what the Certificate resource does:</p>
<ul>
<li><p><strong>secretName:</strong> The Kubernetes secret where cert-manager will store the generated certificate, key, and CA certificate. This is where your apps will later mount the certificate files from.</p>
</li>
<li><p><strong>commonName:</strong> Very important! This must match the <strong>CockroachDB SQL user</strong> (<code>mtls_auth</code>), because CockroachDB uses the certificate’s Common Name to identify the connecting user.</p>
</li>
<li><p><strong>duration</strong> and <strong>renewBefore:</strong> <code>duration</code> defines how long the certificate is valid. <code>renewBefore</code> defines how long before expiry cert-manager renews it, so a fresh certificate is always in place before the old one expires (avoiding downtime).</p>
</li>
<li><p><strong>usages:</strong> Tells cert-manager what the certificate is for. <code>client auth</code> marks it as a client-authentication certificate, so it can be used by clients connecting to servers, but not to act as a server.</p>
</li>
<li><p><strong>issuerRef:</strong> Points to the Issuer we created earlier. This tells cert-manager <em>who</em> should sign the certificate.</p>
</li>
</ul>
<p>Apply the manifest:</p>
<pre><code class="lang-bash">kubectl apply -f crdb-mtls_auth-certificate.yml
</code></pre>
<p>After a few seconds, cert-manager will generate the certificate.</p>
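<p>Rather than guessing when it’s done, you can ask cert-manager directly. The sketch below waits for the Certificate to become ready and then prints its expiry and scheduled renewal time. The 120-second timeout is an arbitrary choice:</p>
<pre><code class="lang-bash"># Wait until cert-manager reports the Certificate as Ready
kubectl wait --for=condition=Ready certificate/crdb-mtls-auth-certificate --timeout=120s

# Show when the current certificate expires and when renewal is scheduled
kubectl get certificate crdb-mtls-auth-certificate \
  -o jsonpath='{.status.notAfter}{"\n"}{.status.renewalTime}{"\n"}'
</code></pre>
<p>The renewal time should work out to the expiry time minus the <code>renewBefore</code> value from the manifest.</p>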
<p>Check the secret:</p>
<pre><code class="lang-bash">kubectl get secret crdb-mtls-auth-certificate
</code></pre>
<p>Describe it to view the keys:</p>
<pre><code class="lang-bash">kubectl describe secret crdb-mtls-auth-certificate
</code></pre>
<p>You should see:</p>
<ul>
<li><p><code>tls.crt</code></p>
</li>
<li><p><code>tls.key</code></p>
</li>
<li><p><code>ca.crt</code></p>
</li>
</ul>
<p>These are the files the application will use.</p>
<p>If we copy the content of <code>tls.crt</code> to our local machine and decode it using the <code>openssl x509...</code> command, we’ll see details similar to those in the <code>client.mtls_auth.crt</code> client certificate we generated previously, with the Common Name (CN) being <code>mtls_auth</code>.</p>
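<p>That comparison can also be done without saving any files by hand. As a rough sketch, assuming <code>openssl</code> and <code>base64</code> are installed locally, you can pipe the secret straight into the decoder:</p>
<pre><code class="lang-bash"># Pull tls.crt out of the secret, decode it, and print its identity and validity window
kubectl get secret crdb-mtls-auth-certificate -o jsonpath='{.data.tls\.crt}' \
  | base64 -d \
  | openssl x509 -noout -subject -issuer -dates
</code></pre>
<p>The subject should contain <code>CN = mtls_auth</code>, and the issuer should match the subject of your CockroachDB CA certificate.</p>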
<h4 id="heading-creating-a-pod-that-connects-using-the-client-certificate">Creating a Pod That Connects Using the Client Certificate</h4>
<p>Now let’s create a simple Pod that uses our new client certificate to connect to CockroachDB.</p>
<p>Create a file called <code>books-pod.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-pod</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-certs</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span>
        <span class="hljs-comment"># Make secret files readable by the owner only: 0400 (without this, the Python app will throw an error). However, this is not compulsory for all apps, just the one used in this tutorial :)</span>
        <span class="hljs-attr">defaultMode:</span> <span class="hljs-number">0400</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">prince2006/cockroachdb-tutorial-python-app:new</span>
      <span class="hljs-attr">imagePullPolicy:</span> <span class="hljs-string">Always</span>
      <span class="hljs-attr">env:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DATABASE_URL</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">&gt;-
            postgresql://mtls_auth@crdb-cockroachdb-public.default:26257/defaultdb?sslmode=verify-full&amp;sslrootcert=/crdb-certs/ca.crt&amp;sslcert=/crdb-certs/tls.crt&amp;sslkey=/crdb-certs/tls.key
</span>      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-certs</span>
          <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/crdb-certs</span>
          <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"100Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
</code></pre>
<p>Here’s what’s happening:</p>
<ul>
<li><p>We mount the generated certificate secret into <code>/crdb-certs</code>.</p>
</li>
<li><p>The Python app uses those certificate files (<code>tls.crt</code>, <code>tls.key</code>, <code>ca.crt</code>) to authenticate.</p>
</li>
<li><p>The connection string does <strong>NOT</strong> include a password. CockroachDB authenticates the user entirely via the certificate’s Common Name.</p>
</li>
</ul>
<p>Apply the Pod:</p>
<pre><code class="lang-bash">kubectl apply -f books-pod.yml
</code></pre>
<p>After about a minute, view the logs:</p>
<pre><code class="lang-bash">kubectl logs books-pod
</code></pre>
<p>Or if the Pod already restarted:</p>
<pre><code class="lang-bash">kubectl logs -p books-pod
</code></pre>
<p>You should see a successful connection to CockroachDB using the <code>mtls_auth</code> user, followed by a list of books.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763534354156/60114f7b-ba62-4706-a0b7-7629e20bfaaa.png" alt="List of books from our books-pod logs" class="image--center mx-auto" width="896" height="601" loading="lazy"></p>
<p>If you remove the certificate files or try connecting without them, the app will fail – as expected.</p>
<p><strong>Congratulations!</strong></p>
<p>You’ve officially built a fully secure, production-ready CockroachDB cluster on Kubernetes – complete with:</p>
<ul>
<li><p>End-to-end encryption (TLS)</p>
</li>
<li><p>Mutual TLS authentication (mTLS) for users and apps</p>
</li>
<li><p>Automated, daily backups to Google Cloud Storage</p>
</li>
<li><p>Proper certificate rotation with cert-manager</p>
</li>
</ul>
<h2 id="heading-how-to-get-a-cockroachdb-enterprise-license-for-free">How to Get a CockroachDB Enterprise License for Free</h2>
<p>Okay, so here’s a thing: even though you’ve built a super professional CockroachDB cluster, there’s one small catch: <strong>without a license, your cluster might be “throttled.”</strong></p>
<p>We know that because, when we access our dashboard, we get a message concerning our cluster getting throttled.</p>
<p>That means things slow down: queries take longer, performance gets worse, and scaling up won’t magically make it faster. Yeah, it’s real. 🥲</p>
<p>Why does this happen? Because CockroachDB’s “full feature set” is under a special license. If you don’t set a valid license, it limits how many SQL transactions you can run at a time.</p>
<h3 id="heading-three-types-of-licenses">Three Types of Licenses</h3>
<p>Here’s a breakdown of the different kinds of CockroachDB licenses and what they mean for you:</p>
<ol>
<li><p><strong>Trial License</strong></p>
<ul>
<li><p>Valid for <strong>30 days</strong>.</p>
</li>
<li><p>Lets you try all the “Enterprise” features.</p>
</li>
<li><p>You <em>must</em> send telemetry (more on that soon) while the trial is active.</p>
</li>
</ul>
</li>
<li><p><strong>Enterprise License (Paid)</strong></p>
<ul>
<li><p>This is CockroachDB’s “premium / fully paid” version.</p>
</li>
<li><p>You can pick the kind of license based on your environment: “Production”, “Pre-production”, or “Development.”</p>
</li>
<li><p>Companies with more than <strong>$10 million in annual revenue</strong> need to pay for this license.</p>
</li>
<li><p>There <em>are</em> discounts, startup perks, or “free” versions for smaller companies (more below).</p>
</li>
</ul>
</li>
<li><p><strong>Enterprise Free License</strong></p>
<ul>
<li><p>This is the magic one for early-stage companies or startups: it has exactly the same features as the paid Enterprise license. But it’s free if your business makes <strong>under $10 million per year</strong>.</p>
</li>
<li><p>You <em>do</em> need to renew it each year.</p>
</li>
<li><p>Support for this “Free” license is <strong>community-level</strong> (forums, docs), not paid enterprise.</p>
</li>
</ul>
</li>
</ol>
<p><strong>N.B.:</strong> To keep your free license active and <em>not</em> get throttled, CockroachDB requires telemetry. Telemetry means your cluster sends some usage data back to Cockroach Labs. And no, they’re not “stealing your data”. Here’s what that actually means:</p>
<ul>
<li><p>Telemetry includes basic usage stats, cluster health info, and configuration metrics.</p>
</li>
<li><p>It does NOT send your business data, queries, or personal customer data.</p>
</li>
<li><p>It helps Cockroach Labs <em>make sure the free license is used responsibly</em>, and helps them build better features.</p>
</li>
<li><p>If you stop sending telemetry, your cluster will be throttled (slowed down) after 7 days.</p>
</li>
</ul>
<h3 id="heading-how-to-apply-for-the-free-enterprise-license">How to Apply for the Free Enterprise License</h3>
<p>Here’s how you can try to get that free enterprise license:</p>
<ol>
<li><p>Go to the CockroachDB Cloud Console (sign up if you don’t have an account). Then click “Organization” in the menu and choose “Enterprise Licenses” from the dropdown.</p>
</li>
<li><p>Click the Create License button → Enable the “Find out if my company qualifies for an Enterprise Free license” option.</p>
</li>
<li><p>Fill in the form: your name, company name, job function, and the intended use of the license.</p>
</li>
<li><p>Click “Continue”.</p>
</li>
</ol>
<p>You should see this success message: “Based on your company's intended use, you qualify for an Enterprise Free license.” Now agree to the terms and conditions, then click the “Generate License key” button.</p>
<p>Learn more about CockroachDB licenses here 👉🏾 <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/licensing-faqs">https://www.cockroachlabs.com/docs/stable/licensing-faqs</a></p>
<h3 id="heading-adding-your-license-to-the-cockroachdb-cluster">Adding Your License to the CockroachDB Cluster</h3>
<p>Now that you’ve gotten your shiny new CockroachDB license (whether it’s the Free one or the Enterprise one), the next step is…actually <em>using it</em>.</p>
<p>Let’s add it to your CockroachDB cluster so it stops shouting “THROTTLED!” at you every time you open the dashboard :)</p>
<p>We’ll do this by updating our CockroachDB Helm configuration.</p>
<h4 id="heading-step-1-update-your-cockroachdb-productionyml">Step 1: Update Your <code>cockroachdb-production.yml</code></h4>
<p>Open your production Helm values file, and inside the <code>init</code> section, add the following:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">init:</span>
<span class="hljs-string">...</span>
    <span class="hljs-attr">provisioning:</span>
        <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">clusterSettings:</span>
          <span class="hljs-attr">cluster.organization:</span> <span class="hljs-string">"'&lt;ORGANIZATION&gt;'"</span> <span class="hljs-comment"># Enter the name of your organization here </span>
          <span class="hljs-attr">enterprise.license:</span> <span class="hljs-string">"'&lt;LICENSE&gt;'"</span> <span class="hljs-comment"># Enter your CockroachDB Enterprise license key here</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Now replace:</p>
<ul>
<li><p><code>&lt;ORGANIZATION&gt;</code> with the name of your startup, business, project, or company</p>
</li>
<li><p><code>&lt;LICENSE&gt;</code> with the exact license string CockroachDB gave you</p>
</li>
</ul>
<p>That’s it – super simple.</p>
<h4 id="heading-step-2-apply-the-changes-with-helm">Step 2: Apply the Changes With Helm</h4>
<p>Run your usual Helm upgrade command:</p>
<pre><code class="lang-bash">helm upgrade cockroachdb -f cockroachdb-production.yml cockroachdb/cockroachdb
</code></pre>
<h4 id="heading-step-3-confirm-the-license-was-added-correctly">Step 3: Confirm the License Was Added Correctly</h4>
<p>Now let’s double-check everything worked.</p>
<ol>
<li><p>Connect as the <code>root</code> user: You can connect using Beekeeper Studio (like we’ve been doing).</p>
</li>
<li><p>Run this query to check your license:</p>
</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> CLUSTER SETTING enterprise.license;
</code></pre>
<p>If everything went well, you should see your license key printed out in the results.</p>
<h4 id="heading-step-4-make-sure-telemetry-is-enabled-important">Step 4: Make Sure Telemetry Is Enabled (Important!)</h4>
<p>Remember: without telemetry enabled, your cluster will still get throttled, even if you have a valid license 🥲</p>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> CLUSTER SETTING diagnostics.reporting.enabled;
</code></pre>
<p>If the result says “true”, you're good! Telemetry is on, CockroachDB can verify your license, and your cluster will behave normally without slowing down.</p>
<h2 id="heading-conclusion-amp-next-steps"><strong>Conclusion &amp; Next Steps ✨</strong></h2>
<p>Throughout this book, you’ve gone from “What even is CockroachDB?” to actually running your <strong>own secure, production-ready database</strong> on Kubernetes – and that’s a BIG deal. 🎉</p>
<p>You learned why CockroachDB is special, how it avoids downtime, and why it’s different from the usual databases everyone talks about.</p>
<p>Then you set up your own local environment, practiced everything safely on Minikube, and gradually built your way to a full production setup on GKE.</p>
<p>You explored CockroachDB’s dashboard, checked your cluster’s health, backed up your data to the cloud, and even learned how to keep your database fast, stable, and ready to grow when needed.</p>
<p>Finally, you deployed it on Google Cloud, secured it with encryption and certificates, and connected to it from your own PC – all step-by-step.</p>
<p>By now, you’ve basically gone from curious learner to “I can actually run this thing in production.” 🚀</p>
<p>You’ve covered a lot – and you’ve built something powerful, modern, and production-worthy. Amazing job 👏🏾😁!! And thanks for reading.</p>
<h3 id="heading-about-the-author">About the Author 👨🏾‍💻</h3>
<p>Hi, I’m Prince! I’m a DevOps engineer and Cloud architect passionate about building, deploying, architecting, and managing applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this book, you can learn more about me by exploring my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile</a>, and you can reach out to me on <a target="_blank" href="https://x.com/POnukwili">Twitter (X)</a>. You can find more of my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">articles here</a> or on <a target="_blank" href="https://www.freecodecamp.org/news/author/onukwilip/">my freeCodeCamp blog</a>.</p>
<p>You can also <a target="_blank" href="https://prince-onuk.vercel.app">visit my website</a>. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Integrate Vector Search in Columnar Storage ]]>
                </title>
                <description>
                    <![CDATA[ Integrating vector search into traditional data platforms is becoming a common task in the current AI-driven landscape. When Google announced general availability for vector search in BigQuery in early 2024, it joined a growing list of established da... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-integrate-vector-search-in-columnar-storage/</link>
                <guid isPermaLink="false">6914ff68e1ffda5f6ea6d8ea</guid>
                
                    <category>
                        <![CDATA[ vector database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ semantic search ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Columnar Database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Chirag Agrawal ]]>
                </dc:creator>
                <pubDate>Wed, 12 Nov 2025 21:43:04 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762983768101/928331bd-3f97-4d05-92fb-2d8ea9af5dab.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Integrating vector search into traditional data platforms is becoming a common task in the current AI-driven landscape. When Google announced general availability for vector search in BigQuery in early 2024, it joined a growing list of established databases that have added capabilities for similarity search on high-dimensional embeddings.</p>
<p>But if you examine BigQuery's implementation more closely, you’ll find an approach that goes beyond a simple feature addition. Instead of bolting on a vector library, Google has deeply integrated vector search into its existing distributed, columnar architecture.</p>
<p>In this article, we’ll take a technical deep dive into the engineering decisions behind BigQuery's vector search. We’ll explore how foundational Google technologies like Dremel, Borg, and Colossus, combined with a proprietary columnar format and a novel indexing algorithm, create a highly scalable and efficient platform for AI workloads.</p>
<p>This analysis will give you insights into the architectural trade-offs involved in building vector search at scale. It also demonstrates how you can adapt a system designed for large-scale analytics so that it excels at modern AI tasks.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-unique-challenge-of-vector-search">The Unique Challenge of Vector Search</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bigquerys-foundational-distributed-architecture">BigQuery's Foundational Distributed Architecture</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-dremel-the-distributed-query-engine">Dremel: The Distributed Query Engine</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-borg-cluster-management-and-resource-orchestration">Borg: Cluster Management and Resource Orchestration</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-colossus-the-distributed-storage-layer">Colossus: The Distributed Storage Layer</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-jupiter-the-high-speed-network-fabric">Jupiter: The High-Speed Network Fabric</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-the-role-of-columnar-storage-in-vector-operations">The Role of Columnar Storage in Vector Operations</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-accelerating-computations-with-simd">Accelerating Computations with SIMD</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-the-treeah-indexing-algorithm">The TreeAH Indexing Algorithm</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-hierarchical-tree-structure">1. Hierarchical Tree Structure</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-product-quantization-pq">2. Product Quantization (PQ)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-asymmetric-hashing">3. Asymmetric Hashing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-architectural-comparison-treeah-vs-hnsw">Architectural Comparison: TreeAH vs. HNSW</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-the-end-to-end-vector-search-query-flow">The End-to-End Vector Search Query Flow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-practical-implications-for-engineering-teams">Practical Implications for Engineering Teams</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-1-query-latency-vs-throughput">1. Query Latency vs. Throughput</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-2-cost-model-considerations">2. Cost Model Considerations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-3-index-management-trade-offs">3. Index Management Trade-offs</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-4-integration-benefits-that-actually-matter">4. Integration Benefits That Actually Matter</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-further-reading">Further Reading</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>This article assumes that you have a solid foundation in distributed systems and database internals, including familiarity with concepts like columnar storage, query execution plans, and distributed query processing.</p>
<p>You should understand the basics of vector embeddings and similarity search, though we'll briefly review the fundamentals. Experience with at least one vector database or search system (such as pgvector, Pinecone, or Elasticsearch) will help contextualize the architectural comparisons.</p>
<p>While deep knowledge of Google Cloud Platform isn't required, basic familiarity with cloud data warehouses and their typical architectures will be beneficial. The article includes discussions of SIMD operations and CPU-level optimizations, so comfort with low-level performance considerations is helpful, though not mandatory.</p>
<p>Code examples assume working knowledge of SQL, with some sections referencing implementation details in languages like Python or Java. Most importantly, you should have experience building or operating production data systems at scale, as many insights focus on practical engineering trade-offs rather than theoretical concepts.</p>
<h2 id="heading-the-unique-challenge-of-vector-search">The Unique Challenge of Vector Search</h2>
<p>Vector search fundamentally differs from traditional database operations in ways that challenge our existing infrastructure assumptions. Where conventional queries leverage decades of optimization around exact matching and range scans, vector similarity search requires computing distances between high-dimensional points at massive scale.</p>
<p>Consider the numbers. Modern embedding models produce vectors with 768 or more dimensions. At 4 bytes per float32 value, a single embedding consumes roughly 3KB. A modest corpus of 100 million items translates to 300GB of vector data.</p>
<p>But the real challenge isn't storage. The killer is computation. Finding the nearest neighbors to a query vector means computing distance metrics across all those dimensions. For 100 million vectors, a brute-force search requires 76.8 billion multiply-add operations per query (768 dimensions times 100 million vectors) just for the distance calculations. Even with modern SIMD instructions processing 16 floats at once, you're looking at billions of CPU cycles per search.</p>
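<p>The arithmetic above is easy to reproduce. The following is a minimal sketch that uses only the figures quoted in this section (768 dimensions, float32 values, 100 million vectors, 16 SIMD lanes). The variable names are illustrative, and the SIMD figure is a best-case bound that ignores memory bandwidth.</p>

```python
# Back-of-envelope sizing for brute-force vector search.
DIMENSIONS = 768            # embedding width used in the text
BYTES_PER_FLOAT32 = 4
NUM_VECTORS = 100_000_000
SIMD_LANES = 16             # float32 values per 512-bit register

bytes_per_vector = DIMENSIONS * BYTES_PER_FLOAT32
corpus_bytes = bytes_per_vector * NUM_VECTORS

# One multiply-accumulate per dimension per stored vector.
multiply_adds_per_query = DIMENSIONS * NUM_VECTORS

# Best case: every instruction processes SIMD_LANES values at once.
simd_multiply_adds_per_query = multiply_adds_per_query // SIMD_LANES

print(f"{bytes_per_vector} bytes per vector")
print(f"{corpus_bytes / 1e9:.1f} GB of raw embeddings")
print(f"{multiply_adds_per_query / 1e9:.1f} billion multiply-adds per query")
print(f"{simd_multiply_adds_per_query / 1e9:.1f} billion SIMD multiply-adds per query")
```

<p>Even in the best case, a single query still costs several billion vector instructions, which is why the rest of this article focuses on avoiding a full scan.</p>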
<p>This computational reality forces a fundamental compromise: we abandon exact solutions for approximate ones. Approximate Nearest Neighbor (ANN) algorithms trade perfect accuracy for practical query times. They work by partitioning the vector space cleverly, building graphs of nearest neighbors, or using hashing schemes to avoid examining every vector. The engineering challenge becomes balancing query latency, recall accuracy, and resource consumption.</p>
<p>Most purpose-built vector databases address this through specialized in-memory indexes like HNSW or IVF. These work well for single queries but require keeping massive indexes in RAM. If you're not familiar with these vector indexes, <a target="_blank" href="https://medium.com/towards-artificial-intelligence/unlocking-the-power-of-efficient-vector-search-in-rag-applications-c2e3a0c551d5">this article</a> covers them.</p>
<p>BigQuery took a different path. Rather than optimizing for single-query latency, they asked what vector search would look like when built for analytical workloads at warehouse scale. The answer required rethinking basic assumptions about index design, storage layout, and query execution.</p>
<h2 id="heading-bigquerys-foundational-distributed-architecture">BigQuery's Foundational Distributed Architecture</h2>
<p>BigQuery's vector search runs on the same infrastructure that's been processing SQL queries since 2011. No new cluster type. No specialized vector nodes. Just four core technologies that power most of Google's data processing, now handling a workload they weren't originally designed for.</p>
<p>This isn't the obvious choice. Most vector databases build specialized infrastructure optimized for similarity search. Graph-based indexes need fast random access. In-memory systems require careful memory management. BigQuery took its existing distributed SQL engine and asked: can we make this work for vectors, too?</p>
<p>The answer required leveraging four foundational systems in new ways:</p>
<ul>
<li><p>Dremel, the query engine that normally handles SQL, now orchestrates vector similarity computations.</p>
</li>
<li><p>Borg, which allocates resources for everything from Search to YouTube, dynamically assigns thousands of workers to vector queries.</p>
</li>
<li><p>Colossus stores embeddings in the same distributed filesystem that holds petabytes of analytics data.</p>
</li>
<li><p>And Jupiter's datacenter network, built for bulk data processing, now shuttles vector data between computation nodes.</p>
</li>
</ul>
<p>What's surprising isn't that it works, but how well it works. The same architecture that runs aggregate queries over trillion-row tables can search billion-scale vector collections. Understanding how requires examining each component and how they've been adapted for this new workload.</p>
<h3 id="heading-dremel-the-distributed-query-engine">Dremel: The Distributed Query Engine</h3>
<p>At its core, BigQuery is powered by Dremel, a distributed query execution engine developed at Google since 2006.</p>
<p>Dremel processes SQL queries using a hierarchical serving tree. A root server receives the query and orchestrates the execution, while mixer nodes break down the work and distribute it to hundreds or thousands of leaf nodes. These leaf nodes perform the actual computations in parallel on segments of the data.</p>
<p>This architecture allows BigQuery to dynamically allocate a massive number of execution threads, known as slots, to a single query, enabling it to process petabytes of data in seconds.</p>
<h3 id="heading-borg-cluster-management-and-resource-orchestration">Borg: Cluster Management and Resource Orchestration</h3>
<p>The serverless nature of BigQuery is made possible by Borg, Google's cluster management system that predates and inspired Kubernetes.</p>
<p>When a vector search query is submitted, Borg is responsible for finding available machines across Google's global data centers, allocating the precise amount of CPU and memory resources needed for the query's Dremel slots, and managing fault tolerance by automatically rescheduling work if a machine fails. This dynamic resource allocation means users do not need to provision or scale infrastructure, whether they are searching 1,000 vectors or 10 billion.</p>
<h3 id="heading-colossus-the-distributed-storage-layer">Colossus: The Distributed Storage Layer</h3>
<p>Data in BigQuery is stored in Colossus, Google's next-generation distributed file system. Colossus is designed for exabyte-scale storage, provides high availability through automatic cross-datacenter replication, and is optimized for the high-throughput parallel reads required by Dremel's leaf nodes.</p>
<p>During a vector search, Colossus can deliver data to thousands of nodes simultaneously without creating a storage bottleneck.</p>
<h3 id="heading-jupiter-the-high-speed-network-fabric">Jupiter: The High-Speed Network Fabric</h3>
<p>These compute and storage systems are interconnected by Jupiter, Google's internal datacenter network, which provides bisection bandwidth on the order of a petabit per second. The network's design ensures that data can move between Colossus storage and Dremel compute nodes at extremely high speeds, making the data shuffling and aggregation phases of a query efficient.</p>
<p><img alt="Big Query vector search architecture is powered by Dremel Query Engine, Borg Orchestrator for resource allocation, Colossus for large scale data storage and Jupiter network for ultra high bandwidth data transfer" width="600" height="400" loading="lazy"></p>
<h2 id="heading-the-role-of-columnar-storage-in-vector-operations">The Role of Columnar Storage in Vector Operations</h2>
<p>Storing vectors in columns sounds wrong. Vectors are arrays. They belong together. Why split them across columnar storage?</p>
<p>BigQuery does it anyway, and it works brilliantly. Here's why.</p>
<p>When you search a million vectors, you need exactly one thing from each row: the embedding. Not the product name, price, or category. Just the vector. Row-oriented storage forces you to read entire records and throw away 90% of the data. Columnar storage reads only what you need.</p>
<p>The performance impact is dramatic. A table with 768-dimensional embeddings plus 20 other columns might total 3TB. Reading just the embedding column? 300GB. That's a <strong>10x reduction in I/O</strong> before you've done any actual computation.</p>
<p>But the real magic happens at the CPU level. Columnar storage naturally aligns vector data for SIMD processing. Instead of jumping around memory gathering vector components, the CPU finds them laid out sequentially, ready for bulk operations. Modern processors can load 16 floating-point values into a single register and process them simultaneously.</p>
<p>Compression becomes almost trivial, too. BigQuery's Capacitor format applies techniques like Product Quantization directly to the column data, shrinking vectors from 3KB to under 300 bytes. Try doing that with row-oriented storage where vectors are scattered across pages.</p>
<p>The lesson? Sometimes the "wrong" abstraction at one level enables the right optimizations at another.</p>
<h3 id="heading-accelerating-computations-with-simd">Accelerating Computations with SIMD</h3>
<p>SIMD instructions are a form of hardware-level parallelism available in modern CPUs that provide significant speedups for vector arithmetic. This is achieved through special instruction sets built into the processor.</p>
<p>For example, AVX-512 (Advanced Vector Extensions 512-bit) is an instruction set found in modern high-performance CPUs, such as those from Intel, that allows a single instruction to operate on 512 bits of data at once.</p>
<p>Since a standard single-precision floating-point number is 32 bits, a CPU with AVX-512 can process 16 floating-point numbers in a single operation. This leads to dramatic performance gains.</p>
<p>The difference between scalar and SIMD processing for vector distance calculations is stark:</p>
<ul>
<li><p><strong>Scalar approach</strong>: Loop through each dimension, multiply corresponding components, accumulate results. For 768 dimensions, that's 768 multiplications, 768 additions, and terrible cache performance as you jump between two different memory locations for each iteration.</p>
</li>
<li><p><strong>SIMD approach</strong>: Load 16 components from each vector into 512-bit registers. Execute a single multiply instruction that handles all 16 pairs. Execute a single vector add that folds the products into a 16-lane accumulator. Repeat 48 times, then sum the accumulator's lanes once at the end. The CPU's pipeline stays full, the cache prefetcher knows exactly what data you need next, and you've turned 1,536 operations into roughly 96.</p>
</li>
</ul>
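<p>The operation counts in the two bullets can be checked with a small emulation. This is a sketch in plain Python, not real SIMD: each loop iteration in <code>lanewise_dot</code> stands in for one vector multiply and one vector add over 16 values, and the function names are made up for illustration.</p>

```python
import random

LANES = 16  # float32 values in one 512-bit register


def scalar_dot(a, b):
    """One multiply and one add per dimension."""
    total, ops = 0.0, 0
    for x, y in zip(a, b):
        total += x * y
        ops += 2
    return total, ops


def lanewise_dot(a, b, lanes=LANES):
    """Emulate SIMD: each iteration counts as one vector multiply
    plus one vector add applied to `lanes` values at once."""
    if len(a) != len(b) or len(a) % lanes:
        raise ValueError("inputs must have equal length, a multiple of `lanes`")
    acc = [0.0] * lanes                                   # accumulator register
    ops = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        products = [x * y for x, y in zip(chunk_a, chunk_b)]  # 1 vector multiply
        acc = [s + p for s, p in zip(acc, products)]          # 1 vector add
        ops += 2
    return sum(acc), ops                                  # one final lane reduction


random.seed(0)
a = [random.random() for _ in range(768)]
b = [random.random() for _ in range(768)]
scalar_result, scalar_ops = scalar_dot(a, b)
simd_result, simd_ops = lanewise_dot(a, b)
print(scalar_ops, simd_ops)  # prints "1536 96"
```

<p>Both functions return the same dot product up to floating-point rounding. Only the number of instructions issued differs, which is the whole point of the 16-lane layout.</p>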
<p>The columnar storage pays off here, too. Vectors stored contiguously in memory align perfectly with SIMD register loads. No gather operations, no wasted cycles. Just pure throughput.</p>
<p><img alt="TreeAH SIMD In-Register Operations Speed up distance calculations with the help of pre-computed distance table and parallel operations " width="600" height="400" loading="lazy"></p>
<p>BigQuery's query engine is designed to leverage SIMD extensively. It automatically detects and uses the optimal instruction set available on the underlying hardware (for example, AVX-512 for Intel, NEON for ARM). The columnar storage format ensures that vector data is laid out in memory in a way that is friendly to SIMD registers, and the engine processes query vectors in large batches to maximize the utilization of these parallel instructions.</p>
<h2 id="heading-the-treeah-indexing-algorithm">The TreeAH Indexing Algorithm</h2>
<p>While brute-force search can be effective at smaller scales due to BigQuery's massive parallelism, efficient search over billions of vectors requires an index. BigQuery's primary vector index is TreeAH (Tree with Asymmetric Hashing), which is based on Google's open-sourced ScaNN (Scalable Nearest Neighbors) algorithm. TreeAH combines three techniques to achieve high performance and memory efficiency.</p>
<h3 id="heading-1-hierarchical-tree-structure">1. Hierarchical Tree Structure</h3>
<p>The algorithm first partitions the entire vector space into thousands of smaller lists. You can think of this like organizing a massive library. Instead of having one giant room with a million books, a library has floors, sections, and shelves. This hierarchy allows you to find a book without scanning every single one.</p>
<p>Similarly, TreeAH groups semantically similar vectors together into partitions and arranges them in a tree. During a query, the search navigates this tree by comparing the query vector to "centroid" vectors that represent the center of each partition, effectively following a path to the most relevant partitions and pruning away large, irrelevant branches of the search space.</p>
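<p>To make the pruning idea concrete, here is a deliberately simplified, single-level sketch: vectors are assigned to their nearest centroid, and a query only scans the partitions whose centroids are closest to it. The centroids, data, and function names are invented for illustration. The real index uses a multi-level tree with learned centroids, and its internals are not shown in this article.</p>

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(vectors, centroids):
    """Assign each vector id to the partition of its nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(vec, centroids[i]))
        partitions[nearest].append(vec_id)
    return partitions


def search(query, vectors, centroids, partitions, partitions_to_probe=1, top_k=2):
    """Scan only the partitions whose centroids are closest to the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda i: squared_distance(query, centroids[i]))
    candidates = [vec_id
                  for part in ranked[:partitions_to_probe]
                  for vec_id in partitions[part]]
    candidates.sort(key=lambda vec_id: squared_distance(query, vectors[vec_id]))
    return candidates[:top_k], len(candidates)


# Hypothetical 2-D data: three hand-picked centroids, six vectors.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
vectors = {
    "a": (0.5, 0.2), "b": (1.0, 1.0),
    "c": (9.0, 9.5), "d": (10.5, 10.0),
    "e": (0.3, 9.7), "f": (1.0, 10.5),
}
partitions = build_partitions(vectors, centroids)
results, scanned = search((9.6, 9.8), vectors, centroids, partitions)
print(results, f"scanned {scanned} of {len(vectors)} vectors")
```

<p>Raising <code>partitions_to_probe</code> trades extra work for better recall, since a true neighbor can sit just across a partition boundary.</p>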
<h3 id="heading-2-product-quantization-pq">2. Product Quantization (PQ)</h3>
<p>Within TreeAH, PQ serves a different purpose than just compression. The index doesn't just store smaller vectors – it fundamentally changes how distance calculations work.</p>
<p>TreeAH learns partition-specific codebooks that capture the local structure of vectors in each tree node. This means vectors that end up in the "shoes" partition get quantized differently than those in "electronics." The compression becomes semantic-aware.</p>
<p>When combined with the tree structure, this creates a powerful effect: not only are you searching fewer vectors (thanks to the tree), but you're computing distances faster on the vectors you do search (thanks to PQ).</p>
<h3 id="heading-3-asymmetric-hashing">3. Asymmetric Hashing</h3>
<p>The "asymmetric" aspect refers to the fact that the query vector is kept in its full-precision form, while the database vectors are compared in their compressed, quantized form.</p>
<p>The vectors are not of different dimensions, but of different precision. The semantic matching works because the comparison is not direct. The compressed database vector is a code that points to a region in the original vector space. The distance calculation uses the full-precision query vector to look up a pre-computed distance to the center of that region. This way, the rich information in the query vector is used to accurately estimate the distance, avoiding the significant information loss that would occur if both vectors were compressed.</p>
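<p>The lookup-table mechanism can be sketched in a few lines. In this toy example the codebooks are hand-picked rather than learned, the vectors have only four dimensions split into two subspaces, and every name is hypothetical. It shows the general product-quantization pattern with asymmetric distance computation, not BigQuery's actual implementation.</p>

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical codebooks: one list of centroids per 2-dimensional subspace.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],   # subspace 0 (dimensions 0-1)
    [(0.0, 1.0), (1.0, 0.0)],   # subspace 1 (dimensions 2-3)
]
SUB_DIM = 2


def split(vector):
    return [tuple(vector[i:i + SUB_DIM]) for i in range(0, len(vector), SUB_DIM)]


def encode(vector):
    """Database side: replace each subvector with its nearest centroid's index."""
    return [min(range(len(book)), key=lambda c: squared_distance(sub, book[c]))
            for sub, book in zip(split(vector), CODEBOOKS)]


def build_lookup_table(query):
    """Query side: distances from each full-precision query subvector
    to every centroid, computed once per query."""
    return [[squared_distance(sub, centroid) for centroid in book]
            for sub, book in zip(split(query), CODEBOOKS)]


def asymmetric_distance(table, code):
    """Estimate the distance with one table lookup per subspace."""
    return sum(table[m][c] for m, c in enumerate(code))


database = {"shoe": (0.1, 0.0, 0.9, 0.1), "boot": (0.9, 1.0, 0.1, 0.9)}
codes = {name: encode(vec) for name, vec in database.items()}

query = (0.0, 0.1, 1.0, 0.0)
table = build_lookup_table(query)
estimates = {name: asymmetric_distance(table, code) for name, code in codes.items()}
print(codes)
print(estimates)
```

<p>The query is never quantized, so the only approximation error comes from the stored codes. Each database vector costs two table lookups and one addition here, regardless of how many dimensions each subspace covers.</p>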
<h3 id="heading-architectural-comparison-treeah-vs-hnsw">Architectural Comparison: TreeAH vs. HNSW</h3>
<p>To better understand the design philosophy behind TreeAH, it’s useful to compare it with HNSW (Hierarchical Navigable Small World), a popular graph-based algorithm used in many dedicated vector databases.</p>
<p>HNSW constructs a multi-layered graph where vectors are nodes and edges connect them to their nearest neighbors. It’s known for excellent single-query latency.</p>
<p>But this performance comes with significant memory overhead, as the graph structure must be stored in addition to the full-precision vectors. HNSW index builds can also be time-consuming, and frequent data updates can lead to memory fragmentation and performance degradation.</p>
<p>TreeAH, in contrast, makes different architectural trade-offs that align with BigQuery's nature as a distributed analytics system.</p>
<p>The comparison reveals a fundamental design choice: TreeAH prioritizes batch throughput, memory efficiency, and scalability over absolute single-query latency. This makes it well-suited for analytical workloads where thousands of searches are performed simultaneously.</p>
<p><img alt="TreeAH vs. HNSW Architectural Comparison" width="600" height="400" loading="lazy"></p>
<h2 id="heading-the-end-to-end-vector-search-query-flow">The End-to-End Vector Search Query Flow</h2>
<p>The execution timeline of a BigQuery vector search demonstrates how parallel processing eliminates traditional bottlenecks. When a VECTOR_SEARCH query arrives, the system initiates multiple operations concurrently rather than executing them sequentially.</p>
<p>The root server begins query planning immediately upon receiving the request. In parallel, Borg starts allocating compute slots across the cluster, targeting 1,000 slots distributed across 50 or more nodes. Borg prioritizes slots that are physically close to the data in Colossus to minimize data movement costs. This allocation typically completes within 10 milliseconds.</p>
<p>Query planning and resource allocation overlap significantly. The mixer nodes receive partial execution plans and begin partitioning the search space before Borg completes all slot allocations. When TreeAH indexes are available, mixers use them to assign specific vector partitions to leaf nodes. This streaming approach ensures that leaf nodes receive work assignments as soon as they come online.</p>
<p>The parallel execution phase showcases the architecture's efficiency. Hundreds or thousands of leaf nodes simultaneously read their assigned vector partitions from Colossus. Jupiter's high-bandwidth network prevents I/O congestion even with thousands of concurrent reads. Each leaf node operates independently: loading compressed vectors, executing SIMD operations for distance calculations, and maintaining local top-k results.</p>
<p>Aggregation begins before all leaf nodes complete their local searches. Mixers implement a streaming merge algorithm that processes results as they arrive. This approach means that by the time the slowest leaf node reports its results, the mixers have already processed most of the data. The final global top-k emerges from this continuous merging process.</p>
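<p>A streaming merge of this kind can be sketched with a bounded heap. The class and function names below are illustrative assumptions, and the leaf results are made-up data. The point is only that a mixer can fold in each leaf's local top-k as it arrives while holding at most k candidates.</p>

```python
import heapq


def local_top_k(distances, k):
    """Leaf node: keep only the k closest (distance, item_id) pairs."""
    return heapq.nsmallest(k, distances)


class StreamingTopK:
    """Mixer: merge leaf results as they arrive, never holding more than k."""

    def __init__(self, k):
        self.k = k
        self._heap = []   # max-heap on distance, stored as negated values

    def merge(self, leaf_results):
        for distance, item_id in leaf_results:
            entry = (-distance, item_id)
            if len(self._heap) < self.k:
                heapq.heappush(self._heap, entry)
            elif entry > self._heap[0]:
                # Closer than the current worst candidate: swap it in.
                heapq.heapreplace(self._heap, entry)

    def results(self):
        return sorted((-neg, item_id) for neg, item_id in self._heap)


# Hypothetical distances computed by three leaf nodes.
leaf_a = [(0.42, "p7"), (0.10, "p1"), (0.95, "p9")]
leaf_b = [(0.05, "p4"), (0.77, "p3")]
leaf_c = [(0.30, "p2"), (0.12, "p8")]

mixer = StreamingTopK(k=3)
for leaf in (leaf_a, leaf_b, leaf_c):      # arrival order does not matter
    mixer.merge(local_top_k(leaf, k=3))

print(mixer.results())
```

<p>Because each merge step is independent of arrival order, the slowest leaf only delays the final answer by the cost of merging its own k results.</p>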
<p>The measured 40-millisecond execution time represents the longest path through the parallel execution graph, not the sum of individual operations. Most operations complete much faster, but the overall latency is bounded by the slowest component. This design trades single-query latency for massive throughput, enabling BigQuery to process thousands of vector searches concurrently across billions of vectors.</p>
<p><img alt="Big Query Vector Search Timeline" width="600" height="400" loading="lazy"></p>
<h2 id="heading-practical-implications-for-engineering-teams">Practical Implications for Engineering Teams</h2>
<p>The architectural choices behind BigQuery's vector search create specific trade-offs that engineering teams need to understand before committing to this approach.</p>
<h3 id="heading-1-query-latency-vs-throughput">1. Query Latency vs. Throughput</h3>
<p>BigQuery vector searches typically complete in 1-10 seconds, not the sub-100ms latency of specialized vector databases. But you can run thousands of searches concurrently without degradation. This makes BigQuery ideal for batch recommendation generation, similarity analysis across product catalogs, or embedding-based data enrichment pipelines. It's the wrong choice for autocomplete features or real-time personalization that requires immediate responses.</p>
<h3 id="heading-2-cost-model-considerations">2. Cost Model Considerations</h3>
<p>Under on-demand pricing, BigQuery charges for data scanned, not query execution time (capacity-based pricing bills for reserved slots instead). A vector search that scans 1TB costs the same whether it completes in 2 seconds or 20 seconds. This model favors workloads where you search large datasets infrequently rather than small datasets continuously. Running vector search on a 10GB table thousands of times per day will be more expensive than a dedicated vector database with fixed infrastructure costs.</p>
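<p>A rough calculation shows why query frequency matters more than table size under a scan-based model. This sketch assumes every search is billed for the full number of bytes given, and the per-TiB rate is a placeholder you should replace with the current price for your region. The workload numbers are hypothetical.</p>

```python
def monthly_scan_cost(bytes_per_query, queries_per_day, usd_per_tib, days=30):
    """Cost of a workload billed purely on bytes scanned."""
    tib_scanned = bytes_per_query * queries_per_day * days / 2**40
    return tib_scanned * usd_per_tib


USD_PER_TIB = 6.25   # assumed on-demand rate; check current pricing

# Small table searched constantly vs. large table searched rarely.
small_hot = monthly_scan_cost(10 * 10**9, queries_per_day=5_000, usd_per_tib=USD_PER_TIB)
large_cold = monthly_scan_cost(10**12, queries_per_day=4, usd_per_tib=USD_PER_TIB)

print(f"10 GB table, 5,000 searches/day: ${small_hot:,.0f} per month")
print(f"1 TB table, 4 searches/day:      ${large_cold:,.0f} per month")
```

<p>With these inputs, the small, frequently searched table scans 12.5 times more data per month than the large one, so it costs 12.5 times more despite being 100 times smaller.</p>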
<h3 id="heading-3-index-management-trade-offs">3. Index Management Trade-offs</h3>
<p>TreeAH indexes update automatically in the background when new data arrives, typically within 5-15 minutes. You cannot force an immediate index update, and you get far fewer tuning parameters than a self-managed HNSW or IVF index exposes. This simplicity reduces operational overhead but limits optimization options. If your use case requires fine-grained tuning of recall/latency trade-offs or immediate consistency after updates, you'll need a different solution.</p>
<h3 id="heading-4-integration-benefits-that-actually-matter">4. Integration Benefits That Actually Matter</h3>
<p>The ability to JOIN vector search results with business data in a single query is more powerful than it initially appears. Consider this query pattern:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">WITH</span> semantic_matches <span class="hljs-keyword">AS</span> (
  <span class="hljs-comment">-- VECTOR_SEARCH returns each matched row as a struct named base</span>
  <span class="hljs-keyword">SELECT</span> base.item_id <span class="hljs-keyword">AS</span> item_id, distance
  <span class="hljs-keyword">FROM</span> VECTOR_SEARCH(
    <span class="hljs-keyword">TABLE</span> products,
    <span class="hljs-string">'embedding'</span>,
    (<span class="hljs-keyword">SELECT</span> embedding <span class="hljs-keyword">FROM</span> queries <span class="hljs-keyword">WHERE</span> query_id = @query_id),
    <span class="hljs-comment">-- Over-fetch so that 20 rows can survive the business filters below</span>
    top_k =&gt; <span class="hljs-number">100</span>
  )
)
<span class="hljs-keyword">SELECT</span> p.*, s.distance
<span class="hljs-keyword">FROM</span> semantic_matches s
<span class="hljs-keyword">JOIN</span> products p <span class="hljs-keyword">USING</span> (item_id)
<span class="hljs-keyword">WHERE</span> p.in_stock = <span class="hljs-literal">TRUE</span>
  <span class="hljs-keyword">AND</span> p.price <span class="hljs-keyword">BETWEEN</span> <span class="hljs-number">50</span> <span class="hljs-keyword">AND</span> <span class="hljs-number">200</span>
  <span class="hljs-keyword">AND</span> p.category_restrictions <span class="hljs-keyword">IS</span> <span class="hljs-literal">NULL</span>
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> s.distance
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">20</span>
</code></pre>
<p>This combines semantic search with business logic, inventory status, and access controls in one atomic operation. Implementing this with a separate vector database requires complex synchronization between systems.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>BigQuery's vector search implementation challenges our assumptions about what a data warehouse can do. Instead of building another specialized vector database, Google pushed their existing infrastructure to handle a fundamentally different workload.</p>
<p>The key insight is recognizing that vector search at scale is a data processing problem. And processing data at scale is what BigQuery was built for.</p>
<p>By leveraging its columnar architecture and hardware-aware algorithms like TreeAH, BigQuery makes a deliberate trade-off. It exchanges the sub-100ms latency of specialized in-memory systems for massive batch throughput and high resource efficiency. An index that uses <strong>10x less memory</strong> than HNSW is a trade-off many teams building analytical AI systems would gladly make.</p>
<p>The real power emerges when vectors live alongside business data. Complex queries that would require multiple systems and synchronization nightmares become simple SQL. "Find similar products, but only from reliable suppliers, in stock locally, with no recent quality issues." One query, one system, no architectural gymnastics.</p>
<p>This approach validates a broader trend: vector capabilities are becoming table stakes for data platforms. The question isn't whether your data platform will support vectors, but how well it integrates them into existing workflows.</p>
<p>For teams building analytical AI applications, BigQuery offers a pragmatic path. It won't win latency benchmarks against dedicated vector databases. But for batch processing, integrated analytics, and operational simplicity at scale, it demonstrates that sometimes the best vector database isn't a vector database at all. It's your data warehouse, evolved.</p>
<h2 id="heading-further-reading">Further Reading</h2>
<ul>
<li><p><a target="_blank" href="https://cloud.google.com/blog/products/bigquery/bigquery-under-the-hood">BigQuery Under the Hood</a>: Official architecture deep dive</p>
</li>
<li><p><a target="_blank" href="https://github.com/google-research/google-research/tree/master/scann/docs/algorithms.md">ScaNN Algorithm Details</a>: The mathematics behind TreeAH</p>
</li>
<li><p><a target="_blank" href="https://research.google/pubs/pub36632/">Dremel: Interactive Analysis of Web-Scale Datasets</a>: The foundational paper</p>
</li>
<li><p><a target="_blank" href="https://research.google/pubs/pub43438/">Large-scale cluster management at Google with Borg</a>: Understanding resource orchestration</p>
</li>
<li><p><a target="_blank" href="https://research.google/pubs/pub43837/">Jupiter Rising: A Decade of Clos Topologies</a>: Google's datacenter networking</p>
</li>
<li><p><a target="_blank" href="https://medium.com/google-cloud/bigquery-vector-search-a-practitioners-guide-0f85b0d988f0">BigQuery Vector Search: A Practitioner's Guide</a>: Optimization strategies</p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Configure Google Workspace Addon For Tier 2 CASA Security Assessment – Step by Step Guide ]]>
                </title>
                <description>
                    <![CDATA[ As part of the Google CASA process, developers can run static analysis on their application’s source code using an inline integration with OpenText’s Fortify Source Code Analyzer (SCA) via the CASA portal. Naturally, I had to prepare my source code a... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/tier-casa-security-assessment/</link>
                <guid isPermaLink="false">66ba5b0dcccc49d721b6ea3a</guid>
                
                    <category>
                        <![CDATA[ Google ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Nibesh Khadka ]]>
                </dc:creator>
                <pubDate>Fri, 23 Feb 2024 16:43:39 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/02/Addon-Assesment-Poster--4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>As part of the <a target="_blank" href="https://appdefensealliance.dev/casa">Google CASA process</a>, developers can run static analysis on their application’s source code using an inline integration with OpenText’s Fortify Source Code Analyzer (SCA) via the CASA portal.</p>
<p>Naturally, I had to prepare my source code as per the instructions. In this article, I will share how I packaged and submitted my Add-on's source code on Ubuntu.</p>
<p>But before that, let's talk a little about Tier 2 CASA assessment.</p>
<h2 id="heading-what-is-tier-2-casa-security-assessment">What is Tier 2 CASA Security Assessment?</h2>
<p>The <a target="_blank" href="https://appdefensealliance.dev/casa">Tier 2 CASA</a> (Cloud Application Security Assessment) is a self-service security assessment process for applicants seeking access to Google Workspace data or to comply with specific Google Workspace policies. </p>
<p>It allows developers to scan their applications and submit the results for verification without an external assessor accessing the code or infrastructure.</p>
<h3 id="heading-importance-tier-2-casa-security-assessment">Importance of Tier 2 CASA Security Assessment</h3>
<p>Tier 2 CASA is important for several reasons:</p>
<ul>
<li><strong>Security Assurance:</strong> It provides independent verification of your application's security posture, reducing the risk of data breaches and protecting user privacy.</li>
<li><strong>Compliance:</strong> It helps meet security requirements for accessing Google Workspace data or adhering to Google policies, like the Workspace Marketplace Terms of Service.</li>
<li><strong>Efficiency:</strong> It's a faster and more cost-effective alternative to Tier 3 assessments, which involve external assessors directly examining your application.</li>
<li><strong>Trust:</strong> If your addon is published without verification, users will see an "unverified" message while installing it, which creates distrust and can lead them to abandon the installation.</li>
</ul>
<p>In the context of my Google Workspace Addon <a target="_blank" href="https://appdefensealliance.dev/casa">Scan Me</a>, the use of the <a target="_blank" href="https://developers.google.com/apps-script/add-ons/concepts/editor-scopes#restricted_scopes">restricted</a> OAuth scope <a target="_blank" href="https://developers.google.com/identity/protocols/oauth2/scopes#drive"><code>auth/drive</code></a> of the Google Drive API likely triggered the need for a Tier 2 assessment. This scope lets the addon see, edit, create, and delete all of your Google Drive files, so it falls under Google's security and privacy requirements.</p>
<h3 id="heading-additional-resources">Additional Resources</h3>
<ul>
<li><strong><a target="_blank" href="https://appdefensealliance.dev/casa/tier-2/getting-started">CASA Tier 2 Overview</a></strong></li>
<li><strong><a target="_blank" href="https://tacsecurity.com/google-casa-cloud-application-security-assessment/">CASA Documentation</a></strong></li>
<li><strong><a target="_blank" href="https://workspace.google.com/terms/marketplace/tos/">Google Workspace Marketplace Terms of Service</a></strong></li>
</ul>
<p><strong>Disclaimer</strong>: While I'll explain the Tier 2 CASA process, it's crucial to consult the official documentation and Google's security guidelines for specific requirements and guidance.</p>
<p>The assessment certification is free, by the way. To prepare your addon for the CASA assessment process, follow the steps below.</p>
<h2 id="heading-step-1-sign-up-for-the-new-assessment-procedure"><strong>Step 1</strong> – <strong>Sign up for the new Assessment Procedure</strong></h2>
<p>If you're using restricted scopes, then at some point after you've submitted your add-on for verification, you'll receive an email from Google's Verification team asking you to verify the scopes.</p>
<p>This email is the notification document. Download it as a PDF, because you'll need to submit it in the application form later on.</p>
<p>In that email, you'll find instructions for the Tier 2 evaluation, along with a link to <a target="_blank" href="https://rc.products.pwc.com/login/casa/register">register</a> or <a target="_blank" href="https://rc.products.pwc.com/login/casa/">log in</a> to the CASA portal. Click the link and register on the site. Then click on <strong>Start New Assessment &gt; Create New Assessment</strong>.</p>
<p>Carefully fill in the requested information. Where you're asked for a Tier 2 notification PDF, upload the email you downloaded earlier.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708599483954/704e9bf1-ac25-414d-b3f7-dcda721a82fd.png?auto=compress,format&amp;format=webp&amp;auto=compress,format&amp;format=webp" alt="Image" width="1748" height="1240" loading="lazy">
<em>Starting New CASA Assessment of the Addon</em></p>
<p><strong>Note</strong>: For Google Workspace Addon, the type of application is <strong>Local App</strong>.</p>
<p><strong>Caution</strong>: As shown in the image above, even though "<strong>Project ID</strong>" is asked in the input field, they are asking for the <strong>Project Number</strong> included in the email, not the <strong>Project ID</strong> of your Google Cloud Console project.</p>
<p>After you carefully fill in the details and submit the form, you'll arrive at a new screen – <strong>Application Screening</strong> – where there are two things that you should download:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1708599799176/52631f9a-8719-472e-997e-2169d1063127.png?auto=compress,format&amp;format=webp&amp;auto=compress,format&amp;format=webp" alt="Image" width="915" height="163" loading="lazy">
<em>Download Scan Central Package and Setup Instructions</em></p>
<ol>
<li>Fortify Scan Central Package.</li>
<li>Instructions for compressing your application's source code for the initial assessment.</li>
</ol>
<h2 id="heading-step-2-download-and-setup-java-jdk"><strong>Step 2</strong> – <strong>Download and Setup Java JDK</strong></h2>
<p>To use the Scan Central package as mentioned in the instructions, a minimum of JDK 11 is required. </p>
<p>To set up the path for the Java environment on Linux, I followed <a target="_blank" href="https://stackoverflow.com/a/73414921/6163929">these instructions</a> on Stack Overflow.</p>
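<p>Before moving on, it can help to confirm that the installed Java really is version 11 or newer. The following is a minimal sketch, not part of the official instructions: the helper names <code>java_major_version</code> and <code>check_jdk</code> are made up for illustration, and the sketch assumes the version line looks like the ones OpenJDK and Oracle JDK usually print.</p>
<pre><code class="lang-bash"># Print the major Java version found in one line of `java -version` output.
# Handles both the legacy "1.8.0_392" scheme and the modern "11.0.21" scheme.
java_major_version() {
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  major=${version%%.*}
  if [ "$major" = "1" ]; then
    rest=${version#1.}
    major=${rest%%.*}
  fi
  printf '%s\n' "$major"
}

# Report whether the detected version meets the JDK 11 minimum.
check_jdk() {
  major=$(java_major_version "$1")
  case "$major" in
    ''|*[!0-9]*) echo "could not detect a Java version"; return 1 ;;
  esac
  if [ "$major" -ge 11 ]; then
    echo "ok: JDK $major"
  else
    echo "too old: JDK $major (Scan Central needs 11 or newer)"
  fi
}
</code></pre>
<p>Java prints its version to standard error, so one way to use the helper is <code>check_jdk "$(java -version 2&gt;&amp;1 | head -n 1)"</code>. If your environment prints extra lines before the version line, pass the line that contains the word <code>version</code> instead.</p>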
<h2 id="heading-step-3-setup-path-for-scan-central"><strong>Step 3</strong> – <strong>Setup Path for Scan Central</strong></h2>
<p>Now let's add the path to the Scan Central in our system.</p>
<p>In your terminal, open the <code>.bashrc</code> file with <code>nano ~/.bashrc</code> (the file lives in your home directory, so <code>sudo</code> isn't needed). Append the following line at the end of the file:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># SCAN Central </span>
<span class="hljs-comment"># Path looks like following</span>
<span class="hljs-comment">#/home/&lt;username&gt;/Fortify_ScanCentral_Client_22.2.1_x64/bin</span>

 <span class="hljs-built_in">export</span> PATH=<span class="hljs-variable">$PATH</span>:&lt;Path To bin folder <span class="hljs-keyword">in</span> Scan Central&gt;
</code></pre>
<p>Save (CTRL+S) and exit (CTRL + X) the file.</p>
<p>Open <code>.profile</code> with <code>nano ~/.profile</code> and add the same line as above. Then open a new terminal (or run <code>source ~/.bashrc</code>) so the change takes effect. To make sure the setup was successful, check the version of Scan Central with the command <code>scancentral -version</code>.</p>
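<p>The export line can also be written so that loading the file more than once does not append the same folder again. This is a minimal sketch, assuming the Scan Central client was extracted into your home directory. The helper name <code>add_to_path</code> and the variable <code>SCANCENTRAL_BIN</code> are made up for illustration, and the folder name follows the example path shown earlier, so adjust it to your own download.</p>
<pre><code class="lang-bash"># Append a directory to PATH only if it is not already present.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
  export PATH
}

# Assumed install location; replace it with your own bin folder.
SCANCENTRAL_BIN="$HOME/Fortify_ScanCentral_Client_22.2.1_x64/bin"
add_to_path "$SCANCENTRAL_BIN"
</code></pre>
<p>Because the function checks before appending, having it in both <code>.bashrc</code> and <code>.profile</code> leaves only one copy of the folder on your <code>PATH</code>.</p>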
<h2 id="heading-step-4-packaging-source-code-for-assessment"><strong>Step 4</strong> – <strong>Packaging Source Code for Assessment</strong></h2>
<p>To package the source code for your Google Workspace Addon, go to the root directory of your project. If you're following the instruction manual, go to the section for JavaScript code packaging.</p>
<p>In the root directory, run either of the following commands:</p>
<pre><code class="lang-bash"><span class="hljs-comment">#cmd 1 </span>
scancentral package -bt none -o myPackage.zip
<span class="hljs-comment"># or cmd 2</span>
scancentral package -bt none --scan-node-modules -o myPackage.zip
</code></pre>
<p><strong>Note</strong>: The command <code>scancentral.bat</code> is for Windows users.</p>
<p>As mentioned in the instructions, command 2 increases the size of the package and is not necessary for Node.js or Angular. I created a Workspace Addon, so I don't have a <code>node_modules</code> folder in my source code.</p>
<p>After that, you'll see a compressed package named <strong>myPackage.zip</strong> in the directory where you ran the packaging command.</p>
<h2 id="heading-step-5-initiate-the-scan-process"><strong>Step 5</strong> – <strong>Initiate the Scan Process</strong></h2>
<p>After packaging, go back to the CASA portal and click on your assessment ID link in the list, and open up the <strong>Application Screening</strong> window. Here:</p>
<ol>
<li>Click the <strong>Begin Scan Process</strong> button.</li>
<li>Upload the package you just compressed.</li>
<li>Click the <strong>Upload File &amp; Initiate Scan</strong> button.</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/casa-form-filling--2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Upload Source Code To Fortify Scan</em></p>
<p>This starts the automated scan of your application, which is the beginning of the assessment for your Addon.</p>
<p><strong>Reminder</strong>: As I've personally experienced, if your source code uses the <code>Math.random()</code> method, then the auto-scanner will not pass your code.</p>
<p>If you pass this phase, the manual verification process begins, and you'll have to fill in the survey forms. Go to this <a target="_blank" href="https://lookerstudio.google.com/u/0/reporting/757d8fab-9682-4b74-9acc-58efb5e3081c/page/p_ana6axxq4c?s=tug3GYx0bmg">link</a> to see the questions that'll be asked in the CASA survey. There, choose the <strong>Local App</strong> option for App Type for a Google Workspace Add-on. Keep in mind that the questions change based on the answers you provide.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Alright, I hope this blog post saves you the time and confusion that I went through when I was trying to get my <a target="_blank" href="https://workspace.google.com/marketplace/app/scan_me/613697866593">addon</a> assessed. And please don't give up midway through the evaluation, otherwise your months of hard work will be in vain.</p>
<p>My addon, <a target="_blank" href="https://workspace.google.com/marketplace/app/scan_me/613697866593">Scan Me</a>, scans your Google Drive and prepares an audit report in a spreadsheet file of your choosing in your Google Drive. It makes it extremely easy to analyze your Google Drive from one place, and it also offers a free quota. If you're looking for an addon like this, I hope you'll give it a try.</p>
<p>This is Nibesh Khadka. Have a good day.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Google Cloud Associate Cloud Engineer Certification Study Course – Pass the Exam With This Free 20 Hour Course ]]>
                </title>
                <description>
                    <![CDATA[ By Andrew Brown What is the Google Cloud Associate Cloud Engineer? The Associate Cloud Engineer also commonly referred to as the ACE is the associate level certification by Google Cloud.  They key difference between the Associate Cloud Engineer and t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/google-cloud-digital-leader-certification-study-course-pass-the-exam-with-this-free-20-hour-course/</link>
                <guid isPermaLink="false">66d45daa787a2a3b05af4390</guid>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 02 May 2022 13:32:13 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/05/ace.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Andrew Brown</p>
<h2 id="heading-what-is-the-google-cloud-associate-cloud-engineer">What is the Google Cloud Associate Cloud Engineer?</h2>
<p>The Associate Cloud Engineer, also commonly referred to as the ACE, is the associate-level certification by Google Cloud.</p>
<p>The key difference between the Associate Cloud Engineer and the Google Cloud Digital Leader is that the Associate Cloud Engineer is a hands-on certification that requires practical experience with Google Cloud's core services.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/04/google-cloud-roadmap.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Google Cloud has only a single associate-level certification. This is different from AWS, which has three, and Azure, which has more than six associate-level certifications.</p>
<p>Having only a single associate-level certification means that you are expected to know more than you would for, say, the AWS Solutions Architect Associate.</p>
<p>However, in both its role-based title and the knowledge it covers, the Associate Cloud Engineer more closely reflects entry-level jobs in the cloud industry.</p>
<h2 id="heading-what-is-a-cloud-engineer">What is a Cloud Engineer?</h2>
<p>Cloud Engineer is the most popular IT/DevOps role within the cloud industry. It is an entry-level role where you are tasked with designing, implementing, and maintaining cloud infrastructure.</p>
<p>This cloud role lets you do a bit of everything, and from there you can further your cloud career by specializing in an advanced role such as Solutions Architect, DevOps Engineer, Data Engineer, and many more.</p>
<p>To obtain a cloud role, it's recommended that you hold an associate-level certification from a Cloud Service Provider (CSP) like Google Cloud and build your own cloud project to showcase in your portfolio.</p>
<h2 id="heading-overview-of-the-google-cloud-associate-cloud-engineer">Overview of the Google Cloud Associate Cloud Engineer</h2>
<p>The exam guide divides the exam questions into the following five sections.</p>
<ul>
<li>Section 1. Setting up a cloud solution environment</li>
<li>Section 2. Planning and configuring a cloud solution</li>
<li>Section 3. Deploying and implementing a cloud solution </li>
<li>Section 4. Ensuring successful operation of a cloud solution</li>
<li>Section 5. Configuring access and security</li>
</ul>
<h2 id="heading-how-do-you-get-certified"><strong>How do you get Certified?</strong></h2>
<p>Google uses <a target="_blank" href="https://www.freecodecamp.org/news/p/bc0cbbc3-1a26-43ac-a07c-e158c256003e/Kryterion">Kryterion</a> as its test center. You can take the exam in-person or online.</p>
<p>There are <strong>50 multiple-choice</strong> and multiple-select questions, and you have to score <strong>70% to pass</strong>.</p>
<p>You get 2 hours to complete the exam.</p>
<p>The Google Cloud Associate Cloud Engineer exam costs <strong>$125 USD</strong>.</p>
<h2 id="heading-can-i-simply-watch-the-videos-and-pass-the-exam"><strong>Can I simply watch the videos and pass the exam?</strong></h2>
<p>It is recommended that you complete the labs within your own Google Cloud account to ensure you completely understand each concept. Cloud Service Providers often change their UI, and the experience can vary based on the region where you launch services.</p>
<p>Students that perform labs within their own accounts are more likely to be able to recall information.</p>
<p>Practice exams are strongly recommended after you finish the study course. ExamPro has a <a target="_blank" href="https://www.exampro.co/gcp-ace">full free practice exam</a> to help you study for the final exam.</p>
<p>Head on over <a target="_blank" href="https://youtu.be/jpno8FSqpc8">to freeCodeCamp's YouTube channel</a> to start working through the full 20-hour course.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/jpno8FSqpc8" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Google BigQuery Beginner's Guide – How to Analyze Large Datasets ]]>
                </title>
                <description>
                    <![CDATA[ By Ambreen Khan Gone are the days of storing your data in a CSV file or an Excel spreadsheet. If you want to quickly analyze millions of data rows in seconds, BigQuery is the way to go. In this getting started guide, we'll learn about BigQuery and ho... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/google-bigquery-beginners-guide/</link>
                <guid isPermaLink="false">66d45d99aad1510d0766b5e3</guid>
                
                    <category>
                        <![CDATA[ bigquery ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Google ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 12 Jul 2021 11:26:39 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/07/web-1.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Ambreen Khan</p>
<p>Gone are the days of storing your data in a CSV file or an Excel spreadsheet. If you want to quickly analyze millions of data rows in seconds, BigQuery is the way to go.</p>
<p>In this getting started guide, we'll learn about BigQuery and how we can use it to query and analyze data.</p>
<h2 id="heading-what-is-bigquery">What is BigQuery?</h2>
<p>BigQuery is an enterprise data warehouse used by many companies that need a fully managed, cloud-based solution for their massive datasets.</p>
<p>BigQuery's serverless architecture allows you to quickly execute standard SQL queries and analyze millions of data rows in seconds. You can store your data either in Google Cloud Storage, as files in buckets, or in BigQuery storage.</p>
<p>BigQuery also has excellent integrations with other GCP products, like Dataflow and Data Studio, which make it a great choice for data analytics tasks.</p>
<h2 id="heading-before-you-begin">Before You Begin:</h2>
<p>We are going to query tables in a public dataset that Google has provided to try out BigQuery using the Google Cloud Platform. Therefore, this guide assumes that:</p>
<ul>
<li>You have access to <a target="_blank" href="https://cloud.google.com/free/?gclid=CjwKCAjw55-HBhAHEiwARMCsziVtllCq8mRIWlXVVztmn6HkzAlkuajtZeYMInLQmykNGfbEjz2tfRoCFs0QAvD_BwE&amp;gclsrc=aw.ds">Google Cloud Platform</a>.</li>
<li>You have already created a <a target="_blank" href="https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui#before-you-begin">Google Cloud project</a>.</li>
<li>The BigQuery sandbox environment is up and running.</li>
</ul>
<h2 id="heading-how-to-access-a-public-dataset">How to Access a Public Dataset</h2>
<p>A public dataset is available to the general public through the <a target="_blank" href="https://cloud.google.com/public-datasets">Google Cloud Public Dataset Program</a>. We'll use a Hacker News dataset that contains all stories and comments from Hacker News from its launch in 2006 to present. Let's get started.</p>
<p>Navigate to the <a target="_blank" href="https://console.cloud.google.com/marketplace/product/y-combinator/hacker-news">Hacker News dataset</a> and click the VIEW DATASET button. It will take you to the Google Cloud Platform login screen. Log in to your account, and the BigQuery editor window will open with the dataset.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-51.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-the-bigquery-interface-is-organized">How the BigQuery Interface Is Organized</h2>
<p>BigQuery is structured as a hierarchy with 4 levels:</p>
<ul>
<li>Projects: Top-level containers that store the data</li>
<li>Datasets: Within projects, datasets allow you to organize your data and hold one or more tables of data</li>
<li>Tables: Within datasets, tables hold actual data.</li>
<li>Jobs: Tasks performed on data, such as running queries, loading data, and exporting data.</li>
</ul>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-53.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><strong>Note:</strong> While working with tables, you'll also notice that:</p>
<ul>
<li>Tables are broken out by day, meaning that you will need to use a wildcard (<code>*</code>) to pull a larger date range.</li>
<li>There is also an “intraday” table that will give you data for the last 24 hours.</li>
</ul>
<h2 id="heading-how-to-check-the-table-schema">How to Check the Table Schema</h2>
<p>Click on the table name. This will allow you to see what columns are in the table, as well as some buttons to perform various operations on the table.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-55.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-preview-the-data">How to Preview the Data</h2>
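<p>Use the preview button to get a sample of some rows in the table. <a target="_blank" href="https://cloud.google.com/bigquery/docs/best-practices-costs#avoid_select_">Don’t do a <code>SELECT *</code> in BigQuery</a> just to look at the data, because queries are billed by the amount of data they scan, while the preview is free:</p>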
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-56.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-query-big-data">How to Query Big Data</h2>
<p>SQL statements are used to perform various database tasks, such as querying data, creating tables, and updating databases.</p>
<h3 id="heading-basic-queries">Basic Queries</h3>
<p>Basic queries contain the following components:</p>
<ul>
<li><code>SELECT</code> (required): identifies the columns to be included in the query</li>
<li><code>FROM</code> (required): the table that contains the columns in the SELECT statement</li>
<li><code>WHERE</code>: a condition for filtering records</li>
<li><code>ORDER BY</code>: sorts the result set in ascending or descending order</li>
<li><code>GROUP BY</code>: how to aggregate data in the result set</li>
</ul>
<h2 id="heading-how-to-compose-a-query-in-bigquery">How to Compose a Query in BigQuery</h2>
<p>For our first query, let’s find the top 5 domains shared on Hacker News in 2021 so far (query executed on July 9th, 2021).</p>
<p>Click the <strong>Compose New query</strong> button. It will open the editor tab.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-41.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Write your first query as below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> REGEXP_EXTRACT(<span class="hljs-keyword">url</span>, <span class="hljs-string">'//([^/]*)/?'</span>) <span class="hljs-keyword">domain</span>, <span class="hljs-keyword">COUNT</span>(*) total
<span class="hljs-keyword">FROM</span> <span class="hljs-string">`bigquery-public-data.hacker_news.full`</span>
<span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">url</span>!=<span class="hljs-string">''</span> <span class="hljs-keyword">AND</span> <span class="hljs-keyword">EXTRACT</span>(<span class="hljs-keyword">YEAR</span> <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">timestamp</span>)=<span class="hljs-number">2021</span>
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">domain</span> <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> total <span class="hljs-keyword">DESC</span> <span class="hljs-keyword">LIMIT</span> <span class="hljs-number">5</span>
</code></pre>
<p>You'll notice that BigQuery validates your query as you write it. If the query is valid, then a check mark appears along with the amount of data that the query will process. This helps you determine the cost of running the query.</p>
<p>If the query is invalid, then an exclamation point appears along with an error message.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-59.png" alt="Image" width="600" height="400" loading="lazy"></p>
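<p>The amount of data reported by the validator can be turned into a rough cost estimate. This is a minimal sketch, assuming on-demand pricing that is billed per TiB of data processed. The function name <code>estimate_query_cost</code> is made up for illustration, and the price of 5 USD per TiB is only a placeholder, so check the current BigQuery pricing page for the real rate and any free monthly allowance.</p>
<pre><code class="lang-bash"># Estimate the on-demand cost of a query from the bytes it would process.
# $1 = bytes processed (as reported by the query validator)
# $2 = price per TiB in USD (an assumed value, not an official rate)
estimate_query_cost() {
  awk -v bytes="$1" -v price="$2" \
    'BEGIN { printf "%.4f\n", bytes / (1024 ^ 4) * price }'
}

# Example: a query that would process exactly 1 TiB at the assumed price.
estimate_query_cost 1099511627776 5
</code></pre>
<p>Selecting fewer columns or filtering on fewer tables lowers the number of bytes processed, which is the main lever for reducing this estimate.</p>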
<p>To run this query, click on the Run button. In a few seconds, you should see results returned from the query:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-60.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>You can click on the <strong>JSON</strong> tab if you want the results in JSON format. You'll also find interesting details under the <strong>Execution details</strong> tab.</p>
<h2 id="heading-how-to-query-multiple-tables-using-a-wildcard-table"><strong>How to Query Multiple Tables Using a Wildcard Table</strong></h2>
<p>Wildcard tables enable you to query multiple tables using concise SQL statements. A wildcard table represents a union of all the tables that match the wildcard expression:</p>
<p><code>FROM `tablename.stories_*`</code></p>
<h3 id="heading-tablesuffix-pseudo-column">_TABLE_SUFFIX Pseudo Column</h3>
<p>Queries with wildcard tables support the <code>_TABLE_SUFFIX</code> pseudo column in the <code>WHERE</code> clause. To restrict a query so that it scans only a specified set of tables, use the <code>_TABLE_SUFFIX</code> pseudo column in a <code>WHERE</code> clause with a condition that is a constant expression.</p>
<p>Using <code>_TABLE_SUFFIX</code> can greatly reduce the number of bytes scanned, which helps reduce the cost of running your queries.</p>
<h3 id="heading-how-to-get-data-by-providing-a-date-range">How to Get Data by Providing a Date Range</h3>
<pre><code>WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL <span class="hljs-number">36</span> MONTH))
    AND
    FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL <span class="hljs-number">1</span> DAY))
</code></pre><h3 id="heading-how-to-use-unnest-to-flatten-the-date">How to Use UNNEST to Flatten the Data</h3>
<p>To convert an <code>ARRAY</code> into a set of rows, also known as "flattening," use the <a target="_blank" href="https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#unnest_operator"><code>UNNEST</code></a> operator. <code>UNNEST</code> takes an <code>ARRAY</code> and returns a table with a single row for each element in the <code>ARRAY</code>:</p>
<pre><code>SELECT * FROM UNNEST ([<span class="hljs-string">'Ambreen'</span>, <span class="hljs-string">'Abdul'</span>, <span class="hljs-string">'Adam'</span>, <span class="hljs-string">'David'</span>]) AS names;
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-45.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-save-and-share-queries">How to Save and Share Queries</h2>
<p>You can save your queries for later use. There are 3 types of saved queries:</p>
<ul>
<li><strong>Private:</strong> Private saved queries are visible only to the user who creates them.</li>
<li><strong>Project-level:</strong> Project-level saved queries are visible to members of the predefined BigQuery IAM roles with the required <a target="_blank" href="https://cloud.google.com/bigquery/docs/saving-sharing-queries#permissions">permissions</a>.</li>
<li><strong>Public:</strong> Public saved queries are visible to anyone with a link to the query.</li>
</ul>
<h2 id="heading-summary">Summary</h2>
<p>BigQuery is much more sophisticated than what we explored in this simple tutorial. You can also export Firebase Analytics data to BigQuery, which will let you run sophisticated ad hoc queries against your analytics data. </p>
<p>And with BigQuery ML, you can create and execute machine learning models using standard SQL queries. </p>
<p>If you’re feeling excited and want to learn more about BigQuery, check out the links below.</p>
<h2 id="heading-resources">Resources:</h2>
<ul>
<li><a target="_blank" href="https://support.google.com/analytics/answer/4419694?hl=en#zippy=%2Cin-this-article">BigQuery cookbook</a> </li>
<li><a target="_blank" href="https://cloud.google.com/bigquery/docs/querying-wildcard-tables#filtering_selected_tables_using_table_suffix">Filtering selected tables using _TABLE_SUFFIX</a> </li>
<li><a target="_blank" href="https://firebase.googleblog.com/2017/03/bigquery-tip-unnest-function.html">BigQuery Tip: The UNNEST Function</a></li>
<li><a target="_blank" href="https://towardsdatascience.com/bigquery-unnest-how-to-work-with-nested-data-in-bigquery-f27006a64c3">BigQuery UNNEST: How to work with nested data in BigQuery</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Google Cloud Platform Tutorial: From Zero to Hero with GCP ]]>
                </title>
                <description>
                    <![CDATA[ By Sergio Fuentes Navarro Do you have the knowledge and skills to design a mobile gaming analytics platform that collects, stores, and analyzes large amounts of bulk and real-time data? Well, after reading this article, you will. I aim to take you fr... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/google-cloud-platform-from-zero-to-hero/</link>
                <guid isPermaLink="false">66d460ee33b83c4378a5183d</guid>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Google Cloud Platform ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 09 Oct 2020 15:24:46 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/10/gcp.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Sergio Fuentes Navarro</p>
<p>Do you have the knowledge and skills to design a mobile gaming analytics platform that collects, stores, and analyzes large amounts of bulk and real-time data?</p>
<p>Well, after reading this article, you will.</p>
<p>I aim to take you <strong>from zero to hero in Google Cloud Platform (GCP) in just one article</strong>. I will show you how to:</p>
<ul>
<li>Get started with a GCP account for free</li>
<li>Reduce costs in your GCP infrastructure</li>
<li>Organize your resources</li>
<li>Automate the creation and configuration of your resources</li>
<li>Manage operations: logging, monitoring, tracing, and so on.</li>
<li>Store your data</li>
<li>Deploy your applications and services</li>
<li>Create networks in GCP and connect them with your on-premise networks</li>
<li>Work with Big Data, AI, and Machine Learning</li>
<li>Secure your resources</li>
</ul>
<p>Once I have explained all the topics in this list, I will share with you a solution to the system I described. </p>
<p>If you do not understand some parts of it, you can go back to the relevant sections. And if that is not enough, visit the links to the documentation that I have provided.</p>
<p><strong>Are you up for a challenge?</strong> I have selected a few questions from old GCP Professional Certification exams. They will test your understanding of the concepts explained in this article.</p>
<p>I recommend trying to solve both the design and the questions <em>on your own</em>, going back to the guide if necessary. Once you have an answer, compare it to the proposed solution.</p>
<p>Try to go beyond what you are reading and ask yourself what would happen if requirement X changed:</p>
<ul>
<li>Batch vs streaming data</li>
<li>Regional vs global solution</li>
<li>A small number of users vs huge volume of users</li>
<li>Latency is not a problem vs real-time applications</li>
</ul>
<p>And any other scenarios you can think of.</p>
<p>At the end of the day, you are not paid just for what you know but for your thought process and the decisions you make. That is why it is vitally important that you exercise this skill.</p>
<p>At the end of the article, I'll provide more resources and next steps if you want to continue learning about GCP.</p>
<h2 id="heading-how-to-get-started-with-google-cloud-platform-for-free"><strong>How to get started with Google Cloud Platform for free</strong></h2>
<p>GCP currently offers a <a target="_blank" href="https://cloud.google.com/free">3-month free trial</a> with $300 (USD) of free credit. You can use it to get started, play around with GCP, and run experiments to decide if it is the right option for you.</p>
<p><strong>You will NOT be charged at the end of your trial</strong>. You will be notified and your services will stop running unless you decide to upgrade your plan.</p>
<p>I strongly recommend using this trial to practice. To learn, you have to <strong>try things on your own</strong>, face problems, break things, and fix them. It doesn't matter how good this guide is (or the official documentation for that matter) if you do not try things out.</p>
<h2 id="heading-why-would-you-migrate-your-services-to-google-cloud-platform"><strong>Why would you migrate your services to Google Cloud Platform?</strong></h2>
<p>Consuming resources from GCP, like storage or computing power, provides the following benefits:</p>
<ul>
<li>No need to spend a lot of money upfront for hardware</li>
<li>No need to upgrade your hardware and migrate your data and services every few years</li>
<li>Ability to scale to adjust to the demand, paying only for the resources you consume</li>
<li>Ability to create proofs of concept quickly, since resources can be provisioned very fast</li>
<li>Secure and manage your <a target="_blank" href="https://cloud.google.com/endpoints/docs">APIs</a></li>
<li>Not just infrastructure: data analytics and machine learning services are available in GCP</li>
</ul>
<p>GCP makes it easy to experiment and use the resources you need in an economical way.</p>
<h2 id="heading-how-to-optimize-your-vms-to-reduce-costs-in-gcp">How to optimize your VMs to reduce costs in GCP</h2>
<p>In general, you will only be <strong>charged for the time your instances are running</strong>. Google will not charge you for stopped instances. However, if they consume resources, like disks or reserved IPs, you might incur charges.</p>
<p>Here are some ways you can optimize the cost of running your applications in GCP.</p>
<h3 id="heading-custom-machine-types">Custom Machine Types</h3>
<p>GCP provides different <a target="_blank" href="https://cloud.google.com/compute/docs/machine-types">machine families</a> with predefined amounts of RAM and CPUs:</p>
<ul>
<li><strong>General-purpose</strong>. Offers the best price-performance ratio for a variety of workloads.</li>
<li><strong>Memory-optimized</strong>. Ideal for memory-intensive workloads. They offer more memory per core than other machine types.</li>
<li><strong>Compute-optimized</strong>. They offer the highest performance per core and are optimized for compute-intensive workloads.</li>
<li><strong>Shared-core</strong>. These machine types timeshare a physical core. This can be a cost-effective method for running small applications.</li>
</ul>
<p>You can also create a custom machine type with exactly the amount of RAM and CPUs you need.</p>
<h3 id="heading-preemptible-vms">Preemptible VMs</h3>
<p>You can use preemptible virtual machines to save up to 80% of your costs. They are ideal <strong>for fault-tolerant, non-critical applications</strong>.  You can save the progress of your job in a persistent disk using a shut-down script to continue where you left off.</p>
<p>Google may stop your instances at any time (with a 30-second warning) and will always stop them after 24 hours.</p>
<p>To reduce the chances of getting your VMs shut down, Google recommends:</p>
<ul>
<li>Using <strong>many small instances</strong> and</li>
<li>Running your jobs during <strong>off-peak</strong> times.</li>
</ul>
<p><strong>Note</strong>: Start-up and shut-down scripts apply to non-preemptible VMs as well. You can use them to control the behavior of your machine when it starts or stops. For instance, to install software, download data, or back up logs.</p>
<p>There are two options to define these scripts:</p>
<ul>
<li>When you are creating your instance in the Google Console, there is a field to paste your code.</li>
<li>Using the metadata server URL to point your instance to a script stored in Google Cloud Storage.</li>
</ul>
<p>The latter is preferred because it makes it easier to create many instances and to manage the script.</p>
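<p>To make the idea of "continuing where you left off" concrete, here is a minimal sketch of a job that checkpoints its progress after every item, so that a preemption loses almost no work. The checkpoint path, the <code>run_job</code> function, and the <code>stop_after</code> parameter (which only simulates a preemption) are my own assumptions for illustration. On a real VM, the checkpoint file would live on a persistent disk.</p>
<pre><code class="lang-python">import json
import os
import tempfile

# Hypothetical checkpoint location; on a VM this would be on a persistent disk.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "job-checkpoint.json")


def load_progress(path=CHECKPOINT):
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_item"]


def save_progress(next_item, path=CHECKPOINT):
    """Write the checkpoint atomically so a sudden stop cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_item": next_item}, f)
    os.replace(tmp, path)


def run_job(items, path=CHECKPOINT, stop_after=None):
    """Process items starting at the last checkpoint.

    stop_after simulates a preemption after that many items.
    """
    start = load_progress(path)
    done = []
    for i in range(start, len(items)):
        if stop_after is not None and len(done) == stop_after:
            break
        done.append(items[i].upper())  # stand-in for the real work
        save_progress(i + 1, path)
    return done
</code></pre>
<p>Running <code>run_job</code> a second time with the same checkpoint path picks up at the first unprocessed item instead of starting over.</p>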
<h3 id="heading-sustained-use-discounts">Sustained Use Discounts</h3>
<p>The longer you use your virtual machines (and Cloud SQL instances), the higher the discount - up to 30%. Google does this automatically for you.</p>
<h3 id="heading-committed-use-discounts">Committed Use Discounts</h3>
<p>You can get up to 57% discount if you commit to a certain amount of CPU and RAM resources for a period of 1 to 3 years.</p>
<p>To estimate your costs, use the <a target="_blank" href="https://cloud.google.com/products/calculator">Price Calculator</a>. To avoid surprises on your bill, you can also create <a target="_blank" href="https://cloud.google.com/billing/docs/how-to/budgets">budget alerts</a>.</p>
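<p>As a rough illustration of how these discounts compare, the sketch below applies the best-case percentages mentioned in this section to a single VM. The hourly price of $0.10 and the 730 hours per month are assumptions I picked for the arithmetic, not real GCP prices. Use the Price Calculator for actual figures.</p>
<pre><code class="lang-python">HOURS_PER_MONTH = 730  # assumed average number of hours in a month


def monthly_cost(hourly_price, discount=0.0, hours=HOURS_PER_MONTH):
    """Cost of one VM running for `hours`, after a fractional discount."""
    return round(hourly_price * hours * (1 - discount), 2)


price = 0.10  # hypothetical on-demand price in USD per hour

on_demand = monthly_cost(price)
sustained = monthly_cost(price, discount=0.30)    # best-case sustained use
committed = monthly_cost(price, discount=0.57)    # best-case committed use
preemptible = monthly_cost(price, discount=0.80)  # best-case preemptible VM

for name, cost in [("on-demand", on_demand), ("sustained", sustained),
                   ("committed", committed), ("preemptible", preemptible)]:
    print(name, cost)
</code></pre>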
<h2 id="heading-how-to-manage-resources-in-gcphttpscloudgooglecomresource-managerdocs"><strong><a target="_blank" href="https://cloud.google.com/resource-manager/docs">How to manage resources in GCP</a></strong></h2>
<p>In this section, I will explain how you can manage and administer your Google Cloud resources.</p>
<h3 id="heading-resource-hierarchyhttpscloudgooglecomresource-managerdocscloud-platform-resource-hierarchy"><strong><a target="_blank" href="https://cloud.google.com/resource-manager/docs/cloud-platform-resource-hierarchy">Resource Hierarchy</a></strong></h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/resource-manager.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>There are four types of resources that can be managed through Resource Manager:</p>
<ul>
<li><strong>The organization resource</strong>. It is the root node in the resource hierarchy. It represents an organization, for example, a company.</li>
<li><strong>The folder resource</strong>. Folders provide an extra level of isolation between projects. For example, you can create a folder for each department in a company.</li>
<li><strong>The project resource</strong>. Projects are <strong>required</strong> to create resources. For example, you can use separate projects for production and development environments.</li>
<li><strong>Resources</strong>. Virtual machines, database instances, load balancers, and so on.</li>
</ul>
<p>There are <strong>quotas</strong> that limit the maximum number of resources you can create to prevent unexpected spikes in billing. However, most quotas can be increased by opening a support ticket.</p>
<p>Resources in GCP follow a <strong>hierarchy</strong> via a parent/child relationship, similar to a traditional file system, where:</p>
<ul>
<li><strong>Permissions are inherited as we descend the hierarchy</strong>. For example, permissions granted at the organization level will be propagated to all the folders and projects.</li>
<li>More permissive parent policies always overrule more restrictive child policies: a child cannot take away access that was granted higher up.</li>
</ul>
<p>This hierarchical organization helps you manage common aspects of your resources, such as access control and configuration settings.</p>
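<p>The inheritance rules above can be sketched as a walk up the hierarchy that collects every role granted along the way. The node names, members, and the <code>effective_roles</code> helper are hypothetical. This is a toy model of the idea, not how Cloud IAM is implemented.</p>
<pre><code class="lang-python"># Hypothetical hierarchy: each node points to its parent (None for the organization).
PARENT = {
    "org": None,
    "folder-engineering": "org",
    "project-prod": "folder-engineering",
}

# Hypothetical roles granted directly at each level, per member.
GRANTS = {
    "org": {"alice@example.com": {"roles/viewer"}},
    "project-prod": {
        "alice@example.com": {"roles/storage.admin"},
        "bob@example.com": {"roles/viewer"},
    },
}


def effective_roles(node, member):
    """Union of the roles granted to `member` on `node` and on all its ancestors."""
    roles = set()
    while node is not None:
        roles |= GRANTS.get(node, {}).get(member, set())
        node = PARENT[node]
    return roles


print(sorted(effective_roles("project-prod", "alice@example.com")))
</code></pre>
<p>Because the result is a union, nothing granted at the organization level can be removed further down, which is why broad roles should be granted sparingly near the top.</p>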
<p>You can create super admin accounts that have access to every resource in your organization. Since they are very powerful, make sure you follow <a target="_blank" href="https://cloud.google.com/resource-manager/docs/super-admin-best-practices">Google's best practices</a>.</p>
<h3 id="heading-labels">Labels</h3>
<p>Labels are key-value pairs you can use to organize your resources in GCP. Once you attach a label to a resource (for instance, to a virtual machine), you can filter based on that label. This is also useful for breaking down your bill by label.</p>
<p>Some common use cases:</p>
<ul>
<li>Environments: prod, test, and so on.</li>
<li>Team or product owners</li>
<li>Components: backend, frontend, and so on.</li>
<li>Resource state: active, archive, and so on.</li>
</ul>
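<p>To show why labels are handy for billing, here is a small sketch that groups costs by the value of a label. The resources, label values, and costs are made up for the example. In practice you would run this kind of grouping on your exported billing data.</p>
<pre><code class="lang-python">from collections import defaultdict

# Hypothetical resources with labels and a monthly cost in USD.
resources = [
    {"name": "vm-1", "labels": {"env": "prod", "component": "backend"}, "cost": 120.0},
    {"name": "vm-2", "labels": {"env": "test", "component": "backend"}, "cost": 30.0},
    {"name": "bucket-1", "labels": {"env": "prod", "component": "frontend"}, "cost": 12.5},
]


def cost_by_label(resources, key):
    """Sum the cost of all resources, grouped by the value of label `key`."""
    totals = defaultdict(float)
    for resource in resources:
        value = resource["labels"].get(key, "unlabeled")
        totals[value] += resource["cost"]
    return dict(totals)


print(cost_by_label(resources, "env"))
print(cost_by_label(resources, "component"))
</code></pre>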
<h3 id="heading-labels-vs-network-tags">Labels vs Network tags</h3>
<p>These two similar concepts seem to generate some confusion. I have summarized the differences in this table:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Labels</td><td>Network tags</td></tr>
</thead>
<tbody>
<tr>
<td>Applied to any GCP resource</td><td>Applied only to VPC resources</td></tr>
<tr>
<td>Just organize resources</td><td>Affect how resources work (for example, through the application of firewall rules)</td></tr>
</tbody>
</table>
</div><h3 id="heading-cloud-iamhttpscloudgooglecomiamdocs"><strong><a target="_blank" href="https://cloud.google.com/iam/docs">Cloud IAM</a></strong></h3>
<p>Simply put, Cloud IAM controls <strong>who can do what on which resource</strong>. A resource can be a virtual machine, a database instance, a user, and so on. </p>
<p>It is important to note that permissions are not directly assigned to users. Instead, they are bundled into <em>roles</em>, which are assigned to <em>members</em>. A <em>policy</em> is a collection of one or more bindings of a set of members to a role.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/IAM.svg" alt="Image" width="600" height="400" loading="lazy"></p>
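<p>A policy is easier to picture as data. The sketch below uses the bindings layout that Cloud IAM policies have in JSON (a list of bindings, each tying one role to a list of members). The member addresses and the two helper functions are assumptions made up for this example.</p>
<pre><code class="lang-python"># A policy: a list of bindings, each tying one role to a set of members.
# The members below are hypothetical.
policy = {
    "bindings": [
        {"role": "roles/storage.admin", "members": ["user:alice@example.com"]},
        {"role": "roles/viewer", "members": ["group:devs@example.com"]},
    ]
}


def add_member(policy, role, member):
    """Add `member` to the binding for `role`, creating the binding if needed."""
    for binding in policy["bindings"]:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy


def roles_of(policy, member):
    """List the roles that are bound directly to `member` in this policy."""
    return sorted(b["role"] for b in policy["bindings"] if member in b["members"])


add_member(policy, "roles/viewer", "user:bob@example.com")
print(roles_of(policy, "user:bob@example.com"))
</code></pre>
<p>Notice that permissions never appear in the policy itself. They only reach a member through a role.</p>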
<h3 id="heading-identities">Identities</h3>
<p>In a GCP project, identities are represented by Google accounts, created outside of GCP, and defined by an email address (not necessarily @gmail.com). There are different types:</p>
<ul>
<li><strong>Google accounts</strong>. To represent people: engineers, administrators, and so on.</li>
<li><strong>Service accounts</strong>. Used to identify non-human users: applications, services, virtual machines, and others. The authentication process is defined by <em>account keys</em>, which can be managed by Google or by users (only for user-created service accounts).</li>
<li><strong>Google Groups</strong>. Collections of Google accounts and service accounts.</li>
<li><strong>G Suite Domain</strong>. The type of account you can use to identify organizations. If your organization is already using <a target="_blank" href="https://en.wikipedia.org/wiki/Active_Directory">Active Directory</a>, it can be synchronized with Cloud IAM using <a target="_blank" href="https://cloud.google.com/identity/docs">Cloud Identity</a>.</li>
<li><strong>allAuthenticatedUsers</strong>. To represent any authenticated user in GCP.</li>
<li><strong>allUsers</strong>. To represent anyone, authenticated or not.</li>
</ul>
<p>Regarding service accounts, some of Google's best practices include:</p>
<ul>
<li>Not using the default service account</li>
<li>Applying the <a target="_blank" href="https://en.wikipedia.org/wiki/Principle_of_least_privilege">Principle of Least Privilege</a>. For instance:
<ul>
<li>Restrict who can act as a service account</li>
<li>Grant only the minimum set of permissions that the account needs</li>
<li>Create service accounts for each service, only with the permissions the account needs</li>
</ul>
</li>
</ul>
<h3 id="heading-roles">Roles</h3>
<p>A role is a <strong>collection of permissions</strong>. There are three types of roles:</p>
<ul>
<li><strong>Primitive</strong>. Original GCP roles that apply to the entire project. There are three concentric roles: <strong>Viewer</strong>, <strong>Editor</strong>, and <strong>Owner</strong>. Editor contains Viewer and Owner contains Editor.</li>
<li><strong>Predefined</strong>. Provides access to specific services, for example, storage.admin.</li>
<li><strong>Custom</strong>. Lets you create your own roles, combining the specific permissions you need.</li>
</ul>
<p>When assigning roles, follow the principle of least privilege, too. In general, <strong>prefer predefined over primitive roles</strong>.</p>
<h3 id="heading-cloud-deployment-managerhttpscloudgooglecomdeployment-managerdocs"><strong><a target="_blank" href="https://cloud.google.com/deployment-manager/docs">Cloud Deployment Manager</a></strong></h3>
<p>Cloud Deployment Manager automates repeatable tasks like provisioning, configuration, and deployments for any number of machines. </p>
<p>It is Google's <em>Infrastructure as Code</em> service, similar to Terraform - although you can deploy only GCP resources. It is used by <a target="_blank" href="https://cloud.google.com/marketplace">GCP Marketplace</a> to create pre-configured deployments.</p>
<p>You define your configuration in YAML files, listing the resources (created through API calls) you want to create and their properties. Resources are defined by their <strong>name</strong> (VM-1, disk-1), <strong>type</strong> (compute.v1.disk, compute.v1.instance) and <strong>properties</strong> (zone:europe-west4-a, boot:false).</p>
<p>To increase performance, resources are deployed in parallel. Therefore, you need to <strong>specify any dependencies using references</strong>. For instance, do not create virtual machine VM-1 until the persistent disk disk-1 has been created. In contrast, Terraform would figure out the dependencies on its own.</p>
<p>You can modularize your configuration files using templates so that they can be independently updated and shared. Templates can be defined in Python or Jinja2. The contents of your templates will be inlined in the configuration file that references them.</p>
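<p>Here is a trimmed-down sketch of what a Python template can look like. Deployment Manager calls the template's <code>GenerateConfig(context)</code> function and expects a dictionary of resources back. The resource names are my own, and the properties are deliberately incomplete (a real instance also needs a machine type, a network interface, and a source image for the boot disk), so treat it as an illustration of references rather than a deployable template.</p>
<pre><code class="lang-python">from types import SimpleNamespace


def GenerateConfig(context):
    """Return a persistent disk and a VM that depends on it."""
    zone = context.properties["zone"]
    disk_name = context.env["deployment"] + "-disk"
    vm_name = context.env["deployment"] + "-vm"

    resources = [
        {
            "name": disk_name,
            "type": "compute.v1.disk",
            "properties": {"zone": zone, "sizeGb": 10},
        },
        {
            "name": vm_name,
            "type": "compute.v1.instance",
            "properties": {
                "zone": zone,
                "disks": [{
                    "boot": True,
                    # The reference makes the VM wait until the disk exists.
                    "source": "$(ref." + disk_name + ".selfLink)",
                }],
            },
        },
    ]
    return {"resources": resources}


# Stand-in for the context object that Deployment Manager would pass in.
fake_context = SimpleNamespace(
    env={"deployment": "demo"},
    properties={"zone": "europe-west4-a"},
)
config = GenerateConfig(fake_context)
print(config["resources"][1]["properties"]["disks"][0]["source"])
</code></pre>
<p>The <code>$(ref.NAME.selfLink)</code> string is what expresses the dependency: without it, both resources could be created in parallel.</p>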
<p>Deployment Manager will create a manifest containing your original configuration, any templates you have imported, and the expanded list of all the resources you want to create.</p>
<h2 id="heading-cloud-operations-formerly-stackdriverhttpscloudgooglecomstackdriverdocs"><strong><a target="_blank" href="https://cloud.google.com/stackdriver/docs">Cloud Operations (formerly Stackdriver)</a></strong></h2>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/stackd.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Cloud Operations provides a set of tools for monitoring, logging, debugging, error reporting, profiling, and tracing resources in GCP, AWS, and even on-premises.</p>
<h3 id="heading-cloud-logging"><strong>Cloud Logging</strong></h3>
<p>Cloud Logging is GCP's centralized solution for real-time log management. For each of your projects, it allows you to store, search, analyze, monitor, and alert on logging data:</p>
<ul>
<li>By default, data will be stored for a certain period of time. The retention period varies depending on the type of log. You cannot retrieve logs after they have passed this retention period.</li>
<li>Logs can be exported for different purposes. To do this, you create a <strong>sink</strong>, which is composed of a <strong>filter</strong> (to select what you want to log) and a <strong>destination</strong>: Google Cloud Storage (GCS) for long term retention, BigQuery for analytics, or Pub/Sub to stream it into other applications.</li>
<li>You can create log-based metrics in Cloud Monitoring and even get alerted when something goes wrong.</li>
</ul>
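<p>A sink boils down to three fields: a name, a filter, and a destination. The sketch below builds that structure for the three destination types listed above. The <code>build_sink</code> helper, the project ID, and the target names are hypothetical. It only assembles a dictionary and does not call the Cloud Logging API.</p>
<pre><code class="lang-python"># Destination formats for the three sink targets described above.
DESTINATIONS = {
    "gcs": "storage.googleapis.com/{target}",
    "bigquery": "bigquery.googleapis.com/projects/{project}/datasets/{target}",
    "pubsub": "pubsub.googleapis.com/projects/{project}/topics/{target}",
}


def build_sink(name, kind, target, log_filter, project="my-project"):
    """Assemble a sink definition: which entries to export, and where to."""
    destination = DESTINATIONS[kind].format(project=project, target=target)
    return {"name": name, "filter": log_filter, "destination": destination}


# Export audit log entries to a (hypothetical) BigQuery dataset for analysis.
sink = build_sink(
    name="audit-to-bigquery",
    kind="bigquery",
    target="audit_logs",
    log_filter='logName:"cloudaudit.googleapis.com"',
)
print(sink)
</code></pre>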
<p>A log is a named collection of <strong>log entries</strong>. Each log entry records a status or an event and includes the name of its log, for example, compute.googleapis.com/activity. There are two main types of logs:</p>
<p><strong>First, User Logs:</strong></p>
<ul>
<li>These are generated by your applications and services. </li>
<li>They are written to Cloud Logging using the Cloud Logging API, client libraries, or <a target="_blank" href="https://cloud.google.com/logging/docs/agent">logging agents</a> installed on your virtual machines. </li>
<li>The logging agents can also stream logs from common third-party applications like MySQL, MongoDB, or Tomcat.</li>
</ul>
<p><strong>Second, Security logs, divided into:</strong></p>
<ul>
<li>Audit logs, for administrative changes, system events, and data access to your resources. For example, they record who created a particular database instance or when a <a target="_blank" href="https://cloud.google.com/compute/docs/instances/live-migration">live migration</a> took place. Data access logs must be explicitly enabled and may incur additional charges. The rest are enabled by default, cannot be disabled, and are free of charge.</li>
<li>Access Transparency logs, for actions taken by Google staff when they access your resources, for example, to investigate an issue you reported to the support team.</li>
</ul>
<h4 id="heading-vpc-flow-logs">VPC Flow Logs</h4>
<p>They are specific to VPC networks (which I will introduce later). VPC flow logs record a <strong>sample of network flows</strong> sent from and received by VM instances, which you can later access in Cloud Logging.</p>
<p>They can be used for network monitoring, usage analysis, forensics, real-time security analysis, and expense optimization.</p>
<p><strong>Note</strong>: you may want to log your billing data for analysis. In this case, you do <em>not</em> create a sink. You can directly export your reports to BigQuery.</p>
<h3 id="heading-cloud-monitoring"><strong>Cloud Monitoring</strong></h3>
<p>Cloud Monitoring lets you monitor the performance of your applications and infrastructure and visualize it in dashboards. You can create <a target="_blank" href="https://cloud.google.com/monitoring/uptime-checks">uptime checks</a> to detect resources that are down and get <a target="_blank" href="https://cloud.google.com/monitoring/alerts">alerts</a> based on these checks so that you can fix problems in your environment. You can monitor resources in GCP, AWS, and even on-premises.</p>
<p>It is recommended to create a separate project for Cloud Monitoring since it can keep track of resources across multiple projects. </p>
<p>Also, it is recommended to install a monitoring agent in your virtual machines to send application metrics (including many third-party applications) to Cloud Monitoring. Otherwise, Cloud Monitoring will only display CPU, disk traffic, network traffic, and uptime metrics.</p>
<h4 id="heading-alerts">Alerts</h4>
<p>To receive alerts, you must declare an <strong>alerting policy</strong>. An alerting policy defines the <strong>conditions</strong> under which a service is considered unhealthy. When the conditions are met, a new incident will be created and notifications will be sent (via email, Slack, SMS, PagerDuty, etc). </p>
<p>A policy belongs to an individual workspace, which can contain a maximum of 500 policies.</p>
<h4 id="heading-trace">Trace</h4>
<p>Trace helps <strong>find bottlenecks in your services</strong>. You can use this service to figure out how long it takes to handle a request, which microservice takes the longest to respond, where to focus to reduce the overall latency, and so on.</p>
<p>It is enabled by default for applications running on the Google App Engine (GAE) Standard environment, but it can also be used for applications running on GCE, GKE, and Google App Engine Flexible.</p>
<h4 id="heading-error-reporting">Error Reporting</h4>
<p>Error Reporting will aggregate and display errors produced in services written in Go, Java, Node.js, PHP, Python, Ruby, or .NET running on GCE, GKE, GAE, Cloud Functions, or Cloud Run.</p>
<h4 id="heading-debug">Debug</h4>
<p>Debug lets you inspect the application's state without stopping your service. It currently supports Java, Go, Node.js, and Python. It is automatically integrated with GAE but can also be used on GCE, GKE, and Cloud Run.</p>
<h4 id="heading-profile">Profile</h4>
<p>Profiler continuously gathers CPU usage and memory-allocation information from your applications. To use it, you need to install a profiling agent.</p>
<h2 id="heading-how-to-store-data-in-gcp"><strong>How to store data in GCP</strong></h2>
<p>In this section, I will cover Google Cloud Storage (for any type of data, including files, images, video, and so on), the different database services available in GCP, and how to decide which storage option works best for you.</p>
<h3 id="heading-google-cloud-storage-gcshttpscloudgooglecomstoragedocs"><strong><a target="_blank" href="https://cloud.google.com/storage/docs">Google Cloud Storage (GCS)</a></strong></h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/gcs.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>GCS is Google's <strong>storage service for unstructured data</strong>: pictures, videos, files, scripts, database backups, and so on. </p>
<p>Objects are placed in <strong>buckets</strong>, from which they inherit permissions and storage classes. </p>
<p><a target="_blank" href="https://cloud.google.com/storage/docs/storage-classes">Storage classes</a> provide different SLAs for storing your data to minimize costs for your use case. A bucket's storage class can be changed (under some restrictions), but it will affect new objects added to the bucket only.</p>
<p>In addition to Google's console, you can interact with GCS from your command line, using <a target="_blank" href="https://cloud.google.com/storage/docs/gsutil">gsutil</a>. You can specify:</p>
<ul>
<li><strong>Multithreaded uploads</strong> when you need to upload a large number of small files. The command looks like gsutil -m cp files gs://my-bucket</li>
<li><strong>Parallel composite uploads</strong> when you need to upload large files. For more details and restrictions, visit this <a target="_blank" href="https://cloud.google.com/storage/docs/gsutil/commands/cp#parallel-composite-uploads">link</a>.</li>
</ul>
<p>Another option to upload files to GCS is <a target="_blank" href="https://cloud.google.com/storage-transfer/docs">Storage Transfer Service (STS)</a>, a service that imports data to a GCS bucket from:</p>
<ul>
<li>An AWS S3 bucket</li>
<li>A resource that can be accessed through HTTP(S)</li>
<li>Another Google Cloud Storage bucket</li>
</ul>
<p>If you need to upload huge amounts of data (from hundreds of terabytes up to one petabyte) consider <a target="_blank" href="https://cloud.google.com/transfer-appliance/docs/2.2">Data Transfer Appliance</a>: ship your data to a Google facility. Once they have uploaded the data to GCS, the process of <a target="_blank" href="https://cloud.google.com/transfer-appliance/docs/2.0/data-rehydration">data rehydration</a> reconstitutes the files so that they can be accessed again.</p>
<h4 id="heading-object-lifecycle-managementhttpscloudgooglecomstoragedocslifecycle"><a target="_blank" href="https://cloud.google.com/storage/docs/lifecycle">Object lifecycle management</a></h4>
<p>You can define rules that determine what will happen to an object (for example, whether it is archived or deleted) when a certain condition is met.</p>
<p>For example, you could define a policy to automatically change the storage class of an object from Standard to Nearline after 30 days and to delete it after 180 days.</p>
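<p>This is how rules are defined. Note that this particular configuration does something different from the example above: it deletes live objects older than 30 days, deletes any object version that has at least 2 newer versions, and deletes noncurrent versions older than 180 days.</p>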
<pre><code class="lang-js">{
   <span class="hljs-string">"lifecycle"</span>:{
      <span class="hljs-string">"rule"</span>:[
         {
            <span class="hljs-string">"action"</span>:{
               <span class="hljs-string">"type"</span>:<span class="hljs-string">"Delete"</span>
            },
            <span class="hljs-string">"condition"</span>:{
               <span class="hljs-string">"age"</span>:<span class="hljs-number">30</span>,
               <span class="hljs-string">"isLive"</span>:<span class="hljs-literal">true</span>
            }
         },
         {
            <span class="hljs-string">"action"</span>:{
               <span class="hljs-string">"type"</span>:<span class="hljs-string">"Delete"</span>
            },
            <span class="hljs-string">"condition"</span>:{
               <span class="hljs-string">"numNewerVersions"</span>:<span class="hljs-number">2</span>
            }
         },
         {
            <span class="hljs-string">"action"</span>:{
               <span class="hljs-string">"type"</span>:<span class="hljs-string">"Delete"</span>
            },
            <span class="hljs-string">"condition"</span>:{
               <span class="hljs-string">"age"</span>:<span class="hljs-number">180</span>,
               <span class="hljs-string">"isLive"</span>:<span class="hljs-literal">false</span>
            }
         }
      ]
   }
}
</code></pre>
<p>It will be applied through gsutil or a REST API call. Rules can also be created through the Google Console.</p>
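<p>The Standard-to-Nearline policy described earlier in this section can be written in the same format. The sketch below generates it with Python, using the SetStorageClass action and the matchesStorageClass condition from the lifecycle configuration format. The <code>lifecycle_config</code> helper and its parameter names are my own.</p>
<pre><code class="lang-python">import json


def lifecycle_config(to_nearline_after_days, delete_after_days):
    """Build a lifecycle configuration: move to Nearline first, delete later."""
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                    "condition": {
                        "age": to_nearline_after_days,
                        "matchesStorageClass": ["STANDARD"],
                    },
                },
                {
                    "action": {"type": "Delete"},
                    "condition": {"age": delete_after_days},
                },
            ]
        }
    }


config = lifecycle_config(to_nearline_after_days=30, delete_after_days=180)
print(json.dumps(config, indent=2))
</code></pre>
<p>You could save the printed JSON to a file and apply it to a bucket in the same way as the configuration above.</p>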
<h4 id="heading-permissions-in-gcs">Permissions in GCS</h4>
<p>In addition to IAM roles, you can use Access Control Lists (ACLs) to manage access to the resources in a bucket. </p>
<p>Use IAM roles when possible, but remember that <strong>ACLs</strong> grant access to buckets and <strong>individual objects</strong>, while <strong>IAM roles are project or bucket wide</strong> permissions. Both methods work in tandem.</p>
<p>To grant temporary access to users outside of GCP, use <a target="_blank" href="https://cloud.google.com/storage/docs/access-control/signed-urls">Signed URLs</a>.</p>
<h4 id="heading-bucket-lock">Bucket lock</h4>
<p>Bucket locks allow you to enforce a <strong>minimum retention period</strong> for objects in a bucket. You may need this for auditing or legal reasons.</p>
<p><strong>Once a bucket is locked, it cannot be unlocked</strong>. To remove a locked bucket, you first need to delete all the objects in it, which you can only do after they have all reached the retention period specified by the retention policy. Only then can you delete the bucket.</p>
<p>You can include the retention policy when you are creating the bucket or add a retention policy to an existing bucket (it retroactively applies to existing objects in the bucket too).</p>
<p>Fun fact: the maximum retention period is 100 years.</p>
<h3 id="heading-relational-managed-databases-in-gcp"><strong>Relational Managed Databases in GCP</strong></h3>
<p>Cloud SQL and Cloud Spanner are two managed database services available in GCP. If you do not want to deal with all the work necessary to maintain a database online, they are a great option. Alternatively, you can always spin up a virtual machine and manage your own database.</p>
<h4 id="heading-cloud-sqlhttpscloudgooglecomsqldocs"><a target="_blank" href="https://cloud.google.com/sql/docs">Cloud SQL</a></h4>
<p>Cloud SQL provides access to a managed MySQL or PostgreSQL database instance in GCP. Each instance is limited to a <strong>single region</strong> and has a <strong>maximum capacity of 30 TB</strong>. </p>
<p>Google will take care of the installation, backups, scaling, monitoring, failover, and read replicas. For availability reasons, replicas must be defined in the same region but a different zone from the primary instances.</p>
<p>Data can be easily imported (first uploading the data to Google Cloud Storage and then to the instance) and exported using SQL dumps or CSV files. Data can be compressed to reduce costs (you can directly import .gz files). For "lift and shift" migrations, this is a great option.</p>
<p>If you need global availability or more capacity, consider using Cloud Spanner.</p>
<h4 id="heading-cloud-spannerhttpscloudgooglecomspannerdocs"><a target="_blank" href="https://cloud.google.com/spanner/docs">Cloud Spanner</a></h4>
<p>Cloud Spanner is globally available and can scale (horizontally) very well. </p>
<p>These two features let it support use cases that Cloud SQL cannot, but they also make it more expensive. Cloud Spanner is not an option for lift and shift migrations.</p>
<h3 id="heading-nosql-managed-databases-in-gcp"><strong>NoSQL Managed Databases in GCP</strong></h3>
<p>Similarly, GCP provides two managed NoSQL databases, Bigtable and Datastore, as well as an in-memory database service, Memorystore.</p>
<h4 id="heading-datastorehttpscloudgooglecomdatastoredocs"><a target="_blank" href="https://cloud.google.com/datastore/docs">Datastore</a></h4>
<p>Datastore is a completely no-ops, highly-scalable document database ideal for web and mobile applications: game states, product catalogs, real-time inventory, and so on. It's great for:</p>
<ul>
<li>User profiles - mobile apps</li>
<li>Game save states</li>
</ul>
<p>By default, Datastore has a built-in <strong>index</strong> that improves performance on simple queries. You can create your own indexes, called <strong>composite indexes</strong>, defined in YAML format.</p>
<p>If you need extreme throughput (huge number of reads/writes per second), use Bigtable instead.</p>
<h4 id="heading-bigtablehttpscloudgooglecombigtabledocs"><a target="_blank" href="https://cloud.google.com/bigtable/docs">Bigtable</a></h4>
<p>Bigtable is a NoSQL database ideal for analytical workloads where you can expect a very high volume of writes, read latencies in the milliseconds, and the need to store terabytes to petabytes of information. It's great for:</p>
<ul>
<li>Financial analysis</li>
<li>IoT data</li>
<li>Marketing data</li>
</ul>
<p>Bigtable requires the creation and configuration of your nodes (as opposed to the fully-managed Datastore or BigQuery). You can add or remove nodes to your cluster with zero downtime. The simplest way to interact with Bigtable is the command-line tool <a target="_blank" href="https://cloud.google.com/bigtable/docs/cbt-overview">cbt</a>.</p>
<p>Bigtable's performance will depend on the design of your database schema.</p>
<ul>
<li>You can only define one key per row and must keep all the information associated with an entity in the same row. Think of it as a hash table.</li>
<li>Tables are sparse: if there is no information associated with a column, no space is required.</li>
<li>To make reads more efficient, try to store related entities in adjacent rows.</li>
</ul>
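<p>To illustrate these schema guidelines, the sketch below models a table as a dictionary keyed by row key. Bigtable keeps rows sorted by key, so a key that starts with the entity ID places all the rows of one device next to each other. The <code>device#timestamp</code> key layout and the sample data are assumptions for the example, and a plain dictionary is of course only a stand-in for a real table.</p>
<pre><code class="lang-python">def row_key(device_id, timestamp):
    """Build a single row key that groups readings by device, then by time."""
    return device_id + "#" + timestamp


# Sparse rows: each row only stores the columns it actually has.
rows = {
    row_key("thermostat-7", "2020-10-01T12:00:00"): {"temp": "21.5"},
    row_key("camera-2", "2020-10-01T12:00:05"): {"motion": "1"},
    row_key("thermostat-7", "2020-10-01T12:05:00"): {"temp": "21.7", "battery": "88"},
}


def scan_prefix(rows, prefix):
    """Return the rows whose key starts with `prefix`, in key order."""
    return [(key, rows[key]) for key in sorted(rows) if key.startswith(prefix)]


readings = scan_prefix(rows, "thermostat-7#")
for key, columns in readings:
    print(key, columns)
</code></pre>
<p>Had the key started with the timestamp instead, the readings of one device would be scattered across the table and every query for a device would have to scan all rows.</p>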
<p>Since this topic is worth an article on its own, I recommend you read the <a target="_blank" href="https://cloud.google.com/bigtable/docs/performance">documentation</a>.</p>
<h4 id="heading-memorystorehttpscloudgooglecommemorystoredocs"><a target="_blank" href="https://cloud.google.com/memorystore/docs">Memorystore</a></h4>
<p>Memorystore provides a managed version of Redis and Memcached (in-memory databases), resulting in very fast performance. Instances are regional, like Cloud SQL, and have a capacity of up to 300 GB.</p>
<h3 id="heading-how-to-choose-your-database"><strong>How to choose your database</strong></h3>
<p>Google loves decision trees. This one will help you choose the right database for your projects. For unstructured data, consider GCS or process it using Dataflow (discussed later).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/choose-db.svg" alt="Image" width="600" height="400" loading="lazy"></p>
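<p>If you prefer code to diagrams, the recommendations given in the previous sections can be condensed into a few lines. This sketch only encodes the rules of thumb stated in the text of this article, not every branch of the diagram, and the function and its parameters are my own simplification.</p>
<pre><code class="lang-python">def choose_storage(structured, relational=False, global_scale=False,
                   in_memory=False, extreme_throughput=False):
    """Pick a GCP storage service from a few yes/no questions."""
    if not structured:
        return "Cloud Storage"
    if in_memory:
        return "Memorystore"
    if relational:
        # Cloud SQL is regional; Cloud Spanner adds global availability and scale.
        return "Cloud Spanner" if global_scale else "Cloud SQL"
    # NoSQL: Datastore by default, Bigtable for very high read/write volumes.
    return "Bigtable" if extreme_throughput else "Datastore"


print(choose_storage(structured=True, relational=True))
print(choose_storage(structured=True, extreme_throughput=True))
print(choose_storage(structured=False))
</code></pre>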
<h2 id="heading-how-does-networking-work-in-gcp"><strong>How does networking work in GCP?</strong></h2>
<h3 id="heading-virtual-private-cloud-vpc-see-the-docs-herehttpscloudgooglecomvpcdocs"><strong>Virtual Private Cloud (VPC) - <a target="_blank" href="https://cloud.google.com/vpc/docs/">see the docs here</a></strong></h3>
<p>You can use the same network infrastructure that Google uses to run its services: YouTube, Search, Maps, Gmail, Drive, and so on. </p>
<p>Google infrastructure is divided into:</p>
<ul>
<li><strong>Regions</strong>: Independent geographical areas, at least 100 miles apart from each other, where Google hosts datacenters. Each region consists of 3 or more zones. For example, us-central1.</li>
<li><strong>Zones</strong>: Multiple individual datacenters within a region. For example, us-central1-a.</li>
<li><strong>Edge Points of Presence</strong>: points of connection between Google's network and the rest of the internet.</li>
</ul>
<p>GCP infrastructure is designed in a way that all traffic between regions travels through a global private network, resulting in better security and performance.</p>
<p>On top of this infrastructure, you can build networks for your resources, Virtual Private Clouds. They are <strong>software-defined networks</strong>, where all the traditional network concepts apply:</p>
<ul>
<li><a target="_blank" href="https://cloud.google.com/vpc/docs/vpc#subnets_vs_subnetworks">Subnets</a>. Logical partitions of a network defined using <a target="_blank" href="https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing">CIDR notation</a>. They belong to one region only but can span multiple zones. If you have multiple subnets (including your on-premise networks if they are connected to GCP), make sure the CIDR ranges do not overlap.</li>
<li><strong>IP addresses</strong>. Can be internal (for private communication within GCP) or external (to communicate with the rest of the internet). For external IP addresses, you can use an <strong>ephemeral IP</strong> or pay for a <strong>static IP</strong>. In general, you need an external IP address to connect to GCP services. However, in some cases, you can configure <a target="_blank" href="https://cloud.google.com/vpc/docs/private-access-options">private access</a> for instances that only have an internal IP.</li>
<li><a target="_blank" href="https://cloud.google.com/vpc/docs/firewalls">Firewall rules</a>, to allow or deny traffic to your virtual machines, both incoming (ingress) and outgoing (egress). By default, all ingress traffic is denied and all egress traffic is allowed. Firewall rules are defined at the VPC level but they <strong>apply to individual instances or groups of instances</strong> using <strong>network tags</strong> or <strong>IP ranges</strong>.<br><strong>Common issue</strong>: If you know your VMs are working correctly but you cannot access them through HTTP(s) or cannot SSH into them, have a look at your firewall rules.</li>
</ul>
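<p>Before creating subnets, you can check that their CIDR ranges do not overlap. The following is a minimal sketch using Python's standard <code>ipaddress</code> module. The ranges are made-up examples, and <code>overlapping_ranges</code> is a hypothetical helper, not part of any GCP tool.</p>

```python
import ipaddress
from itertools import combinations


def overlapping_ranges(cidrs):
    """Return every pair of CIDR ranges that share at least one address."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


# Hypothetical ranges: two GCP subnets and one on-premise network.
ranges = ["10.0.0.0/24", "10.0.1.0/24", "10.0.0.128/25"]
print(overlapping_ranges(ranges))
```

<p>Here the third range sits inside the first one, so that pair is reported. An empty list means the ranges can coexist.</p>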
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/vpc.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>You can create <strong>hybrid networks</strong> connecting your on-premise infrastructure to your VPC.</p>
<p>When you create a project, a <strong>default network</strong> will be created with subnets in each region (auto mode). You can delete this network, but you need to create at least one network to be able to create virtual machines. </p>
<p>You can also create your <strong>custom networks</strong>, where no subnets are created by default and you have full control over subnet creation (custom mode).</p>
<p>The main goal of a VPC is the <strong>separation of network resources</strong>. A GCP project is a way to organize resources and manage permissions. </p>
<p>Users of project A need permissions to access resources in project B. All users can access any VPC defined in any project to which they belong. Within the same VPC, resources in subnet 1 need to be granted access to resources in subnet 2.</p>
<p>In terms of IAM roles, there is a distinction between who can create network resources (Network admin, to create subnets, virtual machines, and so on) and who is responsible for the security of the resources (Security Admin, to create firewall rules, SSL certificates, and so on). </p>
<p>The Compute Instance Admin role combines both roles.</p>
<p>As usual, there are quotas and limits to what you can do in a VPC, amongst them:</p>
<ul>
<li>The maximum number of VPCs in a project.</li>
<li>The maximum number of virtual machines per VPC.</li>
<li>No broadcast or multicast.</li>
<li>VPCs cannot use IPv6 to communicate internally, although global load balancers support IPv6 traffic.</li>
</ul>
<h3 id="heading-how-to-share-resources-between-multiple-vpcs"><strong>How to share resources between multiple VPCs</strong></h3>
<h4 id="heading-shared-vpchttpscloudgooglecomvpcdocsshared-vpc"><a target="_blank" href="https://cloud.google.com/vpc/docs/shared-vpc">Shared VPC</a></h4>
<p>Shared VPCs are a way to share resources between different projects within the same organization. This allows you to control billing and manage access to the resources in different projects, following the principle of least privilege. Otherwise you'd have to put all the resources in a single project.</p>
<p>To design a shared VPC, projects fall under three categories:</p>
<ul>
<li><strong>Host project</strong>. It is the project that hosts the common resources. There can only be one host project.</li>
<li><strong>Service project</strong>. A project that can access the resources in the host project. A project cannot be both host and service.</li>
<li><strong>Standalone project</strong>. Any project that does not make use of the shared VPC.</li>
</ul>
<p>You will only be able to communicate between resources created <strong>after</strong> you define your host and service projects. Resources that existed before this will not be part of the shared VPC.</p>
<h4 id="heading-vpc-network-peeringhttpscloudgooglecomvpcdocsvpc-peering"><a target="_blank" href="https://cloud.google.com/vpc/docs/vpc-peering">VPC Network Peering</a></h4>
<p>Shared VPCs can be used when all the projects belong to the same organization. However, if:</p>
<ul>
<li>You need <strong>private communication</strong> across VPCs.</li>
<li>The VPCs are in projects that may belong to <strong>different organizations</strong>.</li>
<li>You want <strong>decentralized</strong> control, that is, no need to define host projects, service projects, and so on.</li>
<li>You want to reuse existing resources.</li>
</ul>
<p>VPC Network peering is the right solution.</p>
<p>In the next section, I will discuss how to connect your VPC(s) with networks outside of GCP.</p>
<h2 id="heading-how-to-connect-on-premise-and-gcp-infrastructures">How to connect on-premise and GCP infrastructures</h2>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/interc.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>There are three options to connect your on-premise infrastructure to GCP:</p>
<ul>
<li>Cloud VPN</li>
<li>Cloud Interconnect</li>
<li>Cloud Peering</li>
</ul>
<p>Each of them has different capabilities, use cases, and prices, which I will describe in the following sections.</p>
<h3 id="heading-cloud-vpnhttpscloudgooglecomnetwork-connectivitydocsvpn"><a target="_blank" href="https://cloud.google.com/network-connectivity/docs/vpn">Cloud VPN</a></h3>
<p>With Cloud VPN, your traffic travels through the public internet over an encrypted tunnel. Each tunnel has a maximum capacity of 3 Gb per second and you can use a maximum of 8 for better performance. These two characteristics make VPN the cheapest option.</p>
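<p>To get a feel for these numbers, here is a rough back-of-the-envelope sketch. It assumes the per-tunnel capacity and tunnel limit quoted above and ignores protocol overhead, so treat the results as an upper bound. The function name is hypothetical.</p>

```python
import math

TUNNEL_GBPS = 3  # per-tunnel capacity quoted above
MAX_TUNNELS = 8  # maximum number of tunnels quoted above


def tunnels_needed(required_gbps):
    """Return how many tunnels cover the bandwidth, or None if VPN cannot."""
    count = max(1, math.ceil(required_gbps / TUNNEL_GBPS))
    return count if count <= MAX_TUNNELS else None


print(TUNNEL_GBPS * MAX_TUNNELS)  # aggregate ceiling in Gb per second
print(tunnels_needed(10))
print(tunnels_needed(30))
```

<p>When the function returns <code>None</code>, the required bandwidth is beyond what the tunnels can carry together and another connectivity option is worth a look.</p>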
<p>You can define two types of routes between your VPC and your on-premise networks:</p>
<ul>
<li><strong>Static routes</strong>. You have to manually define and update them, for example when you add a new subnet. This is not the preferred option.</li>
<li><strong>Dynamic routes</strong>. Routes are automatically handled (defined and updated) for you using <a target="_blank" href="https://cloud.google.com/network-connectivity/docs/router">Cloud Router</a>. This is the preferred option when <a target="_blank" href="https://en.wikipedia.org/wiki/Border_Gateway_Protocol">BGP</a> is available.</li>
</ul>
<p>Your traffic gets encrypted and decrypted by VPN Gateways (in GCP, they are regional resources). </p>
<p>To have a more robust connection, consider using multiple VPN gateways and tunnels. In case of failure, this redundancy guarantees that traffic will still flow.</p>
<h3 id="heading-cloud-interconnecthttpscloudgooglecomnetwork-connectivitydocsinterconnect"><a target="_blank" href="https://cloud.google.com/network-connectivity/docs/interconnect">Cloud Interconnect</a></h3>
<p>With Cloud VPN, traffic travels through the public internet. With Cloud Interconnect, there is a <strong>direct physical connection</strong> between your on-premises network and your VPC. This option will be more expensive but will provide the best performance.</p>
<p>There are two types of interconnect available, depending on how you want your connection to GCP to materialize:</p>
<ul>
<li><strong>Dedicated interconnect</strong>. There is "a direct cable" connecting your infrastructure and GCP. This is the fastest option, with a capacity of 10 to 200 Gb per second. However, it is not available everywhere: at the time of this writing, only in 62 locations in the world.</li>
<li><strong>Partner interconnect</strong>. You connect through a service provider. This option is more geographically available, but it is not as fast as a dedicated interconnect: from 50 Mb per second to 10 Gb per second.</li>
</ul>
<h3 id="heading-cloud-peeringhttpscloudgooglecomvpcdocsusing-vpc-peering"><a target="_blank" href="https://cloud.google.com/vpc/docs/using-vpc-peering">Cloud Peering</a></h3>
<p>Cloud Peering is not a GCP service, but you can use it to connect your network to Google's network and access services like YouTube, Drive, or GCP services. </p>
<p>A common use case is when you need to connect to Google but don't want to do it over the public internet.</p>
<h3 id="heading-other-networking-services"><strong>Other networking services</strong></h3>
<h4 id="heading-load-balancers-lbhttpscloudgooglecomload-balancingdocsload"><a target="_blank" href="https://cloud.google.com/load-balancing/docs/load">Load Balancers (LB)</a></h4>
<p>In GCP, load balancers are pieces of software that distribute user requests among a group of instances. </p>
<p>A load balancer may have multiple backends associated with it, having rules to decide the appropriate backend for a given request.</p>
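<p>One way to picture such rules is a mapping from URL path prefixes to backends. The sketch below is only an illustration of the idea: the backend names are hypothetical and this is not how you configure a load balancer in GCP.</p>

```python
# Hypothetical routing rules: path prefix to backend name.
URL_MAP = {
    "/news": "news-backend",
    "/videos": "videos-backend",
}
DEFAULT_BACKEND = "web-backend"


def pick_backend(path):
    """Return the backend whose prefix matches the path, longest prefix first."""
    for prefix in sorted(URL_MAP, key=len, reverse=True):
        # Match the prefix itself or anything below it, but not "/newsletter".
        if path == prefix or path.startswith(prefix + "/"):
            return URL_MAP[prefix]
    return DEFAULT_BACKEND


print(pick_backend("/news/today"))
print(pick_backend("/about"))
```

<p>Requests that match no rule fall through to a default backend, which keeps the routing total: every request ends up somewhere.</p>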
<p>There are different types of load balancers. They differ in the type of traffic (HTTP vs TCP/UDP - Layer 7 or Layer 4), whether they handle external or internal traffic, and whether their scope is regional or global:</p>
<ul>
<li><strong>HTTP(s)</strong>. Global LB that handles HTTP(s) requests, distributing traffic to multiple regions based on user location (to the closest region with available instances) or URL maps (the LB can be configured to forward requests to <em>URL/news</em> to a backend service and <em>URL/videos</em> to a different one). It can receive both IPv4 and IPv6 traffic (IPv6 traffic is terminated at the LB level and proxied as IPv4 to the backends) and has native support for WebSockets.</li>
<li><strong>SSL Proxy LB</strong>. <strong>Global</strong> LB that handles encrypted TCP traffic, managing SSL certificates for you.</li>
<li><strong>TCP Proxy LB</strong>. <strong>Global</strong> LB that handles unencrypted TCP traffic. Like SSL Proxy LB, by default, it will not preserve the client's IP, but this can be changed.</li>
<li><strong>Network Load Balancer</strong>. Regional LB that handles TCP/UDP external traffic, based on IP address and port.</li>
<li><strong>Internal Load Balancer</strong>. Like a Network LB, but for internal traffic.</li>
</ul>
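<p>The list above can be condensed into a small decision function. This is a simplified sketch that only encodes the distinctions mentioned in this section (protocol, internal vs external, encrypted or not, regional vs global). The function and parameter names are my own, and real projects may have requirements it does not cover.</p>

```python
def choose_load_balancer(protocol, internal=False, encrypted=False, regional=False):
    """Map traffic characteristics to one of the load balancer types listed above."""
    protocol = protocol.upper()
    if protocol not in ("HTTP", "HTTPS", "TCP", "UDP"):
        raise ValueError(f"unsupported protocol: {protocol}")
    if internal:
        return "Internal Load Balancer"
    if protocol in ("HTTP", "HTTPS"):
        return "HTTP(S) Load Balancer"
    if protocol == "UDP" or regional:
        return "Network Load Balancer"
    # Remaining case: external TCP traffic with global scope.
    return "SSL Proxy LB" if encrypted else "TCP Proxy LB"


print(choose_load_balancer("https"))
print(choose_load_balancer("tcp", encrypted=True))
print(choose_load_balancer("udp"))
```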
<p>For the visual learners:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/lb.svg" alt="Image" width="600" height="400" loading="lazy"></p>
<h4 id="heading-cloud-dnshttpscloudgooglecomdnsdocs"><a target="_blank" href="https://cloud.google.com/dns/docs">Cloud DNS</a></h4>
<p>Cloud DNS is Google's managed <a target="_blank" href="https://en.wikipedia.org/wiki/Domain_Name_System">Domain Name System (DNS)</a> host, both for internal and external (public) traffic. It will map domain names, like the one in <a target="_blank" href="https://www.freecodecamp.org/">https://www.freecodecamp.org/</a>, to IP addresses. It is the only service in GCP with 100% SLA - it is available 100% of the time.</p>
<h4 id="heading-google-cloud-cdnhttpscloudgooglecomcdndocs"><a target="_blank" href="https://cloud.google.com/cdn/docs">Google Cloud CDN</a></h4>
<p>Cloud CDN is Google's <a target="_blank" href="https://en.wikipedia.org/wiki/Content_delivery_network">Content Delivery Network</a>. If you have data that does not change often (images, videos, CSS, etc.) it makes sense to cache it close to your users. Cloud CDN provides 90 Edge Points of Presence (POPs) to cache the data close to your end-users.</p>
<p>After the first request, static data can be stored in a POP, usually much closer to your user than your main servers. Thus, in subsequent requests, you can retrieve the data faster from the POP and reduce the load on your backend servers.</p>
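<p>The caching behavior described above can be sketched in a few lines. This toy model assumes content never expires and ignores cache headers. The class <code>EdgeCache</code> and the <code>origin</code> function are hypothetical stand-ins for a POP and your backend.</p>

```python
class EdgeCache:
    """Toy point of presence: serves cached copies, falls back to the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_requests = 0

    def get(self, path):
        if path not in self._store:
            # Cache miss: only now does the request reach the backend.
            self.origin_requests += 1
            self._store[path] = self._fetch(path)
        return self._store[path]


def origin(path):
    # Hypothetical origin server returning static content.
    return f"content of {path}"


pop = EdgeCache(origin)
pop.get("/logo.png")
pop.get("/logo.png")
print(pop.origin_requests)  # 1
```

<p>The second request for the same file never reaches the origin, which is where both the latency and the load savings come from.</p>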
<h2 id="heading-where-can-you-run-your-applications-in-gcp"><strong>Where can you run your applications in GCP?</strong></h2>
<p>I will present 4 places where your code can run in GCP:</p>
<ul>
<li>Google Compute Engine</li>
<li>Google Kubernetes Engine</li>
<li>App Engine</li>
<li>Cloud Functions</li>
</ul>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/gce.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><strong>Note</strong>: there is a 5th option: Firebase is Google's mobile platform that helps you quickly develop apps.</p>
<h3 id="heading-compute-engine-gcehttpscloudgooglecomcomputedocs"><strong><a target="_blank" href="https://cloud.google.com/compute/docs">Compute Engine (GCE)</a></strong></h3>
<p>Compute Engine allows you to spin up virtual machines in GCP. This section will be longer since GCE provides the infrastructure where GKE and GAE run.</p>
<p>In the introduction, I talked about the different types of VMs you can create in GCE. Now, I will cover where to store the data, how to back it up, and how to create instances with all the data and configuration you need.</p>
<h4 id="heading-where-to-store-your-vms-data-disks">Where to store your VM's data: disks</h4>
<p>Your data can be stored in <strong>Persistent disks</strong>, <strong>Local SSDs</strong>, or in <strong>Cloud Storage</strong>.</p>
<h5 id="heading-persistent-diskhttpscloudgooglecompersistent-disk"><strong><a target="_blank" href="https://cloud.google.com/persistent-disk">Persistent Disk</a></strong></h5>
<p>Persistent disks provide durable and reliable block storage. They are not local to the machine. Rather, they are network-attached, which has its pros and cons:</p>
<ul>
<li>Disks can be resized, attached, or detached from a VM even if the instance is in use.</li>
<li>They have high reliability.</li>
<li>Disks can outlive the instance: they can be kept after the instance is deleted.</li>
<li>If you need more space, simply attach more disks.</li>
<li>Larger disks will provide higher performance.</li>
<li>Being network-attached, they are less performant than local options. SSD persistent disks are also available for more demanding workloads.</li>
</ul>
<p>Every instance will need one boot disk and it must be of this type.</p>
<h5 id="heading-local-ssdhttpscloudgooglecomlocal-ssd"><strong><a target="_blank" href="https://cloud.google.com/local-ssd">Local SSD</a></strong></h5>
<p>Local SSDs are attached to a VM to which they provide high-performance ephemeral storage. As of now, you can attach up to eight 375GB local SSDs to the same instance. However, this data will be lost if the VM is killed.</p>
<p>Local SSDs can only be attached to a machine when it is created, but you can attach both local SSDs and persistent disks to the same machine.</p>
<p>Both types of disks are zonal resources.</p>
<h5 id="heading-cloud-storage"><strong>Cloud Storage</strong></h5>
<p>We have extensively covered GCS in a previous section. GCS is not a filesystem, but you can use <a target="_blank" href="https://cloud.google.com/storage/docs/gcs-fuse">GCS-Fuse</a> to mount GCS buckets as filesystems in Linux or macOS systems. You can also let apps download and upload data to GCS using standard filesystem semantics.</p>
<h4 id="heading-how-to-back-up-your-vms-data-snapshots">How to back up your VM's data: Snapshots</h4>
<p>Snapshots are backups of your disks. To reduce space, they are created incrementally:</p>
<ul>
<li>Back up 1 contains all your disk content</li>
<li>Back up 2 only contains the data that has changed since back up 1</li>
<li>Back up 3 only contains the data that has changed since back up 2, and so on</li>
</ul>
<p>This is enough to restore the state of your disk.</p>
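<p>To make the incremental idea concrete, here is a minimal sketch that models a disk as a dictionary of blocks. It is a conceptual illustration only: it ignores deleted blocks and says nothing about how Compute Engine stores snapshots internally. The function names are hypothetical.</p>

```python
def take_snapshot(disk, previous_state):
    """Return only the blocks that differ from the previously captured state."""
    return {
        block: data
        for block, data in disk.items()
        if previous_state.get(block) != data
    }


def restore(snapshots):
    """Rebuild the disk by applying the snapshot chain from oldest to newest."""
    disk = {}
    for snapshot in snapshots:
        disk.update(snapshot)
    return disk


disk = {0: "boot", 1: "data-v1"}
backup_1 = take_snapshot(disk, {})  # first backup: full copy
disk[1] = "data-v2"
disk[2] = "logs"
backup_2 = take_snapshot(disk, restore([backup_1]))  # changed blocks only

print(backup_2)
print(restore([backup_1, backup_2]) == disk)
```

<p>The second backup holds two blocks instead of three, yet replaying both backups in order reproduces the current disk.</p>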
<p>Even though snapshots can be taken without stopping the instance, it is best practice to at least reduce its activity, stop writing data to disk, and flush buffers. This helps you make sure you get an accurate representation of the content of the disk.</p>
<h4 id="heading-imageshttpscloudgooglecomcomputedocsimages"><a target="_blank" href="https://cloud.google.com/compute/docs/images">Images</a></h4>
<p>Images refer to the operating system images needed to create boot disks for your instances. There are two types of images:</p>
<ul>
<li><strong>Public images</strong>. They are provided and maintained by Google, open-source communities, and third-party vendors. They are ready for you to use as soon as you create your project and are available to anyone.</li>
<li><strong>Custom images</strong>. Images that you have created.
<ul>
<li>They are linked to the project in which you created them but you can share them with other projects.</li>
<li>You can create images from <strong>persistent disks</strong> and <strong>other images</strong>, both from the same project or shared from another project.</li>
<li>Related images can be grouped in <strong>image families</strong> to simplify the management of the different image versions.</li>
<li>For Linux-based images, you can also share them by exporting them to Cloud Storage as a tar.gz file.</li>
</ul>
</li>
</ul>
<p>You might be asking yourself what is the difference between an image and a snapshot. Mainly, <strong>their purpose</strong>. Snapshots are taken as incremental backups of a disk while images are created to spin up new virtual machines and configure instance templates.</p>
<p><strong>Note on images vs startup scripts:</strong></p>
<p>For simple setups, startup scripts are also an option. They can be used to test changes quickly, but the VMs will take longer to be ready compared to using an image where all the needed software is installed, configured, and so on.</p>
<h4 id="heading-instance-groups">Instance groups</h4>
<p>Instance groups let you treat a group of instances as a single unit and they come in two flavors:</p>
<ul>
<li><strong>Unmanaged instance group</strong>. Formed by a heterogeneous group of instances that require individual configuration settings.</li>
<li><strong>Managed instance group (MIG).</strong> This is the preferred option when possible. All the machines look the same, making it easy to configure them, create them in multiple zones (high availability), replace them if they become unhealthy (auto-healing), balance the traffic among them, and create new instances if traffic increases (horizontal scaling).</li>
</ul>
<p>To create a MIG, you need to define an <strong>instance template</strong>, specifying your machine type, zone, OS image, startup and shutdown scripts, and so on. Instance templates are immutable.</p>
<p>To update a MIG, you need to create a new template and use the <strong>Managed Instance Group Updater</strong> to deploy the new version to every machine in the group. </p>
<p>This functionality can be used to create <a target="_blank" href="https://martinfowler.com/bliki/CanaryRelease.html">canary tests</a>, deploying your changes to a small fraction of your machines first.</p>
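<p>The two ideas above (immutable templates and canary updates) can be modeled in plain Python. This is a conceptual sketch, not the GCE API: the class, the function, and the image names are all hypothetical.</p>

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: fields cannot be changed after creation
class InstanceTemplate:
    name: str
    machine_type: str
    image: str


def canary_rollout(instances, new_template, fraction):
    """Assign the new template to a fraction of the group, rounded up."""
    count = math.ceil(len(instances) * fraction)
    updated = dict(instances)
    for vm in sorted(instances)[:count]:
        updated[vm] = new_template
    return updated


v1 = InstanceTemplate("web-v1", "e2-medium", "web-image-1")
# A change means a new template; the old one stays untouched.
v2 = replace(v1, name="web-v2", image="web-image-2")

group = {f"vm-{i}": v1 for i in range(10)}
after = canary_rollout(group, v2, 0.2)
print(sorted(vm for vm, template in after.items() if template == v2))
```

<p>With ten machines and a fraction of 0.2, two machines receive the new template while the other eight keep serving the old version.</p>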
<p>Visit this <a target="_blank" href="https://cloud.google.com/compute/docs/instance-groups/distributing-instances-with-regional-instance-groups">link</a> to know more about Google's recommendations to ensure an application deployed via a managed instance group can handle the load even if an entire zone fails.</p>
<h4 id="heading-security-best-practices-for-gce">Security best practices for GCE</h4>
<p>To increase the security of your infrastructure in GCE, have a look at:</p>
<ul>
<li><a target="_blank" href="https://cloud.google.com/security/shielded-cloud/shielded-vm">Shielded VMs</a></li>
<li><a target="_blank" href="https://cloud.google.com/solutions/connecting-securely">Prevent instances from being reached from the public internet</a></li>
<li><a target="_blank" href="https://cloud.google.com/compute/docs/images/restricting-image-access">Trusted images</a> to make sure your users can only create disks from images in <strong>specific projects</strong></li>
</ul>
<h3 id="heading-app-enginehttpscloudgooglecomappenginedocs"><strong><a target="_blank" href="https://cloud.google.com/appengine/docs">App Engine</a></strong></h3>
<p>App Engine is a great choice when you want to focus on the code and let Google handle your infrastructure. You just need to choose the region where your app will be deployed (this cannot be changed once it is set). Amongst its main use cases are websites, mobile apps, and game backends.</p>
<p>You can easily update the version of your app that is running via the command line or the Google Console. </p>
<p>Also, if you need to deploy a risky update to your application, you can split the traffic between the old and the risky versions for a canary deployment. Once you are happy with the results, you can route all the traffic to the new version.</p>
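<p>To illustrate what splitting traffic means, here is a small sketch that sends a fixed percentage of users to the new version. It assumes users are identified by a string ID and hashes that ID so a given user always lands on the same version. This is my own illustration and not a description of how App Engine implements the split.</p>

```python
import hashlib


def pick_version(user_id, canary_share):
    """Route a stable share of users (in percent) to the new version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value from 0 to 99
    return "new" if bucket < canary_share else "old"


# Start with 10% of users on the risky version, then raise it to 100%.
print(pick_version("user-42", 10))
print(pick_version("user-42", 100))
```

<p>Raising <code>canary_share</code> step by step moves more users to the new version without ever moving anyone back to the old one.</p>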
<p>There are two App Engine environments:</p>
<ul>
<li><strong>Standard</strong>. This version can quickly scale up or down (even to zero instances) to adjust to the demand. Currently, only a few programming languages are supported (Go, Java, PHP, and Python) and you do not have access to a VPC (including VPN connections).</li>
<li><strong>Flexible.</strong> Your code runs in Docker containers in GCE, hence more flexible than the Standard environment. However, creating new instances is slower and it cannot be scaled down to zero instances. It is suited for more consistent traffic.</li>
</ul>
<p>Regardless of the environment, there are no up-front costs and you only pay for what you use (billed per second).</p>
<p><strong>Memcache</strong> is built into App Engine, giving you the possibility to choose between a <strong>shared</strong> cache (default, free option) or a <strong>dedicated</strong> cache for better performance. </p>
<p>Visit this link to know more about the <a target="_blank" href="https://cloud.google.com/appengine/docs/standard/java/memcache#best_practices">best practices</a> you should follow to maximize the performance of your app.</p>
<h3 id="heading-google-kubernetes-engine-gkehttpscloudgooglecomkubernetes-enginedocs"><strong><a target="_blank" href="https://cloud.google.com/kubernetes-engine/docs">Google Kubernetes Engine (GKE)</a></strong></h3>
<p><a target="_blank" href="https://kubernetes.io/">Kubernetes</a> is an open-source <strong>container orchestration system</strong>, developed by Google. </p>
<p>Kubernetes is a very extensive topic in itself and I will not cover it here. You just need to know that GKE makes it easy to run and manage your Kubernetes clusters on GCP. </p>
<p>Google also provides <a target="_blank" href="https://cloud.google.com/container-registry">Container Registry</a> to store your container images - think of it as your private Docker Hub.</p>
<p><strong>Note</strong>: You can use <a target="_blank" href="https://cloud.google.com/cloud-build/docs">Cloud Build</a> to run your builds in GCP and, among other things, produce Docker images and store them in Container Registry. Cloud Build can import your code from Google Cloud Storage, <a target="_blank" href="https://cloud.google.com/source-repositories/docs">Cloud Source Repository</a>, GitHub, or Bitbucket.</p>
<h3 id="heading-cloud-functionshttpscloudgooglecomfunctionsdocs"><strong><a target="_blank" href="https://cloud.google.com/functions/docs">Cloud Functions</a></strong></h3>
<p>Cloud Functions are the equivalent of Lambda functions in AWS. Cloud Functions are <strong>serverless</strong>. They let you focus on the code and not worry about the infrastructure where it is going to run.</p>
<p>With Cloud Functions it is <strong>easy to respond to events</strong> such as uploads to a GCS bucket or messages in a Pub/Sub topic. You are only charged for the time your function is running in response to an event.</p>
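<p>As a rough idea of what such a function looks like, here is a sketch of a Python function reacting to a file upload. It assumes the <code>(event, context)</code> signature used by Python background functions, where <code>event</code> is a dictionary carrying the bucket and object name. The function name and the sample payload are made up.</p>

```python
def on_upload(event, context):
    """Background function triggered by a file upload to a GCS bucket.

    `event` is the payload dictionary and `context` carries event metadata.
    """
    message = f"New object gs://{event['bucket']}/{event['name']}"
    print(message)
    return message


# Local smoke test with a hand-written payload; no GCP resources involved.
on_upload({"bucket": "my-bucket", "name": "reports/2020-10.csv"}, context=None)
```

<p>Because the function only depends on its arguments, you can exercise it locally like this before deploying it.</p>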
<h2 id="heading-how-to-work-with-big-data-in-gcp"><strong>How to work with Big Data in GCP</strong></h2>
<h3 id="heading-bigqueryhttpscloudgooglecombigquerydocs"><strong><a target="_blank" href="https://cloud.google.com/bigquery/docs/">BigQuery</a></strong></h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/bigq.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>BigQuery is Google's serverless data warehouse. It provides analytics capabilities for petabyte-scale databases. </p>
<p>BigQuery automatically backs up your tables, but you can always export them to GCS to be on the safe side - incurring extra costs. </p>
<p>Data can be ingested in batches (for instance, from a GCS bucket) or from a stream in multiple formats: CSV, JSON, Parquet, or Avro (most performant). Also, you can query data that resides in external sources, called federated sources, for example, GCS buckets.</p>
<p>You can interact with your data in BigQuery using SQL via:</p>
<ul>
<li>Google Console.</li>
<li><a target="_blank" href="https://cloud.google.com/bigquery/docs/bq-command-line-tool">Command-line</a>, running commands like <code>bq query 'SELECT field FROM ....</code></li>
<li>REST API.</li>
<li>Code using client libraries.</li>
</ul>
<p><a target="_blank" href="https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions">User-Defined Functions</a> allow you to combine SQL queries with JavaScript functions to create complex operations.</p>
<p>BigQuery is a columnar data store: records are stored in columns. Tables are collections of columns and datasets are collections of tables. </p>
<p>Jobs are actions to load, export, query, or copy data that BigQuery runs on your behalf. </p>
<p>Views are virtual tables defined by a SQL query and are useful for sharing data with others when you want to control exactly what they have access to.</p>
<p>Two important concepts related to tables are:</p>
<ul>
<li><strong>Partitioned tables</strong>. To limit the amount of data that needs to be queried, tables can be divided into partitions. This can be done based on ingestion time, a timestamp or date column, or an integer range. This way it is easy to query for certain periods without querying the full table. To reduce costs, you can define an expiration period after which the partition will be deleted.</li>
<li><a target="_blank" href="https://cloud.google.com/bigquery/docs/clustered-tables"><strong>Clustered tables</strong></a>. Data are sorted and stored together based on the values of one or more columns (for instance, order_id). When a query filters on a clustering column, BigQuery can skip the data that does not match instead of reading the full table. BigQuery performs this clustering automatically once you specify the columns.</li>
</ul>
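<p>The benefit of partitioning is easier to see with a toy model. The sketch below stores a table as a dictionary of daily partitions and counts how many rows a date-range query has to scan. The data is invented and this is not how BigQuery is implemented, only the idea behind partition pruning.</p>

```python
from datetime import date

# Toy table partitioned by day: partition key to rows.
partitions = {
    date(2020, 10, 1): [{"order_id": 1}, {"order_id": 2}],
    date(2020, 10, 2): [{"order_id": 3}],
    date(2020, 10, 3): [{"order_id": 4}, {"order_id": 5}],
}


def query(start, end):
    """Return matching rows and how many rows had to be scanned."""
    scanned = 0
    rows = []
    for day, part in partitions.items():
        if start <= day <= end:  # partitions outside the range are skipped
            scanned += len(part)
            rows.extend(part)
    return rows, scanned


rows, scanned = query(date(2020, 10, 2), date(2020, 10, 2))
print(rows, scanned)
```

<p>Querying a single day touches one row out of five. Since query costs depend on how much data is read, skipping partitions is what keeps the bill down.</p>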
<p>Using IAM roles, you can control access at a project, dataset, or view level, but <em>not at the table level</em>. Roles are complex for BigQuery, so I recommend checking the <a target="_blank" href="https://cloud.google.com/bigquery/docs/access-control#predefined_roles_comparison_matrix">documentation</a>. </p>
<p>For instance, the jobUser role only lets you run jobs while the user role lets you run jobs and create datasets (but not tables).</p>
<p>Your costs depend on how much data you store and stream into BigQuery and how much data you query. To reduce costs, BigQuery automatically caches previous queries (per user). This behavior can be disabled.</p>
<p>When you don't edit data for 90 days, it automatically moves to a cheaper storage class. You pay for what you use, but it is possible to opt for a flat rate (only if you need more than the 2000 <a target="_blank" href="https://cloud.google.com/bigquery/docs/slots">slots</a> that are allocated by default). </p>
<p>Check these links to see how to <a target="_blank" href="https://cloud.google.com/bigquery/docs/best-practices-performance-overview">optimize your performance</a> and <a target="_blank" href="https://cloud.google.com/blog/products/data-analytics/cost-optimization-best-practices-for-bigquery">costs</a>.</p>
<h3 id="heading-cloud-pubsubhttpscloudgooglecompubsubdocs"><strong><a target="_blank" href="https://cloud.google.com/pubsub/docs">Cloud Pub/Sub</a></strong></h3>
<p>Pub/Sub is Google's <strong>fully-managed message queue</strong>, allowing you to decouple publishers (adding messages to the queue) and subscribers (consuming messages from the queue).</p>
<p>Although it is similar to <a target="_blank" href="https://kafka.apache.org/">Kafka</a>, Pub/Sub is not a direct substitute. They can be combined in the same pipeline (Kafka deployed on-premise or even in GKE). There are open-source plugins to connect Kafka to GCP, like <a target="_blank" href="https://www.confluent.io/hub/confluentinc/kafka-connect-gcp-pubsub">Kafka Connect</a>.</p>
<p>Pub/Sub guarantees that every message will be delivered at least once but it does not guarantee that messages will be processed in order. It is usually connected to Dataflow to process the data, ensure that the messages are processed in order, and so on.</p>
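<p>In practice, these two guarantees mean a consumer has to tolerate duplicates and reordering. Here is a minimal sketch of that, assuming each message carries a unique <code>id</code> and a publisher-assigned <code>seq</code> number (both field names are hypothetical). It works on a finite batch, whereas a streaming system such as Dataflow has to do this continuously.</p>

```python
def process(deliveries):
    """Drop redelivered messages, then order the rest by sequence number."""
    seen = set()
    unique = []
    for message in deliveries:
        if message["id"] in seen:
            continue  # duplicate caused by at-least-once delivery
        seen.add(message["id"])
        unique.append(message)
    return sorted(unique, key=lambda m: m["seq"])


deliveries = [
    {"id": "b", "seq": 2},
    {"id": "a", "seq": 1},
    {"id": "b", "seq": 2},  # delivered twice
    {"id": "c", "seq": 3},
]
print([m["id"] for m in process(deliveries)])
```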
<p>Pub/Sub supports both push and pull modes:</p>
<ul>
<li><strong>Push.</strong> Messages are sent to subscribers, resulting in lower latency.</li>
<li><strong>Pull.</strong> Subscribers pull messages from topics, better suited for a large volume of messages.</li>
</ul>
<h3 id="heading-cloud-pubsub-vs-cloud-task"><strong>Cloud Pub/Sub vs Cloud Tasks</strong></h3>
<p><a target="_blank" href="https://cloud.google.com/tasks/docs">Cloud Tasks</a> is another fully-managed service to execute tasks asynchronously and manage messages between services. However, there are differences between Cloud Tasks and Pub/Sub:</p>
<ul>
<li>In Pub/Sub, publishers and subscribers are decoupled. Publishers know nothing about their subscribers. When they publish a message, they implicitly cause one or multiple subscribers to react to a publishing event.</li>
<li>In Cloud Tasks, the publisher stays in control of the execution. Besides, Cloud Tasks provides other features unavailable in Pub/Sub, like scheduling specific delivery times, delivery rate controls, configurable retries, access to and management of individual tasks in a queue, and task/message creation deduplication.</li>
</ul>
<p>For more details, check out this <a target="_blank" href="https://cloud.google.com/tasks/docs/comp-pub-sub">link</a>.</p>
<h3 id="heading-cloud-dataflowhttpscloudgooglecomdataflowdocs"><strong><a target="_blank" href="https://cloud.google.com/dataflow/docs/">Cloud Dataflow</a></strong></h3>
<p>Cloud Dataflow is Google's managed service for <strong>stream and batch data processing</strong>, based on <a target="_blank" href="https://beam.apache.org/documentation/">Apache Beam</a>. </p>
<p>You can define pipelines that will transform your data, for example before it is ingested in another service like BigQuery, BigTable, or Cloud ML. The same pipeline can process both stream and batch data.</p>
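<p>The idea that one pipeline serves both batch and stream data can be shown with plain Python generators. To be clear, this is not the Apache Beam API, only a sketch of the concept. The record format and the temperature conversion are invented for the example.</p>

```python
import json


def pipeline(records):
    """Parse JSON lines, drop empty readings, convert Celsius to Fahrenheit."""
    for line in records:
        reading = json.loads(line)
        if reading.get("temp_c") is None:
            continue
        yield {
            "device": reading["device"],
            "temp_f": reading["temp_c"] * 9 / 5 + 32,
        }


# Batch input: a finite list, for example lines read from a file.
batch = ['{"device": "a", "temp_c": 20}', '{"device": "b", "temp_c": null}']


def stream():
    # Stand-in for an unbounded source such as a message subscription.
    yield '{"device": "c", "temp_c": 100}'


print(list(pipeline(batch)))
print(list(pipeline(stream())))
```

<p>The transformation is written once and does not care whether its input is a list that ends or a source that keeps producing records.</p>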
<p>A common pattern is to stream data into Pub/Sub, let's say from <a target="_blank" href="https://cloud.google.com/solutions/iot">IoT devices</a>, process it in Dataflow, and store it for analysis in BigQuery. </p>
<p>But Pub/Sub does not guarantee that the order in which messages are pushed to the topics will be the order in which the messages are consumed. However, this can be done with Dataflow.</p>
<h3 id="heading-cloud-dataprochttpscloudgooglecomdataprocdocs"><strong><a target="_blank" href="https://cloud.google.com/dataproc/docs">Cloud Dataproc</a></strong></h3>
<p>Cloud Dataproc is Google's managed Hadoop and Spark ecosystem. It lets you create and manage your clusters easily and turn them off when you are not using them, to reduce costs. </p>
<p>Dataproc can only be used to process batch data, while Dataflow can also handle streaming data.</p>
<p>Google recommends using Dataproc for a lift and leverage migration of your on-premise Hadoop clusters to the cloud:</p>
<ul>
<li>Reduce costs by turning your cluster off when you are not using it.</li>
<li>Leverage Google's infrastructure</li>
<li>Use some preemptible virtual machines to reduce costs</li>
<li>Add larger (SSD) persistent disks to improve performance</li>
<li>BigQuery can replace Hive and BigTable can replace HBase</li>
<li>Cloud Storage replaces HDFS. Just upload your data to GCS and change the prefixes hdfs:// to gs://</li>
</ul>
<p>Otherwise, you should choose Cloud Dataflow.</p>
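<p>The prefix change mentioned in the list above can be sketched as a small helper. This is only an illustration, assuming your files keep the same directory layout in the bucket as they had in HDFS. The function name and the bucket name my-bucket are hypothetical.</p>

```python
def to_gcs_uri(path, bucket):
    """Rewrite an hdfs:// path to a gs:// URI inside the given bucket.

    Hypothetical helper: assumes the directory layout in the bucket
    mirrors the layout in HDFS.
    """
    prefix = "hdfs://"
    if not path.startswith(prefix):
        return path  # leave non-HDFS paths untouched
    # Drop the namenode authority (host:port, possibly empty) and keep
    # only the file path that follows it.
    remainder = path[len(prefix):]
    _, _, file_path = remainder.partition("/")
    return f"gs://{bucket}/{file_path}"


example = to_gcs_uri("hdfs://namenode:8020/data/logs/2020", "my-bucket")
```

<p>In practice this change usually happens in job arguments or configuration rather than in code, but the mapping is the same.</p>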
<h3 id="heading-dataprephttpscloudgooglecomdataprepdocs"><strong><a target="_blank" href="https://cloud.google.com/dataprep/docs">Dataprep</a></strong></h3>
<p>Cloud Dataprep provides you with a <strong>web-based interface to clean and prepare your data</strong> before processing. The input and output formats include, among others, CSV, JSON, and Avro.</p>
<p>After defining the transformations, a Dataflow job will run. The transformed data can be exported to GCS, BigQuery, etc.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/proc.svg" alt="Image" width="600" height="400" loading="lazy"></p>
<h3 id="heading-cloud-composerhttpscloudgooglecomcomposerdocs"><strong><a target="_blank" href="https://cloud.google.com/composer/docs">Cloud Composer</a></strong></h3>
<p>Cloud Composer is Google's fully-managed <a target="_blank" href="https://airflow.apache.org/">Apache Airflow</a> service to create, schedule, monitor, and manage workflows. It handles all the infrastructure for you so that you can concentrate on combining the services I have described above to create your own workflows.</p>
<p>Under the hood, a GKE cluster will be created with Airflow in it and GCS will be used to store files.</p>
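<p>Airflow models a workflow as a directed acyclic graph (DAG) of tasks. The sketch below does not use the Airflow API. It relies on Python's standard graphlib module to show how a scheduler can derive a valid run order from task dependencies. The task names are hypothetical and simply chain together services described earlier.</p>

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
# Task names are made up for illustration.
workflow = {
    "ingest_to_gcs": set(),
    "clean_with_dataflow": {"ingest_to_gcs"},
    "load_into_bigquery": {"clean_with_dataflow"},
    "refresh_dashboard": {"load_into_bigquery"},
}

# static_order() yields every task after all of its dependencies.
run_order = list(TopologicalSorter(workflow).static_order())
```

<p>Because this example is a simple chain, there is only one valid order. With branching dependencies, several orders can be valid and independent tasks can run in parallel.</p>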
<h3 id="heading-ai-and-machine-learning-in-gcphttpscloudgooglecomproductsai"><strong><a target="_blank" href="https://cloud.google.com/products/ai">AI and Machine Learning in GCP</a></strong></h3>
<p>Covering the basics of machine learning would take another article. So here, I assume you are familiar with it and will show you how to train and deploy your models in GCP. </p>
<p>We'll also look at what APIs are available to leverage Google's machine learning capabilities in your services, even if you are not an expert in this area.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/AI.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<h4 id="heading-ai-platformhttpscloudgooglecomai-platformdocs"><a target="_blank" href="https://cloud.google.com/ai-platform/docs">AI Platform</a></h4>
<p>AI Platform provides you with a <strong>fully-managed platform</strong> to use machine learning libraries like <a target="_blank" href="https://www.tensorflow.org/">TensorFlow</a>. You just need to focus on your model and Google will handle all the infrastructure needed to train it. </p>
<p>After your model is trained, you can use it to get online and batch predictions.</p>
<h4 id="heading-cloud-automlhttpscloudgooglecomautoml"><a target="_blank" href="https://cloud.google.com/automl">Cloud AutoML</a></h4>
<p>Google lets you use <strong>your data to train their models</strong>. You can leverage models to build applications that are based on natural language processing (for example, document classification or sentiment analysis applications), speech processing, machine translation, or video processing (video classification or object detection).</p>
<h2 id="heading-how-to-explore-and-visualize-your-data-in-gcp"><strong>How to explore and visualize your data in GCP</strong></h2>
<h3 id="heading-cloud-data-studiohttpscloudgooglecombigquerydocsvisualize-data-studio"><strong><a target="_blank" href="https://cloud.google.com/bigquery/docs/visualize-data-studio">Cloud Data Studio</a></strong></h3>
<p>Data Studio lets you create <strong>visualizations and dashboards</strong> based on data that resides in Google services (YouTube Analytics, Sheets, AdWords, local upload), Google Cloud Platform (BigQuery, Cloud SQL, GCS, Spanner), and many third-party services, storing your reports in Google Drive.</p>
<p>Data Studio is not part of GCP but of <strong>G Suite</strong>, so its permissions are not managed using IAM. </p>
<p>There are no additional costs for using Data Studio, other than the storage of the data, queries in BigQuery, and so on. <strong>Caching</strong> can be used to improve performance and reduce costs.</p>
<h3 id="heading-cloud-datalabhttpscloudgooglecomdatalabdocs"><strong><a target="_blank" href="https://cloud.google.com/datalab/docs">Cloud Datalab</a></strong></h3>
<p>Datalab lets you <strong>explore, analyze, and visualize data</strong> in BigQuery, ML Engine, Compute Engine, Cloud Storage, and Stackdriver. </p>
<p>It is based on Jupyter notebooks and supports Python, SQL, and Javascript code. Your notebooks can be shared via the Cloud Source Repository.</p>
<p>Cloud Datalab itself is free of charge, but it will create a virtual machine in GCE for which you will be billed.</p>
<h2 id="heading-security-in-gcphttpscloudgooglecomsecurity"><strong><a target="_blank" href="https://cloud.google.com/security/">Security in GCP</a></strong></h2>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/10/sec.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<h3 id="heading-encryption-on-google-cloud-platform"><strong>Encryption on Google Cloud Platform</strong></h3>
<p>Google Cloud encrypts data both at rest (data stored on disk) and in transit (data traveling in the network), using <a target="_blank" href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a> implemented via <a target="_blank" href="https://boringssl.googlesource.com/boringssl/">Boring SSL</a>. </p>
<p>You can manage the encryption keys yourself (storing them either in GCP or on-premises) or let Google handle them.</p>
<h4 id="heading-encryption-at-resthttpscloudgooglecomsecurityencryption-at-restdefault-encryption"><a target="_blank" href="https://cloud.google.com/security/encryption-at-rest/default-encryption">Encryption at rest</a></h4>
<p>GCP encrypts data stored at rest <strong>by default</strong>. Your data will be divided into chunks. Each chunk is distributed across different machines and encrypted with a unique key, called a <strong>data encryption key (DEK)</strong>. </p>
<p>Keys are generated and managed by Google but you can also manage the keys yourself, as we will see later in this guide.</p>
<h4 id="heading-encryption-in-transithttpscloudgooglecomsecurityencryption-in-transit"><a target="_blank" href="https://cloud.google.com/security/encryption-in-transit">Encryption in Transit</a></h4>
<p>To add an extra security layer, all communications between two GCP services or from your infrastructure to GCP are encrypted at one or more network layers. Your data would not be compromised if your messages were to be intercepted.</p>
<h3 id="heading-cloud-key-management-service-kmshttpscloudgooglecomkmsdocs"><strong><a target="_blank" href="https://cloud.google.com/kms/docs">Cloud Key Management Service (KMS)</a></strong></h3>
<p>As I mentioned earlier, you can let Google manage the keys for you or you can manage them yourself. </p>
<p>Google KMS is the service that allows you to <strong>manage your encryption keys</strong>. You can create, rotate, and destroy symmetric encryption keys. All key-related activity is registered in logs. These keys are referred to as <strong>customer-managed encryption keys</strong>. </p>
<p>In GCS, they are used to <a target="_blank" href="https://cloud.google.com/storage/docs/encryption/customer-managed-keys#:~:text=Data%20Encryption%20Options.-,Overview,are%20stored%20within%20Cloud%20KMS.">encrypt</a>:</p>
<ul>
<li>The object's data.</li>
<li>The object's CRC32C checksum.</li>
<li>The object's MD5 hash.</li>
</ul>
<p>And Google uses server-side keys to handle the rest of the metadata, including the object's name.</p>
<p>The DEKs used to encrypt your data are also encrypted using key encryption keys (KEKs), in a process called envelope encryption. By default, KEKs are rotated every 90 days.</p>
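<p>The envelope encryption flow can be sketched as follows. This is a toy illustration only: Google uses AES, while the cipher below is a deliberately simple XOR stream built from SHA-256, chosen so the example runs with the standard library alone. It is not secure and must not be used to protect real data. All function names are hypothetical.</p>

```python
import hashlib
import secrets


def toy_cipher(key, data):
    """Toy XOR stream cipher (NOT secure); stands in for AES in this sketch."""
    stream = b""
    for counter in range(len(data) // 32 + 1):
        # Derive 32 bytes of keystream per counter value.
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    # zip() stops at the end of data, so the extra keystream is ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(kek, plaintext):
    dek = secrets.token_bytes(32)         # fresh data encryption key per chunk
    ciphertext = toy_cipher(dek, plaintext)
    wrapped_dek = toy_cipher(kek, dek)    # the DEK is stored only in wrapped form
    return wrapped_dek, ciphertext


def envelope_decrypt(kek, wrapped_dek, ciphertext):
    dek = toy_cipher(kek, wrapped_dek)    # XOR with the same stream unwraps the DEK
    return toy_cipher(dek, ciphertext)


kek = secrets.token_bytes(32)             # key encryption key
wrapped, blob = envelope_encrypt(kek, b"player telemetry chunk")
recovered = envelope_decrypt(kek, wrapped, blob)
```

<p>The point of the design is that only the small wrapped DEK has to be re-encrypted when a KEK is rotated. The bulk data encrypted with the DEK stays untouched.</p>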
<p>It is important to note that KMS does not store secrets. KMS is a central repository for KEKs: it holds only the keys that GCP needs to encrypt secrets that are stored somewhere else, for instance in <a target="_blank" href="https://cloud.google.com/solutions/secret-manager/">Secrets management</a>.</p>
<p><strong>Note</strong>: For GCE and GCS, you can keep your keys on-premises and let Google retrieve them to encrypt and decrypt your data. These are known as <strong>customer-supplied keys</strong>.</p>
<h3 id="heading-identity-aware-proxy-iaphttpscloudgooglecomiapdocs"><strong><a target="_blank" href="https://cloud.google.com/iap/docs">Identity-Aware Proxy (IAP)</a></strong></h3>
<p>Identity-Aware Proxy allows you to <strong>control access</strong> to GCP applications via HTTPS without installing any VPN software or adding extra code in your application to handle login. </p>
<p>Your applications are visible to the public internet, but only accessible to authorized users, implementing a zero-trust security access model. </p>
<p>Furthermore, with TCP forwarding you can prevent services like SSH from being exposed to the public internet.</p>
<h3 id="heading-cloud-armorhttpscloudgooglecomarmordocs"><strong><a target="_blank" href="https://cloud.google.com/armor/docs">Cloud Armor</a></strong></h3>
<p>Cloud Armor protects your infrastructure from <a target="_blank" href="https://en.wikipedia.org/wiki/Denial-of-service_attack">distributed denial of service (DDoS)</a> attacks. You define rules (for example to whitelist or deny certain IP addresses or CIDR ranges) to create security policies, which are enforced at the Point of Presence level (closer to the source of the attack).</p>
<p>Cloud Armor gives you the option of previewing the effects of your policies before activating them.</p>
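<p>To illustrate how allow and deny rules over CIDR ranges combine into a policy, here is a minimal sketch using Python's standard ipaddress module. It is not the Cloud Armor API. The rule tuples, the evaluate function, and the default action are assumptions for illustration, with lower priority numbers evaluated first.</p>

```python
import ipaddress

# (priority, action, CIDR range); lower priority numbers are evaluated first.
# These ranges are reserved documentation addresses, used here as examples.
rules = [
    (1000, "deny", "203.0.113.0/24"),
    (2000, "allow", "198.51.100.0/24"),
]
DEFAULT_ACTION = "deny"  # assumed fallback when no rule matches


def evaluate(client_ip, rules, default=DEFAULT_ACTION):
    """Return the action of the first matching rule, in priority order."""
    address = ipaddress.ip_address(client_ip)
    for _, action, cidr in sorted(rules):
        if address in ipaddress.ip_network(cidr):
            return action
    return default


blocked = evaluate("203.0.113.7", rules)
allowed = evaluate("198.51.100.20", rules)
unmatched = evaluate("192.0.2.1", rules)
```

<p>A preview mode like the one described above amounts to running this evaluation and logging the result without enforcing it.</p>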
<h3 id="heading-cloud-data-loss-preventionhttpscloudgooglecomdlpdocs"><strong><a target="_blank" href="https://cloud.google.com/dlp/docs">Cloud Data Loss Prevention</a></strong></h3>
<p>Data Loss Prevention is a fully-managed service designed to help you discover, classify, and protect sensitive data, like:</p>
<ul>
<li><strong>Personally Identifiable Information (PII)</strong>: name, Social Security number, driver's license number, bank account number, passport number, email address, and so on.</li>
<li><strong>Secrets</strong></li>
<li><strong>Credentials</strong></li>
</ul>
<p>DLP is integrated with GCS, BigQuery, and Datastore. Also, the source of the data can be outside of GCP. </p>
<p>You can specify what type of data you're interested in, called an info type, define your own types (based on dictionaries of words and phrases or on regular expressions), or let Google use the defaults, which can be time-consuming for large amounts of data.</p>
<p>For each result, DLP will return the likelihood that the piece of data matches a certain info type: LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY.</p>
<p>After detecting a piece of PII, DLP can transform it so that it cannot be mapped back to the user. DLP uses multiple techniques to de-identify your sensitive data like tokenization, bucketing, and date shifting. DLP can detect and redact sensitive data in images too.</p>
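<p>The ideas of a custom regex-based info type and of de-identification can be sketched in a few lines of plain Python. This is not the DLP API. The employee ID pattern, the placeholder token, and the helper names are hypothetical, and the date shift uses a fixed offset as a simplification.</p>

```python
import re
from datetime import date, timedelta

# Hypothetical custom info type: an employee ID such as "EMP-123456".
EMPLOYEE_ID = re.compile(r"EMP-\d{6}")


def mask_employee_ids(text):
    """Replace every match with a fixed placeholder token."""
    return EMPLOYEE_ID.sub("[EMPLOYEE_ID]", text)


def shift_date(value, days):
    """Date shifting: move a date by an offset to hide the real day."""
    return value + timedelta(days=days)


def bucket_age(age, size=10):
    """Bucketing: replace an exact age with a range such as '30-39'."""
    low = (age // size) * size
    return f"{low}-{low + size - 1}"


masked = mask_employee_ids("Ticket opened by EMP-004217")
shifted = shift_date(date(2020, 10, 1), 12)
age_range = bucket_age(34)
```

<p>Each transformation keeps the data useful for analysis (counts, trends, age distributions) while removing the exact value that could identify a person.</p>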
<h3 id="heading-vpc-service-controlhttpscloudgooglecomvpc-service-controlsdocsoverview"><strong><a target="_blank" href="https://cloud.google.com/vpc-service-controls/docs/overview">VPC Service Control</a></strong></h3>
<p>VPC Service Control helps prevent data exfiltration. It allows you to define a perimeter around resources you want to protect. You can define what services and from what networks these resources can be accessed.</p>
<h3 id="heading-cloud-web-security-scannerhttpscloudgooglecomsecurity-command-centerdocsconcepts-web-security-scanner-overview"><strong><a target="_blank" href="https://cloud.google.com/security-command-center/docs/concepts-web-security-scanner-overview">Cloud Web Security Scanner</a></strong></h3>
<p>Cloud Web Security Scanner scans applications running in Compute Engine, GKE, and App Engine for common vulnerabilities such as passwords in plain text, invalid headers, outdated libraries, and <a target="_blank" href="https://en.wikipedia.org/wiki/Cross-site_scripting">cross-site scripting attacks</a>. It simulates a real user trying to click on your buttons, inputting text in your text fields, and so on.</p>
<p>It is part of <a target="_blank" href="https://cloud.google.com/security-command-center/docs">Cloud Security Command Center</a>.</p>
<h2 id="heading-more-gcp-resources"><strong>More GCP resources</strong></h2>
<ul>
<li><a target="_blank" href="https://gcp.solutions/">Google Cloud Solutions Architecture Reference</a></li>
<li><a target="_blank" href="https://cloud.google.com/solutions/#section-1">GCP solutions by industry</a></li>
<li><a target="_blank" href="https://www.youtube.com/user/googlecloudplatform">GCP Youtube channel</a></li>
<li><a target="_blank" href="https://cloud.google.com/training/free-labs">GCP Labs</a></li>
</ul>
<p>If you're interested in learning more about GCP, I recommend checking the free practice exams for the different certifications. Whether or not you are preparing for a GCP certification, you can use them to find gaps in your knowledge:</p>
<ul>
<li><a target="_blank" href="https://cloud.google.com/certification/sample-questions/cloud-developer">Professional Cloud Developer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/sample-questions/clouddata-engineer">Professional Cloud Data Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/sample-questions/cloud-network-engineer">Professional Cloud Network Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/sample-questions/cloud-security-engineer">Professional Cloud Security Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/sample-questions/cloud-devops-engineer">Professional Cloud DevOps Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/sample-questions/cloud-devops-engineer">Professional Cloud Machine Learning Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/sample-questions/cloud-architect">Professional Cloud Architect</a></li>
</ul>
<p><strong>Note:</strong> Some questions are based on case studies. Links to the case studies will be provided in the exams so that you have the full context to properly understand and answer the question.</p>
<h2 id="heading-time-to-test-your-knowledge"><strong>Time to test your knowledge</strong></h2>
<p>I've extracted 10 questions from some of the exams above. Some of them are pretty straightforward. Others require deep thought and deciding which solution is best when more than one option is viable.</p>
<h3 id="heading-question-1"><strong>Question 1</strong></h3>
<p>Your customer is moving their corporate applications to Google Cloud. The security team wants detailed visibility of all resources in the organization. You use the Resource Manager to set yourself up as the Organization Administrator. </p>
<p>Which Cloud Identity and Access Management (Cloud IAM) roles should you give to the security team while following Google's recommended practices?</p>
<p>A. Organization viewer, Project owner</p>
<p>B. Organization viewer, Project viewer</p>
<p>C. Organization administrator, Project browser</p>
<p>D. Project owner, Network administrator</p>
<h3 id="heading-question-2"><strong>Question 2</strong></h3>
<p>Your company wants to try out the cloud with low risk. They want to archive approximately 100 TB of their log data to the cloud and test the serverless analytics features available to them there, while also retaining that data as a long-term disaster recovery backup. </p>
<p>Which two steps should they take? (Choose two)</p>
<p>A. Load logs into BigQuery.</p>
<p>B. Load logs into Cloud SQL.</p>
<p>C. Import logs into Cloud Logging.</p>
<p>D. Insert logs into Cloud Bigtable.</p>
<p>E. Upload log files into Cloud Storage.</p>
<h3 id="heading-question-3"><strong>Question 3</strong></h3>
<p>Your company wants to track whether someone is present in a meeting room reserved for a scheduled meeting. </p>
<p>There are 1000 meeting rooms across 5 offices on 3 continents. Each room is equipped with a motion sensor that reports its status every second. </p>
<p>You want to support the data ingestion needs of this sensor network. The receiving infrastructure needs to account for the possibility that the devices may have inconsistent connectivity. </p>
<p>Which solution should you design?</p>
<p>A. Have each device create a persistent connection to a Compute Engine instance and write messages to a custom application.</p>
<p>B. Have devices poll for connectivity to Cloud SQL and insert the latest messages on a regular interval to a device-specific table.</p>
<p>C. Have devices poll for connectivity to Cloud Pub/Sub and publish the latest messages on a regular interval to a shared topic for all devices.</p>
<p>D. Have devices create a persistent connection to an App Engine application fronted by Cloud Endpoints, which ingest messages and write them to Cloud Datastore.</p>
<h3 id="heading-question-4"><strong>Question 4</strong></h3>
<p>To reduce costs, the Director of Engineering has required all developers to move their development infrastructure resources from on-premises virtual machines (VMs) to Google Cloud. </p>
<p>These resources go through multiple start/stop events during the day and require the state to persist. </p>
<p>You have been asked to design the process of running a development environment in Google Cloud while providing cost visibility to the finance department. </p>
<p>Which two steps should you take? (Choose two)</p>
<p>A. Use persistent disks to store the state. Start and stop the VM as needed.</p>
<p>B. Use the --auto-delete flag on all persistent disks before stopping the VM.</p>
<p>C. Apply the VM CPU utilization label and include it in the BigQuery billing export.</p>
<p>D. Use BigQuery billing export and labels to relate cost to groups.</p>
<p>E. Store all state in a Local SSD, snapshot the persistent disks and terminate the VM.</p>
<h3 id="heading-question-5"><strong>Question 5</strong></h3>
<p>The database administration team has asked you to help them improve the performance of their new database server running on Compute Engine. </p>
<p>The database is used for importing and normalizing the company’s performance statistics. It is built with MySQL running on Debian Linux. They have an n1-standard-8 virtual machine with 80 GB of SSD zonal persistent disk which they can't restart until the next maintenance event. </p>
<p>What should they change to get better performance from this system as soon as possible and in a cost-effective manner?</p>
<p>A. Increase the virtual machine’s memory to 64 GB.</p>
<p>B. Create a new virtual machine running PostgreSQL.</p>
<p>C. Dynamically resize the SSD persistent disk to 500 GB.</p>
<p>D. Migrate their performance metrics warehouse to BigQuery.</p>
<h3 id="heading-question-6"><strong>Question 6</strong></h3>
<p>Your organization has a 3-tier web application deployed in the same Google Cloud Virtual Private Cloud (VPC). </p>
<p>Each tier (web, API, and database) scales independently of the others. Network traffic should flow through the web to the API tier, and then on to the database tier. Traffic should not flow between the web and the database tier. </p>
<p>How should you configure the network with minimal steps?</p>
<p>A. Add each tier to a different subnetwork.</p>
<p>B. Set up software-based firewalls on individual VMs.</p>
<p>C. Add tags to each tier and set up routes to allow the desired traffic flow.</p>
<p>D. Add tags to each tier and set up firewall rules to allow the desired traffic flow.</p>
<h3 id="heading-question-7"><strong>Question 7</strong></h3>
<p>You are developing an application on Google Cloud that will label famous landmarks in users’ photos. You are under competitive pressure to develop a predictive model quickly. You need to keep service costs low. </p>
<p>What should you do?</p>
<p>A. Build an application that calls the Cloud Vision API. Inspect the generated MID values to supply the image labels.</p>
<p>B. Build an application that calls the Cloud Vision API. Pass client image locations as base64-encoded strings.</p>
<p>C. Build and train a classification model with TensorFlow. Deploy the model using the AI Platform Prediction. Pass client image locations as base64-encoded strings.</p>
<p>D. Build and train a classification model with TensorFlow. Deploy the model using the AI Platform Prediction. Inspect the generated MID values to supply the image labels.</p>
<h3 id="heading-question-8"><strong>Question 8</strong></h3>
<p>You set up an autoscaling managed instance group to serve web traffic for an upcoming launch. </p>
<p>After configuring the instance group as a backend service to an HTTP(S) load balancer, you notice that virtual machine (VM) instances are being terminated and re-launched every minute. The instances do not have a public IP address. </p>
<p>You have verified that the appropriate web response is coming from each instance using the curl command. You want to ensure that the backend is configured correctly. </p>
<p>What should you do?</p>
<p>A. Ensure that a firewall rule exists to allow source traffic on HTTP/HTTPS to reach the load balancer.</p>
<p>B. Assign a public IP to each instance and configure a firewall rule to allow the load balancer to reach the instance public IP.</p>
<p>C. Ensure that a firewall rule exists to allow load balancer health checks to reach the instances in the instance group.</p>
<p>D. Create a tag on each instance with the name of the load balancer. Configure a firewall rule with the name of the load balancer as the source and the instance tag as the destination.</p>
<h3 id="heading-question-9"><strong>Question 9</strong></h3>
<p>You created a job that runs daily to import highly sensitive data from an on-premises location to Cloud Storage. You also set up a streaming data insert into Cloud Storage via a Kafka node that is running on a Compute Engine instance. </p>
<p>You need to encrypt the data at rest and supply your own encryption key. Your key should not be stored in the Google Cloud. </p>
<p>What should you do?</p>
<p>A. Create a dedicated service account and use encryption at rest to reference your data stored in Cloud Storage and Compute Engine data as part of your API service calls.</p>
<p>B. Upload your own encryption key to Cloud Key Management Service and use it to encrypt your data in Cloud Storage. Use your uploaded encryption key and reference it as part of your API service calls to encrypt your data in the Kafka node hosted on Compute Engine.</p>
<p>C. Upload your own encryption key to Cloud Key Management Service and use it to encrypt your data in your Kafka node hosted on Compute Engine.</p>
<p>D. Supply your own encryption key, and reference it as part of your API service calls to encrypt your data in Cloud Storage and your Kafka node hosted on Compute Engine.</p>
<h3 id="heading-question-10"><strong>Question 10</strong></h3>
<p>You are designing a relational data repository on Google Cloud to grow as needed. The data will be transactionally consistent and added from any location in the world. You want to monitor and adjust node count for input traffic, which can spike unpredictably. </p>
<p>What should you do?</p>
<p>A. Use Cloud Spanner for storage. Monitor storage usage and increase node count if more than 70% utilized.</p>
<p>B. Use Cloud Spanner for storage. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.</p>
<p>C. Use Cloud Bigtable for storage. Monitor data stored and increase node count if more than 70% is utilized.</p>
<p>D. Use Cloud Bigtable for storage. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.</p>
<h4 id="heading-answers">Answers</h4>
<ol>
<li>B</li>
<li>A, E</li>
<li>C</li>
<li>A, D</li>
<li>C</li>
<li>D</li>
<li>B</li>
<li>C</li>
<li>D</li>
<li>B</li>
</ol>
<h2 id="heading-back-to-the-initial-proposition"><strong>Back to the initial proposition</strong></h2>
<p>At the beginning of this article, I said you'd learn how to design a mobile gaming analytics platform that collects, stores, and analyzes vast amounts of player telemetry, both from batches of data and from real-time events.</p>
<p>So, do you think you can do it?</p>
<p>Take a pen and a piece of paper and try to come up with your own solution based on the services I have described here. If you get stuck, the following questions might help:</p>
<ul>
<li>The platform needs to collect real-time events from the game:</li>
<li>Where might the game be running?</li>
<li>How can you ingest streaming data from the game into GCP?</li>
<li>How can you store it?</li>
<li>How can you collect and store the uploads of batches of data?</li>
<li>Can you analyze all the ingested data as it comes? Does it need to be processed?</li>
<li>What services can you use to analyze the data? How would this change if low-latency was now a new requirement?</li>
</ul>
<p>I have purposely defined the problem in a very vague way. This is what you can expect when you are facing this sort of challenge: uncertainty. It is part of your job to gather requirements and document your assumptions.</p>
<p>Do not worry if your solution does not look like <a target="_blank" href="https://cloud.google.com/solutions/mobile/mobile-gaming-analysis-telemetry">Google's</a>. This is just one possible solution. Learning to design complex systems is a skill that takes a lifetime to master. Luckily, you're headed in the right direction.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>This guide will help you get started on GCP and give you a broad perspective of what you can do with it. </p>
<p>By no means will you be an expert after finishing this guide, or any other guide for that matter. The only way to really learn is by practicing. </p>
<p><strong>You are going to learn infinitely more by doing than by reading or watching</strong>. I strongly recommend using your free trial and Code Labs if you are serious about learning.  </p>
<p>You can visit my blog <a target="_blank" href="https://www.yourdevopsguy.com/">www.yourdevopsguy.com</a>  and <a target="_blank" href="https://twitter.com/CodingLanguages">follow me on Twitter</a> for more high-quality technical content.</p>
<p><strong>Disclaimer:</strong> At the time of publishing this article, I don't work for Google, nor have I ever. I wanted to organize and summarize the knowledge I have acquired via the Google documentation, YouTube videos, the courses that I have taken, and most importantly through hands-on practice using GCP daily in my job. </p>
<p>All of this information is free out there. The figures, numbers, and versions that you see here come from the documentation at the time I am publishing this article. To make sure you are using up-to-date data, please visit the official documentation.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Pass Almost Every Google Cloud Platform Professional Certification Exam ]]>
                </title>
                <description>
                    <![CDATA[ By Ivam Luz Are you interested in becoming a Google Cloud Platform certified professional? Last year, I took five out of the seven (at the time of this writing) of the GCP professional exams: Professional Cloud Architect Professional Data Engineer P... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-pass-almost-every-google-cloud-professional-certification-exam/</link>
                <guid isPermaLink="false">66d45f3133b83c4378a517dc</guid>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Google Cloud Platform ]]>
                    </category>
                
                    <category>
                        <![CDATA[ professional development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ self-improvement  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 15 Jun 2020 19:59:54 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/06/cloud-developer-7.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Ivam Luz</p>
<p>Are you interested in becoming a Google Cloud Platform certified professional?</p>
<p>Last year, I took five out of the seven (at the time of this writing) of the GCP professional exams:</p>
<ul>
<li><a target="_blank" href="https://cloud.google.com/certification/cloud-architect">Professional Cloud Architect</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/data-engineer">Professional Data Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/cloud-security-engineer">Professional Cloud Security Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/cloud-network-engineer">Professional Cloud Network Engineer</a></li>
<li><a target="_blank" href="https://cloud.google.com/certification/cloud-developer">Professional Cloud Developer</a></li>
</ul>
<p>In this post, I'll share some information about the exams, my strategies for passing them, as well as a link to the study guides I created along the way. These guides have been battle tested by more than a hundred professionals (so far) who successfully got certified with their help.</p>
<h1 id="heading-about-the-certification-exams">About the certification exams</h1>
<h3 id="heading-professional-cloud-architect">Professional Cloud Architect</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/06/cloud-architect-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Professional Cloud Architect certification logo</em></p>
<ul>
<li><strong>Length:</strong> 2 hours</li>
<li><strong>Registration fee:</strong> $200 (plus tax where applicable)</li>
<li><strong>Languages:</strong> English, Japanese.</li>
<li><strong>Exam format:</strong> Multiple choice and multiple select, taken remotely or in person at a test center. <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">Locate a test center near you</a>.</li>
<li><strong>Exam Delivery Method:</strong><br>• Take the online-proctored exam from a remote location, review the online testing <a target="_blank" href="https://www.webassessor.com/wa.do?page=certInfo&amp;branding=GOOGLECLOUD&amp;tabs=13">requirements</a>.<br>• Take the onsite-proctored exam at a testing center, <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">locate a test center near you</a>.</li>
<li><strong>Prerequisites:</strong> None</li>
<li><strong>Recommended experience:</strong> 3+ years of industry experience including 1+ years designing and managing solutions using GCP.</li>
</ul>
<p><strong>Reference:</strong> <a target="_blank" href="https://cloud.google.com/certification/cloud-architect">https://cloud.google.com/certification/cloud-architect</a></p>
<h3 id="heading-professional-data-engineer">Professional Data Engineer</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/06/data-engineer-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Professional Data Engineer certification logo</em></p>
<ul>
<li><strong>Length:</strong> 2 hours</li>
<li><strong>Registration fee:</strong> $200 (plus tax where applicable)</li>
<li><strong>Languages:</strong> English, Japanese.</li>
<li><strong>Exam format:</strong> Multiple choice and multiple select taken remotely or in person at a test center. <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">Locate a test center near you</a>.</li>
<li><strong>Exam Delivery Method:</strong><br>• Take the online-proctored exam from a remote location, review the online testing <a target="_blank" href="https://www.webassessor.com/wa.do?page=certInfo&amp;branding=GOOGLECLOUD&amp;tabs=13">requirements</a>.<br>• Take the onsite-proctored exam at a testing center, <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">locate a test center near you</a>.</li>
<li><strong>Prerequisites:</strong> None</li>
<li><strong>Recommended experience:</strong> 3+ years of industry experience including 1+ years designing and managing solutions using GCP.</li>
</ul>
<p><strong>Reference:</strong> <a target="_blank" href="https://cloud.google.com/certification/data-engineer">https://cloud.google.com/certification/data-engineer</a></p>
<h3 id="heading-professional-cloud-security-engineer">Professional Cloud Security Engineer</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/06/security-engineer-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Professional Cloud Security Engineer certification logo</em></p>
<ul>
<li><strong>Length:</strong> 2 hours</li>
<li><strong>Registration fee:</strong> $200 (plus tax where applicable)</li>
<li><strong>Languages:</strong> English.</li>
<li><strong>Exam format:</strong> Multiple choice and multiple select, taken in person at a test center. <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">Locate a test center near you</a>.</li>
<li><strong>Prerequisites:</strong> None</li>
<li><strong>Recommended experience:</strong> 3+ years of industry experience including 1+ years designing and managing solutions using GCP.</li>
</ul>
<p><strong>Reference:</strong> <a target="_blank" href="https://cloud.google.com/certification/cloud-security-engineer">https://cloud.google.com/certification/cloud-security-engineer</a></p>
<h3 id="heading-professional-cloud-network-engineer">Professional Cloud Network Engineer</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/06/network-engineer-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Professional Cloud Network Engineer certification logo</em></p>
<ul>
<li><strong>Length:</strong> 2 hours</li>
<li><strong>Registration fee:</strong> $200 (plus tax where applicable)</li>
<li><strong>Languages:</strong> English.</li>
<li><strong>Exam format:</strong> Multiple choice and multiple select, taken in person at a test center. <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">Locate a test center near you</a>.</li>
<li><strong>Prerequisites:</strong> None</li>
<li><strong>Recommended experience:</strong> 3+ years of industry experience including 1+ years designing and managing solutions using GCP.</li>
</ul>
<p><strong>Reference:</strong> <a target="_blank" href="https://cloud.google.com/certification/cloud-network-engineer">https://cloud.google.com/certification/cloud-network-engineer</a></p>
<h3 id="heading-professional-cloud-developer">Professional Cloud Developer</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/06/cloud-developer-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Professional Cloud Developer certification logo</em></p>
<p>Honestly, this one caught me by surprise. Back in December 2019, when I took the exam, <strong>there wasn't anything saying it was in beta</strong>. By that time, this was the information available at the exam page:</p>
<ul>
<li><strong>Length</strong>: 2 hours</li>
<li><strong>Registration fee</strong>: $200 (plus tax where applicable)</li>
<li><strong>Languages</strong>: English, Japanese.</li>
<li><strong>Exam format</strong>: Multiple choice and multiple select, taken in person at a test center. <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">Locate a test center near you</a>.</li>
<li><strong>Prerequisites</strong>: None</li>
<li><strong>Recommended experience</strong>: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.</li>
</ul>
<p>At the time of this writing, though, the information available on the exam page is very different, as you can see below:</p>
<p>Beta certification exams are newly developed assessments. We gather performance statistics on the questions and use these statistics to create the certification standards for the final exams. If you pass, you are Google Cloud Certified.</p>
<ul>
<li>Save 40% on the cost of certification</li>
<li>Prove early adoption by claiming a low certificate number if you pass</li>
<li>Get exclusive Google-branded apparel</li>
<li>Refer to our <a target="_blank" href="https://cloud.google.com/certification/faqs#1">FAQs</a> for more details</li>
</ul>
<p><strong>Specifics about the beta</strong></p>
<ul>
<li><strong>Length</strong>: 4 hours</li>
<li><strong>Registration fee</strong>: $120 USD (40% discount on retail price of $200 USD) (plus tax where applicable)</li>
<li><strong>Languages</strong>: English.</li>
<li><strong>Exam format</strong>: Multiple choice and multiple select, taken in person at a test center. <a target="_blank" href="https://www.kryteriononline.com/Locate-Test-Center">Locate a test center near you</a>.</li>
<li><strong>Prerequisites</strong>: None</li>
<li><strong>Recommended experience</strong>: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.</li>
<li><strong>Beta exam preparation resources</strong>:  </li>
<li>To take the upcoming beta exam, use <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-developer-2">the revised exam guide</a>.</li>
</ul>
<p><strong>Reference:</strong> <a target="_blank" href="https://cloud.google.com/certification/cloud-developer">https://cloud.google.com/certification/cloud-developer</a></p>
<h2 id="heading-the-preparation-process">The Preparation Process</h2>
<p>Now that you have all the basic information about the exams, it's time to study and get ready to pass them. My preparation process for the five exams I took involved the following steps:</p>
<ol>
<li><p>Read the exam overviews:<br>•  <a target="_blank" href="https://cloud.google.com/certification/cloud-architect">Professional Cloud Architect exam overview</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/data-engineer">Professional Data Engineer exam overview</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/cloud-security-engineer">Professional Cloud Security Engineer exam overview</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/cloud-network-engineer">Professional Cloud Network Engineer exam overview</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/cloud-developer">Professional Cloud Developer exam overview</a>  </p>
</li>
<li><p>Read the exam guides:<br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-architect">Professional Cloud Architect exam guide</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/data-engineer">Professional Data Engineer exam guide</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-security-engineer">Professional Cloud Security Engineer exam guide</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-network-engineer">Professional Cloud Network Engineer exam guide</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-developer">Professional Cloud Developer exam guide</a>  </p>
</li>
<li><p>Next, visit the products page of the platform and identify each product that may be related to the topics listed on the exam guides. For GCP, you can find this list <a target="_blank" href="https://cloud.google.com/products/">here</a>.  </p>
</li>
<li><p>For each of the products identified in the prior step, visit its <strong>Documentation/Concepts</strong> <strong>page</strong> and start reading about each of the concepts that are relevant for the given product. Check the <a target="_blank" href="https://cloud.google.com/compute/docs/concepts">GCE concepts page</a>, for example.  </p>
</li>
<li><p>You’ll probably notice some products seem to overlap with each other, and you might find it difficult to know when to use one or the other. Google Search is your best friend here. :)  </p>
</li>
<li><p>For the <strong>Cloud Architect</strong> exam, after going through each product and its concepts, read the sample study cases provided by Google and try to design potential solutions that could address the requirements described on them.<br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-architect/casestudy-mountkirkgames-rev2">Mountkirk games study case</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-architect/casestudy-dress4win-rev2">Dress4Win study case</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/guides/cloud-architect/casestudy-terramearth-rev2">TerramEarth study case</a>  </p>
</li>
</ol>
<p>For the <strong>Data Engineer</strong> exam, the sample study cases were removed and were no longer part of the exam, at least as of June 2019.  </p>
<p><strong>All the other exams</strong> didn't make use of sample study cases, as of 2019.  </p>
<ol start="7">
<li><p>Finally, take the practice exams. The practice exams provide an explanation for each of the questions after you finish them. They also help you get an idea of the format of the questions you’ll face on each exam and will help you know how prepared you are.<br>•  <a target="_blank" href="https://cloud.google.com/certification/practice-exam/cloud-architect">Professional Cloud Architect practice exam</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/practice-exam/data-engineer">Professional Data Engineer practice exam</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/practice-exam/cloud-security-engineer">Professional Cloud Security Engineer practice exam</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/practice-exam/cloud-network-engineer">Professional Cloud Network Engineer practice exam</a><br>•  <a target="_blank" href="https://cloud.google.com/certification/practice-exam/cloud-developer">Professional Cloud Developer practice exam</a>  </p>
</li>
<li><p>After finishing the practice exams, take notes of the topics that didn’t go well and re-read the relevant documentation collected on <strong>step 4</strong> above.  </p>
</li>
<li><p>Take the practice exams again (you can take them as many times as you want) and keep <strong>repeating steps from 6 to 8</strong> until you feel confident to take the real exams.</p>
</li>
</ol>
<h1 id="heading-study-guides">Study guides</h1>
<p>From my own experience, I can tell you <strong>it's a lot of work</strong>. For this reason, I decided to <strong>contribute back to the community and share the study guides</strong> I created throughout my preparation process.</p>
<p>The study guides are contained in the spreadsheet linked below, each on a separate tab.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://docs.google.com/spreadsheets/d/1LUtqhOEjUMySCfn3zj8Arhzcmazr3vrPzy7VzJwIshE/edit#gid=0">https://docs.google.com/spreadsheets/d/1LUtqhOEjUMySCfn3zj8Arhzcmazr3vrPzy7VzJwIshE/edit#gid=0</a></div>
<p>To use it, create your own copy. Once you do, the copy will be writable and you’ll be able to update the <strong>Status</strong> column, which will help you track your progress through the material:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/06/image-43.png" alt="Image" width="600" height="400" loading="lazy">
<em>A screenshot of the spreadsheet with reference material for both professional certification exams.</em></p>
<p>You can freely copy, change and distribute this material. The only thing I kindly ask is that you keep a reference to the original material and give me proper credit, if you feel it's helpful.</p>
<p>Even though the Cloud Developer exam is back in beta, I believe the guide I created is still relevant, as it covers a lot of the topics (probably even more than what's needed).</p>
<h2 id="heading-disclaimer">Disclaimer</h2>
<p>I'm sharing these guides with the only intent of helping people aiming to take the <strong>Google Cloud Professional certification exams</strong>. Be advised there is no guarantee that following the guides will make you pass the exams. Use them at your own discretion.</p>
<h1 id="heading-tips-for-taking-your-exams">Tips for taking your exams</h1>
<ol>
<li>Know what each product does, what it’s good for and what it’s not good for, as well as its billing characteristics.</li>
<li>As you can see above, except for the Cloud Developer exam, which seems to be back in beta, you have <strong>2 hours to finish the exams</strong>. Keep in mind that <strong>good time management is crucial for your success</strong>.</li>
<li><strong>Don’t spend too much time on questions you don’t know</strong>. If you aren’t sure about an answer, mark the question to be reviewed later and move on to the next questions.</li>
<li><strong>Practice as much as possible using the practice exams</strong>.</li>
</ol>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In my opinion, the <strong>greatest value of a certification</strong> is to help you know which subjects are important to learn as a professional, if you are willing to work with a specific technology.</p>
<p>The <strong>Professional Cloud Security Engineer</strong> certification exam helped me a lot in guiding my studies to learn more about the security aspects of the Google Cloud Platform. It helped me <strong>learn about specific concerns</strong> we should have when considering or using many of the platform products from a security standpoint.</p>
<p>The Cloud Network Engineer certification was the hardest one I took out of the five. I believe that's because my whole career so far has been focused on software development.</p>
<p>I recognize that, just because I got certified, it <strong>doesn’t mean I am</strong> now <strong>a network specialist</strong> (and, honestly, I don’t really intend to be). As some people say, a certification is just a “piece of paper”, right? On the other hand, I certainly learned a lot during this process and <strong>having some networking skills under my belt certainly makes me a better professional</strong>.</p>
<p>In fact, some of these skills have already helped me solve some infrastructure issues for one of my clients.</p>
<p>Besides all that, certifications are still highly valued by the market and may help you stand out from the crowd.</p>
<p>As a final tip, if you aren't sure which certification you should take first, this is the order I'd recommend (unless you have specific needs related to your job or are strongly focused on a specific area):</p>
<ol>
<li>Professional Cloud Architect</li>
<li>Professional Cloud Security Engineer</li>
<li>Professional Data Engineer</li>
<li>Professional Cloud Network Engineer</li>
<li>Professional Cloud Developer (because it's back in beta, otherwise it would probably be number 3 in this list)</li>
</ol>
<p>I hope this article and the referenced study guides help you in your journey to become a <strong>Google Cloud certified professional</strong> and I <strong>wish you all the success</strong> in your career!</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/04/image-168.png" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://unsplash.com/@mahdigp?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Mahdi Dastmard</a> / <a target="_blank" href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to run Laravel on Google Cloud Run with Continuous Integration - a step by step guide ]]>
                </title>
                <description>
                    <![CDATA[ By Geshan Manandhar Laravel has soared in popularity over the last few years. The Laravel community even says that Laravel has made writing PHP more enjoyable instead of a pain. Laravel 6 has some interesting new features. Getting a super scaleable w... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-setup-laravel-6-on-google-cloud-run-with-continuous-integration-ci-step-by-step/</link>
                <guid isPermaLink="false">66d45edd73634435aafcefa2</guid>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud run ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Laravel ]]>
                    </category>
                
                    <category>
                        <![CDATA[ PHP ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Software Engineering ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 14 Nov 2019 20:04:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/11/laravel6-on-gcr-f.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Geshan Manandhar</p>
<p>Laravel has <a target="_blank" href="https://trends.google.com/trends/explore?date=2014-11-12%202019-11-12&amp;q=laravel,symfony">soared</a> in popularity over the last few years. The Laravel community even says that Laravel has made writing PHP enjoyable rather than a pain. Laravel 6 has some interesting new <a target="_blank" href="https://laracasts.com/series/whats-new-in-laravel-6">features</a>. Getting a highly scalable, working URL for your application usually takes hours, if not days, and setting up something like Kubernetes is a huge task. This is where Google Cloud Run shines: you can get a working HTTPS URL for any of your containerized apps in minutes.</p>
<p><a target="_blank" href="https://cloud.google.com/run/">Google Cloud Run</a> is serverless and fully managed by Google. You get super scale, billing by the second, HTTPS URLs, and your own domain mapping. If you want to run stateless containers, Cloud Run is hands down the easiest way to do it. In this post, I will detail how to get your Laravel 6 app working on Google Cloud Run with Continuous Integration (CI).</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li>You are familiar with PHP/Composer and aware of Laravel (if you’ve landed here you are, I suppose)</li>
<li>You know how to use Git from the CLI</li>
<li>Your code is hosted on GitHub for CI/CD and you are familiar with GitHub</li>
<li>You know a fair bit of Docker, maybe even multi-stage builds</li>
<li>You have a working Google Cloud account (they give you <a target="_blank" href="https://cloud.google.com/free/">$300 credit</a> free for 1 year, so there is no reason not to have an account)</li>
</ul>
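<p>Before starting, it can help to confirm that the command line tools behind these prerequisites are installed. The snippet below is a minimal sketch, not part of the original setup: the helper name <code>missing_tools</code> is made up for illustration, and the tool list (<code>php</code>, <code>composer</code>, <code>git</code>, <code>docker</code>) is assumed from the list above.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every tool from the argument list
# that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v exits non-zero when the tool cannot be found
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tool names assumed from the prerequisites list above.
missing="$(missing_tools php composer git docker)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All prerequisite tools were found."
fi
```

<p>If anything is reported as missing, install it before moving on to the deployment steps.</p>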
<h2 id="heading-why-is-cloud-run-a-great-option-for-beginners"><strong>Why is Cloud Run a great option for beginners?</strong></h2>
<p>For two main reasons:</p>
<ol>
<li>You learn about best practices and tools like Docker and CI/CD</li>
<li>Getting the basics going just involves clicking a button, selecting 2 things, waiting for 5 mins, and you get a working HTTPS URL. Can it be any easier than this? :)</li>
</ol>
<h2 id="heading-steps-to-deploy"><strong>Steps to deploy</strong></h2>
<p>Below are the steps to set up and deploy Laravel 6 on Cloud Run:</p>
<h3 id="heading-1-clone-laravel-or-new-laravel-project"><strong>1. Clone Laravel or new Laravel project</strong></h3>
<p>Start by cloning Laravel or using composer or the Laravel CLI as indicated in the official <a target="_blank" href="https://laravel.com/docs/5.8/installation">installation</a> guide. I am using composer to get the latest Laravel as below:</p>
<h4 id="heading-command"><strong>Command</strong></h4>
<p>I ran the following command to get the latest Laravel:</p>
<pre><code class="lang-bash">composer create-project --prefer-dist laravel/laravel laravel6-on-google-cloud-run
</code></pre>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/01install-laravel.jpg" alt="Installing Laravel with composer" width="800" height="395" loading="lazy">
<em>Creating a new Laravel Project with Composer</em></p>
<h3 id="heading-2-test-it-locally-first"><strong>2. Test it locally first</strong></h3>
<p>Then run <code>cd laravel6-on-google-cloud-run</code> then <code>php artisan serve</code> to see if it is working. For me it was fine when I went to <code>http://localhost:8000</code> on a web browser. I had PHP 7.2 installed locally.</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/02running-laravel.jpg" alt="Running Laravel locally" width="800" height="410" loading="lazy">
<em>Laravel running locally without Docker</em></p>
<h3 id="heading-3-create-a-new-github-repo"><strong>3. Create a new GitHub repo</strong></h3>
<p>Create a new repository on GitHub like below:</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/03github-repo.jpg" alt="Creating a repo for Laravel on Github" width="767" height="616" loading="lazy">
<em>Create a new public repo on GitHub</em></p>
<p>You can use any Git hosting provider, but for this example I will be using <a target="_blank" href="https://github.com/features/actions">GitHub Actions</a> to run tests (and GitHub is the most popular Git hosting tool).</p>
<h3 id="heading-4-add-repo-push-readme"><strong>4. Add repo, push readme</strong></h3>
<p>Now after you have the repo created, add it to your local Laravel copy and push the Readme file. To do this run the following commands on your CLI:</p>
<pre><code>git init
code . # I used VS code to change the readme
git add readme.md
git commit -m <span class="hljs-string">"Initial commit -- App Readme"</span>
git remote add origin git@github.com:geshan/laravel6-on-google-cloud-run.git
git push -u origin master
</code></pre><h4 id="heading-after-running-the-above-commands-i-had-this-on-my-github-repo"><strong>After running the above commands I had this on my Github repo</strong></h4>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/04initial-push.jpg" alt="After the first push, repo looks like this" width="1054" height="736" loading="lazy">
<em>Repo after pushing the readme to master branch</em></p>
<h3 id="heading-5-add-full-laravel-open-a-pr"><strong>5. Add full Laravel, open a PR</strong></h3>
<p>Now let’s add the whole app as a PR to the Github repo by executing the following commands:</p>
<pre><code>git checkout -b laravel6-full-app
git add .gitignore
git add .
git commit -m <span class="hljs-string">"Add the whole Laravel 6 app"</span>
git push origin laravel6-full-app
</code></pre><p>After that, go and open a Pull Request (PR) on the repo, like <a target="_blank" href="https://github.com/geshan/laravel6-on-google-cloud-run/pull/1">this</a> one. You might be thinking: I am the only one working on this, so why do I need a PR? Well, it is always better to do things methodically, even if it is just one person working on the project :).</p>
<p>After that merge your pull request.</p>
<h3 id="heading-6-setup-tests-with-github-actionshttpsgithubcomfeaturesactions"><strong>6. Setup tests with <a target="_blank" href="https://github.com/features/actions">GitHub actions</a></strong></h3>
<p>Now the fun part: after you merge your PR, GitHub knows that this is a Laravel project. Click on the <code>Actions</code> tab on your repo page and you should be able to see something like below:</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/05github-actions.jpg" alt="Click Actions tab to view options" width="800" height="440" loading="lazy">
<em>Setup the CI workflow for Laravel with Github Actions</em></p>
<p>Click the <code>Set up this workflow</code> under <code>Laravel</code> then on the next page click the <code>Start commit</code> button on the top right. After that add a commit message like below and click <code>Commit new file</code>.</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/06gh-actions-ci.jpg" alt="Add Laravel tests action" width="800" height="418" loading="lazy">
<em>Steps to setup the CI workflow with Github Actions</em></p>
<p>There you go, you have your CI setup. Laravel default tests will run on each git push now. Wasn’t that easy? Thank Github for this great intelligence. No more creating <code>.myCIname.yml</code> files :).</p>
<h3 id="heading-7-add-docker-and-docker-compose-to-run-app-locally"><strong>7. Add docker and docker-compose to run app locally</strong></h3>
<p>Now let’s add docker and docker-compose to run the app locally without PHP or artisan serve. </p>
<p>We will need the container to run Laravel on Google Cloud Run too. This part is inspired by the <a target="_blank" href="https://nsirap.com/posts/010-laravel-on-google-cloud-run/">Laravel on Google Cloud Run</a> post by Nicolas. If you want to learn more about <a target="_blank" href="https://www.docker.com/">Docker</a> and Laravel please refer to this <a target="_blank" href="https://geshan.com.np/blog/2015/10/getting-started-with-laravel-mariadb-mysql-docker/">post</a>.</p>
<p>Run the following commands first to get your master up to date as we added the <code>workflow</code> file from Github interface:</p>
<pre><code>git checkout master
git fetch
git pull --rebase origin master # <span class="hljs-keyword">as</span> we added the workflow file <span class="hljs-keyword">from</span> github interface
git checkout -b docker
</code></pre><p>Add a key to the <code>.env.example</code> file. Copy it from the <code>.env</code> file like below:</p>
<pre><code>APP_NAME=Laravel
APP_ENV=local
APP_KEY=base64:DJkdj8L5Di3rUkUOwmBFCrr5dsIYU/s7s+W52ClI4AA=
APP_DEBUG=<span class="hljs-literal">true</span>
APP_URL=http:<span class="hljs-comment">//localhost</span>
</code></pre><p>As this is just a demo, this is OK to do. For a real app, always be careful with secrets. For production-ready apps, do turn off debugging and other dev-related settings.</p>
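<p>To make the point about secrets and debugging concrete, here is a minimal sketch of deriving production-style settings from an env file. The helper names <code>generate_app_key</code> and <code>make_production_env</code> are made up for illustration. The key is 32 random bytes, base64-encoded, which matches the shape of the <code>APP_KEY</code> value shown above. In a real project you would normally let <code>php artisan key:generate</code> create the key and keep it out of version control.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a key in the "base64:..." shape used by
# APP_KEY above (32 random bytes, base64-encoded).
generate_app_key() {
  echo "base64:$(head -c 32 /dev/urandom | base64 | tr -d '\n')"
}

# Hypothetical helper: read the env file named by $1 and print a copy
# with a fresh key, APP_ENV=production and debugging switched off.
make_production_env() {
  local key
  key="$(generate_app_key)"
  sed -e "s|^APP_ENV=.*|APP_ENV=production|" \
      -e "s|^APP_DEBUG=.*|APP_DEBUG=false|" \
      -e "s|^APP_KEY=.*|APP_KEY=${key}|" "$1"
}
```

<p>For example, <code>make_production_env .env.example</code> prints the rewritten settings to the terminal without changing the original file, so you can review them before saving them anywhere.</p>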
<p>Add the following <code>Dockerfile</code> on the project root:</p>
<pre><code>FROM composer:<span class="hljs-number">1.9</span><span class="hljs-number">.0</span> <span class="hljs-keyword">as</span> build
WORKDIR /app
COPY . /app
RUN composer <span class="hljs-built_in">global</span> <span class="hljs-built_in">require</span> hirak/prestissimo &amp;&amp; composer install

FROM php:<span class="hljs-number">7.3</span>-apache-stretch
RUN docker-php-ext-install pdo pdo_mysql

EXPOSE <span class="hljs-number">8080</span>
COPY --<span class="hljs-keyword">from</span>=build /app /<span class="hljs-keyword">var</span>/www/
COPY docker/<span class="hljs-number">000</span>-<span class="hljs-keyword">default</span>.conf /etc/apache2/sites-available/<span class="hljs-number">000</span>-<span class="hljs-keyword">default</span>.conf
COPY .env.example /<span class="hljs-keyword">var</span>/www/.env
RUN chmod <span class="hljs-number">777</span> -R /<span class="hljs-keyword">var</span>/www/storage/ &amp;&amp; \
    echo <span class="hljs-string">"Listen 8080"</span> &gt;&gt; <span class="hljs-regexp">/etc/</span>apache2/ports.conf &amp;&amp; \
    chown -R www-data:www-data /<span class="hljs-keyword">var</span>/www/ &amp;&amp; \
    a2enmod rewrite
</code></pre><p>Then add the following file at <code>docker/000-default.conf</code></p>
<pre><code>&lt;VirtualHost *:<span class="hljs-number">8080</span>&gt;

  ServerAdmin webmaster@localhost
  DocumentRoot /<span class="hljs-keyword">var</span>/www/public/

  <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Directory</span> /<span class="hljs-attr">var</span>/<span class="hljs-attr">www</span>/&gt;</span></span>
    AllowOverride All
    Require all granted
  &lt;/Directory&gt;

  ErrorLog ${APACHE_LOG_DIR}/error.log
  CustomLog ${APACHE_LOG_DIR}/access.log combined

&lt;/VirtualHost&gt;
</code></pre><p>After that add the following <code>docker-compose.yml</code></p>
<pre><code>version: <span class="hljs-string">'3'</span>
<span class="hljs-attr">services</span>:
  app:
    build:
      context: ./
    volumes:
      - .:<span class="hljs-regexp">/var/</span>www
    <span class="hljs-attr">ports</span>:
      - <span class="hljs-string">"8080:8080"</span>
    <span class="hljs-attr">environment</span>:
      - APP_ENV=local
</code></pre><h4 id="heading-lets-boil-down-to-main-things"><strong>Let's Boil down to main things</strong></h4>
<p>If you try to understand everything here it might be overwhelming, so let me boil down the main parts:</p>
<ol>
<li>We are using the official PHP Apache docker image to run Laravel, which has PHP version 7.3.</li>
<li>We are using a multistage build to get the dependencies with Composer then we're copying them to the main docker image that has PHP 7.3 and Apache.</li>
<li>Google Cloud Run requires the web server to listen on port <code>8080</code>, so we are using <code>000-default.conf</code> to configure this.</li>
<li>To make things easy to run with the single command <code>docker-compose up</code> we are using docker-compose.</li>
<li>Now as you have read this far, run <code>docker-compose up</code> on your root and then after everything runs go to <code>http://localhost:8080</code> to see that Laravel 6 is running locally on Docker. Below is my <code>docker-compose up</code> output towards the end:</li>
</ol>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/07docker-compose-output.jpg" alt="Docker compose running Laravel with PHP 7.3 and Apache" width="800" height="306" loading="lazy">
<em>Docker compose running successfully on local machine</em></p>
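<p>The setup above hard-codes port <code>8080</code>. Cloud Run also passes the port to the container in the <code>PORT</code> environment variable, so a slightly more flexible variant could read that value and fall back to <code>8080</code>. The snippet below is only a sketch of that idea. The function names are made up for illustration, and it is not wired into the Dockerfile from this post.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: use the PORT environment variable when it is set,
# otherwise fall back to 8080.
resolve_port() {
  echo "${PORT:-8080}"
}

# Hypothetical helper: print the Apache "Listen" directive for that port,
# the same kind of line the Dockerfile appends to ports.conf.
listen_directive() {
  echo "Listen $(resolve_port)"
}

listen_directive
```

<p>For this demo the fixed port is perfectly fine. Reading <code>PORT</code> only matters if you expect the port to differ between environments.</p>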
<p>As Laravel is running fine with Docker, let’s open a PR like <a target="_blank" href="https://github.com/geshan/laravel6-on-google-cloud-run/pull/2/files">this</a> one to add Docker to our project. I ran the following commands on the root of the project before opening the Pull Request (PR):</p>
<pre><code>git status
</code></pre><p>It should give you something like below:</p>
<pre><code>On branch docker
Untracked files:
  (use <span class="hljs-string">"git add &lt;file&gt;..."</span> to include <span class="hljs-keyword">in</span> what will be committed)

  Dockerfile
  docker-compose.yml
  docker/

nothing added to commit but untracked files present (use <span class="hljs-string">"git add"</span> to track)
</code></pre><p>Now run the following commands:</p>
<pre><code>git add .
git commit -m <span class="hljs-string">"Add docker and docker compose"</span>
git push origin docker
</code></pre><p>As a bonus it will run the Laravel default test on the push, like you can see below:</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/08test-running-gh.jpg" alt="On each push PHP unit tests will run" width="800" height="486" loading="lazy">
<em>Default Laravel tests running with Github Actions</em></p>
<p>Only the owner of the repo has access to the <code>Actions</code> tab so other people don’t necessarily need to know the results of your test builds :).</p>
<h3 id="heading-8-add-deploy-to-google-cloud-buttonhttpsgithubcomgooglecloudplatformcloud-run-button"><strong>8. Add deploy to <a target="_blank" href="https://github.com/GoogleCloudPlatform/cloud-run-button">Google Cloud button</a></strong></h3>
<p>Now let’s deploy this Laravel setup to Google Cloud Run the easy way. Given that you have merged your PR from the <code>docker</code> branch, let’s run the following commands:</p>
<pre><code>git checkout master
git fetch
git pull --rebase origin master
git checkout -b cloud-run-button
</code></pre><p>Then add the following to your <code>readme.md</code> file:</p>
<pre><code>### Run on Google cloud run

[![Run on Google Cloud](https://storage.googleapis.com/cloudrun/button.svg)](https://console.cloud.google.com/cloudshell/editor?shellonly=true&amp;cloudshell_image=gcr.io/cloudrun/button&amp;cloudshell_git_repo=https://github.com/geshan/laravel6-on-google-cloud-run.git)
</code></pre><p>Be careful to replace the last part with your own repo’s <code>HTTPS</code> URL. For example, if your repo is at <code>https://github.com/ghaleroshan/laravel6-on-google-cloud-run</code>, it will be <code>https://github.com/ghaleroshan/laravel6-on-google-cloud-run.git</code>. Then commit and push. Your PR should look something like <a target="_blank" href="https://github.com/geshan/laravel6-on-google-cloud-run/pull/3/files">this</a> one.</p>
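<p>If you would rather not edit the URL by hand, the snippet can be generated. Here is a minimal sketch in JavaScript (Node); the helper name <code>cloudRunButtonMarkdown</code> is hypothetical and not part of the Cloud Run button project. It only reproduces the readme line shown above for a given repo URL:</p>

```javascript
// Hypothetical helper: builds the readme line for the "Run on Google Cloud" button.
function cloudRunButtonMarkdown(repoUrl) {
  // Accept the repo page URL with or without the trailing ".git".
  const gitUrl = repoUrl.endsWith('.git') ? repoUrl : repoUrl + '.git';
  const badge = 'https://storage.googleapis.com/cloudrun/button.svg';
  const target =
    'https://console.cloud.google.com/cloudshell/editor' +
    '?shellonly=true' +
    '&cloudshell_image=gcr.io/cloudrun/button' +
    '&cloudshell_git_repo=' + gitUrl;
  return '[![Run on Google Cloud](' + badge + ')](' + target + ')';
}

// Prints the markdown line to paste into readme.md.
console.log(cloudRunButtonMarkdown('https://github.com/ghaleroshan/laravel6-on-google-cloud-run'));
```

<p>Paste the printed line into your <code>readme.md</code> under the heading shown above.</p>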
<h3 id="heading-9-deploy-on-google-cloud-run"><strong>9. Deploy on Google Cloud Run</strong></h3>
<p>After you merge your Pull Request (PR), go to your repo page and click on the <code>Run on Google Cloud</code> button.</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/09cloud-run-button.jpg" alt="Click on the blue button to deploy the app" width="800" height="442" loading="lazy">
<em>GitHub readme after adding the Run on Google Cloud button</em></p>
<p>After that, if you are logged into your Google account and have Google Cloud set up with at least one project, click “Proceed”. You might need to wait a bit, then:</p>
<ol>
<li>Choose the project – <code>Choose a project to deploy this application</code></li>
<li>Choose the region – <code>Choose a region to deploy this application</code>. I usually go with <code>us-central1</code></li>
<li>Then wait for the container to be built and deployed. You can see my process below:</li>
</ol>
<p>If everything goes fine on your <code>Google Cloud Shell</code>, you will see the HTTPS URL that you can hit to see your Laravel app running like below:</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/10laravel-running-gcr.jpg" alt="Hit the given URL to see its running" width="800" height="387" loading="lazy">
<em>Deploy screen of Google Cloud Shell for Laravel 6 on Cloud Run</em></p>
<p>What just happened above is:</p>
<ol>
<li>After you chose the region, the script built a Docker container image from the <code>Dockerfile</code> in the repo</li>
<li>Then it pushed the built image to <a target="_blank" href="https://cloud.google.com/container-registry/">Google Container Registry</a></li>
<li>After that using the <a target="_blank" href="https://cloud.google.com/sdk/gcloud/">gcloud</a> CLI it deployed the built image to Cloud Run, which gave back the URL.</li>
</ol>
<h3 id="heading-10-hurray-your-app-is-working"><strong>10. Hurray, your app is working</strong></h3>
<p>After you get the URL, you should see your app working on Google Cloud Run like below:</p>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/11laravel-url.jpg" alt="Laravel Running on Google Cloud Run" width="800" height="367" loading="lazy">
<em>Laravel running on Google Cloud Run with Serverless HTTPS URL :)</em></p>
<p>If you want to deploy another version you can merge your PR to master and click the button again to deploy.</p>
<h2 id="heading-more-about-google-cloud-run"><strong>More about Google Cloud Run</strong></h2>
<p>The <a target="_blank" href="https://cloud.google.com/run/pricing">pricing</a> for Google Cloud Run is very generous. You can run any containerized app or web app on Google Cloud Run. I ran a pet project that got about 1 request per minute and I did not have to pay anything.</p>
<p>Behind the scenes, it is using <a target="_blank" href="https://cloud.google.com/knative/">Knative</a> and <a target="_blank" href="https://kubernetes.io/">Kubernetes</a>. It can also be run on your own Kubernetes cluster, but who would choose to manage a K8s cluster when you can just push and get a scalable, serverless, fully managed app :).</p>
<h2 id="heading-tldr"><strong>TLDR</strong></h2>
<p>To run Laravel 6 on Google Cloud Run quickly, follow the steps below:</p>
<ol>
<li>Make sure you are logged into your <a target="_blank" href="https://console.cloud.google.com/">Google Cloud Account</a></li>
<li>Go to <a target="_blank" href="https://github.com/geshan/laravel6-on-google-cloud-run">https://github.com/geshan/laravel6-on-google-cloud-run</a></li>
<li>Click the “Run On Google Cloud” blue button</li>
<li>Select your project</li>
<li>Select your region</li>
<li>Wait, then get the URL of your Laravel app as shown below. Enjoy!</li>
</ol>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/10laravel-running-gcr.jpg" alt="Hit the given URL to see its running" width="800" height="387" loading="lazy">
<em>Deploy log of Cloud Run button to deploy Laravel on Google Cloud Run</em></p>
<hr>
<p><img src="https://geshan.com.np/images/laravel6-on-google-cloud-run/11laravel-url.jpg" alt="Laravel Running on Google Cloud Run" width="800" height="367" loading="lazy">
<em>Laravel Running successfully on Google Cloud Run</em></p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>There you go: running a Laravel app on Google Cloud Run was pretty easy. You even have tests running on GitHub with GitHub Actions. Hope it helps. For a CI/CD approach, you can check out this <a target="_blank" href="https://medium.com/google-cloud/simplifying-continuous-deployment-to-cloud-run-with-cloud-build-including-custom-domain-setup-ssl-22d23bed5cd6">post</a>. It shows deployment using Cloud Build. As the same container runs in both the local and production (Google Cloud Run) environments, you don’t need to learn a new framework to go serverless.</p>
<blockquote>
<p>Any containerized web app can be run on Google Cloud Run; it is a great service. You can read more at https://geshan.com.np</p>
</blockquote>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to check the weather using GCP-Cloud IoT Core with ESP32 and Mongoose OS ]]>
                </title>
                <description>
                    <![CDATA[ By Olivier LOURME This post on freecodecamp.org is not maintained. The most up to date version is on Medium: https://medium.com/free-code-camp/gcp-cloudiotcore-esp32-mongooseos-1st-5c88d8134ac7 This post is a step-by-step tutorial for newbies to Goog... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/gcp-cloudiotcore-esp32-mongooseos-1st-5c88d8134ac7/</link>
                <guid isPermaLink="false">66d4608a230dff0166905849</guid>
                
                    <category>
                        <![CDATA[ Firebase ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ iot ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 28 Feb 2019 22:07:21 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*GGNvAgxLJXeagpiyQnlHvQ.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Olivier LOURME</p>
<p>This post on freecodecamp.org is not maintained. The most up to date version is on Medium: <a target="_blank" href="https://medium.com/free-code-camp/gcp-cloudiotcore-esp32-mongooseos-1st-5c88d8134ac7">https://medium.com/free-code-camp/gcp-cloudiotcore-esp32-mongooseos-1st-5c88d8134ac7</a></p>
<p>This post is a step-by-step tutorial for newbies to <strong>Google Cloud Platform-Cloud IoT Core</strong>. The devices are <strong>ESP32 Wifi chips</strong> running <strong>Mongoose OS</strong>. Throughout this tutorial, we describe the concepts and then the setup of a simple <strong>IoT system</strong> <strong>measuring weather data</strong>.</p>
<p><strong>Live demo is here:</strong> <a target="_blank" href="https://hello-cloud-iot-core.firebaseapp.com/">https://hello-cloud-iot-core.firebaseapp.com/</a></p>
<p><strong>GitHub for last section</strong> <em>(Logging, storing and visualizing weather data with Firebase) <strong>is here:</strong></em> <a target="_blank" href="https://github.com/olivierlourme/iot-store-display">https://github.com/olivierlourme/iot-store-display</a></p>
<p><strong>This post is completed by a second one:</strong> see <a target="_blank" href="https://medium.com/@o.lourme/gcp-cloudiotcore-esp32-mongooseos-2nd-config-state-encrypt-7c5e937e5be9">here</a>.</p>
<h3 id="heading-introduction">Introduction</h3>
<h4 id="heading-1-history">1) History</h4>
<p>In a previous 3-post series [<a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-1-a07db495ac5f">link</a>, <a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-2-14b0609d3f5e">link</a>, <a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-3-644048e90ca4">link</a>], we used an <strong>ESP8266 Wifi chip</strong> to regularly measure luminosity and feed a database with the obtained data. The data set was ultimately plotted live in a web app (see the live plot here: [<a target="_blank" href="https://esp8266-rocks.firebaseapp.com/">link</a>]). We relied heavily on <strong>Firebase products</strong> (Realtime Database, Cloud Functions, SDK and Hosting) to meet our goals.</p>
<p>This project works fine, it draws very little power and we enjoyed developing it — but:</p>
<ul>
<li><strong>This project was okay to handle just a few connected sensors</strong>. Setting up a set of a hundred sensors would require a lot of (rigorous) manual intervention and monitoring them would be challenging as well. Indeed, there is no central place where we can manage our system.</li>
<li><strong>Arduino IDE</strong> and <strong>Arduino core for ESP8266</strong> were great for discovering ESP8266 but they <strong>quickly become insufficient</strong>: the IDE file management is really basic, there is only one program in the chip, and <strong>there is no Operating System</strong> <strong>providing useful APIs for IoT</strong>.</li>
<li><strong>FirebaseArduino</strong> <strong>library</strong>, allowing an ESP8266 to push data to a Firebase Realtime Database, <strong>was experimental</strong>. Some features like authentication should be improved. For now, the “secret” type authentication we used gives ESP8266 admin rights over the whole database!</li>
<li>Finally, <strong>ESP8266 SPI flash memory was not designed to be encrypted</strong>. In our first post [<a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-1-a07db495ac5f">link</a>], we showed how easy it was to recover a Wifi password by reading this memory.</li>
</ul>
<blockquote>
<p>In a word, this past project couldn’t be used in an industrial context. It was more a prototype for a Proof of Concept. We learnt a lot with it but today <strong>we’d like to develop a professional and fully secured solution capable of managing in a simple way a lot of connected sensors</strong>.</p>
</blockquote>
<p>This is why we decided to:</p>
<ul>
<li><strong>investigate Google Cloud Platform-Cloud IoT Core</strong> [<a target="_blank" href="https://cloud.google.com/iot-core/">link</a>] to manage our system: device setup, provisioning, authentication and monitoring;</li>
<li><strong>move from ESP8266 to ESP32</strong>, which offers memory encryption;</li>
<li><strong>run Mongoose OS</strong> [<a target="_blank" href="https://mongoose-os.com/">link</a>] in our ESP32s. This OS accepts programs written in JavaScript (JS) and provides a lot of APIs to deal with time, the MQTT protocol, sensors, provisioning, etc. It is easy to interface with the main IoT platforms, including Google Cloud Platform-Cloud IoT Core.</li>
</ul>
<h4 id="heading-2-a-word-about-esp32-wifi-chip"><strong>2) A word about ESP32 Wifi chip</strong></h4>
<p>ESP32 Wifi chip is a successor of the famous ESP8266 we described here: [<a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-1-a07db495ac5f">link</a>]. Compared to it, every feature is enhanced (speed up to 240 MHz, two cores, 520 kiB RAM, number of GPIOs, variety of peripherals, etc.) and there are some new ones (Bluetooth: legacy/BLE, <strong>4 MiB-flash memory encryption capability</strong>, <strong>cryptographic hardware acceleration</strong>: AES, SHA-2, RSA, ECC, RNG). There are a lot of resources on the web concerning ESP32. The following one deals with the <strong>ESP32 DEVKIT V1 development board</strong> that we will use and gives its pinout: [<a target="_blank" href="https://randomnerdtutorials.com/esp32-pinout-reference-gpios/">link</a>].</p>
<p>There is also this extensive resource concerning the wide variety of ESP32 chips and development kits: <a target="_blank" href="http://esp32.net/">http://esp32.net/</a>. On their home page, searching for “ESP32 DevKit” or “GeekCreit” leads to a link to the <a target="_blank" href="https://github.com/SmartArduino/ESP/blob/master/SchematicsforESP32.pdf">schematic</a> of our ESP32 DEVKIT V1. This development board embeds an official Espressif ESP32-WROOM-32 chip and costs about 6€ at Banggood.</p>
<h3 id="heading-basic-iot-concepts-explained-through-our-use-case">Basic IoT concepts explained through our use case</h3>
<p>So, what will be our playground for testing all these new tools?</p>
<blockquote>
<p>To illustrate IoT concepts through Cloud IoT Core, we chose to build <strong>a weather station</strong> <strong>reporting humidity and temperature from different places</strong>.</p>
</blockquote>
<p>For simplicity we’ll handle only 2 places: inside our house (“indoor”) and outside our house (“outdoor”). It’s up to you to deal with many more places.</p>
<h4 id="heading-1-project-hardware-esp32-amp-dht22">1) Project hardware : ESP32 &amp; DHT22</h4>
<p>At each of these places, we’ll install a connected sensor (<strong>“a device”</strong>) made of a <strong>DHT22</strong> <strong>humidity/temperature sensor</strong> (description: [<a target="_blank" href="https://learn.adafruit.com/dht">link</a>], datasheet: [<a target="_blank" href="https://cdn-shop.adafruit.com/datasheets/Digital+humidity+and+temperature+sensor+AM2302.pdf">link</a>], 4€ at Banggood) connected to an <strong>ESP32 DEVKIT V1</strong> development board. The DHT22 uses a kind of “1-Wire” protocol. Each ESP32 will run <strong>Mongoose OS</strong> as its operating system. Its installation on an ESP32, a Hello, World! and a test with a DHT22 are covered in the sections below.</p>
<p>The DHT22 specifications are given just below. In hindsight, we think the accuracy figures are a bit optimistic, but that’s not our concern today.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*WIxEy4NLdDr_2SVhcjmv0g.png" alt="Image" width="682" height="287" loading="lazy">
<em>DHT22 sensor characteristics ([<a target="_blank" href="https://learn.adafruit.com/dht/overview">link</a>])</em></p>
<p>We can already build the following assembly twice (one for indoor and one for outdoor). For now, power will come from the USB connector connected to our host machine. In production, power may come from a power bank.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*tFxxVtWwVH9gFajtrCWQ0w.png" alt="Image" width="800" height="508" loading="lazy">
<em>Assembly diagram: ESP32 DEVKIT V1 and DHT22 sensor constitute a “device”. Pinout is here: [<a target="_blank" href="https://randomnerdtutorials.com/esp32-pinout-reference-gpios/">link</a>]</em></p>
<p>That’s all for hardware! The rest of the project uses <strong>serverless solutions</strong> from Google. We describe them now…</p>
<h4 id="heading-2-project-architecture-cloud-iot-core-amp-firebase">2) Project architecture : Cloud IoT Core &amp; Firebase</h4>
<p>This whole “Project architecture” section is theoretical; there are no steps to perform. Its aim is to introduce vocabulary and notions related to IoT, more specifically when this domain involves Google Cloud solutions.</p>
<p>Here is the general architecture of our project:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*PttQxTGMbDHCGwzCsWlZvA.png" alt="Image" width="800" height="348" loading="lazy">
<em>Project architecture</em></p>
<p><em>Note:</em> There is no <strong>gateway</strong> between our devices and Cloud IoT Core because they “speak” MQTT.</p>
<p><em>Note:</em> Devices can also communicate with Cloud IoT Core via its <strong>HTTP bridge</strong>. As it performs less well than the <strong>MQTT bridge</strong> (see a comparison: [<a target="_blank" href="https://cloud.google.com/iot/docs/concepts/protocols">link</a>]), we will disallow this communication later during registry configuration. Limiting access to just what is necessary is a good practice.</p>
<p>Let’s explain this architecture in three sections:</p>
<ul>
<li>“From Devices to Cloud Pub/Sub” describes the classical Google IoT architecture.</li>
<li>“From Cloud Pub/Sub to data storage and visualization” describes the choices we made to exploit data.</li>
<li>“Additional config and state topics” completes this architecture presentation.</li>
</ul>
<p><strong>From Devices to Cloud Pub/Sub</strong></p>
<ul>
<li><em>Cloud IoT Core</em></li>
</ul>
<p><strong>Cloud IoT Core</strong> is the Google Cloud Platform service to which each of our <strong>registered devices</strong> will send temperature/humidity data. When such data is sent, we say that <strong>the device publishes a telemetry event</strong> (sometimes also called a “telemetry message”).</p>
<p><em>Note:</em> Pricing is detailed here: [<a target="_blank" href="https://cloud.google.com/iot/pricing">link</a>]. For small projects with a few devices, there’s little chance you’ll be charged.</p>
<ul>
<li><em>MQTT</em></li>
</ul>
<p>This publication is done through an <strong>MQTT connection</strong>. MQTT is a publish/subscribe-based messaging protocol; most of the time it runs over TCP [<a target="_blank" href="https://en.wikipedia.org/wiki/MQTT">link</a>] (or better: over TLS, itself running over TCP). The telemetry message has to be published by the device (an MQTT client) to the Cloud IoT Core “MQTT bridge” (an MQTT server) in an <strong>MQTT topic</strong> whose name must follow this format:</p>
<pre><code>/devices/{device-id}/events
</code></pre><p><em>Note:</em> Sub-folders in the topic name are possible. We won’t need this feature here but see [<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#publishing_telemetry_events">link</a>], as it can sometimes be useful.</p>
<p><code>{device-id}</code> is unique to each device. In our case, Mongoose OS creates it from the last 3 bytes of the MAC address of the ESP32. For example it could be <code>esp32_ABB3B4</code>.</p>
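<p>To make the naming concrete, here is a minimal sketch in plain JavaScript (runnable with Node). The helper names and the sample MAC address are hypothetical, and the <code>esp32_</code> prefix plus uppercase hexadecimal formatting is an assumption inferred from the example ID above, not a copy of what Mongoose OS does internally:</p>

```javascript
// Assumption: the device ID is "esp32_" followed by the last 3 bytes
// of the MAC address written as uppercase hexadecimal.
function deviceIdFromMac(mac) {
  const hex = mac.replace(/[^0-9a-fA-F]/g, '').toUpperCase();
  if (hex.length !== 12) {
    throw new Error('expected a 6-byte MAC address');
  }
  return 'esp32_' + hex.slice(-6);
}

// Telemetry topic format required by the Cloud IoT Core MQTT bridge.
function telemetryTopic(deviceId) {
  return '/devices/' + deviceId + '/events';
}

const id = deviceIdFromMac('30:AE:A4:AB:B3:B4'); // hypothetical MAC address
console.log(id);                 // esp32_ABB3B4
console.log(telemetryTopic(id)); // /devices/esp32_ABB3B4/events
```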
<ul>
<li><em>Quality of Service (QoS)</em></li>
</ul>
<p>The MQTT specification describes three <strong>Quality of Service (QoS)</strong> levels, when publishing to a topic ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#quality_of_service_qos">link</a>]):</p>
<blockquote>
<p>QoS 0, the message is delivered at most once;</p>
<p>QoS 1, the message is delivered at least once;</p>
<p>QoS 2, the message is delivered exactly once.</p>
</blockquote>
<p>Cloud IoT Core does not support QoS 2, and QoS 1 offers a stronger delivery guarantee than QoS 0 (at least once, rather than at most once). So <strong>QoS 1 is the one we will adopt</strong>. Mongoose OS supports it.</p>
<ul>
<li><em>Security</em></li>
</ul>
<p>Concerning <strong>security</strong>, in our Mongoose OS/Cloud IoT Core context, MQTT communications run over <strong>TLS</strong> ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#mqtt_server">link</a>]), so (1) the device is assured of being connected to the genuine Cloud IoT Core MQTT server (the CA certificates are stored in the Mongoose OS <code>ca.pem</code> file), (2) the data exchange is private and (3) data integrity is checked. In the other direction, <strong>device authentication</strong> ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#device_authentication">link</a>]) with Cloud IoT Core relies on a per-device public/private key pair and <strong>JSON Web Tokens (JWT)</strong>. The device signs the JWT with its private key and Cloud IoT Core validates the signature using the matching public key. The Mongoose OS tooling handles key generation and distribution; we’ll see that soon in the section called “Device registration within the Cloud IoT Core project” a few paragraphs below. In that section, we’ll also see how to store the private key securely on the device using memory encryption (which also hinders reverse engineering).</p>
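<p>To illustrate the sign/verify principle only, here is a minimal sketch using Node’s built-in <code>crypto</code> module. It is not the Mongoose OS implementation. The function names are hypothetical, the key pair is generated on the fly instead of being provisioned, and the choice of ES256 with the <code>iat</code>, <code>exp</code> and <code>aud</code> (project ID) claims reflects our understanding of what Cloud IoT Core expects:</p>

```javascript
const crypto = require('crypto');

// base64url without padding, the encoding used for every JWT part.
function b64url(input) {
  return Buffer.from(input).toString('base64')
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Device side: build the token and sign it with the private key (ES256).
function createDeviceJwt(projectId, privateKeyPem, lifetimeSeconds) {
  const now = Math.floor(Date.now() / 1000);
  const header = { alg: 'ES256', typ: 'JWT' };
  const claims = { iat: now, exp: now + lifetimeSeconds, aud: projectId };
  const signingInput = b64url(JSON.stringify(header)) + '.' + b64url(JSON.stringify(claims));
  const signature = crypto.sign('sha256', Buffer.from(signingInput), {
    key: privateKeyPem,
    dsaEncoding: 'ieee-p1363', // raw r||s signature, as JWT requires
  });
  return signingInput + '.' + b64url(signature);
}

// Cloud side: check the signature with the registered public key.
function verifyDeviceJwt(token, publicKeyPem) {
  const parts = token.split('.');
  if (parts.length !== 3) return false;
  const signature = Buffer.from(parts[2].replace(/-/g, '+').replace(/_/g, '/'), 'base64');
  return crypto.verify('sha256', Buffer.from(parts[0] + '.' + parts[1]),
    { key: publicKeyPem, dsaEncoding: 'ieee-p1363' }, signature);
}

// Throwaway P-256 key pair, for the demonstration only.
const { privateKey, publicKey } = crypto.generateKeyPairSync('ec', {
  namedCurve: 'prime256v1',
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});
const token = createDeviceJwt('hello-cloud-iot-core', privateKey, 3600);
console.log(verifyDeviceJwt(token, publicKey));
```

<p>The private key never leaves the device: only the signed token travels, and the public key alone is enough to validate it.</p>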
<p><em>Note:</em> Beyond JWT device authentication, for additional security, it’s possible to impose TLS from Cloud IoT Core to devices (so each device also has a public key certificate, etc.). It is an option we won’t use but it’s described <a target="_blank" href="https://mongoose-os.com/docs/mongoose-os/api/net/mqtt.md">here</a> for the Mongoose OS side (see “mutual TLS”) and <a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/credentials/verifying-credentials">here</a> for the Cloud IoT Core side. It’s good to know that AWS IoT imposes this mutual TLS unconditionally ([<a target="_blank" href="https://docs.aws.amazon.com/iot/latest/developerguide/iot-security-identity.html">link</a>]).</p>
<ul>
<li><em>Registry</em></li>
</ul>
<p>Devices sharing the same purpose are grouped within a <strong>registry</strong>.</p>
<ul>
<li><em>Cloud Pub/Sub</em></li>
</ul>
<p><strong>Telemetry data from all devices belonging to the same registry is then <em>forwarded</em> to a Cloud Pub/Sub topic</strong> (Cloud Pub/Sub is a GCP product [<a target="_blank" href="https://cloud.google.com/pubsub/">link</a>], not specifically a Cloud IoT Core one). The name of the Cloud Pub/Sub topic follows this pattern:</p>
<pre><code>projects/id-<span class="hljs-keyword">of</span>-google-cloud-project/topics/name-<span class="hljs-keyword">of</span>-telemetry-topic
</code></pre><p>So, if we call our Google Cloud project <code>hello-cloud-iot-core</code>, if we choose <code>weather-telemetry-topic</code> for the name of our Pub/Sub telemetry topic and if finally our registry is called <code>weather-devices-registry</code>, we’ll get sooner or later that kind of view in <strong>Google Cloud Console</strong> :</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*BeVR0exU6l-rXg8LqJvfeQ.jpeg" alt="Image" width="800" height="264" loading="lazy">
<em>Project ID, registry ID and telemetry Pub/Sub topic name in Google Cloud Console</em></p>
<p>But don’t worry: everything needed to get there will be explained step by step.</p>
<p><em>Note:</em> As stated here ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/mqtt-bridge#publishing_telemetry_events">link</a>]), each message in the Cloud Pub/Sub topic contains a copy of the telemetry message published by the device but also some <strong>message attributes</strong>, the most important probably being <code>deviceId</code>, which allows us to match received data with the device that published it.</p>
<p><em>Note:</em> We talk a lot about <strong>Pub(lish)</strong>, but where is the <strong>Sub(scribe)</strong>? In fact, we’ll quickly create a Cloud Pub/Sub subscription (a “pull” one) with the Google Cloud command-line interface in order to view the messages published to the telemetry topic. Later in this post, we’ll create a Firebase Cloud Function reacting to each publication and this will automatically create another subscription (a “push” one this time).</p>
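<p>As a rough illustration of what a subscriber handles, here is a minimal sketch in JavaScript (Node). The helper names are hypothetical, the payload fields <code>t</code> and <code>h</code> are invented for the example, and we assume a message shape with a base64-encoded <code>data</code> field and an <code>attributes</code> map carrying the device ID under the key <code>deviceId</code>:</p>

```javascript
// Full Cloud Pub/Sub topic name, following the pattern given above.
function pubSubTopicName(projectId, topicName) {
  return 'projects/' + projectId + '/topics/' + topicName;
}

// Assumed message shape: { data: base64 string, attributes: { deviceId, ... } }.
function decodeTelemetry(message) {
  const json = Buffer.from(message.data, 'base64').toString('utf8');
  return {
    deviceId: message.attributes.deviceId, // which device published the data
    payload: JSON.parse(json),             // the telemetry the device sent
  };
}

// Hypothetical message, as a subscriber might receive it.
const sample = {
  data: Buffer.from(JSON.stringify({ t: 21.5, h: 40 })).toString('base64'),
  attributes: { deviceId: 'esp32_ABB3B4' },
};

console.log(pubSubTopicName('hello-cloud-iot-core', 'weather-telemetry-topic'));
console.log(decodeTelemetry(sample));
```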
<p><strong>From Cloud Pub/Sub to data storage and visualization</strong></p>
<p>We’re following the right part of the project architecture diagram given at the beginning of this post:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*aN_fNiSZXWfgoNrHPmNXeQ.png" alt="Image" width="571" height="300" loading="lazy">
<em>Project Architecture — Weather data storage and visualization</em></p>
<p>A publication to the Cloud Pub/Sub topic will <strong>trigger a Firebase Cloud Function</strong> that will itself <strong>populate a Firebase Realtime Database</strong> with the new data. A web app hosted by <strong>Firebase Hosting</strong> will plot data from the Firebase Realtime Database live, in the same way as we did in a previous post: [<a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-3-644048e90ca4">link</a>].</p>
<p>There are other options in the Google ecosystem to store/process/visualize data. Alvaro Viebrantz’s really good post [<a target="_blank" href="https://medium.com/google-cloud/build-a-weather-station-using-google-cloud-iot-core-and-mongooseos-7a78b69822c5">link</a>], which helped us, uses <strong>BigQuery</strong> ([<a target="_blank" href="https://cloud.google.com/bigquery/">link</a>]) and <strong>Data Studio</strong> ([<a target="_blank" href="https://datastudio.google.com">link</a>]).</p>
<p><strong>Additional “config” and “state” topics</strong></p>
<p>On the project architecture diagram given at the beginning of this post, we see besides telemetry two other data flows: <strong>Config</strong> ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/config/configuring-devices">link</a>]) and <strong>State</strong> ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/config/getting-state">link</a>]):</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*kBpmaNkEksBh_SAju-XpxA.png" alt="Image" width="249" height="156" loading="lazy">
<em>Config and State data flows</em></p>
<p>Indeed, the Cloud IoT Core service may publish <strong>configuration update messages</strong> to a special topic the device has subscribed to ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/config/configuring-devices">link</a>]). It is useful when we need the device to go to a new state, <em>e.g.</em> by updating a parameter of its associated sensor, by changing a deep sleep period, moving a servomotor, etc.</p>
<p>For efficiency, there shouldn’t be more than one message of that type per second per device. Such a message is an arbitrary user-defined blob (we’ll use JSON), up to 64 KiB. Finally, this special MQTT topic must have the following name:</p>
<pre><code>/devices/{device-id}/config
</code></pre><p>On the other hand, a device may publish to a special topic — that Cloud IoT Core has automatically subscribed to — <strong>messages concerning its state</strong> ([<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/config/getting-state">link</a>])<strong>,</strong> <em>e.g.</em> quantity of RAM available, state of a button, etc. It is often used to see if the previous config message sent to the device had the desired effect.</p>
<p>For efficiency, this kind of publication shouldn’t be done more than once per second per device. Such a message is an arbitrary user-defined blob (we’ll use JSON), up to 64 KiB. Finally, the topic to which the device publishes its state data must have this name:</p>
<pre><code>/devices/{device-id}/state
</code></pre><p><em>Note:</em> Sending <strong>commands</strong> to devices is also possible from Cloud IoT Core: see [<a target="_blank" href="https://cloud.google.com/iot/docs/how-tos/commands">link</a>] but we won’t illustrate it.</p>
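<p>The constraints on the state topic (JSON blob of at most 64 KiB, no more than one message per second per device) can be sketched as follows in JavaScript (Node). The helper names are hypothetical and nothing is actually published here; the sketch only decides whether a state report respects the limits quoted above:</p>

```javascript
const MAX_BLOB_BYTES = 64 * 1024; // 64 KiB limit quoted above
const MIN_INTERVAL_MS = 1000;     // at most one state message per second per device

function configTopic(deviceId) {
  return '/devices/' + deviceId + '/config';
}

function stateTopic(deviceId) {
  return '/devices/' + deviceId + '/state';
}

// Hypothetical helper: returns a function that checks one state report
// against the size and rate limits before it would be published.
function createStateReporter(deviceId) {
  let lastSentMs = -Infinity;
  return function prepare(state, nowMs) {
    const payload = JSON.stringify(state);
    if (Buffer.byteLength(payload, 'utf8') > MAX_BLOB_BYTES) {
      return { ok: false, reason: 'payload exceeds 64 KiB' };
    }
    if (nowMs - lastSentMs < MIN_INTERVAL_MS) {
      return { ok: false, reason: 'less than one second since the last state message' };
    }
    lastSentMs = nowMs;
    return { ok: true, topic: stateTopic(deviceId), payload: payload };
  };
}

const report = createStateReporter('esp32_ABB3B4');
console.log(report({ freeRam: 123456 }, 0));   // accepted
console.log(report({ freeRam: 123400 }, 500)); // rejected: too soon
```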
<blockquote>
<p>But for the moment, we will focus on telemetry. After this journey, in a “coming soon” post we will show how to handle the <code>config</code> and <code>state</code> special topics.</p>
</blockquote>
<p>UPDATE March 29, 2019: This post about <code>config</code> and <code>state</code> special topics is out: [<a target="_blank" href="https://medium.com/@o.lourme/gcp-cloudiotcore-esp32-mongooseos-2nd-config-state-encrypt-7c5e937e5be9">link</a>].</p>
<h3 id="heading-mongoose-os-installation-on-devices"><strong>Mongoose OS installation on devices</strong></h3>
<h4 id="heading-1-a-short-description-of-mongoose-os">1) A short description of Mongoose OS</h4>
<p><strong>Mongoose OS</strong> ([<a target="_blank" href="https://mongoose-os.com/">link</a>], [<a target="_blank" href="https://lwn.net/Articles/733297/">link</a>]) is a smart IoT-oriented OS that runs on several chips, including ESP8266 and ESP32. Mongoose OS partners with the major players in IoT ([<a target="_blank" href="https://mongoose-os.com/about.html">link</a>]). It comes with a development tool called <strong>mos</strong>, working either in a UI or with a Command Line Terminal (like <code>cmd.exe</code> in Windows). In either case, we’ll write <code>mos</code> commands. There is also a device management app called mDash but we didn’t try it. <strong>Numerous APIs dealing with most of the network and sensor protocols are provided.</strong> Programs can be written in both C/C++ and JS.</p>
<p>Finally, there is a really useful 12-tutorial series on YouTube:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/bDsqR6HBseY" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p><em>Note:</em> We used Mongoose OS Community Edition, which is free, licensed under Apache 2.0.</p>
<h4 id="heading-2-mongoose-os-installation-on-esp32">2) Mongoose OS installation on ESP32</h4>
<p>This installation has to be performed on each device.</p>
<p>We head to the <strong>developers section</strong> of Mongoose OS web site ([<a target="_blank" href="https://mongoose-os.com/docs/quickstart/setup.md">link</a>]) in order to perform the <strong>first seven steps</strong> of the list given in this resource:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*LZbBhMBqdngkZwdJrmtANg.png" alt="Image" width="800" height="183" loading="lazy">
<em>Mongoose OS setup steps</em></p>
<p><strong>Step #1</strong>, <strong>Step #2</strong> and <strong>Step #3</strong> are trivial. At Step #3 don’t forget to connect the device to the host machine via a USB cable.</p>
<p>For <strong>Step #4</strong> “Create new app”, we choose to call the app <code>app1</code>. When the <code>mos clone https://github.com/mongoose-os-apps/demo-js app1</code> command indicated on the web site has completed, the mos tool automatically switches to the newly created <code>app1/</code> folder.</p>
<p>In the <code>app1/fs/</code> folder, there is a source file called <code>init.js</code>. It is a demo file capable of communicating with different IoT platforms (if they are configured, of course). We will test it briefly and soon simplify it for our purposes.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*5GYs7d1Ws_km6oDnWPaa6Q.png" alt="Image" width="800" height="449" loading="lazy">
<em><strong>mos tool</strong> launched in a UI ; ESP32 selected; Serial console (on the right) is by default at 115200 bds, ESP32’s default speed.</em></p>
<p><strong>Step #5</strong> “Build app firmware” is launched with <code>mos build</code> command (add <code>--arch esp32</code> to this command if you launch it from a Command Line Terminal, not from mos tool). It may take a while but normally we have to perform this build only once. After it, we have many more files. One called <code>app1/build/fw.zip</code> contains the binaries of the OS and <code>init.js</code>. It will be flashed to ESP32 in the next step.</p>
<p><strong>Step #6</strong> “Flash firmware” is launched with the <code>mos flash</code> command. Normally it has to be done just once. If we later change some files (like <code>init.js</code> for instance), we will use a <code>mos put</code> command to upload a file from the host machine to the <strong>local device’s file system</strong>. Of course, this command is only available after the flash process.</p>
<p><em>Note:</em> The firmware flash step can be tricky with a brand new ESP32. With our ESP32 DEVKIT V1, we had messages in the console (that’s a first good point!) reporting failures to connect to the ESP32 ROM. Retrying the flash while pressing the BOOT button (close to the USB connector) finally resulted in a successful flash. Still, be ready to wait for a minute or two.</p>
<p>Then, the device automatically reboots and executes <code>init.js</code>. Every second, we obtained the following information in mos console (or in any serial terminal at 115200 baud):</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*0i3bn3fhEkAolrkfvBjQVg.png" alt="Image" width="800" height="22" loading="lazy">
<em>Console after initial ESP32 flash firmware with Mongoose OS</em></p>
<p>In <strong>Step #7</strong> we connect ESP32 to our wifi network (we use mos tool):</p>
<pre><code>mos wifi WIFI_NETWORK_NAME WIFI_PASSWORD
</code></pre><p>The device will reboot by itself after getting an IP address and synchronizing time by contacting an SNTP server. We then ping our device to check its internet connection.</p>
<p><em>Note:</em> We get device information (IP address for instance) by hitting <strong>CTRL+i</strong> in mos tool, or by typing <code>mos call Sys.GetInfo</code>.</p>
<p><em>Note:</em> We reset the device by hitting <strong>CTRL+u</strong> in mos tool, or by typing <code>mos call Sys.Reboot</code>.</p>
<p><em>Note:</em> Steps #5, #6 and #7 could be the beginning of a “provisioning script”, useful if we have many devices to set up. Rerunning Step #5 is optional if all devices are the same, <em>e.g.</em> only ESP32.</p>
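<p>As a rough illustration of such a provisioning script, here is a minimal Node.js sketch. The function name <code>provisioningCommands</code> and its parameters are our own assumptions, not part of mos. It only assembles the commands of Steps #5 to #7 as strings and does not execute anything:</p>
<pre><code class="lang-js">// Hypothetical helper: returns the mos commands of Steps #5 to #7, in order.
// It only builds the command strings; it does not run them.
function provisioningCommands(ssid, password, rebuild) {
  const commands = [];
  if (rebuild) {
    // Step #5: only needed once when all devices are the same.
    commands.push('mos build --arch esp32');
  }
  // Step #6: flash the firmware to the connected device.
  commands.push('mos flash');
  // Step #7: give the device its wifi credentials.
  commands.push('mos wifi ' + ssid + ' ' + password);
  return commands;
}

// Print the commands for one device (placeholder credentials).
console.log(provisioningCommands('WIFI_NETWORK_NAME', 'WIFI_PASSWORD', true).join('\n'));
</code></pre>
<p>Each returned string could then be run from a terminal for every device plugged into the host machine.</p>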
<h4 id="heading-3-esp32-hello-world-program-with-mongoose-os">3) ESP32 “Hello, World!” program with Mongoose OS</h4>
<p>To get used to the Mongoose OS JS programming style and to mos tool, let’s write a small program that makes the blue built-in LED blink and prints messages on the console. <strong>This LED is connected to the GPIO2 pin</strong> of ESP32 DEVKIT V1 (see the Assembly Diagram at the beginning of this post). On our host machine, let’s replace the content of <code>app1/fs/init.js</code> with this one:</p>
<pre><code class="lang-js"><span class="hljs-comment">/*
 ESP32 DEVKIT V1 - Mongoose OS
 Built-in LED blink and console log
 This blue LED is connected to GPIO2.
 See: 
 - https://mongoose-os.com/docs/mos/api/core/mgos_timers.h.md
*/</span>

load(<span class="hljs-string">'api_config.js'</span>);
load(<span class="hljs-string">'api_gpio.js'</span>);
load(<span class="hljs-string">'api_timer.js'</span>);

<span class="hljs-keyword">let</span> pin = <span class="hljs-number">2</span>;

GPIO.set_mode(pin, GPIO.MODE_OUTPUT);

<span class="hljs-comment">// Call every 2 seconds</span>
Timer.set(<span class="hljs-number">2000</span>, Timer.REPEAT, <span class="hljs-function"><span class="hljs-keyword">function</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">let</span> value = GPIO.toggle(pin);
  print(value ? <span class="hljs-string">'Tick'</span> : <span class="hljs-string">'Tock'</span>);
}, <span class="hljs-literal">null</span>);
</code></pre>
<p>From mos tool or from a Command Line Terminal, we upload this file to Mongoose OS file system and finally we reboot the device:</p>
<pre><code>mos put fs/init.js
mos call Sys.Reboot
</code></pre><p>The blue LED should blink and we should see <code>Tick</code> and <code>Tock</code> printed alternately on the console.</p>
<h4 id="heading-4-dht22-test-with-mongoose-os">4) DHT22 test with Mongoose OS</h4>
<p>At the beginning of this post, there is an Assembly Diagram showing how to connect DHT22 sensor with ESP32 DEVKIT V1. We chose to connect <strong>DHT22 data pin</strong> <strong>to GPIO0 of ESP32 DEVKIT V1</strong>.</p>
<p>So, here is another short <code>init.js</code> program. This one periodically prints the DHT22 measures to the serial console (temperature and humidity, as a JSON object; no MQTT publication yet):</p>
<pre><code class="lang-js"><span class="hljs-comment">/*
 ESP32 DEVKIT V1 - Mongoose OS
 DHT22 sensor measures are sent to console.
 DHT22 data pin is connected to GPIO0.
 See: 
 - https://mongoose-os.com/docs/quickstart/develop-in-js.md
 - https://mongoose-os.com/docs/mos/api/drivers/dht.md
*/</span>

load(<span class="hljs-string">'api_config.js'</span>);
load(<span class="hljs-string">'api_dht.js'</span>);
load(<span class="hljs-string">'api_timer.js'</span>);

<span class="hljs-keyword">let</span> pin = <span class="hljs-number">0</span>;
<span class="hljs-keyword">let</span> dht = DHT.create(pin, DHT.DHT22);

Timer.set(<span class="hljs-number">5000</span>, Timer.REPEAT, <span class="hljs-function"><span class="hljs-keyword">function</span>(<span class="hljs-params"></span>) </span>{ <span class="hljs-comment">// timer period is in ms</span>
  <span class="hljs-keyword">let</span> msg = <span class="hljs-built_in">JSON</span>.stringify({<span class="hljs-attr">temperature</span>: dht.getTemp(), <span class="hljs-attr">humidity</span>: dht.getHumidity()});
  print(msg);
}, <span class="hljs-literal">null</span>);
</code></pre>
<p>Then:</p>
<pre><code>mos put fs/init.js
mos call Sys.Reboot
</code></pre><p>And here is the console output we get after uploading the <code>init.js</code> program and rebooting the device. Humidity is in % and temperature is in degrees Celsius:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*2xr980fPqZCYC84iUq4qEw.png" alt="Image" width="626" height="82" loading="lazy">
<em>DHT22 measures as printed to console. A bit too precise, isn’t it?</em></p>
<p>The numbers have a long decimal part, but this will be fixed later in a Cloud Function.</p>
<p><em>Note:</em> For pedagogical purposes, throughout this post we chose explicit long key names like <code>temperature</code> or <code>humidity</code>. This will have consequences on the volume of data stored later in a NoSQL database (Firebase Realtime Database), as those keys will be repeated for each measure. Shorter key names could be a good idea.</p>
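<p>To give an idea of what such a fix could look like, here is a minimal sketch in plain JavaScript. The helper name <code>roundTo</code> and the sample values are assumptions made for illustration; the Cloud Function described later may proceed differently:</p>
<pre><code class="lang-js">// Hypothetical helper: rounds a number to a given count of decimals.
function roundTo(value, decimals) {
  const factor = Math.pow(10, decimals);
  return Math.round(value * factor) / factor;
}

// Made-up raw measures with a long decimal part, similar to the console output.
const raw = {temperature: 21.299999237060547, humidity: 48.70000076293945};

const rounded = {
  temperature: roundTo(raw.temperature, 1),
  humidity: roundTo(raw.humidity, 1)
};

console.log(JSON.stringify(rounded));
</code></pre>
<p>One decimal is an arbitrary choice here; it is enough to show the idea without dropping useful information from the readings.</p>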
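<p>The effect of key length on payload size can be checked quickly. The short keys <code>t</code> and <code>h</code> below are hypothetical alternatives, and the values are made up:</p>
<pre><code class="lang-js">// Same measure serialized with long and with (hypothetical) short key names.
const longKeys = JSON.stringify({temperature: 21.3, humidity: 48.7});
const shortKeys = JSON.stringify({t: 21.3, h: 48.7});

// Print both payload lengths, in characters.
console.log(longKeys.length, shortKeys.length);
</code></pre>
<p>With these sample values, the long-key payload is 36 characters and the short-key one is 19, and that difference is repeated for every stored measure.</p>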
<h4 id="heading-5-lets-publish-data-to-the-mqtt-telemetry-topic">5) Let’s publish data to the MQTT telemetry topic</h4>
<p><strong>This is our last program, the one ready to work with Cloud IoT Core!</strong> To the previous program, we just add a publication to the telemetry topic we already talked about: <code>/devices/{device-id}/events</code>.</p>
<p>Note that messages are published in JSON, as this will make it easier for the Firebase Cloud Function that reacts to each publication to retrieve their content.</p>
<pre><code class="lang-js"><span class="hljs-comment">/*
 ESP32 DEVKIT V1 - Mongoose OS
 DHT22 sensor measures are sent to console.
 DHT22 data pin is connected to GPIO0.
 Publishes weather data to the appropriate topic.

 See: 
 - https://mongoose-os.com/docs/quickstart/develop-in-js.md
 - https://mongoose-os.com/docs/mos/api/drivers/dht.md
 - https://mongoose-os.com/docs/mos/api/net/mqtt.md
*/</span>

load(<span class="hljs-string">'api_config.js'</span>);
load(<span class="hljs-string">'api_dht.js'</span>);
load(<span class="hljs-string">'api_timer.js'</span>);
load(<span class="hljs-string">'api_mqtt.js'</span>);

<span class="hljs-comment">// Telemetry topic must have this name:</span>
<span class="hljs-keyword">let</span> topic = <span class="hljs-string">'/devices/'</span> + Cfg.get(<span class="hljs-string">'device.id'</span>) + <span class="hljs-string">'/events'</span>;

<span class="hljs-keyword">let</span> pin = <span class="hljs-number">0</span>;
<span class="hljs-keyword">let</span> dht = DHT.create(pin, DHT.DHT22);

Timer.set(<span class="hljs-number">5000</span>, Timer.REPEAT, <span class="hljs-function"><span class="hljs-keyword">function</span>(<span class="hljs-params"></span>) </span>{ <span class="hljs-comment">// timer period is in ms</span>
  <span class="hljs-keyword">let</span> msg = <span class="hljs-built_in">JSON</span>.stringify({<span class="hljs-attr">temperature</span>: dht.getTemp(), <span class="hljs-attr">humidity</span>: dht.getHumidity()});
  <span class="hljs-comment">// Publish message with a QoS 1</span>
  <span class="hljs-comment">// MQTT.pub() returns 1 in case of success, 0 otherwise.</span>
  <span class="hljs-keyword">let</span> ok = MQTT.pub(topic, msg, <span class="hljs-number">1</span>); 
  print(ok, msg);
}, <span class="hljs-literal">null</span>);
</code></pre>
<p>We name this file <code>init.js</code>, upload it to the Mongoose OS file system, then trigger a reboot:</p>
<pre><code>mos put fs/init.js
mos call Sys.Reboot
</code></pre><p><em>Note:</em> These commands could be appended to the “provisioning script” we mentioned earlier.</p>
<p>When running, this last program prints data to console but it fails to publish data to the MQTT bridge of Cloud IoT Core (<code>MQTT.pub()</code> returns 0):</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*Q-FIqugYuMQGd5rjjeu56Q.png" alt="Image" width="646" height="95" loading="lazy">
<em>Telemetry data cannot be published yet as we haven’t set up a Google Cloud project.</em></p>
<p>Indeed, we haven’t set up any Google Cloud project yet, let alone registered a single device to it. Let’s do it now!</p>
<h3 id="heading-cloud-iot-core-project-setup">Cloud IoT Core project setup</h3>
<h4 id="heading-1-google-cloud-sdk-installation">1) Google Cloud SDK installation</h4>
<p>First, we need to <strong>install Google Cloud SDK</strong> because we will have to type some <strong><code>gcloud</code> commands</strong> in a Command Line Terminal. At the time of writing, it requires Python 2.7. It won’t work with Python 3.5. The Google Cloud SDK download page ([<a target="_blank" href="https://cloud.google.com/sdk/downloads">link</a>]) offers versions of the SDK with Python bundled inside (useful if you don’t have Python installed yet and don’t want to deal with it yourself).</p>
<p>Then, Cloud IoT Core requires some <strong>Beta versions of <code>gcloud</code> commands</strong>. So in a Command Line Terminal, from any folder, we type:</p>
<pre><code>gcloud components install beta
</code></pre><p>These two steps only have to be done once!</p>
<p><em>Note:</em> Most of the following actions on Google IoT Core can be performed in three ways:</p>
<ul>
<li>with <strong>Google Cloud Console</strong> (on the web)</li>
<li>with <strong>some APIs</strong> in different languages, and</li>
<li>with <strong>Command Line Interface</strong> in a terminal, typing <code>gcloud</code> commands.</li>
</ul>
<p>We will use the last one to configure things and we’ll check the results with Google Cloud Console (on the web).</p>
<h4 id="heading-2-google-cloud-project-setup">2) Google Cloud project setup</h4>
<p>We now follow this guide from the Mongoose OS web site: [<a target="_blank" href="https://mongoose-os.com/docs/quickstart/cloud/google.md">link</a>].</p>
<pre><code># The commands in this grey frame only have to be run once to configure the Google Cloud project. They can be run from any folder.

# Get authenticated with Google Cloud
gcloud auth login

# Create cloud project. We chose hello-cloud-iot-core as PROJECT_ID
gcloud projects create hello-cloud-iot-core

# Give Cloud IoT Core permission to publish to Pub/Sub topics
gcloud projects add-iam-policy-binding hello-cloud-iot-core --member=serviceAccount:cloud-iot@system.gserviceaccount.com --role=roles/pubsub.publisher

# Set default project for gcloud
gcloud config set project hello-cloud-iot-core

# Create Pub/Sub topic for device telemetry
gcloud beta pubsub topics create weather-telemetry-topic

# Create a Pub/Sub subscription to the just created topic
gcloud beta pubsub subscriptions create --topic weather-telemetry-topic weather-telemetry-subscription

# Create the devices registry (we call it weather-devices-registry),
# specify the Pub/Sub topic name for event notifications,
# and disallow device connections to the HTTP bridge
gcloud beta iot registries create weather-devices-registry --region europe-west1 --no-enable-http-config --event-notification-config=topic=weather-telemetry-topic

# Say 'yes' to enable the API (if prompted).
# The last command may still fail if billing is not enabled for the project.
# In that case, follow the link to enable billing and retry the last command.
# It should end with "Created registry [weather-devices-registry]."
</code></pre><h4 id="heading-3-device-registration-within-the-cloud-iot-core-project">3) Device registration within the Cloud IoT Core project</h4>
<p>Let’s now register the devices to the project! One at a time of course. <strong>mos tool is really helpful for this task.</strong> From mos tool launched in its UI or from Command Line Terminal, placed in our <code>app1</code> folder, we type the following command (project id and registry name are involved, as you see):</p>
<pre><code># Register device <span class="hljs-keyword">with</span> Cloud IoT Core (<span class="hljs-keyword">do</span> it <span class="hljs-keyword">for</span> each device!)
mos gcp-iot-setup --gcp-project hello-cloud-iot-core --gcp-region europe-west1 --gcp-registry weather-devices-registry
</code></pre><p><em>Note:</em> This command could be the last one of the “provisioning script” we have already mentioned twice.</p>
<p>This command is a <code>mos</code> command that will itself use <code>gcloud</code> commands. The device about to be registered must be connected via the serial port to our host computer because some information will be uploaded to it, such as keys, the MQTT bridge address, etc.</p>
<p>Indeed, we see on mos console that <strong>two keys (one private, one public) are generated</strong>. We can inspect them in the <code>app1</code> project folder. The private one is for ESP32 and the public one is for Google IoT Core. They are used during the authentication process involving the JSON Web Token we mentioned earlier.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*iVJ-dbpnqmQJqwuGFC917A.png" alt="Image" width="354" height="335" loading="lazy">
<em>Pair of keys just generated (private and public)</em></p>
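<p>To make the role of these two keys more concrete, here is a hedged Node.js sketch of how a token can be signed with a private key and checked with the matching public key. On the device, Mongoose OS does this for us. The helper names are ours, the ES256 algorithm and the one-hour validity are assumptions, and the key pair below is a throwaway one generated just for the demo:</p>
<pre><code class="lang-js">const crypto = require('crypto');

// Base64url encoding without padding, as used by JSON Web Tokens.
function base64url(input) {
  return Buffer.from(input).toString('base64')
    .replace(/=/g, '').replace(/\+/g, '-').replace(/\//g, '_');
}

// Hypothetical helper: builds a signed JWT whose audience is the project ID.
function createJwt(projectId, privateKeyPem, nowSeconds) {
  const header = {alg: 'ES256', typ: 'JWT'};
  const claims = {iat: nowSeconds, exp: nowSeconds + 3600, aud: projectId};
  const signingInput = base64url(JSON.stringify(header)) + '.' + base64url(JSON.stringify(claims));
  // JWTs expect the raw (r, s) signature format, hence 'ieee-p1363'.
  const signature = crypto.sign('sha256', Buffer.from(signingInput),
    {key: privateKeyPem, dsaEncoding: 'ieee-p1363'});
  return signingInput + '.' + base64url(signature);
}

// Throwaway key pair for this demo only; real devices use the keys generated by mos.
const keys = crypto.generateKeyPairSync('ec', {
  namedCurve: 'prime256v1',
  publicKeyEncoding: {type: 'spki', format: 'pem'},
  privateKeyEncoding: {type: 'pkcs8', format: 'pem'}
});

const token = createJwt('hello-cloud-iot-core', keys.privateKey, Math.floor(Date.now() / 1000));
console.log(token);
</code></pre>
<p>Only the holder of the private key can produce a valid signature, while anyone holding the public key can verify it. This is why the private key stays on the device and the public key is given to the cloud side.</p>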
<p><em>Note concerning security:</em> <strong>The private key shouldn’t be stored in plain text in ESP32 flash memory</strong>. This is why we describe in the post following this one ([<a target="_blank" href="https://medium.com/@o.lourme/gcp-cloudiotcore-esp32-mongooseos-2nd-config-state-encrypt-7c5e937e5be9">link</a>]) how to encrypt this memory. Also, <strong>the private key file shouldn’t be stored in plain text on the host development computer</strong>. At least, protect access to its content with a password.</p>
<p>When the device reboots, we see in the console that it successfully connects to the Google MQTT bridge and publishes telemetry messages (<code>MQTT.pub()</code> returns 1):</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*H9T-zF4B-HwYgO_ZdratEw.png" alt="Image" width="646" height="95" loading="lazy">
<em>Publications to MQTT bridge are successful. Nice job!</em></p>
<h4 id="heading-4-checking-the-project-setup-in-google-cloud-console">4) Checking the project setup in Google Cloud Console</h4>
<p>We head to <a target="_blank" href="https://console.cloud.google.com/iot/">https://console.cloud.google.com/iot/</a> to check that everything was well configured:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*BeVR0exU6l-rXg8LqJvfeQ.jpeg" alt="Image" width="800" height="264" loading="lazy">
<em>Project ID, registry ID and telemetry Pub/Sub topic name in Google Cloud Console</em></p>
<p>Clicking on the <strong>Registry ID</strong> <code>weather-devices-registry</code> reaches another screen. Clicking on “Devices” on this new screen lists provisioned devices and gives details like the last time they were seen (but this is not a live update, we have to refresh the page):</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*sfa7ErglL4OqCsk2jIZqUA.png" alt="Image" width="800" height="326" loading="lazy">
<em>Devices View in Google Cloud Console</em></p>
<p>Clicking on the <strong>Telemetry Pub/Sub topic</strong> name goes to <strong>Pub/Sub console</strong> to show the subscription we created before, <em>i.e.</em> the one related to the telemetry topic:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*EQ1r5SozuzjM5Jrbl4vdVg.png" alt="Image" width="760" height="392" loading="lazy">
<em>Google Cloud Console (Pub/Sub) — names of topic and related subscription</em></p>
<h4 id="heading-5-viewing-at-last-some-telemetry-data">5) Viewing at last some telemetry data</h4>
<p>Now it would be nice to see the data that devices are publishing. For this, we have the subscription we already created. From any folder of the host computer, we type:</p>
<pre><code>gcloud beta pubsub subscriptions pull --auto-ack weather-telemetry-subscription --limit=<span class="hljs-number">2</span>
</code></pre><p>This command ([<a target="_blank" href="https://cloud.google.com/sdk/gcloud/reference/beta/pubsub/subscriptions/pull">link</a>]) pulls up to <strong><em>2</em></strong> Pub/Sub messages from our <code>weather-telemetry-subscription</code> subscription. We can see the data in JSON, the message IDs and a list of attributes for each message. Among them, the <code>deviceId</code> attribute is present. Unfortunately there are no timestamps; we’ll see how to get them later.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*78OYVPPUeKZWRRvlWdI4Mg.png" alt="Image" width="800" height="267" loading="lazy">
<em>Result of pulling telemetry messages from a subscription</em></p>
<blockquote>
<p>If you have reached this milestone, congrats! We’re now ready to write a Firebase Cloud Function reacting to each publication to the Pub/Sub telemetry topic!</p>
</blockquote>
<h3 id="heading-logging-storing-and-visualizing-weather-data-with-firebase">Logging, storing and visualizing weather data with Firebase</h3>
<h4 id="heading-1-introduction">1) Introduction</h4>
<p>We’re now tackling this part of the project:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*UwR_qn8GX1b2CkMA62kT3w.png" alt="Image" width="503" height="302" loading="lazy">
<em>The part we study now</em></p>
<p>On that diagram, we see that our project needs 3 Firebase products:</p>
<p>A <strong>Firebase Cloud Function</strong> (more exactly “a Cloud Function for Firebase”) must react to any publication to the telemetry topic in order to store the weather data of this publication to a <strong>Firebase Realtime Database</strong>. This storage allows weather data persistence and is used to feed a web app hosted by <strong>Firebase Hosting</strong>. This web app draws live plots of this weather data across time.</p>
<p>The good news is that it’s possible to configure all these products with one command.</p>
<h4 id="heading-2-firebase-configuration-github-repository"><strong>2) Firebase configuration, GitHub repository</strong></h4>
<p>We are still working on the same Google Cloud project called <code>hello-cloud-iot-core</code>. Firebase will just “enhance” this project with its products.</p>
<p>We made a GitHub repository for the Firebase aspects of our project:</p>
<p><a target="_blank" href="https://github.com/olivierlourme/iot-store-display"><strong>olivierlourme/iot-store-display</strong></a></p>
<p>Clone this repository in your favorite development folder and head to the newly created directory:</p>
<pre><code>c:\_app&gt;git clone https://github.com/olivierlourme/iot-store-display
c:\_app&gt;cd iot-store-display
</code></pre><p><strong>Global Firebase configuration</strong></p>
<p><em>Note:</em> We suppose you have <strong>Firebase tools</strong> installed (<em>i.e.</em> Node.js installed and <code>npm install -g firebase-tools</code> was run, see [<a target="_blank" href="https://firebase.google.com/docs/functions/get-started">link</a>] for details).</p>
<p>Let’s perform the Firebase initializations:</p>
<pre><code>c:\_app\iot-store-display&gt;firebase init
</code></pre><p>First step is to choose Firebase products we want to use:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*FYoZkLMMrQBQDflK8Z46fQ.png" alt="Image" width="720" height="522" loading="lazy">
<em>Choosing Firebase products</em></p>
<p>We are then prompted to associate the current directory (<code>iot-store-display</code>) with one of the listed Firebase projects. The problem is that our project <code>hello-cloud-iot-core</code> doesn’t appear in the list because, so far, it is only a Google Cloud project and not yet a Firebase project. Read Doug Stevenson’s posts on the relationship between Firebase and Google Cloud: [<a target="_blank" href="https://medium.com/google-developers/whats-the-relationship-between-firebase-and-google-cloud-57e268a7ff6f">link</a>] and [<a target="_blank" href="https://medium.com/google-developers/firebase-google-cloud-whats-different-with-cloud-functions-612d9e1e89cb">link</a>].</p>
<p>To overcome this, first we hit CTRL+C to stop this initialization process and then we go to <strong>Firebase Console</strong> at <a target="_blank" href="https://console.firebase.google.com">https://console.firebase.google.com</a>. We choose “Add a project”:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*nOcFfL_ZwGUKGe-PDnsZUQ.png" alt="Image" width="606" height="117" loading="lazy">
<em>Firebase Console: Add a project</em></p>
<p>And we can see our project (with Google Cloud logo) and choose it:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*QqI3k8bH5P348eAInetTOQ.png" alt="Image" width="392" height="227" loading="lazy">
<em>Firebase Console: even Google Cloud projects are listed.</em></p>
<p><em>Note:</em> You might then be asked to confirm Firebase billing plan if the Google Cloud project itself has a billing plan.</p>
<p>Great! We restart the Firebase initialization with <code>firebase init</code> command and this time our Google Cloud project <code>hello-cloud-iot-core</code> is listed. We choose it:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*qr4CKgpdnEG9UorzVv3yrw.png" alt="Image" width="513" height="105" loading="lazy">
<em>Our <strong><code>iot-store-display</code></strong> directory is associated with the <strong>hello-cloud-iot-core</strong> project.</em></p>
<p><em>Note:</em> If you still don’t see your project, you might be logged in to Firebase with the wrong Google account. In this case, type <code>firebase logout</code> followed by <code>firebase login</code>.</p>
<p><strong>Realtime Database configuration</strong></p>
<p>Then the wizard asks a single question about the Realtime Database and its rules: the name of the file where they will be saved. We keep the default name. It’s more practical to have these rules in a file in the project directory than to go to Firebase Console as we did in past posts. We will detail these rules later.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*YtlGoKNyjspvirVxSoXYNg.png" alt="Image" width="710" height="129" loading="lazy">
<em>Name of file storing Realtime Database rules</em></p>
<p><strong>Cloud Functions configuration</strong></p>
<p>Here are the answers we gave to the wizard concerning Functions Setup:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*ml7Pc1OSEgqi6dRsy5J_aA.png" alt="Image" width="733" height="235" loading="lazy">
<em>Cloud Functions setup</em></p>
<p>Of course, we choose not to overwrite the <code>functions/index.js</code> file obtained from <a target="_blank" href="https://github.com/olivierlourme/iot-store-display">GitHub</a>.</p>
<p><strong>Firebase Hosting configuration</strong></p>
<p>And here are the answers we gave to the wizard concerning Hosting Setup:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*F_7qI1Norg4qjQfPhouCWw.png" alt="Image" width="704" height="342" loading="lazy">
<em>Hosting setup</em></p>
<p>Of course we choose not to overwrite the <code>public/index.html</code> file obtained from <a target="_blank" href="https://github.com/olivierlourme/iot-store-display">GitHub</a>.</p>
<p><strong>Deploying (not now!)</strong></p>
<p>Later, if we want to deploy some updates we made to our 3 products, we can type globally:</p>
<pre><code>c:\_app\iot-store-display&gt;firebase deploy
</code></pre><p>But if we want to deploy only, respectively:</p>
<ul>
<li>updated database rules,</li>
<li>updated cloud functions,</li>
<li>updated web app,</li>
</ul>
<p>we type, respectively:</p>
<pre><code>firebase deploy --only database
firebase deploy --only functions
firebase deploy --only hosting   # remote deployment
firebase serve --only hosting    # local deployment, instead of the previous line
</code></pre><h4 id="heading-3-pubsub-trigger-cloud-function">3) Pub/Sub trigger Cloud Function</h4>
<p><strong>Introduction</strong></p>
<p>In a past post, we explained that we could write <strong>Firebase Cloud Functions</strong> triggered on some events happening to some of the Google products.</p>
<p><a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-2-14b0609d3f5e"><strong>Post 2 of 3. Our IoT journey through ESP8266, Firebase and Plotly.js</strong></a><br><em>A Firebase Cloud Function appends a timestamp to each value pushed to a Firebase Realtime Database.</em></p>
<p><strong>Cloud Pub/Sub</strong> is one of these products and so it is possible to trigger a function each time a message is published to a Pub/Sub topic: [<a target="_blank" href="https://firebase.google.com/docs/functions/pubsub-events">link</a>].</p>
<p>So if the Firebase Cloud Function is triggered on each publication to the <code>weather-telemetry-topic</code> topic, watching its log will allow us to watch the telemetry topic’s activity.</p>
<p>The code of the Cloud Function has to store each new published data to the Firebase Realtime Database associated with our project.</p>
<p><strong>Cloud Function source code</strong></p>
<p>The beginning of the source code looks like this:</p>
<pre><code><span class="hljs-built_in">exports</span>.detectTelemetryEvents = functions.pubsub.topic(<span class="hljs-string">'weather-telemetry-topic'</span>).onPublish(
    <span class="hljs-function">(<span class="hljs-params">message, context</span>) =&gt;</span> {...
</code></pre><p>The full Cloud Function source code lies in the file named <code>index.js</code>. This file is in the <code>functions</code> folder of our <code>iot-store-display</code> directory on <a target="_blank" href="https://github.com/olivierlourme/iot-store-display">GitHub</a>. It is fully commented, so run and study it, it’s short and not complicated.</p>
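<p>To illustrate the kind of work such a handler does, here is a minimal sketch that is <em>not</em> the repository’s code. The helper name <code>buildRecord</code>, the device ID and the sample values are assumptions. It relies only on the fact that a Pub/Sub message carries its payload as a base64 string in <code>data</code>, its attributes in <code>attributes</code>, and that the event context provides a <code>timestamp</code>:</p>
<pre><code class="lang-js">// Hypothetical helper, not the repository's code: turns one Pub/Sub message
// into the device ID and the record that could be stored for it.
function buildRecord(message, context) {
  // Pub/Sub delivers the payload as a base64-encoded string.
  const payload = JSON.parse(Buffer.from(message.data, 'base64').toString());
  return {
    deviceId: message.attributes.deviceId,
    record: {
      timestamp: context.timestamp,
      temperature: payload.temperature,
      humidity: payload.humidity
    }
  };
}

// Simulated message and context, shaped like what the handler receives.
const fakeMessage = {
  data: Buffer.from(JSON.stringify({temperature: 21.3, humidity: 48.7})).toString('base64'),
  attributes: {deviceId: 'esp32_ABCDEF'}
};
const fakeContext = {timestamp: '2018-01-01T00:00:00.000Z'};

console.log(buildRecord(fakeMessage, fakeContext));
</code></pre>
<p>Keeping this logic in a plain function makes it easy to try locally with simulated messages before deploying anything.</p>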
<p><strong>Cloud Function deployment</strong></p>
<p>It’s time to deploy the Cloud Function:</p>
<pre><code>c:\_app\iot-store-display&gt;firebase deploy --only functions
</code></pre><p><strong>Cloud Function validation</strong></p>
<p>Once the Cloud Function is deployed, we can watch its logs. Among other things, we’ll see the results of the <code>console.log(`Device=${deviceId}...`)</code> we wrote at the end of <code>index.js</code>.</p>
<p>Where to see those logs? We have two options:</p>
<ul>
<li>in Firebase Console (<a target="_blank" href="https://console.firebase.google.com">https://console.firebase.google.com</a>):</li>
</ul>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*SH9086pyf8yyuEnyCDu0UQ.png" alt="Image" width="800" height="283" loading="lazy">
<em>Firebase Console — Firebase Cloud Functions logs</em></p>
<ul>
<li>in Google Cloud Console (<a target="_blank" href="https://console.cloud.google.com/functions/">https://console.cloud.google.com/functions/</a>):</li>
</ul>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*D3QmCuAsqE5jUkuIyPTB-w.png" alt="Image" width="800" height="308" loading="lazy">
<em>Google Cloud Console — Access to <strong>Cloud Function logs</strong> and to <strong>Cloud Function deletion</strong></em></p>
<p>We prefer this latter solution, as logs are clearer:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*kgpwdupeMVLKQ2msHpO4cQ.png" alt="Image" width="800" height="244" loading="lazy">
<em>Google Cloud Console- Cloud Functions logs</em></p>
<p>Concerning storage, here is what lies in Firebase Realtime Database after each device has made 2 telemetry data publications. Data is of course grouped by device, as we specified in <code>index.js</code>:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*gJX7raPOkRvRZzAHhQC14Q.png" alt="Image" width="800" height="584" loading="lazy">
<em>Firebase Console — Realtime Database after 2 telemetry data publications per device</em></p>
<p><em>Note:</em> Don’t forget to <strong>delete your Cloud Function</strong> on Google servers if you don’t use it, otherwise you might either reach the invocation quota or pay for service you don’t use, as indicated here: [<a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-2-14b0609d3f5e">link</a>]. Function deletion is to be performed on Google Cloud Console (see “Delete”, 3 screenshots above).</p>
<p><em>Note:</em> The Cloud Function has admin rights over the database, whatever the content of the <code>database.rules.json</code> file. At this step, the rules in <code>database.rules.json</code> can still be very restrictive. Don’t forget to deploy them once edited.</p>
<pre><code class="lang-js">{
  <span class="hljs-string">"rules"</span>: {
    <span class="hljs-string">".read"</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-string">".write"</span>: <span class="hljs-literal">false</span>
  }
}
</code></pre>
<h4 id="heading-4-a-web-app-using-firebase-and-plotlysjs-to-visualize-weather-data">4) A web app using Firebase and Plotly.js to visualize weather data</h4>
<p><strong>Introduction</strong></p>
<p><em>Note:</em> We’re now building a “homemade” (and satisfying) data visualization solution. For a richer UI (dashboards, etc.), you may want to investigate Data Studio, which we already mentioned elsewhere in this post.</p>
<p>We focus on <strong>building a small web app</strong>, hosted by <strong>Firebase Hosting</strong>. This web app <strong>plots in real time</strong> the data stored in the Firebase Realtime Database. We used Plotly.js (<a target="_blank" href="https://plot.ly/javascript/">https://plot.ly/javascript/</a>) as the plotting library. We are familiar with this kind of work, as we already did something similar in a previous post:</p>
<p><a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-3-644048e90ca4"><strong>Post 3 of 3. Our IoT journey through ESP8266, Firebase and Plotly.js</strong></a><br><a target="_blank" href="https://medium.com/@o.lourme/our-iot-journey-through-esp8266-firebase-angular-and-plotly-js-part-3-644048e90ca4"><em>A web app hosted by Firebase Hosting subscribes to the data stream coming from a Firebase Realtime Database and plot…</em> medium.com</a></p>
<p>What’s different today is that we have to:</p>
<ul>
<li>draw several charts: temperature <em>vs</em> time and humidity <em>vs</em> time,</li>
<li>draw one plot per device inside each chart.</li>
</ul>
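<p>The "several charts, one plot per device" idea can be sketched as a small helper that turns the stored telemetry into one Plotly trace per device. This is only an illustration: the function name <code>buildTraces</code>, the data layout and the field names <code>timestamp</code>, <code>temperature</code> and <code>humidity</code> are assumptions, not necessarily what <code>public/script.js</code> does.</p>

```javascript
// Hypothetical telemetry layout (names are assumptions for this sketch):
// { deviceId: { pushKey: { timestamp, temperature, humidity } } }
function buildTraces(telemetryByDevice, field) {
  return Object.keys(telemetryByDevice).sort().map((deviceId) => {
    // Order the samples of one device chronologically.
    const samples = Object.values(telemetryByDevice[deviceId])
      .sort((a, b) => a.timestamp - b.timestamp);
    return {
      type: 'scatter',
      mode: 'lines+markers',
      name: deviceId, // legend entry for this device
      // Assumes timestamps are stored as epoch seconds.
      x: samples.map((s) => new Date(s.timestamp * 1000)),
      y: samples.map((s) => s[field]),
    };
  });
}

// Made-up sample data for two devices.
const sampleTelemetry = {
  esp32_BBBBBB: {
    k1: { timestamp: 1520000060, temperature: 19.5, humidity: 48 },
  },
  esp32_AAAAAA: {
    k2: { timestamp: 1520000060, temperature: 21.4, humidity: 41 },
    k1: { timestamp: 1520000000, temperature: 21.0, humidity: 40 },
  },
};

// One array of traces per chart, one trace per device in each array.
const temperatureTraces = buildTraces(sampleTelemetry, 'temperature');
const humidityTraces = buildTraces(sampleTelemetry, 'humidity');
```

<p>In the browser, each array could then be handed to <code>Plotly.newPlot</code> together with the target <code>div</code> and a layout, giving one chart for temperature and one for humidity.</p>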
<p><strong>Database rules &amp; devices-ids node</strong></p>
<p>Concerning the database rules, what should the <code>database.rules.json</code> file look like now? The <code>devices-telemetry</code> node needs to be <strong>read</strong> by the web app. And if you looked closely at the Realtime Database screenshot shown a few screenshots earlier, you noticed another node called <code>devices-ids</code>.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*Si6-O12MUJXpzN3qyZyBVg.png" alt="Image" width="335" height="248" loading="lazy">
<em>Firebase Console — Detail of devices-ids node in Realtime Database</em></p>
<p>You need to create this <code>devices-ids</code> node <strong>manually in the Firebase Console</strong> and fill it in appropriately for the web app to work properly. It is a simple means of declaring to the web app which devices we want plots for, and of giving aliases to the devices. Its role and necessity are fully explained in the comments of the <code>public/script.js</code> file available on <a target="_blank" href="https://github.com/olivierlourme/iot-store-display">GitHub</a>.</p>
<p><em>Note:</em> An improvement could be a form (accessed through authentication) that, once filled in, calls a script to generate this <code>devices-ids</code> node.</p>
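<p>To make the role of <code>devices-ids</code> more concrete, here is a minimal sketch of how a web app might turn such a node into a list of devices to plot. The node shape (device ID mapped to an alias), the sample IDs and the helper name <code>listDeclaredDevices</code> are assumptions made for illustration. The actual structure is the one shown in the screenshot and described in <code>public/script.js</code>.</p>

```javascript
// Hypothetical content of the devices-ids node: device ID -> alias.
const devicesIdsValue = {
  esp32_AAAAAA: 'Living room',
  esp32_BBBBBB: 'Garden',
};

// Turn the node into a list of devices the web app should plot.
// The telemetry path assumes data is stored per device under
// devices-telemetry, as in the database rules.
function listDeclaredDevices(node) {
  return Object.entries(node || {}).map(([id, alias]) => ({
    id,
    alias, // label that could be shown in the chart legend
    telemetryPath: 'devices-telemetry/' + id,
  }));
}

const declaredDevices = listDeclaredDevices(devicesIdsValue);
```

<p>A device that publishes telemetry but is not declared in the node would simply not appear in this list, which matches the idea of declaring the devices we want plots for.</p>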
<p>This <code>devices-ids</code> node also needs to be read by the web app. So the <code>database.rules.json</code> file should eventually become:</p>
<pre><code class="lang-js">{
  <span class="hljs-string">"rules"</span>: {
    <span class="hljs-string">"devices-ids"</span>: {
      <span class="hljs-string">".read"</span>: <span class="hljs-literal">true</span>,
      <span class="hljs-string">".write"</span>: <span class="hljs-literal">false</span>
    },
    <span class="hljs-string">"devices-telemetry"</span>: {
      <span class="hljs-string">".read"</span>: <span class="hljs-literal">true</span>,
      <span class="hljs-string">".write"</span>: <span class="hljs-literal">false</span>
    }
  }
}
</code></pre>
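<p>To see what these rules allow, here is a deliberately simplified model of how read access cascades down a path. It only handles boolean <code>.read</code> values like the ones above. The real Firebase rules engine also supports expressions, variables and wildcards, which this sketch ignores. The function name <code>canRead</code> is ours.</p>

```javascript
// The rules from database.rules.json, as a plain object.
const sampleRules = {
  rules: {
    'devices-ids': { '.read': true, '.write': false },
    'devices-telemetry': { '.read': true, '.write': false },
  },
};

// Simplified check: a read is allowed if any node on the way from the
// root to the requested path has ".read": true. No matching rule means
// the read is denied.
function canRead(rulesFile, path) {
  let node = rulesFile.rules;
  if (node === undefined || node === null) return false;
  if (node['.read'] === true) return true;
  for (const segment of path.split('/').filter(Boolean)) {
    node = node[segment];
    // No rule defined at this level or deeper: nothing can grant access.
    if (node === undefined || node === null) return false;
    if (node['.read'] === true) return true;
  }
  return false;
}

const idsReadable = canRead(sampleRules, 'devices-ids');
const telemetryChildReadable = canRead(sampleRules, 'devices-telemetry/some-device');
const otherReadable = canRead(sampleRules, 'some-other-node');
```

<p>With these rules, the two nodes used by the web app (and everything below them) are readable, while any other location stays closed.</p>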
<p>These new rules, once edited and saved, must be deployed with:</p>
<pre><code>c:\_app\iot-store-display&gt;firebase deploy --only database
</code></pre><p><strong>Web app source code</strong></p>
<p>The web app source code lies in the <code>public</code> directory of our <code>hello-cloud-iot-core</code> folder or in <a target="_blank" href="https://github.com/olivierlourme/iot-store-display">GitHub</a>. The content of the folder, especially <code>script.js</code>, is fully commented so you know where to study (and improve!) it.</p>
<p><em>Note:</em> We have only two devices for this demo, but the source code works for any number of devices as long as you declare them in the <code>devices-ids</code> node.</p>
<p><strong>Web app local deployment and validation</strong></p>
<p>For testing purposes, Firebase Hosting can launch a local live server:</p>
<pre><code>c:\_app\iot-store-display&gt;firebase serve --only hosting
</code></pre><p>We head to <code>http://localhost:5000</code> and we’re happy to get this:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*02UfatBLkuqCD12-JNBgjw.gif" alt="Image" width="800" height="599" loading="lazy">
<em>At last the charts we wanted!</em></p>
<p><strong>Web app remote deployment</strong></p>
<p>Finally, Firebase hosts our web app and gives us access to it over HTTPS:</p>
<pre><code>c:\_app\iot-store-display&gt;firebase deploy --only hosting
</code></pre><p>We quickly get the public URL of our web app: <a target="_blank" href="https://hello-cloud-iot-core.firebaseapp.com">https://hello-cloud-iot-core.firebaseapp.com</a></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*EZjq0siYgb0K6sMaFHlk4Q.png" alt="Image" width="534" height="86" loading="lazy">
<em>Web app deployment is complete!</em></p>
<p><em>Note:</em> If you have your own domain, you can connect your Firebase web app to it. See [<a target="_blank" href="https://firebase.google.com/docs/hosting/custom-domain">link</a>].</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>In this post we discovered how to combine <strong>ESP32</strong>, <strong>Mongoose OS</strong> and <strong>Cloud IoT Core</strong> to obtain a serious, secure and <strong>professional IoT project</strong>. Now that we know how, provisioning 10, 100… 1000 devices acquiring weather data all over an area can go really fast, as long as they can get a Wi-Fi connection. Since devices are centrally managed, it is easy to provision and monitor them. But we can go further!</p>
<p>Indeed, in addition to this post, there is a second one ([<a target="_blank" href="https://medium.com/@o.lourme/gcp-cloudiotcore-esp32-mongooseos-2nd-config-state-encrypt-7c5e937e5be9">link</a>]). Inside it:</p>
<ul>
<li>We’ll focus on ESP32 flash <strong>memory encryption</strong>, to achieve a fully secured system.</li>
<li>We’ll see how to use the <code>config</code> special topic, allowing us to <strong>trigger an action on the device from the Google Cloud Console</strong>.</li>
<li>We’ll see how to use the <code>state</code> special topic, allowing <strong>the device to communicate information about its current state to the Google Cloud Console</strong>.</li>
</ul>
<p>We hope you enjoyed this really long post and that you learnt something! Don’t hesitate to ping me if you have any questions or improvement suggestions…</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
