<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Amina Lawal - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Amina Lawal - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 04:32:16 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/Bronze/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
<![CDATA[ How to Build and Deploy Multi-Architecture Docker Apps on Google Cloud Using ARM Nodes (Without QEMU) ]]>
                </title>
                <description>
                    <![CDATA[ If you've bought a laptop in the last few years, there's a good chance it's running an ARM processor. Apple's M-series chips put ARM on the map for developers, but the real revolution is happening ins ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-and-deploy-multi-architecture-docker-apps-on-google-cloud-using-arm-nodes/</link>
                <guid isPermaLink="false">69dcf2c3f57346bc1e05a01d</guid>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ARM ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Amina Lawal ]]>
                </dc:creator>
                <pubDate>Mon, 13 Apr 2026 13:42:27 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/e89ae65a-4b3a-44b7-94d8-d0638f017bf6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've bought a laptop in the last few years, there's a good chance it's running an ARM processor. Apple's M-series chips put ARM on the map for developers, but the real revolution is happening inside cloud data centers.</p>
<p>Google Cloud Axion is Google's own custom ARM-based chip, built to handle the demands of modern cloud workloads. The performance and cost numbers are striking: Google claims Axion delivers up to 60% better energy efficiency and up to 65% better price-performance compared to comparable x86 machines.</p>
<p>AWS has Graviton. Azure has Cobalt. ARM is no longer niche. It's the direction the entire cloud industry is moving.</p>
<p>But there's a problem that catches almost every team off guard when they start this transition: <strong>container architecture mismatch</strong>.</p>
<p>If you build a Docker image on your M-series Mac and push it to an x86 server, it crashes on startup with a cryptic <code>exec format error</code>.</p>
<p>The server isn't broken. It just can't read the compiled instructions inside your image. An ARM binary and an x86 binary are written in fundamentally different languages at the machine level. The CPU literally can't execute instructions it wasn't designed for.</p>
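<p>You can spot this mismatch before anything crashes. Here's a quick sketch that checks what an image or binary was compiled for (the image tag and binary name are placeholders for whatever you've built locally):</p>
<pre><code class="language-bash"># Print the OS/architecture a local image was built for
docker image inspect hello-axion:v1 --format '{{.Os}}/{{.Architecture}}'
# On an Apple Silicon Mac, a default build prints: linux/arm64

# Or ask a compiled binary directly (file ships with macOS and most Linux distros)
file ./server
# An ARM64 Linux binary reports something like: ELF 64-bit LSB executable, ARM aarch64
</code></pre>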
<p>We're going to solve this problem completely in this tutorial. You'll build a single Docker image tag that automatically serves the correct binary on both ARM and x86 machines — no separate pipelines, no separate tags. Then you'll provision Google Cloud ARM nodes in GKE and configure your Kubernetes deployment to route workloads precisely to those cost-efficient nodes.</p>
<p><strong>Here's what you'll build, step by step:</strong></p>
<ul>
<li><p>A Go HTTP server that reports the CPU architecture it's running on at runtime</p>
</li>
<li><p>A multi-stage Dockerfile that cross-compiles for both <code>linux/amd64</code> and <code>linux/arm64</code> without slow QEMU emulation</p>
</li>
<li><p>A multi-arch image in Google Artifact Registry that acts as a single entry point for any architecture</p>
</li>
<li><p>A GKE cluster with two node pools: a standard x86 pool and an ARM Axion pool</p>
</li>
<li><p>A Kubernetes Deployment that pins your workload exclusively to the ARM nodes</p>
</li>
</ul>
<p>By the end, you'll hit a live endpoint and see the word <code>arm64</code> staring back at you from a Google Cloud ARM node. Let's get into it.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-step-1-set-up-your-google-cloud-project">Step 1: Set Up Your Google Cloud Project</a></p>
</li>
<li><p><a href="#heading-step-2-create-the-gke-cluster">Step 2: Create the GKE Cluster</a></p>
</li>
<li><p><a href="#heading-step-3-write-the-application">Step 3: Write the Application</a></p>
</li>
<li><p><a href="#heading-step-4-enable-multi-arch-builds-with-docker-buildx">Step 4: Enable Multi-Arch Builds with Docker Buildx</a></p>
</li>
<li><p><a href="#heading-step-5-write-the-dockerfile">Step 5: Write the Dockerfile</a></p>
</li>
<li><p><a href="#heading-step-6-build-and-push-the-multi-arch-image">Step 6: Build and Push the Multi-Arch Image</a></p>
</li>
<li><p><a href="#heading-step-7-add-the-axion-arm-node-pool">Step 7: Add the Axion ARM Node Pool</a></p>
</li>
<li><p><a href="#heading-step-8-deploy-the-app-to-the-arm-node-pool">Step 8: Deploy the App to the ARM Node Pool</a></p>
</li>
<li><p><a href="#heading-step-9-verify-the-deployment">Step 9: Verify the Deployment</a></p>
</li>
<li><p><a href="#heading-step-10-cost-savings-and-tradeoffs">Step 10: Cost Savings and Tradeoffs</a></p>
</li>
<li><p><a href="#heading-cleanup">Cleanup</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-project-file-structure">Project File Structure</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have the following ready:</p>
<ul>
<li><p><strong>A Google Cloud project</strong> with billing enabled. If you don't have one, create it at <a href="https://console.cloud.google.com">console.cloud.google.com</a>. The total cost to follow this tutorial is around $5–10.</p>
</li>
<li><p><code>gcloud</code> <strong>CLI</strong> installed and authenticated. Run <code>gcloud auth login</code> to sign in and <code>gcloud config set project YOUR_PROJECT_ID</code> to point it at your project.</p>
</li>
<li><p><strong>Docker Desktop</strong> version 19.03 or later. Docker Buildx (the tool we'll use for multi-arch builds) ships bundled with it.</p>
</li>
<li><p><code>kubectl</code> installed. This is the CLI for interacting with Kubernetes clusters.</p>
</li>
<li><p>Basic familiarity with <strong>Docker</strong> (images, layers, Dockerfile) and <strong>Kubernetes</strong> (pods, deployments, services). You don't need to be an expert, but you should know what these things are.</p>
</li>
</ul>
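<p>A quick way to confirm the tooling is in place before you start:</p>
<pre><code class="language-bash">gcloud --version          # Google Cloud CLI
docker --version          # needs 19.03 or later
docker buildx version     # bundled with Docker Desktop
kubectl version --client  # Kubernetes CLI
</code></pre>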
<h2 id="heading-step-1-set-up-your-google-cloud-project">Step 1: Set Up Your Google Cloud Project</h2>
<p>Before writing a single line of application code, let's get the cloud infrastructure side ready. This is the foundation everything else will build on.</p>
<h3 id="heading-enable-the-required-apis">Enable the Required APIs</h3>
<p>Google Cloud services are off by default in any new project. Run this command to turn on the three APIs we'll need:</p>
<pre><code class="language-bash">gcloud services enable \
  artifactregistry.googleapis.com \
  container.googleapis.com \
  containeranalysis.googleapis.com
</code></pre>
<p>Here's what each one does:</p>
<ul>
<li><p><code>artifactregistry.googleapis.com</code> — enables <strong>Artifact Registry</strong>, where we'll store our Docker images</p>
</li>
<li><p><code>container.googleapis.com</code> — enables <strong>Google Kubernetes Engine (GKE)</strong>, where our cluster will run</p>
</li>
<li><p><code>containeranalysis.googleapis.com</code> — enables vulnerability scanning for images stored in Artifact Registry</p>
</li>
</ul>
<h3 id="heading-create-a-docker-repository-in-artifact-registry">Create a Docker Repository in Artifact Registry</h3>
<p>Artifact Registry is Google Cloud's managed container image store — the place where our built images will live before being deployed to the cluster. Create a dedicated repository for this tutorial:</p>
<pre><code class="language-bash">gcloud artifacts repositories create multi-arch-repo \
  --repository-format=docker \
  --location=us-central1 \
  --description="Multi-arch tutorial images"
</code></pre>
<p>Breaking down the flags:</p>
<ul>
<li><p><code>--repository-format=docker</code> — tells Artifact Registry this repository stores Docker images (as opposed to npm packages, Maven artifacts, and so on)</p>
</li>
<li><p><code>--location=us-central1</code> — the Google Cloud region where your images will be stored. Use a region that's close to where your cluster will run to minimize image pull latency. Run <code>gcloud artifacts locations list</code> to see all options.</p>
</li>
<li><p><code>--description</code> — a human-readable label for the repository, shown in the console.</p>
</li>
</ul>
<h3 id="heading-authenticate-docker-to-push-to-artifact-registry">Authenticate Docker to Push to Artifact Registry</h3>
<p>Docker needs credentials before it can push images to Google Cloud. Run this command to wire up authentication automatically:</p>
<pre><code class="language-bash">gcloud auth configure-docker us-central1-docker.pkg.dev
</code></pre>
<p>This adds a credential helper entry to your <code>~/.docker/config.json</code> file. What that means in practice: any time Docker tries to push or pull from a URL under <code>us-central1-docker.pkg.dev</code>, it will automatically call <code>gcloud</code> to get a valid auth token. You won't need to run <code>docker login</code> manually.</p>
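<p>If you're curious, the entry it writes looks roughly like this (your file may contain other settings too):</p>
<pre><code class="language-json">{
  "credHelpers": {
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
</code></pre>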
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/31fd020f-ffa2-40bd-9057-57b16a61b325.png" alt="Terminal output of the gcloud artifacts repositories list command, showing a row for multi-arch-repo with format DOCKER, location us-central1" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-step-2-create-the-gke-cluster">Step 2: Create the GKE Cluster</h2>
<p>With Artifact Registry ready to receive images, let's create the Kubernetes cluster. We'll start with a standard cluster using x86 nodes and add an ARM node pool later once we have an image to deploy.</p>
<pre><code class="language-bash">gcloud container clusters create axion-tutorial-cluster \
  --zone=us-central1-a \
  --num-nodes=2 \
  --machine-type=e2-standard-2 \
  --workload-pool=PROJECT_ID.svc.id.goog
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your actual Google Cloud project ID.</p>
<p>What each flag does:</p>
<ul>
<li><p><code>--zone=us-central1-a</code> — creates a zonal cluster in a single availability zone. A regional cluster (using <code>--region</code>) would spread nodes across three zones for higher resilience, but for this tutorial a single zone keeps things simple and avoids capacity issues that can affect specific zones. If <code>us-central1-a</code> is unavailable, try <code>us-central1-b</code>.</p>
</li>
<li><p><code>--num-nodes=2</code> — two x86 nodes in this zone. We need at least 2 to have enough capacity alongside our ARM node pool later.</p>
</li>
<li><p><code>--machine-type=e2-standard-2</code> — the machine type for this default node pool. <code>e2-standard-2</code> is a cost-effective x86 machine with 2 vCPUs and 8 GB of memory, good for general workloads.</p>
</li>
<li><p><code>--workload-pool=PROJECT_ID.svc.id.goog</code> — enables <strong>Workload Identity</strong>, which is Google's recommended way for pods to authenticate with Google Cloud APIs. It avoids the need to download and store service account key files inside your cluster.</p>
</li>
</ul>
<p>This command takes a few minutes. While it runs, you can move on to writing the application. We'll come back to the cluster in Step 6.</p>
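<p>Once it finishes, confirm the cluster is up (the output below is illustrative):</p>
<pre><code class="language-bash">gcloud container clusters list
# NAME                    LOCATION       ...  STATUS
# axion-tutorial-cluster  us-central1-a  ...  RUNNING
</code></pre>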
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/332250a8-3f99-4eb1-849f-51ab054c9567.png" alt="GCP Console Kubernetes Engine Clusters page showing axion-tutorial-cluster with a green checkmark status, the zone us-central1-a, and Kubernetes version in the table." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-step-3-write-the-application">Step 3: Write the Application</h2>
<p>We need an application to containerize. We'll use <strong>Go</strong> for three specific reasons:</p>
<ol>
<li><p>Go compiles into a single, statically-linked binary. There's no runtime to install, no interpreter — just the binary. This makes for extremely lean container images.</p>
</li>
<li><p>Go has first-class, built-in cross-compilation support. We can compile an ARM64 binary from an x86 Mac, or vice versa, by setting two environment variables. This will matter a lot when we get to the Dockerfile.</p>
</li>
<li><p>Go exposes the architecture the binary was compiled for via <code>runtime.GOARCH</code>. Our server will report this at runtime, giving us hard proof that the correct binary is running on the correct hardware.</p>
</li>
</ol>
<p>Start by creating the project directories:</p>
<pre><code class="language-bash">mkdir -p hello-axion/app hello-axion/k8s
cd hello-axion/app
</code></pre>
<p>Initialize the Go module from inside <code>app/</code>. This creates <code>go.mod</code> in the current directory:</p>
<pre><code class="language-bash">go mod init hello-axion
</code></pre>
<p><code>go mod init</code> is Go's built-in command for starting a new module. It writes a <code>go.mod</code> file that declares the module name (<code>hello-axion</code>) and the minimum Go version required. Every modern Go project needs this file — without it, the compiler doesn't know how to resolve packages.</p>
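<p>The generated <code>go.mod</code> is tiny. Depending on your installed Go version, it looks something like this:</p>
<pre><code class="language-plaintext">module hello-axion

go 1.23
</code></pre>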
<p>Now create the application at <code>app/main.go</code>:</p>
<pre><code class="language-go">package main

import (
    "fmt"
    "net/http"
    "os"
    "runtime"
)

func handler(w http.ResponseWriter, r *http.Request) {
    hostname, _ := os.Hostname()
    fmt.Fprintf(w, "Hello from freeCodeCamp!\n")
    fmt.Fprintf(w, "Architecture : %s\n", runtime.GOARCH)
    fmt.Fprintf(w, "OS           : %s\n", runtime.GOOS)
    fmt.Fprintf(w, "Pod hostname : %s\n", hostname)
}

func healthz(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    fmt.Fprintln(w, "ok")
}

func main() {
    http.HandleFunc("/", handler)
    http.HandleFunc("/healthz", healthz)
    fmt.Println("Server starting on port 8080...")
    if err := http.ListenAndServe(":8080", nil); err != nil {
        fmt.Fprintf(os.Stderr, "server error: %v\n", err)
        os.Exit(1)
    }
}
</code></pre>
<p>Verify both files were created:</p>
<pre><code class="language-bash">ls -la
</code></pre>
<p>You should see <code>go.mod</code> and <code>main.go</code> listed.</p>
<p>Let's walk through what this code does:</p>
<ul>
<li><p><code>import "runtime"</code> — imports Go's built-in <code>runtime</code> package, which exposes information about the Go runtime environment, including the CPU architecture.</p>
</li>
<li><p><code>runtime.GOARCH</code> — returns a string like <code>"arm64"</code> or <code>"amd64"</code> representing the architecture this binary was compiled for. When we deploy to an ARM node, this value will be <code>arm64</code>. This is the core of our proof.</p>
</li>
<li><p><code>os.Hostname()</code> — returns the pod's hostname, which Kubernetes sets to the pod name. This lets us see which specific pod responded when we test the app later.</p>
</li>
<li><p><code>handler</code> — the main HTTP handler, registered on the root path <code>/</code>. It writes the architecture, OS, and hostname to the response.</p>
</li>
<li><p><code>healthz</code> — a separate handler registered on <code>/healthz</code>. It returns HTTP 200 with the text <code>ok</code>. Kubernetes will use this endpoint to check whether the container is alive and ready to serve traffic — we'll wire this up in the deployment manifest later.</p>
</li>
<li><p><code>http.ListenAndServe(":8080", nil)</code> — starts the server on port 8080. If it fails to start (for example, if the port is already in use), it prints the error and exits with a non-zero code so Kubernetes knows something went wrong.</p>
</li>
</ul>
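<p>Before containerizing anything, you can sanity-check the server locally:</p>
<pre><code class="language-bash"># Terminal 1: start the server
go run main.go

# Terminal 2: hit it
curl http://localhost:8080
# On an Apple Silicon Mac this reports "Architecture : arm64" and "OS : darwin",
# reflecting whatever machine the binary is currently running on.
</code></pre>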
<h2 id="heading-step-4-enable-multi-arch-builds-with-docker-buildx">Step 4: Enable Multi-Arch Builds with Docker Buildx</h2>
<p>Before we write the Dockerfile, we need to understand a fundamental constraint, because it directly shapes how the Dockerfile must be written.</p>
<h3 id="heading-why-your-docker-images-are-architecture-specific-by-default">Why Your Docker Images Are Architecture-Specific By Default</h3>
<p>A CPU only understands instructions written for its specific <strong>Instruction Set Architecture (ISA)</strong>. ARM64 and x86_64 are different ISAs — different vocabularies of machine-level operations. When you compile a Go program, the compiler translates your source code into binary instructions for exactly one ISA. That binary can't run on a different ISA.</p>
<p>When you build a Docker image the normal way (<code>docker build</code>), the binary inside that image is compiled for your local machine's ISA. If you're on an Apple Silicon Mac, you get an ARM64 binary. Push that image to an x86 server, and when Docker tries to execute the binary, the kernel rejects it:</p>
<pre><code class="language-shell">standard_init_linux.go:228: exec user process caused: exec format error
</code></pre>
<p>That's the operating system saying: "This binary was written for a different processor. I have no idea what to do with it."</p>
<h3 id="heading-the-solution-a-single-image-tag-that-serves-any-architecture">The Solution: A Single Image Tag That Serves Any Architecture</h3>
<p>Docker solves this with a structure called a <strong>Manifest List</strong> (also called a multi-arch image index). Instead of one image, a Manifest List is a pointer table. It holds multiple image references — one per architecture — all under the same tag.</p>
<p>When a server pulls <code>hello-axion:v1</code>, here's what actually happens:</p>
<ol>
<li><p>Docker contacts the registry and requests the manifest for <code>hello-axion:v1</code></p>
</li>
<li><p>The registry returns the Manifest List, which looks like this internally:</p>
</li>
</ol>
<pre><code class="language-json">{
  "manifests": [
    { "digest": "sha256:a1b2...", "platform": { "architecture": "amd64", "os": "linux" } },
    { "digest": "sha256:c3d4...", "platform": { "architecture": "arm64", "os": "linux" } }
  ]
}
</code></pre>
<ol>
<li>Docker checks the current machine's architecture, finds the matching entry, and pulls only that specific image layer. The x86 image never downloads onto your ARM server, and vice versa.</li>
</ol>
<p>One tag, two actual images. Completely transparent to your deployment manifests.</p>
<h3 id="heading-set-up-docker-buildx">Set Up Docker Buildx</h3>
<p><strong>Docker Buildx</strong> is the CLI tool that builds these Manifest Lists. It's powered by the <strong>BuildKit</strong> engine and ships bundled with Docker Desktop. Run the following to create and activate a new builder instance:</p>
<pre><code class="language-bash">docker buildx create --name multiarch-builder --use
</code></pre>
<ul>
<li><p><code>--name multiarch-builder</code> — gives this builder a memorable name. You can have multiple builders. This command creates a new one named <code>multiarch-builder</code>.</p>
</li>
<li><p><code>--use</code> — immediately sets this new builder as the active one, so all future <code>docker buildx build</code> commands use it.</p>
</li>
</ul>
<p>Now boot the builder and confirm it supports the platforms we need:</p>
<pre><code class="language-bash">docker buildx inspect --bootstrap
</code></pre>
<ul>
<li><code>--bootstrap</code> — starts the builder container if it isn't already running, and prints its full configuration.</li>
</ul>
<p>You should see output like this:</p>
<pre><code class="language-plaintext">Name:          multiarch-builder
Driver:        docker-container
Platforms:     linux/amd64, linux/arm64, linux/arm/v7, linux/386, ...
</code></pre>
<p>The <code>Platforms</code> line lists every architecture this builder can produce images for. As long as you see <code>linux/amd64</code> and <code>linux/arm64</code> in that list, you're ready to build for both x86 and ARM.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/1c19aca1-30c4-406d-9c37-679ee4f2928f.png" alt="Terminal output showing the multiarch-builder details with Name, Driver set to docker-container, and a Platforms list that includes linux/amd64 and linux/arm64 highlighted." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-step-5-write-the-dockerfile">Step 5: Write the Dockerfile</h2>
<p>Now we can write the Dockerfile. We'll use two techniques together: a <strong>multi-stage build</strong> to keep the final image tiny, and a <strong>cross-compilation trick</strong> to avoid slow CPU emulation.</p>
<p>Create <code>app/Dockerfile</code> with the following content:</p>
<pre><code class="language-dockerfile"># -----------------------------------------------------------
# Stage 1: Build
# -----------------------------------------------------------
# $BUILDPLATFORM = the machine running this build (your laptop)
# $TARGETOS / $TARGETARCH = the platform we are building FOR
# -----------------------------------------------------------
FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS builder

ARG TARGETOS
ARG TARGETARCH

WORKDIR /app

COPY go.mod .
RUN go mod download

COPY main.go .

RUN GOOS=$TARGETOS GOARCH=$TARGETARCH go build -ldflags="-w -s" -o server main.go

# -----------------------------------------------------------
# Stage 2: Runtime
# -----------------------------------------------------------

FROM alpine:latest

RUN addgroup -S appgroup &amp;&amp; adduser -S appuser -G appgroup
USER appuser

WORKDIR /app
COPY --from=builder /app/server .

EXPOSE 8080
CMD ["./server"]
</code></pre>
<p>There's a lot happening here. Let's go through it carefully.</p>
<h3 id="heading-stage-1-the-builder">Stage 1: The Builder</h3>
<p><code>FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS builder</code></p>
<p>This is the most important line in the file. <code>$BUILDPLATFORM</code> is a special build argument that Docker Buildx automatically injects — it equals the platform of the machine <em>running the build</em> (your laptop). By pinning the builder stage to <code>$BUILDPLATFORM</code>, the Go compiler always runs natively on your machine, not inside a CPU emulator. This is what makes multi-arch builds fast.</p>
<p>Without <code>--platform=$BUILDPLATFORM</code>, Buildx would have to use <strong>QEMU</strong> — a full CPU emulator — to run an ARM64 build environment on your x86 machine (or vice versa). QEMU works, but it's typically 5–10 times slower than native execution. For a project with many dependencies, that's the difference between a 2-minute build and a 20-minute build.</p>
<p><code>ARG TARGETOS</code> <strong>and</strong> <code>ARG TARGETARCH</code></p>
<p>These two lines declare that our Dockerfile expects build arguments named <code>TARGETOS</code> and <code>TARGETARCH</code>. Buildx injects these automatically based on the <code>--platform</code> flag you pass at build time. For a <code>linux/arm64</code> target, <code>TARGETOS</code> will be <code>linux</code> and <code>TARGETARCH</code> will be <code>arm64</code>.</p>
<p><code>COPY go.mod .</code> <strong>and</strong> <code>RUN go mod download</code></p>
<p>We copy <code>go.mod</code> first, before copying the rest of the source code. Docker builds images layer by layer and caches each layer. By copying only the module file first, we create a cached layer for <code>go mod download</code>.</p>
<p>On future builds, as long as <code>go.mod</code> hasn't changed, Docker skips the download step entirely — even if the source code changed. This speeds up iterative development significantly.</p>
<p><code>RUN GOOS=$TARGETOS GOARCH=$TARGETARCH go build -ldflags="-w -s" -o server main.go</code></p>
<p>This is the cross-compilation step. <code>GOOS</code> and <code>GOARCH</code> are Go's built-in cross-compilation environment variables. Setting them tells the Go compiler to produce a binary for a different target than the machine it's running on. We set them from the <code>$TARGETOS</code> and <code>$TARGETARCH</code> build args injected by Buildx.</p>
<p>The <code>-ldflags="-w -s"</code> flag strips the debug symbol table and the DWARF debugging information from the binary. This has no effect on runtime behavior but reduces the binary size by roughly 30%.</p>
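<p>You can watch this cross-compilation trick work outside Docker, too. A sketch, run from the <code>app/</code> directory (the output binary name is arbitrary):</p>
<pre><code class="language-bash"># Produce an ARM64 Linux binary regardless of what machine you're on
GOOS=linux GOARCH=arm64 go build -ldflags="-w -s" -o server-arm64 main.go

file server-arm64
# Expect something like: ELF 64-bit LSB executable, ARM aarch64, statically linked, stripped
</code></pre>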
<h3 id="heading-stage-2-the-runtime-image">Stage 2: The Runtime Image</h3>
<p><code>FROM alpine:latest</code></p>
<p>This starts a brand-new image from Alpine Linux — a minimal Linux distribution that weighs about 5 MB. Critically, <code>alpine:latest</code> is itself a multi-arch image, so Docker automatically selects the <code>arm64</code> or <code>amd64</code> Alpine variant depending on which platform this stage is built for.</p>
<p>Everything from Stage 1 — the Go toolchain, the source files, the intermediate object files — is discarded. The final image contains <em>only</em> Alpine Linux plus our binary. Compared to a naive single-stage Go image (~300 MB), this approach produces an image under 15 MB.</p>
<p><code>RUN addgroup -S appgroup &amp;&amp; adduser -S appuser -G appgroup</code> and <code>USER appuser</code></p>
<p>These two lines create a non-root user and set it as the active user for the container. Running containers as root is a security risk — if an attacker exploits a vulnerability in your application, they gain root access inside the container. Running as a non-root user limits the blast radius.</p>
<p><code>COPY --from=builder /app/server .</code></p>
<p>This is how multi-stage builds work: the <code>--from=builder</code> flag tells Docker to copy files from the <code>builder</code> stage (Stage 1), not from your local disk. Only the compiled binary (<code>server</code>) makes it into the final image.</p>
<h2 id="heading-step-6-build-and-push-the-multi-arch-image">Step 6: Build and Push the Multi-Arch Image</h2>
<p>With the application and Dockerfile in place, we can now build images for both architectures and push them to Artifact Registry — all in a single command.</p>
<p>From inside the <code>app/</code> directory, run:</p>
<pre><code class="language-bash">docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1 \
  --push \
  .
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your actual GCP project ID.</p>
<p>Here's what each part of this command does:</p>
<ul>
<li><p><code>docker buildx build</code> — uses the Buildx CLI instead of the standard <code>docker build</code>. Buildx is required for multi-platform builds.</p>
</li>
<li><p><code>--platform linux/amd64,linux/arm64</code> — instructs Buildx to build the image twice: once targeting x86 Intel/AMD machines, and once targeting ARM64. Both builds run in parallel. Because our Dockerfile uses the <code>$BUILDPLATFORM</code> cross-compilation trick, both builds run natively on your machine without QEMU emulation.</p>
</li>
<li><p><code>-t us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1</code> — the full image path in Artifact Registry. The format is always <code>REGION-docker.pkg.dev/PROJECT_ID/REPO_NAME/IMAGE_NAME:TAG</code>.</p>
</li>
<li><p><code>--push</code> — multi-arch images can't be loaded into your local Docker daemon (which only understands single-architecture images). This flag tells Buildx to skip local storage and push the completed Manifest List — with both architecture variants — directly to the registry.</p>
</li>
<li><p><code>.</code> — the build context, the directory Docker scans for the Dockerfile and any files the build needs.</p>
</li>
</ul>
<p>Watch the output as the build runs. You'll see BuildKit working on both platforms simultaneously:</p>
<pre><code class="language-plaintext"> =&gt; [linux/amd64 builder 1/5] FROM golang:1.23-alpine
 =&gt; [linux/arm64 builder 1/5] FROM golang:1.23-alpine
 ...
 =&gt; pushing manifest for us-central1-docker.pkg.dev/.../hello-axion:v1
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/dc88f558-b4ee-4100-bfe1-eaa943bec9bc.png" alt="Terminal showing docker buildx build output with two parallel build tracks labeled linux/amd64 and linux/arm64, and a final line reading pushing manifest for the Artifact Registry image path." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-verify-the-multi-arch-image-in-artifact-registry">Verify the Multi-Arch Image in Artifact Registry</h3>
<p>Once the push completes, navigate to <strong>GCP Console → Artifact Registry → Repositories → multi-arch-repo</strong> and click on <code>hello-axion</code>.</p>
<p>You won't see a single image — you'll see something labelled <strong>"Image Index"</strong>. That's the Manifest List we created. Click into it, and you'll find two child images with separate digests, one for <code>linux/amd64</code> and one for <code>linux/arm64</code>.</p>
<p>You can also inspect this from the command line:</p>
<pre><code class="language-bash">docker buildx imagetools inspect \
  us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/28d0e4a4-1d45-4c0b-ac47-34dc3b72c11d.png" alt="Google Cloud Artifact Registry console showing hello-axion as an Image Index with two child images: one labeled linux/amd64 and one labeled linux/arm64, each with its own digest and size." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The output lists every manifest inside the image index. You'll see entries for <code>linux/amd64</code> and <code>linux/arm64</code> — those are our two real images. You'll also see two entries with <code>Platform: unknown/unknown</code> labelled as <code>attestation-manifest</code>. These are <strong>build provenance records</strong> that Docker Buildx automatically attaches to prove how and where the image was built (a supply chain security feature called SLSA attestation).</p>
<p>The two entries you care about are <code>linux/amd64</code> and <code>linux/arm64</code>. Note the top-level index digest as well: as you'll see in the verification step, that (not the <code>arm64</code>-specific digest) is what Kubernetes records for a running pod.</p>
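<p>The inspect output is shaped roughly like this (digests shortened, attestation entries trimmed):</p>
<pre><code class="language-plaintext">Name:      us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:9f2e...

Manifests:
  Name:      .../hello-axion:v1@sha256:a1b2...
  Platform:  linux/amd64

  Name:      .../hello-axion:v1@sha256:c3d4...
  Platform:  linux/arm64
</code></pre>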
<h2 id="heading-step-7-add-the-axion-arm-node-pool">Step 7: Add the Axion ARM Node Pool</h2>
<p>We have a universal image. Now we need somewhere to run it.</p>
<p>Recall the cluster we created in Step 2 — it's running <code>e2-standard-2</code> x86 machines. We're going to add a second node pool running ARM machines. This is the key architectural move: a <strong>mixed-architecture cluster</strong> where different workloads can be routed to different hardware.</p>
<h3 id="heading-choosing-your-arm-machine-type">Choosing Your ARM Machine Type</h3>
<p>Google Cloud currently offers two ARM-based machine series in GKE:</p>
<table>
<thead>
<tr>
<th>Series</th>
<th>Example type</th>
<th>What it is</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Tau T2A</strong></td>
<td><code>t2a-standard-2</code></td>
<td>First-gen Google ARM (Ampere Altra). Broadly available across regions. Great for getting started.</td>
</tr>
<tr>
<td><strong>Axion (C4A)</strong></td>
<td><code>c4a-standard-2</code></td>
<td>Google's custom ARM chip (Arm Neoverse V2 core). Newest generation, best price-performance. Still expanding availability.</td>
</tr>
</tbody></table>
<p>This tutorial uses <code>t2a-standard-2</code> because it's widely available. The commands are identical for <code>c4a-standard-2</code> — just swap the <code>--machine-type</code> value. If <code>t2a-standard-2</code> isn't available in your zone, GKE will tell you immediately when you run the node pool creation command below, and you can try a neighbouring zone.</p>
<h3 id="heading-create-the-arm-node-pool">Create the ARM Node Pool</h3>
<p>Add the ARM node pool to your existing cluster:</p>
<pre><code class="language-bash">gcloud container node-pools create axion-pool \
  --cluster=axion-tutorial-cluster \
  --zone=us-central1-a \
  --machine-type=t2a-standard-2 \
  --num-nodes=2 \
  --node-labels=workload-type=arm-optimized
</code></pre>
<p>What each flag does:</p>
<ul>
<li><p><code>--cluster=axion-tutorial-cluster</code> — the name of the cluster we created in Step 2. Node pools are always added to an existing cluster.</p>
</li>
<li><p><code>--zone=us-central1-a</code> — must match the zone you used when creating the cluster.</p>
</li>
<li><p><code>--machine-type=t2a-standard-2</code> — GKE detects this is an ARM machine type and automatically provisions the nodes with an ARM-compatible version of Container-Optimized OS (COS). You don't need to configure anything special for ARM at the OS level.</p>
</li>
<li><p><code>--num-nodes=2</code> — two ARM nodes in the zone, enough to schedule our 3-replica deployment alongside other cluster overhead.</p>
</li>
<li><p><code>--node-labels=workload-type=arm-optimized</code> — attaches a custom label to every node in this pool. We'll use this label in our deployment manifest to target these specific nodes. Using a descriptive custom label (rather than just relying on the automatic <code>kubernetes.io/arch=arm64</code> label) is good practice in real clusters — it communicates the <em>intent</em> of the pool, not just its hardware.</p>
</li>
</ul>
<p>This command takes a few minutes. Once it completes, let's confirm our cluster now has both node pools:</p>
<pre><code class="language-bash">gcloud container clusters get-credentials axion-tutorial-cluster --zone=us-central1-a

kubectl get nodes --label-columns=kubernetes.io/arch
</code></pre>
<p>The <code>get-credentials</code> command configures <code>kubectl</code> to authenticate with your new cluster. The <code>get nodes</code> command then lists all nodes and adds a column showing the <code>kubernetes.io/arch</code> label.</p>
<p>You should see something like:</p>
<pre><code class="language-plaintext">NAME                                    STATUS   ARCH    AGE
gke-...default-pool-abc...              Ready    amd64   15m
gke-...default-pool-def...              Ready    amd64   15m
gke-...axion-pool-jkl...                Ready    arm64   3m
gke-...axion-pool-mno...                Ready    arm64   3m
</code></pre>
<p><code>amd64</code> for the default x86 pool, <code>arm64</code> for our new Axion pool. This <code>kubernetes.io/arch</code> label is applied automatically by GKE — you don't set it, it's derived from the hardware.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/6389f4c6-17fe-4086-982f-39d94dbfa252.png" alt="Terminal output of kubectl get nodes with a ARCH column showing amd64 for two default-pool nodes and arm64 for two axion-pool nodes." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-step-8-deploy-the-app-to-the-arm-node-pool">Step 8: Deploy the App to the ARM Node Pool</h2>
<p>We have a multi-arch image and a mixed-architecture cluster. Here's something important to understand before writing the deployment manifest: <strong>Kubernetes doesn't know or care about image architecture by default</strong>.</p>
<p>If you applied a standard Deployment right now, the scheduler would look for any available node with enough CPU and memory and place pods there — potentially landing on x86 nodes instead of your ARM Axion nodes. The multi-arch Manifest List would handle this gracefully (the right binary would run regardless), but you'd lose the cost efficiency you provisioned Axion nodes for in the first place.</p>
<p>To guarantee that pods land on ARM nodes and only ARM nodes, we use a <code>nodeSelector</code>.</p>
<h3 id="heading-how-nodeselector-works">How nodeSelector Works</h3>
<p>A <code>nodeSelector</code> is a set of key-value pairs in your pod spec. Before the Kubernetes scheduler places a pod, it checks every available node's labels. If a node doesn't have all the labels in the <code>nodeSelector</code>, the scheduler skips it — the pod will remain in <code>Pending</code> state rather than land on the wrong node.</p>
<p>This is a hard constraint, which is exactly what we want for cost optimization. Contrast this with Node Affinity's soft preference mode (<code>preferredDuringSchedulingIgnoredDuringExecution</code>), which says "try to use ARM, but fall back to x86 if needed." Soft preferences are useful for resilience, but they undermine the whole point of dedicated ARM pools. We want the hard constraint.</p>
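<p>For reference, here's a sketch of that softer Node Affinity form (the one we're deliberately <em>not</em> using):</p>
<pre><code class="language-yaml"># Soft preference: schedule onto ARM when possible, fall back to x86 otherwise
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
          - arm64
</code></pre>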
<h3 id="heading-write-the-deployment-manifest">Write the Deployment Manifest</h3>
<p>Create <code>k8s/deployment.yaml</code>:</p>
<pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-axion
  labels:
    app: hello-axion
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-axion
  template:
    metadata:
      labels:
        app: hello-axion
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64

      containers:
      - name: hello-axion
        image: us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 5
        resources:
          requests:
            cpu: "250m"
            memory: "64Mi"
          limits:
            cpu: "500m"
            memory: "128Mi"
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your project ID. Here's what the key sections do:</p>
<p><code>replicas: 3</code> — tells Kubernetes to keep three instances of this pod running at all times. If one crashes or a node goes down, the scheduler spins up a replacement. With two ARM nodes in <code>axion-pool</code>, the three replicas are spread across both nodes, so a single node failure never takes the whole service down.</p>
<p><code>selector.matchLabels</code> and <code>template.metadata.labels</code> — these two blocks must match. The <code>selector</code> tells the Deployment which pods it "owns," and the <code>template.metadata.labels</code> is what those pods will be tagged with. If they don't match, Kubernetes won't be able to manage the pods.</p>
<p><code>nodeSelector: kubernetes.io/arch: arm64</code> — this is the pin. The Kubernetes scheduler filters out every node that doesn't carry this label before considering resource availability. Since GKE automatically applies <code>kubernetes.io/arch=arm64</code> to all ARM nodes, our pods will schedule only onto the <code>axion-pool</code> nodes.</p>
<p><code>livenessProbe</code> — periodically calls <code>GET /healthz</code>. If this check fails a certain number of times in a row (indicating the container has deadlocked or is otherwise unresponsive), Kubernetes restarts the container. <code>initialDelaySeconds: 5</code> gives the server 5 seconds to start up before the first check.</p>
<p><code>readinessProbe</code> — similar to the liveness probe, but with a different purpose. While the readiness probe is failing, Kubernetes removes the pod from the service's load balancer, so no traffic is sent to it. This is important during startup — the pod won't receive traffic until it signals it's ready.</p>
<p><code>resources.requests</code> — reserves <code>250m</code> (25% of a CPU core) and <code>64Mi</code> of memory on the node for this pod. The scheduler uses these numbers to decide whether a node has enough room for the pod. Setting requests is required for sensible bin-packing. Without them, nodes can be silently overcommitted.</p>
<p><code>resources.limits</code> — caps the container at <code>500m</code> CPU and <code>128Mi</code> memory. If the container exceeds these limits, Kubernetes throttles the CPU or kills the container (for memory). This prevents a single misbehaving pod from starving other workloads on the same node.</p>
<h3 id="heading-a-note-on-taints-and-tolerations">A Note on Taints and Tolerations</h3>
<p>Once you're comfortable with <code>nodeSelector</code>, the next step in production clusters is adding a <strong>taint</strong> to your ARM node pool. A taint is a repellent — any pod without an explicit <strong>toleration</strong> for that taint is blocked from landing on the tainted node.</p>
<p>This means other workloads in your cluster can't accidentally consume your ARM capacity. You'd add the taint when creating the pool:</p>
<pre><code class="language-bash"># Add --node-taints to the pool creation command:
--node-taints=workload-type=arm-optimized:NoSchedule
</code></pre>
<p>And a matching toleration in the pod spec:</p>
<pre><code class="language-yaml">tolerations:
- key: "workload-type"
  operator: "Equal"
  value: "arm-optimized"
  effect: "NoSchedule"
</code></pre>
<p>We're not doing this in the tutorial to keep things simple, but it's the pattern production multi-tenant clusters use to enforce hard separation between workload types.</p>
<h3 id="heading-write-the-service-manifest">Write the Service Manifest</h3>
<p>We also need a Kubernetes Service to expose the pods over the network. Create <code>k8s/service.yaml</code>:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  name: hello-axion-svc
spec:
  selector:
    app: hello-axion
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
</code></pre>
<ul>
<li><p><code>selector: app: hello-axion</code> — the Service discovers pods using labels. Any pod with <code>app: hello-axion</code> on it will be added to this Service's load balancer pool.</p>
</li>
<li><p><code>port: 80</code> — the port the Service is reachable on from outside the cluster.</p>
</li>
<li><p><code>targetPort: 8080</code> — the port on the pod that traffic gets forwarded to. Our Go server listens on port 8080, so this must match.</p>
</li>
<li><p><code>type: LoadBalancer</code> — tells GKE to provision an external Google Cloud load balancer and assign it a public IP. This is what makes the Service reachable from the internet.</p>
</li>
</ul>
<h3 id="heading-apply-both-manifests">Apply Both Manifests</h3>
<pre><code class="language-bash">kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
</code></pre>
<p><code>kubectl apply</code> reads each manifest file and creates or updates the resources described in it. If the resources don't exist yet, they're created. If they already exist, Kubernetes only applies the diff — it won't restart pods unnecessarily.</p>
<p>Watch the pods come up in real time:</p>
<pre><code class="language-bash">kubectl get pods -w
</code></pre>
<p>The <code>-w</code> flag watches for changes and prints updates as they happen. You should see pods transition from <code>Pending</code> → <code>ContainerCreating</code> → <code>Running</code>. Once all three show <code>Running</code>, press <code>Ctrl+C</code> to stop watching.</p>
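<p>Once the Service's external load balancer finishes provisioning (this can take a minute or two), you can also grab its public IP and hit it directly. The IP below is just an example:</p>
<pre><code class="language-bash">kubectl get service hello-axion-svc
# NAME              TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)
# hello-axion-svc   LoadBalancer   10.8.12.34    203.0.113.10   80:31234/TCP

curl http://203.0.113.10
</code></pre>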
<h2 id="heading-step-9-verify-the-deployment">Step 9: Verify the Deployment</h2>
<p>Everything is running. Now we need evidence — not just that pods are up, but that they're on the right nodes and serving the right binary.</p>
<h3 id="heading-confirm-pod-placement">Confirm Pod Placement</h3>
<pre><code class="language-bash">kubectl get pods -o wide
</code></pre>
<p>The <code>-o wide</code> flag adds extra columns to the output, including the name of the node each pod was scheduled on. Look at the <code>NODE</code> column:</p>
<pre><code class="language-plaintext">NAME                          READY   STATUS    NODE
hello-axion-7b8d9f-abc12      1/1     Running   gke-...axion-pool-jkl...
hello-axion-7b8d9f-def34      1/1     Running   gke-...axion-pool-mno...
hello-axion-7b8d9f-ghi56      1/1     Running   gke-...axion-pool-jkl...
</code></pre>
<p>All three pods should show node names containing <code>axion-pool</code> (with two ARM nodes and three replicas, one node will host two pods). None should show <code>default-pool</code>.</p>
<h3 id="heading-confirm-the-nodes-are-arm">Confirm the Nodes Are ARM</h3>
<p>Take one of those node names and verify its architecture label:</p>
<pre><code class="language-bash">kubectl get node NODE_NAME --show-labels | grep kubernetes.io/arch
</code></pre>
<p>Replace <code>NODE_NAME</code> with one of the node names from the previous command. You should see:</p>
<pre><code class="language-plaintext">kubernetes.io/arch=arm64
</code></pre>
<p>That's the automatic label GKE applied when it provisioned the ARM hardware. Our <code>nodeSelector</code> matched on this label to pin the pods here.</p>
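<p>Equivalently, you can list only the ARM nodes in one shot by filtering on that label:</p>
<pre><code class="language-bash">kubectl get nodes -l kubernetes.io/arch=arm64
</code></pre>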
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/815312ea-e2bf-4106-863e-55cd0bdad5f7.png" alt="Terminal split into two sections: the top showing kubectl get pods -o wide with all pods scheduled on nodes containing axion-pool in the name, and the bottom showing kubectl get node with kubernetes.io/arch=arm64 in the labels output." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-ask-the-application-itself">Ask the Application Itself</h3>
<p>This is the most satisfying verification step. Our Go server reports the architecture of the binary that's running. Let's ask it directly.</p>
<p>Use <code>kubectl port-forward</code> to create a secure tunnel from port 8080 on your local machine to port 8080 on the Deployment:</p>
<pre><code class="language-bash">kubectl port-forward deployment/hello-axion 8080:8080
</code></pre>
<p>This command stays running in the foreground — open a <strong>second terminal window</strong> and run:</p>
<pre><code class="language-bash">curl http://localhost:8080
</code></pre>
<p>You should see:</p>
<pre><code class="language-plaintext">Hello from freeCodeCamp!
Architecture : arm64
OS           : linux
Pod hostname : hello-axion-7b8d9f-abc12
</code></pre>
<p><code>Architecture : arm64</code>. That's our Go binary confirming that it was compiled for ARM64 and is executing on an ARM64 CPU. The single image tag we built does the right thing automatically.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/114ff82d-950f-4059-a1fa-89baffb90b6c.png" alt="Terminal output of curl http://localhost:8080 showing the four-line response: Hello from freeCodeCamp, Architecture: arm64, OS: linux, and the pod hostname." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-the-bonus-see-the-manifest-list-in-action">The Bonus: See the Manifest List in Action</h3>
<p>Want to see the multi-arch image indexing at work? Stop the port-forward, then run:</p>
<pre><code class="language-bash">docker buildx imagetools inspect \
  us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
</code></pre>
<p>Replace <code>PROJECT_ID</code> with your actual Google Cloud project ID.</p>
<p>You'll see four entries in the manifest list. Two are real images — <code>Platform: linux/amd64</code> and <code>Platform: linux/arm64</code>. The other two show <code>Platform: unknown/unknown</code> with an <code>attestation-manifest</code> annotation. These are <strong>build provenance records</strong> that Docker Buildx automatically attaches to every image — a supply chain security feature (SLSA attestation) that proves how and where the image was built.</p>
<p>You may notice that if you check the image digest recorded in a running pod:</p>
<pre><code class="language-bash">kubectl get pod POD_NAME \
  -o jsonpath='{.status.containerStatuses[0].imageID}'
</code></pre>
<p>Replace <code>POD_NAME</code> with one of the pod names from earlier.</p>
<p>The digest returned matches the <strong>top-level manifest list digest</strong>, not the <code>arm64</code>-specific one. This is expected behaviour. Modern Kubernetes (using containerd) records the manifest list digest, not the resolved platform digest. The platform resolution already happened when the node pulled the correct image variant.</p>
<p>The definitive proof that the right binary is running is what you already have: the node labeled <code>kubernetes.io/arch=arm64</code> and the application reporting <code>Architecture: arm64</code>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f97fb446ea7602886a16070/7dffe0c8-28cf-4a5d-8459-1e8db3da7dc0.png" alt="top-level manifest list digest" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-step-10-cost-savings-and-tradeoffs">Step 10: Cost Savings and Tradeoffs</h2>
<p>The hands-on work is done. Let's talk about why any of this is worth the effort.</p>
<h3 id="heading-the-cost-math">The Cost Math</h3>
<p>At the time of writing, here's how ARM compares to equivalent x86 machines on Google Cloud (prices are approximate and change over time — check the <a href="https://cloud.google.com/compute/vm-instance-pricing">official pricing page</a> before making decisions):</p>
<table>
<thead>
<tr>
<th>Instance</th>
<th>vCPU</th>
<th>Memory</th>
<th>Approx. $/hour</th>
</tr>
</thead>
<tbody><tr>
<td><code>n2-standard-4</code> (x86)</td>
<td>4</td>
<td>16 GB</td>
<td>~$0.19</td>
</tr>
<tr>
<td><code>t2a-standard-4</code> (Tau ARM)</td>
<td>4</td>
<td>16 GB</td>
<td>~$0.14</td>
</tr>
<tr>
<td><code>c4a-standard-4</code> (Axion)</td>
<td>4</td>
<td>16 GB</td>
<td>~$0.15</td>
</tr>
</tbody></table>
<p>That's a raw reduction of roughly 21–26% in compute cost per node. Factor in Google's published claim of up to 65% better price-performance for Axion on relevant workloads — meaning you may need fewer nodes to handle the same traffic — and the savings compound further.</p>
<p>Here's how that looks at scale, for a service running 20 nodes continuously for a year:</p>
<ul>
<li><p>20 × <code>n2-standard-4</code> × $0.19 × 8,760 hours = <strong>$33,288/year</strong></p>
</li>
<li><p>20 × <code>t2a-standard-4</code> × $0.14 × 8,760 hours = <strong>$24,528/year</strong></p>
</li>
</ul>
<p>That's roughly <strong>$8,760 saved annually</strong> on compute, before committed use discounts (which further widen the gap).</p>
<h3 id="heading-when-arm-is-the-right-choice">When ARM Is the Right Choice</h3>
<p>ARM works best for:</p>
<ul>
<li><p><strong>Stateless API servers and web applications</strong> — like the app we built. ARM excels at high-throughput, low-latency network workloads.</p>
</li>
<li><p><strong>Background workers and queue processors</strong> — long-running services that don't depend on x86-specific binaries.</p>
</li>
<li><p><strong>Microservices written in Go, Rust, or Python</strong> — these languages have excellent ARM64 support and are built cross-platform by default.</p>
</li>
</ul>
<h3 id="heading-when-to-proceed-carefully">When to Proceed Carefully</h3>
<ul>
<li><p><strong>Native library dependencies</strong> — some older C libraries, proprietary SDKs, or compiled ML model-serving runtimes don't have ARM64 builds. Always audit your dependency tree before migrating.</p>
</li>
<li><p><strong>CI pipelines need ARM too</strong> — your automated tests should run on ARM, not just x86. An image that silently fails only on ARM is harder to debug than one that never claimed ARM support.</p>
</li>
<li><p><strong>Profile before optimizing</strong> — the cost savings are real, but measure your actual workload behavior on ARM before committing. Not every workload benefits equally.</p>
</li>
</ul>
<h2 id="heading-cleanup">Cleanup</h2>
<p>When you're done, clean up to avoid ongoing charges:</p>
<pre><code class="language-bash"># Remove the Kubernetes resources from the cluster
kubectl delete -f k8s/

# Delete the ARM node pool
gcloud container node-pools delete axion-pool \
  --cluster=axion-tutorial-cluster \
  --zone=us-central1-a

# Delete the cluster itself
gcloud container clusters delete axion-tutorial-cluster \
  --zone=us-central1-a

# Delete the images from Artifact Registry (optional — storage costs are minimal)
gcloud artifacts docker images delete \
  us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:v1
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Let's recap what you built and why each part matters.</p>
<p>You started with a Go application, a Dockerfile, and a <code>docker buildx build</code> command that produced two images — one for x86, one for ARM64 — wrapped in a single Manifest List tag. Any server that pulls that tag gets the right binary automatically, without you maintaining separate pipelines or separate tags.</p>
<p>You provisioned a GKE cluster with two node pools running different CPU architectures, then used <code>nodeSelector</code> to make sure your ARM-optimized workload lands only on the ARM Axion nodes — not on x86 by accident. The result is a deployment that's both architecture-correct and cost-efficient.</p>
<p>The patterns you practiced here don't stop at this demo. The same Dockerfile technique works for any language with cross-compilation support. The same <code>nodeSelector</code> approach works for any workload you want to pin to ARM. As more teams migrate services to ARM over the coming years, having these skills will be a real asset.</p>
<p><strong>Where to go from here:</strong></p>
<ul>
<li><p>Add a GitHub Actions workflow that runs <code>docker buildx build --platform linux/amd64,linux/arm64</code> on every push, automating this entire process in CI (a sketch follows after this list).</p>
</li>
<li><p>Audit one of your existing stateless services for ARM compatibility and try migrating it.</p>
</li>
<li><p>Explore <strong>Node Affinity</strong> as a softer alternative to <code>nodeSelector</code> for workloads that can run on either architecture but prefer ARM.</p>
</li>
<li><p>Look into <strong>GKE Autopilot</strong>, which now supports ARM nodes and handles node pool management automatically.</p>
</li>
</ul>
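<p>As a starting point for that first item, here's a minimal workflow sketch. It assumes you handle Artifact Registry authentication in an earlier step (for example with the <code>google-github-actions/auth</code> action), and the file path and action versions are illustrative:</p>
<pre><code class="language-yaml"># .github/workflows/build.yml (illustrative)
name: multi-arch-build
on:
  push:
    branches: [main]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Same role as `docker buildx create --use` on your laptop
      - uses: docker/setup-buildx-action@v3

      # Registry authentication is assumed to happen here

      # Build both platforms and push the manifest list in one shot
      - uses: docker/build-push-action@v6
        with:
          context: ./app
          platforms: linux/amd64,linux/arm64
          push: true
          tags: us-central1-docker.pkg.dev/PROJECT_ID/multi-arch-repo/hello-axion:${{ github.sha }}
</code></pre>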
<p>Happy building.</p>
<h2 id="heading-project-file-structure">Project File Structure</h2>
<pre><code class="language-plaintext">hello-axion/
├── app/
│   ├── main.go          — Go HTTP server
│   ├── go.mod           — Go module definition
│   └── Dockerfile       — Multi-stage Dockerfile
└── k8s/
    ├── deployment.yaml  — Deployment with nodeSelector and probes
    └── service.yaml     — LoadBalancer Service
</code></pre>
<p>All source files for this tutorial are available in the companion GitHub repository: <a href="https://github.com/Amiynarh/multi-arch-docker-gke-arm">https://github.com/Amiynarh/multi-arch-docker-gke-arm</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Reduce Latency in Your Generative AI Apps with Gemini and Cloud Run ]]>
                </title>
                <description>
                    <![CDATA[ You've built your first Generative AI feature. Now what? When deploying AI, the challenge is no longer if the model can answer, but how fast it can answer for a user halfway across the globe. Low latency is not a luxury, it's a requirement for good u... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-reduce-latency-in-your-generative-ai-apps-with-gemini-and-cloud-run/</link>
                <guid isPermaLink="false">69398520ef68a953062588d1</guid>
                
                    <category>
                        <![CDATA[ optimization ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Load Balancing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Amina Lawal ]]>
                </dc:creator>
                <pubDate>Wed, 10 Dec 2025 14:35:12 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765370930321/e4256d2f-cab3-4ae3-9486-c6651e363366.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You've built your first Generative AI feature. Now what? When deploying AI, the challenge is no longer <em>if</em> the model can answer, but <em>how fast</em> it can answer for a user halfway across the globe. Low latency is not a luxury, it's a requirement for good user experience.</p>
<p>In this article, you won't just deploy a container: you'll build a <strong>global AI architecture</strong> that leverages Google's infrastructure to deliver context-aware, near-instant generative AI responses anywhere in the world. If you're ready to get your hands dirty, let's build it.</p>
<p>A global AI architecture is a design pattern that leverages a worldwide network to deploy and manage AI services, ensuring the fastest possible response time (low latency) for users, no matter where they are located. Instead of deploying a feature to a single region, this architecture distributes the service across multiple continents.</p>
<p>Most teams deploy a service to a single region. That's fine for nearby users, but physical distance and the speed of light create painful latency for everyone else. We are going to eliminate this problem by leveraging Google's global network to deploy the service in a "triangle" of locations.</p>
<p>The generative AI service you'll be building is a "Local Guide." The application is <strong>hyper-personalized</strong>: it changes its personality and recommendations based on the user's detected geographical context. For example, if a user is in Paris, the guide will greet them warmly, mention their city, and suggest a local activity.</p>
<p>You’re going to build this service to achieve three critical goals:</p>
<ul>
<li><p><strong>Lives Almost Everywhere:</strong> Deployed to three continents simultaneously (USA, Europe, and Asia).</p>
</li>
<li><p><strong>Feels Instant:</strong> Uses Google's global fiber network and Anycast IP to route users to the nearest server, ensuring the lowest possible latency.</p>
</li>
<li><p><strong>Knows Where You Are:</strong> Automatically detects the user's location (without relying on client-side GPS permissions) to provide deeply personalized, location-aware suggestions.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-1-the-location-aware-code">Phase 1: The "Location-Aware" Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-2-build-amp-push">Phase 2: Build &amp; Push</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-3-the-triangle-deployment">Phase 3: The "Triangle" Deployment</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-4-the-global-network-the-glue">Phase 4: The Global Network (The Glue)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-5-testing-teleportation-time">Phase 5: Testing (Teleportation Time)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion-the-global-ai-edge">Conclusion: The Global AI Edge</a></p>
</li>
</ul>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>To follow along, you need:</p>
<ol>
<li><p><strong>A Google Cloud Project</strong> (with billing enabled).</p>
</li>
<li><p><strong>Google Cloud Shell</strong> (Recommended! No local setup required). Click the icon in the top right of the GCP Console that looks like a terminal prompt <code>&gt;_</code>.</p>
</li>
</ol>
<p><strong>Note</strong>: The project uses several Google Cloud services (Cloud Run, Artifact Registry, Load Balancer, Vertex AI), all of which require a Google Cloud Project with billing enabled. Many of these services offer a free tier, but you must still link a billing account to your project. New Google Cloud users may be eligible for a <a target="_blank" href="https://console.cloud.google.com/freetrial?hl=en&amp;facet_utm_source=google&amp;facet_utm_campaign=%28organic%29&amp;facet_utm_medium=organic&amp;facet_url=https%3A%2F%2Fcloud.google.com%2Fsignup-faqs"><strong>free trial credit</strong></a> that should cover the cost of this lab. <a target="_blank" href="https://cloud.google.com/free/docs/free-cloud-features#free-trial">See credit program eligibility and coverage</a>.</p>
<h2 id="heading-phase-1-the-location-aware-code"><strong>Phase 1: The "Location-Aware" Code</strong></h2>
<p>We don't want a generic chatbot, so we'll build a "Local Guide" that changes its personality based on where the request comes from.</p>
<h3 id="heading-enable-the-apis"><strong>Enable the APIs</strong></h3>
<p>To wake up the services, run this in your terminal:</p>
<pre><code class="lang-bash">gcloud services <span class="hljs-built_in">enable</span> \
  run.googleapis.com \
  artifactregistry.googleapis.com \
  compute.googleapis.com \
  aiplatform.googleapis.com \
  cloudbuild.googleapis.com
</code></pre>
<p>This command enables the necessary Google Cloud APIs for the project:</p>
<ul>
<li><p>Cloud Run (<code>run.googleapis.com</code>)</p>
</li>
<li><p>Artifact Registry (<code>artifactregistry.googleapis.com</code>)</p>
</li>
<li><p>Compute Engine (<code>compute.googleapis.com</code>)</p>
</li>
<li><p>Vertex AI (<code>aiplatform.googleapis.com</code>)</p>
</li>
<li><p>Cloud Build (<code>cloudbuild.googleapis.com</code>)</p>
</li>
</ul>
<p>Enabling them ensures that the services we need are ready to be used.</p>
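<p>You can optionally confirm they're active by listing the enabled services:</p>
<pre><code class="lang-bash"># List enabled services and filter for the ones we just turned on
gcloud services list --enabled | grep -E 'run|artifactregistry|compute|aiplatform|cloudbuild'
</code></pre>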
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764156603095/fb2ffd56-12e4-4b9f-ac2d-8fbb30fc0a2d.png" alt="Screenshot showing the Google Cloud APIs being successfully completed" class="image--center mx-auto" width="2132" height="280" loading="lazy"></p>
<h3 id="heading-create-and-populate-mainpyhttpmainpy">Create and Populate <a target="_blank" href="http://main.py"><code>main.py</code></a></h3>
<p>This is the brain of our service. In your Cloud Shell terminal, create a file named <a target="_blank" href="http://main.py"><code>main.py</code></a> and paste the following code into it:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask, request, jsonify
<span class="hljs-keyword">import</span> vertexai
<span class="hljs-keyword">from</span> vertexai.generative_models <span class="hljs-keyword">import</span> GenerativeModel

app = Flask(__name__)

<span class="hljs-comment"># Initialize Vertex AI</span>
PROJECT_ID = os.environ.get(<span class="hljs-string">"GOOGLE_CLOUD_PROJECT"</span>)
vertexai.init(project=PROJECT_ID)

<span class="hljs-meta">@app.route("/", methods=["GET", "POST"])</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate</span>():</span>
    <span class="hljs-comment"># 1. Identify where the code is physically running (We set this ENV var later)</span>
    service_region = os.environ.get(<span class="hljs-string">"SERVICE_REGION"</span>, <span class="hljs-string">"unknown-region"</span>)

    <span class="hljs-comment"># 2. Identify where the user is (Header comes from Global Load Balancer)</span>
    <span class="hljs-comment"># Format typically: "City,State,Country"</span>
    user_location = request.headers.get(<span class="hljs-string">"X-Client-Geo-Location"</span>, <span class="hljs-string">"Unknown Location"</span>)

    model = GenerativeModel(<span class="hljs-string">"gemini-2.5-flash"</span>)

    <span class="hljs-comment"># 3. Construct a location-aware prompt</span>
    prompt = (
        <span class="hljs-string">f"You are a helpful local guide. The user is currently in <span class="hljs-subst">{user_location}</span>. "</span>
        <span class="hljs-string">"Greet them warmly mentioning their city, and suggest one "</span>
        <span class="hljs-string">"hidden gem activity to do nearby right now. Keep it under 50 words."</span>
    )

    <span class="hljs-keyword">try</span>:
        response = model.generate_content(prompt)
        <span class="hljs-keyword">return</span> jsonify({
            <span class="hljs-string">"ai_response"</span>: response.text,
            <span class="hljs-string">"meta"</span>: {
                <span class="hljs-string">"served_from_region"</span>: service_region,
                <span class="hljs-string">"user_detected_location"</span>: user_location
            }
        })
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> jsonify({<span class="hljs-string">"error"</span>: str(e)}), <span class="hljs-number">500</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    app.run(debug=<span class="hljs-literal">True</span>, host=<span class="hljs-string">"0.0.0.0"</span>, port=int(os.environ.get(<span class="hljs-string">"PORT"</span>, <span class="hljs-number">8080</span>)))
</code></pre>
<p>It's a simple Flask web application that relies on a single HTTP header (<code>X-Client-Geo-Location</code>), which the global load balancer will supply later in the process. This design choice keeps the Python code clean, fast, and focused on using the context that the Google Cloud infrastructure provides. Here's what the code does:</p>
<ul>
<li><p><strong>Initialization:</strong> Sets up the Flask app and initializes the Vertex AI client using the project ID.</p>
</li>
<li><p><strong>Context:</strong> It extracts two critical pieces of information: the <code>SERVICE_REGION</code> (where the code is physically running) from the environment variable, and the <code>X-Client-Geo-Location</code> (the user's detected location) from the request header, which will be injected by the global load balancer.</p>
</li>
<li><p><strong>AI Generation:</strong> It uses the high-performance <code>gemini-2.5-flash</code> model.</p>
</li>
<li><p><strong>Prompt Construction:</strong> A dynamic, location-aware prompt is built using the detected city to instruct Gemini to act as a helpful local guide and provide a personalized suggestion.</p>
</li>
<li><p><strong>Response:</strong> The response includes the AI's generated text and a <code>meta</code> section containing both the serving region and the user's detected location, which helps in verification.</p>
</li>
</ul>
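<p>Before containerizing, you can optionally smoke-test the service straight from Cloud Shell. This is a sketch that assumes your session has Application Default Credentials (run <code>gcloud auth application-default login</code> if the Vertex AI call returns an auth error):</p>
<pre><code class="lang-bash"># Install dependencies and run the app locally
pip install flask google-cloud-aiplatform
export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project)
python3 main.py &

# Simulate the header the load balancer will inject later
curl -H "X-Client-Geo-Location: Paris,France" http://localhost:8080/
</code></pre>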
<h3 id="heading-create-the-dockerfile"><strong>Create the</strong> <code>Dockerfile</code></h3>
<p>This Dockerfile tells Cloud Run how to build the Python application into a container image. Create a file named <code>Dockerfile</code> in the same directory as <code>main.py</code> and paste the following content into it:</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.9</span>-slim

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> main.py .</span>

<span class="hljs-comment"># Install Flask and Vertex AI SDK</span>
<span class="hljs-keyword">RUN</span><span class="bash"> pip install flask google-cloud-aiplatform</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"python"</span>, <span class="hljs-string">"main.py"</span>]</span>
</code></pre>
<p>Here’s what the code does:</p>
<ul>
<li><p>Starts with a lightweight Python base image <code>python:3.9-slim</code>.</p>
</li>
<li><p>Sets the working directory inside the container <code>WORKDIR /app</code>.</p>
</li>
<li><p>Copies your application code into the container.</p>
</li>
<li><p><code>RUN pip install...</code> installs the required Python packages: Flask for the web server and <code>google-cloud-aiplatform</code> for accessing the Gemini model.</p>
</li>
<li><p><code>CMD</code> specifies the command to run when the container starts.</p>
</li>
</ul>
<h2 id="heading-phase-2-build-amp-push"><strong>Phase 2: Build &amp; Push</strong></h2>
<p>Let's package this up. For efficiency and consistency, we’ll follow the best practice of Build Once, Deploy Many. We’ll build the container image once using Cloud Build and store it in Google's Artifact Registry. This guarantees that the same tested application code runs in New York, Belgium, and Tokyo.</p>
<p>First, set an environment variable for your Google Cloud Project ID to simplify later commands:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 1. Set your Project ID variable</span>
<span class="hljs-built_in">export</span> PROJECT_ID=$(gcloud config get-value project)
</code></pre>
<p>Then create a new Docker repository named <code>gemini-global-repo</code> in the <code>us-central1</code> region to store the application container image:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 2. Create the repository</span>
gcloud artifacts repositories create gemini-global-repo \
    --repository-format=docker \
    --location=us-central1 \
    --description=<span class="hljs-string">"Repo for Global Gemini App"</span>
</code></pre>
<p>Next, create a dedicated directory and move into it. Your <code>main.py</code> and <code>Dockerfile</code> should live here; keeping the build context to just these files ensures the build doesn't sweep up temporary files from Cloud Shell's home directory:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 3. Prepare the build environment (crucial step! 💡)</span>
mkdir gemini-app
<span class="hljs-built_in">cd</span> gemini-app
</code></pre>
<p>Next, use <code>gcloud builds submit --tag</code> to build the container image from the files in the current directory and push the resulting image to the newly created Artifact Registry repository:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 4. Build the image (This takes about 2 minutes)</span>
gcloud builds submit --tag us-central1-docker.pkg.dev/<span class="hljs-variable">$PROJECT_ID</span>/gemini-global-repo/region-ai:v1
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764159484475/97a5b2b6-f3c2-4d1b-8bf8-6f302748e744.png" alt="Screenshot of Cloud Shell Editor showing Dockerfile and terminal build output." class="image--center mx-auto" width="2880" height="1348" loading="lazy"></p>
<p><strong>NOTE:</strong> You might notice that we created the Artifact Registry repository (<code>gemini-global-repo</code>) in the <code>us-central1</code> region. This choice is purely for management and storage of the container image: a regional Artifact Registry repository is still accessible globally. For this lab, <code>us-central1</code> serves as the reliable, central home of the single canonical container image, which Cloud Run then pulls into the three separate global regions.</p>
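<p>If you'd like to confirm the push succeeded before moving on, list the images in the repository:</p>
<pre><code class="lang-bash"># List the images stored in the new repository
gcloud artifacts docker images list \
    us-central1-docker.pkg.dev/$PROJECT_ID/gemini-global-repo
</code></pre>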
<h2 id="heading-phase-3-the-triangle-deployment"><strong>Phase 3: The "Triangle" Deployment</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764661657796/0890a47b-589a-4cf8-b537-bb61e5e65ee7.png" alt="Diagram of the Global AI Architecture Triangle Deployment." class="image--center mx-auto" width="1024" height="1024" loading="lazy"></p>
<p>We’ll deploy the same image to three corners of the world, forming our "Triangle". This ensures that whether a user is in Lagos, London, or Tokyo, they’ll be geographically close to a server. This is the low-latency core of our architecture.</p>
<p>We’ll use Cloud Run to deploy our services. Cloud Run is a fully managed serverless platform on Google Cloud that runs stateless containers in response to web requests or events. You don't manage any virtual machines, operating system updates, or scaling infrastructure: you provide a container image, and Cloud Run automatically scales it up (and down to zero) in the region you specify.</p>
<p>For this project, we’ll use its regional deployment capability to easily and consistently deploy the exact same container image to New York, Belgium, and Tokyo.</p>
<p><strong>Note:</strong> Setting it up primarily involves enabling the API (done in Phase 1) and using the <code>gcloud run deploy</code> command, which handles provisioning and managing the service in the specified region.</p>
<p>Now, we’ll proceed to deploy the single, canonical container image to three separate Cloud Run regions, forming the "Triangle Deployment".</p>
<p>First, set a variable for the image path, pointing to the image stored in Artifact Registry.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Define our image URL</span>
<span class="hljs-built_in">export</span> IMAGE_URL=us-central1-docker.pkg.dev/<span class="hljs-variable">$PROJECT_ID</span>/gemini-global-repo/region-ai:v1
</code></pre>
<pre><code class="lang-bash">
<span class="hljs-comment"># 1. Deploy to USA (New York)</span>
gcloud run deploy gemini-service \
    --image <span class="hljs-variable">$IMAGE_URL</span> \
    --region us-east4 \
    --set-env-vars SERVICE_REGION=us-east4 \
    --allow-unauthenticated

<span class="hljs-comment"># 2. Deploy to Europe (Belgium)</span>
gcloud run deploy gemini-service \
    --image <span class="hljs-variable">$IMAGE_URL</span> \
    --region europe-west1 \
    --set-env-vars SERVICE_REGION=europe-west1 \
    --allow-unauthenticated

<span class="hljs-comment"># 3. Deploy to Asia (Tokyo)</span>
gcloud run deploy gemini-service \
    --image <span class="hljs-variable">$IMAGE_URL</span> \
    --region asia-northeast1 \
    --set-env-vars SERVICE_REGION=asia-northeast1 \
    --allow-unauthenticated
</code></pre>
<p><code>gcloud run deploy gemini-service...</code> deploys the service. Key flags:</p>
<ul>
<li><p><code>--image $IMAGE_URL</code> specifies the container image to use.</p>
</li>
<li><p><code>--region</code> specifies the deployment region (for example, <code>us-east4</code> for New York).</p>
</li>
<li><p><code>--set-env-vars SERVICE_REGION=...</code> injects an environment variable into the running container to let the <code>main.py</code> code know its own physical region.</p>
</li>
<li><p><code>--allow-unauthenticated</code> makes the service publicly accessible, as required for the Load Balancer to connect.</p>
</li>
</ul>
<p><strong>Note:</strong> The commands are repeated for Europe (<code>europe-west1</code>) and Asia (<code>asia-northeast1</code>) regions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160600271/fbb6a810-7496-4b29-a405-b67a22a988ed.png" alt="Screenshot of Cloud Shell terminal showing the execution of the cloud run services." class="image--right mx-auto mr-0" width="2880" height="1348" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160624375/dd4dc7e7-22a9-4d8b-a36c-7a0988068f57.png" alt="Cloud run Service Url (asia region) terminal screenshot showing the successful execution of the service" class="image--center mx-auto" width="2880" height="536" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160656898/1b6ca938-9ce4-48f6-bb3b-d09900dbde68.png" alt="Cloud run Service Url (europe region) terminal screenshot showing the successful execution of the service" class="image--center mx-auto" width="2880" height="536" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160665595/39c2524d-62c8-4187-8b8f-15f7ebbffba4.png" alt="Cloud run Service Url (us-east region) terminal screenshot showing the successful execution of the service" class="image--center mx-auto" width="2880" height="536" loading="lazy"></p>
<p>When you call these URLs directly, <code>user_detected_location</code> is always "Unknown Location". This is expected: you are bypassing the global load balancer, so the <code>X-Client-Geo-Location</code> header is not yet being injected.</p>
<h2 id="heading-phase-4-the-global-network-the-glue"><strong>Phase 4: The Global Network (The Glue)</strong></h2>
<p>You are now ready to execute the steps to create the <strong>Global External HTTP Load Balancer</strong> infrastructure. This is the "magic" that stitches the three regional services together behind a single <strong>Anycast IP Address</strong>. The load balancer performs two critical functions:</p>
<ol>
<li><p><strong>Global Routing:</strong> It uses Google’s high-speed network to automatically route the user to the closest available region (for example, Tokyo user → Asia service).</p>
</li>
<li><p><strong>Context Injection:</strong> It can dynamically add the <code>X-Client-Geo-Location</code> header to each request, telling your code exactly where the user is (configured as a custom request header; see the sketch after the backend service is created below).</p>
</li>
</ol>
<h3 id="heading-the-global-ip"><strong>The Global IP</strong></h3>
<p><code>gcloud compute addresses create...</code> creates a single, global, static Anycast IP address (<code>gemini-global-ip</code>) that will serve as the single public entry point for users worldwide:</p>
<pre><code class="lang-bash">gcloud compute addresses create gemini-global-ip \
    --global \
    --ip-version IPV4
</code></pre>
<h3 id="heading-the-network-endpoint-groups-negs"><strong>The Network Endpoint Groups (NEGs)</strong></h3>
<p><code>gcloud compute network-endpoint-groups create...</code> creates a <strong>Serverless Network Endpoint Group (NEG)</strong> for each regional Cloud Run deployment. For example, <code>neg-us</code> is created in <code>us-east4</code> and points to the <code>gemini-service</code> in that region. These map your Cloud Run services to the Load Balancer's backend service:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># USA NEG</span>
gcloud compute network-endpoint-groups create neg-us \
    --region=us-east4 \
    --network-endpoint-type=serverless  \
    --cloud-run-service=gemini-service

<span class="hljs-comment"># Europe NEG</span>
gcloud compute network-endpoint-groups create neg-eu \
    --region=europe-west1 \
    --network-endpoint-type=serverless \
    --cloud-run-service=gemini-service

<span class="hljs-comment"># Asia NEG</span>
gcloud compute network-endpoint-groups create neg-asia \
    --region=asia-northeast1 \
    --network-endpoint-type=serverless \
    --cloud-run-service=gemini-service
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764161003478/549c959d-8ab5-45d6-a2ae-94129529b5b4.png" alt="Screenshot of Cloud Shell terminal showing the execution of global load balancer setup commands." class="image--center mx-auto" width="2880" height="1010" loading="lazy"></p>
<h3 id="heading-the-backend-service-amp-routing"><strong>The Backend Service &amp; Routing</strong></h3>
<p>This is the load balancer's core, distributing traffic across your regions. Connect the NEGs to a global backend.</p>
<p><code>gcloud compute backend-services create...</code> creates the global backend service (<code>gemini-backend-global</code>), which is the core component that manages traffic distribution:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create the backend service</span>
gcloud compute backend-services create gemini-backend-global \
    --global \
    --protocol=HTTP
</code></pre>
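<p>One detail worth making explicit: the load balancer doesn't add the geo header on its own. In Phase 5 we'll inject it manually with <code>curl</code> for testing, but to have the load balancer stamp real traffic, you attach a custom request header to this backend service. A minimal sketch, assuming the documented <code>{client_city}</code> and <code>{client_region}</code> variables (check the custom-headers documentation for the exact variables available to your load balancer type):</p>
<pre><code class="lang-bash"># Have the load balancer stamp each request with the client's geo data
gcloud compute backend-services update gemini-backend-global \
    --global \
    --custom-request-header='X-Client-Geo-Location: {client_city},{client_region}'
</code></pre>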
<p><code>gcloud compute backend-services add-backend...</code> adds all three regional NEGs (<code>neg-us</code>, <code>neg-eu</code>, <code>neg-asia</code>) as backends to the global service. This tells the load balancer where all the services are located:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Add the 3 regions to the backend</span>
gcloud compute backend-services add-backend gemini-backend-global \
    --global --network-endpoint-group=neg-us --network-endpoint-group-region=us-east4
gcloud compute backend-services add-backend gemini-backend-global \
    --global --network-endpoint-group=neg-eu --network-endpoint-group-region=europe-west1
gcloud compute backend-services add-backend gemini-backend-global \
    --global --network-endpoint-group=neg-asia --network-endpoint-group-region=asia-northeast1
</code></pre>
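<p>At this point the backend service should list all three NEGs. A quick way to eyeball the wiring:</p>
<pre><code class="lang-bash"># Confirm all three NEGs are attached as backends
gcloud compute backend-services describe gemini-backend-global \
    --global | grep group
</code></pre>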
<h3 id="heading-the-url-map-amp-frontend"><strong>The URL Map &amp; Frontend</strong></h3>
<p>Now we can finalize the connection.</p>
<p><code>gcloud compute url-maps create...</code> creates a URL Map (<code>gemini-url-map</code>) to direct all incoming traffic to the Backend Service:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create URL Map (Maps incoming requests to the backend service)</span>
gcloud compute url-maps create gemini-url-map \
    --default-service gemini-backend-global
</code></pre>
<p><code>gcloud compute target-http-proxies create...</code> creates an HTTP Proxy (<code>gemini-http-proxy</code>) that inspects the request and directs it based on the URL map.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create HTTP Proxy (The component that inspects the request headers)</span>
gcloud compute target-http-proxies create gemini-http-proxy \
    --url-map gemini-url-map
</code></pre>
<p><code>export VIP=...</code> retrieves the final, public IP address of the newly created Global IP and stores it in the <code>VIP</code> environment variable.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Get your IP Address variable</span>
<span class="hljs-built_in">export</span> VIP=$(gcloud compute addresses describe gemini-global-ip --global --format=<span class="hljs-string">"value(address)"</span>)
</code></pre>
<p><code>gcloud compute forwarding-rules create...</code> creates the final global Forwarding Rule (<code>gemini-forwarding-rule</code>). This links the Global IP (<code>$VIP</code>) to the HTTP Proxy and opens port 80 for public traffic.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create Forwarding Rule (Open port 80)</span>
gcloud compute forwarding-rules create gemini-forwarding-rule \
    --address=<span class="hljs-variable">$VIP</span> \
    --global \
    --target-http-proxy=gemini-http-proxy \
    --ports=80
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764161323862/299c6c43-9074-493c-95b1-2c08208aa2ec.png" alt="Cloud Shell terminal screenshot showing the successful execution of commands to create the gemini-backend-global service" class="image--center mx-auto" width="2880" height="1010" loading="lazy"></p>
<h2 id="heading-phase-5-testing-teleportation-time"><strong>Phase 5: Testing (Teleportation Time)</strong></h2>
<p>Global load balancers take about <strong>5-7 minutes</strong> to propagate worldwide. Once propagation completes, you'll verify that the global load balancer is:</p>
<ul>
<li><p><strong>Serving traffic</strong> from the single <strong>VIP</strong> (Virtual IP) address.</p>
</li>
<li><p><strong>Routing traffic</strong> to the nearest server.</p>
</li>
<li><p><strong>Injecting the</strong> <code>X-Client-Geo-Location</code> header to tell your code where the user is.</p>
</li>
</ul>
<h3 id="heading-1-get-your-global-ip"><strong>1. Get your Global IP</strong></h3>
<p>First, ensure your <code>VIP</code> variable is set and retrieve the final address:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"http://<span class="hljs-variable">$VIP</span>/"</span>
</code></pre>
<p>The output will be your single point of entry for the entire global architecture.</p>
<h3 id="heading-2-test-teleportation"><strong>2. Test "Teleportation"</strong></h3>
<p>These <code>curl</code> commands simulate users in different geographical locations by manually injecting the <code>X-Client-Geo-Location</code> header, which removes the need to be physically in those places. One caveat: the injected header only drives the <em>personalization</em>. The routing decision, reported as <code>served_from_region</code>, is based on where your request actually originates, so all three tests run from the same Cloud Shell will typically land in the same, nearest region.</p>
<h4 id="heading-simulate-europe-paris">Simulate Europe (Paris)</h4>
<p>A real user in Paris would be routed to the <code>europe-west1</code> region, because it's their closest server.</p>
<pre><code class="lang-bash">curl -H <span class="hljs-string">"X-Client-Geo-Location: Paris,France"</span> http://<span class="hljs-variable">$VIP</span>/
</code></pre>
<p><em>Expected Output:</em> Gemini should say "Bonjour" and mention Paris. For a user actually in Paris, <code>served_from_region</code> would be <code>europe-west1</code>.</p>
<h4 id="heading-simulate-asia-tokyo">Simulate Asia (Tokyo)</h4>
<p>A real user in Tokyo would be routed to the <code>asia-northeast1</code> region.</p>
<pre><code class="lang-bash">curl -H <span class="hljs-string">"X-Client-Geo-Location: Tokyo,Japan"</span> http://<span class="hljs-variable">$VIP</span>/
</code></pre>
<p><em>Expected Output:</em> Gemini should mention Tokyo. For a user actually in Tokyo, <code>served_from_region</code> would be <code>asia-northeast1</code>.</p>
<h4 id="heading-simulate-usa-new-york">Simulate USA (New York)</h4>
<p>A real user in New York would be routed to the <code>us-east4</code> region.</p>
<pre><code class="lang-bash">curl -s -H <span class="hljs-string">"X-Client-Geo-Location: New York,USA"</span> http://<span class="hljs-variable">$VIP</span>/ | jq .
</code></pre>
<p><em>Expected Output:</em> Gemini should mention New York. For a user actually there, <code>served_from_region</code> would be <code>us-east4</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764161891891/ecc290ef-1c75-4088-b453-093a92b404ff.png" alt="Cloud Shell terminal screenshot showing the results of curl commands simulating users in Paris, Tokyo, and New York." class="image--center mx-auto" width="2880" height="1010" loading="lazy"></p>
<p><strong>Note:</strong> The <code>| jq .</code> part is optional, but highly recommended as it formats the JSON output, making it much easier to read the <code>served_from_region</code> and <code>ai_response</code> details. If <code>jq</code> isn't available, you can just run <code>curl ...</code> without it.</p>
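<p>Since the whole point of this architecture is latency, it's worth measuring it. <code>curl</code> can report timings directly; run this from Cloud Shell (and, if you can, from a machine in another part of the world) to compare:</p>
<pre><code class="lang-bash"># Time a request to the global VIP
curl -s -o /dev/null \
    -w "connect: %{time_connect}s  total: %{time_total}s\n" \
    http://$VIP/
</code></pre>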
<h2 id="heading-conclusion-the-global-ai-edge">Conclusion: The Global AI Edge</h2>
<p>Congratulations! You have successfully built a global AI architecture that solves the twin challenges of latency and personalization for generative AI features. By combining Cloud Run, a global load balancer, and Gemini, you achieved two critical outcomes:</p>
<ul>
<li><p><strong>Guaranteed Low Latency:</strong> By deploying the <strong>Cloud Run</strong> service to a "Triangle" of global regions (USA, Europe, Asia) and using the <strong>Global External HTTP Load Balancer's Anycast IP</strong>, your users are automatically routed across Google’s private fiber network to the closest available server.</p>
</li>
<li><p><strong>Hyper-Personalization:</strong> The global load balancer was configured to dynamically inject the user's geographical location via the <code>X-Client-Geo-Location</code> header. This context was passed directly to the <strong>Gemini 2.5 Flash</strong> model, allowing it to act as a truly location-aware "Local Guide".</p>
</li>
</ul>
<p>This pattern allows you to scale intelligent features globally and is immediately applicable to any application where speed and context are essential, from real-time translations to hyper-local recommendations.</p>
<h3 id="heading-cleanup"><strong>Cleanup</strong></h3>
<p>Don't leave the meter running! Run the cleanup commands below, in order (each frontend resource must be deleted before the resources it depends on), so you don't incur unnecessary charges:</p>
<pre><code class="lang-bash">gcloud run services delete gemini-service --region us-east4 --quiet
gcloud run services delete gemini-service --region europe-west1 --quiet
gcloud run services delete gemini-service --region asia-northeast1 --quiet
gcloud compute forwarding-rules delete gemini-forwarding-rule --global --quiet
gcloud compute target-http-proxies delete gemini-http-proxy --global --quiet
gcloud compute url-maps delete gemini-url-map --global --quiet
gcloud compute backend-services delete gemini-backend-global --global --quiet
gcloud compute addresses delete gemini-global-ip --global --quiet
</code></pre>
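<p>The commands above don't remove the serverless NEGs or the Artifact Registry repository. For a fully clean project, delete those too (the NEGs can only be deleted once the backend service is gone):</p>
<pre><code class="lang-bash"># Delete the serverless NEGs
gcloud compute network-endpoint-groups delete neg-us --region=us-east4 --quiet
gcloud compute network-endpoint-groups delete neg-eu --region=europe-west1 --quiet
gcloud compute network-endpoint-groups delete neg-asia --region=asia-northeast1 --quiet

# Delete the container repository (this also removes the stored image)
gcloud artifacts repositories delete gemini-global-repo --location=us-central1 --quiet
</code></pre>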
<h3 id="heading-resources">Resources</h3>
<ul>
<li><p>Google Cloud Shell Documentation</p>
</li>
<li><p><a target="_blank" href="https://www.google.com/search?q=https://cloud.google.com/vertex-ai/docs/generative-ai/learn/sdk">Vertex AI Generative AI SDK</a></p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/artifact-registry/docs">Artifact Registry Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/run/docs">Cloud Run Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://www.google.com/search?q=https://cloud.google.com/load-balancing/docs/load-balancing-overview%23external_http_s_load_balancing">Global External HTTP(S) Load Balancer Overview</a></p>
</li>
<li><p><a target="_blank" href="https://www.google.com/search?q=https://cloud.google.com/load-balancing/docs/negs/serverless-neg-overview">Serverless Network Endpoint Groups (NEGs)</a></p>
</li>
<li><p><a target="_blank" href="https://docs.cloud.google.com/run/docs/multiple-regions">Serve traffic from multiple regions</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
