<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Load Balancing - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Load Balancing - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 04:32:58 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/load-balancing/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Reduce Latency in Your Generative AI Apps with Gemini and Cloud Run ]]>
                </title>
                <description>
                    <![CDATA[ You've built your first Generative AI feature. Now what? When deploying AI, the challenge is no longer if the model can answer, but how fast it can answer for a user halfway across the globe. Low latency is not a luxury, it's a requirement for good u... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-reduce-latency-in-your-generative-ai-apps-with-gemini-and-cloud-run/</link>
                <guid isPermaLink="false">69398520ef68a953062588d1</guid>
                
                    <category>
                        <![CDATA[ optimization ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Load Balancing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Amina Lawal ]]>
                </dc:creator>
                <pubDate>Wed, 10 Dec 2025 14:35:12 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765370930321/e4256d2f-cab3-4ae3-9486-c6651e363366.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You've built your first Generative AI feature. Now what? When deploying AI, the challenge is no longer <em>if</em> the model can answer, but <em>how fast</em> it can answer for a user halfway across the globe. Low latency is not a luxury, it's a requirement for good user experience.</p>
<p>Today, we’ve moved beyond simple container deployments and into building <strong>Global AI Architectures</strong>. This setup leverages Google’s infrastructure to deliver context-aware, instant Gen AI responses anywhere in the world. If you're ready to get your hands dirty, let's build the future of global, intelligent features.</p>
<p>In this article, you’re not just going to deploy a container: you’ll be building a global AI architecture.</p>
<p>A global AI architecture is a design pattern that leverages a worldwide network to deploy and manage AI services, ensuring the fastest possible response time (low latency) for users, no matter where they are located. Instead of deploying a feature to a single region, this architecture distributes the service across multiple continents.</p>
<p>Most people deploy a service to a single region. That’s fine for nearby users, but physical distance (and the speed of light) creates terrible latency for everyone else. We are going to eliminate this problem by leveraging Google’s global network to deploy the service in a "triangle" of locations.</p>
<p>The generative AI service you’ll be building is a "Local Guide." This application will be designed to be deeply <strong>hyper-personalized</strong>, changing its personality and providing recommendations based on the user's detected geographical context. For example, if a user is in Paris, the guide will greet them warmly, mentioning their city and suggesting a local activity.</p>
<p>You’re going to build this service to achieve three critical goals:</p>
<ul>
<li><p><strong>Lives Almost Everywhere:</strong> Deployed to three continents simultaneously (USA, Europe, and Asia).</p>
</li>
<li><p><strong>Feels Instant:</strong> Uses Google's global fiber network and Anycast IP to route users to the nearest server, ensuring the lowest possible latency.</p>
</li>
<li><p><strong>Knows Where You Are:</strong> Automatically detects the user's location (without relying on client-side GPS permissions) to provide deeply personalized, location-aware suggestions.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-1-the-location-aware-code">Phase 1: The "Location-Aware" Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-2-build-amp-push">Phase 2: Build &amp; Push</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-3-the-triangle-deployment">Phase 3: The "Triangle" Deployment</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-4-the-global-network-the-glue">Phase 4: The Global Network (The Glue)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-phase-5-testing-teleportation-time">Phase 5: Testing (Teleportation Time)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion-the-global-ai-edge">Conclusion: The Global AI Edge</a></p>
</li>
</ul>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>To follow along, you need:</p>
<ol>
<li><p><strong>A Google Cloud Project</strong> (with billing enabled).</p>
</li>
<li><p><strong>Google Cloud Shell</strong> (Recommended! No local setup required). Click the icon in the top right of the GCP Console that looks like a terminal prompt <code>&gt;_</code>.</p>
</li>
</ol>
<p><strong>Note</strong>: The project utilizes various Google Cloud services (Cloud Run, Artifact Registry, Load Balancer, Vertex AI), all of which require a Google Cloud Project with billing enabled to function. While many of these services offer a free tier, you must link a billing account to your project. Although a billing account is required, new Google Cloud users may be eligible for a <a target="_blank" href="https://console.cloud.google.com/freetrial?hl=en&amp;facet_utm_source=google&amp;facet_utm_campaign=%28organic%29&amp;facet_utm_medium=organic&amp;facet_url=https%3A%2F%2Fcloud.google.com%2Fsignup-faqs"><strong>free trial credit</strong></a> that should cover the cost of this lab. <a target="_blank" href="https://cloud.google.com/free/docs/free-cloud-features#free-trial">See credit program eligibility and coverage</a></p>
<h2 id="heading-phase-1-the-location-aware-code"><strong>Phase 1: The "Location-Aware" Code</strong></h2>
<p>We don’t want to build a generic chatbot, so we’ll be building a "Local Guide" that changes its personality based on where the request comes from.</p>
<h3 id="heading-enable-the-apis"><strong>Enable the APIs</strong></h3>
<p>To wake up the services, run this in your terminal:</p>
<pre><code class="lang-bash">gcloud services <span class="hljs-built_in">enable</span> \
  run.googleapis.com \
  artifactregistry.googleapis.com \
  compute.googleapis.com \
  aiplatform.googleapis.com \
  cloudbuild.googleapis.com
</code></pre>
<p>This command enables the necessary Google Cloud APIs for the project:</p>
<ul>
<li><p>Cloud Run (<a target="_blank" href="http://run.googleapis.com">run.googleapis.com</a>)</p>
</li>
<li><p>Artifact Registry (<a target="_blank" href="http://artifactregistry.googleapis.com">artifactregistry.googleapis.com</a>)</p>
</li>
<li><p>Compute Engine (<a target="_blank" href="http://compute.googleapis.com">compute.googleapis.com</a>)</p>
</li>
<li><p>Vertex AI (<a target="_blank" href="http://aiplatform.googleapis.com">aiplatform.googleapis.com</a>)</p>
</li>
<li><p>Cloud Build (<a target="_blank" href="http://cloudbuild.googleapis.com">cloudbuild.googleapis.com</a>).</p>
</li>
</ul>
<p>Enabling them ensures that the services we need are ready to be used.</p>
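<p>Optionally, you can confirm the APIs are active before moving on. This quick check simply greps the enabled-services list and isn't required for the tutorial:</p>
<pre><code class="lang-bash"># Optional sanity check: confirm the five APIs are now enabled
gcloud services list --enabled | grep -E "run|artifactregistry|compute|aiplatform|cloudbuild"
</code></pre>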
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764156603095/fb2ffd56-12e4-4b9f-ac2d-8fbb30fc0a2d.png" alt="Screenshot showing the Google Cloud APIs being successfully completed" class="image--center mx-auto" width="2132" height="280" loading="lazy"></p>
<h3 id="heading-create-and-populate-mainpyhttpmainpy">Create and Populate <a target="_blank" href="http://main.py"><code>main.py</code></a></h3>
<p>This is the brain of our service. In your Cloud Shell terminal, create a file named <a target="_blank" href="http://main.py"><code>main.py</code></a> and paste the following code into it:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask, request, jsonify
<span class="hljs-keyword">import</span> vertexai
<span class="hljs-keyword">from</span> vertexai.generative_models <span class="hljs-keyword">import</span> GenerativeModel

app = Flask(__name__)

<span class="hljs-comment"># Initialize Vertex AI</span>
PROJECT_ID = os.environ.get(<span class="hljs-string">"GOOGLE_CLOUD_PROJECT"</span>)
vertexai.init(project=PROJECT_ID)

<span class="hljs-meta">@app.route("/", methods=["GET", "POST"])</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate</span>():</span>
    <span class="hljs-comment"># 1. Identify where the code is physically running (We set this ENV var later)</span>
    service_region = os.environ.get(<span class="hljs-string">"SERVICE_REGION"</span>, <span class="hljs-string">"unknown-region"</span>)

    <span class="hljs-comment"># 2. Identify where the user is (Header comes from Global Load Balancer)</span>
    <span class="hljs-comment"># Format typically: "City,State,Country"</span>
    user_location = request.headers.get(<span class="hljs-string">"X-Client-Geo-Location"</span>, <span class="hljs-string">"Unknown Location"</span>)

    model = GenerativeModel(<span class="hljs-string">"gemini-2.5-flash"</span>)

    <span class="hljs-comment"># 3. Construct a location-aware prompt</span>
    prompt = (
        <span class="hljs-string">f"You are a helpful local guide. The user is currently in <span class="hljs-subst">{user_location}</span>. "</span>
        <span class="hljs-string">"Greet them warmly mentioning their city, and suggest one "</span>
        <span class="hljs-string">"hidden gem activity to do nearby right now. Keep it under 50 words."</span>
    )

    <span class="hljs-keyword">try</span>:
        response = model.generate_content(prompt)
        <span class="hljs-keyword">return</span> jsonify({
            <span class="hljs-string">"ai_response"</span>: response.text,
            <span class="hljs-string">"meta"</span>: {
                <span class="hljs-string">"served_from_region"</span>: service_region,
                <span class="hljs-string">"user_detected_location"</span>: user_location
            }
        })
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> jsonify({<span class="hljs-string">"error"</span>: str(e)}), <span class="hljs-number">500</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    app.run(debug=<span class="hljs-literal">True</span>, host=<span class="hljs-string">"0.0.0.0"</span>, port=int(os.environ.get(<span class="hljs-string">"PORT"</span>, <span class="hljs-number">8080</span>)))
</code></pre>
<p>It’s a simple Flask web application that relies entirely on a specific HTTP header (<code>X-Client-Geo-Location</code>) that the global load balancer will inject later in the process. This design choice keeps the Python code clean, fast, and focused on using the context that the powerful Google Cloud infrastructure provides. The script uses Vertex AI and the high-performance Gemini 2.5 Flash generative model.</p>
<p>The core logic of the application is a simple Flask web service. It does the following:</p>
<ul>
<li><p><strong>Initialization:</strong> Sets up the Flask app and initializes the Vertex AI client using the project ID.</p>
</li>
<li><p><strong>Context:</strong> It extracts two critical pieces of information: the <code>SERVICE_REGION</code> (where the code is physically running) from the environment variable, and the <code>X-Client-Geo-Location</code> (the user's detected location) from the request header, which will be injected by the global load balancer.</p>
</li>
<li><p><strong>AI Generation:</strong> It uses the high-performance <code>gemini-2.5-flash</code> model.</p>
</li>
<li><p><strong>Prompt Construction:</strong> A dynamic, location-aware prompt is built using the detected city to instruct Gemini to act as a helpful local guide and provide a personalized suggestion.</p>
</li>
<li><p><strong>Response:</strong> The response includes the AI's generated text and a <code>meta</code> section containing both the serving region and the user's detected location, which helps in verification.</p>
</li>
</ul>
<h3 id="heading-create-the-dockerfile"><strong>Create the</strong> <code>Dockerfile</code></h3>
<p>This Dockerfile tells Cloud Run how to build the Python application into a container image. Create a file named <code>Dockerfile</code> in the same directory as <code>main.py</code> and paste the following content into it:</p>
<pre><code class="lang-dockerfile"><span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.9</span>-slim

<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>
<span class="hljs-keyword">COPY</span><span class="bash"> main.py .</span>

<span class="hljs-comment"># Install Flask and Vertex AI SDK</span>
<span class="hljs-keyword">RUN</span><span class="bash"> pip install flask google-cloud-aiplatform</span>

<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"python"</span>, <span class="hljs-string">"main.py"</span>]</span>
</code></pre>
<p>Here’s what the code does:</p>
<ul>
<li><p>Starts with a lightweight Python base image <code>python:3.9-slim</code>.</p>
</li>
<li><p>Sets the working directory inside the container <code>WORKDIR /app</code>.</p>
</li>
<li><p>Copies your application code into the container.</p>
</li>
<li><p><code>RUN pip install...</code> installs the required Python packages: Flask for the web server and <code>google-cloud-aiplatform</code> for accessing the Gemini model.</p>
</li>
<li><p><code>CMD</code> specifies the command to run when the container starts.</p>
</li>
</ul>
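<p>If you'd like to sanity-check the image locally before using Cloud Build (optional, and assuming Docker is available, as it is in Cloud Shell), you can build and start the container yourself. The AI call will fail without Google Cloud credentials inside the container, but a successful startup confirms the image builds and listens on port 8080:</p>
<pre><code class="lang-bash"># Optional local smoke test (run from the directory containing main.py and the Dockerfile)
docker build -t region-ai-local .
docker run --rm -p 8080:8080 -e PORT=8080 -e SERVICE_REGION=local-test region-ai-local
</code></pre>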
<h2 id="heading-phase-2-build-amp-push"><strong>Phase 2: Build &amp; Push</strong></h2>
<p>Let's package this up. For efficiency and consistency, we’ll follow the best practice of Build Once, Deploy Many. We’ll build the container image once using Cloud Build and store it in Google's Artifact Registry. This guarantees that the same tested application code runs in New York, Belgium, and Tokyo.</p>
<p>First, set an environment variable for your Google Cloud Project ID to simplify later commands:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 1. Set your Project ID variable</span>
<span class="hljs-built_in">export</span> PROJECT_ID=$(gcloud config get-value project)
</code></pre>
<p>Then create a new Docker repository named <code>gemini-global-repo</code> in the <code>us-central1</code> region to store the application container image:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 2. Create the repository</span>
gcloud artifacts repositories create gemini-global-repo \
    --repository-format=docker \
    --location=us-central1 \
    --description=<span class="hljs-string">"Repo for Global Gemini App"</span>
</code></pre>
<p>Using the <code>mkdir gemini-app</code> command, create and navigate into a dedicated directory, and make sure your <code>main.py</code> and <code>Dockerfile</code> are placed inside it:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 3. Prepare the Build Environment (Crucial Step! 💡). To ensure the build process only includes our necessary code and avoids including temporary files from Cloud Shell's home directory </span>
mkdir gemini-app
<span class="hljs-built_in">cd</span> gemini-app
</code></pre>
<p>Next, use <code>gcloud builds submit --tag</code> to build the container image from the files in the current directory and push the resulting image to the newly created Artifact Registry repository:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 4. Build the image (This takes about 2 minutes)</span>
gcloud builds submit --tag us-central1-docker.pkg.dev/<span class="hljs-variable">$PROJECT_ID</span>/gemini-global-repo/region-ai:v1
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764159484475/97a5b2b6-f3c2-4d1b-8bf8-6f302748e744.png" alt="Screenshot of Cloud Shell Editor showing Dockerfile and terminal build output." class="image--center mx-auto" width="2880" height="1348" loading="lazy"></p>
<p><strong>NOTE:</strong> You might notice that we created the Artifact Registry repository (<code>gemini-global-repo</code>) in the <code>us-central1</code> region. This choice is purely for management and storage of the container image. When you create an image and push it to a regional Artifact Registry, the resulting image is still accessible globally. For this lab, <code>us-central1</code> serves as a reliable, central location for our single, canonical container image, the single source of truth, which is then pulled by Cloud Run in the three separate global regions.</p>
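<p>If you'd like to confirm the push succeeded, you can list the images in the new repository (optional):</p>
<pre><code class="lang-bash"># Optional: confirm the image landed in Artifact Registry
gcloud artifacts docker images list us-central1-docker.pkg.dev/$PROJECT_ID/gemini-global-repo
</code></pre>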
<h2 id="heading-phase-3-the-triangle-deployment"><strong>Phase 3: The "Triangle" Deployment</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764661657796/0890a47b-589a-4cf8-b537-bb61e5e65ee7.png" alt="Diagram of the Global AI Architecture Triangle Deployment." class="image--center mx-auto" width="1024" height="1024" loading="lazy"></p>
<p>We’ll deploy the same image to three corners of the world, forming our "Triangle". This ensures that whether a user is in Lagos, London, or Tokyo, they’ll be geographically close to a server. This is the low-latency core of our architecture.</p>
<p>We’ll use Cloud Run to deploy our services. Cloud Run is a fully managed serverless platform on Google Cloud that enables you to run stateless containers via web requests or events. Crucially, it is serverless, meaning you don't manage any virtual machines, operating system updates, or scaling infrastructure. You provide a container image, and Cloud Run automatically scales it up (and down to zero) in the region you specify.</p>
<p>For this project, we’ll use its regional deployment capability to easily and consistently deploy the exact same container image to New York, Belgium, and Tokyo.</p>
<p><strong>Note:</strong> Setting it up primarily involves enabling the API (done in Phase 1) and using the <code>gcloud run deploy</code> command, which handles provisioning and managing the service in the specified region.</p>
<p>Now, we’ll proceed to deploy the single, canonical container image to three separate Cloud Run regions, forming the "Triangle Deployment".</p>
<p>First, set a variable for the image path, pointing to the image stored in Artifact Registry.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Define our image URL</span>
<span class="hljs-built_in">export</span> IMAGE_URL=us-central1-docker.pkg.dev/<span class="hljs-variable">$PROJECT_ID</span>/gemini-global-repo/region-ai:v1
</code></pre>
<pre><code class="lang-bash">
<span class="hljs-comment"># 1. Deploy to USA (New York)</span>
gcloud run deploy gemini-service \
    --image <span class="hljs-variable">$IMAGE_URL</span> \
    --region us-east4 \
    --set-env-vars SERVICE_REGION=us-east4 \
    --allow-unauthenticated

<span class="hljs-comment"># 2. Deploy to Europe (Belgium)</span>
gcloud run deploy gemini-service \
    --image <span class="hljs-variable">$IMAGE_URL</span> \
    --region europe-west1 \
    --set-env-vars SERVICE_REGION=europe-west1 \
    --allow-unauthenticated

<span class="hljs-comment"># 3. Deploy to Asia (Tokyo)</span>
gcloud run deploy gemini-service \
    --image <span class="hljs-variable">$IMAGE_URL</span> \
    --region asia-northeast1 \
    --set-env-vars SERVICE_REGION=asia-northeast1 \
    --allow-unauthenticated
</code></pre>
<p><code>gcloud run deploy gemini-service...</code> deploys the service. Key flags:</p>
<ul>
<li><p><code>--image $IMAGE_URL</code> specifies the container image to use.</p>
</li>
<li><p><code>--region</code> specifies the deployment region (for example, <code>us-east4</code> for New York).</p>
</li>
<li><p><code>--set-env-vars SERVICE_REGION=...</code> injects an environment variable into the running container to let the <code>main.py</code> code know its own physical region.</p>
</li>
<li><p><code>--allow-unauthenticated</code> makes the service publicly accessible, as required for the Load Balancer to connect.</p>
</li>
</ul>
<p><strong>Note:</strong> The commands are repeated for Europe (<code>europe-west1</code>) and Asia (<code>asia-northeast1</code>) regions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160600271/fbb6a810-7496-4b29-a405-b67a22a988ed.png" alt="Screenshot of Cloud Shell terminal showing the execution of the cloud run services." class="image--right mx-auto mr-0" width="2880" height="1348" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160624375/dd4dc7e7-22a9-4d8b-a36c-7a0988068f57.png" alt="Cloud run Service Url (asia region) terminal screenshot showing the successful execution of the service" class="image--center mx-auto" width="2880" height="536" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160656898/1b6ca938-9ce4-48f6-bb3b-d09900dbde68.png" alt="Cloud run Service Url (europe region) terminal screenshot showing the successful execution of the service" class="image--center mx-auto" width="2880" height="536" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764160665595/39c2524d-62c8-4187-8b8f-15f7ebbffba4.png" alt="Cloud run Service Url (us-east region) terminal screenshot showing the successful execution of the service" class="image--center mx-auto" width="2880" height="536" loading="lazy"></p>
<p>At this point, <code>user_detected_location</code> is always "Unknown Location". This is expected: you are accessing the Cloud Run URLs directly, not via the global load balancer, so the <code>X-Client-Geo-Location</code> header is not yet being injected.</p>
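<p>If you want to see this for yourself, you can call one of the regional services directly. This optional check uses <code>gcloud run services describe</code> to grab the service URL for one region:</p>
<pre><code class="lang-bash"># Optional: call the us-east4 service directly (no load balancer, so no geo header)
REGIONAL_URL=$(gcloud run services describe gemini-service --region us-east4 --format="value(status.url)")
curl -s "$REGIONAL_URL" | jq .
</code></pre>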
<h2 id="heading-phase-4-the-global-network-the-glue"><strong>Phase 4: The Global Network (The Glue)</strong></h2>
<p>You are now ready to execute the steps to create the <strong>Global External HTTP Load Balancer</strong> infrastructure. This is the "magic" that stitches the three regional services together behind a single <strong>Anycast IP Address</strong>. The load balancer performs two critical functions:</p>
<ol>
<li><p><strong>Global Routing:</strong> It uses Google’s high-speed network to automatically route the user to the closest available region (for example, Tokyo user → Asia service).</p>
</li>
<li><p><strong>Context Injection:</strong> It dynamically adds the <code>X-Client-Geo-Location</code> header to the request, telling your code exactly where the user is.</p>
</li>
</ol>
<h3 id="heading-the-global-ip"><strong>The Global IP</strong></h3>
<p><code>gcloud compute addresses create...</code> creates a single, global, static Anycast IP address (<code>gemini-global-ip</code>) that will serve as the single public entry point for users worldwide:</p>
<pre><code class="lang-bash">gcloud compute addresses create gemini-global-ip \
    --global \
    --ip-version IPV4
</code></pre>
<h3 id="heading-the-network-endpoint-groups-negs"><strong>The Network Endpoint Groups (NEGs)</strong></h3>
<p><code>gcloud compute network-endpoint-groups create...</code> creates a <strong>Serverless Network Endpoint Group (NEG)</strong> for each regional Cloud Run deployment. For example, <code>neg-us</code> is created in <code>us-east4</code> and points to the <code>gemini-service</code> in that region. These map your Cloud Run services to the Load Balancer's backend service:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># USA NEG</span>
gcloud compute network-endpoint-groups create neg-us \
    --region=us-east4 \
    --network-endpoint-type=serverless  \
    --cloud-run-service=gemini-service

<span class="hljs-comment"># Europe NEG</span>
gcloud compute network-endpoint-groups create neg-eu \
    --region=europe-west1 \
    --network-endpoint-type=serverless \
    --cloud-run-service=gemini-service

<span class="hljs-comment"># Asia NEG</span>
gcloud compute network-endpoint-groups create neg-asia \
    --region=asia-northeast1 \
    --network-endpoint-type=serverless \
    --cloud-run-service=gemini-service
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764161003478/549c959d-8ab5-45d6-a2ae-94129529b5b4.png" alt="Screenshot of Cloud Shell terminal showing the execution of global load balancer setup commands." class="image--center mx-auto" width="2880" height="1010" loading="lazy"></p>
<h3 id="heading-the-backend-service-amp-routing"><strong>The Backend Service &amp; Routing</strong></h3>
<p>This is the load balancer's core, distributing traffic across your regions. Connect the NEGs to a global backend.</p>
<p><code>gcloud compute backend-services create...</code> creates the global backend service (<code>gemini-backend-global</code>), which is the core component that manages traffic distribution:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create the backend service</span>
gcloud compute backend-services create gemini-backend-global \
    --global \
    --protocol=HTTP
</code></pre>
<p><code>gcloud compute backend-services add-backend...</code> adds all three regional NEGs (<code>neg-us</code>, <code>neg-eu</code>, <code>neg-asia</code>) as backends to the global service. This tells the load balancer where all the services are located:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Add the 3 regions to the backend</span>
gcloud compute backend-services add-backend gemini-backend-global \
    --global --network-endpoint-group=neg-us --network-endpoint-group-region=us-east4
gcloud compute backend-services add-backend gemini-backend-global \
    --global --network-endpoint-group=neg-eu --network-endpoint-group-region=europe-west1
gcloud compute backend-services add-backend gemini-backend-global \
    --global --network-endpoint-group=neg-asia --network-endpoint-group-region=asia-northeast1
</code></pre>
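<p>One hedged aside: depending on your setup, the geolocation header may need to be configured explicitly as a custom request header on the backend service. If you find later that <code>X-Client-Geo-Location</code> never shows up in real (non-simulated) requests, a sketch of how you might add it is below. The <code>{client_city}</code> and <code>{client_region}</code> placeholders are Cloud Load Balancing custom-header variables; double-check the exact names against the current docs:</p>
<pre><code class="lang-bash"># Optional: explicitly inject the geolocation header on the backend service
gcloud compute backend-services update gemini-backend-global \
    --global \
    --custom-request-header="X-Client-Geo-Location: {client_city},{client_region}"
</code></pre>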
<h3 id="heading-the-url-map-amp-frontend"><strong>The URL Map &amp; Frontend</strong></h3>
<p>Now we can finalize the connection.</p>
<p><code>gcloud compute url-maps create...</code> creates a URL Map (<code>gemini-url-map</code>) to direct all incoming traffic to the Backend Service:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create URL Map (Maps incoming requests to the backend service)</span>
gcloud compute url-maps create gemini-url-map \
    --default-service gemini-backend-global
</code></pre>
<p><code>gcloud compute target-http-proxies create...</code> creates an HTTP Proxy (<code>gemini-http-proxy</code>) that inspects the request and directs it based on the URL map.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create HTTP Proxy (The component that inspects the request headers)</span>
gcloud compute target-http-proxies create gemini-http-proxy \
    --url-map gemini-url-map
</code></pre>
<p><code>export VIP=...</code> retrieves the final, public IP address of the newly created Global IP and stores it in the <code>VIP</code> environment variable.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Get your IP Address variable</span>
<span class="hljs-built_in">export</span> VIP=$(gcloud compute addresses describe gemini-global-ip --global --format=<span class="hljs-string">"value(address)"</span>)
</code></pre>
<p><code>gcloud compute forwarding-rules create...</code> creates the final global Forwarding Rule (<code>gemini-forwarding-rule</code>). This links the Global IP (<code>$VIP</code>) to the HTTP Proxy and opens port 80 for public traffic.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create Forwarding Rule (Open port 80)</span>
gcloud compute forwarding-rules create gemini-forwarding-rule \
    --address=<span class="hljs-variable">$VIP</span> \
    --global \
    --target-http-proxy=gemini-http-proxy \
    --ports=80
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764161323862/299c6c43-9074-493c-95b1-2c08208aa2ec.png" alt="Cloud Shell terminal screenshot showing the successful execution of commands to create the gemini-backend-global service" class="image--center mx-auto" width="2880" height="1010" loading="lazy"></p>
<h2 id="heading-phase-5-testing-teleportation-time"><strong>Phase 5: Testing (Teleportation Time)</strong></h2>
<p>Global load balancers take about <strong>5-7 minutes</strong> to propagate worldwide. Once propagation completes, you'll verify that the global load balancer is working correctly by confirming that it is:</p>
<ul>
<li><p>Using the single <strong>VIP</strong> (Virtual IP) address.</p>
</li>
<li><p><strong>Routing traffic</strong> to the nearest server.</p>
</li>
<li><p><strong>Injecting the</strong> <code>X-Client-Geo-Location</code> header to tell your code where the user is.</p>
</li>
</ul>
<h3 id="heading-1-get-your-global-ip"><strong>1. Get your Global IP</strong></h3>
<p>First, ensure your <code>VIP</code> variable is set and retrieve the final address:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"http://<span class="hljs-variable">$VIP</span>/"</span>
</code></pre>
<p>The output will be your single point of entry for the entire global architecture.</p>
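<p>Since propagation takes a few minutes, you may want to poll the address until it starts returning responses instead of errors. A small optional loop for that:</p>
<pre><code class="lang-bash"># Optional: poll until the load balancer starts answering (press Ctrl+C to stop)
while true; do
  curl -s -o /dev/null -w "%{http_code}\n" "http://$VIP/"
  sleep 15
done
</code></pre>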
<h3 id="heading-2-test-teleportation"><strong>2. Test "Teleportation"</strong></h3>
<p>These <code>curl</code> commands simulate a user requesting the service from different geographical locations by manually injecting the <code>X-Client-Geo-Location</code> header. This bypasses the need to be physically in those locations for testing.</p>
<h4 id="heading-simulate-europe-paris">Simulate Europe (Paris)</h4>
<p>We expect this to be served by the <code>europe-west1</code> region because it's the closest server.</p>
<pre><code class="lang-bash">curl -H <span class="hljs-string">"X-Client-Geo-Location: Paris,France"</span> http://<span class="hljs-variable">$VIP</span>/
</code></pre>
<p><em>Expected Output:</em> Gemini should say "Bonjour" and mention Paris. The <code>served_from_region</code> should be <code>europe-west1</code>.</p>
<h4 id="heading-simulate-asia-tokyo">Simulate Asia (Tokyo)</h4>
<p>We expect this to be served by the <code>asia-northeast1</code> region.</p>
<pre><code class="lang-bash">curl -H <span class="hljs-string">"X-Client-Geo-Location: Tokyo,Japan"</span> http://<span class="hljs-variable">$VIP</span>/
</code></pre>
<p><em>Expected Output:</em> Gemini should mention Tokyo. The <code>served_from_region</code> should be <code>asia-northeast1</code>.</p>
<h4 id="heading-simulate-usa-new-york">Simulate USA (New York)</h4>
<p>We expect this to be served by the <code>us-east4</code> region.</p>
<pre><code class="lang-bash">curl -s -H <span class="hljs-string">"X-Client-Geo-Location: New York,USA"</span> http://<span class="hljs-variable">$VIP</span>/ | jq .
</code></pre>
<p><em>Expected Output:</em> Gemini should mention New York. The <code>served_from_region</code> should be <code>us-east4</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764161891891/ecc290ef-1c75-4088-b453-093a92b404ff.png" alt="Cloud Shell terminal screenshot showing the results of curl commands simulating users in Paris, Tokyo, and New York." class="image--center mx-auto" width="2880" height="1010" loading="lazy"></p>
<p><strong>Note:</strong> The <code>| jq .</code> part is optional, but highly recommended as it formats the JSON output, making it much easier to read the <code>served_from_region</code> and <code>ai_response</code> details. If <code>jq</code> isn't available, you can just run <code>curl ...</code> without it.</p>
<h2 id="heading-conclusion-the-global-ai-edge">Conclusion: The Global AI Edge</h2>
<p>Congratulations! You have successfully built a sophisticated, global AI architecture that solves the challenges of latency and personalization for generative AI features. By combining the following technologies, you achieved two critical outcomes:</p>
<ul>
<li><p><strong>Guaranteed Low Latency:</strong> By deploying the <strong>Cloud Run</strong> service to a "Triangle" of global regions (USA, Europe, Asia) and using the <strong>Global External HTTP Load Balancer's Anycast IP</strong>, your users are automatically routed across Google’s private fiber network to the closest available server.</p>
</li>
<li><p><strong>Hyper-Personalization:</strong> The global load balancer was configured to dynamically inject the user's geographical location via the <code>X-Client-Geo-Location</code> header. This context was passed directly to the <strong>Gemini 2.5 Flash</strong> model, allowing it to act as a truly location-aware "Local Guide".</p>
</li>
</ul>
<p>This pattern allows you to scale intelligent features globally and is immediately applicable to any application where speed and context are essential, from real-time translations to hyper-local recommendations.</p>
<h3 id="heading-cleanup"><strong>Cleanup</strong></h3>
<p>Don't leave the meter running! Remember to execute the cleanup commands to ensure you don't incur unnecessary charges:</p>
<pre><code class="lang-bash">gcloud run services delete gemini-service --region us-east4 --quiet
gcloud run services delete gemini-service --region europe-west1 --quiet
gcloud run services delete gemini-service --region asia-northeast1 --quiet
gcloud compute forwarding-rules delete gemini-forwarding-rule --global --quiet
gcloud compute addresses delete gemini-global-ip --global --quiet
gcloud compute backend-services delete gemini-backend-global --global --quiet
gcloud compute url-maps delete gemini-url-map --global --quiet
gcloud compute target-http-proxies delete gemini-http-proxy --global --quiet
</code></pre>
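<p>The commands above remove the services and the core load balancer pieces, deleting each resource before the ones it depends on. If you also want to clear out the serverless NEGs and the Artifact Registry repository created earlier (they cost little or nothing, but it keeps the project tidy), these should do it:</p>
<pre><code class="lang-bash"># Optional extra cleanup: remove the NEGs and the image repository
gcloud compute network-endpoint-groups delete neg-us --region=us-east4 --quiet
gcloud compute network-endpoint-groups delete neg-eu --region=europe-west1 --quiet
gcloud compute network-endpoint-groups delete neg-asia --region=asia-northeast1 --quiet
gcloud artifacts repositories delete gemini-global-repo --location=us-central1 --quiet
</code></pre>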
<h3 id="heading-resources">Resources</h3>
<ul>
<li><p>Google Cloud Shell Documentation</p>
</li>
<li><p><a target="_blank" href="https://www.google.com/search?q=https://cloud.google.com/vertex-ai/docs/generative-ai/learn/sdk">Vertex AI Generative AI SDK</a></p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/artifact-registry/docs">Artifact Registry Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/run/docs">Cloud Run Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://www.google.com/search?q=https://cloud.google.com/load-balancing/docs/load-balancing-overview%23external_http_s_load_balancing">Global External HTTP(S) Load Balancer Overview</a></p>
</li>
<li><p><a target="_blank" href="https://www.google.com/search?q=https://cloud.google.com/load-balancing/docs/negs/serverless-neg-overview">Serverless Network Endpoint Groups (NEGs)</a></p>
</li>
<li><p><a target="_blank" href="https://docs.cloud.google.com/run/docs/multiple-regions">Serve traffic from multiple regions</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Load Balancing with Azure Application Gateway and Azure Load Balancer – When to Use Each One ]]>
                </title>
                <description>
                    <![CDATA[ You’ve probably heard someone mention load balancing when talking about cloud apps. Maybe even names like Azure Load Balancer, Azure Application Gateway, or something about Virtual Machines and Scale Sets. 😵‍💫 It all sounds important...but also a l... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/load-balancing-with-azure-application-gateway-and-azure-load-balancer/</link>
                <guid isPermaLink="false">6824f10a7d203c180e5ea4b2</guid>
                
                    <category>
                        <![CDATA[ Load Balancing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Azure ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Azure Application Gateway ]]>
                    </category>
                
                    <category>
                        <![CDATA[ virtual machine ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #virtual machine scale set ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Load Balancer ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Wed, 14 May 2025 19:37:46 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747235455030/cb82bfb4-8d7b-47e5-ab31-126906f60b40.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>You’ve probably heard someone mention load balancing when talking about cloud apps. Maybe even names like Azure Load Balancer, Azure Application Gateway, or something about Virtual Machines and Scale Sets. 😵‍💫</p>
<p>It all sounds important...but also a little confusing. Like, why are there so many moving parts? And what do they actually do?</p>
<p>In this guide, we’re going to break it all down – step by step – using real examples and simple language.</p>
<p>You’ll learn:</p>
<ul>
<li><p>What load balancers are (and why apps even need them)</p>
</li>
<li><p>How apps were deployed before load balancers existed (hint: everything lived on one lonely server)</p>
</li>
<li><p>How Azure Virtual Machines work – and how they let you scale up your apps</p>
</li>
<li><p>What Virtual Machine Scale Sets are, and how they help handle sudden traffic spikes</p>
</li>
<li><p>The differences between Azure Load Balancer and Azure Application Gateway, and when to use each</p>
</li>
</ul>
<p>By the end, you won’t just understand what these tools do – you’ll know <em>when</em> and <em>why</em> to use them in real-world scenarios.</p>
<p>Whether you’re a curious beginner, a hands-on builder, or someone just trying to wrap their head around Azure’s ecosystem, this guide is for you.</p>
<p>Ready to untangle the cloud spaghetti? Let’s go! 🍝🚀</p>
<h2 id="heading-table-of-contents">📚 Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-are-load-balancers">🧊 What Are Load Balancers?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-applications-were-deployed-before-load-balancers">🖥️ How Applications Were Deployed Before Load Balancers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-virtual-machines-vms-the-building-blocks">⚙️ Azure Virtual Machines (VMs) – The Building Blocks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-need-for-scaling-vertical-vs-horizontal">📈 The Need for Scaling – Vertical vs Horizontal</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-virtual-machine-scale-sets-vmss-scaling-made-simple">🔁 Azure Virtual Machine Scale Sets (VMSS) – Scaling Made Simple</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-load-balancer-spreading-the-traffic">📦 Azure Load Balancer – Spreading the Traffic</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-application-gateway-smart-routing-for-modern-apps">🍴 Azure Application Gateway – Smart Routing for Modern Apps</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-azure-load-balancer-vs-azure-application-gateway">🔍 Azure Load Balancer vs Azure Application Gateway</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-use-cases-when-to-use-what">🧭</a> <a class="post-section-overview" href="#heading-use-cases-when-to-use-each-one">Use Cases: When to Use Each One</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">✅ Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-study-further">Study Further 📚</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-about-the-author">About the Author 👨‍💻</a></p>
</li>
</ol>
<h2 id="heading-what-are-load-balancers">🧊 What Are Load Balancers?</h2>
<p>Imagine you're running a small restaurant with just one chef in the kitchen. Everything goes smoothly when you have a few customers – each order is prepared one after the other, and everyone leaves satisfied.</p>
<p>But what happens when 50 people walk in all at once?</p>
<p>🍽️ One chef can’t handle that many orders at the same time.<br>⏳ People start waiting longer.<br>😤 Some customers leave.<br>💥 The chef gets overwhelmed – and eventually burns out.</p>
<p>This is what can happen to a server (the computer running your app) when too many users try to access it at the same time.</p>
<h3 id="heading-so-what-does-a-load-balancer-do">So, What Does a Load Balancer Do?</h3>
<p>A <strong>load balancer</strong> is like a smart restaurant manager. But instead of food orders, it handles user requests – the things people do when they open your app, click buttons, or load data.</p>
<p>Let’s say you now have three chefs (servers) instead of one. The load balancer’s job is to:</p>
<ul>
<li><p>👀 Watch for incoming orders (user requests)</p>
</li>
<li><p>🧠 Decide which chef (server) is available or least busy</p>
</li>
<li><p>🍽️ Send that request to the right one</p>
</li>
<li><p>🔁 Repeat this over and over, making sure things stay fast and smooth</p>
</li>
</ul>
<p>So in simple terms, a load balancer takes all the incoming traffic to your app and distributes it across multiple servers so no single server gets overloaded – cool, right? 🙂</p>
<h3 id="heading-why-were-load-balancers-introduced">Why Were Load Balancers Introduced?</h3>
<p>Back in the early days, many applications were hosted on just one machine – called a Single Server Deployment.</p>
<p>That was okay when you had a small number of users. But once things started to grow – more users, more actions, more data – single servers became a bottleneck:</p>
<ul>
<li><p>They could only handle a limited number of requests.</p>
</li>
<li><p>If they went down, your entire app would stop working.</p>
</li>
<li><p>Scaling (adding more power) was expensive and manual.</p>
</li>
</ul>
<p>💡 Enter <strong>load balancers</strong> – designed to solve this by making it possible to:</p>
<ul>
<li><p>Spread traffic across multiple servers (so no one server crashes under pressure),</p>
</li>
<li><p>Replace or restart servers without downtime,</p>
</li>
<li><p>Add or remove servers as needed, depending on how busy your app is (this is called <strong>scaling</strong>).</p>
</li>
</ul>
<h3 id="heading-a-simple-use-case-scenario">A Simple Use-Case Scenario</h3>
<p>Let’s say you're building an online store — your own mini Amazon. At first, you host your app on one Azure Virtual Machine. Things are great. But one day, you run a huge promo and suddenly…thousands of people flood in to browse, shop, and check out.</p>
<p>Your single VM starts lagging.</p>
<p>Orders fail. People complain. Your dream app? Crashing fast. 💥</p>
<p>So what do you do?</p>
<p>You spin up two more VMs to help out – but now you’ve got another problem: <em>How do you divide the traffic between the three?</em></p>
<p>This is where the load balancer steps in. It:</p>
<ul>
<li><p>Looks at every incoming user request</p>
</li>
<li><p>Figures out which VM is available and least busy</p>
</li>
<li><p>Sends the request there</p>
</li>
<li><p>Keeps rotating requests in real-time</p>
</li>
</ul>
<p>And the result?<br>✅ No single VM gets overwhelmed<br>✅ Your app stays fast and responsive<br>✅ Users are happy (and buying stuff again!)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746980088916/41be330b-8d5b-4709-b07d-3f1a19d641e7.png" alt="Load balancer illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-applications-were-deployed-before-load-balancers">🖥️ How Applications Were Deployed Before Load Balancers</h2>
<p>Before cloud tools like load balancers came along, the typical way to run an application was pretty simple: You’d deploy the entire app on a single server, like running a small business from one tiny shop.</p>
<h3 id="heading-first-things-first-whats-a-server">First Things First: What’s a Server?</h3>
<p>Think of a server as a special computer that’s always connected to the internet. Its job is to “serve” your app to people when they visit your website, open your app, or use your service.</p>
<p>In cloud platforms like Azure, we usually call these Virtual Machines (VMs) – basically, software-powered servers you can spin up with a few clicks.</p>
<h3 id="heading-monoliths-vs-microservices">Monoliths vs Microservices</h3>
<p>Now, applications come in different “shapes.” The two most common are:</p>
<ul>
<li><p><strong>Monoliths</strong>: Everything is bundled together into one big app. All the code – from user login to shopping cart to checkout – lives in a single unit.</p>
</li>
<li><p><strong>Microservices</strong>: The app is broken into smaller, independent apps (services). Each service does one job – like login, payments, orders – and runs separately.</p>
</li>
</ul>
<h4 id="heading-how-were-these-apps-deployed">How Were These Apps Deployed?</h4>
<p>Whether it was a monolith or a bunch of microservices, they were all usually deployed on a single server (VM).</p>
<p>For monoliths, you just ran the entire app directly on the server. For microservices, you'd run each service in a separate space on that same server, using <strong>containers</strong>.</p>
<h4 id="heading-wait-whats-a-container">Wait — What’s a Container?</h4>
<p>A container is like a mini-computer <em>inside</em> a computer. It has everything an app needs to run – code, tools, settings – and it keeps each app isolated from the others.</p>
<p>Why use containers?</p>
<ul>
<li><p>You can run multiple services on the same server without their underlying software (software needed for each app to run) interfering with each other.</p>
</li>
<li><p>It’s faster and more efficient than installing everything directly on the server.</p>
</li>
<li><p>They make moving apps between environments (for example, test → production) super smooth (no more “But, it works on my machine…”).</p>
</li>
</ul>
<p>Popular tools like Docker make working with containers easy.</p>
<h4 id="heading-connecting-it-all-together-domains-subdomains-and-reverse-proxies">Connecting It All Together: Domains, Subdomains, and Reverse Proxies</h4>
<p>When your app lives on a server, you want people to be able to reach it. That’s where <strong>domain names</strong> come in.</p>
<ul>
<li><p>Your server has a public IP address – a set of numbers like <code>102.80.1.23</code> – that uniquely identifies it on the public internet</p>
</li>
<li><p>But instead of asking users to type numbers, you link that IP to a domain name, like <code>mycoolapp.com</code></p>
</li>
</ul>
<p>If your app has microservices, you might even assign <strong>subdomains</strong> like:</p>
<ul>
<li><p><code>api.mycoolapp.com</code> for the backend</p>
</li>
<li><p><code>dashboard.mycoolapp.com</code> for the user interface</p>
</li>
<li><p><code>payments.mycoolapp.com</code> for payments</p>
</li>
</ul>
<p>To manage all this, you’d use a <strong>reverse proxy</strong> (like Nginx or Apache). It listens on the main domain and subdomains, and forwards traffic to the right app or service.</p>
<p>Example:</p>
<ul>
<li><p>Someone visits <code>dashboard.mycoolapp.com</code></p>
</li>
<li><p>The reverse proxy checks the domain and forwards the request to the correct container running the dashboard service</p>
</li>
</ul>
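<p>To make the example above a bit more concrete from a terminal: all of these subdomains can point at the same server IP, and the reverse proxy chooses the right service purely from the <code>Host</code> header of the request. In this sketch, the IP is the made-up example address from earlier:</p>
<pre><code class="lang-bash"># Both requests hit the same IP; the reverse proxy routes each one by its Host header
curl -H "Host: api.mycoolapp.com" http://102.80.1.23/
curl -H "Host: dashboard.mycoolapp.com" http://102.80.1.23/
</code></pre>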
<p>And to help with all of this setup – from deploying containers to configuring reverse proxies – there are developer-friendly tools like <a target="_blank" href="https://coolify.io">Coolify</a>. Coolify is an open-source platform that makes it super easy for developers and DevOps teams to:</p>
<ul>
<li><p>Deploy apps in containers</p>
</li>
<li><p>Set up domains and subdomains</p>
</li>
<li><p>Configure reverse proxies – all from a clean dashboard, no complex terminal commands needed</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746979943646/a6525a09-f44a-4e00-a945-7bded3483b0d.jpeg" alt="Coolify dashboard example" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>All this was set up on ONE SERVER/VM. But here’s the catch: when that one server got overloaded or went down…💥 everything stopped.</p>
<p>That’s why we needed a better way. And that's where <strong>scaling</strong> and <strong>load balancing</strong> came in – to keep apps running smoothly, no matter the traffic.</p>
<h2 id="heading-azure-virtual-machines-vms-the-building-blocks">⚙️ Azure Virtual Machines (VMs) – The Building Blocks</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746980948928/eb6a7fb2-7432-42ed-8cbd-bff6c8250d4e.jpeg" alt="Virtual Machine illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>When it comes to running apps in the cloud, <strong>Virtual Machines (VMs)</strong> are the basic building blocks – kind of like renting an apartment in a giant digital skyscraper.</p>
<p>You don’t need to buy the whole building (aka physical servers), you just rent the space you need, when you need it.</p>
<h3 id="heading-what-exactly-is-a-virtual-machine">What Exactly Is a Virtual Machine?</h3>
<p>A Virtual Machine is a software-based computer that runs inside a real, physical computer (a server) – hosted in a data center, like those run by Microsoft Azure.</p>
<p>It looks and behaves like a normal computer:</p>
<ul>
<li><p>It has an operating system (Windows, Linux)</p>
</li>
<li><p>You can install apps</p>
</li>
<li><p>It has memory (RAM), storage (disks), and CPU</p>
</li>
</ul>
<p>But the best part? You don’t need to worry about the hardware. Azure takes care of that behind the scenes – all you do is say:</p>
<blockquote>
<p>“Hey Azure, give me a Linux VM with 4GB RAM and 2 CPUs.”</p>
</blockquote>
<p>And boom 💥 — it spins up in minutes.</p>
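<p>On the command line, asking Azure for that VM looks roughly like this. Treat it as an illustrative sketch: the resource group and VM names (<code>my-rg</code>, <code>my-blog-vm</code>) are placeholders, and the size and image are common choices you'd adjust to your needs:</p>
<pre><code class="lang-bash"># Create a resource group, then a small Ubuntu VM (2 vCPUs, 4GB RAM) inside it
az group create --name my-rg --location eastus

az vm create \
  --resource-group my-rg \
  --name my-blog-vm \
  --image Ubuntu2204 \
  --size Standard_B2s \
  --admin-username azureuser \
  --generate-ssh-keys
</code></pre>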
<h3 id="heading-why-use-a-vm">Why Use a VM?</h3>
<p>Let’s say you’ve built a web app – it’s just a simple blog. You want to deploy it and make it accessible to the world.</p>
<p>Here's what you can do with a VM:</p>
<ul>
<li><p>Set it up with your favorite OS (for example, Ubuntu)</p>
</li>
<li><p>Install web servers like Nginx or Apache</p>
</li>
<li><p>Deploy your app</p>
</li>
<li><p>Bind it to your domain name</p>
</li>
<li><p>Let the world visit your blog at <a target="_blank" href="http://myawesomeblog.com"><code>myawesomeblog.com</code></a></p>
</li>
</ul>
<p>It’s your own personal environment – no sharing, full control.</p>
<h2 id="heading-the-need-for-scaling-vertical-vs-horizontal">📈 The Need for Scaling – Vertical vs Horizontal</h2>
<p>Imagine your app is growing. At first, it’s just a few users. Then a few hundred. Then thousands are logging in, placing orders, chatting, uploading photos – all at once 😮</p>
<p>Suddenly, your server (VM) is under pressure. It’s like trying to pour a flood through a straw.</p>
<h3 id="heading-so-what-do-you-do-when-one-server-isnt-enough">So, What Do You Do When One Server Isn’t Enough?</h3>
<p>This is where scaling comes in – the art of upgrading your app’s infrastructure to keep up with traffic.</p>
<p>There are two main ways to scale:</p>
<h4 id="heading-option-1-vertical-scaling-aka-scaling-up">🧱 Option 1: Vertical Scaling (aka Scaling Up)</h4>
<p>You take your existing VM and give it more power:</p>
<ul>
<li><p>Add more CPUs 🧠</p>
</li>
<li><p>Increase RAM 🧵</p>
</li>
<li><p>Add faster disks ⚡</p>
</li>
</ul>
<p>Think of it like upgrading from a regular car to a sports car. It’s the same vehicle, just faster and stronger.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Simple to do</p>
</li>
<li><p>No major changes to your app setup</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>There’s a limit to how much you can upgrade</p>
</li>
<li><p>Still a single point of failure: if the VM crashes, everything goes down 😬</p>
</li>
</ul>
<h4 id="heading-option-2-horizontal-scaling-aka-scaling-out">🧩 Option 2: Horizontal Scaling (aka Scaling Out)</h4>
<p>Instead of boosting one server, you add more servers – multiple VMs running copies of your app.</p>
<p>Now:</p>
<ul>
<li><p>Users can be distributed across all these VMs</p>
</li>
<li><p>If one goes down, others keep serving traffic</p>
</li>
<li><p>You can <em>dynamically</em> add or remove VMs based on traffic</p>
</li>
</ul>
<p>It’s like opening more checkout counters in a busy supermarket 🛒</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>The load is evenly distributed. For example, if one server previously handled 100% of the traffic, adding two more servers would result in the traffic being split into approximately 33% to 34% for each server.</p>
</li>
<li><p>Improves both performance and reliability</p>
</li>
<li><p>You can scale based on real-time demand (that is, incoming traffic)</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>Needs something to split traffic between VMs – Load Balancers</p>
</li>
<li><p>More expensive. You pay the price of one VM (for example, $30) multiplied by the number of VMs you provision – three VMs at $30 each comes to $90 at the end of the month</p>
</li>
</ul>
<h3 id="heading-quick-real-world-example">Quick Real-World Example</h3>
<p>Let’s say you’ve launched an e-commerce site for sneakers 👟 Traffic spikes during a big sale? Your vertical scaling (bigger VM) might choke.</p>
<p>But with horizontal scaling:</p>
<ul>
<li><p>You spin up 5 VMs across different regions</p>
</li>
<li><p>Traffic is shared between them</p>
</li>
<li><p>If one VM slows down, others handle the load</p>
</li>
</ul>
<h4 id="heading-so-remember">So, remember 👇🏾</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Scaling Type</td><td>Description</td><td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>🧱 Vertical Scaling</td><td>Make 1 VM more powerful (adding more CPU power, SSD, RAM, bandwidth, and so on)</td><td>Easy setup, fewer changes</td><td>Hardware limits, 1 point of failure - If that 1 server/VM goes down, so does your app :(</td></tr>
<tr>
<td>🧩 Horizontal Scaling</td><td>Add more VMs to handle traffic</td><td>Flexible, reliable</td><td>Needs traffic distribution logic (Load Balancer). Usually more expensive (the price of 1 VM times the number of VMs)</td></tr>
</tbody>
</table>
</div><h2 id="heading-azure-virtual-machine-scale-sets-vmss-scaling-made-simple">🔁 Azure Virtual Machine Scale Sets (VMSS) – Scaling Made Simple</h2>
<p>Okay – so we’ve talked about <strong>horizontal scaling</strong>: adding multiple VMs to handle growing traffic. Sounds great, right?</p>
<p>But here’s the thing: manually spinning up and configuring 5, 10, or 100 VMs... every time your app gets busy? Yeah, that’s not fun 🙃</p>
<h3 id="heading-enter-virtual-machine-scale-sets-vmss">Enter: Virtual Machine Scale Sets (VMSS)</h3>
<p>VMSS is Azure’s way of automating horizontal scaling. Instead of creating each VM one by one, you define a template, and Azure takes care of the rest:</p>
<ul>
<li><p>How many VMs to start with</p>
</li>
<li><p>How to configure them (OS, apps, settings) ⚙️</p>
</li>
<li><p>When to add or remove VMs based on traffic 📈📉</p>
</li>
</ul>
<h3 id="heading-a-simple-analogy">A Simple Analogy 🧃</h3>
<p>Think of VMSS like a juice dispenser at a party:</p>
<ul>
<li><p>At first, it pours into 2 cups (VMs)</p>
</li>
<li><p>If 10 guests show up? It starts filling 5 cups</p>
</li>
<li><p>Party slows down? Back to 2 cups again</p>
</li>
</ul>
<p>You never have to refill manually – the dispenser adjusts on its own. 🎉</p>
<h3 id="heading-how-it-works-without-the-jargon">How It Works (Without the Jargon 😌)</h3>
<ol>
<li><p><strong>You set the rules:</strong> “If CPU usage goes above 70%, add 2 more VMs.” (There’s a small sketch of this logic right after this list.)</p>
</li>
<li><p><strong>Azure watches traffic and adjusts the number of VMs</strong> automatically.</p>
</li>
<li><p><strong>All VMs are identical</strong> – like clones, all running the same app setup.</p>
</li>
<li><p><strong>It works with Azure Load Balancer</strong> to spread traffic across all these VMs smoothly.</p>
</li>
</ol>
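<p>To see what a rule like the one in step 1 actually decides, here’s a minimal sketch of the scale-out/scale-in logic in Python. The 70% threshold and the “add 2 more VMs” step come from the example above; the 30% scale-in threshold and the min/max limits are assumptions for this sketch – real VMSS autoscale rules are configured on the scale set itself (portal, CLI, or templates), not hand-rolled in code:</p>
<pre><code class="language-python">def decide_instance_count(current_vms, avg_cpu_percent):
    """Toy version of a VMSS autoscale rule: scale out when CPU is hot,
    scale back in when it cools down, staying inside the min/max limits."""
    SCALE_OUT_CPU = 70.0    # example threshold from the rule above
    SCALE_IN_CPU = 30.0     # assumed scale-in threshold for this sketch
    STEP, MIN_VMS, MAX_VMS = 2, 2, 10

    if avg_cpu_percent > SCALE_OUT_CPU:
        return min(current_vms + STEP, MAX_VMS)   # add 2 more VMs
    if avg_cpu_percent >= SCALE_IN_CPU:
        return current_vms                        # steady state, do nothing
    return max(current_vms - STEP, MIN_VMS)       # quiet period, remove VMs


print(decide_instance_count(current_vms=3, avg_cpu_percent=85))  # -> 5
print(decide_instance_count(current_vms=5, avg_cpu_percent=20))  # -> 3
</code></pre>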
<h3 id="heading-real-life-example-food-delivery-app">Real-Life Example: Food Delivery App 🍕📱</h3>
<p>You’ve built an app where users order food. During lunch and dinner, traffic explodes.</p>
<p>💡 With VMSS:</p>
<ul>
<li><p>You start with 3 VMs in the morning</p>
</li>
<li><p>At 12PM, Azure sees high CPU usage, so it spins up 5 more VMs</p>
</li>
<li><p>At 3PM, traffic drops, so Azure removes the extra VMs</p>
</li>
</ul>
<p>You only pay for what you use. And users get a smooth experience – no delays, no crashes 👌🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746982520998/7fe3c997-fc8f-418a-861b-e999905ca43c.png" alt="Auto-scaling illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-azure-load-balancer-spreading-the-traffic">📦 Azure Load Balancer – Spreading the Traffic</h2>
<p>By now, you know that your app can live on multiple Virtual Machines (VMs), and that you can scale them easily using Virtual Machine Scale Sets (VMSS).</p>
<p>But here's the big question: when users start accessing your app – hundreds, even thousands at once – how do you make sure that all that traffic is fairly and efficiently distributed across those VMs?</p>
<p>You don’t want one VM to be overwhelmed while others are just chilling. You need a middleman – something smart enough to balance the load.</p>
<p>That’s where <strong>Azure Load Balancer</strong> steps in. It’s Azure’s way of saying, “Don’t worry, I got this” when traffic starts rolling in.</p>
<h3 id="heading-so-what-is-azure-load-balancer">🏢 So, What Is Azure Load Balancer?</h3>
<p>Azure Load Balancer is a <strong>traffic director</strong>. It takes incoming traffic from the internet (or even internal sources within your network) and intelligently spreads it across multiple backend machines – usually VMs.</p>
<p>It's like having a well-trained receptionist who routes every customer to the next available agent, so no one waits too long and no one gets overwhelmed 😃.</p>
<p>And the best part? This entire process happens in the background – fast, silent, and seamless. Users visiting your app have no idea a traffic manager is working behind the scenes. They just see a fast, responsive experience.</p>
<h3 id="heading-the-frontend-ip-your-apps-public-face">🌐 The Frontend IP – Your App’s Public Face</h3>
<p>Every Azure Load Balancer is tied to a <strong>Frontend IP</strong>, which is basically the public IP address of your application – the one users connect to when they open <code>www.yourapp.com</code>.</p>
<p>This IP acts as the entry point. All user traffic comes through it first. But the Load Balancer doesn’t actually run your app. Instead, it accepts the traffic and forwards it to one of the VMs in the backend pool (we’ll get to that shortly).</p>
<p>You can configure this Frontend IP to be either public (accessible over the internet) or private (used for internal traffic within your cloud network – say, between microservices or internal tools).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747055268951/5afbb738-d00d-4f49-9709-2fa1fe7cffdd.png" alt="Frontend IP address illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-backend-pool-where-the-magic-happens">🗂️ Backend Pool – Where the Magic Happens</h3>
<p>Behind every Azure Load Balancer is a <strong>backend pool</strong> – a group of VMs (or VM Scale Set instances) where your actual app is running. These are the real workers, doing all the heavy lifting.</p>
<p>When traffic hits the Frontend IP, the Load Balancer takes that request and hands it off to one of the VMs in the backend pool.</p>
<p>But it doesn’t just randomly pick one. It checks a few things first – like whether the VM is healthy, whether it's already busy, and what rules you’ve set.</p>
<p>Each VM in the pool typically runs the same app or service. This means any of them can handle any incoming request, which is what makes load balancing possible in the first place.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747055337014/e831056d-7c0c-49d9-b05a-6d3dbe3edc76.png" alt="Backend pool illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-health-probes-keeping-tabs-on-the-vms">🩺 Health Probes – Keeping Tabs on the VMs</h3>
<p>Now, how does the Load Balancer know which VM is healthy or not? This is where <strong>health probes</strong> come in. Think of them as regular check-ups.</p>
<p>You configure the Load Balancer to periodically "ping" each VM – maybe by hitting a specific URL (like <code>/health</code>) or a certain port (like 80 for HTTP). If a VM doesn’t respond correctly, Azure marks it as unhealthy and temporarily removes it from the rotation.</p>
<p>This ensures users never get routed to a broken or unresponsive instance of your app. And once the VM becomes healthy again, it's automatically added back to the pool.</p>
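<p>Here’s a minimal health-probe sketch in Python, using only the standard library, just to show the idea. The VM addresses and the <code>/health</code> path are made-up examples – in practice Azure runs the probes for you based on the probe settings you configure on the Load Balancer:</p>
<pre><code class="language-python">import urllib.error
import urllib.request

# Hypothetical backend VMs -- in Azure these would be the instances in your backend pool
BACKEND_VMS = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

def is_healthy(vm_ip, port=80, path="/health", timeout=2):
    """Return True if the VM answers the probe with HTTP 200."""
    url = f"http://{vm_ip}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False

# Only VMs that pass the probe stay in the rotation; the rest are skipped
healthy_pool = [vm for vm in BACKEND_VMS if is_healthy(vm)]
print("VMs currently in rotation:", healthy_pool)
</code></pre>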
<h3 id="heading-load-balancing-rules-who-gets-what">⚖️ Load Balancing Rules – Who Gets What?</h3>
<p>Next, we have <strong>Load Balancing Rules</strong>. These are the instructions that tell Azure Load Balancer exactly how to behave.</p>
<p>You can define rules like:</p>
<ul>
<li><p>“Forward all HTTP (port 80) traffic to backend pool VMs on port 80”</p>
</li>
<li><p>“Forward HTTPS (port 443) traffic to VMs on port 443”</p>
</li>
<li><p>“Only route traffic to healthy VMs”</p>
</li>
</ul>
<p>These rules make Azure Load Balancer highly customizable. You get to decide how traffic flows, which protocols to support, and how to handle backend ports. It's like customizing the rules of a relay race – who gets the baton and when.</p>
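<p>If you squint, a load-balancing rule boils down to “traffic arriving on this frontend port goes to that backend port on the next healthy VM”. Here’s a purely illustrative Python sketch of that idea – the ports and IPs are examples, and real rules live on the Load Balancer resource, not in your application code:</p>
<pre><code class="language-python">import itertools

# Example rules: frontend port -> backend port (same idea as the bullets above)
LB_RULES = {80: 80, 443: 443}

# VMs the health probes currently consider healthy (example private IPs)
healthy_vms = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]
rotation = itertools.cycle(healthy_vms)   # simple round-robin rotation

def route(frontend_port):
    """Pick the next healthy VM and the backend port this rule maps to."""
    backend_port = LB_RULES[frontend_port]
    vm = next(rotation)
    return vm, backend_port

for _ in range(4):
    print(route(80))  # cycles through 10.0.0.4, 10.0.0.5, 10.0.0.6, 10.0.0.4, ...
</code></pre>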
<h3 id="heading-real-world-example-sneaker-sale-rush">👟 Real-World Example: Sneaker Sale Rush</h3>
<p>Imagine you're running an online sneaker store at <code>www.sneakerblast.com</code>. You’re launching a flash sale, and thousands of users are hitting your website all at once.</p>
<p>Thanks to your Azure Load Balancer, here’s what happens:</p>
<ol>
<li><p>All those users land on your Frontend IP, the public face of your site.</p>
</li>
<li><p>The Load Balancer accepts the traffic and checks the health probes of all VMs in the backend pool.</p>
</li>
<li><p>Based on its rules, it forwards each user to a healthy, available VM.</p>
</li>
<li><p>One VM might serve a user in Lagos, another in Nairobi, another in Accra – all seamlessly.</p>
</li>
</ol>
<p>If one VM crashes or lags? The Load Balancer detects it instantly and stops routing traffic to it until it’s back online.</p>
<p>That’s smooth traffic management without any manual effort.</p>
<h2 id="heading-azure-application-gateway-smart-routing-for-modern-apps">🍴 Azure Application Gateway – Smart Routing for Modern Apps</h2>
<p>So far, we’ve seen how Azure Load Balancer helps you split traffic across multiple VMs running a single service – like a monolithic app or a web frontend.</p>
<p>Let’s say you have a web application deployed on a VM. It listens on port 80, and you’ve scaled it into 3 instances. The Azure Load Balancer takes requests from the internet and spreads them across all 3 instances of the same service. Easy, right?</p>
<p>You can even link the Load Balancer’s public IP address to your domain – like <code>mydomain.com</code> – so users can visit your site normally.</p>
<h3 id="heading-but-what-if-you-have-multiple-services">🧠 But What If You Have <em>Multiple</em> Services?</h3>
<p>Now imagine you’ve gone beyond just one app. You’re building something more modern, like a set of microservices.</p>
<p>You now have:</p>
<ul>
<li><p>A payment service listening on port 5000</p>
</li>
<li><p>An authentication service on port 6000</p>
</li>
<li><p>A purchase service on port 7000</p>
</li>
</ul>
<p>All deployed across the same VMs (or Virtual Machine Scale Set), just on different ports.</p>
<p>Here’s the problem: an Azure Load Balancer works at the network level – it can forward ports to a backend pool, but it can’t look inside a request to decide <em>which</em> microservice should handle it. If you tie it to <code>mydomain.com</code>, it can’t tell <code>/payment</code> traffic from <code>/auth</code> traffic. 😬</p>
<p>So… what do you do?</p>
<p>You might think: “Let me just create a separate Load Balancer for each service!” 🤕</p>
<p>But that means:</p>
<ul>
<li><p>You’ll have to pay for multiple load balancers</p>
</li>
<li><p>You’ll end up managing 3–5 public IP addresses</p>
</li>
<li><p>You might even need to buy multiple domains like <code>mypayment.com</code>, <code>myauth.com</code>, and so on to route users properly</p>
</li>
</ul>
<p>Yikes. That’s impractical, messy, <em>and</em> expensive 😖💸</p>
<h3 id="heading-enter-azure-application-gateway">🎉 Enter Azure Application Gateway</h3>
<p><strong>Azure Application Gateway</strong> solves this problem beautifully. It’s designed to route traffic intelligently – not just to one service, but to multiple services using just one gateway.</p>
<p>It works like this:</p>
<ol>
<li><p>You create one public-facing frontend IP (like <code>52.160.100.5</code>)</p>
</li>
<li><p>You link that IP address to your main domain, for example <code>mydomain.com</code></p>
</li>
<li><p>Then, you define multiple backend pools – one for each service:</p>
<ul>
<li><p>Payment service (port 5000)</p>
</li>
<li><p>Auth service (port 6000)</p>
</li>
<li><p>Purchase service (port 7000)</p>
</li>
</ul>
</li>
<li><p>Next, you set up routing rules that decide how to forward each request.</p>
</li>
</ol>
<h3 id="heading-two-ways-to-route-with-application-gateway">✨ Two Ways to Route with Application Gateway</h3>
<p>You can configure <strong>smart routing</strong> based on:</p>
<ul>
<li><p><strong>URL paths</strong>:</p>
<ul>
<li><p><code>mydomain.com/payment</code> → Payment service</p>
</li>
<li><p><code>mydomain.com/auth</code> → Auth service</p>
</li>
</ul>
</li>
<li><p><strong>Subdomains</strong> (host headers):</p>
<ul>
<li><p><code>payment.mydomain.com</code> → Payment service</p>
</li>
<li><p><code>auth.mydomain.com</code> → Auth service</p>
</li>
</ul>
</li>
</ul>
<p>This way, all your services share one public IP and one domain – super clean, super efficient 🙌🏾</p>
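<p>Here’s a small Python sketch of the decision the gateway makes for both routing styles. The domain, paths, and ports come from the example above; the real configuration lives in the Application Gateway’s listeners, rules, and backend pools rather than in code like this:</p>
<pre><code class="language-python"># Backend pools keyed by service name (example ports from the scenario above)
BACKEND_POOLS = {"payment": 5000, "auth": 6000, "purchase": 7000}

# Option A: URL path-based rules
PATH_RULES = {"/payment": "payment", "/auth": "auth", "/purchase": "purchase"}

# Option B: subdomain (host header) rules
HOST_RULES = {"payment.mydomain.com": "payment",
              "auth.mydomain.com": "auth",
              "purchase.mydomain.com": "purchase"}

def pick_backend(host, path):
    """Return (service, backend_port) the way an App Gateway routing rule would."""
    if host in HOST_RULES:                     # a matching subdomain wins
        service = HOST_RULES[host]
    else:                                      # otherwise fall back to the URL path
        service = next((svc for prefix, svc in PATH_RULES.items()
                        if path.startswith(prefix)), None)
    if service is None:
        return None, None                      # no rule matched -> default pool / 404
    return service, BACKEND_POOLS[service]

print(pick_backend("mydomain.com", "/auth/login"))      # ('auth', 6000)
print(pick_backend("payment.mydomain.com", "/charge"))  # ('payment', 5000)
</code></pre>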
<h3 id="heading-real-life-scenario-lets-break-it-down">🤓 Real-Life Scenario (Let’s Break It Down)</h3>
<p>Let’s say you’re building a startup platform that has three key microservices:</p>
<ul>
<li><p><strong>Payment service</strong> that handles transactions</p>
</li>
<li><p><strong>Authentication service</strong> that handles login and user identity</p>
</li>
<li><p><strong>Purchase service</strong> that manages product ordering</p>
</li>
</ul>
<p>Each service is containerized and deployed on the same VM (or across several VMs using a VM Scale Set). But – and this is key – they all listen on <strong>different ports</strong> inside the VMs:</p>
<ul>
<li><p>Payment → port 5000</p>
</li>
<li><p>Auth → port 6000</p>
</li>
<li><p>Purchase → port 7000</p>
</li>
</ul>
<p>Now, without a smart routing solution, you’d be stuck trying to expose just one of these services using a standard Azure Load Balancer. But you need all three to be accessible from the internet – and you don’t want to pay for or manage 3 different Load Balancers 😅</p>
<p>So, what do you do?</p>
<h3 id="heading-using-azure-application-gateway-to-route-traffic-intelligently">🧠 Using Azure Application Gateway to Route Traffic Intelligently</h3>
<p>Here's how you can fix this using <strong>one</strong> Application Gateway:</p>
<ol>
<li><p>Deploy your microservices inside each VM:</p>
<ul>
<li><p>Each service runs on a specific port</p>
</li>
<li><p>All VMs in your scale set are identical (they contain all three services)</p>
</li>
</ul>
</li>
<li><p>Create backend pools in Application Gateway:</p>
<ul>
<li><p>A backend pool for the payment service (pointing to port 5000 on all VMs)</p>
</li>
<li><p>One for the auth service (port 6000)</p>
</li>
<li><p>Another for the purchase service (port 7000)</p>
</li>
</ul>
</li>
<li><p>Create routing rules:</p>
<ul>
<li><p>Option A (Path-based routing):</p>
<ul>
<li><p>Requests to <code>mydomain.com/payment</code> → go to the payment backend pool</p>
</li>
<li><p>Requests to <code>mydomain.com/auth</code> → go to the auth backend pool</p>
</li>
<li><p>Requests to <code>mydomain.com/purchase</code> → go to the purchase backend pool</p>
</li>
</ul>
</li>
<li><p>Option B (Subdomain-based routing):</p>
<ul>
<li><p><code>payment.mydomain.com</code> → payment service</p>
</li>
<li><p><code>auth.mydomain.com</code> → auth service</p>
</li>
<li><p><code>purchase.mydomain.com</code> → purchase service</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>You just tell the Application Gateway: “Hey, if a request comes in for this URL or subdomain, send it to this port on these VMs.” And it does just that – consistently and intelligently 🔁</p>
<h3 id="heading-so-whats-really-happening">📦 So, What’s Really Happening?</h3>
<p>Imagine a user visits <code>mydomain.com/auth</code>. Here’s what goes on behind the scenes:</p>
<ol>
<li><p>The DNS translates <code>mydomain.com</code> to your Application Gateway’s public IP</p>
</li>
<li><p>The Gateway receives the request</p>
</li>
<li><p>It checks your routing rules</p>
</li>
<li><p>It sees that <code>/auth</code> should go to the backend pool for port 6000</p>
</li>
<li><p>It forwards the request to one of the VMs running the auth service</p>
</li>
<li><p>The response goes back to the user – fast and seamless ✨</p>
</li>
</ol>
<p>This happens in milliseconds, for every request. And because the Application Gateway is aware of multiple ports and services, it can handle routing logic that a regular Load Balancer just can’t do.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747056436345/7ea97231-d2ee-4f63-aff1-50595e7c06e0.png" alt="Application Gateway Illustration" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-azure-load-balancer-vs-azure-application-gateway">🔍 Azure Load Balancer vs Azure Application Gateway</h2>
<p>By now, you've seen how both tools help route traffic in Azure – but they solve different problems.</p>
<p>Let’s break down how they compare, and when you should use one over the other 👇🏾</p>
<h3 id="heading-1-routing-logic">🛣️ 1. <strong>Routing Logic</strong></h3>
<p><strong>Azure Load Balancer</strong><br>It simply distributes incoming traffic evenly across a pool of VMs. It doesn’t care <em>what</em> the request is – it just balances the load.  </p>
<p>Imagine a delivery guy who doesn't ask questions – he just drops each package at the next available house.  </p>
<p>That’s what Azure Load Balancer does: it sends traffic to one of your servers without looking inside the request.</p>
<p><strong>Azure Application Gateway</strong><br>This is the smart one. It looks at <em>what’s inside</em> each request (like the URL path or domain) and makes intelligent decisions.</p>
<p>Just like a smarter delivery guy who looks at the address and decides where to go: "Oh! This one is for the payment office, not the main office."  </p>
<p>That’s what Application Gateway does: it reads the request (like the URL or domain name) and sends it to the right place according to the routing rules.</p>
<h3 id="heading-2-protocols-handled">🌐 2. <strong>Protocols Handled</strong></h3>
<p><strong>Load Balancer</strong><br>Works at the transport layer (Layer 4 in the OSI model). It deals with raw TCP/UDP traffic – whether that traffic carries a website, video streaming, or game data, the Load Balancer just sees packets and ports.</p>
<p><strong>Application Gateway</strong><br>Works at the application layer (Layer 7). It handles web traffic only – like websites and apps (HTTP/HTTPS) – and it can actually read what's being asked, like:</p>
<ul>
<li><p>“Go to /login”</p>
</li>
<li><p>“Go to <a target="_blank" href="http://payment.mydomain.com">payment.mydomain.com</a>”.</p>
</li>
</ul>
<p>TL;DR: Load Balancer just pushes packets. App Gateway actually <em>reads</em> your web requests.</p>
<h3 id="heading-3-use-case-scenarios">🔁 3. <strong>Use Case Scenarios</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Situation</td><td>Best Choice</td></tr>
</thead>
<tbody>
<tr>
<td>You have one big app and just want to spread users across servers</td><td>✅ Load Balancer</td></tr>
<tr>
<td>You have multiple services (like login, payment, and so on) and need to send users to the right one</td><td>✅ Application Gateway</td></tr>
<tr>
<td>You want to use subdomains (like <a target="_blank" href="http://login.mysite.com">login.mysite.com</a>)</td><td>✅ Application Gateway</td></tr>
<tr>
<td>You want to secure your website with HTTPS and Web Application Firewall (WAF)</td><td>✅ Application Gateway</td></tr>
<tr>
<td>You want the simplest setup and lowest cost</td><td>✅ Load Balancer</td></tr>
</tbody>
</table>
</div><h3 id="heading-4-ssl-termination-amp-security-features">🔐 4. <strong>SSL Termination &amp; Security Features</strong></h3>
<p><strong>Load Balancer</strong> doesn’t handle security stuff. You’ll need to secure each server yourself (for example, set up HTTPS on each one).</p>
<p><strong>Application Gateway</strong> can secure everything in one place – you upload your SSL certificate once and it takes care of HTTPS for all services.</p>
<p>It can also protect you from hackers and bad traffic with something called <strong>WAF (Web Application Firewall)</strong>, which shields your app from threats like SQL injection, XSS, and so on (you need to enable it explicitly – it’s not on by default).</p>
<h3 id="heading-5-pricing-and-complexity">💰 5. <strong>Pricing and Complexity</strong></h3>
<p><strong>Load Balancer</strong> is cheaper and easier to set up. Great when you don’t need anything fancy.</p>
<p><strong>Application Gateway</strong> costs more, but gives you more control and less headache when working with complex apps and microservices.</p>
<p>Trying to use Load Balancer for multiple services? You’ll need to create one Load Balancer per service, which becomes costly and impractical.</p>
<h3 id="heading-summary-table">🧠 Summary Table</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Load Balancer</td><td>Application Gateway</td></tr>
</thead>
<tbody>
<tr>
<td>Can it understand the request?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Can it route based on URL or subdomain?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Can it terminate HTTPS (SSL) for you?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Is it good for simple apps?</td><td>✅ Yes</td><td>✅ Yes</td></tr>
<tr>
<td>Is it good for complex apps with many services?</td><td>❌ No</td><td>✅ Yes</td></tr>
<tr>
<td>Cost</td><td>💲 Lower</td><td>💰 Higher</td></tr>
</tbody>
</table>
</div><h2 id="heading-use-cases-when-to-use-each-one">🧭 Use Cases: When to Use Each One</h2>
<p>There’s no one-size-fits-all when it comes to hosting apps in the cloud. The right setup depends on what you’re building, how much traffic you expect, and how complex your app is.</p>
<p>Let’s walk through 4 different use-case scenarios, starting from the most basic setup all the way to a fully auto-scaled and smartly routed architecture.</p>
<h3 id="heading-1-single-vm-instance-for-small-projects-or-internal-tools">1️⃣ <strong>Single VM Instance – For Small Projects or Internal Tools</strong></h3>
<p><strong>Use this when:</strong><br>You’re just getting started. You’ve built a small app – maybe a portfolio, a blog, or a side project – and you want to make it live, or you’re a startup that just launched.</p>
<p><strong>How it works:</strong><br>You spin up one Azure VM, install your app on it, and open the port it listens on (for example, port 80 for a web server). You can then attach a public IP to the VM and bind it to a custom domain like <code>myawesomeapp.com</code>.</p>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A developer hosting a portfolio website or blog</p>
</li>
<li><p>A startup testing a new product with only a few users</p>
</li>
<li><p>An internal company tool for a small team</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Super simple setup</p>
</li>
<li><p>Low cost</p>
</li>
<li><p>Full control of your environment</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>If the VM goes down, your app goes down</p>
</li>
<li><p>No auto-scaling – performance may drop during traffic spikes (the only way to handle higher CPU/memory usage from incoming traffic is to manually scale the VM vertically)</p>
</li>
<li><p>You manually maintain and monitor everything</p>
</li>
</ul>
<h3 id="heading-2-manual-horizontal-scaling-for-apps-with-medium-predictable-traffic">2️⃣ <strong>Manual Horizontal Scaling – For Apps With Medium, Predictable Traffic</strong></h3>
<p><strong>Use this when:</strong><br>Your app is growing – maybe you have a few thousand users now, and performance matters. You want more than one server so your app doesn’t crash during busy hours.</p>
<p><strong>How it works:</strong><br>You manually create 2 or 3 Azure VMs with the same app setup. You then add a Load Balancer in front to split traffic evenly across them.</p>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A business with a customer portal</p>
</li>
<li><p>A school website that handles regular logins, lecture video streaming, and so on during class hours</p>
</li>
<li><p>An app that gets traffic mostly during the day (predictable load)</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Better performance and availability</p>
</li>
<li><p>Load is shared across multiple VMs</p>
</li>
<li><p>You can scale manually when needed</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>You must manually add or remove VMs – which takes effort</p>
</li>
<li><p>Still need to monitor performance manually</p>
</li>
<li><p>No built-in automation or auto-healing</p>
</li>
</ul>
<h3 id="heading-3-auto-scaling-with-vm-scale-sets-azure-load-balancer-for-apps-with-spiky-or-unpredictable-traffic">3️⃣ <strong>Auto-Scaling with VM Scale Sets + Azure Load Balancer – For Apps With Spiky or Unpredictable Traffic</strong></h3>
<p><strong>Use this when:</strong><br>You’re building something more serious – traffic comes in waves (for example, a fitness/coach booking app), and you don’t want to sit around scaling VMs all day. You want Azure to automatically scale your infrastructure for you.</p>
<p><strong>How it works:</strong><br>You set up a Virtual Machine Scale Set (VMSS) that can automatically create more VMs when needed (like during high traffic), and remove them when things are calm — saving money. A Load Balancer distributes traffic across all those VMs.</p>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A media platform where people upload videos or photos</p>
</li>
<li><p>A shopping site that gets surges during promotions, for example Black Fridays</p>
</li>
<li><p>A booking platform with peak traffic in evenings/weekends</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Automatic scaling – saves time and money</p>
</li>
<li><p>High availability: VMs can be replaced if one fails</p>
</li>
<li><p>Easy to grow as your user base grows</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>Best suited to a monolithic app (one big service)</p>
</li>
<li><p>No support for routing traffic to specific services – just spreads traffic across VMs</p>
</li>
<li><p>Load Balancer can’t look at URL paths or subdomains</p>
</li>
</ul>
<h3 id="heading-4-vm-scale-set-azure-application-gateway-for-microservices-or-complex-web-apps">4️⃣ <strong>VM Scale Set + Azure Application Gateway – For Microservices or Complex Web Apps</strong></h3>
<p><strong>Use this when:</strong><br>You have a modern, multi-service app – maybe built with microservices. Each service (like payments, authentication, search, and so on) lives on a different port or even in a container.</p>
<p>You want to route traffic smartly – like <code>/login</code> goes to the auth service, <code>/pay</code> to payments, and <code>/search</code> to the search service – all on the same domain.</p>
<p><strong>How it works:</strong><br>You still use a VM Scale Set for auto-scaling, but instead of a basic Load Balancer, you add an Application Gateway. It can inspect each request and send it to the right service based on things like:</p>
<ul>
<li><p>URL path (for example, <code>/payments</code>, <code>/orders</code>)</p>
</li>
<li><p>Subdomain (for example, <code>payments.mydomain.com</code>, <code>auth.mydomain.com</code>)</p>
</li>
</ul>
<p><strong>Real-life examples:</strong></p>
<ul>
<li><p>A full-blown SaaS product with multiple services</p>
</li>
<li><p>An e-commerce site with checkout, account, orders, and admin dashboards</p>
</li>
<li><p>A business migrating from a monolith to a microservices setup</p>
</li>
</ul>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Smart routing based on path or subdomain</p>
</li>
<li><p>Everything runs under one public IP and one domain</p>
</li>
<li><p>Secure HTTPS handling + optional Web Application Firewall (WAF)</p>
</li>
<li><p>Auto-scaling and high availability</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>More complex setup</p>
</li>
<li><p>Slightly higher cost due to Application Gateway</p>
</li>
<li><p>Needs planning around port numbers and backend pools</p>
</li>
</ul>
<h3 id="heading-quick-summary-table">🧠 Quick Summary Table</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Setup</td><td>Best For</td><td>Scaling</td><td>Routing Logic</td><td>Cost</td><td>Ease</td></tr>
</thead>
<tbody>
<tr>
<td>☁️ Single VM</td><td>Small sites, personal apps</td><td>❌ (Manual)</td><td>❌ One app only</td><td>💲 (Lowest)</td><td>⭐⭐⭐⭐</td></tr>
<tr>
<td>🧱 Manual Horizontal Scaling + Load Balancer</td><td>Mid-size apps, predictable traffic</td><td>✅ (Manual)</td><td>❌ One app only</td><td>💲💲💲 (due to multiple VMs running at once without down-scaling — even with no traffic)</td><td>⭐⭐ (due to manual scaling)</td></tr>
<tr>
<td>🔁 VMSS + Load Balancer</td><td>Busy apps, spiky traffic</td><td>✅ (Auto)</td><td>❌ One app only</td><td>💲💲</td><td>⭐⭐⭐</td></tr>
<tr>
<td>🍴 VMSS + App Gateway</td><td>Microservices, modern apps</td><td>✅ (Auto)</td><td>✅ Smart routing (involving multiple microservices)</td><td>💲💲💲💲(Highest)</td><td>⭐⭐</td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion">✅ Conclusion</h2>
<p>By now, you’ve gone from simply hearing the words “load balancer” or “scale set” to understanding exactly how they work, when to use them, and what problems they solve. Whether you’re just launching a small app or scaling up a high-traffic service, Azure gives you flexible, powerful tools to grow with confidence.</p>
<p>We started from the very beginning – a single virtual machine. It’s simple and great for small apps, but it quickly becomes a bottleneck as traffic grows.</p>
<p>That’s where scaling comes in. We explored:</p>
<ul>
<li><p>🧱 <strong>Vertical scaling</strong> – Upgrading the same VM (quick fix, but limited)</p>
</li>
<li><p>🧩 <strong>Horizontal scaling</strong> – Adding more VMs to handle traffic better</p>
</li>
</ul>
<p>Then we introduced Azure Virtual Machine Scale Sets (VMSS) – which bring auto-scaling to life. No more manual intervention – Azure can scale your servers up and down based on demand.</p>
<p>But where things really get smart is with load balancers:</p>
<ul>
<li><p>📦 <strong>Azure Load Balancer</strong> helps spread traffic across your VMs — great for single-service apps</p>
</li>
<li><p>🍴 <strong>Azure Application Gateway</strong> takes it further by routing requests based on URL paths or subdomains — perfect for multi-service or microservice apps</p>
</li>
</ul>
<h3 id="heading-tldr-what-should-you-use">🎯 TL;DR – What Should You Use?</h3>
<ul>
<li><p><strong>Single VM</strong>: For side projects, portfolios, or internal tools</p>
</li>
<li><p><strong>Manual scaling + Load Balancer</strong>: For medium apps with predictable load</p>
</li>
<li><p><strong>VMSS + Load Balancer</strong>: For monolithic apps with auto-scaling needs</p>
</li>
<li><p><strong>VMSS + Application Gateway</strong>: Auto-scaling plus smart routing – for microservices and multi-service apps</p>
</li>
</ul>
<h3 id="heading-final-thoughts">💡 Final Thoughts</h3>
<p>Cloud apps grow – fast. And with growth comes complexity. But with the right Azure setup, you can stay one step ahead of your traffic, serve users better, and keep costs under control.</p>
<p>Remember: you don’t need to start big. Start small, understand your app's traffic patterns, and scale only when you need to. Tools like Azure VM Scale Sets, Load Balancer, and Application Gateway give you the control and power to build scalable, modern applications without over-engineering.</p>
<p>Thanks for sticking with me through this deep dive. I hope this made things clearer, simpler, and maybe even a little fun 😊</p>
<h2 id="heading-study-further"><strong>Study Further 📚</strong></h2>
<p>If you would like to learn more about Azure Virtual Machines, Scale Sets, Load Balancer, and Application Gateway, you can check out the courses below:</p>
<ul>
<li><p><a target="_blank" href="https://www.coursera.org/specializations/microsoft-azure-fundamentals-az900-exam-prep">Microsoft Azure Fundamentals AZ-900 Exam Prep Specialization</a> — Microsoft, Coursera</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/QOv_-xBXkpo?si=kSijmQdev5cQbRKl">Azure Virtual Machine Tutorial | Creating A Virtual Machine In Azure | Azure Training | Simplilearn</a> — YouTube</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/wN4lRWHUHA0?si=kWBGXhXZTnVgzuEj">Virtual machine scale sets</a> — YouTube</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/VqBGjddK5VY?si=diLGQfuW5i0lxbse">Azure Load Balancer | Azure Load Balancer Tutorial | All About Load Balancer | Edureka</a> — YouTube</p>
</li>
<li><p><a target="_blank" href="https://youtu.be/V9EP4jAg4QM?si=t7EqQjw1eNHqOtjK">Azure Application Gateway Deep dive | Step by step explained</a> — YouTube</p>
</li>
</ul>
<h2 id="heading-about-the-author"><strong>About the Author 👨‍💻</strong></h2>
<p>Hi, I’m Prince! I’m a DevOps engineer and Cloud architect passionate about building, deploying, and managing scalable applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this article, you can learn more about me by exploring more of my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile.</a> You can find my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">LinkedIn articles here</a>. You can also <a target="_blank" href="https://prince-onuk.vercel.app/achievements#articles">visit my website</a> to read more of my articles as well. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
