<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Databases - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Databases - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Mon, 01 Jun 2026 05:28:13 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/databases/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Live Options Database in Python – A Complete Guide ]]>
                </title>
                <description>
                    <![CDATA[ Live options analytics change constantly. Implied volatility shifts, Greeks drift, and the shape of the surface can look different even a few minutes later. But a lot of teams still treat these number ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-live-options-database-in-python-a-complete-guide/</link>
                <guid isPermaLink="false">69fd19789f93a850a43041c9</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ stockmarket ]]>
                    </category>
                
                    <category>
                        <![CDATA[ trading ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Nikhil Adithyan ]]>
                </dc:creator>
                <pubDate>Thu, 07 May 2026 23:00:08 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/4ecffa99-c492-4959-9899-885021d11ee4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Live options analytics change constantly. Implied volatility shifts, Greeks drift, and the shape of the surface can look different even a few minutes later.</p>
<p>But a lot of teams still treat these numbers like something you glance at once. A screenshot in a deck. A one-off notebook cell. A quick check in a UI before a meeting.</p>
<p>That works until you need to answer basic questions that show up in real workflows:</p>
<p>What did TSLA's surface look like at 10:32? When did skew start steepening? Did the change come from the wings moving or the ATM shifting?</p>
<p>If you don't store the data as it arrives, you can't replay it, compare it, or audit it. You're stuck with whatever you happened to look at in the moment.</p>
<p>In this walkthrough, we'll build something small but practical: an internal database that continuously captures SpiderRock MLink's LiveImpliedQuote analytics for TSLA, stores each snapshot as queryable history, and also maintains a "latest view" table so you can pull the current surface state without scanning the full history.</p>
<p><strong>The goal is not to build a trading system. It's to build a reliable internal dataset that you can monitor and query.</strong></p>
<p>Note: SpiderRock MLink's LiveImpliedQuote analytics is a product offered for a fee, which includes exchange charges for the underlying market data used in its creation.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-data-were-using">What Data We're Using</a></p>
</li>
<li><p><a href="#heading-setup-importing-packages">Setup: Importing Packages</a></p>
</li>
<li><p><a href="#heading-database-design">Database Design</a></p>
</li>
<li><p><a href="#heading-pulling-liveimpliedquote">Pulling LiveImpliedQuote</a></p>
</li>
<li><p><a href="#heading-normalizing-the-response-into-rows">Normalizing the Response Into Rows</a></p>
</li>
<li><p><a href="#heading-writing-to-the-database">Writing to the Database</a></p>
</li>
<li><p><a href="#heading-running-a-short-polling-capture">Running a Short Polling Capture</a></p>
</li>
<li><p><a href="#heading-analysis-smile-reconstruction-from-the-database">Analysis: Smile Reconstruction From the Database</a></p>
<ul>
<li><p><a href="#heading-pick-an-expiry-with-good-coverage">Pick an Expiry with Good Coverage</a></p>
</li>
<li><p><a href="#heading-rebuild-the-smile-across-snapshots">Rebuild the Smile Across Snapshots</a></p>
</li>
<li><p><a href="#heading-zoom-in-around-spot">Zoom-In Around Spot</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-analysis-atm-iv-and-skew-over-time">Analysis: ATM IV and Skew Over Time</a></p>
</li>
<li><p><a href="#heading-alert-style-thresholds">Alert-Style Thresholds</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before running any of the code in this walkthrough, there are a few things you need to have in place.</p>
<p>On the API side, you need a SpiderRock MLink account with access to the LiveImpliedQuote feed. The examples use the REST interface, so no websocket setup is required, but you do need a valid API key. If you don't have one yet, you can reach out to SpiderRock directly to get access.</p>
<p>On the Python side, the environment is minimal. You need Python 3.9 or later for the built-in generic type hint (<code>tuple[int, int]</code>) used in one of the function signatures. The external packages are requests, pandas, numpy, and matplotlib. Everything else – sqlite3, time, datetime – is part of the standard library. You can install the external dependencies with:</p>
<pre><code class="language-plaintext">pip install requests pandas numpy matplotlib
</code></pre>
<p>No database setup is required beyond a writable local path. SQLite creates the file automatically on first run, so there's nothing to install or configure separately.</p>
<p>Finally, the walkthrough uses TSLA as the target symbol because it has a liquid and active options chain. If you want to swap in a different underlying, the only thing you need to change is the symbol variable in the config block.</p>
<h2 id="heading-what-data-were-using">What Data We're Using</h2>
<p>This build is driven by one OptAnalytics message type from SpiderRock MLink: <a href="https://docs.spiderrockconnect.com/docs/next/MessageSchemas/Schema/Topics/analytics/LiveImpliedQuote/"><strong>LiveImpliedQuote</strong></a>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/7150e733-6238-410b-afe7-abc781d67e7a.png" alt="LiveImpliedQuote docs page" style="display:block;margin:0 auto" width="1000" height="451" loading="lazy">

<p>Each message represents an option contract and comes with the analytics you actually need for monitoring:</p>
<ul>
<li><p>the option identifier (symbol, expiry, strike, call or put)</p>
</li>
<li><p>surface IV (sVol) and related surface fields</p>
</li>
<li><p>Greeks (delta, gamma, theta, vega)</p>
</li>
<li><p>context fields like underlying price (uPrc), time to expiry (years), and rate (rate)</p>
</li>
<li><p>timestamps and calc source markers, which matter when you're turning a live feed into a database</p>
</li>
</ul>
<p>We'll treat sVol as the main volatility field for the article and refer to it as surface IV. That keeps the workflow consistent when we rebuild smiles or compute skew proxies from stored history.</p>
<p>The demo uses TSLA because it has a rich and active options chain, which makes the database and queries more interesting even in a short capture window. The same pipeline works for any other underlying&nbsp;– the only thing you change is the symbol filter.</p>
<h2 id="heading-setup-importing-packages">Setup: Importing Packages</h2>
<p>Before touching the database or the API, we set up a small, repeatable environment. This section is intentionally minimal. We only import what we need for three things: making REST calls, storing data in SQLite, and doing basic analysis and plots.</p>
<pre><code class="language-python">import requests
import sqlite3
import pandas as pd
import numpy as np
import time
from datetime import datetime, timezone
import matplotlib.pyplot as plt
plt.style.use('ggplot')
</code></pre>
<ul>
<li><p><code>requests</code> is used for calling MLink REST endpoints.</p>
</li>
<li><p><code>sqlite3</code> gives us a lightweight database we can write to locally without extra setup.</p>
</li>
<li><p><code>pandas</code> and <code>numpy</code> are only for shaping and filtering the data once it comes back.</p>
</li>
<li><p><code>time</code> and <code>datetime</code> help us run a polling loop and timestamp each snapshot so the database becomes a true time series.</p>
</li>
<li><p><code>matplotlib</code> draws the plots in the analysis sections.</p>
</li>
</ul>
<h2 id="heading-database-design">Database Design</h2>
<p>If the goal is to make live analytics queryable, the database design has to support two different needs.</p>
<p>First, you want an audit trail. Every snapshot should be preserved so you can reconstruct what the surface looked like at a specific time.</p>
<p>Second, you also want a fast way to answer "what does it look like right now" without scanning everything you've ever stored.</p>
<p>So we use two tables:</p>
<ul>
<li><p><code>implied_quote_history</code>: Append-only. Every poll inserts a full snapshot.</p>
</li>
<li><p><code>implied_quote_latest</code>: One row per option contract. Each poll upserts into this table so it always reflects the most recent snapshot.</p>
</li>
</ul>
<p>The core of both tables is a stable option identifier. In the feed, the option key is nested, so we normalize it into a single <code>option_key</code> string that includes symbol, expiry, strike, call or put, and venue fields. This becomes the primary key for the latest table and the main join key for queries.</p>
<pre><code class="language-python">#config
api_key = "YOUR SPIDERROCK API KEY"
mlink_url = "https://mlink-live.nms.saturn.spiderrockconnect.com/rest/json"

msg_type = "LiveImpliedQuote"

symbol = "TSLA"
poll_interval_s = 10
poll_duration_s = 120
limit = 2000

#create db connection
# any writable local path works; change this if /mnt/data does not exist on your machine
db_path = "/mnt/data/optanalytics_iv_greeks.db"

def get_conn(path: str = db_path):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA synchronous=NORMAL;")
    return conn

#create db schema
def setup_db(path: str = db_path):
    conn = get_conn(path)
    cur = conn.cursor()

    cur.execute("""
    create table if not exists implied_quote_history (
        id integer primary key autoincrement,
        asof_ts text not null,

        option_key text not null,
        symbol text not null,
        expiry text not null,
        strike real not null,
        cp text not null,

        calc_source text,
        u_prc real,
        years real,
        rate real,

        s_vol real,
        atm_vol real,
        s_mark real,

        o_bid real,
        o_ask real,
        o_bid_iv real,
        o_ask_iv real,

        delta real,
        gamma real,
        theta real,
        vega real,

        src_ts text
    );
    """)

    cur.execute("""
    create index if not exists idx_hist_symbol_expiry_asof
    on implied_quote_history(symbol, expiry, asof_ts);
    """)

    cur.execute("""
    create index if not exists idx_hist_option_asof
    on implied_quote_history(option_key, asof_ts);
    """)

    cur.execute("""
    create table if not exists implied_quote_latest (
        option_key text primary key,

        last_asof_ts text not null,
        symbol text not null,
        expiry text not null,
        strike real not null,
        cp text not null,

        calc_source text,
        u_prc real,
        years real,
        rate real,

        s_vol real,
        atm_vol real,
        s_mark real,

        o_bid real,
        o_ask real,
        o_bid_iv real,
        o_ask_iv real,

        delta real,
        gamma real,
        theta real,
        vega real,

        src_ts text
    );
    """)

    cur.execute("""
    create index if not exists idx_latest_symbol_expiry
    on implied_quote_latest(symbol, expiry);
    """)

    conn.commit()
    conn.close()

setup_db()
</code></pre>
<p>This creates the SQLite database file and both tables. The history table is append-only and indexed for the two queries we'll run later: pulling snapshots by expiry and time, and pulling a specific option's timeline by <code>option_key</code>. The latest table is keyed by <code>option_key</code>, which lets us upsert and maintain a consistent "current view."</p>
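<p>If you want to confirm that the two history indexes actually serve those two query shapes, SQLite's <code>EXPLAIN QUERY PLAN</code> is a quick check. The sketch below is an assumption-laden stand-in: it uses an in-memory database with only the key columns, and the option key value is hypothetical. The exact wording of the plan text varies between SQLite versions, so it only looks for the index names.</p>

```python
import sqlite3

# In-memory stand-in with the same key columns and indexes as the history table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table implied_quote_history (
    id integer primary key autoincrement,
    asof_ts text not null,
    option_key text not null,
    symbol text not null,
    expiry text not null,
    s_vol real
);
create index idx_hist_symbol_expiry_asof
    on implied_quote_history(symbol, expiry, asof_ts);
create index idx_hist_option_asof
    on implied_quote_history(option_key, asof_ts);
""")

def plan_for(sql: str, params: tuple) -> str:
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = conn.execute("explain query plan " + sql, params).fetchall()
    return " | ".join(r[-1] for r in rows)

# One contract's timeline (hypothetical key value).
timeline_plan = plan_for(
    "select asof_ts, s_vol from implied_quote_history "
    "where option_key = ? order by asof_ts",
    ("hypothetical-key",),
)
# All snapshots for one expiry from a given time onward.
expiry_plan = plan_for(
    "select asof_ts, s_vol from implied_quote_history "
    "where symbol = ? and expiry = ? and asof_ts >= ?",
    ("TSLA", "2026-11-20", "2026-04-14T18:00:00Z"),
)
print(timeline_plan)
print(expiry_plan)
conn.close()
```

<p>If an index name is missing from the plan for a query you run often, that is a hint the query is scanning the table and may need its own index.</p>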
<p>The columns we store are intentionally opinionated. We keep surface IV (s_vol), surface mark (s_mark), Greeks, and a few context fields. We also store timestamps so later we can reason about when a value was produced.</p>
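<p>Because <code>asof_ts</code> is stored as text, it helps to see why that still supports "what did it look like at time T" lookups. The sketch below assumes every timestamp uses the same fixed-width UTC format (for example <code>2026-04-14T18:09:29Z</code>), in which case plain string comparison matches chronological order. The table is a cut-down, in-memory stand-in and the rows are made up.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table implied_quote_history (
    asof_ts text not null,
    option_key text not null,
    s_vol real
)
""")
# Made-up rows: three snapshots of one hypothetical contract.
conn.executemany(
    "insert into implied_quote_history values (?, ?, ?)",
    [
        ("2026-04-14T18:09:29Z", "demo-key", 0.512),
        ("2026-04-14T18:09:41Z", "demo-key", 0.514),
        ("2026-04-14T18:09:53Z", "demo-key", 0.517),
    ],
)

def snapshot_as_of(conn, cutoff_ts: str):
    # Fixed-width UTC ISO 8601 strings sort in the same order as the instants
    # they encode, so max(asof_ts) under a text comparison is the latest
    # snapshot taken at or before the cutoff.
    return conn.execute(
        """
        select asof_ts, option_key, s_vol
        from implied_quote_history
        where asof_ts = (
            select max(asof_ts) from implied_quote_history where asof_ts <= ?
        )
        """,
        (cutoff_ts,),
    ).fetchall()

as_of_rows = snapshot_as_of(conn, "2026-04-14T18:09:45Z")
before_first = snapshot_as_of(conn, "2026-04-14T18:00:00Z")
print(as_of_rows)
print(before_first)
conn.close()
```

<p>The same pattern breaks down if timestamps are written with mixed precision or mixed offsets, which is one reason to generate them in a single place.</p>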
<h2 id="heading-pulling-liveimpliedquote">Pulling LiveImpliedQuote</h2>
<p>Now we do the first live pull. The goal here is not to build a perfect filter. It's to confirm that we can retrieve a meaningful slice of TSLA option analytics and that the response structure is what we expect.</p>
<p>We request LiveImpliedQuote and filter by symbol using the where clause. The response is a list where most rows are actual LiveImpliedQuote messages, and one row at the end is a QueryResult summary.</p>
<pre><code class="language-python">def fetch_live_implied_quote(symbol: str, limit: int = 2000):
    where = f"okey.tk:eq:{symbol}"

    params = {
        "apiKey": api_key,
        "cmd": "getmsgs",
        "msgType": msg_type,
        "where": where,
        "limit": limit
    }

    # timeout (in seconds) so a stalled request cannot hang the polling loop
    r = requests.get(mlink_url, params=params, timeout=30)
    r.raise_for_status()
    return r.json()

raw = fetch_live_implied_quote(symbol, limit=limit)
print("raw messages:", len(raw))
print("first type:", raw[0].get("header", {}).get("mTyp") if raw else None)
</code></pre>
<p>This is a straight REST <code>getmsgs</code> call. We pass the API key, message type, and a simple symbol filter. The <code>limit</code> is important. It caps how many messages we get back in one poll, so for active underlyings, the returned set of strikes and expiries can vary between polls. That's fine for this tutorial, because the goal is to show the database pattern and the types of monitoring queries it enables.</p>
<p>This is the output you should see:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/606259cd-e6ed-4f6f-b24f-48fafe9c561b.png" alt="LiveImpliedQuote sample pull" style="display:block;margin:0 auto" width="988" height="170" loading="lazy">
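<p>Since the response mixes LiveImpliedQuote rows with a trailing QueryResult row, a quick way to sanity-check a pull is to split it by message type. This is a minimal sketch: <code>split_response</code> is a hypothetical helper, and the sample payload is made up. Only the <code>header</code>/<code>message</code> nesting and the <code>mTyp</code> and <code>calcSource</code> field names come from the walkthrough.</p>

```python
from collections import Counter

def split_response(raw: list) -> tuple[list, list]:
    """Separate LiveImpliedQuote rows from everything else (e.g. the QueryResult row)."""
    quotes, other = [], []
    for row in raw:
        m_typ = row.get("header", {}).get("mTyp")
        (quotes if m_typ == "LiveImpliedQuote" else other).append(row)
    return quotes, other

# Hypothetical payload: the values are invented for illustration.
sample_raw = [
    {"header": {"mTyp": "LiveImpliedQuote"}, "message": {"sVol": 0.51, "calcSource": "Loop"}},
    {"header": {"mTyp": "LiveImpliedQuote"}, "message": {"sVol": 0.49, "calcSource": "Tick"}},
    {"header": {"mTyp": "QueryResult"}, "message": {}},
]

quotes, other = split_response(sample_raw)
calc_sources = Counter(q["message"].get("calcSource") for q in quotes)

print("quotes:", len(quotes))
print("other types:", [o["header"]["mTyp"] for o in other])
print("calc sources:", dict(calc_sources))
```

<p>Running the same split on a real pull shows how many usable rows you got and how they divide between calc sources before any filtering is applied.</p>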

<h2 id="heading-normalizing-the-response-into-rows">Normalizing the Response Into Rows</h2>
<p>Right now, raw is a list of nested message objects. That format is fine for transport, but it's not something you can store or query directly. So now, we turn each LiveImpliedQuote message into one flat row with a consistent schema.</p>
<pre><code class="language-python">def make_option_key(okey: dict) -&gt; str:
    return "|".join([
        str(okey.get("tk")),
        str(okey.get("dt")),
        str(okey.get("xx")),
        str(okey.get("cp")),
        str(okey.get("at")),
        str(okey.get("ts")),
    ])

def normalize_liq(raw: list, asof_ts: str, keep_calc_source: str = "Loop") -&gt; pd.DataFrame:
    rows = []

    for row in raw:
        if row.get("header", {}).get("mTyp") != "LiveImpliedQuote":
            continue

        m = row.get("message", {})
        if keep_calc_source and m.get("calcSource") != keep_calc_source:
            continue

        pkey = m.get("pkey", {})
        okey = pkey.get("okey", {})
        if not okey:
            continue

        s_vol = m.get("sVol")
        if s_vol is None or s_vol == 0:
            continue

        o_bid = m.get("oBid", 0) or 0
        o_ask = m.get("oAsk", 0) or 0

        quote_ok = int(not (o_bid == 0 and o_ask == 0))

        rows.append({
            "asof_ts": asof_ts,
            "option_key": make_option_key(okey),

            "symbol": okey.get("tk"),
            "expiry": okey.get("dt"),
            "strike": okey.get("xx"),
            "cp": okey.get("cp"),

            "calc_source": m.get("calcSource"),
            "u_prc": m.get("uPrc"),
            "years": m.get("years"),
            "rate": m.get("rate"),

            "s_vol": s_vol,
            "atm_vol": m.get("atmVol"),
            "s_mark": m.get("sMark"),

            "o_bid": o_bid,
            "o_ask": o_ask,
            "o_bid_iv": m.get("oBidIv"),
            "o_ask_iv": m.get("oAskIv"),
            "quote_ok": quote_ok,

            "delta": m.get("de"),
            "gamma": m.get("ga"),
            "theta": m.get("th"),
            "vega": m.get("ve"),

            "src_ts": m.get("timestamp"),
        })

    df = pd.DataFrame(rows)
    if df.empty:
        return df

    df = (
        df.sort_values("src_ts")
          .drop_duplicates(subset=["option_key"], keep="last")
          .reset_index(drop=True)
    )
    return df

asof_ts = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")
snapshot_df = normalize_liq(raw, asof_ts)

print("snapshot rows:", len(snapshot_df))
print("quote_ok distribution:", snapshot_df["quote_ok"].value_counts().to_dict() if not snapshot_df.empty else {})
snapshot_df.head()
</code></pre>
<p>There are three practical decisions baked into this normalization step:</p>
<ul>
<li><p>First, we build a stable <code>option_key</code> from the option identifier so we have a consistent primary key for the latest table.</p>
</li>
<li><p>Second, we keep only <code>calcSource="Loop"</code>. LiveImpliedQuote can include both Tick and Loop records. Loop records tend to be more consistent for snapshot-style analysis because the underlying reference price is stable across the surface.</p>
</li>
<li><p>Third, we avoid aggressive filtering. In this dataset, the top-of-book bid and ask fields can be zero even when the analytics fields are populated. So instead of dropping those rows, we store a <code>quote_ok</code> flag and keep the record. That keeps the pipeline usable while still making it obvious later which rows had live quotes.</p>
</li>
</ul>
<p>This is the output:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/7d04a9e8-d3ec-4737-a0a7-64cb3888380c.png" alt="LiveImpliedQuote snapshot" style="display:block;margin:0 auto" width="1500" height="496" loading="lazy">

<p>At this point, one row represents one option contract snapshot. The fact that <code>quote_ok</code> is 0 across the board simply means bid and ask are not populated in this slice, even though surface IV, Greeks, and other analytics fields are present. That's still useful for building a monitoring database, because the core idea here is tracking the evolution of analytics over time, not reconstructing executable markets.</p>
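<p>The last step of the normalization (sort by <code>src_ts</code>, then keep the last row per <code>option_key</code>) is easy to misread, so here is the same idea on three made-up rows. The keys and timestamp strings are hypothetical, and the sketch assumes the source timestamps are strings that sort chronologically.</p>

```python
import pandas as pd

# Made-up rows: hypothetical contract "A" appears twice in the same pull.
rows = pd.DataFrame([
    {"option_key": "A", "src_ts": "2026-04-14 18:09:28.100", "s_vol": 0.510},
    {"option_key": "B", "src_ts": "2026-04-14 18:09:28.200", "s_vol": 0.480},
    {"option_key": "A", "src_ts": "2026-04-14 18:09:28.900", "s_vol": 0.515},
])

# Sort oldest to newest, then keep only the newest row for each contract.
deduped = (
    rows.sort_values("src_ts")
        .drop_duplicates(subset=["option_key"], keep="last")
        .reset_index(drop=True)
)
print(deduped)
```

<p>Each contract ends up with exactly one row per snapshot, and that row carries the most recent value for the contract, which is what the primary key on the latest table relies on.</p>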
<h2 id="heading-writing-to-the-database">Writing to the Database</h2>
<p>Now that we have a clean snapshot DataFrame, the job is to persist it in two places.</p>
<ul>
<li><p>History table: Append everything. This is the audit log.</p>
</li>
<li><p>Latest table: Upsert by <code>option_key</code>. This is the fast "current view."</p>
</li>
</ul>
<p>This separation is what makes the database useful. History lets you reconstruct any past snapshot. Latest lets you answer "what does the surface look like right now" without scanning time series.</p>
<pre><code class="language-python">def safe_add_column(table: str, col: str, col_type: str, path: str = db_path):
    conn = get_conn(path)
    cur = conn.cursor()
    existing = [r[1] for r in cur.execute(f"PRAGMA table_info({table});").fetchall()]
    if col not in existing:
        cur.execute(f"ALTER TABLE {table} ADD COLUMN {col} {col_type};")
    conn.commit()
    conn.close()

safe_add_column("implied_quote_history", "quote_ok", "INTEGER")
safe_add_column("implied_quote_latest", "quote_ok", "INTEGER")

def write_snapshot_to_db(df: pd.DataFrame, path: str = db_path) -&gt; tuple[int, int]:
    if df.empty:
        return 0, 0

    conn = get_conn(path)
    cur = conn.cursor()

    cols = [
        "asof_ts",
        "option_key","symbol","expiry","strike","cp",
        "calc_source","u_prc","years","rate",
        "s_vol","atm_vol","s_mark",
        "o_bid","o_ask","o_bid_iv","o_ask_iv",
        "delta","gamma","theta","vega",
        "quote_ok","src_ts"
    ]

    # work on a copy so the caller's DataFrame is not modified
    insert_df = df.copy()
    for c in cols:
        if c not in insert_df.columns:
            insert_df[c] = None

    insert_df = insert_df[cols]

    cur.executemany(
        """
        insert into implied_quote_history (
            asof_ts,
            option_key, symbol, expiry, strike, cp,
            calc_source, u_prc, years, rate,
            s_vol, atm_vol, s_mark,
            o_bid, o_ask, o_bid_iv, o_ask_iv,
            delta, gamma, theta, vega,
            quote_ok, src_ts
        ) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """,
        insert_df.itertuples(index=False, name=None)
    )
    history_inserted = cur.rowcount

    cur.executemany(
        """
        insert into implied_quote_latest (
            option_key,
            last_asof_ts, symbol, expiry, strike, cp,
            calc_source, u_prc, years, rate,
            s_vol, atm_vol, s_mark,
            o_bid, o_ask, o_bid_iv, o_ask_iv,
            delta, gamma, theta, vega,
            quote_ok, src_ts
        ) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        on conflict(option_key) do update set
            last_asof_ts=excluded.last_asof_ts,
            symbol=excluded.symbol,
            expiry=excluded.expiry,
            strike=excluded.strike,
            cp=excluded.cp,
            calc_source=excluded.calc_source,
            u_prc=excluded.u_prc,
            years=excluded.years,
            rate=excluded.rate,
            s_vol=excluded.s_vol,
            atm_vol=excluded.atm_vol,
            s_mark=excluded.s_mark,
            o_bid=excluded.o_bid,
            o_ask=excluded.o_ask,
            o_bid_iv=excluded.o_bid_iv,
            o_ask_iv=excluded.o_ask_iv,
            delta=excluded.delta,
            gamma=excluded.gamma,
            theta=excluded.theta,
            vega=excluded.vega,
            quote_ok=excluded.quote_ok,
            src_ts=excluded.src_ts
        """,
        insert_df[[
            "option_key","asof_ts","symbol","expiry","strike","cp",
            "calc_source","u_prc","years","rate",
            "s_vol","atm_vol","s_mark",
            "o_bid","o_ask","o_bid_iv","o_ask_iv",
            "delta","gamma","theta","vega",
            "quote_ok","src_ts"
        ]].itertuples(index=False, name=None)
    )
    latest_upserted = cur.rowcount

    conn.commit()
    conn.close()
    return history_inserted, latest_upserted

hist_n, latest_n = write_snapshot_to_db(snapshot_df)
print("history inserted:", hist_n)
print("latest upserted:", latest_n)
</code></pre>
<p>We batch write using <code>executemany</code> so inserts are fast even with thousands of option rows. The history insert is straightforward. The latest write uses a SQLite upsert keyed on <code>option_key</code>, which means if the contract already exists in the latest table, its fields are overwritten with the newest snapshot.</p>
<p>You should see:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/8fdbdeb1-a4f2-434d-a3c7-99f44e51ec5d.png" alt="History inserted: 1852, latest upserted: 1852" style="display:block;margin:0 auto" width="608" height="137" loading="lazy">

<p>After the first write, both tables have the same number of rows. That's expected, because there is only one snapshot in history so far. Once we start polling multiple snapshots, the history table will grow every cycle, while the latest table will stay roughly flat and continue updating in place.</p>
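<p>To see the "history grows, latest stays flat" behaviour without calling the API, here is a minimal sketch with two cut-down tables in an in-memory database. The table names, contracts, and values are made up, and <code>write_poll</code> is a hypothetical helper. The upsert syntax is the same <code>on conflict ... do update</code> form used above, which needs SQLite 3.24 or later.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table history (asof_ts text, option_key text, s_vol real);
create table latest (option_key text primary key, last_asof_ts text, s_vol real);
""")

def write_poll(asof_ts: str, quotes: dict) -> None:
    rows = [(asof_ts, key, vol) for key, vol in quotes.items()]
    # History: always append.
    conn.executemany("insert into history values (?, ?, ?)", rows)
    # Latest: insert new contracts, overwrite existing ones.
    conn.executemany(
        """
        insert into latest (option_key, last_asof_ts, s_vol) values (?, ?, ?)
        on conflict(option_key) do update set
            last_asof_ts = excluded.last_asof_ts,
            s_vol = excluded.s_vol
        """,
        [(key, ts, vol) for ts, key, vol in rows],
    )
    conn.commit()

# Two made-up polls; the second one does not include contract "B".
write_poll("2026-04-14T18:09:29Z", {"A": 0.510, "B": 0.480})
write_poll("2026-04-14T18:09:41Z", {"A": 0.515})

history_n = conn.execute("select count(*) from history").fetchone()[0]
latest_rows = conn.execute(
    "select option_key, last_asof_ts, s_vol from latest order by option_key"
).fetchall()
print("history rows:", history_n)
print("latest rows:", latest_rows)
conn.close()
```

<p>Note what happens to contract "B": it keeps its older <code>last_asof_ts</code> because the second poll never touched it. "Latest" here means latest per contract, so it is worth filtering on <code>last_asof_ts</code> when you need a view where every row comes from the same poll.</p>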
<h2 id="heading-running-a-short-polling-capture">Running a Short Polling Capture</h2>
<p>At this point, the pipeline works end-to-end for a single snapshot. The whole point of the database, though, is to turn live analytics into a time series. So we run a short capture window and store multiple snapshots back-to-back.</p>
<p>This isn't meant to be a production scheduler. It's just a simple loop that runs for a couple of minutes, polls roughly every 10 seconds, timestamps the snapshot, and writes it to both tables.</p>
<pre><code class="language-python">def poll_and_write(symbol: str, duration_s: int = poll_duration_s, interval_s: int = poll_interval_s):
    start = time.time()
    polls = 0
    total_hist = 0

    while time.time() - start &lt; duration_s:
        asof_ts = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")

        raw = fetch_live_implied_quote(symbol, limit=limit)
        df = normalize_liq(raw, asof_ts)

        hist_n, latest_n = write_snapshot_to_db(df)
        polls += 1
        total_hist += hist_n

        print(f"[{polls}] {asof_ts} snapshot_rows={len(df)} history+={hist_n} latest_upsert={latest_n}")
        time.sleep(interval_s)

    print(f"done. polls={polls}, total_history_added={total_hist}")

poll_and_write(symbol, duration_s=120, interval_s=10)
</code></pre>
<p>Each loop iteration represents one snapshot. We generate a UTC timestamp (asof_ts), pull the latest batch from LiveImpliedQuote, normalize it into rows, then write it into the database. The history table accumulates every snapshot. The latest table overwrites by <code>option_key</code>, so it always represents the most recent view.</p>
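<p>The loop above stops on the first failed request and its cadence drifts by however long each poll takes. If you want something slightly sturdier, one option is sketched below. It is not part of the original pipeline: <code>run_polls</code>, <code>FakeClock</code>, and <code>flaky_poll</code> are hypothetical names, and the fake clock exists only so the example runs instantly.</p>

```python
import time

def run_polls(poll_once, duration_s: float, interval_s: float,
              clock=time.monotonic, sleep=time.sleep) -> dict:
    """Call poll_once() on a fixed cadence; log failures instead of stopping."""
    start = clock()
    next_due = start
    stats = {"ok": 0, "failed": 0}
    while clock() - start < duration_s:
        try:
            poll_once()
            stats["ok"] += 1
        except Exception as exc:  # broad on purpose: one bad poll should not end the capture
            stats["failed"] += 1
            print(f"poll failed: {exc!r}")
        # Schedule against the start time so slow polls do not stretch the cadence.
        next_due += interval_s
        sleep(max(0.0, next_due - clock()))
    return stats

class FakeClock:
    """Simulated clock so the demo does not actually wait."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

fake = FakeClock()
calls = {"n": 0}

def flaky_poll():
    calls["n"] += 1
    fake.now += 2.0  # pretend each poll takes 2 seconds
    if calls["n"] == 2:
        raise RuntimeError("simulated timeout")

stats = run_polls(flaky_poll, duration_s=30, interval_s=10, clock=fake, sleep=fake.sleep)
print(stats, "calls:", calls["n"])
```

<p>In real use you would pass a function that fetches, normalizes, and writes one snapshot, and leave <code>clock</code> and <code>sleep</code> at their defaults.</p>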
<p>One practical detail is worth calling out. The API call is capped by limit, so you're not guaranteed to receive an identical set of strikes and expiries every poll. That's why <code>snapshot_rows</code> can vary between iterations.</p>
<p>In production, you usually stabilize the slice by pinning specific expiries and a strike band or by interpolating IV to fixed moneyness points. For this tutorial, we're keeping ingestion simple and focusing on the database pattern and the monitoring queries it enables.</p>
<p>You should see per-poll telemetry like this:</p>
<pre><code class="language-plaintext">[1] 2026-04-14T18:09:29Z snapshot_rows=1454 history+=1454 latest_upsert=1454
...
done. polls=9, total_history_added=12806
</code></pre>
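<p>This confirms the database is building a time series. Over nine polls, you stored 12,806 option rows in history. You get nine polls rather than twelve in the 120-second window because each iteration spends time on the request and the write before it sleeps for 10 seconds. The latest table is updated each time, but it doesn't grow in the same way as history because it overwrites per contract key.</p>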
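<p>The idea of stabilizing the slice by pinning expiries and a strike band, mentioned earlier in this section, can be sketched as a simple filter applied to each normalized snapshot before writing. This is an illustration only: <code>pin_slice</code> is a hypothetical helper, the 20% band is an arbitrary choice, and the rows are made up.</p>

```python
import pandas as pd

def pin_slice(df: pd.DataFrame, expiries: list, band: float = 0.2) -> pd.DataFrame:
    """Keep only pinned expiries and strikes within +/- band of each row's u_prc."""
    in_expiry = df["expiry"].isin(expiries)
    rel = df["strike"] / df["u_prc"]
    in_band = rel.between(1.0 - band, 1.0 + band)  # inclusive on both ends
    return df[in_expiry & in_band].reset_index(drop=True)

# Made-up snapshot rows using the same column names as the normalized DataFrame.
snapshot = pd.DataFrame({
    "expiry": ["2026-11-20", "2026-11-20", "2026-06-19", "2026-11-20"],
    "strike": [300.0, 500.0, 350.0, 400.0],
    "u_prc":  [350.0, 350.0, 350.0, 350.0],
})

pinned = pin_slice(snapshot, expiries=["2026-11-20"], band=0.2)
print(pinned)
```

<p>With a filter like this in front of the write step, each poll stores a comparable set of contracts, which makes snapshot-to-snapshot comparisons less sensitive to which rows the capped response happened to return.</p>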
<p>Starting with the next section, we'll stop writing and start querying.</p>
<h2 id="heading-analysis-smile-reconstruction-from-the-database">Analysis: Smile Reconstruction From the Database</h2>
<p>Once the data is in <code>implied_quote_history</code>, the workflow flips. We stop thinking in terms of "API responses" and start thinking in terms of "queries." This section does two things. First, it picks an expiry that has enough rows to be representative. Then it reconstructs the call-side volatility smile for that expiry across a few timestamps.</p>
<h3 id="heading-pick-an-expiry-with-good-coverage">Pick an Expiry with Good Coverage</h3>
<p>If you pick an expiry that only appears sporadically in the captured snapshots, the smile plot will be misleading. So we start by looking at which expiries have the most rows in the history table.</p>
<pre><code class="language-python">conn = get_conn()

expiry_counts = pd.read_sql_query(
    """
    select expiry, count(*) as n
    from implied_quote_history
    where symbol = ?
    group by expiry
    order by n desc
    limit 10
    """,
    conn,
    params=(symbol,)
)

conn.close()
expiry_counts
</code></pre>
<p>This query scans only the history table, filters to TSLA, and counts how many option rows exist per expiry across the capture window. We keep the top 10 and pick the first one as the expiry we'll reconstruct.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/2f7b897f-0a4f-4b1a-826e-0fee6b19f2bd.png" alt="Expiry-wise coverage" style="display:block;margin:0 auto" width="373" height="724" loading="lazy">

<p>The expiry date <code>2026-11-20</code> has the highest count.</p>
<p>Here, the count doesn't mean this expiry is "best" in any trading sense. It just means it showed up most consistently in the captured data. That makes it a practical choice for a clean smile comparison.</p>
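<p>A raw row count can be dominated by an expiry that has many strikes but shows up in only a few snapshots. If consistency across snapshots is what you care about, one variation is to count distinct <code>asof_ts</code> values per expiry as well. The sketch below runs that query against a made-up, in-memory table; the rows are invented for illustration.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table implied_quote_history (asof_ts text, symbol text, expiry text, strike real)"
)

snapshots = ("2026-04-14T18:09:29Z", "2026-04-14T18:09:41Z", "2026-04-14T18:09:53Z")
# Made-up capture: one expiry has 10 strikes but appears in a single snapshot,
# the other has 3 strikes and appears in all three snapshots.
rows = [(snapshots[0], "TSLA", "2026-06-19", float(k)) for k in range(100, 200, 10)]
rows += [(ts, "TSLA", "2026-11-20", float(k)) for ts in snapshots for k in (300, 350, 400)]
conn.executemany("insert into implied_quote_history values (?, ?, ?, ?)", rows)

coverage = conn.execute(
    """
    select expiry,
           count(*) as n_rows,
           count(distinct asof_ts) as n_snapshots
    from implied_quote_history
    where symbol = ?
    group by expiry
    order by n_snapshots desc, n_rows desc
    """,
    ("TSLA",),
).fetchall()
print(coverage)
conn.close()
```

<p>Here the expiry with fewer total rows ranks first because it is present in every snapshot, which is the property that matters for comparing smiles over time.</p>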
<h3 id="heading-rebuild-the-smile-across-snapshots">Rebuild the Smile Across Snapshots</h3>
<p>Now we query the stored history for one expiry, keep only calls, and plot surface IV (s_vol) against strike for multiple snapshot timestamps.</p>
<pre><code class="language-python">chosen_expiry = "2026-11-20" 

conn = get_conn()
smile = pd.read_sql_query(
    """
    select asof_ts, strike, cp, s_vol, u_prc
    from implied_quote_history
    where symbol = ? and expiry = ?
    """,
    conn,
    params=(symbol, chosen_expiry)
)
conn.close()

smile_calls = smile[smile["cp"] == "Call"].copy()

ts_list = sorted(smile_calls["asof_ts"].unique())
pick = [ts_list[0], ts_list[len(ts_list)//2], ts_list[-1]]

plt.figure(figsize=(9,5))
for ts in pick:
    g = smile_calls[smile_calls["asof_ts"] == ts].sort_values("strike")
    plt.plot(g["strike"], g["s_vol"], label=ts)

plt.title(f"{symbol} Vol Smile (Calls) | Expiry {chosen_expiry} | 3 snapshots")
plt.xlabel("Strike")
plt.ylabel("Implied Vol (s_vol)")
plt.grid(True)
plt.legend()
plt.show()
</code></pre>
<p>We pull all rows for the chosen expiry from history, then filter to calls so we don't mix put and call shapes. To keep the plot readable, we only plot three snapshots: the first, the middle, and the last.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/84416f80-9253-4f18-8da4-ea814e174987.png" alt="TSLA vol smile (calls)" style="display:block;margin:0 auto" width="778" height="475" loading="lazy">

<p>Over a short capture window, the smiles often overlap heavily. That doesn't mean the system isn't working. It usually means the surface didn't move much in those two minutes. The important part is that we can reconstruct and compare it purely from stored history.</p>
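<p>When the lines overlap this much, a number is often easier to read than a chart. One way to get it is to pivot the stored rows into a strike-by-snapshot grid and subtract the first snapshot from the last. The sketch below uses made-up values and assumes one row per strike per snapshot, since <code>pivot</code> raises an error on duplicates.</p>

```python
import pandas as pd

# Made-up call-side rows for two snapshots of one expiry.
smile_demo = pd.DataFrame({
    "asof_ts": ["2026-04-14T18:09:29Z"] * 3 + ["2026-04-14T18:11:20Z"] * 3,
    "strike": [300.0, 350.0, 400.0] * 2,
    "s_vol": [0.520, 0.500, 0.490, 0.523, 0.500, 0.489],
})

# Rows become strikes, columns become snapshot timestamps (sorted ascending).
wide = smile_demo.pivot(index="strike", columns="asof_ts", values="s_vol")
first_ts, last_ts = wide.columns[0], wide.columns[-1]

change = (wide[last_ts] - wide[first_ts]).rename("s_vol_change")
max_abs_move = change.abs().max()
moved_most = change.abs().idxmax()

print(change)
print("largest absolute move:", round(max_abs_move, 4), "at strike", moved_most)
```

<p>Strikes that are missing from either snapshot show up as <code>NaN</code> in the difference, which also makes gaps in coverage visible.</p>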
<h3 id="heading-zoom-in-around-spot">Zoom-In Around Spot</h3>
<p>The full-range plot is useful for shape, but it can hide small shifts near the region people actually care about. So we zoom to a band around the underlying price.</p>
<pre><code class="language-python">s0 = float(smile_calls["u_prc"].dropna().median())
low, high = s0 * 0.6, s0 * 1.4

plt.figure(figsize=(9,5))
for ts in pick:
    g = smile_calls[smile_calls["asof_ts"] == ts].sort_values("strike")
    g = g[(g["strike"] &gt;= low) &amp; (g["strike"] &lt;= high)]
    plt.plot(g["strike"], g["s_vol"], label=ts)

plt.title(f"{symbol} Vol Smile (Calls) | Expiry {chosen_expiry} | zoomed")
plt.xlabel("Strike")
plt.ylabel("Implied Vol (s_vol)")
plt.grid(True)
plt.legend(fontsize=8)
plt.show()
</code></pre>
<p>We take a robust spot proxy (the median of the stored <code>u_prc</code> values) and then keep strikes within 40% of it on either side. The goal is not precision. It's to make the chart readable and show whether the near-ATM region is drifting.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/107de4b4-7b40-4e79-a38b-fac96cb11b26.png" alt="TSLA vol smile (calls)  -  zoomed-in" style="display:block;margin:0 auto" width="781" height="475" loading="lazy">

<p>Here, even small changes become visible. This is also why storing history matters. If you only looked at one snapshot in isolation, these shifts would be easy to miss or dismiss.</p>
<h2 id="heading-analysis-atm-iv-and-skew-over-time">Analysis: ATM IV and Skew Over Time</h2>
<p>A full smile plot is useful, but it's not always the fastest way to monitor a surface. In practice, teams usually track a few summary numbers per expiry so they can spot changes quickly, then drill down only when something looks off.</p>
<p>Here we reduce each stored snapshot into two metrics for a single expiry.</p>
<ul>
<li><p>ATM IV: Surface IV at the strike closest to spot.</p>
</li>
<li><p>Skew proxy: Surface IV at 0.9 times spot minus surface IV at 1.1 times spot, using the closest available strikes.</p>
</li>
</ul>
<pre><code class="language-python">chosen_expiry = "2026-11-20"

conn = get_conn()
df = pd.read_sql_query(
    """
    select asof_ts, strike, s_vol, u_prc
    from implied_quote_history
    where symbol = ? and expiry = ? and cp = 'Call'
    """,
    conn,
    params=(symbol, chosen_expiry)
)
conn.close()

df["strike"] = df["strike"].astype(float)
df["s_vol"] = df["s_vol"].astype(float)

def closest_iv(grp: pd.DataFrame, target_strike: float):
    g = grp.iloc[(grp["strike"] - target_strike).abs().argsort()[:1]]
    return float(g["s_vol"].iloc[0]), float(g["strike"].iloc[0])

rows = []
for ts, grp in df.groupby("asof_ts"):
    spot = float(grp["u_prc"].dropna().median())
    atm_target = spot
    down_target = spot * 0.9
    up_target = spot * 1.1

    atm_iv, atm_k = closest_iv(grp, atm_target)
    down_iv, down_k = closest_iv(grp, down_target)
    up_iv, up_k = closest_iv(grp, up_target)

    rows.append({
        "asof_ts": ts,
        "spot": spot,
        "atm_strike": atm_k,
        "atm_iv": atm_iv,
        "k90": down_k,
        "iv_90": down_iv,
        "k110": up_k,
        "iv_110": up_iv,
        "skew_90_110": down_iv - up_iv
    })

metrics = pd.DataFrame(rows).sort_values("asof_ts").reset_index(drop=True)
metrics
</code></pre>
<p>We query the history table for one expiry and keep only calls, then group by snapshot timestamp. For each snapshot, we use the median <code>u_prc</code> as a spot proxy and pick the closest available strike to spot. That gives ATM IV. We repeat the same approach for 0.9 times spot and 1.1 times spot and compute a skew proxy as the difference.</p>
<p>The table also stores the actual strikes used (atm_strike, k90, k110). Options strikes are discrete, so the nearest strike can change between snapshots. Keeping the chosen strikes visible makes the metric explainable when it moves.</p>
<p>The output is a table with one row per snapshot timestamp and the computed metrics.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/5590b162-5fe7-4713-8f56-edc4c6171ab2.png" alt="ATM IV, skew proxy metrics" style="display:block;margin:0 auto" width="1000" height="441" loading="lazy">

<p>Now that we have a clean time series table, we can visualize the two metrics. First, ATM IV. Then, the skew proxy.</p>
<pre><code class="language-python">plt.plot(metrics["asof_ts"], metrics["atm_iv"])
plt.title(f"{symbol} ATM IV over time | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.ylabel("ATM IV (s_vol)")
plt.grid(True)
plt.show()

plt.plot(metrics["asof_ts"], metrics["skew_90_110"])
plt.title(f"{symbol} Skew proxy (IV@0.9S - IV@1.1S) | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.ylabel("Skew proxy")
plt.grid(True)
plt.show()
</code></pre>
<p>Here is the first chart, ATM IV over time.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/0df9b0ff-e02f-4c6b-b4ec-175ddc46522c.png" alt="TSLA ATM IV over time" style="display:block;margin:0 auto" width="831" height="453" loading="lazy">

<p>ATM IV tends to move slowly over short windows unless there is a sharp repricing event. In this run, it stays fairly stable, which is a realistic outcome for a short capture. The value here is that the database turns "fairly stable" into something you can quantify and compare later, rather than a vague impression.</p>
<p>Here is the second chart, Skew proxy over time.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/f90243ee-6039-4d7e-94ed-d248eaaf9722.png" alt="TSLA skew proxy" style="display:block;margin:0 auto" width="831" height="453" loading="lazy">

<p>The skew proxy is more sensitive because it's based on wing points. If it changes, it usually means the downside is being repriced differently from the upside for that expiry. One nuance is that the nearest available strike can change between snapshots, which can create step-like moves even when the surface isn't moving dramatically. That's why we keep k90 and k110 in the metrics table. It keeps the skew plot explainable.</p>
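<p>One way to reduce those step moves is to interpolate IV at the exact 0.9 and 1.1 moneyness targets instead of snapping to the nearest listed strike. The snippet below is a minimal, pure-Python sketch of that idea, not part of the pipeline above: <code>interp_iv</code> and the sample strikes and vols are hypothetical, and it assumes simple linear interpolation in strike with flat extrapolation beyond the listed range.</p>
<pre><code class="language-python">from bisect import bisect_left


def interp_iv(strikes, ivs, target):
    """Linearly interpolate IV at `target`; hold the end values flat outside the strike range.

    Hypothetical helper for illustration. Needs at least two distinct strikes.
    """
    pairs = sorted(zip(strikes, ivs))
    ks = [k for k, _ in pairs]
    vs = [v for _, v in pairs]
    # Right-hand bracket index, kept inside [1, len(ks) - 1]
    i = min(max(bisect_left(ks, target), 1), len(ks) - 1)
    k0, k1 = ks[i - 1], ks[i]
    # Interpolation weight, clamped to [0, 1] so we never extrapolate
    w = min(max((target - k0) / (k1 - k0), 0.0), 1.0)
    return vs[i - 1] + w * (vs[i] - vs[i - 1])


# Made-up snapshot: three call strikes and their surface vols
strikes = [300.0, 350.0, 400.0]
ivs = [0.50, 0.45, 0.42]
spot = 360.0

skew = interp_iv(strikes, ivs, 0.9 * spot) - interp_iv(strikes, ivs, 1.1 * spot)
print(round(skew, 4))
</code></pre>
<p>Because the targets move continuously with spot, a skew series built this way should only change when the surface or spot changes, at the cost of assuming the smile is roughly linear between adjacent strikes.</p>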
<h2 id="heading-alert-style-thresholds">Alert-Style Thresholds</h2>
<p>Once you have a metrics table per snapshot, adding a monitoring layer is straightforward. The idea isn't to generate trades. It's to flag when the surface moves enough that someone should look closer.</p>
<p>Here we do two checks:</p>
<ul>
<li><p>ATM IV change alert: Flag if ATM IV changes more than a small threshold between snapshots.</p>
</li>
<li><p>Skew change alert: Flag if the skew proxy changes more than a threshold between snapshots.</p>
</li>
</ul>
<pre><code class="language-python">alerts = metrics.copy()

alerts["atm_iv_change"] = alerts["atm_iv"].diff()
alerts["skew_change"] = alerts["skew_90_110"].diff()

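atm_thresh = 0.002   # minimum ATM IV change (in s_vol units) that raises a flag
skew_thresh = 0.003  # minimum skew proxy change that raises a flag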

alerts["atm_alert"] = alerts["atm_iv_change"].abs() &gt;= atm_thresh
alerts["skew_alert"] = alerts["skew_change"].abs() &gt;= skew_thresh

alerts[[
    "asof_ts",
    "atm_iv", "atm_iv_change", "atm_alert",
    "skew_90_110", "skew_change", "skew_alert",
    "atm_strike", "k90", "k110"
]]
</code></pre>
<p>We take the per-snapshot metrics table and compute first differences. Then we compare those changes to thresholds and store boolean flags. The output table keeps both the metrics and the strikes used for the calculations, so any alert is explainable rather than a black box.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/b6805adc-90f6-4c57-8dee-aa6e0ec4d724.png" alt="Alerts dataframe" style="display:block;margin:0 auto" width="1500" height="546" loading="lazy">

<p>In this run, the ATM IV alerts are all false, while the skew alert triggers once.</p>
<p>The skew alert fires because the skew proxy jumps by more than the threshold between two snapshots. This is explainable. If you look at the table, you can see that the strikes used for the proxy changed around the same time (k90 shifts from 340 to 315). Because strikes are discrete, nearest-strike metrics can step even when the surface is not moving dramatically.</p>
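<p>A small check can make that explanation systematic by marking snapshots where the strikes behind the proxy changed. The sketch below is illustrative only: it works on plain dictionaries with the same <code>k90</code> and <code>k110</code> keys as the metrics table, and the helper name and sample values are assumptions.</p>
<pre><code class="language-python">def strike_change_flags(rows):
    """Return one flag per snapshot: True when k90 or k110 differs from the previous snapshot."""
    flags = []
    prev = None
    for row in rows:
        current = (row["k90"], row["k110"])
        # The first snapshot has nothing to compare against, so it is never flagged
        flags.append(prev is not None and current != prev)
        prev = current
    return flags


# Made-up snapshots: k90 steps from 340 to 315 on the third poll
snapshots = [
    {"asof_ts": "t1", "k90": 340.0, "k110": 415.0},
    {"asof_ts": "t2", "k90": 340.0, "k110": 415.0},
    {"asof_ts": "t3", "k90": 315.0, "k110": 415.0},
]
print(strike_change_flags(snapshots))  # [False, False, True]
</code></pre>
<p>An alert that coincides with a <code>True</code> flag is more likely a strike-step artifact, while an alert without one points to an actual move in the surface.</p>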
<p>To make this easier to read, we also plot the two series and mark alert points.</p>
<pre><code class="language-python">plt.plot(alerts["asof_ts"], alerts["atm_iv"])
for i, r in alerts[alerts["atm_alert"]].iterrows():
    plt.scatter(r["asof_ts"], r["atm_iv"],  s=30, edgecolors="r", alpha=0.6, linewidth=2)
plt.title(f"{symbol} ATM IV with alerts | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.grid(True)
plt.show()

plt.plot(alerts["asof_ts"], alerts["skew_90_110"])
for i, r in alerts[alerts["skew_alert"]].iterrows():
    plt.scatter(r["asof_ts"], r["skew_90_110"], s=30, edgecolors="r", alpha=0.6, linewidth=2)
plt.title(f"{symbol} Skew proxy with alerts | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.grid(True)
plt.show()
</code></pre>
<p>Both plots use the same pattern. Plot the metric as a line, then overlay a marker on any timestamp where the corresponding alert flag is true. This makes it obvious when something crossed the threshold.</p>
<p>The chart below shows the skew proxy with alerts.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f362fe21017f7317167b14c/eff87263-68f0-4132-935d-bdf148e73c82.png" alt="TSLA skew proxy with alerts" style="display:block;margin:0 auto" width="831" height="453" loading="lazy">

<p>This chart shows one alert marker, which matches what we saw in the table.</p>
<p>The ATM IV plot isn't shown because it has no alert points.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>In this walkthrough, we used SpiderRock MLink's LiveImpliedQuote feed for TSLA and turned it into a small internal database you can query. We stored every snapshot in an append-only history table, maintained a latest view keyed by a stable option identifier, then used that stored data to rebuild a smile, track ATM surface IV and a simple skew proxy, and add a basic alert rule on top.</p>
<p>This fits well in B2B workflows because it turns live analytics into something operational: a dataset you can audit, replay, and monitor. The same pattern works whether you're building an internal dashboard, running routine surface checks for a desk, or doing a quick post-event review without relying on screenshots and one-off notebook runs.</p>
<p>If you want to extend it, the most practical next steps are longer capture windows, tracking multiple symbols, and moving from SQLite to Postgres once the data volume grows. If metric stability becomes important, you can also standardize the slice you track per poll or interpolate IV to fixed moneyness points so skew measures don't step when nearest strikes change.</p>
<p>With that being said, you've reached the end of the article. Hope you learned something new and useful.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use PostgreSQL as a Cache, Queue, and Search Engine ]]>
                </title>
                <description>
                    <![CDATA[ "Just use Postgres" has been circulating as advice for years, but most articles arguing for it are opinion pieces. I wanted hard numbers. So I built a benchmark suite that pits vanilla PostgreSQL agai ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-postgresql-as-a-cache-queue-and-search-engine/</link>
                <guid isPermaLink="false">69e7accfe43672781470ff97</guid>
                
                    <category>
                        <![CDATA[ PostgreSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ backend ]]>
                    </category>
                
                    <category>
                        <![CDATA[ performance ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Aaron Yong ]]>
                </dc:creator>
                <pubDate>Tue, 21 Apr 2026 16:58:55 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/6fcdd3c0-eead-42a7-b2f0-cf4c6a3d06dc.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>"Just use Postgres" has been circulating as advice for years, but most articles arguing for it are opinion pieces. I wanted hard numbers.</p>
<p>So I built a benchmark suite that pits vanilla PostgreSQL against a feature-optimized PostgreSQL instance — measuring caching, message queues, full-text search, and pub/sub under controlled conditions.</p>
<p>In this article, you'll learn how to use PostgreSQL's built-in features for caching, job queues, full-text search, and pub/sub. You'll see actual benchmark results (latency percentiles, throughput, and error rates) comparing naive PostgreSQL patterns against optimized ones, and understand where PostgreSQL's limits are so you can decide whether you really need that extra service in your stack.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-the-setup">The Setup</a></p>
</li>
<li><p><a href="#heading-benchmark-1-caching-with-unlogged-tables">Benchmark 1: Caching with UNLOGGED Tables</a></p>
</li>
<li><p><a href="#heading-benchmark-2-job-queues-with-skip-locked">Benchmark 2: Job Queues with SKIP LOCKED</a></p>
</li>
<li><p><a href="#heading-benchmark-3-full-text-search-with-tsvector">Benchmark 3: Full-Text Search with tsvector</a></p>
</li>
<li><p><a href="#heading-benchmark-4-pubsub-with-listennotify">Benchmark 4: Pub/Sub with LISTEN/NOTIFY</a></p>
</li>
<li><p><a href="#heading-the-combined-workload-the-honest-test">The Combined Workload: The Honest Test</a></p>
</li>
<li><p><a href="#heading-what-i-learned">What I Learned</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along or reproduce the benchmarks, you'll need:</p>
<ul>
<li><p>Docker and Docker Compose</p>
</li>
<li><p>Node.js 20+ (for the Express TypeScript API layer)</p>
</li>
<li><p><a href="https://k6.io/">k6</a> for load testing</p>
</li>
<li><p>Basic familiarity with SQL and PostgreSQL</p>
</li>
</ul>
<p>The full benchmark project is <a href="https://github.com/aaronhsyong2/pg-stack-benchmark">open source on GitHub</a> — you can clone it and run every test yourself.</p>
<h2 id="heading-the-setup">The Setup</h2>
<p>The benchmark uses two identical PostgreSQL 17 instances running in Docker containers, each with fixed resource constraints (2 CPUs, 2 GB RAM). Both share the same Express TypeScript API layer — the only difference is which PostgreSQL features are enabled.</p>
<pre><code class="language-plaintext">┌─────────┐     ┌──────────────────┐     ┌─────────────────┐
│   k6    │────&gt;│  Express API     │────&gt;│  PG Baseline    │
│  (load  │     │  (TypeScript)    │     │  (vanilla PG17) │
│  test)  │────&gt;│  Port 3001/3002  │────&gt;│  PG Modded      │
└─────────┘     └──────────────────┘     │  (features on)  │
                                         └─────────────────┘
</code></pre>
<p>The baseline instance uses naïve approaches (regular tables, <code>ILIKE</code> search, polling). The modded instance uses PostgreSQL's built-in features (UNLOGGED tables, <code>tsvector</code> with GIN indexes, <code>LISTEN/NOTIFY</code>, partial indexes). Same hardware, same API code, same data. Only the database features differ.</p>
<p>Both instances share this tuned <code>postgresql.conf</code>:</p>
<pre><code class="language-ini"># Memory allocation
shared_buffers = 512MB           # 25% of available RAM
effective_cache_size = 1536MB    # 75% of RAM — helps the query planner
work_mem = 16MB                  # per-sort/hash operation memory

# SSD-optimized planner settings
random_page_cost = 1.1           # default 4.0 assumes spinning disks
effective_io_concurrency = 200   # allow parallel I/O on SSDs
</code></pre>
<p>These settings matter. The defaults assume spinning disks from the early 2000s. Setting <code>random_page_cost = 1.1</code> tells the query planner that random reads are nearly as fast as sequential reads on SSDs, which encourages index usage over sequential scans.</p>
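<p>The memory values follow the percentages noted in the config comments. As a rough sketch of that arithmetic (the function name is hypothetical, and the 25% and 75% ratios are the rule of thumb used above rather than a universal recommendation):</p>
<pre><code class="language-python">def suggest_memory_settings(ram_mb):
    """Apply the 25% / 75% rule of thumb from the config above to a RAM budget in MB."""
    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # 25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # 75% of RAM
    }


# The benchmark containers are capped at 2 GB
print(suggest_memory_settings(2048))
# {'shared_buffers': '512MB', 'effective_cache_size': '1536MB'}
</code></pre>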
<h2 id="heading-benchmark-1-caching-with-unlogged-tables">Benchmark 1: Caching with UNLOGGED Tables</h2>
<p><strong>The idea:</strong> Use an UNLOGGED table as an in-database cache. UNLOGGED tables skip PostgreSQL's Write-Ahead Log (WAL) — the mechanism that guarantees durability. Since cache data is ephemeral by nature, losing it on a crash is acceptable, and skipping WAL removes the biggest write bottleneck.</p>
<pre><code class="language-sql">-- Modded: UNLOGGED table for cache entries
CREATE UNLOGGED TABLE cache_entries (
    key TEXT PRIMARY KEY,
    value JSONB NOT NULL,
    expires_at TIMESTAMPTZ
);

-- Baseline: same schema, but a regular (logged) table
CREATE TABLE cache_entries (
    key TEXT PRIMARY KEY,
    value JSONB NOT NULL,
    expires_at TIMESTAMPTZ
);
</code></pre>
<h3 id="heading-results-200-virtual-users">Results (200 Virtual Users)</h3>
<table>
<thead>
<tr>
<th>Mode</th>
<th>p50</th>
<th>p95</th>
<th>avg</th>
<th>req/s</th>
</tr>
</thead>
<tbody><tr>
<td>Baseline (regular table)</td>
<td>1.87ms</td>
<td>6.00ms</td>
<td>2.50ms</td>
<td>1,754/s</td>
</tr>
<tr>
<td>Modded (UNLOGGED table)</td>
<td>1.71ms</td>
<td>5.24ms</td>
<td>2.17ms</td>
<td>1,760/s</td>
</tr>
</tbody></table>
<p>That is about 13% lower p95 and average latency (roughly 9% at p50), with throughput essentially unchanged. Not dramatic, but nearly free: you change one keyword in your <code>CREATE TABLE</code> statement.</p>
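<p>The percentage gains can be recomputed directly from the table. A minimal sketch, using the latency figures reported above (the helper name is just for illustration):</p>
<pre><code class="language-python">def pct_lower(baseline, modded):
    """Percentage by which `modded` is lower than `baseline`."""
    return (baseline - modded) / baseline * 100


# Latencies in ms from the 200-VU results table: (baseline, modded)
latencies = {"p50": (1.87, 1.71), "p95": (6.00, 5.24), "avg": (2.50, 2.17)}

for name, (base, mod) in latencies.items():
    print(f"{name}: {pct_lower(base, mod):.1f}% lower")
# p50: 8.6% lower
# p95: 12.7% lower
# avg: 13.2% lower
</code></pre>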
<h3 id="heading-under-stress-1000-virtual-users-no-sleep">Under Stress (1,000 Virtual Users, No Sleep)</h3>
<table>
<thead>
<tr>
<th>Mode</th>
<th>p50</th>
<th>p95</th>
<th>req/s</th>
<th>Total Requests</th>
</tr>
</thead>
<tbody><tr>
<td>Baseline</td>
<td>83.38ms</td>
<td>143.23ms</td>
<td>7,663/s</td>
<td>728,021</td>
</tr>
<tr>
<td>Modded</td>
<td>77.69ms</td>
<td>126.39ms</td>
<td>8,062/s</td>
<td>765,934</td>
</tr>
</tbody></table>
<p>The relative improvement holds under stress: p95 latency is about 12% lower, p50 about 7% lower, and throughput about 5% higher. The UNLOGGED advantage is a per-write optimization: it saves the same amount of I/O per write whether you are doing 100 or 10,000 writes per second. The modded instance served about 38,000 more requests in the same time window.</p>
<h3 id="heading-the-verdict">The Verdict</h3>
<p>UNLOGGED tables won't match Redis for sub-millisecond hot-path caching (real-time bidding, gaming leaderboards). But for web applications where the difference between 2ms and 5ms is invisible to users, they eliminate an entire infrastructure dependency for zero additional complexity.</p>
<p>You do give up Redis data structures (sorted sets, HyperLogLog, streams). If you need those, a dedicated cache is still the right call.</p>
<h2 id="heading-benchmark-2-job-queues-with-skip-locked">Benchmark 2: Job Queues with SKIP LOCKED</h2>
<p><strong>The idea:</strong> Use PostgreSQL as a job queue with <code>SELECT ... FOR UPDATE SKIP LOCKED</code>. Multiple workers poll the same table, and <code>SKIP LOCKED</code> ensures each worker gets a different row — no duplicates, no contention.</p>
<pre><code class="language-sql">-- Queue table with a partial index on pending jobs only
CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    payload JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Partial index: only indexes pending jobs
-- As jobs complete, they leave the index — it stays small forever
CREATE INDEX idx_pending_jobs ON job_queue (created_at)
    WHERE status = 'pending';
</code></pre>
<p>The dequeue pattern:</p>
<pre><code class="language-sql">-- Atomic dequeue: select + update in one statement
UPDATE job_queue SET status = 'processing'
WHERE id = (
    SELECT id FROM job_queue
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- skip rows locked by other workers
) RETURNING *;
</code></pre>
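<p>How <code>SKIP LOCKED</code> works: Worker A locks row 1. Worker B tries row 1, sees the lock, skips it, and takes row 2 instead. No blocking, no duplicates. If a worker crashes before its transaction commits, the transaction rolls back and the row becomes available again. This safety net only applies while the transaction is open: once the <code>status = 'processing'</code> update has committed, a crashed worker leaves the job in that state until something resets it.</p>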
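<p>To make the skipping behavior concrete, here is a toy in-memory model of it in Python. It is only an illustration of the semantics described above, not how PostgreSQL implements row locks: the <code>claim_next</code> function and the <code>locked</code> set are invented for this sketch.</p>
<pre><code class="language-python">def claim_next(pending_ids, locked):
    """Claim the oldest pending job that no other worker has locked, or return None."""
    for job_id in sorted(pending_ids):
        if job_id in locked:
            continue  # SKIP LOCKED: move on instead of waiting for the lock
        locked.add(job_id)
        return job_id
    return None


pending = [1, 2, 3]
locked = set()

worker_a = claim_next(pending, locked)  # takes job 1
worker_b = claim_next(pending, locked)  # job 1 is locked, so takes job 2
print(worker_a, worker_b)  # 1 2

# Worker A crashes before committing: its lock is released and job 1 is claimable again
locked.discard(worker_a)
print(claim_next(pending, locked))  # 1
</code></pre>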
<h3 id="heading-results-100-producers-50-consumers">Results (100 Producers + 50 Consumers)</h3>
<table>
<thead>
<tr>
<th>Mode</th>
<th>p50</th>
<th>p95</th>
<th>avg</th>
<th>req/s</th>
</tr>
</thead>
<tbody><tr>
<td>Baseline (full index)</td>
<td>1.90ms</td>
<td>5.01ms</td>
<td>2.30ms</td>
<td>1,053/s</td>
</tr>
<tr>
<td>Modded (partial index)</td>
<td>1.81ms</td>
<td>5.28ms</td>
<td>2.29ms</td>
<td>1,052/s</td>
</tr>
</tbody></table>
<p>They're virtually identical. The partial index doesn't show its value in a 60-second benchmark because the table doesn't accumulate enough completed rows for the index size difference to matter. In a production system with millions of completed jobs, the partial index stays proportional to the pending backlog, while a full index keeps growing with the table.</p>
<h3 id="heading-the-verdict">The Verdict</h3>
<p><code>SKIP LOCKED</code> is production-ready for job queues. Libraries like <a href="https://github.com/timgit/pg-boss">pg-boss</a> (Node.js) and <a href="https://github.com/riverqueue/river">river</a> (Go) build on this exact pattern.</p>
<p>You do give up exchange/routing patterns (fan-out, topic-based routing) and consumer groups with message replay. If you need those, a dedicated message broker is still the right tool. For simple "process this job once" workloads, PostgreSQL handles it.</p>
<h2 id="heading-benchmark-3-full-text-search-with-tsvector">Benchmark 3: Full-Text Search with tsvector</h2>
<p><strong>The idea:</strong> Use PostgreSQL's built-in full-text search instead of a separate search service. A <code>tsvector</code> column stores pre-processed search tokens, and a GIN (Generalized Inverted Index) enables fast lookups using the same inverted index concept that powers Elasticsearch.</p>
<pre><code class="language-sql">-- Search-optimized article table
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    search_vector tsvector  -- pre-computed search tokens
);

-- GIN index for full-text search
CREATE INDEX idx_search ON articles USING GIN (search_vector);

-- Auto-update search_vector on insert/update
CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
    NEW.search_vector := to_tsvector('english',
        COALESCE(NEW.title, '') || ' ' || COALESCE(NEW.body, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_search
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION update_search_vector();
</code></pre>
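<p>The inverted-index idea behind GIN can be illustrated in a few lines of Python. This is a conceptual sketch only: real <code>tsvector</code> processing also applies stemming and stop-word removal, and GIN stores its postings in an on-disk tree rather than a dictionary. The sample documents are made up.</p>
<pre><code class="language-python">from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


docs = {
    1: "PostgreSQL full text search",
    2: "Redis as a cache",
    3: "Tuning PostgreSQL indexes",
}
index = build_index(docs)

# One dictionary lookup instead of scanning every document, as ILIKE would
print(sorted(index["postgresql"]))  # [1, 3]
</code></pre>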
<p>The baseline uses <code>ILIKE</code> with a leading wildcard — the approach most developers reach for first:</p>
<pre><code class="language-sql">-- Baseline: sequential scan on every query
SELECT * FROM articles
WHERE title ILIKE '%postgresql%' OR body ILIKE '%postgresql%';

-- Modded: GIN index lookup with relevance ranking
SELECT id, title,
    ts_rank(search_vector, plainto_tsquery('english', 'postgresql')) AS rank
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql')
ORDER BY rank DESC LIMIT 20;
</code></pre>
<h3 id="heading-results-500-virtual-users">Results (500 Virtual Users)</h3>
<table>
<thead>
<tr>
<th>Mode</th>
<th>p50</th>
<th>p95</th>
<th>avg</th>
<th>req/s</th>
</tr>
</thead>
<tbody><tr>
<td>Baseline (ILIKE)</td>
<td>1.96ms</td>
<td>101.83ms</td>
<td>25.22ms</td>
<td>561/s</td>
</tr>
<tr>
<td>Modded (tsvector + GIN)</td>
<td>2.76ms</td>
<td>10.39ms</td>
<td>3.76ms</td>
<td>675/s</td>
</tr>
</tbody></table>
<p>This is the standout result. The baseline's p95 of about 102ms versus the modded instance's 10ms is roughly a 10x improvement.</p>
<p>The baseline's p50 (1.96ms) is slightly better than the modded instance's (2.76ms) because simple <code>ILIKE</code> queries on small result sets can be fast when the data fits in <code>shared_buffers</code>. But as load increases and the buffer cache is contested, sequential scans degrade dramatically. The GIN index stays stable.</p>
<h3 id="heading-under-stress-500-virtual-users-no-sleep">Under Stress (500 Virtual Users, No Sleep)</h3>
<table>
<thead>
<tr>
<th>Mode</th>
<th>p50</th>
<th>p95</th>
<th>req/s</th>
<th>Total Requests</th>
</tr>
</thead>
<tbody><tr>
<td>Baseline (ILIKE)</td>
<td>599ms</td>
<td>1,000ms</td>
<td>558/s</td>
<td>50,212</td>
</tr>
<tr>
<td>Modded (tsvector)</td>
<td>209ms</td>
<td>396ms</td>
<td>1,441/s</td>
<td>129,679</td>
</tr>
</tbody></table>
<p>ILIKE collapses to 1-second p95 latencies. Each query forces a sequential scan of all 10,000 articles, consuming CPU and buffer-cache bandwidth and starving concurrent queries. The tsvector approach serves 2.6x more requests in the same time window because a GIN lookup touches only the index entries for the search terms instead of every row.</p>
<h3 id="heading-the-verdict">The Verdict</h3>
<p>This is the strongest argument in the entire benchmark. The fix requires zero extensions — <code>to_tsvector()</code>, <code>plainto_tsquery()</code>, and <code>CREATE INDEX USING GIN</code> are all built into core PostgreSQL. If you're doing <code>WHERE column ILIKE '%term%'</code> on any table with more than a few thousand rows, you're leaving massive performance on the table.</p>
<p>You do give up distributed search across shards, complex analyzers for CJK languages, and aggregation/faceted search pipelines. For a product search bar, blog search, or internal tool — PostgreSQL is enough.</p>
<h2 id="heading-benchmark-4-pubsub-with-listennotify">Benchmark 4: Pub/Sub with LISTEN/NOTIFY</h2>
<p><strong>The idea:</strong> Use PostgreSQL's native <code>LISTEN/NOTIFY</code> for pub/sub messaging, triggered automatically on INSERT via a database trigger.</p>
<pre><code class="language-sql">-- Trigger that fires pg_notify on every new message
CREATE OR REPLACE FUNCTION notify_message() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify(NEW.channel, NEW.payload::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_notify
    AFTER INSERT ON messages
    FOR EACH ROW EXECUTE FUNCTION notify_message();
</code></pre>
<h3 id="heading-results-200-virtual-users">Results (200 Virtual Users)</h3>
<table>
<thead>
<tr>
<th>Mode</th>
<th>p50</th>
<th>p95</th>
<th>avg</th>
<th>req/s</th>
</tr>
</thead>
<tbody><tr>
<td>Baseline (poll-based)</td>
<td>1.99ms</td>
<td>6.04ms</td>
<td>2.84ms</td>
<td>1,116/s</td>
</tr>
<tr>
<td>Modded (LISTEN/NOTIFY)</td>
<td>1.65ms</td>
<td>4.80ms</td>
<td>2.13ms</td>
<td>1,131/s</td>
</tr>
</tbody></table>
<p>Here we have a 20% improvement at p95. The trigger-based approach does more work per INSERT (INSERT + NOTIFY), but the reduced round trips and better connection reuse patterns offset the overhead.</p>
<h3 id="heading-the-verdict">The Verdict</h3>
<p><code>LISTEN/NOTIFY</code> works for real-time features where you would otherwise reach for Redis pub/sub. The main limitations are payload size (under 8,000 bytes in the default configuration) and the requirement for dedicated connections (incompatible with PgBouncer in transaction mode).</p>
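<p>A common way to live with the payload limit is to send only a row identifier when the body is too large and let listeners fetch the row themselves. The sketch below assumes a hypothetical <code>build_notify_payload</code> helper and JSON-encoded messages; the 8,000-byte figure is the limit quoted above.</p>
<pre><code class="language-python">import json

MAX_PAYLOAD_BYTES = 8000  # limit quoted above; payloads must stay below it


def build_notify_payload(message_id, body):
    """Return the full JSON message if it fits in a notification, else a reference to the row."""
    full = json.dumps({"id": message_id, "body": body})
    # True when the encoded size is between 0 and MAX_PAYLOAD_BYTES - 1 bytes
    if len(full.encode("utf-8")) in range(MAX_PAYLOAD_BYTES):
        return full
    return json.dumps({"id": message_id, "ref": True})


print(build_notify_payload(1, "hello"))      # {"id": 1, "body": "hello"}
print(build_notify_payload(2, "x" * 10000))  # {"id": 2, "ref": true}
</code></pre>
<p>The listener then checks for the <code>ref</code> key (a name chosen for this sketch) and queries the table by <code>id</code> when it is present.</p>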
<h2 id="heading-the-combined-workload-the-honest-test">The Combined Workload: The Honest Test</h2>
<p>Individual benchmarks are flattering. The real question: can one PostgreSQL instance handle caching, queues, search, and pub/sub simultaneously without degrading?</p>
<h3 id="heading-results-all-four-workloads-running-together">Results (All Four Workloads Running Together)</h3>
<table>
<thead>
<tr>
<th>Mode</th>
<th>p50</th>
<th>p95</th>
<th>avg</th>
<th>req/s</th>
</tr>
</thead>
<tbody><tr>
<td>Baseline</td>
<td>1.65ms</td>
<td>5.24ms</td>
<td>2.17ms</td>
<td>1,424/s</td>
</tr>
<tr>
<td>Modded</td>
<td>1.86ms</td>
<td>6.05ms</td>
<td>2.47ms</td>
<td>1,417/s</td>
</tr>
</tbody></table>
<p>Under combined load, the baseline marginally outperforms the modded setup. The modded PostgreSQL does more work per operation — maintaining GIN indexes, firing triggers, running <code>pg_cron</code> in the background. When all these features are active simultaneously, the overhead is measurable: about 15% higher p95 latency.</p>
<p>But both setups stay comfortably under 10ms at p95. For most web applications, that's more than good enough.</p>
<h2 id="heading-what-i-learned">What I Learned</h2>
<p>After running all these benchmarks, here's what I would tell a team evaluating whether to "just use Postgres":</p>
<ol>
<li><p><strong>Do it for full-text search:</strong> Switching from <code>ILIKE</code> to <code>tsvector</code> with a GIN index is a roughly 10x improvement at p95 that requires zero extensions. In this benchmark it was the highest-ROI change by a wide margin, and many developers don't know it exists.</p>
</li>
<li><p><strong>Do it for job queues:</strong> <code>SKIP LOCKED</code> is production-ready and eliminates RabbitMQ for simple "process this job" workloads. Use a library like pg-boss or river rather than rolling your own.</p>
</li>
<li><p><strong>Consider it for caching:</strong> UNLOGGED tables gave roughly 12-13% lower p95 latency than regular tables in these tests. If sub-millisecond latency is not a hard requirement (and for most web apps, it is not), you can drop Redis entirely.</p>
</li>
<li><p><strong>Be honest about the overhead:</strong> With all four features enabled at once, the modded instance showed about 15% higher p95 latency than the baseline under the combined workload. Whether that matters depends on your latency budget.</p>
</li>
<li><p><strong>Know where to stop:</strong> PostgreSQL won't match Redis for sub-millisecond caching, Kafka for millions of messages per second, or Elasticsearch for distributed multi-node search with complex analyzers. The line is at extreme throughput or extreme specialization.</p>
</li>
</ol>
<p>The honest conclusion is not "PostgreSQL does everything." It is: for most applications, a single well-configured PostgreSQL instance handles 80% of what you would otherwise need three to five additional services for. That is less infrastructure to deploy, monitor, and maintain — and fewer things to break at 3 AM.</p>
<p>Enterprise-scale applications processing millions of messages per second, serving sub-millisecond cache hits to millions of concurrent users, or running distributed search across terabytes of documents will still need specialized tools. Those tools exist for a reason, and at that scale the operational cost of running them is justified by the performance you get back.</p>
<p>But most of us aren't building at that scale — and may never need to. Starting with PostgreSQL for these roles means you ship faster with fewer moving parts. If and when you outgrow what PostgreSQL can handle, your benchmarks will tell you exactly which role needs to be extracted into a dedicated service. That is a much better position than starting with five services on day one because you assumed you would need them.</p>
<p>The <a href="https://github.com/aaronhsyong2/pg-stack-benchmark">benchmark project</a> is open source if you want to reproduce these results or adapt the tests for your own workload.</p>
<p>You can find more of my writing at <a href="https://site.aaronhsyong.com">site.aaronhsyong.com</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How Database Indexes Work – A Practical Guide with PostgreSQL Examples ]]>
                </title>
                <description>
                    <![CDATA[ Every developer eventually runs into a slow query. The table has grown from a few hundred rows to a few million, and what used to take milliseconds now takes seconds — or worse. The fix, more often th ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-database-indexes-work-a-practical-guide-with-postgresql-examples/</link>
                <guid isPermaLink="false">69e11c10ffbb787634dea035</guid>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ PostgreSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ indexing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ iyiola ]]>
                </dc:creator>
                <pubDate>Thu, 16 Apr 2026 17:27:44 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/cf6919a4-f803-4783-83ff-5c7674141c55.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Every developer eventually runs into a slow query. The table has grown from a few hundred rows to a few million, and what used to take milliseconds now takes seconds — or worse.</p>
<p>The fix, more often than not, is an index.</p>
<p>A database index is a data structure that helps the database find rows faster without scanning the entire table. It works a lot like the index at the back of a textbook: instead of reading every page to find a topic, you look it up in the index, get the page number, and go straight there.</p>
<p>In this tutorial, you'll learn how indexes work under the hood, how to create and use them effectively in PostgreSQL, and how to avoid the common mistakes that make indexes useless or even harmful.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-why-do-you-need-indexes">Why Do You Need Indexes?</a></p>
</li>
<li><p><a href="#heading-how-indexes-work-under-the-hood">How Indexes Work Under the Hood</a></p>
</li>
<li><p><a href="#heading-how-to-create-your-first-index">How to Create Your First Index</a></p>
</li>
<li><p><a href="#heading-how-to-use-explain-analyze-to-measure-performance">How to Use EXPLAIN ANALYZE to Measure Performance</a></p>
</li>
<li><p><a href="#heading-types-of-indexes-in-postgresql">Types of Indexes in PostgreSQL</a></p>
</li>
<li><p><a href="#heading-how-to-create-a-composite-index">How to Create a Composite Index</a></p>
</li>
<li><p><a href="#heading-how-to-create-a-partial-index">How to Create a Partial Index</a></p>
</li>
<li><p><a href="#heading-how-to-create-an-expression-index">How to Create an Expression Index</a></p>
</li>
<li><p><a href="#heading-how-to-create-a-unique-index">How to Create a Unique Index</a></p>
</li>
<li><p><a href="#heading-how-to-manage-indexes">How to Manage Indexes</a></p>
</li>
<li><p><a href="#heading-when-indexes-hurt-instead-of-help">When Indexes Hurt Instead of Help</a></p>
</li>
<li><p><a href="#heading-common-mistakes-that-prevent-index-usage">Common Mistakes That Prevent Index Usage</a></p>
</li>
<li><p><a href="#heading-best-practices-for-indexing">Best Practices for Indexing</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along with the examples, you'll need:</p>
<ul>
<li><p>Basic knowledge of SQL (SELECT, INSERT, UPDATE, DELETE, WHERE, JOIN)</p>
</li>
<li><p>A running PostgreSQL instance (version 12 or later)</p>
</li>
<li><p>A SQL client like <code>psql</code>, pgAdmin, or DBeaver</p>
</li>
</ul>
<p>If you don't have PostgreSQL installed locally, you can use a free cloud-hosted instance from services like <a href="https://neon.tech">Neon</a> or <a href="https://supabase.com">Supabase</a>.</p>
<h2 id="heading-why-do-you-need-indexes">Why Do You Need Indexes?</h2>
<p>When you run a query like <code>SELECT * FROM users WHERE email = 'jane@example.com'</code>, the database needs to find the matching row. Without an index, PostgreSQL performs a <strong>sequential scan</strong> — it reads every single row in the table and checks whether the <code>email</code> column matches.</p>
<p>For a table with 100 rows, this is fine. For a table with 10 million rows, it's painfully slow.</p>
<p>An index solves this by creating a separate, sorted data structure that maps column values to their row locations. Instead of scanning 10 million rows, PostgreSQL can look up the value in the index and jump directly to the matching row. This can reduce query time from seconds to milliseconds.</p>
<p>But indexes aren't free. They come with trade-offs you need to understand before adding them everywhere. You'll learn about those trade-offs throughout this tutorial.</p>
<h2 id="heading-how-indexes-work-under-the-hood">How Indexes Work Under the Hood</h2>
<p>PostgreSQL's default index type is the <strong>B-tree</strong> (balanced tree). Understanding how a B-tree works will help you make smarter decisions about when and how to index.</p>
<p>A B-tree organizes data into a sorted, hierarchical structure made up of three kinds of nodes:</p>
<ol>
<li><p><strong>Root node</strong> — the top of the tree. It holds a few values that divide the data into broad ranges.</p>
</li>
<li><p><strong>Internal nodes</strong> — each one further narrows down the range.</p>
</li>
<li><p><strong>Leaf nodes</strong> — the bottom level. These hold the actual indexed values along with pointers to the corresponding rows in the table.</p>
</li>
</ol>
<p>When PostgreSQL uses a B-tree index to find a value, it starts at the root and follows the path that matches the target value, moving through internal nodes until it reaches the correct leaf node. This path is called a <strong>tree traversal</strong>, and it typically requires only 3–4 steps even for tables with millions of rows.</p>
<p>Think of it like a phone book. You don't start at page one and read every name. You open to roughly the right section (root), narrow it down to the right page (internal nodes), and scan the entries on that page (leaf node).</p>
<p>This sorted structure is also why B-tree indexes work well for range queries like <code>WHERE price &gt; 50 AND price &lt; 100</code>. The database finds the starting point in the tree and then scans forward through the leaf nodes, which are already in order.</p>
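<p>As a rough sketch of that range-scan behavior, the following uses a hypothetical <code>products</code> table (not part of this tutorial's schema) filled with made-up prices. The plan PostgreSQL picks depends on your data and settings, so treat the output as something to inspect rather than a guaranteed result:</p>
<pre><code class="language-sql">-- Hypothetical table, used only for this illustration
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    price NUMERIC(10, 2) NOT NULL
);

-- 100,000 rows with prices cycling from 0 to 999
INSERT INTO products (price)
SELECT gs % 1000
FROM generate_series(1, 100000) AS gs;

CREATE INDEX idx_products_price ON products (price);
ANALYZE products;

-- The planner can descend the tree to the first price above 50,
-- then walk the ordered leaf entries until it passes 100
EXPLAIN
SELECT * FROM products WHERE price &gt; 50 AND price &lt; 100;
</code></pre>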
<h2 id="heading-how-to-create-your-first-index">How to Create Your First Index</h2>
<p>Let's build a practical example. You'll create a table, load it with data, and see the difference an index makes.</p>
<h3 id="heading-step-1-create-the-table-and-insert-sample-data">Step 1 – Create the Table and Insert Sample Data</h3>
<pre><code class="language-sql">CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    city VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW()
);
</code></pre>
<p>Now insert a large number of rows so the performance difference is visible. This generates 500,000 rows of sample data:</p>
<pre><code class="language-sql">INSERT INTO customers (first_name, last_name, email, city)
SELECT
    'User' || gs,
    'Last' || gs,
    'user' || gs || '@example.com',
    (ARRAY['Lagos', 'London', 'New York', 'Berlin', 'Tokyo'])[1 + (gs % 5)]
FROM generate_series(1, 500000) AS gs;
</code></pre>
<h3 id="heading-step-2-query-without-an-index">Step 2 – Query Without an Index</h3>
<pre><code class="language-sql">EXPLAIN ANALYZE
SELECT * FROM customers WHERE email = 'user250000@example.com';
</code></pre>
<p>You'll see output similar to this:</p>
<pre><code class="language-plaintext">Seq Scan on customers  (cost=0.00..11374.00 rows=1 width=52) (actual time=45.123..91.456 rows=1 loops=1)
  Filter: ((email)::text = 'user250000@example.com'::text)
  Rows Removed by Filter: 499999
Planning Time: 0.085 ms
Execution Time: 91.502 ms
</code></pre>
<p>The key detail here is <code>Seq Scan</code> — PostgreSQL scanned all 500,000 rows to find a single match. It filtered out 499,999 rows. That's a lot of wasted work.</p>
<h3 id="heading-step-3-create-an-index">Step 3 – Create an Index</h3>
<pre><code class="language-sql">CREATE INDEX idx_customers_email ON customers (email);
</code></pre>
<p>This creates a B-tree index on the <code>email</code> column. The name <code>idx_customers_email</code> follows a common naming convention: <code>idx_</code> prefix, then the table name, then the column name.</p>
<h3 id="heading-step-4-query-with-the-index">Step 4 – Query With the Index</h3>
<p>Run the same query again:</p>
<pre><code class="language-sql">EXPLAIN ANALYZE
SELECT * FROM customers WHERE email = 'user250000@example.com';
</code></pre>
<p>Now you'll see something like this:</p>
<pre><code class="language-plaintext">Index Scan using idx_customers_email on customers  (cost=0.42..8.44 rows=1 width=52) (actual time=0.034..0.036 rows=1 loops=1)
  Index Cond: ((email)::text = 'user250000@example.com'::text)
Planning Time: 0.112 ms
Execution Time: 0.058 ms
</code></pre>
<p>The scan type changed from <code>Seq Scan</code> to <code>Index Scan</code>. The execution time dropped from ~91ms to ~0.06ms. That's roughly a 1,500x improvement — from one line of SQL.</p>
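<p>If you're curious how many levels the new index really has, the <code>pageinspect</code> extension that ships with PostgreSQL can read the B-tree's metadata page. This is an optional sketch: it assumes you're allowed to install extensions and call <code>pageinspect</code> functions, which normally requires superuser rights and may not be possible on a managed service.</p>
<pre><code class="language-sql">CREATE EXTENSION IF NOT EXISTS pageinspect;

-- "level" is the level of the root page, with leaf pages counted as level 0,
-- so the number of levels in the tree is level + 1
SELECT level AS root_level, level + 1 AS tree_levels
FROM bt_metap('idx_customers_email');
</code></pre>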
<h2 id="heading-how-to-use-explain-analyze-to-measure-performance">How to Use <code>EXPLAIN ANALYZE</code> to Measure Performance</h2>
<p><code>EXPLAIN ANALYZE</code> is your most important tool for understanding how PostgreSQL executes a query. You already saw it in the previous section, but let's break down what the output means.</p>
<pre><code class="language-sql">EXPLAIN ANALYZE SELECT * FROM customers WHERE city = 'Lagos';
</code></pre>
<p>The output will tell you several things:</p>
<ul>
<li><p><strong>Scan type</strong> — whether PostgreSQL used a sequential scan, index scan, bitmap index scan, or another access method</p>
</li>
<li><p><strong>Cost</strong> — the estimated cost in arbitrary units. The first number is the startup cost, the second is the total cost</p>
</li>
<li><p><strong>Rows</strong> — how many rows PostgreSQL estimated it would find versus how many it actually found</p>
</li>
<li><p><strong>Actual time</strong> — the real time in milliseconds to execute the query</p>
</li>
<li><p><strong>Rows Removed by Filter</strong> — how many rows were scanned but didn't match the condition</p>
</li>
</ul>
<p>If you see <code>Seq Scan</code> on a large table with a selective WHERE clause, that's usually a sign you need an index. If you see <code>Index Scan</code> or <code>Index Only Scan</code>, your index is working.</p>
<p>One thing to keep in mind: <code>EXPLAIN</code> without <code>ANALYZE</code> shows the plan without actually running the query. <code>EXPLAIN ANALYZE</code> runs the query and shows real timing data. Always use <code>EXPLAIN ANALYZE</code> when you're investigating performance, but be careful with it on destructive queries — <code>EXPLAIN ANALYZE DELETE FROM ...</code> will actually delete the rows. Wrap those in a transaction and roll back:</p>
<pre><code class="language-sql">BEGIN;
EXPLAIN ANALYZE DELETE FROM customers WHERE city = 'Berlin';
ROLLBACK;
</code></pre>
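<p>When timing alone doesn't explain a slow query, it can help to add the <code>BUFFERS</code> option, which reports how many pages each step of the plan touched. A minimal sketch, using the same <code>customers</code> table:</p>
<pre><code class="language-sql">EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM customers WHERE city = 'Lagos';
</code></pre>
<p>In the output, <code>shared hit</code> counts pages found in PostgreSQL's buffer cache, and <code>read</code> counts pages that had to be fetched from outside it.</p>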
<h2 id="heading-types-of-indexes-in-postgresql">Types of Indexes in PostgreSQL</h2>
<p>PostgreSQL supports several index types, each optimized for different query patterns.</p>
<h3 id="heading-b-tree-default">B-tree (Default)</h3>
<p>B-tree is the default index type and covers the vast majority of use cases. It supports equality checks (<code>=</code>), range queries (<code>&lt;</code>, <code>&gt;</code>, <code>&lt;=</code>, <code>&gt;=</code>, <code>BETWEEN</code>), sorting (<code>ORDER BY</code>), and <code>IS NULL</code> / <code>IS NOT NULL</code> checks.</p>
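<pre><code class="language-sql">-- These are equivalent – B-tree is the default.
-- Run only one of them: index names must be unique, so the second would fail if the first already exists.
CREATE INDEX idx_name ON customers (last_name);
-- CREATE INDEX idx_name ON customers USING btree (last_name);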
</code></pre>
<p>Use B-tree when you don't have a specific reason to use something else.</p>
<h3 id="heading-hash">Hash</h3>
<p>Hash indexes are optimized purely for equality comparisons (<code>=</code>). They don't support range queries or sorting. In practice, B-tree handles equality checks almost as fast, so hash indexes are rarely necessary.</p>
<pre><code class="language-sql">CREATE INDEX idx_email_hash ON customers USING hash (email);
</code></pre>
<p>Consider a hash index only if you have a very large table with frequent equality-only lookups and want to save a small amount of index space.</p>
<h3 id="heading-gin-generalized-inverted-index">GIN (Generalized Inverted Index)</h3>
<p>GIN indexes are designed for values that contain multiple elements — like arrays, JSONB documents, or full-text search vectors. Instead of indexing a single value per row, GIN indexes every element within the value.</p>
<pre><code class="language-sql">-- Add a JSONB column
ALTER TABLE customers ADD COLUMN preferences JSONB DEFAULT '{}';

-- Index the JSONB column
CREATE INDEX idx_preferences ON customers USING gin (preferences);

-- Now this query uses the GIN index
SELECT * FROM customers WHERE preferences @&gt; '{"newsletter": true}';
</code></pre>
<p>Use GIN when you're querying inside JSONB data, searching arrays with <code>@&gt;</code> or <code>&amp;&amp;</code>, or doing full-text search with <code>tsvector</code>.</p>
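<p>The same idea applies to array columns. As a sketch, the <code>tags</code> column and index name below are hypothetical additions to the <code>customers</code> table, made up for this example:</p>
<pre><code class="language-sql">-- Hypothetical array column, added only for this illustration
ALTER TABLE customers ADD COLUMN tags TEXT[] DEFAULT '{}';

CREATE INDEX idx_customers_tags ON customers USING gin (tags);

-- Rows whose tags contain 'vip'
SELECT * FROM customers WHERE tags @&gt; ARRAY['vip'];

-- Rows whose tags share at least one element with the list
SELECT * FROM customers WHERE tags &amp;&amp; ARRAY['vip', 'beta'];
</code></pre>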
<h3 id="heading-gist-generalized-search-tree">GiST (Generalized Search Tree)</h3>
<p>GiST indexes support geometric data, ranges, and full-text search. They're commonly used with PostGIS for geospatial queries.</p>
<pre><code class="language-sql">-- Range type example
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    duration TSRANGE
);

CREATE INDEX idx_event_duration ON events USING gist (duration);

-- Find overlapping events
SELECT * FROM events WHERE duration &amp;&amp; '[2025-01-01, 2025-01-31]'::tsrange;
</code></pre>
<p>Use GiST when you're working with spatial data, range types, or need overlap/containment operators.</p>
<h3 id="heading-brin-block-range-index">BRIN (Block Range Index)</h3>
<p>BRIN indexes are extremely small and work well on large tables where the physical row order correlates with the indexed column's value. A common example is a timestamp column on an append-only table where new rows always have later timestamps.</p>
<pre><code class="language-sql">CREATE INDEX idx_created_at_brin ON customers USING brin (created_at);
</code></pre>
<p>BRIN stores summary information (min/max values) for each block of rows rather than indexing every row individually. This makes the index much smaller than a B-tree, but it only works well when the data is naturally ordered.</p>
<p>Use BRIN for very large, append-only tables with naturally ordered data — like logs, events, or time-series data.</p>
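<p>One way to judge whether a column is a reasonable BRIN candidate is the <code>correlation</code> statistic in the <code>pg_stats</code> view. It ranges from -1 to 1 and describes how closely the physical row order follows the column's values. A minimal sketch, assuming the <code>customers</code> table from earlier:</p>
<pre><code class="language-sql">-- Refresh the planner statistics first
ANALYZE customers;

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'customers'
  AND attname IN ('id', 'created_at');
</code></pre>
<p>Values close to 1 or -1 suggest the table is stored in roughly the same order as the column. Keep in mind that the sample rows were all inserted by one statement, so they share a single <code>created_at</code> value. On a real append-only table, the timestamps would increase over time.</p>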
<h2 id="heading-how-to-create-a-composite-index">How to Create a Composite Index</h2>
<p>A composite index (also called a multi-column index) covers more than one column. It's useful when your queries frequently filter or sort by multiple columns together.</p>
<pre><code class="language-sql">CREATE INDEX idx_city_lastname ON customers (city, last_name);
</code></pre>
<p>The order of columns in a composite index matters. PostgreSQL can use this index for queries that filter on <code>city</code> alone, or on both <code>city</code> and <code>last_name</code>. But it <strong>can't</strong> efficiently use this index for queries that filter only on <code>last_name</code>.</p>
<p>Think of it like a phone book sorted by city first, then by last name within each city. You can easily look up everyone in Lagos. You can also look up everyone named "Adeyemi" in Lagos. But finding all people named "Adeyemi" across all cities requires scanning the whole book.</p>
<p>This principle is called the <strong>leftmost prefix rule</strong>: PostgreSQL can use a composite index for queries that include the leftmost column(s) of the index, but not for queries that skip them.</p>
<pre><code class="language-sql">-- ✅ Uses the index (matches leftmost column)
SELECT * FROM customers WHERE city = 'Lagos';

-- ✅ Uses the index (matches both columns, left to right)
SELECT * FROM customers WHERE city = 'Lagos' AND last_name = 'Adeyemi';

-- ❌ Cannot use this index efficiently (skips the leftmost column)
SELECT * FROM customers WHERE last_name = 'Adeyemi';
</code></pre>
<p>When deciding column order, place the most selective column first — the one that narrows down the results the most.</p>
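<p>The sorted layout of a composite index can help with ordering, too. As a sketch, with the <code>idx_city_lastname</code> index above in place, PostgreSQL can often return the rows for one city already ordered by last name, without a separate sort step. Whether it does so depends on the planner's cost estimates, so check the plan:</p>
<pre><code class="language-sql">EXPLAIN
SELECT * FROM customers
WHERE city = 'Lagos'
ORDER BY last_name
LIMIT 10;
</code></pre>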
<h2 id="heading-how-to-create-a-partial-index">How to Create a Partial Index</h2>
<p>A partial index covers only a subset of rows in a table. You define the subset with a WHERE clause in the index definition.</p>
<p>This is useful when you only query a specific portion of the data. For example, if you have an <code>orders</code> table and you frequently query for pending orders but rarely look at completed ones:</p>
<pre><code class="language-sql">CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    total NUMERIC(10, 2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Only index rows where status is 'pending'
CREATE INDEX idx_orders_pending ON orders (customer_id)
WHERE status = 'pending';
</code></pre>
<p>This index is smaller than a full index because it skips all rows that don't match the WHERE condition. Smaller indexes use less disk space, consume less memory, and are faster to maintain during writes.</p>
<p>For the index to be used, your query's WHERE clause must include the index's condition, or a condition that PostgreSQL can recognize as implying it:</p>
<pre><code class="language-sql">-- ✅ Uses the partial index
SELECT * FROM orders WHERE status = 'pending' AND customer_id = 42;

-- ❌ Cannot use the partial index (different status)
SELECT * FROM orders WHERE status = 'shipped' AND customer_id = 42;
</code></pre>
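<p>To see the size difference for yourself, you could load some sample rows and compare the partial index with a full index on the same column. This is only a sketch: the data distribution (about 5% pending orders) and the name <code>idx_orders_customer_full</code> are assumptions made for the comparison.</p>
<pre><code class="language-sql">-- Sample data: 100,000 orders, 1 in 20 of them pending
INSERT INTO orders (customer_id, status, total)
SELECT
    1 + (gs % 1000),
    CASE WHEN gs % 20 = 0 THEN 'pending' ELSE 'delivered' END,
    (gs % 500) + 0.99
FROM generate_series(1, 100000) AS gs;

-- A full index on the same column, created for comparison only
CREATE INDEX idx_orders_customer_full ON orders (customer_id);

SELECT
    pg_size_pretty(pg_relation_size('idx_orders_pending')) AS partial_index_size,
    pg_size_pretty(pg_relation_size('idx_orders_customer_full')) AS full_index_size;

-- Remove the comparison index when you're done
DROP INDEX idx_orders_customer_full;
</code></pre>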
<h2 id="heading-how-to-create-an-expression-index">How to Create an Expression Index</h2>
<p>Sometimes you need to index the result of a function or expression rather than a raw column value. Expression indexes (also called functional indexes) handle this.</p>
<p>A common scenario is case-insensitive email lookups. If your queries use <code>LOWER(email)</code>, a regular index on <code>email</code> won't help — PostgreSQL sees the function call as a different expression.</p>
<pre><code class="language-sql">-- Regular index on email – won't help with LOWER() queries
CREATE INDEX idx_email ON customers (email);

-- This query does NOT use the index above
SELECT * FROM customers WHERE LOWER(email) = 'user100@example.com';
</code></pre>
<p>To fix this, create an index on the expression itself:</p>
<pre><code class="language-sql">CREATE INDEX idx_email_lower ON customers (LOWER(email));
</code></pre>
<p>Now queries that use <code>LOWER(email)</code> in their WHERE clause will use this index:</p>
<pre><code class="language-sql">-- ✅ Uses the expression index
SELECT * FROM customers WHERE LOWER(email) = 'user100@example.com';
</code></pre>
<p>The rule is straightforward: the expression in your query must match the expression in the index exactly. If the index is on <code>LOWER(email)</code>, your query must also use <code>LOWER(email)</code>.</p>
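<p>After creating an expression index, it's worth running <code>ANALYZE</code> so PostgreSQL gathers statistics on the indexed expression, and then confirming the plan. A short sketch, assuming the <code>idx_email_lower</code> index from above exists:</p>
<pre><code class="language-sql">-- Collect statistics, including for the LOWER(email) expression
ANALYZE customers;

-- The plan should now reference idx_email_lower
EXPLAIN
SELECT * FROM customers WHERE LOWER(email) = 'user100@example.com';
</code></pre>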
<h2 id="heading-how-to-create-a-unique-index">How to Create a Unique Index</h2>
<p>A unique index guarantees that no two rows have the same value (or combination of values) in the indexed columns. It serves a dual purpose: it enforces data integrity and provides fast lookups.</p>
<pre><code class="language-sql">CREATE UNIQUE INDEX idx_customers_email_unique ON customers (email);
</code></pre>
<p>If you try to insert a duplicate value, PostgreSQL will reject the operation:</p>
<pre><code class="language-sql">INSERT INTO customers (first_name, last_name, email, city)
VALUES ('Test', 'User', 'user1@example.com', 'Lagos');
-- ERROR: duplicate key value violates unique constraint "idx_customers_email_unique"
</code></pre>
<p>You might wonder how this differs from a UNIQUE constraint. Under the hood, PostgreSQL implements UNIQUE constraints by creating a unique index, so both enforce uniqueness in the same way.</p>
<p>The difference is mostly intent and flexibility: a UNIQUE constraint declares a data integrity rule as part of the table definition, while <code>CREATE UNIQUE INDEX</code> also lets you build partial or expression-based unique indexes, which a constraint can't express.</p>
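<p>If you later decide the rule belongs in the table definition, PostgreSQL can promote an existing unique index to a constraint. This is an optional sketch, and the constraint name <code>customers_email_key</code> is just an example:</p>
<pre><code class="language-sql">-- Attach the existing unique index to a new UNIQUE constraint.
-- PostgreSQL renames the index to match the constraint name.
ALTER TABLE customers
ADD CONSTRAINT customers_email_key
UNIQUE USING INDEX idx_customers_email_unique;
</code></pre>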
<h2 id="heading-how-to-manage-indexes">How to Manage Indexes</h2>
<p>As your database grows, you'll need to inspect, monitor, and maintain your indexes.</p>
<h3 id="heading-how-to-list-all-indexes-on-a-table">How to List All Indexes on a Table</h3>
<pre><code class="language-sql">SELECT
    indexname,
    indexdef
FROM pg_indexes
WHERE tablename = 'customers';
</code></pre>
<p>This shows the name and full definition of every index on the table.</p>
<h3 id="heading-how-to-check-index-size">How to Check Index Size</h3>
<pre><code class="language-sql">SELECT
    pg_size_pretty(pg_relation_size('idx_customers_email')) AS index_size;
</code></pre>
<p>For a broader view of all indexes and their sizes:</p>
<pre><code class="language-sql">SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'customers'
ORDER BY pg_relation_size(indexrelid) DESC;
</code></pre>
<h3 id="heading-how-to-find-unused-indexes">How to Find Unused Indexes</h3>
<p>Indexes that are never used waste disk space and slow down writes. You can find them by checking <code>pg_stat_user_indexes</code>:</p>
<pre><code class="language-sql">SELECT
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'customers'
AND idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
</code></pre>
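<p>If an index has <code>idx_scan = 0</code> after a reasonable period of normal usage, it's a candidate for removal. Just make sure to check across a full business cycle, since some indexes are only used during monthly reports or seasonal operations. Also keep any index that backs a primary key or unique constraint, because it enforces that rule even if it's never scanned.</p>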
<h3 id="heading-how-to-drop-an-index">How to Drop an Index</h3>
<pre><code class="language-sql">DROP INDEX IF EXISTS idx_customers_email;
</code></pre>
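<p>A plain <code>DROP INDEX</code> takes an exclusive lock on the table, which blocks both reads and writes until it finishes. If you're dropping an index on a production table and want to avoid that, use <code>CONCURRENTLY</code>:</p>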
<pre><code class="language-sql">DROP INDEX CONCURRENTLY IF EXISTS idx_customers_email;
</code></pre>
<h3 id="heading-how-to-rebuild-an-index">How to Rebuild an Index</h3>
<p>Over time, indexes can become bloated as rows are inserted, updated, and deleted. You can rebuild an index to reclaim space:</p>
<pre><code class="language-sql">REINDEX INDEX idx_customers_email;
</code></pre>
<p>Or rebuild all indexes on a table:</p>
<pre><code class="language-sql">REINDEX TABLE customers;
</code></pre>
<p>On production systems, use <code>REINDEX CONCURRENTLY</code> (PostgreSQL 12+) so that writes to the table aren't blocked while the index is rebuilt:</p>
<pre><code class="language-sql">REINDEX INDEX CONCURRENTLY idx_customers_email;
</code></pre>
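<p>The same option is available when you create an index. A concurrent build that fails or gets cancelled can leave behind an index marked as invalid, so a quick check afterward can be useful. In this sketch, the index name <code>idx_customers_city</code> is just an example, and note that concurrent builds can't run inside a transaction block:</p>
<pre><code class="language-sql">-- Build the index without blocking writes to the table
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_customers_city ON customers (city);

-- List any indexes that were left in an invalid state
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;
</code></pre>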
<h2 id="heading-when-indexes-hurt-instead-of-help">When Indexes Hurt Instead of Help</h2>
<p>Indexes aren't free. Every index you add comes with costs:</p>
<ol>
<li><p><strong>Write overhead</strong>: every INSERT must add an entry to every index on the table, and an UPDATE usually does too unless PostgreSQL can apply its heap-only tuple (HOT) optimization. If a table has 10 indexes and you insert a row, PostgreSQL performs 11 write operations (one for the table and one for each index). A DELETE leaves its index entries behind for vacuum to clean up later. On write-heavy tables, excessive indexes can significantly slow down data modification.</p>
</li>
<li><p><strong>Storage cost</strong> — indexes consume disk space. On large tables, indexes can take up as much space as the table itself, sometimes more. You can check this with <code>pg_relation_size</code>.</p>
</li>
<li><p><strong>Memory consumption</strong> — PostgreSQL caches frequently used indexes in memory. More indexes means more memory pressure, which can push useful data out of the cache and slow down other queries.</p>
</li>
<li><p><strong>Maintenance burden</strong> — indexes need periodic maintenance (vacuuming, reindexing) and add complexity to schema migrations.</p>
</li>
</ol>
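<p>To put a number on the storage cost, you can compare the size of a table with the combined size of its indexes. A minimal sketch, using the <code>customers</code> table from this tutorial:</p>
<pre><code class="language-sql">SELECT
    pg_size_pretty(pg_table_size('customers')) AS table_size,
    pg_size_pretty(pg_indexes_size('customers')) AS total_index_size;
</code></pre>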
<p>The question to ask is not "should I add an index?" but rather "does the read performance gain justify the write performance cost for this table's workload?"</p>
<h2 id="heading-common-mistakes-that-prevent-index-usage">Common Mistakes That Prevent Index Usage</h2>
<p>You can have the perfect index and PostgreSQL might still ignore it. Here are the most common reasons.</p>
<h3 id="heading-wrapping-the-indexed-column-in-a-function">Wrapping the Indexed Column in a Function</h3>
<pre><code class="language-sql">-- Index on email
CREATE INDEX idx_email ON customers (email);

-- ❌ PostgreSQL cannot use the index because of LOWER()
SELECT * FROM customers WHERE LOWER(email) = 'user1@example.com';

-- ✅ Fix: create an expression index on LOWER(email)
CREATE INDEX idx_email_lower ON customers (LOWER(email));
</code></pre>
<p>Any function applied to the indexed column in a WHERE clause prevents the standard index from being used. You need an expression index that matches the function.</p>
<h3 id="heading-implicit-type-casting">Implicit Type Casting</h3>
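<pre><code class="language-sql">-- id is an INTEGER column with an index
-- ❌ Comparing against a NUMERIC value makes PostgreSQL cast the column, which prevents index usage
SELECT * FROM customers WHERE id = 42::numeric;

-- ✅ Use the column's type
SELECT * FROM customers WHERE id = 42;
</code></pre>
<p>When the query's value type doesn't match the column type, PostgreSQL may cast the column to match, which prevents index usage. A quoted literal such as <code>'42'</code> is not a problem here, because PostgreSQL resolves an untyped literal to the column's type. The mismatch usually comes from explicitly typed values or from bound parameters sent by an application driver.</p>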
<h3 id="heading-using-or-conditions-across-different-columns">Using OR Conditions Across Different Columns</h3>
<pre><code class="language-sql">-- ❌ OR across different columns can prevent index usage
SELECT * FROM customers WHERE email = 'user1@example.com' OR city = 'Lagos';

-- ✅ Rewrite as UNION for better index utilization
SELECT * FROM customers WHERE email = 'user1@example.com'
UNION
SELECT * FROM customers WHERE city = 'Lagos';
</code></pre>
<h3 id="heading-leading-wildcards-in-like-queries">Leading Wildcards in LIKE Queries</h3>
<pre><code class="language-sql">-- ❌ Leading wildcard cannot use a B-tree index
SELECT * FROM customers WHERE email LIKE '%@example.com';

-- ✅ Trailing wildcard CAN use a B-tree index
SELECT * FROM customers WHERE email LIKE 'user1%';
</code></pre>
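<p>A B-tree index is sorted from left to right. A leading wildcard (<code>%something</code>) means the database can't use the sorted structure and falls back to a sequential scan. The trailing-wildcard case has a condition of its own: PostgreSQL only uses a B-tree index for <code>LIKE 'prefix%'</code> when the index uses the C collation or was created with a pattern operator class such as <code>varchar_pattern_ops</code>. If you need to search by suffix or substring, consider a GIN index with the <code>pg_trgm</code> extension.</p>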
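<p>As a sketch of the trigram approach, the following builds a GIN index on <code>email</code>. It assumes the <code>pg_trgm</code> extension is available on your server and that you have permission to enable it. The index name is just an example:</p>
<pre><code class="language-sql">CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_customers_email_trgm
ON customers USING gin (email gin_trgm_ops);

-- A substring search that a trigram index can support
EXPLAIN
SELECT * FROM customers WHERE email LIKE '%250000@%';
</code></pre>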
<h3 id="heading-low-selectivity">Low Selectivity</h3>
<p>If a column has very few distinct values relative to the number of rows (low selectivity), PostgreSQL may decide a sequential scan is faster than using the index.</p>
<p>For example, if a <code>status</code> column has only three possible values (<code>'pending'</code>, <code>'shipped'</code>, <code>'delivered'</code>) and each value covers roughly a third of the table, an index on <code>status</code> alone provides little benefit. PostgreSQL would still need to read a large portion of the table, and the extra index lookup adds overhead.</p>
<p>A partial index is often the better solution in these cases.</p>
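<p>Before indexing a column, you can check how selective it is by looking at the planner's statistics. A minimal sketch, using the <code>city</code> column of the <code>customers</code> table, which holds only five distinct values in the sample data:</p>
<pre><code class="language-sql">-- Refresh the planner statistics first
ANALYZE customers;

-- Distinct-value estimate and the most common values with their frequencies
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'customers'
  AND attname = 'city';
</code></pre>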
<h2 id="heading-best-practices-for-indexing">Best Practices for Indexing</h2>
<p>Here's a summary of the key principles to follow:</p>
<ol>
<li><p><strong>Index columns that appear in WHERE, JOIN, and ORDER BY clauses.</strong> These are the columns the database needs to search, match, or sort by. Start with the queries that run most frequently or take the longest.</p>
</li>
<li><p><strong>Measure before and after with EXPLAIN ANALYZE.</strong> Never add an index based on guesswork. Run your query with <code>EXPLAIN ANALYZE</code>, add the index, and run it again. If the execution time doesn't improve meaningfully, the index isn't helping.</p>
</li>
<li><p><strong>Don't index every column.</strong> Each index slows down writes and consumes storage. Be deliberate about which columns you index based on actual query patterns.</p>
</li>
<li><p><strong>Use composite indexes for multi-column filters.</strong> If your queries commonly filter on <code>city</code> and <code>last_name</code> together, a composite index on <code>(city, last_name)</code> is more efficient than two separate single-column indexes.</p>
</li>
<li><p><strong>Put the most selective column first in composite indexes.</strong> The column that narrows the results the most should come first.</p>
</li>
<li><p><strong>Use partial indexes when you only query a subset of data.</strong> If 90% of your queries target rows where <code>status = 'active'</code>, a partial index on that subset is smaller and faster than a full index.</p>
</li>
<li><p><strong>Monitor index usage regularly.</strong> Query <code>pg_stat_user_indexes</code> to find unused indexes and remove them.</p>
</li>
<li><p><strong>Rebuild bloated indexes periodically.</strong> On tables with heavy update/delete activity, indexes can become bloated. Use <code>REINDEX CONCURRENTLY</code> on production systems.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you learned what database indexes are and why they matter for query performance. You explored how B-tree indexes work under the hood, created several types of indexes (single-column, composite, partial, expression, and unique), and used <code>EXPLAIN ANALYZE</code> to measure the impact.</p>
<p>You also learned about the trade-offs indexes introduce — write overhead, storage cost, and memory pressure — and the common mistakes that silently prevent PostgreSQL from using your indexes.</p>
<p>The core principle is simple: index deliberately based on your actual query patterns, measure the results, and remove anything that isn't pulling its weight.</p>
<p>If you found this tutorial helpful, you can find more of my writing on <a href="https://freecodecamp.org/news/author/iyiola">freeCodeCamp</a> and connect with me on <a href="https://linkedin.com/in/iyioladev">LinkedIn</a> and <a href="https://x.com/iyiola_dev_">X</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What Are Database Triggers? A Practical Introduction with PostgreSQL Examples ]]>
                </title>
                <description>
                    <![CDATA[ If you've ever needed your database to automatically respond to changes – like logging every update to a sensitive table, enforcing a business rule before an insert, or syncing derived data after a de ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-are-database-triggers-practical-intro-with-postgresql-examples/</link>
                <guid isPermaLink="false">69c6d1357cf270651037755c</guid>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ PostgreSQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ iyiola ]]>
                </dc:creator>
                <pubDate>Fri, 27 Mar 2026 18:49:25 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/b5940820-d1aa-4d10-8b40-06005bec7e60.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've ever needed your database to automatically respond to changes – like logging every update to a sensitive table, enforcing a business rule before an insert, or syncing derived data after a delete – then triggers are the tool you're looking for.</p>
<p>A database trigger is a function that the database executes automatically when a specific event occurs on a table. You don't call it manually. Instead, you define the conditions, and the database handles the rest.</p>
<p>In this tutorial, you'll learn what triggers are, how they work, when to use them, and when to avoid them. You'll work through practical examples using PostgreSQL, but the core concepts apply to most relational databases.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-how-triggers-work">How Triggers Work</a></p>
</li>
<li><p><a href="#heading-how-to-create-your-first-trigger">How to Create Your First Trigger</a></p>
</li>
<li><p><a href="#heading-before-vs-after-triggers">BEFORE vs AFTER Triggers</a></p>
</li>
<li><p><a href="#heading-how-to-build-an-audit-log-with-an-after-trigger">How to Build an Audit Log with an AFTER Trigger</a></p>
</li>
<li><p><a href="#heading-how-to-use-a-before-trigger-for-validation">How to Use a BEFORE Trigger for Validation</a></p>
</li>
<li><p><a href="#heading-row-level-vs-statement-level-triggers">Row-Level vs Statement-Level Triggers</a></p>
</li>
<li><p><a href="#heading-the-new-and-old-variables-reference">The NEW and OLD Variables Reference</a></p>
</li>
<li><p><a href="#heading-how-to-manage-triggers">How to Manage Triggers</a></p>
</li>
<li><p><a href="#heading-when-to-use-triggers">When to Use Triggers</a></p>
</li>
<li><p><a href="#heading-when-to-avoid-triggers">When to Avoid Triggers</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along with the examples, you'll need:</p>
<ul>
<li><p>Basic knowledge of SQL (SELECT, INSERT, UPDATE, DELETE)</p>
</li>
<li><p>A running PostgreSQL instance (version 12 or later)</p>
</li>
<li><p>A SQL client like <code>psql</code>, pgAdmin, or DBeaver</p>
</li>
</ul>
<p>If you don't have PostgreSQL installed, you can use a free cloud-hosted instance from services like <a href="https://neon.tech">Neon</a> or <a href="https://supabase.com">Supabase</a> to follow along.</p>
<h2 id="heading-how-triggers-work">How Triggers Work</h2>
<p>At a high level, a trigger has three parts:</p>
<ol>
<li><p><strong>The event</strong>: what action activates the trigger (INSERT, UPDATE, DELETE, or TRUNCATE)</p>
</li>
<li><p><strong>The timing</strong>: when the trigger fires relative to the event (BEFORE or AFTER)</p>
</li>
<li><p><strong>The function</strong>: what logic runs when the trigger fires</p>
</li>
</ol>
<p>Here's the general flow: a user or application performs an operation on a table, the database checks if any triggers are associated with that operation, and if a match is found, the database executes the trigger function automatically.</p>
<p>You can think of triggers as event listeners for your database. Just like a JavaScript <code>addEventListener</code> watches for a click or keypress, a database trigger watches for data changes on a table.</p>
<h2 id="heading-how-to-create-your-first-trigger">How to Create Your First Trigger</h2>
<p>In PostgreSQL, creating a trigger is a two-step process. You first create a trigger function, then you attach that function to a table with a <code>CREATE TRIGGER</code> statement.</p>
<p>Let's build a concrete example. Say you have a <code>products</code> table and you want to automatically set the <code>updated_at</code> timestamp every time a row is modified.</p>
<h3 id="heading-step-1-create-the-table">Step 1 – Create the Table</h3>
<pre><code class="language-sql">CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
</code></pre>
<h3 id="heading-step-2-create-the-trigger-function">Step 2 – Create the Trigger Function</h3>
<p>A trigger function in PostgreSQL is a special function that returns the <code>TRIGGER</code> type. Inside the function body, you have access to two important variables: <code>NEW</code> (the row after the operation) and <code>OLD</code> (the row before the operation).</p>
<pre><code class="language-sql">CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
</code></pre>
<p>This function sets the <code>updated_at</code> column to the current timestamp every time it runs. It then returns <code>NEW</code>, which tells PostgreSQL to proceed with the modified row.</p>
<h3 id="heading-step-3-attach-the-trigger-to-the-table">Step 3 – Attach the Trigger to the Table</h3>
<pre><code class="language-sql">CREATE TRIGGER trigger_set_updated_at
BEFORE UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
</code></pre>
<p>Let's break down each part of this statement:</p>
<ul>
<li><p><code>BEFORE UPDATE</code> – the trigger fires before the update is applied to the table</p>
</li>
<li><p><code>ON products</code> – the trigger is associated with the <code>products</code> table</p>
</li>
<li><p><code>FOR EACH ROW</code> – the function runs once for every row affected by the update</p>
</li>
<li><p><code>EXECUTE FUNCTION set_updated_at()</code> – the function to call</p>
</li>
</ul>
<h3 id="heading-step-4-test-it">Step 4 – Test It</h3>
<pre><code class="language-sql">INSERT INTO products (name, price) VALUES ('Wireless Keyboard', 49.99);

-- Wait a moment, then update the row
UPDATE products SET price = 44.99 WHERE name = 'Wireless Keyboard';

SELECT name, price, created_at, updated_at FROM products;
</code></pre>
<p>You'll see that <code>updated_at</code> has been automatically updated to the time of the UPDATE operation, even though you didn't explicitly set it in your query. That's the trigger doing its job.</p>
<h2 id="heading-before-vs-after-triggers">BEFORE vs AFTER Triggers</h2>
<p>The timing of a trigger determines when the function executes relative to the actual data change.</p>
<p><strong>BEFORE triggers</strong> run before the row is inserted, updated, or deleted. They are useful when you want to modify or validate the incoming data. Since the change hasn't been applied yet, you can alter the <code>NEW</code> row or even skip the operation for that row by returning <code>NULL</code>.</p>
<p><strong>AFTER triggers</strong> run after the row change has been applied to the table, but still inside the same transaction, before it commits. They are useful for side effects like logging, sending notifications, or updating related tables. At this point, the row has already been written, so you can't modify it – but you can read both <code>OLD</code> and <code>NEW</code> to see what changed, and an exception raised here still rolls back the whole statement.</p>
<p>Here's a rule of thumb: use BEFORE triggers when you need to change or reject data, and use AFTER triggers when you need to react to a completed change.</p>
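<p>As a minimal sketch of a BEFORE trigger that both changes and rejects rows, the function below trims whitespace from the incoming name and silently skips rows whose name ends up empty. The names <code>normalize_product_name</code> and <code>trigger_normalize_product_name</code> are illustrative and not part of the tutorial's schema:</p>
<pre><code class="language-sql">CREATE OR REPLACE FUNCTION normalize_product_name()
RETURNS TRIGGER AS $$
BEGIN
    -- Modify the incoming row before it is written
    NEW.name := btrim(NEW.name);

    -- Returning NULL from a BEFORE row-level trigger skips this row
    IF NEW.name = '' THEN
        RETURN NULL;
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_normalize_product_name
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION normalize_product_name();
</code></pre>
<p>With this in place, inserting a name such as <code>'  Desk Lamp  '</code> should store <code>'Desk Lamp'</code>, while inserting a name made only of spaces affects zero rows without raising an error. When a silent skip would surprise the caller, <code>RAISE EXCEPTION</code> is usually the clearer choice.</p>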
<h2 id="heading-how-to-build-an-audit-log-with-an-after-trigger">How to Build an Audit Log with an AFTER Trigger</h2>
<p>One of the most common uses for triggers is audit logging – keeping a record of every change made to an important table. Let's build one.</p>
<h3 id="heading-step-1-create-an-audit-table">Step 1 – Create an Audit Table</h3>
<pre><code class="language-sql">CREATE TABLE product_audit (
    audit_id SERIAL PRIMARY KEY,
    product_id INT NOT NULL,
    action VARCHAR(10) NOT NULL,
    old_price NUMERIC(10, 2),
    new_price NUMERIC(10, 2),
    changed_by TEXT DEFAULT current_user,
    changed_at TIMESTAMP DEFAULT NOW()
);
</code></pre>
<h3 id="heading-step-2-create-the-audit-trigger-function">Step 2 – Create the Audit Trigger Function</h3>
<pre><code class="language-sql">CREATE OR REPLACE FUNCTION log_product_changes()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        INSERT INTO product_audit (product_id, action, old_price, new_price)
        VALUES (OLD.id, 'UPDATE', OLD.price, NEW.price);
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO product_audit (product_id, action, old_price)
        VALUES (OLD.id, 'DELETE', OLD.price);
    ELSIF TG_OP = 'INSERT' THEN
        INSERT INTO product_audit (product_id, action, new_price)
        VALUES (NEW.id, 'INSERT', NEW.price);
    END IF;

    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;
</code></pre>
<p>There are a few important things happening here. The <code>TG_OP</code> variable is a special string that PostgreSQL provides inside trigger functions. It tells you which operation activated the trigger: <code>'INSERT'</code>, <code>'UPDATE'</code>, or <code>'DELETE'</code>. This lets you handle different operations with a single function.</p>
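<p>The <code>RETURN COALESCE(NEW, OLD)</code> at the end returns whichever row variable is set. For INSERT and UPDATE operations, <code>NEW</code> exists and is returned. For DELETE operations, <code>NEW</code> is null, so <code>OLD</code> is returned instead. PostgreSQL ignores the return value of an AFTER row-level trigger, so this line mainly keeps the function safe to reuse in a BEFORE trigger, where the return value does matter.</p>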
<h3 id="heading-step-3-attach-the-trigger">Step 3 – Attach the Trigger</h3>
<pre><code class="language-sql">CREATE TRIGGER trigger_product_audit
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW
EXECUTE FUNCTION log_product_changes();
</code></pre>
<p>Notice the <code>AFTER INSERT OR UPDATE OR DELETE</code> syntax. You can bind a single trigger to multiple events, which keeps your setup clean.</p>
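<p>If you only care about price changes, a narrower variant is possible. The sketch below assumes you'd use it <em>instead of</em> letting the trigger above handle updates (creating both would log each price change twice), and the trigger name is hypothetical. <code>UPDATE OF price</code> limits the trigger to statements that set the <code>price</code> column, and the <code>WHEN</code> condition skips rows whose price didn't actually change:</p>
<pre><code class="language-sql">CREATE TRIGGER trigger_product_price_audit
AFTER UPDATE OF price ON products
FOR EACH ROW
WHEN (OLD.price IS DISTINCT FROM NEW.price)
EXECUTE FUNCTION log_product_changes();
</code></pre>
<p>Because PostgreSQL checks the condition before calling the function, unchanged rows never pay the cost of the function call.</p>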
<h3 id="heading-step-4-test-it">Step 4 – Test It</h3>
<pre><code class="language-sql">-- Insert a new product
INSERT INTO products (name, price) VALUES ('USB-C Hub', 29.99);

-- Update the price
UPDATE products SET price = 24.99 WHERE name = 'USB-C Hub';

-- Delete the product
DELETE FROM products WHERE name = 'USB-C Hub';

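-- Check the audit log (audit_id preserves insertion order even when
-- all statements ran in one transaction and share the same NOW() value)
SELECT * FROM product_audit ORDER BY audit_id;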
</code></pre>
<p>You'll see three rows in <code>product_audit</code> (one for each operation) with the old and new prices recorded automatically. No application code needed.</p>
<h2 id="heading-how-to-use-a-before-trigger-for-validation">How to Use a BEFORE Trigger for Validation</h2>
<p>Triggers can also enforce business rules at the database level. Let's say you want to prevent any product from having a negative price.</p>
<pre><code class="language-sql">CREATE OR REPLACE FUNCTION prevent_negative_price()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.price &lt; 0 THEN
        RAISE EXCEPTION 'Product price cannot be negative. Got: %', NEW.price;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_check_price
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION prevent_negative_price();
</code></pre>
<p>Now test it:</p>
<pre><code class="language-sql">INSERT INTO products (name, price) VALUES ('Faulty Item', -10.00);
-- ERROR: Product price cannot be negative. Got: -10.00
</code></pre>
<p>The insert is rejected entirely. The row never makes it into the table. This is powerful because the rule is enforced at the database level regardless of which application or script sends the query.</p>
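<p>For a single-column rule like this one, a <code>CHECK</code> constraint is a simpler alternative worth knowing about. A sketch, using a hypothetical constraint name:</p>
<pre><code class="language-sql">-- Declarative equivalent of the trigger's rule
ALTER TABLE products
ADD CONSTRAINT products_price_non_negative CHECK (price &gt;= 0);
</code></pre>
<p>A trigger earns its keep when the rule needs something a constraint can't express, such as a custom error message or a lookup in another table.</p>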
<h2 id="heading-row-level-vs-statement-level-triggers">Row-Level vs Statement-Level Triggers</h2>
<p>All the triggers you've seen so far use <code>FOR EACH ROW</code>, which means the function runs once per affected row. If you update 100 rows in a single query, the trigger function runs 100 times.</p>
<p>PostgreSQL also supports <code>FOR EACH STATEMENT</code> triggers, which run once per SQL statement regardless of how many rows are affected.</p>
<pre><code class="language-sql">CREATE OR REPLACE FUNCTION log_bulk_update()
RETURNS TRIGGER AS $$
BEGIN
    RAISE NOTICE 'A bulk operation was performed on the products table';
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_bulk_update_notice
AFTER UPDATE ON products
FOR EACH STATEMENT
EXECUTE FUNCTION log_bulk_update();
</code></pre>
<p>Statement-level triggers are less common, but they're useful for operations like refreshing a materialized view or sending a single notification after a batch update instead of one notification per row.</p>
<p><strong>Important</strong>: in statement-level triggers, the <code>NEW</code> and <code>OLD</code> variables are not available because the trigger isn't tied to any specific row.</p>
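<p>Although <code>NEW</code> and <code>OLD</code> carry no row data at the statement level, PostgreSQL (version 10 and later) lets an AFTER statement-level trigger see the affected rows through transition tables declared with <code>REFERENCING</code>. A minimal sketch, where the function, trigger, and <code>updated_rows</code> names are all illustrative:</p>
<pre><code class="language-sql">CREATE OR REPLACE FUNCTION log_update_count()
RETURNS TRIGGER AS $$
DECLARE
    changed_count BIGINT;
BEGIN
    -- updated_rows is the transition table named in CREATE TRIGGER below
    SELECT count(*) INTO changed_count FROM updated_rows;
    RAISE NOTICE '% row(s) updated in products', changed_count;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_update_count
AFTER UPDATE ON products
REFERENCING NEW TABLE AS updated_rows
FOR EACH STATEMENT
EXECUTE FUNCTION log_update_count();
</code></pre>
<p>An <code>UPDATE</code> that touches several rows should then produce a single notice reporting how many rows changed, rather than one notice per row.</p>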
<h2 id="heading-the-new-and-old-variables-reference">The NEW and OLD Variables Reference</h2>
<p>Here's a quick reference for when <code>NEW</code> and <code>OLD</code> are available in row-level triggers:</p>
<table>
<thead>
<tr>
<th>Operation</th>
<th>OLD</th>
<th>NEW</th>
</tr>
</thead>
<tbody><tr>
<td>INSERT</td>
<td>Not available</td>
<td>Contains the new row</td>
</tr>
<tr>
<td>UPDATE</td>
<td>Contains the row before the change</td>
<td>Contains the row after the change</td>
</tr>
<tr>
<td>DELETE</td>
<td>Contains the deleted row</td>
<td>Not available</td>
</tr>
</tbody></table>
<p>Understanding when each variable is available will save you from runtime errors in your trigger functions.</p>
<h2 id="heading-how-to-manage-triggers">How to Manage Triggers</h2>
<p>As you add more triggers to your database, you'll need to know how to inspect, disable, and remove them.</p>
<h3 id="heading-how-to-list-all-triggers-on-a-table">How to List All Triggers on a Table</h3>
<pre><code class="language-sql">SELECT trigger_name, event_manipulation, action_timing
FROM information_schema.triggers
WHERE event_object_table = 'products';
</code></pre>
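<p>The <code>information_schema</code> view doesn't show whether a trigger is currently enabled. One way to check is to query PostgreSQL's <code>pg_trigger</code> system catalog directly, as sketched here:</p>
<pre><code class="language-sql">SELECT tgname AS trigger_name,
       tgenabled AS status   -- 'O' = enabled (default), 'D' = disabled
FROM pg_trigger
WHERE tgrelid = 'products'::regclass
  AND NOT tgisinternal;      -- hide internal constraint triggers
</code></pre>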
<h3 id="heading-how-to-disable-a-trigger-temporarily">How to Disable a Trigger Temporarily</h3>
<pre><code class="language-sql">-- Disable a specific trigger
ALTER TABLE products DISABLE TRIGGER trigger_product_audit;

-- Disable all triggers on a table
ALTER TABLE products DISABLE TRIGGER ALL;
</code></pre>
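<p>This is useful during bulk data migrations where you want to skip trigger execution for performance reasons. Be aware that <code>DISABLE TRIGGER ALL</code> also disables the internal triggers PostgreSQL uses to enforce foreign keys, so it requires superuser privileges. <code>DISABLE TRIGGER USER</code> disables only the triggers you created.</p>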
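<p>Here is a sketch of how a bulk load might look. It uses <code>DISABLE TRIGGER USER</code> (user-created triggers only) and wraps everything in a transaction, so a failed load rolls back the <code>ALTER TABLE</code> as well and the triggers stay enabled. The inserted rows are placeholder data:</p>
<pre><code class="language-sql">BEGIN;

ALTER TABLE products DISABLE TRIGGER USER;

-- Bulk load goes here (placeholder rows for illustration)
INSERT INTO products (name, price)
SELECT 'Sample Product ' || n, 9.99
FROM generate_series(1, 1000) AS n;

ALTER TABLE products ENABLE TRIGGER USER;

COMMIT;
</code></pre>
<p>Keep in mind that rows loaded this way bypass the audit log and the price check, so you'll want to validate the data some other way.</p>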
<h3 id="heading-how-to-re-enable-a-trigger">How to Re-Enable a Trigger</h3>
<pre><code class="language-sql">ALTER TABLE products ENABLE TRIGGER trigger_product_audit;
</code></pre>
<h3 id="heading-how-to-drop-a-trigger">How to Drop a Trigger</h3>
<pre><code class="language-sql">DROP TRIGGER IF EXISTS trigger_product_audit ON products;
</code></pre>
<p>Note that dropping a trigger does not drop the associated function. You'll need to drop the function separately if you no longer need it:</p>
<pre><code class="language-sql">DROP FUNCTION IF EXISTS log_product_changes();
</code></pre>
<h2 id="heading-when-to-use-triggers">When to Use Triggers</h2>
<p>Triggers work well for specific use cases. Here are the scenarios where they're a strong choice:</p>
<ul>
<li><p><strong>Audit logging</strong>: automatically recording who changed what and when, as you saw earlier in this tutorial.</p>
</li>
<li><p><strong>Derived data maintenance</strong>: keeping computed columns, counters, or summary tables in sync with the source data.</p>
</li>
<li><p><strong>Data validation</strong>: enforcing business rules that go beyond what CHECK constraints can express, like cross-table validations.</p>
</li>
<li><p><strong>Automatic timestamping</strong>: setting <code>created_at</code> and <code>updated_at</code> fields without relying on the application layer.</p>
</li>
</ul>
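<p>As one illustration of derived data maintenance, the sketch below keeps a running product count in a single-row summary table. The <code>product_stats</code> table and the function and trigger names are hypothetical:</p>
<pre><code class="language-sql">-- Summary table seeded with the current row count
CREATE TABLE product_stats (
    product_count BIGINT NOT NULL
);

INSERT INTO product_stats (product_count)
SELECT count(*) FROM products;

CREATE OR REPLACE FUNCTION maintain_product_count()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE product_stats SET product_count = product_count + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE product_stats SET product_count = product_count - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_maintain_product_count
AFTER INSERT OR DELETE ON products
FOR EACH ROW
EXECUTE FUNCTION maintain_product_count();
</code></pre>
<p>Two caveats apply to this sketch: <code>TRUNCATE</code> doesn't fire row-level triggers, so it would leave the count stale, and every insert or delete now updates the same summary row, which can become a bottleneck under heavy concurrent writes.</p>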
<h2 id="heading-when-to-avoid-triggers">When to Avoid Triggers</h2>
<p>Triggers are powerful, but they come with trade-offs. Here are cases where you should think twice before using them:</p>
<ul>
<li><p><strong>Complex business logic</strong>: if the logic involves calling external APIs, sending emails, or orchestrating multi-step workflows, it belongs in your application layer. Triggers should stay lightweight.</p>
</li>
<li><p><strong>Performance-sensitive bulk operations</strong>: row-level triggers on tables that frequently receive bulk inserts or updates can create significant overhead. If you're inserting millions of rows, those triggers fire millions of times.</p>
</li>
<li><p><strong>Cascading triggers</strong>: when one trigger's action fires another trigger, which fires another, debugging becomes extremely difficult. If you find yourself building a chain of triggers, reconsider the design.</p>
</li>
<li><p><strong>Logic that developers need to discover easily</strong>: triggers are sometimes called "hidden logic" because they execute automatically without appearing in application code. If your team frequently asks "why did this column change?" and the answer is always "there's a trigger," that's a sign the logic might be more discoverable if placed in your application layer or a stored procedure that's called explicitly.</p>
</li>
</ul>
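<p>If a trigger chain is unavoidable, PostgreSQL's built-in <code>pg_trigger_depth()</code> function can at least keep a trigger from firing as a side effect of another trigger. A sketch that reuses the audit function from earlier under a hypothetical trigger name (intended as a replacement for the original audit trigger, not an addition to it):</p>
<pre><code class="language-sql">CREATE TRIGGER trigger_product_audit_direct_only
AFTER UPDATE ON products
FOR EACH ROW
WHEN (pg_trigger_depth() = 0)  -- 0 means the statement was not issued from inside a trigger
EXECUTE FUNCTION log_product_changes();
</code></pre>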
<p>A good rule of thumb: if the logic is tightly coupled to the data and should always execute regardless of which client or service touches the table, a trigger is appropriate. If the logic depends on application context (like the current user's session, feature flags, or external state), it belongs in the application.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you learned what database triggers are and how they work in PostgreSQL. You built three practical triggers: an automatic timestamp updater, a full audit logging system, and a data validation guard. You also learned the difference between BEFORE and AFTER triggers, row-level and statement-level triggers, and when <code>NEW</code> and <code>OLD</code> variables are available.</p>
<p>Triggers are a powerful tool for keeping your data consistent and your business rules enforced at the database level. Use them for focused, data-centric operations, and keep the logic simple.</p>
<p>If you found this tutorial helpful, you can connect with me on <a href="https://linkedin.com/in/iyioladev">LinkedIn</a> and <a href="https://x.com/iyiola_dev_">X</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ An Introduction to Database System Design ]]>
                </title>
                <description>
                    <![CDATA[ These days, businesses and startups rely on well-designed databases to manage vast amounts of data. In domains like Healthcare, E-commerce, and Fintech/Banking, a solid database design ensures data in ]]>
                </description>
                <link>https://www.freecodecamp.org/news/an-introduction-to-database-system-design/</link>
                <guid isPermaLink="false">69c46ab210e664c5da06ac46</guid>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ DBMS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ database design ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Olasunkanmi Emmanuel Jesuferanmi ]]>
                </dc:creator>
                <pubDate>Wed, 25 Mar 2026 23:07:30 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/e0c4195f-9f09-45f4-99d1-977c02731b02.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>These days, businesses and startups rely on well-designed databases to manage vast amounts of data. In domains like Healthcare, E-commerce, and Fintech/Banking, a solid database design ensures data integrity, security, and accessibility.</p>
<p>In this article, we'll talk about what it takes to design a highly-functional database using some key best practices.</p>
<p>This article is aimed at developers and those looking to start a career in managing Databases. We'll discuss what a database actually is, the components of a Database System, what we mean by database design, the stages of database design, and what's involved in Database System Design.</p>
<h3 id="heading-table-of-contents">Table of Contents</h3>
<ol>
<li><p><a href="#heading-prerequisites-and-setup">Prerequisites and Setup</a></p>
<ul>
<li><p><a href="#heading-1-foundational-knowledge">1. Foundational Knowledge</a></p>
</li>
<li><p><a href="#heading-2-software-and-installation">2. Software and Installation</a></p>
</li>
<li><p><a href="#heading-3-verifying-your-setup">3. Verifying Your Setup</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-what-is-a-database">What is a Database?</a></p>
</li>
<li><p><a href="#heading-components-of-a-database-system">Components of a Database System</a></p>
</li>
<li><p><a href="#heading-types-of-database-systems">Types of Database Systems</a></p>
</li>
<li><p><a href="#heading-database-system-vs-database-management-system-dbms">Database System vs. DBMS</a></p>
</li>
<li><p><a href="#heading-characteristics-of-a-good-database">Characteristics of a Good Database</a></p>
</li>
<li><p><a href="#heading-stages-of-database-design">Stages of Database Design</a></p>
</li>
<li><p><a href="#heading-the-role-of-normalisation-in-database-design">The Role of Normalisation</a></p>
</li>
<li><p><a href="#heading-practical-designing-a-library-system">Practical: Designing a Library System</a></p>
<ul>
<li><p><a href="#heading-step-1-requirements--er-diagram">Step 1: Requirements &amp; ER Diagram</a></p>
</li>
<li><p><a href="#heading-step-2-normalization-in-action">Step 2: Normalization in Action</a></p>
</li>
<li><p><a href="#heading-step-3-implementation-sql">Step 3: Implementation (SQL)</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-prerequisites-and-setup">Prerequisites and Setup</h2>
<p>To get the most out of this guide, you should have the following foundational skills and tools ready. This will help ensure that you aren't just reading theory, but that you're actually building a functional system.</p>
<h3 id="heading-1-foundational-knowledge">1. Foundational Knowledge</h3>
<ul>
<li><p><strong>Data Types:</strong> You should be able to distinguish between basic data formats. In database design, choosing the wrong type can lead to storage waste or application errors.</p>
<ul>
<li><p><strong>Strings/Varchars:</strong> Textual data (for example, "John Doe", "123 Main St").</p>
</li>
<li><p><strong>Integers:</strong> Whole numbers used for math or unique IDs (for example, 10, 500).</p>
</li>
<li><p><strong>Floats/Decimals:</strong> Numbers with decimal points, usually for currency (for example, 19.99).</p>
</li>
<li><p><strong>Booleans:</strong> Simple True/False toggles (for example, <code>is_available</code>).</p>
</li>
</ul>
</li>
<li><p><strong>Logical Thinking:</strong> You should be comfortable identifying "entities." If you're building an app for a school, you'll need to recognize that "Students," "Teachers," and "Classrooms" are separate objects that must be linked via relationships.</p>
</li>
<li><p><strong>Terminal/CLI Basics:</strong> While we'll use visual tools, you should know how to open your Command Prompt (Windows) or Terminal (Mac/Linux) and understand that commands are often case-sensitive.</p>
</li>
</ul>
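<p>To make the data types listed above concrete, here is a small PostgreSQL sketch. The <code>menu_items</code> table and its columns are invented purely for illustration:</p>
<pre><code class="language-sql">CREATE TABLE menu_items (
    id INTEGER PRIMARY KEY,            -- integer: whole-number unique ID
    name VARCHAR(100) NOT NULL,        -- string: textual data
    price NUMERIC(6, 2) NOT NULL,      -- decimal: exact value, suited to currency
    is_available BOOLEAN DEFAULT TRUE  -- boolean: true/false toggle
);
</code></pre>
<p>For money, an exact type like <code>NUMERIC</code> is generally preferred over a floating-point type, because floats can introduce small rounding errors.</p>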
<h3 id="heading-2-software-and-installation">2. Software and Installation</h3>
<p>We'll use <strong>PostgreSQL</strong> (the database engine) and <strong>pgAdmin 4</strong> (the visual management tool).</p>
<ol>
<li><p><strong>Download:</strong> Visit the <a href="https://www.postgresql.org/download/">official PostgreSQL Downloads page</a> and select the installer for your operating system.</p>
</li>
<li><p><strong>Installation Wizard:</strong> Run the installer. When asked which components to include, make sure that PostgreSQL Server, pgAdmin 4, and Command Line Tools are all checked.</p>
</li>
<li><p><strong>The "Postgres" User:</strong> During setup, you will be prompted to create a password for the default "postgres" superuser. <strong>Note:</strong> Write this password down. You can't easily reset it, and you'll need it to access your data.</p>
</li>
<li><p><strong>Port Selection:</strong> The default port is <code>5432</code>. Keep this as the default unless you're an advanced user with a specific reason to change it.</p>
</li>
</ol>
<h3 id="heading-3-verifying-your-setup">3. Verifying Your Setup</h3>
<p>Before moving to the practical section, let's verify that everything is installed correctly:</p>
<ol>
<li><p>Open <strong>pgAdmin 4</strong> from your applications menu.</p>
</li>
<li><p>In the left-hand sidebar, click on <strong>Servers</strong>.</p>
</li>
<li><p>Enter the password you created for the "postgres" superuser during installation. (pgAdmin may first ask you to set a separate master password of its own.)</p>
</li>
<li><p>If you see "PostgreSQL [Version Number]" appear with a green icon, your database environment is successfully configured.</p>
</li>
</ol>
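<p>As an extra check, you could open the Query Tool in pgAdmin (or connect with <code>psql</code>) and run a one-line query. It should return a single row describing the installed PostgreSQL version:</p>
<pre><code class="language-sql">SELECT version();
</code></pre>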
<h2 id="heading-what-is-a-database">What is a Database?</h2>
<p>A Database is a collection of structured data, usually stored electronically in a computer. Databases are controlled and managed using a <strong>Database Management System (DBMS)</strong>. A DBMS is an application that constructs and maintains (and sometimes expands) Databases. Examples of DBMSs are IBM's DB2, Oracle Corporation's Oracle, Microsoft Access, and Microsoft's SQL Server.</p>
<p>We use databases every day, whether knowingly or unknowingly. And as a developer, you'll likely need to at least understand Database basics so you can effectively work with them.</p>
<p>It's also important for you to know how to design a scalable database, as well as be familiar with the environment in which the database will be housed (called, not surprisingly, a <strong>Database Environment</strong>). The hardware and operating system that house the database make up this Database Environment.</p>
<h3 id="heading-components-of-a-database-system">Components of a Database System</h3>
<p>A Database System is a computerized record-keeping system designed to store, manage, and retrieve data efficiently. It acts as a centralized repository that allows multiple users to access and manipulate data simultaneously while ensuring the integrity, security, and persistence of that information over time.</p>
<p>A Database System consists of four basic components:</p>
<h4 id="heading-1-hardware">1. Hardware</h4>
<p>This includes the secondary storage where the database resides, along with the other components needed to run it, such as hard disks, processors, and RAM. Since a database can span from a single workstation to a global mainframe, hardware selection is a priority. Proper investment in processing power and storage is essential to handling the projected user load and data volume.</p>
<h4 id="heading-2-software">2. Software</h4>
<p>The Database Management System (DBMS) is the software in charge of maintaining and managing databases. It's robust software that acts as an intermediary, shielding users from the complex hardware-level details of data storage. The DBMS handles data storage, retrieval, and processing. Examples are Microsoft's SQL Server, IBM's DB2, and Oracle.</p>
<h4 id="heading-3-data">3. Data</h4>
<p>Data serves as the bridge connecting the machine components (hardware and software) to the human users. In a database system, data is organized into two main types:</p>
<ul>
<li><p><strong>User Data:</strong> The actual structured information stored in tables, made up of columns (attributes) and rows (records).</p>
</li>
<li><p><strong>Metadata:</strong> Often defined as "data about data," metadata is stored in system tables and describes the actual structure of the database, such as the number of tables, field names, and defined primary keys.</p>
</li>
</ul>
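<p>In PostgreSQL, one place to see metadata is the standard <code>information_schema</code>. The sketch below lists the columns of every table in the default <code>public</code> schema. On a fresh install it may return no rows until you create a table:</p>
<pre><code class="language-sql">SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
</code></pre>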
<h4 id="heading-4-users">4. Users</h4>
<p>These are the people who interact with the database to carry out their business responsibilities. Users generally fall into three distinct categories:</p>
<ul>
<li><p><strong>Database Administrators (DBAs):</strong> Technical experts who hold central responsibility for the database. They monitor performance, define security and integrity checks, and establish backup and recovery strategies.</p>
</li>
<li><p><strong>Database Designers/Programmers:</strong> The engineers who actually write the code and use the DBMS to create the database's logical structure.</p>
</li>
<li><p><strong>End Users:</strong> The everyday people who access the database using query languages or simple menu-driven application interfaces.</p>
</li>
</ul>
<h2 id="heading-types-of-database-systems">Types of Database Systems</h2>
<p>It's important to know that not all databases store data in the same way. The choice of database depends on the specific needs of the application. The primary types include:</p>
<h3 id="heading-hierarchical-and-network-databases">Hierarchical and Network Databases</h3>
<p>These are older, legacy models. Hierarchical databases structure data in a tree-like, parent-child format where a child can only have one parent. Network databases improved on this by allowing a graph-like structure where records can have multiple parent and child relationships, making it easier to model complex associations.</p>
<h3 id="heading-relational-databases-rdbms">Relational Databases (RDBMS)</h3>
<p>The most widely used type today. They organize data into structured tables consisting of rows and columns. These tables are linked using primary and foreign keys, and they use Structured Query Language (SQL) for operations. They are ideal for applications requiring strong consistency, like banking systems.</p>
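<p>As a rough sketch of how tables are linked by keys, consider the two hypothetical banking tables below. The table and column names are assumptions made for this example:</p>
<pre><code class="language-sql">CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,   -- primary key
    full_name VARCHAR(100) NOT NULL
);

CREATE TABLE accounts (
    account_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers (customer_id),  -- foreign key
    balance NUMERIC(12, 2) NOT NULL DEFAULT 0
);

-- Combine the two tables through the key relationship
SELECT c.full_name, a.account_id, a.balance
FROM accounts AS a
JOIN customers AS c ON c.customer_id = a.customer_id;
</code></pre>
<p>The foreign key means the database itself refuses an account that points to a customer who doesn't exist, which is part of what "strong consistency" looks like in practice.</p>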
<h3 id="heading-object-oriented-databases-oodbms">Object-Oriented Databases (OODBMS)</h3>
<p>These combine database capabilities with the principles of object-oriented programming, as found in languages like Java or C++. Data is stored as "objects" that contain both the data and the methods (functions) that operate on it, making them great for complex data like multimedia or engineering designs.</p>
<h3 id="heading-nosql-databases">NoSQL Databases</h3>
<p>Designed to handle large volumes of unstructured or semi-structured data. Unlike relational databases, they don't rely on rigid table structures and are highly scalable. Types of NoSQL include document stores (for example, MongoDB), key-value stores (for example, Redis), column-family stores, and graph databases.</p>
<h3 id="heading-cloud-and-distributed-databases">Cloud and Distributed Databases</h3>
<p>Cloud databases are hosted on cloud platforms (like AWS or Microsoft Azure) and offer elasticity, scalability, and cost-efficiency (pay-as-you-go).</p>
<p>Distributed databases store data across multiple physical locations but function as a single unified system to the user, providing high availability and fault tolerance.</p>
<h3 id="heading-database-system-vs-database-management-system-dbms">Database System vs. Database Management System (DBMS)</h3>
<p>People often use "Database System" and "DBMS" interchangeably, but there is a distinct difference:</p>
<ul>
<li><p><strong>Database Management System (DBMS):</strong> This is purely the <em>software</em> that helps users interact with the database. It handles data storage, retrieval, security, and concurrency control. Examples include MySQL, PostgreSQL, and Oracle DB.</p>
</li>
<li><p><strong>Database System:</strong> This is the <em>broader concept</em> that encompasses the entire setup. It includes the actual database (where data is stored), the DBMS software, the physical hardware, the network, and the users interacting with it.</p>
</li>
</ul>
<h2 id="heading-characteristics-of-a-good-database">Characteristics of a Good Database</h2>
<p>To make sure that your database design is successful, it should exhibit several core characteristics:</p>
<ul>
<li><p><strong>Data integrity and consistency:</strong> Ensuring data is accurate, reliable, and uniform across the entire system.</p>
</li>
<li><p><strong>Data security:</strong> Protecting sensitive information from unauthorized access and potential breaches.</p>
</li>
<li><p><strong>Scalability and performance:</strong> The ability to handle increasing amounts of data and users efficiently while providing fast query processing.</p>
</li>
<li><p><strong>Redundancy management (normalization):</strong> Avoiding unnecessary duplication of data to save storage space and prevent errors during updates.</p>
</li>
<li><p><strong>Concurrency control:</strong> Allowing multiple users to access and modify data simultaneously without causing conflicts or data corruption.</p>
</li>
<li><p><strong>Backup and recovery:</strong> Supporting robust mechanisms to recover data in the event of hardware or system failures.</p>
</li>
</ul>
<h2 id="heading-stages-of-database-design">Stages of Database Design</h2>
<p>Database design is a structured process consisting of several stages to ensure that data is efficiently stored, accessed, and managed. There are four key stages in this process:</p>
<h3 id="heading-requirements-analysis">Requirements Analysis</h3>
<p>This is the foundational stage where designers gather and analyse the specific needs of users and the business. It involves identifying the overall purpose of the database, understanding data requirements, defining key entities and attributes, and establishing both functional and non-functional requirements.</p>
<h3 id="heading-conceptual-design">Conceptual Design</h3>
<p>In this phase, a high-level visual blueprint of the database is created, which is independent of any specific software implementation. This makes it easy for non-technical stakeholders to understand.</p>
<p>Designers typically use <a href="https://www.freecodecamp.org/news/how-to-make-flowcharts-with-mermaid/"><strong>Entity-Relationship (ER) models</strong></a> or <a href="https://www.freecodecamp.org/news/uml-diagrams-full-course/">UML diagrams</a> to identify entities, map out relationships, and define constraints like primary keys.</p>
<h3 id="heading-logical-design">Logical Design</h3>
<p>This stage involves translating the conceptual blueprint into a logical model that aligns with a specific type of Database Management System (DBMS), such as a relational or NoSQL system.</p>
<p>Key steps include converting the ER diagram into relational schemas (tables and columns), defining foreign and primary keys, and normalising the database to remove anomalies and reduce data redundancy.</p>
<h3 id="heading-physical-design">Physical Design</h3>
<p>The final stage translates the logical model into an actual physical structure optimized for high performance and efficient storage. Activities here include selecting the specific DBMS, establishing indexing strategies to speed up data retrieval, defining access paths, and configuring essential security policies and backup mechanisms.</p>
<h2 id="heading-the-role-of-normalisation-in-database-design">The Role of Normalisation in Database Design</h2>
<p>When building a relational database, one of the most critical steps in the logical design phase is a process called <strong>Normalisation</strong>.</p>
<p>Normalisation is a systematic approach to organising data to minimise redundancy (duplicate data) and improve overall data integrity. Essentially, it involves taking large, clunky tables and decomposing them into smaller, more focused tables, then linking them together using defined relationships.</p>
<h3 id="heading-why-is-normalisation-important">Why is Normalisation Important?</h3>
<p>A poorly designed database often suffers from errors that happen when you try to insert, update, or delete data. For example, if a teacher's phone number is stored in multiple places, updating it in one row but forgetting another creates an update anomaly. Normalisation solves this by ensuring each piece of information is stored in only one place.</p>
<p>The main objectives of normalisation are:</p>
<ul>
<li><p><strong>Eliminating redundancy:</strong> By reducing duplicate data, you save valuable storage space and keep your data consistent.</p>
</li>
<li><p><strong>Avoiding anomalies:</strong> It prevents data corruption that arises during insertion, updating, or deletion.</p>
</li>
<li><p><strong>Ensuring data integrity:</strong> It maintains the accuracy and reliability of the data across the entire database.</p>
</li>
<li><p><strong>Enhancing query performance:</strong> Organising the data efficiently helps optimise how data is retrieved and updated.</p>
</li>
</ul>
<h3 id="heading-stages-of-normalisation">Stages of Normalisation</h3>
<p>Normalisation happens in sequential stages known as <strong>Normal Forms (NFs)</strong>, where each stage builds upon the rules of the previous one to further refine the database structure.</p>
<p>For beginners, the first three forms are the most important to understand. Boyce-Codd Normal Form is included at the end for completeness:</p>
<ul>
<li><p><strong>First Normal Form (1NF):</strong> This stage ensures "atomicity". This means that every column in a table should hold a single, indivisible value, and duplicate columns are eliminated. For instance, you would not store two different phone numbers in a single "Phone" cell; you would separate them.</p>
</li>
<li><p><strong>Second Normal Form (2NF):</strong> To achieve 2NF, the table must first be in 1NF. Then, every non-key attribute (the regular data columns) must depend on the entire primary key, not just part of it, which matters when the key is made up of more than one column. This often involves creating separate tables for distinct entities, like putting "Courses" into their own table rather than mixing them with "Student" details.</p>
</li>
<li><p><strong>Third Normal Form (3NF):</strong> A table in 3NF is already in 2NF and has removed all "transitive dependencies". This means that a non-key column shouldn't depend on another non-key column. For example, if a table has an "Instructor Name" and "Instructor Phone," those details should live in a dedicated "Instructor" table, not inside a "Course" table.</p>
</li>
<li><p><strong>Boyce-Codd Normal Form (BCNF):</strong> This is a stricter version of 3NF used to resolve any remaining, complex anomalies.</p>
</li>
</ul>
<h3 id="heading-finding-the-right-balance">Finding the Right Balance</h3>
<p>While normalisation is crucial for maintaining data consistency, it's important to strike a balance. Minimising redundancy is great, but excessive normalisation creates dozens of tiny tables. When you need to retrieve a complete record, the database system has to piece all those tables back together (using complex queries called "joins"), which can slow down performance.</p>
<p>So the ultimate goal of a good database designer is to find the sweet spot between a highly normalised structure and efficient query performance.</p>
<h2 id="heading-practical-designing-a-library-system">Practical: Designing a Library System</h2>
<p>To move from theory to practice, let’s build a database for a small local library. We'll go through the design stages to ensure the data is structured efficiently.</p>
<h3 id="heading-step-1-requirements-amp-er-diagram">Step 1: Requirements &amp; ER Diagram</h3>
<p>First, we'll identify what our library needs to track. We have three main entities:</p>
<ul>
<li><p><strong>Authors:</strong> The writers of the books.</p>
</li>
<li><p><strong>Books:</strong> The actual items available for loan.</p>
</li>
<li><p><strong>Members:</strong> The people who borrow the books.</p>
</li>
</ul>
<p><strong>The Relationships:</strong></p>
<ul>
<li><p>One <strong>Author</strong> can write many <strong>Books</strong> (One-to-Many).</p>
</li>
<li><p>One <strong>Member</strong> can borrow many <strong>Books</strong> (One-to-Many).</p>
</li>
</ul>
<p>Here's the ER diagram I've created for this example:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5f1de20cf4016901885ccc17/fde8b5a3-c546-4426-b5e3-9a6901c9553d.png" alt="ER diagram" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

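<h3 id="heading-step-2-normalization-in-action">Step 2: Normalisation in Action</h3>
<p>To ensure our database is "well-designed" and free of redundancy, we'll apply the normalisation rules discussed earlier. Instead of one giant spreadsheet, we split the data into three distinct tables. To keep the example small, the borrowing relationship between Members and Books is not modelled here; tracking loans would need an additional table or foreign key.</p>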
<ol>
<li><p><strong>Authors Table:</strong></p>
<ul>
<li><p><code>author_id</code> (Primary Key)</p>
</li>
<li><p><code>author_name</code></p>
</li>
</ul>
</li>
<li><p><strong>Books Table:</strong></p>
<ul>
<li><p><code>book_id</code> (Primary Key)</p>
</li>
<li><p><code>title</code></p>
</li>
<li><p><code>isbn</code></p>
</li>
<li><p><code>author_id</code> (Foreign Key linking to the Authors table)</p>
</li>
</ul>
</li>
<li><p><strong>Members Table:</strong></p>
<ul>
<li><p><code>member_id</code> (Primary Key)</p>
</li>
<li><p><code>first_name</code></p>
</li>
<li><p><code>last_name</code></p>
</li>
<li><p><code>email</code> (Unique constraint)</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-step-3-implementation-sql">Step 3: Implementation (SQL)</h3>
<p>Now, let’s use the <strong>PostgreSQL Query Tool</strong> in pgAdmin 4 to actually create these tables and insert some dummy data.</p>
<pre><code class="language-sql">-- 1. Create the Authors table
CREATE TABLE Authors (
    author_id SERIAL PRIMARY KEY,
    author_name VARCHAR(100) NOT NULL
);

-- 2. Create the Books table with a relationship to Authors
CREATE TABLE Books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    isbn VARCHAR(20) UNIQUE,
    author_id INT REFERENCES Authors(author_id)
);

-- 3. Create the Members table
CREATE TABLE Members (
    member_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE NOT NULL
);

-- 4. Insert dummy data to test the design
INSERT INTO Authors (author_name) 
VALUES ('J.R.R. Tolkien'), ('George R.R. Martin');

INSERT INTO Books (title, isbn, author_id) 
VALUES ('The Hobbit', '978-0261102217', 1), 
       ('A Game of Thrones', '978-0553103540', 2);
</code></pre>
<h4 id="heading-understanding-the-schema-design">Understanding the Schema Design:</h4>
<p>By running the SQL script above, you’ve successfully transitioned from a logical design to a physical database. Here is a breakdown of the key concepts we applied:</p>
<ul>
<li><p><strong>Primary Keys (PK):</strong> Using <code>SERIAL PRIMARY KEY</code> automatically creates a unique, incrementing ID for every new entry. This ensures no two authors or books are ever confused by the system.</p>
</li>
<li><p><strong>Foreign Keys (FK):</strong> The <code>REFERENCES Authors(author_id)</code> clause is where the "Relational" part of a Relational Database happens. It tells the <code>Books</code> table that any <code>author_id</code> it stores must match an existing ID in the <code>Authors</code> table, preventing "orphan" books that point to a non-existent author. Because the column is not declared <code>NOT NULL</code>, a book can still be saved with no author at all.</p>
</li>
<li><p><strong>Constraints:</strong> By using <code>UNIQUE</code> on the <code>isbn</code> and <code>email</code> columns, we've told the database to reject duplicate values in those columns, which protects data integrity.</p>
</li>
</ul>
<h3 id="heading-how-to-fetch-your-data">How to Fetch Your Data</h3>
<p>Now that the data is stored, you need to know how to get it back out. In SQL, we can do this using the <code>SELECT</code> statement.</p>
<h4 id="heading-1-see-everything-in-a-table">1. See Everything in a Table</h4>
<p>To see all books currently in the library:</p>
<pre><code class="language-sql">SELECT * FROM Books;
</code></pre>
<h4 id="heading-2-filtering-results">2. Filtering Results</h4>
<p>Often, you don't want every single row. You can use the <code>WHERE</code> clause to filter for specific data. For example, to find an author by their exact name:</p>
<pre><code class="language-sql">SELECT * FROM Authors 
WHERE author_name = 'J.R.R. Tolkien';
</code></pre>
<h4 id="heading-3-joining-tables">3. Joining Tables</h4>
<p>In a normalized database, information is spread across tables. To see a list of book titles alongside their actual author names (instead of just an ID number), you use a <code>JOIN</code>.</p>
<pre><code class="language-sql">SELECT Books.title, Authors.author_name
FROM Books
JOIN Authors ON Books.author_id = Authors.author_id;
</code></pre>
<p>This query tells the database to "give me the titles from the Books table and the names from the Authors table where the author_id matches in both." This lets you store data in separate tables while still viewing it as a single, comprehensive report.</p>
<p>This ability to link information across tables is what makes Relational Databases the industry standard for most business applications. But while the relational model is powerful, it isn't the only way to store data. Depending on whether you're handling social media connections, real-time sensor data, or simple document storage, you might need a different architectural approach.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Designing a database is much more than simply throwing information into a computer. It's the process of building a robust, efficient, and secure foundation for decision-making and business operations.</p>
<p>As we have explored here, a successful database relies on a carefully orchestrated ecosystem of hardware, software (the DBMS), data, and users.</p>
<p>By following the four stages of design – Requirements Analysis, Conceptual Design, Logical Design, and Physical Design – you can avoid costly structural mistakes and ensure that your system perfectly aligns with user needs.</p>
<p>Applying critical techniques like normalisation during this process helps keep your data consistent, accurate, and free from frustrating anomalies.</p>
<p>Also, as the digital landscape continues to evolve, mastering these foundational concepts is your stepping stone into the future. Traditional relational databases remain incredibly powerful, but modern data demands are rapidly driving the adoption of cloud-based, AI-powered, and serverless database systems.</p>
<p>A well-designed system today must not only focus on data integrity and query performance but also prioritise scalability and stringent data security to protect against modern cyber threats.</p>
<p>Whether you are building a simple address book or architecting a backend for the next big application, keeping these core principles of database system design in mind will empower you to create resilient, high-performing, and future-proof data solutions.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Implement the Outbox Pattern in Go and PostgreSQL ]]>
                </title>
                <description>
                    <![CDATA[ In event-driven systems, two things need to happen when you process a request: you need to save data to your database, and you need to publish an event to a message broker so other services know somet ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-implement-the-outbox-pattern-in-go-and-postgresql/</link>
                <guid isPermaLink="false">69bc31b3b238fd45a31f1291</guid>
                
                    <category>
                        <![CDATA[ PostgreSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ golang ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Alex Pliutau ]]>
                </dc:creator>
                <pubDate>Thu, 19 Mar 2026 17:26:11 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/7a24b5a7-6619-4997-b24c-c4a743f37c33.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In event-driven systems, two things need to happen when you process a request: you need to save data to your database, and you need to publish an event to a message broker so other services know something changed.</p>
<p>These two operations look simple, but they hide a dangerous reliability problem. What if the database write succeeds but the message broker is temporarily unreachable? Or your service crashes between the two steps? You end up in an inconsistent state: your database has the new data, but the rest of the system never heard about it.</p>
<p>The <strong>Outbox Pattern</strong> is a well-established solution to this problem. In this tutorial, you'll learn what the pattern is, why it works, and how to implement it in Go with PostgreSQL and Google Cloud Pub/Sub.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before reading this tutorial, you should be familiar with:</p>
<ul>
<li><p>The basics of the Go programming language</p>
</li>
<li><p>SQL and PostgreSQL</p>
</li>
<li><p>The concept of database transactions</p>
</li>
<li><p>Basic familiarity with event-driven or distributed systems (helpful but not required)</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a href="#heading-the-problem-two-operations-no-atomicity">The Problem: Two Operations, No Atomicity</a></p>
</li>
<li><p><a href="#heading-how-the-outbox-pattern-works">How the Outbox Pattern Works</a></p>
</li>
<li><p><a href="#heading-the-outbox-table-schema">The Outbox Table Schema</a></p>
</li>
<li><p><a href="#heading-the-message-relay">The Message Relay</a></p>
</li>
<li><p><a href="#heading-go-and-postgresql-implementation">Go and PostgreSQL Implementation</a></p>
<ul>
<li><p><a href="#heading-the-orders-service">The Orders Service</a></p>
</li>
<li><p><a href="#heading-the-relay-service">The Relay Service</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-why-messages-can-be-delivered-more-than-once">Why Messages Can Be Delivered More Than Once</a></p>
</li>
<li><p><a href="#heading-alternative-postgresql-logical-replication">Alternative: PostgreSQL Logical Replication</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-the-problem-two-operations-no-atomicity">The Problem: Two Operations, No Atomicity</h2>
<p>To understand why the Outbox Pattern exists, you need to understand a core challenge in distributed systems: <strong>atomicity across different systems</strong>.</p>
<p>In a relational database, a <strong>transaction</strong> lets you group multiple operations so they either all succeed or all fail together. If you insert a row and update another row in the same transaction, you're guaranteed that both happen – or neither does.</p>
<p>The problem arises when you try to extend this guarantee <em>across two different systems:</em> for example, your database and your message broker (like Kafka, RabbitMQ, or Pub/Sub). These systems don't share a transaction boundary.</p>
<p>Here's a typical event-driven flow that breaks without the Outbox Pattern:</p>
<ol>
<li><p>A user places an order.</p>
</li>
<li><p>Your service saves the order to the database ✅</p>
</li>
<li><p>Your service publishes an <code>order.created</code> event to the message broker ❌ (broker is down)</p>
</li>
<li><p>The order exists in the database, but downstream services never learned about it.</p>
</li>
</ol>
<p>Or the reverse failure:</p>
<ol>
<li><p>Your service publishes the event first ✅</p>
</li>
<li><p>Your service tries to save the order to the database ❌ (database times out)</p>
</li>
<li><p>Downstream services received a notification for an order that doesn't exist.</p>
</li>
</ol>
<p>Either scenario leaves your system in an inconsistent state. This is the core problem the Outbox Pattern solves.</p>
<p>Here's what the process looks like when not using the Outbox Pattern:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5ea89c91fdc930d846b413ab/9f9abcaa-adc8-48ab-b8cb-c47cb731724e.png" alt="diagram without outbox" style="display:block;margin:0 auto" width="2558" height="1062" loading="lazy">

<h2 id="heading-how-the-outbox-pattern-works">How the Outbox Pattern Works</h2>
<p>The Outbox Pattern solves the atomicity problem by keeping both operations <em>inside</em> the database:</p>
<ol>
<li><p>Your service saves the business data (for example, a new order) to your database.</p>
</li>
<li><p>In the same database transaction, it writes the event message to a special table called the outbox table.</p>
</li>
<li><p>A separate background process called the Message Relay polls the outbox table and publishes pending messages to the broker.</p>
</li>
<li><p>Once the broker confirms receipt, the relay marks the message as processed.</p>
</li>
</ol>
<p>Because steps 1 and 2 happen in the same database transaction, they are <strong>atomic</strong>. Either both succeed or neither does. You can never end up with saved data but no corresponding event queued – or an event queued for data that was never saved.</p>
<p>The message is never published directly to the broker in your main application code. Instead, the database acts as a reliable staging area.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5ea89c91fdc930d846b413ab/ef5413b0-6c8e-42b8-949f-5eefe3844231.png" alt="diagram with outbox" style="display:block;margin:0 auto" width="2478" height="1216" loading="lazy">

<h2 id="heading-the-outbox-table-schema">The Outbox Table Schema</h2>
<p>The outbox table stores pending messages until the relay picks them up. Here's a typical PostgreSQL schema:</p>
<pre><code class="language-sql">CREATE TABLE outbox (
    id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    topic       varchar(255)  NOT NULL,
    message     jsonb         NOT NULL,
    state       varchar(50)   NOT NULL DEFAULT 'pending',
    created_at  timestamptz   NOT NULL DEFAULT now(),
    processed_at timestamptz
);
</code></pre>
<p>Let's walk through each column:</p>
<ul>
<li><p><code>id</code>: A unique identifier for each message. Using UUIDs makes it easy to reference specific messages.</p>
</li>
<li><p><code>topic</code>: The destination topic or queue name in your message broker (for example, <code>orders.created</code>).</p>
</li>
<li><p><code>message</code>: The event payload, stored as JSON. This is the data your consumers will receive.</p>
</li>
<li><p><code>state</code>: Tracks whether the message has been sent. The two main values are <code>pending</code> (waiting to be published) and <code>processed</code> (successfully published).</p>
</li>
<li><p><code>created_at</code>: When the message was inserted. The relay sorts on this column so that older messages are picked up first.</p>
</li>
<li><p><code>processed_at</code>: When the relay successfully published the message.</p>
</li>
</ul>
<p>You may want to add additional columns depending on your needs: for example, a <code>retry_count</code> column to track how many times the relay has attempted to send a message, or an <code>error</code> column to log failure reasons.</p>
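<p>As a rough sketch of how those extra columns might be used, the Go snippet below updates an in-memory copy of a row after a failed publish attempt. The <code>outboxRow</code> type, the <code>maxRetries</code> limit, and the <code>failed</code> state are all assumptions made for illustration; the schema above defines none of them.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// outboxRow is a hypothetical in-memory view of an outbox row that has been
// extended with the optional retry_count and error columns.
type outboxRow struct {
	State      string
	RetryCount int
	LastError  string
}

// maxRetries is an assumed limit; pick a value that suits your system.
const maxRetries = 3

// errBrokerTimeout is a sample failure used by the demo in main.
var errBrokerTimeout = errors.New("publish timed out")

// recordFailure returns a copy of the row with its bookkeeping updated after
// a failed publish. Parking the row in a "failed" state once maxRetries is
// reached is an assumption of this sketch, not part of the pattern itself.
func recordFailure(row outboxRow, err error) outboxRow {
	row.RetryCount++
	row.LastError = err.Error()
	if row.RetryCount >= maxRetries {
		row.State = "failed"
	}
	return row
}

func main() {
	row := outboxRow{State: "pending"}
	// Keep failing until the row is no longer eligible for retries.
	for row.State == "pending" {
		row = recordFailure(row, errBrokerTimeout)
		fmt.Println(row.State, row.RetryCount, row.LastError)
	}
}
```

<p>In a real relay, the updated values would be written back with an <code>UPDATE</code>, and the polling query would continue to select only rows that are still <code>pending</code>.</p>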
<h2 id="heading-the-message-relay">The Message Relay</h2>
<p>The Message Relay is a background process (often a goroutine, a sidecar, or a separate service) that bridges the outbox table and the message broker.</p>
<p>Its responsibilities are:</p>
<ol>
<li><p>Periodically query the outbox table for messages with <code>state = 'pending'</code>.</p>
</li>
<li><p>Publish each message to the appropriate topic in the broker.</p>
</li>
<li><p>Once the broker confirms delivery, update the row's <code>state</code> to <code>'processed'</code>.</p>
</li>
<li><p>Handle failures gracefully: if publishing fails, leave the message as <code>'pending'</code> so it will be retried.</p>
</li>
</ol>
<p>This design gives you <strong>at-least-once delivery</strong>: a message will always be sent, even if the relay crashes and restarts. The trade-off is that a message might occasionally be sent more than once (more on this below), so your consumers should handle duplicates.</p>
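<p>Before looking at the real implementation, the relay's responsibilities can be sketched with plain in-memory data. This is only a simulation under stated assumptions: <code>outboxRow</code>, <code>publishFunc</code>, and <code>relayOnce</code> are hypothetical names, the slice stands in for the outbox table, and the function value stands in for the broker client.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// outboxRow is a hypothetical in-memory stand-in for one outbox table row.
type outboxRow struct {
	ID    int
	Topic string
	State string // "pending" or "processed"
}

// publishFunc is a hypothetical stand-in for the message broker client.
type publishFunc func(row outboxRow) error

// errBrokerDown is the failure returned by the simulated broker outage.
var errBrokerDown = errors.New("broker unreachable")

// relayOnce performs one polling pass: it publishes every pending row and
// marks a row processed only after its publish call succeeds.
// It returns how many rows were published in this pass.
func relayOnce(rows []outboxRow, publish publishFunc) int {
	published := 0
	for i := range rows {
		if rows[i].State != "pending" {
			continue
		}
		if err := publish(rows[i]); err != nil {
			// Leave the row pending so the next pass retries it.
			continue
		}
		rows[i].State = "processed"
		published++
	}
	return published
}

func main() {
	rows := []outboxRow{
		{ID: 1, Topic: "orders.created", State: "pending"},
		{ID: 2, Topic: "orders.created", State: "pending"},
	}

	// Simulate a broker that is unreachable during the first pass.
	brokerUp := false
	publish := func(row outboxRow) error {
		if !brokerUp {
			return errBrokerDown
		}
		return nil
	}

	fmt.Println("first pass published:", relayOnce(rows, publish))
	brokerUp = true
	fmt.Println("second pass published:", relayOnce(rows, publish))
}
```

<p>Nothing is lost while the broker is down: the rows simply wait in the <code>pending</code> state and go out on a later pass.</p>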
<h2 id="heading-go-and-postgresql-implementation">Go and PostgreSQL Implementation</h2>
<p>Let's build a concrete example. Imagine you have an orders service. When a new order is created, you want to:</p>
<ol>
<li><p>Save the order to a PostgreSQL <code>orders</code> table.</p>
</li>
<li><p>Publish an <code>order.created</code> event to Google Cloud Pub/Sub.</p>
</li>
</ol>
<p>You'll use <a href="https://github.com/jackc/pgx">pgx</a> for the PostgreSQL driver.</p>
<h3 id="heading-the-orders-service">The Orders Service</h3>
<p>The key insight is that the order insert and the outbox insert happen <strong>inside the same transaction</strong>. If anything goes wrong, both are rolled back.</p>
<pre><code class="language-go">// orders/main.go

package main

import (
	"context"
	"encoding/json"
	"log"
	"os"

	"github.com/google/uuid"
	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// Order represents a customer order in our system.
type Order struct {
	ID       uuid.UUID `json:"id"`
	Product  string    `json:"product"`
	Quantity int       `json:"quantity"`
}

// OrderCreatedEvent is the payload published to the message broker.
// It contains only the fields that downstream services need to know about.
type OrderCreatedEvent struct {
	OrderID uuid.UUID `json:"order_id"`
	Product string    `json:"product"`
}

// createOrderInTx saves a new order and its outbox event atomically.
// Both operations share the same transaction (tx), so either both succeed
// or both are rolled back — ensuring consistency.
func createOrderInTx(ctx context.Context, tx pgx.Tx, order Order) error {
	// Step 1: Insert the business data (the actual order).
	_, err := tx.Exec(ctx,
		"INSERT INTO orders (id, product, quantity) VALUES ($1, $2, $3)",
		order.ID, order.Product, order.Quantity,
	)
	if err != nil {
		return err
	}
	log.Printf("Inserted order %s into database", order.ID)

	// Step 2: Serialize the event payload that consumers will receive.
	event := OrderCreatedEvent{
		OrderID: order.ID,
		Product: order.Product,
	}
	msg, err := json.Marshal(event)
	if err != nil {
		return err
	}

	// Step 3: Write the event to the outbox table.
	// This does NOT publish to Pub/Sub — it just queues it for the relay.
	_, err = tx.Exec(ctx,
		"INSERT INTO outbox (topic, message) VALUES ($1, $2)",
		"orders.created", msg,
	)
	if err != nil {
		return err
	}
	log.Printf("Inserted outbox event for order %s", order.ID)

	return nil
}

func main() {
	ctx := context.Background()

	pool, err := pgxpool.New(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Unable to connect to database: %v", err)
	}
	defer pool.Close()

	// Begin a transaction that will cover both the order insert
	// and the outbox insert.
	tx, err := pool.Begin(ctx)
	if err != nil {
		log.Fatalf("Unable to begin transaction: %v", err)
	}
	// The deferred Rollback undoes the transaction if anything below fails;
	// after a successful Commit it is a no-op.
	defer tx.Rollback(ctx)

	newOrder := Order{
		ID:       uuid.New(),
		Product:  "Super Widget",
		Quantity: 10,
	}

	if err := createOrderInTx(ctx, tx, newOrder); err != nil {
		log.Fatalf("Failed to create order: %v", err)
	}

	// Committing the transaction makes both writes permanent simultaneously.
	if err := tx.Commit(ctx); err != nil {
		log.Fatalf("Failed to commit transaction: %v", err)
	}

	log.Println("Successfully created order and queued outbox event.")
}
</code></pre>
<p>Notice that <code>createOrderInTx</code> receives a <code>pgx.Tx</code> (a transaction) rather than a pool connection. This is intentional: it enforces that the caller is responsible for managing the transaction boundary, making the atomicity guarantee explicit.</p>
<h3 id="heading-the-relay-service">The Relay Service</h3>
<p>The relay runs as a separate background process. It polls the outbox table, publishes messages, and marks them as processed.</p>
<p>A critical detail here is the use of <code>FOR UPDATE SKIP LOCKED</code> in the SQL query. This PostgreSQL feature lets you run <strong>multiple relay instances</strong> concurrently without them stepping on each other. When one instance locks a row to process it, other instances skip that row and move on to the next one.</p>
<pre><code class="language-go">// relay/main.go

package main

import (
	"context"
	"errors"
	"log"
	"os"
	"time"

	"cloud.google.com/go/pubsub"
	"github.com/google/uuid"
	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// OutboxMessage mirrors the columns we need from the outbox table.
type OutboxMessage struct {
	ID      uuid.UUID
	Topic   string
	Message []byte
}

// processOutboxMessages picks up one pending message, publishes it to Pub/Sub,
// and marks it as processed. The row lock and the state update share a single
// database transaction; the publish itself happens outside the database.
func processOutboxMessages(ctx context.Context, pool *pgxpool.Pool, pubsubClient *pubsub.Client) error {
	tx, err := pool.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx)

	// Query for the next pending message.
	// FOR UPDATE SKIP LOCKED ensures that if multiple relay instances are
	// running, they won't try to process the same message simultaneously.
	// QueryRow is used instead of Query so the result set is fully read and
	// closed before the UPDATE below reuses the same connection.
	var msg OutboxMessage
	err = tx.QueryRow(ctx, `
		SELECT id, topic, message
		FROM outbox
		WHERE state = 'pending'
		ORDER BY created_at
		LIMIT 1
		FOR UPDATE SKIP LOCKED
	`).Scan(&amp;msg.ID, &amp;msg.Topic, &amp;msg.Message)
	if errors.Is(err, pgx.ErrNoRows) {
		// No pending messages, so there is nothing to do.
		return nil
	}
	if err != nil {
		return err
	}

	log.Printf("Publishing message %s to topic %s", msg.ID, msg.Topic)

	// Publish the message to the Pub/Sub topic and wait for confirmation.
	result := pubsubClient.Topic(msg.Topic).Publish(ctx, &amp;pubsub.Message{
		Data: msg.Message,
	})
	if _, err = result.Get(ctx); err != nil {
		// Publishing failed. We return the error here without committing,
		// so the transaction rolls back and the message stays 'pending'.
		// The relay will retry it on the next polling interval.
		return err
	}

	// Mark the message as processed now that the broker has confirmed receipt.
	_, err = tx.Exec(ctx,
		"UPDATE outbox SET state = 'processed', processed_at = now() WHERE id = $1",
		msg.ID,
	)
	if err != nil {
		return err
	}
	log.Printf("Marked message %s as processed", msg.ID)

	// Commit the transaction: the state update becomes permanent.
	return tx.Commit(ctx)
}

func main() {
	ctx := context.Background()

	// Connection settings are read from the environment. DATABASE_URL matches
	// the orders service; GOOGLE_CLOUD_PROJECT is an assumed variable name,
	// so adapt it to however your deployment supplies the project ID.
	pool, err := pgxpool.New(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("Unable to connect to database: %v", err)
	}
	defer pool.Close()

	pubsubClient, err := pubsub.NewClient(ctx, os.Getenv("GOOGLE_CLOUD_PROJECT"))
	if err != nil {
		log.Fatalf("Unable to create Pub/Sub client: %v", err)
	}
	defer pubsubClient.Close()

	// Poll the outbox table every second.
	// Adjust the interval based on your latency requirements.
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		if err := processOutboxMessages(ctx, pool, pubsubClient); err != nil {
			log.Printf("Error processing outbox: %v", err)
		}
	}
}
</code></pre>
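<p>To build some intuition for what <code>FOR UPDATE SKIP LOCKED</code> does in the query above, here is a small in-memory sketch. It is not how PostgreSQL implements row locking, and it is single-threaded for simplicity; <code>outboxQueue</code> and <code>claimNext</code> are hypothetical names used only to show which row each relay instance would receive.</p>

```go
package main

import "fmt"

// outboxQueue is a hypothetical in-memory stand-in for the outbox table.
type outboxQueue struct {
	pending []string        // message ids, oldest first
	locked  map[string]bool // ids whose rows are locked by an open transaction
}

// claimNext mimics SELECT ... LIMIT 1 FOR UPDATE SKIP LOCKED: it returns the
// oldest pending id that no other relay has locked, and locks it.
// The boolean is false when every pending row is already locked.
func (q *outboxQueue) claimNext() (string, bool) {
	for _, id := range q.pending {
		if !q.locked[id] {
			q.locked[id] = true
			return id, true
		}
	}
	return "", false
}

func main() {
	q := new(outboxQueue)
	q.pending = []string{"msg-1", "msg-2"}
	q.locked = make(map[string]bool)

	first, _ := q.claimNext()  // relay A locks msg-1
	second, _ := q.claimNext() // relay B skips msg-1 and locks msg-2
	_, ok := q.claimNext()     // relay C finds nothing unlocked
	fmt.Println(first, second, ok)
}
```

<p>Without <code>SKIP LOCKED</code>, the second relay would block on the first row until the first relay's transaction finished, so extra instances would add no throughput.</p>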
<p>The polling interval (1 second in this example) controls the maximum latency between an event being written to the outbox and it being published to the broker. For most use cases, 1–5 seconds is perfectly acceptable. If you need lower latency, you can reduce the interval, or consider using PostgreSQL's <code>LISTEN/NOTIFY</code> feature to wake up the relay immediately when a new row is inserted.</p>
<h2 id="heading-why-messages-can-be-delivered-more-than-once">Why Messages Can Be Delivered More Than Once</h2>
<p>You might wonder: isn't the Outbox Pattern supposed to guarantee <em>exactly once</em> delivery?</p>
<p>It does not. It guarantees <strong>at-least-once</strong> delivery. Here's the edge case:</p>
<ol>
<li><p>The relay publishes the message to Pub/Sub successfully.</p>
</li>
<li><p>Before it can update the outbox row to <code>'processed'</code>, the relay process crashes.</p>
</li>
<li><p>On restart, the relay sees the message is still <code>'pending'</code> and publishes it again.</p>
</li>
</ol>
<p>This is a rare but possible scenario. The standard way to handle it is to design your message <strong>consumers to be idempotent</strong>. This means that they can safely receive and process the same message multiple times without causing incorrect behavior.</p>
<p>Common strategies for idempotency include:</p>
<ul>
<li><p>Using the message's <code>id</code> as a deduplication key, and checking if you've already processed it before acting.</p>
</li>
<li><p>Making your operations naturally idempotent. For example, using <code>INSERT ... ON CONFLICT DO NOTHING</code> instead of a plain <code>INSERT</code>.</p>
</li>
</ul>
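<p>The first strategy can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions: the <code>event</code> and <code>consumer</code> types are hypothetical, and the set of seen ids lives in memory, whereas a real consumer would keep it in durable storage such as a database table.</p>

```go
package main

import "fmt"

// event is a hypothetical message as a consumer would receive it.
type event struct {
	ID      string // the outbox row id, used as the deduplication key
	OrderID string
}

// consumer is a hypothetical idempotent consumer. It remembers which message
// ids it has already handled so that repeated deliveries have no effect.
type consumer struct {
	seen    map[string]bool
	handled int // number of times the side effect actually ran
}

// newConsumer returns a consumer with an empty deduplication set.
func newConsumer() *consumer {
	c := new(consumer)
	c.seen = make(map[string]bool)
	return c
}

// handle applies the event at most once and reports whether it did any work.
func (c *consumer) handle(e event) bool {
	if c.seen[e.ID] {
		return false // duplicate delivery: skip the side effect
	}
	c.seen[e.ID] = true
	c.handled++
	return true
}

func main() {
	c := newConsumer()
	e := event{ID: "msg-1", OrderID: "order-42"}

	// The relay crashed after publishing, so the same message arrives twice.
	first := c.handle(e)
	second := c.handle(e)
	fmt.Println(first, second, c.handled)
}
```

<p>The second delivery is recognised by its id and ignored, so the side effect runs only once even though the message arrived twice.</p>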
<h2 id="heading-alternative-postgresql-logical-replication">Alternative: PostgreSQL Logical Replication</h2>
<p>The polling approach described above is simple and works well, but it has two drawbacks: it introduces some latency (up to one polling interval), and it issues database queries even when there's nothing to process.</p>
<p>For high-throughput systems where these trade-offs matter, PostgreSQL offers a more advanced alternative: <strong>logical replication</strong> via the <strong>Write-Ahead Log (WAL)</strong>.</p>
<p>Every change made to a PostgreSQL database is first written to the WAL – an append-only log used for crash recovery and replication. With logical replication, you can subscribe to changes in specific tables and receive them as a stream in near real-time.</p>
<p>Instead of your relay asking "Are there any new messages?" on a schedule, PostgreSQL streams each committed insert into the outbox table to your relay as soon as it happens.</p>
<p>This approach is lower latency and more resource-efficient for high-volume workloads. The trade-off is added implementation complexity: you need to manage a replication slot in PostgreSQL and handle the WAL stream correctly.</p>
<p>In Go, you can use the <a href="https://github.com/jackc/pglogrepl">pglogrepl</a> library to interact with PostgreSQL's logical replication protocol.</p>
<p>For more details on how WAL and change data capture work in PostgreSQL, see the <a href="https://www.postgresql.org/docs/current/wal-intro.html">official Write-Ahead Logging documentation</a>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/5ea89c91fdc930d846b413ab/c706cc8f-6bbd-49d1-90c0-8c4934c2718e.png" alt="diagram with WAL" style="display:block;margin:0 auto" width="2362" height="1224" loading="lazy">

<h2 id="heading-conclusion">Conclusion</h2>
<p>The Outbox Pattern solves a fundamental problem in distributed systems: how do you reliably perform a database write and publish a message to a broker in a consistent way?</p>
<p>The key idea is to use your database as the source of truth for <em>both</em> the business data and the pending messages. By writing to the outbox table in the same transaction as your business data, you get atomic guarantees from the database itself: no distributed transaction protocol required.</p>
<p>Here's a quick summary of the key concepts:</p>
<ul>
<li><p><strong>The outbox table</strong> stores pending events as part of your regular database schema.</p>
</li>
<li><p><strong>The transaction</strong> wraps both the business write and the outbox write, making them atomic.</p>
</li>
<li><p><strong>The Message Relay</strong> is a background process that reads from the outbox and publishes to the broker.</p>
</li>
<li><p><strong>At-least-once delivery</strong> means your consumers must be idempotent.</p>
</li>
<li><p><code>FOR UPDATE SKIP LOCKED</code> allows multiple relay instances to run safely in parallel.</p>
</li>
<li><p><strong>Logical replication</strong> is an advanced alternative that avoids polling for high-throughput systems.</p>
</li>
</ul>
<p>The pattern is simple in concept, but there are several ways to implement it depending on your scale and infrastructure. The polling approach shown in this tutorial is a solid starting point for most applications.</p>
<h3 id="heading-resources">Resources</h3>
<ul>
<li><p><a href="https://github.com/plutov/packagemain/tree/master/outbox">Source code on GitHub</a></p>
</li>
<li><p><a href="https://www.postgresql.org/docs/current/wal-intro.html">PostgreSQL Write-Ahead Logging (WAL)</a></p>
</li>
<li><p><a href="https://github.com/jackc/pglogrepl">pglogrepl – Go library for PostgreSQL logical replication</a></p>
</li>
<li><p><a href="https://github.com/jackc/pgx">pgx – PostgreSQL driver and toolkit for Go</a></p>
</li>
<li><p><a href="https://packagemain.tech">Explore more Go tutorials on packagemain.tech</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Disaster Recovery Testing? Explained with Practical Examples ]]>
                </title>
                <description>
                    <![CDATA[ Most teams are confident they can recover from a major outage until they actually have to. Backups exist, architectures are redundant and a recovery plan is documented somewhere, yet real incidents of ]]>
                </description>
                <link>https://www.freecodecamp.org/news/disaster-recovery-testing/</link>
                <guid isPermaLink="false">69a5614ffc6453a5f17ca809</guid>
                
                    <category>
                        <![CDATA[ Testing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cybersecurity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Security ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Alex Tray ]]>
                </dc:creator>
                <pubDate>Mon, 02 Mar 2026 10:07:11 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/57c1e51b-867c-444e-90f0-e6551284fe0a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Most teams are confident they can recover from a major outage until they actually have to. Backups exist, architectures are redundant and a recovery plan is documented somewhere, yet real incidents often reveal critical gaps.</p>
<p>Disaster recovery testing is what separates assumed resilience from proven recovery, but it’s still skipped, rushed or treated as a checkbox exercise. For developers and technical teams, that gap can turn a manageable failure into a prolonged outage.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-is-disaster-recovery-testing">What is Disaster Recovery Testing?</a></p>
</li>
<li><p><a href="#heading-how-disaster-recovery-testing-works-in-practice">How Disaster Recovery Testing Works in Practice</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-testing-methods-developers-should-know">Disaster Recovery Testing Methods Developers Should Know</a></p>
</li>
<li><p><a href="#heading-what-technology-disaster-recovery-testing-evaluates">What Technology Disaster Recovery Testing Evaluates</a></p>
</li>
<li><p><a href="#heading-how-to-test-a-disaster-recovery-plan">How to Test a Disaster Recovery Plan</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-test-scenarios-practical-examples">Disaster Recovery Test Scenarios: Practical Examples</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-test-report-turning-tests-into-improvements">Disaster Recovery Test Report: Turning Tests Into Improvements</a></p>
</li>
<li><p><a href="#heading-disaster-recovery-audits-and-continuous-validation">Disaster Recovery Audits and Continuous Validation</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-disaster-recovery-testing"><strong>What is Disaster Recovery Testing?</strong></h2>
<p>Disaster recovery (DR) testing is the process of validating that systems, data and applications can be restored after a disruptive event within defined recovery objectives. It generally evaluates:</p>
<ul>
<li><p><strong>Recovery Time Objective (RTO)</strong>: How quickly systems must be restored.</p>
</li>
<li><p><strong>Recovery Point Objective (RPO)</strong>: How much data loss is acceptable.</p>
</li>
<li><p><strong>Operational readiness</strong>: Whether teams know what to do during an incident.</p>
</li>
</ul>
<p>A disaster recovery test plan documents how these elements are tested, who is responsible and what success looks like. Without testing, DR plans are assumptions, not guarantees.</p>
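<p>To make the two objectives concrete, here is a small sketch that compares one measured recovery against its targets. The <code>evaluate_recovery</code> function, its parameter names and the timestamps are all assumptions made up for this example.</p>
<pre><code class="language-python">from datetime import datetime, timedelta


def evaluate_recovery(outage_start, service_restored, last_good_backup, rto, rpo):
    """Compare one measured recovery against its RTO and RPO targets."""
    downtime = service_restored - outage_start   # how long the service was down
    data_loss = outage_start - last_good_backup  # window of writes that were lost
    return {
        "downtime": downtime,
        "data_loss_window": data_loss,
        "rto_met": rto >= downtime,
        "rpo_met": rpo >= data_loss,
    }


result = evaluate_recovery(
    outage_start=datetime(2026, 3, 2, 10, 0),
    service_restored=datetime(2026, 3, 2, 11, 30),
    last_good_backup=datetime(2026, 3, 2, 9, 45),
    rto=timedelta(hours=1),
    rpo=timedelta(minutes=30),
)
print(result["rto_met"], result["rpo_met"])
</code></pre>
<p>With these sample values, the 90-minute downtime misses the one-hour RTO, while the 15-minute data-loss window stays within the 30-minute RPO.</p>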
<h2 id="heading-how-disaster-recovery-testing-works-in-practice"><strong>How Disaster Recovery Testing Works in Practice</strong></h2>
<p>In real environments, disaster recovery testing is used to check all <a href="https://www.nakivo.com/blog/components-disaster-recovery-plan-checklist/">elements of the disaster recovery plan</a> and is rarely a single event. It’s a structured exercise that simulates failure, observes system behavior and measures outcomes against expectations.</p>
<p>A typical DR test involves:</p>
<ol>
<li><p><strong>Defining scope</strong> – Which applications, services, or data sets are included.</p>
</li>
<li><p><strong>Selecting a scenario</strong> – Outage, corruption, ransomware, region failure, and so on.</p>
</li>
<li><p><strong>Executing recovery actions</strong> – Restore data, fail over systems, reconfigure dependencies.</p>
</li>
<li><p><strong>Measuring results</strong> – Time to recovery, data consistency, service availability.</p>
</li>
<li><p><strong>Documenting findings</strong> – What worked, what failed, what needs improvement.</p>
</li>
</ol>
<p>For developers, the key shift is recognizing that DR testing isn’t just an ops exercise. Application architecture, data handling and deployment patterns all influence recovery outcomes.</p>
<p>Importantly, regulatory pressure is also reshaping how organizations approach recovery validation. Frameworks such as the <a href="https://heimdalsecurity.com/nis-2-directive">NIS2 Directive</a> require essential and important entities in the EU to implement robust cybersecurity risk management measures, including incident response and business continuity capabilities.</p>
<h2 id="heading-disaster-recovery-testing-methods-developers-should-know"><strong>Disaster Recovery Testing Methods Developers Should Know</strong></h2>
<p>Different testing methods provide different levels of confidence. Mature teams use more than one. Each method has a place, but relying only on low-impact testing creates blind spots that surface during real incidents.</p>
<h3 id="heading-checklist-testing"><strong>Checklist Testing</strong></h3>
<p>The simplest method: Teams review documented recovery steps without executing them. This helps validate documentation completeness but does not confirm real-world recoverability.</p>
<h3 id="heading-tabletop-exercises"><strong>Tabletop Exercises</strong></h3>
<p>Stakeholders walk through a simulated disaster scenario and discuss responses. Tabletop tests are useful for identifying communication gaps and unclear responsibilities, especially for cross-team coordination.</p>
<h3 id="heading-partial-or-component-testing"><strong>Partial or Component Testing</strong></h3>
<p>Specific systems, such as databases or backup restores, are tested in isolation. Developers often encounter this when validating recovery procedures for individual services or environments.</p>
<h3 id="heading-full-scale-testing"><strong>Full-scale Testing</strong></h3>
<p>This is the most comprehensive method. It involves actual failover or full recovery in production-like environments. While disruptive, full-scale tests provide the highest confidence.</p>
<h2 id="heading-what-technology-disaster-recovery-testing-evaluates"><strong>What Technology Disaster Recovery Testing Evaluates</strong></h2>
<p>Modern environments are complex, and disaster recovery testing must validate more than just data restores.</p>
<p>DR testing evaluates:</p>
<ul>
<li><p><strong>Backup integrity</strong> – Are backups usable, consistent and complete?</p>
</li>
<li><p><strong>Application dependencies</strong> – Do services come back in the correct order?</p>
</li>
<li><p><strong>Infrastructure recovery</strong> – Can compute, storage and networking be re-provisioned?</p>
</li>
<li><p><strong>Identity and access</strong> – Do credentials, secrets and permissions still function?</p>
</li>
<li><p><strong>Automation and scripts</strong> – Do recovery workflows still match current architectures?</p>
</li>
</ul>
<p>For developers, this often reveals hidden coupling between services, outdated scripts or environment-specific assumptions that were never documented.</p>
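<p>One way to make the "correct order" question explicit is to write service dependencies down as data and derive a startup order from them. The sketch below uses Python's standard <code>graphlib</code> module (Python 3.9+). The service names and their dependencies are hypothetical.</p>
<pre><code class="language-python">from graphlib import TopologicalSorter

# Hypothetical services, each mapped to the services it depends on.
dependencies = {
    "api": {"database", "cache"},
    "worker": {"database", "queue"},
    "database": set(),
    "cache": set(),
    "queue": set(),
}

# static_order() yields every service after all of its dependencies.
# It raises graphlib.CycleError if the dependencies are circular.
startup_order = list(TopologicalSorter(dependencies).static_order())
print(startup_order)
</code></pre>
<p>The exact order of independent services may vary, but a service never appears before something it depends on. A circular dependency surfaces as an error here rather than in the middle of a recovery.</p>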
<h2 id="heading-how-to-test-a-disaster-recovery-plan"><strong>How to Test a Disaster Recovery Plan</strong></h2>
<p>Testing a disaster recovery plan doesn’t require shutting down production on day one. A practical, incremental approach works best.</p>
<ol>
<li><p><strong>Start with a single application</strong>: Pick a service with well-defined data and dependencies. Avoid starting with your most complex system.</p>
</li>
<li><p><strong>Validate backup restores</strong>: Restore data into a non-production environment and confirm application functionality, not just file presence.</p>
</li>
<li><p><strong>Measure RTO and RPO</strong>: Time the recovery process and compare results to stated objectives. At this stage, many teams discover that their objectives were unrealistic.</p>
</li>
<li><p><strong>Test failure assumptions</strong>: Simulate real-world issues like missing credentials, expired certificates or partial data loss.</p>
</li>
<li><p><strong>Document gaps immediately</strong>: Update the disaster recovery test plan while findings are fresh. Untested fixes are just new assumptions.</p>
</li>
</ol>
<p>This approach makes disaster recovery testing part of standard processes rather than a once-a-year compliance task.</p>
<h3 id="heading-automating-restore-validation"><strong>Automating Restore Validation</strong></h3>
<p>One of the most common gaps in disaster recovery testing is stopping at “restore completed” instead of validating that the application actually works. A restored database that can’t serve queries or contains incomplete data doesn’t meet recovery objectives.</p>
<p>Teams can reduce this risk by automating post-restore validation. For example, after restoring a PostgreSQL database into a staging or isolated DR environment, a simple validation script can confirm connectivity and basic data integrity:</p>
<pre><code class="language-python">import os
import sys

import psycopg2


def validate_restore():
    """Connect to the restored database and confirm it holds queryable data."""
    conn = None
    try:
        conn = psycopg2.connect(
            host="restored-db.internal",
            database="appdb",
            user="dr_test_user",
            # Read the password from the environment rather than hardcoding it.
            # The variable name is illustrative.
            password=os.environ["DR_TEST_DB_PASSWORD"],
        )
        with conn.cursor() as cur:
            # Run a real query: a successful connection alone proves little.
            cur.execute("SELECT COUNT(*) FROM users;")
            result = cur.fetchone()

        if result and result[0] &gt; 0:
            print("Restore validation successful.")
        else:
            print("Restore validation failed: No data found.")
            sys.exit(1)
    except Exception as e:
        print(f"Restore validation error: {e}")
        sys.exit(1)
    finally:
        # Close the connection on every path, including failures.
        if conn is not None:
            conn.close()


validate_restore()
</code></pre>
<p>This script does three important things:</p>
<ul>
<li><p>Confirms the database is reachable</p>
</li>
<li><p>Executes a real query, not just a connection check</p>
</li>
<li><p>Fails explicitly if the expected data is missing</p>
</li>
</ul>
<p>In practice, teams can integrate scripts like this into CI/CD pipelines or scheduled recovery drills. The goal isn’t to test every edge case, but to move from “backup exists” to “restore is functionally verified.” Over time, these automated checks become part of the disaster recovery test plan, helping teams measure RTO accurately and detect configuration drift before a real incident exposes it.</p>
<h2 id="heading-disaster-recovery-test-scenarios-practical-examples"><strong>Disaster Recovery Test Scenarios: Practical Examples</strong></h2>
<p>Effective disaster recovery testing focuses on realistic failures, not idealized outages.</p>
<h3 id="heading-accidental-deletion-or-misconfiguration"><strong>Accidental Deletion or Misconfiguration</strong></h3>
<p>A dropped database table, deleted storage bucket or bad configuration change tests how quickly teams can restore specific data without rolling back entire systems. These everyday incidents often reveal slow or overly manual recovery processes.</p>
<h3 id="heading-data-corruption-and-application-failure"><strong>Data Corruption and Application Failure</strong></h3>
<p>Buggy releases can silently corrupt data while systems remain online. This scenario validates point-in-time recovery and whether teams can identify when corruption started, not just restore the latest backup.</p>
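<p>The core decision in this scenario can be sketched in a few lines: given the available backup times and the moment corruption began, pick the most recent backup that predates it. The <code>pick_restore_point</code> function and the timestamps are hypothetical, and in practice pinning down <code>corruption_start</code> is usually the hard part.</p>
<pre><code class="language-python">from datetime import datetime


def pick_restore_point(backup_times, corruption_start):
    """Return the most recent backup taken strictly before corruption began."""
    clean = [t for t in backup_times if corruption_start > t]
    if not clean:
        raise ValueError("no backup predates the corruption")
    return max(clean)


backups = [
    datetime(2026, 3, 1, 0, 0),
    datetime(2026, 3, 1, 12, 0),
    datetime(2026, 3, 2, 0, 0),  # latest backup, but taken after corruption
]
restore_point = pick_restore_point(
    backups, corruption_start=datetime(2026, 3, 1, 15, 30)
)
print(restore_point)
</code></pre>
<p>Here the latest backup is rejected because it already contains corrupted data, and the backup from noon on March 1 is selected instead.</p>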
<h3 id="heading-ransomware-simulation"><strong>Ransomware Simulation</strong></h3>
<p>Ransomware testing checks whether clean, uncompromised backups can be restored in isolation. It often exposes gaps in backup immutability, credential handling and realistic recovery times.</p>
<h3 id="heading-infrastructure-or-platform-outage"><strong>Infrastructure or Platform Outage</strong></h3>
<p>Simulating the loss of a cluster, availability zone or region tests automation and infrastructure-as-code maturity. In virtualized environments, most commonly in <a href="https://www.nakivo.com/vmware-disaster-recovery/">VMware disaster recovery</a> setups, testing involves restoring virtual machines at a secondary site and validating networking and application dependencies.</p>
<h3 id="heading-credential-and-access-failure"><strong>Credential and Access Failure</strong></h3>
<p>Recovery can stall if credentials, certificates or secret keys are unavailable. Testing this scenario validates identity systems and whether recovery procedures rely on fragile access assumptions.</p>
<h2 id="heading-disaster-recovery-test-report-turning-tests-into-improvements"><strong>Disaster Recovery Test Report: Turning Tests Into Improvements</strong></h2>
<p>Testing without documentation is wasted effort. A disaster recovery test report turns results into actionable improvements.</p>
<p>A valuable DR test report includes:</p>
<ul>
<li><p>Test scope and scenario</p>
</li>
<li><p>Expected vs. actual RTO/RPO</p>
</li>
<li><p>Recovery steps executed</p>
</li>
<li><p>Failures, delays and root causes</p>
</li>
<li><p>Recommended changes</p>
</li>
</ul>
<p>For developers, this often results in concrete action items: refactoring startup dependencies, adding health checks, improving automation or adjusting data protection policies. The report should feed directly into backlog planning.</p>
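<p>If you want reports to be comparable from one test to the next, it can help to capture them as structured data. The sketch below mirrors the report fields listed above. The <code>DRTestReport</code> class and every value in it are assumptions made up for this example.</p>
<pre><code class="language-python">from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class DRTestReport:
    """One DR test run, mirroring the report fields listed above."""
    scope: str
    scenario: str
    expected_rto: timedelta
    actual_rto: timedelta
    expected_rpo: timedelta
    actual_rpo: timedelta
    steps_executed: list = field(default_factory=list)
    failures: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)

    def objectives_met(self):
        """Report, per objective, whether the measured value stayed within target."""
        return {
            "rto": self.expected_rto >= self.actual_rto,
            "rpo": self.expected_rpo >= self.actual_rpo,
        }


report = DRTestReport(
    scope="orders service",
    scenario="accidental table drop",
    expected_rto=timedelta(hours=1),
    actual_rto=timedelta(minutes=95),
    expected_rpo=timedelta(minutes=15),
    actual_rpo=timedelta(minutes=10),
    steps_executed=["restore latest backup", "replay WAL", "run smoke tests"],
    failures=["restore credentials had expired"],
    recommendations=["rotate DR credentials on a schedule"],
)
print(report.objectives_met())
</code></pre>
<p>In this example the RPO is met but the RTO is missed, and the recorded failure points to a concrete backlog item.</p>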
<h2 id="heading-disaster-recovery-audits-and-continuous-validation"><strong>Disaster Recovery Audits and Continuous Validation</strong></h2>
<p>Audits often expose what teams already suspect: Disaster recovery plans exist, but haven’t been tested recently (or at all).</p>
<p>Rather than treating audits as one-time events, teams should adopt continuous validation:</p>
<ul>
<li><p>Regular restore tests integrated into CI/CD pipelines.</p>
</li>
<li><p>Scheduled DR tests tied to major architecture changes.</p>
</li>
<li><p>Automated alerts when recovery objectives drift.</p>
</li>
</ul>
<p>This shifts disaster recovery testing from an annual obligation to an ongoing practice that evolves alongside the environment.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Disaster recovery testing isn’t about pessimism; it’s about realism. Systems and people change, and failure modes evolve faster than documentation. Without testing, even the best-designed recovery plan can become outdated.</p>
<p>For developers and technical teams, practicing disaster recovery testing builds confidence rooted in evidence, not assumptions. It exposes hidden dependencies, validates data protection strategies and ensures that when something goes wrong, recovery is predictable instead of chaotic.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Elevate Your Database Game: Supercharging Query Performance with Postgres FDW ]]>
                </title>
                <description>
                    <![CDATA[ Foreign data wrappers (FDWs) make remote Postgres tables feel local. That convenience is exactly why FDW performance surprises are so common. A query that looks like a normal join can execute like a distributed system: rows move across the network, r... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/fdw-pushdown/</link>
                <guid isPermaLink="false">69963f00d35b661838993bd0</guid>
                
                    <category>
                        <![CDATA[ performance ]]>
                    </category>
                
                    <category>
                        <![CDATA[ PostgreSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Hamdaan Ali ]]>
                </dc:creator>
                <pubDate>Wed, 18 Feb 2026 22:36:48 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771357398917/8db8c3fd-9f16-4631-aa48-2537e8a4cb45.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Foreign data wrappers (FDWs) make remote Postgres tables feel local. That convenience is exactly why FDW performance surprises are so common.</p>
<p>A query that looks like a normal join can execute like a distributed system: rows move across the network, remote statements get executed repeatedly, and the local planner quietly becomes a coordinator. In that world, “fast SQL” is not mainly about CPU or indexes. It’s about <strong>data movement</strong> and <strong>round-trips</strong>.</p>
<p>This handbook covers the mechanism that determines whether a federated query behaves like a clean remote query or a chatty distributed workflow: <strong>pushdown</strong>.</p>
<p>Pushdown is not “moving compute”. Pushdown determines whether filtering, joining, ordering, and aggregation occur at the data source or after the data has already crossed the wire. When pushdown works, the local server receives a reduced result set. When it doesn’t, Postgres often has to fetch broad intermediate sets and finish the work locally.</p>
<p>The chapters ahead will help you build a practical mental model of what is “shippable” in <code>postgres_fdw</code>, why some expressions are blocked, and how to read <code>EXPLAIN (ANALYZE, BUFFERS, VERBOSE)</code> without getting tricked by familiar plan shapes.</p>
<p>After the core method, the handbook covers tuning knobs that matter in production, schema and indexing considerations, benchmarking methodology, monitoring and logging, and a case study that shows what a real pushdown win looks like end-to-end.</p>
<p>The later sections go deeper into advanced shippability edge cases, cost model calibration, and regression-proofing FDW workloads.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-executive-summary">Executive Summary</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-motivation">Motivation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-fdw-basics-without-the-setup-tax">FDW Basics Without the Setup Tax</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pushdown-mechanics">Pushdown Mechanics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-shippable-operations-a-deep-dive">Shippable Operations: a Deep Dive</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pushdown-blockers-and-why-they-exist">Pushdown Blockers and Why They Exist</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-reading-explain-like-a-pro">Reading EXPLAIN Like a Pro</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-tune-postgresfdw">How to Tune postgres_fdw</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-schema-and-index-recommendations">Schema and Index Recommendations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-benchmarking-methodology">Benchmarking Methodology</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-monitoring-and-logging">Monitoring and Logging</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-case-study-refactoring-a-keycloak-coverage-query">Case Study: Refactoring a Keycloak Coverage Query</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-checklist-and-troubleshooting-guide">Checklist and Troubleshooting Guide</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-case-study-takeaways">Case Study Takeaways</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-advanced-operations-a-deeper-dive-into-shippability">Advanced Operations: A Deeper Dive into Shippability</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-common-antipatterns-and-how-to-avoid-them">Common Anti‑Patterns and How to Avoid Them</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-extending-tuning-calibrating-cost-models">Extending Tuning: Calibrating Cost Models</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-further-case-studies-and-practical-examples">Further Case Studies and Practical Examples</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-monitoring-diagnostics-and-regression-testing">Monitoring, Diagnostics, and Regression Testing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-extended-guidelines-for-advanced-dbas">Extended Guidelines for Advanced DBAs</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-all-together">Bringing it All Together</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-references">References</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>This handbook assumes basic comfort with Postgres query plans. It builds on <code>EXPLAIN (ANALYZE, BUFFERS)</code> rather than reintroducing SQL fundamentals, indexing, or join algorithms.</p>
<p>The focus here is federated execution: how foreign queries behave, and how to reason about them with the same clarity as local plans.</p>
<p>Here’s what you should already be comfortable with:</p>
<ul>
<li><p>Reading <code>EXPLAIN (ANALYZE, BUFFERS)</code> output and spotting obvious plan smells (row explosions, bad join order, missed indexes).</p>
</li>
<li><p>Basic join mechanics (nested loop, hash join, merge join) and why cardinality estimates matter.</p>
</li>
<li><p>Postgres statistics at a practical level (<code>ANALYZE</code>, correlation, and what “estimated rows vs actual rows” implies).</p>
</li>
</ul>
<p>And here’s what you need to follow along with the examples:</p>
<ul>
<li><p>A Postgres “local” instance that will run <code>postgres_fdw</code> and act as the coordinator.</p>
</li>
<li><p>A Postgres “remote” instance that holds the foreign tables.</p>
</li>
<li><p>Permission on the local side to:</p>
<ul>
<li><p><code>CREATE EXTENSION postgres_fdw;</code></p>
</li>
<li><p>create a <code>SERVER</code> and <code>USER MAPPING</code></p>
</li>
<li><p>create <code>FOREIGN TABLE</code> objects (or permission to use existing ones)</p>
</li>
</ul>
</li>
<li><p>A way to run queries and capture plans:</p>
<ul>
<li><code>psql</code> is enough, and so is any GUI, as long as you can run <code>EXPLAIN (ANALYZE, BUFFERS, VERBOSE)</code>.</li>
</ul>
</li>
</ul>
<p>We won’t go through a long environment setup walkthrough. The examples assume the FDW objects exist and focus on plans and behavior.</p>
<p>We also won’t go into general distributed systems theory. Only the pieces that show up in an FDW plan are used.</p>
<h2 id="heading-executive-summary">Executive Summary</h2>
<p>The single most important lesson of this handbook is that <strong>FDW pushdown reduces data movement</strong>. It’s tempting to think of pushdown as merely changing where a calculation happens (“move the work to the remote”). But what really matters is whether the remote server is asked for only the rows you need.</p>
<p>When pushdown is working, the remote server performs the selective join and filtering, and the local Postgres receives a small, already reduced result set. When pushdown fails, the local server becomes a distributed query coordinator: it pulls large intermediate sets over the network and then finishes the heavy lifting locally.</p>
<p>Why does this matter? Because a refactor that makes more of your query shippable to the remote server can slash end‑to‑end latency without changing a single row of output. In the case study we'll explore later, rewriting a query so that the FDW can ship a joined remote query instead of performing multiple foreign scans and local joins reduces runtime from approximately <strong>166 ms to 25 ms</strong>. The business logic did not change – the <em>shape</em> of the work changed.</p>
<p>Below is a simple bar chart illustrating that dramatic drop. The chart uses actual timings from the case study. If you run the experiment yourself, the numbers may differ depending on your hardware and network, but the relative difference should be clear.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771117284661/ecadfc8b-7e45-4122-921d-5b06215d627a.png" alt="Bar chart titled &quot;Query Execution Time: Before vs After Refactor.&quot; The chart shows execution time in milliseconds on the vertical axis. The &quot;Before&quot; bar is much taller, over 160 ms, compared to the &quot;After&quot; bar, which is below 20 ms, indicating a significant improvement in execution time after refactoring." class="image--center mx-auto" width="840" height="630" loading="lazy"></p>
<h2 id="heading-motivation">Motivation</h2>
<p>Foreign data wrappers let you query remote data using the same SQL syntax you use locally. That convenience is exactly why they can be so deceptive.</p>
<p>A federated query may look like a normal join, but under the hood, it behaves like a distributed system: some part of the plan runs on the remote server, some on the local server, and every boundary between them is a network hop. The slow path is rarely “bad SQL” – it’s usually a combination of two things:</p>
<ol>
<li><p><strong>Too many rows are pulled over the network.</strong> Without pushdown, the FDW retrieves a large slice of the remote table and applies your filters and joins locally. This may lead to tens of thousands or millions of rows being shipped across the network when you only needed hundreds or fewer.</p>
</li>
<li><p><strong>Too many round-trips.</strong> If the plan performs a nested loop that drives a foreign scan, it can end up executing the same remote query hundreds or thousands of times. Each call might be fast on its own, but latency adds up.</p>
</li>
</ol>
<p>This isn't speculation. PostgreSQL's documentation makes clear that a foreign table <strong>has no local storage</strong> and that Postgres “asks the FDW to fetch data from the external source” <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=It%20is%20generally%20recommended%20that,differently%20from%20the%20local%20server">[1]</a>. There is no local buffer cache or heap storage to hide mistakes. Every row you retrieve must traverse the network at least once. If your plan fetches more rows than it needs, or repeatedly does so, performance can degrade quickly.</p>
<p>That’s why you should treat the Remote SQL shown in <code>EXPLAIN (VERBOSE)</code> as part of your query plan. It tells you exactly what the remote server is being asked to do. If it’s missing your filters or joins, you know the local server will have to finish the job. The rest of this handbook will teach you how to read that plan, how to force pushdown when possible, and how to recognize the signs that something has gone wrong.</p>
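<p>If you capture plans as text, a small helper can pull out the Remote SQL lines so you can review them on their own. The sketch below is one way to do that. The <code>remote_sql_statements</code> function is a hypothetical helper, the plan fragment is hand-written for illustration rather than captured output, and the <code>WHERE</code> check is only a rough heuristic.</p>
<pre><code class="language-python">def remote_sql_statements(plan_text):
    """Return the statement from every 'Remote SQL:' line of a text plan."""
    marker = "Remote SQL:"
    return [
        line.split(marker, 1)[1].strip()
        for line in plan_text.splitlines()
        if marker in line
    ]


# Hand-written plan fragment for illustration only, not captured output.
sample_plan = """\
Foreign Scan on public.remote_orders o
  Output: id, customer_id, total
  Remote SQL: SELECT id, customer_id, total FROM public.orders
"""

for sql in remote_sql_statements(sample_plan):
    # Rough heuristic: a remote statement without WHERE ships unfiltered rows.
    pushed = " WHERE " in f" {sql.upper()} "
    print("filter pushed down" if pushed else "no remote filter", "|", sql)
</code></pre>
<p>A statement flagged as having no remote filter isn't automatically a problem, but it's a prompt to check whether your query's conditions were expected to reach the remote server.</p>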
<h2 id="heading-fdw-basics-without-the-setup-tax">FDW Basics Without the Setup Tax</h2>
<p>You might be tempted to skip this section if you've already created foreign tables in your own databases. Don't. Understanding the architecture of foreign data wrappers is essential to understanding why pushdown matters.</p>
<h3 id="heading-sqlmed-in-a-nutshell">SQL/MED in a nutshell</h3>
<p>PostgreSQL implements the <strong>SQL/MED</strong> (Management of External Data) standard through its FDW framework. To access a remote Postgres server via <code>postgres_fdw</code>, you perform four steps:</p>
<ol>
<li><p><strong>Install the extension</strong>: <code>CREATE EXTENSION postgres_fdw</code> tells Postgres to load the FDW code.</p>
</li>
<li><p><strong>Create a foreign server</strong>: <code>CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '...', port '...', dbname '...')</code> defines where the remote server resides and how to connect.</p>
</li>
<li><p><strong>Create a user mapping</strong>: <code>CREATE USER MAPPING FOR your_user SERVER foreign_server OPTIONS (user 'remote_user', password '...')</code> tells Postgres how to authenticate on the remote side.</p>
</li>
<li><p><strong>Create a foreign table</strong>: <code>CREATE FOREIGN TABLE remote_table (...) SERVER foreign_server OPTIONS (schema_name '...', table_name '...');</code> defines the columns and references the remote table.</p>
</li>
</ol>
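<p>Put together, the four steps might look like the sketch below. The host name, database name, credentials, and the <code>remote_orders</code> and <code>remote_customers</code> tables are hypothetical placeholders, not values from any real system, and the column lists are only assumptions about what the remote tables contain.</p>
<pre><code class="lang-pgsql">-- Sketch only: host, dbname, credentials, and both tables are hypothetical.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER foreign_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.com', port '5432', dbname 'appdb');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER foreign_server
  OPTIONS (user 'remote_user', password 'secret');

-- Column names and types should mirror the remote tables.
CREATE FOREIGN TABLE remote_orders (
  id          bigint,
  customer_id bigint,
  created_at  timestamptz,
  total       numeric
)
  SERVER foreign_server
  OPTIONS (schema_name 'public', table_name 'orders');

CREATE FOREIGN TABLE remote_customers (
  id     bigint,
  name   text,
  region text
)
  SERVER foreign_server
  OPTIONS (schema_name 'public', table_name 'customers');
</code></pre>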
<p>Once you've done that, you can run <code>SELECT</code> statements against the foreign table as if it were local. But the definition hides an important detail: there is no storage associated with that foreign table <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=It%20is%20generally%20recommended%20that,differently%20from%20the%20local%20server">[1]</a>. Every time you <code>SELECT</code>, <code>INSERT</code>, <code>UPDATE</code>, or <code>DELETE</code>, the FDW must build a remote query, send it over a connection to the remote server (opened on first use and then reused within the session), and read the results. This overhead is small for simple queries but becomes critical as queries get more complex.</p>
<h3 id="heading-what-postgresfdw-does-and-does-not-do">What postgres_fdw does and does not do</h3>
<p><code>postgres_fdw</code> does two things for you:</p>
<ol>
<li><p>It builds remote SQL from your query, including pushing down safe filters, joins, sorts, and aggregates when it can.</p>
</li>
<li><p>It fetches rows from the remote server and hands them to the local executor. If some part of your query cannot be executed remotely, the local executor performs that part.</p>
</li>
</ol>
<p>The FDW tries hard to minimize data transfer by sending as much of your <code>WHERE</code> clause as possible to the remote server and by not retrieving unused columns <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[2]</a>. It also has a number of tuning knobs that we'll explore later (such as <code>fetch_size</code>, <code>use_remote_estimate</code>, <code>fdw_startup_cost</code>, and <code>fdw_tuple_cost</code> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>). But the real win often comes from structuring your query so that the FDW can push work down.</p>
<p>There's one last architectural point to keep in mind: the remote server runs with a restricted session environment. In remote sessions opened by <code>postgres_fdw</code>, the <code>search_path</code> is set to <code>pg_catalog</code> only, and <code>TimeZone</code>, <code>DateStyle</code>, and <code>IntervalStyle</code> are set to specific values <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=In%20the%20remote%20sessions%20opened,their%20expected%20search%20path%20environment">[4]</a>. This means that any functions you expect to run remotely must be schema‑qualified or packaged in a way that the FDW can find them. It also underscores why you should not override session settings for FDW connections unless you know exactly what you are doing <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=In%20the%20remote%20sessions%20opened,their%20expected%20search%20path%20environment">[4]</a>.</p>
<h2 id="heading-pushdown-mechanics">Pushdown Mechanics</h2>
<p>At a high level, “pushdown” means pushing as much of your SQL query as possible to the remote server. But the FDW cannot simply send arbitrary SQL. It must be <em>safe</em> and <em>portable</em> for remote evaluation. Postgres uses the term <strong>shippable</strong> to describe expressions and operations that can be evaluated on the foreign server.</p>
<h3 id="heading-what-shippable-means-in-practice">What “shippable” means in practice</h3>
<p>An expression is considered shippable if it meets several conditions:</p>
<ol>
<li><p><strong>It uses built‑in functions, operators, or data types</strong>, or functions/operators from extensions that have been explicitly allow‑listed via the extensions option on the foreign server <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[2]</a>. If you use a custom function or an extension that has not been declared, the FDW assumes it cannot run remotely.</p>
</li>
<li><p><strong>It’s marked IMMUTABLE.</strong> Postgres distinguishes between <code>IMMUTABLE</code>, <code>STABLE</code>, and <code>VOLATILE</code> functions. Only immutable functions – those that always return the same output for the same inputs and don’t depend on session state – are candidates for pushdown <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=functions%20in%20such%20clauses%20must,to%20reduce%20the%20risk%20of">[5]</a>. This rule keeps functions such as <code>now()</code> (which is <code>STABLE</code>) and <code>random()</code> (which is <code>VOLATILE</code>) from being evaluated remotely, because the result might differ between the local and remote servers.</p>
</li>
<li><p><strong>It doesn’t depend on local collations or type conversions</strong>. PostgreSQL’s docs warn that type or collation mismatches can lead to semantic anomalies <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=It%20is%20generally%20recommended%20that,differently%20from%20the%20local%20server">[1]</a>. If the FDW cannot guarantee that a comparison behaves identically on both servers, it will refuse to push it down. For example, comparing a <code>citext</code> column to a <code>text</code> constant could be unsafe if the remote server doesn’t have the <code>citext</code> extension installed.</p>
</li>
</ol>
<p>From these rules, you can derive a mental checklist: avoid non‑immutable functions in your <code>WHERE</code> clause, keep your join conditions simple and typed correctly, and list any third‑party extensions you want to use in the foreign server’s extensions option so that they are considered shippable <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[2]</a>.</p>
<h3 id="heading-where-pushdown">WHERE pushdown</h3>
<p>Each top-level condition in a <code>WHERE</code> clause is judged on its own: conditions made up entirely of shippable expressions are included in the remote query, and the rest are evaluated locally after the rows arrive. This matters because every filter that is pushed down reduces the number of rows returned to the local server.</p>
<p>Consider a predicate like this:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">WHERE</span> created_at &gt;= now() - <span class="hljs-type">interval</span> <span class="hljs-string">'30 days'</span>
</code></pre>
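<p><code>now()</code> is <code>STABLE</code>, not <code>IMMUTABLE</code>: it returns the start time of the current transaction, and Postgres cannot assume the remote server would produce the same value. The FDW therefore leaves this condition out of the Remote SQL, fetches every row that passes the remaining shippable conditions (the entire table if there are none), and applies the filter locally.</p>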
<p>A better approach is to pass a parameter into the query or compute the cutoff timestamp once in the application and embed it into the SQL.</p>
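<p>As a rough illustration of that rewrite, compare the two statements below. The foreign table <code>remote_orders</code> and the literal cutoff are hypothetical. In practice the application would compute the timestamp and bind it. Run both under <code>EXPLAIN (VERBOSE)</code> and compare the Remote SQL to confirm the behavior on your version.</p>
<pre><code class="lang-pgsql">-- Not shippable: now() is not IMMUTABLE, so this condition is applied locally.
EXPLAIN (VERBOSE)
SELECT id, total
FROM remote_orders
WHERE created_at &gt;= now() - interval '30 days';

-- Shippable: the cutoff is a plain constant supplied by the application
-- (hypothetical value shown here).
EXPLAIN (VERBOSE)
SELECT id, total
FROM remote_orders
WHERE created_at &gt;= timestamptz '2026-05-02 00:00:00+00';
</code></pre>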
<h3 id="heading-join-pushdown-conditions">Join pushdown conditions</h3>
<p>Joins are the next big lever. When <code>postgres_fdw</code> encounters a join between foreign tables on the <strong>same foreign server</strong>, it will send the entire join to the remote server unless it believes it will be more efficient to fetch the tables individually or unless the tables use different user mappings <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=When%20,clauses">[6]</a>.</p>
<p>It applies the same precautions described for <code>WHERE</code> clauses: the join condition must be shippable, and both tables must be on the same server. Cross‑server joins are never pushed down – the FDW will perform them locally.</p>
<h3 id="heading-shippability-decision-tree">Shippability decision tree</h3>
<p>It can be helpful to visualize the shippability rules as a flowchart. Below is a simple decision tree that you can use when inspecting an expression or join clause.</p>
<p>It starts with the question of whether an expression is in a WHERE or JOIN clause. Further decisions are made based on factors like using volatile functions, built-in functions, type mismatches, or cross-server joins. The flowchart concludes with outcomes like "Not shippable, evaluated locally" or "Shippable, included in Remote SQL."</p>
<p>If you reach the left side of the tree, the expression will be evaluated locally. If you reach the right side, the FDW can ship it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771109842865/9dafcd32-c390-487d-8b35-2911d6075b13.png" alt="Flowchart for determining SQL expression shippability. It starts with the question of whether an expression is in a WHERE or JOIN clause. Further decisions are made based on factors like using volatile functions, built-in functions, type mismatches, or cross-server joins. The flowchart concludes with outcomes like &quot;Not shippable, evaluated locally&quot; or &quot;Shippable, included in Remote SQL.&quot;" class="image--center mx-auto" width="8192" height="2404" loading="lazy"></p>
<h2 id="heading-shippable-operations-a-deep-dive">Shippable Operations: a Deep Dive</h2>
<p>Postgres has been expanding what <code>postgres_fdw</code> can push down over several versions. This section walks through each operation class and the conditions required for pushdown.</p>
<h3 id="heading-filters-where-clauses">Filters (WHERE clauses)</h3>
<p>As explained above, simple filters that use built‑in operators and immutable functions are generally pushed down. If you see a <code>Filter:</code> line on a Foreign Scan node in your plan, it means some part of your predicate didn’t qualify. Common reasons include using <code>now()</code>, <code>timezone()</code> or other non‑immutable functions, referencing a non‑allow‑listed extension, or comparing values whose collations differ.</p>
<p>When this happens, the entire table (or at least all rows matching other shippable conditions) is fetched, and the filter is applied locally.</p>
<p><strong>Plan smell:</strong> Look for a Foreign Scan node that carries its own <code>Filter:</code> line (printed just beneath the node). That means filtering happened locally. Also look for broad Remote SQL such as:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> remote_table <span class="hljs-keyword">WHERE</span> (<span class="hljs-type">name</span> = <span class="hljs-string">'Hamdaan'</span>)
</code></pre>
<p>that contains only part of your predicate, with none of the group constraints you wrote. That's a sign that the missing conditions were not pushed down.</p>
<h3 id="heading-joins">Joins</h3>
<p>Simple inner joins between foreign tables on the same foreign server are usually pushable. The join condition must satisfy the same shippability rules as filters. If the join involves more than one foreign server, if the join condition uses an unshippable function, or if the foreign tables use different user mappings, the FDW will fetch each table separately and join them locally <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=When%20,clauses">[6]</a>. This can lead to large intermediate sets being transferred.</p>
<p><strong>Plan smell:</strong> A Hash Join or Merge Join where both inputs are Foreign Scan nodes indicates that the join was performed locally. Conversely, a single Foreign Scan representing a join and containing the <code>JOIN ... ON</code> clause in Remote SQL indicates that the join was pushed down.</p>
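<p>A quick way to check which shape you got is to run the join under <code>EXPLAIN (VERBOSE)</code>. The sketch below assumes two hypothetical foreign tables, <code>remote_orders</code> and <code>remote_customers</code>, defined on the same foreign server with the same user mapping.</p>
<pre><code class="lang-pgsql">-- Both tables are hypothetical and assumed to live on the same foreign server.
EXPLAIN (VERBOSE)
SELECT o.id, c.name
FROM remote_orders o
JOIN remote_customers c ON c.id = o.customer_id
WHERE c.region = 'EU';

-- Pushed down: a single Foreign Scan whose Remote SQL contains the JOIN.
-- Not pushed down: a local Hash Join, Merge Join, or Nested Loop
-- sitting above two separate Foreign Scan nodes.
</code></pre>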
<h3 id="heading-aggregates-group-by-count-sum-and-so-on">Aggregates (GROUP BY, COUNT, SUM, and so on)</h3>
<p>Starting in PostgreSQL 10, aggregates can be pushed to the remote server when possible. The release notes state explicitly: “push aggregate functions to the remote server,” and explain that this <strong>reduces the amount of data that must be transferred from the remote server and offloads aggregate computation</strong> <a target="_blank" href="https://www.postgresql.org/docs/release/10.0/#:~:text=,Jeevan%20Chalke%2C%20Ashutosh%20Bapat">[7]</a>.</p>
<p>To qualify, both the grouping expressions and the aggregate functions themselves must be shippable. If the FDW cannot push an aggregate, it will fetch the raw rows and perform the aggregation locally.</p>
<p><strong>Plan smell:</strong> Look for a <code>GroupAggregate</code> or <code>HashAggregate</code> node above a Foreign Scan that returns many rows. When the aggregate is pushed down, there will be no local aggregate node. Instead, the Remote SQL will include a <code>GROUP BY</code> clause.</p>
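<p>The same check works for aggregates. This sketch assumes a hypothetical foreign table <code>remote_orders</code> with <code>customer_id</code> and <code>total</code> columns, and uses only built‑in aggregates so that nothing in the query should block shipping.</p>
<pre><code class="lang-pgsql">-- remote_orders is a hypothetical foreign table.
EXPLAIN (VERBOSE)
SELECT customer_id, count(*) AS order_count, sum(total) AS revenue
FROM remote_orders
GROUP BY customer_id;

-- Pushed down: the Remote SQL itself contains count(*), sum(total), and GROUP BY.
-- Not pushed down: a local aggregate node sits above a Foreign Scan
-- whose Remote SQL selects the raw rows.
</code></pre>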
<h3 id="heading-order-by-and-limit">ORDER BY and LIMIT</h3>
<p>Prior to PostgreSQL 12, sorting and limiting were rarely pushed down. In version 12, Etsuro Fujita’s patch allows ORDER BY sorts and LIMIT clauses to be pushed to <code>postgres_fdw</code> foreign servers <strong>in more cases</strong> <a target="_blank" href="https://www.postgresql.org/docs/release/12.0/#:~:text=,Etsuro%20Fujita%29%20%C2%A7%20%C2%A7">[8]</a>. For the sort or limit to be pushed, the underlying scan must be pushable, and the ordering expression must be shippable. Partitioned queries or complicated join trees may still cause the sort or limit to be applied locally.</p>
<p><strong>Plan smell:</strong> A local Sort or Limit node above a Foreign Scan indicates the operation was not pushed down. Conversely, a Remote SQL statement containing ORDER BY and LIMIT indicates that pushdown succeeded.</p>
<h3 id="heading-distinct">DISTINCT</h3>
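<p>Distinct operations can be pushed down when the distinct expression list is shippable. But if the distinct is combined with unshippable expressions, or if the distinct is applied after a join that cannot be pushed down, the FDW will retrieve all rows and perform the distinct locally. Confirm the outcome in the Remote SQL rather than assuming it. If <code>DISTINCT</code> is missing there, an equivalent <code>GROUP BY</code> is worth trying, since grouping is covered by aggregate pushdown.</p>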
<h3 id="heading-window-functions">Window functions</h3>
<p>In practice, window functions are rarely pushed down through <code>postgres_fdw</code>. They often require ordering or partitioning semantics that are difficult to represent portably. If you see a <code>WindowAgg</code> node in your plan, it’s almost always local. That doesn’t mean you can't use window functions with foreign tables, but you should expect them to incur network and CPU costs.</p>
<h3 id="heading-version-differences">Version differences</h3>
<p>Postgres developers continue to improve the FDW layer. Here are some notable changes by version:</p>
<ol>
<li><p><strong>PostgreSQL 9.6</strong> introduced remote join pushdown and allowed UPDATE/DELETE pushdown. Before 9.6, all joins were local.</p>
</li>
<li><p><strong>PostgreSQL 10</strong> introduced aggregate pushdown, enabling remote GROUP BY and aggregate functions <a target="_blank" href="https://www.postgresql.org/docs/release/10.0/#:~:text=,Jeevan%20Chalke%2C%20Ashutosh%20Bapat">[7]</a>.</p>
</li>
<li><p><strong>PostgreSQL 12</strong> expanded ORDER BY and LIMIT pushdown <a target="_blank" href="https://www.postgresql.org/docs/release/12.0/#:~:text=,Etsuro%20Fujita%29%20%C2%A7%20%C2%A7">[8]</a>.</p>
</li>
<li><p><strong>PostgreSQL 15</strong> added pushdown for certain CASE expressions and other improvements.</p>
</li>
</ol>
<p>If you learned FDW behavior on an older version, revisit your assumptions.</p>
<h2 id="heading-pushdown-blockers-and-why-they-exist">Pushdown Blockers and Why They Exist</h2>
<p>When pushdown fails, it’s not due to bad luck. There’s always a reason grounded in safety or correctness. Here are the most common blockers and how to diagnose them.</p>
<h3 id="heading-nonimmutable-functions">Non‑immutable functions</h3>
<p>Functions marked <code>VOLATILE</code> or <code>STABLE</code> cannot be pushed down because their results may differ between the local and remote server. Examples include <code>now()</code>, <code>random()</code>, <code>current_user</code>, and user‑defined functions that look at session variables or query the database. Even functions you might think are harmless, like <code>age()</code> or <code>clock_timestamp()</code>, can cause pushdown to fail.</p>
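<p><strong>Fix:</strong> Compute non‑immutable values in your application or in a CTE before referencing the foreign table. For example, compute <code>timestamp 'now' - interval '30 days'</code> as a constant and compare your <code>created_at</code> column against that constant. Alternatively, when the blocker is a deterministic expression that simply isn't shippable, move the logic into a stored generated column on the remote table. Generated columns accept only immutable expressions, so this does not help with <code>now()</code> itself.</p>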
<h3 id="heading-type-and-collation-mismatches">Type and collation mismatches</h3>
<p>The documentation warns that when types or collations don’t match between the local and remote tables, the remote server may interpret conditions differently <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=It%20is%20generally%20recommended%20that,differently%20from%20the%20local%20server">[1]</a>. This is particularly insidious when text comparisons, case‑insensitive collations, or non‑default locale settings are used. If Postgres can't guarantee the same semantics, it will pull rows locally and evaluate the expression.</p>
<p><strong>Fix:</strong> Make sure that your foreign table definition uses the same data types and collations as the remote table. When in doubt, explicitly cast values to a common type.</p>
<h3 id="heading-crossserver-joins">Cross‑server joins</h3>
<p>Joins across different foreign servers cannot be pushed down. The FDW can only ship a join when both tables reside on the same remote server and use the same user mapping <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=When%20,clauses">[6]</a>. Otherwise, it will perform two separate scans and join the results locally.</p>
<p><strong>Fix:</strong> If you frequently join tables across servers, consider consolidating the tables on a single server, materializing a view on one side, or pulling the smaller table into a temporary local table before joining.</p>
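<p>The temporary-table option could be sketched as follows. Both foreign tables are hypothetical: <code>remote_a_customers</code> is assumed to live on one foreign server and <code>remote_b_orders</code> on another, and the approach only pays off when the filtered customer set is small.</p>
<pre><code class="lang-pgsql">-- Pull the smaller, filtered side across the network once.
CREATE TEMP TABLE tmp_customers AS
SELECT id, name
FROM remote_a_customers      -- hypothetical foreign table on server A
WHERE region = 'EU';

-- Give the planner real row counts for the local copy.
ANALYZE tmp_customers;

-- The join now involves only one foreign server.
SELECT o.id, t.name
FROM remote_b_orders o       -- hypothetical foreign table on server B
JOIN tmp_customers t ON t.id = o.customer_id;
</code></pre>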
<h3 id="heading-mixed-local-and-foreign-joins">Mixed local and foreign joins</h3>
<p>A join between a local table and a foreign table will not be pushed down. Even though the foreign side might be pushdown‑eligible, the FDW cannot join it with local data on the remote server. A nested loop with a parameterized foreign scan is the typical pattern here, resulting in many remote calls.</p>
<p><strong>Fix:</strong> Filter or aggregate as much as possible on the foreign side first (via a CTE or by materializing a subset) before joining to local tables.</p>
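<p>One possible shape for this, assuming PostgreSQL 12 or later, a hypothetical foreign table <code>remote_orders</code>, and a hypothetical local table <code>local_accounts</code>, is to reduce the foreign side inside a materialized CTE so it is fetched in one remote call before the local join runs:</p>
<pre><code class="lang-pgsql">-- The CTE is evaluated once; its filter and GROUP BY are candidates for pushdown.
WITH recent AS MATERIALIZED (
  SELECT customer_id, sum(total) AS revenue
  FROM remote_orders                       -- hypothetical foreign table
  WHERE created_at &gt;= timestamptz '2026-05-02 00:00:00+00'
  GROUP BY customer_id
)
SELECT a.name, r.revenue
FROM local_accounts a                      -- hypothetical local table
JOIN recent r ON r.customer_id = a.id;
</code></pre>
<p>Whether this beats a parameterized nested loop depends on how many rows the CTE returns, so compare both plans with <code>EXPLAIN (ANALYZE, VERBOSE)</code>.</p>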
<h3 id="heading-remote-session-settings-and-search-paths">Remote session settings and search paths</h3>
<p>Because <code>postgres_fdw</code> sets a restricted <code>search_path</code>, <code>TimeZone</code>, <code>DateStyle</code>, and <code>IntervalStyle</code> in remote sessions <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=In%20the%20remote%20sessions%20opened,their%20expected%20search%20path%20environment">[4]</a>, any functions you call must be schema‑qualified or otherwise compatible. If a function relies on the current search path or session settings, it may break or produce different results on the remote side.</p>
<p><strong>Fix:</strong> Schema‑qualify remote functions and ensure that any environment‑dependent logic is safe to execute under the default FDW session settings. If necessary, attach <code>SET search_path</code> or other settings to your remote functions.</p>
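<p>Attaching a setting to a function is done on the remote server with <code>ALTER FUNCTION</code>. The function <code>app.normalize_sku(text)</code> below is a hypothetical example, not something defined in this handbook.</p>
<pre><code class="lang-pgsql">-- Run on the remote server. app.normalize_sku(text) is hypothetical.
-- The function now resolves unqualified names the same way
-- regardless of the caller's search_path.
ALTER FUNCTION app.normalize_sku(text)
  SET search_path = app, pg_catalog;
</code></pre>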
<h3 id="heading-troubleshooting-matrix">Troubleshooting matrix</h3>
<p>The table below maps symptoms in your <code>EXPLAIN</code> plan to likely causes and fixes. Use it as a quick diagnostic tool when something looks off.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Symptom in plan</strong></td><td><strong>Likely cause</strong></td><td><strong>Suggested fix</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Foreign Scan has loops much greater than 1</td><td>Parameterized remote lookup caused by nested loop, join conditions not shippable</td><td>Rewrite join so the FDW can ship a single joined query, or batch remote requests via an <code>IN</code> list or temporary table</td></tr>
<tr>
<td>Broad Remote SQL that lacks scope predicates</td><td><code>WHERE</code> clause contains non‑immutable functions or unsupported operators</td><td>Replace volatile functions with constants or allow‑list extension functions, ensure types and collations match</td></tr>
<tr>
<td>Local Hash Join or Merge Join between two foreign tables</td><td>Join could not be pushed down (different servers, user mappings, or unshippable join expression)</td><td>Consolidate tables on one server, align user mappings, or rewrite the join condition</td></tr>
<tr>
<td>Local Sort, Limit, or Unique on top of a Foreign Scan</td><td><code>ORDER BY</code>, <code>LIMIT</code>, or <code>DISTINCT</code> could not be pushed down</td><td>Simplify sort expressions, push filters deeper, check PG version for improvements</td></tr>
<tr>
<td>Plan runs but gives wrong results when pushdown is enabled</td><td>Semantic mismatch due to type/collation differences or remote session settings <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=It%20is%20generally%20recommended%20that,differently%20from%20the%20local%20server">[1]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=In%20the%20remote%20sessions%20opened,their%20expected%20search%20path%20environment">[4]</a></td><td>Align types/collations, schema‑qualify functions, use stable session settings</td></tr>
</tbody>
</table>
</div><h2 id="heading-reading-explain-like-a-pro">Reading EXPLAIN Like a Pro</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771117830315/62ca8fde-2638-4ae1-b968-1100ac5251bb.png" alt="SQL execution plan analysis table with columns: exclusive, inclusive, rows x, rows, loops, and node details. Rows display Nested Loop Join, Hash Join, and Seq Scan operations with costs, times, and buffers. Highlighted cells indicate notable metrics." class="image--center mx-auto" width="1579" height="823" loading="lazy"></p>
<p>Many developers skim <code>EXPLAIN</code> plans for local queries, looking at the top nodes and overall cost. For FDW queries, you must invert that habit: read the foreign parts first. The Remote SQL string tells you what the remote server is being asked to do, and the loops field tells you how many times that remote call is executed.</p>
<h3 id="heading-inspect-the-foreign-scan-nodes">Inspect the Foreign Scan nodes</h3>
<p>Start by finding the Foreign Scan node(s). In <code>EXPLAIN (VERBOSE)</code>, each foreign scan includes a line like:</p>
<pre><code class="lang-pgsql">Remote <span class="hljs-keyword">SQL</span>: <span class="hljs-keyword">SELECT</span> ...
</code></pre>
<p>This line is not a trivial detail: it’s the actual SQL that will run on the remote server. Read it carefully. Does it include your <code>WHERE</code> predicates? Does it include your join conditions? If not, you know the local server will pick up the slack.</p>
<p>Look at the loops value, which appears when you run <code>EXPLAIN (ANALYZE)</code>. If it exceeds 1, the same remote query is executed multiple times. For example:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">Foreign</span> Scan <span class="hljs-keyword">on</span> <span class="hljs-built_in">public</span>.user_entity  (<span class="hljs-keyword">rows</span>=<span class="hljs-number">1</span> loops=<span class="hljs-number">416</span>)
  Remote <span class="hljs-keyword">SQL</span>: <span class="hljs-keyword">SELECT</span> id, tenant_id <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">public</span>.user_entity <span class="hljs-keyword">WHERE</span> enabled <span class="hljs-keyword">AND</span> service_account_client_link <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NULL</span> <span class="hljs-keyword">AND</span> id = <span class="hljs-meta">$1</span>
</code></pre>
<p>This is the “N+1” problem in disguise. The plan executes the foreign scan once per outer row. Multiply the per‑loop cost by the number of loops to understand why the query is slow. The fix is to rewrite the query so that the join and filters are applied in a single remote call.</p>
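<p>When a full join rewrite isn't available, batching the lookups is one way to collapse the loop. The sketch below reuses the <code>user_entity</code> foreign table from the plan above. The id values are made up, and it assumes the application has already collected the ids it needs.</p>
<pre><code class="lang-pgsql">-- One remote call for the whole batch instead of one call per id.
-- The id values are hypothetical placeholders.
EXPLAIN (ANALYZE, VERBOSE)
SELECT id, tenant_id
FROM public.user_entity
WHERE enabled
  AND service_account_client_link IS NULL
  AND id = ANY (ARRAY['user-1', 'user-2', 'user-3']);

-- Check that the Foreign Scan reports loops=1 and that the
-- array condition appears in the Remote SQL.
</code></pre>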
<h3 id="heading-recognize-initplan-vs-subplan">Recognize InitPlan vs SubPlan</h3>
<p>An InitPlan runs once and caches its result. A SubPlan can run per outer row. In FDW queries, subplans often drive parameterized remote scans. If you see a SubPlan attached to a nested loop that feeds a foreign scan, suspect a parameterized remote lookup and look for ways to turn it into an InitPlan or merge it into a single remote query.</p>
<h3 id="heading-understand-cte-materialization">Understand CTE materialization</h3>
<p>Common table expressions (CTEs) behave differently depending on whether they are marked <code>MATERIALIZED</code> or <code>NOT MATERIALIZED</code>. A materialized CTE is computed once and stored in a temporary structure, then read by the rest of the query. A non‑materialized CTE is inlined into the parent query, allowing optimizations to span across the boundary.</p>
<p>In PostgreSQL 12 and later, CTEs are inlined by default unless they’re referenced multiple times or explicitly marked <code>MATERIALIZED</code>. Materializing a CTE that contains a foreign scan can freeze a broad remote fetch and prevent later clauses from being pushed down. On the other hand, materialization can prevent repeated remote scans if the CTE is referenced multiple times. Use this lever deliberately to control where remote work happens.</p>
<h3 id="heading-annotated-example">Annotated example</h3>
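<p>Let's annotate a simplified excerpt from a real plan. It places nodes from two versions of the same query side by side: the first Foreign Scan comes from the original plan and the second from the refactored one. The goal is to show how to quickly read the relevant parts.</p>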
<pre><code class="lang-pgsql">Nested <span class="hljs-keyword">Loop</span>  (<span class="hljs-keyword">rows</span>=<span class="hljs-number">414</span> loops=<span class="hljs-number">1</span>)
  -&gt; Hash <span class="hljs-keyword">Join</span>  (<span class="hljs-keyword">rows</span>=<span class="hljs-number">416</span> loops=<span class="hljs-number">1</span>)
       -&gt; <span class="hljs-keyword">Foreign</span> Scan <span class="hljs-keyword">on</span> <span class="hljs-built_in">public</span>.user_entity (<span class="hljs-keyword">rows</span>=<span class="hljs-number">1</span> loops=<span class="hljs-number">416</span>)
            Remote <span class="hljs-keyword">SQL</span>: <span class="hljs-keyword">SELECT</span> id, tenant_id <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">public</span>.user_entity <span class="hljs-keyword">WHERE</span> enabled <span class="hljs-keyword">AND</span> service_account_client_link <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NULL</span> <span class="hljs-keyword">AND</span> id = <span class="hljs-meta">$1</span>
  -&gt; <span class="hljs-keyword">Foreign</span> Scan <span class="hljs-keyword">on</span> <span class="hljs-built_in">public</span>.user_attribute (<span class="hljs-keyword">rows</span>=<span class="hljs-number">671</span> loops=<span class="hljs-number">1</span>)
       Remote <span class="hljs-keyword">SQL</span>: <span class="hljs-keyword">SELECT</span> ua.user_id, ua.<span class="hljs-keyword">value</span> <span class="hljs-keyword">FROM</span> user_attribute ua <span class="hljs-keyword">JOIN</span> user_entity u <span class="hljs-keyword">ON</span> ua.user_id = u.id <span class="hljs-keyword">JOIN</span> tenant r <span class="hljs-keyword">ON</span> u.tenant_id = r.id <span class="hljs-keyword">WHERE</span> ua.name = <span class="hljs-string">'attribute A'</span> <span class="hljs-keyword">AND</span> r.name = <span class="hljs-string">'demo'</span> <span class="hljs-keyword">AND</span> u.enabled <span class="hljs-keyword">AND</span> u.service_account_client_link <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NULL</span> <span class="hljs-keyword">AND</span> (g.name = <span class="hljs-string">'keycloak-group-a'</span> <span class="hljs-keyword">OR</span> g.parent_group = <span class="hljs-meta">$1</span>)
</code></pre>
<p>In the old plan, the first Foreign Scan executed 416 times, each time retrieving a single row. The Remote SQL only applies the filter on enabled and service_account_client_link – it doesn’t include the tenant or group scoping. That scoping is applied by the nested loop outside the foreign scan.</p>
<p>In the refactored plan, the second Foreign Scan results from combining user_attribute, user_entity, user_group_membership, keycloak_group, and tenant into a single remote query. It retrieves 671 rows in a single query and includes all relevant filters. There is no repeated remote call. The timing difference is driven by the different loop values and the selectivity of the Remote SQL.</p>
<h2 id="heading-how-to-tune-postgresfdw">How to Tune postgres_fdw</h2>
<p>Once you've structured your query for maximum pushdown, tuning knobs let you squeeze out further performance improvements and adjust planner decisions.</p>
<h3 id="heading-fetchsize">fetch_size</h3>
<p><code>fetch_size</code> controls how many rows <code>postgres_fdw</code> retrieves per network fetch. The default is <code>100</code> rows <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[9]</a>. A small fetch size means more round-trips and lower memory usage. A larger fetch size reduces network overhead at the cost of buffering more rows in memory.</p>
<p>In practice, increasing <code>fetch_size</code> to a few thousand can reduce latency for large result sets. It’s specified either at the foreign server or foreign table level:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">SERVER</span> foreign_server <span class="hljs-keyword">OPTIONS</span> (<span class="hljs-keyword">ADD</span> fetch_size <span class="hljs-string">'1000'</span>);
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">FOREIGN</span> <span class="hljs-keyword">TABLE</span> remote_table <span class="hljs-keyword">OPTIONS</span> (<span class="hljs-keyword">ADD</span> fetch_size <span class="hljs-string">'1000'</span>);
</code></pre>
<h3 id="heading-useremoteestimate">use_remote_estimate</h3>
<p>By default, the planner estimates the cost of foreign scans using local statistics. This can be wildly inaccurate if the foreign table has a different data distribution. Setting <code>use_remote_estimate</code> to true tells <code>postgres_fdw</code> to run <code>EXPLAIN</code> on the remote server to get row count and cost estimates. This can dramatically improve join order selection at the cost of extra remote <code>EXPLAIN</code> round-trips during planning <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>. You can set this per table or per server:</p>
<pre><code class="lang-pgsql"><span class="hljs-comment">-- Use SET instead of ADD if the option is already defined on this server.</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">SERVER</span> foreign_server <span class="hljs-keyword">OPTIONS</span> (<span class="hljs-keyword">ADD</span> use_remote_estimate <span class="hljs-string">'true'</span>);
</code></pre>
<h3 id="heading-fdwstartupcost-and-fdwtuplecost">fdw_startup_cost and fdw_tuple_cost</h3>
<p>These cost parameters model the overhead of starting a foreign scan and the cost per row fetched. Adjusting them can influence the planner’s choice of join strategy. A higher <code>fdw_startup_cost</code> discourages the planner from choosing plans with many small foreign scans (which might generate many remote calls). A higher <code>fdw_tuple_cost</code> discourages plans that fetch large numbers of rows <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>. Use these only after you have solid evidence from <code>EXPLAIN</code> and experiments.</p>
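<p>As a rough sketch of what such an experiment might look like, the statements below set both options on the server and then revert them. The numbers are illustrative assumptions, not recommendations, and <code>remote_table</code> is a placeholder:</p>
<pre><code class="lang-pgsql">-- Illustrative values only; ADD fails if the option is already set (use SET then)
ALTER SERVER foreign_server
  OPTIONS (ADD fdw_startup_cost '200', ADD fdw_tuple_cost '0.05');

-- Re-check the estimated costs and the plan shape after the change
EXPLAIN (VERBOSE)
SELECT count(*) FROM remote_table;

-- Revert to the built-in defaults if the plan does not improve
ALTER SERVER foreign_server
  OPTIONS (DROP fdw_startup_cost, DROP fdw_tuple_cost);
</code></pre>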
<h3 id="heading-analyze-and-analyzesampling">ANALYZE and analyze_sampling</h3>
<p>Running <code>ANALYZE</code> on a foreign table collects local statistics by sampling the remote table <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>. Accurate stats are essential for good estimates when <code>use_remote_estimate</code> is false.</p>
<p>But if the remote table changes frequently, these stats become stale quickly. The <code>analyze_sampling</code> option (PostgreSQL 16 and later) controls whether sampling happens on the remote side or locally. When it is set to <code>random</code>, <code>system</code>, <code>bernoulli</code>, or <code>auto</code>, <code>ANALYZE</code> samples rows remotely instead of pulling all rows into the local server. Setting it to <code>off</code> restores local sampling <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>.</p>
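<p>A minimal sketch of refreshing and inspecting those statistics, assuming a foreign table named <code>remote_table</code> (a placeholder) and PostgreSQL 16 or later for the sampling option:</p>
<pre><code class="lang-pgsql">-- Choose a remote sampling method (use SET if the option is already defined)
ALTER FOREIGN TABLE remote_table OPTIONS (ADD analyze_sampling 'bernoulli');

-- Refresh the local statistics for the foreign table
ANALYZE remote_table;

-- Inspect what the local planner now knows about the table
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'remote_table';
</code></pre>
<p>Foreign tables are not processed by autovacuum, so this <code>ANALYZE</code> has to be scheduled explicitly if you depend on local estimates.</p>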
<h3 id="heading-extensions">extensions</h3>
<p>The <code>extensions</code> option lists extensions whose functions and operators can be shipped to the remote server <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[2]</a>. If you rely on functions from <code>citext</code>, <code>pg_trgm</code>, or other extensions, add them to the server definition (use <code>ADD</code> instead of <code>SET</code> if the option hasn't been defined on the server yet):</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">SERVER</span> foreign_server <span class="hljs-keyword">OPTIONS</span> (<span class="hljs-keyword">SET</span> extensions <span class="hljs-string">'citext,pg_trgm'</span>);
</code></pre>
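<p>Because allow-listing only makes sense when the extension behaves the same on both sides, a quick sanity check is to compare installed versions. This sketch is meant to be run once on the local server and once on the remote server:</p>
<pre><code class="lang-pgsql">-- Run on BOTH servers and compare the output;
-- the extensions should be installed, ideally at the same version
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('citext', 'pg_trgm')
ORDER BY extname;
</code></pre>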
<h3 id="heading-a-quick-knob-impact-table">A quick knob impact table</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Knob</strong></td><td><strong>Primary effect</strong></td><td><strong>When to change it</strong></td><td><strong>Possible downside</strong></td></tr>
</thead>
<tbody>
<tr>
<td>fetch_size</td><td>Number of rows per fetch</td><td>Result sets are large and latency dominates</td><td>Too large consumes memory</td></tr>
<tr>
<td>use_remote_estimate</td><td>Better row count/cost estimates</td><td>Planner misestimates foreign scans</td><td>Extra remote queries during planning</td></tr>
<tr>
<td>fdw_startup_cost</td><td>Penalty per foreign scan</td><td>Planner chooses many small foreign scans</td><td>Wrong values bias the planner</td></tr>
<tr>
<td>fdw_tuple_cost</td><td>Cost per row fetched</td><td>Planner pulls too many rows</td><td>Mis‑tuned values mislead planner</td></tr>
<tr>
<td>extensions</td><td>Which extension functions are shippable</td><td>Using extension functions in predicates</td><td>Extensions must exist and match on both servers</td></tr>
</tbody>
</table>
</div><h2 id="heading-schema-and-index-recommendations">Schema and Index Recommendations</h2>
<p>Pushdown doesn’t eliminate the need for good indexes. In fact, effective pushdown depends on the remote server having indexes that support the filter and join predicates you’re shipping.</p>
<p>Below are some patterns to watch for in FDW queries and the indexes that support them. You can adapt these to your own schema.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Table</strong></td><td><strong>Access pattern</strong></td><td><strong>Recommended index</strong></td><td><strong>Why</strong></td></tr>
</thead>
<tbody>
<tr>
<td>tenant (remote)</td><td>Filter by tenant.name</td><td>UNIQUE (name) or BTREE (name)</td><td>Resolves tenant ID quickly</td></tr>
<tr>
<td>keycloak_group (remote)</td><td>Filter by name, join by tenant_id, filter on parent_group</td><td>Composite (tenant_id, name) and (parent_group)</td><td>Supports resolving root group and walking one‑level hierarchy</td></tr>
<tr>
<td>user_group_membership (remote)</td><td>Join by user_id, filter by group_id</td><td>BTREE (group_id, user_id)</td><td>Efficiently finds users in a set of groups</td></tr>
<tr>
<td>user_attribute (remote)</td><td>Filter by name, join by user_id</td><td>Composite (name, user_id) (optionally include value)</td><td>Matches “attribute name → users → values” flow</td></tr>
<tr>
<td>user_entity (remote)</td><td>Filter by tenant_id, enabled, service_account_client_link IS NULL, join by id</td><td>Partial index on (tenant_id, id) with predicate on enabled and service_account_client_link IS NULL</td><td>Helps remote planner start from user table when tenant and user filters are applied</td></tr>
<tr>
<td>filtercategory (local)</td><td>Filter by category &amp;&amp; uuid[], join on (entitytype, entityid)</td><td>GIN index on category, BTREE (entitytype, entityid)</td><td>Speeds array overlap checks and join predicate</td></tr>
</tbody>
</table>
</div><p>In general, indexes should reflect the join order you expect the remote planner to use. If your Remote SQL starts with:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">FROM</span> user_attribute ua <span class="hljs-keyword">JOIN</span> user_entity u <span class="hljs-keyword">ON</span> ua.user_id = u.id <span class="hljs-keyword">JOIN</span> user_group_membership ugm <span class="hljs-keyword">ON</span> ...
</code></pre>
<p>ensure that indexes exist on <code>user_attribute(user_id)</code> and <code>user_group_membership(user_id)</code>.</p>
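<p>Translated into DDL, a few of the recommendations from the table above might look like the sketch below. It is meant to be run on the remote database, the index names are hypothetical, and you should check that adding indexes is acceptable for a schema that is managed by another application:</p>
<pre><code class="lang-pgsql">-- Run on the REMOTE database; index names are hypothetical

-- Supports "attribute name -> users" lookups
CREATE INDEX IF NOT EXISTS idx_user_attribute_name_user
  ON user_attribute (name, user_id);

-- Supports finding users in a set of groups
CREATE INDEX IF NOT EXISTS idx_ugm_group_user
  ON user_group_membership (group_id, user_id);

-- Partial index matching the enabled / non-service-account filter
CREATE INDEX IF NOT EXISTS idx_user_entity_tenant_active
  ON user_entity (tenant_id, id)
  WHERE enabled AND service_account_client_link IS NULL;
</code></pre>
<p>On a busy remote server, <code>CREATE INDEX CONCURRENTLY</code> avoids blocking writes while the index is built, at the cost of a slower build.</p>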
<h2 id="heading-benchmarking-methodology">Benchmarking Methodology</h2>
<p>It’s easy to claim a performance improvement without proper measurement. Here's a repeatable method you can use to benchmark FDW query changes.</p>
<ol>
<li><p><strong>Warm the caches.</strong> Run each query once to load data into the remote buffer cache and the local FDW connection. Discard the timings.</p>
</li>
<li><p><strong>Measure latencies.</strong> Use EXPLAIN (ANALYZE, BUFFERS, VERBOSE) to capture execution times, buffer usage, and remote row counts. Be aware that EXPLAIN ANALYZE adds overhead, so record the raw execution time if possible by running the query directly.</p>
</li>
<li><p><strong>Record remote metrics.</strong> On the remote server, enable <code>pg_stat_statements</code> and track <code>calls</code>, <code>total_exec_time</code> (named <code>total_time</code> before PostgreSQL 13), and <code>rows</code> for each remote query. This gives you a per‑query breakdown and confirms what Remote SQL is executed.</p>
</li>
<li><p><strong>Control for concurrency and network latency.</strong> Run benchmarks during a quiet period or isolate the test cluster. If your environment has high network latency, record the round‑trip time separately to attribute delays.</p>
</li>
<li><p><strong>Compare apples to apples.</strong> Benchmark the old and new queries under identical conditions. Use the same sample data, same remote server, and same connection settings.</p>
</li>
<li><p><strong>Look at row counts.</strong> The primary goal of pushdown is to reduce the number of rows shipped. Compare the rows column of each Foreign Scan node.</p>
</li>
</ol>
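<p>Steps 2 and 3 can be sketched as follows. The query uses the <code>user_attribute</code> table from the case study later in this handbook, and the column names assume PostgreSQL 13 or later on the remote server:</p>
<pre><code class="lang-pgsql">-- LOCAL server: capture the plan, the Remote SQL, and per-node row counts
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT user_id, value
FROM user_attribute
WHERE name = 'attributeA';

-- REMOTE server: reset the counters before a benchmark run
SELECT pg_stat_statements_reset();

-- REMOTE server: after the run, inspect what the FDW actually sent
SELECT calls, rows, total_exec_time, query
FROM pg_stat_statements
WHERE query ILIKE '%user_attribute%'
ORDER BY total_exec_time DESC
LIMIT 10;
</code></pre>
<p>Because <code>postgres_fdw</code> reads results through a cursor, expect the remote statements to show up as <code>DECLARE ... CURSOR FOR SELECT ...</code> followed by <code>FETCH</code> calls rather than as plain <code>SELECT</code> statements.</p>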
<p>Here's a simple matrix you can use to record your experiments:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Scenario</strong></td><td><strong>What you're testing</strong></td><td><strong>Expected change in Remote SQL</strong></td><td><strong>Metrics to record</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Baseline (old query)</td><td>Starting point: broad remote scans + local joins</td><td>Remote SQL lacks scoping predicates</td><td>p50/p95 latency, remote row count, local sort/hash time</td></tr>
<tr>
<td>Refactor (new query)</td><td>Join + filter pushdown</td><td>Remote SQL includes joins and filters</td><td>Same metrics, plus remote row count</td></tr>
<tr>
<td>Introduce a volatile function</td><td>Pushdown blocker test</td><td>Clause removed from Remote SQL</td><td>Remote row count increases, local filter cost increases</td></tr>
<tr>
<td>Type or collation mismatch</td><td>Semantic risk test</td><td>Remote SQL might change behavior or lose pushdown</td><td>Compare correctness and row counts</td></tr>
<tr>
<td>ORDER/LIMIT pushdown</td><td>Version‑dependent test</td><td>Remote SQL includes ORDER BY, LIMIT</td><td>Sort time shifts to remote. Rows fetched should drop to the limit while the final result stays the same</td></tr>
<tr>
<td>use_remote_estimate on/off</td><td>Planning accuracy test</td><td>Planner uses remote estimates</td><td>Planning time, join order, and runtime difference</td></tr>
</tbody>
</table>
</div><h2 id="heading-monitoring-and-logging">Monitoring and Logging</h2>
<p>In production, you need to know when a query starts misbehaving. There are two places to look: the local server and the remote server.</p>
<h3 id="heading-local-metrics">Local metrics</h3>
<ol>
<li><p><strong>pg_stat_statements.</strong> This extension tracks planning and execution times, row counts, and buffer hits for each query. Look for high total times relative to rows or calls.</p>
</li>
<li><p><strong>auto_explain.</strong> Set <code>auto_explain.log_min_duration</code> to capture slow queries with their plans, and enable <code>auto_explain.log_verbose</code> so the logged plan includes the Remote SQL. This shows you what was executed remotely and whether the plan changed.</p>
</li>
<li><p><strong>Connection pool metrics.</strong> Monitor connection counts and wait events related to FDW operations (for example, PostgresFdwConnect, PostgresFdwGetResult) as described in the documentation <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=,Extension">[10]</a>.</p>
</li>
</ol>
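<p>A session-level sketch of the second and third items is shown below. The threshold is an arbitrary example, <code>LOAD</code> normally requires superuser privileges, and the <code>PostgresFdw*</code> wait event names are only reported by recent PostgreSQL releases, so treat the second query as an assumption to verify on your version:</p>
<pre><code class="lang-pgsql">-- Enable auto_explain for the current session only
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';  -- log plans for statements slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts and timings
SET auto_explain.log_verbose = on;            -- VERBOSE output includes the Remote SQL

-- Sessions currently waiting on postgres_fdw activity
SELECT pid, wait_event_type, wait_event, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE wait_event LIKE 'PostgresFdw%';
</code></pre>
<p>Keep in mind that <code>auto_explain.log_analyze</code> adds timing overhead to every statement in the session, so it's usually enabled selectively rather than globally.</p>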
<h3 id="heading-remote-metrics">Remote metrics</h3>
<ol>
<li><p><strong>pg_stat_statements on the remote server.</strong> This lets you see which Remote SQL queries are being executed, how often, and how long they take. Compare these with the Remote SQL strings in your local EXPLAIN plans.</p>
</li>
<li><p><strong>Server logs.</strong> Lower <code>log_min_duration_statement</code> (or, for short investigations, set <code>log_statement</code> to <code>all</code>) on the remote server to capture long-running remote queries.</p>
</li>
</ol>
<p>Correlating local and remote metrics can reveal patterns such as a new code path causing a surge in remote queries, or pushdown failures that lead to heavy remote scans.</p>
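<p>On the remote side, a minimal sketch might look like this. The 200 ms threshold is an arbitrary example, and the second query assumes the FDW connections still use the default application name (<code>postgres_fdw</code>), which can be overridden in the server options:</p>
<pre><code class="lang-pgsql">-- REMOTE server: log statements slower than 200 ms
ALTER SYSTEM SET log_min_duration_statement = '200ms';
SELECT pg_reload_conf();

-- REMOTE server: sessions opened by postgres_fdw and what they are running
SELECT pid, usename, client_addr, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE application_name = 'postgres_fdw';
</code></pre>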
<h2 id="heading-case-study-refactoring-a-keycloak-coverage-query">Case Study: Refactoring a Keycloak Coverage Query</h2>
<p>The theory above may seem abstract until you see it play out in practice. Let's walk through a real example inspired by a Keycloak integration.</p>
<p>The original query calculated coverage: given a list of category IDs, it returned the percentage of users who had attributes mapped to those categories and a JSON array of entity counts. The query used a CTE to build a list of scoped users, then joined it with user attributes, category mappings, and a few other tables.</p>
<h3 id="heading-symptom">Symptom</h3>
<p>In a test environment with 100K user records, the query averaged 166 ms. This was slower than expected. Running <code>EXPLAIN (ANALYZE, BUFFERS, VERBOSE)</code> showed two foreign scans on the Keycloak database. The first scanned <code>user_entity</code> 416 times (loops = 416). The second pulled all rows from <code>user_attribute</code> where <code>name = 'attributeA'</code> before filtering by tenant and group locally.</p>
<p>Here's a simplified excerpt (numbers are approximate):</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">Foreign</span> Scan <span class="hljs-keyword">on</span> <span class="hljs-built_in">public</span>.user_entity  (actual <span class="hljs-type">time</span>=<span class="hljs-number">0.117</span>.<span class="hljs-number">.0</span><span class="hljs-number">.117</span> <span class="hljs-keyword">rows</span>=<span class="hljs-number">1</span> loops=<span class="hljs-number">416</span>)
  Remote <span class="hljs-keyword">SQL</span>: <span class="hljs-keyword">SELECT</span> id, tenant_id <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">public</span>.user_entity <span class="hljs-keyword">WHERE</span> (enabled <span class="hljs-keyword">AND</span> service_account_client_link <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NULL</span> <span class="hljs-keyword">AND</span> id = <span class="hljs-meta">$1</span>)
<span class="hljs-keyword">Foreign</span> Scan <span class="hljs-keyword">on</span> <span class="hljs-built_in">public</span>.user_attribute  (actual <span class="hljs-type">time</span>=<span class="hljs-number">41.267</span>.<span class="hljs-number">.80</span><span class="hljs-number">.352</span> <span class="hljs-keyword">rows</span>=<span class="hljs-number">80739</span> loops=<span class="hljs-number">1</span>)
  Remote <span class="hljs-keyword">SQL</span>: <span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">value</span>, user_id <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">public</span>.user_attribute <span class="hljs-keyword">WHERE</span> ((<span class="hljs-string">'attributeA'</span> = <span class="hljs-type">name</span>))
</code></pre>
<p>The first scan performed a single-row lookup 416 times. The second scan retrieved 80,739 rows because the only condition pushed down was <code>name = 'attributeA'</code>. Tenant and group scoping occurred locally. That meant 80k rows were transferred over the network and then filtered down to about 671 on the local side.</p>
<h3 id="heading-diagnosis">Diagnosis</h3>
<p>There were two main issues.</p>
<p>The first was the N+1 pattern of remote calls on <code>user_entity</code>. The join to <code>user_entity</code> was not pushed down, so the plan executed a remote lookup for each row from <code>user_group_membership</code>. This created 416 remote queries.</p>
<p>Second was the unscoped attribute fetch. Because the <code>WHERE</code> clause included <code>user_entity.tenant_id = tenant.id</code> and <code>keycloak_group.name = 'groupA'</code> in a higher CTE, the FDW could not see those predicates when scanning <code>user_attribute</code>. It therefore fetched all rows with <code>name = 'attributeA'</code> and left the tenant and group filters to the local side.</p>
<h3 id="heading-refactor">Refactor</h3>
<p>The fix was to inline the tenant and group joins into the user_attribute scan to avoid the nested-loop pattern. The refactored <code>selected_user_attributes</code> CTE looked like this (simplified for readability):</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">WITH</span> selected_user_attributes <span class="hljs-keyword">AS</span> (
  <span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">DISTINCT</span> ua.user_id, ua.<span class="hljs-keyword">value</span>
  <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">public</span>.user_attribute ua
  <span class="hljs-keyword">JOIN</span> <span class="hljs-built_in">public</span>.user_entity u <span class="hljs-keyword">ON</span> u.id = ua.user_id
  <span class="hljs-keyword">JOIN</span> <span class="hljs-built_in">public</span>.user_group_membership ugm <span class="hljs-keyword">ON</span> ugm.user_id = u.id
  <span class="hljs-keyword">JOIN</span> <span class="hljs-built_in">public</span>.keycloak_group g <span class="hljs-keyword">ON</span> g.id = ugm.group_id
  <span class="hljs-keyword">JOIN</span> <span class="hljs-built_in">public</span>.tenant r <span class="hljs-keyword">ON</span> r.id = u.tenant_id
  <span class="hljs-keyword">WHERE</span> ua.name = <span class="hljs-string">'attributeA'</span>
    <span class="hljs-keyword">AND</span> u.enabled
    <span class="hljs-keyword">AND</span> u.service_account_client_link <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NULL</span>
    <span class="hljs-keyword">AND</span> r.name = <span class="hljs-string">'tenantA'</span>
    <span class="hljs-keyword">AND</span> (g.name = <span class="hljs-string">'groupA'</span> <span class="hljs-keyword">OR</span> g.parent_group = (
         <span class="hljs-keyword">SELECT</span> id <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">public</span>.keycloak_group <span class="hljs-keyword">WHERE</span> <span class="hljs-type">name</span> = <span class="hljs-string">'groupA'</span> <span class="hljs-keyword">AND</span> tenant_id = r.id
    ))
)
</code></pre>
<p>This single query expresses the same scoping logic that previously lived in separate CTEs. Because all the join conditions are on the same foreign server and use built‑in operators, the FDW can push down the entire join. The new plan looked like this:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">Foreign</span> Scan  (actual <span class="hljs-type">time</span>=<span class="hljs-number">7.840</span>.<span class="hljs-number">.7</span><span class="hljs-number">.856</span> <span class="hljs-keyword">rows</span>=<span class="hljs-number">671</span> loops=<span class="hljs-number">1</span>)
  Remote <span class="hljs-keyword">SQL</span>: <span class="hljs-keyword">SELECT</span> ua.user_id, ua.<span class="hljs-keyword">value</span> <span class="hljs-keyword">FROM</span> user_attribute ua <span class="hljs-keyword">JOIN</span> user_entity u <span class="hljs-keyword">ON</span> ua.user_id = u.id <span class="hljs-keyword">JOIN</span> user_group_membership ugm <span class="hljs-keyword">ON</span> ugm.user_id = u.id <span class="hljs-keyword">JOIN</span> keycloak_group g <span class="hljs-keyword">ON</span> g.id = ugm.group_id <span class="hljs-keyword">JOIN</span> tenant r <span class="hljs-keyword">ON</span> u.tenant_id= r.id <span class="hljs-keyword">WHERE</span> ua.name = <span class="hljs-string">'attributeA'</span> <span class="hljs-keyword">AND</span> u.enabled <span class="hljs-keyword">AND</span> u.service_account_client_link <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NULL</span> <span class="hljs-keyword">AND</span> r.name = <span class="hljs-string">'tenantA'</span> <span class="hljs-keyword">AND</span> (g.name = <span class="hljs-string">'groupA'</span> <span class="hljs-keyword">OR</span> g.parent_group = <span class="hljs-meta">$1</span>)
</code></pre>
<p>Only one remote query is executed, and it returns 671 rows. Tenant and group scoping occur on the remote server. There is no nested loop or repeated remote scan. The final runtime dropped to <strong>about 25 ms</strong>.</p>
<h3 id="heading-why-it-improved">Why it improved</h3>
<ol>
<li><p><strong>Fewer rows crossing the network.</strong> The old plan fetched 80k attribute rows and filtered them locally. The new plan fetched only the 671 scoped rows.</p>
</li>
<li><p><strong>No repeated remote calls.</strong> The old plan executed 416 remote scans of <code>user_entity</code>. The new plan performs one joined remote query.</p>
</li>
<li><p><strong>Less local work.</strong> Because the join and filtering happen remotely, the local side no longer hashes or filters large sets.</p>
</li>
</ol>
<h3 id="heading-key-takeaway">Key takeaway</h3>
<p>If you see a Foreign Scan with a high <code>loops</code> count or a Remote SQL that doesn’t contain your filters and joins, you’re leaving performance on the table. Merging filters and joins into a single remote query (subject to shippability rules) often yields a several-fold improvement, as it did here, and sometimes far more when the unfiltered remote tables are large.</p>
<h2 id="heading-checklist-and-troubleshooting-guide">Checklist and Troubleshooting Guide</h2>
<p>The following steps summarize how to approach FDW performance tuning:</p>
<ol>
<li><p><strong>Inspect the Remote SQL.</strong> Always run <code>EXPLAIN (VERBOSE)</code> and look at what is being sent to the remote. If your predicates are missing, the FDW isn't pushing them down.</p>
</li>
<li><p><strong>Check loops.</strong> If <code>loops</code> is greater than 1 on a Foreign Scan, you are paying for repeated remote calls. Rewrite the query or reorder the joins to make the foreign scan run once.</p>
</li>
<li><p><strong>Make predicates shippable.</strong> Replace non-immutable functions (such as <code>now()</code> or <code>random()</code>) with constants or parameters. Ensure operators and functions are built‑in or explicitly allow‑listed via the <code>extensions</code> option <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[2]</a>.</p>
</li>
<li><p><strong>Align types and collations.</strong> Use the same data types and collations on both sides to avoid semantic mismatches <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=It%20is%20generally%20recommended%20that,differently%20from%20the%20local%20server">[1]</a>.</p>
</li>
<li><p><strong>Push joins to the same server.</strong> Consolidate tables on one foreign server if possible. Joins across servers cannot be pushed down <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=When%20,clauses">[6]</a>.</p>
</li>
<li><p><strong>Use use_remote_estimate when planning seems off.</strong> Enabling remote estimates can improve join order selection <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>.</p>
</li>
<li><p><strong>Tune fetch_size and costs</strong> if your queries transfer many rows. A bigger <code>fetch_size</code> reduces round trips; adjusting <code>fdw_startup_cost</code> and <code>fdw_tuple_cost</code> influences the planner <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>.</p>
</li>
<li><p><strong>Analyze foreign tables</strong> if you rely on local cost estimates. Keep in mind that stats can get stale quickly <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>.</p>
</li>
<li><p><strong>Monitor both servers.</strong> Use <code>pg_stat_statements</code> on local and remote servers to see how often remote queries run and how long they take.</p>
</li>
<li><p><strong>Test version upgrades.</strong> Each major release improves FDW pushdown semantics (for example, aggregates in 10 <a target="_blank" href="https://www.postgresql.org/docs/release/10.0/#:~:text=,Jeevan%20Chalke%2C%20Ashutosh%20Bapat">[7]</a>, ORDER/LIMIT in 12 <a target="_blank" href="https://www.postgresql.org/docs/release/12.0/#:~:text=,Etsuro%20Fujita%29%20%C2%A7%20%C2%A7">[8]</a>). Retest after upgrading.</p>
</li>
</ol>
<h2 id="heading-case-study-takeaways">Case Study Takeaways</h2>
<p>Querying remote data with PostgreSQL’s <code>postgres_fdw</code> can be fast and convenient if you respect the underlying mechanics. Pushdown is the difference between streaming a trickle of relevant rows and hauling an ocean of data across the network. It isn't simply a matter of moving CPU cycles – it changes how much data moves, how many network round trips occur, and how much work your local server has to do.</p>
<p>The rules may seem restrictive – use only immutable functions, avoid cross‑server joins, align types and collations – but they exist to preserve correctness while enabling optimization.</p>
<p>By reading <code>EXPLAIN</code> from the bottom up, inspecting the Remote SQL, and understanding the shippability rules, you can spot slow patterns quickly. Armed with tuning knobs like <code>fetch_size</code> and <code>use_remote_estimate</code>, and a willingness to rewrite queries to make joins and filters pushable, you can often achieve dramatic performance gains without touching your hardware.</p>
<p>This case study shows that rewriting a query so that it runs as a single joined remote query reduced runtime from around <strong>166 ms to 25 ms</strong>. That sort of improvement is not rare. It’s what happens when you treat FDW queries as distributed queries rather than local queries in disguise.</p>
<p>The next time you debug a slow FDW query, remember this handbook. Check the Remote SQL. Count the loops. Ask yourself: “Am I doing the work close to the data, or am I bringing the data to the work?” Adjust accordingly, and you'll write queries that make the most of Postgres's federated capabilities while keeping your latency in check.</p>
<p>This section closes the case study loop and summarizes exactly what changed in the plan and why it produced a large end-to-end win. The following sections of the handbook turn that single win into a repeatable method: how Postgres determines what is shippable, how to quickly read FDW plans, which operations and versions matter, and how to debug common failure modes that prevent pushdown.</p>
<h2 id="heading-advanced-operations-a-deeper-dive-into-shippability">Advanced Operations: A Deeper Dive into Shippability</h2>
<p>The previous sections introduced the basic rules around what can be pushed to the remote and why. To really make sense of those rules, you need to see how they play out on the operations you use every day.</p>
<p>This section walks through filters, joins, aggregates, ordering and limits, DISTINCT queries, and window functions in more detail. By the end, you should have a mental map of which operations to trust and which to double‑check when reading your plans.</p>
<h3 id="heading-filters-and-simple-predicates">Filters and simple predicates</h3>
<h4 id="heading-where-clauses-matter-more-than-you-think">WHERE clauses matter more than you think</h4>
<p>When you specify <code>WHERE attribute = 'value'</code> on a foreign table, the FDW will happily transmit that predicate to the remote server as long as the comparison uses built‑in types and immutable operators. For example:</p>
<ul>
<li><p><code>WHERE id = 42</code> is fine</p>
</li>
<li><p><code>WHERE lower(username) = 'hamdaan'</code> is fine because <code>lower()</code> is a built‑in, immutable function</p>
</li>
<li><p><code>WHERE created_at &gt;= now() - interval '7 days'</code> is not shippable because <code>now()</code> is not immutable (it is marked <code>STABLE</code>, and <code>postgres_fdw</code> only ships immutable functions)</p>
</li>
</ul>
<p>When such a predicate cannot be pushed, the FDW will fetch every row that matches all the shippable predicates and apply the rest locally. That means that a seemingly innocuous call to <code>now()</code> can blow up your network traffic.</p>
<p>The lesson is simple: compute non-immutable values such as the current time up front (in your application or in a CTE) and reference them as constants in the query against the foreign table.</p>
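<p>The sketch below illustrates the idea with a hypothetical foreign table <code>remote_events</code>. The literal timestamp stands in for a value computed by the application, and whether the scalar subquery form is sent as a parameter should be confirmed in the Remote SQL on your version:</p>
<pre><code class="lang-pgsql">-- Not shipped: the date filter is applied locally after the rows are fetched
EXPLAIN (VERBOSE)
SELECT id FROM remote_events
WHERE created_at &gt;= now() - interval '7 days';

-- Option 1: pass the cutoff as a literal (or bind parameter) computed by the application
EXPLAIN (VERBOSE)
SELECT id FROM remote_events
WHERE created_at &gt;= TIMESTAMPTZ '2026-05-25 00:00:00+00';

-- Option 2: wrap the expression in a scalar subquery, which is typically
-- evaluated once locally and sent to the remote server as a parameter
EXPLAIN (VERBOSE)
SELECT id FROM remote_events
WHERE created_at &gt;= (SELECT now() - interval '7 days');
</code></pre>
<p>In each case, look for the <code>created_at</code> condition inside the Remote SQL line. If it only appears as a local <code>Filter</code>, the predicate was not pushed down.</p>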
<h4 id="heading-complex-expressions-are-not-automatically-unsafe">Complex expressions are not automatically unsafe</h4>
<p>Suppose you have <code>WHERE (status = 'active' AND (age BETWEEN 18 AND 29 OR age &gt; 65))</code>. This entire expression is shippable because it uses built‑in boolean logic, simple comparisons, and immutable operators. The FDW will deparse it into remote SQL and forward it. You only need to worry when one of the subexpressions introduces a function or operator that the FDW doesn’t recognize or cannot safely assume exists on the remote.</p>
<p>A good heuristic is: if you can express your filter using only simple comparisons, boolean logic, and built‑in functions, pushdown should work. When in doubt, check the Remote SQL.</p>
<h4 id="heading-array-and-json-operators">Array and JSON operators</h4>
<p>Modern Postgres makes heavy use of array and JSON functions. Many of these, like the array overlap operator <code>&amp;&amp;</code> used in the case study, are built‑in and can be shipped. But some JSON functions and operators are provided by extensions rather than by core PostgreSQL.</p>
<p>If your filter uses one of these, ensure that the extension is available and allow‑listed on the foreign server. Otherwise, the FDW will fetch rows and perform the JSON logic locally. This is rarely what you want when dealing with large JSON columns.</p>
<h3 id="heading-joins-the-good-the-bad-and-the-ugly">Joins: the good, the bad, and the ugly</h3>
<h4 id="heading-sameserver-joins-are-your-friend">Same‑server joins are your friend</h4>
<p>If you join multiple foreign tables that are all defined on the same foreign server and user mapping, and if the join condition uses only shippable expressions, then the FDW can generate a single remote join. This is the ideal case.</p>
<p>For example, joining orders and customers on <code>orders.customer_id = customers.id</code> is pushable, as long as both tables reside on the same foreign server. The remote planner will use its own statistics and indexes to plan the join, and the local server will simply iterate through the result. Postgres 9.6 and later support this pattern <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=When%20,clauses">[6]</a>.</p>
<h4 id="heading-crossserver-joins-break-pushdown">Cross‑server joins break pushdown</h4>
<p>If you attempt to join two foreign tables that live on different servers (or even on the same remote server but with different user mappings), <code>postgres_fdw</code> will fetch the tables separately and join them locally. This is almost always slower than pushing the join down, because you end up transferring every row that survives each table's own filters, which can mean both tables in their entirety.</p>
<p>The FDW design team chose not to support cross‑server joins because there is no portable way to tell two remote servers to cooperate on a join. Your options are: replicate one table on the other server, materialize the smaller table locally before joining, or restructure the query to filter aggressively on each side before joining locally.</p>
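<p>The "materialize the smaller table locally" option can be sketched like this. The foreign tables <code>server_a_customers</code> and <code>server_b_orders</code>, their columns, and the filter values are all hypothetical:</p>
<pre><code class="lang-pgsql">-- Pull the smaller, filtered side across once and keep it in a local temp table
CREATE TEMP TABLE tmp_customers AS
SELECT id, name
FROM server_a_customers
WHERE region = 'EU';

-- Give the local planner an index and statistics to work with
CREATE INDEX ON tmp_customers (id);
ANALYZE tmp_customers;

-- The join still runs locally, but the filter on the orders side
-- can be pushed down to its own server
SELECT c.name, o.total
FROM tmp_customers c
JOIN server_b_orders o ON o.customer_id = c.id
WHERE o.created_at &gt;= DATE '2026-01-01';
</code></pre>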
<h4 id="heading-mixed-localforeign-joins-are-tricky">Mixed local/foreign joins are tricky</h4>
<p>Joining a local table to a foreign table cannot be pushed down, for straightforward reasons: the remote server has no access to your local data. A common pattern that triggers repeated remote calls looks like this:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> u.id, a.<span class="hljs-keyword">value</span>
<span class="hljs-keyword">FROM</span> users u
<span class="hljs-keyword">LEFT JOIN</span> user_attribute a
  <span class="hljs-keyword">ON</span> a.user_id = u.id <span class="hljs-keyword">AND</span> a.name = <span class="hljs-string">'favorite_color'</span>;
</code></pre>
<p>If <code>users</code> is a local table and <code>user_attribute</code> is foreign, the plan may use a nested loop: for each local u, it executes a remote lookup in user_attribute to retrieve attributes.</p>
<p>The fix is to flip the query: retrieve all relevant rows from <code>user_attribute</code> in one remote scan, then join them locally. Or, if possible, create a small temporary table on the remote side with your u.id values, perform the join entirely remotely, and then fetch the results.</p>
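<p>One way to express the "flip" is a materialized CTE, which asks the planner to run the foreign scan once instead of once per local row. This sketch reuses the <code>users</code> and <code>user_attribute</code> tables from the example above and assumes PostgreSQL 12 or later for the <code>MATERIALIZED</code> keyword:</p>
<pre><code class="lang-pgsql">-- users is local, user_attribute is foreign
WITH attrs AS MATERIALIZED (
  -- One remote scan: only the name filter is sent to the remote server
  SELECT user_id, value
  FROM user_attribute
  WHERE name = 'favorite_color'
)
SELECT u.id, a.value
FROM users u
LEFT JOIN attrs a ON a.user_id = u.id;
</code></pre>
<p>This trades repeated round trips for one larger transfer, so it helps most when the attribute filter is selective. Check that the Foreign Scan now reports <code>loops=1</code>.</p>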
<h4 id="heading-join-conditions-matter">Join conditions matter</h4>
<p>Even when joining two foreign tables on the same server, an unshippable join condition will force the join to be local. For example, a join condition that calls a user-defined function, or a function from an extension that isn't allow‑listed (such as <code>similarity()</code> from <code>pg_trgm</code>), is not pushable because the FDW can't assume the function exists or behaves identically on the remote. Built‑in operators such as <code>ILIKE</code> are usually shippable, but collation differences can still keep a text comparison local, so confirm with the Remote SQL.</p>
<p>If a case‑insensitive comparison stays local, consider lowercasing instead: <code>LOWER(textcol) = 'foo'</code> uses a built‑in, immutable function. Similarly, joining on a cast expression (for example, <code>JOIN ON CAST(a.id AS text) = b.text_id</code>) can block pushdown. Define your columns with matching types instead.</p>
<h3 id="heading-aggregates-and-grouping">Aggregates and grouping</h3>
<p>Aggregates are where the data movement story shines. When you can push down a <code>GROUP BY</code> and aggregate functions like <code>COUNT</code>, <code>SUM</code>, <code>AVG</code>, or <code>MAX</code>, you reduce the result set to just the aggregated rows. This can be a difference of several orders of magnitude.</p>
<p>Postgres 10 introduced aggregate pushdown <a target="_blank" href="https://www.postgresql.org/docs/release/10.0/#:~:text=,Jeevan%20Chalke%2C%20Ashutosh%20Bapat">[7]</a>. But not all aggregates are equal:</p>
<p><strong>Simple aggregates</strong> such as <code>COUNT(*)</code>, <code>SUM(col)</code>, <code>AVG(col)</code>, <code>MIN(col)</code>, and <code>MAX(col)</code> are shippable when applied to shippable expressions. Even <code>COUNT(DISTINCT col)</code> is often shippable, because the remote can deduplicate before counting. The FDW will wrap the aggregate in a remote query and return just the aggregated row.</p>
<p>If you see a GroupAggregate node on the local side, check whether all involved columns and functions are shippable. If they are, ensure that the join conditions above are also pushable.</p>
<p><strong>Filtered aggregates</strong> such as <code>COUNT(*) FILTER (WHERE x &gt; 5)</code> or <code>SUM(col) FILTER (WHERE status = 'active')</code> are often pushable. They are semantically equivalent to a conditional aggregate such as <code>SUM(CASE WHEN condition THEN col END)</code>, which the remote server can evaluate on its own. As long as the filter is shippable, the FDW will push it into the remote aggregate.</p>
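<p>One way to verify this is to inspect the plan. The sketch below assumes a hypothetical foreign table <code>orders_remote</code> with <code>status</code> and <code>amount</code> columns. Neither name comes from the examples above.</p>
<pre><code class="lang-pgsql">-- orders_remote, status, and amount are illustrative names.
EXPLAIN (VERBOSE)
SELECT count(*) FILTER (WHERE status = 'active')    AS active_orders,
       sum(amount) FILTER (WHERE status = 'active') AS active_amount
FROM orders_remote;
-- Pushed down: a single Foreign Scan whose Remote SQL contains the aggregates.
-- Not pushed down: a local Aggregate node sits above the Foreign Scan.
</code></pre>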
<p><strong>User‑defined aggregates</strong> are rarely pushable. If you have a custom aggregate function, the FDW will not assume that it exists or behaves the same on the remote server. Even if you install the function on both servers, postgres_fdw won't push it unless the function is in an allow‑listed extension.</p>
<p><strong>Grouping sets and rollups</strong> are not currently pushable. When you write <code>GROUP BY GROUPING SETS (...)</code> or <code>ROLLUP(...)</code>, Postgres will compute the grouping locally even if the underlying scan is remote.</p>
<p>If you need complex rollups, consider performing them in two steps: push down the initial grouping to the remote server to reduce rows, then perform the rollup locally.</p>
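<p>The two‑step approach might look like the following sketch. It assumes a hypothetical foreign table <code>sales_remote</code> with <code>region</code>, <code>product</code>, and <code>amount</code> columns, and it relies on <code>SUM</code> being safe to re‑aggregate (a sum of sums). An average would need separate sum and count columns instead.</p>
<pre><code class="lang-pgsql">-- sales_remote and its columns are illustrative names.
-- Step 1: a plain GROUP BY that the FDW can ship, shrinking the row count.
-- MATERIALIZED (Postgres 12+) keeps the step as its own scan.
WITH pre_agg AS MATERIALIZED (
  SELECT region, product, sum(amount) AS total
  FROM sales_remote
  GROUP BY region, product
)
-- Step 2: the rollup runs locally over the small pre-aggregated set.
SELECT region, product, sum(total) AS total
FROM pre_agg
GROUP BY ROLLUP (region, product);
</code></pre>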
<h3 id="heading-order-by-limit-and-distinct">ORDER BY, LIMIT, and DISTINCT</h3>
<p>Ordering and limiting rows may seem like purely cosmetic features, but they affect how much data is transferred. If the remote can sort and limit, the local server only receives the top N rows. If it cannot, the local server must sort everything.</p>
<p>Postgres 12 expanded the cases where <code>ORDER BY</code> and LIMIT are pushed down <a target="_blank" href="https://www.postgresql.org/docs/release/12.0/#:~:text=,Etsuro%20Fujita%29%20%C2%A7%20%C2%A7">[8]</a>. Here are guidelines:</p>
<ul>
<li><p><strong>Single foreign scan with simple sort:</strong> If your query selects from one foreign table and sorts by a shippable expression (for example, <code>ORDER BY created_at DESC</code>), the FDW will include <code>ORDER BY</code> in Remote SQL. It will also push down <code>LIMIT</code> and <code>OFFSET</code>. This is ideal because the remote server does the sort and sends only the top rows.</p>
</li>
<li><p><strong>Sort after join:</strong> If you sort after joining two foreign tables on the same server, and the join and sort expressions are shippable, the FDW may push both down. But if the sort requires columns from the local side or from a different remote server, the FDW cannot push it down.</p>
</li>
<li><p><strong>Sort after aggregation:</strong> Sorting aggregated results is often pushable as long as the aggregate itself is pushable. But when grouping occurs locally, the sort remains local.</p>
</li>
<li><p><strong>DISTINCT resembles GROUP BY.</strong> Plain <code>SELECT DISTINCT a, b</code> is logically equivalent to grouping by the selected columns, but don't assume the FDW ships it: check the Remote SQL. If the deduplication stays local, rewriting the query with an explicit <code>GROUP BY</code> on shippable expressions lets it use the aggregate pushdown path. <code>DISTINCT ON</code> has different semantics (it keeps the first row of each group according to <code>ORDER BY</code>), is not equivalent to <code>GROUP BY</code>, and is typically evaluated locally.</p>
</li>
</ul>
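<p>To see whether a sort and limit are shipped, check the Remote SQL of the Foreign Scan node. A minimal sketch, assuming a hypothetical foreign table <code>events_remote</code> with <code>id</code> and <code>created_at</code> columns:</p>
<pre><code class="lang-pgsql">-- events_remote is an illustrative name.
EXPLAIN (VERBOSE)
SELECT id, created_at
FROM events_remote
ORDER BY created_at DESC
LIMIT 20;
-- When both are pushed down, the Remote SQL line ends with
-- ORDER BY ... and LIMIT ..., and no local Sort or Limit node is needed.
</code></pre>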
<h3 id="heading-window-functions-1">Window functions</h3>
<p>Window functions (for example, <code>ROW_NUMBER() OVER (PARTITION BY ...), RANK(), LAG(), LEAD()</code>) rely on ordering and partitioning across rows.</p>
<p>Postgres has not yet taught <code>postgres_fdw</code> how to push window functions. When you see a WindowAgg node in your plan, it’s almost always local. The FDW will fetch the rows, and the local server will sort, partition, and compute the window. If you need to run window functions on remote data, plan to transfer the data locally.</p>
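<p>Because the window itself runs locally, the practical lever is to shrink what gets transferred first. The sketch below assumes a hypothetical foreign table <code>orders_remote</code> with <code>user_id</code>, <code>created_at</code>, <code>amount</code>, and <code>status</code> columns. The shippable <code>WHERE</code> clause limits the rows fetched, and the ranking happens locally.</p>
<pre><code class="lang-pgsql">-- orders_remote and its columns are illustrative names.
SELECT user_id, created_at, amount,
       row_number() OVER (PARTITION BY user_id
                          ORDER BY created_at DESC) AS rn
FROM orders_remote
WHERE status = 'active';  -- shippable filter: only matching rows cross the network
</code></pre>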
<h3 id="heading-versionspecific-quirks">Version‑specific quirks</h3>
<p>The exact pushdown capabilities vary by release. When planning migrations or deciding whether to rely on a pushdown behavior, check the release notes:</p>
<ul>
<li><p><strong>9.6:</strong> first version to support pushdown of joins and sorts, and remote updates and deletes.</p>
</li>
<li><p><strong>10:</strong> introduced aggregate pushdown <a target="_blank" href="https://www.postgresql.org/docs/release/10.0/#:~:text=,Jeevan%20Chalke%2C%20Ashutosh%20Bapat">[7]</a>, significantly reducing network use for <code>GROUP BY</code> queries.</p>
</li>
<li><p><strong>11:</strong> improved partition pruning and join ordering for foreign tables.</p>
</li>
<li><p><strong>12:</strong> expanded <code>ORDER BY</code> and <code>LIMIT</code> pushdown <a target="_blank" href="https://www.postgresql.org/docs/release/12.0/#:~:text=,Etsuro%20Fujita%29%20%C2%A7%20%C2%A7">[8]</a>.</p>
</li>
<li><p><strong>15:</strong> added pushdown for simple <code>CASE</code> expressions and additional built‑in functions.</p>
</li>
<li><p><strong>17</strong> (development at the time of writing) continues to expand shippable constructs. Always test on your target version because subtle improvements can change what the FDW can ship.</p>
</li>
</ul>
<h2 id="heading-common-antipatterns-and-how-to-avoid-them">Common Anti‑Patterns and How to Avoid Them</h2>
<p>Everyone has run into FDW queries that seemed reasonable but turned out to be bottlenecks. Here are a few of the most common mistakes and how to correct them. These examples are deliberately simplified so that you can adapt them to your schema.</p>
<h3 id="heading-using-volatile-functions-in-predicates">Using volatile functions in predicates</h3>
<p><strong>Anti‑pattern:</strong></p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> audit_logs
<span class="hljs-keyword">WHERE</span> event_ts &gt;= now() - <span class="hljs-type">interval</span> <span class="hljs-string">'1 day'</span>;
</code></pre>
<p><code>now()</code> is a stable function, not an immutable one, so the FDW refuses to push this predicate. It pulls all rows from <code>audit_logs</code> and filters them locally.</p>
<p><strong>Better:</strong></p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> audit_logs
<span class="hljs-keyword">WHERE</span> event_ts &gt;= <span class="hljs-meta">$1</span>;
</code></pre>
<p>Compute <code>$1</code> (a timestamp) in your application or upstream query. Or compute it once in an uncorrelated scalar subquery, which the planner evaluates a single time before the scan starts:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> audit_logs
<span class="hljs-keyword">WHERE</span> event_ts &gt;= (<span class="hljs-keyword">SELECT</span> now() - <span class="hljs-type">interval</span> <span class="hljs-string">'1 day'</span>);
</code></pre>
<p>The FDW can then receive the cutoff as a parameter and push the predicate. Confirm this by checking that the Remote SQL contains the <code>event_ts</code> filter.</p>
<h3 id="heading-joining-local-and-foreign-data-first">Joining local and foreign data first</h3>
<p><strong>Anti‑pattern:</strong></p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> u.email, ua.<span class="hljs-keyword">value</span>
<span class="hljs-keyword">FROM</span> users u
<span class="hljs-keyword">LEFT JOIN</span> user_attribute ua <span class="hljs-keyword">ON</span> u.id = ua.user_id <span class="hljs-keyword">AND</span> ua.name = <span class="hljs-string">'favorite_movie'</span>;
</code></pre>
<p>This uses a local table (<code>users</code>) to drive a join to a foreign table (<code>user_attribute</code>). If the planner picks a parameterized nested loop and <code>users</code> has 10,000 rows, the FDW issues 10,000 individual remote queries. Each call fetches one or zero rows from <code>user_attribute</code>.</p>
<p><strong>Better:</strong></p>
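<pre><code class="lang-pgsql"><span class="hljs-comment">-- Fetch all favorite movies remotely and join locally.</span>
<span class="hljs-comment">-- MATERIALIZED (Postgres 12+) stops the planner from inlining the CTE back into the join.</span>
<span class="hljs-keyword">WITH</span> remote_movies <span class="hljs-keyword">AS</span> MATERIALIZED (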
  <span class="hljs-keyword">SELECT</span> ua.user_id, ua.<span class="hljs-keyword">value</span>
  <span class="hljs-keyword">FROM</span> user_attribute ua
  <span class="hljs-keyword">WHERE</span> ua.name = <span class="hljs-string">'favorite_movie'</span>
)
<span class="hljs-keyword">SELECT</span> u.email, rm.<span class="hljs-keyword">value</span>
<span class="hljs-keyword">FROM</span> users u
<span class="hljs-keyword">LEFT JOIN</span> remote_movies rm <span class="hljs-keyword">ON</span> u.id = rm.user_id;
</code></pre>
<p>Now the FDW issues one query to fetch all relevant attributes, and the join is done locally in one pass.</p>
<h3 id="heading-crossserver-joins-without-materialization">Cross‑server joins without materialization</h3>
<p><strong>Anti‑pattern:</strong></p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> remote_db1.orders o
<span class="hljs-keyword">JOIN</span> remote_db2.customers c <span class="hljs-keyword">ON</span> o.customer_id = c.id;
</code></pre>
<p>This is not pushable because the two tables are on different foreign servers. Postgres will fetch orders and customers separately and join them locally. If <code>orders</code> has 1 million rows and <code>customers</code> has 50,000 rows, you will transfer 1.05 million rows.</p>
<p><strong>Better:</strong> Replicate or materialize one side on the other server (or locally) before joining. For example, create a materialized view m_customers on remote_db1 containing just the id and name of the customers you need, then join orders and m_customers on the same server. Alternatively, copy customers into a temporary table on the local server and join there.</p>
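<p>The local temporary table variant could be sketched as follows. The name <code>tmp_customers</code> is illustrative, and the example assumes <code>customers</code> has a <code>name</code> column as described above.</p>
<pre><code class="lang-pgsql">-- Copy only the columns you need, using one remote scan of customers.
CREATE TEMP TABLE tmp_customers AS
SELECT id, name
FROM remote_db2.customers;

-- Give the planner real row counts for the join.
ANALYZE tmp_customers;

-- orders is now the only foreign table in the join.
SELECT o.*, c.name
FROM remote_db1.orders o
JOIN tmp_customers c ON o.customer_id = c.id;
</code></pre>
<p>This still transfers the orders you select, so pair it with a shippable <code>WHERE</code> clause on <code>orders</code> wherever the report allows one.</p>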
<h3 id="heading-complex-expressions-on-join-keys">Complex expressions on join keys</h3>
<p><strong>Anti‑pattern:</strong></p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> remote_table a
<span class="hljs-keyword">JOIN</span> remote_table b <span class="hljs-keyword">ON</span> CAST(a.key <span class="hljs-keyword">AS</span> <span class="hljs-type">text</span>) = b.key_text;
</code></pre>
<p>Casting a numeric key to text prevents pushdown. The remote server cannot use indexes and must return both tables. The local server performs the join and cast.</p>
<p><strong>Better:</strong> Align your schemas so that the join columns use the same type. If you cannot change the schema, create a computed column on the remote server with the appropriate type and use it in the join.</p>
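<p>If the remote schema can be extended, a stored generated column is one way to do this. The sketch assumes Postgres 12 or later on the remote server, an integer <code>key</code> column, and a hypothetical new column named <code>key_as_text</code>.</p>
<pre><code class="lang-pgsql">-- On the remote server: keep a text copy of the key (key_as_text is an illustrative name).
ALTER TABLE remote_table
  ADD COLUMN key_as_text text GENERATED ALWAYS AS (key::text) STORED;

-- On the local server: expose the new column on the foreign table.
ALTER FOREIGN TABLE remote_table ADD COLUMN key_as_text text;

-- The join condition now compares text to text, with no cast.
SELECT *
FROM remote_table a
JOIN remote_table b ON a.key_as_text = b.key_text;
</code></pre>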
<h3 id="heading-ignoring-collation-and-type-mismatches">Ignoring collation and type mismatches</h3>
<p><strong>Anti‑pattern:</strong></p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> remote_table
<span class="hljs-keyword">WHERE</span> citext_col = <span class="hljs-string">'abc'</span>;
</code></pre>
<p>If the remote server doesn’t have the citext extension installed, the comparison semantics will differ, and the FDW will refuse to ship the filter. This appears harmless until you see the plan and realize all rows were fetched.</p>
<p><strong>Better:</strong> Install the same extensions and collations on the remote server, or convert the column to a base type like text on both sides.</p>
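<p>Once the extension is installed on both sides, you also have to tell <code>postgres_fdw</code> about it through the <code>extensions</code> server option. A minimal sketch, assuming a foreign server named <code>remote_srv</code> (a hypothetical name):</p>
<pre><code class="lang-pgsql">-- Run on both the local and the remote database.
CREATE EXTENSION IF NOT EXISTS citext;

-- Run on the local database: declare citext as safe to ship.
-- Use SET instead of ADD if the option is already defined on the server.
ALTER SERVER remote_srv OPTIONS (ADD extensions 'citext');
</code></pre>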
<h2 id="heading-extending-tuning-calibrating-cost-models">Extending Tuning: Calibrating Cost Models</h2>
<p>Earlier, we discussed <code>fetch_size</code>, <code>use_remote_estimate</code>, and the cost knobs. This section expands on how to use them strategically.</p>
<h3 id="heading-balancing-fetch-size-and-memory">Balancing fetch size and memory</h3>
<p><code>fetch_size</code> controls how many rows the FDW asks for in each round trip <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[9]</a>. Think of it as the batch size. The default (100) works well for small result sets. If you expect to retrieve tens of thousands of rows, a higher fetch size reduces the overhead of many network requests. But there are trade‑offs:</p>
<ul>
<li><p><strong>Memory consumption:</strong> Each foreign scan buffers rows until they are consumed. A huge fetch size (for example, 10,000) may allocate more memory than you expect, especially when multiple scans run concurrently. Monitor memory usage as you increase this setting.</p>
</li>
<li><p><strong>Latency hiding:</strong> If network latency is high, overlapping network requests with local processing can hide some latency. But <code>postgres_fdw</code> does not pipeline multiple fetches – it waits for one batch before requesting the next. This means that a larger batch size reduces the number of waits, but cannot overlap them. If you operate across data centers, consider using a connection pooler or caching layer instead of just increasing fetch_size.</p>
</li>
</ul>
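<p><code>fetch_size</code> can be set per server and overridden per foreign table. The values below are illustrative rather than recommendations, and <code>remote_srv</code> is a hypothetical server name.</p>
<pre><code class="lang-pgsql">-- Server-wide setting for all foreign tables on remote_srv.
-- Use SET instead of ADD if the option already exists.
ALTER SERVER remote_srv OPTIONS (ADD fetch_size '1000');

-- Larger batches only for a table that is usually read in bulk.
ALTER FOREIGN TABLE audit_logs OPTIONS (ADD fetch_size '5000');
</code></pre>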
<h3 id="heading-remote-estimates-vs-local-estimates">Remote estimates vs. local estimates</h3>
<p>The planner uses statistics to estimate how many rows each node will produce, which in turn influences join order. When <code>use_remote_estimate</code> is false (the default), the planner guesses based on local stats collected by <code>ANALYZE</code> on the foreign table. This can be wrong if the remote table has a different distribution than the local sample, or if the table has changed since the last <code>ANALYZE</code>.</p>
<p>Setting <code>use_remote_estimate</code> to true instructs the FDW to run <code>EXPLAIN</code> on the remote server during planning to obtain row counts and cost estimates <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>. This can improve join ordering, especially when joining multiple foreign tables or mixing local and foreign tables. The downside is increased planning time because each remote estimate runs an extra query.</p>
<p>In practice:</p>
<ul>
<li><p>Enable <code>use_remote_estimate</code> on queries with complex joins where the planner picks obviously wrong join orders. If enabling it improves the plan, consider leaving it on for that server or table.</p>
</li>
<li><p>Use <code>ANALYZE</code> on foreign tables periodically if your remote data is relatively static. This populates local stats and can avoid the overhead of remote estimates.</p>
</li>
<li><p>Don’t enable <code>use_remote_estimate</code> indiscriminately on simple lookups. The cost of the additional remote round trips during planning may outweigh the benefit.</p>
</li>
</ul>
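<p>Both approaches are a single statement each. The sketch assumes a hypothetical foreign table named <code>orders_remote</code>.</p>
<pre><code class="lang-pgsql">-- Option 1: ask the remote server for estimates at planning time.
-- Use SET instead of ADD if the option already exists on the table.
ALTER FOREIGN TABLE orders_remote OPTIONS (ADD use_remote_estimate 'true');

-- Option 2: for mostly static data, sample the remote table into local statistics.
ANALYZE orders_remote;
</code></pre>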
<h3 id="heading-tuning-cost-parameters">Tuning cost parameters</h3>
<p><code>fdw_startup_cost</code> and <code>fdw_tuple_cost</code> control how much the planner thinks it costs to start a foreign scan and fetch each row <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a>. If these are too low, the planner may choose a nested loop that generates many small remote calls. If they are too high, the planner might avoid remote scans even when they are efficient.</p>
<p>You can adjust these parameters based on empirical measurement:</p>
<ul>
<li><p>Increase <code>fdw_startup_cost</code> to discourage the planner from using nested loops that call the remote table repeatedly. You might base it on the measured cost of a remote round trip.</p>
</li>
<li><p>Increase <code>fdw_tuple_cost</code> if network bandwidth is limited or expensive. This indicates to the planner that each remote row incurs higher fetch costs than a local row. The planner will prefer plans that filter early on the remote side.</p>
</li>
</ul>
<p>Always adjust these settings gradually and observe the effect on the plan. Keep separate settings per foreign server if network conditions differ.</p>
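<p>Both cost options are set on the foreign server. The numbers below are placeholders to show the syntax, not tuned values, and <code>remote_srv</code> is a hypothetical server name.</p>
<pre><code class="lang-pgsql">-- Use SET instead of ADD for options that are already defined.
ALTER SERVER remote_srv
  OPTIONS (ADD fdw_startup_cost '500', ADD fdw_tuple_cost '0.05');

-- Re-check the plan after each change.
EXPLAIN (VERBOSE)
SELECT u.email, ua.value
FROM users u
LEFT JOIN user_attribute ua
  ON u.id = ua.user_id AND ua.name = 'favorite_movie';
</code></pre>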
<h3 id="heading-when-to-analyze-foreign-tables">When to analyze foreign tables</h3>
<p>Running <code>ANALYZE</code> on a foreign table collects sample statistics by pulling a subset of rows from the remote server. This helps the planner estimate row counts when <code>use_remote_estimate</code> is off. It does not affect index selection on the remote side, because the remote planner makes that decision from its own statistics. You should analyze foreign tables when:</p>
<ul>
<li><p>The remote table is large and static, and you want accurate local estimates without the overhead of remote estimates.</p>
</li>
<li><p>You have just defined a foreign table, and the default stats are empty.</p>
</li>
<li><p>You changed the extensions allow‑list to enable more pushdown and want the planner to see the effect.</p>
</li>
</ul>
<p>Conversely, if the remote data changes constantly, <code>ANALYZE</code> results will quickly become stale. In that case, rely on use_remote_estimate instead.</p>
<h2 id="heading-further-case-studies-and-practical-examples">Further Case Studies and Practical Examples</h2>
<p>The Keycloak coverage example is not the only place where pushdown matters. The following scenarios illustrate other patterns you may encounter.</p>
<h3 id="heading-reporting-on-a-sharded-logging-system">Reporting on a sharded logging system</h3>
<p>Imagine you store application logs across multiple shards, each a separate Postgres database. You want to produce a report of the number of error logs per service per day.</p>
<p>A naïve approach might union the raw rows from all shards and aggregate them in one query:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> service, date_trunc(<span class="hljs-string">'day'</span>, log_time) <span class="hljs-keyword">AS</span> day, COUNT(*)
<span class="hljs-keyword">FROM</span> (
  <span class="hljs-keyword">SELECT</span> service, log_time <span class="hljs-keyword">FROM</span> shard1.logs
  <span class="hljs-keyword">UNION</span> <span class="hljs-keyword">ALL</span>
  <span class="hljs-keyword">SELECT</span> service, log_time <span class="hljs-keyword">FROM</span> shard2.logs
  ...
) all_logs
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> service, day;
</code></pre>
<p>This approach will fetch all log rows to the local server and aggregate them locally. A better solution is to push the grouping to each shard:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> service, day, sum(count)
<span class="hljs-keyword">FROM</span> (
  <span class="hljs-keyword">SELECT</span> service, date_trunc(<span class="hljs-string">'day'</span>, log_time) <span class="hljs-keyword">AS</span> day, COUNT(*) <span class="hljs-keyword">AS</span> count
  <span class="hljs-keyword">FROM</span> shard1.logs
  <span class="hljs-keyword">WHERE</span> log_time &gt;= <span class="hljs-meta">$1</span> <span class="hljs-keyword">AND</span> log_time &lt; <span class="hljs-meta">$2</span>
  <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> service, day
  <span class="hljs-keyword">UNION</span> <span class="hljs-keyword">ALL</span>
  <span class="hljs-keyword">SELECT</span> service, date_trunc(<span class="hljs-string">'day'</span>, log_time) <span class="hljs-keyword">AS</span> day, COUNT(*) <span class="hljs-keyword">AS</span> count
  <span class="hljs-keyword">FROM</span> shard2.logs
  <span class="hljs-keyword">WHERE</span> log_time &gt;= <span class="hljs-meta">$1</span> <span class="hljs-keyword">AND</span> log_time &lt; <span class="hljs-meta">$2</span>
  <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> service, day
  ...
) x
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> service, day;
</code></pre>
<p>Here, each foreign server returns a small set of aggregated rows instead of raw logs. The outer aggregation sums across shards. This pattern generalizes: push grouping and filtering to the remote side, then combine locally.</p>
<h3 id="heading-combining-remote-and-local-data-for-analytics">Combining remote and local data for analytics</h3>
<p>Suppose you have a local table <code>users</code> and a remote table <code>orders</code>. You want to compute the average order amount per user segment. A naïve query might look like:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> u.segment, AVG(o.amount)
<span class="hljs-keyword">FROM</span> users u
<span class="hljs-keyword">JOIN</span> orders o <span class="hljs-keyword">ON</span> o.user_id = u.id
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> u.segment;
</code></pre>
<p>This is a local join driving a remote nested loop. The better approach is to aggregate orders remotely by user_id and join on the small result:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">WITH</span> remote_totals <span class="hljs-keyword">AS</span> (
  <span class="hljs-keyword">SELECT</span> user_id, SUM(amount) <span class="hljs-keyword">AS</span> total, COUNT(*) <span class="hljs-keyword">AS</span> n
  <span class="hljs-keyword">FROM</span> orders
  <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> user_id
)
<span class="hljs-keyword">SELECT</span> u.segment, SUM(rt.total) / SUM(rt.n) <span class="hljs-keyword">AS</span> avg_amount
<span class="hljs-keyword">FROM</span> users u
<span class="hljs-keyword">JOIN</span> remote_totals rt <span class="hljs-keyword">ON</span> u.id = rt.user_id
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> u.segment;
</code></pre>
<p>This pushes the heavy aggregation to the remote and transfers only one row per user. The local join then groups by segment. As with other examples, the key is to reduce remote rows before they cross the network.</p>
<h3 id="heading-avoiding-pushdown-for-correctness">Avoiding pushdown for correctness</h3>
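<p>There are legitimate cases where you should <em>prevent</em> pushdown because of semantic differences. Postgres allows you to do this by adding <code>OFFSET 0</code> to a subquery or by wrapping the foreign table in a materialized CTE (on Postgres 12 and later, a plain CTE may be inlined, so use <code>AS MATERIALIZED</code>).</p>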
<p>For example, if a built‑in function behaves differently on the remote due to a version mismatch, you can force local evaluation:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">WITH</span> local_eval <span class="hljs-keyword">AS</span> MATERIALIZED (<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> remote_table)  <span class="hljs-comment">-- materialized CTE prevents pushdown (Postgres 12+)</span>
<span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> local_eval
<span class="hljs-keyword">WHERE</span> some_complex_expression(local_eval.col) &gt; <span class="hljs-number">0</span>;
</code></pre>
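<p>Alternatively, a <code>WHERE</code> clause like <code>random() &lt; 0.1</code> will not push down because <code>random()</code> is volatile, so you don't need to force it. For other predicates, placing <code>OFFSET 0</code> inside a subquery is a simple hack. It acts as an optimization fence, so the outer predicate is not pushed into the remote scan:</p>
<pre><code class="lang-pgsql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> (<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> remote_table <span class="hljs-keyword">OFFSET</span> <span class="hljs-number">0</span>) t
<span class="hljs-keyword">WHERE</span> some_complex_expression(t.col) &gt; <span class="hljs-number">0</span>;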
</code></pre>
<p>Knowing how to disable pushdown intentionally helps you debug. If a query returns different results when pushdown occurs, suspect type/collation mismatches or remote session settings <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=In%20the%20remote%20sessions%20opened,their%20expected%20search%20path%20environment">[4]</a>.</p>
<h2 id="heading-monitoring-diagnostics-and-regression-testing">Monitoring, Diagnostics, and Regression Testing</h2>
<p>Monitoring doesn't end at counting remote rows. To make pushdown reliable in production, you need to set up mechanisms to detect regressions and gather evidence when performance changes.</p>
<h3 id="heading-automate-explain-regression-tests">Automate EXPLAIN regression tests</h3>
<p>In addition to unit tests and integration tests, you can add tests that assert the shape of your plans. For instance, if a mission‑critical report must always push down a <code>WHERE</code> clause, you can write a test that runs <code>EXPLAIN (VERBOSE)</code> and checks that the Remote SQL contains the filter. You might even run <code>EXPLAIN (ANALYZE, VERBOSE)</code> against a test dataset, parse the <code>loops</code> value of the Foreign Scan node, and assert that it is 1. When a developer inadvertently adds a non‑immutable function or changes a join, the test will fail. This is akin to snapshot testing for SQL.</p>
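<p>Such a check can live in plain SQL. The following PL/pgSQL sketch reuses the <code>audit_logs</code> example and assumes <code>event_ts</code> is a <code>timestamptz</code> column. The string match is deliberately crude, and a real test suite would probably walk the JSON plan instead.</p>
<pre><code class="lang-pgsql">DO $$
DECLARE
  plan_json json;
BEGIN
  -- EXPLAIN with FORMAT JSON returns the whole plan as one json value.
  EXECUTE $q$
    EXPLAIN (VERBOSE, FORMAT JSON)
    SELECT * FROM audit_logs
    WHERE event_ts &gt;= TIMESTAMPTZ '2024-01-01 00:00:00+00'
  $q$ INTO plan_json;

  -- Fail loudly if the filter is missing from the Remote SQL.
  IF plan_json::text NOT LIKE '%Remote SQL%WHERE%event_ts%' THEN
    RAISE EXCEPTION 'event_ts filter was not pushed down: %', plan_json;
  END IF;
END
$$;
</code></pre>
<p>Run it from your migration or CI pipeline against a database that has the foreign table defined. An exception fails the step.</p>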
<h3 id="heading-monitor-pgstatstatements-across-servers">Monitor pg_stat_statements across servers</h3>
<p>Enable <code>pg_stat_statements</code> on both the local and remote servers. On the local side, track the total time, planning time, and rows for each FDW query. On the remote side, track which queries are being executed.</p>
<p>Look for outliers: a query whose remote calls spike or whose average remote rows jump from hundreds to thousands. Those are early signs of pushdown failure.</p>
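<p>A starting point for that review on the remote server is sketched below. It assumes Postgres 13 or later, where the timing column is named <code>total_exec_time</code> (older releases call it <code>total_time</code>). Because <code>postgres_fdw</code> reads scan results through cursors, its traffic tends to show up as <code>DECLARE ... CURSOR</code> and <code>FETCH</code> statements.</p>
<pre><code class="lang-pgsql">-- Run on the remote server, with pg_stat_statements installed there.
SELECT query,
       calls,
       rows,
       total_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
</code></pre>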
<h3 id="heading-log-remote-sql-with-autoexplain">Log remote SQL with auto_explain</h3>
<p>Setting <code>auto_explain.log_min_duration</code> (for example, to 500ms) causes Postgres to automatically log slow queries with their plans. Combine this with <code>auto_explain.log_verbose = true</code> and <code>auto_explain.log_nested_statements = true</code> to capture remote SQL as well. When a federated query slows down, the log will show you exactly what remote SQL was executed, and with <code>auto_explain.log_analyze</code> enabled it also records how often each node ran. This is invaluable in production, where you cannot always run <code>EXPLAIN</code> interactively.</p>
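<p>For a quick trial in a single session, the settings can be applied as shown below. The duration threshold parameter is <code>auto_explain.log_min_duration</code>. This sketch assumes you have the privileges needed to <code>LOAD</code> the module. In production you would more likely preload it and set the same parameters in <code>postgresql.conf</code>.</p>
<pre><code class="lang-pgsql">-- Session-level setup for experimentation.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';
SET auto_explain.log_verbose = on;            -- includes Remote SQL in logged plans
SET auto_explain.log_nested_statements = on;  -- also logs statements run inside functions
</code></pre>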
<h3 id="heading-use-connection-pooling-and-prepare-statements">Use connection pooling and prepared statements</h3>
<p><code>postgres_fdw</code> caches one connection per user mapping within each local session and reuses it across queries in that session. This is a per‑session cache rather than a shared pool, so you can also use connection pooling at the network level (for example, PgBouncer or pgcat).</p>
<p>Keeping connections warm reduces the startup cost, as captured by <code>fdw_startup_cost</code>. Meanwhile, preparing statements on the remote server (via <code>PREPARE</code> and <code>EXECUTE</code>) can save parse time when the same remote SQL is executed frequently. <code>postgres_fdw</code> itself uses remote prepared statements for row‑by‑row <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> operations that it cannot push down as a single statement, while foreign scans run through remote cursors.</p>
<h3 id="heading-regression-testing-after-version-upgrades">Regression testing after version upgrades</h3>
<p>Every major Postgres release brings improvements to postgres_fdw pushdown semantics. But new releases also change planner heuristics and remote SQL generation. After an upgrade, rerun your key queries with EXPLAIN (VERBOSE), compare the Remote SQL, and benchmark them.</p>
<p>In some cases, a release may push down something previously local, revealing a latent type mismatch or a function difference. In other cases, pushdown may be withheld due to a new rule. Don’t assume that an upgrade automatically improves performance – test it.</p>
<h2 id="heading-extended-guidelines-for-advanced-dbas">Extended Guidelines for Advanced DBAs</h2>
<p>To close this handbook, here are consolidated guidelines distilled from the previous sections. They go beyond simple bullet points to capture nuances. Keep them handy for reference or print them out for your team.</p>
<ol>
<li><p><strong>Respect the FDW safety model.</strong> Immutable functions and built‑in operators are your friends. Anything outside that scope must be explicitly allowed or evaluated locally. Understand which items belong to each category and plan accordingly.</p>
</li>
<li><p><strong>Always read the Remote SQL.</strong> Don’t trust your intuition about what is being pushed down. The Remote SQL string is the only source of truth. It indicates whether a predicate, join, sort, or limit operation is occurring remotely. It also shows parameter placeholders (for example, $1) that correspond to values passed from the local plan.</p>
</li>
<li><p><strong>Reduce before you fetch.</strong> The network is the highest cost. If the remote can reduce rows through filtering, grouping, or limiting, let it. If it cannot, structure your query to enable it. Avoid queries that require pulling large raw tables and processing them locally.</p>
</li>
<li><p><strong>Beware of join order.</strong> The planner sometimes chooses a nested loop with a foreign table as the inner side, resulting in repeated remote calls. Examine loops: if you see a high number, consider rewriting the query or adjusting cost parameters.</p>
</li>
<li><p><strong>Use CTEs strategically.</strong> A CTE can isolate remote scans and let you control whether they are materialized once or inlined. Use <code>MATERIALIZED</code> to avoid repeated remote scans when a CTE is referenced multiple times. Use <code>NOT MATERIALIZED</code> to allow optimizations across CTE boundaries.</p>
</li>
<li><p><strong>Instrument, monitor, iterate.</strong> Good FDW performance is not a one‑off fix. Monitor queries and plans. Use tests to catch regressions. Adjust tuning knobs and indexes as your data or workload changes. Document your reasoning so others can understand why a particular plan is expected.</p>
</li>
<li><p><strong>Educate your team.</strong> Federated queries invite subtle bugs and performance traps. Share the high‑level rules – immutable functions only, cross‑server joins are local, always check remote SQL – so engineers write safer queries by default. A 30‑minute training can save hours of debugging later.</p>
</li>
</ol>
<h2 id="heading-bringing-it-all-together">Bringing it All Together</h2>
<p>This handbook has covered a lot of ground: from the high‑level principle that pushdown is about data movement, to the nitty‑gritty of join conditions and tuning knobs, to troubleshooting steps and case studies. It is intentionally opinionated and personal: these are the patterns and pitfalls encountered in real systems, not abstract guidelines. By sharing specific examples, I hoped to make the rules memorable and show how they interplay with actual workloads.</p>
<p>The goal is not just to tell you what to do, but to show you how to think and problem solve: review the plan, trace data movement, and determine whether the query is doing the heavy work in the right place.</p>
<p>That thinking process, practiced enough times, becomes second nature. When you write a new query, you'll automatically consider whether your predicates are immutable, whether the join can be shipped, and whether you are about to trigger an N+1 pattern. When you review plans, you'll start from the Foreign Scan nodes and remote SQL, not the top‑level node. When you tune, you'll know which knobs to twist and in which order.</p>
<p>Keep experimenting. Use the examples here as starting points. Try different structures in a test environment and measure the difference. The more you play with pushdown, the more comfortable you'll become with its constraints and superpowers.</p>
<p>If this handbook helps you avoid one performance incident or saves you from shipping a broken query, it has done its job. Enjoy exploring the federated world of Postgres.</p>
<h2 id="heading-references">References</h2>
<p><a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=It%20is%20generally%20recommended%20that,differently%20from%20the%20local%20server">[1]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[2]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=This%20option%2C%20which%20can%20be,false">[3]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=In%20the%20remote%20sessions%20opened,their%20expected%20search%20path%20environment">[4]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=functions%20in%20such%20clauses%20must,to%20reduce%20the%20risk%20of">[5]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=When%20,clauses">[6]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=">[9]</a> <a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html#:~:text=,Extension">[10]</a> PostgreSQL: Documentation: 18: F.38. postgres_fdw – access data stored in external PostgreSQL servers (<a target="_blank" href="https://www.postgresql.org/docs/current/postgres-fdw.html">https://www.postgresql.org/docs/current/postgres-fdw.html</a>)</p>
<p><a target="_blank" href="https://www.postgresql.org/docs/release/10.0/#:~:text=,Jeevan%20Chalke%2C%20Ashutosh%20Bapat">[7]</a> PostgreSQL: Release Notes (<a target="_blank" href="https://www.postgresql.org/docs/release/10.0/">https://www.postgresql.org/docs/release/10.0/</a>)</p>
<p><a target="_blank" href="https://www.postgresql.org/docs/release/12.0/#:~:text=,Etsuro%20Fujita%29%20%C2%A7%20%C2%A7">[8]</a> PostgreSQL: Release Notes (<a target="_blank" href="https://www.postgresql.org/docs/release/12.0/">https://www.postgresql.org/docs/release/12.0/</a>)</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Relational Database Design ]]>
                </title>
                <description>
                    <![CDATA[ Relational databases are used in many different types of software. We just posted a course on the freeCodeCamp.org YouTube channel that will help you learn relational database design from the ground up. This course covers SQL fundamentals, entity-rel... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-relational-database-design-1/</link>
                <guid isPermaLink="false">697bd350c5c536590f03089b</guid>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Thu, 29 Jan 2026 21:38:24 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769722691702/9d54eabe-0c4c-43d9-b584-a34c61fbe43d.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Relational databases are used in many different types of software.</p>
<p>We just posted a course on the freeCodeCamp.org YouTube channel that will help you learn relational database design from the ground up. This course covers SQL fundamentals, entity-relationship modeling, normalization (1NF through BCNF), data types and constraints, indexing strategies, and query optimization.</p>
<p>This course is based on the book Grokking Relational Database Design by Dr. Qiang Hao and Dr. Michael Tsikerdekis (Manning Publications, 2025).</p>
<p>Here are the sections in the course:</p>
<ul>
<li><p>Relational Databases for Beginners — Tables, Entities, Keys &amp; SQL</p>
</li>
<li><p>SQL Filtering &amp; Aggregation</p>
</li>
<li><p>SQL Table Commands</p>
</li>
<li><p>Foreign Keys in SQL</p>
</li>
<li><p>How SQL JOINs Work</p>
</li>
<li><p>How to learn SQL on your own</p>
</li>
<li><p>Database Design Goals</p>
</li>
<li><p>Database Design Lifecycle</p>
</li>
<li><p>From Real-World Ideas to Tables</p>
</li>
<li><p>Primary Key, Candidate Key, and Super Key</p>
</li>
<li><p>Don't Use the Wrong SQL String Type</p>
</li>
<li><p>The FLOAT Mistake That Crashed a Stock Exchange</p>
</li>
<li><p>SQL Date and Time Types Explained</p>
</li>
<li><p>Connecting Entities in an ER Diagram</p>
</li>
<li><p>One-to-One Relationships</p>
</li>
<li><p>One-to-Many Relationships</p>
</li>
<li><p>Many-to-Many Relationships</p>
</li>
<li><p>Strong vs Weak Entities</p>
</li>
<li><p>First Normal Form - Primary Keys and Atomic Values</p>
</li>
<li><p>Second Normal Form - Partial Keys and Functional Dependencies</p>
</li>
<li><p>Third Normal Form - Transitive Dependencies</p>
</li>
<li><p>The Untold Story of BCNF</p>
</li>
<li><p>Primary Key vs Unique Constraints</p>
</li>
<li><p>Foreign Key Constraints - ON DELETE &amp; ON UPDATE</p>
</li>
<li><p>Other Constraints: NOT NULL, DEFAULT, and CHECK</p>
</li>
<li><p>Access Control, Hashing &amp; Encryption</p>
</li>
<li><p>B-Tree vs Full-Text Indexes</p>
</li>
<li><p>Denormalization</p>
</li>
</ul>
<p>Watch the full course on <a target="_blank" href="https://youtu.be/26ls5lNiijk">the freeCodeCamp.org YouTube channel</a> (6-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/26ls5lNiijk" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Manage Blue-Green Deployments on AWS ECS with Database Migrations: Complete Implementation Guide ]]>
                </title>
                <description>
                    <![CDATA[ Blue-green deployments are celebrated for enabling zero-downtime releases and instant rollbacks. You deploy your new version (green) alongside the current one (blue), switch traffic over, and if something goes wrong, you switch back. Simple, right? N... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-manage-blue-green-deployments-on-aws-ecs-with-database-migrations/</link>
                <guid isPermaLink="false">69693109596ef11a775126fb</guid>
                
                    <category>
                        <![CDATA[ deployment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Blue/Green deployment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Destiny Erhabor ]]>
                </dc:creator>
                <pubDate>Thu, 15 Jan 2026 18:25:13 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768497873258/be1ce2a3-c95f-488e-913a-a772007a0d2a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Blue-green deployments are celebrated for enabling zero-downtime releases and instant rollbacks. You deploy your new version (green) alongside the current one (blue), switch traffic over, and if something goes wrong, you switch back. Simple, right?</p>
<p>Not quite. While blue-green deployments work beautifully for stateless applications, they become significantly more complex when you introduce databases and stateful services into the equation. The moment your blue and green environments need to share a database, you're facing a fundamental challenge: how do you evolve your schema and data without breaking either version?</p>
<p>In this article, we'll tackle the real-world complexities of implementing blue-green deployments on Amazon ECS when your application depends on shared state. You'll learn practical strategies for handling database migrations, managing sessions, and maintaining data consistency across application versions.</p>
<p>💡 <strong>Complete Working Example</strong>: All code examples in this article are available in the <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs">bluegreen-deployment-ecs repository on GitHub</a>. You can clone it and deploy the entire infrastructure to your AWS account.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-problem-with-state-in-blue-green-deployments">The Problem with State in Blue-Green Deployments</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-database-migration-strategies-for-blue-green">Database Migration Strategies for Blue-Green</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-handling-stateful-services-in-ecs">Handling Stateful Services in ECS</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-implementation-end-to-end-example">Implementation: End-to-End Example</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-rollback-strategies">Rollback Strategies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-monitoring-during-deployments">Monitoring During Deployments</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices">Best Practices</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-not-to-use-blue-green">When NOT to Use Blue-Green</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-alternative-deployment-strategies">Alternative Deployment Strategies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-cleanup">Cleanup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-further-resources">Further Resources</a></p>
</li>
</ul>
<h2 id="heading-the-problem-with-state-in-blue-green-deployments">The Problem with State in Blue-Green Deployments</h2>
<p>The elegance of blue-green deployments starts to crumble when you consider databases. Here's why: your blue environment runs application version 1, your green environment runs version 2, but they both connect to the same RDS instance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768056130585/109ceff8-4500-45d7-aaa0-5e259b4a7b11.png" alt="Figure 1: The blue-green dilemma - both environments share the same database but expect different schemas" class="image--center mx-auto" width="1579" height="1131" loading="lazy"></p>
<p>Consider this scenario: you're adding a new feature that requires a new database column. Version 2 of your application expects this column to exist. You deploy green, run your migration to add the column, and switch traffic.</p>
<p>Everything works great until you need to roll back. Now version 1 is receiving traffic, but it doesn't know what to do with that new column. Worse, if your migration removed or renamed a column that version 1 depends on, your rollback will fail catastrophically.</p>
<p>Here are the specific challenges you'll face:</p>
<ul>
<li><p><strong>Schema versioning conflicts</strong>: Your blue environment expects schema version N, while green expects version N+1. Any breaking schema change will cause one environment to fail.</p>
</li>
<li><p><strong>Data inconsistencies</strong>: If version 2 writes data in a new format that version 1 can't read, switching back to blue will result in errors or data corruption.</p>
</li>
<li><p><strong>Irreversible migrations</strong>: Some database changes are inherently destructive. Dropping a column, changing data types, or restructuring tables can't be easily undone.</p>
</li>
<li><p><strong>Failed rollbacks</strong>: The promise of instant rollback becomes hollow when your database has evolved beyond what the blue environment can handle.</p>
</li>
</ul>
<p>Let's explore the strategies that solve these problems.</p>
<h2 id="heading-database-migration-strategies-for-blue-green">Database Migration Strategies for Blue-Green</h2>
<h3 id="heading-strategy-1-the-expand-contract-pattern-recommended">Strategy 1: The Expand-Contract Pattern (Recommended)</h3>
<p>The expand-contract pattern is the most practical approach for blue-green deployments with shared databases. It works by breaking schema changes into three phases, ensuring backwards compatibility throughout.</p>
<h4 id="heading-phase-1-expand">Phase 1: Expand</h4>
<p>In this phase, you add new schema elements while keeping old ones intact. If you're renaming a column, add the new column without removing the old one. If you're changing table structure, create new tables alongside existing ones.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Example: Renaming 'user_name' to 'username'</span>
<span class="hljs-comment">-- Phase 1: Expand - Add new column</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> username <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>);

<span class="hljs-comment">-- Populate new column from old column</span>
<span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">SET</span> username = user_name <span class="hljs-keyword">WHERE</span> username <span class="hljs-keyword">IS</span> <span class="hljs-literal">NULL</span>;
</code></pre>
<p>At this point, your database supports both the old schema (used by blue) and the new schema (used by green). Your application code needs to handle both as well.</p>
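<p>On a large table, a single <code>UPDATE</code> can hold locks for a long time, so the backfill is often run in small batches instead. The following is a minimal sketch of that idea, using Python's built-in <code>sqlite3</code> module as a stand-in for the real database. The <code>backfill_username</code> helper and the batch size are illustrative assumptions and are not taken from the companion repository.</p>

```python
import sqlite3


def backfill_username(conn, batch_size=2):
    """Copy user_name into username in small batches; return rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET username = user_name "
            "WHERE id IN (SELECT id FROM users "
            "WHERE username IS NULL AND user_name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # short transactions keep lock time low
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# In-memory stand-in for the shared database (hypothetical sample rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.executemany(
    "INSERT INTO users (user_name) VALUES (?)",
    [("ada",), ("grace",), ("linus",)],
)

# Expand: add the new column next to the old one, then backfill it
conn.execute("ALTER TABLE users ADD COLUMN username TEXT")
updated = backfill_username(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE username IS NULL"
).fetchone()[0]
```

<p>Because blue keeps writing only to <code>user_name</code> until green takes over, you may need to run the backfill again right before the traffic switch to pick up rows that changed in the meantime.</p>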
<h4 id="heading-phase-2-deploy">Phase 2: Deploy</h4>
<p>Now, deploy your green environment with code that uses the new schema. But this code should still write to both old and new columns to maintain compatibility.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Version 2 code - writes to both columns</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update_user</span>(<span class="hljs-params">user_id, username</span>):</span>
    db.execute(
        <span class="hljs-string">"UPDATE users SET username = %s, user_name = %s WHERE id = %s"</span>,
        (username, username, user_id)
    )
</code></pre>
<p>Traffic shifts from blue to green. Both environments work because the database supports both schemas. If you need to roll back, blue still works because the old columns are intact and green has kept them up to date.</p>
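<p>To see why dual writes keep rollback safe, here is a small sketch that simulates both application versions against one table. It uses <code>sqlite3</code> (with <code>?</code> placeholders rather than <code>%s</code>) purely as a local stand-in, and the <code>read_user_v1</code> and <code>read_user_v2</code> helpers are hypothetical names for how each version might read a user.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, username TEXT)"
)
conn.execute("INSERT INTO users (id, user_name, username) VALUES (1, 'ada', 'ada')")


def update_user_v2(user_id, username):
    """Green (version 2): write both columns so blue keeps working."""
    conn.execute(
        "UPDATE users SET username = ?, user_name = ? WHERE id = ?",
        (username, username, user_id),
    )


def read_user_v1(user_id):
    """Blue (version 1): only knows about the old column."""
    row = conn.execute(
        "SELECT user_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


def read_user_v2(user_id):
    """Green (version 2): reads the new column."""
    row = conn.execute(
        "SELECT username FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


# Green handles a write, then traffic is rolled back to blue
update_user_v2(1, "ada_lovelace")
blue_view = read_user_v1(1)
green_view = read_user_v2(1)
```

<p>Both views return the same value, which is the property that lets you switch traffic in either direction during this phase.</p>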
<h4 id="heading-phase-3-contract">Phase 3: Contract</h4>
<p>After you're confident green is stable and you've decommissioned blue, remove the old schema elements in a separate deployment.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Phase 3: Contract - Remove old column</span>
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> <span class="hljs-keyword">DROP</span> <span class="hljs-keyword">COLUMN</span> user_name;
</code></pre>
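<p>Before you run this migration, update your application code to stop reading from and writing to the old column. That code is version 3, deployed as a standard release. Once it is live and stable, dropping <code>user_name</code> is safe.</p>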
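<p>Since the contract step is destructive, it can help to confirm first that the old and new columns actually agree. Here is a rough sketch of such a check, again using <code>sqlite3</code> as a stand-in. The <code>count_mismatches</code> helper is a hypothetical name, and the NULL-safe comparison is written with SQLite's <code>IS NOT</code> operator. On PostgreSQL the equivalent operator is <code>IS DISTINCT FROM</code>.</p>

```python
import sqlite3


def count_mismatches(conn):
    """Count rows where the old and new columns disagree (NULL-safe)."""
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE username IS NOT user_name"
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, username TEXT)"
)
# Hypothetical data: one row was missed by the backfill
conn.executemany(
    "INSERT INTO users (user_name, username) VALUES (?, ?)",
    [("ada", "ada"), ("grace", None), ("linus", "linus")],
)
before = count_mismatches(conn)

# Repair the straggler, then re-check before dropping the old column
conn.execute("UPDATE users SET username = user_name WHERE username IS NULL")
after = count_mismatches(conn)
safe_to_contract = after == 0
```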
<p><strong>When to use</strong>: This should be your default approach for most schema changes including adding/removing columns, renaming fields, changing constraints, and restructuring tables.</p>
<h3 id="heading-strategy-2-parallel-schemas-or-databases">Strategy 2: Parallel Schemas or Databases</h3>
<p>For major breaking changes where backwards compatibility is impractical, you might maintain entirely separate database versions. Version 1 connects to database A, version 2 connects to database B. This approach requires data synchronization between databases. AWS Database Migration Service (DMS) can replicate data in near real-time, or you can build custom replication logic using change data capture.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Configuration for version-specific database connections</span>
DATABASE_CONFIG = {
    <span class="hljs-string">'v1'</span>: {
        <span class="hljs-string">'host'</span>: <span class="hljs-string">'blue-db.cluster-xxxxx.us-east-1.rds.amazonaws.com'</span>,
        <span class="hljs-string">'database'</span>: <span class="hljs-string">'app_v1'</span>
    },
    <span class="hljs-string">'v2'</span>: {
        <span class="hljs-string">'host'</span>: <span class="hljs-string">'green-db.cluster-yyyyy.us-east-1.rds.amazonaws.com'</span>,
        <span class="hljs-string">'database'</span>: <span class="hljs-string">'app_v2'</span>
    }
}
</code></pre>
<p>During the transition period, you run DMS to keep both databases synchronized, with the understanding that writes go to the active version's database.</p>
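<p>One way each version could pick its own database is to resolve the configuration from an environment variable at startup. The sketch below assumes a variable named <code>APP_VERSION</code> holding <code>v1</code> or <code>v2</code>. The host names and the <code>get_database_config</code> helper are placeholders for illustration only.</p>

```python
import os

# Hypothetical hosts standing in for the two database endpoints
DATABASE_CONFIG = {
    "v1": {"host": "blue-db.example.internal", "database": "app_v1"},
    "v2": {"host": "green-db.example.internal", "database": "app_v2"},
}


def get_database_config(version=None):
    """Return connection settings for the given app version.

    Falls back to the APP_VERSION environment variable, then to "v1".
    """
    version = version or os.environ.get("APP_VERSION", "v1")
    try:
        return DATABASE_CONFIG[version]
    except KeyError:
        # Fail fast instead of silently connecting to the wrong database
        raise ValueError(f"No database configured for version {version!r}") from None


blue = get_database_config("v1")
green = get_database_config("v2")
```

<p>Failing fast on an unknown version is a deliberate choice here: connecting to the wrong database during a transition is harder to recover from than a task that refuses to start.</p>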
<p>The challenge is that you're now managing data synchronization, dealing with replication lag, and paying for two databases. Eventually, you need to consolidate back to one database, which requires another migration. This is expensive and complex, which is why it's the "nuclear option."</p>
<p><strong>When to use</strong>: Only for major architectural changes, complete data model redesigns, or when migrating between database types (for example, MySQL to PostgreSQL). If expand-contract can possibly work, use that instead.</p>
<h3 id="heading-strategy-3-feature-flags-for-gradual-rollout">Strategy 3: Feature Flags for Gradual Rollout</h3>
<p>Feature flags allow you to decouple deployment from release. Both blue and green run the same codebase, but features are toggled on or off via configuration. This shifts the problem from schema compatibility to code-level compatibility.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_user</span>(<span class="hljs-params">user_data</span>):</span>
    config = get_feature_config()
    <span class="hljs-keyword">if</span> config[<span class="hljs-string">'use_new_user_schema'</span>]:
        <span class="hljs-keyword">return</span> create_user_v2(user_data)
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> create_user_v1(user_data)
</code></pre>
<p>Instead of having two separate deployments (blue and green), you have ONE deployment with conditional logic. The "switch" from old to new behavior happens via configuration change, not infrastructure change. This is technically not pure blue-green, but it's a powerful hybrid approach.</p>
<h4 id="heading-how-it-works">How it works</h4>
<p>Your application checks AWS AppConfig (or a similar service) for feature flags before executing code paths. When a flag is off, it uses the old schema/logic. When it's on, it uses the new schema/logic. You can even enable features for a percentage of users (5% get new behavior, 95% get old behavior) for gradual rollout.</p>
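<p>Percentage rollouts usually rely on assigning each user to a stable bucket, so the same user sees the same behavior on every request. The sketch below shows one common way to do that with a hash. It is local illustration code, not the AWS AppConfig API, and the <code>in_rollout</code> helper is a hypothetical name.</p>

```python
import hashlib


def in_rollout(user_id, flag_name, percentage):
    """Return True if this user falls inside the rollout percentage.

    Hashing the flag name plus user id gives each user a stable bucket
    from 0 to 99, so the answer does not change between requests.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return percentage > bucket


# Roughly 5% of these hypothetical users get the new behavior
enabled = [uid for uid in range(1000) if in_rollout(uid, "use_new_user_schema", 5)]
share = len(enabled) / 1000
```

<p>Raising the percentage only adds users to the enabled set and never removes anyone who already had the feature, which keeps the experience consistent as the rollout widens.</p>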
<p>The tradeoff is that your codebase temporarily contains both old and new logic with conditional branches everywhere. This increases complexity and requires disciplined cleanup after the feature is fully released. However, you gain fine-grained control and can toggle features on/off instantly without deploying new infrastructure.</p>
<p><strong>When to use:</strong> For large features with uncertain stability, gradual rollouts to monitor impact, or when you want instant rollback capability without touching infrastructure. Also useful when combined with expand-contract for extra safety.</p>
<h2 id="heading-handling-stateful-services-in-ecs">Handling Stateful Services in ECS</h2>
<p>Beyond databases, several other stateful components require careful consideration during blue-green deployments.</p>
<h3 id="heading-session-management">Session Management</h3>
<p>It’s a good idea to store sessions in ElastiCache or DynamoDB rather than application memory:</p>
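<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3
<span class="hljs-comment"># Flask-Session expects a boto3 resource here, not a low-level client</span>
app.config[<span class="hljs-string">'SESSION_TYPE'</span>] = <span class="hljs-string">'dynamodb'</span>
app.config[<span class="hljs-string">'SESSION_DYNAMODB'</span>] = boto3.resource(<span class="hljs-string">'dynamodb'</span>)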
</code></pre>
<h3 id="heading-shared-resources">Shared Resources</h3>
<p>Beyond the database and sessions, your application likely depends on other stateful components that need coordination during blue-green deployments:</p>
<h4 id="heading-1-s3-buckets">S3 buckets</h4>
<p>If your application stores files or data in S3, changes to object metadata or file formats can cause compatibility issues between versions. S3 versioning lets you recover earlier copies of an object, but on its own it doesn't make a new format readable by old code, so plan for both formats to coexist during the transition.</p>
<p>For example, if version 2 writes JSON files with a new structure, version 1 should still be able to read the old format. You can include a version prefix in object keys (like <code>v1/user-data.json</code> and <code>v2/user-data.json</code>) or embed version metadata in the objects themselves.</p>
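<p>As a rough illustration of both options, the sketch below builds version-prefixed keys and reads either format into one normalized shape. The field names (<code>name</code>, <code>first_name</code>, <code>last_name</code>, <code>format_version</code>) are invented for this example and are not the formats used by the companion application.</p>

```python
import json


def object_key(version, name):
    """Build a version-prefixed key such as 'v2/user-data.json'."""
    return f"{version}/{name}"


def parse_user_data(body):
    """Read either format and return one normalized dict.

    Assumes (hypothetically) that v1 stored a flat 'name' field and v2
    stores 'first_name' and 'last_name' plus a 'format_version' marker.
    """
    data = json.loads(body)
    if data.get("format_version", 1) == 1:
        first, _, last = data["name"].partition(" ")
        return {"first_name": first, "last_name": last}
    return {"first_name": data["first_name"], "last_name": data["last_name"]}


old = parse_user_data('{"name": "Ada Lovelace"}')
new = parse_user_data(
    '{"format_version": 2, "first_name": "Ada", "last_name": "Lovelace"}'
)
key = object_key("v2", "user-data.json")
```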
<h4 id="heading-message-queues-sqssns">Message queues (SQS/SNS)</h4>
<p>Messages sent by one version must be readable by the other during the transition. You can use versioned message schemas with a <code>schema_version</code> field in your message payload. Both blue and green should be able to parse messages from either version, even if they only produce messages in their preferred format. Consider using a schema registry or validation library to ensure compatibility.</p>
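<p>A consumer that tolerates both versions can be sketched as a small registry of parsers keyed by <code>schema_version</code>. The message shapes below are hypothetical (they borrow the address fields from the example later in this article), and the code only parses message bodies. It does not call SQS or SNS.</p>

```python
import json


def _parse_v1(payload):
    # Hypothetical v1 shape: a single free-text address
    return {"customer_id": payload["customer_id"], "address": payload["address"]}


def _parse_v2(payload):
    # Hypothetical v2 shape: structured fields, joined into a common view
    parts = [
        payload["street_address"],
        payload["city"],
        payload["state"],
        payload["zip_code"],
    ]
    return {"customer_id": payload["customer_id"], "address": ", ".join(parts)}


PARSERS = {1: _parse_v1, 2: _parse_v2}


def parse_message(body):
    """Parse a queue message body produced by either app version."""
    payload = json.loads(body)
    version = payload.get("schema_version", 1)  # treat unversioned as v1
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"Unsupported schema_version: {version}")
    return parser(payload)


v1 = parse_message(json.dumps({
    "schema_version": 1,
    "customer_id": 7,
    "address": "1 Main St, Springfield, IL, 62704",
}))
v2 = parse_message(json.dumps({
    "schema_version": 2,
    "customer_id": 7,
    "street_address": "1 Main St",
    "city": "Springfield",
    "state": "IL",
    "zip_code": "62704",
}))
```

<p>Rejecting unknown versions explicitly lets the queue's normal retry or dead-letter handling deal with messages that neither environment understands.</p>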
<h4 id="heading-cache-layers-elasticacheredis">Cache layers (ElastiCache/Redis)</h4>
<p>Cached data structure changes can cause deserialization errors when switching between versions. Try versioning your cache keys by including the schema version: <code>CACHE_VERSION = 'v2'</code> and then <code>cache_key = f"user:{CACHE_VERSION}:{user_id}"</code>. This ensures blue and green maintain separate cache namespaces, preventing cross-contamination. When you fully migrate to green, you can flush the old cache keys or let them expire naturally.</p>
<pre><code class="lang-python">CACHE_VERSION = <span class="hljs-string">'v2'</span>
cache_key = <span class="hljs-string">f"user:<span class="hljs-subst">{CACHE_VERSION}</span>:<span class="hljs-subst">{user_id}</span>"</span>
</code></pre>
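<p>To make the namespace separation concrete, here is a tiny sketch in which a plain dictionary stands in for Redis. The cached values and the <code>cache_key</code> helper are hypothetical. With a real Redis client you would delete or expire the old keys rather than remove dictionary entries.</p>

```python
def cache_key(cache_version, user_id):
    """Namespace a user's cache entry by the schema version that wrote it."""
    return f"user:{cache_version}:{user_id}"


# A plain dict stands in for Redis here
cache = {}
cache[cache_key("v1", 42)] = {"address": "1 Main St, Springfield, IL 62704"}
cache[cache_key("v2", 42)] = {"street_address": "1 Main St", "city": "Springfield"}

# Each environment only ever sees entries in its own namespace
blue_entry = cache[cache_key("v1", 42)]
green_entry = cache[cache_key("v2", 42)]

# After blue is retired, drop only the old namespace
stale = [key for key in cache if key.startswith("user:v1:")]
for key in stale:
    del cache[key]
```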
<h2 id="heading-implementation-end-to-end-example">Implementation: End-to-End Example</h2>
<p>Let's walk through a complete blue-green deployment with ECS, handling a database schema change using the <strong>expand-contract pattern</strong>. We'll migrate from a single <code>address</code> text field to structured <code>street_address</code>, <code>city</code>, <code>state</code>, and <code>zip_code</code> fields.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768052075044/fdb732dd-cf3d-473f-a22c-f5ab98870625.png" alt="Figure 2: The three phases of expand-contract migration ensuring continuous compatibility" class="image--center mx-auto" width="3444" height="624" loading="lazy"></p>
<p><strong>Here’s the scenario:</strong> You're running an e-commerce application on ECS. The current version (blue) stores customer addresses in a single <code>address</code> text field. Version 2 (green) splits this into structured fields: <code>street_address</code>, <code>city</code>, <code>state</code>, and <code>zip_code</code>.</p>
<h3 id="heading-architecture-setup">Architecture Setup</h3>
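<p>Splitting free-text addresses is the part of this migration most likely to hit messy data. As a simple sketch, assuming addresses follow a <code>street, city, ST 12345</code> shape (an assumption made for illustration, as real data is rarely this clean), a hypothetical <code>split_address</code> helper could look like this:</p>

```python
def split_address(address):
    """Split 'street, city, ST 12345' into structured fields.

    Returns None when the text does not match this simple shape, so the
    caller can leave the row for manual review instead of guessing.
    """
    parts = [part.strip() for part in address.split(",")]
    if len(parts) != 3:
        return None
    street, city, state_zip = parts
    tokens = state_zip.split()
    if len(tokens) != 2:
        return None
    state, zip_code = tokens
    return {
        "street_address": street,
        "city": city,
        "state": state,
        "zip_code": zip_code,
    }


parsed = split_address("123 Main St, Springfield, IL 62704")
unparsed = split_address("somewhere in Springfield")
```

<p>Returning <code>None</code> for anything that doesn't fit keeps the original <code>address</code> value as the source of truth for rows the parser can't handle.</p>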
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768087707691/ff19ce97-b745-4aa8-8b39-4d835fd781cd.png" alt="Figure 3: Complete AWS architecture for blue-green ECS deployment with shared RDS database" class="image--center mx-auto" width="2479" height="3679" loading="lazy"></p>
<p>Your infrastructure includes:</p>
<ul>
<li><p>ECS cluster running Fargate tasks</p>
</li>
<li><p>Application Load Balancer with two target groups (blue and green)</p>
</li>
<li><p>RDS PostgreSQL database (shared between environments)</p>
</li>
<li><p>CodeDeploy for managing traffic shifts</p>
</li>
<li><p>Parameter Store for database connection strings</p>
</li>
</ul>
<p>💡 <strong>Implementation Note</strong>: The complete Terraform code for this architecture is available in the <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/tree/main/terraform">companion GitHub repository</a>.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>Before starting, make sure that you have the following tools installed and your AWS credentials properly configured:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Required tools</span>
aws --version      <span class="hljs-comment"># AWS CLI</span>
terraform --version <span class="hljs-comment"># Terraform &gt;= 1.0</span>
docker --version   <span class="hljs-comment"># Docker</span>
psql --version     <span class="hljs-comment"># PostgreSQL client</span>

<span class="hljs-comment"># Configure AWS credentials</span>
aws configure
aws sts get-caller-identity  <span class="hljs-comment"># Verify your identity</span>
</code></pre>
<h3 id="heading-step-1-deploy-infrastructure-and-blue-environment">Step 1: Deploy Infrastructure and Blue Environment</h3>
<p>We’ll start by setting up the entire AWS infrastructure from scratch using Terraform, then deploying the initial version of our application (blue environment).</p>
<p>First, clone the repository and set up your environment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Clone the repository</span>
git <span class="hljs-built_in">clone</span> https://github.com/Caesarsage/bluegreen-deployment-ecs.git
<span class="hljs-built_in">cd</span> bluegreen-deployment-ecs

<span class="hljs-comment"># Create terraform variables</span>
<span class="hljs-built_in">cd</span> terraform
cat &gt; terraform.tfvars &lt;&lt;EOF
aws_region         = <span class="hljs-string">"us-east-1"</span>
project_name       = <span class="hljs-string">"ecommerce-bluegreen"</span>
environment        = <span class="hljs-string">"production"</span>
vpc_cidr           = <span class="hljs-string">"10.0.0.0/16"</span>

<span class="hljs-comment"># Database credentials (CHANGE THESE!)</span>
db_username = <span class="hljs-string">"dbadmin"</span>
db_password = <span class="hljs-string">"ChangeThisPassword123!"</span>

<span class="hljs-comment"># Container configuration</span>
container_image = <span class="hljs-string">"PLACEHOLDER"</span>  <span class="hljs-comment"># Will update after building image</span>
container_port  = 8080

<span class="hljs-comment"># Scaling configuration</span>
desired_count = 2
cpu           = <span class="hljs-string">"256"</span>
memory        = <span class="hljs-string">"512"</span>

<span class="hljs-comment"># Notifications</span>
notification_email = <span class="hljs-string">"your-email@example.com"</span>
EOF
</code></pre>
<p><strong>Security Note:</strong> Never commit <code>terraform.tfvars</code> to Git. It's already in <code>.gitignore</code>.</p>
<p>Next, initialize Terraform and create the ECR repository:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Initialize Terraform</span>
terraform init
terraform validate

<span class="hljs-comment"># Create ECR repository</span>
terraform apply -target=aws_ecr_repository.app

<span class="hljs-comment"># Get ECR repository URL</span>
<span class="hljs-built_in">export</span> ECR_REPO=$(terraform output -raw ecr_repository_url)
<span class="hljs-built_in">echo</span> <span class="hljs-string">"ECR Repository: <span class="hljs-variable">$ECR_REPO</span>"</span>
</code></pre>
<p>We create the ECR repository first because we need somewhere to push our Docker image. Then we'll build the image, push it, and finally deploy the rest of the infrastructure that depends on that image existing.</p>
<p>Build and push the initial application like this:</p>
<pre><code class="lang-bash">
<span class="hljs-built_in">cd</span> ..  <span class="hljs-comment"># Back to project root</span>

<span class="hljs-comment"># Set variables</span>
<span class="hljs-built_in">export</span> AWS_REGION=us-east-1
<span class="hljs-built_in">export</span> AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
<span class="hljs-built_in">export</span> ECR_REPOSITORY=ecommerce-bluegreen
<span class="hljs-built_in">export</span> IMAGE_TAG=v1.0.0

<span class="hljs-comment"># Login to ECR</span>
aws ecr get-login-password --region <span class="hljs-variable">$AWS_REGION</span> | \
    docker login --username AWS --password-stdin <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com

<span class="hljs-comment"># Build the image</span>
docker build --platform linux/amd64 -t <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> -f docker/Dockerfile .

<span class="hljs-comment"># Tag and push to ECR</span>
docker tag <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> \
    <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>

docker push <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>

<span class="hljs-comment"># Replace the PLACEHOLDER container_image line in terraform.tfvars with the image URL</span>
<span class="hljs-comment"># (appending a second container_image line would define the variable twice)</span>
sed -i.bak <span class="hljs-string">"s|^container_image .*|container_image = \"<span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>\"|"</span> terraform/terraform.tfvars
rm -f terraform/terraform.tfvars.bak
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768137809806/820d7005-b924-4224-9b58-de5701466c1f.png" alt="Figure 4: ECR Private repository for Docker image" class="image--center mx-auto" width="2442" height="632" loading="lazy"></p>
<p>The <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/tree/main/app">application code</a> is a Flask application that handles both old and new schema formats based on the <code>APP_VERSION</code> environment variable.</p>
<p>Now deploy the complete infrastructure:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> terraform
terraform apply  <span class="hljs-comment"># Takes ~15-20 minutes</span>

<span class="hljs-comment"># Get outputs</span>
<span class="hljs-built_in">export</span> ALB_URL=$(terraform output -raw alb_url)
<span class="hljs-built_in">export</span> TEST_URL=$(terraform output -raw test_url)
<span class="hljs-built_in">export</span> DB_ENDPOINT=$(terraform output -raw db_endpoint)
<span class="hljs-built_in">export</span> ECR_URL=$(terraform output -raw ecr_repository_url)
<span class="hljs-built_in">export</span> BASTION_IP=$(terraform output -raw bastion_public_ip)

<span class="hljs-built_in">echo</span> <span class="hljs-string">"Application URL: <span class="hljs-variable">$ALB_URL</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Test URL: <span class="hljs-variable">$TEST_URL</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Database Endpoint: <span class="hljs-variable">$DB_ENDPOINT</span>"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768141033921/07c2e9b9-c652-4cec-91ae-2de956d8655d.png" alt="Application Load Balancer with two target groups (blue and green)" class="image--center mx-auto" width="2504" height="844" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768142296716/9963c779-e0a8-4418-8d69-9bc8fcbbc553.png" alt="Figure 5: Application Load Balancer with two target groups (blue and green)" class="image--center mx-auto" width="2553" height="458" loading="lazy"></p>
<p>The production listener (port 80) is what your users hit. The test listener (port 8080) lets you test the green environment before shifting production traffic to it. This is crucial for validation.</p>
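<p>When validating green through the test listener, it helps to check more than the HTTP status code. The sketch below assumes the <code>/health</code> endpoint returns the JSON fields shown in Step 3 of this guide (<code>status</code>, <code>version</code>, <code>database</code>, and <code>schema</code>). The <code>is_ready</code> helper is a hypothetical name, and the sample body is hard-coded rather than fetched over the network.</p>

```python
import json


def is_ready(payload, expected_version):
    """Return True if a /health payload looks safe to receive traffic."""
    return (
        payload.get("status") == "healthy"
        and payload.get("database") == "connected"
        and payload.get("schema") == "compatible"
        and payload.get("version") == expected_version
    )


# Sample body shaped like the health response shown in Step 3
sample = json.loads(
    '{"status": "healthy", "version": "green", "environment": "production", '
    '"database": "connected", "schema": "compatible"}'
)
green_ready = is_ready(sample, "green")
wrong_version = is_ready(sample, "blue")
```

<p>Checking the reported version guards against the case where the test listener is still routed to blue and a healthy response hides the fact that green was never exercised.</p>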
<p>You can see the complete Terraform configuration in <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/tree/main/terraform"><code>terraform</code></a>.</p>
<h3 id="heading-step-2-initialize-database-schema">Step 2: Initialize Database Schema</h3>
<p>Now you’ll need to initialize the database with the schema for version 1 (blue). We'll use a bastion host for secure access:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Copy the migration files to the bastion host from your local machine</span>

scp -i ~/.ssh/id_rsa docker/init.sql ec2-user@<span class="hljs-variable">$BASTION_IP</span>:/tmp/
scp -i ~/.ssh/id_rsa migrations/*.sql ec2-user@<span class="hljs-variable">$BASTION_IP</span>:/tmp/

<span class="hljs-comment"># Then SSH into it and run migrations</span>
ssh -i ~/.ssh/id_rsa ec2-user@<span class="hljs-variable">$BASTION_IP</span>

<span class="hljs-comment"># Inside the bastion (set DB_ENDPOINT to the db_endpoint Terraform output first):</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f /tmp/init.sql

<span class="hljs-comment"># Verify</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>

<span class="hljs-comment"># Exit the bastion</span>
<span class="hljs-built_in">exit</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768089062401/8f23655e-b50b-4b24-af98-b195e29da9c7.png" alt="Figure 6: Database schema - the customers table with the original columns" class="image--center mx-auto" width="1298" height="402" loading="lazy"></p>
<h3 id="heading-step-3-verify-blue-environment">Step 3: Verify Blue Environment</h3>
<p>We’ll want to test that everything works before we start the migration. This is your baseline: you want to confirm that the current system is healthy before introducing changes.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check health</span>
curl <span class="hljs-variable">$ALB_URL</span>/health | jq

<span class="hljs-comment"># Expected response:</span>
<span class="hljs-comment"># {</span>
<span class="hljs-comment">#   "status": "healthy",</span>
<span class="hljs-comment">#   "version": "blue",</span>
<span class="hljs-comment">#   "environment": "production",</span>
<span class="hljs-comment">#   "database": "connected",</span>
<span class="hljs-comment">#   "schema": "compatible"</span>
<span class="hljs-comment"># }</span>

<span class="hljs-comment"># Create a customer with the old schema (single address field)</span>
curl -X POST <span class="hljs-variable">$ALB_URL</span>/api/customers \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{
      "name": "John Doe",
      "email": "john@example.com",
      "address": "123 Main St, New York, NY, 10001"
    }'</span> | jq

<span class="hljs-comment"># List customers</span>
curl <span class="hljs-variable">$ALB_URL</span>/api/customers | jq
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768138569485/b7455a6e-b101-4cdb-83b8-40e0dbafb0b0.png" alt="Figure 7: Blue Environment Verification" class="image--center mx-auto" width="1068" height="434" loading="lazy"></p>
<h3 id="heading-step-4-expand-phase-add-new-columns">Step 4: Expand Phase – Add New Columns</h3>
<p>This is the first phase of expand-contract. We're adding the new columns WITHOUT removing the old one, creating a database schema that supports both blue and green simultaneously.</p>
<p>Run the expand migration (<a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/blob/main/migrations/001_expand_address.sql"><code>migrations/001_expand_address.sql</code></a>):</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- migrations/001_expand_address.sql</span>
<span class="hljs-keyword">BEGIN</span>;

<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> customers 
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> street_address <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>),
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> city <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>),
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> state <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">2</span>),
  <span class="hljs-keyword">ADD</span> <span class="hljs-keyword">COLUMN</span> zip_code <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">10</span>);

<span class="hljs-comment">-- Populate new columns from existing data</span>
<span class="hljs-comment">-- This uses a simple parsing strategy; yours might be more sophisticated</span>

<span class="hljs-keyword">UPDATE</span> customers 
<span class="hljs-keyword">SET</span> 
  street_address = SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">1</span>),
  city = <span class="hljs-keyword">TRIM</span>(SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">2</span>)),
  state = <span class="hljs-keyword">TRIM</span>(SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">3</span>)),
  zip_code = <span class="hljs-keyword">TRIM</span>(SPLIT_PART(address, <span class="hljs-string">','</span>, <span class="hljs-number">4</span>))
<span class="hljs-keyword">WHERE</span> address <span class="hljs-keyword">IS</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>;

<span class="hljs-keyword">COMMIT</span>;
</code></pre>
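<p>To see how this parsing behaves on less tidy input, here is a minimal Python sketch that mimics PostgreSQL's <code>SPLIT_PART</code> and <code>TRIM</code> outside the database. The helper names <code>split_part</code>, <code>parse_address</code>, and <code>fits_schema</code> are illustrative and are not part of the project's code. A missing field yields an empty string rather than <code>NULL</code>, and a state longer than two characters would not fit the <code>VARCHAR(2)</code> column, so it may be worth checking real data before running the migration.</p>

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Mimic PostgreSQL SPLIT_PART for positive n (1-based field index)."""
    parts = text.split(delimiter)
    # PostgreSQL returns an empty string when the requested field does not exist.
    return parts[n - 1] if len(parts) >= n else ""


def parse_address(address: str) -> dict:
    """Apply the same split-and-trim logic as the expand migration."""
    return {
        "street_address": split_part(address, ",", 1),
        "city": split_part(address, ",", 2).strip(" "),
        "state": split_part(address, ",", 3).strip(" "),
        "zip_code": split_part(address, ",", 4).strip(" "),
    }


def fits_schema(parsed: dict) -> bool:
    """Check the parsed values against the column sizes used in the migration."""
    limits = {"street_address": 255, "city": 100, "state": 2, "zip_code": 10}
    return all(limits[key] >= len(value) for key, value in parsed.items())


if __name__ == "__main__":
    print(parse_address("123 Main St, New York, NY, 10001"))
    print(fits_schema(parse_address("1 Long Rd, Austin, Texas, 73301")))
```

<p>Running a check like this over a sample of production addresses is one way to find rows that need manual cleanup before the <code>UPDATE</code> touches them.</p>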
<p><strong>Critical observation:</strong> We're NOT dropping the <code>address</code> column. It's still there. Blue continues reading and writing to it, completely unaware that new columns exist. This is what makes the migration safe – nothing breaks.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># SSH into the bastion and run the migration</span>
ssh -i ~/.ssh/id_rsa ec2-user@<span class="hljs-variable">$BASTION_IP</span>

<span class="hljs-comment"># Inside the bastion:</span>
<span class="hljs-built_in">export</span> DB_ENDPOINT=<span class="hljs-string">""</span> <span class="hljs-comment"># from terraform output</span>

psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f /tmp/001_expand_address.sql

<span class="hljs-comment"># Verify new columns exist</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>

<span class="hljs-built_in">exit</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768089194050/e053dee3-382b-4ccd-a0e0-8c17003e9832.png" alt="Figure 8: Database schema evolution - the customers table during expand phase with both old and new columns" class="image--center mx-auto" width="1638" height="694" loading="lazy"></p>
<p><strong>Verification:</strong> The <code>\d customers</code> command shows the table structure. You should see BOTH the old <code>address</code> column AND the new <code>street_address</code>, <code>city</code>, <code>state</code>, <code>zip_code</code> columns. This confirms the expand phase worked.</p>
<p>The database now supports both old (blue) and new (green) schemas. Blue is still running and working perfectly, and nothing has changed from its perspective.</p>
<h3 id="heading-step-5-build-and-deploy-green-environment">Step 5: Build and Deploy Green Environment</h3>
<p>Now we’ll build version 2 of our application that knows how to work with the new structured address fields, while maintaining backwards compatibility with the old schema.</p>
<p>Start by building version 2 with structured address support:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> ..  <span class="hljs-comment"># Back to project root</span>

<span class="hljs-comment"># Build new version</span>
<span class="hljs-built_in">export</span> IMAGE_TAG=v2.0.0

docker build --platform linux/amd64 -t <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> -f docker/Dockerfile .

docker tag <span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span> \
    <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>

docker push <span class="hljs-variable">$AWS_ACCOUNT_ID</span>.dkr.ecr.<span class="hljs-variable">$AWS_REGION</span>.amazonaws.com/<span class="hljs-variable">$ECR_REPOSITORY</span>:<span class="hljs-variable">$IMAGE_TAG</span>
</code></pre>
<p>What’s different is that the v2 <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/blob/main/app/models.py">application code</a> now has logic that:</p>
<ul>
<li><p><strong>Reads</strong> from the new structured columns (<code>street_address</code>, <code>city</code>, and so on)</p>
</li>
<li><p><strong>Writes</strong> to BOTH new columns AND the old <code>address</code> column</p>
</li>
<li><p>Accepts API requests with structured address format</p>
</li>
</ul>
<p><strong>Why write to both:</strong> This is crucial. Even though green prefers the new format, it maintains the old format, too. If you need to rollback to blue, all the data blue needs is there and up-to-date. Without this, rollback would be impossible: blue would see empty or stale <code>address</code> fields.</p>
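<p>The dual-write idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the code from the repository's <code>models.py</code>: the function name <code>build_customer_row</code> is hypothetical, the input keys follow the structured request body used later in this tutorial, and the legacy string follows the comma-separated format blue already stores.</p>

```python
def build_customer_row(name: str, email: str, address: dict) -> dict:
    """Return column values for one customer, covering both schemas."""
    street = address["street"]
    city = address["city"]
    state = address["state"]
    zip_code = address["zip"]
    return {
        "name": name,
        "email": email,
        # New structured columns, read by green.
        "street_address": street,
        "city": city,
        "state": state,
        "zip_code": zip_code,
        # Legacy column, kept current so blue can still read it after a rollback.
        "address": ", ".join([street, city, state, zip_code]),
    }


if __name__ == "__main__":
    row = build_customer_row(
        "Jane Smith",
        "jane@example.com",
        {"street": "456 Oak Ave", "city": "Los Angeles", "state": "CA", "zip": "90001"},
    )
    print(row["address"])
```

<p>Because every write produces both representations in one row, either version of the application can read a record created by the other.</p>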
<p>Now create and register the green task definition:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> terraform

<span class="hljs-comment"># Get necessary ARNs</span>
EXECUTION_ROLE_ARN=$(terraform output -raw ecs_task_execution_role_arn)
TASK_ROLE_ARN=$(terraform output -raw ecs_task_role_arn)
DB_SECRET_ARN=$(terraform output -raw db_secret_arn)

<span class="hljs-comment"># Create task definition</span>
cat &gt; task-def-green.json &lt;&lt;EOF
{
  <span class="hljs-string">"family"</span>: <span class="hljs-string">"ecommerce-bluegreen"</span>,
  <span class="hljs-string">"networkMode"</span>: <span class="hljs-string">"awsvpc"</span>,
  <span class="hljs-string">"requiresCompatibilities"</span>: [<span class="hljs-string">"FARGATE"</span>],
  <span class="hljs-string">"cpu"</span>: <span class="hljs-string">"256"</span>,
  <span class="hljs-string">"memory"</span>: <span class="hljs-string">"512"</span>,
  <span class="hljs-string">"executionRoleArn"</span>: <span class="hljs-string">"<span class="hljs-variable">${EXECUTION_ROLE_ARN}</span>"</span>,
  <span class="hljs-string">"taskRoleArn"</span>: <span class="hljs-string">"<span class="hljs-variable">${TASK_ROLE_ARN}</span>"</span>,
  <span class="hljs-string">"containerDefinitions"</span>: [{
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"app"</span>,
    <span class="hljs-string">"image"</span>: <span class="hljs-string">"<span class="hljs-variable">${AWS_ACCOUNT_ID}</span>.dkr.ecr.<span class="hljs-variable">${AWS_REGION}</span>.amazonaws.com/<span class="hljs-variable">${ECR_REPOSITORY}</span>:<span class="hljs-variable">${IMAGE_TAG}</span>"</span>,
    <span class="hljs-string">"essential"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-string">"portMappings"</span>: [{
      <span class="hljs-string">"containerPort"</span>: 8080,
      <span class="hljs-string">"protocol"</span>: <span class="hljs-string">"tcp"</span>
    }],
    <span class="hljs-string">"environment"</span>: [
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"APP_VERSION"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"green"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"ENVIRONMENT"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"production"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"AWS_REGION"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"<span class="hljs-variable">${AWS_REGION}</span>"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_HOST"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"<span class="hljs-variable">${DB_ENDPOINT}</span>"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_PORT"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"5432"</span>},
      {<span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_NAME"</span>, <span class="hljs-string">"value"</span>: <span class="hljs-string">"ecommerce"</span>}
    ],
    <span class="hljs-string">"secrets"</span>: [
      {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_USER"</span>,
        <span class="hljs-string">"valueFrom"</span>: <span class="hljs-string">"<span class="hljs-variable">${DB_SECRET_ARN}</span>:username::"</span>
      },
      {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"DB_PASSWORD"</span>,
        <span class="hljs-string">"valueFrom"</span>: <span class="hljs-string">"<span class="hljs-variable">${DB_SECRET_ARN}</span>:password::"</span>
      }
    ],
    <span class="hljs-string">"logConfiguration"</span>: {
      <span class="hljs-string">"logDriver"</span>: <span class="hljs-string">"awslogs"</span>,
      <span class="hljs-string">"options"</span>: {
        <span class="hljs-string">"awslogs-group"</span>: <span class="hljs-string">"/ecs/ecommerce-bluegreen"</span>,
        <span class="hljs-string">"awslogs-region"</span>: <span class="hljs-string">"<span class="hljs-variable">${AWS_REGION}</span>"</span>,
        <span class="hljs-string">"awslogs-stream-prefix"</span>: <span class="hljs-string">"ecs"</span>
      }
    },
    <span class="hljs-string">"healthCheck"</span>: {
      <span class="hljs-string">"command"</span>: [<span class="hljs-string">"CMD-SHELL"</span>, <span class="hljs-string">"curl -f http://localhost:8080/health || exit 1"</span>],
      <span class="hljs-string">"interval"</span>: 30,
      <span class="hljs-string">"timeout"</span>: 5,
      <span class="hljs-string">"retries"</span>: 3,
      <span class="hljs-string">"startPeriod"</span>: 60
    }
  }]
}
EOF

<span class="hljs-comment"># Register the task definition</span>
aws ecs register-task-definition --cli-input-json file://task-def-green.json
</code></pre>
<p>This JSON tells ECS everything about how to run your container:</p>
<ul>
<li><p>Which Docker image to use (the v2.0.0 we just built)</p>
</li>
<li><p>How much CPU/memory to allocate (256 CPU units = 0.25 vCPU)</p>
</li>
<li><p>Environment variables (notice <code>APP_VERSION</code> is set to "green")</p>
</li>
<li><p>Secrets (database credentials pulled from AWS Secrets Manager)</p>
</li>
<li><p>Health check configuration (curl the /health endpoint every 30 seconds)</p>
</li>
<li><p>Logging configuration (send logs to CloudWatch)</p>
</li>
</ul>
<p><strong>Key detail:</strong> The <code>APP_VERSION</code> environment variable is how the application knows whether to behave as blue or green. Same codebase, different behavior based on configuration.</p>
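<p>A configuration switch like this might look as follows in Python. This is only a sketch: the function name <code>address_mode</code>, the returned labels, and the fallback to <code>blue</code> when the variable is unset are assumptions for illustration, not the project's actual implementation.</p>

```python
import os


def address_mode(environ=None) -> str:
    """Pick the address handling mode from the APP_VERSION variable."""
    env = os.environ if environ is None else environ
    # Assumption: an unset variable falls back to the blue (legacy) behavior.
    version = env.get("APP_VERSION", "blue").lower()
    if version == "green":
        return "structured"  # read the new columns, write both formats
    return "legacy"  # read and write the single address column


if __name__ == "__main__":
    print(address_mode({"APP_VERSION": "green"}))
    print(address_mode({}))
```

<p>Passing the environment in as a parameter keeps the function easy to test without changing real process variables.</p>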
<h3 id="heading-step-6-execute-blue-green-deployment">Step 6: Execute Blue-Green Deployment</h3>
<p>Alright, now it’s time to create the AppSpec file and trigger the deployment:</p>
<pre><code class="lang-bash">TASK_DEF_ARN=$(aws ecs describe-task-definition \
  --task-definition ecommerce-bluegreen \
  --query <span class="hljs-string">'taskDefinition.taskDefinitionArn'</span> \
  --output text)

cat &gt; appspec.json &lt;&lt;EOF
{
  <span class="hljs-string">"version"</span>: 0.0,
  <span class="hljs-string">"Resources"</span>: [{
    <span class="hljs-string">"TargetService"</span>: {
      <span class="hljs-string">"Type"</span>: <span class="hljs-string">"AWS::ECS::Service"</span>,
      <span class="hljs-string">"Properties"</span>: {
        <span class="hljs-string">"TaskDefinition"</span>: <span class="hljs-string">"<span class="hljs-variable">${TASK_DEF_ARN}</span>"</span>,
        <span class="hljs-string">"LoadBalancerInfo"</span>: {
          <span class="hljs-string">"ContainerName"</span>: <span class="hljs-string">"app"</span>,
          <span class="hljs-string">"ContainerPort"</span>: 8080
        }
      }
    }
  }]
}
EOF

<span class="hljs-comment"># Deploy</span>
APPSPEC=$(cat appspec.json | jq -c .)
aws deploy create-deployment \
  --application-name ecommerce-bluegreen \
  --deployment-group-name ecommerce-bluegreen-deployment-group \
  --deployment-config-name CodeDeployDefault.ECSLinear10PercentEvery3Minutes \
  --description <span class="hljs-string">"Blue-green deployment to structured address schema"</span> \
  --cli-input-json <span class="hljs-string">"{
    \"revision\": {
      \"revisionType\": \"AppSpecContent\",
      \"appSpecContent\": {
        \"content\": <span class="hljs-subst">$(echo "$APPSPEC" | jq -Rs .)</span>
      }
    }
  }"</span>

DEPLOYMENT_ID=$(aws deploy list-deployments \
    --application-name ecommerce-bluegreen \
    --deployment-group-name ecommerce-bluegreen-deployment-group \
    --query <span class="hljs-string">'deployments[0]'</span> --output text)
</code></pre>
<p>Monitor the deployment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Watch status</span>
watch -n 10 <span class="hljs-string">"aws deploy get-deployment --deployment-id <span class="hljs-variable">$DEPLOYMENT_ID</span> \
    --query 'deploymentInfo.status' --output text"</span>

<span class="hljs-comment"># Monitor traffic distribution</span>
<span class="hljs-keyword">while</span> <span class="hljs-literal">true</span>; <span class="hljs-keyword">do</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Production: <span class="hljs-subst">$(curl -s $ALB_URL/health | jq -r '.version')</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Test: <span class="hljs-subst">$(curl -s $TEST_URL/health | jq -r '.version')</span>"</span>
    sleep 30
<span class="hljs-keyword">done</span>
</code></pre>
<p>The deployment shifts 10% of traffic every 3 minutes, so the full shift takes roughly 30 minutes.</p>
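<p>To make the timing concrete, the following Python sketch lists how much traffic green would carry at each step of a linear shift. It assumes the first increment is applied as soon as the shift starts, and the function name <code>linear_schedule</code> is illustrative. Treat the output as an approximation and check the CodeDeploy console for the actual timing of your deployment.</p>

```python
def linear_schedule(step_percent: int, interval_minutes: int) -> list:
    """Return (minute, green_percent) pairs for a linear traffic shift.

    Assumes the first increment is applied at minute 0.
    """
    schedule = []
    percent = 0
    minute = 0
    while 100 > percent:
        # Never report more than 100% even if the step does not divide evenly.
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


if __name__ == "__main__":
    for minute, percent in linear_schedule(10, 3):
        print(f"minute {minute}: green at {percent}%")
```

<p>Knowing the approximate percentage at each point helps you judge whether a change in error rate is in proportion to the traffic green is receiving.</p>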
<h3 id="heading-step-7-validate-green-environment">Step 7: Validate Green Environment</h3>
<p>After the deployment begins, you need to validate that the green environment is functioning correctly with the new structured address format before allowing production traffic to reach it.</p>
<p>The CodeDeploy dashboard below shows the traffic migration and deployment status:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768093087711/fc1b869c-7fae-421e-8d98-45769300cb0a.png" alt="Monitoring in CodeDeploy" class="image--center mx-auto" width="2282" height="1460" loading="lazy"></p>
<p>We can also test through the test listener (port 8080), which provides isolated access to green tasks:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Test new structured address API</span>
curl -X POST <span class="hljs-variable">$TEST_URL</span>/api/customers \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{
      "name": "Jane Smith",
      "email": "jane@example.com",
      "address": {
        "street": "456 Oak Ave",
        "city": "Los Angeles",
        "state": "CA",
        "zip": "90001"
      }
    }'</span> | jq

curl <span class="hljs-variable">$ALB_URL</span>/api/customers | jq
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768140730325/57c6a047-994f-4b5e-8e19-4d6fb25ad44e.png" alt="Validate Green environment response" class="image--center mx-auto" width="1422" height="672" loading="lazy"></p>
<p>What you're validating:</p>
<ul>
<li><p>The green environment accepts the new structured address format</p>
</li>
<li><p>Data is correctly written to both new columns (street_address, city, state, zip_code) and the old address column for backwards compatibility</p>
</li>
<li><p>The API response matches expectations for the new schema</p>
</li>
<li><p>Existing data from blue environment is still accessible and readable</p>
</li>
</ul>
<p>If any of these tests fail, you can stop the deployment before production traffic reaches green, preventing customer impact.</p>
<h3 id="heading-step-8-post-deployment-validation">Step 8: Post-Deployment Validation</h3>
<p>Once CodeDeploy completes the traffic shift, all production requests route to green. This is your opportunity to verify that the deployment was successful and that the new version is handling real production traffic correctly.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Verify all production traffic goes to green</span>
<span class="hljs-comment"># Running this multiple times confirms consistent routing</span>
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> {1..10}; <span class="hljs-keyword">do</span>
    curl -s <span class="hljs-variable">$ALB_URL</span>/health | jq -r <span class="hljs-string">'.version'</span>
<span class="hljs-keyword">done</span>
<span class="hljs-comment"># Expected output: "green" for all 10 requests</span>

<span class="hljs-comment"># Test complete CRUD operations with the new API</span>
<span class="hljs-comment"># Create a customer with structured address</span>
CUSTOMER_ID=$(curl -s -X POST <span class="hljs-variable">$ALB_URL</span>/api/customers \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{"name": "Test User", "email": "test@example.com",
         "address": {"street": "789 Test St", "city": "Test City", 
         "state": "TX", "zip": "75001"}}'</span> | jq -r <span class="hljs-string">'.id'</span>)

<span class="hljs-comment"># Read the customer back to verify data persistence</span>
curl <span class="hljs-variable">$ALB_URL</span>/api/customers/<span class="hljs-variable">$CUSTOMER_ID</span> | jq

<span class="hljs-comment"># Update the customer to test modification</span>
curl -X PUT <span class="hljs-variable">$ALB_URL</span>/api/customers/<span class="hljs-variable">$CUSTOMER_ID</span> \
    -H <span class="hljs-string">"Content-Type: application/json"</span> \
    -d <span class="hljs-string">'{"address": {"street": "999 Updated Ave", "city": "Test City", 
         "state": "TX", "zip": "75001"}}'</span> | jq

<span class="hljs-comment"># Delete the test customer for cleanup</span>
curl -X DELETE <span class="hljs-variable">$ALB_URL</span>/api/customers/<span class="hljs-variable">$CUSTOMER_ID</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768140850962/a31273e9-cbc1-4d09-9f6d-7248b402f712.png" alt="Verify all production traffic goes to green" class="image--center mx-auto" width="846" height="270" loading="lazy"></p>
<p>What you're validating:</p>
<ul>
<li><p>Traffic routing is 100% to green with no requests reaching blue</p>
</li>
<li><p>Create operations work with the new structured address format</p>
</li>
<li><p>Read operations return correct data with proper address structure</p>
</li>
<li><p>Update operations successfully modify existing records</p>
</li>
<li><p>Delete operations work without errors</p>
</li>
<li><p>The application correctly writes to both new columns and old address column (enabling potential rollback)</p>
</li>
</ul>
<p>Check your CloudWatch logs and metrics during this validation period for any unexpected errors, increased latency, or database connection issues.</p>
<h3 id="heading-step-9-contract-phase-after-24-72-hours">Step 9: Contract Phase (After 24-72 Hours)</h3>
<p>This is the final phase of expand-contract. We're removing the old <code>address</code> column now that we're confident green is stable. This is the point of no return.</p>
<p><strong>CRITICAL</strong>: Only proceed after green has been stable for your confidence period!</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Backup database first (store the identifier so the wait uses the same name)</span>
SNAPSHOT_ID=pre-contract-$(date +%Y%m%d-%H%M%S)
aws rds create-db-snapshot \
    --db-instance-identifier ecommerce-bluegreen-db \
    --db-snapshot-identifier <span class="hljs-variable">$SNAPSHOT_ID</span>

<span class="hljs-comment"># Wait for snapshot</span>
aws rds <span class="hljs-built_in">wait</span> db-snapshot-completed \
    --db-snapshot-identifier <span class="hljs-variable">$SNAPSHOT_ID</span>

<span class="hljs-comment"># Run contract migration</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f /tmp/002_contract_address.sql

<span class="hljs-comment"># Verify old column is gone</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>
</code></pre>
<p>The contract migration (<a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs/blob/main/migrations/002_contract_address.sql"><code>migrations/002_contract_address.sql</code></a>) removes the old <code>address</code> column.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768140955991/d6f6f287-09e5-4693-a4e9-77c1d9080466.png" alt="d6f6f287-09e5-4693-a4e9-77c1d9080466" class="image--center mx-auto" width="1506" height="444" loading="lazy"></p>
<p><strong>Why wait 24-72 hours:</strong> You want to be absolutely certain green is stable before making irreversible changes. During this waiting period:</p>
<ul>
<li><p>All your monitoring should show green performing normally</p>
</li>
<li><p>You've seen the system handle multiple daily traffic patterns (morning peak, evening peak, overnight)</p>
</li>
<li><p>Scheduled batch jobs have run successfully</p>
</li>
<li><p>You've verified third-party integrations work</p>
</li>
<li><p>No unusual errors or performance degradation</p>
</li>
</ul>
<p>It’s important to snapshot first because once you drop that column, there's no undo button. The snapshot is your safety net. If you discover a critical issue after contracting, you can restore this snapshot and get back to a state where rollback is possible. Without it, you're gambling.</p>
<p><strong>What the contract migration does:</strong></p>
<pre><code class="lang-sql"><span class="hljs-comment">-- migrations/002_contract_address.sql</span>
<span class="hljs-keyword">BEGIN</span>;
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> customers <span class="hljs-keyword">DROP</span> <span class="hljs-keyword">COLUMN</span> address;
<span class="hljs-keyword">COMMIT</span>;
</code></pre>
<p>It's simple but permanent. The old <code>address</code> column is gone. The blue environment will no longer work with this database, as it expects that column to exist. This is fine because blue has been decommissioned (no traffic, tasks terminated).</p>
<p><strong>What to update:</strong> You should also deploy version 3 of your application that removes the dual-write logic. Version 2 (green) still writes to both the new columns and the old <code>address</code> column, and in PostgreSQL any statement that names a dropped column fails. Unless your application already tolerates the missing column, roll out version 3 before you run the contract migration.</p>
<p>Your migration is now complete!</p>
<h2 id="heading-rollback-strategies">Rollback Strategies</h2>
<h3 id="heading-during-deployment-safe-window">During Deployment (Safe Window)</h3>
<p>Use this strategy when you detect issues <strong>during the traffic shift</strong>, before all traffic has moved to green. CodeDeploy is still managing the deployment, which means it can automatically revert traffic distribution to the previous state.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Immediate rollback</span>
aws deploy stop-deployment \
    --deployment-id <span class="hljs-variable">$DEPLOYMENT_ID</span> \
    --auto-rollback-enabled
</code></pre>
<p>You should use this strategy when you notice increased error rates, degraded performance, or functional issues during the canary or linear traffic shift. CodeDeploy automatically shifts all traffic back to blue, and green tasks are terminated. This is the safest and fastest rollback option.</p>
<p>This works because the database still contains the old <code>address</code> column (expand phase), so blue can function normally. No data has been lost or made incompatible.</p>
<h3 id="heading-after-deployment-before-contract">After Deployment (Before Contract)</h3>
<p>Use this when the deployment completed successfully, but you discover issues hours or days later during the monitoring period, before you've run the contract migration. Both blue and green environments still exist, and the database supports both schemas.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Manual listener update</span>
aws elbv2 modify-listener \
    --listener-arn $(terraform output -raw alb_listener_arn) \
    --default-actions Type=forward,TargetGroupArn=$(terraform output -raw blue_target_group_arn)
</code></pre>
<p>Or use the provided script:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> scripts
./rollback.sh
</code></pre>
<p>Use this when you discover bugs in green that weren't caught during initial testing, business metrics show unexpected changes (conversion rates drop, customer complaints increase), or third-party integration issues emerge.</p>
<p>This works because the database still has both old and new schema elements. Blue tasks still exist and can serve traffic immediately. Because green was writing to both old and new columns, blue sees all the latest data.</p>
<p>With this, the traffic immediately shifts from green back to blue. Green continues running for observability, but serves no traffic. You can debug green in place without customer impact.</p>
<h3 id="heading-after-contract-phase">After Contract Phase</h3>
<p>Use this as a <strong>last resort</strong> when you've already removed the old address column, and blue can no longer function with the current database schema. This is significantly more complex and time-consuming than the previous two strategies.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Restore from snapshot</span>
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier ecommerce-bluegreen-db-restored \
    --db-snapshot-identifier pre-contract-YYYYMMDD-HHMMSS
</code></pre>
<p>Only use this strategy when you discover a critical, production-breaking issue after the contract phase, and you have no other option but to return to the previous version.</p>
<p><strong>Why it's painful</strong>:</p>
<ul>
<li><p>Database restore takes 10-30 minutes depending on size</p>
</li>
<li><p>You lose all data written after the snapshot was taken</p>
</li>
<li><p>Requires updating connection strings to point to the restored instance</p>
</li>
<li><p>Need to re-deploy blue environment</p>
</li>
<li><p>Must communicate downtime to users</p>
</li>
</ul>
<p>This is why you wait 24-72 hours before contracting, and take a snapshot immediately before the contract migration. The lengthy waiting period allows you to catch most issues while the safer rollback strategies are still available.</p>
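<p>The three rollback strategies above map onto the phase the migration is in. The small Python sketch below encodes that decision as a lookup table. The names <code>Phase</code> and <code>rollback_plan</code> are hypothetical and serve only to summarize the section. They are not part of the project's scripts.</p>

```python
from enum import Enum


class Phase(Enum):
    SHIFTING = "traffic shift in progress"
    DEPLOYED = "deployment complete, contract not yet run"
    CONTRACTED = "contract migration applied"


def rollback_plan(phase: Phase) -> str:
    """Map the current migration phase to the rollback strategy described above."""
    plans = {
        Phase.SHIFTING: "stop the CodeDeploy deployment with auto rollback",
        Phase.DEPLOYED: "point the production listener back at the blue target group",
        Phase.CONTRACTED: "restore the pre-contract RDS snapshot and redeploy blue",
    }
    return plans[phase]


if __name__ == "__main__":
    for phase in Phase:
        print(f"{phase.value}: {rollback_plan(phase)}")
```

<p>Writing the decision down this way makes the cost difference visible: only the last phase involves restoring data.</p>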
<h2 id="heading-monitoring-during-deployments">Monitoring During Deployments</h2>
<h3 id="heading-essential-metrics">Essential Metrics</h3>
<p>During a blue-green deployment, you need to monitor both environments simultaneously to detect issues early and make informed decisions about proceeding or rolling back. For each target group (blue and green), track these CloudWatch metrics:</p>
<h4 id="heading-1-targetresponsetime">1. TargetResponseTime</h4>
<p>Measures latency from when the load balancer sends a request to when it receives a response. You're looking for sudden spikes or gradual degradation. Green should have similar response times to blue (within 10-20%). If green's latency is significantly higher, you may have performance regressions, inefficient queries with the new schema, or resource constraints.</p>
<h4 id="heading-2-requestcount">2. RequestCount</h4>
<p>Shows traffic volume hitting each target group. During the deployment, you should see blue's count decreasing while green's increases proportionally. If the numbers don't add up (total requests drop significantly), users might be experiencing errors and not retrying. If green receives traffic but shows zero requests, health checks might be failing.</p>
<h4 id="heading-3-httpcodetarget5xxcount">3. HTTPCode_Target_5XX_Count</h4>
<p>Server errors indicate application problems. Even a single 5XX error during deployment warrants investigation. Green should have zero 5XX errors during the initial traffic shift. Any errors could indicate incompatibility issues with the new schema, missing environment variables, or database connection problems.</p>
<h4 id="heading-4-databaseconnections-from-rds-metrics">4. DatabaseConnections (from RDS metrics)</h4>
<p>Shows active database connections from both environments. Watch for connection pool exhaustion, which manifests as a sudden spike or plateau at your max connections limit. If green uses more connections than blue did, you might have connection leaks or inefficient connection handling in the new code.</p>
<h4 id="heading-5-cpuutilization">5. CPUUtilization</h4>
<p>Monitor both ECS task CPU and RDS CPU. Green tasks should use similar CPU to blue tasks for the same request volume. Higher CPU might indicate less efficient code or more complex queries. RDS CPU spikes during deployment often indicate poorly optimized new queries or missing indexes for the new schema.</p>
<p><strong>What to expect</strong>:</p>
<ul>
<li><p>First 5-10 minutes: Green receives 10% traffic, metrics should closely match blue's baseline</p>
</li>
<li><p>15-20 minutes: Green at 30-50% traffic, both environments should show stable metrics</p>
</li>
<li><p>25-30 minutes: Green at 100% traffic, metrics should stabilize at historical levels</p>
</li>
<li><p>Any divergence from these patterns warrants stopping the deployment and investigating</p>
</li>
</ul>
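<p>As a rough illustration of the traffic-shift check described above, the sketch below compares green's observed share of requests with the weight you configured. The function names, the 5% tolerance, and the sample counts are assumptions made for this example. In practice the counts would come from the <code>RequestCount</code> metric of each target group.</p>

```python
def green_share(blue_requests: int, green_requests: int) -> float:
    """Fraction of all requests in the sampled window that green served."""
    total = blue_requests + green_requests
    if total == 0:
        return 0.0
    return green_requests / total


def share_matches_weight(blue_requests, green_requests, expected_weight, tolerance=0.05):
    """True when green's observed share is within `tolerance` of the configured weight."""
    observed = green_share(blue_requests, green_requests)
    return abs(observed - expected_weight) <= tolerance


# Hypothetical RequestCount sums for one window at a 10% green weight
print(share_matches_weight(blue_requests=900, green_requests=100, expected_weight=0.10))  # True
print(share_matches_weight(blue_requests=1000, green_requests=0, expected_weight=0.10))   # False
```

<p>A <code>False</code> result at a stage where green should already be receiving traffic is a reason to pause the shift and look at green's health checks first.</p>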
<p><strong>Custom application metrics</strong>: Beyond infrastructure metrics, monitor business-critical metrics like checkout completion rates, API success rates, and user sign-up flows. Sometimes technical metrics look fine but user-facing functionality is broken.</p>
<h2 id="heading-best-practices">Best Practices</h2>
<h3 id="heading-test-migrations-in-staging">Test Migrations in Staging</h3>
<p>Always run your database migrations against a staging environment that mirrors production scale and complexity before touching production. Copy a recent production snapshot to staging and execute your expand migration there first.</p>
<p><strong>Why this matters</strong>: Migrations that work fine on small datasets can timeout or lock tables on production-scale data. You might discover that adding an index to a 50-million-row table takes 2 hours, or that your column population query needs optimization.</p>
<p><strong>What to test</strong>:</p>
<ul>
<li><p>Migration execution time (should complete in seconds/minutes, not hours)</p>
</li>
<li><p>Table locks and their impact (can reads/writes continue during migration?)</p>
</li>
<li><p>Query performance with new schema (are your indexes still effective?)</p>
</li>
<li><p>Rollback procedures (can you undo the migration if needed?)</p>
</li>
</ul>
<h3 id="heading-use-migration-tools">Use Migration Tools</h3>
<p>Don't write raw SQL migrations manually. Use Flyway, Liquibase, Alembic (for Python), or your framework's built-in migration tools (Rails migrations, Django migrations, Entity Framework migrations).</p>
<p><strong>Why this matters</strong>: Migration tools provide version tracking, rollback capabilities, checksums to prevent tampering, and a standardized way to manage schema changes across environments.</p>
<h3 id="heading-configure-health-checks-properly">Configure Health Checks Properly</h3>
<p>Your health check endpoint should verify that the application can actually function, not just that the process is running. A comprehensive health check validates database connectivity, schema compatibility, and dependent service availability.</p>
<pre><code class="lang-python"><span class="hljs-meta">@app.route('/health')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">health_check</span>():</span>
    checks = {
        <span class="hljs-string">'database'</span>: check_database(),
        <span class="hljs-string">'schema'</span>: check_schema_compatibility(),
        <span class="hljs-string">'cache'</span>: check_cache_connection()
    }

    <span class="hljs-keyword">if</span> all(checks.values()):
        <span class="hljs-keyword">return</span> jsonify(checks), <span class="hljs-number">200</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> jsonify(checks), <span class="hljs-number">503</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_schema_compatibility</span>():</span>
    <span class="hljs-string">"""Verify expected schema elements exist"""</span>
    <span class="hljs-keyword">try</span>:
        result = db.query(<span class="hljs-string">"""
            SELECT column_name 
            FROM information_schema.columns 
            WHERE table_name = 'customers'
            AND column_name IN ('street_address', 'city', 'state', 'zip_code')
        """</span>)
        <span class="hljs-keyword">return</span> len(result) == <span class="hljs-number">4</span>
    <span class="hljs-keyword">except</span> Exception:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
</code></pre>
<p>For ALB health checks specifically, make sure you configure appropriate thresholds in your target group settings. A healthy threshold of 2 means the target must pass 2 consecutive health checks before receiving traffic. An unhealthy threshold of 3 means it must fail 3 consecutive checks before being removed. Set your interval to 30 seconds and timeout to 5 seconds to balance responsiveness with stability.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Terraform configuration for ALB health checks</span>
resource <span class="hljs-string">"aws_lb_target_group"</span> <span class="hljs-string">"green"</span> {
  health_check {
    enabled             = <span class="hljs-literal">true</span>
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    path                = <span class="hljs-string">"/health"</span>
    matcher             = <span class="hljs-string">"200"</span>
  }
}
</code></pre>
<p>This configuration ensures that ECS tasks aren't marked healthy prematurely (preventing traffic to broken tasks) while also not being overly sensitive to transient issues (preventing unnecessary task replacements).</p>
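<p>To get a feel for what these thresholds mean in wall-clock time, here is a small back-of-the-envelope calculation. It is only an approximation: it assumes checks run exactly one interval apart and ignores the timeout and the moment the first check fires. The function name is made up for this example.</p>

```python
def seconds_until_state_change(threshold: int, interval_seconds: int) -> int:
    """Approximate time needed for `threshold` consecutive checks spaced one interval apart."""
    return threshold * interval_seconds


# Values taken from the target group configuration above
healthy_after = seconds_until_state_change(threshold=2, interval_seconds=30)
unhealthy_after = seconds_until_state_change(threshold=3, interval_seconds=30)

print(healthy_after, unhealthy_after)  # 60 90
```

<p>So with these settings a new task needs roughly a minute of passing checks before it receives traffic, and a failing task is removed after roughly a minute and a half.</p>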
<h3 id="heading-plan-the-contract-phase">Plan the Contract Phase</h3>
<p>The contract phase is irreversible, so treat it with appropriate caution. Wait a minimum of 24-72 hours after green deployment before removing old schema elements. This waiting period isn't arbitrary: it ensures you've observed the system under various conditions.</p>
<p><strong>What to verify before contracting</strong>:</p>
<ul>
<li><p>Green has handled multiple daily traffic patterns (morning rush, evening peak, overnight batch jobs)</p>
</li>
<li><p>All scheduled jobs and cron tasks have run successfully with the new schema</p>
</li>
<li><p>Weekly reports or analytics pipelines have completed</p>
</li>
<li><p>Third-party integrations (payment processors, shipping APIs, analytics tools) are working</p>
</li>
<li><p>No unusual error patterns in logs</p>
</li>
<li><p>Business metrics (conversions, sign-ups, purchases) remain stable</p>
</li>
<li><p>Customer support hasn't reported related issues</p>
</li>
</ul>
<p>The pre-contract checklist:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># 1. Create a final snapshot</span>
aws rds create-db-snapshot \
    --db-instance-identifier ecommerce-bluegreen-db \
    --db-snapshot-identifier pre-contract-$(date +%Y%m%d-%H%M%S)

<span class="hljs-comment"># 2. Document current state</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Green tasks: <span class="hljs-subst">$(aws ecs describe-services --cluster ecommerce --services ecommerce-green | jq '.services[0].runningCount')</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Error rate: <span class="hljs-subst">$(aws cloudwatch get-metric-statistics --namespace AWS/ApplicationELB --metric-name HTTPCode_Target_5XX_Count --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)</span> --end-time <span class="hljs-subst">$(date -u +%Y-%m-%dT%H:%M:%S)</span> --period 3600 --statistics Sum)"</span>

<span class="hljs-comment"># 3. Notify team</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Running contract migration at <span class="hljs-subst">$(date)</span>"</span>

<span class="hljs-comment"># 4. Run migration</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -f migrations/002_contract_address.sql

<span class="hljs-comment"># 5. Verify</span>
psql -h <span class="hljs-variable">$DB_ENDPOINT</span> -U dbadmin -d ecommerce -c <span class="hljs-string">"\d customers"</span>
</code></pre>
<h3 id="heading-version-your-apis">Version Your APIs</h3>
<p>When changing data formats, maintain backward compatibility by supporting both old and new API versions simultaneously. This allows API consumers (mobile apps, third-party integrations, other services) to migrate at their own pace without coordinating releases.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Support both API versions during transition</span>
<span class="hljs-meta">@app.route('/api/v1/customers/&lt;id&gt;')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_customer_v1</span>(<span class="hljs-params">id</span>):</span>
    customer = Customer.find(id)
    <span class="hljs-keyword">return</span> jsonify({
        <span class="hljs-string">'id'</span>: customer.id,
        <span class="hljs-string">'name'</span>: customer.name,
        <span class="hljs-string">'address'</span>: customer.address  <span class="hljs-comment"># Old format</span>
    })

<span class="hljs-meta">@app.route('/api/v2/customers/&lt;id&gt;')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_customer_v2</span>(<span class="hljs-params">id</span>):</span>
    customer = Customer.find(id)
    <span class="hljs-keyword">return</span> jsonify({
        <span class="hljs-string">'id'</span>: customer.id,
        <span class="hljs-string">'name'</span>: customer.name,
        <span class="hljs-string">'address'</span>: {  <span class="hljs-comment"># New structured format</span>
            <span class="hljs-string">'street'</span>: customer.street_address,
            <span class="hljs-string">'city'</span>: customer.city,
            <span class="hljs-string">'state'</span>: customer.state,
            <span class="hljs-string">'zip'</span>: customer.zip_code
        }
    })
</code></pre>
<p>To implement this, you can initially deploy both endpoints with blue-green. Then monitor usage of the v1 endpoint over time. Once v1 traffic drops below 1% of total API traffic (meaning most clients have migrated), deprecate it formally. Remove the v1 endpoint in a subsequent release, not during the blue-green deployment itself.</p>
<p>Announce the new API version to consumers with a migration timeline. Give them 2-3 months to update their integrations. Send reminder emails at the halfway point and 2 weeks before v1 shutdown.</p>
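<p>The 1% rule is easy to turn into a small check. The sketch below is one way you might do it, assuming you already collect per-version request counts somewhere. The function name and the sample numbers are hypothetical.</p>

```python
def ready_to_deprecate_v1(v1_requests: int, v2_requests: int, threshold: float = 0.01) -> bool:
    """True when v1 serves less than `threshold` of all requests in the sampled period."""
    total = v1_requests + v2_requests
    if total == 0:
        # No traffic at all tells us nothing about client migration
        return False
    return v1_requests / total < threshold


# Hypothetical request counts over a reporting period
print(ready_to_deprecate_v1(v1_requests=50, v2_requests=9950))   # True (0.5% on v1)
print(ready_to_deprecate_v1(v1_requests=500, v2_requests=9500))  # False (5% on v1)
```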
<h3 id="heading-monitor-both-environments">Monitor Both Environments</h3>
<p>During the transition period, both blue and green are production environments serving real traffic. Monitor them separately to detect version-specific issues.</p>
<p>Set up separate CloudWatch dashboards for blue and green target groups with the same metrics arranged identically. This makes it easy to spot differences at a glance. If green's response time is 200ms while blue's is 50ms, that's a red flag.</p>
<h4 id="heading-alert-on-metric-divergence">Alert on metric divergence</h4>
<p>Create alarms that trigger when green's metrics deviate significantly from blue's baseline. For example, if green's error rate is more than 2x blue's historical average, trigger an alert. If green's database query time is 50% higher, investigate before shifting more traffic.</p>
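<p>Here is a minimal sketch of that comparison logic, using the 2x error rate and 50% query time thresholds from the paragraph above. The dictionary keys and sample values are assumptions for illustration. A real setup would read these numbers from your metrics backend and would more likely express the rule as a CloudWatch alarm.</p>

```python
def divergence_alerts(blue, green, max_error_ratio=2.0, max_query_time_ratio=1.5):
    """Return the names of green metrics that exceed blue's baseline by too much."""
    alerts = []
    if green["error_rate"] > blue["error_rate"] * max_error_ratio:
        alerts.append("error_rate")
    if green["query_ms"] > blue["query_ms"] * max_query_time_ratio:
        alerts.append("query_ms")
    return alerts


# Hypothetical baseline (blue) and current (green) readings
blue_baseline = {"error_rate": 0.002, "query_ms": 40.0}
green_current = {"error_rate": 0.005, "query_ms": 55.0}

print(divergence_alerts(blue_baseline, green_current))  # ['error_rate']
```

<p>Note that if blue's baseline error rate is zero, any error on green trips the first check, which matches the advice to investigate every 5XX during the initial shift.</p>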
<h4 id="heading-log-aggregation">Log aggregation</h4>
<p>Ensure logs from both environments are tagged with their version (<code>environment: blue</code> or <code>environment: green</code>) so you can filter and compare them. Use CloudWatch Logs Insights queries to spot patterns.</p>
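<p>One way to add that tag in a Python service is a logging filter that stamps every record with the environment name. This is a sketch using only the standard library. The <code>EnvironmentFilter</code> class, the <code>build_logger</code> helper, and the log format are assumptions for this example, and in a real task you would probably read the environment name from configuration rather than pass it in by hand.</p>

```python
import io
import logging


class EnvironmentFilter(logging.Filter):
    """Attach the deployment environment (blue or green) to every log record."""

    def __init__(self, environment):
        super().__init__()
        self.environment = environment

    def filter(self, record):
        record.environment = self.environment
        return True  # never drop the record, only annotate it


def build_logger(environment, stream):
    """Create a logger whose output lines start with the environment tag."""
    logger = logging.getLogger(f"app.{environment}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("environment=%(environment)s level=%(levelname)s msg=%(message)s")
    )
    handler.addFilter(EnvironmentFilter(environment))
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer here; a real service would log to stdout
buffer = io.StringIO()
build_logger("green", buffer).info("order created")
print(buffer.getvalue().strip())  # environment=green level=INFO msg=order created
```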
<h2 id="heading-when-not-to-use-blue-green">When NOT to Use Blue-Green</h2>
<p>Blue-green isn't always the right choice. Avoid it when you have:</p>
<ul>
<li><p><strong>Very large database migrations</strong>: If your migration takes hours or requires significant locks, use a traditional maintenance window.</p>
</li>
<li><p><strong>Highly stateful applications</strong>: Real-time collaboration tools or WebSocket applications with complex in-memory state may need rolling deployments instead.</p>
</li>
<li><p><strong>Cost constraints</strong>: Running two full environments roughly doubles compute costs for as long as both are up. Consider canary deployments for cost-sensitive applications.</p>
</li>
<li><p><strong>Complex data model redesigns</strong>: Use the strangler fig pattern to gradually migrate functionality to a new service.</p>
</li>
</ul>
<h3 id="heading-alternative-deployment-strategies">Alternative Deployment Strategies</h3>
<h4 id="heading-canary-deployments">Canary Deployments</h4>
<p>Route a small percentage of traffic (5-10%) to the new version:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"trafficRouting"</span>: {
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"TimeBasedCanary"</span>,
    <span class="hljs-attr">"timeBasedCanary"</span>: {
      <span class="hljs-attr">"canaryPercentage"</span>: <span class="hljs-number">10</span>,
      <span class="hljs-attr">"canaryInterval"</span>: <span class="hljs-number">5</span>
    }
  }
}
</code></pre>
<h4 id="heading-rolling-deployments">Rolling Deployments</h4>
<p>Gradually replace old tasks with new ones:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"deploymentConfiguration"</span>: {
    <span class="hljs-attr">"maximumPercent"</span>: <span class="hljs-number">200</span>,
    <span class="hljs-attr">"minimumHealthyPercent"</span>: <span class="hljs-number">100</span>
  }
}
</code></pre>
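<p>To see what those two percentages mean for a concrete service, the sketch below converts them into task counts. It assumes the rounding rules as I understand them from the ECS documentation (the minimum healthy count is rounded up, the maximum is rounded down). The function name and the desired count of 4 are made up for this example.</p>

```python
def rolling_task_bounds(desired_count: int, minimum_healthy_percent: int, maximum_percent: int):
    """Return (minimum healthy tasks, maximum total tasks) allowed during a rolling deployment."""
    # Integer arithmetic: round the lower bound up and the upper bound down
    min_healthy = (desired_count * minimum_healthy_percent + 99) // 100
    max_total = (desired_count * maximum_percent) // 100
    return min_healthy, max_total


# A service with 4 desired tasks and the configuration shown above
print(rolling_task_bounds(4, minimum_healthy_percent=100, maximum_percent=200))  # (4, 8)
```

<p>With these settings, ECS can start up to four new tasks alongside the four old ones and never drops below full capacity, at the cost of briefly running twice as many tasks.</p>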
<h2 id="heading-cleanup">Cleanup</h2>
<p>After you've successfully completed your blue-green deployment, validated the green environment, and run the contract phase, you need to clean up the AWS resources to avoid unnecessary costs and resource sprawl.</p>
<p><strong>What you're removing</strong>:</p>
<ul>
<li><p>The entire infrastructure stack (VPC, subnets, NAT gateways, load balancer, ECS cluster, RDS database, and all associated resources)</p>
</li>
<li><p>This is appropriate for a tutorial/testing scenario where you deployed everything from scratch</p>
</li>
</ul>
<p>Important considerations before cleanup:</p>
<ul>
<li><p>Ensure you have backups if you need to reference any data later</p>
</li>
<li><p>Export any logs or metrics you want to retain</p>
</li>
<li><p>Document lessons learned from the deployment</p>
</li>
<li><p>Verify no production traffic is still using these resources</p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> terraform

<span class="hljs-comment"># Terraform will prompt you to confirm with "yes"</span>
<span class="hljs-comment"># Review the destruction plan carefully before confirming</span>
terraform destroy  <span class="hljs-comment"># Takes ~10-15 minutes</span>
</code></pre>
<p><strong>Partial cleanup</strong>: If you want to keep certain resources (like the RDS instance, for reference), you can remove them from Terraform state before destroying. Terraform then stops managing them and leaves them in place:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Remove RDS from Terraform management before destroying</span>
terraform state rm aws_db_instance.main
terraform destroy  <span class="hljs-comment"># Now destroys everything except RDS</span>
</code></pre>
<p>For production environments, you would NOT destroy everything. Instead, you'd decommission the blue environment specifically after confirming green is stable:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Production scenario - remove only blue environment</span>
terraform destroy -target=aws_ecs_service.blue
terraform destroy -target=aws_lb_target_group.blue
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Blue-green deployments with databases require careful planning, but the expand-contract pattern makes it manageable.</p>
<p>Here are some key takeaways:</p>
<ol>
<li><p><strong>Use expand-contract as default</strong> – Maintains backwards compatibility and safe rollbacks.</p>
</li>
<li><p><strong>Externalize state</strong> – Sessions, caches, and storage should use external services.</p>
</li>
<li><p><strong>Plan for three phases</strong> – Don't rush to the contract phase.</p>
</li>
<li><p><strong>Test everything in staging</strong> – Mirror production scale and complexity.</p>
</li>
<li><p><strong>Monitor aggressively</strong> – Track technical and business metrics for both environments.</p>
</li>
<li><p><strong>Know when to use alternatives</strong> – Blue-green isn't always the answer.</p>
</li>
<li><p><strong>Document rollback procedures</strong> – Everyone should know the rollback process before deployment.</p>
</li>
</ol>
<p>The expand-contract pattern requires more work upfront, but this investment pays dividends in reduced risk and maintained uptime. With the strategies and complete implementation provided here, you can successfully deploy even complex, stateful applications with confidence.</p>
<p>As always, I hope you enjoyed this guide and learned something. If you want to stay connected or see more hands-on DevOps content, you can follow me on <a target="_blank" href="https://www.linkedin.com/in/destiny-erhabor">LinkedIn</a>.</p>
<p>For more practical hands-on Cloud/DevOps projects like this one, follow and star this repository: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building">Learn-DevOps-by-building</a>.</p>
<h2 id="heading-further-resources">Further Resources</h2>
<ul>
<li><p>Complete Code: <a target="_blank" href="https://github.com/Caesarsage/bluegreen-deployment-ecs">github.com/Caesarsage/bluegreen-deployment-ecs</a></p>
</li>
<li><p>Learn DevOps by Building: <a target="_blank" href="https://github.com/Caesarsage/Learn-DevOps-by-building">GitHub repo</a></p>
</li>
<li><p>AWS ECS Blue/Green Documentation: <a target="_blank" href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-bluegreen.html">AWS Docs</a></p>
</li>
<li><p>AWS CodeDeploy for ECS: <a target="_blank" href="https://docs.aws.amazon.com/codedeploy/latest/userguide/deployment-steps-ecs.html">AWS Docs</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How Relational Database Constraints Work and Why They're Important ]]>
                </title>
                <description>
                    <![CDATA[ Databases are a crucial tool because they store the data that power our day-to-day lives. Databases are designed to match the real world as much as possible, so they store data of different forms, about different things, just as it is in the world. T... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-relational-database-constraints-work-and-why-theyre-important/</link>
                <guid isPermaLink="false">69681d224dcb07c08e435626</guid>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Relational Database ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Zubair Idris Aweda ]]>
                </dc:creator>
                <pubDate>Wed, 14 Jan 2026 22:48:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768416017042/66390973-a4cb-4e7a-9161-2d737045bf7b.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Databases are a crucial tool because they store the data that power our day-to-day lives. Databases are designed to match the real world as much as possible, so they store data of different forms, about different things, just as it is in the world.</p>
<p>There are many rules that govern how entities interact with each other, to make things work. For example, a student can’t take a course that the school doesn’t offer. A soccer player can’t have a jersey number less than 1 or greater than 99. And a car must always have a plate number.</p>
<p>Relational databases are also able to represent and enforce these rules using <strong>constraints</strong>. And in this article, I’ll explain how constraints work with practical examples.</p>
<p>Whether you’re a beginner or just looking to refresh your knowledge, this article will help you learn the essentials. If you need some more background, you can read this article on the <a target="_blank" href="https://www.freecodecamp.org/news/learn-relational-database-basics-key-concepts-for-beginners/">basics of relational databases</a> before continuing.</p>
<h3 id="heading-what-well-cover">What We’ll Cover:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-a-relational-database-constraint">What is a Relational Database Constraint?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-types-of-relational-database-constraints">Types of Relational Database Constraints</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-inherent-model-based-constraints-implicit-constraints">Inherent Model-based Constraints (Implicit Constraints)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-schema-based-constraints-explicit-constraints">Schema-based Constraints (Explicit Constraints)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-application-based-constraints-semantic-constraints">Application-based constraints (Semantic constraints)</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-constraints">Testing Constraints</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-how-to-delete-a-record">How to Delete a Record</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-summary">Summary</a></p>
</li>
</ol>
<h2 id="heading-what-is-a-relational-database-constraint">What is a Relational Database Constraint?</h2>
<p>Relational database constraints are a set of database rules that are used to define or determine what set of values are acceptable or valid in a database. They’re usually based on the many rules of the real world.</p>
<p>They are put in place to:</p>
<ul>
<li><p>Ensure data accuracy: only values that would be acceptable in real life should be acceptable in the database. Learn more about data accuracy <a target="_blank" href="https://www.ibm.com/think/topics/data-accuracy">here</a>.</p>
</li>
<li><p>Ensure data integrity: values in the database remain correct, accurate, complete, and valid as long as the database exists. Learn more about data integrity <a target="_blank" href="https://www.fortinet.com/uk/resources/cyberglossary/data-integrity">here</a>.</p>
</li>
<li><p>Ensure data consistency: values always maintain the same agreed form throughout their lifetime.</p>
</li>
</ul>
<p>These rules limit what can be entered into a database or what can be deleted from it. They also limit updates, so that data stays valid after it's first created.</p>
<blockquote>
<p>These integrity constraints help enforce business rules on data in the tables to ensure the accuracy and reliability of the data. - <a target="_blank" href="https://aws.amazon.com/rds/what-is-a-relational-database/">AWS</a></p>
</blockquote>
<h2 id="heading-types-of-relational-database-constraints">Types of Relational Database Constraints</h2>
<p>There are many ways to group or categorise database constraints, depending on how they’re applied or what they’re preventing. This article focuses on three popular types:</p>
<ul>
<li><p>Inherent model-based constraints (implicit constraints)</p>
</li>
<li><p>Schema-based constraints (explicit constraints)</p>
</li>
<li><p>Application-based constraints (semantic constraints)</p>
</li>
</ul>
<h3 id="heading-inherent-model-based-constraints-implicit-constraints">Inherent Model-based Constraints (Implicit Constraints)</h3>
<p>These rules are the base rules that come with the database and are enforced by the DBMS. Some of these rules are:</p>
<ul>
<li><p>Each row must be unique. In the relational model, this holds with or without a <code>UNIQUE</code> or <code>PRIMARY KEY</code> constraint, although most SQL databases only enforce it when you declare one.</p>
</li>
<li><p>Columns can only store one value at a time. The value of a field like <code>age</code> will always be one value like 23, not 23 and 35.</p>
</li>
<li><p>Each column name in a table must be unique.</p>
</li>
<li><p>Columns exist for all rows. Every row will have the same number of columns. For some of the rows, the data might be empty, but the column will always be there.</p>
</li>
</ul>
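<p>You can watch a DBMS enforce one of these implicit rules with a quick experiment. The sketch below uses Python's built-in <code>sqlite3</code> module and an in-memory database, which is just a convenient choice for this example, and tries to create a table with two columns of the same name.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

try:
    # Two columns with the same name break the "unique column names" rule
    conn.execute("CREATE TABLE people (age INT, age INT)")
    outcome = "created"
except sqlite3.OperationalError as error:
    outcome = f"rejected: {error}"

print(outcome)
```

<p>No constraint was declared anywhere, yet the statement is rejected. That's what makes the rule implicit.</p>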
<h3 id="heading-schema-based-constraints-explicit-constraints">Schema-based Constraints (Explicit Constraints)</h3>
<p>These constraints are expressed by the developer or database designer on database creation. They’re expressed directly in the database schemas, using the <a target="_blank" href="https://en.wikipedia.org/wiki/Data_definition_language">DDL</a>.</p>
<p>These can be further broken down into:</p>
<ul>
<li><p>Domain constraints</p>
</li>
<li><p>Key constraints</p>
<ul>
<li><p>Entity integrity constraint (Primary key)</p>
</li>
<li><p>Unique constraint (Unique key)</p>
</li>
<li><p>Referential integrity constraint (Foreign key)</p>
</li>
</ul>
</li>
</ul>
<h4 id="heading-1-domain-constraints">1. Domain Constraints</h4>
<p>These are used to define a range or set of possible values for an attribute of a database table. They help ensure that column values are valid and consistent by defining acceptable data types, formats, and ranges for an attribute. This prevents incorrect or illogical data entry and maintains data integrity.</p>
<p>You can define them simply by specifying a data type that the values must follow. For example, the <code>age</code> of a person can only be a number, or could be a number between 18-60 if the database is for a company, or a number between 5-65 if it’s for an amusement park.</p>
<p>The database will enforce this rule by rejecting age values outside of the given range or type. The DDL for the age would look like this:</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Three alternative definitions of age; a table can only use one of them</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> people (
    age <span class="hljs-built_in">INT</span> <span class="hljs-comment">-- Any integer value is allowed</span>
    <span class="hljs-comment">-- age INT CHECK (age BETWEEN 18 AND 60) -- Only allows ages between 18 and 60</span>
    <span class="hljs-comment">-- age INT CHECK (age BETWEEN 5 AND 65) -- Only allows ages between 5 and 65</span>
);
</code></pre>
<p>The <code>INT</code> means only integer values are accepted, and the <code>CHECK</code> is used with the <code>BETWEEN</code> and <code>AND</code> keywords to specify the sub-domain or range of values.</p>
<p>Other <a target="_blank" href="https://www.w3schools.com/sql/sql_datatypes.asp">data types in SQL</a> include: <code>CHAR</code>, <code>BIT</code>, <code>DATE</code>, <code>VARCHAR</code> and so on. You can use all of them to define the acceptable domain for database values.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> employees (
    employee_id <span class="hljs-built_in">INT</span>,
    <span class="hljs-keyword">name</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>),
    age <span class="hljs-built_in">INT</span> <span class="hljs-keyword">CHECK</span> (age <span class="hljs-keyword">BETWEEN</span> <span class="hljs-number">18</span> <span class="hljs-keyword">AND</span> <span class="hljs-number">60</span>)
);
</code></pre>
<p>As well as defining a range of acceptable values, you can also define the optionality of an attribute using the <code>NOT NULL</code> keyword. You’d use this in cases where the data must exist and must also be within the given range.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> employees (
    employee_id <span class="hljs-built_in">INT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    <span class="hljs-keyword">name</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    age <span class="hljs-built_in">INT</span> <span class="hljs-keyword">CHECK</span> (age <span class="hljs-keyword">BETWEEN</span> <span class="hljs-number">18</span> <span class="hljs-keyword">AND</span> <span class="hljs-number">60</span>)
);
</code></pre>
<p>In this example, every employee record needs to have an <code>employee_id</code> and a <code>name</code> but not an <code>age</code>. This works for real life situations where, although the range of values is known, the actual value is either unknown or doesn’t exist. An example would be the minor course of study of a student at a university – many students only have majors, and as such, the minor course of study will be empty (NULL) for those students.</p>
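<p>To see these domain rules in action, here's a small experiment using Python's built-in <code>sqlite3</code> module with an in-memory database. SQLite is only a convenient stand-in, and the <code>try_insert</code> helper and sample rows are made up for this example. It creates the same <code>employees</code> table and tries a few inserts.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INT NOT NULL,
        name VARCHAR(100) NOT NULL,
        age INT CHECK (age BETWEEN 18 AND 60)
    )
""")


def try_insert(employee_id, name, age):
    """Attempt an insert and report whether the constraints allowed it."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", (employee_id, name, age))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert(1, "Ada", 30),    # valid row
    try_insert(2, "Bob", None),  # age is optional, so NULL is fine
    try_insert(3, "Cy", 17),     # age is outside the 18-60 range
    try_insert(4, None, 25),     # name is NOT NULL
]
print(results)  # ['accepted', 'accepted', 'rejected', 'rejected']
```

<p>Notice that the NULL age passes the <code>CHECK</code>: the range only applies when a value is actually provided.</p>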
<h4 id="heading-2-entity-integrity-constraint-primary-key">2. Entity integrity constraint (Primary key)</h4>
<p>This ensures that no primary key is NULL. The primary key is the one attribute or set of attributes that must be unique to each row in the database. It’s the primary value that uniquely identifies the rest of the data. This means that every row in the database will remain uniquely identifiable with a primary key.</p>
<p>A NULL primary key means that rows will not be unique, or identifiable, and the database can contain duplicates. Without the primary key, we can’t have data consistency.</p>
<p>For example, in a school, every student will have a unique student id number with which they can always be distinguished from other students. The government uses methods like passport numbers or tax ids to uniquely identify citizens.</p>
<p>In our example, it’s impossible to be a student without a student id number. You can implement this constraint by using the <code>PRIMARY KEY</code> keyword.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> employees (
    employee_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    <span class="hljs-keyword">name</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>),
    age <span class="hljs-built_in">INT</span> <span class="hljs-keyword">CHECK</span> (age <span class="hljs-keyword">BETWEEN</span> <span class="hljs-number">18</span> <span class="hljs-keyword">AND</span> <span class="hljs-number">60</span>)
);
</code></pre>
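<p>Here's a quick way to watch the primary key reject a duplicate, again using Python's built-in <code>sqlite3</code> module as a stand-in database. The sample rows are made up. The example only tests duplicates because SQLite, unlike most other databases, allows NULL in a primary key column unless you also declare it <code>NOT NULL</code>.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT PRIMARY KEY, name VARCHAR(100))")
conn.execute("INSERT INTO employees VALUES (1, 'Ada')")

try:
    # Same employee_id as the existing row
    conn.execute("INSERT INTO employees VALUES (1, 'Bob')")
    outcome = "accepted"
except sqlite3.IntegrityError:
    outcome = "rejected"

print(outcome)  # rejected
```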
<h4 id="heading-3-unique-constraint-unique-key">3. Unique constraint (Unique key)</h4>
<p>This is similar to the <strong>Entity integrity constraint</strong> in that it only accepts unique values – but it’s different in that it accepts NULL values.</p>
<p>For example, in a students table, every student must have a student id number that uniquely identifies them. This number cannot be NULL, and it must be unique. Students can also have an email address where the school can reach them. This email must be unique for each student. But not every student has to have an email. So the condition is: <strong>“If the value exists, it must be unique”</strong>.</p>
<p>You can implement this constraint using the <code>UNIQUE</code> keyword, like this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> students (
    student_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>, <span class="hljs-comment">-- Must exist and must be unique</span>
    email <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>) <span class="hljs-keyword">UNIQUE</span> <span class="hljs-comment">-- Can be NULL, but must be unique if provided</span>
);
</code></pre>
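<p>The “if the value exists, it must be unique” rule can be checked with a short sketch. It assumes Python’s built-in <code>sqlite3</code> module as the DBMS, which the original example does not specify. SQLite accepts several NULL values in a <code>UNIQUE</code> column, but this detail varies between database systems, so treat the NULL behavior shown here as specific to this setup.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INT PRIMARY KEY, email VARCHAR(255) UNIQUE)"
)

# Two students without an email: both NULLs are accepted by SQLite.
conn.execute("INSERT INTO students VALUES (1, NULL)")
conn.execute("INSERT INTO students VALUES (2, NULL)")
conn.execute("INSERT INTO students VALUES (3, 'alice@example.com')")

# A second student with Alice's email violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO students VALUES (4, 'alice@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_count = conn.execute(
    "SELECT COUNT(*) FROM students WHERE email IS NULL"
).fetchone()[0]

print("duplicate email rejected:", duplicate_rejected)
print("students without an email:", null_count)
```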
<h4 id="heading-4-referential-integrity-constraint-foreign-key">4. Referential integrity constraint (Foreign key)</h4>
<p>This constraint guards the relationship between two related tables. It is used to maintain consistency in the relationship. It requires that data from one table, A, being referenced in another table, B, must exist in the original table, A. For example, a student can’t register for a course the school doesn’t have.</p>
<p>To enforce this, the <code>FOREIGN KEY</code> keyword is used with the <code>REFERENCES</code> keyword to define the table being referenced and the attribute being referred to.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> courses (
    course_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    course_name <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>
);

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> students (
    student_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    student_name <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    course_id <span class="hljs-built_in">INT</span>,
    <span class="hljs-keyword">FOREIGN</span> <span class="hljs-keyword">KEY</span> (course_id) <span class="hljs-keyword">REFERENCES</span> courses(course_id)
);
</code></pre>
<p>In this example, every value provided in the <code>course_id</code> column of the <code>students</code> table must exist in the <code>courses</code> table.</p>
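<p>Here is a minimal sketch of that rule in action, assuming Python’s built-in <code>sqlite3</code> module as the DBMS (an assumption, since the original does not name one). SQLite only enforces foreign keys after <code>PRAGMA foreign_keys = ON</code> is run on the connection. The sample rows are made up for illustration.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.execute(
    "CREATE TABLE courses (course_id INT PRIMARY KEY, course_name VARCHAR(100) NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE students (
        student_id INT PRIMARY KEY,
        student_name VARCHAR(100) NOT NULL,
        course_id INT,
        FOREIGN KEY (course_id) REFERENCES courses(course_id)
    )
    """
)

conn.execute("INSERT INTO courses VALUES (1, 'Mathematics')")
conn.execute("INSERT INTO students VALUES (101, 'Alice', 1)")   # course 1 exists
conn.execute("INSERT INTO students VALUES (102, 'Bob', NULL)")  # no course yet: allowed

# Course 999 is not in the courses table, so this row is rejected.
try:
    conn.execute("INSERT INTO students VALUES (103, 'Charlie', 999)")
    missing_course_rejected = False
except sqlite3.IntegrityError:
    missing_course_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print("unknown course rejected:", missing_course_rejected)
```

<p>Because <code>course_id</code> is nullable here, a student with no course is still accepted. Only a value that points at a missing course is refused.</p>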
<h3 id="heading-application-based-constraints-semantic-constraints">Application-based constraints (Semantic constraints)</h3>
<p>These can also be called <strong>business rules</strong>. They can’t be directly expressed in the database schema, so they’re often implemented in the application layer instead.</p>
<p>These are logical constraints, like saying <strong>“a course cannot have more than 30 students enrolled”</strong> or <strong>“a customer cannot place an order if it would exceed their credit limit”</strong>.</p>
<p>These rules are usually best implemented in the application, because it would be too complex (or sometimes impossible) to implement them in the database itself.</p>
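<p>As a rough sketch of an application-layer rule, the snippet below enforces a course capacity limit in Python before inserting an enrollment. The <code>enroll</code> function, the simplified two-table schema, and the tiny capacity of 2 are all hypothetical choices made to keep the example short. It uses Python’s built-in <code>sqlite3</code> module as an assumed DBMS.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (course_id INT PRIMARY KEY, max_students INT CHECK (max_students > 0))"
)
conn.execute("CREATE TABLE enrollments (student_id INT NOT NULL, course_id INT NOT NULL)")
conn.execute("INSERT INTO courses VALUES (1, 2)")  # capacity of 2 keeps the demo short


def enroll(student_id, course_id):
    """Hypothetical application-layer check: refuse enrollment into a full course."""
    with conn:  # commits on success, rolls back if an exception is raised
        (limit,) = conn.execute(
            "SELECT max_students FROM courses WHERE course_id = ?", (course_id,)
        ).fetchone()
        (current,) = conn.execute(
            "SELECT COUNT(*) FROM enrollments WHERE course_id = ?", (course_id,)
        ).fetchone()
        if current >= limit:
            raise ValueError(f"course {course_id} is full ({current}/{limit})")
        conn.execute("INSERT INTO enrollments VALUES (?, ?)", (student_id, course_id))


enroll(101, 1)
enroll(102, 1)
try:
    enroll(103, 1)  # third student exceeds the capacity of 2
    third_enrolled = True
except ValueError as exc:
    third_enrolled = False
    print("rejected:", exc)
```

<p>Note that a check-then-insert like this is only safe with a single writer. With concurrent users, the application would also need locking or a suitable transaction isolation level, which is part of why such rules take more care than schema constraints.</p>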
<h2 id="heading-testing-constraints">Testing Constraints</h2>
<p>To demonstrate the constraints we’ve discussed here, let’s look at this sample school database setup:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> courses (course_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>, course_name <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>, max_students <span class="hljs-built_in">INT</span> <span class="hljs-keyword">CHECK</span> (max_students &gt; <span class="hljs-number">0</span>));

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> students (student_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>, student_name <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>, email <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">UNIQUE</span>, age <span class="hljs-built_in">INT</span> <span class="hljs-keyword">CHECK</span> (age <span class="hljs-keyword">BETWEEN</span> <span class="hljs-number">5</span> <span class="hljs-keyword">AND</span> <span class="hljs-number">25</span>));

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> enrollments (
    enrollment_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>,
    student_id <span class="hljs-built_in">INT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    course_id <span class="hljs-built_in">INT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    enrollment_date <span class="hljs-built_in">DATE</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>,
    <span class="hljs-keyword">FOREIGN</span> <span class="hljs-keyword">KEY</span> (student_id) <span class="hljs-keyword">REFERENCES</span> students (student_id),
    <span class="hljs-keyword">FOREIGN</span> <span class="hljs-keyword">KEY</span> (course_id) <span class="hljs-keyword">REFERENCES</span> courses (course_id)
);
</code></pre>
<p>This shows the creation of a sample school database with three tables: <code>courses</code>, <code>students</code>, and <code>enrollments</code>.</p>
<p>The <code>courses</code> table includes a primary key for course IDs, course names, and a constraint ensuring that the maximum number of students is greater than zero. The <code>students</code> table contains a primary key for student IDs, student names, unique email addresses, and an age constraint between 5 and 25. The <code>enrollments</code> table links students to courses with a primary key for enrollment IDs and foreign keys referencing the <code>students</code> and <code>courses</code> tables, along with a non-null enrollment date.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410057148/ffe42706-4540-4ddb-8394-c157fe999b96.png" alt="DDL to create database tables" class="image--center mx-auto" width="2310" height="628" loading="lazy"></p>
<p>At this point, the tables are created and set up with the constraints that govern them.</p>
<p>Now we’ll test a few queries:</p>
<ol>
<li>Insert courses, Mathematics and History, into the <code>courses</code> table:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> courses (course_id, course_name, max_students) <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">1</span>, <span class="hljs-string">'Mathematics'</span>, <span class="hljs-number">30</span>);
<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    courses (course_id, course_name, max_students)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-number">2</span>, <span class="hljs-string">'History'</span>, <span class="hljs-number">25</span>);
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410234586/bb08fb52-9573-4f57-88d2-6383138ddc7f.png" alt="Query to insert courses" class="image--center mx-auto" width="2310" height="392" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410261770/f974693c-25fc-4952-80c7-6db7f4959236.png" alt="Result of insert query" class="image--center mx-auto" width="2310" height="162" loading="lazy"></p>
<p>The query works perfectly, as the records get inserted.</p>
<ol start="2">
<li>Insert students, Alice and Bob, into the <code>students</code> table:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    students (student_id, student_name, email, age)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-number">101</span>, <span class="hljs-string">'Alice'</span>, <span class="hljs-string">'alice@example.com'</span>, <span class="hljs-number">20</span>);

<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    students (student_id, student_name, email, age)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-number">102</span>, <span class="hljs-string">'Bob'</span>, <span class="hljs-literal">NULL</span>, <span class="hljs-number">18</span>);
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410360967/92aee819-63fe-4b3c-af2b-b579d66e58a9.png" alt="Query to insert students" class="image--center mx-auto" width="2310" height="532" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410407790/ec9893cf-ded6-4ec2-bac1-c78d430a84fb.png" alt="Result of query" class="image--center mx-auto" width="2310" height="158" loading="lazy"></p>
<p>The query works perfectly, as the records get inserted.</p>
<ol start="3">
<li>Enroll Alice into Mathematics:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    enrollments (enrollment_id, student_id, course_id, enrollment_date)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-number">1001</span>, <span class="hljs-number">101</span>, <span class="hljs-number">1</span>, <span class="hljs-string">'2026-01-14'</span>);
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410510198/be229077-028c-4048-836c-fb7796b02f7d.png" alt="Query to insert enrollment" class="image--center mx-auto" width="2310" height="290" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410543235/f546d371-da31-4f42-be3d-20bd2bb1aa3a.png" alt="Result of query" class="image--center mx-auto" width="2310" height="114" loading="lazy"></p>
<p>The query works perfectly, as the record gets inserted.</p>
<ol start="4">
<li>Insert a new student, Charlie, into the <code>students</code> table:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    students (student_id, student_name, email, age)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-number">103</span>, <span class="hljs-string">'Charlie'</span>, <span class="hljs-string">'charlie@example.com'</span>, <span class="hljs-number">30</span>);
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768410642880/0b9f038d-fb4f-49e1-a92f-3c1fd01c3fae.png" alt="Failed query to insert student" class="image--center mx-auto" width="2310" height="290" loading="lazy"></p>
<p>This fails because Charlie has an <code>age</code> value of 30, which is outside of the specified range of <code>age INT CHECK (age BETWEEN 5 AND 25)</code>. The record of Charlie never gets added.</p>
<p>Here’s a list of some other queries that will fail:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    students (student_id, student_name, email, age)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-number">104</span>, <span class="hljs-string">'David'</span>, <span class="hljs-string">'alice@example.com'</span>, <span class="hljs-number">19</span>); <span class="hljs-comment">-- Fails for duplicate email</span>

<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    students (student_id, student_name, email, age)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-literal">NULL</span>, <span class="hljs-string">'Evra'</span>, <span class="hljs-string">'evra@example.com'</span>, <span class="hljs-number">20</span>); <span class="hljs-comment">-- Fails for NULL primary key</span>

<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span>
    enrollments (enrollment_id, student_id, course_id, enrollment_date)
<span class="hljs-keyword">VALUES</span>
    (<span class="hljs-number">1002</span>, <span class="hljs-number">999</span>, <span class="hljs-number">1</span>, <span class="hljs-string">'2026-01-14'</span>); <span class="hljs-comment">-- Fails for invalid student reference</span>
</code></pre>
<p>In each case, the DBMS will provide a reason for the rejection or failure.</p>
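<p>To see what those reasons look like, the sketch below recreates the school schema with Python’s built-in <code>sqlite3</code> module and collects the error text for three of the failing statements. SQLite is an assumption here, and other database systems word their messages differently. The NULL primary key case is left out because SQLite accepts NULL in an <code>INT PRIMARY KEY</code> column unless <code>NOT NULL</code> is also declared.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE courses (
        course_id INT PRIMARY KEY,
        course_name VARCHAR(100) NOT NULL,
        max_students INT CHECK (max_students > 0)
    );
    CREATE TABLE students (
        student_id INT PRIMARY KEY,
        student_name VARCHAR(100) NOT NULL,
        email VARCHAR(100) UNIQUE,
        age INT CHECK (age BETWEEN 5 AND 25)
    );
    CREATE TABLE enrollments (
        enrollment_id INT PRIMARY KEY,
        student_id INT NOT NULL,
        course_id INT NOT NULL,
        enrollment_date DATE NOT NULL,
        FOREIGN KEY (student_id) REFERENCES students (student_id),
        FOREIGN KEY (course_id) REFERENCES courses (course_id)
    );
    INSERT INTO courses VALUES (1, 'Mathematics', 30);
    INSERT INTO students VALUES (101, 'Alice', 'alice@example.com', 20);
    """
)

failing_statements = {
    "age out of range": "INSERT INTO students VALUES (103, 'Charlie', 'charlie@example.com', 30)",
    "duplicate email": "INSERT INTO students VALUES (104, 'David', 'alice@example.com', 19)",
    "unknown student": "INSERT INTO enrollments VALUES (1002, 999, 1, '2026-01-14')",
}

errors = {}
for label, statement in failing_statements.items():
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError as exc:
        errors[label] = str(exc)  # the DBMS's stated reason for the rejection

for label, reason in errors.items():
    print(f"{label}: {reason}")
```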
<ol start="5">
<li>Delete Bob from the <code>students</code> table:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> students
<span class="hljs-keyword">WHERE</span>
    student_id = <span class="hljs-number">102</span>;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768411198451/3e33efdb-68e9-4cf5-a809-d2053059c29d.png" alt="Query to delete student" class="image--center mx-auto" width="2310" height="260" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768411236739/50b24978-2612-4232-8169-3a24377b39a0.png" alt="Result of query" class="image--center mx-auto" width="2310" height="114" loading="lazy"></p>
<p>The query works perfectly, as the record gets deleted.</p>
<ol start="6">
<li>Delete Alice from the <code>students</code> table:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> students
<span class="hljs-keyword">WHERE</span>
    student_id = <span class="hljs-number">101</span>; <span class="hljs-comment">-- Fails for referential integrity constraint</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768411336408/8fc634c2-1f10-41f2-b413-4c6b5af22369.png" alt="Failed query to delete students" class="image--center mx-auto" width="2310" height="278" loading="lazy"></p>
<p>This fails because Alice, with a <code>student_id</code> of 101, has an enrollment record in the <code>enrollments</code> table. Deleting her record would leave an enrollment record for a non-existent student, which should not be possible.</p>
<h3 id="heading-how-to-delete-a-record">How to Delete a Record</h3>
<p>In some cases, you do want to delete a record, even though it has records tied to it. There are two main ways to go about this:</p>
<h4 id="heading-cascade">CASCADE</h4>
<p>Use <code>CASCADE</code> when child records should not exist without their parent. When a parent record is deleted, all dependent (child) records in other tables are <strong>automatically deleted</strong>. For example, you can use this to ensure that all enrollment records are deleted when a course is no longer available, or when a student is no longer in the school.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> enrollments (enrollment_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>, student_id <span class="hljs-built_in">INT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>, course_id <span class="hljs-built_in">INT</span> <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>, <span class="hljs-keyword">FOREIGN</span> <span class="hljs-keyword">KEY</span> (course_id) <span class="hljs-keyword">REFERENCES</span> courses (course_id) <span class="hljs-keyword">ON</span> <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">CASCADE</span>);

<span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">FROM</span> courses
<span class="hljs-keyword">WHERE</span>
    course_id = <span class="hljs-number">1</span>;
</code></pre>
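<p>The effect of the cascade can be observed with a small sketch. It assumes Python’s built-in <code>sqlite3</code> module as the DBMS and uses made-up sample rows. In SQLite, <code>ON DELETE</code> actions only fire once foreign key enforcement is switched on for the connection.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite for ON DELETE actions

conn.execute(
    "CREATE TABLE courses (course_id INT PRIMARY KEY, course_name VARCHAR(100) NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE enrollments (
        enrollment_id INT PRIMARY KEY,
        student_id INT NOT NULL,
        course_id INT NOT NULL,
        FOREIGN KEY (course_id) REFERENCES courses (course_id) ON DELETE CASCADE
    )
    """
)
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)", [(1, "Mathematics"), (2, "History")]
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(1001, 101, 1), (1002, 102, 1), (1003, 101, 2)],
)

# Deleting the Mathematics course also removes its two enrollment rows.
conn.execute("DELETE FROM courses WHERE course_id = 1")

remaining = [
    row[0]
    for row in conn.execute("SELECT enrollment_id FROM enrollments ORDER BY enrollment_id")
]
print("remaining enrollments:", remaining)
```

<p>Only the History enrollment survives, because it references a course that still exists.</p>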
<h4 id="heading-set-null-or-set-default">SET NULL or SET DEFAULT</h4>
<p>You can use these options when child records can still exist without the parent. The foreign key column in every dependent (child) record is <strong>automatically set to NULL</strong> or <strong>automatically set to a defined default</strong>. For <code>SET NULL</code> to work, the foreign key column must allow NULL values.</p>
<p>For example, suppose a school assigns a mentor to each student. When a mentor leaves the school, you don’t want to delete their students – you want to set each student’s mentor to NULL or to a default staff member.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> teachers (teacher_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>, teacher_name <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>);

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> students (student_id <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>, student_name <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">100</span>) <span class="hljs-keyword">NOT</span> <span class="hljs-literal">NULL</span>, mentor_id <span class="hljs-built_in">INT</span>, <span class="hljs-keyword">FOREIGN</span> <span class="hljs-keyword">KEY</span> (mentor_id) <span class="hljs-keyword">REFERENCES</span> teachers (teacher_id) <span class="hljs-keyword">ON</span> <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">SET</span> <span class="hljs-literal">NULL</span>);
</code></pre>
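<p>The following sketch shows the mentor scenario end to end, assuming Python’s built-in <code>sqlite3</code> module as the DBMS. The teacher and student names are invented for illustration.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite for ON DELETE actions

conn.execute(
    "CREATE TABLE teachers (teacher_id INT PRIMARY KEY, teacher_name VARCHAR(100) NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE students (
        student_id INT PRIMARY KEY,
        student_name VARCHAR(100) NOT NULL,
        mentor_id INT,
        FOREIGN KEY (mentor_id) REFERENCES teachers (teacher_id) ON DELETE SET NULL
    )
    """
)
conn.execute("INSERT INTO teachers VALUES (1, 'Ms. Okafor')")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(101, "Alice", 1), (102, "Bob", 1)],
)

# The mentor leaves the school: students stay, but mentor_id becomes NULL.
conn.execute("DELETE FROM teachers WHERE teacher_id = 1")

mentors = conn.execute(
    "SELECT student_id, mentor_id FROM students ORDER BY student_id"
).fetchall()
print(mentors)
```

<p>Both students remain in the table, and Python reports their <code>mentor_id</code> as <code>None</code>, which is how <code>sqlite3</code> represents NULL.</p>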
<ol start="7">
<li>Update Alice’s details. Change her email to a new one, and increase her age:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">UPDATE</span> students
<span class="hljs-keyword">SET</span>
    email = <span class="hljs-string">'alice.new@example.com'</span>,
    age = <span class="hljs-number">22</span>
<span class="hljs-keyword">WHERE</span>
    student_id = <span class="hljs-number">101</span>;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768411782761/0fe14052-2bbd-4c44-952d-a13ce30947ce.png" alt="Query to update student" class="image--center mx-auto" width="2310" height="346" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768411806702/651e9694-5821-4aa0-b606-98ab361a9637.png" alt="Result of query" class="image--center mx-auto" width="2310" height="112" loading="lazy"></p>
<p>The query works perfectly, as the record gets updated.</p>
<ol start="8">
<li>Update Alice’s age to 30:</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">UPDATE</span> students
<span class="hljs-keyword">SET</span>
    age = <span class="hljs-number">30</span>
<span class="hljs-keyword">WHERE</span>
    student_id = <span class="hljs-number">101</span>;
</code></pre>
<p>This fails for the same reason as the fourth test: the <code>age</code> is outside the range allowed by the <code>CHECK</code> constraint.</p>
<p>Here’s another query that will fail:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">UPDATE</span> enrollments
<span class="hljs-keyword">SET</span>
    course_id = <span class="hljs-number">999</span>
<span class="hljs-keyword">WHERE</span>
    enrollment_id = <span class="hljs-number">1001</span>;
</code></pre>
<p>This will fail because the new <code>course_id</code> does not exist in the <code>courses</code> table.</p>
<h2 id="heading-summary">Summary</h2>
<p>Databases are a pivotal part of everyday modern technology, and understanding their fundamental concepts can open doors to building and managing more accurate databases.</p>
<p>This article introduced you to what relational database constraints are, some of the different types, and how they’re enforced and violated. You should now have the essential knowledge to navigate the world of database constraints confidently.</p>
<p>If you’re curious to learn more, connect with me on <a target="_blank" href="https://www.linkedin.com/in/idris-aweda-zubair-5433121a3/">LinkedIn</a>, <a target="_blank" href="https://twitter.com/greatzubs">Twitter</a>, or <a target="_blank" href="https://github.com/Zubs">GitHub</a>. Let’s continue this journey together toward mastering database systems!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ freeCodeCamp's New Relational Databases Certification is Now Live ]]>
                </title>
                <description>
                    <![CDATA[ The freeCodeCamp community just published our new Relational Databases certification. You can now sit for the exam to earn the free verified certification, which you can add to your résumé, CV, or LinkedIn profile. Each certification is filled with h... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/freecodecamps-new-relational-databases-certification-is-now-live/</link>
                <guid isPermaLink="false">694595068a4df0c53d579475</guid>
                
                    <category>
                        <![CDATA[ freeCodeCamp.org ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Certification ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Relational Database ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jessica Wilkins ]]>
                </dc:creator>
                <pubDate>Fri, 19 Dec 2025 18:10:14 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765839442063/a5db9c4a-cb34-468b-b097-e4257803c29d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The freeCodeCamp community just published our new <a target="_blank" href="https://www.freecodecamp.org/learn/relational-databases-v9/">Relational Databases certification</a>. You can now sit for the exam to earn the free verified certification, which you can add to your résumé, CV, or LinkedIn profile.</p>
<p>Each certification is filled with hundreds of hours’ worth of interactive lessons, workshops, labs, and quizzes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764617660453/0be3ab2b-90e9-4881-be81-47af607de2df.png" alt="List of modules for the Relational Databases Certification." class="image--center mx-auto" width="1688" height="1436" loading="lazy"></p>
<h2 id="heading-how-does-the-new-relational-databases-certification-work">How Does the New Relational Databases Certification Work?</h2>
<p>The new <a target="_blank" href="https://www.freecodecamp.org/learn/relational-databases-v9/">Relational Databases certification</a> will teach you core concepts including Bash scripting, SQL, Git, and more.</p>
<p>The certification is broken down into several modules that include lessons, workshops, labs, review pages, and quizzes to ensure that you truly understand the material before moving on to the next module.</p>
<p>The lessons are your first exposure to new concepts. They provide crucial theory and context for how things work in the software development industry.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764618152249/1b623cef-11d4-4ad2-9598-b273050db80f.png" alt="Example text from command line lesson" class="image--center mx-auto" width="2240" height="928" loading="lazy"></p>
<p>At the end of each lesson, there will be three comprehension check questions to test your understanding of the material from the lesson.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764618208440/7da0832d-b3a9-4e12-b21d-5215dcc48c13.png" alt="Example comprehension check question from the command line lesson." class="image--center mx-auto" width="1602" height="732" loading="lazy"></p>
<p>After the lesson blocks, you will do the workshops. These workshops are guided step-based projects that provide you with an opportunity to practice what you have learned in the lessons.</p>
<p>These workshops will not be done inside the regular freeCodeCamp editor in the browser. Instead you will need to do these workshops in one of three environments:</p>
<ul>
<li><p>GitHub Codespaces: This course runs in a virtual Linux machine using GitHub Codespaces.</p>
</li>
<li><p>Your own local environment: This course runs in a virtual Linux machine on your computer.</p>
</li>
<li><p>Ona: This course runs in a virtual Linux machine using Ona.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764639433422/09fb0695-42d6-4678-b2c1-1532eb840675.png" alt="Step 1 for the Build a Database of Video Game Characters workshop" class="image--center mx-auto" width="3014" height="828" loading="lazy"></p>
<p>After the workshop, you will complete a lab which will help you review what you have learned so far. This will give you a chance to start building projects on your own, which is a crucial skill for a developer. You will be presented with a list of user stories and will need to pass the tests to complete the lab.</p>
<p>At the end of each module, there is a review page containing a list of all of the concepts covered. You can use these review pages to help you study for the quizzes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765815540302/7ad0cadc-f94d-4b80-b35e-e46b756fd407.png" alt="Example review page from the Bash and SQL module" class="image--center mx-auto" width="1740" height="1194" loading="lazy"></p>
<p>The last portion of the module is the quiz. This is a 20-question multiple-choice quiz designed to test your understanding of the material covered in the module. You will need to get 18 out of 20 correct to pass.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765815587496/8878215f-9799-4259-ae3f-44d7101fba72.png" alt="Sample question from the Git Quiz." class="image--center mx-auto" width="1758" height="1142" loading="lazy"></p>
<p>Throughout the certification, there will be five certification projects you will need to complete in order to qualify for the exam.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765815626782/6839bfa8-1d6f-4425-8a6e-f4dfbd3f9581.png" alt="List of certification projects for the Relational Databases Certification." class="image--center mx-auto" width="1708" height="1296" loading="lazy"></p>
<p>Once you’ve completed all five certification projects, you’ll be able to take the 50-question exam using our new open source exam environment. The freeCodeCamp community designed this exam environment tool with two goals: respecting your privacy while also making it harder for people to cheat.</p>
<p>Once you download the app to your laptop or desktop, you can take the exam.</p>
<h2 id="heading-frequently-asked-questions">Frequently Asked Questions</h2>
<h3 id="heading-is-all-of-this-really-free">Is all of this really free?</h3>
<p>Yes. freeCodeCamp has always been free, and we’ve now offered free verified certifications for more than a decade. These exams are just the latest expansion to our community’s free learning resources.</p>
<h3 id="heading-what-prevents-people-from-just-cheating-on-the-exams">What prevents people from just cheating on the exams?</h3>
<p>Our goal is to strike a balance between preventing cheating and respecting people's right to privacy.</p>
<p>We've implemented a number of reliable, yet non-invasive, measures to help prevent people from cheating on freeCodeCamp's exams:</p>
<ol>
<li><p>For each exam, we have a massive bank of questions and potential answers to those questions. Each time a person attempts an exam, they'll see only a small, randomized sampling of these questions.</p>
</li>
<li><p>We only allow people to attempt an exam one time per week. This reduces their ability to "brute force" the exam.</p>
</li>
<li><p>We have security in place to validate exam submissions and prevent man-in-the-middle attacks or manipulation of the exam environment.</p>
</li>
<li><p>We manually review each passing exam for evidence of cheating. Our exam environment produces tons of metrics for us to draw from.</p>
</li>
</ol>
<p>We take cheating, and any form of academic dishonesty, seriously. We will act decisively.</p>
<p>This said, no one's exam results will be thrown out without human review, and no one's account will be banned without warning based on a single suspicious exam result.</p>
<h3 id="heading-are-these-exams-open-book-or-closed-book">Are these exams “open book” or “closed book”?</h3>
<p>All of freeCodeCamp’s exams are “closed book”, meaning you must rely only on your mind and not outside resources.</p>
<p>Of course, in the real world you’ll be able to look things up. And in the real world, we encourage you to do so.</p>
<p>But that is not what these exams are evaluating. These exams are instead designed to test your memory of details and your comprehension of concepts.</p>
<p>So when taking these exams, do not use outside assistance in the form of books, notes, AI tools, or other people. Use of any of these will be considered academic dishonesty.</p>
<h3 id="heading-do-you-record-my-webcam-microphone-or-require-me-to-upload-a-photo-of-my-personal-id">Do you record my webcam, microphone, or require me to upload a photo of my personal ID?</h3>
<p>No. We considered adding these as additional test-taking security measures. But we have less privacy-invading methods of detecting most forms of academic dishonesty.</p>
<h3 id="heading-if-the-environment-is-open-source-doesnt-that-make-it-less-secure">If the environment is open source, doesn't that make it less secure?</h3>
<p>"Given enough eyeballs, all bugs are shallow." – Linus’s Law, formulated by Eric S. Raymond in his book <em>The Cathedral and the Bazaar</em></p>
<p>Open source software projects are often more secure than their closed source equivalents. This is because a lot more people are scrutinizing the code. And a lot more people can potentially help identify bugs and other deficiencies, then fix them.</p>
<p>We feel confident that open source is the way to go for this exam environment system.</p>
<h3 id="heading-how-can-i-contribute-to-the-exam-environment-codebase">How can I contribute to the Exam Environment codebase?</h3>
<p>It's fully open source, and we'd welcome your code contributions. Please read our general <a target="_blank" href="https://contribute.freecodecamp.org/intro/">contributor onboarding documentation</a>.</p>
<p>Then check out the <a target="_blank" href="https://github.com/freeCodeCamp/exam-env">GitHub repo</a>.</p>
<p>You can help by creating issues to report bugs or request features.</p>
<p>You can also browse open <code>help wanted</code> issues and attempt to open pull requests addressing them.</p>
<h3 id="heading-are-the-exam-questions-themselves-open-source">Are the exam questions themselves open source?</h3>
<p>For obvious exam security reasons, the exam question banks themselves are not publicly accessible. :)</p>
<p>These are built and maintained by freeCodeCamp's staff instructional designers.</p>
<h3 id="heading-what-happens-if-i-have-internet-connectivity-issues-mid-exam">What happens if I have internet connectivity issues mid-exam?</h3>
<p>If you have internet connectivity issues mid-exam, the next time you try to submit an answer, you’ll be told there are connectivity issues. The system will keep prompting you to retry submitting until the connection succeeds.</p>
<h3 id="heading-what-if-my-computer-crashes-mid-exam">What if my computer crashes mid-exam?</h3>
<p>If your computer crashes mid-exam, you’ll be able to re-open the Exam Environment. Then, if you still have time left for your exam attempt, you’ll be able to continue from where you left off.</p>
<h3 id="heading-can-i-take-exams-in-languages-other-than-english">Can I take exams in languages other than English?</h3>
<p>Not yet. We’re working to add multi-lingual support in the future.</p>
<h3 id="heading-i-have-completed-my-exam-why-cant-i-see-my-results-yet">I have completed my exam. Why can't I see my results yet?</h3>
<p>All exam attempts are reviewed by freeCodeCamp staff before we release the results. We do this to ensure the integrity of the exam process and to prevent cheating. Once your attempt has been reviewed, you'll be notified of your results the next time you log in to freeCodeCamp.org.</p>
<h3 id="heading-i-am-deaf-or-hard-of-hearing-can-i-still-take-the-exams">I am Deaf or hard of hearing. Can I still take the exams?</h3>
<p>Yes! While some exams may include audio components, we do make written transcripts available for reading.</p>
<h3 id="heading-i-am-blind-or-have-limited-vision-and-use-a-screen-reader-can-i-still-take-the-exams">I am blind or have limited vision, and use a screen reader. Can I still take the exams?</h3>
<p>We’re working on it. Our curriculum is fully screen reader accessible. We're still refining our screen reader usability for the Exam Environment app. This is a high priority for us.</p>
<h3 id="heading-i-use-a-keyboard-instead-of-a-mouse-can-i-navigate-the-exams-using-just-a-keyboard">I use a keyboard instead of a mouse. Can I navigate the exams using just a keyboard?</h3>
<p>This is a high priority for us. We hope to add keyboard navigation to the Exam Environment app soon.</p>
<h3 id="heading-are-exams-timed">Are exams timed?</h3>
<p>Yes, exams are timed. We err on the side of giving plenty of time to take the exam, to account for people who are non-native English speakers, or who have ADHD and other learning differences that can make timed exams more challenging.</p>
<p>If you have a condition that usually qualifies you for extra time on standardized exams, please email support@freecodecamp.org. We’ll review your request and see whether we can find a reasonable solution.</p>
<h3 id="heading-what-happens-if-i-fail-the-exam-can-i-retake-it">What happens if I fail the exam? Can I retake it?</h3>
<p>Yes. You get one exam attempt per week. After you attempt an exam, there is a one-week (exactly 168-hour) “cool-down” period during which you cannot take any freeCodeCamp exams. This is to encourage you to study and to pace yourself.</p>
<p>There is no limit to the number of times you can take an exam. So if you fail, study more, practice your skills more, then try again the following week.</p>
<h3 id="heading-do-i-need-to-redo-the-projects-if-i-fail-the-exam">Do I need to redo the projects if I fail the exam?</h3>
<p>No. Once you’ve submitted a certification project, you do not need to ever submit it again.</p>
<p>You can re-do projects for practice, but we recommend that you instead build some of our many practice projects in freeCodeCamp’s developer interview job search section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764629117812/35c0c09a-3131-4c01-8b97-d5c101912f9e.png" alt="A screenshot of the &quot;Prepare for the developer interview job search&quot; section with lots of coding projects" width="1562" height="958" loading="lazy"></p>
<h3 id="heading-what-happens-if-i-already-have-the-old-legacy-responsive-web-design-certification-should-i-claim-the-new-one">What happens if I already have the old Legacy Responsive Web Design certification? Should I claim the new one?</h3>
<p>The new certification has more theory and practice as well as an exam. So if you’re looking to brush up on your skills, then you can go through the new version of this certification.</p>
<h3 id="heading-what-will-happen-to-my-existing-coursework-progress-on-the-full-stack-certification-does-it-transfer-over-to-the-responsive-web-design-course">What will happen to my existing coursework progress on the Full Stack Certification? Does it transfer over to the Responsive Web Design course?</h3>
<p>If you’ve already started the <a target="_blank" href="https://www.freecodecamp.org/learn/full-stack-developer-v9/">Certified Full Stack Developer Curriculum</a>, all of your previously completed work should already be saved there.</p>
<p>To be clear, we’ve copied over all of the coursework from the full stack certification to this newer certification.</p>
<h3 id="heading-can-i-still-continue-with-the-current-full-stack-developer-certification-and-just-not-do-the-new-certification">Can I still continue with the current Full Stack Developer Certification and just not do the new certification?</h3>
<p>We’ve moved the coursework for the <a target="_blank" href="https://www.freecodecamp.org/learn/full-stack-developer-v9/">Full Stack Developer Certification</a> over and broken it up into smaller certifications. Currently there are seven courses available for you to go through. Here is the complete list:</p>
<ul>
<li><p><a target="_blank" href="https://www.freecodecamp.org/learn/responsive-web-design-v9/">Responsive Web Design Certification</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/learn/javascript-v9/">JavaScript Certification</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/learn/front-end-development-libraries-v9/">Frontend Libraries Certification</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/learn/python-v9/">Python Certification</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/learn/relational-databases-v9/">Relational Databases Certification</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/learn/back-end-development-and-apis-v9/">Backend JavaScript Certification</a></p>
</li>
<li><p><a target="_blank" href="https://www.freecodecamp.org/learn/full-stack-developer-v9/">Certified Full Stack Developer Certification</a></p>
</li>
</ul>
<p>The Certified Full Stack Developer Certification button will remain on the learn page for a short time to give people the opportunity to switch over to the new certifications. Over the next few months, though, this option will disappear.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763050732251/0276ffab-bd3f-46fe-bac0-a654ddfafcb5.png" alt="List of all certifications on the freeCodeCamp learn page." width="1834" height="1324" loading="lazy"></p>
<h3 id="heading-will-my-legacy-certifications-become-invalid">Will my legacy certifications become invalid?</h3>
<p>No. Once you claim a certification, it’s yours to keep.</p>
<p>Also note that we previously announced that freeCodeCamp certifications would have an expiration date and require recertification. We don’t plan to implement this anytime soon. And if we do decide to, we will give everyone at least a year’s notice.</p>
<h3 id="heading-will-the-exam-be-available-to-take-on-my-phone">Will the exam be available to take on my phone?</h3>
<p>At this time, no. You’ll need to use a laptop or desktop to download the exam environment and take the exam. We hope to eventually offer these certification exams on iPhone and Android.</p>
<h3 id="heading-i-have-a-disability-or-health-condition-that-is-not-covered-here-how-can-i-request-accommodations">I have a disability or health condition that is not covered here. How can I request accommodations?</h3>
<p>If you need specific accommodations for the exam (for example extra time, breaks, or alternative formats), please email support@freecodecamp.org. We’ll review your request and see whether we can find a reasonable solution.</p>
<h2 id="heading-anything-else">Anything else?</h2>
<p>Good luck working through freeCodeCamp’s coursework, building projects, and preparing for these exams.</p>
<p>Happy coding!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build an LSM Tree Storage Engine from Scratch – Full Handbook ]]>
                </title>
                <description>
                    <![CDATA[ Databases are one of the most important parts of a software system. They allow us to store huge amounts of data in an organized way and retrieve it efficiently when we need it. In the early days, when the volume of data was relatively small, engineer... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-an-lsm-tree-storage-engine-from-scratch-handbook/</link>
                <guid isPermaLink="false">6944631e80f40a442d1799df</guid>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ lsmtree ]]>
                    </category>
                
                    <category>
                        <![CDATA[ storage solutions ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Go Language ]]>
                    </category>
                
                    <category>
                        <![CDATA[ heap ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ramesh Sinha ]]>
                </dc:creator>
                <pubDate>Thu, 18 Dec 2025 20:25:02 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766089431510/433ff03f-8aca-4a87-82d3-0b6d6c1f371c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Databases are one of the most important parts of a software system. They allow us to store huge amounts of data in an organized way and retrieve it efficiently when we need it.</p>
<p>In the early days, when the volume of data was relatively small, engineers prioritized fast data retrieval and stored data in <a target="_blank" href="https://en.wikipedia.org/wiki/B-tree">B-tree structures</a> that made searching efficient.</p>
<p>But over time, we started building systems that needed to ingest massive amounts of data like logs, metrics, likes, chats and tweets. This made it necessary to design a storage system that would make writing faster.</p>
<p>One such storage system is the LSM-tree (Log-Structured Merge tree).</p>
<p>In this tutorial, rather than immediately diving into the theoretical concepts of an LSM-Tree Storage system, I’ll take a practical, problem-driven approach. I believe that learning through solving problems is far more effective and engaging than simple memorization of concepts.</p>
<p>By approaching these ideas progressively, my goal is to guide you step by step through real-world engineering challenges and solutions, giving you a front-row seat to the intricacies of building a robust storage system from scratch.</p>
<p>We’ll begin by identifying real-world challenges that arise in database design – like handling write-heavy workloads, ensuring data durability, or managing efficient storage. These challenges will set the stage for each feature and component of LSM-Trees.</p>
<p>Through this method, we’ll explore the foundations of LSM-Tree storage systems and dive deeper into their key components: MemTable, SSTable, Write-Ahead Log (WAL), and Manifest File.</p>
<p>We’ll also examine the Write and Read paths, explore Durability and Crash-Recovery mechanisms, and conclude with one of the most critical processes: Compaction.</p>
<p>By the end of this handbook, you’ll understand not just what these components are but also why they are designed the way they are and how they solve the unique challenges of building modern, high-performance databases.</p>
<h3 id="heading-what-well-cover"><strong>What We’ll Cover:</strong></h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-an-lsm-tree">What is an LSM Tree?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-preface-setting-up-to-build-an-lsm-tree-database">Preface: Setting up to Build an LSM-Tree Database</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-initial-feature-set-laying-the-foundation-of-the-database-system">Initial Feature Set: Laying the Foundation of the Database System</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-memtable-in-memory-data-storage">MemTable: In-Memory Data Storage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-sstable-persisting-data-for-durability">SSTable: Persisting Data for Durability</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-wal-write-ahead-log-crash-recovery-made-simple">The WAL (Write Ahead Log ): Crash Recovery Made Simple</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-manifest-file-tracking-the-state-of-the-database">Manifest File: Tracking the State of the Database</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-update-and-delete-handling-mutability-in-an-immutable-system">Update and Delete: Handling Mutability in an Immutable System</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-compaction-cleaning-up-stale-and-deleted-data">Compaction: Cleaning Up Stale and Deleted Data</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-complete-code">Complete Code</a></li>
</ul>
</li>
</ol>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>While this tutorial is designed to be comprehensive and approachable, it’ll be helpful if you come in with some foundational knowledge in the following areas:</p>
<ul>
<li><p><strong>Programming in Golang</strong>: Familiarity with Go syntax, error handling, and standard libraries (for example, <code>os</code>, <code>encoding/gob</code>, and <code>container/heap</code>) will make it easier to work through the implementation examples.</p>
</li>
<li><p><strong>Basic data structures and algorithms:</strong> Concepts such as maps, heaps, some sorting algorithms, and early termination are leveraged throughout the tutorial.</p>
</li>
<li><p><strong>Understanding persistent storage:</strong> Awareness of the differences between in-memory and disk-based storage, as well as sequential versus random read/write operations will be helpful in grasping performance-related trade-offs.</p>
</li>
<li><p><strong>General database knowledge:</strong> If you're familiar with key-value databases or CRUD operations (Create, Read, Update, Delete), you’ll have a head start.</p>
</li>
<li><p><strong>Concurrency</strong>: Basic understanding of threads and concurrency.</p>
</li>
</ul>
<p>While having experience in these areas will deepen your understanding of the concepts and reduce the learning curve, I will provide sufficient detail and practical explanations at every step, ensuring you gain the insights necessary to follow along and build your own LSM-tree-based storage engine.</p>
<h2 id="heading-what-is-an-lsm-tree">What is an LSM Tree?</h2>
<p>A log-structured merge-tree (or LSM tree) is a data structure that makes database writes super fast by recording new data in memory first, then periodically sorting and merging it into larger files on disk.</p>
<p>The “log” in its name refers to the fact that it saves data in an append-only, log-structured format (rather than updating records in place). We will come to what those logs are in a little bit.</p>
<p>LSM trees keep appending new data to the existing data, instead of looking for something that exists and updating it. In other words, you don't have to spend any CPU cycles thinking about where to store data – just append it at the end.</p>
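<p>To make the append-only idea concrete, here is a minimal sketch. The <code>appendLog</code> and <code>entry</code> types are hypothetical names used only for illustration, and a slice stands in for a file on disk. It is not the storage engine we build later, just a picture of "an update is another append, and the newest record wins."</p>

```go
package main

import "fmt"

// entry is one record in the log (hypothetical type for this sketch).
type entry struct {
	key, value string
}

// appendLog is an in-memory stand-in for a log file:
// writes only add to the end and never modify earlier records.
type appendLog struct {
	entries []entry
}

// Put appends a new record, even if the key was written before.
func (l *appendLog) Put(key, value string) {
	l.entries = append(l.entries, entry{key, value})
}

// Get scans from the newest record backwards, so the latest write wins.
func (l *appendLog) Get(key string) (string, bool) {
	for i := len(l.entries) - 1; i >= 0; i-- {
		if l.entries[i].key == key {
			return l.entries[i].value, true
		}
	}
	return "", false
}

func main() {
	l := &appendLog{}
	l.Put("a", "apple")
	l.Put("b", "banana")
	l.Put("a", "avocado") // an "update" is just another append

	v, _ := l.Get("a")
	fmt.Println(v, len(l.entries)) // avocado 3
}
```

<p>Notice that the old <code>"apple"</code> record is still sitting in the log. Writes stay cheap, but stale records pile up, which is the cost the "merge" part of the name has to pay back.</p>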
<p>An LSM tree also has "tree" in its name, but does it actually store data in a tree? Not really. The “tree” here is mostly an abstract concept. It refers to the hierarchical organization of levels (L0, L1, L2, and so on), not a tree data structure with nodes and pointers. Again, we will come to those levels in a little bit, but for now, let’s just say it makes sense to call it a tree given that it stores data in a leveled fashion.</p>
<p>Just note that there isn't a tangible tree structure in play (like a binary tree or graph) – it’s not node-based storage.</p>
<p>Finally, there is the "merge" part of the name. For now, suffice it to say that you’ll soon see how this storage engine merges data to save storage by avoiding duplication.</p>
<p>Personally, I think that "Log-Structured Merge <strong>System</strong>" would be clearer than "tree," but "LSM tree" is the established term in the industry, so that's what we'll use.</p>
<h2 id="heading-preface-setting-up-to-build-an-lsm-tree-database">Preface: Setting up to Build an LSM-Tree Database</h2>
<p>Now that we have set the context, let’s put this theory into practice and start building our own LSM-tree-based database storage engine from scratch.</p>
<p>To follow along with this tutorial:</p>
<ul>
<li><p>Make sure you have Golang installed on your system. If not, you can download and install it from the <a target="_blank" href="https://go.dev/">official Go website</a>.</p>
</li>
<li><p>Set up your development environment and create a new Go module for this project by running: <code>go mod init lsm-db</code></p>
</li>
<li><p>Keep a code editor or IDE ready to try out the examples.</p>
</li>
</ul>
<h3 id="heading-initial-feature-set-laying-the-foundation-of-the-database-system">Initial Feature Set: Laying the Foundation of the Database System</h3>
<p>When I’m designing or building a system, I like to think that the system already exists, and I assume that I can just start calling functions that support the features of the system. I’ll follow that pattern here and assume that the following functions of the LSM tree exist and we can invoke those functions from <code>main.go</code>.</p>
<pre><code class="lang-go">db, err := NewDB[<span class="hljs-keyword">string</span>, <span class="hljs-keyword">string</span>](<span class="hljs-number">3</span>, <span class="hljs-number">3</span>) <span class="hljs-comment">// there is a feature to create new with some parameters we will get to</span>
db.Put(<span class="hljs-string">"a"</span>, <span class="hljs-string">"apple"</span>) <span class="hljs-comment">// a feature to add key value</span>
db.Delete(<span class="hljs-string">"a"</span>) <span class="hljs-comment">// a feature to delete a key</span>
val, _ := db.Get(<span class="hljs-string">"a"</span>) <span class="hljs-comment">// a feature to get value given a key</span>
</code></pre>
<p>As we progress through this journey, I’ll introduce essential features such as in-memory storage, flushing data to disk, and handling duplicate keys. We’ll also explore more advanced components, including a Write-Ahead Log (WAL) to ensure crash tolerance, a Manifest file to maintain the database state across application restarts, and a Compaction process to clean up redundant or stale data by merging older SSTables.</p>
<p>By the end of this tutorial, you will gain a clear understanding of how all these components work together to form a robust and efficient LSM-tree-based storage system.</p>
<h3 id="heading-memtable-in-memory-data-storage">MemTable: In-Memory Data Storage</h3>
<p>We’re building a database storage system, so of course you’ll need a way to store data. This means you need some kind of backing storage. This backing storage in an LSM tree is called a MemTable. The "Mem" refers to its in-memory storage. The benefit of in-memory storage is that it’s orders of magnitude faster than storing on disk.</p>
<p>For simplicity, at the core of the MemTable you can use a map (or dictionary depending on the programming language) as the underlying data structure to store key-value pairs. The map allows for fast lookups, insertions, and deletions, making it ideal for in-memory storage where performance is crucial. So the structure for MemTable will look like:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> MemTable[K comparable, V any] <span class="hljs-keyword">struct</span> {
    data <span class="hljs-keyword">map</span>[K]V <span class="hljs-comment">// this is primary storage map. It's generic so that you</span>
                  <span class="hljs-comment">// can store any kind of data</span>
}
</code></pre>
<p>The above code defines a <code>MemTable</code> struct, where <code>data</code> is a map that acts as the main storage for our key-value pairs. Since the <code>data</code> field is a map, you’ll be able to quickly add, retrieve, or delete values associated with a given key.</p>
<p>You may have noticed something new in the code: the use of <code>[K comparable, V any]</code>. This syntax is Go’s <strong>generic types</strong> feature, which allows us to write flexible code that can handle different data types.</p>
<p>Generics are a way to write code that is independent of any specific data type. They allow you to write functions and data structures that can work with a string, int, float, or any custom type you define, without sacrificing type safety.</p>
<p>In the above code, K and V are type parameters. They say: "This MemTable can work with any Key type K that is comparable, and any value type V."</p>
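<p>As a small illustration of type parameters, here is a sketch of one generic function used with two different key and value types. The <code>lookup</code> helper is a hypothetical name for this example only. <code>K</code> has to be <code>comparable</code> because map keys are compared with <code>==</code>, while <code>V</code> can be anything.</p>

```go
package main

import "fmt"

// lookup is a hypothetical helper for this sketch. It works for any map
// whose key type is comparable, whatever the value type is.
func lookup[K comparable, V any](m map[K]V, key K) (V, bool) {
	v, ok := m[key]
	return v, ok
}

func main() {
	ages := map[string]int{"ada": 36}
	names := map[int]string{1: "one"}

	// The compiler infers K and V from the arguments.
	a, _ := lookup(ages, "ada")
	n, _ := lookup(names, 1)
	fmt.Println(a, n) // 36 one
}
```

<p>The same source code serves both maps, and the compiler still rejects a call like <code>lookup(ages, 42)</code> because <code>42</code> is not a <code>string</code>.</p>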
<p>Now that you have the MemTable, think of what functions it should provide to its clients. Well, the clients need to be able to save and retrieve values associated with a key, so the following functions would fit naturally:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(m *MemTable[K, V])</span> <span class="hljs-title">Put</span><span class="hljs-params">(key K, value V)</span></span> {
    m.data[key] = value
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(m *MemTable[K, V])</span> <span class="hljs-title">Get</span><span class="hljs-params">(key K)</span> <span class="hljs-params">(V, <span class="hljs-keyword">bool</span>)</span></span> {
    value, ok := m.data[key]
    <span class="hljs-keyword">var</span> zero V
    <span class="hljs-keyword">if</span> !ok {
        <span class="hljs-keyword">return</span> zero, <span class="hljs-literal">false</span>
    }
    <span class="hljs-keyword">return</span> value, <span class="hljs-literal">true</span>
}
</code></pre>
<p>The above code has <code>Put</code> and <code>Get</code> functions – let’s break them down:</p>
<ul>
<li><p><strong>Put</strong>: This function allows the client to insert a key-value pair into the MemTable. If the key already exists in the map, its value will be updated with the new value provided as an argument. This is effectively the <code>write</code> operation of our key-value store.</p>
</li>
<li><p><strong>Get</strong>: This function is responsible for retrieving the value associated with a given key from the MemTable. It returns two values: the value itself (of type <code>V</code>) and a boolean (<code>true</code> or <code>false</code>). The boolean indicates whether the key was found in the map. If the key does not exist, the function returns a <code>zero value</code> (more on that below) along with <code>false</code>.</p>
</li>
</ul>
<p>Did you notice <code>var zero V</code>?</p>
<p>It's pretty interesting. Think of a situation where we don't get a value from the map – say the key is not there, or something else is wrong. What should the function <code>Get</code> return in that case? Can it return an int (0), or a string "Not found", or some random object (foo)? You don't know anything about the type yet (Generics), so you can't tell it what to return.</p>
<p>In this case, the compiler comes to the rescue. Go has a zero value concept: every type has a zero value. An int has 0, a string has "", a bool has false, and pointers, slices, and maps have nil. By saying <code>var zero V</code> you are telling the compiler, "I don't know the type yet, so you figure it out while compiling and put that type's zero value here as the return value." Neat!</p>
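<p>Here is a short sketch that shows the zero value idea on its own. The <code>zeroOf</code> function is a hypothetical helper, not part of the storage engine. It uses the same <code>var zero T</code> trick and returns whatever the zero value is for the type it is instantiated with.</p>

```go
package main

import "fmt"

// zeroOf returns the zero value of whatever type T is instantiated with.
func zeroOf[T any]() T {
	var zero T
	return zero
}

func main() {
	fmt.Printf("%d\n", zeroOf[int]())                   // 0
	fmt.Printf("%q\n", zeroOf[string]())                // ""
	fmt.Printf("%v\n", zeroOf[bool]())                  // false
	fmt.Printf("%v\n", zeroOf[*int]() == nil)           // true
	fmt.Printf("%v\n", zeroOf[map[string]int]() == nil) // true
}
```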
<p>I missed one thing though: how would a client invoke these functions? Right, we need a way to build the MemTable type.</p>
<p>To construct and initialize a MemTable, we can use a <strong>factory function</strong>: a common programming pattern for creating and returning new objects or instances without directly exposing the underlying implementation details.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewMemTable</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">()</span> *<span class="hljs-title">MemTable</span>[<span class="hljs-title">K</span>, <span class="hljs-title">V</span>]</span> {
    <span class="hljs-keyword">return</span> &amp;MemTable[K, V]{
        data: <span class="hljs-built_in">make</span>(<span class="hljs-keyword">map</span>[K]V),
    }
}
</code></pre>
<p>Notice how we’ve initialized the data field using the built-in <code>make</code> function. Here’s why we do this:</p>
<p>Go has a built-in function called <code>make</code>, which is used to allocate and initialize slices, maps, and channels. This allocation ensures that they are ready for use without the risk of runtime panics.</p>
<p>You might wonder, why not use the <code>new</code> function to allocate the map? After all, developers coming from other programming backgrounds (like C++ or Java) might expect to use <code>new</code> for all types of memory allocation. But Go <strong>differentiates how it manages memory for composite types versus basic/numeric types</strong>, and that’s where <code>make</code> comes in.</p>
<p>This distinction matters because the <code>new</code> function only <strong>allocates zeroed memory</strong> for a value and returns a <em>pointer</em> to it. For a map, that zero value is a nil map, meaning that while the pointer is valid, the map it points to isn’t ready to use. If we try to write to a <code>map</code> (like adding a key-value pair) that was only allocated using <code>new</code>, it will cause a runtime panic because you can’t assign to an entry in a nil map.</p>
<p>For example:</p>
<pre><code class="lang-go">m := <span class="hljs-built_in">new</span>(<span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>) <span class="hljs-comment">// Allocates a pointer to an uninitialized map</span>
(*m)[<span class="hljs-string">"a"</span>] = <span class="hljs-number">1</span>            <span class="hljs-comment">// This will panic because the map is not initialized</span>
</code></pre>
<p>On the other hand, <code>make</code> both allocates and <strong>initializes the map</strong>, ensuring it’s fully functional right away. That’s why the correct way to create a map is:</p>
<pre><code class="lang-go">m1 := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">map</span>[<span class="hljs-keyword">string</span>]<span class="hljs-keyword">int</span>) <span class="hljs-comment">// Initializes the map properly</span>
m1[<span class="hljs-string">"a"</span>] = <span class="hljs-number">1</span>                <span class="hljs-comment">// This works as expected</span>
</code></pre>
<p>Now that you have the MemTable which can store data in memory, let's hook it up and use it.</p>
<p>But before that, do you remember at the beginning that I used functional invocations like <code>db.Put</code> and <code>db.Get</code>? Well, what is <code>db</code>? Because we are building a database storage system, it makes more sense to name the interface <code>db</code> instead of MemTable, right? And to be honest, it seems like MemTable is going to be part of the database system, not the whole system, doesn't it?</p>
<p>Even if it's not intuitive at the moment to define something like a DB type, let's just do it. Trust me, it will start to get clearer as we move along. This <code>db</code> type will wrap adding and retrieving data from MemTable.</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> DB[K comparable, V any] <span class="hljs-keyword">struct</span> {
    memtable *MemTable[K, V]
}

<span class="hljs-comment">// factory function for DB type</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewDB</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">()</span> <span class="hljs-params">(*DB[K, V], error)</span></span> {
    memtable := NewMemTable[K, V]()
    <span class="hljs-keyword">return</span> &amp;DB[K, V]{
        memtable: memtable,
    }, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>Let's just define the Put and Get functions which will invoke corresponding functions in MemTable:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Put</span><span class="hljs-params">(key K, value V)</span> <span class="hljs-title">error</span></span> {
    db.memtable.Put(key, value)
    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Get</span><span class="hljs-params">(key K)</span> <span class="hljs-params">(V, error)</span></span> {
    <span class="hljs-keyword">if</span> val, ok := db.memtable.Get(key); ok {
        <span class="hljs-keyword">return</span> val, <span class="hljs-literal">nil</span>
    }
    <span class="hljs-keyword">var</span> zero V
    <span class="hljs-keyword">return</span> zero, errors.New(<span class="hljs-string">"key not found"</span>)
}
</code></pre>
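<p>Let's integrate what we’ve built so far and run it. Add the code below to the <code>main</code> function in <code>main.go</code> (you’ll need to import <code>errors</code> and <code>log</code>), then run it with <code>go run main.go</code>:</p>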
<pre><code class="lang-go">db, err := NewDB[<span class="hljs-keyword">string</span>, <span class="hljs-keyword">string</span>]()
<span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
    log.Fatalf(<span class="hljs-string">"Failed to create DB: %v"</span>, err)
}
db.Put(<span class="hljs-string">"a"</span>, <span class="hljs-string">"apple"</span>)
val, _ := db.Get(<span class="hljs-string">"a"</span>)
log.Printf(<span class="hljs-string">"Get('a') = %s (should be 'apple')"</span>, val)
</code></pre>
<p>Look at that, you have built an in-memory database where your clients can store and fetch data from. It’s using generics so you can store any kind of values (int, string, objects).</p>
<p>Now say you ship this solution and then it crashes. Your clients will lose all their data. Why would it crash? For one thing, memory is limited, and at some point you’re going to run out of it. So there are two major problems with purely in-memory storage:</p>
<ol>
<li><p>It's not durable.</p>
</li>
<li><p>Unbounded memory usage is going to crash the system.</p>
</li>
</ol>
<p>How do we solve these problems?</p>
<p>Here's a thought: what if we flush the MemTable data to disk at some regular interval? That way we can ensure that MemTable doesn't grow out of bounds. Also if the db crashes, we won’t lose all the data. We’ll still lose the data that hasn’t been flushed yet, but that's way better than losing all of it.</p>
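<p>Before building the real thing, here is a rough sketch of the flush idea. It makes a few assumptions that are mine, not part of the final design: the flush is triggered by a size threshold rather than a timer, "disk" is faked as a slice of sorted batches, and the <code>store</code>, <code>record</code>, and <code>newStore</code> names are hypothetical.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// record is one key-value pair in a flushed batch.
type record struct {
	Key, Value string
}

// store is a hypothetical sketch, not the final DB type.
type store struct {
	mem       map[string]string
	threshold int        // assumed: flush once the MemTable holds this many keys
	flushed   [][]record // stand-in for files on disk
}

func newStore(threshold int) *store {
	return &store{mem: make(map[string]string), threshold: threshold}
}

// Put writes to memory and flushes when the size threshold is reached.
func (s *store) Put(key, value string) {
	s.mem[key] = value
	if len(s.mem) >= s.threshold {
		s.flush()
	}
}

// flush copies the MemTable into a key-sorted batch and starts a fresh map.
func (s *store) flush() {
	batch := make([]record, 0, len(s.mem))
	for k, v := range s.mem {
		batch = append(batch, record{k, v})
	}
	sort.Slice(batch, func(i, j int) bool { return batch[i].Key < batch[j].Key })
	s.flushed = append(s.flushed, batch)
	s.mem = make(map[string]string)
}

func main() {
	s := newStore(3)
	for _, k := range []string{"c", "a", "b", "d"} {
		s.Put(k, "value-"+k)
	}
	fmt.Println(len(s.flushed), len(s.mem)) // 1 1
	fmt.Println(s.flushed[0][0].Key)        // a
}
```

<p>After the third write the map is emptied, so memory use stays bounded. The fourth key, <code>"d"</code>, is still only in memory, which is exactly the data you would lose in a crash.</p>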
<h3 id="heading-sstable-persisting-data-for-durability">SSTable: Persisting Data for Durability</h3>
<p>An SSTable is a sorted string table. I wish they’d called it a "Secondary Storage" table, but historically keys and values were strings – hence sorted string table. An SSTable is a persistent, ordered, and immutable file that stores key-value pairs. It’s a file stored on disk, so it's pretty clear that it’s persistent (durable).</p>
<p>Let’s discuss a couple key features of the SSTable:</p>
<ul>
<li><p><strong>It’s ordered</strong>: Keys are stored in sorted order because that makes lookups faster. Without it, you'd have to scan the whole file to find a key. Later, I will point out some code that leverages sorted storage.</p>
</li>
<li><p><strong>It’s immutable</strong>: Once an SSTable file is written, it can’t be modified. To update or delete a key, you must write a new record in a newer SSTable. This simplifies the design and makes reads and writes very predictable.</p>
</li>
</ul>
<p>But wait, how does that simplify the design?</p>
<p>One of the most complex things in software engineering is dealing with concurrency. Let’s say you’re reading a file while another thread updates it underneath you. How do you know you have the correct data?</p>
<p>With immutable design, you don't have to worry about this at all. You are 100% confident that the data you are reading has not been altered by anybody else. I’ll take that as a massive simplification: you don't have to deal with locks, starvation, staleness, and so on.</p>
<h4 id="heading-how-does-it-make-the-write-path-predictable">How does it make the write path predictable?</h4>
<p>I will answer this partially here and come back to it when we have completed some more implementation. You’ll see that every write in our code follows the exact same steps. There is not a single different condition or edge case.</p>
<p>In a traditional database (using a B-Tree), a typical write involves:</p>
<ol>
<li><p>Finding the data on disk.</p>
</li>
<li><p>Reading the block of data from disk into memory.</p>
</li>
<li><p>Modifying the data in memory.</p>
</li>
<li><p>Writing the entire block back to disk.</p>
</li>
</ol>
<p>The more steps there are, the less predictable performance gets: the write can be fast if the data is already in the memory cache, or slow if it needs multiple disk seeks.</p>
<p>Granted, our code is an overly simplified version, but the concept still holds in real LSM implementations.</p>
<h4 id="heading-how-does-it-make-the-read-path-predictable">How does it make the read path predictable?</h4>
<p>Read is predictable because any number of threads can read the same SSTable file at the same time without any problem, with full confidence that data has not been updated.</p>
<p>In contrast, when reading from a mutable data structure, you have to worry that another thread might be in the process of changing the data you are trying to read.</p>
<p>To prevent this, B-Tree-based databases use complex locking mechanisms, and that adds overhead and unpredictability.</p>
<p>I should raise a caution here: reads in LSM tree storage are not always predictable. A read can be fast if the data is in memory, and it can be very slow if multiple SSTables have to be searched to find the key.</p>
<p>Having said that, you don't have to worry about performance bottlenecks caused by locks. In B-Tree storage, your read query can be slow because a write query is holding a lock. In simple low-concurrency use cases, you will mostly get amazing read performance from a B-Tree structure, but this advantage wears off as concurrency increases.</p>
<p>The LSM tree was designed for highly concurrent, write-heavy use cases, and occasionally slower reads are the trade-off.</p>
<p>The takeaway, which should help you make better design decisions, is that B-Trees are better for read-heavy workloads. Reads are generally faster and more consistent, but performance can have unpredictable outliers under high write concurrency due to locking.</p>
<p>An LSM tree is better for write-heavy workloads. Writes are much faster. Reads are generally slower and more variable, but their performance profile is more predictable under high write concurrency because there is no read-write locking.</p>
<p>Let's implement an SSTable to see how it works.</p>
<p><strong>The write path:</strong></p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">writeSSTable</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(memtable *MemTable[K, V], path <span class="hljs-keyword">string</span>)</span> <span class="hljs-params">(*SSTable[K, V], error)</span></span> {
    file, err := os.Create(path)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    <span class="hljs-keyword">defer</span> file.Close()

    pairs := <span class="hljs-built_in">make</span>([]Pair[K, V], <span class="hljs-number">0</span>, <span class="hljs-built_in">len</span>(memtable.data))
    <span class="hljs-keyword">for</span> k, v := <span class="hljs-keyword">range</span> memtable.data {
        pairs = <span class="hljs-built_in">append</span>(pairs, Pair[K, V]{Key: k, Value: v})
    }

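    <span class="hljs-comment">// Sort by key. For simplicity this assumes K is string; the type assertion panics for any other key type.</span>
    sort.Slice(pairs, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(i, j <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">bool</span></span> {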
        <span class="hljs-keyword">return</span> any(pairs[i].Key).(<span class="hljs-keyword">string</span>) &lt; any(pairs[j].Key).(<span class="hljs-keyword">string</span>)
    })

    encoder := gob.NewEncoder(file)
    <span class="hljs-keyword">for</span> _, pair := <span class="hljs-keyword">range</span> pairs {
        <span class="hljs-keyword">if</span> err := encoder.Encode(pair); err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
        }
    }

    <span class="hljs-keyword">return</span> &amp;SSTable[K, V]{path: path}, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>The following things are important to note from the above code:</p>
<ol>
<li><p><code>sort.Slice</code>: Remember the ordering I mentioned earlier? We store data in the SSTable sorted by key, and we'll see how the read path takes advantage of that.</p>
</li>
<li><p>I have used Go's <code>encoding/gob</code> package. An encoder makes life simpler because it converts Go data structures to a binary stream that can be stored on disk, and a decoder converts that stream back. It handles all the complexity of representing types, field names, and values in a standardized binary format, so that you don't have to.</p>
</li>
</ol>
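<p>If gob is new to you, here is a minimal round-trip sketch that runs on its own. It encodes a few records into an in-memory buffer and decodes them back until <code>io.EOF</code>, which is the same loop shape the SSTable read path uses. The non-generic <code>Pair</code> type and the <code>roundTrip</code> helper are assumptions made for this illustration, not part of the database code.</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
)

// Pair is a simplified, non-generic stand-in for the article's Pair[K, V].
// gob only encodes exported fields, so Key and Value are capitalized.
type Pair struct {
	Key   string
	Value string
}

// roundTrip (a hypothetical helper) encodes pairs into a buffer with one
// encoder, then decodes them back with one decoder until the stream ends.
func roundTrip(pairs []Pair) ([]Pair, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for _, p := range pairs {
		if err := enc.Encode(p); err != nil {
			return nil, err
		}
	}

	dec := gob.NewDecoder(&buf)
	var out []Pair
	for {
		var p Pair
		err := dec.Decode(&p)
		if err == io.EOF {
			return out, nil // clean end of stream
		}
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
}

func main() {
	got, err := roundTrip([]Pair{{Key: "a", Value: "apple"}, {Key: "b", Value: "banana"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

<p>Swapping the buffer for an <code>*os.File</code> gives you the file-backed version used in <code>writeSSTable</code>.</p>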
<p><strong>The read path:</strong></p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(s *SSTable[K, V])</span> <span class="hljs-title">Get</span><span class="hljs-params">(key K)</span> <span class="hljs-params">(V, error)</span></span> {
    file, err := os.Open(s.path)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">var</span> zero V
        <span class="hljs-keyword">return</span> zero, err
    }
    <span class="hljs-keyword">defer</span> file.Close()

    decoder := gob.NewDecoder(file)

    <span class="hljs-keyword">for</span> {
        <span class="hljs-keyword">var</span> pair Pair[K, V]
        <span class="hljs-keyword">if</span> err := decoder.Decode(&amp;pair); err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">if</span> err == io.EOF {
                <span class="hljs-keyword">break</span>
            }
            <span class="hljs-keyword">var</span> zero V
            <span class="hljs-keyword">return</span> zero, err
        }

        <span class="hljs-comment">// for simple comparison we are assuming key is just string</span>
        keyInDB := any(pair.Key).(<span class="hljs-keyword">string</span>)
        <span class="hljs-keyword">if</span> keyInDB == any(key).(<span class="hljs-keyword">string</span>) {
            <span class="hljs-keyword">if</span> any(pair.Value).(<span class="hljs-keyword">string</span>) == TOMBSTONE {
                <span class="hljs-keyword">var</span> zero V
                <span class="hljs-keyword">return</span> zero, ErrDeleted
            }
            <span class="hljs-keyword">return</span> pair.Value, <span class="hljs-literal">nil</span>
        }

        <span class="hljs-keyword">if</span> keyInDB &gt; any(key).(<span class="hljs-keyword">string</span>) {
            <span class="hljs-keyword">var</span> zero V
            <span class="hljs-keyword">return</span> zero, ErrNotFound
        }
    }

    <span class="hljs-keyword">var</span> zero V
    <span class="hljs-keyword">return</span> zero, ErrNotFound
}
</code></pre>
<p>On the read path, look at <code>keyInDB &gt; any(key).(string)</code>. This is one example of how we take advantage of storing data in sorted key order. The moment we find a key in the SSTable <strong>that is greater than the key</strong> we are looking for, we stop looking: every remaining key will be greater still, so our key can't be in this file.</p>
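<p>To see the early exit in isolation, here is a small in-memory sketch. It assumes the keys are already sorted, and the <code>scanSorted</code> helper is a name I made up for this illustration. It reports how many keys it had to inspect before it could give an answer.</p>

```go
package main

import "fmt"

// scanSorted walks keys in ascending order and stops at the first key that
// is greater than target. It returns whether target was found and how many
// keys were inspected along the way.
func scanSorted(keys []string, target string) (found bool, inspected int) {
	for _, k := range keys {
		inspected++
		if k == target {
			return true, inspected
		}
		if k > target {
			// Everything after k is larger still, so target cannot appear.
			return false, inspected
		}
	}
	return false, inspected
}

func main() {
	keys := []string{"apple", "banana", "cherry", "mango", "peach"}
	found, n := scanSorted(keys, "blueberry")
	fmt.Printf("found=%v inspected=%d of %d\n", found, n, len(keys))
}
```

<p>Looking up "blueberry" stops at "cherry", after three of the five keys. With unsorted keys, a miss would always cost a full scan.</p>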
<p>Now that you have implemented the SSTable, you just have to decide when to flush data from the MemTable to an SSTable. You can define a maximum size for the MemTable and flush it to disk on the write path when that size is reached.</p>
<p>I am skipping some variables, boilerplate code, and simplifying things for brevity. I will post a GitHub link with the complete implementation later.</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> DB[K comparable, V any] <span class="hljs-keyword">struct</span> {
    memtable        *MemTable[K, V]
    maxMemtableSize <span class="hljs-keyword">int</span>
    memtableSize    <span class="hljs-keyword">int</span>
    sstables        []*SSTable[K, V]
    sstableCounter  <span class="hljs-keyword">int</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewDB</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(maxMemtableSize <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(*DB[K, V], error)</span></span> {
    sstables := <span class="hljs-built_in">make</span>([]*SSTable[K, V], <span class="hljs-number">0</span>)
    memtable := NewMemTable[K, V]()

    <span class="hljs-keyword">return</span> &amp;DB[K, V]{
        memtable:        memtable,
        maxMemtableSize: maxMemtableSize,
        sstables:        sstables,
    }, <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Put</span><span class="hljs-params">(key K, value V)</span> <span class="hljs-title">error</span></span> {
    db.memtable.Put(key, value)
    db.memtableSize++

    <span class="hljs-keyword">if</span> db.memtableSize &gt;= db.maxMemtableSize {
        <span class="hljs-keyword">if</span> err := db.flushMemtable(); err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> err
        }
    }

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">flushMemtable</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    sstablePath := fmt.Sprintf(<span class="hljs-string">"data-%d.sstable"</span>, db.sstableCounter)
    sstable, err := writeSSTable(db.memtable, sstablePath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    db.sstables = <span class="hljs-built_in">append</span>(db.sstables, sstable)
    db.sstableCounter++
    db.memtable = NewMemTable[K, V]()
    db.memtableSize = <span class="hljs-number">0</span>

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<p>You'll notice that every time we flush to disk, we write to a new SSTable versus using a single SSTable for the whole database. This is the immutability aspect we discussed earlier.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Get</span><span class="hljs-params">(key K)</span> <span class="hljs-params">(V, error)</span></span> {
    <span class="hljs-keyword">if</span> val, ok := db.memtable.Get(key); ok {
        <span class="hljs-keyword">return</span> val, <span class="hljs-literal">nil</span>
    }

    <span class="hljs-keyword">for</span> i := <span class="hljs-built_in">len</span>(db.sstables) - <span class="hljs-number">1</span>; i &gt;= <span class="hljs-number">0</span>; i-- {
        sstable := db.sstables[i]
        val, err := sstable.Get(key)

        <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">var</span> zero V

            <span class="hljs-keyword">if</span> err == ErrDeleted {
                <span class="hljs-keyword">return</span> zero, ErrNotFound
            }
            <span class="hljs-keyword">if</span> err == ErrNotFound {
                <span class="hljs-keyword">continue</span>
            }
            <span class="hljs-keyword">return</span> zero, err
        }

        <span class="hljs-keyword">return</span> val, <span class="hljs-literal">nil</span>
    }

    <span class="hljs-keyword">var</span> zero V
    <span class="hljs-keyword">return</span> zero, ErrNotFound
}
</code></pre>
<p>One important aspect to note on the read path is that we read the newest SSTable first. This is because the newest SSTable has the most recent value for a key.</p>
<p>So, say you have a key "a" with value "apple", and along the way you update the value for "a" to "apricot". The update would have been flushed to a new SSTable (for immutability), so if you were to read an older SSTable first, you'd get the older value. By reading the newer SSTable first, we get the correct value, and we don't have to worry about updating older SSTables.</p>
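<p>Here is the same idea as a tiny sketch you can run without touching the disk. Each map stands in for one flushed SSTable, with the oldest first in the slice. Both that representation and the <code>lookupNewestFirst</code> helper are assumptions for illustration only.</p>

```go
package main

import "fmt"

// lookupNewestFirst searches tables from the last element (newest flush) to
// the first (oldest flush) and returns the first value it finds for key.
func lookupNewestFirst(tables []map[string]string, key string) (string, bool) {
	for i := len(tables) - 1; i >= 0; i-- {
		if v, ok := tables[i][key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	tables := []map[string]string{
		{"a": "apple", "b": "banana"}, // older flush
		{"a": "apricot"},              // newer flush overrides "a"
	}

	v, _ := lookupNewestFirst(tables, "a")
	fmt.Println("a =", v)

	v, _ = lookupNewestFirst(tables, "b")
	fmt.Println("b =", v)
}
```

<p>The lookup for "a" returns "apricot" without ever opening the older table, while "b" falls through to the older table because the newer one doesn't contain it.</p>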
<h3 id="heading-the-wal-write-ahead-log-crash-recovery-made-simple">The WAL (Write Ahead Log): Crash Recovery Made Simple</h3>
<p>Now that we have an SSTable, our data is durable and we are safe from losing data upon crashes. Are we really safe, though? Think of a scenario where a crash happens before we flush to the SSTable. We know MemTable has a max threshold, and until then, data lives in memory. So we’re still prone to losing data if a crash happens before the flush.</p>
<p>This is where the WAL (Write Ahead Log) comes into the picture. It’s the single most important aspect of the LSM tree.</p>
<p>We’ll follow a simple rule: "Before we write a piece of data to the in-memory MemTable, we first write it to a log file on disk."</p>
<p>If a crash happens and the database starts again, the first thing it does is look for a WAL, read it if one is found, and replay all the data into MemTable. This process reconstructs the MemTable to the exact state it was in right before the crash.</p>
<p>It's natural to think that if all of your writes are first written to disk it will impact performance. You aren’t wrong, but at the same time there are nuances.</p>
<p>Writes to the WAL are different in that they are append-only, sequential writes, so no random disk seeks are required. On a traditional spinning hard drive (HDD), this is fast because the disk's read/write head does not have to move to a new location. On a modern solid-state drive (SSD), sequential writes are also generally faster than random writes.</p>
<p>Whatever small performance impact we accept is a trade-off for durability.</p>
<p>Now that we know what the WAL does, let's implement it. Its two key functions are to write each entry to a file on disk and to rebuild the MemTable on startup.</p>
<p>Note that in the factory function below (NewWAL), the file has been opened in append mode.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewWAL</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(path <span class="hljs-keyword">string</span>)</span> <span class="hljs-params">(*WAL[K, V], error)</span></span> {
    file, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, <span class="hljs-number">0644</span>)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    <span class="hljs-keyword">return</span> &amp;WAL[K, V]{
        file:    file,
        encoder: gob.NewEncoder(file),
    }, <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(wal *WAL[K, V])</span> <span class="hljs-title">Write</span><span class="hljs-params">(key K, value V)</span> <span class="hljs-title">error</span></span> {
    entry := WALEntry[K, V]{Key: key, Value: value}
    <span class="hljs-keyword">return</span> wal.encoder.Encode(&amp;entry)
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">ReplayWAL</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(path <span class="hljs-keyword">string</span>)</span> <span class="hljs-params">(*MemTable[K, V], error)</span></span> {
    memtable := NewMemTable[K, V]()
    file, err := os.Open(path)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">if</span> os.IsNotExist(err) {
            <span class="hljs-comment">// If the file doesn't exist, that's fine. Return an empty memtable.</span>
            <span class="hljs-keyword">return</span> memtable, <span class="hljs-literal">nil</span>
        }
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    <span class="hljs-keyword">defer</span> file.Close()

    decoder := gob.NewDecoder(file)
    <span class="hljs-keyword">for</span> {
        <span class="hljs-keyword">var</span> entry WALEntry[K, V]
        <span class="hljs-keyword">if</span> err := decoder.Decode(&amp;entry); err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">if</span> err == io.EOF {
                <span class="hljs-keyword">break</span> <span class="hljs-comment">// We've reached the end of the file.</span>
            }
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
        }
        memtable.Put(entry.Key, entry.Value)
    }

    <span class="hljs-keyword">return</span> memtable, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>A couple notes about the above code:</p>
<ul>
<li><p><strong>NewWAL</strong>: This function creates an instance of the WAL for our database. It takes in the file path where the WAL data should be stored and opens the file using Go’s <code>os.OpenFile</code> function. Also, a <code>gob.Encoder</code> is initialized to simplify the encoding of Go data structures into binary format for efficient storage in the WAL file.</p>
</li>
<li><p><strong>Write</strong>: The Write function appends a new key-value pair to the WAL file. Every write operation to the MemTable first calls this function so that the update is recorded in the log before it reaches memory.</p>
</li>
<li><p><strong>ReplayWAL:</strong> This is the most important function. In the event of a crash, this function comes to our rescue by reconstructing the MemTable from the WAL file. It replays the entries stored in the WAL file and writes them into the MemTable. Here is how it works:</p>
<ol>
<li><p>The function begins by creating a new empty MemTable instance that will be populated with key-value pairs.</p>
</li>
<li><p>It then attempts to open the WAL file. If the file does not exist (for example, on the first startup), the function assumes there’s nothing to recover and simply returns the empty MemTable.</p>
</li>
<li><p>A <code>gob.Decoder</code> is used to read the WAL file, which helps to deserialize the saved binary-encoded <code>WALEntry</code> data back into key-value pairs.</p>
</li>
<li><p>For each successfully decoded <code>WALEntry</code>, the key-value pair is added back into the MemTable using the <code>Put</code> function.</p>
</li>
</ol>
</li>
</ul>
<p>With this, the database can fully recover its state by replaying all the operations recorded in the WAL.</p>
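<p>To make the replay behaviour concrete, here is a self-contained sketch that writes a short log to a temporary directory and then rebuilds the state from it. The <code>LogEntry</code> type and the <code>appendAll</code>, <code>replay</code>, and <code>writeAndReplay</code> helpers are hypothetical names for this illustration. It also assumes the whole log is written by a single gob encoder in one session.</p>

```go
package main

import (
	"encoding/gob"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// LogEntry is a simplified, non-generic stand-in for the article's WALEntry.
type LogEntry struct {
	Key   string
	Value string
}

// appendAll opens the log in append mode and writes entries in order.
func appendAll(path string, entries []LogEntry) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	enc := gob.NewEncoder(f)
	for _, e := range entries {
		if err := enc.Encode(e); err != nil {
			f.Close()
			return err
		}
	}
	return f.Close()
}

// replay rebuilds a map by applying entries in the order they were logged,
// so a later entry for the same key overwrites an earlier one.
func replay(path string) (map[string]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	state := make(map[string]string)
	dec := gob.NewDecoder(f)
	for {
		var e LogEntry
		err := dec.Decode(&e)
		if err == io.EOF {
			return state, nil
		}
		if err != nil {
			return nil, err
		}
		state[e.Key] = e.Value
	}
}

// writeAndReplay logs entries to a throwaway file and replays them.
func writeAndReplay(entries []LogEntry) (map[string]string, error) {
	dir, err := os.MkdirTemp("", "wal-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "db.wal")
	if err := appendAll(path, entries); err != nil {
		return nil, err
	}
	return replay(path)
}

func main() {
	state, err := writeAndReplay([]LogEntry{
		{Key: "a", Value: "apple"},
		{Key: "b", Value: "banana"},
		{Key: "a", Value: "apricot"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(state["a"], state["b"])
}
```

<p>Because entries are applied in log order, the replayed state holds "apricot" for "a", which is exactly what the MemTable held before the simulated crash.</p>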
<p>As far as integration is concerned, every time you create a new DB, you should first replay any existing WAL and then open the WAL in append mode. Also, Put should write to the WAL before it touches the MemTable.</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewDB</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(maxMemtableSize <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(*DB[K, V], error)</span></span> {
    walPath := <span class="hljs-string">"db.wal"</span>
    memtable, err := ReplayWAL[K, V](walPath) <span class="hljs-comment">// this is the replay</span>
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    <span class="hljs-comment">//open WAL in append mode</span>
    wal, err := NewWAL[K, V](walPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    <span class="hljs-keyword">return</span> &amp;DB[K, V]{
        memtable:        memtable,
        maxMemtableSize: maxMemtableSize,
        memtableSize:    <span class="hljs-built_in">len</span>(memtable.data),
        wal:             wal,
        walPath:         walPath,
        sstables:        <span class="hljs-built_in">make</span>([]*SSTable[K, V], <span class="hljs-number">0</span>),
    }, <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Put</span><span class="hljs-params">(key K, value V)</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-comment">// First, write to the WAL.</span>
    <span class="hljs-keyword">if</span> err := db.wal.Write(key, value); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    db.memtable.Put(key, value)
    db.memtableSize++

    <span class="hljs-keyword">if</span> db.memtableSize &gt;= db.maxMemtableSize {
        <span class="hljs-keyword">if</span> err := db.flushMemtable(); err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> err
        }
    }

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<h3 id="heading-manifest-file-tracking-the-state-of-the-database">Manifest File: Tracking the State of the Database</h3>
<p>By this point, the database is pretty robust and durable, but an important question lingers: upon restarts, how does our database know about SSTables? Knowing about all SSTables is important for fetching data.</p>
<p>Say our database crashed after writing several SSTables. Without knowing about these SSTables, the database will start with an empty slice of SSTables, and all of our old data is effectively gone: the files are still on disk, but queries won't read them.</p>
<p>To solve this problem, we introduce an inventory of SSTables called MANIFEST. Every time we successfully create a new SSTable in flushMemtable, we add its path to the MANIFEST and save the MANIFEST to disk.</p>
<p>The very first thing NewDB does on startup is read the MANIFEST. This gives it the list of all the file paths, and it uses this list to perfectly reconstruct its SSTables slice.</p>
<p>In short, MANIFEST determines the state of the DB.</p>
<p>The <code>Manifest</code> struct holds a slice of SSTable paths. <code>ReadManifest</code> reads the MANIFEST file to restore the knowledge of the SSTables, and <code>WriteManifest</code> writes a new MANIFEST file.</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> Manifest <span class="hljs-keyword">struct</span> {
    SSTablePaths []<span class="hljs-keyword">string</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">ReadManifest</span><span class="hljs-params">(path <span class="hljs-keyword">string</span>)</span> <span class="hljs-params">(*Manifest, error)</span></span> {
    file, err := os.Open(path)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">if</span> os.IsNotExist(err) {
            <span class="hljs-comment">// If manifest doesn't exist, return empty manifest</span>
            <span class="hljs-keyword">return</span> &amp;Manifest{SSTablePaths: []<span class="hljs-keyword">string</span>{}}, <span class="hljs-literal">nil</span>
        }
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    <span class="hljs-keyword">defer</span> file.Close()

    <span class="hljs-keyword">var</span> manifest Manifest
    decoder := gob.NewDecoder(file)
    err = decoder.Decode(&amp;manifest)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    <span class="hljs-keyword">return</span> &amp;manifest, <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">WriteManifest</span><span class="hljs-params">(path <span class="hljs-keyword">string</span>, manifest *Manifest)</span> <span class="hljs-title">error</span></span> {
    tmpPath := path + <span class="hljs-string">".tmp"</span>
    file, err := os.Create(tmpPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    encoder := gob.NewEncoder(file)
    <span class="hljs-keyword">if</span> err := encoder.Encode(manifest); err != <span class="hljs-literal">nil</span> {
        file.Close()
        os.Remove(tmpPath)
        <span class="hljs-keyword">return</span> err
    }

    <span class="hljs-keyword">if</span> err := file.Close(); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-comment">// Atomic Rename</span>
    <span class="hljs-keyword">return</span> os.Rename(tmpPath, path)
}
</code></pre>
<p>You'll notice that we aren’t modifying the existing MANIFEST file directly. Instead, we’re creating a temporary file, writing all the data to it, closing it, and then atomically renaming it to replace the old MANIFEST.</p>
<p>The <code>os.Rename()</code> operation is atomic on most filesystems, meaning it either completely succeeds or completely fails – there's no in-between state. This is crucial because if the system crashes while updating the MANIFEST, we need to ensure we don't end up with a corrupted file. We’ll discuss this again below when we’re talking about compaction.</p>
<p>With this approach, we either have the old valid MANIFEST or the new valid MANIFEST, never a partially written corrupted file.</p>
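<p>The same write-then-rename pattern works for any small file, not just the MANIFEST. Below is a standalone sketch of it. The <code>writeFileAtomic</code> and <code>demo</code> helpers are hypothetical, and the <code>Sync</code> call before the rename is an extra step I've assumed here that <code>WriteManifest</code> above does not perform.</p>

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temporary file next to path, flushes it,
// and then renames it over path, so readers see either the old contents or
// the new contents in full.
func writeFileAtomic(path string, data []byte) error {
	tmp := path + ".tmp"
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	// Ask the OS to flush the temp file to storage before it takes the
	// final name (an assumption added for this sketch).
	if err := f.Sync(); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}

// demo replaces a file twice and reports its final contents and whether the
// temporary file was cleaned up by the rename.
func demo() (contents string, tmpGone bool, err error) {
	dir, err := os.MkdirTemp("", "manifest-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "MANIFEST")
	if err := writeFileAtomic(path, []byte("data-0.sstable\n")); err != nil {
		return "", false, err
	}
	if err := writeFileAtomic(path, []byte("data-0.sstable\ndata-1.sstable\n")); err != nil {
		return "", false, err
	}

	got, err := os.ReadFile(path)
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(path + ".tmp")
	return string(got), os.IsNotExist(statErr), nil
}

func main() {
	contents, tmpGone, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q tmpGone=%v\n", contents, tmpGone)
}
```

<p>After the second call, the file holds both paths and no <code>.tmp</code> file is left behind, because the rename moved it into place.</p>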
<p>From an integration standpoint, NewDB will read the manifest and set its SSTable slice based on that. The flush method, given that it writes to SSTable, will also write SSTable info to manifest to keep the db updated about new SSTables.</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> DB[K comparable, V any] <span class="hljs-keyword">struct</span> {
    memtable        *MemTable[K, V]
    maxMemtableSize <span class="hljs-keyword">int</span>
    memtableSize    <span class="hljs-keyword">int</span>
    sstables        []*SSTable[K, V]
    sstableCounter  <span class="hljs-keyword">int</span>
    wal             *WAL[K, V]
    walPath         <span class="hljs-keyword">string</span>
    manifest        *Manifest
    manifestPath    <span class="hljs-keyword">string</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewDB</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(maxMemtableSize <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(*DB[K, V], error)</span></span> {
    walPath := <span class="hljs-string">"db.wal"</span>
    memtable, err := ReplayWAL[K, V](walPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    wal, err := NewWAL[K, V](walPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    manifestPath := <span class="hljs-string">"MANIFEST"</span>
    manifest, err := ReadManifest(manifestPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    sstables := <span class="hljs-built_in">make</span>([]*SSTable[K, V], <span class="hljs-built_in">len</span>(manifest.SSTablePaths))
    <span class="hljs-keyword">for</span> i, path := <span class="hljs-keyword">range</span> manifest.SSTablePaths {
        sstables[i] = &amp;SSTable[K, V]{path: path}
    }

    <span class="hljs-keyword">return</span> &amp;DB[K, V]{
        memtable:        memtable,
        maxMemtableSize: maxMemtableSize,
        memtableSize:    <span class="hljs-built_in">len</span>(memtable.data),
        wal:             wal,
        walPath:         walPath,
        manifest:        manifest,
        manifestPath:    manifestPath,
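        sstables:        sstables,
        <span class="hljs-comment">// Resume numbering after the existing files so a restart doesn't overwrite data-0.sstable.</span>
        sstableCounter:  <span class="hljs-built_in">len</span>(sstables),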
    }, <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">flushMemtable</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    sstablePath := fmt.Sprintf(<span class="hljs-string">"data-%d.sstable"</span>, db.sstableCounter)
    sstable, err := writeSSTable(db.memtable, sstablePath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    db.sstables = <span class="hljs-built_in">append</span>(db.sstables, sstable)
    db.sstableCounter++

    db.manifest.SSTablePaths = <span class="hljs-built_in">append</span>(db.manifest.SSTablePaths, sstablePath)
    <span class="hljs-keyword">if</span> err := WriteManifest(db.manifestPath, db.manifest); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    db.memtable = NewMemTable[K, V]()
    db.memtableSize = <span class="hljs-number">0</span>

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<p>At this point, our DB has almost everything. It can write to memory (MemTable), persist to disk (SSTable), and recover from crashes (WAL and manifest). For completeness, it still needs update and delete support, so let’s look at those next.</p>
<h3 id="heading-update-and-delete-handling-mutability-in-an-immutable-system">Update and Delete: Handling Mutability in an Immutable System</h3>
<p>By this time, you should know that in an LSM storage system, data is never updated – rather, new data is written. For example, if you have a data pair ("a": "apple") and over time this has to change to the pair ("a": "apricot"), a new pair will be written to a different SSTable without any change to the existing pair. And yes, this leads to duplicates.</p>
<p>Also, interestingly, data isn't even deleted during write operations. In a traditional design, deleting ("a": "apple") means finding where the pair lives on disk and removing it, which makes writes slow. So instead, a clever mechanism is used: rather than removing the data directly, you mark the key as deleted by writing a special <code>TOMBSTONE</code> value.</p>
<p>So, to delete ("a": "apple"), you wouldn't remove the key from any SSTable. Instead, you’d write a new key-value pair such as ("a": "TOMBSTONE"). Here’s what this achieves:</p>
<ul>
<li><p>The <code>"TOMBSTONE"</code> serves as a marker within the SSTable, telling the system that the key <code>"a"</code> has been logically deleted, even though it still physically exists in older SSTables.</p>
</li>
<li><p>During future reads, any key whose latest value is <code>"TOMBSTONE"</code> will be treated as deleted, ensuring that the entry no longer shows up in query results.</p>
</li>
<li><p>This mechanism avoids the need for immediate deletions or expensive in-place updates, making write operations faster and simpler.</p>
</li>
</ul>
<p>But this also raises the following questions:</p>
<ol>
<li><p>How do you read accurately when there are duplicates? That is, how do users get ("a": "apricot") instead of ("a": "apple"), given that the former is the latest value?</p>
</li>
<li><p>How do you handle deletes to ensure deleted keys are not returned (and instead, a proper error message is returned)?</p>
</li>
<li><p>Stale and deleted data is garbage. How do you get rid of it to save storage space?</p>
</li>
</ol>
<p>As long as data is in the MemTable (an in-memory map), duplicates are easy to handle: new values just replace old values.</p>
<p>But it gets tricky when data is spread across multiple SSTables. There is a very simple solution to this problem: read newer SSTables before older ones. That way, you always find the latest value for a given key first and can exit early.</p>
<p>The following code in the read path ensures reading from newer SSTables before moving to older ones (note the loop starts from <code>len(db.sstables) - 1</code>):</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Get</span><span class="hljs-params">(key K)</span> <span class="hljs-params">(V, error)</span></span> {
    <span class="hljs-comment">// Check memtable first</span>
    <span class="hljs-keyword">if</span> val, ok := db.memtable.Get(key); ok {
        <span class="hljs-keyword">if</span> any(val).(<span class="hljs-keyword">string</span>) == TOMBSTONE {
            <span class="hljs-keyword">var</span> zero V
            <span class="hljs-keyword">return</span> zero, ErrNotFound
        }
        <span class="hljs-keyword">return</span> val, <span class="hljs-literal">nil</span>
    }

    <span class="hljs-comment">// Then check sstables from newest to oldest</span>
    <span class="hljs-keyword">for</span> i := <span class="hljs-built_in">len</span>(db.sstables) - <span class="hljs-number">1</span>; i &gt;= <span class="hljs-number">0</span>; i-- {
        sstable := db.sstables[i]
        val, err := sstable.Get(key)

        <span class="hljs-keyword">if</span> err == <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> val, <span class="hljs-literal">nil</span>
        }

        <span class="hljs-keyword">var</span> zero V
        <span class="hljs-keyword">if</span> err == ErrDeleted {
            <span class="hljs-keyword">return</span> zero, ErrNotFound
        }
        <span class="hljs-keyword">if</span> err == ErrNotFound {
            <span class="hljs-keyword">continue</span>
        }
        <span class="hljs-keyword">return</span> zero, err
    }

    <span class="hljs-keyword">var</span> zero V
    <span class="hljs-keyword">return</span> zero, ErrNotFound
}
</code></pre>
<p>And for delete, you could just add a new value "TOMBSTONE":</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Delete</span><span class="hljs-params">(key K)</span> <span class="hljs-title">error</span></span> {
    <span class="hljs-keyword">return</span> db.Put(key, any(TOMBSTONE).(V))
}
</code></pre>
<p>Note: This implementation assumes V is a string type. In a production system, you would need a more robust way to handle tombstones that works with any value type.</p>
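<p>One possible way around the string-only restriction is to store an explicit deletion flag next to the value instead of a magic string. The following is a minimal sketch of that idea, not what the accompanying repo does: <code>Entry</code>, <code>Store</code>, and the <code>Deleted</code> field are hypothetical names, and a plain map stands in for the MemTable. The rest of this article keeps the simpler string tombstone.</p>
<pre><code class="lang-go">package main

import (
    "errors"
    "fmt"
)

// ErrNotFound mirrors the error name used in the article's read path.
var ErrNotFound = errors.New("key not found")

// Entry is a hypothetical wrapper: deletion is recorded in a separate flag
// instead of a magic "TOMBSTONE" string, so V can be any type.
type Entry[V any] struct {
    Value   V
    Deleted bool
}

// Store is a tiny in-memory stand-in for the MemTable.
type Store[K comparable, V any] struct {
    data map[K]Entry[V]
}

func NewStore[K comparable, V any]() *Store[K, V] {
    return &amp;Store[K, V]{data: make(map[K]Entry[V])}
}

func (s *Store[K, V]) Put(key K, value V) {
    s.data[key] = Entry[V]{Value: value}
}

// Delete writes a tombstone entry rather than removing the key.
func (s *Store[K, V]) Delete(key K) {
    s.data[key] = Entry[V]{Deleted: true}
}

func (s *Store[K, V]) Get(key K) (V, error) {
    entry, ok := s.data[key]
    if !ok || entry.Deleted {
        var zero V
        return zero, ErrNotFound
    }
    return entry.Value, nil
}

func main() {
    s := NewStore[string, int]()
    s.Put("a", 1)
    s.Delete("a")
    _, err := s.Get("a")
    fmt.Println(errors.Is(err, ErrNotFound)) // true
}
</code></pre>
<p>Because the flag travels with the value, the same check would work in the MemTable, in the SSTable reader, and during compaction, whatever type <code>V</code> is.</p>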
<p>Handling deleted keys becomes simple now. You can check for the value (in MemTable and SSTable) and return an error if the value is "TOMBSTONE":</p>
<pre><code class="lang-go"><span class="hljs-comment">// db.go</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Get</span><span class="hljs-params">(key K)</span> <span class="hljs-params">(V, error)</span></span> {
    <span class="hljs-keyword">if</span> val, ok := db.memtable.Get(key); ok {
        <span class="hljs-keyword">if</span> any(val).(<span class="hljs-keyword">string</span>) == TOMBSTONE { <span class="hljs-comment">//got TOMBSTONE, return zero</span>
            <span class="hljs-keyword">var</span> zero V
            <span class="hljs-keyword">return</span> zero, ErrNotFound
        }
        <span class="hljs-keyword">return</span> val, <span class="hljs-literal">nil</span>
    }
    <span class="hljs-comment">// ... rest of function</span>
}
</code></pre>
<pre><code class="lang-go"><span class="hljs-comment">// sstable.go</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(s *SSTable[K, V])</span> <span class="hljs-title">Get</span><span class="hljs-params">(key K)</span> <span class="hljs-params">(V, error)</span></span> {
    <span class="hljs-comment">// ... earlier code</span>

    keyInDB := any(pair.Key).(<span class="hljs-keyword">string</span>)
    <span class="hljs-keyword">if</span> keyInDB == any(key).(<span class="hljs-keyword">string</span>) {
        <span class="hljs-keyword">if</span> any(pair.Value).(<span class="hljs-keyword">string</span>) == TOMBSTONE {
            <span class="hljs-keyword">var</span> zero V
            <span class="hljs-keyword">return</span> zero, ErrDeleted
        }
        <span class="hljs-keyword">return</span> pair.Value, <span class="hljs-literal">nil</span>
    }

    <span class="hljs-comment">// ... rest of function</span>
}
</code></pre>
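<p>To see both rules working together (newest first, and tombstones hiding older values), here is a small self-contained sketch. It models each SSTable as an in-memory map purely for illustration: the <code>lookup</code> function, the map-based tables, and the literal value of <code>TOMBSTONE</code> are assumptions, not the article's on-disk format.</p>
<pre><code class="lang-go">package main

import (
    "errors"
    "fmt"
)

// TOMBSTONE is the assumed marker value for deleted keys.
const TOMBSTONE = "TOMBSTONE"

var ErrNotFound = errors.New("key not found")

// lookup scans tables from newest (last) to oldest (first) and stops at the
// first table that contains the key, so newer writes shadow older ones.
func lookup(tables []map[string]string, key string) (string, error) {
    for i := len(tables) - 1; i &gt;= 0; i-- {
        val, ok := tables[i][key]
        if !ok {
            continue // not in this table, try an older one
        }
        if val == TOMBSTONE {
            // Deleted: stop here instead of falling through to older tables.
            return "", ErrNotFound
        }
        return val, nil
    }
    return "", ErrNotFound
}

func main() {
    tables := []map[string]string{
        {"a": "apple", "b": "banana"}, // oldest
        {"a": "apricot"},              // update to "a"
        {"b": TOMBSTONE},              // newest: delete of "b"
    }

    val, err := lookup(tables, "a")
    fmt.Println(val, err) // apricot &lt;nil&gt;

    _, err = lookup(tables, "b")
    fmt.Println(err) // key not found
}
</code></pre>
<p>Note that the tombstone check has to return immediately. If the loop continued past it, the older ("b": "banana") pair would resurface.</p>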
<h3 id="heading-compaction-cleaning-up-stale-and-deleted-data">Compaction: Cleaning Up Stale and Deleted Data</h3>
<p>We have handled all the scenarios so far except for one. It isn’t about serving read/write traffic, but it’s important for the health of the storage system.</p>
<p>Over time, the system accumulates a lot of garbage (stale and deleted data) and needs a garbage collection mechanism. Compaction is a background maintenance process that cleans up and reorganizes data in an LSM storage system.</p>
<p>As the system grows, more and more SSTables are created, so a single read may need to open and scan several files before it finds a value. By compacting (or merging) multiple SSTables into a single one, you reduce that disk overhead. Along the way, you should also permanently delete data that has been marked with a TOMBSTONE.</p>
<p>Note: Compaction is the only time data is permanently deleted from an LSM storage system.</p>
<p>To grasp the concept of compaction, we are going to implement something called <code>Full Compaction</code>, where you merge all the existing SSTables into one larger SSTable. In real-world database implementations, the strategy is more complex and usually involves multi-level compaction.</p>
<h4 id="heading-compaction-algorithm">Compaction Algorithm</h4>
<p>We’re going to implement <code>K-way merge</code> to perform compaction. It’s a general algorithm that takes K sorted lists and merges them into a single, combined sorted list. In this case, the K sorted lists are the SSTables, and you are going to merge all of them into a single SSTable.</p>
<p>Our SSTables are already sorted, so the idea of merging them involves:</p>
<ol>
<li><p>Taking the smallest (first) key from each SSTable</p>
</li>
<li><p>Finding the smallest among those keys</p>
</li>
<li><p>Storing that smallest key in the new SSTable file</p>
</li>
<li><p>Fetching the next key from the SSTable that the smallest key came from</p>
</li>
<li><p>Repeating this process until all SSTables are exhausted</p>
</li>
</ol>
<p>Here’s a simple example with numbers:</p>
<pre><code class="lang-bash">Assume we have 3 sorted lists:
List A : [4, 8, 12]
List B : [3, 9]
List C : [7, 10, 11]

In the first iteration, we will take (4, 3, 7) because those are the smallest keys for individual lists.
We find the smallest among those, which is 3, and store 3 in the result list.

In the second iteration, we will take (4, 9, 7). Note that 3 has already been accounted for.
We pick 4 and store it to the result list.

Repeating this until all lists are empty, we get:
Result List : [3, 4, 7, 8, 9, 10, 11, 12]
</code></pre>
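<p>The walkthrough above translates almost directly into code. The following is a minimal sketch that assumes plain integer slices instead of SSTables. <code>mergeSorted</code> is a hypothetical helper, and it finds the minimum with a simple linear scan over the head of each list:</p>
<pre><code class="lang-go">package main

import "fmt"

// mergeSorted merges already-sorted lists into one sorted list.
// On every iteration it looks at the first unread element of each list,
// picks the smallest, and advances only the list it came from.
func mergeSorted(lists [][]int) []int {
    next := make([]int, len(lists)) // next[i] is the read position in lists[i]
    var result []int
    for {
        best := -1 // index of the list holding the current smallest head
        for i, list := range lists {
            if next[i] &gt;= len(list) {
                continue // this list is exhausted
            }
            if best == -1 || list[next[i]] &lt; lists[best][next[best]] {
                best = i
            }
        }
        if best == -1 {
            return result // every list is exhausted
        }
        result = append(result, lists[best][next[best]])
        next[best]++
    }
}

func main() {
    a := []int{4, 8, 12}
    b := []int{3, 9}
    c := []int{7, 10, 11}
    fmt.Println(mergeSorted([][]int{a, b, c})) // [3 4 7 8 9 10 11 12]
}
</code></pre>
<p>Scanning every head costs O(K) comparisons per output element. A min-heap brings that down to O(log K), which is why the real merge uses one.</p>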
<p>The core part of this algorithm is finding the smallest key among the current keys from the individual SSTables. Fortunately, a data structure called a <code>Min-Heap</code> does this for us. So, you’re going to take the smallest key from each SSTable and push them all onto a min-heap, which always hands back the smallest one. We’re going to use Go’s <code>container/heap</code> package, which provides the heap operations (push, pop, and keeping the minimum at the top) for any type that implements its interface.</p>
<p>The min-heap needs you to provide a function that decides which of two keys is smaller, as it uses that ordering to determine the global minimum. The following function is implemented for that:</p>
<pre><code class="lang-go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(h MinHeap[K, V])</span> <span class="hljs-title">Less</span><span class="hljs-params">(i, j <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">bool</span></span> {
    <span class="hljs-comment">// again for simple comparison assume string key</span>
    keyI := any(h[i].Pair.Key).(<span class="hljs-keyword">string</span>)
    keyJ := any(h[j].Pair.Key).(<span class="hljs-keyword">string</span>)
    <span class="hljs-keyword">if</span> keyI != keyJ {
        <span class="hljs-keyword">return</span> keyI &lt; keyJ
    }
    <span class="hljs-comment">// this is needed for the case when you have duplicate keys,</span>
    <span class="hljs-comment">// you will want to pick the one that is in newer sstable because that is latest</span>
    <span class="hljs-keyword">return</span> h[i].SSTableIndex &gt; h[j].SSTableIndex
}
</code></pre>
<p>One important aspect of the <code>Less</code> function above is how it handles ties. If two pairs have the same key, which one is lesser? Take the pairs <code>(a: apple)</code> and <code>(a: apricot)</code>, where (a: apple) is the older value (written to an older SSTable). Which pair should the Less function treat as the lesser value?</p>
<p>The answer is the one in the newer SSTable (see <code>h[i].SSTableIndex &gt; h[j].SSTableIndex</code>). This ensures that the pair from the SSTable with the higher index (that is, the latest) is the lesser value, so (a: apricot) wins and is popped first. This is important because compaction must always keep the newest value of a given key.</p>
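<p>To watch the tie-breaking rule in action, here is a small self-contained sketch. It assumes string keys and values and defines simplified, non-generic stand-ins for <code>Pair</code>, <code>HeapItem</code>, and <code>MinHeap</code> (the article's real types are generic), then pushes two versions of key <code>a</code> and pops everything:</p>
<pre><code class="lang-go">package main

import (
    "container/heap"
    "fmt"
)

// Simplified stand-ins for the article's generic Pair and HeapItem types.
type Pair struct {
    Key, Value string
}

type HeapItem struct {
    Pair         Pair
    SSTableIndex int // higher index = newer SSTable
}

// MinHeap implements heap.Interface over HeapItem pointers.
type MinHeap []*HeapItem

func (h MinHeap) Len() int      { return len(h) }
func (h MinHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h MinHeap) Less(i, j int) bool {
    if h[i].Pair.Key != h[j].Pair.Key {
        return h[i].Pair.Key &lt; h[j].Pair.Key
    }
    // Same key: the item from the newer SSTable sorts first.
    return h[i].SSTableIndex &gt; h[j].SSTableIndex
}
func (h *MinHeap) Push(x any) { *h = append(*h, x.(*HeapItem)) }
func (h *MinHeap) Pop() any {
    old := *h
    n := len(old)
    item := old[n-1]
    *h = old[:n-1]
    return item
}

func main() {
    h := &amp;MinHeap{}
    heap.Push(h, &amp;HeapItem{Pair: Pair{"a", "apple"}, SSTableIndex: 0})
    heap.Push(h, &amp;HeapItem{Pair: Pair{"b", "banana"}, SSTableIndex: 0})
    heap.Push(h, &amp;HeapItem{Pair: Pair{"a", "apricot"}, SSTableIndex: 1})

    for h.Len() &gt; 0 {
        item := heap.Pop(h).(*HeapItem)
        fmt.Println(item.Pair.Key, item.Pair.Value)
    }
    // Output:
    // a apricot
    // a apple
    // b banana
}
</code></pre>
<p>The newest version of <code>a</code> comes out first, so a merge loop can write it and then skip every later item that has the same key.</p>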
<p>The code for compaction looks something like the following. Note that we’re discarding deleted values (TOMBSTONE) and the older values.</p>
<pre><code class="lang-go"><span class="hljs-comment">// put this in a new file compaction.go</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">MergeSSTables</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(sstables []*SSTable[K, V], newPath <span class="hljs-keyword">string</span>)</span> <span class="hljs-params">(*SSTable[K, V], error)</span></span> {
    newFile, err := os.Create(newPath) <span class="hljs-comment">// create a new sstable file</span>
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }
    <span class="hljs-keyword">defer</span> newFile.Close() <span class="hljs-comment">// close the file on return so the file descriptor is not leaked</span>

    newEncoder := gob.NewEncoder(newFile) <span class="hljs-comment">// initialize encoder for new SSTable file</span>


    files := <span class="hljs-built_in">make</span>([]*os.File, <span class="hljs-built_in">len</span>(sstables)) <span class="hljs-comment">// open all the sstables</span>
    decoders := <span class="hljs-built_in">make</span>([]*gob.Decoder, <span class="hljs-built_in">len</span>(sstables)) <span class="hljs-comment">// initialize one decoder per sstable file</span>
    <span class="hljs-keyword">for</span> i, sstable := <span class="hljs-keyword">range</span> sstables {
        files[i], err = os.Open(sstable.path)
        <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
        }
        <span class="hljs-keyword">defer</span> files[i].Close() <span class="hljs-comment">// close the file on return so the file descriptor is not leaked</span>
        decoders[i] = gob.NewDecoder(files[i])
    }

    <span class="hljs-comment">// read first pair from each sstable and store in a pair array</span>
    pairs := <span class="hljs-built_in">make</span>([]Pair[K, V], <span class="hljs-built_in">len</span>(decoders))
    emptySSTables := <span class="hljs-built_in">make</span>([]<span class="hljs-keyword">bool</span>, <span class="hljs-built_in">len</span>(decoders)) <span class="hljs-comment">// track empty sstables</span>
    <span class="hljs-keyword">for</span> i, decoder := <span class="hljs-keyword">range</span> decoders {
        <span class="hljs-keyword">if</span> err := decoder.Decode(&amp;pairs[i]); err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">if</span> err == io.EOF {
                emptySSTables[i] = <span class="hljs-literal">true</span>
                <span class="hljs-keyword">continue</span>
            }
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
        }
    }

    <span class="hljs-comment">// push those pairs onto heap</span>
    h := &amp;MinHeap[K, V]{}
    <span class="hljs-keyword">for</span> i, pair := <span class="hljs-keyword">range</span> pairs {
        <span class="hljs-keyword">if</span> !emptySSTables[i] {
            heap.Push(h, &amp;HeapItem[K, V]{Pair: pair, SSTableIndex: i})
        }
    }

    <span class="hljs-comment">// heap.Push already keeps the heap ordered, so this Init is redundant but harmless</span>
    heap.Init(h)

    <span class="hljs-keyword">var</span> lastKey K
    firstKey := <span class="hljs-literal">true</span>

    <span class="hljs-comment">// pop the min item from heap and store it into new sstable</span>
    <span class="hljs-keyword">for</span> h.Len() &gt; <span class="hljs-number">0</span> {
        item := heap.Pop(h).(*HeapItem[K, V])

        <span class="hljs-comment">// If this key is a duplicate of the last one we saw, skip it</span>
        <span class="hljs-keyword">if</span> !firstKey &amp;&amp; item.Pair.Key == lastKey {
            <span class="hljs-comment">// We only care about the version from the newest SSTable,</span>
            <span class="hljs-comment">// which we have already processed</span>
        } <span class="hljs-keyword">else</span> {
            <span class="hljs-keyword">if</span> any(item.Pair.Value).(<span class="hljs-keyword">string</span>) != TOMBSTONE {
                <span class="hljs-comment">// discard deleted</span>
                <span class="hljs-keyword">if</span> err := newEncoder.Encode(item.Pair); err != <span class="hljs-literal">nil</span> {
                    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
                }
            }
        }

        lastKey = item.Pair.Key
        firstKey = <span class="hljs-literal">false</span>

        <span class="hljs-comment">// Push the next item from the same SSTable into the heap</span>
        <span class="hljs-keyword">var</span> nextPair Pair[K, V]
        <span class="hljs-keyword">if</span> err := decoders[item.SSTableIndex].Decode(&amp;nextPair); err == <span class="hljs-literal">nil</span> {
            heap.Push(h, &amp;HeapItem[K, V]{Pair: nextPair, SSTableIndex: item.SSTableIndex})
        } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> err != io.EOF {
            <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
        }
    }

    <span class="hljs-keyword">return</span> &amp;SSTable[K, V]{path: newPath}, <span class="hljs-literal">nil</span>
}
</code></pre>
<p>All the compaction magic has been packed in one function, <code>MergeSSTables</code>. The function has the following logical steps (and you can check the inline comments in the code to follow along):</p>
<ol>
<li><p>We create a new destination SSTable file and initialize the corresponding <code>gob.Encoder</code>.</p>
</li>
<li><p>We open all the existing SSTable files and store their handles in the <code>files</code> slice. We also initialize one <code>gob.Decoder</code> per existing SSTable file. To avoid leaking file descriptors, a <code>defer</code> statement ensures that each file is closed once the function returns.</p>
</li>
<li><p>Each <code>decoder</code> reads the first key-value pair from its corresponding SSTable and stores it in the <code>pairs</code> slice.</p>
</li>
<li><p>SSTables that are already exhausted (for example, are empty or have hit the end of the file) are marked as such in the <code>emptySSTables</code> slice, and we skip pushing them onto the heap.</p>
</li>
<li><p>We push each pair from the <code>pairs</code> slice onto the <code>Min-Heap</code> using <code>heap.Push</code> from Go’s <code>container/heap</code> package. <code>heap.Push</code> keeps the heap ordered on its own, so the <code>heap.Init</code> call that follows is not strictly required.</p>
</li>
<li><p>Each time the smallest key-value pair is popped from the min-heap, it’s compared with the previously processed key (<code>lastKey</code>). Duplicate keys (older versions of a key whose newest version has already been processed) are skipped.</p>
</li>
<li><p>Values marked with a <code>"TOMBSTONE"</code> (logically deleted entries) are ignored and not written to the new SSTable, effectively cleaning up deleted data.</p>
</li>
<li><p>To continue the merge, the next key-value pair from the same SSTable (as the one we just processed) is read and pushed onto the heap, unless the end of the SSTable (<code>io.EOF</code>) has been reached.</p>
</li>
</ol>
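<p>A handy way to sanity-check the output of a merge like this is to compare it against a much simpler in-memory reference. The sketch below is an assumption-laden test aid, not part of the article's code: it does not use a heap or files at all. Instead it flattens all entries, sorts them, and applies the same two rules (the newest version wins, and tombstones are dropped). <code>compactReference</code>, the non-generic <code>Pair</code>, and the literal <code>TOMBSTONE</code> value are hypothetical, and each key is assumed to appear at most once per table.</p>
<pre><code class="lang-go">package main

import (
    "fmt"
    "sort"
)

// TOMBSTONE is the assumed marker value for deleted keys.
const TOMBSTONE = "TOMBSTONE"

type Pair struct {
    Key, Value string
}

// compactReference merges tables (ordered oldest to newest) using a sort
// instead of a k-way merge. It is only meant as a reference for tests.
func compactReference(tables [][]Pair) []Pair {
    type entry struct {
        pair  Pair
        table int
    }
    var all []entry
    for i, table := range tables {
        for _, p := range table {
            all = append(all, entry{pair: p, table: i})
        }
    }
    // Order by key; for equal keys, the entry from the newer table comes first.
    sort.Slice(all, func(i, j int) bool {
        if all[i].pair.Key != all[j].pair.Key {
            return all[i].pair.Key &lt; all[j].pair.Key
        }
        return all[i].table &gt; all[j].table
    })

    var out []Pair
    for i, e := range all {
        if i &gt; 0 &amp;&amp; all[i-1].pair.Key == e.pair.Key {
            continue // an older version of a key that was already handled
        }
        if e.pair.Value == TOMBSTONE {
            continue // deleted key: drop it entirely
        }
        out = append(out, e.pair)
    }
    return out
}

func main() {
    tables := [][]Pair{
        {{"a", "apple"}, {"b", "banana"}, {"c", "cherry"}}, // oldest
        {{"a", "apricot"}},
        {{"b", TOMBSTONE}}, // newest
    }
    fmt.Println(compactReference(tables)) // [{a apricot} {c cherry}]
}
</code></pre>
<p>Running the same inputs through <code>MergeSSTables</code> and decoding the resulting file should yield the same sequence of pairs.</p>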
<p>To integrate this with the DB, you could use a compaction threshold and trigger compaction as part of the flush when this threshold is reached:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> DB[K comparable, V any] <span class="hljs-keyword">struct</span> {
    memtable            *MemTable[K, V]
    maxMemtableSize     <span class="hljs-keyword">int</span>
    memtableSize        <span class="hljs-keyword">int</span>
    sstables            []*SSTable[K, V]
    sstableCounter      <span class="hljs-keyword">int</span>
    wal                 *WAL[K, V]
    walPath             <span class="hljs-keyword">string</span>
    manifest            *Manifest
    manifestPath        <span class="hljs-keyword">string</span>
    compactionThreshold <span class="hljs-keyword">int</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">NewDB</span>[<span class="hljs-title">K</span> <span class="hljs-title">comparable</span>, <span class="hljs-title">V</span> <span class="hljs-title">any</span>]<span class="hljs-params">(maxMemtableSize <span class="hljs-keyword">int</span>, compactionThreshold <span class="hljs-keyword">int</span>)</span> <span class="hljs-params">(*DB[K, V], error)</span></span> {
    walPath := <span class="hljs-string">"db.wal"</span>
    memtable, err := ReplayWAL[K, V](walPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    wal, err := NewWAL[K, V](walPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    manifestPath := <span class="hljs-string">"MANIFEST"</span>
    manifest, err := ReadManifest(manifestPath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err
    }

    sstables := <span class="hljs-built_in">make</span>([]*SSTable[K, V], <span class="hljs-built_in">len</span>(manifest.SSTablePaths))
    <span class="hljs-keyword">for</span> i, path := <span class="hljs-keyword">range</span> manifest.SSTablePaths {
        sstables[i] = &amp;SSTable[K, V]{path: path}
    }

    <span class="hljs-keyword">return</span> &amp;DB[K, V]{
        wal:                 wal,
        walPath:             walPath,
        memtable:            memtable,
        memtableSize:        <span class="hljs-built_in">len</span>(memtable.data),
        maxMemtableSize:     maxMemtableSize,
        manifestPath:        manifestPath,
        manifest:            manifest,
        sstables:            sstables,
        compactionThreshold: compactionThreshold,
    }, <span class="hljs-literal">nil</span>
}

<span class="hljs-comment">// a new compact function</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">Compact</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    compactedSSTablePath := fmt.Sprintf(<span class="hljs-string">"data-compacted-%d.sstable"</span>, db.sstableCounter)
    compactedSSTable, err := MergeSSTables(db.sstables, compactedSSTablePath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-comment">// write new SSTable to MANIFEST file</span>
    db.manifest.SSTablePaths = []<span class="hljs-keyword">string</span>{compactedSSTablePath}
    <span class="hljs-keyword">if</span> err := WriteManifest(db.manifestPath, db.manifest); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }
    <span class="hljs-comment">// note: delete the old SSTables only after the manifest has been written</span>
    <span class="hljs-keyword">for</span> _, sstable := <span class="hljs-keyword">range</span> db.sstables {
        <span class="hljs-keyword">if</span> err := os.Remove(sstable.path); err != <span class="hljs-literal">nil</span> {
            log.Printf(<span class="hljs-string">"Failed to remove old sstable %s: %v"</span>, sstable.path, err)
        }
    }

    db.sstables = []*SSTable[K, V]{compactedSSTable}
    db.sstableCounter++

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(db *DB[K, V])</span> <span class="hljs-title">flushMemtable</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    sstablePath := fmt.Sprintf(<span class="hljs-string">"data-%d.sstable"</span>, db.sstableCounter)
    sstable, err := writeSSTable(db.memtable, sstablePath)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    db.sstables = <span class="hljs-built_in">append</span>(db.sstables, sstable)
    db.sstableCounter++

    db.manifest.SSTablePaths = <span class="hljs-built_in">append</span>(db.manifest.SSTablePaths, sstablePath)
    <span class="hljs-keyword">if</span> err := WriteManifest(db.manifestPath, db.manifest); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> err
    }

    db.memtable = NewMemTable[K, V]()
    db.memtableSize = <span class="hljs-number">0</span>

    <span class="hljs-comment">// trigger compaction</span>
    <span class="hljs-keyword">if</span> <span class="hljs-built_in">len</span>(db.sstables) &gt;= db.compactionThreshold {
        <span class="hljs-keyword">if</span> err := db.Compact(); err != <span class="hljs-literal">nil</span> {
            log.Printf(<span class="hljs-string">"Compaction failed: %v"</span>, err)
            <span class="hljs-keyword">return</span> err
        }
    }

    <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
}
</code></pre>
<p>Notice the <code>Compact()</code> function in the integrated DB code? This is where we invoke the previously defined <code>MergeSSTables</code> function to run the compaction process. After invoking <code>MergeSSTables</code>, we record the new SSTable in the MANIFEST file and then delete the older SSTables.</p>
<p>Previously, in the <a class="post-section-overview" href="#heading-manifest-file-tracking-the-state-of-the-database">Manifest File: Tracking the State of the Database</a> section, I talked about atomic renaming with <code>os.Rename(tmpPath, path)</code>. Let’s talk about why the atomic renaming of MANIFEST matters for compaction.</p>
<p>During compaction, we're making a major change to the database state: replacing multiple SSTables with a single compacted one. The MANIFEST update is critical here because it's the source of truth for which SSTables exist.</p>
<p>Let’s think about what could go wrong without atomic renaming:</p>
<ol>
<li><p>You start writing the new MANIFEST (which points to the compacted SSTable)</p>
</li>
<li><p>System crashes mid-write</p>
</li>
<li><p>MANIFEST is corrupted and unreadable</p>
</li>
<li><p>On restart, the database has no idea which SSTables exist</p>
</li>
<li><p>All data is effectively lost</p>
</li>
</ol>
<p>With atomic renaming:</p>
<ol>
<li><p>We write the new MANIFEST to MANIFEST.tmp</p>
</li>
<li><p>We fully close and sync it to disk</p>
</li>
<li><p>We atomically rename MANIFEST.tmp to MANIFEST using <code>os.Rename(tmpPath, path)</code></p>
</li>
<li><p>If a crash happens before step 3: the old MANIFEST is intact, and we retry compaction</p>
</li>
<li><p>If a crash happens during step 3: the atomic operation either completes or doesn't – no corruption</p>
</li>
<li><p>If a crash happens after step 3: the new MANIFEST is in place, and we're good</p>
</li>
</ol>
<p>This is also why we delete the old SSTables only after successfully updating the MANIFEST. If we deleted them before updating MANIFEST and then crashed, the MANIFEST would still point to files that no longer exist.</p>
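<p>The write-then-rename sequence described above can be sketched as follows. This is an illustrative helper rather than the article's <code>WriteManifest</code>: the name <code>writeFileAtomic</code> and the raw byte payload are assumptions, and the manifest's actual encoding is left out.</p>
<pre><code class="lang-go">package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// writeFileAtomic writes data to path so that readers see either the old
// contents or the new contents, never a partially written file.
func writeFileAtomic(path string, data []byte) error {
    tmpPath := path + ".tmp"

    f, err := os.Create(tmpPath)
    if err != nil {
        return err
    }
    if _, err := f.Write(data); err != nil {
        f.Close()
        os.Remove(tmpPath)
        return err
    }
    // Flush the file contents to stable storage before renaming.
    if err := f.Sync(); err != nil {
        f.Close()
        os.Remove(tmpPath)
        return err
    }
    if err := f.Close(); err != nil {
        os.Remove(tmpPath)
        return err
    }
    // On Unix-like systems, rename replaces the destination atomically.
    return os.Rename(tmpPath, path)
}

func main() {
    dir, err := os.MkdirTemp("", "manifest-demo")
    if err != nil {
        panic(err)
    }
    defer os.RemoveAll(dir)

    path := filepath.Join(dir, "MANIFEST")
    if err := writeFileAtomic(path, []byte("data-compacted-3.sstable\n")); err != nil {
        panic(err)
    }
    contents, err := os.ReadFile(path)
    if err != nil {
        panic(err)
    }
    fmt.Print(string(contents))
}
</code></pre>
<p>A stricter version would also sync the parent directory after the rename, so that the rename itself survives a power loss. That step is omitted here to keep the sketch short.</p>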
<h4 id="heading-complete-picture">Complete Picture:</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765740593067/c18083ad-bf8a-4cae-92d3-d690a61dac52.png" alt="c18083ad-bf8a-4cae-92d3-d690a61dac52" class="image--center mx-auto" width="1806" height="1900" loading="lazy"></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Congratulations! You've built a working LSM tree storage engine from scratch. By following the problem-driven approach – discovering issues and implementing solutions as they arose – you've experienced how engineers think about building robust storage systems. I hope this is better than just memorizing the concepts.</p>
<p><strong>Key Takeaways</strong></p>
<ul>
<li><p><strong>Append-only writes</strong> make LSM-trees fast for write-heavy workloads</p>
</li>
<li><p><strong>Immutability</strong> of SSTables simplifies concurrency, since files are never modified after they are written</p>
</li>
<li><p><strong>Trade-off</strong> is that LSM-trees favor writes over reads (the opposite of B-trees)</p>
</li>
<li><p><strong>Durability</strong> requires multiple mechanisms working together (WAL, MANIFEST, atomic operations)</p>
</li>
<li><p><strong>Background maintenance</strong> (compaction) is essential for long-term health and cost</p>
</li>
</ul>
<p>Important note: This is a learning implementation. This means that I intentionally simplified the code, so it’s <strong>not production-ready</strong>. Key limitations include:</p>
<ul>
<li><p>No concurrency control (missing mutexes/locks)</p>
</li>
<li><p>No bloom filters for efficient lookups</p>
</li>
<li><p>Simplified compaction strategy</p>
</li>
<li><p>Type safety issues with generic tombstones</p>
</li>
<li><p>Missing robust error recovery</p>
</li>
</ul>
<h3 id="heading-complete-code">Complete Code:</h3>
<p>As I mentioned before, I've omitted boilerplate code and helper functions for brevity. The complete, runnable implementation is available <a target="_blank" href="https://github.com/justramesh2000/lsm-db">at this GitHub repo</a>.</p>
<p>To learn more about production LSM implementations, study RocksDB, LevelDB, or read the original LSM tree paper by O'Neil et al: <a target="_blank" href="https://www.cs.umb.edu/~poneil/lsmtree.pdf">https://www.cs.umb.edu/~poneil/lsmtree.pdf</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy Your Own Cockroach DB  Instance on Kubernetes [Full Book for Devs] ]]>
                </title>
                <description>
                    <![CDATA[ Developers are smart, wonderful people, and they’re some of the most logical thinkers you’ll ever meet. But we’re pretty terrible at naming things 😂 Like, what in the world – out of every other possible name, they decided to name a database after a ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/deploy-your-own-cockroach-db-instance-on-kubernetes-full-book-for-devs/</link>
                <guid isPermaLink="false">6925e482ccc8b29b82c002c5</guid>
                
                    <category>
                        <![CDATA[ cockroachdb ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                    <category>
                        <![CDATA[ google cloud ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Kubernetes ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prince Onukwili ]]>
                </dc:creator>
                <pubDate>Tue, 25 Nov 2025 17:16:50 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764088553942/496bf5f4-f059-4873-b6c1-419a86e594ef.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Developers are smart, wonderful people, and they’re some of the most logical thinkers you’ll ever meet. But we’re pretty terrible at naming things 😂</p>
<p>Like, what in the world – out of every other possible name, they decided to name a database after a <em>literal cockroach</em>? 🤣</p>
<p>I mean, I get it: cockroaches are known for being resilient, and the devs were probably trying to say “our database never dies”… but still…a cockroach?</p>
<p>The name aside, with so many databases out there, you might be wondering why you would choose CockroachDB. And if you did choose it, where would you even start when trying to host and deploy it? Would you go for a managed cloud service? Or could you actually self-manage it?</p>
<p>If you ever thought of doing it yourself – maybe in a dev environment, or even introducing it to your company – how would you go about it?</p>
<p>Well, just calm your nerves 😄</p>
<p>In this book, we’ll explore everything you need to know about <strong>deploying and managing CockroachDB on Kubernetes</strong>. We’ll dive deep into:</p>
<ul>
<li><p>Understanding how CockroachDB’s masterless (multi-primary) architecture actually works</p>
</li>
<li><p>Setting up and deploying CockroachDB on a Kubernetes cluster</p>
</li>
<li><p>Automating backups to Google Cloud Storage using just a few queries in the CockroachDB cluster</p>
</li>
<li><p>Managing service accounts and authentication securely</p>
</li>
<li><p>Tuning CockroachDB’s memory settings for stable performance</p>
</li>
<li><p>Scaling the cluster horizontally and vertically without downtime</p>
</li>
<li><p>Monitoring and maintaining the database like a pro</p>
</li>
</ul>
<p>By the end, you’ll not only understand how CockroachDB works, you’ll be confident enough to deploy and manage your own resilient, production-ready instance. 🚀</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-even-is-cockroachdb">What Even Is CockroachDB? 🤔</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-simple-definition">Simple Definition</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-who-made-cockroachdb-when-was-it-released">Who Made CockroachDB? When Was it Released?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-problems-does-cockroachdb-try-to-solve">What Problems Does CockroachDB Try to Solve?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-key-terms-you-should-know-in-plain-language">Key Terms You Should Know (in plain language):</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-the-name-cockroachdb">Why the name “CockroachDB”? 😅</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-why-choose-cockroachdb-over-postgresql-or-mongodb">Why Choose CockroachDB Over PostgreSQL or MongoDB 🤷🏾‍♂️?</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-fault-tolerance-is-handled-in-postgresql-and-mongodb">How Fault Tolerance is Handled in PostgreSQL and MongoDB</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-handles-it-differently">How CockroachDB Handles It Differently</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-works-behind-the-scenes">How CockroachDB Works Behind the Scenes ⚙️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-ranges-the-small-pieces-of-data">Ranges: The Small Pieces of Data</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-replication-many-copies-for-safety">Replication: Many Copies for Safety</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-raft-consensus-how-all-copies-agree">Raft Consensus: How All Copies Agree</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-multiraft-keeping-raft-efficient-when-things-scale">MultiRaft: Keeping Raft Efficient When Things Scale</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-rebalancing-movement-for-balance">Rebalancing: Movement for Balance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-distributed-transactions-doing-work-across-multiple-ranges">Distributed Transactions: Doing Work Across Multiple Ranges</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-it-all-fits-together-read-write-flow-what-happens-when-you-use-it">How It All Fits Together: Read + Write Flow (What Happens When You Use It)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-this-all-matters-putting-it-in-plain-english">Why This All Matters (Putting It in Plain English)</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-where-and-how-should-you-host-cockroachdb">Where (and How) Should You Host CockroachDB? ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-option-1-cockroachdb-cloud-fully-managed-by-cockroach-labs">Option 1: CockroachDB Cloud (fully managed by Cockroach Labs)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-2-bring-your-own-cloud-byoc">Option 2: Bring Your Own Cloud (BYOC)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-3-use-cloud-marketplaces-aws-gcp-azure">Option 3: Use Cloud Marketplaces (AWS, GCP, Azure)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-4-my-favorite-self-hosting-especially-using-kubernetes">Option 4 (My Favorite 😁): Self-Hosting — Especially Using Kubernetes</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-setting-up-your-local-environment">Setting Up Your Local Environment 🧑‍💻</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-these-tools">Why these tools?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-install-minikube">Step 1: Install Minikube</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-install-kubectl">Step 2: Install kubectl</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-install-helm">Step 3: Install Helm</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-deploying-cockroachdb-on-minikube-the-fun-part-begins">Deploying CockroachDB on Minikube (The Fun Part Begins 😁!)</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-visit-artifacthub">Step 1: Visit ArtifactHub</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-explore-the-helm-chart">Step 2: Explore the Helm Chart</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-copy-the-default-values">Step 3: Copy the Default Values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-create-a-folder-for-our-project">Step 4: Create a Folder for Our Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-understanding-the-key-configurations">Step 5: Understanding the Key Configurations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-create-a-simplified-values-config-for-the-cockroachdb-helm-chart">Step 6: Create a Simplified Values Config for the CockroachDB Helm Chart</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-overview-of-the-yaml-values">Overview of the YAML values</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-install-the-cockroachdb-cluster-using-helm">🚀 Step 7: Install the CockroachDB Cluster Using Helm</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-accessing-the-cockroachdb-console-amp-viewing-metrics">Accessing the CockroachDB Console &amp; Viewing Metrics</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-step-1-locate-the-cockroachdb-public-service">Step 1: Locate the CockroachDB Public Service</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-learn-more-about-the-service">Step 2: Learn More About the Service</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-access-the-cockroachdb-dashboard">Step 3: Access the CockroachDB Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-visit-the-dashboard">Step 4: Visit the Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-exploring-the-metrics-dashboard">Step 5: Exploring the Metrics Dashboard</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-creating-a-little-load-on-the-cockroachdb-cluster">Step 6: Creating a Little Load on the CockroachDB Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-viewing-the-metrics-from-the-load">Step 7: Viewing the Metrics from the Load</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-8-view-the-list-of-created-items-in-the-database">Step 8: View the List of Created Items in the Database</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-backing-up-cockroachdb-to-google-cloud-storage">Backing Up CockroachDB to Google Cloud Storage ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-backups-are-absolutely-critical">Why Backups Are Absolutely Critical</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-db-installing-beekeeper-studio">Connecting to Our DB – Installing Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-install-beekeeper-studio">How to Install Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-beekeeper-studio-to-cockroachdb">Connecting Beekeeper Studio to CockroachDB</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-exposing-the-cluster-for-local-access">Exposing the Cluster for Local Access</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-beekeeper-studio">🐝 Connecting via Beekeeper Studio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-verify-the-connection">Verify the Connection</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-google-cloud-account">Creating a Google Cloud Account</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-google-cloud-storage-bucket">Creating a Google Cloud Storage Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-giving-cockroachdb-access-to-the-bucket">Giving CockroachDB Access to the Bucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-attaching-the-key-to-our-cockroachdb-cluster">Attaching the Key to Our CockroachDB Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-testing-our-backup-disaster-recovery-time">Testing Our Backup — Disaster Recovery Time</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-managing-resources-amp-optimizing-memory-usage">Managing Resources &amp; Optimizing Memory Usage</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-cockroachdb-uses-memory">How CockroachDB Uses Memory</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-memory-usage-formula-you-must-follow">The Memory Usage Formula You Must Follow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-where-you-find-these-settings">Where You Find These Settings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-concrete-example-step-by-step">Concrete Example (Step-by-Step)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-on-requests-vs-limits-in-kubernetes">⚠️ On Requests vs Limits in Kubernetes</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-overriding-the-default-fractions">Overriding the Default Fractions</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-scaling-cockroachdb-the-right-way">Scaling CockroachDB the Right Way</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-key-metrics-to-understand">Key Metrics to Understand</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-and-what-to-scale-based-on-your-metrics">When (and What) to Scale Based on Your Metrics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-disk-bound-situations-what-to-do-when-your-disk-is-the-limiting-factor">Disk-Bound Situations — What to Do When Your Disk Is the Limiting Factor</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-memory-pressure-what-to-do-when-your-database-hits-the-limit">Memory Pressure — What to Do When Your Database Hits the Limit</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-when-queries-are-slow-but-everything-else-cpu-memory-amp-disk-looks-fine">When Queries Are Slow but Everything Else (CPU, Memory &amp; Disk) Looks “Fine”</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-disk-speed-iops-amp-throughput-across-cloud-providers">Understanding Disk Speed (IOPS &amp; Throughput) Across Cloud Providers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-downsizing-the-cluster-reducing-replicas">Downsizing the Cluster (Reducing Replicas)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-wrong-way-to-downscale">⚠️ The Wrong Way to Downscale</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-decommissioning-a-node-before-scaling-down-the-cluster">Decommissioning a Node Before Scaling Down the Cluster</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-what-to-consider-when-deploying-cockroachdb-on-google-kubernetes-engine-gke">What to Consider When Deploying CockroachDB on Google Kubernetes Engine (GKE) ☁️</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-creating-your-gke-cluster">Creating Your GKE Cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-your-gke-cluster">Connecting to your GKE cluster</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-deploying-cockroachdb-in-production-on-gke">Deploying CockroachDB in Production (on GKE)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-configuration">Understanding the Configuration</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-installing-the-cockroachdb-cluster-on-gke">Installing the CockroachDB Cluster on GKE</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-cockroachdb-cluster-now-that-tls-mtls-are-enabled">Connecting to Our CockroachDB Cluster (Now That TLS + mTLS Are Enabled)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-mutual-tls-mtls-why-we-need-a-certificate-for-our-root-user">Connecting via Mutual TLS (mTLS) — Why We Need a Certificate for Our root User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-lets-explore-our-clusters-certificate">Let’s Explore Our Cluster’s Certificate</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-understanding-the-certificate-sections-explained-super-simply">Understanding the Certificate Sections (Explained Super Simply)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-creating-a-client-certificate-so-we-can-finally-connect-to-cockroachdb">Creating a Client Certificate (So We Can Finally Connect to CockroachDB)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-our-cockroachdb-cluster-securely-using-mtls">Connecting to Our CockroachDB Cluster Securely (Using mTLS)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-restoring-our-previous-database-into-the-new-gke-cockroachdb-cluster-without-sa-keys">Restoring Our Previous Database into the New GKE CockroachDB Cluster (without SA keys)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-restoring-our-previous-database-from-google-cloud-storage">Restoring Our Previous Database from Google Cloud Storage</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-now-lets-restore-the-data">Now, Let’s Restore the Data 🎉</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-to-the-database-with-a-new-user">Connecting to the Database with a New User</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-with-passwordless-authentication-mutual-tls">Connecting with Passwordless Authentication (Mutual TLS)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-connecting-via-mutual-tls-mtls-from-our-apps-on-kubernetes">Connecting via Mutual TLS (mTLS) from Our Apps on Kubernetes</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-get-a-cockroachdb-enterprise-license-for-free">How to Get a CockroachDB Enterprise License for FREEE!</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-three-types-of-licenses">Three Types of Licenses</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-apply-for-the-free-enterprise-license">How to Apply for the Free Enterprise License</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-your-license-to-the-cockroachdb-cluster">Adding Your License to the CockroachDB Cluster</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion-amp-next-steps">Conclusion &amp; Next Steps ✨</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-about-the-author">About the Author 👨🏾‍💻</a></li>
</ul>
</li>
</ol>
<h2 id="heading-what-even-is-cockroachdb">What Even Is CockroachDB? 🤔</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760416037885/c67edcbb-be85-4614-bdf3-104942048eea.jpeg" alt="An image summarizing what CockroachDB is" class="image--center mx-auto" width="1307" height="1697" loading="lazy"></p>
<p>Hey! Before we jump into setting up our Kubernetes cluster and deploying our CockroachDB cluster, let’s get grounded in what CockroachDB really is. (Because if you don’t understand the why and how, the implementation and practical session will just feel like magic 😅.)</p>
<h3 id="heading-simple-definition">Simple Definition</h3>
<p>CockroachDB is a distributed SQL database. This means it gives you the features of a relational database (tables, SQL queries, JOINs, transactions) but copies data across multiple replicas (servers, nodes, instances). No need to shard manually. 😃</p>
<p>It’s built to survive failures, scale easily (compared to other SQL databases), and keep your data consistent no matter what (across all the instances).</p>
<h3 id="heading-who-made-cockroachdb-when-was-it-released">Who Made CockroachDB? When Was it Released?</h3>
<p>CockroachDB was created by <a target="_blank" href="https://www.cockroachlabs.com/"><strong>Cockroach Labs</strong></a>, founded by Spencer Kimball, Peter Mattis, and Ben Darnell. The idea first started taking shape around 2014, and by 2015 Cockroach Labs was formally founded.</p>
<p>Its 1.0 “production-ready” version was announced in 2017, marking its transition from beta to being suitable for real-world use.</p>
<h3 id="heading-what-problems-does-cockroachdb-try-to-solve">What Problems Does CockroachDB Try to Solve?</h3>
<p>Traditional relational databases are great, but they run into real challenges when your app grows. CockroachDB was built to solve those. Here are the key pain points and how CockroachDB addresses them:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pain Point</td><td>What usually happens</td><td>How CockroachDB fixes it</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Single primary bottleneck</strong></td><td>ONLY ONE “primary” node handles writes, updates, and deletes. That node can become difficult to scale (adapt to the DB usage) without downtime</td><td>CockroachDB is <strong>multi-primary</strong>, meaning every node can accept reads and writes. No single “primary” for the entire cluster.</td></tr>
<tr>
<td><strong>Manual sharding complexity</strong></td><td>You have to split data (shard) by hand, decide which piece goes where, and handle cross-shard queries, lots of headache 😖.</td><td>CockroachDB automatically partitions data into smaller units (called <em>ranges</em>) and moves them around to balance load.</td></tr>
<tr>
<td><strong>Failover downtime</strong></td><td>If the primary node fails, you need to promote a replica (read-only instance) and switch over. During that time, your app might be down.</td><td>Because there’s no single primary, if one of the instances fails, the others take over seamlessly (via consensus) without a big outage.</td></tr>
<tr>
<td><strong>Geographic scaling &amp; latency</strong></td><td>Serving users in different regions is hard — either data is far away (slow) or you must build complex replication logic.</td><td>CockroachDB lets you distribute nodes across regions. You can serve local reads/writes while keeping global consistency.</td></tr>
</tbody>
</table>
</div><p>So instead of fighting your database as it grows, CockroachDB handles much of the hard work for you.</p>
<h3 id="heading-key-terms-you-should-know-in-plain-language">Key Terms You Should Know (in plain language):</h3>
<ul>
<li><p><strong>Node:</strong> A running instance of your database that holds a copy of the data. Nodes are also known as replicas or instances. In traditional setups, they can be read-only (databases from which data can only be read, for example using SELECT statements) OR read-write (databases from which data can be read, created, updated, and deleted).</p>
</li>
<li><p><strong>Replication</strong>: making copies of data on multiple nodes. If one node fails, others still have the data.</p>
</li>
<li><p><strong>Raft (consensus algorithm)</strong>: a system that ensures copies (replicas) agree on changes in a safe, reliable way. For example, when you want to write data, Raft ensures that most copies agree before it’s accepted.</p>
</li>
<li><p><strong>Sharding / Ranges</strong>: Instead of putting all your data in one big blob, CockroachDB splits it into smaller chunks called <em>ranges</em>. Each range is replicated and can move between nodes.</p>
</li>
<li><p><strong>Distributed transaction</strong>: a transaction (series of operations) that might touch data stored in different nodes. CockroachDB manages this, so you still get ACID (atomic, consistent, isolated, durable) properties.</p>
</li>
</ul>
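<p>To make the “most copies agree” idea from the Raft entry a bit more concrete, here’s a tiny sketch of the majority arithmetic. The function names are made up for illustration and aren’t part of CockroachDB. They just show how many copies must agree, and how many can be lost, for a given number of replicas.</p>

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority is still reachable."""
    return replicas - quorum_size(replicas)


# Print the majority math for a few common replica counts.
for n in (3, 5, 7):
    print(f"{n} replicas: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

<p>With 3 copies, 2 must agree, so one copy can be unavailable and writes still go through.</p>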
<h3 id="heading-why-the-name-cockroachdb">Why the name “CockroachDB”? 😅</h3>
<p>You might wonder: <em>Why name a database after a cockroach?</em> It sounds weird at first, but there's a reason:</p>
<p>Cockroaches are known for surviving harsh conditions: radiation, natural disasters, and so on. The founders wanted a database that feels almost “impossible to kill,” that can survive node failures, outages, and network splits. The name is a tongue-in-cheek nod to resilience.</p>
<h2 id="heading-why-choose-cockroachdb-over-postgresql-or-mongodb">Why Choose CockroachDB Over PostgreSQL or MongoDB 🤷🏾‍♂️?</h2>
<p>Let’s compare the classic setup (Postgres / MongoDB) to CockroachDB, especially why you might want to go with CockroachDB, and how it helps ease scaling. I’ll also explain some terms to make sure you’re following.</p>
<p>In many setups, when you use Postgres or MongoDB, you’ll often have one “primary” node that handles all writes (that is, inserts, updates, deletes).</p>
<p>Then you have multiple “read replicas” that copy the primary’s data and serve read requests (selects). That works okay – reads can be spread out – but all write traffic goes to that one primary node.</p>
<p>Usually, the primary eventually gets stressed when the write volume grows (for example, more customers create accounts and products on your platform).</p>
<p>You can add more read replicas (horizontal scaling for reads, for example customers trying to view their accounts, or previously created products on your site), but scaling the primary is much harder.</p>
<p>To scale the primary, you often resort to upgrading its resources (CPU, RAM, disk) – that’s vertical scaling – which often needs downtime (shut down the primary database, increase its CPU and RAM, then spin it back up).</p>
<p>Or you’d have to manually shard (split) your data across multiple primaries, route traffic carefully, and manage complexity.</p>
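<p>If it helps to see the classic single-primary model as code, here’s a minimal sketch of the read/write split described above. Everything in it (the <code>ClassicRouter</code> class and the server names) is hypothetical and only for illustration. It isn’t how any real proxy is implemented.</p>

```python
import itertools


class ClassicRouter:
    """Toy router for a single-primary setup (illustrative, not a real proxy)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, primary: str, read_replicas: list[str]):
        self.primary = primary
        # Cycle through replicas so reads are spread evenly (round-robin).
        self._replica_cycle = itertools.cycle(read_replicas)

    def route(self, query: str) -> str:
        """Return the name of the server that should run this (non-empty) query."""
        verb = query.strip().split()[0].upper()
        if verb in self.WRITE_VERBS:
            return self.primary  # every write lands on the same node
        return next(self._replica_cycle)


router = ClassicRouter("pg-primary", ["pg-replica-1", "pg-replica-2"])
for query in (
    "INSERT INTO users (name) VALUES ('ada')",
    "SELECT * FROM users",
    "UPDATE users SET name = 'grace' WHERE id = 1",
    "SELECT * FROM products",
):
    print(f"{router.route(query)} handles: {query}")
```

<p>Notice that adding more read replicas only helps the <code>SELECT</code> path. Every write still ends up on <code>pg-primary</code>, which is exactly the bottleneck we’re talking about.</p>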
<h3 id="heading-how-fault-tolerance-is-handled-in-postgresql-and-mongodb">How Fault Tolerance is Handled in PostgreSQL and MongoDB</h3>
<p>When you try to make Postgres (or MongoDB) highly available and fault tolerant in a self-managed setup, you often need one primary and two or more read replicas.</p>
<p>The tricky part is handling what happens when the primary fails (or is taken down temporarily for an upgrade). You need something that can promote a replica to a primary automatically.</p>
<p>In Postgres land, that’s often handled by <a target="_blank" href="https://github.com/patroni/patroni"><strong>Patroni</strong></a> or <a target="_blank" href="https://www.repmgr.org/"><strong>repmgr</strong></a> (tools that handle cluster management, failover, leader election, and so on).</p>
<p>In MongoDB, such logic is part of the <strong>replica set</strong> behavior: it does automatic elections among replicas.</p>
<p>Here are some of the core challenges with that classic model, along with the extra routing pieces it usually needs:</p>
<ul>
<li><p>Every write must go to a single primary. If that primary fails or is overloaded, your whole system suffers.</p>
</li>
<li><p>Scaling reads is easy (add more replicas), but scaling writes is hard.</p>
</li>
<li><p>Vertical scaling (give more resources to one server) has its cons. If the primary node needs more resources, you might experience some downtime when it’s being scaled up.</p>
</li>
<li><p>Manual sharding is messy: you decide which piece of data goes to which shard, handle cross-shard queries, and build routing logic. That’s a lot of maintenance and can lead to unexpected issues if not handled properly.</p>
</li>
<li><p>One service (or load balancer/proxy) points to the primary (for ALL write queries).</p>
</li>
<li><p>Another service or routing logic handles read queries and can share reads across replicas.</p>
</li>
<li><p>You might use <strong>HAProxy</strong>, <strong>Pgpool-II</strong>, or <strong>PgBouncer</strong> for Postgres to route traffic, do read/write splitting, or manage connection pooling. These are external tools (not part of the database core) that you have to configure and maintain.</p>
</li>
</ul>
<p>So when the primary fails, Patroni (or repmgr, and so on) will detect it and promote one of the read replicas to be the new primary.</p>
<p>But that promotion, reconfiguration, and traffic rerouting often cause a brief window of downtime (when your primary database node becomes unavailable).</p>
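<p>Here’s a rough sketch of that downtime window. It’s a toy timeline, not a model of how Patroni or repmgr work internally: the function name and the tick counts are assumptions chosen purely for illustration.</p>

```python
def simulate_failover(total_ticks: int, fail_at: int,
                      detect_ticks: int, promote_ticks: int) -> list[str]:
    """Return, per tick, whether writes are accepted in a single-primary setup."""
    # Writes come back only after the failure is detected AND a replica is promoted.
    recovered_at = fail_at + detect_ticks + promote_ticks
    outage = range(fail_at, recovered_at)
    return ["down" if tick in outage else "ok" for tick in range(total_ticks)]


timeline = simulate_failover(total_ticks=10, fail_at=3, detect_ticks=2, promote_ticks=1)
print(timeline)
print("ticks without writes:", timeline.count("down"))
```

<p>The window is the sum of detection time and promotion time. Tuning those tools can shrink it, but in this model it never reaches zero.</p>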
<h3 id="heading-how-cockroachdb-handles-it-differently">How CockroachDB Handles It Differently</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760416070693/af1ade70-19bb-4e9f-82ec-9711c13d8079.jpeg" alt="A brief look at CockroachDB properties" class="image--center mx-auto" width="800" height="800" loading="lazy"></p>
<p>CockroachDB changes the rules:</p>
<ul>
<li><p><strong>All replicas are equal</strong> for reads <em>and</em> writes. You don’t have a special “primary” that handles writes. Every node in the cluster can accept write requests.</p>
</li>
<li><p>CockroachDB breaks your data into small chunks (ranges) and replicates them across nodes. If you add a new node, data moves around automatically to balance the load.</p>
</li>
<li><p>Every write is automatically copied to other replicas, and consistency is managed by a protocol (Raft), so you don’t have to build this yourself.</p>
</li>
<li><p>No manual sharding needed. Because the database handles how data is split and moved, you don’t need to decide how to shard by hand.</p>
</li>
<li><p>You <strong>don’t need a special service</strong> to route write vs. read queries. Any node can accept both reads <strong>and</strong> writes.</p>
</li>
<li><p>During scaling, you don’t have to worry about which node is the primary – because <em>there is no primary</em>.</p>
</li>
<li><p>You can scale your nodes one at a time (rollout style). When one node is being upgraded, the others continue to serve traffic. You won’t hit a downtime window just because you're scaling the “primary.”</p>
</li>
<li><p>Because there's no replica promotion logic to fight with, there's no moment where a replica needs to be “elevated” to primary – it’s all just nodes continuing to serve.</p>
</li>
</ul>
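<p>The rollout-style scaling point above can be sketched in a few lines. This is a deliberate simplification: it treats the whole cluster as one group and only checks that a majority of nodes stays up, whereas real availability depends on where each range’s copies live. The node names and the <code>rolling_restart</code> function are hypothetical.</p>

```python
def rolling_restart(nodes: list[str]) -> list[tuple[str, int, bool]]:
    """Restart nodes one at a time and record whether a majority stayed up."""
    majority = len(nodes) // 2 + 1
    report = []
    for restarting in nodes:
        # Only the node being restarted is offline at any moment.
        up = [node for node in nodes if node != restarting]
        report.append((restarting, len(up), len(up) >= majority))
    return report


for node, up_count, available in rolling_restart(["crdb-0", "crdb-1", "crdb-2"]):
    status = "still serving" if available else "UNAVAILABLE"
    print(f"restarting {node}: {up_count} nodes up, cluster {status}")
```

<p>Because only one node is ever offline, the other two keep a majority at every step, so there’s no “primary is down” moment to plan around.</p>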
<h2 id="heading-how-cockroachdb-works-behind-the-scenes">How CockroachDB Works Behind the Scenes ⚙️</h2>
<p>In CockroachDB, there are many moving parts behind the scenes. But they work together, so you don’t have to babysit them. The core ideas, which we’ve mostly already touched on, are:</p>
<ul>
<li><p>Splitting data into pieces (<strong>ranges</strong>)</p>
</li>
<li><p>Keeping multiple copies of each piece (<strong>replicas/replication</strong>)</p>
</li>
<li><p>Making sure all copies agree via <strong>Raft consensus</strong></p>
</li>
<li><p>Moving pieces around to balance the load (<strong>automatic rebalancing/distribution</strong>)</p>
</li>
<li><p>Coordinating transactions that might touch many pieces</p>
</li>
</ul>
<p>Let’s go through each of those, one by one.</p>
<h3 id="heading-ranges-the-small-pieces-of-data">Ranges: The Small Pieces of Data</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760413105037/984f8b5c-bd53-4850-9704-57ce1dcedb80.png" alt="A little depiction of CockroachDB ranges" class="image--center mx-auto" width="977" height="445" loading="lazy"></p>
<p>Imagine you have a giant book of recipes. If you try to carry the whole thing, it’s heavy. So you split the book into smaller booklets, each covering recipes for a certain range of meals: breakfasts, lunches, dinners, desserts.</p>
<p>In CockroachDB, data is split into ranges, which are like those smaller booklets:</p>
<ul>
<li><p>Each range covers a certain block of data (like “all users whose ID is 1-1000”)</p>
</li>
<li><p>When a range gets too big (like having too many recipes in one booklet) it’s cut/split into two smaller ones. That makes each piece easier to manage.</p>
</li>
<li><p>If two neighboring ranges have become very small (few recipes), they might be merged (joined) back together so you’re not keeping too many tiny booklets.</p>
</li>
<li><p>These splits and merges happen automatically, behind the scenes, so the database stays smooth as things grow or shrink.</p>
</li>
</ul>
<p>This chopping helps the system in many ways: moving pieces, copying them, balancing load, and recovering from node failures all become easier.</p>
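<p>To picture the split-and-merge behavior, here’s a small sketch that treats each range as a list of user IDs. The thresholds (<code>MAX_KEYS</code>, <code>MIN_KEYS</code>) and function names are invented for this example. CockroachDB decides based on the size of the data, not on a key count like this.</p>

```python
MAX_KEYS = 4   # split a range once it holds more keys than this (toy threshold)
MIN_KEYS = 2   # merge neighbours whose combined size is at most this


def split_oversized(ranges: list[list[int]]) -> list[list[int]]:
    """Cut any range above MAX_KEYS in half, repeating until all fit."""
    result = []
    for keys in ranges:
        if len(keys) > MAX_KEYS:
            mid = len(keys) // 2
            result.extend(split_oversized([keys[:mid], keys[mid:]]))
        else:
            result.append(keys)
    return result


def merge_small_neighbours(ranges: list[list[int]]) -> list[list[int]]:
    """Join adjacent ranges when together they hold at most MIN_KEYS keys."""
    merged = []
    for keys in ranges:
        if merged and MIN_KEYS >= len(merged[-1]) + len(keys):
            merged[-1] = merged[-1] + keys
        else:
            merged.append(keys)
    return merged


# Start with user IDs 1..10 in a single range and let it split.
grown = split_oversized([list(range(1, 11))])
print("after splits:", grown)

# Pretend most rows in the first two ranges were deleted, then merge.
shrunk = merge_small_neighbours([[1], [4], [6, 7], [8, 9, 10]])
print("after merges:", shrunk)
```

<p>The ranges stay sorted and never overlap, which is what lets the database find the right “booklet” for any given key.</p>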
<h3 id="heading-replication-many-copies-for-safety">Replication: Many Copies for Safety</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760413678362/a0066780-1360-4511-8fd0-466f54ea2135.jpeg" alt="Replication of Ranges across multiple Nodes (databases) in CockroachDB" class="image--center mx-auto" width="1024" height="448" loading="lazy"></p>
<p>Nobody likes losing their work, so you keep backup copies. CockroachDB does this for data as well.</p>
<p>For each range, there are usually 3 copies (replicas) stored on different machines (nodes). If one machine dies, you still have the others (<a target="_blank" href="https://www.cockroachlabs.com/docs/stable/architecture/replication-layer">cockroachlabs.com</a>). These copies are kept in sync: when you write something (for example, an insert or update), the change is propagated to the other copies.</p>
<p>The database also tolerates failures. If one node goes down, the system detects it and eventually makes a new copy elsewhere to replace it. So the target number of copies is maintained. This gives you fault tolerance: your data stays safe even when parts of your system fail.</p>
<h3 id="heading-raft-consensus-how-all-copies-agree">Raft Consensus: How All Copies Agree</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760415307117/79859a4b-4341-46eb-91d9-cccc3bde9a66.jpeg" alt="Raft consensus between the replicas of a range in CockroachDB" class="image--center mx-auto" width="1024" height="448" loading="lazy"></p>
<p>Having copies is useful, but you also need them to agree with each other, just as every copy of a recipe booklet should have the same content. The Raft protocol is a way to make sure that happens reliably.</p>
<p>Here’s how Raft works in simple terms:</p>
<ul>
<li><p>Each range has a group of replicas. One of these replicas acts as the <strong>leader</strong>. Others are <strong>followers</strong>.</p>
</li>
<li><p>All write requests for that range go through the leader. The leader gets the request, then tells followers to record the same change.</p>
</li>
<li><p>Once most of the copies (a majority) say “yep, we got it,” the change is considered final (committed). Then the leader tells the client, “Done.”</p>
</li>
<li><p>If the leader stops working (the machine dies or the network fails), the followers notice it (they stop getting regular “I’m alive” messages), then they hold an election to pick a new leader, and the show goes on.</p>
</li>
<li><p>This way, the system ensures everyone has the same final data and no conflicting changes happen.</p>
</li>
</ul>
<p>So Raft is the agreement protocol that keeps all copies in sync and safe.</p>
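<p>The “majority” rule is simple arithmetic. As a quick sketch (the helper names <code>quorum</code> and <code>tolerated_failures</code> are made up for this example), a group of <em>n</em> replicas needs <code>n / 2 + 1</code> acknowledgements using integer division, and can lose whatever is left over:</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Majority math behind Raft (helper names are made up for this sketch).

# Smallest number of replicas that forms a majority of N.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many replicas can fail while a majority is still reachable.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 3 4 5; do
  echo "$n replicas: quorum $(quorum "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
</code></pre>
<p>With 3 replicas, a write commits after 2 acknowledgements and the range survives one failed node. Going from 3 to 4 replicas doesn’t add fault tolerance, which is one reason odd replica counts are the usual choice.</p>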
<h3 id="heading-multiraft-keeping-raft-efficient-when-things-scale">MultiRaft: Keeping Raft Efficient When Things Scale</h3>
<p>When you have many ranges (many pieces of the booklets), each range has its own Raft group. That can mean a lot of “are you alive?” messages between nodes, and a lot of overhead. MultiRaft is the trick CockroachDB uses to make this efficient.</p>
<p>MultiRaft groups together Raft work for many ranges that share nodes, so overhead is reduced. Instead of sending separate heartbeat (are you alive?) messages for each range, some of the messages are bundled.</p>
<p>This reduces network chatter and resource waste and helps the database scale smoothly when you have tons of data and many pieces.</p>
<h3 id="heading-rebalancing-movement-for-balance">Rebalancing: Movement for Balance</h3>
<p>When your ranges are not evenly spread across nodes (machines), some machines are doing way too much work, and some hardly any. That’s not good. So CockroachDB automatically moves pieces around to balance things.</p>
<ul>
<li><p>The system watches how busy each node is (how many ranges it holds, how much data, how much read/write traffic).</p>
</li>
<li><p>If one node is overloaded, the system moves some of its ranges to other nodes.</p>
</li>
<li><p>If a node dies, the system notices and makes sure that ranges that were on that node get copied somewhere else so safety (replica count) is maintained.</p>
</li>
<li><p>If you add a new node, the system starts moving ranges to the new node so its resources are used.</p>
</li>
</ul>
<p>This happens without you having to manually decide “move this here, move that there.”</p>
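<p>Here’s a toy calculation of what “balanced” could mean if you only looked at range counts. The numbers are invented, and the real system weighs more signals (data size and traffic, as mentioned above), so treat it as a rough picture rather than the actual algorithm:</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Toy rebalancing math (illustration only; the real system weighs more signals).
counts=(12 11 13 0)  # ranges held by node1..node4; node4 was just added

total=0
for c in "${counts[@]}"; do
  total=$((total + c))
done
target=$((total / ${#counts[@]}))  # even share per node
echo "target per node: $target"

# Positive surplus: the node should give ranges away.
# Negative surplus: the node should receive ranges.
for i in "${!counts[@]}"; do
  surplus=$((counts[i] - target))
  echo "node$((i + 1)): holds ${counts[i]}, surplus $surplus"
done
</code></pre>
<p>In this made-up cluster the target is 9 ranges per node, so the three older nodes hand over 3, 2, and 4 ranges, and the new node receives 9.</p>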
<h3 id="heading-distributed-transactions-doing-work-across-multiple-ranges">Distributed Transactions: Doing Work Across Multiple Ranges</h3>
<p>Often, an operation touches multiple ranges. For example, “transfer money from account A (in range 1) to account B (in range 2)”. That must be handled carefully so that either both parts succeed, or neither does.</p>
<p>CockroachDB supports <strong>distributed transactions</strong>, meaning a single transaction can work across many ranges. It uses “intent” writes (temporary placeholders) and once everything is ready, it commits the transaction so it becomes permanent. If something fails, it aborts (cancels) the whole thing. The system ensures atomic behavior: all or nothing.</p>
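<p>From your app’s point of view, this is just a normal SQL transaction. The sketch below builds one for the money-transfer example. The <code>accounts</code> table, its columns, the <code>transfer_sql</code> helper, and the cluster address in the commented command are all assumptions for illustration:</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Builds an all-or-nothing transfer as one SQL transaction.
# The accounts table and its columns are hypothetical.

# transfer_sql FROM_ID TO_ID AMOUNT
transfer_sql() {
  local from=$1 to=$2 amount=$3
  printf '%s\n' \
    "BEGIN;" \
    "UPDATE accounts SET balance = balance - $amount WHERE id = $from;" \
    "UPDATE accounts SET balance = balance + $amount WHERE id = $to;" \
    "COMMIT;"
}

# Print the statements so you can inspect them.
transfer_sql 1 2 100

# To run them against a local, insecure test cluster (assumed address):
# cockroach sql --insecure --host=localhost:26257 --execute="$(transfer_sql 1 2 100)"
</code></pre>
<p>Even if the two rows live in different ranges on different nodes, either both updates become visible at commit, or neither does.</p>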
<h3 id="heading-how-it-all-fits-together-read-write-flow-what-happens-when-you-use-it">How It All Fits Together: Read + Write Flow (What Happens When You Use It)</h3>
<p>Let’s picture a write, step by step:</p>
<ol>
<li><p>Your app sends a write (for example, “add new user”) to any node in the CockroachDB cluster.</p>
</li>
<li><p>That node figures out which range(s) are involved (which pieces hold the data you want to write).</p>
</li>
<li><p>For each range, the write goes to that range’s leader.</p>
</li>
<li><p>The leader writes the change to its own copy, then tells followers to do the same.</p>
</li>
<li><p>Once most copies confirm they have the change, the leader declares it “committed” and tells your app, “yes, write done.”</p>
</li>
<li><p>If a node is busy or down, others still handle traffic.</p>
</li>
</ol>
<p>Read flow:</p>
<ul>
<li><p>Your app sends a read (for example “get user by ID”) to any node.</p>
</li>
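<li><p>That node checks whether it can serve the read itself. If it holds the replica responsible for serving reads for that range (the leaseholder, which is usually also the Raft leader), it answers. If not, it forwards the request to the node that does.</p>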
</li>
</ul>
<p>Together, these steps keep data correct, up to date, and reliably available even when machines fail or the network lags.</p>
<h3 id="heading-why-this-all-matters-putting-it-in-plain-english">Why This All Matters (Putting It in Plain English)</h3>
<p>All these mechanisms matter for a few key reasons. First, because data is chopped into ranges and replicated, no single node is a bottleneck. Raft also ensures consensus, so you can trust that data is consistent across all working replicas.</p>
<p>Beyond this, rebalancing is automatic, so you don’t have to micromanage shards or worry about nodes drowning in load. And because transactions that touch multiple ranges are coordinated, you can rely on ACID properties even in a distributed setup.</p>
<h2 id="heading-where-and-how-should-you-host-cockroachdb">Where (and How) Should You Host CockroachDB? ☁️</h2>
<p>There isn’t just one “right” way to host CockroachDB. There are a few paths you can pick, each with pros and cons. What you pick depends on cost, control, ease of use, and your risk tolerance.</p>
<p>In this section, we’ll explore:</p>
<ul>
<li><p>Cockroach Labs’ own managed cloud (CockroachDB Cloud)</p>
</li>
<li><p>“Bring Your Own Cloud” (BYOC) – letting Cockroach Labs manage it inside <em>your</em> cloud account</p>
</li>
<li><p>Hosting via cloud marketplaces (AWS, GCP, Azure)</p>
</li>
<li><p>Self-hosting / Kubernetes / your own infrastructure</p>
</li>
<li><p>And notes on DigitalOcean support</p>
</li>
</ul>
<p>Let’s dive in.</p>
<h3 id="heading-option-1-cockroachdb-cloud-fully-managed-by-cockroach-labs">Option 1: CockroachDB Cloud (fully managed by Cockroach Labs)</h3>
<p>This is the easiest option if you want to offload operations. You don’t manage nodes (computers, virtual machines, and so on), upgrades, or backups: Cockroach Labs handles all of that.</p>
<p><strong>What it offers:</strong></p>
<ul>
<li><p>You sign up and click “create cluster.”</p>
</li>
<li><p>Automatic scaling, zero-downtime upgrades, and managed backups.</p>
</li>
<li><p>It supports multiple cloud providers behind the scenes (you pick region(s)).</p>
</li>
<li><p>You get tools, APIs, and Terraform integration to automate it.</p>
</li>
<li><p>They often give free credits to get started.</p>
</li>
</ul>
<p><strong>Tradeoffs:</strong></p>
<ul>
<li><p>You have less control over the underlying infrastructure, such as virtual machines, networking, and disks (you trade control for convenience).</p>
</li>
<li><p>You pay for the managed service premium.</p>
</li>
<li><p>You rely on Cockroach Labs’ SLAs, uptime, and support.</p>
</li>
</ul>
<p>If you want, you can check it out here: <a target="_blank" href="https://www.cockroachlabs.com/product/cloud/">CockroachDB Cloud (managed by Cockroach Labs)</a>.</p>
<h3 id="heading-option-2-bring-your-own-cloud-byoc">Option 2: Bring Your Own Cloud (BYOC)</h3>
<p>This is a middle ground: you keep your cloud environment, but let Cockroach Labs manage the database. It gives you control over infrastructure, billing, network, and so on, while still offloading operational complexity.</p>
<p><strong>How it works:</strong></p>
<ul>
<li><p>You run CockroachDB Cloud inside your cloud account (AWS, GCP, and so on).</p>
</li>
<li><p>Cockroach Labs still handles provisioning, upgrades, backups, and observability. You manage roles, networking, and logs.</p>
</li>
<li><p>Useful for complying with regulations, keeping data within your cloud folder/account, and using your cloud discounts.</p>
</li>
</ul>
<p><strong>Tradeoffs:</strong></p>
<ul>
<li><p>You still need to set up cloud aspects (VPCs, IAM, roles) correctly.</p>
</li>
<li><p>There’s more complexity than pure managed, but more control as well.</p>
</li>
<li><p>Cockroach Labs needs access to certain parts of your account (permissions).</p>
</li>
</ul>
<p>If you want to explore BYOC, you can read more here: <a target="_blank" href="https://www.cockroachlabs.com/product/cloud/bring-your-own-cloud/">CockroachDB Bring Your Own Cloud</a>.</p>
<h3 id="heading-option-3-use-cloud-marketplaces-aws-gcp-azure">Option 3: Use Cloud Marketplaces (AWS, GCP, Azure)</h3>
<p>If you already use a cloud provider, sometimes the easiest way is to deploy via their marketplace offerings. It gives you familiarity, billing simplicity, and so on.</p>
<ul>
<li><p><strong>GCP Marketplace</strong> – CockroachDB is available on the Google Cloud Marketplace, making it easier to deploy within your GCP environment. You can learn more here: <a target="_blank" href="https://console.cloud.google.com/marketplace/product/cockroachdb-public/cockroachdb">GCP Marketplace</a>.</p>
</li>
<li><p><strong>AWS Marketplace</strong> – CockroachDB is listed there: <a target="_blank" href="https://aws.amazon.com/marketplace/pp/prodview-n3xpypxea63du">AWS Marketplace</a>.</p>
</li>
<li><p><strong>Azure Marketplace</strong> – Also supported for Azure deployments (SaaS/managed listings): <a target="_blank" href="https://marketplace.microsoft.com/en-us/product/saas/cockroachlabs1586448087626.cockroachdb-azure?tab=overview">Azure Marketplace</a>.</p>
</li>
<li><p><strong>DigitalOcean</strong> – There is support for CockroachDB deployment on DigitalOcean using their infrastructure: <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-on-digital-ocean">Deploy CockroachDB on DigitalOcean</a>.</p>
</li>
</ul>
<p>These options let you stay in your cloud console, use your existing cloud accounts, and integrate with other resources you already have.</p>
<p>But you're still responsible for certain operational tasks (networking, security, monitoring, backups) depending on how the marketplace offering is configured.</p>
<h3 id="heading-option-4-my-favorite-self-hosting-especially-using-kubernetes">Option 4 (My Favorite 😁): Self-Hosting — Especially Using Kubernetes</h3>
<p>If you self-host CockroachDB, you get <strong>full control</strong>. You’re the boss of everything: the machines, storage, networking, backups, upgrades, monitoring – all of it.</p>
<p>What’s even better is that using Kubernetes means your setup isn’t tied to one cloud provider. You can run it on AWS, GCP, Azure, or even on-premises later, with very little change. Kubernetes gives you a “portable infra” layer.</p>
<p>Managed CockroachDB services charge extra for maintenance, upgrades, backups, and so on; those costs are baked into the price. When you self-host, you take on that burden yourself but avoid paying the extra margin. You pay only for compute, disks, network, and your own time and ops work.</p>
<p>You can also self-host in the cloud (using cloud VMs) while still managing every layer: disks, network, security, and so on. Kubernetes offers a sweet middle ground here: you get cloud reliability for the VMs, but you fully control everything above them.</p>
<h4 id="heading-why-kubernetes-beats-tools-like-docker-swarm-or-hashicorp-nomad-for-databases">Why Kubernetes Beats Tools Like Docker Swarm or HashiCorp Nomad for Databases</h4>
<p>Because CockroachDB is a <strong>stateful</strong> system (it holds data), you need strong support for “data that stays even when a pod restarts or moves.” Kubernetes is designed with good primitives for that. Other tools don’t always shine there.</p>
<p>Here’s the comparison in simple terms:</p>
<ul>
<li><p><strong>Docker Swarm / Docker Compose:</strong> Great for stateless apps (web servers, APIs), but when it comes to databases, it struggles. Swarm doesn’t natively support persistent volume claims at a cluster level, so if a container (database replica) moves to a different node (VM), it might lose access to its storage. Devs often pin containers to specific nodes manually to avoid this.</p>
</li>
<li><p><strong>Nomad:</strong> More flexible and simpler in some ways, but it’s not as rich in features around connectivity, storage management, and built-in tooling for containers. It works well in mixed workloads, but handling complex databases usually means you need to build extra layers.</p>
</li>
<li><p><strong>Kubernetes:</strong> It has built-in support for stateful workloads:</p>
<ul>
<li><p><strong>StatefulSets (Properly managing data for each database):</strong> This ensures that each CockroachDB replica (pod) keeps its identity and storage intact even if the pod restarts. So the database replica doesn’t lose its “name” or data when things change.</p>
</li>
<li><p><strong>Persistent volumes and persistent volume claims (external disks):</strong> These are like dedicated hard drives or disks attached to pods (database replicas). Even if a pod moves, crashes, or restarts, the disk (data) stays. Kubernetes makes sure the data stays safe.</p>
</li>
<li><p><strong>StorageClasses (choose your disk):</strong> You can choose the type of disk your data is stored on, for example:</p>
<ul>
<li><p>HDD (most affordable, but slower),</p>
</li>
<li><p>Balanced disk (SSD-backed, a balance between cost and speed),</p>
</li>
<li><p>Fast SSD (very fast, recommended by the CockroachDB team, but a bit more expensive than a balanced disk).</p>
</li>
</ul>
</li>
<li><p><strong>Rolling updates and anti-affinity (no downtime, high availability, fault tolerance):</strong></p>
<ul>
<li><p>Anti-affinity means you can tell Kubernetes, “don’t put more than one CockroachDB replica on the same VM or physical machine.” If one VM goes bad, the other replicas are safe.</p>
</li>
<li><p>Rolling updates let you update one replica at a time (configuration, version, resources) without bringing down the whole cluster. While one replica updates, the others serve traffic. That helps avoid downtime.</p>
</li>
<li><p>Kubernetes also starts and stops replicas in order (via StatefulSets), so things are predictable and safe.</p>
</li>
</ul>
</li>
<li><p><strong>Vertical vs horizontal scaling (a quick reminder)</strong><br>  As covered in earlier sections:</p>
<ul>
<li><p><strong>Horizontal scaling</strong> means adding more replicas (more pods, more nodes) so load spreads out.</p>
</li>
<li><p><strong>Vertical scaling</strong> means increasing the resources (CPU, RAM, disk) of existing nodes/replicas.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>In tools like Nomad or Docker Swarm, vertical scaling tends to be harder. It often involves stopping services and restarting VMs, which causes downtime.</p>
<p>Kubernetes makes both vertical and horizontal scaling easier at the pod level (for example, you can resize a single pod’s CPU and RAM) and manages rolling upgrades so you don’t take everything down at once.</p>
<p>You can also easily add more database replicas to the cluster (to balance load and process queries faster). The data is then automatically copied to the new replica (replication), especially when you use the official CockroachDB Helm chart.</p>
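<p>As a rough sketch of what horizontal scaling can look like once the chart is installed: the release name <code>my-cockroachdb</code>, the repo alias <code>cockroachdb</code>, and the replica count below are all assumptions, so adjust them to your own setup. The script skips itself if Helm isn’t installed yet.</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Scale a CockroachDB Helm release to more replicas.
# RELEASE and REPLICAS are placeholder values for this sketch.
RELEASE=my-cockroachdb
REPLICAS=4

if command -v helm >/dev/null; then
  # Keep the existing settings and change only the replica count.
  helm upgrade "$RELEASE" cockroachdb/cockroachdb \
    --reuse-values \
    --set statefulset.replicas="$REPLICAS"

  # Watch the new pod and its persistent volume claim appear.
  kubectl get statefulsets,pods,pvc
else
  echo "helm not found; install it first"
fi
</code></pre>
<p>Because the chart uses a StatefulSet, the extra replica gets its own stable name and its own disk, and CockroachDB then starts rebalancing ranges onto it.</p>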
<h4 id="heading-why-other-tools-swarm-nomad-docker-compose-dont-match-up-here">Why Other Tools (Swarm / Nomad / Docker Compose) Don’t Match Up Here</h4>
<p>Docker Swarm and Docker Compose are simpler to use and are good when you don’t have much complexity. But they lack robust features for stable storage, default support for replication, vertical scaling, horizontal scaling of stateful services, and so on. For example, Swarm doesn’t have built-in StatefulSets or dynamic volume provisioning like Kubernetes.</p>
<p>Nomad is more flexible than Swarm in some ways, but many users say its storage plugins (CSI) are weaker than what Kubernetes has. It also has less built-in support for ordered startup and rolling updates of stateful apps.</p>
<p>So while these tools work fine for simpler apps (stateless services, small apps), when you have a distributed stateful SQL database like CockroachDB, Kubernetes gives you more safety, more control, and less chance of data loss or misconfiguration.</p>
<p>Because of all this, running CockroachDB on Kubernetes gives you the tools you need baked in, reducing how much custom plumbing you must write yourself.</p>
<h4 id="heading-trade-offhttpswwwredditcomrhashicorpcomments1ivtuo5utmsourcechatgptcoms-things-to-watch-out-for">Trade-offs (things to watch out for)</h4>
<ul>
<li><p>You have to manage everything: backups, monitoring the ENTIRE CockroachDB cluster, withstanding failures (fault tolerance), and upgrades. That’s work 🥲.</p>
</li>
<li><p>You need to know your way around infra (VMs, disks, networking, and inter-node connections) and operations (or have teammates who do – DevOps Engineers, Cloud Architects, Site Reliability Engineers).</p>
</li>
<li><p>Using managed Kubernetes (like GKE, EKS, AKS) helps as you offload the control plane. You still manage the nodes, storage, and higher layers.</p>
</li>
<li><p>But even with that, you avoid paying for “database management as a service” markup – you're only paying for infrastructure plus your time.</p>
</li>
</ul>
<h2 id="heading-setting-up-your-local-environment">Setting Up Your Local Environment 🧑‍💻</h2>
<p>Alright, we’ve learned quite a bit so far: what CockroachDB is, how it works behind the scenes, and where you can host it. Now, it’s time to roll up our sleeves and get our hands dirty with some practical setup.</p>
<p>Before we deploy CockroachDB, we need a safe “playground” where we can test and experiment without touching the cloud or spending a dime.</p>
<h3 id="heading-why-these-tools">Why these tools?</h3>
<p>Before we jump into running commands, here’s a quick look at the tools we’ll use and why:</p>
<ul>
<li><p><strong>Minikube</strong>: A tool that runs a small Kubernetes cluster on your computer. It gives you a local “mini cloud” where you can deploy and experiment.</p>
</li>
<li><p><strong>kubectl</strong>: The command-line tool you’ll use to talk to your Kubernetes cluster: to deploy apps, check status, and manage resources.</p>
</li>
<li><p><strong>Helm</strong>: A package manager for Kubernetes. It helps you install complex applications (like CockroachDB) with fewer manual steps.</p>
</li>
</ul>
<h3 id="heading-step-1-install-minikube">Step 1: Install Minikube</h3>
<p><strong>What is Minikube?</strong><br>Minikube is a lightweight tool that helps you run a small Kubernetes cluster on your personal computer.</p>
<p>Think of it as your own mini-cloud environment where you can test, deploy, and learn Kubernetes (and in our case, CockroachDB) locally. It’s perfect for learning and experimenting before deploying on the cloud.</p>
<p>Here’s how to get it on different operating systems:</p>
<h4 id="heading-windows">🪟 Windows</h4>
<ol>
<li><p>Make sure you have a hypervisor (VirtualBox, Hyper-V) or Docker installed.</p>
</li>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> choco install minikube
</code></pre>
<p> or use:</p>
<pre><code class="lang-bash"> winget install Kubernetes.minikube
</code></pre>
</li>
<li><p>After installation, check the version:</p>
<pre><code class="lang-bash"> minikube version
</code></pre>
<p> If it returns a version number, you’re good 👍🏾 Then run <code>minikube start</code> to start the cluster.</p>
</li>
</ol>
<p>If you don’t have the <code>choco</code> or <code>winget</code> package manager, you can install Minikube via PowerShell by following the steps in the <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/?arch=%2Fwindows%2Fx86-64%2Fstable%2F.exe+download">docs</a>.</p>
<h4 id="heading-macos">🍎 macOS</h4>
<ol>
<li><p>Ensure you have Homebrew installed.</p>
</li>
<li><p>In Terminal, run:</p>
<pre><code class="lang-bash"> brew install minikube
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> minikube status
</code></pre>
</li>
</ol>
<h4 id="heading-linux">🐧 Linux</h4>
<ol>
<li><p>Ensure you’re on a supported distribution (Ubuntu, Fedora, and so on) and virtualization (Docker, KVM, and so on) is enabled.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
 sudo install minikube-linux-amd64 /usr/<span class="hljs-built_in">local</span>/bin/minikube
 rm minikube-linux-amd64
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> minikube status
</code></pre>
</li>
</ol>
<p>✅ At this point you should have a local Kubernetes cluster up and running on your machine! Next, we’ll install kubectl so you can talk to the cluster from your command line.</p>
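<p>If your machine has room to spare, you can give Minikube more CPU and memory when the cluster is created. This is an optional sketch: the values <code>4</code> and <code>8192</code> are assumptions, not requirements, so pick what your computer can afford. These settings apply when a cluster is first created, so if you already started one, you may need to run <code>minikube delete</code> before trying again.</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Start Minikube with explicit resources (example values, adjust to your machine).
CPUS=4
MEMORY_MB=8192

if command -v minikube >/dev/null; then
  minikube start --cpus="$CPUS" --memory="$MEMORY_MB"
  minikube status
else
  echo "minikube not found; install it first"
fi
</code></pre>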
<h3 id="heading-step-2-install-kubectl">Step 2: Install kubectl</h3>
<p><strong>What kubectl does:</strong><br>kubectl is the command-line tool that lets you talk to your Kubernetes cluster. Using it, you can deploy applications, check your cluster’s health, and manage resources inside your cluster.</p>
<p>You’ll use it a lot when working with Kubernetes on Minikube and later when you deploy CockroachDB.</p>
<p>Here’s how to install it on Windows, macOS, and Linux:</p>
<h4 id="heading-windows-1">🪟 Windows</h4>
<ol>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>Run:</p>
<pre><code class="lang-bash"> choco install kubernetes-cli
</code></pre>
<p> or, if you prefer winget:</p>
<pre><code class="lang-bash"> winget install -e --id Kubernetes.kubectl
</code></pre>
</li>
<li><p>Then check the version:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
<p> If it prints a version number, you’re good.</p>
</li>
</ol>
<h4 id="heading-macos-1">🍎 macOS</h4>
<ol>
<li><p>Open Terminal.</p>
</li>
<li><p>If you have Homebrew installed, run:</p>
<pre><code class="lang-bash"> brew install kubectl
</code></pre>
</li>
<li><p>Check the version:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
<p> That should show something like “Client Version: v1.x.x”.</p>
</li>
</ol>
<h4 id="heading-linux-1">🐧 Linux</h4>
<ol>
<li><p>Open your terminal.</p>
</li>
<li><p>Download the latest kubectl binary:</p>
<pre><code class="lang-bash"> curl -LO <span class="hljs-string">"https://dl.k8s.io/release/<span class="hljs-subst">$(curl -L -s https://dl.k8s.io/release/stable.txt)</span>/bin/linux/amd64/kubectl"</span>
</code></pre>
</li>
<li><p>Make it executable and move it into your PATH:</p>
<pre><code class="lang-bash"> chmod +x ./kubectl
 sudo mv ./kubectl /usr/<span class="hljs-built_in">local</span>/bin/kubectl
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> kubectl version --client
</code></pre>
</li>
</ol>
<p>After this, you’ll have kubectl installed and ready to use with your local Minikube cluster. Next up we’ll install Helm, which will make deploying CockroachDB much easier.</p>
<h3 id="heading-step-3-install-helm">Step 3: Install Helm</h3>
<p>Helm is basically the package manager for Kubernetes. Think of it like how you use <code>apt</code>, <code>yum</code>, or <code>brew</code> to install software on your computer. Helm does something similar for Kubernetes apps.</p>
<p>With Kubernetes, deploying a full app often means writing lots of configs (manifests for Deployments, Services, PersistentVolumes, ConfigMaps, and so on). Helm lets us bundle all of that into a single “package” (called a chart) so we don’t have to manually create the resources one after the other (which could be hectic to manage btw 😖).</p>
<p>Because our goal is to deploy a pretty complex system (CockroachDB) on Kubernetes – which includes stateful nodes, persistent storage, networking, SSL/TLS, and so on – using a Helm chart makes it <em>so much easier</em> than crafting dozens of YAML files from scratch.</p>
<p>So before we install CockroachDB, we’ll install Helm. This gives us the toolkit to deploy and manage our cluster much more easily.</p>
<p>Let’s install Helm on each platform. After this, you’ll have the <code>helm</code> command ready to deploy apps into your Kubernetes cluster.</p>
<h4 id="heading-windows-2">🪟 Windows</h4>
<ol>
<li><p>Open PowerShell as Administrator.</p>
</li>
<li><p>If you have Chocolatey installed, run:</p>
<pre><code class="lang-bash"> choco install kubernetes-helm
</code></pre>
<p> Alternatively, with winget:</p>
<pre><code class="lang-bash"> winget install Helm.Helm
</code></pre>
</li>
<li><p>Confirm installation:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
<p> You should see something like <code>version.BuildInfo{Version:"v3.x.x",…}</code>.</p>
</li>
</ol>
<h4 id="heading-macos-2">🍎 macOS</h4>
<ol>
<li><p>Open Terminal.</p>
</li>
<li><p>With Homebrew installed, run:</p>
<pre><code class="lang-bash"> brew install helm
</code></pre>
</li>
<li><p>Verify:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
<p> If you see version info, you’re good.</p>
</li>
</ol>
<h4 id="heading-linux-2">🐧 Linux</h4>
<ol>
<li><p>Open your terminal.</p>
</li>
<li><p>Download and install the binary (example for the latest version):</p>
<pre><code class="lang-bash"> curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
 chmod 700 get_helm.sh
 ./get_helm.sh
</code></pre>
<p> Or you can directly download the binary and move it into your <code>PATH</code>.</p>
</li>
<li><p>Check version:</p>
<pre><code class="lang-bash"> helm version
</code></pre>
</li>
</ol>
<p>✅ After this, you have <code>helm</code> installed and you’re ready to use it.</p>
<p>In the next part, we’ll use Helm to install CockroachDB into your local Minikube cluster. We’ll add the CockroachDB chart, configure it, and spin up a multi-node replica setup right on your PC.</p>
<h2 id="heading-deploying-cockroachdb-on-minikube-the-fun-part-begins">Deploying CockroachDB on Minikube (The Fun Part Begins 😁!)</h2>
<p>Before we go to the cloud, we’ll deploy CockroachDB locally on Minikube using Helm.</p>
<p>This process will help us:</p>
<ul>
<li><p>Understand how CockroachDB runs in a cluster</p>
</li>
<li><p>Learn how Kubernetes manages database replicas</p>
</li>
<li><p>Gain hands-on experience before deploying to the cloud</p>
</li>
</ul>
<h3 id="heading-step-1-visit-artifacthub">Step 1: Visit ArtifactHub</h3>
<p><strong>ArtifactHub</strong> is like an App Store for Kubernetes Helm Charts – a huge collection of open-source Helm charts and packages you can easily install.</p>
<ol>
<li><p>Go to <a target="_blank" href="https://artifacthub.io">https://artifacthub.io</a></p>
</li>
<li><p>In the search bar, type <strong>CockroachDB</strong></p>
</li>
<li><p>Click the <strong>CockroachDB Helm chart</strong> result (you’ll see it published by <em>Cockroach Labs</em>).</p>
</li>
</ol>
<p>You’ll see something like this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848079912/1778bbcf-088a-4919-80bb-ca24241ffa85.png" alt="The official CockroachDB Helm chart" class="image--center mx-auto" width="918" height="469" loading="lazy"></p>
<h3 id="heading-step-2-explore-the-helm-chart">Step 2: Explore the Helm Chart</h3>
<p>You’ll notice a lot of information on the page:</p>
<ul>
<li><p><strong>README</strong> – the documentation for installing and customizing CockroachDB</p>
</li>
<li><p><strong>Default Values</strong> – all the settings that define how the database runs</p>
</li>
</ul>
<p>Don’t worry if it looks overwhelming. We’ll walk through it together 😉</p>
<h3 id="heading-step-3-copy-the-default-values">Step 3: Copy the Default Values</h3>
<p>Every Helm chart has a <em>default configuration</em> file. These defaults are usually too advanced or too heavy for local setups, so we’ll create our own lighter version. But first, let’s copy the original for reference.</p>
<ol>
<li><p>On the CockroachDB chart page, click the <strong>Default Values</strong> button.</p>
</li>
<li><p>A modal window will pop up showing a long YAML file.</p>
</li>
<li><p>Click the <strong>Copy</strong> icon in the top-right corner to copy all the default values.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848210119/17cd734b-6d7c-40dc-a8c3-f01c85edd7a7.png" alt="The Default Values button description" class="image--center mx-auto" width="896" height="458" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760848520060/1e1ce249-0cf0-46cb-abbc-00efb3ea1343.png" alt="Copy the default values" class="image--center mx-auto" width="781" height="399" loading="lazy"></p>
<h3 id="heading-step-4-create-a-folder-for-our-project">Step 4: Create a Folder for Our Project</h3>
<p>We’ll keep everything organized in a single folder.</p>
<pre><code class="lang-bash">mkdir cockroachdb-tutorial
<span class="hljs-built_in">cd</span> cockroachdb-tutorial
</code></pre>
<p>Inside this folder, create a new file called:</p>
<pre><code class="lang-bash">nano cockroachdb-original-values.yml
</code></pre>
<p>Now paste all the default values you copied earlier (use Ctrl+V or right-click → Paste), then save and exit (<code>Ctrl+O</code>, then <code>Ctrl+X</code> in nano).</p>
<p>If you’re on Windows, just open Notepad/VSCode, paste the content, and save the file in the same folder.</p>
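<p>If you’d rather stay in the terminal, Helm can write the same defaults to a file for you. This is a sketch of an alternative to copying from the web page: the repo alias <code>cockroachdb</code> is an arbitrary name, and the repo URL is the one Cockroach Labs documents for its chart, so double-check it against the chart page before relying on it.</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Save the chart's default values to a local file for reference.
VALUES_FILE=cockroachdb-original-values.yml

if command -v helm >/dev/null; then
  # Register the chart repository under the alias "cockroachdb".
  helm repo add cockroachdb https://charts.cockroachdb.com/
  helm repo update

  # Print the chart's default values and save them.
  helm show values cockroachdb/cockroachdb > "$VALUES_FILE"
else
  echo "helm not found; copy the values from the chart page instead"
fi
</code></pre>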
<h3 id="heading-step-5-understanding-the-key-configurations">Step 5: Understanding the Key Configurations</h3>
<p>Let’s break down a few important values you’ll notice in the file.</p>
<h4 id="heading-statefulsetreplicas">🧩 <code>statefulset.replicas</code></h4>
<p>This tells Kubernetes how many CockroachDB nodes (pods) to run in the cluster. By default, it’s set to 3, meaning you’ll have 3 database instances that can all read and write data.</p>
<h4 id="heading-statefulsetresourcesrequests-and-statefulsetresourceslimits">⚙️ <code>statefulset.resources.requests</code> and <code>statefulset.resources.limits</code></h4>
<p>These settings tell Kubernetes how much CPU and memory to give CockroachDB.</p>
<ul>
<li><p><code>requests</code>: the minimum guaranteed amount</p>
</li>
<li><p><code>limits</code>: the maximum allowed amount</p>
</li>
</ul>
<p>CockroachDB can be a bit greedy with memory 😅, so limits make sure it doesn’t take everything and leave no room for other apps.</p>
<h4 id="heading-storagepersistentvolumesize">💾 <code>storage.persistentVolume.size</code></h4>
<p>This defines how much disk space each CockroachDB node gets. For example, if you set it to <code>10Gi</code> and you have 3 replicas, total usage = <code>30Gi</code>.</p>
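<p>Keep in mind that provisioned disk isn’t the same as room for unique data, because every range is stored several times. Here’s the arithmetic as a quick sketch, assuming the usual 3 copies per range mentioned earlier and ignoring overhead:</p>
<pre><code class="lang-bash">#!/usr/bin/env bash
# Disk math for a small cluster (example numbers).
REPLICAS=3            # statefulset.replicas
SIZE_GI=10            # storage.persistentVolume.size, in Gi
REPLICATION_FACTOR=3  # assumed number of copies kept per range

TOTAL_GI=$((REPLICAS * SIZE_GI))
UNIQUE_GI=$((TOTAL_GI / REPLICATION_FACTOR))

echo "Provisioned disk: ${TOTAL_GI}Gi"
echo "Room for unique data (before overhead): about ${UNIQUE_GI}Gi"
</code></pre>
<p>So 3 nodes with <code>10Gi</code> each give you <code>30Gi</code> of disk, but only about <code>10Gi</code> of distinct data fits, since each piece is kept in three places.</p>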
<h4 id="heading-storagepersistentvolumestorageclass">💽 <code>storage.persistentVolume.storageClass</code></h4>
<p>This defines the type of disk to use. The class names depend on where your cluster runs. On Google Cloud, for example, you might see:</p>
<ul>
<li><p><code>standard</code>: HDD (cheap but slow)</p>
</li>
<li><p><code>standard-rwo</code>: SSD (faster and affordable)</p>
</li>
<li><p><code>pd-ssd</code> or <code>fast-ssd</code>: high-performance SSD (very fast but pricey)</p>
</li>
</ul>
<p>You can check available storage classes in your Minikube cluster using:</p>
<pre><code class="lang-bash">kubectl get sc
</code></pre>
<p>On Minikube, the default storage class is usually <code>standard</code>.</p>
<p>You can learn more about <a target="_blank" href="https://cloud.google.com/kubernetes-engine/docs/concepts/storage-overview">Google Cloud storage classes here</a>.</p>
<h4 id="heading-tlsenabled">🔐 <code>tls.enabled</code></h4>
<p>This controls whether CockroachDB requires <strong>TLS certificates</strong> for secure connections.</p>
<p>If <code>true</code>, every connection to the cluster is encrypted, and you’ll need to generate certificates for the nodes and for any app or client that connects to your cluster. This is <strong>strongly recommended for production</strong>, but for our local Minikube setup, we’ll disable it so it’s easier to play around and test connections.</p>
<h3 id="heading-step-6-create-a-simplified-values-config-for-the-cockroachdb-helm-chart">Step 6: Create a Simplified Values Config for the CockroachDB Helm Chart</h3>
<p>We’ll now create a new config file with lighter resource settings for our local test environment.</p>
<p>In the same folder, create:</p>
<pre><code class="lang-bash">nano cockroachdb-values.yml
</code></pre>
<p>Then paste this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">podSecurityContext:</span>
    <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span> <span class="hljs-comment"># You should have 3GB+ of RAM free on your device; else, you can reduce this to 500Mi (this will result in your PC needing just 1.5 GB of RAM free)</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>  <span class="hljs-comment"># The same with this, you can reduce it to 500m CPU if you don't have up to 3 CPU cores (1 CPU core * 3 replicas)</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">podAntiAffinity:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
  <span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span> <span class="hljs-comment"># Make sure you have 15GB+ of free storage on your local machine, if not, you can reduce it to 2 - 3 Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
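<p>Setting the <code>requests</code> and <code>limits</code> to the same value gives the CockroachDB pods the <code>Guaranteed</code> Quality of Service class, which makes them the last candidates for eviction when the node runs low on resources. A pod that goes over its memory limit can still be killed, and CPU usage above the limit is throttled rather than terminated.</p>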
<p>You can <a target="_blank" href="https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/">read more about this here</a>.</p>
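<p>To make the Quality of Service rule concrete, here’s a simplified Python sketch of how Kubernetes classifies a pod. It’s an approximation for illustration only: the function name <code>qos_class</code> is made up, it looks at a single container, it considers only <code>cpu</code> and <code>memory</code>, and it compares values literally (so <code>"1"</code> and <code>"1000m"</code> would not count as equal here).</p>
<pre><code class="lang-python">def qos_class(requests, limits):
    """Classify a single-container pod using a simplified QoS rule."""
    # No requests and no limits at all: lowest priority class.
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed needs a limit for both cpu and memory, with the request
    # equal to it (a missing request defaults to the limit).
    guaranteed = all(
        res in limits and requests.get(res, limits[res]) == limits[res]
        for res in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


# The values from cockroachdb-values.yml: requests equal limits.
print(qos_class({"cpu": 1, "memory": "1Gi"}, {"cpu": 1, "memory": "1Gi"}))  # Guaranteed

# A smaller CPU request than the limit drops the pod to Burstable.
print(qos_class({"cpu": "500m", "memory": "1Gi"}, {"cpu": 1, "memory": "1Gi"}))  # Burstable
</code></pre>
<p>If you lower the requests as suggested in the comments of the values file, lower the limits to match if you want to keep the <code>Guaranteed</code> class.</p>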
<h3 id="heading-overview-of-the-yaml-values">Overview of the YAML values</h3>
<p>Now, let’s go through the contents of the <code>cockroachdb-values.yml</code> file together.</p>
<p><code>podSecurityContext</code> – why you need it on Minikube:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">podSecurityContext:</span>
  <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
</code></pre>
<p>This block sets the Linux user and group IDs that the CockroachDB process runs as inside the container, and the group ownership for mounted files.</p>
<p>Why this matters, simply:</p>
<ul>
<li><p>The CockroachDB process runs as <strong>UID 1000</strong> inside the container. If the disk mount (the persistent volume) is owned by a different UID, Cockroach can’t create files there and fails with <code>permission denied</code>.</p>
</li>
<li><p><code>runAsUser</code> and <code>runAsGroup</code> make the container process run as UID/GID 1000.</p>
</li>
<li><p><code>fsGroup</code> makes the mounted volume accessible to that group, so the process can write to <code>/cockroach/cockroach-data</code>.</p>
</li>
</ul>
<p>In short, these lines make sure the DB process has permission to create and write files on the mounted disk (volume), which is especially important on Minikube and other local setups where host-mounted storage can have odd permissions.</p>
<p><code>podAntiAffinity</code> and <code>nodeSelector</code> – what they do:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">podAntiAffinity:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>

<span class="hljs-attr">nodeSelector:</span>
  <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
</code></pre>
<p><code>podAntiAffinity</code> is enabled in the chart by default. Normally, it tells Kubernetes to <em>spread</em> pods across different nodes (VMs), so replicas don’t run on the same physical machine. This is good for high availability, because one node failing won’t take down multiple replicas.</p>
<p>By setting <code>type: ""</code> (empty), you <strong>disable</strong> that spreading rule, so Kubernetes can place multiple CockroachDB replicas on the same node.</p>
<p><code>nodeSelector</code> tells Kubernetes to schedule pods only on nodes that match the label you set (here <code>kubernetes.io/hostname: minikube</code>). That forces all pods to run on the node named <code>minikube</code>.</p>
<p>Quick summary of the effect:</p>
<ul>
<li><p>Good for local testing on a multi-node Minikube cluster, when only one node has properly mounted writable storage.</p>
</li>
<li><p><strong>Not recommended for production</strong>, because it places all replicas on the same machine (single point of failure).</p>
</li>
</ul>
<p>PS: If you’re using a different local Kubernetes distribution (for example, K3s or Kind), the pods may never get scheduled, because the <code>nodeSelector</code> property targets a node named <code>minikube</code>. In that case, I’d advise removing the <code>nodeSelector</code> property shown below from your values file entirely.</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
<span class="hljs-string">...</span>
</code></pre>
<p>✅ <strong>At this point</strong>, we’ve:</p>
<ul>
<li><p>Copied the default CockroachDB Helm chart configuration</p>
</li>
<li><p>Created a lightweight version for Minikube</p>
</li>
<li><p>Learned what each key property means</p>
</li>
</ul>
<h3 id="heading-step-7-install-the-cockroachdb-cluster-using-helm">🚀 Step 7: Install the CockroachDB Cluster Using Helm</h3>
<p>Great job so far! You’ve created your <code>cockroachdb-values.yml</code> file and set up your custom configuration for Minikube. Now we’ll actually deploy the cluster.</p>
<p><strong>What we’re going to do:</strong><br>We’ll use Helm to install the official CockroachDB Helm chart using our custom values. This will spin up your 3-node cluster locally so you can play with it.</p>
<p><strong>Command to run:</strong></p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Here:</p>
<ul>
<li><p><code>crdb</code> is the name we’re giving this release (you can pick something else if you like).</p>
</li>
<li><p><code>cockroachdb/cockroachdb</code> tells Helm which chart to use.</p>
</li>
<li><p><code>-f cockroachdb-values.yml</code> tells Helm to override the chart’s default values with the ones in our custom file. Anything we didn’t set keeps its default.</p>
</li>
</ul>
<h4 id="heading-after-the-command-runs">After the command runs:</h4>
<p>After a little while, the command completes and prints the chart’s post-installation message, which summarizes the release and how to connect to it. Behind the scenes, Helm has created the pods, services, persistent volume claims, and so on.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761386160496/babc3e67-1ea9-4aa1-b6a7-516fe3a9972a.png" alt="The CockroachDB Helm Chart post-installation message" class="image--center mx-auto" width="923" height="675" loading="lazy"></p>
<p>Now to check if everything is working, do this:</p>
<pre><code class="lang-bash">kubectl get pods | grep -i crdb
</code></pre>
<p>This filters pods with “crdb” in the name (our release prefix).</p>
<p>You should see something like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761386195190/21469ce5-c909-4336-ba5f-a4c4a776a470.png" alt="The CockroachDB replicas running successfully" class="image--center mx-auto" width="528" height="105" loading="lazy"></p>
<p>The three primary pods (<code>0</code>, <code>1</code>, <code>2</code>) should be in <code>Running</code> state. The <code>init</code> job or pod (<code>crdb-cockroachdb-init-xxx</code>) should show <code>Completed</code>. This means the initialization tasks (cluster bootstrap) succeeded.</p>
<p>If you see that, congratulations! You’ve got your local CockroachDB cluster up and running! 🎉</p>
<h2 id="heading-accessing-the-cockroachdb-console-amp-viewing-metrics">Accessing the CockroachDB Console &amp; Viewing Metrics</h2>
<p>Alright! Now that our CockroachDB cluster is up and running, let’s take a peek behind the scenes and explore the CockroachDB Admin Console. It’s a beautiful web dashboard that helps us visualize everything happening in our database cluster.</p>
<p>In this section, we’ll learn how to:</p>
<ul>
<li><p>Access the CockroachDB admin console right from your browser 🖥️</p>
</li>
<li><p>Understand what each built-in dashboard shows (CPU, memory, disk, SQL performance)</p>
</li>
<li><p>Confirm that our cluster is healthy and that all 3 nodes are working together perfectly</p>
</li>
</ul>
<h3 id="heading-step-1-locate-the-cockroachdb-public-service">Step 1: Locate the CockroachDB Public Service</h3>
<p>The CockroachDB Helm chart automatically creates a <strong>public service</strong> that allows us to connect to the database and also access its dashboard.</p>
<p>Let’s check it out by running:</p>
<pre><code class="lang-bash">kubectl get svc | grep -i crdb
</code></pre>
<p>You should see a line similar to:</p>
<pre><code class="lang-bash">crdb-cockroachdb-public   ClusterIP   10.x.x.x   &lt;none&gt;   26257/TCP,8080/TCP   ...
</code></pre>
<p>This service (<code>crdb-cockroachdb-public</code>) is what we’ll use to connect to both:</p>
<ul>
<li><p>The <strong>database</strong> itself (via port 26257)</p>
</li>
<li><p>The <strong>dashboard UI</strong> (via port 8080)</p>
</li>
</ul>
<h3 id="heading-step-2-learn-more-about-the-service">Step 2: Learn More About the Service</h3>
<p>Let’s dig a little deeper to understand it:</p>
<pre><code class="lang-bash">kubectl describe svc crdb-cockroachdb-public
</code></pre>
<p>Here’s what you’ll notice:</p>
<ul>
<li><p><strong>Port 26257</strong> is labelled <code>grpc</code> in the output, but it’s also the port for <strong>SQL connections</strong>: applications connect to it using the PostgreSQL wire protocol to send queries and receive results.</p>
</li>
<li><p><strong>Port 8080</strong> is used for the <strong>web dashboard</strong>, where we can view metrics and monitor performance.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387757614/dab8cfd0-2d89-45b0-a54f-41e530f1a6ab.png" alt="Description of the crdb-cockroachdb-public service" class="image--center mx-auto" width="938" height="431" loading="lazy"></p>
<h3 id="heading-step-3-access-the-cockroachdb-dashboard">Step 3: Access the CockroachDB Dashboard</h3>
<p>Now, let’s make the dashboard available on your local computer. Run this command:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 8080:8080
</code></pre>
<p>This command simply tells Kubernetes:</p>
<blockquote>
<p>“Hey, please open a tunnel from my local computer’s port 8080 to the CockroachDB service’s port 8080 in the cluster.”</p>
</blockquote>
<p>Once you see something like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387838362/186ff222-c643-4e67-b0a4-dbaff8777977.png" alt="Result of port-forwarding the crdb-cockroachdb-public service on port 8080" class="image--center mx-auto" width="832" height="59" loading="lazy"></p>
<p>...you’re good to go!</p>
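<p>If you’d like to confirm the tunnel from a script rather than by eye, a small Python check like the one below can help. It’s a minimal sketch that only tests whether something accepts TCP connections on the forwarded port. The function name <code>port_is_open</code> is made up for this example, and it doesn’t verify that the listener is actually CockroachDB.</p>
<pre><code class="lang-python">import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host is unreachable.
        return False


if __name__ == "__main__":
    # 8080 is the local port used in the port-forward command above.
    print(port_is_open("localhost", 8080))
</code></pre>
<p>Run it in a second terminal while the port-forward is still active. If it prints <code>False</code>, the tunnel isn’t up yet or has been closed.</p>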
<h3 id="heading-step-4-visit-the-dashboard">Step 4: Visit the Dashboard</h3>
<p>Now, open your browser and go to http://localhost:8080.</p>
<p>You’ll see the CockroachDB Admin Console. This is your command center for monitoring your cluster.</p>
<p>Here, you’ll be able to view:</p>
<ul>
<li><p><strong>Number of replicas (nodes)</strong>: You should see 3 in our setup.</p>
</li>
<li><p><strong>RAM usage</strong> per node: Helps track how much memory each CockroachDB instance is using.</p>
</li>
<li><p><strong>CPU usage</strong>: Useful to know when your database is getting busy.</p>
</li>
<li><p><strong>Disk space</strong>: Shows how much data your cluster is storing and how much free space remains.</p>
</li>
</ul>
<p>Here’s what your dashboard might look like 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761387968743/327288e5-4811-42bf-8fd8-74ed187792a4.png" alt="The CockroachDB dashboard UI on http://localhost:8080" class="image--center mx-auto" width="1858" height="952" loading="lazy"></p>
<h3 id="heading-step-5-exploring-the-metrics-dashboard">Step 5: Exploring the Metrics Dashboard</h3>
<p>Now that you’re inside the CockroachDB Admin Console (<a target="_blank" href="http://localhost:8080">http://localhost:8080</a>), let’s take things a step further by exploring the <strong>Metrics</strong> section. This is where CockroachDB really shines.</p>
<p>On the left-hand side, click on “Metrics.” Here, you’ll find a collection of dashboards showing how your database is performing behind the scenes, things like query activity, performance, memory use, and much more.</p>
<p>These metrics help you understand what’s happening inside your cluster and make data-driven decisions – like when to scale up, optimize queries, or add more nodes.</p>
<p>We’ll start by focusing on some of the most insightful ones, such as:</p>
<ul>
<li><p><strong>SQL Queries Per Second</strong> – how busy your database is</p>
</li>
<li><p><strong>Service Latency (SQL Statements, 99th percentile)</strong> – how fast or slow your queries are</p>
</li>
</ul>
<p>Then, we’ll also look at others like SQL Contention, Replicas per Node, and Capacity to get a complete view of your CockroachDB cluster’s health.</p>
<p>Here’s what each of these metrics means in simple, everyday terms 👇🏾</p>
<h4 id="heading-sql-queries-per-second">SQL Queries Per Second</h4>
<p>This metric shows the number of SQL commands (like <code>SELECT</code>, <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>) your database cluster is handling every second. In simpler words, it’s how busy your database is. Imagine cars passing through a toll booth – this is the count of cars per second.</p>
<p>This is useful to know because if this number is steadily climbing, your system is getting more traffic or work. You may need to scale up (more nodes, more resources) or optimize queries. If it drops suddenly, something might be wrong (for example, an app that can no longer reach the database).</p>
<p>Look for a stable or expected value for your workload. Spikes or sustained high values mean you should check performance.</p>
<h4 id="heading-service-latency-sql-statements-99th-percentile">Service Latency: SQL Statements, 99th percentile</h4>
<p>This metric shows the latency that 99% of SQL statements stay under, measured from when the database receives the request until it finishes executing it. Only the slowest ~1% of statements take longer. Think of waiting in a queue: the 99th percentile is the wait that only the unluckiest 1 in 100 people exceed.</p>
<p>You’ll want to know this because if the slowest queries are taking too long, it might signal a bottleneck (CPU, disk, network, and so on). Low latency = good user experience.</p>
<p>So keep an eye out: if this value rises (gets worse) over time, investigate what’s slowing down. If it stays low and stable, you’re in good shape.</p>
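<p>To see what a 99th percentile actually measures, here’s a small Python sketch using the nearest-rank method. The latency numbers are made up for illustration, and this isn’t necessarily how CockroachDB computes the value internally.</p>
<pre><code class="lang-python">import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies (in ms) for 100 statements: 98 fast ones, 2 slow ones.
latencies_ms = [5] * 98 + [40, 250]

print(percentile(latencies_ms, 50))  # 5
print(percentile(latencies_ms, 99))  # 40
</code></pre>
<p>The median looks great at 5 ms, but the 99th percentile shows that some statements took 40 ms, and one took even longer. That’s why tail latency is worth watching alongside the average.</p>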
<h4 id="heading-sql-statement-contention">SQL Statement Contention</h4>
<p>Statement contention shows the number of SQL queries that got “stuck” or had to wait because other queries were using the same data or resources. It’s like two people trying to grab the same book: one has to wait. That waiting is contention.</p>
<p>High contention means your database is spending time resolving conflicts and waiting for locks or resources. This slows things down overall, so you’ll want to keep this number as low as possible. If it starts rising, you might need to revisit your schema or queries, or scale differently.</p>
<h4 id="heading-replicas-per-node">Replicas per Node</h4>
<p>This tells you how many copies (“replicas”) of data ranges live on each database node. If you imagine your data is like documents saved in several safes (nodes), this shows how many copies are in each safe.</p>
<p>This matters, because you want balanced replicas so no node is overloaded with too many copies (which can slow it down or put it at risk).</p>
<p>To check on this, make sure nodes have roughly equal replica counts. If one node has many more replicas, you might need to rebalance or add nodes.</p>
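<p>One rough way to put a number on “roughly equal” is to compare the spread of replica counts to their average. The Python sketch below is a made-up heuristic for illustration, not a CockroachDB metric, and the node names and counts are invented.</p>
<pre><code class="lang-python">def replica_imbalance(replicas_per_node):
    """Return (max - min) / mean of the per-node replica counts."""
    counts = list(replicas_per_node.values())
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    return (max(counts) - min(counts)) / mean


# Invented counts for a 3-node cluster.
balanced = {"node-1": 50, "node-2": 52, "node-3": 48}
skewed = {"node-1": 90, "node-2": 30, "node-3": 30}

print(round(replica_imbalance(balanced), 2))  # 0.08
print(round(replica_imbalance(skewed), 2))    # 1.2
</code></pre>
<p>A value close to 0 means the nodes hold about the same number of replicas. The larger it gets, the more one node is carrying compared to the others.</p>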
<h4 id="heading-capacity">Capacity</h4>
<p>Capacity shows how much disk/storage your cluster has (total), how much is used, and how much is free. Imagine a warehouse: it’s like how many boxes you can store, how many you’ve filled, and how much empty space remains.</p>
<p>You’ll need to know this because if capacity is nearly full, you risk running out of space, which can cause downtime or performance issues.</p>
<p>Keep usage at a healthy level (for example, below ~80% of total capacity). If it crosses that, plan to add storage or nodes.</p>
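<p>The capacity check can be expressed as a tiny Python function. This is a sketch under stated assumptions: the name <code>capacity_report</code> and the sample numbers are made up, and the 80% threshold is the rule of thumb mentioned above, not a hard limit.</p>
<pre><code class="lang-python">def capacity_report(used_gib, total_gib, warn_at=0.80):
    """Summarize disk usage and flag it once it reaches the threshold."""
    used_fraction = used_gib / total_gib
    return {
        "used_pct": round(used_fraction * 100, 1),
        "free_gib": total_gib - used_gib,
        "needs_attention": used_fraction >= warn_at,
    }


# Example: 3 nodes x 5Gi each = 15 GiB in total, with 13 GiB already used.
print(capacity_report(13, 15))
# {'used_pct': 86.7, 'free_gib': 2, 'needs_attention': True}
</code></pre>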
<h4 id="heading-why-these-matter-together">Why These Matter Together</h4>
<p>When you combine these metrics, you get a clear picture:</p>
<ul>
<li><p>High Queries Per Second + high latency = maybe you're under-powered.</p>
</li>
<li><p>High contention = your workload design might be fighting itself.</p>
</li>
<li><p>Imbalanced replicas or full capacity = infrastructure issues.</p>
</li>
<li><p>Stable low latency + balanced replicas + plenty of capacity = sounds like a healthy cluster.</p>
</li>
</ul>
<p>So by keeping an eye on these, you make data-driven decisions: when to scale, when to optimize, when to tweak configs.</p>
<h3 id="heading-step-6-creating-a-little-load-on-the-cockroachdb-cluster">Step 6: Creating a Little Load on the CockroachDB Cluster</h3>
<p>So far, we’ve explored the CockroachDB dashboard and understood what each metric means. Now, let’s make things a bit more fun. 🎉</p>
<p>In this part, we’ll run a simple Python app that connects to our CockroachDB cluster and performs a few database operations (creating, updating, deleting, and retrieving some records). This will help us generate a small load on the database so we can actually see the metrics in action.</p>
<p>Here’s what we’ll be doing step-by-step 👇🏾</p>
<h4 id="heading-step-61-create-a-configmap-for-our-books-data">Step 6.1: Create a ConfigMap for Our Books Data</h4>
<p>We’ll first create a list of 20 books that our Python script will interact with. Each book will have basic info like name, author, genre, pages, and price.</p>
<ol>
<li><p>Create a new file called <code>books.json</code></p>
<ul>
<li><p>On Linux:</p>
<pre><code class="lang-bash">  nano books.json
</code></pre>
<p>  Paste the JSON content below into it.</p>
<pre><code class="lang-json">  [
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Bright Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Ava Hart"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783218196000"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2020</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">234</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">10.99</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Library"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Liam Stone"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783863794026"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1993</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">358</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">30.2</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Shadow Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Maya Chen"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781615594078"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2001</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">404</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"History"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">16.21</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Bright Voyage"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Noah Rivers"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785931034133"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1987</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">507</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">13.14</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Shadow Garden"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Zara Malik"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785534192834"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2004</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">404</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Sci-Fi"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">28.13</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Ethan Brooks"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9785030564135"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2009</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">508</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">20.79</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Atomic Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Iris Park"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9787242388493"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2025</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">442</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">18.5</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The First Library"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Caleb Nguyen"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9787101226911"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2017</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">528</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Romance"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">24.47</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal River"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Sofia Diaz"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781845146276"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2004</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">599</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">31.15</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Jude Bennett"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784893252883"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1996</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">632</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">40.47</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Last Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Nina Volkova"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784303911713"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2018</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">451</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"History"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">29.53</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Garden"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Omar Haddad"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784896383461"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1988</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">251</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Thriller"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">36.38</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Silent Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Priya Kapoor"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781509839308"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2008</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">649</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">28.05</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Felix Romero"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781834738291"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2025</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">180</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">19.15</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Lost Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Tara Quinn"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781165667017"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2010</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">368</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">41.37</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Last Signal"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Hana Sato"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9783387262476"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2005</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">467</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">42.01</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Crystal Archive"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Leo Fischer"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9780801326776"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1984</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">573</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">42.31</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Mila Novak"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9784746872343"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">2005</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">180</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Nonfiction"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">16.58</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Hidden Compass"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Arthur Wells"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9780097882086"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1983</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">713</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Fantasy"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">39.42</span>
    },
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"The Silent Atlas"</span>,
      <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Selene Ortiz"</span>,
      <span class="hljs-attr">"isbn"</span>: <span class="hljs-string">"9781939909169"</span>,
      <span class="hljs-attr">"published_year"</span>: <span class="hljs-number">1991</span>,
      <span class="hljs-attr">"pages"</span>: <span class="hljs-number">190</span>,
      <span class="hljs-attr">"genre"</span>: <span class="hljs-string">"Self-Help"</span>,
      <span class="hljs-attr">"price"</span>: <span class="hljs-number">33.79</span>
    }
  ]
</code></pre>
<p>  To save and close the file in nano:</p>
<ul>
<li><p>Press <code>CTRL + O</code> → then <code>ENTER</code> (to save)</p>
</li>
<li><p>Press <code>CTRL + X</code> (to exit the editor)</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Then create a ConfigMap from the file:</p>
<pre><code class="lang-bash"> kubectl create configmap books-json --from-file=books.json
</code></pre>
</li>
</ol>
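<p>Before relying on this file, it can help to sanity-check it. The sketch below is a hypothetical helper, not part of the tutorial’s script: <code>find_problems</code> and <code>check_file</code> are made-up names. It flags records that lack the <code>name</code>, <code>author</code>, or <code>isbn</code> fields and ISBNs that appear more than once, since the table created in the next step declares <code>isbn</code> as unique.</p>

```python
import json

# Fields the tutorial's script reads with b["..."]; a missing one would raise KeyError.
REQUIRED_KEYS = ("name", "author", "isbn")


def find_problems(books):
    """Return a list of human-readable problems found in the book records."""
    problems = []
    seen_isbns = set()
    for index, book in enumerate(books):
        for key in REQUIRED_KEYS:
            if not book.get(key):
                problems.append(f"record {index}: missing {key}")
        isbn = book.get("isbn")
        if isbn in seen_isbns:
            problems.append(f"record {index}: duplicate isbn {isbn}")
        elif isbn:
            seen_isbns.add(isbn)
    return problems


def check_file(path="books.json"):
    """Load a JSON file of book records and return its problems."""
    with open(path, "r") as f:
        return find_problems(json.load(f))


# Two records copied from the sample data above.
sample = [
    {"name": "The Hidden Atlas", "author": "Mila Novak", "isbn": "9784746872343"},
    {"name": "The Silent Atlas", "author": "Selene Ortiz", "isbn": "9781939909169"},
]
print(find_problems(sample))  # prints []
```

<p>Running <code>check_file("books.json")</code> in the folder where you saved the file should return an empty list if every record is usable.</p>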
<h4 id="heading-step-62-create-the-python-script-configmap">Step 6.2: Create the Python Script ConfigMap</h4>
<p>Next, we’ll create a simple Python script that:</p>
<ul>
<li><p>Creates a new table for books</p>
</li>
<li><p>Inserts 20 records</p>
</li>
<li><p>Updates 7 of them</p>
</li>
<li><p>Deletes 5</p>
</li>
<li><p>Retrieves 15 books from the database</p>
</li>
</ul>
<p>It’s like simulating a small library app. 📚</p>
<p>Create a new file called <code>books-script.yml</code> and paste the content below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ConfigMap</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-script</span>
<span class="hljs-attr">data:</span>
  <span class="hljs-attr">run.py:</span> <span class="hljs-string">|
    #!/usr/bin/env python3
    import argparse
    import json
    import os
    import sys
    import time
    from typing import List, Dict
</span>
    <span class="hljs-string">import</span> <span class="hljs-string">psycopg</span>
    <span class="hljs-string">from</span> <span class="hljs-string">psycopg.rows</span> <span class="hljs-string">import</span> <span class="hljs-string">dict_row</span>

    <span class="hljs-string">DDL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    CREATE TABLE IF NOT EXISTS books (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        name STRING NOT NULL,
        author STRING NOT NULL,
        isbn STRING UNIQUE,
        published_year INT4,
        pages INT4,
        genre STRING,
        price DECIMAL(10,2),
        created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">INSERT_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    INSERT INTO books (name, author, isbn, published_year, pages, genre, price)
    VALUES (%s, %s, %s, %s, %s, %s, %s);
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">UPDATE_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    UPDATE books
    SET price = %s, pages = %s
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">DELETE_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    DELETE FROM books
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">GET_SQL</span> <span class="hljs-string">=</span> <span class="hljs-string">""</span><span class="hljs-string">"
    SELECT id, name, author, isbn, published_year, pages, genre, price, created_at
    FROM books
    WHERE isbn = %s;
    "</span><span class="hljs-string">""</span>

    <span class="hljs-string">def</span> <span class="hljs-string">load_books(path:</span> <span class="hljs-string">str)</span> <span class="hljs-string">-&gt;</span> <span class="hljs-string">List[Dict]:</span>
        <span class="hljs-string">with</span> <span class="hljs-string">open(path,</span> <span class="hljs-string">"r"</span><span class="hljs-string">)</span> <span class="hljs-attr">as f:</span>
            <span class="hljs-string">return</span> <span class="hljs-string">json.load(f)</span>

    <span class="hljs-string">def</span> <span class="hljs-string">connect_with_retry(dsn:</span> <span class="hljs-string">str,</span> <span class="hljs-attr">attempts:</span> <span class="hljs-string">int</span> <span class="hljs-string">=</span> <span class="hljs-number">30</span><span class="hljs-string">,</span> <span class="hljs-attr">delay:</span> <span class="hljs-string">float</span> <span class="hljs-string">=</span> <span class="hljs-number">2.0</span><span class="hljs-string">):</span>
        <span class="hljs-string">last_exc</span> <span class="hljs-string">=</span> <span class="hljs-string">None</span>
        <span class="hljs-string">for</span> <span class="hljs-string">_</span> <span class="hljs-string">in</span> <span class="hljs-string">range(attempts):</span>
            <span class="hljs-attr">try:</span>
                <span class="hljs-string">conn</span> <span class="hljs-string">=</span> <span class="hljs-string">psycopg.connect(dsn,</span> <span class="hljs-string">autocommit=False)</span>
                <span class="hljs-string">return</span> <span class="hljs-string">conn</span>
            <span class="hljs-attr">except Exception as e:</span>
                <span class="hljs-string">last_exc</span> <span class="hljs-string">=</span> <span class="hljs-string">e</span>
                <span class="hljs-string">time.sleep(delay)</span>
        <span class="hljs-string">raise</span> <span class="hljs-string">last_exc</span>

    <span class="hljs-string">def</span> <span class="hljs-string">main():</span>
        <span class="hljs-string">ap</span> <span class="hljs-string">=</span> <span class="hljs-string">argparse.ArgumentParser()</span>
        <span class="hljs-string">ap.add_argument("--dsn",</span> <span class="hljs-string">required=True,</span> <span class="hljs-string">help="Postgres/CockroachDB</span> <span class="hljs-string">DSN")</span>
        <span class="hljs-string">ap.add_argument("--json",</span> <span class="hljs-string">default="/app/books.json",</span> <span class="hljs-string">help="Path</span> <span class="hljs-string">to</span> <span class="hljs-string">books</span> <span class="hljs-string">JSON")</span>
        <span class="hljs-string">args</span> <span class="hljs-string">=</span> <span class="hljs-string">ap.parse_args()</span>

        <span class="hljs-string">books</span> <span class="hljs-string">=</span> <span class="hljs-string">load_books(args.json)</span>
        <span class="hljs-string">print(f"Loaded</span> {<span class="hljs-string">len(books)</span>} <span class="hljs-string">books")</span>

        <span class="hljs-string">conn</span> <span class="hljs-string">=</span> <span class="hljs-string">connect_with_retry(args.dsn)</span>
        <span class="hljs-string">conn.row_factory</span> <span class="hljs-string">=</span> <span class="hljs-string">dict_row</span>
        <span class="hljs-attr">try:</span>
            <span class="hljs-attr">with conn:</span>
                <span class="hljs-string">with</span> <span class="hljs-string">conn.cursor()</span> <span class="hljs-attr">as cur:</span>
                    <span class="hljs-string">print("Creating</span> <span class="hljs-string">table...")</span>
                    <span class="hljs-string">cur.execute(DDL)</span>

                    <span class="hljs-string">print("Inserting</span> <span class="hljs-number">20</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:20]:</span>
                        <span class="hljs-string">cur.execute(INSERT_SQL,</span> <span class="hljs-string">(</span>
                            <span class="hljs-string">b["name"],</span> <span class="hljs-string">b["author"],</span> <span class="hljs-string">b["isbn"],</span>
                            <span class="hljs-string">b.get("published_year"),</span> <span class="hljs-string">b.get("pages"),</span>
                            <span class="hljs-string">b.get("genre"),</span> <span class="hljs-string">b.get("price"),</span>
                        <span class="hljs-string">))</span>

                    <span class="hljs-string">print("Updating</span> <span class="hljs-number">7</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:7]:</span>
                        <span class="hljs-string">new_price</span> <span class="hljs-string">=</span> <span class="hljs-string">round(float(b.get("price",</span> <span class="hljs-number">10</span><span class="hljs-string">))</span> <span class="hljs-string">+</span> <span class="hljs-number">1.23</span><span class="hljs-string">,</span> <span class="hljs-number">2</span><span class="hljs-string">)</span>
                        <span class="hljs-string">new_pages</span> <span class="hljs-string">=</span> <span class="hljs-string">int(b.get("pages",</span> <span class="hljs-number">100</span><span class="hljs-string">))</span> <span class="hljs-string">+</span> <span class="hljs-number">5</span>
                        <span class="hljs-string">cur.execute(UPDATE_SQL,</span> <span class="hljs-string">(new_price,</span> <span class="hljs-string">new_pages,</span> <span class="hljs-string">b["isbn"]))</span>

                    <span class="hljs-string">print("Deleting</span> <span class="hljs-number">5</span> <span class="hljs-string">books...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[-5:]:</span>
                        <span class="hljs-string">cur.execute(DELETE_SQL,</span> <span class="hljs-string">(b["isbn"],))</span>

                    <span class="hljs-string">print("Performing</span> <span class="hljs-number">15</span> <span class="hljs-string">retrievals...")</span>
                    <span class="hljs-string">for</span> <span class="hljs-string">b</span> <span class="hljs-string">in</span> <span class="hljs-string">books[:15]:</span>
                        <span class="hljs-string">cur.execute(GET_SQL,</span> <span class="hljs-string">(b["isbn"],))</span>
                        <span class="hljs-string">row</span> <span class="hljs-string">=</span> <span class="hljs-string">cur.fetchone()</span>
                        <span class="hljs-attr">if row:</span>
                            <span class="hljs-string">print(f"GET</span> {<span class="hljs-string">b</span>[<span class="hljs-string">'isbn'</span>]}<span class="hljs-string">:</span> {<span class="hljs-string">row</span>[<span class="hljs-string">'name'</span>]} <span class="hljs-string">by</span> {<span class="hljs-string">row</span>[<span class="hljs-string">'author'</span>]} <span class="hljs-string">(${row['price']})")</span>
                        <span class="hljs-attr">else:</span>
                            <span class="hljs-string">print(f"GET</span> {<span class="hljs-string">b</span>[<span class="hljs-string">'isbn'</span>]}<span class="hljs-string">:</span> <span class="hljs-string">not</span> <span class="hljs-string">found</span> <span class="hljs-string">(possibly</span> <span class="hljs-string">deleted)")</span>

            <span class="hljs-string">print("All</span> <span class="hljs-string">operations</span> <span class="hljs-string">completed.")</span>
        <span class="hljs-attr">finally:</span>
            <span class="hljs-string">conn.close()</span>

    <span class="hljs-string">if</span> <span class="hljs-string">__name__</span> <span class="hljs-string">==</span> <span class="hljs-attr">"__main__":</span>
        <span class="hljs-string">main()</span>
</code></pre>
<p>This script connects to the CockroachDB cluster, creates a table (if it doesn’t exist), and performs all those operations in sequence.</p>
<p>It runs around 50 SQL queries in total – a mix of <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>, and <code>SELECT</code> statements.</p>
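<p>To see where that number comes from, here is a small sketch that replays only the list slicing from <code>run.py</code>, without touching a database. It assumes <code>books.json</code> holds exactly 20 records, and the ISBN values are placeholders rather than the real ones.</p>

```python
# Stand-in data: 20 placeholder ISBNs (hypothetical, not the tutorial's real ones).
books = [{"isbn": f"isbn-{n:02d}"} for n in range(20)]

# The same slices that run.py uses for each phase.
inserted = [b["isbn"] for b in books[:20]]
updated = [b["isbn"] for b in books[:7]]
deleted = [b["isbn"] for b in books[-5:]]
retrieved = [b["isbn"] for b in books[:15]]

# Rows still in the table once the deletes have run.
remaining = [isbn for isbn in inserted if isbn not in deleted]

# One CREATE TABLE plus one statement per loop iteration.
statements = 1 + len(inserted) + len(updated) + len(deleted) + len(retrieved)

print(f"statements issued: {statements}")
print(f"rows left in table: {len(remaining)}")
print(f"lookups that miss: {len(set(retrieved) - set(remaining))}")
```

<p>Under that assumption the script issues 48 statements and leaves 15 rows behind, and none of the 15 lookups hits a deleted book, because the deletes target the last five records while the lookups read the first fifteen.</p>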
<p>Now apply it:</p>
<pre><code class="lang-bash">kubectl apply -f books-script.yml
</code></pre>
<h4 id="heading-step-63-create-the-job-to-run-the-script">Step 6.3: Create the Job to Run the Script</h4>
<p>Next, let’s create a Kubernetes Job that will actually run our Python script inside a container.</p>
<p>Create a file called <code>books-job.yml</code> and paste the manifest below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Job</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-job</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">runner</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">python:3.12-slim</span>
          <span class="hljs-attr">env:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">CRDB_DSN</span>
              <span class="hljs-attr">value:</span> <span class="hljs-string">"postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable"</span>
          <span class="hljs-attr">command:</span> [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-lc"</span>]
          <span class="hljs-attr">args:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">|
              pip install --no-cache-dir "psycopg[binary]&gt;=3.1,&lt;3.3" &amp;&amp; \
              python /app/run.py --dsn "$CRDB_DSN" --json /app/books.json
</span>          <span class="hljs-attr">volumeMounts:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
              <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/app/run.py</span>
              <span class="hljs-attr">subPath:</span> <span class="hljs-string">run.py</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
              <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/app/books.json</span>
              <span class="hljs-attr">subPath:</span> <span class="hljs-string">books.json</span>
      <span class="hljs-attr">volumes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">script</span>
          <span class="hljs-attr">configMap:</span>
            <span class="hljs-attr">name:</span> <span class="hljs-string">books-script</span>
            <span class="hljs-attr">defaultMode:</span> <span class="hljs-number">0555</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
          <span class="hljs-attr">configMap:</span>
            <span class="hljs-attr">name:</span> <span class="hljs-string">books-json</span>
</code></pre>
<p>Here’s what’s happening:</p>
<ul>
<li><p>The Job runs a container based on Python 3.12-slim.</p>
</li>
<li><p>It connects to CockroachDB using the connection string <code>postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable</code>. Notice the <code>sslmode=disable</code> parameter: we need it because we disabled TLS in our Helm values earlier.</p>
</li>
<li><p>The Job mounts the two ConfigMaps we created earlier (<code>books-json</code> and <code>books-script</code>) as <strong>volumes</strong> inside the container. Think of volumes like small external drives that the container can read from.</p>
</li>
</ul>
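<p>If the connection string looks dense, the following sketch breaks it into its parts using only Python’s standard library. It is just an illustration of how the DSN is structured, not something the Job itself runs.</p>

```python
from urllib.parse import urlsplit, parse_qs

# The DSN passed to the Job through the CRDB_DSN environment variable.
dsn = "postgresql://root@crdb-cockroachdb-public:26257/defaultdb?sslmode=disable"

parts = urlsplit(dsn)
options = parse_qs(parts.query)

print("user:    ", parts.username)          # SQL user
print("host:    ", parts.hostname)          # Kubernetes Service name
print("port:    ", parts.port)              # CockroachDB SQL port
print("database:", parts.path.lstrip("/"))  # target database
print("sslmode: ", options["sslmode"][0])   # TLS setting
```

<p>The host is the name of the Kubernetes Service, which is why this DSN only resolves from inside the cluster.</p>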
<p>Apply it:</p>
<pre><code class="lang-bash">kubectl apply -f books-job.yml
</code></pre>
<h4 id="heading-step-64-check-if-the-job-ran-successfully">Step 6.4: Check if the Job Ran Successfully</h4>
<p>After a minute or two, check your pods:</p>
<pre><code class="lang-bash">kubectl get po
</code></pre>
<p>If you see <code>books-job-xxx</code> with the status <strong>Completed</strong>, then your script ran successfully 🎉</p>
<p>That means our database just got a nice little workout – some records were created, updated, deleted, and read.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460118429/99ed49a3-52e9-4357-ba2b-9295f0dfbdc8.png" alt="The Completed state of the Books Job" class="image--center mx-auto" width="530" height="124" loading="lazy"></p>
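<p>The same check can be scripted. Here is a minimal sketch, assuming <code>kubectl</code> is on your PATH and pointed at the Minikube cluster. The function names <code>fetch_job</code> and <code>job_is_complete</code> are illustrative. It reads the Job object as JSON and looks for a <code>Complete</code> condition whose status is <code>"True"</code>.</p>

```python
import json
import subprocess


def job_is_complete(job):
    """Return True if a parsed Job object has a Complete condition set to "True"."""
    conditions = job.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Complete" and c.get("status") == "True"
        for c in conditions
    )


def fetch_job(name):
    """Fetch a Job as a dict by shelling out to kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "job", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

<p>With the cluster running, <code>job_is_complete(fetch_job("books-job"))</code> should return <code>True</code> once the pod shows <strong>Completed</strong>.</p>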
<h3 id="heading-step-7-viewing-the-metrics-from-the-load">Step 7: Viewing the Metrics from the Load</h3>
<p>Now that we’ve generated a small load, let’s jump back to the CockroachDB dashboard.</p>
<p>Head to the Metrics section, and under SQL Queries Per Second, you should see a little spike: this shows the activity from our Python job.👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460175366/6c1e129e-c8bd-4f41-89de-60a1a753026e.png" alt="The SQL Queries Per Second Metric" class="image--center mx-auto" width="972" height="526" loading="lazy"></p>
<p>Hover your mouse over the graph lines to see exact numbers.</p>
<p>Do the same for Service Latency: SQL Statements (99th percentile). You’ll notice a few bumps showing how long some of the queries took.👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761460224971/8ba9d5ed-0724-4dc6-82f4-7e5d0d05be82.png" alt="The Service Latency Metric" class="image--center mx-auto" width="973" height="410" loading="lazy"></p>
<p>This small experiment gives you a real feel for how CockroachDB responds to load, even a tiny one.</p>
<p>To explore more metrics and dashboards, check out the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/ui-overview-dashboard">official CockroachDB documentation here</a>.</p>
<h3 id="heading-step-8-view-the-list-of-created-items-in-the-database">Step 8: View the List of Created Items in the Database</h3>
<p>Now that our Python job ran and touched the database (creating, updating, deleting, retrieving records), let’s check the content of our <code>books</code> table just to verify everything really happened.</p>
<p>To do that, we’ll create another Kubernetes Job that connects to our CockroachDB cluster and runs a simple SQL query: <code>SELECT * FROM books;</code>. This returns all the remaining records in the table.</p>
<p>Here’s the manifest to use. Create a file named <code>view-books.yml</code> and paste the content below into it:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Job</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">view-books</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">client</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.2</span>
          <span class="hljs-attr">command:</span> [<span class="hljs-string">"bash"</span>, <span class="hljs-string">"-lc"</span>]
          <span class="hljs-attr">args:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">|
              cockroach sql \
                --insecure \
                --host=crdb-cockroachdb-public:26257 \
                --database=defaultdb \
                --format=records \
                --execute="SELECT * FROM public.books;"</span>
</code></pre>
<p>Note: We use the <code>--insecure</code> flag because we turned off TLS in our Helm values earlier. This Job doesn’t mount anything. It just spins up, connects to the database, runs the <code>SELECT</code>, and prints the result.</p>
<p>Run the job:</p>
<pre><code class="lang-bash">kubectl apply -f view-books.yml
</code></pre>
<p>Wait a minute, then check the pod status:</p>
<pre><code class="lang-bash">kubectl get po
</code></pre>
<p>Look for something like <code>view-books-xxx</code> in the <strong>Completed</strong> state.</p>
<p>Finally, view the job logs to see the actual records:</p>
<pre><code class="lang-bash">kubectl logs job/view-books
</code></pre>
<p>You’ll see output similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761462270132/c881eca7-18b0-4647-a6b1-2841e7774969.png" alt="The list of created books in the books table in the CockroachDB database" class="image--center mx-auto" width="631" height="837" loading="lazy"></p>
<h2 id="heading-backing-up-cockroachdb-to-google-cloud-storage">Backing Up CockroachDB to Google Cloud Storage ☁️</h2>
<p>In this section we’ll explain how you can automate backups of your CockroachDB cluster using simple SQL commands, service accounts (for authenticating to Google Cloud), and Google Cloud Storage (where the data will be stored).</p>
<h3 id="heading-why-backups-are-absolutely-critical">Why Backups Are Absolutely Critical</h3>
<p>Imagine you’ve built your cluster on Kubernetes, and everything’s humming along for weeks or months. You’ve got tens or hundreds of gigabytes of data and 10k+ users relying on it.</p>
<p>Then <strong>BAM!</strong> Something happens. Maybe someone accidentally overwrote the Helm release (<code>helm upgrade --install …</code> with the same release name, for example <code>crdb</code>), or a cloud disk got deleted, or a critical node failed and you lost the majority of your data replicas. That’s the nightmare we all dread 😭.</p>
<p>Mistakes happen, even if you’re super careful. What matters most is: How fast and easily could you recover?</p>
<p>That’s why we’ll set up <strong>daily backups</strong> of our CockroachDB cluster, targeting a Google Cloud Storage bucket. (Quick note: Google Cloud Storage is an object storage service where you can keep large amounts of data in the cloud as “objects”. You can upload, store, and retrieve data from it, much like Google Drive or iCloud. 😃)</p>
<p>With your backups going into a storage bucket, if disaster strikes, you can restore the entire cluster (or specific databases/tables) in minutes or hours – instead of days or losing data forever.</p>
<h3 id="heading-connecting-to-our-db-installing-beekeeper-studio">Connecting to Our DB – Installing Beekeeper Studio</h3>
<p>So far, we’ve been connecting to our database programmatically, running commands from pods or jobs inside Kubernetes. But what if there was a <em>more visual</em> and <em>user-friendly</em> way to explore our data?</p>
<p>Well, meet my friend <strong>Beekeeper Studio.</strong> 🙂</p>
<p>Beekeeper Studio is a sleek, open-source database management tool that lets you connect to a wide range of databases like PostgreSQL, MySQL, SQLite, and (most importantly for us) CockroachDB.</p>
<p>It comes with a simple, modern interface for running queries, browsing tables, and viewing data – no need to jump into pods or remember command-line flags 😄</p>
<h3 id="heading-how-to-install-beekeeper-studio">How to Install Beekeeper Studio</h3>
<ol>
<li><p>Visit the official Beekeeper Studio download page here: <a target="_blank" href="https://www.beekeeperstudio.io/get">https://www.beekeeperstudio.io/get</a></p>
</li>
<li><p>Click the “Skip to the download” link. You’ll see something like this:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761542821015/2e7a0fd5-7047-4090-97fb-46b81a3dd638.png" alt="Finding the button to skip to the download page on the Beekeeper Studio website" class="image--center mx-auto" width="874" height="547" loading="lazy"></p>
</li>
<li><p>You’ll be redirected to a page listing download options for different operating systems.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761542877590/6034dcf0-d9b0-447b-bd2b-089458729db7.png" alt="Page to select download option according to the user OS" class="image--center mx-auto" width="924" height="727" loading="lazy"></p>
</li>
<li><p>Choose your OS and download the correct installer.</p>
</li>
<li><p>Then run the installer and follow the prompts for your OS.</p>
</li>
</ol>
<h3 id="heading-connecting-beekeeper-studio-to-cockroachdb">Connecting Beekeeper Studio to CockroachDB</h3>
<p>Now that we’ve installed Beekeeper Studio, it’s time to connect it to our CockroachDB cluster running inside Minikube.</p>
<p>But before we jump in, here’s something important to note:👇🏾</p>
<p>Our CockroachDB cluster is running INSIDE Kubernetes, and by default, it’s not accessible from outside the cluster.</p>
<p>To confirm this, run:</p>
<pre><code class="lang-bash">kubectl get svc crdb-cockroachdb-public
</code></pre>
<p>You should see something like this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544640270/2cf9f8f1-15f1-459b-acd0-63b1c361fa54.png" alt="The CockroachDB service being of type ClusterIP" class="image--center mx-auto" width="709" height="63" loading="lazy"></p>
<p>Notice that the service <strong>TYPE</strong> is <code>ClusterIP</code>. That means the service can only be accessed by other pods INSIDE the Minikube cluster – not from your laptop or external apps.</p>
<h3 id="heading-exposing-the-cluster-for-local-access">Exposing the Cluster for Local Access</h3>
<p>To make our database accessible from your local machine (so Beekeeper Studio can reach it), we’ll use <strong>Kubernetes Port Forwarding</strong>.</p>
<p>In a new terminal tab, run:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26257
</code></pre>
<p>This command tells Kubernetes to forward your local port 26257 to the CockroachDB service’s port 26257 inside the cluster.</p>
<p>Once it’s running, your CockroachDB instance will be accessible at <code>localhost:26257</code>.<br>(Note: it’s not accessible via your browser because this isn’t an HTTP endpoint 😅)</p>
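<p>To confirm the forward is actually listening before opening Beekeeper Studio, you could probe the port with a plain TCP connection. This is a minimal sketch using Python’s standard library. <code>port_is_open</code> is a hypothetical helper name, and it only tests that something accepts connections on the port, not that it speaks the SQL protocol.</p>

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False
```

<p>While <code>kubectl port-forward</code> is running, <code>port_is_open("localhost", 26257)</code> should return <code>True</code>. If you stop the port-forward, it should return <code>False</code>.</p>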
<h3 id="heading-connecting-via-beekeeper-studio">🐝 Connecting via Beekeeper Studio</h3>
<ol>
<li><p>Open Beekeeper Studio.</p>
</li>
<li><p>Click on the dropdown that says “Select a connection type…”.</p>
</li>
<li><p>Choose CockroachDB from the list.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544886889/98443b46-574d-4bcc-a41c-d2daa7412201.png" alt="Selecting CockroachDB as a connection type in Beekeeper Studio" class="image--center mx-auto" width="694" height="496" loading="lazy"></p>
</li>
<li><p>In the connection window that pops up:</p>
<ul>
<li><p>Disable the <code>Enable SSL</code> option.</p>
</li>
<li><p>Set User to <code>root</code></p>
</li>
<li><p>Set Default Database to <code>defaultdb</code></p>
</li>
<li><p>Set Host to <code>localhost</code></p>
</li>
<li><p>Set Port to <code>26257</code></p>
</li>
</ul>
</li>
<li><p>Now click <strong>Test</strong> (bottom right corner). You should see a success message like <em>Connection looks good</em>.</p>
</li>
</ol>
<p>Your setup should look like this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761544818021/0248173e-9969-433c-a9d4-e83684bf34cf.png" alt="Connecting to the CockroachDB cluster from the Beekeeper Studio software" class="image--center mx-auto" width="808" height="709" loading="lazy"></p>
<p>Finally, click Connect (right beside the Test button).</p>
<h3 id="heading-verify-the-connection">Verify the Connection</h3>
<p>Once connected, you’ll land on a clean workspace where you can run SQL queries.</p>
<p>To confirm you’re connected to the right cluster, run:</p>
<pre><code class="lang-sql">SELECT * FROM books;
</code></pre>
<p>You should see a table containing about 15 books (the ones still left after our script’s inserts and deletes):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761545094817/99ef4415-bd0d-4452-817f-380996485397.png" alt="List of books in the CockroachDB database" class="image--center mx-auto" width="851" height="749" loading="lazy"></p>
<p>And there you go. You’ve now connected Beekeeper Studio to your CockroachDB running inside Minikube! 🚀</p>
<h3 id="heading-creating-a-google-cloud-account">Creating a Google Cloud Account</h3>
<p>Before we can back up our CockroachDB data to Google Cloud Storage, we need to have a Google Cloud account ready.</p>
<h4 id="heading-step-1-visit-the-google-cloud-console">Step 1: Visit the Google Cloud Console</h4>
<p>Head over to 👉🏾 <a target="_blank" href="https://console.cloud.google.com">https://console.cloud.google.com</a></p>
<p>If you don’t have a Google account yet, don’t worry. The process is simple and self-explanatory once you visit the site :). You’ll be guided to create a Google account first, and then your Google Cloud account.</p>
<h4 id="heading-step-2-create-or-use-a-project">Step 2: Create or Use a Project</h4>
<p>Once you’re in the Google Cloud Console, you’ll either:</p>
<ul>
<li><p>Use the <strong>default project</strong> that was automatically created for you, <strong>or</strong></p>
</li>
<li><p>Create a new one by clicking on <strong>“New Project”</strong> and naming it <code>crdb-tutorial</code>.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761546797213/295c7b09-9bb8-4c34-85cf-8701242b2768.png" alt="Creating a new Project in our Google Cloud account" class="image--center mx-auto" width="566" height="527" loading="lazy"></p>
<p>Projects are like folders that contain all your Google Cloud resources: compute instances, storage buckets, databases, and more.</p>
<h4 id="heading-step-3-link-a-billing-account-optional-but-recommended">Step 3: Link a Billing Account (Optional but Recommended)</h4>
<p>If you already have a billing account, link it to your project.</p>
<p>If not, you can easily create one by <a target="_blank" href="https://docs.cloud.google.com/billing/docs/how-to/create-billing-account">following Google’s instructions here</a>. (You’ll need a valid Debit or Credit card.)</p>
<p>Don’t worry if your card doesn’t link right away. Sometimes Google’s billing system can be picky. 😅</p>
<p>Here’s a quick fix that usually works:</p>
<ol>
<li><p>Add your card to Google Pay first.</p>
</li>
<li><p>Then go to Google Subscriptions in your Google account, and link it to your Google Billing Account.</p>
</li>
</ol>
<p>To add your card via Google Subscriptions, <a target="_blank" href="https://myaccount.google.com/payments-and-subscriptions">visit here</a>. (You need to have a Google account first. Don’t worry, the site will direct you on what to do if you don’t.)</p>
<p>You’ll see a page like this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761546938934/9e983134-dd7e-49b1-85a7-cd12bd01bf67.png" alt="Adding a card to Google Subscriptions" class="image--center mx-auto" width="887" height="403" loading="lazy"></p>
<p>Click Manage payment methods, then add your card details.</p>
<p>Once you’ve done that, refresh your Google Billing Account page – you should now see your card as one of the available options.</p>
<h3 id="heading-creating-a-google-cloud-storage-bucket">Creating a Google Cloud Storage Bucket</h3>
<p>Now that we’ve set up our Google Cloud account and enabled billing, let’s create a Cloud Storage Bucket. This is simply a location (like an online folder) where our CockroachDB backup files will be stored.</p>
<p>In your Google Cloud console, type “storage” in the search bar at the top. From the dropdown results, click on “Cloud Storage”:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089121918/c737c3e1-e45f-48e1-aed9-99e273583425.png" alt="Navigating to the Cloud Storage page" class="image--center mx-auto" width="553" height="197" loading="lazy"></p>
<p>On the new page, click on the “Buckets” link in the side menu, then click the “Create Bucket” button.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089164660/8b9336fc-c0c3-4811-ab98-d3538596ee5a.png" alt="Creating a new Bucket in Cloud Storage" class="image--center mx-auto" width="749" height="650" loading="lazy"></p>
<p>Give your bucket a unique name, like <em>cockroachdb-backup-</em> followed by a few random characters. For example, <em>cockroachdb-backup-i8wu</em> or <em>cockroachdb-backup-7gw8u</em>. The random characters help ensure your bucket name is globally unique (no other Google Cloud user can have the same name).</p>
<p>Scroll to the bottom and click “Create” to create your bucket.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089287083/a376f695-81b8-4f5a-80a7-cd563c8b4c81.png" alt="Creating your Bucket in Google Cloud Storage" class="image--center mx-auto" width="552" height="889" loading="lazy"></p>
<p>You’ll see a pop-up asking you to <strong>confirm public access prevention</strong>. This means that only you (and people you explicitly give access to) can view or edit your bucket. Make sure the “Enforce public access prevention on this bucket” checkbox is checked, then click “Confirm.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089404876/38c8e6b5-0de0-4771-9bed-9334f8f8c43a.png" alt="Preventing random users from accessing your bucket" class="image--center mx-auto" width="571" height="391" loading="lazy"></p>
<p>Perfect! 🎉 You’ve now created a storage bucket where your CockroachDB backups will live.</p>
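<p>If you prefer the terminal, the same bucket can be created with the <code>gcloud</code> CLI. This is only a sketch of an alternative to the console steps above: it assumes <code>gcloud</code> is installed and authenticated against your project, and the <code>US</code> location is an arbitrary example value, not something this tutorial requires:</p>
<pre><code class="lang-bash"># Create the bucket with public access prevention enforced,
# mirroring the checkbox we confirmed in the console.
gcloud storage buckets create gs://&lt;BUCKET_NAME&gt; \
  --location=US \
  --public-access-prevention
</code></pre>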
<h3 id="heading-giving-cockroachdb-access-to-the-bucket">Giving CockroachDB Access to the Bucket</h3>
<p>Our next goal is to let the CockroachDB cluster upload and read files from this bucket. To do this, we’ll create something called a <strong>Service Account</strong> using <strong>Google IAM</strong>.</p>
<p><strong>What’s IAM?</strong><br>IAM stands for <em>Identity and Access Management.</em> It’s basically Google Cloud’s way of managing who can access what in your project.</p>
<p>With IAM, we can create a service account (like a “digital employee”) and give it permission to interact with our bucket instead of using our personal Google account.</p>
<h4 id="heading-creating-a-service-account">Creating a Service Account</h4>
<p>Type “service account” in the search bar and click on “Service Accounts” in the results.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089569066/2855b7fa-d896-4249-825d-4ec590499ca8.png" alt="Navigating the Service Accounts page" class="image--center mx-auto" width="527" height="279" loading="lazy"></p>
<p>Click “Create Service Account” at the top of the page. On the new page, type <em>cockroachdb-backup</em> as the service account name, then click “Create and Continue.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089677768/05c9f9ed-257f-44c6-89b5-3880c8af017d.png" alt="Creating a new Service Account for the CockroachDB cluster, to give it access to our Cloud Storage Bucket" class="image--center mx-auto" width="543" height="597" loading="lazy"></p>
<p>Now we’ll give this service account permission to work with our storage bucket. In the <em>Permissions</em> section, type “storage object creator” in the filter box and select it from the dropdown.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762089744927/64ed65df-88ee-43c9-8be4-892a41a24989.png" alt="Providing our Service Account with the necessary permissions to access the bucket" class="image--center mx-auto" width="476" height="590" loading="lazy"></p>
<p>Repeat the same for “storage object viewer” and “storage object user”.</p>
<p>At the end, you should see three roles assigned:</p>
<ul>
<li><p>Storage Object Creator</p>
</li>
<li><p>Storage Object Viewer</p>
</li>
<li><p>Storage Object User</p>
</li>
</ul>
<p>Click “Continue”, then “Done.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762092953125/0419abe8-a1ff-4f1c-b367-f9e203bdf6ff.png" alt="The necessary permissions to be assigned to the Service Account" class="image--center mx-auto" width="520" height="788" loading="lazy"></p>
<p>You’ve now created a service account that can create and read files in your bucket.</p>
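<p>For reference, here is a rough command-line equivalent of the console steps above. Treat it as a sketch: it assumes an authenticated <code>gcloud</code> CLI, and <code>&lt;PROJECT_ID&gt;</code> is a placeholder for your own project ID. The role IDs are the identifiers behind the three role names listed above:</p>
<pre><code class="lang-bash"># Create the service account.
gcloud iam service-accounts create cockroachdb-backup \
  --display-name="cockroachdb-backup"

# Grant it the three storage roles at the project level.
for role in objectCreator objectViewer objectUser; do
  gcloud projects add-iam-policy-binding &lt;PROJECT_ID&gt; \
    --member="serviceAccount:cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com" \
    --role="roles/storage.${role}"
done
</code></pre>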
<h4 id="heading-downloading-the-service-account-key">Downloading the Service Account Key</h4>
<p>To let our CockroachDB cluster use this service account, we’ll generate a <strong>key file</strong>.</p>
<p><strong>What’s a key file?</strong><br>It’s just a small <strong>JSON file</strong> containing secret information your app (CockroachDB) can use to authenticate securely with Google Cloud – like an ID card.</p>
<p><strong>But be careful ⚠️</strong> If this key gets into the wrong hands, anyone could use it to access your Google Cloud resources. <strong>Never share or upload this file</strong> to your GitHub, BitBucket, or GitLab repository, or any other online repositories.</p>
<p>In the Service Accounts page, find your <code>cockroachdb-backup</code> account, click the three dots (⋮) under the Action column, then select “Manage Keys.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090008411/11c4b373-87b0-416d-bf14-1a9ccd15c452.png" alt="Finding the newly created service account, and creating a key" class="image--center mx-auto" width="647" height="373" loading="lazy"></p>
<p>On the new page, click “Add Key” then “Create new key.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090059309/ebe17228-e2a8-4abe-b41b-7378013570d5.png" alt="Creating a new key for the new service account" class="image--center mx-auto" width="501" height="571" loading="lazy"></p>
<p>A dialog box will pop up. Choose JSON as the key type, then click “Create.”</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762090115728/5ed82664-f57a-4489-af08-be85c2ad42e9.png" alt="Selecting the Key Type as JSON" class="image--center mx-auto" width="610" height="381" loading="lazy"></p>
<p>Google will automatically download a file named something like <code>cockroachdb-backup-1234567890abcdef.json</code>.</p>
<p>We’ll use this key soon when we configure our CockroachDB backup job.</p>
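<p>The key can also be generated from the terminal instead of the console. A minimal sketch, assuming the <code>gcloud</code> CLI is authenticated and that <code>&lt;PROJECT_ID&gt;</code> is replaced with your project ID (the output file name <code>key.json</code> is just an example):</p>
<pre><code class="lang-bash"># Writes a new JSON key for the service account to ./key.json.
# Keep this file out of version control.
gcloud iam service-accounts keys create key.json \
  --iam-account="cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com"
</code></pre>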
<h3 id="heading-attaching-the-key-to-our-cockroachdb-cluster">Attaching the Key to Our CockroachDB Cluster</h3>
<p>Now that we’ve downloaded the service account key, we need to attach it to our CockroachDB cluster so that the DB can upload and read backups from our Google Cloud Storage bucket.</p>
<p><strong>Why this is needed:</strong><br>Our Minikube cluster (and even any managed Kubernetes cluster like GKE, EKS, or AKS) <strong>doesn’t have direct access</strong> to the files on your computer. So, we’ll upload the key file to Kubernetes as a Secret, and then mount it inside our CockroachDB pods as a volume.</p>
<h4 id="heading-step-1-create-a-kubernetes-secret">Step 1: Create a Kubernetes Secret</h4>
<p>Run the command below in your terminal👇🏾 Replace <code>&lt;PATH_TO_KEY&gt;</code> with the path to your downloaded key file:</p>
<pre><code class="lang-bash">kubectl create secret generic gcs-key --from-file=key.json=&lt;PATH_TO_KEY&gt;
</code></pre>
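<p>This command creates a <strong>Kubernetes Secret</strong> named <code>gcs-key</code> that stores your Google Cloud key inside the cluster. Keep in mind that Secrets are only base64-encoded by default, not encrypted, so access to them should still be restricted.</p>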
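<p>To double-check that the Secret was created with the expected file name, you can describe it. This lists the keys stored in the Secret and their sizes without printing the key material itself:</p>
<pre><code class="lang-bash"># The "Data" section of the output should list key.json and its size in bytes.
kubectl describe secret gcs-key
</code></pre>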
<h4 id="heading-step-2-mount-the-secret-to-the-cockroachdb-cluster">Step 2: Mount the Secret to the CockroachDB Cluster</h4>
<p>Now, let’s tell Kubernetes to use this secret inside our CockroachDB cluster.</p>
<p>Open your <code>cockroachdb-values.yml</code> file and scroll to the <code>statefulset:</code> section. Add the following lines under it:👇🏾</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-string">...</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>

  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>

  <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Here’s what this does:</p>
<ul>
<li><p>The <code>volumes</code> section tells Kubernetes to create a volume from the secret we just made.</p>
</li>
<li><p>The <code>volumeMounts</code> section attaches that volume inside the CockroachDB container.</p>
</li>
<li><p>The <code>GOOGLE_APPLICATION_CREDENTIALS</code> environment variable points CockroachDB to our key file so it knows where to find it when connecting to Google Cloud.</p>
</li>
</ul>
<p>Your final file should look like this:👇🏾</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">podSecurityContext:</span>
    <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
    <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"1Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">podAntiAffinity:</span>
    <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
  <span class="hljs-attr">nodeSelector:</span>
    <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
  <span class="hljs-attr">env:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
      <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>
  <span class="hljs-attr">volumeMounts:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
      <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
      <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Now, apply the update using Helm:👇🏾</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
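<p>The upgrade restarts the CockroachDB pods one at a time, so it can take a little while. One way to wait for it to finish is shown below. The StatefulSet name <code>crdb-cockroachdb</code> is an assumption inferred from the pod names used in this tutorial, and the ten-minute timeout is an arbitrary example:</p>
<pre><code class="lang-bash"># Blocks until every pod in the StatefulSet has been updated and is ready.
kubectl rollout status statefulset/crdb-cockroachdb --timeout=10m
</code></pre>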
<h4 id="heading-step-3-confirm-the-key-exists-in-the-cluster">Step 3: Confirm the Key Exists in the Cluster</h4>
<p>Once the upgrade is complete, run this command to confirm the key is now inside your CockroachDB pods:</p>
<pre><code class="lang-bash">kubectl <span class="hljs-built_in">exec</span> -it crdb-cockroachdb-1 -- cat /var/run/gcp/key.json
</code></pre>
<p>You should see something similar to this:👇🏾</p>
<pre><code class="lang-bash">prince@DESKTOP-QHVTAUD:~/programming/cockroachdb-tutorial$ kubectl <span class="hljs-built_in">exec</span> -it crdb-cockroachdb-1 -- cat /var/run/gcp/key.json
{
  <span class="hljs-string">"type"</span>: <span class="hljs-string">"service_account"</span>,
  <span class="hljs-string">"project_id"</span>: ***,
  <span class="hljs-string">"private_key_id"</span>: ***,
  <span class="hljs-string">"private_key"</span>: ***,
  <span class="hljs-string">"client_email"</span>: ***,
  <span class="hljs-string">"client_id"</span>: ***,
  <span class="hljs-string">"auth_uri"</span>: <span class="hljs-string">"https://accounts.google.com/o/oauth2/auth"</span>,
  <span class="hljs-string">"token_uri"</span>: <span class="hljs-string">"https://oauth2.googleapis.com/token"</span>,
  <span class="hljs-string">"auth_provider_x509_cert_url"</span>: <span class="hljs-string">"https://www.googleapis.com/oauth2/v1/certs"</span>,
  <span class="hljs-string">"client_x509_cert_url"</span>: ***,
  <span class="hljs-string">"universe_domain"</span>: <span class="hljs-string">"googleapis.com"</span>
}
</code></pre>
<p>Nice! That means our cluster now has access to the Google Cloud key.</p>
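<p>If you would rather not print the private key to your terminal, a gentler check is to confirm that the file exists and that the environment variable points at it. This sketch assumes the container image ships the usual <code>ls</code> and <code>printenv</code> utilities:</p>
<pre><code class="lang-bash"># Confirm the mounted key file is present (without showing its contents).
kubectl exec crdb-cockroachdb-1 -- ls -l /var/run/gcp

# Confirm the variable CockroachDB reads; this should print /var/run/gcp/key.json
kubectl exec crdb-cockroachdb-1 -- printenv GOOGLE_APPLICATION_CREDENTIALS
</code></pre>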
<h4 id="heading-step-4-creating-the-backup-schedule">Step 4: Creating the Backup Schedule</h4>
<p>CockroachDB makes backups super convenient. It can automatically back up your database <strong>on a schedule</strong> (without you needing to manually create Kubernetes CronJobs).</p>
<p>To create an automatic backup schedule, run this SQL command inside the CockroachDB SQL shell 👇🏾 (replace the <code>&lt;BUCKET_NAME&gt;</code> placeholder with the name of your Google Cloud Storage bucket):</p>
<pre><code class="lang-bash">CREATE SCHEDULE backup_cluster
FOR BACKUP INTO <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>
WITH revision_history
RECURRING <span class="hljs-string">'@hourly'</span>
FULL BACKUP <span class="hljs-string">'@daily'</span>
WITH SCHEDULE OPTIONS first_run = <span class="hljs-string">'now'</span>;
</code></pre>
<p>Here’s what each part means:</p>
<ul>
<li><p><code>AUTH=implicit</code> tells CockroachDB to use the Google key we mounted (<code>GOOGLE_APPLICATION_CREDENTIALS</code>) for authentication.</p>
</li>
<li><p><code>FULL BACKUP '@daily'</code> creates a complete backup of the entire cluster every day.</p>
</li>
<li><p><code>RECURRING '@hourly'</code> creates smaller, incremental backups every hour, capturing just the changes since the last backup.</p>
</li>
<li><p><code>WITH SCHEDULE OPTIONS first_run = 'now'</code> starts the first backup immediately after running the command.</p>
</li>
</ul>
<p>After running it, CockroachDB will return two rows:</p>
<ul>
<li><p>The first is for the <strong>recurring incremental backup</strong> (hourly updates)</p>
</li>
<li><p>The second is for the <strong>full backup</strong> (daily snapshot)</p>
</li>
</ul>
<p>You can read more about full and incremental backups in the official docs here 👉🏾<a target="_blank" href="https://www.cockroachlabs.com/docs/stable/take-full-and-incremental-backups">CockroachDB Backups Guide</a>.</p>
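<p>If you need to look up the schedules again later, you can list them without opening an interactive SQL shell. The command below is a sketch under a few assumptions: the <code>cockroach</code> binary sits in the container’s working directory, and the cluster runs in insecure mode (as configured by <code>tls.enabled: false</code> in our values file):</p>
<pre><code class="lang-bash"># Lists the backup schedules, including their IDs and next run times.
kubectl exec -it crdb-cockroachdb-0 -- ./cockroach sql --insecure \
  --execute="SHOW SCHEDULES FOR BACKUP;"
</code></pre>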
<h4 id="heading-step-5-checking-backup-status">Step 5: Checking Backup Status</h4>
<p>To see the status of your backups, copy the <strong>Job ID</strong> from the second row (the <code>id</code> column) and run this command:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762103549260/742fc309-9c4d-4967-9436-91539851a9b9.png" alt="The job ID to copy" class="image--center mx-auto" width="1587" height="107" loading="lazy"></p>
<pre><code class="lang-bash">SHOW JOBS FOR SCHEDULE &lt;YOUR_JOB_ID&gt;;
</code></pre>
<p>Replace <code>&lt;YOUR_JOB_ID&gt;</code> with the ID you copied.</p>
<p>You’ll see output similar to this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762103606748/8627d561-0b54-4e6d-9109-ba7e1c7a85c3.png" alt="Getting the status of the backup job" class="image--center mx-auto" width="1152" height="294" loading="lazy"></p>
<p>Now, do the same for the recurring backup job (the ID on the first row of the previous result).</p>
<p>If both statuses show <code>succeeded</code>, that means your full and recurring backups worked perfectly! If either is still running, just give it a few minutes – backups can take a bit of time :)</p>
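<p>You can also confirm from the Google Cloud side that files actually landed in the bucket. A small sketch, assuming the <code>gcloud</code> CLI is installed and authenticated with an account that can read the bucket:</p>
<pre><code class="lang-bash"># Lists what CockroachDB wrote under the "cluster" prefix used in the schedule.
gcloud storage ls gs://&lt;BUCKET_NAME&gt;/cluster/
</code></pre>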
<h3 id="heading-testing-our-backup-disaster-recovery-time">Testing Our Backup — Disaster Recovery Time</h3>
<p>Woohoo! We’ve successfully created a backup of our CockroachDB cluster to Google Cloud Storage. That’s a huge milestone. But let’s be honest: how can we be <em>sure</em> it works if we’ve never tried restoring it?</p>
<p>So, in true brave-developer fashion, we’re going to do the unthinkable: <strong>destroy our entire database</strong>...yes, everything! 😬</p>
<p>Why would we do that?! Because in real life, disasters happen. A node crashes, data gets wiped, or an upgrade goes sideways. The question is: <em>Can we recover?</em> Let’s find out.</p>
<h4 id="heading-step-1-uninstall-the-helm-chart">Step 1: Uninstall the Helm Chart</h4>
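<p>First, let’s remove the CockroachDB Helm release. This deletes the resources the chart created, such as the StatefulSet, pods, and services. The <code>gcs-key</code> Secret is not part of the release (we created it ourselves with <code>kubectl</code>), so it stays in place for the reinstall:</p>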
<pre><code class="lang-bash">helm uninstall crdb
</code></pre>
<p>This removes the running cluster, but <strong>not the actual data</strong>, which is stored on Persistent Volumes (PVs).</p>
<h4 id="heading-step-2-delete-persistent-volume-claims-pvcs">Step 2: Delete Persistent Volume Claims (PVCs)</h4>
<p>Each CockroachDB node stores its data on a Persistent Volume that it requests through a <strong>Persistent Volume Claim</strong> (PVC). These PVCs remain even after uninstalling the Helm release, so let’s manually delete them:</p>
<pre><code class="lang-bash">kubectl delete pvc datadir-crdb-cockroachdb-0
kubectl delete pvc datadir-crdb-cockroachdb-1
kubectl delete pvc datadir-crdb-cockroachdb-2
</code></pre>
<h4 id="heading-step-3-delete-the-persistent-volumes-pvs">Step 3: Delete the Persistent Volumes (PVs)</h4>
<p>Next, list all the Persistent Volumes:</p>
<pre><code class="lang-bash">kubectl get pv
</code></pre>
<p>You’ll see a list of volumes similar to this 👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762107818554/01defffd-543b-486a-aa19-4bbf6f768270.png" alt="List existing Persistent Volumes for CockroachDB" class="image--center mx-auto" width="925" height="91" loading="lazy"></p>
<p>Look for the PVs that were <strong>bound to the PVCs</strong> you just deleted. Then delete them manually using:</p>
<pre><code class="lang-bash">kubectl delete pv &lt;PV_NAME&gt;
</code></pre>
<p>At this point, you’ve completely wiped out your database like it never existed 🥲. Don’t worry: this is all part of the plan.</p>
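<p>Before moving on, it’s worth confirming that nothing was left behind. The following command lists claims and volumes together. After the deletions above, none of the <code>datadir-crdb-cockroachdb-*</code> entries should appear:</p>
<pre><code class="lang-bash"># Lists the remaining PersistentVolumeClaims and PersistentVolumes.
kubectl get pvc,pv
</code></pre>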
<h4 id="heading-step-4-reinstall-the-cluster">Step 4: Reinstall the Cluster</h4>
<p>Let’s bring CockroachDB back to life (an empty one for now):</p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
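<p>The new pods need a moment to start and initialize. One way to wait for them is sketched below. The pod names follow the ones used earlier in this tutorial, and the five-minute timeout is an arbitrary example value:</p>
<pre><code class="lang-bash"># Blocks until all three CockroachDB pods report Ready (or the timeout expires).
kubectl wait --for=condition=Ready \
  pod/crdb-cockroachdb-0 pod/crdb-cockroachdb-1 pod/crdb-cockroachdb-2 \
  --timeout=300s
</code></pre>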
<p>Once the installation is done, expose the cluster locally again:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26257
</code></pre>
<h4 id="heading-step-5-check-whats-left">Step 5: Check What’s Left</h4>
<p>Connect Beekeeper Studio to your DB again if it isn’t already connected, and try running the query below:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>You’ll get an error saying the <code>books</code> table doesn’t exist, because this is a <em>brand new</em> database.</p>
<h4 id="heading-step-6-restore-from-google-cloud-storage">Step 6: Restore from Google Cloud Storage</h4>
<p>Now for the magic part, let’s bring our data back from the backup we created earlier 😃!</p>
<p>Run this query on the new cluster:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">RESTORE</span> <span class="hljs-keyword">FROM</span> LATEST <span class="hljs-keyword">IN</span> <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>;
</code></pre>
<p>Replace <code>&lt;BUCKET_NAME&gt;</code> with your actual Google Cloud Storage bucket name (for example: <code>cockroachdb-backup-7gw8u</code>).</p>
<p>CockroachDB will begin restoring your data. This can take a few seconds or minutes depending on your backup size. When it’s done, you’ll see a response showing a success status:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762108106557/0da98d45-d8f4-48ed-b852-9f76209fb20f.png" alt="Database restored successfully" class="image--center mx-auto" width="645" height="422" loading="lazy"></p>
<h4 id="heading-step-7-confirm-the-restoration">Step 7: Confirm the Restoration</h4>
<p>Now, run the same query again:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom 💥 your books are back 😁! That means your backup and restore process works perfectly. You just performed a full disaster recovery test.</p>
<p>Congrats! You’ve done something many real-world teams fail to test: a <strong>full backup and restore cycle</strong>. You’ve now proven that your database setup is resilient, even in a worst-case scenario.</p>
<h2 id="heading-managing-resources-amp-optimizing-memory-usage">Managing Resources &amp; Optimizing Memory Usage</h2>
<p>In this section, we’ll learn how CockroachDB handles memory internally (for things like caching and SQL query work), and how to tune these settings so you avoid OOM kills or evictions, where Kubernetes kills or stops the database pod because it used more memory than was allocated to it.</p>
<h3 id="heading-how-cockroachdb-uses-memory">How CockroachDB Uses Memory</h3>
<p>When you deploy CockroachDB nodes (each replica) via Kubernetes, each pod (node) needs memory for multiple things. At a high level, there are two major internal uses:</p>
<ul>
<li><p><strong>Cache</strong> (<code>conf.cache</code>): This is the space CockroachDB uses to keep frequently accessed data in memory so queries can run faster without hitting the disk.</p>
</li>
<li><p><strong>SQL Memory</strong> (<code>conf.max-sql-memory</code>): This is the memory used when running SQL queries (things like sorting, joins, buffering numbers, and temporary data).</p>
</li>
</ul>
<p>Together, they need to be sized appropriately relative to the total memory you give the pod, so there’s room for these internal operations <em>plus</em> other overhead (networking, logging, background tasks).</p>
<h3 id="heading-the-memory-usage-formula-you-must-follow">The Memory Usage Formula You Must Follow</h3>
<p>Here’s the golden rule you should <strong>never forget</strong>:</p>
<pre><code class="lang-yaml"><span class="hljs-string">(2</span> <span class="hljs-string">×</span> <span class="hljs-string">max-sql-memory)</span> <span class="hljs-string">+</span> <span class="hljs-string">cache</span>  <span class="hljs-string">≤</span>  <span class="hljs-number">80</span><span class="hljs-string">%</span> <span class="hljs-string">of</span> <span class="hljs-string">the</span> <span class="hljs-string">memory</span> <span class="hljs-string">limit</span>
</code></pre>
<p>What this means:</p>
<ul>
<li><p>You take the <code>max-sql-memory</code> value and multiply by 2 (because SQL work may need space for both input and output, etc)</p>
</li>
<li><p>Add your <code>cache</code> value</p>
</li>
<li><p>That total must be <strong>less than or equal to 80%</strong> of the pod’s memory limit (<code>statefulset.resources.limits.memory</code>)</p>
</li>
<li><p>The remaining ~20% (or more) is free space for <em>other internal CockroachDB processes</em> like background jobs, metrics, network, and so on</p>
</li>
</ul>
<p>If you give CockroachDB too little “free” memory beyond these two settings, you risk OOM kills (pod gets killed by Kubernetes because it used more memory than allowed) or performance issues.</p>
<h3 id="heading-where-you-find-these-settings">Where You Find These Settings</h3>
<p>If you open the <a target="_blank" href="https://artifacthub.io/packages/helm/cockroachdb/cockroachdb">CockroachDB Helm chart on ArtifactHub</a> and scroll down to the <strong>Configuration</strong> section (or press Ctrl-F for <code>conf.cache</code>), you’ll see:</p>
<ul>
<li><p><code>conf.cache</code> (cache size)</p>
</li>
<li><p><code>conf.max-sql-memory</code> (SQL memory size)</p>
</li>
<li><p>It states that each of these is by default set to roughly 25% of the memory allocation you set in the <code>resources.limits.memory</code> for the statefulset.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762235290740/bd176882-43bd-4abd-94e0-cce083335d64.png" alt="Artifacthub docs for the CockroachDB Helm chart" class="image--center mx-auto" width="1260" height="489" loading="lazy"></p>
<h3 id="heading-concrete-example-step-by-step">Concrete Example (Step-by-Step)</h3>
<p>Let’s do the math with numbers in our Minikube environment.</p>
<ul>
<li><p>In our case we set <code>statefulset.resources.limits.memory</code> = <strong>2 GiB</strong> for each CockroachDB pod.</p>
</li>
<li><p>The Helm chart’s default ¼ (25%) rule means:</p>
<ul>
<li><p><code>conf.cache</code> = ¼ × 2 GiB = <strong>512 MiB</strong></p>
</li>
<li><p><code>conf.max-sql-memory</code> = ¼ × 2 GiB = <strong>512 MiB</strong></p>
</li>
</ul>
</li>
<li><p>Apply the formula: <code>(2 × 512 MiB) + 512 MiB = 1,536 MiB</code></p>
</li>
<li><p>Calculate 80% of the memory limit: <code>80% of 2 GiB = 1,638 MiB</code> (approximately)</p>
</li>
<li><p>Compare: 1,536 MiB ≤ 1,638 MiB – so we’re within the safe zone ✅</p>
</li>
<li><p>That means in this configuration, CockroachDB expects to use <strong>~1,536 MiB</strong> for its cache + SQL memory. This leaves <strong>~512 MiB</strong> (25%) of the 2 GiB limit for other internal processes.</p>
</li>
</ul>
<p>That leftover memory is for things like internal bookkeeping (range rebalancing, replication metadata), communication among database replicas, metric collection, logging, garbage collection, and temporary or unexpected memory spikes.</p>
<p>If you don’t leave this free space, your node might struggle even during normal operations. And on Kubernetes, if the pod uses more memory than its <code>limits.memory</code> allows, it can get OOM-killed, which causes downtime or restarts.</p>
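<p>The arithmetic above is easy to script, which helps when you want to sanity-check a configuration before applying it. The helper below is a hypothetical sketch (the function name <code>fits_memory_rule</code> is made up for this example), and all values are given in MiB:</p>
<pre><code class="lang-bash"># Succeeds (exit code 0) when (2 x max-sql-memory) + cache
# fits within 80% of the memory limit. All arguments are in MiB.
fits_memory_rule() {
  local limit_mib=$1 cache_mib=$2 sql_mib=$3
  local planned=$(( 2 * sql_mib + cache_mib ))
  local budget=$(( limit_mib * 80 / 100 ))   # whole MiB, rounded down
  echo "planned=${planned}MiB budget=${budget}MiB"
  [ "$planned" -le "$budget" ]
}

# The Helm defaults for a 2 GiB limit: cache = 512 MiB, max-sql-memory = 512 MiB
fits_memory_rule 2048 512 512
</code></pre>
<p>For the 2 GiB example, this prints <code>planned=1536MiB budget=1638MiB</code> and exits successfully, matching the numbers we worked out by hand.</p>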
<h3 id="heading-on-requests-vs-limits-in-kubernetes">⚠️ On Requests vs Limits in Kubernetes</h3>
<p>Important nuance: Kubernetes schedules pods based on <strong>requests</strong> (what you ask for) but caps actual usage based on <strong>limits</strong> (what you allow).</p>
<ul>
<li><p><code>statefulset.resources.requests.memory</code> = what the scheduler guarantees the pod will have.</p>
</li>
<li><p><code>statefulset.resources.limits.memory</code> = the maximum the pod can use before Kubernetes will kill it for excess memory.</p>
</li>
</ul>
<p>Because CockroachDB’s internal memory computations (cache + SQL memory) use the <strong>limit</strong> value to calculate sizing, if you set requests &lt; limits you’ll get a mismatch. Example:</p>
<ul>
<li><p>Suppose requests = 1 GiB, limits = 2 GiB</p>
</li>
<li><p>Kubernetes may schedule the pod on a node that has (at least) 1 GiB free</p>
</li>
<li><p>But internally, CockroachDB will plan for ~1.5 GiB usage (based on the 2 GiB limit)</p>
</li>
<li><p>The node may not actually have that much free memory available</p>
</li>
<li><p>The pod might then use more memory than was reserved for it, which puts the node under memory pressure and risks eviction</p>
</li>
</ul>
<p>✅ <strong>Best practice:</strong> Set requests = limits for memory and CPU for CockroachDB pods. That way the scheduler reserves enough space for what CockroachDB will use internally.</p>
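<p>One way to check whether your pods follow this practice is to look at the QoS class Kubernetes assigned to them. Kubernetes only gives a pod the <code>Guaranteed</code> class when every container in it has CPU and memory requests equal to its limits, so a value of <code>Burstable</code> suggests that at least one of them differs. The pod name below follows the ones used in this tutorial:</p>
<pre><code class="lang-bash"># Prints the pod's QoS class: Guaranteed, Burstable, or BestEffort.
kubectl get pod crdb-cockroachdb-0 -o jsonpath='{.status.qosClass}'
</code></pre>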
<h3 id="heading-overriding-the-default-fractions">Overriding the Default Fractions</h3>
<p>If you want to set static <code>conf.cache</code> or <code>conf.max-sql-memory</code> values (rather than relying on 25% of limit) you <em>can</em> – but you must still obey the memory usage formula.</p>
<p>For example, if you set:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">conf:</span>
  <span class="hljs-attr">cache:</span> <span class="hljs-string">"1Gi"</span>
  <span class="hljs-attr">max-sql-memory:</span> <span class="hljs-string">"1Gi"</span>
<span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
</code></pre>
<p>With the above configuration, your pod’s memory request and limit are both <strong>3 GiB</strong>, so the calculation is:</p>
<pre><code class="lang-yaml"><span class="hljs-string">(2</span> <span class="hljs-string">×</span> <span class="hljs-string">1Gi)</span> <span class="hljs-string">+</span> <span class="hljs-string">1Gi</span> <span class="hljs-string">=</span> <span class="hljs-string">3Gi</span>
<span class="hljs-number">80</span><span class="hljs-string">%</span> <span class="hljs-string">of</span> <span class="hljs-string">3Gi</span> <span class="hljs-string">=</span> <span class="hljs-string">~2.4Gi</span>
</code></pre>
<p>Here <strong>3Gi &gt; 2.4Gi</strong>, so you’d be violating the rule. This is a risky setup.</p>
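<p>So you’ll need to either reduce both cache and SQL memory, for example to 768Mi each (2 × 768Mi + 768Mi = 2,304Mi, which fits under the ~2,458Mi budget), or increase the memory limit, for example to 4Gi (80% of 4Gi = 3.2Gi), so that your formula results in ≤ 80% of the limit.</p>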
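<p>If you’d rather not do this arithmetic by hand, here’s a minimal Python sketch of the same check. The helper names (<code>to_mib</code>, <code>formula_share</code>) are made up for this example, and the parser only understands <code>Mi</code> and <code>Gi</code> suffixes. Anything above 0.80 breaks the rule.</p>
<pre><code class="lang-python">def to_mib(quantity: str) -> float:
    """Convert a Kubernetes-style quantity such as "768Mi" or "3Gi" to MiB."""
    units = {"Mi": 1, "Gi": 1024}
    number, unit = quantity[:-2], quantity[-2:]
    return float(number) * units[unit]


def formula_share(cache: str, max_sql_memory: str, limit: str) -> float:
    """Return ((2 x max-sql-memory) + cache) as a fraction of the memory limit."""
    used = 2 * to_mib(max_sql_memory) + to_mib(cache)
    return used / to_mib(limit)


# The configuration above: 1Gi cache, 1Gi SQL memory, 3Gi limit.
print(formula_share("1Gi", "1Gi", "3Gi"))      # 1.0 (over the 0.80 budget)

# Two possible fixes: shrink both settings, or raise the limit.
print(formula_share("768Mi", "768Mi", "3Gi"))  # 0.75
print(formula_share("1Gi", "1Gi", "4Gi"))      # 0.75
</code></pre>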
<h2 id="heading-scaling-cockroachdb-the-right-way">Scaling CockroachDB the Right Way</h2>
<p>In this section we’ll look at when and how you should grow your CockroachDB cluster – whether that means adding more replicas (horizontal scale), giving each node more CPU/RAM (vertical scale), or giving them more storage.</p>
<p>I’ll explain everything in simple terms and cover what metrics to watch, what decisions to make, and how to scale safely.</p>
<p>What we’ll discuss:</p>
<ul>
<li><p>How you can tell it’s time to “grow” your cluster</p>
</li>
<li><p>How to safely add more nodes or upgrade what you already have</p>
</li>
<li><p>How to decide whether you need more nodes, bigger nodes, or bigger disks</p>
</li>
<li><p>How to do all this without causing downtime or stress</p>
</li>
</ul>
<h3 id="heading-key-metrics-to-understand">Key Metrics to Understand</h3>
<p>Before we dive into how to scale our cluster, we need to understand what certain metrics mean. These metrics will help us make calculated decisions about what to scale and when.</p>
<h4 id="heading-read-bytessecond-amp-write-bytessecond-throughput">Read bytes/second &amp; Write bytes/second (Throughput)</h4>
<p>Read bytes/second is how much data (in bytes) is being <strong>read</strong> from the disk every second, that is, passing from the disk to the database app.</p>
<p>Write bytes/second is how much data is being <strong>written</strong> to the disk per second, that is, moving from the database to the disk.</p>
<p>This matters because your database is an application that stores data on disk. If your app needs to read a lot of data (reads) or write a lot of data (writes), this metric shows the <strong>volume</strong> of data flowing to/from disk.</p>
<p>To keep an eye on it, go to your CockroachDB dashboard and navigate to the “Metrics” link on the sidebar. Under the “Metrics” title, click the “Dashboard:…” drop-down and select “Hardware” from the options.</p>
<p>Now, scroll down a bit till you see “Disk Read Bytes/s” and “Disk Write Bytes/s”.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762325396257/553ac9d4-4927-40f3-b654-8b19a0b2aef8.png" alt="The Disk Read &amp; Write Bytes/s metrics" class="image--center mx-auto" width="1135" height="821" loading="lazy"></p>
<h4 id="heading-read-iops-amp-write-iops">Read IOPS &amp; Write IOPS</h4>
<p><strong>IOPS</strong> = “Input/Output Operations Per Second”. Here, Read IOPS = how many <strong>read operations</strong> the disk is performing per second. Write IOPS = how many <strong>write operations</strong> per second.</p>
<p>This is different from throughput because throughput is about how many bytes (data) are being transferred. IOPS, on the other hand, is about <strong>how many operations</strong> are happening (regardless of size).</p>
<p>Here’s an example: 10 read operations/sec of 1 MiB each = 10 MiB/sec throughput, 10 IOPS. Another scenario: 100 reads/sec of 10 KiB each = ~1 MiB/sec throughput, but 100 IOPS (a higher operation count, though each operation moves less data).</p>
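<p>The relationship between the two is simple multiplication, which this small Python sketch shows using the two scenarios above. The function name is just for illustration.</p>
<pre><code class="lang-python">KIB = 1024
MIB = 1024 * KIB


def throughput_mib_per_s(iops: int, op_size_bytes: int) -> float:
    """Throughput is the operation rate multiplied by the size of each operation."""
    return iops * op_size_bytes / MIB


print(throughput_mib_per_s(10, 1 * MIB))    # 10.0 MiB/s from just 10 IOPS
print(throughput_mib_per_s(100, 10 * KIB))  # 0.9765625 MiB/s from 100 IOPS
</code></pre>
<p>So a disk can be busy in two different ways: a few large operations (high throughput) or many small ones (high IOPS).</p>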
<p>Scroll down a bit more to view the IOPS metrics:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762325699278/dd549ac3-16cf-4373-9637-5a1e798bf5db.png" alt="Illustrating the IOPS metrics on the dashboard" class="image--center mx-auto" width="977" height="813" loading="lazy"></p>
<h4 id="heading-sql-p99-latency-99th-percentile-latency">SQL p99 Latency (99th percentile latency)</h4>
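<p>P99 latency is the response time that <strong>99% of queries</strong> finish within. Only the <strong>slowest 1% of queries</strong> take longer than this.</p>
<p>For example, let’s say you run 1,000 queries. The p99 value is the latency that 990 of them stay at or under, so it marks where the slowest 10 begin.</p>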
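<p>To make this concrete, here’s a rough Python sketch that computes percentiles with the simple “nearest-rank” method. The latency numbers are invented, and CockroachDB’s own dashboard may calculate its percentiles differently, so treat this purely as an illustration of the idea.</p>
<pre><code class="lang-python">import math


def percentile(latencies_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample that pct% of samples do not exceed."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 1,000 made-up query latencies: 985 fast queries and 15 slow ones.
samples = [10.0] * 985 + [300.0] * 15

print(sum(samples) / len(samples))  # 14.35 (the average hides the slow tail)
print(percentile(samples, 50))      # 10.0  (the typical query is fast)
print(percentile(samples, 99))      # 300.0 (p99 exposes the slow tail)
</code></pre>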
<p>This matters because it’s not about the average query, but about the tail (worst cases). If your p99 is high, it means some queries are seriously lagging. All other queries might be fine, but some are dragging.</p>
<p>So if p99 jumps up (for example, from 10 ms → 300 ms), you should investigate. Common causes are big joins, missing indexes, contention, or slow writes to disk.</p>
<p>To access the SQL P99 Latency metrics, simply click the “Dashboard:…” select field, and choose the “Overview” option from the dropdown.</p>
<p>PS: The higher the p99 latency, the slower your worst-case queries are.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762326088120/e6f39e6e-942b-4db9-b808-cb228c1e0cc5.png" alt="The SQL p99 latency metric" class="image--center mx-auto" width="980" height="479" loading="lazy"></p>
<h4 id="heading-disk-ops-in-progress-queue-depth">Disk Ops In Progress (Queue Depth)</h4>
<p>This shows how many disk reads and writes are waiting <em>in line</em> (queued) because the storage system is busy.</p>
<p>A queue depth of 0–5 is generally OK. If it frequently goes into double-digits (10+), that means storage is struggling and latency may spike. If you see this number high and staying high, you may need faster storage or more database replicas.</p>
<p>Simple rule: if “Ops In Progress” &gt; ~9 for extended time, this is a bad sign. Time to check disks and I/O.</p>
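<p>What counts as “extended time” is up to you. As a sketch, the Python below flags a problem only when several readings in a row stay above the threshold. The readings are hypothetical, and the window of 5 consecutive samples is an assumption for this example, not a CockroachDB setting.</p>
<pre><code class="lang-python">def sustained_above(samples: list[int], threshold: int = 9, window: int = 5) -> bool:
    """True if `window` consecutive samples all exceed `threshold`."""
    streak = 0
    for depth in samples:
        streak = streak + 1 if depth > threshold else 0
        if streak >= window:
            return True
    return False


# Hypothetical "Disk Ops In Progress" readings, one per minute.
brief_spike = [2, 3, 14, 4, 1, 2, 3, 2]
sustained = [3, 11, 12, 15, 13, 10, 12, 4]

print(sustained_above(brief_spike))  # False: a single spike is not a trend
print(sustained_above(sustained))    # True: five readings in a row above 9
</code></pre>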
<p>To access the “Disk Ops In Progress” metric, return to the “Hardware” dashboard, and scroll down:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762488796957/b2a215fd-ec51-4ee3-9056-a5fa6d511c61.png" alt="Accessing the Disk Ops In Progress metrics on the CockroachDB dashboard" class="image--center mx-auto" width="975" height="621" loading="lazy"></p>
<p>By monitoring these, you can choose:</p>
<ul>
<li><p>“I need <strong>more nodes</strong>” (horizontal scale)</p>
</li>
<li><p>“I need <strong>bigger nodes or faster storage</strong>” (vertical scale)</p>
</li>
<li><p>“I need <strong>better query/index tuning</strong>” (optimize rather than scale)</p>
</li>
</ul>
<h3 id="heading-when-and-what-to-scale-based-on-your-metrics">When (and What) to Scale Based on Your Metrics</h3>
<p>So, let’s imagine you’re watching your CockroachDB dashboard and notice this pattern:</p>
<ul>
<li><p>The <strong>SQL P99 latency</strong> (the slowest 1% of your queries) is high, meaning your queries are taking too long.</p>
</li>
<li><p>The <strong>CPU usage</strong> for your CockroachDB pods (under <em>Cockroach process CPU%</em>) is above <strong>80%</strong> consistently.</p>
</li>
</ul>
<p>That’s a classic sign your cluster is running out of CPU power: the CPU is maxed out, so the database can’t process queries fast enough.</p>
<p>Here’s how to fix it 👇🏾</p>
<h4 id="heading-step-1-add-more-cpu-power">Step 1: Add More CPU Power</h4>
<p>You can scale up your CPUs directly through the <strong>Helm chart values file</strong>, <code>cockroachdb-values.yml</code>.</p>
<p>In that file, look for the section where CPU and memory requests/limits are defined under <code>statefulset.resources</code>. Then, increase the CPU allocations. For example:</p>
<pre><code class="lang-yaml">statefulset:
  resources:
    requests:
      cpu: <span class="hljs-string">"3"</span>
      memory: <span class="hljs-string">"6Gi"</span>
    limits:
      cpu: <span class="hljs-string">"3"</span>
      memory: <span class="hljs-string">"6Gi"</span>
</code></pre>
<p>This means each CockroachDB pod (replica) will now <em>request</em> 3 vCPUs (guaranteed). Save the file, then apply the update with the Helm command:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>Once the upgrade is done, give it 30 minutes to 1 hour to stabilize. The CockroachDB dashboard will automatically start showing you updated metrics.</p>
<p>If you see that the CPU usage drops below 70% and the SQL P99 latency improves, you’re good. 👍🏾</p>
<h4 id="heading-step-2-add-another-replica-new-node">Step 2: Add Another Replica (New Node)</h4>
<p>But…what if the latency is <strong>still high</strong> even after adding more CPU? That likely means the cluster is still overloaded, and it’s time to add another node (replica) to distribute the load.</p>
<p>Here’s why that works: CockroachDB is horizontally scalable, meaning it automatically spreads out your data (remember <strong>ranges</strong>?) and balances reads/writes across all replicas. So, the more nodes you add, the more evenly your cluster can share the work.</p>
<p>To add another replica, simply increase the <code>replicas</code> value in your Helm config:</p>
<pre><code class="lang-yaml">statefulset:
  replicas: 4  <span class="hljs-comment"># If it was 3 before</span>
</code></pre>
<p>Then redeploy:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>This adds a new pod (a new CockroachDB node) to your cluster. CockroachDB will automatically rebalance your data across nodes, with no manual migration needed.</p>
<p>💡 <strong>Tip:</strong> Try to keep one CockroachDB pod (replica) per VM. For example, if you have 3 replicas, you should ideally have 3 separate VMs (worker nodes). This ensures better fault tolerance and performance.</p>
<p>Luckily, the official CockroachDB Helm chart already helps with this by managing <strong>Pod</strong> <strong>anti-affinity rules</strong>, so pods are automatically spread across nodes safely.</p>
<h3 id="heading-disk-bound-situations-what-to-do-when-your-disk-is-the-limiting-factor">Disk-Bound Situations — What to Do When Your Disk Is the Limiting Factor</h3>
<p>If you’re seeing this kind of pattern in your CockroachDB dashboard and Kubernetes cluster:</p>
<ul>
<li><p>SQL P99 latency is high (queries are slow)</p>
</li>
<li><p>“Disk Ops In Progress” (queue depth) stays above ~9-10 – meaning many disk I/O operations are waiting to be processed</p>
</li>
<li><p>Disk “Read bytes/sec” or “Write bytes/sec” (throughput) are high <strong>or</strong> “Read IOPS” or “Write IOPS” are high (even though CPU looks okay)</p>
</li>
</ul>
<p>Then you’re very likely <strong>disk-bound</strong>, meaning your storage is the bottleneck.</p>
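<p>Here’s a small Python sketch that puts the CPU-bound and disk-bound patterns side by side. The function name and return strings are invented for this example. The thresholds (80% CPU, queue depth above ~9) are the rules of thumb from this guide, not hard limits.</p>
<pre><code class="lang-python">def diagnose(p99_high: bool, cpu_pct: float, disk_queue_depth: float) -> str:
    """Map the dashboard patterns described above to a first thing to try."""
    if not p99_high:
        return "healthy: no scaling needed"
    if cpu_pct > 80:
        return "cpu-bound: raise CPU, then add a node if latency stays high"
    if disk_queue_depth > 9:
        return "disk-bound: look at disk size, disk type, and per-VM caps"
    return "neither: investigate queries and indexes before scaling"


print(diagnose(p99_high=True, cpu_pct=92, disk_queue_depth=3))
print(diagnose(p99_high=True, cpu_pct=45, disk_queue_depth=14))
print(diagnose(p99_high=True, cpu_pct=45, disk_queue_depth=2))
</code></pre>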
<p>Here’s how to fix it (and yes, it’s a bit more complex than just “add more RAM”)…</p>
<h4 id="heading-step-1-increase-disk-size-in-your-helm-values">Step 1: Increase Disk Size in Your Helm Values</h4>
<p>Often the first problem is that the disk size is too small. Here’s how you can increase it:</p>
<ol>
<li><p>Open your <code>cockroachdb-values.yml</code> (the Helm chart values file)</p>
</li>
<li><p>Look for the storage section, for example:</p>
</li>
</ol>
<pre><code class="lang-yaml">storage:
  persistentVolume:
    size: 5Gi  <span class="hljs-comment"># current size</span>
</code></pre>
<ol start="3">
<li>Update it to a larger size, like:</li>
</ol>
<pre><code class="lang-yaml">storage:
  persistentVolume:
    size: 15Gi  <span class="hljs-comment"># increased size</span>
</code></pre>
<ol start="4">
<li>Save the file and run:</li>
</ol>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
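<p><strong>N.B.</strong> If this doesn’t work, or Helm returns an error about not being able to modify some values (this is normal, because Kubernetes doesn’t allow changes to a StatefulSet’s volume claim templates after creation), just upsize the disk this way 👇🏾 (replace the <code>&lt;PVC_NAME&gt;</code> and <code>&lt;SIZE&gt;</code> placeholders accordingly). Note that this only works if the StorageClass behind the PVC allows volume expansion.</p>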
<pre><code class="lang-bash">kubectl patch pvc &lt;PVC_NAME&gt; \
  -p <span class="hljs-string">'{"spec":{"resources":{"requests":{"storage":"&lt;SIZE&gt;"}}}}'</span>
</code></pre>
<p>Do that for each PVC (<code>datadir-crdb-cockroachdb-0</code>, <code>datadir-crdb-cockroachdb-1</code>, and so on).</p>
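<p>If you have many replicas, typing the patch for each PVC gets tedious. As a convenience, this Python sketch prints one command per PVC. It assumes the PVC names follow the <code>datadir-crdb-cockroachdb-N</code> pattern shown above, and the function name is made up for this example. It only prints the commands, so you can review them before running anything.</p>
<pre><code class="lang-python">import json


def patch_commands(release: str, replicas: int, size: str) -> list[str]:
    """Build one `kubectl patch pvc` command per CockroachDB data volume."""
    patch = json.dumps({"spec": {"resources": {"requests": {"storage": size}}}})
    return [
        f"kubectl patch pvc datadir-{release}-cockroachdb-{i} -p '{patch}'"
        for i in range(replicas)
    ]


# Print the commands for a 3-replica release named "crdb", growing to 15Gi.
for command in patch_commands("crdb", 3, "15Gi"):
    print(command)
</code></pre>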
<p><strong>Important:</strong> Increasing size <em>may help</em>, but on its own it’s often not enough, because your disk speed (IOPS/throughput) also depends on factors beyond just size.</p>
<p>Let’s break down why that’s the case, and what really affects your disk performance (especially on Google Cloud, which is what I’m using, too).</p>
<h4 id="heading-why-disk-speed-can-vary">Why Disk Speed Can Vary</h4>
<p>Your CockroachDB cluster uses <strong>external disks</strong> provided by your cloud provider (like Google, AWS, or Azure). The speed of those disks – that is, how fast they can read/write data – isn’t fixed. It depends on a few key factors.</p>
<p>On Google Cloud, disk performance depends on three main things:</p>
<ol>
<li><p><strong>Disk type</strong>: HDD (pd-standard), SSD (pd-balanced), or fast SSD (pd-ssd) (the faster the disk type, the faster it can handle data operations)</p>
</li>
<li><p><strong>Disk size</strong>: larger disks usually come with higher speed limits (the bigger, the faster)</p>
</li>
<li><p><strong>VM’s vCPU count</strong>: more CPUs mean higher quotas for both</p>
<ul>
<li><p>read/write operations per second (<strong>IOPS</strong>), and</p>
</li>
<li><p>how much data can flow to/from the disk per second (<strong>throughput</strong>)</p>
</li>
</ul>
</li>
</ol>
<h4 id="heading-the-recommended-disk-type-for-cockroachdb">The Recommended Disk Type for CockroachDB</h4>
<p>The pd-ssd (Google’s fast SSD) is the recommended type for CockroachDB.</p>
<ul>
<li><p>Each pd-ssd disk starts with a minimum of 6,000 IOPS (read or write operations per second).</p>
</li>
<li><p>It also has around 240 MiB/s (~252 MB/s) of read/write throughput.</p>
</li>
</ul>
<p>In simple terms, that means your CockroachDB disk can handle at least 6,000 read/write operations EVERY SECOND, and move roughly 250 MB of data in and out every second. That’s pretty impressive!</p>
<p>But here’s the catch: those numbers can still vary depending on your <strong>VM family</strong> and <strong>CPU count</strong>.</p>
<h4 id="heading-how-vm-family-affects-disk-speed-e2-example">How VM Family Affects Disk Speed (E2 Example)</h4>
<p>If your CockroachDB is running on an E2 VM family (one of Google Cloud’s general-purpose VM types):</p>
<ul>
<li><p>A VM with 2–7 vCPUs can handle up to:</p>
<ul>
<li><p>15k IOPS (read/write operations per second)</p>
</li>
<li><p>~240 MiB/s throughput (which is already far more than many databases ever use 😅)</p>
</li>
</ul>
</li>
<li><p>A VM with 8–15 vCPUs still allows 15k IOPS, but throughput jumps up to ~800 MiB/s 😮,<br>  meaning your disk can push roughly 0.8 GB of data in/out EVERY SECOND.</p>
</li>
</ul>
<p>The more vCPUs you have, the higher these limits grow, both for IOPS and throughput.</p>
<h4 id="heading-putting-it-all-together">Putting It All Together</h4>
<p>So, if you notice high SQL P99 latency (queries taking long), and your disk read/write IOPS or throughput (read &amp; write bytes) are close to their limits, then your disk is probably the bottleneck, not the database itself.</p>
<p>Here’s what you can do:</p>
<ul>
<li><p>Check your current VM’s vCPU count and disk performance limit for that CPU.</p>
</li>
<li><p>If you’re using E2 with low vCPUs (for example, 2–4), try increasing it to <strong>8 vCPUs or more</strong>. That’ll immediately lift your IOPS and throughput ceiling.</p>
</li>
</ul>
<h4 id="heading-example-e2-vm-family-iopsthroughput-table">Example: E2 VM Family IOPS/Throughput Table</h4>
<pre><code class="lang-plaintext">E2 per-VM caps (pd-ssd):

VM size        Write IOPS   Read IOPS   Write MiB/s   Read MiB/s
e2-medium      10k          12k         200           200
2–7 vCPUs      15k          15k         240           240
8–15 vCPUs     15k          15k         800           800
16–31 vCPUs    25k          25k         1,000         1,200
32 vCPUs       60k          60k         1,000         1,200
</code></pre>
<p>The rule is simple — the higher the CPU tier (2–7, 8–15, and so on), the higher the disk speed cap.</p>
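<p>If you want to look up the cap for a given VM in a script, the table above can be written as a small lookup. This is only a sketch: the numbers are copied from the table, the names are made up for this example, and the shared-core <code>e2-medium</code> row is left out to keep things simple. Always confirm the current limits in Google Cloud’s documentation.</p>
<pre><code class="lang-python"># (vCPU range, write IOPS, read IOPS, write MiB/s, read MiB/s), copied from the table above.
E2_PD_SSD_CAPS = [
    (range(2, 8), 15_000, 15_000, 240, 240),
    (range(8, 16), 15_000, 15_000, 800, 800),
    (range(16, 32), 25_000, 25_000, 1_000, 1_200),
    (range(32, 33), 60_000, 60_000, 1_000, 1_200),
]


def e2_caps(vcpus: int) -> dict[str, int]:
    """Return the per-VM pd-ssd caps for an E2 VM with the given vCPU count."""
    for vcpu_range, write_iops, read_iops, write_mib, read_mib in E2_PD_SSD_CAPS:
        if vcpus in vcpu_range:
            return {
                "write_iops": write_iops,
                "read_iops": read_iops,
                "write_mib_s": write_mib,
                "read_mib_s": read_mib,
            }
    raise ValueError(f"no E2 tier in the table for {vcpus} vCPUs")


print(e2_caps(4)["write_mib_s"])  # 240
print(e2_caps(8)["write_mib_s"])  # 800 (same IOPS cap, much higher throughput)
</code></pre>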
<h4 id="heading-but-what-if-youre-still-seeing-slow-queries">⚠️ But What If You’re Still Seeing Slow Queries?</h4>
<p>If your CockroachDB queries are <em>still</em> slow, but your metrics show that you’re well below the per-VM caps for your vCPU range, then your <strong>disk size</strong> might be the actual limitation, because each disk’s own IOPS and throughput limits grow with its size.</p>
<p>In that case:</p>
<ul>
<li><p>Gradually increase your disk size, for example from <code>50Gi</code> to <code>70Gi</code> to <code>100Gi</code>.</p>
</li>
<li><p>Each increase lets your disk move more data in and out per second (especially with pd-ssd).</p>
</li>
<li><p>Remember: once you increase disk size on Google Cloud, <strong>you can’t shrink it back down</strong>, so grow it slowly and observe improvements before scaling again.</p>
</li>
</ul>
<p>This step helps you pinpoint <em>exactly</em> whether the slowdown is coming from insufficient IOPS, throughput, or just a disk that’s too small for CockroachDB’s workload 💪🏾</p>
<h3 id="heading-memory-pressure-what-to-do-when-your-database-hits-the-limit">Memory Pressure — What to Do When Your Database Hits the Limit</h3>
<p>There are some signs in your cluster you can look out for that’ll tell you your database is getting close to its limit. Pods (database replicas) might be getting <strong>OOMKilled</strong> (out of memory) or being evicted by Kubernetes, or your memory usage might be staying above ~ 75–80% for a while.</p>
<p>If either of these is the case, you’re most likely dealing with <strong>memory pressure</strong> (you can check memory usage on the CockroachDB overview dashboard).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762584827011/e7828548-7ed7-4a87-b6b2-fff52c6f6df1.png" alt="Accessing your Cluster memory usage" class="image--center mx-auto" width="1139" height="900" loading="lazy"></p>
<h4 id="heading-why-this-happens">Why this happens</h4>
<p>If you didn’t set memory requests and limits properly for each replica, the pod might not have enough headroom for all of its internal work (cache, SQL memory, background jobs). When that happens, either Kubernetes kills the pod or it crashes on its own.</p>
<p>Also, as you increase load (lots of queries, many users), your database needs more memory for two internal areas:</p>
<ul>
<li><p><code>--cache</code> (or <code>conf.cache</code>): in-memory data caching</p>
</li>
<li><p><code>--max-sql-memory</code> (or <code>conf.max-sql-memory</code>): memory for running SQL queries (joins, sorts, and so on).<br>  And yes, we covered the formula earlier <code>(2 × max-sql-memory) + cache ≤ ~ 80% of RAM limit</code>.</p>
</li>
</ul>
<h4 id="heading-what-to-do">What to do:</h4>
<p>First, you can increase the DB memory. In your Helm chart values (<code>cockroachdb-values.yml</code>), bump up the <code>statefulset.resources.limits.memory</code> and <code>statefulset.resources.requests.memory</code>. Or you can modify <code>conf.cache</code> and <code>conf.max-sql-memory</code> values (if you’re comfortable) but only if the total RAM limit is sufficient to support them.</p>
<p>Because the defaults (when you installed) set each to ~25% of RAM limit, they will scale automatically when you increase RAM.</p>
<p>For example:</p>
<ul>
<li><p>If RAM limit per pod = <strong>5 GiB</strong>, then cache ≈ <strong>1.25 GiB</strong>, max-sql-memory ≈ <strong>1.25 GiB</strong></p>
</li>
<li><p>If you raise RAM limit to <strong>8 GiB</strong>, these become ≈ <strong>2 GiB</strong> each. This keeps you inside the formula and avoids memory crashes.</p>
</li>
</ul>
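<p>Here’s a quick Python sketch of why the 25% defaults always stay inside the formula: two shares for SQL memory plus one for cache add up to 75% of the limit, whatever the limit is. The function name and dictionary keys are made up for this example.</p>
<pre><code class="lang-python">def default_memory_split(limit_gib: float, fraction: float = 0.25) -> dict[str, float]:
    """Size cache and max-sql-memory as a fraction of the pod's memory limit."""
    cache = limit_gib * fraction
    max_sql_memory = limit_gib * fraction
    formula = 2 * max_sql_memory + cache
    return {
        "cache_gib": cache,
        "max_sql_memory_gib": max_sql_memory,
        "formula_share": formula / limit_gib,
    }


for limit in (5, 8):
    print(limit, default_memory_split(limit))
# 5 GiB limit: 1.25 GiB each, formula share 0.75
# 8 GiB limit: 2.0 GiB each, formula share 0.75
</code></pre>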
<h4 id="heading-quick-yaml-snippet-example">Quick YAML snippet example:</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"8Gi"</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"8Gi"</span>
<span class="hljs-attr">conf:</span>
  <span class="hljs-attr">cache:</span> <span class="hljs-string">"25%"</span>
  <span class="hljs-attr">max-sql-memory:</span> <span class="hljs-string">"25%"</span>
</code></pre>
<p>After editing your values file, remember to apply it:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<h3 id="heading-when-queries-are-slow-but-everything-else-cpu-memory-amp-disk-looks-fine">When Queries Are Slow but Everything Else (CPU, Memory &amp; Disk) Looks “Fine”</h3>
<p>Sometimes you’ll see that your resource metrics (CPU, memory, disk I/O) all seem healthy. But your queries are still slow.</p>
<p>What then? One important cause: <strong>hotspots</strong> – especially “hot ranges” or “hot nodes” in CockroachDB.</p>
<p>A <strong>hot range</strong> is a portion of data (in CockroachDB, a range is a section of data from a table) that’s receiving much more traffic (reads or writes) than others.</p>
<p>A <strong>hot node</strong>, on the other hand, is a node/replica in the cluster which has significantly more load compared to the other nodes – often because it holds one or more hot ranges.</p>
<p>Because most of the traffic (queries) goes to a range that lives on one specific node, performance suffers locally even though your overall CPU / memory / disk metrics might look “okay”. The queries are funneled into that one range, creating a “hotspot”.</p>
<p>Learn more about Hotspots <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/understand-hotspots">here</a>.</p>
<h4 id="heading-why-a-high-write-workload-can-slow-reads">Why A High Write Workload Can Slow Reads</h4>
<p>When you have lots of write queries, they may overload specific ranges or nodes (especially if the keyspace is skewed). Writes tend to:</p>
<ul>
<li><p>Acquire locks or latches on rows or ranges</p>
</li>
<li><p>Cause contention among transactions</p>
</li>
<li><p>Require coordination (for example, via Raft consensus) which impacts performance.</p>
</li>
</ul>
<p>When writes dominate a range, read queries that hit the same ranges may get queued behind these write operations, or suffer longer wait times.</p>
<p>Since reads and writes share the same underlying data/ranges, too many writes can delay reads by creating bottlenecks. The docs describe this as a “write hotspot”.</p>
<h4 id="heading-key-signs-you-might-have-a-hotspot">Key Signs You Might Have a Hotspot</h4>
<ul>
<li><p>One node’s CPU % is much higher than the others (even though overall resources seem fine)</p>
</li>
<li><p>On the Hot Ranges page in the CockroachDB UI, some ranges show very high QPS (queries per second) compared to others.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762586236835/aeb3b0ea-b280-48d3-b12f-4cfe78d11dc1.png" alt="The Hot Ranges page in the CockroachDB dashboard UI" class="image--center mx-auto" width="1095" height="608" loading="lazy"></p>
</li>
<li><p>You observe that increasing overall resources (more CPU, more nodes) didn’t resolve the slowness. This suggests the problem isn’t “not enough resources” but “resource imbalance”.</p>
</li>
</ul>
<h4 id="heading-what-you-can-do">What You Can Do</h4>
<p>There are a few things you can do to prevent hotspots:</p>
<ul>
<li><p>Use the <strong>Hot Ranges</strong> UI page (go to the Database Console and then to Hot Ranges) to identify the range IDs and table/indexes causing the issue.</p>
</li>
<li><p>Examine how the key space is being used. If your table/index primary key is monotonically increasing (for example, timestamps or serial IDs), the writes may target a narrow portion of the data, causing a hotspot. The docs suggest using hash-sharded indexes or distributing writes across the key-space.</p>
</li>
<li><p>Ensure load is balanced across nodes: avoid “one node doing most of the work”. If needed, add nodes or ensure range distribution/lease-holder movement is happening.</p>
</li>
<li><p>Monitor your write-versus-read workload. If writes are heavy, they may cause queuing for reads even when resources appear OK. So look at write-heavy traffic patterns and try reducing the number of writes (if possible).</p>
</li>
</ul>
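<p>To see why hashing helps with monotonically increasing keys, here’s a conceptual Python sketch. It is <em>not</em> how CockroachDB implements hash-sharded indexes: the CRC32 hash and the bucket count of 8 are stand-ins chosen only for this illustration. The idea is that putting a hash-derived bucket in front of the key scatters consecutive IDs instead of piling them onto the end of the key space.</p>
<pre><code class="lang-python">import zlib
from collections import Counter

BUCKETS = 8  # hypothetical bucket count, chosen only for this illustration


def bucket_for(key: int) -> int:
    """Stand-in hash: CRC32 of the key, reduced to a bucket number."""
    return zlib.crc32(str(key).encode()) % BUCKETS


sequential_ids = range(1_000_000, 1_001_000)  # 1,000 monotonically increasing keys

# Without sharding, every new key sorts right after the previous one,
# so all 1,000 writes land at the tail end of the key space.
# With a bucket prefix, the same writes are grouped by bucket instead.
spread = Counter(bucket_for(key) for key in sequential_ids)
for bucket, writes in sorted(spread.items()):
    print(f"bucket {bucket}: {writes} writes")
</code></pre>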
<h4 id="heading-note">⚠️ Note</h4>
<p>Learning everything about hotspots, key visualizers, and range splitting is a bit advanced. For those wanting to dive deeper: see the CockroachDB <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/performance-recipes">Performance Recipes page</a>.</p>
<h3 id="heading-understanding-disk-speed-iops-amp-throughput-across-cloud-providers">Understanding Disk Speed (IOPS &amp; Throughput) Across Cloud Providers</h3>
<p>So far, we’ve talked about how disk speed affects CockroachDB’s performance – especially how Google Cloud measures it. But it’s important to know that <strong>each cloud provider has its own way of measuring and limiting disk performance</strong> (IOPS and throughput).</p>
<p>So, while our earlier examples focused on Google Cloud, similar logic applies to AWS, Azure, and even DigitalOcean, just with different formulas and limits.</p>
<h4 id="heading-for-google-cloud">For Google Cloud:</h4>
<p>These guides break down how disk performance works:</p>
<ul>
<li><p><a target="_blank" href="https://cloud.google.com/compute/docs/disks/performance">Persistent Disk performance overview</a>: explains how baseline IOPS and throughput are calculated and the per-instance caps.</p>
</li>
<li><p><a target="_blank" href="https://docs.cloud.google.com/compute/docs/disks/persistent-disks">About Persistent Disks</a>: quick definitions of <code>pd-standard</code> (HDD), <code>pd-balanced</code> (SSD), and <code>pd-ssd</code> (SSD).</p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/compute/docs/disks/optimizing-pd-performance">Optimize PD performance</a>: shows how disk size, machine series, and tuning can affect performance.</p>
</li>
</ul>
<h4 id="heading-for-aws-ebs">For AWS (EBS):</h4>
<p>AWS’s Elastic Block Store (EBS) has several disk types:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html">EBS volume types</a>: overview of all SSD and HDD types (<code>gp3</code>, <code>gp2</code>, <code>io2</code>, and so on).</p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html">General Purpose SSD (gp3)</a>: lets you provision custom IOPS and throughput for your disks (about 0.25 MiB/s per IOPS, up to 2,000 MiB/s).</p>
</li>
</ul>
<h4 id="heading-for-azure-managed-disks">For Azure (Managed Disks):</h4>
<p>Azure disks also vary by type and size:</p>
<ul>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types">Disk types overview</a>: compares Standard HDD, Standard SSD, Premium SSD, Premium SSD v2, and Ultra Disk.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-deploy-premium-v2">Premium SSD v2</a>: lets you independently set IOPS and throughput for your disks.</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/disks-performance">VM &amp; disk performance</a>: lists per-VM IOPS and throughput caps.</p>
</li>
</ul>
<h4 id="heading-for-digitalocean">For DigitalOcean:</h4>
<p>DigitalOcean offers simpler storage setups:</p>
<ul>
<li><p><a target="_blank" href="https://docs.digitalocean.com/products/volumes/">Volumes overview</a>: explains block storage and NVMe details.</p>
</li>
<li><p><a target="_blank" href="https://docs.digitalocean.com/products/volumes/details/limits/">Volume Limits</a>: shows per-Droplet IOPS and throughput caps (including burst windows).</p>
</li>
</ul>
<h3 id="heading-downsizing-the-cluster-reducing-replicas">Downsizing the Cluster (Reducing Replicas)</h3>
<p>Now that we’ve seen how to scale up our CockroachDB cluster, let’s look at how to scale it down safely and correctly.</p>
<p>Let’s assume we scaled our cluster from 3 replicas to 5 replicas earlier (to handle more workload).</p>
<p>PS: If your CockroachDB pods were crashing often, you might need to increase the CPU and memory limits in the Helm chart configuration, like this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">5</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"2Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span> <span class="hljs-comment"># We can keep the memory requests and limits inconsistent for now, since we're in a development environment</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Then, you update the cluster using:</p>
<pre><code class="lang-bash">helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
<p>After a few minutes, you can confirm the newly added replicas by running <code>kubectl get pods</code>. You should now see five CockroachDB pods running.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762612478598/dee9f9e7-6b31-4b06-aed3-e2b0b97268fd.png" alt="The newly added CockroachDB replicas" class="image--center mx-auto" width="526" height="139" loading="lazy"></p>
<p>Also, check your CockroachDB Admin UI – the new nodes should now appear in the cluster overview.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762612539734/30e01a7d-3d2b-4160-be90-2988a161d87d.png" alt="Newly added nodes in the cluster" class="image--center mx-auto" width="1502" height="728" loading="lazy"></p>
<p>P.S: You might experience some issues when upscaling your cluster, especially if you don’t have sufficient memory and CPU on your PC or wherever you’re running your Kubernetes cluster.</p>
<h3 id="heading-the-wrong-way-to-downscale">⚠️ The Wrong Way to Downscale</h3>
<p>Now, what if your workload reduces and you’d like to cut costs by scaling down from 5 replicas back to 3?</p>
<p>You might think, <em>“Oh, I’ll just reduce the number of replicas in the Helm chart from 5 to 3 and redeploy.”</em> But hold on, that’s very wrong! 😅</p>
<p>Scaling up CockroachDB is simple…but scaling down must be done carefully, for reasons I’ll explain next.</p>
<h3 id="heading-decommissioning-a-node-before-scaling-down-the-cluster">Decommissioning a Node Before Scaling Down the Cluster</h3>
<p>Before you go ahead and reduce the number of replicas in your CockroachDB cluster, it’s important to follow the right process.</p>
<p>You <em>can’t</em> just go from 5 replicas down to 3 and expect everything to go smoothly. There are steps you must take.</p>
<h4 id="heading-why-you-cant-just-scale-from-5-to-3-instantly">Why you can’t just scale from 5 to 3 instantly</h4>
<p>If you reduce your cluster size too quickly, you might:</p>
<ul>
<li><p>Lose data redundancy or fail to meet the required replication factor.</p>
</li>
<li><p>Cause data rebalancing to happen under heavy load, which can slow queries.</p>
</li>
<li><p>Put your cluster into a state where certain ranges or data replicas don’t have enough copies to remain fault-tolerant.</p>
</li>
</ul>
<h4 id="heading-the-correct-approach-decommission-first-then-scale-down-one-node-at-a-time">✅ The correct approach: Decommission first, then scale down one node at a time</h4>
<p>Here’s the safe way to downscale:</p>
<ol>
<li><p><strong>Decommission</strong> the node you plan to remove.</p>
</li>
<li><p>Once decommissioning is complete, <strong>reduce the replica count</strong> (for example, from 5 to 4).</p>
</li>
<li><p>Delete the disk/PVC tied to that removed node.</p>
</li>
<li><p>Repeat the process (remove one node at a time) until you reach your target size (for example, down to 3 replicas).</p>
</li>
</ol>
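<p>The loop above can be sketched in Python as a simple plan generator. It only builds a list of steps and doesn’t touch your cluster. The pod names assume the <code>crdb-cockroachdb-N</code> pattern implied by the PVC names earlier, and the function name is made up for this example. It starts with the highest-numbered pod because that’s the one a StatefulSet removes first when you lower the replica count.</p>
<pre><code class="lang-python">def downscale_plan(current: int, target: int, release: str = "crdb") -> list[str]:
    """List the steps for shrinking the StatefulSet one node at a time."""
    steps = []
    for replicas in range(current, target, -1):
        ordinal = replicas - 1  # StatefulSets remove the highest-numbered pod first
        pod = f"{release}-cockroachdb-{ordinal}"
        steps.append(f"decommission the node running in pod {pod}")
        steps.append(f"set statefulset.replicas to {replicas - 1} and run helm upgrade")
        steps.append(f"delete PVC datadir-{pod}")
    return steps


for number, step in enumerate(downscale_plan(5, 3), start=1):
    print(f"{number}. {step}")
</code></pre>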
<h4 id="heading-step-by-step-decommission-the-5th-node-before-scaling-5-to-4">Step-by-step: Decommission the 5th node (before scaling 5 to 4)</h4>
<ol>
<li><p><strong>Create a client pod</strong> to run CockroachDB commands.<br> Create a file named <code>cockroachdb-client.yml</code> with this content:</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
 <span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
 <span class="hljs-attr">metadata:</span>
   <span class="hljs-attr">name:</span> <span class="hljs-string">cockroachdb-client</span>
 <span class="hljs-attr">spec:</span>
   <span class="hljs-attr">serviceAccountName:</span> <span class="hljs-string">&lt;SA&gt;</span>
   <span class="hljs-attr">containers:</span>
     <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">cockroachdb-client</span>
       <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
       <span class="hljs-attr">imagePullPolicy:</span> <span class="hljs-string">IfNotPresent</span>
       <span class="hljs-attr">command:</span>
         <span class="hljs-bullet">-</span> <span class="hljs-string">sleep</span>
         <span class="hljs-bullet">-</span> <span class="hljs-string">"2147483648"</span>
   <span class="hljs-attr">terminationGracePeriodSeconds:</span> <span class="hljs-number">300</span>
</code></pre>
<p> Replace <code>&lt;SA&gt;</code> with your CockroachDB service account name (find it via <code>kubectl get sa -l app.kubernetes.io/name=cockroachdb</code>).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620657038/34d5eb4b-de16-4e8a-b85c-1e7bf6b76172.png" alt="The CockroachDB service account details" class="image--center mx-auto" width="791" height="55" loading="lazy"></p>
</li>
<li><p>Apply the manifest:</p>
<pre><code class="lang-bash"> kubectl apply -f cockroachdb-client.yml
</code></pre>
</li>
<li><p>Confirm the pod is running:</p>
<pre><code class="lang-bash"> kubectl get pods
</code></pre>
<p> You should see <code>cockroachdb-client</code>.</p>
</li>
<li><p>Exec into the client pod:</p>
<pre><code class="lang-bash"> kubectl exec -it cockroachdb-client -- bash
</code></pre>
</li>
<li><p>Get the list of nodes and IDs:</p>
<pre><code class="lang-bash"> ./cockroach node status --insecure --host &lt;SERVICE_NAME&gt;
</code></pre>
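<p> Find your service name by running <code>kubectl get svc -l app.kubernetes.io/component=cockroachdb</code> from your local terminal (not inside the client pod). In our case it’s <code>crdb-cockroachdb-public</code>.</p>
<p> You’ll see nodes with IDs 1, 2, 3, 4, 5. Each maps to a replica pod like <code>crdb-cockroachdb-0</code>, <code>-1</code>, <code>-2</code>, <code>-3</code>, <code>-4</code>. Node IDs are assigned in the order nodes join the cluster, so check the address column to confirm which ID belongs to which pod rather than assuming the numbers line up.</p>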
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620790692/af8d382e-71db-4eab-af7a-a3491d98c8a8.png" alt="The nodes in the CockroachDB cluster" class="image--center mx-auto" width="1658" height="299" loading="lazy"></p>
</li>
<li><p><strong>Decommission the node with the highest index</strong> (since Kubernetes will remove the highest-numbered replica when scaling down).<br> For example, if you’re removing the pod <code>crdb-cockroachdb-4…</code>, and the node ID is 5:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620838125/b51856cb-2fbb-4b24-ba41-21f572c7678c.png" alt="The node to be decommissioned" class="image--center mx-auto" width="527" height="38" loading="lazy"></p>
<p> Run the command below to decommission the 5th node.</p>
<pre><code class="lang-bash"> ./cockroach node decommission 5 --host crdb-cockroachdb-public --insecure
</code></pre>
</li>
<li><p>Navigate to the CockroachDB dashboard, and monitor until the node status shows as <code>decommissioned</code>.<br> In the CockroachDB Console’s Cluster Overview page, you’ll see removed nodes listed under “Recently Decommissioned Nodes”.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762620923692/e678b21b-e2cc-4fe5-bd5b-46c4b0248958.png" alt="Recently decommissioned nodes in the CockroachDB Console" class="image--center mx-auto" width="1335" height="734" loading="lazy"></p>
</li>
<li><p><strong>Scale down the replicas</strong> in your Helm values file:</p>
<pre><code class="lang-yaml"> <span class="hljs-attr">statefulset:</span>
   <span class="hljs-attr">replicas:</span> <span class="hljs-number">4</span>
 <span class="hljs-string">...</span>
</code></pre>
<p> Then run:</p>
<pre><code class="lang-bash"> helm upgrade crdb cockroachdb/cockroachdb -f cockroachdb-values.yml
</code></pre>
</li>
<li><p>Verify pods:</p>
<pre><code class="lang-bash"> kubectl get pods
</code></pre>
<p> You should now see 4 CockroachDB replica pods.</p>
</li>
<li><p><strong>Delete the PVC</strong> for the removed node (to avoid paying for storage you’re no longer using):</p>
<pre><code class="lang-bash"> kubectl delete pvc datadir-crdb-cockroachdb-4
</code></pre>
</li>
<li><p>Repeat the process for the next node if you want to go from 4 to 3 replicas: decommission the node running on <code>crdb-cockroachdb-3</code> next (node ID 4 in our case), scale to 3, delete its PVC, and so on.</p>
</li>
</ol>
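<p>Steps 5 and 6 above depend on matching a pod name to its CockroachDB node ID. If you’d rather not read that off the table by eye, the sketch below does the lookup from CSV output (for example, from <code>./cockroach node status --insecure --host crdb-cockroachdb-public --format=csv</code>). The helper name <code>node_id_for_pod</code> and the sample rows are made up for illustration, and the sketch assumes the output has <code>id</code> and <code>address</code> columns. The sample deliberately shows IDs that don’t line up with pod ordinals.</p>

```python
import csv
import io
from typing import Optional


def node_id_for_pod(status_csv: str, pod_name: str) -> Optional[int]:
    """Find the node ID whose address starts with the given pod name."""
    for row in csv.DictReader(io.StringIO(status_csv)):
        # Strip the port, then match the pod name as the first DNS label.
        host = row["address"].split(":")[0]
        if host == pod_name or host.startswith(pod_name + "."):
            return int(row["id"])
    return None


# Made-up sample output; your addresses and IDs will differ.
SAMPLE = """id,address,is_available,is_live
1,crdb-cockroachdb-0.crdb-cockroachdb.default.svc.cluster.local:26257,true,true
2,crdb-cockroachdb-1.crdb-cockroachdb.default.svc.cluster.local:26257,true,true
3,crdb-cockroachdb-2.crdb-cockroachdb.default.svc.cluster.local:26257,true,true
4,crdb-cockroachdb-4.crdb-cockroachdb.default.svc.cluster.local:26257,true,true
5,crdb-cockroachdb-3.crdb-cockroachdb.default.svc.cluster.local:26257,true,true
"""

print(node_id_for_pod(SAMPLE, "crdb-cockroachdb-4"))
```

<p>Whatever ID this returns for the highest-numbered pod is the one to pass to <code>cockroach node decommission</code>.</p>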
<p>After you’re done, you’ll have the target state (for example, 3 nodes) safely and cleanly without causing cluster instability or data loss.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762621007089/cf7fce07-a3a6-4b01-9536-1d5476c2119e.png" alt="Node status on the CockroachDB dashboard after scaling down to 3 nodes" class="image--center mx-auto" width="1314" height="705" loading="lazy"></p>
<p>To learn more about scaling down your CockroachDB nodes, visit the <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/scale-cockroachdb-kubernetes?filters=helm#remove-nodes">official CockroachDB docs</a>.</p>
<p>Note that you should <strong>NOT</strong> use a Horizontal Pod Autoscaler to scale your CockroachDB cluster up and down.</p>
<p>Remember, before scaling down, you need to <strong>DECOMMISSION THE NODES FIRST</strong>, and <strong>scale down ONE AT A TIME</strong>!</p>
<p>A Horizontal Pod Autoscaler doesn’t follow this process: it simply changes the replica count. So rather than auto-scaling the number of replicas, it's best to keep it fixed, for example at 3, 5, or 7.</p>
<p>Then set up a Vertical Pod Autoscaler to scale their CPU and RAM. (Remember to set the memory and CPU requests and limits to the same quantity to prevent eviction, as explained earlier.)</p>
<h2 id="heading-what-to-consider-when-deploying-cockroachdb-on-google-kubernetes-engine-gke">What to Consider When Deploying CockroachDB on Google Kubernetes Engine (GKE) ☁️</h2>
<p>Up until now we’ve been working in a <strong>development environment</strong> (using Minikube, local setups), testing and learning.</p>
<p>Now we’re ready to move into <strong>production mode 🤓</strong>. And one of the best places to host CockroachDB in production is on GKE.</p>
<p>In this section, we’ll cover GKE-specific considerations, such as storage classes, load balancers, networking, and how to secure our CockroachDB cluster on GKE using mTLS for authenticating our clients and encrypting any data sent to and from our CockroachDB cluster.</p>
<h3 id="heading-creating-your-gke-cluster">Creating Your GKE Cluster</h3>
<p>To get started, head over to the <a target="_blank" href="https://console.cloud.google.com/"><strong>Google Cloud Console</strong></a>.</p>
<p>In the search bar at the top, type “Kubernetes” and click on “Kubernetes Engine” from the dropdown.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836788168/0d509529-69fb-4308-ba05-6a1426ee7fe1.png" alt="Searching the Kubernetes Engine resource" class="image--center mx-auto" width="1381" height="183" loading="lazy"></p>
<p>You’ll be taken to the Kubernetes Engine page. On the left sidebar, click “Clusters.” Then click the “Create” button at the top.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836843514/fc6d59a2-5b9d-4dee-9fea-7bbb7fc2a023.png" alt="Creating a new cluster" class="image--center mx-auto" width="1352" height="434" loading="lazy"></p>
<p>💡 <strong>Note:</strong> You’ll need to enable the <strong>Compute Engine API</strong> before you can create a GKE cluster. If you haven’t done that yet, Google Cloud will automatically redirect you to a page where you can enable it. Just click “Enable”, then return to the cluster page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763998084001/3ecbe47c-3def-4f9c-bc80-dabe2c0002c8.png" alt="Enabling the Compute Engine API" class="image--center mx-auto" width="1090" height="537" loading="lazy"></p>
<p>You can also learn more about enabling APIs in Google Cloud here: <a target="_blank" href="https://docs.cloud.google.com/endpoints/docs/openapi/enable-api">Enable APIs in Google Cloud</a>.</p>
<p>Once you’re back, you’ll see the cluster creation page. If it defaults to Autopilot, click “Switch to Standard cluster” in the top-right corner. This gives you more control over node settings.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836938958/a2c35e79-6404-4c3a-a821-94d4ce926839.png" alt="Switching to Standard Cluster settings" class="image--center mx-auto" width="1153" height="676" loading="lazy"></p>
<p>Under Cluster basics, give your cluster a name – something like <code>cockroachdb-tutorial</code> works great! Then, set Location type to Zonal (that’s fine for now).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762836985443/eb7b1f79-66e3-4ca4-bfe3-842c5571509b.png" alt="Configuring Zonal clusters" class="image--center mx-auto" width="864" height="803" loading="lazy"></p>
<p>On the left sidebar, go to “Node pools.” You’ll see a default pool already added.</p>
<ul>
<li><p>Keep the name as is.</p>
</li>
<li><p>Set the Number of nodes to 1.</p>
</li>
<li><p>Enable the Cluster autoscaler option (so it can scale up automatically later).</p>
</li>
<li><p>Set the Maximum number of Nodes to 10, and the minimum to 0.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762918866561/89a00b2c-46e8-440d-8662-77386cc2cf0e.png" alt="Modifying our default node pool, the cluster autoscaler, etc" class="image--center mx-auto" width="867" height="822" loading="lazy"></p>
</li>
</ul>
<p>Next, click the dropdown arrow beside “default-pool” and select “Nodes.” Here, set up your node specifications:</p>
<ul>
<li><p><strong>VM family:</strong> <code>E2</code></p>
</li>
<li><p><strong>Machine type:</strong> <code>Custom</code></p>
</li>
<li><p><strong>vCPUs:</strong> 2</p>
</li>
<li><p><strong>Memory:</strong> 7 GB</p>
</li>
<li><p><strong>Boot disk type:</strong> Standard persistent disk</p>
</li>
<li><p><strong>Disk size:</strong> 50 GB</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762837157043/89da8297-8ecc-4369-aef5-c3b0e75e37be.png" alt="Configuring the E2 Machine type" class="image--center mx-auto" width="860" height="617" loading="lazy"></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762920102117/173a1d66-d31b-49e3-835b-436ec2781b49.png" alt="Configuring our default pool CPU, RAM, and disk" class="image--center mx-auto" width="870" height="616" loading="lazy"></p>
</li>
</ul>
<p>When all that’s set, click “Create.” Your cluster will start provisioning.</p>
<h3 id="heading-connecting-to-your-gke-cluster">Connecting to your GKE cluster</h3>
<p>Once your GKE cluster creation is complete (this might take a few minutes), you’ll see something like this in the console:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844143298/042cc870-82ae-4981-b7c8-d80b187f37a9.png" alt="Accessing our new cluster page" class="image--center mx-auto" width="1267" height="537" loading="lazy"></p>
<p>Next, click the “Connect” link at the top of the page. A modal will pop up. Copy the CLI command you see.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844213835/119b603c-26c3-46ee-83e1-8feba78031a7.png" alt="Getting the command to access the cluster" class="image--center mx-auto" width="1258" height="710" loading="lazy"></p>
<p>It’ll look something like:</p>
<pre><code class="lang-bash">gcloud container clusters get-credentials cockroachdb-tutorial --zone us-central1<span class="hljs-_">-a</span> --project &lt;PROJECT_NAME&gt;
</code></pre>
<p>📌 <strong>Note:</strong> To run this command successfully, you need to have the <code>gcloud</code> CLI tool installed. If you don’t have it yet, visit <a target="_blank" href="https://docs.cloud.google.com/sdk/docs/install">Install Google Cloud SDK</a> and pick the steps for your OS.</p>
<p>After installing the <code>gcloud</code> CLI, run:</p>
<pre><code class="lang-bash">gcloud auth login
</code></pre>
<p>This authenticates your terminal with your Google Cloud account so you can access the cluster securely.</p>
<p>Once your terminal is authenticated, run the command you copied earlier. You should see something like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762844890936/12e6d8a7-b0ae-44d1-a77c-aeb118ba269b.png" alt="The output after connecting your terminal to the newly created Kubernetes cluster" class="image--center mx-auto" width="1293" height="54" loading="lazy"></p>
<p>Now run the command to retrieve your pods, <code>kubectl get po</code>. This will retrieve the pods from your new cluster on Google Kubernetes Engine, not Minikube.</p>
<p>For now, we’ve not deployed anything yet, so the namespace should be empty.</p>
<p>But we should have at least 1 worker node available. Run the <code>kubectl get nodes</code> command to view it. You should see something similar to this (GKE takes care of our control plane for us, so when we view the nodes, we’ll only see the worker nodes).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762917947091/c29eb598-1723-43d0-a77f-c6611d04d3d8.png" alt="The available nodes in the GKE cluster" class="image--center mx-auto" width="702" height="55" loading="lazy"></p>
<h3 id="heading-deploying-cockroachdb-in-production-on-gke">Deploying CockroachDB in Production (on GKE)</h3>
<p>Now that we’ve successfully created our Google Kubernetes Engine (GKE) cluster, it’s time to deploy our CockroachDB cluster in it – this time, in production mode.</p>
<p>Unlike our earlier Minikube setup (which we used for local development), deploying to GKE introduces new considerations like security, storage classes, and authentication methods – all tailored for a real-world production environment.</p>
<p>To get started, create a new file called <code>cockroachdb-production.yml</code>, and paste the following configuration inside:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">statefulset:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
    <span class="hljs-attr">limits:</span>
      <span class="hljs-attr">memory:</span> <span class="hljs-string">"3Gi"</span>
      <span class="hljs-attr">cpu:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">serviceAccount:</span>
    <span class="hljs-attr">create:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"crdb-cockroachdb"</span>
    <span class="hljs-attr">annotations:</span>
      <span class="hljs-attr">iam.gke.io/gcp-service-account:</span> <span class="hljs-string">&lt;GOOGLE_SERVICE_ACCOUNT&gt;</span>

<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">10Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">premium-rwo</span>

<span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>

<span class="hljs-attr">init:</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app.kubernetes.io/component:</span> <span class="hljs-string">init</span>
  <span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">wait:</span>
      <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>Replace the placeholder <code>&lt;GOOGLE_SERVICE_ACCOUNT&gt;</code> with the <strong>CockroachDB backup service account</strong> you created earlier (in the “Backing Up CockroachDB to Google Cloud Storage” section). It should look something like this <code>cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</code>.</p>
<h3 id="heading-understanding-the-configuration">Understanding the Configuration</h3>
<p>Let’s break down what’s happening in this production Helm values configuration and how it differs from the one we used in Minikube.👇🏽</p>
<h4 id="heading-1-modified-the-statefulset-configuration">1. Modified the <code>statefulset</code> Configuration</h4>
<p>We’re allocating 3 GiB of RAM and 1 vCPU to each replica, both as requests and limits.</p>
<p>Setting requests equal to limits places each pod in Kubernetes’ Guaranteed QoS class, so each node gets the resources it asked for and is among the last pods to be evicted when a VM runs low on resources.</p>
<p>We also defined a <strong>service account</strong> and annotated it with a GCP service account using the <code>iam.gke.io/gcp-service-account</code> annotation.</p>
<p>This annotation allows CockroachDB to securely access Google Cloud services (like Google Cloud Storage) without using static JSON key files (key.json), thanks to a GKE feature called <strong>Workload Identity</strong>.</p>
<p>In production, we let GKE handle authentication to Google services instead of mounting key files.</p>
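<p>If you’re curious why matching requests and limits matters, here’s a simplified Python sketch of how Kubernetes assigns a QoS class to a single-container pod. The function name <code>qos_class</code> is hypothetical, and the sketch compares quantities as plain strings, so it treats <code>"1"</code> and <code>"1000m"</code> as different even though Kubernetes considers them equal.</p>

```python
from typing import Dict


def qos_class(requests: Dict[str, str], limits: Dict[str, str]) -> str:
    """Simplified QoS classification for a pod with one container."""
    if not requests and not limits:
        return "BestEffort"

    for resource in ("cpu", "memory"):
        limit = limits.get(resource)
        # Kubernetes defaults a missing request to the limit.
        request = requests.get(resource, limit)
        if limit is None or request != limit:
            return "Burstable"
    return "Guaranteed"


# The values from our production file: requests and limits match.
print(qos_class({"cpu": "1", "memory": "3Gi"}, {"cpu": "1", "memory": "3Gi"}))
```

<p>With the values from <code>cockroachdb-production.yml</code>, the pod lands in the class that Kubernetes evicts last.</p>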
<h4 id="heading-2-removed-podsecuritycontext">2. Removed <code>podSecurityContext</code></h4>
<p>In Minikube, we included this section:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">podSecurityContext:</span>
  <span class="hljs-attr">fsGroup:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsUser:</span> <span class="hljs-number">1000</span>
  <span class="hljs-attr">runAsGroup:</span> <span class="hljs-number">1000</span>
<span class="hljs-string">...</span>
</code></pre>
<p>We did that to give CockroachDB permission to access our local disk for persistent storage. But in GKE, this isn’t needed. Google Cloud handles storage mounting securely on our behalf, so we can safely omit this part.</p>
<h4 id="heading-3-removed-podantiaffinity-and-nodeselector">3. Removed <code>podAntiAffinity</code> and <code>nodeSelector</code></h4>
<p>In our Minikube deployment, we used:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">podAntiAffinity:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">""</span>
<span class="hljs-attr">nodeSelector:</span>
  <span class="hljs-attr">kubernetes.io/hostname:</span> <span class="hljs-string">minikube</span>
<span class="hljs-string">...</span>
</code></pre>
<p>That was just to <strong>force all CockroachDB instances to run on the same node</strong> on Minikube.</p>
<p>But in production, we <em>want</em> each replica on a different VM. This ensures high availability: even if one VM fails, only one CockroachDB replica is affected, and the cluster stays active.</p>
<p>Since our cluster uses a replication factor of 3, at least 2 replicas (a quorum) need to be active for the database to stay online. Otherwise, the affected data becomes unavailable until a quorum is restored 🥲.</p>
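<p>The quorum arithmetic is simple enough to check yourself. This small Python sketch (the function names are just for illustration) computes the majority needed for a given replication factor and how many replicas you can lose while still keeping that majority.</p>

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be down while a majority remains."""
    return replication_factor - quorum(replication_factor)


for rf in (3, 5, 7):
    print(f"replication factor {rf}: quorum {quorum(rf)}, "
          f"can lose {tolerated_failures(rf)}")
```

<p>For a replication factor of 3, the quorum is 2, so the cluster can lose one replica and keep serving requests.</p>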
<h4 id="heading-4-removed-env-volumes-and-volumemounts">4. Removed <code>env</code>, <code>volumes</code>, and <code>volumeMounts</code></h4>
<p>In Minikube, we had to manually mount the Service Account key:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">env:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_APPLICATION_CREDENTIALS</span>
    <span class="hljs-attr">value:</span> <span class="hljs-string">/var/run/gcp/key.json</span>
<span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
    <span class="hljs-attr">secret:</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">gcs-key</span>
<span class="hljs-attr">volumeMounts:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gcp-sa</span>
    <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/var/run/gcp</span>
    <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This was needed so CockroachDB could access our Google Cloud Storage bucket for backups.</p>
<p>But in production, we don’t use key files. Instead, we use a GKE feature called Workload Identity.</p>
<p>It securely binds a Kubernetes Service Account to a Google Service Account, giving our CockroachDB pods the same permissions as the GCP account: no keys, no secrets, and much safer 🔒</p>
<h4 id="heading-5-updated-storagepersistentvolumestorageclass">5. Updated <code>storage.persistentVolume.storageClass</code></h4>
<p>In Minikube, we used a standard disk:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">5Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">standard</span>
<span class="hljs-string">...</span>
</code></pre>
<p>But for production, we’re switching to a faster SSD:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">storage:</span>
  <span class="hljs-attr">persistentVolume:</span>
    <span class="hljs-attr">size:</span> <span class="hljs-string">10Gi</span>
    <span class="hljs-attr">storageClass:</span> <span class="hljs-string">premium-rwo</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This uses Google Cloud’s <code>pd-ssd</code> disk type which is the recommended choice for CockroachDB due to its <strong>high IOPS</strong> (read/write operations per second) and <strong>throughput</strong>. This gives our cluster faster read and write speeds under load, leading to better performance.</p>
<h4 id="heading-6-enabled-tls-for-secure-communication">6. Enabled TLS for Secure Communication</h4>
<p>In development, we disabled TLS:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">tls:</span>
  <span class="hljs-attr">enabled:</span> <span class="hljs-literal">false</span>
</code></pre>
<p>That made it easier and simpler to connect without dealing with certificates.</p>
<p>But in production, security is non-negotiable. We’re enabling TLS to ensure that all communication with CockroachDB is encrypted in transit, and that only clients with <strong>valid certificates</strong> (signed by the same authority) can connect. This is <strong>mutual TLS (mTLS)</strong> authentication.</p>
<p>mTLS ensures that both sides (client and server) prove who they are, preventing impersonation or man-in-the-middle attacks. It’s one of the strongest ways to secure a production database connection.</p>
<p>To learn more about TLS and mTLS encryption, check out:</p>
<ul>
<li><p><a target="_blank" href="https://www.freecodecamp.org/news/understanding-website-encryption/">Understanding Website Encryption (freeCodeCamp)</a></p>
</li>
<li><p><a target="_blank" href="https://medium.com/@LukV/mutual-tls-mtls-a-deep-dive-into-secure-client-server-communication-bbb83f463292">Mutual TLS Deep Dive (Medium)</a></p>
</li>
</ul>
<h3 id="heading-installing-the-cockroachdb-cluster-on-gke">Installing the CockroachDB Cluster on GKE</h3>
<p>We’ll use the values file you created (<code>cockroachdb-production.yml</code>) and deploy our CockroachDB cluster in our GKE cluster using Helm.</p>
<h4 id="heading-deploy-the-cluster">Deploy the cluster</h4>
<p>Run the following command:</p>
<pre><code class="lang-bash">helm install crdb cockroachdb/cockroachdb -f cockroachdb-production.yml
</code></pre>
<p>This command tells Helm to install a release named <code>crdb</code> using the <code>cockroachdb/cockroachdb</code> chart with your custom production-values file.</p>
<p>This step will take a few minutes. GKE will spin up 3 (or more) worker nodes to host the CockroachDB replicas.</p>
<p>Thanks to pod anti-affinity rules, you’ll typically see <strong>one replica pod per VM</strong> (which improves fault tolerance).</p>
<h4 id="heading-verify-the-pods">Verify the pods</h4>
<p>Once provisioning is done, check the pods:</p>
<pre><code class="lang-bash">kubectl get pods
</code></pre>
<p>You should see three CockroachDB replica pods (for example: <code>crdb-cockroachdb-0</code>, <code>crdb-cockroachdb-1</code>, <code>crdb-cockroachdb-2</code>) in <code>Running</code> status.</p>
<h4 id="heading-verify-the-storage-class-ssd">Verify the storage class (SSD)</h4>
<p>Now check the persistent volume claims to confirm they’re using the fast SSD storage class you requested:</p>
<pre><code class="lang-bash">kubectl get pvc
</code></pre>
<p>Look for your PVCs (persistent volume claims) and check the <code>STORAGECLASS</code> column. You should see something like <code>premium-rwo</code> instead of <code>standard</code> or <code>standard-rwo</code>. This confirms that your replicas are using the high-performance disk type you configured.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762928441524/d7e3d17f-c144-468f-8cc5-d71628ac6a3b.png" alt="The CockroachDB replicas and disk in production" class="image--center mx-auto" width="1008" height="209" loading="lazy"></p>
<p>📌 This is important, because in production you want good disk IOPS and throughput. Slower disks can bottleneck the database.</p>
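<p>If you want to script this check instead of reading the table, one option is to feed the output of <code>kubectl get pvc -o json</code> into a short Python helper. The sketch below is only an illustration: the function names and the sample JSON are made up, and it assumes each PVC reports its class under <code>spec.storageClassName</code>.</p>

```python
import json
from typing import Dict


def pvc_storage_classes(pvc_list_json: str) -> Dict[str, str]:
    """Map each PVC name to its storage class."""
    items = json.loads(pvc_list_json)["items"]
    return {
        item["metadata"]["name"]: item["spec"].get("storageClassName", "")
        for item in items
    }


def all_on_class(pvc_list_json: str, expected: str = "premium-rwo") -> bool:
    """True only if at least one PVC exists and all use the expected class."""
    classes = pvc_storage_classes(pvc_list_json)
    return bool(classes) and all(c == expected for c in classes.values())


# Made-up sample shaped like `kubectl get pvc -o json` output.
SAMPLE_PVCS = json.dumps({
    "items": [
        {"metadata": {"name": "datadir-crdb-cockroachdb-0"},
         "spec": {"storageClassName": "premium-rwo"}},
        {"metadata": {"name": "datadir-crdb-cockroachdb-1"},
         "spec": {"storageClassName": "premium-rwo"}},
    ]
})

print(all_on_class(SAMPLE_PVCS))
```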
<h3 id="heading-connecting-to-our-cockroachdb-cluster-now-that-tls-mtls-are-enabled">Connecting to Our CockroachDB Cluster (Now That TLS + mTLS Are Enabled)</h3>
<p>Now that we’ve enabled TLS encryption and mTLS authentication, let’s actually try connecting to the cluster so you can <em>see</em> what this security setup looks like in action.</p>
<p>We’ll break down in more detail what TLS and mTLS mean shortly. But for now, let’s jump straight into trying to connect – because once you see the behavior, the explanation becomes much easier to understand.</p>
<h4 id="heading-step-1-expose-the-cockroachdb-cluster-to-your-local-pc-using-port-forwarding">Step 1: Expose the CockroachDB Cluster to Your Local PC (Using Port Forwarding)</h4>
<p>Just like we've been doing from the start, we’ll expose our CockroachDB cluster through <strong>port-forwarding</strong>.</p>
<p>Open a new terminal window and run:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26259:26257
</code></pre>
<p>What this means:</p>
<ul>
<li><p>The first port (26259) is the port on your computer.</p>
</li>
<li><p>The second port (26257) is the port inside the CockroachDB cluster.</p>
</li>
<li><p>Format is: <code>&lt;YOUR_COMPUTER_PORT&gt;</code> <strong>:</strong> <code>&lt;COCKROACHDB_PORT&gt;</code></p>
</li>
</ul>
<p>So now, CockroachDB will be reachable locally at <code>localhost:26259</code>.</p>
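<p>Before opening a database tool, you can quickly confirm that the forwarded port is actually accepting connections. The Python sketch below only checks that a TCP connection can be opened. It does not speak the SQL protocol or TLS, and the helper name <code>is_port_open</code> is hypothetical.</p>

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 26259 is the local port used in the port-forward command above.
print(is_port_open("localhost", 26259))
```

<p>If this prints <code>False</code>, check that the <code>kubectl port-forward</code> command is still running in its own terminal.</p>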
<h4 id="heading-step-2-open-beekeeper-studio-and-create-a-fresh-connection">Step 2: Open Beekeeper Studio and Create a Fresh Connection</h4>
<p>If Beekeeper Studio is still connected to our old Minikube cluster, or you're not seeing the “new connection” screen, just press <code>Ctrl + Shift + N</code>. This opens a new connection window instantly.</p>
<h4 id="heading-step-3-enter-the-connection-details">Step 3: Enter the Connection Details</h4>
<p>Now fill in these fields:</p>
<ul>
<li><p><strong>Port:</strong> <code>26259</code></p>
</li>
<li><p><strong>User:</strong> <code>root</code></p>
</li>
<li><p><strong>Default Database:</strong> <code>defaultdb</code></p>
</li>
</ul>
<p>Now click Test Connection.</p>
<p>And boom! You should see a message telling you something like:</p>
<blockquote>
<p>“This cluster is running in secure mode. You must use SSL to connect.”</p>
</blockquote>
<p>It’ll look similar to this:👇🏾</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763193779864/f3e7abcb-34b0-4c21-8652-48a03e4ff6c9.png" alt="Trying to connect to the new CockroachDB cluster in insecure mode" class="image--center mx-auto" width="562" height="615" loading="lazy"></p>
<p>This is good: it means our CockroachDB cluster is officially in <strong>secure mode</strong>, and it’s rejecting any connection that doesn’t include proper TLS certificates.</p>
<h3 id="heading-connecting-via-mutual-tls-mtls-why-we-need-a-certificate-for-our-root-user">Connecting via Mutual TLS (mTLS) — Why We Need a Certificate for Our <code>root</code> User</h3>
<p>Now that our CockroachDB cluster is officially running in secure mode, we can’t just connect to it with a username and port anymore. CockroachDB won’t accept that.</p>
<p>To talk to it, <strong>we must connect using Mutual TLS (mTLS)</strong>.</p>
<p>Why? Because TLS alone only verifies identity in one direction (you verifying the server), even though it encrypts traffic both ways. mTLS verifies identity in both directions (you verify the server, and the server also verifies <em>you</em>).</p>
<p>Let’s break this down in simple, everyday English 👇🏾</p>
<h4 id="heading-why-tls-exists-in-the-first-place">Why TLS Exists in the First Place</h4>
<p>Whenever you send anything to CockroachDB, like a query, a connection, a password, whatever, it’s all data moving over a network – for example, the internet.</p>
<p>Without protection, anyone could intercept it and read the data being sent to your DB while it’s on its way.<br>TLS fixes that :)</p>
<p>✔️ The CockroachDB cluster has its own <strong>public key + private key</strong><br>✔️ It has a <strong>certificate</strong> that carries its public key<br>✔️ When you connect, the cluster sends you this certificate<br>✔️ Your database tool, for example Beekeeper, uses that certificate to check the server’s identity and to set up the keys that encrypt all traffic between you and the DB<br>✔️ Only CockroachDB holds the matching private key, so only it can complete that setup and read what you send</p>
<p>This gives you encryption and proof you’re really talking to CockroachDB, not some fake service pretending to be it.</p>
<h4 id="heading-why-mtls-exists-mutual-tls">Why mTLS Exists (Mutual TLS)</h4>
<p>Standard TLS only proves the server’s identity – CockroachDB. mTLS proves the identity of <strong>both sides</strong> – you and CockroachDB.</p>
<p>So CockroachDB also wants YOU to send your certificate.</p>
<p>But not just any certificate. Your certificate must be:</p>
<ul>
<li><p>Signed by <strong>THE SAME Certificate Authority (CA)</strong></p>
</li>
<li><p>Trusted by the CockroachDB cluster</p>
</li>
<li><p>Mapped to a CockroachDB user (like <code>root</code>)</p>
</li>
</ul>
<p>This is how CockroachDB says:</p>
<blockquote>
<p>“Let me see your certificate so I know you’re someone I should allow in.”</p>
</blockquote>
<p>And we reply:</p>
<blockquote>
<p>“Here is my certificate, signed by the same CA that signed yours.”</p>
</blockquote>
<p>At that point, both sides trust each other.</p>
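<p>To make the “signed by the same CA” idea concrete, here is a minimal, self-contained sketch using plain <code>openssl</code> in a throwaway directory. The names (<code>Demo CA</code>, <code>client.crt</code>, and so on) are made up for illustration and have nothing to do with the certificates the Helm chart created. It only shows the principle: a certificate signed by a CA verifies against that CA’s certificate.</p>
<pre><code class="lang-bash">set -eu

# Work in a throwaway directory so nothing touches your project files
workdir="$(mktemp -d)"
cd "$workdir"

# 1. Create a toy Certificate Authority (private key + self-signed certificate)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout ca.key -out ca.crt \
  -subj "/O=Demo/CN=Demo CA" -days 1

# 2. Create a client key and a certificate signing request with CN=root
openssl req -newkey rsa:2048 -nodes \
  -keyout client.key -out client.csr \
  -subj "/O=Demo/CN=root"

# 3. Sign the client certificate with the toy CA's private key
openssl x509 -req -in client.csr \
  -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out client.crt -days 1

# 4. Check that the client certificate chains back to the toy CA
openssl verify -CAfile ca.crt client.crt
</code></pre>
<p>The last command should report that <code>client.crt</code> is <code>OK</code>. A certificate signed by any other CA would fail this check, which is essentially the test CockroachDB applies to the certificate you present.</p>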
<p>If this still feels abstract, <a target="_blank" href="https://www.youtube.com/watch?v=EnY6fSng3Ew">watch this video</a>. It explains TLS beautifully.</p>
<h3 id="heading-lets-explore-our-clusters-certificate">Let’s Explore Our Cluster’s Certificate</h3>
<p>Remember that the Helm chart automatically created:</p>
<ul>
<li><p>The CockroachDB Certificate Authority</p>
</li>
<li><p>The CockroachDB node certificates</p>
</li>
<li><p>The keypairs used for encryption</p>
</li>
</ul>
<p>You can list all the CockroachDB-related Kubernetes secrets with:</p>
<pre><code class="lang-bash">kubectl get secrets
</code></pre>
<p>The one we're interested in is:</p>
<pre><code class="lang-bash">crdb-cockroachdb-node-secret
</code></pre>
<p>If you inspect this secret, you’ll see three keys inside:</p>
<ul>
<li><p><code>ca.crt</code>: the CA’s public certificate</p>
</li>
<li><p><code>tls.key</code>: the CockroachDB node’s private key</p>
</li>
<li><p><code>tls.crt</code>: the CockroachDB node certificate</p>
</li>
</ul>
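<p>One way to inspect the secret without printing its contents is <code>kubectl describe</code>, which lists each key together with its size in bytes. This sketch assumes the secret lives in your current namespace:</p>
<pre><code class="lang-bash"># Show the keys stored in the node secret (the values themselves are not printed)
kubectl describe secret crdb-cockroachdb-node-secret
</code></pre>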
<p>Now let’s decode the CockroachDB node certificate.</p>
<p>Run this:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-node-secret -o jsonpath=<span class="hljs-string">'{.data.tls\.crt}'</span> | base64 -d &gt; crdb-node.crt
</code></pre>
<p>This gives you the raw certificate (which looks like gibberish):</p>
<pre><code class="lang-bash">-----BEGIN CERTIFICATE-----
MIIEGDCCAwCgAwIBAgIQWgOPJa4OLoZZjcXLgDF3bjANBgkqhkiG9w0BAQsFADAr
...
-----END CERTIFICATE-----
</code></pre>
<p>Let’s decode it into something readable:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> ./crdb-node.crt -text -noout &gt; crdb-node.crt.decoded
</code></pre>
<p>Open the <code>crdb-node.crt.decoded</code> file. This is the <strong>human-readable</strong> CockroachDB node certificate.</p>
<p><strong>N.B.:</strong> You need to have the <code>openssl</code> tool installed in order to be able to make the certificate human-readable. If you don’t, <a target="_blank" href="https://github.com/openssl/openssl#download">install it following this tutorial</a>.</p>
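<p>If you only want a few fields instead of the full dump, <code>openssl x509</code> can print them individually. A small sketch, assuming <code>crdb-node.crt</code> is in the current directory:</p>
<pre><code class="lang-bash"># Print just the issuer, the subject, and the validity window
openssl x509 -in crdb-node.crt -noout -issuer -subject -dates
</code></pre>
<p>These are the same values you’ll find in the decoded file, just without the surrounding detail.</p>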
<h3 id="heading-understanding-the-certificate-sections-explained-super-simply">Understanding the Certificate Sections (Explained Super Simply)</h3>
<h4 id="heading-1-issuer">1. Issuer</h4>
<p>You’ll see something like:</p>
<pre><code class="lang-bash">Issuer: O = Cockroach, CN = Cockroach CA
</code></pre>
<p>This tells us:</p>
<ul>
<li><p>The certificate was signed by a Certificate Authority created by the Helm chart</p>
</li>
<li><p>The <strong>Organization (O)</strong> is “Cockroach”</p>
</li>
<li><p>The <strong>Common Name (CN)</strong> is “Cockroach CA”</p>
</li>
</ul>
<p>This basically means:</p>
<blockquote>
<p>“This certificate comes from the CockroachDB internal CA.”</p>
</blockquote>
<h4 id="heading-2-subject">2. Subject</h4>
<p>You’ll also see this:</p>
<pre><code class="lang-bash">Subject: O = Cockroach, CN = node
</code></pre>
<p>What does this mean?</p>
<p><strong>Organization = Cockroach</strong></p>
<ul>
<li><p>This simply groups all CockroachDB-generated certificates under one “organization label.”</p>
</li>
<li><p>It doesn’t refer to the company. It’s just a logical grouping created by CockroachDB’s built-in toolset.</p>
</li>
</ul>
<p><strong>Common Name = node</strong></p>
<ul>
<li><p>This tells CockroachDB that this certificate belongs to a <strong>cluster node</strong>, not a user or a client machine.</p>
</li>
<li><p>In CockroachDB, node certificates are used for:</p>
<ol>
<li><p>DB-to-DB communication</p>
</li>
<li><p>cluster gossip</p>
</li>
<li><p>handling incoming connections from clients (you)</p>
</li>
</ol>
</li>
</ul>
<p>So this certificate is saying:</p>
<blockquote>
<p>“Hi, I’m a CockroachDB node. Please trust me as part of the cluster.”</p>
</blockquote>
<h4 id="heading-3-extended-key-usage-eku">3. Extended Key Usage (EKU)</h4>
<p>Scroll down and you’ll see:</p>
<pre><code class="lang-bash">X509v3 Extended Key Usage:
    TLS Web Server Authentication
    TLS Web Client Authentication
</code></pre>
<p>This is <em>super important</em>, because it defines <strong>how</strong> this certificate is allowed to be used.</p>
<p>Let’s simplify it:</p>
<h4 id="heading-tls-web-server-authentication">TLS Web Server Authentication</h4>
<p>This means:</p>
<blockquote>
<p>“This certificate can be presented <strong>by a server</strong> to prove its identity.”</p>
</blockquote>
<p>In our case, the CockroachDB node uses this certificate to prove to you (the client) that it is the real CockroachDB server. Think of it like the server flashing its ID card before you agree to talk to it.</p>
<h4 id="heading-tls-web-client-authentication">TLS Web Client Authentication</h4>
<p>This means:</p>
<blockquote>
<p>“This certificate can also be used <strong>as a client certificate</strong>.”</p>
</blockquote>
<p>Why would a server have a client certificate? Well, because in CockroachDB, nodes (DBs) talk to each other. When node A connects to node B, node A is a <strong>client</strong>, and node B is a <strong>server</strong>.</p>
<p>So the same certificate serves two roles. Your local machine will use a different certificate, created specifically for your <code>root</code> user. We’ll generate that soon.</p>
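<p>To check the allowed usages without scrolling through the decoded file, you can ask <code>openssl</code> for that one extension. This is a sketch that assumes OpenSSL 1.1.1 or newer, since older versions don’t have the <code>-ext</code> option:</p>
<pre><code class="lang-bash"># Print only the Extended Key Usage extension of the node certificate
openssl x509 -in crdb-node.crt -noout -ext extendedKeyUsage
</code></pre>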
<h3 id="heading-creating-a-client-certificate-so-we-can-finally-connect-to-cockroachdb">Creating a Client Certificate (So We Can Finally Connect to CockroachDB)</h3>
<p>Now that we’ve seen how the CockroachDB node certificate works, let’s generate our client certificate – the one we’ll use to connect from Beekeeper Studio.</p>
<p>Remember: CockroachDB is running in secure mode, so it won’t let us in as <code>root</code> unless we present a valid client certificate signed by the cluster’s CA.</p>
<p>To fix that, let’s build a tiny Kubernetes pod whose only job is to create a certificate for our <code>root</code> SQL user.</p>
<h4 id="heading-step-1-create-a-file-called-gen-root-certyml">Step 1: Create a File Called <code>gen-root-cert.yml</code></h4>
<p>Paste this into it:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gen-root-cert</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gen</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sh"</span>, <span class="hljs-string">"-ec"</span>]
      <span class="hljs-attr">args:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">|
          mkdir -p /out
</span>
          <span class="hljs-comment"># Copy the CockroachDB cluster Certificate Authority certificate file `ca.crt` (for Mutual TLS authentication)</span>
          <span class="hljs-string">cp</span> <span class="hljs-string">/ca/ca.crt</span> <span class="hljs-string">/out/ca.crt</span>

          <span class="hljs-comment"># Create the client certificate and key pair for the SQL user 'root' using the CockroachDB cluster Certificate Authority private key `ca.key`</span>
          <span class="hljs-string">/cockroach/cockroach</span> <span class="hljs-string">cert</span> <span class="hljs-string">create-client</span> <span class="hljs-string">root</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--certs-dir=/out</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--ca-key=/ca/ca.key</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--lifetime=5h</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--overwrite</span>

          <span class="hljs-comment"># List the generated files</span>
          <span class="hljs-string">ls</span> <span class="hljs-string">-al</span> <span class="hljs-string">/out</span>

          <span class="hljs-comment"># Keep the pod alive so we can kubectl cp the files</span>
          <span class="hljs-string">sleep</span> <span class="hljs-number">3600</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> { <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>, <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/ca</span>, <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span> }
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"500Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
</code></pre>
<p>So how does this work?</p>
<p>We previously mentioned that the Helm chart created a secret, <code>crdb-cockroachdb-ca-secret</code>.</p>
<p>This secret contains:</p>
<ul>
<li><p>The Certificate Authority public certificate</p>
</li>
<li><p>The private key (used for signing)</p>
</li>
<li><p>The CA metadata</p>
</li>
</ul>
<p>CockroachDB requires that the server certificate (node cert) and the client certificate (your root cert) be signed by <strong>THE SAME CA</strong>, because that’s what lets each side trust the other.</p>
<p>So what do we do?</p>
<p>We mount the CA secret into the pod:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">volumes:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
    <span class="hljs-attr">secret:</span>
      <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
</code></pre>
<p>This gives the pod access to:</p>
<ul>
<li><p><code>/ca/ca.crt</code>: CA public certificate</p>
</li>
<li><p><code>/ca/ca.key</code>: CA <em>private</em> key</p>
</li>
</ul>
<p>And with these, we can sign new client certificates inside the cluster.</p>
<p>The important command inside the pod:</p>
<pre><code class="lang-bash">/cockroach/cockroach cert create-client root \
  --certs-dir=/out \
  --ca-key=/ca/ca.key \
  --lifetime=5h \
  --overwrite
</code></pre>
<p>What this does:</p>
<ul>
<li><p>Generates a brand new public/private key pair for the <code>root</code> SQL user</p>
</li>
<li><p>Uses the CA private key to <strong>sign the client certificate</strong></p>
</li>
<li><p>Places everything inside <code>/out</code></p>
</li>
<li><p>Makes the certificate valid for <strong>5 hours</strong></p>
</li>
</ul>
<p>If we passed <code>demo</code> instead of <code>root</code>, then the certificate CN would be <code>demo</code>, and CockroachDB would treat anyone using that certificate as the <code>demo</code> SQL user.</p>
<p>That’s how CockroachDB identifies and authenticates users when running in secure mode.</p>
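<p>As a sketch of that idea, this is roughly what the command would look like for a hypothetical <code>demo</code> user, run inside the same pod with the same paths. The <code>demo</code> name is only an example, and the certificate is only useful if a SQL user with that name exists in the cluster:</p>
<pre><code class="lang-bash"># Sign a client certificate whose CN is "demo" instead of "root"
/cockroach/cockroach cert create-client demo \
  --certs-dir=/out \
  --ca-key=/ca/ca.key \
  --lifetime=5h

# List the certificates found in /out to confirm what was generated
/cockroach/cockroach cert list --certs-dir=/out
</code></pre>
<p>The generated files follow the same naming pattern as the ones for <code>root</code>, so you’d expect <code>client.demo.crt</code> and <code>client.demo.key</code>.</p>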
<h4 id="heading-step-2-deploy-the-pod">Step 2: Deploy the Pod</h4>
<p>Run:</p>
<pre><code class="lang-bash">kubectl apply -f gen-root-cert.yml
</code></pre>
<p>Give it a minute to start and generate the files.</p>
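<p>Instead of guessing when it’s done, you can let <code>kubectl</code> wait for the pod and then read its output. A small sketch, assuming the pod was created in your current namespace:</p>
<pre><code class="lang-bash"># Block until the pod reports Ready (or give up after two minutes)
kubectl wait --for=condition=Ready pod/gen-root-cert --timeout=120s

# The pod runs "ls -al /out" after creating the certificate,
# so its logs should list the generated files
kubectl logs gen-root-cert
</code></pre>
<p>If the logs don’t show the file listing yet, wait a few seconds and run the second command again.</p>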
<h4 id="heading-step-3-copy-the-certificates-to-your-local-pc">Step 3: Copy the Certificates to Your Local PC</h4>
<p>We need three files:</p>
<ul>
<li><p><code>client.root.crt</code>: client certificate</p>
</li>
<li><p><code>client.root.key</code>: private key</p>
</li>
<li><p><code>ca.crt</code>: CA certificate</p>
</li>
</ul>
<p>Copy them from the pod to your machine:</p>
<pre><code class="lang-bash">kubectl cp default/gen-root-cert:/out/client.root.crt ./client.root.crt
kubectl cp default/gen-root-cert:/out/client.root.key ./client.root.key
kubectl cp default/gen-root-cert:/out/ca.crt             ./ca.crt
</code></pre>
<p>Now your folder should contain:</p>
<pre><code class="lang-bash">client.root.crt
client.root.key
ca.crt
</code></pre>
<p>These are the files Beekeeper Studio needs for mTLS.</p>
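<p>Since <code>client.root.key</code> is a private key, it’s worth tightening its permissions right away. Some PostgreSQL-compatible clients built on libpq refuse to use a key file that other users can read. A small sketch for Linux and macOS:</p>
<pre><code class="lang-bash"># Make the private key readable and writable by your user only
chmod 600 client.root.key

# Confirm all three files are present and check their permissions
ls -l client.root.crt client.root.key ca.crt
</code></pre>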
<h4 id="heading-step-4-decode-the-client-certificate-just-like-we-did-for-the-node-certificate">Step 4: Decode the Client Certificate (Just Like We Did for the Node Certificate)</h4>
<p>Run:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> client.root.crt -text -noout &gt; crdb-root.crt.decoded
</code></pre>
<p>Open the <code>crdb-root.crt.decoded</code> file and look at the contents.</p>
<h4 id="heading-understanding-the-client-certificate">Understanding the Client Certificate</h4>
<ol>
<li><strong>Issuer</strong></li>
</ol>
<p>You’ll see: <code>Issuer: O = Cockroach, CN = Cockroach CA</code></p>
<p>This is the same Issuer as the CockroachDB node certificate.</p>
<p>This indicates that both certificates were signed by the <em>same</em> Certificate Authority, so each side can validate the other’s certificate against <code>ca.crt</code>, which is exactly what mTLS needs.</p>
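<p>Matching Issuer lines are a good sign, but you can also ask <code>openssl</code> to check the signatures themselves. A sketch, assuming <code>ca.crt</code>, <code>crdb-node.crt</code>, and <code>client.root.crt</code> are all in the current directory:</p>
<pre><code class="lang-bash"># Verify that both certificates chain back to the cluster CA
openssl verify -CAfile ca.crt crdb-node.crt client.root.crt
</code></pre>
<p>Each file should be reported as <code>OK</code>. Keep in mind that the client certificate was created with a 5-hour lifetime, so this check will start failing once it expires.</p>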
<ol start="2">
<li><strong>Subject</strong></li>
</ol>
<p>You’ll see: <code>Subject: O = Cockroach, CN = root</code></p>
<p>This means that the Organization is just a label grouping CockroachDB identities, and that the Common Name is <code>root</code>. This is VERY important.</p>
<p>The CN of a client certificate literally tells CockroachDB:</p>
<blockquote>
<p>“This connection belongs to the SQL user named <code>root</code>.”</p>
</blockquote>
<p>If the CN were <code>demo</code>, CockroachDB would authenticate you as the <code>demo</code> SQL user.</p>
<h4 id="heading-extended-key-usage-eku">Extended Key Usage (EKU)</h4>
<p>You should see: <code>TLS Web Client Authentication</code>.</p>
<p>This is exactly what we want. It tells CockroachDB:</p>
<blockquote>
<p>“This certificate is only for clients connecting to the database.”</p>
</blockquote>
<p>Unlike node certificates, you will NOT see: <code>TLS Web Server Authentication</code>.</p>
<p>Why?</p>
<p>Because:</p>
<ul>
<li><p><strong>Server Authentication</strong> = for certificates the SERVER SHOWS TO THE CLIENT. For example: CockroachDB nodes proving they are legitimate.</p>
</li>
<li><p><strong>Client Authentication</strong> = for certificates THE CLIENT SENDS TO THE SERVER. For example: You proving you are the real <code>root</code> user.</p>
</li>
</ul>
<h4 id="heading-why-your-client-certificate-cannot-be-used-as-a-server-certificate">Why your client certificate <strong>cannot</strong> be used as a server certificate</h4>
<p>Because a server certificate says:</p>
<blockquote>
<p>“Trust me, I AM the CockroachDB server.”</p>
</blockquote>
<p>But your client certificate says:</p>
<blockquote>
<p>“Trust me, I am an authenticated user.”</p>
</blockquote>
<p>Two very different identities. And CockroachDB will <em>reject</em> any certificate used in the wrong role.</p>
<p>So having only TLS Web Client Authentication in your certificate is perfect for our use case. :)</p>
<h3 id="heading-connecting-to-our-cockroachdb-cluster-securely-using-mtls">Connecting to Our CockroachDB Cluster Securely (Using mTLS)</h3>
<p>Now that we’ve successfully generated the certificates and key pairs we need, it's time to use them to securely connect to our CockroachDB cluster from Beekeeper Studio.</p>
<p>Remember: CockroachDB is running in secure mode, so it will <em>reject any connection</em> that isn’t encrypted with TLS. For the <code>root</code> user, that means presenting the client certificate we just generated.</p>
<p>Let’s walk through the steps.👇🏾</p>
<h4 id="heading-step-1-make-sure-port-forwarding-is-still-running">Step 1: Make Sure Port Forwarding Is Still Running</h4>
<p>Before connecting, ensure that your CockroachDB cluster is still exposed to your PC.</p>
<p>If you already closed the previous terminal window, simply re-run this:</p>
<pre><code class="lang-bash">kubectl port-forward svc/crdb-cockroachdb-public 26259:26257
</code></pre>
<p>This makes your CockroachDB node reachable at: <code>localhost:26259</code>. If this step isn’t active, <em>Beekeeper Studio will not be able to connect</em>.</p>
<h4 id="heading-step-2-open-beekeeper-studio-and-set-up-the-connection">Step 2: Open Beekeeper Studio and Set Up the Connection</h4>
<p>Launch Beekeeper Studio and open a fresh connection window (Ctrl + Shift + N if needed).</p>
<p>Now fill in the fields like this:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Connection Type</strong></td><td>CockroachDB</td></tr>
<tr>
<td><strong>Host</strong></td><td><code>localhost</code></td></tr>
<tr>
<td><strong>Port</strong></td><td><code>26259</code></td></tr>
<tr>
<td><strong>User</strong></td><td><code>root</code></td></tr>
<tr>
<td><strong>Default Database</strong></td><td><code>defaultdb</code></td></tr>
</tbody>
</table>
</div><p>Now enable the <strong>“Enable SSL”</strong> option. Once enabled, expand the SSL section and set the following three fields:</p>
<ul>
<li><p><strong>CA Cert:</strong> Set this to the location of: <code>ca.crt</code>. This is the root Certificate Authority file you copied earlier using: <code>kubectl cp default/gen-root-cert:/out/ca.crt ./ca.crt</code>. It should still be in your project’s root directory (for example, <code>cockroachdb-tutorial/</code>).</p>
</li>
<li><p><strong>Certificate:</strong> Set this to the location of: <code>client.root.crt</code></p>
</li>
<li><p><strong>Key File:</strong> Set this to the location of: <code>client.root.key</code></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763389469459/bbdb17c5-1c3b-4163-932f-3cd5382160f4.png" alt="Connecting to the CockroachDB cluster from Beekeeper Studio in &quot;Secure&quot; mode" class="image--center mx-auto" width="512" height="400" loading="lazy"></p>
<h4 id="heading-step-3-click-connect">Step 3: Click “Connect”</h4>
<p>Once all the fields are set properly, click <strong>Connect</strong>.</p>
<p>If everything was done correctly, you should now be connected to your CockroachDB cluster securely over Mutual TLS.</p>
<p>If the connection fails:</p>
<ul>
<li><p>Double-check your certificate paths</p>
</li>
<li><p>Ensure port-forwarding is running</p>
</li>
<li><p>Verify the user is <code>root</code></p>
</li>
<li><p>Confirm the selected connection type is <code>CockroachDB</code>.</p>
</li>
</ul>
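<p>If you’d rather rule out Beekeeper Studio while troubleshooting, the same connection can be attempted from a terminal. This sketch assumes you have the <code>cockroach</code> binary installed locally (the tutorial doesn’t require it) and that the three certificate files are in the current directory:</p>
<pre><code class="lang-bash"># Connect as root over mTLS through the port-forward
cockroach sql --url "postgresql://root@localhost:26259/defaultdb?sslmode=verify-full&amp;sslrootcert=ca.crt&amp;sslcert=client.root.crt&amp;sslkey=client.root.key"
</code></pre>
<p>With <code>sslmode=verify-full</code>, the client also checks that the host name matches the node certificate. If that check fails for <code>localhost</code> in your setup, <code>sslmode=verify-ca</code> validates only the CA chain.</p>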
<h4 id="heading-step-4-run-your-first-secure-query">Step 4: Run Your First Secure Query</h4>
<p>Now that you're connected, let’s verify everything works by running:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> <span class="hljs-keyword">users</span>;
</code></pre>
<p>You should see two users automatically created by CockroachDB:</p>
<ul>
<li><p><strong>admin</strong></p>
</li>
<li><p><strong>root</strong></p>
</li>
</ul>
<p>Later on, we’ll create a <strong>new SQL user</strong> with limited permissions so you’ll understand how CockroachDB handles user authentication in production environments.</p>
<h3 id="heading-restoring-our-previous-database-into-the-new-gke-cockroachdb-cluster-without-sa-keys">Restoring Our Previous Database into the New GKE CockroachDB Cluster (without SA keys)</h3>
<p>Now that our CockroachDB cluster is up and running on GKE – fully secured with TLS encryption and mTLS authentication – it’s time to bring back the data from our previous setup.</p>
<p>Remember how we backed up our CockroachDB database (running on Minikube) to Google Cloud Storage?</p>
<p>Well, now we’re going to restore that same backup into our new production cluster on GKE. But before CockroachDB can access our bucket, we must give it permission – securely.</p>
<p>And here’s the cool part: <strong>we don’t need to use Service Account keys anymore.</strong></p>
<h4 id="heading-why-we-dont-need-service-account-keys-on-gke">Why We Don’t Need Service Account Keys on GKE</h4>
<p>Earlier, in the backup section, we generated a Service Account key on our PC and mounted it into our Minikube cluster.</p>
<p>But for GKE, we intentionally left out the following fields in our <code>cockroachdb-production.yml</code>:</p>
<ul>
<li><p><code>env</code></p>
</li>
<li><p><code>volumes</code></p>
</li>
<li><p><code>volumeMounts</code></p>
</li>
</ul>
<p>The reason? GKE supports something called <strong>Workload Identity</strong>.</p>
<p>Workload Identity lets us securely connect Kubernetes Service Accounts (KSAs) to Google Cloud Service Accounts (GSAs), without storing or mounting any secret keys. The authentication happens “implicitly” thanks to Google’s metadata server.</p>
<p>💡 Workload Identity works easily when your cluster is running on GKE. It’s more complex to set up on Minikube, Kind, EKS, AKS, or any other non-GKE cluster.</p>
<h4 id="heading-step-1-linking-the-google-service-account-to-our-kubernetes-service-account">Step 1: Linking the Google Service Account to Our Kubernetes Service Account</h4>
<p>We already touched on this when deploying our cluster, but let’s look at the specific line again.</p>
<p>Open your <code>cockroachdb-production.yml</code> Helm values file and scroll to the <code>serviceAccount</code> section. You should see something like this:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">serviceAccount:</span>
    <span class="hljs-attr">create:</span> <span class="hljs-literal">true</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"crdb-cockroachdb"</span>
    <span class="hljs-attr">annotations:</span>
      <span class="hljs-attr">iam.gke.io/gcp-service-account:</span> <span class="hljs-string">cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Replace the <code>&lt;PROJECT_ID&gt;</code> placeholder with your real Google Cloud project ID.</p>
<p>If you’re unsure of the ID, go to Google Cloud Console, then to IAM &amp; Admin, and finally to Service Accounts. Search for <code>cockroachdb-backup</code> and copy the project ID from there.</p>
<p>This annotation instructs GKE to automatically authenticate our CockroachDB pods as the <code>cockroachdb-backup</code> Google Service Account – no keys needed.</p>
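<p>To confirm the annotation actually made it onto the Kubernetes Service Account, you can read it back with <code>kubectl</code>. This sketch assumes the Service Account is named <code>crdb-cockroachdb</code> and lives in your current namespace:</p>
<pre><code class="lang-bash"># Print the Google Service Account this KSA is annotated with
kubectl get serviceaccount crdb-cockroachdb \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
</code></pre>
<p>The output should be the full <code>cockroachdb-backup@...</code> email address with your project ID filled in. An empty result suggests the Helm values weren’t applied.</p>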
<h4 id="heading-step-2-binding-ksa-gsa-using-workload-identity">Step 2: Binding KSA ↔️ GSA Using Workload Identity</h4>
<p>Annotating the Service Account isn’t enough. We still need to explicitly allow our KSA to “impersonate” the GSA.</p>
<p>Run this command to set the active project:</p>
<pre><code class="lang-bash">gcloud config <span class="hljs-built_in">set</span> project &lt;PROJECT_ID&gt;
</code></pre>
<p>Now, apply the IAM policy binding:</p>
<pre><code class="lang-bash">gcloud iam service-accounts add-iam-policy-binding \
  &lt;GOOGLE_SERVICE_ACCOUNT&gt; \
  --role roles/iam.workloadIdentityUser \
  --member <span class="hljs-string">"serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[&lt;NAMESPACE&gt;/&lt;KUBERNETES_SERVICE_ACCOUNT&gt;]"</span>
</code></pre>
<p>Replace the placeholders as follows:</p>
<ul>
<li><p><code>&lt;GOOGLE_SERVICE_ACCOUNT&gt;</code>: <code>cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</code></p>
</li>
<li><p><code>&lt;PROJECT_ID&gt;</code>: your GCP project ID</p>
</li>
<li><p><code>&lt;NAMESPACE&gt;</code>: the namespace where CockroachDB runs (<code>default</code>)</p>
</li>
<li><p><code>&lt;KUBERNETES_SERVICE_ACCOUNT&gt;</code>: <code>crdb-cockroachdb</code></p>
</li>
</ul>
<p>After a few seconds, you should see something like:</p>
<pre><code class="lang-yaml"><span class="hljs-string">Updated</span> <span class="hljs-string">IAM</span> <span class="hljs-string">policy</span> <span class="hljs-string">for</span> <span class="hljs-string">serviceAccount</span> [<span class="hljs-string">cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com</span>]<span class="hljs-string">.</span>
<span class="hljs-attr">bindings:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">members:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">serviceAccount:&lt;PROJECT_ID&gt;.svc.id.goog[default/crdb-cockroachdb]</span>
  <span class="hljs-attr">role:</span> <span class="hljs-string">roles/iam.workloadIdentityUser</span>
<span class="hljs-attr">etag:</span> <span class="hljs-string">***</span>
<span class="hljs-attr">version:</span> <span class="hljs-number">1</span>
</code></pre>
<p>Perfect. Your KSA can now act as the <code>cockroachdb-backup</code> GSA, so the CockroachDB pods can reach Google Cloud Storage without any keys.</p>
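<p>If you want to double-check the binding later, you can print the IAM policy attached to the Google Service Account. As before, <code>&lt;PROJECT_ID&gt;</code> is a placeholder for your own project ID:</p>
<pre><code class="lang-bash"># Show who is allowed to act as the cockroachdb-backup service account
gcloud iam service-accounts get-iam-policy \
  cockroachdb-backup@&lt;PROJECT_ID&gt;.iam.gserviceaccount.com
</code></pre>
<p>You should find the <code>roles/iam.workloadIdentityUser</code> role with your <code>default/crdb-cockroachdb</code> member listed under it.</p>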
<h3 id="heading-restoring-our-previous-database-from-google-cloud-storage">Restoring Our Previous Database from Google Cloud Storage</h3>
<p>Now that authentication is set up, let’s restore the backup we previously created in the Minikube cluster.</p>
<p>Open Beekeeper Studio and reconnect to your CockroachDB cluster (the one running on GKE).</p>
<p>Before restoring anything, let’s check if the <code>books</code> table exists:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>You should see an error saying the table doesn’t exist. Don’t worry, that’s expected.</p>
<h3 id="heading-now-lets-restore-the-data">Now, Let’s Restore the Data 🎉</h3>
<p>Run this command:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">RESTORE</span> <span class="hljs-keyword">FROM</span> LATEST <span class="hljs-keyword">IN</span> <span class="hljs-string">'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit'</span>;
</code></pre>
<p>Replace <code>&lt;BUCKET_NAME&gt;</code> with the name of the bucket you created earlier (for example: <code>cockroachdb-backup-7gw8u</code>).</p>
<p>CockroachDB will now:</p>
<ul>
<li><p>Authenticate using Workload Identity</p>
</li>
<li><p>Find the latest backup inside your bucket</p>
</li>
<li><p>Restore all tables, schemas, and data into your new GKE cluster</p>
</li>
</ul>
<p>After a couple of minutes, you should get a success message.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763393752870/f95d76c0-3722-491a-a97c-a1b8a79bdc79.png" alt="Successfully restored CockroachDB database" class="image--center mx-auto" width="587" height="268" loading="lazy"></p>
<p>Now, run the query again:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! Your books from the Minikube cluster should now appear inside the new CockroachDB cluster running on GKE 😃.</p>
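<p>If the restore fails, or you’re simply curious about what’s in the bucket, you can list the backups CockroachDB sees there. The sketch below runs the statement from a terminal and assumes a locally installed <code>cockroach</code> binary plus the certificate files from earlier. <code>CRDB_URL</code> is just a helper variable introduced here. You could equally paste the <code>SHOW BACKUPS</code> statement into Beekeeper Studio.</p>
<pre><code class="lang-bash"># Helper variable (introduced for this example) with the mTLS connection string for root
CRDB_URL="postgresql://root@localhost:26259/defaultdb?sslmode=verify-full&amp;sslrootcert=ca.crt&amp;sslcert=client.root.crt&amp;sslkey=client.root.key"

# List the backups stored under the "cluster" path of the bucket
cockroach sql --url "$CRDB_URL" \
  -e "SHOW BACKUPS IN 'gs://&lt;BUCKET_NAME&gt;/cluster?AUTH=implicit';"
</code></pre>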
<h3 id="heading-connecting-to-the-database-with-a-new-user">Connecting to the Database with a New User</h3>
<p>So far, we’ve been connecting to our CockroachDB cluster using the <code>root</code> user. While this is super convenient for tutorials, it’s not recommended for real apps.</p>
<p>This is because the <code>root</code> user has advanced privileges – basically, full access to your entire cluster. If an attacker got hold of these credentials, or your application was compromised, they could do <strong>A LOT</strong> of damage. 😬</p>
<p>Instead, it’s best practice to create a user with <strong>limited permissions</strong> for your apps. This way, even if the user is compromised, the damage is contained.</p>
<h4 id="heading-authentication-options-for-users">Authentication Options for Users</h4>
<p>CockroachDB is flexible when it comes to authentication:</p>
<ol>
<li><p><strong>Password Authentication:</strong> Create a user with a password and connect using just username + password (no client certificates required).</p>
</li>
<li><p><strong>Passwordless / Mutual TLS Authentication:</strong> Create a user without a password, then connect using client certificates signed by the same CA (like we did for <code>root</code>).</p>
</li>
<li><p><strong>Both Password + Mutual TLS:</strong> Create a user with a password and also connect using client certificates. This adds an extra layer of security.</p>
</li>
</ol>
<p>In this subsection, we’ll start simple and use password authentication.</p>
<h4 id="heading-step-1-create-the-new-user">Step 1: Create the New User</h4>
<p>Open your current connection in Beekeeper Studio (signed in as <code>root</code>) and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> password_auth <span class="hljs-keyword">WITH</span> <span class="hljs-keyword">PASSWORD</span> <span class="hljs-string">'supersecret'</span>;
</code></pre>
<p>You should see a message confirming the user was created successfully.</p>
<h4 id="heading-step-2-connect-as-the-new-user">Step 2: Connect as the New User</h4>
<p>Open a new Beekeeper Studio window (Ctrl + Shift + N). <strong>DO NOT</strong> exit/close the old window, as we’ll need it later.</p>
<p>Fill in the connection fields:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Connection Type</strong></td><td>CockroachDB</td></tr>
<tr>
<td><strong>Host</strong></td><td><code>localhost</code></td></tr>
<tr>
<td><strong>Port</strong></td><td><code>26259</code></td></tr>
<tr>
<td><strong>Default Database</strong></td><td><code>defaultdb</code></td></tr>
<tr>
<td><strong>User</strong></td><td><code>password_auth</code></td></tr>
<tr>
<td><strong>Password</strong></td><td><code>huh</code> (for now, we’ll try a wrong password to see it fail)</td></tr>
</tbody>
</table>
</div><p>Click Connect.</p>
<p>❌ You’ll see an error about SSL connection being required.</p>
<p>Even though we’re connecting with a password instead of a client certificate, <strong>SSL is still required</strong>: a cluster in secure mode only accepts encrypted connections. SSL encrypts the data between Beekeeper Studio and CockroachDB.</p>
<p>Without it, sensitive info like passwords and queries could be intercepted (man-in-the-middle attacks).</p>
<h4 id="heading-step-3-enable-ssl-amp-ca-verification">Step 3: Enable SSL &amp; CA Verification</h4>
<ul>
<li><p>Tick <strong>Enable SSL</strong></p>
</li>
<li><p>Click the <strong>CA Cert</strong> field and select the <code>ca.crt</code> file in your project root (<code>cockroachdb-tutorial/</code>)</p>
</li>
</ul>
<p>This ensures that Beekeeper Studio verifies it’s really talking to our CockroachDB cluster and protects against attackers trying to intercept the connection.</p>
<p>Now, click Connect again.</p>
<p>❌ Initially, you’ll still see a <strong>Password authentication failed</strong> error because we intentionally entered the wrong password.</p>
<h4 id="heading-step-4-connect-with-the-correct-password">Step 4: Connect With the Correct Password</h4>
<p>Replace the password with <code>supersecret</code>, then click Connect.</p>
<p>You are now signed in as the <code>password_auth</code> user!</p>
<h4 id="heading-step-5-check-permissions">Step 5: Check Permissions</h4>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>❌ You should see an error stating that <code>password_auth</code> does not have permission to access the <code>books</code> table.</p>
<p>This is expected, and it confirms that our limited-access user can <strong>only access what we explicitly grant it</strong>. Even if this account is compromised, the attacker can’t read or modify the rest of our database.</p>
<h4 id="heading-step-6-granting-access-to-specific-tables">Step 6: Granting Access to Specific Tables</h4>
<p>To allow <code>password_auth</code> to work with the <code>books</code> table, switch back to the <code>root</code> connection Beekeeper Studio window and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">USAGE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">SCHEMA</span> defaultdb.public <span class="hljs-keyword">TO</span> password_auth;
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">SELECT</span>, <span class="hljs-keyword">INSERT</span>, <span class="hljs-keyword">UPDATE</span>, <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">TABLE</span> defaultdb.public.books <span class="hljs-keyword">TO</span> password_auth;
</code></pre>
<p>This gives the user read and write access to the <code>books</code> table only.</p>
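<p>If you need the same grants for more than one role, a small helper can generate the statements for you. The Python sketch below is illustrative only: <code>least_privilege_grants</code> is a hypothetical function (not part of CockroachDB or any driver), and it simply builds the same two statements shown above for a given role and table.</p>
<pre><code class="lang-python">import re

# Hypothetical helper for illustration; not part of CockroachDB or any driver.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def least_privilege_grants(role, table, database="defaultdb", schema="public"):
    """Return the GRANT statements that give `role` read/write access to one table."""
    # Refuse anything that is not a plain identifier, because these values are
    # interpolated directly into SQL text.
    for name in (role, table, database, schema):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO {role};",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE {database}.{schema}.{table} TO {role};",
    ]


if __name__ == "__main__":
    for statement in least_privilege_grants("password_auth", "books"):
        print(statement)
</code></pre>
<p>You would still run the generated statements from the <code>root</code> session. The identifier check is deliberately strict so that a malformed role name can’t smuggle extra SQL into the output.</p>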
<h4 id="heading-step-7-verify-the-new-user-access">Step 7: Verify the New User Access</h4>
<p>Go back to the Beekeeper Studio window where you’re signed in as <code>password_auth</code> and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! You should now see the list of books from your restored database.</p>
<p>Our new user is fully functional with <strong>limited privileges</strong>, making it safe for use in real applications.</p>
<h3 id="heading-connecting-with-passwordless-authentication-mutual-tls">Connecting with Passwordless Authentication (Mutual TLS)</h3>
<p>We’ve already seen how to connect to the database using a user that authenticates with a password, and without any client certificates.</p>
<p>Now, let’s look at the opposite scenario: passwordless authentication via Mutual TLS (mTLS).</p>
<p>This is one of the strongest forms of authentication because instead of a password, the database verifies you using a <strong>cryptographically signed certificate</strong>.</p>
<p>Let’s walk through it.</p>
<h4 id="heading-step-1-create-the-mtlsauth-user">Step 1: Create the <code>mtls_auth</code> User</h4>
<p>Navigate back to the Beekeeper Studio window where you're currently signed in as the <code>root</code> user. Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">USER</span> mtls_auth;
</code></pre>
<p>You should see a success message confirming that the user has been created.</p>
<p><strong>N.B.:</strong> If this query fails, there’s a good chance your <code>root</code> client certificate has expired. Remember that we set a <strong>5-hour lifetime</strong> when generating it earlier.</p>
<p>If this happens, delete the certificate-generation pod:</p>
<pre><code class="lang-bash">kubectl delete po/gen-root-cert
</code></pre>
<p>Then re-apply the <code>gen-root-cert.yml</code> manifest. Copy the newly generated <code>client.root.crt</code>, <code>client.root.key</code>, and <code>ca.crt</code> back to your PC. Then try creating the user again.</p>
<h4 id="heading-step-2-attempt-signing-in-as-mtlsauth-expect-failure">Step 2: Attempt Signing In as <code>mtls_auth</code> (Expect Failure)</h4>
<p>Open a new Beekeeper Studio window (Ctrl + Shift + N).</p>
<p>Try filling in the connection settings using:</p>
<ul>
<li><p>User: <code>mtls_auth</code></p>
</li>
<li><p>SSL enabled</p>
</li>
<li><p>CA Cert: <code>ca.crt</code></p>
</li>
<li><p>Client Cert: <code>client.root.crt</code></p>
</li>
<li><p>Client Key: <code>client.root.key</code></p>
</li>
</ul>
<p>Click Connect.</p>
<p>You’ll see an error message similar to this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763444971964/93f41787-425b-4e36-86da-4b688cef672f.png" alt="Connecting as the mtls_auth user with the wrong certificate and key-pair" class="image--center mx-auto" width="521" height="813" loading="lazy"></p>
<p>Why does this fail?</p>
<ol>
<li><p>The user has no password, so password login is impossible.</p>
</li>
<li><p>You’re using the <em>root</em> certificate, not a certificate belonging to <code>mtls_auth</code>. CockroachDB is strict: each user must authenticate using <em>their own</em> certificate.</p>
</li>
</ol>
<p>So let's fix that by generating a new certificate + key pair for the <code>mtls_auth</code> user.</p>
<h4 id="heading-step-3-create-certificate-key-for-mtlsauth">Step 3: Create Certificate + Key for <code>mtls_auth</code></h4>
<p>Just like we generated certificates for the <code>root</code> user earlier, we’ll do the same for <code>mtls_auth</code>.</p>
<p>Create a new manifest named <code>gen-mtls_auth-cert.yml</code>.</p>
<p>Paste in this content:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">gen-mtls-auth-cert</span> 
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span> 
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">gen</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">cockroachdb/cockroach:v25.3.1</span>
      <span class="hljs-attr">command:</span> [<span class="hljs-string">"sh"</span>, <span class="hljs-string">"-ec"</span>]
      <span class="hljs-attr">args:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">|
          mkdir -p /out
</span>
          <span class="hljs-comment"># Copy the CA certificate</span>
          <span class="hljs-string">cp</span> <span class="hljs-string">/ca/ca.crt</span> <span class="hljs-string">/out/ca.crt</span>

          <span class="hljs-comment"># Create the client certificate and key pair for user 'mtls_auth'</span>
          <span class="hljs-string">/cockroach/cockroach</span> <span class="hljs-string">cert</span> <span class="hljs-string">create-client</span> <span class="hljs-string">mtls_auth</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--certs-dir=/out</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--ca-key=/ca/ca.key</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--lifetime=5h</span> <span class="hljs-string">\</span>
            <span class="hljs-string">--overwrite</span>

          <span class="hljs-comment"># List generated files</span>
          <span class="hljs-string">ls</span> <span class="hljs-string">-al</span> <span class="hljs-string">/out</span>

          <span class="hljs-comment"># Keep pod alive for kubectl cp</span>
          <span class="hljs-string">sleep</span> <span class="hljs-number">3600</span>
      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> { <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>, <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/ca</span>, <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span> }
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"500Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
</code></pre>
<p>Apply this file, wait for the pod to start, then copy the generated files:</p>
<pre><code class="lang-bash">kubectl cp default/gen-mtls-auth-cert:/out/client.mtls_auth.crt ./client.mtls_auth.crt 
kubectl cp default/gen-mtls-auth-cert:/out/client.mtls_auth.key ./client.mtls_auth.key
kubectl cp default/gen-mtls-auth-cert:/out/ca.crt ./ca.crt
</code></pre>
<p>Now we have the correct certificate + key pair for our new user.</p>
<h4 id="heading-step-4-connect-as-mtlsauth">Step 4: Connect as <code>mtls_auth</code></h4>
<p>Go back to the new Beekeeper Studio window and update the SSL fields:</p>
<ul>
<li><p><strong>CA Cert:</strong> <code>ca.crt</code></p>
</li>
<li><p><strong>Certificate:</strong> <code>client.mtls_auth.crt</code></p>
</li>
<li><p><strong>Key File:</strong> <code>client.mtls_auth.key</code></p>
</li>
</ul>
<p>Click Connect.</p>
<p>This time, it should succeed instantly.</p>
<h4 id="heading-step-5-inspect-the-certificate">Step 5: Inspect the Certificate</h4>
<p>To understand how CockroachDB links certificates to users, decode the certificate:</p>
<pre><code class="lang-bash">openssl x509 -<span class="hljs-keyword">in</span> client.mtls_auth.crt -text -noout &gt; client.mtls_auth.crt.decoded
</code></pre>
<p>Open the file, scroll to the Subject field, and you’ll see:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">Subject:</span> <span class="hljs-string">O</span> <span class="hljs-string">=</span> <span class="hljs-string">Cockroach,</span> <span class="hljs-string">CN</span> <span class="hljs-string">=</span> <span class="hljs-string">mtls_auth</span>
<span class="hljs-string">...</span>
</code></pre>
<p>The <code>CN</code> (Common Name) is the username CockroachDB uses to authenticate the session.</p>
<p>This is how CockroachDB knows you’re connecting as the <code>mtls_auth</code> user without any password at all. :)</p>
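<p>To make that mapping concrete, here’s a minimal Python sketch that pulls the CN out of a subject line like the one above. <code>common_name</code> is a hypothetical helper, and the parsing assumes the simple <code>KEY = value, KEY = value</code> layout that <code>openssl</code> printed in this example. Real subjects can contain escaped commas, which this sketch doesn’t handle.</p>
<pre><code class="lang-python">def common_name(subject_line):
    """Extract the CN value from a line such as 'Subject: O = Cockroach, CN = mtls_auth'."""
    # Drop the leading "Subject:" label if it is present.
    _, _, fields = subject_line.rpartition("Subject:")
    for part in fields.split(","):
        key, _, value = part.partition("=")
        if key.strip() == "CN":
            return value.strip()
    # No CN field was found in the subject.
    return None


if __name__ == "__main__":
    print(common_name("Subject: O = Cockroach, CN = mtls_auth"))  # mtls_auth
</code></pre>
<p>Whatever name comes back is the SQL user the certificate can sign in as, which is why the <code>root</code> certificate couldn’t be used for <code>mtls_auth</code> earlier.</p>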
<h4 id="heading-step-6-try-reading-the-books-table">Step 6: Try Reading the Books Table</h4>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>❌ You’ll get a permission error, just like we did earlier with the <code>password_auth</code> user.</p>
<p>This is expected because <code>mtls_auth</code> has <em>no</em> privileges yet. Perfect!</p>
<h4 id="heading-step-7-grant-permissions-to-mtlsauth">Step 7: Grant Permissions to <code>mtls_auth</code></h4>
<p>Switch to the Beekeeper Studio window where you're signed in as <code>root</code>, and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">USAGE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">SCHEMA</span> defaultdb.public <span class="hljs-keyword">TO</span> mtls_auth;
<span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">SELECT</span>, <span class="hljs-keyword">INSERT</span>, <span class="hljs-keyword">UPDATE</span>, <span class="hljs-keyword">DELETE</span> <span class="hljs-keyword">ON</span> <span class="hljs-keyword">TABLE</span> defaultdb.public.books <span class="hljs-keyword">TO</span> mtls_auth;
</code></pre>
<p>You should see a success message.</p>
<p>Now return to the <code>mtls_auth</code> session and run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> books;
</code></pre>
<p>Boom! You should now see your previously restored list of books.</p>
<p>You’ve successfully connected using passwordless, certificate-based authentication and granted controlled permissions to the new user. :)</p>
<h3 id="heading-connecting-via-mutual-tls-mtls-from-our-apps-on-kubernetes">Connecting via Mutual TLS (mTLS) from Our Apps on Kubernetes</h3>
<p>So far, we’ve been connecting to our CockroachDB cluster <em>securely</em> using Beekeeper Studio thanks to our TLS certificates and mTLS authentication.</p>
<p>But…what happens when we have applications running inside our Kubernetes cluster that need to talk to CockroachDB as well?</p>
<p>Exactly: those apps also need to authenticate using client certificates.</p>
<p>And that brings us to a very important point…</p>
<h4 id="heading-why-we-should-not-generate-client-certificates-using-pods-the-dangerous-way">Why We Should <em>Not</em> Generate Client Certificates Using Pods (The Dangerous Way)</h4>
<p>Up until now, we’ve been generating our client certificates using Kubernetes Pods like:</p>
<ul>
<li><p><code>gen-root-cert</code></p>
</li>
<li><p><code>gen-mtls-auth-cert</code></p>
</li>
</ul>
<p>They <em>work</em>, yes…but they’re not safe for production.</p>
<p>Why? Because each of these Pods <strong>mounts our Certificate Authority (CA) key</strong> into its container:</p>
<pre><code class="lang-yaml"><span class="hljs-string">...</span>
<span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-ca</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-cockroachdb-ca-secret</span>
        <span class="hljs-attr">items:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.crt</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.crt</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">ca.key</span>
            <span class="hljs-attr">path:</span> <span class="hljs-string">ca.key</span>
<span class="hljs-string">...</span>
</code></pre>
<p>This is a <em>big</em> security risk!</p>
<p>If an attacker ever gains access to that pod?</p>
<p>🔥 Your CA key is exposed<br>🔥 They can generate <em>their own trusted certificates</em><br>🔥 They can impersonate ANY client/user, including the <code>root</code> and <code>admin</code> users<br>🔥 They’ll have full access to your CockroachDB cluster</p>
<p>And they’ll keep that access <strong>until you rotate the CA key</strong>, which is painful and disruptive.</p>
<p>This is why CockroachDB strongly advises against mounting CA keys into Pods.</p>
<h4 id="heading-the-right-way-using-cert-manager-recommended-by-cockroachdb">The Right Way: Using Cert Manager (Recommended by CockroachDB)</h4>
<p>CockroachDB’s <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/secure-cockroachdb-kubernetes?filters=helm#deploy-cert-manager-for-mtls">official docs recommend</a> managing client certificates using <strong>cert-manager</strong>.</p>
<p>This is because instead of YOU exposing your CA key inside Pods, cert-manager handles everything <em>internally and securely:</em></p>
<ul>
<li><p>cert-manager reads your CA key from a Kubernetes Secret, so you never have to mount it into your own Pods</p>
</li>
<li><p>It generates client certificates for you</p>
</li>
<li><p>It issues private keys <em>without ever exposing your CA key</em></p>
</li>
<li><p>It auto-renews certificates before they expire</p>
</li>
<li><p>And it gives you production-grade certificate lifecycle management</p>
</li>
</ul>
<h4 id="heading-but-wait-dont-we-need-the-ca-key-to-generate-client-certificates">But Wait: Don’t We Need the CA Key to Generate Client Certificates?</h4>
<p>Great question.</p>
<p>Yes, normally you need the CA key to sign client certificates…but <strong>cert-manager takes care of that for us</strong>.</p>
<p>You simply:</p>
<ol>
<li><p>Create an Issuer (or ClusterIssuer)</p>
</li>
<li><p>Tell cert-manager to use your CockroachDB CA</p>
</li>
<li><p>Request a Certificate</p>
</li>
</ol>
<p>Then cert-manager automatically:</p>
<ol>
<li><p>Signs it</p>
</li>
<li><p>Stores it in a Kubernetes Secret (where it’s safe)</p>
</li>
<li><p>Rotates it before expiry</p>
</li>
<li><p>Keeps your CA key completely secure</p>
</li>
</ol>
<p>No more exposing the CA key in Pods. No more writing custom Kubernetes Pods.</p>
<h4 id="heading-certificate-rotation-another-huge-win">Certificate Rotation — Another Huge Win</h4>
<p>Let’s talk about expirations.</p>
<p>Right now:</p>
<ul>
<li><p>The <code>mtls_auth</code> client cert we generated manually has <strong>5 hours</strong> validity</p>
</li>
<li><p>After 5 hours, it expires</p>
</li>
<li><p>Your apps will fail all DB connections</p>
</li>
<li><p>You’d need to regenerate a new certificate manually</p>
</li>
<li><p>Or worse: create a CronJob to regenerate them every 4 hours</p>
</li>
</ul>
<p>This is messy and unsafe.</p>
<p>With cert-manager?</p>
<ul>
<li><p>Certificates are automatically rotated</p>
</li>
<li><p>Renewed before expiration</p>
</li>
<li><p>No downtime</p>
</li>
<li><p>No manual intervention</p>
</li>
<li><p>Apps easily reload the new certificates</p>
</li>
</ul>
<h4 id="heading-alright-lets-install-cert-manager">Alright — Let’s Install Cert Manager</h4>
<p>To start using cert-manager, install it using the Helm chart:</p>
<pre><code class="lang-bash">helm repo add cert-manager https://charts.jetstack.io

helm install cert-manager cert-manager/cert-manager \
  --<span class="hljs-built_in">set</span> crds.enabled=<span class="hljs-literal">true</span> \
  --create-namespace \
  -n cert-manager \
  --version 1.19.1
</code></pre>
<p>Once cert-manager is installed, we’ll:</p>
<ol>
<li><p>Create an <strong>Issuer</strong> that uses our CockroachDB CA</p>
</li>
<li><p>Create a <strong>Certificate</strong> for our <code>mtls_auth</code> user</p>
</li>
<li><p>Mount that Certificate into our application Pods</p>
</li>
<li><p>Connect securely to CockroachDB via mTLS from inside Kubernetes</p>
</li>
</ol>
<p>That’s what we’ll walk through next.</p>
<p>Before cert-manager can issue our certificates, it needs an <strong>Issuer</strong>. And before creating an Issuer, we need a secret that contains our CA certificate and CA key using the correct key names.</p>
<h4 id="heading-creating-a-ca-secret-for-the-issuer">Creating a CA Secret for the Issuer</h4>
<p>cert-manager’s <code>Issuer</code> is a bit picky about the secret format. It expects the secret to contain two keys:</p>
<ul>
<li><p><code>tls.crt</code>: the CA certificate</p>
</li>
<li><p><code>tls.key</code>: the CA private key</p>
</li>
</ul>
<p>But the CockroachDB Helm chart automatically generates a secret named <code>crdb-cockroachdb-ca-secret</code>, which uses different key names:</p>
<ul>
<li><p><code>ca.crt</code></p>
</li>
<li><p><code>ca.key</code></p>
</li>
</ul>
<p>So even though this secret contains exactly what we need, cert-manager won’t accept it because the keys are not named the way it expects.</p>
<p>To fix this, we’ll create a new secret with the correct key names. First, copy the existing CA certificate from Kubernetes to your local machine:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-ca-secret -o jsonpath=<span class="hljs-string">'{.data.ca\.crt}'</span> | base64 -d &gt; ca.crt
</code></pre>
<p>If you get a “permission denied” error, simply delete any existing <code>ca.crt</code> file in your project directory and run the command again.</p>
<p>Now copy the key:</p>
<pre><code class="lang-bash">kubectl get secret crdb-cockroachdb-ca-secret -o jsonpath=<span class="hljs-string">'{.data.ca\.key}'</span> | base64 -d &gt; ca.key
</code></pre>
<p>Next, create the properly formatted secret:</p>
<pre><code class="lang-bash">kubectl create secret tls crdb-ca-issuer-secret --cert=ca.crt --key=ca.key
</code></pre>
<p>If you describe it:</p>
<pre><code class="lang-bash">kubectl describe secret crdb-ca-issuer-secret
</code></pre>
<p>You should now see <code>tls.crt</code> and <code>tls.key</code> in the <code>Data</code> section – exactly what cert-manager needs.</p>
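<p>For reference, the Secret that <code>kubectl create secret tls</code> produces has the type <code>kubernetes.io/tls</code> and stores both files base64-encoded under <code>tls.crt</code> and <code>tls.key</code>. The Python sketch below builds an equivalent manifest as a dictionary, purely to show the renaming. <code>tls_secret_manifest</code> is a hypothetical helper, and the byte strings in the usage are placeholders, not real key material.</p>
<pre><code class="lang-python">import base64
import json


def tls_secret_manifest(name, cert_pem, key_pem):
    """Build a kubernetes.io/tls Secret manifest from PEM-encoded bytes."""
    def encode(data):
        # Values under `data` in a Secret are base64-encoded strings.
        return base64.b64encode(data).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {"name": name},
        # The CA certificate and key are stored under the names cert-manager expects.
        "data": {"tls.crt": encode(cert_pem), "tls.key": encode(key_pem)},
    }


if __name__ == "__main__":
    manifest = tls_secret_manifest(
        "crdb-ca-issuer-secret",
        b"placeholder for the contents of ca.crt",
        b"placeholder for the contents of ca.key",
    )
    print(json.dumps(manifest, indent=2))
</code></pre>
<p>The content is unchanged from the Helm-generated secret. Only the key names differ, and that’s the part the CA Issuer cares about.</p>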
<h4 id="heading-creating-the-issuer">Creating the Issuer</h4>
<p>Now that we have a properly formatted CA secret, we can create the Issuer that cert-manager will use to sign our client certificates.</p>
<p>Create a file called <code>crdb-issuer.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cert-manager.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Issuer</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-issuer</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">ca:</span>
    <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-ca-issuer-secret</span>
</code></pre>
<p>Apply it:</p>
<pre><code class="lang-bash">kubectl apply -f crdb-issuer.yml
</code></pre>
<p>Confirm that it’s ready:</p>
<pre><code class="lang-bash">kubectl get issuer crdb-issuer
</code></pre>
<p>The <code>Ready</code> column should display <code>True</code>.</p>
<h4 id="heading-creating-the-certificate-manifest">Creating the Certificate Manifest</h4>
<p>Now we’ll define a Certificate object. This doesn’t create the client certificate instantly – instead, it tells cert-manager <strong>what kind</strong> of certificate we need. cert-manager then generates and stores the certificate automatically.</p>
<p>Create a file named <code>crdb-mtls_auth-certificate.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">cert-manager.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Certificate</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span> <span class="hljs-comment"># Secret that will hold the cert+key</span>
  <span class="hljs-attr">commonName:</span> <span class="hljs-string">mtls_auth</span> <span class="hljs-comment"># MUST match Cockroach SQL role</span>
  <span class="hljs-attr">duration:</span> <span class="hljs-string">24h</span> <span class="hljs-comment"># 1 day</span>
  <span class="hljs-attr">renewBefore:</span> <span class="hljs-string">20h</span> <span class="hljs-comment"># renew 20 hours before expiry (about 4 hours after issuance)</span>
  <span class="hljs-attr">privateKey:</span>
    <span class="hljs-attr">algorithm:</span> <span class="hljs-string">RSA</span>
    <span class="hljs-attr">size:</span> <span class="hljs-number">2048</span>
    <span class="hljs-attr">encoding:</span> <span class="hljs-string">PKCS8</span>
  <span class="hljs-attr">usages:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">client</span> <span class="hljs-string">auth</span> <span class="hljs-comment"># important: client certificate</span>
  <span class="hljs-attr">issuerRef:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-issuer</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Issuer</span>
    <span class="hljs-attr">group:</span> <span class="hljs-string">cert-manager.io</span>
</code></pre>
<p>Let’s look at the important properties so we can understand what the Certificate resource does:</p>
<ul>
<li><p><strong>secretName:</strong> The Kubernetes secret where cert-manager will store the generated certificate, key, and CA certificate. This is where your apps will later mount the certificate files from.</p>
</li>
<li><p><strong>commonName:</strong> Very important! This must match the <strong>CockroachDB SQL user</strong> (<code>mtls_auth</code>), because CockroachDB uses the certificate’s Common Name to identify the connecting user.</p>
</li>
<li><p><strong>duration</strong> and <strong>renewBefore:</strong> <code>duration</code> defines how long the certificate is valid. <code>renewBefore</code> tells cert-manager how long before expiry to renew it, so a fresh certificate is in place before the old one expires (to avoid downtime). With the values above, renewal happens 20 hours before expiry, which is roughly 4 hours after issuance.</p>
</li>
<li><p><strong>usages:</strong> Tells cert-manager what the certificate is for. <code>client auth</code> ensures this certificate is only used by clients connecting to servers, not the other way around.</p>
</li>
<li><p><strong>issuerRef:</strong> Points to the Issuer we created earlier. This tells cert-manager <em>who</em> should sign the certificate.</p>
</li>
</ul>
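<p>It’s easy to misread how <code>duration</code> and <code>renewBefore</code> interact, so here’s a small Python sketch of the arithmetic. It assumes cert-manager schedules renewal at the certificate’s expiry time minus <code>renewBefore</code>, which is how the cert-manager documentation describes the field. The <code>renewal_time</code> function and the example issue time are made up for illustration.</p>
<pre><code class="lang-python">from datetime import datetime, timedelta, timezone


def renewal_time(issued_at, duration, renew_before):
    """Return the moment a certificate becomes due for renewal."""
    expires_at = issued_at + duration
    # Renewal is scheduled `renew_before` ahead of the expiry time.
    return expires_at - renew_before


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)  # arbitrary example time
    due = renewal_time(issued, timedelta(hours=24), timedelta(hours=20))
    print(due - issued)  # 4:00:00
</code></pre>
<p>So with <code>duration: 24h</code> and <code>renewBefore: 20h</code>, the certificate is renewed about 4 hours into its life, leaving a 20-hour safety margin if a renewal attempt fails.</p>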
<p>Apply the manifest:</p>
<pre><code class="lang-bash">kubectl apply -f crdb-mtls_auth-certificate.yml
</code></pre>
<p>After a few seconds, cert-manager will generate the certificate.</p>
<p>Check the secret:</p>
<pre><code class="lang-bash">kubectl get secret crdb-mtls-auth-certificate
</code></pre>
<p>Describe it to view the keys:</p>
<pre><code class="lang-bash">kubectl describe secret crdb-mtls-auth-certificate
</code></pre>
<p>You should see:</p>
<ul>
<li><p><code>tls.crt</code></p>
</li>
<li><p><code>tls.key</code></p>
</li>
<li><p><code>ca.crt</code></p>
</li>
</ul>
<p>These are the files the application will use.</p>
<p>If we copy the content of <code>tls.crt</code> to our local machine and decode it using the <code>openssl x509...</code> command, we’ll see details similar to those in the <code>client.mtls_auth.crt</code> client certificate we generated earlier, with the Common Name (CN) being <code>mtls_auth</code>.</p>
<h4 id="heading-creating-a-pod-that-connects-using-the-client-certificate">Creating a Pod That Connects Using the Client Certificate</h4>
<p>Now let’s create a simple Pod that uses our new client certificate to connect to CockroachDB.</p>
<p>Create a file called <code>books-pod.yml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">books-pod</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">Never</span>
  <span class="hljs-attr">volumes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-certs</span>
      <span class="hljs-attr">secret:</span>
        <span class="hljs-attr">secretName:</span> <span class="hljs-string">crdb-mtls-auth-certificate</span>
        <span class="hljs-comment"># Make the secret files read-only for the owner only (0400). Without this, the Python app will throw an error. However, this isn't required for every app, just the one used in this tutorial :)</span>
        <span class="hljs-attr">defaultMode:</span> <span class="hljs-number">0400</span>
  <span class="hljs-attr">containers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">books</span>
      <span class="hljs-attr">image:</span> <span class="hljs-string">prince2006/cockroachdb-tutorial-python-app:new</span>
      <span class="hljs-attr">imagePullPolicy:</span> <span class="hljs-string">Always</span>
      <span class="hljs-attr">env:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">DATABASE_URL</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">&gt;-
            postgresql://mtls_auth@crdb-cockroachdb-public.default:26257/defaultdb?sslmode=verify-full&amp;sslrootcert=/crdb-certs/ca.crt&amp;sslcert=/crdb-certs/tls.crt&amp;sslkey=/crdb-certs/tls.key
</span>      <span class="hljs-attr">volumeMounts:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">crdb-certs</span>
          <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/crdb-certs</span>
          <span class="hljs-attr">readOnly:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">resources:</span>
        <span class="hljs-attr">limits:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"100Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"50m"</span>
        <span class="hljs-attr">requests:</span>
          <span class="hljs-attr">memory:</span> <span class="hljs-string">"50Mi"</span>
          <span class="hljs-attr">cpu:</span> <span class="hljs-string">"10m"</span>
</code></pre>
<p>Here’s what’s happening:</p>
<ul>
<li><p>We mount the generated certificate secret into <code>/crdb-certs</code>.</p>
</li>
<li><p>The Python app uses those certificate files (<code>tls.crt</code>, <code>tls.key</code>, <code>ca.crt</code>) to authenticate.</p>
</li>
<li><p>The connection string does <strong>NOT</strong> include a password. CockroachDB authenticates the user entirely via the certificate’s Common Name.</p>
</li>
</ul>
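<p>If you’d rather assemble that connection string in code than hard-code it, the standard library is enough. The following is a sketch, not the tutorial app’s actual code: <code>build_mtls_url</code> is a hypothetical helper, and it simply reproduces the <code>DATABASE_URL</code> from the manifest above.</p>
<pre><code class="lang-python">from urllib.parse import parse_qs, urlencode, urlsplit


def build_mtls_url(user, host, port, database, cert_dir):
    """Build a PostgreSQL-style URL that authenticates with a client certificate."""
    params = {
        "sslmode": "verify-full",
        "sslrootcert": f"{cert_dir}/ca.crt",
        "sslcert": f"{cert_dir}/tls.crt",
        "sslkey": f"{cert_dir}/tls.key",
    }
    # safe="/" keeps the file paths readable instead of percent-encoding them.
    query = urlencode(params, safe="/")
    # There is no password component: the certificate identifies the user.
    return f"postgresql://{user}@{host}:{port}/{database}?{query}"


if __name__ == "__main__":
    url = build_mtls_url(
        "mtls_auth", "crdb-cockroachdb-public.default", 26257, "defaultdb", "/crdb-certs"
    )
    print(url)
    parts = urlsplit(url)
    print(parts.username, parts.password)  # mtls_auth None
    print(parse_qs(parts.query)["sslmode"])  # ['verify-full']
</code></pre>
<p>Keeping the certificate directory as a parameter means the same helper works if you later mount the Secret at a different path.</p>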
<p>Apply the Pod:</p>
<pre><code class="lang-bash">kubectl apply -f books-pod.yml
</code></pre>
<p>After about a minute, view the logs:</p>
<pre><code class="lang-bash">kubectl logs books-pod
</code></pre>
<p>Or if the Pod already restarted:</p>
<pre><code class="lang-bash">kubectl logs -p books-pod
</code></pre>
<p>You should see a successful connection to CockroachDB using the <code>mtls_auth</code> user and a list of books:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763534354156/60114f7b-ba62-4706-a0b7-7629e20bfaaa.png" alt="List of books from our books-pod logs" class="image--center mx-auto" width="896" height="601" loading="lazy"></p>
<p>If you remove the certificate files or try connecting without them, the app will fail – as expected.</p>
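<p>A quick preflight check can turn that failure into a clearer error message. The sketch below is an assumption about how you might do this, not something the tutorial image does. <code>check_cert_files</code> is a hypothetical helper that verifies the three files exist and that the private key isn’t accessible to group or other users. libpq-based drivers such as psycopg typically reject a key with looser permissions, and <code>defaultMode: 0400</code> satisfies that requirement.</p>
<pre><code class="lang-python">import os
import stat
import tempfile
from pathlib import Path

REQUIRED_FILES = ("ca.crt", "tls.crt", "tls.key")


def check_cert_files(cert_dir):
    """Return a list of problems found with the mounted certificate files."""
    problems = []
    for name in REQUIRED_FILES:
        if not (Path(cert_dir) / name).is_file():
            problems.append(f"missing file: {name}")
    key_path = Path(cert_dir) / "tls.key"
    if key_path.is_file():
        mode = stat.S_IMODE(key_path.stat().st_mode)
        # The remainder modulo 0o100 is the group/other permission bits;
        # any of them being set makes the key too widely accessible.
        if mode % 0o100 != 0:
            problems.append(f"tls.key mode {mode:04o} allows group/other access")
    return problems


if __name__ == "__main__":
    # Simulate the mounted Secret with placeholder files in a temp directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in REQUIRED_FILES:
            (Path(tmp) / name).write_text("placeholder")
        os.chmod(Path(tmp) / "tls.key", 0o400)
        print(check_cert_files(tmp))  # []
</code></pre>
<p>In the Pod, you’d call <code>check_cert_files("/crdb-certs")</code> before opening the database connection and log or raise on any reported problem.</p>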
<p><strong>Congratulations!</strong></p>
<p>You’ve officially built a fully secure, production-ready CockroachDB cluster on Kubernetes – complete with:</p>
<ul>
<li><p>End-to-end encryption (TLS)</p>
</li>
<li><p>Mutual TLS authentication (mTLS) for users and apps</p>
</li>
<li><p>Automated, daily backups to Google Cloud Storage</p>
</li>
<li><p>Proper certificate rotation with cert-manager</p>
</li>
</ul>
<h2 id="heading-how-to-get-a-cockroachdb-enterprise-license-for-free">How to Get a CockroachDB Enterprise License for Free</h2>
<p>Okay, so here’s the thing: even though you’ve built a super professional CockroachDB cluster, there’s one small catch. <strong>Without a license, your cluster might be “throttled.”</strong></p>
<p>We know this because the dashboard shows a message warning that our cluster is being throttled.</p>
<p>That means things slow down: queries take longer, performance gets worse, and scaling up won’t magically make it faster. Yeah, it’s real. 🥲</p>
<p>Why does this happen? Because CockroachDB’s “full feature set” is under a special license. If you don’t set a valid license, it limits how many SQL transactions you can run at a time.</p>
<h3 id="heading-three-types-of-licenses">Three Types of Licenses</h3>
<p>Here’s a breakdown of the different kinds of CockroachDB licenses and what they mean for you:</p>
<ol>
<li><p><strong>Trial License</strong></p>
<ul>
<li><p>Valid for <strong>30 days</strong>.</p>
</li>
<li><p>Lets you try all the “Enterprise” features.</p>
</li>
<li><p>You <em>must</em> send telemetry (more on that soon) while the trial is active.</p>
</li>
</ul>
</li>
<li><p><strong>Enterprise License (Paid)</strong></p>
<ul>
<li><p>This is CockroachDB’s “premium / fully paid” version.</p>
</li>
<li><p>You can pick the kind of license based on your environment: “Production”, “Pre-production”, or “Development.”</p>
</li>
<li><p>Companies with more than <strong>$10 million in annual revenue</strong> need to pay for this license.</p>
</li>
<li><p>There <em>are</em> discounts, startup perks, or “free” versions for smaller companies (more below).</p>
</li>
</ul>
</li>
<li><p><strong>Enterprise Free License</strong></p>
<ul>
<li><p>This is the magic one for early-stage companies or startups: it has exactly the same features as the paid Enterprise license. But it’s free if your business makes <strong>under $10 million per year</strong>.</p>
</li>
<li><p>You <em>do</em> need to renew it each year.</p>
</li>
<li><p>Support for this “Free” license is <strong>community-level</strong> (forums, docs), not paid enterprise.</p>
</li>
</ul>
</li>
</ol>
<p><strong>N.B.:</strong> To keep your free license active and <em>not</em> get throttled, CockroachDB requires telemetry. Telemetry means your cluster sends some usage data back to Cockroach Labs. And no, they’re not “stealing your data”. Here’s what that actually means:</p>
<ul>
<li><p>Telemetry includes basic usage stats, cluster health info, and configuration metrics.</p>
</li>
<li><p>It does NOT send your business data, queries, or personal customer data.</p>
</li>
<li><p>It helps Cockroach Labs <em>make sure the free license is used responsibly</em>, and helps them build better features.</p>
</li>
<li><p>If you stop sending telemetry, your cluster will be throttled (slowed down) after 7 days.</p>
</li>
</ul>
<h3 id="heading-how-to-apply-for-the-free-enterprise-license">How to Apply for the Free Enterprise License</h3>
<p>Here’s how you can try to get that free enterprise license:</p>
<ol>
<li><p>Go to the CockroachDB Cloud Console (sign up if you don’t have an account). Then open the “Organization” menu and click “Enterprise Licenses” in the dropdown.</p>
</li>
<li><p>Click the Create License button → Enable the “Find out if my company qualifies for an Enterprise Free license” option.</p>
</li>
<li><p>Fill in the form: your name, company name, job function, and the intended use of the license.</p>
</li>
<li><p>Click “Continue”.</p>
</li>
</ol>
<p>You should see this success message: “Based on your company's intended use, you qualify for an Enterprise Free license.” Now agree to the terms and conditions, then click “Generate License key”.</p>
<p>Learn more about CockroachDB licenses here 👉🏾 <a target="_blank" href="https://www.cockroachlabs.com/docs/stable/licensing-faqs">https://www.cockroachlabs.com/docs/stable/licensing-faqs</a></p>
<h3 id="heading-adding-your-license-to-the-cockroachdb-cluster">Adding Your License to the CockroachDB Cluster</h3>
<p>Now that you’ve gotten your shiny new CockroachDB license (whether it’s the Free one or the Enterprise one), the next step is…actually <em>using it</em>.</p>
<p>Let’s add it to your CockroachDB cluster so it stops shouting “THROTTLED!” at you every time you open the dashboard :)</p>
<p>We’ll do this by updating our CockroachDB Helm configuration.</p>
<h4 id="heading-step-1-update-your-cockroachdb-productionyml">Step 1: Update Your <code>cockroachdb-production.yml</code></h4>
<p>Open your production Helm values file, and inside the <code>init</code> section, add the following:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">init:</span>
<span class="hljs-string">...</span>
    <span class="hljs-attr">provisioning:</span>
        <span class="hljs-attr">enabled:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">clusterSettings:</span>
          <span class="hljs-attr">cluster.organization:</span> <span class="hljs-string">"'&lt;ORGANIZATION&gt;'"</span> <span class="hljs-comment"># Enter the name of your organization here </span>
          <span class="hljs-attr">enterprise.license:</span> <span class="hljs-string">"'&lt;LICENSE&gt;'"</span> <span class="hljs-comment"># Enter your CockroachDB Enterprise license key here</span>
<span class="hljs-string">...</span>
</code></pre>
<p>Now replace:</p>
<ul>
<li><p><code>&lt;ORGANIZATION&gt;</code> with the name of your startup, business, project, or company</p>
</li>
<li><p><code>&lt;LICENSE&gt;</code> with the exact license string CockroachDB gave you</p>
</li>
</ul>
<p>That’s it – super simple.</p>
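<p>One detail worth a second look is the nested quoting. A plausible reading (an assumption here, not something this guide states) is that the chart passes each value into a <code>SET CLUSTER SETTING</code> statement, so string settings need to arrive as single-quoted SQL literals, wrapped in double quotes for YAML. The hypothetical helpers below sketch one way to produce that shape, including an organization name that itself contains an apostrophe:</p>
<pre><code class="lang-python">import json


def sql_literal(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def yaml_setting(name, value):
    """Render one clusterSettings line with the SQL literal inside YAML double quotes."""
    # json.dumps emits a double-quoted string, which is also a valid YAML scalar.
    return f"{name}: {json.dumps(sql_literal(value))}"


# Placeholder organization name, for illustration only.
print(yaml_setting("cluster.organization", "Prince's Bookstore"))
# cluster.organization: "'Prince''s Bookstore'"
</code></pre>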
<h4 id="heading-step-2-apply-the-changes-with-helm">Step 2: Apply the Changes With Helm</h4>
<p>Run your usual Helm upgrade command:</p>
<pre><code class="lang-bash">helm upgrade cockroachdb -f cockroachdb-production.yml cockroachdb/cockroachdb
</code></pre>
<h4 id="heading-step-3-confirm-the-license-was-added-correctly">Step 3: Confirm the License Was Added Correctly</h4>
<p>Now let’s double-check everything worked.</p>
<ol>
<li><p>Connect as the <code>root</code> user. You can use Beekeeper Studio (like we’ve been doing).</p>
</li>
<li><p>Run this query to check your license:</p>
</li>
</ol>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> CLUSTER SETTING enterprise.license;
</code></pre>
<p>If everything went well, you should see your license key printed out in the results.</p>
<h4 id="heading-step-4-make-sure-telemetry-is-enabled-important">Step 4: Make Sure Telemetry Is Enabled (Important!)</h4>
<p>Remember: without telemetry enabled, your cluster will still get throttled, even if you have a valid license 🥲</p>
<p>Run:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SHOW</span> CLUSTER SETTING diagnostics.reporting.enabled;
</code></pre>
<p>If the result says “true”, you're good! Telemetry is on, CockroachDB can verify your license, and your cluster will behave normally without slowing down.</p>
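<p>If you would rather script Steps 3 and 4 than run them by hand, here is a minimal sketch. It assumes a DB-API 2.0 cursor (for example, one from <code>psycopg2</code>) that is already connected as <code>root</code>. The function name, the stand-in cursor, and the placeholder license value are hypothetical and exist only so the example runs without a cluster.</p>
<pre><code class="lang-python">def check_license_and_telemetry(cursor):
    """Return (license_is_set, telemetry_enabled) for the connected cluster."""
    cursor.execute("SHOW CLUSTER SETTING enterprise.license")
    license_key = cursor.fetchone()[0]

    cursor.execute("SHOW CLUSTER SETTING diagnostics.reporting.enabled")
    telemetry = cursor.fetchone()[0]

    # Drivers may return the boolean as a bool or as text, so accept both.
    telemetry_on = telemetry is True or str(telemetry).lower() in ("true", "t", "on")
    return bool(license_key), telemetry_on


class FakeCursor:
    """Stand-in for a real cursor so the sketch runs without a cluster."""

    def __init__(self, settings):
        self._settings = settings
        self._row = None

    def execute(self, query):
        # The setting name is the last word of the SHOW statement.
        name = query.rsplit(" ", 1)[-1]
        self._row = (self._settings[name],)

    def fetchone(self):
        return self._row


cursor = FakeCursor({
    "enterprise.license": "crl-0-placeholder",  # not a real license key
    "diagnostics.reporting.enabled": True,
})
print(check_license_and_telemetry(cursor))  # (True, True)
</code></pre>
<p>With a real connection, you would pass the driver's cursor in place of <code>FakeCursor</code>. If either value comes back <code>False</code>, revisit the matching step above.</p>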
<h2 id="heading-conclusion-amp-next-steps"><strong>Conclusion &amp; Next Steps ✨</strong></h2>
<p>Throughout this book, you’ve gone from “What even is CockroachDB?” to actually running your <strong>own secure, production-ready database</strong> on Kubernetes – and that’s a BIG deal. 🎉</p>
<p>You learned why CockroachDB is special, how it avoids downtime, and why it’s different from the usual databases everyone talks about.</p>
<p>Then you set up your own local environment, practiced everything safely on Minikube, and gradually built your way to a full production setup on GKE.</p>
<p>You explored CockroachDB’s dashboard, checked your cluster’s health, backed up your data to the cloud, and even learned how to keep your database fast, stable, and ready to grow when needed.</p>
<p>Finally, you deployed it on Google Cloud, secured it with encryption and certificates, and connected to it from your own PC – all step-by-step.</p>
<p>By now, you’ve basically gone from curious learner to “I can actually run this thing in production.” 🚀</p>
<p>You’ve covered a lot – and you’ve built something powerful, modern, and production-worthy. Amazing job 👏🏾😁!! And thanks for reading.</p>
<h3 id="heading-about-the-author">About the Author 👨🏾‍💻</h3>
<p>Hi, I’m Prince! I’m a DevOps engineer and Cloud architect passionate about building, deploying, architecting, and managing applications and sharing knowledge with the tech community.</p>
<p>If you enjoyed this book, you can learn more about me by exploring more of my blogs and projects on my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/">LinkedIn profile</a>, and reach out to me on <a target="_blank" href="https://x.com/POnukwili">Twitter (X)</a>. You can find more of my <a target="_blank" href="https://www.linkedin.com/in/prince-onukwili-a82143233/details/publications/">articles here</a> or on <a target="_blank" href="https://www.freecodecamp.org/news/author/onukwilip/">my freeCodeCamp blog</a>.</p>
<p>You can also <a target="_blank" href="https://prince-onuk.vercel.app">visit my website</a>. Let’s connect and grow together! 😊</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn Databases and SQL from Harvard University ]]>
                </title>
                <description>
                    <![CDATA[ Are you ready to master the art of data management using one of the most essential languages in the world of computing? Introducing CS50 SQL, Harvard University's focused video course dedicated to an introduction to databases using a language called ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-databases-and-sql-from-harvard-university/</link>
                <guid isPermaLink="false">68e7c8517c1c20cab7c3737b</guid>
                
                    <category>
                        <![CDATA[ SQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Databases ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Thu, 09 Oct 2025 14:36:01 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760020545784/53c7451f-ea27-4471-afa6-87c1ca44827a.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Are you ready to master the art of data management using one of the most essential languages in the world of computing? Introducing CS50 SQL, Harvard University's focused video course dedicated to an introduction to databases using a language called SQL.</p>
<p>This course was created by Carter Zenke and offers a unique and immersive learning experience that will help you develop the skills you need to excel in managing and manipulating data.</p>
<p>We just posted the entire course on the <a target="_blank" href="http://freeCodeCamp.org">freeCodeCamp.org</a> YouTube channel.</p>
<p>This SQL-focused course offers a deeper dive into relational databases, covering essential topics such as how to create, read, update, and delete data (CRUD), as well as modeling real-world entities.</p>
<p>Here's an overview of the course sections:</p>
<ul>
<li><p>Querying</p>
</li>
<li><p>Relating</p>
</li>
<li><p>Designing</p>
</li>
<li><p>Writing</p>
</li>
<li><p>Viewing</p>
</li>
<li><p>Optimizing</p>
</li>
<li><p>Scaling</p>
</li>
</ul>
<p>The CS50 SQL course offers an extensive range of hands-on opportunities for practice, with assignments inspired by real-world datasets. You'll learn how to normalize data to eliminate redundancies, join tables using primary and foreign keys, automate searches with views, and expedite them with indexes. The course begins with SQLite for portability and ends with introductions to PostgreSQL and MySQL for scalability.</p>
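<p>To give a flavor of those topics, here is a small self-contained sketch using Python's built-in <code>sqlite3</code> module. The table, view, and index names, along with the sample rows, are made up for illustration and are not taken from the course materials.</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: each author is stored once and referenced by key.
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    -- An index speeds up lookups and joins on the foreign key.
    CREATE INDEX books_author_idx ON books(author_id);
    -- A view saves the join so it can be queried like a table.
    CREATE VIEW book_list AS
        SELECT books.title, authors.name AS author
        FROM books JOIN authors ON authors.id = books.author_id;
""")
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace')")
conn.execute(
    "INSERT INTO books (title, author_id) VALUES ('Notes', 1), ('Compilers', 2)"
)

rows = conn.execute("SELECT title, author FROM book_list ORDER BY title").fetchall()
print(rows)  # [('Compilers', 'Grace'), ('Notes', 'Ada')]
</code></pre>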
<p>Watch the full course on <a target="_blank" href="https://youtu.be/WXk7yDqsKxs">the freeCodeCamp.org YouTube channel</a> (11-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/WXk7yDqsKxs" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
