<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Md Tarikul Islam - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Md Tarikul Islam - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Tue, 16 Jun 2026 17:37:54 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/Tarikul001/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ The Saga Pattern in Node.js: How to Roll Back Distributed Transactions Across Microservices ]]>
                </title>
                <description>
                    <![CDATA[ Building reliable workflows across multiple microservices is challenging. In a monolith, a database transaction can ensure that multiple operations either succeed or fail together. But once data is sp ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-saga-pattern-in-node-js-roll-back-distributed-transactions-across-microservices/</link>
                <guid isPermaLink="false">6a2cfc9713c6ff659c6c31d1</guid>
                
                    <category>
                        <![CDATA[ Microservices ]]>
                    </category>
                
                    <category>
                        <![CDATA[ design patterns ]]>
                    </category>
                
                    <category>
                        <![CDATA[ PostgreSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ rollback ]]>
                    </category>
                
                    <category>
                        <![CDATA[ idempotence ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Md Tarikul Islam ]]>
                </dc:creator>
                <pubDate>Sat, 13 Jun 2026 06:45:43 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/b0e126ec-8b90-470a-b5c0-55e5e1673731.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building reliable workflows across multiple microservices is challenging. In a monolith, a database transaction can ensure that multiple operations either succeed or fail together. But once data is spread across different services and databases, that guarantee disappears.</p>
<p>This is where the Saga Pattern comes in. Instead of using distributed transactions, a saga coordinates a sequence of local transactions and runs compensation actions when something goes wrong.</p>
<p>In this article, we'll build an orchestrated Saga Pattern using NestJS, gRPC, PostgreSQL, and Sequelize. You'll learn how to coordinate work across services, implement compensation-based rollbacks, handle idempotency, and track workflow progress in a production-style microservice architecture.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-1-introduction">1. Introduction</a></p>
</li>
<li><p><a href="#heading-2-the-problem-in-one-picture">2. The Problem in One Picture</a></p>
</li>
<li><p><a href="#heading-3-why-you-need-a-saga">3. Why You Need a Saga</a></p>
</li>
<li><p><a href="#heading-4-choreography-vs-orchestration">4. Choreography vs Orchestration</a></p>
<ul>
<li><p><a href="#heading-choreography">Choreography</a></p>
</li>
<li><p><a href="#heading-orchestration">Orchestration</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-5-the-example-project">5. The Example Project</a></p>
</li>
<li><p><a href="#heading-6-architecture">6. Architecture</a></p>
</li>
<li><p><a href="#heading-7-the-saga-flow-step-by-step">7. The Saga Flow, Step by Step</a></p>
</li>
<li><p><a href="#heading-8-the-state-machine">8. The State Machine</a></p>
</li>
<li><p><a href="#heading-9-implementing-the-orchestrator">9. Implementing the Orchestrator</a></p>
<ul>
<li><p><a href="#heading-creating-the-saga-record">Creating the Saga Record</a></p>
</li>
<li><p><a href="#heading-the-main-loop">The Main Loop</a></p>
</li>
<li><p><a href="#heading-a-single-step-in-detail">A Single Step in Detail</a></p>
</li>
<li><p><a href="#heading-habits-worth-copying">Habits Worth Copying</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-10-implementing-the-participant">10. Implementing the Participant</a></p>
</li>
<li><p><a href="#heading-11-rollback-compensation">11. Rollback (Compensation)</a></p>
<ul>
<li><p><a href="#heading-on-the-orchestrator-side">On the Orchestrator Side</a></p>
</li>
<li><p><a href="#heading-on-the-participant-side">On the Participant Side</a></p>
</li>
<li><p><a href="#heading-rules-of-a-good-compensation">Rules of a Good Compensation</a></p>
</li>
<li><p><a href="#heading-what-happens-if-the-compensation-itself-fails">What Happens if the Compensation Itself Fails?</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-12-tracking-idempotency-and-observability">12. Tracking, Idempotency and Observability</a></p>
<ul>
<li><p><a href="#heading-orchestrator-side-agencyonboardingsagas">Orchestrator Side — agency_onboarding_sagas</a></p>
</li>
<li><p><a href="#heading-participant-side-agency_provision_records">Participant Side — agency_provision_records</a></p>
</li>
<li><p><a href="#heading-observability-for-free">Observability for Free</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-13-testing-a-saga">13. Testing a Saga</a></p>
</li>
<li><p><a href="#heading-14-when-not-to-use-a-saga">14. When NOT to Use a Saga</a></p>
</li>
<li><p><a href="#heading-15-trade-offs-and-lessons-learned">15. Trade-offs and Lessons Learned</a></p>
</li>
<li><p><a href="#heading-16-conclusion">16. Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>This article assumes you're already familiar with some backend development concepts. You don't need prior experience with the Saga Pattern, but you should be comfortable with:</p>
<ul>
<li><p>JavaScript, TypeScript, Node.js</p>
</li>
<li><p>NestJS fundamentals (controllers, services, dependency injection)</p>
</li>
<li><p>Basic PostgreSQL concepts</p>
</li>
<li><p>Database transactions</p>
</li>
<li><p>Docker (recommended for local development)</p>
</li>
<li><p>Microservice architecture basics</p>
</li>
<li><p>gRPC fundamentals (helpful but not required)</p>
</li>
</ul>
<p>If you've already built a few backend services with NestJS and PostgreSQL, you'll have everything you need to follow this guide.</p>
<h2 id="heading-1-introduction">1. Introduction</h2>
<p>A <strong>saga</strong> is a sequence of local transactions across multiple services. Each step commits its own database transaction. If a later step fails, the saga runs <strong>compensating transactions</strong> to semantically undo the work already committed.</p>
<p>The pattern was first described by Hector Garcia-Molina and Kenneth Salem in 1987 for long-lived database transactions. It was rediscovered a decade ago when companies started splitting monoliths into microservices and realised that the database transaction — the single most powerful tool in a backend developer's belt — stops working at the service boundary.</p>
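<p>Without the services and networks, the definition above comes down to a short loop: run each local step in order, and if one throws, run the compensations of the steps that already committed, newest first. The sketch below is a hypothetical, synchronous, in-memory illustration. <code>SagaStep</code>, <code>runSaga</code>, and the step names are invented here and are not part of the reference implementation, where every step is an async gRPC call or database write.</p>
<pre><code class="language-ts">// Hypothetical in-memory sketch of the saga idea, not the article's NestJS code.
// Steps are synchronous for brevity; real steps are async RPCs or DB writes.
interface SagaStep {
  name: string;
  action: () => void;     // the local transaction; throws on failure
  compensate: () => void; // semantically undoes a committed action
}

interface SagaResult {
  ok: boolean;
  failedStep?: string;
  compensated: string[]; // step names, in the order they were undone
}

function runSaga(steps: SagaStep[]): SagaResult {
  const committed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      committed.push(step);
    } catch {
      // Undo only the steps that committed, newest first.
      // (A compensation that itself throws escapes here; see Section 11.)
      const compensated: string[] = [];
      for (const done of [...committed].reverse()) {
        done.compensate();
        compensated.push(done.name);
      }
      return { ok: false, failedStep: step.name, compensated };
    }
  }
  return { ok: true, compensated: [] };
}

// Usage: replay the Section 2 scenario, where the agency step fails.
let authRows: string[] = [];
const result = runSaga([
  {
    name: 'CREATE_ORGANIZATION',
    action: () => { authRows.push('organization#42'); },
    compensate: () => { authRows = authRows.filter((r) => r !== 'organization#42'); },
  },
  {
    name: 'CREATE_USER',
    action: () => { authRows.push('user#99'); },
    compensate: () => { authRows = authRows.filter((r) => r !== 'user#99'); },
  },
  {
    name: 'CREATE_AGENCY',
    action: () => { throw new Error('agency-service unavailable'); },
    compensate: () => {},
  },
]);
// result.failedStep is 'CREATE_AGENCY', result.compensated is
// ['CREATE_USER', 'CREATE_ORGANIZATION'], and authRows is empty again.
</code></pre>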
<p>This article walks through an orchestrated saga in Node.js (NestJS + gRPC) for onboarding an agency, where two services must agree on a single business outcome:</p>
<ul>
<li><p><code>agency-service</code> — owns the agency record.</p>
</li>
<li><p><code>auth-service</code> — owns the organization, user and role.</p>
</li>
</ul>
<p>If either side fails, the system must end up as if nothing ever happened. No half-created users, orphan organizations, or 3am Slack threads.</p>
<h2 id="heading-2-the-problem-in-one-picture">2. The Problem in One Picture</h2>
<p>Here's the bug a saga is built to prevent:</p>
<pre><code class="language-plaintext">Step 1: auth-service     ✅ creates Organization #42
Step 2: auth-service     ✅ creates User #99
Step 3: agency-service   ❌ fails (DB down, validation, network blip…)

Result without a saga:
   Organization #42 and User #99 still exist.
   There is no Agency row.
   The user can log in but has nothing to manage.
   Support gets a ticket. Engineer writes a one-off SQL cleanup.
   Repeat every week.
</code></pre>
<p>The saga's job is to detect that step 3 failed and <strong>explicitly delete Organization #42 and User #99</strong>, so the system is consistent again — even though those rows live in a different service's database.</p>
<h2 id="heading-3-why-you-need-a-saga">3. Why You Need a Saga</h2>
<p>In a monolith, you wrap everything in one DB transaction and let the database handle atomicity:</p>
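<pre><code class="language-ts">// One database, one transaction. In a Sequelize managed transaction the
// callback's promise decides the outcome: if it resolves, all three inserts
// commit; if any create() throws, all of them roll back.
await sequelize.transaction(async (tx) =&gt; {
  await Organization.create({ /* ... */ }, { transaction: tx });
  await User.create({ /* ... */ }, { transaction: tx });
  await Agency.create({ /* ... */ }, { transaction: tx });
});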
</code></pre>
<p>In microservices, each service has its own database. You can't wrap two services in one ACID transaction. The classic alternatives all have problems:</p>
<table>
<thead>
<tr>
<th>Option</th>
<th>Problem</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Two-Phase Commit (2PC)</strong></td>
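<td>Every participant holds its locks until the coordinator decides, a coordinator crash can leave participants blocked with those locks held, and throughput drops as more services join. There is also no standard way for services that talk over HTTP/gRPC to take part in one.</td>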
</tr>
<tr>
<td><strong>"Just hope it works"</strong></td>
<td>Leaves orphan users / billing rows when half the flow fails. Real data corruption — and the longer the system runs, the more orphans accumulate.</td>
</tr>
<tr>
<td><strong>Manual cleanup scripts</strong></td>
<td>Works for a week. Bugs hide for months. New engineers don't know they exist.</td>
</tr>
<tr>
<td><strong>Eventual consistency without compensation</strong></td>
<td>Fine for some domains (analytics) but completely wrong for billing, identity, or anything with money.</td>
</tr>
<tr>
<td><strong>Saga pattern</strong></td>
<td>Each service commits locally. The orchestrator owns the workflow and runs explicit compensation on failure. It's auditable, restartable, and easy to reason about.</td>
</tr>
</tbody></table>
<p>The saga gives you eventual consistency with a clear, auditable rollback path — without distributed locks.</p>
<h2 id="heading-4-choreography-vs-orchestration">4. Choreography vs Orchestration</h2>
<p>There are two ways to implement a saga:</p>
<h3 id="heading-choreography">Choreography</h3>
<p>With choreography, services emit events, and other services subscribe and react to them.</p>
<pre><code class="language-plaintext">auth-service → emits "UserCreated"
agency-service → listens, creates agency, emits "AgencyCreated"
billing-service → listens, creates subscription…
</code></pre>
<p>It's simple at first, but brittle later. The workflow is scattered across N codebases. Nobody owns it. Debugging means tracing events across logs. Adding a step means changing several services.</p>
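<p>Here is a hypothetical choreographed version of the onboarding flow that shows what "scattered" means. Node's built-in <code>EventEmitter</code> stands in for a message broker, and it delivers events synchronously, which a real broker would not. <code>UserCreated</code> and <code>AgencyCreated</code> come from the sketch above. The <code>AgencySignupRequested</code> and <code>AgencyCreationFailed</code> events and all handlers are invented for illustration. Notice that the rollback logic lives in yet another handler, in a different service from the failure it reacts to.</p>
<pre><code class="language-ts">import { EventEmitter } from 'node:events';

// Hypothetical stand-in for a message broker (synchronous, unlike a real one).
const bus = new EventEmitter();
const trace: string[] = [];

// auth-service: creates the user on signup, and deletes it when told the agency failed.
bus.on('AgencySignupRequested', (sagaId: string) => {
  trace.push(`auth: user created (${sagaId})`);
  bus.emit('UserCreated', sagaId);
});
bus.on('AgencyCreationFailed', (sagaId: string) => {
  trace.push(`auth: user deleted (${sagaId})`);
});

// agency-service: reacts to UserCreated and announces success or failure.
let agencyDbAvailable = false; // simulate the Section 2 failure; flip for the happy path
bus.on('UserCreated', (sagaId: string) => {
  if (agencyDbAvailable) {
    trace.push(`agency: agency created (${sagaId})`);
    bus.emit('AgencyCreated', sagaId);
  } else {
    trace.push(`agency: create failed (${sagaId})`);
    bus.emit('AgencyCreationFailed', sagaId);
  }
});

bus.emit('AgencySignupRequested', 'saga-1');
// trace now holds three entries: user created, agency create failed, user deleted.
</code></pre>
<p>No single file describes this workflow. To learn what happens on failure, you have to find every subscriber to every event.</p>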
<h3 id="heading-orchestration">Orchestration</h3>
<p>With orchestration, one service acts as the conductor and calls the others in order.</p>
<pre><code class="language-plaintext">orchestrator:
   1. authClient.provisionAccount(...)
   2. agencyRepo.create(...)
   3. authClient.sendWelcomeEmail(...)
</code></pre>
<p>There's slightly more coupling here (the orchestrator imports clients), but the entire workflow lives in one file. Onboarding new engineers becomes a one-hour task. Adding a step is a single PR.</p>
<p><strong>Pick orchestration unless you have a strong reason not to.</strong> This article — and the reference implementation — uses orchestration.</p>
<h2 id="heading-5-the-example-project">5. The Example Project</h2>
<p>Our goal here is to create an Agency in the system. This is the moment a new B2B customer signs up.</p>
<p>It requires two services to agree on a single outcome:</p>
<p><code>auth-service</code> <strong>must create:</strong></p>
<ul>
<li><p>an <code>Organization</code> row (the tenant)</p>
</li>
<li><p>a <code>User</code> row (the agency admin who will log in)</p>
</li>
<li><p>a <code>UserRole</code> row linking the user to the <code>AGENCY_ADMIN</code> role</p>
</li>
</ul>
<p><code>agency-service</code> <strong>must create:</strong></p>
<ul>
<li>an <code>Agency</code> row containing business details (size, registration number, website, branches…), linked to the user/organization above</li>
</ul>
<p>These rows have foreign-key relationships <em>within</em> a service, but <em>not</em> across services — Postgres can't enforce that the user in auth's DB matches the <code>authUserId</code> in agency's DB. The application has to do it.</p>
<pre><code class="language-plaintext">auth-service DB                    agency-service DB
─────────────────                  ─────────────────
organizations ◄╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ agencies.authOrganizationId
   │
   │ (1:1)
   ▼
users ◄╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ agencies.authUserId
   │
   └──────► user_roles

───  foreign key, enforced inside one database
╌╌╌  reference only, no FK across databases
</code></pre>
<p>If the agency-service step fails <em>after</em> the auth-service step has succeeded, we end up with a user who can authenticate but has no agency: the exact bug from Section 2. That's what the saga prevents.</p>
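<p>Neither database can enforce that reference, so a common safety net is a periodic reconciliation job that looks for exactly this kind of orphan. It complements the saga and does not replace it. The sketch below is hypothetical. It assumes you can load auth users and agency rows as plain arrays, and <code>AuthUser</code>, <code>AgencyRow</code>, and <code>findOrphans</code> are invented names.</p>
<pre><code class="language-ts">// Hypothetical reconciliation check across the two databases.
interface AuthUser { id: number; organizationId: number }
interface AgencyRow { id: number; authUserId: number; authOrganizationId: number }

interface OrphanReport {
  usersWithoutAgency: number[];  // the Section 2 bug: can log in, nothing to manage
  agenciesWithoutUser: number[]; // agency rows whose authUserId points nowhere
}

// Assumes `users` holds only agency admins; filter by role before calling it.
function findOrphans(users: AuthUser[], agencies: AgencyRow[]): OrphanReport {
  const userIds = new Set(users.map((u) => u.id));
  const referencedUserIds = new Set(agencies.map((a) => a.authUserId));
  return {
    usersWithoutAgency: users.filter((u) => !referencedUserIds.has(u.id)).map((u) => u.id),
    agenciesWithoutUser: agencies.filter((a) => !userIds.has(a.authUserId)).map((a) => a.id),
  };
}

// Usage: user 99 was provisioned, but its agency row was never written.
const report = findOrphans(
  [{ id: 98, organizationId: 41 }, { id: 99, organizationId: 42 }],
  [{ id: 16, authUserId: 98, authOrganizationId: 41 }],
);
// report.usersWithoutAgency is [99]; report.agenciesWithoutUser is empty.
</code></pre>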
<h2 id="heading-6-architecture">6. Architecture</h2>
<pre><code class="language-plaintext">                     ┌───────────────────────────────┐
                     │        API Gateway            │
                     └──────────────┬────────────────┘
                                    │ HTTP
                                    ▼
   ┌──────────────────────────────────────────────────┐
   │              agency-service                      │
   │   ┌─────────────────────────────────────────┐    │
   │   │   AgencyOnboardingOrchestrator (SAGA)   │    │
   │   └───────────────┬─────────────────────────┘    │
   │                   │ writes state                 │
   │                   ▼                              │
   │      agency_onboarding_sagas  (Postgres)         │
   └───────────────┬─────────────────┬────────────────┘
                   │ gRPC            │ gRPC
       provisionAgencyAccount   compensateAgencyAccount
                   │                 │
                   ▼                 ▼
   ┌──────────────────────────────────────────────────┐
   │              auth-service                        │
   │   AgencyProvisioningService  (Participant)       │
   │                                                  │
   │   organizations · users · user_roles             │
   │   agency_provision_records  ← idempotency log    │
   └──────────────────────────────────────────────────┘
</code></pre>
<p>Three components do all the work:</p>
<ol>
<li><p><code>AgencyOnboardingOrchestrator</code> in <code>agency-service</code> — drives the workflow.</p>
</li>
<li><p><code>agency_onboarding_sagas</code> table in <code>agency-service</code> — the durable log of the saga's progress.</p>
</li>
<li><p><code>AgencyProvisioningService</code> in <code>auth-service</code> — exposes a <code>do</code> operation (<code>provisionAgencyAccount</code>) and an <code>undo</code> operation (<code>compensateAgencyAccount</code>). It's backed by its own <code>agency_provision_records</code> idempotency table.</p>
</li>
</ol>
<p>The orchestrator never reaches into the auth database directly. The gRPC API is the only way across the service boundary.</p>
<h2 id="heading-7-the-saga-flow-step-by-step">7. The Saga Flow, Step by Step</h2>
<p>This sequence diagram shows the complete lifecycle of the onboarding saga. The workflow begins when a client sends a request to create a new agency. The orchestrator first creates a saga record in its database and marks it as <code>STARTED</code>, giving it a durable record of the workflow before any business action takes place.</p>
<p>Next, the orchestrator asks <code>auth-service</code> to provision the organization, user, and role. Once that succeeds, it creates the agency record in its own database.</p>
<p>If every step succeeds, the saga reaches the <code>COMPLETED</code> state. If the agency creation fails after the auth resources have already been created, the orchestrator triggers a compensation step that instructs <code>auth-service</code> to remove everything it previously provisioned.</p>
<p>The key idea is that each service commits its own local transaction, while the saga coordinates the overall business workflow and ensures the system can return to a consistent state when failures occur.</p>
<pre><code class="language-mermaid">sequenceDiagram
    autonumber
    participant C as Client
    participant AS as agency-service&lt;br/&gt;Orchestrator
    participant DB1 as saga store
    participant AU as auth-service
    participant DB2 as auth DB

    C-&gt;&gt;AS: POST /agencies
    AS-&gt;&gt;DB1: INSERT saga (STARTED, payload)
    AS-&gt;&gt;AU: provisionAgencyAccount(sagaId, …)
    AU-&gt;&gt;DB2: BEGIN TX
    AU-&gt;&gt;DB2: create org + user + role + provision_record
    AU-&gt;&gt;DB2: COMMIT
    AU--&gt;&gt;AS: { userId, organizationId, roleId }
    AS-&gt;&gt;DB1: UPDATE saga (AUTH_PROVISIONED)
    AS-&gt;&gt;AS: create Agency row
    alt Agency row OK
        AS-&gt;&gt;DB1: UPDATE saga (AGENCY_CREATED → COMPLETED)
        AS-&gt;&gt;AU: sendAgencyWelcomeEmail (non-critical)
        AS--&gt;&gt;C: 200 OK + sagaId
    else Agency row fails
        AS-&gt;&gt;DB1: UPDATE saga (COMPENSATING)
        AS-&gt;&gt;AU: compensateAgencyAccount(sagaId)
        AU-&gt;&gt;DB2: BEGIN TX
        AU-&gt;&gt;DB2: delete role + token + user + org + record
        AU-&gt;&gt;DB2: COMMIT
        AS-&gt;&gt;DB1: UPDATE saga (COMPENSATED → FAILED)
        AS--&gt;&gt;C: 5xx + error code
    end
</code></pre>
<p>Read this once top to bottom and you'll understand the entire onboarding workflow. That's the value of orchestration — the sequence diagram <em>is</em> the architecture.</p>
<h2 id="heading-8-the-state-machine">8. The State Machine</h2>
<p>Every transition is written to <code>agency_onboarding_sagas</code> <strong>before</strong> the next step runs. That is what makes the saga observable and recoverable.</p>
<pre><code class="language-ts">export enum AgencyOnboardingSagaStatus {
  STARTED            = 'STARTED',            // Row exists, no side effects yet
  AUTH_PROVISIONED   = 'AUTH_PROVISIONED',   // Auth side committed
  AGENCY_CREATED     = 'AGENCY_CREATED',     // Agency row committed
  COMPLETED          = 'COMPLETED',          // Happy-path terminal state
  COMPENSATING       = 'COMPENSATING',       // Rollback in progress
  COMPENSATED        = 'COMPENSATED',        // Rollback finished
  FAILED             = 'FAILED',             // Terminal failure (with or without compensation)
}
</code></pre>
<p>Why so many states? Because <em>"what went wrong here?"</em> is a question someone will ask at 2am. A saga that only stores <code>success | failure</code> is useless for forensics.</p>
<pre><code class="language-plaintext">STARTED ──► AUTH_PROVISIONED ──► AGENCY_CREATED ──► COMPLETED  (happy path)
   │               │
   │ auth fails    │ agency fails
   ▼               ▼
FAILED       COMPENSATING ──► COMPENSATED ──► FAILED  (consistent again)
(nothing to compensate)
</code></pre>
<p>The “point of no return” is <code>AUTH_PROVISIONED</code>. Before it, we can fail fast — there's nothing to undo. After it, every failure path <em>must</em> go through compensation.</p>
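<p>One way to keep transitions honest is to declare the allowed edges next to the statuses and check them before every status update. The sketch below encodes the diagram above, using the enum's string values as a union type. <code>ALLOWED_TRANSITIONS</code> and <code>assertTransition</code> are hypothetical helpers, not part of the reference implementation. If your defensive catch-all can compensate from other states, you need to add those edges as well.</p>
<pre><code class="language-ts">// Hypothetical transition guard; the statuses are the enum's string values.
type SagaStatus =
  | 'STARTED' | 'AUTH_PROVISIONED' | 'AGENCY_CREATED' | 'COMPLETED'
  | 'COMPENSATING' | 'COMPENSATED' | 'FAILED';

// Allowed next states, taken from the diagram; terminal states have none.
const ALLOWED_TRANSITIONS: { [From in SagaStatus]: SagaStatus[] } = {
  STARTED: ['AUTH_PROVISIONED', 'FAILED'],
  AUTH_PROVISIONED: ['AGENCY_CREATED', 'COMPENSATING'],
  AGENCY_CREATED: ['COMPLETED'],
  COMPLETED: [],
  COMPENSATING: ['COMPENSATED'],
  COMPENSATED: ['FAILED'],
  FAILED: [],
};

// Call before every status change, so an illegal jump fails loudly instead of
// silently corrupting the saga log.
function assertTransition(from: SagaStatus, to: SagaStatus): void {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal saga transition ${from} -> ${to}`);
  }
}

// Usage
assertTransition('AUTH_PROVISIONED', 'COMPENSATING'); // allowed: the agency step failed
try {
  assertTransition('COMPLETED', 'COMPENSATING'); // a finished saga must never roll back
} catch (err) {
  console.log((err as Error).message); // Illegal saga transition COMPLETED -> COMPENSATING
}
</code></pre>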
<h2 id="heading-9-implementing-the-orchestrator">9. Implementing the Orchestrator</h2>
<p>The orchestrator is the <em>only</em> place that knows the workflow. Each step is a private method, and each step persists its result before returning.</p>
<h3 id="heading-creating-the-saga-record">Creating the Saga Record</h3>
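<pre><code class="language-ts">// agency-onboarding.saga.repository.ts
import { randomUUID } from 'node:crypto';

// Method of the saga repository class (class body trimmed for the article)
async createSaga(payload: CreateAgencyOrchestrationInput) {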
  return this.sagaModel.create({
    sagaId: randomUUID(),                          // correlation id for everything
    status: AgencyOnboardingSagaStatus.STARTED,
    currentStep: 'STARTED',
    payload,                                       // full input snapshot for replay
  });
}
</code></pre>
<p>The <code>sagaId</code> is a UUID generated once and <strong>propagated to every downstream call</strong>. It's the single identifier that ties the saga log on the orchestrator side to the provision record on the participant side.</p>
<h3 id="heading-the-main-loop">The Main Loop</h3>
<pre><code class="language-ts">// agency-onboarding.orchestrator.ts (trimmed for the article)
async execute(input: CreateAgencyOrchestrationInput) {
  const saga = await this.sagaRepository.createSaga(input); // STARTED
  // Track the newest saga row in the outer scope, so the catch-all below
  // compensates with the auth IDs saved by provisionAuth, not the STARTED row.
  let activeSaga = saga;

  try {
    // Step 1: auth-service work
    const authStep = await this.provisionAuth(saga, input);
    if (!authStep.ok) {
      await this.markFailed(saga, authStep.failure); // nothing to compensate
      return authStep.failure;
    }

    // Step 2: agency-service work
    activeSaga = authStep.saga; // status: AUTH_PROVISIONED
    try {
      activeSaga = await this.createAgencyRow(activeSaga, input, authStep.authIds);
    } catch (err) {
      // The expensive case: undo what auth-service did
      await this.compensateAuth(activeSaga, 'SAGA_FAILED');
      const failure = mapSagaFailure((err as Error).message, 'SAGA_FAILED', 'CREATE_AGENCY');
      await this.markFailed(activeSaga, failure);
      return failure;
    }

    // Step 3: mark done and run non-critical side effects
    activeSaga = await this.sagaRepository.updateSaga(activeSaga, {
      status: AgencyOnboardingSagaStatus.COMPLETED,
    });
    await this.sendWelcomeEmail(input, activeSaga); // best-effort: must not throw

    return mapSagaSuccess(activeSaga, await this.agencyModel.findByPk(activeSaga.agencyId!));
  } catch (error) {
    // Defensive catch-all (lost DB connection, unexpected throw)
    const failure = mapSagaFailure((error as Error).message, 'SAGA_FAILED', 'SAGA');
    if (activeSaga.status === AgencyOnboardingSagaStatus.COMPLETED) {
      return failure; // both sides committed: report the error, never roll back
    }
    await this.compensateAuth(activeSaga, 'SAGA_FAILED');
    await this.markFailed(activeSaga, failure);
    return failure;
  }
}
</code></pre>
<h3 id="heading-a-single-step-in-detail">A Single Step in Detail</h3>
<pre><code class="language-ts">private async provisionAuth(saga: AgencyOnboardingSaga, input: ...) {
  this.logger.log(`[${saga.sagaId}] PROVISION_AUTH`);

  const auth = await firstValueFrom(
    this.authClient.provisionAgencyAccount({
      sagaId: saga.sagaId,                  // &lt;-- correlation
      organizationName: input.agencyName.trim(),
      email: input.email.trim().toLowerCase(),
      // …
    }),
  );

  if (!auth.status || !auth.data) {
    return { ok: false, failure: mapAuthProvisionFailure(auth) };
  }

  // Persist the IDs we will need if we have to compensate later
  const updated = await this.sagaRepository.updateSaga(saga, {
    authOrganizationId: Number(auth.data.organizationId),
    authUserId: Number(auth.data.userId),
    authUserRoleId: Number(auth.data.userRoleId),
    status: AgencyOnboardingSagaStatus.AUTH_PROVISIONED,
  });

  return { ok: true, saga: updated, authIds: auth.data };
}
</code></pre>
<p>The line that does most of the work is the <code>updateSaga</code> call. It stores the foreign IDs returned by <code>auth-service</code> on the saga row, so even if the orchestrator process crashes and restarts, a recovery job can read that row and still know what to compensate.</p>
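<p>A recovery job built on that idea might look like the sketch below. It is hypothetical. It assumes saga rows can be loaded as plain objects and that compensation is idempotent (see Section 11). The status-to-action policy, the names, and the threshold are illustrative choices, not the reference implementation's behavior.</p>
<pre><code class="language-ts">// Hypothetical recovery sweep over agency_onboarding_sagas rows.
type SagaStatus =
  | 'STARTED' | 'AUTH_PROVISIONED' | 'AGENCY_CREATED' | 'COMPLETED'
  | 'COMPENSATING' | 'COMPENSATED' | 'FAILED';

interface SagaRow {
  sagaId: string;
  status: SagaStatus;
  updatedAt: Date;
}

type RecoveryAction = 'COMPENSATE' | 'MARK_COMPLETED' | 'MARK_FAILED';

// One possible policy: roll back while auth-service may hold side effects,
// roll forward once both services have committed.
function recoveryActionFor(status: SagaStatus): RecoveryAction | null {
  switch (status) {
    case 'STARTED':          // the provision RPC may have committed before a crash;
    case 'AUTH_PROVISIONED': // the participant can find its rows by sagaId
    case 'COMPENSATING':     // replaying an idempotent compensation is safe
      return 'COMPENSATE';
    case 'AGENCY_CREATED':
      return 'MARK_COMPLETED';
    case 'COMPENSATED':
      return 'MARK_FAILED';
    default:
      return null; // COMPLETED and FAILED are terminal
  }
}

// Rows untouched for longer than staleAfterMs are assumed to be orphaned by a crash.
function findStuckSagas(rows: SagaRow[], now: Date, staleAfterMs: number) {
  return rows
    .filter((row) => now.getTime() - row.updatedAt.getTime() > staleAfterMs)
    .map((row) => ({ sagaId: row.sagaId, action: recoveryActionFor(row.status) }))
    .filter((item) => item.action !== null);
}

// Usage: only saga 'a' is both stale and unfinished.
const now = new Date('2026-05-22T10:30:00Z');
const stuck = findStuckSagas(
  [
    { sagaId: 'a', status: 'AUTH_PROVISIONED', updatedAt: new Date('2026-05-22T10:14:33Z') },
    { sagaId: 'b', status: 'COMPLETED', updatedAt: new Date('2026-05-22T10:00:00Z') },
    { sagaId: 'c', status: 'COMPENSATING', updatedAt: new Date('2026-05-22T10:29:30Z') },
  ],
  now,
  5 * 60 * 1000, // five minutes
);
// stuck holds one entry: saga 'a' with action 'COMPENSATE'.
</code></pre>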
<h3 id="heading-habits-worth-copying">Habits Worth Copying</h3>
<ul>
<li><p><strong>Persist after every successful step</strong>, including the IDs you'll need to undo it.</p>
</li>
<li><p><strong>Distinguish critical vs non-critical steps.</strong> Welcome emails, audit logs and analytics events are <em>not</em> worth rolling a saga back for. They're best-effort.</p>
</li>
<li><p><strong>One log line per transition</strong>, prefixed with <code>[${sagaId}]</code>. Grep is your debugger.</p>
</li>
</ul>
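<p>The last two habits take only a few lines each. Here is a hypothetical <code>bestEffort</code> helper. It runs a non-critical side effect, logs the outcome with the <code>[sagaId]</code> prefix, and keeps any error away from the saga's failure path. The logger is a plain callback so the sketch stays self-contained. In the NestJS service you would pass something like <code>(line) =&gt; this.logger.warn(line)</code>.</p>
<pre><code class="language-ts">// Hypothetical helpers; not part of the reference implementation.
type LogFn = (line: string) => void;

// One log line per transition, always prefixed with the saga's correlation id.
function sagaLine(sagaId: string, message: string): string {
  return `[${sagaId}] ${message}`;
}

// Runs a non-critical step. Failures are logged and swallowed, never rethrown,
// so a failed welcome email can't trigger compensation.
async function bestEffort(sagaId: string, label: string, task: () => unknown, log: LogFn) {
  try {
    await task();
  } catch (err) {
    log(sagaLine(sagaId, `${label} FAILED (ignored): ${(err as Error).message}`));
    return false;
  }
  log(sagaLine(sagaId, `${label} OK`));
  return true;
}

// Usage
const lines: string[] = [];
const sendWelcomeEmail = async () => {
  throw new Error('SMTP timeout');
};
bestEffort('0a4f3e2c', 'SEND_WELCOME_EMAIL', sendWelcomeEmail, (line) => lines.push(line))
  .then((ok) => console.log(ok, lines[0]));
// false [0a4f3e2c] SEND_WELCOME_EMAIL FAILED (ignored): SMTP timeout
</code></pre>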
<h2 id="heading-10-implementing-the-participant">10. Implementing the Participant</h2>
<p>The participant (<code>auth-service</code>) wraps all of its own work in a local DB transaction. Inside that boundary it's still ACID — the saga only handles the cross-service problem.</p>
<pre><code class="language-ts">// agency-provisioning.service.ts (trimmed)
async provisionAgencyAccount(req: ProvisionAgencyAccountInput) {

  // 1. Idempotency — return the previous result if this sagaId already provisioned.
  const existing = await this.provisionRecordModel.findOne({
    where: { sagaId: req.sagaId },
  });
  if (existing) {
    return serviceSuccess('Agency admin already onboarded', {
      userId: Number(existing.userId),
      organizationId: Number(existing.organizationId),
      userRoleId: Number(existing.roleId),
    });
  }

  // 2. Domain validation BEFORE the transaction (fail fast).
  if (await this.emailExists(req.email)) {
    return serviceFailure('Email already exists', { code: 'EMAIL_EXISTS' });
  }
  if (await this.organizationExists(req.organizationName)) {
    return serviceFailure('Organization already exists', { code: 'ORGANIZATION_EXISTS' });
  }

  // 3. The actual work, atomic at the auth-service boundary.
  //    (agencyAdminRole is the AGENCY_ADMIN role row; its lookup is trimmed here.)
  return withSequelizeTransaction(this.sequelize, async (tx) =&gt; {
    const org = await this.organizationModel.create({ ... }, { transaction: tx });
    const user = await this.userModel.create({ ..., organizationId: org.id }, { transaction: tx });
    await this.userRoleModel.create({ userId: user.id, roleId: agencyAdminRole.id }, { transaction: tx });

    // The audit record that makes compensation possible later.
    await this.provisionRecordModel.create(
      { sagaId: req.sagaId, organizationId: org.id, userId: user.id, roleId: agencyAdminRole.id },
      { transaction: tx },
    );

    return serviceSuccess('Provisioned', {
      userId: user.id, organizationId: org.id, userRoleId: agencyAdminRole.id,
    });
  });
}
</code></pre>
<p>Three things make this method "saga-safe":</p>
<ol>
<li><p><strong>Idempotency check first:</strong> If the orchestrator retries (network blip, gRPC timeout), the second call is a no-op that returns the same IDs. No duplicate users.</p>
</li>
<li><p><strong>Validation outside the transaction:</strong> Cheap reads first, expensive writes second.</p>
</li>
<li><p><strong>One transaction wraps every write:</strong> If any insert fails, the whole thing rolls back automatically. The orchestrator sees a clean failure response and knows nothing was persisted.</p>
</li>
</ol>
<p>The <code>agency_provision_records</code> table is the single most important piece of the participant. It's <strong>both</strong> the idempotency key <em>and</em> the compensation lookup — keyed by the same <code>sagaId</code> the orchestrator uses.</p>
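<p>Here is a hypothetical in-memory model of the participant that shows how one table can serve both purposes. A plain object stands in for <code>agency_provision_records</code>, keyed by <code>sagaId</code>. The class name and its ID counter are invented for illustration. In a real database you would also want a unique constraint on <code>sagaId</code>, so that two concurrent first calls can't both pass the lookup and insert twice.</p>
<pre><code class="language-ts">// Hypothetical in-memory model of the participant; a plain object plays
// the role of the agency_provision_records table, keyed by sagaId.
interface ProvisionRecord {
  sagaId: string;
  organizationId: number;
  userId: number;
}

class InMemoryProvisioning {
  private records: { [sagaId: string]: ProvisionRecord | undefined } = {};
  private nextId = 1;
  liveUserIds: number[] = []; // public only so the demo can inspect it

  provision(sagaId: string): ProvisionRecord {
    // Idempotency key: a retried call returns the original IDs, no duplicates.
    const existing = this.records[sagaId];
    if (existing) return existing;

    const record = { sagaId, organizationId: this.nextId++, userId: this.nextId++ };
    this.liveUserIds.push(record.userId);
    this.records[sagaId] = record;
    return record;
  }

  compensate(sagaId: string): void {
    // Compensation lookup: the same record says what to undo. Once it is
    // deleted, a replayed compensation finds nothing and does nothing.
    const record = this.records[sagaId];
    if (!record) return;
    this.liveUserIds = this.liveUserIds.filter((id) => id !== record.userId);
    delete this.records[sagaId];
  }
}

// Usage
const participant = new InMemoryProvisioning();
const first = participant.provision('saga-1');
const retry = participant.provision('saga-1'); // e.g. after a gRPC timeout
console.log(first === retry, participant.liveUserIds.length); // true 1
participant.compensate('saga-1');
participant.compensate('saga-1'); // replayed by an operator: a safe no-op
console.log(participant.liveUserIds.length); // 0
</code></pre>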
<h2 id="heading-11-rollback-compensation">11. Rollback (Compensation)</h2>
<p>Compensation is just another gRPC call. The orchestrator sends the <code>sagaId</code> and the IDs it remembers. The participant deletes everything it created, <strong>in reverse dependency order</strong>, inside its own DB transaction.</p>
<h3 id="heading-on-the-orchestrator-side">On the Orchestrator Side</h3>
<pre><code class="language-ts">private async compensateAuth(saga: AgencyOnboardingSaga, errorCode?: string) {
  if (!saga.authUserId &amp;&amp; !saga.authOrganizationId) {
    // Nothing was provisioned — nothing to compensate.
    return;
  }

  // Mark the saga as compensating BEFORE the call, so the row is consistent
  // even if the compensating RPC times out.
  await this.sagaRepository.updateSaga(saga, {
    status: AgencyOnboardingSagaStatus.COMPENSATING,
    currentStep: 'COMPENSATING',
    errorCode,
  });

  try {
    const rollback = await firstValueFrom(this.authClient.compensateAgencyAccount({
      sagaId: saga.sagaId,
      organizationId: saga.authOrganizationId,
      userId: saga.authUserId,
    }));
    if (!rollback.status) {
      this.logger.error(`[${saga.sagaId}] Auth compensation returned failure: ${rollback.message}`);
    }
  } catch (err) {
    this.logger.error(`[${saga.sagaId}] Auth compensation RPC failed: ${(err as Error).message}`);
  }

  await this.sagaRepository.updateSaga(saga, {
    status: AgencyOnboardingSagaStatus.COMPENSATED,
    currentStep: 'COMPENSATED',
  });
}
</code></pre>
<h3 id="heading-on-the-participant-side">On the Participant Side</h3>
<pre><code class="language-ts">private async rollbackProvisionedAuth(req, sagaId: string, tx: Transaction) {
  // Use the saga log as the source of truth — even if the caller forgot IDs.
  const record = await this.provisionRecordModel.findOne({
    where: { sagaId }, transaction: tx,
  });
  const userId         = req.userId         ?? record?.userId;
  const organizationId = req.organizationId ?? record?.organizationId;

  if (userId) {
    const user = await this.userModel.findByPk(userId, { transaction: tx, attributes: ['email'] });
    await this.userRoleModel.destroy({ where: { userId }, transaction: tx });
    if (user?.email) {
      await this.passwordResetTokenModel.destroy({ where: { email: user.email }, transaction: tx });
    }
    await this.userModel.destroy({ where: { id: userId }, transaction: tx });
  }
  if (organizationId) {
    await this.organizationModel.destroy({ where: { id: organizationId }, transaction: tx });
  }
  if (record) {
    await record.destroy({ transaction: tx });
  }
}
</code></pre>
<h3 id="heading-rules-of-a-good-compensation">Rules of a Good Compensation</h3>
<ol>
<li><p><strong>Reverse the order of creation:</strong> Children first (user_roles, tokens), then parents (users, organizations). The same rule you follow for <code>DROP TABLE</code> statements.</p>
</li>
<li><p><strong>Be idempotent:</strong> Receiving the same <code>sagaId</code> twice must be safe — every <code>destroy</code> is a no-op if the row is already gone.</p>
</li>
<li><p><strong>Use the saga log, not just the request:</strong> If the caller forgets an ID or sends a partial payload, look it up by <code>sagaId</code>. Defence in depth.</p>
</li>
<li><p><strong>Wrap it in a local transaction:</strong> The rollback must itself be atomic — half-undone is worse than not-undone.</p>
</li>
<li><p><strong>Always close the loop on the orchestrator side:</strong> Mark <code>COMPENSATED</code> even if the RPC failed. The failure should also be surfaced (log, metric, alert). A stuck <code>COMPENSATING</code> row is an operational landmine.</p>
</li>
</ol>
<h3 id="heading-what-happens-if-the-compensation-itself-fails">What Happens if the Compensation Itself Fails?</h3>
<p>This is the worst case in any saga design. There are three reasonable strategies:</p>
<p>First, you can retry with exponential backoff. This works for transient failures (network, deadlocks).</p>
<p>Second, you can dead-letter the saga — write it to a "needs human attention" queue and alert.</p>
<p>Third, you can expose a manual rollback endpoint. This reference implementation does that via <code>RollbackAgencyOnboarding</code> gRPC, so an operator can replay compensation with the same <code>sagaId</code>.</p>
<p>A production system should combine all three. The pattern doesn't decide for you. <em>You</em> decide based on your business risk.</p>
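<p>The first two strategies fit into one helper. This is a minimal sketch. It assumes <code>compensateFn</code> wraps whatever calls the participant (for example the <code>compensateAgencyAccount</code> RPC) and that <code>deadLetter</code> is a hypothetical hook that writes to your "needs human attention" queue. <code>compensateWithRetry</code> and the <code>DEAD_LETTERED</code> result are illustrative names, not part of the reference implementation.</p>

```javascript
// Sleep helper for the backoff delay.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retries compensateFn with exponential backoff plus jitter.
// If every attempt fails, the saga goes to deadLetter and the last error
// is returned so the caller can surface it (log, metric, alert).
async function compensateWithRetry(sagaId, compensateFn, deadLetter, options = {}) {
  const maxAttempts = options.maxAttempts ?? 5;
  const baseDelayMs = options.baseDelayMs ?? 200;
  let lastError;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await compensateFn(sagaId);
      return { sagaId, status: "COMPENSATED", attempts: attempt };
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      // 200ms, 400ms, 800ms, ... plus up to 50% random jitter
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await sleep(delay + Math.random() * delay * 0.5);
    }
  }

  // Hypothetical terminal result: hand the saga to a human.
  await deadLetter({ sagaId, error: String(lastError) });
  return { sagaId, status: "DEAD_LETTERED", attempts: maxAttempts, error: lastError };
}
```

<p>Retrying is only safe because the compensation itself is idempotent. If a timed-out attempt actually succeeded, the next attempt simply finds nothing left to delete.</p>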
<h2 id="heading-12-tracking-idempotency-and-observability">12. Tracking, Idempotency and Observability</h2>
<p>Two tables, both keyed by the same UUID <code>sagaId</code>, give you full traceability across services.</p>
<h3 id="heading-orchestrator-side-agencyonboardingsagas">Orchestrator Side — <code>agency_onboarding_sagas</code></h3>
<table>
<thead>
<tr>
<th>column</th>
<th>purpose</th>
</tr>
</thead>
<tbody><tr>
<td><code>sagaId</code> (UUID, unique)</td>
<td>Propagated to every RPC. The join key across services.</td>
</tr>
<tr>
<td><code>status</code></td>
<td>Current state in the state machine.</td>
</tr>
<tr>
<td><code>currentStep</code></td>
<td>Human-readable label for dashboards (<code>PROVISION_AUTH</code>, <code>CREATE_AGENCY</code>…).</td>
</tr>
<tr>
<td><code>payload</code> (JSONB)</td>
<td>Snapshot of the input — used for replay, debug, support.</td>
</tr>
<tr>
<td><code>authOrganizationId</code>, <code>authUserId</code>, <code>authUserRoleId</code></td>
<td>Foreign IDs needed for compensation.</td>
</tr>
<tr>
<td><code>agencyId</code></td>
<td>Set once the agency row exists.</td>
</tr>
<tr>
<td><code>errorCode</code>, <code>errorMessage</code></td>
<td>Filled on failure.</td>
</tr>
<tr>
<td><code>createdAt</code>, <code>updatedAt</code></td>
<td>Timeline for the saga.</td>
</tr>
</tbody></table>
<p>A real row in <code>COMPLETED</code> state looks roughly like this:</p>
<pre><code class="language-json">{
  "sagaId": "0a4f3e2c-7b11-4f8d-9a2c-90b6f5f5b8a1",
  "status": "COMPLETED",
  "currentStep": "COMPLETED",
  "agencyId": 17,
  "authOrganizationId": 42,
  "authUserId": 99,
  "authUserRoleId": 3,
  "errorCode": null,
  "errorMessage": null,
  "payload": { "agencyName": "Acme Education", "email": "admin@acme.com", "...": "..." },
  "createdAt": "2026-05-22T10:14:32.118Z",
  "updatedAt": "2026-05-22T10:14:33.412Z"
}
</code></pre>
<h3 id="heading-participant-side-agencyprovisionrecords">Participant Side — <code>agency_provision_records</code></h3>
<table>
<thead>
<tr>
<th>column</th>
<th>purpose</th>
</tr>
</thead>
<tbody><tr>
<td><code>sagaId</code> (unique)</td>
<td>Idempotency key. The same <code>sagaId</code> from the orchestrator.</td>
</tr>
<tr>
<td><code>userId</code>, <code>organizationId</code>, <code>roleId</code></td>
<td>What to delete on compensation.</td>
</tr>
<tr>
<td><code>createdAt</code>, <code>updatedAt</code></td>
<td>Audit timestamps.</td>
</tr>
</tbody></table>
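<p>The unique <code>sagaId</code> column is what turns a retried RPC into a no-op. Below is a minimal sketch of that check. It assumes an in-memory <code>Map</code> in place of <code>agency_provision_records</code> and a hypothetical <code>createAccount</code> callback in place of the real user and organization inserts. In the actual service, the lookup and the inserts belong in one local transaction, and the unique constraint catches any concurrent duplicate that the lookup misses.</p>

```javascript
// sagaId -> { userId, organizationId, roleId }; stands in for agency_provision_records.
const provisionRecords = new Map();

// Returns the stored result when the same sagaId arrives again,
// so a retry from the orchestrator never creates duplicate rows.
async function provisionAccount(sagaId, input, createAccount) {
  const existing = provisionRecords.get(sagaId);
  if (existing) {
    return { ...existing, replayed: true };
  }
  const created = await createAccount(input); // { userId, organizationId, roleId }
  const record = {
    userId: created.userId,
    organizationId: created.organizationId,
    roleId: created.roleId,
  };
  provisionRecords.set(sagaId, record);
  return { ...record, replayed: false };
}
```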
<h3 id="heading-observability-for-free">Observability for Free</h3>
<p>Because every log line is prefixed with <code>[${sagaId}]</code>, a single grep across both services gives the full timeline:</p>
<pre><code class="language-plaintext">[0a4f3e2c…] PROVISION_AUTH                  agency-service
[0a4f3e2c…] provisionAgencyAccount: ok      auth-service
[0a4f3e2c…] CREATE_AGENCY                   agency-service
[0a4f3e2c…] Agency step failed: ...         agency-service
[0a4f3e2c…] Auth compensation completed     auth-service
</code></pre>
<p>In a structured-logging setup (Loki, Elasticsearch, Datadog) this becomes a one-click filter. <strong>The <code>sagaId</code> is your distributed trace.</strong></p>
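<p>The easiest way to get that prefix onto every line is a tiny logger factory bound to one <code>sagaId</code>. This is a sketch that assumes plain <code>console</code> output. <code>createSagaLogger</code> is a hypothetical helper, and in NestJS you would more likely wrap the framework's <code>Logger</code>. The logger emits JSON rather than only a text prefix, so a log backend can index <code>sagaId</code> as a field.</p>

```javascript
// Builds a logger whose every line carries the saga's ID, both as a
// human-readable prefix in `msg` and as a structured `sagaId` field.
function createSagaLogger(sagaId, service, sink = console.log) {
  function write(level, message, extra = {}) {
    const line = {
      time: new Date().toISOString(),
      level,
      service,
      sagaId,
      msg: `[${sagaId}] ${message}`,
      ...extra,
    };
    sink(JSON.stringify(line));
    return line;
  }
  return {
    info: (message, extra) => write("info", message, extra),
    error: (message, extra) => write("error", message, extra),
  };
}
```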
<h2 id="heading-13-testing-a-saga">13. Testing a Saga</h2>
<p>A saga is just a state machine, so the test matrix is finite and small. Cover at least these cases:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>Scenario</th>
<th>Expected end state</th>
</tr>
</thead>
<tbody><tr>
<td>1</td>
<td>Happy path</td>
<td><code>COMPLETED</code>, agency exists, user exists</td>
</tr>
<tr>
<td>2</td>
<td>Auth step fails (e.g. email exists)</td>
<td><code>FAILED</code>, no rows on either side</td>
</tr>
<tr>
<td>3</td>
<td>Agency step fails</td>
<td><code>COMPENSATED</code>, auth rows gone, no agency</td>
</tr>
<tr>
<td>4</td>
<td>Compensation RPC times out</td>
<td><code>COMPENSATING</code> → operator-driven recovery</td>
</tr>
<tr>
<td>5</td>
<td>Caller retries with the same <code>sagaId</code></td>
<td>Second call returns the first call's result; no duplicate rows</td>
</tr>
<tr>
<td>6</td>
<td>Welcome email fails</td>
<td><code>COMPLETED</code> still — non-critical step did not cascade</td>
</tr>
</tbody></table>
<p>Two practical tips for testing:</p>
<p>First, mock the gRPC client at the orchestrator level, not the network. You want to assert that <code>compensateAgencyAccount</code> <em>was called with the right <code>sagaId</code></em>, not that bytes hit a socket.</p>
<p>Second, spin up a real Postgres in integration tests (Testcontainers, or a Docker Compose <code>postgres</code> service). The saga state machine is too easy to "test" against a mock and too easy to break against a real DB.</p>
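<p>In practice, the first tip means injecting the gRPC client, swapping in a hand-rolled fake, and asserting on the recorded calls. The orchestrator below is a deliberately small, hypothetical stand-in for the reference code, written only to show that test style. <code>runOnboarding</code>, <code>agencyRepo</code>, <code>mailer</code> and the <code>STARTED</code> status are assumptions. The RPC names, step labels and the other statuses come from this article.</p>

```javascript
// Hypothetical, heavily simplified orchestrator. Dependencies are injected
// so tests can replace the gRPC client, repository, and mailer with fakes.
async function runOnboarding({ sagaId, payload, authClient, agencyRepo, mailer, sagas }) {
  // Persist before you call: the row exists even if the process dies mid-saga.
  const saga = { sagaId, status: "STARTED", currentStep: "PROVISION_AUTH", payload };
  sagas.set(sagaId, saga);

  let auth;
  try {
    auth = await authClient.provisionAgencyAccount({ sagaId, ...payload });
  } catch (err) {
    // Nothing was committed remotely, so there is nothing to undo.
    Object.assign(saga, { status: "FAILED", errorMessage: err.message });
    return saga;
  }
  Object.assign(saga, {
    authUserId: auth.userId,
    authOrganizationId: auth.organizationId,
    currentStep: "CREATE_AGENCY",
  });

  try {
    const agency = await agencyRepo.create({ name: payload.agencyName });
    saga.agencyId = agency.id;
  } catch (err) {
    Object.assign(saga, { status: "COMPENSATING", errorMessage: err.message });
    // If this call throws, the saga stays COMPENSATING for operator recovery.
    await authClient.compensateAgencyAccount({
      sagaId,
      userId: auth.userId,
      organizationId: auth.organizationId,
    });
    saga.status = "COMPENSATED";
    return saga;
  }

  // Non-critical step: a failed welcome email must not roll back the agency.
  try {
    await mailer.sendWelcome(payload.email);
  } catch (err) {
    saga.warning = `welcome email failed: ${err.message}`;
  }
  Object.assign(saga, { status: "COMPLETED", currentStep: "COMPLETED" });
  return saga;
}

// Hand-rolled fake gRPC client: records every call instead of touching the network.
function fakeAuthClient() {
  const calls = [];
  return {
    calls,
    async provisionAgencyAccount(req) {
      calls.push({ method: "provisionAgencyAccount", req });
      return { userId: 99, organizationId: 42 };
    },
    async compensateAgencyAccount(req) {
      calls.push({ method: "compensateAgencyAccount", req });
      return { ok: true };
    },
  };
}
```

<p>A test for scenario 3 runs <code>runOnboarding</code> with a repository whose <code>create</code> throws. It then checks that <code>fakeAuthClient().calls</code> contains exactly one <code>compensateAgencyAccount</code> entry carrying the same <code>sagaId</code>.</p>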
<h2 id="heading-14-when-not-to-use-a-saga">14. When NOT to Use a Saga</h2>
<p>Sagas are not free. Skip them when:</p>
<ul>
<li><p><strong>One service does all the writes.</strong> Use a regular DB transaction. Don't reinvent the wheel.</p>
</li>
<li><p><strong>The workflow is read-only or analytical.</strong> No rollback semantics exist for a SELECT.</p>
</li>
<li><p><strong>The "rollback" is impossible.</strong> You sent a real email. You charged a credit card and the gateway doesn't support refunds. In those cases, design forward: send an apology email, queue a manual refund. Sagas can't unsend physical actions.</p>
</li>
<li><p><strong>You don't actually have multiple services yet.</strong> A saga in a monolith is over-engineering. Wait until the service boundary is real.</p>
</li>
</ul>
<p>A saga adds a state table, a compensation method per step, and an operational habit of grepping by <code>sagaId</code>. That cost is worth paying when the alternative is orphaned data — and not before.</p>
<h2 id="heading-15-trade-offs-and-lessons-learned">15. Trade-offs and Lessons Learned</h2>
<p>Things that worked well in this design:</p>
<ul>
<li><p>Synchronous orchestration is easier to debug than choreography. A new engineer reads one file and understands the whole flow.</p>
</li>
<li><p>Idempotency at the participant is non-negotiable. Retries from the orchestrator must be safe. Build it in from day one — retro-fitting is painful.</p>
</li>
<li><p>The saga table replaces tribal knowledge. Ops can answer <em>"what happened to this signup?"</em> with a single SQL query. The payload JSONB is gold during incidents.</p>
</li>
<li><p><code>sagaId</code> as the trace key plays nicely with OpenTelemetry / Datadog / Loki — no extra infra to set up.</p>
</li>
</ul>
<p>Things to know before copying this pattern:</p>
<ul>
<li><p>A failing compensation is the worst case. If <code>compensateAgencyAccount</code> itself errors, you have inconsistent state. Plan for retries + dead-letter + a manual rollback endpoint from the start.</p>
</li>
<li><p>Non-critical steps must be marked explicitly. Here, the welcome email is allowed to fail without rolling back the agency. Don't accidentally compensate over a flaky SMTP provider.</p>
</li>
<li><p>Sagas aren't a replacement for local transactions. Inside each service, still use a real DB transaction. The saga only handles the cross-service seam.</p>
</li>
<li><p>Synchronous gRPC is simple but couples availability. If <code>auth-service</code> is down, agency creation fails. Swap the gRPC calls for a durable message bus (RabbitMQ / Kafka) and treat each step as a command + reply when you need higher resilience.</p>
</li>
<li><p>The orchestrator becomes a critical service. Treat its uptime accordingly — monitor saga durations, alert on stuck <code>COMPENSATING</code> rows, and run more than one replica.</p>
</li>
</ul>
<h2 id="heading-16-conclusion">16. Conclusion</h2>
<p>The saga pattern isn't magic. It's a disciplined version of what experienced engineers already do by hand: <em>commit locally, record what you did, and know how to undo it.</em></p>
<p>In Node.js with NestJS, you only need three ingredients:</p>
<ol>
<li><p><strong>A state table</strong> to track the saga.</p>
</li>
<li><p><strong>An orchestrator</strong> that drives the workflow and writes that state.</p>
</li>
<li><p><strong>A participant</strong> that exposes a <code>do</code> and an <code>undo</code> operation, both idempotent and keyed by <code>sagaId</code>.</p>
</li>
</ol>
<p>Get those three right and your microservices can offer the same "all-or-nothing" feel as a monolithic transaction — without the operational pain of distributed locks.</p>
<p>Start simple, use orchestration, make every step idempotent, persist before you call, and always know how to undo. That's the whole pattern.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Self‑Host an S3‑Compatible Object Store with MinIO on Your Staging Server (and Save Hundreds of Dollars a Month) ]]>
                </title>
                <description>
                    <![CDATA[ This article is a complete copy‑paste guide to running MinIO behind Traefik with HTTPS, custom domains, and pre-signed upload/download URLs — using only Docker Compose. Your production will keep using ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-self-host-an-s3-compatible-object-store-with-minio-on-your-staging-server/</link>
                <guid isPermaLink="false">6a1d99eb2f5663bb4c520a8f</guid>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cloud-storage ]]>
                    </category>
                
                    <category>
                        <![CDATA[ S3 ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Md Tarikul Islam ]]>
                </dc:creator>
                <pubDate>Mon, 01 Jun 2026 14:40:43 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/a7e1dd1d-2e31-4d80-ae9b-10242588a5e1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>This article is a complete copy‑paste guide to running MinIO behind Traefik with HTTPS, custom domains, and pre-signed upload/download URLs — using only Docker Compose.</p>
<p>Your production will keep using a managed S3 / Cloudflare R2 / Hetzner Object Storage, while every staging upload, download, and pre-signed URL goes to your <strong>own</strong> server for free.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-1-why-selfhost-object-storage-on-staging">1. Why Self‑Host Object Storage on Staging?</a></p>
</li>
<li><p><a href="#heading-2-the-architecture-production-vs-staging">2. The Architecture: Production vs. Staging</a></p>
</li>
<li><p><a href="#heading-3-prerequisites">3. Prerequisites</a></p>
</li>
<li><p><a href="#heading-4-step-1-dns-point-your-domains-to-the-staging-server">4. Step 1 — DNS: Point Your Domains to the Staging Server</a></p>
</li>
<li><p><a href="#heading-5-step-2-run-minio-with-docker-compose">5. Step 2 — Run MinIO with Docker Compose</a></p>
</li>
<li><p><a href="#heading-6-step-3-expose-minio-over-https-with-traefik">6. Step 3 — Expose MinIO over HTTPS with Traefik</a></p>
</li>
<li><p><a href="#heading-7-step-4-create-the-bucket-and-access-keys">7. Step 4 — Create the Bucket and Access Keys</a></p>
</li>
<li><p><a href="#heading-8-step-5-configure-your-app-to-use-minio-on-staging-only">8. Step 5 — Configure Your App to Use MinIO on Staging Only</a></p>
</li>
<li><p><a href="#heading-9-step-6-upload-files-3-ways">9. Step 6 — Upload Files (3 Ways)</a></p>
</li>
<li><p><a href="#heading-10-step-7-generate-presigned-urls-put-and-get">10. Step 7 — Generate Presigned URLs (PUT and GET)</a></p>
</li>
<li><p><a href="#heading-11-step-8-get-public-urls-for-documents">11. Step 8 — Get Public URLs for Documents</a></p>
</li>
<li><p><a href="#heading-12-step-9-lock-down-cors-lifecycle-and-security">12. Step 9 — Lock Down CORS, Lifecycle, and Security</a></p>
</li>
<li><p><a href="#heading-13-step-10-backups-and-monitoring">13. Step 10 — Backups and Monitoring</a></p>
</li>
<li><p><a href="#heading-14-troubleshooting-cheat-sheet">14. Troubleshooting Cheat Sheet</a></p>
</li>
<li><p><a href="#heading-15-wrapping-up">15. Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-1-why-selfhost-object-storage-on-staging">1. Why Self‑Host Object Storage on Staging?</h2>
<p>If your app handles documents — PDFs, profile pictures, application transcripts, recordings — every test upload your QA team makes costs real money on AWS S3, Cloudflare R2, or Hetzner Object Storage. The price isn't huge per file, but staging is where you:</p>
<ul>
<li><p>run automated end‑to‑end tests that upload thousands of dummy files,</p>
</li>
<li><p>reset databases nightly (which leaves orphan objects behind),</p>
</li>
<li><p>let developers experiment with broken code that re‑uploads the same files,</p>
</li>
<li><p>and hold months of test data nobody ever deletes.</p>
</li>
</ul>
<p>In production those costs are justified. Managed storage gives you replication, availability, and someone else's pager. In staging, those costs are pure waste.</p>
<p><a href="https://min.io/"><strong>MinIO</strong></a> is a free, open‑source, S3‑compatible object server. Same API, same SDKs, same presigned URLs, same <code>mc</code>/<code>aws s3</code> CLIs — but running on your own VPS, billed at $0 per gigabyte. Point your staging app at MinIO, point your production app at S3/R2, and the only thing that changes is an environment variable.</p>
<p><strong>The result:</strong> identical code paths in both environments, zero storage bill on staging, and a nice fallback if your cloud provider ever has an outage.</p>
<h2 id="heading-2-the-architecture-production-vs-staging">2. The Architecture: Production vs. Staging</h2>
<p>In real-world applications, you usually don’t want your development or staging environment writing directly to production storage.</p>
<p>A common and cost-effective setup is:</p>
<ul>
<li><p><strong>Production</strong>: managed cloud object storage</p>
</li>
<li><p><strong>Staging / Development</strong>: self-hosted S3-compatible storage</p>
</li>
</ul>
<p>The good part is that your application code doesn't need to change.</p>
<p>As long as both services are S3-compatible, the same SDK and upload logic work everywhere. Only the environment variables differ.</p>
<h3 id="heading-high-level-architecture">High-Level Architecture</h3>
<img src="https://cdn.hashnode.com/uploads/covers/66cb39fcaa2a09f9a8d691c1/01ddeefd-8a67-42e3-a3af-9b1d3664bdb2.png" alt="High-level architecture showing a Next.js application uploading files to Cloudflare R2 in production and MinIO in staging through the same S3-compatible API." style="display:block;margin:0 auto" width="426" height="421" loading="lazy">

<p>The above diagram illustrates how the same application can communicate with different storage providers depending on the deployment environment.</p>
<p>In the <strong>production environment</strong>, uploads are stored in a managed object storage service such as AWS S3, Cloudflare R2, or Hetzner Object Storage. These services handle durability, scalability, backups, and infrastructure management.</p>
<p>In the <strong>staging environment</strong>, uploads are directed to a self-hosted MinIO instance running inside Docker on a VPS. MinIO implements the S3 API, making it behave similarly to production storage while keeping costs low.</p>
<p>Because both storage systems are S3-compatible, the application uses the same upload logic in every environment. The only difference is the configuration provided through environment variables.</p>
<h3 id="heading-why-this-architecture-is-useful">Why This Architecture Is Useful</h3>
<p>This setup gives you:</p>
<ul>
<li><p>A cheap staging environment</p>
</li>
<li><p>Production-like testing</p>
</li>
<li><p>Zero storage vendor lock-in</p>
</li>
<li><p>The ability to switch providers without rewriting application code</p>
</li>
</ul>
<p>Because both environments speak the S3 protocol, your upload logic remains identical.</p>
<h3 id="heading-example-environment-variables">Example Environment Variables</h3>
<p>Your application only reads environment variables like these:</p>
<pre><code class="language-env">S3_ENDPOINT=
S3_REGION=
S3_ACCESS_KEY=
S3_SECRET_KEY=
S3_BUCKET=
</code></pre>
<p>Switch the values, and the exact same application now uploads files to a different backend.</p>
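<p>Since the switch happens entirely in configuration, it pays to fail fast when a variable is missing rather than find out on the first upload. Below is a small sketch of a hypothetical <code>loadStorageConfig</code> helper. It reads the variables above plus the optional <code>S3_FORCE_PATH_STYLE</code> flag used later in this guide.</p>

```javascript
// Reads S3 settings from an env object and fails fast when one is missing.
// S3_ENDPOINT may be left empty for AWS S3 itself; the SDK then derives it from the region.
function loadStorageConfig(env = process.env) {
  const required = ["S3_REGION", "S3_ACCESS_KEY", "S3_SECRET_KEY", "S3_BUCKET"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing storage env vars: ${missing.join(", ")}`);
  }
  return {
    bucket: env.S3_BUCKET,
    // Shaped like the options object the AWS SDK v3 S3Client constructor accepts.
    clientConfig: {
      endpoint: env.S3_ENDPOINT || undefined,
      region: env.S3_REGION,
      credentials: {
        accessKeyId: env.S3_ACCESS_KEY,
        secretAccessKey: env.S3_SECRET_KEY,
      },
      forcePathStyle: env.S3_FORCE_PATH_STYLE === "true",
    },
  };
}
```

<p>Call it once at startup. <code>new S3Client(loadStorageConfig().clientConfig)</code> then builds the client from whichever environment the process was started in.</p>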
<h3 id="heading-production-storage-example">Production Storage Example</h3>
<p>In production, you typically use managed object storage providers such as:</p>
<ul>
<li><p>AWS S3</p>
</li>
<li><p>Cloudflare R2</p>
</li>
<li><p>Hetzner Object Storage</p>
</li>
</ul>
<p>Example:</p>
<pre><code class="language-plaintext">S3_ENDPOINT=https://&lt;region&gt;.r2.cloudflarestorage.com
</code></pre>
<p>The benefits are high scalability, global availability, durability, managed backups, and no infrastructure for you to maintain.</p>
<h3 id="heading-staging-environment-example">Staging Environment Example</h3>
<p>For staging, a lightweight self-hosted MinIO container is often enough.</p>
<pre><code class="language-plaintext">Next.js App
     ↓
MinIO Container (inside Docker on VPS)
</code></pre>
<p>Example domains:</p>
<table>
<thead>
<tr>
<th>Service</th>
<th>Domain</th>
<th>Internal Port</th>
</tr>
</thead>
<tbody><tr>
<td>MinIO S3 API</td>
<td><code>minio-staging.domain.com</code></td>
<td><code>9000</code></td>
</tr>
<tr>
<td>MinIO Web Console</td>
<td><code>minio-console-staging.domain.com</code></td>
<td><code>9001</code></td>
</tr>
</tbody></table>
<p>This allows you to:</p>
<ul>
<li><p>Test uploads safely</p>
</li>
<li><p>Avoid production storage costs</p>
</li>
<li><p>Reproduce production-like behavior locally</p>
</li>
</ul>
<h2 id="heading-3-prerequisites">3. Prerequisites</h2>
<p>You'll need:</p>
<ul>
<li><p>A Linux VPS (Hetzner, DigitalOcean, Contabo, OVH — anything with a public IP).</p>
</li>
<li><p>Two A records pointing at that IP (we'll register them next).</p>
</li>
<li><p>Docker + Docker Compose v2.</p>
</li>
<li><p><a href="https://traefik.io/">Traefik</a> v2 in front, with Let's Encrypt configured (any reverse proxy works&nbsp;– the labels below are Traefik's flavor).</p>
</li>
<li><p>Open ports <code>80</code> and <code>443</code> on the firewall for Let's Encrypt + HTTPS.</p>
</li>
<li><p>~10 GB free disk for the MinIO data volume to start.</p>
</li>
</ul>
<p>If Docker isn't installed:</p>
<pre><code class="language-bash">curl -fsSL https://get.docker.com | sh
sudo apt-get install -y docker-compose-plugin
docker --version &amp;&amp; docker compose version
</code></pre>
<h2 id="heading-4-step-1-dns-point-your-domains-to-the-staging-server">4. Step 1 — DNS: Point Your Domains to the Staging Server</h2>
<p>In your DNS provider (Cloudflare, Route 53, Namecheap, and so on), create two <strong>A records</strong> pointing at your staging server's public IP:</p>
<pre><code class="language-plaintext">minio-staging.domain.com           A    203.0.113.45
minio-console-staging.domain.com   A    203.0.113.45
</code></pre>
<p>If you use Cloudflare, set the proxy status to <strong>DNS only</strong> (gray cloud) for <code>minio-staging.*</code>. Cloudflare's free plan caps uploads at 100 MB, and you don't want it stripping S3 signing headers. The console subdomain can stay proxied if you want a WAF in front of it.</p>
<p>Wait a minute and verify:</p>
<pre><code class="language-bash">dig +short minio-staging.domain.com
# 203.0.113.45
</code></pre>
<h2 id="heading-5-step-2-run-minio-with-docker-compose">5. Step 2 — Run MinIO with Docker Compose</h2>
<p>Add this service to your staging compose file (<code>docker-compose.staging.yml</code>). MinIO is just one container — the disk is mounted as a Docker volume so data survives upgrades.</p>
<pre><code class="language-yaml"># docker-compose.staging.yml
networks:
  proxy:
    external: true
    name: proxy
  internal:
    name: internal

volumes:
  minio-data:

services:
  minio:
    image: minio/minio:latest
    container_name: minio-staging
    restart: unless-stopped
    environment:
      - MINIO_ROOT_USER=${MINIO_ROOT_USER:-admin}
      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-change-me-please}
      # Tell MinIO which public domain to sign URLs with
      - MINIO_SERVER_URL=https://minio-staging.domain.com
      - MINIO_BROWSER_REDIRECT_URL=https://minio-console-staging.domain.com
    command: server /data --console-address ":9001"
    volumes:
      - minio-data:/data
    networks:
      - proxy
      - internal
    ports:
      # Bound to localhost only: public traffic reaches MinIO through Traefik
      - "127.0.0.1:9000:9000"  # S3 API
      - "127.0.0.1:9001:9001"  # Web console
    healthcheck:
      # Recent minio/minio images no longer include curl; the bundled mc client checks readiness
      test: ["CMD", "mc", "ready", "local"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
</code></pre>
<p>Two things deserve attention:</p>
<ul>
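<li><p><code>MINIO_SERVER_URL</code> is the secret sauce: it tells MinIO its public S3 address. The console uses it to reach the API and to build share links. Without it, those links point at the internal hostname (<code>http://minio:9000</code>) and fail when a browser opens them. Set it to the exact HTTPS URL clients will use. Presigned URLs generated by your app are signed by the SDK, not by the server, so the app's <code>S3_ENDPOINT</code> must also be that public URL.</p>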
</li>
<li><p><code>MINIO_BROWSER_REDIRECT_URL</code> does the same for the web console (login redirects, OIDC callbacks, and so on).</p>
</li>
</ul>
<p>Bring it up:</p>
<pre><code class="language-bash">docker compose -f docker-compose.staging.yml up -d minio
docker compose -f docker-compose.staging.yml logs -f minio
</code></pre>
<p>You should see <code>API: http://...</code> and <code>Console: http://...</code> lines.</p>
<h2 id="heading-6-step-3-expose-minio-over-https-with-traefik">6. Step 3 — Expose MinIO over HTTPS with Traefik</h2>
<p>We don't expose ports <code>9000</code>/<code>9001</code> to the world directly — Traefik does that for us, terminating TLS with a free Let's Encrypt certificate.</p>
<p>Add these labels to the <code>minio</code> service:</p>
<pre><code class="language-yaml">    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"

      # ---- S3 API (port 9000) ----
      - "traefik.http.routers.minio-staging.rule=Host(`minio-staging.domain.com`)"
      - "traefik.http.routers.minio-staging.entrypoints=websecure"
      - "traefik.http.routers.minio-staging.tls.certresolver=letsencrypt"
      - "traefik.http.routers.minio-staging.service=minio-staging"
      - "traefik.http.services.minio-staging.loadbalancer.server.port=9000"

      # ---- Web Console (port 9001) ----
      - "traefik.http.routers.minio-console-staging.rule=Host(`minio-console-staging.domain.com`)"
      - "traefik.http.routers.minio-console-staging.entrypoints=websecure"
      - "traefik.http.routers.minio-console-staging.tls.certresolver=letsencrypt"
      - "traefik.http.routers.minio-console-staging.service=minio-console-staging"
      - "traefik.http.services.minio-console-staging.loadbalancer.server.port=9001"
</code></pre>
<p>You also need an entry point named <code>websecure</code> on <code>:443</code> (plus <code>web</code> on <code>:80</code> for the HTTP challenge) and a certificate resolver named <code>letsencrypt</code>. Here's the minimum static Traefik config (<code>traefik.staging.yml</code>):</p>
<pre><code class="language-yaml">api:
  dashboard: true

entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

certificatesResolvers:
  letsencrypt:
    acme:
      httpChallenge:
        entryPoint: web
      email: admin@domain.com
      storage: /etc/traefik/acme.json

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: proxy
</code></pre>
<p>Restart and watch the cert get issued:</p>
<pre><code class="language-bash">docker compose -f docker-compose.staging.yml up -d
docker compose -f docker-compose.staging.yml logs -f traefik | grep -i acme
</code></pre>
<p>Sanity check from your laptop:</p>
<pre><code class="language-bash">curl -I https://minio-staging.domain.com/minio/health/live
# HTTP/2 200
</code></pre>
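<p>You can now log in to the <strong>web console</strong> at <code>https://minio-console-staging.domain.com</code> with the root credentials. These are <code>admin</code> / <code>change-me-please</code> unless you set <code>MINIO_ROOT_USER</code> and <code>MINIO_ROOT_PASSWORD</code>.</p>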
<p><strong>Important upload size tweak:</strong> if you're behind Cloudflare or NGINX in front of Traefik, raise the request body limit. Traefik itself has no default limit, but Cloudflare's free plan refuses anything over 100 MB. For self‑hosted edge proxies, set <code>client_max_body_size 0;</code> (NGINX) or the equivalent.</p>
<h2 id="heading-7-step-4-create-the-bucket-and-access-keys">7. Step 4 — Create the Bucket and Access Keys</h2>
<p>Anything that speaks S3 can talk to MinIO. The easiest tool is <code>mc</code> (the official MinIO client), shipped inside the same image.</p>
<h3 id="heading-71-connect-mc-to-your-server">7.1 Connect mc to your server</h3>
<pre><code class="language-bash">docker exec -it minio-staging \
  mc alias set local http://localhost:9000 admin change-me-please
</code></pre>
<h3 id="heading-72-create-a-bucket">7.2 Create a bucket</h3>
<pre><code class="language-bash">docker exec -it minio-staging mc mb local/domain-files-staging
</code></pre>
<h3 id="heading-73-choose-a-bucket-policy">7.3 Choose a bucket policy</h3>
<p><code>mc anonymous set</code> controls what unauthenticated clients can do with a bucket. Three settings cover most cases, so pick based on what you store:</p>
<table>
<thead>
<tr>
<th>Policy</th>
<th>When to use</th>
</tr>
</thead>
<tbody><tr>
<td><code>none</code> (private, default)</td>
<td>Anything sensitive — student transcripts, contracts, internal docs. Reads only via presigned URL.</td>
</tr>
<tr>
<td><code>download</code></td>
<td>Public read, no listing. Good for CDN‑style assets like avatars.</td>
</tr>
<tr>
<td><code>public</code></td>
<td>Anyone can read AND list. Use only for truly public content.</td>
</tr>
</tbody></table>
<p>Set one:</p>
<pre><code class="language-bash"># Private (recommended for documents)
docker exec -it minio-staging \
  mc anonymous set none local/domain-files-staging

# OR public read for static assets only:
docker exec -it minio-staging \
  mc anonymous set download local/domain-files-staging
</code></pre>
<h3 id="heading-74-create-a-dedicated-app-user-dont-use-root-keys">7.4 Create a dedicated app user (don't use root keys!)</h3>
<p>The <code>admin</code> account can wipe everything. Make a least‑privilege user for your app:</p>
<pre><code class="language-bash">docker exec -it minio-staging mc admin user add local \
  domain-app a-long-random-secret-key

# Create a custom read/write policy scoped to one bucket, then attach it to the user:
cat &gt; /tmp/policy.json &lt;&lt;'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:*"],
      "Resource": [
        "arn:aws:s3:::domain-files-staging",
        "arn:aws:s3:::domain-files-staging/*"
      ]
    }
  ]
}
EOF

docker cp /tmp/policy.json minio-staging:/tmp/policy.json
docker exec -it minio-staging \
  mc admin policy create local domain-rw /tmp/policy.json
docker exec -it minio-staging \
  mc admin policy attach local domain-rw --user domain-app
</code></pre>
<p>Save the user name and secret (<code>domain-app</code> / <code>a-long-random-secret-key</code>): they are your <code>S3_ACCESS_KEY</code> and <code>S3_SECRET_KEY</code>.</p>
<h2 id="heading-8-step-5-configure-your-app-to-use-minio-on-staging-only">8. Step 5 — Configure Your App to Use MinIO on Staging Only</h2>
<p>The trick to "MinIO in staging, real S3 in prod" is to use the <strong>same S3 client</strong> in your code and only swap the env vars.</p>
<p>Your <code>staging.env</code> (loaded by your staging compose stack):</p>
<pre><code class="language-env"># ---- Staging: self-hosted MinIO ----
STORAGE_ENABLED=true
S3_ENDPOINT=https://minio-staging.domain.com
S3_PUBLIC_ENDPOINT=https://minio-staging.domain.com
S3_BUCKET=domain-files-staging
S3_ACCESS_KEY=domain-app
S3_SECRET_KEY=a-long-random-secret-key
S3_REGION=us-east-1
S3_FORCE_PATH_STYLE=true
</code></pre>
<p>Your <code>production.env</code>:</p>
<pre><code class="language-env"># ---- Production: Cloudflare R2 ----
STORAGE_ENABLED=true
S3_ENDPOINT=https://&lt;account-id&gt;.r2.cloudflarestorage.com
S3_PUBLIC_ENDPOINT=https://files.domain.com
S3_BUCKET=domain-files
S3_ACCESS_KEY=&lt;r2-access-key&gt;
S3_SECRET_KEY=&lt;r2-secret-key&gt;
S3_REGION=auto
S3_FORCE_PATH_STYLE=true
</code></pre>
<p><code>S3_FORCE_PATH_STYLE=true</code> is critical for MinIO. Without it, the SDK tries <code>https://bucket.minio-staging.domain.com</code> (virtual-host style), which won't resolve unless you also set up wildcard DNS and certificates. R2 accepts path-style requests too, so the same flag is safe in production.</p>
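<p>The flag is easier to understand once you see the two URL shapes side by side. The sketch below builds them by hand with a hypothetical <code>objectUrl</code> helper. The SDK already does this internally, so treat it as a debugging aid only. It assumes keys use <code>/</code> as a separator and encodes each segment separately.</p>

```javascript
// Builds the URL an S3 client would address for bucket/key.
// Path style:         https://host/bucket/key
// Virtual-host style: https://bucket.host/key
function objectUrl({ endpoint, bucket, key, pathStyle }) {
  const base = new URL(endpoint);
  // Encode each path segment but keep the "/" separators of the key.
  const encodedKey = key.split("/").map(encodeURIComponent).join("/");
  if (pathStyle) {
    base.pathname = `/${bucket}/${encodedKey}`;
  } else {
    base.hostname = `${bucket}.${base.hostname}`;
    base.pathname = `/${encodedKey}`;
  }
  return base.toString();
}
```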
<p>Now in your application code (Node.js example using AWS SDK v3):</p>
<pre><code class="language-javascript">// src/lib/s3.js
import { S3Client } from "@aws-sdk/client-s3";

export const s3 = new S3Client({
  endpoint: process.env.S3_ENDPOINT,
  region: process.env.S3_REGION,
  credentials: {
    accessKeyId: process.env.S3_ACCESS_KEY,
    secretAccessKey: process.env.S3_SECRET_KEY,
  },
  forcePathStyle: process.env.S3_FORCE_PATH_STYLE === "true",
});

export const BUCKET = process.env.S3_BUCKET;
export const PUBLIC_ENDPOINT = process.env.S3_PUBLIC_ENDPOINT;
</code></pre>
<p>The same <code>s3</code> instance now talks to MinIO on staging and to R2 in production with no code change.</p>
<h2 id="heading-9-step-6-upload-files-3-ways">9. Step 6 — Upload Files (3 Ways)</h2>
<h3 id="heading-91-from-a-server-best-for-trusted-backends">9.1 From a server (best for trusted backends)</h3>
<pre><code class="language-javascript">import { PutObjectCommand } from "@aws-sdk/client-s3";
import { s3, BUCKET } from "./lib/s3.js";
import { readFile } from "node:fs/promises";

export async function uploadDocument(localPath, key, contentType) {
  const Body = await readFile(localPath);
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: key,
    Body,
    ContentType: contentType,
    // Optional: per-object metadata, useful for audits
    Metadata: { uploadedBy: "system", env: process.env.NODE_ENV },
  }));
  return key;
}
</code></pre>
<h3 id="heading-92-with-the-mc-cli-good-for-oneoff-uploads-migrations">9.2 With the mc CLI (good for one‑off uploads / migrations)</h3>
<pre><code class="language-bash">mc alias set staging https://minio-staging.domain.com domain-app a-long-random-secret-key
mc cp ./report.pdf staging/domain-files-staging/reports/2026/report.pdf
mc ls staging/domain-files-staging --recursive
</code></pre>
<h3 id="heading-93-directly-from-the-browser-via-a-presigned-put-url">9.3 Directly from the browser via a presigned PUT URL</h3>
<p>The recommended pattern for user uploads sends the file straight from the browser to MinIO, with <strong>zero</strong> bytes touching your API server.</p>
<p>We'll cover this in detail next.</p>
<h2 id="heading-10-step-7-generate-presigned-urls-put-and-get">10. Step 7 — Generate Presigned URLs (PUT and GET)</h2>
<p>A <strong>presigned URL</strong> is a regular HTTPS URL with a time‑limited signature in the query string. Anyone with the URL can do exactly the action it was signed for (PUT this object, or GET that object) for the next N minutes — and nothing else.</p>
<p>This is what makes "users upload directly to storage" safe.</p>
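<p>The time limit is carried inside the URL itself. A SigV4 presigned URL includes <code>X-Amz-Date</code> (the signing time, formatted <code>YYYYMMDDTHHMMSSZ</code>) and <code>X-Amz-Expires</code> (the lifetime in seconds) as query parameters. The small sketch below uses a hypothetical <code>presignedExpiry</code> helper to read them. It is handy when a link gets rejected and you want to know whether it has simply expired.</p>

```javascript
// Reads the SigV4 signing time and lifetime from a presigned URL's query string.
function presignedExpiry(presignedUrl) {
  const params = new URL(presignedUrl).searchParams;
  const amzDate = params.get("X-Amz-Date");       // e.g. 20260522T101432Z
  const rawExpires = params.get("X-Amz-Expires"); // lifetime in seconds, e.g. 300
  if (amzDate === null || rawExpires === null) {
    throw new Error("Not a SigV4 presigned URL");
  }
  // Turn 20260522T101432Z into 2026-05-22T10:14:32Z so Date can parse it.
  const iso = amzDate.replace(
    /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/,
    "$1-$2-$3T$4:$5:$6Z"
  );
  const signedAt = new Date(iso);
  const expiresAt = new Date(signedAt.getTime() + Number(rawExpires) * 1000);
  return {
    signedAt,
    expiresAt,
    // True once the URL can no longer be used.
    isExpired(now = new Date()) {
      return now.getTime() >= expiresAt.getTime();
    },
  };
}
```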
<h3 id="heading-101-presigned-put-for-uploads">10.1 Presigned PUT (for uploads)</h3>
<pre><code class="language-javascript">// src/lib/presign.js
import { PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
import { s3, BUCKET } from "./s3.js";
import { randomUUID } from "node:crypto";

export async function presignUpload({ filename, contentType, userId }) {
  const key = `users/${userId}/${randomUUID()}-${filename}`;
  const cmd = new PutObjectCommand({
    Bucket: BUCKET,
    Key: key,
    ContentType: contentType,
  });
  const uploadUrl = await getSignedUrl(s3, cmd, { expiresIn: 60 * 5 }); // 5 min
  return { uploadUrl, key };
}
</code></pre>
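<p>As written, <code>presignUpload</code> drops the client-supplied <code>filename</code> into the key verbatim, so a name containing slashes creates extra "folders" and characters like spaces or <code>#</code> end up in your URLs. Below is a minimal sketch of one way to tidy that up. <code>sanitizeFilename</code> and <code>buildUploadKey</code> are hypothetical helpers (not part of the AWS SDK), and the conservative character set is an assumption you may want to loosen, for example to keep non-Latin filenames readable.</p>
<pre><code class="language-javascript">// Hypothetical helpers: keep only a safe basename of the client-supplied
// filename so it can't add path segments or odd characters to the key.
function sanitizeFilename(filename) {
  const base = String(filename)
    .split(/[\\/]/) // drop any directory part: "../../etc/passwd" -> "passwd"
    .pop()
    .normalize("NFKC")
    .replace(/[^A-Za-z0-9._-]+/g, "_") // conservative character set (an assumption)
    .replace(/^\.+/, ""); // no leading dots (".env" -> "env", ".." -> "")
  return base.slice(-100) || "file"; // cap the length, never return an empty name
}

function buildUploadKey(userId, filename, id) {
  // `id` is expected to be randomUUID() from node:crypto, passed in by the caller
  return `users/${userId}/${id}-${sanitizeFilename(filename)}`;
}
</code></pre>
<p>Inside <code>presignUpload</code>, the key line would then read <code>const key = buildUploadKey(userId, filename, randomUUID());</code>. If you want to show users their original filename, store it in your database alongside the key.</p>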
<p>Wire it to your API:</p>
<pre><code class="language-javascript">// POST /api/uploads/presign
app.post("/api/uploads/presign", requireAuth, async (req, res) =&gt; {
  const { filename, contentType } = req.body;
  const result = await presignUpload({
    filename,
    contentType,
    userId: req.user.id,
  });
  res.json(result); // { uploadUrl, key }
});
</code></pre>
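<p>The route forwards <code>filename</code> and <code>contentType</code> from <code>req.body</code> straight to the signer, so any signed-in user can request an upload URL for any content type. A minimal validation sketch follows; the allow-list and the <code>validatePresignBody</code> name are hypothetical, and it assumes <code>express.json()</code> is already mounted so <code>req.body</code> is parsed.</p>
<pre><code class="language-javascript">// Hypothetical allow-list: adjust to the file types your app actually accepts.
const ALLOWED_CONTENT_TYPES = new Set([
  "application/pdf",
  "image/png",
  "image/jpeg",
]);

// Returns an error message for a bad request body, or null if it looks acceptable.
function validatePresignBody(body) {
  const { filename, contentType } = body ?? {};
  if (typeof filename !== "string" || filename.trim() === "") {
    return "filename is required";
  }
  if (filename.length > 255) {
    return "filename is too long";
  }
  if (!ALLOWED_CONTENT_TYPES.has(contentType)) {
    return `content type not allowed: ${contentType}`;
  }
  return null;
}
</code></pre>
<p>In the route, you'd call it first and reply with <code>res.status(400).json({ error })</code> when it returns a message. Since the upload has to use the <code>Content-Type</code> you signed, the allow-list carries through to MinIO. A presigned PUT can't cap the upload size, though; if you need that, S3's presigned POST with a <code>content-length-range</code> condition (<code>createPresignedPost</code> from <code>@aws-sdk/s3-presigned-post</code>) is the usual tool.</p>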
<p>The browser uploads straight to MinIO:</p>
<pre><code class="language-javascript">// In your frontend
async function uploadFile(file) {
  const { uploadUrl, key } = await fetch("/api/uploads/presign", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ filename: file.name, contentType: file.type }),
  }).then(r =&gt; r.json());

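  // Must be the exact Content-Type that was signed, or MinIO returns SignatureDoesNotMatch
  const putRes = await fetch(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": file.type },
    body: file,
  });
  if (!putRes.ok) throw new Error(`Upload failed: ${putRes.status}`);

  // Persist `key` in your DB so you can retrieve it later
  await fetch("/api/documents", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ key, originalName: file.name }),
  });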
}
</code></pre>
<p>The <code>Content-Type</code> you send during PUT <strong>must match</strong> the one you signed with, or MinIO will reject the request with <code>SignatureDoesNotMatch</code>. This catches everyone the first time.</p>
<h3 id="heading-102-presigned-get-for-downloads">10.2 Presigned GET (for downloads)</h3>
<p>Same idea, but with <code>GetObjectCommand</code>:</p>
<pre><code class="language-javascript">export async function presignDownload(key, expiresIn = 60 * 10) {
  const cmd = new GetObjectCommand({ Bucket: BUCKET, Key: key });
  return getSignedUrl(s3, cmd, { expiresIn });
}
</code></pre>
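<p>If <code>expiresIn</code> can ever come from a request parameter or a config value, it's worth clamping it. SigV4 presigned URLs can't be valid for more than 7 days (604,800 seconds), so larger values are rejected. A minimal sketch; <code>clampExpiry</code> is a hypothetical helper and the 60-second floor is an arbitrary assumption.</p>
<pre><code class="language-javascript">// SigV4 presigned URLs can't be valid for longer than 7 days.
const MAX_PRESIGN_SECONDS = 7 * 24 * 60 * 60; // 604800

// Hypothetical helper: turn an untrusted lifetime into a safe expiresIn value.
function clampExpiry(seconds, { min = 60, max = MAX_PRESIGN_SECONDS } = {}) {
  const n = Math.floor(Number(seconds));
  if (!Number.isFinite(n)) return min; // undefined, "abc" and friends fall back to the floor
  return Math.min(Math.max(n, min), max);
}
</code></pre>
<p>For example, <code>presignDownload(doc.key, clampExpiry(req.query.ttl))</code>; a missing or non-numeric value falls back to the 60-second floor.</p>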
<p>A typical "view document" endpoint:</p>
<pre><code class="language-javascript">app.get("/api/documents/:id/url", requireAuth, async (req, res) =&gt; {
  const doc = await db.documents.findById(req.params.id);
  if (!doc || !canUserSee(req.user, doc)) return res.sendStatus(403);
  const url = await presignDownload(doc.key, 600);
  res.json({ url });
});
</code></pre>
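<p><code>canUserSee</code> isn't defined in the snippet above. One possible shape is sketched below, assuming documents carry an <code>ownerId</code> and an optional <code>sharedWith</code> array and users carry an <code>id</code> and a <code>role</code>; all of these field names are hypothetical, so map them to your own schema.</p>
<pre><code class="language-javascript">// Hypothetical authorization rule for the "view document" endpoint.
// Field names (ownerId, sharedWith, role) are assumptions about your schema.
function canUserSee(user, doc) {
  if (!user || !doc) return false;
  if (user.role === "admin") return true;
  if (doc.ownerId === user.id) return true;
  return Array.isArray(doc.sharedWith) ? doc.sharedWith.includes(user.id) : false;
}
</code></pre>
<p>Whatever the rules are, they must run before signing: once the URL is issued, anyone holding it can download the object until it expires.</p>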
<p>The frontend just opens that URL — the file streams from MinIO directly to the user.</p>
<h3 id="heading-103-why-presigned-urls-beat-proxy-through-the-api">10.3 Why presigned URLs beat "proxy through the API"</h3>
<table>
<thead>
<tr>
<th>Concern</th>
<th>Proxy through API</th>
<th>Presigned URL</th>
</tr>
</thead>
<tbody><tr>
<td>Bytes through your app</td>
<td>All of them</td>
<td>Zero</td>
</tr>
<tr>
<td>API CPU/RAM cost</td>
<td>High</td>
<td>Negligible (signing only)</td>
</tr>
<tr>
<td>Throughput limit</td>
<td>Your API</td>
<td>MinIO's NIC</td>
</tr>
<tr>
<td>Auth check</td>
<td>Your code</td>
<td>Your code (still — check before signing)</td>
</tr>
</tbody></table>
<h2 id="heading-11-step-8-get-public-urls-for-documents">11. Step 8 — Get Public URLs for Documents</h2>
<p>Sometimes you want a permanent, unauthenticated URL, for example for public profile pictures.</p>
<p>If the bucket policy allows anonymous reads (<code>mc anonymous set download …</code>), the public URL pattern is:</p>
<pre><code class="language-plaintext">https://minio-staging.domain.com/&lt;bucket&gt;/&lt;key&gt;
</code></pre>
<p>So <code>users/42/avatar.png</code> becomes:</p>
<pre><code class="language-plaintext">https://minio-staging.domain.com/domain-files-staging/users/42/avatar.png
</code></pre>
<p>In code:</p>
<pre><code class="language-javascript">export function publicUrl(key) {
  return `${process.env.S3_PUBLIC_ENDPOINT}/${BUCKET}/${key}`;
}
</code></pre>
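<p>One caveat with building the URL by string concatenation: keys containing spaces, <code>#</code>, or <code>?</code> produce broken links unless each path segment is percent-encoded. A minimal variant that encodes segments and tolerates a trailing slash on the endpoint; <code>buildPublicUrl</code> is a hypothetical name, and by default it reads the <code>S3_PUBLIC_ENDPOINT</code> and <code>S3_BUCKET</code> variables.</p>
<pre><code class="language-javascript">// Hypothetical variant of publicUrl() that percent-encodes each key segment
// (spaces, "#", "?" and so on) while keeping the "/" separators intact.
function buildPublicUrl(
  key,
  endpoint = process.env.S3_PUBLIC_ENDPOINT,
  bucket = process.env.S3_BUCKET,
) {
  const base = String(endpoint).replace(/\/+$/, ""); // tolerate a trailing slash
  const path = key.split("/").map(encodeURIComponent).join("/");
  return `${base}/${bucket}/${path}`;
}
</code></pre>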
<p>For <strong>private</strong> buckets (most documents), don't use public URLs at all — always go through <code>presignDownload(key)</code> so you can re‑check authorization on every request and expire links.</p>
<h2 id="heading-12-step-9-lock-down-cors-lifecycle-and-security">12. Step 9 — Lock Down CORS, Lifecycle, and Security</h2>
<h3 id="heading-121-allow-your-frontend-origins-cors">12.1 Allow your frontend origins (CORS)</h3>
<p>Browser uploads need CORS rules on the bucket. Drop this JSON via <code>mc</code>:</p>
<pre><code class="language-bash">cat &gt; /tmp/cors.json &lt;&lt;'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": [
        "https://crm-staging.domain.com",
        "http://localhost:3000"
      ],
      "AllowedMethods": ["GET", "PUT", "POST", "HEAD"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF

docker cp /tmp/cors.json minio-staging:/tmp/cors.json
docker exec -it minio-staging \
  mc cors set local/domain-files-staging /tmp/cors.json
</code></pre>
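<p>If staging, preview, and local environments need different origin lists, you could generate <code>cors.json</code> instead of hand-editing it. A sketch, assuming a hypothetical comma-separated <code>CORS_ORIGINS</code> variable; it produces the same shape as the JSON above, strips trailing slashes (browsers send <code>Origin</code> without one, so <code>https://crm-staging.domain.com/</code> would never match), and refuses a wildcard.</p>
<pre><code class="language-javascript">// Hypothetical generator for the cors.json used above.
function buildCorsConfig(originsCsv) {
  const origins = String(originsCsv ?? "")
    .split(",")
    .map((o) => o.trim().replace(/\/+$/, "")) // Origin headers never end in "/"
    .filter(Boolean);
  if (origins.includes("*")) {
    throw new Error("Refusing to allow every origin; list them explicitly");
  }
  return {
    CORSRules: [
      {
        AllowedOrigins: origins,
        AllowedMethods: ["GET", "PUT", "POST", "HEAD"],
        AllowedHeaders: ["*"],
        ExposeHeaders: ["ETag"],
        MaxAgeSeconds: 3000,
      },
    ],
  };
}

// Usage sketch: write the file that the mc command reads.
// fs.writeFileSync("/tmp/cors.json", JSON.stringify(buildCorsConfig(process.env.CORS_ORIGINS), null, 2));
</code></pre>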
<h3 id="heading-122-autodelete-old-test-files-lifecycle">12.2 Auto‑delete old test files (lifecycle)</h3>
<p>Staging accumulates junk. Tell MinIO to expire anything older than 30 days:</p>
<pre><code class="language-bash">docker exec -it minio-staging \
  mc ilm rule add --expire-days 30 local/domain-files-staging
</code></pre>
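<p>Before enabling the rule on a bucket people are actively using, you might want a rough preview of what it will delete. A sketch, assuming you feed it an object listing with <code>key</code> and <code>lastModified</code> fields (for example, parsed line by line from <code>mc ls --recursive --json</code>; check the field names against your <code>mc</code> version). <code>expiredKeys</code> is a hypothetical helper, and the result is an approximation: lifecycle expiry runs in the background, so real deletion can lag behind the cutoff.</p>
<pre><code class="language-javascript">const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical dry run: keys whose lastModified is at least `days` old,
// i.e. the objects an "--expire-days N" rule would eventually remove.
function expiredKeys(objects, days, now = Date.now()) {
  const cutoff = now - days * DAY_MS;
  return objects
    .filter((o) => cutoff >= new Date(o.lastModified).getTime())
    .map((o) => o.key);
}
</code></pre>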
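<h3 id="heading-123-encrypt-at-rest">12.3 Encrypt at rest</h3>
<p>SSE-S3 needs a KMS configured on the MinIO server: for a single staging node, a static key via <code>MINIO_KMS_SECRET_KEY</code>; otherwise, an external KES server. Without one, MinIO rejects the encryption configuration.</p>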
<pre><code class="language-bash">docker exec -it minio-staging \
  mc encrypt set sse-s3 local/domain-files-staging
</code></pre>
<h3 id="heading-124-hard-rules">12.4 Hard rules</h3>
<ul>
<li><p><strong>Never</strong> ship <code>MINIO_ROOT_USER=admin</code> / <code>MINIO_ROOT_PASSWORD=admin123</code> to a server reachable from the internet. Generate strong values and store them in your secret manager.</p>
</li>
<li><p>The root account should be used only by <code>mc admin</code>, never by your app. The app uses a scoped IAM user (Step 7.4).</p>
</li>
<li><p>If the <strong>console</strong> subdomain must be reachable from the internet, put it behind an IP allow‑list or basic auth using Traefik middleware.</p>
</li>
<li><p>Rotate the app access keys at least every 90 days.</p>
</li>
</ul>
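<p>These rules are easy to check mechanically before a deploy. A minimal sketch; <code>credentialProblems</code> is hypothetical, the list of well-known defaults and the 32-character minimum are assumptions, and it assumes you track when each app key was issued (for example, in your secret manager's metadata).</p>
<pre><code class="language-javascript">// Hypothetical pre-deploy check for the hard rules above. The default-value
// list and the 32-character minimum are assumptions; tune them to your policy.
const KNOWN_DEFAULTS = new Set(["admin", "admin123", "minioadmin", "password"]);
const ROTATION_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// createdAt: when the app access key was issued (ISO string or Date).
function credentialProblems({ accessKey, secretKey, createdAt }, now = Date.now()) {
  const problems = [];
  if (KNOWN_DEFAULTS.has(accessKey) || KNOWN_DEFAULTS.has(secretKey)) {
    problems.push("uses a well-known default value");
  }
  const longEnough = String(secretKey ?? "").length >= 32;
  if (!longEnough) {
    problems.push("secret key is shorter than 32 characters");
  }
  const ageDays = (now - new Date(createdAt).getTime()) / DAY_MS;
  if (Number.isNaN(ageDays)) {
    problems.push("key creation date is unknown");
  } else if (ageDays > ROTATION_DAYS) {
    problems.push(`key is ${Math.floor(ageDays)} days old; rotate it`);
  }
  return problems;
}
</code></pre>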
<h2 id="heading-13-step-10-backups-and-monitoring">13. Step 10 — Backups and Monitoring</h2>
<h3 id="heading-131-backups-mirror-to-a-cheap-cold-bucket-weekly">13.1 Backups: mirror to a cheap cold bucket weekly</h3>
<p>Set up a tiny cron job that uses <code>mc mirror</code> to push to Backblaze B2, R2, or another cheap S3 endpoint:</p>
<pre><code class="language-bash">mc alias set b2 https://s3.us-east-005.backblazeb2.com $B2_KEY $B2_SECRET
mc mirror --overwrite --remove \
  staging/domain-files-staging \
  b2/domain-staging-backup
</code></pre>
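<p>Note that <code>--remove</code> also deletes objects from the backup once they disappear from the source, so files removed by the 30-day lifecycle rule vanish from the mirror too; drop the flag if you want the backup to keep them. Even at $6/TB/month, this is essentially free for staging volumes.</p>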
<h3 id="heading-132-monitoring-with-prometheus">13.2 Monitoring with Prometheus</h3>
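<p>MinIO exposes Prometheus metrics out of the box at <code>/minio/v2/metrics/cluster</code>. By default the endpoint requires a bearer token: <code>mc admin prometheus generate staging</code> prints a scrape config that includes one. On a staging box you can instead set <code>MINIO_PROMETHEUS_AUTH_TYPE=public</code>, which is what this minimal scrape config assumes:</p>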
<pre><code class="language-yaml">scrape_configs:
  - job_name: minio
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    static_configs:
      - targets: ["minio-staging.domain.com"]
</code></pre>
<p>If you have Grafana, import dashboard ID <strong>13502</strong> for an instant overview (capacity, request rates, latency, error counts).</p>
<h2 id="heading-14-troubleshooting-cheat-sheet">14. Troubleshooting Cheat Sheet</h2>
<table>
<thead>
<tr>
<th>Symptom</th>
<th>Likely cause</th>
<th>Fix</th>
</tr>
</thead>
<tbody><tr>
<td><code>SignatureDoesNotMatch</code> on presigned PUT</td>
<td>Browser sent a different <code>Content-Type</code> than what was signed</td>
<td>Send the exact same <code>Content-Type</code> header during PUT</td>
</tr>
<tr>
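<td>Presigned URL works from inside the Docker network but not in the browser</td>
<td>The app's S3 client signs against the internal endpoint (for example <code>http://minio:9000</code>), which the browser can't reach. The host is part of the signature, so it can't be rewritten afterwards</td>
<td>Presign with an S3 client configured for the public endpoint (<code>https://minio-staging.domain.com</code>). For share links created in the MinIO Console, also set <code>MINIO_SERVER_URL=https://minio-staging.domain.com</code> and restart</td>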
</tr>
<tr>
<td><code>403 SignatureDoesNotMatch</code> after going through Cloudflare</td>
<td>Cloudflare strips/modifies headers</td>
<td>Set the DNS record to <strong>DNS‑only</strong> (gray cloud)</td>
</tr>
<tr>
<td><code>NoSuchBucket</code></td>
<td>App pointing at the wrong endpoint or bucket</td>
<td>Re‑check <code>S3_ENDPOINT</code> and <code>S3_BUCKET</code> in env</td>
</tr>
<tr>
<td>Browser CORS preflight fails</td>
<td>No CORS rule on the bucket</td>
<td>Apply the CORS JSON from §12.1</td>
</tr>
<tr>
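<td>Upload works for small files, fails at 100 MB</td>
<td>Cloudflare proxy request body limit (100 MB on the Free and Pro plans)</td>
<td>Set the record to <strong>DNS‑only</strong> (gray cloud), use multipart uploads with parts under 100 MB, or move to a plan with a higher limit</td>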
</tr>
<tr>
<td><code>x509: certificate signed by unknown authority</code> from your app</td>
<td>App container doesn't trust Let's Encrypt</td>
<td>Update CA bundle (<code>apt install ca-certificates</code>) or use HTTP inside the Docker network</td>
</tr>
<tr>
<td>Web console redirects to <code>http://minio:9001/login</code></td>
<td><code>MINIO_BROWSER_REDIRECT_URL</code> missing</td>
<td>Set it to <code>https://minio-console-staging.domain.com</code></td>
</tr>
</tbody></table>
<p>Useful diagnostics:</p>
<pre><code class="language-bash"># Check MinIO health
curl -I https://minio-staging.domain.com/minio/health/live

# List all objects in a bucket
docker exec -it minio-staging mc ls --recursive local/domain-files-staging

# Tail MinIO logs
docker compose -f docker-compose.staging.yml logs -f minio

# Decode a presigned URL to see what it was signed for
echo "&lt;paste url&gt;" | tr '&amp;' '\n'
</code></pre>
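<p>The <code>tr</code> one-liner only splits the query string. For a more readable view, here's a sketch of a helper (hypothetical name <code>describePresignedUrl</code>, not part of the AWS SDK) that decodes the standard SigV4 query parameters (<code>X-Amz-Date</code>, <code>X-Amz-Expires</code>, <code>X-Amz-SignedHeaders</code>, <code>X-Amz-Credential</code>) and reports the host, the expiry time, and whether the link has already expired. If <code>content-type</code> appears among the signed headers, the client must send exactly the signed value.</p>
<pre><code class="language-javascript">// Decode the SigV4 query parameters of a presigned URL for debugging.
function describePresignedUrl(url, now = Date.now()) {
  const u = new URL(url);
  const q = u.searchParams; // values come back already percent-decoded
  const amzDate = q.get("X-Amz-Date"); // e.g. "20260616T120000Z"
  if (!amzDate) throw new Error("Not a SigV4 presigned URL (no X-Amz-Date)");

  const signedAt = Date.UTC(
    Number(amzDate.slice(0, 4)), // year
    Number(amzDate.slice(4, 6)) - 1, // month (0-based)
    Number(amzDate.slice(6, 8)), // day
    Number(amzDate.slice(9, 11)), // hours
    Number(amzDate.slice(11, 13)), // minutes
    Number(amzDate.slice(13, 15)), // seconds
  );
  const expiresAt = signedAt + Number(q.get("X-Amz-Expires")) * 1000;

  return {
    host: u.host, // must be the host the browser actually calls
    path: decodeURIComponent(u.pathname), // "/bucket/key" for path-style URLs
    signedHeaders: (q.get("X-Amz-SignedHeaders") ?? "").split(";"),
    credentialScope: q.get("X-Amz-Credential"), // accessKey/date/region/s3/aws4_request
    expiresAt: new Date(expiresAt).toISOString(),
    expired: now >= expiresAt,
  };
}
</code></pre>
<p>Saved as a script, you could end the file with <code>console.log(describePresignedUrl(process.argv[2]))</code> and pass the URL as an argument.</p>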
<h2 id="heading-15-wrapping-up">15. Wrapping Up</h2>
<p>Here's what you have now:</p>
<ul>
<li><p>A free, S3‑compatible object store running on your own staging server.</p>
</li>
<li><p>Real HTTPS on a real domain (<code>https://minio-staging.domain.com</code>), thanks to Traefik + Let's Encrypt.</p>
</li>
<li><p>A scoped, least‑privilege application user — root keys stay locked away.</p>
</li>
<li><p>The same exact code paths in staging and production. Switching between MinIO / R2 / Hetzner / AWS S3 is a four‑variable change in the env file.</p>
</li>
<li><p>Presigned PUT URLs so users upload straight to storage, bypassing your API.</p>
</li>
<li><p>Presigned GET URLs so private documents are short‑lived and authorization‑gated.</p>
</li>
<li><p>Lifecycle rules that nuke old test files automatically.</p>
</li>
<li><p>Optional weekly mirror to a cold backup bucket.</p>
</li>
</ul>
<p>Production keeps running on managed storage where the SLA matters. Staging now costs you exactly <strong>$0 per month per gigabyte uploaded</strong> — and you can finally stop telling QA to "delete the test files when you're done."</p>
<h3 id="heading-further-reading">Further Reading</h3>
<ul>
<li><p><a href="https://min.io/docs/minio/container/index.html">MinIO Documentation</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-s3-request-presigner/">AWS SDK v3 — <code>getSignedUrl</code></a></p>
</li>
<li><p><a href="https://doc.traefik.io/traefik/providers/docker/">Traefik v2 Docker provider</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html">S3 bucket policy reference</a></p>
</li>
</ul>
<p>If this guide saved your team a few dollars, share it with another team that's still uploading test PDFs to a $90/month S3 bucket. Happy shipping.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy a Full-Stack Next.js App on Cloudflare Workers with GitHub Actions CI/CD ]]>
                </title>
                <description>
                    <![CDATA[ I typically build my projects using Next.js 14 (App Router) and Supabase for authentication along with Postgres. The default deployment choice for a Next.js app is usually Vercel, and for good reason: ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-deploy-a-full-stack-next-js-app-on-cloudflare-workers-with-github-actions-ci-cd/</link>
                <guid isPermaLink="false">69f2145e6e0124c05e1a5b6e</guid>
                
                    <category>
                        <![CDATA[ Next.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cloudflare ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GitHub Actions ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Md Tarikul Islam ]]>
                </dc:creator>
                <pubDate>Wed, 29 Apr 2026 14:23:26 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/cbb9e559-baa7-452c-992a-3416041712ad.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>I typically build my projects using Next.js 14 (App Router) and Supabase for authentication along with Postgres. The default deployment choice for a Next.js app is usually Vercel, and for good reason: it provides an excellent developer experience.</p>
<p>But I started exploring Cloudflare Workers as an alternative, and after running the same project on both platforms for about a week, I noticed improvements in latency (lower TTFB) and found the free tier to be more flexible for my use case.</p>
<p>Deploying Next.js apps on Cloudflare used to be challenging. Earlier solutions like Cloudflare Pages had limitations with full Next.js features, and tools like <code>next-on-pages</code> often lagged behind the latest releases.</p>
<p>That changed with the introduction of <a href="https://opennext.js.org/cloudflare"><code>@opennextjs/cloudflare</code></a>. It allows you to compile a standard Next.js application into a Cloudflare Worker, supporting features like SSR, ISR, middleware, and the Image component – all without requiring major code changes.</p>
<p>In this guide, I’ll walk you through the exact steps I used to deploy my full-stack Next.js + Supabase application to Cloudflare Workers.</p>
<p>This article is the runbook I wish I had when I started.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-why-choose-cloudflare-workers-over-vercel">Why Choose Cloudflare Workers Over Vercel?</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-the-stack">The Stack</a></p>
</li>
<li><p><a href="#heading-step-1-install-the-cloudflare-adapter">Step 1 — Install the Cloudflare Adapter</a></p>
</li>
<li><p><a href="#heading-step-2-wire-opennext-into-next-dev">Step 2 — Wire OpenNext into next dev</a></p>
</li>
<li><p><a href="#heading-step-3-local-environment-setup-with-devvars">Step 3 — Local Environment Setup with .dev.vars</a></p>
</li>
<li><p><a href="#heading-step-4-deploy-your-app-from-your-local-machine">Step 4 — Deploy Your App from Your Local Machine</a></p>
</li>
<li><p><a href="#heading-step-5-push-your-secrets-to-the-worker">Step 5 — Push Your Secrets to the Worker</a></p>
</li>
<li><p><a href="#heading-step-6-set-up-continuous-deployment-with-github-actions">Step 6 — Set Up Continuous Deployment with GitHub Actions</a></p>
</li>
<li><p><a href="#heading-step-7-updating-the-project-the-daily-workflow">Step 7 — Updating the Project (the Daily Workflow)</a></p>
</li>
<li><p><a href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h2 id="heading-why-choose-cloudflare-workers-over-vercel">Why Choose Cloudflare Workers Over Vercel?</h2>
<p>When deploying a Next.js application, Vercel is often the default choice. It offers a smooth developer experience and tight integration with Next.js.</p>
<p>But Cloudflare Workers provides a compelling alternative, especially when you care about global performance and cost efficiency.</p>
<p>Here’s a high-level comparison (at the time of writing):</p>
<table>
<thead>
<tr>
<th>Concern</th>
<th>Vercel (Hobby)</th>
<th>Cloudflare Workers (Free Tier)</th>
</tr>
</thead>
<tbody><tr>
<td>Requests</td>
<td>Fair usage limits</td>
<td>100,000 requests per day</td>
</tr>
<tr>
<td>Cold starts</td>
<td>~100–300 ms (region-based)</td>
<td>Near-zero (V8 isolates)</td>
</tr>
<tr>
<td>Edge locations</td>
<td>Limited regions for SSR</td>
<td>300+ global edge locations</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>~100 GB/month (soft cap)</td>
<td>Generous / no strict cap on free tier</td>
</tr>
<tr>
<td>Custom domains</td>
<td>Supported</td>
<td>Supported</td>
</tr>
<tr>
<td>Image optimization</td>
<td>Counts toward usage</td>
<td>Available via <code>IMAGES</code> binding</td>
</tr>
<tr>
<td>Pricing beyond free</td>
<td>Starts at ~$20/month</td>
<td>From $5/month (Workers Paid), then usage-based</td>
</tr>
</tbody></table>
<h3 id="heading-key-takeaways">Key Takeaways</h3>
<ul>
<li><p><strong>Lower latency globally</strong>: Cloudflare runs your app across hundreds of edge locations, reducing response time for users worldwide.</p>
</li>
<li><p><strong>Minimal cold starts</strong>: Thanks to V8 isolates, functions start almost instantly.</p>
</li>
<li><p><strong>Cost efficiency</strong>: The free tier is generous enough for portfolios, blogs, and many small-to-medium apps.</p>
</li>
</ul>
<h3 id="heading-trade-offs-to-consider">Trade-offs to Consider</h3>
<p>Cloudflare Workers use a V8 isolate runtime, not a full Node.js environment. That means:</p>
<ul>
<li><p>Some Node.js APIs like <code>fs</code> or <code>child_process</code> aren't available</p>
</li>
<li><p>Native binaries or certain libraries may not work</p>
</li>
</ul>
<p>That said, for most modern stacks –&nbsp;like Next.js + Supabase + Stripe + Resend – this limitation is rarely an issue.</p>
<p>In short, choose <strong>Vercel</strong> if you want the simplest, plug-and-play Next.js deployment. Choose <strong>Cloudflare Workers</strong> if you want better edge performance and more flexible scaling.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before getting started, make sure you have the following set up. Most of these take only a few minutes:</p>
<ul>
<li><p><strong>Node.js 18+</strong> and <strong>pnpm 9+</strong> (you can also use npm or yarn, but this guide uses pnpm.)</p>
</li>
<li><p>A <strong>Cloudflare account</strong> 👉 <a href="https://dash.cloudflare.com/sign-up">https://dash.cloudflare.com/sign-up</a></p>
</li>
<li><p>A <strong>Supabase account</strong> (if your app uses a database) 👉 <a href="https://supabase.com">https://supabase.com</a></p>
</li>
<li><p>A <strong>GitHub repository</strong> for your project (required later for CI/CD setup)</p>
</li>
<li><p>A <strong>domain name</strong> (optional) – You’ll get a free <code>*.workers.dev</code> URL by default.</p>
</li>
</ul>
<h3 id="heading-install-wrangler-cloudflare-cli">Install Wrangler (Cloudflare CLI)</h3>
<p>We’ll use Wrangler to build and deploy the application:</p>
<pre><code class="language-bash">pnpm add -D wrangler
</code></pre>
<h2 id="heading-the-stack">The Stack</h2>
<p>Here’s the tech stack used in this project:</p>
<ul>
<li><p><strong>Next.js (v14.2.x):</strong> Using the App Router with Edge runtime for both public and dashboard routes</p>
</li>
<li><p><strong>Supabase:</strong> Handles authentication, Postgres database, and Row-Level Security (RLS)</p>
</li>
<li><p><strong>Tailwind CSS</strong> + UI utilities: For styling, along with lightweight animation using Framer Motion</p>
</li>
<li><p><strong>Cloudflare Workers:</strong> Deployment powered by <code>@opennextjs/cloudflare</code> and <code>wrangler</code></p>
</li>
<li><p><strong>GitHub Actions:</strong> Used to automate CI/CD and deployments</p>
</li>
</ul>
<p><strong>Note:</strong> If you're using Next.js <strong>15 or later</strong>, you can remove the <code>--dangerouslyUseUnsupportedNextVersion</code> flag from the build script, as it's only required for certain Next.js 14 setups.</p>
<h2 id="heading-step-1-install-the-cloudflare-adapter">Step 1 — Install the Cloudflare Adapter</h2>
<p>From inside your existing Next.js project, install the OpenNext adapter along with Wrangler (Cloudflare’s CLI tool):</p>
<pre><code class="language-bash">pnpm add @opennextjs/cloudflare
pnpm add -D wrangler
</code></pre>
<p>Then add the deploy scripts to <code>package.json</code>:</p>
<pre><code class="language-jsonc">{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "next lint",

    "cloudflare-build": "opennextjs-cloudflare build --dangerouslyUseUnsupportedNextVersion",
    "preview":          "pnpm cloudflare-build &amp;&amp; opennextjs-cloudflare preview",
    "deploy":           "pnpm cloudflare-build &amp;&amp; wrangler deploy",
    "upload":           "pnpm cloudflare-build &amp;&amp; opennextjs-cloudflare upload",
    "cf-typegen":       "wrangler types --env-interface CloudflareEnv cloudflare-env.d.ts"
  }
}
</code></pre>
<p>What each script does:</p>
<table>
<thead>
<tr>
<th>Script</th>
<th>What it does</th>
</tr>
</thead>
<tbody><tr>
<td><code>pnpm cloudflare-build</code></td>
<td>Compiles your Next app into <code>.open-next/</code> (the Worker bundle). No upload.</td>
</tr>
<tr>
<td><code>pnpm preview</code></td>
<td>Builds and runs the Worker locally with <code>wrangler dev</code>. Closest thing to prod.</td>
</tr>
<tr>
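<td><code>pnpm run deploy</code></td>
<td>Builds and uploads to Cloudflare. <strong>This ships to production.</strong> Use <code>pnpm run deploy</code>, not <code>pnpm deploy</code>: the latter is a built-in pnpm command and won't run your script.</td>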
</tr>
<tr>
<td><code>pnpm upload</code></td>
<td>Builds and uploads a <em>new version</em> without promoting it (for staged rollouts).</td>
</tr>
<tr>
<td><code>pnpm cf-typegen</code></td>
<td>Regenerates <code>cloudflare-env.d.ts</code> types after editing <code>wrangler.jsonc</code>.</td>
</tr>
</tbody></table>
<p><strong>Heads up:</strong> the Pages-based <code>@cloudflare/next-on-pages</code> is a different tool. We are <strong>not</strong> using Pages — we're deploying as a real Worker. Don't mix the two.</p>
<h2 id="heading-step-2-wire-opennext-into-next-dev">Step 2 — Wire OpenNext into <code>next dev</code></h2>
<p>So that <code>pnpm dev</code> can read your Cloudflare bindings (env vars, R2, KV, D1, …) the same way production will, edit <code>next.config.mjs</code>:</p>
<pre><code class="language-js">/** @type {import('next').NextConfig} */
const nextConfig = {};

if (process.env.NODE_ENV !== "production") {
  const { initOpenNextCloudflareForDev } = await import(
    "@opennextjs/cloudflare"
  );
  initOpenNextCloudflareForDev();
}

export default nextConfig;
</code></pre>
<p>We only call it in development so <code>next build</code> stays fast and CI doesn't spin up a Miniflare instance for nothing.</p>
<h2 id="heading-step-3-local-environment-setup-with-devvars">Step 3 — Local Environment Setup with <code>.dev.vars</code></h2>
<p>When working with Cloudflare Workers locally, Wrangler uses a file called <code>.dev.vars</code> to store environment variables (instead of <code>.env.local</code> used by Next.js).</p>
<p>A simple and reliable approach is to keep an example file in your repo and ignore the real one.</p>
<h3 id="heading-example-devvarsexample-committed">Example: <code>.dev.vars.example</code> (committed)</h3>
<pre><code class="language-bash">NEXT_PUBLIC_SUPABASE_URL="https://YOUR-PROJECT-ref.supabase.co"
NEXT_PUBLIC_SUPABASE_ANON_KEY="YOUR-ANON-KEY"
NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL="admin@example.com"
</code></pre>
<h3 id="heading-set-up-your-local-environment">Set Up Your Local Environment</h3>
<p>Run the following commands:</p>
<pre><code class="language-bash">cp .dev.vars.example .dev.vars
cp .dev.vars .env.local
</code></pre>
<ul>
<li><p><code>.dev.vars</code> is used by Wrangler (<code>wrangler dev</code>)</p>
</li>
<li><p><code>.env.local</code> is used by Next.js (<code>next dev</code>)</p>
</li>
</ul>
<h3 id="heading-why-use-both-files">Why Use Both Files?</h3>
<ul>
<li><p><code>next dev</code> reads from <code>.env.local</code></p>
</li>
<li><p><code>wrangler dev</code> (used in <code>pnpm preview</code>) reads from <code>.dev.vars</code></p>
</li>
</ul>
<p>Keeping both files in sync ensures your app behaves consistently in development and when running in the Cloudflare runtime.</p>
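<p>Keeping the two files in sync by hand is easy to forget. A small sketch that reports keys present in one file but not the other; <code>envKeys</code> and <code>diffEnvFiles</code> are hypothetical helpers, and the parser only understands simple <code>KEY=value</code> lines (optionally prefixed with <code>export</code>), skipping blank lines and <code>#</code> comments.</p>
<pre><code class="language-javascript">// Hypothetical helpers: list the keys defined in a dotenv-style file and
// report which ones exist in only one of .dev.vars / .env.local.
function envKeys(text) {
  return new Set(
    text
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .filter((line) => !line.startsWith("#"))
      .filter((line) => line.includes("="))
      .map((line) => line.split("=")[0].replace(/^export\s+/, "").trim()),
  );
}

function diffEnvFiles(devVarsText, envLocalText) {
  const devVars = envKeys(devVarsText);
  const envLocal = envKeys(envLocalText);
  return {
    onlyInDevVars: [...devVars].filter((k) => !envLocal.has(k)),
    onlyInEnvLocal: [...envLocal].filter((k) => !devVars.has(k)),
  };
}

// Usage sketch, e.g. in a hypothetical scripts/check-env.js:
// const fs = require("node:fs");
// console.log(diffEnvFiles(fs.readFileSync(".dev.vars", "utf8"),
//                          fs.readFileSync(".env.local", "utf8")));
</code></pre>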
<h3 id="heading-update-gitignore">Update <code>.gitignore</code></h3>
<p>Make sure these files are ignored:</p>
<pre><code class="language-plaintext">.dev.vars
.env*.local
.open-next
.wrangler
</code></pre>
<h2 id="heading-step-4-deploy-your-app-from-your-local-machine">Step 4 — Deploy Your App from Your Local Machine</h2>
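<p>Once <code>pnpm preview</code> is working correctly, you're ready to deploy your application (if Wrangler isn't authenticated on this machine yet, run <code>pnpm wrangler login</code> first):</p>
<pre><code class="language-bash">pnpm run deploy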
</code></pre>
<p>Under the hood that runs:</p>
<pre><code class="language-bash">pnpm cloudflare-build &amp;&amp; wrangler deploy
</code></pre>
<p>The first time, Wrangler will:</p>
<ol>
<li><p>Compile your app to <code>.open-next/worker.js</code>.</p>
</li>
<li><p>Upload the script + assets to Cloudflare.</p>
</li>
<li><p>Print your live URL, e.g. <code>https://portfolio.&lt;your-account&gt;.workers.dev</code>.</p>
</li>
</ol>
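<p>Open it in a browser. Congratulations, you're on Cloudflare's edge in 330+ cities. Static and cached pages should typically come back in <strong>under 100 ms</strong> TTFB from most locations; server-rendered routes that query Supabase also pay for that database round trip.</p>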
<p><a href="https://portfolio.tarikuldev.workers.dev/">Here's the live version of my own portfolio deployed this way</a></p>
<h2 id="heading-step-5-push-your-secrets-to-the-worker">Step 5 — Push Your Secrets to the Worker</h2>
<p>Local <code>.dev.vars</code> is <strong>not</strong> uploaded by <code>wrangler deploy</code>. You have to push secrets explicitly:</p>
<pre><code class="language-bash">wrangler secret put NEXT_PUBLIC_SUPABASE_URL
wrangler secret put NEXT_PUBLIC_SUPABASE_ANON_KEY
wrangler secret put NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL
</code></pre>
<p>Each command prompts you for the value and stores it encrypted on Cloudflare. Or do it visually:</p>
<blockquote>
<p>Cloudflare Dashboard → <strong>Workers &amp; Pages</strong> → your worker → <strong>Settings</strong> → <strong>Variables and Secrets</strong> → <strong>Add</strong>.</p>
</blockquote>
<p>Important: <code>NEXT_PUBLIC_*</code> vars are inlined into the client bundle at build time, so they also need to be available when <code>pnpm cloudflare-build</code> runs (locally, that's your <code>.env.local</code>; in CI, see Step 6).</p>
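<p>Because a missing <code>NEXT_PUBLIC_*</code> value silently ends up as <code>undefined</code> in the shipped bundle, a fail-fast check before the build can save a confusing deploy. A minimal sketch; the <code>missingPublicVars</code> helper and the script path in the comment are hypothetical, and the variable list mirrors the ones used in this guide.</p>
<pre><code class="language-javascript">// Hypothetical pre-build guard: list NEXT_PUBLIC_* values that are missing
// or blank, since they get inlined into the client bundle at build time.
const REQUIRED_PUBLIC_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL",
];

function missingPublicVars(env = process.env, required = REQUIRED_PUBLIC_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Example wiring (e.g. scripts/check-public-env.js):
// const missing = missingPublicVars();
// if (missing.length) { console.error(`Missing: ${missing.join(", ")}`); process.exit(1); }
</code></pre>
<p>It fits naturally as a CI step right before <code>pnpm run deploy</code>, where the values come from repository secrets. Locally they live in <code>.env.local</code>, which a plain Node script doesn't read; on Node 20.6 or later you can pass <code>--env-file=.env.local</code>.</p>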
<h2 id="heading-step-6-set-up-continuous-deployment-with-github-actions">Step 6 — Set Up Continuous Deployment with GitHub Actions</h2>
<p>Once your local deployment is working, the next step is automating deployments so every push to the <code>main</code> branch updates production automatically.</p>
<p>With this workflow:</p>
<ul>
<li><p>Pull requests will run validation checks</p>
</li>
<li><p>Production deploys only happen after successful builds</p>
</li>
<li><p>Code that fails lint or the build never reaches your live site</p>
</li>
</ul>
<p>Create the following file inside your project:</p>
<p><code>.github/workflows/deploy.yml</code></p>
<pre><code class="language-yaml">name: CI / Deploy to Cloudflare Workers

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: cloudflare-deploy-${{ github.ref }}
  cancel-in-progress: true

jobs:
  verify:
    name: Lint and Build
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install --frozen-lockfile
      - run: pnpm lint
      - run: pnpm build
        env:
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          NEXT_PUBLIC_SUPABASE_ANON_KEY: ${{ secrets.NEXT_PUBLIC_SUPABASE_ANON_KEY }}
          NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL: ${{ secrets.NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL }}

  deploy:
    name: Deploy to Cloudflare Workers
    needs: verify
    if: github.event_name == 'push' &amp;&amp; github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install --frozen-lockfile

      - name: Build and Deploy
        run: pnpm run deploy
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          NEXT_PUBLIC_SUPABASE_URL: ${{ secrets.NEXT_PUBLIC_SUPABASE_URL }}
          NEXT_PUBLIC_SUPABASE_ANON_KEY: ${{ secrets.NEXT_PUBLIC_SUPABASE_ANON_KEY }}
          NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL: ${{ secrets.NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL }}
</code></pre>
<h3 id="heading-required-github-repo-secrets">Required GitHub repo secrets</h3>
<p>Go to GitHub repo → Settings → Secrets and variables → Actions → New repository secret and add:</p>
<table>
<thead>
<tr>
<th>Secret</th>
<th>Where to get it</th>
</tr>
</thead>
<tbody><tr>
<td><code>CLOUDFLARE_API_TOKEN</code></td>
<td><a href="https://dash.cloudflare.com/profile/api-tokens">https://dash.cloudflare.com/profile/api-tokens</a> → "Edit Cloudflare Workers" template</td>
</tr>
<tr>
<td><code>CLOUDFLARE_ACCOUNT_ID</code></td>
<td>Cloudflare dashboard → right sidebar, "Account ID"</td>
</tr>
<tr>
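<td><code>CLOUDFLARE_ACCOUNT_SUBDOMAIN</code> (optional)</td>
<td>Your <code>*.workers.dev</code> subdomain. The workflow above doesn't reference it; you only need it if you add a deployment URL link (for example, an <code>environment.url</code>)</td>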
</tr>
<tr>
<td><code>NEXT_PUBLIC_SUPABASE_URL</code></td>
<td>Supabase project settings</td>
</tr>
<tr>
<td><code>NEXT_PUBLIC_SUPABASE_ANON_KEY</code></td>
<td>Supabase project settings</td>
</tr>
<tr>
<td><code>NEXT_PUBLIC_DASHBOARD_DEFAULT_EMAIL</code></td>
<td>Email pre-filled on <code>/dashboard/login</code></td>
</tr>
</tbody></table>
<p>That's it. Push to <code>main</code> and it'll go live in about 90 seconds. PRs run only lint and build, so code that fails either check never reaches production.</p>
<h2 id="heading-step-7-updating-the-project-the-daily-workflow">Step 7 — Updating the Project (the Daily Workflow)</h2>
<p>After the initial setup, the loop is boringly simple — which is the whole point. Here's what I actually do day-to-day:</p>
<h3 id="heading-code-change">Code Change</h3>
<pre><code class="language-bash">git checkout -b feat/new-section
# ...edit files...
pnpm dev                # iterate locally
pnpm preview            # final smoke test on the Worker runtime
git add -A               # "commit -a" alone would skip new, untracked files
git commit -m "feat: add new section"
git push origin feat/new-section
</code></pre>
<p>Open a PR, and the <strong>verify</strong> job runs. Once you review and merge it, the <strong>deploy</strong> job ships to Cloudflare automatically.</p>
<h3 id="heading-updating-env-vars-secrets">Updating env Vars / Secrets</h3>
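<pre><code class="language-bash"># Local: edit both files so next dev and wrangler dev agree
nano .dev.vars
nano .env.local

# Production (runtime value on the Worker)
wrangler secret put NEXT_PUBLIC_SUPABASE_URL
# ...etc. NEXT_PUBLIC_* values are also inlined at build time,
# so update the matching GitHub repository secret too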
</code></pre>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>When I started this migration, I was nervous about leaving Vercel, because the Next.js DX there is genuinely excellent. But in my experience, once you push beyond a hobby site, Cloudflare's economics and edge performance come out clearly ahead.</p>
<p>With <code>@opennextjs/cloudflare</code>, the developer experience has also caught up: my <code>pnpm dev</code> loop is identical, my <code>pnpm preview</code> mimics production, and <code>git push</code> deploys globally in ~90 seconds.</p>
<p>If you've been holding off because the old Cloudflare Pages + Next.js story was rough, that era is over. Try this runbook on a side project this weekend and see for yourself.</p>
<p>If you found this useful, the full repo is <a href="./">here</a> — feel free to clone it as a starter.</p>
<p>Happy shipping.</p>
<p>— <em>Tarikul</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How I Built a Production-Ready CI/CD Pipeline for a Monorepo-Based Microservices System with Jenkins, Docker Compose, and Traefik ]]>
                </title>
                <description>
                    <![CDATA[ This tutorial is a complete, real-world guide to building a production-ready CI/CD pipeline using Jenkins, Docker Compose, and Traefik on a single Linux server. You’ll learn how to expose services on  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-production-ready-ci-cd-pipeline-for-monorepo-based-microservices-system/</link>
                <guid isPermaLink="false">69ea60c8904b915438a58ca2</guid>
                
                    <category>
                        <![CDATA[ Jenkins ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ci-cd ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Traefik ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Devops ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Md Tarikul Islam ]]>
                </dc:creator>
                <pubDate>Thu, 23 Apr 2026 18:11:20 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/66cb39fcaa2a09f9a8d691c1/d59c62f5-e376-4f09-851f-83e437f9960a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>This tutorial is a complete, real-world guide to building a production-ready CI/CD pipeline using Jenkins, Docker Compose, and Traefik on a single Linux server.</p>
<p>You’ll learn how to expose services on a custom domain with auto-renewing HTTPS, and implement a smart deployment strategy that detects changes and redeploys only the affected microservices. This helps avoid unnecessary full-stack redeploys. We'll also cover real production issues and the exact fixes for each one.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a href="#heading-1-what-youll-build">1. What you'll build</a></p>
</li>
<li><p><a href="#heading-2-architecture">2. Architecture</a></p>
</li>
<li><p><a href="#heading-3-server-prerequisites">3. Server prerequisites</a></p>
</li>
<li><p><a href="#heading-4-traefik-the-reverse-proxy">4. Traefik — the reverse proxy</a></p>
</li>
<li><p><a href="#heading-5-run-jenkins-in-docker">5. Run Jenkins in Docker</a></p>
</li>
<li><p><a href="#heading-6-expose-jenkins-on-a-domain-via-traefik">6. Expose Jenkins on a domain via Traefik</a></p>
</li>
<li><p><a href="#heading-7-first-time-jenkins-setup">7. First-time Jenkins setup</a></p>
</li>
<li><p><a href="#heading-8-add-the-github-credential">8. Add the GitHub credential</a></p>
</li>
<li><p><a href="#heading-9-create-the-pipeline-job">9. Create the pipeline job</a></p>
</li>
<li><p><a href="#heading-10-the-jenkinsfile-deploy-only-what-changed">10. The Jenkinsfile (deploy only what changed)</a></p>
</li>
<li><p><a href="#heading-11-end-to-end-test">11. End-to-end test</a></p>
</li>
<li><p><a href="#heading-12-troubleshooting-every-error-we-hit">12. Troubleshooting — every error we hit</a></p>
</li>
<li><p><a href="#heading-13-mental-model-host-vs-container">13. Mental model: host vs. container</a></p>
</li>
<li><p><a href="#heading-14-daily-operations-cheat-sheet">14. Daily operations cheat sheet</a></p>
</li>
<li><p><a href="#heading-15-what-id-do-differently-next-time">15. What I'd do differently next time</a></p>
</li>
<li><p><a href="#heading-closing-thoughts">Closing thoughts</a></p>
</li>
</ul>
<h2 id="heading-1-what-youll-build">1. What You'll Build</h2>
<p>In this tutorial, you'll build a Jenkins instance running inside Docker on the same Linux server as your application stack.</p>
<p>Traefik will act as a reverse proxy in front of Jenkins, exposing it via a clean URL (<a href="https://jenkins.example.com"><code>https://jenkins.example.com</code></a>) with <strong>auto-renewing Let's Encrypt certificates</strong>.</p>
<p>You'll also create a Jenkinsfile in your application repository that:</p>
<ul>
<li><p>Automatically triggers on every push to the <code>staging</code> branch,</p>
</li>
<li><p>Detects which microservices changed in each commit,</p>
</li>
<li><p>Pulls the latest code on the host machine,</p>
</li>
<li><p>Rebuilds and restarts <strong>only the affected services</strong>.</p>
</li>
</ul>
<p>On every push, only the relevant services are redeployed.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>Before you jump in, note that this guide assumes you’re already comfortable with a few core concepts and tools.</p>
<p>This isn't a beginner-level tutorial — we’ll be working directly with infrastructure, containers, and CI/CD pipelines.</p>
<p>You should be familiar with:</p>
<ul>
<li><p>Basic Linux commands (SSH, file system navigation, permissions)</p>
</li>
<li><p>Docker fundamentals (images, containers, volumes, networks)</p>
</li>
<li><p>Git workflows (clone, pull, branches)</p>
</li>
<li><p>A general understanding of CI/CD pipelines</p>
</li>
</ul>
<p>Tools and environment required:</p>
<ul>
<li><p>A Linux server (Ubuntu recommended)</p>
</li>
<li><p>Docker Engine + Docker Compose (v2)</p>
</li>
<li><p>A domain name (for Traefik + HTTPS)</p>
</li>
<li><p>A GitHub repository (for your backend project)</p>
</li>
<li><p>Basic understanding of microservices architecture</p>
</li>
</ul>
<p>If you’re comfortable with the above, you’re ready to follow along.</p>
<h2 id="heading-2-architecture">2. Architecture</h2>
<p>Here's an overview of the architecture:</p>
<pre><code class="language-plaintext">┌──────────────────────────── Linux server (Ubuntu) ────────────────────────────┐
│                                                                               │
│   /home/developer/projects/                                                   │
│       └── projects-prod-configs/            ← infra repo (compose, Traefik)   │
│              ├── docker-compose.staging.yml                                   │
│              ├── traefik.staging.yml                                          │
│              └── projects-backend/          ← app repo (services, gateways)   │
│                     ├── Jenkinsfile                                           │
│                     ├── docker-compose.staging.yml                            │
│                     └── apps/                                                 │
│                            ├── services/&lt;name&gt;/                               │
│                            ├── gateways/&lt;name&gt;/                               │
│                            └── core/&lt;name&gt;/                                   │
│                                                                               │
│   ┌─────────────────────── Docker network: proxy ──────────────────────┐      │
│   │  traefik (80, 443)                                                 │      │
│   │     │                                                              │      │
│   │     ├──► jenkins  (projects-jenkins-staging)                       │      │
│   │     │      ↳ /projects  ← bind-mount of the host project tree      │      │
│   │     │      ↳ /var/run/docker.sock ← controls host Docker           │      │
│   │     │                                                              │      │
│   │     └──► your services &amp; gateways (built by the pipeline)          │      │
│   └────────────────────────────────────────────────────────────────────┘      │
│                                                                               │
└───────────────────────────────────────────────────────────────────────────────┘
            ▲
            │  webhook on push
            │
   GitHub: &lt;org&gt;/projects-backend (branch: staging)
</code></pre>
<p>There are two key ideas here:</p>
<ol>
<li><p><strong>Jenkins runs in a container</strong>, but it controls the <strong>host's</strong> Docker by mounting <code>/var/run/docker.sock</code>. It also bind-mounts the project folder as <code>/projects/...</code>, so it can <code>cd</code> into the real code on the host and run <code>docker compose</code> there.</p>
</li>
<li><p>The <strong>Jenkinsfile lives inside the app repo</strong>, so the pipeline definition is versioned with the code. Jenkins simply points at it.</p>
</li>
</ol>
<h2 id="heading-3-server-prerequisites">3. Server Prerequisites</h2>
<p>Before we start configuring Jenkins or Traefik, we need to prepare the server properly.</p>
<p>In this step, we’ll:</p>
<ul>
<li><p>Create a dedicated Linux user for managing the project</p>
</li>
<li><p>Install Docker and Docker Compose</p>
</li>
<li><p>Set up the folder structure for our repositories</p>
</li>
</ul>
<p>This ensures our CI/CD pipeline runs in a clean and predictable environment.</p>
<pre><code class="language-bash"># Linux user that owns the project tree
sudo adduser developer

# Docker engine + Compose plugin
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker developer
# The group change applies at developer's next login (or in a shell started with: newgrp docker)

# Sanity check Compose v2
docker compose version
# -&gt; Docker Compose version v2.x.y

# Find where the Compose plugin binary lives — write it down, you'll need it
ls /usr/libexec/docker/cli-plugins/docker-compose
# (some distros use /usr/lib/docker/cli-plugins/docker-compose)

# Project layout
sudo mkdir -p /home/developer/projects
sudo chown -R developer:developer /home/developer/projects

# Clone both repos in the right place (as developer, so developer owns them)
cd /home/developer/projects
git clone https://github.com/&lt;org&gt;/projects-prod-configs.git
cd projects-prod-configs
git clone -b staging https://github.com/&lt;org&gt;/projects-backend.git
</code></pre>
<p>You should now have:</p>
<pre><code class="language-plaintext">/home/developer/projects/projects-prod-configs/projects-backend
</code></pre>
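<p>Note this path. Inside the Jenkins container it appears as <code>/projects/projects-prod-configs/projects-backend</code> (through the bind mount defined in section 5), and that container path is what your Jenkinsfile references.</p>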
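<p>Because Jenkins sees the host tree through the <code>/home/developer/projects:/projects</code> bind mount, every host path under it has a container twin. A small sketch of that translation can help when a path only fails inside Jenkins; the function and variable names are hypothetical, and the two roots simply mirror the compose file.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Hypothetical helper: translate a host path under the bind-mounted tree
# into the path the Jenkins container sees. Roots mirror the compose bind mount.
HOST_ROOT="/home/developer/projects"
CONTAINER_ROOT="/projects"

to_container_path() {
  local host_path="$1"
  case "$host_path" in
    "$HOST_ROOT" | "$HOST_ROOT"/*)
      # Drop the host prefix and graft the remainder onto the mount point.
      printf '%s\n' "${CONTAINER_ROOT}${host_path#"$HOST_ROOT"}"
      ;;
    *)
      # Outside the mount: Jenkins cannot see this path at all.
      return 1
      ;;
  esac
}
</code></pre>
<p>For example, running <code>to_container_path "$PWD"</code> from the backend checkout prints the value <code>PROJECT_PATH</code> should hold in the Jenkinsfile.</p>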
<h3 id="heading-dns">DNS</h3>
<p>Point A records for your Jenkins subdomain, and for the Traefik dashboard host used in section 4, to the server's public IP <strong>before</strong> the next steps, so Let's Encrypt can validate them via the HTTP challenge:</p>
<pre><code class="language-plaintext">jenkins.example.com   A   &lt;server-public-ip&gt;
traefik.example.com   A   &lt;server-public-ip&gt;
</code></pre>
<h2 id="heading-4-traefik-the-reverse-proxy">4. Traefik — the Reverse Proxy</h2>
<p>Traefik acts as the entry point to your entire system. Instead of exposing each service manually with ports, Traefik automatically:</p>
<ul>
<li><p>Routes traffic based on domain names</p>
</li>
<li><p>Generates and renews HTTPS certificates using Let’s Encrypt</p>
</li>
<li><p>Connects to Docker and detects services dynamically</p>
</li>
</ul>
<p>In simple terms, Traefik lets you access services like:</p>
<p><a href="https://jenkins.example.com">https://jenkins.example.com</a><br><a href="https://api.example.com">https://api.example.com</a></p>
<p>…without manually configuring NGINX or managing SSL certificates.</p>
<p>In this setup, Traefik watches Docker containers and routes traffic using labels we'll define later.</p>
<p>The payoff is that every container gets a real domain and a real certificate without any per-service Traefik config: you just add a few labels to the container.</p>
<h3 id="heading-traefikstagingyml-static-config"><code>traefik.staging.yml</code> (static config)</h3>
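<p>Put this at the root of your infra repo. Traefik treats its static configuration sources (configuration file, command-line flags, environment variables) as mutually exclusive, so keep this file and the <code>command:</code> flags in the compose service below in sync.</p>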
<pre><code class="language-yaml">api:
  dashboard: true

entryPoints:
  web:
    address: ":80"
    http:
      redirections:                      # send all HTTP traffic to HTTPS
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

certificatesResolvers:
  letsencrypt:
    acme:
      httpChallenge:
        entryPoint: web
      email: admin@example.com           # ← change me
      storage: /etc/traefik/acme.json

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false              # only containers with traefik.enable=true
    network: proxy
  file:
    directory: /etc/traefik/dynamic
    watch: true

log:
  level: INFO

accessLog: {}
</code></pre>
<h3 id="heading-the-traefik-service-in-docker-composestagingyml">The Traefik service in <code>docker-compose.staging.yml</code></h3>
<pre><code class="language-yaml">networks:
  proxy:
    name: proxy
    driver: bridge
  internal:
    name: internal
    driver: bridge

volumes:
  acme-data:
  traefik-logs:
  jenkins-data:

services:
  traefik:
    image: traefik:v2.11
    container_name: projects-traefik-staging
    restart: unless-stopped
    ports:
      - "80:80"        # HTTP (auto-redirects to HTTPS)
      - "443:443"      # HTTPS
      - "8080:8080"    # Traefik dashboard (internal only — protect via firewall)
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik.staging.yml:/etc/traefik/traefik.yml:ro
      - ./dynamic:/etc/traefik/dynamic:ro
      - acme-data:/etc/traefik           # persists Let's Encrypt certs
      - traefik-logs:/var/log/traefik
    networks:
      - proxy
    command:
      - '--api.insecure=false'
      - '--api.dashboard=true'
      - '--providers.docker=true'
      - '--providers.docker.exposedbydefault=false'
      - '--providers.docker.network=proxy'
      - '--entrypoints.web.address=:80'
      - '--entrypoints.websecure.address=:443'
      - '--entrypoints.web.http.redirections.entryPoint.to=websecure'
      - '--entrypoints.web.http.redirections.entryPoint.scheme=https'
      - '--certificatesresolvers.letsencrypt.acme.httpchallenge=true'
      - '--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web'
      - '--certificatesresolvers.letsencrypt.acme.email=${ACME_EMAIL:-admin@example.com}'
      - '--certificatesresolvers.letsencrypt.acme.storage=/etc/traefik/acme.json'
      - '--log.level=INFO'
      - '--accesslog=true'
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"
      # Traefik's own dashboard
      - "traefik.http.routers.traefik-dash.rule=Host(`traefik.example.com`)"
      - "traefik.http.routers.traefik-dash.entrypoints=websecure"
      - "traefik.http.routers.traefik-dash.tls.certresolver=letsencrypt"
      - "traefik.http.routers.traefik-dash.service=api@internal"
</code></pre>
<p>Bring it up:</p>
<pre><code class="language-bash">cd /home/developer/projects/projects-prod-configs
docker compose -f docker-compose.staging.yml up -d traefik
</code></pre>
<p>Watch the logs the first time — Traefik will request a cert for the dashboard host as soon as DNS resolves.</p>
<pre><code class="language-bash">docker logs -f projects-traefik-staging
</code></pre>
<p><strong>Tip.</strong> While testing, point the resolver's <code>acme.caServer</code> at the Let's Encrypt staging endpoint (<code>https://acme-staging-v02.api.letsencrypt.org/directory</code>; as a flag: <code>--certificatesresolvers.letsencrypt.acme.caserver=...</code>) so a DNS mistake doesn't burn through Let's Encrypt's production rate limits. Remove that setting before going live.</p>
<h2 id="heading-5-run-jenkins-in-docker">5. Run Jenkins in Docker</h2>
<p>Add this Jenkins service to the same <code>docker-compose.staging.yml</code>. Every line matters (and the comments explain why).</p>
<pre><code class="language-yaml">  jenkins:
    image: jenkins/jenkins:lts
    container_name: projects-jenkins-staging
    restart: unless-stopped
    user: root                           # to use host docker.sock without UID juggling
    environment:
      - JAVA_OPTS=-Xmx1g -Xms512m -Duser.timezone=Asia/Dhaka
      - TZ=Asia/Dhaka                    # OS-level timezone inside container
      - JENKINS_OPTS=--prefix=/
    ports:
      - "3095:8080"                      # web UI (also reachable directly if needed)
      - "50000:50000"                    # inbound agent port
    volumes:
      - jenkins-data:/var/jenkins_home   # Jenkins config/jobs/secrets persistence
      - /var/run/docker.sock:/var/run/docker.sock                          # control host Docker
      - /usr/bin/docker:/usr/bin/docker                                     # docker CLI from host
      - /usr/libexec/docker/cli-plugins:/usr/libexec/docker/cli-plugins:ro  # docker compose plugin
      - /home/developer/projects:/projects                                # project tree
      - /etc/localtime:/etc/localtime:ro                                    # match host clock
      - /etc/timezone:/etc/timezone:ro
    networks:
      - proxy
      - internal
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:8080/login']
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s
    deploy:
      resources:
        limits:
          memory: 1536M                  # headroom above -Xmx1g for metaspace, threads, and GC
</code></pre>
<p><strong>Why <code>user: root</code>?</strong> It's the simplest way to share <code>docker.sock</code> and the project bind-mount without UID/GID gymnastics. If you prefer an unprivileged user, you'll need to give the container the socket's group through <code>group_add</code> (by numeric GID) and align UIDs and permissions on the host folders. That's possible, but out of scope here.</p>
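<p>If you do take the unprivileged route, one piece of it is the numeric group ID that owns the Docker socket on the host. A minimal sketch, assuming GNU or BusyBox <code>stat</code> (for the <code>-c</code> format flag); both function names are hypothetical:</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Sketch for the unprivileged route (hypothetical helpers): find the numeric
# group that owns the Docker socket and print a compose group_add fragment.
# Assumes GNU or BusyBox stat, which support the -c format flag.
sock_gid() {
  stat -c '%g' "${1:-/var/run/docker.sock}"
}

print_group_add() {
  local gid
  gid="$(sock_gid "${1:-/var/run/docker.sock}")" || return 1
  printf 'group_add:\n  - "%s"\n' "$gid"
}
</code></pre>
<p>Run <code>print_group_add</code> on the host and paste its output under the <code>jenkins</code> service; the UID and folder-permission changes mentioned above are still needed on top of it.</p>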
<h2 id="heading-6-expose-jenkins-on-a-domain-via-traefik">6. Expose Jenkins on a Domain via Traefik</h2>
<p>This is the section many guides skip. We'll add <strong>labels</strong> to the Jenkins service so Traefik picks it up automatically; the Traefik config itself doesn't change.</p>
<pre><code class="language-yaml">  jenkins:
    # ... everything above ...
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"

      # 1) Router — match incoming Host
      - "traefik.http.routers.jenkins.rule=Host(`jenkins.example.com`)"
      - "traefik.http.routers.jenkins.entrypoints=websecure"
      - "traefik.http.routers.jenkins.tls.certresolver=letsencrypt"
      - "traefik.http.routers.jenkins.service=jenkins"

      # 2) Service — tell Traefik which container port is the app
      - "traefik.http.services.jenkins.loadbalancer.server.port=8080"

      # 3) Middleware — Jenkins needs X-Forwarded-Proto so it knows it's behind HTTPS
      - "traefik.http.middlewares.jenkins-headers.headers.customrequestheaders.X-Forwarded-Proto=https"
      - "traefik.http.routers.jenkins.middlewares=jenkins-headers"
</code></pre>
<p>What each line does:</p>
<table>
<thead>
<tr>
<th>Label</th>
<th>Purpose</th>
</tr>
</thead>
<tbody><tr>
<td><code>traefik.enable=true</code></td>
<td>Opts this container in (we set <code>exposedByDefault=false</code>).</td>
</tr>
<tr>
<td><code>traefik.docker.network=proxy</code></td>
<td>Tells Traefik which network to talk to Jenkins on (Jenkins is on both <code>proxy</code> and <code>internal</code>).</td>
</tr>
<tr>
<td><code>routers.jenkins.rule=Host(...)</code></td>
<td>Forwards only this hostname to Jenkins.</td>
</tr>
<tr>
<td><code>routers.jenkins.entrypoints=websecure</code></td>
<td>Listens only on 443. (HTTP redirect was set up in section 4.)</td>
</tr>
<tr>
<td><code>routers.jenkins.tls.certresolver=letsencrypt</code></td>
<td>Auto-issues + renews the cert.</td>
</tr>
<tr>
<td><code>services.jenkins.loadbalancer.server.port=8080</code></td>
<td>Jenkins listens on 8080 inside the container.</td>
</tr>
<tr>
<td><code>customrequestheaders.X-Forwarded-Proto=https</code></td>
<td>Without this, Jenkins generates <code>http://</code> URLs in webhooks/links and breaks.</td>
</tr>
</tbody></table>
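<p>The router/service part of this pattern is the same for any container you put behind Traefik; only the name, host, and port change. A hypothetical helper that prints those labels for one service could look like this. It assumes the <code>proxy</code> network and <code>letsencrypt</code> resolver from section 4, and it leaves out the Jenkins-specific <code>X-Forwarded-Proto</code> middleware.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Hypothetical helper: print the Traefik router/service labels for one container.
# Args: router+service name, public hostname, port the app listens on in the container.
# Assumes the "proxy" network and "letsencrypt" resolver from section 4.
traefik_labels() {
  local name="$1" host="$2" port="$3"
  printf '%s\n' \
    "traefik.enable=true" \
    "traefik.docker.network=proxy" \
    "traefik.http.routers.${name}.rule=Host(\`${host}\`)" \
    "traefik.http.routers.${name}.entrypoints=websecure" \
    "traefik.http.routers.${name}.tls.certresolver=letsencrypt" \
    "traefik.http.routers.${name}.service=${name}" \
    "traefik.http.services.${name}.loadbalancer.server.port=${port}"
}
</code></pre>
<p>Running <code>traefik_labels jenkins jenkins.example.com 8080</code> reproduces the router and service labels above; paste each line as a quoted item under the service's <code>labels:</code> key.</p>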
<p>Bring Jenkins up:</p>
<pre><code class="language-bash">cd /home/developer/projects/projects-prod-configs
docker compose -f docker-compose.staging.yml up -d jenkins

# Watch Traefik issue the certificate
docker logs -f projects-traefik-staging | grep -i acme
</code></pre>
<p>After 10–60 seconds you should be able to open <code>https://jenkins.example.com</code> and see Jenkins's setup wizard with a valid lock icon.</p>
<p>Inside Jenkins (after first login):</p>
<p>Manage Jenkins → System → Jenkins URL → set this to: <a href="https://jenkins.example.com/">https://jenkins.example.com/</a></p>
<p>This is important because Jenkins uses this base URL to generate:</p>
<ul>
<li><p>Webhook endpoints (for GitHub triggers)</p>
</li>
<li><p>Links inside emails and build logs</p>
</li>
</ul>
<p>If this isn't set correctly, GitHub webhooks may fail, and any links Jenkins generates will point to the wrong address (often localhost or internal IPs).</p>
<h2 id="heading-7-first-time-jenkins-setup">7. First-Time Jenkins Setup</h2>
<p>If you're running Jenkins for the first time on this server, follow this section to complete the initial setup.</p>
<p>If you already have Jenkins configured, you can skip this section — but make sure the required plugins and settings match what we use later in this guide.</p>
<ol>
<li><p>Open <code>https://jenkins.example.com</code>. Get the initial admin password:</p>
<pre><code class="language-bash">docker exec projects-jenkins-staging cat /var/jenkins_home/secrets/initialAdminPassword
</code></pre>
</li>
<li><p>Paste it, choose Install suggested plugins.</p>
</li>
<li><p>Create your admin user.</p>
</li>
<li><p>Manage Jenkins → Plugins → Available and install:</p>
<ul>
<li><p>GitHub (and GitHub Branch Source)</p>
</li>
<li><p>Pipeline: GitHub</p>
</li>
<li><p>Credentials Binding (usually preinstalled)</p>
</li>
</ul>
</li>
</ol>
<p>That's all the plugins you need for the rest of this guide.</p>
<h2 id="heading-8-add-the-github-credential">8. Add the GitHub Credential</h2>
<p>Jenkins needs permission to access your GitHub repository.</p>
<p>This is done using a GitHub Personal Access Token (PAT), which acts like a password for secure API and Git operations.</p>
<p>We’ll store this token inside Jenkins as a credential so it can pull code during pipeline execution and authenticate securely without exposing secrets in code.</p>
<p>This single credential is used both for the SCM checkout and for the deploy-time <code>git pull</code>.</p>
<ol>
<li><p>Create a Personal Access Token (classic) on GitHub with <code>repo</code> scope.</p>
</li>
<li><p>In Jenkins: Manage Jenkins → Credentials → System → Global → Add Credentials.</p>
</li>
<li><p>Fill in:</p>
<ul>
<li><p>Kind: Username with password</p>
</li>
<li><p>Username: your GitHub username</p>
</li>
<li><p>Password: the token</p>
</li>
<li><p><strong>ID:</strong> <code>github_classic_token</code> <em>(the Jenkinsfile references this exact ID)</em></p>
</li>
</ul>
</li>
</ol>
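<p>In section 10, the Deploy stage hands this credential to <code>git pull</code> through an inline credential helper. Git calls a helper with an action argument (<code>get</code>, <code>store</code>, or <code>erase</code>) and, for <code>get</code>, reads <code>key=value</code> lines from its output. To see exactly what git receives, you can run an equivalent function by hand; the name <code>cred_helper</code> and the dummy values are illustrative only.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Illustration (hypothetical name): the same body as the Deploy stage's inline helper.
# git passes an action ("get", "store" or "erase"); this helper ignores it and
# always prints the key=value lines git parses on "get".
# GIT_USER / GIT_TOKEN stand in for the variables withCredentials exports.
cred_helper() {
  echo "username=${GIT_USER}"
  echo "password=${GIT_TOKEN}"
}
</code></pre>
<p>For example, <code>GIT_USER=octocat GIT_TOKEN=dummy-token cred_helper get</code> prints the two lines git would read; nothing is written to disk.</p>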
<h2 id="heading-9-create-the-pipeline-job">9. Create the Pipeline Job</h2>
<p>Now that Jenkins has access to your repository, the next step is to define how deployments should run.</p>
<p>A pipeline job tells Jenkins:</p>
<ul>
<li><p>where your code lives,</p>
</li>
<li><p>which branch to monitor,</p>
</li>
<li><p>and how to execute your deployment process.</p>
</li>
</ul>
<p>In Jenkins, create a new Pipeline job and connect it to your GitHub repository. Once this is set up, Jenkins will automatically trigger deployments whenever you push to the <code>staging</code> branch.</p>
<p>Start by creating a new job:</p>
<p>New Item → Pipeline → name it <code>projects-staging</code> → OK</p>
<p>Then configure the job:</p>
<ul>
<li><p>Under <strong>Build Triggers</strong>, enable:<br><strong>GitHub hook trigger for GITScm polling</strong></p>
</li>
<li><p>Under <strong>Pipeline</strong>:</p>
<ul>
<li><p>Definition: Pipeline script from SCM</p>
</li>
<li><p>SCM: Git</p>
</li>
<li><p>Repository URL: <code>https://github.com/&lt;org&gt;/projects-backend.git</code></p>
</li>
<li><p>Credentials: <code>github_classic_token</code></p>
</li>
<li><p>Branch: <code>*/staging</code></p>
</li>
<li><p>Script Path: <code>Jenkinsfile</code></p>
</li>
</ul>
</li>
</ul>
<p>Save the configuration.</p>
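<p>Finally, make sure GitHub actually notifies Jenkins, because the hook trigger only fires when a webhook arrives. In the repository's <strong>Settings → Webhooks</strong>, add a webhook with the payload URL <code>https://jenkins.example.com/github-webhook/</code>, content type <code>application/json</code>, and just the <strong>push</strong> event.</p>
<p>At this point, Jenkins is fully connected to your repository and ready to run your deployment pipeline automatically.</p>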
<h2 id="heading-10-the-jenkinsfile-deploy-only-what-changed">10. The Jenkinsfile (Deploy Only What Changed)</h2>
<p>Place this at the root of the <strong>app</strong> repo (<code>projects-backend/Jenkinsfile</code>), branch <code>staging</code>.</p>
<pre><code class="language-groovy">pipeline {
  agent any

  environment {
    PROJECT_PATH = "/projects/projects-prod-configs/projects-backend"
    COMPOSE_FILE = "docker-compose.staging.yml"
  }

  stages {

    stage('Checkout') {
      steps {
        checkout scm
        echo "Checkout completed for branch: ${env.BRANCH_NAME ?: 'staging'}"
      }
    }

    stage('Detect Changes') {
      steps {
        script {
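          // Diff against the last successful build when the Git plugin exposes it,
          // so pushes with several commits are fully covered; otherwise use HEAD~1.
          def base = env.GIT_PREVIOUS_SUCCESSFUL_COMMIT ?: 'HEAD~1'
          def changedFiles = sh(
            script: "git diff --name-only ${base} HEAD",
            returnStdout: true
          ).trim()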

          echo "Changed files:\n${changedFiles}"

          def services = [] as Set
          changedFiles.split('\n').each { file -&gt;
            def svc  = file =~ /^apps\/services\/([a-z0-9-]+)\//
            def gw   = file =~ /^apps\/gateways\/([a-z0-9-]+)\//
            def core = file =~ /^apps\/core\/([a-z0-9-]+)\//
            if (svc)  { services &lt;&lt; svc[0][1]  }
            if (gw)   { services &lt;&lt; gw[0][1]   }
            if (core) { services &lt;&lt; core[0][1] }
          }
          services = services.findAll { !it.endsWith('-e2e') }
          env.CHANGED_SERVICES = services.join(' ')

          echo "Services to deploy: ${env.CHANGED_SERVICES ?: '(none)'}"
        }
      }
    }

    stage('Deploy') {
      when { expression { return env.CHANGED_SERVICES?.trim() } }
      steps {
        withCredentials([usernamePassword(
          credentialsId: 'github_classic_token',
          usernameVariable: 'GIT_USER',
          passwordVariable: 'GIT_TOKEN'
        )]) {
          sh '''
            set -eu
            git config --global --add safe.directory "${PROJECT_PATH}"
            cd "${PROJECT_PATH}"
            git remote set-url origin "https://github.com/&lt;org&gt;/projects-backend.git"
            # Single-quoted so the helper's own shell expands the credentials when git calls it
            git -c credential.helper= \
                -c 'credential.helper=!f() { echo "username=${GIT_USER}"; echo "password=${GIT_TOKEN}"; }; f' \
                pull origin staging
            docker compose -f "${COMPOSE_FILE}" up -d --build ${CHANGED_SERVICES}
          '''
        }
        echo "Deployed: ${env.CHANGED_SERVICES}"
      }
    }

    stage('Skip Deployment') {
      when { expression { return !env.CHANGED_SERVICES?.trim() } }
      steps { echo "No service changes detected — nothing to deploy." }
    }
  }
}
</code></pre>
<p>Why each tricky line is there:</p>
<ul>
<li><p><code>git config --global --add safe.directory ...</code>: git refuses to operate on a repo whose owner UID differs from the current user's. The repo on disk is owned by <code>developer</code>, but Git inside the container runs as <code>root</code>. This marks the path as trusted.</p>
</li>
<li><p><code>git remote set-url origin "https://..."</code>: flips the on-disk remote to HTTPS so the <strong>token can be used</strong>. (A PAT can't authenticate <code>git@github.com:</code> URLs, which use SSH.) It's idempotent, so it's safe to re-run.</p>
</li>
<li><p><code>git -c credential.helper='!f() { echo "username=..."; echo "password=..."; }; f'</code>: feeds the username/token to git for that one command without writing the token to disk. The preceding empty <code>credential.helper=</code> clears any globally configured helpers first. Because the helper string is single-quoted, <code>${GIT_USER}</code> and <code>${GIT_TOKEN}</code> are expanded by the helper's own shell when git invokes it, so the token never appears in git's command-line arguments.</p>
</li>
<li><p><code>${CHANGED_SERVICES}</code> is unquoted on purpose so multiple service names expand as separate args.</p>
</li>
</ul>
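<p>To check the path-to-service mapping before pushing, the same rules can be sketched as a shell function that you feed with <code>git diff --name-only</code> output. This is a hypothetical local helper, not part of the repo; like the Jenkinsfile, it assumes each directory name under <code>apps/services</code>, <code>apps/gateways</code>, or <code>apps/core</code> matches a compose service name.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Hypothetical local helper mirroring the Jenkinsfile's "Detect Changes" stage:
# reads changed file paths on stdin and prints the service names to deploy.
changed_services() {
  # Keep only apps/{services,gateways,core}/NAME/... paths and extract NAME.
  sed -nE 's#^apps/(services|gateways|core)/([a-z0-9-]+)/.*#\2#p' |
    grep -v -- '-e2e$' |   # skip end-to-end test packages
    sort -u |              # de-duplicate (the Jenkinsfile uses a Set)
    tr '\n' ' ' |          # one space-separated line, like CHANGED_SERVICES
    sed 's/ $//'           # trim the trailing space left by tr
}
</code></pre>
<p>For example, <code>git diff --name-only HEAD~1 HEAD | changed_services</code>. Unlike the Jenkinsfile's <code>Set</code>, it prints names sorted rather than in first-seen order, which doesn't matter to <code>docker compose up</code>.</p>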
<h2 id="heading-11-end-to-end-test">11. End-to-End Test</h2>
<p>Before considering the setup complete, we need to verify that the entire pipeline works as expected.</p>
<p>This end-to-end test ensures that:</p>
<ul>
<li><p>GitHub webhooks are triggering Jenkins correctly,</p>
</li>
<li><p>Jenkins can detect which services changed,</p>
</li>
<li><p>and only the affected services are rebuilt and deployed.</p>
</li>
</ul>
<p>In other words, this simulates a real production deployment.</p>
<p>Start by making a small change in your repository. For example, modify a file inside:</p>
<p><code>apps/gateways/student-apigw/</code></p>
<p>Then push the change to the <code>staging</code> branch.</p>
<p>Once pushed, Jenkins should automatically trigger via the webhook. If not, you can manually click <strong>Build Now</strong>.</p>
<p>Now open the build’s <strong>Console Output</strong> and verify the flow. You should see something like:</p>
<ul>
<li><p>Checkout completed for branch: staging</p>
</li>
<li><p>Services to deploy: student-apigw</p>
</li>
<li><p>git pull origin staging (successful)</p>
</li>
<li><p>docker compose ... up -d --build student-apigw</p>
</li>
<li><p>Deployed: student-apigw</p>
</li>
</ul>
<p>If you see this sequence, your pipeline is working correctly.</p>
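<p>If you'd rather not eyeball the log every time, the same check can be scripted. A minimal sketch, assuming you fetch the plain-text log from Jenkins's <code>consoleText</code> endpoint with an API token; the function name is hypothetical, and the markers are the ones the Jenkinsfile echoes.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Hypothetical check: read a build's console text on stdin and confirm the
# markers the Jenkinsfile echoes for one expected service.
verify_deploy_log() {
  local svc="$1" log
  log="$(cat)"
  printf '%s\n' "$log" | grep -qF "Services to deploy: ${svc}" || return 1
  printf '%s\n' "$log" | grep -qF "Deployed: ${svc}" || return 1
}
</code></pre>
<p>For example: <code>curl -s -u "user:api-token" https://jenkins.example.com/job/projects-staging/lastBuild/consoleText | verify_deploy_log student-apigw</code> exits non-zero if either marker is missing.</p>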
<p>If anything fails, don’t worry — jump to Section 12 where every common issue and its fix is documented.</p>
<h2 id="heading-12-troubleshooting-every-error-we-hit">12. Troubleshooting — Every Error We Hit</h2>
<p>This section covers real issues we faced while setting up this pipeline — and more importantly, <em>why each fix works</em>. Understanding the “why” will help you debug similar problems in your own setup.</p>
<h3 id="heading-cd-cant-cd-to-projectsprojects-prod-configsprojects-backend">cd: can't cd to /projects/projects-prod-configs/projects-backend</h3>
<p><strong>Cause:</strong><br>The Jenkinsfile runs <code>cd $PROJECT_PATH</code>, but inside the container that path doesn’t exist. This usually happens when:</p>
<ul>
<li><p>the project wasn’t cloned on the host, or</p>
</li>
<li><p>the bind mount isn’t configured correctly.</p>
</li>
</ul>
<p><strong>Fix:</strong></p>
<pre><code class="language-bash">ls /home/developer/projects/projects-prod-configs/projects-backend
# If missing: git clone -b staging &lt;url&gt; there.
</code></pre>
<p>Confirm the bind mount:</p>
<pre><code class="language-bash">docker inspect projects-jenkins-staging --format '{{range .Mounts}}{{.Source}} -&gt; {{.Destination}}{{println}}{{end}}'
</code></pre>
<p>If missing, recreate the container:</p>
<pre><code class="language-bash">docker compose -f docker-compose.staging.yml up -d --force-recreate jenkins
</code></pre>
<p><strong>Why this works:</strong></p>
<p>Jenkins runs inside a container, but your code lives on the host. The bind mount connects them. Without it, Jenkins cannot access your project directory.</p>
<h3 id="heading-fatal-detected-dubious-ownership-in-repository">fatal: detected dubious ownership in repository</h3>
<p><strong>Cause:</strong><br>Git blocks access when the repository owner differs from the current user.</p>
<ul>
<li><p>Repo owner: <code>developer</code> (host)</p>
</li>
<li><p>Git runs as: <code>root</code> (inside container)</p>
</li>
</ul>
<p><strong>Fix:</strong></p>
<pre><code class="language-bash">git config --global --add safe.directory "${PROJECT_PATH}"
</code></pre>
<p><strong>Why this works:</strong></p>
<p>This explicitly tells Git that the directory is trusted, bypassing ownership mismatch security restrictions.</p>
<h3 id="heading-host-key-verification-failed-could-not-read-from-remote-repository"><code>Host key verification failed</code> / <code>Could not read from remote repository</code></h3>
<h4 id="heading-cause">Cause:</h4>
<p>The repository uses SSH (<code>git@github.com:...</code>), but:</p>
<ul>
<li><p>the container has no SSH keys</p>
</li>
<li><p>no known_hosts file exists</p>
</li>
</ul>
<p>Also, GitHub tokens cannot authenticate over SSH.</p>
<p><strong>Fix (recommended):</strong></p>
<pre><code class="language-bash">git remote set-url origin "https://github.com/&lt;org&gt;/projects-backend.git"
</code></pre>
<p><strong>Why this works:</strong></p>
<p>HTTPS uses token-based authentication (PAT), which works inside containers without SSH configuration.</p>
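<p>If you'd rather derive the HTTPS URL from whatever remote is already configured than hard-code it, a small conversion is enough. A sketch with a hypothetical function name, handling the two common GitHub SSH forms and passing anything else through unchanged:</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Hypothetical helper: turn an SSH-style GitHub remote into the HTTPS form a PAT
# can authenticate. Anything that is not a GitHub SSH URL is printed unchanged.
to_https_remote() {
  local url="$1"
  case "$url" in
    git@github.com:*)
      printf 'https://github.com/%s\n' "${url#git@github.com:}"
      ;;
    ssh://git@github.com/*)
      printf 'https://github.com/%s\n' "${url#ssh://git@github.com/}"
      ;;
    *)
      printf '%s\n' "$url"
      ;;
  esac
}
</code></pre>
<p>Usage: <code>git remote set-url origin "$(to_https_remote "$(git remote get-url origin)")"</code>.</p>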
<h3 id="heading-unknown-shorthand-flag-f-in-f-docker-compose"><code>unknown shorthand flag: 'f' in -f</code> (<code>docker compose</code>)</h3>
<p><strong>Cause:</strong><br>The Docker CLI exists, but the Docker Compose plugin is missing inside the container.</p>
<p><strong>Fix:</strong></p>
<pre><code class="language-yaml">volumes:
  - /usr/libexec/docker/cli-plugins:/usr/libexec/docker/cli-plugins:ro
</code></pre>
<p>If the plugin lives somewhere else on your host, find it with:</p>
<pre><code class="language-bash">find /usr -name docker-compose -type f 2&gt;/dev/null
</code></pre>
<p>Verify:</p>
<pre><code class="language-bash">docker exec projects-jenkins-staging docker compose version
</code></pre>
<p><strong>Why this works:</strong></p>
<p>Docker Compose v2 is a CLI plugin. Mounting this directory makes the <code>docker compose</code> command available inside the container.</p>
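<p>To decide which host directory to mount, and later to confirm the mount worked, it helps to check the directories the Docker CLI searches for plugins. A minimal sketch: <code>find_compose_plugin</code> is a hypothetical helper, and its default list mirrors the Linux plugin directories named in Docker's manual-install instructions; treat that list as an assumption and adjust it for your distribution.</p>
<pre><code class="language-bash"># Sketch: print the first executable docker-compose CLI plugin found.
# With no arguments it checks a default list of Linux plugin directories
# (an assumption based on Docker's manual-install docs); pass directories
# explicitly to check others.
find_compose_plugin() {
  local dirs=("$@")
  if [ "${#dirs[@]}" -eq 0 ]; then
    dirs=(
      "${DOCKER_CONFIG:-$HOME/.docker}/cli-plugins"
      /usr/local/lib/docker/cli-plugins
      /usr/local/libexec/docker/cli-plugins
      /usr/lib/docker/cli-plugins
      /usr/libexec/docker/cli-plugins
    )
  fi
  local d
  for d in "${dirs[@]}"; do
    if [ -x "$d/docker-compose" ]; then
      echo "$d/docker-compose"
      return 0
    fi
  done
  echo "no docker-compose plugin in: ${dirs[*]}"
  return 1
}
</code></pre>
<p>Run it on the host to find the directory to bind-mount, then paste it into a shell inside the container (<code>docker exec -it projects-jenkins-staging bash</code>) to confirm the plugin is visible there as well.</p>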
<h3 id="heading-wrong-timezone-in-build-timestamps-and-jenkins-ui">Wrong timezone in build timestamps and Jenkins UI</h3>
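<p><strong>Fix:</strong> Set the timezone in three places: the <code>TZ</code> environment variable, the JVM's <code>-Duser.timezone</code> flag, and bind mounts of the host's clock files (<code>/etc/timezone</code> is a Debian/Ubuntu file; drop that line if your host doesn't have it):</p>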
<pre><code class="language-yaml">environment:
  - TZ=Asia/Dhaka
  - JAVA_OPTS=... -Duser.timezone=Asia/Dhaka
volumes:
  - /etc/localtime:/etc/localtime:ro
  - /etc/timezone:/etc/timezone:ro
</code></pre>
<p>You <strong>must</strong> recreate the container for env-var changes to take effect:</p>
<pre><code class="language-bash">docker compose -f docker-compose.staging.yml up -d --force-recreate jenkins
</code></pre>
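<p><strong>Why this works:</strong><br>Jenkins runs on the JVM, which determines its default timezone on its own at startup and doesn't always agree with the OS; <code>-Duser.timezone</code> pins it explicitly.<br>Aligning the OS timezone, the JVM timezone, and the host clock keeps timestamps consistent in logs, build history, and the UI.</p>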
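<p>After recreating the container, you can confirm that the two timezone settings agree. A minimal sketch, meant to run from a shell inside the Jenkins container so it sees the container's <code>TZ</code> and <code>JAVA_OPTS</code>; <code>check_tz_alignment</code> is a hypothetical helper and only compares configuration, not the clock itself:</p>
<pre><code class="language-bash"># Sketch: check that the OS-level TZ and the JVM's -Duser.timezone flag agree.
# Reads the TZ and JAVA_OPTS environment variables set in the compose file.
check_tz_alignment() {
  local tz="${TZ:-}"
  local jvm_tz
  # Pull the value out of "-Duser.timezone=Area/City"; empty if the flag is absent.
  jvm_tz=$(printf '%s\n' "${JAVA_OPTS:-}" | grep -o -- '-Duser\.timezone=[^ ]*' | cut -d= -f2)

  if [ -z "$tz" ] || [ -z "$jvm_tz" ]; then
    echo "missing: TZ='$tz' user.timezone='$jvm_tz'"
    return 1
  fi
  if [ "$tz" != "$jvm_tz" ]; then
    echo "mismatch: TZ=$tz but user.timezone=$jvm_tz"
    return 1
  fi
  echo "aligned: $tz"
}
</code></pre>
<p>It doesn't check the bind-mounted <code>/etc/localtime</code>; running <code>date</code> in the same shell and comparing it with the host is a quick complement.</p>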
<h3 id="heading-errsockettimeout-pnpm-install-fails">ERR_SOCKET_TIMEOUT (pnpm install fails)</h3>
<h4 id="heading-cause">Cause:</h4>
<p>When several services build in parallel and each runs <code>pnpm install</code> for ~1,500 packages, the concurrent downloads saturate the network and individual requests time out.</p>
<h4 id="heading-fixes">Fixes:</h4>
<p>a) Increase timeout + control concurrency</p>
<pre><code class="language-dockerfile">RUN pnpm install --frozen-lockfile --ignore-scripts \
    --network-timeout 600000 \
    --network-concurrency 8
</code></pre>
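<p>Why: gives each request more time and limits how many downloads run at once. Note that pnpm documents its timeout setting as <code>fetch-timeout</code> (default 60000 ms), while <code>--network-timeout</code> is Yarn's spelling, so check that your pnpm version honors the flag, or set <code>fetch-timeout=600000</code> in <code>.npmrc</code> instead.</p>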
<p>b) Enable pnpm cache (BuildKit)</p>
<pre><code class="language-dockerfile">RUN --mount=type=cache,id=pnpm-store,target=/root/.local/share/pnpm/store \
    pnpm install --frozen-lockfile --ignore-scripts
</code></pre>
<p>Why: the pnpm store lives in a BuildKit cache mount that persists between builds, so packages fetched once are reused instead of being downloaded every time.</p>
<p>c) Avoid unnecessary rebuilds</p>
<pre><code class="language-bash">docker compose -f $COMPOSE_FILE build $CHANGED_SERVICES
docker compose -f $COMPOSE_FILE up -d --no-build $CHANGED_SERVICES
</code></pre>
<p>Why: only the services that changed are rebuilt, so fewer installs run in parallel, the network carries less load, and timeouts become rarer.</p>
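<p>The <code>$CHANGED_SERVICES</code> list has to come from somewhere. Here is a minimal sketch of one way to build it from changed file paths, assuming a hypothetical layout where each Compose service lives in a top-level directory with the same name as the service; <code>changed_services</code> is a hypothetical helper, not the pipeline's actual Detect Changes logic:</p>
<pre><code class="language-bash"># Sketch: turn changed file paths (one per line on stdin) into a
# space-separated, de-duplicated list of Compose services to rebuild.
# Assumption (hypothetical layout): service "foo" lives in top-level dir foo/.
# Known service names are passed as arguments.
changed_services() {
  local known=" $* "
  local path dir
  while IFS= read -r path; do
    dir="${path%%/*}"
    [ -n "$dir" ] || continue
    # Keep only top-level directories that are real service names.
    case "$known" in
      *" $dir "*) echo "$dir" ;;
    esac
  done | sort -u | tr '\n' ' ' | sed 's/ $//'
}

# Example wiring inside a pipeline sh step:
# CHANGED_SERVICES=$(git diff --name-only HEAD~1 HEAD | changed_services $(docker compose -f "$COMPOSE_FILE" config --services))
# if [ -z "$CHANGED_SERVICES" ]; then echo "Nothing to rebuild"; exit 0; fi
</code></pre>
<p>The empty-result guard matters: <code>docker compose build</code> with no service names builds every service, which is exactly the load this fix is meant to avoid.</p>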
<h3 id="heading-container-changes-dont-apply-after-editing-docker-composeyml">Container changes don’t apply after editing docker-compose.yml</h3>
<h4 id="heading-cause">Cause:</h4>
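<p><code>docker compose up -d</code> only recreates a running container when it detects that the service's configuration or image has changed; otherwise it leaves the container as it is, so some edits don't show up until the container is recreated.</p>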
<h4 id="heading-fix">Fix:</h4>
<pre><code class="language-bash">docker compose -f docker-compose.staging.yml up -d --force-recreate jenkins
</code></pre>
<p><strong>Why this works:</strong></p>
<p>This forces Docker to recreate the container with updated configuration (env, volumes, labels).</p>
<h3 id="heading-traefik-shows-default-certificate-no-https">Traefik shows default certificate (no HTTPS)</h3>
<h4 id="heading-common-causes">Common causes:</h4>
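<ul>
<li><p>DNS for the hostname doesn't point to this server</p>
</li>
<li><p>Port 80 is blocked, so the HTTP-01 challenge can't reach Traefik</p>
</li>
<li><p>The Jenkins container is on the wrong Docker network (not the one Traefik uses)</p>
</li>
</ul>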
<h4 id="heading-check">Check:</h4>
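<pre><code class="language-bash"># Does the hostname resolve to this server's public IP?
dig +short jenkins.example.com

# Did Traefik's ACME (Let's Encrypt) client log any errors?
docker logs projects-traefik-staging 2&gt;&amp;1 | grep -i acme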
</code></pre>
<p><strong>Why this works:</strong></p>
<p>With the HTTP-01 challenge, Let’s Encrypt must reach your server on port 80 at the hostname you're requesting a certificate for. If DNS or networking is wrong, issuance fails and Traefik keeps serving its self-signed default certificate.</p>
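<p>Before digging through Traefik logs, it's worth confirming the DNS cause programmatically. A minimal sketch: <code>dns_points_to</code> is a hypothetical helper that reads <code>dig +short</code> output on stdin, and <code>SERVER_IP</code> is a placeholder for your server's public IPv4 address:</p>
<pre><code class="language-bash"># Sketch: confirm the DNS answer contains this server's public IP before
# debugging Traefik. Reads `dig +short` output on stdin; only a line that is
# exactly the expected IP counts, so CNAME lines in the answer are ignored.
dns_points_to() {
  local expected_ip="$1"
  if grep -Fxq -- "$expected_ip"; then
    echo "ok: DNS answer includes $expected_ip"
    return 0
  fi
  echo "mismatch: DNS answer does not include $expected_ip"
  return 1
}

# SERVER_IP is a placeholder for your server's public IPv4 address:
# dig +short jenkins.example.com | dns_points_to "$SERVER_IP"
</code></pre>
<p>This covers DNS only; whether port 80 is reachable still needs a separate test from outside the server.</p>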
<h3 id="heading-jenkins-reverse-proxy-setup-is-broken">Jenkins: "Reverse proxy setup is broken"</h3>
<h4 id="heading-fix">Fix:</h4>
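<p>In <strong>Manage Jenkins → System</strong> (<strong>Configure System</strong> in older versions), set <strong>Jenkins URL</strong> to <a href="https://jenkins.example.com/">https://jenkins.example.com/</a>.<br>Make sure the reverse proxy forwards this header to Jenkins:</p>
<pre><code class="language-plaintext">X-Forwarded-Proto: https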
</code></pre>
<p><strong>Why this works:</strong></p>
<p>Jenkins needs to know it's behind HTTPS. Without this, it generates incorrect URLs (http instead of https), breaking redirects and webhooks.</p>
<h2 id="heading-13-mental-model-host-vs-container">13. Mental Model: Host vs. Container</h2>
<p>Many setup mistakes come from confusing the <strong>host</strong> filesystem with the <strong>container</strong> filesystem. This table makes it explicit:</p>
<table>
<thead>
<tr>
<th>Inside the Jenkins container</th>
<th>Source on the host</th>
</tr>
</thead>
<tbody><tr>
<td><code>/var/jenkins_home</code></td>
<td>Docker volume <code>jenkins-data</code> (Jenkins config, jobs, secrets)</td>
</tr>
<tr>
<td><code>/projects/...</code></td>
<td><code>/home/developer/projects/...</code> (your project tree)</td>
</tr>
<tr>
<td><code>/usr/bin/docker</code></td>
<td>host's <code>/usr/bin/docker</code></td>
</tr>
<tr>
<td><code>/usr/libexec/docker/cli-plugins/docker-compose</code></td>
<td>host plugin (lets <code>docker compose</code> work)</td>
</tr>
<tr>
<td><code>/var/run/docker.sock</code></td>
<td>host Docker daemon (so builds happen on the host's engine)</td>
</tr>
<tr>
<td><code>/etc/localtime</code>, <code>/etc/timezone</code></td>
<td>host clock</td>
</tr>
<tr>
<td><code>~/.ssh</code></td>
<td><strong>nothing</strong> — that's why SSH-to-GitHub doesn't work without extra setup</td>
</tr>
</tbody></table>
<p>When debugging, always ask: <em>"Inside which filesystem is this command running, and does the file/folder it's looking for exist there?"</em></p>
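<p>The <code>/projects</code> row is easy to get wrong because the same repository has two different paths. A minimal sketch of a translator for that one mapping; <code>container_to_host</code>, <code>HOST_PROJECTS</code>, and <code>CONTAINER_PROJECTS</code> are hypothetical names, and the defaults come from the table above, so adjust them to match your compose file:</p>
<pre><code class="language-bash"># Sketch: translate a path seen inside the Jenkins container into the host
# path it is bind-mounted from, following the /projects row of the table.
# Both roots are assumptions; override them to match your compose file.
HOST_PROJECTS="${HOST_PROJECTS:-/home/developer/projects}"
CONTAINER_PROJECTS="${CONTAINER_PROJECTS:-/projects}"

container_to_host() {
  local path="$1"
  case "$path" in
    "$CONTAINER_PROJECTS"|"$CONTAINER_PROJECTS"/*)
      # Strip the container prefix literally and prepend the host root.
      echo "${HOST_PROJECTS}${path#"$CONTAINER_PROJECTS"}"
      ;;
    *)
      echo "outside $CONTAINER_PROJECTS: $path"
      return 1
      ;;
  esac
}
</code></pre>
<p>Anything outside <code>/projects</code> (for example <code>/var/jenkins_home</code>, which lives in a Docker volume) gets a non-zero exit, a reminder that the path has no counterpart in your project tree on the host.</p>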
<h2 id="heading-14-daily-operations-cheat-sheet">14. Daily Operations Cheat Sheet</h2>
<pre><code class="language-bash"># Recreate Jenkins after changing compose
cd /home/developer/Projects/projects-prod-configs
docker compose -f docker-compose.staging.yml up -d --force-recreate jenkins

# Tail Jenkins logs
docker logs -f projects-jenkins-staging

# Open a shell inside the Jenkins container
docker exec -it projects-jenkins-staging bash

# From inside the container — sanity checks
docker compose version
ls /projects/projects-prod-configs/projects-backend
git -C /projects/projects-prod-configs/projects-backend remote -v

# Manually trigger the same deploy the pipeline does
cd /projects/projects-prod-configs/projects-backend
git pull origin staging
docker compose -f docker-compose.staging.yml up -d --build student-apigw

# Inspect Traefik routing decisions
docker logs projects-traefik-staging 2&gt;&amp;1 | grep -i jenkins

# Check renewed certs (acme.json also stores private keys; don't share this output)
docker exec projects-traefik-staging cat /etc/traefik/acme.json | head -50
</code></pre>
<h2 id="heading-15-what-id-do-differently-next-time">15. What I'd Do Differently Next Time</h2>
<ul>
<li><p><strong>Pre-build a base image</strong> with all node_modules baked in. With ~1500 packages × 15 services, every clean build re-downloads ~22k tarballs. A shared base could cut that by roughly 90%.</p>
</li>
<li><p><strong>Run a private npm proxy</strong> (Verdaccio or Nexus) on the same Docker network, or use a hosted registry such as GitHub Packages, so most installs no longer depend on <code>registry.npmjs.org</code> and its occasional timeouts.</p>
</li>
<li><p><strong>Per-service Jenkinsfile</strong> if your services drift apart in tooling. With one Jenkinsfile, every team contends for the same pipeline definition.</p>
</li>
<li><p><strong>Replace</strong> <code>git diff HEAD~1 HEAD</code> with <code>git diff $(git merge-base HEAD origin/staging~1) HEAD</code> so squash-merges and force-pushes don't accidentally skip services.</p>
</li>
<li><p><strong>Move secrets to a vault</strong> (HashiCorp Vault / AWS Secrets Manager / Doppler). PATs in Jenkins work, but rotation across many jobs is painful.</p>
</li>
<li><p><strong>Use Jenkins' Configuration-as-Code (JCasC)</strong> so the entire Jenkins setup (jobs, credentials definitions, plugins) is in git. Then a server rebuild is a one-command operation.</p>
</li>
</ul>
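<p>Another option for the change-detection item above, sketched here as an assumption rather than the approach used in this pipeline, is to diff against the last commit Jenkins built successfully. That also covers pushes containing several commits, which <code>HEAD~1</code> misses. The Jenkins Git plugin exports <code>GIT_PREVIOUS_SUCCESSFUL_COMMIT</code> for this; <code>diff_base</code> is a hypothetical helper:</p>
<pre><code class="language-bash"># Sketch: pick the commit to diff against for change detection.
# Prefers GIT_PREVIOUS_SUCCESSFUL_COMMIT (exported by the Jenkins Git plugin)
# when it is still an ancestor of HEAD; otherwise falls back to HEAD~1,
# which is what the pipeline already uses.
diff_base() {
  local prev="${GIT_PREVIOUS_SUCCESSFUL_COMMIT:-}"
  if [ -n "$prev" ]; then
    # Fails if prev is unknown locally or was rewritten away by a force-push.
    if git merge-base --is-ancestor "$prev" HEAD; then
      echo "$prev"
      return 0
    fi
  fi
  git rev-parse HEAD~1
}

# git diff --name-only "$(diff_base)" HEAD
</code></pre>
<p>If the previous commit is unknown to the local clone, <code>git merge-base</code> prints an error before the helper falls back to <code>HEAD~1</code>; the fallback keeps the first build and post-force-push builds working.</p>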
<h2 id="heading-closing-thoughts">Closing Thoughts</h2>
<p>The pipeline itself is just three stages: <strong>Checkout → Detect Changes → Deploy</strong> — but a real production setup is mostly about <strong>plumbing</strong>: reverse proxy, certificates, bind-mounts, credentials, timezones, build caches. None of these are exotic. Together they decide whether your Friday-afternoon deploy goes silently green or eats your weekend.</p>
<p>Follow sections 1–11 to get a working pipeline. Bookmark section 12 to keep it working.</p>
<p>Happy shipping.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
