scaling - freeCodeCamp.org

How Large-Scale Platforms Handle Millions of Daily Transactions

Manish Shivanandhan — Sat, 13 Jun 2026 06:50:15 +0000

Every day, millions of people order food, stream videos, send messages, book rides, make payments, and shop online. Most of these actions take only a few seconds from the user's perspective. A user clicks a button, and the platform responds almost instantly.

Behind the scenes, however, these platforms are processing enormous numbers of transactions. A single popular application may handle thousands of requests every second and millions of transactions every day. Each transaction must be processed accurately, securely, and quickly.

In this article, we'll explore how large-scale platforms manage massive transaction volumes, the engineering challenges involved, and the architectural patterns developers use to build reliable systems.

What We'll Cover:

Why Transaction Volume Creates Unique Challenges
Breaking Monoliths Into Services
Using Load Balancers to Distribute Traffic
Why Databases Become Bottlenecks
Caching Frequently Accessed Data
Processing Tasks Asynchronously
Preventing Duplicate Transactions
Monitoring Everything
Preparing for Traffic Spikes
Building for Failure
The Importance of Consistency and Reliability
Conclusion

Why Transaction Volume Creates Unique Challenges

Handling a few hundred transactions per day is relatively straightforward. A single server and database can often manage the workload without difficulty. The challenge emerges as usage grows and systems begin serving thousands or even millions of users simultaneously.

Consider an online marketplace operating across multiple countries. At any given moment, thousands of users may be placing orders. Inventory must be updated in real time, payments must be processed accurately, notifications must be delivered, and fraud detection systems must evaluate transactions before approval. All of this happens within seconds.

At scale, even a minor delay can affect thousands of users. Systems must maintain low response times while preventing database bottlenecks, avoiding duplicate transactions, handling unexpected traffic spikes, and remaining reliable when failures occur.

To solve these problems, engineering teams rely on distributed systems and scalable architectural patterns.

Breaking Monoliths Into Services

Many successful platforms begin as monolithic applications where all functionality exists within a single codebase. While this approach works well during the early stages of growth, it can become increasingly difficult to scale as transaction volume increases.

To overcome this limitation, large platforms often adopt a service-oriented architecture. Instead of one application handling every responsibility, individual services are created for specific business functions such as user management, payments, inventory, notifications, and analytics.

A simplified order-processing workflow might look like this:

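def create_order(user_id, product_id):
    # Hold stock first so two buyers can't purchase the same last item.
    inventory.reserve(product_id)

    payment_result = payment.charge(user_id)

    if payment_result.success:
        order.create(user_id, product_id)
        notification.send_confirmation(user_id)
    else:
        # Assumed helper: release the hold so a failed payment doesn't strand stock.
        inventory.release(product_id)

    return payment_result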

This separation allows each service to scale independently. If payment activity suddenly increases, engineers can allocate additional resources specifically to the payment service without affecting the rest of the platform. It also lets teams develop, deploy, and maintain services independently, improving both agility and reliability.

Using Load Balancers to Distribute Traffic

As volume grows, a single server becomes both a capacity ceiling and a single point of failure. To distribute incoming requests efficiently, platforms place load balancers in front of their application servers.

Instead of connecting directly to a server, users send requests to a load balancer. The load balancer determines which server is best positioned to handle each request based on factors such as current load, availability, and health status.

A simplified architecture looks like this:

Users
   |
Load Balancer
   |
-------------------
|        |        |
Server1 Server2 Server3

If one server becomes overloaded or fails, traffic can be redirected to healthier servers. This improves both performance and availability. Modern cloud providers offer managed load-balancing solutions that automatically distribute traffic based on resource utilization and server health.
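To make the selection step concrete, here is a minimal sketch of one common strategy, "least connections among healthy servers." The `Server` class and `pick_server` function are hypothetical names for illustration, not the API of any particular load balancer.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True
    active_requests: int = 0


def pick_server(servers):
    """Return the healthy server with the fewest in-flight requests."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy servers available")
    return min(candidates, key=lambda s: s.active_requests)


servers = [
    Server("server1", active_requests=12),
    Server("server2", active_requests=3),
    Server("server3", healthy=False),  # failed its health check
]

chosen = pick_server(servers)
chosen.active_requests += 1  # the balancer tracks the new in-flight request
print(chosen.name)  # server2
```

Real load balancers also support strategies such as round robin or weighted routing. The point here is only that unhealthy servers are skipped and load influences the choice.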

Why Databases Become Bottlenecks

Scaling application servers is often relatively easy. But databases frequently become the most significant bottleneck in transaction-heavy systems.

Every transaction ultimately requires reading or writing data. Consider an online task management platform where users complete tasks and receive rewards. Each completed task may trigger multiple database operations, including verification of task completion, updating account balances, recording transaction history, and generating audit logs.

As transaction volume grows, database performance becomes critical. One common solution is read replication. Instead of relying on a single database instance, platforms create multiple replicas that handle read requests while the primary database focuses on write operations.

The architecture may resemble the following:

Primary DB
     |
-------------------------
|         |            |
Replica1 Replica2 Replica3

By distributing read traffic across multiple replicas, platforms reduce pressure on the primary database and improve response times for users.
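The routing decision itself can be sketched in a few lines. This is an illustrative toy, assuming connections are represented by plain strings and that any statement starting with `SELECT` is a read. The `ReplicatedDatabase` class is a hypothetical name.

```python
import itertools


class ReplicatedDatabase:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() yields replicas in round-robin order indefinitely.
        self._replicas = itertools.cycle(replicas)

    def connection_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


db = ReplicatedDatabase("primary", ["replica1", "replica2", "replica3"])

print(db.connection_for("SELECT * FROM orders"))             # replica1
print(db.connection_for("UPDATE accounts SET balance = 0"))  # primary
print(db.connection_for("SELECT * FROM users"))              # replica2
```

Replicas can lag slightly behind the primary, so reads that must see a just-completed write are usually sent to the primary instead.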

Caching Frequently Accessed Data

Not every request needs to reach the database. In fact, repeatedly querying the database for the same information can significantly increase infrastructure costs and response times.

To address this, platforms use caching systems such as Redis to store frequently accessed data in memory. Information such as user profiles, product details, and application settings often changes infrequently and can be retrieved directly from the cache.

Without caching:

user = database.get_user(user_id)

With caching:

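user = cache.get(user_id)

# Cache miss: fall back to the database, then populate the cache.
if user is None:
    user = database.get_user(user_id)
    # Assumed ttl argument (seconds): an expiry keeps stale entries from living forever.
    cache.set(user_id, user, ttl=300)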

Memory access is substantially faster than database queries. When a platform processes millions of requests every day, caching can dramatically improve performance while reducing backend load.

Processing Tasks Asynchronously

Users expect immediate responses. If every operation must finish before the system responds, applications quickly become sluggish under heavy load.

To improve responsiveness, large-scale systems separate critical user-facing actions from background processing tasks. Consider a payment transaction. The user needs confirmation that the payment was successful, but they don't need to wait for analytics updates, report generation, or email delivery.

A synchronous implementation might look like this:

process_payment()
send_email()
update_analytics()
generate_report()

A more scalable approach uses message queues:

process_payment()

queue.publish("send_email")
queue.publish("update_analytics")
queue.publish("generate_report")

Background workers consume these queued tasks and process them independently. This architecture improves user experience and enables systems to handle significantly larger transaction volumes.
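As a rough, single-process illustration of the same idea, the sketch below uses Python's standard `queue` and `threading` modules in place of a real message broker. The task names mirror the example above, and the worker only records each task instead of doing real work.

```python
import queue
import threading

tasks = queue.Queue()
completed = []


def worker():
    """Pull task names off the queue until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        completed.append(task)  # stand-in for sending an email, etc.
        tasks.task_done()


def process_payment():
    # The user-facing step finishes first; follow-up work is only enqueued.
    for task in ("send_email", "update_analytics", "generate_report"):
        tasks.put(task)
    return "payment confirmed"


thread = threading.Thread(target=worker)
thread.start()

print(process_payment())  # payment confirmed

tasks.put(None)  # tell the worker to stop
thread.join()
print(completed)  # ['send_email', 'update_analytics', 'generate_report']
```

In production the queue would live in a separate system so that work survives a process restart and workers can run on other machines.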

Preventing Duplicate Transactions

One of the most important challenges in transaction processing is preventing duplicate execution.

Network interruptions can create situations where users unknowingly submit the same request multiple times. Imagine a customer making a purchase. The payment succeeds, but the confirmation never reaches the user's device because of a network failure. Believing the payment failed, the customer clicks the button again.

Without safeguards, the platform could charge the customer twice.

Many systems solve this problem through idempotency keys. A simplified implementation looks like this:

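def process_payment(request_id, amount):
    # request_id is the idempotency key the client sends with every retry.
    if payment_exists(request_id):
        return existing_payment(request_id)

    # In production, back this with a unique constraint on request_id so two
    # concurrent retries can't both pass the check above.
    payment = create_payment(request_id, amount)
    return payment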

If the same request arrives again, the system returns the original result instead of processing a second payment. This pattern is widely used in financial services, payment gateways, and banking applications.
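Here is a small runnable version of that idea, assuming an in-memory dictionary stands in for the payments table and a lock makes the check-and-create step atomic. `PaymentProcessor` and `charges_made` are hypothetical names used only for this sketch.

```python
import threading


class PaymentProcessor:
    def __init__(self):
        self._payments = {}            # request_id -> stored result
        self._lock = threading.Lock()  # makes check-and-create atomic
        self.charges_made = 0

    def process_payment(self, request_id, amount):
        with self._lock:
            if request_id in self._payments:
                return self._payments[request_id]  # replay: no second charge

            self.charges_made += 1  # stand-in for calling the card network
            payment = {"request_id": request_id, "amount": amount, "status": "succeeded"}
            self._payments[request_id] = payment
            return payment


processor = PaymentProcessor()
first = processor.process_payment("req-123", 49.99)
retry = processor.process_payment("req-123", 49.99)

print(first is retry)          # True
print(processor.charges_made)  # 1
```

A real system would typically enforce the same guarantee with a unique database constraint on the key rather than an in-process lock, since requests can land on different servers.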

Monitoring Everything

As systems grow more complex, visibility becomes essential. Engineering teams can't effectively troubleshoot issues they can't observe.

Modern platforms collect metrics from every layer of their infrastructure. Engineers continuously monitor request latency, database response times, error rates, queue depth, CPU utilization, and memory consumption.

A simple monitoring rule might look like this:

# error_rate is assumed to be the percentage of failed requests in a recent window
if error_rate > 5:
    alert("High error rate detected")

Monitoring enables teams to identify problems before they impact users. It also provides valuable data for performance optimization and future capacity planning.
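An error rate only means something relative to a window of recent requests. The sketch below shows one way to compute it, assuming a hypothetical `ErrorRateMonitor` class that keeps the outcome of the last N requests.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the error percentage over the most recent `window` requests."""

    def __init__(self, window=100, threshold_percent=5.0):
        self.results = deque(maxlen=window)  # True = request failed
        self.threshold_percent = threshold_percent

    def record(self, failed):
        self.results.append(failed)

    def error_rate(self):
        if not self.results:
            return 0.0
        return 100.0 * sum(self.results) / len(self.results)

    def should_alert(self):
        return self.error_rate() > self.threshold_percent


monitor = ErrorRateMonitor(window=100, threshold_percent=5.0)
for i in range(100):
    monitor.record(failed=(i % 10 == 0))  # every 10th request fails

print(monitor.error_rate())    # 10.0
print(monitor.should_alert())  # True
```

Because the deque has a fixed maximum length, old results fall out automatically and the rate reflects current behavior rather than the lifetime average.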

Preparing for Traffic Spikes

Traffic patterns are rarely predictable. An e-commerce platform may experience enormous demand during holiday sales, while a ticketing website can receive millions of requests within minutes when a popular event goes live.

To handle these surges, platforms rely on autoscaling. Cloud infrastructure can automatically add resources as demand increases and remove them when traffic subsides.

A simplified scaling rule might look like this:

if cpu_usage > 70:
    add_server()

Autoscaling helps maintain performance during peak periods while controlling infrastructure costs during quieter times.
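A slightly fuller version of that rule also scales down and stays within limits. This is a sketch under assumed thresholds (scale up above 70% CPU, down below 30%), and `desired_servers` is a hypothetical helper, not a cloud provider API.

```python
def desired_servers(current, cpu_usage, min_servers=2, max_servers=10,
                    scale_up_at=70, scale_down_at=30):
    """Return the server count to run given average CPU usage (percent)."""
    if cpu_usage > scale_up_at:
        target = current + 1
    elif cpu_usage < scale_down_at:
        target = current - 1
    else:
        target = current
    # Clamp so the fleet never drops below a safe floor or exceeds budget.
    return max(min_servers, min(max_servers, target))


print(desired_servers(current=3, cpu_usage=85))   # 4
print(desired_servers(current=3, cpu_usage=50))   # 3
print(desired_servers(current=2, cpu_usage=10))   # 2
print(desired_servers(current=10, cpu_usage=95))  # 10
```

The gap between the two thresholds prevents the fleet from flapping up and down when usage hovers around a single cutoff.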

Building for Failure

One of the most important principles in distributed systems is accepting that failures are inevitable.

Servers crash. Databases become unavailable. Networks experience interruptions. Rather than hoping these events never occur, large-scale platforms design systems that can continue operating when failures happen.

For example, payment systems often include retry logic:

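import time

for attempt in range(3):
    try:
        # Assumed: the same idempotency key is sent on every attempt,
        # so a retry after a lost response can't charge twice.
        charge_customer(idempotency_key)
        break
    except TransientPaymentError:  # assumed type for retryable failures
        time.sleep(2 ** attempt)  # back off 1s, 2s, 4s instead of hammering
else:
    raise RuntimeError("charge failed after 3 attempts")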

In addition, platforms implement redundancy by running multiple instances of critical components across different geographic regions and availability zones. If one component fails, another can take over with minimal disruption.

This strategy significantly improves availability and resilience.
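Failover between redundant instances can be sketched as trying each one in turn. The region names and handler functions below are made up for illustration, and a `ConnectionError` is assumed to signal that an instance is unreachable.

```python
def call_with_failover(endpoints, request):
    """Try each redundant endpoint in order and return the first success."""
    errors = []
    for name, handler in endpoints:
        try:
            return name, handler(request)
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")  # remember why this one failed
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


def us_east(request):
    raise ConnectionError("region unavailable")


def eu_west(request):
    return f"processed {request}"


endpoints = [("us-east", us_east), ("eu-west", eu_west)]
print(call_with_failover(endpoints, "order-1"))  # ('eu-west', 'processed order-1')
```

In practice this routing usually happens in DNS or at the load balancer rather than in application code, but the principle is the same.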

The Importance of Consistency and Reliability

At scale, transaction processing isn't solely about speed. Accuracy is equally important.

Users may tolerate a slight delay, but they won't tolerate duplicate charges, missing funds, incorrect balances, or lost transactions. For this reason, large-scale transaction systems place a strong emphasis on consistency, auditing, logging, reconciliation, and recovery mechanisms.

Every transaction must be traceable. Every failure must be recoverable. These requirements become particularly important in industries such as finance, e-commerce, subscription billing, and task earning platforms where money and rewards move between users and businesses every day.

Conclusion

The ability to handle millions of daily transactions isn't the result of a single technology. It comes from combining multiple architectural principles that work together to create reliable, scalable systems.

Large-scale platforms distribute traffic across multiple servers, separate responsibilities into specialized services, cache frequently accessed data, process background work asynchronously, continuously monitor system health, and design for inevitable failures.

For developers, understanding these patterns provides valuable insight into how modern internet platforms operate behind the scenes. Whether you're building a payment processor, a SaaS platform, an online marketplace, or a task earning application, the same foundational principles apply.

As systems grow, scalability becomes less about writing more code and more about designing architecture that remains reliable under increasing demand. The platforms that succeed are the ones capable of delivering fast, accurate, and consistent transactions regardless of how many users arrive.

How to Scale Laravel Applications for High-Traffic Production Systems

Olamilekan Lamidi — Thu, 11 Jun 2026 23:45:39 +0000

Your first scaling problem rarely arrives with a bang. For a while, everything is fine: pages load fast, the database barely breaks a sweat, and the team ships features without thinking much about infrastructure.

Then traffic climbs. A campaign over-performs. A marketplace onboards a popular seller. A SaaS product signs a couple of enterprise accounts.

Suddenly, /dashboard takes two seconds instead of 300 milliseconds. Queue jobs that used to clear in seconds sit waiting for minutes. Database CPU spikes every afternoon.

So you add another app server, and response time barely moves because the real culprit was a slow query on a large table all along.

If you have run Laravel in production, you've probably lived some version of this. The good news is that scaling Laravel almost never means abandoning the framework. It means learning where pressure builds and making the application behave predictably under load.

In this guide, you'll learn how to find common bottlenecks, tune the database, use Redis effectively, move slow work onto queues, optimize APIs, and monitor a Laravel application in production.

None of this requires a single heroic rewrite. The biggest wins usually come from practical work: removing inefficient queries, pushing slow tasks onto queues, adding the right indexes, caching carefully chosen data, and measuring whether each change actually helped.

Prerequisites

You'll get the most out of this guide if you're already comfortable with:

Building applications with Laravel and PHP
Writing Eloquent queries and database migrations
Using queues, jobs, and scheduled commands
Reading a basic database query plan
Deploying Laravel to a production server or platform
Working with Redis and either MySQL or PostgreSQL in a production-like setup

Table of Contents

What Happens When Laravel Apps Start Growing
Common Laravel Bottlenecks
How to Optimize the Database
How to Scale with Redis
How to Use Queue-Driven Architectures
How to Optimize API Performance
How to Monitor Laravel in Production
An Example High-Traffic Laravel Architecture
Lessons Learned the Hard Way
A Pre-Launch Scaling Checklist
Conclusion
References

What Happens When Laravel Apps Start Growing

Traffic changes a system's behavior because it turns small inefficiencies into permanent costs. A query that takes 80 milliseconds is harmless when it runs a few hundred times an hour. Run it 30 times per page view on a page that gets thousands of hits a minute, and that same query becomes a capacity problem.
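To put rough numbers on that, here is a back-of-the-envelope calculation written as a short language-agnostic Python snippet. The 80 ms and 30 queries come from the example above. The figure of 2,000 page views per minute is an illustrative assumption.

```python
query_ms = 80
queries_per_page = 30
page_views_per_minute = 2000  # illustrative assumption

db_ms_per_page = query_ms * queries_per_page
db_seconds_per_minute = db_ms_per_page * page_views_per_minute / 1000
# Each wall-clock minute has 60 seconds, so this is how many queries
# must be running concurrently, on average, just for this one page.
avg_concurrent_queries = db_seconds_per_minute / 60

print(db_ms_per_page)          # 2400
print(db_seconds_per_minute)   # 4800.0
print(avg_concurrent_queries)  # 80.0
```

Under these assumptions a single page keeps about 80 queries in flight at all times, which is why a "harmless" 80 ms query turns into a capacity problem.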

The pressure tends to show up in predictable places. More requests mean more PHP workers, more database connections, more queue volume, and more Redis operations.

The database, whether MySQL or PostgreSQL, is usually the first thing to buckle. Queues back up when work is created faster than workers can drain it. Caches only help when hit rates stay high and misses stay controlled. And scaling everything horizontally can turn sloppy code into an expensive cloud bill.

That's why scaling work has to start with measurement, not guesswork. Before you change anything, you want to know what is actually saturated: request CPU, database I/O, lock contention, Redis latency, queue depth, an external API, or oversized payloads.

A typical request in a growing Laravel app travels through several layers. The user sends a request, a load balancer routes it to an app server, and Laravel checks Redis for a cached result. On a miss, it queries the database, stores the computed result back in Redis, and hands any slow follow-up work to a queue. A worker picks up that job later while Laravel returns the response right away.

Here's the important part: adding more app servers does nothing for a slow query, a missing index, or an overloaded queue. Horizontal scaling only pays off once the shared dependencies behind those servers can keep up.

Common Laravel Bottlenecks

Laravel itself causes very few scaling problems. Most issues come from how application code talks to the database, the network, and background workers.

N+1 Queries

The classic offender is the N+1 query. You load a list of models, then lazily touch a relationship on each one:

use App\Models\Post;

$posts = Post::latest()->take(50)->get();

foreach ($posts as $post) {
    echo $post->author->name;
}

That's one query for the posts plus one query per author: 51 queries for a single page. Eager load the relationship instead:

use App\Models\Post;

$posts = Post::with('author')
    ->latest()
    ->take(50)
    ->get();

foreach ($posts as $post) {
    echo $post->author->name;
}

In production, these are sneaky. They often hide inside API Resources, Blade components, and authorization checks, where the relationship access isn't obvious from the controller.
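The query counts are easy to verify outside Laravel. The following language-agnostic sketch uses Python's built-in `sqlite3` module with made-up `posts` and `authors` tables, and a hypothetical `run` helper that counts every query it executes. It imitates what lazy and eager loading do at the SQL level and is not Eloquent itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author-{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i, f"post-{i}") for i in range(1, 51)])

query_count = 0


def run(sql, params=()):
    """Execute a query and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# Lazy loading: one query for the posts, then one per post for its author.
query_count = 0
for post_id, author_id, title in run("SELECT id, author_id, title FROM posts"):
    run("SELECT name FROM authors WHERE id = ?", (author_id,))
lazy_queries = query_count

# Eager loading: one query for the posts, one IN (...) query for all authors.
query_count = 0
posts = run("SELECT id, author_id, title FROM posts")
author_ids = sorted({author_id for _, author_id, _ in posts})
placeholders = ", ".join("?" for _ in author_ids)
authors = dict(run(f"SELECT id, name FROM authors WHERE id IN ({placeholders})", author_ids))
eager_queries = query_count

print(lazy_queries, eager_queries)  # 51 2
```

The eager version costs two queries no matter how many posts are on the page, while the lazy version grows with every row.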

Missing Indexes

Adding an index is one of the highest-return fixes you can make. Take a query like this:

$orders = Order::where('account_id', $accountId)
    ->where('status', 'paid')
    ->whereBetween('created_at', [$start, $end])
    ->latest()
    ->paginate(50);

If orders has millions of rows and no useful compound index, the database scans far more rows than it needs to. Add an index that matches how you actually query:

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->index(['account_id', 'status', 'created_at']);
        });
    }

    public function down(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->dropIndex(['account_id', 'status', 'created_at']);
        });
    }
};

Indexes aren't free, though. They take up space and slow down writes. Add them for real, repeated query patterns, not for every column that ever appears in a where clause.

Inefficient Eager Loading

You can also swing too far the other way. Loading every relationship "just in case" burns memory and ships data the request never uses:

$users = User::with([
    'profile',
    'teams',
    'roles.permissions',
    'invoices.lineItems.product',
])->get();

That might be fine for an admin detail page showing one user. On a list page, it's a liability. Constrain the eager loads and select only the columns you need:

$users = User::query()
    ->select(['id', 'name', 'email'])
    ->with([
        'profile:id,user_id,avatar_url',
        'teams:id,name',
    ])
    ->latest()
    ->paginate(25);

One caveat: tightly scoped select lists can break later code that expects a column you didn't load. Keep this technique close to read-heavy endpoints where the payoff is obvious.

Synchronous Processing

High-traffic apps need short web requests. Sending email, generating PDFs, calling third-party APIs, resizing images, and building exports usually belong outside the request cycle. This version can hurt you:

public function store(StoreOrderRequest $request)
{
    $order = Order::create($request->validated());

    Mail::to($order->user)->send(new OrderReceipt($order));

    return response()->json($order, 201);
}

Push the work onto a queue instead:

public function store(StoreOrderRequest $request)
{
    $order = Order::create($request->validated());

    SendOrderReceipt::dispatch($order->id);

    return response()->json([
        'id' => $order->id,
        'status' => 'accepted',
    ], 202);
}

Now your response time no longer depends on your mail provider. If the provider has a slow afternoon, the queue absorbs it and your users don't have to wait.

Large Payloads

Oversized JSON responses hurt everyone in the chain: the app server serializing them, the network carrying them, and the client parsing them. A frequent mistake is returning whole models when you meant to return a summary:

return User::with('orders', 'invoices', 'teams')->findOrFail($id);

Define an explicit API Resource instead:

use Illuminate\Http\Resources\Json\JsonResource;

class UserSummaryResource extends JsonResource
{
    public function toArray($request): array
    {
        return [
            'id' => $this->id,
            'name' => $this->name,
            'avatar_url' => $this->profile?->avatar_url,
            'plan' => $this->subscription_plan,
        ];
    }
}

A small, deliberate response contract keeps endpoint cost easy to reason about and prevents accidental coupling.

Expensive Joins

Joins are useful, but expensive joins across large tables can dominate your database time, especially when they sort or filter on columns that aren't indexed:

$rows = DB::table('orders')
    ->join('users', 'users.id', '=', 'orders.user_id')
    ->join('accounts', 'accounts.id', '=', 'users.account_id')
    ->where('accounts.region', 'us-east')
    ->where('orders.status', 'paid')
    ->orderByDesc('orders.created_at')
    ->limit(100)
    ->get();

At scale, you may need to denormalize a small field, precompute a reporting table, or move analytics off the primary transactional database entirely. Do not treat denormalization as an admission of defeat. Copying a stable field like account_id onto orders can remove a costly join from a hot path. The price you pay is keeping that duplicated data consistent, which can be a worthwhile trade-off.

How to Optimize the Database

When a Laravel app slows down, the database is usually the first place to look.

Add Indexes Around Real Query Patterns

Start with your slow query log, database metrics, and traces rather than intuition. If the app constantly looks up active subscriptions by account, build a compound index that matches that access pattern:

Schema::table('subscriptions', function (Blueprint $table) {
    $table->index(['account_id', 'status', 'renews_at']);
});

Then write the query so it can actually use the index:

$subscription = Subscription::where('account_id', $accountId)
    ->where('status', 'active')
    ->where('renews_at', '>=', now())
    ->orderBy('renews_at')
    ->first();

Get in the habit of running EXPLAIN after you add an index to confirm that the plan changed. An index the optimizer ignores is just write overhead.
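The before-and-after check can be practiced on any database. Here is a small language-agnostic sketch using Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The table mirrors the subscriptions example, the index name is made up, and the exact plan text varies between SQLite versions and differs from MySQL or PostgreSQL output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE subscriptions (
        id INTEGER PRIMARY KEY,
        account_id INTEGER,
        status TEXT,
        renews_at TEXT
    )
""")

QUERY = """
    SELECT id FROM subscriptions
    WHERE account_id = ? AND status = ? AND renews_at >= ?
    ORDER BY renews_at
    LIMIT 1
"""
PARAMS = (1, "active", "2026-01-01")


def plan(sql, params):
    """Return the planner's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


before = plan(QUERY, PARAMS)
conn.execute("""
    CREATE INDEX subscriptions_account_status_renews
    ON subscriptions (account_id, status, renews_at)
""")
after = plan(QUERY, PARAMS)

print("before:", before)
print("after: ", after)
```

Before the index exists the plan describes a table scan. Afterwards it should name the new index, which is the confirmation you are looking for.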

Use Eager Loading Deliberately

Match eager loading to what the endpoint actually returns. For list endpoints, keep relationships shallow and constrained:

$projects = Project::query()
    ->select(['id', 'account_id', 'name', 'updated_at'])
    ->withCount('openTasks')
    ->with([
        'owner:id,name',
    ])
    ->where('account_id', $accountId)
    ->latest('updated_at')
    ->paginate(30);

When you only need a number, withCount beats loading a whole relationship to count it:

$teams = Team::query()
    ->withCount([
        'members',
        'invitations as pending_invitations_count' => fn ($query) => $query->whereNull('accepted_at'),
    ])
    ->paginate(25);

Your memory footprint stays flat, which matters much more on a list page than on a detail page.

Optimize Queries Before Adding Hardware

A bigger database instance buys you time. It also hides the inefficient queries that put you there until the next traffic jump exposes them again. Before you reach for a larger machine, find your highest-cost queries. In local or staging environments, logging slow ones is easy:

use Illuminate\Database\Events\QueryExecuted;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

DB::listen(function (QueryExecuted $query) {
    if ($query->time > 100) {
        Log::warning('Slow query detected', [
            'sql' => $query->toRawSql(),
            'time_ms' => $query->time,
        ]);
    }
});

Be careful doing this in production. Bindings can contain sensitive data, and verbose logging at high volume can become its own performance problem.

Process Large Tables with Chunking

Never pull an entire large table into memory for a batch job:

User::where('is_active', true)
    ->chunkById(1000, function ($users) {
        foreach ($users as $user) {
            RefreshUserSearchIndex::dispatch($user->id);
        }
    });

chunkById is safer than offset-based chunking when rows can change while the job runs, because it tracks the last seen ID instead of a numeric offset. For very large exports, stream the records or write them out in batches.

Use Cursor Pagination for High-Volume Feeds

Offset pagination gets slower the deeper a user scrolls, because the database still has to skip every row it's not returning. For feeds, audit logs, messages, and timelines, cursor pagination is usually the better fit:

$events = AuditEvent::query()
    ->where('account_id', $accountId)
    ->orderByDesc('id')
    ->cursorPaginate(50);

return AuditEventResource::collection($events);

It relies on a stable, indexed ordering column and uses next/previous cursors rather than arbitrary page numbers, which is what an infinite-scroll feed usually needs.
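The underlying technique is often called keyset pagination. The following language-agnostic sketch shows the general idea with Python's built-in `sqlite3` module. It is not Laravel's implementation, and `fetch_page` and the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_events (id INTEGER PRIMARY KEY, account_id INTEGER)")
conn.executemany("INSERT INTO audit_events VALUES (?, ?)",
                 [(i, 1) for i in range(1, 11)])


def fetch_page(account_id, per_page, cursor=None):
    """Return (ids, next_cursor); cursor is the last id of the previous page."""
    sql = "SELECT id FROM audit_events WHERE account_id = ?"
    params = [account_id]
    if cursor is not None:
        sql += " AND id < ?"  # seek past everything already returned
        params.append(cursor)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(per_page)
    ids = [row[0] for row in conn.execute(sql, params)]
    next_cursor = ids[-1] if len(ids) == per_page else None
    return ids, next_cursor


page1, cursor = fetch_page(account_id=1, per_page=4)
page2, cursor = fetch_page(account_id=1, per_page=4, cursor=cursor)
print(page1)  # [10, 9, 8, 7]
print(page2)  # [6, 5, 4, 3]
```

Each page starts with an indexed seek to the cursor position instead of counting and discarding skipped rows, so deep pages cost about the same as the first one.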

Split Reads with Read Replicas

As read traffic grows, replicas can take load off the primary:

'mysql' => [
    'driver' => 'mysql',
    'read' => [
        'host' => [
            env('DB_READ_HOST', '127.0.0.1'),
        ],
    ],
    'write' => [
        'host' => [
            env('DB_WRITE_HOST', '127.0.0.1'),
        ],
    ],
    'sticky' => true,
    'database' => env('DB_DATABASE', 'laravel'),
    'username' => env('DB_USERNAME', 'root'),
    'password' => env('DB_PASSWORD', ''),
],

The sticky option keeps reads on the write connection after a write within the same request, which helps avoid some read-after-write surprises.

Replicas come with replication lag, and that lag matters. Don't route payment confirmations, password changes, permission checks, or anything else consistency-sensitive to a replica that might be a few seconds stale unless the business flow can genuinely tolerate seeing old data.

How to Scale with Redis

Redis often does a lot in a Laravel production stack: caching, sessions, rate limiting, queues, locks, and Horizon metrics. It's fast, but it still needs thought: sensible key design, expiration policies, memory monitoring, and a real plan for invalidation.

Caching

Cache expensive reads that get requested often and can tolerate being slightly out of date:

use Illuminate\Support\Facades\Cache;

$stats = Cache::remember(
    "accounts:{$account->id}:dashboard-stats",
    now()->addMinutes(5),
    fn () => DashboardStats::forAccount($account)->calculate()
);

Short time-to-live values go a surprisingly long way. A five-minute cache can wipe out thousands of duplicate queries while keeping the data fresh enough for most dashboards.

When the data changes after a known event, invalidate it explicitly:

Order::created(function (Order $order) {
    Cache::forget("accounts:{$order->account_id}:dashboard-stats");
});

Caching works best when your keys are predictable and your invalidation is tied to domain events rather than guesswork.

Sessions

For horizontally scaled app servers, file-based sessions are a trap: the next request can land on a different server that has never seen the session. Store sessions in Redis or a database so any server can handle any request:

SESSION_DRIVER=redis
CACHE_STORE=redis
QUEUE_CONNECTION=redis

Rate Limiting

Rate limits protect you from abusive clients, runaway loops, and endpoints that get hammered:

use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;

RateLimiter::for('api', function (Request $request) {
    return Limit::perMinute(120)->by(
        optional($request->user())->id ?: $request->ip()
    );
});

Expensive endpoints deserve stricter limits:

RateLimiter::for('exports', function (Request $request) {
    return Limit::perHour(10)->by($request->user()->id);
});

Let business cost drive the numbers. Login, search, export, and webhook endpoints rarely need the same limit.
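For intuition about what a per-key limit does, here is a language-agnostic fixed-window sketch in Python. It is one simple strategy and not necessarily how Laravel's limiter works internally. `FixedWindowLimiter` is a hypothetical class, and the clock is injectable so the behavior can be shown without waiting.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` hits per key within each `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = {}  # key -> (window_start, count)

    def allow(self, key):
        now = self.clock()
        start, count = self._hits.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the old window expired: start a new one
        if count >= self.limit:
            return False
        self._hits[key] = (start, count + 1)
        return True


now = [0.0]
limiter = FixedWindowLimiter(limit=2, window=60, clock=lambda: now[0])

print(limiter.allow("user:1"))  # True
print(limiter.allow("user:1"))  # True
print(limiter.allow("user:1"))  # False
print(limiter.allow("user:2"))  # True (separate key)

now[0] = 61.0
print(limiter.allow("user:1"))  # True (new window)
```

Keying by user ID where possible, and by IP only as a fallback, keeps one noisy client from exhausting the limit for everyone behind a shared address.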

Queues

Redis is a common queue backend because it's quick and Horizon supports it well:

QUEUE_CONNECTION=redis

Dispatch work onto named queues from the request:

GenerateInvoicePdf::dispatch($invoice->id)
    ->onQueue('documents');

Split work by profile, such as default, emails, webhooks, documents, and imports, because each workload can need different worker counts and retry rules. Keep the names meaningful. During an incident, "the documents queue is 20 minutes behind" tells you far more than "default is slow."

How to Use Queue-Driven Architectures

Queues are one of Laravel's best scaling tools. They let the app accept work quickly and process it asynchronously with controlled concurrency. They also make the system more resilient: when a third-party API goes down, jobs retry on their own instead of tying up your PHP-FPM request workers.

Laravel Queues

A good job is small, idempotent, and safe to retry:

use App\Mail\OrderReceiptMail;
use App\Models\Order;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;
use Illuminate\Support\Facades\Mail;

class SendOrderReceipt implements ShouldQueue
{
    use Queueable;

    public int $tries = 3;
    public int $backoff = 60;

    public function __construct(public int $orderId)
    {
    }

    public function handle(): void
    {
        $order = Order::with('user')->findOrFail($this->orderId);

        Mail::to($order->user)->send(new OrderReceiptMail($order));
    }
}

Pass IDs into jobs rather than full Eloquent models. The model might change before the job runs, and serializing a whole model bloats the payload. For external APIs, add timeouts and guard against duplicate work:

use App\Models\Order;
use App\Services\CrmClient;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;

class SyncOrderToCrm implements ShouldQueue
{
    use Queueable;

    public int $tries = 3;
    public int $backoff = 60;

    public function __construct(public int $orderId)
    {
    }

    public function handle(CrmClient $crm): void
    {
        $order = Order::findOrFail($this->orderId);

        if ($order->crm_synced_at) {
            return;
        }

        $crm->upsertOrder($order->external_reference, [
            'total' => $order->total,
            'status' => $order->status,
        ]);

        $order->forceFill(['crm_synced_at' => now()])->save();
    }
}

The crm_synced_at check is the whole point. Jobs run more than once in real life, and idempotency is what keeps a retry from double-charging or double-syncing.

Horizon

Horizon gives you visibility and control over Redis queues. A typical setup runs different supervisors for different workloads:

'production' => [
    'supervisor-default' => [
        'connection' => 'redis',
        'queue' => ['default', 'emails'],
        'balance' => 'auto',
        'maxProcesses' => 20,
        'tries' => 3,
    ],

    'supervisor-documents' => [
        'connection' => 'redis',
        'queue' => ['documents'],
        'balance' => 'simple',
        'maxProcesses' => 5,
        'tries' => 2,
        'timeout' => 300,
    ],
],

The separation matters: a long-running document job shouldn't starve a quick password-reset email.

Failed Jobs and Retries

Retries only help when failures are temporary. Retrying a job that's permanently broken just burns capacity. For jobs with a business deadline, use retryUntil:

use DateTime;
use Throwable;

public function retryUntil(): DateTime
{
    return now()->addMinutes(30);
}

public function failed(Throwable $exception): void
{
    ImportBatch::whereKey($this->batchId)->update([
        'status' => 'failed',
        'failed_reason' => $exception->getMessage(),
    ]);
}

Use failed to flag the problem somewhere a human will see it. Whatever you do, don't set unlimited retries on jobs that hit a third-party service.
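To make the deadline idea concrete, here is a rough Python sketch of "retry until a cutoff, then report the failure". It is not how Laravel's worker is implemented; run_with_deadline and its parameters are made up for illustration, and the clock and sleep functions are injected so the logic can be exercised without waiting.

```python
def run_with_deadline(job, deadline, now, backoff_seconds, sleep, on_failed):
    """Retry `job` until it succeeds or `deadline` passes.

    `now` and `sleep` are injected so the sketch can run without real waiting.
    Returns True on success, False after calling `on_failed` with the last error.
    """
    last_error = None
    while now() < deadline:
        try:
            job()
            return True
        except Exception as exc:  # a real worker would only retry transient errors
            last_error = exc
            sleep(backoff_seconds)
    on_failed(last_error)
    return False
```

The important property is that the loop is bounded by time, not by hope: once the deadline passes, the failure handler runs exactly once and the job stops consuming capacity.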

Queue Monitoring

Track queue depth, wait time, failure rate, and processing time together. Depth alone can mislead you. When depth starts climbing, walk through it methodically: are workers keeping pace with incoming jobs? If the queue keeps growing, check how long individual jobs take. If the slow part is the database, fix the query or dial back worker concurrency. If it's an external API, add backoff or a circuit breaker. If the work is CPU-bound, scale workers or break the jobs into smaller pieces.

Be careful with the "scale workers" instinct, though. Adding more workers without checking the database first can make an incident worse. More workers mean more concurrent queries, more locks, and more pressure on the primary exactly when it's already struggling.
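The "are workers keeping pace?" question is mostly arithmetic. The sketch below is a back-of-the-envelope model in Python with hypothetical function names; it assumes every worker is always busy and every job takes the average time, which real queues only approximate.

```python
def queue_growth_per_minute(arrivals_per_minute, workers, avg_job_seconds):
    """Net change in queue depth per minute (positive means the backlog grows)."""
    capacity_per_minute = workers * (60.0 / avg_job_seconds)
    return arrivals_per_minute - capacity_per_minute


def minutes_to_drain(depth, arrivals_per_minute, workers, avg_job_seconds):
    """Minutes until the backlog is empty, or None if it never drains."""
    growth = queue_growth_per_minute(arrivals_per_minute, workers, avg_job_seconds)
    if growth >= 0:
        return None
    return depth / -growth
```

With 20 workers and 600 jobs arriving per minute, a 2-second job keeps the queue level, while a 3-second job grows the backlog by 200 jobs every minute. That is why job duration is the second thing to check after depth.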

How to Optimize API Performance

APIs earn special attention because clients call them repeatedly and payloads tend to grow quietly over months.

API Resources

Resources keep your response shape intentional:

class OrderResource extends JsonResource
{
    public function toArray($request): array
    {
        return [
            'id' => $this->id,
            'status' => $this->status,
            'total' => $this->total,
            'placed_at' => $this->created_at->toIso8601String(),
            'customer' => new CustomerSummaryResource($this->whenLoaded('customer')),
        ];
    }
}

whenLoaded is doing real work here. It stops the resource from quietly triggering a lazy query when the relationship wasn't eager loaded:

$orders = Order::query()
    ->with('customer:id,name')
    ->where('account_id', $accountId)
    ->latest()
    ->paginate(50);

return OrderResource::collection($orders);

Pagination

Returning unbounded collections is an easy way to create an API performance problem you won't notice until a client has a lot of data:

$perPage = min((int) request('per_page', 50), 100);

$orders = Order::where('account_id', $accountId)
    ->latest()
    ->paginate($perPage);

Cap the page size. If a client genuinely needs every record for an export, make that an async job rather than a giant synchronous response.
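One detail is easy to miss: the snippet above sets an upper bound but no lower bound, so a per_page of 0 or a negative number still gets through. A small Python sketch of a stricter clamp follows. The function name is hypothetical, and falling back to the default for unparsable input is just one reasonable choice.

```python
def clamp_per_page(raw, default=50, maximum=100):
    """Parse a client-supplied per_page value and keep it within 1..maximum."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default  # missing or non-numeric input falls back to the default
    return max(1, min(value, maximum))
```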

Response Optimization

Stop returning fields nobody reads. On read-heavy endpoints, selecting only the columns you need cuts both database I/O and serialization cost:

$products = Product::query()
    ->select(['id', 'name', 'slug', 'price', 'thumbnail_url'])
    ->where('is_visible', true)
    ->orderBy('name')
    ->paginate(40);

It's also worth turning on compression at the web server or load balancer. JSON compresses extremely well, and that's often a small config change with a real bandwidth payoff.

Rate Limiting

Design API rate limits around identity and endpoint cost:

Route::middleware(['auth:sanctum', 'throttle:api'])
    ->group(function () {
        Route::get('/orders', [OrderController::class, 'index']);
        Route::post('/exports/orders', [OrderExportController::class, 'store'])
            ->middleware('throttle:exports');
    });

This keeps casual browsing and expensive exports under separate policies, so one heavy user can't squeeze out everyone else.

Caching API Responses

Cache responses that are expensive to compute and can tolerate being a little stale:

public function index(Request $request)
{
    $accountId = $request->user()->account_id;
    $page = $request->integer('page', 1);

    $cacheKey = "api:accounts:{$accountId}:orders:v1:page:{$page}";

    return Cache::remember($cacheKey, now()->addSeconds(60), function () use ($accountId) {
        return OrderResource::collection(
            Order::with('customer:id,name')
                ->where('account_id', $accountId)
                ->latest()
                ->paginate(50)
        )->response()->getData(true);
    });
}

Notice the v1 in the key. Bumping that version number lets you invalidate an entire response format at once when the shape changes. Always scope the key to the tenant or user for anything that's not truly global.
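If the remember-with-TTL pattern is new to you, this tiny in-memory Python sketch shows the moving parts. TtlCache and orders_cache_key are hypothetical names, and a real deployment would use Redis rather than a dict. The key format mirrors the one in the controller above.

```python
import time


class TtlCache:
    """Tiny in-memory cache with per-entry expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def remember(self, key, ttl_seconds, compute):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # fresh hit: skip the expensive work
        value = compute()
        self._entries[key] = (self._clock() + ttl_seconds, value)
        return value


def orders_cache_key(account_id, page, version=1):
    """Scope the key to the tenant and embed a response-format version."""
    return f"api:accounts:{account_id}:orders:v{version}:page:{page}"
```

Because the version is part of the key, bumping it makes every old entry unreachable without deleting anything. The stale entries simply expire on their own TTL.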

How to Monitor Laravel in Production

The teams that catch problems before customers do are the ones collecting signals from everywhere: Laravel, queues, the database, Redis, the infrastructure, and external services.

Laravel gives you several good starting points. Horizon shows queue throughput, failed jobs, wait times, and worker balancing. Telescope surfaces request details, queries, exceptions, jobs, mail, and cache events. Your logs capture slow operations, unexpected retries, and external failures. Your metrics track latency, error rate, queue depth, job runtime, database CPU, lock waits, cache hit ratio, and Redis memory. Your alerting ties all of it back to something a customer would actually feel.

That last part is where teams often make mistakes. The best alerts are about symptoms, not machines being busy: p95 API latency over 800ms for 10 minutes, checkout error rate above 1%, the emails queue waiting more than 5 minutes, database CPU over 85% with slow queries rising, Redis memory over 80%, or failed payment webhooks crossing a threshold.
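As a rough illustration of a symptom-based rule, here is a Python sketch of the "p95 latency over 800ms for 10 minutes" check. The function names are hypothetical, the percentile uses the simple nearest-rank method, and in practice your metrics system would evaluate this for you.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling division
    return ordered[rank - 1]


def latency_alert(window_ms, threshold_ms=800):
    """True when every per-minute p95 in the window exceeds the threshold.

    `window_ms` is a list of per-minute latency samples, one list per minute.
    An empty window never fires, so missing data is not treated as an outage.
    """
    return bool(window_ms) and all(
        percentile(minute, 95) > threshold_ms for minute in window_ms
    )
```

Requiring every minute in the window to breach the threshold is what keeps a single slow request from paging someone.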

A useful mental model is this: logs tell you what happened, metrics tell you whether the system is healthy, and traces tell you where the time went. In practice, wrapping your expensive business operations in a bit of instrumentation pays off quickly:

use Illuminate\Support\Facades\Log;

$startedAt = microtime(true);

$report = $builder->forAccount($account)->build();

Log::info('Billing report generated', [
    'account_id' => $account->id,
    'duration_ms' => (int) ((microtime(true) - $startedAt) * 1000),
    'invoice_count' => $report->invoiceCount(),
]);

When something is failing at 2am, a log line like that can tell you which account, import, or report is causing the pressure.

One more thing worth internalizing: monitor wait time, not just throughput. A queue can process thousands of jobs a minute and still be unhealthy if important jobs sit waiting too long before they start. Users feel the wait, not the throughput.

An Example High-Traffic Laravel Architecture

A high-traffic Laravel setup generally separates four things: stateless web requests, shared cache and session storage, asynchronous workers, and database roles.

Users hit a load balancer, which spreads traffic across a fleet of stateless Laravel app servers. Those servers use Redis for cache, sessions, rate limits, queues, and Horizon data. Queue workers handle slow or unreliable work off to the side. A MySQL primary takes all writes and any consistency-sensitive reads, while a read replica absorbs read-heavy endpoints that can tolerate some replication lag.

The flow looks like this:

Users
  -> Load balancer
  -> Stateless Laravel app servers
  -> Redis for cache, sessions, rate limits, queues, and Horizon data
  -> Primary database for writes and consistency-sensitive reads
  -> Read replica for safe read-heavy endpoints

Redis queue
  -> Queue workers
  -> Database, external APIs, mail providers, object storage, and other services

This isn't the only valid shape. PostgreSQL can stand in for MySQL, Amazon SQS can replace Redis queues, a CDN can serve static assets and cache public responses, and object storage should hold user uploads. The principle that matters is that each layer has one clear job and can be scaled or tuned on its own.

The flip side of stateless app servers is that anything a user needs after the request ends has to live in shared storage. Uploads, generated files, and session state shouldn't sit on a single server's local disk, or they may disappear from the user's point of view when the load balancer sends the next request somewhere else.

Lessons Learned the Hard Way

1. Premature Optimization

This usually shows up as elaborate infrastructure built before the app has any real visibility into itself.

The practical path works better: measure, rank the bottlenecks, fix the biggest one, repeat. For most Laravel apps, the first round of scaling is mostly indexes, N+1 fixes, queue separation, and trimming payloads.

2. Over-caching

Caching can make a system faster and harder to reason about at the same time. One team cached an account-settings response for 30 minutes, then later folded role changes into that same response. The result was that users who had just lost access could still see features until the cache expired.

The fix was splitting stable account metadata away from permission-sensitive state. The lesson is to avoid caching authorization data unless you have thought carefully about invalidation.

3. Missing Indexes

These hide until a table crosses a size threshold. A query that scanned 20,000 rows in development can scan 20 million in production. Bake index review into feature work, and plan big index migrations carefully so they don't lock a hot table at the worst possible time.

4. Queue Overload

Queues don't remove work, they move it. The classic failure is letting one noisy workload block everything else. A big CSV import floods the default queue, and password-reset emails get stuck behind it. Separate queues are cheap insurance against that entire class of incident.

5. Large Transactions

Long transactions hold locks longer and make failures more expensive. Dispatching a job inside a transaction is especially risky because a worker can grab it before the transaction commits:

DB::transaction(function () use ($request) {
    $order = Order::create([...]);
    $order->items()->createMany($request->items);

    GenerateInvoicePdf::dispatch($order->id);
    SyncOrderToCrm::dispatch($order->id);
});

Use after-commit dispatching for any job that depends on committed data:

GenerateInvoicePdf::dispatch($order->id)->afterCommit();
SyncOrderToCrm::dispatch($order->id)->afterCommit();

Keep transactions scoped to the data that genuinely has to change atomically, and nothing more.
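Conceptually, after-commit dispatching just means "hold the jobs until the transaction succeeds". The Python sketch below models that idea with a hypothetical Transaction class. It does not talk to a database and is not how Laravel implements it, but it shows why a rolled-back transaction never leaks a job.

```python
class Transaction:
    """Collects jobs during a transaction and releases them only on commit."""

    def __init__(self, dispatch):
        self._dispatch = dispatch  # callable that pushes a job onto the queue
        self._pending = []

    def dispatch_after_commit(self, job):
        self._pending.append(job)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            for job in self._pending:  # commit succeeded: hand jobs to the queue
                self._dispatch(job)
        self._pending.clear()  # on rollback the jobs are simply dropped
        return False  # never swallow the exception
```

Inside the with block nothing reaches the queue, so a worker cannot pick up a job that points at rows which do not exist yet.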

6. Treating Symptoms as Causes

This is the expensive one. If latency is high because an endpoint runs 300 queries, adding app servers adds database pressure. If jobs are slow because an external API is rate-limiting you, adding workers multiplies the failures.

Good scaling work keeps asking the same questions: What resource is saturated? Which endpoint, job, tenant, or query is causing it? Is this work necessary during the request? Can I reduce it, defer it, cache it, or isolate it? How will I know whether the change helped?

A Pre-Launch Scaling Checklist

Run through this before a big launch, a traffic campaign, or an enterprise rollout.

Application and runtime: Cache config, routes, and views during deploy. Set APP_DEBUG=false. Turn on OPcache. Keep web requests short and move slow work to queues. Store uploads in object storage, not on app-server disk. Keep servers stateless. Set timeouts on every external HTTP call.

Database: Review slow query logs first. Add indexes for your high-volume filters, joins, and ordering. Hunt for N+1 queries in controllers, resources, policies, and views. Paginate every list endpoint. Use chunkById or cursors for batch work. Avoid long transactions and external calls inside transactions. Confirm your backup and restore process works. Test stale-read behavior if you use replicas.

Redis and cache: Use Redis for cache, sessions, rate limiting, and queues where it fits. Set TTLs unless you have a clear reason not to. Include tenant, user, locale, and version in keys when relevant. Watch memory and the eviction policy. Avoid caching permission-sensitive responses without careful invalidation. Guard against cache stampedes on expensive recomputation.

Queues: Separate queues by workload. Configure Horizon supervisors per queue. Set timeouts, retries, and backoff on purpose. Make jobs idempotent where you can. Use afterCommit for jobs that depend on committed data. Monitor wait time, runtime, failures, and retries. Review failed jobs instead of ignoring them.

APIs: Use Resources to control response shape. Cap per_page. Use cursor pagination for big feeds and logs. Cache expensive reads with safe, versioned keys and short TTLs. Apply rate limits by endpoint cost. Don't return raw Eloquent models. Compress responses at the edge.

Observability: Track p50, p95, and p99 latency on the endpoints that matter. Track error rates by route and job class. Alert on queue wait time, not just size. Watch database CPU, connections, slow queries, and lock waits. Watch Redis memory, latency, and evictions. Log important business operations with durations and identifiers. Test your alerts before launch night because a silent alert is worse than no alert.

Conclusion

Laravel runs high-traffic production systems well when you design around the real costs of data, concurrency, and external dependencies. Just make sure you measure before you optimize, because guessing wastes time and tends to complicate the wrong layer.

Fix the database first: indexes, query shape, pagination, and eager loading usually deliver the biggest early wins. Lean on queues to keep requests fast and push slow work into controlled background workers. Cache deliberately, with clear keys, sane TTLs, and a plan for invalidation. Keep watching latency, errors, queue wait time, database health, Redis memory, and your external dependencies.

The best scaling work is practical and repeatable. You study the system you actually have, remove waste, isolate slow parts, and give yourself enough visibility to make the next change with confidence. Do that on a loop, and you rarely need the big rewrite.

What is Amazon EC2 Auto Scaling?

Destiny Erhabor — Mon, 06 May 2024 16:32:47 +0000

Auto scaling is like having a smart system that keeps an eye on how many people are visiting your website. When you have a lot of people, it quickly adds more servers to handle the extra traffic. And when things quiet down, it scales back to save you money.

In AWS, there are two important services that help with this: Amazon EC2 Auto Scaling and AWS Auto Scaling. Amazon EC2 Auto Scaling is specifically for managing your EC2 servers, while AWS Auto Scaling can also handle other things like DynamoDB tables and Amazon Aurora databases.

In this article, we'll dive deeper into how Amazon EC2 Auto Scaling works and how you can use it to keep your website running smoothly without overspending on servers.

Prerequisites

Have an AWS account
Basic understanding of EC2 instances

Table of Contents

Prerequisites
Example Use Case
Advantages of Using Amazon EC2 Auto Scaling
Components of EC2 Auto Scaling
Launch Configurations and Launch Templates
How to Create a Launch Template
What are Auto Scaling Groups (ASGs)
How to create an Auto Scaling Group
What are Scaling Policies?
Conclusion

Example Use Case

Scenario:

Imagine running a website that sells trendy clothes. Sometimes, lots of people visit your site at once, especially during lunch breaks or evenings. Other times, it's pretty quiet.

Problem:

You need enough servers to handle busy times, but you don't want to waste money on too many servers when it's quiet.

Solution with Amazon EC2 Auto Scaling:

Traffic Analysis: Look at when people visit your site the most. This helps you understand when you need more servers.

Set Rules: Decide when to add or remove servers automatically. For example, you might say, "If more than 70% of our servers are busy for more than 5 minutes, add one more server."

Adjust Server Numbers: Tell Amazon the smallest and biggest number of servers you need. You can also say how many you'd like on average. For instance, you might say, "Keep at least 2 servers running all the time. But if it's busy, go up to 10 servers. And usually, we need around 4."

Load Balancing: Make sure all servers get some work. Use a load balancer to send visitors to the least busy server. This keeps everything running smoothly even if you have many servers.

Test and Watch: Before trusting everything, test to see if it works as planned. Keep an eye on it afterward to make sure it's doing its job right.

Save Money: With auto scaling, you don't pay for servers you're not using. When traffic is low, it reduces the number of servers, saving you money. When traffic picks up, it adds more servers, so your site stays fast.

Advantages of Using Amazon EC2 Auto Scaling

Cost Optimization: EC2 Auto Scaling helps optimize costs by automatically adjusting the number of EC2 instances based on demand. During periods of low traffic, it reduces the number of instances, saving on operational costs. Conversely, during high traffic, it scales up to ensure optimal performance without over-provisioning resources.

Improved Availability: By working with a load balancer to distribute incoming traffic across multiple instances, EC2 Auto Scaling improves the availability and fault tolerance of your application. If an instance fails or becomes unhealthy, the Auto Scaling group replaces it with a new one, minimizing disruption to your services.

Scalability: EC2 Auto Scaling allows your application to handle sudden spikes in traffic or increased workload without manual intervention.

Enhanced Performance: With EC2 Auto Scaling, you can maintain consistent performance levels even during peak usage periods. By automatically adding more instances when traffic increases, it prevents performance degradation and ensures a smooth user experience.

Ease of Management: EC2 Auto Scaling simplifies the management of your EC2 fleet by automating instance provisioning, scaling, and monitoring.

Integration with AWS Services: EC2 Auto Scaling integrates seamlessly with other AWS services such as Elastic Load Balancing (ELB) and Amazon CloudWatch.

Highly Customizable: EC2 Auto Scaling offers flexibility and customization options to meet the specific needs of your application.

Components of EC2 Auto Scaling

Let's get a better understanding of how Auto Scaling works by looking at its components.

There are two distinct steps to configuration. The first step is the creation of a launch configuration or launch template. The second is the creation of an Auto Scaling group.

Launch Configurations and Launch Templates

Launch configurations or launch templates define the configuration settings for the EC2 instances that will be launched by the Auto Scaling group.

These settings include the AMI (Amazon Machine Image), instance type, security groups, key pair, and user data.

Launch configurations are older and being phased out in favor of launch templates, which offer more features and flexibility.

How to Create a Launch Template

First, navigate to the EC2 Instances page.

AWS instance page

Select Launch Templates under Instances and click the create button.

AWS launch templates

The following screen should show up. It is similar to the one for launching an EC2 instance. Fill in the required information accordingly.

Create AWS launch templates

After configuration, click the "Create launch template" button and wait for it to finish. You can then view your newly created launch template, with both its default and latest version set to 1. You can use this launch template as a starting point for another launch template and specify a different version for it.

View AWS launch templates

Auto scaling requires either a launch template or launch configuration to identify the instance it's launching and its configurations.

What are Auto Scaling Groups (ASGs)

Auto Scaling groups are the core component of EC2 Auto Scaling. They define the group of EC2 instances that are managed together and share the same scaling policies. ASGs ensure that your application can automatically scale out (add instances) or scale in (remove instances) based on demand.

How to create an Auto Scaling Group

First, navigate to the EC2 Instances page. Under Auto Scaling, select Auto Scaling Groups and click the create button.

creating an Auto Scaling group

On the create screen, the first step is to give your ASG a Name and then select your launch template created from the steps above.

creating a launch template

The next step requires you to select or override an instance launch template. You also select a VPC and subnet.

selecting instance launch template

The next step is to configure advanced options such as adding a load balancer and monitoring. You can attach or add a new load balancer but for this article we will skip this part.

configuring advanced options

Next, configure the group size and scaling. Here, we want the group to scale between a minimum of 2 and a maximum of 5 instances. Also, set the metric type to track average CPU utilization with a target value of 50 (you can increase it to 70 or more).

configuring group size and scaling

The next two steps are for adding notifications (you will need to create an SNS topic for this) and tags. In this article, we are going to skip these and create our ASG.

Create the ASG and view it. From its Activity tab, you can see that two instances were launched. You should also see two EC2 instances on the Instances page. This is because we set our desired capacity to 2.

Auto Scaling groups

What are Scaling Policies?

Scaling policies define the rules that govern how the Auto Scaling group scales in or out in response to changing demand. There are four types of scaling: manual, scheduled, dynamic, and predictive.

Let's break down each type of scaling with examples:

Manual Scaling

Manual scaling involves adjusting the number of EC2 instances in your Auto Scaling group manually, without relying on automated triggers or policies. This type of scaling is typically done in response to predictable events or planned changes in demand.

Example: Suppose you run an e-commerce website and you know that an upcoming flash sale will attract a large number of visitors. To handle the expected surge in traffic, you can manually increase the desired capacity of your Auto Scaling group before the event, adding more EC2 instances ahead of the anticipated demand spike. After the event is over, you can manually reduce the desired capacity back to its normal level.

Pros:

Control: Offers direct control over the number of EC2 instances in the Auto Scaling group.
Flexibility: Allows for immediate adjustments based on specific requirements or events.

Cons:

Manual Intervention: Relies on human intervention, which can be time-consuming and prone to errors.
Lack of Automation: Not suitable for handling dynamic or unpredictable fluctuations in demand efficiently.

Scheduled Scaling

Scheduled scaling involves defining schedules that adjust the number of EC2 instances in your Auto Scaling group automatically. This type of scaling is useful for applications with predictable traffic patterns, such as daily or weekly fluctuations in demand.

Example: Consider a video streaming service that experiences peak traffic during evenings and weekends. You can set up scheduled scaling to increase the desired capacity of your Auto Scaling group every evening at 6 PM and decrease it every morning at 6 AM. This ensures that you have enough capacity to handle peak demand periods without overspending on resources during off-peak hours.

Pros:

Predictability: Well-suited for applications with predictable traffic patterns, such as daily or weekly fluctuations.
Cost Optimization: Helps optimize costs by aligning resources with expected demand patterns.

Cons:

Limited Adaptability: May not be responsive to sudden changes in demand or unexpected traffic spikes.
Requires Planning: Requires upfront planning and configuration of schedules based on historical data or business insights.

Dynamic Scaling

Dynamic scaling adjusts the number of EC2 instances in your Auto Scaling group automatically based on real-time metrics, such as CPU utilization, network traffic, or other application-specific metrics. This type of scaling is responsive to fluctuations in demand and helps ensure optimal performance and cost-effectiveness.

Types:

Step Scaling: This policy scales the number of instances based on a series of scaling adjustments defined by step adjustments and associated metrics thresholds.
Target Tracking: This policy automatically adjusts the number of instances to maintain a specified target metric, such as average CPU utilization or network traffic.
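To get a feel for how the two policy types differ, here is a simplified Python sketch. It is not the algorithm AWS runs, and the function names and step values are made up for illustration. Target tracking is approximated with proportional arithmetic, and step scaling as a lookup over thresholds.

```python
import math


def target_tracking_capacity(current_instances, metric_value, target_value,
                             min_size, max_size):
    """Estimate the capacity needed to bring the average metric back to target."""
    if current_instances <= 0:
        return min_size
    needed = math.ceil(current_instances * metric_value / target_value)
    return max(min_size, min(needed, max_size))  # stay within the group limits


def step_adjustment(metric_value, steps):
    """Return the instance change for the highest threshold the metric reaches.

    `steps` is a list of (lower_bound, change) pairs, e.g. [(60, 1), (80, 2)].
    """
    change = 0
    for lower_bound, delta in sorted(steps):
        if metric_value >= lower_bound:
            change = delta
    return change
```

With the group from this article (minimum 2, maximum 5, CPU target 50), two instances averaging 90% CPU would need four instances under this model, while any estimate above five is capped at the maximum.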

When instances are added to the ASG, they take a few minutes to come online and handle load. This is why a cooldown period has to be set.

Scaling Cooldowns: Scaling cooldowns help prevent rapid fluctuations in the number of instances by imposing a cooldown period after a scaling activity is triggered. During this cooldown period, EC2 Auto Scaling will not launch or terminate additional instances, allowing time for the newly launched instances to stabilize or for the impact of terminated instances to be observed.
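The cooldown idea can be sketched in a few lines of Python. CooldownGate is a hypothetical class, not an AWS API, and the 300-second value used in the usage note is only an example.

```python
class CooldownGate:
    """Allows a scaling action only if the cooldown since the last one has passed."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_action_at = None

    def try_scale(self, now):
        """Return True and start a new cooldown, or False if still cooling down."""
        if (self._last_action_at is not None
                and now - self._last_action_at < self.cooldown_seconds):
            return False  # ignore this trigger: the last change is still settling
        self._last_action_at = now
        return True
```

With a gate of 300 seconds, a trigger at second 0 is accepted, a second trigger at second 120 is ignored, and one at second 300 goes through again.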

Example: Let's say you operate a ride-sharing platform where demand can vary unpredictably throughout the day. With dynamic scaling, you can configure Auto Scaling policies to add more EC2 instances when the number of ride requests exceeds a certain threshold, and remove instances when demand decreases. This allows you to dynamically adapt to changing traffic patterns in real-time, ensuring a seamless experience for both drivers and passengers.

Pros:

Real-Time Responsiveness: Adjusts resource allocation dynamically in response to actual demand, ensuring optimal performance.
Cost Efficiency: Automatically scales resources up or down, helping to optimize costs by only using what is needed.

Cons:

Potential Over-Provisioning: May lead to over-provisioning during sudden spikes in demand if scaling policies are not properly configured.
Complexity: Requires careful configuration of scaling policies and monitoring of metrics to ensure effective scaling behavior.

Predictive Scaling

Predictive scaling uses machine learning algorithms and historical data to forecast future demand and proactively adjust the number of EC2 instances in your Auto Scaling group. This type of scaling helps prevent under-provisioning or over-provisioning of resources by anticipating changes in demand before they occur.

Example: Suppose you operate a weather forecasting application that experiences increased demand during severe weather events. By analyzing historical data on weather patterns and user behavior, predictive scaling can predict when a surge in traffic is likely to occur and automatically scale up the capacity of your Auto Scaling group ahead of time. This ensures that your application remains responsive and available during peak usage periods without unnecessary resource waste.

Pros:

Proactive Optimization: Anticipates future demand based on historical data, ensuring resources are provisioned ahead of time.
Improved Cost Management: Helps prevent under-provisioning and over-provisioning, optimizing resource usage and costs.

Cons:

Data Dependence: Relies on accurate historical data and effective machine learning models for accurate predictions.
Initial Setup: Requires initial setup and configuration of predictive scaling models, which can be complex and resource-intensive.

Conclusion

In conclusion, Amazon EC2 Auto Scaling offers a range of strategies to effectively manage and optimize the performance of applications running on EC2 instances.

Whether it's through manual adjustments, scheduled scaling, dynamic responses to real-time metrics, or proactive measures based on predictive analytics, EC2 Auto Scaling provides the flexibility and automation needed to ensure that resources are aligned with demand.

By leveraging these scaling capabilities, businesses can enhance availability, improve cost efficiency, and deliver a seamless user experience, ultimately driving better outcomes for their applications and customers on the AWS platform.

Best Practices for Scaling Your Node.js REST APIs

freeCodeCamp — Thu, 15 Sep 2022 16:59:00 +0000

By Rishabh Rawat

There is more to scalability than using cluster mode. In this tutorial, we'll explore 10 ways you can make your Node.js API ready to scale.

When working on a project, we often get a few real nuggets here and there on how to do something in a better way. We get to learn retrospectively, and then we're fully prepared to apply it next time around.

But how often does that actually work out? I don't even remember what I did yesterday sometimes. So I wrote this article.

This is my attempt to document some of the best Node.js scalability practices that are not talked about as often.

You can adopt these practices at any stage in your Node.js project. It doesn't have to be a last-minute patch.

With that said, here's what we will cover in this article:

🚦Use throttling
🐢 Optimize your database queries
䷪ Fail fast with circuit breaker
🔍 Log your checkpoints
🌠 Use Kafka over HTTP requests
🪝 Look out for memory leaks
🐇 Use caching
🎏 Use connection pooling
🕋 Seamless scale-ups
💎 OpenAPI compliant documentation

Use Throttling

Throttling allows you to limit access to your services to prevent them from being overwhelmed by too many requests. It has some clear benefits – you can safeguard your application whether it's a large burst of users or a denial-of-service attack.

The common place to implement a throttling mechanism is where the rate of input and output don't match. Particularly, when there is more inbound traffic than what a service can (or wants to) handle.

Let’s understand with a visualization.

Your application is throttling requests from News Feed Service

There's throttling at the first junction point between your application and the News Feed Service:

News Feed Service (NFS) subscribes to your application for sending notifications.
It sends 1000 requests to your application every second.
Your application only handles 500 requests/sec based on the billing plan NFS subscribed to.
Notifications are sent for the first 500 requests.

Now it is very important to note that all the requests by NFS that exceed the quota of 500 requests/sec should fail and have to be retried by the NFS.

Why reject the extra requests when you can queue them? There are a couple of reasons:

Accepting all the requests will cause your application to start accumulating them. It will become a single point of failure (by RAM/disk exhaustion) for all the clients subscribed to your application, including NFS.
You should not accept requests that are greater than the scope of the subscription plan of your clients (in this case, NFS).

For application-level rate limiting, you can use the express-rate-limit middleware for your Express.js API. For network-level throttling, you can use a solution like a web application firewall (WAF).

If you are using a pub-sub mechanism, you can throttle your consumers or subscribers as well. For instance, you can choose to consume only limited bytes of data when consuming a Kafka topic by setting the maxBytes option.
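The NFS scenario above can be simulated with a small fixed-window limiter. This is a language-neutral Python sketch with a hypothetical FixedWindowLimiter class, not what express-rate-limit does internally. In an Express app you would reach for the middleware instead.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each one-second window."""

    def __init__(self, limit):
        self.limit = limit
        self._window = {}  # client -> (window_start_second, count)

    def allow(self, client, now):
        second = int(now)
        start, count = self._window.get(client, (second, 0))
        if start != second:
            start, count = second, 0  # a new window begins
        if count >= self.limit:
            return False  # over quota: the caller should reply with HTTP 429
        self._window[client] = (start, count + 1)
        return True
```

Feeding it 1000 requests from one client within the same second accepts the first 500 and rejects the rest, which the client then has to retry. Each client has its own counter, so a noisy subscriber does not eat into anyone else's quota.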

Optimize Your Database Queries

There will be times when querying the database is the only choice. You might not have cached the data, or the cached copy could be stale.

When that happens, make sure your database is prepared for it. Having enough RAM and disk IOPS is a good first step.

Secondly, optimize your queries as much as possible. For starters, here are a couple of things that will set you on the right path:

Try to use indexed fields when querying. Don't over-index your tables in hopes of the best performance. Indexes have their cost.
For deletes, stick to soft deletes. If permanent deletion is necessary, delay it.
When reading data, only fetch the required fields using projection. If possible, strip away the unnecessary metadata and methods (for example, Mongoose has lean).
Try to decouple database performance from the user experience. If CRUD on the database can happen in the background (that is, non-blocking), do it. Don't leave the user waiting.
Directly update the desired fields using update queries. Don't fetch the document, update the field, and save the whole document back to the database. It has network and database overhead.

Fail Fast with a Circuit Breaker

Imagine you get burst traffic on your Node.js application, and one of the external services required to fulfill the requests is down. Would you want to keep hitting that dead end for every request thereafter? Definitely not. You don't want to waste time and resources on requests that are destined to fail.

This is the whole idea of a circuit breaker. Fail early. Fail fast.

For example, if 50 out of 100 requests fail, the circuit breaker stops sending requests to that external service for the next X seconds. This prevents firing requests that are bound to fail.

Once the circuit resets, it allows requests to go through. If they fail again, the circuit breaks and the cycle repeats.
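
The open, half-open, and closed cycle can be sketched in a few lines. This is a simplified, synchronous illustration and not Opossum's API: the class name, option names (`failureThreshold`, `minRequests`, `resetMs`), and the injectable `now` clock are all assumptions made for the example. Real calls to external services are asynchronous, and a production breaker would also use a rolling time window for its statistics.

```javascript
// Minimal circuit breaker (illustrative sketch, not a library API).
class CircuitBreaker {
  constructor(action, { failureThreshold = 0.5, minRequests = 10, resetMs = 5000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold; // e.g. 0.5 = trip when 50% of calls fail
    this.minRequests = minRequests;           // don't trip on a tiny sample
    this.resetMs = resetMs;                   // the "X seconds" the circuit stays open
    this.now = now;
    this.state = "closed";
    this.successes = 0;
    this.failures = 0;
    this.openedAt = 0;
  }

  fire(...args) {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetMs) {
        throw new Error("circuit open: failing fast"); // skip the doomed call
      }
      this.state = "half-open"; // reset period is over: allow one trial call
    }
    try {
      const result = this.action(...args);
      if (this.state === "half-open") this.close();
      else this.successes += 1;
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  recordFailure() {
    this.failures += 1;
    const total = this.successes + this.failures;
    const tripped = total >= this.minRequests && this.failures / total >= this.failureThreshold;
    if (this.state === "half-open" || tripped) {
      this.state = "open";
      this.openedAt = this.now();
      this.successes = 0;
      this.failures = 0;
    }
  }

  close() {
    this.state = "closed";
    this.successes = 0;
    this.failures = 0;
  }
}
```

While the circuit is open, `fire` throws without invoking the wrapped action at all, which is what saves the time and resources.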

Node.js Opossum circuit breaker states

To learn more about how to add a circuit breaker to your Node.js application, check out Opossum. You can read more on circuit breakers here.

Log Your Checkpoints

A good logging setup allows you to spot errors quickly. You can create visualizations to understand your app's behavior, set up alerts, and debug efficiently.

You can check out the ELK stack for setting up a good logging and alerting pipeline.

While logging is an essential tool, it is very easy to overdo it. If you start logging everything, you can end up exhausting your disk IOPS, causing your application to suffer.

A good rule of thumb is to log only checkpoints.

Checkpoints can be:

Requests, as they enter the main control flow in your application and after they are validated and sanitized.
Request and response when interacting with an external service/SDK/API.
The final response to that request.
Helpful error messages for your catch handlers (with sane defaults for error messages).

PS: If a request goes through multiple services during the lifecycle, you can pass along a unique ID in the logs to capture a particular request across all the services.
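
One way to carry a unique ID through every checkpoint is to bind it to a small logger created once per request. The sketch below is an assumption about how that might look: `createRequestLogger`, the field names (`ts`, `requestId`, `checkpoint`), and the pluggable `sink` are hypothetical. Only `randomUUID` from Node's built-in `crypto` module is a real API.

```javascript
const { randomUUID } = require("node:crypto");

// Returns a log function bound to one request ID, so that every checkpoint
// line for the same request can be correlated across services.
function createRequestLogger(requestId = randomUUID(), sink = console.log) {
  return function log(checkpoint, details = {}) {
    const entry = { ts: new Date().toISOString(), requestId, checkpoint, ...details };
    sink(JSON.stringify(entry)); // one JSON object per line is easy to index and search
    return entry;
  };
}
```

A service that receives the ID from an upstream caller would pass it in as `requestId` instead of generating a new one, and forward it on any outgoing calls.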

Use Kafka Over HTTP Requests

While HTTP has its use-cases, it is easy to overdo it. Avoid using HTTP requests where it is not necessary.

Let's understand this with the help of an example.

Overview of Kafka pub-sub using topics

Let's say you are building a product like Amazon and there are two services:

Vendor service
Inventory service

Whenever you receive new stock from the vendor service, you push the stock details to a Kafka topic. The inventory service listens to that topic and updates the database acknowledging the fresh restock.

Note that you push the new stock data into the pipeline and move on. It is consumed by the inventory service at its own pace. Kafka allows you to decouple services.

Now, what happens if your inventory service goes down? With HTTP requests, recovering the missed updates is not straightforward. With Kafka, you do not lose data after consumption, so you can replay the intended messages (for example, using kcat).

When an item comes back in stock, you might want to send out notifications to the users who wishlisted it. To do that, your notification service can listen to the same topic as the inventory service. This way, a single message bus is consumed at various places without HTTP overhead.
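
To make the replay and multiple-consumer ideas concrete, here is a toy in-memory log. It is not Kafka or KafkaJS, and real Kafka adds partitions, brokers, and retention policies. The `TopicLog` class and its `publish`, `poll`, and `seek` methods are hypothetical names. The sketch only shows why keeping messages after they are read allows independent consumers and reprocessing.

```javascript
// Toy append-only log (illustrative only; NOT Kafka).
class TopicLog {
  constructor() {
    this.messages = [];       // messages are retained after being read
    this.offsets = new Map(); // consumer group -> next offset to read
  }

  publish(message) {
    this.messages.push(message);
  }

  // Return the messages this group has not seen yet and advance its offset.
  poll(group) {
    const from = this.offsets.get(group) ?? 0;
    this.offsets.set(group, this.messages.length);
    return this.messages.slice(from);
  }

  // Rewind a group so it can reprocess messages, for example after an outage.
  seek(group, offset) {
    this.offsets.set(group, offset);
  }
}
```

Because each group tracks its own offset, an "inventory" group and a "notifications" group can both read every restock message, and either can be rewound without affecting the other.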

The Getting Started page of KafkaJS shares the exact snippet to get you started with a basic setup in your Node.js application. I’d highly recommend checking it out, as there's a lot to explore.

Look Out for Memory Leaks

If you don't write memory-safe code and don't profile your application often, you may end up with a crashed server.

You do not want your profiling results to look like this:

setTimeout retaining 98% memory after execution is over

For starters, I would recommend the following:

Run your Node.js API with the --inspect flag.
Open chrome://inspect/#devices in your Chrome browser.
Click inspect > Memory tab > Allocation instrumentation on timeline.
Perform some operations on your app. You can use apache bench on macOS to fire off multiple requests. Run curl cheat.sh/ab in your terminal to learn how to use it.
Stop the recording and analyze the memory retainers.

If you find any large blocks of retained memory, try to minimize them. There are a lot of resources on this topic. Start by googling "how to prevent memory leaks in Node.js".

Profiling your Node.js application and looking for memory utilization patterns should be regular practice. Let's make "Profiling Driven Refactor" (PDR) a thing?

Use Caching to Prevent Excessive Database Lookup

The goal is to not hit the database for every request your application gets. Storing the results in cache decreases the load on your database and boosts performance.

There are two strategies when working with caching.

Write through caching makes sure the data is inserted into the database and the cache when a write operation happens. This keeps the cache relevant and leads to better performance. Downsides? Expensive cache as you store infrequently used data to the cache as well.

Whereas in lazy loading, the data is only written to the cache when it is first read. The first request serves the data from the database, but subsequent requests use the cache. It has a smaller cost but an increased response time for the first request.
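
The two strategies can be compared side by side in a small sketch. Plain `Map` objects stand in for the database and the cache here, and `createStore`, `writeThrough`, `readLazy`, and the `dbReads` counter are hypothetical names used only for illustration.

```javascript
// Illustrative sketch: `db` and `cache` are Maps standing in for a real
// database and a cache such as Redis.
function createStore(db = new Map(), cache = new Map()) {
  let dbReads = 0;
  return {
    // Write-through: every write updates the database and the cache together.
    writeThrough(key, value) {
      db.set(key, value);
      cache.set(key, value);
    },
    // Lazy loading: the cache is only populated when a key is first read.
    readLazy(key) {
      if (cache.has(key)) return cache.get(key); // cache hit, no database work
      dbReads += 1;                              // cache miss, go to the database
      const value = db.get(key);
      if (value !== undefined) cache.set(key, value);
      return value;
    },
    dbReads: () => dbReads,
  };
}
```

Reading the same key twice with `readLazy` hits the database only once, while a key stored with `writeThrough` never causes a database read at all, at the price of caching data that may never be requested.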

To decide the TTL (or Time To Live) for the cached data, ask yourself:

How often does the underlying data change?
What is the risk of returning outdated data to the end user?

If that risk is acceptable, a longer TTL will give you better performance.

Importantly, add a slight delta to your TTLs. If your application receives a large burst of traffic and all of your cached data expires at once, it can lead to unbearable load on the database, affecting user experience.

final TTL = estimated value of TTL + small random delta
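
As a minimal sketch of that formula, the helper below adds a random number of seconds to a base TTL. The function name, the `maxJitterSeconds` parameter, and the injectable `random` source are assumptions for the example.

```javascript
// final TTL = estimated TTL + small random delta (illustrative sketch).
// The delta is a whole number of seconds between 0 and maxJitterSeconds.
function ttlWithJitter(baseSeconds, maxJitterSeconds, random = Math.random) {
  const delta = Math.floor(random() * (maxJitterSeconds + 1));
  return baseSeconds + delta;
}
```

Calling `ttlWithJitter(300, 30)` for each key spreads expirations over a 30-second range instead of letting every entry expire in the same instant.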

There are a number of policies to perform cache eviction. But leaving it on default settings is a valid and accepted approach.

Use Connection Pooling

Opening a standalone connection to the database is costly. It involves TCP handshake, SSL, authentication and authorization checks, and so on.

Instead, you can leverage connection pooling.

Database connection pool

A connection pool holds multiple connections at any given time. Whenever you need it, the pool manager assigns any available/idle connection. You get to skip the cold start phase of a brand new connection.

Why not max out the number of connections in the pool, then? Because it highly depends on your hardware resources. If you ignore it, performance can take a massive toll.

The more the connections, the less RAM each connection has, and the slower the queries that leverage RAM (for example sort). The same principle applies to your disk and CPU. With every new connection, you are spreading your resources thin across the connections.
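
The reuse behavior and the fixed upper bound can be sketched with a toy pool. This is not how node-postgres or the MongoDB driver implement pooling; `ConnectionPool`, `acquire`, and `release` are hypothetical names, and a real pool would make callers wait for a free connection instead of throwing.

```javascript
// Toy fixed-size connection pool (illustrative sketch only).
class ConnectionPool {
  constructor(createConnection, size) {
    this.createConnection = createConnection; // the expensive step (handshake, auth, ...)
    this.size = size;                         // upper bound chosen for your hardware
    this.idle = [];
    this.total = 0;
  }

  acquire() {
    if (this.idle.length > 0) return this.idle.pop(); // reuse: no cold start
    if (this.total < this.size) {
      this.total += 1;
      return this.createConnection(); // pay the connection cost at most `size` times
    }
    throw new Error("pool exhausted"); // a real pool would queue the caller instead
  }

  release(connection) {
    this.idle.push(connection);
  }
}
```

However many queries run, the expensive `createConnection` step happens at most `size` times, and the cap keeps the database from spreading its RAM, disk, and CPU across too many connections.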

You can tweak the number of connections till it matches your needs. For starters, you can get an estimate on the size you need from here.

Read about the MongoDB connection pool here. For PostgreSQL, you can use the node-postgres package. It has built-in support for connection pooling.

Seamless Scale-ups

When your application's user base is starting to grow and you have already hit the ceiling on vertical scaling, what do you do? You scale horizontally.

Vertical scaling means increasing the resources of a node (CPU, memory, etc.) whereas horizontal scaling involves adding more nodes to balance out the load on each node.

If you’re using AWS, you can leverage Auto Scaling groups (ASG), which horizontally scale the number of servers based on a predefined rule (for example, when CPU utilization is more than 50%).

You can even pre-schedule the scale up and scale down using scheduled actions in case of predictable traffic patterns (for example during the World Cup finals for a streaming service).

Once you have your ASG in place, adding a load balancer in front will make sure the traffic is routed to all the instances based on a chosen strategy (like round robin, for example).

Load balancing multiple targets based on predefined rules

PS: It is always a good idea to estimate the requests your single server can handle (CPU, memory, disk, and so on) and allocate at least 30% more.
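
The headroom rule can be turned into quick arithmetic. The helper below reads "allocate at least 30% more" as adding 30% on top of the expected peak load, which is one interpretation of the rule. The function name and the example figures are hypothetical.

```javascript
// Number of servers needed for a peak load with spare headroom.
// peakRps: expected peak requests per second
// perServerRps: measured capacity of a single server
// headroom: extra fraction to allocate (0.3 = 30% more)
function serversNeeded(peakRps, perServerRps, headroom = 0.3) {
  return Math.ceil((peakRps * (1 + headroom)) / perServerRps);
}
```

For instance, with an assumed peak of 5,000 requests/sec and servers that each handle 800 requests/sec, the raw need is 6.25 servers, and with 30% headroom it becomes 8.125, so you would provision 9.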

OpenAPI Compliant Documentation

It might not directly affect your ability to scale a Node.js application, but I had to include this in the list. If you've ever done an API integration, you know it.

It is crucial to know everything about the API before you take a single step forward. It makes it easy to integrate, iterate, and reason about the design. Not to mention the gains in the speed of development.

Make sure to create an OpenAPI Specification (OAS) document for your Node.js API.

It allows you to create API documentation in an industry-standard manner. It acts as a single source of truth. When defined properly, it makes interacting with the API much more productive.

I have created and published a sample API documentation here. You can even inspect any API using the swagger inspector.

You can find all of your API documentation and create new documents from the Swagger Hub dashboard.

Now you go, captain!

We have looked at ten lesser-known best practices to prepare Node.js for scale and how you can take your first steps with each one of them.

Now it is your turn to go through the checklist and explore the ones you find lacking in your Node.js application.

Grab your checklist ✨

I hope you found this helpful and it gave you some pointers to move forward in your scalability endeavor. This is not an exhaustive list of all the best practices – I have just included the ones I found are not talked about as much based on my experience.

Feel free to reach out on Twitter. I'd love to hear your feedback and suggestions on other best practices that you are using.

Horizontal vs. Vertical Scaling – How to Scale a Database

Sophia Iroegbu — Thu, 09 Jun 2022 15:26:24 +0000

Data Scalability

Data scalability refers to the ability of a database to handle changing demands as data is added and removed. In this way, the database grows at the same pace as the software.

Via scaling, the database can expand or contract the capacity of the system's resources to support the application's frequently changing usage.

There are two ways a database can be scaled:

Horizontal scaling (scale-out)
Vertical scaling (scale-up)

In this article, we'll look at both methods of scaling and discuss the advantages and disadvantages of each to help you choose.

Horizontal Scaling

This scaling approach adds more database nodes to handle the increased workload. It spreads the load across servers rather than expanding the capacity of an individual server.

When you need more capacity, you can add more servers to the cluster. Another name for this scaling method is Scaling out.

Advantages of Horizontal Scaling:

It is easy to upgrade
It is simple to implement and costs less
It offers flexible, scalable tools
It has limitless scaling with unlimited addition of server instances
Upgrading a horizontally scaled database is easy – just add a node to the cluster

Disadvantages of Horizontal Scaling:

Any bugs in the code will become more complex to debug and understand
The licensing fee is expensive as you will have more nodes that are licensed
The cost of the data center will increase significantly because of the increased space, cooling, and power required

When to use horizontal scaling:

If you are dealing with more than a thousand users, it is best to use this scaling system because when the servers receive multiple user requests, everything will scale well.

It will also not crash because there are multiple servers.

Vertical Scaling

The vertical scaling approach increases the capacity of a single machine by increasing the resources in the same logical server. This involves adding resources like memory, storage, and processing power to the existing server, enhancing its performance.

This is the traditional method of scaling a database. Another name for this approach is Scale-up.

Advantages of Vertical Scaling:

The cost of the data center for the space, cooling, and power will be smaller
It is a cost-efficient approach
It is easy to use and implement – the administrator can easily manage and maintain the software
The resources for this approach are flexible

Disadvantages of Vertical Scaling:

The cost may be low, but you will need to pay for a license each time you scale up
The hardware costs more because of high-end servers
There is a limit to the amount you can upgrade
You are restricted to a single database vendor, and migration is challenging, or you may need to start over

When to use vertical scaling:

The vertical scaling approach is for you if you need a system with strong data consistency.

If you don't want to worry about balancing the server's workload, vertical scaling is the best option.

Differences Between Vertical and Horizontal Scaling

Vertical	Horizontal
The license costs less	The license costs more
This method increases the power of the existing server	This method increases capacity by adding individual servers
The data lives on a single node, and the load is spread across that machine's cores	The data is partitioned so that each node holds only a part of it

Which scaling method is best for your app?

When choosing how to scale your database, you must consider what's at stake when you scale up and out.

Now we'll take a look at some factors to consider so you can choose which scaling system is best for your app:

Load balancing

The vertical scaling system is simpler here: because you have a single server, there is no need to balance your load. Horizontal scaling requires you to balance the workload evenly across servers.

Point of failure

The horizontal scaling system has more than one server, so when one server crashes, the next one picks up the slack. This means that there is no single point of failure which makes the system resilient.

But in the vertical scaling system, there is only one server, so once the server crashes, everything goes offline.

Speed

In terms of speed, the vertical scaling system is faster because it runs on one server and relies on interprocess communication – that is, the server communicates within itself, which is fast.

The horizontal scaling system makes network calls between two or more servers, typically as Remote Procedure Calls (RPCs). RPCs are slower than interprocess communication.

Data consistency

When dealing with servers, you'll need to make sure that the data stored in them is consistent when end users send a request.

The vertical scaling system is data consistent because all information is on a single server. But the horizontal scaling system is scaled out with multiple servers, so data consistency can be a huge issue.

Hardware limitations

The horizontal scaling system scales well because the number of servers you add can grow linearly with the number of users. The vertical scaling system, on the other hand, is limited because everything runs on a single server.

When choosing a system to scale your database, make sure to make a pros and cons list of the information in this article. It will help you decide which to use.

Conclusion

Scalability in a cloud computing model is the ability to quickly increase or decrease IT capacity. Knowing how the two types of scaling work is crucial, as this plays a massive role in your database or server management.

Quick recap...

Enhancing a single server's capacity to handle an increased workload is called vertical scaling.
Adding new nodes to share a distributed workload is called horizontal scaling.
The horizontal scaling system scales well as the number of users increases.
The vertical scaling system is faster because it relies on inter-process communication rather than network calls.

Thanks for reading!

How to Scale a Distributed System

freeCodeCamp — Mon, 13 Dec 2021 23:37:24 +0000

By Apoorv Tyagi

Designing a distributed system that supports millions of users is a complex task, and one that requires continuous improvement and refinement.

Recently I read a book by Alex Xu called "System Design Interview – An Insider's Guide". This article, inspired by the first part of the book, shares some popular techniques used by many large tech companies to scale their architecture to support up to a million users.

This is not an exhaustive list, but if you're a newer developer who's just getting started, this can help you build a stronger foundation for your career.

Use a Load Balancer

A load balancer is a device that evenly distributes network traffic across several web servers. In this architecture, the clients do not connect to the servers directly – instead they connect to the public IP of the load balancer.

Using a load balancer also protects your site in the event of web server failure – and this, in turn, improves availability. For example:

If one server goes down, all the traffic can be routed to the second server. This prevents the overall system from going offline.
If in the future the traffic grows and these two servers are not enough to handle all the requests properly, then you just need to add more servers to your pool of web servers and the load balancer automatically starts distributing requests to them.

Load Balancing Algorithms

Let's look at some of the algorithms which a load balancer can use to choose a web server from a pool for an incoming request:

Round Robin – You start from the first server in the pool, move down to the next server, and when you're done with the last server you loop back up to the first and start working down the pool again.
Load-based server – You assign a server based on whichever server has the smallest load currently, thereby increasing throughput.
IP Hashing – You assign a server by hashing the IP address of incoming requests and using the hash value to do the modulo operation with the number of servers available in the server pool.
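
The three selection strategies above can each be sketched as a small function. These are simplified illustrations rather than any particular load balancer's implementation. The function names, the `activeConnections` map used as the load measure, and the simple string hash are assumptions.

```javascript
// Round robin: cycle through the pool in order, wrapping around at the end.
function createRoundRobin(servers) {
  let next = 0;
  return () => {
    const server = servers[next];
    next = (next + 1) % servers.length;
    return server;
  };
}

// Load-based: pick the server with the fewest active connections.
function pickLeastLoaded(servers, activeConnections) {
  return servers.reduce((best, s) =>
    (activeConnections[s] ?? 0) < (activeConnections[best] ?? 0) ? s : best);
}

// IP hashing: hash the client IP, then take it modulo the pool size.
function pickByIpHash(servers, ip) {
  let hash = 0;
  for (const ch of ip) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple non-cryptographic hash
  }
  return servers[hash % servers.length];
}
```

IP hashing sends the same client to the same server for as long as the pool size is unchanged, which helps when servers keep per-client state in memory.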

Use Caching

A cache stores the result of the previous responses so that any subsequent requests for the same data can be served faster. So you can use caching to minimize the network latency of a system.

You can significantly improve the performance of an application by decreasing the network calls to the database. This is because repeated database calls are expensive and cost time.

For example, every time a new user loads a website's home page, one or more database calls are made to fetch the data. This increases the response time. Caching can alleviate this problem by storing results that are requested often and modified infrequently.

Here are a few considerations to keep in mind before using a cache:

Set an expiration policy: You should always have an expiration policy on your cache. If you don't have one, the data will get stored in the cache permanently and it will become stale.
Sync the cache and database: You should build a mechanism to keep the database and the cache in sync. If any data modifying operations occur in the databases and the same change doesn't reflect in the cache then it will introduce inconsistencies in your system.
Set an eviction policy: You should have an algorithm that can decide which existing items will get removed once the cache is full and you get a request to add other items to the cache. Least-recently-used (LRU) is one of the most popular cache eviction policies used today.
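
As an illustration of the LRU eviction policy mentioned above, here is a small cache built on JavaScript's `Map`, which iterates keys in insertion order. The `LruCache` class name and its `capacity` parameter are hypothetical. Production caches such as Redis approximate LRU internally rather than tracking exact order like this.

```javascript
// Small LRU cache: the first key in the Map is always the least recently used.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = new Map();
  }

  get(key) {
    if (!this.items.has(key)) return undefined;
    const value = this.items.get(key);
    this.items.delete(key); // re-insert to mark the key as most recently used
    this.items.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.items.has(key)) this.items.delete(key);
    this.items.set(key, value);
    if (this.items.size > this.capacity) {
      const oldest = this.items.keys().next().value; // least recently used key
      this.items.delete(oldest);
    }
  }
}
```

With a capacity of 2, setting `a` and `b`, reading `a`, and then setting `c` evicts `b`, because `a` was touched more recently.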

Use a Content Delivery Network (CDN)

A CDN or a Content Delivery Network is a network of geographically distributed servers that help improve the delivery of static content from a performance perspective. CDN servers are generally used to cache content like images, CSS, and JavaScript files.

Here is how a CDN works:

When a client sends a request, the CDN server closest to the client delivers the static content related to the request.
If the CDN server does not have the required file, it then sends a request to the original web server.
The CDN caches the file and returns it to the client.
Let's say now another client sends the same request, then the file is returned from the CDN.

Here are a few considerations to keep in mind before using a CDN:

Cost: CDNs are generally run by third-party providers, and they charge you for the data transfers in and out of the CDN. So infrequently used assets should not be stored in the CDN.
Fallback Mechanism: If a CDN fails, you should be able to detect it and start sending requests for resources from the original web server. So you should build a mechanism for how your application copes with a CDN failure.

Set Up a Message Queue

A message queue allows an asynchronous form of communication. It acts as a buffer for the messages to get stored on the queue until they are processed.

The architecture of a message queue includes input services, called publishers, that create messages and publish them to the message queue. Other services, called subscribers, receive these messages and perform the actions they define.

Both publishers and subscribers are decoupled from each other and that's what makes the message queue a preferred architecture for building scalable applications.

Message queue example

Consider the following use case:

You are building an application for ticket booking. As soon as a user completes their booking, a message confirming their payment and ticket should be triggered. This task may take some time to complete and it should not make our system wait for processing the next request.

Here, we can push the message details along with other metadata like the user's phone number to the message queue. Another worker service picks up the jobs from the message queue and asynchronously performs the message creation and sending tasks.

The publishers and the subscribers can be scaled independently. When the size of the queue increases, you can add more consumers to reduce the processing time.
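
The ticket booking flow can be sketched with an in-memory array standing in for the queue. A real system would use a broker, and everything named here (`MessageQueue`, `completeBooking`, `runWorker`, the message fields, and the `send` callback) is hypothetical.

```javascript
// In-memory stand-in for a message queue (illustrative only).
class MessageQueue {
  constructor() {
    this.messages = [];
  }

  publish(message) {
    this.messages.push(message);
  }

  // Hand the oldest pending message to a handler; returns false when empty.
  consume(handler) {
    if (this.messages.length === 0) return false;
    handler(this.messages.shift());
    return true;
  }
}

// Publisher: the booking flow enqueues the job and returns immediately,
// without waiting for the confirmation message to be sent.
function completeBooking(queue, booking) {
  queue.publish({ type: "booking.confirmed", phone: booking.phone, ticketId: booking.ticketId });
  return { status: "confirmed", ticketId: booking.ticketId };
}

// Subscriber: a worker drains the queue at its own pace.
function runWorker(queue, send) {
  let processed = 0;
  while (queue.consume((msg) => send(msg.phone, `Ticket ${msg.ticketId} confirmed`))) {
    processed += 1;
  }
  return processed;
}
```

The booking request never calls `send` itself, so a slow SMS or email provider delays only the worker. Running more workers against the same queue is how the consumer side scales independently.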

Choose Your Database Wisely

According to Wikipedia:

A database is an organized collection of data stored and accessed via a computer system.

Databases are used for the persistent storage of data. We generally have two types of databases, relational and non-relational.

➔ Relational Database

A relational database has strict relationships between entries stored in the database and they are highly structured. This is to ensure data integrity. For example, adding a new field to the table when its schema doesn't allow for it will throw an error.

Another important feature of relational databases is ACID transactions.

ACID transactions

ACID is a set of properties that describe the transactions (sets of read or write operations) that a good relational database should support.

Atomicity means that when a transaction that comprises more than one operation takes place, the database must guarantee that if one operation fails the entire transaction fails. Either it happens completely or doesn't happen at all.

Consistency means that each transaction leaves the database in a valid state: it does not violate the data integrity constraints or corrupt the data. In simple terms, a transaction takes the database from one valid state to another.

Isolation means that you can run multiple concurrent transactions on a database, without leading to any kind of inconsistency. All these multiple transactions will occur independently of each other.

Durability means that once the transaction has completed execution, the updated data remains stored in the database. It will be saved on a disk and will be persistent even if a system failure occurs.

➔ Non-Relational Databases

A non-relational database has a less rigid structure and may or may not have strict relationships between the entries stored in the database. The data typically is stored as key-value pairs. For example:

[
    { 
        firstName: "Apoorv",
        lastName: "Tyagi",
        gender: "M"
    },
    { 
        name: "Judit",
        rank: "Polgar",
        gender: "F"
    },
    {
      //...
    },
]

Similar to the ACID properties of relational databases, the non-relational database offers BASE properties:

Basically Available (BA) which states that the system guarantees availability even in the presence of multiple failures.

Soft State (S) means the state of the system may change over time, even without application interaction due to eventual consistency. In NoSQL, unlike RDBMS, it is believed that data consistency is the developer's responsibility and should not be handled by the database.

Eventual Consistency (E) means that the system will become consistent "eventually". However, there's no guarantee of when this will happen.

NoSQL vs SQL

Non-relational databases (also often referred to as NoSQL databases) might be a better choice if:

Your application requires low latency, since there are no complex JOIN queries.
You have a large amount of unstructured data, or you do not have any relation among your data.

How to Scale a Database

Let's now look at the various ways you can scale your database:

Vertical vs horizontal database scaling

In vertical scaling, you scale by adding more power (CPU, RAM) to a single server.

In horizontal scaling, you scale by simply adding more servers to your pool of servers.

For low-scale applications, vertical scaling is a great option because of its simplicity. But vertical scaling has a hard limit: it is not practically possible to add unlimited CPU, RAM, and storage to a single server.

Because of this, horizontal scaling (for databases, commonly done through sharding) is recommended for large-scale applications.

Database replication

This is the process of copying data from your central database to one or more databases.

You do database replication using primary-replica (formerly known as master-slave) architecture. The primary database generally only supports write operations. All the data modifying operations like insert or update will be sent to the primary database.

On the other hand, the replica databases get copies of the data from the primary database and only support read operations. All the data querying operations like read, fetch will be served by replica databases.

Advantages of database replication:

Performance Improvements: Database replication improves performance significantly as all the writes and updates happen in the primary node and all the read operations are distributed to replica nodes, thereby allowing more queries to run in parallel.
High Availability: Since we create replicas of data across different nodes available in different parts of the world, the application remains functional even if one database node goes offline as you can access data from other nodes. In case the failure occurs in the primary node, any one of the replica nodes will get promoted to a primary node and serve the write/update operations until the original primary node comes back online.

Wrapping Up

That's it. Thanks for stopping by. I hope you found this article interesting and informative!

My DMs are always open if you want to discuss further on any tech topic or if you've got any questions, suggestions, or feedback in general:

Happy learning! 💻 😄

A guide to understanding database scaling patterns

freeCodeCamp — Tue, 30 Jul 2019 17:07:05 +0000

By Kousik Nath

There are a lot of articles online describing database scalability patterns, but they are mostly scattered: techniques defined haphazardly without much context. I find that they are not laid out step by step, and they don't discuss when to choose which scaling option, which options are feasible in practice, and why.

Therefore, I am planning to discuss some of the techniques in detail in future articles. To start, I feel it’s better to walk through the techniques step by step, with some context, in my own way. This is a high-level article: I will not discuss scaling techniques in detail here, but will provide an overview. So let's get started.

A case study

Assume you have built a startup which offers ride sharing at a cheap cost. Initially when you start, you target a city and hardly have tens of customers after your initial advertisement.

You save all customers, trips, locations, bookings data, and customer trip history in the same database or most likely in a single physical machine. There is no fancy caching or big data pipeline to solve problems since your app is very new. This is perfect for your use case at this moment since there are very few customers and your system hardly books 1 trip in 5 minutes, for example.

But as time goes on, more people start signing up in your system since you are the cheapest service in the market and thanks to your promotion and ads. You start booking, say, 10 bookings per minute, and slowly the number increases to 20, 30 bookings per minute.

At this point of time, you realize that the system has started performing poorly: API latency has increased a lot, and some transactions deadlock or starve and eventually they fail. Your app is taking more time to respond, causing customer dissatisfaction. What can you do to solve the problem?

Pattern 1 - Query Optimization & Connection Pool implementation:

The first solution that comes to mind is to cache frequently used non-dynamic data like booking history, payment history, user profiles, and so on. But application layer caching can’t solve the latency problem of APIs that expose dynamic data, like the current driver's location, the nearest cabs for a given customer, or the current trip cost at a certain moment after the trip starts.

You identify that your database is probably heavily normalized, so you denormalize by introducing some redundant columns in highly used tables (columns that frequently appear in WHERE or JOIN ON clauses). This reduces join queries. You can also break a big query into multiple smaller queries and combine their results in the application layer.

Another parallel optimization that you can do is tweaking around database connections. Database client libraries and external libraries are available in almost all programming languages. You can use connection pool libraries to cache database connections or can configure connection pool size in the database management system itself.

Creating any network connection is costly since it requires some back & forth communication between client & server. Pooling connections may help you to optimize the number of connections. Connection pool libraries may help you to multiplex connections — multiple application threads can use the same database connection. I shall see if I can explain connection pooling in detail in a separate article later.

You measure the latency of your APIs & find it has dropped by perhaps 20–50% or more. This is a good optimization at this point in time.

You have now scaled your business to one more city, more customers sign up, and you slowly start to do 80–100 bookings per minute. Your system is not able to handle this scale. Again you see that API latency has increased and the database layer has given up, but this time no query optimization gives you any significant performance gain. You check the system metrics and find that disk space is almost full, the CPU is busy 80% of the time, and RAM fills up very quickly.

Pattern 2 - Vertical Scaling or Scale Up:

After examining all system metrics, you know there is no easy solution other than upgrading the hardware of the system. You double your RAM and increase disk space by, say, 3 times or more. This is called vertical scaling or scaling up your system. You ask your infrastructure team, DevOps team, or third-party data centre agents to upgrade your machine.

But how do you set up the machine for vertical scaling?

You allocate a bigger machine. One approach is not to migrate data manually from the old machine, but rather to set the new machine up as a replica of the existing machine (the primary), creating a temporary primary-replica configuration. Let the replication happen naturally. Once the replication is done, promote the new machine to primary & take the older machine offline. Since the bigger machine is expected to serve all requests, all reads & writes will happen on this machine.

Cool. Your system is up & running again with increased performance.

Your business is doing very well & you decide to scale to 3 more cities, so you are now operational in 5 cities in total. Traffic is 3 times what it was earlier & you are expected to do around 300 bookings per minute. Before even reaching this target, you again hit a performance crunch: the database index size in memory is growing heavily, it needs constant maintenance, & scanning tables with the index is slower than ever. You calculate the cost of scaling up the machine further but are not convinced by the cost. What do you do now?

Pattern 3 - Command Query Responsibility Segregation (CQRS):

You identify that the big machine is not able to handle all read/write requests. Also, in most cases, a company needs transactional capability on write operations but not on read operations. You are also fine with slightly inconsistent or delayed reads & your business has no issue with that either. You see an opportunity to separate the read & write operations onto different physical machines. That gives individual machines room to handle more read or write operations.

You now take two more big machines & set them up as replicas of the current machine. Database replication will take care of distributing data from the primary machine to the replica machines. You route all read queries (the Query (Q) in CQRS) to the replicas, where any replica can serve any read request, & you route all write queries (the Command (C) in CQRS) to the primary. There might be a little lag in the replication, but for your business use case that’s fine.
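The routing rule above can be sketched in a few lines of Python. This is only an illustration: `ReadWriteRouter` and the host names are hypothetical, and classifying a statement by its first keyword is a simplifying assumption that a real application would likely replace with explicit read and write code paths.

```python
import itertools


class ReadWriteRouter:
    """Send reads to replicas (round-robin) and everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._replicas)  # any replica can serve any read
        return self.primary  # INSERT / UPDATE / DELETE go to the primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM trips WHERE id = 7"),
    router.route("SELECT * FROM drivers"),
    router.route("INSERT INTO payments VALUES (1, 250)"),
]
```

Because replicas may lag behind the primary, a read routed this way can briefly miss a write that has just been accepted.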

Most medium scale startups which serve a few hundred thousand requests every day can survive with a primary-replica set up, provided that they periodically archive older data.

Now you scale to 2 more cities & you see that your primary is not able to handle all write requests. Many write requests suffer from high latency. Moreover, the lag between primary & replica sometimes impacts customers & drivers. For example, when a trip ends, the customer pays the driver successfully, but the driver is not able to see the payment, since the customer’s activity is a write request that goes to the primary, while the driver’s activity is a read request that goes to one of the replicas. Your overall system is so slow that the driver is not able to see the payment for at least half a minute, which is frustrating for both driver & customer. How do you solve it?

Pattern 4 - Multi Primary Replication:

You scaled really well with the primary-replica configuration, but now you need more write performance. You might be ready to compromise a little bit on read request performance. Why not distribute the write requests to a replica also?

In a multi-primary configuration, all the machines can work as both primary & replica. You can think of multi-primary as a circle of machines, say A->B->C->D->A. B can replicate data from A, C can replicate data from B, D can replicate data from C, & A can replicate data from D. You can write data to any node. When reading data, you can broadcast the query to all nodes & return whichever reply comes back. All nodes will have the same database schema, the same set of tables, indexes etc. So you have to make sure there is no collision of ids across nodes in the same table, otherwise during broadcasting, multiple nodes would return different data for the same id.

Generally it’s better to use a UUID or GUID for the id. One more disadvantage of this technique is that read queries might be inefficient, since they involve broadcasting the query & picking the correct result, which is basically a scatter-gather approach.
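To make the id point concrete, here is a small Python sketch. It assumes two nodes that each accept writes on their own, and `new_booking` is a hypothetical helper. Each node generates ids with the standard `uuid` module, so rows can later be merged without coordinating a shared counter.

```python
import uuid


def new_booking(node: str, rider: str) -> dict:
    # uuid4() is random, so nodes do not need to agree on a sequence.
    return {"id": str(uuid.uuid4()), "node": node, "rider": rider}


# Two nodes accept writes independently of each other.
node_a = [new_booking("A", f"rider-{i}") for i in range(3)]
node_b = [new_booking("B", f"rider-{i}") for i in range(3)]

# After replication, every row is still addressable by a distinct id.
merged = {row["id"]: row for row in node_a + node_b}
```

With auto-increment ids, both nodes would have produced ids 1, 2 and 3, and the merged table would hold conflicting rows for the same id.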

Now you scale to 5 more cities & your system is in pain again. You are expected to handle roughly 50 requests per second. You desperately need to handle a large number of concurrent requests. How do you achieve that?

Pattern 5 - Partitioning:

You know that your location database is getting high write & read traffic. The write:read ratio is probably 7:3. This is putting a lot of pressure on the existing databases. The location tables contain a few primary fields like longitude, latitude, timestamp, driver id, trip id etc. They do not have much to do with user trips, user data, payment data etc. What about separating the location tables into a separate database schema? What about putting that database on separate machines with a proper primary-replica or multi-primary configuration?

This is called partitioning of data by functionality. Different databases can host data categorized by different functionality, & if required the results can be aggregated in the back end layer. Using this technique, you can focus on scaling well those functionalities which demand high read/write throughput. The trade-off is that the back end or application layer has to take responsibility for joining the results when necessary, which probably means more code changes.
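The application-level join mentioned above might look roughly like the following Python sketch. Two in-memory `sqlite3` databases stand in for the separate trip and location machines. The table layouts and the `trip_with_route` function are assumptions made for this example.

```python
import sqlite3

# Two separate databases, standing in for two separate machines.
trips_db = sqlite3.connect(":memory:")
locations_db = sqlite3.connect(":memory:")

trips_db.execute("CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, rider TEXT)")
trips_db.execute("INSERT INTO trips VALUES (1, 'alice')")

locations_db.execute(
    "CREATE TABLE locations (trip_id INTEGER, lat REAL, lon REAL, ts INTEGER)"
)
locations_db.executemany(
    "INSERT INTO locations VALUES (?, ?, ?, ?)",
    [(1, 12.97, 77.59, 100), (1, 12.98, 77.60, 160)],
)


def trip_with_route(trip_id: int):
    """Join trip data and location data in the application layer."""
    trip = trips_db.execute(
        "SELECT trip_id, rider FROM trips WHERE trip_id = ?", (trip_id,)
    ).fetchone()
    if trip is None:
        return None
    points = locations_db.execute(
        "SELECT lat, lon FROM locations WHERE trip_id = ? ORDER BY ts",
        (trip_id,),
    ).fetchall()
    return {"trip_id": trip[0], "rider": trip[1], "route": points}


result = trip_with_route(1)
```

A single SQL JOIN is no longer possible because the tables live in different databases, so the function issues one query per database and combines the rows itself.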

Now imagine you have expanded your business to a total of 20 cities in your country & are planning to expand to Australia soon. The increasing demand on your app requires faster & faster responses. None of the above methods can take you much further now. You must scale your system in such a way that expanding to other countries or regions does not always require frequent engineering or architecture changes. How do you do that?

Pattern 6 - Horizontal Scaling:

You do a lot of googling, read a lot about how other companies have solved the issue, & come to the conclusion that you need to scale horizontally. You allocate, say, 50 machines. All have the same database schema, which in turn contains the same set of tables, but each machine holds only a part of the data.

Since all databases contain the same set of tables, you can design the system so that data has locality, i.e. all related data lands on the same machine. Each machine can have its own replicas, which can be used in failure recovery. Each of these databases is called a shard. A physical machine can hold one or multiple shards; that’s up to your design. You need to decide on a sharding key in such a way that a single sharding key always refers to the same machine. So you can imagine a lot of machines all holding related data in the same set of tables, with read/write requests for the same row or the same set of resources landing on the same database machine.
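One simple way to make a sharding key always refer to the same machine is to hash it and take the result modulo the number of shards. The Python sketch below assumes this hash-and-modulo scheme, which the article itself does not prescribe. `shard_for` and the key format are hypothetical.

```python
import hashlib

NUM_SHARDS = 50  # the "say 50 machines" from the text, one shard each


def shard_for(sharding_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard number in a stable way."""
    # A cryptographic digest is stable across processes and restarts,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All rows that share a key (here: one driver) land on the same shard.
first = shard_for("driver:42")
second = shard_for("driver:42")
spread = {shard_for(f"driver:{i}") for i in range(1000)}
```

A known drawback of plain modulo is that changing `num_shards` remaps most keys, which is one reason schemes such as consistent hashing or a lookup table are often considered instead.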

Sharding is in general hard, or at least engineers from different companies say so. But when you serve millions or billions of requests, you have to make such tough decisions.

I will discuss sharding in greater detail in my next post, so I am holding back my temptation to discuss more in this post.

Now that you have sharding in place, you are confident that you can scale to many countries. Your business has grown so much that investors are pushing you to scale across continents. Again you see a problem here: API latency. Your service is hosted in the USA & people in Vietnam are having a difficult time booking rides. Why? What do you do about it?

Pattern 7 - Data Centre Wise Partition:

Your business is growing in America, South Asia & in a few countries in Europe. You are doing millions of bookings daily with billions of requests hitting your servers. Congrats - this is a peak moment for your business.

But since requests from the app have to travel across continents through hundreds or thousands of servers on the internet, latency rises. What about distributing traffic across data centres? You can set up a data centre in Singapore that handles all requests from South Asia, a data centre in Germany that handles all requests from European countries, and a California data centre that handles all USA requests.

You also enable cross data centre replication, which helps with disaster recovery. So if the California data centre replicates to the Singapore data centre, then in case the California data centre crashes due to an electricity issue or natural calamity, all USA requests can fall back to the Singapore data centre, and so on.
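The region-to-data-centre mapping and the fallback chain can be sketched as follows in Python. The California to Singapore fallback comes from the text. The other two fallback entries, the region names and `pick_data_centre` are assumptions added so that the example is complete.

```python
# Which data centre normally serves each region (from the text).
PRIMARY = {"south-asia": "singapore", "europe": "germany", "usa": "california"}

# Where traffic goes if a data centre is down. Only the first entry is
# from the text; the other two are hypothetical.
FALLBACK = {"california": "singapore", "singapore": "germany", "germany": "california"}


def pick_data_centre(region: str, healthy: set[str]) -> str:
    """Return the first healthy data centre along the fallback chain."""
    dc = PRIMARY[region]
    tried = set()
    while dc not in healthy:
        tried.add(dc)
        dc = FALLBACK.get(dc)
        if dc is None or dc in tried:
            raise RuntimeError(f"no healthy data centre for {region}")
    return dc


normal = pick_data_centre("usa", {"california", "singapore", "germany"})
failover = pick_data_centre("usa", {"singapore", "germany"})
```

Failing over only works if the fallback data centre already holds a replicated copy of the data, which is what the cross data centre replication provides.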

This scaling technique is useful when you have millions of customers to serve across countries, you can’t afford any data loss, & you always have to maintain availability of the system.

These are some general step by step techniques for database scaling. Although most engineers don’t get much chance to implement these techniques, it’s better to have a broad idea of such systems, which may help you do better system & architecture design in future.

In my next articles, I will try to discuss some of the concepts in detail. Please feel free to give appropriate feedback on this post, if any.

The article was originally published on the author's Medium account: https://medium.com/@kousiknath/understanding-database-scaling-patterns-ac24e5223522

How to Scale Elm Views with Master View Types

Cedd Burge — Thu, 18 Jul 2019 07:29:29 +0000

A concept to help Elm Views scale as applications grow larger and more complicated.

In Elm, there are a lot of great ways to scale the Model, and update, but there is more controversy around scaling the view. A lot of the debate is around Reusable Views versus Components. Components are not recommended, but a lot of people are still advocating for them. This article presents an idea that hopefully strengthens the argument for Reusable Views.

In almost all cases, the scaling problem comes down to enforcing consistency, which usually means allowing child views to make some adjustments to the master view, while at the same time not allowing child views to make a mess.

I will be using Richard Feldman's excellent Real World app (specifically written to demonstrate scaling in Elm) as an example, as it contains a lot of current best practice techniques, it is well known (2000+ stars and 300+ forks) and Richard is a well known Elm expert.

I will be suggesting some improvements to this code, so I want to make it clear at this point that I mean no disrespect by this (I would bet large sums of money that he did it in about one tenth of the time it would have taken me!). You could also argue that the problems are small and not worth fixing. Ultimately, this decision is yours, but by the end of the article I hope to persuade you that there are problems, and that they are fixable if you think it is worthwhile.

Master view functions with conditionals

One option is to define a master view function. This function takes care of shared concerns, like the header bar and overall layout. Then it calls child view functions depending on the current view and / or has parameters to control child specific behaviour.

This works, but can quickly lead to:

An explosion of parameters, potentially forcing your child views to return a lot of things they don't care about.
A mixing of responsibilities between master and child views.
Extra code and duplication.

In the Real World App, a parameter of type Page is passed to the master view so that it can render a navbar link as active. There is a large case statement that uses this parameter to work out which link is active, and it would be a lot easier for the child just to specify this.

The line below shows the master view passing Page.Home, which has to match up with Home.view home. This is easy to get wrong, there is no help from the compiler or type system, and really it is the responsibility of the child view to specify this.

viewPage Page.Home GotHomeMsg (Home.view home)

There is some duplication when creating the NavBarLink Html, and the linkTo function will accept any Html, although only very particular Html is valid.

Convention and trust

Another possibility is for child views to be responsible for keeping shared elements consistent, by convention and trust.

Arguably this also happens in the Real World App. The Home, Article and Profile views all have the concept of a banner. The banner is different in each view, but presumably is meant to be a consistent and recognisable visual element (essentially, it's the title / header for the view). The views don't share any code for these banners, and as a result of this they are not the same size or colour. You could theoretically try and enforce a convention using tests, but it would be difficult, and probably not worthwhile.

Helper functions

Another possibility is for child views to be responsible for keeping shared elements consistent, but by using some helper functions. This is definitely a step forward, and is probably the most common solution I see in the wild. The functions can go in the same file and be next to each other. This makes it easier to see that they are related and are representing the same visual element, and easier to make them consistent.

However, there are still some drawbacks. The main one is that the child views have to know to use the helper functions, and there is nothing enforcing this. This isn't a huge deal when you only have one shared element and one function to call, but as applications get bigger, you end up with a combinatorial explosion of differences in the shared visual elements. Most people tame this by providing a number of small, focused functions for the various differences. Then the child view has to know about all these functions, and how to compose them, and there is no help from the compiler.

Again, this arguably occurs in the Real World App: for example in this part of the Profile.view function, which needs to know how to use the viewTabs, Feed.viewArticles and Feed.viewPagination helper functions, and what Html they need to be contained in.

Scaling with Master View Types

In order to overcome these problems, I propose using a Type to define your site structure (I rather pompously call this a "Master View Type"). Child views then return this type, and the master view takes it as a parameter and returns the html.

For the Real World App examples we have been looking at, the Master View Type is below (Viewer is the person viewing the page in the Real World App). You could arguably have more general banner types here, such as AvatarBanner, or even IconBanner (instead of ViewerBanner) depending on your domain.

type alias Page =
    { activeNavBarLink : NavBarLink
    , banner : Banner
    , body : Html Msg
    }

type Banner
    = TextBanner TextBannerProperties
    | ViewerBanner Viewer
    | ArticleBanner Viewer ArticlePreview

type NavBarLink
    = NavBarLink NavBarLinkProperties

To demonstrate this, I have created a repository with just the Header and Banner parts of the Real World App and then created a new repository after refactoring to use a Master Page Type, NavBarLink Type and Banner Type. You can peruse the code to get a feel for how it works.

To my mind, using a Master Page Type has the following benefits:

Writing the master view code is easier
Writing the child view code is easier
Communication and understanding are improved, as UI concepts now have names
Theming / redesigning a site is a lot easier
Elm packages can provide UI templates

The master view can precisely define what it will accept / support via the types, with union types and opaque types. Non supported combinations can be made unrepresentable or uncreatable.

In my example repository the NavBarLink type is opaque, so it is only possible to create supported NavBarLinks (home, article and viewer). In a similar way Banner is a union type, which means that only supported variants can be represented.

It would be possible for a programmer to simply change these files, but a proficient programmer would recognise the patterns and follow them. If this isn't enough and you are feeling paranoid, then you can require stricter code review on such files, potentially taking advantage of CODEOWNERS functionality on GitHub and GitLab. In the extreme you can provide the modules via an elm package, and restrict push access to the underlying repository.

Child views don't have to do anything more than create an instance of the types. The helper functions all return types, so it's easy to see which functions can be used in a particular context, and it is impossible to use functions in the wrong context. For example, if a function returns a HeaderBarLink, it is impossible to mistakenly use this function to create a link in the FooterBar, or elsewhere on the page. Child views can also leave some of the complexity to the master view. For example, the child view can define a list of options to choose from, and the master view can render this using buttons, a drop down list or an autocomplete list, depending on the number of options.

The master page type also provides names for UI concepts, which can then be discussed. For example, a designer could say "Let's move the NavBarLinks to the left hand side", and everybody would know what they meant. A product owner could say "Let's create a new page with an IconBanner, and we'll use the current weather api for the icon" and again, everybody would know what they meant. You can look at this excellent ThoughtWorks article for more details of this.

Since the responsibility for turning the Master View Type into Html is all in the same place, it is easy to make drastic changes to the look and feel of a website, and to do theming. These changes and themes can alter the Css and the Html, which is something that the normal theming techniques just can't do. Pragmatically, your Master View Type will often have a body: Html Msg property (to allow child views complete flexibility on the child specific parts of the page) so there would still be some sprawling code to fix up, but it will definitely be a lot easier.

Finally, it opens up the possibility of providing ready made themes and site layouts as packages. This would allow you to just do the following to get a working app, complete with layout and styling:

create-elm-app
elm install elm-bootstrap-starter-template
Write some code to create the Master Page Type
elm-app start

Companies could create packages like these to ensure a consistent look and feel across their applications. Open source designs and layouts could emerge and become commonplace, similar to the way that Bootstrap has revolutionised html and css design. Developers with limited design skills (like me) could concentrate on the bits they are best at (the logic), but still produce elegant websites using these packages.

To demonstrate this I have created a bootstrap starter master view package. It mimics the layout and design of the bootstrap starter template. I have then used this package in a demo elm application. You can browse the demo application to see how it looks, and view the source to see how it works.

All these advantages come at a small to negative cost. There is a little more code for the new types, but some duplication is removed. You can view the source of the Real World App repositories from before and after refactoring to use a Master Page Type for the full details.

Conclusions

Master View Types bring a lot of benefits (view code is easier to write and maintain, UI concepts are named and UI packages are possible) for little or no cost. They should improve the code of any Elm application that has issues around enforcing consistency (while allowing flexibility) in their view code, which in my experience is most medium and large applications.

Unity Dashboard — lessons learned scaling our frontends, development culture and processes

freeCodeCamp — Thu, 14 Mar 2019 14:41:41 +0000

By Maciej Gurban

At Unity, we’ve recently set out to improve our Dashboards — an undertaking which dramatically changed not only our frontend tech stack, but also the ways we work and collaborate.

We’ve developed best practices and tooling to help us scale our frontend architecture, build products with great UX and performance, and to ship new features sooner.

This article gathers these practices and aims to provide as much reasoning behind each decision as possible. But first, some context.

The Legacy

Looking at the number of engineers, Unity more than quadrupled its headcount in the last 4 years. As the company grew both organically and through acquisitions, its product offering grew as well. While the products developed originally at Unity were largely uniform in terms of tech and design language, the newly acquired ones naturally were not.

As a result we had multiple visually distinct dashboards which worked and behaved differently and which shared no common navigational elements. This resulted in poor user experience and frustrated users. In the very literal sense, the state of frontends of our products was costing us revenue.

After analyzing our product portfolio, we identified three distinct sections Unity Dashboard would be split into: Develop, Operate and Acquire, each satisfying a different business need and meant for different customer groups, thus containing feature sets largely independent from each other.

This new structure, and the introduction of common navigational elements, aimed to solve the first major issue our users were facing: where to find the information and configuration options they’re looking for. While it all looked good on paper, how to get there was far from obvious.

Considerations

Many of our developers were very excited about the possibility of moving to React and its more modern tech stack. As these solutions had been battle tested in large applications, and had their best practices and conventions mostly ironed out, things looked very promising.

Nevertheless, what our developers knew best and what most of our actively developed applications were written in was AngularJS. Deciding to start migrating everything in one go would have been a disaster waiting to happen. Instead we set out to test our assumptions on a much smaller scale first.

Perhaps the most disjointed group of products we’ve had were the Monetization dashboards. These projects, which would eventually end up under the umbrella of the Operate dashboard, were vastly different in almost any way possible: technologies used, approach to UI/UX, development practices, coding conventions — you name it.

Here’s what the situation roughly looked like:

State of our dashboards in April 2018. Projects using Angular vs those using React.

After some brainstorming we identified the main areas which we’d need to work on to bring all the products together:

1. A single product

We needed these dashboards (split across multiple applications, domains and tech stacks) to:

Feel like a single product (no full page redirects as the user navigates through pages of all the different applications)
Have a consistent look and feel
Include common navigational elements that are always visible and look the same, no matter which part of the dashboard the user is visiting

2. Legacy support

While we did have a clean slate when it came to the technology choice for our new frontend solution, we had to accommodate the legacy projects which needed to be integrated into the new system. We needed a solution which didn’t involve big refactoring efforts, and which wouldn’t stop feature development or drag on for months with no end in sight.

3. Practices and tooling

While nearly all the teams used AngularJS, different tools were being used to address the same set of challenges. Different test runners and assertion libraries, state management solutions or lack thereof, jQuery vs native browser selectors, SASS vs LESS, charting libraries etc.

4. Developer productivity

Since every team had their own solution to developing, testing and building their application, the development environment was often riddled with bugs, manual steps, and inefficiencies.

Additionally, many of our teams work in locations separated by a 10 hour difference (Helsinki, Finland and San Francisco), which makes efficient decision-making on any shared pieces a real challenge.

The New

Our main areas of focus were to:

Encourage and preserve agile ways of working in our teams, and to let the teams be largely independent from one another
Leverage and develop common tooling and conventions as much as possible, to document them, and make them easily accessible and usable

We believed that achieving these goals would significantly improve our time to market and developer productivity. For that to happen, we required a solution which would:

Build product features with better user experience
Improve code quality
Allow for better collaboration without blocking anybody’s work progress in the process.

We also wanted to encourage and ease-in the move to a modern tech stack to make our developers more satisfied with their work, and to over time move away from our antiquated frameworks and tooling.

The ever-evolving result of our work is a React-based SPA built inside a monorepository where all the pages and bigger features get built into largely independent code bundles loaded on demand, and which can be developed and deployed by multiple teams at the same time.

As a means of sandboxing all the legacy applications but still displaying them in the context of the same new application, we load them inside an iframe from within which they can communicate with the main SPA using a message bus implemented using the [postMessage()](https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage) API.

The monorepository

Here’s the directory structure we started out with:

/src
  /components
  /scenes
    /foo
      /components
      package.json
      foo.js
    /bar
      /components
      package.json
      bar.js
package.json
index.js

The package.json in the root directory contains a set of devDependencies responsible for development, test and build environment of the whole application, but also contains dependencies of the core of the application (more on that a bit later).

All the larger UI chunks are referred to as scenes. Each scene contains a package.json where dependencies used by that scene’s components are defined. This makes two things possible:

Deployment updates only the files which have changed
The build step compiles separate vendor and app bundles for each scene, naming each using a hash which will change only when contents of the file have changed. This means our users only download files which have changed since their last visit, and nothing more.
Scenes are loaded only when needed
We load all scenes asynchronously and on demand which drastically improves the load times of the whole application. The “on demand” here usually means visiting a specific route, or performing a UI action which performs a dynamic module import.
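The content-hash naming described above can be illustrated independently of any particular bundler. The sketch below is written in Python rather than in the project's JavaScript build tooling. `hashed_name` is a hypothetical helper, and the 8-character digest length is an arbitrary choice for the example.

```python
import hashlib


def hashed_name(name: str, contents: bytes) -> str:
    """Insert a short content hash before the file extension."""
    stem, dot, ext = name.rpartition(".")
    digest = hashlib.sha256(contents).hexdigest()[:8]
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"


# Unchanged contents keep the same name, so browsers can reuse their cache.
before = hashed_name("foo.app.js", b"console.log('v1')")
rebuilt = hashed_name("foo.app.js", b"console.log('v1')")

# Changed contents get a new name, which forces a fresh download.
after = hashed_name("foo.app.js", b"console.log('v2')")
```

Because the name depends only on the file contents, a deployment that touches one scene leaves the file names of every other scene untouched.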

Here’s how such setup looks in practice (simplified for readability):

// In src/routes.js
const FooLoader = AsyncLoadComponent(
  () => import('src/scenes/foo/foo'),
  GenericPagePreloader,
);

// In src/scenes/foo/foo.js

The AsyncLoadComponent is a thin wrapper around [React.lazy()](https://reactjs.org/docs/code-splitting.html#reactlazy), additionally accepting a preloader component, the same one passed through fallback to [React.Suspense()](https://reactjs.org/docs/code-splitting.html#suspense), and a delay after which the preloader should be rendered if the scene hasn’t finished loading.

This is useful when making sure our users see the same preloader without any interruption or flash of content from the moment a scene is requested to the moment when all of its files have been downloaded, all of the critical API requests have completed, and the component has finished rendering.

Component tiers

As each application grows, its directory structure and abstractions evolve along with it. After roughly half a year of building and moving features to the new codebase, having a single components directory proved insufficient.
We needed our directory structure to inform us about:

Have the components been developed to be generic, or are they meant only for a specific use-case?
Are they generic enough to be used across all the application, or should they be used only in certain contexts?
Who’s responsible for and most knowledgeable about the code?

Based on that we’ve defined the following Component Tiers:

1. Application-specific (src/app)

Single-use components which cater to specific use-cases within this application, and which are not meant to be re-used or extracted to the component library (routes, footer, page header etc.).

2. Generic (src/components)

Generic multi-purpose components to be used all across the application and its scenes. Once we’ve arrived at a stable API for these components, they could be moved into the common component library (more on that below)

3. Components of a single scene (src/scenes/my-scene/components)

Components developed with a specific use case in mind; not meant to be used in any other scenes. For cases when a component from one scene needs to be used in another one, we’d use:

4. Multi-scene components (src/scenes/components/my-feature)

Components used across multiple scenes, but not meant to be generic enough to be used anywhere else. To illustrate why simply moving them to src/components isn’t good enough:

Imagine that so far you’ve had a single scene which contained components used to build some rather specific data charts. Your team is now building a second scene which will use different data for the charts, but visually the two will look pretty much the same.

Importing components from one scene into another would break the encapsulation of the scene and would mean that we can no longer be certain whether changes made to a single scene’s components only affect that one scene.

For this purpose, any component or group of components, roughly referred to as a feature, would be placed in src/scenes/components from where it can be imported and used by any other team, however:

Whenever a team would like to start using scene components which another team developed, the best practice would be to reach out to that team first to figure out whether the use case you intend these components for can safely be supported in the future. Giving a heads up to the team who originally developed the code will prevent shipping broken features in the future when code you’ve taken into use inevitably gets changed in ways you didn’t expect (because of course, how could you!), and which might not always be caught by the unit tests.

5. Common library

Components which we’ve battle-tested in production and want to extract to our shared component library, used by other dashboard teams at Unity.

Ode to shared dependencies

While it would be very convenient to be able to build and deploy every piece of our application in a fully isolated environment, certain dependencies — both external libraries and internal application code — are simply going to be used all across the codebase. Things like React itself, Redux and all redux-related logic, common navigational components etc.

Rolling out the changes

At the moment, fully encapsulating the scenes isn’t practical and in many cases simply impossible. It would take either shipping many dependencies multiple times over and in the process slowing down page loads, or building abstractions meant to make certain libraries work in ways they’ve not been designed to.

As the web development and its ecosystem evolves though, the libraries seem to become more and more standalone and encapsulated, which we hope in the future will mean little to no shared dependencies, and true isolation between all the modules.

Perhaps the biggest drawback of authoring large-scale applications is performing code changes and dependency updates without breaking something in the process.

Using a monorepository makes it possible (though not mandatory) to roll out changes and updates to the code in a more gradual and safe manner: if a change causes issues, these issues will only affect a small part of the application, not the whole system.

And while for some the ability to perform updates on multiple unrelated areas of the codebase at the same time would come off as a benefit, the reality of having multiple teams working on the same codebase and not knowing all the other teams’ features thoroughly means that a great deal of caution is needed when building the application scaffolding and taking measures to minimize the risk of breakage.

How to avoid breaking things

Perhaps the most fundamental strategy which helps us to do so, other than scene isolation, is having a high unit test coverage.

1. Testing

Unit tests aren’t everything, of course. Many mature products, even at a moderate scale, invest in suites of integration and e2e tests, which do a better job of verifying that the application works as expected overall. However, as the number of features grows, so do the maintenance cost and the time needed to run those suites, a cost which cannot always be justified for less crucial but still important features.

Some lessons we’ve learned from various testing strategies:

Try to unit test as much of the code as possible, especially: conditional logic, data transformations and function calls
Invest in and leverage integration tests to their full extent before deciding to write any e2e tests. The initial cost of integration tests is much higher, but pales in comparison to the price of upkeep of an e2e suite
Try not to over-react by starting to write e2e tests for things that weren’t caught by unit or integration tests. Sometimes, the processes or tooling are at fault
Let test cases explain UI behavior rather than implementation details
Automated tests cannot fully replace manual testing
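
As a rough illustration of the first and fourth lessons, here is a minimal sketch of a unit test for a data transformation with a bit of conditional logic. The `toTableRows` function and its row shape are hypothetical and not taken from our codebase; the point is only that the assertion describes what ends up on screen rather than how the function gets there.

```javascript
const { deepStrictEqual } = require('assert');

// Hypothetical transformation: turns raw API rows into what a table displays.
function toTableRows(apiRows) {
  return apiRows
    .filter(row => row.revenue > 0) // conditional logic: hide rows without revenue
    .map(row => ({
      label: row.name.trim(),           // data transformation: clean up the label
      revenue: row.revenue.toFixed(2),  // data transformation: format the amount
    }));
}

// The test states the visible outcome, not the implementation steps.
deepStrictEqual(
  toTableRows([
    { name: ' Game A ', revenue: 12.5 },
    { name: 'Game B', revenue: 0 },
  ]),
  [{ label: 'Game A', revenue: '12.50' }]
);
console.log('ok');
```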

2. Minimize the surface of shared code

Aside from testing, code re-used across the whole application is kept to a reasonable minimum. One of the most useful strategies so far has been to move the most commonly used components and code to a shared component library, from where they are used as dependencies in scenes which need them. This allows us to roll out most of the changes progressively, on a per team- or page-basis.

3. Accountability

Last but not least, a huge factor in multiple teams being able to collaborate within the same codebase comes from encouraging and having developers take personal responsibility and accountability for the product, instead of offloading the responsibility for properly testing that everything works to Q.A., testers or automation.

This carries over to code reviews as well. Making sure each change is carefully reviewed is harder than it might seem on the surface. As a team works closely together, a healthy degree of trust develops between its members. This trust, however, can sometimes translate into people being less diligent about changes made by the more experienced or otherwise trustworthy developers.

To encourage diligence, we emphasize that the author of the PR and the reviewer are equally responsible for ensuring everything works.

Component library

To achieve the same look and feel across all the pages of our dashboards, we’ve developed a component library. What stands out in our approach is that new components are almost never developed within that library.

Every component, after being developed within the dashboard’s codebase, is taken into use in a bunch of features within that codebase first. Usually after a few weeks we begin to feel more confident that the component could be moved over, given that:

The API is flexible enough to support the foreseeable use-cases
The component has been tested in a variety of contexts
The performance, responsiveness, and UX are all accounted for

This process follows the Rule of Three and aims to help us release only components which are truly reusable and have been taken into use in a variety of contexts before being moved to our common library.

Some of the examples of the components we’d move over would include: footer, page header, side and top navigation elements, layout building blocks, banners, powered-up versions of buttons, typography elements etc.

In the early days, the component library used to be located in the same codebase as the application itself. We’ve since then extracted it to a separate repository to make the development process more democratized for other teams at Unity — important when driving for its adoption.

Modular component design

For the longest time, building reusable components meant dealing with multiple challenges, many of which often didn’t have good solutions:

How to easily import the component along with its styles, and only that
How to override default styles without selector specificity wars
In bigger components consisting of multiple smaller ones, how to override the styling of the smaller component

Our dashboard, as well as our component library, depends heavily on Material UI. What’s uniquely compelling in Material UI’s styling solution is the potential brought by JSS and their Unified Styling Language (well worth the read), which make it possible to develop UIs that are encapsulated by design, as with CSS Modules, and solve all of the above-mentioned issues in stride.

This differs significantly from approaches like BEM, which provide encapsulation by convention and tend to be less extensible and less encapsulated.

Living styleguide

A component library wouldn’t be complete without a way to showcase the components it contains and being able to see the components as they change throughout the releases.

We’ve had a pretty good experience with Storybook, which was ridiculously easy to set up and get started with, but after some time we realized a more robust and end-to-end solution was needed: something pretty close to what Styleguidist offers, but more tailored to our needs.

Existing design docs

The documentation serving as the main source of information about the latest design specification was located in Confluence, where designers kept an up-to-date specification for each component using screenshots illustrating permitted use-cases, states and variations the component could be in, listed best practices, as well as details like dimensions, used colors etc. Following that approach we’ve faced a number of challenges:

The Material Design specification keeps evolving, and because of that we oftentimes found ourselves either spending time updating all the screenshots and guidelines or letting our design guidelines become outdated
Figuring out which is more correct, the implementation or the specification, wasn’t always an easy task. Because we’ve been publishing Storybook demos of every component and for every library version, we could see what changed and how. We couldn’t do the same for the design spec.
Screenshots and videos can only communicate so much. To provide high-quality components which can be used by multiple teams, it’s necessary to review whether each component works in all resolutions, is bug-free and has good UX. This was difficult without having the designer sit literally next to you to see the implementation demo on the screen

Component documentation app

Our documentation app aims to provide the means of efficient collaboration between designers and engineers to make it simpler and less time-consuming for both parties to document, review and develop components. To be more specific, we needed to:

Have a single point of reference showcasing the components and how they should look, behave, and be used, provided for every release and replacing detailed descriptions with live demos
Make it as easy as possible for designers and developers to collaborate on components and their docs, and to do so before the components are released, without the need to share videos or screenshots or to be physically in the same location
Separate the designs into what we plan to do vs what has been done

As before, each release of the component library causes a new version of the living styleguide to be published. This time, however, there are a few differences:

Designers contribute to component documentation directly by editing documentation files through the Github UI, committing changes to the latest release.
Component demos as WYSIWYG — the same code you see as an example of how to implement the component is used to render the demo, including any intermediate file imports, variable declarations etc. As an added bonus, components wrapped in withStyles() are displayed correctly (issue present in Storybook at the moment).
Changes to the docs and the code are almost instantly visible without checking out the branch locally and starting the documentation app — the app is rebuilt and published on and for every commit.

Development experience

One of the main goals of code reviews is making sure that each change is carefully reviewed, considered and tested before being merged and deployed.

To make this task as obstacle-free as possible we’ve developed a Preview Server capable of creating a new build of our application every time a PR is created or updated.

A comment containing version links gets added to every PR and is updated on every pushed change

Our designers, product managers and engineers can test each change before merging it in, in both staging and production environments and within minutes of making the change.

Browsing production version of the application before merging the PR

Closing words

It’s been nearly a year since we’ve undertaken to consolidate our dashboards. We’ve spent that time learning how to grow a large but healthy software project, how to get better at collaboration and communication, and how to raise the quality bar for ourselves.

We scaled a frontend project not only in terms of lines of code, but also in terms of number of engineers who work within its codebase — a number which quadrupled since the beginning.

Code frequency from the beginning of project’s existence until now

We did a 180 degree change in dealing with time differences between our teams, moving away from a model where our teams worked in full isolation to one where close collaboration and communication are an everyday occurrence.

While we still have a long road ahead to ensure we can scale our approach to more teams and to bigger challenges, we’ve noticed a number of improvements already:

Roadmap and work visibility
Due to having one place where all the work is happening, the progress gets tracked, and all the issues are gathered in
Development velocity and time-to-market
New features can be created in large part from already existing and well-tested components — easily findable through our documentation app
Code quality & test coverage
When building new things, a solution to a similar problem usually already exists and is within arm’s reach, along with examples of how to test it
Overall quality & UX
Testing features and ensuring their quality is now easier than ever, as designers, product managers and other stakeholders can test each change on their own machine, with their own accounts and data sets

Naturally, along the way we’ve encountered a number of challenges which we need to solve, or which will need solving in the future:

Build & CI performance
As the numbers of dependencies, build bundles, and tests grow, so does the time needed to do a deployment. In the future, we’ll need to develop tooling to help us only build, test and deploy the pieces which changed.
Development culture
To build healthy software, we need to continuously work on healthy ways of communicating and exchanging ideas, and text-based communications make this task more difficult. We’re working to address this issue through a series of regular leadership training sessions and by embracing more open-source ways of working, as well as by organizing a few get-together sessions per year for the teams to meet each other face to face.
Breakage isolation & updates
As the number of features and pages grows, we’ll need a more robust way of isolating our application modules to prevent damage from spreading when things go wrong. This could be achieved by versioning all the shared code (redux logic, src/components), or in extreme cases producing standalone builds of certain features.

State then, now and in the future

The migration has involved moving away from AngularJS to React. Here’s how the situation changed over the past year:

April 2018

February 2019

Where we hope our dashboards will be by the end of 2019

It’s a wrap! Thank you for reading! You can find me on LinkedIn here.

If working on similar challenges sounds interesting to you, we’re always looking for talented engineers to join our teams all around the world.

Scaling Node.js Applications

freeCodeCamp — Fri, 14 Jul 2017 01:32:54 +0000

By Samer Buna

Everything you need to know about Node.js built-in tools for scalability

Update: This article is now part of my book “Node.js Beyond The Basics”.

Read the updated version of this content and more about Node at jscomplete.com/node-beyond-basics.

Scalability in Node.js is not an afterthought. It’s something that’s baked into the core of the runtime. Node is named Node to emphasize the idea that a Node application should comprise multiple small distributed nodes that communicate with each other.

Are you running multiple nodes for your Node applications? Are you running a Node process on every CPU core of your production machines and load balancing all the requests among them? Did you know that Node has a built-in module to help with that?

Node’s cluster module not only provides an out-of-the-box solution to utilizing the full CPU power of a machine, but it also helps with increasing the availability of your Node processes and provides an option to restart the whole application with zero downtime. This article covers all that goodness and more.

This article is a write-up of part of my Pluralsight course about Node.js. I cover similar content in video format there.

Strategies of Scalability

The workload is the most popular reason we scale our applications, but it’s not the only reason. We also scale our applications to increase their availability and tolerance to failure.

There are mainly three different things we can do to scale an application:

1 — Cloning

The easiest thing to do to scale a big application is to clone it multiple times and have each cloned instance handle part of the workload (with a load balancer, for example). This does not cost a lot in terms of development time and it’s highly effective. This strategy is the minimum you should do and Node.js has a built-in module, cluster, to make it easier for you to implement the cloning strategy on a single server.

2 — Decomposing

We can also scale an application by decomposing it based on functionalities and services. This means having multiple, different applications with different code bases and sometimes with their own dedicated databases and User Interfaces.

This strategy is commonly associated with the term Microservice, where micro indicates that those services should be as small as possible, but in reality, the size of the service is not what’s important but rather the enforcement of loose coupling and high cohesion between services. The implementation of this strategy is often not easy and could result in long-term unexpected problems, but when done right the advantages are great.

3 — Splitting

We can also split the application into multiple instances where each instance is responsible for only a part of the application’s data. This strategy is often named horizontal partitioning, or sharding, in databases. Data partitioning requires a lookup step before each operation to determine which instance of the application to use. For example, maybe we want to partition our users based on their country or language. We need to do a lookup of that information first.
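
The lookup step can be as small as a map from the partition key to the instance that owns that slice of the data. The sketch below assumes partitioning by country; the instance URLs and the `lookupShard` helper are made up for illustration and are not part of Node.js.

```javascript
// Hypothetical mapping from a user's country to the instance that owns their data.
const SHARDS = new Map([
  ['US', 'http://users-us.internal:8080'],
  ['DE', 'http://users-eu.internal:8080'],
  ['FR', 'http://users-eu.internal:8080'],
]);

// Hypothetical fallback instance for countries without a dedicated shard.
const DEFAULT_SHARD = 'http://users-global.internal:8080';

// The lookup step: decide which instance should handle an operation for this user.
function lookupShard(user) {
  return SHARDS.get(user.country) || DEFAULT_SHARD;
}

console.log(lookupShard({ id: 1, country: 'DE' })); // http://users-eu.internal:8080
console.log(lookupShard({ id: 2, country: 'BR' })); // http://users-global.internal:8080
```

Every read or write for a user would go through this lookup first and only then reach the selected instance.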

Successfully scaling a big application should eventually implement all three strategies. Node.js makes it easy to do so but I am going to focus on the cloning strategy in this article and explore the built-in tools available in Node.js to implement it.

Please note that you need a good understanding of Node.js child processes before reading this article. If you haven’t already, I recommend that you read this other article first:

Node.js Child Processes: Everything you need to know
How to use spawn(), exec(), execFile(), and fork() (medium.freecodecamp.org)

The Cluster Module

The cluster module can be used to enable load balancing over an environment’s multiple CPU cores. It’s based on the child process module fork method and it basically allows us to fork the main application process as many times as we have CPU cores. It will then take over and load balance all requests to the main process across all forked processes.

The cluster module is Node’s helper for us to implement the cloning scalability strategy, but only on one machine. When you have a big machine with a lot of resources or when it’s easier and cheaper to add more resources to one machine rather than adding new machines, the cluster module is a great option for a really quick implementation of the cloning strategy.

Even small machines usually have multiple cores and even if you’re not worried about the load on your Node server, you should enable the cluster module anyway to increase your server availability and fault-tolerance. It’s a simple step and when using a process manager like PM2, for example, it becomes as simple as just providing an argument to the launch command!

But let me tell you how to use the cluster module natively and explain how it works.

The structure of what the cluster module does is simple. We create a master process and that master process forks a number of worker processes and manages them. Each worker process represents an instance of the application that we want to scale. All incoming requests are handled by the master process, which is the one that decides which worker process should handle an incoming request.

Screenshot captured from my Pluralsight course — Advanced Node.js

The master process’s job is easy because it actually just uses a round-robin algorithm to pick a worker process. This is enabled by default on all platforms except Windows and it can be globally modified to let the load balancing be handled by the operating system itself.
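
As a small sketch of that global switch: the cluster module exposes a `schedulingPolicy` property together with the constants `SCHED_RR` (round-robin) and `SCHED_NONE` (leave it to the operating system). The setting has to be changed in the master process before the first worker is forked.

```javascript
const cluster = require('cluster');

// Hand load balancing over to the operating system instead of round-robin.
// This only takes effect if it runs before the first cluster.fork() call.
cluster.schedulingPolicy = cluster.SCHED_NONE;

console.log(cluster.schedulingPolicy === cluster.SCHED_NONE); // true
```

The same choice can be made without touching the code by setting the `NODE_CLUSTER_SCHED_POLICY` environment variable to `rr` or `none`.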

The round-robin algorithm distributes the load evenly across all available processes on a rotational basis. The first request is forwarded to the first worker process, the second to the next worker process in the list, and so on. When the end of the list is reached, the algorithm starts again from the beginning.

This is one of the simplest and most used load balancing algorithms. But it’s not the only one. More featured algorithms allow assigning priorities and selecting the least loaded server or the one with the fastest response time.
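
To make the rotation concrete, here is a minimal sketch of a round-robin picker. It is not the cluster module’s internal implementation, just an illustration of the idea; `createRoundRobin` and the worker names are placeholders.

```javascript
// Returns a function that hands out workers in rotation.
function createRoundRobin(workers) {
  if (workers.length === 0) {
    throw new Error('at least one worker is required');
  }
  let next = 0;
  return function pick() {
    const worker = workers[next];
    next = (next + 1) % workers.length; // wrap around at the end of the list
    return worker;
  };
}

const pick = createRoundRobin(['worker-1', 'worker-2', 'worker-3']);
const order = [pick(), pick(), pick(), pick()];
console.log(order.join(', ')); // worker-1, worker-2, worker-3, worker-1
```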

Load-Balancing an HTTP Server

Let’s clone and load balance a simple HTTP server using the cluster module. Here’s Node’s simple hello-world example server, slightly modified to simulate some CPU work before responding:

// server.js
const http = require('http');
const pid = process.pid;

http.createServer((req, res) => {
  for (let i=0; i<1e7; i++); // simulate CPU work
  res.end(`Handled by process ${pid}`);
}).listen(8080, () => {
  console.log(`Started process ${pid}`);
});

To verify that the balancer we’re going to create is going to work, I’ve included the process pid in the HTTP response to identify which instance of the application is actually handling a request.

Before we create a cluster to clone this server into multiple workers, let’s do a simple benchmark of how many requests this server can handle per second. We can use the Apache benchmarking tool for that. After running the simple server.js code above, run this ab command:

ab -c200 -t10 http://localhost:8080/

This command will test-load the server with 200 concurrent connections for 10 seconds.

Screenshot captured from my Pluralsight course — Advanced Node.js

On my machine, the single node server was able to handle about 51 requests per second. Of course, the results here will be different on different platforms and this is a very simplified test of performance that’s not 100% accurate, but it will clearly show the difference that a cluster would make in a multi-core environment.

Now that we have a reference benchmark, we can scale the application with the cloning strategy using the cluster module.

On the same level as the server.js file above, we can create a new file (cluster.js) for the master process with this content (explanation follows):

// cluster.js
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  const cpus = os.cpus().length;

  console.log(`Forking for ${cpus} CPUs`);
  for (let i = 0; i < cpus; i++) {
    cluster.fork();
  }
} else {
  require('./server');
}

In cluster.js, we first required both the cluster module and the os module. We use the os module to read the number of CPU cores we can work with using os.cpus().

The cluster module gives us the handy Boolean flag isMaster to determine if this cluster.js file is being loaded as a master process or not. The first time we execute this file, we will be executing the master process and that isMaster flag will be set to true. In this case, we can instruct the master process to fork our server as many times as we have CPU cores.

Now we just read the number of CPUs we have using the os module, then with a for loop over that number, we call the cluster.fork method. The for loop will simply create as many workers as the number of CPUs in the system to take advantage of all the available processing power.

When the cluster.fork line is executed from the master process, the current file, cluster.js, is run again, but this time in worker mode with the isMaster flag set to false. There is actually another flag set to true in this case if you need to use it, which is the isWorker flag.

When the application runs as a worker, it can start doing the actual work. This is where we need to define our server logic, which, for this example, we can do by requiring the server.js file that we have already.
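
One note for readers on newer runtimes: starting with Node.js 16, the same flag is also available as `cluster.isPrimary`, and `isMaster` is marked as deprecated. The sketch below shows one way to write the check so that it works with either name. The `forkWorkers` helper is only an illustrative wrapper around the loop above, and the usage is left in comments so that loading the file has no side effects.

```javascript
const cluster = require('cluster');
const os = require('os');

// Prefer the newer flag when it exists, fall back to isMaster on older versions.
const isPrimary =
  cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Illustrative helper: forks `count` workers and returns them.
// Meant to be called from the master (primary) process only.
function forkWorkers(count = os.cpus().length) {
  const workers = [];
  for (let i = 0; i < count; i++) {
    workers.push(cluster.fork());
  }
  return workers;
}

// Usage, mirroring cluster.js:
// if (isPrimary) {
//   forkWorkers();
// } else {
//   require('./server');
// }
```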

That’s basically it. That’s how easy it is to take advantage of all the processing power in a machine. To test the cluster, run the cluster.js file:

Screenshot captured from my Pluralsight course — Advanced Node.js

I have 8 cores on my machine so it started 8 processes. It’s important to understand that these are completely different Node.js processes. Each worker process here will have its own event loop and memory space.

When we now hit the web server multiple times, the requests will start to get handled by different worker processes with different process ids. The workers will not be exactly rotated in sequence because the cluster module performs some optimizations when picking the next worker, but the load will be somewhat distributed among the different worker processes.

We can use the same ab command above to load-test this cluster of processes:

Screenshot captured from my Pluralsight course — Advanced Node.js

The cluster I created on my machine was able to handle 181 requests per second in comparison to the 51 requests per second that we got using a single Node process. The performance of this simple application more than tripled (roughly 3.5 times) with just a few lines of code.

Broadcasting Messages to All Workers

Communicating between the master process and the workers is simple because under the hood the cluster module is just using the child_process.fork API, which means we also have communication channels available between the master process and each worker.

Based on the server.js/cluster.js example above, we can access the list of worker objects using cluster.workers, which is an object that holds a reference to all workers and can be used to read information about these workers. Since we have communication channels between the master process and all workers, to broadcast a message to all of them we just need a simple loop over all the workers. For example:

Object.values(cluster.workers).forEach(worker => {
  worker.send(`Hello Worker ${worker.id}`);
});

We simply used Object.values to get an array of all workers from the cluster.workers object. Then, for each worker, we can use the send function to send over any value that we want.

In a worker file, server.js in our example, to read a message received from this master process, we can register a handler for the message event on the global process object. For example:

process.on('message', msg => {
  console.log(`Message from master: ${msg}`);
});

Here is what I see when I test these two additions to the cluster/server example:

Screenshot captured from my Pluralsight course — Advanced Node.js

Every worker received a message from the master process. Note how the workers did not start in order.

Let’s make this communication example a little bit more practical. Let’s say we want our server to reply with the number of users we have created in our database. We’ll create a mock function that returns the number of users we have in the database and just have it square its value every time it’s called (dream growth):

// **** Mock DB Call
const numberOfUsersInDB = function() {
  this.count = this.count || 5;
  this.count = this.count * this.count;
  return this.count;
}
// ****

Every time numberOfUsersInDB is called, we’ll assume that a database connection has been made. What we want to do here — to avoid multiple DB requests — is to cache this call for a certain period of time, such as 10 seconds. However, we still don’t want the 8 forked workers to do their own DB requests and end up with 8 DB requests every 10 seconds. We can have the master process do just one request and tell all of the 8 workers about the new value for the user count using the communication interface.

In the master process mode, we can, for example, use the same loop to broadcast the users count value to all workers:

// Right after the fork loop within the isMaster=true block
const updateWorkers = () => {
  const usersCount = numberOfUsersInDB();
  Object.values(cluster.workers).forEach(worker => {
    worker.send({ usersCount });
  });
};

updateWorkers();
setInterval(updateWorkers, 10000);

Here we’re invoking updateWorkers for the first time and then invoking it every 10 seconds using a setInterval. This way, every 10 seconds, all workers will receive the new user count value over the process communication channel and only one database connection will be made.

In the server code, we can use the usersCount value using the same message event handler. We can simply cache that value with a module global variable and use it anywhere we want.

For example:

const http = require('http');
const pid = process.pid;

let usersCount;

http.createServer((req, res) => {
  for (let i=0; i<1e7; i++); // simulate CPU work
  res.write(`Handled by process ${pid}\n`);
  res.end(`Users: ${usersCount}`);
}).listen(8080, () => {
  console.log(`Started process ${pid}`);
});

process.on('message', msg => {
  usersCount = msg.usersCount;
});

The above code makes the worker web server respond with the cached usersCount value. If you test the cluster code now, during the first 10 seconds you’ll get “25” as the users count from all workers (and only one DB request would be made). Then after another 10 seconds, all workers would start reporting the new user count, 625 (and only one other DB request would be made).

This is all possible thanks to the communication channels between the master process and all workers.

Increasing Server Availability

One of the problems in running a single instance of a Node application is that when that instance crashes, it has to be restarted. This means some downtime between these two actions, even if the process was automated as it should be.

This also applies to the case when the server has to be restarted to deploy new code. With one instance, there will be downtime which affects the availability of the system.

When we have multiple instances, the availability of the system can be easily increased with just a few extra lines of code.

To simulate a random crash in the server process, we can simply do a process.exit call inside a timer that fires after a random amount of time:

// In server.js
setTimeout(() => {
  process.exit(1) // death by random timeout
}, Math.random() * 10000);

When a worker process exits like this, the master process will be notified using the exit event on the cluster module object. We can register a handler for that event and just fork a new worker process when any worker process exits.

For example:

// Right after the fork loop within the isMaster=true block
cluster.on('exit', (worker, code, signal) => {
  if (code !== 0 && !worker.exitedAfterDisconnect) {
    console.log(`Worker ${worker.id} crashed. ` +
                'Starting a new worker...');
    cluster.fork();
  }
});

It’s good to add the if condition above to make sure the worker process actually crashed and was not manually disconnected or killed by the master process itself. For example, the master process might decide that we are using too many resources based on the load patterns it sees and it will need to kill a few workers in that case. To do so, we can use the disconnect method on any worker and, in that case, the exitedAfterDisconnect flag will be set to true. The if statement above will guard against forking a new worker in that case.
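
Here is a minimal sketch of what such a deliberate scale-down could look like. The `scaleDown` helper is hypothetical (how the master decides that it is using too many resources is left out); it only relies on the `disconnect` method that every worker object has.

```javascript
// Disconnects up to `count` workers from a cluster.workers-style object and
// returns the ids of the workers that were asked to disconnect.
function scaleDown(workers, count) {
  const ids = Object.keys(workers).slice(0, count);
  ids.forEach(id => workers[id].disconnect());
  return ids;
}

// Usage in the master process:
// scaleDown(cluster.workers, 2);
```

Because these workers exit after a disconnect call, their exitedAfterDisconnect flag is true and the exit handler above will not replace them.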

If we run the cluster with the handler above (and the random crash in server.js), after a random number of seconds, workers will start to crash and the master process will immediately fork new workers to increase the availability of the system. You can actually measure the availability using the same ab command and see how many requests the server will not be able to handle overall (because some of the unlucky requests will have to face the crash case and that’s hard to avoid.)

When I tested the code, only 17 requests failed out of over 1800 in the 10-second testing interval with 200 concurrent requests.

Screenshot captured from my Pluralsight course — Advanced Node.js

That’s over 99% availability. By just adding a few lines of code, we now don’t have to worry about process crashes anymore. The master guardian will keep an eye on those processes for us.
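
The arithmetic behind that figure, as a quick sketch. The exact request total in my run was a bit over 1800, so 1800 is used here as a round number, and `availability` is just an illustrative helper.

```javascript
// Share of requests that were served successfully, as a percentage.
function availability(totalRequests, failedRequests) {
  return (1 - failedRequests / totalRequests) * 100;
}

// 17 failures out of roughly 1800 requests.
console.log(availability(1800, 17).toFixed(2) + '%'); // 99.06%
```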

Zero-downtime Restarts

What about the case when we want to restart all worker processes when, for example, we need to deploy new code?

We have multiple instances running, so instead of restarting them together, we can simply restart them one at a time to allow other workers to continue to serve requests while one worker is being restarted.

Implementing this with the cluster module is easy. Since we don’t want to restart the master process once it’s up, we need a way to send this master process a command to instruct it to start restarting its workers. This is easy on Linux systems because we can simply listen to a process signal like SIGUSR2, which we can trigger by using the kill command on the process id and passing that signal:

// In Node
process.on('SIGUSR2', () => { ... });
// To trigger that
$ kill -SIGUSR2 PID

This way, the master process will not be killed and we have a way to instruct it to start doing something. SIGUSR2 is a proper signal to use here because this will be a user command. If you’re wondering why not SIGUSR1, it’s because Node uses that for its debugger and you want to avoid any conflicts.

Unfortunately, on Windows, these process signals are not supported and we would have to find another way to command the master process to do something. There are some alternatives. We can, for example, use standard input or socket input. Or we can monitor the existence of a process.pid file and watch that for a remove event. But to keep this example simple, we’ll just assume this server is running on a Linux platform.
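
For the curious, here is a rough sketch of the standard input alternative. It is not used in the rest of this article. The `handleCommand` and `listenForCommands` helpers and the `restart` command name are assumptions made for this example; only the readline module itself comes from Node.

```javascript
const readline = require('readline');

// Runs the handler registered for a command line; returns whether one was found.
function handleCommand(line, handlers) {
  const name = line.trim();
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) return false;
  handlers[name]();
  return true;
}

// Reads commands line by line from a stream such as process.stdin.
function listenForCommands(input, handlers) {
  const rl = readline.createInterface({ input });
  rl.on('line', line => handleCommand(line, handlers));
  return rl;
}

// Usage in the master process (restartWorkers is a hypothetical function):
// listenForCommands(process.stdin, { restart: () => restartWorkers() });
```

Typing restart followed by Enter in the terminal that runs the master process would then play the role of the signal.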

Node works very well on Windows, but I think it’s a much safer option to host production Node applications on a Linux platform. This is not just because of Node itself, but many other production tools that are much more stable on Linux. This is my personal opinion and feel free to completely ignore it.

By the way, on recent versions of Windows, you can actually use a Linux subsystem and it works very well. I’ve tested it myself and it was nothing short of impressive. If you’re developing Node applications on Windows, check out Bash on Windows and give it a try.

In our example, when the master process receives the SIGUSR2 signal, that means it’s time for it to restart its workers, but we want to do that one worker at a time. This simply means the master process should only restart the next worker when it’s done restarting the current one.

To begin this task, we need to get a reference to all current workers using the cluster.workers object, and we can simply store the workers in an array:

const workers = Object.values(cluster.workers);

Then, we can create a restartWorker function that receives the index of the worker to be restarted. This way we can do the restarting in sequence by having the function call itself when it’s ready for the next worker. Here’s an example restartWorker function that we can use (explanation follows):

const restartWorker = (workerIndex) => {
  const worker = workers[workerIndex];
  if (!worker) return;

  worker.on('exit', () => {
    if (!worker.exitedAfterDisconnect) return;
    console.log(`Exited process ${worker.process.pid}`);

    cluster.fork().on('listening', () => {
      restartWorker(workerIndex + 1);
    });
  });

  worker.disconnect();
};

restartWorker(0);

Inside the restartWorker function, we get a reference to the worker to be restarted, and since we will be calling this function recursively to form a sequence, we need a stop condition. When we no longer have a worker to restart, we can just return. We then basically want to disconnect this worker (using worker.disconnect), but before restarting the next worker, we need to fork a new worker to replace the current one that we’re disconnecting.

We can use the exit event on the worker itself to fork a new worker when the current one exits, but we have to make sure that the exit action was actually triggered after a normal disconnect call. We can use the exitedAfterDisconnect flag. If this flag is not true, the exit was caused by something other than our disconnect call, and in that case we should just return and do nothing. But if the flag is set to true, we can go ahead and fork a new worker to replace the one that we’re disconnecting.

When this new forked worker is ready, we can restart the next one. However, remember that the fork process is not synchronous, so we can’t just restart the next worker after the fork call. Instead, we can monitor the listening event on the newly forked worker, which tells us that this worker is connected and ready. When we get this event, we can safely restart the next worker in sequence.
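Putting the pieces together, the sequence might look like the sketch below. It wraps the same logic in a rollingRestart function (a name chosen here for illustration) that receives the fork function as a parameter, so a fresh snapshot of cluster.workers can be taken every time the signal arrives. The commented wiring at the bottom is an assumption about how it would be hooked up in the master process.

```javascript
// Restart workers one at a time. `fork` is any function that returns a new
// worker object (cluster.fork in a real master process). `done` is an
// optional callback invoked once every worker has been replaced.
function rollingRestart(workers, fork, done = () => {}) {
  const restartAt = (index) => {
    const worker = workers[index];
    if (!worker) return done(); // stop condition: no workers left

    worker.once('exit', () => {
      // Ignore exits that were not caused by our own disconnect call.
      if (!worker.exitedAfterDisconnect) return;

      // Move on only when the replacement is accepting connections.
      fork().once('listening', () => restartAt(index + 1));
    });

    worker.disconnect();
  };

  restartAt(0);
}

// Assumed wiring in the master process:
// const cluster = require('cluster');
// process.on('SIGUSR2', () => {
//   rollingRestart(Object.values(cluster.workers), () => cluster.fork());
// });
```

Passing fork in as an argument also makes the sequencing easy to exercise with stand-in worker objects, without starting real processes.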

That’s all we need for a zero-downtime restart. To test it, you’ll need the master process id so that you can send it the SIGUSR2 signal:

console.log(`Master PID: ${process.pid}`);

Start the cluster, copy the master process id, and then restart the cluster using the kill -SIGUSR2 PID command. You can also run the same ab command while restarting the cluster to see the effect that this restart process will have on availability. Spoiler alert, you should get ZERO failed requests:

Screenshot captured from my Pluralsight course — Advanced Node.js

Process monitors like PM2, which I personally use in production, make all the tasks we went through so far extremely easy and give a lot more features to monitor the health of a Node.js application. For example, with PM2, to launch a cluster for any app, all you need to do is use the -i argument:

pm2 start server.js -i max

And to do a zero downtime restart you just issue this magic command:

pm2 reload all

However, I find it helpful to first understand what actually will happen under the hood when you use these commands.

Shared State and Sticky Load Balancing

Good things always come with a cost. When we load balance a Node application, we lose some features that are only suitable for a single process. This problem is somewhat similar to what’s known in other languages as thread safety, which is about sharing data between threads. In our case, it’s sharing data between worker processes.

For example, with a cluster setup, we can no longer cache things in memory because every worker process will have its own memory space. If we cache something in one worker’s memory, other workers will not have access to it.

If we need to cache things with a cluster setup, we have to use a separate entity and read/write to that entity’s API from all workers. This entity can be a database server. If you want an in-memory cache, it can be a server like Redis or a dedicated Node process with a read/write API for all other workers to communicate with.

Screenshot captured from my Pluralsight course — Advanced Node.js

Don’t look at this as a disadvantage though, because using a separate entity for your application caching needs is part of decomposing your app for scalability. You should probably be doing that even if you’re running on a single core machine.
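As a rough illustration of the dedicated Node process option, the sketch below keeps a Map behind a tiny HTTP API. The URL scheme (GET /key to read, PUT /key to write), the function names, and the status codes are all assumptions made for this example. A production setup would more likely use Redis or a similar server.

```javascript
const http = require('http');

// The only copy of the cached data lives in this one process.
const store = new Map();

// Request logic kept separate from the HTTP plumbing so it is easy to test.
function handleCacheRequest(method, key, body) {
  if (method === 'PUT') {
    store.set(key, body);
    return { status: 204, body: '' };
  }
  if (method === 'GET') {
    return store.has(key)
      ? { status: 200, body: store.get(key) }
      : { status: 404, body: '' };
  }
  return { status: 405, body: '' };
}

// Hypothetical wiring: every worker sends its reads and writes to this server.
function startCacheServer(port) {
  return http
    .createServer((req, res) => {
      let body = '';
      req.on('data', (chunk) => (body += chunk));
      req.on('end', () => {
        const key = decodeURIComponent(req.url.slice(1));
        const result = handleCacheRequest(req.method, key, body);
        res.writeHead(result.status);
        res.end(result.body);
      });
    })
    .listen(port);
}
```

Because all workers talk to the same process, a value written by one worker is visible to every other worker on the next read.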

Other than caching, when we’re running on a cluster, stateful communication in general becomes a problem. Since the communication is not guaranteed to be with the same worker, creating a stateful channel on any one worker is not an option.

The most common example for this is authenticating users.

Screenshot captured from my Pluralsight course — Advanced Node.js

With a cluster, the authentication request arrives at the master balancer process, which forwards it to a worker. Assume that worker is A in this example.

Screenshot captured from my Pluralsight course — Advanced Node.js

Worker A now recognizes the state of this user. However, when the same user makes another request, the load balancer will eventually send them to other workers, which do not know that this user is authenticated. Keeping a reference to an authenticated user session in one instance’s memory is not going to work anymore.

This problem can be solved in many ways. We can simply share the state across the many workers we have by storing these sessions’ information in a shared database or a Redis node. However, applying this strategy requires some code changes, which is not always an option.
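The idea can be sketched as follows. A plain Map stands in for the shared database or Redis node, and createWorker is a hypothetical factory used only to show that the workers themselves hold no session state. The Map works here only because both stand-in workers live in one process. In a real cluster, the store has to be a separate service.

```javascript
const crypto = require('crypto');

// Stand-in for a shared store such as Redis or a database (assumption:
// in a real cluster this lives outside the worker processes).
const sharedSessions = new Map();

// Hypothetical worker factory: sessions are written to and read from
// the shared store, never from the worker's own memory.
function createWorker(name, sessions) {
  return {
    login(userId) {
      const token = crypto.randomBytes(16).toString('hex');
      sessions.set(token, { userId, authenticatedBy: name });
      return token;
    },
    whoAmI(token) {
      const session = sessions.get(token);
      return session ? session.userId : null;
    },
  };
}

const workerA = createWorker('A', sharedSessions);
const workerB = createWorker('B', sharedSessions);
```

A token issued by workerA.login can then be resolved by workerB.whoAmI, which is exactly what fails when each worker keeps sessions in its own memory.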

If you can’t make the code changes needed to move sessions into shared storage, there is a less invasive but less efficient strategy. You can use what’s known as Sticky Load Balancing. This is much simpler to implement, as many load balancers support this strategy out of the box. The idea is simple. When a user authenticates with a worker instance, we keep a record of that relation on the load balancer level.

Screenshot captured from my Pluralsight course — Advanced Node.js

Then, when the same user sends a new request, we do a lookup in this record to figure out which server has their session authenticated and keep sending them to that server instead of using the normal distributed behavior. This way, the code on the server side does not have to be changed, but we don’t really get the benefit of load balancing for authenticated users. Only use sticky load balancing if you have no other option.
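The lookup described above can be sketched in a few lines. The createStickyBalancer name and the use of a client id as the lookup key are assumptions for illustration. This is not how any particular load balancer implements the feature.

```javascript
// Hypothetical sticky balancer: remembers which server each client
// was first sent to and keeps returning that same server.
function createStickyBalancer(servers) {
  const assignments = new Map(); // clientId -> server
  let next = 0;

  return function pick(clientId) {
    if (!assignments.has(clientId)) {
      // First request from this client: fall back to round-robin.
      assignments.set(clientId, servers[next]);
      next = (next + 1) % servers.length;
    }
    return assignments.get(clientId);
  };
}

const pick = createStickyBalancer(['A', 'B']);
pick('alice'); // 'A' (first request, assigned round-robin)
pick('bob');   // 'B'
pick('alice'); // 'A' again, because the assignment is remembered
```

Note that the assignment table in this sketch grows without bound and is lost if the balancer restarts, which is part of why shared session storage is usually the better long-term option.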

The cluster module actually does not support sticky load balancing, but a few other load balancers can be configured to do sticky load balancing by default.

Thanks for reading.

