performance - freeCodeCamp.org

How to Fix App Jank: A Practical Guide to Profiling Flutter Apps with DevTools

Gidudu Nicholas — Wed, 08 Jul 2026 15:47:35 +0000

Flutter makes it fast to build beautiful UIs. That speed is one of the framework's greatest strengths, but it also creates a subtle problem: performance issues are easy to introduce and difficult to find without the right tools.

Jank — the visible stutters, hitches, and freezes users notice — rarely comes from where developers expect. Networking is blamed when the issue is widget rebuilds. Slow APIs are investigated when the problem is synchronous parsing on the main isolate. State management is refactored when the real culprit is an animation creating a SaveLayer on every frame.

Guessing at performance problems and profiling them are completely different activities. Flutter DevTools makes profiling accessible, precise, and actionable. This article is a practical guide to using it effectively.

What Jank Actually Is
Setting Up for Accurate Profiling
The Performance View: Reading the Frame Timeline
The CPU Profiler: Finding the Root Cause
The Flutter Inspector: Hunting Unnecessary Rebuilds
The Memory View: Catching Leaks Before Users Do
Fixing the Most Common Jank Patterns
Verifying Your Fix Actually Worked
Conclusion

What Jank Actually Is

Jank is any visible stutter, freeze, or hesitation in a Flutter app's UI. It's the feeling that something is slightly wrong, like an animation that skips a beat, a scroll that catches for a moment, or a screen transition that feels heavy.

The source of jank is almost always the same: a frame took too long to produce.

Flutter renders at 60 frames per second on most devices, and 120fps on newer hardware. At 60fps, Flutter has roughly 16.7 milliseconds (1000 ms ÷ 60) to produce each frame: run Dart code, build the widget tree, calculate layout, paint the frame, and hand it to the GPU. At 120fps the budget halves to about 8.3 milliseconds. Miss that deadline and the user sees a dropped frame.

Normal frames (smooth):
│████████░░░░░░░│  12ms — within 16ms budget ✓
│████████░░░░░░░│  12ms — smooth
│████████░░░░░░░│  12ms — smooth

Dropped frame (jank):
│████████░░░░░░░│  12ms — smooth
│████████████████████████│  28ms — OVER BUDGET ✗
│████████░░░░░░░│  12ms — smooth again
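The budget follows directly from the display's refresh rate. As a rough sketch of that arithmetic (written in Python purely as a calculator; the function names are illustrative and not part of Flutter or DevTools), the per-frame budget and the number of refreshes a slow frame misses can be worked out like this:

```python
import math


def frame_budget_ms(refresh_rate_hz: float) -> float:
    """Time available to produce one frame at the given refresh rate."""
    return 1000.0 / refresh_rate_hz


def refreshes_missed(frame_time_ms: float, refresh_rate_hz: float) -> int:
    """How many display refreshes pass without a new frame being ready."""
    budget = frame_budget_ms(refresh_rate_hz)
    # A frame that fits in the budget misses nothing; otherwise it occupies
    # ceil(frame_time / budget) refresh intervals, all but one of them missed.
    return max(0, math.ceil(frame_time_ms / budget) - 1)


print(round(frame_budget_ms(60), 2))   # 16.67
print(round(frame_budget_ms(120), 2))  # 8.33
print(refreshes_missed(12, 60))        # 0
print(refreshes_missed(28, 60))        # 1
```

Note that the same 12ms frame that is comfortable on a 60Hz display is already over budget on a 120Hz one.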

Jank has two distinct origins, and the correct fix depends entirely on which one applies:

UI thread jank: Dart code is doing too much work. Typical causes are expensive widget builds, heavy computation on the main isolate, and synchronous parsing.
Raster thread jank: the scene is too expensive to draw. Typical causes are expensive visual effects, overdraw, too many composited layers, and SaveLayer operations.

DevTools tells you which is responsible. That distinction matters before a single line of code changes.

Setting Up for Accurate Profiling

One constraint matters more than any other: always profile in profile mode, never debug mode.

Debug mode adds significant overhead — extra assertions, hot reload infrastructure, debug paintings, and verbose logging.

An app in debug mode runs measurably slower than in production. Profiling in debug mode surfaces phantom problems that don't exist for users, while real production problems remain hidden.

# Debug mode — distorts measurements, do not use for profiling
flutter run

# Profile mode — matches production performance
# with DevTools still connected
flutter run --profile

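Profile mode removes debug overhead while keeping the DevTools connection alive. It's the closest measurement possible to real user experience. Run it on a physical device, because emulators and simulators are not representative of real performance.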

Opening DevTools from VS Code:

Cmd+Shift+P (Ctrl+Shift+P on Windows and Linux) → Flutter: Open DevTools → select Performance

The performance overlay can also be enabled directly in the app during development, giving an immediate visual signal of frame budget violations without opening DevTools:

MaterialApp(
  // Two bars appear at the top of the screen.
  // Top bar: UI thread. Bottom bar: raster thread.
  // Green means within budget. Red means over budget.
  showPerformanceOverlay: true,
  home: const MyScreen(),
)

The Performance View: Reading the Frame Timeline

The Performance view is the starting point for any jank investigation. Interact with the app while watching the frame chart fill in — scroll a list, trigger an animation, and navigate between screens.

The Frame Chart

Each vertical bar represents one frame. Height represents duration. The red horizontal line marks the 16ms budget.

Frame chart:
     ▲ ms
  28 │           ██
  20 │           ██
  16 │─────────────────── red line (16ms budget)
  12 │ ██  ██    ██  ██
   8 │ ██  ██    ██  ██
   0 └─────────────────────────────→ frames
       ok  ok  JANK  ok

Any bar above the red line is a janky frame. Clicking on it reveals the detailed breakdown of what happened during that specific frame.

The Two Threads

Clicking a janky frame shows a flame chart split into two sections:

UI Thread     ████████████████░░░░  — Dart code execution
Raster Thread ████░░░░░░░░░░░░      — GPU work

A tall UI thread bar indicates that Dart code is the problem. A tall raster thread bar indicates that the GPU is struggling with paint operations.

Reading the Flame Chart

The flame chart is a horizontal bar chart. Each row is a function call. Width represents duration. Rows are stacked to show the call hierarchy.

Frame (28ms total)
├── dart:ui (16ms)
│   └── build (14ms)
│       ├── ExpensiveList.build (8ms)
│       │   └── _buildItem (8ms)     ← wide bar = expensive
│       └── AppBar.build (2ms)
└── layout (4ms)

The wide bars deepest in the stack are the functions where the time is actually being spent. The rows above them are their callers, which look wide only because they include the time of everything they call.

The CPU Profiler: Finding the Root Cause

The Performance view identifies that a frame was slow. The CPU Profiler identifies exactly which function caused it.

Recording a Profile

Open the CPU Profiler tab in DevTools
Click Record
Reproduce the janky interaction
Click Stop
DevTools builds a flame graph from the recording

Reading the Flame Graph

CPU Profiler flame graph:
                                         ← time →
_CounterScreenState.build [████████████████] 45ms
  Column.build            [████████████   ] 35ms
    ExpensiveWidget.build [████████████   ] 35ms
      _buildRows          [████████       ] 25ms
        jsonDecode        [████████       ] 25ms  ← root cause

The widest bars indicate where time is being spent. In this example, jsonDecode is being called inside a build method — running on every rebuild rather than once outside the widget tree.

The Bottom-up Table

The bottom-up table shows which individual functions are doing the most direct work:

Self time: time spent inside the function itself, excluding functions it called. High self time means this function is intrinsically expensive.
Total time: time including all downstream function calls. High total time means this function triggers expensive work somewhere below it.

Sorting by self time identifies the root cause. Sorting by total time identifies the trigger.
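The relationship between the two columns is easy to see on the flame graph example from the previous section. The following is a minimal sketch in Python, assuming a simple call-tree structure (the `Frame` class and `flatten` helper are hypothetical and are not how DevTools stores profiles); the timings are the ones shown in that example:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """One node in a call tree: a function and the calls it made."""
    name: str
    total_ms: float
    children: list[Frame] = field(default_factory=list)

    def self_ms(self) -> float:
        # Self time is what remains after subtracting the callees' total time.
        return self.total_ms - sum(c.total_ms for c in self.children)


def flatten(frame: Frame) -> list[Frame]:
    """Return the frame and all of its descendants in call order."""
    result = [frame]
    for child in frame.children:
        result.extend(flatten(child))
    return result


tree = Frame("_CounterScreenState.build", 45, [
    Frame("Column.build", 35, [
        Frame("ExpensiveWidget.build", 35, [
            Frame("_buildRows", 25, [
                Frame("jsonDecode", 25),
            ]),
        ]),
    ]),
])

by_self = sorted(flatten(tree), key=lambda f: f.self_ms(), reverse=True)
by_total = sorted(flatten(tree), key=lambda f: f.total_ms, reverse=True)

print("highest self time: ", by_self[0].name)    # jsonDecode
print("highest total time:", by_total[0].name)   # _CounterScreenState.build
```

Sorted by self time, jsonDecode comes first because all 25ms are spent inside it. Sorted by total time, the build method comes first because everything expensive happens somewhere beneath it.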

Fixing a CPU-bound Bottleneck

Parsing large responses synchronously on the main isolate is one of the most common causes of UI thread jank. The fix is moving that work to a separate isolate:

// Before — blocking the main isolate on every search result
Future<List<User>> processResults(dynamic data) async {
  return (data as List)
      .map((json) => User.fromJson(json))
      .toList();
}

// After — parsing in a background isolate
// The main isolate stays free to render frames
// while parsing happens concurrently
Future<List<User>> processResults(dynamic data) async {
  return Isolate.run(() {
    return (data as List)
        .map((json) => User.fromJson(json as Map<String, dynamic>))
        .toList();
  });
}

One caveat worth understanding: passing a large object graph into Isolate.run copies that data across the isolate boundary. For massive payloads, that copying overhead can rival the cost of the work being offloaded.

The safer pattern for large JSON responses is to pass the raw response string into the isolate and parse it there, rather than passing an already-decoded object:

import 'dart:convert';
import 'dart:isolate';

// Safer for large payloads: the raw string is immutable, so it
// can be handed to the isolate cheaply and parsed there. Only the
// final typed list comes back to the main isolate, so no
// intermediate decoded object graph crosses the boundary.
Future<List<User>> processResults(String rawJson) async {
  return Isolate.run(() {
    final data = jsonDecode(rawJson) as List;
    return data
        .map((json) => User.fromJson(json as Map<String, dynamic>))
        .toList();
  });
}

The Flutter Inspector: Hunting Unnecessary Rebuilds

Not all jank comes from expensive individual operations. Some comes from too many rebuilds — widgets that don't need to update rebuilding anyway because a parent called setState.

Enabling Rebuild Counting

In the DevTools Inspector tab, open settings and enable "Track widget build counts." Interact with the app and DevTools displays rebuild counts next to each widget:

Widget Tree with rebuild counts:
MyApp                         0 rebuilds
└── MaterialApp               0 rebuilds
    └── CounterScreen        24 rebuilds  (owns the counter state)
        └── Scaffold         24 rebuilds
            └── Column       24 rebuilds
                ├── Text     24 rebuilds  (necessary)
                ├── Text     24 rebuilds  (necessary)
                └── ExpensiveList  24 rebuilds  (PROBLEM)

ExpensiveList rebuilds 24 times despite having no dependency on the counter state. It rebuilds because it lives in the same subtree as the widgets that do need to update.

Fixing Unnecessary Rebuilds by Extracting State

The solution is extracting the stateful portion into its own widget. Only that widget rebuilds when state changes. Everything else is untouched.

// Before — the entire Scaffold rebuilds on every setState
class _CounterScreenState extends State<CounterScreen> {
  int _count = 0;

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: Column(
        children: [
          Text('Count: $_count'),
          ElevatedButton(
            onPressed: () => setState(() => _count++),
            child: const Text('Increment'),
          ),
          // This never changes, but because it is not const
          // a new instance is built on every tap
          ExpensiveList(),
        ],
      ),
    );
  }
}

// After — CounterDisplay owns its own state
// ExpensiveList never rebuilds
class CounterDisplay extends StatefulWidget {
  const CounterDisplay({super.key});

  @override
  State<CounterDisplay> createState() => _CounterDisplayState();
}

class _CounterDisplayState extends State<CounterDisplay> {
  int _count = 0;

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        Text('Count: $_count'),
        ElevatedButton(
          onPressed: () => setState(() => _count++),
          child: const Text('Increment'),
        ),
      ],
    );
  }
}

// The screen is now stateless — it never rebuilds
class CounterScreen extends StatelessWidget {
  const CounterScreen({super.key});

  @override
  Widget build(BuildContext context) {
    return const Scaffold(
      body: Column(
        children: [
          CounterDisplay(),   // rebuilds when count changes
          ExpensiveList(),    // never rebuilds
        ],
      ),
    );
  }
}

Using RepaintBoundary to Isolate Expensive Painting

When one section of the UI repaints frequently while adjacent sections remain static, RepaintBoundary places those sections on separate layers. The frequently-updated section repaints independently without touching the static content.

// Without RepaintBoundary — the animation causes
// the entire Column to repaint on every frame
Column(
  children: [
    AnimatedWidget(controller: _controller),
    const ExpensiveStaticContent(),
  ],
)

// With RepaintBoundary — ExpensiveStaticContent
// lives on its own layer and is never repainted
// during the animation
Column(
  children: [
    AnimatedWidget(controller: _controller),
    const RepaintBoundary(
      child: ExpensiveStaticContent(),
    ),
  ],
)

RepaintBoundary should be used deliberately, not broadly. Every boundary creates an additional compositing layer the GPU must handle. Overuse introduces raster thread overhead that offsets any UI thread savings.

The Memory View: Catching Leaks Before Users Do

Jank from memory leaks behaves differently from other types. It doesn't appear immediately.

The app performs well for the first several minutes, then degrades progressively as memory climbs and the garbage collector works harder to reclaim space. By the time a user reports erratic behavior or slowdowns, the leak has been accumulating for a while.

What a Memory Leak Looks Like in DevTools

The Memory view charts heap usage over time. A healthy app shows a sawtooth pattern — memory rises as objects are allocated, then drops sharply when the garbage collector runs.

Healthy memory:
     ▲ MB
  60 │     ▲       ← GC runs, heap returns to baseline
  40 │   ██│██
  20 │ ██  │  ██▼  ← rises then drops back down
   0 └──────────────→ time
       stable baseline

Memory leak:
     ▲ MB
  80 │               ██
  60 │         ████
  40 │    ████         ← never returns to baseline
  20 │████
   0 └──────────────→ time
       baseline rising

Finding a Leak

The process for isolating a leak:

Open the Memory view and note the current heap size
Navigate to the suspected screen
Navigate away from it
Click the GC button in DevTools to force garbage collection
Observe the heap: if it doesn't drop to near its previous level, something from that screen is still reachable

Taking a snapshot before and after the navigation and comparing the two reveals which objects remained in memory when they should have been collected.
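Conceptually, the comparison is a per-class difference in live instance counts. A minimal sketch of that idea in Python, assuming each snapshot has been reduced to a mapping of class name to instance count (the function name and the counts below are made up for illustration and do not reflect the DevTools snapshot format):

```python
def retained_after_navigation(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Classes with more live instances after leaving the screen than before entering it."""
    growth = {}
    for cls, count in after.items():
        delta = count - before.get(cls, 0)
        if delta > 0:
            growth[cls] = delta
    return growth


# Illustrative counts only: taken before opening the screen, and again
# after navigating away and forcing a garbage collection.
before = {"_HomeScreenState": 1, "AnimationController": 1, "_ChatScreenState": 0}
after = {"_HomeScreenState": 1, "AnimationController": 2, "_ChatScreenState": 1}

print(retained_after_navigation(before, after))
# {'AnimationController': 1, '_ChatScreenState': 1}
```

Any State class from the screen that was just left appearing in the result is a strong hint that something is still holding a reference to it.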

The Most Common Sources of Leaks

Undisposed AnimationController:

class _AnimatedScreenState extends State<AnimatedScreen>
    with SingleTickerProviderStateMixin {
  late final AnimationController _controller;

  @override
  void initState() {
    super.initState();
    _controller = AnimationController(
      vsync: this,
      duration: const Duration(milliseconds: 300),
    );
  }

  @override
  void dispose() {
    // Without this, the Ticker fires on every frame
    // indefinitely, holding the State in memory
    _controller.dispose();
    super.dispose();
  }
}

Uncanceled StreamSubscription:

class _ChatScreenState extends State<ChatScreen> {
  StreamSubscription? _subscription;

  @override
  void initState() {
    super.initState();
    _subscription = messageStream.listen((message) {
      if (mounted) setState(() => messages.add(message));
    });
  }

  @override
  void dispose() {
    // Without cancel(), the stream holds a reference
    // to this callback, which holds a reference to
    // the State, preventing garbage collection
    _subscription?.cancel();
    super.dispose();
  }
}

Anything created in initState that exposes a dispose(), cancel(), or close() method requires that method to be called in dispose(). There are no exceptions to this rule.

Fixing the Most Common Jank Patterns

DevTools consistently surfaces the same categories of jank in production Flutter apps. The fixes are direct once the root cause is known.

Expensive Synchronous Work on the Main Isolate

DevTools signal: tall UI thread bar, CPU Profiler shows parsing or sorting functions with high self time.

// Before — sorting 10,000 items synchronously
// blocks the main isolate for 80-200ms on slower devices
final sorted = List.from(items)
  ..sort((a, b) => a.name.compareTo(b.name));

// After — sorting in a background isolate
final sorted = await Isolate.run(() {
  final copy = List.from(items);
  copy.sort((a, b) => a.name.compareTo(b.name));
  return copy;
});

Future Created Inside Build

DevTools signal: Network view shows duplicate API calls for the same endpoint. CPU Profiler shows network functions called multiple times per user interaction.

// Before — a new Future is created on every rebuild.
// FutureBuilder treats each new Future as a fresh
// operation and resets to loading state.
@override
Widget build(BuildContext context) {
  return FutureBuilder(
    future: repository.fetchUser(userId),
    builder: (context, snapshot) { ... },
  );
}

// After — the Future is created once in initState
// and reused across all subsequent rebuilds
late final Future<User> _userFuture;

@override
void initState() {
  super.initState();
  _userFuture = repository.fetchUser(widget.userId);
}

@override
Widget build(BuildContext context) {
  return FutureBuilder(
    future: _userFuture,
    builder: (context, snapshot) { ... },
  );
}

Large List Rendered as a Column

DevTools signal: the first frame after navigating to a list screen is significantly slower than subsequent frames. Inspector shows a Column with hundreds of children.

// Before — builds all items at once regardless
// of how many are currently visible
Column(
  children: items
      .map((item) => ItemCard(item: item))
      .toList(),
)

// After — builds only the items currently
// visible on screen plus a small buffer
ListView.builder(
  itemCount: items.length,
  itemBuilder: (context, index) {
    return ItemCard(item: items[index]);
  },
)

Animated Opacity Causing Raster Thread Jank

DevTools signal: tall raster thread bar. Flame chart shows SaveLayer operations during animation.

Opacity with a changing value forces Flutter to render the child widget to an offscreen buffer on every frame before compositing it at the target opacity. This SaveLayer operation is one of the most expensive things the raster thread can do.

A note on Impeller: Flutter's newer rendering backend, Impeller — now the default on iOS and rolling out on Android — significantly reduces the penalty of SaveLayer operations and eliminates the shader compilation jank that affected the older Skia engine.

If the app targets only recent Flutter versions with Impeller enabled, raster thread jank from Opacity animations may be less severe than on Skia. The guidance to prefer FadeTransition over animated Opacity still holds, but the urgency is lower on Impeller than it was historically.

// Bad: rebuilding Opacity with a new value on every
// animation frame rebuilds the widget (and possibly its
// subtree) and paints the child into an offscreen buffer
Opacity(
  opacity: _animationValue,
  child: myWidget,
)

// Good: FadeTransition is driven directly by an
// Animation<double>, so the opacity updates at the render
// layer without rebuilding the subtree on every frame
FadeTransition(
  opacity: _animation,
  child: myWidget,
)

Verifying Your Fix Actually Worked

Performance optimisation has a tendency to move the bottleneck rather than eliminate it. Fixing one slow function sometimes reveals that the next-slowest operation now dominates the frame time.

Measuring before and after every fix prevents this from becoming invisible.

The verification process:

Profile in profile mode before making any changes
Record the worst-case frame time during the problematic interaction
Note which thread is the bottleneck
Apply the fix
Profile again under identical conditions
Compare frame times and thread breakdowns

Frame timings can also be captured programmatically, which is useful for tracking improvements over time or validating fixes in CI:

WidgetsBinding.instance.addTimingsCallback((timings) {
  for (final timing in timings) {
    if (timing.totalSpan.inMilliseconds > 16) {
      debugPrint(
        'Slow frame: ${timing.totalSpan.inMilliseconds}ms '
        'build: ${timing.buildDuration.inMilliseconds}ms '
        'raster: ${timing.rasterDuration.inMilliseconds}ms',
      );
    }
  }
});

If measurements improve consistently after a fix, the root cause was correctly identified. If measurements don't improve, the real bottleneck is elsewhere and another round of profiling is needed before changing more code.
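One way to make that comparison less subjective is to summarize the logged frame times from both runs with the same few numbers. The sketch below is in Python and assumes the per-frame totals have already been collected into plain lists, for example from the callback's log output; the function names and the sample timings are illustrative, not measurements:

```python
from statistics import median


def summarize(frame_times_ms, budget_ms=1000 / 60):
    """Worst frame, median frame, and number of frames over budget."""
    return {
        "worst_ms": max(frame_times_ms),
        "median_ms": median(frame_times_ms),
        "over_budget": sum(1 for t in frame_times_ms if t > budget_ms),
    }


def fix_helped(before, after):
    """True only if both the worst frame and the over-budget count improved."""
    b, a = summarize(before), summarize(after)
    return a["worst_ms"] < b["worst_ms"] and a["over_budget"] < b["over_budget"]


# Made-up timings for the same interaction, before and after a fix.
before = [12, 12, 28, 12, 13, 31, 12, 12]
after = [12, 12, 14, 12, 13, 15, 12, 12]

print(summarize(before))
print(summarize(after))
print(fix_helped(before, after))  # True
```

The median barely moves in this example, which is typical: jank lives in the worst frames, so the worst case and the over-budget count are the numbers worth tracking.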

Conclusion

Performance problems in Flutter applications rarely come from where developers initially suspect.

The most reliable approach is to profile first, then fix. Not the other way around.

DevTools provides complete visibility into frame timing, CPU usage, widget rebuild frequency, and memory behavior. The Performance view identifies which thread is responsible for a slow frame. The CPU Profiler identifies the specific function causing it. The Inspector surfaces unnecessary rebuild propagation. The Memory view reveals leaks before they affect users.

Profiling in profile mode, profiling before optimizing, and measuring after optimizing are the three habits that make jank a solvable engineering problem rather than a recurring mystery.

The answer to most Flutter performance questions is already in DevTools. Open it before changing any code.

How to Scale Laravel Applications for High-Traffic Production Systems

Olamilekan Lamidi — Thu, 11 Jun 2026 23:45:39 +0000

Your first scaling problem rarely arrives with a bang. For a while, everything is fine: pages load fast, the database barely breaks a sweat, and the team ships features without thinking much about infrastructure.

Then traffic climbs. A campaign over-performs. A marketplace onboards a popular seller. A SaaS product signs a couple of enterprise accounts.

Suddenly, /dashboard takes two seconds instead of 300 milliseconds. Queue jobs that used to clear in seconds sit waiting for minutes. Database CPU spikes every afternoon.

So you add another app server, and response time barely moves because the real culprit was a slow query on a large table all along.

If you have run Laravel in production, you've probably lived some version of this. The good news is that scaling Laravel almost never means abandoning the framework. It means learning where pressure builds and making the application behave predictably under load.

In this guide, you'll learn how to find common bottlenecks, tune the database, use Redis effectively, move slow work onto queues, optimize APIs, and monitor a Laravel application in production.

None of this requires a single heroic rewrite. The biggest wins usually come from practical work: removing inefficient queries, pushing slow tasks onto queues, adding the right indexes, caching carefully chosen data, and measuring whether each change actually helped.

Prerequisites

You'll get the most out of this guide if you're already comfortable with:

Building applications with Laravel and PHP
Writing Eloquent queries and database migrations
Using queues, jobs, and scheduled commands
Reading a basic database query plan
Deploying Laravel to a production server or platform
Working with Redis and either MySQL or PostgreSQL in a production-like setup

What Happens When Laravel Apps Start Growing
Common Laravel Bottlenecks
How to Optimize the Database
How to Scale with Redis
How to Use Queue-Driven Architectures
How to Optimize API Performance
How to Monitor Laravel in Production
An Example High-Traffic Laravel Architecture
Lessons Learned the Hard Way
A Pre-Launch Scaling Checklist
Conclusion
References

What Happens When Laravel Apps Start Growing

Traffic changes a system's behavior because it turns small inefficiencies into permanent costs. A query that takes 80 milliseconds is harmless when it runs a few hundred times an hour. Run it 30 times per page view on a page that gets thousands of hits a minute, and that same query becomes a capacity problem.
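The arithmetic behind that claim is worth doing once. Here is a back-of-the-envelope sketch in Python, where 2,000 page views a minute is an assumed figure standing in for "thousands" and 300 runs an hour stands in for "a few hundred":

```python
def db_seconds_per_second(query_ms: float, queries_per_view: int, views_per_minute: float) -> float:
    """Seconds of database time demanded per wall-clock second by one page."""
    per_view_s = query_ms * queries_per_view / 1000.0
    return per_view_s * views_per_minute / 60.0


# Quiet: the 80 ms query runs about 300 times an hour (5 times a minute).
quiet = db_seconds_per_second(80, 1, 300 / 60)

# Busy: the same query runs 30 times per view at an assumed 2,000 views a minute.
busy = db_seconds_per_second(80, 30, 2000)

print(round(quiet, 4))  # 0.0067
print(round(busy, 1))   # 80.0
```

Under these assumptions the page asks for about 80 seconds of query time every second, which is roughly 80 database connections kept permanently busy by a single endpoint.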

The pressure tends to show up in predictable places. More requests mean more PHP workers, more database connections, more queue volume, and more Redis operations.

The database, whether MySQL or PostgreSQL, is usually the first thing to buckle. Queues back up when work is created faster than workers can drain it. Caches only help when hit rates stay high and misses stay controlled. And scaling everything horizontally can turn sloppy code into an expensive cloud bill.

That's why scaling work has to start with measurement, not guesswork. Before you change anything, you want to know what is actually saturated: request CPU, database I/O, lock contention, Redis latency, queue depth, an external API, or oversized payloads.

A typical request in a growing Laravel app travels through several layers. The user sends a request, a load balancer routes it to an app server, and Laravel checks Redis for a cached result. On a miss, it queries the database, stores the computed result back in Redis, and hands any slow follow-up work to a queue. A worker picks up that job later while Laravel returns the response right away.

Here's the important part: adding more app servers does nothing for a slow query, a missing index, or an overloaded queue. Horizontal scaling only pays off once the shared dependencies behind those servers can keep up.

Common Laravel Bottlenecks

Laravel itself causes very few scaling problems. Most issues come from how application code talks to the database, the network, and background workers.

N+1 Queries

The classic offender is the N+1 query. You load a list of models, then lazily touch a relationship on each one:

use App\Models\Post;

$posts = Post::latest()->take(50)->get();

foreach ($posts as $post) {
    echo $post->author->name;
}

That's one query for the posts plus one query per author: 51 queries for a single page. Eager load the relationship instead:

use App\Models\Post;

$posts = Post::with('author')
    ->latest()
    ->take(50)
    ->get();

foreach ($posts as $post) {
    echo $post->author->name;
}

In production, these are sneaky. They often hide inside API Resources, Blade components, and authorization checks, where the relationship access isn't obvious from the controller.
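To see the difference in query counts without a framework in the way, the same access pattern can be reproduced against a throwaway database. This is a sketch in Python with an in-memory SQLite database standing in for MySQL or PostgreSQL; the `run` helper, table layout, and function names are assumptions for illustration, and the eager version only approximates what an ORM's eager loading does with a single `IN` query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"Author {i}") for i in range(1, 51)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i, f"Post {i}") for i in range(1, 51)])

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, so the two strategies can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def lazy_author_names():
    # One query for the posts, then one more query per post for its author.
    posts = run("SELECT id, author_id FROM posts ORDER BY id DESC LIMIT 50")
    return [run("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0]
            for _, author_id in posts]


def eager_author_names():
    # One query for the posts, one query for all of their authors.
    posts = run("SELECT id, author_id FROM posts ORDER BY id DESC LIMIT 50")
    ids = sorted({author_id for _, author_id in posts})
    placeholders = ",".join("?" * len(ids))
    authors = dict(run(f"SELECT id, name FROM authors WHERE id IN ({placeholders})", ids))
    return [authors[author_id] for _, author_id in posts]


query_count = 0
lazy = lazy_author_names()
lazy_queries = query_count

query_count = 0
eager = eager_author_names()
eager_queries = query_count

print(lazy_queries, eager_queries)  # 51 2
```

Both versions return the same names; only the number of round trips to the database changes.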

Missing Indexes

Adding an index is one of the highest-return fixes you can make. Take a query like this:

$orders = Order::where('account_id', $accountId)
    ->where('status', 'paid')
    ->whereBetween('created_at', [$start, $end])
    ->latest()
    ->paginate(50);

If orders has millions of rows and no useful compound index, the database scans far more rows than it needs to. Add an index that matches how you actually query:

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->index(['account_id', 'status', 'created_at']);
        });
    }

    public function down(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->dropIndex(['account_id', 'status', 'created_at']);
        });
    }
};

Indexes aren't free, though. They take up space and slow down writes. Add them for real, repeated query patterns, not for every column that ever appears in a where clause.

Inefficient Eager Loading

You can also swing too far the other way. Loading every relationship "just in case" burns memory and ships data the request never uses:

$users = User::with([
    'profile',
    'teams',
    'roles.permissions',
    'invoices.lineItems.product',
])->get();

That might be fine for an admin detail page showing one user. On a list page, it's a liability. Constrain the eager loads and select only the columns you need:

$users = User::query()
    ->select(['id', 'name', 'email'])
    ->with([
        'profile:id,user_id,avatar_url',
        'teams:id,name',
    ])
    ->latest()
    ->paginate(25);

One caveat: tightly scoped select lists can break later code that expects a column you didn't load. Keep this technique close to read-heavy endpoints where the payoff is obvious.

Synchronous Processing

High-traffic apps need short web requests. Sending email, generating PDFs, calling third-party APIs, resizing images, and building exports usually belong outside the request cycle. This version can hurt you:

public function store(StoreOrderRequest $request)
{
    $order = Order::create($request->validated());

    Mail::to($order->user)->send(new OrderReceipt($order));

    return response()->json($order, 201);
}

Push the work onto a queue instead:

public function store(StoreOrderRequest $request)
{
    $order = Order::create($request->validated());

    SendOrderReceipt::dispatch($order->id);

    return response()->json([
        'id' => $order->id,
        'status' => 'accepted',
    ], 202);
}

Now your response time no longer depends on your mail provider. If the provider has a slow afternoon, the queue absorbs it and your users don't have to wait.

Large Payloads

Oversized JSON responses hurt everyone in the chain: the app server serializing them, the network carrying them, and the client parsing them. A frequent mistake is returning whole models when you meant to return a summary:

return User::with('orders', 'invoices', 'teams')->findOrFail($id);

Define an explicit API Resource instead:

use Illuminate\Http\Resources\Json\JsonResource;

class UserSummaryResource extends JsonResource
{
    public function toArray($request): array
    {
        return [
            'id' => $this->id,
            'name' => $this->name,
            'avatar_url' => $this->profile?->avatar_url,
            'plan' => $this->subscription_plan,
        ];
    }
}

A small, deliberate response contract keeps endpoint cost easy to reason about and prevents accidental coupling.

Expensive Joins

Joins are useful, but expensive joins across large tables can dominate your database time, especially when they sort or filter on columns that aren't indexed:

$rows = DB::table('orders')
    ->join('users', 'users.id', '=', 'orders.user_id')
    ->join('accounts', 'accounts.id', '=', 'users.account_id')
    ->where('accounts.region', 'us-east')
    ->where('orders.status', 'paid')
    ->orderByDesc('orders.created_at')
    ->limit(100)
    ->get();

At scale, you may need to denormalize a small field, precompute a reporting table, or move analytics off the primary transactional database entirely. Do not treat denormalization as an admission of defeat. Copying a stable field like account_id onto orders can remove a costly join from a hot path. The price you pay is keeping that duplicated data consistent, which can be a worthwhile trade-off.

How to Optimize the Database

When a Laravel app slows down, the database is usually the first place to look.

Add Indexes Around Real Query Patterns

Start with your slow query log, database metrics, and traces rather than intuition. If the app constantly looks up active subscriptions by account, build a compound index that matches that access pattern:

Schema::table('subscriptions', function (Blueprint $table) {
    $table->index(['account_id', 'status', 'renews_at']);
});

Then write the query so it can actually use the index:

$subscription = Subscription::where('account_id', $accountId)
    ->where('status', 'active')
    ->where('renews_at', '>=', now())
    ->orderBy('renews_at')
    ->first();

Get in the habit of running EXPLAIN after you add an index to confirm that the plan changed. An index the optimizer ignores is just write overhead.
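The before-and-after check can be tried on a scratch database. The sketch below uses Python with in-memory SQLite and its `EXPLAIN QUERY PLAN` statement as a stand-in; MySQL and PostgreSQL have their own `EXPLAIN` output formats, and the index name and `plan` helper here are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE subscriptions (
        id INTEGER PRIMARY KEY,
        account_id INTEGER,
        status TEXT,
        renews_at TEXT
    )
""")

QUERY = """
    SELECT * FROM subscriptions
    WHERE account_id = ? AND status = 'active' AND renews_at >= ?
    ORDER BY renews_at
    LIMIT 1
"""


def plan(sql: str, params: tuple) -> str:
    """Return SQLite's query plan as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


params = (42, "2026-01-01")
before = plan(QUERY, params)

conn.execute("""
    CREATE INDEX subscriptions_account_status_renews
    ON subscriptions (account_id, status, renews_at)
""")
after = plan(QUERY, params)

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should describe a scan of the table and the second should name the new index. If the second plan does not mention the index, the query and the index do not match.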

Use Eager Loading Deliberately

Match eager loading to what the endpoint actually returns. For list endpoints, keep relationships shallow and constrained:

$projects = Project::query()
    ->select(['id', 'account_id', 'name', 'updated_at'])
    ->withCount('openTasks')
    ->with([
        'owner:id,name',
    ])
    ->where('account_id', $accountId)
    ->latest('updated_at')
    ->paginate(30);

When you only need a number, withCount beats loading a whole relationship to count it:

$teams = Team::query()
    ->withCount([
        'members',
        'invitations as pending_invitations_count' => fn ($query) => $query->whereNull('accepted_at'),
    ])
    ->paginate(25);

Your memory footprint stays flat, which matters much more on a list page than on a detail page.

Optimize Queries Before Adding Hardware

A bigger database instance buys you time. It also hides the inefficient queries that put you there until the next traffic jump exposes them again. Before you reach for a larger machine, find your highest-cost queries. In local or staging environments, logging slow ones is easy:

use Illuminate\Database\Events\QueryExecuted;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

DB::listen(function (QueryExecuted $query) {
    if ($query->time > 100) {
        Log::warning('Slow query detected', [
            'sql' => $query->toRawSql(),
            'time_ms' => $query->time,
        ]);
    }
});

Be careful doing this in production. Bindings can contain sensitive data, and verbose logging at high volume can become its own performance problem.

Process Large Tables with Chunking

Never pull an entire large table into memory for a batch job:

User::where('is_active', true)
    ->chunkById(1000, function ($users) {
        foreach ($users as $user) {
            RefreshUserSearchIndex::dispatch($user->id);
        }
    });

chunkById is safer than offset-based chunking when rows can change while the job runs, because it tracks the last seen ID instead of a numeric offset. For very large exports, stream the records or write them out in batches.
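Here's a small simulation of why the offset approach drops rows. It's a plain Python model, not Laravel's implementation: the hypothetical job deactivates every user it handles, so the filtered set shrinks while the batch runs.

```python
def make_users(n):
    return [{"id": i, "is_active": True} for i in range(1, n + 1)]

def active(users):
    """Stand-in for: SELECT * FROM users WHERE is_active ORDER BY id."""
    return [u for u in users if u["is_active"]]

def chunk_by_offset(users, size, handle):
    offset = 0
    while True:
        batch = active(users)[offset:offset + size]   # LIMIT size OFFSET offset
        if not batch:
            return
        for user in batch:
            handle(user)
        offset += size

def chunk_by_id(users, size, handle):
    last_id = 0
    while True:
        # WHERE id > last_id ORDER BY id LIMIT size
        batch = [u for u in active(users) if u["id"] > last_id][:size]
        if not batch:
            return
        for user in batch:
            handle(user)
        last_id = batch[-1]["id"]

def run(chunker):
    users, seen = make_users(10), []

    def deactivate(user):
        seen.append(user["id"])
        user["is_active"] = False   # the row leaves the filtered set mid-run

    chunker(users, 3, deactivate)
    return seen

offset_seen = run(chunk_by_offset)
keyset_seen = run(chunk_by_id)
print("offset:", offset_seen)
print("by id: ", keyset_seen)
```

The offset version silently skips users 4, 5, 6, and 10, because each processed batch shifts the remaining rows underneath the next offset. The ID-based version visits all ten.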

Use Cursor Pagination for High-Volume Feeds

Offset pagination gets slower the deeper a user scrolls, because the database still has to skip every row it's not returning. For feeds, audit logs, messages, and timelines, cursor pagination is usually the better fit:

$events = AuditEvent::query()
    ->where('account_id', $accountId)
    ->orderByDesc('id')
    ->cursorPaginate(50);

return AuditEventResource::collection($events);

It relies on a stable, indexed ordering column and uses next/previous cursors rather than arbitrary page numbers, which is what an infinite-scroll feed usually needs.
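Under the hood, the idea is a "seek" query: instead of skipping rows, each page asks for rows beyond the last ID the client saw. The sketch below shows that shape with sqlite3 as a stand-in database. Using the raw ID as the cursor is a simplification for illustration, and the table contents are made up.

```python
import sqlite3

def fetch_page(conn, account_id, limit, cursor=None):
    """Return (rows, next_cursor). cursor is the last id already seen, or None."""
    sql = "SELECT id, action FROM audit_events WHERE account_id = ?"
    params = [account_id]
    if cursor is not None:
        sql += " AND id < ?"           # seek past what the client already has
        params.append(cursor)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(limit + 1)           # one extra row tells us if another page exists
    rows = conn.execute(sql, params).fetchall()
    has_more = len(rows) > limit
    rows = rows[:limit]
    next_cursor = rows[-1][0] if has_more else None
    return rows, next_cursor

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_events (id INTEGER PRIMARY KEY, account_id INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO audit_events (account_id, action) VALUES (?, ?)",
    [(1 if i % 4 else 2, f"event-{i}") for i in range(1, 11)],
)

pages, cursor = [], None
while True:
    rows, cursor = fetch_page(conn, 1, 3, cursor)
    pages.append([row[0] for row in rows])
    if cursor is None:
        break

print(pages)
```

Every page costs roughly the same no matter how deep the user scrolls, because the database seeks straight to the cursor position through the index.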

Split Reads with Read Replicas

As read traffic grows, replicas can take load off the primary:

'mysql' => [
    'driver' => 'mysql',
    'read' => [
        'host' => [
            env('DB_READ_HOST', '127.0.0.1'),
        ],
    ],
    'write' => [
        'host' => [
            env('DB_WRITE_HOST', '127.0.0.1'),
        ],
    ],
    'sticky' => true,
    'database' => env('DB_DATABASE', 'laravel'),
    'username' => env('DB_USERNAME', 'root'),
    'password' => env('DB_PASSWORD', ''),
],

The sticky option keeps reads on the write connection after a write within the same request, which helps avoid some read-after-write surprises.

Replicas come with replication lag, and that lag matters. Don't route payment confirmations, password changes, permission checks, or anything else consistency-sensitive to a replica that might be a few seconds stale unless the business flow can genuinely tolerate seeing old data.
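A simplified model of that routing decision might look like the following. It's a sketch of the behavior described above, not Laravel's connection-resolution code, and the needs_fresh_data flag is a hypothetical stand-in for explicitly choosing the write connection on a consistency-sensitive read.

```python
class StickyRouter:
    """Per-request router: reads go to a replica until this request writes."""

    def __init__(self, sticky=True):
        self.sticky = sticky
        self.wrote = False

    def connection_for(self, operation, needs_fresh_data=False):
        if operation == "write":
            self.wrote = True
            return "primary"
        # Consistency-sensitive reads, or reads after a write in a sticky
        # request, must not risk a stale replica.
        if needs_fresh_data or (self.sticky and self.wrote):
            return "primary"
        return "replica"

router = StickyRouter()
before_write = router.connection_for("read")
router.connection_for("write")
after_write = router.connection_for("read")
print(before_write, after_write)
```

Keep in mind that stickiness only covers the request that performed the write. A follow-up request from the same user starts fresh and can still land on a lagging replica.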

How to Scale with Redis

Redis often does a lot in a Laravel production stack: caching, sessions, rate limiting, queues, locks, and Horizon metrics. It's fast, but it still needs thought: sensible key design, expiration policies, memory monitoring, and a real plan for invalidation.

Caching

Cache expensive reads that get requested often and can tolerate being slightly out of date:

use Illuminate\Support\Facades\Cache;

$stats = Cache::remember(
    "accounts:{$account->id}:dashboard-stats",
    now()->addMinutes(5),
    fn () => DashboardStats::forAccount($account)->calculate()
);

Short time-to-live values go a surprisingly long way. A five-minute cache can wipe out thousands of duplicate queries while keeping the data fresh enough for most dashboards.

When the data changes after a known event, invalidate it explicitly:

Order::created(function (Order $order) {
    Cache::forget("accounts:{$order->account_id}:dashboard-stats");
});

Caching works best when your keys are predictable and your invalidation is tied to domain events rather than guesswork.
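If it helps to see the moving parts, here's a minimal in-memory model of the remember-and-forget pattern. It's a teaching sketch, not how Laravel's cache stores work, and the injected clock exists only so the expiry behavior is easy to demonstrate.

```python
import time

class TtlCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}              # key -> (expires_at, value)

    def remember(self, key, ttl_seconds, compute):
        entry = self._items.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]           # fresh hit: skip the expensive call
        value = compute()
        self._items[key] = (self._clock() + ttl_seconds, value)
        return value

    def forget(self, key):
        self._items.pop(key, None)

now = [0.0]                           # fake clock, advanced by hand
calls = []
cache = TtlCache(clock=lambda: now[0])

def dashboard_stats():
    calls.append(1)                   # count how often the expensive work runs
    return {"orders": len(calls)}

key = "accounts:42:dashboard-stats"
first = cache.remember(key, 300, dashboard_stats)    # miss: computes
second = cache.remember(key, 300, dashboard_stats)   # hit: served from cache
cache.forget(key)                                    # e.g. an order was created
third = cache.remember(key, 300, dashboard_stats)    # recomputed after invalidation
now[0] += 301
fourth = cache.remember(key, 300, dashboard_stats)   # recomputed after the TTL expired
print(len(calls))
```

Four reads cost three computations here. On a real dashboard with many requests per TTL window, the ratio is far more favorable.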

Sessions

For horizontally scaled app servers, file-based sessions are a trap: the next request can land on a different server that has never seen the session. Store sessions in Redis or a database so any server can handle any request:

SESSION_DRIVER=redis
CACHE_STORE=redis
QUEUE_CONNECTION=redis

Rate Limiting

Rate limits protect you from abusive clients, runaway loops, and endpoints that get hammered:

use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;

RateLimiter::for('api', function (Request $request) {
    return Limit::perMinute(120)->by(
        optional($request->user())->id ?: $request->ip()
    );
});

Expensive endpoints deserve stricter limits:

RateLimiter::for('exports', function (Request $request) {
    return Limit::perHour(10)->by($request->user()->id);
});

Let business cost drive the numbers. Login, search, export, and webhook endpoints rarely need the same limit.
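Conceptually, a limiter like this is a counter per key per time window. The sketch below models a simple fixed-window counter to show the idea. It's an assumption-laden illustration rather than a description of how Laravel stores its counters, and the class and helper names are made up.

```python
class FixedWindowLimiter:
    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._windows = {}            # key -> (window_start, attempts)

    def allow(self, key, now):
        start, attempts = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, attempts = now, 0  # window expired: start a new one
        if attempts >= self.max_attempts:
            return False
        self._windows[key] = (start, attempts + 1)
        return True

def limiter_key(user_id, ip):
    """Prefer the authenticated user; fall back to the client IP."""
    return f"user:{user_id}" if user_id else f"ip:{ip}"

limiter = FixedWindowLimiter(max_attempts=2, window_seconds=60)
key = limiter_key(7, "203.0.113.5")
results = [limiter.allow(key, now) for now in (0, 1, 2, 60)]
print(results)
```

The third call is rejected because the window is full, and the fourth succeeds because a new window has started. Keying by user first means people behind a shared IP don't exhaust each other's budget once they're logged in.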

Queues

Redis is a common queue backend because it's quick and Horizon supports it well:

QUEUE_CONNECTION=redis

Dispatch work onto named queues from the request:

GenerateInvoicePdf::dispatch($invoice->id)
    ->onQueue('documents');

Split work by profile, such as default, emails, webhooks, documents, and imports, because each workload can need different worker counts and retry rules. Keep the names meaningful. During an incident, "the documents queue is 20 minutes behind" tells you far more than "default is slow."

How to Use Queue-Driven Architectures

Queues are one of Laravel's best scaling tools. They let the app accept work quickly and process it asynchronously with controlled concurrency. They also make the system more resilient: when a third-party API goes down, jobs retry on their own instead of tying up your PHP-FPM request workers.

Laravel Queues

A good job is small, idempotent, and safe to retry:

use App\Mail\OrderReceiptMail;
use App\Models\Order;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;
use Illuminate\Support\Facades\Mail;

class SendOrderReceipt implements ShouldQueue
{
    use Queueable;

    public int $tries = 3;
    public int $backoff = 60;

    public function __construct(public int $orderId)
    {
    }

    public function handle(): void
    {
        $order = Order::with('user')->findOrFail($this->orderId);

        Mail::to($order->user)->send(new OrderReceiptMail($order));
    }
}

Pass IDs into jobs rather than full Eloquent models. The model might change before the job runs, and serializing a whole model bloats the payload. For external APIs, add timeouts and guard against duplicate work:

use App\Models\Order;
use App\Services\CrmClient;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;

class SyncOrderToCrm implements ShouldQueue
{
    use Queueable;

    public int $tries = 3;
    public int $backoff = 60;

    public function __construct(public int $orderId)
    {
    }

    public function handle(CrmClient $crm): void
    {
        $order = Order::findOrFail($this->orderId);

        if ($order->crm_synced_at) {
            return;
        }

        $crm->upsertOrder($order->external_reference, [
            'total' => $order->total,
            'status' => $order->status,
        ]);

        $order->forceFill(['crm_synced_at' => now()])->save();
    }
}

The crm_synced_at check is the whole point. Jobs run more than once in real life, and idempotency is what keeps a retry from double-charging or double-syncing.
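You can test that property directly: run the job twice and assert that the side effect happened once. The sketch below does that against a fake CRM client. The names mirror the job above but are otherwise hypothetical.

```python
from datetime import datetime, timezone

class FakeCrm:
    """Records calls instead of talking to a real CRM."""

    def __init__(self):
        self.calls = []

    def upsert_order(self, reference, data):
        self.calls.append((reference, data))

def sync_order_to_crm(order, crm):
    """Return True if the sync ran, False if it was skipped as already done."""
    if order.get("crm_synced_at"):
        return False
    crm.upsert_order(
        order["external_reference"],
        {"total": order["total"], "status": order["status"]},
    )
    order["crm_synced_at"] = datetime.now(timezone.utc)
    return True

crm = FakeCrm()
order = {"external_reference": "ORD-1001", "total": 4200, "status": "paid"}

first_run = sync_order_to_crm(order, crm)    # original attempt
second_run = sync_order_to_crm(order, crm)   # simulated retry
print(first_run, second_run, len(crm.calls))
```

This guard covers sequential retries. Two workers running the same job at the same moment could both pass the check, so for work where duplicates are costly you'd also want a lock or a unique constraint.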

Horizon

Horizon gives you visibility and control over Redis queues. A typical setup runs different supervisors for different workloads:

'production' => [
    'supervisor-default' => [
        'connection' => 'redis',
        'queue' => ['default', 'emails'],
        'balance' => 'auto',
        'maxProcesses' => 20,
        'tries' => 3,
    ],

    'supervisor-documents' => [
        'connection' => 'redis',
        'queue' => ['documents'],
        'balance' => 'simple',
        'maxProcesses' => 5,
        'tries' => 2,
        'timeout' => 300,
    ],
],

The separation matters: a long-running document job shouldn't starve a quick password-reset email.

Failed Jobs and Retries

Retries only help when failures are temporary. Retrying a job that's permanently broken just burns capacity. For jobs with a business deadline, use retryUntil:

use DateTime;
use Throwable;

public function retryUntil(): DateTime
{
    return now()->addMinutes(30);
}

public function failed(Throwable $exception): void
{
    ImportBatch::whereKey($this->batchId)->update([
        'status' => 'failed',
        'failed_reason' => $exception->getMessage(),
    ]);
}

Use failed to flag the problem somewhere a human will see it. Whatever you do, don't set unlimited retries on jobs that hit a third-party service.

Queue Monitoring

Track queue depth, wait time, failure rate, and processing time together. Depth alone can mislead you. When depth starts climbing, walk through it methodically: are workers keeping pace with incoming jobs? If the queue keeps growing, check how long individual jobs take. If the slow part is the database, fix the query or dial back worker concurrency. If it's an external API, add backoff or a circuit breaker. If the work is CPU-bound, scale workers or break the jobs into smaller pieces.

Be careful with the "scale workers" instinct, though. Adding more workers without checking the database first can make an incident worse. More workers mean more concurrent queries, more locks, and more pressure on the primary exactly when it's already struggling.
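The "are workers keeping pace?" question is mostly arithmetic. Here's a rough sketch of it. The 70% target utilization is an assumption you should tune, and the model ignores database and API limits, which is exactly why the answer is a starting point rather than a verdict.

```python
import math

def workers_needed(jobs_per_second, avg_job_seconds, target_utilization=0.7):
    """Workers required to keep pace, leaving headroom for bursts."""
    busy_workers = jobs_per_second * avg_job_seconds   # work arriving per second
    return math.ceil(busy_workers / target_utilization)

def backlog_drain_seconds(backlog, workers, jobs_per_second, avg_job_seconds):
    """Time to clear a backlog, or None if the workers can't get ahead of arrivals."""
    capacity = workers / avg_job_seconds               # jobs/second the pool can finish
    spare = capacity - jobs_per_second
    return None if spare <= 0 else backlog / spare

# 20 jobs/second arriving, each taking about half a second.
needed = workers_needed(20, 0.5)
drain_with_15 = backlog_drain_seconds(6000, 15, 20, 0.5)
drain_with_10 = backlog_drain_seconds(6000, 10, 20, 0.5)
print(needed, drain_with_15, drain_with_10)
```

With 10 workers the pool only matches the arrival rate, so a 6,000-job backlog never shrinks. With 15 it clears in about ten minutes, provided the database can absorb the extra concurrency.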

How to Optimize API Performance

APIs earn special attention because clients call them repeatedly and payloads tend to grow quietly over months.

API Resources

Resources keep your response shape intentional:

class OrderResource extends JsonResource
{
    public function toArray($request): array
    {
        return [
            'id' => $this->id,
            'status' => $this->status,
            'total' => $this->total,
            'placed_at' => $this->created_at->toIso8601String(),
            'customer' => new CustomerSummaryResource($this->whenLoaded('customer')),
        ];
    }
}

whenLoaded is doing real work here. It stops the resource from quietly triggering a lazy query when the relationship wasn't eager loaded:

$orders = Order::query()
    ->with('customer:id,name')
    ->where('account_id', $accountId)
    ->latest()
    ->paginate(50);

return OrderResource::collection($orders);

Pagination

Returning unbounded collections is an easy way to create an API performance problem you won't notice until a client has a lot of data:

$perPage = max(1, min((int) request('per_page', 50), 100));

$orders = Order::where('account_id', $accountId)
    ->latest()
    ->paginate($perPage);

Cap the page size. If a client genuinely needs every record for an export, make that an async job rather than a giant synchronous response.

Response Optimization

Stop returning fields nobody reads. On read-heavy endpoints, selecting only the columns you need cuts both database I/O and serialization cost:

$products = Product::query()
    ->select(['id', 'name', 'slug', 'price', 'thumbnail_url'])
    ->where('is_visible', true)
    ->orderBy('name')
    ->paginate(40);

It's also worth turning on compression at the web server or load balancer. JSON compresses extremely well, and that's often a small config change with a real bandwidth payoff.

Rate Limiting

Design API rate limits around identity and endpoint cost:

Route::middleware(['auth:sanctum', 'throttle:api'])
    ->group(function () {
        Route::get('/orders', [OrderController::class, 'index']);
        Route::post('/exports/orders', [OrderExportController::class, 'store'])
            ->middleware('throttle:exports');
    });

This keeps casual browsing and expensive exports under separate policies, so one heavy user can't squeeze out everyone else.

Caching API Responses

Cache responses that are expensive to compute and can tolerate being a little stale:

public function index(Request $request)
{
    $accountId = $request->user()->account_id;
    $page = $request->integer('page', 1);

    $cacheKey = "api:accounts:{$accountId}:orders:v1:page:{$page}";

    return Cache::remember($cacheKey, now()->addSeconds(60), function () use ($accountId) {
        return OrderResource::collection(
            Order::with('customer:id,name')
                ->where('account_id', $accountId)
                ->latest()
                ->paginate(50)
        )->response()->getData(true);
    });
}

Notice the v1 in the key. Bumping that version number lets you invalidate an entire response format at once when the shape changes. Always scope the key to the tenant or user for anything that's not truly global.

How to Monitor Laravel in Production

The teams that catch problems before customers do are the ones collecting signals from everywhere: Laravel, queues, the database, Redis, the infrastructure, and external services.

Laravel gives you several good starting points. Horizon shows queue throughput, failed jobs, wait times, and worker balancing. Telescope surfaces request details, queries, exceptions, jobs, mail, and cache events. Your logs capture slow operations, unexpected retries, and external failures. Your metrics track latency, error rate, queue depth, job runtime, database CPU, lock waits, cache hit ratio, and Redis memory. Your alerting ties all of it back to something a customer would actually feel.

That last part is where teams often make mistakes. The best alerts are about symptoms, not machines being busy: p95 API latency over 800ms for 10 minutes, checkout error rate above 1%, the emails queue waiting more than 5 minutes, database CPU over 85% with slow queries rising, Redis memory over 80%, or failed payment webhooks crossing a threshold.
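To show what "p95 over 800ms for 10 minutes" means mechanically, here's a small sketch. It assumes you already have latency samples grouped into one-minute windows, and it uses the nearest-rank percentile definition. Your metrics system may compute percentiles differently.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def should_alert(windows, threshold_ms=800, sustained_windows=10):
    """Alert only if p95 exceeded the threshold in each of the last N windows."""
    recent = windows[-sustained_windows:]
    return len(recent) == sustained_windows and all(
        percentile(window, 95) > threshold_ms for window in recent
    )

slow_minute = [100] * 90 + [900] * 10   # 10% of requests are slow: p95 is 900ms
fast_minute = [100] * 99 + [900]        # 1% are slow: p95 is 100ms

sustained = should_alert([slow_minute] * 10)
recovered = should_alert([slow_minute] * 9 + [fast_minute])
print(sustained, recovered)
```

Requiring every window in the period to breach the threshold is what keeps a single slow minute from paging someone.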

A useful mental model is this: logs tell you what happened, metrics tell you whether the system is healthy, and traces tell you where the time went. In practice, wrapping your expensive business operations in a bit of instrumentation pays off quickly:

use Illuminate\Support\Facades\Log;

$startedAt = microtime(true);

$report = $builder->forAccount($account)->build();

Log::info('Billing report generated', [
    'account_id' => $account->id,
    'duration_ms' => (int) ((microtime(true) - $startedAt) * 1000),
    'invoice_count' => $report->invoiceCount(),
]);

When something is failing at 2am, a log line like that can tell you which account, import, or report is causing the pressure.

One more thing worth internalizing: monitor wait time, not just throughput. A queue can process thousands of jobs a minute and still be unhealthy if important jobs sit waiting too long before they start. Users feel the wait, not the throughput.

An Example High-Traffic Laravel Architecture

A high-traffic Laravel setup generally separates four things: stateless web requests, shared cache and session storage, asynchronous workers, and database roles.

Users hit a load balancer, which spreads traffic across a fleet of stateless Laravel app servers. Those servers use Redis for cache, sessions, rate limits, queues, and Horizon data. Queue workers handle slow or unreliable work off to the side. A MySQL primary takes all writes and any consistency-sensitive reads, while a read replica absorbs read-heavy endpoints that can tolerate some replication lag.

The flow looks like this:

Users
  -> Load balancer
  -> Stateless Laravel app servers
  -> Redis for cache, sessions, rate limits, queues, and Horizon data
  -> Primary database for writes and consistency-sensitive reads
  -> Read replica for safe read-heavy endpoints

Redis queue
  -> Queue workers
  -> Database, external APIs, mail providers, object storage, and other services

This isn't the only valid shape. PostgreSQL can stand in for MySQL, Amazon SQS can replace Redis queues, a CDN can serve static assets and cache public responses, and object storage should hold user uploads. The principle that matters is that each layer has one clear job and can be scaled or tuned on its own.

The flip side of stateless app servers is that anything a user needs after the request ends has to live in shared storage. Uploads, generated files, and session state shouldn't sit on a single server's local disk, or they may disappear from the user's point of view when the load balancer sends the next request somewhere else.

Lessons Learned the Hard Way

1. Premature Optimization

This usually shows up as elaborate infrastructure built before the app has any real visibility into itself.

The practical path works better: measure, rank the bottlenecks, fix the biggest one, repeat. For most Laravel apps, the first round of scaling is mostly indexes, N+1 fixes, queue separation, and trimming payloads.

2. Over-caching

Caching can make a system faster and harder to reason about at the same time. One team cached an account-settings response for 30 minutes, then later folded role changes into that same response. The result was that users who had just lost access could still see features until the cache expired.

The fix was splitting stable account metadata away from permission-sensitive state. The lesson is to avoid caching authorization data unless you have thought carefully about invalidation.

3. Missing Indexes

These hide until a table crosses a size threshold. A query that scanned 20,000 rows in development can scan 20 million in production. Bake index review into feature work, and plan big index migrations carefully so they don't lock a hot table at the worst possible time.

4. Queue Overload

Queues don't remove work, they move it. The classic failure is letting one noisy workload block everything else. A big CSV import floods the default queue, and password-reset emails get stuck behind it. Separate queues are cheap insurance against that entire class of incident.

5. Large Transactions

Long transactions hold locks longer and make failures more expensive. Dispatching a job inside a transaction is especially risky because a worker can grab it before the transaction commits:

DB::transaction(function () use ($request) {
    $order = Order::create([...]);
    $order->items()->createMany($request->items);

    GenerateInvoicePdf::dispatch($order->id);
    SyncOrderToCrm::dispatch($order->id);
});

Use after-commit dispatching for any job that depends on committed data:

GenerateInvoicePdf::dispatch($order->id)->afterCommit();
SyncOrderToCrm::dispatch($order->id)->afterCommit();

Keep transactions scoped to the data that genuinely has to change atomically, and nothing more.
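The mechanism behind after-commit dispatching is easy to model: hold jobs in a buffer while the transaction is open, release them on commit, and drop them on rollback. The sketch below is a conceptual illustration with made-up names, not Laravel's implementation.

```python
class Transaction:
    """Collects jobs during a transaction and releases them only on commit."""

    def __init__(self, queue):
        self.queue = queue
        self.pending = []

    def dispatch_after_commit(self, job):
        self.pending.append(job)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.queue.extend(self.pending)   # commit: workers can now see the jobs
        self.pending.clear()                  # committed or rolled back, the buffer resets
        return False                          # never swallow the exception

queue = []

with Transaction(queue) as tx:
    tx.dispatch_after_commit("GenerateInvoicePdf:1")
    visible_during = list(queue)              # still empty: nothing has committed

try:
    with Transaction(queue) as tx:
        tx.dispatch_after_commit("SyncOrderToCrm:2")
        raise RuntimeError("insert failed")   # simulated rollback
except RuntimeError:
    pass

print(visible_during, queue)
```

Only the job from the committed transaction reaches the queue, so a worker can never pick up a job that points at a row that doesn't exist yet.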

6. Treating Symptoms as Causes

This is the expensive one. If latency is high because an endpoint runs 300 queries, adding app servers adds database pressure. If jobs are slow because an external API is rate-limiting you, adding workers multiplies the failures.

Good scaling work keeps asking the same questions: What resource is saturated? Which endpoint, job, tenant, or query is causing it? Is this work necessary during the request? Can I reduce it, defer it, cache it, or isolate it? How will I know whether the change helped?

A Pre-Launch Scaling Checklist

Run through this before a big launch, a traffic campaign, or an enterprise rollout.

Application and runtime: Cache config, routes, and views during deploy. Set APP_DEBUG=false. Turn on OPcache. Keep web requests short and move slow work to queues. Store uploads in object storage, not on app-server disk. Keep servers stateless. Set timeouts on every external HTTP call.

Database: Review slow query logs first. Add indexes for your high-volume filters, joins, and ordering. Hunt for N+1 queries in controllers, resources, policies, and views. Paginate every list endpoint. Use chunkById or cursors for batch work. Avoid long transactions and external calls inside transactions. Confirm your backup and restore process works. Test stale-read behavior if you use replicas.

Redis and cache: Use Redis for cache, sessions, rate limiting, and queues where it fits. Set TTLs unless you have a clear reason not to. Include tenant, user, locale, and version in keys when relevant. Watch memory and the eviction policy. Avoid caching permission-sensitive responses without careful invalidation. Guard against cache stampedes on expensive recomputation.

Queues: Separate queues by workload. Configure Horizon supervisors per queue. Set timeouts, retries, and backoff on purpose. Make jobs idempotent where you can. Use afterCommit for jobs that depend on committed data. Monitor wait time, runtime, failures, and retries. Review failed jobs instead of ignoring them.

APIs: Use Resources to control response shape. Cap per_page. Use cursor pagination for big feeds and logs. Cache expensive reads with safe, versioned keys and short TTLs. Apply rate limits by endpoint cost. Don't return raw Eloquent models. Compress responses at the edge.

Observability: Track p50, p95, and p99 latency on the endpoints that matter. Track error rates by route and job class. Alert on queue wait time, not just size. Watch database CPU, connections, slow queries, and lock waits. Watch Redis memory, latency, and evictions. Log important business operations with durations and identifiers. Test your alerts before launch night because a silent alert is worse than no alert.

Conclusion

Laravel runs high-traffic production systems well when you design around the real costs of data, concurrency, and external dependencies. Just make sure you measure before you optimize, because guessing wastes time and tends to complicate the wrong layer.

Fix the database first: indexes, query shape, pagination, and eager loading usually deliver the biggest early wins. Lean on queues to keep requests fast and push slow work into controlled background workers. Cache deliberately, with clear keys, sane TTLs, and a plan for invalidation. Keep watching latency, errors, queue wait time, database health, Redis memory, and your external dependencies.

The best scaling work is practical and repeatable. You study the system you actually have, remove waste, isolate slow parts, and give yourself enough visibility to make the next change with confidence. Do that on a loop, and you rarely need the big rewrite.

How to Use PostgreSQL as a Cache, Queue, and Search Engine

Aaron Yong — Tue, 21 Apr 2026 16:58:55 +0000

"Just use Postgres" has been circulating as advice for years, but most articles arguing for it are opinion pieces. I wanted hard numbers.

So I built a benchmark suite that pits vanilla PostgreSQL against a feature-optimized PostgreSQL instance — measuring caching, message queues, full-text search, and pub/sub under controlled conditions.

In this article, you'll learn how to use PostgreSQL's built-in features for caching, job queues, full-text search, and pub/sub. You'll see actual benchmark results (latency percentiles, throughput, and error rates) comparing naive PostgreSQL patterns against optimized ones, and understand where PostgreSQL's limits are so you can decide whether you really need that extra service in your stack.

Prerequisites
The Setup
Benchmark 1: Caching with UNLOGGED Tables
Benchmark 2: Job Queues with SKIP LOCKED
Benchmark 3: Full-Text Search with tsvector
Benchmark 4: Pub/Sub with LISTEN/NOTIFY
The Combined Workload: The Honest Test
What I Learned

Prerequisites

To follow along or reproduce the benchmarks, you'll need:

Docker and Docker Compose
Node.js 20+ (for the Express TypeScript API layer)
k6 for load testing
Basic familiarity with SQL and PostgreSQL

The full benchmark project is open source on GitHub — you can clone it and run every test yourself.

The Setup

The benchmark uses two identical PostgreSQL 17 instances running in Docker containers, each with fixed resource constraints (2 CPUs, 2 GB RAM). Both share the same Express TypeScript API layer — the only difference is which PostgreSQL features are enabled.

┌─────────┐     ┌──────────────────┐     ┌─────────────────┐
│   k6    │────>│  Express API     │────>│  PG Baseline    │
│  (load  │     │  (TypeScript)    │     │  (vanilla PG17) │
│  test)  │────>│  Port 3001/3002  │────>│  PG Modded      │
└─────────┘     └──────────────────┘     │  (features on)  │
                                         └─────────────────┘

The baseline instance uses naïve approaches (regular tables, ILIKE search, polling). The modded instance uses PostgreSQL's built-in features (UNLOGGED tables, tsvector with GIN indexes, LISTEN/NOTIFY, partial indexes). Same hardware, same API code, same data. Only the database features differ.

Both instances share this tuned postgresql.conf:

# Memory allocation
shared_buffers = 512MB           # 25% of available RAM
effective_cache_size = 1536MB    # 75% of RAM — helps the query planner
work_mem = 16MB                  # per-sort/hash operation memory

# SSD-optimized planner settings
random_page_cost = 1.1           # default 4.0 assumes spinning disks
effective_io_concurrency = 200   # allow parallel I/O on SSDs

These settings matter. The defaults assume spinning disks from the early 2000s. Setting random_page_cost = 1.1 tells the query planner that random reads are nearly as fast as sequential reads on SSDs, which encourages index usage over sequential scans.

Benchmark 1: Caching with UNLOGGED Tables

The idea: Use an UNLOGGED table as an in-database cache. UNLOGGED tables skip PostgreSQL's Write-Ahead Log (WAL) — the mechanism that guarantees durability. Since cache data is ephemeral by nature, losing it on a crash is acceptable, and skipping WAL removes the biggest write bottleneck.

-- Modded: UNLOGGED table for cache entries
CREATE UNLOGGED TABLE cache_entries (
    key TEXT PRIMARY KEY,
    value JSONB NOT NULL,
    expires_at TIMESTAMPTZ
);

-- Baseline: same schema, but a regular (logged) table
CREATE TABLE cache_entries (
    key TEXT PRIMARY KEY,
    value JSONB NOT NULL,
    expires_at TIMESTAMPTZ
);

Results (200 Virtual Users)

Mode	p50	p95	avg	req/s
Baseline (regular table)	1.87ms	6.00ms	2.50ms	1,754/s
Modded (UNLOGGED table)	1.71ms	5.24ms	2.17ms	1,760/s

Latency improves by roughly 9% at p50 and 13% at p95 and on average, while throughput is effectively unchanged. Not dramatic, but nearly free: you change one keyword in your CREATE TABLE statement.

Under Stress (1,000 Virtual Users, No Sleep)

Mode	p50	p95	req/s	Total Requests
Baseline	83.38ms	143.23ms	7,663/s	728,021
Modded	77.69ms	126.39ms	8,062/s	765,934

The improvement holds under heavy load, though it's a little smaller: about 7% at p50, 12% at p95, and 5% more throughput. The UNLOGGED advantage is a per-write optimization, so it saves the same amount of I/O per write whether you are doing 100 or 10,000 writes per second. The modded instance served nearly 38,000 more requests in the same time window.

The Verdict

UNLOGGED tables won't match Redis for sub-millisecond hot-path caching (real-time bidding, gaming leaderboards). But for web applications where the difference between 2ms and 5ms is invisible to users, they eliminate an entire infrastructure dependency for zero additional complexity.

You do give up Redis data structures (sorted sets, HyperLogLog, streams). If you need those, a dedicated cache is still the right call.

Benchmark 2: Job Queues with SKIP LOCKED

The idea: Use PostgreSQL as a job queue with SELECT ... FOR UPDATE SKIP LOCKED. Multiple workers poll the same table, and SKIP LOCKED ensures each worker gets a different row — no duplicates, no contention.

-- Queue table with a partial index on pending jobs only
CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    payload JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Partial index: only indexes pending jobs
-- As jobs complete, they leave the index — it stays small forever
CREATE INDEX idx_pending_jobs ON job_queue (created_at)
    WHERE status = 'pending';

The dequeue pattern:

-- Atomic dequeue: select + update in one statement
UPDATE job_queue SET status = 'processing'
WHERE id = (
    SELECT id FROM job_queue
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- skip rows locked by other workers
) RETURNING *;

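How SKIP LOCKED works: Worker A locks row 1. Worker B tries row 1, sees the lock, skips it, and takes row 2 instead. No blocking, no duplicates. If a worker crashes before its transaction commits, the transaction rolls back and the row becomes available again. That safety net only covers work done inside the dequeuing transaction: once the update to status = 'processing' has committed, a crashed worker leaves the row stuck in that state, so you need a timeout or a reaper job to return it to 'pending'.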
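You can get a feel for the skip-instead-of-wait behavior without a database. The sketch below is an in-process analogy using Python threads and non-blocking locks. It is not how PostgreSQL implements row locks (which are held until the transaction ends), and every name in it is made up for illustration.

```python
import threading

class Row:
    def __init__(self, job_id):
        self.id = job_id
        self.status = "pending"
        self.lock = threading.Lock()

def dequeue(rows):
    """Claim the oldest pending row whose lock is free; skip locked rows instead of waiting."""
    for row in rows:                              # rows are ordered by created_at
        if row.status != "pending":
            continue
        if not row.lock.acquire(blocking=False):  # another worker holds it: skip
            continue
        try:
            if row.status == "pending":           # re-check now that we hold the lock
                row.status = "processing"
                return row
        finally:
            row.lock.release()
    return None

def worker(rows, claimed):
    while (row := dequeue(rows)) is not None:
        claimed.append(row.id)

rows = [Row(i) for i in range(1, 201)]
claimed = []
threads = [threading.Thread(target=worker, args=(rows, claimed)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(claimed), len(set(claimed)))
```

Eight workers compete for 200 jobs, and each job is claimed exactly once. No worker ever blocks on a row another worker is busy with.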

Results (100 Producers + 50 Consumers)

Mode	p50	p95	avg	req/s
Baseline (full index)	1.90ms	5.01ms	2.30ms	1,053/s
Modded (partial index)	1.81ms	5.28ms	2.29ms	1,052/s

They're virtually identical. The partial index doesn't show its value in a 60-second benchmark because the table doesn't accumulate enough completed rows for the index size difference to matter. In a production system with millions of completed jobs, the partial index keeps the index at kilobytes while a full index grows to gigabytes.

The Verdict

SKIP LOCKED is production-ready for job queues. Libraries like pg-boss (Node.js) and river (Go) build on this exact pattern.

You do give up exchange/routing patterns (fan-out, topic-based routing) and consumer groups with message replay. If you need those, a dedicated message broker is still the right tool. For simple "process this job once" workloads, PostgreSQL handles it.

Benchmark 3: Full-Text Search with tsvector

The idea: Use PostgreSQL's built-in full-text search instead of a separate search service. A tsvector column stores pre-processed search tokens, and a GIN (Generalized Inverted Index) enables fast lookups using the same inverted index concept that powers Elasticsearch.

-- Search-optimized article table
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    search_vector tsvector  -- pre-computed search tokens
);

-- GIN index for full-text search
CREATE INDEX idx_search ON articles USING GIN (search_vector);

-- Auto-update search_vector on insert/update
CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
    NEW.search_vector := to_tsvector('english',
        COALESCE(NEW.title, '') || ' ' || COALESCE(NEW.body, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_search
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION update_search_vector();

The baseline uses ILIKE with a leading wildcard — the approach most developers reach for first:

-- Baseline: sequential scan on every query
SELECT * FROM articles
WHERE title ILIKE '%postgresql%' OR body ILIKE '%postgresql%';

-- Modded: GIN index lookup with relevance ranking
SELECT id, title,
    ts_rank(search_vector, plainto_tsquery('english', 'postgresql')) AS rank
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql')
ORDER BY rank DESC LIMIT 20;

Results (500 Virtual Users)

Mode	p50	p95	avg	req/s
Baseline (ILIKE)	1.96ms	101.83ms	25.22ms	561/s
Modded (tsvector + GIN)	2.76ms	10.39ms	3.76ms	675/s

This is the standout result. The baseline's p95 of 101.83ms versus the modded's 10.39ms is roughly a 10x improvement.

Why the baseline's p50 (1.96ms) is slightly better than the modded's (2.76ms): simple ILIKE queries on small result sets can be fast when the data fits in shared_buffers. But as load increases and the buffer cache is contested, sequential scans degrade dramatically. The GIN index stays stable.

Under Stress (500 Virtual Users, No Sleep)

Mode	p50	p95	req/s	Total Requests
Baseline (ILIKE)	599ms	1,000ms	558/s	50,212
Modded (tsvector)	209ms	396ms	1,441/s	129,679

ILIKE collapses to 1-second p95 latencies. Each query forces a sequential scan of all 10,000 articles, churning through shared buffers and starving concurrent queries. The tsvector approach serves about 2.6x as many requests in the same time window because a GIN index lookup reads only the entries for the search terms instead of every row.
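The difference between the two approaches is easier to see in miniature. This sketch contrasts a substring scan with a toy inverted index in plain Python. It's an analogy only: it has none of PostgreSQL's stemming, stop words, or ranking, and the sample documents are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search_scan(docs, term):
    """ILIKE '%term%' analogue: inspect every document on every query."""
    return {doc_id for doc_id, text in docs.items() if term.lower() in text.lower()}

def search_index(index, term):
    """Inverted-index analogue: one lookup, no per-document work at query time."""
    return set(index.get(term.lower(), set()))

docs = {
    1: "PostgreSQL full-text search",
    2: "Redis caching patterns",
    3: "Scaling PostgreSQL queues",
}
index = build_index(docs)           # paid once at write time, like the trigger above

scan_hits = search_scan(docs, "postgresql")
index_hits = search_index(index, "postgresql")
print(scan_hits, index_hits)
```

Both find the same documents for a whole word, but the scan does work proportional to the table on every query while the index does it once per write. The toy index also shows the trade-off: it matches whole tokens, so a partial term like "postgres" finds nothing here, whereas the substring scan still matches.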

The Verdict

This is the strongest argument in the entire benchmark. The fix requires zero extensions — to_tsvector(), plainto_tsquery(), and CREATE INDEX USING GIN are all built into core PostgreSQL. If you're doing WHERE column ILIKE '%term%' on any table with more than a few thousand rows, you're leaving massive performance on the table.

You do give up distributed search across shards, complex analyzers for CJK languages, and aggregation/faceted search pipelines. For a product search bar, blog search, or internal tool — PostgreSQL is enough.

Benchmark 4: Pub/Sub with LISTEN/NOTIFY

The idea: Use PostgreSQL's native LISTEN/NOTIFY for pub/sub messaging, triggered automatically on INSERT via a database trigger.

-- Trigger that fires pg_notify on every new message
CREATE OR REPLACE FUNCTION notify_message() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify(NEW.channel, NEW.payload::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_notify
    AFTER INSERT ON messages
    FOR EACH ROW EXECUTE FUNCTION notify_message();

Results (200 Virtual Users)

Mode	p50	p95	avg	req/s
Baseline (poll-based)	1.99ms	6.04ms	2.84ms	1,116/s
Modded (LISTEN/NOTIFY)	1.65ms	4.80ms	2.13ms	1,131/s

Here we have a 20% improvement at p95. The trigger-based approach does more work per INSERT (INSERT + NOTIFY), but the reduced round trips and better connection reuse patterns offset the overhead.

The Verdict

LISTEN/NOTIFY works for real-time features where you would otherwise reach for Redis pub/sub. The main limitation is payload size (8,000 bytes maximum) and the requirement for dedicated connections (incompatible with PgBouncer in transaction mode).

The Combined Workload: The Honest Test

Individual benchmarks are flattering. The real question: can one PostgreSQL instance handle caching, queues, search, and pub/sub simultaneously without degrading?

Results (All Four Workloads Running Together)

Mode	p50	p95	avg	req/s
Baseline	1.65ms	5.24ms	2.17ms	1,424/s
Modded	1.86ms	6.05ms	2.47ms	1,417/s

Under combined load, the baseline marginally outperforms the modded setup. The modded PostgreSQL does more work per operation — maintaining GIN indexes, firing triggers, running pg_cron in the background. When all these features are active simultaneously, the overhead is measurable: about 15% higher p95 latency.

But both setups stay comfortably under 10ms at p95. For most web applications, that's more than good enough.

What I Learned

After running all these benchmarks, here's what I would tell a team evaluating whether to "just use Postgres":

Do it for full-text search: Switching from ILIKE to tsvector with a GIN index is a 10x improvement that requires zero extensions. This is the single highest-ROI change in the entire PostgreSQL ecosystem, and most developers don't know it exists.
Do it for job queues: SKIP LOCKED is production-ready and eliminates RabbitMQ for simple "process this job" workloads. Use a library like pg-boss or river rather than rolling your own.
Consider it for caching: UNLOGGED tables give a steady 13% improvement over regular tables. If sub-millisecond latency is not a hard requirement (and for most web apps, it is not), you can drop Redis entirely.
Be honest about the overhead: With all four workloads running together, the modded setup's p95 latency was about 15% higher than the baseline's. Whether that matters depends on your latency budget.
Know where to stop: PostgreSQL won't match Redis for sub-millisecond caching, Kafka for millions of messages per second, or Elasticsearch for distributed multi-node search with complex analyzers. The line is at extreme throughput or extreme specialization.

The honest conclusion is not "PostgreSQL does everything." It is: for most applications, a single well-configured PostgreSQL instance handles 80% of what you would otherwise need three to five additional services for. That is less infrastructure to deploy, monitor, and maintain — and fewer things to break at 3 AM.

Enterprise-scale applications processing millions of messages per second, serving sub-millisecond cache hits to millions of concurrent users, or running distributed search across terabytes of documents will still need specialized tools. Those tools exist for a reason, and at that scale the operational cost of running them is justified by the performance you get back.

But most of us aren't building at that scale — and may never need to. Starting with PostgreSQL for these roles means you ship faster with fewer moving parts. If and when you outgrow what PostgreSQL can handle, your benchmarks will tell you exactly which role needs to be extracted into a dedicated service. That is a much better position than starting with five services on day one because you assumed you would need them.

The benchmark project is open source if you want to reproduce these results or adapt the tests for your own workload.

You can find more of my writing at site.aaronhsyong.com.

How to Find Any File on Windows Like a Linux User (using Windows PowerShell)

Piotr "NotBlackMagic" Opoka — Wed, 25 Mar 2026 16:00:00 +0000

Sometimes you might struggle to find a file or program when you have no idea where it could be saved or installed. And the Windows user interface may not always give you the results you want. If that's the case for you, you're in the right place.

Get-ChildItem (also known as gci, ls, or dir) is a very powerful command, and one of its most common uses is searching for files. It's more precise and more reliable than Windows Explorer, and its filtering options let you narrow the results down to what's relevant to you.

In this tutorial, you'll learn how to use gci and how to combine it with other commands so that it becomes an even more powerful tool. Remember to enable copy-pasting in Windows PowerShell, so it's easier for you to follow along. You can see how to enable it here.

What we'll cover:

Basic explanation of the Get-ChildItem command
- Most used examples of searching by gci command
Setup for other more complex examples
When is the -Path option not needed?
Advanced Searching – Combining Get-ChildItem with the Where-Object Command
How to Search Through Hidden Files
How can you know all the properties that you can use as a filter?
- How to retrieve only 1 desired property
I don't know the file’s name, but I know what's inside it. How do I find the file by its content?
I can't see the full path - how do I fix this?
Hard to read? Open the results in the text editor of your choice
Summary - the ultimate commands for searching and finding whatever you need

1. Basic Explanation of the `Get-ChildItem` Command

Let's take a look at the example searching script to understand how it works:

Get-ChildItem -Recurse -Path "C:\path to\your directory\" -Filter "*whatImLookingFor*"

Get-ChildItem (aliases: dir, ls, gci) lists the content of a folder or directory just like the Linux ls command does.

This command checks every file and directory in the path you specify and shows you everything that matches the filter. The filter only limits what is displayed: the command still has to look through every folder under that path to find the matches.

So you specify the path that is the parent (folder), which means that every folder and file under it is its child. If you know some CSS and JavaScript, treat it the same way that these languages do.

If you don't use -Recurse or -Depth, then the command works only in your current directory (parent Depth level 0) and searches for its children inside that directory (children Depth level 0).

If you use -Recurse, then gci will search for what you want on ALL LEVELS. But by using -Depth, you can specify how many levels deep you want it to look for a file or folder.

To recurse means "to repeat an operation". So, -Recurse means that gci will repeat the search for your file or folder in every child element of the "Documents" directory, and every directory inside it, all levels deep.

All of these files and folders are children of your "Documents" folder. If you delete the folder, you delete everything inside it too.

-Filter filters the output of the command to only show what matches the filter (examples of how to use filter are further in the article).

-Path tells the command where it should look for files (by using "C:\", for example, you're telling it to start at the root of your C: drive). If you want to search in a certain directory, it would look like this:

Get-ChildItem -Path "C:\path to\your directory\"

Get-ChildItem -Path "~\Documents\path to\your directory\"

~\ here is a shorthand for "inside current user's folder" or "C:\Users\YourUsername".

Next, we can specify whether we'd like to look for a file or a folder, so we have fewer results to look at:

Get-ChildItem -Path C:\ -Recurse -Filter "*whatImLookingFor*" -File

Get-ChildItem -Path C:\ -Recurse -Filter "*whatImLookingFor*" -Directory

You might be wondering how you can stop the search if it takes too long. When you're using -Recurse, the output that you'll get might become quite overwhelming, especially if your command isn't specific enough (more about that in step 3 and step 4). Luckily, you can stop any running command in PowerShell by pressing Ctrl + C.

Most Used Examples of Searching by `gci` Command

Here are some handy examples of searching scripts that you can use:

Example #1: search for all executable files on your PC (remember that you can stop this command with Ctrl + C):

Get-ChildItem -Path C:\ -Recurse -Filter "*.exe" -File

REMEMBER:
In order to paste commands into PowerShell, you first have to enable pasting. Here's how.

This command will show you a very long list of executable files and their folders (as shown in the image below).

These lists might be so long that it's impossible to find anything in them. That's why you'll learn how to use more advanced techniques of filtering in step 4 to see fewer unnecessary results that don't fit your criteria.

Example #2: search for an executable file that has "notepad" in its name (or search for any program you need, basically):

Get-ChildItem -Path C:\ -Recurse -Filter "notepad*.exe" -File

One of the results will show you the location of the file you want:

In our case it's the C:\Windows\System32 folder.

You can mix the wildcards however you want! Thanks to them, you don't have to remember much about your file's name and the search will still work:

Get-ChildItem -Path C:\ -Recurse -Filter "n*pad*.*xe"

So what if you see some errors while scanning the whole system? Should you worry?

It's ok! Sometimes you might get lots of errors. They will most likely occur when a script scours the system folders/files. If you want to get rid of them, add -ErrorAction SilentlyContinue, like you see here:

Get-ChildItem -Path C:\ -Recurse -Filter "notepad*.exe" -File -ErrorAction SilentlyContinue

You can try it now ;)

2. Setup for Other More Complex Examples

Now, let's look at even more use cases for this command. But first, we'll create a space where I can show you examples.

First, create a new folder inside your "Documents" folder. Let's call it "Items".

Inside it, create two text documents. Name one of them "Item 1- Green Bracelet" and the other "Item 2- Blue Bracelet" (Yes, make sure you write the first letter of each word in UPPER CASE).

Copy these files now.

Go one folder back (you can use the Alt + UpArrow shortcut) and create another folder next to "Items" called "More items":

Paste the copied files inside the "More items" folder and change their names, so they have only lower case letters ("item 1- green bracelet" and "item 2- blue bracelet" ).

PRO TIP:
You can click once on a file with your mouse and then press the F2 key on your keyboard to rename it.
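If you'd rather not click through Explorer, the same setup can be scripted. This is only a minimal sketch and not part of the steps above: the function name New-BraceletSandbox and its -Root parameter are made up for illustration, it assumes your Documents folder lives at $HOME\Documents, and the files it creates are empty .txt files.

```powershell
function New-BraceletSandbox {
    param(
        # Hypothetical parameter: the folder in which "Items" and "More items" are created
        [string]$Root = (Join-Path $HOME 'Documents')
    )

    # Folder name -> file names, exactly as described in step 2
    $layout = @{
        'Items'      = @('Item 1- Green Bracelet.txt', 'Item 2- Blue Bracelet.txt')
        'More items' = @('item 1- green bracelet.txt', 'item 2- blue bracelet.txt')
    }

    foreach ($folder in $layout.Keys) {
        $dir = Join-Path $Root $folder
        # -Force creates missing parent folders and does not fail if the folder exists
        New-Item -ItemType Directory -Path $dir -Force | Out-Null

        foreach ($name in $layout[$folder]) {
            # Careful: -Force replaces an existing file of the same name with an empty one
            New-Item -ItemType File -Path (Join-Path $dir $name) -Force | Out-Null
        }
    }
}
```

Calling New-BraceletSandbox with no arguments builds both folders inside Documents. Pass -Root with some other folder if you want to try it somewhere else first.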

3. When is the `-Path` option not needed?

You don't have to specify the path every time. You can always just move to the desired directory with the cd (change directory) command.

This command will move you to your Documents folder:

cd ~\Documents\

Now, you should be able to see PowerShell pointing to your Documents folder on the left of the screen:

If you don't see this, then you can use double quotes " ", like in this command:

cd "~\Documents\"

Make sure that PowerShell is pointing to our desired folder. Now, the searching command looks like this without the -Path option:

Get-ChildItem -Recurse -Filter "*item*" -File

Pretty simple, right?

As you can see in the image above, we first moved to our desired directory, so later we could perform the search inside it without specifying the -Path option/parameter.

But the -Path option is very useful, either when you're creating a script or you want to search for something without moving away from the current directory:

Get-ChildItem -Path ~\Documents\ -Recurse -Filter "*item*" -File

Get-ChildItem -Path ~\Documents\ -Recurse -Filter "*item*" -Directory

Here's an example. I'm inside the System32 folder and I want to know whether the thing I'm looking for is inside the Documents folder without moving in there:

And it really is there!

From now on, because you already know what the -Path option is being used for, I won't be using it unless it's necessary.

4. Advanced Searching – Combining `Get-ChildItem` with the `Where-Object` Command

Sometimes you might have several folders named exactly the same, but they're in different places. You might want to exclude them based on their content, which folder they are in, or their -Depth level (see the graphic with the explanation about -Depth level in step 1). That's what we're going to cover in the next few points.

For this part of the tutorial, make sure you've gone through step 2 (but you can skip step 3 if you want).

4.1. Searching through only a particular directory

Let's say that we're now looking for the bracelets that we created in step 2. But, we want to see the results from only one folder. For that, we'll use case-sensitive search (-clike) to get only our preferred results. But -clike doesn't work with gci alone. We need to apply another filter with the Where-Object { } command:

Get-ChildItem -Path ~\Documents\ -Recurse -Filter "*item*" |   
Where-Object { $_.Name -clike "*Item*" }

OR (clearer version, without the -Path option):

Get-ChildItem -Recurse -Filter "*item*" |   
Where-Object { $_.Name -clike "*Item*" }

Let's review what's going on here:

Get-ChildItem -Recurse -Filter "*item*" searches for all files and folders with "item" in their name
| – the "pipe" symbol is used to get the output of the previous command (the list of all files and folders filtered by gci) and send it to the next command (Where-Object is applying another filter to what is already filtered by gci).
Where-Object { } is the command used for filtering the lists of objects. The filter is being specified inside the { } curly brackets.
$_ refers to the current object coming through the pipeline, one object at a time. Treat it as "ForEachObjectFromList". And treat the whole sequence after the | as "FindObjectsFromList that have a name with 'Item' ".
$_ is very often used with Where-Object, but also with some other commands.
.Name – we choose a Name property to get from every object.
-clike finds a match that is 100% correct. All letters must be the exact same case as the phrase we specified. c stands for "case sensitive" and it checks every letter to see if it's upper case or lower case.

So, Where-Object { $_.Name -clike "*Item*" } is a filter that takes the Name property of every object from the list (created by gci) and uses -clike to check whether that Name has the word "Item" in it.

As you can see in the image below, now we'll get only the files with upper case names in our result:

IMPORTANT:
-like alone means that we're looking for a certain pattern, no matter what case the letters are. The c in -clike stands for "case-sensitive": it only matches text with exactly the same capitalization (both upper and lower case letters) as the pattern.
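You can check how these operators behave without touching any files by trying them on a plain string. Here's a small sketch (the variable name $name is only for illustration):

```powershell
$name = 'Item 1- Green Bracelet.txt'

$name -like     '*item*'   # True:  -like ignores the case of the letters
$name -clike    '*item*'   # False: the name starts with an upper case "I"
$name -clike    '*Item*'   # True:  same capitalization as the pattern
$name -cnotlike '*Item*'   # False: -cnotlike is the opposite of -clike
```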

If you want to see the files without the upper case first letter, you can do that by changing "*Item*" from our current command to "*item*":

Get-ChildItem -Recurse -Filter "*item*" |   
Where-Object { $_.Name -clike "*item*" }

Let's try it out!

4.2. How to search while excluding a particular directory

In step 4.1 we learned how to search only for files/folders with specific case-sensitive names in them. After applying only two changes to our previous code, we can exclude certain directories from our search.

Here's our starting command once again:

Get-ChildItem -Recurse -Filter "*item*" |   
Where-Object { $_.Name -clike "*Item*" }

Change #1

In the example above, -clike shows only files/folders that include a specific phrase in their names. If we change it to -cnotlike, we'll exclude from the search all files/folders with that specific phrase in their name.

Now our code looks like this:

Get-ChildItem -Recurse -Filter "*item*" |   
Where-Object { $_.Name -cnotlike "*Item*" }

Change #2

After the first change, Where-Object { $_.Name -cnotlike "*Item*" } only checks the names of files and folders, not their full paths. A file is kept whenever its own name lacks the phrase, even if it sits inside a folder whose name contains it. To exclude the folder itself, we change $_.Name to $_.FullName, which checks for the phrase in the whole path to the file as well as in the file's name.

Now, your command should look like this:

Get-ChildItem -Recurse -Filter "*item*" |   
Where-Object { $_.FullName -cnotlike "*Item*" }

We excluded the "Items" folder from our search. You should now be able to see the files only from the "More items" directory. Try it out yourself!

What if you want to exclude the "More items" directory instead? Just change the phrase inside the filter to something like this:

Get-ChildItem -Recurse -Filter "*green*" -File |   
Where-Object { $_.FullName -cnotlike "*More*" }

We also changed the name of the file from "*item*" to "*green*" in our gci search (first line of code). That's why now we'll see only one bracelet in our result list:

Two filters are applied here. First, gci searches for files with the phrase "green" in their names. The second filter is the Where-Object command, which excludes anything that has the word "More" in its path. In our case, the "More items" folder got excluded.

We don't even need the case-sensitive filter in our case. The command will work the same when we exclude just a lowercase word "more". So let's change -cnotlike "*More*" to -notlike "*more*" and see if it's true:

Get-ChildItem -Recurse -Filter "*green*" -File |   
Where-Object { $_.FullName -notlike "*more*" }

As you can see, the result is the same! Despite different cases of the letters, we still got the right keyword. So, case-sensitive search isn't always needed – only when you want to be very specific.

Sometimes, being too specific might be bad and make your code not work as intended. To see what I mean, let's look at the example below. Let's apply case-sensitive search once again, but to our unchanged, lowercase keyword "more" and see if it still works:

Get-ChildItem -Recurse -Filter "*green*" -File |   
Where-Object { $_.FullName -cnotlike "*more*" }

Case-sensitive search doesn't filter out anything now, because it's too specific: no path contains "more" written entirely in lower case. Files from both the "Items" and "More items" folders pass through the filter.

FAQ:

If the Where-Object command is what actually filters the output for us, shouldn't we drop (delete) the -Filter option from gci?

No, we should still use the -Filter option. It discards the vast majority of non-matching files before they reach the pipeline, so the Where-Object command only has to work on a small fraction of the objects. That makes the Where-Object part of the command many times faster.

You can try using this command with -Path C:\ with and without the -Filter option. In my case, using -Filter shortened the time needed for the whole sequence of commands to finish from 16 seconds to 8 seconds (gci alone uses the first 7.99 seconds, which is why the time was only cut in half). That's what we call ✨optimization✨ :D
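If you want to measure this on your own machine, Measure-Command reports how long a script block takes to run. Here's a rough sketch, assuming you run it from the folder you want to search. The variable names are illustrative, and your timings will differ from the ones above (run each version a couple of times, because Windows caches folder contents after the first pass):

```powershell
# Time the search with -Filter doing the first pass...
$withFilter = Measure-Command {
    Get-ChildItem -Recurse -Filter "*item*" -ErrorAction SilentlyContinue |
        Where-Object { $_.Name -clike "*Item*" }
}

# ...and with Where-Object doing all of the work on its own
$withoutFilter = Measure-Command {
    Get-ChildItem -Recurse -ErrorAction SilentlyContinue |
        Where-Object { $_.Name -clike "*Item*" }
}

# Measure-Command returns a TimeSpan, so we can print the seconds
"With -Filter:    {0:N2} s" -f $withFilter.TotalSeconds
"Without -Filter: {0:N2} s" -f $withoutFilter.TotalSeconds
```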

4.3 Searching only 1 directory from many with exactly the same name

We've learned how to search for a phrase anywhere inside the path of a file. But what if we want to search inside exactly the "More items" folder? For that, we'll use the -match filter (which works similarly to the -like filter).

Our phrase will also use "\\" instead of "\". This is because -match treats its pattern as a regular expression, where a single "\" is a special escape character. Writing "\\" tells it to look for a literal backslash, which is the symbol that separates folders in a path.
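As a sketch of that idea (assuming the folders from step 2 and that PowerShell is pointing to your Documents folder), a command of this shape keeps only the files whose path contains a folder named "More items":

```powershell
# "\\More items\\" is the regular expression for the literal text \More items\
Get-ChildItem -Recurse -Filter "*green*" -File |
Where-Object { $_.FullName -match "\\More items\\" }
```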

This command will look for a match for the "More items" folder in the path of every file from the list. Then, it will show you this file if it matches.

What if we want to check for two folders, one next to the other, simultaneously? Very easy! Just connect them with the sign for a folder "\". Here, the command will search inside the "More items" folder only if it's inside the "Documents" folder:
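Again as a sketch under the same assumptions (the folders from step 2, run from your Documents folder), the two folder names are joined with the escaped backslash:

```powershell
# Matches any path that contains "Documents\More", so "Documents\More items" qualifies
Get-ChildItem -Recurse -Filter "*green*" -File |
Where-Object { $_.FullName -match "Documents\\More" }
```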

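As you can see, we didn't use "More items", only "More". You can shorten that filter however you want, because -match looks for the pattern anywhere in the whole path. Keep in mind that in a regular expression "*" is not a wildcard: it means "zero or more of the preceding character", so "Mo*" matches an "M" followed by any number of "o"s. See the example below: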

Get-ChildItem -Recurse -Filter "*green*" -File |
Where-Object { $_.FullName -match "s\\Mo*" }

Earlier, we used the not statement in -like filter to exclude certain files and directories. The same can be done with -notmatch:

Get-ChildItem -Recurse -Filter "*green*" -File | 
Where-Object { $_.FullName -notmatch "ents\\Ite*" }

Be aware that we're now excluding the "Items" folder from the search, not "More items".

And, with -cmatch we can apply the same case-sensitive filter as with -clike:

Get-ChildItem -Recurse -Filter "*green*" -File | 
Where-Object { $_.FullName -cmatch "green*" }

I hope you get the gist of it now.

4.4 Filter how deep (how many folders in) you want to search for the file

Sometimes you might have a very long path to some of your files. If you don't want to waste time searching every folder on your computer recursively, you can use the -Depth option. It specifies how many levels of folders to search inside your folder tree. I already showed you the picture of a folder tree at the beginning of this article, but you should take a look at it here once again.

So, how does the -Depth parameter work?

-Depth 0 means that our command will search only the current folder. It will show results of all children of Depth level 0. Those results are:
1 "child file" and 2 "child folders".

-Depth 1 searches the current folder and its child-folders. It will show the results of all children of Depth level 1. Those results are:
1 "child file", 2 "child folders", 2 "grandchild files" and 1 "grandchild folder".

-Depth 2 searches the current folder and its child and grandchild folders. It will show results of all children of Depth level 2. Those results are:
1 "child file", 2 "child folders", 2 "grandchild files", 1 "grandchild folder" and 1 "great grandchild file".

Let's see the difference between these two commands:

Get-ChildItem -Recurse -Filter "*item*" -Depth 0

Get-ChildItem -Recurse -Filter "*item*" -Depth 1

The first command will show you only the files and folders inside our current directory.
The second command will also search for them inside every folder found inside the current folder.
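To check the numbers from the -Depth explanation above yourself, here's a sketch that builds a throwaway copy of that folder tree in your temp folder, counts what each depth level returns, and cleans up afterwards. All file and folder names in it are made up for this example:

```powershell
# Build a throwaway tree: 1 child file, 2 child folders,
# 2 grandchild files, 1 grandchild folder and 1 great grandchild file
$root       = Join-Path ([System.IO.Path]::GetTempPath()) "depth-demo-$(Get-Random)"
$child1     = Join-Path $root 'child folder 1'
$child2     = Join-Path $root 'child folder 2'
$grandchild = Join-Path $child1 'grandchild folder'

# -Force also creates the missing parent folders
New-Item -ItemType Directory -Path $grandchild, $child2 -Force | Out-Null

$files = @(
    (Join-Path $root 'child file.txt'),
    (Join-Path $child1 'grandchild file 1.txt'),
    (Join-Path $child1 'grandchild file 2.txt'),
    (Join-Path $grandchild 'great grandchild file.txt')
)
New-Item -ItemType File -Path $files | Out-Null

# Count the items returned for -Depth 0, 1 and 2
$counts = foreach ($depth in 0..2) {
    @(Get-ChildItem -Path $root -Recurse -Depth $depth).Count
}
$counts   # outputs 3, 6 and 7

# Remove the throwaway tree again
Remove-Item -Path $root -Recurse -Force
```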

For the sake of practice, let's combine it with Where-Object to find the green bracelet:

Get-ChildItem -Recurse -Filter "*item*" -Depth 1 | Where-Object { $_.Name -clike "*green*" }

I hope that this example showed you how easy it is to use multiple options ( -Depth, -Recurse) and filters (-Filter, Where-Object).

5. How to Search Through Hidden Files

Some files are not that easily accessible to the user. You can see some of the hidden files and folders in Windows Explorer (here's how). But sometimes it's easier to find what you need if you see only those hidden files. That's possible with PowerShell.

The options we're going to use for that are:

-Force: show files otherwise not accessible by the user, such as hidden files.
-Hidden: show only those hidden files and directories.

This example will search for hidden files in our user's folder:

gci -Path ~\ -Force -Hidden

Everything here is usually invisible to the typical user. But not for you now :D

The interesting thing is that there are more files not available to the user than the available ones. If you're brave enough, you can see them yourself (Remember! Ctrl + C stops the command!):

gci -Path ~\ -Force -Hidden -Recurse

6. How can you know all the properties that you can use as a filter?

Up until now, we've used some common properties, like Name and FullName. But there are many others that you might want to access, like CreationTime (the date the file was created) or LastWriteTime (the date the file was last edited).

In this section, I'll first show you how to see all the possible properties. After that, you'll learn how to retrieve only the property you want for scripting purposes.

Go through step 2 above if you haven't already, because we're going to use the same files that we created before.

Move to the Documents folder in PowerShell.

I hope that this script looks familiar to you now. It searches for files with "item" in their names and checks if these names contain the word "green" (all lowercase letters):

Get-ChildItem -Recurse -Filter "*item*" | 
Where-Object { $_.Name -clike "*green*" }

We know that only one file should appear (if you don't trust me, just see for yourself). So, we're going to see every possible property we can use by appending (adding at the end) this fragment of code:
| Select-Object -Property *

Select-Object (alias: select) is used for selecting properties of objects. With the -Property option we tell it which properties to show, and it displays both their names and their values.

For example:

Name of property: FullName
Value of property: ~\Documents\More items\item 1- green bracelet.txt

The asterisk * at the end tells this command to show these names and values for every property possible.

The final version of this command looks like this:

Get-ChildItem -Recurse -Filter "*item*" | 
Where-Object { $_.Name -clike "*green*" } | 
Select-Object -Property *

Try finding the FullName property in there :D

This command showed us all possible properties that we can use for that 1 file that it found. If there were more files fitting the filter, then every single one of them would have a similar list of properties. But for different types of files you will get different results.
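Another way to see what's available is Get-Member, which lists only the property names and their types, without the values. Here's a short sketch, again assuming the files from step 2 and that PowerShell is pointing to your Documents folder. The seven-day cutoff in the second command is just an arbitrary example:

```powershell
# List the names and types of the properties of the matching file(s)
Get-ChildItem -Recurse -Filter "*item*" |
Where-Object { $_.Name -clike "*green*" } |
Get-Member -MemberType Property

# Any of those properties can then be used inside Where-Object,
# for example to keep only the items edited during the last 7 days
Get-ChildItem -Recurse -Filter "*item*" |
Where-Object { $_.LastWriteTime -gt (Get-Date).AddDays(-7) }
```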

How to retrieve only 1 desired property

You've already learned how to check for all possible properties. So, how do we use any of them? Just put one of them in place of the asterisk * at the end of the command, like we put CreationTime in here:

Get-ChildItem -Recurse -Filter "*green*" -File |
Where-Object { $_.Name -clike "*green*" } | 
Select-Object -Property CreationTime

You can use any other property for the sake of this exercise, like LastWriteTime:

Get-ChildItem -Recurse -Filter "*green*" -File |
Where-Object { $_.Name -clike "*green*" } | 
Select-Object -Property LastWriteTime

What if you want to retrieve only the value of the property without its name (because you already know its name and it also messes up your script)? You can retrieve just the value, by changing the -Property to -ExpandProperty:

Get-ChildItem -Recurse -Filter "*green*" -File |
Where-Object { $_.Name -clike "*green*" } | 
Select-Object -ExpandProperty LastWriteTime

See the result:

7. I don't know the file’s name, but I know what's inside it. How do I find the file by its content?

Sometimes it's easier to find a file by searching it by its content. Or perhaps you have lots of similar files and you'd like to check them quickly without opening and closing them. I'll show you some techniques that will let you achieve that in no time.

This command will search every file on your system for the specified word or phrase (in our case, the phrase is "match"):

Get-ChildItem -Path C:\ -Recurse -File | 
Select-String -Pattern 'match' -List

Here's what's happening:

Get-ChildItem -Path C:\ -Recurse -File: as you already know, this part searches for every file on your computer.
| – passes the list of files to the next command. So, the next command will search for a certain phrase only in the files listed by gci.
Select-String – "String" is a common word in programming used to describe a word/phrase/some text. So, we select the phrase that we want to search for. That phrase is specified by the -Pattern parameter (in our case it's "match").
-List tells the command to show only the first found match in every file (great if you want to just see the list of all found files).

Here's an example output of our command:

Of course, you have quite a lot of files, and some images may also appear in your search (like .svg files that are basically text files that tell the system how to draw an icon). So, it's always best to specify what type of file you're searching for. Let's look for the phrase "red" inside .svg files:

Get-ChildItem -Filter "*.svg" -Recurse | 
Select-String -Pattern 'red' -List

On the other hand, some documents will never appear in your search. For example, .doc and .docx files are not stored as plain text (a .docx file is a compressed archive), so Select-String can't read the words inside them.

But in regular text files, you can search for phrases with an emphasis on big and small letters with the -CaseSensitive option. Here, we're going to search for the phrase "github" with only lowercase letters:

Get-ChildItem -Filter "*.txt" -Recurse | 
Select-String -Pattern 'github' -List -CaseSensitive

Other options that you'll often use with the Select-String command are:

Select-String -AllMatches finds every match within each matching line, instead of stopping at the first match on that line. (To get every matching line in a file rather than only one match per file, leave out -List.)
Select-String -Context 3 shows the three lines of text before and after the line in which the match is found.
Select-String -Raw (available in PowerShell 7 and later) won't show you the paths, just the matching text from the files. This is great for automation and scripts.

Let's see some of these options in action:

Get-ChildItem -Filter "*.txt" -Recurse | 
Select-String -Pattern 'github' -AllMatches -Context 3

Thanks to the -Context parameter, you can see a total of seven lines for each match (the matching line plus three lines before and three lines after it), one after another. This makes it easier to tell apart matches that appear in very similar contexts.
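To see how -List, -AllMatches and -CaseSensitive change what Select-String returns, here's a small sketch that runs them against a throwaway three-line file in your temp folder. The file name, its content and the variable names are made up for this example:

```powershell
# Create a throwaway file with three lines of text
$file = Join-Path ([System.IO.Path]::GetTempPath()) "select-string-demo-$(Get-Random).txt"
Set-Content -Path $file -Value @('github GitHub github', 'nothing here', 'github again')

# -List stops after the first matching line in the file
$firstOnly = @(Select-String -Path $file -Pattern 'github' -List)
$firstOnly.Count             # 1

# Without -List, every matching line is returned
$everyLine = @(Select-String -Path $file -Pattern 'github')
$everyLine.Count             # 2 (the first and the third line)

# -AllMatches records every match inside a line (case-insensitive by default)
$allOnLine = @(Select-String -Path $file -Pattern 'github' -AllMatches)
$allOnLine[0].Matches.Count  # 3

# Adding -CaseSensitive skips "GitHub" on the first line
$caseOnly = @(Select-String -Path $file -Pattern 'github' -AllMatches -CaseSensitive)
$caseOnly[0].Matches.Count   # 2

Remove-Item -Path $file
```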

If you ever feel like there's too much clutter on your screen, you can combine Select-String with Select-Object to get only the paths of the files with matched phrases.

The command below will search every .txt file under your current directory for the phrase specified:

Get-ChildItem -Filter "*.txt" -Recurse | 
Select-String -Pattern 'github' -List

Let's add the Select-Object -Property Path filter at the end. Now, the command will only show the paths, so there's less clutter on your screen:

Get-ChildItem -Filter "*.txt" -Recurse | 
Select-String -Pattern 'github' -List | 
Select-Object -Property Path

Some of the paths are not fully visible. We'll fix that in the next step.

8. I can't see the full path - how do I fix this?

Let's format the results with the Format-Table -Wrap -AutoSize command. -Autosize allows the result to take the whole available space. -Wrap allows wrapping (continuing the text in the next line when it doesn't fit in the space available), which creates more space if it's needed.

Here's an example:

Get-ChildItem -Path C:\ -Filter "*.txt" -Recurse | 
Select-String -Pattern 'github' -List | 
Select -Property Path | 
Format-Table -Wrap -AutoSize

Now, you can see the whole paths (or any other results you need) even in PowerShell!

9. Hard to read? Open the results in the text editor of your choice

You can send the results of any script/command to a file in two ways:

> ~\Documents\command_output.txt
OR
| Out-File ~\Documents\command_output.txt

Both of these will create a file inside your Documents folder, which you can later open in any program of your choice and edit.

Just add whichever solution you prefer to the end of your command, like here:

Get-ChildItem -Filter "*.txt" -Recurse | 
Select-String -Pattern 'match' -List | 
Select -Property Path | 
Out-File ~\Documents\command_output.txt

In the image below, first you'll see the same command, but without exporting the results to another file. The second command, at the bottom of the image, will export the results to the other file without showing them in PowerShell:

You'll see the results from second command after opening the file in any text editor:

But, what if you can't see the full path even in your text editor?

To address this, you can add | Format-Table -Wrap -AutoSize right before sending the results to the file:

Get-ChildItem -Path C:\ -Filter "*.txt" -Recurse | 
Select-String -Pattern 'match' -List | 
Select -Property Path | 
Format-Table -Wrap -AutoSize |
Out-File ~\Documents\command_output.txt

And open the file to see the whole path!

Just remember that a wrapped path is split across several lines. Each arrow in the screenshot above marks a "newline" character, which you have to delete. Only after doing that can you copy the whole path and paste it into Windows Explorer or into some script.

10. Summary: the Ultimate Commands for Searching and Finding Whatever You Need

Here you can download a free cheat sheet with explanations of the commands and examples in one place.

Most used commands:

Case-sensitive search:

Get-ChildItem -Path C:\ -Recurse -Filter "*whatYouNeed*" |
Where-Object { $_.Name -clike "*whatYouNeed*" } |
Select-Object -Property FullName |
Format-Table -Wrap -AutoSize

Alternatively, send the result to a file:

Get-ChildItem -Path C:\ -Recurse -Filter "*whatYouNeed*" |
Where-Object { $_.Name -clike "*whatYouNeed*" } |
Select-Object -Property FullName |
Format-Table -Wrap -AutoSize |
Out-File ~\Documents\command_output.txt

Search by file's content:

Get-ChildItem -Path C:\ -Recurse | 
Select-String -Pattern 'what you remember' -AllMatches -Context 2 |
Format-Table -Wrap -AutoSize

Alternatively, send the result to a file:

Get-ChildItem -Path C:\ -Recurse | 
Select-String -Pattern 'what you remember' -CaseSensitive -AllMatches -Context 2 |
Format-Table -Wrap -AutoSize |
Out-File ~\Documents\command_output.txt

These commands should work for anything you want to find. I hope you understand now how they function after reading through this tutorial ;)

Wrapping Up

If you want to learn more about these commands, I show you how to work with them in depth in my tutorial “Learn PowerShell commands like a Linux user”.

If what you found here helped you in any way, consider following me on social media to help me reach a wider audience: Mastodon, LinkedIn.

You can also rate me on Github and support me on Ko-fi!

Thank you for any support you're able to give. Have a great day!

How to Elevate Your Database Game: Supercharging Query Performance with Postgres FDW

Hamdaan Ali — Wed, 18 Feb 2026 22:36:48 +0000

Foreign data wrappers (FDWs) make remote Postgres tables feel local. That convenience is exactly why FDW performance surprises are so common.

A query that looks like a normal join can execute like a distributed system: rows move across the network, remote statements get executed repeatedly, and the local planner quietly becomes a coordinator. In that world, “fast SQL” is not mainly about CPU or indexes. It’s about data movement and round-trips.

This handbook covers the mechanism that determines whether a federated query behaves like a clean remote query or a chatty distributed workflow: pushdown.

Pushdown is not “moving compute”. Pushdown determines whether filtering, joining, ordering, and aggregation occur at the data source or after the data has already crossed the wire. When pushdown works, the local server receives a reduced result set. When it doesn’t, Postgres often has to fetch broad intermediate sets and finish the work locally.

The chapters ahead will help you build a practical mental model of what is “shippable” in postgres_fdw, why some expressions are blocked, and how to read EXPLAIN (ANALYZE, BUFFERS, VERBOSE) without getting tricked by familiar plan shapes.

After the core method, the handbook covers tuning knobs that matter in production, schema and indexing considerations, benchmarking methodology, monitoring and logging, and a case study that shows what a real pushdown win looks like end-to-end.

The later sections go deeper into advanced shippability edge cases, cost model calibration, and regression-proofing FDW workloads.

Prerequisites
Executive Summary
Motivation
FDW Basics Without the Setup Tax
Pushdown Mechanics
Shippable Operations: a Deep Dive
Pushdown Blockers and Why They Exist
Reading EXPLAIN Like a Pro
How to Tune postgres_fdw
Schema and Index Recommendations
Benchmarking Methodology
Monitoring and Logging
Case Study: Refactoring a Keycloak Coverage Query
Checklist and Troubleshooting Guide
Case Study Takeaways
Advanced Operations: A Deeper Dive into Shippability
Common Anti‑Patterns and How to Avoid Them
Extending Tuning: Calibrating Cost Models
Further Case Studies and Practical Examples
Monitoring, Diagnostics, and Regression Testing
Extended Guidelines for Advanced DBAs
Bringing it All Together
References

Prerequisites

This handbook assumes basic comfort with Postgres query plans. It builds on EXPLAIN (ANALYZE, BUFFERS) rather than reintroducing SQL fundamentals, indexing, or join algorithms.

The focus here is federated execution: how foreign queries behave, and how to reason about them with the same clarity as local plans.

Here’s what you should already be comfortable with:

Reading EXPLAIN (ANALYZE, BUFFERS) output and spotting obvious plan smells (row explosions, bad join order, missed indexes).
Basic join mechanics (nested loop, hash join, merge join) and why cardinality estimates matter.
Postgres statistics at a practical level (ANALYZE, correlation, and what “estimated rows vs actual rows” implies).

And here’s what you need to follow along with the examples:

A Postgres “local” instance that will run postgres_fdw and act as the coordinator.
A Postgres “remote” instance that holds the foreign tables.
Permission on the local side to:
- CREATE EXTENSION postgres_fdw;
- create a SERVER and USER MAPPING
- create FOREIGN TABLE objects (or permission to use existing ones)
A way to run queries and capture plans:
- psql is enough, and so is any GUI, as long as you can run EXPLAIN (ANALYZE, BUFFERS, VERBOSE).

We won’t go through a long environment setup walkthrough. The examples assume the FDW objects exist and focus on plans and behavior.

We also won’t go into general distributed systems theory. Only the pieces that show up in an FDW plan are used.

Executive Summary

The single most important lesson of this handbook is that FDW pushdown reduces data movement. It’s tempting to think of pushdown as merely changing where a calculation happens (“move the work to the remote”). But what really matters is whether the remote server is asked for only the rows you need.

When pushdown is working, the remote server performs the selective join and filtering, and the local Postgres receives a small, already reduced result set. When pushdown fails, the local server becomes a distributed query coordinator: it pulls large intermediate sets over the network and then finishes the heavy lifting locally.

Why does this matter? Because a refactor that makes more of your query shippable to the remote server can slash end‑to‑end latency without changing a single row of output. In the case study we'll explore later, rewriting a query so that the FDW can ship a single joined remote query, instead of performing multiple foreign scans and local joins, reduced runtime from approximately 166 ms to 25 ms. The business logic did not change – the shape of the work changed.

Below is a simple bar chart illustrating that dramatic drop. The chart uses actual timings from the case study. If you run the experiment yourself, the numbers may differ depending on your hardware and network, but the relative difference should be clear.

Motivation

Foreign data wrappers let you query remote data using the same SQL syntax you use locally. That convenience is exactly why they can be so deceptive.

A federated query may look like a normal join, but under the hood, it behaves like a distributed system: some part of the plan runs on the remote server, some on the local server, and every boundary between them is a network hop. The slow path is rarely “bad SQL” – it’s usually a combination of two things:

Too many rows are pulled over the network. Without pushdown, the FDW retrieves a large slice of the remote table and applies your filters and joins locally. This may lead to tens of thousands or millions of rows being shipped across the network when you only needed hundreds or fewer.
Too many round-trips. If the plan performs a nested loop that drives a foreign scan, it can end up executing the same remote query hundreds or thousands of times. Each call might be fast on its own, but latency adds up.

This isn't speculation. PostgreSQL's documentation makes clear that a foreign table has no local storage and that Postgres “asks the FDW to fetch data from the external source” [1]. There is no local buffer cache or heap storage to hide mistakes. Every row you retrieve must traverse the network at least once. If your plan fetches more rows than it needs, or repeatedly does so, performance can degrade quickly.

That’s why you should treat the Remote SQL shown in EXPLAIN (VERBOSE) as part of your query plan. It tells you exactly what the remote server is being asked to do. If it’s missing your filters or joins, you know the local server will have to finish the job. The rest of this handbook will teach you how to read that plan, how to force pushdown when possible, and how to recognize the signs that something has gone wrong.

FDW Basics Without the Setup Tax

You might be tempted to skip this section if you've already created foreign tables in your own databases. Don't. Understanding the architecture of foreign data wrappers is essential to understanding why pushdown matters.

SQL/MED in a nutshell

PostgreSQL implements the SQL/MED (Management of External Data) standard through its FDW framework. To access a remote Postgres server via postgres_fdw, you perform four steps:

Install the extension: CREATE EXTENSION postgres_fdw tells Postgres to load the FDW code.
Create a foreign server: CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '...', port '...', dbname '...') defines where the remote server resides and how to connect.
Create a user mapping: CREATE USER MAPPING FOR your_user SERVER foreign_server OPTIONS (user 'remote_user', password '...') tells Postgres how to authenticate on the remote side.
Create a foreign table: CREATE FOREIGN TABLE remote_table (...) SERVER foreign_server OPTIONS (schema_name '...', table_name '...'); defines the columns and references the remote table.

Once you've done that, you can run SELECT statements against the foreign table as if it were local. But the definition hides an important detail: there is no storage associated with that foreign table [1]. Every time you SELECT, INSERT, UPDATE, or DELETE, the FDW must connect to the remote server, build a remote query, send it, and read the results. This overhead is small for simple queries but becomes critical as queries get more complex.
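The four steps above can be sketched as a single script run on the local (coordinator) instance. This is a minimal sketch, not a production setup: the host name, database name, credentials, and the column list of remote_table are placeholder assumptions and must be replaced with values that match your remote server.

```sql
-- Run on the local (coordinator) instance.
-- Host, dbname, credentials, and columns below are illustrative placeholders.

-- 1. Load the FDW code.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- 2. Describe where the remote server lives.
CREATE SERVER foreign_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.internal', port '5432', dbname 'appdb');

-- 3. Tell Postgres how the current local role authenticates remotely.
CREATE USER MAPPING FOR CURRENT_USER
  SERVER foreign_server
  OPTIONS (user 'remote_user', password 'change-me');

-- 4. Declare the local shape of the remote table (no data is stored locally).
CREATE FOREIGN TABLE remote_table (
  id         bigint,
  name       text,
  created_at timestamptz
)
  SERVER foreign_server
  OPTIONS (schema_name 'public', table_name 'remote_table');
```

The column types in the foreign table definition should match the remote table's types, for the reasons discussed in the type and collation sections of this handbook.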

What postgres_fdw does and does not do

postgres_fdw does two things for you:

It builds remote SQL from your query, including pushing down safe filters, joins, sorts, and aggregates when it can.
It fetches rows from the remote server and hands them to the local executor. If some part of your query cannot be executed remotely, the local executor performs that part.

The FDW tries hard to minimize data transfer by sending as much of your WHERE clause as possible to the remote server and by not retrieving unused columns [2]. It also has a number of tuning knobs that we'll explore later (such as fetch_size, use_remote_estimate, fdw_startup_cost, and fdw_tuple_cost [3]). But the real win often comes from structuring your query so that the FDW can push work down.

There's one last architectural point to keep in mind: the remote server runs with a restricted session environment. In remote sessions opened by postgres_fdw, the search_path is set to pg_catalog only, and TimeZone, DateStyle, and IntervalStyle are set to specific values [4]. This means that any functions you expect to run remotely must be schema‑qualified or packaged in a way that the FDW can find them. It also underscores why you should not override session settings for FDW connections unless you know exactly what you are doing [4].

Pushdown Mechanics

At a high level, “pushdown” means pushing as much of your SQL query as possible to the remote server. But the FDW cannot simply send arbitrary SQL. It must be safe and portable for remote evaluation. Postgres uses the term shippable to describe expressions and operations that can be evaluated on the foreign server.

What “shippable” means in practice

An expression is considered shippable if it meets several conditions:

It uses built‑in functions, operators, or data types, or functions/operators from extensions that have been explicitly allow‑listed via the extensions option on the foreign server [2]. If you use a custom function or an extension that has not been declared, the FDW assumes it cannot run remotely.
It uses only IMMUTABLE functions. Postgres distinguishes between IMMUTABLE, STABLE, and VOLATILE functions. Only immutable functions – those that always return the same output for the same inputs and don’t depend on session state – are candidates for pushdown [5]. This rule prevents functions such as now() (which is STABLE) or random() (which is VOLATILE) from being evaluated remotely, because the result might differ between the local and remote servers.
It doesn’t depend on local collations or type conversions. PostgreSQL’s docs warn that type or collation mismatches can lead to semantic anomalies [1]. If the FDW cannot guarantee that a comparison behaves identically on both servers, it will refuse to push it down. For example, comparing a citext column to a text constant could be unsafe if the remote server doesn’t have the citext extension installed.

From these rules, you can derive a mental checklist: avoid non‑immutable functions in your WHERE clause, keep your join conditions simple and typed correctly, and list any third‑party extensions you want to use in the foreign server’s extensions option so that they are considered shippable [2].
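One way to apply the immutability rule is to look up a function's volatility in the system catalog before relying on it in a shipped predicate. The query below is a small sketch that can be run on any Postgres instance; the three function names in the IN list are only examples.

```sql
-- List the volatility class of a few built-in functions.
-- provolatile is 'i' (IMMUTABLE), 's' (STABLE), or 'v' (VOLATILE).
SELECT p.oid::regprocedure AS function_signature,
       CASE p.provolatile
         WHEN 'i' THEN 'IMMUTABLE'
         WHEN 's' THEN 'STABLE'
         WHEN 'v' THEN 'VOLATILE'
       END AS volatility
FROM pg_catalog.pg_proc AS p
WHERE p.proname IN ('now', 'random', 'lower')
ORDER BY p.proname, function_signature;
```

Functions reported as STABLE or VOLATILE are the ones to replace with constants or parameters if you want the surrounding predicate to be shippable.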

WHERE pushdown

If a WHERE clause consists entirely of shippable expressions, it will be included in the remote query. Otherwise, it will be evaluated locally. This matters because pushing a filter down reduces the number of rows returned to the local server.

Consider a predicate like this:

WHERE created_at >= now() - interval '30 days'

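Because now() is STABLE rather than IMMUTABLE (its value is fixed within a transaction but depends on when that transaction started), postgres_fdw does not treat the expression as shippable. The FDW therefore fetches every row that passes the remaining shippable conditions, which can mean the entire table, and applies the date filter locally.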

A better approach is to pass a parameter into the query or compute the cutoff timestamp once in the application and embed it into the SQL.
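As a sketch of both options, assume a foreign table named remote_table with id, name, and created_at columns (the table, columns, and timestamp value are illustrative assumptions). The first statement embeds a cutoff computed by the application. The second uses a prepared statement, so that now() is evaluated once locally when the argument is bound and the foreign scan only sees a parameter.

```sql
-- Option 1: the application computes the cutoff and embeds it as a literal.
-- In the EXPLAIN (VERBOSE) output, check that the Remote SQL line
-- contains the created_at comparison.
EXPLAIN (VERBOSE)
SELECT id, name
FROM remote_table
WHERE created_at >= timestamptz '2026-01-19 00:00:00+00';

-- Option 2: pass the cutoff as a parameter.
-- The argument expression is evaluated locally, once, at EXECUTE time.
PREPARE recent_rows(timestamptz) AS
  SELECT id, name
  FROM remote_table
  WHERE created_at >= $1;

EXECUTE recent_rows(now() - interval '30 days');

DEALLOCATE recent_rows;
```

Either way, the predicate that reaches the planner compares a column against a constant or a parameter, which avoids the non-immutable function inside the foreign-table filter.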

Join pushdown conditions

Joins are the next big lever. When postgres_fdw encounters a join between foreign tables on the same foreign server, it will send the entire join to the remote server unless it believes it will be more efficient to fetch the tables individually or unless the tables use different user mappings [6].

It applies the same precautions described for WHERE clauses: the join condition must be shippable, and both tables must be on the same server. Cross‑server joins are never pushed down – the FDW will perform them locally.

Shippability decision tree

It can be helpful to visualize the shippability rules as a flowchart. Below is a simple decision tree that you can use when inspecting an expression or join clause.

It starts with the question of whether an expression is in a WHERE or JOIN clause. Further decisions are made based on factors like using volatile functions, built-in functions, type mismatches, or cross-server joins. The flowchart concludes with outcomes like "Not shippable, evaluated locally" or "Shippable, included in Remote SQL."

If you reach the left side of the tree, the expression will be evaluated locally. If you reach the right side, the FDW can ship it.

Shippable Operations: a Deep Dive

Postgres has been expanding what postgres_fdw can push down over several versions. This section walks through each operation class and the conditions required for pushdown.

Filters (WHERE clauses)

As explained above, simple filters that use built‑in operators and immutable functions are generally pushed down. If you see a Filter: line on a Foreign Scan node in your plan, it means some part of your predicate didn’t qualify. Common reasons include using now(), certain timezone() variants, or other non-immutable functions, referencing a non‑allow‑listed extension, or comparing values with different collation settings.

When this happens, the entire table (or at least all rows matching other shippable conditions) is fetched, and the filter is applied locally.

Plan smell: Look for a Foreign Scan node that carries its own Filter: line. That means filtering happened locally. Also look for broad Remote SQL such as:

SELECT * FROM remote_table WHERE (name = 'Hamdaan')

that omits the other constraints from your query (here, the group scoping). That's a sign that part of the filter was not pushed down.

Joins

Simple inner joins between foreign tables on the same foreign server are usually pushable. The join condition must satisfy the same shippability rules as filters. If the join involves more than one foreign server, if the join condition uses an unshippable function, or if the foreign tables use different user mappings, the FDW will fetch each table separately and join them locally [6]. This can lead to large intermediate sets being transferred.

Plan smell: A Hash Join or Merge Join where both inputs are Foreign Scan nodes indicates that the join was performed locally. Conversely, a single Foreign Scan representing a join and containing the JOIN ... ON clause in Remote SQL indicates that the join was pushed down.
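To check this plan smell on a concrete query, a sketch like the following can help. It assumes user_attribute and user_entity are both foreign tables on the same foreign server with the same user mapping, and that the columns used exist with these names; the filter value is taken from the examples in this handbook.

```sql
-- Inspect whether a two-table join is shipped as one remote statement.
EXPLAIN (VERBOSE)
SELECT ua.user_id, ua.value
FROM user_attribute AS ua
JOIN user_entity AS u ON ua.user_id = u.id
WHERE ua.name = 'attribute A'
  AND u.enabled;

-- Pushed down: a single Foreign Scan whose Remote SQL contains a JOIN ... ON clause.
-- Not pushed down: a local Hash Join, Merge Join, or Nested Loop
-- with a separate Foreign Scan for each table.
```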

Aggregates (GROUP BY, COUNT, SUM, and so on)

Starting in PostgreSQL 10, aggregates can be pushed to the remote server when possible. The release notes state explicitly: “push aggregate functions to the remote server,” and explain that this reduces the amount of data that must be transferred from the remote server and offloads aggregate computation [7].

To qualify, both the grouping expressions and the aggregate functions themselves must be shippable. If the FDW cannot push an aggregate, it will fetch the raw rows and perform the aggregation locally.

Plan smell: Look for a GroupAggregate node above a Foreign Scan that returns many rows. When the aggregate is pushed down, there will be no local aggregate node. Instead, the Remote SQL will include a GROUP BY clause.
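The same check works for aggregates. The sketch below assumes user_attribute is a foreign table with a name column; count(*) and a plain column grouping are built-in and therefore candidates for pushdown.

```sql
-- Inspect whether GROUP BY and count(*) are evaluated remotely.
EXPLAIN (VERBOSE)
SELECT ua.name, count(*) AS attribute_rows
FROM user_attribute AS ua
GROUP BY ua.name;

-- Pushed down: the Foreign Scan's Remote SQL contains count(*) and GROUP BY,
-- and there is no local aggregate node above it.
-- Not pushed down: a local HashAggregate or GroupAggregate sits above a
-- Foreign Scan that returns the raw rows.
```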

ORDER BY and LIMIT

Prior to PostgreSQL 12, sorting and limiting were rarely pushed down. In version 12, Etsuro Fujita’s patch allows ORDER BY sorts and LIMIT clauses to be pushed to postgres_fdw foreign servers in more cases [8]. For the sort or limit to be pushed, the underlying scan must be pushable, and the ordering expression must be shippable. Partitioned queries or complicated join trees may still cause the sort or limit to be applied locally.

Plan smell: A local Sort or Limit node above a Foreign Scan indicates the operation was not pushed down. Conversely, a Remote SQL statement containing ORDER BY and LIMIT indicates that pushdown succeeded.
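A top-N query is a convenient probe for sort and limit pushdown. This sketch assumes a foreign table remote_table with id, name, and created_at columns (illustrative names) and PostgreSQL 12 or later on the local side.

```sql
-- Inspect whether the sort and the row limit reach the remote server.
EXPLAIN (VERBOSE)
SELECT id, name, created_at
FROM remote_table
ORDER BY created_at DESC
LIMIT 20;

-- Pushed down: Remote SQL ends with ORDER BY ... and LIMIT ...,
-- so at most 20 rows cross the network.
-- Not pushed down: local Limit and Sort nodes above the Foreign Scan,
-- which means the sort input was fetched in full.
```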

DISTINCT

Distinct operations can be pushed down when the distinct expression list is shippable. But if the distinct is combined with unshippable expressions, or if the distinct is applied after a join that cannot be pushed down, the FDW will retrieve all rows and perform the distinct locally.

Window functions

In practice, window functions are rarely pushed down through postgres_fdw. They often require ordering or partitioning semantics that are difficult to represent portably. If you see a WindowAgg node in your plan, it’s almost always local. That doesn’t mean you can't use window functions with foreign tables, but you should expect them to incur network and CPU costs.

Version differences

Postgres developers continue to improve the FDW layer. Here are some notable changes by version:

PostgreSQL 9.6 introduced remote join pushdown and allowed UPDATE/DELETE pushdown. Before 9.6, all joins were local.
PostgreSQL 10 introduced aggregate pushdown, enabling remote GROUP BY and aggregate functions [7].
PostgreSQL 12 expanded ORDER BY and LIMIT pushdown [8].
PostgreSQL 15 added pushdown for certain CASE expressions and other improvements.

If you learned FDW behavior on an older version, revisit your assumptions.

Pushdown Blockers and Why They Exist

When pushdown fails, it’s not due to bad luck. There’s always a reason grounded in safety or correctness. Here are the most common blockers and how to diagnose them.

Non‑immutable functions

Functions marked VOLATILE or STABLE cannot be pushed down because their results may differ between the local and remote server. Examples include now(), random(), current_user, and user‑defined functions that look at session variables or query the database. Even functions you might think are harmless, like age() or clock_timestamp(), can cause pushdown to fail.

Fix: Compute non-immutable values in your application or in a CTE before referencing the foreign table. For example, compute timestamp 'now' - interval '30 days' as a constant and compare your created_at column against that constant. Alternatively, when the logic itself is deterministic, move it into a stored generated column on the remote table. Generated column expressions must be immutable, so this option doesn't help with time-dependent logic.

Type and collation mismatches

The documentation warns that when types or collations don’t match between the local and remote tables, the remote server may interpret conditions differently [1]. This is particularly insidious when text comparisons, case‑insensitive collations, or non‑default locale settings are used. If Postgres can't guarantee the same semantics, it will pull rows locally and evaluate the expression.

Fix: Make sure that your foreign table definition uses the same data types and collations as the remote table. When in doubt, explicitly cast values to a common type.

Cross‑server joins

Joins across different foreign servers cannot be pushed down. The FDW can only ship a join when both tables reside on the same remote server and use the same user mapping [6]. Otherwise, it will perform two separate scans and join the results locally.

Fix: If you frequently join tables across servers, consider consolidating the tables on a single server, materializing a view on one side, or pulling the smaller table into a temporary local table before joining.
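The temporary-table option might look like the sketch below. The names small_remote (a foreign table on one server), big_remote (a foreign table on another server), their columns, and the tenant value are all hypothetical. Note that the final join is still executed locally; the gain is that the small side is fetched once and has local statistics.

```sql
-- Copy the small, already-filtered side into a local temporary table.
CREATE TEMP TABLE small_local AS
SELECT id, tenant_id
FROM small_remote
WHERE tenant_id = 42;

-- Give the planner row counts and value distributions for the copy.
ANALYZE small_local;

-- Join the local copy to the larger foreign table.
EXPLAIN (VERBOSE)
SELECT b.id, b.small_id
FROM big_remote AS b
JOIN small_local AS s ON s.id = b.small_id;
```

Check the loops value on the remaining Foreign Scan afterwards; if the planner chooses a parameterized scan per row of small_local, the round-trip problem described in this handbook can reappear.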

Mixed local and foreign joins

A join between a local table and a foreign table will not be pushed down. Even though the foreign side might be pushdown‑eligible, the FDW cannot join it with local data on the remote server. A nested loop with a parameterized foreign scan is the typical pattern here, resulting in many remote calls.

Fix: Filter or aggregate as much as possible on the foreign side first (via a CTE or by materializing a subset) before joining to local tables.

Remote session settings and search paths

Because postgres_fdw sets a restricted search_path, TimeZone, DateStyle, and IntervalStyle in remote sessions [4], any functions you call must be schema‑qualified or otherwise compatible. If a function relies on the current search path or session settings, it may break or produce different results on the remote side.

Fix: Schema‑qualify remote functions and ensure that any environment‑dependent logic is safe to execute under the default FDW session settings. If necessary, attach SET search_path or other settings to your remote functions.
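Attaching a search path to a function is done on the remote server with ALTER FUNCTION. The sketch below uses a hypothetical function app.normalize_email(text) in a hypothetical schema app.

```sql
-- Run on the remote server.
-- Pins the search path used while this function executes, so it no longer
-- depends on the pg_catalog-only search_path of postgres_fdw sessions.
ALTER FUNCTION app.normalize_email(text)
  SET search_path = app, pg_catalog;
```

Callers on the local side should still reference the function with its schema, for example app.normalize_email(email), so that name resolution does not rely on the session's search path.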

Troubleshooting matrix

The table below maps symptoms in your EXPLAIN plan to likely causes and fixes. Use it as a quick diagnostic tool when something looks off.

Symptom in plan	Likely cause	Suggested fix
Foreign Scan has loops much greater than 1	Parameterized remote lookup caused by nested loop, join conditions not shippable	Rewrite join so the FDW can ship a single joined query, or batch remote requests via an `IN` list or temporary table
Broad Remote SQL that lacks scope predicates	`WHERE` clause contains non‑immutable functions or unsupported operators	Replace volatile functions with constants or allow‑list extension functions, ensure types and collations match
Local Hash Join or Merge Join between two foreign tables	Join could not be pushed down (different servers, user mappings, or unshippable join expression)	Consolidate tables on one server, align user mappings, or rewrite the join condition
Local Sort, Limit, or Unique on top of a Foreign Scan	`ORDER BY`, `LIMIT`, or `DISTINCT` could not be pushed down	Simplify sort expressions, push filters deeper, check PG version for improvements
Plan runs but gives wrong results when pushdown is enabled	Semantic mismatch due to type/collation differences or remote session settings [1] [4]	Align types/collations, schema‑qualify functions, use stable session settings

Reading EXPLAIN Like a Pro

Many developers skim EXPLAIN plans for local queries, looking at the top nodes and overall cost. For FDW queries, you must invert that habit: read the foreign parts first. The Remote SQL string tells you what the remote server is being asked to do, and the loops field tells you how many times that remote call is executed.

Inspect the Foreign Scan nodes

Start by finding the Foreign Scan node(s). In EXPLAIN (VERBOSE), each foreign scan includes a line like:

Remote SQL: SELECT ...

This line is not a trivial detail – it’s the actual SQL that will run on the remote server. Read it carefully. Does it include your WHERE predicates? Does it include your join conditions? If not, you know the local server will pick up the slack.

Look at the loops value. If it exceeds 1, the same remote query is executed multiple times. For example:

Foreign Scan on public.user_entity  (rows=1 loops=416)
  Remote SQL: SELECT id, tenant_id FROM public.user_entity WHERE enabled AND service_account_client_link IS NULL AND id = $1

This is the “N+1” problem in disguise. The plan executes the foreign scan once per outer row. Multiply the per‑loop cost by the number of loops to understand why the query is slow. The fix is to rewrite the query so that the join and filters are applied in a single remote call.

Recognize InitPlan vs SubPlan

An InitPlan runs once and caches its result. A SubPlan can run per outer row. In FDW queries, subplans often drive parameterized remote scans. If you see a SubPlan attached to a nested loop that feeds a foreign scan, suspect a parameterized remote lookup and look for ways to turn it into an InitPlan or merge it into a single remote query.

Understand CTE materialization

Common table expressions (CTEs) behave differently depending on whether they are marked MATERIALIZED or NOT MATERIALIZED. A materialized CTE is computed once and stored in a temporary structure, then read by the rest of the query. A non‑materialized CTE is inlined into the parent query, allowing optimizations to span across the boundary.

In PostgreSQL 12 and later, a CTE is inlined by default when it is non-recursive, has no side effects, and is referenced only once. Otherwise, or when it’s explicitly marked MATERIALIZED, it is materialized. Materializing a CTE that contains a foreign scan can freeze a broad remote fetch and prevent later clauses from being pushed down. On the other hand, materialization can prevent repeated remote scans if the CTE is referenced multiple times. Use this lever deliberately to control where remote work happens.
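As a sketch of using materialization deliberately, the query below fetches a filtered set of users from the foreign side once and then joins it to a local table. It assumes user_entity is a foreign table and filtercategory is a local table, as in the index recommendations in this handbook; the entitytype value 'user' and the assumption that entityid has the same type as user_entity.id are illustrative.

```sql
-- Fetch the scoped foreign rows once, then join locally.
WITH scoped_users AS MATERIALIZED (
  SELECT u.id
  FROM user_entity AS u
  WHERE u.enabled
    AND u.service_account_client_link IS NULL
)
SELECT fc.entityid
FROM filtercategory AS fc
JOIN scoped_users AS su ON su.id = fc.entityid
WHERE fc.entitytype = 'user';
```

With MATERIALIZED, the foreign scan inside the CTE should run with loops=1. Swapping in NOT MATERIALIZED lets the planner inline the CTE, which may or may not be better; comparing the two plans with EXPLAIN (ANALYZE, VERBOSE) is the reliable way to decide.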

Annotated example

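Let's annotate a simplified excerpt from a real plan. For comparison, it places a Foreign Scan from the old plan next to a Foreign Scan from the refactored plan. The goal is to show how to quickly read the relevant parts.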

Nested Loop  (rows=414 loops=1)
  -> Hash Join  (rows=416 loops=1)
       -> Foreign Scan on public.user_entity (rows=1 loops=416)
            Remote SQL: SELECT id, tenant_id FROM public.user_entity WHERE enabled AND service_account_client_link IS NULL AND id = $1
  -> Foreign Scan on public.user_attribute (rows=671 loops=1)
       Remote SQL: SELECT ua.user_id, ua.value FROM user_attribute ua JOIN user_entity u ON ua.user_id = u.id JOIN tenant r ON u.tenant_id = r.id WHERE ua.name = 'attribute A' AND r.name = 'demo' AND u.enabled AND u.service_account_client_link IS NULL AND (g.name = 'keycloak-group-a' OR g.parent_group = $1)

In the old plan, the first Foreign Scan executed 416 times, each time retrieving a single row. The Remote SQL only filters on enabled, service_account_client_link, and the id passed in as $1 – it doesn’t include the tenant or group scoping. That scoping is applied by the nested loop outside the foreign scan.

In the refactored plan, the second Foreign Scan results from combining user_attribute, user_entity, user_group_membership, keycloak_group, and tenant into a single remote query. It retrieves 671 rows in a single query and includes all relevant filters. There is no repeated remote call. The timing difference is driven by the different loop values and the selectivity of the Remote SQL.

How to Tune postgres_fdw

Once you've structured your query for maximum pushdown, tuning knobs let you squeeze out further performance improvements and adjust planner decisions.

fetch_size

fetch_size controls how many rows postgres_fdw retrieves per network fetch. The default is 100 rows [9]. A small fetch size means more round-trips and lower memory usage. A larger fetch size reduces network overhead at the cost of buffering more rows in memory.

In practice, increasing fetch_size to a few thousand can reduce latency for large result sets. It’s specified either at the foreign server or foreign table level:

ALTER SERVER foreign_server OPTIONS (ADD fetch_size '1000');
ALTER FOREIGN TABLE remote_table OPTIONS (ADD fetch_size '1000');

use_remote_estimate

By default, the planner estimates the cost of foreign scans using local statistics. This can be wildly inaccurate if the foreign table has a different data distribution. Setting use_remote_estimate to true tells postgres_fdw to run EXPLAIN on the remote server to get row count and cost estimates. This can dramatically improve join order selection at the cost of an additional remote query during planning [3]. You can set this per table or per server. Use ADD the first time you define the option and SET to change a value that already exists:

ALTER SERVER foreign_server OPTIONS (ADD use_remote_estimate 'true');

fdw_startup_cost and fdw_tuple_cost

These cost parameters model the overhead of starting a foreign scan and the cost per row fetched. Adjusting them can influence the planner’s choice of join strategy. A higher fdw_startup_cost discourages the planner from choosing plans with many small foreign scans (which might generate many remote calls). A higher fdw_tuple_cost discourages plans that fetch large numbers of rows [3]. Use these only after you have solid evidence from EXPLAIN and experiments.
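Both parameters are ordinary server options. The values in this sketch are arbitrary illustrations, not recommendations, and should come from your own measurements.

```sql
-- Raise the modeled overhead of starting a foreign scan and of fetching each row.
-- ADD defines an option that is not yet set on the server;
-- use SET instead to change an option that already exists.
ALTER SERVER foreign_server
  OPTIONS (ADD fdw_startup_cost '200', ADD fdw_tuple_cost '0.05');

-- Re-check the plan after the change.
EXPLAIN (VERBOSE)
SELECT id, name
FROM remote_table
WHERE name = 'Hamdaan';
```

Changing one parameter at a time and comparing plans before and after makes it easier to attribute any difference to the right knob.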

ANALYZE and analyze_sampling

Running ANALYZE on a foreign table collects local statistics by sampling the remote table [3]. Accurate stats are essential for good estimates when use_remote_estimate is false.

But if the remote table changes frequently, these stats become stale quickly. The analyze_sampling option (available in PostgreSQL 16 and later) controls whether sampling happens on the remote side or locally. When analyze_sampling is set to random, system, bernoulli, or auto (the default), ANALYZE samples rows remotely instead of pulling all rows into the local server; off disables remote sampling [3].
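A sketch of refreshing local statistics for a foreign table follows. It assumes a foreign table named remote_table and a local server running PostgreSQL 16 or later for the analyze_sampling option; on older versions, omit the first statement.

```sql
-- Ask postgres_fdw to sample on the remote side using TABLESAMPLE SYSTEM.
-- ADD defines the option; use SET if it is already defined on this table.
ALTER FOREIGN TABLE remote_table
  OPTIONS (ADD analyze_sampling 'system');

-- Collect local statistics for the foreign table.
ANALYZE remote_table;

-- Confirm that the planner now has a row estimate for it.
SELECT relname, reltuples
FROM pg_catalog.pg_class
WHERE relname = 'remote_table';
```

Because foreign tables are not covered by autovacuum's automatic ANALYZE, a scheduled job is a common way to keep these statistics fresh.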

extensions

The extensions option lists extensions whose functions and operators can be shipped to the remote server [2]. If you rely on functions from citext, pg_trgm, or other extensions, add them to the server definition:

ALTER SERVER foreign_server OPTIONS (ADD extensions 'citext,pg_trgm');

A quick knob impact table

Knob	Primary effect	When to change it	Possible downside
fetch_size	Number of rows per fetch	Result sets are large and latency dominates	Too large consumes memory
use_remote_estimate	Better row count/cost estimates	Planner misestimates foreign scans	Extra remote queries during planning
fdw_startup_cost	Penalty per foreign scan	Planner chooses many small foreign scans	Wrong values bias the planner
fdw_tuple_cost	Cost per row fetched	Planner pulls too many rows	Mis‑tuned values mislead planner
extensions	Which extension functions are shippable	Using extension functions in predicates	Extensions must exist and match on both servers

Schema and Index Recommendations

Pushdown doesn’t eliminate the need for good indexes. In fact, effective pushdown depends on the remote server having indexes that support the filter and join predicates you’re shipping.

Below are some patterns to watch for in FDW queries and the indexes that support them. You can adapt these to your own schema.

Table	Access pattern	Recommended index	Why
tenant (remote)	Filter by tenant.name	UNIQUE (name) or BTREE (name)	Resolves tenant ID quickly
keycloak_group (remote)	Filter by name, join by tenant_id, filter on parent_group	Composite (tenant_id, name) and (parent_group)	Supports resolving root group and walking one‑level hierarchy
user_group_membership (remote)	Join by user_id, filter by group_id	BTREE (group_id, user_id)	Efficiently finds users in a set of groups
user_attribute (remote)	Filter by name, join by user_id	Composite (name, user_id) (optionally include value)	Matches “attribute name → users → values” flow
user_entity (remote)	Filter by tenant_id, enabled, service_account_client_link IS NULL, join by id	Partial index on (tenant_id, id) with predicate on enabled and service_account_client_link IS NULL	Helps remote planner start from user table when tenant and user filters are applied
filtercategory (local)	Filter by category && uuid[], join on (entitytype, entityid)	GIN index on category, BTREE (entitytype, entityid)	Speeds array overlap checks and join predicate

In general, indexes should reflect the join order you expect the remote planner to use. If your Remote SQL starts with:

FROM user_attribute ua JOIN user_entity u ON ua.user_id = u.id JOIN user_group_membership ugm ON ...

ensure that indexes exist on user_attribute(user_id) and user_group_membership(user_id).
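A sketch of what that could look like, run directly on the remote database rather than through the FDW. The index names are hypothetical, and the composite indexes from the table above may already cover these columns in your schema.

```sql
-- Run on the remote (Keycloak) database. Index names are made up for illustration.
CREATE INDEX IF NOT EXISTS idx_user_attribute_user_id
  ON user_attribute (user_id);

CREATE INDEX IF NOT EXISTS idx_user_group_membership_user_id
  ON user_group_membership (user_id);
```

On a busy production system, CREATE INDEX CONCURRENTLY avoids blocking writes while the index is built.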

Benchmarking Methodology

It’s easy to claim a performance improvement without proper measurement. Here's a repeatable method you can use to benchmark FDW query changes.

Warm the caches. Run each query once to load data into the remote buffer cache and the local FDW connection. Discard the timings.
Measure latencies. Use EXPLAIN (ANALYZE, BUFFERS, VERBOSE) to capture execution times, buffer usage, and remote row counts. Be aware that EXPLAIN ANALYZE adds overhead, so record the raw execution time if possible by running the query directly.
Record remote metrics. On the remote server, enable pg_stat_statements and track calls, total_exec_time (named total_time before PostgreSQL 13), and rows for each remote query. This gives you a per‑query breakdown and confirms what Remote SQL is executed.
Control for concurrency and network latency. Run benchmarks during a quiet period or isolate the test cluster. If your environment has high network latency, record the round‑trip time separately to attribute delays.
Compare apples to apples. Benchmark the old and new queries under identical conditions. Use the same sample data, same remote server, and same connection settings.
Look at row counts. The primary goal of pushdown is to reduce the number of rows shipped. Compare the rows column of each Foreign Scan node.
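The remote-metrics step can be sketched as follows. This assumes pg_stat_statements is installed on the remote server and uses the PostgreSQL 13+ column names; the ILIKE filter is a hypothetical way to narrow the output to the tables under test.

```sql
-- Run on the remote server before each benchmark run to start from zero.
SELECT pg_stat_statements_reset();

-- ... run the query under test from the local server ...

-- Then inspect what the remote server actually executed.
SELECT calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       rows,
       left(query, 120) AS remote_sql
FROM pg_stat_statements
WHERE query ILIKE '%user_attribute%'
ORDER BY total_exec_time DESC
LIMIT 10;
```

Because postgres_fdw reads remote results through cursors, expect to see DECLARE ... CURSOR and FETCH entries rather than bare SELECT statements.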

Here's a simple matrix you can use to record your experiments:

Scenario	What you're testing	Expected change in Remote SQL	Metrics to record
Baseline (old query)	Starting point: broad remote scans + local joins	Remote SQL lacks scoping predicates	p50/p95 latency, remote row count, local sort/hash time
Refactor (new query)	Join + filter pushdown	Remote SQL includes joins and filters	Same metrics, plus remote row count
Introduce a volatile function	Pushdown blocker test	Clause removed from Remote SQL	Remote row count increases, local filter cost increases
Type or collation mismatch	Semantic risk test	Remote SQL might change behavior or lose pushdown	Compare correctness and row counts
ORDER/LIMIT pushdown	Version‑dependent test	Remote SQL includes ORDER BY, LIMIT	Sort time shifts to remote. Row count should remain
use_remote_estimate on/off	Planning accuracy test	Planner uses remote estimates	Planning time, join order, and runtime difference

Monitoring and Logging

In production, you need to know when a query starts misbehaving. There are two places to look: the local server and the remote server.

Local metrics

pg_stat_statements. This extension tracks planning and execution times, row counts, and buffer hits for each query. Look for high total times relative to rows or calls.
auto_explain. Set auto_explain.log_min_duration to capture slow queries with their plans. With auto_explain.log_verbose enabled, the logged plan includes the Remote SQL, so you can see what was executed remotely and whether the plan changed.
Connection pool metrics. Monitor connection counts and wait events related to FDW operations (for example, PostgresFdwConnect, PostgresFdwGetResult) as described in the documentation [10].
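For a quick experiment, auto_explain can be enabled for a single session as sketched below. The 250 ms threshold is an arbitrary example value.

```sql
-- Session-level sketch; LOAD normally requires superuser privileges.
-- To cover all sessions, the module is usually added to
-- session_preload_libraries or shared_preload_libraries instead.
LOAD 'auto_explain';

SET auto_explain.log_min_duration = '250ms';  -- hypothetical threshold
SET auto_explain.log_analyze = on;            -- log actual row counts and loops
SET auto_explain.log_verbose = on;            -- include Remote SQL in the logged plan
```

Keep in mind that log_analyze adds timing overhead to every statement in the session, so it is better suited to targeted investigation than to being left on everywhere.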

Remote metrics

pg_stat_statements on the remote server. This lets you see which Remote SQL queries are being executed, how often, and how long they take. Compare these with the Remote SQL strings in your local EXPLAIN plans.
Server logs. Enable log_statement or lower log_min_duration_statement on the remote server to capture long-running remote queries.

Correlating local and remote metrics can reveal patterns such as a new code path causing a surge in remote queries or pushdown failures, leading to heavy remote scans.

Case Study: Refactoring a Keycloak Coverage Query

The theory above may seem abstract until you see it play out in practice. Let's walk through a real example inspired by a Keycloak integration.

The original query calculated coverage: given a list of category IDs, it returned the percentage of users who had attributes mapped to those categories and a JSON array of entity counts. The query used a CTE to build a list of scoped users, then joined it with user attributes, category mappings, and a few other tables.

Symptom

In a test environment with 100K user records, the query averaged 166 ms. This was slower than expected. Running EXPLAIN (ANALYZE, BUFFERS, VERBOSE) showed two foreign scans on the Keycloak database. The first scanned user_entity 416 times (loops = 416). The second pulled all rows from user_attribute where name = 'attributeA' before filtering by tenant and group locally.

Here's a simplified excerpt (numbers are approximate):

Foreign Scan on public.user_entity  (actual time=0.117..0.117 rows=1 loops=416)
  Remote SQL: SELECT id, tenant_id FROM public.user_entity WHERE (enabled AND service_account_client_link IS NULL AND id = $1)
Foreign Scan on public.user_attribute  (actual time=41.267..80.352 rows=80739 loops=1)
  Remote SQL: SELECT value, user_id FROM public.user_attribute WHERE (('attributeA' = name))

The first scan performed a single-row lookup 416 times. The second scan retrieved 80,739 rows because the only condition pushed down was name = 'attributeA'. Tenant and group scoping occurred locally. That meant 80k rows were transferred over the network and then filtered down to about 671 on the local side.

Diagnosis

There were two main issues.

First was the N+1 remote calls on user_entity. The join to user_entity was not pushed down, so the plan executed a remote lookup for each row from user_group_membership. This created 416 remote queries.

Second was the unscoped attribute fetch. Because the WHERE clause included user_entity.tenant_id = tenant.id and keycloak_group.name = 'groupA' in a higher CTE, the FDW could not see those predicates when scanning user_attribute. It therefore fetched all rows with name = 'attributeA' and left the tenant and group filters to the local side.

Refactor

The fix was to inline the tenant and group joins into the user_attribute scan to avoid the nested-loop pattern. The refactored selected_user_attributes CTE looked like this (simplified for readability):

WITH selected_user_attributes AS (
  SELECT DISTINCT ua.user_id, ua.value
  FROM public.user_attribute ua
  JOIN public.user_entity u ON u.id = ua.user_id
  JOIN public.user_group_membership ugm ON ugm.user_id = u.id
  JOIN public.keycloak_group g ON g.id = ugm.group_id
  JOIN public.tenant r ON r.id = u.tenant_id
  WHERE ua.name = 'attributeA'
    AND u.enabled
    AND u.service_account_client_link IS NULL
    AND r.name = 'tenantA'
    AND (g.name = 'groupA' OR g.parent_group = (
         SELECT id FROM public.keycloak_group WHERE name = 'groupA' AND tenant_id = r.id
    ))
)

This single query expresses the same scoping logic that previously lived in separate CTEs. Because all the join conditions are on the same foreign server and use built‑in operators, the FDW can push down the entire join. The new plan looked like this:

Foreign Scan  (actual time=7.840..7.856 rows=671 loops=1)
  Remote SQL: SELECT ua.user_id, ua.value FROM user_attribute ua JOIN user_entity u ON ua.user_id = u.id JOIN user_group_membership ugm ON ugm.user_id = u.id JOIN keycloak_group g ON g.id = ugm.group_id JOIN tenant r ON u.tenant_id = r.id WHERE ua.name = 'attributeA' AND u.enabled AND u.service_account_client_link IS NULL AND r.name = 'tenantA' AND (g.name = 'groupA' OR g.parent_group = $1)

Only one remote query is executed, and it returns 671 rows. Tenant and group scoping occur on the remote server. There is no nested loop or repeated remote scan. The final runtime dropped to about 25 ms.

Why it improved

Fewer rows crossing the network. The old plan fetched 80k attribute rows and filtered them locally. The new plan fetched only the 671 scoped rows.
No repeated remote calls. The old plan executed 416 remote scans of user_entity. The new plan performs one joined remote query.
Less local work. Because the join and filtering happen remotely, the local side no longer hashes or filters large sets.

Key takeaway

If you see a Foreign Scan with a high loops count or a Remote SQL that doesn’t contain your filters and joins, you’re leaving performance on the table. Merging filters and joins into a single remote query (subject to shippability rules) can yield large improvements: roughly 6.6x in this case (166 ms to 25 ms), and potentially more as the unfiltered remote tables grow.

Checklist and Troubleshooting Guide

The following steps summarize how to approach FDW performance tuning:

Inspect the Remote SQL. Always run EXPLAIN (VERBOSE) and look at what is being sent to the remote. If your predicates are missing, the FDW isn't pushing them down.
Check loops. If loops is greater than 1 on a Foreign Scan, you are paying for repeated remote calls. Rewrite the query or reorder the joins to make the foreign scan run once.
Make predicates shippable. Replace volatile functions with constants or parameters. Ensure operators and functions are built‑in or explicitly allow‑listed via the extensions option [2].
Align types and collations. Use the same data types and collations on both sides to avoid semantic mismatches [1].
Push joins to the same server. Consolidate tables on one foreign server if possible. Joins across servers cannot be pushed down [6].
Use use_remote_estimate when planning seems off. Enabling remote estimates can improve join order selection [3].
Tune fetch_size and costs if your queries transfer many rows. A bigger fetch_size reduces the number of round trips; adjusting fdw_startup_cost and fdw_tuple_cost influences the planner [3].
Analyze foreign tables if you rely on local cost estimates. Keep in mind that stats can get stale quickly [3].
Monitor both servers. Use pg_stat_statements on local and remote servers to see how often remote queries run and how long they take.
Test version upgrades. Each major release improves FDW pushdown semantics (for example, aggregates in 10 [7], ORDER/LIMIT in 12 [8]). Retest after upgrading.

Case Study Takeaways

Querying remote data with PostgreSQL’s postgres_fdw can be fast and convenient if you respect the underlying mechanics. Pushdown is the difference between streaming a trickle of relevant rows and hauling an ocean of data across the network. It isn't simply a matter of moving CPU cycles – it changes how much data moves, how many network round trips occur, and how much your local server has to do.

The rules may seem restrictive – use only immutable functions, avoid cross‑server joins, align types and collations – but they exist to preserve correctness while enabling optimization.

By reading EXPLAIN from the bottom up, inspecting the Remote SQL, and understanding the shippability rules, you can spot slow patterns quickly. Armed with tuning knobs like fetch_size and use_remote_estimate, and a willingness to rewrite queries to make joins and filters pushable, you can often achieve dramatic performance gains without touching your hardware.

This case study shows that rewriting a query so that it runs as a single joined remote query reduced runtime from around 166 ms to 25 ms. That sort of improvement is not rare. It’s what happens when you treat FDW queries as distributed queries rather than local queries in disguise.

The next time you debug a slow FDW query, remember this handbook. Check the Remote SQL. Count the loops. Ask yourself: “Am I doing the work close to the data, or am I bringing the data to the work?” Adjust accordingly, and you'll write queries that make the most of Postgres's federated capabilities while keeping your latency in check.

This section closes the case study loop and summarizes exactly what changed in the plan and why it produced a large end-to-end win. The following sections of the handbook turn that single win into a repeatable method: how Postgres determines what is shippable, how to quickly read FDW plans, which operations and versions matter, and how to debug common failure modes that prevent pushdown.

Advanced Operations: A Deeper Dive into Shippability

The previous sections introduced the basic rules around what can be pushed to the remote and why. To really make sense of those rules, you need to see how they play out on the operations you use every day.

This section walks through filters, joins, aggregates, ordering and limits, DISTINCT queries, and window functions in more detail. By the end, you should have a mental map of which operations to trust and which to double‑check when reading your plans.

Filters and simple predicates

WHERE clauses matter more than you think

When you specify WHERE attribute = 'value' on a foreign table, the FDW will happily transmit that predicate to the remote server as long as the comparison uses built‑in types and immutable operators. For example:

WHERE id = 42 is fine
WHERE lower(username) = 'hamdaan' is fine because lower() is a built-in, immutable function
WHERE created_at >= now() - interval '7 days' is not shippable because now() is stable rather than immutable, and the FDW only ships immutable functions

When such a predicate cannot be pushed, the FDW will fetch every row that matches all the shippable predicates and apply the rest locally. That means that a seemingly innocuous call to now() can blow up your network traffic.

The lesson is simple: compute non-immutable values such as the current time up front (for example, in your application) and pass them to the query against the foreign table as constants or parameters.
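One way to try this from psql is a prepared statement, sketched below. The foreign table remote_events and its columns are hypothetical.

```sql
-- remote_events is a hypothetical foreign table with a timestamptz created_at column.
PREPARE recent_events(timestamptz) AS
  SELECT id, created_at
  FROM remote_events
  WHERE created_at >= $1;

-- The argument is evaluated locally at EXECUTE time, so the foreign scan
-- receives a plain value instead of a call to now().
EXPLAIN (VERBOSE)
EXECUTE recent_events(now() - interval '7 days');
```

If the predicate is shipped, the Remote SQL line in the plan should contain the created_at comparison. If it is missing, the filter is still being applied locally.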

Complex expressions are not automatically unsafe

Suppose you have WHERE (status = 'active' AND (age BETWEEN 18 AND 29 OR age > 65)). This entire expression is shippable because it uses built‑in boolean logic, simple comparisons, and immutable operators. The FDW will deparse it into remote SQL and forward it. You only need to worry when one of the subexpressions introduces a function or operator that the FDW doesn’t recognize or cannot safely assume exists on the remote.

A good heuristic is: if you can express your filter using only simple comparisons, boolean logic, and built‑in functions, pushdown should work. When in doubt, check the Remote SQL.

Array and JSON operators

Modern Postgres makes heavy use of array and JSON functions. Many of these, like the array overlap operator && used in the case study, are built‑in and can be shipped. Functions and operators that come from extensions are different: the FDW treats them as unshippable unless their extension is allow‑listed.

If your filter uses one of these, ensure that the extension is available and allow‑listed on the foreign server. Otherwise, the FDW will fetch rows and perform the JSON logic locally. This is rarely what you want when dealing with large JSON columns.

Joins: the good, the bad, and the ugly

Same‑server joins are your friend

If you join multiple foreign tables that are all defined on the same foreign server and user mapping, and if the join condition uses only shippable expressions, then the FDW can generate a single remote join. This is the ideal case.

For example, joining orders and customers on orders.customer_id = customers.id is pushable, as long as both tables reside on the same foreign server. The remote planner will use its own statistics and indexes to plan the join, and the local server will simply iterate through the result. Postgres 9.6 and later support this pattern [6].

Cross‑server joins break pushdown

If you attempt to join two foreign tables that live on different servers (or even on the same remote server but with different user mappings), postgres_fdw will fetch the tables separately and join them locally. This is almost always slower than pushing the join down, because every row that survives each table's own pushed-down filters has to cross the network before the join can discard anything.

The FDW design team chose not to support cross‑server joins because there is no portable way to tell two remote servers to cooperate on a join. Your options are: replicate one table on the other server, materialize the smaller table locally before joining, or restructure the query to filter aggressively on each side before joining locally.

Mixed local/foreign joins are tricky

Joining a local table to a foreign table cannot be pushed down, for straightforward reasons: the remote server has no access to your local data. A common pattern that triggers repeated remote calls looks like this:

SELECT u.id, a.value
FROM users u
LEFT JOIN user_attribute a
  ON a.user_id = u.id AND a.name = 'favorite_color';

If users is a local table and user_attribute is foreign, the plan may use a nested loop: for each local u, it executes a remote lookup in user_attribute to retrieve attributes.

The fix is to flip the query: retrieve all relevant rows from user_attribute in one remote scan, then join them locally. Or, if possible, create a small temporary table on the remote side with your u.id values, perform the join entirely remotely, and then fetch the results.

Join conditions matter

Even when joining two foreign tables on the same server, an unshippable join condition will force the join to be local. For example, a join condition that calls a user-defined function, or an operator from an extension that isn't allow‑listed, is not pushable because the FDW cannot assume it exists or behaves identically on the remote. Pattern-matching conditions such as textcol ILIKE '%foo%' deserve a look in the Remote SQL too, since shippability depends on the types and collations involved.

If you need case‑insensitive equality, compare on a built-in function: lower(textcol) = 'foo'. Similarly, joining on a cast expression (for example, JOIN ON CAST(a.id AS text) = b.text_id) can block pushdown. Define your columns with matching types instead.

Aggregates and grouping

Aggregates are where the data movement story shines. When you can push down a GROUP BY and aggregate functions like COUNT, SUM, AVG, or MAX, you reduce the result set to just the aggregated rows. This can be a difference of several orders of magnitude.

Postgres 10 introduced aggregate pushdown [7]. But not all aggregates are equal:

Simple aggregates such as COUNT(*), SUM(col), AVG(col), MIN(col), and MAX(col) are shippable when applied to shippable expressions. Even COUNT(DISTINCT col) is often shippable, because the remote can deduplicate before counting. The FDW will wrap the aggregate in a remote query and return just the aggregated row.

If you see a GroupAggregate node on the local side, check whether all involved columns and functions are shippable. If they are, ensure that the join conditions above are also pushable.

Filtered aggregates such as COUNT(*) FILTER (WHERE x > 5) or SUM(col) FILTER (WHERE status = 'active') are often pushable. As long as the underlying aggregate and the filter expression are both shippable, the FDW includes the FILTER clause in the remote aggregate.

User‑defined aggregates are rarely pushable. If you have a custom aggregate function, the FDW will not assume that it exists or behaves the same on the remote server. Even if you install the function on both servers, postgres_fdw won't push it unless the function is in an allow‑listed extension.

Grouping sets and rollups are not currently pushable. When you write GROUP BY GROUPING SETS (...) or ROLLUP(...), Postgres will compute the grouping locally even if the underlying scan is remote.

If you need complex rollups, consider performing them in two steps: push down the initial grouping to the remote server to reduce rows, then perform the rollup locally.
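The two-step approach might look like the sketch below. The foreign table remote_sales and its columns are hypothetical, and the pattern as written only works for additive aggregates such as SUM and COUNT.

```sql
-- remote_sales is a hypothetical foreign table (region, product, amount).
SELECT region, product, SUM(total) AS total
FROM (
  -- Inner query: a plain GROUP BY, which is a candidate for aggregate pushdown.
  SELECT region, product, SUM(amount) AS total
  FROM remote_sales
  GROUP BY region, product
) per_group
-- Outer query: the rollup runs locally over the pre-aggregated rows.
GROUP BY ROLLUP (region, product);
```

For an average, carry both a sum and a count through the inner query and divide in the outer one, since averaging per-group averages gives a different result.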

ORDER BY, LIMIT, and DISTINCT

Ordering and limiting rows may seem like purely cosmetic features, but they affect how much data is transferred. If the remote can sort and limit, the local server only receives the top N rows. If it cannot, the local server must sort everything.

Postgres 12 expanded the cases where ORDER BY and LIMIT are pushed down [8]. Here are guidelines:

Single foreign scan with simple sort: If your query selects from one foreign table and sorts by a shippable expression (for example, ORDER BY created_at DESC), the FDW will include ORDER BY in Remote SQL. It will also push down LIMIT and OFFSET. This is ideal because the remote server does the sort and sends only the top rows.
Sort after join: If you sort after joining two foreign tables on the same server, and the join and sort expressions are shippable, the FDW may push both down. But if the sort requires columns from the local side or from a different remote server, the FDW cannot push it down.
Sort after aggregation: Sorting aggregated results is often pushable as long as the aggregate itself is pushable. But when grouping occurs locally, the sort remains local.
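DISTINCT is logically similar to GROUP BY, but the planner handles it separately, so don't assume it is pushed just because the column list is shippable. If the plan shows a Unique or HashAggregate node above the Foreign Scan, deduplication is happening locally; rewriting SELECT DISTINCT a, b as SELECT a, b ... GROUP BY a, b may let the FDW use aggregate pushdown instead. DISTINCT ON has different semantics (it keeps the first row of each group according to ORDER BY) and cannot be rewritten this simply, so expect it to run locally and verify in the Remote SQL.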
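Whether deduplication runs remotely can depend on how the query is written and on the server version, so it is worth comparing the two forms directly. The foreign table remote_events below is hypothetical.

```sql
-- Form 1: DISTINCT. Look for a Unique or HashAggregate node above the Foreign Scan.
EXPLAIN (VERBOSE)
SELECT DISTINCT user_id
FROM remote_events;

-- Form 2: the equivalent GROUP BY. Look for GROUP BY inside the Remote SQL.
EXPLAIN (VERBOSE)
SELECT user_id
FROM remote_events
GROUP BY user_id;
```

Both queries return the same rows, so you can pick whichever form produces the smaller transfer on your version.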

Window functions

Window functions (for example, ROW_NUMBER() OVER (PARTITION BY ...), RANK(), LAG(), LEAD()) rely on ordering and partitioning across rows.

Postgres has not yet taught postgres_fdw how to push window functions. When you see a WindowAgg node in your plan, it’s almost always local. The FDW will fetch the rows, and the local server will sort, partition, and compute the window. If you need to run window functions on remote data, plan on transferring the input rows to the local server, and push filters down first to keep that set small.
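A sketch of that pattern, assuming a hypothetical foreign table remote_orders whose created_at column is timestamptz: the WHERE clause uses only built-in comparisons against constants, so it is a candidate for pushdown, while the window function is computed locally over the reduced set.

```sql
-- remote_orders is a hypothetical foreign table.
SELECT customer_id,
       order_id,
       ROW_NUMBER() OVER (PARTITION BY customer_id
                          ORDER BY created_at DESC) AS rn
FROM remote_orders
WHERE status = 'shipped'
  -- An explicit timestamptz literal avoids a cross-type comparison.
  AND created_at >= TIMESTAMPTZ '2024-01-01 00:00:00+00';
```

In the plan you would expect a WindowAgg node on the local side and the two filters inside the Remote SQL of the Foreign Scan beneath it.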

Version‑specific quirks

The exact pushdown capabilities vary by release. When planning migrations or deciding whether to rely on a pushdown behavior, check the release notes:

9.6: first version to support pushdown of joins and sorts, and remote updates and deletes.
10: introduced aggregate pushdown [7], significantly reducing network use for GROUP BY queries.
11: improved partition pruning and join ordering for foreign tables.
12: expanded ORDER BY and LIMIT pushdown [8].
15: added pushdown for simple CASE expressions and additional built‑in functions.
17: in development at the time of writing; continues to expand shippable constructs. Always test on your target version because subtle improvements can change what the FDW can ship.

Common Anti‑Patterns and How to Avoid Them

Everyone has run into FDW queries that seemed reasonable but turned out to be bottlenecks. Here are a few of the most common mistakes and how to correct them. These examples are deliberately simplified so that you can adapt them to your schema.

Using volatile functions in predicates

Anti‑pattern:

SELECT *
FROM audit_logs
WHERE event_ts >= now() - interval '1 day';

now() is a stable function, not an immutable one, so the FDW refuses to push this predicate. It pulls all rows from audit_logs and filters them locally.

Better:

SELECT *
FROM audit_logs
WHERE event_ts >= $1;

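Compute $1 (a timestamp) in your application or upstream query. Or compute it once in an uncorrelated scalar subquery, which the planner evaluates a single time and passes to the foreign scan as a parameter:

SELECT * FROM audit_logs WHERE event_ts >= (SELECT now() - interval '1 day');

The FDW sees a parameter rather than a function call and can push the predicate. A plain CTE is less reliable for this, because PostgreSQL 12 and later may inline it and reintroduce now() into the predicate, so confirm the result in the Remote SQL.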

Joining local and foreign data first

Anti‑pattern:

SELECT u.email, ua.value
FROM users u
LEFT JOIN user_attribute ua ON u.id = ua.user_id AND ua.name = 'favorite_movie';

This uses a local table (users) to drive a join to a foreign table (user_attribute). If the planner chooses a nested loop and users has 10,000 rows, the FDW issues 10,000 individual remote queries. Each call fetches one or zero rows from user_attribute.

Better:

-- Fetch all favorite movies remotely and join locally
WITH remote_movies AS (
  SELECT ua.user_id, ua.value
  FROM user_attribute ua
  WHERE ua.name = 'favorite_movie'
)
SELECT u.email, rm.value
FROM users u
LEFT JOIN remote_movies rm ON u.id = rm.user_id;

Now the FDW issues one query to fetch all relevant attributes, and the join is done locally in one pass.

Cross‑server joins without materialization

Anti‑pattern:

SELECT *
FROM remote_db1.orders o
JOIN remote_db2.customers c ON o.customer_id = c.id;

This is not pushable because the two tables are on different foreign servers. Postgres will fetch orders and customers separately and join them locally. If orders has 1 million rows and customers has 50,000 rows, you will transfer 1.05 million rows.

Better: Replicate or materialize one side on the other server (or locally) before joining. For example, create a materialized view m_customers on remote_db1 containing just the id and name of the customers you need, then join orders and m_customers on the same server. Alternatively, copy customers into a temporary table on the local server and join there.
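The local-materialization variant can be sketched as follows, reusing the table names from the anti-pattern above. The temporary table name is hypothetical.

```sql
-- Copy only the needed customer columns from the second server into a
-- local temporary table.
CREATE TEMP TABLE tmp_customers AS
SELECT id, name
FROM remote_db2.customers;

-- Give the planner row estimates for the local copy.
ANALYZE tmp_customers;

-- Join the remaining foreign table against the local copy.
SELECT o.*, c.name
FROM remote_db1.orders o
JOIN tmp_customers c ON o.customer_id = c.id;
```

This still transfers the orders rows, so it pays off mainly when you can also add a selective, shippable filter on orders. Materializing on remote_db1 instead keeps the whole join on one server.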

Complex expressions on join keys

Anti‑pattern:

SELECT *
FROM remote_table a
JOIN remote_table b ON CAST(a.key AS text) = b.key_text;

Casting a numeric key to text prevents pushdown. The remote server cannot use indexes and must return both tables. The local server performs the join and cast.

Better: Align your schemas so that the join columns use the same type. If you cannot change the schema, create a computed column on the remote server with the appropriate type and use it in the join.

Ignoring collation and type mismatches

Anti‑pattern:

SELECT *
FROM remote_table
WHERE citext_col = 'abc';

Comparisons on a citext column use operators from the citext extension. Unless citext is listed in the foreign server's extensions option, the FDW treats those operators as non-shippable and keeps the filter local. This appears harmless until you see the plan and realize all rows were fetched.

Better: Install the same extensions and collations on the remote server and allow‑list them in the extensions option, or convert the column to a base type like text on both sides.

Extending Tuning: Calibrating Cost Models

Earlier, we discussed fetch_size, use_remote_estimate, and the cost knobs. This section expands on how to use them strategically.

Balancing fetch size and memory

fetch_size controls how many rows the FDW asks for in each round trip [9]. Think of it as the batch size. The default (100) works well for small result sets. If you expect to retrieve tens of thousands of rows, a higher fetch size reduces the overhead of many network requests. But there are trade‑offs:

Memory consumption: Each foreign scan buffers rows until they are consumed. A huge fetch size (for example, 10,000) may allocate more memory than you expect, especially when multiple scans run concurrently. Monitor memory usage as you increase this setting.
Latency hiding: If network latency is high, overlapping network requests with local processing can hide some latency. But postgres_fdw does not pipeline multiple fetches – it waits for one batch before requesting the next. This means that a larger batch size reduces the number of waits, but cannot overlap them. If you operate across data centers, consider using a connection pooler or caching layer instead of just increasing fetch_size.
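To make the trade-off concrete: the 80,739-row scan from the case study needs 808 fetches at the default fetch_size of 100, but only 81 at a fetch_size of 1000. The statements below are a sketch of how the option could be set; the values are illustrative and foreign_server is the server name used earlier in this handbook.

```sql
-- Server-wide default for every foreign table on this server.
-- Use SET instead of ADD if fetch_size is already defined.
ALTER SERVER foreign_server OPTIONS (ADD fetch_size '1000');

-- Per-table override (hypothetical value) for one large foreign table.
ALTER FOREIGN TABLE user_attribute OPTIONS (ADD fetch_size '5000');
```

A table-level setting takes precedence over the server-level one, which lets you keep a conservative default and raise it only where large result sets are expected.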

Remote estimates vs. local estimates

The planner uses statistics to estimate how many rows each node will produce, which in turn influences join order. When use_remote_estimate is false (the default), the planner guesses based on local stats collected by ANALYZE on the foreign table. This can be wrong if the remote table has a different distribution than the local sample, or if the table has changed since the last ANALYZE.

Setting use_remote_estimate to true instructs the FDW to run EXPLAIN on the remote server during planning to obtain row counts and cost estimates [3]. This can improve join ordering, especially when joining multiple foreign tables or mixing local and foreign tables. The downside is increased planning time because each remote estimate runs an extra query.

In practice:

Enable use_remote_estimate on queries with complex joins where the planner picks obviously wrong join orders. If enabling it improves the plan, consider leaving it on for that server or table.
Use ANALYZE on foreign tables periodically if your remote data is relatively static. This populates local stats and can avoid the overhead of remote estimates.
Don’t enable use_remote_estimate indiscriminately on simple lookups. The cost of the additional remote round trips during planning may outweigh the benefit.
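Because the option can be set per table as well as per server, one reasonable way to experiment is to enable it narrowly first. This is a sketch using names from the case study, not a recommended default.

```sql
-- Enable remote estimates for a single foreign table.
ALTER FOREIGN TABLE user_attribute OPTIONS (ADD use_remote_estimate 'true');

-- Or enable them for every table on the server.
-- Use SET instead of ADD if the option is already defined.
ALTER SERVER foreign_server OPTIONS (ADD use_remote_estimate 'true');

-- Remove the table-level override again if planning time gets worse.
ALTER FOREIGN TABLE user_attribute OPTIONS (DROP use_remote_estimate);
```

Compare the Planning Time line of EXPLAIN (ANALYZE) before and after to see what the extra remote EXPLAIN calls cost.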

Tuning cost parameters

fdw_startup_cost and fdw_tuple_cost control how much the planner thinks it costs to start a foreign scan and fetch each row [3]. If these are too low, the planner may choose a nested loop that generates many small remote calls. If they are too high, the planner might avoid remote scans even when they are efficient.

You can adjust these parameters based on empirical measurement:

Increase fdw_startup_cost to discourage the planner from using nested loops that call the remote table repeatedly. As a starting point, you might size it to reflect the measured cost of one remote round trip, expressed in planner cost units.
Increase fdw_tuple_cost if network bandwidth is limited or expensive. This indicates to the planner that each remote row incurs higher fetch costs than a local row. The planner will prefer plans that filter early on the remote side.

Always adjust these settings gradually and observe the effect on the plan. Keep separate settings per foreign server if network conditions differ.
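A sketch of one adjustment cycle is shown below. The numbers are hypothetical, and the defaults differ between releases, so check the postgres_fdw documentation for your version before choosing values.

```sql
-- Hypothetical values for a high-latency link.
-- Use SET instead of ADD for options that are already defined.
ALTER SERVER foreign_server
  OPTIONS (ADD fdw_startup_cost '200', ADD fdw_tuple_cost '0.05');

-- Re-check the plan afterwards and compare join strategy and loops.
EXPLAIN (VERBOSE)
SELECT user_id, value
FROM user_attribute
WHERE name = 'attributeA';
```

Changing one parameter at a time makes it easier to attribute a plan change to the right setting.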

When to analyze foreign tables

Running ANALYZE on a foreign table collects sample statistics by pulling a subset of rows from the remote server. This helps the planner estimate row counts when use_remote_estimate is off. Index selection itself still happens on the remote server, using the remote planner's own statistics. You should analyze foreign tables when:

The remote table is large and static, and you want accurate local estimates without the overhead of remote estimates.
You have just defined a foreign table, and the default stats are empty.
You changed the extensions allow‑list to enable more pushdown and want the planner to see the effect.

Conversely, if the remote data changes constantly, ANALYZE results will quickly become stale. In that case, rely on use_remote_estimate instead.
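A quick way to see what the local planner currently believes about a foreign table is sketched here, using user_attribute from the case study as the example.

```sql
-- Refresh local statistics for the foreign table.
ANALYZE user_attribute;

-- Inspect the row estimate the local planner will use.
-- relkind 'f' restricts the lookup to foreign tables.
SELECT relname, reltuples
FROM pg_class
WHERE relname = 'user_attribute'
  AND relkind = 'f';
```

If reltuples drifts far from the real remote row count between runs, that is a sign the table changes too quickly for periodic ANALYZE to keep up.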

Further Case Studies and Practical Examples

The Keycloak coverage example is not the only place where pushdown matters. The following scenarios illustrate other patterns you may encounter.

Reporting on a sharded logging system

Imagine you store application logs across multiple shards, each a separate Postgres database. You want to produce a report of the number of error logs per service per day.

A naïve approach might combine the raw rows from all shards and aggregate them in one query:

SELECT shard, service, date_trunc('day', log_time) AS day, COUNT(*)
FROM (
  SELECT 1 AS shard, service, log_time FROM shard1.logs
  UNION ALL
  SELECT 2 AS shard, service, log_time FROM shard2.logs
  ...
) all_logs
GROUP BY shard, service, day;

This approach will fetch all log rows to the local server and aggregate them locally. A better solution is to push the grouping to each shard:

SELECT shard, service, day, sum(count)
FROM (
  SELECT 1 AS shard, service, date_trunc('day', log_time) AS day, COUNT(*) AS count
  FROM shard1.logs
  WHERE log_time >= $1 AND log_time < $2
  GROUP BY service, day
  UNION ALL
  SELECT 2 AS shard, service, date_trunc('day', log_time) AS day, COUNT(*)
  FROM shard2.logs
  WHERE log_time >= $1 AND log_time < $2
  GROUP BY service, day
  ...
) x
GROUP BY shard, service, day;

Here, each foreign server returns a small set of aggregated rows instead of raw logs. The outer aggregation sums across shards. This pattern generalizes: push grouping and filtering to the remote side, then combine locally.
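With more than a handful of shards, writing each UNION ALL branch by hand gets error-prone. The sketch below generates the same query shape from a list of schema names. The helper name buildShardReportSql is hypothetical, and it assumes the schema names come from trusted configuration (they are interpolated directly into the SQL text), with $1 and $2 bound by whatever client runs the query.

```typescript
// Hypothetical helper: builds the "aggregate on each shard, combine locally"
// query for any number of shards. Schema names are assumed to be trusted
// identifiers (for example from config), never user input.
function buildShardReportSql(shardSchemas: string[]): string {
  if (shardSchemas.length === 0) {
    throw new Error("at least one shard schema is required");
  }
  // One aggregated branch per shard, so each remote returns grouped rows only.
  const branches = shardSchemas.map((schema, i) =>
    [
      `  SELECT ${i + 1} AS shard, service, date_trunc('day', log_time) AS day, COUNT(*) AS count`,
      `  FROM ${schema}.logs`,
      `  WHERE log_time >= $1 AND log_time < $2`,
      `  GROUP BY service, day`,
    ].join("\n")
  );
  return [
    "SELECT shard, service, day, sum(count)",
    "FROM (",
    branches.join("\n  UNION ALL\n"),
    ") x",
    "GROUP BY shard, service, day;",
  ].join("\n");
}

console.log(buildShardReportSql(["shard1", "shard2", "shard3"]));
```

Whatever generates the SQL, it is still worth checking the Remote SQL in EXPLAIN (VERBOSE) to confirm that each branch's GROUP BY is actually shipped.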

Combining remote and local data for analytics

Suppose you have a local table users and a remote table orders. You want to compute the average order amount per user segment. A naïve query might look like:

SELECT u.segment, AVG(o.amount)
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.segment;

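Because users is local, this join cannot be pushed down: the planner either pulls every orders row across the network or probes the remote table once per user in a nested loop. The better approach is to aggregate orders remotely by user_id and join on the small result: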

WITH remote_totals AS (
  SELECT user_id, SUM(amount) AS total, COUNT(*) AS n
  FROM orders
  GROUP BY user_id
)
SELECT u.segment, SUM(rt.total) / SUM(rt.n) AS avg_amount  -- sum of totals over sum of counts, not an average of per-user averages
FROM users u
JOIN remote_totals rt ON u.id = rt.user_id
GROUP BY u.segment;

This pushes the heavy aggregation to the remote and transfers only one row per user. The local join then groups by segment. As with other examples, the key is to reduce remote rows before they cross the network.
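One detail to watch when combining partial aggregates such as the per-user total and n: a segment's average order amount is the sum of the totals divided by the sum of the counts. Averaging the per-user averages gives a different answer whenever users have different numbers of orders. The following sketch shows the difference with made-up numbers (the UserTotal shape and the sample rows are illustrative, not from a real system).

```typescript
// Illustrative row shape: one pre-aggregated row per user, as the remote
// side would return it, already joined to the user's segment.
interface UserTotal {
  segment: string;
  total: number; // SUM(amount) for this user
  n: number;     // COUNT(*) for this user
}

// Correct combination: sum the partial sums and counts, divide once.
function averageBySegment(rows: UserTotal[]): Record<string, number> {
  const acc: Record<string, { total: number; n: number }> = {};
  rows.forEach((r) => {
    const cur = acc[r.segment] || { total: 0, n: 0 };
    acc[r.segment] = { total: cur.total + r.total, n: cur.n + r.n };
  });
  const out: Record<string, number> = {};
  Object.keys(acc).forEach((segment) => {
    out[segment] = acc[segment].total / acc[segment].n;
  });
  return out;
}

// Two users in the same segment: one order of 100, and four orders of 25.
const sampleTotals: UserTotal[] = [
  { segment: "pro", total: 100, n: 1 },
  { segment: "pro", total: 100, n: 4 },
];

console.log(averageBySegment(sampleTotals)); // pro: 200 / 5 = 40
// An average of per-user averages would be (100 + 25) / 2 = 62.5 instead.
```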

Avoiding pushdown for correctness

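There are legitimate cases where you should prevent pushdown because of semantic differences. Postgres allows you to do this by adding OFFSET 0 to a subquery or by wrapping the foreign table in a materialized CTE. Since Postgres 12, a plain CTE that is referenced once is inlined, so it needs the MATERIALIZED keyword to act as a fence.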

For example, if a built‑in function behaves differently on the remote due to a version mismatch, you can force local evaluation:

WITH local_eval AS MATERIALIZED (SELECT * FROM remote_table)  -- materialized CTE prevents pushdown
SELECT *
FROM local_eval
WHERE some_complex_expression(local_eval.col) > 0;

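Alternatively, a WHERE clause like random() < 0.1 will not push down because random() is volatile – you don't need to force it. But wrapping the foreign table in a subquery with OFFSET 0 is a simple hack that keeps outer predicates from being pushed into the remote scan:

SELECT * FROM (SELECT * FROM remote_table OFFSET 0) AS t WHERE some_complex_expression(t.col) > 0;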

Knowing how to disable pushdown intentionally helps you debug. If a query returns different results when pushdown occurs, suspect type/collation mismatches or remote session settings [4].

Monitoring, Diagnostics, and Regression Testing

Monitoring doesn't end at counting remote rows. To make pushdown reliable in production, you need to set up mechanisms to detect regressions and gather evidence when performance changes.

Automate EXPLAIN regression tests

In addition to unit tests and integration tests, you can add tests that assert the shape of your plans. For instance, if a mission‑critical report must always push down a WHERE clause, you can write a test that runs EXPLAIN (VERBOSE) and checks that the Remote SQL contains the filter. You might even run EXPLAIN (ANALYZE, VERBOSE) in a test environment, parse loops, and assert that it is 1. When a developer inadvertently adds a non‑immutable function or changes a join, the test will fail. This is akin to snapshot testing for SQL.
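A plan-shape test of this kind can be sketched as follows. It assumes the plan was captured with EXPLAIN (VERBOSE, FORMAT JSON), where each Foreign Scan node carries a "Remote SQL" property and child nodes sit under "Plans". The function names and the sample plan are illustrative: the plan is hand-written for the example, not real output, and fetching it from the database is left to whatever client your test suite already uses.

```typescript
// Minimal shape of a plan node in EXPLAIN (VERBOSE, FORMAT JSON) output.
interface PlanNode {
  "Node Type": string;
  "Remote SQL"?: string;
  Plans?: PlanNode[];
}

// Walk the plan tree and collect every Remote SQL string.
function collectRemoteSql(node: PlanNode): string[] {
  const own: string[] = node["Remote SQL"] !== undefined ? [node["Remote SQL"] as string] : [];
  const children = (node.Plans || []).map((child) => collectRemoteSql(child));
  return children.reduce((all, c) => all.concat(c), own);
}

// Fail unless at least one remote query contains the expected fragment.
function assertPushedDown(explainJson: Array<{ Plan: PlanNode }>, fragment: string): void {
  const remote = collectRemoteSql(explainJson[0].Plan);
  const found = remote.some((sql) => sql.indexOf(fragment) !== -1);
  if (!found) {
    throw new Error(`expected Remote SQL to contain: ${fragment}`);
  }
}

// Hand-written sample plan, for illustration only.
const samplePlan: Array<{ Plan: PlanNode }> = [
  {
    Plan: {
      "Node Type": "Aggregate",
      Plans: [
        {
          "Node Type": "Foreign Scan",
          "Remote SQL": "SELECT id FROM public.orders WHERE ((status = 'paid'))",
        },
      ],
    },
  },
];

assertPushedDown(samplePlan, "status = 'paid'"); // passes: the filter is remote
```

Matching on a small fragment rather than the whole Remote SQL string keeps the test from breaking on harmless differences in how a new Postgres version formats the deparsed query.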

Monitor pg_stat_statements across servers

Enable pg_stat_statements on both the local and remote servers. On the local side, track the total time, planning time, and rows for each FDW query. On the remote side, track which queries are being executed.

Look for outliers: a query whose remote calls spike or whose average remote rows jump from hundreds to thousands. Those are early signs of pushdown failure.
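One way to look for such outliers is to compare two snapshots of pg_stat_statements and flag queries whose rows per call jumped. The sketch below uses only the queryid, calls, and rows columns of that view. The threshold factor, the function names, and the sample numbers are assumptions for illustration.

```typescript
// Subset of pg_stat_statements columns used here.
interface StatementStats {
  queryid: string;
  calls: number;
  rows: number;
}

function rowsPerCall(s: StatementStats): number {
  return s.calls === 0 ? 0 : s.rows / s.calls;
}

// Return queryids whose rows-per-call grew by at least `factor` between
// a baseline snapshot and the current one (factor is an arbitrary default).
function findRowSpikes(
  baseline: StatementStats[],
  current: StatementStats[],
  factor: number = 10
): string[] {
  const before: Record<string, number> = {};
  baseline.forEach((s) => { before[s.queryid] = rowsPerCall(s); });
  return current
    .filter((s) => {
      const prev = before[s.queryid];
      return prev !== undefined && prev > 0 && rowsPerCall(s) >= prev * factor;
    })
    .map((s) => s.queryid);
}

// Made-up snapshots: query "42" went from ~200 to ~5000 rows per call.
const baselineSnapshot: StatementStats[] = [
  { queryid: "42", calls: 100, rows: 20000 },
  { queryid: "77", calls: 50, rows: 500 },
];
const currentSnapshot: StatementStats[] = [
  { queryid: "42", calls: 100, rows: 500000 },
  { queryid: "77", calls: 60, rows: 620 },
];

console.log(findRowSpikes(baselineSnapshot, currentSnapshot)); // flags "42"
```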

Log remote SQL with auto_explain

Setting auto_explain.log_min_duration (for example, to 500ms) causes Postgres to automatically log slow queries with their plans. Combine this with auto_explain.log_verbose = true and auto_explain.log_nested_statements = true to capture remote SQL as well. When a federated query slows down, the log will show you exactly what remote SQL was executed and how often. This is invaluable in production, where you cannot always run EXPLAIN interactively.

Use connection pooling and prepared statements

postgres_fdw caches one connection per user mapping within each local session and reuses it between queries, but you can also use connection pooling at the network level (for example, pgbouncer or pgcat).

Keeping connections warm reduces the startup cost, as captured by fdw_startup_cost. Meanwhile, preparing statements on the remote server (via PREPARE and EXECUTE) can save parse time when the same remote SQL is executed frequently. postgres_fdw can use server‑side prepared statements for parameterized scans.

Regression testing after version upgrades

Every major Postgres release brings improvements to postgres_fdw pushdown semantics. But new releases also change planner heuristics and remote SQL generation. After an upgrade, rerun your key queries with EXPLAIN (VERBOSE), compare the Remote SQL, and benchmark them.

In some cases, a release may push down something previously local, revealing a latent type mismatch or a function difference. In other cases, pushdown may be withheld due to a new rule. Don’t assume that an upgrade automatically improves performance – test it.

Extended Guidelines for Advanced DBAs

To close this handbook, here are consolidated guidelines distilled from the previous sections. They go beyond simple bullet points to capture nuances. Keep them handy for reference or print them out for your team.

Respect the FDW safety model. Immutable functions and built‑in operators are your friends. Anything outside that scope must be explicitly allowed or evaluated locally. Understand which items belong to each category and plan accordingly.
Always read the Remote SQL. Don’t trust your intuition about what is being pushed down. The Remote SQL string is the only source of truth. It indicates whether a predicate, join, sort, or limit operation is occurring remotely. It also shows parameter placeholders (for example, $1) that correspond to values passed from the local plan.
Reduce before you fetch. The network is the highest cost. If the remote can reduce rows through filtering, grouping, or limiting, let it. If it cannot, structure your query to enable it. Avoid queries that require pulling large raw tables and processing them locally.
Beware of join order. The planner sometimes chooses a nested loop with a foreign table as the inner side, resulting in repeated remote calls. Examine loops: if you see a high number, consider rewriting the query or adjusting cost parameters.
Use CTEs strategically. A CTE can isolate remote scans and let you control whether they are materialized once or inlined. Use MATERIALIZED to avoid repeated remote scans when a CTE is referenced multiple times. Use NOT MATERIALIZED to allow optimizations across CTE boundaries.
Instrument, monitor, iterate. Good FDW performance is not a one‑off fix. Monitor queries and plans. Use tests to catch regressions. Adjust tuning knobs and indexes as your data or workload changes. Document your reasoning so others can understand why a particular plan is expected.
Educate your team. Federated queries invite subtle bugs and performance traps. Share the high‑level rules – immutable functions only, cross‑server joins are local, always check remote SQL – so engineers write safer queries by default. A 30‑minute training can save hours of debugging later.

Bringing it All Together

This handbook has covered a lot of ground: from the high‑level principle that pushdown is about data movement, to the nitty‑gritty of join conditions and tuning knobs, to troubleshooting steps and case studies. It is intentionally opinionated and personal: these are the patterns and pitfalls encountered in real systems, not abstract guidelines. By sharing specific examples, I hoped to make the rules memorable and show how they interplay with actual workloads.

The goal is not just to tell you what to do, but to show you how to think and problem solve: review the plan, trace data movement, and determine whether the query is doing the heavy work in the right place.

That thinking process, practiced enough times, becomes second nature. When you write a new query, you'll automatically consider whether your predicates are immutable, whether the join can be shipped, and whether you are about to trigger an N+1 pattern. When you review plans, you'll start from the Foreign Scan nodes and remote SQL, not the top‑level node. When you tune, you'll know which knobs to twist and in which order.

Keep experimenting. Use the examples here as starting points. Try different structures in a test environment and measure the difference. The more you play with pushdown, the more comfortable you'll become with its constraints and superpowers.

If this handbook helps you avoid one performance incident or saves you from shipping a broken query, it has done its job. Enjoy exploring the federated world of Postgres.

References

[1] [2] [3] [4] [5] [6] [9] [10] PostgreSQL: Documentation: 18: F.38. postgres_fdw – access data stored in external PostgreSQL servers (https://www.postgresql.org/docs/current/postgres-fdw.html)

[7] PostgreSQL: Release Notes (https://www.postgresql.org/docs/release/10.0/)

[8] PostgreSQL: Release Notes (https://www.postgresql.org/docs/release/12.0/)

A Game Developer’s Guide to Understanding Screen Resolution

Manish Shivanandhan — Wed, 19 Nov 2025 15:59:38 +0000

Every game developer obsesses over performance, textures, and frame rates, but resolution is the quiet foundation that makes or breaks visual quality.

Whether you are building a pixel-art indie game or a high-fidelity 3D world, understanding how resolution works is essential.

It affects how your art assets scale, how your UI appears, and how your game feels on different screens. Yet, many developers still treat resolution as a simple number instead of a design decision.

Let’s learn what resolution is and why it matters for game developers.

What Resolution Really Means

Resolution defines how many pixels a screen can display horizontally and vertically.

A monitor labelled 1920x1080 has 1920 pixels across and 1080 down, which equals over two million pixels in total. More pixels mean more visual detail but also more rendering work for the GPU.

In game development, that tradeoff is constant. Rendering at higher resolutions improves clarity but reduces frame rates unless your code and assets are optimized.

Many developers solve this by offering resolution scaling options in their games, letting players balance visual quality and performance.

It’s also important to distinguish between screen size and resolution. A 27-inch monitor and a 15-inch laptop can both run at 1080p, but the larger display will have bigger, less dense pixels.

This is where pixel density comes in. High-density displays pack more pixels per inch, creating smoother edges and sharper textures even at the same resolution.

The Evolution of Resolution in Gaming

Games have evolved alongside display technology.

Early consoles ran at 240p, then 480p during the SD era. The jump to HD with 720p and 1080p transformed game visuals. Suddenly, developers had to think about anti-aliasing, texture resolution, and UI scaling in new ways.

Today, 4K and HDR have become the standard for modern consoles and PCs. Developers now design with higher fidelity in mind, baking in lighting systems, shaders, and art pipelines that scale up to Ultra HD.

That’s why testing on different display resolutions isn’t just good practice, it’s critical for consistent player experience.

If you want to see how your game performs on large high-resolution displays, try testing it on a modern TV built for consoles such as the PS5. These screens are optimized for 4K and 120Hz refresh rates, giving you a realistic look at how your game will appear in a living-room setup.

They also help you spot UI scaling issues, frame pacing problems, and HDR color mismatches that might go unnoticed on a typical monitor.

DPI, Scaling, and Texture Clarity

For web developers, DPI mostly affects how images scale. But for game developers, DPI connects directly to texture resolution and how art assets are perceived at different screen sizes.

A sprite that looks crisp on a 1080p monitor might appear tiny or blurry on a 4K display if not properly scaled. Engines like Unity and Unreal handle this with dynamic scaling options, but understanding the underlying math helps.

When your display density doubles, each asset needs four times as many pixels to appear at the same size and sharpness. If you do not plan for this, your carefully crafted textures might look soft or misaligned on higher-resolution displays.
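The "four times as many pixels" figure follows from scaling both dimensions. A small sketch of the arithmetic, using a hypothetical 64x64 sprite and a helper name chosen for this example:

```typescript
interface Size {
  width: number;
  height: number;
}

// Pixel dimensions an asset needs to keep the same physical size and
// sharpness when display density is multiplied by `densityScale`.
function scaleAssetForDensity(base: Size, densityScale: number): Size {
  return {
    width: Math.round(base.width * densityScale),
    height: Math.round(base.height * densityScale),
  };
}

function pixelCount(size: Size): number {
  return size.width * size.height;
}

const baseSprite: Size = { width: 64, height: 64 };
const doubled = scaleAssetForDensity(baseSprite, 2);

console.log(doubled);                                       // 128 x 128
console.log(pixelCount(doubled) / pixelCount(baseSprite));  // 4
```

Because the cost grows with the square of the density scale, a 3x display needs nine times the pixels, which is why texture memory budgets climb so quickly on high-density screens.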

This is why UI systems in modern engines rely on resolution-independent units. In Unity, Canvas Scaler helps ensure your interface looks the same on every device. In Unreal, DPI scaling rules allow developers to maintain consistent HUD layouts. Getting this right means your game remains legible on everything from handhelds to 8K TVs.

Resolution vs Performance

The biggest cost of higher resolution is GPU load. Rendering in 4K means pushing four times as many pixels as 1080p. Without proper optimization, frame rates can drop sharply.

That’s why many AAA games use resolution scaling techniques like temporal upsampling or DLSS. These methods render frames at a lower resolution and then use AI or interpolation to upscale them with little visible loss of clarity.
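To see why rendering at a lower internal resolution helps so much, here is the pixel arithmetic for a render scale setting. The function names are made up for this sketch, and real engines expose their own settings for this.

```typescript
interface Resolution {
  width: number;
  height: number;
}

// Internal render resolution for a given output resolution and render
// scale (1.0 = native, 0.5 = half width and half height).
function internalResolution(output: Resolution, renderScale: number): Resolution {
  return {
    width: Math.round(output.width * renderScale),
    height: Math.round(output.height * renderScale),
  };
}

// Fraction of the output's pixels that are actually shaded each frame.
function shadedPixelFraction(output: Resolution, renderScale: number): number {
  const internal = internalResolution(output, renderScale);
  return (internal.width * internal.height) / (output.width * output.height);
}

const uhd: Resolution = { width: 3840, height: 2160 };
const fullHd: Resolution = { width: 1920, height: 1080 };

console.log(uhd.width * uhd.height);                                    // 8294400
console.log((uhd.width * uhd.height) / (fullHd.width * fullHd.height)); // 4
console.log(internalResolution(uhd, 0.5));                              // 1920 x 1080
console.log(shadedPixelFraction(uhd, 0.5));                             // 0.25
```

A render scale of 0.5 on a 4K output shades the same number of pixels as native 1080p, and the upscaler is responsible for filling in the rest.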

As a developer, you should test your game across multiple resolutions and aspect ratios. This helps ensure your render pipeline, shaders, and assets adapt smoothly. Tools like NVIDIA Nsight or Unreal’s built-in profiler show how resolution affects frame time and GPU usage.

If your game includes video content or cinematic sequences, also remember that video compression behaves differently at higher resolutions. Encoding 4K video requires significantly more bandwidth and storage, which can affect your build size and performance during playback.

Aspect Ratio and Display Diversity

Aspect ratio determines the shape of the display.

Most modern games target 16:9, but 21:9 ultrawide and 32:9 super-ultrawide displays are becoming more popular. Developers must ensure their camera framing and UI layouts adapt accordingly.

When a game is locked to one ratio, black bars or stretching can occur. To fix this, adjust your camera’s field of view dynamically or provide safe viewport settings.
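One common way to adjust field of view for wider screens is to keep the vertical FOV fixed and derive the horizontal FOV from the aspect ratio (often called Hor+ scaling). This is a sketch of the underlying trigonometry, not any engine's API, and the 60 degree vertical FOV is an arbitrary example value.

```typescript
// Horizontal FOV (degrees) for a fixed vertical FOV and a given aspect
// ratio (width / height), using hFov = 2 * atan(tan(vFov / 2) * aspect).
function horizontalFovDeg(verticalFovDeg: number, aspect: number): number {
  const vRad = (verticalFovDeg * Math.PI) / 180;
  const hRad = 2 * Math.atan(Math.tan(vRad / 2) * aspect);
  return (hRad * 180) / Math.PI;
}

const verticalFov = 60; // example value, in degrees

// Wider displays see more of the scene horizontally instead of stretching it.
console.log(horizontalFovDeg(verticalFov, 16 / 9));
console.log(horizontalFovDeg(verticalFov, 21 / 9));
console.log(horizontalFovDeg(verticalFov, 32 / 9));
```

With this approach, players on ultrawide monitors get extra peripheral view while the vertical framing stays the same as on 16:9.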

Engines like Unreal let you script these adjustments easily, while Unity’s Cinemachine system handles FOV scaling automatically.

Even TVs now vary in aspect ratio capabilities, especially with new mini LED and OLED technologies. Testing across multiple ratios ensures your game looks balanced and cinematic on every screen.

The Art of Testing in 4K and HDR

4K and HDR introduce new layers of visual complexity. HDR displays show a wider range of brightness and color depth, which means lighting and textures can look completely different compared to SDR monitors. To handle this, calibrate your color grading pipeline and use tone mapping tools within your engine.

When working with HDR assets, always test your output on real hardware. Emulators and monitors often fail to reproduce true HDR contrast. A proper HDR-certified TV helps you identify overexposure, color clipping, and banding issues before release.

Preparing for Next-Gen Displays

The display industry continues to evolve fast. 8K and high refresh rate panels are already entering mainstream markets.

For developers, this means thinking ahead. Designing scalable rendering systems, supporting dynamic resolution, and maintaining flexible UI layouts are now essential parts of modern game design.

As displays get sharper, player expectations rise too. Textures, shaders, and post-processing all need to support higher levels of detail without compromising performance. By understanding how resolution interacts with your pipeline, you can future-proof your games for years to come.

Conclusion

Resolution is more than a number on a settings menu. It is a design constraint, a performance factor, and a creative opportunity. As a game developer, mastering resolution helps you build experiences that look sharp, play smoothly, and scale across every device.

The next time you polish your textures or fine-tune your rendering settings, remember that every pixel counts. Understanding how resolution, scaling, and density interact will not only make your games more beautiful but also more accessible to every player, whether they’re gaming on a laptop, a monitor, or the living-room TV that brings your visuals to life in stunning detail.

Hope you enjoyed this article. Find me on LinkedIn or visit my website.

How to Optimize a Graphical React Codebase — Optimize d3-zoom and dnd-kit Code

Cedd Burge — Thu, 16 Oct 2025 12:36:57 +0000

Miro and Figma are online collaborative canvas tools that became very popular during the pandemic. Instead of using sticky notes on a physical wall, you can add a virtual post—and an array of other things—to a virtual canvas. This lets teams collaborate virtually in ways that feel familiar from the physical world.

I previously wrote an article showing how to create a Figma/Miro Clone in React and TypeScript. The code in that article was designed to be as easy to understand as possible, and in this article, we’re going to optimize it. The code used DndKit for dragging and dropping, and D3 Zoom for panning and zooming. There were four components (App, Canvas, Draggable and Addable), and about 250 lines of code. You do not need to read the original article to understand this one.

Standard optimizations such as useCallback, memo, and similar made it about twice as fast when dragging, but made no difference for panning and zooming. More creative/intensive optimizations made it about ten times as fast in most cases.

You can see the optimized code on GitHub and there is a live demo on GitHub pages to test out the speed with 100,000 cards.

How to Measure Performance in React Apps
How to Investigate the Performance
How to Optimize Panning and Zooming the Canvas
How to Optimize Dragging Cards Around the Canvas
Results
Summary

How to Measure Performance in React Apps

There are three common ways to measure performance in React Apps:

React Dev Tools profiler
Chrome Dev Tools profiler, especially using custom tracks
Profiler component

These tools are all great, but none of them are quite the right fit in this case. In most codebases, the time spent executing JavaScript code (both our code and that of the React framework) is the primary issue. However, after all your code has run and React has updated the DOM, the browser still has a lot of work to do, such as recalculating styles, layout, painting, and compositing.

In this case, this browser layout and rendering time was significant, and is not accounted for by the React profiling.

You can use custom tracks in the Chrome dev tools profiler, but they are very cumbersome to use.

For us, the JavaScript performance API is the best option, which gives results that are closer to those experienced by the user, and is relatively easy to use.

First, we make a call to performance.mark in the event handler that starts the action, with a string to describe the time point. For example, when starting a zoom or pan operation:

zoomBehavior.on("start", () => {
    performance.mark('zoomingOrPanningStart');
});

Then, in a useEffect hook, we call performance.mark again, and call performance.measure to calculate the time between the two points:

useEffect(() => {
    performance.mark('zoomingOrPanningEnd');
    performance.measure('zoomingOrPanning', 'zoomingOrPanningStart', 'zoomingOrPanningEnd');
});

The React docs state that useEffect usually fires after the browser has painted the updated screen, which is what we want.
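Each call to performance.measure records an entry with a duration in milliseconds, and these entries can be read back with performance.getEntriesByName('zoomingOrPanning'). As a rough sketch of how the measurements might be summarized (the summarizeMeasures helper and the sample durations are made up for illustration):

```typescript
// Only the field we need from a PerformanceEntry.
interface MeasuredEntry {
  duration: number; // milliseconds
}

interface MeasureSummary {
  count: number;
  mean: number;
  max: number;
}

// Summarize a list of measure entries into count, mean, and worst case.
function summarizeMeasures(entries: MeasuredEntry[]): MeasureSummary {
  if (entries.length === 0) {
    return { count: 0, mean: 0, max: 0 };
  }
  const total = entries.reduce((sum, e) => sum + e.duration, 0);
  const max = entries.reduce((m, e) => (e.duration > m ? e.duration : m), 0);
  return { count: entries.length, mean: total / entries.length, max };
}

// Stand-in data; in the browser these would come from
// performance.getEntriesByName('zoomingOrPanning').
const sampleEntries: MeasuredEntry[] = [{ duration: 10 }, { duration: 20 }, { duration: 30 }];

console.log(summarizeMeasures(sampleEntries)); // { count: 3, mean: 20, max: 30 }
```

Looking at the maximum as well as the mean is useful here, because a single slow frame is what a user perceives as a stutter.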

This isn't perfect, and will vary depending on the machine specifications, and what else the machine is doing at the time, but it was good enough to verify which optimizations worked best. It is possible to go further if you need to. For example, using Cypress to automate and profile scenarios, potentially running many times to get a good mean, or using Browserstack to test on a variety of devices.

How to Investigate the Performance

Most of the investigation involved using the React Dev Tools profiler to record profiles of user interactions.

The performance data shows how many commits there were in the profile, and how long each one took, which is a great way to see if there are too many commits.

Each commit displays a flame chart showing which components rendered and why they re-rendered. This makes it much easier to find ways to avoid the re-rendering, and to check that memoization strategies are working as expected. This does have some caveats though. It often says 'The parent component rendered', which is misleading default text for when it doesn’t understand what happened (and is often due to a change in a parent context). It also says things like 'hook 9 changed', which makes it time consuming to work out exactly which hook changed.

The flame chart also shows how long each component took to render. This helps target problem components that we need to focus on.

How to Optimize Panning and Zooming the Canvas

The original Canvas element used the CSS transform translate3d(x, y, k) to pan and zoom the canvas. This works, but it doesn't scale child elements, so when the zoom changes, all the cards on the canvas have to be re-rendered with a new CSS transform for the new zoom level (scale(${canvasTransform.k})).

<div
    className="canvas"
    style={{
        transform: `translate3d(${transform.x}px, ${transform.y}px, ${transform.k}px)`,
        ...
    }}>
    ...
</div>

<div
    className="card"
    style={{
        ...
        transform: `scale(${canvasTransform.k})`,
    }}>
    ...
</div>

I changed this to use translateX(x) translateY(y) scale(k), which has the same effect, but does scale child elements. This way, when the zoom changes, none of the cards will be re-rendered (the style of the card component no longer uses the canvasTransform.k).

<div
    className="canvas"
    style={{
        transform: `translateX(${transform.x}px) translateY(${transform.y}px) scale(${transform.k})`,
        ...
    }}>
    ...
</div>

<div
    className="card">
    ...
</div>

The Canvas still needs to re-render whenever the pan or zoom changes. It is possible to prevent this with useRef, by updating the CSS transform with direct JavaScript DOM manipulation in the d3-zoom event handler. This doesn’t make a significant improvement to the performance though, and is a definite hack, so the trade-off is not worthwhile.

Both zooming and panning get a bit slower when the canvas is zoomed very far out and there are (a lot) more cards visible on the screen, just due to the browser having to render them all. It's still workable at 100,000 cards though. There are things you can do about this. An easy option is limiting the maximum zoom extent. This is a functional change, so potentially something that doesn’t meet requirements, but it is easy to do in d3-zoom using scaleExtent:

zoom().scaleExtent([0.1, 100])

Another option is to create a bitmap for very low zoom levels and render that as a single element. This may be difficult, but it means that there will be no change to the functionality.

How to Optimize Dragging Cards Around the Canvas

Starting a drag

The useDraggable hook from DndKit (used inside a DndContext) causes some re-renders when starting a drag operation.

It is possible to improve this by changing the Draggable component to just have this hook (and the things that use it) and having a DraggableInner component for everything else (inside a memo). This works well for reducing the re-renders, in that the DraggableInner almost never gets re-rendered, and improves the speed of starting a drag operation. However, it was still fairly slow, and the time was all under the DndContext.

A better option is to create a new NonDraggable component that looks exactly like the Draggable component, but does not hook up with DndContext. These cards are shown on the Canvas, and have an onMouseEnter event to swap in the Draggable component for the active card, so that dragging continues to work.

const onMouseEnter = useCallback(() => {
    setHoverCard(card);
}, [card, setHoverCard]); // list what the callback reads, so it never captures a stale card

This works well, and significantly improves the speed when starting a drag operation, but it was still quite slow with large numbers of cards. Nearly nothing was getting re-rendered, but there is still a time cost to using memo, as it needs to check whether each component's props have changed.

To fix this, we create an AllCards component, that contains all the cards on the canvas as NonDraggable components. Because it always renders all the cards, it nearly never needs to be re-rendered, and it is used with memo. So instead of each individual card using a memo (with the associated time cost), there is now just one component using a memo. To make it so that the dragging still works, the active Draggable component is rendered on top, obscuring the NonDraggable component beneath it. There is also a Cover component beneath that, so that when the Draggable component is dragged away, the NonDraggable component underneath remains hidden.

Original code, where each card is a Draggable component:


    {cards.map((card) => (
        <Draggable card={card} key={card.id} canvasTransform={transform} />
    ))}

Optimized code, where the AllCards component renders all the cards as NonDraggable components, and then a Cover and a Draggable component for the active card.


<AllCards cards={cards} setHoverCard={setHoverCard} />
<DndContext ...>
    <Cover card={hoverCard} />
    <Draggable card={hoverCard} canvasTransform={transform} />
</DndContext>

This works very well. With a low number of cards, the speed is about the same, but with a high number of cards, it’s about twenty times faster.

There is now a new potential performance issue with the onMouseEnter event that swaps in the Draggable component for the active card, but this just adds two components to the DOM, and is very quick even with large numbers of cards.

Finishing a drag

Finishing a drag operation is hard to optimize, as the position of a card changes, and that does need to re-render, which means that the AllCards component has to re-render as well.

You can see the original code below. Even when using memo with the Draggable component, the end drag operation still takes 2500ms with 100,000 cards, mostly due to the complexity of the Draggable component and its integration with DndKit.


    {cards.map((card) => (
        <Draggable card={card} key={card.id} canvasTransform={transform} />
    ))}

However, we now use the NonDraggable components, which are all memoized successfully, so only the dragged card is re-rendered. There is still a time cost to the memo comparisons, and this is the slowest part of the solution, but it reduces the time to 500ms with 100,000 cards.

const NonDraggable = memo(...)

// Props arrive as a single object, so they must be destructured.
const AllCards = memo(({ cards, setHoverCard }) => {
    return (
        <>
            {cards.map((card) => (
                <NonDraggable card={card} key={card.id} setHoverCard={setHoverCard} />
            ))}
        </>
    );
});

Results

The base unoptimized version started to get slow between 1000 and 5000 cards. Standard optimizations improved this to around 10,000 cards, and the more intensive optimizations took it to about 100,000 cards. The trade-off is that the code becomes significantly more complicated, which makes it harder to understand and modify, especially for people new to the codebase.

Cards	Version	Pan (ms)	Zoom (ms)	Start drag (ms)	End drag (ms)	Card hover (ms)
1000 cards	Base	3	4	200	50	-
1000 cards	Basic optimization	2	3	200	30	-
1000 cards	Intensive optimization	10	10	7	15	2
5000 cards	Base	20	150	450	200	-
5000 cards	Basic optimization	20	150	200	80	-
5000 cards	Intensive optimization	10	10	25	40	2
10,000 cards	Base	50	300	900	400	-
10,000 cards	Basic optimization	50	300	400	180	-
10,000 cards	Intensive optimization	25	25	50	50	2
50,000 cards	Base	1000	1500	4000	1800	-
50,000 cards	Basic optimization	1000	1500	1900	900	-
50,000 cards	Intensive optimization	150	150	150	250	5
100,000 cards	Base	-	-	-	-	-
100,000 cards	Basic optimization	3000	4500	5000	2500	-
100,000 cards	Intensive optimization	150	250	300	500	15

Summary

It is unusual to display 100,000 or more items on screen in a standard React App, but in a highly graphical codebase, it becomes much more likely.

With these numbers, the browser rendering engine is likely to take a significant amount of time, so it is best to use the performance API to measure performance, instead of the usual React tools.

Standard React optimization strategies do work and improve the situation, but there is a need to go further, by finding ways to avoid renders, and even to avoid too many memo comparisons.

Key Metrics That Can Make or Break Your Startup

Aditya Vikram Kashyap — Thu, 07 Aug 2025 18:21:21 +0000

If you’ve built something worth pitching – something more than a fancy hobby with a login screen – you need to know your numbers. Not "I’ll get back to you" know them. Know them like you know your co-founder's coffee order.

I have seen too many founders who are smart, legit, and ambitious get ghosted by investors simply because they couldn't walk through their unit economics.

It's not personal. It's math.

So here it is: Numbers that will either carry your pitch or quietly kill it, explained by someone who has sat through them time and time again, with examples, and no fluff.

Here’s what we’ll cover:

1. Burn Rate: How Fast Are You Lighting Your Cash on Fire?
2. Cash Runway: How Long Before You Run Out of Cash?
3. CAC (Customer Acquisition Cost): How Much Does it Cost to Convince Someone to Pay You?
4. Customer Lifetime Value (LTV): How Much is One Customer Worth Over Time?
5. Gross Profit Margin: What Do You Actually Keep After Delivering Your Service or Product?
6. Monthly / Annual Recurring Revenue (MRR / ARR)
7. Churn Rate: How Fast Are Your Users Leaving
8. Payback Period: How Long Before You Recover Your CAC?
9. Earnings Before Interest, Taxes, Depreciation, and Amortization (EBITDA)
10. Valuation: What’s Your Company Worth – and What Supports that Number?
Real Talk Before You Close That Tab
- Real Experience
- Why This Matters
Conclusion and Final Thoughts

1. Burn Rate: How Fast Are You Lighting Your Cash on Fire?

Burn rate is the speed at which a startup is spending its cash. Basically, how fast are you consuming your venture capital to cover overhead until you generate positive cash flow from operations? It’s a measure of negative cash flow.

If you’re spending $80K a month to keep the lights on (payroll, AWS, your workspace snacks, and so on), that’s how much cash you’re burning each month. But many startups calculate two different burn rates: gross burn (how much cash you’re spending, ignoring any revenue), and net burn (monthly operating expenses minus any cash you take in each month). Net burn basically measures how fast your cash is shrinking, and it’s often what investors care more about.
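The two burn rates are simple to compute once you have monthly figures. In this sketch the $80K expense figure comes from the example above, while the $30K of monthly revenue is a made-up number for illustration.

```typescript
interface MonthlyCash {
  operatingExpenses: number; // cash going out each month
  cashIn: number;            // cash coming in each month (revenue collected)
}

// Gross burn ignores revenue entirely.
function grossBurn(m: MonthlyCash): number {
  return m.operatingExpenses;
}

// Net burn is how fast the bank balance actually shrinks.
function netBurn(m: MonthlyCash): number {
  return m.operatingExpenses - m.cashIn;
}

// $80K of expenses from the text; $30K of revenue is hypothetical.
const exampleMonth: MonthlyCash = { operatingExpenses: 80000, cashIn: 30000 };

console.log(grossBurn(exampleMonth)); // 80000
console.log(netBurn(exampleMonth));   // 50000
```

For a pre-revenue startup, cashIn is zero and the two numbers are the same.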

Real talk: investors want to know when the plane runs out of fuel before they board. That's what this metric helps them understand – how fast you’re going through the money you have.

At some point, if a company has a high burn rate, it has to reduce structural costs by cutting expenditures on labor, rent, marketing, and/or capital equipment. The burn rate is an important metric for any company, but it's particularly important for startups that aren't yet generating revenue. It tells managers and investors how fast the company is spending its capital.

Watch this video to understand more about Burn Rate.

2. Cash Runway: How Long Before You Run Out of Cash?

Cash runway tells you how long a startup can keep operating at its current burn rate before it runs out of cash. For startups without revenue, you can calculate this by dividing available cash by total monthly expenses. Available cash is defined as the funds that are accessible now or can be accessed at a later time relatively quickly to pay for expenses.

When you’re making this calculation, it’s important to not include any anticipated fundraising and other uncertain sources of capital.

Actively managing cash runway is crucial for startup survival and growth. With a significant percentage of startups failing due to cash shortages, founders need to closely monitor their cash burn rate and runway.

The length of runway needed varies based on factors including the startup’s stage, industry, and milestones. In tighter venture capital markets, startups should plan for longer runways and consider strategies such as increasing revenue, reducing expenses, or raising additional capital. Regularly updating financial models and understanding metrics like the burn multiple can help you make informed decisions to extend your runway and align your growth ambitions with financial stability.

While it’s a simple calculation at face value, a cash runway analysis is nuanced and unique to every startup and can be impacted by a multitude of circumstances.

To calculate this, you just divide your total cash reserves by the amount you’re spending each month. Say you’ve got $250K in the bank and you’re spending $50K/month: 250/50 = 5. So you’ve got 5 months. Not 6. Not “it depends” – 5. That’s your runway.
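
The same arithmetic as a small JavaScript sketch, using the $250K and $50K/month figures from the example. The function name and the zero-burn handling are illustrative choices, not a standard.

```javascript
// Runway in months = cash on hand / monthly burn.
// Returns Infinity when burn is zero or negative (cash is not shrinking).
function runwayMonths(cashOnHand, monthlyBurn) {
  if (monthlyBurn <= 0) return Infinity;
  return cashOnHand / monthlyBurn;
}

console.log(runwayMonths(250000, 50000)); // 5
```

Pass net burn if you have revenue, and leave anticipated fundraising out of the cash figure, as noted earlier.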

Investors ask “If we don’t fund you, how long do you survive?” If you don’t know that answer, you're not fundraising – you’re freelancing with hope.

Here is a video that explains cash runway with real world examples and the thought process behind it.

And here’s an article from JP Morgan breaking down cash runway, its importance, and what you can do to maximize it.

Burn Rate vs Runway

So, let’s just make this super clear: burn rate is simply how much you spend each month to run your operation – that is, your negative cash flow. Runway is how many months there are left before your bank balance reaches zero.

So again, why do these numbers matter?

Because burn rate tells you how quickly you need to find more revenue or funding. Runway tells investors whether you are going to still be around by the time they finish their due diligence.

They are not just numbers. They are your survival clock.

Smart founders utilize these metrics to:

Trim the fat without cutting muscle – know what to focus on and what to let go
Forecast hiring/fundraising deadlines – know the process and prep for it. Numbers don’t lie, but they sure can get you ghosted.
Assure investors you’re not going to come knocking again in 90 days – establishing credibility is key. Make an investor realize it’s not just a hobby; you mean business.

The goal: Extend runway without stalling momentum. Keep the plane in the air, while building a bigger engine.

3. CAC (Customer Acquisition Cost): How Much Does it Cost to Convince Someone to Pay You?

Cost of acquisition refers to the entire cost that a business incurs to obtain a new client or asset. This includes the purchase price, shipping, installation, and marketing costs for the asset acquired. CAC takes into account the total expenditure on all marketing, advertising, and sales for the period, which you then divide by the number of new customers for the period.

In this case, all the upfront costs incurred to purchase a business asset, including equipment or inventory, are part of the cost of acquisition. Cost of acquisition includes:

Purchase price of the item
Costs to ship it to its point of use
Costs to install the item
Costs to get it up and running (in the case of equipment) or ready for sale (in the case of inventory) condition
Marketing and sales team salaries
All sales, consulting, and marketing expenses geared toward winning new customers

Formula:
CAC = (Total Marketing + Sales Expenses) / Number of New Customers Acquired

Say you spent $10K last month across paid ads, content creation, outbound campaigns, and sales team costs. You onboarded 100 new customers, so your CAC = $100.
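
Here is the formula as a minimal JavaScript sketch, using the $10K and 100-customer figures from the example. The function name is just an illustrative choice.

```javascript
// CAC = (total marketing + sales spend) / new customers acquired in the same period.
function customerAcquisitionCost(marketingAndSalesSpend, newCustomers) {
  if (newCustomers <= 0) {
    throw new RangeError('newCustomers must be greater than zero');
  }
  return marketingAndSalesSpend / newCustomers;
}

console.log(customerAcquisitionCost(10000, 100)); // 100
```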

But is that good?

It depends on:

Your pricing model (one-time vs. subscription)
Your margin (how much of that sale do you actually keep?)
Your customer retention (how long do they stick around?)

If you’re selling a $20 product once, a $100 CAC is a non-starter. But if that customer brings in $50/month for 12 months, you’ve got a solid return.

Watch for red flags:

CAC is rising but revenue isn’t
You’re overly reliant on paid ads (especially if organic/referral is flat)
You don’t know CAC by channel (averages hide leaks)
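
To see how an average can hide a leak, here is a small JavaScript sketch that splits CAC by channel. The channel names and per-channel numbers are hypothetical; they were picked so the totals match the $10K and 100 customers from the earlier example.

```javascript
// Hypothetical spend and new-customer counts per channel for one month.
const channels = [
  { name: 'paid_ads', spend: 7000, newCustomers: 35 },
  { name: 'content', spend: 2000, newCustomers: 40 },
  { name: 'outbound', spend: 1000, newCustomers: 25 },
];

const totalSpend = channels.reduce((sum, c) => sum + c.spend, 0);
const totalCustomers = channels.reduce((sum, c) => sum + c.newCustomers, 0);

// Blended CAC across all channels.
const blendedCac = totalSpend / totalCustomers;

// CAC for each channel on its own.
const cacByChannel = Object.fromEntries(
  channels.map((c) => [c.name, c.spend / c.newCustomers])
);

console.log(blendedCac);   // 100
console.log(cacByChannel); // { paid_ads: 200, content: 50, outbound: 40 }
```

In this made-up data the blended CAC looks like $100, but paid ads cost twice that per customer while the other two channels are far cheaper.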

A healthy CAC is one that pays itself back quickly and can be improved over time as you optimize funnels and messaging.

Here is a video that breaks down CAC for you.

4. Customer Lifetime Value (LTV): How Much is One Customer Worth Over Time?

Customer Lifetime Value is the average monetary value of each customer to your business. LTV takes into account how much a unique customer is expected to spend with your business. It’s an important metric so you know how much new customers are worth to your business over their lifespan as a customer.

Let’s say you charge $25/month. The average customer sticks around 12 months.
LTV = $300.

In this case, if your CAC is $80? You’re in the green. But if it’s $350? You’re basically paying people to hang out (and losing money on them).

Now, let’s connect this to CAC.

Say your CAC is $80. You’re doing fine – your LTV is ~4x CAC. That’s what investors want to see.

Rule of thumb: you want your LTV to be at least 3x your CAC. A 1:1 ratio means you’re barely breaking even before operational costs, and the math stops working at scale. So if you can hit a 3:1 ratio, great – and in my experience, your business will be much more appealing if it’s closer to 5:1.
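
The numbers above can be checked with a short JavaScript sketch. It uses the simple LTV from this section (monthly price times average lifetime) and ignores margin and discounting, which is a simplification; the function names are illustrative.

```javascript
// Simple LTV: monthly revenue per customer x average customer lifetime in months.
function lifetimeValue(monthlyRevenuePerCustomer, averageLifetimeMonths) {
  return monthlyRevenuePerCustomer * averageLifetimeMonths;
}

// LTV:CAC ratio. The rule of thumb in this section is 3 or higher.
function ltvToCacRatio(ltv, cac) {
  return ltv / cac;
}

const ltv = lifetimeValue(25, 12);
console.log(ltv);                     // 300
console.log(ltvToCacRatio(ltv, 80));  // 3.75
console.log(ltvToCacRatio(ltv, 350)); // below 1, so each customer loses money
```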

And keep in mind that different models can have different thresholds. For example, a SaaS company with low churn can afford higher CACs, while an e-commerce platform might need faster payback. And marketplaces and freemium models may have lower LTV per user, but they can often more easily offset it with volume.

If you don’t know your LTV or can’t defend it with data, it becomes hard to justify spend – and easy for investors to walk.

If you want to know more, this video walks you through the basics.

And here’s a video that beautifully explains the CAC and LTV relationship.

5. Gross Profit Margin: What Do You Actually Keep After Delivering Your Service or Product?

Gross profit margin shows how much of its revenue a business keeps after paying the direct costs of producing and delivering what it sells (the cost of goods sold), not all of its expenses. It’s usually calculated as a percentage of sales. This specific metric is also referred to as the gross margin ratio.

Companies use gross margin as a measure of how production costs relate to revenue. If a company's gross margin falls because it is making less revenue, it may try to cut labor costs, find cheaper suppliers of materials, or increase prices to increase revenue.

Gross profit margins can also allow a business to measure how efficient a company is, or compare two very differently sized companies that share a common revenue stream or product.

If you sell a subscription for $50/month and it costs you $10/month to host, maintain, and support it, your gross margin is 80%.
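
As a quick check of that 80%, here is a minimal JavaScript sketch using the $50 price and $10 delivery cost from the example. The function name is an illustrative choice.

```javascript
// Gross margin % = (revenue - direct cost of delivery) / revenue * 100.
function grossMarginPercent(revenue, costOfDelivery) {
  if (revenue <= 0) {
    throw new RangeError('revenue must be greater than zero');
  }
  return ((revenue - costOfDelivery) * 100) / revenue;
}

console.log(grossMarginPercent(50, 10)); // 80
```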

Good: SaaS companies often hit 70–90%.
Bad: If you're below 30%, your "scalable" business will collapse under its own weight.

Want to know the conceptual math behind this metric and how it differs from Profit Margin? Here is a fantastic video that easily breaks it down.

6. Monthly / Annual Recurring Revenue (MRR / ARR)

Annual recurring revenue (ARR) is revenue a company expects to see from its product and service offerings, calculated over the course of a year. Companies that sell annual subscriptions like using ARR as a sales metric to track what they anticipate making in a year.

ARR tends to be used if companies sell a product or service in the software as a service (SaaS) space, but it can also be useful in terms of streaming services, cell phone bills, and (almost) anything else with a predictable, recurring charge.

ARR is calculated annually, whereas monthly recurring revenue (MRR) is calculated monthly. MRR is useful in that it shows what’s happening on a month-to-month basis. For example, if you change your price in April, you can see the immediate effects of that change in May. MRR also helps track fluctuations in revenue based on outside factors like holiday shopping seasons and economic conditions.

In a nutshell, Monthly / Annual Recurring Revenue = predictable income.

If you’re pulling $20K/month in subscriptions, that’s $240K ARR. Simple.

What investors care about:

- Is it growing?
  - How fast?
  - And how stable is it?
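
A minimal JavaScript sketch of the ARR math and the "how fast" question. The $20K MRR comes from the example above; the three-month MRR series is hypothetical.

```javascript
// ARR is MRR projected over twelve months.
function annualRecurringRevenue(mrr) {
  return mrr * 12;
}

// Month-over-month growth (as a fraction) between consecutive MRR values.
function monthOverMonthGrowth(mrrSeries) {
  const growth = [];
  for (let i = 1; i < mrrSeries.length; i++) {
    growth.push((mrrSeries[i] - mrrSeries[i - 1]) / mrrSeries[i - 1]);
  }
  return growth;
}

console.log(annualRecurringRevenue(20000)); // 240000

// Hypothetical MRR for three consecutive months: 10% growth each month.
const growthRates = monthOverMonthGrowth([20000, 22000, 24200]);
console.log(growthRates.map((g) => (g * 100).toFixed(1) + '%')); // [ '10.0%', '10.0%' ]
```

A steady series like this one also speaks to stability; large swings between months would show up as very different growth values.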

Here is a founder breaking down the metric and explaining the relationship between MRR/ARR.

7. Churn Rate: How Fast Are Your Users Leaving?

The churn rate, also known as attrition rate, represents the rate at which a customer stops doing business with a company. Customer churn is typically expressed as the percentage of service subscribers that discontinue their service subscriptions within a time frame. Churn can also be expressed as the rate at which employees leave their jobs in a given time.

In order for a business to grow its number of clients, its growth rate (which takes into account new customers) must be higher than its churn rate.

The benefit of calculating a churn rate is that it can clarify how well a business is retaining its customers, which is a measure of the quality of service the business is providing and the usefulness of that service.

When a business can see its churn rate increasing from period to period, this suggests that a critical aspect of how it is running the business might be problematic or flawed.

It could be the result of:

A faulty product(s)
Bad customer service
Costs exceed utility to customers

And so on.

The churn rate will indicate to a business that it needs to learn why its customers are leaving, and where it needs to adjust its business. It’s more expensive to attract new customers than it is to retain them, so reducing the churn rate can save a business resources in the future.

Real talk: Say you had 500 users at the start of the month, and you lost 50 by the end of the month. That’s 10% churn – which is high! Annualize that and…ouch. You're not growing. You're replacing.
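
Here is what "annualize that" looks like as a JavaScript sketch. It assumes the same monthly churn rate repeats for twelve months and compounds, which is a simplification; the function names are illustrative.

```javascript
// Monthly churn = customers lost during the month / customers at the start.
function monthlyChurnRate(customersAtStart, customersLost) {
  return customersLost / customersAtStart;
}

// Share of the starting customers gone after 12 months at a constant monthly churn.
function annualizedChurn(monthlyChurn) {
  return 1 - Math.pow(1 - monthlyChurn, 12);
}

const monthly = monthlyChurnRate(500, 50);
console.log(monthly);                             // 0.1
console.log(annualizedChurn(monthly).toFixed(2)); // "0.72"
```

Under that assumption, roughly 72% of the customers you started the year with are gone by the end of it.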

Make sure you fix this before you fundraise. Or at least explain why churn’s high and what you’re doing to plug the holes.

Here is a video that beautifully explains Churn Rate.

8. Payback Period: How Long Before You Recover Your CAC?

The payback period is a popular tool for determining investment return. People invest money for the purpose of getting it back and generating a positive return on the money they invested. The shorter the payback period, the more beneficial the investment will be.

The payback period does not factor in the time value of money. You can determine it simply by counting the number of years until the principal paid in is returned.

This metric measures how quickly your customer pays you back for the cost of acquiring them. The payback period doesn’t take into account the total profitability of an investment. It’s just concerned with paying the investment back.

There are two common interpretations:

Customer-Level Payback: If your CAC is $250 and your customer pays $50/month, it’ll take 5 months to recover the acquisition cost.
Investment-Level Payback: You spend $100,000 on a new sales hire, tool stack, or feature. You want to know how long it takes for that investment to generate $100,000 in profit.

Both use the same principle: the shorter the payback period, the less cash you need to float your growth.
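
The customer-level case above works out like this in a minimal JavaScript sketch. It uses the $250 CAC and $50/month from the example and, like the metric itself, ignores the time value of money.

```javascript
// Months needed for a customer's payments to cover the cost of acquiring them.
function paybackMonths(cac, monthlyRevenuePerCustomer) {
  if (monthlyRevenuePerCustomer <= 0) return Infinity; // never paid back
  return cac / monthlyRevenuePerCustomer;
}

console.log(paybackMonths(250, 50)); // 5
```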

If you want a target, aim for 6 months for customer-level payback. Closer to 3-6 is ideal. Long payback periods mean you need deep pockets – or exceptional retention – to stay afloat.

Here’s a video where you can learn more.

9. Earnings Before Interest, Taxes, Depreciation, and Amortization (EBITDA)

EBITDA stands for Earnings Before Interest, Taxes, Depreciation, and Amortization. You can think of this as just your company's operating profit if you wanted a very rudimentary way of referring to it.

It’s not flashy. It’s not fun. But it tells investors: “Here’s what we really make once the accounting fog clears.” and “Are we generating real profits from our actual operations?”

EBITDA is what investors look at because it is the best way of comparing apples to apples when considering startups. EBITDA can provide investors a measure of your operational health.

A negative EBITDA for an early stage company isn't going to raise any eyebrows. Just don’t act surprised when someone brings it up. You need to do that math before the pitch. But if you have a growing early stage company that's moving from negative EBITDA to positive? Now you’re getting into grown folks business.

Here’s a video that explains the basics of EBITDA.

And here’s another video that explains how investors look at EBITDA and its value in determining your business’s worth.

10. Valuation: What’s Your Company Worth – and What Supports that Number?

Valuation is a focused exercise that determines the value of an asset, investment, or company. So, how much your company’s worth. And the goal is typically to determine whether that value is a fair value.

Valuations can be conducted in one of two ways:

an absolute valuation, which evaluates a company on its own merits and entirely independently of other factors/companies, or
a relative valuation, which evaluates the company relative to other similar firms, or assets, in the same sector or industry. This determines if the company, or asset, is worth that much relative to others.

Depending on how the analysis and conclusions are reached, there are a variety of methods and techniques used to develop valuations. And as you’d expect, there’s often significant variability between outputs (or valuations) based on the inputs and context.

While valuations are predominantly quantitatively driven, there’s often a significant subjective influence that comes from the assumptions and estimates made along the way. Valuations are also subject to developing situations and events outside of the analysis or the control of the analyst – for example, earnings reports or material news, or economic news – that can result in a change to a valuation stance.

If you’re pre-revenue and you’re saying $30M because a friend raised at that, please stop. Their experience likely has nothing to do with yours.

Valuation = traction + market comps + revenue + momentum + team.

Valuation isn’t just about what you want – it’s about what you can defend.

Startups are typically valued using:

Comparable Analysis (Comps): What similar companies are worth
Discounted Cash Flow (DCF): Projecting future cash and discounting it back
Revenue Multiples: Often 5x–10x for SaaS, but varies wildly
Precedent Transactions: What investors paid in past rounds for similar startups

But that’s the math.

Here’s the messy truth: Valuation = Traction + Team + TAM (total addressable market) + Timing + Storytelling.

Hard factors:

MRR/ARR
Growth rate
Churn
CAC:LTV
Gross margins

Soft factors:

Founding team’s track record
Market momentum
Hype or scarcity

Don’t inflate. Don’t anchor to your friend’s raise. Know your comps. And show why your model is defensible, not just desirable. Inflated numbers make investors run. They don’t correct you – they just ghost you.

There are numerous books written on valuation and each technique could be its own PhD. But my role here is to give you a sneak peek into the metrics.

Here’s a basic video on valuation if you’re interested in a deeper dive.

And here’s a more detailed video course outlining different forms of valuation. Professor Damodaran from New York University is considered to be one of the aces and thought leaders when it comes to valuation. In this video course he explains stepwise and beautifully so you can understand and explore the fascinating world of valuations.

Real Talk Before You Close That Tab

Real Experience

I met a founder once – early days, rough product, but you could tell he actually cared. He wasn't trying to look good. No buzzwords. No "disrupt" talk. Just someone trying to solve something annoying and important.

He walked into the room with a twinkle. Not swagger – just that gentle intensity. We were leaning in.

Then, in the middle of the pitch, someone asked, "So what's your monthly burn?" And I swear to you, he said, "Umm... I think my co-founder has that. I haven't looked in a while."

That was it.

No freak out. No awkward pause. Just... a click. Like a window closing in the background.

The product? Still smart. But the moment? Gone.

Nobody was mad. Nobody laughed. We even said thank you. But nobody followed up.

Why? Because it didn't feel like a business. It felt like a maybe.

Why This Matters

I’ve seen so many versions of that same scene play out. It’s never about charisma. It’s not even about the idea, half the time.

It’s about whether the person asking for money actually knows what they’re building. Not the dream, the mechanics. The guts, the nuts and bolts of the business. The ugly Excel math nobody brags about on Twitter.

Unfortunately, no simple pitch deck will do that part for you. No co-founder can answer those questions on your behalf.

If it’s your vision, own the math. If it’s your company, learn the cost of keeping it alive.

The rest? The logos, the taglines, the “go-to-market” plans?.... All of that’s just packaging.

And you don’t have to be perfect either. You just have to be in it. Eyes open. Numbers in your head.
Because if you’re asking people to believe in what you’re building, you’d better believe in the scaffolding holding it up.

So yeah, know your CAC. Your LTV. Your margins. Your churn. Not to check some box on an investor’s sheet, but to prove to yourself and the investor that the thing you’re spending your life on…has legs. That it can stand. And run.

And maybe, someday, outlast you. Maybe!

Conclusion and Final Thoughts

I hope this was helpful to you, especially if you’re a founder or aspiring founder trying to build the next big thing. While there are many more ratios and concepts, these are the crux of them.

A lot of other complex ratios and valuations are either built using these metrics or refer to them in some way. And each of these metrics could be an article of its own. But I wanted to give you my top 10 rundown so that you could get a head start. Numbers are very much a part of the ideation stage itself, and omitting them from your strategy could prove to be a fatal mistake.

I’ll leave you with one last video on How to Start a Start Up with Michael Seibel (Reddit, YC, Twitch) that I hope you find valuable. It lays out, in a crash course format, the mindset of a founder who has been there and done that. The fun fact is that a lot of the themes he speaks of tie in to the metrics here, directly or indirectly.

I hope this gives you a perspective of being on the other side, evaluating your hard work and passion, and I hope it sets you up for success in your next Investor Review.

I look forward to your thoughts, comments, and feedback. If this was helpful, engaging, and informative, do share it – you never know who may need it, or could benefit from it. I wish you all the very best in your funding rounds.

Until then, keep learning, unlearning, and relearning, folks.

How to Sort Dates Efficiently in JavaScript

Brandon Wozniewicz — Fri, 30 May 2025 13:41:38 +0000

Recently, I was working on a PowerApps Component Framework (PCF) project that required sorting an array of objects by date. The dates were in ISO 8601 format but without a time zone – for example, "2025-05-01T15:00:00.00".

Without much thought, I wrote something similar to:

const sorted = data.sort((a, b) => {
  return new Date(a.date) - new Date(b.date);
})

This worked fine on small datasets. But the array I was sorting had nearly 30,000 objects. On a fast development machine, the performance hit was around 100–150ms – already noticeable when combined with other UI work. When I tested with 4× CPU throttling in the browser, which more closely simulates a lower-end device, the delay increased to nearly 400ms. Throttling like this is a reasonable way to check that your app still performs well for users on slower machines.

Results in browser:

sort_with_date_conversion: 397.955078125 ms

Output with performance throttled by 4x slowdown

In this article, you will learn how to sort dates efficiently in JavaScript. We'll walk through what makes the method above inefficient, as well as a better pattern–especially when dealing with large amounts of data.

Why 400ms Feels Slow
Setting Up Our Experiment
The Cost of Date Conversion
The Lexicographical Superpower of ISO 8601
What If Your Dates Aren't ISO Format?
Key Takeaways

Why 400ms Feels Slow

According to Jakob Nielsen's classic "Usability Engineering" (1993), delays under 100 milliseconds are perceived as instantaneous. Between 100ms and 1,000ms, users start to notice lag – even if it doesn't require UI feedback. In my case, 400ms felt choppy, especially since the PCF component was already handling other tasks. It wasn't going to cut it.

Setting Up Our Experiment

Let's simulate this with a simple experiment that stress tests our sorting. We'll create an array of 100,000 ISO-formatted dates, and we will simulate a 4x performance slowdown in the browser for all scenarios:

// Create an array of 100,000 ISO-format dates
const isoArray = [];
let currentDate = new Date(2023, 9, 1); // October 1, 2023

for (let i = 0; i < 100000; i++) {
  const year = currentDate.getFullYear();
  const month = String(currentDate.getMonth() + 1).padStart(2, '0');
  const day = String(currentDate.getDate()).padStart(2, '0');

  isoArray.push({ date: `${year}-${month}-${day}`, value: i });
  currentDate.setDate(currentDate.getDate() + 1); // advance by one day
}

// Shuffle the array to simulate unsorted input
function shuffle(array) {
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
}

shuffle(isoArray);

The Cost of Date Conversion

Now, let's sort using the new Date() method, where each new date is instantiated directly inside the sort method.

console.time('sort_with_date_conversion');

// Sorting by converting each string to a Date object on every comparison
const sortedByDate = isoArray.sort((a, b) => {
  return new Date(a.date) - new Date(b.date);
});

console.timeEnd('sort_with_date_conversion');

Result in browser:

sort_with_date_conversion: 1629.466796875 ms

Sorting 100,000 dates took almost 2 seconds.

Almost 2 seconds. Ouch.

The Lexicographical Superpower of ISO 8601

Here’s the critical realization: ISO 8601 date strings are already lexicographically sortable. That means we can skip the Date object entirely:

console.time('sort_by_iso_string');

// Compare strings directly — thanks to ISO 8601 format
const sorted = isoArray.sort((a, b) =>
  a.date < b.date ? -1 : a.date > b.date ? 1 : 0 // return 0 for equal dates
);

console.timeEnd('sort_by_iso_string');
console.log(sorted.slice(0, 10));

Output in the console:

sort_by_iso_string: 10.549072265625 ms
[
  { date: '2023-10-01', value: 0 },
  { date: '2023-10-02', value: 1 },
  { date: '2023-10-03', value: 2 },
  { date: '2023-10-04', value: 3 },
  { date: '2023-10-05', value: 4 },
  { date: '2023-10-06', value: 5 },
  { date: '2023-10-07', value: 6 },
  { date: '2023-10-08', value: 7 },
  { date: '2023-10-09', value: 8 },
  { date: '2023-10-10', value: 9 }
]

From 1600ms down to ~10ms. That's a 160x speedup.

Why is this faster? Because using new Date() inside .sort() results in creating two new Date objects per comparison. With 100,000 items and how sort works internally, that's millions of object instantiations. On the other hand, when we sort lexicographically, we are simply sorting strings, which is far less expensive.
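
To put a rough number on "millions", here is a small sketch that counts how often the comparator runs for 100,000 shuffled ISO date strings. The helper name `countComparisons` is made up for this illustration, and the exact count depends on the JavaScript engine and the shuffle, so treat the output as an order of magnitude rather than a fixed figure.

```javascript
// Count how many times the comparator runs when sorting n shuffled ISO date strings.
function countComparisons(n) {
  const dates = [];
  const cursor = new Date(Date.UTC(2023, 9, 1)); // October 1, 2023 (UTC)
  for (let i = 0; i < n; i++) {
    dates.push(cursor.toISOString().slice(0, 10)); // 'YYYY-MM-DD'
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }

  // Fisher-Yates shuffle to simulate unsorted input.
  for (let i = dates.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [dates[i], dates[j]] = [dates[j], dates[i]];
  }

  let comparisons = 0;
  dates.sort((a, b) => {
    comparisons++;
    return a < b ? -1 : a > b ? 1 : 0;
  });
  return comparisons;
}

const comparisons = countComparisons(100000);
console.log(comparisons);     // comparator calls; varies by engine and input order
console.log(comparisons * 2); // Date objects a new Date() comparator would have created
```

A comparison sort on random input needs on the order of n·log₂(n) comparisons, which for 100,000 items is well over a million, and the `new Date()` comparator creates two objects for each one.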

What If Your Dates Aren't ISO Format?

Let's say your dates are in MM/DD/YYYY format. Those strings aren't lexicographically sortable, so you'll need to transform them first.
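
The snippets in this section sort an array called `mdyArray`, which isn't defined in the article. Here is one way it might be built, mirroring the earlier ISO setup; the construction and the `mdyCursor` variable name are assumptions for illustration.

```javascript
// Hypothetical setup: 100,000 objects with zero-padded MM/DD/YYYY date strings.
const mdyArray = [];
const mdyCursor = new Date(2023, 9, 1); // October 1, 2023

for (let i = 0; i < 100000; i++) {
  const month = String(mdyCursor.getMonth() + 1).padStart(2, '0');
  const day = String(mdyCursor.getDate()).padStart(2, '0');
  const year = mdyCursor.getFullYear();

  mdyArray.push({ date: `${month}/${day}/${year}`, value: i });
  mdyCursor.setDate(mdyCursor.getDate() + 1); // advance by one day
}

// Shuffle in place (Fisher-Yates) to simulate unsorted input.
for (let i = mdyArray.length - 1; i > 0; i--) {
  const j = Math.floor(Math.random() * (i + 1));
  [mdyArray[i], mdyArray[j]] = [mdyArray[j], mdyArray[i]];
}

// Why plain string comparison fails here: the month comes first,
// so a 2024 date can sort before a 2023 date.
console.log('01/15/2024' < '12/31/2023'); // true
```

The zero padding matters: the conversion to `YYYY-MM-DD` only produces sortable strings if month and day are always two digits.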

Transform then Sort

console.time('sort_with_iso_conversion_first');

const sortedByISO = mdyArray
  .map((item) => { // First convert to ISO format
    const [month, day, year] = item.date.split('/');
    return { date: `${year}-${month}-${day}`, value: item.value };
  })
  .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0)); // then sort

console.timeEnd('sort_with_iso_conversion_first');

Output:

sort_with_iso_conversion_first: 58.8779296875 ms

Still perceived as instantaneous.

Retaining Original Objects

If you want to keep your original objects (with non-ISO dates), you can use tuples:

console.time('sort_and_preserve_original');

// Create tuples: [sortableDate, originalObject]
const sortedWithOriginal = mdyArray
  .map((item) => {
    const [month, day, year] = item.date.split('/');
    return [`${year}-${month}-${day}`, item]; // return the tuple items
  })
  .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0)) // sort based on the first item
  .map(([, item]) => item); // Return the original object

console.timeEnd('sort_and_preserve_original');

Output:

sort_and_preserve_original: 73.733154296875 ms

Still within the boundaries of being perceived as instantaneous.

The original data is preserved and the performance falls well within what is perceived as instantaneous.

Key Takeaways

Avoid object creation inside .sort(), especially for large arrays.
ISO 8601 strings are lexicographically sortable. Use string comparison when you can.
If your date strings aren't sortable, map them to a sortable form first, sort, and optionally map them back.
Minor tweaks in sorting can yield massive performance gains – especially in UI components or real-time visualizations.

Found this helpful? I write about practical automation, productivity systems, and building smarter workflows — without the jargon. Visit me at brandonwoz.com.

The Front-End Performance Optimization Handbook – Tips and Strategies for Devs

Gordan Tan — Wed, 07 May 2025 13:21:37 +0000

When you’re building a website, you’ll want it to be responsive, fast, and efficient. This means making sure the site loads quickly, runs smoothly, and provides a seamless experience for your users, among other things.

So as you build, you’ll want to keep various performance optimizations in mind – like reducing file size, making fewer server requests, optimizing images in various ways, and so on.

But performance optimization is a double-edged sword, with both good and bad aspects. The good side is that it can improve website performance, while the bad side is that it's complicated to configure, and there are many rules to follow.

Also, some performance optimization rules aren't suitable for all scenarios and should be used with caution. So make sure you approach this handbook with a critical eye. In it, I’ll lay out a bunch of ways you can optimize your website’s performance, and share insights to help you choose which of these techniques to use.

I’ll also provide the references for these optimization suggestions after each one and at the end of the article.

Reduce HTTP Requests
Use HTTP2
Use Server-Side Rendering
Use a CDN for Static Resources
Place CSS in the Head and JavaScript Files at the Bottom
Use Font Icons (iconfont) Instead of Image Icons
Make Good Use of Caching, Avoid Reloading the Same Resources
Compress Files
Image Optimization
- Lazy Loading Images
- Responsive Images
- Adjust Image Size
- Reduce Image Quality
- Use CSS3 Effects Instead of Images When Possible
- Use webp Format Images
Load Code on Demand Through Webpack, Extract Third-Party Libraries, Reduce Redundant Code When Converting ES6 to ES5
Reduce Reflows and Repaints
Use Event Delegation
Pay Attention to Program Locality
if-else vs switch
Lookup Tables
Avoid Page Stuttering
Use requestAnimationFrame to Implement Visual Changes
Use Web Workers
Use Bitwise Operations
Don't Override Native Methods
Reduce the Complexity of CSS Selectors
Use Flexbox Instead of Earlier Layout Models
Use Transform and Opacity Properties to Implement Animations
Use Rules Reasonably, Avoid Over-Optimization
Other References
Conclusion

1. Reduce HTTP Requests

A complete HTTP request needs to go through DNS lookup, TCP handshake, browser sending the HTTP request, server receiving the request, server processing the request and sending back a response, browser receiving the response, and other processes. Let's look at a specific example to understand how HTTP works:

This is an HTTP request, and the file size is 28.4KB.

Terminology explained:

Queueing: Time spent in the request queue.
Stalled: The time difference between when the TCP connection is established and when data can actually be transmitted, including proxy negotiation time.
Proxy negotiation: Time spent negotiating with the proxy server.
DNS Lookup: Time spent performing DNS lookup. Each different domain on a page requires a DNS lookup.
Initial Connection / Connecting: Time spent establishing a connection, including TCP handshake/retry and SSL negotiation.
SSL: Time spent completing the SSL handshake.
Request sent: Time spent sending the network request, usually a millisecond.
Waiting (TTFB): TTFB (time to first byte) is the time from when the page request is made until the first byte of response data is received.
Content Download: Time spent receiving the response data.

From this example, we can see that the actual data download time accounts for only 13.05 / 204.16 = 6.39% of the total. The smaller the file, the smaller this ratio – and the larger the file, the higher the ratio. This is why it's recommended to combine multiple small files into one large file, which reduces the number of HTTP requests.
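
The ratio above can be reproduced with a small JavaScript sketch. It takes any object shaped like a browser `PerformanceResourceTiming` entry; the `example` object is filled in from the numbers quoted above, with `responseStart` derived by subtraction rather than read from the original screenshot.

```javascript
// Share of a request's total time spent downloading the response body.
// `entry` is any object shaped like a PerformanceResourceTiming entry:
// responseStart and responseEnd bracket the download, duration is the total time.
function downloadSharePercent(entry) {
  const downloadMs = entry.responseEnd - entry.responseStart;
  return (downloadMs / entry.duration) * 100;
}

// From the example: 13.05 ms of download within a 204.16 ms request.
// responseStart is an assumed value (204.16 - 13.05).
const example = { responseStart: 191.11, responseEnd: 204.16, duration: 204.16 };
console.log(downloadSharePercent(example).toFixed(2)); // "6.39"

// In a browser, real entries come from the Resource Timing API:
// performance.getEntriesByType('resource').forEach((e) =>
//   console.log(e.name, downloadSharePercent(e).toFixed(2) + '%'));
```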

How to combine multiple files

There are several techniques to reduce the number of HTTP requests by combining files:

1. Bundle JavaScript files with Webpack

// webpack.config.js
const path = require('path'); // Node's built-in path module, needed for path.resolve below
module.exports = {
  entry: './src/index.js',
  output: {
    filename: 'bundle.js',
    path: path.resolve(__dirname, 'dist'),
  },
};

This will combine all JavaScript files imported in your entry point into a single bundle.

2. Combine CSS files
Using CSS preprocessors like Sass:

/* main.scss */
@import 'reset';
@import 'variables';
@import 'typography';
@import 'layout';
@import 'components';

Then compile to a single CSS file:

sass main.scss:main.css

Reference:

Resource_timing

2. Use HTTP2

Compared to HTTP1.1, HTTP2 has several advantages:

Faster parsing

When parsing HTTP1.1 requests, the server must continuously read bytes until it encounters the CRLF delimiter. Parsing HTTP2 requests isn't as complicated because HTTP2 is a frame-based protocol, and each frame has a field indicating its length.
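
To illustrate why a length field makes parsing simpler, here is a minimal sketch that reads the fixed 9-byte HTTP/2 frame header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit, and a 31-bit stream identifier, as laid out in the HTTP/2 specification). The function name and sample bytes are illustrative; this is not a complete frame parser.

```javascript
// Parse the fixed 9-byte header that starts every HTTP/2 frame.
function parseFrameHeader(bytes) {
  if (bytes.length < 9) {
    throw new RangeError('an HTTP/2 frame header is 9 bytes long');
  }
  // Bytes 0-2: payload length (24-bit, big-endian).
  const length = (bytes[0] << 16) | (bytes[1] << 8) | bytes[2];
  // Byte 3: frame type. Byte 4: flags.
  const type = bytes[3];
  const flags = bytes[4];
  // Bytes 5-8: stream identifier. The top bit is reserved, so mask it off.
  const streamId =
    (((bytes[5] & 0x7f) << 24) | (bytes[6] << 16) | (bytes[7] << 8) | bytes[8]) >>> 0;
  return { length, type, flags, streamId };
}

// Sample bytes: a HEADERS frame (type 0x1) with END_HEADERS (0x4),
// a 16-byte payload, on stream 1.
const frameHeader = parseFrameHeader(
  Uint8Array.from([0x00, 0x00, 0x10, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01])
);
console.log(frameHeader); // { length: 16, type: 1, flags: 4, streamId: 1 }
```

Because `length` is known up front, the receiver can read exactly that many payload bytes and move on to the next frame, with no scanning for a delimiter.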

Multiplexing

With HTTP1.1, if you want to make multiple requests simultaneously, you need to establish multiple TCP connections because one TCP connection can only handle one HTTP1.1 request at a time.

In HTTP2, multiple requests can share a single TCP connection, which is called multiplexing. Each request and response is represented by a stream with a unique stream ID to identify it.
Multiple requests and responses can be sent out of order within the TCP connection and then reassembled at the destination using the stream ID.
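
Here is a simplified sketch of that reassembly step. Frames are modelled as plain objects with a `streamId` and a string `data` chunk, which is an assumption made for illustration; real frames carry binary payloads and more metadata.

```javascript
// Group interleaved frames back into per-stream messages using the stream ID.
function reassembleByStream(frames) {
  const streams = new Map();
  for (const { streamId, data } of frames) {
    streams.set(streamId, (streams.get(streamId) ?? '') + data);
  }
  return streams;
}

// Hypothetical frames from two responses interleaved on one connection.
const interleaved = [
  { streamId: 1, data: '<html>' },
  { streamId: 3, data: 'body{' },
  { streamId: 1, data: '</html>' },
  { streamId: 3, data: '}' },
];

const streams = reassembleByStream(interleaved);
console.log(streams.get(1)); // <html></html>
console.log(streams.get(3)); // body{}
```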

Header compression

HTTP2 provides header compression functionality.

For example, consider the following two requests:

:authority: unpkg.zhimg.com
:method: GET
:path: /za-js-sdk@2.16.0/dist/zap.js
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
cache-control: no-cache
pragma: no-cache
referer: https://www.zhihu.com/
sec-fetch-dest: script
sec-fetch-mode: no-cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36

:authority: zz.bdstatic.com
:method: GET
:path: /linksubmit/push.js
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
cache-control: no-cache
pragma: no-cache
referer: https://www.zhihu.com/
sec-fetch-dest: script
sec-fetch-mode: no-cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36

From the two requests above, you can see that a lot of data is repeated. If we could store the same headers and only send the differences between them, we could save a lot of bandwidth and speed up the request time.

HTTP/2 uses "header tables" on the client and server sides to track and store previously sent key-value pairs, and for identical data, it's no longer sent through each request and response.

Here's a simplified example. Suppose the client sends the following header requests in sequence:

Header1:foo
Header2:bar
Header3:bat

When the client sends a request, it creates a table based on the header values:

Index	Header Name	Value
62	Header1	foo
63	Header2	bar
64	Header3	bat

When the server receives the request, it creates the same table.
When the client sends the next request, if the headers are the same, it can send a header block like this:

62 63 64

The server will look up the previously established table and restore these numbers to the complete headers they correspond to.
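
The table lookup described above can be sketched in a few lines. This is a toy model that follows the simplified table in this article (indexes start at 62 and grow as headers are added). It is not the real HPACK encoding, and the class and function names are made up for illustration.

```javascript
// A toy header table: both sides append entries in the same order,
// so an index means the same header on the client and the server.
class HeaderTable {
  constructor(firstIndex = 62) {
    this.firstIndex = firstIndex; // 62 matches the simplified table in the article
    this.entries = [];            // each entry is "name:value"
  }

  // Return the index of a header, adding it to the table if it is new.
  indexOf(name, value) {
    const key = `${name}:${value}`;
    let pos = this.entries.indexOf(key);
    const known = pos !== -1;
    if (!known) {
      this.entries.push(key);
      pos = this.entries.length - 1;
    }
    return { index: this.firstIndex + pos, known };
  }

  lookup(index) {
    return this.entries[index - this.firstIndex];
  }
}

// Sender: emit the full header the first time, only its index afterwards.
function encode(table, headers) {
  return Object.entries(headers).map(([name, value]) => {
    const { index, known } = table.indexOf(name, value);
    return known ? index : `${name}:${value}`;
  });
}

// Receiver: store full headers as they arrive, expand indexes from the table.
function decode(table, block) {
  return block.map(item => {
    if (typeof item === 'number') return table.lookup(item);
    const sep = item.indexOf(':');
    table.indexOf(item.slice(0, sep), item.slice(sep + 1));
    return item;
  });
}

const client = new HeaderTable();
const server = new HeaderTable();
const headers = { Header1: 'foo', Header2: 'bar', Header3: 'bat' };

const first = encode(client, headers);  // full headers on the first request
decode(server, first);                  // the server builds the same table
const second = encode(client, headers); // [62, 63, 64]
console.log(second, decode(server, second));
```

The second request carries three small numbers instead of three full headers, which is where the bandwidth saving comes from.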

Priority

HTTP2 can set a higher priority for more urgent requests, and the server can prioritize handling them after receiving such requests.

Flow control

Since the bandwidth of a TCP connection (depending on the network bandwidth from client to server) is fixed, when there are multiple concurrent requests, if one request occupies more traffic, another request will occupy less. Flow control can precisely control the flow of different streams.

Server push

A powerful new feature added in HTTP2 is that the server can send multiple responses to a single client request. In other words, in addition to responding to the initial request, the server can also push additional resources to the client without the client explicitly requesting them.

For example, when a browser requests a website, in addition to returning the HTML page, the server can also proactively push resources based on the URLs of resources in the HTML page.

Many websites have already started using HTTP2, such as Zhihu:

In the browser's network panel, "h2" in the protocol column refers to the HTTP/2 protocol, and "http/1.1" refers to the HTTP/1.1 protocol.

3. Use Server-Side Rendering

In client-side rendering, you get the HTML file, download JavaScript files as needed, run the files, generate the DOM, and then render.

And in server-side rendering, the server returns the HTML file, and the client only needs to parse the HTML.

Pros: Faster first-screen rendering, better SEO.
Cons: Complicated configuration, increases the computational load on the server.

Below, I'll use Vue SSR as an example to briefly describe the SSR process.

Client-side rendering process

Visit a client-rendered website.
The server returns an HTML file containing resource import statements and <div id="app"></div>.
The client requests resources from the server via HTTP, and when the necessary resources are loaded, it executes new Vue() to instantiate and render the page.

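Example of client-side rendered app (Vue):

<!DOCTYPE html>
<html>
<head>
  <title>Client-side Rendering Example</title>
</head>
<body>
  <!-- Initially empty container -->
  <div id="app"></div>
  <!-- JavaScript bundle that renders the content -->
  <script src="bundle.js"></script>
</body>
</html>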

// main.js (compiled into bundle.js)
import Vue from 'vue';
import App from './App.vue';

// Client-side rendering happens here - after JS loads and executes
new Vue({
  render: h => h(App)
}).$mount('#app');

// App.vue

Server-side rendering process

Visit a server-rendered website.
The server checks which resource files the current route component needs, then fills the content of these files into the HTML file. If there are AJAX requests, it will execute them for data pre-fetching and fill them into the HTML file, and finally return this HTML page.
When the client receives this HTML page, it can start rendering the page immediately. At the same time, the page also loads resources, and when the necessary resources are fully loaded, it begins to execute new Vue() to instantiate and take over the page.

Example of server-side rendered app (Vue):

// server.js
const express = require('express');
const server = express();
const { createBundleRenderer } = require('vue-server-renderer');

// Create a renderer based on the server bundle
const renderer = createBundleRenderer('./dist/vue-ssr-server-bundle.json', {
  template: require('fs').readFileSync('./index.template.html', 'utf-8'),
  clientManifest: require('./dist/vue-ssr-client-manifest.json')
});

// Handle all routes with the same renderer
server.get('*', (req, res) => {
  const context = { url: req.url };

  // Render our Vue app to a string
  renderer.renderToString(context, (err, html) => {
    if (err) {
      // Handle error
      res.status(500).end('Server Error');
      return;
    }
    // Send the rendered HTML to the client
    res.end(html);
  });
});

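server.listen(8080);

<!-- index.template.html -->
<!DOCTYPE html>
<html>
<head>
  <title>Server-side Rendering Example</title>
</head>
<body>
  <!-- The rendered app markup is injected at this outlet -->
  <!--vue-ssr-outlet-->
</body>
</html>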

// entry-server.js
import { createApp } from './app';

export default context => {
  return new Promise((resolve, reject) => {
    const { app, router } = createApp();

    // Set server-side router's location
    router.push(context.url);

    // Wait until router has resolved possible async components and hooks
    router.onReady(() => {
      const matchedComponents = router.getMatchedComponents();

      // No matched routes, reject with 404
      if (!matchedComponents.length) {
        return reject({ code: 404 });
      }

      // The Promise resolves to the app instance
      resolve(app);
    }, reject);
  });
}

From the two processes above, you can see that the difference lies in the second step. A client-rendered website will directly return the HTML file, while a server-rendered website will render the page completely before returning this HTML file.

What's the benefit of doing this? It's a faster time-to-content.

Suppose your website needs to load four files (a, b, c, d) to render completely. And each file is 1 MB in size.

Calculated this way, a client-rendered website must load four files plus an HTML file to render the home page, about 4 MB in total (ignoring the size of the HTML file). A server-rendered website only needs to load one fully rendered HTML file, which usually isn't large, generally a few hundred KB (my personal blog, which uses SSR, loads a 400 KB HTML file). This is why server-side rendering shows content faster.

4. Use a CDN for Static Resources

A Content Delivery Network (CDN) is a set of web servers distributed across multiple geographic locations. We all know that the further the server is from the user, the higher the latency. CDNs are designed to solve this problem by deploying servers in multiple locations, bringing users closer to servers, thereby shortening request times.

CDN Principles

When a user visits a website without a CDN, the process is as follows:

The browser needs to resolve the domain name into an IP address, so it makes a request to the local DNS.
The local DNS makes successive requests to the root server, top-level domain server, and authoritative server to get the IP address of the website's server.
The local DNS sends the IP address back to the browser, and the browser makes a request to the website server's IP address and receives the resources.

If the user is visiting a website that has deployed a CDN, the process is as follows:

The browser needs to resolve the domain name into an IP address, so it makes a request to the local DNS.
The local DNS makes successive requests to the root server, top-level domain server, and authoritative server to get the IP address of the Global Server Load Balancing (GSLB) system.
The local DNS then makes a request to the GSLB. The main function of the GSLB is to determine the user's location based on the local DNS's IP address, filter out the closest local Server Load Balancing (SLB) system to the user, and return the IP address of that SLB to the local DNS.
The local DNS sends the SLB's IP address back to the browser, and the browser makes a request to the SLB.
The SLB selects the optimal cache server based on the resource and address requested by the browser and sends it back to the browser.
The browser then redirects to the cache server based on the address returned by the SLB.
If the cache server has the resource the browser needs, it sends the resource back to the browser. If not, it requests the resource from the source server, sends it to the browser, and caches it locally.
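
The last step, where the cache server either answers from its own copy or goes back to the source server, can be sketched as follows. The CacheServer class and the fetchFromOrigin callback are hypothetical names used only for this illustration. A real CDN node also handles expiry, validation, and eviction, which are left out here.

```javascript
// Toy edge cache: serve from memory when possible, otherwise fetch from the
// origin once and keep a copy. fetchFromOrigin is a hypothetical callback.
class CacheServer {
  constructor(fetchFromOrigin) {
    this.fetchFromOrigin = fetchFromOrigin;
    this.store = new Map();
  }

  get(url) {
    if (this.store.has(url)) {
      return { body: this.store.get(url), hit: true };
    }
    const body = this.fetchFromOrigin(url); // cache miss: ask the source server
    this.store.set(url, body);              // keep a local copy for next time
    return { body, hit: false };
  }
}

let originRequests = 0;
const edge = new CacheServer(url => {
  originRequests++;
  return `content of ${url}`;
});

console.log(edge.get('/logo.png').hit); // false: fetched from the origin
console.log(edge.get('/logo.png').hit); // true: served from the cache
console.log(originRequests);            // 1
```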

5. Place CSS in the Head and JavaScript Files at the Bottom

CSS execution blocks rendering and prevents JS execution
JS loading and execution block HTML parsing and prevent CSSOM construction

If these CSS and JS tags are placed in the head and take a long time to load and parse, the page stays blank in the meantime. So you should place JS files at the bottom of the body (where they no longer block DOM parsing, though they can still block rendering), so that HTML parsing is completed before the JS files are loaded. This presents the page content to the user as early as possible.

So then you might be wondering – why should CSS files still be placed in the head?

Because loading HTML first and then loading CSS will make users see an unstyled, "ugly" page at first glance. To avoid this situation, place CSS files in the head.

You can also place JS files in the head as long as the script tag has the defer attribute, which means asynchronous download and delayed execution.

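Here's an example of optimal placement:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Optimized Resource Loading</title>

  <!-- CSS in the head -->
  <link rel="stylesheet" href="styles.css">

  <!-- Critical JS with defer (file name is illustrative) -->
  <script defer src="critical.js"></script>
</head>
<body>
  <header>
    <h1>My Website</h1>
    <!-- Navigation -->
  </header>

  <main>
    <p>Content that users need to see quickly...</p>
  </main>

  <!-- Non-critical JS before the closing body tag (file name is illustrative) -->
  <script src="non-critical.js"></script>
</body>
</html>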

Explanation of this approach:

CSS in the <head>: Ensures the page is styled as soon as it renders, preventing the "flash of unstyled content" (FOUC). CSS is render-blocking, but that's actually what we want in this case.
Critical JS with defer: The defer attribute tells the browser to:
- Download the script in parallel while parsing HTML
- Only execute the script after HTML parsing is complete but before the DOMContentLoaded event
- Maintain the order of execution if there are multiple deferred scripts
Non-critical JS before closing </body>: Scripts without special attributes will:
- Block HTML parsing while they download and execute
- By placing them at the bottom, we ensure that all the important content is parsed and displayed first
- This improves perceived performance even if the total load time is the same

You can also add the async attribute to the script tag (<script async src="...">) for scripts that don't depend on the DOM or on other scripts.

The async attribute will download the script in parallel and execute it as soon as it's available, which may interrupt HTML parsing. Use this only for scripts that don't modify the DOM or depend on other scripts.

Reference:

Adding Interactivity with JavaScript

6. Use Font Icons (iconfont) Instead of Image Icons

A font icon is an icon made into a font. When using it, it's just like a font, and you can set attributes such as font-size, color, and so on, which is very convenient. Font icons are also vector graphics and won't lose clarity. Another advantage is that the generated files are particularly small.

Compress Font Files

Use the fontmin-webpack plugin to compress font files (thanks to Frontend Xiaowei for providing this).

7. Make Good Use of Caching, Avoid Reloading the Same Resources

To prevent users from having to request files every time they visit a website, we can control caching with the Expires header or the max-age directive of Cache-Control. Expires sets an absolute time, and until that time the browser won't request the file but will use the cached copy directly. max-age is a relative time in seconds, and it's recommended to use max-age instead of Expires.

But this creates a problem: what happens when the file is updated? How do we notify the browser to request the file again?

This can be done by updating the resource link addresses referenced in the page, making the browser actively abandon the cache and load new resources.

The specific approach is to tie each resource's URL to the content of the file, so that the URL changes only when the file content changes. This achieves precise, file-level cache control.

So what is related to file content? We naturally think of using digest algorithms to derive digest information for the file. The digest information corresponds one-to-one with the file content, providing a basis for cache control that's precise to the granularity of individual files.
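
To make the idea concrete, here is a minimal sketch that derives a file name from a digest of the file's content, using Node's built-in crypto module. The hashedName helper and the choice of SHA-256 truncated to 8 hex characters are assumptions for illustration. Build tools such as Webpack do this for you.

```javascript
const crypto = require('crypto');

// Build a file name that embeds a short digest of the file's content.
// hashedName is a hypothetical helper, not part of any build tool.
function hashedName(name, ext, content) {
  const digest = crypto
    .createHash('sha256')
    .update(content)
    .digest('hex')
    .slice(0, 8); // a short prefix is enough for cache busting
  return `${name}.${digest}.${ext}`;
}

const v1 = hashedName('main', 'js', 'console.log("hello")');
const v1Again = hashedName('main', 'js', 'console.log("hello")');
const v2 = hashedName('main', 'js', 'console.log("hello, world")');

console.log(v1 === v1Again); // true: same content, same URL, cache stays valid
console.log(v1 === v2);      // false: changed content, new URL, cache is bypassed
```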

How to implement caching and cache-busting:

1. Server-side cache headers (using Express.js as an example):

const express = require('express');
const app = express();

// Set cache control headers for static resources
app.use('/static', express.static('public', {
  maxAge: '1y', // Cache for 1 year
  etag: true,   // Use ETag for validation
  lastModified: true // Use Last-Modified for validation
}));

// For HTML files that shouldn't be cached as long
app.get('/*.html', (req, res) => {
  res.set({
    'Cache-Control': 'public, max-age=300', // Cache for 5 minutes
    'Expires': new Date(Date.now() + 300000).toUTCString()
  });
  // Send HTML content
});

2. Using content hashes in filenames (Webpack configuration):

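// webpack.config.js
const path = require('path');
const MiniCssExtractPlugin = require('mini-css-extract-plugin');
const HtmlWebpackPlugin = require('html-webpack-plugin');

module.exports = {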
  output: {
    filename: '[name].[contenthash].js', // Uses content hash in filename
    path: path.resolve(__dirname, 'dist'),
  },
  plugins: [
    // Extract CSS into separate files with content hash
    new MiniCssExtractPlugin({
      filename: '[name].[contenthash].css'
    }),
    // Generate HTML with correct hashed filenames
    new HtmlWebpackPlugin({
      template: 'src/index.html'
    })
  ]
};

This will produce output files like:

main.8e0d62a10c151dad4f8e.js
styles.f4e3a77c616562b26ca1.css

When you change the content of a file, its hash will change, forcing the browser to download the new file instead of using the cached version.

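3. Example of generated HTML with cache-busting:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Cache Busting Example</title>
  <!-- Note the content hash in the file name -->
  <link rel="stylesheet" href="/static/styles.f4e3a77c616562b26ca1.css">
</head>
<body>
  <div id="app"></div>
  <script src="/static/main.8e0d62a10c151dad4f8e.js"></script>
</body>
</html>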

4. Version query parameters (simpler but less effective approach):

<link rel="stylesheet" href="styles.css?v=1.2.3">

When updating files, manually change the version number to force a new download.

References:

webpack-caching

8. Compress Files

Compressing files can reduce file download time, providing a better user experience.

Thanks to the development of Webpack and Node, file compression is now very convenient.

In Webpack, you can use the following plugins for compression:

JavaScript: UglifyPlugin
CSS: MiniCssExtractPlugin
HTML: HtmlWebpackPlugin

In fact, we can do even better by using gzip compression. The browser signals support by listing gzip in the Accept-Encoding request header. Of course, the server must also support this feature and compress its responses.

Gzip is currently the most popular and effective compression method. For example, the app.js file generated after building a project I developed with Vue has a size of 1.4MB, but after gzip compression, it's only 573KB, reducing the volume by nearly 60%.
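
To get a feel for the effect, the sketch below compresses a block of repetitive text with Node's built-in zlib module and compares sizes. The sample input is made up and far more repetitive than a real bundle, so the ratio it prints is not representative of what you'll see in a project.

```javascript
const zlib = require('zlib');

// Repetitive text, loosely similar in spirit to a JavaScript bundle.
const source = 'function add(a, b) { return a + b; }\n'.repeat(1000);
const original = Buffer.from(source);
const compressed = zlib.gzipSync(original);

const saved = 1 - compressed.length / original.length;
console.log(`${original.length} bytes -> ${compressed.length} bytes`);
console.log(`saved ${(saved * 100).toFixed(1)}%`);

// Decompressing gives back exactly the same bytes.
const restored = zlib.gunzipSync(compressed);
console.log(restored.equals(original)); // true
```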

Here are the methods for configuring gzip in webpack and node.

Download plugins

npm install compression-webpack-plugin --save-dev
npm install compression

Webpack configuration

const CompressionPlugin = require('compression-webpack-plugin');

module.exports = {
  plugins: [new CompressionPlugin()],
}

Node configuration

const express = require('express')
const compression = require('compression')

const app = express()
// Use before other middleware
app.use(compression())

9. Image Optimization

1. Lazy Loading Images

In a page, don't initially set the path for images – only load the actual image when it appears in the browser's viewport. This is lazy loading. For websites with many images, loading all images at once can have a significant impact on user experience, so image lazy loading is necessary.

First, set up the images like this, where images won't load when they're not visible in the page:

<img data-src="https://avatars0.githubusercontent.com/u/22117876?s=460&u=7bd8f32788df6988833da6bd155c3cfbebc68006&v=4">

When the image becomes visible in the viewport, use JS to load it:

const img = document.querySelector('img')
img.src = img.dataset.src

This is how the image gets loaded. For the complete code, please refer to the reference materials.
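
As a rough sketch of the missing piece, deciding when an image is visible, the code below checks each image's bounding box against the viewport height and copies data-src into src for the visible ones. The helper names are assumptions for this example, and the scroll-based wiring at the end is only one option (browsers also offer IntersectionObserver for this).

```javascript
// Returns true when any part of the element's box is inside the viewport.
function isInViewport(rect, viewportHeight) {
  return rect.top < viewportHeight && rect.bottom > 0;
}

// Copies data-src into src for every image that is currently visible and
// not loaded yet. Returns the number of images it started loading.
function loadVisibleImages(images, viewportHeight) {
  let started = 0;
  for (const img of images) {
    if (!img.dataset.src) continue; // already loaded, or nothing to load
    if (isInViewport(img.getBoundingClientRect(), viewportHeight)) {
      img.src = img.dataset.src;
      delete img.dataset.src; // removes the data-src attribute
      started++;
    }
  }
  return started;
}

// Browser wiring (skipped when this file runs outside a browser).
if (typeof window !== 'undefined') {
  const check = () =>
    loadVisibleImages(document.querySelectorAll('img[data-src]'), window.innerHeight);
  window.addEventListener('load', check);
  window.addEventListener('scroll', check);
}
```

In production you would normally throttle the scroll handler, since it can fire many times per second.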

Reference:

Lazy loading images for the web

2. Responsive Images

The advantage of responsive images is that browsers can automatically load appropriate images based on screen size.

Implementation through picture:


<picture>
    <source srcset="banner_w1000.jpg" media="(min-width: 801px)">
    <source srcset="banner_w800.jpg" media="(max-width: 800px)">
    <img src="banner_w800.jpg" alt="">
</picture>

Implementation through @media:

@media (min-width: 769px) {
    .bg {
        background-image: url(bg1080.jpg);
    }
}
@media (max-width: 768px) {
    .bg {
        background-image: url(bg768.jpg);
    }
}

3. Adjust Image Size

For example, suppose you have a 1920 × 1080 image that you show to users as a thumbnail, displaying the full image only when they hover over it. If users never hover over the thumbnail, the time spent downloading the full image is wasted.

So we can optimize this with two images. Initially, only load the thumbnail, and when users hover over the image, then load the large image. Another approach is to lazy load the large image, manually changing the src of the large image to download it after all elements have loaded.

Example implementation of image size optimization:


<div class="image-container">
  <img class="thumbnail" src="thumbnail-small.jpg" alt="Small thumbnail">
  <img class="full-size" data-src="image-large.jpg" alt="Full-size image">
</div>

/* CSS for the container and images */
.image-container {
  position: relative;
  width: 200px;
  height: 150px;
  overflow: hidden;
}

.thumbnail {
  width: 100%;
  height: 100%;
  object-fit: cover;
  display: block;
}

.full-size {
  display: none;
  position: absolute;
  top: 0;
  left: 0;
  z-index: 2;
  max-width: 600px;
  max-height: 400px;
}

/* Show full size on hover */
.image-container:hover .full-size {
  display: block;
}

// JavaScript to lazy load the full-size image
document.addEventListener('DOMContentLoaded', () => {
  const containers = document.querySelectorAll('.image-container');

  containers.forEach(container => {
    const thumbnail = container.querySelector('.thumbnail');
    const fullSize = container.querySelector('.full-size');

    // Load the full-size image when the user hovers over the thumbnail
    container.addEventListener('mouseenter', () => {
      if (!fullSize.src && fullSize.dataset.src) {
        fullSize.src = fullSize.dataset.src;
      }
    });

    // Alternative: Load the full-size image after the page loads completely
    /*
    window.addEventListener('load', () => {
      setTimeout(() => {
        if (!fullSize.src && fullSize.dataset.src) {
          fullSize.src = fullSize.dataset.src;
        }
      }, 1000); // Delay loading by 1 second after window load
    });
    */
  });
});

This implementation:

Shows only the thumbnail initially
Loads the full-size image only when the user hovers over the thumbnail
Provides an alternative approach to load all full-size images with a delay after page load

4. Reduce Image Quality

For example, with JPG format images, there's usually no noticeable difference between 100% quality and 90% quality, especially when used as background images. When cutting background images in Adobe Photoshop, I often cut the image into JPG format and compress it to 60% quality, and basically can't see any difference.

There are two compression methods: one is through the Webpack plugin image-webpack-loader, and the other is through online compression websites.

Here's how to use the Webpack plugin image-webpack-loader:

npm i -D image-webpack-loader

Webpack configuration:

{
  test: /\.(png|jpe?g|gif|svg)(\?.*)?$/,
  use: [
    {
      loader: 'url-loader',
      options: {
        limit: 10000, /* Images smaller than 10000 bytes will be automatically converted to base64 code references */
        name: utils.assetsPath('img/[name].[hash:7].[ext]')
      }
    },
    /* Compress images */
    {
      loader: 'image-webpack-loader',
      options: {
        bypassOnDebug: true,
      }
    }
  ]
}

5. Use CSS3 Effects Instead of Images When Possible

Many images can be drawn with CSS effects (gradients, shadows, and so on). In these cases, CSS3 effects are the better choice, because the code is usually a fraction of the image's size, sometimes as little as a tenth.

Reference:

Asset Management

6. Use WebP Format Images

WebP's advantage is reflected in its better image data compression algorithm, which brings smaller image volume while maintaining image quality that's indistinguishable to the naked eye. It also has lossless and lossy compression modes, Alpha transparency, and animation features. Its conversion effects on JPEG and PNG are quite excellent, stable, and uniform.

Example of implementing WebP with fallbacks:



<picture>
  <source srcset="image.webp" type="image/webp">
  <source srcset="image.jpg" type="image/jpeg">
  <img src="image.jpg" alt="Description of the image">
</picture>

Server-side WebP detection and serving:

// Express.js example
const express = require('express');
const path = require('path');
const app = express();

app.get('/images/:imageName', (req, res) => {
  const supportsWebP = req.headers.accept && req.headers.accept.includes('image/webp');
  const imagePath = supportsWebP 
    ? `public/images/${req.params.imageName}.webp` 
    : `public/images/${req.params.imageName}.jpg`;

  res.sendFile(path.resolve(__dirname, imagePath));
});

Reference:

WebP

10. Load Code on Demand Through Webpack, Extract Third-Party Libraries, Reduce Redundant Code When Converting ES6 to ES5

The following quote from the official Webpack documentation explains the concept of lazy loading:

"Lazy, or "on demand", loading is a great way to optimize your site or application. This practice essentially involves splitting your code at logical breakpoints, and then loading it once the user has done something that requires, or will require, a new block of code. This speeds up the initial load of the application and lightens its overall weight as some blocks may never even be loaded." Source: Lazy Loading

Note: While image lazy loading (discussed in section 9.1) delays the loading of image resources until they're visible in the viewport, code lazy loading splits JavaScript bundles and loads code fragments only when they're needed for specific functionality. They both improve initial load time, but they work at different levels of resource optimization.

Generate File Names Based on File Content, Combined with Dynamic import() of Components to Achieve On-Demand Loading

This requirement can be achieved by configuring the filename property of output. One of the value options in the filename property is [contenthash], which creates a unique hash based on file content. When the file content changes, [contenthash] also changes.

output: {
    filename: '[name].[contenthash].js',
    chunkFilename: '[name].[contenthash].js',
    path: path.resolve(__dirname, '../dist'),
},

Example of code lazy loading in a Vue application:

// Instead of importing synchronously like this:
// import UserProfile from './components/UserProfile.vue'

// Use dynamic import for route components:
const UserProfile = () => import('./components/UserProfile.vue')

// Then use it in your routes
const router = new VueRouter({
  routes: [
    { path: '/user/:id', component: UserProfile }
  ]
})

This ensures the UserProfile component is only loaded when a user navigates to that route, not on initial page load.

Extract Third-Party Libraries

Since imported third-party libraries are generally stable and don't change frequently, extracting them into a separate chunk that can be cached long-term is a better choice. This requires using the cacheGroups option of webpack 4's SplitChunksPlugin (optimization.splitChunks).

optimization: {
    runtimeChunk: {
        name: 'manifest' // Split webpack's runtime code into a separate chunk.
    },
    splitChunks: {
        cacheGroups: {
            vendor: {
                name: 'chunk-vendors',
                test: /[\\/]node_modules[\\/]/,
                priority: -10,
                chunks: 'initial'
            },
            common: {
                name: 'chunk-common',
                minChunks: 2,
                priority: -20,
                chunks: 'initial',
                reuseExistingChunk: true
            }
        },
    }
},

test: Controls which modules this cache group matches. If omitted, all modules are selected. Accepted value types: RegExp, String, and Function.
priority: The extraction weight; a higher number means a higher priority. Since a module might satisfy the conditions of several cacheGroups, it is assigned to the group with the highest priority.
reuseExistingChunk: Whether to reuse existing chunks. If true and the current chunk contains modules that have already been split out, the existing chunk is reused instead of generating a new one.
minChunks (default is 1): The minimum number of chunks that must share a module before it is split out (note: with the default of 1, a module doesn't need to be referenced multiple times to be split).
chunks (default is async): Which chunks are considered for splitting. Possible values are initial, async, and all.
name (name of the packaged chunks): String or function (functions can customize names based on conditions).
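
The interaction between test and priority can be modeled in a few lines. This is a deliberately simplified, hypothetical model: it ignores minChunks, chunks, and everything else Webpack considers, and pickCacheGroup is not a Webpack API. It only shows that when a module matches several groups, the group with the highest priority wins.

```javascript
// Simplified stand-in for the cacheGroups configuration above.
const cacheGroups = {
  vendor: { test: /[\\/]node_modules[\\/]/, priority: -10 },
  common: { priority: -20 }, // no test: matches every module
};

// Keep the groups whose test matches, then take the highest priority.
function pickCacheGroup(modulePath, groups) {
  let best = null;
  for (const [name, group] of Object.entries(groups)) {
    const matches = !group.test || group.test.test(modulePath);
    if (matches && (best === null || group.priority > groups[best].priority)) {
      best = name;
    }
  }
  return best;
}

console.log(pickCacheGroup('/project/node_modules/vue/dist/vue.js', cacheGroups)); // vendor
console.log(pickCacheGroup('/project/src/utils/date.js', cacheGroups));            // common
```

A module under node_modules matches both groups, but vendor has the higher priority (-10 versus -20), so it ends up in chunk-vendors.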

Reduce Redundant Code When Converting ES6 to ES5

To achieve the same functionality as the original code after Babel conversion, some helper functions are needed. For example this:

class Person {}

will be converted to this:

"use strict";

function _classCallCheck(instance, Constructor) {
  if (!(instance instanceof Constructor)) {
    throw new TypeError("Cannot call a class as a function");
  }
}

var Person = function Person() {
  _classCallCheck(this, Person);
};

Here, _classCallCheck is a helper function. If classes are declared in many files, then many such helper functions will be generated.

The @babel/runtime package contains all the helper functions that are needed, and the role of @babel/plugin-transform-runtime is to make every file that needs a helper import it from @babel/runtime instead of defining its own copy:

"use strict";

var _classCallCheck2 = require("@babel/runtime/helpers/classCallCheck");

var _classCallCheck3 = _interopRequireDefault(_classCallCheck2);

function _interopRequireDefault(obj) {
  return obj && obj.__esModule ? obj : { default: obj };
}

var Person = function Person() {
  (0, _classCallCheck3.default)(this, Person);
};

Here, the helper function classCallCheck is no longer defined inline in the compiled file. Instead, the file references helpers/classCallCheck from @babel/runtime.

Installation:

npm i -D @babel/plugin-transform-runtime @babel/runtime

Usage:
In the .babelrc file:

{
  "plugins": [
    "@babel/plugin-transform-runtime"
  ]
}

11. Reduce Reflows and Repaints

Browser Rendering Process

Parse HTML to generate DOM tree.
Parse CSS to generate CSSOM rules tree.
Combine DOM tree and CSSOM rules tree to generate rendering tree.
Traverse the rendering tree to begin layout, calculating the position and size information of each node.
Paint each node of the rendering tree to the screen.

Reflow

When the position or size of DOM elements is changed, the browser needs to regenerate the rendering tree, a process called reflow.

Repaint

After regenerating the rendering tree, each node of the rendering tree needs to be painted to the screen, a process called repaint. Not all actions will cause reflow – for example, changing font color will only cause repaint. Remember, reflow will cause repaint, but repaint will not cause reflow.

Both reflow and repaint operations are very expensive because the JavaScript engine thread and the GUI rendering thread are mutually exclusive, and only one can work at a time.

What operations will cause reflow?

Adding or removing visible DOM elements
Element position changes
Element size changes
Content changes
Browser window size changes

How to reduce reflows and repaints?

When modifying styles with JavaScript, it's best not to write styles directly, but to replace classes to change styles.
If you need to perform a series of operations on a DOM element, you can take the DOM element out of the document flow, make the modifications, and then bring it back into the document. It's recommended to use hidden elements (display:none) or document fragments (DocumentFragment), both of which implement this approach well.

Example of causing unnecessary reflows (inefficient):

// This causes multiple reflows as each style change triggers a reflow
const element = document.getElementById('myElement');
element.style.width = '100px';
element.style.height = '200px';
element.style.margin = '10px';
element.style.padding = '20px';
element.style.borderRadius = '5px';

Optimized version 1 – using CSS classes:

/* style.css */
.my-modified-element {
  width: 100px;
  height: 200px;
  margin: 10px;
  padding: 20px;
  border-radius: 5px;
}

// Only one reflow happens when the class is added
document.getElementById('myElement').classList.add('my-modified-element');

Optimized version 2 – batching style changes:

// Batching style changes using cssText
const element = document.getElementById('myElement');
element.style.cssText = 'width: 100px; height: 200px; margin: 10px; padding: 20px; border-radius: 5px;';

Optimized version 3 – using document fragments (for multiple elements):

// Instead of adding elements one by one
const list = document.getElementById('myList');
const fragment = document.createDocumentFragment();

for (let i = 0; i < 100; i++) {
  const item = document.createElement('li');
  item.textContent = `Item ${i}`;
  fragment.appendChild(item);
}

// Only one reflow happens when the fragment is appended
list.appendChild(fragment);

Optimized version 4 – take element out of flow, modify, then reinsert:

// Remove from DOM, make changes, then reinsert
const element = document.getElementById('myElement');
const parent = element.parentNode;
const nextSibling = element.nextSibling;

// Remove (causes one reflow)
parent.removeChild(element);

// Make multiple changes (no reflows while detached)
element.style.width = '100px';
element.style.height = '200px';
element.style.margin = '10px';
element.style.padding = '20px';
element.style.borderRadius = '5px';

// Reinsert (causes one more reflow)
if (nextSibling) {
  parent.insertBefore(element, nextSibling);
} else {
  parent.appendChild(element);
}

Optimized version 5 – using display:none temporarily:

const element = document.getElementById('myElement');

// Hide element (one reflow)
element.style.display = 'none';

// Make multiple changes (no reflows while hidden)
element.style.width = '100px';
element.style.height = '200px';
element.style.margin = '10px';
element.style.padding = '20px';
element.style.borderRadius = '5px';

// Show element again (one more reflow)
element.style.display = 'block';

By using these optimization techniques, you can significantly reduce the number of reflows and repaints, leading to smoother performance, especially for animations and dynamic content updates.

12. Use Event Delegation

Event delegation takes advantage of event bubbling, allowing you to specify a single event handler to manage all events of a particular type. All events that use buttons (most mouse events and keyboard events) are suitable for the event delegation technique. Using event delegation can save memory.


<ul>
  <li>Apple</li>
  <li>Banana</li>
  <li>Pineapple</li>
</ul>


// good
document.querySelector('ul').onclick = (event) => {
  const target = event.target
  if (target.nodeName === 'LI') {
    console.log(target.innerHTML)
  }
}

// bad
document.querySelectorAll('li').forEach((e) => {
  e.onclick = function() {
    console.log(this.innerHTML)
  }
})

13. Pay Attention to Program Locality

A well-written computer program often has good locality – it tends to reference data items near recently referenced data items or the recently referenced data items themselves. This tendency is known as the principle of locality. Programs with good locality run faster than those with poor locality.

Locality usually takes two different forms:

Temporal locality: In a program with good temporal locality, memory locations that have been referenced once are likely to be referenced multiple times in the near future.
Spatial locality: In a program with good spatial locality, if a memory location has been referenced once, the program is likely to reference a nearby memory location in the near future.

Temporal locality example:

function sum(arry) {
    let i, sum = 0
    let len = arry.length

    for (i = 0; i < len; i++) {
        sum += arry[i]
    }

    return sum
}

In this example, the variable sum is referenced once in each loop iteration, so it has good temporal locality.

Spatial locality example:

Program with good spatial locality:

// Two-dimensional array 
function sum1(arry, rows, cols) {
    let i, j, sum = 0

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            sum += arry[i][j]
        }
    }
    return sum
}

Program with poor spatial locality:

// Two-dimensional array 
function sum2(arry, rows, cols) {
    let i, j, sum = 0

    for (j = 0; j < cols; j++) {
        for (i = 0; i < rows; i++) {
            sum += arry[i][j]
        }
    }
    return sum
}

Of the two spatial locality examples above, the first accesses the elements of each row one after another. This is called a reference pattern with a stride of 1.

If in an array, every k elements are accessed, it's called a reference pattern with a stride of k. Generally, as the stride increases, spatial locality decreases.

What's the difference between these two examples? Well, the first example scans the array by row, scanning one row completely before moving on to the next row. The second example scans the array by column, scanning one element in a row and immediately going to scan the same column element in the next row.

Each row of the array is stored contiguously in memory, so scanning row by row gives a stride-1 reference pattern with good spatial locality. Scanning column by column gives a stride equal to the row length (cols), which has extremely poor spatial locality.

Performance Testing

Running environment:

CPU: i5-7400
Browser: Chrome 70.0.3538.110

Testing spatial locality on a two-dimensional array with a length of 9000 (child array length also 9000) 10 times, taking the average time (milliseconds), the results are as follows:

The examples used are the two spatial locality examples mentioned above.

Stride 1 (ms)	Stride 9000 (ms)
124	2316

From the test results above, the traversal with a stride of 1 is more than an order of magnitude faster (roughly 19 times) than the traversal with a stride of 9000.
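To reproduce this kind of measurement yourself, a sketch along the following lines may help. It is not the author's benchmark: the helper names are hypothetical, the matrix is smaller than the 9000 x 9000 one used above (an assumption, to keep memory use modest), and absolute timings will differ by machine and engine.

```javascript
// Build a rows x cols matrix filled with 1s (each row is its own array)
function makeMatrix(rows, cols) {
  return Array.from({ length: rows }, () => new Array(cols).fill(1))
}

// Row-by-row traversal: consecutive accesses stay inside one row
function sumByRow(matrix, rows, cols) {
  let sum = 0
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) sum += matrix[i][j]
  }
  return sum
}

// Column-by-column traversal: every access jumps to a different row
function sumByColumn(matrix, rows, cols) {
  let sum = 0
  for (let j = 0; j < cols; j++) {
    for (let i = 0; i < rows; i++) sum += matrix[i][j]
  }
  return sum
}

// Run fn `runs` times and return the average duration in milliseconds
function averageMs(fn, runs) {
  const start = Date.now()
  for (let r = 0; r < runs; r++) fn()
  return (Date.now() - start) / runs
}

const size = 2000 // assumption: smaller than the 9000 used in the article
const matrix = makeMatrix(size, size)
console.log('by row (ms):', averageMs(() => sumByRow(matrix, size, size), 5))
console.log('by column (ms):', averageMs(() => sumByColumn(matrix, size, size), 5))
```

Both functions return the same sum; only the order in which memory is touched differs, which is what isolates the effect of locality.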

So to sum up:

Programs that repeatedly reference the same variables have good temporal locality
For programs with a reference pattern with a stride of k, the smaller the stride, the better the spatial locality; while programs that jump around in memory with large strides will have very poor spatial locality

Reference:

Computer Systems: A Programmer's Perspective

14. if-else vs switch

As the number of conditions grows, switch becomes preferable to if-else.

if (color == 'blue') {

} else if (color == 'yellow') {

} else if (color == 'white') {

} else if (color == 'black') {

} else if (color == 'green') {

} else if (color == 'orange') {

} else if (color == 'pink') {

}

switch (color) {
    case 'blue':

        break
    case 'yellow':

        break
    case 'white':

        break
    case 'black':

        break
    case 'green':

        break
    case 'orange':

        break
    case 'pink':

        break
}

In situations like the one above, switch is the better choice for readability. JavaScript's switch statement compares the cases one after another rather than using a hash lookup, so from a performance perspective if-else and switch are roughly the same.

Why switch is better for multiple conditions:

Improved readability: Switch statements present a clearer visual structure when dealing with multiple conditions against the same variable. The case statements create a more organized, tabular format that's easier to scan and understand.
Cleaner code maintenance: Adding or removing conditions in a switch statement is simpler and less error-prone. With if-else chains, it's easy to accidentally break the chain or forget an "else" keyword.
Less repetition: In the if-else example, we repeat checking the same variable (color) multiple times, while in switch we specify it once at the top.
Better for debugging: When debugging, it's easier to set breakpoints on specific cases in a switch statement than trying to identify which part of a long if-else chain you need to target.
Intent signaling: Using switch communicates to other developers that you're checking multiple possible values of the same variable, rather than potentially unrelated conditions.

For modern JavaScript, there's another alternative worth considering for simple value mapping – object literals:

const colorActions = {
  'blue': () => { /* blue action */ },
  'yellow': () => { /* yellow action */ },
  'white': () => { /* white action */ },
  'black': () => { /* black action */ },
  'green': () => { /* green action */ },
  'orange': () => { /* orange action */ },
  'pink': () => { /* pink action */ }
};

// Execute the action if it exists
if (colorActions[color]) {
  colorActions[color]();
}

This approach can also perform better: a property lookup takes roughly constant time (O(1)) however many entries there are, whereas if-else and switch may test each condition in turn.
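One thing to watch with a plain object is that it also exposes inherited keys such as toString, so a lookup keyed by arbitrary input can find something you never registered. A minimal sketch of the same idea using a Map, which holds only the keys you add; the handler bodies and the runColorAction name are hypothetical:

```javascript
// Hypothetical handlers: each returns a label so the result is easy to inspect
const colorHandlers = new Map([
  ['blue', () => 'handled blue'],
  ['yellow', () => 'handled yellow'],
  ['white', () => 'handled white'],
])

// Look up the handler for `color`; fall back to a default when none is registered
function runColorAction(color) {
  const action = colorHandlers.get(color)
  return action ? action() : 'no action for ' + color
}

console.log(runColorAction('blue'))     // handled blue
console.log(runColorAction('toString')) // no action for toString
```

The explicit fallback plays the role of the default branch in a switch statement.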

15. Lookup Tables

When there are many conditional statements, using switch and if-else is not the best choice. In such cases, you might want to try lookup tables. Lookup tables can be constructed using arrays and objects.

switch (index) {
    case '0':
        return result0
    case '1':
        return result1
    case '2':
        return result2
    case '3':
        return result3
    case '4':
        return result4
    case '5':
        return result5
    case '6':
        return result6
    case '7':
        return result7
    case '8':
        return result8
    case '9':
        return result9
    case '10':
        return result10
    case '11':
        return result11
}

This switch statement can be converted to a lookup table:

const results = [result0,result1,result2,result3,result4,result5,result6,result7,result8,result9,result10,result11]

return results[index]

If the conditional statements are not numerical values but strings, you can use an object to build a lookup table:

const map = {
  red: result0,
  green: result1,
}

return map[color]

Why lookup tables are better for many conditions:

Constant time complexity (O(1)): Lookup tables provide direct access to the result based on the index/key, making the operation time constant regardless of how many options there are. In contrast, both if-else chains and switch statements have linear time complexity (O(n)) because in the worst case, they might need to check all conditions.
Performance gains with many conditions: As the number of conditions increases, the performance advantage of lookup tables becomes more significant. For a small number of cases (2-5), the difference is negligible, but with dozens or hundreds of cases, lookup tables are substantially faster.
Code brevity: As shown in the examples, lookup tables typically require less code, making your codebase more maintainable.
Dynamic configuration: Lookup tables can be easily populated dynamically:

   const actionMap = {};

   // Dynamically populate the map
   function registerAction(key, handler) {
     actionMap[key] = handler;
   }

   // Register different handlers
   registerAction('save', saveDocument);
   registerAction('delete', deleteDocument);

   // Use it
   if (actionMap[userAction]) {
     actionMap[userAction]();
   }

Reduced cognitive load: When there are many conditions, lookup tables eliminate the mental overhead of following long chains of logic.

When to use each approach:

If-else: Best for a few conditions (2-3) with complex logic or different variables being checked
Switch: Good for moderate number of conditions (4-10) checking against the same variable
Lookup tables: Ideal for many conditions (10+) or when you need O(1) access time

In real applications, lookup tables might be populated from external sources like databases or configuration files, making them flexible for scenarios where the mapping logic might change without requiring code modifications.

16. Avoid Page Stuttering

60fps and Device Refresh Rate

Most devices today refresh their screens 60 times per second. So if the page contains an animation or transition, or the user is scrolling, the browser needs to render frames at a rate that matches the device's refresh rate.

The budget time for each frame is just over 16 milliseconds (1 second / 60 = 16.66 milliseconds). But in reality, the browser has housekeeping work to do, so all your work needs to be completed within 10 milliseconds. If you can't meet this budget, the frame rate will drop, and content will jitter on the screen.

This phenomenon is commonly known as stuttering and has a negative impact on user experience. Source: Google Web Fundamentals - Rendering Performance

Suppose you use JavaScript to modify the DOM, trigger style changes, go through reflow and repaint, and finally paint to the screen. If any of these takes too long, it will cause the rendering time of this frame to be too long, and the average frame rate will drop. Suppose this frame took 50 ms, then the frame rate would be 1s / 50ms = 20fps, and the page would appear to stutter.

For some long-running JavaScript, we can use timers to split and delay execution.

for (let i = 0, len = arry.length; i < len; i++) {
    process(arry[i])
}

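If the loop above takes too long, because process() is expensive, the array is large, or both, you can split the work into pieces.

// Copy the array so the original is left untouched
const todo = arry.concat()
// A named function expression replaces arguments.callee,
// which throws an error in strict mode and in ES modules
setTimeout(function processNext() {
    process(todo.shift())
    if (todo.length) {
        // More items left: schedule the next one
        setTimeout(processNext, 25)
    } else {
        // All items processed: notify the caller
        callback(arry)
    }
}, 25)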

If you're interested in learning more, check out High Performance JavaScript Chapter 6.
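Processing one item per timer keeps each task short but can make the total run time long. A common variation is to process items until a small time budget is used up and only then yield. The sketch below illustrates that idea; processInChunks, budgetMs, and the injectable schedule option are assumptions of this example rather than anything from the original article.

```javascript
// Process `items` in batches, yielding to the event loop between batches.
// `budgetMs` and `schedule` are assumptions of this sketch.
function processInChunks(items, process, done, options = {}) {
  const budgetMs = options.budgetMs ?? 8
  const schedule = options.schedule ?? ((fn) => setTimeout(fn, 0))
  const todo = items.slice() // work on a copy so the caller's array is untouched
  let index = 0

  function runBatch() {
    const start = Date.now()
    // Keep working until everything is done or the time budget is used up
    do {
      process(todo[index++])
    } while (index < todo.length && Date.now() - start < budgetMs)

    if (index < todo.length) {
      schedule(runBatch) // let the browser render and handle input first
    } else {
      done(items)
    }
  }

  if (todo.length > 0) {
    schedule(runBatch)
  } else {
    done(items)
  }
}

// Usage: log each item, then report completion
processInChunks(
  [1, 2, 3],
  (n) => console.log('processed', n),
  () => console.log('all done')
)
```

Making the scheduler a parameter is a design choice that keeps the function easy to exercise in tests; in a page you would normally rely on the default.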

Reference:

Rendering Performance

17. Use `requestAnimationFrame` to Implement Visual Changes

From point 16, we know that most devices have a screen refresh rate of 60 times/second, which means the average time per frame is 16.66 milliseconds. When using JavaScript to implement animation effects, the best case is that the code starts executing at the beginning of each frame. The only way to ensure JavaScript runs at the beginning of a frame is to use requestAnimationFrame.

/**
 * If run as a requestAnimationFrame callback, this
 * will be run at the start of the frame.
 */
function updateScreen(time) {
  // Make visual updates here.
}

requestAnimationFrame(updateScreen);

If you use setTimeout or setInterval to implement animations, the callback function will run at some point in the frame, possibly right at the end, which can often cause us to miss frames, leading to stuttering.
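A minimal sketch of a time-based animation loop built on requestAnimationFrame follows. The callback receives a timestamp, so progress can be computed from elapsed time rather than from the number of frames drawn. The names progressAt and slide are hypothetical, and the raf parameter exists only so the sketch can be exercised outside a browser.

```javascript
// Fraction of the animation completed at time `now`, clamped to [0, 1]
function progressAt(startTime, now, durationMs) {
  return Math.min(Math.max((now - startTime) / durationMs, 0), 1)
}

// Move `element` horizontally by `distancePx` over `durationMs`.
// `raf` defaults to the browser's requestAnimationFrame.
function slide(element, distancePx, durationMs, raf = requestAnimationFrame) {
  let startTime = null

  function frame(timestamp) {
    // The first callback defines the start of the animation
    if (startTime === null) startTime = timestamp
    const progress = progressAt(startTime, timestamp, durationMs)
    element.style.transform = `translateX(${progress * distancePx}px)`
    if (progress < 1) raf(frame) // schedule the next frame until finished
  }

  raf(frame)
}

// Browser usage, assuming an element with the hypothetical id "box":
// slide(document.querySelector('#box'), 300, 1000)
```

Because the position depends on elapsed time, a dropped frame makes the animation skip ahead rather than slow down.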

18. Use Web Workers

Web Workers use other worker threads to operate independently of the main thread. They can perform tasks without interfering with the user interface. A worker can send messages to the JavaScript code that created it by sending messages to the event handler specified by that code (and vice versa).

Web Workers are suitable for processing pure data or long-running scripts unrelated to the browser UI.

Creating a new worker is simple – just specify a script URI to execute the worker thread (main.js):

// `first` and `second` are the two input elements whose values are sent to the worker
const myWorker = new Worker('worker.js');
// Send messages to the worker with postMessage(); receive replies through onmessage
first.onchange = function() {
  myWorker.postMessage([first.value, second.value]);
  console.log('Message posted to worker');
}

second.onchange = function() {
  myWorker.postMessage([first.value, second.value]);
  console.log('Message posted to worker');
}

In the worker, after receiving the message, you can write an event handler function code as a response (worker.js):

onmessage = function(e) {
  console.log('Message received from main script');
  var workerResult = 'Result: ' + (e.data[0] * e.data[1]);
  console.log('Posting message back to main script');
  postMessage(workerResult);
}

The onmessage handler function executes immediately after receiving the message, and the message itself is used as the data property of the event. Here we simply multiply the two numbers and use the postMessage() method again to send the result back to the main thread.

Back in the main thread, we use onmessage again to respond to the message sent back from the worker:

myWorker.onmessage = function(e) {
  result.textContent = e.data;
  console.log('Message received from worker');
}

Here we get the data from the message event and set it as the textContent of result, so the user can directly see the result of the calculation.

Note that inside the worker, you cannot directly manipulate DOM nodes, nor can you use the default methods and properties of the window object. But you can use many things available under the window object, including WebSockets and data storage mechanisms such as IndexedDB (and, historically, the Firefox OS-only Data Store API).

19. Use Bitwise Operations

Numbers in JavaScript are stored in 64-bit format using the IEEE-754 standard. But in bitwise operations, numbers are converted to 32-bit signed format. Even with the conversion, bitwise operations can be faster than other mathematical and boolean operations, although in modern JavaScript engines the difference is often small.

Modulo

Since the lowest bit of even numbers is 0 and odd numbers is 1, modulo operations can be replaced with bitwise operations.

if (value % 2) {
    // Odd number
} else {
    // Even number 
}
// Bitwise operation
if (value & 1) {
    // Odd number
} else {
    // Even number
}

How it works: The & (bitwise AND) operator compares each bit of the first operand to the corresponding bit of the second operand. If both bits are 1, the corresponding result bit is set to 1; otherwise, it's set to 0.

When we do value & 1, we're only checking the last bit of the number:

For even numbers (for example, 4 = 100 in binary), the last bit is 0: 100 & 001 = 000 (0)
For odd numbers (for example, 5 = 101 in binary), the last bit is 1: 101 & 001 = 001 (1)

Floor

~~10.12 // 10
~~10 // 10
~~'1.5' // 1
~~undefined // 0
~~null // 0

How it works: The ~ (bitwise NOT) operator inverts all bits in the operand. For an integer n, ~n equals -(n+1). Applied twice (~~n), it truncates the decimal part of a number, which matches Math.trunc(): the same as Math.floor() for positive numbers and Math.ceil() for negative numbers. Because the operand is first converted to a 32-bit signed integer, this only works for values between -2^31 and 2^31 - 1.

The process:

First ~: Converts the number to a 32-bit integer and inverts all bits
Second ~: Inverts all bits again, resulting in the original number but with decimal part removed

For example:

~10.12 → ~10 → -(10+1) → -11
~(-11) → -(-11+1) → -(-10) → 10
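The following short sketch compares ~~ with Math.trunc() and Math.floor() for a positive value, a negative value, and a value just outside the signed 32-bit range. The variable name big is only illustrative.

```javascript
// ~~ truncates toward zero, like Math.trunc, for values in the 32-bit range
console.log(~~10.7, Math.trunc(10.7), Math.floor(10.7))    // 10 10 10
console.log(~~-10.7, Math.trunc(-10.7), Math.floor(-10.7)) // -10 -10 -11

// Outside the signed 32-bit range the result wraps around
const big = 2147483648.5 // 2^31 + 0.5
console.log(~~big, Math.trunc(big)) // -2147483648 2147483648
```

If the input can be negative or large, Math.trunc() or Math.floor() states the intent more clearly and avoids the wrap-around.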

Bitmask

const a = 1
const b = 2
const c = 4
const options = a | b | c

By defining these options, you can use the bitwise AND operation to determine if a/b/c is in the options.

// Is option b in the options?
if (b & options) {
    ...
}

How it works: In bitmasks, each bit represents a boolean flag. The values are typically powers of 2 so each has exactly one bit set.

a = 1: Binary 001
b = 2: Binary 010
c = 4: Binary 100
options = a | b | c: The | (bitwise OR) combines them: 001 | 010 | 100 = 111 (binary) = 7 (decimal)

When checking if a flag is set with if (b & options):

b & options = 010 & 111 = 010 = 2 (decimal)
Since this is non-zero, the condition evaluates to true

This technique is extremely efficient for storing and checking multiple boolean values in a single number, and is commonly used in systems programming, graphics programming, and permission systems.
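As a slightly fuller illustration, the sketch below adds the other common flag operations: setting, clearing, and toggling a bit. The permission names and helper functions are hypothetical and used only for this example.

```javascript
// Hypothetical permission flags: each one occupies its own bit
const READ = 1 << 0    // 001
const WRITE = 1 << 1   // 010
const EXECUTE = 1 << 2 // 100

const hasFlag = (flags, flag) => (flags & flag) === flag // is the bit set?
const addFlag = (flags, flag) => flags | flag            // turn the bit on
const removeFlag = (flags, flag) => flags & ~flag        // turn the bit off
const toggleFlag = (flags, flag) => flags ^ flag         // flip the bit

let permissions = READ | WRITE                 // 011
permissions = removeFlag(permissions, WRITE)   // 001
permissions = addFlag(permissions, EXECUTE)    // 101

console.log(permissions.toString(2).padStart(3, '0')) // "101"
console.log(hasFlag(permissions, WRITE))              // false
```

Comparing the result of & against the flag itself, rather than relying on truthiness, also works when a "flag" combines several bits.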

20. Don't Override Native Methods

However well you optimize your own JavaScript, it will rarely match native methods. This is because native methods are written in low-level languages (C/C++) and compiled into machine code, becoming part of the browser. When native methods are available, try to use them, especially for mathematical operations and DOM manipulations.

Example: String Replacement (Native vs. Custom)

A common pitfall is rewriting native string methods like replaceAll(). Below is an inefficient custom implementation versus the native method, with performance benchmarks:

// Inefficient custom global replacement (manual loop)  
function customReplaceAll(str, oldSubstr, newSubstr) {  
  let result = '';  
  let index = 0;  
  while (index < str.length) {  
    if (str.slice(index, index + oldSubstr.length) === oldSubstr) {  
      result += newSubstr;  
      index += oldSubstr.length;  
    } else {  
      result += str[index];  
      index++;  
    }  
  }  
  return result;  
}  

// Efficient native method (browser-optimized)  
function nativeReplaceAll(str, oldSubstr, newSubstr) {  
  return str.replaceAll(oldSubstr, newSubstr);  
}  

// Test with a large string (100,000 repetitions of "abc ")  
const largeString = 'abc '.repeat(100000);  

// Benchmark: Custom implementation  
console.time('customReplaceAll');  
customReplaceAll(largeString, 'abc', 'xyz');  
console.timeEnd('customReplaceAll'); // Output: ~5ms (varies by browser)  

// Benchmark: Native method  
console.time('nativeReplaceAll');  
nativeReplaceAll(largeString, 'abc', 'xyz');  
console.timeEnd('nativeReplaceAll'); // Output: ~2ms (typically 2-3x faster)

Key takeaways:

Performance: Native methods like replaceAll() are optimized at the browser level, often outperforming handwritten code (as shown in the benchmark above).
Maintainability: Native methods are standardized, well-documented, and less error-prone than custom logic (for example, handling edge cases like overlapping substrings).
Ecosystem compatibility: Using native methods ensures consistency with libraries and tools that rely on JavaScript’s built-in behavior.

When to Use Custom Code

While native methods are usually superior, there are rare cases where you might need custom logic:

When the native method doesn’t exist (for example, polyfilling for older browsers).
For highly specialized edge cases not covered by native APIs.
When you need to avoid function call overhead in extremely performance-critical loops (for example, tight numerical computations).

Remember: Browser vendors spend millions of hours optimizing native methods. By leveraging them, you gain free performance boosts and reduce the risk of reinventing flawed solutions.

21. Reduce the Complexity of CSS Selectors

1. When browsers read selectors, they follow the principle of reading from right to left.

Let's look at an example:

#block .text p {
    color: red;
}

1. Find all p elements.
2. Check whether the elements found in step 1 have an ancestor with the class name "text".
3. Check whether the elements found in step 2 have an ancestor with the ID "block".

Why is this inefficient? This right-to-left evaluation process can be very expensive in complex documents. Take the selector #block .text p as an example:

The browser first finds all p elements in the document (potentially hundreds)
For each of those paragraph elements, it must check if any of their ancestors have the class text
For those that pass step 2, it must check if any of their ancestors have the ID block

This creates a significant performance bottleneck because:

The initial selection (p) is very broad
Each subsequent step requires checking multiple ancestors in the DOM tree
This process repeats for every paragraph element

A more efficient alternative would be:

#block p.specific-text {
    color: red;
}

This is more efficient because it targets only paragraphs with a specific class, so the browser starts from a much smaller set of candidate elements instead of every paragraph in the document.

2. CSS selector priority

Inline > ID selector > Class selector > Tag selector

Based on the above two pieces of information, we can draw conclusions:

The shorter the selector, the better.
Try to use high-priority selectors, such as ID and class selectors.
Avoid using the universal selector *.

Practical advice for optimal CSS selectors:

/* ❌ Inefficient: Too deep, starts with a tag selector */
body div.container ul li a.link {
    color: blue;
}

/* ✅ Better: Shorter, starts with a class selector */
.container .link {
    color: blue;
}

/* ✅ Best: Direct, single class selector */
.nav-link {
    color: blue;
}

Finally, I should say that according to the materials I've found, there's no need to optimize CSS selectors because the performance difference between the slowest and fastest selectors is very small.

Reference:

Optimizing CSS: ID Selectors and Other Myths

22. Use Flexbox Instead of Earlier Layout Models

In early CSS layout methods, we could position elements absolutely, relatively, or using floats. Now, we have a new layout method called Flexbox, which has an advantage over earlier layout methods: better performance.

The screenshot below shows the layout cost of using floats on 1300 boxes:

Then we recreate this example using Flexbox:

Now, for the same number of elements and the same visual appearance, the layout time is much less (3.5 milliseconds versus 14 milliseconds in this example).

Flexbox is now supported by all modern browsers, so compatibility is only a concern if you need to support very old versions.

Browser compatibility:

Chrome 29+
Firefox 28+
Internet Explorer 11
Opera 17+
Safari 6.1+ (prefixed with -webkit-)
Android 4.4+
iOS 7.1+ (prefixed with -webkit-)

Reference:

Use flexbox instead of earlier layout models

23. Use Transform and Opacity Properties to Implement Animations

In CSS, transforms and opacity property changes don't trigger reflow and repaint. They’re properties that can be processed by the compositor alone.

Example: Inefficient vs. Efficient Animation

❌ Inefficient animation using properties that trigger reflow and repaint:

/* CSS */
.box-inefficient {
  position: absolute;
  left: 0;
  top: 0;
  width: 100px;
  height: 100px;
  background-color: #3498db;
  animation: move-inefficient 2s infinite alternate;
}

@keyframes move-inefficient {
  to {
    left: 300px;
    top: 200px;
    width: 150px;
    height: 150px;
  }
}

This animation constantly triggers layout recalculations (reflow) because it animates position (left/top) and size (width/height) properties.

✅ Efficient animation using transform and opacity:

/* CSS */
.box-efficient {
  position: absolute;
  width: 100px;
  height: 100px;
  background-color: #3498db;
  animation: move-efficient 2s infinite alternate;
}

@keyframes move-efficient {
  to {
    transform: translate(300px, 200px) scale(1.5);
    opacity: 0.7;
  }
}

Why this is better:

transform: translate(300px, 200px) replaces left: 300px; top: 200px
transform: scale(1.5) replaces width: 150px; height: 150px
These transform operations and opacity changes can be handled directly by the GPU without triggering layout or paint operations

Performance comparison:

The inefficient version may drop frames on lower-end devices because each frame requires:
- JavaScript → Style calculations → Layout → Paint → Composite
The efficient version typically maintains 60fps because it only requires:
- JavaScript → Style calculations → Composite

HTML implementation:

<div class="box-inefficient">Inefficient</div>
<div class="box-efficient">Efficient</div>

For complex animations, you can use the Chrome DevTools Performance panel to visualize the difference. The inefficient animation will show many more layout and paint events compared to the efficient one.

Reference:

Use transform and opacity property changes to implement animations

24. Use Rules Reasonably, Avoid Over-Optimization

Performance optimization is mainly divided into two categories:

Load-time optimization
Runtime optimization

Of the 23 suggestions above, the first 10 belong to load-time optimization, and the last 13 belong to runtime optimization. Usually, there's no need to apply all 23 performance optimization rules. It's best to make targeted adjustments based on the website's user group, saving effort and time.

Before solving a problem, you need to identify the problem first, otherwise you won't know where to start. So before doing performance optimization, it's best to investigate the website's loading and running performance.

Check Loading Performance

A website's loading performance mainly depends on white screen time and first screen time.

White screen time: The time from entering the URL to when the page starts displaying content.
First screen time: The time from entering the URL to when the page is completely rendered.

You can get the white screen time by placing a script that evaluates new Date() - performance.timing.navigationStart just before </head>.

You can get the first screen time by executing new Date() - performance.timing.navigationStart in the window.onload event.
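Note that performance.timing is deprecated in current browsers. As an alternative technique (not the one described above), the Performance Timeline exposes paint and navigation entries. The sketch below reads them; treating first-contentful-paint as a stand-in for white screen time and loadEventStart as a stand-in for first screen time is an assumption of this example, and summarizeLoad is a hypothetical name.

```javascript
// Build a summary from Performance Timeline entries.
// Times are milliseconds relative to the start of navigation.
function summarizeLoad(paintEntries, navigationEntries) {
  const firstPaint = paintEntries.find((e) => e.name === 'first-contentful-paint')
  const navigation = navigationEntries[0]
  return {
    // Rough stand-in for white screen time (an assumption of this sketch)
    firstContentfulPaintMs: firstPaint ? firstPaint.startTime : null,
    // Rough stand-in for first screen time: when the load event started
    loadEventStartMs: navigation ? navigation.loadEventStart : null,
  }
}

// Browser usage: run after the page has loaded
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    console.log(summarizeLoad(
      performance.getEntriesByType('paint'),
      performance.getEntriesByType('navigation')
    ))
  })
}
```

Keeping the summary logic separate from the browser calls makes it possible to check it with hand-made entries.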

Check Runtime Performance

With Chrome's developer tools, we can check the website's performance during runtime.

Open the website, press F12, and select the Performance tab. Click the gray record button in the upper left corner; it turns red to show that recording has started. Interact with the site as a user would, then click Stop to see a performance report for that session.

If there are red blocks, frames were dropped. If the bar is green, the FPS is good. The Performance panel has many more features than this article can cover, so consult its documentation for detailed usage.

By checking the loading and runtime performance, I believe you already have a general understanding of the website's performance. So what you need to do now is to use the 23 suggestions above to optimize your website. Go for it!

References:

performance.timing.navigationStart

Conclusion

Performance optimization is a critical aspect of modern web development that directly impacts user experience, engagement, and ultimately, business outcomes. Throughout this article, we've explored 24 diverse techniques spanning various layers of web applications – from network optimization to rendering performance and JavaScript execution.

Key Takeaways

Start with measurement, not optimization. As discussed in point #24, always identify your specific performance bottlenecks before applying optimization techniques. Tools like Chrome DevTools Performance panel, Lighthouse, and WebPageTest can help pinpoint exactly where your application is struggling.
Focus on the critical rendering path. Many of our techniques (placing CSS in the head, JavaScript at the bottom, reducing HTTP requests, server-side rendering) are centered around speeding up the time to first meaningful paint – the moment when users see and can interact with your content.
Understand the browser rendering process. Knowledge of how browsers parse HTML, execute JavaScript, and render pixels to the screen is essential for making informed optimization decisions, especially when dealing with animations and dynamic content.
Balance implementation cost vs. performance gain. Not all optimization techniques are worth implementing for every project. For instance, server-side rendering adds complexity that might not be justified for simple applications, and bitwise operations provide performance gains only in specific heavy computation scenarios.
Consider the device and network conditions of your users. If you're building for users in regions with slower internet connections or less powerful devices, techniques like image optimization, code splitting, and reducing JavaScript payloads become even more important.

Practical Implementation Strategy

Instead of trying to implement all 24 techniques at once, consider taking a phased approach:

First pass: Implement the easy wins with high impact
- Proper image optimization
- HTTP/2
- Basic caching
- CSS/JS placement
Second pass: Address specific measured bottlenecks
- Use performance profiling to identify problem areas
- Apply targeted optimizations based on findings
Ongoing maintenance: Make performance part of your development workflow
- Set performance budgets
- Implement automated performance testing
- Review new feature additions for performance impact

By treating performance as an essential feature rather than an afterthought, you'll create web applications that not only look good and function well but also provide the speed and responsiveness that modern users expect.

Remember that web performance is a continuous journey, not a destination. Browsers evolve, best practices change, and user expectations increase. The techniques in this article provide a strong foundation, but staying current with web performance trends will ensure your applications remain fast and effective for years to come.

Why Your Code is Slow: Common Performance Mistakes Beginners Make

Rahul — Fri, 28 Mar 2025 15:38:18 +0000

Maybe you’ve experienced something like this before: you’ve written code that works, but when you hit “run,” it takes forever. You stare at the spinner, wondering if it’s faster to just solve the problem by hand.

But you end up looking something like this… 😭⬇️⬇️

Here’s the truth: slow code doesn’t have to be the end of the world. And it’s a rite of passage if you’re a developer.

When you’re learning to code, you’re focused on making things work—not making them fast. But eventually, you’ll hit a wall: your app freezes, your data script takes hours, or your game lags like a PowerPoint slideshow.

The difference between working code and blazing-fast code often comes down to avoiding a few common mistakes. Mistakes that are easy to make when you’re starting out, like using the wrong tool for the job, writing unnecessary code, or accidentally torturing your computer with hidden inefficiencies.

I’ve been there. I once wrote a “quick” script to analyze data. It ran for 3 hours. Turns out, changing one line of code cut it to 10 seconds. Yes I was dumb when I was learning – but I don’t want you to be, too.

That’s the power of understanding performance.

In this guide, I’ll break down seven common mistakes that can really tank your code’s speed—and how to fix them.

Mistake #1: Logging Everything in Production (Without Realizing It)
- How to Fix It
Mistake #2: Using the Wrong Loops (When There’s a Faster Alternative)
- Why This is a Problem
Mistake #3: Writing Database Queries Inside Loops (Killer of Speed)
Mistake #4: Not Knowing Your Hardware’s Dirty Secrets
Mistake #5: Memory Fragmentation
Mistake #6: The Cache (catch)
Mistake #7: The Copy-Paste Trap
How Do Pro Developers Write Faster Code?
Final Thoughts: Lessons Learned the Hard Way

Mistake #1: Logging Everything in Production (Without Realizing It)

Logging is supposed to help you understand what’s happening in your code—but if you’re logging everything, you’re actually slowing it down. A common beginner mistake is leaving print() statements everywhere or enabling verbose logging even in production, where performance matters most.

Instead of logging only what’s useful, they log every function call, every input, every output, and sometimes even entire request bodies or database queries. This might seem harmless, but in a live application handling thousands of operations per second, excessive logging can cause major slowdowns.

Why This is a Problem

Logging isn’t free. Every log message, whether printed to the console or written to a file, adds extra processing time. If logging is done synchronously (which it often is by default), your application can pause execution while waiting for the log to be recorded.

It also wastes disk space. If every request gets logged in detail, log files can grow rapidly, eating up storage and making it harder to find useful information when debugging.

Here’s an example:

def process_data(data):
    print(f"Processing data: {data}")  # Logging every input
    result = data * 2  
    print(f"Result: {result}")  # Logging every result
    return result

If this function is running inside a loop handling 10,000+ operations, those print statements are slowing things down massively.

How to Fix It

Instead of logging everything, focus on logging only what actually matters. Good logging helps you diagnose real issues without cluttering your logs or slowing down your app.

For example, let’s say you're processing user transactions. You don’t need to log every step of the calculation, but logging when a transaction starts, succeeds, or fails is valuable.

# ❌ Bad logging

logging.info(f"Received input: {data}")  
logging.info(f"Processing transaction for user {user_id}")  
logging.info(f"Transaction intermediate step 1 result: {some_var}")  
logging.info(f"Transaction intermediate step 2 result: {another_var}")  
logging.info(f"Transaction completed: {final_result}")  

// ✅ Better logging

logging.info(f"Processing transaction for user {user_id}")  
logging.info(f"Transaction successful. Amount: ${amount}")

Next, make sure debugging logs are turned off in production. Debug logs (logging.debug()) are great while developing because they show detailed information, but they shouldn’t be running on live servers.

You can control this by setting the logging level to INFO or higher:

import logging

logging.basicConfig(level=logging.INFO)  # Only logs INFO, WARNING, ERROR, CRITICAL messages

def process_data(data):
    logging.debug(f"Processing data: {data}")  # Won't show up in production
    return data * 2
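
One related detail is worth checking: an f-string is built before logging.debug() is even called, so its formatting cost is paid whether or not DEBUG is enabled. Passing %-style arguments lets the logging module skip the formatting when the level is off. A minimal sketch of the difference, assuming a hypothetical logger name (payments_demo) and a made-up Expensive class that only counts how often it is converted to a string:

```python
import logging

logger = logging.getLogger("payments_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)                # DEBUG messages are disabled


class Expensive:
    """Illustrative object that counts how often it is turned into a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive repr"


payload = Expensive()

# f-string: the message is formatted immediately, even though DEBUG is off
logger.debug(f"Processing data: {payload}")
eager_calls = Expensive.calls

# %-style arguments: formatting is deferred and skipped when DEBUG is off
logger.debug("Processing data: %s", payload)
lazy_calls = Expensive.calls - eager_calls

print(f"f-string formatted {eager_calls} time(s), %-style {lazy_calls} time(s)")
```

In a hot path that logs at DEBUG level, this deferred form keeps the disabled log call close to free.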

Finally, for high-performance applications, consider using asynchronous logging. By default, logging operations can block execution, meaning your program waits until the log message is written before continuing. This can be a bottleneck, especially if you're logging to a file or a remote logging service.

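Asynchronous logging solves this by handling logs in the background. Here’s how you can set it up with Python’s QueueHandler and QueueListener:

import logging
import logging.handlers
import queue

log_queue = queue.Queue()
queue_handler = logging.handlers.QueueHandler(log_queue)

# The listener drains the queue on a background thread and does the slow I/O
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
listener.start()

logger = logging.getLogger()
logger.addHandler(queue_handler)
logger.setLevel(logging.INFO)

logger.info("This log is handled asynchronously!")

listener.stop()  # Flush any remaining records before the program exits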

Mistake #2: Using the Wrong Loops (When There’s a Faster Alternative)

Why This is a Problem

Loops are one of the first things you learn in programming, and for loops feel natural—they give you control, they’re easy to understand, and they work everywhere. That’s why beginners tend to reach for them automatically.

But just because something works doesn’t mean it’s the best way. In Python, for loops can be slow—especially when there’s a built-in alternative that does the same job faster and more efficiently.

This isn’t just a Python thing. Most programming languages have optimized ways to handle loops under the hood—whether it's vectorized operations in NumPy, functional programming in JavaScript, or stream processing in Java. Knowing when to use them is key to writing fast, clean code.

Example

Let’s say you want to square a list of numbers. A beginner might write this:

numbers = [1, 2, 3, 4, 5]
squared = []

for num in numbers:
    squared.append(num ** 2)

Looks fine, right? But there are two inefficiencies here:

You're manually looping when Python has a better, built-in way to handle this.
You're making repeated .append() calls, which add unnecessary overhead.

In small cases, you won’t notice a difference. But when processing large datasets, these inefficiencies add up fast.

The Better, Faster Way

Python has built-in constructs that make loops run faster. One of them is the list comprehension, which compiles to tighter bytecode than an explicit loop with .append() and usually runs noticeably faster. Here’s how you can rewrite the example:

# Faster and cleaner
squared = [num ** 2 for num in numbers]

Why this is better:

It’s faster. The comprehension appends each item with a dedicated bytecode instruction, so it skips the attribute lookup and method call that every .append() pays for.
It eliminates extra work. There’s no empty list to create first and no name to look up on each iteration; the list is built and bound in a single expression. (The list still grows dynamically as items arrive; CPython doesn’t pre-allocate it.)
It’s more readable. The intent is clear: "I’m creating a list by squaring each number"—no need to scan through multiple lines of code.
It’s less error-prone. Since everything happens in a single expression, there’s less chance of accidentally modifying the list incorrectly (for example, forgetting to .append()).
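
How much faster depends on the Python version and the work done per item, so it’s worth measuring rather than assuming. A small sketch using the standard timeit module; the function names and the list size are arbitrary choices for illustration:

```python
import timeit

numbers = list(range(100_000))


def square_with_loop():
    squared = []
    for num in numbers:
        squared.append(num ** 2)
    return squared


def square_with_comprehension():
    return [num ** 2 for num in numbers]


# Run each version 20 times and report the total time for each
loop_time = timeit.timeit(square_with_loop, number=20)
comp_time = timeit.timeit(square_with_comprehension, number=20)

print(f"for loop:      {loop_time:.3f}s")
print(f"comprehension: {comp_time:.3f}s")
```

Both functions return the same list, so any gap you see is purely the cost of how the list is built.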

When to Use For Loops vs. List Comprehensions

For loops still have their place. Use them when:

You need complex logic inside the loop (for example, multiple operations per iteration).
You need to modify existing data in place rather than create a new list.
The operation involves side effects, like logging, file writing, or network requests.

Otherwise, list comprehensions should be your default choice for simple transformations. They’re faster, cleaner, and make your Python code more efficient.

Mistake #3: Writing Database Queries Inside Loops (Killer of Speed)

Why This is a Problem

This is one of the biggest slow-code mistakes beginners (and even intermediates) make. It happens because loops feel natural, and database queries feel straightforward. But mix the two together, and you’ve got a performance disaster.

Every time you call a database inside a loop, you're making repeated trips to the database. Each query adds network latency, processing overhead, and unnecessary load on your system.

Example:

Imagine you’re fetching user details for a list of user_ids like this:

user_ids = [1, 2, 3, 4, 5]

for user_id in user_ids:
    user = db.query(f"SELECT * FROM users WHERE id = {user_id}")
    print(user)  # Do something with the user

What's wrong here?

You're hitting the database multiple times instead of once.
Each call has network overhead (database queries aren’t instant).
Performance tanks when user_ids gets large.

How to Fix It: Use Bulk Queries

Instead of making 5 separate queries, make one:

user_ids = [1, 2, 3, 4, 5]

users = db.query(f"SELECT * FROM users WHERE id IN ({','.join(map(str, user_ids))})")

for user in users:
    print(user)  # Process users efficiently

Why this is better:

In the above code, we just have one database call instead of many. This results in faster performance.
There’s also less network overhead which makes your app feel snappier.
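And it scales far better as user_ids grows, though for very large lists you may need to split the IDs into batches to stay within your database’s query limits.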

A More Scalable Approach

If you're using an ORM (like SQLAlchemy in Python or Sequelize in JavaScript), use batch fetching instead of looping:

users = db.query(User).filter(User.id.in_(user_ids)).all()
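
If you’re writing raw SQL, note that building the IN (...) list by string interpolation is only safe for values you fully control. A sketch of the same bulk fetch with bound parameters and batching, using the standard sqlite3 module; the users table, the fetch_users helper, and the chunk_size default are all assumptions made for this example:

```python
import sqlite3

# In-memory database with a hypothetical users table, just for the demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 11)],
)


def fetch_users(conn, user_ids, chunk_size=500):
    """Fetch users in batches: one query per chunk instead of one per ID."""
    rows = []
    for start in range(0, len(user_ids), chunk_size):
        chunk = user_ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        sql = f"SELECT id, name FROM users WHERE id IN ({placeholders}) ORDER BY id"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


users = fetch_users(conn, [1, 2, 3, 4, 5])
for user in users:
    print(user)
```

Only the placeholders are interpolated into the SQL text; the IDs themselves are passed separately, so the driver handles quoting.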

Mistake #4: Not Knowing Your Hardware’s Dirty Secrets

Your code doesn’t run in a magical fairyland—it runs on real hardware. CPUs, memory, and caches have quirks that can turn “logically fast” code into a sluggish mess. Here’s what most tutorials won’t tell you:

Problem 1: The CPU’s Crystal Ball is Broken (Memory Prefetching)

What you think happens:

“I’m looping through data sequentially. The CPU should predict what I need next!”

What actually happens:

Modern CPUs have a memory prefetcher—a smart assistant that tries to guess which memory locations you’ll need next and loads them in advance.

But here’s the catch: If your access pattern is too random, the prefetcher gives up. Instead of smoothly fetching data ahead of time, the CPU is left waiting, like someone stuck refreshing Google Maps on a broken internet connection.

This happens a lot with linked lists and hash tables, where memory jumps around unpredictably.

Example:

# Linked list traversal (random memory jumps)  
class Node:  
    def __init__(self, val):  
        self.val = val  
        self.next = None  

head = Node(0)  
current = head  
for _ in range(100000):  # Each 'next' points to a random memory location  
    current.next = Node(0)  
    current = current.next  

# Walking this list = 100,000 cache misses

Why this hurts:

Each time the CPU needs the next Node, it has to fetch it from a random memory location, making prefetching useless and causing frequent cache misses.

The Fix: Use Contiguous Data Structures

Instead of using a linked list, store your data in a contiguous memory block (like an array or NumPy array). This way, the CPU can easily prefetch the next elements in sequence, speeding things up.

# Array traversal (prefetcher-friendly)  
data = [0] * 100000  # Contiguous memory  
for item in data:  
    pass  # CPU prefetches next elements seamlessly

Why this is better:

The CPU efficiently prefetches upcoming values instead of waiting.
Fewer cache misses = way faster execution.
Hot loops (loops that run millions of times) get a huge performance boost.

📌 Hot loops are loops that execute a massive number of times, like those in data processing, AI models, and game engines. Even a small speedup in a hot loop can dramatically improve overall performance.
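
In Python specifically, a plain list holds pointers to separately allocated objects, so only the pointers are contiguous. When the values are all numbers, the standard array module can store the values themselves back to back. A minimal sketch, assuming 64-bit signed integers (type code "q") are a good fit for the data:

```python
from array import array

values = list(range(100_000))

# A list stores pointers to separately allocated int objects.
# array("q") stores the 64-bit integers themselves, back to back.
packed = array("q", values)

packed_bytes = len(packed) * packed.itemsize
print(f"{len(packed)} items packed into {packed_bytes} contiguous bytes")

# Both containers support the same iteration and aggregation
list_total = sum(values)
packed_total = sum(packed)
```

Pure-Python loops over an array still pay interpreter overhead per item, so the bigger wins usually come when the packed buffer is handed to C-level code (NumPy, struct, file I/O).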

Problem 2: The Invisible Tax of Memory Pages (TLB Thrashing)

What you think happens:

“My 10GB dataset is just… there. Accessing it is free, right?”

What actually happens:

Your OS splits memory into 4KB pages. Every time your program accesses a new memory page, the CPU consults a Translation Lookaside Buffer (TLB)—a “phonebook” for fast page lookups.

If your program jumps between too many pages, you get TLB misses, and the CPU wastes cycles looking up each mapping in the page tables.

Example:

# Iterating a giant list with random access
import random

data = [x for x in range(10_000_000)]
random_indexes = [random.randrange(len(data)) for _ in range(1_000_000)]
total = 0
for i in random_indexes:  # 1,000,000 random jumps
    total += data[i]  # Each jump likely hits a new page

Why this hurts:

TLB misses can add 10-100 CPU cycles per access.
If you have millions of random accesses, that’s billions of wasted cycles.

The Fix: Process Data in Chunks

To reduce TLB misses:

Process data in chunks (for example, 4096 elements at a time) instead of randomly jumping around.
Use huge pages (2MB instead of 4KB) so that more data fits in each memory page.
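
When the order of the lookups doesn’t matter (as with a sum), one cheap way to get chunk-like locality is to sort the indexes first, so the accesses sweep forward through memory instead of bouncing between pages. A sketch under that assumption; the sizes and the total_at helper are illustrative:

```python
import random

random.seed(42)  # fixed seed so the run is repeatable
data = list(range(1_000_000))
random_indexes = [random.randrange(len(data)) for _ in range(100_000)]


def total_at(indexes):
    """Sum data[i] for each index, in the order given."""
    return sum(data[i] for i in indexes)


scattered_total = total_at(random_indexes)        # jumps all over the list
ordered_total = total_at(sorted(random_indexes))  # sweeps forward through memory

print(scattered_total == ordered_total)
```

The result is identical either way; only the access pattern changes. In CPython the interpreter overhead hides much of the gain, so expect the effect to be clearer in C, Rust, or NumPy code.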

Problem 3: Your Code is a Tourist in the Wrong CPU Neighborhood (NUMA)

What you think happens:

“My 64-core server is a speed paradise!”

What actually happens:

On multi-socket servers, memory is divided into NUMA (Non-Uniform Memory Access) zones. Each CPU socket has its own local memory, and accessing memory from another socket is slow—like ordering Uber Eats from another city.

Example:

# Running this on a 2-socket server:  
from multiprocessing import Pool  
import numpy as np  

def process(chunk):  
    data = np.load("giant_array.npy")  # Allocated on Socket 1's RAM  
    return chunk * data  # If process runs on Socket 2's CPU... ouch  

with Pool(64) as p:  
    p.map(process, big_data)  # 64 cores fighting over remote RAM

Why this hurts:

Accessing memory from another NUMA zone can be 2-4x slower.
Your 64 cores end up waiting for memory instead of actually computing.

The Fix: Pin Processes to NUMA-Aware Memory

Instead of letting your processes randomly access memory, you can pin them to the correct NUMA node.

Use numactl on Linux to allocate memory near the CPU that will use it.
Use NUMA-aware libraries with NumPy so that data is allocated close to the cores that use it.
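
From inside Python, the closest standard-library tool is CPU affinity. It controls where a process runs, not where its memory lives, but under Linux’s default local-allocation policy memory touched after pinning tends to come from the same node. A hedged sketch; pin_current_process is a hypothetical helper, and os.sched_setaffinity exists only on some platforms (notably Linux):

```python
import os


def pin_current_process(cpus):
    """Restrict this process to the given CPU ids.

    Returns the resulting CPU set, or None where the call is
    unavailable (it is platform-specific) or not permitted.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        os.sched_setaffinity(0, cpus)  # 0 means "the calling process"
        return os.sched_getaffinity(0)
    except OSError:
        return None


# Example: re-apply the CPUs we are already allowed to use (a no-op pin).
# On a real 2-socket machine you would pass the CPU ids of one socket.
current = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
pinned = pin_current_process(current) if current else None
print(pinned)
```

For explicit memory placement you would still reach for numactl or a NUMA-aware library; this only keeps the workers from wandering between sockets.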

Problem 4: The CPU is a Drama Queen (Speculative Execution)

What you think happens:

“My code runs in the order I wrote it!”

What actually happens:

CPUs speculatively execute code ahead of time. If they guess wrong, they have to roll back the speculative work and restart, which slows things down.

Example:

// Unpredictable branches = CPU's worst nightmare
if (value > threshold) {  // True about half the time, in no particular order
    do_work();
}

Why this hurts:

A branch misprediction wastes 15-20 cycles. In hot loops, this can really hurt performance.

The Fix: Make Branches Predictable

Sort data to help the CPU make better predictions:

# Process all 'valid' items first, then 'invalid' ones  
sorted_data = sorted(data, key=lambda x: x.is_valid, reverse=True)  
for item in sorted_data:  
    if item.is_valid:  # CPU learns the pattern → accurate predictions  
        process(item)

Why This Works:

Branching becomes predictable—the CPU stops guessing wrong.
Sorting ahead of time reduces rollbacks and wasted cycles.
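
A full sort costs O(n log n). If all you need is for the valid items to be grouped together, a single partitioning pass gets the same predictability more cheaply. A small sketch of that variation; Item and process are hypothetical stand-ins for your own types:

```python
from dataclasses import dataclass


@dataclass
class Item:
    value: int
    is_valid: bool


def process(item):
    """Placeholder for the real per-item work."""
    return item.value * 2


data = [Item(value=i, is_valid=(i % 3 != 0)) for i in range(10)]

# One O(n) pass groups the valid items; the processing loop then has no branch
valid_items = [item for item in data if item.is_valid]
results = [process(item) for item in valid_items]

print(results)  # [2, 4, 8, 10, 14, 16]
```

In CPython, interpreter overhead dwarfs branch-prediction effects, so treat this as the shape of the technique; the payoff shows up mainly in compiled hot loops.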

How to Fight Back

Here’s how you can stop your CPU from sabotaging your code:

Treat Memory Like a Highway: Cache lines matter. Keep data contiguous so the CPU doesn’t have to search for it.

Profile with perf: Use Linux’s perf tool to spot cache misses, page faults, and TLB thrashing:
```
 perf stat -e cache-misses,page-faults ./your_code
```

Assume Nothing. Benchmark Everything: CPUs have a thousand undocumented behaviors. Test different data layouts, loop structures, and memory allocations to see what’s fastest.

Mistake #5: Memory Fragmentation

You’ve optimized your algorithms. You’ve nailed Big O. Yet your app still crashes with “out of memory” errors or slows to a crawl over time. The culprit? Memory fragmentation—a ghost in the machine that most developers ignore until it’s too late.

What’s Happening Under the Hood

When your code allocates and frees memory blocks of varying sizes, it leaves behind a patchwork of free and used spaces. Over time, this creates a Swiss cheese effect in your RAM: plenty of total free memory, but no contiguous blocks for new allocations.

Example:
Imagine a C++ server that handles requests by allocating buffers of random sizes:

void process_request() {  
    // Allocate a buffer of random size between 1–1024 bytes  
    char* buffer = new char[rand() % 1024 + 1];  
    // ... process ...  
    delete[] buffer;  
}

After millions of requests, your memory looks like this:

[USED][FREE][USED][FREE][USED][FREE]...

Now, when you try to allocate a 2KB buffer, it fails—not because there’s no space, but because no single free block is large enough.

How to Fix it:

Use a memory pool to allocate fixed-size blocks:

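#include <cstddef>
#include <vector>

class MemoryPool {
public:
    explicit MemoryPool(std::size_t block_size) : block_size_(block_size) {}
    ~MemoryPool() { for (void* block : free_blocks_) ::operator delete(block); }

    void* allocate() {  // Reuse a returned block if one is available
        if (free_blocks_.empty()) return ::operator new(block_size_);
        void* block = free_blocks_.back();
        free_blocks_.pop_back();
        return block;
    }
    void deallocate(void* ptr) { free_blocks_.push_back(ptr); }  // Return block to pool

private:
    std::size_t block_size_;
    std::vector<void*> free_blocks_;
};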

// All requests use buffers of fixed size (1024 bytes)  
MemoryPool pool(1024);  
void process_request() {  
    char* buffer = static_cast<char*>(pool.allocate());  
    // ... process ...  
    pool.deallocate(buffer);  
}

Because every block is the same size, any freed block can satisfy any later request, which largely eliminates external fragmentation.

The Autoboxing Trap (Java, C#, and so on)

What’s Happening?

In languages that mix primitives (like int, float) and objects (like Integer, Double), converting a primitive to its object wrapper is called autoboxing. It feels harmless, but in hot loops, it’s a performance disaster.

Example:

// Slow: Creates 1,000,000 Integer objects (and garbage!)
List<Integer> list = new ArrayList<>();
for (int i = 0; i < 1_000_000; i++) {  
    list.add(i);  // Autoboxing 'i' to Integer  
}

Why this hurts performance:

Memory overhead: Each Integer object adds 16–24 bytes of extra memory (object headers, pointers). With 1,000,000 numbers, that’s an extra 16–24MB wasted just on overhead.
Garbage collection (GC) pressure: Since objects are allocated on the heap, the GC constantly cleans up old Integer objects, leading to latency spikes.
CPU cache inefficiency: Primitives like int are tightly packed in memory, but Integer objects are scattered across the heap with extra indirection, wrecking cache locality.

The Fix: Use Primitive Collections

To avoid autoboxing, use data structures that store raw primitives instead of objects. In Java, Eclipse Collections provides primitive-friendly lists like IntList that store raw int values directly.

Example: The Faster Version (Primitive Collections)

// Import primitive-friendly collection
import org.eclipse.collections.api.list.primitive.IntList;
import org.eclipse.collections.impl.list.mutable.primitive.IntArrayList;  

// Use IntArrayList to store raw ints
IntList list = new IntArrayList();  
for (int i = 0; i < 1_000_000; i++) {  
    list.add(i);  // No autoboxing! Stores raw 'int'  
}

How this fix works:

Stores raw int values instead of Integer objects, eliminating memory overhead.
Avoids a heap allocation per element, so the garbage collector has far less to clean up.
Keeps numbers tightly packed in memory, improving CPU cache efficiency.

The Fix for C#

In C#, you can avoid unnecessary heap allocations by using structs and Span<T>, which keep data on the stack or in contiguous memory rather than the heap.

// Span avoids heap allocations (keep stackalloc buffers small; the stack is limited, often to about 1MB)
Span<int> numbers = stackalloc int[1_000];
for (int i = 0; i < numbers.Length; i++) {  
    numbers[i] = i;  // No boxing, no heap allocation  
}

No object wrappers. No GC pressure. Just performance.

Mistake #6: The Cache (catch)

You’ve heard “cache matters,” but here’s the twist: your loops are lying to your CPU. The way you traverse multi-dimensional arrays can create a 10x speed difference that leaves you questioning reality.

Row-Major vs. Column-Major Access

What you think happens:
“Iterating over a 2D array is the same whether I go row-by-row or column-by-column. Right?”

What actually happens:
Memory is laid out linearly, but CPUs prefetch data in chunks (cache lines). Traversing against the grain forces the CPU to fetch new cache lines every single step.

Example in C:

// A "tiny" 1024x1024 matrix  
int matrix[1024][1024];  

// Fast: Row-major traversal (cache-friendly)  
for (int i = 0; i < 1024; i++) {  
    for (int j = 0; j < 1024; j++) {  
        matrix[i][j] = i + j;  
    }  
}  

// Slow: Column-major traversal (cache-hostile)  
for (int j = 0; j < 1024; j++) {  
    for (int i = 0; i < 1024; i++) {  
        matrix[i][j] = i + j;  
    }  
}

The result:

Row-major: ~5ms (data flows like a river).
Column-major: ~50ms (CPU drowns in cache misses).

Why it’s worse than you think:
In C/C++, arrays are row-major. But in Fortran, MATLAB, or Julia, they’re column-major. Use the wrong traversal order in these languages, and you’ll get the same penalty.

The Plot Twist: Your Programming Language is Gaslighting You

In C and Python (NumPy default), arrays use row-major order. But in Fortran, MATLAB, and Julia, arrays are column-major. If you assume the wrong layout, your loops will be slow for no apparent reason.
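
To make the two layouts concrete, here is a minimal sketch that computes where element (i, j) lands in the flat block of memory under each convention. The helper names are made up for this illustration; offsets are in elements, not bytes:

```python
def row_major_offset(i, j, rows, cols):
    """C-style layout: each row is stored contiguously."""
    return i * cols + j


def col_major_offset(i, j, rows, cols):
    """Fortran-style layout: each column is stored contiguously."""
    return j * rows + i


rows, cols = 3, 4

# Walking along row 0 versus down column 0 in a row-major array
row_walk = [row_major_offset(0, j, rows, cols) for j in range(cols)]
col_walk = [row_major_offset(i, 0, rows, cols) for i in range(rows)]
print(row_walk)  # [0, 1, 2, 3]  -> neighbors in memory
print(col_walk)  # [0, 4, 8]     -> a jump of `cols` elements each step

# The same column walk in a column-major array is contiguous instead
print([col_major_offset(i, 0, rows, cols) for i in range(rows)])  # [0, 1, 2]
```

The fast direction is simply whichever one produces consecutive offsets.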

Python Example:

import numpy as np  

# Row-major (C-style) → Fast for row-wise loops  
row_major = np.zeros((1024, 1024), order='C')  

# Column-major (Fortran-style) → Fast for column-wise loops  
col_major = np.zeros((1024, 1024), order='F')  

# ❌ Slow: Row-wise access on a column-major array
for i in range(1024):
    for j in range(1024):
        col_major[i, j] = i + j  # Cache-miss chaos!

Why this is a problem:

The loop walks col_major row by row, but a Fortran-style array stores each column contiguously, so consecutive writes land far apart in memory and cause cache misses.
The same loop over row_major (NumPy’s default layout) would touch memory sequentially, while a column-wise loop over it would be the slow one instead.

The Fix:

Match the array order to your access pattern using order='C' (row-major) or order='F' (column-major).
Convert the data layout with np.asarray(arr, order='C') or np.asarray(arr, order='F') if needed.

The Multidimensional Illusion: 3D+ Arrays

What you think happens:
“3D arrays are just 2D arrays with extra steps. No big deal.”

What actually happens:
A statically sized 3D array in C is one contiguous block of memory in which the last index varies fastest. Every extra dimension multiplies the distance between neighbors along the outer indexes, so traversing the “wrong” dimension makes each step jump thousands of bytes, killing locality.

Example: 3D Array Traversal in C

// ✅ Fast: Iterate in Row-Major Order (Innermost Dimension Last)

int space[256][256][256];  

for (int x = 0; x < 256; x++) {  
    for (int y = 0; y < 256; y++) {  
        for (int z = 0; z < 256; z++) {  
            space[x][y][z] = x + y + z;  // Smooth memory access  
        }  
    }  
}

So what happens is that the innermost loop moves through contiguous memory, making full use of cache lines.

// ❌ Slow: Iterate in the Wrong Order (Innermost Dimension First)

for (int z = 0; z < 256; z++) {  
    for (int y = 0; y < 256; y++) {  
        for (int x = 0; x < 256; x++) {  
            space[x][y][z] = x + y + z;  // Constant cache misses  
        }  
    }  
}

Why this is bad:

This loop jumps across memory every time x changes.
Instead of accessing contiguous memory, each step lands 256KB away (256 × 256 × 4 bytes), on a different cache line and usually a different page.
Penalty: Up to 100x slower for large 3D arrays!
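
The size of each jump follows directly from the array’s shape. A short sketch that computes the byte stride of every dimension for a C-ordered array; c_strides is a hypothetical helper, and the 4-byte item size assumes a typical int:

```python
def c_strides(shape, itemsize):
    """Byte distance between neighbors along each axis of a C-ordered array."""
    strides = []
    step = itemsize
    for dim in reversed(shape):  # the last axis is the contiguous one
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


x_stride, y_stride, z_stride = c_strides((256, 256, 256), itemsize=4)
print(x_stride, y_stride, z_stride)  # 262144 1024 4
```

With z innermost, each step moves 4 bytes; with x innermost, each step moves 262,144 bytes, which is why the second loop order is so much harder on the cache.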

The Nuclear Option: Cache-Aware Algorithms

For extreme performance (game engines, HPC), you need to design for cache lines:

Tiling: Split arrays into small blocks that fit in L1/L2 cache.

 // Process 8x8 tiles to exploit 64-byte cache lines  
 for (int i = 0; i < 1024; i += 8) {  
     for (int j = 0; j < 1024; j += 8) {  
         // Process tile[i:i+8][j:j+8]  
     }  
 }

SoA vs. AoS: Prefer Structure of Arrays (SoA) over Array of Structures for SIMD.

 // Slow: Array of Structures (AoS)  
 struct Particle { float x, y, z; };  
 Particle particles[1000000];  

 // Fast: Structure of Arrays (SoA)  
 struct Particles {  
     float x[1000000];  
     float y[1000000];  
     float z[1000000];  
 };
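
The tiling loop above leaves the per-tile work as a comment. Here is one way it could be filled in, sketched in Python for a matrix transpose stored as a flat, row-major list. The function names, the tile size, and the choice of transpose as the workload are all assumptions for illustration:

```python
def transpose_naive(matrix, n):
    """Transpose an n x n matrix stored as a flat row-major list."""
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[j * n + i] = matrix[i * n + j]
    return out


def transpose_tiled(matrix, n, tile=8):
    """Same result, but visits the matrix one tile x tile block at a time."""
    out = [0] * (n * n)
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            # min() handles edge tiles when n is not a multiple of tile
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = matrix[i * n + j]
    return out


n = 20
matrix = list(range(n * n))
print(transpose_tiled(matrix, n) == transpose_naive(matrix, n))  # True
```

The tiled version keeps both the block being read and the block being written small enough to stay in cache. In Python this shows the loop structure only; the speedup itself is something you would measure in C, C++, or Rust.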

Mistake #7: The Copy-Paste Trap

You’d never download 10 copies of the same movie. But in code? You’re probably cloning data all the time without realizing it. Here’s how invisible copies turn your app into a bloated, slow mess—and how to fix it.

Problem 1: The Ghost Copies in “Harmless” Operations

What you think happens:
“I sliced a list—it’s just a reference, right?”

What actually happens:
In many languages, slicing creates a full copy of the data. Do this with large datasets, and you’re silently doubling memory usage and CPU work.

Python Example:

# A 1GB list of data  
big_data = [ ... ]  # 1,000,000 elements  

# Accidentally cloning the entire list  
snippet = big_data[:1000]  # Creates a copy (harmless, right?)  

# Better: Use a view (if possible)  
import numpy as np  
big_array = np.array(big_data)  
snippet = big_array[:1000]  # A view, not a copy (0MB added)

Why this hurts:

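A list slice copies a reference to every element it covers, so slicing most of a huge list (for example, big_data[1:]) nearly doubles the memory the list itself occupies.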
If this happens in a loop, your program could crash with MemoryError.

The Fix:

Use memory views (numpy, memoryview in Python) or lazy slicing (Pandas .iloc).
In JavaScript, slice() copies arrays—replace with TypedArray.subarray for buffers.
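
For raw bytes, the standard library’s memoryview gives you the same copy-free slicing without NumPy. A minimal sketch, using a small bytearray as a stand-in for a large binary payload:

```python
buffer = bytearray(1024)   # stand-in for a large binary payload
view = memoryview(buffer)

window = view[:16]         # slice of a memoryview: no bytes copied
snapshot = buffer[:16]     # slice of a bytearray: a new 16-byte copy

buffer[0] = 255            # change the original

print(window[0])    # 255 -> the view sees the change
print(snapshot[0])  # 0   -> the copy does not
```

The flip side of sharing memory is that a view is only safe while nobody else mutates the buffer in ways you don’t expect.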

Problem 2: The Hidden Cost of “Functional” Code

What you think happens:

“I’ll chain array methods for clean, readable code!”

What actually happens:

Every map, filter, or slice creates a new array. Chain three operations? You’ve cloned your data three times.

JavaScript Example:

// A 10,000-element array  
const data = [ ... ];  

// Slow: Creates 3 copies (original → filtered → mapped → sliced)  
const result = data  
  .filter(x => x.active)  
  .map(x => x.value * 2)  
  .slice(0, 100);  

// Faster: Do it in one pass
const fastResult = [];
for (let i = 0; i < data.length; i++) {
  if (data[i].active) {
    fastResult.push(data[i].value * 2);
    if (fastResult.length === 100) break;
  }
}

Why this hurts:

10,000 elements → 30,000 operations + 3x memory.
Functional programming is elegant but can be expensive.

The Fix:

Use generators (Python yield, JS function*) for lazy processing.
Replace method chains with single-pass loops in hot paths.
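
The Python equivalent of that filter, map, and slice chain can be written lazily with a generator expression and itertools.islice, so no intermediate lists are built and the work stops after the first 100 matches. A sketch with made-up data of the same shape as the JavaScript example:

```python
from itertools import islice

# 10,000 records shaped like the JavaScript example (illustrative data)
data = [{"active": i % 2 == 0, "value": i} for i in range(10_000)]

# Nothing runs yet: this only describes the filter + map steps
active_values = (row["value"] * 2 for row in data if row["active"])

# Pull just the first 100 results; the remaining records are never visited
result = list(islice(active_values, 100))

print(len(result), result[:3], result[-1])  # 100 [0, 4, 8] 396
```

This keeps the readable, declarative style of the chained version while doing the same amount of work as the hand-written loop.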

Problem 3: The “I’ll Just Modify a Copy” Mistake

What you think happens:
“I need to tweak this object. I’ll duplicate it to avoid side effects.”

What actually happens:
Deep cloning complex objects (especially in loops) is like photocopying a dictionary every time you edit a word.

Python Example:

import copy  

config = {"theme": "dark", "settings": { ... }}  # Nested data  

# Slow: Deep-copying before every edit  
for user in users:  
    user_config = copy.deepcopy(config)  # Copies entire nested structure  
    user_config["theme"] = user.preference  
    # ...  

# Faster: Reuse the base config, overlay changes  
for user in users:  
    user_config = {**config, "theme": user.preference}  # Shallow merge; later keys win
    # ...

Why this hurts:

deepcopy is 10-100x slower than shallow copies.
Multiplied by 1,000 users, you’re wasting minutes.

The Fix:

Use immutable patterns: Create new objects by merging instead of cloning.
For big data, use structural sharing (libraries like immutables in Python).
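
If even a shallow merge per user is more copying than you want, the standard library’s collections.ChainMap can overlay per-user overrides on a shared base without copying the base at all. A sketch with an invented config; the keys and values are placeholders:

```python
from collections import ChainMap

# Shared base configuration (illustrative values)
config = {"theme": "dark", "settings": {"font_size": 12}}

# Lookups check the override dict first, then fall back to the base
user_config = ChainMap({"theme": "light"}, config)

print(user_config["theme"])     # light (from the override)
print(user_config["settings"])  # {'font_size': 12} (from the shared base)
print(config["theme"])          # dark (the base is untouched)
```

Nested values such as settings are shared rather than copied, so this fits read-mostly configuration; mutate a nested dict and every user sees the change.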

How to Escape Copy-Paste Hell

Ask: “Do I need a copy?”: 90% of the time, you don’t. Use views, generators, or in-place edits.
Profile memory usage: Tools like memory_profiler (Python) or Chrome DevTools (JS) show copy overhead.
Learn your language’s quirks:
- Python: Slicing lists copies, slicing NumPy arrays doesn’t.
- JavaScript: [...array] clones, array.subarray (TypedArray) doesn’t.

How Do Pro Developers Write Faster Code?

Most beginners think "fast code" just means writing cleaner syntax or using a different framework. But in reality, performance isn't just about what language or framework you use—it's about how you think.

Pro developers don’t just write code. They measure, test, and optimize it. Here’s how they do it.

1. They Profile Their Code Instead of Guessing

🔥 Beginners: “This function feels slow… maybe I should rewrite it?”
💡 Pros: “Let’s profile it and see what’s actually slow.”

Instead of randomly rewriting code, pro developers measure first using profiling tools.

Example: In Python, you can use cProfile to analyze where your code is spending the most time:

import cProfile

def slow_function():
    total = 0
    for i in range(10**6):
        total += i
    return total

cProfile.run('slow_function()')

👀 What this tells you:

Which function takes the longest
How many times each function is called
Where the actual bottleneck is

✅ Takeaway: Before optimizing, always profile your code. You can’t fix what you don’t measure.

Other useful tools:

Python: cProfile, line_profiler
JavaScript: Chrome DevTools Performance Tab
Java: JProfiler
General: perf, Valgrind
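
On a real program, the raw cProfile.run() output can run to hundreds of lines. A common pattern is to collect the stats and print only the most expensive entries with pstats. A minimal sketch; the function being profiled and the cutoff of 5 rows are arbitrary choices:

```python
import cProfile
import io
import pstats


def slow_function():
    total = 0
    for i in range(10**5):
        total += i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_function()
profiler.disable()

# Sort by cumulative time and keep only the top 5 rows
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)

report = stream.getvalue()
print(report)
```

Sorting by "cumulative" surfaces the call paths that dominate total runtime; sorting by "tottime" instead highlights functions that are slow in their own right.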

2. They Avoid Premature Optimization

🔥 Beginners: “I’ll spend hours optimizing this loop before testing it.”
💡 Pros: “I’ll make it work first, then optimize only what matters.”

Donald Knuth famously said, "Premature optimization is the root of all evil." Many beginners waste time optimizing things that aren’t actually slow.

Example: A beginner might spend hours optimizing a loop that runs in 0.001 seconds, while the real slowdown is an extra database query that takes 500ms.

✅ Takeaway:

First, make your code work.
Then, profile and optimize only what’s slow.

3. They Pick the Right Data Structures (Not Just What’s Familiar)

🔥 Beginners: “I’ll just use a list.”
💡 Pros: “Which data structure is optimal for this task?”

Most slowdowns happen because of bad data structure choices. Pro developers pick the right tool instead of just going with the default.

Example: Fast lookups
❌ Slow (List - O(n))

users = ["alice", "bob", "charlie"]
if "bob" in users:  # Searches the entire list
    print("Found")

✅ Fast (Set - O(1))

users = {"alice", "bob", "charlie"}
if "bob" in users:  # Uses a hash table for instant lookup
    print("Found")

✅ Takeaway: When performance matters, choose the right data structure, not just the most familiar one.

4. They Automate Performance Checks

🔥 Beginners: “I’ll check for performance issues when I feel like it.”
💡 Pros: “I’ll use tools to automatically catch performance bottlenecks.”

Instead of manually looking for slow code, pro developers rely on automated tools that flag inefficiencies.

Example:

Python: py-spy (lightweight sampling profiler)
JavaScript: Chrome DevTools Performance Monitoring
Java: JMH (Java Microbenchmark Harness)
AI-assisted code reviews: Tools like CodeAnt analyze your code when you push to GitHub (or another host) and suggest performance improvements, including automatic fixes.

✅ Takeaway: Set up automated checks so you catch performance issues early—before they hit production.

5. They Think About Performance From Day One

🔥 Beginners: “I’ll optimize later.”
💡 Pros: “I’ll write efficient code from the start.”

While premature optimization is bad, writing slow code from the start is worse. Pro developers avoid common pitfalls before they become real problems.

Example: Writing efficient loops from the start
❌ Slow (Unnecessary .append())

result = []
for i in range(10**6):
    result.append(i * 2)  # This is slow

✅ Fast (List Comprehension - Optimized from the Start)

result = [i * 2 for i in range(10**6)]  # Faster, more efficient

✅ Takeaway: Small choices add up. Think about performance as you write, rather than fixing it later.

🚀 Final Thoughts: Lessons Learned the Hard Way

Thanks for reading! These are some of the tips I’ve personally bookmarked for myself—things I’ve learned the hard way while coding, talking to dev friends, and working on real projects.

When I first started, I used to guess why my code was slow instead of measuring. I’d optimize random parts of my code and still wonder why things weren’t getting faster. Over time, I realized that pro developers don’t just “write fast code” by instinct—they use tools, measure, and optimize what actually matters.

I wrote this to save you from making the same mistakes I did. Hopefully, now you have a clearer roadmap to writing faster, more efficient code—without the frustration I went through! 🚀

If you found this helpful, bookmark it for later, and feel free to share it with a fellow dev who might be struggling with slow code too.

Happy coding! 😊

How to Use Python's Built-in Profiling Tools: Examples and Best Practices

Vivek Sahu — Tue, 25 Mar 2025 16:11:21 +0000

Python is known for its simplicity and readability, making it a favorite among developers. But this simplicity sometimes comes at the cost of performance. When your Python application grows or needs to handle larger workloads, understanding what's happening under the hood becomes crucial.

While many developers reach for third-party profiling tools, Python's standard library already comes packed with powerful profiling capabilities that are often overlooked or underutilized.

In this article, you'll learn how to use these built-in profiling tools beyond their basic usage. You'll discover how to combine and leverage them to gain deep insights into your code's performance without installing additional packages.

Prerequisites
The Built-in Profiling Arsenal
- The timeit Module
- The cProfile Module
- The pstats Module
- The profile Module
Practical Experiments
Best Practices
Conclusion
References and Further Reading

Prerequisites

Before diving into the profiling techniques, make sure you have:

Python 3.6+: All examples in this article are compatible with Python 3.6 and newer versions.
Basic Python Knowledge: You should be comfortable with Python fundamentals like functions, modules, and basic data structures.
A Test Environment: Either a local Python environment or a virtual environment where you can run the code examples.

No external libraries are required for this tutorial as we'll be focusing exclusively on Python's built-in profiling tools. You can verify your Python version this way:

# Verify your Python version
import sys
print(f"Python version: {sys.version}")

The Built-in Profiling Arsenal

Python ships with several profiling tools in its standard library. Let's explore each one and understand their various strengths.

The `timeit` Module

Most Python developers are familiar with the basic timeit usage:

import timeit

# Basic usage
execution_time = timeit.timeit('"-".join(str(n) for n in range(100))', number=1000)
print(f"Execution time: {execution_time} seconds")

# Sample output:
# Execution time: 0.006027 seconds

This basic example measures how long it takes to join 100 numbers into a string with hyphens. The number=1000 parameter tells Python to run this operation 1,000 times and return the total execution time, which helps average out any random fluctuations.

However, timeit offers much more flexibility than most developers realize. Let's explore some powerful ways to use it:

Setup Code Separation:

setup_code = """
data = [i for i in range(1000)]
"""

test_code = """
result = [x * 2 for x in data]
"""

execution_time = timeit.timeit(stmt=test_code, setup=setup_code, number=100)
print(f"Execution time: {execution_time} seconds")

# Sample output:
# Execution time: 0.001420 seconds

In this example, we separate the setup code from the code being timed. This is extremely useful when:

You need to create test data but don't want that time included in your measurement
You're timing a function that relies on imports or variable definitions
You want to reuse the same setup for multiple timing tests

The advantage is that only the code in test_code is timed, while the setup runs just once before the timing begins.
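
A single timing run can be skewed by whatever else the machine is doing. The same module offers timeit.repeat, which runs the whole measurement several times so you can take the minimum as the least-disturbed figure. A short sketch reusing the statement from above; the repeat and number values are arbitrary:

```python
import timeit

timings = timeit.repeat(
    stmt="result = [x * 2 for x in data]",
    setup="data = list(range(1000))",
    repeat=5,    # five independent measurements
    number=100,  # each one runs the statement 100 times
)

best = min(timings)
print(f"Best of {len(timings)}: {best:.6f} seconds per 100 runs")
```

The minimum is usually the most useful number here, since the slower runs mostly reflect interference from other processes rather than your code.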

Comparing Functions:

def approach_1(data):
    return [x * 2 for x in data]

def approach_2(data):
    return list(map(lambda x: x * 2, data))

data = list(range(1000))

time1 = timeit.timeit(lambda: approach_1(data), number=100)
time2 = timeit.timeit(lambda: approach_2(data), number=100)

print(f"Approach 1: {time1} seconds")
print(f"Approach 2: {time2} seconds")
print(f"Ratio: {time2/time1:.2f}x")

# Sample output:
# Approach 1: 0.001406 seconds
# Approach 2: 0.003049 seconds
# Ratio: 2.17x

This example demonstrates how to compare two different implementations of the same functionality. Here we're comparing:

A list comprehension approach
A map() with lambda approach

By using lambda functions, we can pass existing data to our functions when timing them. This directly measures real-world scenarios where your functions are working with existing data. The ratio calculation makes it easy to understand exactly how much faster one approach is than the other.

In this case, we can see the list comprehension is about 2.17 times faster than the map approach for this specific operation.
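A single measurement like this can be noisy, because other processes on your machine compete for CPU time. One common way to reduce that noise is to repeat the measurement several times with timeit.repeat() and keep the fastest run. The sketch below assumes the same two approaches; the best_of helper is a hypothetical name introduced here.

```python
import timeit

def approach_1(data):
    return [x * 2 for x in data]

def approach_2(data):
    return list(map(lambda x: x * 2, data))

data = list(range(1000))

def best_of(func, repeat=5, number=100):
    """Return the fastest per-call time seen across several repeated runs."""
    timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(timings) / number

best_1 = best_of(approach_1)
best_2 = best_of(approach_2)

print(f"Approach 1 best: {best_1 * 1e6:.2f} microseconds per call")
print(f"Approach 2 best: {best_2 * 1e6:.2f} microseconds per call")
```

The minimum is used rather than the average because slower runs are typically caused by interference from the rest of the system, not by the code being measured.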

The `cProfile` Module

cProfile is Python's C-based profiler that provides detailed statistics about function calls. Many developers use it with its default settings:

import cProfile

def my_function():
    total = 0
    for i in range(100000):  # Reduced for faster execution
        total += i
    return total

cProfile.run('my_function()')

# Sample output:
#          4 function calls in 0.002 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.000    0.000    0.002    0.002 <string>:1(<module>)
#         1    0.002    0.002    0.002    0.002 :1(my_function)
#         1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This basic example runs the profiler on a simple function that sums the numbers from 0 to 99,999. The output provides several key pieces of information:

ncalls: How many times each function was called
tottime: The total time spent in the function (excluding time in subfunctions)
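percall: Time per call. The first percall column is tottime divided by ncalls; the second is cumtime divided by the number of primitive (non-recursive) calls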
cumtime: Cumulative time spent in this function and all subfunctions
filename:lineno(function): Where the function is defined

This gives you a comprehensive view of where time is being spent in your code, but there's much more you can do with cProfile.
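The difference between tottime and cumtime is easiest to see with a function that does almost nothing itself but calls something expensive. The following is a small sketch, assuming Python 3.9 or later for get_stats_profile(); the inner and outer functions are invented for the example.

```python
import cProfile
import pstats

def inner():
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def outer():
    # Does almost no work itself; nearly all of its time is spent inside inner()
    return inner()

profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# get_stats_profile() requires Python 3.9+
summary = pstats.Stats(profiler).get_stats_profile()
for name in ("outer", "inner"):
    fp = summary.func_profiles[name]
    print(f"{name}: ncalls={fp.ncalls} tottime={fp.tottime:.3f} cumtime={fp.cumtime:.3f}")
```

You should see that outer has a tottime close to zero but a cumtime at least as large as that of inner, because cumtime includes the time spent in everything outer calls.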

The real power comes from advanced usage techniques:

Sorting Results:

import cProfile
import pstats

# Profile the function
profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()

# Create stats object
stats = pstats.Stats(profiler)

# Sort by different metrics
stats.sort_stats('cumulative').print_stats(10)  # Top 10 functions by cumulative time
stats.sort_stats('calls').print_stats(10)       # Top 10 functions by call count
stats.sort_stats('time').print_stats(10)        # Top 10 functions by time

# Sample output for cumulative sorting:
#          2 function calls in 0.002 seconds
#
#    Ordered by: cumulative time
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.002    0.002    0.002    0.002 :1(my_function)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This example demonstrates how to control the profiling process and sort the results in different ways. The advantages are:

You can enable/disable profiling around specific sections of code
You can sort results by different metrics to identify different types of bottlenecks:
- cumulative: Find functions that consume the most time overall (including subfunctions)
- calls: Find functions called most frequently
- time: Find functions with the highest self-time (excluding subfunctions)
Limit output to only the top N results with print_stats(N)

This flexibility lets you focus on specific performance aspects of your code.
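If you prefer not to rely on string keys, pstats also provides the SortKey enum (Python 3.7+), which avoids typos in sort names. Here is a self-contained sketch that writes the report into a string buffer so it can be inspected or logged:

```python
import cProfile
import io
import pstats
from pstats import SortKey

def my_function():
    total = 0
    for i in range(100000):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()

# Send the report to a buffer instead of printing directly to stdout
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(SortKey.CUMULATIVE).print_stats(5)

report = buffer.getvalue()
print(report)
```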

Filtering Results:

stats.strip_dirs().print_stats()  # Remove directory paths for cleaner output
stats.print_stats('my_module')   # Only show results from my_module

# Sample output with strip_dirs():
#          2 function calls in 0.002 seconds
#
#    Random listing order was used
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.002    0.002    0.002    0.002 :1(my_function)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

These filtering techniques are invaluable when working with larger applications:

strip_dirs() removes directory paths, making the output much more readable
print_stats('my_module') treats the string as a regular expression and shows only entries whose filename:lineno(function) text matches it, letting you focus on your code rather than library code

This is particularly useful when profiling large applications where the full output might include hundreds or thousands of function calls.
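Because the restriction passed to print_stats() is a regular expression, you can also filter by function name rather than by module. The sketch below uses three hypothetical functions and keeps only those whose names start with load_:

```python
import cProfile
import io
import pstats

# Hypothetical functions standing in for application code
def load_users():
    return list(range(1000))

def load_orders():
    return list(range(2000))

def render():
    return len(load_users()) + len(load_orders())

profiler = cProfile.Profile()
profiler.enable()
render()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).strip_dirs()

# The pattern is matched against each "file:line(function)" entry
stats.print_stats(r"\(load_")

report = buffer.getvalue()
print(report)
```

The report header also tells you how many entries were hidden by the restriction, which is a quick sanity check that the pattern matched what you intended.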

The `pstats` Module

The pstats module is often overlooked but provides powerful ways to analyze profiling data:

Saving and Loading Profile Data:

import cProfile
import pstats

# Save profile data to a file
cProfile.run('my_function()', 'my_profile.stats')

# Load and analyze later
stats = pstats.Stats('my_profile.stats')
stats.strip_dirs().sort_stats('cumulative').print_stats(10)

# Sample output:
# Wed Mar 20 14:30:00 2024    my_profile.stats
#
#          4 function calls in 0.002 seconds
#
#    Ordered by: cumulative time
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
#         1    0.000    0.000    0.002    0.002 <string>:1(<module>)
#         1    0.002    0.002    0.002    0.002 :1(my_function)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This example shows how to save profiling data to a file and load it later for analysis. The key advantages are:

You can collect profiling data in one session or environment and analyze it in another
You can share profiling data with team members without them needing to run the code
You can save profiling data from production environments where interactive analysis might not be possible
You can compare different runs over time to track performance improvements

This approach separates data collection from analysis, making it more flexible for real-world applications.

Combining Multiple Profiles:

stats = pstats.Stats('profile1.stats')
stats.add('profile2.stats')
stats.add('profile3.stats')
stats.sort_stats('time').print_stats()

# This allows you to combine results from multiple profiling runs,
# useful for aggregating data from different test cases or scenarios

This powerful feature lets you combine results from multiple profiling runs. This is useful for:

Comparing performance across different inputs
Aggregating data from multiple test scenarios
Combining data from different parts of your application
Building a more comprehensive performance picture across multiple runs

By combining stats from multiple runs, you can identify patterns that might not be apparent from a single profiling session.
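Here is a runnable sketch of the same idea, assuming Python 3.9+ for get_stats_profile(). It writes two runs to temporary files (the file names and the work function are illustrative) and then checks that the call counts were added together:

```python
import cProfile
import os
import pstats
import tempfile

def work(n):
    return sum(range(n))

# Write two separate profiling runs to temporary files
tmp_dir = tempfile.mkdtemp()
paths = []
for run, size in enumerate((10_000, 20_000), start=1):
    profiler = cProfile.Profile()
    profiler.runcall(work, size)
    path = os.path.join(tmp_dir, f"profile{run}.stats")
    profiler.dump_stats(path)
    paths.append(path)

# Load the first file, then merge the second into it
combined = pstats.Stats(paths[0])
combined.add(paths[1])

ncalls = combined.get_stats_profile().func_profiles["work"].ncalls
print(f"work() calls across both runs: {ncalls}")  # work() calls across both runs: 2
```

Note that add() accumulates the numbers into a single set of statistics, so keep the individual files around if you also want to look at each run separately.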

The `profile` Module

The profile module is a pure Python implementation of the profiler interface. While it's slower than cProfile, it can be more flexible for specific cases:

import profile

def my_function():
    total = 0
    for i in range(100000):  # Using 100000 for faster execution
        total += i
    return total

profile.run('my_function()')

# Sample output:
#          5 function calls in 0.011 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.000    0.000    0.002    0.002 :0(exec)
#         1    0.009    0.009    0.009    0.009 :0(setprofile)
#         1    0.000    0.000    0.002    0.002 <string>:1(<module>)
#         1    0.000    0.000    0.011    0.011 profile:0(my_function())
#         0    0.000             0.000          profile:0(profiler)
#         1    0.002    0.002    0.002    0.002 :1(my_function)

The profile module works similarly to cProfile but offers advantages in specific scenarios:

It's implemented in pure Python, making it easier to modify if you need custom profiling behavior
You can subclass and extend it to implement custom profiling logic
It allows more fine-grained control over the profiling process
It's useful for profiling in environments where the C extension might not be available

While it's slower than cProfile (because it's implemented in Python rather than C), its flexibility makes it valuable for specialized profiling needs.

The profile module follows the same API as cProfile, so you can use all the same techniques for analyzing results.
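As a quick illustration of that shared API, the sketch below runs the same function through both profilers using runcall(), which profiles a single call and returns its result. The reports dictionary is just a convenience introduced for this example.

```python
import cProfile
import io
import profile
import pstats

def my_function():
    total = 0
    for i in range(100000):
        total += i
    return total

reports = {}
for module in (cProfile, profile):
    profiler = module.Profile()
    result = profiler.runcall(my_function)  # returns my_function's return value

    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(3)
    reports[module.__name__] = buffer.getvalue()

    print(f"{module.__name__}: result={result}")
```

The timings in the two reports will differ, since the pure Python profiler adds more overhead, but the code that collects and analyzes them is identical.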

Practical Experiments

Let's put these tools to practical use with some experiments.

Setup

First, create a simple Python module with various functions to profile:

# profiling_example.py

import time
import random

def process_data(data):
    result = []
    for item in data:
        result.append(process_item(item))
    return result

def process_item(item):
    # Simulate processing time
    time.sleep(0.0001)  # Small delay for demonstration purposes
    return item * 2

def generate_data(size):
    return [random.randint(1, 100) for _ in range(size)]

def process_data_optimized(data):
    return [process_item(item) for item in data]

def main():
    data = generate_data(50)
    result1 = process_data(data)
    result2 = process_data_optimized(data)
    assert result1 == result2
    return result1

if __name__ == "__main__":
    main()

Experiment 1: Basic vs Advanced `timeit` Usage

Let's compare different ways of timing our functions:

import timeit
from profiling_example import generate_data, process_data, process_data_optimized

# Method 1: Basic string evaluation (limited but simple)
setup1 = """
from profiling_example import generate_data, process_data
data = generate_data(5)  # Using a small size for demonstration
"""
basic_time = timeit.timeit('process_data(data)', setup=setup1, number=5)
print(f"Basic timing: {basic_time:.4f} seconds")

# Method 2: Using lambda for better control
data = generate_data(5)
advanced_time = timeit.timeit(lambda: process_data(data), number=5)
print(f"Advanced timing: {advanced_time:.4f} seconds")

# Method 3: Comparing implementations
data = generate_data(5)
original_time = timeit.timeit(lambda: process_data(data), number=5)
optimized_time = timeit.timeit(lambda: process_data_optimized(data), number=5)
print(f"Original implementation: {original_time:.4f} seconds")
print(f"Optimized implementation: {optimized_time:.4f} seconds")
print(f"Improvement ratio: {original_time/optimized_time:.2f}x")

# Sample output:
# Basic timing: 0.0032 seconds
# Advanced timing: 0.0034 seconds
# Original implementation: 0.0033 seconds
# Optimized implementation: 0.0034 seconds
# Improvement ratio: 0.98x

This experiment demonstrates three different approaches to timing code with the timeit module:

Method 1: Basic string evaluation – This approach evaluates a string of code after running the setup code. The advantages include:

Simple syntax for basic timing needs
Setup code runs only once, not during each timing run
Good for timing simple expressions

Method 2: Lambda functions – This more advanced approach uses lambda functions to call our functions directly. Benefits include:

Direct access to functions and variables in the current scope
No need to import functions in setup code
Better for timing functions that take arguments
More intuitive for complex timing scenarios

Method 3: Implementation comparison – This practical approach compares two different implementations of the same functionality. This is valuable when:

Deciding between alternative implementations
Measuring the impact of optimizations
Quantifying performance differences with a ratio

In this example, the list comprehension isn't significantly faster because the dominant cost is the time.sleep() call in both implementations. In real-world cases with actual computation instead of sleep, the difference is often more pronounced.

Experiment 2: Effective `cProfile` Analysis

Now let's use cProfile to identify bottlenecks:

import cProfile
import pstats
import io
from profiling_example import main

# Method 1: Basic profiling
cProfile.run('main()')

# Sample output snippet:
#         679 function calls in 0.014 seconds
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#      100    0.000    0.000    0.014    0.000 profiling_example.py:10(process_item)
#      100    0.014    0.000    0.014    0.000 {built-in method time.sleep}
#        1    0.000    0.000    0.007    0.007 profiling_example.py:4(process_data)
#        1    0.000    0.000    0.007    0.007 profiling_example.py:18(process_data_optimized)

# Method 2: Capturing and analyzing results
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Redirect output to string for analysis
s = io.StringIO()
stats = pstats.Stats(profiler, stream=s).sort_stats('cumulative')
stats.print_stats(10)  # Print top 10 functions by cumulative time
print(s.getvalue())

# Method 3: Focus on specific functions
s = io.StringIO()
stats = pstats.Stats(profiler, stream=s).sort_stats('cumulative')
stats.print_callers('process_item')  # Show what's calling this function
print(s.getvalue())

# Sample output for the callers analysis:
# Function was called by...
#       ncalls  tottime  cumtime
# profiling_example.py:10(process_item)  <-  
#     50    0.000    0.007  profiling_example.py:4(process_data)
#     50    0.000    0.007  profiling_example.py:18(process_data_optimized)

This experiment demonstrates three powerful cProfile techniques for identifying bottlenecks:

Method 1: Basic profiling – Using cProfile.run() to profile a function call provides an immediate overview of performance. This technique:

Gives you a quick snapshot of all function calls
Shows precisely where time is being spent
Is easy to use with minimal setup
Identifies the most time-consuming operations

In our example, we can immediately see that time.sleep() is consuming most of the execution time.

Method 2: Programmatic profiling and analysis – This approach gives you more control:

You can enable/disable profiling for specific sections of code
You can save the results to a variable for further analysis
You can customize how results are sorted and displayed
You can redirect output to a string or file for post-processing

This method is particularly useful for profiling specific parts of a larger application.
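On Python 3.8 and later, cProfile.Profile can also be used as a context manager, which makes it harder to forget the disable() call. A minimal sketch, with two made-up functions to show that only the code inside the with block is recorded:

```python
import cProfile
import io
import pstats

def expensive_section():
    return sorted(str(n) for n in range(50_000))

def cheap_section():
    return "done"

cheap_section()  # runs outside the profiler, so it is not recorded

# Profiling starts on entry to the block and stops on exit
with cProfile.Profile() as profiler:
    expensive_section()

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)

report = buffer.getvalue()
print(report)
```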

Method 3: Caller analysis – The print_callers() method is extremely valuable because it:

Shows which functions are calling your bottleneck functions
Helps identify which code paths are contributing to performance issues
Reveals how many times each caller invokes a particular function
Provides context that's crucial for understanding performance patterns

In our example, we can see that both process_data and process_data_optimized are calling process_item 50 times each, confirming they're contributing equally to the bottleneck.

This immediately shows us that process_item is the bottleneck, specifically the time.sleep() call inside it, and that both implementations call it equally often.
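The counterpart to print_callers() is print_callees(), which answers the opposite question: what does a given function call? The sketch below is self-contained and uses simplified versions of the example functions (without the sleep) so that it runs quickly.

```python
import cProfile
import io
import pstats

def process_item(item):
    return item * 2

def process_data(data):
    result = []
    for item in data:
        result.append(process_item(item))
    return result

def main():
    return process_data(list(range(50)))

profiler = cProfile.Profile()
profiler.runcall(main)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).strip_dirs()

# List everything that process_data calls, with call counts and times
stats.print_callees(r"\(process_data\)")

report = buffer.getvalue()
print(report)
```

Reading callers and callees together gives you a rough picture of the call graph around a hot spot without needing any external visualization tool.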

Experiment 3: Combining Tools for Real-world Profiling

In real-world scenarios, combining profiling tools gives the most complete picture:

import cProfile
import pstats
import timeit
from profiling_example import main, process_data, process_data_optimized, generate_data

# First, use timeit to get baseline performance of main components
data = generate_data(50)
time_process = timeit.timeit(lambda: process_data(data), number=3)
time_process_opt = timeit.timeit(lambda: process_data_optimized(data), number=3)
print(f"Process data: {time_process:.4f}s")
print(f"Process data optimized: {time_process_opt:.4f}s")

# Sample output:
# Process data: 0.0196s
# Process data optimized: 0.0194s

# Then, use cProfile for deeper insights
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Save stats for later analysis
profiler.dump_stats('profile_results.stats')

# Load and analyze
stats = pstats.Stats('profile_results.stats')
stats.strip_dirs().sort_stats('cumulative').print_stats(10)

# Sample output:
# Wed Mar 20 14:30:00 2024    profile_results.stats
#          659 function calls in 0.013 seconds
#    Ordered by: cumulative time
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.000    0.000    0.013    0.013 profiling_example.py:21(main)
#       100    0.000    0.000    0.013    0.000 profiling_example.py:10(process_item)
#       100    0.013    0.000    0.013    0.000 {built-in method time.sleep}

This experiment demonstrates a comprehensive real-world profiling strategy that combines multiple tools:

First step: High-level timing with timeit – We start with timeit to:

Get baseline performance metrics for specific functions
Compare different implementations directly
Measure overall execution time
Identify which high-level components might need optimization

This gives us a quick overview that both implementations take about the same time, confirming our earlier findings.

Second step: Detailed profiling with cProfile – Next, we use cProfile to:

Get a function-by-function breakdown of execution time
Identify specific bottlenecks at a granular level
See the number of calls to each function
Understand the call hierarchy

Third step: Saving and analyzing with pstats – Finally, we:

Save profiling data to a file for persistence
Load the data and apply filtering/sorting
Focus on the most time-consuming functions
Get a clean, readable output

This multi-tool approach provides several advantages:

You get both high-level and detailed insights
You can save profiling data for later comparison
You can share results with team members
You can track performance changes over time

In our example, we confirm that our main bottleneck is the time.sleep() call inside process_item, which accounts for most of the execution time. Without this combined approach, we might have missed important details or wasted time optimizing the wrong parts of our code.

This approach gives you both high-level timing information and detailed profiling data, allowing for a comprehensive performance analysis.

Best Practices

Based on our experiments, here are some best practices for effective profiling:

Start with the right tool for the job:
- Use timeit for quick, targeted measurements of specific functions or code blocks
- Use cProfile for comprehensive program analysis when you need to understand how all parts of your code interact
- Use pstats for in-depth analysis of profiling data when you need to filter, sort, and interpret complex profiling results

For example, if you're just trying to decide between two implementations of a sorting algorithm, timeit is sufficient. But if you're trying to understand why your entire web application is slow, start with cProfile.

Profile realistic workloads:
- Synthetic benchmarks often mislead because they don't reflect real-world usage patterns
- Use production-like data sizes to see how your code scales with realistic inputs
- Run multiple iterations to account for variance and ensure your results are reliable

A function that's fast with 10 items might be painfully slow with 10,000. Always test with data sizes that match your production needs.
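To make this concrete, here is a small sketch that times a deliberately quadratic function at three input sizes. The has_duplicates function and the sizes are chosen purely for illustration.

```python
import timeit

def has_duplicates(items):
    # Deliberately quadratic: compares every pair of items
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

timings = {}
for size in (10, 100, 1000):
    data = list(range(size))  # no duplicates, so every pair gets compared
    runs = timeit.repeat(lambda: has_duplicates(data), repeat=3, number=1)
    timings[size] = min(runs)
    print(f"{size:>5} items: {timings[size]:.6f} s")
```

Each tenfold increase in input size should cost roughly a hundred times more, which is exactly the kind of behavior a benchmark on 10 items would hide.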

Focus on the right metrics:
- cumulative time shows the total time spent in a function and all its calls. It’s useful for finding the overall most expensive operations.
- tottime shows time spent only in the function itself. It’s useful for finding inefficient implementations.
- ncalls helps identify functions called excessively. It’s useful for finding redundant operations.

For example, a function with a small tottime but large cumulative time might be efficient itself but is calling expensive subfunctions.

Save profiling data for comparison:
- Use profiler.dump_stats() to save data from different versions of your code
- Compare before and after optimization to quantify improvements
- Track performance over time to catch regressions early

This practice helps you prove that your optimizations are actually working and prevents performance from degrading over time.
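A before-and-after comparison might look like the sketch below. It assumes Python 3.9+ for get_stats_profile(), and the two count_common functions, the file names, and the input sizes are all hypothetical stand-ins for your own code.

```python
import cProfile
import os
import pstats
import tempfile

def count_common_slow(a, b):
    # b is a list, so every "in" check is a linear scan
    return sum(1 for x in a if x in b)

def count_common_fast(a, b):
    b_set = set(b)  # set membership checks take roughly constant time
    return sum(1 for x in a if x in b_set)

a = list(range(2000))
b = list(range(1000, 3000))

tmp_dir = tempfile.mkdtemp()
totals = {}
results = {}
for label, func in (("before", count_common_slow), ("after", count_common_fast)):
    profiler = cProfile.Profile()
    results[label] = profiler.runcall(func, a, b)

    # Save each version's profile so it can be reloaded and compared later
    path = os.path.join(tmp_dir, f"{label}.stats")
    profiler.dump_stats(path)
    totals[label] = pstats.Stats(path).get_stats_profile().total_tt

print(f"Before: {totals['before']:.3f} s, after: {totals['after']:.3f} s")
```

Checking that both versions return the same result, as well as comparing their timings, guards against an optimization that is faster only because it is wrong.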

Look for the 80/20 rule:
- 80% of time is often spent in 20% of the code, the so-called "hot spots".
- Focus optimization efforts on these hot spots, starting with the functions that have the highest cumulative time.
- Don't optimize what isn't slow – premature optimization wastes time and can make code more complex.

For example, in our experiments, the time.sleep() call was the clear bottleneck. Optimizing anything else would be pointless until that's addressed.

By following these practices, you'll make the most efficient use of your profiling tools and focus your optimization efforts where they'll have the greatest impact.

Conclusion

Python's built-in profiling tools offer a powerful arsenal for identifying and resolving performance bottlenecks in your code. By leveraging the timeit, cProfile, and pstats modules effectively, you can get deep insights into your application's performance without relying on third-party tools.

Each tool serves a specific purpose:

timeit helps you measure execution time of specific code snippets
cProfile gives you a comprehensive view of function calls and execution time
pstats lets you analyze, filter, and interpret profiling data
profile provides a customizable profiling interface for special cases

The greatest power comes from combining these tools, as we demonstrated in our practical experiments. This allows you to approach performance optimization systematically:

Identify high-level performance concerns with timeit
Drill down into specific bottlenecks with cProfile
Analyze and interpret results with pstats
Make targeted optimizations based on data, not guesswork

Remember that profiling is as much an art as it is a science. The goal isn't just to make code faster, but to understand why it's slow in the first place. With the techniques demonstrated in this article, you're well-equipped to tackle performance challenges in your Python applications.

Apply these profiling techniques to your own code, and you'll be surprised at what you discover. Often, the bottlenecks aren't where you expect them to be!

How to Speed Up Website Loading by Removing Extra Bits and Bytes

Alex Tray — Mon, 24 Feb 2025 17:05:58 +0000

Let’s start with an interesting fact: according to research done by Akamai, a 1-second delay in loading a website’s page can decrease the conversion rate by 7%.

We are currently living in a fast-paced world, where time is money for everyone. People expect their favorite websites to load lightning-fast. A slow loading speed will not only make them go to the competitor but will also hurt the website's ranking in the SERP.

But the main question is, who’s the culprit? Those extra bits and bytes that almost every site contains: unnecessary code files, unoptimized images, and more. But by following the right approach, you can easily strip away these inefficiencies and achieve excellent loading speed.

In this article, I will be discussing that approach in detail, so stick around with me till the very end.

Why Does Loading Speed Matter?
How to Remove Extra Bits & Bytes from the Website – Different Strategies
Tools That You Can Use to Streamline the Process
Wrapping Up

Why Does Loading Speed Matter?

There are several reasons why the loading speed of a website is considered essential. Here are some of the major ones.

Google Ranking Factor:

Website loading speed is a confirmed ranking factor. This means that search engines like Google definitely consider the loading time when evaluating a website’s quality. Usually, the ideal loading speed is between 0 and 2 seconds. However, 3 seconds is also sometimes acceptable.

If your site does not meet this criterion, there is a high probability that it will receive a penalty from Google. This will result in lower rankings in the targeted niche, which no webmaster or business wants.

Impact on User Experience:

A slow loading speed is capable of single-handedly destroying the entire user experience. When the website does not load quickly in front of the visitor, they may close it and move on to another site to find the required information, product, or service.

This will decrease the number of user engagements and increase the overall bounce rate of the site. And a high bounce rate increases the chances of facing a penalty from Google.

Negative Brand Perception:

For online businesses or brands, their authority and image are everything. When their official site takes too much time to load, it ultimately damages the brand’s perception and credibility in visitors’ minds. Visitors will wonder how you can deliver a top-notch service or product when you aren’t able to properly manage a website.

This negative impression will not only reduce customer engagement but also conversions.

Retaining Mobile Users:

Mobile contributes to 58% of the global internet traffic. Mobile networks are also often slower than Wi-Fi, especially for people living in rural areas. That’s why you should always prioritize loading speed to retain mobile users.

How to Remove Extra Bits & Bytes from the Website – Different Strategies

Here are some of the most proven strategies you can utilize to remove extra bits and bytes from your websites.

Perform Code Optimization:

Excessive HTML, CSS, and JavaScript can greatly slow down a website. Due to the large code file, the host server will have to transfer more packets to the client browser, ultimately resulting in slow loading.

To resolve this issue, it is always recommended to perform code optimization. The most widely known and used technique for this purpose is minification. It refers to the process of removing all the:

Unnecessary characters
White spaces
Line breaks
Comments
Unused elements.

But you’ll want to make sure that the code works as before, even after minification.

Optimizing code boosts application performance by reducing execution time and resource consumption. Refactor inefficient loops, minimize database queries, and leverage caching to enhance speed. You can use profiling tools to identify bottlenecks and streamline functions for smoother, faster performance.

To demonstrate better, below I have discussed an example:

Unoptimized JavaScript Code:

function greet(name) {
    if (!name) {
        console.log("Hello, Guest!");
    } else {
        console.log("Hello, " + name + "!");
    }
}
greet("John");

Minified Version:

function greet(n){console.log("Hello, "+(n||"Guest")+"!")}greet("John");

As you can see, I created the minified version by removing all the line breaks and whitespace. Apart from this, I used a shortened parameter name, “n” instead of “name.” Finally, I also replaced the if/else statement with the shorter n || "Guest" expression.

This is how you can easily condense the entire HTML, CSS, and JavaScript code of your website, and enhance the overall loading speed.

Just keep in mind that there are multiple downsides of code minification. For instance, it significantly impacts code readability and can cause challenges in debugging and maintenance. So use this approach judiciously.
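To give a feel for what a minifier does under the hood, here is a deliberately naive sketch in Python that minifies a small CSS snippet. The minify_css function is a toy written for this illustration, not how any particular tool works: it ignores edge cases such as strings, url() values, and selectors that depend on whitespace, so use a dedicated minifier for real projects.

```python
import re

def minify_css(css: str) -> str:
    """Toy CSS minifier: strips comments and collapses whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove comments
    css = re.sub(r"\s+", " ", css)                         # collapse whitespace runs
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)           # trim around punctuation
    css = css.replace(";}", "}")                           # drop last semicolon in a block
    return css.strip()

source = """
/* Primary button */
.button {
    color: #ffffff;
    background-color: #0a66c2;
    padding: 8px 16px;
}
"""

minified = minify_css(source)
print(minified)
# .button{color:#ffffff;background-color:#0a66c2;padding:8px 16px}
print(f"{len(source)} characters reduced to {len(minified)}")
```

Keeping the readable source in version control and generating the minified file as a build step is one way to get the speed benefit without giving up maintainability.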

Image & Media Optimization:

Apart from code, unoptimized images, logo files, and other media files are often the main culprits behind a website’s slow loading speed, so you need to optimize them as well. There are numerous things you can do in this regard.

First of all – you should reduce the image size in terms of storage. It is generally recommended that each picture should be less than 500 KB in size. But note that this size can vary depending on the use case.

It’s also a good idea to choose next-generation picture formats like WebP instead of typical ones like JPEG or PNG. When it comes to video files, it’s also helpful if you go with the embedded ones from platforms like YouTube.

Now, let me explain all this with a proper example (Before & After).

Let’s say that a website uses a 2MB JPEG image for its blog post. Its optimization process will involve the following steps:

Resize the image first. The recommended dimensions are 1200x800.
Compress the image size using image compression tools (we’ll discuss one such tool later in this article)
Now, convert the JPEG file into WebP format.
Add alternative text before publishing

After optimization:

The image file size will now be reduced to somewhere around 120 KB.
Your website will experience better loading speed as well as an improved user experience.
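The arithmetic behind the resize step and the size saving can be sketched in a few lines of Python. Both helper functions are hypothetical, the 4000x3000 source dimensions are an assumed example, and 2 MB is treated as 2048 KB.

```python
def fit_within(width, height, max_width=1200, max_height=800):
    """Scale (width, height) down to fit the box while keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)  # 1.0 means never upscale
    return round(width * scale), round(height * scale)

def percent_saved(before_kb, after_kb):
    """Percentage reduction in file size, rounded to one decimal place."""
    return round((1 - after_kb / before_kb) * 100, 1)

print(fit_within(4000, 3000))    # (1067, 800)
print(fit_within(3000, 2000))    # (1200, 800)
print(percent_saved(2048, 120))  # 94.1
```

Notice that a 4:3 photo cannot fill a 1200x800 box exactly without cropping or distortion, so scaling by the tighter of the two ratios keeps the image undistorted.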

One more tip that you can consider is lazy loading. This means only loading the images and videos when they are about to be consumed.

By taking care of these few things, you can efficiently optimize images and media files to achieve faster loading speeds.

Manage Plugins & Scripts:

Your website may contain unused plugins and scripts that can cause bloat. So, to remove the extra bits and bytes, it is essential to perform regular check-ins.

First, make sure you deactivate and delete all the plugins that aren’t needed. Then, start exploring more lightweight alternatives for plugins that you are actively using. If you find any, go for them and uninstall the bulky ones to improve performance and enhance security, especially for processes like identity verification. Ensure you’re using the latest, most optimized version of each plugin you keep.

For example, Revolution Slider is a heavy plugin. It loads large scripts and images on every page, even when not needed. This ultimately affects the overall website speed. Some of its lightweight alternatives that you might consider for this include Smart Slider 3, or any other CSS-based slider.

Next comes script management. Here you should first limit any third-party scripts, such as excessive tracking code, social media widgets, and embedded content. Apart from this, don’t forget to disable scripts entirely on the pages where they aren’t required.

One useful example here is Google Analytics, which loads tracking scripts on every page, increasing the request time. To fix this issue, you can use Google Tag Manager to load the scripts only when they are needed.

Additionally, you can use no-code workflow automation tools like Zapier, Make, or Uncanny Automator which help streamline processes by reducing reliance on heavy plugins and scripts.

Server & Hosting Upgrades:

This is the final strategy that you can consider. Your hosting provider plays a key role in deciding the loading speed of the website. So, it’s a good idea to upgrade your hosting plan and get it from a reputable and credible service.

Also, do not forget to enable server-side compression. Doing so will automatically reduce the file sizes before transmission. Optimizing database performance is equally crucial, as database observability enables database pipeline analytics, helping to identify inefficiencies, reduce query execution time, and enhance overall site responsiveness.
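To get a feel for how much server-side compression can save, here is a rough sketch using Python's built-in gzip module. In practice you would switch compression on in your web server or hosting panel rather than write code for it; the HTML snippet below is a made-up example of the repetitive markup that compresses well.

```python
import gzip

# Hypothetical page body: repetitive markup, which is typical of HTML
html = ('<li class="product"><a href="/item">Item</a></li>\n' * 500).encode("utf-8")

compressed = gzip.compress(html)

print(f"Original:   {len(html)} bytes")
print(f"Compressed: {len(compressed)} bytes")
print(f"Saved:      {100 * (1 - len(compressed) / len(html)):.1f}%")
```

Real pages are less repetitive than this sample, so expect smaller savings, but text-based assets such as HTML, CSS, and JavaScript generally compress well.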

Also, take steps to optimize the database queries. You can do this by removing unnecessary data and by putting caching mechanisms in place. There are also specialized plugins for this, like WP-Optimize, which cleans up unnecessary data and saves you valuable time and effort.

You should also start caching queries. Store the results of the most frequent ones in memory. This will significantly reduce database load.

-- SQL_CACHE is a MySQL/MariaDB query cache hint; the query cache was removed in MySQL 8.0.
SELECT SQL_CACHE * FROM products WHERE category = 'Laptops';

This prevents the server from re-executing the same query repeatedly.
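If your database does not offer a built-in query cache, the same idea can live in the application layer. Below is a minimal sketch of an in-memory cache with an expiry time. The names createQueryCache, runQuery, and ttlMs are assumptions for illustration, and the database call is a stand-in rather than a real driver.

```javascript
// Minimal in-memory query cache (illustrative sketch, not a library API).
// runQuery is a hypothetical function that executes SQL and returns rows.
function createQueryCache(runQuery, ttlMs = 60_000, now = () => Date.now()) {
  const cache = new Map(); // SQL text -> { rows, expiresAt }

  return function cachedQuery(sql) {
    const hit = cache.get(sql);
    if (hit && hit.expiresAt > now()) {
      return hit.rows; // served from memory, the database is not touched
    }
    const rows = runQuery(sql); // cache miss or expired entry
    cache.set(sql, { rows, expiresAt: now() + ttlMs });
    return rows;
  };
}

// Usage with a stand-in database call that counts how often it runs.
let executions = 0;
const fakeRunQuery = () => {
  executions += 1;
  return [{ id: 1, category: 'Laptops' }];
};

const query = createQueryCache(fakeRunQuery, 60_000);
query("SELECT * FROM products WHERE category = 'Laptops'");
query("SELECT * FROM products WHERE category = 'Laptops'");
console.log(executions); // 1, because the second call is answered from the cache
```

The expiry time is the main design choice here: a short one keeps results fresh, while a long one saves more database work but can serve stale data.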

So, these are some of the proven strategies you can apply to eliminate additional bits & bytes from the website to achieve faster loading.

Tools That You Can Use to Streamline the Process

To simplify the process of optimizing website loading speed, you can consider utilizing the following tools.

Minifier

First of all, we have Minifier, a tool designed to automate code minification with a single click. It is available for free and works with HTML, CSS, and JavaScript code.

Besides this, the tool features an intuitive interface so that you can quickly navigate through it. The minifier is trained according to both development and minification to ensure maximum speed and accuracy in the output.

All you need to do is paste the code or upload the code file into the tool, hit the “Minify” button, and get a condensed version. You can check out the screenshot below to get a better idea of how it works.

Minifier also offers a wide variety of other useful tools you can use if needed. Some notable options include a JSON minifier and an XML formatter, among others.

So now there is no need to spend time and effort on manually minifying your code for better loading speed. You can just use this tool and get the job done with a single click.

TinyPNG

Many of you may have heard of or even used this tool. It is an image compression tool that helps you reduce your image sizes for optimization. The good thing is that TinyPNG keeps the picture’s resolution unchanged and its visible quality close to the original, even after compression.

All you need to do is upload the required photo from your local storage, and the tool will automatically provide a compressed version. Don’t worry about the file format, as TinyPNG supports PNG, JPEG (JPG), and WebP, among others.

The tool even provides the percentage of how much the uploaded image was compressed, like -51%, and so on. It also mentions the size of the compressed photo in terms of KBs. So, in case you are not satisfied with the file size, you can further compress it.
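If you want to check a figure like that yourself, the arithmetic is simple. The sketch below uses a hypothetical helper name and made-up file sizes. It is not how TinyPNG computes its number internally, only the usual percentage-change formula.

```javascript
// Percentage change in file size after compression (negative = smaller).
function sizeChangePercent(originalBytes, compressedBytes) {
  if (originalBytes <= 0) {
    throw new RangeError('originalBytes must be greater than zero');
  }
  return Math.round(((compressedBytes - originalBytes) / originalBytes) * 100);
}

// Hypothetical sizes: a 200 KB image compressed down to 98 KB.
console.log(`${sizeChangePercent(200_000, 98_000)}%`); // -51%
```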

PNG to WebP Converter

As I mentioned earlier, I recommend using next-gen image formats like WebP instead of older formats when possible. PNG is still one of the most widely used formats, and you can use this PNG to WebP converter to turn your PNG files into WebP.

It’s available for free and does not ask for registration/signup. Simply visit the page and start performing conversions. The conversion is performed without causing any damage to the image’s quality and formatting.

The tool also offers many extra features. For instance, you can adjust both the image’s width and height. You can also set image quality (WebP compression level) if required. And it doesn’t stop here – you can even select the right fit for the photo from the following options:

Max
Crop
Scale

Google PageSpeed Insights

How can you enhance the loading speed of the website when you don’t even know which elements are causing issues? For this purpose, Google PageSpeed Insights is the best solution. It is developed and maintained by Google.

The tool analyzes the given page URL and highlights the issues that are causing slow loading. It also provides four different scores (0-100) for evaluation. These include:

Performance
Accessibility
Best Practices
SEO

The good thing is that Google PageSpeed Insights evaluates the page for both mobile and desktop users, and provides the results separately. The areas that need improvement are highlighted in red, along with the steps you can take to fix them. The good parts are marked in green.

By utilizing this tool, you can easily evaluate your website and then make efforts to improve the loading speed.

Cloudflare

Last but not least, Cloudflare is a good tool that helps enhance the loading speed of a website by using its global content delivery network (CDN). With this feature, it caches static content across different servers worldwide. This ultimately reduces the overall latency and improves loading speed for users in different locations.

Besides this, Cloudflare also offers a bunch of other features. For example, it automatically minifies HTML, CSS, and JavaScript files. It can even compress and convert images into next-gen formats, especially WebP.

It also offers fast DNS resolution that reduces lookup times and helps the page load faster. On top of that, Cloudflare protects the site from DDoS attacks.

Wrapping Up

If you want to experience higher ranking and increased user engagement, then you need to optimize your website’s loading speed. The extra bits and bytes like code files, media, and so on can cause real hurdles – but don’t worry.

By using these strategies and tools, you’ll be able to speed up page loading in no time. I hope you found this article interesting and valuable.

How to Use Skeleton Screens to Improve Perceived Website Performance

Timothy Olanrewaju — Wed, 23 Oct 2024 19:59:15 +0000

When you’re building a website, it’s important to make sure that it’s fast. People have little to no patience for slow-loading websites. So as developers, we need to use all the techniques available to us to speed up our site’s performance.

And sometimes, we need to make users think that something is happening when they’re waiting for a page to load so they don’t give up and leave the site.

Fast webpage loading speed is important these days because humans’ attention spans are shrinking. According to statistics on the average human attention span, the average page visit lasts less than a minute, with users often leaving web pages in just 10-20 seconds.

This means that we as developers have had to come up with strategies to keep users engaged while waiting for their requested web page content to load. And this led to the concept of the Skeleton Screen.

In this article, we’ll look at what skeleton screens are and how effective they are at enhancing the user experience, and then we’ll build a skeleton screen of our own!

What is a Skeleton Screen?

A skeleton screen is like a sketch of a webpage that displays before the final page fully loads. It gives you a glimpse of the form and positioning of elements on your screen (like text, images, and buttons) which are represented by a placeholder.

Here is what a YouTube skeleton screen looks like:

When you visit a website that uses skeleton screens, the skeleton screen appears first while the content is being fetched. When the content finally gets fetched, it gradually replaces the skeleton screen until the screen is fully populated.

That is what brought about the name Skeleton Screen – because the bare bones akin to a skeleton appear first before being fleshed out by real content.

Skeleton screens take the appearance or form of elements they are meant to “stand in place of” – meaning oval-shaped placeholders are replaced by oval-shaped elements on full loading, and so on.

The ultimate goal of the skeleton screen is to make the waiting game less painful by giving users something to focus on. It has nothing to do with actual load time and everything to do with providing a distraction so the waiting time feels shorter. It can also reassure users that content is indeed coming. Clever, right?

The Psychology Behind Skeleton Screens

Here is where things get interesting. You might already be wondering what the reasoning behind such an invention was.

Based on what we’ve already discussed, you probably agree that they are all about “Perceived Performance”. It’s less about how long users have to wait and more about how long it feels like they’re waiting.

If you’ve ever been stuck in traffic, you’d know there is a difference in feeling when you’re moving forward versus sitting still. Moving traffic, even if it’s slow, is better than being stuck in a total gridlock.

The same applies to a user who’s visiting a webpage. A visible and engaging placeholder is better than being greeted with a blank screen while waiting for the final content to show.

With skeleton screens, it's like “Hey, here is the form of the page content you’re looking for, but please, exercise some patience while we get you the real thing!”

This fits perfectly into the Zeigarnik Effect, a psychological principle suggesting that we remember incomplete tasks better than completed ones. Think of it like leaving a jigsaw puzzle half-finished on your table – your brain stays engaged, eager to see the final picture.

Similarly, when users see a skeleton screen, they remain mentally hooked, anticipating the moment when the content will fully load.

Skeleton Screens vs Spinners and Progress Bars

Spinners and progress bars might seem like a viable alternative to skeleton screens, but do they have the same effect on users? The answer is – not quite.

With spinners and progress bars, the load time feels uncertain, and it’s a bit like watching a clock tick – time seems to move slower, as focusing on the hands of the clock makes the wait seem longer and more frustrating.

Skeleton screens, on the other hand, add an interesting extra layer of providing a visual cue of expected content rather than just displaying an indicator (which is what spinners and progress bars do).

Interfaces that use skeleton screens make the user scan the screen thinking things like, “That rectangle must be an image or video, and these blocks look like they are for text”. They don’t leave users idle but keep their brains and eyes engaged.

Is a Skeleton Screen Just a Visual Illusion?

Yes, skeleton screens are a bit of an illusion. They don’t speed up load times – rather, they just make it feel faster.

But here’s the thing: if not done well, this trick can backfire. Users expect that once they see the skeleton screen, the real content should follow quickly. If it doesn’t, frustration creeps in.

Also, adding motion to skeleton screens makes the illusion more effective by decreasing the perceived waiting time. It is common to see sliding effects (left to right) and pulse effects (opacity fading in and out) used in skeleton screens.

Finally, for best results, skeleton screens should be neutral in color. This is important as it helps to create a smooth and subtle loading experience without distracting or overwhelming users.

How to Build a Skeleton Screen with React

Now that you know what a skeleton screen is all about, let’s build our own using React.

Step 1: Set up Your React Project

If you’re new to React and wish to follow along, click this link and follow the steps to create your React project. When you’re done, come back here and let’s continue building.

If you already have a React project you want to use, that’s great, too.

Step 2: Install the react-loading-skeleton Package

Next, we’ll install a package called react-loading-skeleton that helps in creating beautiful and animated skeletons. To install this package:

Navigate to your project in your terminal.
If you’re using yarn, run yarn add react-loading-skeleton. If you’re using npm, run npm install react-loading-skeleton.

Step 3: How to Handle States and Skeleton Imports

There are variables that will be changing frequently in our project, and they need to be declared. You can read my article on state management if you are not familiar with the concept.

  // useEffect is imported here because the data fetching in Step 4 needs it.
  import { useState, useEffect } from 'react';
  import Skeleton from 'react-loading-skeleton';
  import 'react-loading-skeleton/dist/skeleton.css';

  const SkeletonScreenComponent = () => {
    const [data, setData] = useState([]);
    const [loading, setLoading] = useState(true);
    const [error, setError] = useState(null);

    // The data fetching (Step 4) and rendering (Step 5) code goes here.
  };

  export default SkeletonScreenComponent;

In this code, we declared three states in our SkeletonScreenComponent which are:

data: responsible for storing the data fetched from a fake REST API with its initial value set to an empty array.
loading: to keep track of data loading with its initial value set to a Boolean value of true.
error: to store any error message with initial value set to null.

We also imported the useState hook for the states together with the Skeleton component and its CSS from the react-loading-skeleton library.

Step 4: Fetch Data from the Fake API

Our little project will be fetching data from https://jsonplaceholder.typicode.com/posts, which is a free online fake REST API.

  useEffect(() => {
    // Fetch the posts once, when the component mounts.
    const fetchData = async () => {
      try {
        const response = await fetch('https://jsonplaceholder.typicode.com/posts');
        if (!response.ok) {
          throw new Error('Network response was not ok');
        }
        const result = await response.json();
        setData(result);
      } catch (err) {
        setError('Error fetching data: ' + err.message);
      } finally {
        // Hide the skeleton whether the request succeeded or failed.
        setLoading(false);
      }
    };

    fetchData();
  }, []);

In the code block above:

The useEffect hook is responsible for handling side effects, which makes it a good fit for data fetching. Its dependency array is empty, so the effect runs only once, when the component mounts.
fetchData is an asynchronous function that fetches data from the URL, updates the data state, catches any errors and updates the error state, and sets the loading state to false when done.
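To see the possible outcomes outside of React, the same flow can be sketched as a plain async function that returns the final state. The name loadPosts and the injected fetchImpl parameter are assumptions for illustration, and the two stand-in fetch functions replace a real network request so the sketch runs anywhere.

```javascript
// Sketch: the fetch flow from this step as a plain function (no React).
// fetchImpl is injected so the flow can be exercised without a network.
async function loadPosts(fetchImpl, url = 'https://jsonplaceholder.typicode.com/posts') {
  const state = { data: [], loading: true, error: null };
  try {
    const response = await fetchImpl(url);
    if (!response.ok) {
      throw new Error('Network response was not ok');
    }
    state.data = await response.json();
  } catch (err) {
    state.error = 'Error fetching data: ' + err.message;
  } finally {
    state.loading = false; // loading ends on success and on failure
  }
  return state;
}

// Usage with stand-in fetch functions instead of a real request.
const okFetch = async () => ({
  ok: true,
  json: async () => [{ id: 1, title: 'Hello', body: 'World' }],
});
const failingFetch = async () => ({ ok: false, json: async () => [] });

loadPosts(okFetch).then((state) => console.log(state));
loadPosts(failingFetch).then((state) => console.log(state));
```

In both cases loading ends up false, which is what lets the component swap the skeleton for either the posts or the error message.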

Step 5: Conditional Rendering

The whole idea of this project revolves around the loading state. The component renders different content based on the loading state.

If loading is true:

An array of 15 placeholder items is created, and each item wraps a Skeleton component.
The Skeleton count is set to 2, for the post title and body. You can set the count according to the number of placeholder lines you want to display.

If data loading is successful:

It maps through the data array.
It renders each post’s title and body.

If there is an error, an error message is displayed.

  if (loading) {
    return (
      <div>
        {Array.from({ length: 15 }, (_, index) => (
          <div key={index} style={{ marginTop: '30px' }}>
            <Skeleton count={2} style={{ marginBottom: '5px' }} />
          </div>
        ))}
      </div>
    );
  }

  if (error) {
    return <div>{error}</div>;
  }

  return (
    <div>
      {data.map(({ id, title, body }) => (
        <div key={id} style={{ marginBottom: '20px' }}>
          <h3>{title}</h3>
          <p>{body}</p>
        </div>
      ))}
    </div>
  );

Final Result

This is what our Skeleton screen looks like:

Conclusion

Skeleton screens are great at creating the illusion of progress and making users feel like the site is loading faster. But they don’t fix slow-loading pages on their own. The trick is combining skeleton screens with performance-enhancing techniques such as lazy loading, image compression, and server-side rendering.

Balancing the actual speed and the user’s perception of it is vital in web performance. Skeleton screens are just one tool in your UX toolbox – but when used correctly, they can help create a web experience that feels fast, reliable, and most importantly, engaging. And in the world of web development, where perception is reality, that’s half the battle won.

For more front-end related posts and articles, you can follow me on LinkedIn and X.

See you on the next one!

How to Start a Career in Site Reliability Engineering – SRE Career Guide

Iroro Chadere — Fri, 05 Apr 2024 18:24:12 +0000

If you're considering a career in the Site Reliability Engineering (SRE) field, you should understand what SREs do, how to get started, and how to grow as an SRE.

In this article, we'll explore what you need to know to be an SRE, and how you can develop your skills to become a successful one.

Here's what we'll cover in this article:

Introduction to Site Reliability Engineering
Role and Responsibilities of an SRE
Importance of SRE in Modern Tech Organizations
Prerequisites and Fundamental Knowledge
Essential Skills for SRE
Learning Path and Resources
How to Succeed in the SRE Field
Conclusion

Before we get started...

This isn't a course or a complete tutorial on how to master SRE – that is, it doesn't teach all the nitty-gritty of SRE. Instead, it's more like a guide that'll walk you through how to become an SRE by providing the needed materials for you to succeed.

To get started with reading this guide, you should have a desire to learn and become an SRE. SRE is a wide field, and I urge you to have a burning zeal to learn and master it.

Last but not least, keep in mind that the linked resources and additional pointers contained in this post are my personal recommendations that should help you as you dive into the SRE field. Just make sure you choose the ones that best match your learning style and goals.

Introduction to Site Reliability Engineering (SRE)

The concept of Site Reliability Engineering (SRE) originated at Google in the early 2000s, emerging as a novel approach to tackling large-scale system management challenges.

SRE was born from the necessity to ensure the reliability and scalability of rapidly growing online services. And it has since evolved into a critical discipline within the tech industry.

This origin story not only highlights SRE's roots but also its foundational importance in shaping modern operational practices.

In the early days of Google, the explosive growth of its services and the scale at which they operated introduced unprecedented reliability and scalability challenges.

Traditional IT operations approaches were insufficient for the company's needs, prompting a rethink of how to manage large-scale systems efficiently and reliably. Google's innovative solution was to create a new role that blended software engineering with IT operations, thus giving birth to Site Reliability Engineering.

This new breed of engineers was tasked with making Google's already large and complex systems more reliable, efficient, and scalable. They applied software engineering principles and practices to infrastructure and operations problems, automating tasks that were traditionally performed manually.

This approach not only improved system reliability and efficiency but also allowed for scaling operations in a way that could keep up with the company's rapid growth.

Definition and Purpose of SRE

Photo Credit: TechWorld with Nana

After exploring its origins, you can see that SRE is fundamentally about applying a software engineering mindset to help solve operations problems.

At its core, SRE is about engineering resilience into systems and applications. It focuses on the intersection of software engineering and system administration, applying principles of software design to infrastructure and operations problems.

SRE aims to strike a balance between innovation and reliability, enabling organizations to deliver feature-rich products while maintaining high levels of service reliability.

The primary purpose of SRE is to build and maintain highly reliable, scalable, and efficient systems through a combination of software development, automation, and operational best practices.

By adopting a proactive and engineering-driven approach to operations, SRE teams strive to minimize service disruptions, mitigate risks, and optimize system performance.

Role and Responsibilities of an SRE

The role of an SRE is multifaceted, encompassing a wide range of responsibilities across software development, operations, and system architecture.

Some key responsibilities of an SRE include:

Service Reliability: Ensuring the reliability, availability, and performance of critical services and systems.
Automation and Tooling: Developing automation tools and systems for provisioning, deployment, monitoring, and incident response.
Capacity Planning: Analyzing resource usage patterns and forecasting capacity requirements to support business growth.
Incident Management: Responding to and resolving incidents in a timely manner, and conducting post-incident reviews to identify root causes and prevent recurrence.
Performance Optimization: Identifying and addressing performance bottlenecks to improve system scalability and efficiency.
Security and Compliance: Implementing security best practices and ensuring compliance with regulatory requirements to protect sensitive data and infrastructure.
Collaboration and Communication: Working closely with cross-functional teams, including software engineers, product managers, and system administrators, to drive continuous improvement and innovation.

Importance of SRE in Modern Tech Organizations

In today's digital economy, where user expectations are higher than ever, the reliability and performance of online services are critical to business success. Downtime or poor performance can have significant financial and reputational consequences, leading to lost revenue, customer churn, and damage to brand reputation.

SRE plays a vital role in addressing these challenges by applying software engineering principles to infrastructure and operations. This improves system reliability, scalability, and efficiency.

By fostering a culture of reliability and resilience, SRE enables organizations to deliver better user experiences, reduce operational overhead, and drive business growth.

And as organizations increasingly rely on cloud computing, microservices architecture, and DevOps practices to innovate and scale their operations, the role of SRE becomes even more crucial. SRE provides the expertise and tools necessary to manage complex distributed systems effectively, enabling organizations to leverage technology to achieve their business objectives.

So as you can see, SRE is not just a technical discipline but a strategic imperative for modern tech organizations seeking to thrive in a highly competitive and dynamic market landscape. By investing in SRE principles and practices, organizations can build more resilient and reliable systems, driving innovation, growth, and customer satisfaction.

Prerequisites and Fundamental Knowledge

If you're going to embark on a career in Site Reliability Engineering (SRE), you'll need a solid foundation in computer science principles, a good grasp of programming, and an understanding of version control systems.

These components equip aspiring SREs with the necessary tools to design, develop, and manage reliable and scalable systems.

Understanding of Computer Science Basics

Operating Systems Concepts: A deep understanding of operating systems (OS) is crucial for SREs. This knowledge includes, but is not limited to, process management, memory management, file systems, and the OS's role in defining the interactions between hardware and software.

🔗You can check out this handbook that teaches you key OS concepts for Mac, Linux, and Windows.

Familiarity with these concepts helps SREs in optimizing system performance and in diagnosing and troubleshooting system-level issues.

Networking Fundamentals: Networking is the backbone of the internet and cloud services, making it essential for SREs to understand the basics of networking. This includes 🔗TCP/IP models, DNS, HTTP, HTTPS, and network protocols, as well as the ability to diagnose network-related issues.

Here's a 🔗solid introduction to computer networking basics you can use to get started.

And here's a 🔗full handbook on HTTP Networking for beginners.

A solid grasp of networking principles allows SREs to ensure that the services they manage can communicate efficiently and reliably across the internet and within distributed systems.

Proficiency in Programming Languages

Recommended Languages (Python, Go, Java): SREs must be proficient in at least one programming language.

Python is widely favored for its simplicity and the vast ecosystem of libraries, making it ideal for automation scripts and tools.

freeCodeCamp 🔗has a couple of Python certifications if you want to learn the basics and get some practice coding in Python.

Go, developed by Google, is becoming increasingly popular in cloud services and systems programming due to its efficiency and performance.

🔗Here's a full course that'll teach you Go by having you build 11 projects.

Java, known for its portability and extensive use in enterprise environments, is also valuable.

🔗Here's a full course that teaches you coding in Java, 🔗along with a handbook to reinforce your skills.

Mastery of these languages enables SREs to write efficient, reliable software that automates and enhances system operations.

Scripting Skills (for example, Shell Scripting): Scripting skills are important for automating routine tasks, such as software deployment, system configuration, and monitoring. Shell scripting, in particular, is essential for Unix/Linux-based systems.

🔗Here's a tutorial on bash scripting that'll walk you through some examples.

These scripting skills save time, reduce the likelihood of human error, and ensure that operations can scale efficiently.

Familiarity with Version Control Systems (like Git)

Version control is fundamental to modern software development and operations. Git, being the most widely used version control system, is crucial for tracking changes in code, collaboration, and maintaining the integrity of software projects.

Understanding Git workflows, branches, commits, and merges is essential for SREs, as it enables them to manage code changes, automate parts of the software delivery pipeline, and roll back changes if necessary.

🔗Here's a full book that'll teach you everything you need to know (and more!) to get started with Git.

And 🔗here's a handbook that'll review the common commands and actions you'll use in version control every day.

Together, these prerequisites form the foundation upon which SREs build their skills. Mastery of computer science fundamentals, programming, and version control is essential for anyone looking to succeed in the rapidly evolving field of Site Reliability Engineering.

Essential Skills for SRE

The image above is from SquadCast.

The realm of Site Reliability Engineering is both broad and deep. It encompasses a range of skills that ensure systems are not only reliable but also efficient, scalable, and responsive to the needs of users and businesses alike.

System Administration and Operations

Knowledge of Linux/Unix Administration: Proficiency in managing and troubleshooting 🔗Linux or Unix-based environments is fundamental. This includes managing file systems, users, processes, packages, and services.
Network Administration: Understanding network configuration, firewall management, and network services ensures SREs can optimize network performance and security. 🔗Here's an article that explains Network Admin.
Resource Management: Efficient management of system resources, including CPU, memory, and disk IO, to ensure optimal performance and reliability.

Automation and Infrastructure as Code (IaC)

Automation Tools: Proficiency in tools like Ansible, Chef, or Puppet for 🔗automating deployment, configuration, and management tasks.
Infrastructure as Code: Using tools such as Terraform and CloudFormation to manage infrastructure through code, enabling scalable and reproducible environments with reduced human error. Terraform is the most suitable and popular, and I recommend that you 🔗check out this 15-minute intro.
Scripting and Coding: Ability to write scripts and small programs to automate tasks and integrate systems.

Monitoring and Alerting

Implementing Monitoring Tools: Experience with tools like 🔗Prometheus, 🔗Grafana, ELK Stack, or Splunk for real-time monitoring of applications and infrastructure. There are a lot of tools to manage and monitor incidents, but the ones listed above are among the most widely used in the industry.
Log Management and Analysis: Ability to aggregate, analyze, and interpret logs from various sources for insight into system behavior and troubleshooting.
Alerting Strategies: Developing effective alerting mechanisms that accurately reflect system health and operational issues without overwhelming the team with false positives.
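One common way to cut down on false positives is to alert only when a problem persists, not on a single spike. Here is a minimal sketch of that idea in plain JavaScript. The function name, the threshold, and the sample values are all assumptions for illustration, and real monitoring tools express this kind of rule in their own configuration formats.

```javascript
// Sketch: fire an alert only after `requiredBreaches` consecutive samples
// exceed the threshold, so a single spike does not page anyone.
// All names and numbers here are illustrative assumptions.
function shouldAlert(samples, threshold, requiredBreaches) {
  let streak = 0;
  for (const value of samples) {
    streak = value > threshold ? streak + 1 : 0;
    if (streak >= requiredBreaches) {
      return true;
    }
  }
  return false;
}

// Hypothetical error-rate samples (percent), one per minute.
const singleSpike = [0.4, 0.5, 6.2, 0.6, 0.5];
const sustainedIssue = [0.4, 5.8, 6.1, 7.3, 6.9];

console.log(shouldAlert(singleSpike, 5, 3)); // false
console.log(shouldAlert(sustainedIssue, 5, 3)); // true
```

The trade-off is detection delay: requiring three breaches means waiting three samples before anyone is notified.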

Incident Response and Post-Incident Analysis

Incident Management: Ability to lead and manage the response to system outages or performance degradations to restore service as quickly as possible.
🔗 Blameless Postmortems: Conducting thorough analysis post-incident to identify root causes without attributing blame, focusing instead on learning and improvement.
Reliability Metrics: Tracking and improving key reliability metrics such as availability, latency, and error rates. 🔗 Here's an article from Blameless that explains more about reliability metrics.
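To give a feel for how a metric like availability is worked out, here is a small sketch. The function names and the request counts are hypothetical, and teams define availability in different ways (by requests, by time, or by user journeys), so treat this as one possible definition and not a standard.

```javascript
// Availability as the share of successful requests, in percent.
function availabilityPercent(totalRequests, failedRequests) {
  if (totalRequests <= 0) {
    throw new RangeError('totalRequests must be greater than zero');
  }
  return ((totalRequests - failedRequests) / totalRequests) * 100;
}

// Minutes of downtime a given availability target allows over a period.
function allowedDowntimeMinutes(targetPercent, periodDays) {
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - targetPercent / 100);
}

// Hypothetical numbers: 1,000,000 requests with 500 failures,
// and a 99.9% availability target over a 30-day window.
console.log(availabilityPercent(1_000_000, 500).toFixed(2)); // "99.95"
console.log(allowedDowntimeMinutes(99.9, 30).toFixed(1)); // "43.2"
```

Seen this way, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which helps explain why fast incident response matters so much.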

Capacity Planning and Performance Management

Performance Tuning: After you've reviewed the logs and metrics gathered by your monitoring tools, identify and optimize performance bottlenecks in applications and infrastructure.
Scalability Strategies: Planning and implementing strategies for scaling systems to handle growth in users or data volume efficiently.
Capacity Forecasting: Using metrics and trends to forecast future capacity needs and planning ahead to meet those requirements. Don't wait and hope the application won't go down – your task is to see into the future with the tools and skills you have to prevent it from going down.
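As a rough illustration of capacity forecasting, the sketch below fits a straight line to past usage and estimates how long until a limit is reached. The function names and the disk usage numbers are made up for this example, and a linear trend is only one simple assumption, since real growth is often seasonal or uneven.

```javascript
// Least-squares line through evenly spaced samples (x = 0, 1, 2, ...).
function fitLinearTrend(values) {
  const n = values.length;
  if (n < 2) {
    throw new RangeError('need at least two samples to fit a trend');
  }
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((sum, v) => sum + v, 0) / n;
  let covariance = 0;
  let varianceX = 0;
  values.forEach((y, x) => {
    covariance += (x - meanX) * (y - meanY);
    varianceX += (x - meanX) ** 2;
  });
  const slope = covariance / varianceX;
  return { slope, intercept: meanY - slope * meanX };
}

// Number of future periods until the trend line reaches the capacity limit.
// Returns Infinity when usage is flat or shrinking.
function periodsUntilLimit(values, limit) {
  const { slope, intercept } = fitLinearTrend(values);
  if (slope <= 0) return Infinity;
  const crossing = (limit - intercept) / slope; // x where the line hits the limit
  return Math.max(0, Math.ceil(crossing - (values.length - 1)));
}

// Hypothetical disk usage in GB, sampled once per week.
const weeklyDiskUsageGb = [400, 420, 440, 460, 480];
console.log(periodsUntilLimit(weeklyDiskUsageGb, 600)); // 6
```

With these sample numbers, usage grows by 20 GB per week, so a 600 GB disk would fill up in about six weeks, which is the kind of lead time you want before ordering or provisioning more capacity.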

Cloud Computing Concepts and Technologies

Cloud Service Models: Understanding the spectrum of cloud services (🔗 IaaS, PaaS, SaaS) and how they can be leveraged for reliability and scalability.
Cloud Providers: Familiarity with major cloud providers such as AWS, Google Cloud, and Azure, and their specific technologies and services.
🔗 Here's a 14-hour course to help you learn AWS, 🔗 a 4-hour course on Google Cloud, and a 🔗 13-hour course on Azure to get you on your feet!
Cloud-Native Technologies: Knowledge of cloud-native technologies and practices, including 🔗 microservices architecture, containers (for example, Docker), and orchestration tools (for example, 🔗 Kubernetes), to build and manage scalable, resilient systems. 🔗 This course teaches you both Docker and Kubernetes basics.

While all of these skills are vital, you don't have to master them all, and certainly not all at once. But having a basic understanding of these essential skills enables SREs to ensure that systems are not just up and running, but also optimized for performance, ready to scale as needed, and resilient in the face of failures.

The role of an SRE demands a blend of expertise in software engineering and system operations, making it both a challenging and rewarding career path.

Learning Path and Resources

Like I said earlier in this article, this isn't a tutorial – it's more like a learning path that'll walk you through all that you need to get started in the SRE field.

The journey to becoming a proficient SRE is continuous and multifaceted. Engaging with a variety of resources and communities can significantly enhance your learning experience.

Below are some approaches and resources that you can use to learn or master the SRE field.

Online Courses and Tutorials

Platforms like Udemy, Coursera, Udacity, and edX offer comprehensive courses on SRE fundamentals, 🔗 cloud computing, 🔗 automation, and more. Look for courses developed in partnership with leading tech companies and universities.
Specific Tutorials on tools and technologies (for example, 🔗 Kubernetes, 🔗 Terraform, Prometheus) abound on YouTube, or through the documentation and learning resources provided by the tools themselves. 🔗 Here's a fun tutorial that uses Prometheus as part of a larger tech stack to secure server infrastructure clouds.

Books and Publications

🔗 Site Reliability Engineering by Niall Richard Murphy, Betsy Beyer, Chris Jones, and Jennifer Petoff (often referred to as the "SRE Bible"), published by O'Reilly, offers insights directly from Google's SRE team.
🔗 The Phoenix Project and 🔗 The DevOps Handbook by Gene Kim, Jez Humble, and others provide excellent insights into DevOps principles, which overlap significantly with SRE practices. If you enjoy learning from books, both are worth buying and reading.
Industry Publications such as ACM Queue or 🔗 IEEE Software regularly feature articles on SRE topics, case studies, and best practices.

Hands-On Projects and Exercises

Cloud Platforms offer free tiers or trial periods that are perfect for experimenting with cloud-based infrastructure and services.
GitHub and GitLab host a multitude of open-source projects where you can contribute code, documentation, or even participate in issue resolution and feature requests.
Personal Projects can also serve as a valuable learning tool. Try to replicate real-world systems, or automate the deployment and management of an application from scratch. The best way to learn is to practice.
Contributing to open-source projects related to SRE tools and technologies not only gives you hands-on experience but also helps you understand the community standards and practices. Open source is a great way to learn from others, improve your knowledge, and gain valuable experience. Think of working on an open source project like an entry-level job where you get to do real things! Contribute, contribute, contribute.

Embarking on your SRE learning journey is both exciting and demanding. It requires a commitment to continuous learning and improvement.

Leveraging a mix of online resources, books, hands-on projects, community participation, and professional networking will equip aspiring SREs with the knowledge, skills, and insights needed to succeed in this dynamic field.

How to Succeed in the SRE Field

Navigating a successful career in Site Reliability Engineering (SRE) requires more than just technical acumen. You'll also need to cultivate a mindset geared towards growth, collaboration, and resilience.

Achieving success as an SRE involves a blend of continuous learning, adaptability, communication, problem-solving, and a commitment to fostering a culture of reliability.

Continual Learning and Skill Development

Stay Updated: The tech field evolves rapidly, with new tools, languages, and practices emerging constantly. Dedicate time regularly to learn new skills and technologies. Search YouTube, LinkedIn, and Twitter, and connect with friends and other people who share your goals and skills.
Deepen and Broaden Your Knowledge: While specializing in certain areas is valuable, having a broad understanding of related disciplines, such as cloud services, networking, and cybersecurity, can significantly enhance your effectiveness as an SRE.

Adaptability to New Technologies and Methodologies

Be Open to Change: Embrace new methodologies and technologies. The willingness to adapt and experiment with innovative solutions is crucial in an environment where reliability and efficiency are paramount.
Experimentation and Evaluation: Apply critical thinking to assess the applicability of new tools and practices to your organization's specific challenges and objectives.

Effective Communication and Collaboration

Clear Communication: Whether it's documenting an incident report, explaining a technical concept to a non-technical stakeholder, or writing code comments, clear communication is key.
🔗 Here's an article I found that can help with effective communication.
Collaborative Mindset: SRE involves working closely with development, operations, and business teams. Building strong relationships based on trust and mutual respect is essential for achieving common goals.
🔗 Here's some killer advice from LinkedIn that can help.

Problem-Solving and Troubleshooting Skills

Analytical Approach: Develop a methodical approach to troubleshooting and problem-solving. This includes breaking down complex systems into smaller components, identifying potential failure points, and systematically eliminating possibilities.
Learning from Failures: Adopt a mindset that views failures as learning opportunities. Conduct blameless postmortems to understand what went wrong and how similar incidents can be prevented in the future.
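
The "systematically eliminating possibilities" idea above can be sketched in code. This is a minimal illustration, not a prescribed SRE tool: it assumes you have an ordered list of changes (deploys, commits, config versions) and a hypothetical `is_healthy` check you supply yourself, and that every change before the faulty one is healthy while every change from it onward is not. Under those assumptions, a binary search halves the set of suspects at each step.

```python
from typing import Callable, Optional, Sequence


def first_bad_change(
    changes: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest change for which is_healthy() is False.

    Assumes the changes are ordered oldest to newest and that once the
    system becomes unhealthy it stays unhealthy for all later changes.
    Returns None if every change is healthy.
    """
    lo, hi = 0, len(changes)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_healthy(changes[mid]):
            # Everything up to and including mid is ruled out.
            lo = mid + 1
        else:
            # The first bad change is at mid or earlier.
            hi = mid
    return changes[lo] if lo < len(changes) else None


# Hypothetical example data: deploys "d" and "e" are unhealthy.
deploys = ["a", "b", "c", "d", "e"]
broken = {"d", "e"}
culprit = first_bad_change(deploys, lambda change: change not in broken)
```

With five candidates, the search needs at most three health checks instead of five, and the saving grows quickly as the list of suspects gets longer.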

Embracing a Culture of Reliability and Resilience

Prioritize Reliability: Advocate for reliability and uptime within your organization, emphasizing that reliability is a feature not just for customers but for the business's bottom line.
Resilience Engineering: Focus on building systems that are not only reliable under normal conditions but can also gracefully handle unexpected stressors and failures. This involves designing for failure, anticipating bottlenecks, and implementing fallback mechanisms. 🔗 Check out this article to learn more about Resilience Engineering.
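
To make "designing for failure" and "implementing fallback mechanisms" more concrete, here is a minimal sketch of one common pattern: retry a primary operation a few times with exponential backoff, then fall back to a degraded but safe result. The function name `call_with_fallback`, its parameters, and the example callables are all hypothetical, chosen only for illustration. A production system would usually also add timeouts, jitter, and logging.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try primary() up to `retries` times, then return fallback().

    The wait between attempts doubles each time (exponential backoff).
    `sleep` is injectable so the function can be exercised without
    actually waiting.
    """
    for attempt in range(retries):
        try:
            return primary()
        except Exception:
            # Back off before the next attempt, but not after the last one.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback()


# Hypothetical usage: the primary dependency is down, so we serve a
# cached value instead of failing the request.
attempts: List[int] = []
delays: List[float] = []


def fetch_live_price() -> str:
    attempts.append(1)
    raise ConnectionError("pricing service unavailable")


result = call_with_fallback(
    primary=fetch_live_price,
    fallback=lambda: "cached price",
    sleep=delays.append,  # record the delays instead of sleeping
)
```

The design choice here is that the caller always gets an answer: the failure of one dependency degrades the response rather than taking the whole request down.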

Success in the SRE field is about more than just keeping the systems running. You'll also need to foresee potential issues, enhance system resilience, and ensure that the infrastructure can support the organization's long-term goals.

By focusing on continual learning, adaptability, communication, problem-solving, and a culture of reliability, you can contribute significantly to your team and organization, while also advancing your career in this dynamic and critical field.

If for some reason you still feel lost about SRE, you can connect with me on LinkedIn or Twitter, where I'll be sharing news and updates about trending SRE topics and discussions.

Conclusion

In this guide, we've journeyed through the essentials of what it takes to embark on a career in SRE. You should now understand its foundational principles and know how to acquire the necessary skills to excel in the role and make a significant impact within tech organizations.

Here's a recap of what we covered:

Key Points

Introduction to SRE: We started with the genesis of SRE at Google, outlining its purpose to bridge the gap between development and operations, emphasizing reliability, scalability, and operational efficiency.
Prerequisites and Fundamental Knowledge: A strong foundation in computer science principles, programming languages, and version control is essential for aspiring SREs.
Essential Skills for SRE: We delved into system administration, automation, monitoring, incident response, and cloud computing as critical skills for anyone in the SRE domain.
Learning Path and Resources: The path to becoming an SRE involves continuous learning through online courses, books, hands-on projects, and community engagement.
Succeeding in the SRE Field: Success hinges on continual learning, adaptability, effective communication, problem-solving skills, and fostering a culture of reliability and resilience.

Pursue SRE as a Career Path

Site Reliability Engineering is a mindset and a set of practices that can lead to highly rewarding careers. As businesses increasingly rely on technology, the demand for people who can ensure systems are reliable, scalable, and efficient has never been higher.

Pursuing a career in SRE offers the opportunity to work at the forefront of technology innovation, solving complex problems and making a tangible impact on the digital landscape.

How to Fix App Jank: A Practical Guide to Profiling Flutter Apps with DevTools

Table of Contents

What Jank Actually Is

Setting Up for Accurate Profiling

The Performance View: Reading the Frame Timeline

The Frame Chart

The Two Threads

Reading the Flame Chart

The CPU Profiler: Finding the Root Cause

Recording a Profile

Reading the Flame Graph

The Bottom-up Table

Fixing a CPU-bound Bottleneck

The Flutter Inspector: Hunting Unnecessary Rebuilds

Enabling Rebuild Counting

Fixing Unnecessary Rebuilds by Extracting State

Using RepaintBoundary to Isolate Expensive Painting

The Memory View: Catching Leaks Before Users Do

What a Memory Leak Looks Like in DevTools

Finding a Leak

The Most Common Sources of Leaks

Fixing the Most Common Jank Patterns

Expensive Synchronous Work on the Main Isolate

Future Created Inside Build

Large List Rendered as a Column

Animated Opacity Causing Raster Thread Jank

Verifying Your Fix Actually Worked

Conclusion

How to Scale Laravel Applications for High-Traffic Production Systems

Prerequisites

Table of Contents

What Happens When Laravel Apps Start Growing

Common Laravel Bottlenecks

N+1 Queries

Missing Indexes

Inefficient Eager Loading

Synchronous Processing

Large Payloads

Expensive Joins

How to Optimize the Database

Add Indexes Around Real Query Patterns

Use Eager Loading Deliberately

Optimize Queries Before Adding Hardware

Process Large Tables with Chunking

Use Cursor Pagination for High-Volume Feeds

Split Reads with Read Replicas

How to Scale with Redis

Caching

Sessions

Rate Limiting

Queues

How to Use Queue-Driven Architectures

Laravel Queues

Horizon

Failed Jobs and Retries

Queue Monitoring

How to Optimize API Performance

API Resources

Pagination

Response Optimization

Rate Limiting

Caching API Responses

How to Monitor Laravel in Production

An Example High-Traffic Laravel Architecture

Lessons Learned the Hard Way

1. Premature Optimization

2. Over-caching

3. Missing Indexes

4. Queue Overload

5. Large Transactions

6. Treating Symptoms as Causes

A Pre-Launch Scaling Checklist

Conclusion

References

How to Use PostgreSQL as a Cache, Queue, and Search Engine

Table of Contents

Prerequisites

The Setup

Benchmark 1: Caching with UNLOGGED Tables

1. Basic Explanation of the `Get-ChildItem` Command

Most Used Examples of Searching by `gci` Command

3. When is the `-Path` option not needed?

4. Advanced Searching – Combining `Get-ChildItem` with the `Where-Object` Command