<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ ratelimit - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ ratelimit - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sat, 30 May 2026 22:25:22 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/ratelimit/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Implement Token Bucket Rate Limiting with FastAPI ]]>
                </title>
                <description>
                    <![CDATA[ APIs power everything from mobile apps to enterprise platforms, quietly handling millions of requests per day. Without safeguards, a single misconfigured client or a burst of automated traffic can ove ]]>
                </description>
                <link>https://www.freecodecamp.org/news/token-bucket-rate-limiting-fastapi/</link>
                <guid isPermaLink="false">69c6f8747cf270651055571c</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ api ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ratelimit ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prosper Ugbovo ]]>
                </dc:creator>
                <pubDate>Fri, 27 Mar 2026 21:36:52 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/fba3d4a6-faca-429a-8e16-a3e9778d2cf8.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>APIs power everything from mobile apps to enterprise platforms, quietly handling millions of requests per day. Without safeguards, a single misconfigured client or a burst of automated traffic can overwhelm your service, degrading performance for everyone.</p>
<p>Rate limiting prevents this. It controls how many requests a client can make within a given timeframe, protecting your infrastructure from both intentional abuse and accidental overload.</p>
<p>Among the several algorithms used for rate limiting, the <strong>Token Bucket</strong> stands out for its balance of simplicity and flexibility. Unlike fixed window counters that reset abruptly, the Token Bucket allows short bursts of traffic while still enforcing a sustainable long-term rate. This makes it a practical choice for APIs where clients occasionally need to send a quick flurry of requests without being penalized.</p>
<p>In this guide, you'll implement a Token Bucket rate limiter in a FastAPI application. You'll build the algorithm from scratch as a Python class, wire it into FastAPI as middleware with per-user tracking, add standard rate limit headers to your responses, and test everything with a simple script. By the end, you'll have a working rate limiter you can drop into any FastAPI project.</p>
<h3 id="heading-what-well-cover">What we'll cover:</h3>
<ol>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-understanding-the-token-bucket-algorithm">Understanding the Token Bucket Algorithm</a></p>
</li>
<li><p><a href="#heading-setting-up-the-fastapi-project">Setting Up the FastAPI Project</a></p>
</li>
<li><p><a href="#heading-implementing-the-token-bucket-class">Implementing the Token Bucket Class</a></p>
</li>
<li><p><a href="#heading-adding-per-user-rate-limiting-middleware">Adding Per-User Rate Limiting Middleware</a></p>
</li>
<li><p><a href="#heading-testing-the-rate-limiter">Testing the Rate Limiter</a></p>
</li>
<li><p><a href="#heading-where-rate-limiting-fits-in-your-architecture">Where Rate Limiting Fits in Your Architecture</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow this tutorial, you'll need:</p>
<ul>
<li><p><strong>Python 3.9 or later</strong> installed on your machine. You can verify your version by running <code>python --version</code>.</p>
</li>
<li><p><strong>Familiarity with Python</strong> and basic knowledge of how HTTP APIs work.</p>
</li>
<li><p><strong>A text editor</strong> such as VS Code, Vim, or any editor you prefer.</p>
</li>
</ul>
<h2 id="heading-understanding-the-token-bucket-algorithm">Understanding the Token Bucket Algorithm</h2>
<p>Before writing code, it helps to understand the mechanism you'll be building.</p>
<p>The Token Bucket algorithm models rate limiting with two simple concepts: a <strong>bucket</strong> that holds tokens, and a <strong>refill process</strong> that adds tokens at a steady rate.</p>
<p>Here is how it works:</p>
<ol>
<li><p>The bucket starts full, holding a fixed maximum number of tokens (the capacity).</p>
</li>
<li><p>Each incoming request costs one token. If the bucket has tokens available, the request is allowed, and one token is removed.</p>
</li>
<li><p>If the bucket is empty, the request is rejected with a <code>429 Too Many Requests</code> response.</p>
</li>
<li><p>Tokens are added back to the bucket at a constant refill rate, regardless of whether requests are coming in. The bucket never exceeds its maximum capacity.</p>
</li>
</ol>
<p>The capacity determines how large a burst the system absorbs. The refill rate defines the sustained throughput. For example, a bucket with a capacity of 10 and a refill rate of 2 tokens per second allows a client to fire 10 requests instantly, but after that, they can only make 2 requests per second until the bucket refills.</p>
<p>This design gives you precise control through a small set of parameters:</p>
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Controls</th>
<th>Example</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Capacity</strong> (max tokens)</td>
<td>Maximum burst size</td>
<td>10 tokens = 10 requests at once</td>
</tr>
<tr>
<td><strong>Refill rate</strong></td>
<td>Sustained throughput</td>
<td>2 tokens/sec = 2 requests/sec long-term</td>
</tr>
<tr>
<td><strong>Refill interval</strong></td>
<td>Granularity of refill</td>
<td>1.0 sec = tokens added every second</td>
</tr>
</tbody></table>
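<p>To make the capacity and refill numbers concrete, here is a minimal simulation sketch. The <code>simulate</code> function and its parameters are illustrative names, not part of the code you'll build later, and it assumes tokens refill continuously rather than once per interval, which keeps the arithmetic short.</p>

```python
def simulate(capacity, refill_rate, request_times):
    """Replay request timestamps (in seconds) against a token bucket.

    Assumption: tokens refill continuously at `refill_rate` per second.
    Returns one boolean per request: True if allowed, False if rejected.
    """
    tokens = float(capacity)  # the bucket starts full
    last = 0.0
    decisions = []
    for t in request_times:
        # Credit tokens for the time elapsed, never exceeding the capacity.
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


# 12 requests at t=0, then 3 more one second later.
burst = simulate(10, 2, [0.0] * 12 + [1.0] * 3)
print(burst.count(True))  # 12
```

<p>The first 10 requests drain the full bucket, the next 2 are rejected, and one second later the 2 refilled tokens admit 2 of the 3 remaining requests.</p>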
<p>Compared to other rate-limiting algorithms:</p>
<ul>
<li><p><strong>Fixed Window</strong> counters reset at hard boundaries (for example, every minute), which can allow double the intended rate at window edges. The Token Bucket has no such boundary.</p>
</li>
<li><p><strong>Sliding Window</strong> counters are more accurate but more complex to implement and maintain.</p>
</li>
<li><p><strong>Leaky Bucket</strong> processes requests at a fixed rate and queues the rest. The Token Bucket is similar, but allows bursts instead of forcing a constant pace.</p>
</li>
</ul>
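<p>The Token Bucket is widely used in production systems. AWS API Gateway and Stripe both describe their rate limiters as token buckets, while NGINX's <code>limit_req</code> module uses the closely related leaky bucket method.</p>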
<h2 id="heading-setting-up-the-fastapi-project">Setting Up the FastAPI Project</h2>
<p>Create a project directory and install the dependencies:</p>
<pre><code class="language-shell">mkdir fastapi-ratelimit &amp;&amp; cd fastapi-ratelimit
</code></pre>
<p>Create and activate a virtual environment:</p>
<pre><code class="language-shell">python -m venv venv
</code></pre>
<p>On Linux/macOS:</p>
<pre><code class="language-shell">source venv/bin/activate
</code></pre>
<p>On Windows:</p>
<pre><code class="language-shell">venv\Scripts\activate
</code></pre>
<p>Install FastAPI and Uvicorn:</p>
<pre><code class="language-shell">pip install fastapi uvicorn
</code></pre>
<p>Create the project file structure:</p>
<pre><code class="language-plaintext">fastapi-ratelimit/
├── main.py
└── ratelimiter.py
</code></pre>
<p>Create <code>main.py</code> with a minimal FastAPI application:</p>
<pre><code class="language-python">from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello, world!"}
</code></pre>
<p>Start the server to verify the setup:</p>
<pre><code class="language-shell">uvicorn main:app --reload
</code></pre>
<p>You should see output similar to:</p>
<pre><code class="language-plaintext">INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process
</code></pre>
<p>Open <a href="http://127.0.0.1:8000">http://127.0.0.1:8000</a> in your browser or run <code>curl http://127.0.0.1:8000</code>. You should receive:</p>
<pre><code class="language-json">{"message": "Hello, world!"}
</code></pre>
<p>With the project running, you can move on to building the rate limiter.</p>
<h2 id="heading-implementing-the-token-bucket-class">Implementing the Token Bucket Class</h2>
<p>Open <code>ratelimiter.py</code> in your editor and add the following code. This class implements the Token Bucket algorithm with thread-safe operations:</p>
<pre><code class="language-python">import time
import threading


class TokenBucket:
    """
    Token Bucket rate limiter.

    Each bucket starts full at `max_tokens` and refills `refill_rate`
    tokens every `interval` seconds, up to the maximum capacity.
    """

    def __init__(self, max_tokens: int, refill_rate: int, interval: float):
        """
        Initialize a new Token Bucket.

        :param max_tokens: Maximum number of tokens the bucket can hold (burst capacity).
        :param refill_rate: Number of tokens added per refill interval.
        :param interval: Time in seconds between refills.
        """
        assert max_tokens &gt; 0, "max_tokens must be positive"
        assert refill_rate &gt; 0, "refill_rate must be positive"
        assert interval &gt; 0, "interval must be positive"

        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.interval = interval

        self.tokens = max_tokens
        self.refilled_at = time.time()
        self.lock = threading.Lock()

    def _refill(self):
        """Add tokens based on elapsed time since the last refill."""
        now = time.time()
        elapsed = now - self.refilled_at

        if elapsed &gt;= self.interval:
            num_refills = int(elapsed // self.interval)
            self.tokens = min(
                self.max_tokens,
                self.tokens + num_refills * self.refill_rate
            )
            # Advance the timestamp by the number of full intervals consumed,
            # not to `now`, so partial intervals aren't lost.
            self.refilled_at += num_refills * self.interval

    def allow_request(self, tokens: int = 1) -&gt; bool:
        """
        Attempt to consume `tokens` from the bucket.

        Returns True if the request is allowed, False if the bucket
        does not have enough tokens.
        """
        with self.lock:
            self._refill()

            if self.tokens &gt;= tokens:
                self.tokens -= tokens
                return True
            return False

    def get_remaining(self) -&gt; int:
        """Return the current number of available tokens."""
        with self.lock:
            self._refill()
            return self.tokens

    def get_reset_time(self) -&gt; float:
        """Return the Unix timestamp when the next refill occurs."""
        with self.lock:
            return self.refilled_at + self.interval
</code></pre>
<p>The class has three public methods:</p>
<ul>
<li><p><code>allow_request()</code> is the core method. It refills tokens based on elapsed time, then tries to consume one. It returns <code>True</code> if the request is allowed, <code>False</code> if the bucket is empty.</p>
</li>
<li><p><code>get_remaining()</code> returns the number of tokens the client has left. You will use this for response headers.</p>
</li>
<li><p><code>get_reset_time()</code> returns when the next token will be added. This is also exposed in response headers.</p>
</li>
</ul>
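<p>The <code>threading.Lock</code> ensures that concurrent requests don't create race conditions when reading or modifying the token count. This matters because FastAPI serves requests concurrently, and synchronous route handlers and dependencies run in a thread pool, so a bucket can be accessed from more than one thread.</p>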
<p><strong>Note:</strong> This implementation stores bucket state in memory. If you restart the server, all buckets reset. For persistence across restarts or multiple server instances, you would store token counts in Redis or a similar external store. The in-memory approach is sufficient for single-instance deployments.</p>
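<p>The refill arithmetic in <code>_refill()</code> can be easier to follow as a pure function with explicit timestamps. The sketch below restates the same calculation outside the class. The standalone <code>refill</code> function is an illustrative helper, not part of <code>ratelimiter.py</code>.</p>

```python
def refill(tokens, refilled_at, now, max_tokens, refill_rate, interval):
    """Return (tokens, refilled_at) after crediting whole elapsed intervals."""
    elapsed = now - refilled_at
    if elapsed >= interval:
        num_refills = int(elapsed // interval)
        tokens = min(max_tokens, tokens + num_refills * refill_rate)
        # Advance by whole intervals only, so the leftover half second
        # still counts toward the next refill.
        refilled_at += num_refills * interval
    return tokens, refilled_at


# An empty bucket last refilled at t=100.0 is checked at t=102.5.
state = refill(0, 100.0, 102.5, max_tokens=10, refill_rate=2, interval=1.0)
print(state)  # (4, 102.0)

# Half a second later, a full interval has passed since t=102.0.
state = refill(state[0], state[1], 103.0, max_tokens=10, refill_rate=2, interval=1.0)
print(state)  # (6, 103.0)
```

<p>Had the timestamp been reset to <code>now</code> (102.5) in the first call, the check at 103.0 would have seen only 0.5 seconds elapsed and added nothing.</p>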
<h2 id="heading-adding-per-user-rate-limiting-middleware">Adding Per-User Rate Limiting Middleware</h2>
<p>A single global bucket would throttle all users together. One heavy user could exhaust the limit for everyone. Instead, you'll assign a separate bucket to each user, identified by their IP address.</p>
<p>Add the following to <code>ratelimiter.py</code>, below the <code>TokenBucket</code> class:</p>
<pre><code class="language-python">class RateLimiterStore:
    """
    Manages per-user Token Buckets.

    Each unique client key (e.g., IP address) gets its own bucket
    with identical parameters.
    """

    def __init__(self, max_tokens: int, refill_rate: int, interval: float):
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.interval = interval
        self._buckets: dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def get_bucket(self, key: str) -&gt; TokenBucket:
        """
        Return the TokenBucket for a given client key.
        Creates a new bucket if one does not exist yet.
        """
        with self._lock:
            if key not in self._buckets:
                self._buckets[key] = TokenBucket(
                    max_tokens=self.max_tokens,
                    refill_rate=self.refill_rate,
                    interval=self.interval,
                )
            return self._buckets[key]
</code></pre>
<p>Now open <code>main.py</code> and replace its contents with the full application, including the rate-limiting middleware:</p>
<pre><code class="language-python">import time

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from ratelimiter import RateLimiterStore

app = FastAPI()

# Configure rate limits: 10 requests burst, 2 tokens added every 1 second.
limiter = RateLimiterStore(max_tokens=10, refill_rate=2, interval=1.0)


@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    """
    Middleware that enforces per-IP rate limiting on every request.
    Adds standard rate limit headers to every response.
    """
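    # Identify the client by IP address. `request.client` can be None in some
    # setups, so fall back to a shared key instead of raising an error.
    client_ip = request.client.host if request.client else "unknown"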
    bucket = limiter.get_bucket(client_ip)

    # Check if the client has tokens available.
    if not bucket.allow_request():
        retry_after = bucket.get_reset_time() - time.time()
        return JSONResponse(
            status_code=429,
            content={"detail": "Too many requests. Try again later."},
            headers={
                "Retry-After": str(max(1, int(retry_after))),
                "X-RateLimit-Limit": str(bucket.max_tokens),
                "X-RateLimit-Remaining": str(bucket.get_remaining()),
                "X-RateLimit-Reset": str(int(bucket.get_reset_time())),
            },
        )

    # Request is allowed. Process it and add rate limit headers to the response.
    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(bucket.max_tokens)
    response.headers["X-RateLimit-Remaining"] = str(bucket.get_remaining())
    response.headers["X-RateLimit-Reset"] = str(int(bucket.get_reset_time()))
    return response


@app.get("/")
async def root():
    return {"message": "Hello, world!"}


@app.get("/data")
async def get_data():
    return {"data": "Some important information"}


@app.get("/health")
async def health():
    return {"status": "ok"}
</code></pre>
<p>The middleware does the following on every incoming request:</p>
<ol>
<li><p>Extracts the client's IP address from <code>request.client.host</code>.</p>
</li>
<li><p>Retrieves (or creates) that client's Token Bucket from the store.</p>
</li>
<li><p>Calls <code>allow_request()</code>. If the bucket is empty, it returns a <code>429</code> response with a <code>Retry-After</code> header telling the client how long to wait.</p>
</li>
<li><p>If tokens are available, it processes the request normally and attaches rate limit headers to the response.</p>
</li>
</ol>
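<p>The three <code>X-RateLimit-*</code> headers follow a widely adopted de facto convention. An <a href="https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/">IETF draft</a> proposes standardized <code>RateLimit</code> header fields without the <code>X-</code> prefix:</p>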
<table>
<thead>
<tr>
<th>Header</th>
<th>Meaning</th>
</tr>
</thead>
<tbody><tr>
<td><code>X-RateLimit-Limit</code></td>
<td>Maximum burst capacity (max tokens)</td>
</tr>
<tr>
<td><code>X-RateLimit-Remaining</code></td>
<td>Tokens left in the current bucket</td>
</tr>
<tr>
<td><code>X-RateLimit-Reset</code></td>
<td>Unix timestamp when the next refill occurs</td>
</tr>
</tbody></table>
<p>These headers allow well-behaved clients to self-throttle before hitting the limit.</p>
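<p>As a rough illustration of client-side self-throttling, the sketch below turns these headers into a wait time. The <code>seconds_to_wait</code> helper is a hypothetical name, and it assumes <code>Retry-After</code> carries a number of seconds, as it does in this tutorial's middleware.</p>

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long a client should pause before its next request.

    `headers` is any mapping of header names to string values. A plain dict
    is case-sensitive, so pass names exactly as written here.
    """
    now = time.time() if now is None else now
    # A 429 response tells the client directly how long to wait.
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Tokens remain: no need to slow down yet.
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    # Bucket is empty: wait until the advertised refill time.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


print(seconds_to_wait({"X-RateLimit-Remaining": "0",
                       "X-RateLimit-Reset": "105"}, now=100.0))  # 5.0
```

<p>A client could call this after each response and sleep for the returned duration before sending the next request.</p>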
<h2 id="heading-testing-the-rate-limiter">Testing the Rate Limiter</h2>
<p>Start the server if it isn't already running:</p>
<pre><code class="language-shell">uvicorn main:app --reload
</code></pre>
<h3 id="heading-manual-testing-with-curl">Manual Testing with curl</h3>
<p>Manual testing with <code>curl</code> is useful during development when you want to quickly verify that your middleware is working. A single request lets you confirm that the rate limit headers are present, the values are correct, and one token is consumed as expected.</p>
<p>This approach is fast and requires no additional setup, making it ideal for spot-checking your configuration after making changes.</p>
<p>Send a single request and inspect the response:</p>
<pre><code class="language-shell">curl -i http://127.0.0.1:8000/data
</code></pre>
<p>You should see a <code>200</code> response with headers like:</p>
<pre><code class="language-plaintext">HTTP/1.1 200 OK
x-ratelimit-limit: 10
x-ratelimit-remaining: 9
x-ratelimit-reset: 1739836801
</code></pre>
<h3 id="heading-automated-burst-test">Automated Burst Test</h3>
<p>While a single <code>curl</code> request confirms that the rate limiter is active, it can't verify that the limiter actually blocks requests when the bucket is empty. For that, you need to send requests faster than the refill rate and observe the <code>429</code> responses. An automated burst test is worth running before deploying to production, after changing your bucket parameters, or when you need to verify both the blocking and refill behavior.</p>
<p>Create a file called <code>test_ratelimit.py</code> in your project directory:</p>
<pre><code class="language-python">import requests
import time


def test_burst():
    """Send 15 rapid requests to trigger the rate limit."""
    url = "http://127.0.0.1:8000/data"
    results = []

    for i in range(15):
        response = requests.get(url)
        remaining = response.headers.get("X-RateLimit-Remaining", "N/A")
        results.append((i + 1, response.status_code, remaining))
        print(f"Request {i+1:2d} | Status: {response.status_code} | Remaining: {remaining}")

    print()

    allowed = sum(1 for _, status, _ in results if status == 200)
    blocked = sum(1 for _, status, _ in results if status == 429)
    print(f"Allowed: {allowed}, Blocked: {blocked}")


def test_refill():
    """Exhaust tokens, wait for a refill, then confirm requests succeed again."""
    url = "http://127.0.0.1:8000/data"

    print("\n--- Exhausting tokens ---")
    for i in range(12):
        response = requests.get(url)
        print(f"Request {i+1:2d} | Status: {response.status_code}")

    print("\n--- Waiting 3 seconds for refill ---")
    time.sleep(3)

    print("\n--- Sending requests after refill ---")
    for i in range(5):
        response = requests.get(url)
        remaining = response.headers.get("X-RateLimit-Remaining", "N/A")
        print(f"Request {i+1:2d} | Status: {response.status_code} | Remaining: {remaining}")


if __name__ == "__main__":
    print("=== Burst Test ===")
    test_burst()

    # Allow bucket to refill before next test
    time.sleep(6)

    print("\n=== Refill Test ===")
    test_refill()
</code></pre>
<p>Install the <code>requests</code> library if you don't have it:</p>
<pre><code class="language-shell">pip install requests
</code></pre>
<p>Run the test:</p>
<pre><code class="language-shell">python test_ratelimit.py
</code></pre>
<p>You should see output similar to:</p>
<pre><code class="language-output">=== Burst Test ===
Request  1 | Status: 200 | Remaining: 9
Request  2 | Status: 200 | Remaining: 8
Request  3 | Status: 200 | Remaining: 7
...
Request 10 | Status: 200 | Remaining: 0
Request 11 | Status: 429 | Remaining: 0
Request 12 | Status: 429 | Remaining: 0
...
Request 15 | Status: 429 | Remaining: 0

Allowed: 10, Blocked: 5
</code></pre>
<p>The first 10 requests succeed (one token each from the full bucket). Requests 11 through 15 are rejected because the bucket is empty. The refill test then confirms that after waiting, tokens reappear and requests succeed again.</p>
<p><strong>Note:</strong> The exact split between allowed and blocked requests may vary slightly due to timing. Tokens may refill between rapid requests. This is expected behavior.</p>
<h2 id="heading-where-rate-limiting-fits-in-your-architecture">Where Rate Limiting Fits in Your Architecture</h2>
<p>The implementation in this tutorial runs inside your application process, which is the simplest approach and works well for single-instance deployments. In larger systems, rate limiting typically appears at multiple layers:</p>
<ul>
<li><p><strong>API gateway level</strong> (NGINX, Kong, Traefik, Envoy): A coarse global rate limit applied to all traffic before it reaches your application. This protects against large-scale abuse and DDoS.</p>
</li>
<li><p><strong>Application level</strong> (this tutorial): Fine-grained per-user or per-endpoint limits inside your service. This is useful for enforcing different quotas on different API tiers.</p>
</li>
<li><p><strong>Both</strong>: Many production systems combine a gateway-level global limiter with an in-app per-user limiter. The gateway catches the flood and the application enforces business rules.</p>
</li>
</ul>
<p>For multi-instance deployments (multiple server processes behind a load balancer), the in-memory <code>RateLimiterStore</code> won't share state across instances. In that case, replace the in-memory dictionary with Redis. The Token Bucket logic stays the same – only the storage layer changes.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this guide, you built a Token Bucket rate limiter from scratch and integrated it into a FastAPI application with per-user tracking and standard rate limit response headers. You also tested the implementation to verify that burst capacity and refill behavior work as expected.</p>
<p>The Token Bucket algorithm gives you two straightforward controls, capacity for burst tolerance and refill rate for sustained throughput, which cover the vast majority of rate-limiting needs.</p>
<p>From here, you can extend this foundation by:</p>
<ul>
<li><p>Replacing the in-memory store with Redis for multi-instance deployments.</p>
</li>
<li><p>Applying different rate limits per endpoint by creating separate <code>RateLimiterStore</code> instances.</p>
</li>
<li><p>Using authenticated user IDs instead of IP addresses for more accurate client identification.</p>
</li>
<li><p>Adding metrics and logging to track how often clients are being throttled.</p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build an In-Memory Rate Limiter in Next.js ]]>
                </title>
                <description>
                    <![CDATA[ An API rate limiter is a server-side component of a web service that limits the number of API requests a client can make to an endpoint within a period of time. For example, X (formerly known as Twitt ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-an-in-memory-rate-limiter-in-nextjs/</link>
                <guid isPermaLink="false">696155ea25d7491ccd74da74</guid>
                
                    <category>
                        <![CDATA[ Next.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ratelimit ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Orim Dominic Adah ]]>
                </dc:creator>
                <pubDate>Fri, 09 Jan 2026 19:24:26 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767981990510/95306973-8c9a-435b-936e-ae5476f600de.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>An API rate limiter is a server-side component of a web service that limits the number of API requests a client can make to an endpoint within a period of time. For example, X (formerly known as Twitter) limits the number of tweets that a specific user can make to three hundred every three hours.</p>
<p>Rate limiters enforce the responsible use of APIs by blocking requests that exceed the set usage limits.</p>
<p>By following along with this article, you will:</p>
<ul>
<li><p>Learn how rate limiters work</p>
</li>
<li><p>Build an in-memory rate limiter for a Next.js pages router project</p>
</li>
<li><p>Use Artillery to load test the rate limiter for accuracy and resilience</p>
</li>
</ul>
<h3 id="heading-heres-what-well-cover">Here’s What We’ll Cover:</h3>
<ol>
<li><p><a href="#heading-benefits-of-rate-limiters">Benefits of Rate Limiters</a></p>
</li>
<li><p><a href="#heading-how-rate-limiters-work">How Rate Limiters Work</a></p>
</li>
<li><p><a href="#heading-rate-limiting-algorithms">Rate Limiting Algorithms</a></p>
</li>
<li><p><a href="#heading-how-to-build-an-in-memory-rate-limiter">How to Build an In-Memory Rate Limiter</a></p>
<ul>
<li><p><a href="#heading-how-the-in-memory-rate-limiter-works">How The In-Memory Rate Limiter Works</a></p>
</li>
<li><p><a href="#heading-the-request-handler">The Request Handler</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-the-front-end">The Front End</a></p>
</li>
<li><p><a href="#heading-how-to-load-test-the-rate-limiter-for-resilience-with-artillery">How to Load Test the Rate Limiter for Resilience with Artillery</a></p>
<ul>
<li><p><a href="#heading-the-load-test-configuration">The Load Test Configuration</a></p>
</li>
<li><p><a href="#heading-run-the-load-test">Run the Load Test</a></p>
</li>
<li><p><a href="#heading-review-the-results">Review the Results</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<p>To get the most out of this article, you should have experience in building APIs with the Next.js pages router, Express, or any other Node.js backend framework that uses middleware.</p>
<h2 id="heading-benefits-of-rate-limiters">Benefits of Rate Limiters</h2>
<p>Rate limiters control how many requests are allowed within a given time window. They have several benefits you should know about if you’re considering using them.</p>
<p>First, they help prevent the abuse of web servers. Rate limiters guard web servers from overuse that needlessly increases their load. They block excessive requests from Denial of Service (DoS) attacks from bots so that the web service doesn’t crash from unnecessary overload and can continue to be available to legitimate users.</p>
<p>They also help manage the cost of using external APIs. Some API endpoints make requests to external APIs to complete their operations – for example, API endpoints that send emails through an email service provider. When an endpoint relies on paid external APIs and user access to the endpoint is not restricted, excessive usage can drive up costs for the web service. Rate limiters block the excessive usage of endpoints like these, helping to keep costs to a reasonable minimum.</p>
<h2 id="heading-how-rate-limiters-work">How Rate Limiters Work</h2>
<p>Rate limiters work using a three-step mechanism. The process includes tracking requests from specific clients, monitoring their usage, and blocking extra requests once the threshold has been exceeded.</p>
<p>In more detail, rate limiters:</p>
<ul>
<li><p><strong>Track requests</strong>: Rate limiters take note of API clients that make requests and attributes that are specific to the clients (for example, an IP address or a userId). These specific attributes are references or keys that are used to identify clients.</p>
</li>
<li><p><strong>Monitor usage</strong>: Depending on the rate limiting mechanism, rate limiters increase or decrease the metric that is used to determine the threshold of use. For example, within a three-hour time period, X can count the API requests a user makes to the <code>create tweet</code> endpoint.</p>
</li>
<li><p><strong>Ensure threshold compliance</strong>: Rate limiters check the threshold of use for every request made. If it has been exceeded, they block the request from accessing the functionality of the API endpoint and respond with a status code of 429.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767810794741/616acc5a-4df5-4314-ace2-d179b973874d.png" alt="Client-server interaction in a rate-limited endpoint" style="display:block;margin:0 auto" width="1751" height="853" loading="lazy"></li>
</ul>
<h2 id="heading-rate-limiting-algorithms">Rate Limiting Algorithms</h2>
<p>You can implement rate limiting using different algorithms based on the requirements of the rate limiter. Each rate limiting algorithm has its merits and demerits. Below are some popular rate limiting algorithms you can play around with.</p>
<h3 id="heading-fixed-window-algorithm">Fixed Window Algorithm</h3>
<p>In the fixed window rate limiting algorithm, the number of requests made within a fixed time period is tracked, and every request increases the tracked request count. Once the number of requests allowed within the time frame has been reached, any extra request that comes in within the same time frame is blocked. At the end of the time period, the request count is reset to zero and starts increasing again with every request made.</p>
<p>Its mechanism is easy to understand and it’s memory-efficient. Its challenge is that spikes in traffic close to the start or the end of a time window can allow more requests than permitted.</p>
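<p>The boundary problem is easiest to see in a small simulation. The sketch below is written in Python purely for illustration (this tutorial's own code is JavaScript), and <code>FixedWindowCounter</code> is a hypothetical class that takes explicit timestamps instead of reading the clock.</p>

```python
class FixedWindowCounter:
    """Count requests per key within fixed windows of `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = {}  # key -> (window index, request count)

    def allow(self, key, now):
        index = int(now // self.window)
        start, count = self.counts.get(key, (index, 0))
        if start != index:
            # A new window began: reset the count.
            start, count = index, 0
        if count >= self.limit:
            return False
        self.counts[key] = (start, count + 1)
        return True


limiter = FixedWindowCounter(limit=5, window=60)
late = [limiter.allow("user-1", 59.0) for _ in range(5)]   # end of window 0
early = [limiter.allow("user-1", 60.0) for _ in range(5)]  # start of window 1
print(sum(late) + sum(early))  # 10
```

<p>With a limit of 5 requests per 60 seconds, this client still gets 10 requests through within about one second because the count resets at the window boundary.</p>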
<h3 id="heading-sliding-window-algorithm">Sliding Window Algorithm</h3>
<p>The sliding window algorithm fixes the issue with the fixed window algorithm where spikes in traffic close to the start or end of a time window can allow more requests than permitted.</p>
<p>It works as follows:</p>
<ul>
<li><p>It keeps track of the timestamps of requests in a cache.</p>
</li>
<li><p>When there’s a new request, it removes all timestamps that are older than the start of the current time window and it appends the new request’s timestamp to the cache.</p>
</li>
<li><p>If the count of the requests in the cache is higher than the threshold, the request is blocked. Otherwise, it’s allowed.</p>
</li>
</ul>
<p>Although this algorithm is more accurate than the fixed window algorithm, it consumes more memory because of the storage of timestamps.</p>
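<p>The steps above can be sketched in TypeScript as follows. This is a rough illustration, not a production implementation: <code>SlidingWindowLimiter</code> and its parameters are hypothetical names, and the <code>now</code> argument exists only so the behaviour can be checked without waiting for real time to pass.</p>

```typescript
// Illustrative sliding window log. All names here are hypothetical.
class SlidingWindowLimiter {
  private readonly log = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` is allowed at time `now` (in ms).
  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;

    // Remove timestamps that fall outside the current window.
    const recent = (this.log.get(key) ?? []).filter((t) => t > windowStart);

    // Record the new request, then compare the count against the threshold.
    recent.push(now);
    this.log.set(key, recent);

    return recent.length <= this.limit;
  }
}
```

<p>Because the timestamp is appended before the check, blocked requests also count towards the window in this sketch. A client that keeps retrying stays blocked until it slows down. The array of timestamps per key is where the extra memory goes.</p>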
<h3 id="heading-token-bucket-algorithm">Token Bucket Algorithm</h3>
<p>In the token bucket algorithm, a bucket that contains a predefined number of tokens is assigned to a user. Tokens are added to the bucket at a predefined rate. For example, 2 tokens may be added every second.</p>
<p>Once the bucket is full, no more tokens are added. Each request consumes one or more tokens, and if the tokens are exhausted, requests are blocked until the bucket has tokens again.</p>
<p>The Token Bucket algorithm has the benefits of being memory efficient, easy to implement, and accurate enough to block extra requests even during a burst in traffic.</p>
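<p>Here is a small TypeScript sketch of a token bucket for a single user. The names <code>TokenBucket</code>, <code>capacity</code>, <code>refillPerSecond</code>, and <code>tryConsume</code> are assumptions chosen for this example. Instead of running a timer, it tops up the bucket lazily based on the time elapsed since the last call, which is one common way to do it.</p>

```typescript
// Illustrative token bucket for one user. All names here are hypothetical.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // the bucket starts full
    this.lastRefill = now;
  }

  // Tries to take `cost` tokens at time `now` (in ms). Returns false if there are not enough.
  tryConsume(cost: number = 1, now: number = Date.now()): boolean {
    // Add the tokens earned since the last call, never exceeding the capacity.
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;

    if (this.tokens < cost) {
      return false;
    }

    this.tokens -= cost;
    return true;
  }
}
```

<p>A bucket with a capacity of 4 and a refill rate of 2 tokens per second allows a burst of 4 requests, blocks the fifth, and allows one more request half a second later.</p>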
<p>In this tutorial, we’ll use the fixed window algorithm to build a rate limiter. We’ll also battle-test it for resilience and accuracy using Artillery.</p>
<h2 id="heading-how-to-build-an-in-memory-rate-limiter">How to Build an In-Memory Rate Limiter</h2>
<p>If you’re a backend developer, you may have noticed that users sometimes abuse the reset password API endpoint in your Next.js application. This is a cause for concern because the API endpoint makes a request to your email service provider to send an email and you get charged for it.</p>
<p>Because of this, you may want to limit the requests that users make to this endpoint so that you can prevent the abuse of the API and save costs. And that’s where a rate limiter comes in.</p>
<p>You can get the <a href="https://github.com/orimdominic/nextjs-pages-router-rate-limiter">code for this tutorial here</a>. You can clone it, install the dependencies with <code>npm install</code>, and run it following the instructions in the <a href="https://github.com/orimdominic/nextjs-app-router-rate-limiter/blob/main/README.md">README file</a>. You’ll need it to follow along with the rest of this article.</p>
<p>I built the project using Next.js and it uses the pages router. I’ve also built the rate limiter and <a href="https://github.com/orimdominic/nextjs-app-router-rate-limiter/blob/main/src/lib/server/rate-limiter.ts">you can find it here</a>. You can see how to use it in the <a href="https://github.com/orimdominic/nextjs-app-router-rate-limiter/blob/main/src/pages/api/reset-password-init.ts">reset password API endpoint here</a>.</p>
<p>It has a user interface that you can use to test the rate limiter – but let’s dive into the code first.</p>
<h3 id="heading-how-the-in-memory-rate-limiter-works">How The In-Memory Rate Limiter Works</h3>
<p>To help you better understand the rate limiter, I've created this diagram. We'll walk through what's happening after:</p>
<img src="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/d453743c-5659-4ae1-870f-ea6117696cef.png" alt="A flow diagram for how the in-memory rate limiter works" style="display:block;margin:0 auto" width="1764" height="1652" loading="lazy">

<p>The <a href="https://github.com/orimdominic/nextjs-app-router-rate-limiter/blob/main/src/lib/server/rate-limiter.ts">src/lib/server/rate-limiter.ts</a> file exports a function called <code>applyRateLimiter</code> which accepts three parameters:</p>
<ul>
<li><p>the request object</p>
</li>
<li><p>the response object</p>
</li>
<li><p><code>getOptsFn</code></p>
</li>
</ul>
<p><code>getOptsFn</code> is a function that accepts the request object and, when executed, returns properties specific to the request for tracking, monitoring, and blocking by the rate limiter. <code>getOptsFn</code> is a function and not a static object so that the specific properties of a request can be dynamically created by the request handler for each request.</p>
<p><a href="https://github.com/orimdominic/nextjs-app-router-rate-limiter/blob/main/src/lib/server/rate-limiter.ts">src/lib/server/rate-limiter.ts</a> also has an in-memory map called <code>cache</code>. <code>cache</code> stores the key (or unique identifier) of a request and maps it to its usage. An interval runs every minute to remove keys whose <code>expiresAt</code> values have passed from the cache. This helps to manage the amount of memory used by the cache.</p>
<pre><code class="language-typescript">type GetOptionsFn = (req: NextApiRequest) =&gt; {
  key: string;
  maxTries: number;
  expiresAt: Date;
};

const cache = new Map&lt;string, Usage&gt;();

// clear stale keys from cache every minute
setInterval(() =&gt; {
  const currentDate = new Date();
  for (const [key, usage] of cache) {
    if (!usage) continue;

    if (currentDate &gt; usage.expiresAt) {
      cache.delete(key);
    }
  }
}, 60000);
</code></pre>
<p>When the rate limiter is executed, it calls <code>getOptsFn</code> with the request to generate the following values from the request’s content:</p>
<ul>
<li><p><code>key</code>: The unique identifier for the request that can be used to track its usage</p>
</li>
<li><p><code>maxTries</code>: The maximum number of times a request can be made within the specified time window</p>
</li>
<li><p><code>expiresAt</code>: The expiry time of a time window</p>
</li>
</ul>
<pre><code class="language-typescript">  const opts = getOptsFn(req);
  const usage = cache.get(opts.key);

  if (!usage) {
    cache.set(opts.key, {
      tries: 1,
      maxTries: opts.maxTries,
      expiresAt: opts.expiresAt,
    });

    return;
  }
</code></pre>
<p>The rate limiter then checks if the <code>key</code> of the request exists in the cache. If it doesn’t, it sets it in the cache, mapping it to the following values:</p>
<ul>
<li><p><code>tries</code>: The number of times that the request has been made without being blocked</p>
</li>
<li><p><code>maxTries</code>: The maximum number of times that the request should be allowed within the time window without blocking</p>
</li>
<li><p><code>expiresAt</code>: The expiry time of the time window</p>
</li>
</ul>
<p>It also allows the request to continue by exiting the rate limiter through the <code>return</code> statement. The values set in <code>cache</code> will be used to determine if and when consecutive requests with the same key should be blocked or not.</p>
<p>If the request’s key exists in <code>cache</code>, the rate limiter checks if the number of unblocked tries (<code>usage.tries</code>) from <code>cache</code> is less than the number of allowed usage tries (<code>usage.maxTries</code>). If it evaluates to <code>true</code>, it means that the request has not exceeded its maximum tries. It also checks if the expiry time of the time window stored in <code>cache</code> for the request has elapsed.</p>
<p>The request is not blocked if one of the following conditions evaluates to <code>true</code>:</p>
<ul>
<li><p>the request has not exceeded its maximum tries AND its time window has not elapsed</p>
</li>
<li><p>the current time window of the request usage in cache (<code>usage.expiresAt</code>) has elapsed</p>
</li>
</ul>
<pre><code class="language-typescript">  const currentDate = new Date();
  const retryAfter = usage.expiresAt.getTime() - currentDate.getTime();
  const timeWindowHasElapsed = retryAfter &lt; 0;
  const canProceed = usage.tries &lt; usage.maxTries &amp;&amp; !timeWindowHasElapsed;

  if (canProceed) {
    cache.set(opts.key, {
      ...usage,
      tries: usage.tries + 1,
    });

    return;
  }

  if (timeWindowHasElapsed) { // if usage.expiresAt has elapsed
    cache.set(opts.key, {
      tries: 1,
      maxTries: opts.maxTries,
      expiresAt: opts.expiresAt,
    });

    return;
  }
</code></pre>
<p>If <code>canProceed</code> is truthy, the rate limiter increases the number of tries (<code>usage.tries</code>) that the request has in the cache and then allows the request to proceed by exiting the rate limiter using the <code>return</code> statement. If <code>timeWindowHasElapsed</code> is truthy, the rate limiter resets the usage of the request in the cache using the values returned by <code>getOptsFn</code> and then allows the request to proceed. If both conditions are falsy, the request is blocked with a 429 response status code.</p>
<pre><code class="language-typescript">  res.setHeader("Retry-After", retryAfter/1000);
  return res.status(429).json({
    error: { message: "Too many requests" },
  });
</code></pre>
<p>According to the HTTP specification (RFC 6585), a 429 response may include a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Retry-After">Retry-After</a> header to let clients know how long to wait before making a new request. The value of the <code>Retry-After</code> header was calculated beforehand as the time left in the current window, converted from milliseconds to seconds, and is set on the response object using <code>res.setHeader</code>.</p>
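<p>To see the whole decision in one place, here is a self-contained sketch that follows the walkthrough above. It is not the code from the repository: <code>decide</code>, <code>Options</code>, and <code>Decision</code> are hypothetical names, and instead of writing to a response object it returns a value so that you can run it without Next.js. It reads <code>maxTries</code> from the cached usage, and rounding the wait up to whole seconds is a choice made for this sketch.</p>

```typescript
// Illustrative, framework-free version of the rate limiter decision. Names are hypothetical.
type Usage = { tries: number; maxTries: number; expiresAt: Date };
type Options = { key: string; maxTries: number; expiresAt: Date };
type Decision =
  | { allowed: true }
  | { allowed: false; retryAfterSeconds: number };

function decide(cache: Map<string, Usage>, opts: Options, now: Date = new Date()): Decision {
  const usage = cache.get(opts.key);

  // Starts (or restarts) a time window for this key and lets the request through.
  const startWindow = (): Decision => {
    cache.set(opts.key, { tries: 1, maxTries: opts.maxTries, expiresAt: opts.expiresAt });
    return { allowed: true };
  };

  // First request for this key.
  if (!usage) return startWindow();

  const retryAfterMs = usage.expiresAt.getTime() - now.getTime();
  const timeWindowHasElapsed = retryAfterMs < 0;

  // The previous window is over, so the usage is reset.
  if (timeWindowHasElapsed) return startWindow();

  // Still inside the window with tries left.
  if (usage.tries < usage.maxTries) {
    cache.set(opts.key, { ...usage, tries: usage.tries + 1 });
    return { allowed: true };
  }

  // Blocked: report how long the client should wait, in whole seconds.
  return { allowed: false, retryAfterSeconds: Math.ceil(retryAfterMs / 1000) };
}
```

<p>A caller would translate a blocked decision into a 429 response and set the <code>Retry-After</code> header from <code>retryAfterSeconds</code>.</p>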
<h3 id="heading-the-request-handler">The Request Handler</h3>
<p>You can find the reset password request handler in <a href="https://github.com/orimdominic/nextjs-app-router-rate-limiter/blob/main/src/pages/api/reset-password-init.ts">src/pages/api/reset-password-init.ts</a>. First, it performs validation checks on the request method and body to ensure that it is fit for its operations. The validation ensures that the request is a POST request and that the request body includes an <code>email</code> property. It ends the request with the appropriate response code if validation fails.</p>
<pre><code class="language-typescript">  if (req.method !== "POST") {
    return res.status(405).json({
      error: { message: "Not allowed" },
    });
  }

  if (!req.body.email || typeof req.body.email !== "string") {
    return res.status(400).json({
      error: { message: "'email' is required" },
    });
  }
</code></pre>
<p><code>generateOptions</code> is the function that is eventually passed as <code>getOptsFn</code> to the rate limiter. The <code>generateOptions</code> function generates the specific properties of the request for the rate limiter. In the case of this endpoint, the properties are:</p>
<ul>
<li><p><code>key</code>: A string in the format <code>[method].[endpoint].[email]</code>. For an email value of “<a href="mailto:Hello@me.com">Hello@me.com</a>”, the key will be <code>post.reset-password.hello@me.com</code> which will be constant for every request for that email to this endpoint. This key value format makes it unique and specific to this request.</p>
</li>
<li><p><code>expiresAt</code>: The time when the time window expires. If the request is already in the cache, the rate limiter ignores this value and uses the value in the cache instead.</p>
</li>
<li><p><code>maxTries</code>: The maximum number of tries that should be allowed within the time window. If the request is in the rate limiter cache already, this value is ignored in preference of the value in cache.</p>
</li>
</ul>
<pre><code class="language-typescript">  const generateOptions = function (req: NextApiRequest) {
    const now = new Date();
    const inFiveSeconds = new Date(now.getTime() + 5000);

    return {
      expiresAt: inFiveSeconds,
      key: `post.reset-password.${req.body.email.toLowerCase()}`,
      maxTries: 1,
    };
  };
</code></pre>
<p>For the reset password handler, requests are rate limited to one every five seconds. You can tweak the <code>expiresAt</code> and <code>maxTries</code> values to test how it works. <code>applyRateLimiter</code> is executed with its arguments and if it does not block the request, the handler can go on to send the mail and respond to the client.</p>
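<p>If you want to experiment with different limits, one option is to pull the window length and the number of tries out into parameters. The helper below is only a sketch under that assumption: <code>buildResetPasswordOptions</code> and <code>RateLimitOptions</code> are hypothetical names that do not exist in the project, and it takes the email directly instead of the request object so that it can run on its own.</p>

```typescript
// Illustrative helper for trying out different limits. Names are hypothetical.
type RateLimitOptions = { key: string; maxTries: number; expiresAt: Date };

function buildResetPasswordOptions(
  email: string,
  windowMs: number,
  maxTries: number,
  now: Date = new Date(),
): RateLimitOptions {
  return {
    // Same key format as the handler: [method].[endpoint].[email], with the email lowercased
    key: `post.reset-password.${email.toLowerCase()}`,
    maxTries,
    // The window ends `windowMs` milliseconds from now
    expiresAt: new Date(now.getTime() + windowMs),
  };
}

// Example: allow three requests per minute for this email address.
const exampleOptions = buildResetPasswordOptions("Hello@me.com", 60_000, 3);
console.log(exampleOptions.key); // post.reset-password.hello@me.com
```

<p>Lowercasing the email means that <code>Hello@me.com</code> and <code>hello@me.com</code> share one key, so a user cannot get around the limit by changing the letter case.</p>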
<h2 id="heading-the-front-end">The Front End</h2>
<p>You can use the user interface to test the rate limiter manually. After you run <code>npm run dev</code>, visit the URL shown in the terminal (<a href="http://localhost:3000">http://localhost:3000</a> by default). You should see the user interface shown below.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767603425330/e7fd49a8-e8ce-4e76-b5f5-df094a5fa3f1.png" alt="User interface to test the rate limiter manually" style="display:block;margin:0 auto" width="625" height="472" loading="lazy">

<h2 id="heading-how-to-load-test-the-rate-limiter-for-resilience-with-artillery">How to Load Test the Rate Limiter for Resilience with Artillery</h2>
<p><a href="https://www.artillery.io/">Artillery</a> is a tool for testing and reporting how well web applications can perform under heavy load. In this section, you will use Artillery to test how efficient and accurate the rate limiter that you built is.</p>
<p>To use Artillery, install it globally with the <code>npm install -g artillery@latest</code> command so that the <code>artillery</code> command is available in your terminal.</p>
<h3 id="heading-the-load-test-configuration">The Load Test Configuration</h3>
<p>In the <code>loadtest</code> folder located at the root of the project, you will find the <code>setup.yaml</code> file. It contains the instructions for Artillery to use to carry out the load test. The instructions tell Artillery to create virtual users that will make API requests to the application with the base URL identified by <code>target</code> in three phases:</p>
<ul>
<li><p><strong>Warm up</strong>: Make API requests for a duration of ten seconds, starting from one request per second and increasing to five requests per second.</p>
</li>
<li><p><strong>Ramp up</strong>: After warm up, make API requests for a duration of thirty seconds, starting from five requests per second and increasing to ten requests per second.</p>
</li>
<li><p><strong>Spike phase</strong>: After ramp up, make API requests for a duration of twenty seconds, starting from ten requests per second and increasing to thirty requests per second.</p>
</li>
</ul>
<p>This brings the total time of the load test to sixty seconds.</p>
<pre><code class="language-yaml">config:
  target: http://localhost:3000/api

  phases:
    - duration: 10
      arrivalRate: 1
      rampTo: 5
      name: Warm up

    - duration: 30
      arrivalRate: 5
      rampTo: 10
      name: Ramp up

    - duration: 20
      arrivalRate: 10
      rampTo: 30
      name: Spike phase
</code></pre>
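<p>As a quick sanity check on the phases above, you can estimate how long the test runs and roughly how many virtual users it creates. The snippet below is a back-of-the-envelope sketch: the <code>Phase</code> type and both functions are hypothetical, and the estimate assumes a perfectly linear ramp, so the count that Artillery actually reports may differ somewhat.</p>

```typescript
// Back-of-the-envelope estimate for the load test phases. Names are hypothetical.
type Phase = { name: string; duration: number; arrivalRate: number; rampTo: number };

// The same values as in loadtest/setup.yaml
const phases: Phase[] = [
  { name: "Warm up", duration: 10, arrivalRate: 1, rampTo: 5 },
  { name: "Ramp up", duration: 30, arrivalRate: 5, rampTo: 10 },
  { name: "Spike phase", duration: 20, arrivalRate: 10, rampTo: 30 },
];

// Total run time in seconds: the phases run one after the other.
function totalDurationSeconds(list: Phase[]): number {
  return list.reduce((sum, phase) => sum + phase.duration, 0);
}

// For a linear ramp, the average rate is the midpoint of the start and end rates.
function estimatedVirtualUsers(list: Phase[]): number {
  return list.reduce(
    (sum, phase) => sum + ((phase.arrivalRate + phase.rampTo) / 2) * phase.duration,
    0,
  );
}

console.log(totalDurationSeconds(phases)); // 60
console.log(estimatedVirtualUsers(phases)); // 655
```

<p>That is 30 users in the warm up, 225 in the ramp up, and 400 in the spike phase, all hitting an endpoint that is meant to let through one request every five seconds.</p>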
<p>The <a href="https://www.artillery.io/docs/reference/extensions"><code>plugins</code></a> section configures extensions that you can use to analyse the results from Artillery and get reports. For example, the <a href="https://www.artillery.io/docs/reference/extensions/ensure"><code>ensure</code></a> plugin is set up here to report “ok” if at least 99% of the responses have a latency under 100ms and at least 95% have a latency under 75ms.</p>
<pre><code class="language-yaml">  plugins:
    ensure:
      thresholds:
        - http.response_time.p99: 100
        - http.response_time.p95: 75
</code></pre>
<p>The <a href="https://www.artillery.io/docs/reference/extensions/metrics-by-endpoint"><code>metrics-by-endpoint</code></a> plugin (not used in this project) is another Artillery plugin that is used to display response time metrics for each URL in the test.</p>
<p>A <a href="https://www.artillery.io/docs/reference/test-script#scenarios-section"><code>scenario</code></a> is a sequence of steps that describes a virtual user session in the app. Each virtual user created in <code>phases</code> will make an API request to the endpoint in <code>flow</code>, and the requests in the loop will run only once per virtual user (because the loop’s <code>count</code> has a value of 1).</p>
<pre><code class="language-yaml">scenarios:
  - flow:
      - loop:
          - post:
              url: "/reset-password-init"
              headers:
                Content-Type: "application/json"
              json:
                email: "j.doe@email.com"

        count: 1
</code></pre>
<h3 id="heading-run-the-load-test">Run the Load Test</h3>
<p>Make sure that the application is running and run the load test with the command <code>artillery run loadtest/setup.yaml --output loadtest/results.json</code> from the root folder of the project. This will run the load test on the rate-limited endpoint and save the output of the results in <code>loadtest/results.json</code>.</p>
<h3 id="heading-review-the-results">Review the Results</h3>
<p>Regardless of the number of requests made, the setup of our rate limiter allows only one request every five seconds. This means that the number of requests that should be allowed within a space of sixty seconds is twelve.</p>
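<p>You can check that figure with a small simulation. The function below is a simplified model, not the project code: <code>simulateAllowed</code> is a hypothetical name, and it assumes a single key and evenly spaced requests, which real load test traffic only approximates. It applies the same rule as the rate limiter, where a new window starts only after the previous one has strictly elapsed.</p>

```typescript
// Simplified model of the rate limiter under evenly spaced requests. Names are hypothetical.
function simulateAllowed(
  testMs: number,
  intervalMs: number,
  windowMs: number,
  maxTries: number,
): number {
  let allowed = 0;
  let tries = 0;
  let expiresAt: number | null = null; // no window until the first request arrives

  for (let now = 0; now < testMs; now += intervalMs) {
    const timeWindowHasElapsed = expiresAt === null || expiresAt - now < 0;

    if (timeWindowHasElapsed) {
      // Start a new window with this request.
      tries = 1;
      expiresAt = now + windowMs;
      allowed += 1;
    } else if (tries < maxTries) {
      // Still inside the window with tries left.
      tries += 1;
      allowed += 1;
    }
    // Otherwise the request is blocked.
  }

  return allowed;
}

console.log(simulateAllowed(60_000, 100, 5_000, 1)); // 12 (10 requests per second)
console.log(simulateAllowed(60_000, 10, 5_000, 1)); // 12 (100 requests per second)
```

<p>Sending ten times more requests does not change the result. At a much lower rate of one request per second, the same model allows only 10 requests, because each new window has to wait for the next request to arrive after the previous window expires.</p>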
<p>If you take a look at <code>loadtest/results.json</code>, you will see that only twelve requests had a status code of 200. If you increase the values of <code>arrivalRate</code> or <code>rampTo</code> in any or all of the phases to increase the number of requests made to the endpoint and you run the load test again, only twelve requests will still have a status code of 200. This means that our rate limiter remains effective and accurate even under high loads.</p>
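<p>If you would rather not scan the JSON by eye, a few lines of TypeScript can pull out the count. Treat this as a sketch under an assumption: it expects the report to keep per-status totals under <code>aggregate.counters</code> with keys such as <code>http.codes.200</code>, so check your own <code>results.json</code> to confirm the layout. The <code>ArtilleryReport</code> type and <code>countResponsesWithStatus</code> function are hypothetical names.</p>

```typescript
// Illustrative reader for a parsed Artillery report.
// Assumption: status totals live under aggregate.counters["http.codes.<status>"].
type ArtilleryReport = {
  aggregate?: { counters?: Record<string, number> };
};

// Returns how many responses had the given status code, or 0 if the counter is missing.
function countResponsesWithStatus(report: ArtilleryReport, status: number): number {
  return report.aggregate?.counters?.[`http.codes.${status}`] ?? 0;
}
```

<p>You could load the file with <code>JSON.parse</code> and Node’s <code>fs.readFileSync</code>, then call <code>countResponsesWithStatus(report, 200)</code> and <code>countResponsesWithStatus(report, 429)</code> to compare allowed and blocked requests.</p>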
<p>For latency, you should consider the report of the <code>ensure</code> plugin which is logged to the terminal at the end of the test. A result such as:</p>
<pre><code class="language-plaintext">Checks:
ok: http.response_time.p95 &lt; 75
ok: http.response_time.p99 &lt; 100
</code></pre>
<p>means that 95% of all requests made had a latency of less than 75 milliseconds and 99% of all requests made had a latency of less than 100 milliseconds. These are good results.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, you have learned about rate limiters, rate limiting algorithms, and how to build and use an in-memory rate limiter in Next.js.</p>
<p>You also got a brief introduction to load testing with Artillery. Be sure to apply what you have learned in one of your Next.js projects when you need it.</p>
<p>Feel free to <a href="https://www.linkedin.com/in/orimdominicadah/">connect with me on LinkedIn</a> for questions or clarifications. Thank you for reading this far and I hope this helps you achieve what you intended to achieve. Don’t hesitate to share this article if you feel that it would help someone else out there. Cheers!</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
