<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Prosper Ugbovo - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Prosper Ugbovo - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 20:24:00 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/yongdev/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Implement Token Bucket Rate Limiting with FastAPI ]]>
                </title>
                <description>
                    <![CDATA[ APIs power everything from mobile apps to enterprise platforms, quietly handling millions of requests per day. Without safeguards, a single misconfigured client or a burst of automated traffic can overwhelm your service. ]]>
                </description>
                <link>https://www.freecodecamp.org/news/token-bucket-rate-limiting-fastapi/</link>
                <guid isPermaLink="false">69c6f8747cf270651055571c</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ api ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ratelimit ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Prosper Ugbovo ]]>
                </dc:creator>
                <pubDate>Fri, 27 Mar 2026 21:36:52 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/fba3d4a6-faca-429a-8e16-a3e9778d2cf8.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>APIs power everything from mobile apps to enterprise platforms, quietly handling millions of requests per day. Without safeguards, a single misconfigured client or a burst of automated traffic can overwhelm your service, degrading performance for everyone.</p>
<p>Rate limiting prevents this. It controls how many requests a client can make within a given timeframe, protecting your infrastructure from both intentional abuse and accidental overload.</p>
<p>Among the several algorithms used for rate limiting, the <strong>Token Bucket</strong> stands out for its balance of simplicity and flexibility. Unlike fixed window counters that reset abruptly, the Token Bucket allows short bursts of traffic while still enforcing a sustainable long-term rate. This makes it a practical choice for APIs where clients occasionally need to send a quick flurry of requests without being penalized.</p>
<p>In this guide, you'll implement a Token Bucket rate limiter in a FastAPI application. You'll build the algorithm from scratch as a Python class, wire it into FastAPI as middleware with per-user tracking, add standard rate limit headers to your responses, and test everything with a simple script. By the end, you'll have a working rate limiter you can drop into any FastAPI project.</p>
<h3 id="heading-what-well-cover">What we'll cover:</h3>
<ol>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-understanding-the-token-bucket-algorithm">Understanding the Token Bucket Algorithm</a></p>
</li>
<li><p><a href="#heading-setting-up-the-fastapi-project">Setting Up the FastAPI Project</a></p>
</li>
<li><p><a href="#heading-implementing-the-token-bucket-class">Implementing the Token Bucket Class</a></p>
</li>
<li><p><a href="#heading-adding-peruser-rate-limiting-middleware">Adding Per-User Rate Limiting Middleware</a></p>
</li>
<li><p><a href="#heading-testing-the-rate-limiter">Testing the Rate Limiter</a></p>
</li>
<li><p><a href="#heading-where-rate-limiting-fits-in-your-architecture">Where Rate Limiting Fits in Your Architecture</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow this tutorial, you'll need:</p>
<ul>
<li><p><strong>Python 3.9 or later</strong> installed on your machine. You can verify your version by running <code>python --version</code>.</p>
</li>
<li><p><strong>Familiarity with Python</strong> and basic knowledge of how HTTP APIs work.</p>
</li>
<li><p><strong>A text editor</strong> such as VS Code, Vim, or any editor you prefer.</p>
</li>
</ul>
<h2 id="heading-understanding-the-token-bucket-algorithm">Understanding the Token Bucket Algorithm</h2>
<p>Before writing code, it helps to understand the mechanism you'll be building.</p>
<p>The Token Bucket algorithm models rate limiting with two simple concepts: a <strong>bucket</strong> that holds tokens, and a <strong>refill process</strong> that adds tokens at a steady rate.</p>
<p>Here is how it works:</p>
<ol>
<li><p>The bucket starts full, holding a fixed maximum number of tokens (the capacity).</p>
</li>
<li><p>Each incoming request costs one token. If the bucket has tokens available, the request is allowed, and one token is removed.</p>
</li>
<li><p>If the bucket is empty, the request is rejected with a <code>429 Too Many Requests</code> response.</p>
</li>
<li><p>Tokens are added back to the bucket at a constant refill rate, regardless of whether requests are coming in. The bucket never exceeds its maximum capacity.</p>
</li>
</ol>
<p>The capacity determines how large a burst the system absorbs. The refill rate defines the sustained throughput. For example, a bucket with a capacity of 10 and a refill rate of 2 tokens per second lets a client fire 10 requests instantly, but after that it can sustain only 2 requests per second; the burst capacity builds back up during idle periods.</p>
<p>This two-parameter design gives you precise control:</p>
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Controls</th>
<th>Example</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Capacity</strong> (max tokens)</td>
<td>Maximum burst size</td>
<td>10 tokens = 10 requests at once</td>
</tr>
<tr>
<td><strong>Refill rate</strong></td>
<td>Sustained throughput</td>
<td>2 tokens/sec = 2 requests/sec long-term</td>
</tr>
<tr>
<td><strong>Refill interval</strong></td>
<td>Granularity of refill</td>
<td>1.0 sec = tokens added every second</td>
</tr>
</tbody></table>
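<p>You can check the numbers from that example with a short, standalone simulation. The <code>simulate</code> helper below is not part of the project you'll build in this guide; it uses simulated timestamps and a simplified continuous refill (tokens accrue proportionally to elapsed time) rather than the interval-based refill implemented later:</p>
<pre><code class="language-python">def simulate(events, capacity=10, refill_rate=2.0):
    """events: list of (timestamp, n_requests) pairs in time order.
    Returns (allowed, blocked) counts for the whole run."""
    tokens = float(capacity)  # the bucket starts full
    last = 0.0
    allowed = blocked = 0
    for t, n in events:
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        for _ in range(n):
            if tokens >= 1:
                tokens -= 1
                allowed += 1
            else:
                blocked += 1
    return allowed, blocked


# A burst of 15 requests at t=0: the full bucket absorbs 10, rejects 5.
print(simulate([(0.0, 15)]))            # (10, 5)

# Same burst, then 4 more requests one second later: 2 refilled tokens pass.
print(simulate([(0.0, 15), (1.0, 4)]))  # (12, 7)
</code></pre>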
<p>Compared to other rate-limiting algorithms:</p>
<ul>
<li><p><strong>Fixed Window</strong> counters reset at hard boundaries (for example, every minute), which can allow double the intended rate at window edges. The Token Bucket has no such boundary.</p>
</li>
<li><p><strong>Sliding Window</strong> counters are more accurate but more complex to implement and maintain.</p>
</li>
<li><p><strong>Leaky Bucket</strong> processes requests at a fixed rate and queues the rest. The Token Bucket is similar, but allows bursts instead of forcing a constant pace.</p>
</li>
</ul>
<p>The Token Bucket is widely used in production systems; AWS API Gateway and Stripe, among others, use variations of it.</p>
<h2 id="heading-setting-up-the-fastapi-project">Setting Up the FastAPI Project</h2>
<p>Create a project directory and install the dependencies:</p>
<pre><code class="language-shell">mkdir fastapi-ratelimit &amp;&amp; cd fastapi-ratelimit
</code></pre>
<p>Create and activate a virtual environment:</p>
<pre><code class="language-shell">python -m venv venv
</code></pre>
<p>On Linux/macOS:</p>
<pre><code class="language-shell">source venv/bin/activate
</code></pre>
<p>On Windows:</p>
<pre><code class="language-shell">venv\Scripts\activate
</code></pre>
<p>Install FastAPI and Uvicorn:</p>
<pre><code class="language-shell">pip install fastapi uvicorn
</code></pre>
<p>Create the project file structure:</p>
<pre><code class="language-plaintext">fastapi-ratelimit/
├── main.py
└── ratelimiter.py
</code></pre>
<p>Create <code>main.py</code> with a minimal FastAPI application:</p>
<pre><code class="language-python">from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello, world!"}
</code></pre>
<p>Start the server to verify the setup:</p>
<pre><code class="language-shell">uvicorn main:app --reload
</code></pre>
<p>You should see output similar to:</p>
<pre><code class="language-plaintext">INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process
</code></pre>
<p>Open <a href="http://127.0.0.1:8000">http://127.0.0.1:8000</a> in your browser, or run <code>curl http://127.0.0.1:8000</code>. You should receive:</p>
<pre><code class="language-json">{"message": "Hello, world!"}
</code></pre>
<p>With the project running, you can move on to building the rate limiter.</p>
<h2 id="heading-implementing-the-token-bucket-class">Implementing the Token Bucket Class</h2>
<p>Open <code>ratelimiter.py</code> in your editor and add the following code. This class implements the Token Bucket algorithm with thread-safe operations:</p>
<pre><code class="language-python">import time
import threading


class TokenBucket:
    """
    Token Bucket rate limiter.

    Each bucket starts full at `max_tokens` and refills `refill_rate`
    tokens every `interval` seconds, up to the maximum capacity.
    """

    def __init__(self, max_tokens: int, refill_rate: int, interval: float):
        """
        Initialize a new Token Bucket.

        :param max_tokens: Maximum number of tokens the bucket can hold (burst capacity).
        :param refill_rate: Number of tokens added per refill interval.
        :param interval: Time in seconds between refills.
        """
        assert max_tokens &gt; 0, "max_tokens must be positive"
        assert refill_rate &gt; 0, "refill_rate must be positive"
        assert interval &gt; 0, "interval must be positive"

        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.interval = interval

        self.tokens = max_tokens
        self.refilled_at = time.time()
        self.lock = threading.Lock()

    def _refill(self):
        """Add tokens based on elapsed time since the last refill."""
        now = time.time()
        elapsed = now - self.refilled_at

        if elapsed &gt;= self.interval:
            num_refills = int(elapsed // self.interval)
            self.tokens = min(
                self.max_tokens,
                self.tokens + num_refills * self.refill_rate
            )
            # Advance the timestamp by the number of full intervals consumed,
            # not to `now`, so partial intervals aren't lost.
            self.refilled_at += num_refills * self.interval

    def allow_request(self, tokens: int = 1) -&gt; bool:
        """
        Attempt to consume `tokens` from the bucket.

        Returns True if the request is allowed, False if the bucket
        does not have enough tokens.
        """
        with self.lock:
            self._refill()

            if self.tokens &gt;= tokens:
                self.tokens -= tokens
                return True
            return False

    def get_remaining(self) -&gt; int:
        """Return the current number of available tokens."""
        with self.lock:
            self._refill()
            return self.tokens

    def get_reset_time(self) -&gt; float:
        """Return the Unix timestamp when the next refill occurs."""
        with self.lock:
            return self.refilled_at + self.interval
</code></pre>
<p>The class has three public methods:</p>
<ul>
<li><p><code>allow_request()</code> is the core method. It refills tokens based on elapsed time, then tries to consume one. It returns <code>True</code> if the request is allowed, <code>False</code> if the bucket is empty.</p>
</li>
<li><p><code>get_remaining()</code> returns the number of tokens the client has left. You will use this for response headers.</p>
</li>
<li><p><code>get_reset_time()</code> returns when the next token will be added. This is also exposed in response headers.</p>
</li>
</ul>
<p>The <code>threading.Lock</code> ensures that concurrent requests don't create race conditions when reading or modifying the token count. This is important because FastAPI runs request handlers concurrently.</p>
<p><strong>Note:</strong> This implementation stores bucket state in memory. If you restart the server, all buckets reset. For persistence across restarts or multiple server instances, you would store token counts in Redis or a similar external store. The in-memory approach is sufficient for single-instance deployments.</p>
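<p>If you want to experiment with the algorithm before wiring it into FastAPI, a condensed variant with an injectable clock makes the behavior easy to observe deterministically. <code>MiniBucket</code> below is a stripped-down stand-in for quick exploration, not the production class above (no locking, no docstrings on parameters), but the refill arithmetic is the same:</p>
<pre><code class="language-python">class MiniBucket:
    """Condensed TokenBucket with an injectable clock (single-threaded)."""

    def __init__(self, max_tokens, refill_rate, interval, clock):
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.interval = interval
        self.tokens = max_tokens
        self.refilled_at = clock()
        self.clock = clock

    def allow_request(self):
        # Same refill logic as TokenBucket._refill, inlined.
        elapsed = self.clock() - self.refilled_at
        if elapsed >= self.interval:
            num_refills = int(elapsed // self.interval)
            self.tokens = min(self.max_tokens,
                              self.tokens + num_refills * self.refill_rate)
            self.refilled_at += num_refills * self.interval
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


now = [0.0]  # a mutable "clock" we can advance by hand
bucket = MiniBucket(max_tokens=3, refill_rate=1, interval=1.0, clock=lambda: now[0])

print([bucket.allow_request() for _ in range(4)])  # [True, True, True, False]
now[0] = 2.0  # two intervals pass: two tokens come back
print([bucket.allow_request() for _ in range(3)])  # [True, True, False]
</code></pre>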
<h2 id="heading-adding-per-user-rate-limiting-middleware">Adding Per-User Rate Limiting Middleware</h2>
<p>A single global bucket would throttle all users together. One heavy user could exhaust the limit for everyone. Instead, you'll assign a separate bucket to each user, identified by their IP address.</p>
<p>Add the following to <code>ratelimiter.py</code>, below the <code>TokenBucket</code> class:</p>
<pre><code class="language-python">from collections import defaultdict


class RateLimiterStore:
    """
    Manages per-user Token Buckets.

    Each unique client key (e.g., IP address) gets its own bucket
    with identical parameters.
    """

    def __init__(self, max_tokens: int, refill_rate: int, interval: float):
        self.max_tokens = max_tokens
        self.refill_rate = refill_rate
        self.interval = interval
        self._buckets: dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def get_bucket(self, key: str) -&gt; TokenBucket:
        """
        Return the TokenBucket for a given client key.
        Creates a new bucket if one does not exist yet.
        """
        with self._lock:
            if key not in self._buckets:
                self._buckets[key] = TokenBucket(
                    max_tokens=self.max_tokens,
                    refill_rate=self.refill_rate,
                    interval=self.interval,
                )
            return self._buckets[key]
</code></pre>
<p>Now open <code>main.py</code> and replace its contents with the full application, including the rate-limiting middleware:</p>
<pre><code class="language-python">import time

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from ratelimiter import RateLimiterStore

app = FastAPI()

# Configure rate limits: 10 requests burst, 2 tokens added every 1 second.
limiter = RateLimiterStore(max_tokens=10, refill_rate=2, interval=1.0)


@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    """
    Middleware that enforces per-IP rate limiting on every request.
    Adds standard rate limit headers to every response.
    """
    # Identify the client by IP address.
    client_ip = request.client.host
    bucket = limiter.get_bucket(client_ip)

    # Check if the client has tokens available.
    if not bucket.allow_request():
        retry_after = bucket.get_reset_time() - time.time()
        return JSONResponse(
            status_code=429,
            content={"detail": "Too many requests. Try again later."},
            headers={
                "Retry-After": str(max(1, int(retry_after))),
                "X-RateLimit-Limit": str(bucket.max_tokens),
                "X-RateLimit-Remaining": str(bucket.get_remaining()),
                "X-RateLimit-Reset": str(int(bucket.get_reset_time())),
            },
        )

    # Request is allowed. Process it and add rate limit headers to the response.
    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(bucket.max_tokens)
    response.headers["X-RateLimit-Remaining"] = str(bucket.get_remaining())
    response.headers["X-RateLimit-Reset"] = str(int(bucket.get_reset_time()))
    return response


@app.get("/")
async def root():
    return {"message": "Hello, world!"}


@app.get("/data")
async def get_data():
    return {"data": "Some important information"}


@app.get("/health")
async def health():
    return {"status": "ok"}
</code></pre>
<p>The middleware does the following on every incoming request:</p>
<ol>
<li><p>Extracts the client's IP address from <code>request.client.host</code>.</p>
</li>
<li><p>Retrieves (or creates) that client's Token Bucket from the store.</p>
</li>
<li><p>Calls <code>allow_request()</code>. If the bucket is empty, it returns a <code>429</code> response with a <code>Retry-After</code> header telling the client how long to wait.</p>
</li>
<li><p>If tokens are available, it processes the request normally and attaches rate limit headers to the response.</p>
</li>
</ol>
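<p>One caveat on step 1: if your app runs behind a reverse proxy or load balancer, <code>request.client.host</code> will be the proxy's address, so every user would share a single bucket. A common fix, assuming you trust your proxy to set it, is to prefer the first hop of the <code>X-Forwarded-For</code> header. Sketched as a plain function (the name <code>client_key</code> is illustrative, not part of the code above):</p>
<pre><code class="language-python">def client_key(client_host, headers):
    """Pick a rate-limit key: the first X-Forwarded-For hop when the
    header is present (set by a trusted proxy), else the direct peer."""
    forwarded = headers.get("x-forwarded-for")
    if forwarded:
        return forwarded.split(",")[0].strip()
    return client_host


print(client_key("10.0.0.5", {"x-forwarded-for": "203.0.113.7, 10.0.0.5"}))  # 203.0.113.7
print(client_key("127.0.0.1", {}))                                           # 127.0.0.1
</code></pre>
<p>Only trust <code>X-Forwarded-For</code> when your own infrastructure sets it; clients can forge the header if it reaches your app directly.</p>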
<p>The three <code>X-RateLimit-*</code> headers follow a <a href="https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/">widely adopted convention</a>:</p>
<table>
<thead>
<tr>
<th>Header</th>
<th>Meaning</th>
</tr>
</thead>
<tbody><tr>
<td><code>X-RateLimit-Limit</code></td>
<td>Maximum burst capacity (max tokens)</td>
</tr>
<tr>
<td><code>X-RateLimit-Remaining</code></td>
<td>Tokens left in the current bucket</td>
</tr>
<tr>
<td><code>X-RateLimit-Reset</code></td>
<td>Unix timestamp when the next refill occurs</td>
</tr>
</tbody></table>
<p>These headers allow well-behaved clients to self-throttle before hitting the limit.</p>
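<p>For example, a polite client could compute a wait time from those headers before its next call. The helper below is a hypothetical client-side sketch (not part of the server project); it only does arithmetic on the header values:</p>
<pre><code class="language-python">import time


def seconds_until_allowed(remaining, reset_ts, now=None):
    """Given X-RateLimit-Remaining and X-RateLimit-Reset header values,
    return how long to sleep before the next request (0 if tokens remain)."""
    now = time.time() if now is None else now
    if int(remaining) > 0:
        return 0.0
    return max(0.0, float(reset_ts) - now)


# With fixed timestamps: bucket empty, next refill at t=105, current time t=100.
print(seconds_until_allowed("0", "105", now=100.0))  # 5.0
print(seconds_until_allowed("3", "105", now=100.0))  # 0.0
</code></pre>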
<h2 id="heading-testing-the-rate-limiter">Testing the Rate Limiter</h2>
<p>Start the server if it's not already running:</p>
<pre><code class="language-shell">uvicorn main:app --reload
</code></pre>
<h3 id="heading-manual-testing-with-curl">Manual Testing with curl</h3>
<p>Manual testing with <code>curl</code> is useful during development when you want to quickly verify that your middleware is working. A single request lets you confirm that the rate limit headers are present, the values are correct, and one token is consumed as expected.</p>
<p>This approach is fast and requires no additional setup, making it ideal for spot-checking your configuration after making changes.</p>
<p>Send a single request and inspect the response:</p>
<pre><code class="language-shell">curl -i http://127.0.0.1:8000/data
</code></pre>
<p>You should see a <code>200</code> response with headers like:</p>
<pre><code class="language-plaintext">HTTP/1.1 200 OK
x-ratelimit-limit: 10
x-ratelimit-remaining: 9
x-ratelimit-reset: 1739836801
</code></pre>
<h3 id="heading-automated-burst-test">Automated Burst Test</h3>
<p>While <code>curl</code> confirms that the rate limiter is active, it can't verify that the limiter actually blocks requests when the bucket is empty. For that, you need to send requests faster than the refill rate and observe the <code>429</code> responses. An automated burst test is essential before deploying to production, after changing your bucket parameters, or when you need to verify both the blocking and refill behavior.</p>
<p>Create a file called <code>test_ratelimit.py</code> in your project directory:</p>
<pre><code class="language-python">import requests
import time


def test_burst():
    """Send 15 rapid requests to trigger the rate limit."""
    url = "http://127.0.0.1:8000/data"
    results = []

    for i in range(15):
        response = requests.get(url)
        remaining = response.headers.get("X-RateLimit-Remaining", "N/A")
        results.append((i + 1, response.status_code, remaining))
        print(f"Request {i+1:2d} | Status: {response.status_code} | Remaining: {remaining}")

    print()

    allowed = sum(1 for _, status, _ in results if status == 200)
    blocked = sum(1 for _, status, _ in results if status == 429)
    print(f"Allowed: {allowed}, Blocked: {blocked}")


def test_refill():
    """Exhaust tokens, wait for a refill, then confirm requests succeed again."""
    url = "http://127.0.0.1:8000/data"

    print("\n--- Exhausting tokens ---")
    for i in range(12):
        response = requests.get(url)
        print(f"Request {i+1:2d} | Status: {response.status_code}")

    print("\n--- Waiting 3 seconds for refill ---")
    time.sleep(3)

    print("\n--- Sending requests after refill ---")
    for i in range(5):
        response = requests.get(url)
        remaining = response.headers.get("X-RateLimit-Remaining", "N/A")
        print(f"Request {i+1:2d} | Status: {response.status_code} | Remaining: {remaining}")


if __name__ == "__main__":
    print("=== Burst Test ===")
    test_burst()

    # Allow bucket to refill before next test
    time.sleep(6)

    print("\n=== Refill Test ===")
    test_refill()
</code></pre>
<p>Install the <code>requests</code> library if you don't have it:</p>
<pre><code class="language-shell">pip install requests
</code></pre>
<p>Run the test:</p>
<pre><code class="language-shell">python test_ratelimit.py
</code></pre>
<p>You should see output similar to:</p>
<pre><code class="language-plaintext">=== Burst Test ===
Request  1 | Status: 200 | Remaining: 9
Request  2 | Status: 200 | Remaining: 8
Request  3 | Status: 200 | Remaining: 7
...
Request 10 | Status: 200 | Remaining: 0
Request 11 | Status: 429 | Remaining: 0
Request 12 | Status: 429 | Remaining: 0
...
Request 15 | Status: 429 | Remaining: 0

Allowed: 10, Blocked: 5
</code></pre>
<p>The first 10 requests succeed (one token each from the full bucket). Requests 11 through 15 are rejected because the bucket is empty. The refill test then confirms that after waiting, tokens reappear and requests succeed again.</p>
<p><strong>Note:</strong> The exact split between allowed and blocked requests may vary slightly due to timing. Tokens may refill between rapid requests. This is expected behavior.</p>
<h2 id="heading-where-rate-limiting-fits-in-your-architecture">Where Rate Limiting Fits in Your Architecture</h2>
<p>The implementation in this tutorial runs inside your application process, which is the simplest approach and works well for single-instance deployments. In larger systems, rate limiting typically appears at multiple layers:</p>
<ul>
<li><p><strong>API gateway level</strong> (NGINX, Kong, Traefik, Envoy): A coarse global rate limit applied to all traffic before it reaches your application. This protects against large-scale abuse and DDoS.</p>
</li>
<li><p><strong>Application level</strong> (this tutorial): Fine-grained per-user or per-endpoint limits inside your service. This is useful for enforcing different quotas on different API tiers.</p>
</li>
<li><p><strong>Both</strong>: Many production systems combine a gateway-level global limiter with an in-app per-user limiter. The gateway catches the flood and the application enforces business rules.</p>
</li>
</ul>
<p>For multi-instance deployments (multiple server processes behind a load balancer), the in-memory <code>RateLimiterStore</code> won't share state across instances. In that case, replace the in-memory dictionary with Redis. The Token Bucket logic stays the same – only the storage layer changes.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this guide, you built a Token Bucket rate limiter from scratch and integrated it into a FastAPI application with per-user tracking and standard rate limit response headers. You also tested the implementation to verify that burst capacity and refill behavior work as expected.</p>
<p>The Token Bucket algorithm gives you two straightforward controls: capacity for burst tolerance, and refill rate for sustained throughput. Together, these cover the vast majority of rate-limiting needs.</p>
<p>From here, you can extend this foundation by:</p>
<ul>
<li><p>Replacing the in-memory store with Redis for multi-instance deployments.</p>
</li>
<li><p>Applying different rate limits per endpoint by creating separate <code>RateLimiterStore</code> instances.</p>
</li>
<li><p>Using authenticated user IDs instead of IP addresses for more accurate client identification.</p>
</li>
<li><p>Adding metrics and logging to track how often clients are being throttled.</p>
</li>
</ul>
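<p>The per-endpoint idea from the list above can be as simple as mapping path prefixes to separate limit configurations and picking one in the middleware before fetching the client's bucket. The prefixes and numbers here are illustrative, not a recommendation:</p>
<pre><code class="language-python"># Hypothetical per-endpoint limits: strict for auth, generous for reads.
LIMITS = {
    "/auth": {"max_tokens": 5, "refill_rate": 1},
    "/data": {"max_tokens": 50, "refill_rate": 10},
}
DEFAULT = {"max_tokens": 10, "refill_rate": 2}


def pick_limits(path):
    """Return the limit config for the first matching path prefix."""
    for prefix, config in LIMITS.items():
        if path.startswith(prefix):
            return config
    return DEFAULT


print(pick_limits("/auth/login"))  # {'max_tokens': 5, 'refill_rate': 1}
print(pick_limits("/health"))      # {'max_tokens': 10, 'refill_rate': 2}
</code></pre>
<p>In the middleware, you would then keep one <code>RateLimiterStore</code> per configuration and key each client's bucket by something like <code>(client_ip, prefix)</code>.</p>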
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
