cache - freeCodeCamp.org

How to Cache Golang API Responses for High Performance

Temitope Oyedele — Wed, 15 Oct 2025 10:27:00 +0000

Go makes it easy to build APIs that are fast out of the box. But as usage grows, speed at the language level is not enough. If every request keeps hitting the database, crunching the same data, or serializing the same JSON over and over, latency creeps up and throughput suffers. Caching is the tool that keeps performance high by storing work that has already been done so that future requests can reuse it instantly. Let’s look at four practical ways to cache APIs in Go, each explained with an analogy and backed by simple code you can adapt.

Response Caching with Local and Redis Storage
Database Query Result Caching
HTTP Caching with ETag and Cache-Control
Stale-While-Revalidate with Background Refresh
Wrapping Up

Response Caching with Local and Redis Storage

When the process of generating an API response becomes expensive, the fastest solution is to store the entire response. Think of a coffee shop during the morning rush. If every customer orders the same latte, the barista could grind beans and steam milk for each order, but the line would move slowly. A smarter move is to brew a pot once and pour from it repeatedly. To handle both speed and scale, the shop keeps a small pot at the counter for instant pours and a larger urn in the back for refills. In software terms, the counter pot is a local in-memory cache such as Ristretto or BigCache, and the urn is Redis, which allows multiple API servers to share the same cached responses.

In Go, this two-tier setup usually follows a cache-aside pattern: look in local memory first, fall back to Redis if needed, and only compute the result when both layers miss. Once computed, the value is saved in Redis for everyone and in memory for immediate reuse on the next call.

val, ok := local.Get(key) // assumes the local cache stores string values
if !ok {
    var err error
    val, err = rdb.Get(ctx, key).Result()
    if err != nil {
        // redis.Nil means a miss; any other Redis error also falls back to computing
        val = computeResponse() // expensive DB or logic
        _ = rdb.Set(ctx, key, val, 60*time.Second).Err()
    }
    local.Set(key, val, 1)
}
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(val))

In the code above, the first attempt is to retrieve the response from the local cache, which returns instantly if the key exists. If it is not found, the code queries Redis as the second layer. If Redis also returns nothing, the expensive computation runs and its result is stored in Redis with a sixty-second expiration so other servers can access it, then placed in the local cache for immediate reuse. Finally, the response is written back to the client as JSON.

This gives you the best of both worlds: lightning-fast responses for repeat calls and a consistent cache across all your API servers.

Database Query Result Caching

Sometimes the API itself is simple but the real cost hides in the database. Imagine a newsroom waiting for election results. If every editor keeps calling the counting office for the same numbers, the phone lines may jam. Instead, one reporter calls once, writes the result on a board, and every editor copies from there. The board is the cache, and it saves both time and pressure on the office.

In Go, you can apply the same principle by caching query results. Rather than hitting the database for each identical request, you store the result in Redis with a key that represents the query intent. When the next request comes in, you pull from Redis, skip the database, and respond faster.

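key := fmt.Sprintf("q:UserByID:%d", id)
if b, err := rdb.Get(ctx, key).Bytes(); err == nil {
    var u User
    // Treat a corrupt cache entry as a miss instead of returning a zero-value User
    if json.Unmarshal(b, &u) == nil {
        return u, nil
    }
}

u, err := repo.GetUser(ctx, id) // real DB call
if err != nil {
    return User{}, err // do not cache failed lookups
}
if bb, err := json.Marshal(u); err == nil {
    _ = rdb.Set(ctx, key, bb, 2*time.Minute).Err()
}
return u, nil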

Here, the code constructs a cache key that uniquely identifies the query using the user ID, then attempts to fetch the serialized result from Redis. If the key exists, it deserializes the bytes back into a User struct and returns immediately without touching the database. On a cache miss, it executes the actual database query through the repository, serializes the User object to JSON, stores it in Redis with a two-minute expiration, and returns the result.

This pattern dramatically reduces database load and response time for read-heavy APIs, but you must remember to clear or refresh entries when data changes, or set short time-to-live values to keep results reasonably fresh.
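The advice about clearing entries when data changes can be illustrated with a small write path. This is a sketch under stated assumptions: UserService, NewUserService, and UpdateName are hypothetical names, and plain maps stand in for both the database and Redis, so it is not safe for concurrent use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// UserService pairs a stand-in database with a stand-in cache. Both are plain
// maps here (hypothetical) in place of a SQL database and Redis.
type UserService struct {
	db      map[int]User
	cache   map[string][]byte
	dbReads int // counts how often the "database" is actually queried
}

func NewUserService(db map[int]User) *UserService {
	return &UserService{db: db, cache: map[string][]byte{}}
}

func userKey(id int) string { return fmt.Sprintf("q:UserByID:%d", id) }

// GetUser is the read path: cache first, database on a miss.
func (s *UserService) GetUser(id int) (User, error) {
	if b, ok := s.cache[userKey(id)]; ok {
		var u User
		if json.Unmarshal(b, &u) == nil {
			return u, nil
		}
	}
	s.dbReads++
	u, ok := s.db[id]
	if !ok {
		return User{}, fmt.Errorf("user %d not found", id)
	}
	if b, err := json.Marshal(u); err == nil {
		s.cache[userKey(id)] = b
	}
	return u, nil
}

// UpdateName is the write path: change the database, then drop the cached
// entry so the next read repopulates it with current data.
func (s *UserService) UpdateName(id int, name string) {
	s.db[id] = User{ID: id, Name: name}
	delete(s.cache, userKey(id))
}

func main() {
	s := NewUserService(map[int]User{1: {ID: 1, Name: "Ada"}})
	u, _ := s.GetUser(1) // miss: reads the database
	u, _ = s.GetUser(1)  // hit: served from the cache
	fmt.Println(u.Name, s.dbReads) // Ada 1

	s.UpdateName(1, "Grace")
	u, _ = s.GetUser(1) // miss again after invalidation
	fmt.Println(u.Name, s.dbReads) // Grace 2
}
```

Deleting the key rather than overwriting it keeps the write path simple and leaves serialization in one place, at the cost of one extra database read after each update.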

HTTP Caching with ETag and Cache-Control

Not all caching has to happen inside the server. The HTTP standard already provides tools that let clients or CDNs reuse responses. By setting headers like ETag and Cache-Control, you can tell the client whether the response has changed. If nothing is new, the client keeps its own copy and the server only sends a lightweight 304 response.

It is similar to a manager posting notices on an office board. Each sheet carries a small stamp. Employees compare the stamp against the one they already have. If it matches, they know their copy is still valid and skip taking a new one. Only when the stamp changes do they replace it.

In Go this is straightforward. Compute an ETag from the response body, compare it with what the client sends, and decide whether to return the full payload or just the 304.

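etag := computeETag(responseBytes)
// Send the validators on both 200 and 304 so caches can update their stored metadata
w.Header().Set("ETag", etag)
w.Header().Set("Cache-Control", "public, max-age=60")

// Exact match only; a full implementation would also handle lists and weak validators
if match := r.Header.Get("If-None-Match"); match == etag {
    w.WriteHeader(http.StatusNotModified)
    return
}

w.Write(responseBytes)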

The code above generates an ETag, which is a fingerprint or hash of the response content, then checks if the client sent an If-None-Match header with a matching ETag from a previous request. If the ETags match, the content hasn't changed, so the server responds with a 304 Not Modified status and sends no body, saving bandwidth. When the ETags don't match or the client has no cached version, the server attaches the new ETag and a Cache-Control header that allows public caching for sixty seconds, then sends the full response.

This approach reduces bandwidth, lowers CPU usage, and pairs well with CDNs that can cache and serve responses directly.

Stale-While-Revalidate with Background Refresh

There are cases where serving slightly old data is acceptable if it keeps the API fast. Stock dashboards, analytics summaries, or feed endpoints often fit this model. Instead of making users wait for fresh data on every request, you can serve the cached value immediately and refresh it quietly in the background. This technique is called Stale-While-Revalidate.

Picture a stock ticker screen in a lobby. The numbers may be a few seconds behind, but they are still useful to anyone glancing at the board. Meanwhile, a background process fetches the latest figures and updates the ticker. The reader never stares at a blank screen and the system stays responsive even during spikes.

In Go, this can be built by storing not just the cached data but also timestamps that define when the data is fresh, when it can still be served as stale, and when it must be recomputed. The singleflight package helps ensure that only one goroutine does the refresh work, preventing a dogpile of updates.

entry := getEntry(key) // {data, freshUntil, staleUntil}
switch {
case time.Now().Before(entry.freshUntil):
    return entry.data
case time.Now().Before(entry.staleUntil):
    go refreshSingleflight(key) // background refresh
    return entry.data
default:
    return refreshSingleflight(key) // must refresh now
}

Here, the code retrieves a cache entry containing the data along with two timestamps marking the freshness and staleness boundaries. If the current time falls before the fresh threshold, the data is considered fully fresh and returned immediately. If time has passed the fresh threshold but remains within the stale window, the code returns the slightly outdated data instantly while launching a background goroutine to refresh it asynchronously, ensuring the next request gets updated information. Once time exceeds even the stale boundary, the data is too old to serve, so the code blocks and performs a synchronous refresh before returning.

This keeps latency low while still ensuring the cache updates regularly, a balance between freshness and performance.

Wrapping Up

Caching is not a single tactic but a set of strategies that fit different needs. Full response caching eliminates repeat work at the top level. Query result caching protects the database from repeated load. HTTP caching leverages the protocol to cut down data transfer. Stale-While-Revalidate strikes a compromise that favors speed without leaving data stale for too long.

In practice, these approaches are often layered. A Go API might use local memory and Redis for responses, apply query-level caching for hot tables, and set ETags so clients avoid unnecessary downloads. With the right mix, you can cut latency by orders of magnitude, handle far more traffic, and save both compute and database resources.

Memcached Crash Course

Beau Carnes — Mon, 08 Jan 2024 18:02:00 +0000

Memcached is an important technology for back end developers to understand. It is a distributed memory caching system, primarily used to speed up web applications by reducing database load. It stores data in memory, allowing quicker access compared to traditional database-driven methods.

We just published a Memcached crash course on the freeCodeCamp.org YouTube channel. This course, designed for both beginners and intermediate learners, delves into the architecture and design choices of Memcached. It provides practical hands-on experience with Docker, Telnet, and Node.js, making it an invaluable resource for anyone looking to enhance their web application performance. Hussein Nasser developed this course.

The course explains the key-value store mechanism of Memcached, emphasizing its simplicity and efficiency in handling web application data.

Course Highlights

Understanding Memcached's Architecture: Deep insights into Memcached's design, including memory management and Least Recently Used (LRU) caching policy.
Practical Demonstrations: Utilizing Docker for setting up a Memcached environment, Telnet for interaction, and Node.js for integrating Memcached into web applications.
Advanced Concepts: Exploring threading, connections, read and write operations, collision handling, and locking mechanisms.
Distributed Caching: Learning how Memcached enables distributed caching, enhancing scalability and performance.

Here are the sections in this course:

What is Memcached?
Memory management
LRU
Threading and Connections
Read Example
Write Example
Write and Read collisions
Locking
Distributed Cache
Memcached with Docker/Telnet/NodeJS
Spin up a Memcached Docker container and telnet
Memcached and NodeJS
Four Memcached Servers with NodeJS
Summary

Why Memcached?

Memcached is essential for web developers looking to optimize application performance. It offers rapid data access and reduces database load, crucial for high-traffic websites. This course provides the necessary skills to implement Memcached effectively, ensuring a more responsive and efficient web application experience.

Conclusion

Whether you're a beginner or have some experience, this course equips you with the knowledge to effectively implement and manage a Memcached system in your web applications.

Watch the full course on the freeCodeCamp.org YouTube channel (1-hour watch).

Docker Cache – How to Do a Clean Image Rebuild and Clear Docker's Cache

freeCodeCamp — Mon, 28 Mar 2022 18:51:57 +0000

By Sebastian Sigl

Containers enable you to package your application in a portable way that can run in many environments. The most popular container platform is Docker.

This tutorial will explain how to use the Docker build cache to your advantage.

Docker Build Cache

Building images should be fast, efficient, and reliable. The concept of Docker images comes with immutable layers. Every command you execute results in a new layer that contains the changes compared to the previous layer.

All previously built layers are cached and can be reused. But, if your installation depends on external resources, the Docker cache can cause issues.

How to Leverage the Docker Build Cache

To understand Docker build-cache issues, let’s build a simple custom nginx Docker application. Before you build the image, create a Dockerfile that updates libraries and adds a custom startpage:

FROM nginx:1.21.6

# Update all packages
RUN apt-get update && apt-get -y upgrade

# Use a custom startpage
RUN echo 'My Custom Startpage' > /usr/share/nginx/html/index.html

You can now build the Docker image:

$  docker build -t my-custom-nginx .

=> [1/3] FROM docker.io/library/nginx:1.21.6@sha256:e12...  5.8s
=> [2/3] RUN apt-get update && apt-get -y upgrade           3.6s
=> [3/3] RUN echo 'My Custom Startpage...        0.2s

=> exporting to image                                       0.1s
=> exporting layers                                         0.1s
=> writing image                                            0.0s
=> naming to docker.io/library/my-custom-nginx

[+] Building 11.3s (7/7) FINISHED

In this example, I removed some output for readability. If you build the image the first time, you see that it takes quite some time, in my case 11.3s.

One long-running step is apt-get update && apt-get -y upgrade; how long it takes depends on how many packages are updated and how fast your internet connection is. It checks for package updates on the operating system and installs them if available.

Now, you execute it again, and you benefit from the Docker build cache:

$ docker build -t my-custom-nginx .

=> [1/3] FROM docker.io/library/nginx:1.21.6@sha256:e1211ac1…   0.0s
=> CACHED [2/3] RUN apt-get update && apt-get -y upgrade        0.0s
=> CACHED [3/3] RUN echo 'My Custom Startpage...     0.0s

=> exporting to image                                           0.0s
=> exporting layers                                             0.0s
=> writing image                                                0.0s
=> naming to docker.io/library/my-custom-nginx

Building 1.1s (7/7) FINISHED

This time, the image build is very fast because it can reuse all previously built layers. When you customize your startpage in the Dockerfile, you see how the caching behavior is affected:

FROM nginx:1.21.6

# Update all packages
RUN apt-get update && apt-get -y upgrade

# Use a custom startpage
RUN echo 'New Startpage' > /usr/share/nginx/html/index.html

Now, build the image again:

$ docker build -t my-custom-nginx .

=> [1/3] FROM docker.io/library/nginx:1.21.6@sha256:e1211ac1…   0.0s
=> CACHED [2/3] RUN apt-get update && apt-get -y upgrade        0.0s
=> [3/3] RUN echo 'My Custom Startpage...            0.2s

=> exporting to image                                           0.0s
=> exporting layers                                             0.0s
=> writing image                                                0.0s
=> naming to docker.io/library/my-custom-nginx

Building 2.1s (7/7) FINISHED

This time it only rebuilt the last layer because it recognized that the RUN command had changed. But it reused the expensive second build step and did not update operating system dependencies.

The caching behavior is intelligent. Once one step needs to be rebuilt, every subsequent step is built again. Therefore, it’s good to put frequently changing parts at the end of a Dockerfile to reuse previous build layers.
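Why one changed step forces every later step to rebuild can be shown with a toy model. The sketch below is a simplification and not how Docker is implemented: it assumes each layer's cache key is a hash of the parent key plus the instruction text, and the layerKeys and cachedSteps names are made up for illustration. Real builders also take file contents for COPY and ADD into account.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// layerKeys derives one cache key per instruction by hashing the parent key
// together with the instruction text, so a change propagates to later steps.
func layerKeys(instructions []string) []string {
	keys := make([]string, len(instructions))
	parent := ""
	for i, ins := range instructions {
		sum := sha256.Sum256([]byte(parent + "\n" + ins))
		parent = hex.EncodeToString(sum[:])
		keys[i] = parent
	}
	return keys
}

// cachedSteps reports, per step of the new build, whether the key matches
// the key from the previous build and the layer could therefore be reused.
func cachedSteps(prev, next []string) []bool {
	p, n := layerKeys(prev), layerKeys(next)
	hit := make([]bool, len(n))
	for i := range n {
		hit[i] = i < len(p) && p[i] == n[i]
	}
	return hit
}

func main() {
	before := []string{
		"FROM nginx:1.21.6",
		"RUN apt-get update && apt-get -y upgrade",
		"RUN echo 'My Custom Startpage' > /usr/share/nginx/html/index.html",
	}
	after := []string{
		"FROM nginx:1.21.6",
		"RUN apt-get update && apt-get -y upgrade",
		"RUN echo 'New Startpage' > /usr/share/nginx/html/index.html",
	}
	fmt.Println(cachedSteps(before, after)) // [true true false]
}
```

In this model, editing the second instruction would mark the third as a miss too, even if its text were unchanged, because its key depends on the parent key.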

Still, maybe you want to force a rebuild of a cached layer to force a package update. Forcing a rebuild can be necessary because you want to keep your application safe and use the newest updates when available.

How to Use the Docker Build `--no-cache` Option

There can be different reasons for disabling the build-cache. You can rebuild the image from the base image without using cached layers by using the --no-cache option.

$ docker build --no-cache -t my-custom-nginx .

=> CACHED [1/3] FROM docker.io/library/nginx:1.21.6@sha256:...  0.0s
=> [2/3] RUN apt-get update && apt-get -y upgrade               3.5s
=> [3/3] RUN echo 'My Custom Startpage...            0.2s

=> exporting to image                                           0.1s
=> exporting layers                                             0.0s
=> writing image                                                0.0s
=> naming to docker.io/library/my-custom-nginx

Building 5.5s (7/7) FINISHED

New layers were constructed and used. The docker build runs both commands this time, which comes with an all-or-nothing approach. Either you provide the --no-cache option that executes all commands, or you will cache as much as possible.

How to Use Docker Arguments for Cache-Busting

Another option is to mark a point in the Dockerfile from which the cache is invalidated. You need to edit your Dockerfile like this:

FROM nginx:1.21.6

# Update all packages
RUN apt-get update && apt-get -y upgrade

# Custom cache invalidation
ARG CACHEBUST=1

# Use a custom startpage
RUN echo 'New Startpage' > /usr/share/nginx/html/index.html

You add a CACHEBUST argument to your Dockerfile at the location you want to enforce a rebuild. Now, you can build the Docker image and provide an always different value that causes all following commands to rerun:

$ docker build -t my-custom-nginx --build-arg CACHEBUST=$(date +%s) .

=> [1/3] FROM docker.io/library/nginx:1.21.6@sha256:e1211ac1...    0.0s
=> CACHED [2/3] RUN apt-get update && apt-get -y upgrade           0.0s
=> [3/3] RUN echo 'My Custom Startpage...               0.3s

=> exporting to image                                              0.0s
=> exporting layers                                                0.0s
=> writing image                                                   0.0s
=> naming to docker.io/library/my-custom-nginx

Building 1.0s (7/7) FINISHED

By providing --build-arg CACHEBUST=$(date +%s), you set the parameter to an always different value that causes all following layers to rebuild.

Summary

Docker’s build-cache is a handy feature. It speeds up Docker builds due to reusing previously created layers.

You can use the --no-cache option to disable caching or use a custom Docker build argument to enforce rebuilding from a certain step.

Understanding the Docker build cache is powerful and will make you more efficient in building your Docker container.

I hope you enjoyed this article.

If you liked it and feel the need to give me a round of applause or just want to get in touch, follow me on Twitter.

I work at eBay Kleinanzeigen, one of the world’s biggest classified companies. By the way, we are hiring!


What is Cached Data? What does Clear Cache Mean and What Does it Do?

freeCodeCamp — Fri, 06 Mar 2020 20:51:16 +0000

By Jeff M Lowery

First, what's a cache?

In general terms, a cache (pronounced "cash") is a type of repository. You can think of a repository as a storage depot. In the military, this would be to hold weapons, food, and other supplies needed to carry forward a mission.

A military distribution network

In computer science, these "supplies" are termed resources, where the resources are scripts, code, and document content. The latter is sometimes more specifically referred to as "assets" such as text, static data, media, and hyperlinks, but here I'll just use the one term resources.

The distinction between a cache and other types of repositories

A cache's primary purpose is to speed up retrieval of web page resources, decreasing page load times. Another critical aspect of a cache is to ensure that it contains relatively fresh data.

This article will cover two prevalent methods of caching: browser caching and Content Delivery Networks (CDNs).

Besides caches, other repositories come into play in web architectures; often these are designed to hold vast troves of data. They are not as focussed, though, on retrieval performance.

For example, Amazon Glacier is a data repository that is designed to store data cheaply, but not retrieve it quickly. An SQL database, on the other hand, is designed to be flexible, up-to-date, and fast, but is seldom cheap and not usually as fast as a cache.

The Browser Cache: a memory cache

A memory cache stores resources locally on the computer where the browser is running. While the browser is active, retrieved resources will be stored in the computer's physical memory (RAM), and possibly also on the hard drive.

Later, when the exact same resources are needed when revisiting a web page, the browser will pull those from the cache instead of the remote server. Since the cache is stored locally, in fast memory, those resources are fetched quicker, and the page loads faster.

Speed of resource retrieval is of the essence, but so is the necessity that the resources be fresh. A stale resource is one that is out-of-date and may no longer be valid.

Part of the job of the browser is to identify which cached resources are stale, and refetch those that are. Since a web page typically has many resources, there will usually be a mix of stale and fresh versions in the cache.

How does the browser know what is stale in the cache?

The answer is not simple, but there are two main approaches: cache-busting and HTTP header fields.

cache-busting

HTTP header fields

Every resource request comes with some meta information known as the header. Conversely, every response also has header information associated with it.

In some cases, the browser sees the response header values, and changes corresponding values in subsequent request headers. Among these header values are those that affect how resource caching is performed on the browser.

HEAD requests and conditional requests

A HEAD request is like a truncated GET request. Instead of requesting the complete resource, a HEAD request only requests the header fields that would otherwise be returned on a full request.

The header of a resource is generally going to be much smaller (in number of total bytes) than the resource data associated with it (the "body" of the response). The header information is sufficiently informative to allow the browser to determine the freshness of the resource in its cache.

HEAD requests are often used to verify the validity of a server resource (that is, does the resource still exist, and if so, has it been updated since the browser last accessed it?). The browser will use what's in its cache if the HEAD request indicates the resource is valid, otherwise it will perform a full GET or POST request and refresh its cache with what is returned.

With a conditional request, the browser sends fields in the header describing the freshness of its cached resource. This time, the server determines if the browser's cache is still fresh.

If it is, the server returns a 304 response with just the resource's header information, and no resource body (the data). If the browser's cache is determined to be outdated, then the server will return a full 200 OK response.

This mechanism is faster than using HEAD requests, since it eliminates the possibility of having to issue two requests instead of one.

The above simplifies what can be a pretty complicated process. There's a lot of fine-tuning involved in caching, but it all is controlled through header fields, the most important of which is cache-control.

Cache-Control

When responding to a request, the server will send header fields to the browser indicating what behavior it should adopt when caching. If I load the page at https://en.wikipedia.org/wiki/Uniform_Resource_Identifier, the response contains this in its header record:

cache-control: private, s-maxage=0, max-age=0, must-revalidate

private means that only the browser should cache the document content.

s-maxage and max-age are set to 0. The s-maxage value is for proxy servers with caches, whereas max-age is intended for the browser. The effect of setting max-age alone is that the cached resource expires immediately, yet it may still be used (even though stale) during page reloads while in the same browser session.

A stale resource may be revalidated through a HEAD request, which might be followed by a GET or POST request, depending on the response. The must-revalidate directive commands the browser to revalidate the cached resource if it is stale.

Since max-age is set to 0 in this case, the cached resource is immediately stale once received. The combination of the two directives is equivalent to the single directive no-cache.

The two settings ensure that the browser always revalidates the cached resource, whether still in the same session or not.
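To see how these directives might be read programmatically, here is a small sketch that parses the header value shown above. The parseCacheControl and alwaysRevalidate helpers are hypothetical names, and the parser is deliberately simple: it does not handle quoted arguments that contain commas.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCacheControl splits a Cache-Control value into lowercase directive
// names mapped to their (possibly empty) arguments.
func parseCacheControl(v string) map[string]string {
	out := map[string]string{}
	for _, part := range strings.Split(v, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		name, arg, _ := strings.Cut(part, "=")
		name = strings.ToLower(strings.TrimSpace(name))
		out[name] = strings.Trim(strings.TrimSpace(arg), `"`)
	}
	return out
}

// alwaysRevalidate reports whether the directives force revalidation on
// every use: either no-cache, or max-age=0 combined with must-revalidate.
func alwaysRevalidate(d map[string]string) bool {
	if _, ok := d["no-cache"]; ok {
		return true
	}
	_, must := d["must-revalidate"]
	return must && d["max-age"] == "0"
}

func main() {
	d := parseCacheControl("private, s-maxage=0, max-age=0, must-revalidate")
	fmt.Println(d["max-age"], alwaysRevalidate(d))                         // 0 true
	fmt.Println(alwaysRevalidate(parseCacheControl("public, max-age=60"))) // false
}
```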

Cache-control directives are very extensive, and at times confusing – they're a topic in their own right. A complete documented list of directives can be found here.

ETag

This is a token that the server sends and the browser retains until the next request. This is only used when the browser knows that the resource's cache lifetime has expired.

ETags are server-generated hash values, which often use the resource's physical file name and last modified date on the server as a seed. When a resource file is updated, the modified date changes, and a new hash value is generated and sent in the response header to the request.

Other header tags affecting caching

The header tags expires and last-modified are all but obsolete, yet are still sent by most servers for backward compatibility with older browsers. An example:

expires: Thu, 01 Jan 1970 00:00:00 GMT
last-modified: Sun, 01 Mar 2020 17:59:02 GMT

Here, the expires is set to the zeroth date (historically, from the UNIX operating system). That indicates that the resource expires immediately, just as max-age=0 does. Last-modified tells the browser when the latest update was made to the resource, which it can then use to decide if it should refetch it rather than use the cache value.

Forcing a cache refresh from the browser

What's a hard reload?

A hard reload forces the refetch of all resources on a page, whether they're content, scripts, stylesheets or media. Pretty much everything, right?

Well, some resources may not be explicitly included on a page. Instead, they can be fetched dynamically, usually after everything explicit has loaded.

The browser doesn't know ahead of time that this will happen, and when it does, the later requests (initiated by scripts, usually) will still use cached copies of those resources if available.

What's clear cache and hard reload?

This operation clears the entire browser cache, which has the same effect as a hard reload, but additionally causes dynamically loaded resources to be fetched as well – after all, there's nothing in the cache, so there is no choice!

Content Delivery Networks: a geo-located cache

A CDN is more than just a cache, but caching is one of its jobs. A CDN stores data in geographically distributed locations so that round-trip times to and from a geographically local browser are reduced.

Browser requests are routed to a nearby CDN, thereby shortening the physical distance response data has to travel. CDNs also are able to handle large amounts of traffic, and provide security against some types of attacks.

A CDN gets its resources through an Internet Exchange Point (IXP), nodes that are part of the backbone of The Internet (in caps). There are steps to take to set up request routing to go to a CDN instead of the host server. The next step is to make sure the CDN has the current content of your website.

In the old days, most CDNs supported the push method: a website would push new content to a CDN hub, which would then get distributed to geographically dispersed nodes.

Nowadays, most CDNs use the caching protocols described above (or similar) to 1) download new resources, and 2) refresh existing ones. The browser still has its cache, and none of that changes. All a CDN does is make those transfers of new resources faster.

An In-depth Introduction to HTTP Caching: Cache-Control & Vary

Léo Jacquemin — Thu, 24 Oct 2019 09:56:49 +0000

Introduction - scope of the article

This series of articles deals with caching in the context of HTTP. When properly done, caching can increase the performance of your application by an order of magnitude. On the contrary, when overlooked or completely ignored, it can lead to some very unwanted side effects caused by misbehaving proxy servers that, in the absence of clear caching instructions, decide to cache anyway and serve stale resources.

In the first part of this series, we argued that caching is the most effective way to increase performance, when measured by the page load time. In this second part, it is time to shift our focus to the mechanisms at our disposal. To put it in another way: how does HTTP caching actually work?

To answer this question, we decided to consider the case of an empty cache that starts progressively caching and serving resources. As it gradually receives incoming HTTP requests, our cache will start behaving accordingly. Serving the resource from the cache when a fresh copy is available, varying over multiple representations, making a conditional request... This way, we can introduce each concept progressively as we need it.

At first, our empty cache will have no choice but to forward requests to the origin server. This will allow us to understand how origin servers instruct our cache on what to do with the resource, such as if it is allowed to store it, and for how long. For this, we will examine each Cache-Control directive and clarify some of them that have been known to have conflicting meanings.

Second, we will look at what happens when our cache receives a request for a resource it already knows. How does our cache decide if it can re-use a previously stored response? How does it map a given HTTP request to a particular resource? To answer these, we will learn about representation variations with the Vary header.

This article is going to focus on knowledge that’s the most valuable from a web developer’s perspective. Therefore, conditional requests are only discussed briefly and will be the focus of another article.

Without further ado, let us start with an overview of what we will be exploring.

The HTTP caching decision tree

Conceptually, a cache system always involves at least three participants. With HTTP, these participants are the client, the server, and the caching proxy.

However, when learning about HTTP caching, we strongly encourage you not to think of the client as your typical web browser because these days, they all ship with their own HTTP caching layer. It makes it difficult to clearly separate the browser from the cache. For this reason, we invite you to think of the client as a headless command line program such as cURL or any application without an embedded HTTP cache.

All precautions aside, let us now deep dive into the subject by taking a look at the following picture: the HTTP caching decision tree.

This picture illustrates all the possible paths a request can take every time a client requests a resource from an origin server behind a caching system. A careful examination of this illustration reveals that there are only four possible outcomes.

Clearly separating these outcomes in our minds is actually very convenient, seeing as each important caching concept (cache instructions, representation matching, conditional requests and resource aging) maps to each one of them.

Let us describe succinctly each one by introducing two important terms relating to the HTTP caching terminology: cache hits and cache misses.

Hits and misses

The first possible outcome is when the cache finds a matching resource, and is allowed to serve it, which, in the caching world, are indeed two distinct things. This outcome is what we commonly call a cache hit, and is the reason why we use caches in the first place.

When a cache hit happens, the origin server is completely offloaded and latency is dramatically reduced. In fact, when the cache hit happens in the browser's HTTP cache, there is no network latency at all and the requested resource is available almost instantly.

Unfortunately, cache hits account for only one of the four possible outcomes. The rest fall into a second category, known as cache misses, which can happen for only three reasons.

The first reason a cache miss typically happens is simply when the cache does not find any matching resource in its storage. This is usually a sign that the resource has never been requested before, or has been evicted from the cache to free up some space. In such cases, the proxy has no choice but to forward the request to the origin server, fully download the response and look for caching instructions in the response headers.

The second reason a cache miss can happen is just as detrimental: the cache finds a matching representation, one that it could potentially use, but that representation is no longer considered fresh (we will see exactly how in the Cache-Control section of this article). It is said to be stale.

In such a case, the cache sends a special kind of request, called a conditional request, to the origin server. Conditional requests allow caches to retrieve resources only if they are different from the one they have in their local storage. Since only the origin server ever has the most recent representation of a given resource, conditional requests always have to go through the whole caching proxy chain up to the origin server.

These special requests have only two possible outcomes. If the resource has not changed, the cache is instructed to use its local copy by receiving a 304 Not Modified response along with updated headers and an empty body. This outcome, the third one on our list, is called a successful validation.

Finally, the last possible outcome is when the resource has changed. In this case, the origin server sends a normal 200 OK response, as it would if the cache had been empty and had simply forwarded the request. To put it another way, cache misses caused by an empty cache and by a failed validation yield exactly the same HTTP response.
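The four outcomes described above can be sketched as a small decision function. This is only an illustration: the Outcome type, the decide function and its three boolean inputs are hypothetical names chosen for this sketch, not part of any caching software.

```go
package main

import "fmt"

// Outcome names the four paths described in the text.
type Outcome string

const (
	Hit             Outcome = "cache hit"
	MissEmpty       Outcome = "cache miss: nothing stored"
	MissRevalidated Outcome = "cache miss: successful validation (304 Not Modified)"
	MissChanged     Outcome = "cache miss: failed validation (200 OK)"
)

// decide is a hypothetical helper. stored reports whether a matching
// representation exists, fresh whether it is still within its lifetime,
// and changedAtOrigin whether the origin's copy differs from the stored one.
func decide(stored, fresh, changedAtOrigin bool) Outcome {
	switch {
	case !stored:
		return MissEmpty // forward the request and store the response
	case fresh:
		return Hit // serve the local copy, the origin is never contacted
	case changedAtOrigin:
		return MissChanged // the conditional request comes back with a full body
	default:
		return MissRevalidated // the conditional request confirms the local copy
	}
}

func main() {
	fmt.Println(decide(false, false, false))
	fmt.Println(decide(true, true, false))
	fmt.Println(decide(true, false, false))
	fmt.Println(decide(true, false, true))
}
```

Note that changedAtOrigin only matters once the stored copy is stale: while it is fresh, the cache never asks.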

To best visualize these four paths, it is helpful to picture them in a timeline, as illustrated below.

At first, the cache is empty. The flow of requests starts with a cache miss (empty cache outcome). On its way back, the cache would read caching instructions and store the response. All subsequent requests for this particular resource would yield cache hits, until the resource becomes stale and needs to be revalidated.

Upon a first revalidation, it is possible that the resource has not changed, hence, a 304 Not Modified would be sent.

Then, the resource eventually gets updated by a client, typically with a PUT or a PATCH request. When the next conditional request arrives, the origin server detects that the resource has changed and replies with a 200 OK carrying updated ETag and Last-Modified headers.

Knowing about cache hits and cache misses, along with the four possible paths that every cacheable request can take, should give you a good overview of how caching works.

But overviews can only get you so far. In the following section, we will give a detailed explanation of how origin servers communicate caching instructions.
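On the origin side, answering a conditional request could look roughly like the sketch below, written with Go's standard net/http package. The currentETag constant, the handler and the statusFor helper are assumptions made for this example, and the validator comparison is deliberately simplified to an exact match (a real If-None-Match header may carry several validators or weak ones).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// currentETag stands in for a version identifier of the resource
// (an assumption for this sketch; real servers often derive it from the body).
const currentETag = `"v2"`

// handler answers 304 Not Modified with an empty body when the validator
// sent by the cache still matches, and a normal 200 OK otherwise.
func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("ETag", currentETag)
	if r.Header.Get("If-None-Match") == currentETag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	fmt.Fprint(w, "latest representation")
}

// statusFor replays a request carrying the given If-None-Match value
// against the handler and returns the resulting status code.
func statusFor(ifNoneMatch string) int {
	req := httptest.NewRequest(http.MethodGet, "/resource", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(`"v2"`)) // the cache's copy is still current
	fmt.Println(statusFor(`"v1"`)) // the resource changed since "v1"
	fmt.Println(statusFor(""))     // unconditional request from an empty cache
}
```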

How origin servers communicate caching instructions

Origin servers communicate their caching instructions to downstream caching proxies by adding a Cache-Control header to their responses. This header was introduced in HTTP/1.1 and supersedes the HTTP/1.0 Pragma header, whose meaning in responses was never specified.

Cache-Control header values are called directives. The specification defines many of them, with varying uses and levels of browser support. These directives are primarily used by developers to communicate caching instructions from the server. However, clients can also influence the caching decision by sending directives in an HTTP request. Let us now take the time to describe the most useful directives.

max-age

The first important Cache-Control directive to know about is the max-age directive, which allows a server to specify the lifetime of a representation. It is expressed in seconds. For instance, if a cache sees a response containing the header Cache-Control: max-age=3600, it is allowed to store and serve the same response for all subsequent requests for this resource for the next 3600 seconds.

During these 3600 seconds, the resource will be considered fresh and cache hits will occur. Past this delay, the resource will become stale and validation will take over.
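As a rough illustration of this freshness check, the sketch below extracts max-age from a Cache-Control value and compares it with the age of a stored response. The function names maxAge and isFresh are hypothetical, and the parsing is intentionally minimal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxAge extracts the max-age value, in seconds, from a Cache-Control
// header value. It reports false when the directive is absent or invalid.
func maxAge(cacheControl string) (int, bool) {
	for _, part := range strings.Split(cacheControl, ",") {
		name, value, found := strings.Cut(strings.TrimSpace(part), "=")
		if !found || !strings.EqualFold(name, "max-age") {
			continue
		}
		seconds, err := strconv.Atoi(strings.Trim(value, `"`))
		if err != nil || seconds < 0 {
			return 0, false
		}
		return seconds, true
	}
	return 0, false
}

// isFresh reports whether a response of the given age (in seconds)
// is still within the lifetime granted by max-age.
func isFresh(cacheControl string, ageSeconds int) bool {
	lifetime, ok := maxAge(cacheControl)
	return ok && ageSeconds < lifetime
}

func main() {
	fmt.Println(isFresh("max-age=3600", 120))  // stored two minutes ago
	fmt.Println(isFresh("max-age=3600", 7200)) // stored two hours ago
}
```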

no-store, no-cache, must-revalidate

Unlike max-age, the no-store, no-cache and must-revalidate directives restrict what a cache is allowed to do with a resource. However, they differ in subtle ways.

no-store is pretty self-explanatory, and in fact, it does even a little more than the name suggests. When it is present, an HTTP/1.1 compliant cache must not store any part of the request or the response, and must also make a best-effort attempt to remove that information from volatile storage as promptly as possible after forwarding it.

The no-cache directive, on the other hand, is arguably much less self-explanatory. This directive actually means to never use a local copy without first validating with the origin server. By doing so, it prevents all possibility of a cache hit, even with fresh resources.

To put it another way, the no-cache directive says that caches must revalidate their representations with the origin server. But then comes another directive, awkwardly named… must-revalidate.

If this starts to get confusing for you, rest assured, you are not alone. If what one wants is not to cache, one has to use no-store instead of no-cache. And if what one wants is to always revalidate, one has to use no-cache instead of must-revalidate.

Confusing, indeed.

As for the must-revalidate directive, it forbids a cache from serving a stale resource. If a resource is fresh, must-revalidate perfectly allows a cache to serve it without forcing any revalidation, unlike no-store and no-cache. That's why this directive should always be used together with max-age, to indicate a desire to cache a resource for some time and, once it has become stale, to enforce a revalidation.

When it comes to these last three directives, we find the choice of words to describe each of them particularly confusing: no-store and no-cache are expressed negatively whereas must-revalidate is expressed positively. Their differences would probably be more obvious if they were to be expressed in the same fashion.

Therefore, it is helpful to think about each of them expressed in terms of what is not allowed:

no-store: never store anything
no-cache: never cache hit
must-revalidate: never serve stale
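The three "never" rules above can be read straight off a Cache-Control value. The sketch below does just that; the Prohibitions type and the policyFor function are hypothetical names used for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// Prohibitions mirrors the three rules listed above.
type Prohibitions struct {
	NeverStore                bool // no-store
	NeverHitWithoutValidation bool // no-cache
	NeverServeStale           bool // must-revalidate
}

// policyFor scans a Cache-Control value and records which of the three
// prohibitions it carries. Other directives are ignored.
func policyFor(cacheControl string) Prohibitions {
	var p Prohibitions
	for _, part := range strings.Split(cacheControl, ",") {
		name, _, _ := strings.Cut(strings.TrimSpace(part), "=")
		switch strings.ToLower(name) {
		case "no-store":
			p.NeverStore = true
		case "no-cache":
			p.NeverHitWithoutValidation = true
		case "must-revalidate":
			p.NeverServeStale = true
		}
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", policyFor("max-age=3600, must-revalidate"))
	fmt.Printf("%+v\n", policyFor("no-cache"))
}
```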

Technically, these directives can appear in the same Cache-Control header. It is not uncommon to see them combined as a comma-separated list of values. A lot of popular websites still seem to behave very conservatively, sending back HTML pages with the following header:

Cache-Control: no-cache, no-store, max-age=0, must-revalidate

When you stumble upon this, the intention behind it is usually pretty clear: the web development team wants to ensure that the resource never gets served stale to anyone.

However, such cache-buster lines are probably not necessary anymore. Past work done in 2017 already showed that browsers are rather compliant with the specification with respect to Cache-Control response directives. Therefore, unless you're planning on setting up a caching stack with decades-old software, you should be fine using just the directives you need. The most popular combinations will be analyzed in another article.

public, private

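The last important directives we haven't discussed yet are a little bit different, as they control which types of caches are allowed to cache the resources. These are the public and private directives. Neither should be assumed by default: a response that carries neither directive may still be stored by a shared cache when other rules (such as a max-age) make it cacheable, so user-specific responses should be marked private explicitly.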

Private caches are the ones that are supposed to be used by a single user. Typically, this is the web browser's cache. CDNs and reverse proxies, on the contrary, handle requests coming from multiple users and are therefore public caches.

Why do we need to distinguish these two types of caches? The answer is straightforward: security, as illustrated by the following example.

Many web applications expose convenience endpoints that rely on information coming from elsewhere than the URL. If two users access their profile by requesting /users/me at https://api.example.com, and their actual user id is hidden within an Authorization: Bearer 4Ja23ç42…. token, the cache won't be able to tell that these are in fact two very different resources.

Indeed, when constructing their cache key, caches do not inspect HTTP headers unless specifically instructed to do so, as we shall see in the next section.
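To make the risk concrete, here is a deliberately naive shared cache that builds its key from the URL alone, as described above. Everything in it (the sharedCache type, the origin stub, the secondUserSees helper and the token values) is invented for this sketch. It shows one user receiving another user's profile unless the origin marks the response private.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedCache is a deliberately naive shared cache keyed by URL only.
type sharedCache struct {
	store map[string]string
}

// hasDirective reports whether a Cache-Control value contains the directive.
func hasDirective(cacheControl, directive string) bool {
	for _, part := range strings.Split(cacheControl, ",") {
		name, _, _ := strings.Cut(strings.TrimSpace(part), "=")
		if strings.EqualFold(name, directive) {
			return true
		}
	}
	return false
}

// origin is a stand-in for the API: the body depends on the token,
// not on the URL.
func origin(token string) string {
	return "profile of " + token
}

// fetch returns the stored body for url if there is one. Otherwise it asks
// the origin and stores the answer, unless cacheControl (the header the
// origin attaches to its response) marks it private.
func (c *sharedCache) fetch(url, token, cacheControl string) string {
	if body, ok := c.store[url]; ok {
		return body // the Authorization token plays no part in the lookup
	}
	body := origin(token)
	if !hasDirective(cacheControl, "private") {
		c.store[url] = body
	}
	return body
}

// secondUserSees simulates two users requesting /users/me through the
// same shared cache and returns what the second one receives.
func secondUserSees(cacheControl string) string {
	c := &sharedCache{store: map[string]string{}}
	c.fetch("/users/me", "alice", cacheControl)
	return c.fetch("/users/me", "bob", cacheControl)
}

func main() {
	fmt.Println(secondUserSees("max-age=60"))
	fmt.Println(secondUserSees("private, max-age=60"))
}
```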

s-maxage

The s-maxage directive is like the max-age directive, except that it only applies to public caches, which are also referred to as shared caches (hence the s- prefix). If both directives are present, s-maxage will take precedence over max-age on public caches and be ignored on private ones.
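The precedence rule can be sketched as follows. The freshnessLifetime and directiveSeconds functions are hypothetical helpers written for this example, with deliberately minimal parsing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// directiveSeconds returns the integer value of the named directive,
// or false when it is absent or not a valid number of seconds.
func directiveSeconds(cacheControl, directive string) (int, bool) {
	for _, part := range strings.Split(cacheControl, ",") {
		name, value, found := strings.Cut(strings.TrimSpace(part), "=")
		if !found || !strings.EqualFold(name, directive) {
			continue
		}
		seconds, err := strconv.Atoi(value)
		if err != nil || seconds < 0 {
			return 0, false
		}
		return seconds, true
	}
	return 0, false
}

// freshnessLifetime picks the lifetime a cache should apply: a shared
// cache prefers s-maxage when present, a private cache ignores it.
func freshnessLifetime(cacheControl string, shared bool) (int, bool) {
	if shared {
		if seconds, ok := directiveSeconds(cacheControl, "s-maxage"); ok {
			return seconds, true
		}
	}
	return directiveSeconds(cacheControl, "max-age")
}

func main() {
	header := "max-age=3600, s-maxage=600"
	fmt.Println(freshnessLifetime(header, true))  // as seen by a CDN or reverse proxy
	fmt.Println(freshnessLifetime(header, false)) // as seen by a browser
}
```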

When using this directive, the general rule is to ensure that the s-maxage value does not exceed the max-age value. The reasoning behind this rule is that the closer a cache is to the origin, the more suitable it is for it to check frequently what the latest representation is.

Imagine you were to cache for one day in the proxy, and one hour in browsers.

Every time a browser asks upstream servers for a resource, we know in advance that the proxy will not contact the origin server for at least a day. So why not give browsers the same TTL directly? In conclusion, it is a best practice to set a TTL in max-age that is at least as long as the one in s-maxage.

stale-while-revalidate and stale-if-error

These two directives are not part of the original specification but of an extension that was first described more than 10 years ago. Although their browser support is limited, some popular CDNs have supported them for more than 5 years!

That said, stale-while-revalidate is pretty useful. As the name implies, it allows a cache to “[...] immediately return a stale response while it revalidates it in the background, thereby hiding latency (both in the network and on the server) from clients”.

This caching extension proves really helpful for things like images, where reducing latency is critical for the user experience, and where having a stale version for a few seconds is often better than an image that loads painfully slowly.

As for stale-if-error, it allows a cache to serve a stale version if the origin server returns a 5xx status code. This gives developers a chance to fix potential issues during a grace period where clients are shielded from irritating error pages.

Consider the case of a third-party weather script. If the weather server happens to be unreachable for a few minutes, it's probably better to display a slightly outdated forecast during that time than to leave a portion of the page blank (or the whole page blank, if the code does not handle third-party script loading failures).
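Both extensions simply widen the window during which a stored response may be used. Assuming a response sent with a header such as Cache-Control: max-age=600, stale-while-revalidate=30, stale-if-error=86400, a cache's decisions could be sketched as below. The Action type and the onRequest and mayServeStaleOnError functions are hypothetical names, and all durations are in seconds.

```go
package main

import "fmt"

// Action describes what the cache does when a request arrives.
type Action string

const (
	ServeFresh              Action = "serve cached copy"
	ServeStaleAndRevalidate Action = "serve stale copy, revalidate in background"
	RevalidateFirst         Action = "revalidate before serving"
)

// onRequest decides how to answer a request for a stored response of the
// given age, given its max-age and stale-while-revalidate windows.
func onRequest(age, maxAge, staleWhileRevalidate int) Action {
	switch {
	case age < maxAge:
		return ServeFresh
	case age < maxAge+staleWhileRevalidate:
		return ServeStaleAndRevalidate
	default:
		return RevalidateFirst
	}
}

// mayServeStaleOnError reports whether the stored response may be served
// after the origin answered with originStatus, given the stale-if-error
// window. Only 5xx responses count as errors here, as in the text.
func mayServeStaleOnError(age, maxAge, staleIfError, originStatus int) bool {
	is5xx := originStatus >= 500 && originStatus <= 599
	return is5xx && age < maxAge+staleIfError
}

func main() {
	fmt.Println(onRequest(100, 600, 30))
	fmt.Println(onRequest(610, 600, 30))
	fmt.Println(onRequest(700, 600, 30))
	fmt.Println(mayServeStaleOnError(700, 600, 86400, 503))
}
```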

What we don’t know yet

After examining these Cache-Control directives, we now understand how applications distributed on the web can leverage HTTP caching mechanisms in multiple ways, depending on what they need.

What we don't yet understand, though, is what caching software actually does with the responses it receives. It will most likely have to store them somewhere in order to retrieve them later. That's the core idea of any caching system, after all.

Under normal circumstances, this certainly looks like what we would call an implementation detail. It should be merely enough to know that resources are indeed stored some way. Yet in this case, learning just a little more is actually critical.

Neglecting the mechanisms that govern how caching software maps objects from the HTTP response space to its storage space can have really unexpected consequences, such as serving a brotli-encoded Chinese document to a user who does not understand Chinese, using a browser unable to decode brotli ¯\_(ツ)_/¯

How caches store and retrieve resources

Albeit unlikely to happen, since most browsers can decode brotli - and since most people know how to 說中文 - the previous situation can still easily occur. To understand why this is the case, one must consider how caches store their representations.

By virtue of what they try to achieve, most caching software ought to be able to quickly retrieve simple text documents. To do so, a very simple yet powerful strategy is to use a key-value store, which is well suited to in-memory storage. Therefore, the question one must answer when designing a cache is the following: how to construct a cache key for an HTTP response?

What we are looking for here is a way to uniquely identify a resource. Conveniently, this is exactly why URIs - Uniform Resource Identifiers - were invented in the first place!

But URIs don’t tell the whole truth about resources. They never describe them entirely, if only for the fact that resources change over time.

Websites get rebranded, new content gets published and users update their profile. Granted, not for the same reasons or at the same frequency, though all resources will eventually change. In fact, the entire Conditional request specification is based on this sole observation: nothing is permanent except change.

Philosophical quotes aside, there is, however, another, time-independent reason why resources differ. Indeed, at any given moment, a resource may be available in multiple representations. This is why we have content negotiation.

The HTTP request headers Accept, Accept-Language, Accept-Encoding, Accept-Charset (and a few other headers that are not strictly speaking part of content negotiation) add another dimension along which representations can differ. As such, the problem of finding a good cache key becomes more complicated. Since all these representations share the same URI, caches must have a way to distinguish them in order to serve the right representation to each client, honoring content negotiation.

And since only origin servers know what different representations are available, it is again the origin server's responsibility to tell a cache which request headers it bases its choice of representation on. To do so, the origin server must add a Vary header listing the names of the request headers that cause different representations to be generated.

When a cache sees a response coming from an origin server with, for instance, the header Vary: Accept-Language, it will examine the value of the request's Accept-Language header, such as fr-FR, and use this value to construct a more specific cache key, perhaps like https://example.net/home.html_fr-FR.

The actual implementation strategy is of little importance to us. Altering the cache key might not even be the best way to do it. What matters is that the cache somehow has to use the value of the header to differentiate representations.
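One possible way to fold the Vary header into a cache key is sketched below. The cacheKey function and the key layout (URL followed by name=value pairs) are assumptions made for illustration; real caches may organize variants quite differently, and the special value Vary: * is not handled here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cacheKey builds a key from the URL plus the value of every request
// header named in the response's Vary header. Header names are sorted so
// that the order in which Vary lists them does not change the key.
func cacheKey(url, vary string, requestHeaders map[string]string) string {
	// Header names are case-insensitive, so index them by lowercase name.
	lower := make(map[string]string, len(requestHeaders))
	for name, value := range requestHeaders {
		lower[strings.ToLower(name)] = value
	}

	var names []string
	for _, name := range strings.Split(vary, ",") {
		if name = strings.ToLower(strings.TrimSpace(name)); name != "" {
			names = append(names, name)
		}
	}
	sort.Strings(names)

	key := url
	for _, name := range names {
		key += "|" + name + "=" + lower[name]
	}
	return key
}

func main() {
	headers := map[string]string{
		"Accept-Language": "fr-FR",
		"Accept-Encoding": "gzip",
	}
	fmt.Println(cacheKey("https://example.net/home.html", "Accept-Language", headers))
	fmt.Println(cacheKey("https://example.net/home.html", "Accept-Language, Accept-Encoding", headers))
}
```

Because the raw header value ends up in the key, two requests that differ only in how they spell the same preference produce two distinct entries.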

The Vary header can actually point at more than one header, when resources are available in multiple representations. Selecting a cache key when multiple headers are involved is not really much more complicated than with only one header. The real problem when varying over multiple dimensions is the combinatorial explosion.

Unfortunately, there is no way around this. If you are to cache and serve your resources in multiple representations, you have to pay the cost of a large storage. If you decide to lower your vary cardinality, some of your users will receive cache hits for responses that won’t match their requests.

On the other hand, if you vary properly on everything, and do not have enough storage space, chances are your users won’t be seeing cache hits anytime soon.

Now, it is important to know that this is only a problem if you decide to use a public cache, for which two different requests coming from two different users are running the same code, at the proxy level. If you decide to leverage the browser’s cache only, then you can skip the Vary header altogether and serve resources in as many representations as you want. This is because each browser’s cache will only cache representations matching the user’s preferences. This is good news!

But let’s not get ahead of ourselves just yet. As we said, caches use the value of the header as input to generate a more specific cache key. But what is to say that all these values are well formatted? Absolutely nothing! This is the rather inconvenient consequence of the robustness principle, also known as Postel's law: HTTP servers are indeed very liberal in what they accept.

However there is hope.

Consider the case of an origin server that can only produce a representation in two different languages. Caches must be able to regroup incoming Accept-Language values such as fr, fr-FR, fr_FR... into something such as FR. Otherwise, just like before with the combinatorial explosion, the number of representations will explode, but in this case for a misguided reason.

The process by which all these representations are regrouped is called normalization and is often done at the cache. Many caches offer configuration utilities or their own languages to deal with these situations. Sometimes, the functions are even already written, or snippets can easily be found on the Internet. The following picture illustrates the process for the infamous User-Agent header.

Fastly, a popular CDN, sampled 100,000 requests and found that the Accept-Encoding header was expressed in 44 different ways! As for the User-Agent header, they found just shy of… 8000 different ones! Without normalization, chances are that the cache will never see any hit.
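A normalization step for Accept-Language could look like the sketch below. The normalizeLanguage function is a hypothetical helper, not taken from any cache's configuration language. As a simplifying assumption, it ignores q-values and picks the first listed language that the origin supports.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLanguage reduces an Accept-Language header value to one of the
// languages the origin can actually produce, or to fallback when none of
// the requested languages is supported. Quality weights are ignored.
func normalizeLanguage(acceptLanguage string, supported []string, fallback string) string {
	for _, item := range strings.Split(acceptLanguage, ",") {
		tag, _, _ := strings.Cut(item, ";") // drop any ";q=..." weight
		tag = strings.ToLower(strings.TrimSpace(tag))
		if i := strings.IndexAny(tag, "-_"); i >= 0 {
			tag = tag[:i] // keep the primary subtag only: "fr-fr" becomes "fr"
		}
		for _, s := range supported {
			if tag == s {
				return s
			}
		}
	}
	return fallback
}

func main() {
	supported := []string{"fr", "en"}
	for _, value := range []string{"fr", "fr-FR", "fr_FR", "de-DE, en;q=0.5", ""} {
		fmt.Printf("%q -> %q\n", value, normalizeLanguage(value, supported, "en"))
	}
}
```

Using the normalized value, rather than the raw header, when building the cache key keeps the number of stored variants equal to the number of languages the origin supports.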

This wraps up the section about representation variation. At this point, we know how to instruct caches to store our resources, and have learned to leverage the Vary header to prevent accidents from happening when using public caches. We have now covered enough of the specification to be able to cache resources effectively.

Common misconceptions

By now, you should have a thorough understanding of how HTTP caching works. Freshness control, resource’s representations and cache hits are no longer mysterious concepts to you. And if you start to feel empowered by all this knowledge, we have some good news for you: we’ve covered a large portion of the specification, and you now know pretty much all that’s necessary to be up and running.

But make no mistake. Caching is a complex topic.

Experience has shown us that, unless you’re dealing with it on a day-to-day basis, what may be crystal clear today will quickly turn into something rather blurry after a few weeks. Therefore, we decided to conclude this second article by dispelling two common misconceptions that are all too easy to fall for.

Freshness-control and validation

This might seem obvious after reading the previous sections but it is worth repeating many times. Freshness control and validation (which we briefly discussed in the beginning) are two very distinct mechanisms that serve two very different purposes, and involve HTTP requests between different participants.

Freshness control always happens in a cache and is based solely on time
Validation always happens at the origin server and is based both on time and on identifiers (ETags)

This is something we find important to remind ourselves. It means that once the cache has received temporal instructions, it can - and best believe it will - serve resources without ever contacting the origin server until the timer expires.

For instance, if your web application’s HTML file reaches a browser and the HTTP response happens to include the header Cache-Control: max-age=86400, the browser will happily serve the same version of your app for a day. In this case, the browser would serve it for one day without any possible action from you or anyone, except the user, should they ever decide to flush their browser’s cache.

If you’re thinking everyone can make mistakes, and one day is not so bad, well, brace yourself: the conventional maximum max-age value is… 31536000 seconds! That is to say, one year. This is the reason why HTML files are very dangerous to cache like this, and should generally be declared with Cache-Control: no-cache.

Freshness and most recent representation

Another misconception is to believe that cache hits and freshness have anything to do with having the latest available version of a resource. This is what we all try to achieve, but one can never truly know if the resource one has been served from a cache is indeed the most up-to-date version. In fact, this holds true even in the absence of a cache. It has to do with the nature of distributed applications: other people’s actions can change the things we are interacting with at any time.

When modifying the state of the application, the ETag should always be sent back (in an If-Match header) to let the server know what our current understanding of the application’s state is. If it does not match the server’s, the client should expect an error response, such as 412 Precondition Failed (or 409 Conflict, depending on the API).

Conclusion

Throughout this article, we have described how caching actually works. Now would be a good time to spin up a local dev server and fiddle around with these two core headers, Cache-Control and Vary, to see them in action.

We started by giving an overview of how caching works, illustrating the four possible paths that a request can take: the happy path (cache hit) and the three possible ways to have a cache miss: empty cache, failed revalidation and successful revalidation. This overview alone makes it possible to understand how complex caching topologies can fit together.

Then, we went deeper and looked at the most useful Cache-Control directives, and clarified some subtle differences that are all too easily missed.

We also looked at the Vary header and the fundamental difference between resources and representations, to avoid serving the wrong representation to the right client.

Finally, we took some time to review it all through the angle of common misconceptions you might encounter, and hopefully helped you to avoid them.

In the next article, we’ll apply all of this knowledge to set up a local lab environment in which we will set an innocent node.js app on fire with a load-testing tool, right before rescuing it with the help of a popular caching software.

Stay tuned!

To go further:

The official specification about the material we covered (and other things)
https://tools.ietf.org/html/rfc7234#section-5.3

Google Web Fundamentals
https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#defining-optimal-cache-control-policy

About the Cache-Control header:
https://developer.mozilla.org/fr/docs/Web/HTTP/Headers/Cache-Control

About the Vary Header:
https://www.smashingmagazine.com/2017/11/understanding-vary-header/
https://www.fastly.com/blog/best-practices-using-vary-header
https://www.fastly.com/blog/getting-most-out-vary-fastly
https://www.fastly.com/blog/understanding-vary-header-browser
