Léo Jacquemin - freeCodeCamp.org

An In-depth Introduction to HTTP Caching: Cache-Control & Vary

Léo Jacquemin — Thu, 24 Oct 2019 09:56:49 +0000

Introduction - scope of the article

This series of articles deals with caching in the context of HTTP. When properly done, caching can increase the performance of your application by an order of magnitude. On the contrary, when overlooked or completely ignored, it can lead to some very unwanted side effects caused by misbehaving proxy servers that, in the absence of clear caching instructions, decide to cache anyway and serve stale resources.

In the first part of this series, we argued that caching is the most effective way to increase performance, when measured by the page load time. In this second part, it is time to shift our focus to the mechanisms at our disposal. To put it in another way: how does HTTP caching actually work?

To answer this question, we decided to consider the case of an empty cache that starts progressively caching and serving resources. As it gradually receives incoming HTTP requests, our cache will start behaving accordingly. Serving the resource from the cache when a fresh copy is available, varying over multiple representations, making a conditional request... This way, we can introduce each concept progressively as we need it.

At first, our empty cache will have no choice but to forward requests to the origin server. This will allow us to understand how origin servers instruct our cache on what to do with the resource, such as if it is allowed to store it, and for how long. For this, we will examine each Cache-Control directive and clarify some of them that have been known to have conflicting meanings.

Second, we will look at what happens when our cache receives a request for a resource it already knows. How does our cache decide if it can re-use a previously stored response? How does it map a given HTTP request to a particular resource? To answer these, we will learn about representation variations with the Vary header.

This article is going to focus on knowledge that’s the most valuable from a web developer’s perspective. Therefore, conditional requests are only discussed briefly and will be the focus of another article.

Without further ado, let us start with an overview of what we will be exploring.

The HTTP caching decision tree

Conceptually, a cache system always involve at least three participants. With HTTP, these participants are the client, the server, and the caching proxy.

However, when learning about HTTP caching, we strongly encourage you not to think of the client as your typical web browser because these days, they all ship with their own HTTP caching layer. It makes it difficult to clearly separate the browser from the cache. For this reason, we invite you to think of the client as a headless command line program such as cURL or any application without an embedded HTTP cache.

All precautions aside, let us now deep dive into the subject by taking a look at the following picture: the HTTP caching decision tree.

This picture illustrates all the possible paths a request can take every time a client asks for a resource to an origin server behind a caching system. A careful examination of this illustration reveals that there are only four possible outcomes.

Clearly separating these outcomes in our minds is actually very convenient, seeing as each important caching concept (cache instructions, representation matching, conditional requests and resource aging) maps to each one of them.

Let us describe succinctly each one by introducing two important terms relating to the HTTP caching terminology: cache hits and cache misses.

Hits and misses

The first possible outcome is when the cache finds a matching resource, and is allowed to serve it, which, in the caching world, are indeed two distinct things. This outcome is what we commonly call a cache hit, and is the reason why we use caches in the first place.

When a cache hit happens, it completely offloads the origin server and the latency is dramatically reduced. In fact, when the cache hit happens in the browser’s HTTP cache latency is null and the requested resource is instantly available.

Unfortunately, cache hits account only one of the four possible outcomes. The rest of them fall into the second category, also known as cache misses, which can happen for only three reasons.

The first reason a cache miss typically happens is simply when the cache does not find any matching resource in its storage. This is usually a sign that the resource has never been requested before, or has been evicted from the cache to free up some space. In such cases, the proxy has no choice but to forward the request to the origin server, fully download the response and look for caching instructions in the response headers.

The second reason a cache miss can happen is actually just as detrimental, where the cache detects a matching representation, one that it could potentially use. However, the resource is not considered to be fresh anymore - we will see how exactly in the cache-control section of this article - but is said to be stale.

In such case, the cache sends a special kind of request, called a conditional request to the origin server. Conditional requests allow caches to retrieve resources only if they are different from the one they have in their local storage. Since only the origin server ever has the most recent representation of a given resource, conditional requests always have to go through the whole caching proxy chain up to the origin server.

These special requests have only two possible outcomes. If the resource has not changed, the cache is instructed to use its local copy by receiving a 304 Not Modified response along with updated headers and an empty body. This outcome, the third one on our list, is called a successful validation.

Finally, the last possible outcome is when the resource has changed. In this case, the origin server sends a normal 200 OK response, as it would if the cache was empty and had forwarded the request. To put it another way, cache misses caused by empty cache and failed validation yield exactly the same HTTP response.

To best visualize these four paths, it is helpful to picture them in a timeline, as illustrated below.

At first, the cache is empty. The flow of requests starts with a cache miss (empty cache outcome). On its way back, the cache would read caching instructions and store the response. All subsequent requests for this particular resource would yield to cache hits, until the resource becomes stale and needs to be revalidated.

Upon a first revalidation, it is possible that the resource has not changed, hence, a 304 Not Modified would be sent.

Then, the resource eventually gets updated by a client, typically with a PUT or a PATCH request. When the next conditional request arrives, the origin server detects that the resource has changed and replies a 200 OK with updated ETag and Last-Modified headers.

Knowing about cache hits and cache misses along with the 4 possible paths that every cacheable request could take, should give you a good overview of how caching works.

Though overviews can only get you so far. In the following section, we will give a detailed explanation of how origin servers communicate caching instructions.

How origin servers communicate caching instructions

Origin servers communicate their caching instructions to downstream caching proxies by adding a Cache-Control header to their response. This header is an HTTP/1.1 addition and replaces the deprecated Pragma header, that was never a standard one.

Cache-control header values are called directives. The specification defines a lot of them, with various uses and browser-support. These directives are primarily used by developers to communicate caching instructions. However, when present in an HTTP request, clients can also influence the caching decision. Let us now take the time to describe the most useful directives.

max-age

The first important Cache-Control directive to know about is the max-age directive, which allows a server to specify the lifetime of a representation. It is expressed in seconds. For instance, if a cache sees a response containing the header Cache-Control: max-age=3600, it is allowed to store and serve the same response for all subsequent requests for this resource for the next 3600 seconds.

During these 3600 seconds, the resource will be considered fresh and cache hits will occur. Past this delay, the resource will become stale and validation will take over.

no-store, no-cache, must-revalidate

Unlike max-age, the no-store, no-cache and must-revalidate directives are about instructing caches to not cache a resource. However, they differ in subtle ways.

no-store is pretty self-explanatory, and in fact, it does even a little more than the name suggests. When present, a HTTP/1.1 compliant cache must not attempt to store anything, and must also take actions to delete any copy it might have, either in memory, or stored on disk.

The no-cache directive, on the other hand, is arguably much less self-explanatory. This directive actually means to never use a local copy without first validating with the origin server. By doing so, it prevents all possibility of a cache hit, even with fresh resources.

To put it another way, the no-cache directive says that caches must revalidate their representations with the origin server. But then comes another directive, awkwardly named… must-revalidate.

If this starts to get confusing for you, rest assured, you are not alone. If what one wants is not to cache, it has to use no-store instead of no-cache. And if what one wants is to always revalidate, it has to use no-cache instead of must-revalidate.

Confusing, indeed.

As for the must-revalidate directive, it is used to forbid a cache to serve a stale resource. If a resource is fresh, must-revalidate perfectly allows a cache to serve it without forcing any revalidation, unlike with no-store and no-cache. That’s why this header should always be used with a max-age directive, to indicate a desire to cache a resource for some time and when it’s become stale, enforce a revalidation.

When it comes to these last three directives, we find the choice of words to describe each of them particularly confusing: no-store and no-cache are expressed negatively whereas must-revalidate is expressed positively. Their differences would probably be more obvious if they were to be expressed in the same fashion.

Therefore, it is helpful to think about each of them expressed in terms of what is not allowed:

no-store: never store anything
no-cache: never cache hit
must-revalidate: never serve stale

Technically, these directives can appear in the same Cache-Control header. It is not uncommon to see them combined as a comma-separated list of values. A lot of popular websites still seem to behave very conservatively, sending back HTML pages with the following header:

Cache-Control: no-cache, no-store, max-age=0, must-revalidate

When you stumble upon this, the intention behind it is usually pretty clear: the web development team wants to ensure that the resource never gets served stale to anyone.

However, such cache-buster lines are probably not necessary anymore. Past work done in 2017 already showed that browsers are really rather compliant with the specification in respect to Cache-Control response directives. Therefore, unless you’re planning on setting up a caching stack with decades old software, you should be fine using just the directives you need. The most popular combinations will be analyzed in another article.

public, private

The last important directives we haven’t discussed yet are a little bit different, as they control which types of caches are allowed to cache the resources. These are the public and private directives, private being the default one if unspecified.

Private caches are the ones that are supposed to be used by a single user. Typically, this is the web browser’s cache. CDN and reverse-proxies on the contrary, handle requests coming from multiple users.

Why do we need to distinguish these two types of caches ? The answer is straightforward: security, as illustrated by the following example.

Many web applications expose convenience endpoints that rely on information coming from elsewhere than the URL. If two users access their profile by requesting /users/me, at https://api.example/com, and their actual user id is hidden within a Authorization: Bearer 4Ja23ç42…. token, the cache won’t be able to tell these are in fact two very different resources.

Indeed, when constructing their cache key, caches do not inspect HTTP headers unless specifically instructed to do so, as we shall see in the next section.

s-maxage

The s-maxage directive is like the max-age directive, except that it only applies to public caches, which are also referred to as shared caches (hence the s- prefix). If both directives are present, s-maxage will take precedence over max-age on public caches and be ignored on private ones.

When using this directive, the general rule is to always ensure that s-maxage value is below max-age’s. The reasoning behind this rule is that the closer you are to the origin, the more suitable it is to check frequently what the latest representation is.

Imagine you were to cache for one day in the proxy, and one hour in browsers.

Every time a browser would ask a resource to upstream servers, we could know in advance that the proxy will not contact the origin server for at least a day. Therefore, why not put the same TTL directly in the browsers ? As a conclusion, it is a best practice to always leave out a longer TTL in max-age than in s-maxage.

stale-while-revalidate and stale-if-error
These two directives are not technically part of the original specification but are part of an extension which were first described more than 10 years ago. Although their browser support is limited, some popular CDNs have been supported them for more than 5 years!

Though stale-while-revalidate is pretty useful. As the name implies, it allows a cache to “[...] immediately return a stale response while it revalidates it in the background, thereby hiding latency (both in the network and on the server) from clients”.

This caching extension proves really helpful for things like images, where reducing latency is critical for the user experience, and where having a stale version for a few seconds is often better than a painfully downloading image.

As for stale-if-error, it allows a cache to serve a stale version if the origin server returns a 5xx status code. This gives developers a chance to fix potential issues during a grace period where clients are shielded from irritating error pages.

Consider the case of a meteo third-party script. If the meteo server happens to be unreachable for a few minutes, it’s probably best to display a slightly outdated forecast during this lapse of time, than it is to see a portion of the page be blank (or a whole blank page if the code does not handle third-party scripts loading failures.

What we don’t know yet

After examining these Cache-Control directives, we now understand how applications that are distributed on the web, tend to leverage HTTP caching mechanisms in multiple ways, depending on what they need.

Though what we don’t yet understand is what cache softwares actually do with the response they receive. They will most likely have to store it somewhere in order to retrieve it later. That’s the core idea of any caching system after all.

Under normal circumstances, this certainly looks like what we would call an implementation detail. It should be merely enough to know that resources are indeed stored some way. Yet in this case, learning just a little more is actually critical.

Neglecting the mechanisms that govern how caching softwares map objects from the HTTP responses space to their storage space can have really unexpected consequences, such as serving a brotli encoded Chinese document, to a user who does not understand Chinese, using a browser unable to decode brotli ¯_(ツ)_/¯

How caches store and retrieve resources

Albeit unlikely to happen, since most browsers can decode brotli - and since most people know how to 說中文 - the previous situation can still easily occur. To understand why this is the case, one must consider how caches store their representations.

By virtue of what they try to achieve, most caching softwares ought to be able to quickly retrieve simple text documents. To do so, a very simple yet powerful strategy is to use a key-value store. This strategy fits well in-memory representations. Therefore, the question one must answer when designing is the following: how to construct a cache key from an HTTP response?

What we are looking for here is a way to uniquely identify a resource. Conveniently, this is exactly why URIs - Uniform Resource Identifiers - were invented in the first place!

But URIs don’t tell the whole truth about resources. They never describe them entirely, if only for the fact that resources change over time.

Websites get rebranded, new content gets published and users update their profile. Granted, not for the same reasons or at the same frequency, though all resources will eventually change. In fact, the entire Conditional request specification is based on this sole observation: nothing is permanent except change.

Philosophical quotes aside, there is, however, another time-independent reason why resources change. Indeed, any moment, resources may be available in multiple representations. This is why we have Content-Negociation.

The HTTP request headers Accept, Accept-Language, Accept-Encoding, Accept-Charset (and a few other headers who are not strictly speaking part of content negotiation) add another dimension on which representations can differ. As such, the problem of finding a good cache key becomes more complicated. Since all these representations share the same URI, caches must have a way to distinguish them in order to serve the right representation at each client, honoring content negotiation.

And since only origin servers know what different representations are available, it is again the origin server’s responsibility to indicate to a cache based on which headers it will generate a different representation. To do so, the origin servers must add a Vary header containing the value of the request headers that cause different representations to be generated.

When caches see a response coming from an origin server with, for instance, the header Vary**:** Accept-Language, it will examine the value of the Accept-Language header, such as fr-FR**,** and use this value to construct a more specific cache-key, perhaps like https://example.net/home.html_fr-FR.

The actual implementation strategy is of little importance to us. Altering the cache key might not even be the best way to do it. It somehow has to use the value of the header to differentiate representations.

The Vary header can actually point at more than one header, when resources are available in multiple representations. Selecting a cache key when multiple headers are involved is not really much more complicated than with only one header. The real problem when varying over multiple dimensions is the combinatorial explosion.

Unfortunately, there are no ways around this. If you are to cache and serve your resources in multiple representations, you have to pay the cost of a large storage. If you decide to lower your vary cardinality, some of your users will receive cache hits for responses that won’t match their requests.

On the other hand, if you vary properly on everything, and do not have enough storage space, chances are your users won’t be seeing cache hits anytime soon.

Now, it is important to know that this is only a problem if you decide to use a public cache, for which two different requests coming from two different users are running the same code, at the proxy level. If you decide to leverage the browser’s cache only, then you can skip the Vary header altogether and serve resources in as many representations as you want. This is because each browser’s cache will only cache representations matching the user’s preferences. This is good news!

But let’s not get ahead of ourselves just yet. As we said, caches use the value of the header as its input to generate a more specific cache key. But what is to say that all these values are well formatted ? Absolutely nothing! This is the rather inconvenient consequence of RFC father’s robustness principle. HTTP servers are indeed very liberal in what they accept.

However there is hope.

Considering the case of an origin server that can only produce a representation in two different languages, caches must be able to regroup incoming Accept-Content values such as fr, fr-FR, fr_FR_.._ into something such as FR. Otherwise, just like before with the combinatorial explosion, the number of representations will explode, but in this case, for a misguided reason.

The process by which all these representations are regrouped is called normalization and is often done at the cache. Many caches offer configuration utilities or their own languages to deal with these situations. Sometimes, the functions are even already written, or snippets can easily be found on the Internet. The following pictures illustrates the process for the infamous User-Agent header.

Fastly, a popular CDN, sampled 100 000 requests and found that the Accept-Encoding header was expressed in 44 different ways ! As for the User-Agent header, they found a shy of… 8000 different ones! Without normalization, chances are that the cache will never see any hit.

This wraps up the section about representation variation. At this point, we know how to instruct caches to store our resources, and have learned to leverage the Vary header to prevent accidents from happening when using public caches. We have now covered enough of the specification to be able to cache resources effectively.

Common misconceptions

By now, you should have a thorough understanding of how HTTP caching works. Freshness control, resource’s representations and cache hits are no longer mysterious concepts to you. And if you start to feel empowered by all this knowledge, we have some good news for you: we’ve covered a large portion of the specification, and you now know pretty much all that’s necessary to be up and running.

But make no mistake. Caching is a complex topic.

Experience has shown us that, unless you’re dealing with it on a day-to-day basis, what may be crystal clear today will quickly turn into something rather blurry after a few weeks. Therefore, we decided to conclude this second article by dispelling two common misconceptions that are all too easy to make.

Freshness-control and validation

This might seem obvious after reading the previous sections but it is worth repeating many times. Freshness control and validation (which we have slightly discussed in the beginning) are two very distinct mechanisms that serve two very different purposes, and involve HTTP requests between different pieces.

Freshness control always happen in a cache and is solely based on time
Validations always happen in the origin server and are based both on time and on identifiers (ETags)

This is something we find important to remind ourselves. It means that once the cache has received temporal instructions, it can - and best believe it will - serve resources without ever contacting the origin server until the timer expires.

For instance, if your web application’s HTML file reaches a browser and the HTTP response happens to include the header Cache-Control: max-age=86400 the browser will happily serve the same version of your app for a day. In this case, the browser would serve it for one day without any possible action from you or anyone, except the user, if one ever decided to flush his browser’s cache.

If you’re thinking everyone can make mistakes, and one day is not so bad, well, brace yourself: the maximum max-age value is… 31536000 seconds! That is to say, one year. This is the reason why HTML files are very dangerous to cache like this, and should generally be declared with Cache-Control: no-cache.

Freshness and most recent representation

Another misconception is to believe that cache hits and freshness have anything to do with having the last available version of a resource. This is what we all try to achieve, but one can never truly know if the resource it has been served from a cache is indeed the most up-to-date version. In fact, this holds true even in the absence of cache. It has to do with the nature of distributed applications: other people’s actions can change the things we are interacting with at any time.

When querying the state of the application, the ETag header must always be used to always let the server know what our current understanding of the application’s state is. And if it does not match the server’s, 409 Conflict are expected to be received on the client side.

Conclusion

Along this article, we have described how caching actually works. Now would be a good time to spin up a local dev server and fiddle around with these two core headers: Cache-Control and Vary to see them in action.

We started by giving an overview of how caching works, illustrating the four possible paths that a request can take : the happy path (cache hit) and the 3 possible ways to have a cache miss : empty cache, failed revalidation and successful revalidation. This overview alone gives the possibility to understand how complex caching topologies can fit together.

Then, we went deeper and looked at all the most useful Cache-Control headers, and clarified some subtle differences that are all easily missed.

We also looked at the Vary header and the fundamental difference between resources and representations, to avoid serving the wrong representation to the right client.

Finally, we took some time to review it all through the angle of common misconceptions you might encounter, and hopefully helped you to avoid them.

In the next article, we’ll apply all of this knowledge to set up a local lab environment in which we will set an innocent node.js app on fire with a load-testing tool, right before rescuing it with the help of a popular caching software.

Stay tuned!

To go further:

The official specification about the material we covered (and other things)
https://tools.ietf.org/html/rfc7234#section-5.3

Google Web’s Fundamental
https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#defining-optimal-cache-control-policy

About the Cache-Control header:
https://developer.mozilla.org/fr/docs/Web/HTTP/Headers/Cache-Control

About the Vary Header:
https://www.smashingmagazine.com/2017/11/understanding-vary-header/
https://www.fastly.com/blog/best-practices-using-vary-header
https://www.fastly.com/blog/getting-most-out-vary-fastly
https://www.fastly.com/blog/understanding-vary-header-browser

An in-depth introduction to HTTP Caching: exploring the landscape

Léo Jacquemin — Mon, 17 Dec 2018 18:21:46 +0000

Cache Me If You Can

About 2 years ago, I remember witnessing a reunion that had a profound impact on my software developer life.

The client, a former developer with decades of experience, a tech savvy product owner and another senior developer were talking about a caching problem. The API was answering stales video fragments that caused clients to view the wrong content.

It was bad. Really bad.

Although nobody knew what to do to fix the problem, they all seemed to agree on one thing. The problem was most likely a caching problem.

I didn’t know a thing about HTTP caching back then, so all I could do was listen to them arguing about what could have possibly gone wrong. And all of them had a different explanation.

“Browsers deviate from the specs !” the client said. “The CDN override our caching directives!” the tech product owner thought. “We need to invalidate the whole cache” the tech lead replied.

Since I wanted to be helpful too, even though I wasn’t exactly sure how, given my level of understanding, I started asking questions to my fellow coworkers.

I remember very well the level of confidence everyone seemed to have when answering my questions. Everyone acted like they knew what HTTP caching was all about. But at the same time, all their answers felt really vague and shallow. It was really like everybody had this high level of understanding of how things worked, but nobody wanted to get into the details.

Eventually, the problem magically fixed itself and the team was pretty satisfied with how things worked out.

But I was not.

I thought to myself, what just happened these past few days? Why is it that nobody was willing to admit that they don’t have a clue on how this whole caching thing works? Is it the curse of the software developer to always be tempted to pretend that we know more about a subject than we actually do?

So I decided to check it out for myself. And I somewhat understood why everybody was pretending. The subject was by no means easy. But I was determined to go to the bottom of this.

And that’s how what was supposed to be a few hours of Googling turned out to be months of reading articles, meditating on the specs, and experimenting with caching softwares.

Fast forward today, I now realize that web performance (of which HTTP caching is one of the most important aspects) is a topic on which we are not trained enough. Too few articles talk about it, and most of them don’t go deep enough.

The following articles are my attempt to rectify that by sharing everything I have learned during the past two years about HTTP caching.

I’m not a caching expert, and I won’t turn you into one. But it will hopefully give you a strong understanding of how things actually work.

Let’s Get Started

This series of articles deals with caching in the context of HTTP. When properly done, caching can increase the performance of your application by an order of magnitude.

On the contrary, when overlooked or completely ignored, it can lead to some really unwanted side effects caused by misbehaving proxy servers that, in the absence of clear caching instructions, decide to cache anyway and serve stale resources.

Before we get to the tactical details of how caching works, it helps to understand the context and the landscape of the problem we are up against. For this reason, the first part of this series covers where caching should happen and why we need it.

Without further ado, let us start with an overview of key considerations to keep in mind when dealing with HTTP caching, and to a lesser extent, with web performance in general.

Caching everywhere

Browsers

Caching is a very popular technique. The idea is indeed pretty appealing: no matter how long the I/O request, how CPU-intensive the computation, or any other programming task, it is always the same: storing the result somewhere and retrieving it as it is, for its further application.

Taking the example of browser’s HTTP cache that all browsers implement, web resources are stored on the user’s filesystem. Hence further requests that will access these same resources will have them delivered instantly.

No network request, no client/server round-trips, no database access, and so on. Can you think of any performance enhancement that would yield better results than no latency at all and complete server offloading? That’s simply not possible.

One might think that this situation is too ideal and impractical. If it were true, how come most pages don’t load that fast? One reason for this is because, even though all web resources are cacheable, they should not be cached the same way.

HTML files for example, which are the first to be downloaded and that contain references to other assets, are notoriously dangerous to cache. Therefore, they’re unlikely to find their way to browser caches except for a few minutes at most, as we shall see in a moment.

But another possible explanation, one that we find more likely based on our experience, is that caching policies are often completely left out for web servers to decide.

Setting a flag on a web server’s configuration file to automatically activate ETag generation and Last-Modified headers is not time-consuming and can give decent results.

For some time at least. Until one realizes that the feature doesn’t work as expected or even worse, that users are starting to be served stale content due to unknown reasons.

Besides, these web servers won’t do much good in terms of caching your API. Most of them generate cache headers based on files metadata, which can have subtle consequences. But with API requests, there is no file to read metadata from. Hence when they see a resource that is dynamically generated, all they can do is watch and forward the request.

Granted, as efficient as it can be, caching in the browser is not as easy it as looks. Major browsers implement different layers of cache, of which the HTTP one we’re talking about is only a piece. And in case you’re wondering, they can interact in some ways that are not always as predictable as we would like them to be.

Besides, caching in the browser is notoriously dangerous, because developers lack the ability to invalidate resources at will. Doing so would involve allowing a web server to push information to every client that interacted with it, without clients initiating a connection, which is not possible in a client/server architecture.

Furthermore, from the cacher’s point of view, as powerful as it is, the browser has other flaws. It uses some doubtful heuristics when no caching instructions are explicitly present, all the more reason for us to help it know exactly what to do.

But even if we were to do it, users have the ability to flush their cache anyway or disable it. Not the almighty caching tool that one could ask for Christmas, after all.

So let us take the browser out of the equation for now and ask ourselves: without it, is HTTP caching still relevant? Is it implemented at other places? As it turns out, the browser is just one piece of the caching chain. And if browsers were all of a sudden to stop caching everything, rest assured: CDNs got us covered.

CDN

Content Delivery Networks (CDN) are the unified champions of the HTTP caching world. Most of them have installed tons of servers — Akamai has ~240 000 — all geographically around the globe in order to serve our content closely to our end users.

These companies have accumulated decades of experience on web performance. Most of the people who write the specs or the software that power the specs usually work in these companies, which is a bit of an indication that they know what they’re doing. Let us take a quick tour of why they are so important and how they work.

First and foremost, is it crucial to understand that all of these servers are HTTP servers. In HTTP terminology, they are proxy servers which means that they speak HTTP. They do not encapsulate our requests into another shady proprietary application protocol. They just use these proxies, most of which are even free or open source!

As a direct consequence, any knowledge of HTTP caching is immediately actionable to leverage the infrastructure they put at our disposal. In addition to billions of browsers, we now have thousands of servers strategically placed by specialized companies waiting for us to instruct them how to cache our content for maximum efficiency. Furthermore, depending on what your priorities are, that’s not even the best feature.

Most modern CDNs advertise the ability to programmatically purge resources out of the CDN’s network instantly. Let us say this again: programmatically and instantly.

As far as HTTP caching is concerned, the two hard problems in computer sciences might just have been reduced down to one! Caching anything and invalidating instantly whenever we want. From a developer keen on web performance point of view, it can hardly get any better.

A word of caution: CDNs are not to be blindly trusted based on that. We’ve already experienced some slight differences between what was marketed on the brochure, and what we had in production, where other cache busting techniques had to be used to ensure the cache was actually cleared.

Before starting to cache all of your resources forever carelessly, experimenting on a small sample first might be a good idea. However, if it’s not here today, it will eventually land consistently everywhere, making our lives much easier.

Another aspect to keep in mind is that it is in every CDN’s best interest that developers see them as effective and powerful tools. As a consequence, they’ll often do their best to comply to the specification.

Also, they all provide some web interface that makes it a breeze to give caching rules that will either override or play nice with upstreaming caching directives coming from origin servers. The ability to configure your caching policy outside of your codebase happens to have two interesting consequences.

First, it means that people other than developers can have control on this which can be seen as a strength or a weakness based on your perspective. The ability to have someone other than a software developer fine-grain caching settings at the CDN’s level on critical occasions might come in handy in some situations.

But perhaps even more importantly, it means that the performance part of your application, or at least a large part of it, can be completely factored out at the infrastructure level.

All developers that have experienced performance problems at some point know this: it is rarely something you can mutualize in a single file called performance and is often best planned ahead, just like detailed application-level logging. But that’s not necessarily the case with caching.

Provided your caching headers are smartly set and your CDN well configured, you could write poorly optimized server code (although, please don’t) and still have the vast majority of your users be served content in less than 300 milliseconds, by reusing cached versions that are still perfectly fresh.

On the flip side, as one might expect, setting up and maintaining such a network of servers is both expensive and complex. As a result, although some of them have free plans that already allow for some serious performance boost in rather wide geographic areas, they remain paid solutions. If your intention is to cache millions of resources, be prepared to pay several thousand of dollars. This is where private proxy caches come into play.

Private proxy

The third and last player of the HTTP caching game is simply the same softwares many of the CDNs we just talked about are made of. Do the names Varnish, Squid, Traffic Server, or even Nginx ring a bell? Well, they certainly should!

Given what we just said about the unmatched performance of web browser caches, and the case we just argued in favor of CDNs, one might legitimately asks: why bother setting up these in front of my origin servers when CDNs can do much more, and browsers are closer to my end users?

Well, this third and last solution in the HTTP caching landscape also comes with its fair share of advantages. As a matter of fact, we’ll argue that this should often be the first solution to look for. Let us examine a few bonus points of the most popular solutions.

First, these solution are free and open source, which can be seen as a double edge sword. How many of such software that were once praised by the community suddenly stopped being maintained by their core committers due to a lack of interest, sponsoring, or both? The fear of seeing a project’s main dependency (web framework, ui library…) going to the software graveyard is a real concern.

Although when assessing this risk, one must always consider the maturity of the technology, how long it’s been around, which big company is using or supporting it — they usually do both — and how effective it is at solving a particular problem. Lucky for us, the proxies we’re talking about score pretty high on all levels.

Another aspect simply comes from the performance gain. As mentioned previously, these softwares are what CDNs are made of. This has two consequences.

First, it massively decreases the chance of termination of their usage, because CDNs whole infrastructure relies on it. This kind of stability is greater than when a company is just using a library as part of a larger system. In this case, the software is the system.

Second, any hard gained knowledge about their installation, configuration and maintenance will directly be transferable the day you decide to switch or complement your caching infrastructure with a CDN, since they are the same servers! In the software development world, where everything changes so fast, this is always good news.

It’s the same reason why learning HTTP caching is a good bet, because it’s relevant in many different places. And will likely stay that way for decades, we shall see why in the end of this article.

Browsers, edge servers, proxy servers… that’s a lot of caching intermediaries. Thinking about all these caches at the same time can be a little overwhelming and hard to picture. Luckily for us as we mentioned previously, all these caches speak HTTP and comply to the same specification.

As proxy servers, they all act transparently both for clients and for servers. Origin servers communicate with the proxy as if it were the client, and end users browsers communicate with the proxy as if it were the server. This holds true even between proxy servers.

As such, we can model the reality by considering that all caching infrastructures are equivalent to one with a single caching proxy in place.

This is best described by the following picture:

We’ll use this simplification in the rest of this series of articles. This abstraction is helpful to visualize the mechanisms at play, but it comes with certain limitations. Putting two proxies one after the other can have subtle consequences.

So far, we have covered a lot of ground without giving away anything about the detailed interactions between a client and a caching server. HTTP caching is a complex subject. Before moving on to the real technicalities in part 2, there is one last important aspect that needs to be explored.

Caching for the future

Have you ever had to wait a few seconds to interact with a web page, or to see anything meaningful on the screen? Probably. That’s actually not unlikely at all. Everyone has experienced the slow internet at some point in their life, even at home with fiber connectivity.

Past UX research actually has got some scary metrics about this. Metrics that have been the same for more than 40 years, by the way, making them unlikely to change anytime soon. Their guidelines are expressed in terms of hundreds of milliseconds, whereas we are used to browse the Internet and wait several seconds.

So why do we believe that HTTP caching is relevant today and will almost certainly remain so for the years to come?

It all comes down to this basic question: why is the web so slow?

This question is undoubtedly a difficult one, and a rigorous examination would take us far too deep and lose our primary focus, which is HTTP caching. However, if we don’t have any idea on what makes a web page slow, how can we be sure that caching, despite all of its virtues, is the right tool for the job?

From its conception, the web has certainly changed a lot. The days where web pages consisted of simple HTML files containing mostly text and hyperlinks are long gone.

These days, many websites are labeled as web applications, as their look and feel resembles that of desktop apps. But despite all the improvements and innovations that have been made over the years, one thing has always remained the same: it’s always begun with the downloading of an HTML file.

As it gradually downloads the HTML, the browser discovers all the other resources that combined will result in all the client’s side code that gets parsed and ultimately, executed.

In case of a typical SPA for instance, the flow of requests goes on. Upon execution, the application starts downloading data from the server, typically serialized as JSON these days, in order to render the UI. Each URL inside the JSON payload will, once referenced into the code (it doesn’t even have to be added to the DOM), triggers another download so that it can be displayed on the screen.

This model of execution, where all the bytes needed for the application to do its job are scattered among different places and must be downloaded every time is quite remarkable. Arguably, it’s what makes the web so unique in the software development world.

But there is a catch.

Indeed as of today, the typical web application requires 75 requests and weighs 1.5 Mb, meaning that browsers must initiate a lot of requests of ~20kB each. To put it another way, it means that a typical web application is made of lots of short-lived connections.

And here is the catch: this is the exact opposite of what TCP is optimized for.

The anatomy of a web request

In theory, all of these 75 requests should go through these steps:

DNS resolution
TCP handshake
SSL handshake
Data downloading constrained by TCP flow and congestion control

Let us walk through each of them and draw a counter-intuitive consequence from it.

DNS resolution is the process of converting a human readable hostname such as example.com into an IP address. Although the DNS protocol is based on UDP instead of TCP, the journey to getting a hostname IP address can be really long, involving multiple DNS servers. And it’s not uncommon that these DNS resolutions take between 50 to 250 milliseconds.

Then, each request must initiate a TCP connection. HTTP has always needed a reliable transport protocol to work. If the ASCII bytes representing every HTTP request were to be delivered out of order, a status line such as

GET /home.html HTTP/1.1

would become:

GTE /mohe.hmtl HTTP/1.1

and the request wouldn’t make much sense.

In order to guarantee delivery order, TCP marks each byte of applications data with a unique identifier called a sequence number (SYN). The problem is that this number must be chosen randomly for security reasons.

Therefore, if a client is asking for a resource, it cannot do so without signaling to the server its initial sequence number (ISN). Then it must wait before the server acknowledges good reception of this segment, before being able to send application data.

Well, unless the request is secure, which it probably is since 80% of HTTP requests these days are actually secure HTTPS requests. These are normal HTTP requests, except that they are encrypted in order to guarantee (at least, up to this day) confidentiality, integrity, and authenticity.

To accomplish that, the client and servers must now agree on a TLS protocol version, select cryptographic functions, authenticate each other by exchanging and validating x509 certificates… for no less than ~10 protocol messages that can be packed into a minimum of 2 TCP exchanges, and as many round trips.

Once the connection is setup and secure, then TCP can finally start sending segments carrying our application data, such as our HTTP request. Unfortunately for us, TCP prevents us from sending all our data at once in one batch of multiple segments.

This restriction is a necessary evil, so that we don’t accidentally cause a buffer overflow on the receiver. When an application that initiated a connection asks the underlying TCP socket for data, TCP will refuse to give any chunk that would be incomplete or made of unordered smaller chunks. Hence, the need for a buffer.

The way it works is that TCP sends N segments in the first batch, and, if all segments were received by the server, will send twice as many segments (2N) in the next batch, and so on, leading to an exponential growth. This mechanism is commonly known as the slow-start algorithm and is one of the two only possible modes in which TCP operates, along with congestion avoidance.

We won’t discuss congestion avoidance in this series. But if there was only one thing to be aware of, it would be that once TCP senses that the underlying network may be congested, the protocol starts acting much more conservatively. Thereby extending even more the time required to transmit data.

The consequence

With all these steps in mind, let us make a simple calculation to realize something important. At the beginnings of the web, the N parameter (known as the congestion window in TCP’s terms) was equal to 1. With such a window and an average resource of 20kB, we can determine how many round trips are necessary for an average request to be fully transmitted.

Indeed, under normal circumstances, the maximum segment size (MSS) is 1460 bytes. That’s equivalent to 20 000 / 1460 = 14 TCP segments. When dispatched according to the exponential scheme we just described, this is equivalent to 4 round-trips to the server.

Now, if we approximate a UDP-based DNS request to a TCP-based request to the origin servers, we can estimate a total number of round-trips (RT) that require the booting of a modern web application:

DNS request: ~1 RT
TCP connexion setup: 1 RT
SSL handshake: 2 RT
Resource downloading: 4 RTT
Total: 8 RT

75 requests that all require 8 round-trips each result in a total of 600 rounds trips to the server. A typical RTT between Europe and the US is 50ms, which gives us the amount of time that information spend flowing on the Internet when we request a typical web application: 30 seconds. And it could get worse.

The Amazon homepage for instance, a typical MPA, currently weighs 6.3Mb and requires 339 requests. That would translate into a salient page loading time of 2 minutes and 15 seconds. As an exercise, try do the same for the Facebook Messenger homepage, a typical SPA.

How to interpret this number? This would be the actual page load if every single resource had to be downloaded sequentially, TCP initial congestion was down to its minimum value of 1, and if DNS requests, TCP and SSL handshakes had to be done all over again every time. The web would be a much different place for sure.

Fortunately, many improvements have been made over the years. DNS resolutions are cached at different places, TLS handshakes results are reused.

TCP connections were allowed to be persisted between multiple requests, avoiding the cost of both connection setup and slow-start on each request.

TCP’s initial congestion window was lifted up twice, from 1 to 4 and more recently, to 10. Browsers started to open up parallel connections (6) as well as some really advanced strategies to accelerate page load times.

Some proposals tried to break free from the connection setup, although none of them are widely implemented.

Some CDNs even acquired patented algorithms to tune some of TCP aspects such as congestion avoidance, always for the same reason: speed up delivery.

What consequence can we draw from all these round-trips between browser and origin servers?

That bandwidth stopped being the bottleneck many years ago.

This is somewhat counter-intuitive to most of us, because bandwidth has embodied the browsing speed for years. After all, it has exactly the dimension of a speed, bits by unit of time, and it is the only thing ISPs advertise when trying to lure us into becoming their customers.

Besides, browsing on 3G is clearly slower than on 4G. But that is because the threshold at which bandwidth stops being the bottleneck is 5 Mb/s, and 3G maxes out at 2 Mb/s in ideal conditions!

And if wireless technologies such as Wi-Fi or 5G are indeed slower than their wired counterpart of some sort, it is also because in wireless systems, packet drops caused by interferences are commonplace thereby making latency much higher and volatile.

The latest version of the HTTP protocol, HTTP/2, codenamed H2, was a continuation of the SPDY protocol which itself was initially designed following this very observation: bandwidth doesn’t matter much anymore. Latency does. But latency is fundamentally a function of two things: the speed of light in optic fiber, and the distance between clients and servers.

Active research to increase the speed of light in optic fiber has already been conducted, but it only got us so far. In most deployments, light already travels at 60% of its maximum theoretical limit.

But even if we were to reach 99%, that would only have a significant impact on your website if it’s already loading in a few seconds at most. If it’s loading in more than 5 seconds, even though the performance increase would be noticeable, it would still feel slow.

Therefore we are left with one obvious choice: reducing the distance.

And the only way to accomplish that is by leveraging browsers and content delivery networks with HTTP caching.

Conclusion

Along this article, we have argued that HTTP caching is one if not the most effective way to improve the performance of your web application. And as many studies keep pointing out, page load time is an important subject that can be directly translated into user satisfaction and, ultimately, profitability.

We have seen that caching can happen pretty much everywhere, from the browser, to CDN, to private proxy servers sitting just in front of your origin servers. But also that, unlike many performance decisions, it can be completely externalized outside the main codebase, which is both valuable and convenient.

Finally, we took a closer look at the anatomy of a modern web application, to understand why latency is the new bottleneck, making caching relevant today and for years to come, even with the steady deployment of H2.

In the next article of this series, we will deep dive into the How.

In a way, this first part was merely a warm-up! We’ll learn how all of this actually works: resource freshness, revalidation, representations, cache-control headers… and much more!

Stay tuned!

To go further:

Ilya Grigorik High Performance Browser Applications (a must read):
https://hpbn.co/

Mike Belshe paper that served as a basis for the SPDY protocol: https://docs.google.com/a/chromium.org/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDoxMzcyOWI1N2I4YzI3NzE2

Active CDN blogs with tons of great articles:
https://www.fastly.com/blog
https://blogs.akamai.com/web-performance/