Caching is at the heart of content delivery network (CDN) services. Similar to how browser caching stores files on a hard drive, where they can be more rapidly accessed, a CDN moves your website content to powerful proxy servers optimized for accelerated content distribution.
Caching works by selectively storing website files on a CDN’s cache proxy servers, where they can be quickly accessed by website visitors browsing from a nearby location.
A majority of all website content consists of static pre-formatted files that are not expected to change over time (or for different users). These files are the default candidates for caching, as opposed to dynamic files, which are generated on-the-fly based on information from a database.
For example, an e-store page typically combines a static template with dynamically generated product information. Typical static files include images, stylesheets (CSS), and JavaScript files.
Delivering content from CDN cache proxies removes the burden from the origin (backend) server, significantly reducing bandwidth costs associated with serving content to numerous visitors. For most sites, bandwidth costs can be reduced by as much as 40% to 80%, depending on the percentage of cacheable content.
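The relationship between cacheable content and origin bandwidth can be sketched with simple arithmetic. The function name, the traffic volume, and the 60% cacheable share below are illustrative assumptions, not figures from any particular CDN.

```python
def origin_bandwidth_gb(total_gb, cacheable_share, hit_ratio=1.0):
    """Traffic the origin still serves once the CDN absorbs cached requests.

    cacheable_share: fraction of traffic that is cacheable (0..1).
    hit_ratio: fraction of cacheable requests actually served from cache (0..1).
    """
    if not 0.0 <= cacheable_share <= 1.0 or not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("cacheable_share and hit_ratio must be between 0 and 1")
    offloaded = total_gb * cacheable_share * hit_ratio
    return total_gb - offloaded


# Hypothetical site: 10,000 GB per month, 60% cacheable, every cacheable request hits.
remaining = origin_bandwidth_gb(10_000, 0.60)
print(f"Origin traffic: {remaining:.0f} GB")  # Origin traffic: 4000 GB
```

In this simplified model the saving equals the cacheable share multiplied by the hit ratio, which is why the percentage of cacheable content drives the range quoted above.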
As globally distributed networks of cache proxy servers, CDNs bring your website’s content closer to all visitors, no matter where they are. Having this content delivered from a local server significantly improves access speed and user experience.
Modern CDNs have traffic capacity far exceeding most normal enterprise network capabilities. Where a self-hosted website may be easily disrupted by unexpected traffic peaks or denial of service attacks, CDN cache servers are highly resilient and secure. As a result, they are stable during peak traffic instances.
Proxy cache servers are the building blocks of a CDN’s network data centers, which are strategically situated around the globe. These points of presence (PoPs) are selected based on the traffic patterns of individual regions.
Highly active locations with numerous users may have several data centers. On the other hand, remote locations with few users may have only one PoP to cover a large geographic region.
Once in place, cache servers act as a repository for website content, providing local users with accelerated access to cached files. The closer a cache server is to the end user, the shorter the connection time needed for transmission of website data.
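As a rough illustration of why proximity matters, the sketch below picks the geographically closest PoP for a visitor. The PoP names and coordinates are hypothetical, and geographic distance is only a stand-in: real CDNs usually route on network conditions rather than map distance alone.

```python
import math

# Hypothetical PoP locations: name -> (latitude, longitude) in degrees.
POPS = {
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
    "new_york": (40.71, -74.01),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_pop(user_location, pops=POPS):
    """Return the name of the PoP with the smallest distance to the user."""
    return min(pops, key=lambda name: haversine_km(user_location, pops[name]))


print(nearest_pop((48.86, 2.35)))  # a visitor in Paris -> frankfurt
```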
Hardware-wise, a typical cache server is a content delivery powerhouse, with bolstered RAM and SSD storage resources. Being the faster option, RAM is used for high-priority resources, while SSD is used for your least requested, but still cacheable, web files.
Efficient caching relies on a high hit ratio, which indicates how often requested resources are already present in the cache. The hit ratio, in turn, determines the average memory reference time: the fewer the misses, the less often a request pays the cost of going back to the slower source.
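One commonly cited general form of the average memory reference time is T = m × Tm + Th + E, where m is the miss ratio (1 minus the hit ratio), Tm is the time to fetch from the slower source on a miss, Th is the cache latency, and E covers secondary effects. The sketch below assumes that form; the function names and the sample timings are illustrative.

```python
def hit_ratio(hits, misses):
    """Fraction of requests answered from the cache."""
    total = hits + misses
    if total == 0:
        raise ValueError("no requests recorded")
    return hits / total


def average_reference_time(ratio, miss_penalty, hit_latency, overhead=0.0):
    """T = m * Tm + Th + E, with m = 1 - hit ratio."""
    miss_ratio = 1.0 - ratio
    return miss_ratio * miss_penalty + hit_latency + overhead


# Hypothetical numbers: 900 hits, 100 misses, 200 ms origin fetch, 10 ms cache latency.
ratio = hit_ratio(900, 100)
print(f"hit ratio: {ratio:.2f}")
print(f"average time: {average_reference_time(ratio, 200.0, 10.0):.1f} ms")
```

With these sample values, raising the hit ratio from 0.90 to 0.95 would cut the miss term in half, which is why small hit-ratio gains matter.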
Bélády’s algorithm always discards the information that will not be needed for the longest time in the future. This too-good-to-be-true approach is only possible when one can predict how far in the future information will be needed. As a result, this algorithm is rarely used in practice.
Least recently used (LRU) discards the least recently used items first. This algorithm is implemented by assigning an age counter to each cached resource and discarding the resource that has gone unused the longest. Generally, this is the most effective method of cache management.
Most recently used (MRU), as opposed to LRU, discards the most recently used items first. This algorithm is most useful in situations where the older an item is, the more likely it is to be accessed.
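The three eviction policies above can be compared with a small simulation. This is a minimal sketch, not how any production cache is built: the function name is made up, and the Bélády branch works only because the whole request sequence is known in advance.

```python
from collections import OrderedDict


def count_hits(requests, capacity, policy="lru"):
    """Replay a request sequence against a fixed-size cache and count hits."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for position, key in enumerate(requests):
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            if policy == "lru":
                cache.popitem(last=False)  # evict least recently used
            elif policy == "mru":
                cache.popitem(last=True)  # evict most recently used
            elif policy == "belady":
                future = requests[position + 1:]
                # Evict the key whose next use is farthest away (or never comes).
                victim = max(
                    cache,
                    key=lambda k: future.index(k) if k in future else len(future),
                )
                del cache[victim]
            else:
                raise ValueError(f"unknown policy: {policy}")
        cache[key] = True
    return hits


# A cyclic scan over three items with room for only two.
scan = ["a", "b", "c", "a", "b", "c"]
for name in ("lru", "mru", "belady"):
    print(name, count_hits(scan, 2, name))  # lru 0, mru 2, belady 2
```

The cyclic scan is the kind of workload where older items are the ones about to be requested again, so MRU matches the optimal result here while LRU never hits.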
Web developers use HTTP cache headers to mark cacheable web content and set cache durations. Using cache headers, you can control your caching strategy by establishing optimum cache policies that ensure the freshness of your content.
For example: “Cache-Control: max-age=3600” means that the file is considered fresh for up to an hour, after which it must be revalidated with, or refetched from, the origin server.
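A freshness check based on max-age might look like the following sketch. The helper names are hypothetical, and the check ignores every directive other than max-age.

```python
import re


def max_age_seconds(cache_control):
    """Extract the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control, age_seconds):
    """A cached copy is fresh while its age is below the max-age lifetime."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


header = "Cache-Control: max-age=3600"
print(is_fresh(header, 1800))  # True: cached 30 minutes ago
print(is_fresh(header, 7200))  # False: cached two hours ago
```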
Meticulously tagging each file, or even groups of files, can be overwhelming and prone to inefficiencies. Modern-day CDNs allow you to forgo the practice by employing intelligent mechanisms able to override cache header directives when they are discovered to be suboptimal.
Most commonly, these mechanisms enable the caching of dynamic content that is marked as uncacheable by default, even though freshness is not an issue.
Introduced with HTTP/1.1, the Cache-Control header handles a variety of cache functions. Cache-Control is supported by all modern browsers and supersedes previous-generation headers (such as Expires).
Cache-Control: public – enables caching by public platforms such as CDNs.
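Cache-Control: private – restricts caching to the end user’s browser; shared caches such as CDNs must not store the response.
Cache-Control: no-cache – allows the response to be stored, but requires revalidation with the origin server before each reuse.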
Cache-Control: no-store – completely prohibits caching.
Cache-Control: public, max-age=[seconds] – sets the maximum time (in seconds) that content is considered fresh before it must be revalidated or refetched.
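To make the directives concrete, here is a minimal sketch of how a shared cache such as a CDN proxy might interpret them. The function names and the returned wording are illustrative assumptions, and only the directives listed above are handled.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def shared_cache_decision(value):
    """Describe what a shared cache may do with a response."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return "do not store"
    if "private" in directives:
        return "do not store in a shared cache"
    if "no-cache" in directives:
        return "store, but revalidate before every reuse"
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        return f"store and serve for up to {int(max_age)} seconds"
    return "store, no explicit lifetime given"


for header in ("public, max-age=3600", "private, max-age=600", "no-cache", "no-store"):
    print(f"{header!r}: {shared_cache_decision(header)}")
```

The checks run from most to least restrictive, so a header that combines directives resolves to the strictest one.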
Expires – similar to Cache-Control: max-age, sets the date and time at which content expires.
Surrogate-Control – gives you increased control over cache policies for surrogates such as CDN proxies, which act with the authority of the origin server.
ETag – gives each version of your cached web content a unique identifier, which caches use to check whether their stored copy is still current.
Pragma – largely supplanted by Cache-Control, Pragma was previously used to handle caching instructions for browsers.
Some browsers still struggle with supporting the Vary header. When used properly, Vary can be a powerful tool for managing delivery of multiple file versions, especially for compressed files cached alongside their uncompressed counterparts.
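Two of the headers above lend themselves to a short sketch: ETag-based validation and Vary-aware cache keys. The function names are hypothetical, and deriving the ETag from a SHA-256 hash of the body is just one possible choice, since servers are free to generate identifiers however they like.

```python
import hashlib


def make_etag(body):
    """Derive a quoted identifier from the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, payload, etag); 304 when the client's copy is current."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b"", etag
    return 200, body, etag


def cache_key(url, request_headers, vary):
    """Build a cache key that keeps variants named by the Vary header apart."""
    varied = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((h, lowered.get(h, "")) for h in varied)


status, _, etag = respond(b"<html>v1</html>")
print(status, respond(b"<html>v1</html>", etag)[0])  # 200 304

gzip_key = cache_key("/app.js", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
plain_key = cache_key("/app.js", {}, "Accept-Encoding")
print(gzip_key != plain_key)  # True: compressed and uncompressed copies stay separate
```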
To date, most CDN caching has been a hands-on process. Modern CDNs, however, are developing new processes to monitor, categorize and cache a wider range of content, saving you time and allowing for higher overall efficiency.
This learning-based approach relies on a CDN’s ability to track content usage patterns to automatically optimize storage and delivery. The benefits of using such intelligent cache controls include:
Cache adjustment for regionally popular content
Automated cache rules for frequently accessed material
Predictive replication for high demand content
Time-sensitive archive and expiry policies
One of the main benefits of intelligent cache controls is the ability to identify new cache opportunities for dynamically generated objects. These pieces of content, which are generated anew with each visit, may not be subject to change but are still deemed “dynamic” due to a technicality.
Intelligent cache algorithms can automatically identify cacheable dynamic content simply by observing usage patterns. For example, when a system notices that the same HTML version of your product page is being served again and again, it labels it as static, even though it’s dynamically generated.
From that point, the HTML object is deemed “cacheable” and is served directly from a CDN’s proxy servers to improve page load speed and responsiveness. The algorithms, meanwhile, keep track of the object and constantly reevaluate its status, marking it as dynamic again as soon as they see that it was modified.
Doing this at scale can vastly improve website performance, with no impact on content freshness.
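The observe-and-reevaluate loop described above could be approximated as follows. This is a sketch under stated assumptions: the StabilityTracker class and its threshold of three identical responses are invented for illustration, and real CDN heuristics are not specified here.

```python
import hashlib


class StabilityTracker:
    """Treat a URL as cacheable once its body repeats `threshold` times in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._last_digest = {}  # url -> digest of the most recent body
        self._streak = {}       # url -> consecutive identical responses seen

    def observe(self, url, body):
        """Record one origin response; return True if the URL now looks static."""
        digest = hashlib.sha256(body).hexdigest()
        if self._last_digest.get(url) == digest:
            self._streak[url] += 1
        else:
            # Content changed (or first sighting): restart the streak.
            self._last_digest[url] = digest
            self._streak[url] = 1
        return self._streak[url] >= self.threshold


tracker = StabilityTracker()
page = b"<html>product 42</html>"
print([tracker.observe("/product/42", page) for _ in range(3)])  # [False, False, True]
print(tracker.observe("/product/42", b"<html>price changed</html>"))  # False
```

Resetting the streak on any change mirrors the reevaluation step: a modified object immediately loses its cacheable status until it proves stable again.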
Even with recent advancements in intelligent caching, manual control is still a requirement for optimal cache management. These are the three must-have cache control options:
Cache purging gives you the ability to refresh cached files on demand. Note that some providers will only allow you to refresh the entire cache storage. Also, in some cases your CDN provider will limit the number of purges over a given time period. The effectiveness of a purge request is measured by the time it takes to propagate through the entire network.
Always/never cache rules help you manually override cache headers, tagging files that should always or never be served from cache. This is an effective tool for cache management, especially when combined with bulk management options that allow you to apply these directives to entire groups of files (e.g., all JPG files in the /template/images/ folder).
Setting a custom cache period refines the Always cache option: it allows you to set a specific period during which the object should be served from cache before refreshing. Accessed from the CDN GUI, this allows easier management of specific files. However, this option is most useful when used for bulk file management (e.g., all JS files that are cached for five days).
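The three options above can be modeled as pattern-based rules. The sketch below is a hypothetical model, not any provider's API: the CacheRules class, the purge helper, and the rule patterns are all assumptions, and matching uses Python's glob-style fnmatchcase, in which `*` also matches path separators.

```python
import math
from fnmatch import fnmatchcase

NEVER = 0          # never serve from cache
ALWAYS = math.inf  # always serve from cache, with no expiry


class CacheRules:
    """Ordered override rules; the first matching pattern wins."""

    def __init__(self):
        self._rules = []  # list of (glob pattern, ttl in seconds)

    def add(self, pattern, ttl_seconds):
        self._rules.append((pattern, ttl_seconds))

    def ttl_for(self, path, header_ttl):
        """Return the overriding TTL, or fall back to the header-derived one."""
        for pattern, ttl in self._rules:
            if fnmatchcase(path, pattern):
                return ttl
        return header_ttl


def purge(cache, pattern):
    """Remove every cached path matching the pattern; return how many were removed."""
    doomed = [path for path in cache if fnmatchcase(path, pattern)]
    for path in doomed:
        del cache[path]
    return len(doomed)


rules = CacheRules()
rules.add("/template/images/*.jpg", ALWAYS)  # always cache template images
rules.add("/cart/*", NEVER)                  # never cache cart pages
rules.add("*.js", 5 * 24 * 3600)             # cache all JS files for five days

print(rules.ttl_for("/static/app.js", header_ttl=60))  # 432000
```

Keeping the rules in an ordered list makes precedence explicit, which matters once bulk patterns start to overlap.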