Cloud-based Image Caching

By default, the Cleaver retains copies of all cropped images in a local cache. When an image is requested, it looks for the image in this cache first. If the requested image is not in the cache, the Cleaver downloads the original image from the Content Store, applies the requested crop, returns the cropped image, and saves it in the cache for next time. This mechanism works well for a single-Cleaver installation with unlimited disk space, but otherwise can run into the following problems:

  • Disk space limitations may require the cache to be pruned periodically by running a cron job that removes infrequently requested images. The deleted images must then be re-cropped the next time they are requested, so scrapers and indexers making many requests for old images can cause load spikes.

  • Adding extra Cleaver instances to improve performance can initially have the opposite effect, since each new instance starts up with an empty cache that needs filling. More generally, it is inefficient for each Cleaver to maintain its own cache, since each Cleaver crops its own version of every image requested from it, even if another Cleaver has already done the same job.
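The kind of pruning job mentioned in the first point might look something like the following. This is only a minimal sketch, not a script shipped with the Cleaver: the function name prune_cache, the cache directory layout and the 30-day threshold are all hypothetical, and it uses each file's last-access time as a stand-in for "infrequently requested", which is only meaningful on file systems that record access times (not those mounted with noatime).

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def prune_cache(cache_dir: str, max_idle_days: float,
                now: Optional[float] = None) -> List[str]:
    """Delete cached files not accessed within max_idle_days (hypothetical helper).

    Returns the paths that were removed, so a cron job can log them.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    removed = []
    for path in Path(cache_dir).rglob("*"):
        # Last-access time is an assumption-laden proxy for request frequency.
        if path.is_file() and path.stat().st_atime < cutoff:
            path.unlink()
            removed.append(str(path))
    return removed
```

A cron entry could call prune_cache("/path/to/cache", 30) once a day. Every file it removes is an image that will have to be cropped again on its next request, which is exactly the cost described above.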

You can solve both of these performance problems by configuring the Cleaver(s) to use a cloud-based secondary cache. With this configuration, each cropped image saved to the local cache is also saved to the cloud cache, which is shared between all Cleaver instances. When a Cleaver receives a request, it first checks its local cache and then, if necessary, the cloud cache. Only if the image is found in neither cache does the Cleaver request the original image from the Content Store and crop it. Ideally, the cloud cache should never be pruned, so that almost all image crops remain cached, even those that are used very infrequently.
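The lookup order described above can be sketched as a small two-tier cache. This is an illustration only, not the Cleaver's actual implementation: the class name TieredCropCache, the use of plain dictionaries for both caches and the crop_from_store callback are all hypothetical. Copying a cloud-cache hit into the local cache is also an assumption, since the text above does not say whether the Cleaver does so.

```python
from typing import Callable, Dict


class TieredCropCache:
    """Local cache backed by a shared secondary cache (illustrative only)."""

    def __init__(self, local: Dict[str, bytes], shared: Dict[str, bytes],
                 crop_from_store: Callable[[str], bytes]) -> None:
        self.local = local                      # private to one instance
        self.shared = shared                    # stands in for the cloud cache
        self.crop_from_store = crop_from_store  # fetch original and crop it

    def get(self, key: str) -> bytes:
        # 1. Check the local cache first.
        image = self.local.get(key)
        if image is not None:
            return image
        # 2. Fall back to the shared cache.
        image = self.shared.get(key)
        if image is not None:
            self.local[key] = image  # assumption: hits are copied locally
            return image
        # 3. Miss in both caches: crop, then save to both.
        image = self.crop_from_store(key)
        self.local[key] = image
        self.shared[key] = image
        return image


# Two instances with separate local caches but one shared cache.
crops = []


def fake_crop(key: str) -> bytes:
    crops.append(key)
    return b"cropped:" + key.encode()


shared_cache: Dict[str, bytes] = {}
first = TieredCropCache({}, shared_cache, fake_crop)
second = TieredCropCache({}, shared_cache, fake_crop)
first.get("photo/100x100")
second.get("photo/100x100")  # served from the shared cache, no second crop
```

In this sketch the second instance starts with an empty local cache but never calls the cropping function, because the first instance has already written the crop to the shared cache. This is the behaviour that lets a newly added instance avoid repeating work already done elsewhere.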

Currently, the Cleaver supports only Amazon S3 buckets as image caches.