Key-value storage cache
To avoid the anticipated limitations to our disk-based cache, we will now build our cache on top of an existing key-value storage system. When crawling, we may need to cache massive amounts of data and will not need any complex joins, so we will use high availability key-value storage, which is easier to scale than a traditional relational database or even most NoSQL databases. Specifically, our cache will use Redis, which is a very popular key-value store.
What is key-value storage?
Key-value storage is very similar to a Python dictionary, in that each element in the storage has a key and a value. When designing the DiskCache
, a key-value model lent itself well to the problem. Redis, in fact, stands for REmote DIctionary Server. Redis was first released in 2009, and the API supports clients in many different languages (including Python). It differentiates itself from some of the more simple key-value stores, such as memcache, because the values can be several different...