HyperLogLogs
The newest Redis data type is a probabilistic data structure that provides an estimated count of unique items in a collection. Under typical or normal situations, to get a unique count of a collection's items requires an amount of memory that is equal to the number of items or at least a time complexity of O(n)
. Why? To ensure that no items are double-counted if they are duplicated in the collection, the algorithm must keep a record of each item for comparison with any new items. This amount of overhead becomes quite large and expensive to calculate as the size of the collections increases in the order of millions of items. In contrast, storing unique elements in a HyperLogLog
structure computes and stores an estimate of the size of the set as a probability instead of the actual value with a relatively small error rate of less than 1%. Adding one or more elements to a HyperLogLog
with the PFADD
command is an O(1)
operation, while retrieving the count of unique items with a PFCOUNT...