The first thing to do when dealing with shards is to define how to split the data. Indeed, sharding is powerful, but it is not without its limitations. For instance, an important limitation of sharding is the inability to traverse a relationship from one shard to another. However, some nodes can be duplicated in order to avoid this situation, so it all depends on your data model and how you are querying it. Imagine we have data from an e-commerce website that's registering orders from customers all around the world. Several dimensions can be considered for sharding:
- Spatial-based: All orders coming from the same continent or country can be grouped together in the same shard.
- Temporal-based: All nodes created on the same day/week/month can be saved in the same shard.
- Product-based: Orders containing the same products can be grouped together.
The last situation assumes that there are clear partitions between ordered products; otherwise, we would have to duplicate...