Chain Replication: CRAQ

Paper http://nil.csail.mit.edu/6.824/2020/papers/craq.pdf

Overview https://timilearning.com/posts/mit-6.824/lecture-9-craq/

CRAQ (Chain Replication with Apportioned Queries) is an an object storage system that, while maintaining the strong consistency properties of chain replication provides lower latency and higher throughput for read operations by supporting apportioned queries: that is, dividing read operations over all nodes in a chain, as opposed to requiring that they all be handled by a single primary node. This paper’s main contributions are the following

1) CRAQ enables any chain node to handle read operations while preserving strong consistency, thus supporting load balancing across all nodes storing an object. Furthermore, when workloads are read mostly—an assumption used in other systems such as the Google File System and Memcached —the performance of CRAQ rivals systems offering only eventual consistency.

2) In addition to strong consistency, CRAQ’s design naturally supports eventual-consistency among read operations for lower-latency reads during write contention and degradation to read-only behavior during transient partitions. CRAQ allows applications to specify the maximum staleness acceptable for read operations.

3) Leveraging these load-balancing properties, we describe a wide-area system design for building CRAQ chains across geographically-diverse clusters that preserves strong locality properties. Specifically, reads can be handled either completely by a local cluster, or at worst, require concise metadata information to be transmitted across the wide-area during times of high write contention. We also present our use of ZooKeeper, a PAXOS-like group membership system, to manage these deployments