Replication and Partitioning (Sharding)

Replication

One way to improve performance among distributed key-value stores is replication, which simply replicates the storage on many hosts so that a read query to the storage can be satisfied by any one of the replicas of the datastore.

  • The replicas can be in the same data center or spread geographically to reduce the latency of clients in various locations globally.

Replication - PUT

  • In case of update queries, the request needs to propagate to all replicas in order to preserve consistency across replicas.
  • Replication is particularly important when fault tolerance is of primary importance and also for performance when there exists a record in the database which is accessed very frequently by multiple clients.
  • There are further optimizations that can be made with the different levels of consistency that exists between each replicated storage.

Replication - Strong Consistency

  • It guarantees that at any point in time, all datastores must have the same value for any given key.
  • The order in which the requests arrive is preserved.

    • While an update is happening for a certain key-value pair, no other updates or reads can occur until the update is complete for all three datastores.
    • The order in which the operations arrive will be used as the timestamp for ordering them.
  • To achieve strong consistency, you must make sure that at any point in time, all the clients should read the exact same data from any of the datastore replicas.

  • To summarize, strongly consistent coordinators must abide by these rules:

Property Explanation

Strict Consistency | At any point in time to the client’s perspective, the same key must have the same value across all datastore replicas.| Strict Ordering | The order in which requests arrive at the coordinator must be the order in which they are fulfilled.| Atomic Operations |All requests must be atomic. If one datastore fails to update while the others do, then strong consistency is violated. Controlled Access | While a write request is being fulfilled for a key, no other requests for that key can be done for any datastore until the pending request is completed. Control for each key should be managed separately, so operations for two different keys should proceed unhindered by each other.

Sharding

  • Horizontal partitioning, where rows of a storage database (also known as partitions or shards) are divided among multiple servers.

    • The entire key space is divided among multiple servers, and each machine is responsible for a specific range of keys.
    • This enables a distribution of the data over a large number of machines, potentially GET requests to be processed in parallel.
    • This assumes that the data is distributed uniformly and the requests for individual keys are also distributed uniformly over the entire key-space.
  • A key design decision for a distributed key-value store is the distribution of keys over the set of data stores.

    • The hashing algorithm must return the same value for the same key at all times.
    • A good consistent hashing algorithm will also attempt to distribute the keys evenly.
    • Additionally, a good consistent hashing algorithm must also be able to handle failures.

results matching ""

    No results matching ""