12 September 2011

Creating Scalable Systems

Some food for thought. Consider the following:
  1. Prefer BASE over ACID transactions
  2. Prefer asynchronous over synchronous transactions
  3. Keeping state is expensive
  4. Consider database sharding (highscalability, codefutures, Pros and Cons) by data, by transaction or by customer, but avoid premature optimisation
  5. Design the system for automated rollback
  6. Create isolated structures that share nothing, so that nothing crosses the swimlanes
  7. Design systems for failure
  8. Create idempotent services where possible
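As an illustration of the last point, here is a minimal sketch of an idempotent operation. It assumes each caller attaches a unique request id to every request. The class and method names are hypothetical, and the in-memory dictionary stands in for whatever durable store a real service would use (that store is itself state, so it has a cost too).

```python
class AccountService:
    """Hypothetical service, for illustration only."""

    def __init__(self) -> None:
        self.balance = 0
        # request_id -> result already returned for that request
        self._results: dict[str, int] = {}

    def credit(self, request_id: str, amount: int) -> int:
        # A retry carrying the same request id gets the stored result back
        # instead of having the credit applied a second time.
        if request_id in self._results:
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


service = AccountService()
service.credit("req-1", 10)
service.credit("req-1", 10)  # retry of the same request, no second credit
```

Because a repeated call is harmless, a client that times out can simply retry, which also fits the asynchronous and design-for-failure points above.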
Database sharding requires changes in mindset:
  1. Tables may need to be denormalised to optimise sharding (as well as to work around cross-shard joins/queries)
  2. Scale out instead of scaling up
  3. Do away with replication where possible
Different sharding schemes are:
  1. Vertical partitioning – sometimes known as functional or feature partitioning, where data relating to certain entities is grouped together. Different functions or features are put onto different shards.
  2. Range-based partitioning – data for a certain function/feature/entity is sharded using ranges (the ranges may be based on year, location, etc.)
  3. Hash-based partitioning – data for a certain function/feature/entity is sharded using a hash function (typically the hash of a key modulo the number of shards)
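The range-based and hash-based schemes can be sketched as two small routing functions. This is only an illustration under assumptions: the shard names, the year boundaries and the choice of SHA-256 as the hash are all made up for the example, not taken from any particular product.

```python
import bisect
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

# Range-based: each boundary is the first year that belongs to the next shard,
# giving the ranges <2005, 2005-2007, 2008-2010 and >=2011.
YEAR_BOUNDARIES = [2005, 2008, 2011]


def shard_by_range(year: int) -> str:
    """Pick a shard by finding which year range the value falls into."""
    return SHARDS[bisect.bisect_right(YEAR_BOUNDARIES, year)]


def shard_by_hash(key: str) -> str:
    """Pick a shard from a stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Range-based routing keeps neighbouring values together, which helps range queries but can leave one shard hot (for example, the current year). Hash-based routing spreads keys more evenly but scatters neighbouring values across shards.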
Database sharding presents a number of issues:
  1. Data needs to be rebalanced from time to time
  2. Joining data from multiple shards (cross-shard join) is expensive
  3. Referential integrity becomes an issue, since the referenced data may now be in a different database
  4. Sharding is relatively new; there is no established body of knowledge and support is lacking
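To see why rebalancing is costly under simple modulo hashing, the sketch below counts how many keys land on a different shard when the shard count changes. The key names and the use of SHA-256 are assumptions made for the example. With a uniform hash, growing from 4 to 5 shards would be expected to relocate roughly four keys in five, because a key stays put only when its hash gives the same remainder for both counts.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Stable hash of the key, modulo the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical customer ids, for illustration only.
keys = [f"customer-{i}" for i in range(10_000)]
print(f"moved when growing from 4 to 5 shards: {fraction_moved(keys, 4, 5):.0%}")
```

This is one reason to think about the sharding key and the expected growth early, while still avoiding premature optimisation.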

1 comment:

Liran Zelkha said...

Hi

Great post. Some remarks on Sharding - it is possible to do cross-shard joins, so no need for schema changes. Just use an off-the-shelf tool for that - like ScaleBase (disclosure - I work there). They give you transparent database sharding.