Wednesday, July 31, 2013

CAP theorem for dummies

For some reason, I am a geek and yet it still takes me a while to understand things that are written formally.  In general it feels like I understand something intuitively, or not at all.  Perhaps it is analogous to one of Richard Feynman's great quotes "What I cannot create, I do not understand", with the additional condition that I cannot build what I do not understand intuitively.  If my brain contains an accurate model of a system then it will certainly be easier to build in any case.

The point is that I have at various times come across Brewer's CAP theorem, which states that a system that desires to be "Consistent, Available, and Partition Tolerant" can only have two of these features and not all three.  Consistent means that changes to the system are transactional and no transactions that are partially complete are visible to the user.  Available means that the user will get a good response time to a query.  Partition Tolerant means that the system is resilient to the failure of component systems.  Even with these explanations it can still be quite unclear as to what this theorem really means, or why it is true and important.

The theorem would be better-phrased in three parts and with three example systems being described.

1) If a system is Consistent and Available then it will not be Partition-Tolerant:
A system that performs transactions atomically and is very responsive to queries is not robust to failures. Such a system takes on the form of a single computer holding a database in memory, which performs transactions in a serial order.  The system will be very responsive and transactions will be respected utterly, but if that one computer crashes then the whole system goes down because the system is not redundant ("Unpartitioned").

2) If a system is Consistent and Partition-Tolerant then it will not be Available:
A system made of multiple computers that each store a copy of essential data will be robust to failures.  This system can accept transactions but in order to provide a Consistent view of the data to the user, each replica must execute the transaction and verify when it completes before proceeding.  Because subsequent queries must wait for the previous transaction to finish on all replicas the performance will be bad, and hence said to be "Unavailable".

3) If a system is Available and Partition-Tolerant then it will not be Consistent.
A system made of multiple computers that each store a copy of essential data will be robust to failures.  Such a system will be very responsive to user queries if, when the queries arrive at the computers, they are handled right away and do not wait for previous transactions to complete.  Unfortunately, the state of the data will be inconsistent while old transactions are still running, so the view of the data is said to be "Inconsistent".

If I have simplified something in the above text that must not be, then by all means leave a comment and I will attempt to correct the error.  In the interim it shall be entered into dailycircuitry canon.