ACID Transactions Change the Game for Cassandra Developers
For years, Apache Cassandra has been solving big data challenges such as horizontal scaling and geographic distribution for some of the most demanding use cases. But one area, distributed transactions, has proven particularly challenging.
It’s an issue that the Cassandra community has been hard at work to solve, and the solution is finally here. With the release of Apache Cassandra version 5.0, which is expected later in 2023, Cassandra will offer ACID transactions.
ACID transactions will be a big help for developers, who have been calling for more SQL-like functionality in Cassandra. Developers can now avoid much of the complex code they previously needed to apply changes to multiple rows. And some applications that currently use multiple databases to handle ACID transactions can now rely solely on Cassandra for their transaction needs.
What are ACID transactions and why would you want them?
ACID transactions adhere to the following characteristics:
- Atomicity — Operations in the transaction are treated as a single unit of work and can be rolled back if necessary.
- Consistency — Different from the “consistency” that we’re familiar with from the CAP Theorem, this is about upholding the state integrity of all data affected by the transaction.
- Isolation — Assuring that the data affected by the transaction cannot be interfered with by competing operations or transactions.
- Durability — The data will persist at the completion of the transaction, even in the event of a hardware failure.
While some NoSQL databases have managed to implement ACID transactions, such transactions have traditionally been a feature of relational database management systems (RDBMSs). One reason for that: RDBMSs historically have been contained within a single machine instance, and it is much easier to provide ACID properties when everything happens within the bounds of one system. This is why the inclusion of full ACID transactions in a distributed database such as Cassandra is such a big deal.
The advantage of ACID transactions is that multiple operations can be grouped together and essentially treated as a single operation. For instance, if you’re updating several points of data that depend on a specific event or action, you don’t want to risk some of those points being updated while others aren’t. ACID transactions remove that risk: either every update is applied, or none are.
Example transaction
Let’s look at a game transaction as an example. Perhaps we’re playing one of our favorite board games about buying properties. One of the players, named “Avery,” lands on a property named “Augustine Drive” and wants to buy it from the bank for $350.
There are three separate operations needed to complete the transaction:
- deduct $350 from Avery
- add $350 to the bank
- hand ownership of Augustine Drive to Avery
ACID transactions will help to ensure that:
- Avery’s $350 doesn’t disappear
- The bank doesn’t just receive $350 out of thin air
- Avery doesn’t get Augustine Drive for free
Essentially, an ACID transaction helps to ensure that all parts of this transaction are either applied in a consistent manner or rolled back.
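The all-or-nothing behavior of this purchase can be sketched in a few lines of plain Python. This is only an in-memory illustration of atomicity, not Cassandra's transaction API: the names `buy_property` and `TransactionAborted`, the layout of the `state` dictionary, and Avery's starting balance are all assumptions made for the example.

```python
import copy


class TransactionAborted(Exception):
    """Raised when any step fails; no changes are kept."""


def buy_property(state, buyer, prop, price):
    """Apply all three steps of the purchase, or none of them."""
    # Work on a private copy so a failure never leaves partial changes
    # in the caller's state (this plays the role of a rollback).
    draft = copy.deepcopy(state)

    # Step 1: deduct the price from the buyer.
    draft["balances"][buyer] -= price
    if draft["balances"][buyer] < 0:
        raise TransactionAborted(f"{buyer} cannot afford {prop}")

    # Step 2: add the price to the bank.
    draft["balances"]["bank"] += price

    # Step 3: hand ownership to the buyer, but only if the bank still owns it.
    if draft["owners"][prop] != "bank":
        raise TransactionAborted(f"{prop} is not for sale")
    draft["owners"][prop] = buyer

    # "Commit": every step succeeded, so return the fully updated state.
    return draft


state = {
    "balances": {"Avery": 500, "bank": 10_000},
    "owners": {"Augustine Drive": "bank"},
}
state = buy_property(state, "Avery", "Augustine Drive", 350)
print(state["balances"])  # {'Avery': 150, 'bank': 10350}
print(state["owners"])    # {'Augustine Drive': 'Avery'}
```

If any step raises `TransactionAborted`, the caller's `state` is left exactly as it was, so Avery's money never disappears and the property never changes hands for free. A real database has to provide the same guarantee across many machines and concurrent clients, which is the hard part.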
Consensus with Accord
Cassandra will be able to support ACID transactions thanks to the Accord protocol. As a part of the Cassandra Enhancement Process, CEP-15 introduces general purpose transactions based on the Accord whitepaper. The main points of CEP-15 are:
- Implementation of the Accord consensus protocol
- Strict-serializable isolation
- Best attempts will be made to complete the transaction in one round trip
- Operates over multiple partition keys
With the Accord consensus protocol, each node in a Cassandra cluster has a structure called a “reorder buffer.” This buffer is designed to hold transaction timestamps for the future.
Figure 1 – A coordinator node presenting a future timestamp (t0) to its voting replicas.
Essentially, a coordinator node takes a transaction and proposes a future timestamp for it. It then presents this timestamp (Figure 1) to the “electorate” (voting replicas for the transaction). The replicas then check to see if they have any conflicting operations.
As long as a quorum of the voting replicas accepts the proposed timestamp (Figure 2), the coordinator applies the transaction at that time. This process is known as the “Fast Path,” because it can be done in a single round trip.
Figure 2 – All of the voting replicas “accept” the proposed timestamp, and the “Fast Path” application of the transaction can proceed.
However, if a quorum of voting replicas fails to “accept” the proposed timestamp, the conflicting operations are reported back to the coordinator along with a newly proposed timestamp for the original transaction.
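The exchange between the coordinator and its voting replicas can be illustrated with a toy model. This is a heavily simplified sketch, not the Accord implementation: the `Replica` class, the `preaccept` and `coordinate` functions, the integer timestamps, the simple-majority quorum, and the choice of the highest counter-proposal as the new timestamp are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy voting replica that remembers the latest timestamp seen per key."""
    name: str
    latest_seen: dict = field(default_factory=dict)

    def preaccept(self, keys, t0):
        """Return t0 if nothing conflicts with it, else a later timestamp."""
        highest = max((self.latest_seen.get(k, 0) for k in keys), default=0)
        # A conflict exists when an operation on the same keys is already
        # recorded at or after the proposed timestamp.
        return t0 if t0 > highest else highest + 1


def coordinate(replicas, keys, t0):
    """Propose t0 to every replica and decide which path the transaction takes."""
    replies = [r.preaccept(keys, t0) for r in replicas]
    quorum = len(replicas) // 2 + 1  # assumption: simple majority
    accepted = sum(1 for t in replies if t == t0)
    if accepted >= quorum:
        return "fast", t0            # one round trip is enough
    return "slow", max(replies)      # assumption: retry at the latest proposal


# Two of three replicas see no conflict, so the fast path is taken.
cluster = [Replica("r1"), Replica("r2"), Replica("r3", {"avery": 12})]
print(coordinate(cluster, ["avery", "bank"], 10))  # ('fast', 10)

# Two of three replicas report conflicts, so a new timestamp is needed.
busy = [Replica("r1", {"avery": 12}), Replica("r2", {"avery": 15}), Replica("r3")]
print(coordinate(busy, ["avery", "bank"], 10))     # ('slow', 16)
```

The point of the sketch is the shape of the decision: a single round of replies is enough when a quorum accepts the coordinator's timestamp, and only conflicting transactions pay for extra coordination.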
Wrapping up
The addition of ACID transactions to a distributed database like Cassandra is an exciting change, in part because it opens Cassandra up to several new use cases:
- Automated payments
- Game transactions
- Banking transfers
- Inventory management
- Authorization policy enforcement
Previously, Cassandra would have been unsuited for the cases listed above. Many times, developers have had to say, “We want to use Cassandra for X, but we need ACID.” No more!
More importantly, this is the beginning of Cassandra evolving into a feature-rich database. And that is going to improve the developer experience by leaps and bounds and help to make Cassandra a first-choice datastore for all developers building mission-critical applications.