Overview
Apache Tephra is a transaction system that provides ACID semantics for applications built on distributed data stores such as Apache HBase. HBase offers strong consistency for operations on a single row or region, but it gives up cross-region and cross-table consistency in favor of scalability, which leaves applications to handle multi-row and multi-table consistency themselves. Tephra closes this gap by layering global transactions on top of HBase: it uses HBase's native data versioning as the basis for multi-version concurrency control (MVCC), so that each transaction reads from a consistent snapshot, and it detects write conflicts optimistically at commit time. Tephra's architecture has three main components: a Transaction Server, which manages the global transaction state and performs conflict detection; a Transaction Client, which coordinates the start, commit, and rollback of transactions; and a TransactionProcessor coprocessor, which filters the data read according to transaction state and cleans up data from old transactions that are no longer visible. Global transactions simplify application development and protect data integrity, and because conflicts are checked at commit time rather than prevented by holding locks, the overhead stays low when conflicts are rare. Projects such as Apache Phoenix use Tephra to add cross-row and cross-table transaction support.
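The interplay between versioned data, snapshot reads, and commit-time conflict detection described above can be illustrated with a small in-memory model. The sketch below is not the Tephra API: the class ToyTxManager, its nested Tx type, and all method names are hypothetical, and the real system spreads these responsibilities across the Transaction Server, the client, and the coprocessor. It assumes a single process, uses transaction ids as version numbers, and rolls back a conflicting transaction by deleting its versions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy, in-memory model of MVCC with optimistic conflict detection.
// Every name here is hypothetical; this is not the Tephra API.
class ToyTxManager {
    // A transaction: its id (also used as the version of its writes), the ids
    // that were still in progress when it started, and the keys it has written.
    static final class Tx {
        final long id;
        final Set<Long> excluded;
        final Set<String> changes = new HashSet<>();
        Tx(long id, Set<Long> excluded) { this.id = id; this.excluded = excluded; }
    }

    private long nextId = 1;
    private final Set<Long> inProgress = new HashSet<>();
    // commit id -> keys written by the transaction that committed at that point
    private final TreeMap<Long, Set<String>> committed = new TreeMap<>();
    // key -> (version, i.e. writer's transaction id -> value)
    private final Map<String, TreeMap<Long, String>> store = new HashMap<>();

    // Start a transaction and capture which transactions its snapshot must ignore.
    synchronized Tx start() {
        Set<Long> excluded = new HashSet<>(inProgress);
        long id = nextId++;
        inProgress.add(id);
        return new Tx(id, excluded);
    }

    // Write a new version of the key, tagged with the writer's transaction id.
    synchronized void write(Tx tx, String key, String value) {
        store.computeIfAbsent(key, k -> new TreeMap<>()).put(tx.id, value);
        tx.changes.add(key);
    }

    // Return the newest version that is visible in the transaction's snapshot.
    synchronized String read(Tx tx, String key) {
        TreeMap<Long, String> versions = store.get(key);
        if (versions == null) return null;
        for (Map.Entry<Long, String> e
                : versions.headMap(tx.id, true).descendingMap().entrySet()) {
            if (!tx.excluded.contains(e.getKey())) return e.getValue();
        }
        return null;
    }

    // Commit unless a transaction that committed after this one started
    // wrote one of the same keys; in that case roll back and report failure.
    synchronized boolean commit(Tx tx) {
        for (Set<String> keys : committed.tailMap(tx.id, false).values()) {
            for (String k : keys) {
                if (tx.changes.contains(k)) { rollback(tx); return false; }
            }
        }
        inProgress.remove(tx.id);
        committed.put(nextId++, new HashSet<>(tx.changes));
        return true;
    }

    // Undo a transaction by deleting the versions it wrote.
    private void rollback(Tx tx) {
        for (String k : tx.changes) store.get(k).remove(tx.id);
        inProgress.remove(tx.id);
    }

    public static void main(String[] args) {
        ToyTxManager mgr = new ToyTxManager();
        Tx setup = mgr.start();
        mgr.write(setup, "balance", "100");
        mgr.commit(setup);

        Tx t1 = mgr.start();
        Tx t2 = mgr.start();                      // overlaps with t1
        mgr.write(t1, "balance", "150");
        System.out.println("t2 reads: " + mgr.read(t2, "balance"));
        mgr.write(t2, "balance", "90");
        System.out.println("t1 commit: " + mgr.commit(t1));
        System.out.println("t2 commit: " + mgr.commit(t2));
        System.out.println("final: " + mgr.read(mgr.start(), "balance"));
    }
}
```

In this scenario the second transaction never sees the first one's uncommitted write, and its own commit is rejected because both transactions changed the same key. The model detects conflicts only at commit time and takes no locks during the transaction, which is the optimistic approach; a locking design would instead block the second writer earlier.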
