The Sozu table, a blockchain-centric data structure for the UTXO dataset of Bitcoin

The UTXO dataset is the set of unspent transaction outputs in Bitcoin. This dataset represents who owns what, and it is the only part of the blockchain that actually needs to be persisted to keep Bitcoin working (well, almost: block headers are needed too, but they are much smaller). Earlier this year, when I started to work on the UTXO challenge, I realized that generic key-value stores were not exploiting all the angles that could be leveraged in the specific context of Bitcoin. Thanks to its temporal dimension, the UTXO dataset can be scaled more efficiently than a plain associative array. The Sozu table precisely exploits this insight, and quite a few others as well. The implementation of Terab is based on the Sozu table.

Abstract: Bitcoin has the ambition to scale on-chain up to millions of transactions per second. Scaling the blockchain entails varied challenges. We present the Sozu table, a high-density, I/O-optimized data structure intended for the UTXO dataset (unspent transaction outputs). Unlike a traditional key-value store, the Sozu table leverages the Lindy effect, as UTXO entries have a temporal dimension. The Sozu table is a layered hashtable designed to max out a mix of data storage technologies. The I/O strategy of the Sozu table is aligned with the underlying hardware design.

PDF at https://blog.vermorel.com/pdf/sozu-table-2018-08-16.pdf
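To make the temporal intuition a bit more concrete, here is a minimal sketch of a two-layer table in Python. It is a toy, not the Sozu table itself: I assume that freshly created outputs land in a small, fast "hot" layer, and that, following the Lindy effect, the oldest entries are demoted to a larger, slower "cold" layer because outputs that have already survived a long time are less likely to be spent soon. The class `LayeredUtxoTable`, its `hot_capacity` parameter and the `(txid, index)` outpoint keys are hypothetical names chosen for illustration; the actual layering, hashing and I/O strategy are described in the PDF.

```python
from collections import OrderedDict


class LayeredUtxoTable:
    """Hypothetical toy: a small 'hot' layer for recent outputs and a
    'cold' layer standing in for larger, slower storage."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # insertion order doubles as age order
        self.cold = {}            # stand-in for SSD/HDD-backed storage

    def add(self, outpoint, payload):
        """Insert a newly created output into the hot layer."""
        self.hot[outpoint] = payload
        # On overflow, demote the oldest entries first: under the Lindy-effect
        # assumption, they are the least likely to be spent soon.
        while len(self.hot) > self.hot_capacity:
            old_key, old_payload = self.hot.popitem(last=False)
            self.cold[old_key] = old_payload

    def spend(self, outpoint):
        """Remove and return an output, probing the hot layer first."""
        if outpoint in self.hot:
            return self.hot.pop(outpoint)
        if outpoint in self.cold:
            return self.cold.pop(outpoint)
        raise KeyError(outpoint)


if __name__ == "__main__":
    table = LayeredUtxoTable(hot_capacity=2)
    table.add(("aa", 0), 50)
    table.add(("bb", 0), 25)
    table.add(("cc", 1), 10)       # overflow: ("aa", 0) moves to the cold layer
    print(sorted(table.cold))      # [('aa', 0)]
    print(table.spend(("cc", 1)))  # 10, served from the hot layer
```

The point of the sketch is only the access pattern: recent outputs, which are spent most often, are served from the fast layer, while the bulk of old outputs sits in cheaper storage that is rarely touched.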