Salient bits of CashDB
Designing CashDB has been an interesting task, and beyond pure performance benchmarks, I would like to share some of the most salient aspects that went into its design.
Choosing C#/.NET: CashDB is implemented in C#/.NET. This might appear as a surprising choice for a high-performance project; however, .NET Core is performant, surprisingly so, even. High-quality C# implementations (e.g. xxHash) basically exhibit performance aligned with C. Then, productivity-wise, C# - just like Java or Python - is plainly superior to C++. D or Rust would have been strong contenders here, but it turns out that the teams of Lokad have a decade of experience writing high-performance terabyte-crunching software in C#/.NET, so we went down that path. However, from a client perspective, CashDB is neutral, and provides a C/C++ API that is straightforward to interface with most languages and platforms.
Zero GC pressure: In order to ensure steady performance, the GC (garbage collector) pressure must be minimized. CashDB achieves near-zero GC pressure. Updating the UTXO set generates zero objects to be managed by the GC, while committing a block generates a dozen objects or so (mostly strings, as CashDB logs a few messages). CashDB extensively leverages Span<T>.
Generational UTXO storage: UTXO entries are Lindy; the ones that have been around for a long time are expected to remain around for a long time. Inspired by generational GC, CashDB embeds this insight in its very design. CashDB supports up to three storage layers, which act as generational layers: older UTXO entries gradually sink from one layer to the next. This design offers the possibility to blend top-notch expensive SSDs (which deliver the most IOs cost-wise) with larger SSDs (which deliver the most GB cost-wise).
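To make the sinking behavior concrete, here is a minimal sketch of a three-layer store in which the oldest entries of a full layer move down to the next one. The class name, the capacities, and the "oldest entry sinks first" policy are assumptions made for illustration; the actual sinking policy and on-disk layout of CashDB are not described here.

```python
from collections import OrderedDict


class GenerationalStore:
    """Toy generational store (hypothetical): entries sink to colder layers as they age."""

    def __init__(self, capacities):
        # capacities[i] is the maximum entry count of layer i;
        # one extra, unbounded layer is appended at the bottom.
        self.capacities = list(capacities)
        self.layers = [OrderedDict() for _ in range(len(self.capacities) + 1)]

    def add(self, outpoint, payload):
        # New entries always land in the hottest layer.
        self.layers[0][outpoint] = payload
        self._sink()

    def _sink(self):
        # Move the oldest entries of any overfull layer into the next layer.
        for i, capacity in enumerate(self.capacities):
            while len(self.layers[i]) > capacity:
                key, value = self.layers[i].popitem(last=False)  # oldest first
                self.layers[i + 1][key] = value

    def spend(self, outpoint):
        # Probe the hot layer first: recent entries are the most likely to be spent.
        for layer in self.layers:
            if outpoint in layer:
                return layer.pop(outpoint)
        return None

    def layer_of(self, outpoint):
        for i, layer in enumerate(self.layers):
            if outpoint in layer:
                return i
        return None


store = GenerationalStore(capacities=[2, 3])  # three layers in total
for name in ["a", "b", "c", "d", "e", "f"]:
    store.add(name, payload=name.upper())
```

After these six inserts, the oldest entry "a" sits in the bottom layer while the newest entry "f" is still in the top one, which is the layer that would be backed by the fastest storage.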
Lock-free concurrency: Locking is a weak approach to concurrency, and arguably one of the major pitfalls presently plaguing the Satoshi client (check out cs_main). CashDB achieves lock-free concurrency by relying on bounded inboxes alone. This design also ensures that concurrent accesses to CashDB gracefully get their respective shares of IO resources without degrading the overall performance. This aspect will become increasingly important as Bitcoin implementations become more parallelized themselves.
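The inbox pattern can be sketched as follows: a piece of state is owned by exactly one worker thread, and other threads only reach it by posting messages to a bounded inbox, which also provides back-pressure when the worker falls behind. This is an illustration of the ownership pattern only. Python's queue.Queue uses a lock internally, so the sketch is not itself lock-free, and the Worker class and its message format are hypothetical rather than taken from CashDB.

```python
import queue
import threading


class Worker:
    """Single-threaded owner of some state, fed through a bounded inbox (hypothetical)."""

    def __init__(self, inbox_size):
        # Bounded inbox: a full inbox signals back-pressure to the callers.
        self.inbox = queue.Queue(maxsize=inbox_size)
        # Only the worker thread touches this dict, so the application
        # code needs no explicit lock around it.
        self.state = {}
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def try_post(self, message):
        """Non-blocking post; returns False when the inbox is full."""
        try:
            self.inbox.put_nowait(message)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            message = self.inbox.get()
            if message is None:  # shutdown signal
                break
            op, key, value, reply = message
            if op == "set":
                self.state[key] = value
                reply.put(True)
            elif op == "get":
                reply.put(self.state.get(key))

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


worker = Worker(inbox_size=4)
reply = queue.Queue()
worker.try_post(("set", "coin", 42, reply))
reply.get(timeout=5)
worker.try_post(("get", "coin", None, reply))
fetched = reply.get(timeout=5)
worker.stop()
```

Because a caller whose post is refused has to retry or slow down, a busy client cannot starve the others by flooding the worker with requests.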
Adversarial resistance: When designing a key-value store for the blockchain, one needs to assume that someone will try to break it by generating transactions for the sole purpose of hitting worst-case performance scenarios as exposed by the design. Many key-value store implementations do not behave well when facing adversarial inserts and updates. CashDB steers clear of this problem by hashing all outpoints through SipHash, a hashing algorithm that has been engineered for DoS (denial of service) mitigation.
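To illustrate why a keyed hash helps, here is a plain Python sketch of SipHash-2-4 followed by a bucket lookup. Without the 128-bit secret key, an attacker cannot predict which bucket an outpoint lands in, and therefore cannot craft outpoints that all collide. The bucket_for helper, the bucket count, and the use of the hash for bucket selection are assumptions; how CashDB actually consumes the hash is not described here.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _rotl(x, b):
    """Rotate a 64-bit integer left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK64


def siphash24(key: bytes, data: bytes) -> int:
    """SipHash-2-4 of `data` under a 16-byte `key`, returned as a 64-bit integer."""
    if len(key) != 16:
        raise ValueError("SipHash requires a 16-byte key")
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    def sipround():
        nonlocal v0, v1, v2, v3
        v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
        v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16); v3 ^= v2
        v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21); v3 ^= v0
        v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)

    # Zero-pad to a multiple of 8 bytes; the very last byte holds len(data) mod 256.
    padded = data + b"\x00" * (7 - len(data) % 8) + bytes([len(data) & 0xFF])
    for i in range(0, len(padded), 8):
        m = int.from_bytes(padded[i:i + 8], "little")
        v3 ^= m
        sipround()
        sipround()  # 2 compression rounds per 8-byte word
        v0 ^= m
    v2 ^= 0xFF
    for _ in range(4):  # 4 finalization rounds
        sipround()
    return v0 ^ v1 ^ v2 ^ v3


def bucket_for(secret_key: bytes, outpoint: bytes, bucket_count: int) -> int:
    """Hypothetical helper: map an outpoint to a bucket through the keyed hash."""
    return siphash24(secret_key, outpoint) % bucket_count


# A Bitcoin outpoint is a 32-byte transaction id followed by a 4-byte output index.
outpoint = bytes(32) + (0).to_bytes(4, "little")
secret_key = bytes(range(16))  # in practice: random and kept private to the node
bucket = bucket_for(secret_key, outpoint, bucket_count=1024)
```

A production system would rely on a vetted native implementation rather than this readable but slow version; the point is only that the bucket index depends on a secret the attacker does not hold.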
Hardware-centric IOs: Modern hardware, both RAM and storage, does not offer byte-level access to the data. Instead, whenever data is read or written, it goes in blocks (hardware blocks, not to be confused with Bitcoin blocks) of 4KB. Thus, while only 100 bytes might have been needed - the typical size of a UTXO entry - 4KB are retrieved no matter what. Instead of being a mere victim of the IO granularity, CashDB takes the opposite stance: as 4KB are retrieved anyway, let’s make the most of this “excess” retrieval. In particular, CashDB collocates both UTXO entries and their probabilistic filters through CoinPack. Under the hood, memory mapped files - handled through MemoryMappedFileSlim - are used to achieve fine-grained control over IOs.
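The arithmetic behind this stance is worth spelling out. The sketch below uses only the two figures quoted above (4KB blocks, 100-byte entries); the blocks_touched helper is a hypothetical name introduced for illustration.

```python
BLOCK_SIZE = 4096  # bytes per hardware block, as stated above
ENTRY_SIZE = 100   # typical size of a UTXO entry, as stated above


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Number of hardware blocks covered by a read of `length` bytes at `offset`."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


# One block has room for 40 whole entries, so fetching a single entry
# alone uses roughly 1/41 of the bytes that were actually transferred.
entries_per_block = BLOCK_SIZE // ENTRY_SIZE   # 40
read_amplification = BLOCK_SIZE / ENTRY_SIZE   # 40.96

aligned = blocks_touched(offset=0, length=ENTRY_SIZE)        # 1 block
straddling = blocks_touched(offset=4050, length=ENTRY_SIZE)  # 2 blocks
```

The last line shows the other half of the problem: an entry stored without regard for block boundaries can straddle two blocks and double the IO cost, which is one more reason to pack entries with the block size in mind.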
Sub-millisecond chain reorg: The block-do / block-undo design pattern adopted in the original Satoshi client is inefficient when it comes to large Bitcoin blocks. Through an event sourcing design, CashDB avoids this problem altogether, and chain reorgs are basically free. It also means the blockchain can be extended from two tips concurrently with no penalty whatsoever, although IO resources get shared between the two competing chains (no free lunch). Check out the Chains namespace of CashDB.
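One way to picture an event-sourced UTXO set is sketched below. Instead of mutating a single "current" set and undoing blocks on a reorg, each coin keeps the list of blocks that produced it and the list of blocks that consumed it, and a query is always made relative to a given tip. Switching tips then requires no data movement at all. The BlockTree and CoinEvents classes are hypothetical and only illustrate the idea; they are not the data structures found in the Chains namespace.

```python
class BlockTree:
    """Hypothetical tree of blocks, each one pointing at its parent."""

    def __init__(self):
        self.parent = {}  # block id -> parent block id (None for genesis)

    def add_block(self, block, parent):
        self.parent[block] = parent

    def is_ancestor(self, ancestor, tip):
        """True when `ancestor` lies on the path from `tip` back to genesis."""
        current = tip
        while current is not None:
            if current == ancestor:
                return True
            current = self.parent[current]
        return False


class CoinEvents:
    """Append-only production and consumption events, keyed by outpoint."""

    def __init__(self, tree):
        self.tree = tree
        self.produced = {}  # outpoint -> blocks in which the coin is created
        self.consumed = {}  # outpoint -> blocks in which the coin is spent

    def produce(self, outpoint, block):
        self.produced.setdefault(outpoint, []).append(block)

    def consume(self, outpoint, block):
        self.consumed.setdefault(outpoint, []).append(block)

    def is_unspent(self, outpoint, tip):
        # A coin is spendable at `tip` when it was created on that branch
        # and no block of that same branch has spent it.
        made = any(self.tree.is_ancestor(b, tip) for b in self.produced.get(outpoint, []))
        spent = any(self.tree.is_ancestor(b, tip) for b in self.consumed.get(outpoint, []))
        return made and not spent


tree = BlockTree()
tree.add_block("genesis", None)
tree.add_block("a1", "genesis")  # one tip
tree.add_block("b1", "genesis")  # a competing tip

coins = CoinEvents(tree)
coins.produce("coin", "genesis")
coins.consume("coin", "a1")      # spent on branch a only

spent_on_a = not coins.is_unspent("coin", tip="a1")
unspent_on_b = coins.is_unspent("coin", tip="b1")
```

The ancestor walk here is linear in the chain depth, which a real system would avoid; the sketch only shows that both branches coexist in the same store and that "reorganizing" amounts to querying against a different tip.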
Bits of TCP corking: A typical machine can send about 100,000 TCP packets per second on a single thread. While this is sizeable, if one packet has to be sent for each IO operation over the UTXO set, then TCP itself becomes the performance bottleneck. Thus, CashDB does a bit of manual TCP corking to batch both requests and responses, as a single TCP packet (~1500 bytes) can typically pack a dozen IO operations. The related server-side logic is found in the ConnectionController.
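A minimal sketch of application-level batching follows: encoded messages are accumulated until the next one would overflow the payload budget, and each group is then written to the socket in one go. The batch_messages helper, the 1460-byte budget (a 1500-byte packet minus typical IP and TCP headers), and the 120-byte message size are assumptions chosen to match the "dozen operations per packet" figure above; this is not the logic of the ConnectionController.

```python
def batch_messages(messages, max_payload=1460):
    """Group encoded messages into payloads of at most max_payload bytes.

    A single message larger than max_payload is emitted alone, unsplit.
    """
    batches, current, size = [], [], 0
    for message in messages:
        if current and size + len(message) > max_payload:
            batches.append(b"".join(current))
            current, size = [], 0
        current.append(message)
        size += len(message)
    if current:
        batches.append(b"".join(current))
    return batches


# 30 requests of 120 bytes each (hypothetical size) fit 12 per payload.
requests = [bytes([i]) * 120 for i in range(30)]
payloads = batch_messages(requests)

# Rough throughput ceiling implied by the figures quoted above.
packets_per_second = 100_000
operations_per_packet = 12
operations_per_second = packets_per_second * operations_per_packet  # 1,200,000
```

With one operation per packet, the ceiling would be the packet rate itself, that is about 100,000 operations per second per thread; batching a dozen operations per packet raises that ceiling by the same factor.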
It took us a near-complete rewrite of CashDB to get those insights properly implemented. Well, live and learn.