On compact proofs for token protocols

Bitcoin can operate a large variety of tokens1 through trust-but-verify schemes. Apps supporting such tokens are part of cashland2: those apps interact with the blockchain, but they are not required for Bitcoin to operate as cash. Trust-but-verify differs from the code-is-law approach where the validation of the correctness of the token transactions is the responsibility of the participants in charge of securing the blockchain itself. As far tokens are concerned, trust-but-verify is superior to code-is-law in terms of blockchain economics and also happens to be more practical3.

However, one of the challenges of trust-but-verify is the amount of verification that can be achieved with chainless2 apps. Indeed, while a token can be designed assuming that every token participant has to process the entire blockchain of Bitcoin to securely transact, this approach is putting an unreasonable computational burden on token holders in the context of a blockchain can that could grow to blocks as large as 1TB per block4. Thus, it is of prime interest to design the tokens in such a manner that the verification can be as lightweight as possible, ideally within the realm of possibilities for a mobile device.

There are two simple complementary scoping schemes which can be used to reduce the amount of data to be processed to generate proofs for trust-but-verify tokens by factors ranging from 104 to 1010 for scenarios of practical interest.

TxID range scoping

This first scheme, referred to as TxID range scoping, depends on the Canonical Transaction Ordering Rule (CTOR)5 to operate, which is not, at the time of writing, yet available on Bitcoin. Let’s assume that we have a trust-but-verify protocol that we want to make more easily verifiable.

The token protocol defines that transaction identifiers are valid for transaction inclusion if and only if the transaction identifier has a specific bitwise prefix. All other transactions are ignored.

Under the CTOR, the transactions are strictly ordered within a block. Thus, it is reasonable to assume that one of the networking dialects of Bitcoin will offer the possibility to download a subtree of the Merkle tree that is provably a minimal superset of all transactions starting with the specific bitwise prefix for the transaction identifiers.

Assuming that transaction identifiers are approximately random, querying the blocks against a given bitwise prefix of n bits only return, on average, a fraction not too far - more on market dynamics below - from 2-n of the original transactions found in the block.

Adversarial transactions can be constructed to bloat a specific TxID range of the blockchain, however this attack vector is dramatically expensive. Assuming that a 250 byte transaction remains priced at a tenth of a cent - or rather the monetary equivalent in BCH - pushing 100GB of adversarial transactions into the blockchain would cost 4M USD. This cost is not even considering the computing cost associated with the generation of 400M adversarial well-scoped transactions.

In practice, the easiest way to make a token protocol amenable to TxID range scoping is probably to include a nonce as a field within the OP_RETURN payload - in practice 4 to 8 bytes depending on the size of the bitwise prefix. Token apps would iterate over nonce values until a compliant TxID is found for any token transaction of interest.

With adequate software optimizations, a smartphone can vanity-hash a 2-byte prefix under less than a second to issue a token transaction compliant with the TxID range. In practice, this means that a protocol such as the Simple Ledger Protocol6 could restrict the amount of blockchain data to be processed by the token client app by a factor up to ≈65000 if blocks become very large. For example, the client app for the Simple Ledger Protocol could still operate with Bitcoin blocks as large as 65GB per block, while not having to process more than 1MB of data per block on average - assuming that the transaction volume of the token itself remains well below 1MB of data per block.

In addition, for a token protocol like Tokeda3, the issuer can abide to a longer bitwise prefix beyond what would otherwise be practical for regular token holders. Indeed, the secondary transaction fee paid to the issuer can be used by the latter to cover the hardware expenses associated with a much more stringent bitwise prefix.

Assuming that the issuer has access to ASIC for ECDSA, a 4-byte prefix could be enforced while keeping transactions relayed under a minute or so. This would theoretically restrict the amount of blockchain data to be processed by a factor of 4 billion. In practice, the gain ratio would not be that large however. Indeed, there are two incompressible factors. First, a range proof based on the merkle tree under the CTOR would weight about 4KB per block, that is ≈220MB per year, assuming 1TB blocks. Second, even if the range is very narrow, the transactions of the token itself remain part of the range by construction. Considering a token that generates 10k transactions per day, we have about ≈1GB per year of the token transactions.

Blockheight scoping

The second scoping scheme, referred to as blockheight scoping, is less capable and practical, but does not require any upgrade of the Bitcoin protocol, and can be used to complement the first scheme.

The token protocol defines that transaction identifiers are valid for transaction inclusion if and only if the transactions are included in a block where “blockheight mod N == k”, where N is the “gap” of the token protocol. All other transactions are ignored.

The gap N would typically be chosen as a prime number, as prime number offer better properties collision-wise. k is an offset intended to let the token issuer cherry-pick a blockchain scope that is minimally burdened by other Bitcoin participants.

In practice, Bitcoin does not offer a mechanism to have a transaction reliably inserted into a specific block. The empirical examination of the distribution of delays between blocks indicates that an automated process actively monitoring the propagation of the blocks should be able to successfully push a transaction at a targeted blockheight with over 98% success - the failure mode being the inclusion of the transaction in a later block. Indeed, the majority of the blocks are found more than a few seconds apart which is a time window large enough to propagate a transaction.

For a low volume token and/or a token that does not require fast transactions, a gap chosen at 20 (or rather 23) would still ensure a near perfect same-day transaction inclusion guarantee (with retries) while offering a strict decrease of the amount of blockchain data to be processed by the same factor.

For specific use cases, like monthly scheduled key rotations by the issuer for a token protocol like Tokeda3 the gap could be increased to 1000 (or rather 997), without degrading the user experience. By combining TxID and blockheight scoping,a back-of-the-envelop analysis hints for key rotation proofs that are below 2MB per year.

Market dynamics for scoping

The very introduction of scoping schemes either leveraging TxID or blockheights create a bias against the expected randomness of the distribution of the transactions within the blockchain.

However, it is the best interest of token issuers to avoid crowded scopes. Hence, market dynamics would still spread transactions as uniformely as possible if such scoping schemes ever become popular.

Indeed, just like those scoping schemes can be used to provide compact proofs for issuer key rotations, those scoping schemes can also be used to let issuers broadcast re-scoping instructions7 to dynamically - but infrequently - move the token a less crowded place if the need ever arises.


  1. In this document, token should be understood as redeemable tokens - which in practice represent all tokens, except bitcoins themselves which are non-redeemable as they are only token that originate from the blockchain itself. ↩︎

  2. A taxonomy of the Bitcoin applicative landscape, Joannes Vermorel, May 2018 (link) ↩︎

  3. Tokeda, Viable token-driven metadata within Bitcoin, Joannes Vermorel, April 2018 (link) ↩︎

  4. Terabyte blocks for Bitcoin, Joannes Vermorel, March 2018 (link) ↩︎

  5. Canonical Transaction Ordering for Bitcoin, Joannes Vermorel (Lokad), Amaury Séchet (Bitcoin ABC), Shammah Chancellor (Bitcoin ABC), Tomas van der Wansem (Bitcrust), June 2018 (link) ↩︎

  6. Simple Ledger Protocol, Jonald Fyookball, James Cramer, Unwriter, Mark B. Lundeberg, Calin Culianu, Ryan X. Charles, July 2018 (link) ↩︎

  7. In this context, a re-scoping instruction is an interpretation convention on the content of an OP_RETURN payload produced by the token issuer. ↩︎