
Bitcoin Onchain Pruning

Peter Gregory Jr.


30th June 2016

This document describes a concept for on-chain pruning of the Bitcoin blockchain. Current solutions provide only client-side pruning, which does not help with network scaling, as every new node still needs to download the full history of all transactions. The proposed improvement would save roughly 95% of the bandwidth, storage and CPU processing power that new nodes spend during synchronization.

Introduction
One of the most discussed and important problems in Bitcoin today is scaling. The recent implementation of Thin Blocks in Bitcoin Unlimited and the planned implementation of Compact Blocks in the Bitcoin Core client address the problem of traffic spikes during block propagation on the Bitcoin network. This is especially important for nodes with slow internet connections (users running nodes from home). However, these solutions don't solve the issue of blockchain size, which grows over time: new users still need to download the whole blockchain before they can start operating as a node. This might take several days on a low-speed connection.
To approximate an average connection, we take one fifth of the current average internet speed in the U.S. (12.6 MB/s) [1], which is 2.5 MB/s. This figure is lower than the average connection speed in almost every country in the report (see Fig. 3 in the Appendix), except Bolivia, Paraguay and Venezuela. Although it is much lower than the world average, it is a conservative choice that helps keep the Bitcoin network decentralized, since practically everyone will be able to run their own node.

[1] Akamai's [state of the internet] Q3 2015 report https://www.akamai.com/us/en/multimedia/documents/stateof-the-internet/akamai-state-of-the-internet-report-q3-2015.pdf

Blockchain size problem and existing pruning solution


Currently, all transactions that have occurred in the whole history of Bitcoin are stored on the blockchain, which grows constantly in size. At the moment blocks are limited to 1 MB and the network is working at capacity, which means the blockchain grows by 1 MB every 10 minutes on average (in practice somewhat faster, since blocks are found more often than every 10 minutes while hashing power, and with it difficulty, keeps increasing). That adds roughly 4.3 GB of data per month to the blockchain.
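The monthly growth figure follows from simple arithmetic. The sketch below reproduces it, assuming full 1 MB blocks, exactly one block every 10 minutes, a 30-day month and decimal units (1 GB = 1000 MB); the function and constant names are illustrative only.

```python
BLOCK_SIZE_MB = 1         # current block size limit
BLOCK_INTERVAL_MIN = 10   # target interval between blocks


def monthly_growth_gb(block_size_mb=BLOCK_SIZE_MB,
                      interval_min=BLOCK_INTERVAL_MIN,
                      days=30):
    """Raw blockchain growth per month, in GB, if every block is full."""
    blocks_per_month = days * 24 * 60 / interval_min
    return blocks_per_month * block_size_mb / 1000


print(round(monthly_growth_gb(), 2))  # 4.32
```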

Storage. One issue that blockchain size imposes is that users need high-capacity drives to store the blockchain. In practice this is not a problem, because existing pruning solutions let a node prune the blockchain on the client side, reducing the amount of stored data. The current blockchain size is 74 GB [2] (see Fig. 4 in the Appendix). Pruning of the blockchain files has been supported since Core client v0.11 [3], and savings can reach up to 98%, as the current size of the UTXO (unspent transaction output) set is only 1.3 GB [4] (see Fig. 5 in the Appendix).

Download and validation. These are the main problems and the main limits on scalability. Even with pruning of blockchain files available, a node client still needs to download the full blockchain from the network and validate all transactions (validation takes the most time). It is also often necessary to re-download the blockchain, e.g. to rebuild the index from scratch, in which case local pruning doesn't help much. In the following sections we describe a method of organizing data structures so that pruning happens on-chain.

[2] Blockchain.info Charts, Blockchain size https://blockchain.info/charts/blocks-size
[3] Bitcoin Core client v.0.11 release notes https://bitcoin.org/en/release/v0.11.0
[4] UTXO statistics http://statoshi.info/dashboard/db/unspent-transaction-output-set

On-chain pruning concept


At a high level, the proposed solution is the following:
- Take the UTXO set from all blocks up to a recent point in time and put it into a separate data structure (referred to later in this document as a UTXO block).
- Calculate a hash of the UTXO block and write it into the currently mined block at a predefined height.
- Use the last available UTXO block plus the remaining part of the raw blockchain instead of the whole raw blockchain.
Suggested parameters and more detailed description:
1. Every UTXO block is built upon 4096 consecutive raw blocks (roughly 1 month of operation) and the previous UTXO block. The initial UTXO block has to be built on the whole history of transactions minus the last 4096 blocks.
2. A UTXO block can be connected only to a raw block whose height is a multiple of 4096.
3. The association is made by adding a hash of the UTXO block header to the coinbase transaction of the currently mined raw block (see Fig. 1). In the other direction, the height of the last included raw block is written into the UTXO block header. This tells a node that has received a UTXO block from which raw block the download needs to proceed.

4. The included UTXO block header hash changes the coinbase transaction and therefore, through the Merkle root, the header and the hash of the raw block. That is how it becomes part of the main blockchain and is covered by all available mining power.
5. UTXO blocks are built with a predefined, publicly known algorithm for sorting the UTXOs within the given chunk of blocks. That means all miners can pregenerate the UTXO block, check whether the included hash is correct, and reject the whole raw block if it is not.
6. A UTXO block should be created only once 8192 blocks have been mined since the last block included in the previous UTXO block. This means that at least 1 month of transactions (4096 blocks) always stays on the raw blockchain, outside any UTXO block, to preserve a reasonable security level for the network.
7. The UTXO set is calculated from the state as of the last raw block included in the UTXO block.
8. Nodes can choose whether to download the UTXO block or to work only with raw blocks (as now); this keeps part of the network storing the whole history of all transactions.
9. Blocks that can include a UTXO block hash must have a height that is a multiple of 4096. Every UTXO block is built upon the whole history of raw blocks minus the last 4096 blocks or, for efficiency, upon the last available UTXO block and all transactions in the 4096 blocks since then.
10. The value of 4096 is chosen based on reasonable expectations of how the network operates; a different value can be chosen. The requirement to always keep 4096 raw blocks on the main blockchain is dictated by security: anyone with enough hashing power to rewrite the last 4096 blocks would effectively be able to rewrite all history by faking the UTXO blockchain, but only for those nodes that decided to use the light UTXO block.
This parameter also needs to depend on the average lifetime of a transaction output [todo: this has to be a separate study]. Keeping in mind, however, that many transactions form long chains included in the same block, such pruning can have an immediate effect even with a value much smaller than 4096.
One more consideration is that the bigger the value chosen for this parameter, the longer the chain of raw blocks not covered by a UTXO block, which forces new nodes to download and validate more transaction data in raw format (including spent outputs).
So a balanced value needs to be chosen, one that keeps the network secure and increases performance for nodes operating on UTXO blocks.
11. This method requires miners to accept it at a high approval rate (95%). It can be implemented via a soft fork, because it only adds a requirement to blocks that are already valid under the existing consensus rules.
12. The pruned UTXO blocks can be delivered in any form (ideally natively by the client), but torrent networks work as well. An advantage of the described method is that the user doesn't need to trust the source of the file, as its hash is written on the blockchain and can be easily verified.
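Items 3 and 5 above can be illustrated with a minimal sketch of how a miner might derive the UTXO block header hash deterministically. The proposal does not fix a serialization format, sort key or hash function, so everything concrete here is an assumption made for illustration: UTXOs are sorted by (txid, output index), the byte layout is hypothetical, and double SHA-256 is used because Bitcoin uses it elsewhere. All names (`Utxo`, `serialize_utxo`, `utxo_block_header_hash`) are invented for this example.

```python
import hashlib
import struct
from typing import Iterable, NamedTuple


class Utxo(NamedTuple):
    txid: bytes    # 32-byte id of the transaction that created the output
    vout: int      # index of the output within that transaction
    value: int     # amount in satoshis
    script: bytes  # locking script


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 (assumed here; the proposal does not name a hash)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_utxo(u: Utxo) -> bytes:
    # Hypothetical layout: txid | vout (u32) | value (u64) | script length (u32) | script
    return u.txid + struct.pack("<IQI", u.vout, u.value, len(u.script)) + u.script


def utxo_block_header_hash(utxos: Iterable[Utxo],
                           last_height: int,
                           prev_utxo_hash: bytes) -> bytes:
    # A predefined order makes the result independent of how each miner
    # happened to collect the outputs.
    ordered = sorted(utxos, key=lambda u: (u.txid, u.vout))
    body_hash = sha256d(b"".join(serialize_utxo(u) for u in ordered))
    # The header records the height of the last included raw block (item 3).
    header = prev_utxo_hash + struct.pack("<I", last_height) + body_hash
    return sha256d(header)


a = Utxo(bytes.fromhex("11" * 32), 0, 5000, b"\x51")
b = Utxo(bytes.fromhex("22" * 32), 1, 7000, b"\x51")
h1 = utxo_block_header_hash([a, b], 417792, b"\x00" * 32)
h2 = utxo_block_header_hash([b, a], 417792, b"\x00" * 32)
assert h1 == h2  # same set, different input order, same hash
```

Because the hash depends only on the set of outputs, the recorded height and the previous UTXO block, any miner can recompute it and compare it with the value found in the coinbase transaction.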

Sample scenario
The current blockchain height is 418000. Assume miners have already accepted and voted for the proposal and it takes effect now. The next block for UTXO block inclusion will be 421888 (see Fig. 1). So before this block is reached, all miners will take the first 417792 raw blocks (starting with the genesis block), extract only the UTXOs from them and put them into a separate UTXO block structure. Every miner will be able to create this data structure independently and calculate the hash of this UTXO block, which will be included in block 421888 and become part of the history. If the hash is incorrect or is not included in the coinbase transaction, the absolute majority of miners will reject the block and continue mining their own block #421888. As the order of UTXOs is predefined, every miner can pregenerate the hash of the required UTXO block long before block #421888 is mined.
Figure 1: Bitcoin On-chain Pruning First UTXO block
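The heights in this scenario follow directly from the 4096-block interval. A small sketch of that arithmetic, with hypothetical helper names, is:

```python
INTERVAL = 4096  # raw blocks per UTXO block, as suggested in this proposal


def next_commitment_height(current_height: int, interval: int = INTERVAL) -> int:
    """Lowest multiple of `interval` strictly above the current height."""
    return (current_height // interval + 1) * interval


def last_included_height(commitment_height: int, interval: int = INTERVAL) -> int:
    """Height up to which raw blocks are folded into the UTXO block."""
    return commitment_height - interval


h = next_commitment_height(418000)
print(h, last_included_height(h))  # 421888 417792
```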

Assuming the UTXO block hash is correct, mining continues. By the time block 425984 is reached, all miners will already have created the next UTXO block, built upon raw blocks 1-421888 or, more efficiently, upon the previous UTXO block and the transactions in blocks 417793-421888 (see Fig. 2). The hash of the new UTXO block will be written into block 425984, and so on.
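The more efficient variant, updating the previous UTXO set with the next chunk of raw blocks, can be sketched as follows. This is a simplified model rather than a full validation routine: transactions are reduced to spent outpoints and output values, scripts and signatures are ignored, and the types `Tx` and `OutPoint` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

OutPoint = Tuple[str, int]  # (txid, output index)


class Tx(NamedTuple):
    txid: str
    inputs: List[OutPoint]  # outpoints spent (empty for a coinbase transaction)
    outputs: List[int]      # output values in satoshis


def apply_blocks(utxo_set: Dict[OutPoint, int],
                 blocks: Iterable[List[Tx]]) -> Dict[OutPoint, int]:
    """Return the UTXO set after applying `blocks` to a previous UTXO set."""
    result = dict(utxo_set)  # leave the previous set untouched
    for block in blocks:
        for tx in block:
            for outpoint in tx.inputs:
                del result[outpoint]  # KeyError signals an unknown or double spend
            for index, value in enumerate(tx.outputs):
                result[(tx.txid, index)] = value
    return result


prev = {("a", 0): 50}
# Transaction "c" spends an output of "b" inside the same block, so the
# intermediate output ("b", 0) never appears in the resulting set.
block = [Tx("b", [("a", 0)], [30, 20]), Tx("c", [("b", 0)], [30])]
new = apply_blocks(prev, [block])
print(new)  # {('b', 1): 20, ('c', 0): 30}
```

The example also shows why chained transactions within one block are pruned immediately: only outputs that are still unspent at the end of the chunk survive.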
When a new node starts and chooses to work with UTXO blocks, it downloads only the UTXO data (the last available UTXO block) plus all raw blocks since that UTXO block. Currently that amounts to only about 6 GB of data instead of 74 GB of raw blocks (1.3 GB for the UTXO set and 4 GB for the last 4096 blocks in raw format). The UTXO blockchain will grow much more slowly, due to the steady filtering out of spent outputs. The advantage of this method is that the amount of data a newly started node has to download stays relatively stable, depending mainly on the number of blocks kept on the raw blockchain (4096 in this proposal) and the block size (1 MB currently). The total download will therefore consist of 4-8 GB of raw blocks, depending on when the last pruning happened, plus the last pruned UTXO block (1.3 GB currently).

Figure 2: Bitcoin On-chain Pruning Rest UTXO blocks

With the assumed internet speed of 2.5 MB/s, it will take roughly 40 minutes to download that much data (6 GB). CPU usage for signature verification will be reduced dramatically, as most transaction chains will be reduced to just the UTXOs in the UTXO block. So starting a node within a couple of hours on a low-speed internet connection will become possible, compared to several days now.
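The download estimate can be checked with a short calculation. The sketch assumes a sustained transfer rate in megabytes per second and decimal units (1 GB = 1000 MB); it models transfer time only, not validation.

```python
def download_minutes(size_gb: float, speed_mb_per_s: float) -> float:
    """Transfer time in minutes at a sustained rate given in MB/s."""
    return size_gb * 1000 / speed_mb_per_s / 60


print(round(download_minutes(6, 2.5)))           # 40 (minutes, UTXO block plus recent raw blocks)
print(round(download_minutes(74, 2.5) / 60, 1))  # 8.2 (hours, full raw blockchain)
```

The result is sensitive to the unit: if the assumed speed were 2.5 megabits rather than megabytes per second, each figure would be eight times larger.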

Criticism and attack vectors


The main drawback and potential attack vector arises if someone has the power to overwrite the last 4096 blocks, which would make it possible to overwrite the whole history of Bitcoin transactions for those nodes that use the light download option. At the same time, other nodes will still have the full transaction history, so such an attack is easy to detect. Even without UTXO blocks, if someone were able to rewrite 1 month of history (4096 blocks), it would be a collapse of the same magnitude and should be addressed similarly.
Another attack vector is a Sybil attack, in which someone creates many nodes with fake UTXO block data and feeds this data to a newly started node. In the end the fake UTXO block will still be rejected, because its hash won't match the one written on the main blockchain. This attack could delay full synchronization of a newly started node for a short while. As it brings no benefit to the attacker and has no practical effect, it is considered non-critical. Moreover, since the pruned file can be downloaded manually from resources outside the Bitcoin network and then simply validated by hash, the attack becomes meaningless.
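The check that defeats fake UTXO blocks is a plain hash comparison against the value committed on the main blockchain. A minimal sketch, assuming double SHA-256 over the UTXO block header and using hypothetical function names, could look like this:

```python
import hashlib
from typing import Iterable, Optional


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 (an assumption; the proposal does not name a hash)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def accept_utxo_block(candidate_header: bytes, committed_hash: bytes) -> bool:
    """Accept a downloaded UTXO block header only if it matches the chain."""
    return sha256d(candidate_header) == committed_hash


def first_valid(candidates: Iterable[bytes], committed_hash: bytes) -> Optional[bytes]:
    """Try candidates from any source (peers, torrent, manual download)."""
    for header in candidates:
        if accept_utxo_block(header, committed_hash):
            return header
    return None  # every candidate was fake; keep looking


committed = sha256d(b"genuine header")
print(first_valid([b"fake header", b"genuine header"], committed))  # b'genuine header'
```

Fake candidates cost the node only the time needed to download and hash them, which is why the attack can delay synchronization but not corrupt it.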
One more criticism could be that the main raw blockchain may eventually be lost as everyone switches to the light UTXO blockchain. First of all, this is highly unlikely, as there will always be nodes with enough resources that prefer to keep the full raw blockchain. But even if it happens, it is not really a problem: as with cash notes, you don't care about the full list of transactions in which a note was used. All that matters is who holds it now and controls the keys to make a payment. Bitcoin is digital cash in a similar sense. As long as 4096 confirmations provide enough security, that guarantees that the UTXO blockchain on the network is the true one.

Conclusion
The described method introduces a way to perform on-chain pruning for Bitcoin. It creates a separate data structure called a UTXO block, which stores only the UTXO set and reduces download times dramatically compared to downloading the full raw blockchain today. Some nodes will choose to use the light UTXO block in order to use fewer resources and operate faster. At the same time, they will still be fully validating nodes on the network. This method therefore introduces a new type of node between current fully validating nodes, which download the full blockchain, and light client nodes (SPV).

Appendix

Figure 3: Average internet connection speed by country, MB/s

Figure 4: Bitcoin blockchain size, GB

Figure 5: Serialized UTXO set size, GB
