To understand blockchain, you need to know the basic principles that it is based on. Possibly the main feature of it is the Merkle tree, sometimes called a hash tree. It is thanks to it that blockchain can be both effective and transparent at the same time. The concept was patented by Professor Ralph Merkle back in 1979. Now it helps to solve problems in large decentralized networks.
What is the Merkle tree and how is it related to cryptocurrencies? Let’s find out in this Changelly article!
Table of Contents
Merkle Tree Basics
Merkle tree is a complete data structure in the form of a tree, in the leaf vertices of which there are hashes from data blocks, with the inner vertices containing hashes from adding values in child vertices. This connects all the elements with information among themselves. In the end, it looks like this.
A hash is a result of converting a hash function. It is a function that converts an array of input data of arbitrary length into an output string of a specified length in accordance with a specific algorithm.
What Is Merkle Tree Used for?
In a centralized system, the veracity of information is not a problem, since all its components rely on one centralized node. You do not need to worry about the authenticity of money when you receive a transfer to your bank account.
However, in a decentralized network everything is not so simple. Each of the nodes is responsible for the veracity of the transmitted information, so it is not an easy task to verify the authenticity of the full volume due to the number of transactions on the network. At least without the Merkle tree. It allows you to optimize the process of presenting data using hashing.
File systems use Merkle trees to check information for errors, and distributed databases to synchronize records. On blockchain, hash trees allow simplified payment verification (SPV).
SPV clients called lightweight (because they store only block headers, not their contents), to verify transaction information, do not recalculate all hashes, but request Merkle’s proof. It consists of a root and a branch that includes hashes from the requested transaction to the root since the client does not need information about other operations. When adding the requested hashes and comparing them with the root, the client makes sure that the transaction is in its place.
This approach allows you to work with arbitrarily large amounts of data since it significantly reduces the load on the network as only the necessary hashes are downloaded. For example, the size of a block with five transactions of maximum size is more than 500 kilobytes. The size of the Merkle proof, in that same case, would not exceed 140 bytes.
How Merkle Tree Works in Bitcoin
A hash function is a process of converting input data into a bit string of a specified length. The received string, the hash, is very dependent on the array of incoming data. If even one character from the entire array is changed, the resulting hash will take on a completely different value.
All transactions in the Bitcoin block are strings in hexadecimal format. They are hashed and presented as transaction identifiers (txid). All txid in the block is hashed until a single hash value of the block is received. In the process, the Merkle tree is built:
- First, txid (Transaction ID) itself is calculated, that is, transaction hashes;
- Then, hashes are calculated from the sum of the transaction hashes. The Merkle tree is binary – that is, with each new hash step, the number of tree elements must be even. If the block has an odd number of transactions, the hash of the last one is duplicated and added to itself;
- New hashes are computed from the hashes of the sum of the transaction hashes. The process continues until a single hash is obtained (Merkle root). It is indicated in the block header.
In the Bitcoin blockchain, Merkle trees are built using SHA-256 double hashing. Here is an example of hashing a hello string:
First-round SHA-256:
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Second-round SHA-256:
9595c9df90075148eb06860365df33584b75bff782a510c6cd4883a419833d50
The Principle of Merkle Tree
The process of compiling a Merkle tree is similar to data folding. Thanks to it, a huge list of transactions or any other array of information can be represented in just one line. The amazing thing is that if somewhere in the list of these same transactions we change only one symbol, the next level of the tree and the final hash will be completely different. This means that the top of the tree will also change.
In other words, you cannot substitute a transaction for another one into the block or change the data of existing ones. This is why the Merkle tree is considered an efficient way to record transactions in a blockchain. There is also the concept of Merkle Proof. This is the principle of verifying the validity of information using hashes. Instead of examining the entire data array, it is enough to examine individual hashes in the tree, which greatly reduces the computational power overheads for the entire process.
Merkle Tree Alternatives
The article discusses the simplest binary version of the concept invented by Ralph Merkle. In it, each “parent” hash has two “heirs”. In Bitcoin, a hash tree is constructed using SHA-256 double hashing.
There are more complex interpretations of the concept. For example, the Ethereum uses the prefix Merkle tree. Each Ethereum block header contains three such trees at once: for transactions, information about their execution and status. Unlike a binary tree, the value of a prefix node also depends on connections to other nodes. The value is dynamic and not fixed. It can be changed without having to recalculate all the hashes of the tree.