Hands-On Blockchain for Python Developers: Gain blockchain programming skills to build decentralized applications using Python
Packt, 1st Edition, February 2019 (paperback, 450 pages, ISBN-13 9781788627856)

Author: Arjuna Sky Kok

A hash function takes an input of any length and turns it into a fixed-length output. To make this clearer, let's look at the following code example:

>>> import hashlib
>>> hashlib.sha256(b"hello").hexdigest()
'2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824'
>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
>>> hashlib.sha256(b"hellohellohellohello").hexdigest()
'25b0b104a66b6a2ad14f899d190b043e45442d29a3c4ce71da2547e37adc68a9'

As you can see, the input can be 1, 5, or even 20 characters long, but the output is always 64 hexadecimal characters. The output looks scrambled, with no apparent link between the input and the output. However, the same input always gives the same output:

>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
>>> hashlib.sha256(b"a").hexdigest()
'ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb'
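To check the two properties just described (fixed-length output and determinism) on several inputs at once, here is a small sketch. Only `hashlib.sha256` comes from the standard library; the helper name `sha256_hex` is a hypothetical convenience.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

inputs = [b"a", b"hello", b"hellohellohellohello", b"x" * 10_000_000]
for data in inputs:
    digest = sha256_hex(data)
    # The digest length never changes, whatever the input length
    print(len(data), len(digest))
    # Hashing the same input a second time gives exactly the same digest
    assert digest == sha256_hex(data)
```

Every printed line pairs an input length (1, 5, 20, and 10,000,000 bytes) with 64. The digest itself is 32 bytes (256 bits), and each byte is written as two hexadecimal characters.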

If you change the input by even a single character, the output is totally different:

>>> hashlib.sha256(b"hello1").hexdigest()
'91e9240f415223982edc345532630710e94a7f52cd5f48f5ee1afc555078f0ab'
>>> hashlib.sha256(b"hello2").hexdigest()
'87298cc2f31fba73181ea2a9e6ef10dce21ed95e98bdac9c4e1504ea16f486e4'
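One way to put a number on "totally different" is to count how many of the 256 output bits flip when the input changes by one character. The sketch below does that with a hypothetical helper, `bit_difference`. For a well-behaved hash you would expect roughly half of the bits, around 128, to differ, although the exact count varies from pair to pair.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 SHA-256 output bits differ between two inputs."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 exactly where the two digests disagree; count those 1 bits
    return bin(x ^ y).count("1")

print(bit_difference(b"hello1", b"hello2"))
print(bit_difference(b"hello1", b"hello1"))  # identical inputs: 0 bits differ
```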

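Because the output has a fixed length (64 hexadecimal characters, or 256 bits, in this case) while the input can be of any length, there are far more possible inputs than outputs. So there must be two different inputs that produce the same output; such a pair is called a collision.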

Here is the interesting thing: with a good hash function such as this one, finding two different inputs that produce the same output is computationally infeasible. It is practically mission impossible: even if you hijacked all the computers in the world and made them run the hashing computation, you would be unlikely ever to find two different inputs with the same output.
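Both the pigeonhole argument and the infeasibility claim become visible if we shrink the output. The sketch below keeps only the first four hex characters of SHA-256 (a deliberately weakened, hypothetical `truncated_hash`, not something to use in practice), so there are just 16^4 = 65,536 possible outputs. A collision is then guaranteed within 65,537 inputs and, by the birthday bound, should typically appear after a few hundred. The same bound puts the full 64-character output at roughly 2^128 attempts, which is why collisions there are out of reach.

```python
import hashlib

def truncated_hash(data: bytes, hex_chars: int = 4) -> str:
    # Deliberately weak: keep only the first hex_chars hex digits of SHA-256
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_collision(hex_chars: int = 4):
    """Hash b'0', b'1', b'2', ... until two inputs share a truncated hash."""
    seen = {}  # truncated hash -> first input that produced it
    i = 0
    while True:
        data = str(i).encode("utf-8")
        h = truncated_hash(data, hex_chars)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
        i += 1

first, second, shared = find_collision()
print(first, second, shared)
```

Each extra hex character multiplies the output space by 16 and the expected search effort by about 4 (the square root of 16), so at 64 characters the search is hopeless.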

Not all hash functions are safe, though. SHA-1 was broken in 2017, when researchers produced two different files with the same SHA-1 output; in other words, people can now find two different long strings that have the same output. In this example, we use SHA-256.

The output of the hash function can be used as a kind of digital signature (strictly speaking, a fingerprint) of the data. Imagine you have a string that is 10 million characters long (say you are writing a novel). To make sure this novel is not tampered with, you could tell all your potential readers to check all 10 million characters against your original to ensure that the novel hasn't been corrupted. Nobody would do that. But with hashing, you can publish just the 64-character hash output (through Twitter, for example), and your potential readers can hash the novel that they buy or download and compare the result with the published value to make sure their copy is legit.
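Here is a minimal sketch of that check, with the author publishing one 64-character value and a reader comparing against it. The function names `fingerprint` and `is_legit` and the repeated sentence standing in for the novel are hypothetical.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """The 64-character value the author would publish."""
    return hashlib.sha256(data).hexdigest()

def is_legit(downloaded: bytes, published: str) -> bool:
    # The reader re-hashes their copy and compares it with the published value
    return fingerprint(downloaded) == published

novel = b"It was a dark and stormy night. " * 312_500  # 10 million bytes
published = fingerprint(novel)

tampered = novel.replace(b"stormy", b"sunny", 1)  # change a single word
print(is_legit(novel, published))     # True
print(is_legit(tampered, published))  # False
```

For a real file on disk you would usually feed it to a `hashlib.sha256()` object in chunks with `update()` rather than reading the whole file into memory.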

So, we add the parent's hash to the block class. This way, each block keeps the digital signature of its parent block. If we are ever naughty and change the content of any block, the parent's hash stored in its child block no longer matches, and we get caught red-handed.

But can't you also change the parent's hash in the child block when you alter the content of a block? Obviously, you can. However, altering the content becomes more difficult, because it now takes two steps: changing the block and then updating the parent's hash in its child. Now, imagine you have 10 blocks and you want to change the content in the first block.

In this case, you have to change the parent's hash in its immediate child block. But, alas, this has unseen ramifications. Technically speaking, the parent's hash in the immediate child is part of that block's content, so changing it changes the child's own hash. That means the parent's hash stored in the next block (the grandchild of the first block) becomes invalid.

Now you have to change the grandchild's parent's hash, but this affects the block after it, and so on, until you have changed the parent's hash in every subsequent block. Together with the original edit, that is ten steps. Using a parent's hash makes tampering much more difficult.
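The ripple effect can be simulated. This sketch uses a hypothetical minimal block, a dict with `id`, `history`, and `parent_hash` serialized as JSON with sorted keys; the book's own `Block` class and field names may differ. It builds 10 linked blocks, tampers with the first, and counts the edits needed before every link is valid again.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash the whole block, including its own parent_hash field
    serialized = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()

def build_chain(n: int) -> list:
    chain = [{"id": 1, "history": "genesis", "parent_hash": None}]
    for i in range(2, n + 1):
        chain.append({"id": i, "history": f"event {i}",
                      "parent_hash": block_hash(chain[-1])})
    return chain

def first_broken_link(chain: list):
    """Return the index of the first block whose parent_hash no longer matches, or None."""
    for i in range(1, len(chain)):
        if chain[i]["parent_hash"] != block_hash(chain[i - 1]):
            return i
    return None

chain = build_chain(10)
chain[0]["history"] = "tampered"  # step 1: alter the first block
steps = 1
while True:
    i = first_broken_link(chain)
    if i is None:
        break
    # Fixing this child's parent_hash changes the child's own hash,
    # which in turn breaks the link stored in the next block
    chain[i]["parent_hash"] = block_hash(chain[i - 1])
    steps += 1
print(steps)  # 10: one edit to the block plus one fix in each of its 9 descendants
```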

So, we have three participants in this case: Nelson, Marie, and Sky. But there is another type of participant too: the one who writes into the blockchain, which in blockchain parlance is called the miner. In order to put a transaction into the blockchain, the miner is required to do some work first.

Previously, we had three blocks (block_A, block_B, and block_C), but now we have a candidate block (block_D), which we want to add into the blockchain as follows:

block_D = Block()
block_D.id = 4
block_D.history = 'Sky loves turtle'
block_D.parent_id = block_C.id

But instead of adding block_D to the blockchain just like that, we first require the miner to solve a puzzle. We serialize the block and ask the miner to find an extra string which, when appended to the serialized block, produces a hash output with at least five zeros at the front.

Those are a lot of words to chew on. First things first, we serialize the block:

import json
block_serialized = json.dumps(block_D.__dict__).encode('utf-8')
print(block_serialized)
b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'
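One caveat when hashing a serialization: `json.dumps` writes keys in dictionary order, and since Python 3.7 a dict (including an instance's `__dict__`) remembers insertion order. The same block can therefore produce different bytes, and a different hash, depending on the order in which its attributes were set. Passing `sort_keys=True` is one way to get a canonical byte string. The stand-in `Block` class below is hypothetical, not the chapter's own definition.

```python
import hashlib
import json

class Block:
    """Hypothetical stand-in for the chapter's Block class."""

def serialize(block) -> bytes:
    # sort_keys=True makes the byte string independent of attribute order
    return json.dumps(block.__dict__, sort_keys=True).encode("utf-8")

a = Block()
a.id, a.history, a.parent_id = 4, "Sky loves turtle", 3

b = Block()
b.parent_id, b.history, b.id = 3, "Sky loves turtle", 4

print(serialize(a))  # b'{"history": "Sky loves turtle", "id": 4, "parent_id": 3}'
print(json.dumps(a.__dict__) == json.dumps(b.__dict__))  # False: different key order
print(hashlib.sha256(serialize(a)).digest() == hashlib.sha256(serialize(b)).digest())  # True
```

The sorted serialization is a different byte string from the one printed above, so a proof-of-work answer found for one does not carry over to the other.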

If the serialized block is hashed, what does it mean if we want the hash output to have at least five zeros at the front? It means that we want the output to look like this:

00000aa21def23ee175073c6b3c89b96cfe618b6083dae98d2a92c919c1329be

Alternatively, we want it to look like this:

00000be7b5347509c9df55ca35d27091b41a93acb2afd1447d1cc3e4b70c96ab
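Another way to read "at least five leading zeros": each hex digit carries 4 bits, so five leading zero digits means the 256-bit hash, read as an integer, is below 2^(256 - 20) = 2^236. Bitcoin expresses its difficulty as a numeric target of this kind rather than as a count of zeros. The helper names below are hypothetical.

```python
def leading_zeros_ok(hex_hash: str, zeros: int = 5) -> bool:
    # String view: the first `zeros` hex digits are all '0'
    return hex_hash.startswith("0" * zeros)

def below_target(hex_hash: str, zeros: int = 5) -> bool:
    # Numeric view: each hex digit is 4 bits, so the value must be < 2**(256 - 4*zeros)
    return int(hex_hash, 16) < 2 ** (256 - 4 * zeros)

examples = [
    "00000aa21def23ee175073c6b3c89b96cfe618b6083dae98d2a92c919c1329be",
    "00000be7b5347509c9df55ca35d27091b41a93acb2afd1447d1cc3e4b70c96ab",
    "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",  # sha256(b"hello")
]
for h in examples:
    print(leading_zeros_ok(h), below_target(h))
```

For 64-character hashes the two checks always agree. The numeric form lets difficulty be tuned more finely than whole hex digits: halving the target doubles the expected work.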

So, the puzzle is something like this:

sha256(serialized block + answer) = hash output with (at least) 5 leading zeros

The miner needs to guess the correct answer. If this puzzle is converted to Python code, it would be something like this:

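import hashlib

answer = b''  # unknown: this is the extra string the miner has to guess
data = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}' + answer
output = hashlib.sha256(data).hexdigest()
# output needs to be 00000???????????????????????????????????????????????????????????
print(output.startswith('00000'))  # True only for a correct answer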

So, how could the miner solve a problem like this? We can use brute force:

import hashlib

payload = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'
for i in range(10000000):
  nonce = str(i).encode('utf-8')
  result = hashlib.sha256(payload + nonce).hexdigest()
  if result[0:5] == '00000':
    print(i)
    print(result)
    break

The result would therefore be as follows:

184798
00000ae01f4cd7806e2a1fccd72fb18679cb07ede3a2a7ef028a0ecfd4aec153

This means that the answer is 184798; in other words, the hash output of {"history": "Sky loves turtle", "parent_id": 3, "id": 4}184798 has five leading zeros. In that simple script, we iterate from 0 to 9999999 and append each number to the input. This is a naive method, but it works. Of course, you could also append characters other than digits, such as a, b, or c.
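As a sketch of that last point, the search below draws its candidate answers from lowercase letters (a, b, ..., z, aa, ab, ...) using `itertools.product` instead of `range`. The function names are hypothetical, and the low difficulty of three zeros is an illustrative choice to keep the run short.

```python
import hashlib
import itertools
import string

payload = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'

def letter_nonces():
    """Yield b'a', b'b', ..., b'z', b'aa', b'ab', ... in order of increasing length."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters).encode("utf-8")

def solve_with_letters(serialized: bytes, zeros: int):
    target = "0" * zeros
    for nonce in letter_nonces():
        result = hashlib.sha256(serialized + nonce).hexdigest()
        if result.startswith(target):
            return nonce, result

nonce, result = solve_with_letters(payload, 3)
print(nonce, result)
```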

Now, try increasing the number of leading zeros to six, or even ten. Can you still find a qualifying hash output? If the loop finishes without printing anything, increase the range limit from 10000000 to an even higher number, such as 1000000000000. Once you appreciate the hard work that goes into this, consider that Bitcoin required around 18 leading zeros in the hash output at the time this book was being written. The number of leading zeros is not fixed: the network adjusts the difficulty periodically (in Bitcoin, every 2,016 blocks) so that new blocks keep arriving at a roughly steady rate of about one every 10 minutes. You don't need to worry about this for now.
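To experiment with six or more zeros, it helps to make the number of zeros a parameter. This sketch (with hypothetical names `mine` and `verify`) also shows the asymmetry that makes proof of work useful: finding an answer may take many hashes, but anyone can check a claimed answer with a single hash.

```python
import hashlib

def mine(serialized: bytes, zeros: int, max_nonce: int = 10_000_000):
    """Try nonces 0, 1, 2, ... and return (nonce, hash) for the first hit, or None."""
    target = "0" * zeros
    for i in range(max_nonce):
        nonce = str(i).encode("utf-8")
        result = hashlib.sha256(serialized + nonce).hexdigest()
        if result.startswith(target):
            return i, result
    return None  # no answer below max_nonce; raise the limit and try again

def verify(serialized: bytes, nonce: int, zeros: int) -> bool:
    # Checking a claimed answer costs a single hash
    result = hashlib.sha256(serialized + str(nonce).encode("utf-8")).hexdigest()
    return result.startswith("0" * zeros)

payload = b'{"history": "Sky loves turtle", "parent_id": 3, "id": 4}'
found = mine(payload, 4)
print(found)
if found is not None:
    print(verify(payload, found[0], 4))  # True
```

Each extra leading hex zero multiplies the expected number of attempts by 16: about 16^5 = 1,048,576 for five zeros, 16^6 = 16,777,216 for six (more than the default `max_nonce`, so expect to raise it), and 16^18 = 2^72, roughly 4.7 × 10^21, for eighteen.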

So, why do we need proof of work? We need to take a look at the idea of consensus first.