Breaking Down the Math Behind Blockchain Mining

Bitcoin Mining Nov 03, 2020

We were inspired to write this post in response to a Twitter thread on the idea of 'miner capitulation', published by a cryptocurrency influencer named Preston Pysh.

Unpacking the Misunderstandings

Before we begin to break down all of the reasons why Preston's conclusions in his thread are terribly misguided, let's at least 'set the stage' by presenting them to you (the reader).

Link to Twitter Thread = https://mobile.twitter.com/PrestonPysh/status/1265975651422687239

Below are some of the 'highlights' from the thread:

This is going to be a 'long one' because it will take several articles to unpack fully, but we feel that we're up to the task.

Breaking Down Blockchain Mining

Before getting into the math, this piece should be prefaced with a disclaimer that hopefully gives readers the understanding (in subsequent parts, as this is all put together) that there are absurd amounts of profit that can be made from mining in blockchain (and also that no one is restricted to mining on one protocol, as Preston and many others seem to think for some reason).

We're going to keep this brief by simply covering some math that we did about a year ago responding to an open argument on Twitter between Peter Rizun and Craig Wright.

Figuring Out Blockchain’s Hardest Word Problem:

“Assuming that it takes at least 20 seconds for a block to be propagated to the network, what are the chances that an orphan block will be created on the Bitcoin network?”

Despite this question’s appearance of simplicity, there are several steps that must be taken before solving. We’ll go through all of them below before we dissect Craig Wright’s logic.

First Objective

In solving this problem, our first objective will be to dissect the actual probability of a block being solved in 10 minutes or less. As stated above, that probability is not 100% and it should not be considered to be such.

As noted in Craig Wright’s response to Peter Rizun, Bitcoin’s re-targeting algorithm does utilize a Poisson distribution.

The formula for calculating the difficulty from a block's packed 'bits' field (as given on the Bitcoin Wiki) is as follows:

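    #include <iostream>
    #include <cmath>

    // Fast approximation of the natural logarithm that works directly on the
    // bit pattern of an IEEE 754 single-precision float.
    inline float fast_log(float val)
    {
       int * const exp_ptr = reinterpret_cast <int *>(&val);
       int x = *exp_ptr;
       // Extract the 8-bit exponent field (offset by 128 rather than 127;
       // the polynomial below makes up the difference of 1).
       const int log_2 = ((x >> 23) & 255) - 128;
       // Overwrite the exponent field so that val falls in [1, 2).
       x &= ~(255 << 23);
       x += 127 << 23;
       *exp_ptr = x;

       // Quadratic approximation of log2(val) + 1 on [1, 2).
       val = ((-1.0f/3) * val + 2) * val - 2.0f/3;
       // Convert from log base 2 to natural log (multiply by ln 2).
       return ((val + log_2) * 0.69314718f);
    }

    // Difficulty from the compact "bits" field: the difficulty-1 target
    // (mantissa 0x00ffff, exponent byte 0x1d) divided by the current target,
    // evaluated in log space.
    float difficulty(unsigned int bits)
    {
        static double max_body = fast_log(0x00ffff), scaland = fast_log(256);
        return exp(max_body - fast_log(bits & 0x00ffffff) + scaland * (0x1d - ((bits & 0xff000000) >> 24)));
    }

    int main()
    {
        // 0x1b0404cb is the example "bits" value used on the Bitcoin Wiki.
        std::cout << difficulty(0x1b0404cb) << std::endl;
        return 0;
    }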

Users hoping to learn more about how difficulty is calculated on the blockchain should visit this link on the Bitcoin Wiki: https://en.bitcoin.it/wiki/Difficulty#What_is_the_formula_for_difficulty.3F

Specifically, this statement from that page explains how the difficulty tracks the hash rate on the blockchain perfectly:

"The difficulty is adjusted every 2016 blocks based on the time it took to find the previous 2016 blocks. At the desired rate of one block each 10 minutes, 2016 blocks would take exactly two weeks to find. If the previous 2016 blocks took more than two weeks to find, the difficulty is reduced. If they took less than two weeks, the difficulty is increased. The change in difficulty is in proportion to the amount of time over or under two weeks the previous 2016 blocks took to find."

Right now, Bitcoin's hash rate is at a whopping 97.19 Eh/s:

Obviously, the hash rate will be different by the time any reader is viewing this - but the calculations still apply; meaning that you can swap out any value for 97.19 Eh/s, and arrive at the same result.

Simplifying the Calculations

97.19 Eh/s is an extremely large number (if you look at it in terms of how many individual hashes happen per second).

Below is a chart that provides the metric conversion for hash rate:

As we can see from that hash rate conversion chart above, 1 Eh/s is the effective equivalent of 1 quintillion individual hashes per second.

How We Simplify the Calculations

To start with, we're going to express the total hash rate in Th/s rather than Eh/s (1 Eh/s = 1 million Th/s).

Doing so brings our hash rate to 97.19 million Th/s.

We used the number '97.9M Th/s' in our calculations below.

This shift in number came from us taking a 'break' at this point while putting this article together, then coming back and taking a *subsequent* look at the updated hash rate on the blockchain (forgetting that we had already polled this information in the prior section).

This does not change any of the math that is done below, but without explaining this to the reader, there was a major chance that one could become confused after seeing an inexplicable shift in our figure for the total number of hashes per second made on the blockchain.

Next Step

To simplify the metrics here, we will not look at these 97.19 million TH on an individual basis.

Instead, we will borrow a concept from our good friend Charles Hoskinson from Cardano and divide each second into an ‘epoch’ to simplify our process moving forward in the first part of our equation.

Since the Bitcoin blockchain re-targets every 2016 blocks by adjusting its difficulty so that it takes, on average, 10 minutes for the entire network to find a solution, we know that there are 600 ‘epochs’ (seconds) that we’re looking at here.

So, every second worth of trials on the blockchain (currently) = an attempt because we’re looking at the network holistically and assuming that all miners will continue mining within that 20 second time frame (on the north side of that time frame; between 10 minutes and 10 minutes 20 seconds). Craig argues that this would not be the case, but we’ll get to that part of the argument and debunk it later.

For now, we will continue under the premise that every second represents an opportunity.

Breaking Down the Math on the Probability of a Solved Block Within 10 Minutes

n = 620 trials (assuming we set an upper bound limit of 620 seconds, which = 10 minutes 20 seconds)

However, to really calculate this we first want the chance of success in a given trial (assuming perfect re-targeting and consistent hash rate), which would be 1/600 (one expected success per 600 trials; now you see why we needed to simplify this slightly).

Since these are Bernoulli trials (i.e., each second’s worth of hashes is an independent ‘attempt’), we need to use the formula:

X = 1 - (1 - p)^n; where n = # of trials & p = probability of success in a single trial

For the 10-minute window (n = 600, p = 1/600), that leaves us with the following:

X = 1 - (1 - 1/600)^600

The result = .632 (approximately), which means there is a 63.2% chance that we come up with the solution within the 10-minute time frame.

So, X = .632 (per block; each block = 600 seconds), which converts to a 63.2% chance of the network successfully finding the next block within the 10-minute target.

Now that we have this information, we can solve this riddle once and for all.

Solving the Equation

Below are the steps that were undertaken to arrive at the solution to the question:

“Assuming that it takes at least 20 seconds for a block to be propagated to the network, what are the chances that an orphan block will be created on the Bitcoin network?”
  1. It is already known that traditional targeting should lend a 63.2% chance of the block being found in <= 10 minutes (sliding scale up to 63.2% at the height; this is on an exponential distribution).
  2. Currently, blockchain.com says that the network hash rate = 97.9M TH/s. We used that for reference in this case.
  3. There are 600 seconds in 10 minutes. 600 x 97.9M = 58.740B TH (total). So, there’s a 63.2% chance that one of those 58.740B TH will turn up a successful nonce value at this difficulty.
  4. If we divide that success rate (63.2%) by 600 we should get an average success rate of .105% after one second (multiplied cumulatively as we go on; stays consistent with the fact these are all independent trials).
  5. Going back to that metric in #2 that there are 97.9M TH/s, we simply divided the .105% by 97.9 to figure out the chances of the first 1 million TH being successful. This gave us a total of roughly .0011%.
  6. We did that process in #5 because we’re assuming that a successful nonce find will result in that pool dropping out (everyone else will keep mining because we’re assuming they’re blissfully unaware a solution has been found). Based on pool size, we calculated that would be an 11% drop.

The Block-to-Block Hashrate Hit is an Important Factor That is Not Taken into Consideration

As mentioned in #6 above, once the nonce has been found by a miner (mining pool), they have no further incentive to continue mining the block because they know that they are effectively the 'winners'.

However, the rest of the network will remain unaware of this fact because those blocks have yet to be propagated to the rest of the nodes on the network.

Thus, there is some finite amount of time that it takes for the 'winners' to:

A) Receive the relayed nonce from the 'winning' miner in their pool & process that information (technically or as a human being)

B) Subsequently relay this information to the Bitcoin network

C) Have that information propagate out to the wider network

Only after 'C' occurs will other miners be logically incentivized to discontinue mining on the highest known block height (version of the blockchain with the greatest calculable Proof of Work).

Given this fact, there is an additional hash rate hit that will always take place at some point during the block discovery process.

Potential Selfish Mining Impact

The theory put out by the pseudo-intellectual adjunct professor, 'Emin' (paid by the Ethereum corporation) a few years ago outlined the idea of 'selfish mining' on the Bitcoin protocol.

And while failing to understand many of the nuances of how Bitcoin actually works, Emin Gün Sirer was correct about one thing - the network does rely on miners to properly relay newly found blocks.

This Comes With a Caveat

Miners electing to practice 'selfish mining' are taking a gamble because, if there is another block that is propagated to the network by the 'legitimate' miner, they will be "locked in" to the block that they are mining (since we're assuming that they have a block that they've found the nonce for and have elected to merely mine one block ahead in secret versus informing the rest of the Bitcoin network).

'Why would they be 'locked in' to the block that they are mining?'

Because their hash rate will invariably be different from the one that the network now knows as the correct block height.

'Couldn't they still propagate that secret block at the same time that a newly found block is propagated by another mining pool that isn't engaged in this "selfish mining" practice?'

Yes. They could.

And, yes, this would result in some nodes potentially accepting this conflicting version of the blockchain (contingent on latency, network connectivity, timing & UNIX time offset, for which the chain provides a massive allotment of up to 2 hours of time drift; another fact that plays into the raw # of hashes estimated to have occurred at any given point).

The Setback

When that new block is propagated, there is no reason to assume that any miners that began mining on their known version of the block would switch to mining on the 'selfishly mined' block that has just been revealed to the network.

And there would be great incentive for them to not do so.

Bernoulli's Trials + Poisson Distribution

As mentioned before:

A) Bernoulli Trials = Dictate that each hash on the chain is its own independent event. Thus, as additional hashes are made on the chain, the probability of any one hash being responsible for finding the nonce never increases.

B) Poisson Distribution = Bitcoin also utilizes the Poisson distribution to probabilistically target a 10-minute block time. That means that, on average, each block will be submitted within a 10-minute time interval.
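
For readers who want the notation behind A) and B), here is a short sketch in LaTeX. It assumes a constant hash rate and a perfectly tuned difficulty, so that blocks arrive as a Poisson process with rate λ = 1/600 per second; the symbols T (time until the next block) and N(t) (number of blocks found in t seconds) are our own labels.

```latex
% Time to the next block is exponentially distributed.
\[
  T \sim \mathrm{Exp}(\lambda), \qquad \lambda = \tfrac{1}{600}\ \mathrm{s}^{-1}, \qquad
  P(T \le t) = 1 - e^{-\lambda t}
\]
% Chance of a block within the 10-minute target.
\[
  P(T \le 600) = 1 - e^{-1} \approx 0.632
\]
% The per-second Bernoulli model converges to the same value.
\[
  \lim_{n \to \infty} \left[ 1 - \left( 1 - \tfrac{1}{n} \right)^{n} \right] = 1 - e^{-1}
\]
% Number of blocks found in t seconds follows a Poisson distribution.
\[
  P\bigl(N(t) = k\bigr) = \frac{(\lambda t)^{k} e^{-\lambda t}}{k!}
\]
```

The third line is why the Bernoulli calculation with 600 one-second trials lands on the same 63.2% as the continuous model.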

Taking a Look at Poisson Distribution as it Pertains to Bitcoin

Looking at the picture above, imagine that the '10 minute' mark (since the last block has transpired) is the top of that bell curve.

Importance of the Poisson Distribution in Factoring These Calculations

We must not forget that the Poisson distribution is dependent upon the Bernoulli's trials that take place on Bitcoin.

This means that the 10-minute target is premised on the total hash rate (in our case, 97.9 Eh/s); thus, if a miner were to 'selfishly mine' (effectively creating a forked / alternate version of the Bitcoin protocol), then the probability of finding the next block decreases for both the attacker and the rest of the network, in proportion to the hash rate that is no longer mining on that given version of the blockchain.

Thus, the attacker is at a steep disadvantage in comparison to the remainder of the network if they cannot leverage at least as much hash rate as the rest of the network combined. (If they can, then the opposite is true: the advantage is theirs, and it would increase as they heightened their selfish mining practices, assuming they are willing to allow the remainder of the network to participate again. But if things ever got to this point, then the idea of 'selfish mining' would become moot.)

Adjusting Our Calculations to Factor for the Information Above

If you remember, in one of our previous sections when we were outlining the steps needed to come to our final answer (calculations), we made the following assumption:

We did that process in #5 because we’re assuming that a successful nonce find will result in that pool dropping out (everyone else will keep mining because we’re assuming they’re blissfully unaware a solution has been found). Based on pool size, we calculated that would be an 11% drop.

'Why was that assumption made?'

Great question.

We simply looked at the distribution of hash rate on the Bitcoin network (as provided by btc.com) and used that as our baseline estimate:

[source: btc.com/stats/pool]

From the above, we merely added up the share that each respective entity has on the chart (we purposefully ignored smaller, independent entities mining on the blockchain), and then divided that number by the total number of entities identified on the chart.

Imperfect Means of Calculating

This is an imperfect means of calculation, but coming up with an 'exact' number here is of little importance since there is ample evidence to suggest that the typical winner of the next block on the Bitcoin network at any given point in time is usually a larger mining pool entity (see below):

[source: coin.dance]

As we can see from the chart above, the most frequently spotted winners of blocks for $BTC + $BCH are as follows:

  1. F2Pool (x2)
  2. Antpool (x3)
  3. BTC.com / BTC.top (these entities are essentially one and the same; if that's news, then we'll get into that at a later point in time)

Our sample size is roughly 20 blocks (among those 20 blocks, the entities named above are responsible for nearly half of them).

Taking a Look at Their Respective 'Market Shares' (by 'market share', we mean 'hash rate' ; hardy har)

(F2Pool comes in with a cool 20% of the hash rate on the Bitcoin network)

(BTC.com accounts for 12.2% of the network hash rate as well)

(Antpool represents an almost equivalent chunk as 'BTC.com', coming in at a cool 12.5%)

Surprisingly, 'Poolin' (which has 15.9% of the total network hash rate at the time of writing) had yet to contribute any blocks among the brief sample size that we saw above.

(Please Note: The calculations above purposefully excluded $BSV because of the greater variance in mining pools that are on their chain + lack of identifiable information associated with winning blocks submitted to the chain; see below):

Moving Forward With Our Calculations From Where We Left Off

  1. 89% of 97.9 million Th/s = 87.131M Th/s. We already calculated the percentage for the first 1 million TH in #5 for this very purpose. Now we can multiply that by 87.131 to get a decreased success rate of .09%/second.
  2. From this point, it becomes a classical probability problem like what we used to do in school. Each second is a “trial” (to simplify this so we’re not playing with a dozen+ zeroes). So 20 seconds = 20 trials. We’re looking for 1 success and 19 failures in those 20 trials. Probability of success in each trial = .09% (decimal = .0009).
  3. Plugging all of that in gives us roughly a 1.7% chance (closer to 1.77% when evaluated exactly) that a block is found within 20 seconds at that hash rate given the current difficulty.
  4. Since these two events must happen in unison, we need to multiply the two probabilities together (63.2% & 1.7%), which gives us a final value of roughly a 1.07% chance.

Below is the mathematical notation (LaTeX format) for number '1' in our list above:

Conclusion

There's nothing to really conclude from this piece, but there is a significant amount of information to glean from it.

This was done to ensure that the contribution would remain in the general 'canon' of publicly available 'Bitcoin-related' information in the "main stream" (as they like to call it).

If there are any questions, please do not hesitate to reach out.

Tags

cryptomedication

Happy to serve and help wherever I'm needed in the blockchain space. #Education #EthicalContent #BringingLibretotheForefront
