Monero Wallet Library Backdoors

Monero Mar 06, 2021
Foreword: Originally, this was crafted in response to me pointing out a flawed library that Monero elected to use as the basis for its new wallet code. However, after fluffypony decided to reach out and attempt direct intimidation, it became apparent to me that I needed to leave no stone unturned here.

If you're curious about fluffypony's correspondence, here it is below (private message confidentiality doesn't apply to unsolicited messages):

Never one to back down from a challenge, let's see if we can't meet Fluffypony's demands.

Introduction

Monero as a community offers individuals the chance to pitch "proposals" to the rest of the community.

A typical proposal includes a general summary of what that user intends to do, a requested funding amount, and an estimated date of completion along with established milestones.

For the wallet in question that was backdoored (badly), that proposal can be found here: https://ccs.getmonero.org/proposals/xiphon-7.html

181 XMR = $36,500 currently; nearly twice what renowned cryptographer JP Aumasson was offered by this same community. 

Scrolling down slightly, you'll find the object of this report's ire.

Below is a screenshot with the problematic portions highlighted:

The specific GitHub repo we're going to be taking a look at is the 'monero-seed' one.

tevador/monero-seed
Proof of concept 14-word mnemonic seed for Monero.

To warm us up, let's start by identifying some basic implementation flaws (and questionable decisions) in the wallet's construction.

This is being done to establish that we didn't arrive at the conclusion that this user is attempting to backdoor other Monero users by simply misreading their potential incompetence as nefarious.

Off the bat, some of the problematic statements made in the repo about the wallet's construction:

  1. "Embedded wallet birthday to optimize restoring from the seed" (this makes the wallet markedly less secure than it needs to be by including unnecessary information that leaks details which should otherwise have been left private)
  2. "[A]dvanced checksum based on Reed-Solomon linear code" <-- this is another baffling break from convention, and it's inferior to the commonly used CRC32 [the current modern standard]
  3. "Some file formats, particularly archive formats, include a checksum (most often CRC32) to detect corruption and truncation and can employ redundancy or parity files to recover portions of corrupted data."
  4. Not to mention that CRC32 is typically used in "digital networks and storage devices to detect accidental changes to raw data"; accidental corruption is the most likely failure mode for users in this context [an attacker gains nothing by corrupting the underlying data]
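
For readers who want to see what "detecting accidental changes" looks like in practice, here is a minimal sketch using Python's standard `zlib.crc32`. The payload bytes and the `crc32_hex` helper are made up for illustration; nothing here is taken from the monero-seed repo.

```python
import zlib

def crc32_hex(data: bytes) -> str:
    """Return the CRC32 of `data` as 8 lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")

# Hypothetical payload, used only to illustrate the check.
original = b"example seed phrase payload"

# Flip a single bit to simulate accidental corruption.
corrupted = bytearray(original)
corrupted[3] ^= 0x01
corrupted = bytes(corrupted)

print(crc32_hex(original))
print(crc32_hex(corrupted))
# CRC32 detects every single-bit error, so the two values differ.
print(crc32_hex(original) != crc32_hex(corrupted))
```

Note that this only shows detection; a CRC by itself does not tell you which bit was damaged.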

Unexplained Departure From BIP32/39 Convention

The URL for the BIP39 specification is pasted here for convenience and reference as you continue to follow along.

bitcoin/bips
Bitcoin Improvement Proposals.

The link above has the original specification for BIP39 (mnemonics), which is the construction that this individual claims to be adhering to; evidence of this claim from the user's repo is below:

Incorrect Mnemonic Word Count Selection

I'm not sure if this individual thought the mnemonic phrase word count was an arbitrary choice, but it isn't.

Entropy is used to generate a binary string of 128 bits (the length depends on the user's specific implementation). After appending the checksum (4 additional bits), the words are generated from the binary string, with strict mappings to the BIP39 dictionary.

This is mandated by the BIP39 specification (as seen below):

The Monero wallet developer here uses 14 mnemonic words (which deviates entirely from convention).

They also state that the phrase contains "154 bits of data", which are reserved for "future use", "wallet birthday", "128-bits for the private key seed", and "11 bits for checksum".
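
To make the developer's stated bit budget concrete, here is a quick arithmetic sketch. It assumes 11 bits per word (154 bits over 14 words) and the "5 reserved/feature bits" figure quoted from the README later in this article; the width of the wallet birthday field is not quoted anywhere here, so it is inferred as whatever remains.

```python
# Bit budget of the 14-word phrase, as described by the developer.
BITS_PER_WORD = 11          # assumption: 154 bits / 14 words
WORDS = 14
total_bits = WORDS * BITS_PER_WORD

seed_bits = 128             # "128-bits for the private key seed"
checksum_bits = 11          # "11 bits for checksum"
reserved_bits = 5           # "5 reserved/feature bits" (quoted from the README)

# Inferred, not quoted: whatever is left must be the wallet birthday.
birthday_bits = total_bits - seed_bits - checksum_bits - reserved_bits

print(total_bits)     # 154
print(birthday_bits)  # 10
```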

This is extraordinarily incorrect. Normally a BIP32/39 key is derived by:

  1. Generating 128 bits of entropy (for example)
  2. Hashing the entire 128 bits of entropy (BIP39 specifies SHA-256 for this), then extracting the first 4 bits of the hash and appending them to the end of the 128 bits of entropy (that's the checksum; this person didn't even specify how they were going to derive the checksum)
  3. Those 132 bits are supposed to be divided into 12 groups of 11 bits each (resulting in 12 different "words"). BIP39 adds 1 checksum bit for every 32 bits of entropy; there are 4 groups of 32, which adds back up to 128 (can't forget the additional 4 bits for the checksum).
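
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the BIP39 encoding step only: it returns the 11-bit word indices rather than the words themselves, since the 2048-word dictionary is not embedded here. The function name `bip39_word_indices` is my own.

```python
import hashlib
from typing import List

def bip39_word_indices(entropy: bytes) -> List[int]:
    """Map entropy to BIP39 word indices (0..2047), checksum included."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP39 entropy must be 128-256 bits, in steps of 32")

    # One checksum bit per 32 bits of entropy (4 bits for 128-bit entropy).
    cs_bits = ent_bits // 32
    digest = hashlib.sha256(entropy).digest()
    checksum = digest[0] >> (8 - cs_bits)   # first cs_bits bits of the hash

    # Append the checksum, then cut the result into 11-bit groups.
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    return [
        (combined >> (11 * (n_words - 1 - i))) & 0x7FF
        for i in range(n_words)
    ]

# 128 bits of all-zero entropy -> 12 indices
print(bip39_word_indices(bytes(16)))
```

Each index would then be looked up in the BIP39 dictionary to get the actual word.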

Explanations By Developer For Their Decisions Make Zero Sense

Not only has this individual deviated significantly from the standard (which I had qualms with as is, but that's beside the point), the user also fails to justify the thinking behind any of their arbitrary changes (which all make the address creation process inherently less secure, somehow).

Under the "reserved bits" section on their GitHub 'Readme', they make statements that diverge from any semblance of rational programming or cryptographic logic.

Without any greater explanation, the wallet developer insists that they are able to set aside some of the bits derived from entropy as "reserved" for some other purpose (what the fuck does that even mean?).

What this individual does not appear to realize is that this entropy is part of the normal BIP39 construction for generating the binary string needed to produce the corresponding mnemonic phrase in UTF-8 NFKD format, which is ultimately piped into the PBKDF2-HMAC-SHA512 construction to derive the wallet's key.

Thus, in this context, it's nonsensical to suggest that one is "reserving bits".

Either bits are being used or they aren't.

Statements Made About the 'Bits' Reflect a Fundamental Failure to Understand EDDSA / ECDSA as Well

The subheading for this section may seem a bit harsh, but it's more likely an understatement, if anything.

To provide another example, one claim / goal stated by the developer is to use the "reserved bits" to later implement a "[F]lag to differentiate between normal and 'short' address format".

This statement is where things get a little embarrassing, since Monero already has a construction designed to do just that.

The public Monero address is a concatenation of the public spend key + public view key. They're both derived as two different valid points on the Edwards' Curve.

The publication, 'Zero to Monero' corroborates this as well:

There are two different sets of coordinates used because ed25519 keys can be used for both encryption and signing (unlike ecdsa / secp256k1). Monero takes advantage of this fact by generating two sets of keys for users (by specification) when they create their addresses.

One set for people to encrypt to (public view key); the other, a public spend key that others can use to derive a subaddress based on what they know (the mission-critical nature of these addresses can't be overstated, since this all ties into the construction that Monero uses to protect against double spend attempts).

So I'm puzzled as to why this user proposed having the "view key equal to the spend key".

Not only does this idea make zero sense, it also would make Monero less private by several orders of magnitude.

In fact, at a glance, this would erode nearly all of the privacy assurances that the project has left.

Getting to the Part Where it Appears This User is Trying to Create a Backdoor

To reiterate my sentiment from the beginning of this article - all of the qualms / issues / critiques provided above have nothing to do with my assessment of this code as a potential backdoor on users.

So this is far from a smear campaign (to be entirely honest...because this code is bad enough to take it that direction).

The Devil is in the Details

If one scrolls down to the bottom of this user's GitHub README for the 'monero-seed' repo, they'll find a subheading titled, 'Private key seed'.

This is the first spot where I observed that this user had purposefully reduced the security of the key derivation function (for no apparent reason).

See below:

Specifically it states:

"The private key is derived from the 128-bit seed using PBKDF2-HMAC-SHA256 using 4096 iterations. The wallet birthday and the 5 reserved/feature bits are used as a salt. 128-bit seed provides the same level of security as the elliptic curve used by Monero"

Ah! No! All of this is wrong & almost maliciously so.

Let's start from the top though.
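
As a starting point, here is a minimal sketch of the derivation as the README quote describes it: PBKDF2-HMAC-SHA256 with 4096 iterations over the 128-bit seed. The quote does not say how the wallet birthday and the 5 reserved/feature bits are packed into the salt, nor how long the output is, so the `salt` bytes and the 32-byte output length below are placeholders of my own and not the repo's actual encoding.

```python
import hashlib

def derive_key_sketch(seed: bytes, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256, 4096 iterations, as stated in the README quote.

    The salt layout and the 32-byte output length are assumptions made
    for illustration only.
    """
    if len(seed) != 16:
        raise ValueError("expected a 128-bit (16-byte) seed")
    return hashlib.pbkdf2_hmac("sha256", seed, salt, 4096, dklen=32)

# Hypothetical inputs: an all-zero seed and a made-up 2-byte salt.
example_seed = bytes(16)
example_salt = b"\x00\x01"
print(derive_key_sketch(example_seed, example_salt).hex())
```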

This User Weakened the Key Stretching Function Specified in BIP39

There is no conceivable reason for them to take this action (and this is actually something that may uniquely compromise users due to a quirk in PBKDF2 and HMAC).

But first, before we get to that, let's establish that this individual did indeed weaken the strength of this key stretching function (PBKDF2-HMAC construction). To be clear, by "purposefully", I mean that this individual went out of their way to specifically (and exclusively) tweak the algorithm in a manner that reduces its security exponentially - and by 'exponentially', I mean that term in a literal mathematical, cryptographic sense.

Below is the specification from BIP39 again (Bitcoin):

Notice something?

In the BIP39 specification, the key stretching function is the HMAC-SHA-512, NOT the HMAC-SHA-256.

Also, keep in mind that this is the reference implementation. So the chances that this individual stumbled across a library using the SHA256 hash function in lieu of SHA512 is... slim to none.
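
For comparison, the BIP39 seed derivation can be sketched as follows. Per the specification, the NFKD-normalized mnemonic is the password, the string "mnemonic" plus an optional passphrase is the salt, and PBKDF2-HMAC-SHA512 runs for 2048 iterations to produce a 64-byte seed. The function name `bip39_seed` is my own.

```python
import hashlib
import unicodedata

def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte BIP39 seed from a mnemonic sentence."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)

# The all-zero-entropy mnemonic from the BIP39 reference test vectors.
phrase = " ".join(["abandon"] * 11 + ["about"])
print(len(bip39_seed(phrase)))   # 64
```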

Going further, if we look at the key length / bit / strength information provided by a matrix table from Wikipedia (these values were cross-referenced with the NIST specifications; this can be done independently by anyone reading along as well):

Please keep in mind that security in the cryptographic sense is measured as 2^n, where n is the algorithm's listed strength in bits (hence the security against collision attacks).

Therefore, the difference between 256-bit strength and 128-bit strength is a hell of a lot more than 2x; one is 2^128 times the other, which works out to roughly 3.4 x 10^38 (notation reference = 2^128 vs. 2^256).
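
The size of that gap can be checked directly with Python's arbitrary-precision integers. This is just arithmetic on the two exponents and says nothing about any particular algorithm.

```python
# Work factor implied by n bits of strength is 2**n.
work_128 = 2 ** 128
work_256 = 2 ** 256

# How many times larger is the 256-bit work factor?
ratio = work_256 // work_128

print(ratio == 2 ** 128)   # True
print(f"{ratio:.3e}")      # on the order of 3.4e+38
```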

Weird Nuance in the HMAC Convention Would Cause Collisions With Their Proposal

Since they elected to go with PBKDF2 - HMAC256 (vs. 512 variant), we need to take care to consider the length of the input being piped into this hash function.

Specifically, according to RFC2104:

"Keys longer than B bytes are first hashed using H"  [source = https://tools.ietf.org/html/rfc2104]

This notably creates a pseudo-collision: a key longer than the hash's block size and the SHA-256 hash of that same key produce the same HMAC output (in essence), which nulls the purpose of the HMAC for such inputs in the first place.

This is detailed in this post here: https://mathiasbynens.be/notes/pbkdf2-hmac
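
The behaviour described in that post can be reproduced with Python's standard `hmac` and `hashlib` modules. The keys, message, and salt below are arbitrary test values of my own; the only thing that matters is that `long_key` is longer than SHA-256's 64-byte block size (the "B" in the RFC).

```python
import hashlib
import hmac

BLOCK_SIZE = 64                      # SHA-256 block size in bytes
long_key = b"k" * (BLOCK_SIZE + 1)   # one byte over the block size
hashed_key = hashlib.sha256(long_key).digest()
msg = b"message"
salt = b"salt"

# HMAC: the long key and its SHA-256 digest give the same tag.
mac_long = hmac.new(long_key, msg, hashlib.sha256).digest()
mac_hashed = hmac.new(hashed_key, msg, hashlib.sha256).digest()
print(mac_long == mac_hashed)        # True

# PBKDF2 inherits this, since the password is used as the HMAC key.
dk_long = hashlib.pbkdf2_hmac("sha256", long_key, salt, 4096)
dk_hashed = hashlib.pbkdf2_hmac("sha256", hashed_key, salt, 4096)
print(dk_long == dk_hashed)          # True

# Keys at or below the block size are not pre-hashed.
short_key = b"k" * 16
mac_short = hmac.new(short_key, msg, hashlib.sha256).digest()
mac_short_hashed = hmac.new(
    hashlib.sha256(short_key).digest(), msg, hashlib.sha256
).digest()
print(mac_short == mac_short_hashed)  # False
```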

If you got a chance to play with the embedded Repl code from above, then you probably saw something pretty crazy in live time!

Which validates our concern about the modification of the HMAC (especially since, once again, there was no rhyme or reason given for this decision).

It Appears That This User Neutered the HMAC Portion Entirely

This is a real problem at this point. And the omission of HMAC in this scheme the user is designing cannot be chalked up to ignorance or "not knowing".

After reviewing the code, however, we can see that it does contain references to the HMAC, specifying that PBKDF2-HMAC-SHA256 is used here (and not SHA-512 once again, per the actual specifications of BIP39, which this developer claimed to be building code to adhere to).

Again, as shown above in the prior section, this user's reference to BIP39 shows that they have had exposure to the specification.

So there's no conceivable reason for removing the HMAC (key stretching) function from this scheme they're crafting.

Individual's Claim About the Strength of the Entropy vs. Strength of Curve Are 100% False

Under the same section as the dubious private key entry, the user states:

"128-bit seed provides the same level of security as the elliptic curve used by Monero."
  1. Monero doesn't use elliptic curves
  2. No it fucking doesn't
  3. Entropy is not "security", nor is it ever factored into the bit-strength of the "elliptic curve" or the 'Edwards Curve' in this case.
  4. These curves are geometric functions that depend on the assumed hardness of the discrete logarithm problem as their security assurance.

This is well known information...

Cardinal Sin: Downgrading of Argon2 to PBKDF2

This one is inexcusable in any universe.

The commit was made on June 14th, 2020:

https://github.com/tevador/monero-seed/commit/f1c7829f043322849c0323f58aa0a46352de4e02

Visiting the commit directly, we can see this individual adding PBKDF2 to the project while simultaneously removing Argon2.

Never mind the fact that this is oddly coded in 'C', which is a curious language choice for a simple library (there are many much lighter-weight and easier ways to implement this).

One particular qualm about 'C' is that it's not memory safe, which means that we aren't afforded protection from the program overflowing (which is more than realistic and, perhaps, plausible, considering the input block length).

Explaining Why the Argon2 Swap Was Ludicrous

If you were to ask any security professional on planet earth for their opinion on the best hash algorithm (for ensuring your information remains behind an impenetrable fortress), the answer would most likely be Argon2.

For those familiar with mining, you may remember that hash algorithm - after all, it's used in Monero.

Password Hashing Competition

Recently, there was an international competition that sought to find the latest and greatest in the world as it pertains to KDFs (key-derivation functions).

While said competition may seem a bit preposterous, it did indeed exist. And it was hosted, followed and adjudicated

source: https://www.password-hashing.net/

No Conceivable Reason For the Replacement of Argon2

Argon2 is exponentially stronger than PBKDF2. And, in this situation, we want the strongest password hashing algorithm possible.
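
To give a feel for the difference in cost model, here is a rough sketch. Python's standard library does not ship Argon2, so this swaps in `hashlib.scrypt` as a stand-in for a memory-hard KDF; it is not Argon2 and is only meant to show that a memory-hard function forces each guess to use a configurable amount of RAM, whereas PBKDF2 only scales the iteration count. The password, salt, and scrypt parameters are arbitrary example values, and `hashlib.scrypt` assumes a Python build linked against a recent OpenSSL.

```python
import hashlib

password = b"correct horse battery staple"   # hypothetical input
salt = b"example-salt"                        # hypothetical salt

# PBKDF2: cost is controlled by the iteration count only.
pbkdf2_key = hashlib.pbkdf2_hmac("sha256", password, salt, 4096, dklen=32)

# scrypt (stand-in for a memory-hard KDF such as Argon2):
# each evaluation needs roughly 128 * N * R bytes of memory.
N, R, P = 2 ** 14, 8, 1
scrypt_key = hashlib.scrypt(
    password, salt=salt, n=N, r=R, p=P,
    maxmem=64 * 1024 * 1024, dklen=32,
)
scrypt_memory_bytes = 128 * N * R

print(scrypt_memory_bytes // (1024 * 1024), "MiB per guess")
```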

In fact, one could argue that there may be no time better than now to pull out all stops and deploy the best cryptographic methods commercially available to ensure the safety of one's funds.

And it is at this juncture, that the developer decides to strip out the Argon2 construction in exchange for PBKDF2. This decision is baffling, to say the least.

Multiple experts in the field of cryptography and elsewhere have vouched for Argon2 as being the strongest password-hashing algorithm out there, including JP Aumasson, a world-renowned cryptographer who recently audited Monero's latest ring signature upgrade (CLSAG).

Example of How to Derive a Monero Address From an Argon2 KDF

There's live (open source) code on the internet that can be audited and/or compiled to test the veracity of this construction.

Any reader visiting the link will be taken to a cool module built from the WarpWallet principle utilized by Keybase.

Here's the URL = https://patcito.github.io/mindwallet

Address generation process is deterministic (just like ed25519), so that's good for this scheme. The entire code runs client side and is available for users to download at their leisure, with Golang / Python as options if either is your preferred language to interact with.

Reed-Solomon Code Weakens the Mnemonic Selection

See below:

This should be wholly impossible if the key is constructed properly.

Ian Coleman's Library

Ian is one of the most prolific PoC composers this space will ever know.

And lucky for us - he has one for BIP39 as well, which should allow us to get a general gist of the security of this assumption

Below is a screen of the site:

source: https://iancoleman.io/bip39/

Notably, Ian's specifications for the mnemonic word options mirror the actual implementation too:

If we utilize the 'BIP39 Split Mnemonic' feature, that's when we'll see a concise estimate of how long it would take to crack a wallet where only some of the split parts ("cards") have been exposed.

cryptomedication

Happy to serve and help wherever I'm needed in the blockchain space. #Education #EthicalContent #BringingLibretotheForefront
