Entropy of passphrases that contain pwned passwords/phrases

I understand that you did not mean this literally, but just as an example of what a 5-word diceware phrase might look like. However, to avoid misunderstanding, it should be pointed out that technically, this particular passphrase only has 13 bits of entropy, since “horse correct battery staple” is a pwned password and a common trope. A “real” example of a diceware passphrase with 65 bits of entropy would be something like “accurate cocoa heroics impulsive last”.
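For anyone who wants to check the 13-bit and 65-bit figures, here is a minimal sketch. It assumes the EFF large word list (7776 entries) and treats the well-known four-word phrase as a fixed choice with exactly one option; the function name `process_entropy_bits` is just something I made up for illustration.

```python
import math

EFF_LIST_SIZE = 7776  # assumed: EFF large word list, 6**5 dice outcomes

def process_entropy_bits(choices_per_step):
    """Sum log2(number of equally likely options) over independent steps."""
    return sum(math.log2(n) for n in choices_per_step)

# Five independent random draws from the word list.
five_random_words = process_entropy_bits([EFF_LIST_SIZE] * 5)

# A fixed, publicly known phrase (one option) followed by one random word.
known_phrase_plus_word = process_entropy_bits([1, EFF_LIST_SIZE])

print(f"{five_random_words:.1f} bits")       # roughly 65
print(f"{known_phrase_plus_word:.1f} bits")  # roughly 13
```

The fixed phrase contributes log2(1) = 0 bits, which is why only the last word counts in the second case.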

To me, the more relevant “technical” observation is that all the samples on this page are bad choices, including OP’s, yours, mine, and XKCD’s, because they have now all been publicly mentioned.

To keep the story self-contained, I was reflecting back to the referenced xkcd, and showing how their example horse correct battery staple fits into the table, with a slight onion adjustment to accommodate a 64-bit table.

But you are correct that since horse correct battery staple is in the Have I Been Pwned database, its effective (as opposed to mathematical) entropy caps out at a lesser strength. But that cap is not zero, because there is no expectation that horse... would be tried first. I suspect a better estimate for the cap would be 30 bits, because horse... occurs 11 times in HIBP’s database of 15 billion pwns (log2(15 billion / 11) ≈ 30).
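The 30-bit figure can be reproduced with a couple of lines. This is only a rough sketch of the estimate described above: the corpus size and occurrence count are the approximate numbers quoted in this post, and the model (the chance that a randomly drawn HIBP record equals the phrase) is an assumption, not an established way of rating passwords.

```python
import math

hibp_total = 15e9  # approximate corpus size quoted above
occurrences = 11   # times the phrase appears, as quoted above

# Bits corresponding to the odds of hitting the phrase with one random draw.
cap_bits = math.log2(hibp_total / occurrences)

print(f"{cap_bits:.1f} bits")  # a little over 30
```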

As an aside, if one adopts the philosophy of checking substrings against HIBP, every passphrase, including accurate cocoa heroics impulsive last would cap out at zero (or 30) bits because all of the individual diceware words seem to have been pwned. To my knowledge, this philosophy has not been adopted.

:thinking: Hmmm… if you are concerned about order, I would say that the alternatives are horse correct battery staple onion and onion horse correct battery staple, so there would be an additional 1 bit of entropy if that choice was made randomly.

The only viable method of estimating password entropy is to base calculations on a description of the process used to generate the password. Estimating entropy based on analysis of a single password exemplar generated from that process will never yield a valid result.

In my comment, I had made the assumption that your process was to use the XKCD example with one added passphrase word, which seems to have been a valid assumption, based on your description:

If a random password is generated by selecting 28 printable ASCII characters at random, the entropy would be estimated as 184 bits, even if the generator produces the 28-character string horse correct battery staple… It should be noted that the probability of this happening is vanishingly low (4×10⁻⁵⁶).
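Both numbers in that paragraph follow from the size of the character set. A minimal sketch, assuming “printable ASCII” means the 95 characters from space through tilde and that each position is drawn uniformly and independently:

```python
import math

PRINTABLE_ASCII = 95  # assumed: codes 32..126, space included
phrase = "horse correct battery staple"
length = len(phrase)  # 28 characters

# Entropy of the generation process, in bits.
entropy_bits = length * math.log2(PRINTABLE_ASCII)

# Probability that one run of the generator emits this exact string.
p_specific = PRINTABLE_ASCII ** -length

print(f"{entropy_bits:.0f} bits")  # about 184
print(f"{p_specific:.1e}")         # on the order of 4e-56
```

The entropy is the same whatever string comes out, because it describes the generator and not the particular output.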

That is not what I said. A bit more context (with added emphasis) makes it clear that each of the samples is an example of what the particular methodology created or could have created:

Do note that Randall’s own estimate is that horse correct battery staple has 44 bits of entropy. Absent evidence discrediting Randall’s calculation, I do not feel it fair to conclude that horse ... onion has 0+13 bits, although I would accept that 44+13 is not “close enough” to 64 to be considered similar.

Help me understand how one determines that horse correct battery staple contributes zero bits of entropy because it is pwned (11 times), but accurate cocoa heroics impulsive last enjoys 13 bits of entropy contributed by accurate despite it having been pwned 3,500 times.

Derating based on the pwnage of substrings is something I have never heard of. Nor do I believe it scales. The letter “a” has been pwned 750k+ times, yet we would not severely derate every password/phrase that contains an “a”. And since all other letters appear to be pwned, things rapidly disintegrate as every possible substring is considered.

Derating based on the entire password being pwned is something I could get behind; derating based on a substring seems nonsensical.

When I wrote the quoted statement, I was not referring to your original comment, but rather to your response above.

Again, such analysis would not be valid, because it would be based on isolated password exemplars (not on analysis of the generation process).

I have not proposed this.

As I tried to explain above, my statement about horse correct battery staple onion having 13 bits of entropy was based on an analysis of the process by which you came up with that phrase — it was not strictly based on examination of the passphrase components.

Now, to be fair, you did not disclose your method until after I had done my analysis, so I confess that I did guess the process that was used, based on context clues (the fact that you had referenced the XKCD comic, and the vanishingly low probability that anybody would be able to randomly generate the exact phrase “horse correct battery staple”). So I would agree that my method for deducing the generation process was not 100% rigorous, and that this would be a valid basis on which to question my 0-bit valuation.

However, the 0-bit estimate for “horse correct battery staple” was subsequently validated, after you disclosed the process and confirmed that it was not a random choice:

In an alternative universe, if you had been collaborating with Randall on the XKCD strip about passphrases, and he had proposed: “Let me come up with a four-word passphrase, and you come up with an extra random word to add to the end” — and if the published comic strip used the example horse correct battery staple onion as a result of this collaboration — in that case, I would agree that the entropy of the 5-word phrase generation process would be around 55 bits* (or 57 bits, if you used diceware for your part).

*One can also legitimately criticize Randall’s 44-bit estimate, but that may be off-topic even for this off-topic side-discussion…

This conversation feels like it is rapidly diving into the pedantic.

Password entropy does not change based on how many times the password is cited, or by whom. It depends only on how it was generated. Since Randall claims 44 bits, that is what it has and will forever have.

Whether the password remains “good” and should be used is a completely different question that does not change the math.

I feel that any disagreement in this conversation is based only on miscommunication (which is one reason why I am attempting to be precise in my wording; perhaps this is what you are perceiving as pedantry).

I think we are both saying the same thing.

And because entropy only depends on how a password/passphrase was generated, it is (in my opinion) important to avoid referring to the entropy of this-or-that passphrase/password, and instead refer to the entropy of the process.

For example, the entropy of the word onion is undefined, and without context, the entropy of the process used to produce the word onion is unknown. But given the information that the process was to use a CSPRNG to select one word from the EFF word list, then the entropy (of the process) can be determined to be 13 bits.

Similarly, there is no way of assigning any entropy to the passphrase horse correct battery staple.
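To make the “entropy of the process” idea concrete, here is a small sketch. It is not any particular generator's implementation: `generate_passphrase`, `process_entropy_bits`, and the eight-word `demo_words` list are hypothetical stand-ins (the real EFF large list has 7776 entries), and Python's `secrets` module is used as the CSPRNG.

```python
import math
import secrets

def generate_passphrase(wordlist, n_words):
    """Draw n_words uniformly and independently using the OS CSPRNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

def process_entropy_bits(wordlist, n_words):
    """Entropy of the process; assumes the list has no duplicate entries."""
    return n_words * math.log2(len(wordlist))

# Hypothetical toy list for illustration only.
demo_words = ["accurate", "cocoa", "heroics", "impulsive",
              "last", "onion", "battery", "staple"]

phrase = generate_passphrase(demo_words, 1)
bits = process_entropy_bits(demo_words, 1)  # log2(8) = 3 bits for the toy list

print(phrase, bits)
```

Note that `process_entropy_bits` never looks at `phrase`. The same output word could come from a 3-bit process (this toy list) or a 13-bit one (the full EFF list), which is why a single exemplar tells you nothing on its own.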

Apologies if I have not expressed myself clearly.
