Discussion of Passphrase Entropy and Entropy Estimation Tools

And if the answer is “yes”, then you are arguing that all else being equal (specifically — the entropy being equal) the passphrase that is easier to memorize will be more secure (because there is less risk that the user will engage in insecure practices such as writing down their master password, or choosing a password that is too short).

I actually tried to describe something else. First, look at my second paragraph of the previous response which is the gist.

The point is that the word “secure” can have multiple interpretations. If I give you this 25 character password: “thei3Ul2yaepiechie4ETashe” which was just created with pwgen, then I can rest assured that you are going to either change it to something easier, or write it down somewhere visible, or just find a way not to use it. And if I give you this: “([BXF|379+yLE;m393#9U,RB”, then, oh well, you won’t even try to type it on a cell phone. Both are very “secure” but they won’t be used as a passphrase, so their security is irrelevant to a discussion about passphrases.

So while you can argue that one secret is more secure than the other, you also need to consider which one ends up protecting the vault better. And to that the answer may be that it’s a “less secure” random series of words instead of a randomized string. That’s because you need to account for the user too.

Because of that, we need to distinguish between “a secure secret” and “a secure passphrase”, because a secret (like a keyfile or a private key) doesn’t need to be remembered by a human, while a passphrase does.

OK, maybe we have a language barrier or other misunderstanding based on semantics.

In my posts (and in my interpretation of your posts, until now), the term passphrase always refers to an ordered set of real words (for example: Flyover Perfectly Conclude Parakeet Frame). Neither of the examples you gave above are passphrases.

Although random character strings are often called passwords (to distinguish them from passphrases), the term “password” is ambiguous because it is also an umbrella term to refer to any shared secret used by humans to authenticate themselves (e.g., the Master Password may be either a string of random characters, or a passphrase). In my posts in this particular thread, my use of the term password is always in the general sense of a shared secret (specifically, the Master Password).

So in retrospect, it seems that you have been using the term “passphrase” as synonymous with “password” (in the sense of a string of characters that do not form any words); this is non-standard usage of the term, which I would discourage to prevent confusion. Perhaps you may also wish to re-read some of the discussion in this thread keeping in mind the intended meaning of the words used.

To your point above, of course it is well-known that passwords consisting of long strings of random characters are extremely difficult to remember and to type. This is the whole reason why passphrases consisting of a small number of recognizable words are recommended whenever a shared secret has to be memorized or typed (e.g., the Master Password for your password manager vault).

This thread discusses the strength of passphrases. Character-string passwords are outside the scope of this discussion.


it seems that you have been using the term “passphrase” as synonymous with “password”

Indeed, that is correct. I was using (and interpreting) the word “passphrase” as any secret that a human must remember. So, yes, please ignore what I wrote above.


Hello all,

So what I’ve gathered from reading this is that there are two broad ways that attacks happen: brute force, and “smart” attacks. The latter often rely on a pattern that the user has made in their password, such as using words from the top X words in the English language, a keyboard walk, etc. However, if I create a 4 word passphrase and then change out a letter for a number, and throw a special character in, even if I re-generate the passphrase a few times, this still seems to be impossibly hard to predict. Is this correct? It can’t (reasonably) be brute forced since the entropy is too high, and for dictionary attacks, it would be pointless without going through every word as it could be spelled with any variation of numbers for letters AND adding a special character throughout the string.

I guess my logic is that if I let a passphrase generator handle initializing my passphrase, even if I reroll the generator for words that I like, a “pure” dictionary attack is pointless, and by switching and adding a character, this forces the attack back to a brute force method. The one edge case is an attacker who is trying to do a dictionary attack, switch out letters for numbers, and insert special characters throughout. That still sounds like it wouldn’t be much better off than a brute force attack.

If this works, it leaves me with an extremely easy to memorize password that I need only remember a small variation of what was given to me. I just feel like this has to be missing something as I’ve never seen this recommended, instead I see bumping up to a 5-7 word passphrase. It’s not that this is harder to remember, but it can become tiresome to type out.

Thanks anyone who can poke a hole in my logic!

Let’s say you have words of 8 letters on average. For each 8 letter word you’ll change one of the 8 letters, hopefully choosing which one randomly, so that’s 3 extra bits of entropy. Changing the letter to another one randomly (lower+upper+numbers = 62 combinations) is ~ 6 more bits of entropy. Randomly adding a symbol from a table of 16 symbols will give you 4 bits of entropy. Positioning that to one of the 9 positions is ~3 more bits of entropy. In total you get 3 + 6 + 4 + 3 = 16 additional bits of entropy per word, or 64 extra bits of entropy for the 4-word passphrase, give or take.

If you selected your words randomly from a dictionary of 100K words, then that’s ~16.6 bits of entropy per word or ~66 bits of entropy for the passphrase. So your transformation nearly doubles the entropy from 66 bits to 130+ bits. If you used a dictionary of 4000 words then that drops to 112 bits. Anything above 128 bits is quite good.
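The arithmetic in the two paragraphs above can be reproduced with a short Python sketch. It assumes, as the post does, that every choice (letter position, replacement character, symbol, insertion position, and each word) is made uniformly at random; the helper names `bits` and `passphrase_bits` are made up for illustration.

```python
import math

def bits(choices: int) -> float:
    """Entropy in bits of one uniformly random choice among `choices` options."""
    return math.log2(choices)

# Extra entropy per 8-letter word, assuming every choice is uniformly random.
per_word_extra = (
    bits(8)     # which of the 8 letters gets replaced
    + bits(62)  # replacement drawn from lower + upper + digits
    + bits(16)  # symbol drawn from a table of 16 symbols
    + bits(9)   # insertion position: 9 slots around an 8-letter word
)

def passphrase_bits(dictionary_size: int, words: int = 4) -> float:
    """Entropy of `words` words drawn uniformly from a dictionary of the given size."""
    return words * bits(dictionary_size)

for size in (100_000, 4_000):
    base = passphrase_bits(size)
    total = base + 4 * per_word_extra
    print(f"{size:>7} words: base {base:.1f} bits, with modifications {total:.1f} bits")
```

These figures only hold if the modifications really are random; anything hand-picked contributes less.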

So this password: “voPatil%e bo_ndWd 9ppea]sing attriru(te” is about as strong as this: “eir2uShoo1Uo7si8ia1equ”

p.s. I’m not actually sure about the size of the dictionary used to generate the underlying passphrase. I assumed 100K words but it looks much smaller than that.


If you are referring to Bitwarden’s passphrase generator, it uses EFF’s word list, which has 7776 words (i.e., 13 bits of entropy per word).

However, @Justletmewarden mentions re-rolling the passphrase “a few times” until it contains preferred words. This will reduce entropy to a value smaller than 4×13 bits (only random decisions create entropy). For example, if there is a 50% chance that a user will re-roll around 6 times before finding a suitable passphrase, this implies that they are rejecting 90% of the possible passphrases (because 0.9^6 ≈ 0.5), which corresponds to an entropy reduction of about 3-4 bits. So our starting point is about 48 bits of entropy.
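As a rough check of the re-roll estimate, here is a minimal Python sketch. The 90% rejection rate and the 6 re-rolls are the illustrative figures from the paragraph above, not measurements, and the variable names are made up for this example.

```python
import math

EFF_WORDS = 7776   # size of the EFF word list mentioned above
WORDS = 4

# Entropy of a 4-word passphrase accepted on the first roll.
base_bits = WORDS * math.log2(EFF_WORDS)

# Illustrative user model: each generated passphrase is rejected with probability 0.9.
reject_probability = 0.9
rerolls = 6

# Probability that six candidates in a row are all rejected (roughly one half).
p_six_rejections = reject_probability ** rerolls

# Only the accepted fraction of passphrases can ever be chosen,
# so the effective pool shrinks by that factor.
accepted_fraction = 1 - reject_probability
bits_lost = -math.log2(accepted_fraction)

print(f"base entropy:               {base_bits:.1f} bits")
print(f"P(six rejections in a row): {p_six_rejections:.2f}")
print(f"entropy lost to re-rolling: {bits_lost:.1f} bits")
print(f"remaining entropy:          {base_bits - bits_lost:.1f} bits")
```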

If these decisions are random (e.g., by randomly selecting a letter, and randomly selecting a digit to replace it with) then you add entropy. The EFF word list has an average word length of 7.0 characters, so your 4-word passphrase would contain 28 letters on average. If you substitute one randomly selected letter with a randomly selected digit in the range 0-9, you gain 8 bits of entropy. However, if you are just using “l33t” substitution rules (e→3, s→5, i→1, etc.), then the amount of added entropy is very small.

Similarly for the special character: if you randomly select where to insert a special character, there would be 29-32 possibilities (depending on whether the words are separated or not), and if you randomly select a special character from the 32 available in ASCII, a single insertion could net up to 10 bits of entropy. However, if you hand-pick a special character, there is a much smaller set that you are likely to consider (let’s say 8), and if you don’t place that character randomly, but instead just use it as a word separator, then you may only get 3 bits of entropy.
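To make the difference between random and hand-picked modifications concrete, the following Python sketch recomputes the numbers quoted above. The constants come from the post; the variable names are only for illustration.

```python
import math

AVG_LETTERS = 28      # 4 words x 7.0 letters on average (EFF list)
DIGITS = 10           # replacement digit chosen from 0-9
ASCII_SPECIALS = 32   # special characters available in ASCII

# One randomly chosen letter replaced by one randomly chosen digit.
random_substitution = math.log2(AVG_LETTERS * DIGITS)

# One random special character inserted at a random position:
# 29 slots if the words run together, 32 if they are separated.
random_insertion_min = math.log2(29 * ASCII_SPECIALS)
random_insertion_max = math.log2(32 * ASCII_SPECIALS)

# Hand-picked character from about 8 likely candidates, placed at a fixed
# position (e.g. as the word separator), so the position adds nothing.
hand_picked = math.log2(8)

print(f"random digit substitution: {random_substitution:.1f} bits")
print(f"random special insertion:  {random_insertion_min:.1f} to {random_insertion_max:.1f} bits")
print(f"hand-picked separator:     {hand_picked:.1f} bits")
```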

So, with a reduced word list, predictable l33t substitutions, and a hand-picked separator character, your total password entropy may be as low as 51 bits. Note that this is less than what you would get if you just accepted the first generated passphrase without cherry-picking, and then did not make any alterations (numbers, special characters, etc.).

V13, thank you for assuming that I am doing it the most proper way, but Grb is right to assume my shortcomings haha.

Grb, thank you for that great response addressing all of that! Near the end you state “May be as low as 51 bits”. I see some parts that I would like to clarify to maybe solidify that number, as well as ask what you think a decent “minimum” should be. As you say at the top, 65-90 should be plenty for most people. I also apologize for not making my parameters more defined, making you have to answer under both assumptions!

So I’m using the passphrase generator from bitwarden to get phrases like shakily-monitor-nutrient-hurled, then applying the changes I mention. So continuing the example, 5hakily-mon!itor-nutrient-hurled. This would actually mean that I have the word separated special characters as well as the extra special character inserted randomly. This makes for a total of five special characters throughout. You were right to assume l33t transformations though.

So I think the only thing that might be different from what your assumptions acknowledge is that single extra special character. It’s hand picked, and not placed randomly, so I’ll go with the 3 bits you suggest, bringing it to 54, if I’m understanding correctly. The main point of this is that I’d like to have a way to keep something at around 4 words, like what I have to type (as weird as that may sound, some things are just tongue twisters on a keyboard), and keep it as secure as a 5-7 word passphrase. It looks like I am definitely falling short of that, so I’m going to consider going to a 5 word passphrase. Although, like I mentioned earlier, I’d like to know your thoughts on what a minimum would be.

Moving posts causes confusion and makes it very difficult to follow a thread. :angry:

@Sizzle6397 Sorry you feel that way, but none of the posts in this thread have anything to do with the feature request to add a “secret key” or “keyfile” (like 1Password or Keepass, respectively). The top 8 posts above were moved from that feature request thread to avoid cluttering/hi-jacking it with discussion that was irrelevant to the feature request. However, it should be noted that the discussion from which these posts were moved was actually erased a few weeks ago, so it’s a good thing that I moved the posts, or they would have been lost as well.

“DefLeppard-Cincinatti(1983)” is a terrible (low entropy) password.

Not because anyone has ever used it before; likely no one has. It’s terrible because the structure of the password has surely been used before.

The structure is [band name] + [city name] + [year]. Assume there are 10,000 different bands, 10,000 different cities, and 50 different years. Add on that people will order these 3 parts in all 6 possible ways. The key space for that particular structure is a mere 30 billion (10,000 x 10,000 x 50 x 6). Let’s say there are 1000 popular structures known from password leaks. So the key space is now 30 billion x 1000 = 30 trillion.

A single GeForce RTX 4090 GPU, which can check 164 billion hashes per second, would go through all possibilities in about 3 minutes. It would be almost trivial for a skilled attacker to write a script using something like Hashcat.

You can try to add entropy by adding random numbers here and there. But even if you increase the key space by a factor of 1000, all that means is that instead of about 3 minutes it would take about 2 days to crack.
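For anyone who wants to plug in their own assumptions, here is a small Python sketch of the key space and cracking time estimate. The counts and the hash rate are the figures assumed in this post (the rate applies to a fast, unsalted hash, not to a deliberately slow key derivation function), and `seconds_to_exhaust` is a made-up helper name.

```python
import math

# Assumed sizes for each part of the structure, taken from the post above.
bands, cities, years = 10_000, 10_000, 50
orderings = 6           # 3 parts can be arranged in 3! = 6 ways
structures = 1_000      # assumed number of popular structures from leaks
hash_rate = 164e9       # guesses per second quoted for a single GPU

per_structure = bands * cities * years * orderings
keyspace = per_structure * structures

def seconds_to_exhaust(space: float, rate: float = hash_rate) -> float:
    """Worst-case time to try every candidate at the given guess rate."""
    return space / rate

print(f"key space for one structure: {per_structure:,}")
print(f"key space for all structures: {keyspace:,} (about {math.log2(keyspace):.1f} bits)")
print(f"time to exhaust: {seconds_to_exhaust(keyspace) / 60:.1f} minutes")
print(f"with 1000x more key space: {seconds_to_exhaust(keyspace * 1000) / 86400:.1f} days")
```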

That password is fine for your Netflix account but not for anything important.

@Halo72 Welcome to the forum!

The person whose post you are responding to (@dh024) was a long-term forum regular and “community leader”, who has unfortunately been absent from the forum since January 5 (possibly due to his exasperation with this very thread). Although he is not here to defend himself, I think that his counterargument would amount to the following points:

  • Cincinatti is not really a city name, because it is misspelled.
  • It is impossible for an attacker to guess the password structure in a reasonable amount of time, and therefore they can only crack it using character-by-character brute force guessing.

 

For my own part, I agree with what you’re saying (you can read my argument with David above, in comments #9#18).

However, David has a point in that the guessability of the password generation scheme (i.e., its structure) does in fact play a role in determining the strength of the password. To take this factor into account, one could estimate the password entropy as the sum of two terms:

  1. The entropy associated with the password, given that the password generation scheme is known.
  2. The entropy associated with the password generation scheme itself (i.e., related to the probability of guessing the scheme if it is not known in advance).

I think we both agree that the first of these two terms (entropy of the password itself) is unacceptably low. You estimated approximately 15 bits, and perhaps we can add another 10 bits to account for transformations (misspellings, l33t, etc.) and special characters.

Unfortunately, the second of these terms (entropy of the scheme) is difficult to accurately estimate. One proposal I have seen suggests quantifying this entropy by the amount of information required to describe the scheme. Alternatively, one could attempt to extrapolate based on the number of known schemes. Evidently, David believes that the entropy of his scheme is very large (approximately 145 bits or more), so that the probability of guessing his scheme would be lower than 10^-43, which is essentially a negligible risk. However, he has presented no basis for this estimate, and I believe that the number of possible schemes that could reasonably be expected to be guessable is many orders of magnitude less than 2^145.

Thus, two points that I’ve been trying to make in my arguments above are:

  • The entropy of the scheme is likely not sufficiently large to make up for the low entropy of the password itself. If the [BandName]+[separator1]+[Misspell(City)]+[separator2]+[Year]+[separator3] scheme is among the first million schemes that are attempted in an attack, then the total effective entropy may not be larger than 45 bits. If it is among the first 1000 schemes tried, then the effective entropy may be lower than 35 bits. For a targeted attack, in which the attacker obtains information about the victim (e.g., their geographic location, or their taste in music), these numbers may be even lower.

  • Because of the difficulty in correctly estimating the entropy associated with guessing the scheme, there is no lower bound (other than 0 bits) or knowable upper bound for the value of this entropy term. As a result, a person using this scheme (or any other non-random scheme) has no way of knowing the strength of their password; they can only hope that it is sufficiently high to be uncrackable.
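The two-term estimate behind these bullet points can be sketched in a few lines of Python. This assumes the attacker works through an ordered list of schemes, so that a scheme at position N in that list contributes at most log2(N) bits; the function name `effective_entropy` is made up for illustration.

```python
import math

def effective_entropy(password_bits: float, scheme_rank: int) -> float:
    """Upper bound on total entropy: bits of the password given the scheme,
    plus log2 of the scheme's position in the attacker's list of schemes."""
    return password_bits + math.log2(scheme_rank)

# About 15 bits for the password itself plus about 10 bits for
# transformations and special characters, as estimated above.
password_bits = 15 + 10

for rank in (1_000_000, 1_000, 1):
    total = effective_entropy(password_bits, rank)
    print(f"scheme within the first {rank:>9,} tried: at most {total:.1f} bits")

# Scheme entropy that would be needed for a guessing probability below 10^-43.
print(f"bits needed for probability 1e-43: {-math.log2(1e-43):.1f}")
```

With a rank of 1 (the scheme is known to the attacker), only the 25 bits of the password itself remain, which is the case a randomly generated master password is meant to survive.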

In my opinion, it is essential to be able to accurately estimate the entropy of one’s master password, which is why I always recommend using a randomly generated master password. In essence, my position is that the master password should have sufficient entropy on its own, even if the password generation scheme is known to the attacker (making the entropy of the scheme 0 bits). This is also consistent with Shannon’s maxim and Kerckhoffs’s principle.