Discussion of Passphrase Entropy and Entropy Estimation Tools

grb · January 5, 2023, 5:42pm

Note: This post and the 7 posts that follow below were moved from the thread “Add (optional) Secret Key functionality (Like 1Password) or keyfile (Like Keepass)” because they were off-topic, but worthy of their own thread.

How do you figure? You would need to draw random words from a dictionary containing 3 million words if you are going to get 150 bits of entropy from a 7-word passphrase. A typical diceware wordlist (like the EFF word list used by Bitwarden) contains only 7776 words, so you get “only” 65-90 bits of entropy for a 5-7 word passphrase. That being said, 65-90 bits of entropy should be plenty for almost everybody.
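The arithmetic behind these numbers is that each word drawn uniformly at random from an N-word list adds log2(N) bits. Below is a minimal Python sketch. The helper names are illustrative and do not come from any library.

```python
import math

def bits_per_word(list_size: int) -> float:
    """Entropy contributed by one word drawn uniformly at random from the list."""
    return math.log2(list_size)

def list_size_needed(target_bits: float, n_words: int) -> float:
    """Dictionary size required for n_words uniform picks to reach target_bits."""
    return 2 ** (target_bits / n_words)

EFF_LIST_SIZE = 7776  # 6**5 entries, as in a standard diceware-style list

for n in (5, 6, 7):
    print(f"{n} words: {n * bits_per_word(EFF_LIST_SIZE):.1f} bits")

print(f"List size needed for 150 bits in 7 words: {list_size_needed(150, 7):,.0f}")
```

With 7776 words, this gives about 64.6, 77.5 and 90.5 bits for 5, 6 and 7 words. A 7-word phrase would need a list of roughly 2.8 million words to reach 150 bits.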

222 · January 5, 2023, 5:45pm

I just quickly used this online calculator:

Edit: I suspected you were correct, but then I also tried this one, and it gave me over 150 bits of entropy with a good passphrase and a tiny bit of salt.

https://rumkin.com/tools/password/

JimmyW · January 5, 2023, 6:11pm

I’m late to the dance and not as expert as many of the other commenters, so please forgive me if my comments are repetitive or irrelevant. First, may I assume that one’s vault is encrypted at all times while it’s on a Bitwarden server or device? If so, it doesn’t seem that burdensome to create an easily memorable passphrase of 80-100 bits of entropy (BOE). It also doesn’t appear terribly difficult to avoid words on typical English word lists. Lastly, can one use Windows special characters (Alt codes) in a passphrase? I realize that doing so may not be wise in many cases. Thanks!

V13 · January 5, 2023, 6:14pm

I just quickly used this online calculator:
I suspect you’re right and it’s flawed.

Indeed, it’s flawed. That’s not how you calculate entropy. That’s why, if you enter “8 lowercase Latin characters” on the site, it says “strong, well done”, even though the word “password” is 8 characters and it’s the weakest password in the world.

The reason is that these entropy calculators assume a sequence of random characters from an RNG that follows a uniform distribution, which isn’t the case when people choose their own passwords.
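The sketch below makes the uniform-distribution assumption concrete by implementing the generic “character pool × length” estimate. This is an assumed model of how such calculators work. The formula behind any particular site may differ.

```python
import math
import string

def naive_entropy_bits(password: str) -> float:
    """Charset-size x length estimate (ASCII character classes only).

    It assumes every character was drawn uniformly at random from the
    union of the character classes present, which only holds for
    machine-generated passwords.
    """
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)  # 32 ASCII symbols
    return len(password) * math.log2(pool) if pool else 0.0

# A dictionary word and a random-looking string get the identical score.
print(naive_entropy_bits("password"))  # 8 * log2(26), about 37.6 bits
print(naive_entropy_bits("qzkfmwxo"))  # same value
```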

Btw, this whole discussion has strong parallels with password-based-authentication vs 2FA.

222 · January 5, 2023, 6:17pm

Also tried this:
https://rumkin.com/tools/password/

This allows you to generate passphrases based on dictionary preference and then check entropy. Not hard to get above 150 when you’re using 5-7 words. (I wouldn’t suggest ever using anything online like this to create a real password.)

V13 · January 5, 2023, 6:31pm

https://rumkin.com/tools/password/

Again, this is misleading. Here’s what it says:

passwordpassword: “You have a strong password, which provides approximately 71 bits of entropy.”
eethiibeiquahche: “You have a strong password, which provides approximately 66 bits of entropy.”

The first one is clearly flawed and very easy to crack while the second one is of the same length but randomly generated.

In reality, the first one has no more than 30 bits of entropy (“password” is within the 20K most popular words, i.e. ~14.3 bits of entropy, multiplied by 2 for the repeat, give or take). The second one has ~75 bits (26 possibilities per randomly generated character, i.e. ~4.7 bits of entropy, multiplied by 16).

This makes the second one at least ~35 trillion times (2^45) stronger than the first one.
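Here are the same back-of-the-envelope numbers in Python, under the post's own assumptions: a 20K-word list for “password” and 26 letters for the random string.

```python
import math

# Upper-bound model for "passwordpassword": one word from a ~20K popular-word
# list, counted twice (generous, since repeating a word adds far less than a
# second independent pick would).
dictionary_bits = 2 * math.log2(20_000)

# "eethiibeiquahche": 16 characters, each uniform over 26 lowercase letters.
random_bits = 16 * math.log2(26)

print(f"dictionary-word password: {dictionary_bits:.1f} bits")
print(f"random 16-char password:  {random_bits:.1f} bits")
print(f"strength ratio: 2^{random_bits - dictionary_bits:.1f}")
```

Unrounded, the gap is about 46.6 bits. The 2^45 figure obtained from the rounded 30 and 75 is therefore a conservative lower bound.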

dh024 · January 5, 2023, 6:36pm

The passphrase I quoted above generates about 170 bits of entropy and it involves only three words and a date, and cannot be cracked with a dictionary attack or combination of dictionary attack and year-related numeric-guessing (a common technique in cracking). I think it is totally possible to create memorable, easy-to-enter passwords if you just get creative and avoid things like diceware dictionaries, particularly when using a password manager where you really only have to memorize one or very few passwords to make the rest of the system automated.

222 · January 5, 2023, 6:36pm

Fair enough. Try the random dictionary tool on that site with your preferred dictionary at 5-6 reasonably long words and check the bits.

grb · January 5, 2023, 7:43pm

The zxcvbn tool used by Bitwarden estimates only 79 bits of entropy for your example phrase. However, every entropy estimation method that is based on the password/passphrase as a starting point (including your estimate of 170 bits and zxcvbn’s estimate of 79 bits) is inherently flawed, because it relies on assumptions about the cracking strategy — which is unknowable. For example, if I wanted to hunt for passphrases created by individuals for whom concerts were a memorable experience, I could set up a dictionary attack of the form BandName+City+RecentYear which would easily find your three “words”; I would then only need to introduce special characters and misspellings — this strategy would result in even less entropy than the 79 bits estimated by zxcvbn.
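The cost of a targeted attack is simply the product of the sizes of its candidate sets. The sketch below applies this to the BandName+City+Year pattern with mangling. All list sizes are hypothetical, made up for illustration, not measured.

```python
import math

# All counts below are hypothetical, chosen only to illustrate the idea.
BAND_NAMES = 5_000        # candidate band names
CITIES = 20_000           # candidate city names
YEARS = 2023 - 1950 + 1   # plausible concert years (74)
TEMPLATES = 50            # separator/bracket layouts, e.g. "A-B(Y)"
MANGLES_PER_WORD = 20     # misspellings / case / l33t variants per word

search_space = (BAND_NAMES * MANGLES_PER_WORD
                * CITIES * MANGLES_PER_WORD
                * YEARS * TEMPLATES)
bits = math.log2(search_space)
print(f"{search_space:.3e} candidates = {bits:.1f} bits")
```

With these hypothetical sizes, the search space is around 1.5×10^14 candidates, or roughly 47 bits. The exact figure depends entirely on the list sizes the attacker chooses.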

My point is that if you want to be sure that your password entropy is as high as you think it is, you need to use an entropy estimation method based on analysis of the process by which the password/passphrase was created, not an analysis of a specific exemplar. For example, a random passphrase generator based on a diceware-style word list (7776 words) will generate approximately 13 bits of entropy per word — unless you start cherry-picking passphrases and re-roll the generator if it contains words that you don’t “like”.
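Below is a minimal sketch of process-based generation in Python, assuming Python's standard `secrets` module as the source of randomness. The tiny word list is a hypothetical stand-in for a real 7776-word list.

```python
import math
import secrets

def generate_passphrase(wordlist: list[str], n_words: int, sep: str = "-") -> str:
    """Draw n_words independently and uniformly using a CSPRNG."""
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))

def passphrase_bits(list_size: int, n_words: int) -> float:
    """Entropy of the *process*: valid only if you keep whatever is drawn."""
    return n_words * math.log2(list_size)

# Hypothetical demo list; a real one (e.g. 7776 words) would be loaded from a file.
demo_words = ["acorn", "badge", "cider", "dwarf", "ember", "fable", "gusto", "hazel"]
print(generate_passphrase(demo_words, 6))
print(f"{passphrase_bits(7776, 6):.1f} bits for 6 words from a 7776-word list")
```

The entropy figure describes the generator, not any particular output. Re-rolling until you like the result effectively shrinks the list to the words you would accept, so the true figure drops.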

222 · January 5, 2023, 8:40pm

Great post! Thanks!

dh024 · January 5, 2023, 9:20pm

@grb - let me offer a few things for you to consider in your argument.

Both 170 bits of entropy and 79 bits of entropy are correct calculations. The difference is that the first assumes brute force cracking and the second (zxcvbn) uses a mix of methods: brute force, pattern matching (for the year) and dictionary matching (for the words). Thus, the two methods represent the possible range of entropy depending on the method - dumb brute force vs. ‘smart’ cracking.
zxcvbn breaks down the password you enter into components (e.g., words, years) to estimate an efficient cracking method. In the case of my example password DefLeppard-Cincinatti(1983) it breaks down the password into Def Leppard Cincin atti 1983 plus the special characters, much like the cracking strategy you suggest above. zxcvbn then uses a mix of dictionary matching (words), regex pattern matching (year component), and brute force (special characters) to estimate the time to crack the password. But even so, the entropy is still strong (>70).
The more interesting part is that zxcvbn and you both had access to my strategy, because you knew the password in advance. But if you did not know or could not guess the strategy, one could not crack the password using the mixed method above (dictionary & pattern matching + brute force). For example, if you did not know the password, how would you know that the first two components were words or that the last component was a numeric year surrounded by parentheses? As I said before, if you don’t know the password-creation strategy, dictionary or pattern matching won’t work - the password would have to be brute forced.
So, the entropy estimated by zxcvbn of 78.839118 is unrealistically low. The entropy value of about 170 (based on a brute force algorithm) would be a much better estimate of the strength of the password.
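The ~170-bit figure corresponds roughly to assuming that the attacker enumerates every string of the same length over the full printable ASCII set. The sketch below assumes a 94-character pool. The thread does not state which pool the online calculator actually uses.

```python
import math

def brute_force_bits(length: int, pool_size: int) -> float:
    """Bits for an attacker enumerating every string of this length over the pool."""
    return length * math.log2(pool_size)

pw = "DefLeppard-Cincinatti(1983)"
PRINTABLE_ASCII = 94  # assumed pool: visible ASCII characters, excluding space
print(len(pw), f"{brute_force_bits(len(pw), PRINTABLE_ASCII):.0f} bits")
```

This gives about 177 bits for 27 characters. However, it is only a meaningful bound if the attacker has no better strategy than exhaustive search, and that is exactly the point under dispute in this thread.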

You mention above that the entropy estimation method must match the “process by which the password was created” - so, I hope you understand what I am trying to communicate. Cheers!

grb · January 5, 2023, 10:34pm

I understand what you are saying, but I don’t think you understood my point — which you’ve mangled a bit in your paraphrase.

First, I want to point out that you have no control (or knowledge) over what password cracking strategy will be used when or by whom. For example, if I had access to the LastPass leak, I might try 12-character strings generated by a Markov Chain model, or I might try the 100k most common English words with l33t-transformation and padded to 12 characters using punctuation marks (a.k.a. the “haystack” password scheme). If the particular set of patterns and rules that I use as the basis for the brute force attack doesn’t match your actual password pattern, then you got lucky — this time. For example, your password could have been !@#$%^&*()_+ and my two strategies above would fail; however, another attacker might look for keyboard walk patterns and crack this one in seconds. So you’re relying on luck, to a large extent, and the time to crack your password is highly dependent on the strategy chosen for cracking.

The problem with the philosophy that you’re espousing is that it relies on security by obscurity — the security depends on the assumption that the process you used for creating your password is not known (and by extension, could not be guessed) by an attacker, which violates Kerckhoffs’s principle.

In contrast, a randomly generated passphrase does not suffer from these problems. With a randomly generated passphrase, I do not have to worry that an attacker will choose a brute-force pattern that can find my passphrase. I can even make public the number of words and the specific word list that I used to generate the passphrase, and the average number of guesses required for cracking the passphrase would still be the same (e.g., 1.1×10²³ for a 6-word diceware phrase).
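The 1.1×10²³ figure follows directly from the size of the keyspace and holds even when the attacker knows the list and the number of words:

```python
import math

LIST_SIZE = 7776
N_WORDS = 6

keyspace = LIST_SIZE ** N_WORDS   # exact integer: 7776**6
average_guesses = keyspace / 2    # on average the attacker succeeds halfway through
print(f"keyspace {keyspace:.2e}, average guesses {average_guesses:.2e}, "
      f"{math.log2(keyspace):.1f} bits")
```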

Entropy is a measure of randomness. If you select a passphrase based on non-random decisions, then the true entropy is actually 0 bits, and you are protected only by the fact that the attacker doesn’t have advance knowledge of the exact decision-making process that you used to arrive at DefLeppard-Cincinatti(1983).

It follows that an accurate estimate of entropy can only be obtained if the decision-making process is known, including the probability distribution for each decision made.
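In information-theoretic terms, the entropy of a decision is a function of its probability distribution. The sketch below computes Shannon entropy and min-entropy, the latter often used as a conservative measure for guessing attacks. The skewed distribution is a hypothetical example of a human preference.

```python
import math

def shannon_bits(probs: list[float]) -> float:
    """Shannon entropy H = -sum(p * log2 p) of one decision's distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy_bits(probs: list[float]) -> float:
    """Min-entropy: -log2 of the most likely outcome (the attacker's first guess)."""
    return -math.log2(max(probs))

uniform_word = [1 / 7776] * 7776       # one diceware roll
skewed_choice = [0.7, 0.1, 0.1, 0.1]   # hypothetical preference among 4 options

print(f"uniform word: {shannon_bits(uniform_word):.2f} bits")
print(f"skewed choice: {shannon_bits(skewed_choice):.2f} bits, "
      f"min-entropy {min_entropy_bits(skewed_choice):.2f} bits")
```

The uniform roll yields about 12.92 bits. The skewed four-way choice yields only about 1.36 bits (min-entropy about 0.51), not the 2 bits a uniform choice among four options would give. Neither number can be computed without knowing the probabilities.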

dh024 · January 5, 2023, 11:03pm

I guess we have to disagree, then. My understanding is that entropy refers to how (un)predictable something is, or how much inherent variability there is among a set of possible solutions. I believe random passwords are regarded as having high entropy because they maximize how difficult it would be to replicate them (e.g., by chance, by a set of prioritized solutions, etc.). But, for example, chaotic patterns (à la chaos theory) can also have high unpredictability, because some cannot be distinguished from random patterns, even though these patterns are 100% repeatable if you know the algorithm.

So, I think all that matters is how the password cracker implements their guessing strategy. If one’s password-generation process is not known and cannot be predicted by the cracker, then the cracker must resort to brute forcing the solution. There are no other options. Thus, my point is, just create a password using a strategy that can’t be guessed or replicated so that it requires a brute-force attack to guess it. Then, you have a memorable password that will take just as long to crack as a random password - that is really the only thing that matters.

grb · January 6, 2023, 1:48am

The benefit of a randomly generated password/passphrase is that its strength is independent of the guessing strategy. With anything else, you may get lucky more often than not, but it’s a game of Russian roulette.

create a password using a strategy that can’t be guessed or replicated

There’s no such thing. If you are able to devise a password creation scheme, then a password cracker (or a global network of password crackers) can also think up the same scheme (and probably already has). Thus, your security rests on the hope that this never happens, but you can never be certain of the guessability of your password.

dh024 · January 6, 2023, 2:57am

You are right - I should not have said “can never be guessed.” I should have said “cannot be easily guessed” or “cannot be guessed in a reasonable amount of time.” Regardless, there just isn’t a method to easily guess the password creation scheme I proposed, given that there are so many possible ways I could have designed it and there are no example passwords from my scheme that crackers could learn from. So again, it just comes down to whether or not crackers will have to brute force the password - it seems to be the only option. If so, my password will now take the same amount of time to crack as a random password.

Regarding your first point above, yes random passwords definitely are strong - nobody is arguing that. But I think anyone would much rather have a master password that is both easily memorable and “strong enough”.

grb · January 6, 2023, 3:51am

Your claim is that if the password has to be brute forced, then it will take the same time to crack as a random password. This claim is clearly true for DefLeppard-Cincinatti(1983), but it is also true for Password123!. The weakness of the argument is in the premise: “brute forc[ing] the password…seems to be the only option”.

You are betting on the assumption that you have come up with a unique, original idea, one that no password cracker has ever thought of and will never think of in your lifetime. Even though humans can be creative, the number of truly unique ideas that are generated by humans is not very large. This is especially true in applications that are constrained by their form, such as writing music or creating passwords.

I agree, and I have not claimed anything to the contrary. That’s the whole point of passphrases. What I’m arguing is that you will have no guarantee that your passphrase is strong enough unless it is randomly generated. Randomly generated passphrases are memorable, in the sense that humans are able to memorize them (e.g., the average number of words that can be held in short-term memory is 7; repeat this memorization with sufficient frequency, and it will become stored in long-term memory). Your method may reduce the time and effort required to memorize a passphrase, but I hope you weren’t implying that the average person is incapable of memorizing a sequence of 5-7 randomly selected words, with some modicum of effort.

In my opinion, it is worthwhile to take the time to memorize a randomly generated master password when you first set up your password manager, in return for guarantees about the password strength. That way, when the inevitable Bitwarden server breach happens, I don’t have to worry that some hacker, somewhere, has ever had the same “original” idea that I came up with.

dh024 · January 6, 2023, 4:22am

This argument is getting really tiresome for me. Please keep an open mind.

Password123! is a very common password that has been reported in multiple breaches. I challenge you to find DefLeppard-Cincinatti(1983) in ANY breach. It should not be hard if it exists. Please prove me wrong. These are CLEARLY not comparable passwords.

No, I am growing tired of these strawman arguments. It does NOT have to be unique, just uncommon enough so that it is not easily guessed. I have stated this so many times now. Why are you still arguing this point?

I believe if you took the time to consider what I am trying to say above, you would understand. If it can’t be easily guessed, it must be brute-forced. Thus, my scheme works just as well as random. How many ways can I say this?

Of course not - please, please stop the strawman arguments. Can’t we have a civil discourse with consideration of each other’s contributions? This is so tiresome. I am totally done with this thread. Yeesh dude.

grb · January 6, 2023, 5:47am

Feel free to disengage at any time. I’m writing to educate other readers, not to try to convince you to change your mind. With regards to your parting shot above, I don’t think it’s deserved. I have been civil throughout, and to the extent you believe my arguments are strawmen, this appears to be the result of misunderstanding on your part or mine, not on any intentional effort to stray from the main point.

I will not attempt a retort to your first point, but suffice it to say that my original argument was just pointing out that a conditional with a false antecedent results in what is sometimes termed a vacuous truth (i.e., P→Q is always True when P is False).
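For readers unfamiliar with the term, the material conditional can be tabulated in a few lines. Whenever P is False, P→Q evaluates to True regardless of Q.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material conditional P -> Q, equivalent to (not P) or Q."""
    return (not p) or q

for p, q in product([True, False], repeat=2):
    print(f"P={p!s:5} Q={q!s:5} P->Q={implies(p, q)}")
```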

Here, I will just focus on what I believe to be the crux of our disagreement: the validity of the antecedent (which I will rephrase as “There exists a non-random scheme that cannot be guessed in less time than would be required to brute-force the generated password”). I do understand that you consider this to be the case.

My own opinion, which I am restating here to clarify the position I’ve tried to get across in the posts above (not to attempt to goad you into further argument) is as follows:

The validity of the antecedent cannot be guaranteed. Consequently, it is possible that it may be false, in which case your generated password is weak.
Because the validity of the antecedent cannot be guaranteed, you will never know whether your generated password is strong.

So the case I am making to readers of this thread is simply that randomly generated passphrases are preferable to non-random passphrases, because the strength of the former is guaranteed (and therefore also knowable).

V13 · January 15, 2023, 4:34pm

I think I agree with everything that you said. This sentence, however, can be misleading. That’s because you’re saying “readers”, “passphrases”, and “preferable”. What you’re saying is true about cryptographic secrets, but it isn’t necessarily true or meaningful for “passphrases”.

It’s indeed true that randomly generated, long series of bytes are almost always more difficult to crack. But that doesn’t mean they’re good “passphrases”. If it did, then everyone would use a random password hundreds of characters long (or an N-word random passphrase) incorporating 1024+ bits of entropy. If you did that, however, humans would write these down in the most insecure way and be so frustrated that they’d opt for something simpler.
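For scale, here is a quick calculation of what such secrets would actually require. It assumes a 94-character printable ASCII pool and a 7776-word list.

```python
import math

PRINTABLE_ASCII = 94   # assumed pool: visible ASCII characters, excluding space
DICEWARE_LIST = 7776

bits_per_char = math.log2(PRINTABLE_ASCII)
bits_per_word = math.log2(DICEWARE_LIST)

print(f"20 random chars: {20 * bits_per_char:.0f} bits")
print(f"chars for 1024 bits: {math.ceil(1024 / bits_per_char)}")
print(f"diceware words for 1024 bits: {math.ceil(1024 / bits_per_word)}")
```

A 20-character random password carries about 131 bits. Reaching 1024 bits would take about 157 random characters or 80 diceware words.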

If you’re looking at randomized passphrases that someone needs to (a) memorize, (b) type frequently and (c) type on a cell phone’s keyboard, then this places upper limits on their parameters (length, complexity). It may well be possible that a non-fully random one ends up being more secure than a random one, simply because the two approaches operate under different parameters.

grb · January 22, 2023, 3:22am

@V13 You make an interesting point, which I will attempt to address.

If I have understood your comment correctly, you are in essence asking whether it is possible to make a non-random passphrase that is easier to memorize than a random passphrase, yet have equal or greater entropy. And if the answer is “yes”, then you are arguing that all else being equal (specifically — the entropy being equal) the passphrase that is easier to memorize will be more secure (because there is less risk that the user will engage in insecure practices such as writing down their master password, or choosing a password that is too short).

My response to that question is that, yes, it is possible that one could create a memorable, non-random passphrase that has sufficient strength (comparable to what can be achieved using randomly generated passphrases). However, the problem is that you can never be sure of the actual security/strength of such a non-random passphrase, because its entropy is unknowable.

Nonetheless, we can make some guesstimates. It has been claimed that the 3000 most common English words make up 90% of conversational English. To be conservative, let’s assume that your non-random passphrase consists only of words from among the top 1000 (or that the geometric mean of the word ranks is 1000). If the words were randomly chosen from this pool, you would get 10 bits of entropy per word. Now, research has shown that when words are arranged into grammatically correct language, the effective entropy is reduced by half; thus, we would end up with 5 bits of entropy per word in a sentence. The average sentence contains around 14 words, and studies have shown that sentences longer than 17 words are difficult to read (and thus presumably difficult to memorize). So if we restrict ourselves to a sentence containing 16 words, the entropy can be estimated to be 16×5 = 80 bits.

Compared to a randomly generated passphrase that is drawn from a Diceware-style list of 7776 entries, the strength of the hypothetical 80-bit sentence is comparable to a 6-word passphrase that has a special character as a word separator (assuming the separator is randomly selected from a pool of 5-6 special characters, e.g., -_,.;/).
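The estimate and comparison in the two preceding paragraphs can be reproduced in a few lines. The 1000-word vocabulary, the one-half grammar factor and the 16-word length are the post's stated assumptions, not measurements.

```python
import math

# Assumptions from the post for a memorable, grammatical sentence
vocab_bits = math.log2(1000)   # words from the top-1000 list: ~10 bits each
grammar_factor = 0.5           # grammar roughly halves per-word entropy
sentence_bits = 16 * vocab_bits * grammar_factor

# Random diceware passphrase: 6 words plus one randomly chosen separator
separators = "-_,.;/"
diceware_bits = 6 * math.log2(7776) + math.log2(len(separators))

print(f"16-word sentence: ~{sentence_bits:.0f} bits")
print(f"6 diceware words + separator: ~{diceware_bits:.0f} bits")
```

Both come out at roughly 80 bits.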

Of course, we can never be sure of the actual entropy of the non-random passphrase, because it is impossible to verify the many assumptions that go into estimating this value. And there is always a risk that an individual using this approach will choose a sentence that is a quote from a published work, in which case the entropy drops precipitously.