Passphrase generator should use nonsense/fake words and place digits in multiple locations

Enhancement of passphrase generator:

  1. Use nonsense words and,

  2. Produce digits between 0 and 999 in multiple locations and randomly place the digit in the front of a word or the end of a word.

  3. (Optional) Randomly force a single entire word to ALL-CAPS.

  4. The passphrase generator should use nonsense/fake words instead of real words to resist dictionary attacks. There are nonsense word generators that are customized for each language so that the nonsense words are “pronounceable” in that language.

  5. Passphrases should either place a digit in more than one location for passphrases that contain more than, say, 4 words; or, the number placed should randomly be between 0 and 999, or both.

These two enhancements will GREATLY increase the difficulty of cracking a passphrase even if an attacker knows that a passphrase is being used and not merely a password, AND the attacker knows the language spoken by the target.

Some examples of nonsense words (in ENG):

ushiredunlity
barrob
matiesy
barkloriatorit
thdrifiester

With all three enhancements, a potential four-word passphrase could be:

Thdrifiester61-MATIESY-459Barrob-Ushiredunlity

FYI, passphrases are not vulnerable to dictionary attacks if properly constructed (sufficiently large word list, sufficient number of words in passphrase, words randomly selected using a CSPRNG, no cherry-picking or manual modification of the generated passphrase).
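To make “properly constructed” concrete, here is a minimal sketch of what such a generator might look like. It is not Bitwarden's implementation; the function names and the eight-word demo list are made up for illustration, and a real generator would load a full-size list (for example, 7776 words).

```python
import math
import secrets

# Hypothetical stand-in word list for illustration only; a real generator
# would load a full list (for example, 7776 words) instead of eight entries.
DEMO_WORDS = ["issuing", "clarinet", "vending", "crewman",
              "doormat", "mutilator", "riot", "cardigan"]

def generate_passphrase(words, count, separator="-"):
    """Pick `count` words independently and uniformly using the OS CSPRNG."""
    return separator.join(secrets.choice(words) for _ in range(count))

def passphrase_entropy_bits(list_size, count):
    """Entropy of `count` uniform, independent picks from `list_size` words."""
    return count * math.log2(list_size)

print(generate_passphrase(DEMO_WORDS, 4))
# Entropy a six-word phrase would have if drawn from a full 7776-word list
print(round(passphrase_entropy_bits(7776, 6), 1))
```

The important part is that the result is used as generated: re-rolling until the phrase “looks nice” or editing words by hand breaks the uniform-selection assumption that the entropy formula relies on.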

The changes you have proposed would increase entropy (password strength), but at a high cost (significant reduction in the ease of memorization and manual typing of the passphrase). The entropy benefits would be modest:

  • Nonsense words: Most generators use a fairly standard list length of 7776 words, so this would result in 0 bits of additional entropy.
  • Increase the range of the inserted numerical value by a factor of 100×: this would increase the entropy by less than 7 bits.
  • Randomly capitalize one word: this would increase the entropy by 2–3 bits (for passphrases containing 4–8 words).
  • Tailor passphrase to the user’s language: this is not a random selection, so nominally, the entropy increase is 0 bits; if a language is selected at random from among the 700 languages that have at least 100,000 speakers, the added entropy would be less than 10 bits.
  • Split the numerical digits into multiple locations: the calculation of the added entropy depends on what assumptions are made about the splitting algorithm, but one can estimate that the added entropy would be in the range 3–5 bits (for passphrases containing 4–8 words).

Contrast this to the much simpler method of increasing the passphrase length by a single word, which will increase the passphrase strength by 13 bits of entropy (much larger than any of the methods suggested). Even using all 5 proposed techniques in tandem (Thdrifiester61-MATIESY-459Barrob-Ushiredunlity) would result in a passphrase that is weaker than a conventional passphrase generated using 2 additional words (issuing-clarinet-vending-crewman-doormat-mutilator) — and of those two, it is clear that the conventional passphrase will be easier to remember and easier to type.
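For anyone who wants to check these figures, the estimates with a closed form can be reproduced in a few lines. This is a sketch under the assumptions stated in the bullets above (uniform random choices, a 7776-word list); the variable names are mine, and the digit-splitting estimate is left out because it depends on the splitting algorithm.

```python
import math

WORD_LIST_SIZE = 7776  # list length quoted above

def bits(choices):
    """Entropy in bits of one uniform choice among `choices` options."""
    return math.log2(choices)

extra_word = bits(WORD_LIST_SIZE)          # one more random word
wider_number_range = bits(100)             # number range grows by 100x
capitalize_one_word = (bits(4), bits(8))   # pick 1 of 4..8 words to capitalize
random_language = bits(700)                # pick 1 of 700 languages at random

print(f"one extra word:        {extra_word:.1f} bits")
print(f"100x number range:     {wider_number_range:.1f} bits")
print(f"capitalize one word:   {capitalize_one_word[0]:.0f} to {capitalize_one_word[1]:.0f} bits")
print(f"random language (700): {random_language:.1f} bits")
```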

This being said, feature request topics must be limited to a single proposal, and should not duplicate existing feature request topics. Below are links to some existing feature request topics that you may wish to support (because they overlap with the suggestions that you have made in this topic):

 

Because of the multiple proposals in this feature request topic, and the overlap with existing feature requests, this topic will be closed (after a brief grace period to allow for questions or additional discussion).

Or, use a random password. They have much better strength-for-a-given-length than passphrases or even squyd-phrases. The general rule is that if you need to remember and type it, use a passphrase; if you will only be auto-filling it, use a password.

The entire point behind passphrases is to make something easy to remember, easy to communicate and easy to type. This was most famously and most clearly explained by xkcd. Adding complexity (random upper, lower, digits, specials) to passphrases creates a secret that benefits nobody. In addition to being difficult to remember or type, it also is less “strong” than an equivalent length password.

Perhaps surprisingly, all of these have the same approximate strength (~64 bits of entropy) when generated randomly [cite]:

  • 20 digits [0-9]
  • 14 lowercase letters [a-z]
  • 11 letters of mixed cases and numbers [a-zA-Z0-9]
  • 10 ASCII printable characters (letters, numbers, punctuation)
  • 5 diceware words.
  • 4 word squyd-phrase (estimated).

By way of sample, which would you rather use? They are similarly strong.

  • 52546643449572466429
  • kvnxyznabgvwfw
  • f7XkFYX5mg5
  • aT@KTv$2"h
  • horse correct battery staple onion
  • Thdrifiester61-MATIESY-459Barrob-Ushiredunlity
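The “similarly strong” claim can be checked with length × log2(pool size). A rough sketch, assuming each symbol is chosen uniformly at random, 94 printable ASCII characters (space excluded), and a 7776-word diceware list; the squyd-phrase is omitted because its figure is only an estimate.

```python
import math

# (label, pool size, number of symbols); pool sizes are the usual assumptions
schemes = [
    ("20 digits", 10, 20),
    ("14 lowercase letters", 26, 14),
    ("11 mixed-case letters and digits", 62, 11),
    ("10 printable ASCII characters", 94, 10),
    ("5 diceware words", 7776, 5),
]

def entropy_bits(pool_size, length):
    """Entropy of `length` independent uniform picks from `pool_size` symbols."""
    return length * math.log2(pool_size)

for label, pool, length in schemes:
    print(f"{label:34s} {entropy_bits(pool, length):5.1f} bits")
```

All five land within a couple of bits of each other, which is the point of the comparison.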

A decent description of the “math” behind password strength (which we call “entropy”) can be found in this Wikipedia article.

As an aside, the greatest cause of passwords breaking comes from the use of punctuation in passwords. One can avoid this problem by using an alpha-numeric password that is 10% longer.

2 posts were split to a new topic: Entropy of passphrases that contain pwned passwords/phrases

I briefly asked ChatGPT and Gemini, and both say that real-word passphrases ARE, in fact, more vulnerable to certain attacks as compared to random-character passwords.

I’m only advocating that a “pronounceable” nonsense word CANNOT be found in any dictionary (during dictionary attacks) AND can still be memorable (for ease of use). Even if not ALL words in a passphrase are nonsense, the inclusion of one would force an attacker to use brute-force, a highly inefficient method.

There should be NO distinction between a passphrase and password that an attacker could exploit. It’s as simple as that.

This feature request here: More Password/Passphrase Generator Enhancements (Comprehensive List) more or less also already contains these suggestions:

Tip: Don’t outsource your security decisions to an auto-complete generative LLM app.

The claims made by ChatGPT and Gemini are misleading, because they are not considering passphrases produced by a generator — which is the topic of this feature request. Yes, passphrases that are created by humans (“Correct Horse Battery Staple”, “I love my dog”, “Call me Ishmael”, etc.) are notoriously weak, just like passwords created by humans (“123456”, “P@ssw0rd”, “Qwerty1234”, etc.).

In my comment, I specifically referred to “properly constructed” passphrases:

If any AI model disputes this, then it is hallucinating, plain and simple.

I agree, but even with today’s implementation, the only distinction between the security of passwords and passphrases would be that the latter are more likely to be constrained by websites that foolishly enforce length limits on passwords. When creating a credential for a website that does not impose any password length limitations, then there is no distinction between a passphrase and a password that an attacker could exploit (provided both have been properly constructed, as explained above).

This is the link to Choosing a Secure Password - Schneier on Security
Bruce has lots to say about best practices for generating passphrases. In the section on XKCD, he says:

“This is why the oft-cited XKCD for generating passwords—string together individual words like “correcthorsebatterystaple”—is no longer good advice. The password crackers are on to this trick.”

That was written in 2014. Other back issues give further advances and advice on the subject.

I think that if real words are used in passphrases (and the user knows those words), it doesn’t matter if the word is a four or five-letter word, or a 10 to 12-letter word; if the user knows the word, then the user knows the word and each word constitutes one element to remember. (Remember the notion that people of average intelligence can remember 7 items, +/- 2.)

For effective attacks on passphrases composed of real words, if it’s in a dictionary, it also constitutes one element of the passphrase the attacker needs to get correct, and the length of each word is irrelevant to the cracking computer when employing certain attacks.

So… In an attempt to generate a passphrase that is both resistant to some types of attacks and easy to remember, it seems that the use of a nonsense word among real words (along with the added complexity of digits and punctuation) is (at least for now) a reasonable trade-off.

The goal is to make any attacking method as costly as brute-force, which is the only method, given enough time and energy, that is guaranteed to work.

@Squyd, your comment is more relevant to your feature request thread than to the topic where you posted it, so I am moving it there.

If Schneier is talking about human-created passphrases, then fine, I have no objection. But if he is referring to randomly generated passphrases (a.k.a. “diceware” passphrases), then his statement is uninformed and plain wrong.

That’s true, within limits. If the words in the passphrase are extremely short (e.g., 3 letters or less, on average), then a character-by-character brute-force attack may succeed.

This is also true, with the same caveat that I expressed above.

This conclusion does not follow from anything that you’ve said.

Perfect, so they can remember a passphrase consisting of 5–9 words. Using a standard diceware wordlist containing 7776 words, a randomly generated passphrase would have 65–116 bits of entropy. This is sufficient for most real-life scenarios (especially when you get to passphrases containing at least 6 words, corresponding to a 78-bit entropy).

If by “brute-force”, you mean the systematic guessing of different permutations of subunits (whether the subunits are words, individual characters, or character-patterns), then OK. If you mean to say that the goal of a passphrase is to make it “as costly as brute-force” guessing character-by-character, then no — this will never be possible, and is not a meaningful “goal”.

It is only guaranteed to work if the attacker in fact does have “given enough time and energy” (and resources). Thus, the goal of any password or passphrase should be to make a brute-force attack so costly that no attacker will have the resources, funds, or time to crack the password/passphrase. It is generally estimated that 72–80 bits of entropy are sufficient to achieve this goal.
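To translate the 72–80 bit target into word counts, one can divide by the per-word entropy and round up. A small sketch, assuming a 7776-word list and uniform random selection; `words_needed` is just an illustrative helper name.

```python
import math

def words_needed(target_bits, list_size=7776):
    """Smallest number of random words whose entropy reaches `target_bits`."""
    return math.ceil(target_bits / math.log2(list_size))

for target in (72, 80):
    print(f"{target} bits -> {words_needed(target)} words")
```

This is consistent with the remark above that things get comfortable from about 6 words onward.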

In simpler words: if there are 7777 words to choose from and the attacker knows them, then he still has to brute force among these words. Just as if there are 26 lowercase letters and the attacker knows all of them he uses brute force to guess the password.

In this analogy a password like “juggling-dilation-ridden-image” of 4 words (if each word is truly random from a generator) can be compared to a password “axecaoeu” of 8 letters (each character is truly random), and even if the attacker knows the scheme used, he still needs to brute force 7777 ^ 4 or 26 ^ 8 combinations respectively.

If you make the number of words large enough, you can get from the brute force perspective exactly the same “difficulty” as you can get from a randomized password.
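The two search spaces in the analogy can be put side by side. The sketch below uses the figures from the post (7777 words, 26 lowercase letters) and assumes every word and letter is chosen uniformly at random; `letters_to_match` is a hypothetical helper.

```python
import math

words_space = 7777 ** 4    # four random words, figure used in the post above
letters_space = 26 ** 8    # eight random lowercase letters

print(f"4 words:   {math.log2(words_space):.1f} bits")
print(f"8 letters: {math.log2(letters_space):.1f} bits")

def letters_to_match(word_count, list_size=7777, alphabet=26):
    """Random letters needed to reach the entropy of `word_count` random words."""
    target_bits = word_count * math.log2(list_size)
    return math.ceil(target_bits / math.log2(alphabet))

print(letters_to_match(4))
```

The two spaces are not the same size (the four-word space is the larger one), but the principle is identical: knowing the scheme only tells the attacker which space to search, not where in it the secret is.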

This is formalized and expressed as “entropy”. Understanding it provides a fast way to compare different password generation schemes with each other, but you don’t have to understand entropy to see that knowing the source dictionary is not enough to completely weaken the passphrase approach. Sure, it reduces bruteforce from naive “all the separate letters” to “words from the wordlist”, but this still gets the hacker only “back to” the number of combinations that we actually chose from when we generated the passphrase.

We can go further into the philosophy of “why passphrases and not letters”?

The answer is: because words are much richer in context for the human to remember.
There is a simple association technique to remember a passphrase of almost arbitrary length: you generate any random passphrase, let me fetch my generator…

riot-nearly-cardigan-denatured-unlocking

Then you get the first word and remember it in some way. Let’s say “I’m starting a riot against bad passwords”

Then you take the first and second words and imagine a weird connection

“riot-nearly”

“I take this riot so close (near) to my heart that it is basically inside me!”

“nearly-cardigan”

“I nearly bought something, but I forget what it is, what a bummer, I need to shoot myself into my heart (cardio) with a gun!”

“cardigan-denatured”

“I wear this cardigan for so long that my skin is completely denatured”

“denatured-unlocking”

“this crazy scheme of remembering passwords is so far from nature, but hopefully I remember it and it allows unlocking of my vault”.

And you make sure to build these associations visual, bright and personal to you, this way you remember them. But the source is completely random.

This is a basic memory technique that can be extended to much more like trees and stuff. Single letters just don’t provide the same richness to make such connections and so remembering them is much harder for the human and by remembering a word you get much more in terms of entropy than remembering a single letter.

On average, the right passphrase/password will be found searching half the “space”.
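Put in numbers: searching half the space means the expected effort is one bit less than the full entropy. A sketch, where the guessing rate is a purely hypothetical figure chosen for illustration and not a measurement of any real attacker:

```python
def expected_guesses(entropy_bits):
    """Average guesses to hit a uniformly random secret: half the space."""
    return 2 ** entropy_bits / 2

def expected_years(entropy_bits, guesses_per_second):
    """Average time to find the secret at a given guessing rate."""
    seconds = expected_guesses(entropy_bits) / guesses_per_second
    return seconds / (365.25 * 24 * 3600)

ASSUMED_RATE = 1e12  # hypothetical guesses per second, illustration only

for bits in (51.7, 64.6, 77.5):  # roughly 4, 5 and 6 words from 7776
    print(f"{bits} bits: about {expected_years(bits, ASSUMED_RATE):.3g} years on average")
```

Each extra word multiplies the expected time by 7776, regardless of which rate is assumed.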

Yes, it is true that by using words and word associations, humans can leverage episodic memory and not just semantic memory when recalling the passphrase, making memorization easier.

Another major advantage of passphrases is that the pool of tokens (words or characters) from which the password is constructed is orders of magnitude larger for passphrases (e.g., 7776 words vs. 94 characters). As a result, the number of tokens required for a strong passphrase is much smaller than the number of tokens required for a character-string password (typically half). This is advantageous, because the number of words in a strong passphrase is often within the capacity of the human working memory, while the number of characters in a strong password will always exceed the number of items that can be held in working memory.

Nonetheless, passphrases are not always “better” than character-string passwords. The major advantage of character-string passwords is that their entropy density is higher than that of passphrases. The entropy density is approximately 6.6 bits/character for random-character strings, but only 1.6 bits/character for random passphrases; thus the entropy density can be increased by a factor of 4× by using a character-string password. This can be important if the authentication service imposes a password length limit.
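The density figures can be approximated as follows. This is a sketch assuming a 94-character pool and a 7776-word list; the passphrase density is measured on one sample phrase from earlier in the thread, so the exact value shifts a little with the word lengths that happen to be drawn.

```python
import math

def density_random_string(pool_size=94):
    """Bits per character for a uniformly random character string."""
    return math.log2(pool_size)

def density_passphrase(passphrase, separator="-", list_size=7776):
    """Bits per typed character, counting separators, for a generated passphrase.

    The entropy comes from the generation process (words * log2(list size)),
    not from the characters themselves.
    """
    words = passphrase.split(separator)
    return len(words) * math.log2(list_size) / len(passphrase)

sample = "issuing-clarinet-vending-crewman-doormat-mutilator"
print(f"random string: {density_random_string():.2f} bits/char")
print(f"passphrase:    {density_passphrase(sample):.2f} bits/char")
```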

Thus, the conventional wisdom is to use passphrases when a secret needs to be memorized, manually typed, and/or conveyed verbally, and use character-string passwords for everything else.

I largely agree with the main argument of the text. Simply put, offering password generation options similar to iCloud passwords makes it easier to create stronger passwords that are more suitable for use across various sites, even with shorter lengths.

For example, consider the passphrase “horse correct battery staple onion.” Despite its considerable length, if it’s ever accidentally exposed, all someone needs to remember is the five words, their order, and any special characters used. Even if these words are chosen from a list of around 7,000, when you look at it as a combination of words, it’s essentially just picking five out of 7,000 possibilities.

On the other hand, if you add options like randomly capitalizing a letter in a random word, or inserting a random letter or number at the beginning or end of a random word, the complexity of the password increases significantly.

Examples:

  1. horse correct battery staple onion
  2. hoRse correct battery staple9 onion
  3. horse 2correct battery Staple onion

Bitwarden recommends using at least six words for a strong password, but not all sites allow passwords that long. If a site doesn’t, and an attacker knows you’re a Bitwarden user, they might try combinations of shorter words, which becomes more of a threat as the number of Bitwarden users increases.

If generating nonsense words is too costly, I believe that simply adding random numbers or letters, or not fixing the position of capital letters, can greatly reduce potential security risks.

Relying on passphrases just because they’re easy to memorize seems to me like a misuse of autofill features. Even if it’s not a misuse, we shouldn’t forget that if such a password is ever exposed offline, it’s also easy for any bystander to memorize.

Um, did you read the whole thread? Because I have the impression (trying to find a neutral phrase here) the posts from the other posters already dissected the main argument… :sweat_smile:

Absolutely true. As I said a week ago, don’t use passphrases if you plan to auto-fill. The idea behind passphrases is to have something easy to remember, easy to type and easy to communicate. None of those benefit auto-filling, so a complex password is the better choice.

Passphrases leverage the observation that “humans are terrible at remembering random combinations of letters and numbers, but we are great at remembering phrases of words” [cite].

Let’s do a bit of quick math to see how your suggested options would strengthen things:

  • ~4.9 bits would be added for randomly capitalizing one of the 30 letters (=log2(30)), plus
  • ~3.3 bits for adding a random digit (=log2(10)), plus
  • ~3.3 bits for placing the digit in one of 10 positions (before or after one of 5 words).

In total, the suggested options add ~11.6 bits of additional entropy.

On the other hand, adding a sixth random diceware word adds ~12.9 bits of entropy (=log2(7776)), which raises the bar a similar amount (a little over 1 bit more, actually).
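The same quick math as a sketch that counts the letters directly from the example phrase. It assumes every choice (which letter, which digit, which position) is made uniformly at random and independently; the variable names are illustrative.

```python
import math

phrase = "horse correct battery staple onion"
words = phrase.split()
letters = sum(len(w) for w in words)   # letters that could be capitalized

capitalize_one_letter = math.log2(letters)
random_digit = math.log2(10)
digit_position = math.log2(2 * len(words))   # before or after each word
options_total = capitalize_one_letter + random_digit + digit_position

extra_word = math.log2(7776)                 # one more diceware word

print(f"suggested options: {options_total:.1f} bits")
print(f"one extra word:    {extra_word:.1f} bits")
```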

So, these have similar strength:

  1. hoRse correct battery staple9 onion
  2. horse 2correct battery Staple onion
  3. horse correct battery staple onion probation

Which would you find easier to remember, type and communicate? That is the secret-sauce in passphrases – make it easier for humans by adding length and ditch the complexity.

And, before this conversation derails again, don’t actually use a password derived from a cartoon. Although the author indicates horse... was randomly generated, the fact that it is publicly known makes it worth less.
