Argon Settings Question

Herc · April 12, 2024, 7:05am

If my main desktop rig has only 2 cores, should parallelism be reduced to 2 threads or is it irrelevant?

After reading some literature, I’ve cranked up memory to almost 800 MB and have iterations at 3.

Also, in my research I read there may be a way to attack Argon (can’t remember if it was 2d, 2i, or 2id) if iterations is below 10. Anyone know anything about this? I seem to think it pertained to side channel attacks against Argon2d, but I really can’t remember and I am just learning from a beginner level.

Also, KeePassXC devs told me the difference between Argon2d or Argon2id is irrelevant for their usage. KeePass official documentation says the side channel attacks are only theoretical and are not believed to be a real threat, so they prefer Argon2d by default, for greater resistance against real-world threats.

For my KeePass database I have iterations closer to 20 and I adjust the memory size depending on if the database is being used locally or being uploaded to the cloud. When storing it in the cloud, I use a much more aggressive memory setting since local threats generally do not concern me.

Parallelism in my KeePass database is set at 2, the default, while currently I have my BW parallelism at the default that I think is 4, which may be too high for my 2-core main CPU, but on the other hand I read BW’s Argon may only be single-threaded anyway, so this setting may be wholly irrelevant.

Looking forward to any informed insights.

Quexten · April 12, 2024, 11:16am

Argon2id does not have the side-channel risk that Argon2d has. That being said I understand KeePassXC’s reasoning. If there is malware on your computer that can exploit a side-channel, you have much bigger risks to your vault.

grb · April 12, 2024, 1:56pm

The maximum meaningful value for parallelism is typically 2× the number of cores, which would be 2×2 = 4 in your case. That being said, increasing parallelism speeds up an attack (in addition to also reducing your unlock delay), so the conventional wisdom is to only use parallelism as a way to be able to increase the Argon2id memory requirement without unduly extending your unlock time.

For protecting the Bitwarden vault, selection of a parallelism setting is further complicated by the fact that not all clients support multithreading of the Argon2id calculations. IIRC, only the mobile apps currently support multithreading when parallelism > 1. Thus, unless you have an older or less powerful mobile device, increasing parallelism beyond 1 may not benefit you at all, but doing so does reduce the cracking time for an attacker.

Alphanaut · April 12, 2024, 2:19pm

If parallelism settings should be 2x cores and increasing it beyond that reduces cracking time, I’m assuming parallelism should be set to 2x the least number of cores across all Bitwarden devices? I use Bitwarden on an iPad, Android phone, laptop, and desktop, all of which most likely have a different number of cores.

grb · April 12, 2024, 2:33pm

Increasing it beyond parallelism = 1 will reduce cracking time. Like I said, I would keep parallelism as low as possible, unless you find that increasing the parallelism allows you to use a higher setting for memory.

Since you are using an iPad, you will run into memory limitations using the Auto-fill app, unless you are using biometric unlock. If the iOS memory limitations apply to you, you’ll have to reduce your memory setting to 48 MB. Thus, you should probably have a parallelism of 1, and the number of iterations as high as possible (without unduly slowing down your unlock time).

Herc · April 12, 2024, 4:56pm

CPU-X says I have 2 cores and yet only 2 threads. This is my CPU:

“The Intel Pentium 3805U has 2 cores with 2 threads and is based on the 4. gen of the Intel Pentium series.”

This may seem under-powered, but with a fast SSD and lots of memory it works well for my office needs.

Edit: I bought this under-powered CPU originally thinking I would promptly replace it, but so far I haven’t had the need to do so. If I were buying a new CPU now I would consider the AMD Ryzen 5 5560U.

grb · April 12, 2024, 6:26pm

Unclear whether they mean “2 threads each” or “2 threads total”.

If using Windows, check in System Information or in the Task Manager for the number of “Logical Processors”. This is the maximum number of parallel threads that can be run.

Herc · April 12, 2024, 6:30pm

I found LOTS of info in “CPU — Info Center” in KDE.

It says:

“Thread(s) per core: 1”

“Core(s) per socket: 2”

“Socket(s): 1”

“Stepping: 4”

So, I believe it is clear I have only 2 threads total on this box. Therefore, at least on this box I am not benefiting any from having parallelism above 2, but that is giving my potential attacker some advantage.
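For anyone wanting to repeat this check, here is a minimal Python sketch of the arithmetic. The function name `logical_cpus` is made up for illustration; the three inputs are the values quoted above from KDE's Info Center, and `os.cpu_count()` is only a cross-check on whatever machine runs the script.

```python
import os


def logical_cpus(threads_per_core: int, cores_per_socket: int, sockets: int) -> int:
    """Total hardware threads implied by the three lscpu-style fields."""
    return threads_per_core * cores_per_socket * sockets


# Values quoted above for the Pentium 3805U box.
total = logical_cpus(threads_per_core=1, cores_per_socket=2, sockets=1)
print(total)  # 2

# Cross-check: number of logical CPUs the OS reports on this machine
# (may be None if it cannot be determined).
print(os.cpu_count())
```

The product is the same number Windows reports as “Logical Processors”.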

grb · April 12, 2024, 6:42pm

You are not benefitting from having parallelism above 1 (not 2), except on your Android phone and iPad. The implementation of Argon2id for Bitwarden’s desktop app and browser extension is unable to take advantage of multithreading.

Quexten · April 12, 2024, 8:12pm

github.com/bitwarden/clients

[PM-6419] Change desktop argon implementation for Node

bitwarden:main ← bitwarden:ps/PM-6419-fix-desktop-argon2

opened 02:28PM - 22 Feb 24 UTC

dani-garcia

+91 -4

## Type of change
```
- [x] Bug fix
- [ ] New feature development
- [ ] Tech… debt (refactoring, code cleanup, dependency upgrades, etc)
- [ ] Build/deploy pipeline (DevOps)
- [ ] Other
```

## Objective
At the moment desktop is using the WASM implementation of Argon2, which is failing to load in the renderer processes now that context isolation is enabled. By moving the implementation to the main process we avoid this problem, and as a plus it allows us to switch to the faster Node implementation.

## Code changes
- Created two new `CryptoFunctionService` implementations, specifically for desktop usage:
  - `RendererCryptoFunctionService`, for the renderer processes. Inherits from the WebCrypto implementation with the exception that it replaces the argon2 implementation by an IPC call
  - `MainCryptoFunctionService`, for the main process. Inherits from the node implementation and adds a listener that responds to the IPC calls from the renderer service.

Should work for desktop too now. Web/Browser are still single threaded.

Mulled7768 · April 13, 2024, 1:33am

I am raising this for discussion, not as critique. There appears to be a discrepancy between these statements:

compared with this on the Bitwarden Vault - Security - Keys page:

Before choosing to use Argon2id I read fairly extensively about which parameters affected the outcome to what extent. I was concerned at the time about the suggested 48MB limit for iOS although I have found that setting it higher on my devices has been fine.

Regarding parallelism, more of it certainly allows more iterations and memory in the same processing time. The question is: does the user’s parallelism setting constrain the attacker?

It is assumed an attacker can avail themselves of far more resources than a user. If the set parallelism is a constraint then lower may be better, although that will limit practical memory and iterations for the user. If set parallelism is not a constraint then higher is better, consuming more of the attacker’s resources for a single attack line, either in the same way more memory and iterations do, or by making more memory and iterations practical for the user (thus harder for the attacker).

My reading suggests the last clause is true, and possibly the previous as well.

If I assume the attacker is unconstrained by specific parallelism setting (as I currently do) then parallelism becomes a setting to balance CPU efficiency and user time while maximising memory and iteration settings which offer actual security.

Following from that, in relation to cores and threads I believe the problem is not to match perfectly but to ensure there is always an available queued task for each worker thread without spending too much overhead on queue management. So, “double the available worker threads” seems to be a working or rule-of-thumb approach, using unlock time on your least powerful device as the benchmark. In my case that is a six-core iOS device implying p=12 but there is a further wrinkle. I read also that iOS assigns a given job either to its three performance cores or to its three efficiency cores (and may shift between them as a whole), not to all six at once. Therefore I have set p=6. My tests show this to be somewhere in the optimal performance zone for my settings of m and i.

I am interested in working out the properly founded basis for setting p as well as m and i.

Edit to add: I should have mentioned that Apple Ax silicon is single-threaded per core, hence my reference to cores rather than threads there.

grb · April 13, 2024, 1:49am

This was discussed some time ago in this forum, and I would recommend that you read this comment and the discussion that follows.

I have not thought about or experimented with these things for some time now, so if you have counterarguments to the assertions made in the previous thread, please share them (either here or there).

Mulled7768 · April 13, 2024, 3:58am

Yes, I read that before deciding on Argon, having searched all Argon threads on the forum. It was ambiguity around use of the term parallelism which gave me pause; user or attacker was left to be inferred from context and could potentially be misunderstood, so I searched other reasonable sources on Wikipedia, developer and security sites. The upshot (confirmable in the referenced thread) is that my suppositions appear correct. User setting of p is not relevant to security where nothing else is varied. Its effect on unlock time depends on the Argon implementation, that is, on whether p is usable.

What is facilitated by increasing p is increased memory and iterations, increasing these costs for an attacker while reducing the impact on unlock time, provided the implementation is multi-threaded which we see above is now the case except only for the browser. The remaining piece for a user to consider is their base processor capability.

Much of this discussion is moot in that the marginal security gains from a user increasing Bitwarden’s defaults, already way above NIST recommendations and beyond PBKDF2, will be negligible for almost everyone. I did it anyway.

Herc · April 13, 2024, 2:17pm

So does a memory setting of 1400 MB with 3 iterations provide obviously superior cracking resistance vs 900 MB of memory and 9 iterations? Answer doesn’t seem obvious to me because 900 MB is a big amount as well yet with 3X the iterations.

Herc · April 13, 2024, 2:34pm

To be clear, the BW defaults may be above the memory constrained recommendations (I can’t remember), not anywhere close to the general recommendations. See here: RFC 9106 - Argon2 Memory-Hard Function for Password Hashing and Proof-of-Work Applications

“The Argon2id variant with t=1 and 2 GiB memory is the FIRST RECOMMENDED option and is suggested as a default setting for all environments. This setting is secure against side-channel attacks and maximizes adversarial costs on dedicated brute-force hardware.”

Importantly, a vault generally needs good future proofing because once it gets into the hands of an adversary they can sit on it until technology improves. Years down the line they could crack in and the victim would be caught off-guard.

grb · April 13, 2024, 5:32pm

A GPU with 24 GB memory could only use 17 of its cores if the Argon2id memory cost is set to 1400 MB. With 900 MB memory cost, the GPU could use 27 cores. Thus, assuming no bus speed limitations, the attacker would be able to test guesses 59% faster with the 900-MB setting if the number of iterations were the same. However, if the number of iterations is 3 vs. 9 for the memory settings of 1400 MB and 900 MB, respectively, then the latter configuration would be expected to be 47% slower than the former.
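The arithmetic behind these figures can be reproduced with a short Python sketch. It rests on the same simplifications as the estimate above: GPU memory capacity is the only limit on concurrent guesses, and time per guess is taken as proportional to the number of iterations. The function names are hypothetical, and treating “24 GB” as 24 × 1024 MB is my assumption (it is what gives 27 rather than 26 for the 900 MB case).

```python
def parallel_guesses(gpu_mem_mb: int, argon_mem_mb: int) -> int:
    """How many Argon2id instances fit in GPU memory at once."""
    return gpu_mem_mb // argon_mem_mb


def relative_rate(gpu_mem_mb: int, argon_mem_mb: int, iterations: int) -> float:
    """Guesses per unit time, assuming time per guess scales with iterations."""
    return parallel_guesses(gpu_mem_mb, argon_mem_mb) / iterations


GPU_MEM_MB = 24 * 1024  # assumption: 24 GB taken as 24 * 1024 MB

slots_1400 = parallel_guesses(GPU_MEM_MB, 1400)
slots_900 = parallel_guesses(GPU_MEM_MB, 900)
print(slots_1400, slots_900)  # 17 27

# Same iteration count: speed-up of the 900 MB setting for the attacker.
print(round((slots_900 / slots_1400 - 1) * 100))  # 59

# 1400 MB with 3 iterations vs. 900 MB with 9 iterations.
rate_1400 = relative_rate(GPU_MEM_MB, 1400, 3)
rate_900 = relative_rate(GPU_MEM_MB, 900, 9)
print(round((1 - rate_900 / rate_1400) * 100))  # 47
```

Changing `GPU_MEM_MB` or the two settings shows how sensitive the comparison is to the assumed hardware.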

grb · April 13, 2024, 5:35pm

In general, it is more effective to future-proof by extending the entropy of your master password. For example, you can add one additional word to your randomly generated passphrase for every 25 years of future-proofing required.

Herc · April 13, 2024, 5:55pm

Do you strongly advise the use of a short pin if local threats are not perceived to be an issue? Otherwise it can be tedious unlocking the vault with a properly strong and long password.

Herc · April 13, 2024, 5:59pm

Your GPU answer is especially impressive and interesting. Thank you. Moreover, I would guess cracking hardware will evolve to deal with memory-hard KDFs like Argon, so cranking up memory to max with minimal iterations may not be ideal. As you can see, even now the lower-memory configuration with more iterations is more difficult to crack.

Also maybe this is all academic because the differences are minute in reality, but it is still interesting to me and I like to have the best settings available.

grb · April 13, 2024, 6:22pm

My personal recommendation is to rely on current KDF recommendations to protect against contemporaneous attacks (and periodically update KDF settings as recommendations change in response to improvements in computing technology).

To protect against future attacks on a vault that is stolen today, increase the master password entropy (by ½ bit for every year of future-proofing desired).
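As a rough illustration of how this rule maps onto passphrase words, here is a small Python sketch. The ½ bit per year figure comes from the rule above; the 7776-word list size is an assumption (the size of the EFF long wordlist), and the function names are hypothetical.

```python
import math

BITS_PER_YEAR = 0.5   # rule of thumb stated above
WORDLIST_SIZE = 7776  # assumption: EFF long wordlist


def extra_bits(years: float) -> float:
    """Additional entropy suggested for the given number of years."""
    return years * BITS_PER_YEAR


def extra_words(years: float, wordlist_size: int = WORDLIST_SIZE) -> int:
    """Whole random words needed to cover that additional entropy."""
    bits_per_word = math.log2(wordlist_size)
    return math.ceil(extra_bits(years) / bits_per_word)


print(extra_bits(25))                       # 12.5
print(round(math.log2(WORDLIST_SIZE), 1))   # 12.9
print(extra_words(25))                      # 1
print(extra_words(50))                      # 2
```

With a 7776-word list, each random word adds about 12.9 bits, which is why one extra word covers roughly 25 years under this rule.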

Biometric unlock or PIN unlock are fine to use (as long as you don’t disable the option to “lock with master password on restart”), but I would not advise using a “short” PIN (e.g., a 4-digit numerical code). Local threats can be minimized but not completely eliminated. Thus, you can use a PIN that has lower entropy than your master password, but it should still be randomly generated and sufficiently long to provide some meaningful resistance to brute-force cracking (e.g., use a random passphrase of 3-4 words as your PIN on browsers and desktop apps, or a 12-digit numerical code on mobile devices). You may find biometric unlocking to be more convenient.