In the first case, a 24-GB GPU (e.g., the RTX 4090) can use 220 cores at once (assuming 1 thread per core), and in the second case, it can use only 44 cores. If we (simplistically) assume that a single thread of the 1-GiB task takes twice as long to complete as a single thread of the 512-MiB task, then GPU-based cracking should run faster against the first configuration (M=1 GiB, L=10) than against the second configuration (M=512 MiB, L=1) by a factor of (220/2)/44 = 2.5×.
So to answer your question, between your two options above, the 512-MiB case with a parallelism of 1 would be better, because it gives a GPU less of a speed advantage.
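If you want to play with the numbers yourself, the back-of-the-envelope estimate above can be sketched in a few lines of Python. This is only a rough model, not a benchmark: the function names are made up for illustration, and the 22 GiB of usable GPU memory is an assumption chosen because it reproduces the 220 and 44 figures (22 hashes × 10 lanes, and 44 hashes × 1 lane). The "twice as long" timing factor is the same simplistic assumption as in the text.

```python
def concurrent_threads(usable_mib: int, memory_mib: int, lanes: int) -> int:
    """Threads the GPU can keep busy: one hash per `memory_mib`, `lanes` threads per hash."""
    return (usable_mib // memory_mib) * lanes


def relative_speed(threads_a: int, time_a: float, threads_b: int, time_b: float) -> float:
    """Speed of configuration A relative to B: busy threads divided by relative time per thread."""
    return (threads_a / time_a) / (threads_b / time_b)


# Assumption: about 22 GiB of the 24-GB card is actually available for hashing.
USABLE_MIB = 22 * 1024

threads_1 = concurrent_threads(USABLE_MIB, memory_mib=1024, lanes=10)  # M=1 GiB, L=10
threads_2 = concurrent_threads(USABLE_MIB, memory_mib=512, lanes=1)    # M=512 MiB, L=1

# Simplistic assumption from the text: a 1-GiB thread takes twice as long as a 512-MiB thread.
ratio = relative_speed(threads_1, 2.0, threads_2, 1.0)

print(f"Config 1 keeps {threads_1} threads busy, config 2 keeps {threads_2}.")
print(f"Estimated GPU speed against config 1 relative to config 2: {ratio}x")
```

A ratio above 1 means the GPU gets more out of attacking the first configuration, which is why the second one comes out ahead here. Changing the usable memory or the timing factor will shift the result, so treat it as a way to explore the reasoning rather than a prediction.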
To take a step back and look at the big picture, everything that you are doing is far beyond what the typical Bitwarden user is doing with their vault, so you are already way ahead of the game. If you enjoy pushing the limits like this, that’s fine, but it’s best not to overthink everything, especially if it is causing you stress! Password managers are supposed to make your life easier, not more difficult.