While importing from LastPass, I encountered the 10k character max limit. I found the offending records, removed them, and successfully imported the rest.
Here comes the fun part: I then split the offending records into chunks of <= 9,000 characters and attempted to create new records for them individually, but I’m still getting the 10k error when pasting some of them in. I can plainly see each has well under 10k characters (`wc -m`), yet Bitwarden tells me it’s over 10k. Am I missing something?
Out of interest, I created a secure note by repeating the characters 1234567890 until I got the error about “the field note exceeds the maximum encrypted value length of 10,000 characters”.
The maximum BW note size is 7,439 characters (using the method I described)
I then encrypted the same message with GPG/PGP (which I know compresses before encrypting), and the result is just 227 ASCII characters.
That is 227 versus 10,000, both AES-256. Maybe BW should compress before encrypting, to help alleviate the note-size restriction.
password is password LOL
-----BEGIN PGP MESSAGE-----
This is not a bad idea, but your example is misleading:
Because of the way compression algorithms work, repeated patterns compress much more efficiently than other data (for example, I can create a “compressed” representation of the string 1234567890 repeated 750 times, simply by encoding this pattern as 750X1234567890, thus reducing the data size from 7500 bytes to 14 bytes).
Below is a more realistic experiment, using an online text compression tool. In each case, the input text consisted of 7500 characters:
| Text Source | Compressed Size | Compression |
|---|---|---|
| Repeated “1234567890” | 68 bytes | 99% |
| Lorem Ipsum pseudo-Latin | 1,896 bytes | 75% |
| Moby Dick Chapter 1 | 4,824 bytes | 36% |
| Random ASCII characters | 7,724 bytes | −3% |
The Moby Dick example suggests that for English text, it may be possible to store a Secure Note that is up to 12k in length. However, for storing encryption keys and other random data, you would be better off not using any compression (since the compression algorithm actually expanded the data size).
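For anyone who wants to reproduce this, here is a quick sketch using Python’s zlib (a different DEFLATE implementation than the online tool, so the exact byte counts will differ; for the English-prose case, paste in your own 7,500-character sample):

```python
import random
import string
import zlib

def report(label: str, data: bytes) -> None:
    compressed = zlib.compress(data, 9)  # DEFLATE at maximum compression
    saved = 1 - len(compressed) / len(data)
    print(f"{label:<24} {len(compressed):5d} bytes  ({saved:+.0%})")

# 7,500 bytes of highly repetitive data: compresses to almost nothing
report('Repeated "1234567890"', b"1234567890" * 750)

# 7,500 bytes of random printable ASCII: compression *expands* it slightly
report("Random ASCII", "".join(random.choices(string.printable, k=7500)).encode())
```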
Is the character limit for the Bitwarden Login note field still 10,000 characters? (I have not seen anything about the limitation in the FAQ.)
Is it possible to use a bit of Markdown, or is it strictly plain text?
I’m not sure. My best guess is that encrypted strings are usually already quite small (the size of a single field, e.g. a username or password) so compression wouldn’t yield much benefit in exchange for slowing down decryption (which is already quite intensive for large vaults).
I might be wrong on that though, or there might be better ways to do it - this is just how we do it today.
Another reason not to compress before encrypting is that when attempting to compress a string that is random (e.g., a password, or a note containing an encryption key or recovery code), compression algorithms will typically return an output that is larger than the input. For the same reason, it would not be a good idea to compress ciphertext after encrypting.
Just using a better encoding than base64 (Base85, among others) could reduce the overhead slightly, as long as care is taken that the special characters introduced don’t cause issues.
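As a rough sketch of the potential savings (Python; the 9,000 random bytes are just a stand-in for ciphertext):

```python
import base64
import os

ciphertext = os.urandom(9_000)  # stand-in for ~9 kB of encrypted data

b64 = base64.b64encode(ciphertext)
b85 = base64.b85encode(ciphertext)

print(f"raw:    {len(ciphertext):6d} bytes")
print(f"base64: {len(b64):6d} bytes (+{len(b64) / len(ciphertext) - 1:.0%})")
print(f"base85: {len(b85):6d} bytes (+{len(b85) / len(ciphertext) - 1:.0%})")
# base64 expands data by ~33%; Base85 by only ~25%, but its alphabet
# includes characters like { } ; ` that need care in JSON, URLs, and
# shell contexts.
```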
However, compressing plaintext data before encrypting is a dangerous game. If an attacker somehow has control over some of the saved plaintext (imagine a feature where a passkey auto-updates your displayName, or something similar), CRIME-style attacks can allow a chosen-plaintext attacker to recover encrypted data (or even encryption keys) by repeatedly choosing part of the plaintext and observing the compressed-and-encrypted lengths.
With individual field strings this is not as much of a problem, but moving to encrypting larger objects at once (e.g., entire ciphers), this becomes a risk.
@Quexten, that’s incredible, but even after reading the Wikipedia article, it remains difficult to believe in the form it’s presented. Can you elaborate on how compression renders the content easier to estimate? I ask because length should vary fairly randomly, and I cannot see how that would not be nullified by a decent encryption algorithm.
I would presume that using a library makes this quite trivial. Nobody’s going to re-engineer their own better-than-base64 implementation.
To the extent that the attacker knows something about the plaintext (e.g., that it begins with “-----BEGIN PGP MESSAGE-----”, or that it uses only 64 of 256 possible byte values), a brute-force attack can more quickly rule out candidates. This is one of the techniques the Bombe used to break the Enigma machine.
I think compressing before encrypting is generally considered a secure practice.
One argument is that it enhances security: compression algorithms remove patterns and redundancies in the data, and encrypting the already compressed data further obscures the original information, making it more difficult for attackers to analyze.
Compressing plaintext data before encrypting is the approach used by GPG/PGP, and it was successfully used by Ed Snowden to keep his communication secure from the NSA.
Snowden worked at the NSA, so he knew their capabilities, and he considered GPG secure. I think this endorses compressing before encrypting on security grounds.
PS: This topic is about storing larger data in a note field, but compression will only help if the plaintext data is compressible, and that is often not the case.
So we are back to just increasing the data limit (no compression).
It sounds like an easy thing to do, so if BW wouldn’t mind just increasing the limit, that would be great. Thanks.
@DenBesten, I’m familiar with how Enigma was broken. However, that’s irrelevant to compression, for the German communications were uncompressed.
Additionally, I believe that the PGP example you provided can be nullified by communicating the content in a structured manner with randomly ordered key values, instead of mere text/plain with the PGP header and footer syntax.
No, it doesn’t. Snowden is just a security researcher with asylum in the Russian Federation. He’s not particularly competent in comparison to those who design encryption algorithms. Regardless, that’s conjecture; an actual evaluation, like the Wikipedia article cited earlier, is more actionable.
Encryption does not vary length randomly, at least not for AES or ChaCha. For ChaCha20-Poly1305, the length is mapped 1:1 (it is a stream cipher that is XOR’d with the plaintext, and a MAC is calculated on the ciphertext afterwards). For AES in CBC mode, the block size is 16 bytes, so PKCS#7 padding adds between 1 and 16 bytes. But even with padding, once the data crosses a block boundary, the plaintext size leaks via the number of blocks. So, for instance, at the moment, from the encrypted format of a vault, you can tell whether a password is 14 characters (1 block of ciphertext) or 20 characters (2 blocks). More precisely, you know whether the password falls into the interval [0, 15] characters, or [16, 31], and so on. Base64 then additionally adds 33% overhead on top (plus IV plus MAC for the current Bitwarden EncString type 2).
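To make the block-interval leak concrete, here is a small sketch using the pyca/cryptography package (throwaway keys and IVs, not Bitwarden’s actual code):

```python
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def cbc_ciphertext_len(plaintext: bytes) -> int:
    key, iv = os.urandom(32), os.urandom(16)  # AES-256-CBC with a random IV
    padder = padding.PKCS7(128).padder()      # PKCS#7, 16-byte (128-bit) blocks
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return len(encryptor.update(padded) + encryptor.finalize())

for n in (14, 15, 16, 20, 31, 32):
    print(f"{n:2d}-char password -> {cbc_ciphertext_len(b'x' * n):2d} bytes of ciphertext")
# 14-15 chars -> 16 bytes (one block); 16-31 chars -> 32 bytes (two blocks);
# 32 chars -> 48 bytes. The ciphertext length reveals the 16-byte interval,
# even before the IV, MAC, and base64 add their overhead.
```

Incidentally, that fixed overhead also explains the original poster’s puzzle above: if I have the type-2 EncString layout right (`2.` + base64(IV) + `|` + base64(ciphertext) + `|` + base64(MAC)), a 9,000-character note pads to 9,008 bytes of ciphertext and encodes to roughly 12,000 characters, well over the 10,000-character check, while 7,439 characters comes out to 9,992 characters, just under it, matching the maximum measured earlier in the thread.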
> I would presume that using a library makes this quite trivial. Nobody’s going to re-engineer their own better-than-base64 implementation.

> I think compressing before encrypting is generally considered a secure practice.
I do not agree with this, and would like to request a source. CRIME/BREACH-style attacks prove otherwise. Additionally, TLS 1.3 specifically removed compression because it was insecure. From the RFC:
> Other cryptographic improvements were made, including changing the RSA padding to use the RSA Probabilistic Signature Scheme (RSASSA-PSS), and the removal of compression, the Digital Signature Algorithm (DSA), and custom Ephemeral Diffie-Hellman (DHE) groups.
> Snowden worked at the NSA, so he knew their capabilities, and he considered GPG secure. I think this endorses compressing before encrypting on security grounds.
I do not agree with this conclusion. Also, for GPG/PGP, where data is encrypted manually, this might be fine, because an attacker does not have a chosen-plaintext channel. For Bitwarden specifically, I can imagine a few features that could be added that accidentally introduce a chosen-plaintext channel.
With respect to how the attack works: the main idea is that the chosen plaintext in the compressed message is simply brute-forced. If it matches a longer prefix of the secret, it will compress better. Therefore, if the attacker can reliably force the client to encrypt a chosen plaintext + an unknown target secret + other data (irrelevant for the attack), then they can adapt the chosen plaintext, search for the string that leads to the shortest compressed-and-encrypted result, and iteratively reconstruct the secret. If data is just encrypted per-field (as it is now), this is probably not problematic, but again, moving to larger encrypted objects, it becomes more relevant.
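A toy illustration of that length signal, with plain zlib standing in for compress-then-encrypt (a stream cipher preserves the compressed length exactly; the secret and field layout here are made up):

```python
import zlib

SECRET = "authToken=9f8a7c6d5e"  # hypothetical value the attacker wants

def observed_len(attacker_controlled: str) -> int:
    # The victim's client compresses the attacker-controlled text together
    # with the secret, then encrypts; the attacker observes only the length.
    blob = (attacker_controlled + "\n" + SECRET).encode()
    return len(zlib.compress(blob, 9))

print(observed_len("authToken=9f8a7c6d5e"))  # full match: shortest output
print(observed_len("authToken=1234567890"))  # matching prefix: in between
print(observed_len("qwlzmxnckvbhgfdsjtry"))  # no overlap: longest output
```

Repeating that comparison one character at a time is the CRIME/BREACH recovery loop; real attacks need many queries and statistical filtering to deal with noise, but the signal is the same.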