Bitwarden AI Review

treska · December 1, 2025, 7:55pm

Update: A fix for this issue is in the works: https://github.com/bitwarden/clients/pull/17701

Are you seriously using AI for this security-sensitive app?

is nothing safe any more?

grb · December 1, 2025, 8:26pm

Presumably, this is just to perform initial checks before human QA, but hopefully it is not a slippery slope…

marlin · December 1, 2025, 8:32pm

What’s wrong with using AI for programming tasks, as long as the output is still reviewed by a human being?

A few days ago I asked Copilot to review the code of a colleague. It told me that a specific RFC had been implemented but a certain subsection of that RFC had not been applied correctly. I would have never detected that by myself.

dwbit · December 1, 2025, 8:40pm

Hey all, just added a notice to the top of the thread:

While AI tools can assist in maintaining coding standards, identifying potential bugs, and suggesting improvements that align with our established best practices and patterns, code reviews/approvals are performed by Bitwarden team members.

Bitwarden is also part of the HackerOne bug bounty program, and undergoes regular third-party audits.

Neuron5569 · December 1, 2025, 9:25pm

If you are asking whether nothing is safe from LLM hallucinations, biases, and flaws anymore, considering that governments and private entities are incorporating these into systems that could harm you and impact societies on a wide scale, the answer is nothing is safe anymore.

Humans supervising AI outputs? Given the AI output volume and human’s limited knowledge/time/attention, humans are obviously not going to catch all the mistakes. We can only hope that the positives will outweigh the negatives. LLM automation and software engineering? It seems the train has already left the station; we’d better get used to it.

treska · December 2, 2025, 7:27pm

While AI tools can assist in maintaining coding standards, identifying potential bugs, and suggesting improvements that align with our established best practices and patterns, code approvals are performed by Bitwarden team members.

This doesn’t address the fundamental underlying issues. By using these tools you are contributing to the larger systemic issues through validation. They are trained unethically on datasets without the consent or permission of the authors, and they require massive amounts of resources.

So you’ve said yourself here that a colleague made a mistake that you would never have caught. Who’s to say Copilot didn’t also make an error that you didn’t catch? You’ve proven by example that even with human review, errors go undetected, and AI is error prone. And unlike your colleague, whom you can ask directly, you will never fully understand the reasoning behind an AI’s decisions.

How many small bugs have these AI tools already left littered throughout the code base that the marlins of the world have failed to catch? When these small issues begin to compound into larger issues, and you’re not able to ask the AI why such-and-such code was made weeks ago, or months, or a year, because AI does not work that way, what then?

marlin · December 3, 2025, 12:50pm

There is also a cost for every Google query. Should we stop using search engines, too?

The code was talking about was not generated by AI but written by a human being and reviewed by a human being. In this case, AI was just another tool, providing an additional safety-net. What’s wrong with that?

No, I did not. But yes, AI is generating erroneous code all the time, just like humans do.

Again, my example was about code written by a human. Do not twist my words.

grb · December 3, 2025, 3:58pm

Unlike the recent LLM boom, the conventional (non-AI) Google search engine never caused my utility bills to shoot through the roof, though.

Personally, other than the ethical concerns with AI in general, I don’t see such usage causing security risks (unless, as I noted above, this usage doesn’t not become a “gateway drug” to possible future over-reliance on LLM in code review or actual programming).

I’m confused by your push-back. You did give a clear example where your (human) code review missed a problem. And even without this example, it should not be a contentious statement that to err is human.

You may be reading something into @treska’s words that wasn’t intended — I’ve taken what they’ve said as a straightforward extrapolation of the facts previously stipulated:

Human code reviewers miss problems (something you both agree on).
LLM-generated code is not error-free (something you both agree on).
Ergo, if Bitwarden starts using LLMs to generate code at some point in the future, then there could be serious problems.

It is true that you have never suggested (in this thread) that LLMs be used to generate code, but that doesn’t change the fact that some users (e.g., @treska) are concerned that Bitwarden may eventually go this route, nor the fact that some of what you said does illustrate some of the pitfalls of using LLM-generated code.

I address the following to all thread participants:

There’s no escaping that there are going to be (at least) two opposing views on AI/LLM use by Bitwarden (and beyond), and that these views are often going to be strongly held. Therefore, when expressing a viewpoint on this forum — and especially when debating an opposing viewpoint — it is important (in fact, required) to observe the Community Guidelines on respectful and constructive communication.

Let’s all try our best not to escalate the tone of this debate, so that mods don’t have to get involved…

marlin · December 3, 2025, 4:26pm

My example should illustrate that AI is a helpful tool that can improve the software development process if used correctly. I got the feeling that @treska turned this into “Since humans make errors, we should not use AI”.

Yes, both is correct in my opinion.

And this is not the conclusion I would draw here. Human coders and reviewers already make errors that cause serious problems, without any AI usage. But we now have an additional tool that can help here and that’s a good thing.

However, I would not limit this to reviewing. The Copilot has helped me improving my own code more than once and I even learned something new during the process. Many developers I have spoken to would not like to relinquish that now that they are used to it.

grb · December 3, 2025, 5:12pm

Probably, but not without cost.

I think that the difference pointed out by @treska was that humans can communicate with each other and provide explanations for decisions made in the past, something that is not (yet) possible with LLM-generated code.

That still sounds like review (suggesting improvements to your own code).

marlin · December 3, 2025, 5:32pm

Well, it is possible, but that produces funny results sometimes. Real conversation between Copilot and me:

Copilot: “Your code looks good, but you need to improve the event handlers.”
Me: “Which event handlers? I do not see any.”
Copilot: “Oh, you are right. I suggested it earlier but actually there are none in your code.”

Beside that, I found arguing with Copilot quite interesting, especially when probing for best practices and their pros and cons.

Well, it’s a fine line between copy&paste a suggestion or clicking the “Accept” button in “Edit mode”. Most of the time I let Copilot edit my code directly and review the suggestion afterward.

And that’s the crucial step. It happens way too often (unrelated to software development) that generated content is used as-is. But that is the same as just copying results from Stackoverflow without understanding those, just on a larger scale.