AI-generated Code and Code Reviews

Published April 2026

As agentic coding practices become more common and make producing code accessible to a wider audience, many open source projects, even the Linux Kernel itself, are adopting policies that set expectations and limits for patches created with these tools.

So, what is the point of a code review? Code reviews exist to ensure that changes align with the project's goals and coding practices, so that the software improves and remains maintainable. This may seem obvious, but it bears repeating, because both points relate to how AI-generated code tends to differ from human-written code.

So, how do human and AI code contributions differ? The most obvious factor is maintainability. With fully or mostly AI-generated code, there is no human who truly understands it. Even if it was prompted by an experienced programmer, that person would likely need to spend a significant chunk of time getting familiar with the code, roughly the level of familiarity you reach when reviewing a mid-sized change from a colleague. You read it through, feel like you understand what's happening at a high level, check that the tests pass (tests which many open source projects don't even have), maybe give it a try manually, and stamp your approval. This lack of understanding leads to the second point.

Humans make mistakes. These are usually somewhat obvious: conditionals the wrong way around, missing input validation, that sort of thing. LLMs and AI agents make mistakes too, but different ones. By the time it reaches review, generated code will have passed the prompter's own cursory review. The trouble is the subtle bugs and regressions that are unfortunately still common today. Many of these subtle issues are not ones humans commonly make, because they are pattern recognition misses, not simple typos. Think of a regular expression that appears to be doing the right thing. When a human has written it, I have some confidence that it is intended to match the thing it claims to match. When an LLM has produced it, I can't be sure it hasn't pulled some garbage from its training data. Mismatched regexes and similar pattern matchers have been a real problem I've observed in my own exploration. Which leads us to code reviews.
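The regex problem is easy to illustrate with a small, hypothetical example (the pattern and function name here are mine, invented for illustration, not taken from any real patch): a validator that reads plausibly at review time but quietly accepts invalid input.

```python
import re

# Claims to validate ISO 8601 dates (YYYY-MM-DD).
# Looks plausible on a cursory read, but has two subtle flaws:
#   1. no ^...$ anchors, so it matches a date anywhere inside a string
#   2. [01]\d allows months 13-19, and [0-3]\d allows days 32-39
DATE_RE = re.compile(r"\d{4}-[01]\d-[0-3]\d")

def looks_like_iso_date(s: str) -> bool:
    return DATE_RE.search(s) is not None

print(looks_like_iso_date("2026-04-01"))        # True, as expected
print(looks_like_iso_date("2026-19-39"))        # True: month 19, day 39 slip through
print(looks_like_iso_date("oops 2026-04-01;"))  # True: no anchors, substring matches
```

A quick skim of this pattern suggests it does what the comment says, which is exactly why this class of bug survives a cursory review: catching it requires reading the character classes one token at a time, not pattern-matching on the overall shape.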

AI disclosure is important for reviews today, because AI-generated code needs deeper scrutiny. I do not see AI code as inherently worse. The point is that reviewing it needs a different lens, one more focused on the subtleties.

From a maintainability point of view, it is fair for projects to decline code without a human claiming and demonstrating expertise in the subject matter. Many projects already die under the weight of technical debt, and the problem becomes far worse if no human knows the code. The more specific an implementation becomes, the more likely it is that an AI agent confuses it with a similar niche that has completely different requirements. This leads to calls to APIs that don't exist and reasoning chains that don't match the project or subject matter at all, and yet the agent can still output code that seems correct, even works under normal conditions, but falls apart as soon as a real user touches it.