The variable you’re missing is time. There was a big shift in quality by Christmas, and the latest models are much better programmers than models from one year ago. The quality is improving so fast that most people still think of AI as a “slop generator” when it can actually write good code and find real bugs and security issues now.
As someone who has to sift through other people’s LLM code every day at my job, I can confirm it has definitely not gotten better in the past three months.
We require you to submit a markdown plan before working on a feature, which must have full context, scope, and implementation details. We also require a verification tests markdown file covering the happy path and the critical failure modes that would affect the customer, plus how the tests were performed. Both must be checked in with the commit.
If your plan or verification docs have the wrong context, miss obvious implementation flaws, have bad coupling, architecture, interfaces, or boundary conditions, lack test cases, etc., then the PR gets rejected.
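For anyone who wants to automate the boring part of a gate like this, here is a minimal sketch of the structural check only. The file names (PLAN.md, VERIFICATION.md), the heading names, and both function names are assumptions made up for illustration, not the actual process described above. It only confirms that the docs exist and have the expected sections; judging wrong context, bad coupling, or missing test cases still needs a human reviewer.

```python
import re

# Hypothetical heading names, loosely based on the requirements described above.
PLAN_SECTIONS = ("Context", "Scope", "Implementation details")
VERIFICATION_SECTIONS = (
    "Happy path",
    "Critical failure modes",
    "How tests were performed",
)


def missing_sections(markdown_text, required):
    """Return the required headings that do not appear in the markdown text."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#+[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]


def check_commit(files):
    """Check a commit given as a dict mapping file path to text content.

    Returns a list of human-readable problems. An empty list means the
    commit passes this purely structural check.
    """
    problems = []
    # Hypothetical file names; adjust to whatever the team actually uses.
    for path, required in (
        ("PLAN.md", PLAN_SECTIONS),
        ("VERIFICATION.md", VERIFICATION_SECTIONS),
    ):
        if path not in files:
            problems.append(f"{path} is not checked in with the commit")
            continue
        for name in missing_sections(files[path], required):
            problems.append(f"{path} is missing a '{name}' section")
    return problems
```

Calling check_commit with the files from a commit gives back a list of problems, so a CI step or hook could reject the PR whenever that list is non-empty.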
That’s the thing though. Even if the code is good, the plans are good, the outputs are good, etc., it still devolves into chaos after some time.
If you use AI to generate a bunch of code, you don’t internalize it as if you wrote it. You miss out on reuse patterns and implementation details, which are harder to catch in review than they are in implementation. Additionally, you don’t have anyone who knows the code like the back of their hand because (even if supervised) a person didn’t write the code; they just looked over it for correctness and maybe modified it a little bit.
It’s the same reason why handwritten notes can sometimes be better for learning than typed notes. Yeah, one is faster, but the intentionality of slowing down and paying attention to little details goes a long way toward making code last longer.
There’s maybe something to be said for using LLMs as a sort of sanity-check code reviewer to catch minor mistakes before passing the code on to a real human for actual review, but I definitely see it as harmful for anything actually “generative”.
That’s a good point. We’ve been using the UML diagrams as a tool to catch behavioral red flags, but the reuse and implementation details of that are left undefined.
Maybe the answer lies in also explicitly spending a few passes focusing on code health, explainability, and maintainability. This is something I go through at the end before rerunning the verification tests, but not something we explicitly require in our process at the moment.
How do you manage?
The other missing variable is actually knowing how to use the tools. Vibe coding still produces slop. Good AI-generated code requires understanding what you’re trying to achieve and giving the AI clear context on what design paradigms to follow, what libraries to use and so on. Basically, if you know how to write good code without AI, it can help you to do so faster. If you don’t, it’ll help you to write slop faster. Garbage in, garbage out.
This is a good answer. AI tools won’t make someone who has not yet developed programming skills into a good programmer. But if you have a good grasp of implementation patterns and the toolkit for a given tech stack, they can speed things up by putting you into the role of a senior programmer reviewing code from multiple newbies.
I’m finding that for it to work well, you have to split things up into very small pieces. You also have to really own your AI automation prompts and scripts. You can’t just copy what some YouTuber did and expect it to work well in your environment.
I used to feel the same way, but I’ve come to realize it’s slop that just looks better on the surface, not slop that is actually better.
At least it compiles most of the time now. But it’s never quite right… Every time I have Claude write some section of code, 6 more things spring up that need to be fixed in the new code. Never-ending cycle. On the surface the code appears more readable, but it’s not.