A company not making self-serving predictions & studies.
From the paper abstract:
[…] Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI.
We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library.
We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation – particularly in safety-critical domains.
The wording is very, very self-serving tho.
Yep, they are selling language models, but they are not pretending medical doctors will be out of work next week like OpenAI is doing.
Anthropic may avoid saying the dumb things OpenAI says, but do not mistake that for being a better company/product. Amodei is still out to eliminate all jobs and has a history of being just as self-serving as Altman.
I 100% agree with you and would love to see Anthropic burn (same as OpenAI and all other big tech)
Interesting read and feels intuitively plausible. Also matches my growing personal sense that people are using these things wildly differently and having completely different outcomes as a result. Some other random disconnected thoughts:
1. I’m surprised they’re publishing this; it seems to me like a pretty stark condemnation of the technology. What benefits did they anticipate that made them decide this should be published, vs. quietly kept aside “pending further research”? Obviously people knowing how to use the tools better is good for longevity, but that’s just not what our idiotic investment cycles prioritize.
2. I’m no scientist or expert in experimental design, but this seems like way too few people for the level of detail in the conclusions they’re drawing. That, plus the way it all just feels intuitively plausible, gives the interpretation a very “just so” feeling rather than true exploration. I mean, c’mon - the behavioral buckets they’re talking about range from 2-7 people apiece, most commonly just 4 individuals. “Four junior engineers behaved kinda like this and had that average outcome” MIGHT reflect a broader pattern, but it sure doesn’t feel compelling or scientific.
Nonetheless I selfishly enjoyed having my own vague subconscious observations validated lol, would like to see more of this (and anything else that seems to work against the crazy bubble being inflated).
For 1: as a software company, they have a vested interest in ensuring that software engineers are as capable as possible. I don’t know if Anthropic uses this as a guiding principle, but some companies certainly do (e.g. Jane Street). So they might see this as more important than investment cycles.
The quality of software engineers and computer scientists I’ve seen coming out of undergraduate programs in the last year has been astonishingly poor compared to 2-3 years ago. I think it’s almost guaranteed that the larger companies have also noticed this.
@idriss Seems predictable to me. Programmers on the left or middle of some distribution of “good” programmers or engineers will use AI and be comfortable just having completed some task. Those on the right of the distribution may or may not use AI, but will insist on understanding what has been created.
Now, an interesting question for me, unrelated to the post, is “what would be a good metric to identify really good programmers?”

@troi@techhub.social tbh I could see people being considered good programmers in one place but not in another (just prompting to get things done with minimum effort & reserving the effort for something else). It probably comes back to interest & care: how much the person is interested in iterating over their solution & architecture, and in learning things regardless of seniority level, to achieve a higher-level goal (a simpler design, for example, rather than stopping when it works). Maybe that could be an indication of a good programmer?
@idriss makes sense. The 80-20 rule might apply here: a good programmer knows where to spend their time. I’ve been kicking this around with an old boss and we don’t have any firm ideas. A metric should be quantifiable, but your interest & care gets into self-actualization. Maybe a version of Maslow’s hierarchy of needs for software developers?
I am also thinking the word “good” was a bad choice. It’s too subjective and has a negative implication for anyone on the left side of the bell curve. Competent programmers are a thing, and I suspect they actually keep most things running smoothly.
I wish I had my old copy of Weinberg’s _The Psychology of Computer Programming_. It’s been decades since I read it so I don’t recall if it addressed this sort of question, but it might suggest something.
Why are like 70% of the posts in this comm about AI lately?? I’m out of here…
In a randomized controlled trial, we examined 1) how quickly software developers picked up a new skill (in this case, a Python library) with and without AI assistance; and 2) whether using AI made them less likely to understand the code they’d just written.
We found that using AI assistance led to a statistically significant decrease in mastery. On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades. Using AI sped up the task slightly, but this didn’t reach the threshold of statistical significance.
Who designed this study? I assume it wasn’t a software engineer, because this doesn’t reflect real-world “coding skills”. This is just a programming-flavored memory test. Obviously the people who coded by hand remembered more about the library, in the same way that students who take notes by hand, rather than typing, tend to remember more.
A proper study would need to evaluate critical thinking and problem-solving skills using real-world software engineering tasks. Maybe find some already-solved but obscure bug in an open-source project and have them try to solve it in a controlled environment (so they don’t just find the existing solution).
The study is about the impact AI use has on learning. Their experiment seems to test just that, unlike what you’re describing.
Besides, remembering what you did an hour ago seems like a real-world problem to me. Unless one manages to switch projects before the bug reports come in.
The study is about the impact AI use has on learning. Their experiment seems to test just that, unlike what you’re describing.
The title is literally “How AI assistance impacts the formation of coding skills”. Memorizing APIs isn’t what most people would consider a “coding skill”.
Debugging, systems design, optimization, research and evaluation, etc. are what actually make someone a useful engineer, and they are the skills a person develops as they go from junior to senior. Even domain knowledge (like knowing a lot about farming if you’re working on farming software) is more useful than memorizing the API of any framework. The only thing memorization does is save you a few minutes of reading docs - minimal impact, and something you pick up naturally over the course of working on a project anyway. When you finish that project, you might never use that API again, or if you do, it might have changed completely in a new version.
remembering what you did an hour ago seems like a real-world problem to me.
Sure, humans have shitty memory, but that has nothing to do with AI code assistance. There are plenty of non-AI coding assistants that help with this (like IntelliSense/LSP autocomplete, which has been around for decades).