Does AI actually help students learn? A recent experiment in a high school provides a cautionary tale.
Researchers at the University of Pennsylvania found that Turkish high school students who had access to ChatGPT while doing practice math problems did worse on a math test compared with students who didn’t have access to ChatGPT. Those with ChatGPT solved 48 percent more of the practice problems correctly, but they ultimately scored 17 percent worse on a test of the topic that the students were learning.
A third group of students had access to a revised version of ChatGPT that functioned more like a tutor. This chatbot was programmed to provide hints without directly divulging the answer. The students who used it did spectacularly better on the practice problems, solving 127 percent more of them correctly compared with students who did their practice work without any high-tech aids. But on a test afterwards, these AI-tutored students did no better. Students who just did their practice problems the old-fashioned way — on their own — matched their test scores.
This isn’t a new issue. Wolfram Alpha has been around for 15 years and can easily handle high-school-level math problems.
Except Wolfram Alpha is able to correctly explain step-by-step solutions, which was an aid in my education.
Only old farts still use Wolfram
Where did you think you were?
What do young idiots use?
ChatGPT apparently lol
I can’t remember, but my dad said before he retired he would just pirate Wolfram because he was too old to bother learning whatever they were using. He spent 25 years in academia teaching graduate chem-e before moving to the private sector. He very briefly worked with one of the Wolfram founders at UIUC.
Edit: I’m thinking of Mathematica, he didn’t want to mess with learning python.
I don’t even know if this is ChatGPT’s fault. This would be the same outcome if someone just gave them the answers to a study packet. Yes, they’ll have the answers because someone (or something) gave them to them, but they won’t know how to get those answers without being taught. Surprise: for kids to learn, they need to be taught. Shocker.
I’ve found ChatGPT to be a great learning aid. You just don’t use it to jump straight to the answers; you use it to explore the gaps and edges of what you know or understand. Add context and details, not final answers.
The study shows that once you remove the LLM though, the benefit disappears. If you rely on an LLM to help break things down or add context and details, you don’t learn those skills on your own.
I used it to learn some coding, but without using it again, I couldn’t replicate my own code. It’s a struggle, but I don’t think using it as a teaching aid is a good idea yet, maybe ever.
I wouldn’t say this matches my experience. I’ve used LLMs to improve my understanding of a topic I’m already skilled in, and I’m just looking to understand something nuanced. Being able to interrogate on a very specific question that I can appreciate the answer to is really useful and definitely sticks with me beyond the chat.
There are lots of studies out there, and many of them contradict each other. Having a study with references contributes to the discussion, but it isn’t the absolute truth.
Not limited to kids.
Kids who use ChatGPT as a study assistant do worse on tests
But on a test afterwards, these AI-tutored students did no better. Students who just did their practice problems the old-fashioned way — on their own — matched their test scores
Headline: People who flip coins have a much worse chance of calling it if they call heads!
Text: Studies show that people who call heads when flipping coins have an even chance of getting it right compared to people who do it the old-fashioned way of calling tails.
You skipped the paragraph where they used two different versions of LLMs in the study. The first statement is regarding generic ChatGPT. The second statement is regarding an LLM designed to be a tutor without directly giving answers.
I didn’t skip it. If you are going to use a tool, use it right. “Study shows using the larger plastic end of screwdriver makes it harder to turn screws than just using fingers to twist them. Researchers caution against using screwdriver to turn screws.”
That’s not the fault of the users/students, though. They’re different tools. One is outright worse than not using it. Neither produces lasting benefits.
Headline: Screwdrivers better than hammers for screws.
Text: When craftspeople were trained using hammers with screwdriver bits duct-taped to them, they were able to perform the task, but were not able to keep pace with people using screwdrivers. Another team was given power drills, which were effective in practice. However, these did not produce any benefit once all people were given screwdrivers.
That’s a modified version; it says using unmodified ChatGPT results in 17% worse scores.
TL;DR: ChatGPT is terrible at math, and most students just ask it for the answer. Giving students the ability to ask something that doesn’t know math for the answer makes them less capable. An enhanced chatbot which was pre-fed with questions and correct answers didn’t screw up the learning process in the same fashion, but also didn’t help them perform any better on the test because, again, they just asked it to spoon-feed them the answer.
references
ChatGPT’s errors also may have been a contributing factor. The chatbot only answered the math problems correctly half of the time. Its arithmetic computations were wrong 8 percent of the time, but the bigger problem was that its step-by-step approach for how to solve a problem was wrong 42 percent of the time.
The tutoring version of ChatGPT was directly fed the correct solutions and these errors were minimized.
The researchers believe the problem is that students are using the chatbot as a “crutch.” When they analyzed the questions that students typed into ChatGPT, students often simply asked for the answer.
Would kids do better if the AI doesn’t hallucinate?
Would snails be happier if it kept raining? What can we do to make it rain forever and all time?
Paradoxically, they would probably do better if the AI hallucinated more. When you realize your tutor is capable of making mistakes, you can’t just blindly follow their process; you have to analyze and verify their work, which forces a more complete understanding of the concept, and some insight into what errors can occur and how they might affect outcomes.
There’s a bunch of websites that give you the answers to most homework. You can just Google the question and find the answers pretty quickly. I assume the people using ChatGPT to “study” are just cheating on homework anyway.
Taking too many shortcuts doesn’t help anyone learn anything.
At work we gave a 16/17-year-old work experience over the summer. He was using ChatGPT and not understanding the code it was outputting.
In his last week he asked why he was doing a print statement, something like
print (f"message {thing} ")
I’m afraid to ask, but what’s wrong with that line? In the right context that’s fine to do, no?
There is nothing wrong with it. He just didn’t know what it meant after using it for a little over a month.
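For anyone else wondering: the `f` prefix just makes it a formatted string literal (an f-string), which evaluates the expressions in braces and substitutes them into the string. A minimal sketch, with `thing` as a stand-in variable name from the snippet above:

```python
# The f prefix marks a formatted string literal (f-string):
# expressions inside {braces} are evaluated and inserted into the string.
thing = "hello"
print(f"message {thing}")   # prints: message hello
print("message {thing}")    # without the f, no substitution: message {thing}

# Any expression works inside the braces, including format specs:
print(f"2 + 2 = {2 + 2}")   # prints: 2 + 2 = 4
print(f"{3.14159:.2f}")     # prints: 3.14
```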
Sounds like operator error, because he could have asked ChatGPT and gotten the correct answer about Python f-strings…
It all depends on how and what you ask it, plus an element of randomness. Remember that it’s essentially a massive text predictor. The same question asked in different ways can lead it into predicting text based on different conversations it trained on. There’s a ton of people talking about python, some know it well, others not as well. And the LLM can end up giving some kind of hybrid of multiple other answers.
It doesn’t understand anything, it’s just built a massive network of correlations such that if you type “Python”, it will “want” to “talk” about scripting or snakes (just tried it, it preferred the scripting language, even when I said “snake”, it asked me if I wanted help implementing the snake game in Python 😂).
So it is very possible for it to give accurate responses sometimes and wildly different responses in other times. Like with the African countries that start with “K” question, I’ve seen reasonable responses and meme ones. It’s even said there are none while also acknowledging Kenya in the same response.
Students first need to learn to:
- Break down the line of code, then
- Ask the right questions
The student in question probably didn’t develop the mental faculties required to think, “Hmm… what the ‘f’?”
A similar thingy happened to me when I had to teach a BTech grad with 2 years of prior experience. At first, I found it hard to believe how someone couldn’t ask such questions of themselves, by themselves. I am repeatedly dumbfounded at how someone manages to be so ignorant of something they are typing, and have recently realised (after interacting with multiple such people) that this is actually the norm[1].
and that I am the weirdo for trying hard and visualising the C++ abstract machine in my mind ↩︎
No. Printing statements, using console inputs and building little games like tic tac toe and crosswords isn’t the right way to learn Computer Science. It is the way things are currently done, but you learn much more through open source code and trying to build useful things yourself. I would never go back to doing those little chores to get a grade.
I would never go back to doing those little chores to get a grade.
So either you have finished obtaining all the academic certifications that require said chores, or you are going to fail at getting a grade.
Perhaps unsurprisingly. Any sort of “assistance” with answers will do that.
Students have to learn why things work the way they do, and they won’t be able to grasp it without going ahead and doing every piece manually.
Kids using an AI system trained on edgelord Reddit posts aren’t doing well on tests?
Ya don’t say.
ChatGPT lies, which is kind of an issue in education.
As far as seeing the answer, I learned a significant amount of math by looking at the answer for a type of question and working backwards. That’s not the issue as long as you’re honestly trying to understand the process.
While I get that, AI could be handy for some subjects that you won’t stake your future on. However, using it extensively for everything is quite an exaggeration.
Shocked, I tell you!
Of all the students in the world, they pick ones from a “Turkish high school”. Any clear indication why there of all places when conducted by a US university?
The paper only says it’s a collaboration. It’s pretty large scale, so the opportunity might be rare. There’s a chance that (the same or other) researchers will follow up and experiment in more schools.
If I had access to ChatGPT during my college years and it helped me parse things I didn’t fully understand from the texts or provided much-needed context for what I was studying, I would’ve done much better having integrated my learning. That’s one of the areas where ChatGPT shines. I only got there on my way out. But math problems? Ugh.
When you automate these processes you lose the experience. I wouldn’t be surprised if you couldn’t parse information as well as you can now, had you had access to ChatGPT back then.
It’s hard to get better at solving your problems if something else does it for you.
Also, the reliability of these systems is poor, and they’re specifically trained to produce output that appears correct, not output that actually is correct.
I read that comment, and use it similarly, as more a super-dictionary/encyclopedia in the same way I’d watch supplementary YouTube videos to enhance my understanding. Rather than automating the understanding process.
More like having a tutor who you ask all the too-stupid and too-hard questions to, who never gets tired or fed up with you.
Exactly this! That is why I always have at least one instance of an AI chatbot running when I am coding, or, better said, analysing code for debugging.
It makes it possible to debug kernel stuff without much prior knowledge, if you are proficient in prompting your questions. Well, it did work for me.
I quickly learned how ChatGPT works so I’m aware of its limitations. And since I’m talking about university students, I’m fairly sure those smart cookies can figure it out themselves. The thing is, studying the biological sciences requires you to understand other subjects you haven’t learned yet, and having someone explain how that fits into the overall picture puts you way ahead of the curve because you start integrating knowledge earlier. You only get that from retrospection once you’ve passed all your classes and have a panoramic view of the field, which, in my opinion, is too late for excellent grades. This is why I think having parents with degrees in a related field or personal tutors gives an incredibly unfair advantage to anyone in college. That’s what ChatGPT gives you for free. Your parents and the tutors will also make mistakes, but that doesn’t take away the value which is also true for the AIs.
And regarding the output that appears correct, some tools help mitigate that. I’ve used the Consensus plugin to some degree and think it’s fairly accurate for resolving some questions based on research. What’s more useful is that it’ll cite the paper directly so you can learn more instead of relying on ChatGPT alone. It’s a great tool I wish I had that would’ve saved me so much time to focus on other more important things instead of going down the list of fruitless search results with a million tabs open.
One thing I will agree with you on is learning how to use Google Scholar and Google Books and ~~pirating books~~ using the library to find the exact information as it appears in the textbooks to answer homework questions, which I did meticulously, down to the paragraph. But only I did that. Everybody else copied their homework, so at least in my university it was a personal choice how far you wanted to take those skills. So now, instead of your peers giving you the answers, it’s ChatGPT. So my question is, are we really losing anything?

Overall I think other skills need honing today, particularly verifying information, together with critical thinking, which is always relevant. And the former is only hard because it’s tedious work, honestly.
The names of the authors suggest there could be a cultural link somewhere.
Ah thanks, that does appear to be the case.
I’m guessing there was a previous connection with some of the study authors.
I skimmed the paper, and I didn’t see it mention language. I’d be more interested to know if they were using ChatGPT in English or Turkish, and how that would affect performance, since I assume the model is trained on significantly more English language data than Turkish.
GPTs are designed with translation in mind, so I could see it being extremely useful in providing me instruction on a topic in a non-English native language.
But they haven’t been around long enough for the novelty factor to wear off.
It’s like computers in the 1980s… people played Oregon Trail on them, but they didn’t really help much with general education.
Fast forward to today, and computers are the core of many facets of education, allowing students to learn knowledge and skills that they’d otherwise have no access to.
GPTs will eventually go the same way.
The study was done in Turkey, probably because students are for sale and have no rights.
It doesn’t matter though. They could pick any weird, tiny sample and do another meaningless study. It would still get hyped and they would still get funding.