- cross-posted to:
- pcgaming@lemmy.ca
A user asked on the official Lutris GitHub two weeks ago “is lutris slop now” and noted an increasing number of “LLM generated commits”. The Lutris creator replied:
It’s only slop if you don’t know what you’re doing and/or are using low quality tools. But I have over 30 years of programming experience and use the best tool currently available. It was tremendously helpful in helping me catch up with everything I wasn’t able to do last year because of health issues / depression.
There are massive issues with AI tech, but those are caused by our current capitalist culture, not the tools themselves. In many ways, it couldn’t have been implemented in a worse way, but it was not AI that bought all the RAM, it was OpenAI. It was not AI that stole copyrighted content, it was Facebook. It wasn’t AI that laid off thousands of employees, it’s deluded executives who don’t understand that this tool is an augmentation, not a replacement for humans.
I’m not a big fan of having to pay a monthly sub to Anthropic, I don’t like depending on cloud services. But a few months ago (and I was pretty much at my lowest back then, barely able to do anything), I realized that this stuff was starting to do a competent job and was very valuable. And at least I’m not paying Google, Facebook, OpenAI or some company that cooperates with the US army.
Anyway, I was suspecting that this “issue” might come up so I’ve removed the Claude co-authorship from the commits a few days ago. So good luck figuring out what’s generated and what is not. Whether or not I use Claude is not going to change society, this requires changes at a deeper level, and we all know that nothing is going to improve with the current US administration.


I tried fitting AI into my workflow just as an experiment and failed. It’ll frequently reference APIs that don’t even exist, or over-engineer the shit out of something that could be written in just a few lines of code. Often it would be a combo of the two.
You might genuinely be using it wrong.
At work we have a big push to use Claude, but as a tool and not a developer replacement. And it’s working pretty damn well when properly set up.
Mostly using Claude Sonnet 4.6 with Claude Code. It’s important to run /init and check the output; that produces a CLAUDE.md file describing your project, which always gets added to your context.
Important: Review everything the AI writes; this is not a hands-off process. For bigger changes, use the planning mode and split tasks up: the smaller the task, the better the output.
Claude Code automatically uses subagents to fetch information, e.g. API documentation. Nowadays it’s extremely rare that it hallucinates something that doesn’t exist. It might use outdated info and need a nudge, like after the recent upgrade to .NET 10 (but just adding that info to the project context file is enough).
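To make the context-file idea concrete, here’s a minimal sketch of what a CLAUDE.md might contain; the project details below are invented for illustration, and the real file is generated by /init from your actual codebase:

```markdown
# CLAUDE.md

## Project
Hypothetical .NET 10 web API (example stack, not a real project).

## Build & test
- Build: `dotnet build`
- Test: `dotnet test`

## Notes for the assistant
- Target framework is .NET 10; don't suggest APIs from older releases
  without checking they still exist.
- Keep changes small and reviewable; one task per change.
```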
Agreed, I don’t understand people not even giving it a chance. They try it for five minutes, it doesn’t do exactly what they want, they give up on it, and shout how shit it is.
Meanwhile, I put the work in, see it do amazing shit after figuring out the basics of how the tech works, write rules and skills for it, have it figure out complex problems, etc.
It’s like handing your 90-year-old grandpa the Internet, and they don’t know what the fuck to do with it. It’s so infuriating.
Probably because, like your 90-year-old grandpa with the Internet, you have to know how to use the search engine. You have to know how to communicate ideas to an LLM, in detail, with fucking context, not just “me needs problem solvey, go do fix thing!”
It’s not really that simple. Yes, it’s a great tool when it works, but in the end it boils down to being a text prediction machine.
So a nice helper to throw shit at, but I trust the output as much as a random Stackoverflow reply with no votes :)
I feel like there needs to be a dedicated post (and I don’t want to write it, but maybe I eventually will) that outlines what a model really is. It is not just a statistical text prediction machine unless you are being so loose with the definition of “statistical” that it doesn’t even mean anything anymore.
A decent example of a statistical text prediction machine is the middle word suggested by your phone when you’re using the keyboard. An LLM is not that.
In the most general terms, this kind of language model tokenizes a corpus of text based on a vocabulary (which is probably more than just the words in the dictionary), then uses an embedding model to translate those tokens into vectors of semantic “meaning” that minimize loss under a bidirectional encoding (probably). That model is then trained against a rubric for one or more topic-area questions, retrained for instruction-following and explainability, retrained with reinforcement learning from human feedback to provide guardrails, and retrained again to make use of supplemental materials not in the original training corpus (retrieval-augmented generation). It’s then distilled, then probably scaled and fine-tuned against topic areas of choice (like coding or Korean or whatever), and maybe THEN made available to people to use. There are generally more parts to curriculum learning even than that, but it’s a representative-ish start.
My point being that, yes, it would be nuts to pose ANY question to a predictor that says “with 84% probability, the word most likely to follow ‘I really like’ is ‘gooning’ on reddit”, but even Grok is wildly more sophisticated than that, and Grok is terrible.
Edit: And also I really like your take at the start of this thread: user error is a pretty huge problem in this space.
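For contrast, here is a toy sketch (in Python, over a made-up two-sentence corpus) of the kind of bigram lookup table that phone-keyboard suggestions are usually described as. It has no notion of context beyond the single previous word, which is exactly what an LLM is not:

```python
# Toy bigram "phone keyboard" predictor: pure lookup statistics,
# no context beyond the immediately preceding word.
from collections import Counter, defaultdict

corpus = "the capital of austria is vienna . the capital of france is paris ."

counts: defaultdict[str, Counter] = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    counts[prev][nxt] += 1

def predict(prev_word: str) -> str:
    # Return the most frequent follower of prev_word in the corpus.
    followers = counts[prev_word]
    return followers.most_common(1)[0][0] if followers else "?"

print(predict("capital"))  # -> "of"
print(predict("is"))       # -> "vienna" (ties broken by first occurrence)
```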
The training is sophisticated, but inference really is a text prediction machine. Technically token prediction, but you get the idea.
It works that way for every single token/word. You input your system prompt, context, and user input, then the output starts:
“The”
Feed the entire context back in with the reply “The” appended.
“The capital”
Feed everything in again with “The capital”.
“The capital of”
Feed everything in again…
“The capital of Austria”
…
It literally works like that, which sounds crazy :)
The only control you as a user have is the sampling: temperature, top-k and so on. But that just softens and randomizes an otherwise deterministic pick.
Edit: I should add that tool and subagent use makes this approach a bit more powerful nowadays. But it all boils down to text prediction again. Even the tools are described to the model in plain text.
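As a rough sketch of that loop, assuming a hypothetical `logits_fn` standing in for the model’s forward pass (temperature and top-k shown as described above):

```python
# Minimal autoregressive decoding sketch with temperature and top-k
# sampling. logits_fn is a stand-in for a real model's forward pass.
import numpy as np

def sample_next(logits: np.ndarray, temperature: float = 0.8, top_k: int = 40) -> int:
    top = np.argsort(logits)[-top_k:]      # keep the k most likely tokens
    scaled = logits[top] / temperature     # temperature rescales confidence
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(top, p=probs))

def generate(logits_fn, prompt_tokens: list[int], max_new: int, eos: int) -> list[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        # The whole sequence goes back in on every step, exactly as
        # described above (real engines cache attention state, so the
        # recomputation is cheaper than it looks).
        tokens.append(sample_next(logits_fn(tokens)))
        if tokens[-1] == eos:
            break
    return tokens
```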
Unless that’s how people are designing front ends for models, it literally DOESN’T work like that. It works like that while you’re training an embedding model on masking-related tasks, but that’s the tip of the iceberg. The input, after being tokenized, is ingested wholesale. There’s sometimes funny business to manage the size of a context window effectively, but that isn’t it unless you’re home-rolling and caching your own inputs or something before you hand them to the model.
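For what it’s worth, the two descriptions can be reconciled. Here’s illustrative pseudocode (the `model.forward` signature and its cache are hypothetical) of how inference engines typically split the work into a single prefill pass over the whole prompt, followed by a token-by-token decode loop:

```python
# Illustrative split between "prefill" and "decode" in a typical
# inference engine. model.forward and its cache are hypothetical.

def generate(model, prompt_tokens: list[int], max_new: int) -> list[int]:
    # Prefill: the entire tokenized input is processed in one batched
    # forward pass -- it is not re-read word by word.
    logits, cache = model.forward(prompt_tokens, cache=None)
    out: list[int] = []
    for _ in range(max_new):
        next_tok = int(logits[-1].argmax())  # greedy pick, for simplicity
        out.append(next_tok)
        # Decode: only the new token is fed in; attention over earlier
        # tokens is reused from the cache instead of being recomputed.
        logits, cache = model.forward([next_tok], cache=cache)
    return out
```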
And we’re barely smarter than a bunch of monkeys throwing piles of shit at each other. Being reductive about its origins doesn’t really explain anything.
Yeah, but that’s why there are unit tests. Let it run its own tests and solve its own bugs. How many mistakes have you or I made because we hate writing unit tests? At least the LLM has no problem writing the tests, once you know the code works.
Most people on Lemmy probably haven’t given it a single minute let alone 5 minutes.
At a minimum, the agent should be compiling the code and running tests before handing things back to you. “It references non-existent APIs” isn’t a problem with a modern setup.
Yeah, I mean, it’s not like AI can think. It’s just a glorified text predictor, the same as you have on your phone keyboard.
It’s like having an idiot employee that works for free. Depending on how you manage them, that employee can either do work to benefit you or just get in your way.
Only it’s not free. If you run it in the cloud, it’s heavily subsidized and proactively destroying the planet, and if you run it at home, you’re still using a lot of increasingly unaffordable power, and if you want something smarter than the average American politician, the upfront investment is still very significant.
Yeah I’m not buying the “proactively destroying the planet” angle. I’d imagine there’s a lot of misinformation around AI, given that the products surrounding it are mostly Western, like vaccines…
Vaccines are misinformation? What.
Not even free, just cheaper than an actual employee for now. But greed is inevitable and AI is computationally expensive; it’s only a matter of time before these AI companies start cranking up the prices.
I had the same experience. I asked a local LLM about using only Qt Wayland stuff for keyboard input, and the only documentation was the official one (which wasn’t a lot for a noob), there were no examples of it being used online, and all my attempts at making it work had failed. It hallucinated some functions that didn’t exist, even when I let it do web search (NOT via my browser). This was a few years ago.
That’s 50 years in LLM terms. You might as well have been banging two rocks together.
The symptoms you describe are caused by bad prompting. If an AI is providing over-complicated solutions, 9 times out of 10 it’s because you didn’t constrain your problem enough. If it’s referencing tools that don’t exist, then either you haven’t specified which tools are acceptable or you haven’t provided the context required for it to find them. You may also be expecting too much out of AI. You can’t expect it to do everything for you. You still have to do almost all the thinking and engineering if you want a quality project; the AI is just there to write the code. Sure, you can use an AI to help you learn how to be a better engineer, but AIs typically don’t make good high-level decisions. Treat AI like an intern, not like a principal engineer.
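As an illustration of what “constrained” means here, a prompt might spell out scope, allowed tools, and a verification step (the file and class names below are invented):

```text
Refactor src/session.py so SessionStore stops leaking file handles.
Constraints:
- Use only the standard library; do not add dependencies.
- Keep the public API of SessionStore unchanged.
- Run `pytest tests/test_session.py` and make it pass before finishing.
```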
“it’s your fault that it just made up tools that don’t exist” is a bold statement, bro.
No, it’s not. It doesn’t have intention. It’s literally just a tool. If you don’t get the results you expect with a tool when other people do get those results, then the problem isn’t the tool.
If the tool can’t be consistent in its output, it’s not a reliable or worthwhile tool to use.
There is such a thing as a bad tool.
“It can’t be that stupid, you must be prompting it wrong.”
It’s not about stupid or smart. It’s a tool, not a person. If you don’t get the same results that other people get with the same tool, then what could possibly be the problem other than how the person is using the tool?