

And IMO… your 3080 is good. It’s very well supported. It’s kinda hard to upgrade, in fact, as realistically you’re either looking at a 4090 or a used 3090.


Oh no, you got it backwards. The software is everything, and ollama is awful. It’s enshittifying: don’t touch it with a 10 foot pole.
Speeds are basically limited by CPU RAM bandwidth. Hence you want to be careful doubling up RAM: adding DIMMs can cut the maximum supported RAM speed (and hence your inference speed).
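Back-of-envelope, you can estimate the bandwidth ceiling yourself. The numbers below are illustrative assumptions (GLM-4.5-Air's ~12B active parameters, a ~4.5 bpw quant, ~60 GB/s dual-channel DDR5), not measurements:

```python
# Hybrid MoE inference is roughly bound by how fast the active expert
# weights stream out of CPU RAM each token.
def tokens_per_sec(ram_bandwidth_gbs, active_params_b, bits_per_weight):
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return ram_bandwidth_gbs * 1e9 / bytes_per_token

# ~12B active params at ~4.5 bpw over ~60 GB/s dual-channel DDR5
print(round(tokens_per_sec(60, 12, 4.5), 1))  # -> 8.9
```

If mismatched DIMMs knock your RAM down a speed grade, that ceiling drops proportionally.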
Anyway, start with this. Pick your size, based on how much free CPU RAM you want to spare:
https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF
The “dense” parts will live on your 3080 while the “sparse” parts will run on your CPU. The backend you want is this, specifically the built-in llama-server:
https://github.com/ikawrakow/ik_llama.cpp/
Regular llama.cpp is fine too, but its quants just aren’t quite as optimal or fast.
It has two really good built-in web UIs: the “new” llama.cpp chat UI, and mikupad, which is like a “raw” notebook mode more aimed at creative writing. But you can use LM Studio if you want, or anything else; there are like a bazillion frontends out there.
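A minimal launch sketch, assuming an ik_llama.cpp build and one of ubergarm’s GGUFs. The model filename and context size are placeholders; check `llama-server --help` on your build for the exact flags:

```shell
# -ngl 99 nominally offloads all layers to the GPU, then -ot overrides
# the MoE expert tensors (names matching "exps") back into system RAM,
# so the dense parts stay on the 3080 and the sparse experts run on CPU.
./build/bin/llama-server \
  -m ./GLM-4.5-Air-IQ4_KSS.gguf \
  -ngl 99 \
  -ot "exps=CPU" \
  -c 32768 \
  --host 127.0.0.1 --port 8080
```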


Since there is generated video, it seems like someone solved this problem.
Oh yes, it has come a LOONG way. Some projects to look at are:
https://github.com/ModelTC/LightX2V
https://github.com/deepbeepmeep/Wan2GP
And for images: https://github.com/nunchaku-tech/nunchaku
I dunno what card you have now, but hybrid CPU+GPU inference is the trend these days.
As an example, I can run GLM 4.6, a 350B LLM, with measurably low quantization distortion on a 3090 + 128GB CPU RAM, at like 7 tokens/s.
You can easily run GLM Air on like a 3080 + system RAM, or even a lesser GPU. You just need the right software and quant.


website
Bingo.
All the dev work is in the app.
A manual BRZ is far from a shitty sports car. Hell, it’s more agile than a lot of $100K+ ones.
A shitty sports car is like… an old Saturn Sky? Or older.


It’s a lot to keep up with
Massive understatement!
The next project will be some kind of agent. A ‘go and Google this and summarize the results’
Yeah, you do want more contextual intelligence than an 8B for this.
The image models I’m using are probably pretty dated too
Actually SDXL is still used a lot! Especially for the anime stuff. It just got so much finetuning and tooling piled on.


Do it!
Feel free to spam me if I don’t answer at first. I’m not ignoring you; Lemmy fails to send me reply notifications, sometimes.


How about community taxonomy?
Say there’s a gaming community.
Then there’s a PC gaming community, then a MMO game community, and there’s communities for individual games subdivided into that.
So if you’re in /c/PCgaming, posts in /c/GuildWars will (by default) show up in your feed.
If you are in /c/GuildWars, you (by default) get the hyper focus, and exposure from your post filtering up to more general tiers.
But this sharing is toggleable too. For example, you could choose to only float it up to the “MMO” level without drawing in the /c/gaming crowd.
And this structure kinda naturally fits underlying database structures anyway.
Reddit could not evolve like this, but now that we kinda know what niches exist, that could be constructed from scratch and maintained.
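As a toy sketch of how that tiering could map onto a data structure (all names and the `float_to` cap are hypothetical, not anything Lemmy or Piefed actually implements):

```python
# Communities form a tree; a post appears in its own community and
# floats up through ancestors, optionally capped at some tier.
class Community:
    def __init__(self, name, parent=None, float_to=None):
        self.name = name
        self.parent = parent
        self.float_to = float_to  # highest tier to surface in (None = no cap)

    def feeds_for_post(self):
        feeds = [self.name]
        node = self.parent
        while node:
            feeds.append(node.name)
            if self.float_to == node.name:
                break  # poster chose not to draw in the more general crowds
            node = node.parent
        return feeds

gaming = Community("gaming")
pcgaming = Community("PCgaming", parent=gaming)
mmo = Community("MMO", parent=pcgaming)
gw = Community("GuildWars", parent=mmo)              # floats all the way up
gw_capped = Community("GuildWars", parent=mmo, float_to="MMO")

print(gw.feeds_for_post())         # ['GuildWars', 'MMO', 'PCgaming', 'gaming']
print(gw_capped.feeds_for_post())  # ['GuildWars', 'MMO']
```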


Did you check drawers?
I leave mine in drawers, somehow.


Also, I’m a quant cooker myself. Say the word, and I can upload an IK quant more tailored for whatever your hardware/aim is.


You can run GLM Air on pretty much any gaming desktop with 48GB+ of RAM. Check out ubergarm’s ik_llama.cpp quants on Huggingface; that’s state of the art right now.


You should be running hybrid inference of GLM Air with a setup like that. Qwen 8B is kinda obsolete.
I dunno what kind of speeds you absolutely need, but I bet you could get at least 12 tokens/s.


That’s not strictly true.
I have a Ryzen desktop: a 7800, a 3090, and 128GB of DDR5. And I can run the full GLM 4.6 with quite acceptable token divergence compared to the unquantized model; see: https://huggingface.co/Downtown-Case/GLM-4.6-128GB-RAM-IK-GGUF
If I had an EPYC/Threadripper homelab, I could run Deepseek the same way.
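A rough fit check shows why that hardware combo works. The ~355B parameter count and ~3.2 bpw average are assumptions for illustration, not the quant’s actual recipe:

```python
# Quantized weight size in GB ~= params (billions) * bits-per-weight / 8.
def weights_gb(params_b, bpw):
    return params_b * bpw / 8

# ~355B total params at ~3.2 bpw average fits under 24 GB VRAM + 128 GB RAM
print(round(weights_gb(355, 3.2)))  # -> 142
```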


Most aren’t really running Deepseek locally. What ollama advertises (and basically lies about) are the now-obsolete Qwen 2.5 distillations.
…I mean, some are, but it’s exclusively lunatics with EPYC homelab servers, heh. And they are not using ollama.


Also the comments on this video are wild.
Honestly, they are above par for YouTube.
…So maybe they’re bots? The more expensive kind?


I just wish I could systematically prevent myself from making any mistake lol, or like anyone from making the first mistake.
…I guess we theoretically could, via a Lemmy or Piefed PR, heh.
As an example, we could implement an opt-in feature that pops up community rules before one is allowed to post. Kinda like Discord, but less obnoxious.
That’s one reason why I like this place. If something about the site’s UX design is problematic, there’s somewhere to go to get it improved. With any corporate social media, your only assurance is that it will get worse with time.


That’s kinda the idea behind moderation.
It’s why it’s best done in small communities, as the context narrows the scope of the arbitrary judgement.


I’m not trying to grandstand. My issue is with these presumptions:
like if the mods want to auto-ban everyone who doesn’t personally verify with them their womanhood, that’s their business. but expecting people to self-police their gender is a dumb expectation.
They’re not checking you at the door, and they aren’t auto-banning anyone. They very politely point out the sidebar to a few posters, then ask them to stay quiet; that’s the extent of it.
…If you don’t make an issue of that, it’s not an issue.
if you want a private exclusive type of space… then make it private and exclusive. that way you can control who views and interacts with the event and even hire security to keep the ‘wrong’ people out.
But this is unrealistic, as then they wouldn’t get nearly as much participation in the space. It’s a public gathering spot, by choice.
Again, my specific problem is with commenters that are shown the rules by the mods, yet willingly choose to ignore them.
Just because you think rules are unrealistic does not give you a right to ignore them once asked. That’s how every community here works. Yet they seem to get tons of posters carrying that bad attitude, with that same line of argument.
That’s what makes me bristle. Respecting community rules (once known) is basic human civility, and people are perfectly capable of ‘self-policing’ that. I do not like the rejection of that + the policing of others in its place.


Wandering in, missing the rule sign, getting corrected, and apologizing is fine. I’ve done it; the mods there couldn’t have been nicer about it. It’s not an ideal system, no, but it works well enough; it’s the mods shouldering that burden more than anything.
…The problem is when the guys are corrected, yet keep talking anyway. Which I see happen a lot.
There is no excuse for that.
Is the best behavior to block any community you don’t or can’t participate in? I personally don’t love that behavior because I like seeing what everyone is discussing in threads, but that’s a reasonable solution.
I feel extremely mixed about this, yeah. I feel weird even talking about it.
I personally don’t love that behavior because I like seeing what everyone is discussing in threads, but that’s a reasonable solution.
The women’s space… doesn’t prohibit lurking? On one hand, the community is public, and I’m curious about the perspective in the discussions. I’m interested in understanding them so I can be a more respectful person myself.
I upvote their posts so they get more exposure.
…But I don’t want to violate their privacy either. Blocking is reasonable. Right now, I just upvote them but don’t enter the threads.
Obviously my current strat is just reading the community before posting (like not commenting negatively about Stargate getting a new season in the Stargate community, as an example that happened today lol).
Read the room, yeah.
IMO TV fandoms shouldn’t worship their material. Negative discussion is allowed, otherwise the space gets toxic.
In fact, this kinda happened to one of my personal fandom spaces, /r/thelastairbender: among other things, they idolize ATLA (the original series) like a deity, to the point where anything different (including other material like Korra or the Netflix adaptation) is demonized. Deeper stuff like the novels, fanfics, or speculative lore is not welcome either.
That sucks. It’s all too common; the Star Wars fandom (for instance) is notorious for it. And it’s why some negativity and ‘outsider perspectives’ should be welcomed in such spaces.
The women’s space is different though. It’s basically a shelter from the shit this group puts up with IRL and online, so being more sensitive makes sense.
Yeah. They don’t sell it with a better engine because it would embarrass more expensive cars, kinda like the Porsche Boxster/Cayman (which, whisper it, handles better than the 911).
Old Miatas were like that too :(. Though I don’t know what Mazda’s excuse is these days, as the Miata is their top sports car?