I’d like to set up a local coding assistant so that I can stop feeding complex questions to Google and sifting through search results.

I really don’t know what I’m doing, or whether anything out there actually respects privacy. I don’t necessarily trust search results for this kind of query either.

I want to run it on my desktop: Ryzen 7 5800XT + Radeon RX 6950 XT + 32 GB of RAM. I don’t need or expect data center performance out of this thing.

Something like LM Studio and Qwen sounds like what I’m looking for, but since I’m unfamiliar with what exists, I figured I’d ask for Lemmy’s opinion.

Is LM Studio + Qwen a good combo for my needs? Are there alternatives?

  • perry@aussie.zone · 4 days ago

    Grab a Qwen coder model from Hugging Face and follow the instructions there to run it in llama.cpp. Once that’s up, install OpenCode and connect it via a custom OpenAI-compatible API endpoint.
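
    To make that last step concrete, here’s a minimal sketch of talking to a llama.cpp server through its OpenAI-compatible API from Python. The port, GGUF filename, and model name are assumptions, not exact values; OpenCode does the equivalent of this once you point its custom OpenAI provider at the same base URL.

    ```python
    # Minimal sketch, assuming llama-server was started with something like:
    #   llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080
    # (the filename and port here are placeholders, not exact values).
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",  # llama-server's OpenAI-compatible endpoint
        api_key="not-needed",                 # llama.cpp ignores the key by default
    )

    response = client.chat.completions.create(
        model="qwen2.5-coder",  # llama-server generally serves whatever model it loaded
        messages=[
            {"role": "system", "content": "You are a concise coding assistant."},
            {"role": "user", "content": "Write a Python function that reverses a string."},
        ],
    )
    print(response.choices[0].message.content)
    ```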

    You’ll get far better results than trying to use other local options out of the box.

    There may well be better models out there, but I’ve found Qwen 2.5 and the like to be pretty fantastic overall, and definitely a fine option beside Claude/ChatGPT/Gemini. I’ve tested the lot, and results usually come down far more to your instructions and AGENTS.md layout than to the model itself.

    • melfie@lemy.lol · 4 days ago

      The main thing that has stopped me from doing this so far is VRAM. My server has an RTX 4060 with 8 GB, and I’m not sure that can reasonably run a model like this.
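
      For a rough sense of what fits in 8 GB, here’s a back-of-envelope sketch. The bits-per-weight figures are approximations for common llama.cpp quant types, not exact numbers:

      ```python
      # Rough rule of thumb: quantized weights take about
      # params * bits_per_weight / 8 bytes, plus overhead for the
      # KV cache and runtime buffers. Approximate, not exact.

      def approx_weights_gb(params_billion: float, bits_per_weight: float) -> float:
          """Approximate size of the quantized weights in GB."""
          return params_billion * 1e9 * bits_per_weight / 8 / 1e9

      for name, params_b, bits in [
          ("7B @ Q4_K_M (~4.8 bpw)", 7, 4.8),
          ("7B @ Q8_0 (~8.5 bpw)", 7, 8.5),
          ("14B @ Q4_K_M (~4.8 bpw)", 14, 4.8),
      ]:
          print(f"{name}: ~{approx_weights_gb(params_b, bits):.1f} GB of weights")

      # ~4.2 GB for a 7B model at Q4_K_M leaves headroom on an 8 GB card;
      # 14B at the same quant (~8.4 GB) almost certainly won't fit.
      ```

      So a 7B coder model at a 4-bit-ish quant should be workable on a 4060, and llama.cpp can offload any overflow to system RAM at a speed cost.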

    • 70k32@sh.itjust.works · 4 days ago

      This. llama.cpp with the Vulkan backend running under docker-compose, plus a Qwen3-Coder quantization from Hugging Face, with OpenCode pointed at that local setup through the OpenAI-compatible API, is working great for me.
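
      If it helps, here’s a hypothetical sanity check to run before wiring up OpenCode, confirming the containerized server actually answers on the OpenAI-compatible API (port 8080 is an assumption; use whatever your docker-compose.yml publishes):

      ```python
      # Query the server's /v1/models endpoint; the ids it lists are what
      # you'd reference in OpenCode's custom OpenAI provider config.
      import json
      import urllib.request

      BASE = "http://localhost:8080/v1"  # assumed host port mapping

      with urllib.request.urlopen(f"{BASE}/models") as resp:
          models = json.load(resp)

      for m in models.get("data", []):
          print(m.get("id"))
      ```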