In LM Studio, I still can't figure out one thing: when I run GLM Flash normally, the speed is fast, as expected for an MoE model. But it's a different story when I launch the same model in server mode to use it with opencode. Then the generation speed in opencode is much, much slower. I've tried playing with all the settings, and nothing helps. Why does it run fast in chat mode but very slowly in server mode with opencode?
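For reference, one way to narrow this down is to time the server endpoint directly, bypassing opencode: if the local API streams at roughly the same speed as chat mode, the slowdown is probably in how opencode calls it (for example, the size of the prompt/context it sends). A minimal sketch, assuming LM Studio's OpenAI-compatible server on its default port 1234 and the `openai` Python client; the model name below is a placeholder, use whatever identifier LM Studio actually reports:

```python
import time
from openai import OpenAI

# Point the OpenAI client at LM Studio's local server (port 1234 is an assumption;
# the api_key value is ignored by local servers but the client requires one).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

prompt = "Explain briefly how a mixture-of-experts model routes tokens."

start = time.time()
resp = client.chat.completions.create(
    model="glm-flash",  # hypothetical name; substitute the model id LM Studio shows
    messages=[{"role": "user", "content": prompt}],
    max_tokens=256,
)
elapsed = time.time() - start

# Rough tokens-per-second from the usage stats returned by the server.
completion_tokens = resp.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"({completion_tokens / elapsed:.1f} tok/s)")
```

If this direct call is as fast as chat mode, the model and server are fine and the difference likely comes from the requests opencode sends rather than from server mode itself.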