onekq posted an update 3 days ago
Heard good things about this model, but no inference providers support it ...

THUDM/GLM-4-9B-0414

It works on llama.cpp.

Here is how you can run it:

llama-server -ngl 999 --host 192.168.1.68 \
  --override-kv glm4.rope.dimension_count=int:64 \
  --override-kv tokenizer.ggml.eos_token_id=int:151336 \
  -m /mnt/nvme0n1/LLM/quantized/GLM-4-9B-0414-Q8_0.gguf

Read here for why the --override-kv flags are needed:

Eval bug: GLM-Z1-9B-0414 · Issue #12946 · ggml-org/llama.cpp:
https://github.com/ggml-org/llama.cpp/issues/12946#issuecomment-2803564782
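Once the server is up, you can talk to it over llama.cpp's OpenAI-compatible HTTP API. A minimal sketch, assuming the host from the flags above and llama-server's default port 8080 (the model field is largely ignored by llama-server, which serves whatever GGUF it was started with):

```shell
# Build a chat-completion request payload for the llama-server HTTP API.
cat > /tmp/glm4_request.json <<'EOF'
{
  "model": "GLM-4-9B-0414",
  "messages": [{"role": "user", "content": "Hello"}]
}
EOF

# Send it to the server started above (host/port are assumptions; adjust to yours):
# curl http://192.168.1.68:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @/tmp/glm4_request.json
```

The curl line is left commented out since it only works once your llama-server instance is actually running at that address.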

·

Ah, I see. They have their own architecture.

https://github.com/huggingface/transformers/pull/37388

This will be hard.
