gguf?

by deniiiiiij - opened 17 days ago

Discussion

deniiiiiij

17 days ago

Hi. Will there be a GGUF? If anyone is able to quantize it, please do :)

Jianqiao1

14 days ago

This isn't a matter of quantization. For llama.cpp to support this model, someone would need to port its specific inference algorithm and framework. Markovian RSA is by no means a simple mechanism: it essentially runs multiple inference streams in parallel and then aggregates them recursively. You can think of it, simply put, as a dynamic "Best-of-N" inference system. This approach is both slow and memory-intensive, which makes it poorly suited to personal deployment; although the model itself is only 8B, its actual inference overhead is substantial.
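
To make the overhead concrete, here is a rough Python sketch of the general "parallel streams, then recursive aggregation" pattern described above. It is not the model's actual algorithm: the function names, the `population`, `group_size`, and `rounds` parameters, and the random subset selection are all assumptions for illustration, and it leaves out whatever the "Markovian" part refers to. `generate` and `aggregate` stand in for calls to the model.

```python
import random
from typing import Callable, List


def recursive_aggregate(
    prompt: str,
    generate: Callable[[str], str],              # hypothetical: one model call -> one draft
    aggregate: Callable[[str, List[str]], str],  # hypothetical: merge several drafts into one
    population: int = 4,
    group_size: int = 2,
    rounds: int = 2,
    seed: int = 0,
) -> List[str]:
    """Illustrative only: draft `population` answers, then refine them in rounds."""
    rng = random.Random(seed)
    # Step 1: independent inference streams (a real system would batch these).
    candidates = [generate(prompt) for _ in range(population)]
    # Step 2: every round rebuilds the whole population from aggregated subsets.
    for _ in range(rounds):
        candidates = [
            aggregate(prompt, rng.sample(candidates, group_size))
            for _ in range(population)
        ]
    return candidates


def model_calls(population: int, rounds: int) -> int:
    """Model invocations in the sketch: one initial pass plus one pass per round."""
    return population * (rounds + 1)
```

Under these assumptions, 4 streams and 2 aggregation rounds already cost 12 model calls for a single answer, and every aggregation call also has to read the drafts it is merging. That is where the extra latency and memory come from, independent of the weights being only 8B.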

ganeshnanduru

Zyphra org 13 days ago

There is an ongoing effort for llama.cpp integration here
