
gguf?

#7
by deniiiiiij - opened

Hi. Will there be a GGUF? Anyone who can, please quantize it :)

This isn't a matter of quantization. For llama.cpp to support this model, someone would need to port its specific inference algorithm and framework. Markovian RSA is by no means a simple mechanism: it runs multiple inference streams in parallel and then recursively aggregates their outputs, so, simply put, you can think of it as a dynamic "Best-of-N" inference system. This approach is not only slow but also memory-intensive, which makes it unsuitable for personal deployment. Although the model itself is only 8B, its actual inference overhead is substantial.
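To make "parallel streams followed by recursive aggregation" concrete, here is a minimal sketch in Python. It is not this model's actual implementation. Several points are assumptions: `generate` is a hypothetical stand-in for one LLM generation call, the prompt wording is invented, and "Markovian" is read here as "each round conditions only on the previous round's population." Each round, every one of the N slots aggregates a random subset of K candidates from the previous round.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-in for a single LLM generation call: prompt -> completion.
Generate = Callable[[str], str]


def rsa_sketch(question: str, generate: Generate, n: int = 4, k: int = 2,
               rounds: int = 3, seed: int = 0) -> Tuple[List[str], int]:
    """Toy recursive aggregation loop; returns final candidates and call count."""
    assert 1 <= k <= n, "each aggregation step samples k of the n candidates"
    rng = random.Random(seed)
    calls = 0

    # Round 0: N independent reasoning streams (run in parallel in a real server).
    population: List[str] = []
    for _ in range(n):
        population.append(generate(question))
        calls += 1

    # Aggregation rounds: each new candidate is built from K previous candidates.
    for _ in range(rounds):
        new_population: List[str] = []
        for _ in range(n):
            subset = rng.sample(population, k)
            prompt = (
                question
                + "\n\nCandidate solutions:\n"
                + "\n---\n".join(subset)
                + "\n\nCombine these into one improved solution."
            )
            new_population.append(generate(prompt))
            calls += 1
        # Only the latest population is carried forward (the "Markovian" reading).
        population = new_population

    return population, calls


if __name__ == "__main__":
    # Dummy "model" that always answers "4", just to exercise the control flow.
    final, n_calls = rsa_sketch("What is 2 + 2?", lambda prompt: "4")
    print(final, n_calls)
```

With the defaults (N = 4 streams, 3 aggregation rounds), one question costs 4 × (3 + 1) = 16 generation calls instead of 1, and every aggregation prompt carries K full candidate solutions. That is one way to see why even an 8B model can be expensive to serve this way.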

There is an ongoing effort toward llama.cpp integration here.
