
Anyone have gguf quants?

#3
by lemon07r - opened

A big thanks for this farewell gift. It might be one of the best models of this size we'll have for a while, since finetuning at 32b/35b is slow (this is the only good one from what I can tell). I'm wondering if anyone has GGUF quants for this model.

CausalLM org

I think the BPE tokenizer implementation in llama.cpp is still incorrect, and it won't work as expected unless someone fixes it. The same applies to cohere-command-r. The only difference is that I replaced the special tokens with those from ChatML.
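For reference, the ChatML-style prompt with those special tokens looks roughly like this (just a sketch in Python; the exact system prompt is up to you):

```python
# Rough illustration of a ChatML-style prompt, using the <|im_start|>/<|im_end|>
# special tokens mentioned above. The system message here is only an example.
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("You are a helpful assistant.", "Hello!"))
```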

I would recommend aphrodite-engine for accelerated inference with f16.
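If it helps, a minimal sketch of the Python side, assuming aphrodite-engine keeps its vLLM-style LLM/SamplingParams API and that the repo id is CausalLM/35b-beta-long (check the aphrodite-engine docs for the current entry point):

```python
# Minimal sketch: f16 inference via aphrodite-engine's vLLM-style Python API.
# Model id and sampling settings are assumptions -- adjust to your setup.
from aphrodite import LLM, SamplingParams

llm = LLM(model="CausalLM/35b-beta-long", dtype="float16")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Hello!"], params)
print(outputs[0].outputs[0].text)
```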

Just added them. The issue is fixed in llama.cpp:
QuantFactory/CausalLM-35b-beta-long-GGUF

bartowski/35b-beta-long-GGUF is OK with the latest llama.cpp or koboldcpp.
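For anyone loading those quants programmatically, here's a rough sketch with llama-cpp-python; the repo id comes from the post above, but the .gguf filename pattern is a placeholder, so adjust it to whichever quant you download:

```python
# Rough sketch: pull a quant from the bartowski repo and run a chat completion
# with llama-cpp-python. The Q4_K_M filename pattern is hypothetical.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/35b-beta-long-GGUF",
    filename="*Q4_K_M.gguf",   # placeholder pattern -- pick your quant
    n_ctx=8192,
    n_gpu_layers=-1,           # offload all layers if you have the VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(out["choices"][0]["message"]["content"])
```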
