GGUF in llama.cpp

by Bearsaerker

Would this also work quantized for long context in llama.cpp, or are there any special dependencies specific to the implementation in the model card?

Hi, I haven't used llama.cpp before. There are no special dependencies for this implementation beyond pytorch==2.1.2, transformers==4.36.1, and accelerate==0.25.0.
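
For reference, those pins as a single install command (note that the PyTorch package is published on PyPI as `torch`, not `pytorch`):

    pip install torch==2.1.2 transformers==4.36.1 accelerate==0.25.0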

I get this error when trying to convert to GGUF:

    raise Exception(f"Unexpected tensor name: {name}")
Exception: Unexpected tensor name: model.beacon_embed_tokens.weight
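
From the error itself, the conversion script raises on any tensor name it can't map to a GGUF name, and `model.beacon_embed_tokens.weight` comes from this model's beacon (long-context) modification rather than the base Llama layout, so the stock converter doesn't know what to do with it. A minimal sketch to list which tensors fall outside the standard naming, assuming the weights are stored as `model.safetensors` in the checkpoint directory (the filename is an assumption, not from this thread):

    # List tensor names in the checkpoint and flag the beacon-specific ones
    # that the GGUF converter doesn't recognize.
    from safetensors import safe_open

    with safe_open("model.safetensors", framework="pt") as f:
        for name in f.keys():
            if "beacon" in name:
                print(name)  # e.g. model.beacon_embed_tokens.weight

As far as I can tell, simply skipping those tensors during conversion wouldn't help: the resulting GGUF would be missing the weights the beacon mechanism needs, so llama.cpp would also need architecture-level support to run the model correctly.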

Does anyone know how we can use this model quantized?
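Not a llama.cpp answer, but until the architecture is supported there, one workaround might be load-time 4-bit quantization through transformers and bitsandbytes. A hedged sketch, not from the model card: the repo ID below is a placeholder, and `trust_remote_code=True` is an assumption based on the custom beacon implementation.

    # Sketch: quantize at load time with bitsandbytes instead of GGUF.
    # "namespace/model" is a placeholder for this repo's actual ID.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # NF4 4-bit weights
        bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
    )
    model = AutoModelForCausalLM.from_pretrained(
        "namespace/model",
        quantization_config=bnb_config,
        trust_remote_code=True,  # assumed: beacon code ships with the repo
        device_map="auto",       # uses accelerate, which is already a dependency
    )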
