aya-101 GGUF 4bit

#9
by mitkox - opened

Hey, I'm trying to convert the HF model to GGUF, but I'm getting this error: KeyError: 'encoder.block.0.layer.0.SelfAttention.k.weight'
I thought this would only be a warning, but the conversion still crashes.
Any recommendations for converting to GGUF and making a 4-bit quant?
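
For reference, the T5-style tensor names the converter trips over can be listed straight from the checkpoint with safetensors (a minimal sketch; the shard filename is illustrative and assumes a local download of the model):

from safetensors import safe_open

# Print the first few tensor names in one shard of the checkpoint
# (filename illustrative). aya-101 uses T5-style names, which the
# GGUF converter cannot map.
with safe_open("model-00001-of-00005.safetensors", framework="pt") as f:
    for name in list(f.keys())[:5]:
        print(name)  # e.g. encoder.block.0.layer.0.SelfAttention.k.weight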

watching!

This is a new model, so llama.cpp would have to add support for it before you could create and use a GGUF file.

The model might be new, but it uses the T5 architecture (T5ForConditionalGeneration, based on mt5-xxl), which llama.cpp doesn't support (yet). Most of the architectures supported in llama.cpp are CausalLM variants; I hope they start supporting XXXForConditionalGeneration soon:

def from_model_architecture(model_architecture):
    if model_architecture == "GPTNeoXForCausalLM":
        return GPTNeoXModel
    if model_architecture == "BloomForCausalLM":
        return BloomModel
    if model_architecture == "MPTForCausalLM":
        return MPTModel
    if model_architecture in ("BaichuanForCausalLM", "BaiChuanForCausalLM"):
        return BaichuanModel
    if model_architecture in ("FalconForCausalLM", "RWForCausalLM"):
        return FalconModel
    if model_architecture == "GPTBigCodeForCausalLM":
        return StarCoderModel
    if model_architecture == "GPTRefactForCausalLM":
        return RefactModel
    if model_architecture == "PersimmonForCausalLM":
        return PersimmonModel
    if model_architecture in ("StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"):
        return StableLMModel
    if model_architecture == "QWenLMHeadModel":
        return QwenModel
    if model_architecture == "Qwen2ForCausalLM":
        return Model
    if model_architecture == "MixtralForCausalLM":
        return MixtralModel
    if model_architecture == "GPT2LMHeadModel":
        return GPT2Model
    if model_architecture == "PhiForCausalLM":
        return Phi2Model
    if model_architecture == "PlamoForCausalLM":
        return PlamoModel
    if model_architecture == "CodeShellForCausalLM":
        return CodeShellModel
    if model_architecture == "OrionForCausalLM":
        return OrionModel
    if model_architecture == "InternLM2ForCausalLM":
        return InternLM2Model
    if model_architecture == "MiniCPMForCausalLM":
        return MiniCPMModel
    if model_architecture == "BertModel":
        return BertModel
    if model_architecture == "NomicBertModel":
        return NomicBertModel
    return Model

https://github.com/ggerganov/llama.cpp/blob/8084d554406b767d36b3250b3b787462d5dd626f/convert-hf-to-gguf.py#L178
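
The crash follows directly from that dispatch: aya-101's config.json declares an architecture that none of the branches above match, so the generic Model class is returned, and it has no tensor-name mapping for T5's encoder.block.* weights, hence the KeyError. You can confirm the declared architecture yourself (a minimal check; the local path is illustrative):

import json

# aya-101's config.json declares T5ForConditionalGeneration, which
# from_model_architecture() above does not handle, so the converter
# falls back to the generic Model class and cannot map the weights.
with open("aya-101/config.json") as f:  # path illustrative
    cfg = json.load(f)

print(cfg["architectures"])  # ['T5ForConditionalGeneration']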

Cohere For AI org

Hi guys,

This discussion seems resolved, so I'm closing it for now.

In general, this discussion seems more relevant for the llama.cpp repo on GitHub, so feel free to continue there if required.
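
In the meantime, if the goal is 4-bit inference rather than GGUF specifically, the model can be loaded in 4-bit through transformers with bitsandbytes (a sketch; it assumes a CUDA GPU with enough memory and the bitsandbytes and accelerate packages installed):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization at load time; not GGUF, but gives 4-bit
# inference for this encoder-decoder model today.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-101")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "CohereForAI/aya-101",
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))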

shivi changed discussion status to closed
