aya-101 GGUF 4bit

#9
by mitkox - opened

Hey, I'm trying to convert the HF model to GGUF, but I'm getting this error: KeyError: 'encoder.block.0.layer.0.SelfAttention.k.weight'
I thought this would only be a warning, but the conversion still crashes.
Any recommendations for converting to GGUF and making a 4-bit quant?
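
For reference, the T5-style tensor names the converter trips over can be listed straight from the checkpoint with safetensors (a minimal sketch; the shard filename is illustrative and assumes a local download of the model):

from safetensors import safe_open

# Print the first few tensor names in one shard of the checkpoint
# (filename illustrative). aya-101 uses T5-style names, which the
# GGUF converter cannot map.
with safe_open("model-00001-of-00005.safetensors", framework="pt") as f:
    for name in list(f.keys())[:5]:
        print(name)  # e.g. encoder.block.0.layer.0.SelfAttention.k.weight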

watching!

This is a new model, so llama.cpp would have to add support for it before you could create and use a GGUF file.

The model might be new, but it uses the T5 architecture (T5ForConditionalGeneration, based on mt5-xxl), which llama.cpp doesn't support (yet). Most of the architectures supported in llama.cpp are CausalLM variants; I hope they start supporting XXXForConditionalGeneration soon:

def from_model_architecture(model_architecture):
    if model_architecture == "GPTNeoXForCausalLM":
        return GPTNeoXModel
    if model_architecture == "BloomForCausalLM":
        return BloomModel
    if model_architecture == "MPTForCausalLM":
        return MPTModel
    if model_architecture in ("BaichuanForCausalLM", "BaiChuanForCausalLM"):
        return BaichuanModel
    if model_architecture in ("FalconForCausalLM", "RWForCausalLM"):
        return FalconModel
    if model_architecture == "GPTBigCodeForCausalLM":
        return StarCoderModel
    if model_architecture == "GPTRefactForCausalLM":
        return RefactModel
    if model_architecture == "PersimmonForCausalLM":
        return PersimmonModel
    if model_architecture in ("StableLMEpochForCausalLM", "LlavaStableLMEpochForCausalLM"):
        return StableLMModel
    if model_architecture == "QWenLMHeadModel":
        return QwenModel
    if model_architecture == "Qwen2ForCausalLM":
        return Model
    if model_architecture == "MixtralForCausalLM":
        return MixtralModel
    if model_architecture == "GPT2LMHeadModel":
        return GPT2Model
    if model_architecture == "PhiForCausalLM":
        return Phi2Model
    if model_architecture == "PlamoForCausalLM":
        return PlamoModel
    if model_architecture == "CodeShellForCausalLM":
        return CodeShellModel
    if model_architecture == "OrionForCausalLM":
        return OrionModel
    if model_architecture == "InternLM2ForCausalLM":
        return InternLM2Model
    if model_architecture == "MiniCPMForCausalLM":
        return MiniCPMModel
    if model_architecture == "BertModel":
        return BertModel
    if model_architecture == "NomicBertModel":
        return NomicBertModel
    return Model

https://github.com/ggerganov/llama.cpp/blob/8084d554406b767d36b3250b3b787462d5dd626f/convert-hf-to-gguf.py#L178
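
The crash follows directly from that dispatch: aya-101's config.json declares an architecture that none of the branches above match, so the generic Model class is returned, and it has no tensor-name mapping for T5's encoder.block.* weights, hence the KeyError. You can confirm the declared architecture yourself (a minimal check; the local path is illustrative):

import json

# aya-101's config.json declares T5ForConditionalGeneration, which
# from_model_architecture() above does not handle, so the converter
# falls back to the generic Model class and cannot map the weights.
with open("aya-101/config.json") as f:  # path illustrative
    cfg = json.load(f)

print(cfg["architectures"])  # ['T5ForConditionalGeneration']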

Cohere For AI org

Hi guys,

This discussion seems resolved, so I'm closing it for now.

In general, this discussion seems more relevant for the llama.cpp repo on GitHub, so feel free to continue there if required.
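
In the meantime, if the goal is 4-bit inference rather than GGUF specifically, the model can be loaded in 4-bit through transformers with bitsandbytes (a sketch; it assumes a CUDA GPU with enough memory and the bitsandbytes and accelerate packages installed):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization at load time; not GGUF, but gives 4-bit
# inference for this encoder-decoder model today.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-101")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "CohereForAI/aya-101",
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))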

shivi changed discussion status to closed
