Error running the YAML configuration with mergekit-gui

#2
by nazimali - opened

Hey, I tried to reproduce this using https://huggingface.co/spaces/arcee-ai/mergekit-gui and got an error. Did you have to modify mergekit to get it to merge?

Config YAML:

slices:
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [0, 24]
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [55, 59]
merge_method: passthrough
dtype: bfloat16

Error output:

...
[2024-09-07 17:15:27] [INFO] File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/mergekit/io/tasks.py", line 86, in execute
[2024-09-07 17:15:27] [INFO] raise RuntimeError(
[2024-09-07 17:15:27] [INFO] RuntimeError: Tensor transformer.h.58.ln_mlp.weight required but not present in model tiiuae/falcon-11B
[2024-09-07 17:15:28] [ERROR] Command exited with code 1
[2024-09-07 17:15:28] [ERROR] Merge failed. Deleting repo as no model is uploaded.

Hi, this is because a custom architecture definition has to be built (the definition has to be changed to match Falcon-11B's tensor names).
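To see which tensor names the checkpoint actually contains, something along these lines should work (a rough sketch using huggingface_hub, not part of mergekit itself; it assumes the repo ships sharded safetensors with an index file):

import json
from huggingface_hub import hf_hub_download

# Download only the safetensors index, which maps tensor names to shards.
index_path = hf_hub_download("tiiuae/falcon-11B", "model.safetensors.index.json")
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Print the unique per-layer name suffixes so the naming scheme is obvious.
suffixes = sorted({name.split(".", 3)[-1] for name in weight_map if name.startswith("transformer.h.")})
print("\n".join(suffixes))

Comparing that output against what mergekit's bundled falcon definition expects shows the mismatch behind the ln_mlp error.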

Oh, I see. I misunderstood when I read that this was the configuration: https://huggingface.co/ssmits/Falcon2-5.5B-multilingual#configuration

I thought it meant that it would be reproducible. Thanks for the clarification!

I’ll try to share the code soon! It’s quite straightforward: it’s just based on how the variable names changed going from Falcon-7B to Falcon-11B. It’s actually a JSON in the architectures folder of mergekit, which I changed.

That would be awesome, thank you! I followed the mergekit tutorials and tried to replicate this manually, but it led me down a rabbit hole of failures, so I left it alone.

{
    "model_type": "falcon",
    "architectures": [
        "FalconForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "transformer.word_embeddings.weight",
            "is_embed": true
        }
    ],
    "post_weights": [
        {
            "name": "transformer.ln_f.weight"
        },
        {
            "name": "transformer.ln_f.bias"
        },
        {
            "name": "lm_head.weight",
            "is_embed": true
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "transformer.h.${layer_index}.input_layernorm.bias"
            },
            {
                "name": "transformer.h.${layer_index}.input_layernorm.weight"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.dense_4h_to_h.weight"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.dense_h_to_4h.weight"
            },
            {
                "name": "transformer.h.${layer_index}.self_attention.dense.weight"
            },
            {
                "name": "transformer.h.${layer_index}.self_attention.query_key_value.weight"
            }
        ]
    }
}
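Something like this should work to install it (a sketch, not the exact steps I used: falcon-11b.json is just whatever name you save the JSON above under, and the _data/architectures path matches current mergekit sources; adjust if your version's layout differs):

import shutil
from pathlib import Path

import mergekit

# Locate the installed package's bundled architecture definitions and
# overwrite the falcon one with the custom JSON saved locally.
arch_dir = Path(mergekit.__file__).parent / "_data" / "architectures"
shutil.copy("falcon-11b.json", arch_dir / "falcon.json")
print("Replaced", arch_dir / "falcon.json")

After that, the passthrough merge YAML can be run as usual and mergekit will resolve Falcon-11B's tensor names correctly.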

That worked the first time! Thank you for sharing. I did not realize the architecture definition had to be updated within the library; I was trying to emulate it through the YAML configuration. I've been wanting to prune models + merge relevant task layers for a while, but I only got as far as the default examples. Going to try this with other languages :D

ssmits changed discussion status to closed

Great! Yes, it was quite tedious to set up, given that there was no direct support from the library itself, so a custom implementation had to be made.
Awesome, looking forward to seeing those models!

ssmits changed discussion status to open
