Error running the YAML configuration with mergekit-gui

#2
by nazimali - opened

Hey, I tried to reproduce this using https://huggingface.co/spaces/arcee-ai/mergekit-gui and got an error. Did you have to modify mergekit to get it to merge?

Config YAML:

slices:
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [0, 24]
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [55, 59]
merge_method: passthrough
dtype: bfloat16

Error output:

...
[2024-09-07 17:15:27] [INFO] File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/mergekit/io/tasks.py", line 86, in execute
[2024-09-07 17:15:27] [INFO] raise RuntimeError(
[2024-09-07 17:15:27] [INFO] RuntimeError: Tensor transformer.h.58.ln_mlp.weight required but not present in model tiiuae/falcon-11B
[2024-09-07 17:15:28] [ERROR] Command exited with code 1
[2024-09-07 17:15:28] [ERROR] Merge failed. Deleting repo as no model is uploaded.

Hi, this is because a custom architecture definition has to be built (the definition has to be changed to match Falcon-11B's tensor names).
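To see which tensor names the checkpoint actually contains, something along these lines should work (a rough sketch using huggingface_hub, not part of mergekit itself; it assumes the repo ships sharded safetensors with an index file):

import json
from huggingface_hub import hf_hub_download

# Download only the safetensors index, which maps tensor names to shards.
index_path = hf_hub_download("tiiuae/falcon-11B", "model.safetensors.index.json")
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Print the unique per-layer name suffixes so the naming scheme is obvious.
suffixes = sorted({name.split(".", 3)[-1] for name in weight_map if name.startswith("transformer.h.")})
print("\n".join(suffixes))

Comparing that output against what mergekit's bundled falcon definition expects shows the mismatch behind the ln_mlp error.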

Oh, I see. I misunderstood when I read that this was the configuration: https://huggingface.co/ssmits/Falcon2-5.5B-multilingual#configuration

I thought it meant that it would be reproducible. Thanks for the clarification!

I’ll try to share the code soon! It’s quite straightforward: it’s just based on how the variable names changed going from Falcon-7B to Falcon-11B. It’s actually a JSON in the architectures folder of mergekit, which I changed.

That would be awesome, thank you! I followed the mergekit tutorials and tried to replicate this manually, but it led me down a rabbit hole of failures, so I left it alone.

{
    "model_type": "falcon",
    "architectures": [
        "FalconForCausalLM"
    ],
    "pre_weights": [
        {
            "name": "transformer.word_embeddings.weight",
            "is_embed": true
        }
    ],
    "post_weights": [
        {
            "name": "transformer.ln_f.weight"
        },
        {
            "name": "transformer.ln_f.bias"
        },
        {
            "name": "lm_head.weight",
            "is_embed": true
        }
    ],
    "num_layers_config_key": "num_hidden_layers",
    "layer_templates": {
        "weights": [
            {
                "name": "transformer.h.${layer_index}.input_layernorm.bias"
            },
            {
                "name": "transformer.h.${layer_index}.input_layernorm.weight"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.dense_4h_to_h.weight"
            },
            {
                "name": "transformer.h.${layer_index}.mlp.dense_h_to_4h.weight"
            },
            {
                "name": "transformer.h.${layer_index}.self_attention.dense.weight"
            },
            {
                "name": "transformer.h.${layer_index}.self_attention.query_key_value.weight"
            }
        ]
    }
}
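Something like this should work to install it (a sketch, not the exact steps I used: falcon-11b.json is just whatever name you save the JSON above under, and the _data/architectures path matches current mergekit sources; adjust if your version's layout differs):

import shutil
from pathlib import Path

import mergekit

# Locate the installed package's bundled architecture definitions and
# overwrite the falcon one with the custom JSON saved locally.
arch_dir = Path(mergekit.__file__).parent / "_data" / "architectures"
shutil.copy("falcon-11b.json", arch_dir / "falcon.json")
print("Replaced", arch_dir / "falcon.json")

After that, the passthrough merge YAML can be run as usual and mergekit will resolve Falcon-11B's tensor names correctly.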

That worked the first time! Thank you for sharing. I did not realize the architecture definition had to be updated within the library; I was trying to emulate it through the YAML configuration. I've been wanting to prune models + merge relevant task layers for a while, but I only got as far as the default examples. Going to try this with other languages :D

ssmits changed discussion status to closed

Great! Yes, it was quite tedious to set up, given that there was no direct support from the library itself, so a custom implementation had to be made.
Awesome, looking forward to seeing those models!

ssmits changed discussion status to open
