OSError: McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

#1
by CannotFindUserName - opened

The error is as described in the title. I followed the instructions and ran the following, but it throws the error above.

import torch
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
)

I checked Files and versions. It contains only the adapter weights, not the original Llama weights. I'm guessing this is the reason. Are some weights missing from this repo?

McGill NLP Group org

Hi, Hugging Face gets the original Llama weights by using the config.json file from the repo. Can you share the entire code file that you are using, as well as the library versions of transformers and peft?

McGill NLP Group org

This error arises if the peft library is not installed in the environment.

Hugging Face gets the original Llama weights by using the config.json file from the repo

This pathway works only if the peft library is installed. It is not the best way to load the model, but it was the only way to push just the adapter weights and avoid the license issues that come with uploading the full model. An alternative is to load the model through the llm2vec package, which is detailed here.
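For reference, a minimal sketch of that llm2vec loading path, modeled on the package README (the -supervised adapter repo name and the exact from_pretrained arguments are illustrative assumptions, not pinned down by this thread):

import torch
from llm2vec import LLM2Vec

# Sketch: let the llm2vec package assemble the base weights plus the MNTP
# (and optionally supervised) adapter. The peft_model_name_or_path line can
# be dropped if you only want the MNTP checkpoint.
l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)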

Hey, thanks for replying.

Here is the library version:

  • transformers: 4.40.2
  • llm2vec: 0.1.5
  • peft: 0.10.0

I have installed the llm2vec library. Below is the code I was trying to run:

from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig

# Loading the base Llama model, along with custom code that enables bidirectional
# connections in decoder-only LLMs. The MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)

And running this gives me the error
OSError: McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

Hi, Hugging Face gets the original Llama weights by using the config.json file from the repo. Can you share the entire code file that you are using, as well as the library versions of transformers and peft?

Are you referring to adapter_config.json?

Also, I am curious about the exact model architecture of LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised, to understand:

  • How does the adapter fit into the Llama architecture?
  • If I need the entire model weights, is it true that all I need to do is download the original Llama weights and the adapter weights saved in this repo?

Thanks a lot for your patience and time!

McGill NLP Group org

Are you referring to adapter_config.json?

I am referring to config.json. This HF repo contains both config.json and adapter_config.json. HF fetches the original model using the _name_or_path field in the config. The trust_remote_code flag is needed to apply bidirectional connections to the model. You can find a more detailed description here.
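A quick way to see which base checkpoint the config points to (a small sketch; the printed value is whatever the repo's config.json actually carries):

from transformers import AutoConfig

# Sketch: inspect the config of the adapter-only repo; _name_or_path tells HF
# which base checkpoint to fetch the full weights from.
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True
)
print(config._name_or_path)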

If I need the entire model weights, is it true that all I need to do is download the original Llama weights and the adapter weights saved in this repo?

As described above, the original model download should be happening automatically.
I am able to run your code snippet without any errors after clearing the cache, so I am unable to reproduce the issue. The library versions I am using are listed below.

  • transformers: 4.40.2
  • huggingface-hub: 0.22.2
  • peft: 0.10.0
  • llm2vec: 0.1.5

If the issue still persists, can you share a YAML file with the entire environment details?
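To collect the versions that matter here, a small sketch like this is usually enough:

from importlib.metadata import PackageNotFoundError, version

# Sketch: print the installed versions of the packages involved in this loading path.
for pkg in ("transformers", "huggingface-hub", "peft", "llm2vec"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")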

How does the adapter fit into the Llama architecture?

LoRA adapters are added to all of the following model modules: v_proj, k_proj, q_proj, o_proj, up_proj, down_proj, and gate_proj. After applying LoRA, the Llama architecture looks like the printout below (a matching LoraConfig sketch follows it):

PeftModel(
  (base_model): LoraModel(
    (model): LlamaEncoderModel(
      (embed_tokens): Embedding(32000, 4096)
      (layers): ModuleList(
        (0-31): 32 x ModifiedLlamaDecoderLayer(
          (self_attn): ModifiedLlamaSdpaAttention(
            (q_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (k_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (v_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (o_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): LlamaMLP(
            (gate_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=11008, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (up_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=11008, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (down_proj): lora.Linear(
              (base_layer): Linear(in_features=11008, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=11008, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (act_fn): SiLU()
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
  )
)
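For reference, a peft LoraConfig that reproduces the adapter shape shown above would look roughly like this (r=16 and lora_dropout=0.05 are read off the printed lora_A and lora_dropout modules; lora_alpha is not visible in the printout and is only an assumed placeholder):

from peft import LoraConfig

# Sketch of a LoraConfig matching the printed architecture. r and lora_dropout
# come from the printout above; lora_alpha is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)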

Thanks a lot for the detailed answer. I tried to rerun the code and it works now. Not sure why it threw the error before.

As described above, the original model download should be happening automatically.

I'm trying to demystify the automation here. It looks like the _name_or_path parameter in config.json is not used anywhere in modeling_llama_encoder.py. The Llama weights seem to be loaded when running self.post_init(). Is my understanding correct? I'm not sure exactly how the weights are loaded into LlamaEncoderModel, though. I'm guessing it's based on weight names? I would appreciate it a lot if you could help me dive deeper and understand it better. Thank you!

McGill NLP Group org

To be honest, I am not fully sure how the download happens. The combination of PEFT loading + custom code is not documented by Hugging Face. I tried a bunch of loading options until one of them worked.

As the current issue is resolved, maybe you can close this and open a separate discussion on our GitHub page? I am sure many others following our repository will benefit from knowing exactly how the model is being loaded.

Sure, let me do it. Thank you again for your help!

CannotFindUserName changed discussion status to closed
