OSError: McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
The error is as described in the title. I followed the instructions and ran the following, but it throws the error.
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
)
I checked Files and versions. It only has the adapter weights but not the original llama weights. I'm guessing this is the reason. Curious if some weights are missing in this repo?
Hi, Huggingface gets the original llama weights by using the config.json file from the repo. Can you share the entire code file you are using, as well as the library versions of transformers and peft?
This error arises if the peft library is not installed in the environment.
Huggingface gets the original llama weights by using the config.json file from the repo
This pathway works only if the peft library is installed. It is not the best way to load the model, but it was the only way to push just the adapter weights and avoid all the license issues that come with uploading the full model. An alternate way is to use the llm2vec package, which is detailed here.
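For reference, loading everything through the llm2vec package looks roughly like the sketch below. This follows the usage shown in the package README; the peft_model_name_or_path argument stacks the supervised adapter (LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised) on top and can be dropped if you only want the MNTP model.

import torch
from llm2vec import LLM2Vec

# Rough sketch: let llm2vec handle the base-weight download and adapter loading.
l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

embeddings = l2v.encode(["LLM2Vec turns decoder-only LLMs into text encoders."])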
Hey, thanks for replying.
Here are the library versions:
- transformers: 4.40.2
- llm2vec: 0.1.5
- peft: 0.10.0
I have installed the llm2vec library. Below is the code I was trying to run:
from llm2vec import LLM2Vec
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
# Loading the base Llama-2 model, along with custom code that enables bidirectional connections in decoder-only LLMs. MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
And running this gives me the error: OSError: McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
Hi, Huggingface gets the original llama weights by using the config.json file from the repo. Can you share the entire code file you are using, as well as the library versions of transformers and peft?
Are you referring to adapter_config.json?
Also, I am curious to know the exact model architecture of LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised to understand:
- how does the adapter fit into the llama architecture?
- If I need the entire model weights, is it true that all I need to do is download the original llama weights and the adapter weights saved in this repo?
Thanks a lot for your patience and time!
Are you referring to adapter_config.json?
I am referring to config.json. This HF repo contains both config.json and adapter_config.json. HF fetches the original model using the _name_or_path field in the config. The trust_remote_code flag is needed to apply bidirectional connections to the model. You can find a more detailed description here.
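For example, you can see which base checkpoint the repo points to by reading the raw config.json (a small illustration using huggingface_hub):

import json
from huggingface_hub import hf_hub_download

# Download only config.json from the repo and print the field that
# from_pretrained uses to locate the base Llama weights.
cfg_path = hf_hub_download(
    repo_id="McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    filename="config.json",
)
with open(cfg_path) as f:
    print(json.load(f).get("_name_or_path"))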
If I need the entire model weights, is it true that all I need to do is download the original llama weights and the adapter weights saved in this repo?
As described above, the original model download should be happening automatically.
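If you do want a single full checkpoint on disk, here is a rough sketch of the manual route, assuming the standard peft workflow of applying the adapter and then merging it. Note that the automatic base download still requires access to the gated original llama weights.

import torch
from transformers import AutoConfig, AutoModel
from peft import PeftModel

repo = "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp"

# The custom bidirectional LlamaEncoderModel class comes from the repo's code;
# the base weights are fetched via _name_or_path in config.json.
config = AutoConfig.from_pretrained(repo, trust_remote_code=True)
base = AutoModel.from_pretrained(
    repo, trust_remote_code=True, config=config, torch_dtype=torch.bfloat16
)

# Apply the MNTP LoRA adapter stored in this repo and fold it into the base
# weights so a standalone checkpoint can be saved.
model = PeftModel.from_pretrained(base, repo)
model = model.merge_and_unload()
model.save_pretrained("llama-2-7b-chat-mntp-merged")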
I am able to run your code snippet without any errors after removing all the cache, hence I am unable to reproduce the issue. I am listing below the library versions being used.
- transformers: 4.40.2
- huggingface-hub: 0.22.2
- peft: 0.10.0
- llm2vec: 0.1.5
If the issue still persists, can you share a YAML file with the entire environment details?
how does the adapter fit into the llama architecture?
LoRA adapters are added to all of the following model modules: v_proj, k_proj, q_proj, o_proj, up_proj, down_proj, gate_proj. After applying LoRA, the Llama architecture looks like this:
PeftModel(
  (base_model): LoraModel(
    (model): LlamaEncoderModel(
      (embed_tokens): Embedding(32000, 4096)
      (layers): ModuleList(
        (0-31): 32 x ModifiedLlamaDecoderLayer(
          (self_attn): ModifiedLlamaSdpaAttention(
            (q_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (k_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (v_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (o_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): LlamaMLP(
            (gate_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=11008, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (up_proj): lora.Linear(
              (base_layer): Linear(in_features=4096, out_features=11008, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=4096, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=11008, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (down_proj): lora.Linear(
              (base_layer): Linear(in_features=11008, out_features=4096, bias=False)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=11008, out_features=16, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=16, out_features=4096, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (act_fn): SiLU()
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
  )
)
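In peft terms, the adapter corresponds to a LoraConfig roughly like the one below. The rank (16) and dropout (0.05) are read off the printout above; lora_alpha and the remaining fields are assumptions, and the authoritative values are in the repo's adapter_config.json.

from peft import LoraConfig

# Illustrative reconstruction only; see adapter_config.json in the repo for the
# actual values (lora_alpha here is an assumption).
lora_config = LoraConfig(
    r=16,               # matches the 16-dim lora_A/lora_B layers in the printout
    lora_alpha=32,      # assumption; not visible in the printed module tree
    lora_dropout=0.05,  # matches Dropout(p=0.05) above
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)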
Thanks a lot for the detailed answer. I tried to rerun the code and it works now. Not sure why it threw the error before.
As described above, the original model download should be happening automatically.
I'm trying to demystify the automation here. It looks like the _name_or_path parameter in config.json is not used anywhere in modeling_llama_encoder.py. The llama weights seem to be loaded when running self.post_init(). Is my understanding correct? I'm not sure how exactly the weights are loaded into LlamaEncoderModel, though. I'm guessing it's based on weight names? I would appreciate it a lot if you could help me dive deeper and understand it better. Thank you!
To be honest, I am not fully sure how the download is happening. The combination of PEFT loading + custom code is undocumented in HuggingFace. I tried a bunch of loading options until one of them worked.
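If it helps, one rough way to convince yourself that the base weights really ended up in the encoder is to compare a tensor against the original checkpoint. This is only a sketch: meta-llama/Llama-2-7b-chat-hf is gated and requires access, and model here is the encoder you loaded from our repo earlier.

import torch
from transformers import AutoModel

# Load the original Llama-2 weights and compare the input embeddings, which are
# not a LoRA target and so should be identical to the base checkpoint.
llama = AutoModel.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.bfloat16
)
same = torch.equal(
    model.get_input_embeddings().weight.detach().cpu(),
    llama.get_input_embeddings().weight.detach().cpu(),
)
print("embed_tokens matches base Llama-2:", same)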
As this current issue is resolved, maybe you can close this and open a separate discussion on our Github page? I am sure many others following our repository will benefit from knowing exactly how the model is being loaded.
Sure, let me do it. Thank you again for your help!