70B model fails to load on startup

#1
by wangbc - opened

Hi, I want to test the 70B AgentLM model. I'm using the following code:

# Load model directly

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("THUDM/agentlm-70b")
model = AutoModelForCausalLM.from_pretrained("THUDM/agentlm-70b")

It fails with the following error:
Traceback (most recent call last):
File "load.py", line 5, in
model = AutoModelForCausalLM.from_pretrained("THUDM/agentlm-70b")
File "/home/tiger/.local/lib/python3.7/site-packages/transformers/models/auto/auto_factory.py", line 485, in from_pretrained
pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
File "/home/tiger/.local/lib/python3.7/site-packages/transformers/modeling_utils.py", line 2896, in from_pretrained
keep_in_fp32_modules=keep_in_fp32_modules,
File "/home/tiger/.local/lib/python3.7/site-packages/transformers/modeling_utils.py", line 3278, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.class.name}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
size mismatch for model.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
size mismatch for model.layers.1.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
size mismatch for model.layers.2.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).

Has anyone run into this before? How can I fix it?

This is a compatibility issue between Llama 2 and older transformers versions; see https://github.com/facebookresearch/llama/issues/378.
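
A minimal sketch of that fix, assuming the mismatch comes from a transformers release that predates Llama 2's grouped-query attention support (the [1024, 8192] k_proj/v_proj shapes in the checkpoint correspond to num_key_value_heads=8, which older versions ignore). Upgrading transformers to 4.31.0 or later should build the model with the matching shapes; the torch_dtype and device_map arguments below are optional suggestions for fitting a 70B checkpoint into memory, not part of the original code:

# First upgrade transformers (and accelerate, needed for device_map="auto").
# Recent transformers releases require Python >= 3.8, so the Python 3.7
# environment shown in the traceback may also need updating.
#   pip install -U "transformers>=4.31.0" accelerate sentencepiece

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("THUDM/agentlm-70b")

# Half precision plus device_map="auto" shards the ~140 GB of fp16 weights
# across the available GPUs/CPU RAM; loading in full fp32 on CPU would need
# roughly twice that.
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/agentlm-70b",
    torch_dtype=torch.float16,
    device_map="auto",
)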
