
Model appears to be unusable now, due to the 128 padding (perhaps due to recent changes in Transformers?)

#5 · opened by TheBloke

Hey

I've been making AWQs of all the models I've done recently. This one failed due to this error:

```
ValueError: Trying to set a tensor of shape torch.Size([32128, 8192]) in "weight" (which has shape torch.Size([32007, 8192])), this look incorrect.
```

I did some more digging, and realised I can't even load the model in plain Transformers:

```
In [1]: from transformers import AutoModelForCausalLM

In [2]: model = AutoModelForCausalLM.from_pretrained(".", low_cpu_mem_usage=True)
Loading checkpoint shards:   0%|                                                                                                                                                                                       | 0/15 [00:02<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained(".", low_cpu_mem_usage=True)

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:563, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    561 elif type(config) in cls._model_mapping.keys():
    562     model_class = _get_model_class(config, cls._model_mapping)
--> 563     return model_class.from_pretrained(
    564         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    565     )
    566 raise ValueError(
    567     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    568     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    569 )

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:3187, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3177     if dtype_orig is not None:
   3178         torch.set_default_dtype(dtype_orig)
   3180     (
   3181         model,
   3182         missing_keys,
   3183         unexpected_keys,
   3184         mismatched_keys,
   3185         offload_index,
   3186         error_msgs,
-> 3187     ) = cls._load_pretrained_model(
   3188         model,
   3189         state_dict,
   3190         loaded_state_dict_keys,  # XXX: rename?
   3191         resolved_archive_file,
   3192         pretrained_model_name_or_path,
   3193         ignore_mismatched_sizes=ignore_mismatched_sizes,
   3194         sharded_metadata=sharded_metadata,
   3195         _fast_init=_fast_init,
   3196         low_cpu_mem_usage=low_cpu_mem_usage,
   3197         device_map=device_map,
   3198         offload_folder=offload_folder,
   3199         offload_state_dict=offload_state_dict,
   3200         dtype=torch_dtype,
   3201         is_quantized=(getattr(model, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES),
   3202         keep_in_fp32_modules=keep_in_fp32_modules,
   3203     )
   3205 model.is_loaded_in_4bit = load_in_4bit
   3206 model.is_loaded_in_8bit = load_in_8bit

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:3575, in PreTrainedModel._load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, is_quantized, keep_in_fp32_modules)
   3573 if low_cpu_mem_usage:
   3574     if not is_fsdp_enabled() or is_fsdp_enabled_and_dist_rank_0():
-> 3575         new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
   3576             model_to_load,
   3577             state_dict,
   3578             loaded_keys,
   3579             start_prefix,
   3580             expected_keys,
   3581             device_map=device_map,
   3582             offload_folder=offload_folder,
   3583             offload_index=offload_index,
   3584             state_dict_folder=state_dict_folder,
   3585             state_dict_index=state_dict_index,
   3586             dtype=dtype,
   3587             is_quantized=is_quantized,
   3588             is_safetensors=is_safetensors,
   3589             keep_in_fp32_modules=keep_in_fp32_modules,
   3590         )
   3591         error_msgs += new_error_msgs
   3592     else:

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:745, in _load_state_dict_into_meta_model(model, state_dict, loaded_state_dict_keys, start_prefix, expected_keys, device_map, offload_folder, offload_index, state_dict_folder, state_dict_index, dtype, is_quantized, is_safetensors, keep_in_fp32_modules)
    742     state_dict_index = offload_weight(param, param_name, state_dict_folder, state_dict_index)
    743 elif not is_quantized:
    744     # For backward compatibility with older versions of `accelerate`
--> 745     set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
    746 else:
    747     if param.dtype == torch.int8 and param_name.replace("weight", "SCB") in state_dict.keys():

File /workspace/venv/pytorch2/lib/python3.10/site-packages/accelerate/utils/modeling.py:285, in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics)
    283 if value is not None:
    284     if old_value.shape != value.shape:
--> 285         raise ValueError(
    286             f'Trying to set a tensor of shape {value.shape} in "{tensor_name}" (which has shape {old_value.shape}), this look incorrect.'
    287         )
    289     if dtype is None:
    290         # For compatibility with PyTorch load_state_dict which converts state dict dtype to existing dtype in model
    291         value = value.to(old_value.dtype)

ValueError: Trying to set a tensor of shape torch.Size([32128, 8192]) in "weight" (which has shape torch.Size([32007, 8192])), this look incorrect.

In [3]:
```
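For anyone else hitting this, a quick way to confirm the mismatch is to compare the `vocab_size` declared in `config.json` against the embedding shape actually stored in the checkpoint. A minimal sketch; the shard filename, the safetensors format, and the tensor key are assumptions (check `model.safetensors.index.json` for the shard that actually holds the embedding):

```python
from transformers import AutoConfig
from safetensors import safe_open

# The shape from_pretrained will build the model with
config = AutoConfig.from_pretrained(".")
print("config.vocab_size:", config.vocab_size)

# The shape actually stored in the checkpoint. "model.embed_tokens.weight"
# is the usual key for Llama models; the shard name here is a guess.
with safe_open("model-00001-of-00015.safetensors", framework="pt") as f:
    emb = f.get_slice("model.embed_tokens.weight")
    print("checkpoint embedding shape:", emb.get_shape())
```

If the two disagree (here: 32007 in the config vs 32128 in the checkpoint), loading fails exactly as in the traceback above.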

Is there any workaround you know of? I'm curious how this is still working for people - or maybe it isn't any more.

It worked for me when I made GPTQs three weeks ago, so I'm wondering if a recent update to Transformers or Accelerate (the error comes from Accelerate) is what's triggering the problem.

I'll see if I can go back to the earlier revision, before the 128 padding, to make the AWQ.

I went back to the earlier commit (d9f292769e461eec1f7bfe416ccd4e8043a46179), and now I can load the model and the AWQ is being created with no errors.
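For anyone wanting to do the same, the older commit can be pinned directly with the `revision` argument to `from_pretrained` (a sketch; `"org/model-name"` is a placeholder for this repo's id):

```python
from transformers import AutoModelForCausalLM

# Load the pre-padding commit by its hash; "org/model-name" is a
# placeholder for this repo's id.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",
    revision="d9f292769e461eec1f7bfe416ccd4e8043a46179",
    low_cpu_mem_usage=True,
)
```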

I guess this AWQ probably won't be shardable due to the uneven vocab_size. But better than not being able to make it at all!
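(The divisibility issue, concretely: the unpadded vocab is odd, so it never splits evenly across common tensor-parallel world sizes, which is presumably why the padding to a multiple of 128 was added in the first place.)

```python
# 32007 (unpadded vocab) can't be split evenly; 32128 (padded) can.
for world_size in (2, 4, 8):
    print(world_size, 32007 % world_size, 32128 % world_size)
# 2 1 0
# 4 3 0
# 8 7 0
```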

Let me know if you've got any thoughts as to why I can't load the pad-to-128 version.

Switching to accelerate==0.21.0 worked for me.
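If you want to pin it the same way (assuming a pip-based environment):

```
pip install accelerate==0.21.0
```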

Any update on an official fix for this with up-to-date Transformers + Accelerate versions?

change vocab_size to 32128 in config.json
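In other words, make the declared `vocab_size` match the padded embedding actually stored in the checkpoint. A minimal sketch, run from the model directory:

```python
import json

# Bump config.json's vocab_size to the padded value so from_pretrained
# builds an embedding of shape [32128, 8192], matching the checkpoint.
with open("config.json") as f:
    config = json.load(f)

config["vocab_size"] = 32128

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

The extra 121 rows (32128 - 32007) are just padding, so generated token ids should still fall inside the real tokenizer range.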
