
Model appears to be unusable now, due to the 128 padding (perhaps due to recent changes in Transformers?)

#5 · opened by TheBloke

Hey

I've been making AWQs of all the models I've done recently. This one failed due to this error:

```
ValueError: Trying to set a tensor of shape torch.Size([32128, 8192]) in "weight" (which has shape torch.Size([32007, 8192])), this look incorrect.
```

I did some more digging, and realised I can't even load the model in plain Transformers:

```
In [1]: from transformers import AutoModelForCausalLM

In [2]: model = AutoModelForCausalLM.from_pretrained(".", low_cpu_mem_usage=True)
Loading checkpoint shards:   0%|                                                                                                                                                                                       | 0/15 [00:02<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained(".", low_cpu_mem_usage=True)

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:563, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    561 elif type(config) in cls._model_mapping.keys():
    562     model_class = _get_model_class(config, cls._model_mapping)
--> 563     return model_class.from_pretrained(
    564         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    565     )
    566 raise ValueError(
    567     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    568     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    569 )

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:3187, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3177     if dtype_orig is not None:
   3178         torch.set_default_dtype(dtype_orig)
   3180     (
   3181         model,
   3182         missing_keys,
   3183         unexpected_keys,
   3184         mismatched_keys,
   3185         offload_index,
   3186         error_msgs,
-> 3187     ) = cls._load_pretrained_model(
   3188         model,
   3189         state_dict,
   3190         loaded_state_dict_keys,  # XXX: rename?
   3191         resolved_archive_file,
   3192         pretrained_model_name_or_path,
   3193         ignore_mismatched_sizes=ignore_mismatched_sizes,
   3194         sharded_metadata=sharded_metadata,
   3195         _fast_init=_fast_init,
   3196         low_cpu_mem_usage=low_cpu_mem_usage,
   3197         device_map=device_map,
   3198         offload_folder=offload_folder,
   3199         offload_state_dict=offload_state_dict,
   3200         dtype=torch_dtype,
   3201         is_quantized=(getattr(model, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES),
   3202         keep_in_fp32_modules=keep_in_fp32_modules,
   3203     )
   3205 model.is_loaded_in_4bit = load_in_4bit
   3206 model.is_loaded_in_8bit = load_in_8bit

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:3575, in PreTrainedModel._load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, is_quantized, keep_in_fp32_modules)
   3573 if low_cpu_mem_usage:
   3574     if not is_fsdp_enabled() or is_fsdp_enabled_and_dist_rank_0():
-> 3575         new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
   3576             model_to_load,
   3577             state_dict,
   3578             loaded_keys,
   3579             start_prefix,
   3580             expected_keys,
   3581             device_map=device_map,
   3582             offload_folder=offload_folder,
   3583             offload_index=offload_index,
   3584             state_dict_folder=state_dict_folder,
   3585             state_dict_index=state_dict_index,
   3586             dtype=dtype,
   3587             is_quantized=is_quantized,
   3588             is_safetensors=is_safetensors,
   3589             keep_in_fp32_modules=keep_in_fp32_modules,
   3590         )
   3591         error_msgs += new_error_msgs
   3592     else:

File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:745, in _load_state_dict_into_meta_model(model, state_dict, loaded_state_dict_keys, start_prefix, expected_keys, device_map, offload_folder, offload_index, state_dict_folder, state_dict_index, dtype, is_quantized, is_safetensors, keep_in_fp32_modules)
    742     state_dict_index = offload_weight(param, param_name, state_dict_folder, state_dict_index)
    743 elif not is_quantized:
    744     # For backward compatibility with older versions of `accelerate`
--> 745     set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
    746 else:
    747     if param.dtype == torch.int8 and param_name.replace("weight", "SCB") in state_dict.keys():

File /workspace/venv/pytorch2/lib/python3.10/site-packages/accelerate/utils/modeling.py:285, in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics)
    283 if value is not None:
    284     if old_value.shape != value.shape:
--> 285         raise ValueError(
    286             f'Trying to set a tensor of shape {value.shape} in "{tensor_name}" (which has shape {old_value.shape}), this look incorrect.'
    287         )
    289     if dtype is None:
    290         # For compatibility with PyTorch load_state_dict which converts state dict dtype to existing dtype in model
    291         value = value.to(old_value.dtype)

ValueError: Trying to set a tensor of shape torch.Size([32128, 8192]) in "weight" (which has shape torch.Size([32007, 8192])), this look incorrect.

In [3]:
```
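For anyone else hitting this, a quick way to confirm the mismatch is to compare the `vocab_size` declared in `config.json` against the embedding shape actually stored in the checkpoint. A minimal sketch; the shard filename, the safetensors format, and the tensor key are assumptions (check `model.safetensors.index.json` for the shard that actually holds the embedding):

```python
from transformers import AutoConfig
from safetensors import safe_open

# The shape from_pretrained will build the model with
config = AutoConfig.from_pretrained(".")
print("config.vocab_size:", config.vocab_size)

# The shape actually stored in the checkpoint. "model.embed_tokens.weight"
# is the usual key for Llama models; the shard name here is a guess.
with safe_open("model-00001-of-00015.safetensors", framework="pt") as f:
    emb = f.get_slice("model.embed_tokens.weight")
    print("checkpoint embedding shape:", emb.get_shape())
```

If the two disagree (here: 32007 in the config vs 32128 in the checkpoint), loading fails exactly as in the traceback above.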

Is there any workaround you know of? I'm curious how this is still working for people - or maybe it isn't any more.

It worked for me when I made GPTQs three weeks ago, so I'm wondering if a recent update to Transformers or Accelerate (the error comes from Accelerate) is what's triggering the problem.

I'll see if I can go back to the earlier revision, before the 128 padding, to make the AWQ.

I went back to the earlier commit (d9f292769e461eec1f7bfe416ccd4e8043a46179), and now I can load the model and the AWQ is being created with no errors.
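For anyone wanting to do the same, the older commit can be pinned directly with the `revision` argument to `from_pretrained` (a sketch; `"org/model-name"` is a placeholder for this repo's id):

```python
from transformers import AutoModelForCausalLM

# Load the pre-padding commit by its hash; "org/model-name" is a
# placeholder for this repo's id.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",
    revision="d9f292769e461eec1f7bfe416ccd4e8043a46179",
    low_cpu_mem_usage=True,
)
```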

I guess this AWQ probably won't be shardable due to the uneven vocab_size. But better than not being able to make it at all!
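(The divisibility issue, concretely: the unpadded vocab is odd, so it never splits evenly across common tensor-parallel world sizes, which is presumably why the padding to a multiple of 128 was added in the first place.)

```python
# 32007 (unpadded vocab) can't be split evenly; 32128 (padded) can.
for world_size in (2, 4, 8):
    print(world_size, 32007 % world_size, 32128 % world_size)
# 2 1 0
# 4 3 0
# 8 7 0
```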

Let me know if you've got any thoughts as to why I can't load the pad-to-128 version.

Switching to accelerate==0.21.0 worked for me.
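If you want to pin it the same way (assuming a pip-based environment):

```
pip install accelerate==0.21.0
```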

Any update on an official fix for this with up-to-date Transformers + Accelerate versions?

change vocab_size to 32128 in config.json
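In other words, make the declared `vocab_size` match the padded embedding actually stored in the checkpoint. A minimal sketch, run from the model directory:

```python
import json

# Bump config.json's vocab_size to the padded value so from_pretrained
# builds an embedding of shape [32128, 8192], matching the checkpoint.
with open("config.json") as f:
    config = json.load(f)

config["vocab_size"] = 32128

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```

The extra 121 rows (32128 - 32007) are just padding, so generated token ids should still fall inside the real tokenizer range.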
