Model appears to be unusable now because of the 128 padding (perhaps due to recent changes in Transformers?)
Hey
I've been making AWQs of all the models I've done recently. This one failed with the following error:
ValueError: Trying to set a tensor of shape torch.Size([32128, 8192]) in "weight" (which has shape torch.Size([32007, 8192])), this look incorrect.
I did some more digging, and realised I can't even load the model in plain Transformers:
In [1]: from transformers import AutoModelForCausalLM
In [2]: model = AutoModelForCausalLM.from_pretrained(".", low_cpu_mem_usage=True)
Loading checkpoint shards: 0%| | 0/15 [00:02<?, ?it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained(".", low_cpu_mem_usage=True)
File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:563, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
561 elif type(config) in cls._model_mapping.keys():
562 model_class = _get_model_class(config, cls._model_mapping)
--> 563 return model_class.from_pretrained(
564 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
565 )
566 raise ValueError(
567 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
568 f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
569 )
File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:3187, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3177 if dtype_orig is not None:
3178 torch.set_default_dtype(dtype_orig)
3180 (
3181 model,
3182 missing_keys,
3183 unexpected_keys,
3184 mismatched_keys,
3185 offload_index,
3186 error_msgs,
-> 3187 ) = cls._load_pretrained_model(
3188 model,
3189 state_dict,
3190 loaded_state_dict_keys, # XXX: rename?
3191 resolved_archive_file,
3192 pretrained_model_name_or_path,
3193 ignore_mismatched_sizes=ignore_mismatched_sizes,
3194 sharded_metadata=sharded_metadata,
3195 _fast_init=_fast_init,
3196 low_cpu_mem_usage=low_cpu_mem_usage,
3197 device_map=device_map,
3198 offload_folder=offload_folder,
3199 offload_state_dict=offload_state_dict,
3200 dtype=torch_dtype,
3201 is_quantized=(getattr(model, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES),
3202 keep_in_fp32_modules=keep_in_fp32_modules,
3203 )
3205 model.is_loaded_in_4bit = load_in_4bit
3206 model.is_loaded_in_8bit = load_in_8bit
File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:3575, in PreTrainedModel._load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, is_quantized, keep_in_fp32_modules)
3573 if low_cpu_mem_usage:
3574 if not is_fsdp_enabled() or is_fsdp_enabled_and_dist_rank_0():
-> 3575 new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
3576 model_to_load,
3577 state_dict,
3578 loaded_keys,
3579 start_prefix,
3580 expected_keys,
3581 device_map=device_map,
3582 offload_folder=offload_folder,
3583 offload_index=offload_index,
3584 state_dict_folder=state_dict_folder,
3585 state_dict_index=state_dict_index,
3586 dtype=dtype,
3587 is_quantized=is_quantized,
3588 is_safetensors=is_safetensors,
3589 keep_in_fp32_modules=keep_in_fp32_modules,
3590 )
3591 error_msgs += new_error_msgs
3592 else:
File /workspace/venv/pytorch2/lib/python3.10/site-packages/transformers/modeling_utils.py:745, in _load_state_dict_into_meta_model(model, state_dict, loaded_state_dict_keys, start_prefix, expected_keys, device_map, offload_folder, offload_index, state_dict_folder, state_dict_index, dtype, is_quantized, is_safetensors, keep_in_fp32_modules)
742 state_dict_index = offload_weight(param, param_name, state_dict_folder, state_dict_index)
743 elif not is_quantized:
744 # For backward compatibility with older versions of `accelerate`
--> 745 set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
746 else:
747 if param.dtype == torch.int8 and param_name.replace("weight", "SCB") in state_dict.keys():
File /workspace/venv/pytorch2/lib/python3.10/site-packages/accelerate/utils/modeling.py:285, in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics)
283 if value is not None:
284 if old_value.shape != value.shape:
--> 285 raise ValueError(
286 f'Trying to set a tensor of shape {value.shape} in "{tensor_name}" (which has shape {old_value.shape}), this look incorrect.'
287 )
289 if dtype is None:
290 # For compatibility with PyTorch load_state_dict which converts state dict dtype to existing dtype in model
291 value = value.to(old_value.dtype)
ValueError: Trying to set a tensor of shape torch.Size([32128, 8192]) in "weight" (which has shape torch.Size([32007, 8192])), this look incorrect.
In [3]:
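For what it's worth, the 32128 in that error looks like the 32007-entry vocab rounded up to the next multiple of 128 (my assumption, going by the "128 padding" this thread is about) - a quick check:

import math
config_vocab = 32007                          # vocab_size currently declared in config.json
padded = math.ceil(config_vocab / 128) * 128  # round up to the next multiple of 128
print(padded)                                 # 32128 - matches the checkpoint tensor in the error

So the checkpoint's embedding was padded, but the config still describes the unpadded size.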
Is there any workaround you know of? I'm curious how this is still working for people - or maybe it isn't any more.
It worked for me when I made GPTQs three weeks ago, so I'm wondering if a recent update to Transformers or Accelerate (the error comes from Accelerate) is what's triggering the problem.
I'll see if I can go back to the earlier revision, before the 128 padding, to make the AWQ.
I went back to the earlier commit (d9f292769e461eec1f7bfe416ccd4e8043a46179) and now I can load the model, and the AWQ is being created with no errors.
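For anyone else hitting this: if you're loading from the Hub rather than a local checkout, the same thing can probably be done by pinning that commit at load time (rough sketch; the repo id below is a placeholder):

from transformers import AutoModelForCausalLM

MODEL_ID = "org/model"  # placeholder - substitute the actual repo id
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    revision="d9f292769e461eec1f7bfe416ccd4e8043a46179",  # the pre-padding commit
    low_cpu_mem_usage=True,
)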
I guess this AWQ probably won't be shardable due to the uneven vocab_size, but that's better than not being able to make it at all!
Let me know if you've got any thoughts as to why I can't load the pad-to-128 version.
Switching to accelerate==0.21.0 worked for me.
Any update on an official fix for this with up-to-date Transformers + Accelerate versions?
Change vocab_size to 32128 in config.json.
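If you'd rather not hand-edit the file, the same override can probably be applied at load time (a sketch, assuming the checkpoint embedding really is 32128 rows, as the error suggests):

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(".")
config.vocab_size = 32128  # match the padded embedding in the checkpoint
model = AutoModelForCausalLM.from_pretrained(".", config=config, low_cpu_mem_usage=True)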