How did you manage to quantize the model?

#3 opened by SaffalPoosh

I was also trying to quantize some 40B/30B LLM models using bitsandbytes and the AutoGPTQ algorithm. With bitsandbytes it was giving me errors related to the layers. It would be extremely helpful if you could give some insight into how you managed to quantize this model.

Also, model.save_pretrained() after loading with load_in_8bit=True gives an error that the quantized model cannot be saved. How did you push it here?

thanks

Not all models can be quantized using the bitsandbytes integration. If the model contains custom layers outside the standard Hugging Face transformers library, you will not be able to use the bitsandbytes route. What error are you facing when saving the model?
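For reference, the bitsandbytes route looks roughly like the sketch below. The model ID, output path, and Hub repo name are placeholders, and this is not necessarily how this particular checkpoint was produced:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"  # placeholder; use the model you want to quantize
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # let accelerate place the layers on the available devices
    load_in_8bit=True,       # bitsandbytes int8 quantization
    trust_remote_code=True,  # needed if the checkpoint ships custom modelling code
)

# Saving / pushing the 8-bit weights needs bitsandbytes > 0.37.2
# and a matching transformers version.
model.save_pretrained("falcon-7b-instruct-8bit")
# model.push_to_hub("your-username/falcon-7b-instruct-8bit")  # placeholder repo name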

I also get an error when calling model.save_pretrained() with the 8-bit falcon-7b-instruct model. This is the warning when it's run:
/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/modeling_utils.py:1709: UserWarning: You are calling save_pretrained to a 8-bit converted model you may likely encounter unexepected behaviors. If you want to save 8-bit models, make sure to have bitsandbytes>0.37.2 installed. warnings.warn(

This is the error:
AttributeError Traceback (most recent call last)
File :1

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/modeling_utils.py:1820, in PreTrainedModel.save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, **kwargs)
1817 weights_name = SAFE_WEIGHTS_NAME if safe_serialization else WEIGHTS_NAME
1818 weights_name = _add_variant(weights_name, variant)
-> 1820 shards, index = shard_checkpoint(state_dict, max_shard_size=max_shard_size, weights_name=weights_name)
1822 # Clean the folder from a previous save
1823 for filename in os.listdir(save_directory):

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/modeling_utils.py:318, in shard_checkpoint(state_dict, max_shard_size, weights_name)
315 storage_id_to_block = {}
317 for key, weight in state_dict.items():
--> 318 storage_id = id_tensor_storage(weight)
320 # If a weight shares the same underlying storage as another tensor, we put weight in the same block
321 if storage_id in storage_id_to_block:

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/transformers/pytorch_utils.py:290, in id_tensor_storage(tensor)
283 def id_tensor_storage(tensor: torch.Tensor) -> Tuple[torch.device, int, int]:
284 """
285 Unique identifier to a tensor storage. Multiple different tensors can share the same underlying storage. For
286 example, "meta" tensors all share the same storage, and thus their identifier will all be equal. This identifier is
287 guaranteed to be unique and constant for this tensor's storage during its lifetime. Two tensor storages with
288 non-overlapping lifetimes may have the same id.
289 """
--> 290 return tensor.device, storage_ptr(tensor), storage_size(tensor)

AttributeError: 'str' object has no attribute 'device'

Well, do you have the right bitsandbytes version?

I have version 0.39.1

Maybe try an older version. The Hugging Face documentation uses bitsandbytes==0.38.0.post1.
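If it helps, a quick way to pin and then verify the version from the same environment (the expected version string is just the one suggested above):

# From the shell of the environment you load the model in:
#   pip install bitsandbytes==0.38.0.post1
import bitsandbytes
print(bitsandbytes.__version__)  # should print 0.38.0.post1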

Okay, I'll try that. Thanks!

Yeah, setting bitsandbytes to version 0.38.0.post1 fixed my issue

ichitaka changed discussion status to closed
