Fix ValueError when saving the model with `.save_pretrained` due to a non-contiguous tensor in the Midm model
#9 by beomi - opened
Proposal of the request
Fix the ValueError raised when saving the model with `.save_pretrained`, caused by a non-contiguous tensor in the Midm model.
Error log
- The ValueError occurs both when calling `save_pretrained` directly and when using `Trainer` from the transformers library.
ValueError: You are trying to save a non contiguous tensor: `transformer.h.0.attn.c_attn.weight` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.
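To make the error message concrete, here is a minimal sketch of what "non-contiguous" means in PyTorch. It does not show why the Midm weight ends up non-contiguous (that is not stated here); a transpose is used only as one common way to produce such a tensor.

```python
import torch

# A freshly created tensor is laid out row by row, so it is contiguous.
base = torch.arange(6).reshape(2, 3)

# Transposing returns a view with swapped strides over the same storage;
# no data is moved, so the view is not contiguous.
view = base.t()

# .contiguous() copies the data into a new, densely packed buffer.
packed = view.contiguous()

print(base.is_contiguous(), view.is_contiguous(), packed.is_contiguous())
```

The values of `view` and `packed` are identical; only the memory layout differs, which is what safetensors checks before serializing.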
Full Code & Traceback
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from transformers import AutoModelForCausalLM
In [2]: import torch
In [3]: model = AutoModelForCausalLM.from_pretrained(
...: 'KT-AI/midm-bitext-S-7B-inst-v1', device_map={'':1})
The repository for KT-AI/midm-bitext-S-7B-inst-v1 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/KT-AI/midm-bitext-S-7B-inst-v1.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
The repository for KT-AI/midm-bitext-S-7B-inst-v1 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/KT-AI/midm-bitext-S-7B-inst-v1.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.44s/it]
In [4]: model
Out[4]:
MidmLMHeadModel(
  (transformer): MidmModel(
    (wte): Embedding(72192, 4096)
    (rotary_pos_emb): RotaryEmbedding()
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-31): 32 x MidmBlock(
        (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attn): MidmAttention(
          (c_attn): Linear(in_features=4096, out_features=12288, bias=False)
          (c_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
        )
        (ln_2): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (mlp): MidmMLP(
          (c_fc): Linear(in_features=4096, out_features=21760, bias=False)
          (c_proj): Linear(in_features=10880, out_features=4096, bias=False)
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=4096, out_features=72192, bias=False)
)
In [5]: model.save_pretrained('test')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 1
----> 1 model.save_pretrained('test')
File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/transformers/modeling_utils.py:2187, in PreTrainedModel.save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, token, save_peft_format, **kwargs)
2183 for shard_file, shard in shards.items():
2184 if safe_serialization:
2185 # At some point we will need to deal better with save_function (used for TPU and other distributed
2186 # joyfulness), but for now this enough.
-> 2187 safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
2188 else:
2189 save_function(shard, os.path.join(save_directory, shard_file))
File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:281, in save_file(tensors, filename, metadata)
250 def save_file(
251 tensors: Dict[str, torch.Tensor],
252 filename: Union[str, os.PathLike],
253 metadata: Optional[Dict[str, str]] = None,
254 ):
255 """
256 Saves a dictionary of tensors into raw bytes in safetensors format.
257
(...)
279 ```
280 """
--> 281 serialize_file(_flatten(tensors), filename, metadata=metadata)
File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:475, in _flatten(tensors)
466 if failing:
467 raise RuntimeError(
468 f"""
469 Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: {failing}.
(...)
472 """
473 )
--> 475 return {
476 k: {
477 "dtype": str(v.dtype).split(".")[-1],
478 "shape": v.shape,
479 "data": _tobytes(v, k),
480 }
481 for k, v in tensors.items()
482 }
File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:479, in <dictcomp>(.0)
466 if failing:
467 raise RuntimeError(
468 f"""
469 Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: {failing}.
(...)
472 """
473 )
475 return {
476 k: {
477 "dtype": str(v.dtype).split(".")[-1],
478 "shape": v.shape,
--> 479 "data": _tobytes(v, k),
480 }
481 for k, v in tensors.items()
482 }
File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:396, in _tobytes(tensor, name)
389 raise ValueError(
390 f"You are trying to save a sparse tensor: `{name}` which this library does not support."
391 " You can make it a dense tensor before saving with `.to_dense()` but be aware this might"
392 " make a much larger file than needed."
393 )
395 if not tensor.is_contiguous():
--> 396 raise ValueError(
397 f"You are trying to save a non contiguous tensor: `{name}` which is not allowed. It either means you"
398 " are trying to save tensors which are reference of each other in which case it's recommended to save"
399 " only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to"
400 " pack it before saving."
401 )
402 if tensor.device.type != "cpu":
403 # Moving tensor to cpu before saving
404 tensor = tensor.to("cpu")
ValueError: You are trying to save a non contiguous tensor: `transformer.h.0.attn.c_attn.weight` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.
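For callers who cannot modify the model code, one possible workaround is the `state_dict` argument that appears in the `save_pretrained` signature in the traceback above. The sketch below is an assumption rather than something tested against this model: `contiguous_state_dict` is a hypothetical helper, and a small `nn.Linear` stands in for the real checkpoint.

```python
import torch
from torch import nn


def contiguous_state_dict(model: nn.Module) -> dict:
    """Return the model's state dict with every tensor packed contiguously."""
    # .contiguous() returns the tensor itself when it is already contiguous,
    # so only the offending tensors are copied.
    return {name: t.contiguous() for name, t in model.state_dict().items()}


# Toy stand-in for the real model: a layer whose weight is a transposed view.
toy = nn.Linear(4, 3, bias=False)
toy.weight.data = torch.randn(4, 3).t()

packed = contiguous_state_dict(toy)

# With the real model this would be passed along as (untested assumption):
# model.save_pretrained('test', state_dict=contiguous_state_dict(model))
```

This leaves the model's own parameters untouched and only packs the copies handed to the serializer, at the cost of extra memory for each tensor that has to be copied.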
Solution
Override .save_pretrained
method on MidmPreTrainedModel
to make model's tensor contiguous.
class MidmPreTrainedModel(PreTrainedModel):
    # ... [other methods and properties of the class]

    def make_tensors_contiguous(self):
        for name, param in self.named_parameters():
            if not param.is_contiguous():
                param.data = param.data.contiguous()

    def save_pretrained(self, save_directory, **kwargs):
        # Make tensors contiguous
        self.make_tensors_contiguous()
        # Call the original save_pretrained method
        super().save_pretrained(save_directory, **kwargs)

# Other class definitions remain unchanged
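The effect of the in-place repacking can be checked on a small module without downloading the 7B checkpoint. This is a sketch under stated assumptions: `find_non_contiguous` is a hypothetical helper, the function form of `make_tensors_contiguous` mirrors the method above, and the toy `nn.Linear` with a transposed weight only imitates the problem.

```python
import torch
from torch import nn


def find_non_contiguous(model: nn.Module) -> list:
    """List the names of parameters that are not contiguous in memory."""
    return [name for name, p in model.named_parameters() if not p.is_contiguous()]


def make_tensors_contiguous(model: nn.Module) -> None:
    """Repack non-contiguous parameters in place, as in the override above."""
    for _, param in model.named_parameters():
        if not param.is_contiguous():
            param.data = param.data.contiguous()


# Toy stand-in: give the layer a weight that is a transposed (non-contiguous) view.
layer = nn.Linear(4, 3, bias=False)
layer.weight.data = torch.randn(4, 3).t()
original = layer.weight.detach().clone()

before = find_non_contiguous(layer)
make_tensors_contiguous(layer)
after = find_non_contiguous(layer)

print(before, after)
```

Only the memory layout changes; the parameter values stay the same, so the repacking should not affect model outputs.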
Result
The `save_pretrained` method now runs without error.
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from modeling_midm import MidmLMHeadModel
In [2]: model = MidmLMHeadModel.from_pretrained('KT-AI/midm-bitext-S-7B-inst-v1', device_map={'':1})
Loading checkpoint shards: 100%|██████████| 2/2 [00:16<00:00, 8.30s/it]
In [3]: model.save_pretrained('test')
In [4]: exit
Test code - works fine!
You can test this PR version using `revision='refs/pr/9'`.
Here is the test code:
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from transformers import AutoModelForCausalLM
In [2]: import torch
In [3]: model = AutoModelForCausalLM.from_pretrained('KT-AI/midm-bitext-S-7B-inst-v1', revision='refs/pr/9', trust_remote_code=True, torch_dtype=torch.bfloat16, device_map={'':0})
Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.43s/it]
In [4]: model.save_pretrained('test')
In [5]:
Thank you for submitting your PR and conducting the test. We've reviewed your PR internally, and it looks good.
Additionally, we've updated the safetensors format checkpoint.
ktthkim changed pull request status to merged