Fix ValueError when saving a model with `.save_pretrained` due to non-contiguous tensors in the Midm model

#9
by beomi - opened

Proposal

Fix the ValueError raised when saving the model with .save_pretrained, caused by non-contiguous tensors in the Midm model.

Error log

  • The ValueError happens both when calling save_pretrained directly and when saving through the transformers Trainer; a diagnostic sketch for locating the offending tensors follows the error message below.
ValueError: You are trying to save a non contiguous tensor: `transformer.h.0.attn.c_attn.weight` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.
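
For reference, here is a minimal diagnostic sketch (not part of the PR) that lists every parameter safetensors would reject; the model ID and loading options mirror the transcript below:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'KT-AI/midm-bitext-S-7B-inst-v1', trust_remote_code=True
)

# Print every parameter whose storage is not contiguous --
# these are exactly the tensors safetensors refuses to serialize.
for name, param in model.named_parameters():
    if not param.is_contiguous():
        print(name, tuple(param.shape))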

Full Code & Traceback

Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from transformers import AutoModelForCausalLM
In [2]: import torch

In [3]: model = AutoModelForCausalLM.from_pretrained(
   ...: 'KT-AI/midm-bitext-S-7B-inst-v1', device_map={'':1})
The repository for KT-AI/midm-bitext-S-7B-inst-v1 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/KT-AI/midm-bitext-S-7B-inst-v1.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
The repository for KT-AI/midm-bitext-S-7B-inst-v1 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/KT-AI/midm-bitext-S-7B-inst-v1.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:18<00:00,  9.44s/it]

In [4]: model
Out[4]: 
MidmLMHeadModel(
  (transformer): MidmModel(
    (wte): Embedding(72192, 4096)
    (rotary_pos_emb): RotaryEmbedding()
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-31): 32 x MidmBlock(
        (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attn): MidmAttention(
          (c_attn): Linear(in_features=4096, out_features=12288, bias=False)
          (c_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
        )
        (ln_2): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (mlp): MidmMLP(
          (c_fc): Linear(in_features=4096, out_features=21760, bias=False)
          (c_proj): Linear(in_features=10880, out_features=4096, bias=False)
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=4096, out_features=72192, bias=False)
)

In [5]: model.save_pretrained('test')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 model.save_pretrained('test')

File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/transformers/modeling_utils.py:2187, in PreTrainedModel.save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, token, save_peft_format, **kwargs)
   2183 for shard_file, shard in shards.items():
   2184     if safe_serialization:
   2185         # At some point we will need to deal better with save_function (used for TPU and other distributed
   2186         # joyfulness), but for now this enough.
-> 2187         safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
   2188     else:
   2189         save_function(shard, os.path.join(save_directory, shard_file))

File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:281, in save_file(tensors, filename, metadata)
    250 def save_file(
    251     tensors: Dict[str, torch.Tensor],
    252     filename: Union[str, os.PathLike],
    253     metadata: Optional[Dict[str, str]] = None,
    254 ):
    255     """
    256     Saves a dictionary of tensors into raw bytes in safetensors format.
    257 
   (...)
    279     ```
    280     """
--> 281     serialize_file(_flatten(tensors), filename, metadata=metadata)

File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:475, in _flatten(tensors)
    466 if failing:
    467     raise RuntimeError(
    468         f"""
    469         Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: {failing}.
   (...)
    472         """
    473     )
--> 475 return {
    476     k: {
    477         "dtype": str(v.dtype).split(".")[-1],
    478         "shape": v.shape,
    479         "data": _tobytes(v, k),
    480     }
    481     for k, v in tensors.items()
    482 }

File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:479, in <dictcomp>(.0)
    466 if failing:
    467     raise RuntimeError(
    468         f"""
    469         Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: {failing}.
   (...)
    472         """
    473     )
    475 return {
    476     k: {
    477         "dtype": str(v.dtype).split(".")[-1],
    478         "shape": v.shape,
--> 479         "data": _tobytes(v, k),
    480     }
    481     for k, v in tensors.items()
    482 }

File ~/anaconda3/envs/career-chatbot-trainer-clm/lib/python3.10/site-packages/safetensors/torch.py:396, in _tobytes(tensor, name)
    389     raise ValueError(
    390         f"You are trying to save a sparse tensor: `{name}` which this library does not support."
    391         " You can make it a dense tensor before saving with `.to_dense()` but be aware this might"
    392         " make a much larger file than needed."
    393     )
    395 if not tensor.is_contiguous():
--> 396     raise ValueError(
    397         f"You are trying to save a non contiguous tensor: `{name}` which is not allowed. It either means you"
    398         " are trying to save tensors which are reference of each other in which case it's recommended to save"
    399         " only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to"
    400         " pack it before saving."
    401     )
    402 if tensor.device.type != "cpu":
    403     # Moving tensor to cpu before saving
    404     tensor = tensor.to("cpu")

ValueError: You are trying to save a non contiguous tensor: `transformer.h.0.attn.c_attn.weight` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.
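
For context, a non-contiguous tensor is typically a view (a transpose or slice) that shares storage with another tensor; safetensors requires each tensor's bytes to be packed before serialization. A standalone illustration of the mechanism:

import torch

w = torch.randn(4096, 12288)
wt = w.t()  # transposed view, shares storage with w

print(wt.is_contiguous())               # False -- would be rejected
print(wt.contiguous().is_contiguous())  # True  -- repacked copy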

Solution

The offending tensors are most likely transposed or sliced views created while the checkpoint is loaded, so their storage is not contiguous. Override the .save_pretrained method on MidmPreTrainedModel to make the model's tensors contiguous before saving:

from transformers import PreTrainedModel

class MidmPreTrainedModel(PreTrainedModel):
    # ... [other methods and properties of the class]

    def make_tensors_contiguous(self):
        # Repack any non-contiguous parameter in place so that
        # safetensors serialization no longer rejects it.
        for name, param in self.named_parameters():
            if not param.is_contiguous():
                param.data = param.data.contiguous()

    def save_pretrained(self, save_directory, **kwargs):
        # Make all parameter tensors contiguous first
        self.make_tensors_contiguous()

        # Then call the original save_pretrained method
        super().save_pretrained(save_directory, **kwargs)

# Other class definitions remain unchanged
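
The same fix can also be applied from user code without patching the model class; a minimal sketch, assuming `model` has been loaded as above:

# One-off workaround: repack every non-contiguous parameter
# in place, then save as usual.
for _, param in model.named_parameters():
    if not param.is_contiguous():
        param.data = param.data.contiguous()

model.save_pretrained('test')

Overriding save_pretrained in the class itself is still preferable, since it also covers saves triggered indirectly, e.g. through the transformers Trainer.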

Result

The save_pretrained method now works without error.

Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from modeling_midm import MidmLMHeadModel

In [2]: model = MidmLMHeadModel.from_pretrained('KT-AI/midm-bitext-S-7B-inst-v1', device_map={'':1})
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:16<00:00,  8.30s/it]

In [3]: model.save_pretrained('test')

In [4]: exit

Test code - works fine!

You can test this PR version using revision='refs/pr/9'.

Here's the test code:

Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.17.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from transformers import AutoModelForCausalLM
In [2]: import torch

In [3]: model = AutoModelForCausalLM.from_pretrained('KT-AI/midm-bitext-S-7B-inst-v1', revision='refs/pr/9', trust_remote_code=True, torch_dtype=torch.bfloat16, device_map={'':0})
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:12<00:00,  6.43s/it]

In [4]: model.save_pretrained('test')

In [5]: 
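
As an extra sanity check (not shown in the transcript), the freshly saved checkpoint can be reloaded to confirm the safetensors shards round-trip; 'test' is the directory created above:

import torch
from transformers import AutoModelForCausalLM

# Hypothetical round-trip check: reload the checkpoint saved above.
reloaded = AutoModelForCausalLM.from_pretrained(
    'test', trust_remote_code=True, torch_dtype=torch.bfloat16
)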

Thank you for submitting your PR and conducting the test. We've reviewed your PR internally, and it looks good.
Additionally, we've updated the safetensors format checkpoint.

ktthkim changed pull request status to merged
