Serialization
huggingface_hub provides helpers that let ML libraries serialize model weights in a standardized way. This part of the library is still under development and will be improved in future releases. The goal is to harmonize how weights are serialized on the Hub, both to remove code duplication across libraries and to foster conventions on the Hub.
Save torch state dict
The main helper of the serialization module takes a torch nn.Module as input and saves it to disk. It handles the logic to save shared tensors (see safetensors explanation) as well as the logic to split the state dictionary into shards, using split_torch_state_dict_into_shards() under the hood. At the moment, only the torch framework is supported.
If you want to save a state dictionary (i.e. a mapping between layer names and their tensors) instead of an nn.Module, you can use save_torch_state_dict(), which provides the same features. This is useful, for example, if you want to apply custom logic to the state dict before saving it.
huggingface_hub.save_torch_model
(model: torch.nn.Module, save_directory: typing.Union[str, pathlib.Path], filename_pattern: typing.Optional[str] = None, force_contiguous: bool = True, max_shard_size: typing.Union[int, str] = '5GB', metadata: typing.Optional[typing.Dict[str, str]] = None, safe_serialization: bool = True, is_main_process: bool = True)
Parameters
- model (torch.nn.Module): The model to save on disk.
- save_directory (str or Path): The directory in which the model will be saved.
- filename_pattern (str, optional): The pattern used to generate the names of the files in which the model will be saved. The pattern must be a string that can be formatted with filename_pattern.format(suffix=...) and must contain the keyword suffix. Defaults to "model{suffix}.safetensors" or "pytorch_model{suffix}.bin" depending on the safe_serialization parameter.
- force_contiguous (boolean, optional): Force the state_dict to be saved as contiguous tensors. This has no effect on the correctness of the model, but it could change performance if the layout of a tensor was chosen specifically for that reason. Defaults to True.
- max_shard_size (int or str, optional): The maximum size of each shard, in bytes. Defaults to 5GB.
- metadata (Dict[str, str], optional): Extra information to save along with the model. Some metadata will be added for each dropped tensor. This information is not enough to recover the entire shared structure but might help in understanding it.
- safe_serialization (bool, optional): Whether to save as safetensors, which is the default behavior. If False, the shards are saved as pickle. Safe serialization is recommended for security reasons. Saving as pickle is deprecated and will be removed in a future version.
- is_main_process (bool, optional): Whether the process calling this function is the main process. Useful in distributed training (e.g. on TPUs) when this function must be called from all processes. In this case, set is_main_process=True only on the main process to avoid race conditions. Defaults to True.
Saves a given torch model to disk, handling sharding and shared tensors issues.
See also save_torch_state_dict() to save a state dict with more flexibility.
For more information about tensor sharing, check out this guide.
The model state dictionary is split into shards so that each shard is smaller than a given size. The shards are saved in the save_directory with the given filename_pattern. If the model is too big to fit in a single shard, an index file is saved in the save_directory to indicate where each tensor is saved. This helper uses split_torch_state_dict_into_shards() under the hood. If safe_serialization is True, the shards are saved as safetensors (the default). Otherwise, the shards are saved as pickle.
Before saving the model, the save_directory is cleaned from any previous shard files.
If one of the model’s tensors is bigger than max_shard_size, it will end up in its own shard, which will have a size greater than max_shard_size.
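To make the filename_pattern convention concrete, here is a minimal sketch of how a pattern can be expanded into shard filenames. The helper name shard_filenames and the exact suffix layout ("-00001-of-00003", empty for a single file) are assumptions made for illustration, not a description of the library's internals.

```python
from typing import List


def shard_filenames(filename_pattern: str, num_shards: int) -> List[str]:
    """Hypothetical helper: expand a pattern into one filename per shard."""
    if "{suffix}" not in filename_pattern:
        raise ValueError("filename_pattern must contain the keyword 'suffix'")
    if num_shards == 1:
        # Assumption: a model that fits in one file gets an empty suffix.
        return [filename_pattern.format(suffix="")]
    # Assumption: shards are numbered from 1 with zero-padded counters.
    return [
        filename_pattern.format(suffix=f"-{i:05d}-of-{num_shards:05d}")
        for i in range(1, num_shards + 1)
    ]


print(shard_filenames("model{suffix}.safetensors", 1))
print(shard_filenames("model{suffix}.safetensors", 3))
```

Any custom pattern works the same way as long as it contains the {suffix} placeholder.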
Example:
>>> from huggingface_hub import save_torch_model
>>> model = ... # A PyTorch model
# Save state dict to "path/to/folder". The model will be split into shards of 5GB each and saved as safetensors.
>>> save_torch_model(model, "path/to/folder")
# Load model back
>>> from huggingface_hub import load_torch_model # TODO
>>> load_torch_model(model, "path/to/folder")
huggingface_hub.save_torch_state_dict
(state_dict: typing.Dict[str, torch.Tensor], save_directory: typing.Union[str, pathlib.Path], filename_pattern: typing.Optional[str] = None, force_contiguous: bool = True, max_shard_size: typing.Union[int, str] = '5GB', metadata: typing.Optional[typing.Dict[str, str]] = None, safe_serialization: bool = True, is_main_process: bool = True)
Parameters
- state_dict (Dict[str, torch.Tensor]): The state dictionary to save.
- save_directory (str or Path): The directory in which the model will be saved.
- filename_pattern (str, optional): The pattern used to generate the names of the files in which the model will be saved. The pattern must be a string that can be formatted with filename_pattern.format(suffix=...) and must contain the keyword suffix. Defaults to "model{suffix}.safetensors" or "pytorch_model{suffix}.bin" depending on the safe_serialization parameter.
- force_contiguous (boolean, optional): Force the state_dict to be saved as contiguous tensors. This has no effect on the correctness of the model, but it could change performance if the layout of a tensor was chosen specifically for that reason. Defaults to True.
- max_shard_size (int or str, optional): The maximum size of each shard, in bytes. Defaults to 5GB.
- metadata (Dict[str, str], optional): Extra information to save along with the model. Some metadata will be added for each dropped tensor. This information is not enough to recover the entire shared structure but might help in understanding it.
- safe_serialization (bool, optional): Whether to save as safetensors, which is the default behavior. If False, the shards are saved as pickle. Safe serialization is recommended for security reasons. Saving as pickle is deprecated and will be removed in a future version.
- is_main_process (bool, optional): Whether the process calling this function is the main process. Useful in distributed training (e.g. on TPUs) when this function must be called from all processes. In this case, set is_main_process=True only on the main process to avoid race conditions. Defaults to True.
Save a model state dictionary to the disk, handling sharding and shared tensors issues.
See also save_torch_model() to directly save a PyTorch model.
For more information about tensor sharing, check out this guide.
The model state dictionary is split into shards so that each shard is smaller than a given size. The shards are saved in the save_directory with the given filename_pattern. If the model is too big to fit in a single shard, an index file is saved in the save_directory to indicate where each tensor is saved. This helper uses split_torch_state_dict_into_shards() under the hood. If safe_serialization is True, the shards are saved as safetensors (the default). Otherwise, the shards are saved as pickle.
Before saving the model, the save_directory is cleaned from any previous shard files.
If one of the model’s tensors is bigger than max_shard_size, it will end up in its own shard, which will have a size greater than max_shard_size.
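Since max_shard_size accepts either an integer number of bytes or a string such as "5GB", the sketch below shows one way such a value could be normalized to bytes. The helper to_bytes is hypothetical, and the use of decimal units (1 GB = 10**9 bytes) is an assumption of this sketch rather than something stated on this page.

```python
import re
from typing import Union

# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes.
_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_bytes(size: Union[int, str]) -> int:
    """Hypothetical helper: normalize a max_shard_size-style value to bytes."""
    if isinstance(size, int):
        # Integers are already expressed in bytes.
        return size
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]B)\s*", size.upper())
    if match is None:
        raise ValueError(f"Cannot parse size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


print(to_bytes("5GB"))
print(to_bytes(1024))
```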
Example:
>>> from huggingface_hub import save_torch_state_dict
>>> model = ... # A PyTorch model
# Save state dict to "path/to/folder". The model will be split into shards of 5GB each and saved as safetensors.
>>> state_dict = model.state_dict()
>>> save_torch_state_dict(state_dict, "path/to/folder")
Split state dict into shards
The serialization module also contains low-level helpers to split a state dictionary into several shards, while creating a proper index in the process. These helpers are available for torch and tensorflow tensors and are designed to be easily extended to any other ML framework.
split_tf_state_dict_into_shards
huggingface_hub.split_tf_state_dict_into_shards
(state_dict: typing.Dict[str, tf.Tensor], filename_pattern: str = 'tf_model{suffix}.h5', max_shard_size: typing.Union[int, str] = '5GB') → StateDictSplit
Parameters
- state_dict (Dict[str, Tensor]): The state dictionary to save.
- filename_pattern (str, optional): The pattern used to generate the names of the files in which the model will be saved. The pattern must be a string that can be formatted with filename_pattern.format(suffix=...) and must contain the keyword suffix. Defaults to "tf_model{suffix}.h5".
- max_shard_size (int or str, optional): The maximum size of each shard, in bytes. Defaults to 5GB.
Returns
StateDictSplit: A StateDictSplit object containing the shards and the index to retrieve them.
Split a model state dictionary into shards so that each shard is smaller than a given size.
The shards are determined by iterating through the state_dict in the order of its keys. No optimization is made to bring each shard as close as possible to the maximum size passed. For example, if the limit is 10GB and we have tensors of sizes [6GB, 6GB, 2GB, 6GB, 2GB, 2GB], they will get sharded as [6GB], [6+2GB], [6+2+2GB] and not [6+2+2GB], [6+2GB], [6GB].
If one of the model’s tensors is bigger than max_shard_size, it will end up in its own shard, which will have a size greater than max_shard_size.
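The in-order, non-optimizing strategy described above can be sketched in a few lines of plain Python. This is an illustration of the greedy rule only, working on tensor sizes rather than real tensors; the function name greedy_shards is hypothetical and the code is not the library's implementation.

```python
from typing import Dict, List


def greedy_shards(sizes: Dict[str, int], max_shard_size: int) -> List[List[str]]:
    """Group tensor names into shards by walking the dict in key order."""
    shards: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sizes.items():
        # Close the current shard when adding this tensor would exceed the limit.
        # An oversized tensor still gets placed, alone, in its own shard.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


GB = 10**9
sizes = {"a": 6 * GB, "b": 6 * GB, "c": 2 * GB, "d": 6 * GB, "e": 2 * GB, "f": 2 * GB}
print(greedy_shards(sizes, 10 * GB))  # [['a'], ['b', 'c'], ['d', 'e', 'f']]
```

The result matches the [6GB], [6+2GB], [6+2+2GB] layout from the example: the first shard is left at 6GB because the next tensor in key order does not fit, even though a later 2GB tensor would.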
split_torch_state_dict_into_shards
huggingface_hub.split_torch_state_dict_into_shards
(state_dict: typing.Dict[str, torch.Tensor], filename_pattern: str = 'model{suffix}.safetensors', max_shard_size: typing.Union[int, str] = '5GB') → StateDictSplit
Parameters
- state_dict (Dict[str, torch.Tensor]): The state dictionary to save.
- filename_pattern (str, optional): The pattern used to generate the names of the files in which the model will be saved. The pattern must be a string that can be formatted with filename_pattern.format(suffix=...) and must contain the keyword suffix. Defaults to "model{suffix}.safetensors".
- max_shard_size (int or str, optional): The maximum size of each shard, in bytes. Defaults to 5GB.
Returns
StateDictSplit: A StateDictSplit object containing the shards and the index to retrieve them.
Split a model state dictionary into shards so that each shard is smaller than a given size.
The shards are determined by iterating through the state_dict in the order of its keys. No optimization is made to bring each shard as close as possible to the maximum size passed. For example, if the limit is 10GB and we have tensors of sizes [6GB, 6GB, 2GB, 6GB, 2GB, 2GB], they will get sharded as [6GB], [6+2GB], [6+2+2GB] and not [6+2+2GB], [6+2GB], [6GB].
To save a model state dictionary to disk, see save_torch_state_dict(). That helper uses split_torch_state_dict_into_shards under the hood.
If one of the model’s tensors is bigger than max_shard_size, it will end up in its own shard, which will have a size greater than max_shard_size.
Example:
>>> import json
>>> import os
>>> from typing import Dict
>>> import torch
>>> from safetensors.torch import save_file as safe_save_file
>>> from huggingface_hub import split_torch_state_dict_into_shards
>>> def save_state_dict(state_dict: Dict[str, torch.Tensor], save_directory: str):
...     # Compute the shard layout; nothing is written to disk at this point
...     state_dict_split = split_torch_state_dict_into_shards(state_dict)
...     for filename, tensors in state_dict_split.filename_to_tensors.items():
...         shard = {tensor: state_dict[tensor] for tensor in tensors}
...         safe_save_file(
...             shard,
...             os.path.join(save_directory, filename),
...             metadata={"format": "pt"},
...         )
...     # Write an index file only when the state dict spans several shards
...     if state_dict_split.is_sharded:
...         index = {
...             "metadata": state_dict_split.metadata,
...             "weight_map": state_dict_split.tensor_to_filename,
...         }
...         with open(os.path.join(save_directory, "model.safetensors.index.json"), "w") as f:
...             f.write(json.dumps(index, indent=2))
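The index written in the example above maps each tensor name to the shard file that contains it. As a rough illustration of that structure, the sketch below builds a weight map from a filename-to-tensors mapping, writes it as JSON, and reads it back to locate a tensor. The tensor names, file names and the metadata value are made up for the example, and build_weight_map is a hypothetical helper.

```python
import json
import os
import tempfile
from typing import Dict, List


def build_weight_map(filename_to_tensors: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: invert {filename: [tensor names]} into {tensor name: filename}."""
    return {
        tensor: filename
        for filename, tensors in filename_to_tensors.items()
        for tensor in tensors
    }


# Made-up shard layout, for illustration only.
filename_to_tensors = {
    "model-00001-of-00002.safetensors": ["embed.weight", "layer.0.weight"],
    "model-00002-of-00002.safetensors": ["layer.1.weight"],
}
index = {
    "metadata": {"total_size": 1234},  # placeholder value
    "weight_map": build_weight_map(filename_to_tensors),
}

with tempfile.TemporaryDirectory() as save_directory:
    index_path = os.path.join(save_directory, "model.safetensors.index.json")
    with open(index_path, "w") as f:
        json.dump(index, f, indent=2)
    # A loader can consult the index to find which shard to open for a tensor.
    with open(index_path) as f:
        loaded = json.load(f)

print(loaded["weight_map"]["layer.1.weight"])
```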
split_state_dict_into_shards_factory
This is the underlying factory from which each framework-specific helper is derived. In practice, you are not expected to use this factory directly unless you need to adapt it to a framework that is not yet supported. If that is the case, please let us know by opening a new issue on the huggingface_hub repo.
huggingface_hub.split_state_dict_into_shards_factory
(state_dict: typing.Dict[str, ~TensorT], get_storage_size: typing.Callable[[~TensorT], int], filename_pattern: str, get_storage_id: typing.Callable[[~TensorT], typing.Optional[typing.Any]] = <lambda>, max_shard_size: typing.Union[int, str] = '5GB') → StateDictSplit
Parameters
- state_dict (Dict[str, Tensor]): The state dictionary to save.
- get_storage_size (Callable[[Tensor], int]): A function that returns the size in bytes of a tensor when saved on disk.
- get_storage_id (Callable[[Tensor], Optional[Any]], optional): A function that returns a unique identifier for a tensor storage. Multiple different tensors can share the same underlying storage. This identifier is guaranteed to be unique and constant for this tensor’s storage during its lifetime. Two tensor storages with non-overlapping lifetimes may have the same id.
- filename_pattern (str): The pattern used to generate the names of the files in which the model will be saved. The pattern must be a string that can be formatted with filename_pattern.format(suffix=...) and must contain the keyword suffix.
- max_shard_size (int or str, optional): The maximum size of each shard, in bytes. Defaults to 5GB.
Returns
StateDictSplit: A StateDictSplit object containing the shards and the index to retrieve them.
Split a model state dictionary into shards so that each shard is smaller than a given size.
The shards are determined by iterating through the state_dict in the order of its keys. No optimization is made to bring each shard as close as possible to the maximum size passed. For example, if the limit is 10GB and we have tensors of sizes [6GB, 6GB, 2GB, 6GB, 2GB, 2GB], they will get sharded as [6GB], [6+2GB], [6+2+2GB] and not [6+2+2GB], [6+2GB], [6GB].
If one of the model’s tensors is bigger than max_shard_size, it will end up in its own shard, which will have a size greater than max_shard_size.
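To give a feel for what the two callables are for, here is a toy, framework-free sketch in which memoryview objects over bytearray buffers stand in for tensors. It assumes that tensors sharing a storage should be kept in the same shard and counted once; that grouping rule, the function split_sketch and the stand-in "tensors" are assumptions of this illustration, not the factory's actual code.

```python
from typing import Any, Callable, Dict, List, Optional


def split_sketch(
    state_dict: Dict[str, Any],
    get_storage_size: Callable[[Any], int],
    get_storage_id: Callable[[Any], Optional[Any]] = lambda tensor: None,
    max_shard_size: int = 5 * 10**9,
) -> List[Dict[str, Any]]:
    """Toy splitter: greedy in key order, keeping shared storages together."""
    shards: List[Dict[str, Any]] = []
    current: Dict[str, Any] = {}
    current_size = 0
    storage_to_shard: Dict[Any, Dict[str, Any]] = {}
    for name, tensor in state_dict.items():
        storage_id = get_storage_id(tensor)
        if storage_id is not None and storage_id in storage_to_shard:
            # Same storage as an earlier tensor: reuse its shard, add no size.
            storage_to_shard[storage_id][name] = tensor
            continue
        size = get_storage_size(tensor)
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = tensor
        current_size += size
        if storage_id is not None:
            storage_to_shard[storage_id] = current
    if current:
        shards.append(current)
    return shards


# Stand-in "tensors": two views share buf_a, one uses buf_b.
buf_a, buf_b = bytearray(6), bytearray(6)
state_dict = {
    "a": memoryview(buf_a),
    "b": memoryview(buf_b),
    "a_alias": memoryview(buf_a),
}
shards = split_sketch(
    state_dict,
    get_storage_size=lambda t: t.nbytes,
    get_storage_id=lambda t: id(t.obj),  # identity of the underlying buffer
    max_shard_size=10,
)
print([list(shard) for shard in shards])  # [['a', 'a_alias'], ['b']]
```

With the default get_storage_id (always None), every entry is treated as independent, which is the behavior you would want for a framework without shared storages.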
Helpers
get_torch_storage_id
Return a unique identifier for a tensor storage.
Multiple different tensors can share the same underlying storage. This identifier is guaranteed to be unique and constant for this tensor’s storage during its lifetime. Two tensor storages with non-overlapping lifetimes may have the same id. In the case of meta tensors, we return None since we can’t tell if they share the same storage.