Models
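The base classes PreTrainedModel, TFPreTrainedModel, and FlaxPreTrainedModel implement the common methods for loading and saving a model, either from a local file or directory or from a pretrained model configuration provided by the library (downloaded from HuggingFace’s AWS S3 repository).
PreTrainedModel and TFPreTrainedModel also implement a few methods that are common to all models, to:
- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model.
The other methods that are common to each model are defined in ModuleUtilsMixin (for the PyTorch models) and TFModuleUtilsMixin (for the TensorFlow models) or, for text generation, in GenerationMixin (for the PyTorch models), TFGenerationMixin (for the TensorFlow models) and FlaxGenerationMixin (for the Flax/JAX models).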
PreTrainedModel
Base class for all models.
PreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models as well as a few methods common to all models to:
- resize the input embeddings,
- prune heads in the self-attention layers.
Class attributes (overridden by derived classes):
- config_class (PretrainedConfig) — A subclass of PretrainedConfig to use as configuration class for this model architecture.
- load_tf_weights (Callable) — A python method for loading a TensorFlow checkpoint in a PyTorch model, taking as arguments:
  - model (PreTrainedModel) — An instance of the model on which to load the TensorFlow checkpoint.
  - config (PreTrainedConfig) — An instance of the configuration associated to the model.
  - path (str) — A path to the TensorFlow checkpoint.
- base_model_prefix (str) — A string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
- is_parallelizable (bool) — A flag indicating whether this model supports model parallelization.
- main_input_name (str) — The name of the principal input to the model (often input_ids for NLP models, pixel_values for vision models and input_values for speech models).
push_to_hub
< source >( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )
Parameters
- repo_id (str) — The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.
- use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
- commit_message (str, optional) — Message to commit while pushing. Will default to "Upload model".
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). We default it to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
- safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights in safetensors format for safer serialization.
- revision (str, optional) — Branch to push the uploaded files to.
- commit_description (str, optional) — The description of the commit that will be created.
- tags (List[str], optional) — List of tags to push on the Hub.
Upload the model file to the 🤗 Model Hub.
Examples:
from transformers import AutoModel
model = AutoModel.from_pretrained("google-bert/bert-base-cased")
# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")
# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")
add_model_tags
< source >( tags: typing.Union[typing.List[str], str] )
Add custom tags into the model that gets pushed to the Hugging Face Hub. Will not overwrite existing tags in the model.
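The "will not overwrite existing tags" behaviour can be pictured with a small sketch. This is not the library's implementation: merge_tags is a hypothetical helper that only illustrates the documented semantics, assuming a single string or a list is accepted (as the signature suggests) and that tags already present are kept as they are.

```python
def merge_tags(existing, new_tags):
    """Return existing tags plus any new ones, preserving order and skipping duplicates."""
    if isinstance(new_tags, str):
        new_tags = [new_tags]  # a single tag may be given as a plain string
    merged = list(existing)  # copy so the caller's list is not modified
    for tag in new_tags:
        if tag not in merged:
            merged.append(tag)
    return merged


tags = merge_tags(["text-classification"], ["custom", "text-classification"])
print(tags)
```

The existing tag keeps its position and the duplicate is dropped, so calling the helper repeatedly with the same tags leaves the list unchanged.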
can_generate
< source >( ) → bool
Returns
bool
Whether this model can generate sequences with .generate().
Returns whether this model can generate sequences with .generate().
Potentially dequantize the model in case it has been quantized by a quantization method that supports dequantization.
Removes the _require_grads_hook.
Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping the model weights fixed.
from_pretrained
< source >( pretrained_model_name_or_path: typing.Union[str, os.PathLike, NoneType] *model_args config: typing.Union[transformers.configuration_utils.PretrainedConfig, str, os.PathLike, NoneType] = None cache_dir: typing.Union[str, os.PathLike, NoneType] = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' use_safetensors: typing.Optional[bool] = None weights_only: bool = True **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike, optional) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g, ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  - A path or url to a model folder containing a flax checkpoint file in .msgpack format (e.g, ./flax_model/ containing flax_model.msgpack). In this case, from_flax should be set to True.
  - None if you are both providing the configuration and state dictionary (resp. with keyword arguments config and state_dict).
- model_args (sequence of positional arguments, optional) — All remaining positional arguments will be passed to the underlying model’s __init__ method.
- config (Union[PretrainedConfig, str, os.PathLike], optional) — Can be either:
  - an instance of a class derived from PretrainedConfig,
  - a string or path valid as input to from_pretrained().
  Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (Union[str, os.PathLike], optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- from_flax (bool, optional, defaults to False) — Load the model weights from a Flax checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- ignore_mismatched_sizes (bool, optional, defaults to False) — Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git. To test a pull request you made on the Hub, you can pass revision="refs/pr/<pr_number>".
- mirror (str, optional) — Mirror source to accelerate downloads in China. If you are from China and have an accessibility problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. Please refer to the mirror site for more information.
- _fast_init (bool, optional, defaults to True) — Whether or not to disable fast initialization. One should only disable _fast_init to ensure backwards compatibility with transformers.__version__ < 4.6.0 for seeded model initialization. This argument will be removed at the next major version. See pull request 11471 for more information.
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Parameters for big model inference
- low_cpu_mem_usage (bool, optional) — Tries not to use more than 1x model size in CPU memory (including peak memory) while loading the model. Generally should be combined with a device_map (such as "auto") for best results. This is an experimental feature and subject to change at any moment. If the model weights are in the same precision as the model loaded in, low_cpu_mem_usage (without device_map) is redundant and will not provide any benefit in regards to CPU memory usage. However, this should still be enabled if you are passing in a device_map.
- torch_dtype (str or torch.dtype, optional) — Override the default torch.dtype and load the model under a specific dtype. The different options are:
  - torch.float16 or torch.bfloat16 or torch.float: load in a specified dtype, ignoring the model’s config.torch_dtype if one exists. If not specified, the model will get loaded in torch.float (fp32).
  - "auto": A torch_dtype entry in the config.json file of the model will be attempted to be used. If this entry isn’t found then next check the dtype of the first weight in the checkpoint that’s of a floating point type and use that as dtype. This will load the model using the dtype it was saved in at the end of the training. It can’t be used as an indicator of how the model was trained, since it could be trained in one of the half precision dtypes but saved in fp32.
  - A string that is a valid torch.dtype. E.g. “float32” loads the model in torch.float32, “float16” loads in torch.float16 etc.
  For some models the dtype they were trained in is unknown - you may try to check the model’s paper or reach out to the authors and ask them to add this information to the model’s card and to insert the torch_dtype entry in config.json on the hub.
- device_map (str or Dict[str, Union[int, str, torch.device]] or int or torch.device, optional) — A map that specifies where each submodule should go. It doesn’t need to be refined to each parameter/buffer name: once a given module name is inside, every submodule of it will be sent to the same device. If we only pass the device (e.g., "cpu", "cuda:1", "mps", or a GPU ordinal rank like 1) on which the model will be allocated, the device map will map the entire model to this device. Passing device_map = 0 means put the whole model on GPU 0. To have Accelerate compute the most optimized device_map automatically, set device_map="auto". For more information about each option see designing a device map.
- max_memory (Dict, optional) — A dictionary device identifier to maximum memory. Will default to the maximum memory available for each GPU and the available CPU RAM if unset.
- offload_folder (str or os.PathLike, optional) — If the device_map contains any value "disk", the folder where we will offload weights.
- offload_state_dict (bool, optional) — If True, will temporarily offload the CPU state dict to the hard drive to avoid getting out of CPU RAM if the weight of the CPU state dict + the biggest shard of the checkpoint does not fit. Defaults to True when there is some disk offload.
- offload_buffers (bool, optional) — Whether or not to offload the buffers with the model parameters.
- quantization_config (Union[QuantizationConfigMixin, Dict], optional) — A dictionary of configuration parameters or a QuantizationConfigMixin object for quantization (e.g. bitsandbytes, gptq). There may be other quantization-related kwargs, including load_in_4bit and load_in_8bit, which are parsed by QuantizationConfigParser. Supported only for bitsandbytes quantizations and not preferred. Consider inserting all such arguments into quantization_config instead.
- subfolder (str, optional, defaults to "") — In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.
- variant (str, optional) — If specified load weights from variant filename, e.g. pytorch_model.<variant>.bin. variant is ignored when using from_tf or from_flax.
- use_safetensors (bool, optional, defaults to None) — Whether or not to use safetensors checkpoints. Defaults to None. If not specified and safetensors is not installed, it will be set to False.
- weights_only (bool, optional, defaults to True) — Indicates whether unpickler should be restricted to loading only tensors, primitive types, dictionaries and any types added via torch.serialization.add_safe_globals(). When set to False, we can load wrapper tensor subclass weights.
- kwargs (remaining dictionary of keyword arguments, optional) — Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate a pretrained PyTorch model from a pre-trained model configuration.
The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.
The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.
If model weights are the same precision as the base model (and is a supported model), weights will be lazily loaded in using the meta device and brought into memory once an input is passed through that layer regardless of low_cpu_mem_usage.
Activate the special “offline-mode” to use this method in a firewalled environment.
Examples:
>>> from transformers import BertConfig, BertModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = BertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./tf_model/my_tf_model_config.json")
>>> model = BertModel.from_pretrained("./tf_model/my_tf_checkpoint.ckpt.index", from_tf=True, config=config)
>>> # Loading from a Flax checkpoint file instead of a PyTorch model (slower)
>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", from_flax=True)
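The kwargs handling described in the parameter list (for the case where no config is passed) can be sketched in plain Python. This is a simplified illustration, not the library's code: ToyConfig and split_kwargs are hypothetical names, and the rule "a key that matches a configuration attribute overrides it, everything else goes to the model's __init__" is taken from the kwargs description above.

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration class (not a transformers class)."""

    def __init__(self, hidden_size=768, output_attentions=False):
        self.hidden_size = hidden_size
        self.output_attentions = output_attentions


def split_kwargs(config, kwargs):
    """Apply kwargs that name config attributes; return the remaining ones."""
    model_kwargs = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # override the loaded configuration value
        else:
            model_kwargs[key] = value  # left over for the model's __init__
    return model_kwargs


config = ToyConfig()
remaining = split_kwargs(config, {"output_attentions": True, "my_flag": 1})
print(config.output_attentions, remaining)
```

This mirrors the output_attentions=True example above: the key matches a configuration attribute, so it ends up on the configuration rather than being passed to the model constructor.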
low_cpu_mem_usage algorithm:
This is an experimental function that loads the model using ~1x model size CPU memory.
Here is how it works:
- save which state_dict keys we have
- drop state_dict before the model is created, since the latter takes 1x model size CPU memory
- after the model has been instantiated switch to the meta device all params/buffers that are going to be replaced from the loaded state_dict
- load state_dict 2nd time
- replace the params/buffers from the state_dict
Currently, it can’t handle DeepSpeed ZeRO stage 3 and ignores loading errors.
Return a torch.compile’d version of self.__call__. This is useful to dynamically choose between non-compiled/compiled forward during inference, especially to switch between prefill (where we don’t want to use compiled version to avoid recomputing the graph with new shapes) and iterative decoding (where we want the speed-ups of compiled version with static shapes).
get_input_embeddings
< source >( ) → nn.Module
Returns
nn.Module
A torch module mapping vocabulary to hidden states.
Returns the model’s input embeddings.
get_memory_footprint
< source >( return_buffers = True )
Parameters
- return_buffers (bool, optional, defaults to True) — Whether to return the size of the buffer tensors in the computation of the memory footprint. Buffers are tensors that do not require gradients and are not registered as parameters, e.g. mean and std in batch norm layers. Please see: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2
Get the memory footprint of a model. This will return the memory footprint of the current model in bytes. Useful to benchmark the memory footprint of the current model and design some tests. Solution inspired from the PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2
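The computation behind such a footprint can be sketched without PyTorch: sum "number of elements × bytes per element" over the parameters, and add the buffers when return_buffers is True. The helper below is a hypothetical, dependency-free illustration of that arithmetic; the tensor names, shapes, and the BYTES_PER_ELEMENT table are assumptions chosen for the example, not values read from a real model.

```python
from math import prod

# Bytes used by one element of each dtype (illustrative subset).
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def memory_footprint(parameters, buffers, return_buffers=True):
    """Sum element_count * element_size over parameters and, optionally, buffers.

    Each mapping goes from a tensor name to a (shape, dtype_name) pair.
    """

    def nbytes(tensors):
        return sum(prod(shape) * BYTES_PER_ELEMENT[dtype] for shape, dtype in tensors.values())

    total = nbytes(parameters)
    if return_buffers:
        total += nbytes(buffers)
    return total


params = {"linear.weight": ((1024, 1024), "float16"), "linear.bias": ((1024,), "float16")}
bufs = {"norm.running_mean": ((1024,), "float32")}
print(memory_footprint(params, bufs))                        # parameters + buffers, in bytes
print(memory_footprint(params, bufs, return_buffers=False))  # parameters only, in bytes
```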
get_output_embeddings
< source >( ) → nn.Module
Returns
nn.Module
A torch module mapping hidden states to vocabulary.
Returns the model’s output embeddings.
Deactivates gradient checkpointing for the current model.
Note that in other frameworks this feature can be referred to as “activation checkpointing” or “checkpoint activations”.
gradient_checkpointing_enable
< source >( gradient_checkpointing_kwargs = None )
Activates gradient checkpointing for the current model.
Note that in other frameworks this feature can be referred to as “activation checkpointing” or “checkpoint activations”.
We pass the __call__ method of the modules instead of forward because __call__ attaches all the hooks of the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2
If needed prunes and maybe initializes weights. If using a custom PreTrainedModel, you need to implement any initialization logic in _init_weights.
A method executed at the end of each Transformer model initialization, to execute code that needs the model’s modules properly initialized (such as weight initialization).
prune_heads
< source >( heads_to_prune: typing.Dict[int, typing.List[int]] )
Prunes heads of the base model.
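The signature types heads_to_prune as Dict[int, List[int]]. A reasonable reading, assumed here rather than stated in this section, is that each key is a layer index and each value lists the head indices to remove in that layer. The hypothetical helper below only shows which heads would remain under that reading; it does not touch any weights.

```python
def remaining_heads(num_layers, num_heads, heads_to_prune):
    """Return, for each layer, the head indices that survive pruning."""
    kept = {}
    for layer in range(num_layers):
        pruned = set(heads_to_prune.get(layer, []))  # layers not listed keep every head
        kept[layer] = [head for head in range(num_heads) if head not in pruned]
    return kept


# Prune heads 0 and 2 in layer 1, and heads 2 and 3 in layer 2.
print(remaining_heads(num_layers=3, num_heads=4, heads_to_prune={1: [0, 2], 2: [2, 3]}))
```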
register_for_auto_class
< source >( auto_class = 'AutoModel' )
Register this class with a given auto class. This should only be used for custom models as the ones in the library are already mapped with an auto class.
This API is experimental and may have some slight breaking changes in the next releases.
resize_token_embeddings
< source >( new_num_tokens: typing.Optional[int] = None pad_to_multiple_of: typing.Optional[int] = None mean_resizing: bool = True ) → torch.nn.Embedding
Parameters
- new_num_tokens (int, optional) — The new number of tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or None, just returns a pointer to the input tokens torch.nn.Embedding module of the model without doing anything.
- pad_to_multiple_of (int, optional) — If set will pad the embedding matrix to a multiple of the provided value. If new_num_tokens is set to None will just pad the embedding to a multiple of pad_to_multiple_of. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
- mean_resizing (bool) — Whether to initialize the added embeddings from a multivariate normal distribution that has old embeddings’ mean and covariance or to initialize them with a normal distribution that has a mean of zero and std equal to config.initializer_range. Setting mean_resizing to True is useful when increasing the size of the embeddings of causal language models, where the generated tokens’ probabilities won’t be affected by the added embeddings because initializing the new embeddings with the old embeddings’ mean will reduce the kl-divergence between the next token probability before and after adding the new embeddings. Refer to this article for more information: https://nlp.stanford.edu/~johnhew/vocab-expansion.html
Returns
torch.nn.Embedding
Pointer to the input tokens Embeddings Module of the model.
Resizes input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
Takes care of tying weights embeddings afterwards if the model class has a tie_weights() method.
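Two pieces of arithmetic from the parameters above can be sketched in plain Python. padded_vocab_size rounds a vocabulary size up to the next multiple of pad_to_multiple_of, and mean_embedding computes the column-wise mean of the existing rows, which is the centre of the distribution mean_resizing draws new rows from. Both helpers are hypothetical illustrations; the actual method also uses the covariance of the old embeddings, which this sketch leaves out.

```python
def padded_vocab_size(new_num_tokens, pad_to_multiple_of=None):
    """Round new_num_tokens up to the next multiple of pad_to_multiple_of."""
    if pad_to_multiple_of is None:
        return new_num_tokens
    # Ceiling division written with integer arithmetic only.
    return -(-new_num_tokens // pad_to_multiple_of) * pad_to_multiple_of


def mean_embedding(rows):
    """Column-wise mean of the existing embedding rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


print(padded_vocab_size(50257, 64))
old_rows = [[1.0, 2.0], [3.0, 6.0]]
print(mean_embedding(old_rows))
```

Padding 50257 rows to a multiple of 64 gives 50304, so 47 extra rows are added that no token id refers to.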
reverse_bettertransformer
< source >( ) → PreTrainedModel
Reverts the transformation from to_bettertransformer() so that the original modeling is used, for example in order to save the model.
save_pretrained
< source >( save_directory: typing.Union[str, os.PathLike] is_main_process: bool = True state_dict: typing.Optional[dict] = None save_function: typing.Callable = <function save at 0x7f4ab874af80> push_to_hub: bool = False max_shard_size: typing.Union[int, str] = '5GB' safe_serialization: bool = True variant: typing.Optional[str] = None token: typing.Union[str, bool, NoneType] = None save_peft_format: bool = True **kwargs )
Parameters
- save_directory (str or os.PathLike) — Directory to which to save. Will be created if it doesn’t exist.
- is_main_process (bool, optional, defaults to True) — Whether the process calling this is the main process or not. Useful in distributed training (for example on TPUs) when this function needs to be called on all processes. In this case, set is_main_process=True only on the main process to avoid race conditions.
- state_dict (nested dictionary of torch.Tensor) — The state dictionary of the model to save. Will default to self.state_dict(), but can be used to only save parts of the model or if special precautions need to be taken when recovering the state dictionary of a model (like when using model parallelism).
- save_function (Callable) — The function to use to save the state dictionary. Useful in distributed training (for example on TPUs) when one needs to replace torch.save by another method.
- push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
- max_shard_size (int or str, optional, defaults to "5GB") — The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). We default it to 5GB in order for models to be able to run easily on free-tier Google Colab instances without CPU OOM issues. If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard which will be bigger than max_shard_size.
- safe_serialization (bool, optional, defaults to True) — Whether to save the model using safetensors or the traditional PyTorch way (that uses pickle).
- variant (str, optional) — If specified, weights are saved in the format pytorch_model.<variant>.bin.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- save_peft_format (bool, optional, defaults to True) — For backward compatibility with the PEFT library, in case adapter weights are attached to the model, all keys of the state dict of adapters need to be prepended with base_model.model. Advanced users can disable this behaviour by setting save_peft_format to False.
- kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
Save a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.
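The max_shard_size rule can be illustrated with a small planner. This is a sketch under stated assumptions, not the library's sharding code: parse_size and plan_shards are hypothetical helpers, decimal units (1 MB = 10**6 bytes) are assumed, and weights are grouped greedily in their original order. It reproduces the documented edge case in which a single weight larger than the limit ends up alone in an oversized shard.

```python
import re

UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # decimal units, assumed for this sketch


def parse_size(size):
    """Convert an int (bytes) or a string such as "5GB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)([KMG]B)", size.upper())
    if match is None:
        raise ValueError(f"expected digits followed by a unit, got {size!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def plan_shards(weight_sizes, max_shard_size="5GB"):
    """Group weight names into shards whose total size stays under the limit."""
    limit = parse_size(max_shard_size)
    shards, current, current_size = [], [], 0
    for name, nbytes in weight_sizes.items():
        # Start a new shard when the next weight would push this one over the limit.
        if current and current_size + nbytes > limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += nbytes
    if current:
        shards.append(current)
    return shards


sizes = {"a": 2_000_000, "b": 2_000_000, "c": 12_000_000, "d": 1_000_000, "e": 1_000_000}
print(plan_shards(sizes, "5MB"))
```

Here "c" is larger than the 5MB limit, so it occupies a shard of its own while the smaller weights are packed together around it.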
set_input_embeddings
< source >( value: Module )
Set model’s input embeddings.
tensor_parallel
< source >( device_mesh )
Tensor parallelize the model across the given device mesh.
Tie the weights between the input embeddings and the output embeddings.
If the torchscript flag is set in the configuration, TorchScript can’t handle parameter sharing, so the weights are cloned instead.
to_bettertransformer
< source >( ) → PreTrainedModel
Converts the model to use PyTorch’s native attention implementation, integrated to Transformers through Optimum library. Only a subset of all Transformers models are supported.
PyTorch’s attention fastpath allows to speed up inference through kernel fusions and the use of nested tensors. Detailed benchmarks can be found in this blog post.
Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
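Large model loading
In Transformers 4.20.0, the from_pretrained() method was reworked to accommodate large models using Accelerate. This requires Accelerate >= 0.9.0 and PyTorch >= 1.9.0. Instead of creating the full model and then loading the pretrained weights inside it (which takes twice the size of the model in RAM, one copy for the randomly initialized model and one for the weights), there is an option to create the model as an empty shell and only materialize its parameters when the pretrained weights are loaded.
This option can be activated with low_cpu_mem_usage=True. The model is first created on the meta device (with empty weights) and the state dict is then loaded inside it (shard by shard in the case of a sharded checkpoint). This way the maximum RAM used is only the full size of the model.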
from transformers import AutoModelForSeq2SeqLM
t0pp = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", low_cpu_mem_usage=True)
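Moreover, you can directly place the model on different devices if it doesn’t fully fit in RAM (only works for inference for now). With device_map="auto", Accelerate will determine where to put each layer to maximize the use of your fastest devices (GPUs) and offload the rest to the CPU, or even the hard drive if you don’t have enough GPU RAM. Even if the model is split across several devices, it will run as you would normally expect.
When passing a device_map, low_cpu_mem_usage is automatically set to True, so you don’t need to specify it: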
from transformers import AutoModelForSeq2SeqLM
t0pp = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", device_map="auto")
You can inspect how the model was split across devices by looking at its hf_device_map attribute:
t0pp.hf_device_map
{'shared': 0,
'decoder.embed_tokens': 0,
'encoder': 0,
'decoder.block.0': 0,
'decoder.block.1': 1,
'decoder.block.2': 1,
'decoder.block.3': 1,
'decoder.block.4': 1,
'decoder.block.5': 1,
'decoder.block.6': 1,
'decoder.block.7': 1,
'decoder.block.8': 1,
'decoder.block.9': 1,
'decoder.block.10': 1,
'decoder.block.11': 1,
'decoder.block.12': 1,
'decoder.block.13': 1,
'decoder.block.14': 1,
'decoder.block.15': 1,
'decoder.block.16': 1,
'decoder.block.17': 1,
'decoder.block.18': 1,
'decoder.block.19': 1,
'decoder.block.20': 1,
'decoder.block.21': 1,
'decoder.block.22': 'cpu',
'decoder.block.23': 'cpu',
'decoder.final_layer_norm': 'cpu',
'decoder.dropout': 'cpu',
'lm_head': 'cpu'}
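You can also write your own device map following the same format (a dictionary from layer name to device). It should map all parameters of the model to a given device, but you don’t have to detail where all the submodules of one layer go if that layer is entirely on the same device. For instance, the following device map would work properly for T0pp (as long as you have the GPU memory):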
device_map = {"shared": 0, "encoder": 0, "decoder": 1, "lm_head": 1}
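How a coarse device map covers every parameter can be sketched as a prefix lookup: a parameter goes to the device of the most specific module name that contains it. resolve_device below is a hypothetical helper written for illustration, not a function from Transformers or Accelerate, and the parameter names are made-up examples in the T5 naming style.

```python
def resolve_device(parameter_name, device_map):
    """Return the device of the longest module prefix that covers parameter_name."""
    best_key = None
    for key in device_map:
        # A key covers the parameter if it is the name itself or a parent module.
        covers = parameter_name == key or parameter_name.startswith(key + ".")
        if covers and (best_key is None or len(key) > len(best_key)):
            best_key = key
    if best_key is None:
        raise KeyError(f"{parameter_name} is not covered by the device map")
    return device_map[best_key]


device_map = {"shared": 0, "encoder": 0, "decoder": 1, "lm_head": 1}
print(resolve_device("decoder.block.3.layer.0.SelfAttention.q.weight", device_map))
print(resolve_device("shared.weight", device_map))
```

Matching on whole module names (the key followed by a dot) keeps an entry such as decoder.block.2 from accidentally capturing decoder.block.23.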
Another way to minimize the memory impact of your model is to instantiate it at a lower precision dtype (like torch.float16) or use direct quantization techniques as described below.
Model Instantiation dtype
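Under PyTorch a model normally gets instantiated in torch.float32 format. This can be an issue if one tries to load a model whose weights are in fp16, since it would require twice as much memory. To overcome this limitation, you can either explicitly pass the desired dtype using the torch_dtype argument: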
model = T5ForConditionalGeneration.from_pretrained("t5", torch_dtype=torch.float16)
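or, if you want the model to always load in the most optimal memory pattern, you can use the special value "auto", and then dtype will be automatically derived from the model’s weights: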
model = T5ForConditionalGeneration.from_pretrained("t5", torch_dtype="auto")
Models instantiated from scratch can also be told which dtype to use with:
config = T5Config.from_pretrained("t5")
model = AutoModel.from_config(config, torch_dtype=torch.float16)
Due to PyTorch’s design, this functionality is only available for floating-point dtypes.
ModuleUtilsMixin
A few utilities for torch.nn.Modules, to be used as a mixin.
Add a memory hook before and after each sub-module forward pass to record increase in memory consumption.
Increase in memory consumption is stored in a mem_rss_diff attribute for each module and can be reset to zero with model.reset_memory_hooks_state().
estimate_tokens
< source >( input_dict: typing.Dict[str, typing.Union[torch.Tensor, typing.Any]] ) → int
Helper function to estimate the total number of tokens from the model inputs.
floating_point_ops
< source >( input_dict: typing.Dict[str, typing.Union[torch.Tensor, typing.Any]] exclude_embeddings: bool = True ) → int
Parameters
- batch_size (int) — The batch size for the forward pass.
- sequence_length (int) — The number of tokens in each line of the batch.
- exclude_embeddings (bool, optional, defaults to True) — Whether or not to count embedding and softmax operations.
Returns
int
The number of floating-point operations.
Get number of (optionally, non-embeddings) floating-point operations for the forward and backward passes of a batch with this transformer model. Default approximation neglects the quadratic dependency on the number of tokens (valid if 12 * d_model >> sequence_length) as laid out in this paper section 2.1. Should be overridden for transformers with parameter re-use e.g. Albert or Universal Transformers, or if doing long-range modeling with very high sequence lengths.
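As a rough illustration of an estimate that ignores the quadratic attention term, the sketch below uses the common rule of thumb of 6 floating-point operations per parameter per token (about 2 for the forward pass and 4 for the backward pass). That factor is an assumption taken from the scaling-laws literature rather than something stated on this page, and approximate_flops is a hypothetical helper, not the library method.

```python
def approximate_flops(num_tokens, num_parameters):
    """Approximate forward + backward FLOPs as 6 * tokens * parameters.

    The term that grows quadratically with sequence length is ignored,
    so this is only reasonable for moderately short sequences.
    """
    return 6 * num_tokens * num_parameters


# Hypothetical batch: 8 sequences of 512 tokens, 110M-parameter model.
batch_size, sequence_length = 8, 512
num_tokens = batch_size * sequence_length
flops = approximate_flops(num_tokens, 110_000_000)
print(f"{flops:.3e} FLOPs per training step")
```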
get_extended_attention_mask
< source >( attention_mask: Tensor input_shape: typing.Tuple[int] device: device = None dtype: torch.float32 = None )
Makes broadcastable attention and causal masks so that future and masked tokens are ignored.
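To make the idea of a broadcastable mask concrete, here is a minimal pure-Python sketch for the padding case only (no causal mask). It is an illustration, not the library's implementation: extend_padding_mask is a hypothetical helper, and the fill value -1e9 is an assumed stand-in for whatever large negative number the model actually uses.

```python
def extend_padding_mask(attention_mask, masked_value=-1e9):
    """Turn a [batch, seq_len] 0/1 mask into an additive [batch, 1, 1, seq_len] mask.

    Positions to attend to (1) become 0.0 and padding positions (0) become
    a large negative number, so adding the result to raw attention scores
    pushes the softmax weight of padded tokens towards zero.
    """
    return [
        [[[0.0 if keep else masked_value for keep in row]]]
        for row in attention_mask
    ]


# One sequence of three tokens whose last position is padding.
mask = [[1, 1, 0]]
extended = extend_padding_mask(mask)
print(extended)
```

The two singleton dimensions let the same mask broadcast over every attention head and every query position.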
get_head_mask
< source >( head_mask: typing.Optional[torch.Tensor] num_hidden_layers: int is_attention_chunked: bool = False )
Parameters
- head_mask (torch.Tensor with shape [num_heads] or [num_hidden_layers x num_heads], optional) — The mask indicating if we should keep the heads or not (1.0 for keep, 0.0 for discard).
- num_hidden_layers (int) — The number of hidden layers in the model.
- is_attention_chunked (bool, optional, defaults to False) — Whether or not the attention scores are computed by chunks.
Prepare the head mask if needed.
invert_attention_mask
< source >( encoder_attention_mask: Tensor ) → torch.Tensor
Invert an attention mask (e.g., switches 0. and 1.).
num_parameters
< source >( only_trainable: bool = False exclude_embeddings: bool = False ) → int
Get number of (optionally, trainable or non-embeddings) parameters in the module.
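The two flags can be pictured as filters applied before summing element counts. The sketch below is a hypothetical stand-in that works on plain tuples rather than real modules; the (name, shape, trainable) layout and the name-based embedding check are assumptions made for illustration.

```python
from math import prod


def count_parameters(parameters, only_trainable=False, exclude_embeddings=False):
    """Sum the element counts of the selected parameters.

    `parameters` is a list of (name, shape, trainable) tuples. Embedding
    parameters are detected here with a naive substring check on the name.
    """
    total = 0
    for name, shape, trainable in parameters:
        if only_trainable and not trainable:
            continue
        if exclude_embeddings and "embeddings" in name:
            continue
        total += prod(shape)
    return total


# Hypothetical parameters: a frozen embedding table and one trainable layer.
parameters = [
    ("embeddings.word_embeddings.weight", (30522, 768), False),
    ("encoder.layer.0.output.dense.weight", (768, 3072), True),
    ("encoder.layer.0.output.dense.bias", (768,), True),
]

print(count_parameters(parameters))
print(count_parameters(parameters, only_trainable=True))
print(count_parameters(parameters, exclude_embeddings=True))
```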
reset_memory_hooks_state
Reset the mem_rss_diff attribute of each module (see add_memory_hooks()).
TFPreTrainedModel
Base class for all TF models.
TFPreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models as well as a few methods common to all models to:
- resize the input embeddings,
- prune heads in the self-attention heads.
Class attributes (overridden by derived classes):
- config_class (PretrainedConfig) — A subclass of PretrainedConfig to use as configuration class for this model architecture.
- base_model_prefix (str) — A string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
- main_input_name (str) — The name of the principal input to the model (often input_ids for NLP models, pixel_values for vision models and input_values for speech models).
push_to_hub
< source >( repo_id: str use_temp_dir: Optional[bool] = None commit_message: Optional[str] = None private: Optional[bool] = None max_shard_size: Optional[Union[int, str]] = '10GB' token: Optional[Union[bool, str]] = None use_auth_token: Optional[Union[bool, str]] = None create_pr: bool = False **base_model_card_args )
Parameters
- repo_id (str) — The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.
- use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
- commit_message (str, optional) — Message to commit while pushing. Will default to "Upload model".
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- max_shard_size (int or str, optional, defaults to "10GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB").
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
Upload the model files to the 🤗 Model Hub while synchronizing a local clone of the repo in repo_path_or_name.
Examples:
from transformers import TFAutoModel
model = TFAutoModel.from_pretrained("google-bert/bert-base-cased")
# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")
# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")
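The max_shard_size argument accepts either an integer or a string made of digits followed by a unit. A minimal sketch of how such a value could be turned into a byte count is shown below; parse_size is a hypothetical helper, and the decimal interpretation of the units (1 MB = 10**6 bytes) is an assumption for illustration, not a statement about the library's parser.

```python
import re

# Assumed decimal units; the real library may interpret units differently.
_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(size):
    """Convert an int or a string such as "5MB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", size.strip())
    if match is None or match.group(2).upper() not in _UNITS:
        raise ValueError(f"Cannot parse size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


print(parse_size("5MB"))
print(parse_size("10GB"))
print(parse_size(1024))
```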
can_generate
< source >( ) → bool
Returns
bool
Whether this model can generate sequences with .generate().
Returns whether this model can generate sequences with .generate().
compile
< source >( optimizer = 'rmsprop' loss = 'auto_with_warning' metrics = None loss_weights = None weighted_metrics = None run_eagerly = None steps_per_execution = None **kwargs )
This is a thin wrapper that sets the model’s loss output head as the loss if the user does not specify a loss function themselves.
create_model_card
< source >( output_dir model_name: str language: Optional[str] = None license: Optional[str] = None tags: Optional[str] = None finetuned_from: Optional[str] = None tasks: Optional[str] = None dataset_tags: Optional[Union[str, List[str]]] = None dataset: Optional[Union[str, List[str]]] = None dataset_args: Optional[Union[str, List[str]]] = None )
Parameters
- output_dir (str or os.PathLike) — The folder in which to create the model card.
- model_name (str, optional) — The name of the model.
- language (str, optional) — The language of the model (if applicable).
- license (str, optional) — The license of the model. Will default to the license of the pretrained model used, if the original model given to the Trainer comes from a repo on the Hub.
- tags (str or List[str], optional) — Some tags to be included in the metadata of the model card.
- finetuned_from (str, optional) — The name of the model used to fine-tune this one (if applicable). Will default to the name of the repo of the original model given to the Trainer (if it comes from the Hub).
- tasks (str or List[str], optional) — One or several task identifiers, to be included in the metadata of the model card.
- dataset_tags (str or List[str], optional) — One or several dataset tags, to be included in the metadata of the model card.
- dataset (str or List[str], optional) — One or several dataset identifiers, to be included in the metadata of the model card.
- dataset_args (str or List[str], optional) — One or several dataset arguments, to be included in the metadata of the model card.
Creates a draft of a model card using the information available to the Trainer.
from_pretrained
< source >( pretrained_model_name_or_path: Optional[Union[str, os.PathLike]] *model_args config: Optional[Union[PretrainedConfig, str, os.PathLike]] = None cache_dir: Optional[Union[str, os.PathLike]] = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: Optional[Union[str, bool]] = None revision: str = 'main' use_safetensors: bool = None **kwargs )
Parameters
- pretrained_model_name_or_path (str, optional) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
  - None if you are both providing the configuration and state dictionary (resp. with keyword arguments config and state_dict).
- model_args (sequence of positional arguments, optional) — All remaining positional arguments will be passed to the underlying model’s __init__ method.
- config (Union[PretrainedConfig, str], optional) — Can be either:
  - an instance of a class derived from PretrainedConfig,
  - a string valid as input to from_pretrained().
  Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch state_dict save file (see docstring of pretrained_model_name_or_path argument).
- ignore_mismatched_sizes (bool, optional, defaults to False) — Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels).
- cache_dir (str, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.
Instantiate a pretrained TF 2.0 model from a pre-trained model configuration.
The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.
The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.
Examples:
>>> from transformers import BertConfig, TFBertModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFBertModel.from_pretrained("google-bert/bert-base-uncased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = TFBertModel.from_pretrained("./test/saved_model/")
>>> # Update configuration during loading.
>>> model = TFBertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
>>> assert model.config.output_attentions == True
>>> # Loading from a Pytorch model file instead of a TensorFlow checkpoint (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./pt_model/my_pt_model_config.json")
>>> model = TFBertModel.from_pretrained("./pt_model/my_pytorch_model.bin", from_pt=True, config=config)
get_bias
< source >( ) → tf.Variable
Returns
tf.Variable
The weights representing the bias, None if not an LM model.
Dict of bias attached to an LM head. The key represents the name of the bias attribute.
get_head_mask
< source >( head_mask: tf.Tensor | None num_hidden_layers: int )
Prepare the head mask if needed.
get_input_embeddings
< source >( ) → tf.Variable
Returns
tf.Variable
The embeddings layer mapping vocabulary to hidden states.
Returns the model’s input embeddings layer.
get_lm_head
< source >( ) → keras.layers.Layer
Returns
keras.layers.Layer
The LM head layer if the model has one, None if not.
The LM head layer. This method must be overridden by all models that have an LM head.
get_output_embeddings
< source >( ) → tf.Variable
Returns
tf.Variable
The new weights mapping vocabulary to hidden states.
Returns the model’s output embeddings
get_output_layer_with_bias
< source >( ) → keras.layers.Layer
Returns
keras.layers.Layer
The layer that handles the bias, None if not an LM model.
Get the layer that handles a bias attribute in case the model has an LM head with weights tied to the embeddings
Get the concatenated _prefix name of the bias from the model name to the parent layer
prepare_tf_dataset
< source >( dataset: 'datasets.Dataset' batch_size: int = 8 shuffle: bool = True tokenizer: Optional['PreTrainedTokenizerBase'] = None collate_fn: Optional[Callable] = None collate_fn_args: Optional[Dict[str, Any]] = None drop_remainder: Optional[bool] = None prefetch: bool = True ) → Dataset
Parameters
- dataset (Any) — A datasets.Dataset to be wrapped as a tf.data.Dataset.
- batch_size (int, optional, defaults to 8) — The size of batches to return.
- shuffle (bool, defaults to True) — Whether to return samples from the dataset in random order. Usually True for training datasets and False for validation/test datasets.
- tokenizer (PreTrainedTokenizerBase, optional) — A PreTrainedTokenizer that will be used to pad samples to create batches. Has no effect if a specific collate_fn is passed instead.
- collate_fn (Callable, optional) — A function that collates samples from the dataset into a single batch. Defaults to DefaultDataCollator if no tokenizer is supplied or DataCollatorWithPadding if a tokenizer is passed.
- collate_fn_args (Dict[str, Any], optional) — A dict of arguments to pass to the collate_fn alongside the list of samples.
- drop_remainder (bool, optional) — Whether to drop the final batch, if the batch_size does not evenly divide the dataset length. Defaults to the same setting as shuffle.
- prefetch (bool, defaults to True) — Whether to add prefetching to the end of the tf.data pipeline. This is almost always beneficial for performance, but can be disabled in edge cases.
Returns
Dataset
A tf.data.Dataset which is ready to pass to the Keras API.
Wraps a HuggingFace Dataset as a tf.data.Dataset with collation and batching. This method is designed to create a “ready-to-use” dataset that can be passed directly to Keras methods like fit() without further modification. The method will drop columns from the dataset if they don’t match input names for the model. If you want to specify the column names to return rather than using the names that match this model, we recommend using Dataset.to_tf_dataset() instead.
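The effect of drop_remainder is easiest to see on a toy dataset. The sketch below batches a plain Python list and is only an illustration of the batching rule; batch_samples is a hypothetical helper and does not involve tf.data or the collate function.

```python
def batch_samples(samples, batch_size, drop_remainder):
    """Split samples into consecutive batches of batch_size items.

    When drop_remainder is True, a final batch that is smaller than
    batch_size is discarded so that every batch has the same size.
    """
    batches = [
        samples[i:i + batch_size]
        for i in range(0, len(samples), batch_size)
    ]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


samples = list(range(10))
print(batch_samples(samples, batch_size=4, drop_remainder=False))
print(batch_samples(samples, batch_size=4, drop_remainder=True))
```

Dropping the short final batch keeps batch shapes constant, which is usually what matters during training, while evaluation typically keeps every sample.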
prune_heads
< source >( heads_to_prune )
Prunes heads of the base model.
register_for_auto_class
< source >( auto_class = 'TFAutoModel' )
Register this class with a given auto class. This should only be used for custom models as the ones in the library are already mapped with an auto class.
This API is experimental and may have some slight breaking changes in the next releases.
resize_token_embeddings
< source >( new_num_tokens: Optional[int] = None ) → tf.Variable or keras.layers.Embedding
Parameters
- new_num_tokens (int, optional) — The number of new tokens in the embedding matrix. Increasing the size will add newly initialized vectors at the end. Reducing the size will remove vectors from the end. If not provided or None, just returns a pointer to the input tokens without doing anything.
Returns
tf.Variable or keras.layers.Embedding
Pointer to the input tokens of the model.
Resizes input token embeddings matrix of the model if new_num_tokens != config.vocab_size.
Takes care of tying weights embeddings afterwards if the model class has a tie_weights() method.
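The grow-or-shrink behaviour described for new_num_tokens can be sketched on a list-of-lists matrix. This is an illustration under stated assumptions, not the TensorFlow implementation: resize_embedding_matrix is a hypothetical helper, and the Gaussian initialization with standard deviation 0.02 is an assumed placeholder for the model's real initializer.

```python
import random


def resize_embedding_matrix(matrix, new_num_tokens, seed=0):
    """Return an embedding matrix with exactly new_num_tokens rows.

    Existing rows are kept in order; extra rows are appended with fresh
    random values when growing, and trailing rows are dropped when
    shrinking. With new_num_tokens=None the input is returned unchanged.
    """
    if new_num_tokens is None:
        return matrix
    rng = random.Random(seed)
    hidden_size = len(matrix[0])
    resized = [row[:] for row in matrix[:new_num_tokens]]
    while len(resized) < new_num_tokens:
        # Assumed initializer, for illustration only.
        resized.append([rng.gauss(0.0, 0.02) for _ in range(hidden_size)])
    return resized


old = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grown = resize_embedding_matrix(old, 5)
shrunk = resize_embedding_matrix(old, 2)
print(len(grown), len(shrunk))
```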
save_pretrained
< source >( save_directory saved_model = False version = 1 push_to_hub = False signatures = None max_shard_size: Union[int, str] = '5GB' create_pr: bool = False safe_serialization: bool = False token: Optional[Union[str, bool]] = None **kwargs )
Parameters
- save_directory (str) — Directory to which to save. Will be created if it doesn’t exist.
- saved_model (bool, optional, defaults to False) — Whether the model also has to be saved in saved model format.
- version (int, optional, defaults to 1) — The version of the saved model. A saved model needs to be versioned in order to be properly loaded by TensorFlow Serving as detailed in the official documentation https://www.tensorflow.org/tfx/serving/serving_basic
- push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
- signatures (dict or tf.function, optional) — Model’s signature used for serving. This will be passed to the signatures argument of model.save().
- max_shard_size (int or str, optional, defaults to "5GB") — The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB").
  If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard which will be bigger than max_shard_size.
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
- safe_serialization (bool, optional, defaults to False) — Whether to save the model using safetensors or the traditional TensorFlow way (that uses h5).
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
Save a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.
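The sharding rule for max_shard_size, including the case of a single weight that is larger than the limit, can be illustrated with a greedy grouping. The sketch below is one plausible strategy under that description, not the library's code; shard_weights and the byte sizes are hypothetical.

```python
def shard_weights(weight_sizes, max_shard_size):
    """Greedily group weights into shards of at most max_shard_size bytes.

    weight_sizes maps a weight name to its size in bytes. A weight that is
    larger than the limit ends up alone in a shard that exceeds the limit.
    """
    shards, current, current_size = [], [], 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would overflow it.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


# Hypothetical weight sizes in bytes; "c" alone exceeds the 10-byte limit.
weight_sizes = {"a": 4, "b": 4, "c": 12, "d": 3}
shards = shard_weights(weight_sizes, max_shard_size=10)
print(shards)
```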
Prepare the output of the saved model. Can be overridden if specific serving modifications are required.
set_bias
< source >( value )
Set all the bias in the LM head.
set_input_embeddings
< source >( value )
Set model’s input embeddings
set_output_embeddings
< source >( value )
Set model’s output embeddings
A modification of Keras’s default train_step that correctly handles matching outputs to labels for our models and supports directly training on the loss output head. In addition, it ensures input keys are copied to the labels where appropriate. It will also copy label keys into the input dict when using the dummy loss, to ensure that they are available to the model during the forward pass.
TFModelUtilsMixin
A few utilities for keras.Model, to be used as a mixin.
num_parameters
< source >( only_trainable: bool = False ) → int
Get the number of (optionally, trainable) parameters in the model.
FlaxPreTrainedModel
class transformers.FlaxPreTrainedModel
< source >( config: PretrainedConfig module: Module input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True )
Base class for all models.
FlaxPreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models.
Class attributes (overridden by derived classes):
- config_class (PretrainedConfig) — A subclass of PretrainedConfig to use as configuration class for this model architecture.
- base_model_prefix (str) — A string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
- main_input_name (str) — The name of the principal input to the model (often input_ids for NLP models, pixel_values for vision models and input_values for speech models).
push_to_hub
< source >( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )
Parameters
- repo_id (str) — The name of the repository you want to push your model to. It should contain your organization name when pushing to a given organization.
- use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
- commit_message (str, optional) — Message to commit while pushing. Will default to "Upload model".
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB"). We default it to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
- safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights in safetensors format for safer serialization.
- revision (str, optional) — Branch to push the uploaded files to.
- commit_description (str, optional) — The description of the commit that will be created.
- tags (List[str], optional) — List of tags to push on the Hub.
Upload the model checkpoint to the 🤗 Model Hub.
Examples:
from transformers import FlaxAutoModel
model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased")
# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")
# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")
can_generate
Returns whether this model can generate sequences with .generate().
Returns
bool
Whether this model can generate sequences with .generate().
from_pretrained
< source >( pretrained_model_name_or_path: typing.Union[str, os.PathLike] dtype: dtype = <class 'jax.numpy.float32'> *model_args config: typing.Union[transformers.configuration_utils.PretrainedConfig, str, os.PathLike, NoneType] = None cache_dir: typing.Union[str, os.PathLike, NoneType] = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a pt index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_pt should be set to True.
- dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs).
  This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified, all the computation will be performed with the given dtype.
  Note that this only specifies the dtype of the computation and does not influence the dtype of model parameters.
  If you wish to change the dtype of the model parameters, see to_fp16() and to_bf16().
- model_args (sequence of positional arguments, optional) — All remaining positional arguments will be passed to the underlying model’s __init__ method.
- config (Union[PretrainedConfig, str, os.PathLike], optional) — Can be either:
  - an instance of a class derived from PretrainedConfig,
  - a string or path valid as input to from_pretrained().
  Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (Union[str, os.PathLike], optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- ignore_mismatched_sizes (bool, optional, defaults to False) — Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.
Instantiate a pretrained flax model from a pre-trained model configuration.
The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.
The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.
Examples:
>>> from transformers import BertConfig, FlaxBertModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable).
>>> model = FlaxBertModel.from_pretrained("./test/saved_model/")
>>> # Loading from a PyTorch checkpoint file instead of a Flax checkpoint (slower, for example purposes, not runnable).
>>> config = BertConfig.from_json_file("./pt_model/config.json")
>>> model = FlaxBertModel.from_pretrained("./pt_model/pytorch_model.bin", from_pt=True, config=config)
load_flax_sharded_weights
< source >( shard_files ) → Dict
This is the same as flax.serialization.from_bytes (https://flax.readthedocs.io/en/latest/_modules/flax/serialization.html#from_bytes) but for a sharded checkpoint.
This load is performed efficiently: each checkpoint shard is loaded one by one in RAM and deleted after being loaded in the model.
register_for_auto_class
< source >( auto_class = 'FlaxAutoModel' )
Register this class with a given auto class. This should only be used for custom models as the ones in the library are already mapped with an auto class.
This API is experimental and may have some slight breaking changes in the next releases.
save_pretrained
< source >( save_directory: typing.Union[str, os.PathLike] params = None push_to_hub = False max_shard_size = '10GB' token: typing.Union[str, bool, NoneType] = None safe_serialization: bool = False **kwargs )
Parameters
- save_directory (str or os.PathLike) — Directory to which to save. Will be created if it doesn’t exist.
- push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (will default to the name of save_directory in your namespace).
- max_shard_size (int or str, optional, defaults to "10GB") — The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, needs to be digits followed by a unit (like "5MB").
  If a single weight of the model is bigger than max_shard_size, it will be in its own checkpoint shard which will be bigger than max_shard_size.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.
- safe_serialization (bool, optional, defaults to False) — Whether to save the model using safetensors or through msgpack.
Save a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.
to_bf16
< source >( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )
Cast the floating-point params to jax.numpy.bfloat16. This returns a new params tree and does not cast the params in place.
This method can be used on TPU to explicitly convert the model parameters to bfloat16 precision to do full half-precision training or to save weights in bfloat16 for inference in order to save memory and improve speed.
Examples:
>>> from transformers import FlaxBertModel
>>> # load model
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model parameters will be in fp32 precision, to cast these to bfloat16 precision
>>> model.params = model.to_bf16(model.params)
>>> # If you don't want to cast certain parameters (for example layer norm bias and scale)
>>> # then pass the mask as follows
>>> from flax import traverse_util
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> flat_params = traverse_util.flatten_dict(model.params)
>>> mask = {
...     path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
...     for path in flat_params
... }
>>> mask = traverse_util.unflatten_dict(mask)
>>> model.params = model.to_bf16(model.params, mask)
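To see which leaves such a mask selects without loading a model, here is a small sketch on a toy parameter tree. The flatten function is a hand-written stand-in for flax.traverse_util.flatten_dict, and the parameter names are made up for the example. The mask compares the last two path elements, path[-2:], against a two-element tuple; a single element such as path[-2] is a string and can never equal a tuple.

```python
from typing import Any, Dict, Tuple


def flatten(tree: Dict[str, Any], prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], Any]:
    """Flatten a nested dict into {path_tuple: leaf} (stand-in for flatten_dict)."""
    flat = {}
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Toy parameter tree (hypothetical names); the leaves are placeholders.
params = {
    "encoder": {
        "dense": {"kernel": 1.0, "bias": 0.0},
        "LayerNorm": {"scale": 1.0, "bias": 0.0},
    }
}

# True means "cast this leaf"; LayerNorm scale and bias are excluded.
mask = {
    path: path[-2:] not in (("LayerNorm", "bias"), ("LayerNorm", "scale"))
    for path in flatten(params)
}
for path, cast in mask.items():
    print("/".join(path), cast)
```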
to_fp16
< source >( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )
Cast the floating-point params to jax.numpy.float16. This returns a new params tree and does not cast the params in place.
This method can be used on GPU to explicitly convert the model parameters to float16 precision to do full half-precision training or to save weights in float16 for inference in order to save memory and improve speed.
Examples:
>>> from transformers import FlaxBertModel
>>> # load model
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model params will be in fp32, to cast these to float16
>>> model.params = model.to_fp16(model.params)
>>> # If you don't want to cast certain parameters (for example layer norm bias and scale)
>>> # then pass the mask as follows
>>> from flax import traverse_util
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> flat_params = traverse_util.flatten_dict(model.params)
>>> mask = {
...     path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
...     for path in flat_params
... }
>>> mask = traverse_util.unflatten_dict(mask)
>>> model.params = model.to_fp16(model.params, mask)
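The semantics described above (only floating-point leaves are cast, the mask decides which ones, and a new tree is returned) can be illustrated with the standard library alone. This is a sketch under assumptions, not how the library does it: cast_tree and to_half are hypothetical helpers, Python floats stand in for JAX arrays, and the struct "e" format (IEEE 754 half precision) stands in for jax.numpy.float16.

```python
import struct
from typing import Any, Dict


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def cast_tree(params: Dict[str, Any], mask: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new tree with float leaves rounded to half precision where mask is True.

    Hypothetical helper; non-float leaves and masked-out leaves pass through unchanged.
    """
    out = {}
    for key, value in params.items():
        if isinstance(value, dict):
            out[key] = cast_tree(value, mask[key])
        elif isinstance(value, float) and mask[key]:
            out[key] = to_half(value)
        else:
            out[key] = value
    return out


params = {"dense": {"kernel": 0.1, "steps": 3}, "LayerNorm": {"scale": 0.1}}
mask = {"dense": {"kernel": True, "steps": True}, "LayerNorm": {"scale": False}}

new_params = cast_tree(params, mask)
print(new_params["dense"]["kernel"])  # rounded to half precision
print(params["dense"]["kernel"])      # the input tree is left untouched
```

Because the function builds a fresh dictionary at every level, the caller has to assign the result back, which is why the examples write model.params = model.to_fp16(model.params).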
to_fp32
< source >( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )
Cast the floating-point params to jax.numpy.float32. This method can be used to explicitly convert the model parameters to fp32 precision. This returns a new params tree and does not cast the params in place.
Examples:
>>> from transformers import FlaxBertModel
>>> # Download model and configuration from huggingface.co
>>> model = FlaxBertModel.from_pretrained("google-bert/bert-base-cased")
>>> # By default, the model params will be in fp32, to illustrate the use of this method,
>>> # we'll first cast to fp16 and back to fp32
>>> model.params = model.to_fp16(model.params)
>>> # now cast back to fp32
>>> model.params = model.to_fp32(model.params)
Pushing to the Hub
A Mixin containing the functionality to push a model or tokenizer to the hub.
push_to_hub
< source >( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: str = None commit_description: str = None tags: typing.Optional[typing.List[str]] = None **deprecated_kwargs )
Parameters
- repo_id (str) — The name of the repository you want to push your {object} to. It should contain your organization name when pushing to a given organization.
- use_temp_dir (bool, optional) — Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to True if there is no directory named like repo_id, False otherwise.
- commit_message (str, optional) — Message to commit while pushing. Will default to "Upload {object}".
- private (bool, optional) — Whether to make the repo private. If None (default), the repo will be public unless the organization’s default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface). Will default to True if repo_url is not specified.
- max_shard_size (int or str, optional, defaults to "5GB") — Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like "5MB"). It defaults to "5GB" so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.
- create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.
- safe_serialization (bool, optional, defaults to True) — Whether or not to convert the model weights to the safetensors format for safer serialization.
- revision (str, optional) — Branch to push the uploaded files to.
- commit_description (str, optional) — The description of the commit that will be created.
- tags (List[str], optional) — List of tags to push on the Hub.
Upload the {object_files} to the 🤗 Model Hub.
Examples:
from transformers import {object_class}
{object} = {object_class}.from_pretrained("google-bert/bert-base-cased")
# Push the {object} to your namespace with the name "my-finetuned-bert".
{object}.push_to_hub("my-finetuned-bert")
# Push the {object} to an organization with the name "my-finetuned-bert".
{object}.push_to_hub("huggingface/my-finetuned-bert")
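The max_shard_size parameter accepts either an integer or a string made of digits followed by a unit. The sketch below shows one way such a value could be turned into a byte count. parse_size is a hypothetical helper, and the unit table is an assumption (decimal units as powers of 1000, binary "iB" units as powers of 1024), not a statement about the library's own parser.

```python
import re

# Assumed unit table: decimal units are powers of 1000, binary units powers of 1024.
_UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_size(size) -> int:
    """Convert an int or a string such as "5GB" into a number of bytes (hypothetical helper)."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", size.strip())
    if match is None or match.group(2).upper() not in _UNITS:
        raise ValueError(f"expected digits followed by a unit, got {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


print(parse_size("5MB"))
print(parse_size("5GB"))
```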
Sharded checkpoints
transformers.modeling_utils.load_sharded_checkpoint
< source >( model folder strict = True prefer_safe = True ) → NamedTuple
Parameters
- model (torch.nn.Module) — The model in which to load the checkpoint.
- folder (str or os.PathLike) — A path to a folder containing the sharded checkpoint.
- strict (bool, optional, defaults to True) — Whether to strictly enforce that the keys in the model state dict match the keys in the sharded checkpoint.
- prefer_safe (bool, optional, defaults to True) — If both safetensors and PyTorch save files are present in the checkpoint and prefer_safe is True, the safetensors files will be loaded. Otherwise, PyTorch files are always loaded when possible.
Returns
NamedTuple
A named tuple with missing_keys and unexpected_keys fields:
- missing_keys is a list of str containing the missing keys
- unexpected_keys is a list of str containing the unexpected keys
This is the same as torch.nn.Module.load_state_dict but for a sharded checkpoint.
This load is performed efficiently: each checkpoint shard is loaded one by one in RAM and deleted after being loaded in the model.
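The shard-by-shard loading pattern and the returned named tuple can be sketched with plain dictionaries. This is an illustration under stated assumptions rather than the function's actual code: load_shards, LoadResult, weight_map and read_shard are hypothetical names, dictionaries stand in for state dicts, and read_shard stands in for reading one shard file from disk.

```python
from collections import namedtuple

LoadResult = namedtuple("LoadResult", ["missing_keys", "unexpected_keys"])


def load_shards(model_state, weight_map, read_shard, strict=True):
    """Copy weights into model_state one shard at a time (hypothetical stand-in).

    weight_map maps each checkpoint key to the name of the shard that holds it;
    read_shard(name) returns that shard as a dict.
    """
    checkpoint_keys = set(weight_map)
    missing = sorted(set(model_state) - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - set(model_state))
    if strict and (missing or unexpected):
        raise RuntimeError(f"missing keys: {missing}, unexpected keys: {unexpected}")
    for shard_name in sorted(set(weight_map.values())):
        shard = read_shard(shard_name)  # only one shard is held in memory at a time
        for key, value in shard.items():
            if key in model_state:
                model_state[key] = value
        del shard  # drop the reference before reading the next shard
    return LoadResult(missing, unexpected)


# Two toy shards; "extra.bias" is not in the model and "head.bias" is not in the checkpoint.
shard_files = {
    "shard-1": {"embed.weight": 1.0},
    "shard-2": {"head.weight": 2.0, "extra.bias": 3.0},
}
weight_map = {key: name for name, shard in shard_files.items() for key in shard}
model_state = {"embed.weight": 0.0, "head.weight": 0.0, "head.bias": 0.0}

result = load_shards(model_state, weight_map, lambda name: dict(shard_files[name]), strict=False)
print(result)
```

With strict=True the same call would raise instead of returning, since the toy checkpoint and the toy model do not have matching keys.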