Auto Classes
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the from_pretrained() method. AutoClasses are here to do this job for you, so that you automatically retrieve the relevant class given the name or path of the pretrained weights, config, or vocabulary.

Instantiating one of AutoConfig, AutoModel, or AutoTokenizer will directly create a class of the relevant architecture. For instance,

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

will create a model that is an instance of BertModel.

There is one class of AutoModel for each task, and for each backend (PyTorch, TensorFlow, or Flax).
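For example, the same checkpoint name also drives the task-specific variants. A minimal sketch using the same public google-bert/bert-base-cased checkpoint (note that the classification head of AutoModelForSequenceClassification is newly initialized, since this checkpoint only contains the base model weights):

from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

# The checkpoint name determines which concrete classes are returned.
config = AutoConfig.from_pretrained("google-bert/bert-base-cased")        # BertConfig
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")  # BertTokenizerFast
model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")  # BertForSequenceClassification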
Extending the Auto Classes
Each of the auto classes has a method to be extended with your custom classes. For instance, if you have defined a custom model class NewModel, make sure you also have a NewModelConfig; then you can add those to the auto classes like this:
from transformers import AutoConfig, AutoModel
AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)
You will then be able to use the auto classes like you would usually do!
If your NewModelConfig is a subclass of PretrainedConfig, make sure its model_type attribute is set to the same key you use when registering the config (here "new-model").

Likewise, if your NewModel is a subclass of PreTrainedModel, make sure its config_class attribute is set to the same class you use when registering the model (here NewModelConfig).
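Putting these two requirements together, here is a minimal sketch of what the custom classes could look like (the hidden_size attribute and the linear layer are illustrative assumptions, not part of any real API):

from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel
import torch.nn as nn

class NewModelConfig(PretrainedConfig):
    # Must match the key passed to AutoConfig.register below.
    model_type = "new-model"

    def __init__(self, hidden_size=64, **kwargs):
        self.hidden_size = hidden_size  # illustrative attribute
        super().__init__(**kwargs)

class NewModel(PreTrainedModel):
    # Must match the config class passed to AutoModel.register below.
    config_class = NewModelConfig

    def __init__(self, config):
        super().__init__(config)
        self.layer = nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, hidden_states):
        return self.layer(hidden_states)

AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)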
AutoConfig
This is a generic configuration class that will be instantiated as one of the configuration classes of the library when created with the from_pretrained() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_pretrained(pretrained_model_name_or_path, **kwargs)
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model configuration hosted inside a model repo on huggingface.co.
  - A path to a directory containing a configuration file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - A path or url to a saved configuration JSON file, e.g., ./my_model_directory/configuration.json.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final configuration object. If True, then this function returns a Tuple(config, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e., the part of kwargs which has not been used to update config and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (additional keyword arguments, optional) — The values in kwargs of any keys which are configuration attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not configuration attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the configuration classes of the library from a pretrained model configuration.
The configuration class to instantiate is selected based on the model_type property of the config object that is loaded, or when it's missing, by falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertConfig (ALBERT model)
- align — AlignConfig (ALIGN model)
- altclip — AltCLIPConfig (AltCLIP model)
- audio-spectrogram-transformer — ASTConfig (Audio Spectrogram Transformer model)
- autoformer — AutoformerConfig (Autoformer model)
- bark — BarkConfig (Bark model)
- bart — BartConfig (BART model)
- beit — BeitConfig (BEiT model)
- bert — BertConfig (BERT model)
- bert-generation — BertGenerationConfig (Bert Generation model)
- big_bird — BigBirdConfig (BigBird model)
- bigbird_pegasus — BigBirdPegasusConfig (BigBird-Pegasus model)
- biogpt — BioGptConfig (BioGpt model)
- bit — BitConfig (BiT model)
- blenderbot — BlenderbotConfig (Blenderbot model)
- blenderbot-small — BlenderbotSmallConfig (BlenderbotSmall model)
- blip — BlipConfig (BLIP model)
- blip-2 — Blip2Config (BLIP-2 model)
- bloom — BloomConfig (BLOOM model)
- bridgetower — BridgeTowerConfig (BridgeTower model)
- bros — BrosConfig (BROS model)
- camembert — CamembertConfig (CamemBERT model)
- canine — CanineConfig (CANINE model)
- chameleon — ChameleonConfig (Chameleon model)
- chinese_clip — ChineseCLIPConfig (Chinese-CLIP model)
- chinese_clip_vision_model — ChineseCLIPVisionConfig (ChineseCLIPVisionModel model)
- clap — ClapConfig (CLAP model)
- clip — CLIPConfig (CLIP model)
- clip_text_model — CLIPTextConfig (CLIPTextModel model)
- clip_vision_model — CLIPVisionConfig (CLIPVisionModel model)
- clipseg — CLIPSegConfig (CLIPSeg model)
- clvp — ClvpConfig (CLVP model)
- code_llama — LlamaConfig (CodeLlama model)
- codegen — CodeGenConfig (CodeGen model)
- cohere — CohereConfig (Cohere model)
- conditional_detr — ConditionalDetrConfig (Conditional DETR model)
- convbert — ConvBertConfig (ConvBERT model)
- convnext — ConvNextConfig (ConvNeXT model)
- convnextv2 — ConvNextV2Config (ConvNeXTV2 model)
- cpmant — CpmAntConfig (CPM-Ant model)
- ctrl — CTRLConfig (CTRL model)
- cvt — CvtConfig (CvT model)
- dac — DacConfig (DAC model)
- data2vec-audio — Data2VecAudioConfig (Data2VecAudio model)
- data2vec-text — Data2VecTextConfig (Data2VecText model)
- data2vec-vision — Data2VecVisionConfig (Data2VecVision model)
- dbrx — DbrxConfig (DBRX model)
- deberta — DebertaConfig (DeBERTa model)
- deberta-v2 — DebertaV2Config (DeBERTa-v2 model)
- decision_transformer — DecisionTransformerConfig (Decision Transformer model)
- deformable_detr — DeformableDetrConfig (Deformable DETR model)
- deit — DeiTConfig (DeiT model)
- depth_anything — DepthAnythingConfig (Depth Anything model)
- deta — DetaConfig (DETA model)
- detr — DetrConfig (DETR model)
- dinat — DinatConfig (DiNAT model)
- dinov2 — Dinov2Config (DINOv2 model)
- distilbert — DistilBertConfig (DistilBERT model)
- donut-swin — DonutSwinConfig (DonutSwin model)
- dpr — DPRConfig (DPR model)
- dpt — DPTConfig (DPT model)
- efficientformer — EfficientFormerConfig (EfficientFormer model)
- efficientnet — EfficientNetConfig (EfficientNet model)
- electra — ElectraConfig (ELECTRA model)
- encodec — EncodecConfig (EnCodec model)
- encoder-decoder — EncoderDecoderConfig (Encoder decoder model)
- ernie — ErnieConfig (ERNIE model)
- ernie_m — ErnieMConfig (ErnieM model)
- esm — EsmConfig (ESM model)
- falcon — FalconConfig (Falcon model)
- falcon_mamba — FalconMambaConfig (FalconMamba model)
- fastspeech2_conformer — FastSpeech2ConformerConfig (FastSpeech2Conformer model)
- flaubert — FlaubertConfig (FlauBERT model)
- flava — FlavaConfig (FLAVA model)
- fnet — FNetConfig (FNet model)
- focalnet — FocalNetConfig (FocalNet model)
- fsmt — FSMTConfig (FairSeq Machine-Translation model)
- funnel — FunnelConfig (Funnel Transformer model)
- fuyu — FuyuConfig (Fuyu model)
- gemma — GemmaConfig (Gemma model)
- gemma2 — Gemma2Config (Gemma2 model)
- git — GitConfig (GIT model)
- glpn — GLPNConfig (GLPN model)
- gpt-sw3 — GPT2Config (GPT-Sw3 model)
- gpt2 — GPT2Config (OpenAI GPT-2 model)
- gpt_bigcode — GPTBigCodeConfig (GPTBigCode model)
- gpt_neo — GPTNeoConfig (GPT Neo model)
- gpt_neox — GPTNeoXConfig (GPT NeoX model)
- gpt_neox_japanese — GPTNeoXJapaneseConfig (GPT NeoX Japanese model)
- gptj — GPTJConfig (GPT-J model)
- gptsan-japanese — GPTSanJapaneseConfig (GPTSAN-japanese model)
- granite — GraniteConfig (Granite model)
- granitemoe — GraniteMoeConfig (GraniteMoe model)
- graphormer — GraphormerConfig (Graphormer model)
- grounding-dino — GroundingDinoConfig (Grounding DINO model)
- groupvit — GroupViTConfig (GroupViT model)
- hiera — HieraConfig (Hiera model)
- hubert — HubertConfig (Hubert model)
- ibert — IBertConfig (I-BERT model)
- idefics — IdeficsConfig (IDEFICS model)
- idefics2 — Idefics2Config (Idefics2 model)
- imagegpt — ImageGPTConfig (ImageGPT model)
- informer — InformerConfig (Informer model)
- instructblip — InstructBlipConfig (InstructBLIP model)
- instructblipvideo — InstructBlipVideoConfig (InstructBlipVideo model)
- jamba — JambaConfig (Jamba model)
- jetmoe — JetMoeConfig (JetMoe model)
- jukebox — JukeboxConfig (Jukebox model)
- kosmos-2 — Kosmos2Config (KOSMOS-2 model)
- layoutlm — LayoutLMConfig (LayoutLM model)
- layoutlmv2 — LayoutLMv2Config (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Config (LayoutLMv3 model)
- led — LEDConfig (LED model)
- levit — LevitConfig (LeViT model)
- lilt — LiltConfig (LiLT model)
- llama — LlamaConfig (LLaMA model)
- llava — LlavaConfig (LLaVa model)
- llava_next — LlavaNextConfig (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoConfig (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionConfig (LLaVA-Onevision model)
- longformer — LongformerConfig (Longformer model)
- longt5 — LongT5Config (LongT5 model)
- luke — LukeConfig (LUKE model)
- lxmert — LxmertConfig (LXMERT model)
- m2m_100 — M2M100Config (M2M100 model)
- mamba — MambaConfig (Mamba model)
- mamba2 — Mamba2Config (mamba2 model)
- marian — MarianConfig (Marian model)
- markuplm — MarkupLMConfig (MarkupLM model)
- mask2former — Mask2FormerConfig (Mask2Former model)
- maskformer — MaskFormerConfig (MaskFormer model)
- maskformer-swin — MaskFormerSwinConfig (MaskFormerSwin model)
- mbart — MBartConfig (mBART model)
- mctct — MCTCTConfig (M-CTC-T model)
- mega — MegaConfig (MEGA model)
- megatron-bert — MegatronBertConfig (Megatron-BERT model)
- mgp-str — MgpstrConfig (MGP-STR model)
- mimi — MimiConfig (Mimi model)
- mistral — MistralConfig (Mistral model)
- mixtral — MixtralConfig (Mixtral model)
- mllama — MllamaConfig (Mllama model)
- mobilebert — MobileBertConfig (MobileBERT model)
- mobilenet_v1 — MobileNetV1Config (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2Config (MobileNetV2 model)
- mobilevit — MobileViTConfig (MobileViT model)
- mobilevitv2 — MobileViTV2Config (MobileViTV2 model)
- mpnet — MPNetConfig (MPNet model)
- mpt — MptConfig (MPT model)
- mra — MraConfig (MRA model)
- mt5 — MT5Config (MT5 model)
- musicgen — MusicgenConfig (MusicGen model)
- musicgen_melody — MusicgenMelodyConfig (MusicGen Melody model)
- mvp — MvpConfig (MVP model)
- nat — NatConfig (NAT model)
- nemotron — NemotronConfig (Nemotron model)
- nezha — NezhaConfig (Nezha model)
- nllb-moe — NllbMoeConfig (NLLB-MOE model)
- nougat — VisionEncoderDecoderConfig (Nougat model)
- nystromformer — NystromformerConfig (Nyströmformer model)
- olmo — OlmoConfig (OLMo model)
- olmoe — OlmoeConfig (OLMoE model)
- omdet-turbo — OmDetTurboConfig (OmDet-Turbo model)
- oneformer — OneFormerConfig (OneFormer model)
- open-llama — OpenLlamaConfig (OpenLlama model)
- openai-gpt — OpenAIGPTConfig (OpenAI GPT model)
- opt — OPTConfig (OPT model)
- owlv2 — Owlv2Config (OWLv2 model)
- owlvit — OwlViTConfig (OWL-ViT model)
- paligemma — PaliGemmaConfig (PaliGemma model)
- patchtsmixer — PatchTSMixerConfig (PatchTSMixer model)
- patchtst — PatchTSTConfig (PatchTST model)
- pegasus — PegasusConfig (Pegasus model)
- pegasus_x — PegasusXConfig (PEGASUS-X model)
- perceiver — PerceiverConfig (Perceiver model)
- persimmon — PersimmonConfig (Persimmon model)
- phi — PhiConfig (Phi model)
- phi3 — Phi3Config (Phi3 model)
- pix2struct — Pix2StructConfig (Pix2Struct model)
- pixtral — PixtralVisionConfig (Pixtral model)
- plbart — PLBartConfig (PLBart model)
- poolformer — PoolFormerConfig (PoolFormer model)
- pop2piano — Pop2PianoConfig (Pop2Piano model)
- prophetnet — ProphetNetConfig (ProphetNet model)
- pvt — PvtConfig (PVT model)
- pvt_v2 — PvtV2Config (PVTv2 model)
- qdqbert — QDQBertConfig (QDQBert model)
- qwen2 — Qwen2Config (Qwen2 model)
- qwen2_audio — Qwen2AudioConfig (Qwen2Audio model)
- qwen2_audio_encoder — Qwen2AudioEncoderConfig (Qwen2AudioEncoder model)
- qwen2_moe — Qwen2MoeConfig (Qwen2MoE model)
- qwen2_vl — Qwen2VLConfig (Qwen2VL model)
- rag — RagConfig (RAG model)
- realm — RealmConfig (REALM model)
- recurrent_gemma — RecurrentGemmaConfig (RecurrentGemma model)
- reformer — ReformerConfig (Reformer model)
- regnet — RegNetConfig (RegNet model)
- rembert — RemBertConfig (RemBERT model)
- resnet — ResNetConfig (ResNet model)
- retribert — RetriBertConfig (RetriBERT model)
- roberta — RobertaConfig (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormConfig (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertConfig (RoCBert model)
- roformer — RoFormerConfig (RoFormer model)
- rt_detr — RTDetrConfig (RT-DETR model)
- rt_detr_resnet — RTDetrResNetConfig (RT-DETR-ResNet model)
- rwkv — RwkvConfig (RWKV model)
- sam — SamConfig (SAM model)
- seamless_m4t — SeamlessM4TConfig (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4Tv2Config (SeamlessM4Tv2 model)
- segformer — SegformerConfig (SegFormer model)
- seggpt — SegGptConfig (SegGPT model)
- sew — SEWConfig (SEW model)
- sew-d — SEWDConfig (SEW-D model)
- siglip — SiglipConfig (SigLIP model)
- siglip_vision_model — SiglipVisionConfig (SiglipVisionModel model)
- speech-encoder-decoder — SpeechEncoderDecoderConfig (Speech Encoder decoder model)
- speech_to_text — Speech2TextConfig (Speech2Text model)
- speech_to_text_2 — Speech2Text2Config (Speech2Text2 model)
- speecht5 — SpeechT5Config (SpeechT5 model)
- splinter — SplinterConfig (Splinter model)
- squeezebert — SqueezeBertConfig (SqueezeBERT model)
- stablelm — StableLmConfig (StableLm model)
- starcoder2 — Starcoder2Config (Starcoder2 model)
- superpoint — SuperPointConfig (SuperPoint model)
- swiftformer — SwiftFormerConfig (SwiftFormer model)
- swin — SwinConfig (Swin Transformer model)
- swin2sr — Swin2SRConfig (Swin2SR model)
- swinv2 — Swinv2Config (Swin Transformer V2 model)
- switch_transformers — SwitchTransformersConfig (SwitchTransformers model)
- t5 — T5Config (T5 model)
- table-transformer — TableTransformerConfig (Table Transformer model)
- tapas — TapasConfig (TAPAS model)
- time_series_transformer — TimeSeriesTransformerConfig (Time Series Transformer model)
- timesformer — TimesformerConfig (TimeSformer model)
- timm_backbone — TimmBackboneConfig (TimmBackbone model)
- trajectory_transformer — TrajectoryTransformerConfig (Trajectory Transformer model)
- transfo-xl — TransfoXLConfig (Transformer-XL model)
- trocr — TrOCRConfig (TrOCR model)
- tvlt — TvltConfig (TVLT model)
- tvp — TvpConfig (TVP model)
- udop — UdopConfig (UDOP model)
- umt5 — UMT5Config (UMT5 model)
- unispeech — UniSpeechConfig (UniSpeech model)
- unispeech-sat — UniSpeechSatConfig (UniSpeechSat model)
- univnet — UnivNetConfig (UnivNet model)
- upernet — UperNetConfig (UPerNet model)
- van — VanConfig (VAN model)
- video_llava — VideoLlavaConfig (VideoLlava model)
- videomae — VideoMAEConfig (VideoMAE model)
- vilt — ViltConfig (ViLT model)
- vipllava — VipLlavaConfig (VipLlava model)
- vision-encoder-decoder — VisionEncoderDecoderConfig (Vision Encoder decoder model)
- vision-text-dual-encoder — VisionTextDualEncoderConfig (VisionTextDualEncoder model)
- visual_bert — VisualBertConfig (VisualBERT model)
- vit — ViTConfig (ViT model)
- vit_hybrid — ViTHybridConfig (ViT Hybrid model)
- vit_mae — ViTMAEConfig (ViTMAE model)
- vit_msn — ViTMSNConfig (ViTMSN model)
- vitdet — VitDetConfig (VitDet model)
- vitmatte — VitMatteConfig (ViTMatte model)
- vits — VitsConfig (VITS model)
- vivit — VivitConfig (ViViT model)
- wav2vec2 — Wav2Vec2Config (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertConfig (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerConfig (Wav2Vec2-Conformer model)
- wavlm — WavLMConfig (WavLM model)
- whisper — WhisperConfig (Whisper model)
- xclip — XCLIPConfig (X-CLIP model)
- xglm — XGLMConfig (XGLM model)
- xlm — XLMConfig (XLM model)
- xlm-prophetnet — XLMProphetNetConfig (XLM-ProphetNet model)
- xlm-roberta — XLMRobertaConfig (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLConfig (XLM-RoBERTa-XL model)
- xlnet — XLNetConfig (XLNet model)
- xmod — XmodConfig (X-MOD model)
- yolos — YolosConfig (YOLOS model)
- yoso — YosoConfig (YOSO model)
- zoedepth — ZoeDepthConfig (ZoeDepth model)
Examples:
>>> from transformers import AutoConfig
>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased")
>>> # Download configuration from huggingface.co (user-uploaded) and cache.
>>> config = AutoConfig.from_pretrained("dbmdz/bert-base-german-cased")
>>> # If configuration file is in a directory (e.g., was saved using *save_pretrained('./test/saved_model/')*).
>>> config = AutoConfig.from_pretrained("./test/bert_saved_model/")
>>> # Load a specific configuration file.
>>> config = AutoConfig.from_pretrained("./test/bert_saved_model/my_configuration.json")
>>> # Change some config attributes when loading a pretrained config.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased", output_attentions=True, foo=False)
>>> config.output_attentions
True
>>> config, unused_kwargs = AutoConfig.from_pretrained(
... "google-bert/bert-base-uncased", output_attentions=True, foo=False, return_unused_kwargs=True
... )
>>> config.output_attentions
True
>>> unused_kwargs
{'foo': False}
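The revision and trust_remote_code parameters described above follow the same pattern; a short sketch, where the second repository id is hypothetical (custom code from the Hub is only executed after you opt in):

>>> # Pin a specific revision (branch, tag, or commit id).
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-uncased", revision="main")
>>> # Hypothetical repo that ships its own configuration code; requires explicit opt-in.
>>> # config = AutoConfig.from_pretrained("your-org/custom-model", trust_remote_code=True)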
register(model_type, config, exist_ok=False)
Parameters
- model_type (str) — The model type like "bert" or "gpt".
- config (PretrainedConfig) — The config to register.
Register a new configuration for this class.
AutoTokenizer
This is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when created with the AutoTokenizer.from_pretrained() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co.
  - A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - A path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (like Bert or XLNet), e.g., ./my_model_directory/vocab.txt. (Not applicable to all derived classes)
- inputs (additional positional arguments, optional) — Will be passed along to the Tokenizer __init__() method.
- config (PretrainedConfig, optional) — The configuration object used to determine the tokenizer class to instantiate.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- subfolder (str, optional) — In case the relevant files are located inside a subfolder of the model repo on huggingface.co (e.g. for facebook/rag-token-base), specify it here.
- use_fast (bool, optional, defaults to True) — Use a fast Rust-based tokenizer if it is supported for a given model. If a fast tokenizer is not available for a given model, a normal Python-based tokenizer is returned instead.
- tokenizer_type (str, optional) — Tokenizer type to be loaded.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (additional keyword arguments, optional) — Will be passed to the Tokenizer __init__() method. Can be used to set special tokens like bos_token, eos_token, unk_token, sep_token, pad_token, cls_token, mask_token, additional_special_tokens. See parameters in the __init__() for more details.
Instantiate one of the tokenizer classes of the library from a pretrained model vocabulary.
The tokenizer class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it's missing, by falling back to using pattern matching on pretrained_model_name_or_path:
- albert — AlbertTokenizer or AlbertTokenizerFast (ALBERT model)
- align — BertTokenizer or BertTokenizerFast (ALIGN model)
- bark — BertTokenizer or BertTokenizerFast (Bark model)
- bart — BartTokenizer or BartTokenizerFast (BART model)
- barthez — BarthezTokenizer or BarthezTokenizerFast (BARThez model)
- bartpho — BartphoTokenizer (BARTpho model)
- bert — BertTokenizer or BertTokenizerFast (BERT model)
- bert-generation — BertGenerationTokenizer (Bert Generation model)
- bert-japanese — BertJapaneseTokenizer (BertJapanese model)
- bertweet — BertweetTokenizer (BERTweet model)
- big_bird — BigBirdTokenizer or BigBirdTokenizerFast (BigBird model)
- bigbird_pegasus — PegasusTokenizer or PegasusTokenizerFast (BigBird-Pegasus model)
- biogpt — BioGptTokenizer (BioGpt model)
- blenderbot — BlenderbotTokenizer or BlenderbotTokenizerFast (Blenderbot model)
- blenderbot-small — BlenderbotSmallTokenizer (BlenderbotSmall model)
- blip — BertTokenizer or BertTokenizerFast (BLIP model)
- blip-2 — GPT2Tokenizer or GPT2TokenizerFast (BLIP-2 model)
- bloom — BloomTokenizerFast (BLOOM model)
- bridgetower — RobertaTokenizer or RobertaTokenizerFast (BridgeTower model)
- bros — BertTokenizer or BertTokenizerFast (BROS model)
- byt5 — ByT5Tokenizer (ByT5 model)
- camembert — CamembertTokenizer or CamembertTokenizerFast (CamemBERT model)
- canine — CanineTokenizer (CANINE model)
- chameleon — LlamaTokenizer or LlamaTokenizerFast (Chameleon model)
- chinese_clip — BertTokenizer or BertTokenizerFast (Chinese-CLIP model)
- clap — RobertaTokenizer or RobertaTokenizerFast (CLAP model)
- clip — CLIPTokenizer or CLIPTokenizerFast (CLIP model)
- clipseg — CLIPTokenizer or CLIPTokenizerFast (CLIPSeg model)
- clvp — ClvpTokenizer (CLVP model)
- code_llama — CodeLlamaTokenizer or CodeLlamaTokenizerFast (CodeLlama model)
- codegen — CodeGenTokenizer or CodeGenTokenizerFast (CodeGen model)
- cohere — CohereTokenizerFast (Cohere model)
- convbert — ConvBertTokenizer or ConvBertTokenizerFast (ConvBERT model)
- cpm — CpmTokenizer or CpmTokenizerFast (CPM model)
- cpmant — CpmAntTokenizer (CPM-Ant model)
- ctrl — CTRLTokenizer (CTRL model)
- data2vec-audio — Wav2Vec2CTCTokenizer (Data2VecAudio model)
- data2vec-text — RobertaTokenizer or RobertaTokenizerFast (Data2VecText model)
- dbrx — GPT2Tokenizer or GPT2TokenizerFast (DBRX model)
- deberta — DebertaTokenizer or DebertaTokenizerFast (DeBERTa model)
- deberta-v2 — DebertaV2Tokenizer or DebertaV2TokenizerFast (DeBERTa-v2 model)
- distilbert — DistilBertTokenizer or DistilBertTokenizerFast (DistilBERT model)
- dpr — DPRQuestionEncoderTokenizer or DPRQuestionEncoderTokenizerFast (DPR model)
- electra — ElectraTokenizer or ElectraTokenizerFast (ELECTRA model)
- ernie — BertTokenizer or BertTokenizerFast (ERNIE model)
- ernie_m — ErnieMTokenizer (ErnieM model)
- esm — EsmTokenizer (ESM model)
- falcon — PreTrainedTokenizerFast (Falcon model)
- falcon_mamba — GPTNeoXTokenizerFast (FalconMamba model)
- fastspeech2_conformer — FastSpeech2ConformerTokenizer (FastSpeech2Conformer model)
- flaubert — FlaubertTokenizer (FlauBERT model)
- fnet — FNetTokenizer or FNetTokenizerFast (FNet model)
- fsmt — FSMTTokenizer (FairSeq Machine-Translation model)
- funnel — FunnelTokenizer or FunnelTokenizerFast (Funnel Transformer model)
- gemma — GemmaTokenizer or GemmaTokenizerFast (Gemma model)
- gemma2 — GemmaTokenizer or GemmaTokenizerFast (Gemma2 model)
- git — BertTokenizer or BertTokenizerFast (GIT model)
- gpt-sw3 — GPTSw3Tokenizer (GPT-Sw3 model)
- gpt2 — GPT2Tokenizer or GPT2TokenizerFast (OpenAI GPT-2 model)
- gpt_bigcode — GPT2Tokenizer or GPT2TokenizerFast (GPTBigCode model)
- gpt_neo — GPT2Tokenizer or GPT2TokenizerFast (GPT Neo model)
- gpt_neox — GPTNeoXTokenizerFast (GPT NeoX model)
- gpt_neox_japanese — GPTNeoXJapaneseTokenizer (GPT NeoX Japanese model)
- gptj — GPT2Tokenizer or GPT2TokenizerFast (GPT-J model)
- gptsan-japanese — GPTSanJapaneseTokenizer (GPTSAN-japanese model)
- grounding-dino — BertTokenizer or BertTokenizerFast (Grounding DINO model)
- groupvit — CLIPTokenizer or CLIPTokenizerFast (GroupViT model)
- herbert — HerbertTokenizer or HerbertTokenizerFast (HerBERT model)
- hubert — Wav2Vec2CTCTokenizer (Hubert model)
- ibert — RobertaTokenizer or RobertaTokenizerFast (I-BERT model)
- idefics — LlamaTokenizerFast (IDEFICS model)
- idefics2 — LlamaTokenizer or LlamaTokenizerFast (Idefics2 model)
- instructblip — GPT2Tokenizer or GPT2TokenizerFast (InstructBLIP model)
- instructblipvideo — GPT2Tokenizer or GPT2TokenizerFast (InstructBlipVideo model)
- jamba — LlamaTokenizer or LlamaTokenizerFast (Jamba model)
- jetmoe — LlamaTokenizer or LlamaTokenizerFast (JetMoe model)
- jukebox — JukeboxTokenizer (Jukebox model)
- kosmos-2 — XLMRobertaTokenizer or XLMRobertaTokenizerFast (KOSMOS-2 model)
- layoutlm — LayoutLMTokenizer or LayoutLMTokenizerFast (LayoutLM model)
- layoutlmv2 — LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LayoutLMv3 model)
- layoutxlm — LayoutXLMTokenizer or LayoutXLMTokenizerFast (LayoutXLM model)
- led — LEDTokenizer or LEDTokenizerFast (LED model)
- lilt — LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LiLT model)
- llama — LlamaTokenizer or LlamaTokenizerFast (LLaMA model)
- llava — LlamaTokenizer or LlamaTokenizerFast (LLaVa model)
- llava_next — LlamaTokenizer or LlamaTokenizerFast (LLaVA-NeXT model)
- llava_next_video — LlamaTokenizer or LlamaTokenizerFast (LLaVa-NeXT-Video model)
- longformer — LongformerTokenizer or LongformerTokenizerFast (Longformer model)
- longt5 — T5Tokenizer or T5TokenizerFast (LongT5 model)
- luke — LukeTokenizer (LUKE model)
- lxmert — LxmertTokenizer or LxmertTokenizerFast (LXMERT model)
- m2m_100 — M2M100Tokenizer (M2M100 model)
- mamba — GPTNeoXTokenizerFast (Mamba model)
- mamba2 — GPTNeoXTokenizerFast (mamba2 model)
- marian — MarianTokenizer (Marian model)
- mbart — MBartTokenizer or MBartTokenizerFast (mBART model)
- mbart50 — MBart50Tokenizer or MBart50TokenizerFast (mBART-50 model)
- mega — RobertaTokenizer or RobertaTokenizerFast (MEGA model)
- megatron-bert — BertTokenizer or BertTokenizerFast (Megatron-BERT model)
- mgp-str — MgpstrTokenizer (MGP-STR model)
- mistral — LlamaTokenizer or LlamaTokenizerFast (Mistral model)
- mixtral — LlamaTokenizer or LlamaTokenizerFast (Mixtral model)
- mllama — LlamaTokenizer or LlamaTokenizerFast (Mllama model)
- mluke — MLukeTokenizer (mLUKE model)
- mobilebert — MobileBertTokenizer or MobileBertTokenizerFast (MobileBERT model)
- mpnet — MPNetTokenizer or MPNetTokenizerFast (MPNet model)
- mpt — GPTNeoXTokenizerFast (MPT model)
- mra — RobertaTokenizer or RobertaTokenizerFast (MRA model)
- mt5 — MT5Tokenizer or MT5TokenizerFast (MT5 model)
- musicgen — T5Tokenizer or T5TokenizerFast (MusicGen model)
- musicgen_melody — T5Tokenizer or T5TokenizerFast (MusicGen Melody model)
- mvp — MvpTokenizer or MvpTokenizerFast (MVP model)
- nezha — BertTokenizer or BertTokenizerFast (Nezha model)
- nllb — NllbTokenizer or NllbTokenizerFast (NLLB model)
- nllb-moe — NllbTokenizer or NllbTokenizerFast (NLLB-MOE model)
- nystromformer — AlbertTokenizer or AlbertTokenizerFast (Nyströmformer model)
- olmo — GPTNeoXTokenizerFast (OLMo model)
- olmoe — GPTNeoXTokenizerFast (OLMoE model)
- omdet-turbo — CLIPTokenizer or CLIPTokenizerFast (OmDet-Turbo model)
- oneformer — CLIPTokenizer or CLIPTokenizerFast (OneFormer model)
- openai-gpt — OpenAIGPTTokenizer or OpenAIGPTTokenizerFast (OpenAI GPT model)
- opt — GPT2Tokenizer or GPT2TokenizerFast (OPT model)
- owlv2 — CLIPTokenizer or CLIPTokenizerFast (OWLv2 model)
- owlvit — CLIPTokenizer or CLIPTokenizerFast (OWL-ViT model)
- paligemma — LlamaTokenizer or LlamaTokenizerFast (PaliGemma model)
- pegasus — PegasusTokenizer or PegasusTokenizerFast (Pegasus model)
- pegasus_x — PegasusTokenizer or PegasusTokenizerFast (PEGASUS-X model)
- perceiver — PerceiverTokenizer (Perceiver model)
- persimmon — LlamaTokenizer or LlamaTokenizerFast (Persimmon model)
- phi — CodeGenTokenizer or CodeGenTokenizerFast (Phi model)
- phi3 — LlamaTokenizer or LlamaTokenizerFast (Phi3 model)
- phobert — PhobertTokenizer (PhoBERT model)
- pix2struct — T5Tokenizer or T5TokenizerFast (Pix2Struct model)
- pixtral — PreTrainedTokenizerFast (Pixtral model)
- plbart — PLBartTokenizer (PLBart model)
- prophetnet — ProphetNetTokenizer (ProphetNet model)
- qdqbert — BertTokenizer or BertTokenizerFast (QDQBert model)
- qwen2 — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2 model)
- qwen2_audio — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2Audio model)
- qwen2_moe — Qwen2Tokenizer or Qwen2TokenizerFast (Qwen2MoE model)
- rag — RagTokenizer (RAG model)
- realm — RealmTokenizer or RealmTokenizerFast (REALM model)
- recurrent_gemma — GemmaTokenizer or GemmaTokenizerFast (RecurrentGemma model)
- reformer — ReformerTokenizer or ReformerTokenizerFast (Reformer model)
- rembert — RemBertTokenizer or RemBertTokenizerFast (RemBERT model)
- retribert — RetriBertTokenizer or RetriBertTokenizerFast (RetriBERT model)
- roberta — RobertaTokenizer or RobertaTokenizerFast (RoBERTa model)
- roberta-prelayernorm — RobertaTokenizer or RobertaTokenizerFast (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertTokenizer (RoCBert model)
- roformer — RoFormerTokenizer or RoFormerTokenizerFast (RoFormer model)
- rwkv — GPTNeoXTokenizerFast (RWKV model)
- seamless_m4t — SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4TTokenizer or SeamlessM4TTokenizerFast (SeamlessM4Tv2 model)
- siglip — SiglipTokenizer (SigLIP model)
- speech_to_text — Speech2TextTokenizer (Speech2Text model)
- speech_to_text_2 — Speech2Text2Tokenizer (Speech2Text2 model)
- speecht5 — SpeechT5Tokenizer (SpeechT5 model)
- splinter — SplinterTokenizer or SplinterTokenizerFast (Splinter model)
- squeezebert — SqueezeBertTokenizer or SqueezeBertTokenizerFast (SqueezeBERT model)
- stablelm — GPTNeoXTokenizerFast (StableLm model)
- starcoder2 — GPT2Tokenizer or GPT2TokenizerFast (Starcoder2 model)
- switch_transformers — T5Tokenizer or T5TokenizerFast (SwitchTransformers model)
- t5 — T5Tokenizer or T5TokenizerFast (T5 model)
- tapas — TapasTokenizer (TAPAS model)
- tapex — TapexTokenizer (TAPEX model)
- transfo-xl — TransfoXLTokenizer (Transformer-XL model)
- tvp — BertTokenizer or BertTokenizerFast (TVP model)
- udop — UdopTokenizer or UdopTokenizerFast (UDOP model)
- umt5 — T5Tokenizer or T5TokenizerFast (UMT5 model)
- video_llava — LlamaTokenizer or LlamaTokenizerFast (VideoLlava model)
- vilt — BertTokenizer or BertTokenizerFast (ViLT model)
- vipllava — LlamaTokenizer or LlamaTokenizerFast (VipLlava model)
- visual_bert — BertTokenizer or BertTokenizerFast (VisualBERT model)
- vits — VitsTokenizer (VITS model)
- wav2vec2 — Wav2Vec2CTCTokenizer (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2CTCTokenizer (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2CTCTokenizer (Wav2Vec2-Conformer model)
- wav2vec2_phoneme — Wav2Vec2PhonemeCTCTokenizer (Wav2Vec2Phoneme model)
- whisper — WhisperTokenizer or WhisperTokenizerFast (Whisper model)
- xclip — CLIPTokenizer or CLIPTokenizerFast (X-CLIP model)
- xglm — XGLMTokenizer or XGLMTokenizerFast (XGLM model)
- xlm — XLMTokenizer (XLM model)
- xlm-prophetnet — XLMProphetNetTokenizer (XLM-ProphetNet model)
- xlm-roberta — XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa-XL model)
- xlnet — XLNetTokenizer or XLNetTokenizerFast (XLNet model)
- xmod — XLMRobertaTokenizer or XLMRobertaTokenizerFast (X-MOD model)
- yoso — AlbertTokenizer or AlbertTokenizerFast (YOSO model)
Examples:
>>> from transformers import AutoTokenizer
>>> # Download vocabulary from huggingface.co and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> # Download vocabulary from huggingface.co (user-uploaded) and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-cased")
>>> # If vocabulary files are in a directory (e.g. tokenizer was saved using *save_pretrained('./test/saved_model/')*)
>>> # tokenizer = AutoTokenizer.from_pretrained("./test/bert_saved_model/")
>>> # Download vocabulary from huggingface.co and define model-specific arguments
>>> tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base", add_prefix_space=True)
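If you need the slow, Python-based implementation instead (the use_fast parameter above defaults to True), you can opt out; a short sketch on the same public checkpoint:

>>> # Force the slow tokenizer even when a fast one is available.
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", use_fast=False)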
register(config_class, slow_tokenizer_class=None, fast_tokenizer_class=None, exist_ok=False)
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- slow_tokenizer_class (PreTrainedTokenizer, optional) — The slow tokenizer to register.
- fast_tokenizer_class (PreTrainedTokenizerFast, optional) — The fast tokenizer to register.
Register a new tokenizer in this mapping.
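Registration follows the same pattern as AutoConfig.register and AutoModel.register shown earlier; a minimal sketch, where NewModelConfig is the custom config from that section and the two tokenizer classes are hypothetical placeholders you would define yourself:

from transformers import AutoTokenizer

# Either tokenizer class may be omitted if you only provide one implementation.
AutoTokenizer.register(
    NewModelConfig,
    slow_tokenizer_class=NewModelTokenizer,
    fast_tokenizer_class=NewModelTokenizerFast,
)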
AutoFeatureExtractor
This is a generic feature extractor class that will be instantiated as one of the feature extractor classes of the library when created with the AutoFeatureExtractor.from_pretrained() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_pretrained(pretrained_model_name_or_path, **kwargs)
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co.
  - a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or url to a saved feature extractor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the feature extractor files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final feature extractor object. If True, then this function returns a Tuple(feature_extractor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the part of kwargs which has not been used to update feature_extractor and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (Dict[str, Any], optional) — The values in kwargs of any keys which are feature extractor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not feature extractor attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the feature extractor classes of the library from a pretrained model vocabulary.
The feature extractor class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it's missing, by falling back to using pattern matching on pretrained_model_name_or_path:
- audio-spectrogram-transformer — ASTFeatureExtractor (Audio Spectrogram Transformer model)
- beit — BeitFeatureExtractor (BEiT model)
- chinese_clip — ChineseCLIPFeatureExtractor (Chinese-CLIP model)
- clap — ClapFeatureExtractor (CLAP model)
- clip — CLIPFeatureExtractor (CLIP model)
- clipseg — ViTFeatureExtractor (CLIPSeg model)
- clvp — ClvpFeatureExtractor (CLVP model)
- conditional_detr — ConditionalDetrFeatureExtractor (Conditional DETR model)
- convnext — ConvNextFeatureExtractor (ConvNeXT model)
- cvt — ConvNextFeatureExtractor (CvT model)
- dac — DacFeatureExtractor (DAC model)
- data2vec-audio — Wav2Vec2FeatureExtractor (Data2VecAudio model)
- data2vec-vision — BeitFeatureExtractor (Data2VecVision model)
- deformable_detr — DeformableDetrFeatureExtractor (Deformable DETR model)
- deit — DeiTFeatureExtractor (DeiT model)
- detr — DetrFeatureExtractor (DETR model)
- dinat — ViTFeatureExtractor (DiNAT model)
- donut-swin — DonutFeatureExtractor (DonutSwin model)
- dpt — DPTFeatureExtractor (DPT model)
- encodec — EncodecFeatureExtractor (EnCodec model)
- flava — FlavaFeatureExtractor (FLAVA model)
- glpn — GLPNFeatureExtractor (GLPN model)
- groupvit — CLIPFeatureExtractor (GroupViT model)
- hubert — Wav2Vec2FeatureExtractor (Hubert model)
- imagegpt — ImageGPTFeatureExtractor (ImageGPT model)
- layoutlmv2 — LayoutLMv2FeatureExtractor (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3FeatureExtractor (LayoutLMv3 model)
- levit — LevitFeatureExtractor (LeViT model)
- maskformer — MaskFormerFeatureExtractor (MaskFormer model)
- mctct — MCTCTFeatureExtractor (M-CTC-T model)
- mimi — EncodecFeatureExtractor (Mimi model)
- mobilenet_v1 — MobileNetV1FeatureExtractor (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2FeatureExtractor (MobileNetV2 model)
- mobilevit — MobileViTFeatureExtractor (MobileViT model)
- nat — ViTFeatureExtractor (NAT model)
- owlvit — OwlViTFeatureExtractor (OWL-ViT model)
- perceiver — PerceiverFeatureExtractor (Perceiver model)
- poolformer — PoolFormerFeatureExtractor (PoolFormer model)
- pop2piano — Pop2PianoFeatureExtractor (Pop2Piano model)
- regnet — ConvNextFeatureExtractor (RegNet model)
- resnet — ConvNextFeatureExtractor (ResNet model)
- seamless_m4t — SeamlessM4TFeatureExtractor (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4TFeatureExtractor (SeamlessM4Tv2 model)
- segformer — SegformerFeatureExtractor (SegFormer model)
- sew — Wav2Vec2FeatureExtractor (SEW model)
- sew-d — Wav2Vec2FeatureExtractor (SEW-D model)
- speech_to_text — Speech2TextFeatureExtractor (Speech2Text model)
- speecht5 — SpeechT5FeatureExtractor (SpeechT5 model)
- swiftformer — ViTFeatureExtractor (SwiftFormer model)
- swin — ViTFeatureExtractor (Swin Transformer model)
- swinv2 — ViTFeatureExtractor (Swin Transformer V2 model)
- table-transformer — DetrFeatureExtractor (Table Transformer model)
- timesformer — VideoMAEFeatureExtractor (TimeSformer model)
- tvlt — TvltFeatureExtractor (TVLT model)
- unispeech — Wav2Vec2FeatureExtractor (UniSpeech model)
- unispeech-sat — Wav2Vec2FeatureExtractor (UniSpeechSat model)
- univnet — UnivNetFeatureExtractor (UnivNet model)
- van — ConvNextFeatureExtractor (VAN model)
- videomae — VideoMAEFeatureExtractor (VideoMAE model)
- vilt — ViltFeatureExtractor (ViLT model)
- vit — ViTFeatureExtractor (ViT model)
- vit_mae — ViTFeatureExtractor (ViTMAE model)
- vit_msn — ViTFeatureExtractor (ViTMSN model)
- wav2vec2 — Wav2Vec2FeatureExtractor (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2FeatureExtractor (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2FeatureExtractor (Wav2Vec2-Conformer model)
- wavlm — Wav2Vec2FeatureExtractor (WavLM model)
- whisper — WhisperFeatureExtractor (Whisper model)
- xclip — CLIPFeatureExtractor (X-CLIP model)
- yolos — YolosFeatureExtractor (YOLOS model)
Passing token=True
is required when you want to use a private model.
Examples:
>>> from transformers import AutoFeatureExtractor
>>> # Download feature extractor from huggingface.co and cache.
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
>>> # If feature extractor files are in a directory (e.g. feature extractor was saved using *save_pretrained('./test/saved_model/')*)
>>> # feature_extractor = AutoFeatureExtractor.from_pretrained("./test/saved_model/")
register(config_class, feature_extractor_class, exist_ok=False)
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- feature_extractor_class (FeatureExtractorMixin) — The feature extractor to register.
Register a new feature extractor for this class.
AutoImageProcessor
This is a generic image processor class that will be instantiated as one of the image processor classes of the library when created with the AutoImageProcessor.from_pretrained() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained image_processor hosted inside a model repo on huggingface.co.
  - a path to a directory containing an image processor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
  - a path or url to a saved image processor JSON file, e.g., ./my_model_directory/preprocessor_config.json.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the image processor files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- use_fast (bool, optional, defaults to False) — Use a fast torchvision-based image processor if it is supported for a given model. If a fast image processor is not available for a given model, a normal numpy-based image processor is returned instead.
- return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final image processor object. If True, then this function returns a Tuple(image_processor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not image processor attributes: i.e., the part of kwargs which has not been used to update image_processor and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (Dict[str, Any], optional) — The values in kwargs of any keys which are image processor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not image processor attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the image processor classes of the library from a pretrained model vocabulary.
The image processor class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it's missing, by falling back to using pattern matching on pretrained_model_name_or_path:
- align — EfficientNetImageProcessor (ALIGN model)
- beit — BeitImageProcessor (BEiT model)
- bit — BitImageProcessor (BiT model)
- blip — BlipImageProcessor (BLIP model)
- blip-2 — BlipImageProcessor (BLIP-2 model)
- bridgetower — BridgeTowerImageProcessor (BridgeTower model)
- chameleon — ChameleonImageProcessor (Chameleon model)
- chinese_clip — ChineseCLIPImageProcessor (Chinese-CLIP model)
- clip — CLIPImageProcessor (CLIP model)
- clipseg — ViTImageProcessor or ViTImageProcessorFast (CLIPSeg model)
- conditional_detr — ConditionalDetrImageProcessor (Conditional DETR model)
- convnext — ConvNextImageProcessor (ConvNeXT model)
- convnextv2 — ConvNextImageProcessor (ConvNeXTV2 model)
- cvt — ConvNextImageProcessor (CvT model)
- data2vec-vision — BeitImageProcessor (Data2VecVision model)
- deformable_detr — DeformableDetrImageProcessor (Deformable DETR model)
- deit — DeiTImageProcessor (DeiT model)
- depth_anything — DPTImageProcessor (Depth Anything model)
- deta — DetaImageProcessor (DETA model)
- detr — DetrImageProcessor (DETR model)
- dinat — ViTImageProcessor or ViTImageProcessorFast (DiNAT model)
- dinov2 — BitImageProcessor (DINOv2 model)
- donut-swin — DonutImageProcessor (DonutSwin model)
- dpt — DPTImageProcessor (DPT model)
- efficientformer — EfficientFormerImageProcessor (EfficientFormer model)
- efficientnet — EfficientNetImageProcessor (EfficientNet model)
- flava — FlavaImageProcessor (FLAVA model)
- focalnet — BitImageProcessor (FocalNet model)
- fuyu — FuyuImageProcessor (Fuyu model)
- git — CLIPImageProcessor (GIT model)
- glpn — GLPNImageProcessor (GLPN model)
- grounding-dino — GroundingDinoImageProcessor (Grounding DINO model)
- groupvit — CLIPImageProcessor (GroupViT model)
- hiera — BitImageProcessor (Hiera model)
- idefics — IdeficsImageProcessor (IDEFICS model)
- idefics2 — Idefics2ImageProcessor (Idefics2 model)
- imagegpt — ImageGPTImageProcessor (ImageGPT model)
- instructblip — BlipImageProcessor (InstructBLIP model)
- instructblipvideo — InstructBlipVideoImageProcessor (InstructBlipVideo model)
- kosmos-2 — CLIPImageProcessor (KOSMOS-2 model)
- layoutlmv2 — LayoutLMv2ImageProcessor (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3ImageProcessor (LayoutLMv3 model)
- levit — LevitImageProcessor (LeViT model)
- llava — CLIPImageProcessor (LLaVa model)
- llava_next — LlavaNextImageProcessor (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoImageProcessor (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionImageProcessor (LLaVA-Onevision model)
- mask2former — Mask2FormerImageProcessor (Mask2Former model)
- maskformer — MaskFormerImageProcessor (MaskFormer model)
- mgp-str — ViTImageProcessor or ViTImageProcessorFast (MGP-STR model)
- mllama — MllamaImageProcessor (Mllama model)
- mobilenet_v1 — MobileNetV1ImageProcessor (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2ImageProcessor (MobileNetV2 model)
- mobilevit — MobileViTImageProcessor (MobileViT model)
- mobilevitv2 — MobileViTImageProcessor (MobileViTV2 model)
- nat — ViTImageProcessor or ViTImageProcessorFast (NAT model)
- nougat — NougatImageProcessor (Nougat model)
- oneformer — OneFormerImageProcessor (OneFormer model)
- owlv2 — Owlv2ImageProcessor (OWLv2 model)
- owlvit — OwlViTImageProcessor (OWL-ViT model)
- perceiver — PerceiverImageProcessor (Perceiver model)
- pix2struct — Pix2StructImageProcessor (Pix2Struct model)
- pixtral — PixtralImageProcessor (Pixtral model)
- poolformer — PoolFormerImageProcessor (PoolFormer model)
- pvt — PvtImageProcessor (PVT model)
- pvt_v2 — PvtImageProcessor (PVTv2 model)
- qwen2_vl — Qwen2VLImageProcessor (Qwen2VL model)
- regnet — ConvNextImageProcessor (RegNet model)
- resnet — ConvNextImageProcessor (ResNet model)
- rt_detr — RTDetrImageProcessor (RT-DETR model)
- sam — SamImageProcessor (SAM model)
- segformer — SegformerImageProcessor (SegFormer model)
- seggpt — SegGptImageProcessor (SegGPT model)
- siglip — SiglipImageProcessor (SigLIP model)
- swiftformer — ViTImageProcessor or ViTImageProcessorFast (SwiftFormer model)
- swin — ViTImageProcessor or ViTImageProcessorFast (Swin Transformer model)
- swin2sr — Swin2SRImageProcessor (Swin2SR model)
- swinv2 — ViTImageProcessor or ViTImageProcessorFast (Swin Transformer V2 model)
- table-transformer — DetrImageProcessor (Table Transformer model)
- timesformer — VideoMAEImageProcessor (TimeSformer model)
- tvlt — TvltImageProcessor (TVLT model)
- tvp — TvpImageProcessor (TVP model)
- udop — LayoutLMv3ImageProcessor (UDOP model)
- upernet — SegformerImageProcessor (UPerNet model)
- van — ConvNextImageProcessor (VAN model)
- videomae — VideoMAEImageProcessor (VideoMAE model)
- vilt — ViltImageProcessor (ViLT model)
- vipllava — CLIPImageProcessor (VipLlava model)
- vit — ViTImageProcessor or ViTImageProcessorFast (ViT model)
- vit_hybrid — ViTHybridImageProcessor (ViT Hybrid model)
- vit_mae — ViTImageProcessor or ViTImageProcessorFast (ViTMAE model)
- vit_msn — ViTImageProcessor or ViTImageProcessorFast (ViTMSN model)
- vitmatte — VitMatteImageProcessor (ViTMatte model)
- xclip — CLIPImageProcessor (X-CLIP model)
- yolos — YolosImageProcessor (YOLOS model)
- zoedepth — ZoeDepthImageProcessor (ZoeDepth model)
Passing token=True
is required when you want to use a private model.
Examples:
>>> from transformers import AutoImageProcessor
>>> # Download image processor from huggingface.co and cache.
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
>>> # If image processor files are in a directory (e.g. image processor was saved using *save_pretrained('./test/saved_model/')*)
>>> # image_processor = AutoImageProcessor.from_pretrained("./test/saved_model/")
register(config_class, image_processor_class=None, slow_image_processor_class=None, fast_image_processor_class=None, exist_ok=False)
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- image_processor_class (ImageProcessingMixin) — The image processor to register.
Register a new image processor for this class.
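Registration mirrors the tokenizer case, with separate slots for the slow (numpy-based) and fast (torchvision-based) implementations; a minimal sketch with hypothetical NewModelImageProcessor classes and the NewModelConfig from the section on extending the auto classes:

from transformers import AutoImageProcessor

# Either implementation may be omitted if you only provide one.
AutoImageProcessor.register(
    NewModelConfig,
    slow_image_processor_class=NewModelImageProcessor,
    fast_image_processor_class=NewModelImageProcessorFast,
)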
AutoProcessor
This is a generic processor class that will be instantiated as one of the processor classes of the library when created with the AutoProcessor.from_pretrained() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_pretrained(pretrained_model_name_or_path, **kwargs)
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — This can be either:
  - a string, the model id of a pretrained processor hosted inside a model repo on huggingface.co.
  - a path to a directory containing processor files saved using the save_pretrained() method, e.g., ./my_model_directory/.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained processor should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the processor files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- return_unused_kwargs (bool, optional, defaults to False) — If False, then this function returns just the final processor object. If True, then this function returns a Tuple(processor, unused_kwargs) where unused_kwargs is a dictionary consisting of the key/value pairs whose keys are not processor attributes: i.e., the part of kwargs which has not been used to update processor and is otherwise ignored.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- kwargs (Dict[str, Any], optional) — The values in kwargs of any keys which are processor attributes will be used to override the loaded values. Behavior concerning key/value pairs whose keys are not processor attributes is controlled by the return_unused_kwargs keyword parameter.
Instantiate one of the processor classes of the library from a pretrained model vocabulary.
The processor class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible):
- align — AlignProcessor (ALIGN model)
- altclip — AltCLIPProcessor (AltCLIP model)
- bark — BarkProcessor (Bark model)
- blip — BlipProcessor (BLIP model)
- blip-2 — Blip2Processor (BLIP-2 model)
- bridgetower — BridgeTowerProcessor (BridgeTower model)
- chameleon — ChameleonProcessor (Chameleon model)
- chinese_clip — ChineseCLIPProcessor (Chinese-CLIP model)
- clap — ClapProcessor (CLAP model)
- clip — CLIPProcessor (CLIP model)
- clipseg — CLIPSegProcessor (CLIPSeg model)
- clvp — ClvpProcessor (CLVP model)
- flava — FlavaProcessor (FLAVA model)
- fuyu — FuyuProcessor (Fuyu model)
- git — GitProcessor (GIT model)
- grounding-dino — GroundingDinoProcessor (Grounding DINO model)
- groupvit — CLIPProcessor (GroupViT model)
- hubert — Wav2Vec2Processor (Hubert model)
- idefics — IdeficsProcessor (IDEFICS model)
- idefics2 — Idefics2Processor (Idefics2 model)
- instructblip — InstructBlipProcessor (InstructBLIP model)
- instructblipvideo — InstructBlipVideoProcessor (InstructBlipVideo model)
- kosmos-2 — Kosmos2Processor (KOSMOS-2 model)
- layoutlmv2 — LayoutLMv2Processor (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Processor (LayoutLMv3 model)
- llava — LlavaProcessor (LLaVa model)
- llava_next — LlavaNextProcessor (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoProcessor (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionProcessor (LLaVA-Onevision model)
- markuplm — MarkupLMProcessor (MarkupLM model)
- mctct — MCTCTProcessor (M-CTC-T model)
- mgp-str — MgpstrProcessor (MGP-STR model)
- mllama — MllamaProcessor (Mllama model)
- oneformer — OneFormerProcessor (OneFormer model)
- owlv2 — Owlv2Processor (OWLv2 model)
- owlvit — OwlViTProcessor (OWL-ViT model)
- paligemma — PaliGemmaProcessor (PaliGemma model)
- pix2struct — Pix2StructProcessor (Pix2Struct model)
- pixtral — PixtralProcessor (Pixtral model)
- pop2piano — Pop2PianoProcessor (Pop2Piano model)
- qwen2_audio — Qwen2AudioProcessor (Qwen2Audio model)
- qwen2_vl — Qwen2VLProcessor (Qwen2VL model)
- sam — SamProcessor (SAM model)
- seamless_m4t — SeamlessM4TProcessor (SeamlessM4T model)
- sew — Wav2Vec2Processor (SEW model)
- sew-d — Wav2Vec2Processor (SEW-D model)
- siglip — SiglipProcessor (SigLIP model)
- speech_to_text — Speech2TextProcessor (Speech2Text model)
- speech_to_text_2 — Speech2Text2Processor (Speech2Text2 model)
- speecht5 — SpeechT5Processor (SpeechT5 model)
- trocr — TrOCRProcessor (TrOCR model)
- tvlt — TvltProcessor (TVLT model)
- tvp — TvpProcessor (TVP model)
- unispeech — Wav2Vec2Processor (UniSpeech model)
- unispeech-sat — Wav2Vec2Processor (UniSpeechSat model)
- video_llava — VideoLlavaProcessor (VideoLlava model)
- vilt — ViltProcessor (ViLT model)
- vipllava — LlavaProcessor (VipLlava model)
- vision-text-dual-encoder — VisionTextDualEncoderProcessor (VisionTextDualEncoder model)
- wav2vec2 — Wav2Vec2Processor (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2Processor (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2Processor (Wav2Vec2-Conformer model)
- wavlm — Wav2Vec2Processor (WavLM model)
- whisper — WhisperProcessor (Whisper model)
- xclip — XCLIPProcessor (X-CLIP model)
Passing token=True
is required when you want to use a private model.
Examples:
>>> from transformers import AutoProcessor
>>> # Download processor from huggingface.co and cache.
>>> processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
>>> # If processor files are in a directory (e.g. processor was saved using *save_pretrained('./test/saved_model/')*)
>>> # processor = AutoProcessor.from_pretrained("./test/saved_model/")
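If the checkpoint lives in a private repo, additionally pass the token parameter described above (a sketch; the repo id here is a placeholder):
>>> # processor = AutoProcessor.from_pretrained("your-org/private-model", token=True)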
register
< source >( config_class processor_class exist_ok = False )
Parameters
- config_class (PretrainedConfig) — The configuration corresponding to the model to register.
- processor_class (FeatureExtractorMixin) — The processor to register.
Register a new processor for this class.
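For example, a minimal sketch of registering a hypothetical custom processor (NewProcessorConfig and NewProcessor are placeholder names for your own classes, not classes shipped with the library):
>>> from transformers import AutoConfig, AutoProcessor
>>> # Register the custom configuration under its model_type key first,
>>> # then map that configuration class to the custom processor.
>>> AutoConfig.register("new-model", NewProcessorConfig)
>>> AutoProcessor.register(NewProcessorConfig, NewProcessor)
>>> # AutoProcessor.from_pretrained() can now resolve checkpoints whose config has model_type "new-model".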
Generic model classes
The following auto classes are available for instantiating a base model class without a specific head.
AutoModel
This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- ASTConfig configuration class: ASTModel (Audio Spectrogram Transformer model)
- AlbertConfig configuration class: AlbertModel (ALBERT model)
- AlignConfig configuration class: AlignModel (ALIGN model)
- AltCLIPConfig configuration class: AltCLIPModel (AltCLIP model)
- AutoformerConfig configuration class: AutoformerModel (Autoformer model)
- BarkConfig configuration class: BarkModel (Bark model)
- BartConfig configuration class: BartModel (BART model)
- BeitConfig configuration class: BeitModel (BEiT model)
- BertConfig configuration class: BertModel (BERT model)
- BertGenerationConfig configuration class: BertGenerationEncoder (Bert Generation model)
- BigBirdConfig configuration class: BigBirdModel (BigBird model)
- BigBirdPegasusConfig configuration class: BigBirdPegasusModel (BigBird-Pegasus model)
- BioGptConfig configuration class: BioGptModel (BioGpt model)
- BitConfig configuration class: BitModel (BiT model)
- BlenderbotConfig configuration class: BlenderbotModel (Blenderbot model)
- BlenderbotSmallConfig configuration class: BlenderbotSmallModel (BlenderbotSmall model)
- Blip2Config configuration class: Blip2Model (BLIP-2 model)
- BlipConfig configuration class: BlipModel (BLIP model)
- BloomConfig configuration class: BloomModel (BLOOM model)
- BridgeTowerConfig configuration class: BridgeTowerModel (BridgeTower model)
- BrosConfig configuration class: BrosModel (BROS model)
- CLIPConfig configuration class: CLIPModel (CLIP model)
- CLIPSegConfig configuration class: CLIPSegModel (CLIPSeg model)
- CLIPTextConfig configuration class: CLIPTextModel (CLIPTextModel model)
- CLIPVisionConfig configuration class: CLIPVisionModel (CLIPVisionModel model)
- CTRLConfig configuration class: CTRLModel (CTRL model)
- CamembertConfig configuration class: CamembertModel (CamemBERT model)
- CanineConfig configuration class: CanineModel (CANINE model)
- ChameleonConfig configuration class: ChameleonModel (Chameleon model)
- ChineseCLIPConfig configuration class: ChineseCLIPModel (Chinese-CLIP model)
- ChineseCLIPVisionConfig configuration class: ChineseCLIPVisionModel (ChineseCLIPVisionModel model)
- ClapConfig configuration class: ClapModel (CLAP model)
- ClvpConfig configuration class: ClvpModelForConditionalGeneration (CLVP model)
- CodeGenConfig configuration class: CodeGenModel (CodeGen model)
- CohereConfig configuration class: CohereModel (Cohere model)
- ConditionalDetrConfig configuration class: ConditionalDetrModel (Conditional DETR model)
- ConvBertConfig configuration class: ConvBertModel (ConvBERT model)
- ConvNextConfig configuration class: ConvNextModel (ConvNeXT model)
- ConvNextV2Config configuration class: ConvNextV2Model (ConvNeXTV2 model)
- CpmAntConfig configuration class: CpmAntModel (CPM-Ant model)
- CvtConfig configuration class: CvtModel (CvT model)
- DPRConfig configuration class: DPRQuestionEncoder (DPR model)
- DPTConfig configuration class: DPTModel (DPT model)
- DacConfig configuration class: DacModel (DAC model)
- Data2VecAudioConfig configuration class: Data2VecAudioModel (Data2VecAudio model)
- Data2VecTextConfig configuration class: Data2VecTextModel (Data2VecText model)
- Data2VecVisionConfig configuration class: Data2VecVisionModel (Data2VecVision model)
- DbrxConfig configuration class: DbrxModel (DBRX model)
- DebertaConfig configuration class: DebertaModel (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2Model (DeBERTa-v2 model)
- DecisionTransformerConfig configuration class: DecisionTransformerModel (Decision Transformer model)
- DeformableDetrConfig configuration class: DeformableDetrModel (Deformable DETR model)
- DeiTConfig configuration class: DeiTModel (DeiT model)
- DetaConfig configuration class: DetaModel (DETA model)
- DetrConfig configuration class: DetrModel (DETR model)
- DinatConfig configuration class: DinatModel (DiNAT model)
- Dinov2Config configuration class: Dinov2Model (DINOv2 model)
- DistilBertConfig configuration class: DistilBertModel (DistilBERT model)
- DonutSwinConfig configuration class: DonutSwinModel (DonutSwin model)
- EfficientFormerConfig configuration class: EfficientFormerModel (EfficientFormer model)
- EfficientNetConfig configuration class: EfficientNetModel (EfficientNet model)
- ElectraConfig configuration class: ElectraModel (ELECTRA model)
- EncodecConfig configuration class: EncodecModel (EnCodec model)
- ErnieConfig configuration class: ErnieModel (ERNIE model)
- ErnieMConfig configuration class: ErnieMModel (ErnieM model)
- EsmConfig configuration class: EsmModel (ESM model)
- FNetConfig configuration class: FNetModel (FNet model)
- FSMTConfig configuration class: FSMTModel (FairSeq Machine-Translation model)
- FalconConfig configuration class: FalconModel (Falcon model)
- FalconMambaConfig configuration class: FalconMambaModel (FalconMamba model)
- FastSpeech2ConformerConfig configuration class: FastSpeech2ConformerModel (FastSpeech2Conformer model)
- FlaubertConfig configuration class: FlaubertModel (FlauBERT model)
- FlavaConfig configuration class: FlavaModel (FLAVA model)
- FocalNetConfig configuration class: FocalNetModel (FocalNet model)
- FunnelConfig configuration class: FunnelModel or FunnelBaseModel (Funnel Transformer model)
- GLPNConfig configuration class: GLPNModel (GLPN model)
- GPT2Config configuration class: GPT2Model (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeModel (GPTBigCode model)
- GPTJConfig configuration class: GPTJModel (GPT-J model)
- GPTNeoConfig configuration class: GPTNeoModel (GPT Neo model)
- GPTNeoXConfig configuration class: GPTNeoXModel (GPT NeoX model)
- GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseModel (GPT NeoX Japanese model)
- GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- Gemma2Config configuration class: Gemma2Model (Gemma2 model)
- GemmaConfig configuration class: GemmaModel (Gemma model)
- GitConfig configuration class: GitModel (GIT model)
- GraniteConfig configuration class: GraniteModel (Granite model)
- GraniteMoeConfig configuration class: GraniteMoeModel (GraniteMoe model)
- GraphormerConfig configuration class: GraphormerModel (Graphormer model)
- GroundingDinoConfig configuration class: GroundingDinoModel (Grounding DINO model)
- GroupViTConfig configuration class: GroupViTModel (GroupViT model)
- HieraConfig configuration class: HieraModel (Hiera model)
- HubertConfig configuration class: HubertModel (Hubert model)
- IBertConfig configuration class: IBertModel (I-BERT model)
- Idefics2Config configuration class: Idefics2Model (Idefics2 model)
- IdeficsConfig configuration class: IdeficsModel (IDEFICS model)
- ImageGPTConfig configuration class: ImageGPTModel (ImageGPT model)
- InformerConfig configuration class: InformerModel (Informer model)
- JambaConfig configuration class: JambaModel (Jamba model)
- JetMoeConfig configuration class: JetMoeModel (JetMoe model)
- JukeboxConfig configuration class: JukeboxModel (Jukebox model)
- Kosmos2Config configuration class: Kosmos2Model (KOSMOS-2 model)
- LEDConfig configuration class: LEDModel (LED model)
- LayoutLMConfig configuration class: LayoutLMModel (LayoutLM model)
- LayoutLMv2Config configuration class: LayoutLMv2Model (LayoutLMv2 model)
- LayoutLMv3Config configuration class: LayoutLMv3Model (LayoutLMv3 model)
- LevitConfig configuration class: LevitModel (LeViT model)
- LiltConfig configuration class: LiltModel (LiLT model)
- LlamaConfig configuration class: LlamaModel (LLaMA model)
- LongT5Config configuration class: LongT5Model (LongT5 model)
- LongformerConfig configuration class: LongformerModel (Longformer model)
- LukeConfig configuration class: LukeModel (LUKE model)
- LxmertConfig configuration class: LxmertModel (LXMERT model)
- M2M100Config configuration class: M2M100Model (M2M100 model)
- MBartConfig configuration class: MBartModel (mBART model)
- MCTCTConfig configuration class: MCTCTModel (M-CTC-T model)
- MPNetConfig configuration class: MPNetModel (MPNet model)
- MT5Config configuration class: MT5Model (MT5 model)
- Mamba2Config configuration class: Mamba2Model (mamba2 model)
- MambaConfig configuration class: MambaModel (Mamba model)
- MarianConfig configuration class: MarianModel (Marian model)
- MarkupLMConfig configuration class: MarkupLMModel (MarkupLM model)
- Mask2FormerConfig configuration class: Mask2FormerModel (Mask2Former model)
- MaskFormerConfig configuration class: MaskFormerModel (MaskFormer model)
- MaskFormerSwinConfig configuration class: MaskFormerSwinModel (MaskFormerSwin model)
- MegaConfig configuration class: MegaModel (MEGA model)
- MegatronBertConfig configuration class: MegatronBertModel (Megatron-BERT model)
- MgpstrConfig configuration class: MgpstrForSceneTextRecognition (MGP-STR model)
- MimiConfig configuration class: MimiModel (Mimi model)
- MistralConfig configuration class: MistralModel (Mistral model)
- MixtralConfig configuration class: MixtralModel (Mixtral model)
- MobileBertConfig configuration class: MobileBertModel (MobileBERT model)
- MobileNetV1Config configuration class: MobileNetV1Model (MobileNetV1 model)
- MobileNetV2Config configuration class: MobileNetV2Model (MobileNetV2 model)
- MobileViTConfig configuration class: MobileViTModel (MobileViT model)
- MobileViTV2Config configuration class: MobileViTV2Model (MobileViTV2 model)
- MptConfig configuration class: MptModel (MPT model)
- MraConfig configuration class: MraModel (MRA model)
- MusicgenConfig configuration class: MusicgenModel (MusicGen model)
- MusicgenMelodyConfig configuration class: MusicgenMelodyModel (MusicGen Melody model)
- MvpConfig configuration class: MvpModel (MVP model)
- NatConfig configuration class: NatModel (NAT model)
- NemotronConfig configuration class: NemotronModel (Nemotron model)
- NezhaConfig configuration class: NezhaModel (Nezha model)
- NllbMoeConfig configuration class: NllbMoeModel (NLLB-MOE model)
- NystromformerConfig configuration class: NystromformerModel (Nyströmformer model)
- OPTConfig configuration class: OPTModel (OPT model)
- OlmoConfig configuration class: OlmoModel (OLMo model)
- OlmoeConfig configuration class: OlmoeModel (OLMoE model)
- OmDetTurboConfig configuration class: OmDetTurboForObjectDetection (OmDet-Turbo model)
- OneFormerConfig configuration class: OneFormerModel (OneFormer model)
- OpenAIGPTConfig configuration class: OpenAIGPTModel (OpenAI GPT model)
- OpenLlamaConfig configuration class: OpenLlamaModel (OpenLlama model)
- OwlViTConfig configuration class: OwlViTModel (OWL-ViT model)
- Owlv2Config configuration class: Owlv2Model (OWLv2 model)
- PLBartConfig configuration class: PLBartModel (PLBart model)
- PatchTSMixerConfig configuration class: PatchTSMixerModel (PatchTSMixer model)
- PatchTSTConfig configuration class: PatchTSTModel (PatchTST model)
- PegasusConfig configuration class: PegasusModel (Pegasus model)
- PegasusXConfig configuration class: PegasusXModel (PEGASUS-X model)
- PerceiverConfig configuration class: PerceiverModel (Perceiver model)
- PersimmonConfig configuration class: PersimmonModel (Persimmon model)
- Phi3Config configuration class: Phi3Model (Phi3 model)
- PhiConfig configuration class: PhiModel (Phi model)
- PixtralVisionConfig configuration class: PixtralVisionModel (Pixtral model)
- PoolFormerConfig configuration class: PoolFormerModel (PoolFormer model)
- ProphetNetConfig configuration class: ProphetNetModel (ProphetNet model)
- PvtConfig configuration class: PvtModel (PVT model)
- PvtV2Config configuration class: PvtV2Model (PVTv2 model)
- QDQBertConfig configuration class: QDQBertModel (QDQBert model)
- Qwen2AudioEncoderConfig configuration class: Qwen2AudioEncoder (Qwen2AudioEncoder model)
- Qwen2Config configuration class: Qwen2Model (Qwen2 model)
- Qwen2MoeConfig configuration class: Qwen2MoeModel (Qwen2MoE model)
- Qwen2VLConfig configuration class: Qwen2VLModel (Qwen2VL model)
- RTDetrConfig configuration class: RTDetrModel (RT-DETR model)
- RecurrentGemmaConfig configuration class: RecurrentGemmaModel (RecurrentGemma model)
- ReformerConfig configuration class: ReformerModel (Reformer model)
- RegNetConfig configuration class: RegNetModel (RegNet model)
- RemBertConfig configuration class: RemBertModel (RemBERT model)
- ResNetConfig configuration class: ResNetModel (ResNet model)
- RetriBertConfig configuration class: RetriBertModel (RetriBERT model)
- RoCBertConfig configuration class: RoCBertModel (RoCBert model)
- RoFormerConfig configuration class: RoFormerModel (RoFormer model)
- RobertaConfig configuration class: RobertaModel (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- RwkvConfig configuration class: RwkvModel (RWKV model)
- SEWConfig configuration class: SEWModel (SEW model)
- SEWDConfig configuration class: SEWDModel (SEW-D model)
- SamConfig configuration class: SamModel (SAM model)
- SeamlessM4TConfig configuration class: SeamlessM4TModel (SeamlessM4T model)
- SeamlessM4Tv2Config configuration class: SeamlessM4Tv2Model (SeamlessM4Tv2 model)
- SegGptConfig configuration class: SegGptModel (SegGPT model)
- SegformerConfig configuration class: SegformerModel (SegFormer model)
- SiglipConfig configuration class: SiglipModel (SigLIP model)
- SiglipVisionConfig configuration class: SiglipVisionModel (SiglipVisionModel model)
- Speech2TextConfig configuration class: Speech2TextModel (Speech2Text model)
- SpeechT5Config configuration class: SpeechT5Model (SpeechT5 model)
- SplinterConfig configuration class: SplinterModel (Splinter model)
- SqueezeBertConfig configuration class: SqueezeBertModel (SqueezeBERT model)
- StableLmConfig configuration class: StableLmModel (StableLm model)
- Starcoder2Config configuration class: Starcoder2Model (Starcoder2 model)
- SwiftFormerConfig configuration class: SwiftFormerModel (SwiftFormer model)
- Swin2SRConfig configuration class: Swin2SRModel (Swin2SR model)
- SwinConfig configuration class: SwinModel (Swin Transformer model)
- Swinv2Config configuration class: Swinv2Model (Swin Transformer V2 model)
- SwitchTransformersConfig configuration class: SwitchTransformersModel (SwitchTransformers model)
- T5Config configuration class: T5Model (T5 model)
- TableTransformerConfig configuration class: TableTransformerModel (Table Transformer model)
- TapasConfig configuration class: TapasModel (TAPAS model)
- TimeSeriesTransformerConfig configuration class: TimeSeriesTransformerModel (Time Series Transformer model)
- TimesformerConfig configuration class: TimesformerModel (TimeSformer model)
- TimmBackboneConfig configuration class: TimmBackbone (TimmBackbone model)
- TrajectoryTransformerConfig configuration class: TrajectoryTransformerModel (Trajectory Transformer model)
- TransfoXLConfig configuration class: TransfoXLModel (Transformer-XL model)
- TvltConfig configuration class: TvltModel (TVLT model)
- TvpConfig configuration class: TvpModel (TVP model)
- UMT5Config configuration class: UMT5Model (UMT5 model)
- UdopConfig configuration class: UdopModel (UDOP model)
- UniSpeechConfig configuration class: UniSpeechModel (UniSpeech model)
- UniSpeechSatConfig configuration class: UniSpeechSatModel (UniSpeechSat model)
- UnivNetConfig configuration class: UnivNetModel (UnivNet model)
- VanConfig configuration class: VanModel (VAN model)
- ViTConfig configuration class: ViTModel (ViT model)
- ViTHybridConfig configuration class: ViTHybridModel (ViT Hybrid model)
- ViTMAEConfig configuration class: ViTMAEModel (ViTMAE model)
- ViTMSNConfig configuration class: ViTMSNModel (ViTMSN model)
- VideoMAEConfig configuration class: VideoMAEModel (VideoMAE model)
- ViltConfig configuration class: ViltModel (ViLT model)
- VisionTextDualEncoderConfig configuration class: VisionTextDualEncoderModel (VisionTextDualEncoder model)
- VisualBertConfig configuration class: VisualBertModel (VisualBERT model)
- VitDetConfig configuration class: VitDetModel (VitDet model)
- VitsConfig configuration class: VitsModel (VITS model)
- VivitConfig configuration class: VivitModel (ViViT model)
- Wav2Vec2BertConfig configuration class: Wav2Vec2BertModel (Wav2Vec2-BERT model)
- Wav2Vec2Config configuration class: Wav2Vec2Model (Wav2Vec2 model)
- Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerModel (Wav2Vec2-Conformer model)
- WavLMConfig configuration class: WavLMModel (WavLM model)
- WhisperConfig configuration class: WhisperModel (Whisper model)
- XCLIPConfig configuration class: XCLIPModel (X-CLIP model)
- XGLMConfig configuration class: XGLMModel (XGLM model)
- XLMConfig configuration class: XLMModel (XLM model)
- XLMProphetNetConfig configuration class: XLMProphetNetModel (XLM-ProphetNet model)
- XLMRobertaConfig configuration class: XLMRobertaModel (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLModel (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetModel (XLNet model)
- XmodConfig configuration class: XmodModel (X-MOD model)
- YolosConfig configuration class: YolosModel (YOLOS model)
- YosoConfig configuration class: YosoModel (YOSO model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the base model classes of the library from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
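For instance, building a randomly initialized model from a configuration alone (a minimal sketch using the same public BERT checkpoint as the examples below):
>>> from transformers import AutoConfig, AutoModel
>>> # Download only the configuration from huggingface.co; no weights are loaded.
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModel.from_config(config)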
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the base model classes of the library from a pretrained model.
The model class to instantiate is selected based on the model_type
property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path
if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path
:
- albert — AlbertModel (ALBERT model)
- align — AlignModel (ALIGN model)
- altclip — AltCLIPModel (AltCLIP model)
- audio-spectrogram-transformer — ASTModel (Audio Spectrogram Transformer model)
- autoformer — AutoformerModel (Autoformer model)
- bark — BarkModel (Bark model)
- bart — BartModel (BART model)
- beit — BeitModel (BEiT model)
- bert — BertModel (BERT model)
- bert-generation — BertGenerationEncoder (Bert Generation model)
- big_bird — BigBirdModel (BigBird model)
- bigbird_pegasus — BigBirdPegasusModel (BigBird-Pegasus model)
- biogpt — BioGptModel (BioGpt model)
- bit — BitModel (BiT model)
- blenderbot — BlenderbotModel (Blenderbot model)
- blenderbot-small — BlenderbotSmallModel (BlenderbotSmall model)
- blip — BlipModel (BLIP model)
- blip-2 — Blip2Model (BLIP-2 model)
- bloom — BloomModel (BLOOM model)
- bridgetower — BridgeTowerModel (BridgeTower model)
- bros — BrosModel (BROS model)
- camembert — CamembertModel (CamemBERT model)
- canine — CanineModel (CANINE model)
- chameleon — ChameleonModel (Chameleon model)
- chinese_clip — ChineseCLIPModel (Chinese-CLIP model)
- chinese_clip_vision_model — ChineseCLIPVisionModel (ChineseCLIPVisionModel model)
- clap — ClapModel (CLAP model)
- clip — CLIPModel (CLIP model)
- clip_text_model — CLIPTextModel (CLIPTextModel model)
- clip_vision_model — CLIPVisionModel (CLIPVisionModel model)
- clipseg — CLIPSegModel (CLIPSeg model)
- clvp — ClvpModelForConditionalGeneration (CLVP model)
- code_llama — LlamaModel (CodeLlama model)
- codegen — CodeGenModel (CodeGen model)
- cohere — CohereModel (Cohere model)
- conditional_detr — ConditionalDetrModel (Conditional DETR model)
- convbert — ConvBertModel (ConvBERT model)
- convnext — ConvNextModel (ConvNeXT model)
- convnextv2 — ConvNextV2Model (ConvNeXTV2 model)
- cpmant — CpmAntModel (CPM-Ant model)
- ctrl — CTRLModel (CTRL model)
- cvt — CvtModel (CvT model)
- dac — DacModel (DAC model)
- data2vec-audio — Data2VecAudioModel (Data2VecAudio model)
- data2vec-text — Data2VecTextModel (Data2VecText model)
- data2vec-vision — Data2VecVisionModel (Data2VecVision model)
- dbrx — DbrxModel (DBRX model)
- deberta — DebertaModel (DeBERTa model)
- deberta-v2 — DebertaV2Model (DeBERTa-v2 model)
- decision_transformer — DecisionTransformerModel (Decision Transformer model)
- deformable_detr — DeformableDetrModel (Deformable DETR model)
- deit — DeiTModel (DeiT model)
- deta — DetaModel (DETA model)
- detr — DetrModel (DETR model)
- dinat — DinatModel (DiNAT model)
- dinov2 — Dinov2Model (DINOv2 model)
- distilbert — DistilBertModel (DistilBERT model)
- donut-swin — DonutSwinModel (DonutSwin model)
- dpr — DPRQuestionEncoder (DPR model)
- dpt — DPTModel (DPT model)
- efficientformer — EfficientFormerModel (EfficientFormer model)
- efficientnet — EfficientNetModel (EfficientNet model)
- electra — ElectraModel (ELECTRA model)
- encodec — EncodecModel (EnCodec model)
- ernie — ErnieModel (ERNIE model)
- ernie_m — ErnieMModel (ErnieM model)
- esm — EsmModel (ESM model)
- falcon — FalconModel (Falcon model)
- falcon_mamba — FalconMambaModel (FalconMamba model)
- fastspeech2_conformer — FastSpeech2ConformerModel (FastSpeech2Conformer model)
- flaubert — FlaubertModel (FlauBERT model)
- flava — FlavaModel (FLAVA model)
- fnet — FNetModel (FNet model)
- focalnet — FocalNetModel (FocalNet model)
- fsmt — FSMTModel (FairSeq Machine-Translation model)
- funnel — FunnelModel or FunnelBaseModel (Funnel Transformer model)
- gemma — GemmaModel (Gemma model)
- gemma2 — Gemma2Model (Gemma2 model)
- git — GitModel (GIT model)
- glpn — GLPNModel (GLPN model)
- gpt-sw3 — GPT2Model (GPT-Sw3 model)
- gpt2 — GPT2Model (OpenAI GPT-2 model)
- gpt_bigcode — GPTBigCodeModel (GPTBigCode model)
- gpt_neo — GPTNeoModel (GPT Neo model)
- gpt_neox — GPTNeoXModel (GPT NeoX model)
- gpt_neox_japanese — GPTNeoXJapaneseModel (GPT NeoX Japanese model)
- gptj — GPTJModel (GPT-J model)
- gptsan-japanese — GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- granite — GraniteModel (Granite model)
- granitemoe — GraniteMoeModel (GraniteMoe model)
- graphormer — GraphormerModel (Graphormer model)
- grounding-dino — GroundingDinoModel (Grounding DINO model)
- groupvit — GroupViTModel (GroupViT model)
- hiera — HieraModel (Hiera model)
- hubert — HubertModel (Hubert model)
- ibert — IBertModel (I-BERT model)
- idefics — IdeficsModel (IDEFICS model)
- idefics2 — Idefics2Model (Idefics2 model)
- imagegpt — ImageGPTModel (ImageGPT model)
- informer — InformerModel (Informer model)
- jamba — JambaModel (Jamba model)
- jetmoe — JetMoeModel (JetMoe model)
- jukebox — JukeboxModel (Jukebox model)
- kosmos-2 — Kosmos2Model (KOSMOS-2 model)
- layoutlm — LayoutLMModel (LayoutLM model)
- layoutlmv2 — LayoutLMv2Model (LayoutLMv2 model)
- layoutlmv3 — LayoutLMv3Model (LayoutLMv3 model)
- led — LEDModel (LED model)
- levit — LevitModel (LeViT model)
- lilt — LiltModel (LiLT model)
- llama — LlamaModel (LLaMA model)
- longformer — LongformerModel (Longformer model)
- longt5 — LongT5Model (LongT5 model)
- luke — LukeModel (LUKE model)
- lxmert — LxmertModel (LXMERT model)
- m2m_100 — M2M100Model (M2M100 model)
- mamba — MambaModel (Mamba model)
- mamba2 — Mamba2Model (mamba2 model)
- marian — MarianModel (Marian model)
- markuplm — MarkupLMModel (MarkupLM model)
- mask2former — Mask2FormerModel (Mask2Former model)
- maskformer — MaskFormerModel (MaskFormer model)
- maskformer-swin — MaskFormerSwinModel (MaskFormerSwin model)
- mbart — MBartModel (mBART model)
- mctct — MCTCTModel (M-CTC-T model)
- mega — MegaModel (MEGA model)
- megatron-bert — MegatronBertModel (Megatron-BERT model)
- mgp-str — MgpstrForSceneTextRecognition (MGP-STR model)
- mimi — MimiModel (Mimi model)
- mistral — MistralModel (Mistral model)
- mixtral — MixtralModel (Mixtral model)
- mobilebert — MobileBertModel (MobileBERT model)
- mobilenet_v1 — MobileNetV1Model (MobileNetV1 model)
- mobilenet_v2 — MobileNetV2Model (MobileNetV2 model)
- mobilevit — MobileViTModel (MobileViT model)
- mobilevitv2 — MobileViTV2Model (MobileViTV2 model)
- mpnet — MPNetModel (MPNet model)
- mpt — MptModel (MPT model)
- mra — MraModel (MRA model)
- mt5 — MT5Model (MT5 model)
- musicgen — MusicgenModel (MusicGen model)
- musicgen_melody — MusicgenMelodyModel (MusicGen Melody model)
- mvp — MvpModel (MVP model)
- nat — NatModel (NAT model)
- nemotron — NemotronModel (Nemotron model)
- nezha — NezhaModel (Nezha model)
- nllb-moe — NllbMoeModel (NLLB-MOE model)
- nystromformer — NystromformerModel (Nyströmformer model)
- olmo — OlmoModel (OLMo model)
- olmoe — OlmoeModel (OLMoE model)
- omdet-turbo — OmDetTurboForObjectDetection (OmDet-Turbo model)
- oneformer — OneFormerModel (OneFormer model)
- open-llama — OpenLlamaModel (OpenLlama model)
- openai-gpt — OpenAIGPTModel (OpenAI GPT model)
- opt — OPTModel (OPT model)
- owlv2 — Owlv2Model (OWLv2 model)
- owlvit — OwlViTModel (OWL-ViT model)
- patchtsmixer — PatchTSMixerModel (PatchTSMixer model)
- patchtst — PatchTSTModel (PatchTST model)
- pegasus — PegasusModel (Pegasus model)
- pegasus_x — PegasusXModel (PEGASUS-X model)
- perceiver — PerceiverModel (Perceiver model)
- persimmon — PersimmonModel (Persimmon model)
- phi — PhiModel (Phi model)
- phi3 — Phi3Model (Phi3 model)
- pixtral — PixtralVisionModel (Pixtral model)
- plbart — PLBartModel (PLBart model)
- poolformer — PoolFormerModel (PoolFormer model)
- prophetnet — ProphetNetModel (ProphetNet model)
- pvt — PvtModel (PVT model)
- pvt_v2 — PvtV2Model (PVTv2 model)
- qdqbert — QDQBertModel (QDQBert model)
- qwen2 — Qwen2Model (Qwen2 model)
- qwen2_audio_encoder — Qwen2AudioEncoder (Qwen2AudioEncoder model)
- qwen2_moe — Qwen2MoeModel (Qwen2MoE model)
- qwen2_vl — Qwen2VLModel (Qwen2VL model)
- recurrent_gemma — RecurrentGemmaModel (RecurrentGemma model)
- reformer — ReformerModel (Reformer model)
- regnet — RegNetModel (RegNet model)
- rembert — RemBertModel (RemBERT model)
- resnet — ResNetModel (ResNet model)
- retribert — RetriBertModel (RetriBERT model)
- roberta — RobertaModel (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertModel (RoCBert model)
- roformer — RoFormerModel (RoFormer model)
- rt_detr — RTDetrModel (RT-DETR model)
- rwkv — RwkvModel (RWKV model)
- sam — SamModel (SAM model)
- seamless_m4t — SeamlessM4TModel (SeamlessM4T model)
- seamless_m4t_v2 — SeamlessM4Tv2Model (SeamlessM4Tv2 model)
- segformer — SegformerModel (SegFormer model)
- seggpt — SegGptModel (SegGPT model)
- sew — SEWModel (SEW model)
- sew-d — SEWDModel (SEW-D model)
- siglip — SiglipModel (SigLIP model)
- siglip_vision_model — SiglipVisionModel (SiglipVisionModel model)
- speech_to_text — Speech2TextModel (Speech2Text model)
- speecht5 — SpeechT5Model (SpeechT5 model)
- splinter — SplinterModel (Splinter model)
- squeezebert — SqueezeBertModel (SqueezeBERT model)
- stablelm — StableLmModel (StableLm model)
- starcoder2 — Starcoder2Model (Starcoder2 model)
- swiftformer — SwiftFormerModel (SwiftFormer model)
- swin — SwinModel (Swin Transformer model)
- swin2sr — Swin2SRModel (Swin2SR model)
- swinv2 — Swinv2Model (Swin Transformer V2 model)
- switch_transformers — SwitchTransformersModel (SwitchTransformers model)
- t5 — T5Model (T5 model)
- table-transformer — TableTransformerModel (Table Transformer model)
- tapas — TapasModel (TAPAS model)
- time_series_transformer — TimeSeriesTransformerModel (Time Series Transformer model)
- timesformer — TimesformerModel (TimeSformer model)
- timm_backbone — TimmBackbone (TimmBackbone model)
- trajectory_transformer — TrajectoryTransformerModel (Trajectory Transformer model)
- transfo-xl — TransfoXLModel (Transformer-XL model)
- tvlt — TvltModel (TVLT model)
- tvp — TvpModel (TVP model)
- udop — UdopModel (UDOP model)
- umt5 — UMT5Model (UMT5 model)
- unispeech — UniSpeechModel (UniSpeech model)
- unispeech-sat — UniSpeechSatModel (UniSpeechSat model)
- univnet — UnivNetModel (UnivNet model)
- van — VanModel (VAN model)
- videomae — VideoMAEModel (VideoMAE model)
- vilt — ViltModel (ViLT model)
- vision-text-dual-encoder — VisionTextDualEncoderModel (VisionTextDualEncoder model)
- visual_bert — VisualBertModel (VisualBERT model)
- vit — ViTModel (ViT model)
- vit_hybrid — ViTHybridModel (ViT Hybrid model)
- vit_mae — ViTMAEModel (ViTMAE model)
- vit_msn — ViTMSNModel (ViTMSN model)
- vitdet — VitDetModel (VitDet model)
- vits — VitsModel (VITS model)
- vivit — VivitModel (ViViT model)
- wav2vec2 — Wav2Vec2Model (Wav2Vec2 model)
- wav2vec2-bert — Wav2Vec2BertModel (Wav2Vec2-BERT model)
- wav2vec2-conformer — Wav2Vec2ConformerModel (Wav2Vec2-Conformer model)
- wavlm — WavLMModel (WavLM model)
- whisper — WhisperModel (Whisper model)
- xclip — XCLIPModel (X-CLIP model)
- xglm — XGLMModel (XGLM model)
- xlm — XLMModel (XLM model)
- xlm-prophetnet — XLMProphetNetModel (XLM-ProphetNet model)
- xlm-roberta — XLMRobertaModel (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLModel (XLM-RoBERTa-XL model)
- xlnet — XLNetModel (XLNet model)
- xmod — XmodModel (X-MOD model)
- yolos — YolosModel (YOLOS model)
- yoso — YosoModel (YOSO model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModel.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModel.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
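As a further sketch of the download-control parameters documented above (this assumes the checkpoint is already in your local cache, since local_files_only=True disables any network access):
>>> # Pin an exact revision and load exclusively from the local cache.
>>> model = AutoModel.from_pretrained(
...     "google-bert/bert-base-cased", revision="main", local_files_only=True
... )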
TFAutoModel
This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertModel (ALBERT model)
- BartConfig configuration class: TFBartModel (BART model)
- BertConfig configuration class: TFBertModel (BERT model)
- BlenderbotConfig configuration class: TFBlenderbotModel (Blenderbot model)
- BlenderbotSmallConfig configuration class: TFBlenderbotSmallModel (BlenderbotSmall model)
- BlipConfig configuration class: TFBlipModel (BLIP model)
- CLIPConfig configuration class: TFCLIPModel (CLIP model)
- CTRLConfig configuration class: TFCTRLModel (CTRL model)
- CamembertConfig configuration class: TFCamembertModel (CamemBERT model)
- ConvBertConfig configuration class: TFConvBertModel (ConvBERT model)
- ConvNextConfig configuration class: TFConvNextModel (ConvNeXT model)
- ConvNextV2Config configuration class: TFConvNextV2Model (ConvNeXTV2 model)
- CvtConfig configuration class: TFCvtModel (CvT model)
- DPRConfig configuration class: TFDPRQuestionEncoder (DPR model)
- Data2VecVisionConfig configuration class: TFData2VecVisionModel (Data2VecVision model)
- DebertaConfig configuration class: TFDebertaModel (DeBERTa model)
- DebertaV2Config configuration class: TFDebertaV2Model (DeBERTa-v2 model)
- DeiTConfig configuration class: TFDeiTModel (DeiT model)
- DistilBertConfig configuration class: TFDistilBertModel (DistilBERT model)
- EfficientFormerConfig configuration class: TFEfficientFormerModel (EfficientFormer model)
- ElectraConfig configuration class: TFElectraModel (ELECTRA model)
- EsmConfig configuration class: TFEsmModel (ESM model)
- FlaubertConfig configuration class: TFFlaubertModel (FlauBERT model)
- FunnelConfig configuration class: TFFunnelModel or TFFunnelBaseModel (Funnel Transformer model)
- GPT2Config configuration class: TFGPT2Model (OpenAI GPT-2 model)
- GPTJConfig configuration class: TFGPTJModel (GPT-J model)
- GroupViTConfig configuration class: TFGroupViTModel (GroupViT model)
- HubertConfig configuration class: TFHubertModel (Hubert model)
- IdeficsConfig configuration class: TFIdeficsModel (IDEFICS model)
- LEDConfig configuration class: TFLEDModel (LED model)
- LayoutLMConfig configuration class: TFLayoutLMModel (LayoutLM model)
- LayoutLMv3Config configuration class: TFLayoutLMv3Model (LayoutLMv3 model)
- LongformerConfig configuration class: TFLongformerModel (Longformer model)
- LxmertConfig configuration class: TFLxmertModel (LXMERT model)
- MBartConfig configuration class: TFMBartModel (mBART model)
- MPNetConfig configuration class: TFMPNetModel (MPNet model)
- MT5Config configuration class: TFMT5Model (MT5 model)
- MarianConfig configuration class: TFMarianModel (Marian model)
- MistralConfig configuration class: TFMistralModel (Mistral model)
- MobileBertConfig configuration class: TFMobileBertModel (MobileBERT model)
- MobileViTConfig configuration class: TFMobileViTModel (MobileViT model)
- OPTConfig configuration class: TFOPTModel (OPT model)
- OpenAIGPTConfig configuration class: TFOpenAIGPTModel (OpenAI GPT model)
- PegasusConfig configuration class: TFPegasusModel (Pegasus model)
- RegNetConfig configuration class: TFRegNetModel (RegNet model)
- RemBertConfig configuration class: TFRemBertModel (RemBERT model)
- ResNetConfig configuration class: TFResNetModel (ResNet model)
- RoFormerConfig configuration class: TFRoFormerModel (RoFormer model)
- RobertaConfig configuration class: TFRobertaModel (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- SamConfig configuration class: TFSamModel (SAM model)
- SegformerConfig configuration class: TFSegformerModel (SegFormer model)
- Speech2TextConfig configuration class: TFSpeech2TextModel (Speech2Text model)
- SwiftFormerConfig configuration class: TFSwiftFormerModel (SwiftFormer model)
- SwinConfig configuration class: TFSwinModel (Swin Transformer model)
- T5Config configuration class: TFT5Model (T5 model)
- TapasConfig configuration class: TFTapasModel (TAPAS model)
- TransfoXLConfig configuration class: TFTransfoXLModel (Transformer-XL model)
- ViTConfig configuration class: TFViTModel (ViT model)
- ViTMAEConfig configuration class: TFViTMAEModel (ViTMAE model)
- VisionTextDualEncoderConfig configuration class: TFVisionTextDualEncoderModel (VisionTextDualEncoder model)
- Wav2Vec2Config configuration class: TFWav2Vec2Model (Wav2Vec2 model)
- WhisperConfig configuration class: TFWhisperModel (Whisper model)
- XGLMConfig configuration class: TFXGLMModel (XGLM model)
- XLMConfig configuration class: TFXLMModel (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaModel (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetModel (XLNet model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the base model classes of the library from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
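For instance, a minimal sketch mirroring the PyTorch from_config example above (weights are randomly initialized):
>>> from transformers import AutoConfig, TFAutoModel
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = TFAutoModel.from_config(config)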
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
- A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
- A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
- The model is a model provided by the library (loaded with the model id string of a pretrained model).
- The model was saved using save_pretrained() and is reloaded by supplying the save directory.
- The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
- If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
- If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the base model classes of the library from a pretrained model.
The model class to instantiate is selected based on the model_type
property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path
if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path
:
- albert — TFAlbertModel (ALBERT model)
- bart — TFBartModel (BART model)
- bert — TFBertModel (BERT model)
- blenderbot — TFBlenderbotModel (Blenderbot model)
- blenderbot-small — TFBlenderbotSmallModel (BlenderbotSmall model)
- blip — TFBlipModel (BLIP model)
- camembert — TFCamembertModel (CamemBERT model)
- clip — TFCLIPModel (CLIP model)
- convbert — TFConvBertModel (ConvBERT model)
- convnext — TFConvNextModel (ConvNeXT model)
- convnextv2 — TFConvNextV2Model (ConvNeXTV2 model)
- ctrl — TFCTRLModel (CTRL model)
- cvt — TFCvtModel (CvT model)
- data2vec-vision — TFData2VecVisionModel (Data2VecVision model)
- deberta — TFDebertaModel (DeBERTa model)
- deberta-v2 — TFDebertaV2Model (DeBERTa-v2 model)
- deit — TFDeiTModel (DeiT model)
- distilbert — TFDistilBertModel (DistilBERT model)
- dpr — TFDPRQuestionEncoder (DPR model)
- efficientformer — TFEfficientFormerModel (EfficientFormer model)
- electra — TFElectraModel (ELECTRA model)
- esm — TFEsmModel (ESM model)
- flaubert — TFFlaubertModel (FlauBERT model)
- funnel — TFFunnelModel or TFFunnelBaseModel (Funnel Transformer model)
- gpt-sw3 — TFGPT2Model (GPT-Sw3 model)
- gpt2 — TFGPT2Model (OpenAI GPT-2 model)
- gptj — TFGPTJModel (GPT-J model)
- groupvit — TFGroupViTModel (GroupViT model)
- hubert — TFHubertModel (Hubert model)
- idefics — TFIdeficsModel (IDEFICS model)
- layoutlm — TFLayoutLMModel (LayoutLM model)
- layoutlmv3 — TFLayoutLMv3Model (LayoutLMv3 model)
- led — TFLEDModel (LED model)
- longformer — TFLongformerModel (Longformer model)
- lxmert — TFLxmertModel (LXMERT model)
- marian — TFMarianModel (Marian model)
- mbart — TFMBartModel (mBART model)
- mistral — TFMistralModel (Mistral model)
- mobilebert — TFMobileBertModel (MobileBERT model)
- mobilevit — TFMobileViTModel (MobileViT model)
- mpnet — TFMPNetModel (MPNet model)
- mt5 — TFMT5Model (MT5 model)
- openai-gpt — TFOpenAIGPTModel (OpenAI GPT model)
- opt — TFOPTModel (OPT model)
- pegasus — TFPegasusModel (Pegasus model)
- regnet — TFRegNetModel (RegNet model)
- rembert — TFRemBertModel (RemBERT model)
- resnet — TFResNetModel (ResNet model)
- roberta — TFRobertaModel (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- roformer — TFRoFormerModel (RoFormer model)
- sam — TFSamModel (SAM model)
- segformer — TFSegformerModel (SegFormer model)
- speech_to_text — TFSpeech2TextModel (Speech2Text model)
- swiftformer — TFSwiftFormerModel (SwiftFormer model)
- swin — TFSwinModel (Swin Transformer model)
- t5 — TFT5Model (T5 model)
- tapas — TFTapasModel (TAPAS model)
- transfo-xl — TFTransfoXLModel (Transformer-XL model)
- vision-text-dual-encoder — TFVisionTextDualEncoderModel (VisionTextDualEncoder model)
- vit — TFViTModel (ViT model)
- vit_mae — TFViTMAEModel (ViTMAE model)
- wav2vec2 — TFWav2Vec2Model (Wav2Vec2 model)
- whisper — TFWhisperModel (Whisper model)
- xglm — TFXGLMModel (XGLM model)
- xlm — TFXLMModel (XLM model)
- xlm-roberta — TFXLMRobertaModel (XLM-RoBERTa model)
- xlnet — TFXLNetModel (XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModel.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
FlaxAutoModel
This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertModel (ALBERT model)
- BartConfig configuration class: FlaxBartModel (BART model)
- BeitConfig configuration class: FlaxBeitModel (BEiT model)
- BertConfig configuration class: FlaxBertModel (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdModel (BigBird model)
- BlenderbotConfig configuration class: FlaxBlenderbotModel (Blenderbot model)
- BlenderbotSmallConfig configuration class: FlaxBlenderbotSmallModel (BlenderbotSmall model)
- BloomConfig configuration class: FlaxBloomModel (BLOOM model)
- CLIPConfig configuration class: FlaxCLIPModel (CLIP model)
- Dinov2Config configuration class: FlaxDinov2Model (DINOv2 model)
- DistilBertConfig configuration class: FlaxDistilBertModel (DistilBERT model)
- ElectraConfig configuration class: FlaxElectraModel (ELECTRA model)
- GPT2Config configuration class: FlaxGPT2Model (OpenAI GPT-2 model)
- GPTJConfig configuration class: FlaxGPTJModel (GPT-J model)
- GPTNeoConfig configuration class: FlaxGPTNeoModel (GPT Neo model)
- GemmaConfig configuration class: FlaxGemmaModel (Gemma model)
- LlamaConfig configuration class: FlaxLlamaModel (LLaMA model)
- LongT5Config configuration class: FlaxLongT5Model (LongT5 model)
- MBartConfig configuration class: FlaxMBartModel (mBART model)
- MT5Config configuration class: FlaxMT5Model (MT5 model)
- MarianConfig configuration class: FlaxMarianModel (Marian model)
- MistralConfig configuration class: FlaxMistralModel (Mistral model)
- OPTConfig configuration class: FlaxOPTModel (OPT model)
- PegasusConfig configuration class: FlaxPegasusModel (Pegasus model)
- RegNetConfig configuration class: FlaxRegNetModel (RegNet model)
- ResNetConfig configuration class: FlaxResNetModel (ResNet model)
- RoFormerConfig configuration class: FlaxRoFormerModel (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaModel (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- T5Config configuration class: FlaxT5Model (T5 model)
- ViTConfig configuration class: FlaxViTModel (ViT model)
- VisionTextDualEncoderConfig configuration class: FlaxVisionTextDualEncoderModel (VisionTextDualEncoder model)
- Wav2Vec2Config configuration class: FlaxWav2Vec2Model (Wav2Vec2 model)
- WhisperConfig configuration class: FlaxWhisperModel (Whisper model)
- XGLMConfig configuration class: FlaxXGLMModel (XGLM model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaModel (XLM-RoBERTa model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the base model classes of the library from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
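For instance, a sketch analogous to the PyTorch and TensorFlow from_config examples above:
>>> from transformers import AutoConfig, FlaxAutoModel
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = FlaxAutoModel.from_config(config)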
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the base model classes of the library from a pretrained model.
The model class to instantiate is selected based on the model_type
property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path
if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path
:
- albert — FlaxAlbertModel (ALBERT model)
- bart — FlaxBartModel (BART model)
- beit — FlaxBeitModel (BEiT model)
- bert — FlaxBertModel (BERT model)
- big_bird — FlaxBigBirdModel (BigBird model)
- blenderbot — FlaxBlenderbotModel (Blenderbot model)
- blenderbot-small — FlaxBlenderbotSmallModel (BlenderbotSmall model)
- bloom — FlaxBloomModel (BLOOM model)
- clip — FlaxCLIPModel (CLIP model)
- dinov2 — FlaxDinov2Model (DINOv2 model)
- distilbert — FlaxDistilBertModel (DistilBERT model)
- electra — FlaxElectraModel (ELECTRA model)
- gemma — FlaxGemmaModel (Gemma model)
- gpt-sw3 — FlaxGPT2Model (GPT-Sw3 model)
- gpt2 — FlaxGPT2Model (OpenAI GPT-2 model)
- gpt_neo — FlaxGPTNeoModel (GPT Neo model)
- gptj — FlaxGPTJModel (GPT-J model)
- llama — FlaxLlamaModel (LLaMA model)
- longt5 — FlaxLongT5Model (LongT5 model)
- marian — FlaxMarianModel (Marian model)
- mbart — FlaxMBartModel (mBART model)
- mistral — FlaxMistralModel (Mistral model)
- mt5 — FlaxMT5Model (MT5 model)
- opt — FlaxOPTModel (OPT model)
- pegasus — FlaxPegasusModel (Pegasus model)
- regnet — FlaxRegNetModel (RegNet model)
- resnet — FlaxResNetModel (ResNet model)
- roberta — FlaxRobertaModel (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
- roformer — FlaxRoFormerModel (RoFormer model)
- t5 — FlaxT5Model (T5 model)
- vision-text-dual-encoder — FlaxVisionTextDualEncoderModel (VisionTextDualEncoder model)
- vit — FlaxViTModel (ViT model)
- wav2vec2 — FlaxWav2Vec2Model (Wav2Vec2 model)
- whisper — FlaxWhisperModel (Whisper model)
- xglm — FlaxXGLMModel (XGLM model)
- xlm-roberta — FlaxXLMRobertaModel (XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModel
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModel.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModel.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
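When output_loading_info=True is passed (see the parameter list above), from_pretrained returns the model together with a dictionary of loading diagnostics; a minimal sketch:
>>> model, loading_info = FlaxAutoModel.from_pretrained(
...     "google-bert/bert-base-cased", output_loading_info=True
... )
>>> # loading_info reports e.g. missing keys, unexpected keys and error messages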
Generic pretraining classes
The following auto classes are available for instantiating a model with a pretraining head.
AutoModelForPreTraining
This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: AlbertForPreTraining (ALBERT model)
- BartConfig configuration class: BartForConditionalGeneration (BART model)
- BertConfig configuration class: BertForPreTraining (BERT model)
- BigBirdConfig configuration class: BigBirdForPreTraining (BigBird model)
- BloomConfig configuration class: BloomForCausalLM (BLOOM model)
- CTRLConfig configuration class: CTRLLMHeadModel (CTRL model)
- CamembertConfig configuration class: CamembertForMaskedLM (CamemBERT model)
- Data2VecTextConfig configuration class: Data2VecTextForMaskedLM (Data2VecText model)
- DebertaConfig configuration class: DebertaForMaskedLM (DeBERTa model)
- DebertaV2Config configuration class: DebertaV2ForMaskedLM (DeBERTa-v2 model)
- DistilBertConfig configuration class: DistilBertForMaskedLM (DistilBERT model)
- ElectraConfig configuration class: ElectraForPreTraining (ELECTRA model)
- ErnieConfig configuration class: ErnieForPreTraining (ERNIE model)
- FNetConfig configuration class: FNetForPreTraining (FNet model)
- FSMTConfig configuration class: FSMTForConditionalGeneration (FairSeq Machine-Translation model)
- FalconMambaConfig configuration class: FalconMambaForCausalLM (FalconMamba model)
- FlaubertConfig configuration class: FlaubertWithLMHeadModel (FlauBERT model)
- FlavaConfig configuration class: FlavaForPreTraining (FLAVA model)
- FunnelConfig configuration class: FunnelForPreTraining (Funnel Transformer model)
- GPT2Config configuration class: GPT2LMHeadModel (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeForCausalLM (GPTBigCode model)
- GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- HieraConfig configuration class: HieraForPreTraining (Hiera model)
- IBertConfig configuration class: IBertForMaskedLM (I-BERT model)
- Idefics2Config configuration class: Idefics2ForConditionalGeneration (Idefics2 model)
- IdeficsConfig configuration class: IdeficsForVisionText2Text (IDEFICS model)
- LayoutLMConfig configuration class: LayoutLMForMaskedLM (LayoutLM model)
- LlavaConfig configuration class: LlavaForConditionalGeneration (LLaVa model)
- LlavaNextConfig configuration class: LlavaNextForConditionalGeneration (LLaVA-NeXT model)
- LlavaNextVideoConfig configuration class: LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
- LlavaOnevisionConfig configuration class: LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
- LongformerConfig configuration class: LongformerForMaskedLM (Longformer model)
- LukeConfig configuration class: LukeForMaskedLM (LUKE model)
- LxmertConfig configuration class: LxmertForPreTraining (LXMERT model)
- MPNetConfig configuration class: MPNetForMaskedLM (MPNet model)
- Mamba2Config configuration class: Mamba2ForCausalLM (mamba2 model)
- MambaConfig configuration class: MambaForCausalLM (Mamba model)
- MegaConfig configuration class: MegaForMaskedLM (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForPreTraining (Megatron-BERT model)
- MllamaConfig configuration class: MllamaForConditionalGeneration (Mllama model)
- MobileBertConfig configuration class: MobileBertForPreTraining (MobileBERT model)
- MptConfig configuration class: MptForCausalLM (MPT model)
- MraConfig configuration class: MraForMaskedLM (MRA model)
- MvpConfig configuration class: MvpForConditionalGeneration (MVP model)
- NezhaConfig configuration class: NezhaForPreTraining (Nezha model)
- NllbMoeConfig configuration class: NllbMoeForConditionalGeneration (NLLB-MOE model)
- OpenAIGPTConfig configuration class: OpenAIGPTLMHeadModel (OpenAI GPT model)
- PaliGemmaConfig configuration class: PaliGemmaForConditionalGeneration (PaliGemma model)
- Qwen2AudioConfig configuration class: Qwen2AudioForConditionalGeneration (Qwen2Audio model)
- RetriBertConfig configuration class: RetriBertModel (RetriBERT model)
- RoCBertConfig configuration class: RoCBertForPreTraining (RoCBert model)
- RobertaConfig configuration class: RobertaForMaskedLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- RwkvConfig configuration class: RwkvForCausalLM (RWKV model)
- SplinterConfig configuration class: SplinterForPreTraining (Splinter model)
- SqueezeBertConfig configuration class: SqueezeBertForMaskedLM (SqueezeBERT model)
- SwitchTransformersConfig configuration class: SwitchTransformersForConditionalGeneration (SwitchTransformers model)
- T5Config configuration class: T5ForConditionalGeneration (T5 model)
- TapasConfig configuration class: TapasForMaskedLM (TAPAS model)
- TransfoXLConfig configuration class: TransfoXLLMHeadModel (Transformer-XL model)
- TvltConfig configuration class: TvltForPreTraining (TVLT model)
- UniSpeechConfig configuration class: UniSpeechForPreTraining (UniSpeech model)
- UniSpeechSatConfig configuration class: UniSpeechSatForPreTraining (UniSpeechSat model)
- ViTMAEConfig configuration class: ViTMAEForPreTraining (ViTMAE model)
- VideoLlavaConfig configuration class: VideoLlavaForConditionalGeneration (VideoLlava model)
- VideoMAEConfig configuration class: VideoMAEForPreTraining (VideoMAE model)
- VipLlavaConfig configuration class: VipLlavaForConditionalGeneration (VipLlava model)
- VisualBertConfig configuration class: VisualBertForPreTraining (VisualBERT model)
- Wav2Vec2Config configuration class: Wav2Vec2ForPreTraining (Wav2Vec2 model)
- Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerForPreTraining (Wav2Vec2-Conformer model)
- XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
- XLMRobertaConfig configuration class: XLMRobertaForMaskedLM (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetLMHeadModel (XLNet model)
- XmodConfig configuration class: XmodForMaskedLM (X-MOD model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a pretraining head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
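As a sketch of the attn_implementation argument described above (assuming the selected architecture supports SDPA; availability also depends on your torch version):
>>> from transformers import AutoConfig, AutoModelForPreTraining
>>> config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
>>> model = AutoModelForPreTraining.from_config(config, attn_implementation="sdpa")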
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.
The model class to instantiate is selected based on the model_type
property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path
if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path
:
- albert — AlbertForPreTraining (ALBERT model)
- bart — BartForConditionalGeneration (BART model)
- bert — BertForPreTraining (BERT model)
- big_bird — BigBirdForPreTraining (BigBird model)
- bloom — BloomForCausalLM (BLOOM model)
- camembert — CamembertForMaskedLM (CamemBERT model)
- ctrl — CTRLLMHeadModel (CTRL model)
- data2vec-text — Data2VecTextForMaskedLM (Data2VecText model)
- deberta — DebertaForMaskedLM (DeBERTa model)
- deberta-v2 — DebertaV2ForMaskedLM (DeBERTa-v2 model)
- distilbert — DistilBertForMaskedLM (DistilBERT model)
- electra — ElectraForPreTraining (ELECTRA model)
- ernie — ErnieForPreTraining (ERNIE model)
- falcon_mamba — FalconMambaForCausalLM (FalconMamba model)
- flaubert — FlaubertWithLMHeadModel (FlauBERT model)
- flava — FlavaForPreTraining (FLAVA model)
- fnet — FNetForPreTraining (FNet model)
- fsmt — FSMTForConditionalGeneration (FairSeq Machine-Translation model)
- funnel — FunnelForPreTraining (Funnel Transformer model)
- gpt-sw3 — GPT2LMHeadModel (GPT-Sw3 model)
- gpt2 — GPT2LMHeadModel (OpenAI GPT-2 model)
- gpt_bigcode — GPTBigCodeForCausalLM (GPTBigCode model)
- gptsan-japanese — GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
- hiera — HieraForPreTraining (Hiera model)
- ibert — IBertForMaskedLM (I-BERT model)
- idefics — IdeficsForVisionText2Text (IDEFICS model)
- idefics2 — Idefics2ForConditionalGeneration (Idefics2 model)
- layoutlm — LayoutLMForMaskedLM (LayoutLM model)
- llava — LlavaForConditionalGeneration (LLaVa model)
- llava_next — LlavaNextForConditionalGeneration (LLaVA-NeXT model)
- llava_next_video — LlavaNextVideoForConditionalGeneration (LLaVa-NeXT-Video model)
- llava_onevision — LlavaOnevisionForConditionalGeneration (LLaVA-Onevision model)
- longformer — LongformerForMaskedLM (Longformer model)
- luke — LukeForMaskedLM (LUKE model)
- lxmert — LxmertForPreTraining (LXMERT model)
- mamba — MambaForCausalLM (Mamba model)
- mamba2 — Mamba2ForCausalLM (mamba2 model)
- mega — MegaForMaskedLM (MEGA model)
- megatron-bert — MegatronBertForPreTraining (Megatron-BERT model)
- mllama — MllamaForConditionalGeneration (Mllama model)
- mobilebert — MobileBertForPreTraining (MobileBERT model)
- mpnet — MPNetForMaskedLM (MPNet model)
- mpt — MptForCausalLM (MPT model)
- mra — MraForMaskedLM (MRA model)
- mvp — MvpForConditionalGeneration (MVP model)
- nezha — NezhaForPreTraining (Nezha model)
- nllb-moe — NllbMoeForConditionalGeneration (NLLB-MOE model)
- openai-gpt — OpenAIGPTLMHeadModel (OpenAI GPT model)
- paligemma — PaliGemmaForConditionalGeneration (PaliGemma model)
- qwen2_audio — Qwen2AudioForConditionalGeneration (Qwen2Audio model)
- retribert — RetriBertModel (RetriBERT model)
- roberta — RobertaForMaskedLM (RoBERTa model)
- roberta-prelayernorm — RobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- roc_bert — RoCBertForPreTraining (RoCBert model)
- rwkv — RwkvForCausalLM (RWKV model)
- splinter — SplinterForPreTraining (Splinter model)
- squeezebert — SqueezeBertForMaskedLM (SqueezeBERT model)
- switch_transformers — SwitchTransformersForConditionalGeneration (SwitchTransformers model)
- t5 — T5ForConditionalGeneration (T5 model)
- tapas — TapasForMaskedLM (TAPAS model)
- transfo-xl — TransfoXLLMHeadModel (Transformer-XL model)
- tvlt — TvltForPreTraining (TVLT model)
- unispeech — UniSpeechForPreTraining (UniSpeech model)
- unispeech-sat — UniSpeechSatForPreTraining (UniSpeechSat model)
- video_llava — VideoLlavaForConditionalGeneration (VideoLlava model)
- videomae — VideoMAEForPreTraining (VideoMAE model)
- vipllava — VipLlavaForConditionalGeneration (VipLlava model)
- visual_bert — VisualBertForPreTraining (VisualBERT model)
- vit_mae — ViTMAEForPreTraining (ViTMAE model)
- wav2vec2 — Wav2Vec2ForPreTraining (Wav2Vec2 model)
- wav2vec2-conformer — Wav2Vec2ConformerForPreTraining (Wav2Vec2-Conformer model)
- xlm — XLMWithLMHeadModel (XLM model)
- xlm-roberta — XLMRobertaForMaskedLM (XLM-RoBERTa model)
- xlm-roberta-xl — XLMRobertaXLForMaskedLM (XLM-RoBERTa-XL model)
- xlnet — XLNetLMHeadModel (XLNet model)
- xmod — XmodForMaskedLM (X-MOD model)
The model is set in evaluation mode by default using model.eval()
(so for instance, dropout modules are
deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
>>> from transformers import AutoConfig, AutoModelForPreTraining
>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = AutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModelForPreTraining.from_pretrained(
... "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
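As the note above says, the returned model starts in evaluation mode; a minimal sketch of switching it back before fine-tuning:
>>> model = AutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")
>>> model.training
False
>>> _ = model.train()  # re-enables dropout and similar training-only modules
>>> model.training
True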
TFAutoModelForPreTraining
This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: TFAlbertForPreTraining (ALBERT model)
- BartConfig configuration class: TFBartForConditionalGeneration (BART model)
- BertConfig configuration class: TFBertForPreTraining (BERT model)
- CTRLConfig configuration class: TFCTRLLMHeadModel (CTRL model)
- CamembertConfig configuration class: TFCamembertForMaskedLM (CamemBERT model)
- DistilBertConfig configuration class: TFDistilBertForMaskedLM (DistilBERT model)
- ElectraConfig configuration class: TFElectraForPreTraining (ELECTRA model)
- FlaubertConfig configuration class: TFFlaubertWithLMHeadModel (FlauBERT model)
- FunnelConfig configuration class: TFFunnelForPreTraining (Funnel Transformer model)
- GPT2Config configuration class: TFGPT2LMHeadModel (OpenAI GPT-2 model)
- IdeficsConfig configuration class: TFIdeficsForVisionText2Text (IDEFICS model)
- LayoutLMConfig configuration class: TFLayoutLMForMaskedLM (LayoutLM model)
- LxmertConfig configuration class: TFLxmertForPreTraining (LXMERT model)
- MPNetConfig configuration class: TFMPNetForMaskedLM (MPNet model)
- MobileBertConfig configuration class: TFMobileBertForPreTraining (MobileBERT model)
- OpenAIGPTConfig configuration class: TFOpenAIGPTLMHeadModel (OpenAI GPT model)
- RobertaConfig configuration class: TFRobertaForMaskedLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- T5Config configuration class: TFT5ForConditionalGeneration (T5 model)
- TapasConfig configuration class: TFTapasForMaskedLM (TAPAS model)
- TransfoXLConfig configuration class: TFTransfoXLLMHeadModel (Transformer-XL model)
- ViTMAEConfig configuration class: TFViTMAEForPreTraining (ViTMAE model)
- XLMConfig configuration class: TFXLMWithLMHeadModel (XLM model)
- XLMRobertaConfig configuration class: TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
- XLNetConfig configuration class: TFXLNetLMHeadModel (XLNet model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a pretraining head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a TensorFlow model using the provided conversion scripts and loading the TensorFlow model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.
The model class to instantiate is selected based on the model_type
property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path
if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path
:
- albert — TFAlbertForPreTraining (ALBERT model)
- bart — TFBartForConditionalGeneration (BART model)
- bert — TFBertForPreTraining (BERT model)
- camembert — TFCamembertForMaskedLM (CamemBERT model)
- ctrl — TFCTRLLMHeadModel (CTRL model)
- distilbert — TFDistilBertForMaskedLM (DistilBERT model)
- electra — TFElectraForPreTraining (ELECTRA model)
- flaubert — TFFlaubertWithLMHeadModel (FlauBERT model)
- funnel — TFFunnelForPreTraining (Funnel Transformer model)
- gpt-sw3 — TFGPT2LMHeadModel (GPT-Sw3 model)
- gpt2 — TFGPT2LMHeadModel (OpenAI GPT-2 model)
- idefics — TFIdeficsForVisionText2Text (IDEFICS model)
- layoutlm — TFLayoutLMForMaskedLM (LayoutLM model)
- lxmert — TFLxmertForPreTraining (LXMERT model)
- mobilebert — TFMobileBertForPreTraining (MobileBERT model)
- mpnet — TFMPNetForMaskedLM (MPNet model)
- openai-gpt — TFOpenAIGPTLMHeadModel (OpenAI GPT model)
- roberta — TFRobertaForMaskedLM (RoBERTa model)
- roberta-prelayernorm — TFRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- t5 — TFT5ForConditionalGeneration (T5 model)
- tapas — TFTapasForMaskedLM (TAPAS model)
- transfo-xl — TFTransfoXLLMHeadModel (Transformer-XL model)
- vit_mae — TFViTMAEForPreTraining (ViTMAE model)
- xlm — TFXLMWithLMHeadModel (XLM model)
- xlm-roberta — TFXLMRobertaForMaskedLM (XLM-RoBERTa model)
- xlnet — TFXLNetLMHeadModel (XLNet model)
Examples:
>>> from transformers import AutoConfig, TFAutoModelForPreTraining
>>> # Download model and configuration from huggingface.co and cache.
>>> model = TFAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = TFAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a TensorFlow model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = TFAutoModelForPreTraining.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
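The trust_remote_code argument documented above executes modeling code hosted on the Hub; a cautious sketch with a hypothetical repository id (not a real model):
>>> model = TFAutoModelForPreTraining.from_pretrained(
...     "some-org/custom-model",  # hypothetical repo shipping its own modeling code
...     trust_remote_code=True,  # only after reading the code in that repository
... )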
FlaxAutoModelForPreTraining
This is a generic model class that will be instantiated as one of the model classes of the library (with a pretraining head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- AlbertConfig configuration class: FlaxAlbertForPreTraining (ALBERT model)
- BartConfig configuration class: FlaxBartForConditionalGeneration (BART model)
- BertConfig configuration class: FlaxBertForPreTraining (BERT model)
- BigBirdConfig configuration class: FlaxBigBirdForPreTraining (BigBird model)
- ElectraConfig configuration class: FlaxElectraForPreTraining (ELECTRA model)
- LongT5Config configuration class: FlaxLongT5ForConditionalGeneration (LongT5 model)
- MBartConfig configuration class: FlaxMBartForConditionalGeneration (mBART model)
- MT5Config configuration class: FlaxMT5ForConditionalGeneration (MT5 model)
- RoFormerConfig configuration class: FlaxRoFormerForMaskedLM (RoFormer model)
- RobertaConfig configuration class: FlaxRobertaForMaskedLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- T5Config configuration class: FlaxT5ForConditionalGeneration (T5 model)
- Wav2Vec2Config configuration class: FlaxWav2Vec2ForPreTraining (Wav2Vec2 model)
- WhisperConfig configuration class: FlaxWhisperForConditionalGeneration (Whisper model)
- XLMRobertaConfig configuration class: FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a pretraining head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a PyTorch state_dict save file (e.g., ./pt_model/pytorch_model.bin). In this case, from_pt should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the PyTorch model to a Flax model using the provided conversion scripts and loading the Flax model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_pt (bool, optional, defaults to False) — Load the model weights from a PyTorch checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and to initialize the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
  - If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
  - If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
Instantiate one of the model classes of the library (with a pretraining head) from a pretrained model.
The model class to instantiate is selected based on the model_type
property of the config object (either
passed as an argument or loaded from pretrained_model_name_or_path
if possible), or when it’s missing, by
falling back to using pattern matching on pretrained_model_name_or_path
:
- albert — FlaxAlbertForPreTraining (ALBERT model)
- bart — FlaxBartForConditionalGeneration (BART model)
- bert — FlaxBertForPreTraining (BERT model)
- big_bird — FlaxBigBirdForPreTraining (BigBird model)
- electra — FlaxElectraForPreTraining (ELECTRA model)
- longt5 — FlaxLongT5ForConditionalGeneration (LongT5 model)
- mbart — FlaxMBartForConditionalGeneration (mBART model)
- mt5 — FlaxMT5ForConditionalGeneration (MT5 model)
- roberta — FlaxRobertaForMaskedLM (RoBERTa model)
- roberta-prelayernorm — FlaxRobertaPreLayerNormForMaskedLM (RoBERTa-PreLayerNorm model)
- roformer — FlaxRoFormerForMaskedLM (RoFormer model)
- t5 — FlaxT5ForConditionalGeneration (T5 model)
- wav2vec2 — FlaxWav2Vec2ForPreTraining (Wav2Vec2 model)
- whisper — FlaxWhisperForConditionalGeneration (Whisper model)
- xlm-roberta — FlaxXLMRobertaForMaskedLM (XLM-RoBERTa model)
Examples:
>>> from transformers import AutoConfig, FlaxAutoModelForPreTraining
>>> # Download model and configuration from huggingface.co and cache.
>>> model = FlaxAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased")
>>> # Update configuration during loading
>>> model = FlaxAutoModelForPreTraining.from_pretrained("google-bert/bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True
>>> # Loading from a PyTorch checkpoint file instead of a Flax model (slower)
>>> config = AutoConfig.from_pretrained("./pt_model/bert_pt_model_config.json")
>>> model = FlaxAutoModelForPreTraining.from_pretrained(
... "./pt_model/bert_pytorch_model.bin", from_pt=True, config=config
... )
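Because models are stored as git repositories on huggingface.co, the revision argument documented above can pin any git identifier; a minimal sketch (the value is illustrative):
>>> model = FlaxAutoModelForPreTraining.from_pretrained(
...     "google-bert/bert-base-cased",
...     revision="main",  # could equally be a tag or a full commit hash
... )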
Natural Language Processing
The following auto classes are available for the following natural language processing tasks.
AutoModelForCausalLM
This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__()
(throws an error).
from_config
< source >( **kwargs )
Parameters
- config (PretrainedConfig) —
The model class to instantiate is selected based on the configuration class:
- BartConfig configuration class: BartForCausalLM (BART model)
- BertConfig configuration class: BertLMHeadModel (BERT model)
- BertGenerationConfig configuration class: BertGenerationDecoder (Bert Generation model)
- BigBirdConfig configuration class: BigBirdForCausalLM (BigBird model)
- BigBirdPegasusConfig configuration class: BigBirdPegasusForCausalLM (BigBird-Pegasus model)
- BioGptConfig configuration class: BioGptForCausalLM (BioGpt model)
- BlenderbotConfig configuration class: BlenderbotForCausalLM (Blenderbot model)
- BlenderbotSmallConfig configuration class: BlenderbotSmallForCausalLM (BlenderbotSmall model)
- BloomConfig configuration class: BloomForCausalLM (BLOOM model)
- CTRLConfig configuration class: CTRLLMHeadModel (CTRL model)
- CamembertConfig configuration class: CamembertForCausalLM (CamemBERT model)
- CodeGenConfig configuration class: CodeGenForCausalLM (CodeGen model)
- CohereConfig configuration class: CohereForCausalLM (Cohere model)
- CpmAntConfig configuration class: CpmAntForCausalLM (CPM-Ant model)
- Data2VecTextConfig configuration class: Data2VecTextForCausalLM (Data2VecText model)
- DbrxConfig configuration class: DbrxForCausalLM (DBRX model)
- ElectraConfig configuration class: ElectraForCausalLM (ELECTRA model)
- ErnieConfig configuration class: ErnieForCausalLM (ERNIE model)
- FalconConfig configuration class: FalconForCausalLM (Falcon model)
- FalconMambaConfig configuration class: FalconMambaForCausalLM (FalconMamba model)
- FuyuConfig configuration class: FuyuForCausalLM (Fuyu model)
- GPT2Config configuration class: GPT2LMHeadModel (OpenAI GPT-2 model)
- GPTBigCodeConfig configuration class: GPTBigCodeForCausalLM (GPTBigCode model)
- GPTJConfig configuration class: GPTJForCausalLM (GPT-J model)
- GPTNeoConfig configuration class: GPTNeoForCausalLM (GPT Neo model)
- GPTNeoXConfig configuration class: GPTNeoXForCausalLM (GPT NeoX model)
- GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseForCausalLM (GPT NeoX Japanese model)
- Gemma2Config configuration class: Gemma2ForCausalLM (Gemma2 model)
- GemmaConfig configuration class: GemmaForCausalLM (Gemma model)
- GitConfig configuration class: GitForCausalLM (GIT model)
- GraniteConfig configuration class: GraniteForCausalLM (Granite model)
- GraniteMoeConfig configuration class: GraniteMoeForCausalLM (GraniteMoe model)
- JambaConfig configuration class: JambaForCausalLM (Jamba model)
- JetMoeConfig configuration class: JetMoeForCausalLM (JetMoe model)
- LlamaConfig configuration class: LlamaForCausalLM (LLaMA model)
- MBartConfig configuration class: MBartForCausalLM (mBART model)
- Mamba2Config configuration class: Mamba2ForCausalLM (mamba2 model)
- MambaConfig configuration class: MambaForCausalLM (Mamba model)
- MarianConfig configuration class: MarianForCausalLM (Marian model)
- MegaConfig configuration class: MegaForCausalLM (MEGA model)
- MegatronBertConfig configuration class: MegatronBertForCausalLM (Megatron-BERT model)
- MistralConfig configuration class: MistralForCausalLM (Mistral model)
- MixtralConfig configuration class: MixtralForCausalLM (Mixtral model)
- MllamaConfig configuration class: MllamaForCausalLM (Mllama model)
- MptConfig configuration class: MptForCausalLM (MPT model)
- MusicgenConfig configuration class: MusicgenForCausalLM (MusicGen model)
- MusicgenMelodyConfig configuration class: MusicgenMelodyForCausalLM (MusicGen Melody model)
- MvpConfig configuration class: MvpForCausalLM (MVP model)
- NemotronConfig configuration class: NemotronForCausalLM (Nemotron model)
- OPTConfig configuration class: OPTForCausalLM (OPT model)
- OlmoConfig configuration class: OlmoForCausalLM (OLMo model)
- OlmoeConfig configuration class: OlmoeForCausalLM (OLMoE model)
- OpenAIGPTConfig configuration class: OpenAIGPTLMHeadModel (OpenAI GPT model)
- OpenLlamaConfig configuration class: OpenLlamaForCausalLM (OpenLlama model)
- PLBartConfig configuration class: PLBartForCausalLM (PLBart model)
- PegasusConfig configuration class: PegasusForCausalLM (Pegasus model)
- PersimmonConfig configuration class: PersimmonForCausalLM (Persimmon model)
- Phi3Config configuration class: Phi3ForCausalLM (Phi3 model)
- PhiConfig configuration class: PhiForCausalLM (Phi model)
- ProphetNetConfig configuration class: ProphetNetForCausalLM (ProphetNet model)
- QDQBertConfig configuration class: QDQBertLMHeadModel (QDQBert model)
- Qwen2Config configuration class: Qwen2ForCausalLM (Qwen2 model)
- Qwen2MoeConfig configuration class: Qwen2MoeForCausalLM (Qwen2MoE model)
- RecurrentGemmaConfig configuration class: RecurrentGemmaForCausalLM (RecurrentGemma model)
- ReformerConfig configuration class: ReformerModelWithLMHead (Reformer model)
- RemBertConfig configuration class: RemBertForCausalLM (RemBERT model)
- RoCBertConfig configuration class: RoCBertForCausalLM (RoCBert model)
- RoFormerConfig configuration class: RoFormerForCausalLM (RoFormer model)
- RobertaConfig configuration class: RobertaForCausalLM (RoBERTa model)
- RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
- RwkvConfig configuration class: RwkvForCausalLM (RWKV model)
- Speech2Text2Config configuration class: Speech2Text2ForCausalLM (Speech2Text2 model)
- StableLmConfig configuration class: StableLmForCausalLM (StableLm model)
- Starcoder2Config configuration class: Starcoder2ForCausalLM (Starcoder2 model)
- TrOCRConfig configuration class: TrOCRForCausalLM (TrOCR model)
- TransfoXLConfig configuration class: TransfoXLLMHeadModel (Transformer-XL model)
- WhisperConfig configuration class: WhisperForCausalLM (Whisper model)
- XGLMConfig configuration class: XGLMForCausalLM (XGLM model)
- XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
- XLMProphetNetConfig configuration class: XLMProphetNetForCausalLM (XLM-ProphetNet model)
- XLMRobertaConfig configuration class: XLMRobertaForCausalLM (XLM-RoBERTa model)
- XLMRobertaXLConfig configuration class: XLMRobertaXLForCausalLM (XLM-RoBERTa-XL model)
- XLNetConfig configuration class: XLNetLMHeadModel (XLNet model)
- XmodConfig configuration class: XmodForCausalLM (X-MOD model)
- attn_implementation (str, optional) — The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual "eager" implementation.
Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
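A minimal sketch of from_config for a causal language model (openai-community/gpt2 used purely as an illustration; per the mapping above, GPT2Config yields a randomly initialized GPT2LMHeadModel):
>>> from transformers import AutoConfig, AutoModelForCausalLM
>>> config = AutoConfig.from_pretrained("openai-community/gpt2")
>>> model = AutoModelForCausalLM.from_config(config)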
from_pretrained
< source >( *model_args **kwargs )
Parameters
- pretrained_model_name_or_path (str or os.PathLike) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
  - A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
  - A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
- model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
- config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
  - The model is a model provided by the library (loaded with the model id string of a pretrained model).
  - The model was saved using save_pretrained() and is reloaded by supplying the save directory.
  - The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
- state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
- cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of the pretrained_model_name_or_path argument).
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
- output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
- local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
- revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
- trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.