Transformers documentation

You are viewing version v4.38.0. A newer version, v4.47.1, is available.
🤗 Transformers

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:

πŸ“ Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
πŸ–ΌοΈ Computer Vision: image classification, object detection, and segmentation.
πŸ—£οΈ Audio: automatic speech recognition and audio classification.
πŸ™ Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

🤗 Transformers supports framework interoperability between PyTorch, TensorFlow, and JAX. This provides the flexibility to use a different framework at each stage of a model’s life: train a model in three lines of code in one framework, and load it for inference in another. Models can also be exported to formats like ONNX and TorchScript for deployment in production environments.
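Loading a checkpoint saved by one framework into another might look like the sketch below. It assumes the model architecture is supported in both frameworks (see the support table on this page), that both frameworks are installed, and that `checkpoint_dir` is a directory written by `save_pretrained`; the two helper function names are hypothetical.

```python
def load_pytorch_checkpoint_in_tensorflow(checkpoint_dir):
    """Load weights saved from a PyTorch model into a TensorFlow model."""
    # Deferred import: TensorFlow is only needed when this is called.
    from transformers import TFAutoModel

    # from_pt=True asks the library to convert the PyTorch weights.
    return TFAutoModel.from_pretrained(checkpoint_dir, from_pt=True)


def load_tensorflow_checkpoint_in_pytorch(checkpoint_dir):
    """Load weights saved from a TensorFlow model into a PyTorch model."""
    # Deferred import: PyTorch is only needed when this is called.
    from transformers import AutoModel

    # from_tf=True asks the library to convert the TensorFlow weights.
    return AutoModel.from_pretrained(checkpoint_dir, from_tf=True)


# Example usage, assuming a PyTorch model was saved to "./my-model":
# tf_model = load_pytorch_checkpoint_in_tensorflow("./my-model")
```

The conversion happens at load time, so the directory on disk is left unchanged; calling `save_pretrained` on the returned model writes weights in the new framework's format.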

Join the growing community on the Hub, forum, or Discord today!

If you are looking for custom support from the Hugging Face team, see the Hugging Face Expert Acceleration Program.

Contents

The documentation is organized into five sections:

  • GET STARTED provides a quick tour of the library and installation instructions to get up and running.

  • TUTORIALS are a great place to start if you’re a beginner. This section will help you gain the basic skills you need to start using the library.

  • HOW-TO GUIDES show you how to achieve a specific goal, such as finetuning a pretrained model for language modeling or writing and sharing a custom model.

  • CONCEPTUAL GUIDES offer more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.

  • API describes all classes and functions:

    • MAIN CLASSES details the most important classes like configuration, model, tokenizer, and pipeline.
    • MODELS details the classes and functions related to each model implemented in the library.
    • INTERNAL HELPERS details utility classes and functions used internally.

Supported models and frameworks

The table below shows the current support in the library for each model: whether it has a Python tokenizer (called “slow”), a “fast” tokenizer backed by the 🤗 Tokenizers library, and whether it is supported in JAX (via Flax), PyTorch, and/or TensorFlow.

Model PyTorch support TensorFlow support Flax support
ALBERT ✅ ✅ ✅
ALIGN ✅ ❌ ❌
AltCLIP ✅ ❌ ❌
Audio Spectrogram Transformer ✅ ❌ ❌
Autoformer ✅ ❌ ❌
Bark ✅ ❌ ❌
BART ✅ ✅ ✅
BARThez ✅ ✅ ✅
BARTpho ✅ ✅ ✅
BEiT ✅ ❌ ✅
BERT ✅ ✅ ✅
Bert Generation ✅ ❌ ❌
BertJapanese ✅ ✅ ✅
BERTweet ✅ ✅ ✅
BigBird ✅ ❌ ✅
BigBird-Pegasus ✅ ❌ ❌
BioGpt ✅ ❌ ❌
BiT ✅ ❌ ❌
Blenderbot ✅ ✅ ✅
BlenderbotSmall ✅ ✅ ✅
BLIP ✅ ✅ ❌
BLIP-2 ✅ ❌ ❌
BLOOM ✅ ❌ ✅
BORT ✅ ✅ ✅
BridgeTower ✅ ❌ ❌
BROS ✅ ❌ ❌
ByT5 ✅ ✅ ✅
CamemBERT ✅ ✅ ❌
CANINE ✅ ❌ ❌
Chinese-CLIP ✅ ❌ ❌
CLAP ✅ ❌ ❌
CLIP ✅ ✅ ✅
CLIPSeg ✅ ❌ ❌
CLVP ✅ ❌ ❌
CodeGen ✅ ❌ ❌
CodeLlama ✅ ❌ ✅
Conditional DETR ✅ ❌ ❌
ConvBERT ✅ ✅ ❌
ConvNeXT ✅ ✅ ❌
ConvNeXTV2 ✅ ✅ ❌
CPM ✅ ✅ ✅
CPM-Ant ✅ ❌ ❌
CTRL ✅ ✅ ❌
CvT ✅ ✅ ❌
Data2VecAudio ✅ ❌ ❌
Data2VecText ✅ ❌ ❌
Data2VecVision ✅ ✅ ❌
DeBERTa ✅ ✅ ❌
DeBERTa-v2 ✅ ✅ ❌
Decision Transformer ✅ ❌ ❌
Deformable DETR ✅ ❌ ❌
DeiT ✅ ✅ ❌
DePlot ✅ ❌ ❌
Depth Anything ✅ ❌ ❌
DETA ✅ ❌ ❌
DETR ✅ ❌ ❌
DialoGPT ✅ ✅ ✅
DiNAT ✅ ❌ ❌
DINOv2 ✅ ❌ ❌
DistilBERT ✅ ✅ ✅
DiT ✅ ❌ ✅
DonutSwin ✅ ❌ ❌
DPR ✅ ✅ ❌
DPT ✅ ❌ ❌
EfficientFormer ✅ ✅ ❌
EfficientNet ✅ ❌ ❌
ELECTRA ✅ ✅ ✅
EnCodec ✅ ❌ ❌
Encoder decoder ✅ ✅ ✅
ERNIE ✅ ❌ ❌
ErnieM ✅ ❌ ❌
ESM ✅ ✅ ❌
FairSeq Machine-Translation ✅ ❌ ❌
Falcon ✅ ❌ ❌
FastSpeech2Conformer ✅ ❌ ❌
FLAN-T5 ✅ ✅ ✅
FLAN-UL2 ✅ ✅ ✅
FlauBERT ✅ ✅ ❌
FLAVA ✅ ❌ ❌
FNet ✅ ❌ ❌
FocalNet ✅ ❌ ❌
Funnel Transformer ✅ ✅ ❌
Fuyu ✅ ❌ ❌
Gemma ✅ ❌ ✅
GIT ✅ ❌ ❌
GLPN ✅ ❌ ❌
GPT Neo ✅ ❌ ✅
GPT NeoX ✅ ❌ ❌
GPT NeoX Japanese ✅ ❌ ❌
GPT-J ✅ ✅ ✅
GPT-Sw3 ✅ ✅ ✅
GPTBigCode ✅ ❌ ❌
GPTSAN-japanese ✅ ❌ ❌
Graphormer ✅ ❌ ❌
GroupViT ✅ ✅ ❌
HerBERT ✅ ✅ ✅
Hubert ✅ ✅ ❌
I-BERT ✅ ❌ ❌
IDEFICS ✅ ❌ ❌
ImageGPT ✅ ❌ ❌
Informer ✅ ❌ ❌
InstructBLIP ✅ ❌ ❌
Jukebox ✅ ❌ ❌
KOSMOS-2 ✅ ❌ ❌
LayoutLM ✅ ✅ ❌
LayoutLMv2 ✅ ❌ ❌
LayoutLMv3 ✅ ✅ ❌
LayoutXLM ✅ ❌ ❌
LED ✅ ✅ ❌
LeViT ✅ ❌ ❌
LiLT ✅ ❌ ❌
LLaMA ✅ ❌ ✅
Llama2 ✅ ❌ ✅
LLaVa ✅ ❌ ❌
Longformer ✅ ✅ ❌
LongT5 ✅ ❌ ✅
LUKE ✅ ❌ ❌
LXMERT ✅ ✅ ❌
M-CTC-T ✅ ❌ ❌
M2M100 ✅ ❌ ❌
MADLAD-400 ✅ ✅ ✅
Marian ✅ ✅ ✅
MarkupLM ✅ ❌ ❌
Mask2Former ✅ ❌ ❌
MaskFormer ✅ ❌ ❌
MatCha ✅ ❌ ❌
mBART ✅ ✅ ✅
mBART-50 ✅ ✅ ✅
MEGA ✅ ❌ ❌
Megatron-BERT ✅ ❌ ❌
Megatron-GPT2 ✅ ✅ ✅
MGP-STR ✅ ❌ ❌
Mistral ✅ ❌ ✅
Mixtral ✅ ❌ ❌
mLUKE ✅ ❌ ❌
MMS ✅ ✅ ✅
MobileBERT ✅ ✅ ❌
MobileNetV1 ✅ ❌ ❌
MobileNetV2 ✅ ❌ ❌
MobileViT ✅ ✅ ❌
MobileViTV2 ✅ ❌ ❌
MPNet ✅ ✅ ❌
MPT ✅ ❌ ❌
MRA ✅ ❌ ❌
MT5 ✅ ✅ ✅
MusicGen ✅ ❌ ❌
MVP ✅ ❌ ❌
NAT ✅ ❌ ❌
Nezha ✅ ❌ ❌
NLLB ✅ ❌ ❌
NLLB-MOE ✅ ❌ ❌
Nougat ✅ ✅ ✅
Nyströmformer ✅ ❌ ❌
OneFormer ✅ ❌ ❌
OpenAI GPT ✅ ✅ ❌
OpenAI GPT-2 ✅ ✅ ✅
OpenLlama ✅ ❌ ❌
OPT ✅ ✅ ✅
OWL-ViT ✅ ❌ ❌
OWLv2 ✅ ❌ ❌
PatchTSMixer ✅ ❌ ❌
PatchTST ✅ ❌ ❌
Pegasus ✅ ✅ ✅
PEGASUS-X ✅ ❌ ❌
Perceiver ✅ ❌ ❌
Persimmon ✅ ❌ ❌
Phi ✅ ❌ ❌
PhoBERT ✅ ✅ ✅
Pix2Struct ✅ ❌ ❌
PLBart ✅ ❌ ❌
PoolFormer ✅ ❌ ❌
Pop2Piano ✅ ❌ ❌
ProphetNet ✅ ❌ ❌
PVT ✅ ❌ ❌
QDQBert ✅ ❌ ❌
Qwen2 ✅ ❌ ❌
RAG ✅ ✅ ❌
REALM ✅ ❌ ❌
Reformer ✅ ❌ ❌
RegNet ✅ ✅ ✅
RemBERT ✅ ✅ ❌
ResNet ✅ ✅ ✅
RetriBERT ✅ ❌ ❌
RoBERTa ✅ ✅ ✅
RoBERTa-PreLayerNorm ✅ ✅ ✅
RoCBert ✅ ❌ ❌
RoFormer ✅ ✅ ✅
RWKV ✅ ❌ ❌
SAM ✅ ✅ ❌
SeamlessM4T ✅ ❌ ❌
SeamlessM4Tv2 ✅ ❌ ❌
SegFormer ✅ ✅ ❌
SEW ✅ ❌ ❌
SEW-D ✅ ❌ ❌
SigLIP ✅ ❌ ❌
Speech Encoder decoder ✅ ❌ ✅
Speech2Text ✅ ✅ ❌
SpeechT5 ✅ ❌ ❌
Splinter ✅ ❌ ❌
SqueezeBERT ✅ ❌ ❌
StableLm ✅ ❌ ❌
SwiftFormer ✅ ❌ ❌
Swin Transformer ✅ ✅ ❌
Swin Transformer V2 ✅ ❌ ❌
Swin2SR ✅ ❌ ❌
SwitchTransformers ✅ ❌ ❌
T5 ✅ ✅ ✅
T5v1.1 ✅ ✅ ✅
Table Transformer ✅ ❌ ❌
TAPAS ✅ ✅ ❌
TAPEX ✅ ✅ ✅
Time Series Transformer ✅ ❌ ❌
TimeSformer ✅ ❌ ❌
Trajectory Transformer ✅ ❌ ❌
Transformer-XL ✅ ✅ ❌
TrOCR ✅ ❌ ❌
TVLT ✅ ❌ ❌
TVP ✅ ❌ ❌
UL2 ✅ ✅ ✅
UMT5 ✅ ❌ ❌
UniSpeech ✅ ❌ ❌
UniSpeechSat ✅ ❌ ❌
UnivNet ✅ ❌ ❌
UPerNet ✅ ❌ ❌
VAN ✅ ❌ ❌
VideoMAE ✅ ❌ ❌
ViLT ✅ ❌ ❌
VipLlava ✅ ❌ ❌
Vision Encoder decoder ✅ ✅ ✅
VisionTextDualEncoder ✅ ✅ ✅
VisualBERT ✅ ❌ ❌
ViT ✅ ✅ ✅
ViT Hybrid ✅ ❌ ❌
VitDet ✅ ❌ ❌
ViTMAE ✅ ✅ ❌
ViTMatte ✅ ❌ ❌
ViTMSN ✅ ❌ ❌
VITS ✅ ❌ ❌
ViViT ✅ ❌ ❌
Wav2Vec2 ✅ ✅ ✅
Wav2Vec2-BERT ✅ ❌ ❌
Wav2Vec2-Conformer ✅ ❌ ❌
Wav2Vec2Phoneme ✅ ✅ ✅
WavLM ✅ ❌ ❌
Whisper ✅ ✅ ✅
X-CLIP ✅ ❌ ❌
X-MOD ✅ ❌ ❌
XGLM ✅ ✅ ✅
XLM ✅ ✅ ❌
XLM-ProphetNet ✅ ❌ ❌
XLM-RoBERTa ✅ ✅ ✅
XLM-RoBERTa-XL ✅ ❌ ❌
XLM-V ✅ ✅ ✅
XLNet ✅ ✅ ❌
XLS-R ✅ ✅ ✅
XLSR-Wav2Vec2 ✅ ✅ ✅
YOLOS ✅ ❌ ❌
YOSO ✅ ❌ ❌
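
To query the table programmatically, its rows can be parsed into a lookup. The sketch below is plain Python and does not use the library; it assumes each row is a model name followed by three marks in the column order PyTorch, TensorFlow, Flax, with ✅ meaning supported. The function names and the four sample rows (copied from the table) are illustrative only.

```python
FRAMEWORKS = ("pytorch", "tensorflow", "flax")

# A few rows copied from the table above, in the same column order.
SAMPLE_ROWS = """\
ALBERT ✅ ✅ ✅
Audio Spectrogram Transformer ✅ ❌ ❌
BLOOM ✅ ❌ ✅
CamemBERT ✅ ✅ ❌
"""


def parse_support(text):
    """Map each model name to a dict of framework -> bool."""
    support = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        # The last three tokens are the marks; the rest is the model name,
        # which may itself contain spaces.
        *name_parts, pt, tf, flax = parts
        support[" ".join(name_parts)] = {
            fw: mark == "✅" for fw, mark in zip(FRAMEWORKS, (pt, tf, flax))
        }
    return support


def models_supporting(support, framework):
    """Return the sorted model names available in the given framework."""
    return sorted(name for name, fws in support.items() if fws[framework])


support = parse_support(SAMPLE_ROWS)
print(models_supporting(support, "flax"))
```

Splitting from the right keeps multi-word model names such as "Audio Spectrogram Transformer" intact.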