Overview
🤗 Optimum provides an integration with Better Transformer, a fast path of the standard PyTorch Transformer APIs that delivers speedups on CPU and GPU through sparsity and fused kernels. For now, it supports Transformer encoders, that is, the fast path of nn.TransformerEncoderLayer; support for decoders and the training path is coming soon.
Quickstart
Since version 1.13, PyTorch has shipped a stable fast path for its standard Transformer APIs that provides out-of-the-box performance improvements for transformer-based models. You can benefit from speedups on most consumer-type devices, including CPUs and both older and newer NVIDIA GPUs. You can now use this feature in 🤗 Optimum together with Transformers for major models in the Hugging Face ecosystem.
In version 2.0, PyTorch includes a scaled dot-product attention (SDPA) function as part of torch.nn.functional. This function encompasses several implementations, one of which is selected depending on the inputs and the hardware in use. See the official documentation for more information.
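To make concrete what SDPA computes, here is a minimal pure-Python sketch of the underlying math for a single attention head, softmax(Q K^T / sqrt(d)) V, with no masking or dropout. It is for illustration only: `sdpa_reference` is a hypothetical name, not part of PyTorch or Optimum, and the real PyTorch function dispatches to optimized kernels rather than running a loop like this.

```python
import math


def sdpa_reference(query, key, value):
    """Illustrative single-head attention: softmax(Q K^T / sqrt(d)) V.

    query: list of L_q vectors, each of size d
    key:   list of L_k vectors, each of size d
    value: list of L_k vectors, each of size d_v
    Returns a list of L_q vectors, each of size d_v.
    """
    d = len(query[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(value[0])
    output = []
    for q in query:
        # Scaled dot product between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        # Numerically stable softmax over the scores.
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention output is the weighted average of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(d_v)]
        )
    return output


# Two identical keys receive equal weight, so the output is the mean of the values.
result = sdpa_reference(
    query=[[1.0, 0.0]],
    key=[[1.0, 0.0], [1.0, 0.0]],
    value=[[1.0, 2.0], [3.0, 4.0]],
)
```

In this toy input both keys match the query equally, so each value vector gets weight 0.5 and `result` is the element-wise mean of the two value vectors.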
We provide an integration with the BetterTransformer API to use this function in 🤗 Optimum, so that you can convert any supported 🤗 Transformers model to call the scaled_dot_product_attention function when relevant.
Supported models
The following models are supported:
- ALBERT
- BART
- BERT
- BERT-generation
- BLIP-2
- CamemBERT
- CLIP
- CodeGen
- Data2VecText
- DistilBert
- DeiT
- Electra
- Ernie
- FSMT
- GPT2
- GPT-J
- GPT-Neo
- GPT-NeoX
- HuBERT
- LayoutLM
- MarkupLM
- Marian
- MBart
- M2M100
- OPT
- ProphetNet
- RemBERT
- RoBERTa
- RoCBert
- RoFormer
- Splinter
- Tapas
- ViLT
- ViT
- ViT-MAE
- ViT-MSN
- Wav2Vec2
- Whisper
- XLMRoberta
- YOLOS
Let us know by opening an issue in 🤗 Optimum if you want more models to be supported, or check out the contribution guidelines if you want to add one yourself!
Quick usage
To use the BetterTransformer API, run the following commands:
>>> from transformers import AutoModelForSequenceClassification
>>> from optimum.bettertransformer import BetterTransformer
>>> model_hf = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
>>> model = BetterTransformer.transform(model_hf, keep_original_model=True)
You can leave keep_original_model=False if you want to overwrite the current model with its BetterTransformer version.
See the tutorials section for more details on how to use it, or check out the Google Colab demo!