How to use optimum and BetterTransformer?
Install dependencies
You can easily use the BetterTransformer integration with 🤗 Optimum. First, install the dependencies as follows:
pip install transformers accelerate optimum
Also, make sure to install the latest version of PyTorch by following the guidelines on the official PyTorch website. Note that the BetterTransformer API is only compatible with torch>=1.13, so make sure this version is installed in your environment before starting.
If you want to benefit from the scaled_dot_product_attention function (for decoder-based models), make sure to use torch>=2.0.
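The two version requirements above can be checked before going further. The snippet below is a minimal sketch using only the standard library; the helper names (parse_major_minor, bettertransformer_support, installed_torch_version) are hypothetical and are not part of 🤗 Optimum or PyTorch.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def bettertransformer_support(torch_version: str) -> Dict[str, bool]:
    """Map the thresholds stated above (1.13 and 2.0) to feature flags."""
    major_minor = parse_major_minor(torch_version)
    return {
        "bettertransformer": major_minor >= (1, 13),
        "scaled_dot_product_attention": major_minor >= (2, 0),
    }


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


version = installed_torch_version()
if version is None:
    print("torch is not installed")
else:
    print(version, bettertransformer_support(version))
```

Reading the version through importlib.metadata avoids importing torch just to inspect it, which keeps the check cheap.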
Step 1: Load your model
First, load your Hugging Face model using 🤗 Transformers. Make sure to pick one of the models supported by the BetterTransformer API:
>>> from transformers import AutoModel
>>> model_id = "roberta-base"
>>> model = AutoModel.from_pretrained(model_id)
Alternatively, load the model with device_map="auto" (this relies on 🤗 Accelerate, installed above):
>>> from transformers import AutoModel
>>> model_id = "roberta-base"
>>> model = AutoModel.from_pretrained(model_id, device_map="auto")
Step 2: Set your model on your preferred device
If you did not use device_map="auto" to load your model (or if your model does not support device_map="auto"), you can manually move your model to a GPU:
>>> model = model.to(0) # or model.to("cuda:0")
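If the script may also run on a machine without a GPU, the device can be chosen at runtime rather than hard-coded. This is a minimal sketch; pick_device is a hypothetical helper, and torch is imported defensively so the snippet still runs where PyTorch is absent.

```python
def pick_device(cuda_available: bool, gpu_index: int = 0) -> str:
    """Return a device string: the requested GPU if CUDA is usable, else the CPU."""
    return f"cuda:{gpu_index}" if cuda_available else "cpu"


try:
    import torch

    cuda_available = torch.cuda.is_available()
except ImportError:
    # Without PyTorch there is no GPU to select.
    cuda_available = False

device = pick_device(cuda_available)
print(device)

# With a model loaded as in Step 1:
# model = model.to(device)
```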
Step 3: Convert your model to BetterTransformer!
Now it is time to convert your model using the BetterTransformer API! You can run the commands below:
>>> from optimum.bettertransformer import BetterTransformer
>>> model = BetterTransformer.transform(model)
By default, BetterTransformer.transform will overwrite your model, which means that your previous native model cannot be used anymore. If you want to keep it for some reason, just add the flag keep_original_model=True!
>>> from optimum.bettertransformer import BetterTransformer
>>> model_bt = BetterTransformer.transform(model, keep_original_model=True)
If your model does not support the BetterTransformer API, this will be reported in an error trace. Note also that decoder-based models (OPT, BLOOM, etc.) are not supported yet, but this is on the PyTorch roadmap.
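When a script has to handle arbitrary checkpoints, one option is to fall back to the native model if the conversion fails. The sketch below assumes that an unsupported model surfaces as NotImplementedError or ValueError; that choice of exception types is an assumption, so check the error trace you actually get. convert_or_fallback is a hypothetical helper that takes the conversion function as an argument.

```python
from typing import Any, Callable, Tuple


def convert_or_fallback(model: Any, transform: Callable[[Any], Any]) -> Tuple[Any, bool]:
    """Try to convert `model`; return (model, converted_flag).

    The caught exception types are an assumption about how an unsupported
    model is reported, not a documented contract.
    """
    try:
        return transform(model), True
    except (NotImplementedError, ValueError) as err:
        print(f"Conversion skipped, using the native model: {err}")
        return model, False


# Intended usage with a model loaded as in Step 1:
# from optimum.bettertransformer import BetterTransformer
# model, converted = convert_or_fallback(model, BetterTransformer.transform)
```

Passing the conversion function in keeps the helper independent of the installed 🤗 Optimum version and makes it easy to exercise with a stub.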
Pipeline compatibility
The 🤗 Transformers pipeline is also compatible with this integration, and you can use BetterTransformer as an accelerator for your pipelines. The code snippet below shows how:
>>> from optimum.pipelines import pipeline
>>> pipe = pipeline("fill-mask", "distilbert-base-uncased", accelerator="bettertransformer")
>>> pipe("I am a student at [MASK] University.")
If you want to run a pipeline on a GPU device, run:
>>> from optimum.pipelines import pipeline
>>> pipe = pipeline("fill-mask", "distilbert-base-uncased", accelerator="bettertransformer", device=0)
>>> ...
You can also use transformers.pipeline as usual and pass the converted model directly:
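>>> from transformers import pipeline
>>> # model_bt is the converted model (it needs a masked-LM head for "fill-mask"); tokenizer is its matching tokenizer
>>> pipe = pipeline("fill-mask", model=model_bt, tokenizer=tokenizer, device=0)
>>> ...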
Please refer to the official documentation of pipeline for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!
Training compatibility
You can now benefit from the BetterTransformer API in your training scripts. Just make sure to convert your model back to its original version by calling BetterTransformer.reverse before saving it.
The code snippet below shows how:
import torch
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForCausalLM

# load the model directly on the GPU in half precision
with torch.device("cuda"):
    model = AutoModelForCausalLM.from_pretrained("gpt2-large", torch_dtype=torch.float16)

model = BetterTransformer.transform(model)

# do your inference or training here

# if training and want to save the model
model = BetterTransformer.reverse(model)
model.save_pretrained("fine_tuned_model")
model.push_to_hub("fine_tuned_model")
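To avoid forgetting the reverse step, the two calls can be wrapped in a single helper. This is a minimal sketch; save_in_original_format is a hypothetical name, and the reverse function is passed in so the helper itself does not import 🤗 Optimum.

```python
from typing import Any, Callable


def save_in_original_format(model: Any, reverse: Callable[[Any], Any], output_dir: str) -> Any:
    """Convert the model back with `reverse`, save it, and return the reversed model."""
    original = reverse(model)
    original.save_pretrained(output_dir)
    return original


# Intended usage at the end of a training script:
# from optimum.bettertransformer import BetterTransformer
# model = save_in_original_format(model, BetterTransformer.reverse, "fine_tuned_model")
```

The helper returns the reversed model, so convert it again with BetterTransformer.transform if training continues after the checkpoint.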