To use the BetterTransformer integration with 🤗 Optimum, first install the dependencies as follows:
pip install transformers accelerate optimum
Also, make sure to install the latest version of PyTorch by following the guidelines on the PyTorch official website. Note that the
BetterTransformer API is only compatible with
torch>=1.13, so make sure this version is installed in your environment before starting.
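If you want to check the torch>=1.13 requirement programmatically, a small helper like the hypothetical meets_requirement below can do it. This is a minimal sketch, not part of Optimum or PyTorch; it only assumes that PyTorch version strings may carry local build suffixes such as "2.1.0+cu118".

```python
# Hypothetical helper (not part of Optimum): check a PyTorch version string
# against the torch>=1.13 requirement for BetterTransformer.
def meets_requirement(version: str, minimum=(1, 13)) -> bool:
    # Strip local build suffixes such as "+cu118" before parsing.
    major, minor = (int(part) for part in version.split("+")[0].split(".")[:2])
    return (major, minor) >= minimum

# After `import torch`, you would call: meets_requirement(torch.__version__)
```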
First, load your Hugging Face model using 🤗 Transformers. Make sure to download one of the models that is supported by the BetterTransformer API:
from transformers import AutoModel

model_id = "roberta-base"
model = AutoModel.from_pretrained(model_id)

You can also load your model directly on your GPU devices by passing device_map="auto":

from transformers import AutoModel

model_id = "roberta-base"
model = AutoModel.from_pretrained(model_id, device_map="auto")
If you did not use
device_map="auto" to load your model (or if your model does not support
device_map="auto"), you can manually move your model to a GPU:

model = model.to(0)  # or model.to("cuda:0")
Now it is time to convert your model using the
BetterTransformer API! You can run the commands below:
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)
BetterTransformer.transform will overwrite your model, which means that your previous native model cannot be used anymore. If you want to keep it for any reason, add the flag keep_original_model=True:
from optimum.bettertransformer import BetterTransformer

model_bt = BetterTransformer.transform(model, keep_original_model=True)
If your model does not support the
BetterTransformer API, an error trace will be displayed. Note also that decoder-based models (OPT, BLOOM, etc.) are not supported yet, but this is on the PyTorch roadmap for the future.
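If you want your own code to degrade gracefully when an unsupported architecture is passed in, you can wrap the conversion in a try/except. The try_transform helper below is a hedged sketch, not an Optimum API; it assumes the transform function raises NotImplementedError for unsupported models, which you should verify against the error you actually observe:

```python
# Hypothetical wrapper (not an Optimum API): fall back to the original model
# when the transform raises NotImplementedError for an unsupported architecture.
def try_transform(model, transform_fn):
    try:
        return transform_fn(model), True   # converted successfully
    except NotImplementedError:
        return model, False                # keep using the native model

# Usage sketch:
# model, converted = try_transform(model, BetterTransformer.transform)
```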
The 🤗 Transformers pipeline is also compatible with this integration, and you can use
BetterTransformer as an accelerator for your pipelines. The code snippet below shows how:
from optimum.pipelines import pipeline

pipe = pipeline("fill-mask", "distilbert-base-uncased", accelerator="bettertransformer")
pipe("I am a student at [MASK] University.")
If you want to run a pipeline on a GPU device, run:
from optimum.pipelines import pipeline

pipe = pipeline("fill-mask", "distilbert-base-uncased", accelerator="bettertransformer", device=0)
...
You can also use
transformers.pipeline as usual and pass the converted model directly:
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_id)  # load the tokenizer matching the model
pipe = pipeline("fill-mask", model=model_bt, tokenizer=tokenizer, device=0)
...
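To check whether BetterTransformer actually speeds up your workload, you can time the native and converted pipelines on the same input. The time_call helper below is a minimal, generic sketch (not part of Optimum or Transformers); for GPU workloads you would additionally need to synchronize the device, e.g. with torch.cuda.synchronize(), before reading the clock:

```python
import time

# Generic timing helper (illustrative sketch): average wall-clock time per call.
def time_call(fn, *args, repeats=10, warmup=2, **kwargs):
    for _ in range(warmup):          # warmup calls are excluded from the measurement
        fn(*args, **kwargs)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / repeats

# Usage sketch, comparing the two pipelines on the same prompt:
# latency = time_call(pipe, "I am a student at [MASK] University.")
```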
Please refer to the official documentation of
pipeline for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!