🤗 Optimum
🤗 Optimum is an extension of Transformers that provides a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency.
The AI ecosystem evolves quickly, and new specialized hardware, each with its own optimizations, emerges every day. Optimum lets developers use any of these platforms efficiently, with the same ease of use as Transformers.
🤗 Optimum is distributed as a collection of packages. Check out the links below for an in-depth look at each one.
Hardware partners
The packages below enable you to get the best of the 🤗 Hugging Face ecosystem on various types of devices.
Accelerate inference with NVIDIA TensorRT-LLM on the NVIDIA platform
Enable performance optimizations for AMD Instinct GPUs and AMD Ryzen AI NPUs
Optimize your model to speed up inference with OpenVINO and Neural Compressor
Accelerate your training and inference workflows with AWS Trainium and AWS Inferentia
Accelerate your training and inference workflows with Google TPUs
Maximize training throughput and efficiency with Habana's Gaudi processor
Fast and efficient inference on FuriosaAI WARBOY
Some packages also provide hardware-agnostic features (e.g. the Intel Neural Compressor (INC) interface in Optimum Intel).
Open-source integrations
🤗 Optimum also supports a variety of open-source frameworks to make model optimization easy.
Apply quantization and graph optimization to accelerate the training and inference of Transformers models with ONNX Runtime
Export your PyTorch or TensorFlow model to different formats such as ONNX and TFLite
A one-liner integration to use PyTorch's BetterTransformer with Transformers models
Create and compose custom graph transformations to optimize PyTorch Transformers models with Torch FX
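As a rough illustration of the ONNX Runtime integration listed above, the sketch below exports a Transformers checkpoint to ONNX and wraps it in a standard pipeline. It assumes that `optimum[onnxruntime]` and `transformers` are installed and that the installed Optimum release accepts the `export=True` argument. The helper name `build_onnx_classifier` is hypothetical and not part of the library.

```python
def build_onnx_classifier(model_id: str):
    """Export a sequence-classification checkpoint to ONNX and return a pipeline.

    Hypothetical helper for illustration; not part of Optimum itself.
    """
    # Imports are deferred so that defining this helper does not require
    # the optional dependencies to be installed.
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # export=True converts the PyTorch weights to ONNX when loading
    # (assumption: a recent Optimum release that supports this argument).
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    # The ONNX Runtime model can be passed to the regular Transformers pipeline.
    return pipeline("text-classification", model=model, tokenizer=tokenizer)


# Example call (downloads the checkpoint from the Hub, so it is left commented out):
# classifier = build_onnx_classifier("distilbert-base-uncased-finetuned-sst-2-english")
# print(classifier("Optimum makes deployment straightforward."))
```

The checkpoint ID in the commented call is only an example; any sequence-classification checkpoint supported by the ONNX exporter should work the same way.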