Scaling Transformer Model Performance with Intel AI

Intel and Hugging Face are building powerful optimization tools to accelerate training and inference with Transformers. Learn how the Intel AI Software Portfolio and Intel Xeon Scalable processors can help you get the best performance and productivity from your models.

Democratizing Machine Learning Acceleration

Intel and Hugging Face are collaborating to build state-of-the-art hardware and software acceleration to train, fine-tune, and predict with Transformers. The hardware acceleration is driven by the Intel Xeon Scalable CPU platform, and the software acceleration by a rich suite of optimized AI tools, frameworks, and libraries.

huggingface@intel:~
from transformers import AutoModelForQuestionAnswering
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel.neural_compressor import INCQuantizer, INCModelForQuestionAnswering

model_name = "distilbert-base-cased-distilled-squad"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
# The directory where the quantized model will be saved
save_dir = "quantized_model"
# Load the quantization configuration detailing the quantization we wish to apply
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
# Apply dynamic quantization and save the resulting model
quantizer.quantize(quantization_config=quantization_config, save_directory=save_dir)

# Load the resulting quantized model, which can be hosted on the HF hub or locally
loaded_model = INCModelForQuestionAnswering.from_pretrained(save_dir)
huggingface@intel:~
from transformers import AutoFeatureExtractor, pipeline
from optimum.intel.openvino import OVModelForImageClassification

model_id = "google/vit-base-patch16-224"
# Load a model from the HF hub and convert it to the OpenVINO format
model = OVModelForImageClassification.from_pretrained(model_id, from_transformers=True)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
cls_pipeline = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
# Run inference with OpenVINO Runtime using Transformers pipelines
outputs = cls_pipeline(url)

Scale Transformer Workloads with Intel AI

Hardware performance and developer productivity at unmatched scale

Easily optimize models for production

Optimum Intel is the interface between Hugging Face's Transformers library and the tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures. Intel Neural Compressor is an open-source library that implements the most popular compression techniques, such as quantization, pruning, and knowledge distillation. OpenVINO is an open-source toolkit for optimizing and deploying models with high-performance inference on Intel devices. With Optimum Intel, you can apply state-of-the-art optimization techniques to your Transformer models with minimal effort.
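To make the "minimal effort" point concrete, here is a brief sketch (not one of the page's own snippets) that loads the dynamically quantized model produced in the first example above and serves it through a standard Transformers pipeline; the question and context strings are placeholders.

from transformers import AutoTokenizer, pipeline
from optimum.intel.neural_compressor import INCModelForQuestionAnswering

# Load the INT8 model saved to "quantized_model" by the INCQuantizer example above
model = INCModelForQuestionAnswering.from_pretrained("quantized_model")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")

# The quantized model slots into the usual Transformers pipeline API
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)
result = qa_pipeline(
    question="Which library provides compression techniques such as quantization?",
    context="Intel Neural Compressor is an open-source library that implements quantization, pruning and knowledge distillation.",
)
print(result["answer"], round(result["score"], 3))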

Learn more about Optimum Intel


Get high performance on CPU instances

3rd Generation Intel® Xeon® Scalable processors offer a balanced architecture that delivers built-in AI acceleration (Intel DL Boost with VNNI) and advanced security capabilities. This allows you to place your Transformer workloads where they perform best while minimizing costs.
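One common way to tap that built-in acceleration from software is Intel Extension for PyTorch; the sketch below is a minimal illustration, assuming a Xeon CPU with bfloat16 support, a placeholder Hugging Face model, and that intel-extension-for-pytorch is installed.

import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder model; any Transformers model for CPU inference works the same way
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

# Apply CPU-specific operator fusions and prepare the model for bfloat16 execution
model = ipex.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("Intel Xeon makes CPU inference fast.", return_tensors="pt")
# Run inference under bfloat16 autocast on the CPU
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))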

Learn more about Intel AI Hardware


Quickly go from concept to scale

With hardware and software optimized for AI workloads, an open, familiar, standards-based software environment, and the hardware flexibility to create the deployment you want, Intel can help accelerate your time to production.

Explore Intel Developer Zone


Accelerating Transformer Performance with Intel AI

Learn more about accelerating Hugging Face models with Intel hardware and software