🤗 Optimum

🤗 Optimum is an extension of Transformers that provides a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency.

The AI ecosystem evolves quickly, and more and more specialized hardware, each with its own optimizations, emerges every day. Optimum therefore enables developers to use any of these platforms efficiently, with the same ease inherent to Transformers.

🤗 Optimum is distributed as a collection of packages - check out the links below for an in-depth look at each one.

Habana

Maximize training throughput and efficiency with Habana's Gaudi processor

Intel

Optimize your model to speed up inference with OpenVINO and Neural Compressor

AWS Trainium/Inferentia

Accelerate your training and inference workflows with AWS Trainium and AWS Inferentia

NVIDIA

Accelerate inference with NVIDIA TensorRT-LLM on the NVIDIA platform

AMD

Enable performance optimizations for AMD Instinct GPUs and AMD Ryzen AI NPUs

FuriosaAI

Fast and efficient inference on FuriosaAI WARBOY

ONNX Runtime

Apply quantization and graph optimization to accelerate the training and inference of Transformers models with ONNX Runtime

BetterTransformer

A one-liner integration to use PyTorch's BetterTransformer with Transformers models