Optimum documentation

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v1.26.1).

🤗 Optimum notebooks

Below is a list of the notebooks associated with each accelerator in 🤗 Optimum.

Optimum Habana

Notebook	Description	Colab	Studio Lab
How to use DeepSpeed to train models with billions of parameters on Habana Gaudi	Shows how to use DeepSpeed to pre-train or fine-tune the 1.6B-parameter GPT2-XL for causal language modeling on Habana Gaudi.

Optimum Intel

OpenVINO

Notebook	Description	Colab	Studio Lab
How to run inference with OpenVINO	Explains how to export your model to OpenVINO and run inference with OpenVINO Runtime on various tasks.
How to quantize a question answering model with NNCF	Shows how to apply post-training quantization to a question answering model using NNCF and how to accelerate inference with OpenVINO.

Neural Compressor

Notebook	Description	Colab	Studio Lab
How to quantize a model with Intel Neural Compressor for text classification	Shows how to apply quantization while training your model using Intel Neural Compressor for any GLUE task.

Optimum ONNX Runtime

Notebook	Description	Colab	Studio Lab
How to quantize a model with ONNX Runtime for text classification	Shows how to apply static and dynamic quantization to a model using ONNX Runtime for any GLUE task.
How to fine-tune a model for text classification with ONNX Runtime	Shows how to fine-tune a DistilBERT model on GLUE tasks using ONNX Runtime.
How to fine-tune a model for summarization with ONNX Runtime	Shows how to fine-tune a T5 model on the BBC news corpus.
How to fine-tune DeBERTa for question-answering with ONNX Runtime	Shows how to fine-tune a DeBERTa model on the SQuAD dataset.
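
The first notebook in the table above covers both static and dynamic quantization. As a rough illustration of how the two differ, the sketch below uses plain Python rather than ONNX Runtime or 🤗 Optimum. The helper names (`affine_params`, `quantize`, `dequantize`) and the sample values are hypothetical and chosen only for this example. It assumes a simple 8-bit affine scheme: dynamic quantization derives the activation range from the tensor seen at inference time, while static quantization fixes the range ahead of time from calibration data.

```python
from typing import List, Tuple


def affine_params(lo: float, hi: float) -> Tuple[float, int]:
    """Return (scale, zero_point) mapping the range [lo, hi] onto uint8 [0, 255]."""
    # The representable range must contain 0.0 so that zero maps to an exact integer.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Quantize floats to uint8, clipping anything outside the chosen range."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map uint8 values back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical calibration batches, observed before deployment (static case).
calibration = [[-1.0, 0.5, 2.0], [-0.5, 1.5, 3.0]]
# Hypothetical activation seen at inference time; 5.0 lies outside the calibrated range.
activation = [-0.2, 0.1, 5.0]

# Static: parameters are computed once from the calibration data.
flat = [v for batch in calibration for v in batch]
static_scale, static_zp = affine_params(min(flat), max(flat))
static_out = dequantize(quantize(activation, static_scale, static_zp),
                        static_scale, static_zp)

# Dynamic: parameters are computed from the activation itself at inference time.
dyn_scale, dyn_zp = affine_params(min(activation), max(activation))
dynamic_out = dequantize(quantize(activation, dyn_scale, dyn_zp),
                         dyn_scale, dyn_zp)

print("static :", static_out)   # the out-of-range value is clipped near 3.0
print("dynamic:", dynamic_out)  # every value stays within one step of the input
```

In this sketch the static variant clips the out-of-range activation because its range was fixed in advance, while the dynamic variant adapts to it at the cost of computing the range on every call. Real toolchains add details not shown here, such as per-channel scales and operator fusion.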
