Optimum documentation

🤗 Optimum notebooks

You are viewing version v1.18.1. A newer version, v1.27.0, is available.

Below is a list of the notebooks associated with each accelerator in 🤗 Optimum.

Optimum Habana

Notebook	Description	Colab	Studio Lab
How to use DeepSpeed to train models with billions of parameters on Habana Gaudi	Shows how to use DeepSpeed to pre-train or fine-tune the 1.6B-parameter GPT2-XL model for causal language modeling on Habana Gaudi.

Optimum Intel

OpenVINO

Notebook	Description	Colab	Studio Lab
How to run inference with OpenVINO	Explains how to export your model to OpenVINO and run inference with OpenVINO Runtime on various tasks.
How to quantize a question answering model with NNCF	Shows how to apply post-training quantization to a question answering model using NNCF and accelerate inference with OpenVINO.
Compare outputs of a quantized Stable Diffusion model with its full-precision counterpart	Shows how to load two Stable Diffusion models with different precisions and compare their outputs.

Neural Compressor

Notebook	Description	Colab	Studio Lab
How to quantize a model with Intel Neural Compressor for text classification	Shows how to apply quantization while training your model using Intel Neural Compressor for any GLUE task.

Optimum ONNX Runtime

Notebook	Description	Colab	Studio Lab
How to quantize a model with ONNX Runtime for text classification	Shows how to apply static and dynamic quantization to a model using ONNX Runtime for any GLUE task.
How to fine-tune a model for text classification with ONNX Runtime	Shows how to fine-tune a DistilBERT model on GLUE tasks using ONNX Runtime.
How to fine-tune a model for summarization with ONNX Runtime	Shows how to fine-tune a T5 model on the BBC news corpus.
How to fine-tune DeBERTa for question-answering with ONNX Runtime	Shows how to fine-tune a DeBERTa model on the SQuAD dataset.
