Optimum for Intel Gaudi

Optimum for Intel Gaudi is the interface between the Transformers and Diffusers libraries and Intel® Gaudi® AI Accelerators (HPUs). It provides a set of tools that enable easy model loading, training and inference in single- and multi-HPU settings for the downstream tasks listed in the tables below.

HPUs offer fast model training and inference as well as a great price-performance ratio. For concrete examples, check out this blog post about BERT pre-training and this post benchmarking Intel Gaudi 2 against NVIDIA A100 GPUs. If you are not familiar with HPUs, we recommend you take a look at our conceptual guide.

The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:

In the tables below, ✅ means single-card, multi-card and DeepSpeed have all been validated.

Transformers

Architecture	Training	Inference	Tasks
BERT	✅	✅	text classification, question answering, language modeling
RoBERTa	✅	✅	question answering, language modeling
ALBERT	✅	✅	question answering, language modeling
DistilBERT	✅	✅	question answering, language modeling
GPT2	✅	✅	language modeling, text generation
BLOOM(Z)		DeepSpeed	text generation
StarCoder / StarCoder2	✅	Single card	language modeling, text generation
GPT-J	DeepSpeed	Single card, DeepSpeed	language modeling, text generation
GPT-NeoX	DeepSpeed	DeepSpeed	language modeling, text generation
OPT		DeepSpeed	text generation
Llama 2 / CodeLlama / Llama 3 / Llama Guard / Granite	✅	✅	language modeling, text generation, question answering, text classification (Llama Guard)
StableLM		Single card	text generation
Falcon	LoRA	✅	text generation
CodeGen		Single card	text generation
MPT		Single card	text generation
Mistral		Single card	text generation
Phi	✅	Single card	language modeling, text generation
Mixtral		Single card	text generation
Gemma	✅	Single card	language modeling, text generation
Qwen2	Single card	Single card	language modeling, text generation
Persimmon		Single card	text generation
Mamba		Single card	text generation
T5 / Flan T5	✅	✅	summarization, translation, question answering
BART		Single card	summarization, translation, question answering
ViT	✅	✅	image classification
Swin	✅	✅	image classification
Wav2Vec2	✅	✅	audio classification, speech recognition
Whisper	✅	✅	speech recognition
SpeechT5		Single card	text to speech
CLIP	✅	✅	contrastive image-text training
BridgeTower	✅	✅	contrastive image-text training
ESMFold		Single card	protein folding
Blip		Single card	visual question answering, image to text
OWLViT		Single card	zero shot object detection
ClipSeg		Single card	object segmentation
Llava / Llava-next		Single card	image to text

Diffusers

Architecture	Training	Inference	Tasks
Stable Diffusion	textual inversion, ControlNet	Single card	text-to-image generation
Stable Diffusion XL	fine-tuning	Single card	text-to-image generation
LDM3D		Single card	text-to-image generation

TRL

Architecture	Training	Tasks
Llama 2	✅	DPO Pipeline
Llama 2	✅	PPO Pipeline
Stable Diffusion	✅	DDPO Pipeline

Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers libraries may also work. You can refer to this section for using them with 🤗 Optimum Habana. In addition, this page explains how to modify any example from the 🤗 Transformers library to make it work with 🤗 Optimum Habana.
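
Adapting a 🤗 Transformers training script usually comes down to swapping `Trainer` and `TrainingArguments` for their Gaudi counterparts. The snippet below is a minimal sketch of that idea, not the official recipe. The helper names `gaudi_training_kwargs` and `build_gaudi_trainer` are hypothetical, and the Gaudi configuration name `Habana/bert-base-uncased` is only an assumed example. `use_habana`, `use_lazy_mode` and `gaudi_config_name` are `GaudiTrainingArguments` options in the releases we are aware of, but check them against the version you have installed.

```python
from typing import Any, Dict


def gaudi_training_kwargs(
    output_dir: str,
    gaudi_config_name: str = "Habana/bert-base-uncased",  # assumed example config
) -> Dict[str, Any]:
    """Return the HPU-specific arguments added on top of the usual training arguments."""
    return {
        "output_dir": output_dir,
        "use_habana": True,        # run on HPU instead of CPU/GPU
        "use_lazy_mode": True,     # lazy (graph) execution mode rather than eager
        "gaudi_config_name": gaudi_config_name,
    }


def build_gaudi_trainer(model, train_dataset, output_dir: str):
    """Hypothetical helper: build a GaudiTrainer where a script would build a Trainer."""
    # Imported lazily so this module still loads on machines without optimum-habana.
    from optimum.habana import GaudiTrainer, GaudiTrainingArguments

    args = GaudiTrainingArguments(**gaudi_training_kwargs(output_dir))
    return GaudiTrainer(model=model, args=args, train_dataset=train_dataset)
```

On a machine with HPUs and the library installed, `build_gaudi_trainer(model, train_dataset, "./out").train()` would then take the place of the usual `Trainer(...).train()` call. The rest of the script (dataset loading, tokenization, metrics) is assumed to stay unchanged.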

Tutorials

Learn the basics and become familiar with training transformers on HPUs with 🤗 Optimum. Start here if you are using 🤗 Optimum Habana for the first time!

How-to guides

Practical guides to help you achieve a specific goal. Take a look at these guides to learn how to use 🤗 Optimum Habana to solve real-world problems.

Conceptual guides

High-level explanations for building a better understanding of important topics such as HPUs.

Reference

Technical descriptions of how the Habana classes and methods of 🤗 Optimum Habana work.
