Optimum for Intel Gaudi

Optimum for Intel Gaudi is the interface between the Transformers and Diffusers libraries and Intel® Gaudi® AI Accelerators (HPUs). It provides a set of tools that enable easy model loading, training and inference in single- and multi-HPU settings for various downstream tasks, as shown in the tables below.

HPUs offer fast model training and inference as well as a great price-performance ratio. Check out this blog post about BERT pre-training and this post benchmarking Intel Gaudi 2 against NVIDIA A100 GPUs for concrete examples. If you are not familiar with HPUs, we recommend you take a look at our conceptual guide.

The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:

In the tables below, ✅ means single-card, multi-card and DeepSpeed have all been validated.

Transformers

Architecture	Training	Inference	Tasks
BERT	✅	✅	text classification, question answering, language modeling
RoBERTa	✅	✅	question answering, language modeling
ALBERT	✅	✅	question answering, language modeling
DistilBERT	✅	✅	question answering, language modeling
GPT2	✅	✅	language modeling, text generation
BLOOM(Z)		DeepSpeed	text generation
StarCoder		Single card	text generation
GPT-J	DeepSpeed	Single card, DeepSpeed	language modeling, text generation
GPT-NeoX	DeepSpeed	DeepSpeed	language modeling, text generation
OPT		DeepSpeed	text generation
Llama 2 / CodeLlama / Llama 3 / Llama Guard	✅	✅	language modeling, text generation, question answering, text classification (Llama Guard)
StableLM		Single card	text generation
Falcon	LoRA	✅	text generation
CodeGen		Single card	text generation
MPT		Single card	text generation
Mistral		Single card	text generation
Phi	✅	Single card	language modeling, text generation
Mixtral		Single card	text generation
Gemma	✅	Single card	language modeling, text generation
Qwen2	Single card	Single card	language modeling, text generation
Persimmon		Single card	text generation
T5 / Flan T5	✅	✅	summarization, translation, question answering
BART		Single card	summarization, translation, question answering
ViT	✅	✅	image classification
Swin	✅	✅	image classification
Wav2Vec2	✅	✅	audio classification, speech recognition
Whisper	✅	✅	speech recognition
SpeechT5		Single card	text to speech
CLIP	✅	✅	contrastive image-text training
BridgeTower	✅	✅	contrastive image-text training
ESMFold		Single card	protein folding
Blip		Single card	visual question answering, image to text
OWLViT		Single card	zero shot object detection
ClipSeg		Single card	object segmentation

Diffusers

Architecture	Training	Inference	Tasks
Stable Diffusion	textual inversion, ControlNet	Single card	text-to-image generation
Stable Diffusion XL	fine-tuning	Single card	text-to-image generation
LDM3D		Single card	text-to-image generation
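
To make the single-card inference column above more concrete, here is a minimal sketch of text-to-image generation with Stable Diffusion on one HPU. It is not taken from this page: the class names `GaudiStableDiffusionPipeline` and `GaudiDDIMScheduler` and the keyword arguments `use_habana`, `use_hpu_graphs` and `gaudi_config` are written from memory of the 🤗 Optimum Habana README and should be checked against the reference documentation. The function name `generate_images` and its parameters are hypothetical, and the checkpoint and Gaudi configuration names are left for the caller to supply.

```python
def generate_images(model_name, prompts, gaudi_config):
    """Generate images for a list of prompts on a single HPU (sketch).

    model_name and gaudi_config are Hub repository names chosen by the caller.
    """
    # Imported lazily so that defining this function does not require
    # optimum-habana; actually calling it needs the library and an HPU.
    from optimum.habana.diffusers import (
        GaudiDDIMScheduler,
        GaudiStableDiffusionPipeline,
    )

    # Assumed API: a Gaudi-specific scheduler loaded from the checkpoint's
    # "scheduler" subfolder, as in the upstream Diffusers layout.
    scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

    pipeline = GaudiStableDiffusionPipeline.from_pretrained(
        model_name,
        scheduler=scheduler,
        use_habana=True,       # run on HPU rather than CPU
        use_hpu_graphs=True,   # assumed flag for HPU graph execution
        gaudi_config=gaudi_config,
    )
    return pipeline(prompt=prompts).images
```

Calling `generate_images` with a Stable Diffusion checkpoint name, a list of prompt strings and a Gaudi configuration name should return a list of images, provided the assumptions above hold for the installed version.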

TRL

Architecture	Training	Tasks
Llama 2	✅	DPO Pipeline
Llama 2	✅	PPO Pipeline
Stable Diffusion	✅	DDPO Pipeline

Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers libraries may also work. You can refer to this section for using them with 🤗 Optimum Habana. In addition, this page explains how to modify any example from the 🤗 Transformers library to make it work with 🤗 Optimum Habana.
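
As a rough illustration of what adapting a 🤗 Transformers example usually involves, the sketch below swaps `Trainer` and `TrainingArguments` for their Gaudi counterparts. It is not taken from this page: `GaudiTrainer`, `GaudiTrainingArguments` and the arguments `use_habana`, `use_lazy_mode` and `gaudi_config_name` are written from memory of the 🤗 Optimum Habana README and should be checked against the reference documentation. The function name `build_gaudi_trainer` and its parameters are hypothetical.

```python
def build_gaudi_trainer(model, train_dataset, output_dir, gaudi_config_name):
    """Return a trainer configured for HPU training (sketch).

    model and train_dataset are whatever the original Transformers
    example already builds; gaudi_config_name is a Gaudi configuration
    name chosen by the caller.
    """
    # Imported lazily so that defining this function does not require
    # optimum-habana; actually calling it needs the library and an HPU.
    from optimum.habana import GaudiTrainer, GaudiTrainingArguments

    # GaudiTrainingArguments stands in for transformers.TrainingArguments.
    training_args = GaudiTrainingArguments(
        output_dir=output_dir,
        use_habana=True,                      # train on HPU
        use_lazy_mode=True,                   # assumed flag for lazy-mode execution
        gaudi_config_name=gaudi_config_name,  # Gaudi-specific settings
    )

    # GaudiTrainer stands in for transformers.Trainer.
    return GaudiTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
    )
```

The rest of the example script (tokenization, dataset loading, metrics) would typically stay as it is; only the trainer and its arguments change under these assumptions.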

Tutorials

Learn the basics and become familiar with training transformers on HPUs with 🤗 Optimum. Start here if you are using 🤗 Optimum Habana for the first time!

How-to guides

Practical guides to help you achieve a specific goal. Take a look at these guides to learn how to use 🤗 Optimum Habana to solve real-world problems.

Conceptual guides

High-level explanations for building a better understanding of important topics such as HPUs.

Reference

Technical descriptions of how the classes and methods of 🤗 Optimum Habana work.
