Optimum for Intel® Gaudi® AI Accelerator

Optimum for Intel Gaudi AI Accelerator is the interface between Hugging Face libraries (Transformers, Diffusers, Accelerate,…) and Intel Gaudi AI Accelerators (HPUs). It provides a set of tools for model loading, training, and inference in single- and multi-HPU settings for various downstream tasks, as shown in the tables below.

Tutorials

Learn the basics and become familiar with training transformers on HPUs with 🤗 Optimum. Start here if you are using 🤗 Optimum for Intel Gaudi for the first time!

How-to guides

Practical guides to help you achieve a specific goal. Take a look at these guides to learn how to use 🤗 Optimum for Intel Gaudi to solve real-world problems.

The Intel Gaudi AI accelerator family currently includes three product generations: Intel Gaudi 1, Intel Gaudi 2, and Intel Gaudi 3. Each server is equipped with 8 devices, known as Habana Processing Units (HPUs), each providing 128 GB of memory on Gaudi 3, 96 GB on Gaudi 2, and 32 GB on first-generation Gaudi. For more details on the underlying hardware architecture, check out the Gaudi Architecture Overview. The Optimum for Intel Gaudi library is fully compatible with all three generations of Gaudi accelerators.
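
As a quick illustration of the figures above, the following sketch adds up device memory for one 8-HPU server of each generation. It assumes the quoted capacities are per device; the names `HPU_MEMORY_GB` and `server_memory_gb` are made up for this example and are not part of any library. The total is a simple sum and says nothing about how a given workload can actually be sharded across devices.

```python
# Per-device memory in GB, as quoted in the paragraph above
# (assumption: the figures are per HPU, not per server).
HPU_MEMORY_GB = {"Gaudi 1": 32, "Gaudi 2": 96, "Gaudi 3": 128}
DEVICES_PER_SERVER = 8


def server_memory_gb(generation: str, devices: int = DEVICES_PER_SERVER) -> int:
    """Return the summed device memory for `devices` HPUs of one generation."""
    return HPU_MEMORY_GB[generation] * devices


for generation in HPU_MEMORY_GB:
    total = server_memory_gb(generation)
    print(f"{generation}: {total} GB across {DEVICES_PER_SERVER} HPUs")
```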

For in-depth examples of running workloads on Gaudi, explore the following blog posts:

The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:

In the tables below, ✅ means single-card, multi-card and DeepSpeed have all been validated.

Transformers:

Architecture	Training	Inference	Tasks
BERT	✅	✅	text classification, question answering, language modeling, text feature extraction
RoBERTa	✅	✅	question answering, language modeling
ALBERT	✅	✅	question answering, language modeling
DistilBERT	✅	✅	question answering, language modeling
GPT2	✅	✅	language modeling, text generation
BLOOM(Z)		DeepSpeed	text generation
StarCoder / StarCoder2	✅	Single card	language modeling, text generation
GPT-J	DeepSpeed	Single card, DeepSpeed	language modeling, text generation
GPT-Neo		Single card	text generation
GPT-NeoX	DeepSpeed	DeepSpeed	language modeling, text generation
OPT		DeepSpeed	text generation
Llama 2 / CodeLlama / Llama 3 / Llama Guard / Granite	✅	✅	language modeling, text generation, question answering, text classification (Llama Guard)
StableLM		Single card	text generation
Falcon	LoRA	✅	text generation
CodeGen		Single card	text generation
MPT		Single card	text generation
Mistral		Single card	text generation
Phi	✅	Single card	language modeling, text generation
Mixtral		Single card	text generation
Gemma	✅	Single card	language modeling, text generation
Gemma2		✅	text generation
Qwen2	Single card	Single card	language modeling, text generation
Qwen2-MoE		Single card	text generation
Persimmon		Single card	text generation
XGLM		Single card	text generation
Cohere		Single card	text generation
T5 / Flan T5	✅	✅	summarization, translation, question answering
BART		Single card	summarization, translation, question answering
ViT	✅	✅	image classification
Swin	✅	✅	image classification
Wav2Vec2	✅	✅	audio classification, speech recognition
Whisper	✅	✅	speech recognition
SpeechT5		Single card	text to speech
CLIP	✅	✅	contrastive image-text training
BridgeTower	✅	✅	contrastive image-text training
ESMFold		Single card	protein folding
Blip		Single card	visual question answering, image to text
OWLViT		Single card	zero shot object detection
ClipSeg		Single card	object segmentation
Llava / Llava-next		Single card	image to text
Paligemma		Single card	image to text
idefics2	LoRA	Single card	image to text
SAM		Single card	object segmentation
VideoMAE		Single card	video classification
TableTransformer		Single card	table object detection
DETR		Single card	object detection
Mllama	LoRA	✅	image to text
Video-LLaVA		Single card	video comprehension
MiniCPM3		Single card	text generation
Baichuan2	DeepSpeed	Single card	language modeling, text generation
DeepSeek-V2	✅	✅	text generation
DeepSeek-V3		✅	text generation
ChatGLM	DeepSpeed	Single card	language modeling, text generation
Qwen2-VL		Single card	image to text

Diffusers:

Architecture	Training	Inference	Tasks
Stable Diffusion	textual inversion, ControlNet	Single card	text-to-image generation
Stable Diffusion XL	fine-tuning	Single card	text-to-image generation
Stable Diffusion Depth2img		Single card	depth-to-image generation
LDM3D		Single card	text-to-image generation
FLUX.1	fine-tuning	Single card	text-to-image generation
Text to Video		Single card	text-to-video generation
i2vgen-xl		Single card	image-to-video generation

PyTorch Image Models/TIMM:

Architecture	Training	Inference	Tasks
FastViT		Single card	image classification

TRL:

Architecture	Training	Tasks
Llama 2	✅	DPO Pipeline
Llama 2	✅	PPO Pipeline
Stable Diffusion	✅	DDPO Pipeline
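
To make the legend for the tables above concrete, here is a small sketch that expands a table cell into the set of validated device distributions. The rows are copied from the Transformers table; the helper names (`modes`, `is_validated`) are hypothetical and exist only for this illustration. The sketch only recognizes device distributions, so cells that name a training method instead (such as LoRA or fine-tuning) yield an empty set.

```python
# Device distributions that a ✅ stands for, per the legend above.
ALL_MODES = frozenset({"single card", "multi card", "deepspeed"})

# A few rows from the Transformers table: architecture -> (training, inference).
VALIDATED = {
    "BERT": ("✅", "✅"),
    "GPT-J": ("DeepSpeed", "Single card, DeepSpeed"),
    "Mistral": ("", "Single card"),
}


def modes(cell: str) -> frozenset:
    """Expand one table cell into the set of device distributions it names."""
    text = cell.strip().lower()
    if text == "✅":
        return ALL_MODES
    # An empty cell means nothing has been validated for that phase.
    return frozenset(m for m in ALL_MODES if m in text)


def is_validated(arch: str, phase: str, mode: str) -> bool:
    """Check whether `mode` is validated for `arch` in 'training' or 'inference'."""
    training, inference = VALIDATED[arch]
    cell = training if phase == "training" else inference
    return mode.lower() in modes(cell)


print(sorted(modes("✅")))
print(is_validated("GPT-J", "inference", "DeepSpeed"))
```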

Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers libraries may also work. You can refer to this section for using them with 🤗 Optimum for Intel Gaudi. In addition, this page explains how to modify any example from the 🤗 Transformers library to make it work with 🤗 Optimum for Intel Gaudi.
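
As a rough idea of what adapting a 🤗 Transformers training script typically involves, the sketch below swaps `Trainer` and `TrainingArguments` for their Gaudi counterparts from `optimum.habana`. The wrapper `build_gaudi_trainer`, its defaults, and the choice of the `Habana/bert-base-uncased` Gaudi configuration are assumptions made for this example, not a prescribed recipe. The right configuration depends on your model, and running the function requires `optimum-habana` and an HPU.

```python
def build_gaudi_trainer(
    model,
    train_dataset,
    output_dir="./out",
    gaudi_config_name="Habana/bert-base-uncased",  # assumed example config
):
    """Mirror a typical `Trainer` setup using the Gaudi counterparts."""
    # Imported lazily so this file still loads where optimum-habana is absent.
    from optimum.habana import GaudiTrainer, GaudiTrainingArguments

    args = GaudiTrainingArguments(
        output_dir=output_dir,
        use_habana=True,  # run on HPU instead of CPU/GPU
        use_lazy_mode=True,  # use lazy-mode graph execution
        gaudi_config_name=gaudi_config_name,
    )
    # Drop-in replacement for transformers.Trainer(model=..., args=..., ...).
    return GaudiTrainer(model=model, args=args, train_dataset=train_dataset)
```

The rest of a script (dataset preparation, tokenization, the `trainer.train()` call) would usually stay as it is in the original 🤗 Transformers example.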
