Optimum for Intel® Gaudi® AI Accelerator

Optimum for Intel Gaudi AI accelerator is the interface between Hugging Face libraries (Transformers, Diffusers, Accelerate, …) and Intel Gaudi AI Accelerators (HPUs). It provides a set of tools that enable easy model loading, training, and inference in single- and multi-HPU settings for the various downstream tasks shown in the tables below.
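As a minimal sketch of what training with this interface looks like, the library provides drop-in replacements for the Transformers `Trainer` and `TrainingArguments` classes. The snippet below assumes a Gaudi machine with `optimum[habana]` installed; the model and Gaudi configuration names are illustrative examples, not the only supported values.

```python
# Sketch: fine-tuning on HPU with optimum-habana's drop-in Trainer classes.
# Assumes a Gaudi machine with `optimum[habana]` installed.
from transformers import AutoModelForSequenceClassification
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,    # run training on HPU devices
    use_lazy_mode=True, # use HPU lazy-mode graph execution
    # Pre-tuned HPU configuration hosted on the Hugging Face Hub (illustrative):
    gaudi_config_name="Habana/bert-base-uncased",
)

trainer = GaudiTrainer(
    model=model,
    args=args,
    # train_dataset=..., eval_dataset=..., etc., as with a regular Trainer
)
trainer.train()
```

Aside from the Gaudi-specific arguments, the workflow mirrors a standard Transformers training script, which is what makes porting existing code straightforward.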

The Intel Gaudi AI accelerator family currently includes three product generations: Intel Gaudi 1, Intel Gaudi 2, and Intel Gaudi 3. Each server is equipped with 8 devices, known as Habana Processing Units (HPUs), providing 128GB of memory on Gaudi 3, 96GB on Gaudi 2, and 32GB on first-generation Gaudi. For more details on the underlying hardware architecture, check out the Gaudi Architecture Overview. The Optimum for Intel Gaudi library is fully compatible with all three generations of Gaudi accelerators.

For in-depth examples of running workloads on Gaudi, explore the Gaudi-related posts on the Hugging Face blog.

The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:

In the tables below, ✅ means single-card, multi-card and DeepSpeed have all been validated.

  • Transformers:

| Architecture | Training | Inference | Tasks |
|---|---|---|---|
| BERT | ✅ | ✅ | text classification, question answering, language modeling, text feature extraction |
| RoBERTa | ✅ | ✅ | question answering, language modeling |
| ALBERT | ✅ | ✅ | question answering, language modeling |
| DistilBERT | ✅ | ✅ | question answering, language modeling |
| GPT2 | ✅ | ✅ | language modeling, text generation |
| BLOOM(Z) | | DeepSpeed | text generation |
| StarCoder / StarCoder2 | ✅ | Single card | language modeling, text generation |
| GPT-J | DeepSpeed | Single card, DeepSpeed | language modeling, text generation |
| GPT-Neo | | Single card | text generation |
| GPT-NeoX | DeepSpeed | DeepSpeed | language modeling, text generation |
| OPT | | DeepSpeed | text generation |
| Llama 2 / CodeLlama / Llama 3 / Llama Guard / Granite | ✅ | ✅ | language modeling, text generation, question answering, text classification (Llama Guard) |
| StableLM | | Single card | text generation |
| Falcon | LoRA | ✅ | text generation |
| CodeGen | | Single card | text generation |
| MPT | | Single card | text generation |
| Mistral | | Single card | text generation |
| Phi | ✅ | Single card | language modeling, text generation |
| Mixtral | | Single card | text generation |
| Gemma | ✅ | Single card | language modeling, text generation |
| Gemma2 | | ✅ | text generation |
| Qwen2 | Single card | Single card | language modeling, text generation |
| Qwen2-MoE | | Single card | text generation |
| Persimmon | | Single card | text generation |
| XGLM | | Single card | text generation |
| Cohere | | Single card | text generation |
| T5 / Flan T5 | ✅ | ✅ | summarization, translation, question answering |
| BART | | Single card | summarization, translation, question answering |
| ViT | ✅ | ✅ | image classification |
| Swin | ✅ | ✅ | image classification |
| Wav2Vec2 | ✅ | ✅ | audio classification, speech recognition |
| Whisper | ✅ | ✅ | speech recognition |
| SpeechT5 | | Single card | text to speech |
| CLIP | ✅ | ✅ | contrastive image-text training |
| BridgeTower | ✅ | ✅ | contrastive image-text training |
| ESMFold | | Single card | protein folding |
| Blip | | Single card | visual question answering, image to text |
| OWLViT | | Single card | zero shot object detection |
| ClipSeg | | Single card | object segmentation |
| Llava / Llava-next | | Single card | image to text |
| Paligemma | | Single card | image to text |
| idefics2 | LoRA | Single card | image to text |
| SAM | | Single card | object segmentation |
| VideoMAE | | Single card | video classification |
| TableTransformer | | Single card | table object detection |
| DETR | | Single card | object detection |
| Mllama | LoRA | ✅ | image to text |
| MiniCPM3 | | Single card | text generation |
| Baichuan2 | DeepSpeed | Single card | language modeling, text generation |
| DeepSeek-V2 | | ✅ | text generation |
| ChatGLM | DeepSpeed | Single card | language modeling, text generation |
  • Diffusers:

| Architecture | Training | Inference | Tasks |
|---|---|---|---|
| Stable Diffusion | textual inversion, ControlNet | Single card | text-to-image generation |
| Stable Diffusion XL | fine-tuning | Single card | text-to-image generation |
| Stable Diffusion Depth2img | | Single card | depth-to-image generation |
| LDM3D | | Single card | text-to-image generation |
| FLUX.1 | fine-tuning | Single card | text-to-image generation |
| Text to Video | | Single card | text-to-video generation |
  • PyTorch Image Models/TIMM:

| Architecture | Training | Inference | Tasks |
|---|---|---|---|
| FastViT | | Single card | image classification |
  • TRL:

| Architecture | Training | Inference | Tasks |
|---|---|---|---|
| Llama 2 | ✅ | | DPO Pipeline |
| Llama 2 | ✅ | | PPO Pipeline |
| Stable Diffusion | ✅ | | DDPO Pipeline |
Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers libraries may also work. You can refer to this section for using them with 🤗 Optimum for Intel Gaudi. In addition, this page explains how to modify any example from the 🤗 Transformers library to make it work with 🤗 Optimum for Intel Gaudi.
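On the Diffusers side, inference follows the same drop-in pattern: the library ships Gaudi-enabled pipeline classes mirroring the Diffusers API. Below is a minimal sketch, assuming a Gaudi machine with `optimum[habana]` installed; the model checkpoint and Gaudi configuration names are illustrative examples.

```python
# Sketch: Stable Diffusion inference on HPU with the Gaudi-enabled pipeline.
# Assumes a Gaudi machine with `optimum[habana]` installed.
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

pipe = GaudiStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",   # illustrative checkpoint
    use_habana=True,        # run inference on HPU
    use_hpu_graphs=True,    # capture HPU graphs to reduce host overhead
    gaudi_config="Habana/stable-diffusion",  # pre-tuned HPU config (illustrative)
)

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```

Apart from the Gaudi-specific constructor arguments, the call signature matches the standard Diffusers `StableDiffusionPipeline`, so existing generation code ports over with minimal changes.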
