πŸ€— Optimum Habana

πŸ€— Optimum Habana is the interface between the πŸ€— Transformers and πŸ€— Diffusers libraries and Habana's Gaudi processor (HPU). It provides a set of tools enabling easy model loading, training, and inference on single- and multi-HPU settings for the various downstream tasks shown in the tables below.
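As a minimal sketch of what this looks like in practice, fine-tuning with `GaudiTrainer` mirrors the usual πŸ€— Transformers `Trainer` API. This assumes the `optimum-habana` package is installed and a Gaudi device is available; the model name and dataset are illustrative placeholders:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Illustrative checkpoint; any validated architecture from the tables below can be used.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# GaudiTrainingArguments extends TrainingArguments with HPU-specific options.
training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,        # run on HPU instead of CPU/GPU
    use_lazy_mode=True,     # enable HPU lazy execution mode
    gaudi_config_name="Habana/bert-base-uncased",  # HPU configuration from the Hub
    num_train_epochs=3,
)

trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your tokenized dataset
    tokenizer=tokenizer,
)
trainer.train()
```

The rest of the training loop (evaluation, checkpointing, logging) works as it does with the standard `Trainer`, so existing πŸ€— Transformers training scripts need only these argument-level changes.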

HPUs offer fast model training and inference as well as a great price-performance ratio. Check out this blog post about BERT pre-training and this article benchmarking Habana Gaudi2 versus Nvidia A100 GPUs for concrete examples. If you are not familiar with HPUs, we recommend you take a look at our conceptual guide.

The following model architectures, tasks and device distributions have been validated for πŸ€— Optimum Habana:

In the tables below, βœ… means single-card, multi-card and DeepSpeed have all been validated.

Transformers

| Architecture | Training | Inference | Tasks |
|---|---|---|---|
| BERT | βœ… | βœ… | text classification, question answering, language modeling |
| RoBERTa | βœ… | βœ… | question answering, language modeling |
| ALBERT | βœ… | βœ… | question answering, language modeling |
| DistilBERT | βœ… | βœ… | question answering, language modeling |
| GPT2 | βœ… | βœ… | language modeling, text generation |
| BLOOM(Z) | ❌ | DeepSpeed | text generation |
| StarCoder | ❌ | Single card | text generation |
| GPT-J | DeepSpeed | Single card, DeepSpeed | language modeling, text generation |
| GPT-NeoX | DeepSpeed | DeepSpeed | language modeling, text generation |
| OPT | ❌ | DeepSpeed | text generation |
| Llama 2 / CodeLlama | DeepSpeed, LoRA | DeepSpeed, LoRA | language modeling, text generation |
| StableLM | ❌ | Single card | text generation |
| Falcon | ❌ | Single card | text generation |
| CodeGen | ❌ | Single card | text generation |
| MPT | ❌ | Single card | text generation |
| T5 | βœ… | βœ… | summarization, translation, question answering |
| BART | ❌ | Single card | summarization, translation, question answering |
| ViT | βœ… | βœ… | image classification |
| Swin | βœ… | βœ… | image classification |
| Wav2Vec2 | βœ… | βœ… | audio classification, speech recognition |
| CLIP | βœ… | βœ… | contrastive image-text training |
| BridgeTower | βœ… | βœ… | contrastive image-text training |
| ESMFold | ❌ | Single card | protein folding |
Diffusers

| Architecture | Training | Inference | Tasks |
|---|---|---|---|
| Stable Diffusion | ❌ | Single card | text-to-image generation |
| LDM3D | ❌ | Single card | text-to-image generation |
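As a hedged sketch of single-card text-to-image inference (assuming `optimum-habana` is installed and a Gaudi device is available; the checkpoint name is illustrative), the Gaudi-enabled Stable Diffusion pipeline is used much like the standard πŸ€— Diffusers pipeline:

```python
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

# Load a Stable Diffusion checkpoint with HPU-specific settings.
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_habana=True,      # run on HPU
    use_hpu_graphs=True,  # capture HPU graphs for faster inference
    gaudi_config="Habana/stable-diffusion",  # HPU configuration from the Hub
)

# Generate images from a text prompt.
images = pipeline(
    prompt="a photo of an astronaut riding a horse on mars",
    num_images_per_prompt=1,
).images
```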
Other models and tasks supported by the πŸ€— Transformers and πŸ€— Diffusers libraries may also work. You can refer to this section for using them with πŸ€— Optimum Habana. In addition, this page explains how to modify any example from the πŸ€— Transformers library to make it work with πŸ€— Optimum Habana.