Accelerate Transformers on State-of-the-Art Hardware

Hugging Face is partnering with leading AI hardware accelerator companies to make state-of-the-art production performance accessible.

Meet the Hugging Face Hardware Partners

Graphcore: Train Transformers faster with IPUs

Habana: Accelerate Transformers Training on Gaudi

Intel: Scale with Xeon

Optimum: the ML Optimization toolkit for production performance

Hardware-specific acceleration tools

1. Quantize

Make models faster with minimal impact on accuracy, leveraging post-training quantization, quantization-aware training and dynamic quantization from Intel® Neural Compressor.

huggingface@hardware:~
from optimum.intel.neural_compressor import IncOptimizer, IncQuantizer, IncQuantizationConfig

# Load the quantization configuration detailing the quantization process to apply
quantization_config = IncQuantizationConfig.from_pretrained(
    "echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1",
    config_file_name="quantization.yml",
)
# Instantiate our IncQuantizer using the desired configuration
quantizer = IncQuantizer(quantization_config, eval_func=eval_func)
optimizer = IncOptimizer(model, quantizer=quantizer)
# Apply dynamic quantization
model = optimizer.fit()
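
The snippet above assumes that `model` (the Transformers model to quantize) and `eval_func` (a callable used to measure accuracy during quantization) are already defined. As a rough, hypothetical sketch of such an evaluation function, assuming it receives a model and returns a single accuracy score, and using the distilbert-base-uncased-finetuned-sst-2-english checkpoint purely for illustration:

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Hypothetical setup: the quantization example expects `model` and `eval_func` to exist already
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def eval_func(model_to_eval):
    # Compute accuracy on a small SST-2 validation slice and return it as a float
    dataset = load_dataset("glue", "sst2", split="validation[:200]")
    classifier = pipeline("text-classification", model=model_to_eval, tokenizer=tokenizer)
    # Label names below are those of this particular checkpoint
    label_map = {"NEGATIVE": 0, "POSITIVE": 1}
    predictions = [label_map[output["label"]] for output in classifier(dataset["sentence"])]
    correct = sum(int(prediction == label) for prediction, label in zip(predictions, dataset["label"]))
    return correct / len(predictions)

The returned metric is what the quantization process can use to keep the accuracy drop within the tolerance defined in the configuration.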

2. Prune

Make models smaller with minimal impact on accuracy, with easy-to-use configurations to remove model weights using Intel® Neural Compressor.

huggingface@hardware:~
from optimum.intel.neural_compressor import IncOptimizer, IncPruner, IncPruningConfig

# Load the pruning configuration detailing the pruning process to apply
pruning_config = IncPruningConfig.from_pretrained(
    "echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1",
    config_file_name="prune.yml",
)
# Instantiate our IncPruner using the desired configuration
pruner = IncPruner(pruning_config, eval_func=eval_func, train_func=train_func)
optimizer = IncOptimizer(model, pruner=pruner)
# Apply magnitude pruning
model = optimizer.fit()
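
In addition to `eval_func`, the pruning snippet expects a `train_func`, because magnitude pruning removes weights over the course of (re)training. Below is a minimal, hypothetical sketch using the transformers Trainer on a small SST-2 slice; the exact behavior expected from `train_func` may vary between versions, so treat it as an illustration rather than the library's own recipe.

from datasets import load_dataset
from transformers import AutoTokenizer, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Tokenize a small SST-2 training slice, kept deliberately tiny for illustration
train_dataset = load_dataset("glue", "sst2", split="train[:2000]")
train_dataset = train_dataset.map(
    lambda examples: tokenizer(examples["sentence"], truncation=True, padding="max_length"),
    batched=True,
)

def train_func(model_to_train):
    # Hypothetical retraining loop: fine-tune the model for one epoch and return it
    trainer = Trainer(
        model=model_to_train,
        args=TrainingArguments(output_dir="pruning-output", num_train_epochs=1),
        train_dataset=train_dataset,
    )
    trainer.train()
    return trainer.model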

3. Train

Train models faster than ever before with Graphcore Intelligence Processing Units (IPUs), the latest generation of AI-dedicated hardware, leveraging the built-in IPUTrainer API to train or fine-tune Transformers models (coming soon).

huggingface@hardware:~
from optimum.graphcore import IPUConfig, IPUTrainer
from transformers import BertForPreTraining, BertTokenizer

# Allocate model and tokenizer as usual
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForPreTraining.from_pretrained("bert-base-cased")

# IPU configuration + Trainer
ipu_config = IPUConfig.from_pretrained("Graphcore/bert-base-ipu")
trainer = IPUTrainer(model, ipu_config=ipu_config, args=training_args)

# The IPUTrainer compiles the model for the IPUs in the background,
# so the user does not have to deal with compilation to run training
trainer.train()

# Save the model and/or push to hub
model.save_pretrained("...")
model.push_to_hub("...")
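
The IPU example references a `training_args` object that is not shown above; in practice a tokenized train_dataset would also be passed to IPUTrainer. A minimal, hypothetical sketch of how the training arguments could be built, assuming optimum.graphcore exposes an IPUTrainingArguments class mirroring transformers.TrainingArguments (check the Optimum Graphcore documentation for the exact name and fields in your release):

# Assumed import, mirroring transformers.TrainingArguments
from optimum.graphcore import IPUTrainingArguments

# Hypothetical training arguments; real IPU runs also tune batch sizes,
# gradient accumulation and replication factors for the target system
training_args = IPUTrainingArguments(
    output_dir="bert-base-ipu-output",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)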