Accelerate Transformers on State-of-the-Art Hardware

Hugging Face is partnering with leading AI hardware accelerator companies to make state-of-the-art production performance accessible

Optimum: the ML Optimization toolkit for production performance

Hardware-specific acceleration tools

1. Quantize

Make models faster with minimal impact on accuracy, leveraging post-training quantization, quantization-aware training, and dynamic quantization from the Intel® Low Precision Optimization Tool (LPOT).

huggingface@hardware:~
from optimum.intel.lpot.quantization import LpotQuantizerForSequenceClassification

# Create the quantizer from a quantization config file hosted on the Hub
quantizer = LpotQuantizerForSequenceClassification.from_config(
    "echarlaix/quantize-dynamic-test",
    "quantization.yml",
    model_name_or_path="textattack/bert-base-uncased-SST-2",
)

# Apply dynamic quantization and return the quantized model
model = quantizer.fit_dynamic()
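
To see what dynamic quantization does under the hood, here is a minimal sketch using plain PyTorch rather than the Optimum/LPOT API shown above; the layer selection and int8 dtype are illustrative defaults, not part of the Optimum API.

import torch
from transformers import AutoModelForSequenceClassification

# Load the same fine-tuned checkpoint used in the example above
model = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"
)

# Replace every nn.Linear with an int8 counterpart: weights are stored in int8
# and activations are quantized on the fly at inference time
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

The quantized model keeps the same forward signature as the original, so it can be evaluated with the usual tokenizer and inference code.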

2. Prune

Make models smaller with minimal impact on accuracy, with easy-to-use configurations to remove model weights using the Intel® Low Precision Optimization Tool (LPOT).

huggingface@hardware:~
from optimum.intel.lpot.pruning import LpotPrunerForSequenceClassification

# Create the pruner from a pruning config file hosted on the Hub
pruner = LpotPrunerForSequenceClassification.from_config(
    "echarlaix/magnitude-pruning-test",
    "prune.yml",
    model_name_or_path="textattack/bert-base-uncased-SST-2",
)

# Apply magnitude pruning and return the pruned model
model = pruner.fit()
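
Magnitude pruning, the technique behind the configuration above, zeroes out the weights with the smallest absolute values on the assumption that they contribute least to the output. Here is a minimal sketch of the idea with PyTorch's built-in pruning utilities, independent of the Optimum/LPOT API; the 30% sparsity target is only an example.

import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

# Load the same fine-tuned checkpoint used in the example above
model = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"
)

# Zero out the 30% of weights with the smallest magnitude in every Linear layer
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights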

3. Train

Train models faster than ever before with Graphcore Intelligence Processing Units (IPUs), the latest generation of AI-dedicated hardware, leveraging the built-in IPUTrainer API to train or fine-tune transformers models (coming soon).

huggingface@hardware:~
from optimum.graphcore import IPUTrainer
from optimum.graphcore.bert import BertIPUConfig
from transformers import BertForMaskedLM, BertTokenizer
from poptorch.optim import AdamW

# Allocate model and tokenizer as usual
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

# Optional: customize the Trainer and poptorch configuration for the IPU
ipu_config = BertIPUConfig()
trainer = IPUTrainer(model, training_args, config=ipu_config)
optimizer = AdamW(model.parameters())

# This is hidden from the user, it will be handled by the Trainer
with trainer.compile(some_data_loader) as model_f:
    for step in range(...):
        outputs = trainer.step(optimizer)

# Save the model and/or push to hub
model.save_pretrained("...")
model.push_to_hub("...")
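
The IPUTrainer above is designed to mirror the familiar transformers Trainer workflow. For reference, here is a minimal sketch of that standard CPU/GPU workflow; the tiny in-memory dataset, output directory, and hyperparameters are placeholders for illustration, not part of the Optimum Graphcore API.

from transformers import (
    BertForMaskedLM,
    BertTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

# Tiny in-memory dataset, only to keep the sketch self-contained
texts = ["Hello, this is a sample sentence.", "Transformers models learn from text."]
encodings = tokenizer(texts, truncation=True, padding=True)
train_dataset = [
    {"input_ids": ids, "attention_mask": mask}
    for ids, mask in zip(encodings["input_ids"], encodings["attention_mask"])
]

# Randomly mask 15% of tokens for the masked language modeling objective
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

training_args = TrainingArguments(output_dir="mlm-output", num_train_epochs=1)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()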

Meet the Hugging Face Hardware Partners

Scale with Xeon

Do more with Snapdragon

Train with IPU