Scaling Transformer Model Performance with Intel AI

Intel and Hugging Face are building powerful optimization tools to accelerate training and inference with Transformers. Learn how the Intel AI Software Portfolio and Intel Xeon Scalable processors can help you get the best performance and productivity from your models.

Democratizing Machine Learning Acceleration

Intel and Hugging Face are collaborating to build state-of-the-art hardware and software acceleration to train, fine-tune, and run inference with Transformers. Hardware acceleration is driven by the Intel Xeon Scalable CPU platform, and software acceleration by a rich suite of optimized AI tools, frameworks, and libraries.

from transformers import AutoModelForSequenceClassification
from optimum.intel.neural_compressor import IncOptimizer, IncQuantizationConfig, IncQuantizer

# Load the model to quantize
model = AutoModelForSequenceClassification.from_pretrained(
  "distilbert-base-uncased-finetuned-sst-2-english"
)

# User-defined evaluation function returning the metric to preserve during quantization
# (replace this stub with a real evaluation on your validation set)
def eval_func(model):
  return 1.0

# Load the quantization configuration detailing the quantization we wish to apply
quantization_config = IncQuantizationConfig.from_pretrained(
  "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic", config_file_name="quantization.yml"
)
quantizer = IncQuantizer(quantization_config, eval_func=eval_func)
optimizer = IncOptimizer(model, quantizer=quantizer)

# Apply dynamic quantization
optimized_model = optimizer.fit()

# Save the resulting model and its corresponding configuration in the given directory
save_dir = "./quantized_model"
optimizer.save_pretrained(save_dir)

# To load a quantized model hosted locally or on the Hub:
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification
loaded_model = IncQuantizedModelForSequenceClassification.from_pretrained(save_dir)
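
Once loaded, the quantized model can be used like any other Transformers sequence classification model. A minimal sketch, assuming the loaded_model from the snippet above and the matching tokenizer (the input sentence is illustrative):

from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
inputs = tokenizer("Optimum Intel makes INT8 inference easy.", return_tensors="pt")

# Run the INT8 model exactly as you would the original FP32 model
with torch.no_grad():
  logits = loaded_model(**inputs).logits
print(loaded_model.config.id2label[logits.argmax(dim=-1).item()])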

Scale Transformer Workloads with Intel AI

Hardware performance and developer productivity at unmatched scale

Easily optimize models for production

Optimum Intel, part of Hugging Face's Optimum library, builds on top of Intel Neural Compressor, an open-source library for model compression that speeds up inference deployment. With Optimum Intel, you can apply state-of-the-art optimization techniques such as quantization, pruning, and knowledge distillation to your transformer models with minimal effort.
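
To see the effect of such optimizations, you can compare the latency of the original FP32 model with its INT8 counterpart. A minimal sketch using plain PyTorch timing, assuming the model and loaded_model from the example above (batch size, sequence length, and iteration counts are illustrative):

import time
import torch

def benchmark(m, iterations=100):
  # Dummy input: batch of 1, sequence length 128 (illustrative values)
  dummy = {"input_ids": torch.randint(0, 1000, (1, 128)),
           "attention_mask": torch.ones((1, 128), dtype=torch.long)}
  with torch.no_grad():
    for _ in range(10):  # warm-up runs, excluded from timing
      m(**dummy)
    start = time.perf_counter()
    for _ in range(iterations):
      m(**dummy)
  return (time.perf_counter() - start) / iterations

print(f"FP32 latency: {benchmark(model) * 1000:.1f} ms")
print(f"INT8 latency: {benchmark(loaded_model) * 1000:.1f} ms")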

Learn more about Optimum Intel


Get high performance on CPU instances

3rd Generation Intel® Xeon® Scalable processors offer a balanced architecture that delivers built-in AI acceleration (Intel DL Boost with AVX-512 VNNI) and advanced security capabilities. This allows you to place your transformer workloads where they perform best while minimizing costs.
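
If you are unsure whether a given instance exposes these instructions, a quick check on Linux is to read the CPU feature flags. A minimal sketch (flag names as reported in /proc/cpuinfo; Linux-only):

# Check whether the CPU exposes the instruction sets behind Intel's built-in
# AI acceleration (AVX-512 and AVX-512 VNNI on 3rd Gen Xeon Scalable)
flags = set()
with open("/proc/cpuinfo") as f:
  for line in f:
    if line.startswith("flags"):
      flags.update(line.split(":", 1)[1].split())
      break

for feature in ("avx512f", "avx512_vnni"):
  print(f"{feature}: {'found' if feature in flags else 'not found'}")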

Learn more about Intel AI Hardware


Quickly go from concept to scale

With hardware and software optimized for AI workloads, an open, familiar, standards-based software environment, and the hardware flexibility to create the deployment you want, Intel can help accelerate your time to production.


Accelerating Transformer Performance with Intel AI

Learn more about accelerating Hugging Face models with Intel hardware and software