Scaling Transformer Model Performance with Intel AI
Intel and Hugging Face are building powerful optimization tools to accelerate training and inference with Transformers. Learn how the Intel AI Software Portfolio and Xeon Scalable processors can help you achieve the best performance and productivity on your models.
Intel and Hugging Face are collaborating to build state-of-the-art hardware and software acceleration to train, fine-tune, and predict with Transformers. Hardware acceleration is driven by the Intel Xeon Scalable CPU platform, and software acceleration is delivered through a rich suite of optimized AI tools, frameworks, and libraries.
from optimum.intel.neural_compressor import IncOptimizer, IncQuantizationConfig, IncQuantizer

# Load the quantization configuration detailing the quantization we wish to apply
quantization_config = IncQuantizationConfig.from_pretrained(
    "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic",
    config_file_name="quantization.yml",
)
quantizer = IncQuantizer(quantization_config, eval_func=eval_func)
optimizer = IncOptimizer(model, quantizer=quantizer)

# Apply dynamic quantization
optimized_model = optimizer.fit()

# Save the resulting model and its corresponding configuration in the given directory
optimizer.save_pretrained(save_dir)

# To load a quantized model hosted locally or on the hub, you can do as follows:
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification

loaded_model = IncQuantizedModelForSequenceClassification.from_pretrained(save_dir)
Easily optimize models for production
Optimum Intel, a part of Hugging Face's Optimum library, builds on top of Intel Neural Compressor, an open-source library for compressing models and speeding up inference deployment. With Optimum Intel, you can apply state-of-the-art optimization techniques such as quantization, pruning, and knowledge distillation to your transformer models with minimal effort.
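Under the hood, dynamic quantization stores the weights of the model's linear layers as int8 and quantizes activations on the fly at inference time. The following is a minimal sketch of that underlying technique using plain PyTorch rather than the Optimum Intel API shown above; the checkpoint name is only an example, and Optimum Intel adds configuration-driven, accuracy-aware tuning on top of this basic step:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Example checkpoint; any PyTorch sequence-classification model works the same way.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dynamic quantization: nn.Linear weights are stored as int8 and activations
# are quantized at runtime. This is the technique that Intel Neural Compressor
# and Optimum Intel wrap with accuracy-aware tuning.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Quick sanity check: run a prediction with the quantized model.
inputs = tokenizer("Intel and Hugging Face make a great team!", return_tensors="pt")
with torch.no_grad():
    prediction = quantized_model(**inputs).logits.argmax(dim=-1)
print(prediction)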
Get high performance on CPU instances
3rd Generation Intel® Xeon® Scalable processors offer a balanced architecture with built-in AI acceleration (Intel® Deep Learning Boost with AVX-512 VNNI and bfloat16 support) and advanced security capabilities, letting you place your transformer workloads where they perform best while minimizing costs.
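Before benchmarking, it can be worth confirming that your CPU instance actually exposes the instruction-set extensions this acceleration relies on. A minimal sketch, assuming a Linux host where /proc/cpuinfo is readable (flag names can vary slightly across kernel versions):

# Report the CPU feature flags relevant to AI acceleration on Xeon Scalable CPUs.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break

# avx512_vnni backs int8 inference (Intel DL Boost); avx512_bf16 backs bfloat16.
for feature in ("avx512f", "avx512_vnni", "avx512_bf16"):
    print(f"{feature}: {'present' if feature in flags else 'absent'}")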
Quickly go from concept to scale
With hardware and software optimized for AI workloads, an open, familiar, standards-based software environment, and the hardware flexibility to create the deployment you want, Intel can help accelerate your time to production.