Quantization for Ryzen AI

The best performance on the Ryzen AI IPU is achieved with quantized models. There are two ways to quantize models for the Ryzen AI IPU:

  • through the Vitis AI Quantizer, used in Optimum’s RyzenAIOnnxQuantizer, which is designed for ONNX model quantization. It currently supports quantizing timm models with dynamic and static quantization methods.
  • through the Brevitas library, used in Optimum’s BrevitasQuantizer. Brevitas quantizes PyTorch models directly, and the quantized models can optionally be exported to ONNX. This is the recommended path for quantizing other models.

Quantization using RyzenAIOnnxQuantizer

🤗 Optimum AMD provides a Ryzen AI Quantizer that enables you to apply quantization to many models hosted on the Hugging Face Hub, using the AMD Vitis AI Quantizer.

The RyzenAI Quantizer provides an easy-to-use Post Training Quantization (PTQ) flow for a pre-trained model saved in the ONNX format. It generates a quantized ONNX model ready to be deployed with Ryzen AI.

The quantizer supports various configurations and functions to quantize models targeting deployment on IPU_CNN, IPU_Transformer, and CPU.
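
For reference, the recommended configuration for CNN models targeting the IPU (used in the walkthrough below) is loaded from AutoQuantizationConfig. Helpers for the other targets follow the same pattern, but check their exact names against your installed version:

>>> from optimum.amd.ryzenai import AutoQuantizationConfig

>>> # Recommended quantization settings for CNN models deployed on the IPU
>>> quantization_config = AutoQuantizationConfig.ipu_cnn_config()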

The RyzenAIOnnxQuantizer can be initialized with the from_pretrained method, either from a local model folder or from a model hosted on the Hugging Face Hub:

>>> from optimum.amd.ryzenai import RyzenAIOnnxQuantizer

>>> quantizer = RyzenAIOnnxQuantizer.from_pretrained("path/to/model")
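
The same method also accepts a model ID from the Hub. A minimal sketch, where the repository name below is a placeholder for a repository containing an exported ONNX model:

>>> # Hypothetical Hub repository holding a model in ONNX format
>>> quantizer = RyzenAIOnnxQuantizer.from_pretrained("<username>/<onnx-model-repo>")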

Below is an end-to-end example of how to quantize a VGG model from the timm library.

  • To begin, export the VGG model to ONNX using Optimum Exporters. Ensure static shapes are specified for inference.
  • Create a preprocessing function to handle specific image format conversions and apply necessary transformations to prepare the input for the model.
  • Initialize the RyzenAI quantizer (RyzenAIOnnxQuantizer) and configure the quantization settings using AutoQuantizationConfig. The recommended quantization configuration for CNN models to be deployed on the IPU is loaded using ipu_cnn_config.
  • Obtain a calibration dataset using the quantizer’s get_calibration_dataset method. This dataset is crucial for computing quantization parameters during the quantization process.
  • Run the quantizer with the specified quantization configuration and calibration data. The quantization parameters computed during this process are embedded as constants in the quantized model.
  • The resulting quantized model is saved in the specified quantization directory.
>>> from functools import partial
>>> import timm

>>> from optimum.amd.ryzenai import AutoQuantizationConfig, RyzenAIOnnxQuantizer
>>> from optimum.exporters.onnx import main_export
>>> from transformers import PretrainedConfig

>>> # Define paths for exporting ONNX model and saving quantized model
>>> export_dir = "/path/to/vgg_onnx"
>>> quantization_dir = "/path/to/vgg_onnx_quantized"

>>> # Specify the model ID from Timm
>>> model_id = "timm/vgg11.tv_in1k"

>>> # Step 1: Export the model to ONNX format using Optimum Exporters
>>> main_export(
...     model_name_or_path=model_id,
...     output=export_dir,
...     task="image-classification",
...     opset=13,
...     batch_size=1,
...     no_dynamic_axes=True,
... )

>>> # Step 2: Preprocess configuration and data transformations
>>> config = PretrainedConfig.from_pretrained(export_dir)
>>> data_config = timm.data.resolve_data_config(pretrained_cfg=config.pretrained_cfg)
>>> transforms = timm.data.create_transform(**data_config, is_training=False)

>>> def preprocess_fn(ex, transforms):
...     image = ex["image"]
...     if image.mode == "L":
...         # Convert grayscale images to RGB, since the model expects 3 channels
...         print("WARNING: converting greyscale to RGB")
...         image = image.convert("RGB")
...     pixel_values = transforms(image)
...     return {"pixel_values": pixel_values}

>>> # Step 3: Initialize the RyzenAIOnnxQuantizer with the exported model
>>> quantizer = RyzenAIOnnxQuantizer.from_pretrained(export_dir)

>>> # Step 4: Load recommended quantization config for model
>>> quantization_config = AutoQuantizationConfig.ipu_cnn_config()

>>> # Step 5: Obtain a calibration dataset for computing quantization parameters
>>> train_calibration_dataset = quantizer.get_calibration_dataset(
...     "imagenet-1k",
...     preprocess_function=partial(preprocess_fn, transforms=transforms),
...     num_samples=100,
...     dataset_split="train",
...     preprocess_batch=False,
...     streaming=True,
... )

>>> # Step 6: Run the quantizer with the specified configuration and calibration data
>>> quantizer.quantize(
...     quantization_config=quantization_config,
...     dataset=train_calibration_dataset,
...     save_dir=quantization_dir
... )
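
Once saved, the quantized model can be loaded back for inference with the Ryzen AI model classes. A minimal sketch, assuming RyzenAIModelForImageClassification from optimum.amd.ryzenai and a vaip_config.json from your Ryzen AI software installation (the config path below is a placeholder):

>>> from optimum.amd.ryzenai import RyzenAIModelForImageClassification

>>> # Load the quantized model; vaip_config.json ships with the Ryzen AI software
>>> model = RyzenAIModelForImageClassification.from_pretrained(
...     quantization_dir, vaip_config="/path/to/vaip_config.json"
... )
>>> # `model` can now be called on preprocessed pixel values matching the
>>> # static input shape used at export time (batch size 1 here)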

Quantization using BrevitasQuantizer

Coming soon.
