Optimum documentation

Supported models

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v1.19.0).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Brevitas is an AMD library for neural network quantization. 🤗 Optimum-AMD integrates with Brevitas so as to make it easier to quantize Transformers models through Brevitas.

This integration also allows to export models quantized through Brevitas to ONNX.

For a refresher on quantization, please have a look at this documentation.

Please refer to ~BrevitasQuantizer and ~BrevitasQuantizationConfig for all available options.

Supported models

Currently, only the following architectures are tested and supported:

  • Llama
  • OPT

Dynamic quantization

from optimum.amd import BrevitasQuantizationConfig, BrevitasQuantizer
from transformers import AutoTokenizer

# Prepare the quantizer, specifying its configuration and loading the model.
qconfig = BrevitasQuantizationConfig(
    is_static=False,
    apply_gptq=False,
    apply_weight_equalization=False,
    activations_equalization=False,
    weights_symmetric=True,
    activations_symmetric=False,
)

quantizer = BrevitasQuantizer.from_pretrained("facebook/opt-125m")

model = quantizer.quantize(qconfig)

Static quantization

from optimum.amd import BrevitasQuantizationConfig, BrevitasQuantizer
from transformers import AutoTokenizer

# Prepare the quantizer, specifying its configuration and loading the model.
qconfig = BrevitasQuantizationConfig(
    is_static=True,
    apply_gptq=False,
    apply_weight_equalization=True,
    activations_equalization=False,
    weights_symmetric=True,
    activations_symmetric=False,
)

quantizer = BrevitasQuantizer.from_pretrained("facebook/opt-125m")

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Load the data for calibration and evaluation.
calibration_dataset = get_dataset_for_model(
    "facebook/opt-125m",
    qconfig=qconfig,
    dataset_name="wikitext2",
    tokenizer=tokenizer,
    nsamples=128,
    seqlen=512,
    split="train",
)

model = quantizer.quantize(qconfig, calibration_dataset)

Export Brevitas models to ONNX

Brevitas models can be exported to ONNX using Optimum:

import torch
from optimum.amd.brevitas.export import onnx_export_from_quantized_model

# Export to ONNX through optimum.exporters.
onnx_export_from_quantized_model(model, "llm_quantized_onnx")

Complete example

A complete example is available at https://github.com/huggingface/optimum-amd/tree/main/examples/quantization/brevitas.

< > Update on GitHub