Quantization
🤗 Optimum provides an optimum.intel.neural_compressor
package that enables you to apply quantization to many models hosted on the 🤗 hub using the Intel Neural Compressor quantization API.
IncQuantizer
The IncQuantizer
class allows you to apply different quantization approaches, such as post-training static, post-training dynamic and quantization-aware training, using PyTorch eager or FX graph mode.
class optimum.intel.IncQuantizer
< source >( model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module] config_path_or_obj: typing.Union[str, optimum.intel.neural_compressor.configuration.IncQuantizationConfig] tokenizer: typing.Optional[transformers.tokenization_utils_base.PreTrainedTokenizerBase] = None eval_func: typing.Optional[typing.Callable] = None train_func: typing.Optional[typing.Callable] = None calib_dataloader: typing.Optional[torch.utils.data.dataloader.DataLoader] = None )
from_config
< source >( model_name_or_path: str inc_config: typing.Union[optimum.intel.neural_compressor.configuration.IncQuantizationConfig, str, NoneType] = None config_name: str = None **kwargs ) → quantizer
Parameters
- model_name_or_path (str) — Repository name on the Hugging Face Hub or path to a local directory hosting the model.
- inc_config (Union[IncQuantizationConfig, str], optional) — Configuration containing all the information related to the model quantization. Can be either:
  - an instance of the class IncQuantizationConfig,
  - a string valid as input to IncQuantizationConfig.from_pretrained.
- config_name (str, optional) — Name of the configuration file.
- cache_dir (str, optional) — Path to a directory in which a downloaded configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the configuration files, overriding the cached versions if they exist.
- resume_download (bool, optional, defaults to False) — Whether or not to delete an incompletely received file. Attempts to resume the download if such a file exists.
- revision (str, optional) — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.
- calib_dataloader (DataLoader, optional) — DataLoader used for the post-training quantization calibration step.
- eval_func (Callable, optional) — Evaluation function returning the tuning objective metric.
- train_func (Callable, optional) — Training function used for the quantization-aware training approach.
Returns
quantizer
IncQuantizer object.
Instantiates an IncQuantizer object from a configuration file that can either be hosted on huggingface.co or located in a local directory.
IncQuantizedModel
The IncQuantizedModel
class allows you to load a quantized PyTorch model from a given configuration file summarizing the quantization applied by Intel® Neural Compressor.
from_pretrained
< source >( model_name_or_path: str inc_config: typing.Union[optimum.intel.neural_compressor.configuration.IncOptimizedConfig, str] = None q_model_name: typing.Optional[str] = None input_names: typing.Optional[typing.List[str]] = None batch_size: typing.Optional[int] = None sequence_length: typing.Union[int, typing.List[int], typing.Tuple[int], NoneType] = None num_choices: typing.Optional[int] = -1 **kwargs ) → q_model
Parameters
- model_name_or_path (str) — Repository name on the Hugging Face Hub or path to a local directory hosting the model.
- inc_config (Union[IncOptimizedConfig, str], optional) — Configuration containing all the information related to the model quantization. Can be either:
  - an instance of the class IncOptimizedConfig,
  - a string valid as input to IncOptimizedConfig.from_pretrained.
- q_model_name (str, optional) — Name of the state dictionary located in model_name_or_path used to load the quantized model. Ignored if state_dict is specified.
- input_names (List[str], optional) — List of the input names used when tracing the model. If unset, model.dummy_inputs().keys() is used instead.
- batch_size (int, optional) — Batch size of the traced model inputs.
- sequence_length (Union[int, List[int], Tuple[int]], optional) — Sequence length of the traced model inputs. For sequence-to-sequence models with different sequence lengths between the encoder and the decoder inputs, this must be [encoder_sequence_length, decoder_sequence_length].
- num_choices (int, optional, defaults to -1) — The number of possible choices for a multiple choice task.
- cache_dir (str, optional) — Path to a directory in which a downloaded configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the configuration files, overriding the cached versions if they exist.
- resume_download (bool, optional, defaults to False) — Whether or not to delete an incompletely received file. Attempts to resume the download if such a file exists.
- revision (str, optional) — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.
- state_dict (Dict[str, torch.Tensor], optional) — State dictionary of the quantized model. If not specified, q_model_name will be used to load the state dictionary.
Returns
q_model
Quantized model.
Instantiates a quantized PyTorch model from a given Intel Neural Compressor (INC) configuration file.