Optimum documentation

Reference

INCQuantizer

class optimum.intel.INCQuantizer

( model: torch.nn.Module, eval_fn: Optional[Callable[[PreTrainedModel], int]] = None, calibration_fn: Optional[Callable[[PreTrainedModel], int]] = None, task: Optional[str] = None, seed: int = 42 )

Handles the Intel Neural Compressor quantization process.
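For illustration, a minimal sketch of how a quantizer is typically built from a Transformers model (the checkpoint name below is only an example):

from transformers import AutoModelForSequenceClassification
from optimum.intel import INCQuantizer

# Illustrative checkpoint; any PyTorch Transformers model can be wrapped.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
# eval_fn and calibration_fn are only needed for accuracy-aware tuning
# and custom calibration loops.
quantizer = INCQuantizer.from_pretrained(model)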

get_calibration_dataset

( dataset_name: str, num_samples: int = 100, dataset_config_name: Optional[str] = None, dataset_split: str = 'train', preprocess_function: Optional[Callable] = None, preprocess_batch: bool = True, use_auth_token: bool = False )

Parameters

  • dataset_name (str) — The dataset repository name on the Hugging Face Hub or path to a local directory containing data files in generic formats and optionally a dataset script, if it requires some code to read the data files.
  • num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.
  • dataset_config_name (str, optional) — The name of the dataset configuration.
  • dataset_split (str, defaults to "train") — Which split of the dataset to use to perform the calibration step.
  • preprocess_function (Callable, optional) — Processing function to apply to each example after loading the dataset.
  • preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.
  • use_auth_token (bool, defaults to False) — Whether to use the token generated when running transformers-cli login.

Creates the calibration dataset (a datasets.Dataset) to use for the post-training static quantization calibration step.
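As a sketch, reusing the quantizer and checkpoint from the example above, a calibration set could be built from the sst2 configuration of the glue dataset (both names are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

def preprocess_function(examples):
    # Tokenize the text column of each batch of examples.
    return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    dataset_split="train",
    num_samples=100,
    preprocess_function=preprocess_function,
)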

quantize

( quantization_config: Union['PostTrainingQuantConfig', 'WeightOnlyQuantConfig'], save_directory: Union[str, Path], calibration_dataset: Dataset = None, batch_size: int = 8, data_collator: Optional[DataCollator] = None, remove_unused_columns: bool = True, file_name: str = None, weight_only: bool = False, **kwargs )

Parameters

  • quantization_config (Union[PostTrainingQuantConfig, WeightOnlyQuantConfig]) — The configuration containing the parameters related to quantization.
  • save_directory (Union[str, Path]) — The directory where the quantized model should be saved.
  • calibration_dataset (datasets.Dataset, defaults to None) — The dataset to use for the calibration step, needed for post-training static quantization.
  • batch_size (int, defaults to 8) — The number of calibration samples to load per batch.
  • data_collator (DataCollator, defaults to None) — The function to use to form a batch from a list of elements of the calibration dataset.
  • remove_unused_columns (bool, defaults to True) — Whether or not to remove the columns unused by the model forward method.
  • weight_only (bool, defaults to False) — Whether to compress weights to integer precision (4-bit by default) while keeping activations in floating point. Best suited for reducing the memory footprint and accelerating the inference of large language models.

Quantizes a model given the optimization specifications defined in quantization_config.
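Putting the pieces together, a hedged sketch of post-training static quantization; PostTrainingQuantConfig comes from the neural_compressor package, and the save directory is arbitrary:

from neural_compressor.config import PostTrainingQuantConfig

# Static quantization needs the calibration dataset created above.
quantization_config = PostTrainingQuantConfig(approach="static")
quantizer.quantize(
    quantization_config=quantization_config,
    calibration_dataset=calibration_dataset,
    save_directory="quantized_model",
)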

INCTrainer

class optimum.intel.INCTrainer

( model: Union[PreTrainedModel, torch.nn.Module] = None, args: TrainingArguments = None, data_collator: Optional[DataCollator] = None, train_dataset: Optional[torch.utils.data.Dataset] = None, eval_dataset: Optional[torch.utils.data.Dataset] = None, tokenizer: Optional[PreTrainedTokenizerBase] = None, model_init: Callable[[], PreTrainedModel] = None, compute_metrics: Optional[Callable[[EvalPrediction], Dict]] = None, callbacks: Optional[List[TrainerCallback]] = None, optimizers: Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR] = (None, None), preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None, quantization_config: Optional[_BaseQuantizationConfig] = None, pruning_config: Optional[_BaseQuantizationConfig] = None, distillation_config: Optional[_BaseQuantizationConfig] = None, task: Optional[str] = None, save_onnx_model: bool = False )

INCTrainer enables Intel Neural Compressor quantization-aware training, pruning and knowledge distillation.
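A minimal quantization-aware training sketch, assuming a tokenized train_dataset and eval_dataset are already prepared; QuantizationAwareTrainingConfig is provided by neural_compressor:

from neural_compressor import QuantizationAwareTrainingConfig
from transformers import TrainingArguments, default_data_collator
from optimum.intel import INCTrainer

trainer = INCTrainer(
    model=model,
    quantization_config=QuantizationAwareTrainingConfig(),
    args=TrainingArguments("qat_output", num_train_epochs=1, do_train=True, do_eval=False),
    train_dataset=train_dataset,  # assumed: a tokenized datasets.Dataset
    eval_dataset=eval_dataset,    # assumed: a tokenized datasets.Dataset
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
trainer.train()
trainer.save_model()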

compute_distillation_loss

( student_outputs, teacher_outputs )

How the distillation loss is computed given the student and teacher outputs.

compute_loss

( model, inputs, return_outputs=False )

How the loss is computed by Trainer. By default, all models return the loss in the first element.

save_model

( output_dir: Optional[str] = None, _internal_call: bool = False, save_onnx_model: Optional[bool] = None )

Will save the model, so you can reload it using from_pretrained(). Will only save from the main process.

INCModel

class optimum.intel.INCModel

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, **kwargs )

INCModelForSequenceClassification

class optimum.intel.INCModelForSequenceClassification

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, **kwargs )
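Assuming a model was quantized and saved with the INCQuantizer example above, it can be reloaded for inference with the task-specific class:

from optimum.intel import INCModelForSequenceClassification

# Directory previously passed to save_directory (assumed).
loaded_model = INCModelForSequenceClassification.from_pretrained("quantized_model")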

INCModelForQuestionAnswering

class optimum.intel.INCModelForQuestionAnswering

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, **kwargs )

INCModelForTokenClassification

class optimum.intel.INCModelForTokenClassification

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, **kwargs )

INCModelForMultipleChoice

class optimum.intel.INCModelForMultipleChoice

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, **kwargs )

INCModelForMaskedLM

class optimum.intel.INCModelForMaskedLM

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, **kwargs )

INCModelForCausalLM

class optimum.intel.INCModelForCausalLM

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, use_cache: bool = True, **kwargs )
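A similar sketch for causal language models, where use_cache controls whether past key/values are reused during generation; the directory is assumed to contain a previously quantized causal LM:

from optimum.intel import INCModelForCausalLM

# Directory produced by a quantization run on a causal LM checkpoint (assumed).
model = INCModelForCausalLM.from_pretrained("quantized_causal_lm", use_cache=True)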

INCModelForSeq2SeqLM

class optimum.intel.INCModelForSeq2SeqLM

( model, config: PretrainedConfig = None, model_save_dir: Optional[Union[str, Path, TemporaryDirectory]] = None, q_config: Dict = None, inc_config: Dict = None, **kwargs )