Reference
INCQuantizer
class optimum.intel.INCQuantizer
(model: Module, eval_fn: typing.Union[typing.Callable[[transformers.modeling_utils.PreTrainedModel], int], NoneType] = None, calibration_fn: typing.Union[typing.Callable[[transformers.modeling_utils.PreTrainedModel], int], NoneType] = None, task: typing.Optional[str] = None, seed: int = 42)
Handle the Neural Compressor quantization process.
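The eval_fn argument takes the model and returns a single score, presumably so that Neural Compressor can compare the quantized model's accuracy with the original during tuning (assuming the default higher-is-better criterion). The sketch below only illustrates that contract. The toy evaluation set and the stand-in model callable are hypothetical; in real use, model would be the PreTrainedModel being quantized. Returning the number of correct predictions keeps the result an int, which matches the annotation in the signature above.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy evaluation set of (features, expected class index) pairs.
EVAL_SET: List[Tuple[List[float], int]] = [
    ([0.9, 0.1], 0),
    ([0.2, 0.8], 1),
    ([0.6, 0.4], 0),
    ([0.3, 0.7], 1),
]


def eval_fn(model: Callable[[Sequence[float]], Sequence[float]]) -> int:
    """Return how many examples of EVAL_SET the model classifies correctly.

    `model` is a stand-in callable mapping features to per-class scores;
    with INCQuantizer it would be the PreTrainedModel under evaluation.
    """
    correct = 0
    for features, label in EVAL_SET:
        scores = model(features)
        # Predicted class = index of the highest score (argmax).
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == label)
    return correct
```

A function with this shape would then be passed to the constructor as INCQuantizer(model, eval_fn=eval_fn).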
get_calibration_dataset
(dataset_name: str, num_samples: int = 100, dataset_config_name: typing.Optional[str] = None, dataset_split: str = 'train', preprocess_function: typing.Optional[typing.Callable] = None, preprocess_batch: bool = True, use_auth_token: bool = False)
Parameters
- dataset_name (str) — The dataset repository name on the Hugging Face Hub, or the path to a local directory containing data files in generic formats and, optionally, a dataset script if custom code is required to read the data files.
- num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.
- dataset_config_name (str, optional) — The name of the dataset configuration.
- dataset_split (str, defaults to "train") — The split of the dataset to use for the calibration step.
- preprocess_function (Callable, optional) — The processing function to apply to each example after loading the dataset.
- preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.
- use_auth_token (bool, defaults to False) — Whether to use the token generated when running transformers-cli login.
Create the calibration datasets.Dataset to use for the post-training static quantization calibration step.
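When preprocess_batch is True, preprocess_function presumably receives a batch of examples, that is, a dict mapping each column name to a list of values (the convention of a batched datasets.Dataset.map). A common pattern is to bind the tokenizer with functools.partial so that the function only takes the batch. In the sketch below, toy_tokenizer is a hypothetical stand-in for a Hugging Face tokenizer, and the "sentence" column is an assumption about the dataset (it matches GLUE SST-2).

```python
from functools import partial
from typing import Callable, Dict, List


def toy_tokenizer(texts: List[str], max_length: int) -> Dict[str, List[List[int]]]:
    """Hypothetical stand-in for a real tokenizer: each word becomes its length,
    then every sequence is truncated or zero-padded to max_length."""
    input_ids, attention_mask = [], []
    for text in texts:
        ids = [len(word) for word in text.split()][:max_length]
        pad = max_length - len(ids)
        input_ids.append(ids + [0] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def preprocess_function(examples: Dict[str, List[str]], tokenizer: Callable, max_length: int = 128):
    # In batched mode, `examples` maps each column name to a list of values,
    # so the whole list of sentences is tokenized in one call.
    return tokenizer(examples["sentence"], max_length=max_length)


# Bind the extra arguments so the resulting callable takes only the batch.
calibration_preprocess = partial(preprocess_function, tokenizer=toy_tokenizer, max_length=4)
```

With a real tokenizer in place of toy_tokenizer, the bound callable would be passed as preprocess_function to get_calibration_dataset, for example together with dataset_name="glue" and dataset_config_name="sst2".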
quantize
(quantization_config: PostTrainingQuantConfig, save_directory: typing.Union[str, pathlib.Path], calibration_dataset: Dataset = None, batch_size: int = 8, data_collator: typing.Optional[DataCollator] = None, remove_unused_columns: bool = True, file_name: str = None, **kwargs)
Parameters
- quantization_config (PostTrainingQuantConfig) — The configuration containing the parameters related to quantization.
- save_directory (Union[str, Path]) — The directory where the quantized model should be saved.
- calibration_dataset (datasets.Dataset, defaults to None) — The dataset to use for the calibration step, needed for post-training static quantization.
- batch_size (int, defaults to 8) — The number of calibration samples to load per batch.
- data_collator (DataCollator, defaults to None) — The function to use to form a batch from a list of elements of the calibration dataset.
- remove_unused_columns (bool, defaults to True) — Whether or not to remove the columns unused by the model forward method.
Quantize a model given the optimization specifications defined in quantization_config.
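The batch_size, data_collator and remove_unused_columns arguments together govern how calibration samples reach the model. The plain-Python sketch below illustrates that pipeline under stated assumptions: the column filter keeps only keys that are parameter names of the model's forward method (the same idea Transformers' Trainer applies by inspecting model.forward), while default_collate and forward are hypothetical helpers. The library itself relies on its own data loading rather than this generator.

```python
import inspect
from typing import Any, Callable, Dict, Iterator, List


def drop_unused_columns(rows: List[Dict[str, Any]], forward: Callable) -> List[Dict[str, Any]]:
    """Keep only the columns whose names are parameters of `forward`."""
    accepted = set(inspect.signature(forward).parameters)
    return [{key: value for key, value in row.items() if key in accepted} for row in rows]


def default_collate(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical minimal collator: turn a list of examples into a dict of lists."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def iter_calibration_batches(
    rows: List[Dict[str, Any]],
    batch_size: int = 8,
    data_collator: Callable[[List[Dict[str, Any]]], Dict[str, List[Any]]] = default_collate,
) -> Iterator[Dict[str, List[Any]]]:
    """Yield collated batches of at most `batch_size` examples."""
    for start in range(0, len(rows), batch_size):
        yield data_collator(rows[start : start + batch_size])


def forward(input_ids, attention_mask=None):
    """Hypothetical model forward signature used only for column filtering."""
    return None
```

Columns such as a raw "sentence" string or an "idx" counter would be dropped here because forward does not accept them, which is what prevents them from breaking the model call during calibration.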
INCTrainer
class optimum.intel.INCTrainer
(model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module] = None, args: TrainingArguments = None, data_collator: typing.Optional[DataCollator] = None, train_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None, eval_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None, tokenizer: typing.Optional[transformers.tokenization_utils_base.PreTrainedTokenizerBase] = None, model_init: typing.Callable[[], transformers.modeling_utils.PreTrainedModel] = None, compute_metrics: typing.Union[typing.Callable[[transformers.trainer_utils.EvalPrediction], typing.Dict], NoneType] = None, callbacks: typing.Optional[typing.List[transformers.trainer_callback.TrainerCallback]] = None, optimizers: typing.Tuple[torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler.LambdaLR] = (None, None), preprocess_logits_for_metrics: typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None, quantization_config: typing.Optional[neural_compressor.config._BaseQuantizationConfig] = None, pruning_config: typing.Optional[neural_compressor.config._BaseQuantizationConfig] = None, distillation_config: typing.Optional[neural_compressor.config._BaseQuantizationConfig] = None, task: typing.Optional[str] = None, save_onnx_model: bool = False)
INCTrainer enables Intel Neural Compressor quantization-aware training, pruning, and distillation.
compute_distillation_loss
How the distillation loss is computed given the student and teacher outputs.
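The exact distillation loss depends on distillation_config. As an illustration only, the classic soft-target knowledge-distillation loss compares temperature-softened teacher and student distributions with a KL divergence and scales the result by the square of the temperature, which keeps gradient magnitudes comparable across temperatures. The sketch below implements that standard formulation on plain lists of logits; it is not necessarily what INCTrainer computes.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature**2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # terms with zero teacher probability contribute nothing
    )
    return kl * temperature**2
```

In training, this term is typically blended with the ordinary task loss through a weighting factor; how INCTrainer weights the two is governed by its configuration.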
compute_loss
How the loss is computed by Trainer. By default, all models return the loss in the first element.
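The "loss in the first element" convention means that, when labels are included in the inputs, a model's output either exposes a "loss" key or carries the loss at index 0 of a tuple. The hypothetical helper below sketches that extraction in the spirit of Transformers' Trainer; it is not the INCTrainer source.

```python
from typing import Any, Mapping, Sequence, Union


def extract_loss(outputs: Union[Mapping[str, Any], Sequence[Any]]) -> Any:
    """Return the loss from model outputs (dict-like or tuple)."""
    if isinstance(outputs, Mapping):
        # Dict-like outputs (e.g. a ModelOutput) expose the loss under "loss".
        if "loss" not in outputs:
            raise ValueError(f"Model outputs have no 'loss' key; got keys {list(outputs)}")
        return outputs["loss"]
    # Tuple outputs carry the loss as their first element.
    return outputs[0]
```

Raising an error when the key is missing makes the common mistake of forgetting to pass labels visible immediately instead of surfacing later as a confusing type error.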
save_model
(output_dir: typing.Optional[str] = None, _internal_call: bool = False, save_onnx_model: typing.Optional[bool] = None)
Will save the model, so you can reload it using from_pretrained().
Will only save from the main process.
INCModel
from_pretrained
(model_name_or_path: str, q_model_name: typing.Optional[str] = None, **kwargs) → q_model
Parameters
- model_name_or_path (str) — Repository name in the Hugging Face Hub or path to a local directory hosting the model.
- q_model_name (str, optional) — Name of the state dictionary located in model_name_or_path used to load the quantized model. If state_dict_path is specified, q_model_name is not used.
- cache_dir (str, optional) — Path to a directory in which a downloaded configuration should be cached if the standard cache should not be used.
- force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the configuration files and override the cached versions if they exist.
- resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received files. Attempts to resume the download if such a file exists.
- revision (str, optional) — The specific model version to use. It can be a branch name, a tag name, or a commit id: since models and other artifacts on huggingface.co are stored with a git-based system, revision can be any identifier allowed by git.
- state_dict_path (str, optional) — The path to the state dictionary of the quantized model.
Returns
q_model
Quantized model.
Instantiate a quantized PyTorch model from a given Intel Neural Compressor configuration file.