# 🤗 Optimum Intel
🤗 Optimum Intel is the interface between the 🤗 Transformers library and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
Intel Neural Compressor is an open-source library enabling the use of the most popular compression techniques, such as quantization, pruning and knowledge distillation. It supports automatic accuracy-driven tuning strategies so that users can easily generate quantized models. Users can apply static, dynamic and quantization-aware training approaches while specifying an expected accuracy criterion. It also supports different weight pruning techniques, enabling the creation of pruned models for a predefined sparsity target.
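As a concrete illustration, here is a minimal sketch of post-training dynamic quantization through the `INCQuantizer` integration; the model checkpoint and save directory are illustrative placeholders:

```python
from transformers import AutoModelForSequenceClassification
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer

# Example checkpoint; replace with your own model.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Dynamic quantization computes activation ranges at runtime,
# so no calibration dataset is needed.
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(
    quantization_config=quantization_config,
    save_directory="quantized_model",  # hypothetical output directory
)
```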
OpenVINO is an open-source toolkit that enables high-performance inference on Intel CPUs, GPUs and dedicated deep learning inference accelerators. It comes with a set of tools to optimize and quantize models. Optimum Intel provides a simple interface to optimize Transformers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO.
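For instance, a minimal sketch of loading a Transformers model as an OpenVINO model and running inference with it could look as follows; the checkpoint and input text are illustrative placeholders:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification

# Example checkpoint; replace with your own model.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch checkpoint to the OpenVINO IR format on the fly.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum Intel makes OpenVINO inference easy!"))
```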
## Installation
To install the latest release of 🤗 Optimum Intel with the required dependencies for a given accelerator, run the corresponding command:
| Accelerator | Installation |
|---|---|
| Intel Neural Compressor (INC) | `python -m pip install optimum[neural-compressor]` |
| Intel OpenVINO | `python -m pip install optimum[openvino,nncf]` |
We recommend creating a virtual environment and upgrading pip with `python -m pip install --upgrade pip`.
Optimum Intel is a fast-moving project, and you may want to install from source with the following command:
```bash
python -m pip install git+https://github.com/huggingface/optimum-intel.git
```
or to install from source including dependencies:
```bash
python -m pip install git+https://github.com/huggingface/optimum-intel.git#egg=optimum-intel[extras]
```
where `extras` can be one or more of `neural-compressor`, `openvino`, `nncf`.