Joint magnitude pruning, quantization and distillation on BERT-base/SST-2

This model was produced by applying unstructured magnitude pruning, quantization, and distillation jointly to BERT-base while fine-tuning on the GLUE SST-2 dataset. It achieves the following results on the evaluation set:

Torch loss: 0.3858
Torch accuracy: 0.9128
OpenVINO IR accuracy: 0.9128
Sparsity in transformer block linear layers: 0.80
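
To make the sparsity figure concrete, here is a minimal sketch of unstructured magnitude pruning on a toy weight matrix in plain Python. It is not the NNCF implementation used for this model: the helper names (`magnitude_prune`, `sparsity`) and the toy values are illustrative assumptions. The idea is simply that the weights with the smallest absolute values are set to zero until the target fraction is reached.

```python
def magnitude_prune(weights, target_sparsity):
    """Zero the smallest-magnitude entries of a 2-D list of weights.

    Illustrative only: NNCF applies sparsity masks during training
    rather than editing a finished weight matrix like this.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    num_to_prune = int(len(flat) * target_sparsity)
    if num_to_prune == 0:
        return [list(row) for row in weights]
    # Entries with magnitude at or below this threshold are pruned.
    # Ties at the threshold can push sparsity slightly above the target.
    threshold = flat[num_to_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Hypothetical 2x5 weight matrix standing in for one linear layer.
toy_layer = [
    [0.9, -0.05, 0.3, -0.02, 0.01],
    [-0.7, 0.04, -0.08, 0.06, 0.1],
]
pruned = magnitude_prune(toy_layer, target_sparsity=0.8)
print(pruned)
print(sparsity(pruned))
```

With the toy values above, only the two largest-magnitude weights survive, which corresponds to the 0.80 sparsity level reported for the transformer block linear layers.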

Setup

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
git clone https://github.com/yujiepan-work/optimum-intel.git
cd optimum-intel
git checkout -b "magnitude-pruning" 01927af543eaea8678671bf8f4eb78fdb29f8930
pip install -e ".[openvino,nncf]"

cd examples/openvino/text-classification/
pip install -r requirements.txt
pip install wandb # optional

NNCF config

See nncf_config.json in this repo.
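
For a quick look at which compression algorithms a config enables, the following sketch reads an NNCF JSON config and lists the `algorithm` field of each entry under `compression`. NNCF accepts `compression` as either a single object or a list of objects, so both cases are handled. The helper name `list_algorithms` is an assumption for illustration and is not part of this repo.

```python
import json


def list_algorithms(config_path):
    """Return the algorithm names declared in an NNCF config file."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    compression = config.get("compression", [])
    # NNCF allows a single algorithm object or a list of them.
    if isinstance(compression, dict):
        compression = [compression]
    return [entry.get("algorithm") for entry in compression]
```

Calling `list_algorithms("nncf_config.json")` from the directory that holds the file should return one name per configured algorithm.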

Run

Training uses a single GPU.

NNCFCFG=/path/to/nncf/config
python run_glue.py \
--lr_scheduler_type cosine_with_restarts \
--cosine_cycle_ratios 11,6 \
--cosine_cycle_decays 1,1 \
--save_best_model_after_epoch -1 \
--save_best_model_after_sparsity 0.7999 \
--model_name_or_path textattack/bert-base-uncased-SST-2 \
--teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
--distillation_temperature 2 \
--task_name sst2 \
--nncf_compression_config $NNCFCFG \
--distillation_weight 0.95 \
--output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80-17epoch \
--run_name bert-base-uncased-sst2-int8-unstructured80-17epoch \
--overwrite_output_dir \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--learning_rate 5e-05 \
--optim adamw_torch \
--num_train_epochs 17 \
--logging_steps 1 \
--evaluation_strategy steps \
--eval_steps 250 \
--save_strategy steps \
--save_steps 250 \
--save_total_limit 1 \
--fp16 \
--seed 1
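
The `--distillation_temperature` and `--distillation_weight` flags control how the teacher's predictions are mixed into the training loss. The sketch below shows one common formulation in plain Python: a temperature-softened KL divergence between teacher and student, scaled by the squared temperature, blended with the task cross-entropy. Whether `run_glue.py` at this commit computes exactly this is an assumption. The function names and the example logits are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, times T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature**2


def cross_entropy(logits, label):
    """Standard cross-entropy of one example against its true label."""
    return -math.log(softmax(logits)[label])


def joint_loss(student_logits, teacher_logits, label, temperature=2.0, weight=0.95):
    """Blend task loss and distillation loss (assumed weighting scheme)."""
    task = cross_entropy(student_logits, label)
    distill = distillation_loss(student_logits, teacher_logits, temperature)
    return (1.0 - weight) * task + weight * distill


# Made-up logits for one SST-2 example (index 1 = positive).
loss = joint_loss(student_logits=[-1.0, 2.0], teacher_logits=[-2.0, 3.0], label=1)
print(loss)
```

Under this formulation, a weight of 0.95 means the student is trained mostly to match the teacher's softened output distribution, with the ground-truth labels contributing the remaining 5% of the loss.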

The best model checkpoint is stored in the best_model folder. Only that checkpoint folder and a few config files are uploaded to this repo.

Inference

https://gist.github.com/yujiepan-work/c38dc4e56c7a9d803c42988f7b7d260a

Framework versions

Transformers 4.26.0
PyTorch 1.13.1+cu116
Datasets 2.8.0
Tokenizers 0.13.2

For a full description of the environment, please refer to pip-requirements.txt and conda-requirements.txt.
