---
language:
- en
tags:
- generated_from_trainer
datasets:
- glue
metrics:
- accuracy
model-index:
- name: yujiepan/bert-base-uncased-sst2-int8-unstructured80-30epoch
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE SST2
      type: glue
      config: sst2
      split: validation
      args: sst2
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9139908256880734
---

# Joint magnitude pruning, quantization and distillation on BERT-base/SST-2

This model applies unstructured magnitude pruning, quantization and knowledge distillation jointly while fine-tuning on the GLUE SST-2 dataset. It achieves the following results on the evaluation set:

- Torch loss: 0.4116
- Torch accuracy: 0.9140
- OpenVINO IR accuracy: 0.9106
- Sparsity in transformer block linear layers: 0.80

## Setup

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
git clone https://github.com/yujiepan-work/optimum-intel.git
cd optimum-intel
git checkout -b "magnitude-pruning" 01927af543eaea8678671bf8f4eb78fdb29f8930
pip install -e .[openvino,nncf]
cd examples/openvino/text-classification/
pip install -r requirements.txt
pip install wandb  # optional
```

## NNCF config

See `nncf_config.json` in this repo.

## Run

We use a single GPU for training.

```
NNCFCFG=/path/to/nncf/config
python run_glue.py \
    --lr_scheduler_type cosine_with_restarts \
    --cosine_cycle_ratios 8,6,4,4,4,4 \
    --cosine_cycle_decays 1,1,1,1,1,1 \
    --save_best_model_after_epoch -1 \
    --save_best_model_after_sparsity 0.7999 \
    --model_name_or_path textattack/bert-base-uncased-SST-2 \
    --teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
    --distillation_temperature 2 \
    --task_name sst2 \
    --nncf_compression_config $NNCFCFG \
    --distillation_weight 0.95 \
    --output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80-30epoch \
    --run_name bert-base-uncased-sst2-int8-unstructured80-30epoch \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32 \
    --learning_rate 5e-05 \
    --optim adamw_torch \
    --num_train_epochs 30 \
    --logging_steps 1 \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --save_strategy steps \
    --save_steps 250 \
    --save_total_limit 1 \
    --fp16 \
    --seed 1
```

The best model checkpoint is stored in the `best_model` folder. Here we only upload that checkpoint folder together with some config files.

### Framework versions

- Transformers 4.26.0
- PyTorch 1.13.1+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2

For a full description of the environment, please refer to `pip-requirements.txt` and `conda-requirements.txt`.
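
## Illustrative NNCF config sketch

If you have not worked with NNCF before, the sketch below shows the general shape of a config that enables unstructured magnitude sparsity and INT8 quantization in the same run. It is only an illustration: the input shapes, schedule and sample counts are placeholder assumptions, not the settings used for this model. Refer to `nncf_config.json` in this repo for the actual configuration.

```python
# Illustrative only: a config stacking magnitude sparsity and quantization.
# All values are placeholders; see `nncf_config.json` in this repo for the
# configuration that was actually used.
import json

nncf_config = {
    # Input shapes NNCF uses to trace the model (batch 1, seq length 128 assumed).
    "input_info": [
        {"sample_size": [1, 128], "type": "long", "keyword": "input_ids"},
        {"sample_size": [1, 128], "type": "long", "keyword": "token_type_ids"},
        {"sample_size": [1, 128], "type": "long", "keyword": "attention_mask"},
    ],
    # Passing a list stacks several compression algorithms in one run.
    "compression": [
        {
            "algorithm": "magnitude_sparsity",
            "sparsity_init": 0.0,
            "params": {
                "schedule": "polynomial",      # ramp sparsity up gradually
                "sparsity_target": 0.8,        # prune 80% of the targeted weights
                "sparsity_target_epoch": 20,   # placeholder epoch numbers
                "sparsity_freeze_epoch": 25,
            },
        },
        {
            "algorithm": "quantization",
            "initializer": {
                "range": {"num_init_samples": 300},
                "batchnorm_adaptation": {"num_bn_adaptation_samples": 0},
            },
        },
    ],
}

with open("nncf_config_example.json", "w") as f:
    json.dump(nncf_config, f, indent=4)
```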
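
## Inference sketch

As a minimal usage sketch (separate from the training recipe above), assuming an OpenVINO IR of this model (`openvino_model.xml`/`.bin`) is available in the repo or has been exported locally with optimum-intel, the quantized model can be run through the standard `transformers` pipeline API:

```python
# Minimal inference sketch. Assumption: an OpenVINO IR of this model is
# available (in this repo or exported locally with optimum-intel); otherwise
# load the Torch checkpoint with the training code above instead.
from optimum.intel.openvino import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "yujiepan/bert-base-uncased-sst2-int8-unstructured80-30epoch"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForSequenceClassification.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("a charming and often affecting journey"))
# e.g. [{'label': ..., 'score': ...}], depending on the checkpoint's label mapping
```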