---
language:
- en
tags:
- generated_from_trainer
datasets:
- glue
metrics:
- accuracy
model-index:
- name: yujiepan/bert-base-uncased-sst2-int8-unstructured80
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE SST2
      type: glue
      config: sst2
      split: validation
      args: sst2
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.91284
pipeline_tag: text-classification
---

# Joint magnitude pruning, quantization and distillation on BERT-base/SST-2

This model applies unstructured magnitude pruning, quantization, and knowledge distillation jointly while fine-tuning BERT-base on the GLUE SST-2 dataset.
It achieves the following results on the evaluation set:
- Torch accuracy: 0.9128
- OpenVINO IR accuracy: 0.9128
- Sparsity in transformer block linear layers: 0.80
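The reported sparsity is the fraction of zero-valued weights in the linear layers of the transformer encoder blocks. A minimal sketch for checking this on the PyTorch checkpoint (assuming the uploaded weights preserve the pruned zeros and the standard `transformers` module naming; the snippet is illustrative and not part of this repo):

```
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "yujiepan/bert-base-uncased-sst2-int8-unstructured80"
)

zeros, total = 0, 0
for name, module in model.named_modules():
    # Count only linear layers inside the encoder blocks, matching the metric above.
    if isinstance(module, torch.nn.Linear) and "encoder.layer" in name:
        weight = module.weight.detach()
        zeros += (weight == 0).sum().item()
        total += weight.numel()

print(f"Sparsity in transformer block linear layers: {zeros / total:.2f}")
```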

## Setup

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install optimum[openvino,nncf]==1.7.0
# TODO
pip install wandb # optional
```

## NNCF config

See `nncf_config.json` in this repo.
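The actual settings live in that file; the sketch below only illustrates the general shape of a joint magnitude-sparsity plus quantization NNCF configuration. The field names follow NNCF's documented schema, but every value and scope here is a placeholder rather than what was used for this model:

```
import json

# Illustrative structure only; see nncf_config.json in this repo for the real settings.
nncf_config = {
    "input_info": [{"sample_size": [1, 128], "type": "long"}],  # placeholder input spec
    "compression": [
        {
            "algorithm": "magnitude_sparsity",
            "sparsity_init": 0.0,
            "params": {
                "schedule": "polynomial",
                "sparsity_target": 0.80,      # matches the 80% sparsity reported above
                "sparsity_target_epoch": 10,  # placeholder
            },
        },
        {
            "algorithm": "quantization",  # int8 fake-quantization during training
        },
    ],
}

with open("nncf_config_sketch.json", "w") as f:
    json.dump(nncf_config, f, indent=2)
```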


## Run

We use a single GPU for training.

```
NNCFCFG=/path/to/nncf/config
python run_glue.py \
--lr_scheduler_type cosine_with_restarts \
--cosine_cycle_ratios 11,6 \
--cosine_cycle_decays 1,1 \
--save_best_model_after_epoch -1 \
--save_best_model_after_sparsity 0.7999 \
--model_name_or_path textattack/bert-base-uncased-SST-2 \
--teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
--distillation_temperature 2 \
--task_name sst2 \
--nncf_compression_config $NNCFCFG \
--distillation_weight 0.95 \
--output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80-17epoch \
--run_name bert-base-uncased-sst2-int8-unstructured80-17epoch \
--overwrite_output_dir \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--learning_rate 5e-05 \
--optim adamw_torch \
--num_train_epochs 17 \
--logging_steps 1 \
--evaluation_strategy steps \
--eval_steps 250 \
--save_strategy steps \
--save_steps 250 \
--save_total_limit 1 \
--fp16 \
--seed 1
```

The best model checkpoint is stored in the `best_model` folder; only that checkpoint folder and a few config files are uploaded to this repo.
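The `--distillation_weight 0.95` and `--distillation_temperature 2` flags above control how the hard-label task loss is mixed with a soft-label loss from the larger teacher model. The exact formulation is defined in the training script; a common way to combine these terms, shown purely as an illustration, is:

```
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, distillation_weight=0.95):
    # Soft-label term: KL divergence between temperature-softened teacher and student
    # distributions, scaled by temperature**2 as is conventional.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return distillation_weight * kd + (1.0 - distillation_weight) * ce
```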


## Inference

An example inference script is available in this gist:

https://gist.github.com/yujiepan-work/c38dc4e56c7a9d803c42988f7b7d260a
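For a quick check of the PyTorch checkpoint without OpenVINO, a generic `transformers` pipeline call (not taken from the gist above, just a minimal sketch) also works:

```
from transformers import pipeline

# Runs the PyTorch checkpoint; for OpenVINO IR inference, see the gist linked above.
classifier = pipeline(
    "text-classification",
    model="yujiepan/bert-base-uncased-sst2-int8-unstructured80",
)
print(classifier("a charming and often affecting journey"))
```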


## Framework versions

- Transformers 4.26.0
- Pytorch 1.13.1+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2

For a full description of the environment, please refer to `pip-requirements.txt` and `conda-requirements.txt`.