kssteven committed
Commit a4be6cb · 1 Parent(s): 549add7
Files changed (1)
  1. README.md +103 -0
README.md ADDED
@@ -0,0 +1,103 @@
# I-BERT base model

This model, `ibert-roberta-base`, is an integer-only quantized version of [RoBERTa](https://arxiv.org/abs/1907.11692), and was introduced in [this paper](https://arxiv.org/abs/2101.01321).
I-BERT stores all parameters in an INT8 representation and carries out the entire inference using integer-only arithmetic.
In particular, I-BERT replaces all floating-point operations in the Transformer architecture (e.g., MatMul, GELU, Softmax, and LayerNorm) with closely approximating integer operations.
This can result in up to a 4x inference speedup compared to the floating-point counterpart when tested on an Nvidia T4 GPU.
The best model parameters found via quantization-aware finetuning can then be exported (e.g., to TensorRT) for integer-only deployment of the model.

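To make the idea of INT8 storage with integer arithmetic concrete, here is a minimal sketch of symmetric per-tensor quantization followed by an integer dot product. It is an illustration under stated assumptions, not I-BERT's actual implementation: the helper names `quantize_symmetric` and `int_dot` and the sample values are hypothetical, and the final multiplication by the float scales is there only to compare against the floating-point reference.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers that share one scale (symmetric, per-tensor)."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into [-q_max, q_max].
    q = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return q, scale


def int_dot(q_a, q_b):
    """Dot product computed on integers only (the accumulator is a plain int)."""
    return sum(x * y for x, y in zip(q_a, q_b))


# Hypothetical sample data standing in for one weight row and one activation vector.
weights = [0.42, -1.30, 0.07, 0.95]
activations = [1.10, 0.25, -0.60, 0.80]

q_w, s_w = quantize_symmetric(weights)
q_x, s_x = quantize_symmetric(activations)

acc = int_dot(q_w, q_x)      # integer-only accumulation
approx = acc * s_w * s_x     # rescaled here only to compare with the float result
exact = sum(w * x for w, x in zip(weights, activations))

print(f"integer accumulator: {acc}")
print(f"rescaled: {approx:.4f}  float reference: {exact:.4f}")
```

The scales `s_w` and `s_x` are kept separate from the integer values, so the expensive part of the computation (the multiply-accumulate) touches only integers. A fully integer-only pipeline would also have to avoid the float rescaling step shown at the end.
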
## Finetuning Procedure

Finetuning of I-BERT consists of three stages: (1) full-precision finetuning from the pretrained model on a downstream task, (2) model quantization, and (3) integer-only finetuning (i.e., quantization-aware training) of the quantized model.

### Full-precision finetuning

Full-precision finetuning of I-BERT is similar to RoBERTa finetuning.
For instance, you can run the following command to finetune on the [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) text classification task.

```
python examples/text-classification/run_glue.py \
  --model_name_or_path kssteven/ibert-roberta-base \
  --task_name MRPC \
  --do_eval \
  --do_train \
  --evaluation_strategy epoch \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --save_steps 115 \
  --learning_rate 2e-5 \
  --num_train_epochs 10 \
  --output_dir $OUTPUT_DIR
```

### Model Quantization

Once you are done with full-precision finetuning, open `config.json` in your checkpoint directory and set the `quant_mode` attribute to `true`.

```
{
  "_name_or_path": "kssteven/ibert-roberta-base",
  "architectures": [
    "IBertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "finetuning_task": "mrpc",
  "force_dequant": "none",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "ibert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "quant_mode": true,
  "tokenizer_class": "RobertaTokenizer",
  "transformers_version": "4.4.0.dev0",
  "type_vocab_size": 1,
  "vocab_size": 50265
}
```

Your model will then automatically run in integer-only mode when you load the checkpoint.
Also, make sure to delete `optimizer.pt`, `scheduler.pt`, and `trainer_state.json` from the same directory.
Otherwise, HF will not reset the optimizer, scheduler, or trainer state for the subsequent integer-only finetuning.

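If you prefer to script this step rather than edit the checkpoint by hand, the following sketch shows one way it could be done. It assumes the checkpoint directory contains a `config.json` like the one shown above; the function name `prepare_for_integer_finetuning` is hypothetical, and the demonstration runs on a throwaway directory with a stub config rather than a real checkpoint.

```python
import json
import tempfile
from pathlib import Path

# State files that should not carry over into integer-only finetuning.
STALE_FILES = ("optimizer.pt", "scheduler.pt", "trainer_state.json")


def prepare_for_integer_finetuning(checkpoint_dir):
    """Set quant_mode to true in config.json and delete stale state files.

    Returns the names of the files that were actually removed.
    """
    checkpoint_dir = Path(checkpoint_dir)
    config_path = checkpoint_dir / "config.json"

    config = json.loads(config_path.read_text())
    config["quant_mode"] = True
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")

    removed = []
    for name in STALE_FILES:
        path = checkpoint_dir / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed


# Demonstration on a temporary directory standing in for a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp)
    (ckpt / "config.json").write_text(
        json.dumps({"model_type": "ibert", "quant_mode": False})
    )
    (ckpt / "optimizer.pt").write_bytes(b"")  # empty placeholder file

    removed = prepare_for_integer_finetuning(ckpt)
    updated = json.loads((ckpt / "config.json").read_text())

print("removed:", removed)
print("quant_mode:", updated["quant_mode"])
```

Only files that exist are deleted, so the function can be rerun safely on a checkpoint that has already been prepared.
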
### Integer-only finetuning (Quantization-aware training)

Finally, you can run integer-only finetuning simply by loading the checkpoint you modified.
Note that the example command below differs from the full-precision one only in `model_name_or_path` and in its smaller `learning_rate`.

```
python examples/text-classification/run_glue.py \
  --model_name_or_path $CHECKPOINT_DIR \
  --task_name MRPC \
  --do_eval \
  --do_train \
  --evaluation_strategy epoch \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --save_steps 115 \
  --learning_rate 1e-6 \
  --num_train_epochs 10 \
  --output_dir $OUTPUT_DIR
```

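Because the full-precision and integer-only commands share almost all of their arguments, it can help to generate both from one place so they do not drift apart. The sketch below assembles the two `run_glue.py` invocations from the flags used in this README. The helper `build_command` and the `out/...` paths are hypothetical placeholders, and the script only prints the commands rather than running them.

```python
import shlex

# Arguments that are identical in both finetuning stages (taken from the commands above).
SHARED_ARGS = {
    "task_name": "MRPC",
    "evaluation_strategy": "epoch",
    "max_seq_length": 128,
    "per_device_train_batch_size": 32,
    "save_steps": 115,
    "num_train_epochs": 10,
}


def build_command(model_path, learning_rate, output_dir):
    """Return the run_glue.py invocation as a list of argument strings."""
    args = [
        "python", "examples/text-classification/run_glue.py",
        "--model_name_or_path", model_path,
        "--do_eval",
        "--do_train",
        "--learning_rate", learning_rate,
        "--output_dir", output_dir,
    ]
    for name, value in SHARED_ARGS.items():
        args += [f"--{name}", str(value)]
    return args


# Stage 1: full-precision finetuning from the published model.
full_precision = build_command("kssteven/ibert-roberta-base", "2e-5", "out/full_precision")
# Stage 3: integer-only finetuning from the modified checkpoint (placeholder path).
integer_only = build_command("out/full_precision/checkpoint", "1e-6", "out/integer_only")

print(shlex.join(full_precision))
print(shlex.join(integer_only))

# Positions where the two commands differ: model path, learning rate, output directory.
differences = [(a, b) for a, b in zip(full_precision, integer_only) if a != b]
```

Each list can be passed to `subprocess.run` as-is once the placeholder paths are replaced with real ones.
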
## Citation info

If you use I-BERT, please cite [our paper](https://arxiv.org/abs/2101.01321).
```
@article{kim2021bert,
  title={I-BERT: Integer-only BERT Quantization},
  author={Kim, Sehoon and Gholami, Amir and Yao, Zhewei and Mahoney, Michael W and Keutzer, Kurt},
  journal={arXiv preprint arXiv:2101.01321},
  year={2021}
}
```