
# Joint Pruning, Quantization and Distillation for BERT-large/SQuADv1.1

## Setup

```bash
git clone https://github.com/vuiseng9/optimum-intel
cd optimum-intel
pip install -e ".[openvino,nncf]"

cd examples/openvino/question-answering/
pip install -r requirements.txt

pip install wandb # optional
```
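
To confirm the editable install resolved the OpenVINO/NNCF extras, a quick import check can help (a minimal sketch; class names follow current optimum-intel releases and may differ in yours):

```bash
# Sanity check: NNCF and the OpenVINO backend of optimum-intel should both import cleanly
python -c "import nncf; print('nncf', nncf.__version__)"
python -c "from optimum.intel import OVModelForQuestionAnswering; print('optimum-intel OK')"
```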

## Run

```bash
NNCFCFG=/path/to/openvino_config.json
MASTER_PORT=<PORTID>
RUNID=<RUN_IDENTIFIER>
OUTDIR=/path/to/saved_model

NEPOCH=30

python -m torch.distributed.launch \
    --nproc_per_node 4 \
    --master_port $MASTER_PORT \
    run_qa.py \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --dataset_name squad \
    --teacher_model_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
    --distillation_weight 0.9 \
    --do_eval \
    --fp16 \
    --do_train \
    --learning_rate 3e-5 \
    --num_train_epochs $NEPOCH \
    --per_device_eval_batch_size 128 \
    --per_device_train_batch_size 16 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --logging_steps 1 \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --save_steps 500 \
    --overwrite_output_dir \
    --run_name $RUNID \
    --output_dir $OUTDIR \
    --nncf_compression_config $NNCFCFG
```
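
The `openvino_config.json` passed via `--nncf_compression_config` is not reproduced in this card. As rough orientation only, a joint pruning/quantization NNCF config typically combines a movement-sparsity stage with a quantization stage; the sketch below is hypothetical, with illustrative key names and values that vary across NNCF versions, so consult the NNCF and optimum-intel examples for the exact schema:

```bash
# Hypothetical sketch of a joint pruning + quantization NNCF config.
# All keys and values below are illustrative, not the exact config used for this run.
cat > openvino_config.json <<'EOF'
{
    "compression": [
        {
            "algorithm": "movement_sparsity",
            "params": {
                "warmup_start_epoch": 1,
                "warmup_end_epoch": 4,
                "importance_regularization_factor": 0.02,
                "enable_structured_masking": true
            }
        },
        {
            "algorithm": "quantization",
            "initializer": {
                "range": {"num_init_samples": 300}
            }
        }
    ]
}
EOF
```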

## Reference Results

| Metric                        | Value  |
|-------------------------------|--------|
| Global Step                   | 39500  |
| F1                            | 92.482 |
| EM                            | 86.594 |
| Structured Sparsity (linear)  | 61.70% |
| Model Sparsity                | 55.82% |
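
To re-score a saved checkpoint without retraining, an evaluation-only invocation of the same script should suffice (a sketch, assuming `run_qa.py` accepts the checkpoint directory as `--model_name_or_path`; flags mirror the training command above, and the compression config may still be required so the compressed graph is rebuilt before weights are loaded):

```bash
# Evaluation-only pass over SQuADv1.1 with the compressed checkpoint
python run_qa.py \
    --model_name_or_path $OUTDIR \
    --dataset_name squad \
    --do_eval \
    --per_device_eval_batch_size 128 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir $OUTDIR/eval \
    --nncf_compression_config $NNCFCFG
```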