Chua, Vui Seng committed
Commit: fae2108
Parent(s): 5b8a717
Update readme
README.md CHANGED

@@ -1,6 +1,6 @@
 This model is a downstream optimization of [```vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt```](https://huggingface.co/vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt) using [OpenVINO/NNCF](https://github.com/openvinotoolkit/nncf). Applied optimization includes:
-1. magnitude sparsification
-2. NNCF Quantize-Aware Training
+1. Magnitude sparsification at 50% upon initialization. Parameters are ranked globally by their absolute magnitude. Only the linear layers of self-attention and FFNN are targeted.
+2. NNCF Quantization-Aware Training - symmetric 8-bit for both weights and activations on all learnable layers.
 3. Custom distillation with large model ```bert-large-uncased-whole-word-masking-finetuned-squad```
 
 ```
@@ -84,7 +84,6 @@ python run_qa.py \
 ```
 
 # Eval
-
 This repo must be cloned locally.
 ```bash
 git clone https://huggingface.co/vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt-nncf-50.0sparse-qat-lt
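For readers trying to map items 1 and 2 of the updated list onto an NNCF setup, the settings correspond to the `compression` section of an NNCF config. The exact config used for this checkpoint is not part of this commit, so the following is only a sketch: the input shapes, the sparsity schedule fields, and the scope restriction are assumptions, while the 50% sparsity level and the symmetric 8-bit weight/activation quantization come from the README text.

```python
# Sketch only: the real NNCF config for this checkpoint is not shown in this commit.
# Input shapes, schedule fields, and scope handling below are assumptions.
from nncf import NNCFConfig
from nncf.torch import create_compressed_model
from transformers import AutoModelForQuestionAnswering

# Student: the block-pruned, filled BERT-base checkpoint this model starts from.
model = AutoModelForQuestionAnswering.from_pretrained(
    "vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt"
)

nncf_config = NNCFConfig.from_dict({
    # Dummy input description NNCF needs to trace the model (SQuAD max length 384 assumed).
    "input_info": [
        {"sample_size": [1, 384], "type": "long"},  # input_ids
        {"sample_size": [1, 384], "type": "long"},  # attention_mask
        {"sample_size": [1, 384], "type": "long"},  # token_type_ids
    ],
    "compression": [
        {
            # 1. Magnitude sparsification: 50% of targeted weights zeroed from the start,
            #    ranked globally by absolute value. Restricting it to the self-attention
            #    and FFNN linear layers would be done via "ignored_scopes"/"target_scopes";
            #    the exact scope names are not known here, so they are omitted.
            "algorithm": "magnitude_sparsity",
            "sparsity_init": 0.5,
            "params": {"schedule": "multistep", "multistep_sparsity_levels": [0.5]},
        },
        {
            # 2. QAT: symmetric 8-bit quantizers for both weights and activations.
            "algorithm": "quantization",
            "weights": {"mode": "symmetric", "bits": 8},
            "activations": {"mode": "symmetric", "bits": 8},
        },
    ],
})

# Wraps the student with sparsity masks and fake-quantization ops; fine-tuning
# (QAT plus distillation) then proceeds with the usual run_qa.py training loop.
compression_ctrl, model = create_compressed_model(model, nncf_config)
```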
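Item 3, the custom distillation against ```bert-large-uncased-whole-word-masking-finetuned-squad```, is likewise not spelled out in this commit. Below is a minimal sketch of a span-logit distillation term for SQuAD; the temperature, the averaging over start/end logits, and the weighting against the regular QA cross-entropy loss are assumptions.

```python
import torch.nn.functional as F

def qa_distillation_loss(student_start, student_end, teacher_start, teacher_end, T=2.0):
    # KL divergence between temperature-softened teacher and student span logits.
    # Each tensor is [batch, seq_len]: start/end logits from the student (BERT-base)
    # and the teacher (bert-large-uncased-whole-word-masking-finetuned-squad).
    def kd(student_logits, teacher_logits):
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)

    return 0.5 * (kd(student_start, teacher_start) + kd(student_end, teacher_end))

# Typical use in a training step (the alpha weighting is an assumption):
# loss = alpha * qa_distillation_loss(s_start, s_end, t_start, t_end) + (1 - alpha) * ce_loss
```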