---
language: en
thumbnail:
license: mit
tags:
- question-answering
- bert
- bert-base
datasets:
- squad
metrics:
- squad
widget:
- text: "Where is the Eiffel Tower located?"
  context: "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower."
- text: "Who is Frederic Chopin?"
  context: "Frédéric François Chopin, born Fryderyk Franciszek Chopin (1 March 1810 – 17 October 1849), was a Polish composer and virtuoso pianist of the Romantic era who wrote primarily for solo piano."
---

## BERT-base uncased model fine-tuned on SQuAD v1

This model was created using the [nn_pruning](https://github.com/huggingface/nn_pruning) Python library: the **linear layers contain 64.0%** of the original weights.

The model contains **36.0%** of the original weights **overall** (the remaining weights are mostly in the input/output embeddings).

With a simple resizing of the linear matrices, it ran **1.84x as fast as BERT-base** during evaluation.
This is possible because the pruning method leads to structured matrices: to visualize them, hover over the plot below to see the non-zero/zero parts of each matrix.

<div class="graph"><script src="/madlag/bert-base-uncased-squadv1-x1.84-f88.7-d36-hybrid-filled-v1/raw/main/model_card/density_info.js" id="6a33cbcb-db1a-4e7a-a6dc-1c353babcf55"></script></div>

In terms of accuracy, its **F1 is 88.72**, compared with 88.5 for BERT-base, an **F1 gain of 0.22**.

## Fine-Pruning details
This model was fine-tuned from the HuggingFace [BERT](https://www.aclweb.org/anthology/N19-1423/) base uncased checkpoint on [SQuAD1.1](https://rajpurkar.github.io/SQuAD-explorer), and distilled from the equivalent model [csarron/bert-base-uncased-squad-v1](https://huggingface.co/csarron/bert-base-uncased-squad-v1).
This model is case-insensitive: it does not make a difference between english and English.

A side effect of the block pruning is that some of the attention heads are completely removed: 48 heads were removed out of a total of 144 (33.3%).
Here is a detailed view of how the remaining heads are distributed across the network after pruning.
<div class="graph"><script src="/madlag/bert-base-uncased-squadv1-x1.84-f88.7-d36-hybrid-filled-v1/raw/main/model_card/pruning_info.js" id="12c50081-be7a-4612-a727-67c9c9309cc9"></script></div>

## Details of the SQuAD1.1 dataset

| Dataset  | Split | # samples |
| -------- | ----- | --------- |
| SQuAD1.1 | train | 90.6K     |
| SQuAD1.1 | eval  | 11.1K     |
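
The splits can also be loaded directly with the `datasets` library as a quick sanity check (a sketch; note that the figures in the table above may count preprocessed features rather than raw examples, so the printed counts can differ slightly):

```python
# Sketch: load SQuAD1.1 and print the raw example counts per split.
from datasets import load_dataset

squad = load_dataset("squad")
print(f"train: {len(squad['train']):,} examples")
print(f"eval:  {len(squad['validation']):,} examples")
```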

### Fine-tuning
- Python: `3.8.5`

- Machine specs:

```
CPU: Intel(R) Core(TM) i7-6700K CPU
Memory: 64 GiB
GPUs: 1 GeForce RTX 3090, with 24 GiB memory
GPU driver: 455.23.05, CUDA: 11.1
```

### Results

**Pytorch model file size**: `379M` (original BERT: `438M`)

| Metric | Value     | Original BERT ([Table 2](https://www.aclweb.org/anthology/N19-1423.pdf)) | Variation |
| ------ | --------- | --------- | --------- |
| **EM** | **81.69** | **80.8** | **+0.89**|
| **F1** | **88.72** | **88.5** | **+0.22**|
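
To reproduce numbers in this ballpark, here is a minimal sketch using the `evaluate` library's `squad` metric (the exact EM/F1 also depends on post-processing details, so small deviations from the table are expected):

```python
# Sketch: score the pipeline on the SQuAD1.1 dev split with the `squad` metric.
import evaluate
from datasets import load_dataset
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="madlag/bert-base-uncased-squadv1-x1.84-f88.7-d36-hybrid-filled-v1",
)
dev = load_dataset("squad", split="validation")
metric = evaluate.load("squad")

preds, refs = [], []
for ex in dev.select(range(100)):  # use the full split for the real numbers
    out = qa(question=ex["question"], context=ex["context"])
    preds.append({"id": ex["id"], "prediction_text": out["answer"]})
    refs.append({"id": ex["id"], "answers": ex["answers"]})

print(metric.compute(predictions=preds, references=refs))
```

The 100-example subset above is only there to keep the sketch fast; run the full validation split to compare against the table.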

## Example Usage

```python
from transformers import pipeline
from nn_pruning.inference_model_patcher import optimize_model

# Load the pruned checkpoint into a standard question-answering pipeline.
qa_pipeline = pipeline(
    "question-answering",
    model="madlag/bert-base-uncased-squadv1-x1.84-f88.7-d36-hybrid-filled-v1",
    tokenizer="madlag/bert-base-uncased-squadv1-x1.84-f88.7-d36-hybrid-filled-v1"
)

print("BERT-base parameters: 110M")
print(f"Parameters count (includes head pruning)={int(qa_pipeline.model.num_parameters() / 1E6)}M")

# Pack the structured-sparse linear layers into smaller dense matrices
# to get the actual speedup at inference time.
qa_pipeline.model = optimize_model(qa_pipeline.model, "dense")
print(f"Parameters count after optimization={int(qa_pipeline.model.num_parameters() / 1E6)}M")

predictions = qa_pipeline({
    'context': "Frédéric François Chopin, born Fryderyk Franciszek Chopin (1 March 1810 – 17 October 1849), was a Polish composer and virtuoso pianist of the Romantic era who wrote primarily for solo piano.",
    'question': "Who is Frederic Chopin?",
})
print("Predictions", predictions)
```