BERT-base uncased model fine-tuned on SQuAD v1

This model is block sparse: the linear layers contain 12.5% of the original weights.

The model contains 32.1% of the original weights overall.
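The gap between the two figures comes from the parts of the network that are not pruned. As a rough back-of-the-envelope check, the sketch below counts parameters for the standard BERT-base architecture with a question-answering head. It assumes (this is an assumption, not something stated in this card) that only the weight matrices of the encoder's linear layers are pruned, while embeddings, biases, LayerNorm and the QA head stay dense. The function name overall_density is illustrative.

```python
# Rough parameter count for BERT-base with a question-answering head.
# ASSUMPTION: only the encoder's linear weight matrices are pruned;
# embeddings, biases, LayerNorm and the QA head are kept dense.
VOCAB, HIDDEN, MAX_POS, TYPES = 30522, 768, 512, 2
LAYERS, INTERMEDIATE = 12, 3072

# Word, position and token-type embeddings, plus the embedding LayerNorm.
embeddings = (VOCAB + MAX_POS + TYPES) * HIDDEN + 2 * HIDDEN

# Per layer: Q, K, V, attention output, and the two feed-forward matrices.
linear_weights_per_layer = 4 * HIDDEN * HIDDEN + 2 * HIDDEN * INTERMEDIATE
# Per layer: biases of those six layers, plus two LayerNorms.
other_per_layer = (4 * HIDDEN + INTERMEDIATE + HIDDEN) + 2 * (2 * HIDDEN)

qa_head = HIDDEN * 2 + 2  # start/end logits

linear_weights = LAYERS * linear_weights_per_layer
dense_rest = embeddings + LAYERS * other_per_layer + qa_head
total = linear_weights + dense_rest


def overall_density(linear_density):
    """Fraction of all parameters kept when linear layers keep `linear_density`."""
    return (linear_density * linear_weights + dense_rest) / total


print(f"total parameters: {total:,}")
print(f"estimated overall density: {overall_density(0.125):.1%}")
```

Under these assumptions the estimate lands close to the reported 32.1%; the unpruned embeddings account for most of the difference between the two percentages.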

The training uses a modified version of Victor Sanh's Movement Pruning method.

With the block-sparse runtime, it ran 1.65x faster than a dense network on the evaluation, at the cost of some accuracy (see below).
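To illustrate why block sparsity can translate into a speedup, here is a minimal pure-Python sketch of a block-sparse matrix-vector product. It is not the block-sparse runtime used for this model: the function name, the dict-based storage, and the toy 2x2 block size are all hypothetical choices made for illustration. The point is only that blocks that were pruned away are never stored and never visited.

```python
def block_sparse_matvec(blocks, block_size, n_rows, x):
    """Multiply a block-sparse matrix by the vector x.

    `blocks` maps (block_row, block_col) to a dense block (a list of rows).
    Blocks missing from the dict are all-zero and are skipped entirely,
    which is where the saving comes from.
    """
    y = [0.0] * n_rows
    for (block_row, block_col), block in blocks.items():
        for i, row in enumerate(block):
            acc = 0.0
            for j, w in enumerate(row):
                acc += w * x[block_col * block_size + j]
            y[block_row * block_size + i] += acc
    return y


# Toy 4x4 matrix made of 2x2 blocks; only 2 of the 4 blocks are kept.
blocks = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (1, 1): [[5.0, 6.0], [7.0, 8.0]],
}
x = [1.0, 1.0, 1.0, 1.0]
y = block_sparse_matvec(blocks, block_size=2, n_rows=4, x=x)
print(y)  # [3.0, 7.0, 11.0, 15.0]
```

Because whole blocks are dropped rather than individual weights, the remaining work stays dense within each block, which is friendlier to hardware than unstructured sparsity.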

This model was fine-tuned from the HuggingFace BERT base uncased checkpoint on SQuAD1.1, and distilled from the equivalent model csarron/bert-base-uncased-squad-v1. This model is case-insensitive: it does not distinguish between english and English.

Pruning details

A side effect of the block pruning is that some of the attention heads are completely removed: 97 heads were removed out of a total of 144 (67.4%).
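The head count can be checked with a few lines of arithmetic, using the 12 layers and 12 heads per layer of BERT-base. The second part is a toy sketch of how a head could be detected as removed: each head owns a contiguous slice of rows in a projection matrix, and if every block covering that slice is pruned, the slice is all zeros. The helper empty_heads and the toy matrix are hypothetical, and which projections (query, key, value) the author's tooling actually inspects is not stated in this card.

```python
LAYERS, HEADS_PER_LAYER = 12, 12  # BERT-base
REMOVED = 97                      # figure reported in this card

total_heads = LAYERS * HEADS_PER_LAYER
remaining_heads = total_heads - REMOVED
removed_pct = round(100 * REMOVED / total_heads, 1)
print(total_heads, remaining_heads, removed_pct)  # 144 47 67.4


def empty_heads(weight, head_dim):
    """Return the indices of heads whose rows in `weight` are entirely zero.

    `weight` is a list of rows with output features first, the layout
    used by a PyTorch nn.Linear weight. Hypothetical helper.
    """
    n_heads = len(weight) // head_dim
    return [
        h for h in range(n_heads)
        if all(v == 0
               for row in weight[h * head_dim:(h + 1) * head_dim]
               for v in row)
    ]


# Toy projection with 2 heads of dimension 2; head 0 is fully pruned.
toy_weight = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
    [0, 3, 0, 0],
]
print(empty_heads(toy_weight, head_dim=2))  # [0]
```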

Here is a detailed view of how the remaining heads are distributed in the network after pruning.

[Figure: Pruning details]

Density plot

Details

Dataset    Split   # samples
SQuAD1.1   train   90.6K
SQuAD1.1   eval    11.1K

Fine-tuning

  • Python: 3.8.5

  • Machine specs:

    Memory: 64 GiB
    GPUs: 1 GeForce RTX 3090, with 24 GiB memory
    GPU driver: 455.23.05, CUDA: 11.1

Results

PyTorch model file size: 342M (original BERT: 438M)

Metric   Value   Original (Table 2)
EM       74.39   80.8
F1       83.26   88.5
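For a quick view of the trade-off, the figures reported above can be turned into an accuracy drop and a file-size ratio. This is plain arithmetic on the numbers in this card; the variable names are illustrative.

```python
# (pruned model, original) scores as reported in this card.
results = {"EM": (74.39, 80.8), "F1": (83.26, 88.5)}

# Absolute drop in points for each metric.
drops = {
    metric: round(original - pruned, 2)
    for metric, (pruned, original) in results.items()
}
for metric, drop in drops.items():
    print(f"{metric}: -{drop} points")

# File size of the pruned checkpoint relative to the original, in percent.
size_pct = round(100 * 342 / 438, 1)
print(f"file size: {size_pct}% of the original")
```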

Example Usage

from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="madlag/bert-base-uncased-squad1.1-block-sparse-0.13-v1",
    tokenizer="madlag/bert-base-uncased-squad1.1-block-sparse-0.13-v1"
)

predictions = qa_pipeline({
    'context': "Frédéric François Chopin, born Fryderyk Franciszek Chopin (1 March 1810 – 17 October 1849), was a Polish composer and virtuoso pianist of the Romantic era who wrote primarily for solo piano.",
    'question': "Who is Frederic Chopin?",
})

print(predictions)