Update notes on model prep
README.md
@@ -9,6 +9,7 @@ datasets: squad
 # mobilebert-uncased-finetuned-squadv1
 
 This model is a finetuned version of the [mobilebert-uncased](https://huggingface.co/google/mobilebert-uncased/tree/main) model on the SQuADv1 task.
+To make this TPU-trained model stable when used in PyTorch on GPUs, the original model has been additionally pretrained for one epoch on BookCorpus and English Wikipedia with disabled dropout before finetuning on the SQuADv1 task.
 
 It is produced as part of the work on the paper [The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models](https://arxiv.org/abs/2203.07259).
 
@@ -30,4 +31,5 @@ If you find the model useful, please consider citing our work.
 journal={arXiv preprint arXiv:2203.07259},
 year={2022}
 }
-```
+```
+
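For context on the added note, a minimal sketch of that prep step, assuming the Hugging Face `transformers` API; the MLM objective and the specific config fields used to disable dropout are assumptions for illustration, not the authors' published training script:

```python
# Sketch (assumption, not the authors' actual script) of the prep step described
# in the added note: load the TPU-trained MobileBERT checkpoint with dropout
# disabled, then continue masked-language-model pretraining for one epoch on
# BookCorpus and English Wikipedia before finetuning on SQuADv1.
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

config = AutoConfig.from_pretrained("google/mobilebert-uncased")
config.hidden_dropout_prob = 0.0            # disable hidden-state dropout
config.attention_probs_dropout_prob = 0.0   # disable attention dropout

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = AutoModelForMaskedLM.from_pretrained(
    "google/mobilebert-uncased", config=config
)

# One epoch of MLM pretraining on BookCorpus + English Wikipedia would follow
# here (e.g. with transformers' Trainer and DataCollatorForLanguageModeling),
# and only afterwards the usual SQuADv1 finetuning.
```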