This model is a fine-tuned version of the mobilebert-uncased model on the SQuADv1 task. The original model was trained on TPUs; to make it stable when used in PyTorch on GPUs, it was additionally pretrained for one epoch on BookCorpus and English Wikipedia with dropout disabled before being fine-tuned on SQuADv1.

This model was produced as part of the work on the paper [The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models](https://arxiv.org/abs/2203.07259).
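As a SQuADv1 question-answering checkpoint, the model can presumably be queried through the `question-answering` pipeline of the Hugging Face `transformers` library. The sketch below assumes the checkpoint is stored in `transformers` format. `MODEL_ID` is a hypothetical placeholder, not the real identifier, and `answer_question` is a helper name introduced here for illustration only.

```python
from typing import Any, Dict

# Hypothetical placeholder: replace with the actual Hub id or local path
# of this checkpoint. The real identifier is not stated in this card.
MODEL_ID = "path/to/this-model"


def answer_question(question: str, context: str, model_id: str = MODEL_ID) -> Dict[str, Any]:
    """Extract an answer span from `context` for `question`.

    Assumes the checkpoint loads with the standard question-answering pipeline.
    """
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    qa = pipeline("question-answering", model=model_id)
    # The pipeline returns a dict with the keys "score", "start", "end", "answer".
    return qa(question=question, context=context)


# Example call (needs the real model id and the transformers package):
# result = answer_question(
#     question="Where is the Eiffel Tower?",
#     context="The Eiffel Tower is a landmark in Paris, France.",
#     model_id="<actual model id>",
# )
# print(result["answer"], result["score"])
```

For repeated queries it would be cheaper to build the pipeline once and reuse it; the helper rebuilds it on every call only to keep the sketch short.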

- EM = 83.96
- F1 = 90.90
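For readers unfamiliar with the two metrics, the following is a minimal sketch of how exact match (EM) and token-level F1 are conventionally computed for SQuADv1. It is modeled on the normalization used by the standard SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and the articles "a", "an", "the"). It is not necessarily the exact code used to obtain the numbers above, and the `evaluate` helper and its input format are assumptions made for illustration.

```python
import re
import string
from collections import Counter
from typing import Dict, List, Tuple


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(reference)


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(examples: List[Tuple[str, List[str]]]) -> Dict[str, float]:
    """Average EM and F1 (in percent) over (prediction, references) pairs.

    Each example is scored against its best-matching reference answer.
    """
    em_total = 0.0
    f1_total = 0.0
    for prediction, references in examples:
        em_total += max(float(exact_match(prediction, ref)) for ref in references)
        f1_total += max(f1_score(prediction, ref) for ref in references)
    n = len(examples)
    return {"em": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}
```

Under this definition, EM gives credit only when the whole normalized answer matches a reference, while F1 gives partial credit for overlapping tokens, which is why F1 is the higher of the two scores.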


If you find the model useful, please consider citing our work.

## Citation info

    @article{kurtic2022optimal,
      title={The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models},
      author={Kurtic, Eldar and Campos, Daniel and Nguyen, Tuan and Frantar, Elias and Kurtz, Mark and Fineran, Benjamin and Goin, Michael and Alistarh, Dan},
      journal={arXiv preprint arXiv:2203.07259},
      year={2022}
    }