albertZero is a PyTorch model with a prediction head fine-tuned for SQuAD 2.0.
Based on Hugging Face's albert-base-v2, albertZero employs a novel method to speed up fine-tuning: it re-initializes the weights of the final linear layer in the shared ALBERT transformer block, yielding roughly a 2 percentage point improvement during the early epochs of fine-tuning.
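If you want to apply the same trick to a fresh albert-base-v2 checkpoint, the re-initialization can be sketched as below. Treating `ffn_output` as the "final linear layer" of the shared block is an assumption made for illustration; the released training code is authoritative.

```python
import torch
from transformers import AlbertModel

model = AlbertModel.from_pretrained('albert-base-v2')

# ALBERT shares a single transformer layer group across all depths,
# so there is only one set of layer weights to reset.
shared_layer = model.encoder.albert_layer_groups[0].albert_layers[0]

# Re-initialize the last linear layer of the shared block
# (assumed here to be the feed-forward output projection, ffn_output).
torch.nn.init.normal_(shared_layer.ffn_output.weight, mean=0.0,
                      std=model.config.initializer_range)
torch.nn.init.zeros_(shared_layer.ffn_output.bias)
```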
albertZero can be loaded like this:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('MarshallHo/albertZero-squad2-base-v2')
model = AutoModel.from_pretrained('MarshallHo/albertZero-squad2-base-v2')
```
Alternatively, the fine-tuned weights can be loaded into the custom prediction-head class:

```python
import torch
from transformers import AlbertModel, AlbertTokenizer, AlbertForQuestionAnswering, AlbertPreTrainedModel

mytokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForQuestionAnsweringAVPool.from_pretrained('albert-base-v2')
model.load_state_dict(torch.load('albertZero-squad2-base-v2.bin'))
```
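The snippet above assumes `AlbertForQuestionAnsweringAVPool` is defined in your own code; it is not part of the transformers library. A minimal sketch of such a class, a SQuAD 2.0-style head producing start/end logits plus an answer-verification score pooled from the first token, might look like this (the exact head in the released checkpoint may differ):

```python
import torch
import torch.nn as nn
from transformers import AlbertModel, AlbertPreTrainedModel

class AlbertForQuestionAnsweringAVPool(AlbertPreTrainedModel):
    """Hypothetical sketch of an ALBERT QA head with an answer-verification branch."""

    def __init__(self, config):
        super().__init__(config)
        self.albert = AlbertModel(config)
        self.qa_outputs = nn.Linear(config.hidden_size, 2)  # start/end logits per token
        self.has_ans = nn.Linear(config.hidden_size, 2)     # answerable vs. unanswerable
        self.init_weights()

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None):
        outputs = self.albert(input_ids,
                              attention_mask=attention_mask,
                              token_type_ids=token_type_ids)
        sequence_output = outputs[0]                  # (batch, seq_len, hidden)

        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        # Pool the first ([CLS]) token to score answerability.
        has_answer_logits = self.has_ans(sequence_output[:, 0, :])
        return start_logits, end_logits, has_answer_logits
```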
The goal of ALBERT is to reduce the memory requirement of the groundbreaking language model BERT while providing a similar level of performance. ALBERT mainly uses two methods to reduce the number of parameters: cross-layer parameter sharing and factorized embedding parameterization.
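To see why the factorized embedding matters, compare the parameter count of a BERT-style V × H embedding matrix with ALBERT's V × E embedding plus E × H projection, using approximate albert-base-v2 sizes:

```python
# Approximate albert-base-v2 sizes: vocab 30,000, hidden 768, embedding 128
vocab_size, hidden_size, embedding_size = 30000, 768, 128

bert_style = vocab_size * hidden_size                                       # V x H
albert_style = vocab_size * embedding_size + embedding_size * hidden_size   # V x E + E x H

print(f"V x H embedding:           {bert_style:,} parameters")    # 23,040,000
print(f"factorized (V x E, E x H): {albert_style:,} parameters")  # 3,938,304
```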
The field of NLP has undergone major improvements in recent years. The replacement of recurrent architectures by attention-based models has allowed NLP tasks such as question answering to approach human-level performance. To push the limits further, the SQuAD 2.0 dataset was created in 2018 with 50,000 additional unanswerable questions, addressing a major weakness of the original version of the dataset.
At the time of writing, Shanghai Jiao Tong University's Retro-Reader sits near the top of the SQuAD 2.0 leaderboard. We have re-implemented their non-ensemble ALBERT model with the SQuAD 2.0 prediction head.
Thanks to the generosity of the team at Hugging Face and all the groups referenced above!