MarshallHo
/

albertZero-squad2-base-v2

Model card Files Files and versions Community

albertZero-squad2-base-v2 / README.md

MarshallHo's picture

updated usage

f7e0c35 about 3 years ago

|

raw history blame contribute delete

No virus

2.12 kB

	# albertZero

	albertZero is a PyTorch model with a prediction head fine-tuned for SQuAD 2.0.

	Based on Hugging Face's albert-base-v2, albertZero employs a novel method to speed up fine-tuning. It re-initializes weights of final linear layer in the shared albert transformer block, resulting in a 2% point improvement during the early epochs of fine-tuning.



	## Usage

	albertZero can be loaded like this:

	```python
	tokenizer = AutoTokenizer.from_pretrained('MarshallHo/albertZero-squad2-base-v2')
	model = AutoModel.from_pretrained('MarshallHo/albertZero-squad2-base-v2')

	```

	or

	```python
	from transformers import AlbertModel, AlbertTokenizer, AlbertForQuestionAnswering, AlbertPreTrainedModel

	mytokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
	model = AlbertForQuestionAnsweringAVPool.from_pretrained('albert-base-v2')
	model.load_state_dict(torch.load('albertZero-squad2-base-v2.bin'))
	```


	## References

	The goal of [ALBERT](https://arxiv.org/abs/1909.11942) is to reduce the memory requirement of the groundbreaking
	language model [BERT](https://arxiv.org/abs/1810.04805), while providing a similar level of performance. ALBERT mainly uses 2 methods to reduce the number of parameters – parameter sharing and factorized embedding.

	The field of NLP has undergone major improvements in recent years. The
	replacement of recurrent architectures by attention-based models has allowed NLP tasks such as
	question-answering to approach human level performance. In order to push the limits further, the
	[SQuAD2.0](https://arxiv.org/abs/1806.03822) dataset was created in 2018 with 50,000 additional unanswerable questions, addressing a major weakness of the original version of the dataset.

	At the time of writing, near the top of the [SQuAD2.0 leaderboard](https://rajpurkar.github.io/SQuAD-explorer/) is Shanghai Jiao Tong University’s [Retro-Reader](http://arxiv.org/abs/2001.09694).
	We have re-implemented their non-ensemble ALBERT model with the SQUAD2.0 prediction head.



	## Acknowledgments

	Thanks to the generosity of the team at Hugging Face and all the groups referenced above !