Weight Initialization Warning Using AutoModelForQuestionAnswering and BigBirdForQuestionAnswering with "vasudevgupta/bigbird-roberta-natural-questions"

#1 by jstremme - opened

When I attempt to load the QA model with:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
tokenizer = AutoTokenizer.from_pretrained("vasudevgupta/bigbird-roberta-natural-questions")
model = AutoModelForQuestionAnswering.from_pretrained("vasudevgupta/bigbird-roberta-natural-questions")

I get:

Some weights of the model checkpoint at vasudevgupta/bigbird-roberta-natural-questions were not used when initializing BigBirdForQuestionAnswering: ['bert.pooler.weight', 'cls.weight', 'bert.pooler.bias', 'cls.bias']
- This IS expected if you are initializing BigBirdForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BigBirdForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Does anyone know if this is expected behavior? I get the same warning when loading with BigBirdForQuestionAnswering directly. I would expect the output layers to match the checkpoint exactly when initializing the model this way. If a change is needed to use this checkpoint, would someone be willing to share it? Many thanks!!
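
For what it's worth, the affected weights can be listed explicitly with from_pretrained's output_loading_info flag (a minimal sketch):

import torch
from transformers import AutoModelForQuestionAnswering

# output_loading_info=True returns a dict alongside the model describing which
# checkpoint keys were dropped ("unexpected") and which were freshly initialized ("missing").
model, loading_info = AutoModelForQuestionAnswering.from_pretrained(
    "vasudevgupta/bigbird-roberta-natural-questions",
    output_loading_info=True,
)
print(loading_info["unexpected_keys"])  # the four weights from the warning above
print(loading_info["missing_keys"])     # anything randomly initialized instead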

You should use this class: https://github.com/thevasudevgupta/bigbird/blob/2507c88ee850ec97c67c709f6be39c72075827c7/src/train_nq_torch.py#L64 if you want to run inference directly.

Also, you may want to use this checkpoint instead: https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions. It was trained to convergence and performs better.
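
Roughly, that class is the stock BigBirdForQuestionAnswering plus the pooler and a small answer-type head, which is exactly where the "unused" weights in the warning come from. A minimal sketch (the exact definition is in the linked file; the 5-way head size here is an assumption):

import torch.nn as nn
from transformers import BigBirdForQuestionAnswering

class BigBirdForNaturalQuestions(BigBirdForQuestionAnswering):
    # Sketch of the linked class: the standard span-extraction head, plus the
    # pooler ("bert.pooler.*") and a classifier ("cls.*") over the pooled [CLS]
    # state for the Natural Questions answer-type label.
    def __init__(self, config):
        super().__init__(config, add_pooling_layer=True)  # keeps bert.pooler.*
        self.cls = nn.Linear(config.hidden_size, 5)       # keeps cls.* (assumed 5 answer types)

model = BigBirdForNaturalQuestions.from_pretrained(
    "vasudevgupta/bigbird-roberta-natural-questions"
)

Loading through this subclass consumes all four checkpoint keys, so the warning goes away.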

Thanks @vasudevgupta, this is very helpful! As I understand it, there's a difference between the NQ and SQuAD QA formats: this model uses the NQ format, which requires the NQ QA class referenced above, whereas the Hugging Face AutoModelForQuestionAnswering class is designed for SQuAD-formatted inputs.
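
For anyone who lands here later, plain span extraction with the class sketched above works like any other Hugging Face QA model (a hypothetical example; the full NQ pipeline in the linked repo also uses the answer-type head):

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vasudevgupta/bigbird-roberta-natural-questions")
question = "Who wrote Hamlet?"  # placeholder inputs for illustration
context = "Hamlet is a tragedy written by William Shakespeare around 1600."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # `model` is the BigBirdForNaturalQuestions instance from above

# Greedy decoding of the most likely answer span from the start/end logits.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))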

jstremme changed discussion status to closed
