Weight Initialization Warning Using AutoModelForQuestionAnswering and BigBirdForQuestionAnswering with "vasudevgupta/bigbird-roberta-natural-questions"

#1 by jstremme - opened

When I attempt to load the QA model with:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
tokenizer = AutoTokenizer.from_pretrained("vasudevgupta/bigbird-roberta-natural-questions")
model = AutoModelForQuestionAnswering.from_pretrained("vasudevgupta/bigbird-roberta-natural-questions")

I get:

Some weights of the model checkpoint at vasudevgupta/bigbird-roberta-natural-questions were not used when initializing BigBirdForQuestionAnswering: ['bert.pooler.weight', 'cls.weight', 'bert.pooler.bias', 'cls.bias']
- This IS expected if you are initializing BigBirdForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BigBirdForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Does anyone know if this is expected behavior? I get the same warning when loading with BigBirdForQuestionAnswering directly. I would expect the output layers to match the checkpoint exactly when initializing the model this way. If a change is needed to use this checkpoint, would someone be willing to share it? Many thanks!!
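
For what it's worth, the affected weights can be listed explicitly with from_pretrained's output_loading_info flag (a minimal sketch):

import torch
from transformers import AutoModelForQuestionAnswering

# output_loading_info=True returns a dict alongside the model describing which
# checkpoint keys were dropped ("unexpected") and which were freshly initialized ("missing").
model, loading_info = AutoModelForQuestionAnswering.from_pretrained(
    "vasudevgupta/bigbird-roberta-natural-questions",
    output_loading_info=True,
)
print(loading_info["unexpected_keys"])  # the four weights from the warning above
print(loading_info["missing_keys"])     # anything randomly initialized instead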

You should use this class: https://github.com/thevasudevgupta/bigbird/blob/2507c88ee850ec97c67c709f6be39c72075827c7/src/train_nq_torch.py#L64 if you want to run inference directly.

Also, you may want to use this checkpoint instead: https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions. It was trained to convergence and performs better.
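
Roughly, that class is the stock BigBirdForQuestionAnswering plus the pooler and a small answer-type head, which is exactly where the "unused" weights in the warning come from. A minimal sketch (the exact definition is in the linked file; the 5-way head size here is an assumption):

import torch.nn as nn
from transformers import BigBirdForQuestionAnswering

class BigBirdForNaturalQuestions(BigBirdForQuestionAnswering):
    # Sketch of the linked class: the standard span-extraction head, plus the
    # pooler ("bert.pooler.*") and a classifier ("cls.*") over the pooled [CLS]
    # state for the Natural Questions answer-type label.
    def __init__(self, config):
        super().__init__(config, add_pooling_layer=True)  # keeps bert.pooler.*
        self.cls = nn.Linear(config.hidden_size, 5)       # keeps cls.* (assumed 5 answer types)

model = BigBirdForNaturalQuestions.from_pretrained(
    "vasudevgupta/bigbird-roberta-natural-questions"
)

Loading through this subclass consumes all four checkpoint keys, so the warning goes away.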

Thanks @vasudevgupta, this is very helpful! As I understand it, there's a difference between the NQ and SQuAD QA formats: this model uses the NQ format, which requires the NQ QA class referenced above, whereas the Hugging Face AutoModelForQuestionAnswering class is designed for SQuAD-formatted inputs.
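
For anyone who lands here later, plain span extraction with the class sketched above works like any other Hugging Face QA model (a hypothetical example; the full NQ pipeline in the linked repo also uses the answer-type head):

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vasudevgupta/bigbird-roberta-natural-questions")
question = "Who wrote Hamlet?"  # placeholder inputs for illustration
context = "Hamlet is a tragedy written by William Shakespeare around 1600."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # `model` is the BigBirdForNaturalQuestions instance from above

# Greedy decoding of the most likely answer span from the start/end logits.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))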

jstremme changed discussion status to closed
