Model Weights Not Initialized

by aashutoshb97 - opened Sep 29, 2023

Sep 29, 2023

Hi,
I am trying to run cdsBERT using the provided code. When loading the model on either CPU or GPU, I get warnings that some weights were not loaded from the checkpoint. Is this normal behavior? I also get an AttributeError (see below).

2023-09-25 16:40:21.633712: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-25 16:40:40.924721: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Some weights of BertForMaskedLM were not initialized from the model checkpoint at lhallee/cdsBERT and are newly initialized: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "../scripts/test.py", line 52, in
matrix_embedding = model(**example).last_hidden_state.cpu()
AttributeError: 'MaskedLMOutput' object has no attribute 'last_hidden_state'

lhallee

Gleghorn Lab org Sep 29, 2023


edited Sep 29, 2023

Hello! This is the feature-extraction checkpoint, so use BertModel instead of BertForMaskedLM. I updated the documentation and uploaded the MLM checkpoint. Please see our preprint and/or the model cards for the differences between the checkpoints. Let me know if there are any other issues!
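The AttributeError in the first post comes from the output type: BertForMaskedLM returns a MaskedLMOutput (vocabulary logits, no last_hidden_state), while BertModel returns per-token embeddings in last_hidden_state. Below is a minimal sketch, assuming torch and transformers are installed. It uses a tiny, randomly initialized BertConfig with hypothetical sizes (not cdsBERT's) and made-up token ids, so it runs offline; with the real checkpoint you would load BertModel.from_pretrained("lhallee/cdsBERT") together with its tokenizer instead.

```python
import torch
from transformers import BertConfig, BertForMaskedLM, BertModel

# Hypothetical tiny config so the sketch runs without downloading weights.
# With the real checkpoint: BertModel.from_pretrained("lhallee/cdsBERT").
config = BertConfig(
    vocab_size=32,
    hidden_size=16,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=32,
)
input_ids = torch.tensor([[2, 5, 7, 9, 3]])  # hypothetical token ids
attention_mask = torch.ones_like(input_ids)

mlm = BertForMaskedLM(config).eval()  # encoder + masked-LM head
encoder = BertModel(config).eval()    # bare encoder (feature extraction)

with torch.no_grad():
    mlm_out = mlm(input_ids=input_ids, attention_mask=attention_mask)
    enc_out = encoder(input_ids=input_ids, attention_mask=attention_mask)

# MaskedLMOutput carries vocabulary logits; accessing
# mlm_out.last_hidden_state raises the AttributeError from the first post.
print(type(mlm_out).__name__, mlm_out.logits.shape)
# BertModel exposes per-token embeddings as last_hidden_state.
print(type(enc_out).__name__, enc_out.last_hidden_state.shape)
```

The logits have one score per vocabulary entry (batch, tokens, vocab_size), whereas last_hidden_state has one embedding per token (batch, tokens, hidden_size), which is what a feature-extraction workflow needs.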

aashutoshb97

Sep 29, 2023

Hi,
Thank you for the suggestion. I switched from BertForMaskedLM to BertModel and can get the features now. However, I still get the following warning:

To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-29 11:25:37.062213: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Some weights of BertModel were not initialized from the model checkpoint at lhallee/cdsBERT and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I would be grateful if you could let me know whether this is normal.

lhallee

Gleghorn Lab org Sep 29, 2023

Yes, this is normal for some BERT models. The missing weights belong to the pooler, which produces pooler_output: a vector computed from the [CLS] token that you can train (instead of using last_hidden_state) when fine-tuning for other tasks. However, if you use pooler_output without training it, its weights are randomly initialized. If you only use last_hidden_state, this warning does not cause any problems.
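To make the pooler point concrete, here is a minimal sketch, assuming torch and transformers are installed and again using a hypothetical tiny random BertConfig and made-up token ids rather than the real cdsBERT weights. It checks that pooler_output is a dense layer plus tanh applied to the first ([CLS]) token, i.e. exactly the bert.pooler.dense.* weights named in the warning, and shows two common ways to build a sequence vector from last_hidden_state alone (raw [CLS] vector and masked mean pooling). The pooling choices are illustrative options, not necessarily what the cdsBERT authors use.

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny config so the sketch runs offline; the real model would be
# loaded with BertModel.from_pretrained("lhallee/cdsBERT").
config = BertConfig(
    vocab_size=32,
    hidden_size=16,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=32,
)
torch.manual_seed(0)
model = BertModel(config).eval()

# Hypothetical ids; 0 is BertConfig's default pad_token_id.
input_ids = torch.tensor([[2, 5, 7, 9, 3, 0, 0]])
attention_mask = (input_ids != config.pad_token_id).long()

with torch.no_grad():
    out = model(input_ids=input_ids, attention_mask=attention_mask)
    # Recompute the pooler by hand: dense layer + tanh on the first token.
    # These dense weights are what the warning reports as newly initialized.
    manual_pooled = torch.tanh(model.pooler.dense(out.last_hidden_state[:, 0]))

# Options that depend only on last_hidden_state (no untrained pooler):
cls_embedding = out.last_hidden_state[:, 0]          # raw [CLS] vector
mask = attention_mask.unsqueeze(-1).float()          # (batch, tokens, 1)
mean_embedding = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(torch.allclose(manual_pooled, out.pooler_output, atol=1e-6))
print(cls_embedding.shape, mean_embedding.shape)
```

If pooler_output is never needed, constructing the model with add_pooling_layer=False (from_pretrained forwards this keyword to the BertModel constructor) should drop the pooler and, with it, this particular warning.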

lhallee changed discussion status to closed Sep 29, 2023
