WARN: You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference

#10
by greenpau - opened

I am loading the model from cache. When I do that, the model does not load files in 2_Dense directory (which are 768d vectors) and it load from root which is 1024d vectors.

Why is this message showing up? How should I address the issue?

2023-06-23 11:56:07,939 product-manual-indexer [DEBUG] loading AutoTokenizer from /app/data/cache/instructor-large/hkunlp_instructor-large
2023-06-23 11:56:07,984 product-manual-indexer [DEBUG] loading AutoModel from /app/data/cache/instructor-large/hkunlp_instructor-large
Some weights of T5Model were not initialized from the model checkpoint at /app/data/cache/instructor-large/hkunlp_instructor-large and are newly initialized: ['decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.12.layer.2.DenseReluDense.wi.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.2.DenseReluDense.wi.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.2.DenseReluDense.wi.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.21.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.23.layer.2.DenseReluDense.wi.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.18.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.15.layer.2.DenseReluDense.wi.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.19.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.20.layer.2.DenseReluDense.wi.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.final_layer_norm.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.2.DenseReluDense.wi.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.5.layer.2.DenseReluDense.wi.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
NLP Group of The University of Hong Kong org

Hi, Thanks a lot for your interests in the INSTRUCTOR model!

As the INSTRUCTOR is only an encoder model, all the weights of the decoder will not be used.

NLP Group of The University of Hong Kong org

Please re-open the discussion if you have any questions or comments!

multi-train changed discussion status to closed

Sign up or log in to comment