How much GPU VRAM needed for finetuning?

#8
by nlpdev3 - opened

I saw that the source code provides instructions for training the model. How much VRAM is needed for finetuning on customized data?

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your interest in the INSTRUCTOR model!

We performed all of our training on GPUs with 48GB of memory. The smaller models may require less.
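As a rough back-of-envelope check, you can count the parameters and estimate training memory yourself. A minimal sketch, assuming fp32 weights and a plain Adam optimizer (activation memory comes on top of this):

from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-large')
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
# fp32 weights take ~4 bytes/param; training adds gradients (4 bytes)
# and Adam's two moment buffers (8 bytes), so roughly 16 bytes/param
print(f"~{n_params * 16 / 1e9:.1f} GB for weights + grads + optimizer state")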

Feel free to add any further questions or comments!

Thanks for the reply. @multi-train, is it necessary to have one GPU with at least 48GB, or is it fine to use multiple devices (total VRAM > 48GB)?

NLP Group of The University of Hong Kong org

Hi, it is possible to finetune a smaller model on a GPU with less memory, e.g., 42GB. In addition, you may reduce the batch size and the maximum sequence length to save memory. With multiple devices, you may parallelize the training process.
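For example, something along these lines. This is only a sketch: --max_source_length is a flag the repo's train.py accepts, but --per_device_train_batch_size is the generic Hugging Face Trainer flag and torchrun is PyTorch's distributed launcher, so check your local train.py before relying on them:

# smaller batch and shorter sequences to fit in less memory
python train.py --model_name_or_path hkunlp/instructor-base --per_device_train_batch_size 4 --max_source_length 256 --output_dir ./models
# data-parallel training across two GPUs (works if the script uses the HF Trainer)
torchrun --nproc_per_node=2 train.py --model_name_or_path hkunlp/instructor-large --output_dir ./models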

Feel free to add any further questions or comments!

I got the training working by finetuning instructor-large. When I load the locally trained model, I get this:

  • This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of T5EncoderModel were not initialized from the model checkpoint at /home/ssm-user/.cache/torch/sentence_transformers/hkunlp_instructor-large/ and are newly initialized: ['encoder.block.18.layer.1.DenseReluDense.wo.weight', 'encoder.block.21.layer.1.DenseReluDense.wo.weight', 'encoder.block.18.layer.0.SelfAttention.v.weight', 'encoder.block.7.layer.0.SelfAttention.k.weight', 'encoder.block.9.layer.0.layer_norm.weight', 'encoder.block.12.layer.1.layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wi.weight', 'encoder.block.2.layer.0.SelfAttention.k.weight', 'encoder.block.6.layer.0.SelfAttention.k.weight', 'encoder.block.9.layer.0.SelfAttention.o.weight', 'encoder.block.5.layer.0.layer_norm.weight', 'encoder.block.15.layer.0.SelfAttention.k.weight', 'encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'encoder.block.20.layer.1.DenseReluDense.wi.weight', 'encoder.block.19.layer.1.DenseReluDense.wo.weight', 'encoder.block.10.layer.1.DenseReluDense.wo.weight', 'encoder.block.16.layer.0.SelfAttention.q.weight', 'encoder.block.23.layer.0.SelfAttention.o.weight', 'encoder.block.10.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.0.SelfAttention.o.weight', 'encoder.block.11.layer.0.SelfAttention.v.weight', 'encoder.block.17.layer.1.DenseReluDense.wo.weight', 'encoder.block.20.layer.0.SelfAttention.k.weight', 'encoder.block.5.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.1.layer_norm.weight', 'encoder.block.12.layer.0.SelfAttention.q.weight', 'encoder.block.20.layer.0.SelfAttention.v.weight', 'encoder.block.18.layer.0.layer_norm.weight', 'encoder.block.11.layer.0.SelfAttention.k.weight', 'encoder.block.2.layer.0.layer_norm.weight', 'encoder.block.11.layer.1.DenseReluDense.wo.weight', 'encoder.block.1.layer.0.SelfAttention.q.weight', 'encoder.block.16.layer.0.SelfAttention.k.weight', 'encoder.block.17.layer.0.SelfAttention.k.weight', 'encoder.block.21.layer.0.SelfAttention.v.weight', 'encoder.block.12.layer.0.SelfAttention.o.weight', 'encoder.block.16.layer.0.SelfAttention.v.weight', 'encoder.block.21.layer.1.DenseReluDense.wi.weight', 'encoder.block.1.layer.0.SelfAttention.k.weight', 'encoder.block.6.layer.1.layer_norm.weight', 'encoder.block.6.layer.1.DenseReluDense.wi.weight', 'encoder.block.2.layer.0.SelfAttention.q.weight', 'encoder.block.12.layer.0.SelfAttention.k.weight', 'encoder.block.20.layer.0.SelfAttention.o.weight', 'encoder.block.4.layer.0.SelfAttention.q.weight', 'encoder.block.19.layer.0.SelfAttention.q.weight', 'encoder.block.21.layer.0.layer_norm.weight', 'encoder.block.8.layer.0.SelfAttention.q.weight', 'encoder.block.12.layer.1.DenseReluDense.wo.weight', 'encoder.block.17.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.1.DenseReluDense.wo.weight', 'encoder.block.13.layer.0.SelfAttention.k.weight', 'encoder.block.13.layer.0.SelfAttention.q.weight', 'encoder.block.2.layer.0.SelfAttention.o.weight', 'encoder.final_layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.o.weight', 'encoder.block.3.layer.0.layer_norm.weight', 'encoder.block.20.layer.1.DenseReluDense.wo.weight', 'encoder.block.0.layer.0.layer_norm.weight', 'encoder.block.0.layer.1.DenseReluDense.wo.weight', 'encoder.block.6.layer.0.layer_norm.weight', 'encoder.block.7.layer.0.SelfAttention.q.weight', 'encoder.block.21.layer.0.SelfAttention.o.weight', 'encoder.block.4.layer.1.DenseReluDense.wi.weight', 'encoder.block.13.layer.0.layer_norm.weight', 'encoder.block.13.layer.1.DenseReluDense.wi.weight', 
'encoder.block.23.layer.1.DenseReluDense.wo.weight', 'encoder.block.17.layer.0.SelfAttention.v.weight', 'encoder.block.17.layer.1.layer_norm.weight', 'encoder.block.8.layer.1.layer_norm.weight', 'encoder.block.10.layer.1.DenseReluDense.wi.weight', 'encoder.block.13.layer.0.SelfAttention.o.weight', 'encoder.block.21.layer.1.layer_norm.weight', 'encoder.block.9.layer.0.SelfAttention.v.weight', 'encoder.block.15.layer.1.DenseReluDense.wo.weight', 'encoder.block.19.layer.0.layer_norm.weight', 'encoder.block.23.layer.0.SelfAttention.v.weight', 'encoder.block.14.layer.1.layer_norm.weight', 'encoder.block.22.layer.1.DenseReluDense.wi.weight', 'encoder.block.22.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.0.SelfAttention.k.weight', 'encoder.block.3.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.1.layer_norm.weight', 'encoder.block.19.layer.0.SelfAttention.v.weight', 'encoder.block.13.layer.1.DenseReluDense.wo.weight', 'encoder.block.8.layer.1.DenseReluDense.wi.weight', 'encoder.block.12.layer.0.layer_norm.weight', 'encoder.block.14.layer.1.DenseReluDense.wi.weight', 'encoder.block.5.layer.0.SelfAttention.k.weight', 'encoder.block.4.layer.0.layer_norm.weight', 'encoder.block.3.layer.0.SelfAttention.o.weight', 'encoder.block.14.layer.0.layer_norm.weight', 'encoder.block.20.layer.1.layer_norm.weight', 'encoder.block.11.layer.0.layer_norm.weight', 'encoder.block.2.layer.1.layer_norm.weight', 'encoder.block.10.layer.0.SelfAttention.v.weight', 'encoder.block.16.layer.0.SelfAttention.o.weight', 'encoder.block.0.layer.0.SelfAttention.k.weight', 'encoder.block.12.layer.0.SelfAttention.v.weight', 'encoder.block.21.layer.0.SelfAttention.k.weight', 'encoder.block.22.layer.1.DenseReluDense.wo.weight', 'encoder.block.23.layer.1.layer_norm.weight', 'encoder.block.0.layer.0.SelfAttention.q.weight', 'encoder.block.11.layer.0.SelfAttention.o.weight', 'encoder.block.1.layer.0.layer_norm.weight', 'encoder.block.6.layer.1.DenseReluDense.wo.weight', 'encoder.block.15.layer.0.layer_norm.weight', 'encoder.block.8.layer.0.layer_norm.weight', 'encoder.block.10.layer.0.SelfAttention.k.weight', 'encoder.block.14.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.0.SelfAttention.v.weight', 'encoder.block.16.layer.0.layer_norm.weight', 'encoder.block.16.layer.1.DenseReluDense.wo.weight', 'encoder.block.18.layer.0.SelfAttention.k.weight', 'encoder.block.19.layer.0.SelfAttention.o.weight', 'encoder.block.19.layer.1.DenseReluDense.wi.weight', 'encoder.block.2.layer.1.DenseReluDense.wi.weight', 'encoder.block.16.layer.1.DenseReluDense.wi.weight', 'encoder.block.1.layer.0.SelfAttention.v.weight', 'encoder.block.11.layer.1.layer_norm.weight', 'encoder.block.14.layer.1.DenseReluDense.wo.weight', 'encoder.block.15.layer.0.SelfAttention.v.weight', 'encoder.block.7.layer.0.layer_norm.weight', 'encoder.block.14.layer.0.SelfAttention.o.weight', 'encoder.block.22.layer.0.SelfAttention.k.weight', 'encoder.block.15.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.1.DenseReluDense.wo.weight', 'encoder.block.8.layer.0.SelfAttention.k.weight', 'encoder.block.13.layer.1.layer_norm.weight', 'encoder.block.18.layer.0.SelfAttention.o.weight', 'encoder.block.23.layer.0.SelfAttention.q.weight', 'encoder.block.16.layer.1.layer_norm.weight', 'encoder.block.15.layer.1.DenseReluDense.wi.weight', 'encoder.block.23.layer.0.layer_norm.weight', 'encoder.block.15.layer.1.layer_norm.weight', 'encoder.block.20.layer.0.layer_norm.weight', 'encoder.block.3.layer.0.SelfAttention.v.weight', 
'encoder.block.4.layer.0.SelfAttention.k.weight', 'encoder.block.22.layer.1.layer_norm.weight', 'encoder.block.9.layer.1.DenseReluDense.wo.weight', 'encoder.block.7.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.0.SelfAttention.o.weight', 'encoder.block.14.layer.0.SelfAttention.v.weight', 'encoder.block.18.layer.1.DenseReluDense.wi.weight', 'encoder.block.10.layer.0.layer_norm.weight', 'encoder.block.11.layer.0.SelfAttention.q.weight', 'encoder.block.15.layer.0.SelfAttention.o.weight', 'encoder.block.9.layer.0.SelfAttention.q.weight', 'encoder.block.1.layer.1.DenseReluDense.wo.weight', 'encoder.block.5.layer.1.DenseReluDense.wi.weight', 'encoder.block.7.layer.0.SelfAttention.o.weight', 'encoder.block.3.layer.1.layer_norm.weight', 'encoder.block.12.layer.1.DenseReluDense.wi.weight', 'encoder.block.17.layer.0.layer_norm.weight', 'encoder.embed_tokens.weight', 'encoder.block.6.layer.0.SelfAttention.v.weight', 'encoder.block.5.layer.0.SelfAttention.q.weight', 'encoder.block.2.layer.0.SelfAttention.v.weight', 'encoder.block.1.layer.1.layer_norm.weight', 'encoder.block.4.layer.1.layer_norm.weight', 'encoder.block.0.layer.1.layer_norm.weight', 'encoder.block.22.layer.0.SelfAttention.o.weight', 'encoder.block.21.layer.0.SelfAttention.q.weight', 'encoder.block.3.layer.0.SelfAttention.q.weight', 'encoder.block.5.layer.1.DenseReluDense.wo.weight', 'encoder.block.11.layer.1.DenseReluDense.wi.weight', 'encoder.block.14.layer.0.SelfAttention.k.weight', 'encoder.block.17.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.1.DenseReluDense.wo.weight', 'encoder.block.10.layer.0.SelfAttention.o.weight', 'encoder.block.4.layer.0.SelfAttention.v.weight', 'encoder.block.17.layer.0.SelfAttention.o.weight', 'encoder.block.9.layer.1.DenseReluDense.wi.weight', 'encoder.block.22.layer.0.layer_norm.weight', 'encoder.block.7.layer.0.SelfAttention.v.weight', 'encoder.block.19.layer.1.layer_norm.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'encoder.block.10.layer.1.layer_norm.weight', 'encoder.block.3.layer.1.DenseReluDense.wi.weight', 'encoder.block.23.layer.0.SelfAttention.k.weight', 'encoder.block.22.layer.0.SelfAttention.q.weight', 'encoder.block.5.layer.1.layer_norm.weight', 'encoder.block.23.layer.1.DenseReluDense.wi.weight', 'encoder.block.8.layer.0.SelfAttention.o.weight', 'encoder.block.6.layer.0.SelfAttention.q.weight', 'encoder.block.19.layer.0.SelfAttention.k.weight', 'encoder.block.20.layer.0.SelfAttention.q.weight', 'encoder.block.0.layer.0.SelfAttention.v.weight', 'encoder.block.3.layer.1.DenseReluDense.wo.weight', 'shared.weight', 'encoder.block.2.layer.1.DenseReluDense.wo.weight', 'encoder.block.18.layer.0.SelfAttention.q.weight', 'encoder.block.13.layer.0.SelfAttention.v.weight', 'encoder.block.18.layer.1.layer_norm.weight', 'encoder.block.5.layer.0.SelfAttention.o.weight', 'encoder.block.0.layer.0.SelfAttention.o.weight']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
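For anyone debugging this: the warning means the parameter names in the checkpoint did not match what T5EncoderModel expects, so those weights were randomly re-initialized. A quick check is to inspect the key names in the saved file (the ./models path is illustrative):

import torch

# load the raw checkpoint and inspect its parameter names
state_dict = torch.load('./models/pytorch_model.bin', map_location='cpu')
print(list(state_dict.keys())[:5])
# keys that do not start with 'encoder.' (e.g. carrying an extra module
# prefix) will not be matched, and T5EncoderModel re-initializes them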
NLP Group of The University of Hong Kong org

Could I see the code you use to load the model?

I simply replaced the pytorch_model.bin file in the cache directory and used the sample code to load it.

NLP Group of The University of Hong Kong org

Are you referring to this code:

from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')

Right, I just use these two lines to verify.

NLP Group of The University of Hong Kong org

It seems that I can run these two lines without encountering the errors above. Could you try re-downloading the checkpoint?
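A minimal way to force a re-download is to delete the cached copy and load the model again. The cache path below is taken from the warning above; adjust it for your environment:

import shutil
from InstructorEmbedding import INSTRUCTOR

# remove the cached (and locally overwritten) checkpoint
shutil.rmtree('/home/ssm-user/.cache/torch/sentence_transformers/hkunlp_instructor-large/')
# loading again downloads a fresh copy from the Hub
model = INSTRUCTOR('hkunlp/instructor-large')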

This is what I did:
1 - Grab the top 100 samples from medi-data.json
2 - Train with python train.py --model_name_or_path hkunlp/instructor-large --output_dir ./models --cache_dir ./medi-data --max_source_length 512 --num_train_epochs 10 --save_steps 500 --cl_temperature 0.01 --warmup_ratio 0.1 --learning_rate 2e-5 --overwrite_output_dir
3 - Copy ./models/pytorch_model.bin into the cache directory, replacing the file of the same name with my new weights.
4 - Use this code to load the model (a quick sanity check follows the snippet):

from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')
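After loading, a quick sanity check is to embed one instruction/sentence pair (this usage pattern is from the INSTRUCTOR README; the sentence is just an example):

sentence = "3D ActionSLAM: wearable person tracking in multi-story indoor environments"
instruction = "Represent the Science title:"
embeddings = model.encode([[instruction, sentence]])
print(embeddings.shape)  # expect (1, 768) for instructor-large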

Oddly, when I evaluate Billboard with the finetuned model, the Pearson correlation is 0.16, which is worse than the original hkunlp/instructor-large.
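For context, Billboard correlates embedding-based similarity with human quality judgments, so 0.16 means the finetuned embeddings track the human scores poorly. A minimal sketch of such a check, with hypothetical placeholder data and an illustrative instruction string:

import numpy as np
from scipy.stats import pearsonr
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-large')

# hypothetical toy data standing in for generated summaries,
# references, and human quality scores
candidates = ["a generated summary", "another generated summary", "a third one"]
references = ["a reference summary", "another reference", "a third reference"]
human_scores = [0.9, 0.2, 0.6]

instruction = "Represent the News summary:"  # illustrative
cand = model.encode([[instruction, s] for s in candidates])
ref = model.encode([[instruction, s] for s in references])
# cosine similarity between each candidate/reference embedding pair
cos = (cand * ref).sum(axis=1) / (np.linalg.norm(cand, axis=1) * np.linalg.norm(ref, axis=1))
r, _ = pearsonr(cos, human_scores)
print(f"Pearson r = {r:.2f}")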

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your comment!

We have updated both the training file and the InstructorEmbedding package. You may update the package with:

pip install -U InstructorEmbedding

and use the latest training file to finetune the model.
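To confirm the upgrade took effect, a small sanity check (importlib.metadata ships with Python 3.8+):

from importlib.metadata import version
print(version("InstructorEmbedding"))  # should report the latest release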

Feel free to add any further questions or comments!

nlpdev3 changed discussion status to closed
