Model taking around 4GB GPU VRAM

#42
by vybhavnca - opened

Hi Team,
When loading the model with the code snippet below, as well as through the sentence-transformers library, the model takes about 4GB of GPU VRAM. Please see the attached code snippet and the nvidia-smi output for PID 426186. The same thing happens with the rerank model (v1-turbo). Other BERT-based models of similar parameter count take only around 600MB. What could be the reason for this?
I am using the following package versions:

sentence-transformers 2.7.0
transformers 4.40.2

[Attachment: code_snippet.png (model loading code)]

Before loading the model: [Attachment: model_size_without_loading.png (nvidia-smi output)]

After loading the model: [Attachment: model_size.png (nvidia-smi output)]
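For readers who cannot view the attached image, here is a minimal sketch of the presumed loading code. The checkpoint name jinaai/jina-embeddings-v2-base-en is an assumption; the actual model id is only visible in the attachment.

from transformers import AutoModel
from sentence_transformers import SentenceTransformer

# Plain transformers path (the Jina v2 models ship custom modeling code,
# hence trust_remote_code=True)
model = AutoModel.from_pretrained(
    'jinaai/jina-embeddings-v2-base-en', trust_remote_code=True
).to('cuda')

# sentence-transformers path (trust_remote_code is supported since v2.3)
st_model = SentenceTransformer(
    'jinaai/jina-embeddings-v2-base-en', trust_remote_code=True, device='cuda'
)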

Jina AI org

hi @vybhavnca, it highly depends on your document length: let's say your document is around 8k tokens, then the model will consume much more memory while encoding.

If you want to reduce memory usage and do not really care about lengthy documents, do:

...  # (model loading, as above)

model.encode(['lengthy document ...'], max_length=512)

You can tune max_length based on your needs.
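If you load through sentence-transformers instead, the equivalent knob is the standard max_seq_length attribute (generic sentence-transformers API, not specific to this model); a sketch, reusing the st_model from the snippet above:

# cap the sequence length so the encoder never sees more than 512 tokens
st_model.max_seq_length = 512
embeddings = st_model.encode(['lengthy document ...'])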

Hi @bwang0911, this is happening when I load the model itself, as shown in the code, not when I try to encode.

Jina AI org

@vybhavnca sorry, now I understand what you mean. Indeed, at model loading time we create an additional ALiBi tensor with shape (12, 8192, 8192), which takes roughly 3GB of extra VRAM. Add the model layers on top and it should still take slightly less than 4GB to load the model.
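For what it's worth, the arithmetic works out almost exactly if that bias tensor is materialized in fp32 (an assumption; 4 bytes per element):

num_heads, max_seq_len = 12, 8192  # the (12, 8192, 8192) shape mentioned above
alibi_bytes = num_heads * max_seq_len * max_seq_len * 4  # fp32 assumed
print(alibi_bytes / 1024**3)  # exactly 3.0 GiB

That leaves a few hundred MB for the weights themselves, consistent with the ~600MB that similarly sized BERT models use, for a total just under 4GB.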

@bwang0911 , that makes sense. Thank you for the clarification.

vybhavnca changed discussion status to closed
