Model taking around 4GB GPU VRAM

#42
by vybhavnca - opened

Hi Team,
When loading the model with the code snippet below, as well as through the sentence-transformers library, the model takes about 4GB of GPU VRAM. Please see the attached code snippet and the nvidia-smi output for PID 426186. The same thing happens with the rerank model (v1-turbo). Other BERT-based models of similar parameter count take only around 600MB. What could be the reason for this?
I am using the following package versions:

sentence-transformers 2.7.0
transformers 4.40.2

[Attachment: code_snippet.png (model loading code)]

Before loading the model: [Attachment: model_size_without_loading.png (nvidia-smi output)]

After loading the model: [Attachment: model_size.png (nvidia-smi output)]
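For readers who cannot view the attached image, here is a minimal sketch of the presumed loading code. The checkpoint name jinaai/jina-embeddings-v2-base-en is an assumption; the actual model id is only visible in the attachment.

from transformers import AutoModel
from sentence_transformers import SentenceTransformer

# Plain transformers path (the Jina v2 models ship custom modeling code,
# hence trust_remote_code=True)
model = AutoModel.from_pretrained(
    'jinaai/jina-embeddings-v2-base-en', trust_remote_code=True
).to('cuda')

# sentence-transformers path (trust_remote_code is supported since v2.3)
st_model = SentenceTransformer(
    'jinaai/jina-embeddings-v2-base-en', trust_remote_code=True, device='cuda'
)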

Jina AI org

hi @vybhavnca, it highly depends on your document length: let's say your document is around 8k tokens, then the model will consume much more memory while encoding.

If you want to reduce memory usage and do not really care about lengthy documents, do:

...  # (model loading, as above)

model.encode(['lengthy document ...'], max_length=512)

You can tune max_length based on your needs.
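If you load through sentence-transformers instead, the equivalent knob is the standard max_seq_length attribute (generic sentence-transformers API, not specific to this model); a sketch, reusing the st_model from the snippet above:

# cap the sequence length so the encoder never sees more than 512 tokens
st_model.max_seq_length = 512
embeddings = st_model.encode(['lengthy document ...'])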

Hi @bwang0911, this is happening when I load the model itself, as shown in the code, not when I try to encode.

Jina AI org

@vybhavnca sorry, now I understand what you mean. Indeed, at model loading time we create an additional ALiBi tensor with shape (12, 8192, 8192), which takes roughly 3GB of extra VRAM. Add the model layers on top and it should still take slightly less than 4GB to load the model.
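For what it's worth, the arithmetic works out almost exactly if that bias tensor is materialized in fp32 (an assumption; 4 bytes per element):

num_heads, max_seq_len = 12, 8192  # the (12, 8192, 8192) shape mentioned above
alibi_bytes = num_heads * max_seq_len * max_seq_len * 4  # fp32 assumed
print(alibi_bytes / 1024**3)  # exactly 3.0 GiB

That leaves a few hundred MB for the weights themselves, consistent with the ~600MB that similarly sized BERT models use, for a total just under 4GB.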

@bwang0911 , that makes sense. Thank you for the clarification.

vybhavnca changed discussion status to closed
