Possible to implement `_no_split_modules` attribute?

#12
by ronnybehrens - opened

Thank you for this great contribution.

I want to use the model locally or on Google Colab to embed text for later use in a cluster analysis.
However, due to memory restrictions, I can't get it to process even small batch sizes of text on an A100.

Could you implement the `_no_split_modules` attribute to enable `device_map='auto'`, which would help with running inference on less memory? Or could you point me in the right direction for using the model on hardware with less memory?
Thanks!
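
For reference, `_no_split_modules` is the class attribute that tells Accelerate which submodules must stay together on one device, which is what makes `device_map='auto'` work. A minimal sketch of what loading would look like if the attribute were supported is below; the model id is an assumption based on this repository, not something stated in the thread:

```python
# Sketch of loading with device_map='auto', assuming the model's remote code
# defined _no_split_modules. Requires `accelerate` to be installed.
# The model id "nvidia/NV-Embed-v1" is an assumption, not from this thread.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "nvidia/NV-Embed-v1",
    trust_remote_code=True,    # custom modeling code from the repo
    torch_dtype=torch.float16, # halve memory versus fp32
    device_map="auto",         # shard layers across available GPUs/CPU
)
```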

NVIDIA org
edited May 30

Thank you for the suggestion. We have updated the model card to describe this feature, as shown below. Please replace the "get the embedding and normalize embedding" step with the following code.

```python
batch_size = 2  # modify the batch size to fit your memory budget
query_embeddings = model._do_encode(queries, batch_size=batch_size, instruction=query_prefix, max_length=max_length)
passage_embeddings = model._do_encode(passages, batch_size=batch_size, instruction=passage_prefix, max_length=max_length)
```
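
For completeness, a minimal end-to-end sketch of how this batched encoding could be wired up is below. The `_do_encode` call and its arguments come from the snippet above; the model id, instruction prefixes, and example texts are illustrative assumptions, not taken from this thread:

```python
import torch
from transformers import AutoModel

# Model id is an assumption based on this repository's context.
model = AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)
model = model.to("cuda", dtype=torch.float16)

# Placeholder inputs; the prefixes follow the model card's instruction pattern.
query_prefix = "Instruct: Given a question, retrieve passages that answer the question\nQuery: "
passage_prefix = ""
queries = ["How do transformer models work?"]
passages = ["Transformer models process input tokens with self-attention layers."]
max_length = 4096

batch_size = 2  # lower this further if memory is still tight
query_embeddings = model._do_encode(queries, batch_size=batch_size,
                                    instruction=query_prefix, max_length=max_length)
passage_embeddings = model._do_encode(passages, batch_size=batch_size,
                                      instruction=passage_prefix, max_length=max_length)
```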
