Possible to implement `_no_split_modules` attribute?

#12
by ronnybehrens - opened

Thank you for this great contribution.

I want to use the model locally or on Google Colab to embed text for later use in a cluster analysis.
However, due to memory restrictions, I can't get it to process even small batch sizes of text on an A100.

Could you implement the `_no_split_modules` attribute to enable `device_map='auto'`, which would help with running inference on less memory? Or could you point me in the right direction for using the model on hardware with less memory?
Thanks!
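
For reference, `_no_split_modules` is the class attribute that tells Accelerate which submodules must stay together on one device, which is what makes `device_map='auto'` work. A minimal sketch of what loading would look like if the attribute were supported is below; the model id is an assumption based on this repository, not something stated in the thread:

```python
# Sketch of loading with device_map='auto', assuming the model's remote code
# defined _no_split_modules. Requires `accelerate` to be installed.
# The model id "nvidia/NV-Embed-v1" is an assumption, not from this thread.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "nvidia/NV-Embed-v1",
    trust_remote_code=True,    # custom modeling code from the repo
    torch_dtype=torch.float16, # halve memory versus fp32
    device_map="auto",         # shard layers across available GPUs/CPU
)
```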

NVIDIA org
edited May 30

Thank you for the suggestion. We have updated the model card to describe this feature, as shown below. Please replace the "get the embedding and normalize embedding" step with the following code.

```python
batch_size = 2  # modify the batch size to fit your memory budget
query_embeddings = model._do_encode(queries, batch_size=batch_size, instruction=query_prefix, max_length=max_length)
passage_embeddings = model._do_encode(passages, batch_size=batch_size, instruction=passage_prefix, max_length=max_length)
```
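
For completeness, a minimal end-to-end sketch of how this batched encoding could be wired up is below. The `_do_encode` call and its arguments come from the snippet above; the model id, instruction prefixes, and example texts are illustrative assumptions, not taken from this thread:

```python
import torch
from transformers import AutoModel

# Model id is an assumption based on this repository's context.
model = AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)
model = model.to("cuda", dtype=torch.float16)

# Placeholder inputs; the prefixes follow the model card's instruction pattern.
query_prefix = "Instruct: Given a question, retrieve passages that answer the question\nQuery: "
passage_prefix = ""
queries = ["How do transformer models work?"]
passages = ["Transformer models process input tokens with self-attention layers."]
max_length = 4096

batch_size = 2  # lower this further if memory is still tight
query_embeddings = model._do_encode(queries, batch_size=batch_size,
                                    instruction=query_prefix, max_length=max_length)
passage_embeddings = model._do_encode(passages, batch_size=batch_size,
                                      instruction=passage_prefix, max_length=max_length)
```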
