How to embed 5 Million documents?

#22 opened by leonshub

I want to use this model for my research and have 5 million texts to embed, each between 100 and 300 tokens long. What would be the fastest way to do that?
I guess my question boils down to: what would be the best batch_size?

I would suggest using the largest batch size that fits in your GPU memory, to keep the utilization rate as high as possible.
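In practice you can find that batch size empirically by doubling it until a forward pass fails. Here is a minimal sketch of that search; `try_batch` is a hypothetical callback you would supply, which runs one encoding batch and returns `False` if it hits an out-of-memory error:

```python
def find_largest_batch_size(try_batch, start=32, max_bs=4096):
    """Double the batch size until try_batch fails (e.g. CUDA OOM),
    and return the largest size that succeeded."""
    bs = start
    best = None
    while bs <= max_bs:
        if try_batch(bs):
            best = bs
            bs *= 2  # keep doubling while the batch still fits
        else:
            break
    return best
```

For a real model, `try_batch` would wrap one `model.encode(...)` call on dummy inputs of your maximum sequence length inside a `try/except torch.cuda.OutOfMemoryError`, so the probe reflects worst-case memory use.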

Another trick is to sort the documents by length, so that unnecessary padding can be minimized.
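The sorting trick can be sketched in plain Python as follows; `encode_batch` is a hypothetical callable (e.g. a wrapper around your model's encode function), and the key detail is restoring the original document order afterwards. Note that if you use sentence-transformers, `model.encode` already applies this length-based sorting internally:

```python
def encode_sorted_by_length(texts, encode_batch, batch_size=256):
    """Encode texts in batches of similar length to minimize padding,
    then return embeddings in the original input order."""
    # Indices of texts, shortest first, so each batch pads little.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    embeddings = [None] * len(texts)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch_embs = encode_batch([texts[i] for i in idx])
        # Scatter results back to the original positions.
        for i, emb in zip(idx, batch_embs):
            embeddings[i] = emb
    return embeddings
```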

Also do not forget to use fp16/bf16 and flash attention if your GPU supports them.
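A small sketch of the dtype choice: bf16 is generally preferable when the GPU supports it (roughly Ampere and newer), since it has fp32's dynamic range and so overflows less than fp16. The commented-out usage below assumes sentence-transformers' `model_kwargs` pass-through and the transformers `attn_implementation` option; `"model-name"` is a placeholder:

```python
def pick_dtype(supports_bf16: bool) -> str:
    """Prefer bfloat16 when supported: same speed class as float16,
    but with float32's exponent range, so fewer overflow issues."""
    return "bfloat16" if supports_bf16 else "float16"

if __name__ == "__main__":
    # Assumed usage (not run here; requires a GPU and the libraries):
    # import torch
    # from sentence_transformers import SentenceTransformer
    # dtype = getattr(torch, pick_dtype(torch.cuda.is_bf16_supported()))
    # model = SentenceTransformer(
    #     "model-name",  # placeholder for the model on this page
    #     model_kwargs={
    #         "torch_dtype": dtype,
    #         "attn_implementation": "flash_attention_2",
    #     },
    # )
    # embeddings = model.encode(texts, batch_size=512)
    pass
```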
