BGE-RERANKER-LARGE COMPUTATIONAL TIME

#18
by kazmi09 - opened

Hi @Shitao
We currently use the bge-small-en base model to create embeddings for our results, followed by filtering, and then apply the bge-reranker-large model to the top 1000 results. On a single-GPU machine, the reranking step averages 20 to 25 seconds, processing 200 snippets per call. To reduce this, we moved to a 4-GPU environment, which brought the average down to 9 to 10 seconds.
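For context, here is a minimal sketch of the reranking step described above. The batching helper is generic; the scoring call shown in the comment assumes FlagEmbedding's `FlagReranker` and is illustrative of the setup, not a confirmed reproduction of it:

```python
# Micro-batching the top-1000 candidate pairs before scoring keeps GPU memory
# bounded and matches the "200 snippets per call" pattern described above.

def chunked(pairs, batch_size=200):
    """Split a list of (query, passage) pairs into fixed-size batches."""
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]

# Hypothetical scoring loop (requires `pip install FlagEmbedding`):
#   from FlagEmbedding import FlagReranker
#   reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)  # fp16 cuts latency
#   scores = []
#   for batch in chunked(pairs, batch_size=200):
#       scores.extend(reranker.compute_score(batch))

pairs = [("query", f"passage {i}") for i in range(1000)]
batches = chunked(pairs, batch_size=200)
print(len(batches))  # 1000 pairs -> 5 batches of 200
```

Enabling fp16 on the reranker and tuning the batch size to the GPU are the usual first knobs before changing frameworks.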

Can you provide guidance on how to further reduce the reranker computation time to an acceptable range of 0-2 seconds? Are there any settings or configurations we might be missing or doing incorrectly?
Thanks.

Beijing Academy of Artificial Intelligence org

@kazmi09 , you can use one of the latest inference frameworks from the open-source community, e.g., https://github.com/huggingface/text-embeddings-inference
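For reference, a minimal way to try text-embeddings-inference (TEI) for this reranker might look like the following; the image tag and port are illustrative, and `/rerank` is TEI's reranker endpoint:

```shell
# Serve the reranker with TEI (pick an image tag matching your GPU/CUDA setup)
docker run --gpus all -p 8080:80 \
    ghcr.io/huggingface/text-embeddings-inference:1.5 \
    --model-id BAAI/bge-reranker-large

# Score candidate passages against a query via the /rerank endpoint
curl 127.0.0.1:8080/rerank \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"query": "what is deep learning?", "texts": ["passage one", "passage two"]}'
```

TEI batches requests server-side with an optimized runtime, which is where most of the latency reduction over a plain transformers loop comes from.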
