Document length for v2-m3?

#9
by rag-perplexity - opened

Hi, just curious whether the v2-m3 reranker supports long documents, or is that only for 'bge-m3' using the BGEM3FlagModel, where you can specify max_passage_length? Thanks!

Beijing Academy of Artificial Intelligence org

The max length of this model is 8192, so it supports long documents. However, we fine-tuned this model with a max length of 1024, so we recommend setting max_length=1024.
For the reranker model, you should use the FlagReranker class, following https://huggingface.co/BAAI/bge-reranker-v2-m3#using-flagembedding, not BGEM3FlagModel. You can pass max_length= to the compute_score function.
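For anyone landing here, a minimal sketch of what that looks like. The helper names `load_reranker` and `rerank` are made up for illustration and are not part of FlagEmbedding; the only library calls assumed are the `FlagReranker` constructor and `compute_score(..., max_length=...)` mentioned above.

```python
from typing import List, Sequence, Tuple


def load_reranker(model_name: str = "BAAI/bge-reranker-v2-m3"):
    """Hypothetical helper: import lazily so FlagEmbedding is only needed when called."""
    from FlagEmbedding import FlagReranker  # pip install -U FlagEmbedding

    return FlagReranker(model_name, use_fp16=True)


def rerank(
    reranker,
    query: str,
    passages: Sequence[str],
    max_length: int = 1024,  # the value recommended in this thread
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score (query, passage) pairs and sort best-first."""
    pairs = [[query, passage] for passage in passages]
    scores = reranker.compute_score(pairs, max_length=max_length)
    # Some versions return a bare float when there is only one pair.
    if isinstance(scores, (int, float)):
        scores = [scores]
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]


# Usage (downloads the model on first run):
# reranker = load_reranker()
# for passage, score in rerank(reranker, "what is a panda?", ["hi", "The giant panda is a bear."]):
#     print(round(score, 3), passage)
```

Inputs longer than `max_length` tokens are truncated by the tokenizer, so raising the value only matters for pairs that would otherwise be cut off.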

Hi!
How does this parameter influence compute time? For example, what happens to the compute time if the length of one of the two input texts is doubled?
Cheers

A larger max_length only increases compute time when the inputs are actually longer than the previous limit: less text is truncated, so the model processes more tokens.
If you double the length of the input text (and it still fits within max_length), you will need more compute time.
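To put a rough number on "more", here is a back-of-the-envelope FLOP estimate for a Transformer encoder. It is only a sketch: the default sizes (hidden size 1024, 24 layers, 4x feed-forward) are my assumption of an XLM-RoBERTa-large-sized encoder, and it ignores embeddings, softmax, layer norm, padding within a batch, and memory effects, so real wall-clock time can differ. Since the reranker sees the query and passage as one combined sequence, `seq_len` here means the total token count of the pair.

```python
def forward_flops(seq_len: int, hidden: int = 1024, layers: int = 24, ffn_mult: int = 4) -> int:
    """Rough forward-pass FLOPs for one sequence (2 FLOPs per multiply-add)."""
    # Q, K, V and output projections: four (n x d) @ (d x d) matmuls.
    proj = 8 * seq_len * hidden**2
    # Feed-forward block: two matmuls between d and ffn_mult * d.
    ffn = 4 * ffn_mult * seq_len * hidden**2
    # Attention scores (Q @ K^T) and the weighted sum over V: both n x n x d.
    attn = 4 * seq_len**2 * hidden
    return layers * (proj + ffn + attn)


def doubling_ratio(seq_len: int, **kwargs) -> float:
    """How much more compute 2 * seq_len tokens need compared to seq_len."""
    return forward_flops(2 * seq_len, **kwargs) / forward_flops(seq_len, **kwargs)


for n in (256, 512, 1024, 4096):
    print(f"{n:>5} -> {2 * n:>5} tokens: ~{doubling_ratio(n):.2f}x FLOPs")
```

Under this model the cost of doubling always falls between 2x and 4x: close to 2x for short inputs, where the linear layers dominate, and drifting toward 4x as the quadratic attention term takes over at long lengths.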

Hi @Shitao ,
Given that the bge-reranker-v2-m3 model is based on xlm-roberta, which traditionally has a maximum token length of 512, how can the bge-reranker-v2-m3 model support a maximum length of 8192 tokens?

Beijing Academy of Artificial Intelligence org

Hi @r1ck, bge-reranker-v2-m3 is based on BAAI/bge-m3, which has extended the maximum token length of xlm-roberta to 8192.
