Evaluation for finetuned bge-m3 model

#39
by comet24082002 - opened

Hello BAAI team,
I've finetuned the BAAI/bge-m3 model for Vietnamese. But when I evaluated bge-m3 with Sentence Transformers using the Information Retrieval evaluator, I got an error saying it required 128 GB of GPU (CUDA) memory. The model is small, so I don't understand this GPU memory requirement. Could you suggest a solution for this error?
And the second question: if I want to evaluate my finetuned model using the evaluation steps on GitHub, how can I evaluate it on Vietnamese or another language? I've only seen MS MARCO (English) used in the evaluation config.
I am looking forward to hearing from you.


Beijing Academy of Artificial Intelligence org

This may be due to long text inputs combined with a large batch size. The memory cost is high when you encode texts of 8192 tokens with a large batch size. You can set a smaller maximum input length and a smaller batch size.
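A minimal sketch of that advice, assuming the Sentence Transformers API mentioned in the question (`max_seq_length` attribute and the `batch_size` argument of `encode`); the model path and the values 512/16 are illustrative, not recommendations from the BAAI team:

```python
def batched(texts, batch_size):
    """Split texts into fixed-size batches so peak GPU memory stays bounded."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# Usage with Sentence Transformers (not executed here):
#   model = SentenceTransformer("path/to/finetuned-bge-m3")
#   model.max_seq_length = 512          # cap the 8192-token input length
#   for batch in batched(corpus, 16):   # small batch size per encode call
#       embeddings = model.encode(batch)

batches = list(batched(["a", "b", "c", "d", "e"], 2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Note that `model.encode` already batches internally when given `batch_size`, so the helper above mainly illustrates the idea of keeping each forward pass small.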

Regarding evaluation, you need to prepare your dataset following https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py#L187 (eval_data) and https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py#L188 (corpus).
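As a rough sketch of what a Vietnamese setup could look like, the snippet below writes local JSON Lines files that could be pointed at instead of the MS MARCO dataset names. The field names (`query`/`positive` for eval data, `content` for the corpus) and the file names are assumptions for illustration; check them against the fields the linked eval_msmarco.py actually reads:

```python
import json

# Hypothetical Vietnamese eval data: one query with its relevant passages.
eval_rows = [
    {"query": "Thủ đô của Việt Nam là gì?",
     "positive": ["Hà Nội là thủ đô của Việt Nam."]},
]

# Hypothetical corpus: every candidate passage, one JSON object per line.
corpus_rows = [
    {"content": "Hà Nội là thủ đô của Việt Nam."},
    {"content": "Đà Nẵng là một thành phố ven biển."},
]

with open("vi_eval_data.jsonl", "w", encoding="utf-8") as f:
    for row in eval_rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

with open("vi_corpus.jsonl", "w", encoding="utf-8") as f:
    for row in corpus_rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Local files like these can then be loaded in place of the hosted MS MARCO datasets, e.g. via `datasets.load_dataset("json", data_files=...)`.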

How can I plug my own evaluation and corpus data into those scripts in place of the linked MS MARCO datasets?
