Languages

#6 opened by sarangs

Hi. Can you tell me which languages are covered by the model MoritzLaurer/bge-m3-zeroshot-v2.0?

The model is based on XLM-R, so from self-supervised pre-training it has seen the languages listed in this paper: https://arxiv.org/abs/1911.02116
The BAAI/BGE-m3 team then did additional training on the languages mentioned in this paper: https://arxiv.org/pdf/2402.03216
My 0-shot fine-tuning only included English data, so the multilinguality comes from cross-lingual transfer from the preceding training steps.
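
For illustration, this means the model can be used directly on non-English text with English labels; a minimal sketch (the Spanish example text and labels below are made up):

```python
from transformers import pipeline

# Zero-shot classification with the multilingual model discussed above.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/bge-m3-zeroshot-v2.0",
)

text = "El nuevo teléfono tiene una batería que dura dos días."  # illustrative Spanish premise
labels = ["technology", "politics", "sports"]  # English candidate labels

# Cross-lingual transfer from the XLM-R / BGE-m3 training steps lets the model
# score English hypotheses against a Spanish premise.
result = classifier(text, labels, hypothesis_template="This example is about {}.")
print(result["labels"][0], result["scores"][0])
```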

MoritzLaurer changed discussion status to closed

I am wondering whether one should translate the hypothesis template and the labels into the language of the premises (e.g., translate the hypothesis to Spanish to use with Spanish premises), or whether it makes more sense to keep them in English, since the 0-shot fine-tuning was done in English. I am asking because changing the hypothesis template has a very volatile impact on performance: seemingly small changes that do not alter the meaning of the hypothesis can lead to large changes in the output.

As a second question, I am wondering how much the template actually helps, and whether it might make more sense to simply feed the raw label to the model as the hypothesis (i.e., the hypothesis template would simply be template="{}"). Based on very crude example evaluations, this seems to work as well as a basic template ("This example is about {}") and to outperform complex templates that haven't been seen during the fine-tuning steps.
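
For concreteness, the comparison I have in mind looks roughly like this (the example text and labels are made up):

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/bge-m3-zeroshot-v2.0",
)

text = "The central bank raised interest rates again this quarter."  # illustrative
labels = ["economy", "sports", "culture"]  # illustrative

# Variant 1: a basic full-sentence template.
with_template = classifier(text, labels, hypothesis_template="This example is about {}.")

# Variant 2: the raw label as the hypothesis.
raw_label = classifier(text, labels, hypothesis_template="{}")

print(with_template["scores"])
print(raw_label["scores"])
```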

@aarabil
re question 1: I don't know, to be honest; I haven't compared this empirically. My intuition would be: if you work with one language that you know well, write the hypotheses in that language. If you work with multiple languages you don't speak well, write the hypotheses in one harmonized language (English). (A sketch of the fully translated option follows at the end of this reply.)
re question 2: I think it's better to keep a hypothesis template and to formulate it similarly to the hypotheses shown in the model cards. The model has seen full-sentence hypotheses during fine-tuning; it has never seen single-word hypotheses consisting only of the labels.
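
To illustrate the intuition on question 1, a fully translated setup might look like this (a sketch only; the Spanish template and labels are one possible translation, not an evaluated recommendation):

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/bge-m3-zeroshot-v2.0",
)

text = "La selección ganó el partido en el último minuto."  # illustrative Spanish premise
labels_es = ["deportes", "política", "economía"]  # labels translated into Spanish

# Full-sentence hypothesis template translated into Spanish (one possible phrasing).
result = classifier(text, labels_es, hypothesis_template="Este ejemplo trata de {}.")
print(result["labels"][0], result["scores"][0])
```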

@MoritzLaurer Thank you for the answers!

re Q1 - This paper sheds some additional light on this question.
re Q2 - That makes the most sense indeed, especially to play it safe.
