About class_langs
#1, opened by AlirezaFa
I want to use this for a multilingual retrieval application, and I don't know the language of the users' queries. The text_features function expects a langs parameter that defines the language of each text, and if no langs is passed, it defaults to eng_Latn. What happens if we don't know the language of the text and it is not eng_Latn? Do we get a big drop in performance?
From my experiments, text embeddings in different languages are quite similar (which is good and expected). Because of that, if you specify the wrong language, you will still likely get correct results. In terms of quantitative results, recall drops by about 10% with an incorrect language.
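If you want to recover some of that recall, one option is to guess the language per query and fall back to the default only when the guess fails. Below is a minimal sketch of that idea, not part of this model's code: resolve_langs and toy_detect are hypothetical names, and toy_detect is a deliberately crude stand-in (it labels any Cyrillic text as rus_Cyrl) for whatever language-identification tool you prefer. The only things taken from this thread are the langs parameter and the eng_Latn default.

```python
from typing import Callable, List, Optional

DEFAULT_LANG = "eng_Latn"  # the default mentioned in this thread


def resolve_langs(
    texts: List[str],
    detect: Callable[[str], Optional[str]],
    default: str = DEFAULT_LANG,
) -> List[str]:
    """Return one language code per text, using `default` when detection fails.

    `detect` is any callable that maps a text to a language code or None.
    """
    langs = []
    for text in texts:
        try:
            code = detect(text)
        except Exception:
            # A failing detector should not break retrieval; use the default.
            code = None
        langs.append(code or default)
    return langs


def toy_detect(text: str) -> Optional[str]:
    """Hypothetical stand-in detector: treats any Cyrillic text as Russian."""
    if any("\u0400" <= ch <= "\u04FF" for ch in text):
        return "rus_Cyrl"
    return None  # unknown: let the caller fall back to the default


queries = ["hello world", "привет мир"]
langs = resolve_langs(queries, toy_detect)
# `langs` can then be passed as the langs argument of text_features;
# the exact call signature is not shown in this thread, so check the model card.
```

Since a wrong code only costs some recall rather than breaking retrieval, an imperfect detector should still be an acceptable starting point.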