About continuing training

#2
by Ali-C137 - opened

Hello guys, incredible work btw 🔥
I'm interested to know if you managed to evaluate the model's performance on languages other than English. I'm interested in continuing to train this model on an Arabic corpus! Do you think it will maintain its performance on embedding tasks as well? Would love to hear your thoughts on this subject.
Best 🤗

cc : @Muennighoff

GritLM org

Thanks!
We evaluated it on TyDi QA - you can find the per-language metrics of this model here: https://huggingface.co/datasets/GritLM/results/blob/main/GritLM-7B/tydiqa_metrics.json
(the average is also reported in the paper)

Here's the GritLM-8x7B model: https://huggingface.co/datasets/GritLM/results/blob/main/GritLM-8x7B/tydiqa_metrics.json

We didn't test them on Arabic embedding tasks, but there are several Arabic datasets available in MTEB - it would be great to get their performance!

What languages does it support?

GritLM org

You can try any language, but it will probably perform best on English and related languages.
