About continuing training

by Ali-C137 - opened

Hello guys, incredible work btw πŸ”₯
I am interested to know if you managed to evaluate the model's performance on other languages than English? I'am interested to continue training this model on an arabic corpus! Do you think it will maintain it's performance across the embedding task as well? Would love to hear your thoughts about this subject
best πŸ€—

cc : @Muennighoff

GritLM org

We evaluated it on TyDi QA - you can find the per-language metrics of this model here: https://huggingface.co/datasets/GritLM/results/blob/main/GritLM-7B/tydiqa_metrics.json
(the average is also reported in the paper)

Here's the GritLM-8x7B model: https://huggingface.co/datasets/GritLM/results/blob/main/GritLM-8x7B/tydiqa_metrics.json

We didn't test them on arabic embedding but there are a bunch of Arabic datasets available in MTEB - would be great to get their performance!

What languages does it suport?

GritLM org

You can try any language, but it will probably be best for English and related languages

Sign up or log in to comment