About continuing training
Hello guys, incredible work btw 🔥
I am interested to know whether you managed to evaluate the model's performance on languages other than English. I'm interested in continuing training this model on an Arabic corpus! Do you think it will maintain its performance on the embedding task as well? Would love to hear your thoughts on this subject.
Best 🤗
cc: @Muennighoff
Thanks!
We evaluated it on TyDi QA - you can find the per-language metrics of this model here: https://huggingface.co/datasets/GritLM/results/blob/main/GritLM-7B/tydiqa_metrics.json
(the average is also reported in the paper)
Here's the GritLM-8x7B model: https://huggingface.co/datasets/GritLM/results/blob/main/GritLM-8x7B/tydiqa_metrics.json
We didn't test them on Arabic embedding tasks, but there are a bunch of Arabic datasets available in MTEB - it would be great to get their performance! A sketch of how that could look is below.
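For anyone who wants to try, here's a minimal sketch of running the Arabic MTEB tasks against GritLM. It assumes the `gritlm` and a recent `mteb` package; the language filter and output folder are my own choices, not something from this thread:

```python
# Sketch: evaluate GritLM-7B on MTEB tasks that cover Arabic.
# Assumes `pip install gritlm mteb` and a recent mteb version
# where get_tasks() supports a language filter.
import mteb
from gritlm import GritLM

# Load the model in embedding-only mode
model = GritLM("GritLM/GritLM-7B", torch_dtype="auto", mode="embedding")

# Select every MTEB task that includes Arabic ("ara" = ISO 639-3 code)
tasks = mteb.get_tasks(languages=["ara"])

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/GritLM-7B-arabic")
```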
What languages does it support?
You can try any language, but it will probably work best for English and closely related languages.
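If you want to sanity-check embedding quality in a new language before committing to continued training, something like this should work. It follows the usage pattern from the GritLM README; the Arabic example strings are toy data I made up for illustration:

```python
# Sketch: quick embedding sanity check in Arabic with GritLM.
from gritlm import GritLM
from scipy.spatial.distance import cosine

model = GritLM("GritLM/GritLM-7B", torch_dtype="auto", mode="embedding")

def gritlm_instruction(instruction):
    # GritLM's embedding prompt format
    return "<|user|>\n" + instruction + "\n<|embed|>\n" if instruction else "<|embed|>\n"

queries = ["ما هي عاصمة فرنسا؟"]           # "What is the capital of France?"
documents = [
    "باريس هي عاصمة فرنسا.",               # "Paris is the capital of France."
    "القاهرة هي عاصمة مصر.",               # "Cairo is the capital of Egypt."
]

d_rep = model.encode(documents, instruction=gritlm_instruction(""))
q_rep = model.encode(queries, instruction=gritlm_instruction("Retrieve the answer to the question"))

# The matching document should score noticeably higher
for doc, rep in zip(documents, d_rep):
    print(doc, 1 - cosine(q_rep[0], rep))
```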