Training languages in the model card #9

by fyvo - opened

The model card does not show the proportion of Arabic in the training data. The distribution of languages from the Niger-Congo family contains 'Kuganda', probably a misspelling of 'Luganda', which is spoken in Uganda. This is difficult to verify, because the corpora for the Niger-Congo languages are not documented individually.

fyvo changed pull request status to open

Thanks for pointing this out!
I think it is worth opening a PR on the main bloom repo as well, since the model cards were copied from there.
Also cc-ing @cakiki in case I missed anything.