Which 46 Languages

#86
by Robertl - opened
This comment has been hidden
BigScience Workshop org
edited Aug 16, 2022

Hi @Robertl ,
Please find the full list below! You can also see it by clicking the "46 languages" widget above.
[Image: Screenshot 2022-08-16 at 10.01.05.png]
Thanks a lot!

BigScience Workshop org

I agree it is not clear enough. I proposed a PR here: https://github.com/huggingface/transformers/pull/18645 to show the full, detailed list of the languages the model was trained on.

BigScience Workshop org

Actually, you can also find the full list here: https://huggingface.co/bigscience/bloom#languages

christopher changed discussion status to closed

Why was Czech not included in BLOOM's training data? Czech has large corpora and a very active NLP community that has already published NLP models (e.g., a BERT version).

BigScience Workshop org

@cerisara The training corpus was crowdsourced by workshop participants; the final list of languages took shape organically through community hackathons and volunteer efforts.

More info in this thread: https://twitter.com/YJernite/status/1505920454825066496
