Expand the linguistic coverage.
#1, opened by Pclanglais
While Llama is incredibly versatile and the multilingual training seems to have maintained support for languages not included in the dataset, a future version of Brahe should absolutely feature some texts in the following languages (nearly absent for now):
- Arabic
- Russian
- Chinese
- Japanese
- Hindi
- Bengali
(Really looking forward to potential collaborations on this, as I'm not conversant in any of these.)