Expand the linguistic coverage.

#1
by Pclanglais - opened

While Llama is incredibly versatile and the multilingual training seems to have maintained support for languages not included in the dataset, a future version of Brahe should absolutely feature some texts in the following languages (nearly absent for now):

  • Arabic
  • Russian
  • Chinese
  • Japanese
  • Hindi
  • Bengali

(Really looking forward to potential collaborations on this, as I'm not conversant in any of these.)
