Full list of languages

#5
by wissamantoun - opened

Where can we find a list of the languages used during training? And will a paper be released?

Thank you for the model and for the help!

XVERSE Technology org
edited Aug 10, 2023

We trained our model on multiple languages including ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, hi, hu, id, it, iw, ja, kk, ko, lt, lv, mr, ms, nl, no, pl, pt, ro, ru, sk, sl, sr, sv, ta, th, tr, uk, vi, and zh. Notably, the primary languages in our training data were Chinese and English. As for your inquiry about a paper, we currently do not have plans to release one. Thanks.

How about programming languages?

XVERSE Technology org

We have included the following programming languages in our pre-training data: ABAP, Arduino, Assembly, Shell, C, C#, C++, Clojure, COBOL, Crystal, CUDA, Dart, Pascal, Elixir, Erlang, F#, Fortran, Go, Groovy, Haskell, CSS, Java, JavaScript, Julia, Kotlin, Common Lisp, Emacs Lisp, Objective-C++, OCaml, Perl, PHP, PowerShell, Python, R, Ruby, Rust, Scala, Solidity, SQL, Swift, TypeScript, Verilog, VHDL, Visual Basic.

What is the proportion of each language? ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, hi, hu, id, it, iw, ja, kk, ko, lt, lv, mr, ms, nl, no, pl, pt, ro, ru, sk, sl, sr, sv, ta, th, tr, uk, vi, and zh
Any test results on cross language tasks?
