Dataset Language Distribution
#44 by aslawliet - opened
What ratio of English to Chinese data was Yi-34B trained on? Was it trained on at least 2 trillion tokens of English?
Hi there! Thank you for your question! Yi-34B was indeed trained on 2 trillion+ tokens of English.
richardllin changed discussion status to closed