Curious about how the model was trained to support Taiwan Chinese so well

#1 by nps798 - opened

First, I'd like to thank you and your team for developing such a powerful model that has greatly contributed to the Traditional Chinese LLM community.

Not really a question about bugs or errors; I'm just a newbie in this field. I've heard that LLaMA-2 from Meta was predominantly trained on an English corpus, so the vanilla Meta LLaMA-2 struggles with Chinese reasoning and responses. I would be very grateful if you could share some of your experience or tips for training an English-based LLM to learn Chinese.
And if it essentially means going through pre-training again, what's the advantage of using LLaMA-2 rather than previously published models like Falcon or MPT?

Thank you!

Thanks for your kind words and sorry for my late reply.

The choice of LLaMA-2 is based on its good performance (in English) and its relatively open license (compared to LLaMA-1).

Taiwan-LLMs were all continually pretrained on a massive amount of Traditional Chinese text and then instruction-tuned.
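For readers who want a concrete picture, a continued-pretraining run of this kind can be set up with the standard Hugging Face `Trainer` and a plain causal-LM objective. This is only a minimal sketch under assumed settings; the corpus file, sequence length, and hyperparameters are placeholders, not the actual Taiwan-LLM recipe.

```python
# Minimal sketch of continued pretraining (next-token prediction) on a
# Traditional Chinese corpus. File name and hyperparameters are assumed
# placeholders, not the settings used for Taiwan-LLM.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical zh-TW pretraining corpus with a "text" field.
raw = load_dataset("json", data_files="zh_tw_corpus.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="taiwan-llm-cp",      # placeholder output dir
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False gives the plain causal-LM objective used for continued pretraining
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```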

In addition to zh-TW data, I kept a small portion of English and programming-language data in the continued pretraining for v2 to retain the original capabilities in coding and English conversation.
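One simple way to express that kind of mixture is to interleave the corpora with fixed sampling probabilities, for example with `datasets.interleave_datasets`. The file names and the 90/5/5 split below are illustrative assumptions, not the ratios actually used.

```python
# Sketch of the data-mixture idea: mostly zh-TW text, with a small fraction
# of English and code kept in to limit forgetting. Files and weights are
# assumed for illustration only.
from datasets import load_dataset, interleave_datasets

zh_tw = load_dataset("json", data_files="zh_tw_corpus.jsonl", split="train")
english = load_dataset("json", data_files="english_corpus.jsonl", split="train")
code = load_dataset("json", data_files="code_corpus.jsonl", split="train")

mixed = interleave_datasets(
    [zh_tw, english, code],
    probabilities=[0.90, 0.05, 0.05],   # assumed mixture weights
    seed=42,
    stopping_strategy="all_exhausted",  # keep sampling until every corpus is seen
)
```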
