How many English tokens was the model trained on?

#5
by aslawliet - opened

How many English tokens was the model trained on?

Qwen org

🤔 I would say less than 3T tokens; that's for sure.

@jklj077 Is it more than 2.4 trillion tokens?

It annoyingly mixes in Chinese words at random when I didn't ask for it. Maybe the model is better in Chinese; I don't speak it, so I can't tell. It might be due to the GGUF version, though: it seems that to make a calibrated GGUF quant you need to give it some example text, and I think a poor calibration set can make it worse at some tasks. It might be worth testing whether calibrating on English-only versus mixed examples makes the quant better, and releasing separate versions. A rough sketch of how you could test that is below.
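For anyone who wants to actually run that experiment, here is a minimal sketch assuming llama.cpp's importance-matrix (imatrix) workflow: compute one importance matrix from an English-only calibration file and one from a mixed English/Chinese file, quantize the model twice, and compare the two quants. The `llama-imatrix` and `llama-quantize` tools are from llama.cpp; all file names and the model path here are hypothetical placeholders.

```python
import subprocess

# Hypothetical paths: adjust to your local llama.cpp build and model files.
MODEL_F16 = "qwen-f16.gguf"  # unquantized GGUF conversion of the model

# Two calibration sets to compare (contents are up to you).
CALIBRATION_SETS = {
    "english": "calibration_english_only.txt",
    "mixed": "calibration_english_chinese.txt",
}

for name, calib_file in CALIBRATION_SETS.items():
    imatrix_file = f"imatrix-{name}.dat"
    out_gguf = f"qwen-Q4_K_M-{name}.gguf"

    # Step 1: compute an importance matrix from the calibration text.
    subprocess.run(
        ["llama-imatrix", "-m", MODEL_F16, "-f", calib_file, "-o", imatrix_file],
        check=True,
    )

    # Step 2: quantize the model using that importance matrix.
    subprocess.run(
        ["llama-quantize", "--imatrix", imatrix_file, MODEL_F16, out_gguf, "Q4_K_M"],
        check=True,
    )
```

You could then prompt both quants with the same English-only inputs and check whether the English-calibrated one mixes in Chinese less often.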
