Exciting, how can you do that so quickly?

#1
by KnutJaegersberg - opened

I thought you'd continue to pretrain the model on many tokens. Is this a fine-tune? What kinda data did it see? How much? Can we use it?

Wondering the same. Please share some details :)

KnutJaegersberg changed discussion status to closed

@KnutJaegersberg Why did you close this? I'm still interested in more details. Kind regards

No hurry, I will share the details, just give me some time. Much work to do.

Great! Thank you very much

It is quite complex, as the large vocab and large size make Qwen 72B behave unstably, so it will take more time to solve some issues. For example, GGUF quants fail entirely - https://github.com/ggerganov/llama.cpp/pull/4281
Performance is also degraded with non-biased QKVO (not uploaded yet; it worked with the previous 14B & 7B but is poor on 72B).
I am not sure why, but I am quite unsatisfied with this new base model; maybe I should head towards Deepseek LLM 67B and Yi 34B instead.
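For anyone unfamiliar with the term: "non-biased QKVO" refers to dropping the bias terms on the attention query/key/value/output projections. Below is a minimal sketch of what that variant means, assuming a converted LLaMA-style checkpoint with q_proj/k_proj/v_proj/o_proj module names; the repo id and module names are illustrative assumptions, not the author's actual procedure.

```python
# Minimal sketch: zero out the attention projection biases, which makes the
# forward pass numerically equivalent to a bias-free ("non-biased") QKVO
# variant. Repo id and module names are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "CausalLM/72B",                # hypothetical checkpoint name
    torch_dtype=torch.bfloat16,
)

for module in model.modules():
    for name in ("q_proj", "k_proj", "v_proj", "o_proj"):
        proj = getattr(module, name, None)
        if isinstance(proj, torch.nn.Linear) and proj.bias is not None:
            proj.bias.data.zero_()  # equivalent to removing the bias term
```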

JosephusCheung changed discussion status to open