JustinLin610 
posted an update Feb 6
Yesterday we released Qwen1.5. Maybe someday I can tell more about the experience, but this is at least a good release even if it is not yet SOTA. There are not that many SOTA models anyway. This time, we actually fixed a lot of problems.

1. Context lengths are finally unified across all sizes. Previously, a lot of users kept telling us that 14B only supports 2K (yeah, even dynamic NTK does not work that well there and can only extend it to around 4-5K, let alone for those who know nothing about how to use dynamic NTK).
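For readers unfamiliar with the dynamic NTK trick mentioned above, here is a minimal standalone sketch of the idea: when the input grows past the trained context length, the RoPE base frequency is enlarged so position encodings stay within the range the model saw in training. The function name and the `factor` default are hypothetical; this mirrors one common formulation, not any specific library's implementation.

```python
# Sketch of dynamic NTK-aware RoPE scaling (hypothetical helper).
# base: original RoPE base (e.g. 10000), dim: rotary dimension,
# trained_len: context length seen in training, seq_len: actual input.
def dynamic_ntk_base(base, dim, trained_len, seq_len, factor=2.0):
    if seq_len <= trained_len:
        return base  # within trained context: no scaling needed
    # enlarge the base so the effective wavelength covers seq_len
    return base * ((factor * seq_len / trained_len) - (factor - 1)) ** (
        dim / (dim - 2)
    )

print(dynamic_ntk_base(10000.0, 128, 2048, 8192))  # larger than 10000
```

As the post notes, this kind of scaling only stretches a short-context model so far in practice, which is why unifying the trained context lengths matters.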

2. If you look carefully at our base language models, you will find that they understand the special tokens of ChatML, which means you can directly use LoRA to train on data in ChatML format. Why couldn't you do this before? Because if the base language model does not understand the special tokens, you need to train them in, which means turning on training of the embeddings. This is painful and often leads to problems when you use ZeRO3.
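To make the point above concrete, here is a minimal sketch of the ChatML format those special tokens belong to. `to_chatml` is a hypothetical helper written for illustration, not part of any library: it just wraps each message in the `<|im_start|>` / `<|im_end|>` tokens that the new base models already understand, so LoRA training data can use this format without unfreezing the embedding layer.

```python
# Hypothetical helper: render a conversation in ChatML.
# <|im_start|> and <|im_end|> are the special tokens the post refers to.
def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts)

sample = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(sample)
```

Since the base model's tokenizer already maps these markers to trained embeddings, a LoRA adapter over the attention/MLP weights is enough; nothing in the embedding table needs gradient updates.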

3. We did strengthen our base language models, except for 72B. You should find the base models better, especially 7B and 14B. Why not 72B? Nah, hard to say, but we will make it better.

4. About the multilingual capabilities: we finally built up our multilingual evaluation system and found that our new base language models perform nicely in multilingual evaluation for base models. This told us we should pay more attention to post-training with multilingual data, and we did that too. This is why this time we can tell you something about multilingual performance. It is for sure much, much better than our models before this release.

5. Chat models are the most promising part. Before this release, we gave you SFT models, but this time we have very nice SFT+DPO models. Not only do annotators like them, users like them too. I am sure you developers will feel the same way.
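For context on the DPO step mentioned above, here is a minimal sketch of the standard DPO per-pair loss, written in plain Python for clarity. It assumes you already have summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model; `beta` is the usual temperature. This illustrates the objective in general, not Qwen's training code.

```python
import math

# Standard DPO loss for one preference pair:
#   -log sigmoid(beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)))
# where pi_* / ref_* are log-probs under the policy / reference model.
def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid)

# With no preference margin, the loss is log(2); it shrinks as the
# policy favors the chosen response relative to the reference.
print(dpo_loss(-1.0, -5.0, -2.0, -4.0))
```

The appeal over plain SFT is that this optimizes directly on preference pairs (what annotators and users liked) without training a separate reward model.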

For me Qwen was always top notch, especially the fact that it was possible to prompt it in different languages. Awesome work, cannot wait to test the 1.5 stack ^^

Congratulations! With all the US/EU big players being more secretive than ever, you're not just bringing good models, but really making an incredible contribution to open research.

And I slightly disagree on one point: Qwen-500m is SOTA. I never thought it would be possible to get results like this from such a small multilingual model on RAG tasks in French.


No, I did not say it is SOTA. It is impossible for such a small model to be very powerful, but it might be useful in some cases, I guess.

Thanks for sharing!