Qwen1.5 add to leaderboard

#597
by TNTOutburst - opened

Is there a reason the Qwen1.5 models haven't been added to the leaderboard? I don't understand stuff like precision and weight type enough to confidently add the models myself, so I would really appreciate it if someone could add the models.

https://huggingface.co/Qwen/Qwen1.5-72B
https://huggingface.co/Qwen/Qwen1.5-14B
https://huggingface.co/Qwen/Qwen1.5-7B
https://huggingface.co/Qwen/Qwen1.5-4B
https://huggingface.co/Qwen/Qwen1.5-1.8B
https://huggingface.co/Qwen/Qwen1.5-0.5B

Edit:
Actually, it looks like 4B and 7B (chat versions) have been added, but not the rest.
And it seems the 4B and 7B on the leaderboard are marked as chat models when they should be marked as pretrained.

@TNTOutburst The Qwen1.5 models currently on the leaderboard are the official chat versions, not the pretrained ones.

Huh weird. I don't know how I didn't see it says chat in their names.

@TNTOutburst Good news. It looks like someone submitted the Qwen1.5s to the leaderboard for evaluation.

Huh, I wonder why Qwen1.5-72B is scoring worse than Qwen-72B.

@TNTOutburst I tested the official Qwen1.5 chat models up to 14b (the limit of my PC), and they often perform surprisingly badly in English (worse than the best Llama 7b fine-tunes, let alone Llama 14b fine-tunes, which themselves aren't very good).

I can't really fault them for this since they are a Chinese company, and the Qwen1.5s apparently do unusually well in Chinese. But the performance in English is far too unreliable to be useful, at least at 14b parameters or lower. Plus, the alignment is the worst I've seen. They basically aligned it to a 10-year-old child living in a deeply conservative religious household (e.g. "As an AI agent I can't respond because it's not age appropriate"), even though I asked common PG questions that Wikipedia answers. And this over-alignment bled everywhere, crippling it at every turn. The hallucinations were also excessive, even for very widely known information, such as basic questions about the most popular English movies, music and TV shows ever released.

In short, don't bother with Qwen1, 1.5, 2.0... if you aren't Chinese. I'm glad they exist because the Chinese people deserve a good LLM, but their excessive alignment and poor performance in English on common and basic tasks make them otherwise pointless to use.

Hugging Face H4 org

Btw, since we upgraded to the latest version of transformers, I think you should be able to submit Qwen2 models yourselves if needed :)

clefourrier changed discussion status to closed

Qwen 1 is good at English as well as Chinese, but there are plenty of models that are better than it in English.
