Spaces: HuggingFaceH4/open_llm_leaderboard

Qwen1.5 add to leaderboard

#597

by TNTOutburst - opened Feb 13

TNTOutburst

Feb 13

•

edited Feb 13

Is there a reason the Qwen1.5 models haven't been added to the leaderboard? I don't understand stuff like precision and weight type enough to confidently add the models myself, so I would really appreciate it if someone could add the models.

https://huggingface.co/Qwen/Qwen1.5-72B
https://huggingface.co/Qwen/Qwen1.5-14B
https://huggingface.co/Qwen/Qwen1.5-7B
https://huggingface.co/Qwen/Qwen1.5-4B
https://huggingface.co/Qwen/Qwen1.5-1.8B
https://huggingface.co/Qwen/Qwen1.5-0.5B

Edit:
Actually, it looks like 4B and 7B (chat versions) have been added, but not the rest.
~~And it seems the 4B and 7B on the leaderboard are marked as chat models when they should be marked as pretrained.~~

Phil337

Feb 13

@TNTOutburst The Qwen1.5 models currently on the leaderboard are the official chat versions, not the pretrained ones.

TNTOutburst

Feb 13

> @TNTOutburst The Qwen1.5 models currently on the leaderboard are the official chat versions, not the pretrained ones.

Huh, weird. I don't know how I missed that it says chat in their names.

Phil337

Feb 19

@TNTOutburst Good news. It looks like someone submitted the Qwen1.5s to the leaderboard for evaluation.

TNTOutburst

Feb 19

Huh, I wonder why Qwen1.5-72B is scoring worse than Qwen-72B

Phil337

Feb 19

@TNTOutburst I tested the official Qwen1.5 chat models up to 14b (the limit of my PC), and they often perform surprisingly badly in English (worse than the best Llama 7b fine-tunes, let alone Llama 13b fine-tunes, which themselves aren't very good).

I can't really fault them for this since they are a Chinese company, and the Qwen1.5s apparently do unusually well in Chinese. But the performance in English is far too unreliable to be useful, at least at 14b or fewer parameters. Plus the alignment is the worst I've seen. They basically aligned it to a 10-year-old child living in a deeply conservative religious household (e.g. "As an AI agent I can't respond because it's not age appropriate"), even though I asked common PG questions that Wikipedia answers. And this over-alignment bleeds into everything, crippling it at every turn. The hallucinations were also excessive, even for very widely known information, such as basic questions about the most popular English movies, music and TV shows ever released.

In short, don't bother with Qwen1, 1.5, 2.0... if you aren't Chinese. I'm glad they exist because the Chinese people deserve a good LLM, but the excessive alignment and poor performance in English on common and basic tasks make them otherwise pointless to use.

clefourrier

Hugging Face H4 org Feb 26

Btw, since we upgraded to the latest version of transformers, I think you should be able to submit Qwen2 models yourselves if needed :)

clefourrier changed discussion status to closed Feb 26

Yhyu13

Feb 26

Qwen 1 is good at English as well as Chinese. But there are plenty of models that are better than it in English.
