Can you please submit this to the leaderboard?

#15
by gblazex - opened

We (@mlabonne, @chiphuyen & I) are running a correlation analysis between human judgement and different benchmarks,
and the Chat version of this model is missing from the Hugging Face Open LLM Leaderboard.

(the base model is on it, but that's a different model)

Can you please submit the 14B Chat version to the leaderboard as well?

context: https://twitter.com/gblazex/status/1737574824753467647
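
For the curious: the analysis itself is simple rank correlation between human preference scores and benchmark scores. A minimal sketch with made-up numbers (not real results):

```python
# Minimal sketch of the correlation analysis (illustrative numbers only):
# rank-correlate human preference scores against one benchmark's scores
# for the same set of models.
from scipy.stats import spearmanr

human_scores = [1112, 1059, 1021, 980, 955]        # e.g. arena-style ratings
benchmark_scores = [71.4, 70.1, 68.9, 62.3, 60.8]  # e.g. leaderboard average

rho, p_value = spearmanr(human_scores, benchmark_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```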

@gblazex Qwen needs `trust_remote_code=True`, which HF would not accept since its evaluation machines are not sandboxed.
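
Concretely, loading this model today looks like the sketch below; both calls need the flag because the repo ships custom modeling and tokenization code:

```python
# Sketch: Qwen-14B-Chat currently executes custom code from the model repo,
# so both the tokenizer and the model require trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen-14B-Chat", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat", trust_remote_code=True
)
```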


@Yhyu13 that is great info, thank you! So basically the tokenizer would need to be added to the Hugging Face transformers library?

Qwen org

In fact, both the modeling and the tokenization code need to be merged for the leaderboard to work.
Currently, the base models (as foundation models) are run manually by HF staff (that's why they're on the leaderboard). I don't think the chat models can enjoy that privilege, though.
We plan to merge the code into transformers, but we can't confirm a schedule yet.


@clefourrier could Qwen-14B-Chat get a manual run by HF staff to get on the leaderboard?

It would help us a lot in our quest to study the relationships between benchmarks
and to come up with a new representative suite based on them.

context: https://twitter.com/gblazex/status/1737574824753467647

Thank you


Hi,
I'm sorry, but our policy is to manually run only foundation models, as 1) they are the most important for the community, and 2) any manual eval is a lot of added work and we don't have the bandwidth.
However, you can follow our instructions and run the eval yourself if you need results before the code is merged.
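
For reference, the leaderboard runs EleutherAI's lm-evaluation-harness under the hood, so you can reproduce a task locally. A hedged sketch, assuming a recent harness release with the Python API (the exact task names, few-shot counts, and pinned harness revision the leaderboard uses may differ; check its About page):

```python
# Hedged sketch: evaluate Qwen-14B-Chat on one leaderboard task with
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
# The leaderboard pins a specific harness revision and task config,
# so treat these values as illustrative, not the official setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen-14B-Chat,trust_remote_code=True",
    tasks=["arc_challenge"],  # the leaderboard runs ARC 25-shot
    num_fewshot=25,
    batch_size="auto",
)
print(results["results"]["arc_challenge"])
```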


no worries, thank you!
