New foundational models in need of eval

#130
by spaceman7777 - opened

Hi @clefourrier, again, great job resolving the issues with the llama-1-65b fine-tunes!

This is a separate request regarding the new work out of MosaicML. They've released what is essentially an MPT-7B-2, under the name MPT-7B-8k.

These three new models have been retrained from scratch on 1.5T tokens, instead of the original MPT-7B's 1T tokens. That qualifies them as new foundational models, which seems to fit the speculated criteria for fast-tracking a model onto the leaderboard (as opposed to a fine-tune of a foundational model).

It's a series of three models: MPT-7B-8k, MPT-7B-8k-instruct, and MPT-7B-8k-chat.
Here are the links to the models:
https://huggingface.co/mosaicml/mpt-7b-8k
https://huggingface.co/mosaicml/mpt-7b-8k-instruct
https://huggingface.co/mosaicml/mpt-7b-8k-chat

I know there were some issues with evaluating MPT models before, since I believe they still need --trust-remote-code for the testing harness. The models are also quite new, so I thought it warranted a community thread to bring their priority to light :)

Looking forward to seeing more of your good work here keeping people up to date!

-- Spaceman

Open LLM Leaderboard org

Hi @spaceman7777,

Thank you for your comments!
We will soon support MPT models, now that they have been integrated into the transformers library. I'll make sure to ping you then so you can submit these models to the queue if they interest you!

Note: regarding what has been called a "fast track", I should have clarified at the time that it is not a fast track per se, but rather a "pre-release track".
It is an exceptional process (it has happened 3 times since @SaylorTwift and I took over leaderboard development) that takes place before the release of new SOTA models from partner organizations, at their request, so that the results can appear on the leaderboard on release day.

These "tracks" are the exception, not the rule, as they create considerably more work for us; otherwise we would be drowning in requests for models to evaluate manually.
And I'm sure you'll agree with us that it's more important to add more features and ensure leaderboard stability :)

Ah, OK. @clefourrier, thank you for your response. :)

The reason I wanted to make sure these were prioritized is that they are among the very few models on the leaderboard with distinct pretraining, rather than just fine-tuning. The only other top-tier models that compare are the original llama-1-7/13/30/65 models, mpt-30b/30b-instruct/30b-chat, mpt-7b/mpt-7b-instruct/mpt-7b-chat, and the few other foundational models that score somewhat decently (like RWKV, which still isn't listed, despite my requests).

That's what I meant about how they are different. 95% of the models on the board are just fine-tuned versions of other models, whereas these are original releases with freshly pretrained bases.

I look forward to seeing whether they do indeed score better than the original llama-2-7b models under the current evaluation methodology, as could very well be the case.

Good luck with this :)

clefourrier changed discussion status to closed
Open LLM Leaderboard org

Hi @spaceman7777 ,
With the latest transformers release, we now support MPT models! Feel free to add as many as you'd like to the leaderboard!
