HuggingFaceH4/open_llm_leaderboard · Flagging models with incorrect tags

Jan 16

Hi

The discussion in #510 got lengthy, so at @clefourrier's suggestion I am opening a new thread.

After recent changes to the leaderboard, models that involve merging can only be filtered properly if their authors tag them accordingly.
To facilitate that, HF representative @clefourrier and I (an unrelated volunteer) are flagging models that don't have the right tag.

Let's go through questions that you might have.

What is considered a merge?

We consider merges to be models that combine several other models in a way that does not keep the individual weights of the original models (like fusions).

I fine-tuned a model that was created by someone else using one of the merging techniques, should I still tag it as a merge?

Yes, please. This was discussed in #510.

I made a MoE out of a few models, some of which were merges, should I still tag the model as a merge?

Yes, as discussed in #510. If you used a model that was created using merging as one or more of the experts in a MoE, your model should be tagged as merge. Please also remember to tag it as a MoE by adding the moe tag.

I made a merge but used just single model weights for it, without combining it with weights from a different model - should I tag it as a merge?

No, this would not be considered a merge, so the merge tag is not necessary. However, if your model is a MoE, please add the moe tag to it.
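The answers above can be condensed into a small decision function. This is only a sketch of how the FAQ reads, not code used by the leaderboard; the function name `required_tags` and its three boolean parameters are hypothetical.

```python
def required_tags(fuses_several_models: bool,
                  any_ancestor_is_merge: bool,
                  is_moe: bool) -> set:
    """Return the tags a model card should carry according to the FAQ above.

    fuses_several_models:  weights of two or more different models were combined
                           without keeping the individual weights intact.
    any_ancestor_is_merge: the model was fine-tuned from, or uses as a MoE expert,
                           a model that was itself created by merging.
    is_moe:                the model is a mixture of experts.
    """
    tags = set()
    if fuses_several_models or any_ancestor_is_merge:
        tags.add("merge")
    if is_moe:
        tags.add("moe")
    return tags


# A MoE whose experts include a merged model needs both tags.
print(sorted(required_tags(False, True, True)))
```

A "merge" that only reuses the weights of a single model sets none of the flags, so it needs no tag unless it is also a MoE.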

My model has been flagged, how do I unflag it?

If the creation of your model did involve model merging and you haven't tagged it as such yet, please tag it, even if you weren't the person doing the merging.
To do it via the UI, navigate to your model's card, click Edit model card, add merge to the tags: section, and save the model card.
Then, please wait an hour for the leaderboard to update. From what I have seen, it generally refreshes on the hour. If your model still shows up as flagged, please reply here and ask @clefourrier to un-flag your model manually.
If all went well, you should then be able to open the leaderboard, click Show merges, and see your model in the list.
I encourage you to double-check that the leaderboard now sees your model as merged by adding the merged column to the table and making sure that your model has merged = true.
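For those who would rather edit the README.md of the model repository directly, the tagging step amounts to adding merge to the tags: list in the YAML front matter at the top of the model card. Below is a minimal sketch in plain Python of what that edit looks like. The function name `add_tag` is hypothetical, and the sketch assumes block-style lists (one `- tag` per line); inline lists such as `tags: [a, b]` are deliberately rejected rather than guessed at.

```python
def add_tag(card_text: str, tag: str) -> str:
    """Return card_text with `tag` present in the front matter `tags:` list."""
    lines = card_text.split("\n")
    # No front matter at all: create one that contains only the tag.
    if lines[0].strip() != "---":
        return f"---\ntags:\n- {tag}\n---\n{card_text}"
    # Find the line that closes the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter has no closing '---'")
    header = lines[1:end]
    for i, line in enumerate(header):
        if line.startswith("tags:"):
            if line.strip() != "tags:":
                raise ValueError("inline tag lists are not handled in this sketch")
            # Collect the existing '- item' entries that follow 'tags:'.
            j, existing = i + 1, []
            while j < len(header) and header[j].lstrip().startswith("- "):
                existing.append(header[j].lstrip()[2:].strip().strip("'\""))
                j += 1
            if tag in existing:
                return card_text  # already tagged, nothing to change
            # Reuse the indentation of the first existing entry, if any.
            first = header[i + 1] if existing else ""
            indent = first[: len(first) - len(first.lstrip())]
            header.insert(j, f"{indent}- {tag}")
            break
    else:
        # Front matter exists but has no 'tags:' key yet.
        header += ["tags:", f"- {tag}"]
    return "\n".join(["---", *header, "---", *lines[end + 1:]])


card = "---\nlicense: mit\ntags:\n- moe\n---\n# My model\n"
print(add_tag(card, "merge"))
```

The function is idempotent, so running it on a card that already carries the tag returns the text unchanged.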

If you believe that your model was wrongly flagged, please raise this here and we will discuss it.
It's possible for me to make a mistake when I browse through models and flag a model that is not a merge.

Please try to remember to tag your future models as merge before submitting them to the leaderboard.

adamo1139

Jan 16

Hi @clefourrier

Please flag the models below as lacking the merge tag. I want to focus on the merge tag, so I dropped any text related to contamination accusations.
Which links would be most helpful for you to flag models with minimal effort? Does it make a difference whether I add a link to the evaluation or not?

Description	Model card	Evaluation Details
Turdus' ancestry goes back to merge of AIDC-ai-business/Marcoroni-7B-v3 and EmbeddedLLM/Mistral-7B-Merge-14-v0.1 and possibly involves more merges.	https://huggingface.co/udkai/Turdus	https://huggingface.co/datasets/open-llm-leaderboard/details_udkai__Turdus
Slerp merge of upstage/SOLAR-10.7B-Instruct-v1.0 and bhavinjawade/SOLAR-10B-OrcaDPO-Jawade	https://huggingface.co/kodonho/Solar-OrcaDPO-Solar-Instruct-SLERP	https://huggingface.co/datasets/open-llm-leaderboard/details_kodonho__Solar-OrcaDPO-Solar-Instruct-SLERP
Slerp merge of DopeorNope/SOLARC-M-10.7B and kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2, both of which are also merges.	https://huggingface.co/kodonho/SolarM-SakuraSolar-SLERP	https://huggingface.co/datasets/open-llm-leaderboard/details_kodonho__SolarM-SakuraSolar-SLERP
Merge of upstage/SOLAR-10.7B-Instruct-v1.0 and rishiraj/meow	https://huggingface.co/Yhyu13/LMCocktail-10.7B-v1	https://huggingface.co/datasets/open-llm-leaderboard/details_Yhyu13__LMCocktail-10.7B-v1
Based on merge of AIDC-ai-business/Marcoroni-7B-v3 and EmbeddedLLM/Mistral-7B-Merge-14-v0.1	https://huggingface.co/mlabonne/NeuralMarcoro14-7B	https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__NeuralMarcoro14-7B
Neuronovo is based on CultriX/MistralTrix-v1, which in turn is based on zyh3826/GML-Mistral-merged-v1 merge. zyh3826/GML-Mistral-merged-v1 is a merge of quantumaikr/quantum-v0.01 and mncai/mistral-7b-dpo-v5	https://huggingface.co/Neuronovo/neuronovo-7B-v0.2	https://huggingface.co/datasets/open-llm-leaderboard/details_Neuronovo__neuronovo-7B-v0.2
fine-tune of CultriX/MistralTrix-v1 which is based on zyh3826/GML-Mistral-merged-v1 merge. zyh3826/GML-Mistral-merged-v1 is a merge of quantumaikr/quantum-v0.01 and mncai/mistral-7b-dpo-v5	https://huggingface.co/ryandt/MusingCaterpillar	https://huggingface.co/datasets/open-llm-leaderboard/details_ryandt__MusingCaterpillar
fine-tune of Neuronovo/neuronovo-7B-v0.2 which is a fine-tune of CultriX/MistralTrix-v1, which is based on zyh3826/GML-Mistral-merged-v1. zyh3826/GML-Mistral-merged-v1 is a merge of quantumaikr/quantum-v0.01 and mncai/mistral-7b-dpo-v5	https://huggingface.co/Neuronovo/neuronovo-7B-v0.3	https://huggingface.co/datasets/open-llm-leaderboard/details_Neuronovo__neuronovo-7B-v0.3
DPO of SanjiWatsuki/Lelantos-7B which is a merge of mostly unspecified models but openaccess-ai-collective/DPOpenHermes-7B-v2 and jan-hq/stealth-v1.2 are mentioned as being used for the merge.	https://huggingface.co/SanjiWatsuki/Lelantos-DPO-7B	https://huggingface.co/datasets/open-llm-leaderboard/details_SanjiWatsuki__Lelantos-DPO-7B
based on mindy-labs/mindy-7b-v2 which in turn is a merge of AIDC-ai-business/Marcoroni-7B-v3 and Weyaxi/Seraph-7B	https://huggingface.co/bardsai/jaskier-7b-dpo	https://huggingface.co/datasets/open-llm-leaderboard/details_bardsai__jaskier-7b-dpo
fine-tune of merge of EmbeddedLLM/Mistral-7B-Merge-14-v0.2 and cookinai/CatMacaroni-Slerp	https://huggingface.co/cookinai/OpenCM-14	https://huggingface.co/datasets/open-llm-leaderboard/details_cookinai__OpenCM-14
based on mindy-labs/mindy-7b-v2 which in turn is a merge of AIDC-ai-business/Marcoroni-7B-v3 and Weyaxi/Seraph-7B	https://huggingface.co/bardsai/jaskier-7b-dpo-v2	https://huggingface.co/datasets/open-llm-leaderboard/details_bardsai__jaskier-7b-dpo-v2
merge of OpenHermes-2.5-neural-chat-v3-3-Slerp, MetaMath-Cybertron-Starling and Marcoroni-7B-v3.	https://huggingface.co/jan-hq/supermario-v2	https://huggingface.co/datasets/open-llm-leaderboard/details_janhq__supermario-v2
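Several rows in the table above are flagged only because a merge sits somewhere further up the ancestry. That kind of check can be sketched as a walk over a lineage map. The Neuronovo chain below is taken from the descriptions in the table; the `LINEAGE` structure, the "merge"/"derived" labels, the function name `involves_merge`, and the `example/...` entry are hypothetical and exist only for illustration.

```python
# model -> (how it was produced, list of parent models)
LINEAGE = {
    "Neuronovo/neuronovo-7B-v0.3": ("derived", ["Neuronovo/neuronovo-7B-v0.2"]),
    "Neuronovo/neuronovo-7B-v0.2": ("derived", ["CultriX/MistralTrix-v1"]),
    "CultriX/MistralTrix-v1": ("derived", ["zyh3826/GML-Mistral-merged-v1"]),
    "zyh3826/GML-Mistral-merged-v1": (
        "merge",
        ["quantumaikr/quantum-v0.01", "mncai/mistral-7b-dpo-v5"],
    ),
    # Hypothetical entry: a plain fine-tune of a single base model.
    "example/plain-finetune": ("derived", ["example/base-model"]),
}


def involves_merge(model, lineage, seen=None):
    """True if the model, or any known ancestor, fuses two or more distinct models."""
    seen = set() if seen is None else seen
    if model in seen:  # guard against cycles in hand-written lineage data
        return False
    seen.add(model)
    method, parents = lineage.get(model, ("unknown", []))
    if method == "merge" and len(set(parents)) >= 2:
        return True
    # Otherwise the answer depends on the ancestors.
    return any(involves_merge(parent, lineage, seen) for parent in parents)


for name in ("Neuronovo/neuronovo-7B-v0.3", "example/plain-finetune"):
    print(name, involves_merge(name, LINEAGE))
```

Models with no recorded lineage are treated as not involving a merge, so the result is only as good as the ancestry information people disclose.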

clefourrier

Hugging Face H4 org Jan 17

Amazing job, thank you! And thanks for the mini FAQ!
I'll take care of it this week :)
The easiest for me is "model name, model path, problem" - I usually don't need the details.

For the "I made a merge without combining model weights" case, people will still need to tag their models as moe if they are MoEs.

adamo1139

Jan 17

Thanks. I added a note about MoEs to my top comment.
No notification is sent out when you flag a model, right? I think I will mention people here by their username after you flag their model, so that they get a notification. Many of them probably look at the leaderboard only rarely, so they may not realize that their model was flagged.

facat

Jan 20

Hello, our model https://huggingface.co/SUSTech/SUS-Chat-34B is fine-tuned directly from yi-34b, so it seems it should not be flagged as merged. Could you please fix this?

adamo1139

Jan 20

@facat I absolutely agree that your model shouldn't be flagged. My models are also yi-34b fine-tunes and aren't merges, yet two of them got flagged anyway. @clefourrier told me that unflagging is not implemented yet for models flagged by mistake, so they remain flagged. Maybe it will now get higher priority, as more "prestigious" models like your fine-tune or the official Mixtral Instruct from Mistral AI got flagged. I don't know what's up with the script HF uses to do the flagging, but it's flagging a lot of models that have absolutely no reason to be flagged. Clementine will hopefully look into this on Monday.

clefourrier pinned discussion Jan 22

euclaise

Jan 23

Filtering for only pretrained models yields a leaderboard dominated by untagged models. Here's an incomplete list, if it helps:

https://huggingface.co/cloudyu/Yi-34Bx2-MoE-60B - MoErge
https://huggingface.co/cloudyu/Mixtral_34Bx2_MoE_60B (x2) - MoErge
https://huggingface.co/gagan3012/MetaModel_moe (x2) - MoErge
https://huggingface.co/macadeliccc/SOLAR-math-2x10.7b-v0.2 - MoErge
https://huggingface.co/cloudyu/Mixtral_7Bx2_MoE - MoErge
https://huggingface.co/macadeliccc/SOLAR-math-2x10.7b - MoErge
https://huggingface.co/macadeliccc/Orca-SOLAR-4x10.7b - MoErge
https://huggingface.co/macadeliccc/piccolo-8x7b - merge, probably also MoErge
https://huggingface.co/freecs/ThetaWave-7B - domain data, I think?
https://huggingface.co/cloudyu/Mixtral_7Bx4_MOE_24B - MoErge
https://huggingface.co/Walmart-the-bag/WordWoven-13B - possible MoErge
https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo (x2) - MoErge, DPO, laser
https://huggingface.co/chargoddard/internlm2-20b-llama and https://huggingface.co/internlm/internlm2-20b and https://huggingface.co/chargoddard/internlm2-7b-llama and https://huggingface.co/internlm/internlm2-7b - domain data
https://huggingface.co/macadeliccc/polyglot-math-4x7b - MoErge

clefourrier

Hugging Face H4 org Jan 24

Thanks for your work! I'll take a look this week :)

adamo1139

Jan 24

@clefourrier Can you please also flag the models that I listed earlier in this discussion? It seems like all of them are still unflagged, and I am holding off on the next round until those are flagged.

adamo1139

Jan 24

Also, I noticed a new (and predictable) technique for avoiding detection: people who post models that are very likely contaminated or merged have simply stopped disclosing any real details about the process or fine-tune.
Is there a way to require some baseline amount of information disclosure from them?

Examples here:
PetroGPT/WestSeverus-7B-DPO
senseable/WestLake-7B-v2

All of PetroGPT's models just have a default model card with no information about the model, which is practically equal to having no model card. Senseable has a few merged and contaminated models on their user profile, and the WestLake v2 model card itself has no concrete information about the dataset used; all questions about it are dismissed because it is "proprietary" or "novel", without any specifics. All of it looks like another contaminated model in hiding. If withholding basic information is allowed, the leaderboard will be unreliable again in no time.

clefourrier

Hugging Face H4 org Jan 25

Hi!
Thanks to you both for your work!
I just added the flags for the relevant reported models.
Closing this discussion wrt flagging specifically :)

clefourrier changed discussion status to closed Jan 25

clefourrier

Hugging Face H4 org Jan 25

edited Jan 25

@adamo1139 It's a very good point - we already added a filter on the length of the model card, as well as on the presence or absence of some metadata, but we haven't found a good way to constrain the submissions more.
I think the main problem wrt all this is that big model creators are sharing less and less information about their models, so people in the community feel like they shouldn't have to either. But we need strict metadata efforts if we want the field to go in the right direction, with clear and comparable models. I don't have a solution atm sadly.
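As an illustration of what a filter on model card length and metadata presence could look like, here is a minimal sketch. It is not the leaderboard's actual check: the function name `check_card`, the 200-character minimum, and the choice of `license` as the required key are all assumptions made for the example.

```python
def check_card(card_text, min_body_chars=200, required_keys=("license",)):
    """Return a list of problems found in a model card (empty list = passes)."""
    lines = card_text.split("\n")
    keys, body = set(), card_text
    if lines[0].strip() == "---":
        end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if end is not None:
            for line in lines[1:end]:
                # Top-level keys are unindented 'key: value' lines, not list items.
                if line and not line[0].isspace() and not line.startswith("-") and ":" in line:
                    keys.add(line.split(":", 1)[0].strip())
            body = "\n".join(lines[end + 1:])
    problems = [f"missing metadata key: {key}" for key in required_keys if key not in keys]
    body_length = len(body.strip())
    if body_length < min_body_chars:
        problems.append(f"model card body is too short ({body_length} < {min_body_chars} characters)")
    return problems


print(check_card("# Model\n"))
```

A check like this only catches empty or default cards; it cannot tell whether the description that is present is truthful, which is the harder problem raised above.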

MichaelBarryUK

Jan 28

I understand why merged models should be tagged, but can someone explain to me why merged models are hidden by default? It feels like merges are something to be ashamed of. Am I missing something?

adamo1139

Jan 28

@MichaelBarryUK Merging was IMO abused to spam the leaderboard with merges of merges of merges, taking up many spots and basically littering it.
https://old.reddit.com/r/LocalLLaMA/comments/18xbevs/open_llm_leaderboard_is_disgusting/
Hiding merges by default is a good way of discouraging this sort of thing, since I don't think HF has enough resources to moderate leaderboard submissions manually.

clefourrier

Hugging Face H4 org Jan 29

Hi @MichaelBarryUK ,
Good question! It's 1) as a response to abuse, as @adamo1139 pointed out, but also 2) because most people coming to the leaderboard to look for the best models to use or fine-tune are not interested in merges, which usually have hard lineages to track down (in terms of data and licensing).

clefourrier

Hugging Face H4 org Jan 29

And yep, 100% agreed with @adamo1139, we definitely can't moderate everything manually.

MichaelBarryUK

Jan 29

Thank you for clarifying, I'm only just starting to learn about merges. These models that spammed the leaderboard, were they actually any good in terms of capability? I'm wondering whether the source of the problem was technical or legal/ethical. Thanks again

MichaelBarryUK

Jan 29

Sorry, I jumped the gun with that last question. I just read the reddit page and I understand now that they were overfit, which answers my own question 😂

MichaelBarryUK

Jan 29

Also, well done for fixing this issue, I certainly wouldn't want to have to trawl through all that to find a decent model. That merge model family tree space is incredible. Kudos to whoever built that

euclaise

Jan 29

@MichaelBarryUK wrote: "Thank you for clarifying, I'm only just starting to learn about merges. These models that spammed the leaderboard, were they actually any good in terms of capability? I'm wondering whether the source of the problem was technical or legal/ethical. Thanks again"

They are often good but don't have any advances in terms of dataset or training method - so looking for the best model won't give you the model with the best training methodology, it'll instead give you a merge of several other models, which may be merges themselves, which would make the leaderboard less useful.

davzoku

Feb 8

Hi,

Can I check whether my model is incorrectly flagged?
https://huggingface.co/davzoku/cria-llama2-7b-v1.3

it is just a llama-2-7b-chat-hf finetuned with qlora.

Thank you