Models with multiple submissions.

#322
by xzuyn - opened

Some models are being submitted multiple times.

These are just the 7B models I saw with two or more submissions (a sketch for spotting duplicates programmatically follows the list):

ehartford/dolphin-2.1-mistral-7b

Open-Orca/Mistral-7B-SlimOrca

TheBloke/Llama-2-7B-GPTQ

TheTravellingEngineer/llama2-7b-chat-hf-v2

TheTravellingEngineer/llama2-7b-chat-hf-v3

TheTravellingEngineer/llama2-7b-chat-hf-v4

codellama/CodeLlama-7b-Instruct-hf

codellama/CodeLlama-7b-Python-hf

garage-bAInd/Platypus2-7B

kfkas/Llama-2-ko-7b-Chat

kittn/mistral-7B-v0.1-hf

lmsys/vicuna-7b-v1.5

lmsys/vicuna-7b-v1.5-16k

meta-llama/Llama-2-7b-hf

mosaicml/mpt-7b

mosaicml/mpt-7b-8k-chat

mosaicml/mpt-7b-8k-instruct

mosaicml/mpt-7b-storywriter

PocketDoc/Dans-TotSirocco-7b

tiiuae/falcon-7b-instruct

togethercomputer/LLaMA-2-7B-32K

togethercomputer/Llama-2-7B-32K-Instruct

wenge-research/yayi-7b-llama2
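A minimal sketch for reproducing a list like the one above, assuming the leaderboard table has been exported to a hypothetical `leaderboard.csv` with `model` and `precision` columns (the file name and column names are assumptions, not the leaderboard's actual export format):

```python
# Sketch: spot models that appear two or more times in an exported
# leaderboard table. "leaderboard.csv" and its "model"/"precision"
# columns are assumptions for illustration.
import csv
from collections import defaultdict

by_model = defaultdict(list)
with open("leaderboard.csv", newline="") as f:
    for row in csv.DictReader(f):
        by_model[row["model"]].append(row["precision"])

for model, precisions in sorted(by_model.items()):
    if len(precisions) >= 2:  # two or more submissions of the same model
        print(f"{model}: {len(precisions)} entries ({', '.join(precisions)})")
```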

Open LLM Leaderboard org

Hi!
Are these models the same at the precision and commit level?

They are different precisions and categories, but the evals seem to be within the margin of error, so it's basically the same result listed multiple times.

Example: [screenshot, 2023-10-12: the same model listed twice with near-identical scores]
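To make "within the margin of error" concrete, a sketch of the comparison being described, using the four benchmark columns the leaderboard had at the time; the 0.5-point tolerance and the sample scores are made-up assumptions for illustration:

```python
# Sketch: decide whether two duplicate entries are effectively the same
# result. The tolerance and the scores below are illustrative assumptions.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]

def effectively_equal(a: dict, b: dict, tol: float = 0.5) -> bool:
    """True if every benchmark score differs by at most `tol` points."""
    return all(abs(a[k] - b[k]) <= tol for k in BENCHMARKS)

float16_run = {"ARC": 63.3, "HellaSwag": 84.9, "MMLU": 63.3, "TruthfulQA": 55.1}
bfloat16_run = {"ARC": 63.4, "HellaSwag": 84.9, "MMLU": 63.2, "TruthfulQA": 55.0}
print(effectively_equal(float16_run, bfloat16_run))  # True: same result twice
```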

I don't know how this should be dealt with, though; I just thought I'd bring it up.

Open LLM Leaderboard org

The way we are dealing with it is by having filters on precision. If, however, the same model has two different categories (like the model you just showed), this is a mistake in the request file.
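To illustrate the request-file mistake being described (just a sketch; the entry structure here is an assumption, not the leaderboard's actual schema), flagging a model whose request entries disagree on category might look like:

```python
# Sketch: flag models whose request entries disagree on category, i.e. the
# "mistake in the request file" case. Entry structure is hypothetical.
from collections import defaultdict

requests = [
    {"model": "example/model-7b", "precision": "float16", "category": "fine-tuned"},
    {"model": "example/model-7b", "precision": "bfloat16", "category": "pretrained"},
]

categories = defaultdict(set)
for req in requests:
    categories[req["model"]].add(req["category"])

for model, cats in categories.items():
    if len(cats) > 1:
        print(f"Inconsistent categories for {model}: {sorted(cats)}")
```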

There's probably not going to be any useful difference between float16 and bfloat16 evals though.

Also, filtering by precision doesn't exactly solve this, since you don't get to compare all models: some may be submitted only as float16, bfloat16, or 8bit, so either you filter to one of those and lose some models, or you don't filter and see duplicates.
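One client-side workaround for this dilemma (a sketch only, not something the leaderboard does): collapse duplicates by keeping a single row per model, chosen by a precision preference order, so no model disappears and none shows up twice.

```python
# Sketch: keep one row per model, preferring float16, then bfloat16, then
# 8bit. The preference order and row structure are assumptions.
PREFERENCE = {"float16": 0, "bfloat16": 1, "8bit": 2}

def collapse_duplicates(rows: list[dict]) -> list[dict]:
    best: dict[str, dict] = {}
    for row in rows:
        rank = PREFERENCE.get(row["precision"], len(PREFERENCE))
        kept = best.get(row["model"])
        if kept is None or rank < PREFERENCE.get(kept["precision"], len(PREFERENCE)):
            best[row["model"]] = row
    return list(best.values())

rows = [
    {"model": "example/model-7b", "precision": "bfloat16", "average": 55.1},
    {"model": "example/model-7b", "precision": "float16", "average": 55.0},
    {"model": "example/other-7b", "precision": "8bit", "average": 48.2},
]
print(collapse_duplicates(rows))  # one row per model; float16 kept for the duplicate
```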

Another example of no noticeable difference: [screenshot, 2023-10-12: another duplicated entry with matching scores]

Open LLM Leaderboard org

@xzuyn We don't plan on changing this mechanism. We understand that it brings a bit of redundancy between the bfloat16 and float16 models, but since you can hide the quantized models from a given search, it should still allow people to compare models quickly. Thank you for your interest in the leaderboard!
Closing as it is not an issue but a feature.

clefourrier changed discussion status to closed
