Spaces: open-llm-leaderboard/open_llm_leaderboard



💬 Discussion thread: Model contamination techniques 💬

#472 opened 6 months ago by

Future feature: system prompt and chat support

#459 opened 6 months ago by

💬 Discussion thread: Model scores and model performances 💬

#265 opened 9 months ago by

💎 Resources and community initiatives around the Leaderboard! 💎

#174 opened 10 months ago by

Model "Amu/dpo-qlora-Qwen1.5-0.5B-Chat-xtuner" was not found or misconfigured on the hub!

#772 opened about 13 hours ago by Amu

Model "X" was not found or misconfigured on the hub!

#771 opened 1 day ago by

GSM8K evaluation has a serious bug/oversight that is negatively impacting the scores of all Llama 3 models. Please consider updating to the latest commit of lm-evaluation-harness, which fixes it.

#770 opened 1 day ago by

models submitted for eval are finished, but do not appear on the leaderboard

#768 opened 2 days ago by

FLAG: saltlux/luxia-21.4b-alignment-v1.2 GSM8K v1.0 to v1.2: 29% GSM test contamination

#767 opened 2 days ago by

loading_from_contents

#766 opened 4 days ago by

gradientai/Llama-3-70B-Instruct-Gradient-1048k FAILED, can I investigate the logs?

#765 opened 4 days ago by leo-pekelis-gradient

Feature request: Add toggle to only show models with linked dataset

#763 opened 5 days ago by

Feature request: Hide models with insufficient model card from default view in leaderboard

#762 opened 5 days ago by

Discussion: naming pattern to converge on to better identify fine-tunes

#761 opened 5 days ago by

Model disappeared from app (I can't find the dataset related to the model either)

#759 opened 6 days ago by kamilmuratyilmaz

reclassify some ORPO models as chat 💬

#758 opened 7 days ago by

Models that used Nectar dataset

#749 opened 14 days ago by

TRI-ML/mamba-7b-rw failed

#704 opened about 1 month ago by

GPTQ and Mixtral models will need to be relaunched

#692 opened about 1 month ago by

ALL Jamba models failing

#690 opened about 1 month ago by

No good way to identify the number of activated parameters causes Mixtral evaluation failures

#680 opened about 2 months ago by

Crowdsource hardware for the leaderboard?

#570 opened 4 months ago by

Eval models for data contamination?

#561 opened 4 months ago by

Feature request: Run 100B+ models automatically

#434 opened 6 months ago by

Feature Request for Leaderboard: date added to hub

#425 opened 6 months ago by

Feature request: Using weights hash to identify duplicates

#422 opened 6 months ago by

Feature request: Add non-AutoModelForCausalLM models

#391 opened 6 months ago by KnutJaegersberg

Tool: Adding evaluation results to model cards

#370 opened 7 months ago by

Feature suggestion: average of selected (rather than all) columns

#368 opened 7 months ago by

Tool: Open LLM Leaderboard Model Renamer

#310 opened 8 months ago by

Checking for toxicity too

#53 opened 12 months ago by ronald-d-rogers