#38 · Add eval scores to models, similar to the Spaces section · opened about 9 hours ago by SebastianSchramm
#37 · Falcon 7B model run video & code · opened about 17 hours ago by decodingdatascience
#36 · Include number of likes in the leaderboard · opened about 17 hours ago by osanseviero
#35 · What about inference speed? · opened 1 day ago by Pitchboy
#34 · Model not found · 1 reply · opened 2 days ago by mrm8488
#33 · Please include sambanovasystems/BLOOMChat-176B-v1 · opened 3 days ago by gsaivinay
#30 · Scores of GPT-3.5 and GPT-4 for comparison · 3 replies · opened 4 days ago by gsaivinay
#29 · Possibly include multilingual benchmarks like C-Eval and XCOPA · 1 reply · opened 5 days ago by yaofu
#28 · The system message and prompt format for TruthfulQA results for Vicuna 13B · 2 replies · opened 6 days ago by hamidpalangi
#27 · Leaderboard is of very limited use without more 0-shot, instruction-prompted datasets · opened 7 days ago by JulesGM
#26 · Why is MMLU so much lower than the results reported in some papers, like LLaMA 65B? · 5 replies · opened 8 days ago by lumosity
#25 · Interesting stats · 6 replies · opened 9 days ago by BBLL3456
#24 · Possibility to include benchmarks in other languages · opened 9 days ago by avacaondata
#23 · Request for access to raw dataset · opened 10 days ago by gsaivinay
#22 · What if a model was trained on one of the evaluation datasets? · 2 replies · opened 10 days ago by nbroad
#21 · [Feature request] Search bar / regex for models · opened 11 days ago by natolambert
#20 · How do you set the task name for MMLU (5-shot) in LMEH? · 2 replies · opened 16 days ago by Linbo
#17 · Stuck on 4-bit? · 11 replies · opened 17 days ago by xzuyn
#16 · Progress is too slow · 1 reply · opened 17 days ago by felixz
#15 · Multilingual scoreboard? · 2 replies · opened 18 days ago by aari1995
#14 · [Bug] Average treats empty columns as 0 · 3 replies · opened 20 days ago by natolambert
#13 · Column request(s): VRAM usage, avg speed per inference · 1 reply · opened 20 days ago by russ771
#12 · Add license/commercial use column? · 3 replies · opened 21 days ago by fireforge
#11 · Adding more models · 2 replies · opened 21 days ago by abidlabs
#10 · Display when the evaluation was run · 1 reply · opened 21 days ago by osanseviero
#9 · [Feature] Add code evals section · opened 21 days ago by natolambert
#7 · Inclusion of non-open LLMs for straightforward comparison · 1 reply · opened 21 days ago by Supreeth
#6 · Show leaderboard position column · opened 21 days ago by tomaarsen
#5 · Evaluation for fictional writing models · 3 replies · opened 21 days ago by Henk717
#4 · Convert branch revision into commit revision · 1 reply · opened 21 days ago by tomaarsen