#38 · Add eval scores to models, similar to the Spaces section · opened about 9 hours ago by SebastianSchramm
#37 · Falcon 7B model run video & code · opened about 17 hours ago by decodingdatascience
#36 · Include number of likes in the leaderboard · opened about 17 hours ago by osanseviero
#35 · What about inference speed? · opened 1 day ago by Pitchboy
#34 · Model not found · 1 reply · opened 2 days ago by mrm8488
#33 · Please include sambanovasystems/BLOOMChat-176B-v1 · opened 3 days ago by gsaivinay
#30 · Scores of GPT-3.5 and GPT-4 for comparison · 3 replies · opened 4 days ago by gsaivinay
#29 · Possibly include multilingual benchmarks like C-Eval and XCOPA · 1 reply · opened 5 days ago by yaofu
#28 · The system message and prompt format for TruthfulQA results for Vicuna 13B · 2 replies · opened 6 days ago by hamidpalangi
#27 · Leaderboard is of very limited use without more 0-shot, instruction-prompted datasets · opened 7 days ago by JulesGM
#26 · Why is MMLU so much lower than the results reported in some papers, like LLaMA 65B? · 5 replies · opened 8 days ago by lumosity
#25 · Interesting stats · 6 replies · opened 9 days ago by BBLL3456
#24 · Possibility to include benchmarks in other languages · opened 9 days ago by avacaondata
#23 · Request for access to raw dataset · opened 10 days ago by gsaivinay
#22 · What if a model was trained on one of the evaluation datasets? · 2 replies · opened 10 days ago by nbroad
#21 · [Feature request] Search bar / regex for models · opened 11 days ago by natolambert
#20 · How do you set the task name for MMLU (5-shot) in LMEH? · 2 replies · opened 16 days ago by Linbo
#17 · Stuck on 4-bit? · 11 replies · opened 17 days ago by xzuyn
#16 · Progress is too slow · 1 reply · opened 17 days ago by felixz
#15 · Multilingual scoreboard? · 2 replies · opened 18 days ago by aari1995
#14 · [Bug] Average treats empty columns as 0 · 3 replies · opened 20 days ago by natolambert
#13 · Column request(s): VRAM usage, avg speed per inference · 1 reply · opened 20 days ago by russ771
#12 · Add license/commercial use column? · 3 replies · opened 21 days ago by fireforge
#11 · Adding more models · 2 replies · opened 21 days ago by abidlabs
#10 · Display when the evaluation was run · 1 reply · opened 21 days ago by osanseviero
#9 · [Feature] Add code evals section · opened 21 days ago by natolambert
#7 · Inclusion of non-open LLMs for straightforward comparison · 1 reply · opened 21 days ago by Supreeth
#6 · Show leaderboard position column · opened 21 days ago by tomaarsen
#5 · Evaluation for fictional writing models · 3 replies · opened 21 days ago by Henk717
#4 · Convert branch revision into commit revision · 1 reply · opened 21 days ago by tomaarsen