---
title: PoCLeaderboard
emoji: ๐
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: mit
short_description: Example Leaderboard
---

This Space provides an interactive leaderboard for comparing language model performance across standard benchmarks and custom tasks.

## Features

- Automated model evaluation using lm-evaluation-harness
- Support for standard and custom benchmarks
- Interactive visualization of results
- Daily automated evaluations
- Easy submission of new models and custom tasks

## Usage

1. Visit the Space to view the current leaderboard
2. Submit new models for evaluation
3. Create custom evaluation tasks
4. Track performance trends over time

## Custom Task Format

```json
{
  "examples": [
    {
      "input": "question or prompt",
      "ideal": "expected answer",
      "metrics": ["accuracy", "f1"]
    }
  ]
}
```
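
Before submitting a custom task, it can help to check the file locally. The sketch below is not part of this Space's code. It validates a task against the format above and shows one plausible way the two listed metrics could be scored. The names `validate_task`, `exact_match`, and `token_f1` are hypothetical, and the metric definitions (case-insensitive exact match for `accuracy`, whitespace-token overlap for `f1`) are assumptions. The Space's actual scoring may differ.

```python
import json
from collections import Counter

# Assumption: only the two metrics shown in the format example are accepted.
ALLOWED_METRICS = {"accuracy", "f1"}


def validate_task(task) -> list[str]:
    """Return a list of problems found; an empty list means the task looks valid."""
    if not isinstance(task, dict):
        return ["task must be a JSON object"]
    examples = task.get("examples")
    if not isinstance(examples, list) or not examples:
        return ["'examples' must be a non-empty list"]

    problems = []
    for i, ex in enumerate(examples):
        if not isinstance(ex, dict):
            problems.append(f"examples[{i}] must be an object")
            continue
        # "input" and "ideal" are both required, non-empty strings.
        for key in ("input", "ideal"):
            value = ex.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"examples[{i}].{key} must be a non-empty string")
        metrics = ex.get("metrics")
        if not isinstance(metrics, list) or not metrics:
            problems.append(f"examples[{i}].metrics must be a non-empty list")
        else:
            unknown = [m for m in metrics
                       if not isinstance(m, str) or m not in ALLOWED_METRICS]
            if unknown:
                problems.append(f"examples[{i}].metrics has unknown entries: {unknown}")
    return problems


def exact_match(prediction: str, ideal: str) -> float:
    """Assumed 'accuracy': 1.0 if the strings match ignoring case and outer whitespace."""
    return float(prediction.strip().lower() == ideal.strip().lower())


def token_f1(prediction: str, ideal: str) -> float:
    """Assumed 'f1': harmonic mean of precision and recall over whitespace tokens."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ideal.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    raw = """
    {"examples": [
        {"input": "question or prompt",
         "ideal": "expected answer",
         "metrics": ["accuracy", "f1"]}
    ]}
    """
    task = json.loads(raw)
    issues = validate_task(task)
    print("valid" if not issues else "\n".join(issues))
```

Returning a list of problems rather than raising on the first one lets a submitter fix every issue in a task file in a single pass.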