metadata
title: PoCLeaderboard
emoji: 🏆
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: mit
short_description: Example Leaderboard
This Space provides an interactive leaderboard for comparing language model performance across various benchmarks and custom tasks.
Features
- Automated model evaluation using lm-evaluation-harness
- Support for standard and custom benchmarks
- Interactive visualization of results
- Daily automated evaluations
- Easy submission of new models and custom tasks
Usage
- Visit the Space to view the current leaderboard
- Submit new models for evaluation
- Create custom evaluation tasks
- Track performance trends over time
Custom Task Format
{
  "examples": [
    {
      "input": "question or prompt",
      "ideal": "expected answer",
      "metrics": ["accuracy", "f1"]
    }
  ]
}
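Before submitting a custom task, it can help to check the file against the format above. The following is a minimal sketch, not the Space's own validation code. The function name `validate_task`, the `KNOWN_METRICS` set, and the sample question are hypothetical. The sketch assumes that all three fields are required and that `accuracy` and `f1` are the only accepted metrics, since those are the only ones the example shows.

```python
import json

# Assumption: only the metric names shown in the format example are allowed.
# The actual Space may accept others.
KNOWN_METRICS = {"accuracy", "f1"}


def validate_task(task):
    """Return a list of problems found in a parsed task; empty means valid."""
    examples = task.get("examples") if isinstance(task, dict) else None
    if not isinstance(examples, list) or not examples:
        return ['top level must be an object with a non-empty "examples" list']

    problems = []
    for i, ex in enumerate(examples):
        if not isinstance(ex, dict):
            problems.append(f"examples[{i}]: must be an object")
            continue
        # "input" and "ideal" are assumed to be required, non-empty strings.
        for key in ("input", "ideal"):
            value = ex.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f'examples[{i}]: "{key}" must be a non-empty string')
        metrics = ex.get("metrics")
        if not isinstance(metrics, list) or not metrics:
            problems.append(f'examples[{i}]: "metrics" must be a non-empty list')
        else:
            for m in metrics:
                if not isinstance(m, str) or m not in KNOWN_METRICS:
                    problems.append(f"examples[{i}]: unknown metric {m!r}")
    return problems


# Hypothetical task used only for illustration.
task = json.loads("""
{
  "examples": [
    {"input": "What is 2 + 2?", "ideal": "4", "metrics": ["accuracy", "f1"]}
  ]
}
""")
print(validate_task(task))
```

Returning a list of problems rather than raising on the first one lets a submitter fix every issue in a single pass.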