
title: PoCLeaderboard
emoji: 🏆
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: mit
short_description: Example Leaderboard

This Space provides an interactive leaderboard for comparing language model performance across various benchmarks and custom tasks.

Features

Automated model evaluation using lm-evaluation-harness
Support for standard and custom benchmarks
Interactive visualization of results
Daily automated evaluations
Easy submission of new models and custom tasks
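
As a rough illustration of how per-benchmark results could be turned into a leaderboard ordering, here is a minimal sketch. It assumes each model has one score per benchmark and that models are ranked by their unweighted mean; the Space's actual aggregation logic is not described in this README, and the function name `rank_models`, the model names, and the scores are all hypothetical.

```python
from statistics import mean


def rank_models(results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank models by mean score across benchmarks, highest first.

    Assumption: an unweighted mean is used; models with no scores are skipped.
    """
    averages = {
        model: mean(scores.values())
        for model, scores in results.items()
        if scores
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores, for illustration only.
results = {
    "model-a": {"arc_easy": 0.70, "hellaswag": 0.50},
    "model-b": {"arc_easy": 0.60, "hellaswag": 0.80},
}

for rank, (model, avg) in enumerate(rank_models(results), start=1):
    print(f"{rank}. {model}: {avg:.3f}")
```

A plain mean treats every benchmark as equally important; a weighted scheme would be a reasonable alternative if some tasks matter more than others.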

Usage

Visit the Space to view the current leaderboard
Submit new models for evaluation
Create custom evaluation tasks
Track performance trends over time

Custom Task Format

{
  "examples": [
    {
      "input": "question or prompt",
      "ideal": "expected answer",
      "metrics": ["accuracy", "f1"]
    }
  ]
}
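
The format above can be checked before submission with a small script. The following is a minimal sketch, not the Space's own validator: it assumes that `input` and `ideal` are strings, that `metrics` only contains the two names shown in the example, that `accuracy` means case-insensitive exact match, and that `f1` means token-overlap F1. The helper names (`validate_task`, `exact_match`, `token_f1`, `score_example`) are hypothetical.

```python
import json
from collections import Counter

# Assumption: only the two metric names shown in the example are supported.
ALLOWED_METRICS = {"accuracy", "f1"}


def validate_task(task: dict) -> list[dict]:
    """Return the list of examples, raising ValueError on a malformed task."""
    examples = task.get("examples")
    if not isinstance(examples, list) or not examples:
        raise ValueError("'examples' must be a non-empty list")
    for i, ex in enumerate(examples):
        if not isinstance(ex, dict):
            raise ValueError(f"example {i}: must be an object")
        for key in ("input", "ideal"):
            if not isinstance(ex.get(key), str):
                raise ValueError(f"example {i}: '{key}' must be a string")
        metrics = ex.get("metrics")
        if not isinstance(metrics, list) or not metrics:
            raise ValueError(f"example {i}: 'metrics' must be a non-empty list")
        unknown = [m for m in metrics if m not in ALLOWED_METRICS]
        if unknown:
            raise ValueError(f"example {i}: unknown metrics {unknown}")
    return examples


def exact_match(prediction: str, ideal: str) -> float:
    """Accuracy for one example: 1.0 on a case-insensitive exact match."""
    return float(prediction.strip().lower() == ideal.strip().lower())


def token_f1(prediction: str, ideal: str) -> float:
    """Token-overlap F1 between whitespace-split prediction and ideal."""
    pred, gold = prediction.lower().split(), ideal.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_example(example: dict, prediction: str) -> dict[str, float]:
    """Compute every metric the example requests for one model prediction."""
    scorers = {"accuracy": exact_match, "f1": token_f1}
    return {name: scorers[name](prediction, example["ideal"])
            for name in example["metrics"]}


task = json.loads("""
{
  "examples": [
    {"input": "What is the capital of France?",
     "ideal": "Paris is the capital",
     "metrics": ["accuracy", "f1"]}
  ]
}
""")
examples = validate_task(task)
scores = score_example(examples[0], "Paris")
print(scores)
```

Listing metrics per example, as the format does, lets a single task file mix questions that need exact answers with questions where partial overlap is acceptable.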