metadata
title: PoCLeaderboard
emoji: 🏆
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: mit
short_description: Example Leaderboard

This Space provides an interactive leaderboard for comparing language model performance across various benchmarks and custom tasks.

Features

  • Automated model evaluation using lm-evaluation-harness
  • Support for standard and custom benchmarks
  • Interactive visualization of results
  • Daily automated evaluations
  • Easy submission of new models and custom tasks

Usage

  1. Visit the Space to view the current leaderboard
  2. Submit new models for evaluation
  3. Create custom evaluation tasks
  4. Track performance trends over time

Custom Task Format

{
  "examples": [
    {
      "input": "question or prompt",
      "ideal": "expected answer",
      "metrics": ["accuracy", "f1"]
    }
  ]
}
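
To make the format above concrete, here is a minimal sketch of how a task file might be checked and scored. It is not the Space's actual implementation. The helper names (`load_task`, `accuracy`, `token_f1`, `score_example`) are hypothetical, and the metric definitions are assumptions, since the README does not define them: `accuracy` is treated as a case-insensitive exact match, and `f1` as token-overlap F1 between the model output and `ideal`.

```python
import json
from collections import Counter

# Assumption: only the two metric names shown in the README are accepted.
ALLOWED_METRICS = ("accuracy", "f1")


def load_task(text: str) -> list:
    """Parse a custom task document and return its validated examples."""
    task = json.loads(text)
    if not isinstance(task, dict):
        raise ValueError("task must be a JSON object")
    examples = task.get("examples")
    if not isinstance(examples, list) or not examples:
        raise ValueError("'examples' must be a non-empty list")
    for i, ex in enumerate(examples):
        if not isinstance(ex, dict):
            raise ValueError(f"example {i}: must be a JSON object")
        for key in ("input", "ideal"):
            if not isinstance(ex.get(key), str):
                raise ValueError(f"example {i}: '{key}' must be a string")
        metrics = ex.get("metrics")
        if not isinstance(metrics, list) or not metrics:
            raise ValueError(f"example {i}: 'metrics' must be a non-empty list")
        unknown = [m for m in metrics if m not in ALLOWED_METRICS]
        if unknown:
            raise ValueError(f"example {i}: unknown metrics {unknown}")
    return examples


def accuracy(prediction: str, ideal: str) -> float:
    """Assumed definition: case-insensitive exact match, 1.0 or 0.0."""
    return float(prediction.strip().lower() == ideal.strip().lower())


def token_f1(prediction: str, ideal: str) -> float:
    """Assumed definition: F1 over whitespace-separated, lowercased tokens."""
    pred = prediction.lower().split()
    gold = ideal.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both sequences.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_example(example: dict, prediction: str) -> dict:
    """Score one model output with the metrics the example asks for."""
    scorers = {"accuracy": accuracy, "f1": token_f1}
    return {name: scorers[name](prediction, example["ideal"])
            for name in example["metrics"]}
```

For example, `score_example(load_task(text)[0], model_output)` returns a dictionary with one score per metric listed in that example, and `load_task` raises `ValueError` before any evaluation runs if a required field is missing or a metric name is not recognized.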