---
title: TRAIL Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: mit
short_description: Trace Reasoning and Agentic Issue Localization Leaderboard
sdk_version: 5.19.0
tags:
- leaderboard
---
# Model Performance Leaderboard
This Hugging Face Space hosts a leaderboard that compares model performance across the metrics of the TRAIL dataset.
## Features
- **Submit Your Answers**: Run your model on the TRAIL dataset and submit your results.
- **Leaderboard**: View how your submissions rank.
## Instructions
* Please refer to our GitHub repository at https://github.com/patronus-ai/trail-benchmark for step‑by‑step instructions on how to run your model with the TRAIL dataset.
* Please upload a zip file containing your model outputs. The zip file should be structured as follows:
  - It contains one or more directories with model outputs
  - Each directory contains JSON files with the model's predictions
  - Each directory name indicates the split (`GAIA_` or `SWE_`)
* Once the evaluation is complete, we’ll upload the scores (this process will soon be automated).
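
The zip layout described above can be produced with a short script. The following is a minimal sketch, not the official packaging tool: it assumes the split is signalled by a `GAIA_` or `SWE_` prefix on the directory name, and the function name `build_submission_zip` and the example paths are hypothetical.

```python
import zipfile
from pathlib import Path

# Assumption: a directory indicates its split through a name prefix.
SPLIT_PREFIXES = ("GAIA_", "SWE_")


def build_submission_zip(output_root, zip_path):
    """Zip the JSON files of every split directory under output_root.

    Returns the archive-relative names that were written.
    """
    output_root = Path(output_root)

    # Collect files first so no empty archive is created on failure.
    json_files = []
    for split_dir in sorted(p for p in output_root.iterdir() if p.is_dir()):
        if not split_dir.name.startswith(SPLIT_PREFIXES):
            continue  # skip directories that do not name a split
        json_files.extend(sorted(split_dir.glob("*.json")))

    if not json_files:
        raise ValueError("no JSON files found in GAIA_* or SWE_* directories")

    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for json_file in json_files:
            # Keep the split directory as the top-level folder in the zip.
            arcname = json_file.relative_to(output_root).as_posix()
            zf.write(json_file, arcname)
            archived.append(arcname)
    return archived
```

For example, `build_submission_zip("outputs", "submission.zip")` would package hypothetical directories such as `outputs/GAIA_my_model/` and `outputs/SWE_my_model/`, ignoring any non-JSON files and any directory without a split prefix.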
## Benchmarking on TRAIL
[TRAIL (Trace Reasoning and Agentic Issue Localization)](https://arxiv.org/abs/2505.08638) is a benchmark dataset of 148 annotated AI agent execution traces containing 841 errors across reasoning, execution, and planning categories. The traces come from real-world software engineering and information retrieval tasks. TRAIL challenges even state-of-the-art LLMs: the best model achieves only 11% accuracy, which highlights the difficulty of trace debugging for complex agent workflows.
## License
This project is open source and available under the MIT license.