---
title: TRAIL Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: mit
short_description: Trace Reasoning and Agentic Issue Localization Leaderboard
sdk_version: 5.19.0
tags:
- leaderboard
---
# Model Performance Leaderboard

This Hugging Face Space hosts a leaderboard for comparing model performance across the evaluation metrics of the TRAIL dataset.
## Features

- **Submit Your Answers**: Run your model on the TRAIL dataset and submit your results.
- **Leaderboard**: View how your submissions are ranked.
## Instructions

* Please refer to our GitHub repository at https://github.com/patronus-ai/trail-benchmark for step-by-step instructions on running your model on the TRAIL dataset.
* Please upload a zip file containing your model outputs (see the packaging sketch after this list). The zip file should contain:
  - One or more directories with model outputs
  - Each directory should contain JSON files with the model's predictions
  - Directory names should indicate the split (GAIA_ or SWE_)
* Once the evaluation is complete, we'll upload the scores (this process will soon be automated).
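
As a rough illustration, a submission archive could be assembled with a short script like the one below. This is a minimal sketch, not an official tool: the `model_outputs` root directory, the `package_outputs` helper name, and the per-file JSON validation step are assumptions for illustration; only the `GAIA_`/`SWE_` directory-name convention and the JSON-files-per-directory layout come from the instructions above.

```python
# Sketch: package TRAIL model outputs into a submission zip.
# Assumed layout (illustrative): model_outputs/GAIA_myrun/*.json, model_outputs/SWE_myrun/*.json
import json
import zipfile
from pathlib import Path


def package_outputs(output_root: str, zip_path: str) -> None:
    """Zip every GAIA_*/SWE_* directory of JSON predictions under output_root."""
    root = Path(output_root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for split_dir in root.iterdir():
            # Directory names must indicate the split via a GAIA_ or SWE_ prefix.
            if not split_dir.is_dir() or not split_dir.name.startswith(("GAIA_", "SWE_")):
                continue
            for json_file in sorted(split_dir.glob("*.json")):
                json.loads(json_file.read_text())  # fail fast on malformed JSON
                zf.write(json_file, arcname=str(json_file.relative_to(root)))


if __name__ == "__main__":
    package_outputs("model_outputs", "submission.zip")
```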
## Benchmarking on TRAIL

[TRAIL (Trace Reasoning and Agentic Issue Localization)](https://arxiv.org/abs/2505.08638) is a benchmark dataset of 148 annotated AI agent execution traces containing 841 errors across reasoning, execution, and planning categories. Built from real-world software engineering and information retrieval tasks, it challenges even state-of-the-art LLMs: the best-performing model achieves only 11% accuracy, highlighting the difficulty of trace debugging for complex agent workflows.
## License

This project is open source and available under the MIT license.