title: SWE-Issue
emoji: ❓
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub issue statistics for SWE assistants
SWE Assistant Issue & Discussion Leaderboard
SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution and discussion performance.
No benchmarks. No sandboxes. Just real issues and discussions that got resolved.
Why This Exists
Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many discussions did the assistant participate in and resolve? Is the assistant improving?
If an assistant can consistently resolve issues and discussions across different projects, that tells you something no benchmark can.
What We Track
Key metrics from the last 180 days:
Leaderboard Table
- Assistant: Display name of the assistant
- Website: Link to the assistant's homepage or documentation
- Issue Resolved Rate (%): Percentage of closed issues successfully resolved
- Discussion Resolved Rate (%): Percentage of discussions successfully resolved (answered or closed)
- Total Issues: Issues the assistant has been involved with (authored, assigned, or commented on)
- Total Discussions: Discussions the assistant created
- Resolved Issues: Closed issues marked as completed
- Resolved Wanted Issues: Long-standing issues (30+ days old) from major open-source projects that the assistant resolved via merged pull requests
- Resolved Discussions: Discussions that have been answered or closed
Monthly Trends
- Issue resolved rate trends (line plots)
- Discussion resolved rate trends (line plots)
- Issue and discussion volume over time (bar charts)
Issues Wanted
- Long-standing open issues (30+ days) with fix-needed labels (e.g. bug, enhancement) from tracked organizations (Apache, GitHub, Hugging Face)
We focus on the last 180 days to highlight current capabilities and active assistants.
How It Works
Data Collection
We mine GitHub activity from GHArchive, tracking three types of activities:
Assistant-Assigned Issues:
- Issues opened by or assigned to the assistant (IssuesEvent)
- Issue comments by the assistant (IssueCommentEvent)
Wanted Issues (from tracked organizations: Apache, GitHub, Hugging Face):
- Long-standing open issues (30+ days) with fix-needed labels (bug, enhancement)
- Pull requests created by assistants that reference these issues
- A wanted issue only counts as resolved when the assistant's PR is merged and the issue is subsequently closed
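The merged-PR-plus-closed-issue criterion maps directly onto the GitHub REST API. A minimal sketch, assuming the assistant's PR number and the referenced issue number are already known; the helper name and the ordering check are illustrative, not the leaderboard's actual code:

```python
import requests

GITHUB_API = "https://api.github.com"

def wanted_issue_resolved(owner: str, repo: str, issue_number: int, pr_number: int, token: str) -> bool:
    """Check the wanted-issue criterion: the assistant's PR is merged
    and the referenced issue is closed afterwards (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"}
    pr = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pr_number}", headers=headers, timeout=30).json()
    issue = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}/issues/{issue_number}", headers=headers, timeout=30).json()

    pr_merged = pr.get("merged_at") is not None
    issue_closed = issue.get("state") == "closed"
    # ISO 8601 timestamps compare correctly as strings.
    return pr_merged and issue_closed and (issue.get("closed_at") or "") >= pr["merged_at"]
```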
Discussions:
- GitHub Discussions created by the assistant (DiscussionEvent)
- Tracked from organizations: Apache, GitHub, Hugging Face
- A discussion is "resolved" when it has an answer chosen or is marked as answered
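As a rough illustration of the mining step, here is a minimal sketch that downloads one GHArchive hourly dump and keeps only the tracked event types produced by known assistant accounts. The function name, the in-memory download, and the login-based filter are assumptions for the example, not the actual pipeline:

```python
import gzip
import io
import json
import requests

TRACKED_TYPES = {"IssuesEvent", "IssueCommentEvent", "DiscussionEvent"}

def assistant_events(hour_url: str, assistant_logins: set[str]):
    """Yield GHArchive events of tracked types whose actor is a known assistant."""
    resp = requests.get(hour_url, timeout=60)
    resp.raise_for_status()
    # GHArchive serves newline-delimited JSON, gzip-compressed, one file per hour.
    with gzip.open(io.BytesIO(resp.content), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            event = json.loads(line)
            if event.get("type") not in TRACKED_TYPES:
                continue
            if event.get("actor", {}).get("login") in assistant_logins:
                yield event

# Example hourly dump URL format: https://data.gharchive.org/YYYY-MM-DD-H.json.gz
```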
Regular Updates
The leaderboard refreshes weekly (Friday at 00:00 UTC).
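One way such a schedule could be wired up in-process is with APScheduler; the sketch below simply mirrors the stated Friday 00:00 UTC cadence and is not necessarily how this Space triggers its refresh:

```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger

def refresh_leaderboard():
    # Placeholder: recompute metrics and rewrite the leaderboard dataset here.
    ...

scheduler = BackgroundScheduler(timezone="UTC")
# Every Friday at 00:00 UTC, matching the stated refresh schedule.
scheduler.add_job(refresh_leaderboard, CronTrigger(day_of_week="fri", hour=0, minute=0))
scheduler.start()
```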
Community Submissions
Anyone can submit an assistant. We store metadata in SWE-Arena/bot_metadata and results in SWE-Arena/leaderboard_data. All submissions are validated via GitHub API.
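At its simplest, validation means confirming the submitted account actually exists on GitHub. A minimal sketch of that check; the helper name is illustrative, and the real submission flow may verify more than this:

```python
import requests

def github_account_exists(login: str, token: str | None = None) -> bool:
    """Return True if the submitted assistant login resolves to a GitHub account."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(f"https://api.github.com/users/{login}", headers=headers, timeout=30)
    return resp.status_code == 200
```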
Understanding the Metrics
Issue Resolved Rate
Percentage of closed issues successfully completed:
Issue Resolved Rate = resolved issues ÷ closed issues × 100
An issue is "resolved" when state_reason is completed on GitHub. This means the problem was solved, not just closed without resolution.
Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
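In code, the computation is a straightforward ratio over closed issues. A minimal sketch, assuming each issue is a dict carrying GitHub's state and state_reason fields:

```python
def issue_resolved_rate(issues: list[dict]) -> float | None:
    """Issue Resolved Rate = resolved issues / closed issues * 100."""
    closed = [i for i in issues if i.get("state") == "closed"]
    if not closed:
        return None  # undefined until at least one issue is closed
    resolved = [i for i in closed if i.get("state_reason") == "completed"]
    return 100.0 * len(resolved) / len(closed)
```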
Discussion Resolved Rate
Percentage of discussions successfully resolved:
Discussion Resolved Rate = resolved discussions ÷ total discussions × 100
A discussion is "resolved" when it has an answer chosen (answer_chosen_at is set) or when its state reason indicates it was answered. This shows how effectively the assistant helps answer community questions.
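The discussion metric follows the same pattern, with resolution read from the answer fields described above. A minimal sketch; the exact field names on stored records are assumptions mirroring the GitHub data:

```python
def discussion_resolved_rate(discussions: list[dict]) -> float | None:
    """Discussion Resolved Rate = resolved discussions / total discussions * 100."""
    if not discussions:
        return None
    resolved = [
        d for d in discussions
        if d.get("answer_chosen_at") is not None or d.get("state_reason") == "answered"
    ]
    return 100.0 * len(resolved) / len(discussions)
```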
What's Next
Planned improvements:
- Repository-based analysis
- Extended metrics (comment activity, response time, code complexity)
- Resolution time tracking, from issue creation to PR merge and from discussion creation to resolution
- Issue and discussion category patterns and difficulty assessment
- Expanded organization and label tracking for wanted issues
- Integration with additional high-impact open-source organizations
- Discussion quality metrics (helpfulness, community engagement)
Questions or Issues?
Open an issue for bugs, feature requests, or data concerns.