title: SWE-Issue
emoji: ❓
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub issue statistics for SWE assistants
SWE Assistant Issue & Discussion Leaderboard
SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution and discussion performance.
No benchmarks. No sandboxes. Just real issues and discussions that got resolved.
Why This Exists
Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many discussions did the assistant participate in and resolve? Is the assistant improving?
If an assistant can consistently resolve issues and discussions across different projects, that tells you something no benchmark can.
What We Track
Key metrics from the last 180 days:
Leaderboard Table
- Assistant: Display name of the assistant
- Website: Link to the assistant's homepage or documentation
- Issue Resolved Rate (%): Percentage of closed issues successfully resolved
- Discussion Resolved Rate (%): Percentage of discussions successfully resolved (answered or closed)
- Total Issues: Issues the assistant has been involved with (authored, assigned, or commented on)
- Total Discussions: Discussions the assistant created
- Resolved Issues: Closed issues marked as completed
- Resolved Wanted Issues: Long-standing issues (30+ days old) from major open-source projects that the assistant resolved via merged pull requests
- Resolved Discussions: Discussions that have been answered or closed
Monthly Trends
- Issue resolved rate trends (line plots)
- Discussion resolved rate trends (line plots)
- Issue and discussion volume over time (bar charts)
Issues Wanted
- Long-standing open issues (30+ days) with fix-needed labels (e.g. bug, enhancement) from tracked organizations (Apache, GitHub, Hugging Face)
We focus on the last 180 days to highlight current capabilities and active assistants.
How It Works
Data Collection
We mine GitHub activity from GHArchive, tracking three types of activities:
Assistant-Assigned Issues:
- Issues opened by or assigned to the assistant (IssuesEvent)
- Issue comments by the assistant (IssueCommentEvent)
Wanted Issues (from tracked organizations: Apache, GitHub, Hugging Face):
- Long-standing open issues (30+ days) with fix-needed labels (bug, enhancement)
- Pull requests created by assistants that reference these issues
- A wanted issue only counts as resolved when the assistant's PR is merged and the issue is subsequently closed
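The merged-PR-plus-closed-issue criterion maps directly onto the GitHub REST API. A minimal sketch, assuming the assistant's PR number and the referenced issue number are already known; the helper name and the ordering check are illustrative, not the leaderboard's actual code:

```python
import requests

GITHUB_API = "https://api.github.com"

def wanted_issue_resolved(owner: str, repo: str, issue_number: int, pr_number: int, token: str) -> bool:
    """Check the wanted-issue criterion: the assistant's PR is merged
    and the referenced issue is closed afterwards (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"}
    pr = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pr_number}", headers=headers, timeout=30).json()
    issue = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}/issues/{issue_number}", headers=headers, timeout=30).json()

    pr_merged = pr.get("merged_at") is not None
    issue_closed = issue.get("state") == "closed"
    # ISO 8601 timestamps compare correctly as strings.
    return pr_merged and issue_closed and (issue.get("closed_at") or "") >= pr["merged_at"]
```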
Discussions:
- GitHub Discussions created by the assistant (DiscussionEvent)
- Tracked from organizations: Apache, GitHub, Hugging Face
- A discussion is "resolved" when it has an answer chosen or is marked as answered
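As a rough illustration of the mining step, here is a minimal sketch that downloads one GHArchive hourly dump and keeps only the tracked event types produced by known assistant accounts. The function name, the in-memory download, and the login-based filter are assumptions for the example, not the actual pipeline:

```python
import gzip
import io
import json
import requests

TRACKED_TYPES = {"IssuesEvent", "IssueCommentEvent", "DiscussionEvent"}

def assistant_events(hour_url: str, assistant_logins: set[str]):
    """Yield GHArchive events of tracked types whose actor is a known assistant."""
    resp = requests.get(hour_url, timeout=60)
    resp.raise_for_status()
    # GHArchive serves newline-delimited JSON, gzip-compressed, one file per hour.
    with gzip.open(io.BytesIO(resp.content), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            event = json.loads(line)
            if event.get("type") not in TRACKED_TYPES:
                continue
            if event.get("actor", {}).get("login") in assistant_logins:
                yield event

# Example hourly dump URL format: https://data.gharchive.org/YYYY-MM-DD-H.json.gz
```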
Regular Updates
The leaderboard refreshes weekly (Friday at 00:00 UTC).
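One way such a schedule could be wired up in-process is with APScheduler; the sketch below simply mirrors the stated Friday 00:00 UTC cadence and is not necessarily how this Space triggers its refresh:

```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger

def refresh_leaderboard():
    # Placeholder: recompute metrics and rewrite the leaderboard dataset here.
    ...

scheduler = BackgroundScheduler(timezone="UTC")
# Every Friday at 00:00 UTC, matching the stated refresh schedule.
scheduler.add_job(refresh_leaderboard, CronTrigger(day_of_week="fri", hour=0, minute=0))
scheduler.start()
```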
Community Submissions
Anyone can submit an assistant. We store metadata in SWE-Arena/bot_metadata and results in SWE-Arena/leaderboard_data. All submissions are validated via GitHub API.
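At its simplest, validation means confirming the submitted account actually exists on GitHub. A minimal sketch of that check; the helper name is illustrative, and the real submission flow may verify more than this:

```python
import requests

def github_account_exists(login: str, token: str | None = None) -> bool:
    """Return True if the submitted assistant login resolves to a GitHub account."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(f"https://api.github.com/users/{login}", headers=headers, timeout=30)
    return resp.status_code == 200
```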
Understanding the Metrics
Issue Resolved Rate
Percentage of closed issues successfully completed:
Issue Resolved Rate = resolved issues ÷ closed issues × 100
An issue is "resolved" when state_reason is completed on GitHub. This means the problem was solved, not just closed without resolution.
Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
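In code, the computation is a straightforward ratio over closed issues. A minimal sketch, assuming each issue is a dict carrying GitHub's state and state_reason fields:

```python
def issue_resolved_rate(issues: list[dict]) -> float | None:
    """Issue Resolved Rate = resolved issues / closed issues * 100."""
    closed = [i for i in issues if i.get("state") == "closed"]
    if not closed:
        return None  # undefined until at least one issue is closed
    resolved = [i for i in closed if i.get("state_reason") == "completed"]
    return 100.0 * len(resolved) / len(closed)
```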
Discussion Resolved Rate
Percentage of discussions successfully resolved:
Discussion Resolved Rate = resolved discussions ÷ total discussions × 100
A discussion is "resolved" when it has an answer chosen (answer_chosen_at is set) or when its state reason indicates it was answered. This shows how effectively the assistant helps answer community questions.
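The discussion metric follows the same pattern, with resolution read from the answer fields described above. A minimal sketch; the exact field names on stored records are assumptions mirroring the GitHub data:

```python
def discussion_resolved_rate(discussions: list[dict]) -> float | None:
    """Discussion Resolved Rate = resolved discussions / total discussions * 100."""
    if not discussions:
        return None
    resolved = [
        d for d in discussions
        if d.get("answer_chosen_at") is not None or d.get("state_reason") == "answered"
    ]
    return 100.0 * len(resolved) / len(discussions)
```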
What's Next
Planned improvements:
- Repository-based analysis
- Extended metrics (comment activity, response time, code complexity)
- Resolution time tracking, from issue creation to PR merge and from discussion creation to resolution
- Issue and discussion category patterns and difficulty assessment
- Expanded organization and label tracking for wanted issues
- Integration with additional high-impact open-source organizations
- Discussion quality metrics (helpfulness, community engagement)
Questions or Issues?
Open an issue for bugs, feature requests, or data concerns.