Commit: add
README.md
CHANGED

@@ -22,7 +22,7 @@ Currently, the leaderboard tracks public GitHub PR review activity across open-s
 
 Most AI coding agent benchmarks rely on human-curated test suites and simulated environments. They're useful, but they don't tell you what happens when an agent participates in real code reviews with real maintainers and real quality standards.
 
-This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: how many PRs did the agent review? What percentage of those reviews led to
+This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: how many PRs did the agent review? What percentage of those reviews led to merged PRs? What percentage were rejected? These are the signals that reflect genuine code review quality - the kind you'd expect from a human reviewer.
 
 If an agent can consistently provide valuable reviews that help maintainers accept quality PRs across different projects, that tells you something no benchmark can.
 

@@ -32,9 +32,9 @@ The leaderboard pulls data directly from GitHub's PR review history and shows yo
 
 **Leaderboard Table**
 - **Total Reviews**: How many PR reviews the agent has made in the last 6 months
-- **
+- **Merged PRs**: How many PRs reviewed by the agent were merged
 - **Rejected PRs**: How many PRs reviewed by the agent were rejected/closed without merging
-- **Acceptance Rate**: Percentage of reviewed PRs that were
+- **Acceptance Rate**: Percentage of reviewed PRs that were merged (see calculation details below)
 
 **Monthly Trends Visualization**
 Beyond the table, we show interactive charts tracking how each agent's performance evolves month-by-month:

@@ -57,7 +57,7 @@ We search GitHub using the PR and review search APIs to track all reviews associ
 
 **Review Outcome Tracking**
 For each PR reviewed by an agent, we determine its status:
-1. **
+1. **Merged**: PR was merged into the repository
 2. **Rejected**: PR was closed without being merged
 3. **Pending**: PR is still open and under review
 

@@ -89,13 +89,13 @@ Click Submit. We'll validate the GitHub account, fetch the PR review history, an
 
 ## Understanding the Metrics
 
-**Total Reviews vs
+**Total Reviews vs Merged/Rejected PRs**
-Not every PR will be
+Not every PR will be merged. PRs may be rejected due to bugs, insufficient quality, conflicts with project goals, or other reasons. The acceptance and rejection rates help you understand how effective an agent's reviews are at identifying quality contributions.
 
 **Acceptance Rate**
-This is the percentage of reviewed PRs that were ultimately
+This is the percentage of reviewed PRs that were ultimately merged, calculated as:
 
-Acceptance Rate =
+Acceptance Rate = Merged PRs ÷ (Merged PRs + Rejected PRs) × 100%
 
 Note: Pending PRs (still open) are excluded from this calculation to ensure we only measure completed review outcomes.
 
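The Acceptance Rate formula in the README above can be sketched in a few lines of Python. This is an illustrative helper, not code from the repo; note that pending PRs simply never enter either argument, so they cannot affect the rate:

```python
def acceptance_rate(merged_prs: int, rejected_prs: int) -> float:
    """Acceptance Rate = Merged PRs / (Merged PRs + Rejected PRs) * 100.

    Pending (still-open) PRs are excluded by construction: they are
    neither merged nor rejected, so they appear in neither argument.
    """
    completed = merged_prs + rejected_prs
    if completed == 0:
        return 0.0  # no completed review outcomes yet
    return round(merged_prs / completed * 100, 2)

# 8 merged, 2 rejected (pending PRs ignored) -> 80.0
print(acceptance_rate(8, 2))
```

The zero-completed guard mirrors the diff below, which also falls back to a neutral value rather than dividing by zero.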
app.py
CHANGED

@@ -53,8 +53,7 @@ LEADERBOARD_COLUMNS = [
     ("Agent Name", "string"),
     ("Website", "string"),
     ("Total Reviews", "number"),
-    ("
-    ("Rejected PRs", "number"),
+    ("Merged PRs", "number"),
     ("Acceptance Rate (%)", "number"),
 ]
 

@@ -451,10 +450,10 @@ def extract_review_metadata(pr):
 
     PR status:
     - pr_status: 'open', 'merged', or 'closed'
-    - pr_merged: True if PR was merged
+    - pr_merged: True if PR was merged, False otherwise
     - pr_closed_at: Date when PR was closed/merged (if applicable)
 
-
+    Merged PR = PR that was merged after agent review
     Rejected PR = PR that was closed without merging after agent review
     """
     # Extract PR metadata from search results

@@ -721,16 +720,16 @@ def calculate_review_stats_from_metadata(metadata_list):
     Returns a dictionary with comprehensive review metrics.
 
     Acceptance Rate is calculated as:
-
+        merged PRs / (merged PRs + rejected PRs) * 100
 
-
+    Merged PRs = PRs that were merged (pr_status='merged')
     Rejected PRs = PRs that were closed without merging (pr_status='closed')
     Pending PRs = PRs still open (pr_status='open') - excluded from acceptance rate
     """
     total_reviews = len(metadata_list)
 
-    # Count
-
+    # Count merged PRs (merged)
+    merged_prs = sum(1 for review_meta in metadata_list
                      if review_meta.get('pr_status') == 'merged')
 
     # Count rejected PRs (closed without merging)

@@ -742,13 +741,12 @@ def calculate_review_stats_from_metadata(metadata_list):
                       if review_meta.get('pr_status') == 'open')
 
     # Calculate acceptance rate (exclude pending PRs)
-    completed_prs =
-    acceptance_rate = (
+    completed_prs = merged_prs + rejected_prs
+    acceptance_rate = (merged_prs / completed_prs * 100) if completed_prs > 0 else 0
 
     return {
         'total_reviews': total_reviews,
-        '
-        'rejected_prs': rejected_prs,
+        'merged_prs': merged_prs,
         'pending_prs': pending_prs,
         'acceptance_rate': round(acceptance_rate, 2),
     }

@@ -767,8 +765,7 @@ def calculate_monthly_metrics_by_agent():
         agent_name: {
             'acceptance_rates': list of acceptance rates by month,
             'total_reviews': list of review counts by month,
-            '
-            'rejected_prs': list of rejected PR counts by month
+            'merged_prs': list of merged PR counts by month,
         }
     }
 }

@@ -820,14 +817,13 @@ def calculate_monthly_metrics_by_agent():
     for agent_name, month_dict in agent_month_data.items():
         acceptance_rates = []
         total_reviews_list = []
-
-        rejected_prs_list = []
+        merged_prs_list = []
 
         for month in months:
             reviews_in_month = month_dict.get(month, [])
 
-            # Count
-
+            # Count merged PRs (merged)
+            merged_count = sum(1 for review in reviews_in_month
                                if review.get('pr_status') == 'merged')
 
             # Count rejected PRs (closed without merging)

@@ -838,19 +834,17 @@ def calculate_monthly_metrics_by_agent():
             total_count = len(reviews_in_month)
 
             # Calculate acceptance rate (exclude pending PRs)
-            completed_count =
-            acceptance_rate = (
+            completed_count = merged_count + rejected_count
+            acceptance_rate = (merged_count / completed_count * 100) if completed_count > 0 else None
 
             acceptance_rates.append(acceptance_rate)
             total_reviews_list.append(total_count)
-
-            rejected_prs_list.append(rejected_count)
+            merged_prs_list.append(merged_count)
 
         result_data[agent_name] = {
             'acceptance_rates': acceptance_rates,
             'total_reviews': total_reviews_list,
-            '
-            'rejected_prs': rejected_prs_list
+            'merged_prs': merged_prs_list,
         }
 
     return {

@@ -1861,8 +1855,7 @@ def get_leaderboard_dataframe():
             data.get('agent_name', 'Unknown'),
             data.get('website', 'N/A'),
             data.get('total_reviews', 0),
-            data.get('
-            data.get('rejected_prs', 0),
+            data.get('merged_prs', 0),
             data.get('acceptance_rate', 0.0),
         ])
 

@@ -1871,7 +1864,7 @@ def get_leaderboard_dataframe():
     df = pd.DataFrame(rows, columns=column_names)
 
     # Ensure numeric types
-    numeric_cols = ["Total Reviews", "
+    numeric_cols = ["Total Reviews", "Merged PRs", "Acceptance Rate (%)"]
     for col in numeric_cols:
         if col in df.columns:
             df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
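As a rough sketch of what `calculate_review_stats_from_metadata` now computes, assuming metadata entries shaped like `{'pr_status': ...}` as the docstring describes (this is a standalone illustration, not the repo's actual function):

```python
def review_stats(metadata_list):
    """Summarize review outcomes, mirroring the logic in the diff above."""
    # Bucket reviews by PR outcome based on the pr_status field
    merged = sum(1 for m in metadata_list if m.get('pr_status') == 'merged')
    rejected = sum(1 for m in metadata_list if m.get('pr_status') == 'closed')
    pending = sum(1 for m in metadata_list if m.get('pr_status') == 'open')

    # Pending PRs are excluded: only completed outcomes feed the rate
    completed = merged + rejected
    rate = (merged / completed * 100) if completed > 0 else 0

    return {
        'total_reviews': len(metadata_list),
        'merged_prs': merged,
        'rejected_prs': rejected,
        'pending_prs': pending,
        'acceptance_rate': round(rate, 2),
    }

sample = [{'pr_status': 'merged'}, {'pr_status': 'merged'},
          {'pr_status': 'closed'}, {'pr_status': 'open'}]
print(review_stats(sample))
```

With two merged, one rejected, and one pending review, the rate is 2/3 of completed outcomes, so the open PR changes total_reviews but not acceptance_rate.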
msr.py
CHANGED

@@ -449,10 +449,10 @@ def extract_review_metadata(pr):
 
     PR status:
     - pr_status: 'open', 'merged', or 'closed'
-    - pr_merged: True if PR was merged
+    - pr_merged: True if PR was merged, False otherwise
     - pr_closed_at: Date when PR was closed/merged (if applicable)
 
-
+    Merged PR = PR that was merged after agent review
     Rejected PR = PR that was closed without merging after agent review
     """
    # Extract PR metadata from search results

@@ -1050,16 +1050,16 @@ def calculate_review_stats_from_metadata(metadata_list):
     Returns a dictionary with comprehensive review metrics.
 
     Acceptance Rate is calculated as:
-
+        merged PRs / (merged PRs + rejected PRs) * 100
 
-
+    Merged PRs = PRs that were merged (pr_status='merged')
     Rejected PRs = PRs that were closed without merging (pr_status='closed')
     Pending PRs = PRs still open (pr_status='open') - excluded from acceptance rate
     """
     total_reviews = len(metadata_list)
 
-    # Count
-
+    # Count merged PRs (merged)
+    merged_prs = sum(1 for review_meta in metadata_list
                      if review_meta.get('pr_status') == 'merged')
 
     # Count rejected PRs (closed without merging)

@@ -1071,12 +1071,12 @@ def calculate_review_stats_from_metadata(metadata_list):
                       if review_meta.get('pr_status') == 'open')
 
     # Calculate acceptance rate (exclude pending PRs)
-    completed_prs =
-    acceptance_rate = (
+    completed_prs = merged_prs + rejected_prs
+    acceptance_rate = (merged_prs / completed_prs * 100) if completed_prs > 0 else 0
 
     return {
         'total_reviews': total_reviews,
-        '
+        'merged_prs': merged_prs,
         'rejected_prs': rejected_prs,
         'pending_prs': pending_prs,
         'acceptance_rate': round(acceptance_rate, 2),