zhiminy committed
Commit 325abdd · Parent: 9b447a6
Files changed (3)
  1. README.md +8 -8
  2. app.py +20 -27
  3. msr.py +9 -9
README.md CHANGED
@@ -22,7 +22,7 @@ Currently, the leaderboard tracks public GitHub PR review activity across open-s
 
  Most AI coding agent benchmarks rely on human-curated test suites and simulated environments. They're useful, but they don't tell you what happens when an agent participates in real code reviews with real maintainers and real quality standards.
 
- This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: how many PRs did the agent review? What percentage of those reviews led to accepted PRs? What percentage were rejected? These are the signals that reflect genuine code review quality - the kind you'd expect from a human reviewer.
+ This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: how many PRs did the agent review? What percentage of those reviews led to merged PRs? What percentage were rejected? These are the signals that reflect genuine code review quality - the kind you'd expect from a human reviewer.
 
  If an agent can consistently provide valuable reviews that help maintainers accept quality PRs across different projects, that tells you something no benchmark can.
 
@@ -32,9 +32,9 @@ The leaderboard pulls data directly from GitHub's PR review history and shows yo
 
  **Leaderboard Table**
  - **Total Reviews**: How many PR reviews the agent has made in the last 6 months
- - **Accepted PRs**: How many PRs reviewed by the agent were accepted/merged
+ - **Merged PRs**: How many PRs reviewed by the agent were merged
  - **Rejected PRs**: How many PRs reviewed by the agent were rejected/closed without merging
- - **Acceptance Rate**: Percentage of reviewed PRs that were accepted (see calculation details below)
+ - **Acceptance Rate**: Percentage of reviewed PRs that were merged (see calculation details below)
 
  **Monthly Trends Visualization**
  Beyond the table, we show interactive charts tracking how each agent's performance evolves month-by-month:
@@ -57,7 +57,7 @@ We search GitHub using the PR and review search APIs to track all reviews associ
 
  **Review Outcome Tracking**
  For each PR reviewed by an agent, we determine its status:
- 1. **Accepted**: PR was merged into the repository
+ 1. **Merged**: PR was merged into the repository
  2. **Rejected**: PR was closed without being merged
  3. **Pending**: PR is still open and under review
 
@@ -89,13 +89,13 @@ Click Submit. We'll validate the GitHub account, fetch the PR review history, an
 
  ## Understanding the Metrics
 
- **Total Reviews vs Accepted/Rejected PRs**
- Not every PR will be accepted. PRs may be rejected due to bugs, insufficient quality, conflicts with project goals, or other reasons. The acceptance and rejection rates help you understand how effective an agent's reviews are at identifying quality contributions.
+ **Total Reviews vs Merged/Rejected PRs**
+ Not every PR will be merged. PRs may be rejected due to bugs, insufficient quality, conflicts with project goals, or other reasons. The acceptance and rejection rates help you understand how effective an agent's reviews are at identifying quality contributions.
 
  **Acceptance Rate**
- This is the percentage of reviewed PRs that were ultimately accepted and merged, calculated as:
+ This is the percentage of reviewed PRs that were ultimately merged, calculated as:
 
- Acceptance Rate = Accepted PRs ÷ (Accepted PRs + Rejected PRs) × 100%
+ Acceptance Rate = Merged PRs ÷ (Merged PRs + Rejected PRs) × 100%
 
  Note: Pending PRs (still open) are excluded from this calculation to ensure we only measure completed review outcomes.
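To make the updated Acceptance Rate definition above concrete, here is a minimal Python sketch of the same arithmetic. The helper name and the sample counts are illustrative only (they are not part of app.py or msr.py); it simply restates Merged PRs ÷ (Merged PRs + Rejected PRs) × 100%, with pending PRs left out.

```python
def acceptance_rate(merged_prs: int, rejected_prs: int) -> float:
    """Acceptance Rate = Merged PRs / (Merged PRs + Rejected PRs) * 100.

    Pending (still-open) PRs are excluded, so only completed outcomes count.
    """
    completed = merged_prs + rejected_prs
    return round(merged_prs / completed * 100, 2) if completed > 0 else 0.0


# Example: 42 merged and 14 rejected reviewed PRs -> 75.0 (pending PRs ignored)
print(acceptance_rate(42, 14))
```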
app.py CHANGED
@@ -53,8 +53,7 @@ LEADERBOARD_COLUMNS = [
  ("Agent Name", "string"),
  ("Website", "string"),
  ("Total Reviews", "number"),
- ("Accepted PRs", "number"),
- ("Rejected PRs", "number"),
+ ("Merged PRs", "number"),
  ("Acceptance Rate (%)", "number"),
  ]
 
@@ -451,10 +450,10 @@ def extract_review_metadata(pr):
 
  PR status:
  - pr_status: 'open', 'merged', or 'closed'
- - pr_merged: True if PR was merged (accepted), False otherwise
+ - pr_merged: True if PR was merged, False otherwise
  - pr_closed_at: Date when PR was closed/merged (if applicable)
 
- Accepted PR = PR that was merged after agent review
+ Merged PR = PR that was merged after agent review
  Rejected PR = PR that was closed without merging after agent review
  """
  # Extract PR metadata from search results
@@ -721,16 +720,16 @@ def calculate_review_stats_from_metadata(metadata_list):
  Returns a dictionary with comprehensive review metrics.
 
  Acceptance Rate is calculated as:
- accepted PRs / (accepted PRs + rejected PRs) * 100
+ merged PRs / (merged PRs + rejected PRs) * 100
 
- Accepted PRs = PRs that were merged (pr_status='merged')
+ Merged PRs = PRs that were merged (pr_status='merged')
  Rejected PRs = PRs that were closed without merging (pr_status='closed')
  Pending PRs = PRs still open (pr_status='open') - excluded from acceptance rate
  """
  total_reviews = len(metadata_list)
 
- # Count accepted PRs (merged)
- accepted_prs = sum(1 for review_meta in metadata_list
+ # Count merged PRs (merged)
+ merged_prs = sum(1 for review_meta in metadata_list
  if review_meta.get('pr_status') == 'merged')
 
  # Count rejected PRs (closed without merging)
@@ -742,13 +741,12 @@ def calculate_review_stats_from_metadata(metadata_list):
  if review_meta.get('pr_status') == 'open')
 
  # Calculate acceptance rate (exclude pending PRs)
- completed_prs = accepted_prs + rejected_prs
- acceptance_rate = (accepted_prs / completed_prs * 100) if completed_prs > 0 else 0
+ completed_prs = merged_prs + rejected_prs
+ acceptance_rate = (merged_prs / completed_prs * 100) if completed_prs > 0 else 0
 
  return {
  'total_reviews': total_reviews,
- 'accepted_prs': accepted_prs,
- 'rejected_prs': rejected_prs,
+ 'merged_prs': merged_prs,
  'pending_prs': pending_prs,
  'acceptance_rate': round(acceptance_rate, 2),
  }
@@ -767,8 +765,7 @@ def calculate_monthly_metrics_by_agent():
  agent_name: {
  'acceptance_rates': list of acceptance rates by month,
  'total_reviews': list of review counts by month,
- 'accepted_prs': list of accepted PR counts by month,
- 'rejected_prs': list of rejected PR counts by month
+ 'merged_prs': list of merged PR counts by month,
  }
  }
  }
@@ -820,14 +817,13 @@ def calculate_monthly_metrics_by_agent():
  for agent_name, month_dict in agent_month_data.items():
  acceptance_rates = []
  total_reviews_list = []
- accepted_prs_list = []
- rejected_prs_list = []
+ merged_prs_list = []
 
  for month in months:
  reviews_in_month = month_dict.get(month, [])
 
- # Count accepted PRs (merged)
- accepted_count = sum(1 for review in reviews_in_month
+ # Count merged PRs (merged)
+ merged_count = sum(1 for review in reviews_in_month
  if review.get('pr_status') == 'merged')
 
  # Count rejected PRs (closed without merging)
@@ -838,19 +834,17 @@ def calculate_monthly_metrics_by_agent():
  total_count = len(reviews_in_month)
 
  # Calculate acceptance rate (exclude pending PRs)
- completed_count = accepted_count + rejected_count
- acceptance_rate = (accepted_count / completed_count * 100) if completed_count > 0 else None
+ completed_count = merged_count + rejected_count
+ acceptance_rate = (merged_count / completed_count * 100) if completed_count > 0 else None
 
  acceptance_rates.append(acceptance_rate)
  total_reviews_list.append(total_count)
- accepted_prs_list.append(accepted_count)
- rejected_prs_list.append(rejected_count)
+ merged_prs_list.append(merged_count)
 
  result_data[agent_name] = {
  'acceptance_rates': acceptance_rates,
  'total_reviews': total_reviews_list,
- 'accepted_prs': accepted_prs_list,
- 'rejected_prs': rejected_prs_list
+ 'merged_prs': merged_prs_list,
  }
 
  return {
@@ -1861,8 +1855,7 @@ def get_leaderboard_dataframe():
  data.get('agent_name', 'Unknown'),
  data.get('website', 'N/A'),
  data.get('total_reviews', 0),
- data.get('accepted_prs', 0),
- data.get('rejected_prs', 0),
+ data.get('merged_prs', 0),
  data.get('acceptance_rate', 0.0),
  ])
 
@@ -1871,7 +1864,7 @@ def get_leaderboard_dataframe():
  df = pd.DataFrame(rows, columns=column_names)
 
  # Ensure numeric types
- numeric_cols = ["Total Reviews", "Accepted PRs", "Rejected PRs", "Acceptance Rate (%)"]
+ numeric_cols = ["Total Reviews", "Merged PRs", "Acceptance Rate (%)"]
  for col in numeric_cols:
  if col in df.columns:
  df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
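For a sense of how the reworked app.py stats behave end to end, here is a hedged usage sketch. It assumes calculate_review_stats_from_metadata can be imported from app.py without side effects, and the sample metadata list is hypothetical; only the pr_status values ('merged', 'closed', 'open') come from the diff above.

```python
# Assumption: importing app.py does not launch the app or require credentials.
from app import calculate_review_stats_from_metadata

# Hypothetical review metadata using the pr_status values shown in the diff.
sample_metadata = [
    {'pr_status': 'merged'},
    {'pr_status': 'merged'},
    {'pr_status': 'closed'},  # rejected: closed without merging
    {'pr_status': 'open'},    # pending: excluded from the acceptance rate
]

stats = calculate_review_stats_from_metadata(sample_metadata)
# Expected shape after this commit (rejected_prs is no longer returned by app.py):
# {'total_reviews': 4, 'merged_prs': 2, 'pending_prs': 1, 'acceptance_rate': 66.67}
print(stats)
```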
msr.py CHANGED
@@ -449,10 +449,10 @@ def extract_review_metadata(pr):
 
  PR status:
  - pr_status: 'open', 'merged', or 'closed'
- - pr_merged: True if PR was merged (accepted), False otherwise
+ - pr_merged: True if PR was merged, False otherwise
  - pr_closed_at: Date when PR was closed/merged (if applicable)
 
- Accepted PR = PR that was merged after agent review
+ merged PR = PR that was merged after agent review
  Rejected PR = PR that was closed without merging after agent review
  """
  # Extract PR metadata from search results
@@ -1050,16 +1050,16 @@ def calculate_review_stats_from_metadata(metadata_list):
  Returns a dictionary with comprehensive review metrics.
 
  Acceptance Rate is calculated as:
- accepted PRs / (accepted PRs + rejected PRs) * 100
+ merged PRs / (merged PRs + rejected PRs) * 100
 
- Accepted PRs = PRs that were merged (pr_status='merged')
+ merged PRs = PRs that were merged (pr_status='merged')
  Rejected PRs = PRs that were closed without merging (pr_status='closed')
  Pending PRs = PRs still open (pr_status='open') - excluded from acceptance rate
  """
  total_reviews = len(metadata_list)
 
- # Count accepted PRs (merged)
- accepted_prs = sum(1 for review_meta in metadata_list
+ # Count merged PRs (merged)
+ merged_prs = sum(1 for review_meta in metadata_list
  if review_meta.get('pr_status') == 'merged')
 
  # Count rejected PRs (closed without merging)
@@ -1071,12 +1071,12 @@ def calculate_review_stats_from_metadata(metadata_list):
  if review_meta.get('pr_status') == 'open')
 
  # Calculate acceptance rate (exclude pending PRs)
- completed_prs = accepted_prs + rejected_prs
- acceptance_rate = (accepted_prs / completed_prs * 100) if completed_prs > 0 else 0
+ completed_prs = merged_prs + rejected_prs
+ acceptance_rate = (merged_prs / completed_prs * 100) if completed_prs > 0 else 0
 
  return {
  'total_reviews': total_reviews,
- 'accepted_prs': accepted_prs,
+ 'merged_prs': merged_prs,
  'rejected_prs': rejected_prs,
  'pending_prs': pending_prs,
  'acceptance_rate': round(acceptance_rate, 2),