zhiminy committed
Commit 9b447a6 · 1 Parent(s): 3648f40
Files changed (6)
  1. .github/workflows/hf_sync.yml +35 -0
  2. .gitignore +5 -0
  3. README.md +121 -1
  4. app.py +2140 -0
  5. msr.py +1224 -0
  6. requirements.txt +9 -0
.github/workflows/hf_sync.yml ADDED
@@ -0,0 +1,35 @@
1
+ name: Sync to Hugging Face Space
2
+
3
+ on:
4
+ push:
5
+ branches:
6
+ - main
7
+
8
+ jobs:
9
+ sync:
10
+ runs-on: ubuntu-latest
11
+
12
+ steps:
13
+ - name: Checkout GitHub Repository
14
+ uses: actions/checkout@v3
15
+ with:
16
+ fetch-depth: 0 # Fetch the entire history to avoid shallow clone issues
17
+
18
+ - name: Install Git LFS
19
+ run: |
20
+ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
21
+ sudo apt-get install git-lfs
22
+ git lfs install
23
+
24
+ - name: Configure Git
25
+ run: |
26
+ git config --global user.name "GitHub Actions Bot"
27
+ git config --global user.email "actions@github.com"
28
+
29
+ - name: Push to Hugging Face
30
+ env:
31
+ HF_TOKEN: ${{ secrets.HF_TOKEN }}
32
+ run: |
33
+ git remote add huggingface https://user:${HF_TOKEN}@huggingface.co/spaces/SWE-Arena/SWE-Review
34
+ git fetch huggingface
35
+ git push huggingface main --force
.gitignore ADDED
@@ -0,0 +1,5 @@
1
+ *.claude
2
+ *.env
3
+ *.venv
4
+ *.ipynb
5
+ *.pyc
README.md CHANGED
@@ -10,4 +10,124 @@ pinned: false
10
  short_description: Track GitHub review statistics for SWE agents
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
13
+ # SWE Agent Review Leaderboard
14
+
15
+ SWE-Review ranks software engineering agents by their real-world GitHub review performance.
16
+
17
+ A lightweight platform for tracking real-world GitHub pull request review statistics for software engineering agents. No benchmarks. No sandboxes. Just real PR reviews from actual repositories.
18
+
19
+ Currently, the leaderboard tracks public GitHub PR review activity across open-source repositories where the agent has participated in code review.
20
+
21
+ ## Why This Exists
22
+
23
+ Most AI coding agent benchmarks rely on human-curated test suites and simulated environments. They're useful, but they don't tell you what happens when an agent participates in real code reviews with real maintainers and real quality standards.
24
+
25
+ This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: how many PRs did the agent review? What percentage of those reviews led to accepted PRs? What percentage were rejected? These are the signals that reflect genuine code review quality - the kind you'd expect from a human reviewer.
26
+
27
+ If an agent can consistently provide valuable reviews that help maintainers accept quality PRs across different projects, that tells you something no benchmark can.
28
+
29
+ ## What We Track
30
+
31
+ The leaderboard pulls data directly from GitHub's PR review history and shows you key metrics from the last 6 months:
32
+
33
+ **Leaderboard Table**
34
+ - **Total Reviews**: How many PR reviews the agent has made in the last 6 months
35
+ - **Accepted PRs**: How many PRs reviewed by the agent were accepted/merged
36
+ - **Rejected PRs**: How many PRs reviewed by the agent were rejected/closed without merging
37
+ - **Acceptance Rate**: Percentage of reviewed PRs that were accepted (see calculation details below)
38
+
39
+ **Monthly Trends Visualization**
40
+ Beyond the table, we show interactive charts tracking how each agent's performance evolves month-by-month:
41
+ - Acceptance rate trends (line plots)
42
+ - Review volume over time (bar charts)
43
+
44
+ This helps you see which agents are improving, which provide consistently valuable reviews, and how active they've been recently.
45
+
46
+ **Why 6 Months?**
47
+ We focus on recent performance (last 6 months) to highlight active agents and current capabilities. This ensures the leaderboard reflects the latest versions of agents rather than outdated historical data, making it more relevant for evaluating current performance.
48
+
49
+ ## How It Works
50
+
51
+ Behind the scenes, we're doing a few things:
52
+
53
+ **Data Collection**
54
+ We search GitHub using the PR and review search APIs to track all reviews associated with an agent:
55
+ - PR reviews by the agent (`reviewed-by:agent-name`)
56
+ - PR status (merged, closed, open) to determine acceptance or rejection
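For illustration, a minimal sketch of this kind of search (simplified from the app's collector; the agent login and date range below are hypothetical, and a `GITHUB_TOKEN` environment variable is assumed for higher rate limits):

```python
import os
import requests

def search_reviewed_prs(agent: str, created_range: str) -> list[dict]:
    """Return up to 100 PRs reviewed by `agent` within a created-date range."""
    token = os.getenv("GITHUB_TOKEN")
    headers = {"Authorization": f"token {token}"} if token else {}
    params = {"q": f"is:pr reviewed-by:{agent} created:{created_range}", "per_page": 100}
    resp = requests.get("https://api.github.com/search/issues",
                        headers=headers, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("items", [])

# prs = search_reviewed_prs("example-review-bot", "2025-01-01..2025-06-30")
```

The production collector layers exponential backoff and time-based partitioning on top of this to stay within GitHub's 1,000-results-per-query cap.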
57
+
58
+ **Review Outcome Tracking**
59
+ For each PR reviewed by an agent, we determine its status:
60
+ 1. **Accepted**: PR was merged into the repository
61
+ 2. **Rejected**: PR was closed without being merged
62
+ 3. **Pending**: PR is still open and under review
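A sketch of the classification, mirroring the status logic in `app.py` (fields come from the GitHub pull request API):

```python
def classify_pr(state: str, merged: bool) -> str:
    """Map a reviewed PR's API state onto the three leaderboard outcomes."""
    if merged:
        return "merged"   # accepted
    if state == "closed":
        return "closed"   # rejected: closed without merging
    return "open"         # pending
```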
63
+
64
+ **Regular Updates**
65
+ The leaderboard refreshes automatically every day at 12:00 AM UTC.
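A minimal sketch of how such a daily refresh can be wired up with APScheduler (which the app imports); the job body here is a hypothetical placeholder:

```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger

def refresh_leaderboard():
    print("Refreshing leaderboard data...")  # placeholder for the real refresh routine

scheduler = BackgroundScheduler(timezone="UTC")
scheduler.add_job(refresh_leaderboard, CronTrigger(hour=0, minute=0))  # daily at 12:00 AM UTC
scheduler.start()
```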
66
+
67
+ **Community Submissions**
68
+ Anyone can submit a coding agent to track via the leaderboard. We store agent metadata in the Hugging Face dataset `SWE-Arena/swe_agents` and review metadata in `SWE-Arena/review_metadata`. The leaderboard is built dynamically from the review metadata. All submissions are automatically validated through GitHub's API to ensure the account exists and has public activity.
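For reference, a minimal sketch of reading the review metadata directly (the dataset stores one JSONL file per agent per day as `<agent_identifier>/YYYY.MM.DD.jsonl`; the agent folder name below is hypothetical and public read access is assumed):

```python
import json
from huggingface_hub import HfApi, hf_hub_download

REPO = "SWE-Arena/review_metadata"
agent = "example-review-bot"  # hypothetical agent identifier

api = HfApi()
files = [f for f in api.list_repo_files(repo_id=REPO, repo_type="dataset")
         if f.startswith(f"{agent}/") and f.endswith(".jsonl")]

reviews = []
for name in files:
    path = hf_hub_download(repo_id=REPO, filename=name, repo_type="dataset")
    with open(path, encoding="utf-8") as fh:
        reviews.extend(json.loads(line) for line in fh if line.strip())

print(f"Loaded {len(reviews)} review records for {agent}")
```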
69
+
70
+ ## Using the Leaderboard
71
+
72
+ ### Just Browsing?
73
+ Head to the Leaderboard tab where you'll find:
74
+ - **Searchable table**: Search by agent name or website
75
+ - **Filterable columns**: Filter by acceptance rate to find top performers
76
+ - **Monthly charts**: Scroll down to see acceptance rate trends and review activity over time
77
+
78
+ The charts use color-coded lines and bars so you can easily track individual agents across months.
79
+
80
+ ### Want to Add Your Agent?
81
+ In the Submit Agent tab, provide:
82
+ - **GitHub identifier*** (required): Your agent's GitHub username or bot account
83
+ - **Agent name*** (required): Display name for the leaderboard
84
+ - **Organization*** (required): Your organization or team name
85
+ - **Website*** (required): Link to your agent's homepage or documentation
86
+ - **Description** (optional): Brief explanation of what your agent does
87
+
88
+ Click Submit. We'll validate the GitHub account, fetch the PR review history, and add your agent to the board. Initial data loading takes a few seconds.
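Validation is essentially an existence check against GitHub's users endpoint; a trimmed sketch of that check (error handling omitted):

```python
import os
import requests

def github_account_exists(identifier: str) -> bool:
    token = os.getenv("GITHUB_TOKEN")
    headers = {"Authorization": f"token {token}"} if token else {}
    resp = requests.get(f"https://api.github.com/users/{identifier}",
                        headers=headers, timeout=30)
    return resp.status_code == 200  # 404 means the account does not exist
```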
89
+
90
+ ## Understanding the Metrics
91
+
92
+ **Total Reviews vs Accepted/Rejected PRs**
93
+ Not every PR will be accepted. PRs may be rejected due to bugs, insufficient quality, conflicts with project goals, or other reasons. The acceptance and rejection rates help you understand how effective an agent's reviews are at identifying quality contributions.
94
+
95
+ **Acceptance Rate**
96
+ This is the percentage of reviewed PRs that were ultimately accepted and merged, calculated as:
97
+
98
+ Acceptance Rate = Accepted PRs ÷ (Accepted PRs + Rejected PRs) × 100%
99
+
100
+ Note: Pending PRs (still open) are excluded from this calculation to ensure we only measure completed review outcomes.
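A worked sketch of the formula (the app rounds to two decimal places and skips months with no completed reviews):

```python
def acceptance_rate(accepted: int, rejected: int) -> float | None:
    """Acceptance rate over completed reviews only; None when nothing has completed."""
    completed = accepted + rejected
    if completed == 0:
        return None
    return round(accepted / completed * 100, 2)

# 70 accepted, 30 rejected, 15 still pending -> 70.0 (pending PRs are ignored)
print(acceptance_rate(70, 30))
```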
101
+
102
+ **What This Tells Us**:
103
+ - A high acceptance rate suggests the agent provides valuable reviews that help maintainers identify quality PRs worth merging
104
+ - A balanced acceptance/rejection rate may indicate thorough, critical review practices
105
+ - Very low acceptance rates might suggest overly harsh or inaccurate reviews
106
+
107
+ Context matters though - an agent with 100 reviews and a 70% acceptance rate is different from one with 10 reviews at 100%. Look at both the rate and the volume.
108
+
109
+ **Monthly Trends**
110
+ The visualization below the leaderboard table shows:
111
+ - **Line plots**: How acceptance rates change over time for each agent
112
+ - **Bar charts**: How many PR reviews each agent performed each month
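A minimal sketch of how such monthly trends can be drawn with Plotly (which the app uses); the agent name and numbers below are made up for illustration:

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots

months = ["2025-05", "2025-06", "2025-07"]
acceptance = [62.5, 70.0, 68.2]       # illustrative acceptance rates (%)
review_counts = [40, 55, 47]          # illustrative review volumes

fig = make_subplots(rows=2, cols=1, shared_xaxes=True,
                    subplot_titles=("Acceptance Rate (%)", "Reviews per Month"))
fig.add_trace(go.Scatter(x=months, y=acceptance, mode="lines+markers",
                         name="example-agent"), row=1, col=1)
fig.add_trace(go.Bar(x=months, y=review_counts, name="example-agent",
                     showlegend=False), row=2, col=1)
fig.show()
```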
113
+
114
+ Use these charts to spot patterns:
115
+ - Consistent acceptance rates indicate reliable review quality
116
+ - Increasing trends show agents that are learning and improving
117
+ - High review volumes with good acceptance rates demonstrate both productivity and quality review practices
118
+
119
+ ## What's Next
120
+
121
+ We're planning to add more granular insights:
122
+
123
+ - **Repository-based analysis**: Break down performance by repository to highlight domain strengths and project-specific acceptance rates
124
+ - **Extended metrics**: Review response time, review depth (number of comments), and review message quality
125
+ - **Review sentiment analysis**: Understand the tone and helpfulness of review comments
126
+ - **Review patterns**: Identify whether agents excel at security reviews, code quality reviews, or architectural feedback
127
+ - **PR characteristics**: Analyze acceptance rates based on PR size, complexity, and type (features, fixes, refactoring)
128
+
129
+ Our goal is to make leaderboard data as transparent and reflective of real-world code review quality as possible.
130
+
131
+ ## Questions or Issues?
132
+
133
+ If something breaks, you want to suggest a feature, or you're seeing weird data for your agent, [open an issue](https://github.com/SE-Arena/SWE-Review/issues) and we'll take a look.
app.py ADDED
@@ -0,0 +1,2140 @@
1
+ import gradio as gr
2
+ from gradio_leaderboard import Leaderboard
3
+ import json
4
+ import os
5
+ import time
6
+ import requests
7
+ from datetime import datetime, timezone, timedelta
8
+ from collections import defaultdict
9
+ from huggingface_hub import HfApi, hf_hub_download
10
+ from datasets import load_dataset, Dataset
11
+ import threading
12
+ from dotenv import load_dotenv
13
+ import pandas as pd
14
+ import random
15
+ import argparse
16
+ import plotly.graph_objects as go
17
+ from plotly.subplots import make_subplots
18
+ from apscheduler.schedulers.background import BackgroundScheduler
19
+ from apscheduler.triggers.cron import CronTrigger
20
+
21
+ # Load environment variables
22
+ load_dotenv()
23
+
24
+ # Parse command-line arguments
25
+ parser = argparse.ArgumentParser(description='SWE Agent Review Leaderboard')
26
+ parser.add_argument('--debug', '--DEBUG', action='store_true',
27
+ help='Enable debug mode (limits review retrieval to 10 per query pattern)')
28
+ parser.add_argument('--no-debug', '--production', action='store_true',
29
+ help='Explicitly disable debug mode (force production mode)')
30
+ args = parser.parse_args()
31
+
32
+ # =============================================================================
33
+ # CONFIGURATION
34
+ # =============================================================================
35
+
36
+ # DEBUG MODE: Set to True to limit review retrieval for testing
37
+ # When enabled, only fetches up to 10 reviews per query pattern per agent
38
+ # Priority: 1) Command-line args, 2) Environment variable, 3) Default (False)
39
+ if args.no_debug:
40
+ DEBUG_MODE = False
41
+ elif args.debug:
42
+ DEBUG_MODE = True
43
+ else:
44
+ DEBUG_MODE = os.getenv('DEBUG_MODE', 'False').lower() in ('true', '1', 'yes')
45
+
46
+ # In-memory cache for debug mode (data persists during session but NOT saved to HF)
47
+ DEBUG_REVIEW_METADATA_CACHE = defaultdict(list)
48
+
49
+ AGENTS_REPO = "SWE-Arena/swe_agents" # HuggingFace dataset for agent metadata
50
+ REVIEW_METADATA_REPO = "SWE-Arena/review_metadata" # HuggingFace dataset for review metadata
51
+
52
+ LEADERBOARD_COLUMNS = [
53
+ ("Agent Name", "string"),
54
+ ("Website", "string"),
55
+ ("Total Reviews", "number"),
56
+ ("Accepted PRs", "number"),
57
+ ("Rejected PRs", "number"),
58
+ ("Acceptance Rate (%)", "number"),
59
+ ]
60
+
61
+ # =============================================================================
62
+ # JSONL FILE OPERATIONS
63
+ # =============================================================================
64
+
65
+ def load_jsonl(filename):
66
+ """Load JSONL file and return list of dictionaries."""
67
+ if not os.path.exists(filename):
68
+ return []
69
+
70
+ data = []
71
+ with open(filename, 'r', encoding='utf-8') as f:
72
+ for line in f:
73
+ line = line.strip()
74
+ if line:
75
+ try:
76
+ entry = json.loads(line)
77
+ data.append(entry)
78
+ except json.JSONDecodeError as e:
79
+ print(f"Warning: Skipping invalid JSON line: {e}")
80
+ return data
81
+
82
+
83
+ def save_jsonl(filename, data):
84
+ """Save list of dictionaries to JSONL file."""
85
+ with open(filename, 'w', encoding='utf-8') as f:
86
+ for item in data:
87
+ f.write(json.dumps(item) + '\n')
88
+
89
+
90
+ def cache_to_dict(cache_list):
91
+ """Convert list of cache entries to dictionary by identifier."""
92
+ return {entry['github_identifier']: entry for entry in cache_list}
93
+
94
+
95
+ def dict_to_cache(cache_dict):
96
+ """Convert dictionary back to list of values."""
97
+ return list(cache_dict.values())
98
+
99
+
100
+ def normalize_date_format(date_string):
101
+ """
102
+ Convert date strings to standardized ISO 8601 format with Z suffix.
103
+ Handles both old format (2025-10-15T23:23:47.983068) and new format (2025-10-15T23:23:47Z).
104
+ """
105
+ if not date_string or date_string == 'N/A':
106
+ return 'N/A'
107
+
108
+ try:
109
+ # Parse the date string (handles both with and without microseconds)
110
+ if '.' in date_string:
111
+ # Old format with microseconds
112
+ dt = datetime.fromisoformat(date_string.replace('Z', '+00:00'))
113
+ else:
114
+ # Already in correct format or GitHub format
115
+ return date_string
116
+
117
+ # Convert to standardized format
118
+ return dt.strftime('%Y-%m-%dT%H:%M:%SZ')
119
+ except Exception as e:
120
+ print(f"Warning: Could not parse date '{date_string}': {e}")
121
+ return date_string
122
+
123
+
124
+ # =============================================================================
125
+ # GITHUB API OPERATIONS
126
+ # =============================================================================
127
+
128
+ def request_with_backoff(method, url, *, headers=None, params=None, json_body=None, data=None, max_retries=10, timeout=30):
129
+ """
130
+ Perform an HTTP request with exponential backoff and jitter for GitHub API.
131
+ Retries on 403/429 (rate limits), 5xx server errors, and transient network exceptions.
132
+
133
+ Returns the final requests.Response on success or non-retryable status, or None after exhausting retries.
134
+ """
135
+ delay = 1.0
136
+ for attempt in range(max_retries):
137
+ try:
138
+ resp = requests.request(
139
+ method,
140
+ url,
141
+ headers=headers or {},
142
+ params=params,
143
+ json=json_body,
144
+ data=data,
145
+ timeout=timeout
146
+ )
147
+
148
+ status = resp.status_code
149
+
150
+ # Success
151
+ if 200 <= status < 300:
152
+ return resp
153
+
154
+ # Rate limits or server errors -> retry with backoff
155
+ if status in (403, 429) or 500 <= status < 600:
156
+ wait = None
157
+
158
+ # Prefer Retry-After when present
159
+ retry_after = resp.headers.get('Retry-After') or resp.headers.get('retry-after')
160
+ if retry_after:
161
+ try:
162
+ wait = float(retry_after)
163
+ except Exception:
164
+ wait = None
165
+
166
+ # Fallback to X-RateLimit-Reset when 403/429
167
+ if wait is None and status in (403, 429):
168
+ reset_hdr = resp.headers.get('X-RateLimit-Reset') or resp.headers.get('x-ratelimit-reset')
169
+ if reset_hdr:
170
+ try:
171
+ reset_ts = int(float(reset_hdr))
172
+ wait = max(reset_ts - time.time() + 2, 1)
173
+ except Exception:
174
+ wait = None
175
+
176
+ # Final fallback: exponential backoff with jitter
177
+ if wait is None:
178
+ wait = delay + random.uniform(0, 0.5)
179
+
180
+ # Cap individual wait to avoid extreme sleeps
181
+ wait = max(1.0, min(wait, 120.0))
182
+ print(f"GitHub API {status}. Backing off {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
183
+ time.sleep(wait)
184
+ delay = min(delay * 2, 60.0)
185
+ continue
186
+
187
+ # Non-retryable error; return response for caller to handle
188
+ return resp
189
+
190
+ except requests.RequestException as e:
191
+ # Network error -> retry with backoff
192
+ wait = delay + random.uniform(0, 0.5)
193
+ wait = max(1.0, min(wait, 60.0))
194
+ print(f"Request error: {e}. Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
195
+ time.sleep(wait)
196
+ delay = min(delay * 2, 60.0)
197
+
198
+ print(f"Exceeded max retries for {url}")
199
+ return None
200
+
201
+ def get_github_token():
202
+ """Get GitHub token from environment variables."""
203
+ token = os.getenv('GITHUB_TOKEN')
204
+ if not token:
205
+ print("Warning: GITHUB_TOKEN not found. API rate limits: 60/hour (authenticated: 5000/hour)")
206
+ return token
207
+
208
+
209
+ def validate_github_username(identifier):
210
+ """Verify that a GitHub identifier exists with backoff-aware requests."""
211
+ try:
212
+ token = get_github_token()
213
+ headers = {'Authorization': f'token {token}'} if token else {}
214
+ url = f'https://api.github.com/users/{identifier}'
215
+ response = request_with_backoff('GET', url, headers=headers, max_retries=1)
216
+ if response is None:
217
+ return False, "Validation error: network/rate limit exhausted"
218
+ if response.status_code == 200:
219
+ return True, "Username is valid"
220
+ elif response.status_code == 404:
221
+ return False, "GitHub identifier not found"
222
+ else:
223
+ return False, f"Validation error: HTTP {response.status_code}"
224
+ except Exception as e:
225
+ return False, f"Validation error: {str(e)}"
226
+
227
+
228
+ def fetch_reviews_with_time_partition(base_query, start_date, end_date, headers, prs_by_url, debug_limit=None, depth=0):
229
+ """
230
+ Fetch reviews within a specific time range using time-based partitioning.
231
+ Recursively splits the time range if hitting the 1000-result limit.
232
+ Supports splitting by day, hour, minute, and second as needed.
233
+
234
+ Args:
235
+ debug_limit: If set, stops fetching after this many NEW reviews total across all partitions (for testing)
236
+ depth: Current recursion depth (for tracking)
237
+
238
+ Returns the number of reviews found in this time partition.
239
+ """
240
+ # Calculate time difference
241
+ time_diff = end_date - start_date
242
+ total_seconds = time_diff.total_seconds()
243
+
244
+ # Determine granularity and format dates accordingly
245
+ if total_seconds >= 86400: # >= 1 day
246
+ # Use day granularity (YYYY-MM-DD)
247
+ start_str = start_date.strftime('%Y-%m-%d')
248
+ end_str = end_date.strftime('%Y-%m-%d')
249
+ elif total_seconds >= 3600: # >= 1 hour but < 1 day
250
+ # Use hour granularity (YYYY-MM-DDTHH:MM:SSZ)
251
+ start_str = start_date.strftime('%Y-%m-%dT%H:00:00Z')
252
+ end_str = end_date.strftime('%Y-%m-%dT%H:59:59Z')
253
+ elif total_seconds >= 60: # >= 1 minute but < 1 hour
254
+ # Use minute granularity (YYYY-MM-DDTHH:MM:SSZ)
255
+ start_str = start_date.strftime('%Y-%m-%dT%H:%M:00Z')
256
+ end_str = end_date.strftime('%Y-%m-%dT%H:%M:59Z')
257
+ else: # < 1 minute
258
+ # Use second granularity (YYYY-MM-DDTHH:MM:SSZ)
259
+ start_str = start_date.strftime('%Y-%m-%dT%H:%M:%SZ')
260
+ end_str = end_date.strftime('%Y-%m-%dT%H:%M:%SZ')
261
+
262
+ # Add date range to query (use created for PR search)
263
+ query = f'{base_query} created:{start_str}..{end_str}'
264
+
265
+ indent = " " + " " * depth
266
+ print(f"{indent}Searching range {start_str} to {end_str}...")
267
+
268
+ page = 1
269
+ per_page = 100
270
+ total_in_partition = 0
271
+
272
+ while True:
273
+ # Check debug limit GLOBALLY (total unique PRs across all partitions)
274
+ if debug_limit is not None and len(prs_by_url) >= debug_limit:
275
+ print(f"{indent} πŸ› DEBUG MODE: Reached global limit of {debug_limit} PRs, stopping...")
276
+ return total_in_partition
277
+ url = 'https://api.github.com/search/issues' # Use issues endpoint for PR search
278
+ params = {
279
+ 'q': query,
280
+ 'per_page': per_page,
281
+ 'page': page,
282
+ 'sort': 'created',
283
+ 'order': 'asc'
284
+ }
285
+ headers_with_accept = headers.copy() if headers else {}
286
+
287
+ try:
288
+ response = request_with_backoff('GET', url, headers=headers_with_accept, params=params)
289
+ if response is None:
290
+ print(f"{indent} Error: retries exhausted for range {start_str} to {end_str}")
291
+ return total_in_partition
292
+
293
+ if response.status_code != 200:
294
+ print(f"{indent} Error: HTTP {response.status_code} for range {start_str} to {end_str}")
295
+ return total_in_partition
296
+
297
+ data = response.json()
298
+ total_count = data.get('total_count', 0)
299
+ items = data.get('items', [])
300
+
301
+ if not items:
302
+ break
303
+
304
+ # Add PR reviews to global dict (keyed by PR URL)
305
+ for pr in items:
306
+ pr_url = pr.get('html_url')
307
+ pr_number = pr.get('number')
308
+ # Use PR URL as unique key (more reliable than number alone)
309
+ if pr_url and pr_url not in prs_by_url:
310
+ prs_by_url[pr_url] = pr
311
+ total_in_partition += 1
312
+
313
+ # Check if we hit the 1000-result limit
314
+ if total_count > 1000 and page == 10:
315
+ print(f"{indent} ⚠️ Hit 1000-result limit ({total_count} total). Splitting time range...")
316
+
317
+ # Determine how to split based on time range duration
318
+ if total_seconds < 2: # Less than 2 seconds - can't split further
319
+ print(f"{indent} ⚠️ Cannot split further (range < 2 seconds). Some results may be missing.")
320
+ break
321
+
322
+ elif total_seconds < 120: # Less than 2 minutes - split by seconds
323
+ # Split into 2-4 parts depending on range
324
+ num_splits = min(4, max(2, int(total_seconds / 30)))
325
+ split_duration = time_diff / num_splits
326
+ split_dates = [start_date + split_duration * i for i in range(num_splits + 1)]
327
+
328
+ total_from_splits = 0
329
+ for i in range(num_splits):
330
+ split_start = split_dates[i]
331
+ split_end = split_dates[i + 1]
332
+ # Avoid overlapping ranges (add 1 second to start)
333
+ if i > 0:
334
+ split_start = split_start + timedelta(seconds=1)
335
+
336
+ count = fetch_reviews_with_time_partition(
337
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
338
+ )
339
+ total_from_splits += count
340
+
341
+ return total_from_splits
342
+
343
+ elif total_seconds < 7200: # Less than 2 hours - split by minutes
344
+ # Split into 2-4 parts
345
+ num_splits = min(4, max(2, int(total_seconds / 1800)))
346
+ split_duration = time_diff / num_splits
347
+ split_dates = [start_date + split_duration * i for i in range(num_splits + 1)]
348
+
349
+ total_from_splits = 0
350
+ for i in range(num_splits):
351
+ split_start = split_dates[i]
352
+ split_end = split_dates[i + 1]
353
+ # Avoid overlapping ranges (add 1 minute to start)
354
+ if i > 0:
355
+ split_start = split_start + timedelta(minutes=1)
356
+
357
+ count = fetch_reviews_with_time_partition(
358
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
359
+ )
360
+ total_from_splits += count
361
+
362
+ return total_from_splits
363
+
364
+ elif total_seconds < 172800: # Less than 2 days - split by hours
365
+ # Split into 2-4 parts
366
+ num_splits = min(4, max(2, int(total_seconds / 43200)))
367
+ split_duration = time_diff / num_splits
368
+ split_dates = [start_date + split_duration * i for i in range(num_splits + 1)]
369
+
370
+ total_from_splits = 0
371
+ for i in range(num_splits):
372
+ split_start = split_dates[i]
373
+ split_end = split_dates[i + 1]
374
+ # Avoid overlapping ranges (add 1 hour to start)
375
+ if i > 0:
376
+ split_start = split_start + timedelta(hours=1)
377
+
378
+ count = fetch_reviews_with_time_partition(
379
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
380
+ )
381
+ total_from_splits += count
382
+
383
+ return total_from_splits
384
+
385
+ else: # 2+ days - split by days
386
+ days_diff = time_diff.days
387
+
388
+ # Use aggressive splitting for large ranges or deep recursion
389
+ # Split into 4 parts if range is > 30 days, otherwise split in half
390
+ if days_diff > 30 or depth > 5:
391
+ # Split into 4 parts for more aggressive partitioning
392
+ quarter_diff = time_diff / 4
393
+ split_dates = [
394
+ start_date,
395
+ start_date + quarter_diff,
396
+ start_date + quarter_diff * 2,
397
+ start_date + quarter_diff * 3,
398
+ end_date
399
+ ]
400
+
401
+ total_from_splits = 0
402
+ for i in range(4):
403
+ split_start = split_dates[i]
404
+ split_end = split_dates[i + 1]
405
+ # Avoid overlapping ranges
406
+ if i > 0:
407
+ split_start = split_start + timedelta(days=1)
408
+
409
+ count = fetch_reviews_with_time_partition(
410
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
411
+ )
412
+ total_from_splits += count
413
+
414
+ return total_from_splits
415
+ else:
416
+ # Binary split for smaller ranges
417
+ mid_date = start_date + time_diff / 2
418
+
419
+ # Recursively fetch both halves
420
+ count1 = fetch_reviews_with_time_partition(
421
+ base_query, start_date, mid_date, headers, prs_by_url, debug_limit, depth + 1
422
+ )
423
+ count2 = fetch_reviews_with_time_partition(
424
+ base_query, mid_date + timedelta(days=1), end_date, headers, prs_by_url, debug_limit, depth + 1
425
+ )
426
+
427
+ return count1 + count2
428
+
429
+ # Normal pagination: check if there are more pages
430
+ if len(items) < per_page or page >= 10:
431
+ break
432
+
433
+ page += 1
434
+ time.sleep(0.5) # Courtesy delay between pages
435
+
436
+ except Exception as e:
437
+ print(f"{indent} Error fetching range {start_str} to {end_str}: {str(e)}")
438
+ return total_in_partition
439
+
440
+ if total_in_partition > 0:
441
+ print(f"{indent} βœ“ Found {total_in_partition} reviews in range {start_str} to {end_str}")
442
+
443
+ return total_in_partition
444
+
445
+
446
+ def extract_review_metadata(pr):
447
+ """
448
+ Extract minimal PR review metadata for efficient storage.
449
+ Only keeps essential fields: html_url, reviewed_at, pr_status, pr_merged, pr_closed_at, pr_url, and review_id.
450
+ Note: agent_name is not stored as it's inferred from the folder structure.
451
+
452
+ PR status:
453
+ - pr_status: 'open', 'merged', or 'closed'
454
+ - pr_merged: True if PR was merged (accepted), False otherwise
455
+ - pr_closed_at: Date when PR was closed/merged (if applicable)
456
+
457
+ Accepted PR = PR that was merged after agent review
458
+ Rejected PR = PR that was closed without merging after agent review
459
+ """
460
+ # Extract PR metadata from search results
461
+ # The GitHub search API returns PR data from /search/issues endpoint
462
+ pr_url = pr.get('html_url')
463
+ pr_number = pr.get('number')
464
+ created_at = pr.get('created_at')
465
+ closed_at = pr.get('closed_at')
466
+ state = pr.get('state', 'open') # open or closed
467
+
468
+ # Check if PR has pull_request field (indicates it's a PR, not an issue)
469
+ pull_request_data = pr.get('pull_request', {})
470
+
471
+ # For initial extraction, we don't know if merged yet
472
+ # This will be updated by update_pr_status function
473
+ pr_merged = pull_request_data.get('merged_at') is not None if pull_request_data else False
474
+
475
+ # Determine initial status
476
+ if pr_merged:
477
+ status = 'merged'
478
+ elif state == 'closed':
479
+ status = 'closed'
480
+ else:
481
+ status = 'open'
482
+
483
+ return {
484
+ 'html_url': pr_url,
485
+ 'reviewed_at': created_at, # When the PR was created (agent reviewed it)
486
+ 'pr_status': status,
487
+ 'pr_merged': pr_merged,
488
+ 'pr_closed_at': closed_at,
489
+ 'pr_url': pr_url, # Store PR URL for tracking
490
+ 'review_id': f"pr_{pr_number}" # Use PR number for deduplication
491
+ }
492
+
493
+
494
+ def update_pr_status(metadata_list, headers, token):
495
+ """
496
+ Update PR status for reviews to get current merged/closed state.
497
+
498
+ For each PR associated with a review, fetch current status from GitHub API.
499
+ Updates metadata_list in-place with PR status information.
500
+
501
+ In DEBUG MODE: Skips status updates to avoid API rate limits.
502
+
503
+ Args:
504
+ metadata_list: List of review metadata dictionaries
505
+ headers: HTTP headers for GitHub API
506
+ token: GitHub API token
507
+
508
+ Returns:
509
+ Updated metadata_list with current PR status
510
+ """
511
+ if not metadata_list:
512
+ return metadata_list
513
+
514
+ # In debug mode, skip status updates to avoid excessive API calls
515
+ if DEBUG_MODE:
516
+ print(f" πŸ› DEBUG MODE: Skipping PR status updates for {len(metadata_list)} reviews")
517
+ return metadata_list
518
+
519
+ # Track unique PRs to avoid duplicate API calls
520
+ pr_url_to_status = {}
521
+ updated_count = 0
522
+
523
+ for metadata in metadata_list:
524
+ pr_url = metadata.get('pr_url')
525
+ if not pr_url:
526
+ continue
527
+
528
+ # Skip if already fetched for this PR
529
+ if pr_url in pr_url_to_status:
530
+ status_info = pr_url_to_status[pr_url]
531
+ metadata['pr_status'] = status_info['status']
532
+ metadata['pr_merged'] = status_info['merged']
533
+ metadata['pr_closed_at'] = status_info['closed_at']
534
+ continue
535
+
536
+ try:
537
+ # Convert HTML URL to API URL
538
+ # https://github.com/owner/repo/pull/123 -> https://api.github.com/repos/owner/repo/pulls/123
539
+ parts = pr_url.replace('https://github.com/', '').split('/')
540
+ if len(parts) >= 4:
541
+ owner, repo, pull_word, pr_number = parts[0], parts[1], parts[2], parts[3]
542
+ api_url = f'https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}'
543
+
544
+ response = request_with_backoff('GET', api_url, headers=headers, max_retries=3)
545
+
546
+ if response and response.status_code == 200:
547
+ pr_data = response.json()
548
+ state = pr_data.get('state', 'open')
549
+ merged = pr_data.get('merged', False)
550
+ closed_at = pr_data.get('closed_at')
551
+ merged_at = pr_data.get('merged_at')
552
+
553
+ # Determine final status
554
+ if merged:
555
+ status = 'merged'
556
+ elif state == 'closed':
557
+ status = 'closed'
558
+ else:
559
+ status = 'open'
560
+
561
+ status_info = {
562
+ 'status': status,
563
+ 'merged': merged,
564
+ 'closed_at': closed_at or merged_at
565
+ }
566
+
567
+ # Cache and update
568
+ pr_url_to_status[pr_url] = status_info
569
+ metadata['pr_status'] = status
570
+ metadata['pr_merged'] = merged
571
+ metadata['pr_closed_at'] = closed_at or merged_at
572
+ updated_count += 1
573
+
574
+ # Small delay to avoid rate limiting
575
+ time.sleep(0.1)
576
+
577
+ except Exception as e:
578
+ print(f" Warning: Could not check PR status for {pr_url}: {e}")
579
+ continue
580
+
581
+ if updated_count > 0:
582
+ print(f" βœ“ Updated status for {updated_count} unique PRs")
583
+
584
+ return metadata_list
585
+
586
+
587
+ def fetch_all_reviews_metadata(identifier, agent_name, token=None, start_from_date=None, year=None, exclude_dates=None):
588
+ """
589
+ Fetch PR reviews associated with a GitHub user or bot for the past 6 months.
590
+ Returns lightweight metadata instead of full review objects.
591
+
592
+ This function employs time-based partitioning to navigate GitHub's 1000-result limit per query.
593
+ It searches using the query pattern:
594
+ - reviewed-by:{identifier} (PR reviews by the agent)
595
+
596
+ After fetching reviews, it updates PR status to determine if PRs were merged or closed.
597
+
598
+ Args:
599
+ identifier: GitHub username or bot identifier
600
+ agent_name: Human-readable name of the agent for metadata purposes
601
+ token: GitHub API token for authentication
602
+ start_from_date: Only fetch reviews created after this date (for incremental updates)
603
+ year: Year parameter (deprecated, retained for compatibility but not utilized)
604
+ exclude_dates: Set of date objects to exclude from mining (dates that have already been processed)
605
+
606
+ Returns:
607
+ List of dictionaries containing minimal PR review metadata with PR status
608
+ """
609
+ headers = {'Authorization': f'token {token}'} if token else {}
610
+
611
+ # Debug mode: limit review retrieval for testing
612
+ debug_limit_per_pattern = 10 if DEBUG_MODE else None
613
+
614
+ if DEBUG_MODE:
615
+ print(f"\nπŸ› DEBUG MODE ENABLED: Limiting to {debug_limit_per_pattern} reviews per query pattern")
616
+
617
+ # Define query pattern for PR reviews:
618
+ query_patterns = []
619
+
620
+ # Add reviewed-by pattern for PR reviews
621
+ query_patterns.append(f'is:pr reviewed-by:{identifier}')
622
+
623
+ # Use a dict to deduplicate PRs by URL
624
+ prs_by_url = {}
625
+
626
+ # Define time range: past 6 months only (or from start_from_date if specified)
627
+ current_time = datetime.now(timezone.utc)
628
+ six_months_ago = current_time - timedelta(days=180) # ~6 months
629
+
630
+ if start_from_date:
631
+ # Use start_from_date but ensure it's not older than 6 months
632
+ start_date = max(start_from_date, six_months_ago)
633
+ else:
634
+ start_date = six_months_ago
635
+
636
+ # End date is current time
637
+ end_date = current_time
638
+
639
+ for query_pattern in query_patterns:
640
+ print(f"\nπŸ” Searching with query: {query_pattern}")
641
+ print(f" Time range: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
642
+
643
+ pattern_start_time = time.time()
644
+ initial_count = len(prs_by_url)
645
+
646
+ # Fetch with time partitioning
647
+ reviews_found = fetch_reviews_with_time_partition(
648
+ query_pattern,
649
+ start_date,
650
+ end_date,
651
+ headers,
652
+ prs_by_url,
653
+ debug_limit_per_pattern
654
+ )
655
+
656
+ pattern_duration = time.time() - pattern_start_time
657
+ new_reviews = len(prs_by_url) - initial_count
658
+
659
+ print(f" βœ“ Pattern complete: {new_reviews} new PRs found ({reviews_found} total fetched, {len(prs_by_url) - initial_count - (reviews_found - new_reviews)} duplicates)")
660
+ print(f" ⏱️ Time taken: {pattern_duration:.1f} seconds")
661
+
662
+ # Delay between different query patterns (shorter in debug mode)
663
+ time.sleep(0.2 if DEBUG_MODE else 1.0)
664
+
665
+ # Convert to lightweight metadata
666
+ all_prs = list(prs_by_url.values())
667
+
668
+ # Filter out PRs from excluded dates if specified
669
+ if exclude_dates:
670
+ filtered_prs = []
671
+ excluded_count = 0
672
+ for pr in all_prs:
673
+ created_at = pr.get('created_at')
674
+ if created_at:
675
+ try:
676
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
677
+ pr_date = dt.date()
678
+ if pr_date not in exclude_dates:
679
+ filtered_prs.append(pr)
680
+ else:
681
+ excluded_count += 1
682
+ except Exception:
683
+ filtered_prs.append(pr) # Keep PRs with unparseable dates
684
+ else:
685
+ filtered_prs.append(pr) # Keep PRs without created_at
686
+
687
+ if excluded_count > 0:
688
+ print(f" ⏭️ Skipped {excluded_count} PRs from already-mined dates")
689
+ all_prs = filtered_prs
690
+
691
+ if DEBUG_MODE:
692
+ print(f"\nβœ… COMPLETE (DEBUG MODE): Found {len(all_prs)} unique PRs reviewed by {identifier}")
693
+ print(f" Note: In production mode, this would fetch ALL PRs")
694
+ else:
695
+ print(f"\nβœ… COMPLETE: Found {len(all_prs)} unique PRs reviewed by {identifier}")
696
+ print(f"πŸ“¦ Extracting minimal metadata and updating PR status...")
697
+
698
+ # Extract metadata for each PR review
699
+ metadata_list = [extract_review_metadata(pr) for pr in all_prs]
700
+
701
+ # Update PR status to get current merged/closed state
702
+ print(f"πŸ” Updating PR status for reviewed PRs...")
703
+ metadata_list = update_pr_status(metadata_list, headers, token)
704
+
705
+ # Calculate memory savings
706
+ import sys
707
+ original_size = sys.getsizeof(str(all_prs))
708
+ metadata_size = sys.getsizeof(str(metadata_list))
709
+ savings_pct = ((original_size - metadata_size) / original_size * 100) if original_size > 0 else 0
710
+
711
+ print(f"πŸ’Ύ Memory efficiency: {original_size // 1024}KB β†’ {metadata_size // 1024}KB (saved {savings_pct:.1f}%)")
712
+
713
+ return metadata_list
714
+
715
+
716
+ def calculate_review_stats_from_metadata(metadata_list):
717
+ """
718
+ Calculate statistics from a list of review metadata (lightweight objects).
719
+ Works with minimal metadata: html_url, reviewed_at, pr_status, pr_merged, pr_closed_at.
720
+
721
+ Returns a dictionary with comprehensive review metrics.
722
+
723
+ Acceptance Rate is calculated as:
724
+ accepted PRs / (accepted PRs + rejected PRs) * 100
725
+
726
+ Accepted PRs = PRs that were merged (pr_status='merged')
727
+ Rejected PRs = PRs that were closed without merging (pr_status='closed')
728
+ Pending PRs = PRs still open (pr_status='open') - excluded from acceptance rate
729
+ """
730
+ total_reviews = len(metadata_list)
731
+
732
+ # Count accepted PRs (merged)
733
+ accepted_prs = sum(1 for review_meta in metadata_list
734
+ if review_meta.get('pr_status') == 'merged')
735
+
736
+ # Count rejected PRs (closed without merging)
737
+ rejected_prs = sum(1 for review_meta in metadata_list
738
+ if review_meta.get('pr_status') == 'closed')
739
+
740
+ # Count pending PRs (still open)
741
+ pending_prs = sum(1 for review_meta in metadata_list
742
+ if review_meta.get('pr_status') == 'open')
743
+
744
+ # Calculate acceptance rate (exclude pending PRs)
745
+ completed_prs = accepted_prs + rejected_prs
746
+ acceptance_rate = (accepted_prs / completed_prs * 100) if completed_prs > 0 else 0
747
+
748
+ return {
749
+ 'total_reviews': total_reviews,
750
+ 'accepted_prs': accepted_prs,
751
+ 'rejected_prs': rejected_prs,
752
+ 'pending_prs': pending_prs,
753
+ 'acceptance_rate': round(acceptance_rate, 2),
754
+ }
755
+
756
+
757
+ def calculate_monthly_metrics_by_agent():
758
+ """
759
+ Calculate monthly metrics for all agents for visualization.
760
+ Loads data directly from SWE-Arena/review_metadata dataset for the current year.
761
+
762
+ Returns:
763
+ dict: {
764
+ 'agents': list of agent names,
765
+ 'months': list of month labels (e.g., '2025-01'),
766
+ 'data': {
767
+ agent_name: {
768
+ 'acceptance_rates': list of acceptance rates by month,
769
+ 'total_reviews': list of review counts by month,
770
+ 'accepted_prs': list of accepted PR counts by month,
771
+ 'rejected_prs': list of rejected PR counts by month
772
+ }
773
+ }
774
+ }
775
+ """
776
+ # Get current year for loading metadata
777
+ current_year = datetime.now().year
778
+
779
+ # Load ALL agents from HuggingFace agents repo
780
+ agents = load_agents_from_hf()
781
+
782
+ # Create mapping from agent_identifier to agent_name
783
+ identifier_to_name = {agent.get('github_identifier'): agent.get('agent_name') for agent in agents if agent.get('github_identifier')}
784
+
785
+ # Load all review metadata for current year from review_metadata dataset
786
+ all_metadata = load_review_metadata_for_year(current_year)
787
+
788
+ if not all_metadata:
789
+ return {'agents': [], 'months': [], 'data': {}}
790
+
791
+ # Group by agent and month
792
+ agent_month_data = defaultdict(lambda: defaultdict(list))
793
+
794
+ for review_meta in all_metadata:
795
+ agent_identifier = review_meta.get('agent_identifier')
796
+ reviewed_at = review_meta.get('reviewed_at')
797
+
798
+ if not agent_identifier or not reviewed_at:
799
+ continue
800
+
801
+ # Get agent_name from identifier
802
+ agent_name = identifier_to_name.get(agent_identifier, agent_identifier)
803
+
804
+ try:
805
+ dt = datetime.fromisoformat(reviewed_at.replace('Z', '+00:00'))
806
+ month_key = f"{dt.year}-{dt.month:02d}"
807
+ agent_month_data[agent_name][month_key].append(review_meta)
808
+ except Exception as e:
809
+ print(f"Warning: Could not parse date '{reviewed_at}': {e}")
810
+ continue
811
+
812
+ # Get all unique months and sort them
813
+ all_months = set()
814
+ for agent_data in agent_month_data.values():
815
+ all_months.update(agent_data.keys())
816
+ months = sorted(list(all_months))
817
+
818
+ # Calculate metrics for each agent and month
819
+ result_data = {}
820
+ for agent_name, month_dict in agent_month_data.items():
821
+ acceptance_rates = []
822
+ total_reviews_list = []
823
+ accepted_prs_list = []
824
+ rejected_prs_list = []
825
+
826
+ for month in months:
827
+ reviews_in_month = month_dict.get(month, [])
828
+
829
+ # Count accepted PRs (merged)
830
+ accepted_count = sum(1 for review in reviews_in_month
831
+ if review.get('pr_status') == 'merged')
832
+
833
+ # Count rejected PRs (closed without merging)
834
+ rejected_count = sum(1 for review in reviews_in_month
835
+ if review.get('pr_status') == 'closed')
836
+
837
+ # Total reviews created in this month
838
+ total_count = len(reviews_in_month)
839
+
840
+ # Calculate acceptance rate (exclude pending PRs)
841
+ completed_count = accepted_count + rejected_count
842
+ acceptance_rate = (accepted_count / completed_count * 100) if completed_count > 0 else None
843
+
844
+ acceptance_rates.append(acceptance_rate)
845
+ total_reviews_list.append(total_count)
846
+ accepted_prs_list.append(accepted_count)
847
+ rejected_prs_list.append(rejected_count)
848
+
849
+ result_data[agent_name] = {
850
+ 'acceptance_rates': acceptance_rates,
851
+ 'total_reviews': total_reviews_list,
852
+ 'accepted_prs': accepted_prs_list,
853
+ 'rejected_prs': rejected_prs_list
854
+ }
855
+
856
+ return {
857
+ 'agents': sorted(list(agent_month_data.keys())),
858
+ 'months': months,
859
+ 'data': result_data
860
+ }
861
+
862
+
863
+ # =============================================================================
864
+ # REVIEW METADATA STORAGE & RETRIEVAL
865
+ # =============================================================================
866
+
867
+ def group_metadata_by_date(metadata_list):
868
+ """
869
+ Group review metadata by exact date (year.month.day) for efficient daily storage.
870
+ Returns dict: {(year, month, day): [metadata_list]}
871
+ """
872
+ grouped = defaultdict(list)
873
+
874
+ for review_meta in metadata_list:
875
+ reviewed_at = review_meta.get('reviewed_at')
876
+ if not reviewed_at:
877
+ continue
878
+
879
+ try:
880
+ dt = datetime.fromisoformat(reviewed_at.replace('Z', '+00:00'))
881
+ key = (dt.year, dt.month, dt.day)
882
+ grouped[key].append(review_meta)
883
+ except Exception as e:
884
+ print(f"Warning: Could not parse date '{reviewed_at}': {e}")
885
+
886
+ return dict(grouped)
887
+
888
+
889
+ def save_review_metadata_to_hf(metadata_list, agent_identifier):
890
+ """
891
+ Save review metadata to HuggingFace dataset, organized by [agent_identifier]/YYYY.MM.DD.jsonl.
892
+ Each file is stored in the agent's folder and named YYYY.MM.DD.jsonl for that day's reviews.
893
+ In debug mode, saves to in-memory cache only.
894
+
895
+ This function APPENDS new metadata and DEDUPLICATES by review_id.
896
+
897
+ Args:
898
+ metadata_list: List of review metadata dictionaries
899
+ agent_identifier: GitHub identifier of the agent (used as folder name)
900
+ """
901
+ # Skip saving to HF in debug mode - use in-memory cache instead
902
+ if DEBUG_MODE:
903
+ global DEBUG_REVIEW_METADATA_CACHE
904
+ # Merge with existing cache, deduplicating by review_id
905
+ existing = {review['review_id']: review for review in DEBUG_REVIEW_METADATA_CACHE[agent_identifier] if review.get('review_id')}
906
+ new = {review['review_id']: review for review in metadata_list if review.get('review_id')}
907
+ existing.update(new)
908
+ DEBUG_REVIEW_METADATA_CACHE[agent_identifier] = list(existing.values())
909
+ print(f"πŸ› DEBUG MODE: Saved to in-memory cache only ({len(metadata_list)} reviews) - NOT saved to HuggingFace")
910
+ return True
911
+
912
+ try:
913
+ token = get_hf_token()
914
+ if not token:
915
+ raise Exception("No HuggingFace token found")
916
+
917
+ api = HfApi()
918
+
919
+ # Group by exact date (year, month, day)
920
+ grouped = group_metadata_by_date(metadata_list)
921
+
922
+ for (review_year, month, day), day_metadata in grouped.items():
923
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
924
+ filename = f"{agent_identifier}/{review_year}.{month:02d}.{day:02d}.jsonl"
925
+ local_filename = f"{review_year}.{month:02d}.{day:02d}.jsonl"
926
+ print(f"πŸ“€ Uploading {len(day_metadata)} reviews to {filename}...")
927
+
928
+ # Download existing file if it exists
929
+ existing_metadata = []
930
+ try:
931
+ file_path = hf_hub_download(
932
+ repo_id=REVIEW_METADATA_REPO,
933
+ filename=filename,
934
+ repo_type="dataset",
935
+ token=token
936
+ )
937
+ existing_metadata = load_jsonl(file_path)
938
+ print(f" Found {len(existing_metadata)} existing reviews in {filename}")
939
+ except Exception:
940
+ print(f" No existing file found for {filename}, creating new")
941
+
942
+ # Merge and deduplicate by review_id
943
+ existing_by_id = {meta['review_id']: meta for meta in existing_metadata if meta.get('review_id')}
944
+ new_by_id = {meta['review_id']: meta for meta in day_metadata if meta.get('review_id')}
945
+
946
+ # Update with new data (new data overwrites old)
947
+ existing_by_id.update(new_by_id)
948
+ merged_metadata = list(existing_by_id.values())
949
+
950
+ # Save locally
951
+ save_jsonl(local_filename, merged_metadata)
952
+
953
+ try:
954
+ # Upload to HuggingFace with folder path
955
+ upload_with_retry(
956
+ api=api,
957
+ path_or_fileobj=local_filename,
958
+ path_in_repo=filename,
959
+ repo_id=REVIEW_METADATA_REPO,
960
+ repo_type="dataset",
961
+ token=token
962
+ )
963
+ print(f" βœ“ Saved {len(merged_metadata)} total reviews to {filename}")
964
+ finally:
965
+ # Always clean up local file, even if upload fails
966
+ if os.path.exists(local_filename):
967
+ os.remove(local_filename)
968
+
969
+ return True
970
+
971
+ except Exception as e:
972
+ print(f"βœ— Error saving review metadata: {str(e)}")
973
+ return False
974
+
975
+
976
+ def load_review_metadata_for_year(year):
977
+ """
978
+ Load all review metadata for a specific year from HuggingFace.
979
+ Scans all agent folders and loads daily files matching the year.
980
+ In debug mode, loads from in-memory cache if available.
981
+
982
+ Structure: [agent_identifier]/YYYY.MM.DD.jsonl
983
+
984
+ Returns:
985
+ List of dictionaries with 'agent_identifier' added to each review metadata.
986
+ """
987
+ # In debug mode, check in-memory cache first
988
+ if DEBUG_MODE and DEBUG_REVIEW_METADATA_CACHE:
989
+ all_metadata = []
990
+ for agent_identifier, metadata_list in DEBUG_REVIEW_METADATA_CACHE.items():
991
+ for review_meta in metadata_list:
992
+ review_with_agent = review_meta.copy()
993
+ review_with_agent['agent_identifier'] = agent_identifier
994
+ all_metadata.append(review_with_agent)
995
+ if all_metadata:
996
+ print(f"πŸ› DEBUG MODE: Loading review metadata from in-memory cache ({len(all_metadata)} reviews)")
997
+ return all_metadata
998
+
999
+ try:
1000
+ api = HfApi()
1001
+ token = get_hf_token()
1002
+
1003
+ # List all files in the repository
1004
+ files = api.list_repo_files(repo_id=REVIEW_METADATA_REPO, repo_type="dataset")
1005
+
1006
+ # Filter for files matching the year pattern: [agent_identifier]/YYYY.MM.DD.jsonl
1007
+ # Extract year from filename
1008
+ year_str = str(year)
1009
+ year_files = []
1010
+ for f in files:
1011
+ if f.endswith('.jsonl'):
1012
+ parts = f.split('/')
1013
+ if len(parts) == 2: # [agent_identifier]/YYYY.MM.DD.jsonl
1014
+ filename = parts[1]
1015
+ if filename.startswith(year_str + '.'):
1016
+ year_files.append(f)
1017
+
1018
+ print(f"πŸ“₯ Loading review metadata for {year} ({len(year_files)} daily files across all agents)...")
1019
+
1020
+ all_metadata = []
1021
+ for filename in year_files:
1022
+ try:
1023
+ # Extract agent_identifier from path (first part)
1024
+ # Format: agent_identifier/YYYY.MM.DD.jsonl
1025
+ parts = filename.split('/')
1026
+ if len(parts) != 2:
1027
+ print(f" Warning: Unexpected filename format: {filename}")
1028
+ continue
1029
+
1030
+ agent_identifier = parts[0]
1031
+
1032
+ file_path = hf_hub_download(
1033
+ repo_id=REVIEW_METADATA_REPO,
1034
+ filename=filename,
1035
+ repo_type="dataset",
1036
+ token=token
1037
+ )
1038
+ day_metadata = load_jsonl(file_path)
1039
+
1040
+ # Add agent_identifier to each review metadata for processing
1041
+ for review_meta in day_metadata:
1042
+ review_meta['agent_identifier'] = agent_identifier
1043
+
1044
+ all_metadata.extend(day_metadata)
1045
+ print(f" βœ“ Loaded {len(day_metadata)} reviews from {filename}")
1046
+ except Exception as e:
1047
+ print(f" Warning: Could not load {filename}: {str(e)}")
1048
+
1049
+ print(f"βœ“ Loaded {len(all_metadata)} total reviews for {year}")
1050
+ return all_metadata
1051
+
1052
+ except Exception as e:
1053
+ print(f"βœ— Error loading review metadata for {year}: {str(e)}")
1054
+ return []
1055
+
1056
+
1057
+ def get_latest_review_date_for_agent(agent_identifier):
1058
+ """
1059
+ Get the latest review creation date for an agent from stored metadata.
1060
+ Used for incremental updates - only fetch reviews newer than this date.
1061
+
1062
+ Structure: [agent_identifier]/YYYY.MM.DD.jsonl
1063
+
1064
+ Args:
1065
+ agent_identifier: GitHub identifier of the agent
1066
+
1067
+ Returns:
1068
+ datetime or None if no existing reviews found.
1069
+ """
1070
+ try:
1071
+ api = HfApi()
1072
+ token = get_hf_token()
1073
+
1074
+ # List all files in the repository
1075
+ files = api.list_repo_files(repo_id=REVIEW_METADATA_REPO, repo_type="dataset")
1076
+
1077
+ # Filter for files in this agent's folder
1078
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
1079
+ agent_pattern = f"{agent_identifier}/"
1080
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
1081
+
1082
+ if not agent_files:
1083
+ return None
1084
+
1085
+ # Find latest created_at across all files
1086
+ latest_date = None
1087
+ for filename in agent_files:
1088
+ try:
1089
+ file_path = hf_hub_download(
1090
+ repo_id=REVIEW_METADATA_REPO,
1091
+ filename=filename,
1092
+ repo_type="dataset",
1093
+ token=token
1094
+ )
1095
+ metadata = load_jsonl(file_path)
1096
+
1097
+ for review_meta in metadata:
1098
+ reviewed_at = review_meta.get("reviewed_at")
1099
+ if reviewed_at:
1100
+ try:
1101
+ dt = datetime.fromisoformat(reviewed_at.replace("Z", "+00:00"))
1102
+ if latest_date is None or dt > latest_date:
1103
+ latest_date = dt
1104
+ except Exception:
1105
+ continue
1106
+ except Exception:
1107
+ continue
1108
+
1109
+ return latest_date
1110
+
1111
+ except Exception:
1112
+ return None
1113
+
1114
+
1115
+ def get_daily_files_last_n_months(agent_identifier, n_months=6):
1116
+ """
1117
+ Get list of daily file paths for an agent from the last N months.
1118
+
1119
+ Args:
1120
+ agent_identifier: GitHub identifier of the agent
1121
+ n_months: Number of months to look back (default: 6)
1122
+
1123
+ Returns:
1124
+ List of file paths in format: [agent_identifier]/YYYY.MM.DD.jsonl
1125
+ """
1126
+ try:
1127
+ api = HfApi()
1128
+ token = get_hf_token()
1129
+
1130
+ # Calculate date range
1131
+ today = datetime.now(timezone.utc)
1132
+ n_months_ago = today - timedelta(days=30 * n_months)
1133
+
1134
+ # List all files in the repository
1135
+ files = api.list_repo_files(repo_id=REVIEW_METADATA_REPO, repo_type="dataset")
1136
+
1137
+ # Filter for files in this agent's folder
1138
+ agent_pattern = f"{agent_identifier}/"
1139
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
1140
+
1141
+ # Filter by date range (extract date from filename)
1142
+ recent_files = []
1143
+ for filename in agent_files:
1144
+ try:
1145
+ # Extract date from filename: YYYY.MM.DD.jsonl
1146
+ parts = filename.split('/')
1147
+ if len(parts) != 2:
1148
+ continue
1149
+
1150
+ date_part = parts[1].replace('.jsonl', '') # Get YYYY.MM.DD
1151
+ date_components = date_part.split('.')
1152
+ if len(date_components) != 3:
1153
+ continue
1154
+
1155
+ file_year, file_month, file_day = map(int, date_components)
1156
+ file_date = datetime(file_year, file_month, file_day, tzinfo=timezone.utc)
1157
+
1158
+ # Include if within last n_months
1159
+ if n_months_ago <= file_date <= today:
1160
+ recent_files.append(filename)
1161
+ except Exception:
1162
+ continue
1163
+
1164
+ return recent_files
1165
+
1166
+ except Exception as e:
1167
+ print(f"Error getting daily files: {str(e)}")
1168
+ return []
1169
+
1170
+
1171
+ def get_already_mined_dates(agent_identifier, n_months=6):
1172
+ """
1173
+ Get set of dates that have already been mined for an agent.
1174
+
1175
+ Args:
1176
+ agent_identifier: GitHub identifier of the agent
1177
+ n_months: Number of months to look back (default: 6)
1178
+
1179
+ Returns:
1180
+ Set of date objects (datetime.date) that already have data files
1181
+ """
1182
+ try:
1183
+ api = HfApi()
1184
+
1185
+ # Calculate date range
1186
+ today = datetime.now(timezone.utc)
1187
+ n_months_ago = today - timedelta(days=30 * n_months)
1188
+
1189
+ # List all files in the repository
1190
+ files = api.list_repo_files(repo_id=REVIEW_METADATA_REPO, repo_type="dataset")
1191
+
1192
+ # Filter for files in this agent's folder
1193
+ agent_pattern = f"{agent_identifier}/"
1194
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
1195
+
1196
+ mined_dates = set()
1197
+ for filename in agent_files:
1198
+ try:
1199
+ # Extract date from filename: [agent_identifier]/YYYY.MM.DD.jsonl
1200
+ parts = filename.split('/')
1201
+ if len(parts) != 2:
1202
+ continue
1203
+
1204
+ date_part = parts[1].replace('.jsonl', '') # Get YYYY.MM.DD
1205
+ date_components = date_part.split('.')
1206
+ if len(date_components) != 3:
1207
+ continue
1208
+
1209
+ file_year, file_month, file_day = map(int, date_components)
1210
+ file_date = datetime(file_year, file_month, file_day, tzinfo=timezone.utc).date()
1211
+
1212
+ # Only include dates within the last n_months
1213
+ if n_months_ago.date() <= file_date <= today.date():
1214
+ mined_dates.add(file_date)
1215
+ except Exception as e:
1216
+ print(f" Warning: Could not parse date from filename {filename}: {e}")
1217
+ continue
1218
+
1219
+ return mined_dates
1220
+
1221
+ except Exception as e:
1222
+ print(f" Warning: Could not get already-mined dates for {agent_identifier}: {str(e)}")
1223
+ return set()
1224
+
1225
+
1226
+ def fetch_review_current_status(review_url, token):
1227
+ """
1228
+ Fetch the current status of a single review from the GitHub API.
1229
+
1230
+ Args:
1231
+ review_url: HTML URL of the review to check
1232
+ token: GitHub API token
1233
+
1234
+ Returns:
1235
+ Dictionary with 'state', 'state_reason' and 'closed_at', or None if the request failed
1236
+ """
1237
+ try:
1238
+ # Convert HTML URL to API URL
1239
+ # https://github.com/owner/repo/reviews/123 -> https://api.github.com/repos/owner/repo/reviews/123
1240
+ parts = review_url.replace('https://github.com/', '').split('/')
1241
+ if len(parts) < 4:
1242
+ return None
1243
+
1244
+ owner, repo, review_word, review_number = parts[0], parts[1], parts[2], parts[3]
1245
+ api_url = f'https://api.github.com/repos/{owner}/{repo}/reviews/{review_number}'
1246
+
1247
+ headers = {'Authorization': f'token {token}'} if token else {}
1248
+ response = request_with_backoff('GET', api_url, headers=headers, max_retries=3)
1249
+
1250
+ if response is None or response.status_code != 200:
1251
+ return None
1252
+
1253
+ review_data = response.json()
1254
+ state = review_data.get('state')
1255
+ state_reason = review_data.get('state_reason')
1256
+ closed_at = review_data.get('closed_at')
1257
+
1258
+ return {
1259
+ 'state': state,
1260
+ 'state_reason': state_reason,
1261
+ 'closed_at': closed_at
1262
+ }
1263
+
1264
+ except Exception as e:
1265
+ print(f" Error fetching review status for {review_url}: {str(e)}")
1266
+ return None
1267
+
1268
+
1269
+ def refresh_review_status_for_agent(agent_identifier, token):
1270
+ """
1271
+ Refresh status for all open reviews from the last 6 months for an agent.
1272
+ Reviews already marked as reverted are skipped; only the remaining open reviews are re-checked.
1273
+
1274
+ This implements the smart update strategy:
1275
+ - Skip reviews that are already closed/resolved
1276
+ - Fetch current status for open reviews
1277
+ - Update and save back to daily files
1278
+
1279
+ Args:
1280
+ agent_identifier: GitHub identifier of the agent
1281
+ token: GitHub API token
1282
+
1283
+ Returns:
1284
+ Tuple: (total_checked, updated_count)
1285
+ """
1286
+ print(f"\nπŸ”„ Refreshing open reviews for {agent_identifier} (last 6 months)...")
1287
+
1288
+ try:
1289
+ # Get daily files from last 6 months
1290
+ recent_files = get_daily_files_last_n_months(agent_identifier, n_months=6)
1291
+
1292
+ if not recent_files:
1293
+ print(f" No recent files found for {agent_identifier}")
1294
+ return (0, 0)
1295
+
1296
+ print(f" Found {len(recent_files)} daily files to check")
1297
+
1298
+ total_checked = 0
1299
+ updated_count = 0
1300
+
1301
+ # Process each file
1302
+ for filename in recent_files:
1303
+ try:
1304
+ # Download file
1305
+ file_path = hf_hub_download(
1306
+ repo_id=REVIEW_METADATA_REPO,
1307
+ filename=filename,
1308
+ repo_type="dataset",
1309
+ token=get_hf_token()
1310
+ )
1311
+ reviews = load_jsonl(file_path)
1312
+
1313
+ if not reviews:
1314
+ continue
1315
+
1316
+ updated_reviews = []
1317
+ file_had_updates = False
1318
+
1319
+ # Check each review
1320
+ for review in reviews:
1321
+ # Skip if already marked as reverted (nothing left to re-check)
1322
+ if review.get("is_reverted"):
1323
+ updated_reviews.append(review)
1324
+ continue
1325
+
1326
+ # Review may have been reverted, check status
1327
+ review_url = review.get("html_url")
1328
+
1329
+ if not review_url:
1330
+ updated_reviews.append(review)
1331
+ continue
1332
+
1333
+ total_checked += 1
+ current_status = fetch_review_current_status(review_url, token)
1334
+
1335
+ if current_status:
1336
+ # Check if status changed (now closed)
1337
+ if current_status['state'] == 'closed':
1338
+ print(f" βœ“ Review status changed: {review_url}")
1339
+ review['state'] = current_status['state']
1340
+ review['state_reason'] = current_status['state_reason']
1341
+ review['closed_at'] = current_status['closed_at']
1342
+ updated_count += 1
1343
+ file_had_updates = True
1344
+
1345
+ updated_reviews.append(review)
1346
+ time.sleep(0.1) # Rate limiting courtesy delay
1347
+
1348
+ # Save file if there were updates
1349
+ if file_had_updates:
1350
+ # Extract filename components for local save
1351
+ parts = filename.split('/')
1352
+ local_filename = parts[-1] # Just YYYY.MM.DD.jsonl
1353
+
1354
+ # Save locally
1355
+ save_jsonl(local_filename, updated_reviews)
1356
+
1357
+ try:
1358
+ # Upload back to HuggingFace
1359
+ api = HfApi()
1360
+ upload_with_retry(
1361
+ api=api,
1362
+ path_or_fileobj=local_filename,
1363
+ path_in_repo=filename,
1364
+ repo_id=REVIEW_METADATA_REPO,
1365
+ repo_type="dataset",
1366
+ token=get_hf_token()
1367
+ )
1368
+ print(f" πŸ’Ύ Updated {filename}")
1369
+ finally:
1370
+ # Always clean up local file, even if upload fails
1371
+ if os.path.exists(local_filename):
1372
+ os.remove(local_filename)
1373
+
1374
+ except Exception as e:
1375
+ print(f" Warning: Could not process {filename}: {str(e)}")
1376
+ continue
1377
+
1378
+ print(f" βœ… Refresh complete: {total_checked} open reviews checked, {updated_count} updated")
1379
+ return (total_checked, updated_count)
1380
+
1381
+ except Exception as e:
1382
+ print(f" βœ— Error refreshing reviews for {agent_identifier}: {str(e)}")
1383
+ return (0, 0)
1384
+
1385
+
1386
+ # =============================================================================
1387
+ # HUGGINGFACE DATASET OPERATIONS
1388
+ # =============================================================================
1389
+
1390
+ def load_agents_from_hf():
1391
+ """Load all agent metadata JSON files from HuggingFace dataset."""
1392
+ try:
1393
+ api = HfApi()
1394
+ agents = []
1395
+
1396
+ # List all files in the repository
1397
+ files = api.list_repo_files(repo_id=AGENTS_REPO, repo_type="dataset")
1398
+
1399
+ # Filter for JSON files only
1400
+ json_files = [f for f in files if f.endswith('.json')]
1401
+
1402
+ print(f"Found {len(json_files)} agent files in {AGENTS_REPO}")
1403
+
1404
+ # Download and parse each JSON file
1405
+ for json_file in json_files:
1406
+ try:
1407
+ file_path = hf_hub_download(
1408
+ repo_id=AGENTS_REPO,
1409
+ filename=json_file,
1410
+ repo_type="dataset"
1411
+ )
1412
+
1413
+ with open(file_path, 'r') as f:
1414
+ agent_data = json.load(f)
1415
+ agents.append(agent_data)
1416
+
1417
+ except Exception as e:
1418
+ print(f"Warning: Could not load {json_file}: {str(e)}")
1419
+ continue
1420
+
1421
+ print(f"βœ“ Loaded {len(agents)} agents from HuggingFace")
1422
+ return agents
1423
+
1424
+ except Exception as e:
1425
+ print(f"Could not load agents from HuggingFace: {str(e)}")
1426
+ return None
1427
+
1428
+
1429
+
1430
+
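Each agent lives in the agents dataset as a standalone `{identifier}.json` file in the repository root, with the fields collected by the submission form later in this file. A purely illustrative example of such a file's contents (all values made up):

```python
import json

# Illustrative agent record; the keys mirror what submit_agent() saves,
# the values are hypothetical.
agent = {
    "agent_name": "Example Review Bot",
    "organization": "Example Org",
    "github_identifier": "example-review-bot",
    "description": "Reviews pull requests and leaves inline suggestions.",
    "website": "https://example.com",
}

with open(f"{agent['github_identifier']}.json", "w") as f:
    json.dump(agent, f, indent=2)
```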
1431
+ def get_hf_token():
1432
+ """Get HuggingFace token from environment variables."""
1433
+ token = os.getenv('HF_TOKEN')
1434
+ if not token:
1435
+ print("Warning: HF_TOKEN not found in environment variables")
1436
+ return token
1437
+
1438
+
1439
+ def upload_with_retry(api, path_or_fileobj, path_in_repo, repo_id, repo_type, token, max_retries=5):
1440
+ """
1441
+ Upload file to HuggingFace with exponential backoff retry logic.
1442
+
1443
+ Args:
1444
+ api: HfApi instance
1445
+ path_or_fileobj: Local file path to upload
1446
+ path_in_repo: Target path in the repository
1447
+ repo_id: Repository ID
1448
+ repo_type: Type of repository (e.g., "dataset")
1449
+ token: HuggingFace token
1450
+ max_retries: Maximum number of retry attempts
1451
+
1452
+ Returns:
1453
+ True if upload succeeded, raises exception if all retries failed
1454
+ """
1455
+ delay = 2.0 # Initial delay in seconds
1456
+
1457
+ for attempt in range(max_retries):
1458
+ try:
1459
+ api.upload_file(
1460
+ path_or_fileobj=path_or_fileobj,
1461
+ path_in_repo=path_in_repo,
1462
+ repo_id=repo_id,
1463
+ repo_type=repo_type,
1464
+ token=token
1465
+ )
1466
+ if attempt > 0:
1467
+ print(f" βœ“ Upload succeeded on attempt {attempt + 1}/{max_retries}")
1468
+ return True
1469
+
1470
+ except Exception as e:
1471
+ if attempt < max_retries - 1:
1472
+ wait_time = delay + random.uniform(0, 1.0)
1473
+ print(f" ⚠️ Upload failed (attempt {attempt + 1}/{max_retries}): {str(e)}")
1474
+ print(f" ⏳ Retrying in {wait_time:.1f} seconds...")
1475
+ time.sleep(wait_time)
1476
+ delay = min(delay * 2, 60.0) # Exponential backoff, max 60s
1477
+ else:
1478
+ print(f" βœ— Upload failed after {max_retries} attempts: {str(e)}")
1479
+ raise
1480
+
1481
+
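`upload_with_retry` wraps `HfApi.upload_file` with exponential backoff (2 s, 4 s, 8 s, ... capped at 60 s, plus up to 1 s of jitter). A hedged usage sketch, assuming the module-level constants and `get_hf_token()` defined in this file and a hypothetical local daily file:

```python
from huggingface_hub import HfApi

api = HfApi()
# Hypothetical local file and destination path inside the review metadata dataset
upload_with_retry(
    api=api,
    path_or_fileobj="2024.06.15.jsonl",
    path_in_repo="my-agent-bot/2024.06.15.jsonl",
    repo_id=REVIEW_METADATA_REPO,
    repo_type="dataset",
    token=get_hf_token(),
    max_retries=5,
)
```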
1482
+ def save_agent_to_hf(data):
1483
+ """Save a new agent to HuggingFace dataset as {identifier}.json in root."""
1484
+ try:
1485
+ api = HfApi()
1486
+ token = get_hf_token()
1487
+
1488
+ if not token:
1489
+ raise Exception("No HuggingFace token found. Please set HF_TOKEN in your Space settings.")
1490
+
1491
+ identifier = data['github_identifier']
1492
+ filename = f"{identifier}.json"
1493
+
1494
+ # Save locally first
1495
+ with open(filename, 'w') as f:
1496
+ json.dump(data, f, indent=2)
1497
+
1498
+ try:
1499
+ # Upload to HuggingFace (root directory)
1500
+ upload_with_retry(
1501
+ api=api,
1502
+ path_or_fileobj=filename,
1503
+ path_in_repo=filename,
1504
+ repo_id=AGENTS_REPO,
1505
+ repo_type="dataset",
1506
+ token=token
1507
+ )
1508
+ print(f"βœ“ Saved agent to HuggingFace: {filename}")
1509
+ return True
1510
+ finally:
1511
+ # Always clean up local file, even if upload fails
1512
+ if os.path.exists(filename):
1513
+ os.remove(filename)
1514
+
1515
+ except Exception as e:
1516
+ print(f"βœ— Error saving agent: {str(e)}")
1517
+ return False
1518
+
1519
+
1520
+
1521
+
1522
+ # =============================================================================
1523
+ # DATA MANAGEMENT
1524
+ # =============================================================================
1525
+
1526
+ def update_all_agents_incremental():
1527
+ """
1528
+ Memory-efficient incremental update of review statistics for all agents.
1529
+
1530
+ Strategy:
1531
+ 1. For each agent, load existing data from SWE-Arena/review_metadata
1532
+ 2. Identify already-mined dates (based on filename: YYYY.MM.DD.jsonl)
1533
+ 3. Only fetch reviews from dates that haven't been mined yet (within last 6 months)
1534
+ 4. If no data exists at all, mine everything from scratch
1535
+ 5. Store minimal metadata (not full review objects) to avoid storage limits
1536
+ 6. Construct leaderboard from ALL stored metadata (last 6 months)
1537
+
1538
+ Returns dictionary of all agent data with current stats.
1539
+ """
1540
+ token = get_github_token()
1541
+ current_year = datetime.now().year
1542
+
1543
+ # Load agent metadata from HuggingFace
1544
+ agents = load_agents_from_hf()
1545
+ if not agents:
1546
+ print("No agents found in HuggingFace dataset")
1547
+ return {}
1548
+
1549
+ cache_dict = {}
1550
+
1551
+ # Update each agent
1552
+ for agent in agents:
1553
+ identifier = agent.get('github_identifier')
1554
+ agent_name = agent.get('agent_name', 'Unknown')
1555
+
1556
+ if not identifier:
1557
+ print(f"Warning: Skipping agent without identifier: {agent}")
1558
+ continue
1559
+
1560
+ try:
1561
+ print(f"\n{'='*80}")
1562
+ print(f"Processing: {agent_name} ({identifier})")
1563
+ print(f"{'='*80}")
1564
+
1565
+ # Get already-mined dates for this agent (last 6 months)
1566
+ already_mined_dates = get_already_mined_dates(identifier, n_months=6)
1567
+
1568
+ if already_mined_dates:
1569
+ print(f"πŸ“… Found {len(already_mined_dates)} already-mined dates")
1570
+ print(f" Skipping these dates and fetching only new data...")
1571
+ # Fetch only reviews from dates not yet mined
1572
+ new_metadata = fetch_all_reviews_metadata(
1573
+ identifier,
1574
+ agent_name,
1575
+ token,
1576
+ start_from_date=None, # Use full 6-month range
1577
+ exclude_dates=already_mined_dates # But exclude already-mined dates
1578
+ )
1579
+ else:
1580
+ print(f"πŸ“… No existing data found. Mining everything from scratch...")
1581
+ # Mine everything from scratch (full 6-month range)
1582
+ new_metadata = fetch_all_reviews_metadata(
1583
+ identifier,
1584
+ agent_name,
1585
+ token,
1586
+ start_from_date=None
1587
+ )
1588
+
1589
+ if new_metadata:
1590
+ # Save new metadata to HuggingFace (organized by agent_identifier/YYYY.MM.DD.jsonl)
1591
+ print(f"πŸ’Ύ Saving {len(new_metadata)} new review records...")
1592
+ save_review_metadata_to_hf(new_metadata, identifier)
1593
+ else:
1594
+ print(f" No new reviews to save")
1595
+
1596
+ # Load ALL metadata for current year to calculate stats (aggregates entire last 6 months)
1597
+ print(f"πŸ“Š Calculating statistics from ALL stored metadata (last 6 months)...")
1598
+ all_year_metadata = load_review_metadata_for_year(current_year)
1599
+
1600
+ # Filter for this specific agent
1601
+ agent_metadata = [review for review in all_year_metadata if review.get("agent_identifier") == identifier]
1602
+
1603
+ # Calculate stats from metadata
1604
+ stats = calculate_review_stats_from_metadata(agent_metadata)
1605
+
1606
+ # Merge metadata with stats
1607
+ cache_dict[identifier] = {
1608
+ 'agent_name': agent_name,
1609
+ 'website': agent.get('website', 'N/A'),
1610
+ 'github_identifier': identifier,
1611
+ **stats
1612
+ }
1613
+
1614
+ print(f"βœ“ Updated {identifier}: {stats['total_reviews']} reviews, {stats['acceptance_rate']}% acceptance rate")
1615
+
1616
+ except Exception as e:
1617
+ print(f"βœ— Error updating {identifier}: {str(e)}")
1618
+ import traceback
1619
+ traceback.print_exc()
1620
+ continue
1621
+
1622
+ return cache_dict
1623
+
1624
+
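`calculate_review_stats_from_metadata` is referenced throughout this section but defined elsewhere in the file. A minimal sketch of the aggregation it is expected to perform, assuming the `pr_merged`/`pr_status` fields written by the miner (accepted = merged after review, rejected = closed without merging):

```python
def calculate_review_stats_sketch(metadata):
    """Illustrative only: aggregate review metadata into leaderboard-style stats."""
    total = len(metadata)
    accepted = sum(1 for m in metadata if m.get("pr_merged"))
    rejected = sum(
        1 for m in metadata
        if m.get("pr_status") == "closed" and not m.get("pr_merged")
    )
    decided = accepted + rejected
    rate = round(100.0 * accepted / decided, 2) if decided else 0.0
    return {
        "total_reviews": total,
        "accepted_prs": accepted,
        "rejected_prs": rejected,
        "acceptance_rate": rate,
    }
```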
1625
+ def construct_leaderboard_from_metadata():
1626
+ """
1627
+ Construct leaderboard from stored review metadata instead of fetching all reviews.
1628
+ Much more memory-efficient and faster.
1629
+
1630
+ Returns dictionary of agent stats.
1631
+ """
1632
+ print("πŸ“Š Constructing leaderboard from review metadata...")
1633
+ current_year = datetime.now().year
1634
+
1635
+ # Load agents
1636
+ agents = load_agents_from_hf()
1637
+ if not agents:
1638
+ print("No agents found")
1639
+ return {}
1640
+
1641
+ # Load all review metadata for current year
1642
+ all_metadata = load_review_metadata_for_year(current_year)
1643
+
1644
+ cache_dict = {}
1645
+
1646
+ for agent in agents:
1647
+ identifier = agent.get('github_identifier')
1648
+ agent_name = agent.get('agent_name', 'Unknown')
1649
+
1650
+ # Filter metadata for this agent
1651
+ agent_metadata = [review for review in all_metadata if review.get("agent_identifier") == identifier]
1652
+
1653
+ # Calculate stats
1654
+ stats = calculate_review_stats_from_metadata(agent_metadata)
1655
+
1656
+ cache_dict[identifier] = {
1657
+ 'agent_name': agent_name,
1658
+ 'website': agent.get('website', 'N/A'),
1659
+ 'github_identifier': identifier,
1660
+ **stats
1661
+ }
1662
+
1663
+ return cache_dict
1664
+
1665
+
1666
+ def initialize_data():
1667
+ """
1668
+ Initialize data on application startup.
1669
+ Constructs leaderboard from review metadata.
1670
+
1671
+ In DEBUG MODE:
1672
+ - If no data available, automatically mine up to 10 reviews per query per agent
1673
+ - Does NOT save to HuggingFace datasets
1674
+ """
1675
+ print("πŸš€ Initializing leaderboard data...")
1676
+
1677
+ # Try constructing from review metadata (fast, memory-efficient)
1678
+ print(f"πŸ“‚ Checking {REVIEW_METADATA_REPO} for existing data...")
1679
+ try:
1680
+ cache_dict = construct_leaderboard_from_metadata()
1681
+ # Check if there's actually meaningful data (at least one agent with reviews)
1682
+ has_data = any(entry.get('total_reviews', 0) > 0 for entry in cache_dict.values())
1683
+ if cache_dict and has_data:
1684
+ print(f"βœ“ Found existing review metadata. Leaderboard constructed from {REVIEW_METADATA_REPO}")
1685
+ return
1686
+ else:
1687
+ print(f" No meaningful data found in {REVIEW_METADATA_REPO}")
1688
+ except Exception as e:
1689
+ print(f" Could not construct from metadata: {e}")
1690
+
1691
+ # If in debug mode and no data available, mine immediately
1692
+ if DEBUG_MODE:
1693
+ print("\nπŸ› DEBUG MODE: No data available, mining immediately (up to 10 reviews per query per agent)...")
1694
+ agents = load_agents_from_hf()
1695
+ if agents:
1696
+ print(f"βœ“ Loaded {len(agents)} agents from HuggingFace")
1697
+ print("⛏️ Mining GitHub data in debug mode (limited to 10 reviews per query)...")
1698
+ cache_dict = update_all_agents_incremental()
1699
+ print("βœ“ Debug mining complete (data NOT saved to HuggingFace)")
1700
+ return
1701
+ else:
1702
+ print("⚠️ No agents found. Waiting for first submission...")
1703
+ return
1704
+
1705
+ # Production mode: Fallback to full incremental mining from GitHub
1706
+ agents = load_agents_from_hf()
1707
+ if agents:
1708
+ print(f"βœ“ Loaded {len(agents)} agents from HuggingFace")
1709
+ print("⛏️ Mining GitHub data (this may take a while)...")
1710
+ cache_dict = update_all_agents_incremental()
1711
+ return
1712
+
1713
+ # No data available
1714
+ print("⚠️ No data sources available. Waiting for first submission...")
1715
+
1716
+
1717
+ # =============================================================================
1718
+ # UI FUNCTIONS
1719
+ # =============================================================================
1720
+
1721
+ def create_monthly_metrics_plot():
1722
+ """
1723
+ Create a Plotly figure with dual y-axes showing:
1724
+ - Left y-axis: Acceptance Rate (%) as line curves
1725
+ - Right y-axis: Total Reviews created as bar charts
1726
+
1727
+ Each agent gets a unique color for both their line and bars.
1728
+ """
1729
+ metrics = calculate_monthly_metrics_by_agent()
1730
+
1731
+ if not metrics['agents'] or not metrics['months']:
1732
+ # Return an empty figure with a message
1733
+ fig = go.Figure()
1734
+ fig.add_annotation(
1735
+ text="No data available for visualization",
1736
+ xref="paper", yref="paper",
1737
+ x=0.5, y=0.5, showarrow=False,
1738
+ font=dict(size=16)
1739
+ )
1740
+ fig.update_layout(
1741
+ title=None,
1742
+ xaxis_title=None,
1743
+ height=500
1744
+ )
1745
+ return fig
1746
+
1747
+ # Create figure with secondary y-axis
1748
+ fig = make_subplots(specs=[[{"secondary_y": True}]])
1749
+
1750
+ # Define colors for agents (using a color palette)
1751
+ colors = [
1752
+ '#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
1753
+ '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'
1754
+ ]
1755
+
1756
+ agents = metrics['agents']
1757
+ months = metrics['months']
1758
+ data = metrics['data']
1759
+
1760
+ # Add traces for each agent
1761
+ for idx, agent_name in enumerate(agents):
1762
+ color = colors[idx % len(colors)]
1763
+ agent_data = data[agent_name]
1764
+
1765
+ # Add line trace for acceptance rate (left y-axis)
1766
+ acceptance_rates = agent_data['acceptance_rates']
1767
+ # Filter out None values for plotting
1768
+ x_acceptance = [month for month, rate in zip(months, acceptance_rates) if rate is not None]
1769
+ y_acceptance = [rate for rate in acceptance_rates if rate is not None]
1770
+
1771
+ if x_acceptance and y_acceptance: # Only add trace if there's data
1772
+ fig.add_trace(
1773
+ go.Scatter(
1774
+ x=x_acceptance,
1775
+ y=y_acceptance,
1776
+ name=agent_name,
1777
+ mode='lines+markers',
1778
+ line=dict(color=color, width=2),
1779
+ marker=dict(size=6),
1780
+ legendgroup=agent_name,
1781
+ showlegend=True,
1782
+ hovertemplate='<b>%{fullData.name}</b><br>' +
1783
+ 'Month: %{x}<br>' +
1784
+ 'Acceptance Rate: %{y:.2f}%<br>' +
1785
+ '<extra></extra>'
1786
+ ),
1787
+ secondary_y=False
1788
+ )
1789
+
1790
+ # Add bar trace for total reviews (right y-axis)
1791
+ # Only show bars for months where agent has reviews
1792
+ x_bars = []
1793
+ y_bars = []
1794
+ for month, count in zip(months, agent_data['total_reviews']):
1795
+ if count > 0: # Only include months with reviews
1796
+ x_bars.append(month)
1797
+ y_bars.append(count)
1798
+
1799
+ if x_bars and y_bars: # Only add trace if there's data
1800
+ fig.add_trace(
1801
+ go.Bar(
1802
+ x=x_bars,
1803
+ y=y_bars,
1804
+ name=f"{agent_name} (Reviews)",
1805
+ marker=dict(color=color, opacity=0.6),
1806
+ legendgroup=agent_name,
1807
+ showlegend=False, # Don't show in legend (already shown for line)
1808
+ hovertemplate='<b>%{fullData.name}</b><br>' +
1809
+ 'Month: %{x}<br>' +
1810
+ 'Total Reviews: %{y}<br>' +
1811
+ '<extra></extra>',
1812
+ offsetgroup=agent_name # Group bars by agent for proper spacing
1813
+ ),
1814
+ secondary_y=True
1815
+ )
1816
+
1817
+ # Update axes labels
1818
+ fig.update_xaxes(title_text=None)
1819
+ fig.update_yaxes(title_text="<b>Acceptance Rate (%)</b>", secondary_y=False)
1820
+ fig.update_yaxes(title_text="<b>Total Reviews</b>", secondary_y=True)
1821
+
1822
+ # Update layout
1823
+ fig.update_layout(
1824
+ title=None,
1825
+ hovermode='x unified',
1826
+ barmode='group',
1827
+ height=600,
1828
+ legend=dict(
1829
+ orientation="h",
1830
+ yanchor="bottom",
1831
+ y=1.02,
1832
+ xanchor="right",
1833
+ x=1
1834
+ ),
1835
+ margin=dict(l=50, r=50, t=100, b=50)
1836
+ )
1837
+
1838
+ return fig
1839
+
1840
+
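`calculate_monthly_metrics_by_agent()` is defined elsewhere in the file; judging by how the figure indexes its result, the expected shape is roughly the following (month labels and numbers are hypothetical):

```python
# Illustrative structure only; the real function derives this from stored review metadata.
metrics = {
    "agents": ["Example Review Bot"],
    "months": ["2024-05", "2024-06"],
    "data": {
        "Example Review Bot": {
            "acceptance_rates": [75.0, None],  # None for months with no decided PRs
            "total_reviews": [12, 0],          # bars are only drawn for months with reviews
        }
    },
}
```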
1841
+ def get_leaderboard_dataframe():
1842
+ """
1843
+ Construct leaderboard from review metadata and convert to pandas DataFrame for display.
1844
+ Returns formatted DataFrame sorted by acceptance rate.
1845
+ """
1846
+ # Construct leaderboard from metadata
1847
+ cache_dict = construct_leaderboard_from_metadata()
1848
+
1849
+ if not cache_dict:
1850
+ # Return empty DataFrame with correct columns if no data
1851
+ column_names = [col[0] for col in LEADERBOARD_COLUMNS]
1852
+ return pd.DataFrame(columns=column_names)
1853
+
1854
+ rows = []
1855
+ for data in cache_dict.values():
1856
+ # Filter out agents with zero total reviews
1857
+ if data.get('total_reviews', 0) == 0:
1858
+ continue
1859
+ # Only include display-relevant fields
1860
+ rows.append([
1861
+ data.get('agent_name', 'Unknown'),
1862
+ data.get('website', 'N/A'),
1863
+ data.get('total_reviews', 0),
1864
+ data.get('accepted_prs', 0),
1865
+ data.get('rejected_prs', 0),
1866
+ data.get('acceptance_rate', 0.0),
1867
+ ])
1868
+
1869
+ # Create DataFrame
1870
+ column_names = [col[0] for col in LEADERBOARD_COLUMNS]
1871
+ df = pd.DataFrame(rows, columns=column_names)
1872
+
1873
+ # Ensure numeric types
1874
+ numeric_cols = ["Total Reviews", "Accepted PRs", "Rejected PRs", "Acceptance Rate (%)"]
1875
+ for col in numeric_cols:
1876
+ if col in df.columns:
1877
+ df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
1878
+
1879
+ # Sort by Acceptance Rate (%) descending
1880
+ if "Acceptance Rate (%)" in df.columns and not df.empty:
1881
+ df = df.sort_values(by="Acceptance Rate (%)", ascending=False).reset_index(drop=True)
1882
+
1883
+ return df
1884
+
1885
+
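`LEADERBOARD_COLUMNS` is defined earlier in the file; based on the rows assembled above and the numeric columns coerced afterwards, it is presumably a list of `(column name, type)` pairs along these lines (a sketch, not the actual constant):

```python
# Sketch of the column specification implied by get_leaderboard_dataframe()
LEADERBOARD_COLUMNS_SKETCH = [
    ("Agent Name", "str"),
    ("Website", "str"),
    ("Total Reviews", "number"),
    ("Accepted PRs", "number"),
    ("Rejected PRs", "number"),
    ("Acceptance Rate (%)", "number"),
]
```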
1886
+ def submit_agent(identifier, agent_name, organization, description, website):
1887
+ """
1888
+ Submit a new agent to the leaderboard.
1889
+ Validates input, saves the submission, and fetches review metadata (memory-efficient).
1890
+ """
1891
+ # Validate required fields
1892
+ if not identifier or not identifier.strip():
1893
+ return "❌ GitHub identifier is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1894
+ if not agent_name or not agent_name.strip():
1895
+ return "❌ Agent name is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1896
+ if not organization or not organization.strip():
1897
+ return "❌ Organization name is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1898
+ if not website or not website.strip():
1899
+ return "❌ Website URL is required", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1900
+
1901
+ # Clean inputs
1902
+ identifier = identifier.strip()
1903
+ agent_name = agent_name.strip()
1904
+ organization = organization.strip()
1905
+ description = description.strip()
1906
+ website = website.strip()
1907
+
1908
+ # Validate GitHub identifier
1909
+ is_valid, message = validate_github_username(identifier)
1910
+ if not is_valid:
1911
+ return f"❌ {message}", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1912
+
1913
+ # Check for duplicates by loading agents from HuggingFace
1914
+ agents = load_agents_from_hf()
1915
+ if agents:
1916
+ existing_names = {agent['github_identifier'] for agent in agents}
1917
+ if identifier in existing_names:
1918
+ return f"⚠️ Agent with identifier '{identifier}' already exists", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1919
+
1920
+ # Create submission
1921
+ submission = {
1922
+ 'agent_name': agent_name,
1923
+ 'organization': organization,
1924
+ 'github_identifier': identifier,
1925
+ 'description': description,
1926
+ 'website': website,
1927
+ }
1928
+
1929
+ # Save to HuggingFace
1930
+ if not save_agent_to_hf(submission):
1931
+ return "❌ Failed to save submission", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1932
+
1933
+ # Fetch review metadata immediately (memory-efficient)
1934
+ token = get_github_token()
1935
+ try:
1936
+ print(f"Fetching review metadata for {agent_name}...")
1937
+
1938
+ # Fetch lightweight metadata
1939
+ metadata_list = fetch_all_reviews_metadata(identifier, agent_name, token)
1940
+
1941
+ if metadata_list:
1942
+ # Save metadata to HuggingFace
1943
+ save_review_metadata_to_hf(metadata_list, identifier)
1944
+
1945
+ # Calculate stats from metadata
1946
+ stats = calculate_review_stats_from_metadata(metadata_list)
1947
+
1948
+ return f"βœ… Successfully submitted {agent_name}! Stats: {stats['total_reviews']} reviews, {stats['acceptance_rate']}% acceptance rate", get_leaderboard_dataframe(), create_monthly_metrics_plot()
1949
+
1950
+ except Exception as e:
1951
+ error_msg = f"⚠️ Submitted {agent_name}, but failed to fetch review data: {str(e)}"
1952
+ print(error_msg)
1953
+ import traceback
1954
+ traceback.print_exc()
1955
+ return error_msg, get_leaderboard_dataframe(), create_monthly_metrics_plot()
1956
+
1957
+
1958
+ # =============================================================================
1959
+ # BACKGROUND TASKS
1960
+ # =============================================================================
1961
+
1962
+ def daily_update_task():
1963
+ """
1964
+ Daily scheduled task (runs at 12:00 AM UTC) for smart review updates.
1965
+
1966
+ Strategy:
1967
+ 1. For each agent, refresh open reviews from last 6 months
1968
+ 2. Skip reviews that are already closed/resolved (no API calls)
1969
+ 3. Only fetch status for open reviews to check if they've been closed/resolved
1970
+ 4. Update leaderboard with refreshed data
1971
+
1972
+ This is much more efficient than fetching all reviews every time.
1973
+ """
1974
+ print(f"\n{'='*80}")
1975
+ print(f"πŸ•› Daily update started at {datetime.now(timezone.utc).isoformat()}")
1976
+ print(f"{'='*80}")
1977
+
1978
+ try:
1979
+ token = get_github_token()
1980
+
1981
+ # Load all agents
1982
+ agents = load_agents_from_hf()
1983
+ if not agents:
1984
+ print("No agents found")
1985
+ return
1986
+
1987
+ print(f"πŸ“‹ Processing {len(agents)} agents...")
1988
+
1989
+ total_checked = 0
1990
+ total_updated = 0
1991
+
1992
+ # Refresh open reviews for each agent (last 6 months)
1993
+ for agent in agents:
1994
+ identifier = agent.get('github_identifier')
1995
+ agent_name = agent.get('agent_name', 'Unknown')
1996
+
1997
+ if not identifier:
1998
+ continue
1999
+
2000
+ print(f"\n{'='*60}")
2001
+ print(f"Processing: {agent_name} ({identifier})")
2002
+ print(f"{'='*60}")
2003
+
2004
+ # Refresh open reviews from last 6 months
2005
+ checked, updated = refresh_review_status_for_agent(identifier, token)
2006
+ total_checked += checked
2007
+ total_updated += updated
2008
+
2009
+ print(f"\n{'='*80}")
2010
+ print(f"πŸ“Š Refresh Summary:")
2011
+ print(f" Total open reviews checked: {total_checked}")
2012
+ print(f" Reviews updated (newly reverted): {total_updated}")
2013
+ print(f"{'='*80}")
2014
+
2015
+ print(f"\nβœ… Daily update completed at {datetime.now(timezone.utc).isoformat()}")
2016
+
2017
+ except Exception as e:
2018
+ print(f"βœ— Daily update failed: {str(e)}")
2019
+ import traceback
2020
+ traceback.print_exc()
2021
+
2022
+
2023
+ # =============================================================================
2024
+ # GRADIO APPLICATION
2025
+ # =============================================================================
2026
+
2027
+ # Initialize data before creating UI
2028
+ if DEBUG_MODE:
2029
+ print("\n" + "="*80)
2030
+ print("πŸ› DEBUG MODE ENABLED πŸ›")
2031
+ print("="*80)
2032
+ print("Review retrieval is limited to 10 reviews per query pattern per agent")
2033
+
2034
+ # Show how debug mode was enabled
2035
+ if args.debug:
2036
+ print("Enabled via: command-line flag '--debug'")
2037
+ print("To disable: run without '--debug' flag")
2038
+ else:
2039
+ print("Enabled via: DEBUG_MODE environment variable")
2040
+ print("To disable: run with '--no-debug' flag or unset DEBUG_MODE")
2041
+
2042
+ print("="*80 + "\n")
2043
+ else:
2044
+ print("\nπŸš€ Starting in PRODUCTION MODE - full review retrieval enabled")
2045
+ if args.no_debug:
2046
+ print(" (Explicitly set via '--no-debug' flag)")
2047
+ print()
2048
+
2049
+ initialize_data()
2050
+
2051
+ # Start APScheduler for daily updates at 12:00 AM UTC
2052
+ scheduler = BackgroundScheduler(timezone="UTC")
2053
+ scheduler.add_job(
2054
+ daily_update_task,
2055
+ trigger=CronTrigger(hour=0, minute=0), # 12:00 AM UTC daily
2056
+ id='daily_review_refresh',
2057
+ name='Daily Review Status Refresh',
2058
+ replace_existing=True
2059
+ )
2060
+ scheduler.start()
2061
+ print("βœ“ Scheduler started: Daily updates at 12:00 AM UTC")
2062
+
2063
+ # Create Gradio interface
2064
+ with gr.Blocks(title="SWE Agent Review Leaderboard", theme=gr.themes.Soft()) as app:
2065
+
2066
+ gr.Markdown("# πŸ† SWE Agent Review Leaderboard")
2067
+ gr.Markdown("Track and compare GitHub PR review acceptance statistics for SWE agents (last 6 months)")
2068
+
2069
+ with gr.Tabs():
2070
+
2071
+ # Leaderboard Tab
2072
+ with gr.Tab("πŸ“Š Leaderboard"):
2073
+ gr.Markdown("*All statistics are based on reviews from the last 6 months*")
2074
+ leaderboard_table = Leaderboard(
2075
+ value=get_leaderboard_dataframe(),
2076
+ datatype=LEADERBOARD_COLUMNS,
2077
+ search_columns=["Agent Name", "Website"],
2078
+ filter_columns=["Acceptance Rate (%)"]
2079
+ )
2080
+
2081
+ gr.Markdown("### Monthly Metrics")
2082
+ gr.Markdown("Track acceptance rates and review activity over time")
2083
+
2084
+ monthly_plot = gr.Plot(
2085
+ value=create_monthly_metrics_plot(),
2086
+ label="Monthly Review Metrics"
2087
+ )
2088
+
2089
+ # Submit Agent Tab
2090
+ with gr.Tab("βž• Submit Agent"):
2091
+
2092
+ gr.Markdown("### Submit Your Agent")
2093
+ gr.Markdown("Fill in the details below to add your agent to the leaderboard. Make sure you're logged in to HuggingFace CLI on your machine.")
2094
+
2095
+ with gr.Row():
2096
+ with gr.Column():
2097
+ github_input = gr.Textbox(
2098
+ label="GitHub Identifier*",
2099
+ placeholder="Your agent username (e.g., my-agent-bot)"
2100
+ )
2101
+ name_input = gr.Textbox(
2102
+ label="Agent Name*",
2103
+ placeholder="Your agent's display name"
2104
+ )
2105
+
2106
+ with gr.Column():
2107
+ organization_input = gr.Textbox(
2108
+ label="Organization*",
2109
+ placeholder="Your organization or team name"
2110
+ )
2111
+ description_input = gr.Textbox(
2112
+ label="Description",
2113
+ placeholder="Brief description of your agent",
2114
+ lines=3
2115
+ )
2116
+ website_input = gr.Textbox(
2117
+ label="Website",
2118
+ placeholder="https://your-agent-website.com"
2119
+ )
2120
+
2121
+ submit_button = gr.Button(
2122
+ "Submit Agent",
2123
+ variant="primary"
2124
+ )
2125
+ submission_status = gr.Textbox(
2126
+ label="Submission Status",
2127
+ interactive=False
2128
+ )
2129
+
2130
+ # Event handler
2131
+ submit_button.click(
2132
+ fn=submit_agent,
2133
+ inputs=[github_input, name_input, organization_input, description_input, website_input],
2134
+ outputs=[submission_status, leaderboard_table, monthly_plot]
2135
+ )
2136
+
2137
+
2138
+ # Launch application
2139
+ if __name__ == "__main__":
2140
+ app.launch()
msr.py ADDED
@@ -0,0 +1,1224 @@
1
+ """
2
+ Standalone miner to fetch PR review metadata and update the leaderboard immediately.
3
+
4
+ This script reuses the same logic and on-disk/HuggingFace formats as app.py, but
5
+ has no UI or scheduler. You can run it once, or run it in a loop for hours.
6
+
7
+ Datasets used:
8
+ - Agents: SWE-Arena/swe_agents
9
+ - Review metadata: SWE-Arena/review_metadata
10
+
11
+ Environment:
12
+ - Requires HF_TOKEN (for HuggingFace uploads)
13
+ - Optional GITHUB_TOKEN (highly recommended to avoid low rate limits)
14
+ - Reads .env if present
15
+
16
+ CLI flags:
17
+ - --debug / --no-debug: Same semantics as app.py (debug limits to 10 PRs/pattern
18
+ and DOES NOT save to HF, mirroring app.py behavior).
19
+ - --loop: Keep running in a loop.
20
+ - --interval-seconds N: Sleep between loops (default 3600 seconds).
21
+
22
+ Note: In production mode (default), data will be saved to HuggingFace datasets.
23
+ """
24
+
25
+ import argparse
26
+ import json
27
+ import os
28
+ import random
29
+ import sys
30
+ import time
31
+ from collections import defaultdict
32
+ from datetime import datetime, timezone, timedelta
33
+
34
+ import pandas as pd
35
+ import requests
36
+ from dotenv import load_dotenv
37
+ from huggingface_hub import HfApi, hf_hub_download
38
+
39
+
40
+ # =============================================================================
41
+ # Environment & CLI
42
+ # =============================================================================
43
+
44
+ load_dotenv()
45
+
46
+ parser = argparse.ArgumentParser(description="Immediate PR review miner for SWE Arena")
47
+ parser.add_argument("--debug", "--DEBUG", action="store_true", help="Enable debug mode (limits PR retrieval to 10 per query; does NOT save to HF)")
48
+ parser.add_argument("--no-debug", "--production", action="store_true", help="Explicitly disable debug mode (force production mode)")
49
+ parser.add_argument("--loop", action="store_true", help="Run in a loop until interrupted")
50
+ parser.add_argument("--interval-seconds", type=int, default=3600, help="Sleep interval between loops in seconds (default: 3600)")
51
+ args = parser.parse_args()
52
+
53
+ # DEBUG MODE priority: 1) flags, 2) env var, 3) default False
54
+ if args.no_debug:
55
+ DEBUG_MODE = False
56
+ elif args.debug:
57
+ DEBUG_MODE = True
58
+ else:
59
+ DEBUG_MODE = os.getenv("DEBUG_MODE", "False").lower() in ("true", "1", "yes")
60
+
61
+
62
+ # =============================================================================
63
+ # Constants (match app.py)
64
+ # =============================================================================
65
+
66
+ DEBUG_REVIEW_METADATA_CACHE = defaultdict(list)
67
+
68
+ AGENTS_REPO = "SWE-Arena/swe_agents"
69
+ REVIEW_METADATA_REPO = "SWE-Arena/review_metadata"
70
+
71
+
72
+ # =============================================================================
73
+ # Utilities & I/O (match app.py behavior exactly)
74
+ # =============================================================================
75
+
76
+ def load_jsonl(filename):
77
+ """Load JSONL file and return list of dictionaries."""
78
+ if not os.path.exists(filename):
79
+ return []
80
+
81
+ data = []
82
+ with open(filename, 'r', encoding='utf-8') as f:
83
+ for line in f:
84
+ line = line.strip()
85
+ if line:
86
+ try:
87
+ entry = json.loads(line)
88
+ data.append(entry)
89
+ except json.JSONDecodeError as e:
90
+ print(f"Warning: Skipping invalid JSON line: {e}")
91
+ return data
92
+
93
+
94
+ def save_jsonl(filename, data):
95
+ """Save list of dictionaries to JSONL file."""
96
+ with open(filename, 'w', encoding='utf-8') as f:
97
+ for item in data:
98
+ f.write(json.dumps(item) + '\n')
99
+
100
+
101
+ def cache_to_dict(cache_list):
102
+ return {entry['github_identifier']: entry for entry in cache_list}
103
+
104
+
105
+ def dict_to_cache(cache_dict):
106
+ return list(cache_dict.values())
107
+
108
+
109
+ def get_github_token():
110
+ token = os.getenv('GITHUB_TOKEN')
111
+ if not token:
112
+ print("Warning: GITHUB_TOKEN not found. API rate limits: 60/hour (authenticated: 5000/hour)")
113
+ return token
114
+
115
+
116
+ def get_hf_token():
117
+ token = os.getenv('HF_TOKEN')
118
+ if not token:
119
+ print("Warning: HF_TOKEN not found in environment variables")
120
+ return token
121
+
122
+
123
+ def upload_with_retry(api, path_or_fileobj, path_in_repo, repo_id, repo_type, token, max_retries=5):
124
+ """
125
+ Upload file to HuggingFace with exponential backoff retry logic.
126
+
127
+ Args:
128
+ api: HfApi instance
129
+ path_or_fileobj: Local file path to upload
130
+ path_in_repo: Target path in the repository
131
+ repo_id: Repository ID
132
+ repo_type: Type of repository (e.g., "dataset")
133
+ token: HuggingFace token
134
+ max_retries: Maximum number of retry attempts
135
+
136
+ Returns:
137
+ True if upload succeeded, raises exception if all retries failed
138
+ """
139
+ delay = 2.0 # Initial delay in seconds
140
+
141
+ for attempt in range(max_retries):
142
+ try:
143
+ api.upload_file(
144
+ path_or_fileobj=path_or_fileobj,
145
+ path_in_repo=path_in_repo,
146
+ repo_id=repo_id,
147
+ repo_type=repo_type,
148
+ token=token
149
+ )
150
+ if attempt > 0:
151
+ print(f" βœ“ Upload succeeded on attempt {attempt + 1}/{max_retries}")
152
+ return True
153
+
154
+ except Exception as e:
155
+ if attempt < max_retries - 1:
156
+ wait_time = delay + random.uniform(0, 1.0)
157
+ print(f" ⚠️ Upload failed (attempt {attempt + 1}/{max_retries}): {str(e)}")
158
+ print(f" ⏳ Retrying in {wait_time:.1f} seconds...")
159
+ time.sleep(wait_time)
160
+ delay = min(delay * 2, 60.0) # Exponential backoff, max 60s
161
+ else:
162
+ print(f" βœ— Upload failed after {max_retries} attempts: {str(e)}")
163
+ raise
164
+
165
+
166
+ # =============================================================================
167
+ # GitHub API with backoff (same as app.py)
168
+ # =============================================================================
169
+
170
+ def request_with_backoff(method, url, *, headers=None, params=None, json_body=None, data=None, max_retries=10, timeout=30):
171
+ delay = 1.0
172
+ for attempt in range(max_retries):
173
+ try:
174
+ resp = requests.request(
175
+ method,
176
+ url,
177
+ headers=headers or {},
178
+ params=params,
179
+ json=json_body,
180
+ data=data,
181
+ timeout=timeout
182
+ )
183
+
184
+ status = resp.status_code
185
+
186
+ if 200 <= status < 300:
187
+ return resp
188
+
189
+ if status in (403, 429) or 500 <= status < 600:
190
+ wait = None
191
+ retry_after = resp.headers.get('Retry-After') or resp.headers.get('retry-after')
192
+ if retry_after:
193
+ try:
194
+ wait = float(retry_after)
195
+ except Exception:
196
+ wait = None
197
+ if wait is None and status in (403, 429):
198
+ reset_hdr = resp.headers.get('X-RateLimit-Reset') or resp.headers.get('x-ratelimit-reset')
199
+ if reset_hdr:
200
+ try:
201
+ reset_ts = int(float(reset_hdr))
202
+ wait = max(reset_ts - time.time() + 2, 1)
203
+ except Exception:
204
+ wait = None
205
+ if wait is None:
206
+ wait = delay + random.uniform(0, 0.5)
207
+ wait = max(1.0, min(wait, 120.0))
208
+ print(f"GitHub API {status}. Backing off {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
209
+ time.sleep(wait)
210
+ delay = min(delay * 2, 60.0)
211
+ continue
212
+
213
+ return resp
214
+
215
+ except requests.RequestException as e:
216
+ wait = delay + random.uniform(0, 0.5)
217
+ wait = max(1.0, min(wait, 60.0))
218
+ print(f"Request error: {e}. Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})...")
219
+ time.sleep(wait)
220
+ delay = min(delay * 2, 60.0)
221
+
222
+ print(f"Exceeded max retries for {url}")
223
+ return None
224
+
225
+
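`request_with_backoff` honours `Retry-After` and `X-RateLimit-Reset` headers on 403/429 responses before falling back to jittered exponential backoff. A hedged usage sketch against GitHub's documented `/rate_limit` endpoint (assumes `GITHUB_TOKEN` is set in the environment):

```python
import os

token = os.getenv("GITHUB_TOKEN")
headers = {"Authorization": f"token {token}"} if token else {}

resp = request_with_backoff("GET", "https://api.github.com/rate_limit", headers=headers)
if resp is not None and resp.status_code == 200:
    remaining = resp.json()["resources"]["search"]["remaining"]
    print(f"Search API calls remaining: {remaining}")
```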
226
+ def fetch_reviews_with_time_partition(base_query, start_date, end_date, headers, prs_by_url, debug_limit=None, depth=0):
227
+ """
228
+ Fetch PR reviews within a specific time range using time-based partitioning.
229
+ Recursively splits the time range if hitting the 1000-result limit.
230
+ Supports splitting by day, hour, minute, and second as needed.
231
+
232
+ Args:
233
+ debug_limit: If set, stops fetching after this many NEW PRs total across all partitions (for testing)
234
+ depth: Current recursion depth (for tracking)
235
+
236
+ Returns the number of PRs found in this time partition.
237
+ """
238
+ # Calculate time difference
239
+ time_diff = end_date - start_date
240
+ total_seconds = time_diff.total_seconds()
241
+
242
+ # Determine granularity and format dates accordingly
243
+ if total_seconds >= 86400: # >= 1 day
244
+ # Use day granularity (YYYY-MM-DD)
245
+ start_str = start_date.strftime('%Y-%m-%d')
246
+ end_str = end_date.strftime('%Y-%m-%d')
247
+ elif total_seconds >= 3600: # >= 1 hour but < 1 day
248
+ # Use hour granularity (YYYY-MM-DDTHH:MM:SSZ)
249
+ start_str = start_date.strftime('%Y-%m-%dT%H:00:00Z')
250
+ end_str = end_date.strftime('%Y-%m-%dT%H:59:59Z')
251
+ elif total_seconds >= 60: # >= 1 minute but < 1 hour
252
+ # Use minute granularity (YYYY-MM-DDTHH:MM:SSZ)
253
+ start_str = start_date.strftime('%Y-%m-%dT%H:%M:00Z')
254
+ end_str = end_date.strftime('%Y-%m-%dT%H:%M:59Z')
255
+ else: # < 1 minute
256
+ # Use second granularity (YYYY-MM-DDTHH:MM:SSZ)
257
+ start_str = start_date.strftime('%Y-%m-%dT%H:%M:%SZ')
258
+ end_str = end_date.strftime('%Y-%m-%dT%H:%M:%SZ')
259
+
260
+ query = f'{base_query} created:{start_str}..{end_str}'
261
+
262
+ indent = " " + " " * depth
263
+ print(f"{indent}Searching range {start_str} to {end_str}...")
264
+
265
+ page = 1
266
+ per_page = 100
267
+ total_in_partition = 0
268
+
269
+ while True:
270
+ # Check debug limit GLOBALLY (total unique PRs across all partitions)
271
+ if debug_limit is not None and len(prs_by_url) >= debug_limit:
272
+ print(f"{indent} πŸ› DEBUG MODE: Reached global limit of {debug_limit} PRs, stopping...")
273
+ return total_in_partition
274
+
275
+ url = 'https://api.github.com/search/issues' # Use issues endpoint for PR search
276
+ params = {
277
+ 'q': query,
278
+ 'per_page': per_page,
279
+ 'page': page,
280
+ 'sort': 'created',
281
+ 'order': 'asc'
282
+ }
283
+ headers_with_accept = headers.copy() if headers else {}
284
+
285
+ try:
286
+ response = request_with_backoff('GET', url, headers=headers_with_accept, params=params)
287
+ if response is None:
288
+ print(f"{indent} Error: retries exhausted for range {start_str} to {end_str}")
289
+ return total_in_partition
290
+
291
+ if response.status_code != 200:
292
+ print(f"{indent} Error: HTTP {response.status_code} for range {start_str} to {end_str}")
293
+ return total_in_partition
294
+
295
+ data = response.json()
296
+ total_count = data.get('total_count', 0)
297
+ items = data.get('items', [])
298
+
299
+ if not items:
300
+ break
301
+
302
+ # Add PR reviews to global dict (keyed by PR URL)
303
+ for pr in items:
304
+ pr_url = pr.get('html_url')
305
+ pr_number = pr.get('number')
306
+ # Use PR URL as unique key (more reliable than number alone)
307
+ if pr_url and pr_url not in prs_by_url:
308
+ prs_by_url[pr_url] = pr
309
+ total_in_partition += 1
310
+
311
+ # Check if we hit the 1000-result limit
312
+ if total_count > 1000 and page == 10:
313
+ print(f"{indent} ⚠️ Hit 1000-result limit ({total_count} total). Splitting time range...")
314
+
315
+ # Determine how to split based on time range duration
316
+ if total_seconds < 2: # Less than 2 seconds - can't split further
317
+ print(f"{indent} ⚠️ Cannot split further (range < 2 seconds). Some results may be missing.")
318
+ break
319
+
320
+ elif total_seconds < 120: # Less than 2 minutes - split by seconds
321
+ # Split into 2-4 parts depending on range
322
+ num_splits = min(4, max(2, int(total_seconds / 30)))
323
+ split_duration = time_diff / num_splits
324
+ split_dates = [start_date + split_duration * i for i in range(num_splits + 1)]
325
+
326
+ total_from_splits = 0
327
+ for i in range(num_splits):
328
+ split_start = split_dates[i]
329
+ split_end = split_dates[i + 1]
330
+ # Avoid overlapping ranges (add 1 second to start)
331
+ if i > 0:
332
+ split_start = split_start + timedelta(seconds=1)
333
+
334
+ count = fetch_reviews_with_time_partition(
335
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
336
+ )
337
+ total_from_splits += count
338
+
339
+ return total_from_splits
340
+
341
+ elif total_seconds < 7200: # Less than 2 hours - split by minutes
342
+ # Split into 2-4 parts
343
+ num_splits = min(4, max(2, int(total_seconds / 1800)))
344
+ split_duration = time_diff / num_splits
345
+ split_dates = [start_date + split_duration * i for i in range(num_splits + 1)]
346
+
347
+ total_from_splits = 0
348
+ for i in range(num_splits):
349
+ split_start = split_dates[i]
350
+ split_end = split_dates[i + 1]
351
+ # Avoid overlapping ranges (add 1 minute to start)
352
+ if i > 0:
353
+ split_start = split_start + timedelta(minutes=1)
354
+
355
+ count = fetch_reviews_with_time_partition(
356
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
357
+ )
358
+ total_from_splits += count
359
+
360
+ return total_from_splits
361
+
362
+ elif total_seconds < 172800: # Less than 2 days - split by hours
363
+ # Split into 2-4 parts
364
+ num_splits = min(4, max(2, int(total_seconds / 43200)))
365
+ split_duration = time_diff / num_splits
366
+ split_dates = [start_date + split_duration * i for i in range(num_splits + 1)]
367
+
368
+ total_from_splits = 0
369
+ for i in range(num_splits):
370
+ split_start = split_dates[i]
371
+ split_end = split_dates[i + 1]
372
+ # Avoid overlapping ranges (add 1 hour to start)
373
+ if i > 0:
374
+ split_start = split_start + timedelta(hours=1)
375
+
376
+ count = fetch_reviews_with_time_partition(
377
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
378
+ )
379
+ total_from_splits += count
380
+
381
+ return total_from_splits
382
+
383
+ else: # 2+ days - split by days
384
+ days_diff = time_diff.days
385
+
386
+ # Use aggressive splitting for large ranges or deep recursion
387
+ # Split into 4 parts if range is > 30 days, otherwise split in half
388
+ if days_diff > 30 or depth > 5:
389
+ # Split into 4 parts for more aggressive partitioning
390
+ quarter_diff = time_diff / 4
391
+ split_dates = [
392
+ start_date,
393
+ start_date + quarter_diff,
394
+ start_date + quarter_diff * 2,
395
+ start_date + quarter_diff * 3,
396
+ end_date
397
+ ]
398
+
399
+ total_from_splits = 0
400
+ for i in range(4):
401
+ split_start = split_dates[i]
402
+ split_end = split_dates[i + 1]
403
+ # Avoid overlapping ranges
404
+ if i > 0:
405
+ split_start = split_start + timedelta(days=1)
406
+
407
+ count = fetch_reviews_with_time_partition(
408
+ base_query, split_start, split_end, headers, prs_by_url, debug_limit, depth + 1
409
+ )
410
+ total_from_splits += count
411
+
412
+ return total_from_splits
413
+ else:
414
+ # Binary split for smaller ranges
415
+ mid_date = start_date + time_diff / 2
416
+
417
+ # Recursively fetch both halves
418
+ count1 = fetch_reviews_with_time_partition(
419
+ base_query, start_date, mid_date, headers, prs_by_url, debug_limit, depth + 1
420
+ )
421
+ count2 = fetch_reviews_with_time_partition(
422
+ base_query, mid_date + timedelta(days=1), end_date, headers, prs_by_url, debug_limit, depth + 1
423
+ )
424
+
425
+ return count1 + count2
426
+
427
+ # Normal pagination: check if there are more pages
428
+ if len(items) < per_page or page >= 10:
429
+ break
430
+
431
+ page += 1
432
+ time.sleep(0.5) # Courtesy delay between pages
433
+
434
+ except Exception as e:
435
+ print(f"{indent} Error fetching range {start_str} to {end_str}: {str(e)}")
436
+ return total_in_partition
437
+
438
+ if total_in_partition > 0:
439
+ print(f"{indent} βœ“ Found {total_in_partition} PRs in range {start_str} to {end_str}")
440
+
441
+ return total_in_partition
442
+
443
+
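A minimal driver for the partitioned search above, mirroring how `fetch_all_reviews_metadata` calls it: a single `reviewed-by:` query over the last 180 days, deduplicated by PR URL (the agent identifier here is hypothetical):

```python
import os
from datetime import datetime, timezone, timedelta

token = os.getenv("GITHUB_TOKEN")
headers = {"Authorization": f"token {token}"} if token else {}

end = datetime.now(timezone.utc)
start = end - timedelta(days=180)

prs_by_url = {}  # shared dict; results are deduplicated by PR html_url
found = fetch_reviews_with_time_partition(
    "is:pr reviewed-by:example-review-bot",  # hypothetical agent identifier
    start, end, headers, prs_by_url,
)
print(f"{found} PRs found, {len(prs_by_url)} unique")
```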
444
+ def extract_review_metadata(pr):
445
+ """
446
+ Extract minimal PR review metadata for efficient storage.
447
+ Only keeps essential fields: html_url, reviewed_at, pr_status, pr_merged, pr_closed_at.
448
+ Note: agent_name is not stored as it's inferred from the folder structure.
449
+
450
+ PR status:
451
+ - pr_status: 'open', 'merged', or 'closed'
452
+ - pr_merged: True if PR was merged (accepted), False otherwise
453
+ - pr_closed_at: Date when PR was closed/merged (if applicable)
454
+
455
+ Accepted PR = PR that was merged after agent review
456
+ Rejected PR = PR that was closed without merging after agent review
457
+ """
458
+ # Extract PR metadata from search results
459
+ # The GitHub search API returns PR data from /search/issues endpoint
460
+ pr_url = pr.get('html_url')
461
+ pr_number = pr.get('number')
462
+ created_at = pr.get('created_at')
463
+ closed_at = pr.get('closed_at')
464
+ state = pr.get('state', 'open') # open or closed
465
+
466
+ # Check if PR has pull_request field (indicates it's a PR, not an issue)
467
+ pull_request_data = pr.get('pull_request', {})
468
+
469
+ # For initial extraction, we don't know if merged yet
470
+ # This will be updated by update_pr_status function
471
+ pr_merged = pull_request_data.get('merged_at') is not None if pull_request_data else False
472
+
473
+ # Determine initial status
474
+ if pr_merged:
475
+ status = 'merged'
476
+ elif state == 'closed':
477
+ status = 'closed'
478
+ else:
479
+ status = 'open'
480
+
481
+ return {
482
+ 'html_url': pr_url,
483
+ 'reviewed_at': created_at, # When the PR was created (agent reviewed it)
484
+ 'pr_status': status,
485
+ 'pr_merged': pr_merged,
486
+ 'pr_closed_at': closed_at,
487
+ 'pr_url': pr_url, # Store PR URL for tracking
488
+ 'review_id': f"pr_{pr_number}" # Use PR number for deduplication
489
+ }
490
+
491
+
492
+ def update_pr_status(metadata_list, headers, token):
493
+ """
494
+ Update PR status for reviews to get current merged/closed state.
495
+
496
+ For each PR associated with a review, fetch current status from GitHub API.
497
+ Updates metadata_list in-place with PR status information.
498
+
499
+ In DEBUG MODE: Skips status updates to avoid API rate limits.
500
+
501
+ Args:
502
+ metadata_list: List of review metadata dictionaries
503
+ headers: HTTP headers for GitHub API
504
+ token: GitHub API token
505
+
506
+ Returns:
507
+ Updated metadata_list with current PR status
508
+ """
509
+ if not metadata_list:
510
+ return metadata_list
511
+
512
+ # In debug mode, skip status updates to avoid excessive API calls
513
+ if DEBUG_MODE:
514
+ print(f" πŸ› DEBUG MODE: Skipping PR status updates for {len(metadata_list)} reviews")
515
+ return metadata_list
516
+
517
+ # Track unique PRs to avoid duplicate API calls
518
+ pr_url_to_status = {}
519
+ updated_count = 0
520
+
521
+ for metadata in metadata_list:
522
+ pr_url = metadata.get('pr_url')
523
+ if not pr_url:
524
+ continue
525
+
526
+ # Skip if already fetched for this PR
527
+ if pr_url in pr_url_to_status:
528
+ status_info = pr_url_to_status[pr_url]
529
+ metadata['pr_status'] = status_info['status']
530
+ metadata['pr_merged'] = status_info['merged']
531
+ metadata['pr_closed_at'] = status_info['closed_at']
532
+ continue
533
+
534
+ try:
535
+ # Convert HTML URL to API URL
536
+ # https://github.com/owner/repo/pull/123 -> https://api.github.com/repos/owner/repo/pulls/123
537
+ parts = pr_url.replace('https://github.com/', '').split('/')
538
+ if len(parts) >= 4:
539
+ owner, repo, pull_word, pr_number = parts[0], parts[1], parts[2], parts[3]
540
+ api_url = f'https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}'
541
+
542
+ response = request_with_backoff('GET', api_url, headers=headers, max_retries=3)
543
+
544
+ if response and response.status_code == 200:
545
+ pr_data = response.json()
546
+ state = pr_data.get('state', 'open')
547
+ merged = pr_data.get('merged', False)
548
+ closed_at = pr_data.get('closed_at')
549
+ merged_at = pr_data.get('merged_at')
550
+
551
+ # Determine final status
552
+ if merged:
553
+ status = 'merged'
554
+ elif state == 'closed':
555
+ status = 'closed'
556
+ else:
557
+ status = 'open'
558
+
559
+ status_info = {
560
+ 'status': status,
561
+ 'merged': merged,
562
+ 'closed_at': closed_at or merged_at
563
+ }
564
+
565
+ # Cache and update
566
+ pr_url_to_status[pr_url] = status_info
567
+ metadata['pr_status'] = status
568
+ metadata['pr_merged'] = merged
569
+ metadata['pr_closed_at'] = closed_at or merged_at
570
+ updated_count += 1
571
+
572
+ # Small delay to avoid rate limiting
573
+ time.sleep(0.1)
574
+
575
+ except Exception as e:
576
+ print(f" Warning: Could not check PR status for {pr_url}: {e}")
577
+ continue
578
+
579
+ if updated_count > 0:
580
+ print(f" βœ“ Updated status for {updated_count} unique PRs")
581
+
582
+ return metadata_list
583
+
584
+
585
+ def fetch_all_reviews_metadata(identifier, agent_name, token=None, start_from_date=None, year=None, exclude_dates=None):
586
+ """
587
+ Fetch PR reviews associated with a GitHub user or bot for the past 6 months.
588
+ Returns lightweight metadata instead of full review objects.
589
+
590
+ This function employs time-based partitioning to navigate GitHub's 1000-result limit per query.
591
+ It searches using the query pattern:
592
+ - reviewed-by:{identifier} (PR reviews by the agent)
593
+
594
+ After fetching reviews, it updates PR status to determine if PRs were merged or closed.
595
+
596
+ Args:
597
+ identifier: GitHub username or bot identifier
598
+ agent_name: Human-readable name of the agent for metadata purposes
599
+ token: GitHub API token for authentication
600
+ start_from_date: Only fetch reviews created after this date (for incremental updates)
601
+ year: Year parameter (deprecated, retained for compatibility but not utilized)
602
+ exclude_dates: Set of date objects to exclude from mining (dates that have already been processed)
603
+
604
+ Returns:
605
+ List of dictionaries containing minimal PR review metadata with PR status
606
+ """
607
+ headers = {'Authorization': f'token {token}'} if token else {}
608
+
609
+ # Debug mode: limit review retrieval for testing
610
+ debug_limit_per_pattern = 10 if DEBUG_MODE else None
611
+
612
+ if DEBUG_MODE:
613
+ print(f"\nπŸ› DEBUG MODE ENABLED: Limiting to {debug_limit_per_pattern} PRs per query pattern")
614
+
615
+ # Define query pattern for PR reviews:
616
+ query_patterns = []
617
+
618
+ # Add reviewed-by pattern for PR reviews
619
+ query_patterns.append(f'is:pr reviewed-by:{identifier}')
620
+
621
+ # Use a dict to deduplicate PRs by URL
622
+ prs_by_url = {}
623
+
624
+ # Define time range: past 6 months only (or from start_from_date if specified)
625
+ current_time = datetime.now(timezone.utc)
626
+ six_months_ago = current_time - timedelta(days=180) # ~6 months
627
+
628
+ if start_from_date:
629
+ # Use start_from_date but ensure it's not older than 6 months
630
+ start_date = max(start_from_date, six_months_ago)
631
+ else:
632
+ start_date = six_months_ago
633
+
634
+ # End date is current time
635
+ end_date = current_time
636
+
637
+ for query_pattern in query_patterns:
638
+ print(f"\nπŸ” Searching with query: {query_pattern}")
639
+ print(f" Time range: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}")
640
+
641
+ pattern_start_time = time.time()
642
+ initial_count = len(prs_by_url)
643
+
644
+ # Fetch with time partitioning
645
+ reviews_found = fetch_reviews_with_time_partition(
646
+ query_pattern,
647
+ start_date,
648
+ end_date,
649
+ headers,
650
+ prs_by_url,
651
+ debug_limit_per_pattern
652
+ )
653
+
654
+ pattern_duration = time.time() - pattern_start_time
655
+ new_reviews = len(prs_by_url) - initial_count
656
+
657
+ print(f" βœ“ Pattern complete: {new_reviews} new PRs found ({reviews_found} total fetched, {len(prs_by_url) - initial_count - (reviews_found - new_reviews)} duplicates)")
658
+ print(f" ⏱️ Time taken: {pattern_duration:.1f} seconds")
659
+
660
+ # Delay between different query patterns (shorter in debug mode)
661
+ time.sleep(0.2 if DEBUG_MODE else 1.0)
662
+
663
+ # Convert to lightweight metadata
664
+ all_prs = list(prs_by_url.values())
665
+
666
+ # Filter out PRs from excluded dates if specified
667
+ if exclude_dates:
668
+ filtered_prs = []
669
+ excluded_count = 0
670
+ for pr in all_prs:
671
+ created_at = pr.get('created_at')
672
+ if created_at:
673
+ try:
674
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
675
+ pr_date = dt.date()
676
+ if pr_date not in exclude_dates:
677
+ filtered_prs.append(pr)
678
+ else:
679
+ excluded_count += 1
680
+ except Exception:
681
+ filtered_prs.append(pr) # Keep PRs with unparseable dates
682
+ else:
683
+ filtered_prs.append(pr) # Keep PRs without created_at
684
+
685
+ if excluded_count > 0:
686
+ print(f" ⏭️ Skipped {excluded_count} PRs from already-mined dates")
687
+ all_prs = filtered_prs
688
+
689
+ if DEBUG_MODE:
690
+ print(f"\nβœ… COMPLETE (DEBUG MODE): Found {len(all_prs)} unique PRs reviewed by {identifier}")
691
+ print(f" Note: In production mode, this would fetch ALL PRs")
692
+ else:
693
+ print(f"\nβœ… COMPLETE: Found {len(all_prs)} unique PRs reviewed by {identifier}")
694
+ print(f"πŸ“¦ Extracting minimal metadata and updating PR status...")
695
+
696
+ # Extract metadata for each PR review
697
+ metadata_list = [extract_review_metadata(pr) for pr in all_prs]
698
+
699
+ # Update PR status to get current merged/closed state
700
+ print(f"πŸ” Updating PR status for reviewed PRs...")
701
+ metadata_list = update_pr_status(metadata_list, headers, token)
702
+
703
+ # Calculate memory savings
704
+ original_size = sys.getsizeof(str(all_prs))
705
+ metadata_size = sys.getsizeof(str(metadata_list))
706
+ savings_pct = ((original_size - metadata_size) / original_size * 100) if original_size > 0 else 0
707
+
708
+ print(f"πŸ’Ύ Memory efficiency: {original_size // 1024}KB β†’ {metadata_size // 1024}KB (saved {savings_pct:.1f}%)")
709
+
710
+ return metadata_list
711
+
712
+
713
+ def group_metadata_by_date(metadata_list):
714
+ """
715
+ Group review metadata by exact date (year.month.day) for efficient daily storage.
716
+ Returns dict: {(year, month, day): [metadata_list]}
717
+ """
718
+ grouped = defaultdict(list)
719
+
720
+ for review_meta in metadata_list:
721
+ reviewed_at = review_meta.get('reviewed_at')
722
+ if not reviewed_at:
723
+ continue
724
+
725
+ try:
726
+ dt = datetime.fromisoformat(reviewed_at.replace('Z', '+00:00'))
727
+ key = (dt.year, dt.month, dt.day)
728
+ grouped[key].append(review_meta)
729
+ except Exception as e:
730
+ print(f"Warning: Could not parse date '{reviewed_at}': {e}")
731
+
732
+ return dict(grouped)
733
+
734
+
735
+ def save_review_metadata_to_hf(metadata_list, agent_identifier):
736
+ """
737
+ Save review metadata to HuggingFace dataset, organized by [agent_identifier]/YYYY.MM.DD.jsonl.
738
+ Each file is stored in the agent's folder and named YYYY.MM.DD.jsonl for that day's reviews.
739
+ In debug mode, saves to in-memory cache only.
740
+
741
+ This function APPENDS new metadata and DEDUPLICATES by review_id.
742
+
743
+ Args:
744
+ metadata_list: List of review metadata dictionaries
745
+ agent_identifier: GitHub identifier of the agent (used as folder name)
746
+ """
747
+ # Skip saving to HF in debug mode - use in-memory cache instead
748
+ if DEBUG_MODE:
749
+ global DEBUG_REVIEW_METADATA_CACHE
750
+ # Merge with existing cache, deduplicating by review_id
751
+ existing = {review['review_id']: review for review in DEBUG_REVIEW_METADATA_CACHE[agent_identifier] if review.get('review_id')}
752
+ new = {review['review_id']: review for review in metadata_list if review.get('review_id')}
753
+ existing.update(new)
754
+ DEBUG_REVIEW_METADATA_CACHE[agent_identifier] = list(existing.values())
755
+ print(f"πŸ› DEBUG MODE: Saved to in-memory cache only ({len(metadata_list)} reviews) - NOT saved to HuggingFace")
756
+ return True
757
+
758
+ try:
759
+ token = get_hf_token()
760
+ if not token:
761
+ raise Exception("No HuggingFace token found")
762
+
763
+ api = HfApi()
764
+
765
+ # Group by exact date (year, month, day)
766
+ grouped = group_metadata_by_date(metadata_list)
767
+
768
+ for (review_year, month, day), day_metadata in grouped.items():
769
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
770
+ filename = f"{agent_identifier}/{review_year}.{month:02d}.{day:02d}.jsonl"
771
+ local_filename = f"{review_year}.{month:02d}.{day:02d}.jsonl"
772
+ print(f"πŸ“€ Uploading {len(day_metadata)} reviews to {filename}...")
773
+
774
+ # Download existing file if it exists
775
+ existing_metadata = []
776
+ try:
777
+ file_path = hf_hub_download(
778
+ repo_id=REVIEW_METADATA_REPO,
779
+ filename=filename,
780
+ repo_type="dataset",
781
+ token=token
782
+ )
783
+ existing_metadata = load_jsonl(file_path)
784
+ print(f" Found {len(existing_metadata)} existing reviews in {filename}")
785
+ except Exception:
786
+ print(f" No existing file found for {filename}, creating new")
787
+
788
+ # Merge and deduplicate by review_id
789
+ existing_by_id = {meta['review_id']: meta for meta in existing_metadata if meta.get('review_id')}
790
+ new_by_id = {meta['review_id']: meta for meta in day_metadata if meta.get('review_id')}
791
+
792
+ # Update with new data (new data overwrites old)
793
+ existing_by_id.update(new_by_id)
794
+ merged_metadata = list(existing_by_id.values())
795
+
796
+ # Save locally
797
+ save_jsonl(local_filename, merged_metadata)
798
+
799
+ try:
800
+ # Upload to HuggingFace with folder path
801
+ upload_with_retry(
802
+ api=api,
803
+ path_or_fileobj=local_filename,
804
+ path_in_repo=filename,
805
+ repo_id=REVIEW_METADATA_REPO,
806
+ repo_type="dataset",
807
+ token=token
808
+ )
809
+ print(f" βœ“ Saved {len(merged_metadata)} total reviews to {filename}")
810
+ finally:
811
+ # Always clean up local file, even if upload fails
812
+ if os.path.exists(local_filename):
813
+ os.remove(local_filename)
814
+
815
+ return True
816
+
817
+ except Exception as e:
818
+ print(f"βœ— Error saving review metadata: {str(e)}")
819
+ return False
820
+
821
+
822
+ def load_agents_from_hf():
823
+ try:
824
+ api = HfApi()
825
+ agents = []
826
+ files = api.list_repo_files(repo_id=AGENTS_REPO, repo_type="dataset")
827
+ json_files = [f for f in files if f.endswith('.json')]
828
+ print(f"Found {len(json_files)} agent files in {AGENTS_REPO}")
829
+ for json_file in json_files:
830
+ try:
831
+ file_path = hf_hub_download(
832
+ repo_id=AGENTS_REPO,
833
+ filename=json_file,
834
+ repo_type="dataset"
835
+ )
836
+ with open(file_path, 'r') as f:
837
+ agent_data = json.load(f)
838
+ agents.append(agent_data)
839
+ except Exception as e:
840
+ print(f"Warning: Could not load {json_file}: {str(e)}")
841
+ continue
842
+ print(f"βœ“ Loaded {len(agents)} agents from HuggingFace")
843
+ return agents
844
+ except Exception as e:
845
+ print(f"Could not load agents from HuggingFace: {str(e)}")
846
+ return None
847
+
848
+
849
+ def load_review_metadata_for_year(year):
850
+ """
851
+ Load all review metadata for a specific year from HuggingFace.
852
+ Scans all agent folders and loads daily files matching the year.
853
+ In debug mode, loads from in-memory cache if available.
854
+
855
+ Structure: [agent_identifier]/YYYY.MM.DD.jsonl
856
+
857
+ Returns:
858
+ List of dictionaries with 'agent_identifier' added to each review metadata.
859
+ """
860
+ # In debug mode, check in-memory cache first
861
+ if DEBUG_MODE and DEBUG_REVIEW_METADATA_CACHE:
862
+ all_metadata = []
863
+ for agent_identifier, metadata_list in DEBUG_REVIEW_METADATA_CACHE.items():
864
+ for review_meta in metadata_list:
865
+ review_with_agent = review_meta.copy()
866
+ review_with_agent['agent_identifier'] = agent_identifier
867
+ all_metadata.append(review_with_agent)
868
+ if all_metadata:
869
+ print(f"πŸ› DEBUG MODE: Loading review metadata from in-memory cache ({len(all_metadata)} reviews)")
870
+ return all_metadata
871
+
872
+ try:
873
+ api = HfApi()
874
+ token = get_hf_token()
875
+
876
+ # List all files in the repository
877
+ files = api.list_repo_files(repo_id=REVIEW_METADATA_REPO, repo_type="dataset")
878
+
879
+ # Filter for files matching the year pattern: [agent_identifier]/YYYY.MM.DD.jsonl
880
+ # Extract year from filename
881
+ year_str = str(year)
882
+ year_files = []
883
+ for f in files:
884
+ if f.endswith('.jsonl'):
885
+ parts = f.split('/')
886
+ if len(parts) == 2: # [agent_identifier]/YYYY.MM.DD.jsonl
887
+ filename = parts[1]
888
+ if filename.startswith(year_str + '.'):
889
+ year_files.append(f)
890
+
891
+ print(f"πŸ“₯ Loading review metadata for {year} ({len(year_files)} daily files across all agents)...")
892
+
893
+ all_metadata = []
894
+ for filename in year_files:
895
+ try:
896
+ # Extract agent_identifier from path (first part)
897
+ # Format: agent_identifier/YYYY.MM.DD.jsonl
898
+ parts = filename.split('/')
899
+ if len(parts) != 2:
900
+ print(f" Warning: Unexpected filename format: {filename}")
901
+ continue
902
+
903
+ agent_identifier = parts[0]
904
+
905
+ file_path = hf_hub_download(
906
+ repo_id=REVIEW_METADATA_REPO,
907
+ filename=filename,
908
+ repo_type="dataset",
909
+ token=token
910
+ )
911
+ day_metadata = load_jsonl(file_path)
912
+
913
+ # Add agent_identifier to each review metadata for processing
914
+ for review_meta in day_metadata:
915
+ review_meta['agent_identifier'] = agent_identifier
916
+
917
+ all_metadata.extend(day_metadata)
918
+ print(f" βœ“ Loaded {len(day_metadata)} reviews from {filename}")
919
+ except Exception as e:
920
+ print(f" Warning: Could not load {filename}: {str(e)}")
921
+
922
+ print(f"βœ“ Loaded {len(all_metadata)} total reviews for {year}")
923
+ return all_metadata
924
+
925
+ except Exception as e:
926
+ print(f"βœ— Error loading review metadata for {year}: {str(e)}")
927
+ return []
928
+
929
+
930
+ def get_latest_review_date_for_agent(agent_identifier):
931
+ """
932
+ Get the latest review creation date for an agent from stored metadata.
933
+ Used for incremental updates - only fetch reviews newer than this date.
934
+
935
+ Structure: [agent_identifier]/YYYY.MM.DD.jsonl
936
+
937
+ Args:
938
+ agent_identifier: GitHub identifier of the agent
939
+
940
+ Returns:
941
+ datetime or None if no existing reviews found.
942
+ """
943
+ try:
944
+ api = HfApi()
945
+ token = get_hf_token()
946
+
947
+ # List all files in the repository
948
+ files = api.list_repo_files(repo_id=REVIEW_METADATA_REPO, repo_type="dataset")
949
+
950
+ # Filter for files in this agent's folder
951
+ # New structure: [agent_identifier]/YYYY.MM.DD.jsonl
952
+ agent_pattern = f"{agent_identifier}/"
953
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
954
+
955
+ if not agent_files:
956
+ return None
957
+
958
+ # Find latest review_at across all files
959
+ latest_date = None
960
+ for filename in agent_files:
961
+ try:
962
+ file_path = hf_hub_download(
963
+ repo_id=REVIEW_METADATA_REPO,
964
+ filename=filename,
965
+ repo_type="dataset",
966
+ token=token
967
+ )
968
+ metadata = load_jsonl(file_path)
969
+
970
+ for review_meta in metadata:
971
+ reviewed_at = review_meta.get("reviewed_at")
972
+ if reviewed_at:
973
+ try:
974
+ dt = datetime.fromisoformat(reviewed_at.replace("Z", "+00:00"))
975
+ if latest_date is None or dt > latest_date:
976
+ latest_date = dt
977
+ except Exception:
978
+ continue
979
+ except Exception:
980
+ continue
981
+
982
+ return latest_date
983
+
984
+ except Exception:
985
+ return None
986
+
987
+
988
+ def get_already_mined_dates(agent_identifier, n_months=6):
989
+ """
990
+ Get set of dates that have already been mined for an agent.
991
+
992
+ Args:
993
+ agent_identifier: GitHub identifier of the agent
994
+ n_months: Number of months to look back (default: 6)
995
+
996
+ Returns:
997
+ Set of date objects (datetime.date) that already have data files
998
+ """
999
+ try:
1000
+ api = HfApi()
1001
+
1002
+ # Calculate date range
1003
+ today = datetime.now(timezone.utc)
1004
+ n_months_ago = today - timedelta(days=30 * n_months)
1005
+
1006
+ # List all files in the repository
1007
+ files = api.list_repo_files(repo_id=REVIEW_METADATA_REPO, repo_type="dataset")
1008
+
1009
+ # Filter for files in this agent's folder
1010
+ agent_pattern = f"{agent_identifier}/"
1011
+ agent_files = [f for f in files if f.startswith(agent_pattern) and f.endswith('.jsonl')]
1012
+
1013
+ mined_dates = set()
1014
+ for filename in agent_files:
1015
+ try:
1016
+ # Extract date from filename: [agent_identifier]/YYYY.MM.DD.jsonl
1017
+ parts = filename.split('/')
1018
+ if len(parts) != 2:
1019
+ continue
1020
+
1021
+ date_part = parts[1].replace('.jsonl', '') # Get YYYY.MM.DD
1022
+ date_components = date_part.split('.')
1023
+ if len(date_components) != 3:
1024
+ continue
1025
+
1026
+ file_year, file_month, file_day = map(int, date_components)
1027
+ file_date = datetime(file_year, file_month, file_day, tzinfo=timezone.utc).date()
1028
+
1029
+ # Only include dates within the last n_months
1030
+ if n_months_ago.date() <= file_date <= today.date():
1031
+ mined_dates.add(file_date)
1032
+ except Exception as e:
1033
+ print(f" Warning: Could not parse date from filename {filename}: {e}")
1034
+ continue
1035
+
1036
+ return mined_dates
1037
+
1038
+ except Exception as e:
1039
+ print(f" Warning: Could not get already-mined dates for {agent_identifier}: {str(e)}")
1040
+ return set()
1041
+
1042
+
1043
+
1044
+
1045
+ def calculate_review_stats_from_metadata(metadata_list):
1046
+ """
1047
+ Calculate statistics from a list of review metadata (lightweight objects).
1048
+ Works with minimal metadata: html_url, reviewed_at, pr_status, pr_merged, pr_closed_at.
1049
+
1050
+ Returns a dictionary with comprehensive review metrics.
1051
+
1052
+ Acceptance Rate is calculated as:
1053
+ accepted PRs / (accepted PRs + rejected PRs) * 100
1054
+
1055
+ Accepted PRs = PRs that were merged (pr_status='merged')
1056
+ Rejected PRs = PRs that were closed without merging (pr_status='closed')
1057
+ Pending PRs = PRs still open (pr_status='open') - excluded from acceptance rate
1058
+ """
1059
+ total_reviews = len(metadata_list)
1060
+
1061
+ # Count accepted PRs (merged)
1062
+ accepted_prs = sum(1 for review_meta in metadata_list
1063
+ if review_meta.get('pr_status') == 'merged')
1064
+
1065
+ # Count rejected PRs (closed without merging)
1066
+ rejected_prs = sum(1 for review_meta in metadata_list
1067
+ if review_meta.get('pr_status') == 'closed')
1068
+
1069
+ # Count pending PRs (still open)
1070
+ pending_prs = sum(1 for review_meta in metadata_list
1071
+ if review_meta.get('pr_status') == 'open')
1072
+
1073
+ # Calculate acceptance rate (exclude pending PRs)
1074
+ completed_prs = accepted_prs + rejected_prs
1075
+ acceptance_rate = (accepted_prs / completed_prs * 100) if completed_prs > 0 else 0
1076
+
1077
+ return {
1078
+ 'total_reviews': total_reviews,
1079
+ 'accepted_prs': accepted_prs,
1080
+ 'rejected_prs': rejected_prs,
1081
+ 'pending_prs': pending_prs,
1082
+ 'acceptance_rate': round(acceptance_rate, 2),
1083
+ }
1084
+
1085
+
1086
+ def update_all_agents_incremental():
1087
+ """
1088
+ Memory-efficient incremental update of review statistics for all agents.
1089
+
1090
+ Strategy:
1091
+ 1. For each agent, load existing data from SWE-Arena/review_metadata
1092
+ 2. Identify already-mined dates (based on filename: YYYY.MM.DD.jsonl)
1093
+ 3. Only fetch reviews from dates that haven't been mined yet (within last 6 months)
1094
+ 4. If no data exists at all, mine everything from scratch
1095
+ 5. Store minimal metadata (not full review objects) to avoid storage limits
1096
+ 6. Construct leaderboard from ALL stored metadata (last 6 months)
1097
+
1098
+ Returns dictionary of all agent data with current stats.
1099
+ """
1100
+ token = get_github_token()
1101
+ current_year = datetime.now().year
1102
+
1103
+ # Load agent metadata from HuggingFace
1104
+ agents = load_agents_from_hf()
1105
+ if not agents:
1106
+ print("No agents found in HuggingFace dataset")
1107
+ return {}
1108
+
1109
+ cache_dict = {}
1110
+
1111
+ # Update each agent
1112
+ for agent in agents:
1113
+ identifier = agent.get('github_identifier')
1114
+ agent_name = agent.get('agent_name', 'Unknown')
1115
+
1116
+ if not identifier:
1117
+ print(f"Warning: Skipping agent without identifier: {agent}")
1118
+ continue
1119
+
1120
+ try:
1121
+ print(f"\n{'='*80}")
1122
+ print(f"Processing: {agent_name} ({identifier})")
1123
+ print(f"{'='*80}")
1124
+
1125
+ # Get already-mined dates for this agent (last 6 months)
1126
+ already_mined_dates = get_already_mined_dates(identifier, n_months=6)
1127
+
1128
+ if already_mined_dates:
1129
+ print(f"πŸ“… Found {len(already_mined_dates)} already-mined dates")
1130
+ print(f" Skipping these dates and fetching only new data...")
1131
+ # Fetch only reviews from dates not yet mined
1132
+ new_metadata = fetch_all_reviews_metadata(
1133
+ identifier,
1134
+ agent_name,
1135
+ token,
1136
+ start_from_date=None, # Use full 6-month range
1137
+ exclude_dates=already_mined_dates # But exclude already-mined dates
1138
+ )
1139
+ else:
1140
+ print(f"πŸ“… No existing data found. Mining everything from scratch...")
1141
+ # Mine everything from scratch (full 6-month range)
1142
+ new_metadata = fetch_all_reviews_metadata(
1143
+ identifier,
1144
+ agent_name,
1145
+ token,
1146
+ start_from_date=None
1147
+ )
1148
+
1149
+ if new_metadata:
1150
+ # Save new metadata to HuggingFace (organized by agent_identifier/YYYY.MM.DD.jsonl)
1151
+ print(f"πŸ’Ύ Saving {len(new_metadata)} new review records...")
1152
+ save_review_metadata_to_hf(new_metadata, identifier)
1153
+ else:
1154
+ print(f" No new reviews to save")
1155
+
1156
+ # Load ALL metadata for the current year to calculate stats (note: when the 6-month window spans into the previous calendar year, files from that earlier year are not loaded here)
1157
+ print(f"πŸ“Š Calculating statistics from ALL stored metadata (last 6 months)...")
1158
+ all_year_metadata = load_review_metadata_for_year(current_year)
1159
+
1160
+ # Filter for this specific agent
1161
+ agent_metadata = [review for review in all_year_metadata if review.get("agent_identifier") == identifier]
1162
+
1163
+ # Calculate stats from metadata
1164
+ stats = calculate_review_stats_from_metadata(agent_metadata)
1165
+
1166
+ # Merge metadata with stats
1167
+ cache_dict[identifier] = {
1168
+ 'agent_name': agent_name,
1169
+ 'website': agent.get('website', 'N/A'),
1170
+ 'github_identifier': identifier,
1171
+ **stats
1172
+ }
1173
+
1174
+ print(f"βœ“ Updated {identifier}: {stats['total_reviews']} reviews, {stats['acceptance_rate']}% acceptance rate")
1175
+
1176
+ except Exception as e:
1177
+ print(f"βœ— Error updating {identifier}: {str(e)}")
1178
+ import traceback
1179
+ traceback.print_exc()
1180
+ continue
1181
+
1182
+ return cache_dict
1183
+
1184
+
1185
+ def run_once():
1186
+ print("\nπŸš€ Immediate mining run started")
1187
+ cache_dict = update_all_agents_incremental()
1188
+ if cache_dict:
1189
+ print(f"βœ“ Updated {len(cache_dict)} agents")
1190
+ print("βœ… Immediate mining run completed\n")
1191
+
1192
+
1193
+ def main():
1194
+ if DEBUG_MODE:
1195
+ print("\n" + "="*80)
1196
+ print("πŸ› DEBUG MODE ENABLED πŸ›")
1197
+ print("="*80)
1198
+ print("PR retrieval is limited to 10 PRs per query pattern per agent")
1199
+ print("Data will NOT be saved to HuggingFace in debug mode.")
1200
+ print("="*80 + "\n")
1201
+ else:
1202
+ print("\nπŸš€ Starting in PRODUCTION MODE - full review retrieval enabled")
1203
+ print()
1204
+
1205
+ if not args.loop:
1206
+ run_once()
1207
+ return
1208
+
1209
+ print(f"πŸ” Loop mode enabled. Interval: {args.interval_seconds} seconds")
1210
+ try:
1211
+ while True:
1212
+ start = time.time()
1213
+ run_once()
1214
+ elapsed = time.time() - start
1215
+ sleep_for = max(0, args.interval_seconds - int(elapsed))
1216
+ if sleep_for > 0:
1217
+ print(f"😴 Sleeping {sleep_for} seconds before next run...")
1218
+ time.sleep(sleep_for)
1219
+ except KeyboardInterrupt:
1220
+ print("\nπŸ‘‹ Loop interrupted by user. Exiting...")
1221
+
1222
+
1223
+ if __name__ == "__main__":
1224
+ main()
requirements.txt ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ APScheduler
2
+ datasets
3
+ gradio
4
+ gradio_leaderboard
5
+ huggingface_hub
6
+ pandas
7
+ plotly
8
+ PyGithub
9
+ python-dotenv