Bremin committed on
Commit 8fdba4d · verified · 1 Parent(s): a38a118

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: mit
+ ---
+
+ # SWE-Bench Trajectory Eval Bundle (v1)
+
+ Companion artifact for the trajectory-probe downstream eval of the
+ code-graph-v7 encoders (W1, I6, ...).
+
+ ## Contents
+
+ - `traj_full_bundle.tar.gz` (488 MB), which contains:
+   - `specs.jsonl`: 2456 SWE-Bench Verified agent trajectories harvested
+     from the `swe-bench-submissions` S3 bucket. Fields: instance_id, traj_id,
+     repo, base_commit, patches (1 entry = final model patch), resolved.
+   - `repos/`: partial (blobless, `--filter=blob:none`) clones of the 12 target
+     repos (django, sympy, sphinx, matplotlib, scikit-learn, astropy,
+     xarray, pytest, pylint, requests, seaborn, flask). ~671 MB
+     uncompressed. Blobs are fetched lazily on each base_commit checkout.
+   - `graphjepa/`: pipeline code (trajectory_pipeline, trajectory_realize,
+     trajectory_probe, trajectory_harvest) plus scripts/trajectory_full.sh.
+ - `harvest.log`: stdout from the S3 harvester that produced specs.jsonl.
+
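The field list for `specs.jsonl` above can be checked before running anything heavier. The following is a minimal sketch, not part of the bundle: `load_specs`, `summarize`, and the demo records are hypothetical, and the value types (a list for `patches`, a boolean for `resolved`) are assumptions based only on the field names.

```python
import json
from collections import Counter

# Field names as listed in the README; the value types are assumptions.
EXPECTED_FIELDS = {
    "instance_id", "traj_id", "repo", "base_commit", "patches", "resolved",
}


def load_specs(lines):
    """Parse JSONL lines into dicts and check each record has the documented fields."""
    specs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = EXPECTED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        specs.append(record)
    return specs


def summarize(specs):
    """Count trajectories, unique instances, unique commits, and resolved runs."""
    return {
        "trajectories": len(specs),
        "instances": len({s["instance_id"] for s in specs}),
        "base_commits": len({s["base_commit"] for s in specs}),
        "resolved": sum(bool(s["resolved"]) for s in specs),
        "per_repo": dict(Counter(s["repo"] for s in specs)),
    }


def _demo_line(instance, agent, commit, resolved):
    """Build one made-up record; none of these values come from the real bundle."""
    return json.dumps({
        "instance_id": instance,
        "traj_id": f"{agent}/{instance}",
        "repo": "demo/repo",
        "base_commit": commit,
        "patches": ["diff --git a/x.py b/x.py"],
        "resolved": resolved,
    })


SAMPLE_LINES = [
    _demo_line("demo__repo-1", "agent_a", "abc123", True),
    _demo_line("demo__repo-1", "agent_b", "abc123", False),
    _demo_line("demo__repo-2", "agent_a", "def456", True),
]

if __name__ == "__main__":
    print(summarize(load_specs(SAMPLE_LINES)))
```

Against the real file, `load_specs` can be passed an open file handle, for example `open("outputs/traj_real/specs.jsonl")` after the copy step in the workflow; the counts should then line up with the Provenance figures if the assumed types hold.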
+ ## Downstream workflow
+
+ ```bash
+ tar -xzf traj_full_bundle.tar.gz
+ rsync -a traj_full/graphjepa/ graphjepa/
+ mkdir -p outputs/traj_real
+ cp traj_full/specs.jsonl outputs/traj_real/
+ mv traj_full/repos outputs/traj_real/repos
+
+ # realize (4 sharded workers by repo)
+ SHARDS=4 bash graphjepa/scripts/trajectory_full.sh
+ # follow the shard logs (Ctrl-C stops following)
+ tail -f outputs/traj_real/logs/realize_shard*.log
+
+ # merge manifests + probe with each encoder
+ cat outputs/traj_real/manifest_shard*.jsonl > outputs/traj_real/manifest.jsonl
+ for NAME in W1_softplus_s0 I6_joint_s0; do
+   .venv/bin/python -m graphjepa.trajectory_probe \
+     --manifest outputs/traj_real/manifest.jsonl \
+     --ckpt "outputs/$NAME/ckpt_final.pt" \
+     --pool mean --split-by repo \
+     --output "outputs/traj_real/probe_${NAME}.json"
+ done
+ ```
+
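The probe command passes `--pool mean --split-by repo`. How `graphjepa.trajectory_probe` implements these flags is not shown here, so the sketch below is only one plausible reading: average per-node embeddings into a single vector per trajectory, and hold out whole repositories so that no repo contributes to both train and test. The function names and row layout are hypothetical.

```python
def mean_pool(node_vectors):
    """Average a list of equal-length vectors into one vector."""
    if not node_vectors:
        raise ValueError("cannot pool an empty list of vectors")
    dim = len(node_vectors[0])
    count = len(node_vectors)
    return [sum(vec[i] for vec in node_vectors) / count for i in range(dim)]


def split_by_repo(rows, test_repos):
    """Send every row whose repo is in test_repos to the test set, the rest to train."""
    train, test = [], []
    for row in rows:
        (test if row["repo"] in test_repos else train).append(row)
    return train, test


# Made-up rows: one pooled embedding and one label per trajectory.
ROWS = [
    {"repo": "django", "embedding": mean_pool([[1.0, 2.0], [3.0, 4.0]]), "resolved": True},
    {"repo": "django", "embedding": mean_pool([[0.0, 0.0], [2.0, 2.0]]), "resolved": False},
    {"repo": "flask", "embedding": mean_pool([[5.0, 5.0]]), "resolved": True},
]

if __name__ == "__main__":
    train, test = split_by_repo(ROWS, test_repos={"flask"})
    print(len(train), "train rows;", len(test), "test rows")
```

Splitting at the repo level matters for this dataset because several agents attempt the same task, so a random row-level split would place near-duplicate code states on both sides of the split.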
+ ## Provenance
+
+ Specs harvested from 5 SWE-Bench Verified submissions:
+
+ | Submission | N | Resolved | Rate |
+ |---|---|---|---|
+ | 20240620_sweagent_claude3.5sonnet | 485 | 168 | 34.6% |
+ | 20241022_tools_claude-3-5-sonnet-updated | 483 | 245 | 50.7% |
+ | 20241028_agentless-1.5_gpt4o | 495 | 194 | 39.2% |
+ | 20241029_OpenHands-CodeAct-2.1-sonnet | 493 | 265 | 53.8% |
+ | 20250405_amazon-q-developer-2025 | 500 | 330 | 66.0% |
+ | **total** | **2456** | **1202** | **48.9%** |
+
+ 500 unique instance_ids, 499 unique base_commits (median 5 trajectories
+ per commit: different agents attempting the same task).
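The totals and rates in the table above can be re-derived from the N and Resolved columns. This is a small arithmetic check, assuming the rate is resolved divided by N and rounded to one decimal place; the `rate` helper is illustrative and not part of the bundle.

```python
# (submission, N, resolved) copied from the Provenance table.
ROWS = [
    ("20240620_sweagent_claude3.5sonnet", 485, 168),
    ("20241022_tools_claude-3-5-sonnet-updated", 483, 245),
    ("20241028_agentless-1.5_gpt4o", 495, 194),
    ("20241029_OpenHands-CodeAct-2.1-sonnet", 493, 265),
    ("20250405_amazon-q-developer-2025", 500, 330),
]


def rate(resolved, n):
    """Resolve rate as a percentage, rounded to one decimal place."""
    return round(100.0 * resolved / n, 1)


total_n = sum(n for _, n, _ in ROWS)
total_resolved = sum(resolved for _, _, resolved in ROWS)

if __name__ == "__main__":
    for name, n, resolved in ROWS:
        print(f"{name}: {resolved}/{n} = {rate(resolved, n)}%")
    print(f"total: {total_resolved}/{total_n} = {rate(total_resolved, total_n)}%")
```

Note that the total rate is computed from the summed counts rather than by averaging the five per-submission rates, which would weight the submissions equally despite their slightly different N.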