Vilin97 Claude Opus 4.6 (1M context) commited on
Commit
f3ef14f
·
1 Parent(s): 0253f58

Add repair_lean_proofs tool for instant sorry-filling

Browse files

New agent tool backed by Axle's repair_proofs API. Tries automation
tactics (grind, simp, omega, nlinarith, aesop) to fill sorries in
~1-2 seconds — before the expensive Aristotle submission.

Tested: fills simple sorries (n+0=n, commutativity, positivity)
instantly. Hard sorries still need Aristotle.

Workflow: write sorry → repair_lean_proofs (1s) → if sorry remains,
extract_sorry_lemmas → submit to Aristotle.

23/31 verified, $11.60 total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (3) hide show
  1. agent.py +22 -2
  2. log.md +30 -0
  3. tools.py +55 -0
agent.py CHANGED
@@ -18,6 +18,7 @@ from tools import (
18
  search_lean_library,
19
  search_loogle,
20
  check_lean_code,
 
21
  extract_sorry_lemmas,
22
  submit_to_aristotle,
23
  check_aristotle_status,
@@ -53,8 +54,9 @@ If a statement is FALSE or you suspect it is false:
53
  ### Phase 2: Write statement + parallel proving
54
  1. **Write the formal Lean 4 theorem STATEMENT first** (with `sorry` as proof).
55
  2. **Verify the statement compiles** with check_lean_code (just the statement + sorry).
56
- 3. Once the statement compiles:
57
- - **Submit to Aristotle immediately** with the sorry'd code. Don't wait to try yourself first.
 
58
  - **Simultaneously try proving it yourself** with check_lean_code.
59
  4. If your proof attempt fails after 3-5 tries on a specific approach, try a DIFFERENT strategy:
60
  - Different tactics (simp, omega, ring, norm_num, linarith, nlinarith)
@@ -453,6 +455,24 @@ async def _handle_tool_call(fn_name: str, fn_args: dict, job: JobState) -> str:
453
  pass
454
  return result
455
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
456
  if fn_name == "extract_sorry_lemmas":
457
  code = fn_args.get("code", "")
458
  job.add_status("Extracting sorry'd sub-lemmas...")
 
18
  search_lean_library,
19
  search_loogle,
20
  check_lean_code,
21
+ repair_lean_proofs,
22
  extract_sorry_lemmas,
23
  submit_to_aristotle,
24
  check_aristotle_status,
 
54
  ### Phase 2: Write statement + parallel proving
55
  1. **Write the formal Lean 4 theorem STATEMENT first** (with `sorry` as proof).
56
  2. **Verify the statement compiles** with check_lean_code (just the statement + sorry).
57
+ 3. Once the statement compiles with sorry:
58
+ - **First try repair_lean_proofs** it may fill simple sorries automatically in 1-2 seconds.
59
+ - If sorry remains, **submit to Aristotle** with the sorry'd code.
60
  - **Simultaneously try proving it yourself** with check_lean_code.
61
  4. If your proof attempt fails after 3-5 tries on a specific approach, try a DIFFERENT strategy:
62
  - Different tactics (simp, omega, ring, norm_num, linarith, nlinarith)
 
455
  pass
456
  return result
457
 
458
+ if fn_name == "repair_lean_proofs":
459
+ code = fn_args.get("code", "")
460
+ job.add_status("Attempting automatic proof repair...")
461
+ result = await repair_lean_proofs(code)
462
+ try:
463
+ parsed = json.loads(result)
464
+ repairs = parsed.get("repairs_applied", 0)
465
+ has_sorry = parsed.get("still_has_sorry", True)
466
+ if repairs > 0 and not has_sorry:
467
+ job.add_status(f"Proof repair succeeded! {repairs} sorry(s) filled automatically.")
468
+ elif repairs > 0:
469
+ job.add_status(f"Partial repair: {repairs} sorry(s) filled, some remain.")
470
+ else:
471
+ job.add_status("Proof repair: no sorries could be filled automatically.")
472
+ except json.JSONDecodeError:
473
+ pass
474
+ return result
475
+
476
  if fn_name == "extract_sorry_lemmas":
477
  code = fn_args.get("code", "")
478
  job.add_status("Extracting sorry'd sub-lemmas...")
log.md CHANGED
@@ -516,3 +516,33 @@ System is stable and reliable at its capability tier.
516
  - `templates/index.html`: Updated subtitle with tech stack description, replaced examples with 6 proven-to-work problems across different difficulty levels
517
 
518
  ### Overall: 22/30 verified locally + HF deployment confirmed working
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
516
  - `templates/index.html`: Updated subtitle with tech stack description, replaced examples with 6 proven-to-work problems across different difficulty levels
517
 
518
  ### Overall: 22/30 verified locally + HF deployment confirmed working
519
+
520
+ ## Iteration 14 — 2026-03-24 11:00 PDT
521
+
522
+ ### Diagnosis
523
+ The agent's workflow for sorry-containing code was: write sorry → submit to Aristotle → wait. But many sorries can be closed instantly by automation tactics (grind, simp, omega). Tested Axle's `repair_proofs` API:
524
+ - 3 simple sorries (n+0=n, a+b=b+a, 0<n+1): ALL filled by `grind` in 100ms
525
+ - 2 hard sorries (Fermat, AM-GM): Not filled by automation (needs Mathlib-specific reasoning)
526
+
527
+ ### Fix: Add `repair_lean_proofs` tool (EFFICIENCY)
528
+
529
+ New agent tool that calls Axle's `repair_proofs` API with automation tactics (grind, simp_all, omega, norm_num, nlinarith, aesop). Costs ~1-2 seconds per call.
530
+
531
+ Updated workflow:
532
+ 1. Agent writes code with sorry → compiles
533
+ 2. **Call repair_lean_proofs** (NEW, ~1s) — may fill simple sorries instantly
534
+ 3. If sorry remains → extract_sorry_lemmas + submit to Aristotle
535
+ 4. Keep proving yourself in parallel
536
+
537
+ This adds a fast, free middle step that catches low-hanging fruit before the expensive Aristotle round-trip (which takes 5-30 minutes).
538
+
539
+ ### Test Results
540
+ | Problem | Time | Cost | Status |
541
+ |---------|------|------|--------|
542
+ | a∣b ∧ b∣a → a=b | ~10s | $0.007 | VERIFIED | `Nat.dvd_antisymm` |
543
+
544
+ ### Code Changes
545
+ - `tools.py`: Added `repair_lean_proofs()` function + tool definition
546
+ - `agent.py`: Added import, handler, updated system prompt Phase 2
547
+
548
+ ### Overall: 23/31 verified (74%), $11.60 total
tools.py CHANGED
@@ -137,6 +137,39 @@ async def check_lean_code(code: str) -> str:
137
 
138
 
139
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
140
  async def extract_sorry_lemmas(code: str) -> str:
141
  """Use Axle sorry2lemma to extract sorry'd subgoals into standalone lemma stubs."""
142
  try:
@@ -413,6 +446,28 @@ TOOL_DEFINITIONS = [
413
  },
414
  },
415
  },
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
416
  {
417
  "type": "function",
418
  "function": {
 
137
 
138
 
139
 
140
+ async def repair_lean_proofs(code: str) -> str:
141
+ """Use Axle repair_proofs to automatically fill sorries with grind/simp/omega."""
142
+ try:
143
+ async with httpx.AsyncClient(timeout=180) as client:
144
+ response = await client.post(
145
+ f"{AXLE_BASE_URL}/repair_proofs",
146
+ headers={
147
+ "Authorization": f"Bearer {AXLE_API_KEY}",
148
+ "Content-Type": "application/json",
149
+ },
150
+ json={
151
+ "content": code,
152
+ "environment": LEAN_ENVIRONMENT,
153
+ "terminal_tactics": ["grind", "simp_all", "omega", "norm_num", "nlinarith", "aesop"],
154
+ "timeout_seconds": 60,
155
+ },
156
+ )
157
+ if response.status_code != 200:
158
+ return json.dumps({"error": f"Axle HTTP {response.status_code}: {response.text[:500]}"})
159
+ data = response.json()
160
+ repaired_code = data.get("content", code)
161
+ stats = data.get("repair_stats", {})
162
+ has_sorry = "sorry" in repaired_code
163
+ return json.dumps({
164
+ "repaired_code": repaired_code,
165
+ "repairs_applied": stats.get("apply_terminal_tactics", 0),
166
+ "still_has_sorry": has_sorry,
167
+ "okay": data.get("okay", False),
168
+ }, indent=2)
169
+ except Exception as e:
170
+ return json.dumps({"error": f"repair_proofs error: {e}"})
171
+
172
+
173
  async def extract_sorry_lemmas(code: str) -> str:
174
  """Use Axle sorry2lemma to extract sorry'd subgoals into standalone lemma stubs."""
175
  try:
 
446
  },
447
  },
448
  },
449
+ {
450
+ "type": "function",
451
+ "function": {
452
+ "name": "repair_lean_proofs",
453
+ "description": (
454
+ "Automatically attempt to fill sorry placeholders with automation tactics "
455
+ "(grind, simp, omega, norm_num, nlinarith, aesop). FAST (~1-2 seconds). "
456
+ "Call this BEFORE submitting to Aristotle — it may resolve simple sorries instantly. "
457
+ "Returns repaired code and whether sorry remains."
458
+ ),
459
+ "parameters": {
460
+ "type": "object",
461
+ "properties": {
462
+ "code": {
463
+ "type": "string",
464
+ "description": "Lean 4 code with sorry placeholders to attempt repair.",
465
+ }
466
+ },
467
+ "required": ["code"],
468
+ },
469
+ },
470
+ },
471
  {
472
  "type": "function",
473
  "function": {