CLAUDE.md - TraceFix-RL (RL_ENV_FINAL)

Current, code-backed notes for assistants working in this repository. Last updated: 2026-04-08

Project Status Snapshot

  • Repo: code_reasoner_rl_env
  • Branch: master
  • Working tree: dirty
    • Modified: .gitignore, inference.py, models.py, __pycache__/models.cpython-312.pyc
    • Untracked: .hfignore
  • Last recorded pre-validation command in terminal:
    • ./pre-val.sh https://sus-human-tracefix-rl.hf.space .
    • Exit code: 1

This file describes the current implementation in RL_ENV_FINAL only.

High-Level Architecture

  • environment.py: core gym-style state machine (TraceFixRLGym)
  • server/tracefix_rl_environment.py: OpenEnv adapter (Environment interface)
  • server/app.py: FastAPI app creation and uvicorn entrypoint
  • models.py: action/observation schemas (CodeAction, CodeObservation, TestResult)
  • sandbox.py: isolated code execution + test running + timeout handling
  • tasks.py: static task registry (easy/medium/hard)
  • context.py: localized context windowing around last edit
  • client.py: typed OpenEnv client (TraceFixRLEnv / MyEnv)
  • inference.py: baseline agent runner with OpenAI-compatible API
  • openenv.yaml: OpenEnv runtime metadata (app: server.app:app, port: 7860)

Runtime and Entry Points

  • Local server via project script:
    • uv run --project . server
  • Container command in Dockerfile:
    • uvicorn server.app:app --host 0.0.0.0 --port 7860
  • OpenEnv spec points to:
    • server.app:app
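
Given the fields listed for openenv.yaml, the relevant fragment of the spec presumably looks like the following (any keys beyond `app` and `port` are omitted here):

```yaml
app: server.app:app
port: 7860
```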

Environment Behavior (environment.py)

Action space:

  • VIEW_CODE
  • RUN_TESTS
  • REPLACE_LINES
  • UNDO_EDIT
  • RESET_TO_ORIGINAL
  • SUBMIT

Reward constants currently defined:

  • R_STEP_COST = -0.01
  • R_RUN_TESTS = +0.10
  • R_PER_NEW_PASS = +0.05
  • R_SYNTAX_ERROR = -0.10
  • R_INVALID_LINE = -0.02
  • R_DESTRUCTIVE_PENALTY = -0.20
  • R_UNDO_RESET = -0.10
  • MAX_STEPS = 50

Episode internals include:

  • code snapshotting (_original_code, _edit_history)
  • anti-loop penalty for repeated identical action_type
  • contextual anchor (_last_edited_line) for localized context
  • cumulative step-cost tracking (_accumulated_step_costs)
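
The episode bookkeeping above can be sketched as follows. This is an illustrative approximation, not the repo's actual class: the attribute names mirror the ones listed, but `ANTI_LOOP_PENALTY` is a hypothetical value (the notes do not record the real constant).

```python
ANTI_LOOP_PENALTY = -0.05  # hypothetical value; not a documented constant

class EpisodeState:
    """Sketch of the per-episode internals described above."""

    def __init__(self, original_code: str):
        self._original_code = original_code  # snapshot for RESET_TO_ORIGINAL
        self._edit_history = []              # stack consumed by UNDO_EDIT
        self._last_edited_line = None        # anchor for localized context
        self._accumulated_step_costs = 0.0   # summed into submit scoring
        self._last_action_type = None

    def anti_loop_penalty(self, action_type: str) -> float:
        """Extra penalty when the same action_type repeats back-to-back."""
        penalty = ANTI_LOOP_PENALTY if action_type == self._last_action_type else 0.0
        self._last_action_type = action_type
        return penalty
```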

Submit scoring model:

  • proportion = passing_tests / total_tests (or 0 on syntax error)
  • raw_score = proportion - _accumulated_step_costs
  • final_score = clamp(raw_score, 0.0, 1.0)
  • same clamp model used on max-step timeout auto-evaluation
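
The scoring model above reduces to a small function. Function and argument names here are assumptions; the arithmetic follows the documented formulas exactly (step costs are tracked as a positive sum and subtracted):

```python
def score_submission(passing: int, total: int,
                     accumulated_step_costs: float,
                     had_syntax_error: bool) -> float:
    """Clamped proportion-minus-step-cost scoring, per the notes above."""
    proportion = 0.0 if had_syntax_error or total == 0 else passing / total
    raw_score = proportion - accumulated_step_costs
    return max(0.0, min(1.0, raw_score))  # final_score = clamp(raw_score, 0, 1)
```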

Task sampling policy:

  • training_step == 0: random from ALL_TASKS
  • < 1000: easy
  • < 5000: medium
  • >= 5000: hard
  • fallback to first non-empty bucket
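
The schedule above can be sketched as a single function. The thresholds come from the notes; the registry shape (difficulty name mapped to a task list) is an assumption consistent with `TASKS_BY_DIFFICULTY`:

```python
import random

def sample_task(training_step: int, tasks_by_difficulty: dict, all_tasks: list):
    """Curriculum sampling: random at step 0, then easy -> medium -> hard."""
    if training_step == 0:
        return random.choice(all_tasks)
    if training_step < 1000:
        bucket = tasks_by_difficulty.get("easy", [])
    elif training_step < 5000:
        bucket = tasks_by_difficulty.get("medium", [])
    else:
        bucket = tasks_by_difficulty.get("hard", [])
    if not bucket:
        # Fallback: first non-empty bucket in registry order.
        bucket = next(b for b in tasks_by_difficulty.values() if b)
    return random.choice(bucket)
```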

Schema Notes (models.py)

Important: current code uses Pydantic v2-style validation APIs.

  • CodeAction uses @model_validator(mode="before")
  • Non-REPLACE_LINES actions force start_line, end_line, new_code_block to None
  • REPLACE_LINES enforces required fields and 1-indexed positive range constraints

This is not compatible with Pydantic v1-only assumptions.
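
The validator's effect can be sketched without Pydantic as a plain pre-validation function. Field names mirror the schema above; the exact checks in models.py may differ, so treat this as an approximation of the normalization logic, not the real validator:

```python
def normalize_action(data: dict) -> dict:
    """Approximate the @model_validator(mode='before') behavior on CodeAction."""
    data = dict(data)
    if data.get("action_type") != "REPLACE_LINES":
        # Non-edit actions must not carry edit fields.
        for field in ("start_line", "end_line", "new_code_block"):
            data[field] = None
        return data
    start, end = data.get("start_line"), data.get("end_line")
    if start is None or end is None or data.get("new_code_block") is None:
        raise ValueError("REPLACE_LINES requires start_line, end_line, new_code_block")
    if start < 1 or end < start:
        raise ValueError("line range must be 1-indexed with end_line >= start_line")
    return data
```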

Sandbox Notes (sandbox.py)

run_code_with_tests(...) returns a strict 3-tuple:

  • output_str
  • test_results: List[TestResult]
  • had_syntax_error: bool

Execution safeguards:

  • subprocess isolation via multiprocessing.Process
  • timeout terminate/kill path
  • tail truncation (MAX_OUTPUT_CHARS = 1000)
  • restricted builtins to block risky operations
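
Two of these safeguards are easy to sketch in isolation. The allowlist below is illustrative; sandbox.py's actual builtin set may differ, and only `MAX_OUTPUT_CHARS = 1000` is taken from the notes:

```python
MAX_OUTPUT_CHARS = 1000  # from the notes above

def truncate_tail(output: str) -> str:
    """Keep only the last MAX_OUTPUT_CHARS characters (tail truncation)."""
    return output[-MAX_OUTPUT_CHARS:]

# Hypothetical allowlist: anything not listed (open, __import__, ...) is gone.
SAFE_BUILTINS = {"len": len, "range": range, "print": print, "min": min, "max": max}

def run_restricted(source: str) -> dict:
    """Execute code with restricted builtins, returning the result namespace."""
    env = {"__builtins__": SAFE_BUILTINS}
    exec(source, env)  # e.g. open() raises NameError inside this namespace
    return env
```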

Tasks Registry (tasks.py)

  • Static hardcoded registry grouped by difficulty
  • Exports:
    • TASKS_BY_DIFFICULTY
    • ALL_TASKS
  • Expected total currently: 16 tasks
    • easy: 4
    • medium: 6
    • hard: 6
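
The registry shape implied above can be sketched with placeholder payloads (the real tasks carry code and tests; only the bucket counts come from the notes):

```python
# Placeholder tasks; real entries hold buggy code, tests, and metadata.
TASKS_BY_DIFFICULTY = {
    "easy":   [{"id": f"easy-{i}"} for i in range(4)],
    "medium": [{"id": f"medium-{i}"} for i in range(6)],
    "hard":   [{"id": f"hard-{i}"} for i in range(6)],
}

# Flattened view across all difficulty buckets.
ALL_TASKS = [t for bucket in TASKS_BY_DIFFICULTY.values() for t in bucket]
```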

OpenEnv Adapter and Client

server/tracefix_rl_environment.py:

  • Maps optional reset difficulty to training_step hints
  • Writes system_prompt into observation metadata
  • Sets observation reward/done from gym step output

client.py:

  • Sends actions using model_dump(exclude_none=True)
  • Parses OpenEnv payloads into typed CodeObservation

Inference Runner (inference.py)

Key defaults:

  • API_BASE_URL = https://router.huggingface.co/v1
  • MODEL_NAME = Qwen/Qwen2.5-72B-Instruct
  • MAX_STEPS = 50
  • SUCCESS_SCORE_THRESHOLD = 0.99
  • THINKING_TOKEN_LIMIT = 512

Behavior:

  • Logs in strict sequence: [START], repeated [STEP], then [END]
  • Uses JSON extraction fallback path from model text
  • Falls back to RUN_TESTS on parse or validation failure
  • Supports --easy, --medium, --hard, --debug
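
The extraction-with-fallback path can be sketched as below. The regex and helper name are assumptions; only the behavior (parse JSON out of free-form model text, fall back to RUN_TESTS on failure) comes from the notes:

```python
import json
import re

def extract_action(model_text: str) -> dict:
    """Pull a JSON action from model output, falling back to RUN_TESTS."""
    match = re.search(r"\{.*\}", model_text, re.DOTALL)
    if match:
        try:
            action = json.loads(match.group(0))
            if "action_type" in action:
                return action
        except json.JSONDecodeError:
            pass
    return {"action_type": "RUN_TESTS"}  # safe fallback on parse/validation failure
```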

Drift and Risk Notes

  1. requirements.txt currently pins pydantic==1.10.17, but code in models.py uses v2 APIs (model_validator).
  2. pyproject.toml is the active dependency source for uv sync; requirements.txt appears stale relative to runtime assumptions.
  3. environment.py defines R_SUBMIT_ALL_PASS and R_SUBMIT_FAIL, but submit currently uses clamped proportion-minus-step-cost scoring instead of those constants.
  4. server/tracefix_rl_environment.py advertises concurrent sessions support, while create_app(..., max_concurrent_envs=1) constrains server-level concurrency.

Practical Checklist Before Validation

  1. Confirm dependency source of truth (pyproject.toml vs requirements.txt) and align Pydantic version expectations.
  2. Re-run pre-validation and capture the first failing check/output.
  3. Remove tracked cache artifacts from version control if unintended (for example __pycache__/*.pyc).
  4. Keep stdout format in inference.py unchanged, as validator parsing depends on it.