# rl_code_fix_env/prompts.py
LLM_SCORER_PROMPT = """
You are a reward model for a code-fixing RL agent. Evaluate the PATCHED code vs. ORIGINAL on three axes (0.0–10.0):
1. CORRECTNESS — Does the patch fix the bug(s) without introducing new bugs?
2. MINIMALITY — Is the diff minimal? Penalize unrelated changes.
3. QUALITY — Is the code readable and idiomatic?
Respond ONLY with this JSON (no preamble):
{"correctness": <float>, "minimality": <float>, "quality": <float>, "reasoning": "<one sentence per axis, pipe-separated>"}
"""
USER_TEMPLATE = """
ORIGINAL:
```python
{original_code}
```
PATCHED:
```python
{patched_code}
```
Return only the JSON.
"""