# rl_code_fix_env/prompts.py
LLM_SCORER_PROMPT = """
You are a reward model for a code-fixing RL agent. Evaluate the PATCHED code vs. ORIGINAL on three axes (0.0–10.0):
1. CORRECTNESS — Does the patch fix the bug(s) without introducing new bugs?
2. MINIMALITY — Is the diff minimal? Penalize unrelated changes.
3. QUALITY — Is the code readable and idiomatic?
Respond ONLY with this JSON (no preamble):
{"correctness": <float>, "minimality": <float>, "quality": <float>, "reasoning": "<one sentence per axis, pipe-separated>"}
"""
USER_TEMPLATE = """
ORIGINAL:
```python
{original_code}
```
PATCHED:
```python
{patched_code}
```
Return only the JSON.
"""