Update README.md
README.md
CHANGED
@@ -12,8 +12,7 @@ pinned: false
 
 **Hackathon Links:**
 - **[Live Hugging Face Space](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3)**
--
-- **[Read the Technical Writeup](#)** *(Replace with HF Blog Link)*
+- **[Read the Technical Writeup](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3/blob/main/Blog.md)**
 
 ### One-Line Pitch
 An OpenEnv-backed reinforcement learning environment that trains LLMs to debug code systematically via Group Relative Policy Optimization (GRPO) and secure sandbox execution.
@@ -50,7 +49,7 @@ LLMs often hallucinate bug fixes via blind trial-and-error. Real debugging in pr
 Our training run clearly demonstrates rapid policy adaptation. The model successfully learned the `OBSERVATION/HYPOTHESIS/ACTION` constraint almost instantly and navigated the tier-2 difficulty bump (step 150) with a textbook drop-and-recover curve.
 
 ## Training Results
-[W&B Run](https://wandb.ai/shashaankjain07-keshav-memorial-college-of-law/AgentDebuggerEnv/runs/vylbqd5m?nw=nwusershashaankjain07) | [HF Blog](
+[W&B Run](https://wandb.ai/shashaankjain07-keshav-memorial-college-of-law/AgentDebuggerEnv/runs/vylbqd5m?nw=nwusershashaankjain07) | [HF Blog](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3/blob/main/Blog.md)
 
 *(Note for Hackathon Judges: Live Weights & Biases charts and Gradio UI are embedded below as evidence of the training run).*
 
@@ -105,4 +104,4 @@ The easiest way to re-run the exact GRPO training pipeline is via our Jupyter No
 
 ### Team Endurance
 * **Shashaank Jain** | GitHub: [@shasshaank](https://github.com/shasshaank) | Email: *[shashaankjain07@gmail.com]*
-* **
+* **Pranav Pulipati** | GitHub: [@PulipatiPranav](https://github.com/PulipatiPranav) | Email: *[pranavpulipatix@gmail.com]*