Spaces: agentDebugger / AgentDebugger-training-v3

shashaank0707 committed 6 days ago

Commit 8f19095 (verified)

Parent: a2cb0a0

Update README.md

Files changed (1): README.md +3 -4

README.md CHANGED

@@ -12,8 +12,7 @@ pinned: false
 **Hackathon Links:**
 - 🌌 **[Live Hugging Face Space](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3)**
-- 📹 **[Watch the 2-Minute Demo](#)** *(Replace with YouTube Link)*
-- 📝 **[Read the Technical Writeup](#)** *(Replace with HF Blog Link)*
+- 📝 **[Read the Technical Writeup](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3/blob/main/Blog.md)**
 ### 🚀 One-Line Pitch
 An OpenEnv-backed reinforcement learning environment that trains LLMs to debug code systematically via Group Relative Policy Optimization (GRPO) and secure sandbox execution.
@@ -50,7 +49,7 @@ LLMs often hallucinate bug fixes via blind trial-and-error. Real debugging in pr
 Our training run clearly demonstrates rapid policy adaptation. The model successfully learned the `OBSERVATION/HYPOTHESIS/ACTION` constraint almost instantly and navigated the tier-2 difficulty bump (step 150) with a textbook drop-and-recover curve.
 ## Training Results
-[W&B Run](https://wandb.ai/shashaankjain07-keshav-memorial-college-of-law/AgentDebuggerEnv/runs/vylbqd5m?nw=nwusershashaankjain07) | [HF Blog](#)
+[W&B Run](https://wandb.ai/shashaankjain07-keshav-memorial-college-of-law/AgentDebuggerEnv/runs/vylbqd5m?nw=nwusershashaankjain07) | [HF Blog](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3/blob/main/Blog.md)
 *(Note for Hackathon Judges: Live Weights & Biases charts and Gradio UI are embedded below as evidence of the training run).*
@@ -105,4 +104,4 @@ The easiest way to re-run the exact GRPO training pipeline is via our Jupyter No
 ### 👥 Team Endurance
 * **Shashaank Jain** | GitHub: [@shasshaank](https://github.com/shasshaank) | Email: *[shashaankjain07@gmail.com]*
-* **[Pranav Pulipati]** | GitHub: *[@PulipatiPranav](https://github.com/PulipatiPranav)* | Email: *[pranavpulipatix@gmail.com]*
+* **Pranav Pulipati** | GitHub: [@PulipatiPranav](https://github.com/PulipatiPranav) | Email: *[pranavpulipatix@gmail.com]*