shashaank0707 committed on
Commit 8f19095 · verified · 1 Parent(s): a2cb0a0

Update README.md

Files changed (1)
  1. README.md +3 -4
README.md CHANGED
@@ -12,8 +12,7 @@ pinned: false
 
  **Hackathon Links:**
  - 🌌 **[Live Hugging Face Space](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3)**
- - 📹 **[Watch the 2-Minute Demo](#)** *(Replace with YouTube Link)*
- - 📝 **[Read the Technical Writeup](#)** *(Replace with HF Blog Link)*
+ - 📝 **[Read the Technical Writeup](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3/blob/main/Blog.md)**
 
  ### 🚀 One-Line Pitch
  An OpenEnv-backed reinforcement learning environment that trains LLMs to debug code systematically via Group Relative Policy Optimization (GRPO) and secure sandbox execution.
@@ -50,7 +49,7 @@ LLMs often hallucinate bug fixes via blind trial-and-error. Real debugging in pr
  Our training run clearly demonstrates rapid policy adaptation. The model successfully learned the `OBSERVATION/HYPOTHESIS/ACTION` constraint almost instantly and navigated the tier-2 difficulty bump (step 150) with a textbook drop-and-recover curve.
 
  ## Training Results
- [W&B Run](https://wandb.ai/shashaankjain07-keshav-memorial-college-of-law/AgentDebuggerEnv/runs/vylbqd5m?nw=nwusershashaankjain07) | [HF Blog](#)
+ [W&B Run](https://wandb.ai/shashaankjain07-keshav-memorial-college-of-law/AgentDebuggerEnv/runs/vylbqd5m?nw=nwusershashaankjain07) | [HF Blog](https://huggingface.co/spaces/shashaank0707/AgentDebugger-training-v3/blob/main/Blog.md)
 
  *(Note for Hackathon Judges: Live Weights & Biases charts and Gradio UI are embedded below as evidence of the training run).*
 
@@ -105,4 +104,4 @@ The easiest way to re-run the exact GRPO training pipeline is via our Jupyter No
 
  ### 👥 Team Endurance
  * **Shashaank Jain** | GitHub: [@shasshaank](https://github.com/shasshaank) | Email: *[shashaankjain07@gmail.com]*
- * **[Pranav Pulipati]** | GitHub: *[@PulipatiPranav](https://github.com/PulipatiPranav)* | Email: *[pranavpulipatix@gmail.com]*
+ * **Pranav Pulipati** | GitHub: [@PulipatiPranav](https://github.com/PulipatiPranav) | Email: *[pranavpulipatix@gmail.com]*