
# Training with GRPO on API Debug Environment

Trains a small LLM using GRPO (Group Relative Policy Optimization) on the live API Debug Environment with curriculum learning.

## What is GRPO?

For each prompt, GRPO:

  1. Generates multiple completions (debug attempts)
  2. Scores each with the environment's grader (reward signal)
  3. Updates the model to prefer higher-scoring responses

Over thousands of episodes, the LLM learns to debug API requests purely from reward signals -- no labelled data needed.
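The "prefer higher-scoring responses" step works on *group-relative* advantages: each completion's reward is compared against the other completions for the same prompt. A minimal sketch of that scoring (illustrative only, not the `trl` implementation):

```python
# Minimal sketch of GRPO's group-relative scoring. For one prompt we
# generate a group of completions, grade each with the environment,
# and normalize each reward against the mean/std of its group.

def group_relative_advantages(rewards):
    """Map raw rewards to advantages relative to their group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0:
        return [0.0 for _ in rewards]  # all completions tied: no signal
    return [(r - mean) / std for r in rewards]

# Four debug attempts for the same prompt, graded by the environment:
advantages = group_relative_advantages([0.2, 0.9, 0.5, 0.4])
```

Completions above the group mean get positive advantages and are reinforced; those below get negative advantages and are discouraged. Because the baseline is the group itself, no separate value model is needed.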

## Curriculum Learning

The training auto-promotes through difficulty levels:

| Level | Task | Threshold | Max Turns | Skill |
|-------|------|-----------|-----------|-------|
| 1 | `easy` | 0.7 avg reward | 3 | Identify single error type + fields |
| 2 | `classify` | 0.6 avg reward | 4 | Identify ALL error types + fields |
| 3 | `medium` | 0.6 avg reward | 5 | Fix the broken request body |
| 4 | `headers` | 0.5 avg reward | 4 | Fix header-level errors |
| 5 | `response` | 0.5 avg reward | 4 | Validate API response issues |
| 6 | `hard` | -- | 7 | Fix mixed errors + explain reasoning |

Promotion happens when the rolling average reward (window=10) exceeds the threshold for the current level.
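The promotion rule can be sketched as follows. This is a hypothetical illustration of the logic described above; the names `CURRICULUM` and `maybe_promote` and the reset-on-promotion behavior are assumptions, not the actual API of `training/train.py`:

```python
from collections import deque

# (task, promotion threshold) per level; the final level has no threshold.
CURRICULUM = [
    ("easy", 0.7), ("classify", 0.6), ("medium", 0.6),
    ("headers", 0.5), ("response", 0.5), ("hard", None),
]

WINDOW = 10
recent_rewards = deque(maxlen=WINDOW)  # rolling window of episode rewards

def maybe_promote(level, reward):
    """Return the (possibly advanced) curriculum level after one episode."""
    recent_rewards.append(reward)
    _task, threshold = CURRICULUM[level]
    if threshold is None or len(recent_rewards) < WINDOW:
        return level  # final level, or not enough episodes yet
    if sum(recent_rewards) / len(recent_rewards) > threshold:
        recent_rewards.clear()  # start fresh on the harder task
        return level + 1
    return level
```

Clearing the window on promotion avoids carrying easy-task rewards into the average for the harder task.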

## Architecture

```
Dataset prompt ("Debug this broken API request.")
     |
GRPOTrainer calls rollout_func()
     |
rollout_func() connects to live HF Space via WebSocket
     |
env.reset(task=current_task) -> broken API request
     |
LLM generates JSON response -> env.step(action) -> reward
     |  (repeat up to max_turns)
Returns: prompt_ids, completion_ids, logprobs, env_reward
     |
reward_from_env() extracts env_reward
     |
GRPO updates model weights
     |
maybe_promote() checks if agent should advance to next task
```
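The multi-turn middle of the diagram can be sketched as a plain episode loop. This is a hedged sketch only: the `generate()` callable and the exact `reset()`/`step()` return shapes are assumptions; the real `rollout_func()` in `training/train.py` talks to the live HF Space over WebSocket and also returns token IDs and logprobs for the trainer.

```python
# Illustrative episode loop: reset the environment, let the LLM act for
# up to max_turns steps, and accumulate the environment's reward signal.

def rollout(env, generate, prompt, max_turns):
    """Run one multi-turn debug episode and return the total reward."""
    obs = env.reset()                     # broken API request to debug
    total_reward = 0.0
    for _ in range(max_turns):
        action = generate(prompt, obs)    # LLM emits a JSON debug action
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:                          # grader says the episode is over
            break
    return total_reward
```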

## Run on Google Colab (free T4 GPU)

```python
# Cell 1 -- Install (quote the version spec so the shell doesn't treat > as a redirect)
!pip install "trl>=0.26.0" transformers torch datasets openenv-core openai

# Cell 2 -- Clone repo
!git clone https://github.com/Avi-chauhan/api-debug-env.git
%cd api-debug-env

# Cell 3 -- Train
!python training/train.py
```

## Requirements