Spaces:
Running
Running
ARCHITECTURE.md β Bug Triage OpenEnv System Architecture
1. Core Abstractions (OpenEnv Spec)
Server-side (Docker)
BugTriageEnvironment(Environment)
βββ reset(task_id) β BugTriageObservation # New bug report episode
βββ step(action) β BugTriageObservation # Agent triages; grader fires; done=True
βββ state @property β BugTriageState # Episode metadata
Client-side (Training code)
BugTriageEnvClient
βββ reset(task_id) β dict # POST /reset
βββ step(ep_id, action) β dict # POST /step
βββ state() β dict # GET /state
2. State Model
class BugTriageState:
episode_id: str # Unique per episode
step_count: int # 0 after reset, 1 after step
task_id: str # "task_1" | "task_2" | "task_3"
bug_id: str # Which bug report is active
cumulative_reward: float
3. Observation Model
class BugTriageObservation:
done: bool # True after step()
reward: float # Shaped reward [-0.5, 1.0]
task_id: str
bug_report: BugReport # Title, description, logs, env, reporter, metadata
available_developers: List[str] # ["Alice", "Bob", "Carol", "David", "Eve"]
step_number: int
feedback: str # Human-readable grader feedback
grader_score: Optional[float] # [0.0-1.0] β only when done=True
episode_id: str
BugReport fields the agent reads:
| Field | Type | Example |
|---|---|---|
| title | str | "App crashes on iOS 17 uploading >50MB" |
| description | str | Full description with context |
| logs | str? | Stack traces, error output |
| environment | str? | "iOS 17.2, iPhone 15 Pro, App v3.2.1" |
| reporter | str? | "enterprise_client_a" |
| metadata | dict | {"component": "file_upload", "affected_users": 847} |
4. Action Model
class BugTriageAction:
task_id: str # Required always
# Task 1 (Easy)
bug_type: Optional[str] # crash|ui|performance|security|data_loss|compatibility
# Task 2 (Medium)
priority: Optional[str] # low|medium|high|critical
# Task 3 (Hard) β all of the above plus:
assigned_developer: Optional[str] # Alice|Bob|Carol|David|Eve
suggested_action: Optional[str] # fix_immediately|schedule_sprint|needs_more_info|wontfix|duplicate
reasoning: Optional[str] # Chain-of-thought (not graded)
5. Episode Flow
reset(task_id="task_1")
β
βββ Returns BugTriageObservation:
- bug_report = random bug from dataset (15 bugs)
- done = False
- episode_id = "abc123"
step(action=BugTriageAction(task_id="task_1", bug_type="crash"))
β
βββ Grader fires immediately (single-step episode)
β task1_grader.grade([action], ground_truth) β 1.0
β
βββ Returns BugTriageObservation:
- done = True
- reward = 1.0 (shaped from grader score)
- grader_score = 1.0
- feedback = "Bug type: β (predicted=crash, expected=crash)"
Key design: Episodes are single-step β agent reads bug, makes one decision, episode ends. This matches real-world triage (you don't re-classify the same bug iteratively).
6. File Layout
bug_triage_env/
βββ __init__.py β Package exports
βββ models.py β Pydantic typed Action/Observation/State
βββ client.py β Sync + Async HTTP clients
βββ baseline.py β OpenAI GPT-4o-mini inference script
βββ openenv.yaml β Manifest
βββ pyproject.toml
βββ requirements.txt
β
βββ data/
β βββ bugs.json β 15 real-world bug reports + ground truth
β
βββ graders/
β βββ __init__.py β GRADERS registry
β βββ task1_grader.py β Bug type exact match [0/1]
β βββ task2_grader.py β Priority distance scoring [0-1]
β βββ task3_grader.py β Weighted composite [0-1]
β
βββ server/
βββ __init__.py
βββ environment.py β BugTriageEnvironment(Environment)
βββ app.py β FastAPI (standard + hackathon endpoints)
βββ Dockerfile
7. API Endpoints
Standard OpenEnv
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /health |
β {"status": "healthy"} |
| POST | /reset |
Body: {"task_id": "task_1"} β observation with bug report |
| POST | /step |
Body: {"episode_id": "...", "action": {...}} β scored observation |
| GET | /state |
β current episode metadata |
| GET | /docs |
Swagger UI |
Hackathon Required
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /tasks |
β 3 tasks with action schemas |
| POST | /grader |
Body: {"episode_id":"...","task_id":"task_1"} β score [0.0-1.0] |
| POST | /baseline |
Runs baseline.py β all task scores |
8. Developer Specialization Matrix
| Developer | Crash | UI | Perf | Security | Data Loss | Compat |
|---|---|---|---|---|---|---|
| Alice | β | β | β | |||
| Bob | β | β | ||||
| Carol | β | β | ||||
| David | β | β | ||||
| Eve | β | β | β |
This matrix is used by the Task 3 grader for partial credit on developer assignment β if the agent picks the wrong person but someone with the right specialization, it gets 0.5 instead of 0.0.
9. Scaling
| Deployment | Workers | Max Sessions |
|---|---|---|
| Local | 8 | ~2000 |
| HF Spaces Free | 2 | ~128 |
| HF Spaces Upgrade | 4-8 | ~512 |
Thread-safe: BugTriageEnvironment uses a threading.Lock for concurrent episode storage.