Spaces:

savetrees
/

bug-triage-openenv

Running

App Files Files Community

bug-triage-openenv / docs /ARCHITECTURE.md

savetrees

Upload folder using huggingface_hub

0135a17 verified 2 days ago

preview code

raw

history blame contribute delete

6.01 kB

ARCHITECTURE.md — Bug Triage OpenEnv System Architecture

1. Core Abstractions (OpenEnv Spec)

Server-side (Docker)

BugTriageEnvironment(Environment)
├── reset(task_id)  →  BugTriageObservation   # New bug report episode
├── step(action)    →  BugTriageObservation   # Agent triages; grader fires; done=True
└── state @property →  BugTriageState         # Episode metadata

Client-side (Training code)

BugTriageEnvClient
├── reset(task_id)  →  dict   # POST /reset
├── step(ep_id, action) → dict  # POST /step
└── state()  →  dict           # GET  /state

2. State Model

class BugTriageState:
    episode_id: str         # Unique per episode
    step_count: int         # 0 after reset, 1 after step
    task_id: str            # "task_1" | "task_2" | "task_3"
    bug_id: str             # Which bug report is active
    cumulative_reward: float

3. Observation Model

class BugTriageObservation:
    done: bool                        # True after step()
    reward: float                     # Shaped reward [-0.5, 1.0]
    task_id: str

    bug_report: BugReport             # Title, description, logs, env, reporter, metadata
    available_developers: List[str]   # ["Alice", "Bob", "Carol", "David", "Eve"]
    step_number: int
    feedback: str                     # Human-readable grader feedback
    grader_score: Optional[float]     # [0.0-1.0] — only when done=True
    episode_id: str

BugReport fields the agent reads:

Field	Type	Example
title	str	"App crashes on iOS 17 uploading >50MB"
description	str	Full description with context
logs	str?	Stack traces, error output
environment	str?	"iOS 17.2, iPhone 15 Pro, App v3.2.1"
reporter	str?	"enterprise_client_a"
metadata	dict	`{"component": "file_upload", "affected_users": 847}`

4. Action Model

class BugTriageAction:
    task_id: str                                    # Required always

    # Task 1 (Easy)
    bug_type: Optional[str]                         # crash|ui|performance|security|data_loss|compatibility

    # Task 2 (Medium)
    priority: Optional[str]                         # low|medium|high|critical

    # Task 3 (Hard) — all of the above plus:
    assigned_developer: Optional[str]               # Alice|Bob|Carol|David|Eve
    suggested_action: Optional[str]                 # fix_immediately|schedule_sprint|needs_more_info|wontfix|duplicate
    reasoning: Optional[str]                        # Chain-of-thought (not graded)

5. Episode Flow

reset(task_id="task_1")
  │
  └── Returns BugTriageObservation:
        - bug_report = random bug from dataset (15 bugs)
        - done = False
        - episode_id = "abc123"

step(action=BugTriageAction(task_id="task_1", bug_type="crash"))
  │
  ├── Grader fires immediately (single-step episode)
  │     task1_grader.grade([action], ground_truth) → 1.0
  │
  └── Returns BugTriageObservation:
        - done = True
        - reward = 1.0 (shaped from grader score)
        - grader_score = 1.0
        - feedback = "Bug type: ✓ (predicted=crash, expected=crash)"

Key design: Episodes are single-step — agent reads bug, makes one decision, episode ends. This matches real-world triage (you don't re-classify the same bug iteratively).

6. File Layout

bug_triage_env/
├── __init__.py            ← Package exports
├── models.py              ← Pydantic typed Action/Observation/State
├── client.py              ← Sync + Async HTTP clients
├── baseline.py            ← OpenAI GPT-4o-mini inference script
├── openenv.yaml           ← Manifest
├── pyproject.toml
├── requirements.txt
│
├── data/
│   └── bugs.json          ← 15 real-world bug reports + ground truth
│
├── graders/
│   ├── __init__.py        ← GRADERS registry
│   ├── task1_grader.py    ← Bug type exact match [0/1]
│   ├── task2_grader.py    ← Priority distance scoring [0-1]
│   └── task3_grader.py    ← Weighted composite [0-1]
│
└── server/
    ├── __init__.py
    ├── environment.py     ← BugTriageEnvironment(Environment)
    ├── app.py             ← FastAPI (standard + hackathon endpoints)
    └── Dockerfile

7. API Endpoints

Standard OpenEnv

Method	Endpoint	Purpose
GET	`/health`	→ `{"status": "healthy"}`
POST	`/reset`	Body: `{"task_id": "task_1"}` → observation with bug report
POST	`/step`	Body: `{"episode_id": "...", "action": {...}}` → scored observation
GET	`/state`	→ current episode metadata
GET	`/docs`	Swagger UI

Hackathon Required

Method	Endpoint	Purpose
GET	`/tasks`	→ 3 tasks with action schemas
POST	`/grader`	Body: `{"episode_id":"...","task_id":"task_1"}` → score [0.0-1.0]
POST	`/baseline`	Runs baseline.py → all task scores

8. Developer Specialization Matrix

Developer	Crash	UI	Perf	Security	Data Loss	Compat
Alice	✓		✓		✓
Bob	✓			✓
Carol		✓				✓
David				✓	✓
Eve		✓	✓			✓

This matrix is used by the Task 3 grader for partial credit on developer assignment — if the agent picks the wrong person but someone with the right specialization, it gets 0.5 instead of 0.0.

9. Scaling

Deployment	Workers	Max Sessions
Local	8	~2000
HF Spaces Free	2	~128
HF Spaces Upgrade	4-8	~512

Thread-safe: BugTriageEnvironment uses a threading.Lock for concurrent episode storage.