modelbuilderhq committed on
Commit 35d990a · verified · 1 Parent(s): 5373a8d

Delete AGENT.md with huggingface_hub

Files changed (1)
  1. AGENT.md +0 -102
AGENT.md DELETED
@@ -1,102 +0,0 @@

---
title: HyperBrickCaseOps Agent Guide
---

# HyperBrickCaseOps Agent Guide

This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.

## Quick Start (Agent Strategy)

Recommended action order:

1. `classify`: set `queue`, `priority`, and `issue_type`
2. `request_info`, if `required_next_actions` includes it
3. `wait`, if the customer follow-up is still pending
4. `draft_reply`
5. `add_internal_note`
6. `submit`
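One way to encode this order is a small decision function. The sketch below is hypothetical: `next_operation` is not part of the environment, the agent tracks the operations it has already sent in its own `completed` set, and it assumes a pending customer follow-up is signalled by `wait` appearing in `required_next_actions`. The guide does not say how that is exposed.

```python
"""Hypothetical next-operation picker following the recommended action order."""

from collections.abc import Iterable


def next_operation(required_next_actions: Iterable[str], completed: set[str]) -> str:
    """Pick the next operation name.

    `completed` holds operations the agent has already sent. Assumption:
    the environment keeps `request_info` / `wait` in `required_next_actions`
    while they are still needed.
    """
    required = set(required_next_actions)
    # 1. Classification always comes first.
    if "classify" not in completed:
        return "classify"
    # 2. Ask for missing information only when the workflow requires it.
    if "request_info" in required and "request_info" not in completed:
        return "request_info"
    # 3. Wait while the customer follow-up is still pending.
    if "wait" in required:
        return "wait"
    # 4-5. Draft the reply, then add the internal note.
    for op in ("draft_reply", "add_internal_note"):
        if op not in completed:
            return op
    # 6. Everything else is done, so submit.
    return "submit"
```

Keeping `completed` on the agent side means the function needs nothing from the observation beyond `required_next_actions`.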

## Environment API

The environment follows the standard OpenEnv API:

- `reset()` -> initial observation
- `step(action)` -> next observation, reward, done
- `state()` -> internal state snapshot
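A minimal episode loop over this contract might look like the sketch below. It assumes, as the list above states, that `step()` returns an `(observation, reward, done)` triple of plain Python values. The real OpenEnv client may wrap these in richer result objects. `SupportEnv`, `run_episode`, and `ToyEnv` are hypothetical names, and `ToyEnv` exists only to make the loop runnable.

```python
"""Hypothetical episode loop over the reset()/step()/state() contract."""

from typing import Any, Callable, Protocol


class SupportEnv(Protocol):
    """Assumed interface: plain dict observations and a (obs, reward, done) step."""

    def reset(self) -> dict[str, Any]: ...
    def step(self, action: Any) -> tuple[dict[str, Any], float, bool]: ...
    def state(self) -> dict[str, Any]: ...


def run_episode(
    env: SupportEnv,
    policy: Callable[[dict[str, Any]], Any],
    max_steps: int = 20,
) -> list[float]:
    """Run one episode and return the per-step rewards."""
    observation = env.reset()
    rewards: list[float] = []
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


class ToyEnv:
    """Toy stand-in that ends the episode after three steps."""

    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> dict[str, Any]:
        self.steps = 0
        return {"workflow_stage": "new"}

    def step(self, action: Any) -> tuple[dict[str, Any], float, bool]:
        self.steps += 1
        return {"workflow_stage": f"step{self.steps}"}, 0.25, self.steps >= 3

    def state(self) -> dict[str, Any]:
        return {"steps": self.steps}
```

The `max_steps` cap is a safety net of this sketch, not a documented limit of the environment.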

Server entrypoint (module `server.app`, application object `app`):

- `server.app:app`

## Action Schema

Each `step()` call takes a typed `SupportDeskAction` with these fields:

- `operation`: one of `classify`, `request_info`, `draft_reply`, `add_internal_note`, `submit`, `wait`
- `queue`: string or null
- `priority`: string or null
- `issue_type`: string or null
- `status`: string or null
- `resolution_code`: string or null
- `requested_fields`: list of strings
- `reply`: string or null
- `internal_note`: string or null
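To experiment with the schema outside the environment, it can be mirrored with a plain dataclass. `SupportDeskActionSketch` is a hypothetical stand-in, not the real `SupportDeskAction`, whose base class and validation rules this guide does not describe. The operation check and the `to_payload()` helper are illustrative additions.

```python
"""Hypothetical stand-in for the SupportDeskAction schema."""

from dataclasses import asdict, dataclass, field
from typing import Optional

# Allowed values of `operation`, as listed in the schema.
OPERATIONS = frozenset(
    {"classify", "request_info", "draft_reply", "add_internal_note", "submit", "wait"}
)


@dataclass
class SupportDeskActionSketch:
    operation: str
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: list[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject operations outside the documented set early.
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation!r}")

    def to_payload(self) -> dict:
        """Return a plain dict, e.g. for JSON serialization."""
        return asdict(self)
```

A `classify` action for the duplicate-charge route would be `SupportDeskActionSketch(operation="classify", queue="billing_ops", priority="high", issue_type="duplicate_charge")`. Every other field stays at its `None` or empty-list default.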

## Observation Highlights

The observation includes:

- `task_id`, `difficulty`, `objective`
- `ticket` (customer, tier, region, business impact)
- `knowledge_base` (policy snippets)
- `case` (current triage state)
- `workflow_stage`, `required_next_actions`, `risk_flags`

## Tasks and Difficulty

There are four tasks, listed in order of increasing difficulty:

- `billing_refund_easy` (easy)
- `account_takeover_medium` (medium)
- `api_incident_hard` (hard)
- `regulated_export_exception_hard` (hard)

## Grading and Reward

- Deterministic graders score task completion
- Final scores are clamped to the range `(0.01, 0.99)`, so no task ever scores exactly 0 or 1
- The step reward provides dense progress signals throughout the episode, not only a single signal at the end
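Clamping itself is a one-liner. The sketch assumes the grader first produces a raw score and then limits it to the documented bounds. The guide does not say how the raw score is computed, and `clamp_score` is a hypothetical helper name.

```python
"""Sketch of clamping a raw grader score into the documented bounds."""

SCORE_MIN = 0.01
SCORE_MAX = 0.99


def clamp_score(raw: float, lo: float = SCORE_MIN, hi: float = SCORE_MAX) -> float:
    """Limit `raw` to [lo, hi] so a final score is never exactly 0 or 1."""
    return max(lo, min(hi, raw))
```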

## Routing Guide (High-Level)

Each ticket pattern maps to the `queue`, `priority`, and `issue_type` values to set in `classify`:

- Duplicate charge -> `billing_ops`, `high`, `duplicate_charge`
- Suspicious login -> `trust_and_safety`, `urgent`, `account_compromise`
- Production 500s -> `platform_engineering`, `urgent`, `production_incident`
- Export policy bypass -> `compliance_ops`, `high`, `regulated_exception`
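Because the guide is a fixed mapping, an agent can keep it as a lookup table and only has to decide which pattern a ticket matches. The dictionary keys below (`duplicate_charge`, `suspicious_login`, and so on) are hypothetical labels. `Route` and `classify_fields` are illustrative names, and the values are copied from the guide.

```python
"""Routing guide encoded as a lookup table (illustrative keys)."""

from typing import NamedTuple


class Route(NamedTuple):
    queue: str
    priority: str
    issue_type: str


# Keys are hypothetical short labels for the ticket patterns in the guide.
ROUTES: dict[str, Route] = {
    "duplicate_charge": Route("billing_ops", "high", "duplicate_charge"),
    "suspicious_login": Route("trust_and_safety", "urgent", "account_compromise"),
    "production_500s": Route("platform_engineering", "urgent", "production_incident"),
    "export_policy_bypass": Route("compliance_ops", "high", "regulated_exception"),
}


def classify_fields(pattern: str) -> dict[str, str]:
    """Return the classify fields for a known ticket pattern (KeyError otherwise)."""
    return ROUTES[pattern]._asdict()
```

Recognizing which pattern a ticket matches is still the agent's job. The table only pins down the values once that decision is made.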

## Required Environment Variables

The baseline inference script reads these environment variables:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
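A baseline script would typically read these once at startup. The sketch below fails fast and lists every missing variable. That behavior, the `InferenceConfig` dataclass, and `load_config` are assumptions for illustration, since the guide only names the variables.

```python
"""Sketch of loading the baseline's required environment variables."""

import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(environ: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the variables, raising once with every missing or empty name listed."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return InferenceConfig(
        api_base_url=environ["API_BASE_URL"],
        model_name=environ["MODEL_NAME"],
        hf_token=environ["HF_TOKEN"],
    )
```

Passing a plain dict as `environ` makes the loader easy to test without touching the real process environment.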

## Mandatory Stdout Format

The inference script must emit these lines to stdout, in exactly this format:

```
[START] task=<task_name> env=<benchmark> model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
```

Rules:

- Exactly one `[START]` line, at the beginning of the episode
- One `[STEP]` line per environment step
- Exactly one `[END]` line, after the episode closes
- `reward` and `rewards` are formatted to 2 decimal places
- `done` and `success` are lowercase booleans (`true`/`false`)
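The rules above translate directly into a few string formatters. `start_line`, `step_line`, and `end_line` are hypothetical helpers. Printing `score` with two decimals is an assumption, since the rules fix the precision only for `reward` and `rewards`.

```python
"""Hypothetical helpers that format the mandatory [START]/[STEP]/[END] lines."""

from typing import Optional, Sequence


def _bool(value: bool) -> str:
    # Lowercase booleans, per the rules.
    return "true" if value else "false"


def start_line(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def step_line(step: int, action: str, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    # Reward uses two decimals; a missing error is printed as `null`.
    err = error if error is not None else "null"
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_bool(done)} error={err}")


def end_line(success: bool, steps: int, score: float,
             rewards: Sequence[float]) -> str:
    # Comma-separated rewards, two decimals each. Assumption: score also uses two decimals.
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={_bool(success)} steps={steps} "
            f"score={score:.2f} rewards={joined}")
```

Each line would then be written with `print(line, flush=True)`, so that output is not held in a buffer when stdout is a pipe read by an evaluator.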