Delete AGENT.md with huggingface_hub
AGENT.md
DELETED
@@ -1,102 +0,0 @@
---
title: HyperBrickCaseOps Agent Guide
---

# HyperBrickCaseOps Agent Guide

This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.

## Quick Start (Agent Strategy)

Recommended action order:

1. `classify`: set `queue`, `priority`, `issue_type`
2. `request_info` if `required_next_actions` includes it
3. `wait` if the customer follow-up is pending
4. `draft_reply`
5. `add_internal_note`
6. `submit`
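
The recommended order can be turned into a small selection helper. This is a minimal sketch, assuming `required_next_actions` is a list of operation names that match the `operation` values; `RECOMMENDED_ORDER` and `next_operation` are hypothetical names, and whether `wait` ever appears in `required_next_actions` is not stated in this guide.

```python
from typing import List, Optional

# Order taken from the Quick Start list; the constant name is illustrative.
RECOMMENDED_ORDER = [
    "classify",
    "request_info",
    "wait",
    "draft_reply",
    "add_internal_note",
    "submit",
]


def next_operation(required_next_actions: List[str]) -> Optional[str]:
    """Return the earliest recommended operation the environment still asks for.

    Assumes `required_next_actions` holds plain operation names.
    Returns None when nothing in the list matches a known operation.
    """
    for op in RECOMMENDED_ORDER:
        if op in required_next_actions:
            return op
    return None
```

An agent could call `next_operation(observation["required_next_actions"])` each step and fall back to its own policy when the result is `None`.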

## Environment API

The environment follows the standard OpenEnv API:

- `reset()` -> initial observation
- `step(action)` -> next observation, reward, done
- `state()` -> internal state snapshot

Server entrypoint:

- `server.app:app`
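
The three calls combine into an episode loop along the following lines. This is a sketch under stated assumptions: `StubEnv` is a hypothetical stand-in, not the real environment, and the tuple returned by `step` mirrors the "next observation, reward, done" wording above; the actual client may wrap these values in a result object instead.

```python
from typing import Any, Callable, Dict, List, Tuple


class StubEnv:
    """Hypothetical stand-in with the reset/step/state shape; not the real environment."""

    def __init__(self, max_steps: int = 3) -> None:
        self._steps = 0
        self._max_steps = max_steps

    def reset(self) -> Dict[str, Any]:
        self._steps = 0
        return {"workflow_stage": "triage", "required_next_actions": ["classify"]}

    def step(self, action: Dict[str, Any]) -> Tuple[Dict[str, Any], float, bool]:
        # Assumed return shape: (observation, reward, done).
        self._steps += 1
        done = self._steps >= self._max_steps
        obs = {"workflow_stage": "closed" if done else "in_progress",
               "required_next_actions": []}
        return obs, 0.1, done

    def state(self) -> Dict[str, Any]:
        return {"steps": self._steps}


def run_episode(env: Any,
                policy: Callable[[Dict[str, Any]], Dict[str, Any]],
                max_steps: int = 20) -> List[float]:
    """Reset the env, step until done or max_steps, and collect per-step rewards."""
    obs = env.reset()
    rewards: List[float] = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards
```

The `max_steps` cap is a defensive choice so that a policy that never reaches `submit` cannot loop forever.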

## Action Schema

Each step takes a typed `SupportDeskAction`:

- `operation`: `classify|request_info|draft_reply|add_internal_note|submit|wait`
- `queue`: string or null
- `priority`: string or null
- `issue_type`: string or null
- `status`: string or null
- `resolution_code`: string or null
- `requested_fields`: list of strings
- `reply`: string or null
- `internal_note`: string or null
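
The field list maps naturally onto a typed record. The sketch below uses a standard-library dataclass purely for illustration; the real `SupportDeskAction` may be defined differently (for example as a Pydantic model), and the defaults of `None` and an empty list are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Literal, Optional

Operation = Literal[
    "classify", "request_info", "draft_reply",
    "add_internal_note", "submit", "wait",
]


@dataclass
class SupportDeskAction:
    """Illustrative mirror of the action schema; not the environment's own class."""

    operation: Operation
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: List[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None


# Example: a classify action using values from the routing guide.
classify_action = SupportDeskAction(
    operation="classify",
    queue="billing_ops",
    priority="high",
    issue_type="duplicate_charge",
)
payload = asdict(classify_action)  # plain dict, e.g. for JSON serialization
```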

## Observation Highlights

The observation includes:

- `task_id`, `difficulty`, `objective`
- `ticket` (customer, tier, region, business impact)
- `knowledge_base` (policy snippets)
- `case` (current triage state)
- `workflow_stage`, `required_next_actions`, `risk_flags`

## Tasks and Difficulty

There are 4 tasks across three difficulty levels:

- `billing_refund_easy` (easy)
- `account_takeover_medium` (medium)
- `api_incident_hard` (hard)
- `regulated_export_exception_hard` (hard)

## Grading and Reward

- Deterministic graders score task completion
- Final scores are clamped to `(0.01, 0.99)`
- Reward provides dense progress signals across the episode
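
Clamping keeps every final score away from exactly 0 and 1. A minimal sketch of that step, assuming the bounds are applied with a plain min/max; `clamp_score` is a hypothetical name, and the guide does not say how the grader handles values that fall exactly on a bound.

```python
def clamp_score(raw: float, low: float = 0.01, high: float = 0.99) -> float:
    """Clamp a raw grader score into [low, high] (bounds taken from the guide)."""
    return max(low, min(high, raw))
```

Under this sketch, a perfect raw score of `1.0` is reported as `0.99` and a raw score of `0.0` as `0.01`.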

## Routing Guide (High-Level)

- Duplicate charge -> `billing_ops`, `high`, `duplicate_charge`
- Suspicious login -> `trust_and_safety`, `urgent`, `account_compromise`
- Production 500s -> `platform_engineering`, `urgent`, `production_incident`
- Export policy bypass -> `compliance_ops`, `high`, `regulated_exception`
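
The routing rules read as a lookup table from scenario to `queue`, `priority`, and `issue_type`. The sketch below encodes them that way; the scenario keys and the names `Route`, `ROUTES`, and `classify_fields` are hypothetical, and a real agent would still have to decide which scenario a ticket belongs to.

```python
from typing import Dict, NamedTuple


class Route(NamedTuple):
    queue: str
    priority: str
    issue_type: str


# Scenario keys are illustrative labels; the values come from the routing guide.
ROUTES: Dict[str, Route] = {
    "duplicate_charge": Route("billing_ops", "high", "duplicate_charge"),
    "suspicious_login": Route("trust_and_safety", "urgent", "account_compromise"),
    "production_500s": Route("platform_engineering", "urgent", "production_incident"),
    "export_policy_bypass": Route("compliance_ops", "high", "regulated_exception"),
}


def classify_fields(scenario: str) -> Dict[str, str]:
    """Build the fields for a `classify` action from a scenario label.

    Raises KeyError for scenarios that are not in the table.
    """
    route = ROUTES[scenario]
    return {"operation": "classify", **route._asdict()}
```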

## Required Environment Variables

Baseline inference uses:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
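
An inference script may want to fail fast when one of these variables is missing. A minimal sketch, assuming all three are mandatory and non-empty; `load_config` and `REQUIRED_VARS` are hypothetical names, not part of the baseline script.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the required variables and raise if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a mapping explicitly makes the helper easy to exercise without touching the real process environment.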

## Mandatory Stdout Format

The inference script must emit exactly:

```
[START] task=<task_name> env=<benchmark> model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
```

Rules:

- One `[START]` at episode begin
- One `[STEP]` per env step
- One `[END]` after episode close
- `reward` and `rewards` formatted to 2 decimals
- `done`/`success` are lowercase booleans
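
The three line templates and the rules above can be captured in small formatting helpers. This is a sketch, not the baseline script: the function names are hypothetical, and formatting `score` to 2 decimals is an assumption, since the rules only specify that for `reward` and `rewards`.

```python
from typing import List, Optional


def fmt_bool(value: bool) -> str:
    """Render a Python bool as the lowercase form the format requires."""
    return "true" if value else "false"


def start_line(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def step_line(step: int, action: str, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    error_str = "null" if error is None else error
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={fmt_bool(done)} error={error_str}")


def end_line(success: bool, steps: int, score: float,
             rewards: List[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    # Assumption: score uses the same 2-decimal format as rewards.
    return (f"[END] success={fmt_bool(success)} steps={steps} "
            f"score={score:.2f} rewards={joined}")
```

Printing `start_line(...)` once, `step_line(...)` after each environment step, and `end_line(...)` once at the end follows the one-per-event rules listed above.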