Space: modelbuilderhq/HyperBrickCaseOps

modelbuilderhq committed 29 days ago

Commit

35d990a

verified ·

1 Parent(s): 5373a8d

Delete AGENT.md with huggingface_hub

Files changed (1)

AGENT.md +0 -102

AGENT.md DELETED

@@ -1,102 +0,0 @@
----
-title: HyperBrickCaseOps Agent Guide
----
-# HyperBrickCaseOps Agent Guide
-This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.
-## Quick Start (Agent Strategy)
-Recommended action order:
-1. `classify` — set `queue`, `priority`, `issue_type`
-2. `request_info` if `required_next_actions` includes it
-3. `wait` if the customer follow-up is pending
-4. `draft_reply`
-5. `add_internal_note`
-6. `submit`
-## Environment API
-The environment follows the standard OpenEnv API:
-- `reset()` -> initial observation
-- `step(action)` -> next observation, reward, done
-- `state()` -> internal state snapshot
-Server entrypoint:
-- `server.app:app`
-## Action Schema
-Each step takes a typed `SupportDeskAction`:
-- `operation`: `classify|request_info|draft_reply|add_internal_note|submit|wait`
-- `queue`: string or null
-- `priority`: string or null
-- `issue_type`: string or null
-- `status`: string or null
-- `resolution_code`: string or null
-- `requested_fields`: list of strings
-- `reply`: string or null
-- `internal_note`: string or null
-## Observation Highlights
-The observation includes:
-- `task_id`, `difficulty`, `objective`
-- `ticket` (customer, tier, region, business impact)
-- `knowledge_base` (policy snippets)
-- `case` (current triage state)
-- `workflow_stage`, `required_next_actions`, `risk_flags`
-## Tasks and Difficulty
-There are 4 tasks with increasing difficulty:
-- `billing_refund_easy` (easy)
-- `account_takeover_medium` (medium)
-- `api_incident_hard` (hard)
-- `regulated_export_exception_hard` (hard)
-## Grading and Reward
-- Deterministic graders score task completion
-- Final scores are clamped to `(0.01, 0.99)`
-- Reward provides dense progress signals across the episode
-## Routing Guide (High-Level)
-- Duplicate charge -> `billing_ops`, `high`, `duplicate_charge`
-- Suspicious login -> `trust_and_safety`, `urgent`, `account_compromise`
-- Production 500s -> `platform_engineering`, `urgent`, `production_incident`
-- Export policy bypass -> `compliance_ops`, `high`, `regulated_exception`
-## Required Environment Variables
-Baseline inference uses:
-- `API_BASE_URL`
-- `MODEL_NAME`
-- `HF_TOKEN`
-## Mandatory Stdout Format
-The inference script must emit exactly:
-```
-[START] task=<task_name> env=<benchmark> model=<model_name>
-[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
-[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
-```
-Rules:
-- One `[START]` at episode begin
-- One `[STEP]` per env step
-- One `[END]` after episode close
-- `reward` and `rewards` formatted to 2 decimals
-- `done`/`success` are lowercase booleans
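
For reference, the stdout format described in the deleted guide could be produced by helpers along the following lines. This is a minimal sketch, not the repository's inference script: the function names (`format_start`, `format_step`, `format_end`) are hypothetical, and rendering `score` with two decimals is an assumption, since the guide fixes the precision only for `reward` and `rewards`.

```python
from typing import Optional, Sequence


def _flag(value: bool) -> str:
    """Render a Python bool as the lowercase literal the guide requires."""
    return "true" if value else "false"


def format_start(task: str, env: str, model: str) -> str:
    """Build the single [START] line emitted when an episode begins."""
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, action: str, reward: float, done: bool,
                error: Optional[str] = None) -> str:
    """Build one [STEP] line; a missing error is written as the literal null."""
    err = "null" if error is None else error
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={_flag(done)} error={err}")


def format_end(success: bool, steps: int, score: float,
               rewards: Sequence[float]) -> str:
    """Build the single [END] line emitted after the episode closes."""
    joined = ",".join(f"{r:.2f}" for r in rewards)
    # Assumption: score uses two decimals; the guide does not specify this.
    return (f"[END] success={_flag(success)} steps={steps} "
            f"score={score:.2f} rewards={joined}")


if __name__ == "__main__":
    # Hypothetical env and model names, used only to show the output shape.
    print(format_start("billing_refund_easy", "example_env", "example-model"))
    print(format_step(1, "classify", 0.25, False))
    print(format_step(2, "submit", 0.5, True))
    print(format_end(True, 2, 0.9, [0.25, 0.5]))
```

Keeping the formatting in small pure functions makes it straightforward to check the emitted lines against the template before wiring them into an episode loop.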