Spaces:

modelbuilderhq
/

HyperBrickCaseOps

Sleeping

App Files Files Community

HyperBrickCaseOps / agents.md

modelbuilderhq

Upload folder using huggingface_hub

9ffc733 verified about 1 month ago

preview code

raw

history blame contribute delete

2.79 kB

metadata

title: HyperBrickCaseOps Agent Guide

HyperBrickCaseOps Agent Guide

This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.

Quick Start (Agent Strategy)

Recommended action order:

classify — set queue, priority, issue_type
request_info if required_next_actions includes it
wait if the customer follow-up is pending
draft_reply
add_internal_note
submit

Environment API

The environment follows the standard OpenEnv API:

reset() -> initial observation
step(action) -> next observation, reward, done
state() -> internal state snapshot

Server entrypoint:

server.app:app

Action Schema

Each step takes a typed SupportDeskAction:

operation: classify|request_info|draft_reply|add_internal_note|submit|wait
queue: string or null
priority: string or null
issue_type: string or null
status: string or null
resolution_code: string or null
requested_fields: list of strings
reply: string or null
internal_note: string or null

Observation Highlights

The observation includes:

task_id, difficulty, objective
ticket (customer, tier, region, business impact)
knowledge_base (policy snippets)
case (current triage state)
workflow_stage, required_next_actions, risk_flags

Tasks and Difficulty

There are 4 tasks with increasing difficulty:

billing_refund_easy (easy)
account_takeover_medium (medium)
api_incident_hard (hard)
regulated_export_exception_hard (hard)

Grading and Reward

Deterministic graders score task completion
Final scores are clamped to (0.01, 0.99)
Reward provides dense progress signals across the episode

Routing Guide (High-Level)

Duplicate charge -> billing_ops, high, duplicate_charge
Suspicious login -> trust_and_safety, urgent, account_compromise
Production 500s -> platform_engineering, urgent, production_incident
Export policy bypass -> compliance_ops, high, regulated_exception

Required Environment Variables

Baseline inference uses:

API_BASE_URL
MODEL_NAME
HF_TOKEN

Mandatory Stdout Format

The inference script must emit exactly:

[START] task=<task_name> env=<benchmark> model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>

Rules:

One [START] at episode begin
One [STEP] per env step
One [END] after episode close
reward and rewards formatted to 2 decimals
done/success are lowercase booleans