
HyperBrickCaseOps Agent Guide

This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.

Quick Start (Agent Strategy)

Recommended action order:

  1. classify: set queue, priority, and issue_type
  2. request_info: only if required_next_actions includes it
  3. wait: only while a customer follow-up is pending
  4. draft_reply
  5. add_internal_note
  6. submit
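The recommended order can be sketched as a small decision function. This is a minimal illustration, not the environment's own logic: `next_operation`, the `done_ops` set, and the `followup_pending` flag are hypothetical names, and the sketch assumes `required_next_actions` is a list of operation names.

```python
from typing import Sequence


def next_operation(
    done_ops: set[str],
    required_next_actions: Sequence[str],
    followup_pending: bool,
) -> str:
    """Hypothetical helper: pick the next operation following the recommended order.

    Assumes `done_ops` holds operations already taken this episode and that
    `required_next_actions` lists operation names; neither shape is confirmed.
    """
    if "classify" not in done_ops:
        return "classify"
    # Ask for missing info only when the environment says it is required.
    if "request_info" in required_next_actions and "request_info" not in done_ops:
        return "request_info"
    # Hold off on drafting while the customer's follow-up is still outstanding.
    if followup_pending:
        return "wait"
    for op in ("draft_reply", "add_internal_note"):
        if op not in done_ops:
            return op
    return "submit"
```

In practice `followup_pending` would be derived from the observation (for example from `workflow_stage`), but how that field signals a pending follow-up is not documented here.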

Environment API

The environment follows the standard OpenEnv API:

  • reset() -> initial observation
  • step(action) -> next observation, reward, done
  • state() -> internal state snapshot
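A minimal episode loop over these calls might look like the following. It assumes `step()` returns an object with `observation`, `reward`, and `done` attributes; the `StepResult` container, the `SupportEnv` protocol, and the `max_steps` cap are hypothetical, so adapt them to whatever the actual client returns.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol


@dataclass
class StepResult:
    """Hypothetical container for step() output: next observation, reward, done."""
    observation: Any
    reward: float
    done: bool


class SupportEnv(Protocol):
    """Structural type matching the reset()/step()/state() surface."""
    def reset(self) -> Any: ...
    def step(self, action: Any) -> StepResult: ...
    def state(self) -> Any: ...


def run_episode(
    env: SupportEnv,
    policy: Callable[[Any], Any],
    max_steps: int = 20,
) -> list[float]:
    """Run one episode and return the per-step rewards."""
    obs = env.reset()
    rewards: list[float] = []
    for _ in range(max_steps):
        result = env.step(policy(obs))
        rewards.append(result.reward)
        obs = result.observation
        if result.done:
            break
    return rewards
```

The `max_steps` cap simply guards against a policy that never submits; the real environment may enforce its own step limit.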

Server entrypoint:

  • server.app:app

Action Schema

Each step takes a typed SupportDeskAction:

  • operation: classify|request_info|draft_reply|add_internal_note|submit|wait
  • queue: string or null
  • priority: string or null
  • issue_type: string or null
  • status: string or null
  • resolution_code: string or null
  • requested_fields: list of strings
  • reply: string or null
  • internal_note: string or null
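For illustration, the schema can be mirrored as a plain Python dataclass. `SupportDeskActionSketch` is a hypothetical stand-in, not the real `SupportDeskAction` class, which may be a Pydantic model with its own validation; defaulting `requested_fields` to an empty list is also an assumption.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

# The six operations listed in the action schema.
OPERATIONS = {"classify", "request_info", "draft_reply", "add_internal_note", "submit", "wait"}


@dataclass
class SupportDeskActionSketch:
    """Hypothetical mirror of SupportDeskAction, for illustration only."""
    operation: str
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: list[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject operations outside the documented set early.
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation!r}")


# Example: a classify action; unused fields stay None.
example = SupportDeskActionSketch(
    operation="classify", queue="billing_ops", priority="high", issue_type="duplicate_charge"
)
payload = asdict(example)
```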

Observation Highlights

The observation includes:

  • task_id, difficulty, objective
  • ticket (customer, tier, region, business impact)
  • knowledge_base (policy snippets)
  • case (current triage state)
  • workflow_stage, required_next_actions, risk_flags
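A rough typed view of these fields, plus a helper that condenses them into one line for prompts or logs, is sketched here. The value types in `ObservationSketch` and the `summarize` helper are assumptions, since this guide names the fields but not their types.

```python
from typing import Any, TypedDict


# Hypothetical typing of the observation fields; value types are guesses.
class ObservationSketch(TypedDict, total=False):
    task_id: str
    difficulty: str
    objective: str
    ticket: dict[str, Any]
    knowledge_base: list[str]
    case: dict[str, Any]
    workflow_stage: str
    required_next_actions: list[str]
    risk_flags: list[str]


def summarize(obs: ObservationSketch) -> str:
    # Missing or empty lists become "none" so the line always has the same shape.
    next_ops = ",".join(obs.get("required_next_actions", [])) or "none"
    risks = ",".join(obs.get("risk_flags", [])) or "none"
    return (
        f"{obs.get('task_id', '?')} [{obs.get('difficulty', '?')}] "
        f"stage={obs.get('workflow_stage', '?')} next={next_ops} risks={risks}"
    )
```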

Tasks and Difficulty

There are four tasks, in order of increasing difficulty:

  • billing_refund_easy (easy)
  • account_takeover_medium (medium)
  • api_incident_hard (hard)
  • regulated_export_exception_hard (hard)

Grading and Reward

  • Deterministic graders score task completion
  • Final scores are clamped to [0.01, 0.99]
  • Per-step rewards provide dense progress signals throughout the episode
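Clamping can be written as a min/max pair. This sketch assumes a simple clamp of the raw grader score to the 0.01 and 0.99 bounds; the `clamp_score` name is hypothetical, and how the graders compute the raw score is not shown.

```python
def clamp_score(raw: float, low: float = 0.01, high: float = 0.99) -> float:
    """Hypothetical helper: clamp a raw grader score into [low, high]."""
    return max(low, min(high, raw))
```

A perfect raw score of 1.0 therefore reports as 0.99 and a zero reports as 0.01, so a final score never sits exactly at 0 or 1.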

Routing Guide (High-Level)

Each scenario maps to queue, priority, issue_type, in that order:

  • Duplicate charge -> billing_ops, high, duplicate_charge
  • Suspicious login -> trust_and_safety, urgent, account_compromise
  • Production 500s -> platform_engineering, urgent, production_incident
  • Export policy bypass -> compliance_ops, high, regulated_exception
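The routing guide translates directly into a lookup table that produces a `classify` payload. The scenario keys in `ROUTING` and the `classify_action` helper are hypothetical; only the queue, priority, and issue_type values come from the guide, and matching a real ticket to a scenario still requires reading the ticket and knowledge base.

```python
# Scenario -> (queue, priority, issue_type), transcribed from the routing guide.
# The scenario keys themselves are hypothetical labels.
ROUTING = {
    "duplicate_charge": ("billing_ops", "high", "duplicate_charge"),
    "suspicious_login": ("trust_and_safety", "urgent", "account_compromise"),
    "production_500s": ("platform_engineering", "urgent", "production_incident"),
    "export_policy_bypass": ("compliance_ops", "high", "regulated_exception"),
}


def classify_action(scenario: str) -> dict[str, str]:
    """Build a classify action payload for a known scenario (KeyError otherwise)."""
    queue, priority, issue_type = ROUTING[scenario]
    return {
        "operation": "classify",
        "queue": queue,
        "priority": priority,
        "issue_type": issue_type,
    }
```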

Required Environment Variables

The baseline inference script reads these environment variables:

  • API_BASE_URL
  • MODEL_NAME
  • HF_TOKEN
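A fail-fast loader for these variables might look like this. `load_inference_config` is a hypothetical helper; whether the actual baseline supplies defaults for any of these variables is not stated here.

```python
import os


def load_inference_config() -> dict[str, str]:
    """Read the three required variables; raise if any is missing or empty."""
    names = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}
```

Failing at startup surfaces a misconfigured run before any episode begins, rather than partway through the first model call.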

Mandatory Stdout Format

The inference script must emit lines in exactly this format:

[START] task=<task_name> env=<benchmark> model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>

Rules:

  • One [START] at episode begin
  • One [STEP] per env step
  • One [END] after episode close
  • reward and rewards values are formatted to two decimal places
  • done and success are lowercase booleans (true or false)
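The three line formats and their rules can be captured in small formatting helpers. This is a sketch under stated assumptions: the function names are hypothetical, `action` is assumed to be serialized to a string beforehand, and printing `score` with two decimals is a guess because the guide does not fix its precision.

```python
from typing import Optional, Sequence


def _flag(value: bool) -> str:
    # Lowercase booleans, as the rules require.
    return "true" if value else "false"


def start_line(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def step_line(n: int, action: str, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    # Reward uses two decimals; a missing error is printed as the literal null.
    err = error if error is not None else "null"
    return f"[STEP] step={n} action={action} reward={reward:.2f} done={_flag(done)} error={err}"


def end_line(success: bool, steps: int, score: float, rewards: Sequence[float]) -> str:
    # Each reward uses two decimals; score precision (.2f) is an assumption.
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return f"[END] success={_flag(success)} steps={steps} score={score:.2f} rewards={joined}"
```

Printing each line with `print(line, flush=True)` keeps the output from sitting in a buffer when stdout is piped to a harness.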