Spaces:

aim143
/

support-queue-openenv

Sleeping

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED Viewed

	@@ -0,0 +1,5 @@

+.uv-cache/
+.docker/
+__pycache__/
+.pytest_cache/
+inference_results.json

Dockerfile ADDED Viewed

	@@ -0,0 +1,21 @@

+FROM python:3.11-slim
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV PORT=8000
+WORKDIR /app
+COPY requirements.txt ./
+COPY pyproject.toml ./
+COPY README.md ./
+COPY server ./server
+COPY support_queue_env ./support_queue_env
+COPY openenv.yaml ./
+COPY inference.py ./
+RUN pip install --no-cache-dir -r requirements.txt
+EXPOSE 8000
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

PROJECT.md ADDED Viewed

	@@ -0,0 +1,208 @@

+# Support Queue OpenEnv
+A real-world OpenEnv benchmark for **SaaS support triage**.
+Agents must read incoming support tickets, assign the right priority, route the case to the correct internal queue, choose the next action, and draft a safe first reply. The benchmark is designed to feel like an actual support operations workflow rather than a toy task.
+## Why This Environment
+Real support teams repeatedly solve the same high-value triage problems:
+- decide how urgent a ticket is
+- route it to the right team
+- avoid unsafe or misleading replies
+- handle ambiguous requests without over-escalating
+This makes support triage a strong RL and agent-evaluation environment because success is measurable, partial credit is meaningful, and mistakes are easy to interpret.
+## What The Agent Does
+For each ticket, the agent must produce a `SupportQueueAction` with:
+- `priority`: `P1 | P2 | P3 | P4`
+- `queue`: `billing | security | technical | success | trust_safety`
+- `disposition`: `respond | request_info | escalate | close`
+- `summary`: short internal triage note
+- `response`: first customer-facing reply
+- `confidence`: float in `[0.0, 1.0]`
+## Observation Space
+Each `reset()` and `step()` returns a typed `SupportQueueObservation` containing:
+| Field | Meaning |
+| --- | --- |
+| `task_id`, `task_title`, `difficulty` | Active benchmark task metadata |
+| `instructions` | Task-specific operating guidance |
+| `current_index`, `total_tickets` | Episode progress |
+| `ticket` | Current customer ticket payload |
+| `allowed_priorities`, `allowed_queues`, `allowed_dispositions` | Valid discrete actions |
+| `scoring_weights` | Reward decomposition |
+| `last_feedback` | Previous grader output |
+| `reward`, `cumulative_reward`, `done` | Episode feedback |
+| `info` | Extra metadata such as `episode_id` |
+The ticket payload includes:
+- `ticket_id`
+- `subject`
+- `body`
+- `customer_tier`
+- `product_area`
+- `sla_hours`
+- `recent_events`
+## State Space
+`state()` returns a typed `SupportQueueState` with:
+- active task card
+- current cursor
+- cumulative and average reward
+- processed ticket ids
+- full action history
+- full per-ticket grading history
+## Tasks
+The benchmark includes three deterministic tasks with increasing difficulty.
+| Task ID | Difficulty | Tickets | Description |
+| --- | --- | ---: | --- |
+| `easy_inbox_cleanup` | Easy | 2 | Straightforward access and billing tickets |
+| `medium_sla_defense` | Medium | 3 | Mix of phishing escalation, webhook failure, and billing ambiguity |
+| `hard_exec_escalations` | Hard | 4 | Executive-pressure tickets spanning production, security, commercial, and retention workflows |
+## Reward Design
+Each processed ticket gets a reward in `[0.0, 1.0]`.
+Reward components:
+| Component | Weight |
+| --- | ---: |
+| Priority accuracy | `0.30` |
+| Queue accuracy | `0.25` |
+| Disposition accuracy | `0.20` |
+| Summary keyword coverage | `0.15` |
+| Response keyword coverage | `0.10` |
+| Unsafe reply penalty | `-0.10` |
+This gives useful partial progress signals. An agent can still earn reward for a good route or good reply even if one part of the triage decision is wrong.
+## API Surface
+The environment server exposes:
+- `POST /reset`
+- `POST /step`
+- `GET /state`
+- `GET /tasks`
+- `GET /health`
+- `GET /`
+Example reset payload:
+```json
+{
+  "task_id": "easy_inbox_cleanup"
+}
+```
+## Project Structure
+```text
+support_queue_env/
+  client.py
+  grading.py
+  models.py
+  tasks.py
+  server/
+    app.py
+    openenv_compat.py
+    support_queue_environment.py
+Dockerfile
+openenv.yaml
+inference.py
+```
+## Running Locally
+### Python
+```bash
+pip install -r requirements.txt
+uvicorn support_queue_env.server.app:app --host 0.0.0.0 --port 8000
+```
+### Docker
+```bash
+docker build -t support-queue-openenv .
+docker run --rm -p 8000:8000 support-queue-openenv
+```
+## Baseline Inference
+The required inference script is [inference.py](./inference.py).
+It:
+- uses the OpenAI Python client
+- reads `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`, and optional `LOCAL_IMAGE_NAME`
+- emits structured `[START]`, `[STEP]`, and `[END]` logs
+- writes `inference_results.json`
+Set environment variables:
+```bash
+API_BASE_URL=https://api.openai.com/v1
+MODEL_NAME=gpt-4o-mini
+HF_TOKEN=your_token
+LOCAL_IMAGE_NAME=
+```
+Then run:
+```bash
+python inference.py
+```
+## Baseline Scores
+Expected deterministic baseline scores from the bundled heuristic policy:
+| Task | Score |
+| --- | ---: |
+| `easy_inbox_cleanup` | `1.00` |
+| `medium_sla_defense` | `0.98` |
+| `hard_exec_escalations` | `0.97` |
+| Average | `0.98` |
+## Hugging Face Space
+This repository is configured for a **Docker Space**.
+- front matter in `README.md` sets `sdk: docker`
+- app serves on port `8000`
+- `GET /health` and `POST /reset` support deployment checks
+## OpenEnv Files
+Core submission files:
+- [openenv.yaml](./openenv.yaml)
+- [inference.py](./inference.py)
+- [Dockerfile](./Dockerfile)
+- [support_queue_env/models.py](./support_queue_env/models.py)
+- [support_queue_env/server/support_queue_environment.py](./support_queue_env/server/support_queue_environment.py)
+## Submission Checklist
+- typed action, observation, and state models included
+- `reset()`, `step()`, and `state()` implemented
+- three graded tasks included
+- reward bounded to `[0.0, 1.0]`
+- Dockerfile included
+- Hugging Face Docker Space compatible
+- root `inference.py` included

README.md ADDED Viewed

	@@ -0,0 +1,37 @@

+---
+title: Support Queue OpenEnv
+emoji: 🎫
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 8000
+---
+# Support Queue OpenEnv
+Real-world OpenEnv benchmark for SaaS support triage.
+## Quick Links
+- Full project documentation: [PROJECT.md](./PROJECT.md)
+- OpenEnv manifest: [openenv.yaml](./openenv.yaml)
+- Baseline runner: [inference.py](./inference.py)
+- Environment server: [support_queue_environment.py](./support_queue_env/server/support_queue_environment.py)
+## Quick Start
+```bash
+docker build -t support-queue-openenv .
+docker run --rm -p 8000:8000 support-queue-openenv
+```
+Then run:
+```bash
+python inference.py
+```
+## Notes
+- This repository is configured for a Hugging Face Docker Space.
+- The full environment description, tasks, reward design, and setup guide are in [PROJECT.md](./PROJECT.md).

inference.py ADDED Viewed

	@@ -0,0 +1,274 @@

+from __future__ import annotations
+import asyncio
+import json
+import os
+from typing import Any, List
+from openai import OpenAI
+from support_queue_env.client import SupportQueueEnv
+from support_queue_env.models import TaskCard, SupportQueueAction, SupportQueueObservation
+from support_queue_env.tasks import TASKS
+API_BASE_URL = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4o-mini")
+HF_TOKEN = os.getenv("HF_TOKEN")
+LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
+BENCHMARK = "support_queue_env"
+SUCCESS_SCORE_THRESHOLD = 0.80
+MAX_TOKENS = 250
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(step: int, action: str, reward: float, done: bool, error: str | None) -> None:
+    error_value = "none" if error is None else error.replace("\n", " ")
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.4f} done={str(done).lower()} error={error_value}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: list[float]) -> None:
+    print(
+        f"[END] success={str(success).lower()} steps={steps} score={score:.4f} rewards={json.dumps([round(r, 4) for r in rewards])}",
+        flush=True,
+    )
+def get_model_message(
+    client: OpenAI,
+    step: int,
+    observation: SupportQueueObservation,
+    last_reward: float,
+    history: List[str],
+) -> str:
+    prompt = (
+        "Return a short support-triage recommendation as JSON with fields priority, queue, disposition, summary, response. "
+        f"Step: {step}. Last reward: {last_reward:.4f}. History: {history[-4:]}. Observation: {observation.model_dump_json()}"
+    )
+    try:
+        completion = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=[
+                {"role": "system", "content": "You are assisting a support triage agent."},
+                {"role": "user", "content": prompt},
+            ],
+            temperature=0.0,
+            max_tokens=MAX_TOKENS,
+            stream=False,
+        )
+        text = (completion.choices[0].message.content or "").strip()
+        return text if text else "hello"
+    except Exception as exc:
+        print(f"[DEBUG] Model request failed: {exc}", flush=True)
+        return "hello"
+def available_tasks() -> list[TaskCard]:
+    return [
+        TaskCard(
+            task_id=task.task_id,
+            title=task.title,
+            difficulty=task.difficulty,
+            description=task.description,
+            ticket_count=len(task.tickets),
+        )
+        for task in TASKS
+    ]
+def heuristic_action(observation: SupportQueueObservation) -> SupportQueueAction:
+    text = " ".join(
+        [
+            observation.ticket.subject,
+            observation.ticket.body,
+            " ".join(observation.ticket.recent_events),
+            observation.task_title,
+        ]
+    ).lower()
+    if any(word in text for word in ["password reset", "account is locked", "locked out"]):
+        return SupportQueueAction(
+            priority="P3",
+            queue="technical",
+            disposition="respond",
+            summary="Customer account locked after password reset in the admin portal.",
+            response=(
+                "Thanks for reporting this. Please verify the account owner details and we will unlock the account and "
+                "confirm the next reset step for you."
+            ),
+            confidence=0.82,
+        )
+    if any(word in text for word in ["phishing", "credentials", "oauth", "unknown ip", "contractor", "security"]):
+        return SupportQueueAction(
+            priority="P1",
+            queue="security",
+            disposition="escalate",
+            summary="Security issue involving phishing, credentials, or unknown OAuth access.",
+            response=(
+                "Thanks for flagging this quickly. This is escalated to our security team now. Please do not click the message "
+                "again, revoke suspicious access where possible, and keep audit logs ready."
+            ),
+            confidence=0.9,
+        )
+    if any(word in text for word in ["502", "500", "webhook", "login", "blocked", "outage", "rollout"]):
+        priority = "P1" if any(word in text for word in ["all agents", "entire", "502", "blocked"]) else "P2"
+        return SupportQueueAction(
+            priority=priority,
+            queue="technical",
+            disposition="escalate",
+            summary="Technical incident affecting login, webhook delivery, or a recent rollout.",
+            response=(
+                "I am escalating this incident to engineering right away. Please keep example timestamps and logs handy while "
+                "we investigate the rollout behavior and urgent production impact."
+            ),
+            confidence=0.88,
+        )
+    if any(word in text for word in ["renewal", "discount", "cfo", "quote"]):
+        return SupportQueueAction(
+            priority="P2",
+            queue="success",
+            disposition="escalate",
+            summary="Renewal quote issue where the committed discount is blocking the CFO review.",
+            response=(
+                "I am escalating this to the account manager now. We will review the quote, confirm the discount commitment, "
+                "and share the escalated renewal update as soon as possible."
+            ),
+            confidence=0.83,
+        )
+    if any(word in text for word in ["cancel", "data export"]):
+        return SupportQueueAction(
+            priority="P3",
+            queue="success",
+            disposition="request_info",
+            summary="Customer wants cancellation and a data export after verification.",
+            response=(
+                "I can help with the export and cancellation flow. Please verify that you are the account owner and confirm "
+                "the workspace name so we can start the export safely."
+            ),
+            confidence=0.8,
+        )
+    if any(word in text for word in ["invoice", "charged", "billed", "refund", "billing"]):
+        unclear = any(word in text for word in ["maybe", "not fully sure", "thinks", "what details"])
+        return SupportQueueAction(
+            priority="P2" if any(word in text for word in ["charged twice", "double billed", "two identical charges"]) else "P3",
+            queue="billing",
+            disposition="request_info" if unclear else "respond",
+            summary=(
+                "Billing issue is unclear because only one invoice is visible today."
+                if unclear
+                else "Duplicate charge appears tied to a specific invoice in billing."
+            ),
+            response=(
+                "I can review this with billing. Please send the invoice number, charged amount, and the last four digits of "
+                "the payment method so we can compare the records."
+                if unclear
+                else "I am checking this with our billing team now. If this is a duplicate charge, we will investigate the invoice and share the refund update for you."
+            ),
+            confidence=0.84,
+        )
+    return SupportQueueAction(
+        priority="P3",
+        queue="technical",
+        disposition="respond",
+        summary="General product issue that needs standard technical follow-up.",
+        response="Thanks for the report. We will verify the issue and share the next reset or troubleshooting step.",
+        confidence=0.7,
+    )
+async def run_task(client: OpenAI, task: TaskCard) -> dict[str, Any]:
+    env = await SupportQueueEnv.from_docker_image(LOCAL_IMAGE_NAME)
+    history: List[str] = []
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+    log_start(task=task.task_id, env=BENCHMARK, model=MODEL_NAME)
+    try:
+        result = await env.reset(task_id=task.task_id)
+        last_reward = 0.0
+        for step in range(1, task.ticket_count + 1):
+            if result.done:
+                break
+            observation = result.observation
+            _ = get_model_message(client, step, observation, last_reward, history)
+            action = heuristic_action(observation)
+            result = await env.step(action)
+            reward = result.reward or 0.0
+            done = result.done
+            error = None
+            rewards.append(reward)
+            steps_taken = step
+            last_reward = reward
+            action_payload = json.dumps(action.model_dump(), separators=(",", ":"), sort_keys=True)
+            log_step(step=step, action=action_payload, reward=reward, done=done, error=error)
+            history.append(f"Step {step}: {action_payload} -> reward {reward:+.2f}")
+            if done:
+                break
+        score = sum(rewards) / len(rewards) if rewards else 0.0
+        score = min(max(score, 0.0), 1.0)
+        success = score >= SUCCESS_SCORE_THRESHOLD
+    finally:
+        try:
+            await env.close()
+        except Exception as exc:
+            print(f"[DEBUG] env.close() error (container cleanup): {exc}", flush=True)
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+    return {
+        "task_id": task.task_id,
+        "score": score,
+        "steps": steps_taken,
+        "rewards": rewards,
+        "success": success,
+    }
+async def main() -> None:
+    client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
+    results = []
+    for task in available_tasks():
+        results.append(await run_task(client, task))
+    aggregate = {
+        "benchmark": BENCHMARK,
+        "model": MODEL_NAME,
+        "average_score": round(sum(item["score"] for item in results) / len(results), 4) if results else 0.0,
+        "tasks": results,
+    }
+    with open("inference_results.json", "w", encoding="utf-8") as handle:
+        json.dump(aggregate, handle, indent=2)
+if __name__ == "__main__":
+    asyncio.run(main())

openenv.yaml ADDED Viewed

	@@ -0,0 +1,11 @@

+spec_version: 1
+name: support_queue_env
+version: "0.1.0"
+description: Deterministic SaaS support triage benchmark for OpenEnv.
+type: environment
+runtime: fastapi
+app: server.app:app
+port: 8000
+action: SupportQueueAction
+observation: SupportQueueObservation
+state: SupportQueueState

pyproject.toml ADDED Viewed

	@@ -0,0 +1,29 @@

+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-support-queue-env"
+version = "0.1.0"
+description = "Real-world OpenEnv benchmark for SaaS support queue triage."
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "fastapi>=0.115.0",
+    "openai>=1.55.0",
+    "openenv-core[core]>=0.2.2",
+    "pydantic>=2.8.0",
+    "requests>=2.32.0",
+    "uvicorn[standard]>=0.30.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+]
+[project.scripts]
+server = "support_queue_env.server.app:main"
+[tool.setuptools.packages.find]
+include = ["support_queue_env*"]

requirements.txt ADDED Viewed

	@@ -0,0 +1,7 @@

+fastapi>=0.115.0
+openai>=1.55.0
+openenv-core[core]>=0.2.2
+pydantic>=2.8.0
+requests>=2.32.0
+uvicorn[standard]>=0.30.0
+-e .

scripts/validate-submission.sh ADDED Viewed

	@@ -0,0 +1,25 @@

+#!/usr/bin/env bash
+set -euo pipefail
+PING_URL="${1:-http://127.0.0.1:8000}"
+REPO_DIR="${2:-.}"
+IMAGE_NAME="support-queue-openenv:local"
+echo "[1/4] Checking repo files"
+test -f "$REPO_DIR/openenv.yaml"
+test -f "$REPO_DIR/Dockerfile"
+test -f "$REPO_DIR/inference.py"
+echo "[2/4] Building Docker image"
+docker build -t "$IMAGE_NAME" "$REPO_DIR"
+echo "[3/4] Starting container"
+CID=$(docker run -d -p 8000:8000 "$IMAGE_NAME")
+trap 'docker rm -f "$CID" >/dev/null 2>&1 || true' EXIT
+sleep 5
+echo "[4/4] Pinging environment"
+curl -fsS "$PING_URL/health"
+curl -fsS -X POST "$PING_URL/reset" -H 'Content-Type: application/json' -d '{}'
+echo "Validation completed"

server/__init__.py ADDED Viewed

	@@ -0,0 +1,2 @@


1	+ """Root server package expected by some OpenEnv validators."""
2	+

server/app.py ADDED Viewed

	@@ -0,0 +1,24 @@

+"""Validator-friendly root app entrypoint."""
+from __future__ import annotations
+import os
+import uvicorn
+from support_queue_env.server.app import app
+__all__ = ["app", "main"]
+def main() -> None:
+    uvicorn.run(
+        "server.app:app",
+        host="0.0.0.0",
+        port=int(os.getenv("PORT", "8000")),
+        reload=False,
+    )
+if __name__ == "__main__":
+    main()

support_queue_env/__init__.py ADDED Viewed

	@@ -0,0 +1,23 @@

+"""Public package exports for the support queue OpenEnv environment."""
+from support_queue_env.client import SupportQueueEnv
+from support_queue_env.models import (
+    GradingBreakdown,
+    TaskCard,
+    TicketFeedback,
+    TicketSnapshot,
+    SupportQueueAction,
+    SupportQueueObservation,
+    SupportQueueState,
+)
+__all__ = [
+    "GradingBreakdown",
+    "SupportQueueAction",
+    "SupportQueueEnv",
+    "SupportQueueObservation",
+    "SupportQueueState",
+    "TaskCard",
+    "TicketFeedback",
+    "TicketSnapshot",
+]

support_queue_env/client.py ADDED Viewed

	@@ -0,0 +1,70 @@

+"""HTTP client for interacting with the support queue environment."""
+from __future__ import annotations
+import asyncio
+import os
+from typing import Any
+import requests
+from support_queue_env.models import TaskCard, SupportQueueAction, SupportQueueObservation, SupportQueueState
+DEFAULT_ENV_BASE_URL = os.getenv("ENV_BASE_URL", "http://127.0.0.1:8000")
+class _Result:
+    def __init__(self, payload: dict[str, Any]) -> None:
+        self.observation = SupportQueueObservation.model_validate(payload["observation"])
+        self.reward = float(payload.get("reward") or 0.0)
+        self.done = bool(payload.get("done"))
+class SupportQueueEnv:
+    def __init__(self, base_url: str) -> None:
+        self.base_url = base_url.rstrip("/")
+    @classmethod
+    def from_base_url(cls, base_url: str) -> "SupportQueueEnv":
+        return cls(base_url=base_url)
+    @classmethod
+    async def from_docker_image(cls, image_name: str | None = None) -> "SupportQueueEnv":
+        _ = image_name
+        return cls(base_url=DEFAULT_ENV_BASE_URL)
+    def list_tasks(self) -> list[TaskCard]:
+        response = requests.get(f"{self.base_url}/tasks", timeout=30)
+        response.raise_for_status()
+        payload = response.json()
+        return [TaskCard.model_validate(item) for item in payload["tasks"]]
+    async def alist_tasks(self) -> list[TaskCard]:
+        return await asyncio.to_thread(self.list_tasks)
+    def reset_sync(self, **kwargs: Any) -> _Result:
+        response = requests.post(f"{self.base_url}/reset", json=kwargs or {}, timeout=30)
+        response.raise_for_status()
+        return _Result(response.json())
+    async def reset(self, **kwargs: Any) -> _Result:
+        return await asyncio.to_thread(self.reset_sync, **kwargs)
+    def step_sync(self, action: SupportQueueAction) -> _Result:
+        response = requests.post(f"{self.base_url}/step", json=action.model_dump(), timeout=30)
+        response.raise_for_status()
+        return _Result(response.json())
+    async def step(self, action: SupportQueueAction) -> _Result:
+        return await asyncio.to_thread(self.step_sync, action)
+    def state_sync(self) -> SupportQueueState:
+        response = requests.get(f"{self.base_url}/state", timeout=30)
+        response.raise_for_status()
+        return SupportQueueState.model_validate(response.json())
+    async def state(self) -> SupportQueueState:
+        return await asyncio.to_thread(self.state_sync)
+    async def close(self) -> None:
+        return None

support_queue_env/grading.py ADDED Viewed

	@@ -0,0 +1,94 @@

+"""Deterministic reward shaping and grading utilities."""
+from __future__ import annotations
+import re
+from support_queue_env.models import GradingBreakdown, SupportQueueAction, TicketFeedback
+from support_queue_env.tasks import TicketSpec
+PRIORITY_ORDER = ["P1", "P2", "P3", "P4"]
+def _normalize(text: str) -> str:
+    return re.sub(r"\s+", " ", text.lower()).strip()
+def _contains_keywords(text: str, keywords: list[str]) -> int:
+    normalized = _normalize(text)
+    return sum(1 for keyword in keywords if keyword.lower() in normalized)
+def _priority_score(expected: str, predicted: str) -> float:
+    if expected == predicted:
+        return 0.30
+    try:
+        distance = abs(PRIORITY_ORDER.index(expected) - PRIORITY_ORDER.index(predicted))
+    except ValueError:
+        return 0.0
+    if distance == 1:
+        return 0.15
+    return 0.0
+def _queue_score(ticket: TicketSpec, predicted: str) -> float:
+    if predicted == ticket.expected_queue:
+        return 0.25
+    if predicted in ticket.acceptable_queues:
+        return 0.15
+    return 0.0
+def _disposition_score(ticket: TicketSpec, predicted: str) -> float:
+    if predicted == ticket.expected_disposition:
+        return 0.20
+    if predicted in ticket.acceptable_dispositions:
+        return 0.10
+    return 0.0
+def grade_ticket(ticket: TicketSpec, action: SupportQueueAction) -> TicketFeedback:
+    summary_hits = _contains_keywords(action.summary, ticket.summary_keywords)
+    response_hits = _contains_keywords(action.response, ticket.response_keywords)
+    penalty_hits = _contains_keywords(action.response, ticket.disallowed_keywords)
+    summary_score = 0.15 * (summary_hits / len(ticket.summary_keywords)) if ticket.summary_keywords else 0.15
+    response_score = 0.10 * (response_hits / len(ticket.response_keywords)) if ticket.response_keywords else 0.10
+    penalty = -0.10 if penalty_hits else 0.0
+    breakdown = GradingBreakdown(
+        priority_score=_priority_score(ticket.expected_priority, action.priority),
+        queue_score=_queue_score(ticket, action.queue),
+        disposition_score=_disposition_score(ticket, action.disposition),
+        summary_score=round(summary_score, 4),
+        response_score=round(response_score, 4),
+        penalty=penalty,
+    )
+    total = (
+        breakdown.priority_score
+        + breakdown.queue_score
+        + breakdown.disposition_score
+        + breakdown.summary_score
+        + breakdown.response_score
+        + breakdown.penalty
+    )
+    breakdown.total = round(max(0.0, min(1.0, total)), 4)
+    matched_summary = summary_hits if ticket.summary_keywords else 0
+    matched_response = response_hits if ticket.response_keywords else 0
+    feedback = (
+        f"priority={action.priority} target={ticket.expected_priority}; "
+        f"queue={action.queue} target={ticket.expected_queue}; "
+        f"disposition={action.disposition} target={ticket.expected_disposition}; "
+        f"summary_keywords={matched_summary}/{len(ticket.summary_keywords)}; "
+        f"response_keywords={matched_response}/{len(ticket.response_keywords)}"
+    )
+    return TicketFeedback(
+        ticket_id=ticket.ticket_id,
+        expected_priority=ticket.expected_priority,
+        expected_queue=ticket.expected_queue,
+        expected_disposition=ticket.expected_disposition,
+        breakdown=breakdown,
+        feedback=feedback,
+    )

support_queue_env/models.py ADDED Viewed

	@@ -0,0 +1,125 @@

+"""Typed models for the SaaS support triage benchmark."""
+from __future__ import annotations
+from typing import Any, Literal
+from pydantic import BaseModel, ConfigDict, Field
+try:
+    from openenv.core.env_server.types import Action as OpenEnvAction
+    from openenv.core.env_server.types import Observation as OpenEnvObservation
+except Exception:  # pragma: no cover - compatibility fallback
+    OpenEnvAction = BaseModel
+    OpenEnvObservation = BaseModel
+Priority = Literal["P1", "P2", "P3", "P4"]
+QueueName = Literal["billing", "security", "technical", "success", "trust_safety"]
+Disposition = Literal["respond", "request_info", "escalate", "close"]
+Difficulty = Literal["easy", "medium", "hard"]
+CustomerTier = Literal["starter", "growth", "enterprise"]
+class TaskCard(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+    task_id: str
+    title: str
+    difficulty: Difficulty
+    description: str
+    ticket_count: int
+class TicketSnapshot(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+    ticket_id: str
+    subject: str
+    body: str
+    customer_tier: CustomerTier
+    product_area: str
+    sla_hours: int
+    recent_events: list[str] = Field(default_factory=list)
+class SupportQueueAction(OpenEnvAction):
+    model_config = ConfigDict(extra="forbid")
+    priority: Priority
+    queue: QueueName
+    disposition: Disposition
+    summary: str = Field(..., min_length=8, max_length=280)
+    response: str = Field(..., min_length=16, max_length=1200)
+    confidence: float = Field(default=0.5, ge=0.0, le=1.0)
+class GradingBreakdown(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+    priority_score: float = 0.0
+    queue_score: float = 0.0
+    disposition_score: float = 0.0
+    summary_score: float = 0.0
+    response_score: float = 0.0
+    penalty: float = 0.0
+    total: float = 0.0
+class TicketFeedback(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+    ticket_id: str
+    expected_priority: Priority
+    expected_queue: QueueName
+    expected_disposition: Disposition
+    breakdown: GradingBreakdown
+    feedback: str
+class SupportQueueObservation(OpenEnvObservation):
+    model_config = ConfigDict(extra="forbid")
+    task_id: str
+    task_title: str
+    difficulty: Difficulty
+    instructions: str
+    current_index: int
+    total_tickets: int
+    ticket: TicketSnapshot
+    allowed_priorities: list[Priority] = Field(default_factory=lambda: ["P1", "P2", "P3", "P4"])
+    allowed_queues: list[QueueName] = Field(
+        default_factory=lambda: ["billing", "security", "technical", "success", "trust_safety"]
+    )
+    allowed_dispositions: list[Disposition] = Field(
+        default_factory=lambda: ["respond", "request_info", "escalate", "close"]
+    )
+    scoring_weights: dict[str, float] = Field(
+        default_factory=lambda: {
+            "priority": 0.30,
+            "queue": 0.25,
+            "disposition": 0.20,
+            "summary": 0.15,
+            "response": 0.10,
+        }
+    )
+    last_feedback: TicketFeedback | None = None
+    cumulative_reward: float = 0.0
+    reward: float = 0.0
+    done: bool = False
+    info: dict[str, Any] = Field(default_factory=dict)
+class SupportQueueState(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+    episode_id: str
+    task: TaskCard
+    current_index: int
+    total_tickets: int
+    done: bool
+    cumulative_reward: float
+    average_reward: float
+    ticket_scores: list[TicketFeedback] = Field(default_factory=list)
+    action_history: list[SupportQueueAction] = Field(default_factory=list)
+    processed_tickets: list[str] = Field(default_factory=list)

support_queue_env/server/__init__.py ADDED Viewed

	@@ -0,0 +1 @@


1	+ """Server package for the support queue environment."""

support_queue_env/server/app.py ADDED Viewed

	@@ -0,0 +1,60 @@

+"""FastAPI app entrypoint for local runs and Hugging Face Spaces."""
+from __future__ import annotations
+import os
+from fastapi import FastAPI
+try:
+    from openenv.core.env_server import create_app
+except Exception:  # pragma: no cover - compatibility fallback
+    from support_queue_env.server.openenv_compat import create_app
+from support_queue_env.models import SupportQueueAction, SupportQueueObservation
+from support_queue_env.server.support_queue_environment import SupportQueueEnvironment
+ENV_NAME = "support_queue_env"
+app: FastAPI = create_app(
+    SupportQueueEnvironment,
+    SupportQueueAction,
+    SupportQueueObservation,
+    env_name=ENV_NAME,
+    max_concurrent_envs=16,
+)
+@app.get("/")
+def root() -> dict[str, object]:
+    return {
+        "name": ENV_NAME,
+        "status": "ok",
+        "message": "Support Queue OpenEnv is running.",
+        "endpoints": ["/health", "/reset", "/step", "/state", "/tasks"],
+    }
+@app.get("/health")
+def health() -> dict[str, str]:
+    return {"status": "ok"}
+@app.get("/tasks")
+def list_tasks() -> dict[str, object]:
+    return {"tasks": [task.model_dump() for task in SupportQueueEnvironment.available_tasks()]}
+def main() -> None:
+    import uvicorn
+    uvicorn.run(
+        "support_queue_env.server.app:app",
+        host="0.0.0.0",
+        port=int(os.getenv("PORT", "8000")),
+        reload=False,
+    )
+if __name__ == "__main__":
+    main()

support_queue_env/server/openenv_compat.py ADDED Viewed

	@@ -0,0 +1,91 @@

+"""Small FastAPI compatibility layer used when openenv-core is unavailable."""
+from __future__ import annotations
+from typing import Any, Generic, TypeVar
+from fastapi import Body, FastAPI
+from pydantic import BaseModel
+ActT = TypeVar("ActT", bound=BaseModel)
+ObsT = TypeVar("ObsT", bound=BaseModel)
+StateT = TypeVar("StateT", bound=BaseModel)
+class Environment(Generic[ActT, ObsT, StateT]):
+    SUPPORTS_CONCURRENT_SESSIONS = False
+    def reset(self, **kwargs: Any) -> ObsT:
+        raise NotImplementedError
+    def step(self, action: ActT) -> ObsT:
+        raise NotImplementedError
+    def state(self) -> StateT:
+        raise NotImplementedError
+def create_app(
+    environment_cls: type[Environment[ActT, ObsT, StateT]],
+    action_model: type[ActT],
+    observation_model: type[ObsT],
+    env_name: str,
+    **_: Any,
+) -> FastAPI:
+    app = FastAPI(title=env_name)
+    app.state.environment = environment_cls()
+    @app.get("/")
+    def root() -> dict[str, Any]:
+        return {
+            "name": env_name,
+            "status": "ok",
+            "endpoints": ["/health", "/reset", "/step", "/state", "/tasks", "/metadata", "/schema"],
+        }
+    @app.get("/health")
+    def health() -> dict[str, str]:
+        return {"status": "ok"}
+    @app.get("/metadata")
+    def metadata() -> dict[str, Any]:
+        return {
+            "name": env_name,
+            "supports_state": True,
+            "supports_tasks": True,
+            "transport": "http",
+        }
+    @app.get("/schema")
+    def schema() -> dict[str, Any]:
+        return {
+            "action": action_model.model_json_schema(),
+            "observation": observation_model.model_json_schema(),
+        }
+    @app.post("/reset")
+    def reset(payload: dict[str, Any] | None = Body(default=None)) -> dict[str, Any]:
+        observation = app.state.environment.reset(**(payload or {}))
+        data = observation.model_dump()
+        return {
+            "observation": data,
+            "reward": float(data.get("reward") or 0.0),
+            "done": bool(data.get("done", False)),
+        }
+    @app.post("/step")
+    def step(payload: dict[str, Any]) -> dict[str, Any]:
+        action = action_model.model_validate(payload)
+        observation = app.state.environment.step(action)
+        data = observation.model_dump()
+        return {
+            "observation": data,
+            "reward": float(data.get("reward") or 0.0),
+            "done": bool(data.get("done", False)),
+        }
+    @app.get("/state")
+    def state() -> dict[str, Any]:
+        return app.state.environment.state().model_dump()
+    return app

support_queue_env/server/support_queue_environment.py ADDED Viewed

	@@ -0,0 +1,157 @@

+"""Environment implementation for SaaS support queue triage."""
+from __future__ import annotations
+from itertools import cycle
+from threading import Lock
+from typing import Any
+from uuid import uuid4
+try:
+    from openenv.core.env_server import Environment
+except Exception:  # pragma: no cover - compatibility fallback
+    from support_queue_env.server.openenv_compat import Environment
+from support_queue_env.grading import grade_ticket
+from support_queue_env.models import TaskCard, TicketSnapshot, SupportQueueAction, SupportQueueObservation, SupportQueueState
+from support_queue_env.tasks import TASK_INDEX, TASKS, TaskSpec
+class SupportQueueEnvironment(Environment[SupportQueueAction, SupportQueueObservation, SupportQueueState]):
+    SUPPORTS_CONCURRENT_SESSIONS = True
+    _task_cycle = cycle(task.task_id for task in TASKS)
+    _cycle_lock = Lock()
+    def __init__(self) -> None:
+        self.episode_id = ""
+        self.task: TaskSpec = TASKS[0]
+        self.current_index = 0
+        self.cumulative_reward = 0.0
+        self.ticket_scores = []
+        self.action_history = []
+        self.processed_tickets = []
+        self.done = False
+    @classmethod
+    def available_tasks(cls) -> list[TaskCard]:
+        return [
+            TaskCard(
+                task_id=task.task_id,
+                title=task.title,
+                difficulty=task.difficulty,
+                description=task.description,
+                ticket_count=len(task.tickets),
+            )
+            for task in TASKS
+        ]
+    @classmethod
+    def next_default_task_id(cls) -> str:
+        with cls._cycle_lock:
+            return next(cls._task_cycle)
+    def reset(self, task_id: str | None = None, **_: Any) -> SupportQueueObservation:
+        selected_task_id = task_id or self.next_default_task_id()
+        self.task = TASK_INDEX.get(selected_task_id, TASKS[0])
+        self.episode_id = str(uuid4())
+        self.current_index = 0
+        self.cumulative_reward = 0.0
+        self.ticket_scores = []
+        self.action_history = []
+        self.processed_tickets = []
+        self.done = False
+        return self._build_observation(reward=0.0, done=False, feedback=None)
+    def step(self, action: SupportQueueAction) -> SupportQueueObservation:
+        if self.done:
+            return self._terminal_observation("Episode already finished. Call reset() to start a new task.")
+        ticket = self.task.tickets[self.current_index]
+        feedback = grade_ticket(ticket, action)
+        self.action_history.append(action)
+        self.ticket_scores.append(feedback)
+        self.processed_tickets.append(ticket.ticket_id)
+        self.cumulative_reward = round(self.cumulative_reward + feedback.breakdown.total, 4)
+        self.current_index += 1
+        self.done = self.current_index >= len(self.task.tickets)
+        if self.done:
+            return self._terminal_observation(feedback.feedback, reward=feedback.breakdown.total, feedback=feedback)
+        return self._build_observation(reward=feedback.breakdown.total, done=False, feedback=feedback)
+    def state(self) -> SupportQueueState:
+        average_reward = self.cumulative_reward / len(self.ticket_scores) if self.ticket_scores else 0.0
+        return SupportQueueState(
+            episode_id=self.episode_id or "not-started",
+            task=TaskCard(
+                task_id=self.task.task_id,
+                title=self.task.title,
+                difficulty=self.task.difficulty,
+                description=self.task.description,
+                ticket_count=len(self.task.tickets),
+            ),
+            current_index=self.current_index,
+            total_tickets=len(self.task.tickets),
+            done=self.done,
+            cumulative_reward=round(self.cumulative_reward, 4),
+            average_reward=round(average_reward, 4),
+            ticket_scores=self.ticket_scores,
+            action_history=self.action_history,
+            processed_tickets=self.processed_tickets,
+        )
+    def _current_ticket(self) -> TicketSnapshot:
+        ticket = self.task.tickets[min(self.current_index, len(self.task.tickets) - 1)]
+        return TicketSnapshot(
+            ticket_id=ticket.ticket_id,
+            subject=ticket.subject,
+            body=ticket.body,
+            customer_tier=ticket.customer_tier,
+            product_area=ticket.product_area,
+            sla_hours=ticket.sla_hours,
+            recent_events=ticket.recent_events,
+        )
+    def _build_observation(self, reward: float, done: bool, feedback) -> SupportQueueObservation:
+        average_reward = self.cumulative_reward / len(self.ticket_scores) if self.ticket_scores else 0.0
+        return SupportQueueObservation(
+            task_id=self.task.task_id,
+            task_title=self.task.title,
+            difficulty=self.task.difficulty,
+            instructions=self.task.instructions,
+            current_index=self.current_index + 1,
+            total_tickets=len(self.task.tickets),
+            ticket=self._current_ticket(),
+            last_feedback=feedback,
+            cumulative_reward=round(self.cumulative_reward, 4),
+            reward=round(reward, 4),
+            done=done,
+            info={
+                "episode_id": self.episode_id,
+                "processed_tickets": list(self.processed_tickets),
+                "average_reward": round(average_reward, 4),
+            },
+        )
+    def _terminal_observation(self, message: str, reward: float = 0.0, feedback=None) -> SupportQueueObservation:
+        placeholder_ticket = self._current_ticket()
+        return SupportQueueObservation(
+            task_id=self.task.task_id,
+            task_title=self.task.title,
+            difficulty=self.task.difficulty,
+            instructions=f"{self.task.instructions} Episode complete.",
+            current_index=len(self.task.tickets),
+            total_tickets=len(self.task.tickets),
+            ticket=placeholder_ticket,
+            last_feedback=feedback,
+            cumulative_reward=round(self.cumulative_reward, 4),
+            reward=round(reward, 4),
+            done=True,
+            info={
+                "episode_id": self.episode_id,
+                "processed_tickets": list(self.processed_tickets),
+                "message": message,
+            },
+        )

support_queue_env/tasks.py ADDED Viewed

	@@ -0,0 +1,249 @@

+"""Deterministic task catalog for the support triage environment."""
+from __future__ import annotations
+from pydantic import BaseModel, ConfigDict, Field
+from support_queue_env.models import Difficulty, Disposition, Priority, QueueName
+class TicketSpec(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+    ticket_id: str
+    subject: str
+    body: str
+    customer_tier: str
+    product_area: str
+    sla_hours: int
+    recent_events: list[str] = Field(default_factory=list)
+    expected_priority: Priority
+    expected_queue: QueueName
+    expected_disposition: Disposition
+    acceptable_queues: list[QueueName] = Field(default_factory=list)
+    acceptable_dispositions: list[Disposition] = Field(default_factory=list)
+    summary_keywords: list[str] = Field(default_factory=list)
+    response_keywords: list[str] = Field(default_factory=list)
+    disallowed_keywords: list[str] = Field(default_factory=list)
+class TaskSpec(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+    task_id: str
+    title: str
+    difficulty: Difficulty
+    description: str
+    instructions: str
+    tickets: list[TicketSpec]
+TASKS: list[TaskSpec] = [
+    TaskSpec(
+        task_id="easy_inbox_cleanup",
+        title="Inbox Cleanup",
+        difficulty="easy",
+        description="Two straightforward tickets covering access and billing triage.",
+        instructions=(
+            "You are a SaaS support triage agent. For each ticket, choose priority, routing queue, "
+            "and the next best disposition. Write a short internal summary plus the first reply you "
+            "would send to the customer."
+        ),
+        tickets=[
+            TicketSpec(
+                ticket_id="E-101",
+                subject="Locked out after password reset",
+                body=(
+                    "I reset my password this morning and now the admin portal says my account is locked. "
+                    "We need to finish payroll before noon."
+                ),
+                customer_tier="starter",
+                product_area="auth",
+                sla_hours=24,
+                recent_events=["Password reset completed 2 hours ago", "No prior incidents on status page"],
+                expected_priority="P3",
+                expected_queue="technical",
+                expected_disposition="respond",
+                summary_keywords=["account", "locked", "password"],
+                response_keywords=["verify", "unlock", "reset"],
+                disallowed_keywords=["refund", "breach"],
+            ),
+            TicketSpec(
+                ticket_id="E-102",
+                subject="We were charged twice for March",
+                body=(
+                    "Our card shows two identical charges from your company for invoice INV-4481. "
+                    "Please confirm whether one of them will be refunded."
+                ),
+                customer_tier="growth",
+                product_area="billing",
+                sla_hours=8,
+                recent_events=["Invoice INV-4481 marked paid yesterday"],
+                expected_priority="P2",
+                expected_queue="billing",
+                expected_disposition="respond",
+                summary_keywords=["duplicate", "charge", "invoice"],
+                response_keywords=["refund", "investigate", "billing"],
+                disallowed_keywords=["ignore", "security incident"],
+            ),
+        ],
+    ),
+    TaskSpec(
+        task_id="medium_sla_defense",
+        title="SLA Defense",
+        difficulty="medium",
+        description="Three tickets that mix urgent escalation with an ambiguity check.",
+        instructions=(
+            "Prioritize by customer impact and risk. Security events and broad service degradation should "
+            "be escalated immediately. If the customer has not given enough evidence to act safely, ask for "
+            "the minimum details needed to proceed."
+        ),
+        tickets=[
+            TicketSpec(
+                ticket_id="M-201",
+                subject="Suspicious email asking admins to re-enter credentials",
+                body=(
+                    "Several admins received an email that looks like your login page and asks us to "
+                    "re-authenticate. One teammate clicked it but says they closed the tab before typing anything."
+                ),
+                customer_tier="enterprise",
+                product_area="security",
+                sla_hours=1,
+                recent_events=["Customer SSO is enabled", "No status page incident posted"],
+                expected_priority="P1",
+                expected_queue="security",
+                expected_disposition="escalate",
+                summary_keywords=["phishing", "credentials", "admins"],
+                response_keywords=["security", "escalated", "do not click"],
+                disallowed_keywords=["send password", "share secrets"],
+            ),
+            TicketSpec(
+                ticket_id="M-202",
+                subject="Webhook deliveries are failing after yesterday's rollout",
+                body=(
+                    "Every webhook call since 06:15 UTC has returned HTTP 500. This is blocking our downstream "
+                    "fulfillment pipeline. Can you investigate urgently?"
+                ),
+                customer_tier="growth",
+                product_area="integrations",
+                sla_hours=4,
+                recent_events=["Customer is on API version 2025-11", "Platform release went out last night"],
+                expected_priority="P2",
+                expected_queue="technical",
+                expected_disposition="escalate",
+                acceptable_queues=["success"],
+                summary_keywords=["webhook", "500", "rollout"],
+                response_keywords=["engineering", "logs", "investigate"],
+                disallowed_keywords=["duplicate charge", "unsubscribe"],
+            ),
+            TicketSpec(
+                ticket_id="M-203",
+                subject="Maybe double charged? Not fully sure",
+                body=(
+                    "My finance teammate thinks we were double billed, but I can only find one invoice in the portal. "
+                    "Could you explain what happened and what details you need from me?"
+                ),
+                customer_tier="growth",
+                product_area="billing",
+                sla_hours=12,
+                recent_events=["One paid invoice visible in portal", "No payment failures recorded"],
+                expected_priority="P3",
+                expected_queue="billing",
+                expected_disposition="request_info",
+                acceptable_dispositions=["respond"],
+                summary_keywords=["billing", "unclear", "invoice"],
+                response_keywords=["invoice", "last four", "amount"],
+                disallowed_keywords=["breach", "status page"],
+            ),
+        ],
+    ),
+    TaskSpec(
+        task_id="hard_exec_escalations",
+        title="Executive Escalations",
+        difficulty="hard",
+        description="Four high-stakes tickets that require precise triage under pressure.",
+        instructions=(
+            "You are covering an executive escalation queue during a busy incident window. Optimize for "
+            "business continuity, account safety, and clean handoffs. Use P1 only for severe production or "
+            "security impact. Ask for more detail only when it materially changes the next safe action."
+        ),
+        tickets=[
+            TicketSpec(
+                ticket_id="H-301",
+                subject="All agents see 502 during login",
+                body=(
+                    "Our entire support floor is blocked from logging in. Every browser gets a 502 after "
+                    "submitting the sign-in form. The public status page still says operational."
+                ),
+                customer_tier="enterprise",
+                product_area="auth",
+                sla_hours=1,
+                recent_events=["50+ seats on account", "Issue started 18 minutes ago"],
+                expected_priority="P1",
+                expected_queue="technical",
+                expected_disposition="escalate",
+                summary_keywords=["login", "502", "all agents"],
+                response_keywords=["incident", "engineering", "urgent"],
+                disallowed_keywords=["refund only", "close ticket"],
+            ),
+            TicketSpec(
+                ticket_id="H-302",
+                subject="Unknown OAuth app connected after employee departure",
+                body=(
+                    "An OAuth app named 'SyncFast' appeared in our workspace this morning from an IP we don't recognize. "
+                    "The only recent account change is that one contractor left yesterday."
+                ),
+                customer_tier="enterprise",
+                product_area="security",
+                sla_hours=1,
+                recent_events=["Customer has audit logs enabled", "Former contractor account was deactivated yesterday"],
+                expected_priority="P1",
+                expected_queue="security",
+                expected_disposition="escalate",
+                summary_keywords=["oauth", "unknown", "contractor"],
+                response_keywords=["security", "revoke", "escalated"],
+                disallowed_keywords=["share api key", "ignore"],
+            ),
+            TicketSpec(
+                ticket_id="H-303",
+                subject="Renewal quote lost our committed discount",
+                body=(
+                    "Our renewal quote is missing the 18% discount your sales team committed in writing. "
+                    "Our CFO will freeze procurement tomorrow if this isn't corrected."
+                ),
+                customer_tier="enterprise",
+                product_area="commercial",
+                sla_hours=6,
+                recent_events=["Renewal date in 2 days", "Account owner is on PTO"],
+                expected_priority="P2",
+                expected_queue="success",
+                expected_disposition="escalate",
+                acceptable_queues=["billing"],
+                summary_keywords=["renewal", "discount", "cfo"],
+                response_keywords=["account manager", "quote", "escalated"],
+                disallowed_keywords=["security breach", "reset password"],
+            ),
+            TicketSpec(
+                ticket_id="H-304",
+                subject="Need cancellation plus data export",
+                body=(
+                    "We're planning to cancel next month for budget reasons, but first I need a data export for "
+                    "our records. Please tell me exactly what you need from me to start."
+                ),
+                customer_tier="starter",
+                product_area="retention",
+                sla_hours=24,
+                recent_events=["No open invoices", "Account is owner-managed"],
+                expected_priority="P3",
+                expected_queue="success",
+                expected_disposition="request_info",
+                acceptable_queues=["billing"],
+                summary_keywords=["cancel", "data export", "verification"],
+                response_keywords=["verify", "export", "owner"],
+                disallowed_keywords=["breach", "status page"],
+            ),
+        ],
+    ),
+]
+TASK_INDEX = {task.task_id: task for task in TASKS}

uv.lock ADDED Viewed

The diff for this file is too large to render. See raw diff