Sayed223 committed · Commit bc56641 (verified) · 1 parent: 3e6d302

Delete README.md

Files changed (1): README.md (+0 −263)

The deleted README.md contained:
# CustomerSupportEnv

> An OpenEnv-compatible reinforcement learning environment for training and evaluating AI customer support agents.

[![OpenEnv](https://img.shields.io/badge/OpenEnv-1.0.0-blue)](openenv.yaml)
[![HF Spaces](https://img.shields.io/badge/HuggingFace-Spaces-yellow)](https://huggingface.co/spaces)
[![Docker](https://img.shields.io/badge/Docker-ready-brightgreen)](Dockerfile)

---

## Overview

**CustomerSupportEnv** simulates a real-world Tier-1 customer support workflow. An agent handles inbound support tickets by searching a knowledge base, empathising with customers, asking clarifying questions, and delivering concrete solutions — all within a multi-turn conversation.

This environment is designed for:
- Training RL agents on real-world NLP tasks
- Benchmarking LLM-based tool use and retrieval-augmented reasoning
- Evaluating customer-satisfaction optimisation policies

---
## Quick Start

### Docker (recommended)
```bash
git clone https://huggingface.co/spaces/<your-username>/customer-support-env
cd customer-support-env
docker build -t customer-support-env .
docker run -p 7860:7860 customer-support-env
```

### Local
```bash
pip install -r requirements.txt
uvicorn server:app --host 0.0.0.0 --port 7860
```

### Run baseline inference
```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=sk-...
python inference.py
```

---
## Environment Description

Each **episode** = one customer support ticket. The agent takes a sequence of actions (turns) until it calls `resolve()` or exceeds `max_turns`.

### Real-world fidelity
- Tickets span 5 categories: **auth**, **billing**, **fulfillment**, **bug**, **sales**
- Customers have dynamic sentiment: **positive / neutral / frustrated / angry**
- Knowledge base retrieval is gated — the agent must explicitly call `search_kb`
- Conversation history accumulates across turns, mirroring real support tooling
- CSAT (customer satisfaction) is a synthetic secondary objective

---
## OpenEnv API

### `POST /reset`
```json
{ "task_id": "task_1" }
```
Returns an `Observation`. Initialises a fresh episode.

### `POST /step`
```json
{ "task_id": "task_1", "action_type": "search_kb", "payload": null }
```
Returns a `StepResult` containing `observation`, `reward`, `done`, `info`.

### `GET /state?task_id=task_1`
Returns the current `Observation` without advancing the environment.

### `POST /grade`
```json
{ "task_id": "task_1" }
```
Returns a `GraderResult` with score (0.0–1.0), breakdown, and pass/fail.

### `GET /tasks`
Lists all task specs.

### `GET /health`
Returns `{"status": "ok"}`.
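The endpoints above can be exercised end to end with a small stdlib-only client. This is a minimal sketch, assuming the server is running locally as in the Quick Start; the scripted action sequence and the `None` payloads are illustrative (a real `offer_solution` call would carry solution text):

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed: server started as in the Quick Start


def step_body(task_id, action_type, payload=None):
    """Build the JSON body expected by POST /step."""
    return {"task_id": task_id, "action_type": action_type, "payload": payload}


def post(path, body):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_episode(task_id="task_1"):
    """Reset, act until the episode ends, then grade."""
    post("/reset", {"task_id": task_id})
    for action in ("search_kb", "empathize", "offer_solution", "resolve"):
        result = post("/step", step_body(task_id, action))
        if result["done"]:
            break
    return post("/grade", {"task_id": task_id})


# Example (requires the server running):
#   report = run_episode("task_1")
```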
---

## Observation Space

| Field | Type | Description |
|-------|------|-------------|
| `ticket_id` | string | Ticket identifier (e.g. `TKT-001`) |
| `task_id` | string | Active task (`task_1` / `task_2` / `task_3`) |
| `status` | enum | `idle` \| `open` \| `resolved` \| `escalated` \| `timeout` |
| `sentiment` | enum | `positive` \| `neutral` \| `frustrated` \| `angry` |
| `priority` | enum | `low` \| `medium` \| `high` \| `urgent` |
| `category` | enum | `auth` \| `billing` \| `fulfillment` \| `bug` \| `sales` |
| `turn` | int | Current turn number |
| `max_turns` | int | Maximum turns before timeout |
| `history` | Message[] | Full conversation: `{role, text, turn}` |
| `kb_results` | string[] | KB articles retrieved (empty until `search_kb` called) |
| `kb_searched` | bool | Whether the KB has been consulted |
| `empathized` | bool | Whether the agent expressed empathy |
| `clarified` | bool | Whether the agent asked a clarifying question |
| `solution_offered` | bool | Whether a solution has been offered |
| `escalated` | bool | Whether the ticket was escalated |
| `cumulative_reward` | float | Running total reward |
| `done` | bool | Episode termination flag |
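The repository's `env/models.py` defines these as typed Pydantic models; as a rough illustration of the shape only (field names from the table above; the defaults and role values are assumptions), a plain-dataclass equivalent might look like:

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str   # e.g. "customer" | "agent" (role values assumed, not specified above)
    text: str
    turn: int


@dataclass
class Observation:
    ticket_id: str
    task_id: str
    status: str = "open"          # idle | open | resolved | escalated | timeout
    sentiment: str = "neutral"    # positive | neutral | frustrated | angry
    priority: str = "medium"      # low | medium | high | urgent
    category: str = "auth"        # auth | billing | fulfillment | bug | sales
    turn: int = 0
    max_turns: int = 8
    history: list = field(default_factory=list)      # list of Message
    kb_results: list = field(default_factory=list)   # list of str
    kb_searched: bool = False
    empathized: bool = False
    clarified: bool = False
    solution_offered: bool = False
    escalated: bool = False
    cumulative_reward: float = 0.0
    done: bool = False
```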
---

## Action Space

| Action | Payload | Reward | Notes |
|--------|---------|--------|-------|
| `search_kb` | — | **+2.0** | Retrieves KB articles for this ticket's category. Penalty −1.0 on duplicate. |
| `empathize` | — | **+1.0** | Acknowledges customer frustration. Zero reward on repeat. |
| `ask_clarify` | question text | **+1.0** | Requests more detail. Zero reward on repeat. |
| `offer_solution` | solution text | **+3.0 × quality** | Solution is scored against expected keywords. Penalty −1.0 if KB not searched first. |
| `escalate` | — | **−1.0** | Transfers to tier-2. Penalised to incentivise in-tier resolution. |
| `resolve` | — | **+5.0 + CSAT×2** | Ends episode. Penalty −3.0 if no solution offered. |
| `send_message` | message text | **+0.5** | Generic message. Useful for multi-turn clarification. |

### Reward decomposition
Every `Reward` object includes:
- `total` — net step reward
- `process_score` — correct action sequencing (0–1)
- `quality_score` — solution quality (0–1)
- `efficiency_score` — steps taken vs. optimal (0–1)
- `csat_score` — synthetic customer satisfaction (0–1)
- `penalties` — total penalties this step
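As an illustration of how the table's arithmetic composes, the following sketch computes the step reward for `offer_solution` and `resolve` from the rules above. The function names and the flag-based interface are hypothetical, not the environment's actual API:

```python
def offer_solution_reward(quality, kb_searched):
    """+3.0 × quality, with a −1.0 penalty if the KB was not consulted first."""
    reward = 3.0 * quality
    if not kb_searched:
        reward -= 1.0
    return reward


def resolve_reward(csat, solution_offered):
    """+5.0 + CSAT×2 on resolve, with a −3.0 penalty if no solution was offered."""
    reward = 5.0 + 2.0 * csat
    if not solution_offered:
        reward -= 3.0
    return reward
```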
---

## Tasks

### Task 1 — Easy: Resolve a Standard Auth Ticket
- **Ticket**: TKT-001 (account lockout, frustrated customer)
- **Max turns**: 8
- **Optimal policy**: `search_kb → empathize → offer_solution → resolve`
- **Max reward**: ~11.0
- **Grader weights**: KB searched (0.30), empathy (0.25), solution quality (0.25), resolved (0.20)

### Task 2 — Medium: Handle a Billing Dispute
- **Ticket**: TKT-003 (wrong invoice amount after plan downgrade)
- **Max turns**: 10
- **Optimal policy**: `search_kb → ask_clarify → empathize → offer_solution → resolve`
- **Challenge**: Generic solutions are penalised; the agent must cite a specific dollar credit.
- **Grader weights**: clarify (0.20), KB (0.20), solution quality (0.30), empathy (0.15), resolved (0.15)

### Task 3 — Hard: Triage a Critical Time-Sensitive Bug
- **Ticket**: TKT-006 (data export stuck, compliance deadline tomorrow)
- **Max turns**: 8
- **Optimal policy**: `search_kb → empathize → ask_clarify → offer_solution → resolve`
- **Challenge**: A two-part solution is required (priority queue + partial export). Escalation is capped. Scoring requires urgency awareness.
- **Grader weights**: KB (0.20), empathy (0.15), two-part solution (0.35), no escalation (0.15), resolved (0.15)
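Task 1's optimal policy can be written as a tiny rule-based agent that picks the next action from the observation flags. A minimal sketch, assuming the observation is a dict carrying the boolean flags from the Observation Space:

```python
def task1_policy(obs):
    """Scripted Task 1 policy: search_kb → empathize → offer_solution → resolve."""
    if not obs["kb_searched"]:
        return "search_kb"
    if not obs["empathized"]:
        return "empathize"
    if not obs["solution_offered"]:
        return "offer_solution"
    return "resolve"
```

Followed from a fresh observation, this yields exactly the four-step optimal sequence for Task 1.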
---

## Reward Function Design

The reward function encodes three business objectives simultaneously:

1. **Resolution quality** — the `offer_solution` reward scales with the solution quality score (keyword matching against the canonical solution). This forces the agent to consult the KB before improvising.

2. **Process compliance** — action sequencing is rewarded and penalised: searching the KB first, empathising with high-sentiment customers, clarifying ambiguities before offering solutions.

3. **Customer experience** — the CSAT bonus on `resolve` (up to +2.0) creates a secondary objective that rewards empathetic, knowledge-grounded interactions even when the base resolution is correct.

### Shaped vs. sparse
The reward is **dense** — every action produces a signal, so the agent never needs to reach `resolve` to receive a useful gradient. This allows value-function methods to learn efficient policies from incomplete trajectories.
---

## Grader Specification

All graders are **deterministic**: identical observations produce identical scores.

- Scores are in `[0.0, 1.0]`
- Each grader inspects the final `Observation`: the flags (`kb_searched`, `empathized`, `clarified`, `solution_offered`, `escalated`, `status`) and the conversation `history`
- Solution quality is measured by keyword presence in the agent's turn text
- **Pass threshold**: ≥ 0.70 on all tasks
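The keyword-presence check can be sketched as a simple ratio of expected keywords found in the agent's text. This is an illustrative stand-in, not the repository's actual grader, and the keyword lists below are hypothetical:

```python
def keyword_quality(agent_text, expected_keywords):
    """Fraction of expected keywords present (case-insensitive) in the agent's text."""
    text = agent_text.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0
```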
---

## Baseline Scores

| Task | Difficulty | Model | Grader Score | Passed |
|------|-----------|-------|--------------|--------|
| task_1 | easy | gpt-4o-mini | 0.85 | ✓ |
| task_2 | medium | gpt-4o-mini | 0.78 | ✓ |
| task_3 | hard | gpt-4o-mini | 0.65 | — |
| **avg** | | | **0.76** | |

---
## Project Structure

```
customer_support_env/
├── server.py          # FastAPI app — /reset, /step, /state, /grade
├── inference.py       # Baseline inference script (OpenAI client)
├── openenv.yaml       # OpenEnv spec file
├── requirements.txt
├── Dockerfile
├── README.md
├── env/
│   ├── __init__.py
│   ├── models.py      # Typed Pydantic models: Observation, Action, Reward
│   ├── environment.py # Core CustomerSupportEnv class
│   └── tickets.py     # Ticket scenario database (6 tickets, KB articles)
├── graders/
│   ├── __init__.py
│   └── graders.py     # Programmatic graders for all 3 tasks
└── tests/
    ├── __init__.py
    └── test_env.py    # 25 unit tests
```

---
## Running Tests

```bash
pytest tests/ -v
```

Or without pytest:
```bash
python -m tests.test_env
```

---
## Hugging Face Space Configuration

Add the following to the top of `README.md` for HF Spaces auto-detection:

```yaml
---
title: CustomerSupportEnv
emoji: 🎧
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - customer-support
  - nlp
---
```

---
## License

MIT