DevanshuDon committed
Commit 722231e · verified · 1 Parent(s): 5edce72

Upload 8 files

Files changed (8):
  1. Dockerfile +16 -0
  2. README.md +156 -0
  3. __init__.py +5 -0
  4. client.py +30 -0
  5. inference.py +276 -0
  6. openenv.yaml +66 -0
  7. pyproject.toml +26 -0
  8. requirements.txt +7 -0
Dockerfile ADDED
@@ -0,0 +1,16 @@
FROM python:3.9-slim

WORKDIR /app

# Copy requirements first for caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy all source code
COPY . .

# Expose port
EXPOSE 7860

# Run the server
CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,156 @@
# ExecAssist — Executive Assistant Environment

An OpenEnv environment where AI agents learn to manage email and calendar for busy executives.

## Problem Statement

Every executive assistant juggles email, calendars, and scheduling conflicts daily. This environment simulates that exact challenge: read incoming requests, draft professional replies, book meetings, and resolve conflicts intelligently.

**Theme:** #3.2 - World Modeling (Personalized Tasks)

## Tasks

### Task 1: Easy — Simple Meeting Request
- **Challenge:** Single email with clear calendar availability
- **Agent must:** Draft polite reply + book meeting in open slot
- **Score:** 50% email quality + 50% scheduling correctness

### Task 2: Medium — Scheduling Conflict
- **Challenge:** Requested time is already booked
- **Agent must:** Identify conflict + propose 2-3 alternatives + explain professionally
- **Score:** 30% email quality + 40% conflict resolution + 30% scheduling

### Task 3: Hard — Multi-Party Coordination
- **Challenge:** 3 emails requesting meetings, some overlapping, priority conflicts
- **Agent must:** Prioritize + reschedule + notify all parties
- **Score:** 25% email + 25% scheduling + 25% conflict + 25% completion
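The per-task weights above combine the independent checks into one composite score. A minimal sketch of that weighted sum (component values here are made-up examples, not environment output):

```python
# Per-task weights from the task list above; component scores are each 0-1.
WEIGHTS = {
    "easy":   {"email": 0.50, "scheduling": 0.50},
    "medium": {"email": 0.30, "conflict": 0.40, "scheduling": 0.30},
    "hard":   {"email": 0.25, "scheduling": 0.25, "conflict": 0.25, "completion": 0.25},
}

def composite_score(task: str, components: dict) -> float:
    """Weighted sum of check scores; missing components count as 0."""
    return round(sum(w * components.get(name, 0.0) for name, w in WEIGHTS[task].items()), 3)

print(composite_score("medium", {"email": 0.8, "conflict": 0.5, "scheduling": 1.0}))  # → 0.74
```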
## Environment Design

### Observation Space
- **Emails:** Sender, subject, body, priority
- **Calendar:** Existing meetings, working hours, blocked times
- **Contacts:** Names, emails, timezones

### Action Space
```json
{
  "email_reply": "Professional response text",
  "calendar_action": "book | propose_alternatives | reschedule | decline",
  "meeting_details": {
    "participants": ["email@company.com"],
    "start_time": "2026-04-28T14:00:00",
    "end_time": "2026-04-28T15:00:00",
    "subject": "Meeting topic",
    "proposed_alternatives": [...]
  }
}
```
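A quick client-side sanity check against this schema can catch malformed actions before submission. A sketch (field rules inferred from the README; the environment's actual validator may enforce more):

```python
# Minimal action validator for the schema above; returns a list of problems.
from datetime import datetime

VALID_CALENDAR_ACTIONS = {"book", "propose_alternatives", "reschedule", "decline"}

def validate_action(action: dict) -> list:
    """Empty list means the action looks well-formed."""
    problems = []
    if not action.get("email_reply", "").strip():
        problems.append("email_reply is empty")
    if action.get("calendar_action") not in VALID_CALENDAR_ACTIONS:
        problems.append("calendar_action must be one of " + ", ".join(sorted(VALID_CALENDAR_ACTIONS)))
    details = action.get("meeting_details")
    if details:
        for key in ("start_time", "end_time"):
            try:
                datetime.fromisoformat(details[key])  # expects YYYY-MM-DDTHH:MM:SS
            except (KeyError, ValueError):
                problems.append(f"{key} is missing or not ISO 8601")
    return problems

action = {
    "email_reply": "Happy to confirm - booking Tuesday 2pm.",
    "calendar_action": "book",
    "meeting_details": {"start_time": "2026-04-28T14:00:00", "end_time": "2026-04-28T15:00:00"},
}
print(validate_action(action))  # → []
```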

### Reward Functions (Multiple Independent Checks)

**1. Email Quality (0-1)**
- Politeness markers (thank you, regards)
- Proper greeting/closing
- Sufficient detail (20+ words)
- Professional tone (no negative framing)
- LLM-as-judge for nuance

**2. Scheduling Correctness (0-1)**
- No double-booking
- Within working hours
- Appropriate duration (15 min to 2 hrs)
- All participants included

**3. Conflict Resolution (0-1)**
- Recognizes conflicts
- Proposes 2-3 alternatives
- Explains professionally
- Prioritizes correctly (for hard task)

**4. Anti-Reward-Hacking Penalties**
- Too short email: -0.3
- Missing meeting details: -0.4
- Generic/templated: -0.1
- Overly long: -0.15
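As a rough illustration, the surface-level email-quality checks could look like the following (marker lists and the equal 0.25 weights are assumptions, not the environment's real rubric, which also uses an LLM judge for nuance):

```python
# Illustrative email-quality score (0-1) based on the heuristics listed above.
def email_quality(reply: str) -> float:
    text = reply.lower()
    score = 0.0
    if any(m in text for m in ("thank you", "thanks", "regards", "best")):
        score += 0.25  # politeness markers
    if text.startswith(("hi", "hello", "dear")):
        score += 0.25  # proper greeting
    if len(text.split()) >= 20:
        score += 0.25  # sufficient detail
    if not any(m in text for m in ("not possible", "can't help", "no way")):
        score += 0.25  # no overtly negative framing
    return score

reply = ("Hi Jordan, thank you for reaching out. Tuesday at 2pm works well, "
         "so I have booked the meeting and sent an invite. Best regards, Alex")
print(email_quality(reply))  # → 1.0
```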

## Baseline Scores

### Random Baseline
| Task | Score |
|------|-------|
| Easy | TODO |
| Medium | TODO |
| Hard | TODO |

### AI Baseline (Nemotron 3 Super 120B) — Untrained
| Task | Score |
|------|-------|
| Easy | 0.315 |
| Medium | 0.349 |
| Hard | 0.346 |
| **Average** | **0.337** |

*Note: These are pre-training scores. The model struggles with JSON formatting, conflict detection, and professional email composition. Training target: 0.60-0.80.*

## Setup & Usage

### Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YourUsername/exec-assist
cd exec-assist

# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn server.app:app --reload

# Open API docs
# http://127.0.0.1:8000/docs
```

### Run Baseline Inference

```bash
# Set environment variables
export API_BASE_URL=https://openrouter.ai/api/v1
export MODEL_NAME=nvidia/nemotron-3-super-120b-a12b:free
export HF_TOKEN=your-api-key

# Run inference
python inference.py
```

### Docker

```bash
docker build -t exec-assist .
docker run -p 7860:7860 exec-assist
```

## Training (TODO — Apr 26)

We will train using TRL + Unsloth:
1. GRPO trainer setup
2. Reward shaping
3. Baseline comparison
4. Before/after examples

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset?task=easy\|medium\|hard` | POST | Start new episode |
| `/step` | POST | Submit action, get reward |
| `/state` | GET | Current state |
| `/tasks` | GET | List all tasks |
| `/health` | GET | Health check |
| `/metadata` | GET | Environment info |
| `/schema` | GET | Action/observation/state schemas |

## Author

**Gang-gay** — Built for OpenEnv Hackathon 2026
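The "no double-booking" check behind the scheduling reward reduces to a standard interval-overlap test over ISO-formatted times. A sketch (the environment's actual implementation may differ):

```python
# Classic half-open interval overlap: two meetings clash iff each starts
# before the other ends; back-to-back meetings do not count as a clash.
from datetime import datetime

def overlaps(start_a: str, end_a: str, start_b: str, end_b: str) -> bool:
    """True if two ISO-formatted meetings overlap in time."""
    a0, a1 = datetime.fromisoformat(start_a), datetime.fromisoformat(end_a)
    b0, b1 = datetime.fromisoformat(start_b), datetime.fromisoformat(end_b)
    return a0 < b1 and b0 < a1

existing = ("2026-04-28T14:00:00", "2026-04-28T15:00:00")
print(overlaps(*existing, "2026-04-28T14:30:00", "2026-04-28T15:30:00"))  # → True
print(overlaps(*existing, "2026-04-28T15:00:00", "2026-04-28T16:00:00"))  # → False
```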
__init__.py ADDED
@@ -0,0 +1,5 @@
"""ExecAssist — Executive Assistant OpenEnv Environment."""

from client import ExecAssistEnv

__all__ = ["ExecAssistEnv"]
client.py ADDED
@@ -0,0 +1,30 @@
"""
client.py — OpenEnv client for ExecAssist Environment

Provides a typed client interface for interacting with the environment.
"""

try:
    from openenv import EnvClient
except ImportError:
    EnvClient = object  # fallback if openenv is not installed

from server.models import AssistantAction, AssistantObservation, AssistantState


class ExecAssistEnv(EnvClient):
    """Typed client for the Executive Assistant environment."""

    metadata = {
        "name": "exec-assist",
        "description": "Executive Assistant environment for email and calendar management.",
    }

    class Action(AssistantAction):
        pass

    class Observation(AssistantObservation):
        pass

    class State(AssistantState):
        pass
inference.py ADDED
@@ -0,0 +1,276 @@
"""
inference.py — Baseline inference script for ExecAssist

Runs a baseline AI model against all 3 tasks using structured stdout logging.
Uses the OpenRouter API with a free-tier model.
"""

import os
import json
from typing import List, Optional

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# ============================================================
# CONFIGURATION
# ============================================================

API_BASE_URL = os.getenv("API_BASE_URL") or "https://openrouter.ai/api/v1"
API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME") or "nvidia/nemotron-3-super-120b-a12b:free"

BENCHMARK = "exec-assist"
TEMPERATURE = 0.3
MAX_TOKENS = 500


# ============================================================
# STRUCTURED STDOUT LOGGING
# ============================================================

def log_start(task: str, env: str, model: str) -> None:
    print(f"[START] task={task} env={env} model={model}", flush=True)


def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
    error_val = error if error else "null"
    done_val = str(done).lower()
    print(
        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
        flush=True,
    )


def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    print(
        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
        flush=True,
    )

# ============================================================
# PROMPT BUILDING
# ============================================================

def build_assistant_prompt(observation: dict) -> str:
    """Build prompt for the AI model to act as executive assistant."""

    emails = observation.get("emails", [])
    calendar = observation.get("calendar", {})

    # Build email section
    email_str = ""
    for email in emails:
        email_str += f"\n--- Email from {email['sender']} ---\n"
        email_str += f"Subject: {email['subject']}\n"
        email_str += f"Priority: {email['priority']}\n"
        email_str += f"Body:\n{email['body']}\n"

    # Build calendar section
    meetings = calendar.get("existing_meetings", [])
    calendar_str = "\nExisting Meetings:\n"
    if meetings:
        for mtg in meetings:
            calendar_str += f"  - {mtg['subject']}: {mtg['start_time']} to {mtg['end_time']} (Priority: {mtg['priority']})\n"
    else:
        calendar_str += "  (No existing meetings)\n"

    working_hours = calendar.get("working_hours", {})
    hours_str = "\nWorking Hours:\n"
    for day, hours in working_hours.items():
        hours_str += f"  {day.capitalize()}: {hours}\n"

    task_desc = observation.get("description", "")
    action_required = observation.get("action_required", "")

    prompt = f"""You are an executive assistant for {calendar.get('executive_name', 'Alex Chen')}.

TASK: {task_desc}

{email_str}

{calendar_str}

{hours_str}

ACTION REQUIRED: {action_required}

Respond with ONLY a JSON object in this exact format:
{{
  "email_reply": "Your professional email response here",
  "calendar_action": "book or propose_alternatives or reschedule or decline",
  "meeting_details": {{
    "participants": ["email1@company.com", "email2@company.com"],
    "start_time": "2026-04-28T14:00:00",
    "end_time": "2026-04-28T15:00:00",
    "subject": "Meeting subject",
    "location": "Conference Room A",
    "proposed_alternatives": [
      {{"start_time": "2026-04-29T10:00:00", "end_time": "2026-04-29T11:00:00", "note": "Alternative option"}}
    ]
  }}
}}

Important:
- Be professional and polite in email
- Check for calendar conflicts
- If conflict exists, propose 2-3 alternative times
- Include all email participants in meeting_details.participants
- Use ISO format for all times (YYYY-MM-DDTHH:MM:SS)

Respond with ONLY the JSON object, no explanation."""

    return prompt

# ============================================================
# MODEL INTERACTION
# ============================================================

def call_model(client: OpenAI, prompt: str) -> str:
    """Call OpenRouter API."""
    try:
        completion = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[{"role": "user", "content": prompt}],
            temperature=TEMPERATURE,
            max_tokens=MAX_TOKENS,
        )
        response_text = completion.choices[0].message.content or ""
        return response_text.strip()
    except Exception as exc:
        print(f"API error: {exc}")
        return ""


# ============================================================
# RESPONSE PARSING
# ============================================================

def parse_assistant_response(response: str) -> Optional[dict]:
    """Parse AI response into action dict."""

    if not response:
        return None

    try:
        # Extract JSON from response
        start = response.find("{")
        end = response.rfind("}") + 1
        if start != -1 and end > start:
            json_str = response[start:end]
            parsed = json.loads(json_str)

            # Validate required fields
            if "email_reply" in parsed and "calendar_action" in parsed:
                return parsed
    except (json.JSONDecodeError, KeyError) as e:
        print(f"Parse error: {e}")

    return None

# ============================================================
# ENVIRONMENT INTERACTION
# ============================================================

def run_episode(client: OpenAI, task: str, env_url: str = "http://localhost:8000") -> dict:
    """Run one episode against the environment."""

    import requests

    # Reset environment
    reset_response = requests.post(f"{env_url}/reset", params={"task": task}, timeout=30)
    reset_data = reset_response.json()

    observation = reset_data["observation"]

    # Build prompt and get AI response
    prompt = build_assistant_prompt(observation)
    ai_response = call_model(client, prompt)

    # Parse response
    action = parse_assistant_response(ai_response)

    if not action:
        # Fallback action if parsing failed
        action = {
            "email_reply": "Thank you for your message. I'll check the calendar and get back to you shortly.",
            "calendar_action": "propose_alternatives",
            "meeting_details": None,
        }

    # Submit action to environment
    step_response = requests.post(f"{env_url}/step", json=action, timeout=30)
    step_data = step_response.json()

    return {
        "reward": step_data["reward"],
        "done": step_data["done"],
        "info": step_data.get("info", {}),
    }


# ============================================================
# MAIN — Run baseline inference
# ============================================================

def main() -> None:
    """Run baseline inference on all 3 tasks."""

    if not API_KEY:
        print("[END] success=false steps=0 score=0.000 rewards=", flush=True)
        return

    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

    # Environment URL (local or HF Space)
    env_url = os.getenv("ENV_URL", "http://localhost:8000")

    for task in ["easy", "medium", "hard"]:
        rewards = []
        step_count = 0

        log_start(task=task, env=BENCHMARK, model=MODEL_NAME)

        try:
            # Run episode
            result = run_episode(client, task, env_url)

            reward = result["reward"]
            done = result["done"]

            rewards.append(reward)
            step_count += 1

            log_step(
                step=step_count,
                action=f"assistant({task})",
                reward=reward,
                done=done,
                error=None,
            )

            final_score = round(reward, 4)
            success = final_score > 0.5

        except Exception as exc:
            print(f"Error in {task}: {exc}")
            final_score = 0.0
            success = False

        log_end(
            success=success,
            steps=step_count,
            score=final_score,
            rewards=rewards,
        )


if __name__ == "__main__":
    main()
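The `[START]`/`[STEP]`/`[END]` lines this script emits are easy to parse back into records for analysis. A sketch for the `[STEP]` format (assumes action names contain no spaces, as in this script):

```python
# Parse one "[STEP] ..." log line from inference.py back into a typed record.
def parse_step_line(line: str) -> dict:
    assert line.startswith("[STEP] ")
    # key=value pairs are whitespace-separated after the tag
    fields = dict(pair.split("=", 1) for pair in line[len("[STEP] "):].split())
    return {
        "step": int(fields["step"]),
        "action": fields["action"],
        "reward": float(fields["reward"]),
        "done": fields["done"] == "true",
        "error": None if fields["error"] == "null" else fields["error"],
    }

record = parse_step_line("[STEP] step=1 action=assistant(easy) reward=0.34 done=true error=null")
print(record["reward"], record["done"])  # → 0.34 True
```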
openenv.yaml ADDED
@@ -0,0 +1,66 @@
# openenv.yaml — Environment manifest

name: exec-assist
version: "1.0.0"
description: >
  Executive Assistant environment where AI agents learn to manage email and calendar.
  Agents must draft professional replies, schedule meetings, resolve conflicts, and
  handle multi-party coordination. Tests real-world assistant capabilities across
  three difficulty levels.

author: Gang-gay
repository: https://huggingface.co/spaces/YourUsername/exec-assist

tasks:
  - name: easy
    description: >
      Simple meeting request with clear calendar availability.
      Agent must draft polite reply and book the meeting correctly.
      Score = 50% email quality + 50% scheduling correctness.
    difficulty: easy
    max_score: 1.0
    action_schema:
      email_reply: "Professional email response to sender"
      calendar_action: "book | propose_alternatives | reschedule | decline"
      meeting_details: "MeetingDetails object with time, participants, subject"

  - name: medium
    description: >
      Scheduling conflict — requested time is already booked.
      Agent must identify conflict, propose 2-3 alternative slots, and
      explain professionally in email.
      Score = 30% email quality + 40% conflict resolution + 30% scheduling.
    difficulty: medium
    max_score: 1.0
    action_schema:
      email_reply: "Professional email explaining conflict and proposing alternatives"
      calendar_action: "propose_alternatives"
      meeting_details: "MeetingDetails with proposed_alternatives list"

  - name: hard
    description: >
      Multi-party coordination with priority conflicts.
      3 emails requesting meetings, some overlapping, one high-priority requiring
      reshuffling existing meetings. Agent must prioritize, reschedule, and notify.
      Score = 25% email + 25% scheduling + 25% conflict + 25% task completion.
    difficulty: hard
    max_score: 1.0
    action_schema:
      email_reply: "Professional emails to multiple parties"
      calendar_action: "Multiple actions coordinated"
      meeting_details: "Complete coordination plan"

endpoints:
  reset: POST /reset
  step: POST /step
  state: GET /state
  tasks: GET /tasks
  health: GET /health
  metadata: GET /metadata
  schema: GET /schema
  mcp: POST /mcp

environment:
  python_version: "3.9"
  framework: fastapi
  deployment: huggingface_spaces
pyproject.toml ADDED
@@ -0,0 +1,26 @@
[project]
name = "exec-assist"
version = "1.0.0"
description = "Executive Assistant environment where AI agents learn to manage email and calendar."
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
    {name = "Gang-gay"}
]

dependencies = [
    "fastapi>=0.104.0",
    "uvicorn[standard]>=0.24.0",
    "openai>=1.0.0",
    "python-dotenv>=1.0.0",
    "openenv-core>=0.2.0",
    "pydantic>=2.0.0",
    "python-dateutil>=2.8.0",
]

[project.scripts]
server = "server.app:main"

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"
requirements.txt ADDED
@@ -0,0 +1,7 @@
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
openai>=1.0.0
python-dotenv>=1.0.0
openenv-core>=0.2.0
pydantic>=2.0.0
python-dateutil>=2.8.0