Commit: Upload 8 files

Files added:
- Dockerfile (+16)
- README.md (+156)
- __init__.py (+5)
- client.py (+30)
- inference.py (+276)
- openenv.yaml (+66)
- pyproject.toml (+26)
- requirements.txt (+7)
Dockerfile
ADDED (+16)

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Copy requirements first for caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy all source code
COPY . .

# Expose port
EXPOSE 7860

# Run the server
CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
```
README.md
ADDED (+156)

````markdown
# ExecAssist — Executive Assistant Environment

An OpenEnv environment where AI agents learn to manage email and calendar for busy executives.

## Problem Statement

Every executive assistant juggles email, calendars, and scheduling conflicts daily. This environment simulates that exact challenge: read incoming requests, draft professional replies, book meetings, and resolve conflicts intelligently.

**Theme:** #3.2 - World Modeling (Personalized Tasks)

## Tasks

### Task 1: Easy — Simple Meeting Request
- **Challenge:** Single email with clear calendar availability
- **Agent must:** Draft polite reply + book meeting in open slot
- **Score:** 50% email quality + 50% scheduling correctness

### Task 2: Medium — Scheduling Conflict
- **Challenge:** Requested time is already booked
- **Agent must:** Identify conflict + propose 2-3 alternatives + explain professionally
- **Score:** 30% email quality + 40% conflict resolution + 30% scheduling

### Task 3: Hard — Multi-Party Coordination
- **Challenge:** 3 emails requesting meetings, some overlapping, priority conflicts
- **Agent must:** Prioritize + reschedule + notify all parties
- **Score:** 25% email + 25% scheduling + 25% conflict + 25% completion

## Environment Design

### Observation Space
- **Emails:** Sender, subject, body, priority
- **Calendar:** Existing meetings, working hours, blocked times
- **Contacts:** Names, emails, timezones

### Action Space
```json
{
  "email_reply": "Professional response text",
  "calendar_action": "book | propose_alternatives | reschedule | decline",
  "meeting_details": {
    "participants": ["email@company.com"],
    "start_time": "2026-04-28T14:00:00",
    "end_time": "2026-04-28T15:00:00",
    "subject": "Meeting topic",
    "proposed_alternatives": [...]
  }
}
```

### Reward Functions (Multiple Independent Checks)

**1. Email Quality (0-1)**
- Politeness markers (thank you, regards)
- Proper greeting/closing
- Sufficient detail (20+ words)
- Professional tone (no negative framing)
- LLM-as-judge for nuance

**2. Scheduling Correctness (0-1)**
- No double-booking
- Within working hours
- Appropriate duration (15 min - 2 hrs)
- All participants included

**3. Conflict Resolution (0-1)**
- Recognizes conflicts
- Proposes 2-3 alternatives
- Explains professionally
- Prioritizes correctly (for hard task)

**4. Anti-Reward-Hacking Penalties**
- Too-short email: -0.3
- Missing meeting details: -0.4
- Generic/templated: -0.1
- Overly long: -0.15

## Baseline Scores

### Random Baseline
| Task | Score |
|------|-------|
| Easy | TODO |
| Medium | TODO |
| Hard | TODO |

### AI Baseline (Nemotron 3 Super 120B) — Untrained
| Task | Score |
|------|-------|
| Easy | 0.315 |
| Medium | 0.349 |
| Hard | 0.346 |
| **Average** | **0.337** |

*Note: These are pre-training scores. The model struggles with JSON formatting, conflict detection, and professional email composition. Training target: 0.60-0.80.*

## Setup & Usage

### Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YourUsername/exec-assist
cd exec-assist

# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn server.app:app --reload

# Open API docs
# http://127.0.0.1:8000/docs
```

### Run Baseline Inference

```bash
# Set environment variables
export API_BASE_URL=https://openrouter.ai/api/v1
export MODEL_NAME=nvidia/nemotron-3-super-120b-a12b:free
export HF_TOKEN=your-api-key

# Run inference
python inference.py
```

### Docker

```bash
docker build -t exec-assist .
docker run -p 7860:7860 exec-assist
```

## Training (TODO — Apr 26)

We will train using TRL + Unsloth:
1. GRPO trainer setup
2. Reward shaping
3. Baseline comparison
4. Before/after examples

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset?task=easy\|medium\|hard` | POST | Start new episode |
| `/step` | POST | Submit action, get reward |
| `/state` | GET | Current state |
| `/tasks` | GET | List all tasks |
| `/health` | GET | Health check |
| `/metadata` | GET | Environment info |
| `/schema` | GET | Action/observation/state schemas |

## Author

**Gang-gay** — Built for OpenEnv Hackathon 2026
````
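The reward weights and penalty table in the README can be combined into a single scoring helper. The sketch below is an illustrative assumption of how the checks might be aggregated (weighted sum of 0-1 component scores, penalties subtracted, result clamped to [0, 1]); the function and constant names are introduced here and the environment's actual scoring code in the server may differ.

```python
# Hypothetical sketch of the README's reward weighting; not the
# environment's actual implementation.

TASK_WEIGHTS = {
    "easy":   {"email": 0.50, "scheduling": 0.50},
    "medium": {"email": 0.30, "conflict": 0.40, "scheduling": 0.30},
    "hard":   {"email": 0.25, "scheduling": 0.25, "conflict": 0.25, "completion": 0.25},
}

# Anti-reward-hacking penalties from the README
PENALTIES = {
    "too_short_email": -0.30,
    "missing_meeting_details": -0.40,
    "generic_templated": -0.10,
    "overly_long": -0.15,
}

def combine_score(task: str, components: dict, flags: list) -> float:
    """Weighted sum of per-check scores (each 0-1), minus any
    triggered penalties, clamped to [0, 1]."""
    weights = TASK_WEIGHTS[task]
    base = sum(weights[name] * components.get(name, 0.0) for name in weights)
    penalty = sum(PENALTIES[flag] for flag in flags)
    return max(0.0, min(1.0, base + penalty))
```

For example, on the easy task an email score of 0.8 with perfect scheduling would yield 0.5 * 0.8 + 0.5 * 1.0 = 0.9 under this scheme.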
__init__.py
ADDED (+5)

```python
"""ExecAssist — Executive Assistant OpenEnv Environment."""

from client import ExecAssistEnv

__all__ = ["ExecAssistEnv"]
```
client.py
ADDED (+30)

```python
"""
client.py — OpenEnv client for ExecAssist Environment

Provides typed client interface for interacting with the environment.
"""

try:
    from openenv import EnvClient
except ImportError:
    EnvClient = object  # fallback if openenv not installed

from server.models import AssistantAction, AssistantObservation, AssistantState


class ExecAssistEnv(EnvClient):
    """Typed client for the Executive Assistant environment."""

    metadata = {
        "name": "exec-assist",
        "description": "Executive Assistant environment for email and calendar management.",
    }

    class Action(AssistantAction):
        pass

    class Observation(AssistantObservation):
        pass

    class State(AssistantState):
        pass
```
inference.py
ADDED (+276)

```python
"""
inference.py — Baseline inference script for ExecAssist

Runs a baseline AI model against all 3 tasks using structured stdout logging.
Uses OpenRouter API with unlimited free credits.
"""

import os
import json
from typing import List, Optional

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# ============================================================
# CONFIGURATION
# ============================================================

API_BASE_URL = os.getenv("API_BASE_URL") or "https://openrouter.ai/api/v1"
API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME") or "nvidia/nemotron-3-super-120b-a12b:free"

BENCHMARK = "exec-assist"
TEMPERATURE = 0.3
MAX_TOKENS = 500


# ============================================================
# STRUCTURED STDOUT LOGGING
# ============================================================

def log_start(task: str, env: str, model: str) -> None:
    print(f"[START] task={task} env={env} model={model}", flush=True)


def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
    error_val = error if error else "null"
    done_val = str(done).lower()
    print(
        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
        flush=True,
    )


def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    print(
        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
        flush=True,
    )


# ============================================================
# PROMPT BUILDING
# ============================================================

def build_assistant_prompt(observation: dict) -> str:
    """Build prompt for the AI model to act as executive assistant."""
    emails = observation.get("emails", [])
    calendar = observation.get("calendar", {})

    # Build email section
    email_str = ""
    for email in emails:
        email_str += f"\n--- Email from {email['sender']} ---\n"
        email_str += f"Subject: {email['subject']}\n"
        email_str += f"Priority: {email['priority']}\n"
        email_str += f"Body:\n{email['body']}\n"

    # Build calendar section
    meetings = calendar.get("existing_meetings", [])
    calendar_str = "\nExisting Meetings:\n"
    if meetings:
        for mtg in meetings:
            calendar_str += f"  - {mtg['subject']}: {mtg['start_time']} to {mtg['end_time']} (Priority: {mtg['priority']})\n"
    else:
        calendar_str += "  (No existing meetings)\n"

    working_hours = calendar.get("working_hours", {})
    hours_str = "\nWorking Hours:\n"
    for day, hours in working_hours.items():
        hours_str += f"  {day.capitalize()}: {hours}\n"

    task_desc = observation.get("description", "")
    action_required = observation.get("action_required", "")

    prompt = f"""You are an executive assistant for {calendar.get('executive_name', 'Alex Chen')}.

TASK: {task_desc}

{email_str}

{calendar_str}

{hours_str}

ACTION REQUIRED: {action_required}

Respond with ONLY a JSON object in this exact format:
{{
  "email_reply": "Your professional email response here",
  "calendar_action": "book or propose_alternatives or reschedule or decline",
  "meeting_details": {{
    "participants": ["email1@company.com", "email2@company.com"],
    "start_time": "2026-04-28T14:00:00",
    "end_time": "2026-04-28T15:00:00",
    "subject": "Meeting subject",
    "location": "Conference Room A",
    "proposed_alternatives": [
      {{"start_time": "2026-04-29T10:00:00", "end_time": "2026-04-29T11:00:00", "note": "Alternative option"}}
    ]
  }}
}}

Important:
- Be professional and polite in email
- Check for calendar conflicts
- If conflict exists, propose 2-3 alternative times
- Include all email participants in meeting_details.participants
- Use ISO format for all times (YYYY-MM-DDTHH:MM:SS)

Respond with ONLY the JSON object, no explanation."""
    return prompt


# ============================================================
# MODEL INTERACTION
# ============================================================

def call_model(client: OpenAI, prompt: str) -> str:
    """Call OpenRouter API."""
    try:
        completion = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[{"role": "user", "content": prompt}],
            temperature=TEMPERATURE,
            max_tokens=MAX_TOKENS,
        )
        response_text = completion.choices[0].message.content or ""
        return response_text.strip()
    except Exception as exc:
        print(f"API error: {exc}")
        return ""


# ============================================================
# RESPONSE PARSING
# ============================================================

def parse_assistant_response(response: str) -> Optional[dict]:
    """Parse AI response into action dict."""
    if not response:
        return None

    try:
        # Extract JSON from response
        start = response.find("{")
        end = response.rfind("}") + 1
        if start != -1 and end > start:
            json_str = response[start:end]
            parsed = json.loads(json_str)

            # Validate required fields
            if "email_reply" in parsed and "calendar_action" in parsed:
                return parsed
    except (json.JSONDecodeError, KeyError) as e:
        print(f"Parse error: {e}")

    return None


# ============================================================
# ENVIRONMENT INTERACTION
# ============================================================

def run_episode(client: OpenAI, task: str, env_url: str = "http://localhost:8000") -> dict:
    """Run one episode against the environment."""
    import requests

    # Reset environment
    reset_response = requests.post(f"{env_url}/reset", params={"task": task})
    reset_data = reset_response.json()

    observation = reset_data["observation"]

    # Build prompt and get AI response
    prompt = build_assistant_prompt(observation)
    ai_response = call_model(client, prompt)

    # Parse response
    action = parse_assistant_response(ai_response)

    if not action:
        # Fallback action if parsing failed
        action = {
            "email_reply": "Thank you for your message. I'll check the calendar and get back to you shortly.",
            "calendar_action": "propose_alternatives",
            "meeting_details": None,
        }

    # Submit action to environment
    step_response = requests.post(f"{env_url}/step", json=action)
    step_data = step_response.json()

    return {
        "reward": step_data["reward"],
        "done": step_data["done"],
        "info": step_data.get("info", {}),
    }


# ============================================================
# MAIN — Run baseline inference
# ============================================================

def main() -> None:
    """Run baseline inference on all 3 tasks."""
    if not API_KEY:
        print("[END] success=false steps=0 score=0.000 rewards=", flush=True)
        return

    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

    # Environment URL (local or HF Space)
    env_url = os.getenv("ENV_URL", "http://localhost:8000")

    for task in ["easy", "medium", "hard"]:
        rewards = []
        step_count = 0

        log_start(task=task, env=BENCHMARK, model=MODEL_NAME)

        try:
            # Run episode
            result = run_episode(client, task, env_url)

            reward = result["reward"]
            done = result["done"]

            rewards.append(reward)
            step_count += 1

            log_step(
                step=step_count,
                action=f"assistant({task})",
                reward=reward,
                done=done,
                error=None,
            )

            final_score = round(reward, 4)
            success = final_score > 0.5

        except Exception as exc:
            print(f"Error in {task}: {exc}")
            final_score = 0.0
            success = False

        log_end(
            success=success,
            steps=step_count,
            score=final_score,
            rewards=rewards,
        )


if __name__ == "__main__":
    main()
```
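The brace-scan extraction step inside `parse_assistant_response` can be exercised on its own. The sketch below re-implements just that step (find the first `{`, the last `}`, and parse the span between them) so it can be tested without a model call; `extract_json_object` is a name introduced here for illustration, not part of the repository.

```python
import json
from typing import Optional

def extract_json_object(response: str) -> Optional[dict]:
    """Pull the outermost {...} span from a model response and parse it,
    mirroring the find/rfind strategy used in parse_assistant_response."""
    start = response.find("{")
    end = response.rfind("}") + 1
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(response[start:end])
    except json.JSONDecodeError:
        return None
```

This tolerates models that wrap their JSON in prose or Markdown fences, but returns `None` when the response contains several separate JSON objects (the concatenated span is not valid JSON), which is one reason the full parser also validates the required fields afterward.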
openenv.yaml
ADDED (+66)

```yaml
# openenv.yaml — Environment manifest

name: exec-assist
version: "1.0.0"
description: >
  Executive Assistant environment where AI agents learn to manage email and calendar.
  Agents must draft professional replies, schedule meetings, resolve conflicts, and
  handle multi-party coordination. Tests real-world assistant capabilities across
  three difficulty levels.

author: Gang-gay
repository: https://huggingface.co/spaces/YourUsername/exec-assist

tasks:
  - name: easy
    description: >
      Simple meeting request with clear calendar availability.
      Agent must draft polite reply and book the meeting correctly.
      Score = 50% email quality + 50% scheduling correctness.
    difficulty: easy
    max_score: 1.0
    action_schema:
      email_reply: "Professional email response to sender"
      calendar_action: "book | propose_alternatives | reschedule | decline"
      meeting_details: "MeetingDetails object with time, participants, subject"

  - name: medium
    description: >
      Scheduling conflict — requested time is already booked.
      Agent must identify conflict, propose 2-3 alternative slots, and
      explain professionally in email.
      Score = 30% email quality + 40% conflict resolution + 30% scheduling.
    difficulty: medium
    max_score: 1.0
    action_schema:
      email_reply: "Professional email explaining conflict and proposing alternatives"
      calendar_action: "propose_alternatives"
      meeting_details: "MeetingDetails with proposed_alternatives list"

  - name: hard
    description: >
      Multi-party coordination with priority conflicts.
      3 emails requesting meetings, some overlapping, one high-priority requiring
      reshuffling existing meetings. Agent must prioritize, reschedule, and notify.
      Score = 25% email + 25% scheduling + 25% conflict + 25% task completion.
    difficulty: hard
    max_score: 1.0
    action_schema:
      email_reply: "Professional emails to multiple parties"
      calendar_action: "Multiple actions coordinated"
      meeting_details: "Complete coordination plan"

endpoints:
  reset: POST /reset
  step: POST /step
  state: GET /state
  tasks: GET /tasks
  health: GET /health
  metadata: GET /metadata
  schema: GET /schema
  mcp: POST /mcp

environment:
  python_version: "3.9"
  framework: fastapi
  deployment: huggingface_spaces
```
pyproject.toml
ADDED (+26)

```toml
[project]
name = "exec-assist"
version = "1.0.0"
description = "Executive Assistant environment where AI agents learn to manage email and calendar."
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
    {name = "Gang-gay"}
]

dependencies = [
    "fastapi>=0.104.0",
    "uvicorn[standard]>=0.24.0",
    "openai>=1.0.0",
    "python-dotenv>=1.0.0",
    "openenv-core>=0.2.0",
    "pydantic>=2.0.0",
    "python-dateutil>=2.8.0",
]

[project.scripts]
server = "server.app:main"

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"
```
requirements.txt
ADDED (+7)

```text
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
openai>=1.0.0
python-dotenv>=1.0.0
openenv-core>=0.2.0
pydantic>=2.0.0
python-dateutil>=2.8.0
```