Commit: Upload 8 files

Files added:
- Dockerfile (+16)
- README.md (+156)
- __init__.py (+5)
- client.py (+30)
- inference.py (+276)
- openenv.yaml (+66)
- pyproject.toml (+26)
- requirements.txt (+7)
Dockerfile
ADDED (+16)

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Copy requirements first for caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy all source code
COPY . .

# Expose port
EXPOSE 7860

# Run the server
CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
```
README.md
ADDED (+156)

````markdown
# ExecAssist — Executive Assistant Environment

An OpenEnv environment where AI agents learn to manage email and calendar for busy executives.

## Problem Statement

Every executive assistant juggles email, calendars, and scheduling conflicts daily. This environment simulates that exact challenge: read incoming requests, draft professional replies, book meetings, and resolve conflicts intelligently.

**Theme:** #3.2 - World Modeling (Personalized Tasks)

## Tasks

### Task 1: Easy — Simple Meeting Request
- **Challenge:** Single email with clear calendar availability
- **Agent must:** Draft polite reply + book meeting in open slot
- **Score:** 50% email quality + 50% scheduling correctness

### Task 2: Medium — Scheduling Conflict
- **Challenge:** Requested time is already booked
- **Agent must:** Identify conflict + propose 2-3 alternatives + explain professionally
- **Score:** 30% email quality + 40% conflict resolution + 30% scheduling

### Task 3: Hard — Multi-Party Coordination
- **Challenge:** 3 emails requesting meetings, some overlapping, priority conflicts
- **Agent must:** Prioritize + reschedule + notify all parties
- **Score:** 25% email + 25% scheduling + 25% conflict + 25% completion

## Environment Design

### Observation Space
- **Emails:** Sender, subject, body, priority
- **Calendar:** Existing meetings, working hours, blocked times
- **Contacts:** Names, emails, timezones

### Action Space
```json
{
  "email_reply": "Professional response text",
  "calendar_action": "book | propose_alternatives | reschedule | decline",
  "meeting_details": {
    "participants": ["email@company.com"],
    "start_time": "2026-04-28T14:00:00",
    "end_time": "2026-04-28T15:00:00",
    "subject": "Meeting topic",
    "proposed_alternatives": [...]
  }
}
```

### Reward Functions (Multiple Independent Checks)

**1. Email Quality (0-1)**
- Politeness markers (thank you, regards)
- Proper greeting/closing
- Sufficient detail (20+ words)
- Professional tone (no negative framing)
- LLM-as-judge for nuance

**2. Scheduling Correctness (0-1)**
- No double-booking
- Within working hours
- Appropriate duration (15 min - 2 hrs)
- All participants included

**3. Conflict Resolution (0-1)**
- Recognizes conflicts
- Proposes 2-3 alternatives
- Explains professionally
- Prioritizes correctly (for hard task)

**4. Anti-Reward-Hacking Penalties**
- Too-short email: -0.3
- Missing meeting details: -0.4
- Generic/templated: -0.1
- Overly long: -0.15

## Baseline Scores

### Random Baseline
| Task | Score |
|------|-------|
| Easy | TODO |
| Medium | TODO |
| Hard | TODO |

### AI Baseline (Nemotron 3 Super 120B) — Untrained
| Task | Score |
|------|-------|
| Easy | 0.315 |
| Medium | 0.349 |
| Hard | 0.346 |
| **Average** | **0.337** |

*Note: These are pre-training scores. The model struggles with JSON formatting, conflict detection, and professional email composition. Training target: 0.60-0.80.*

## Setup & Usage

### Local Development

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YourUsername/exec-assist
cd exec-assist

# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn server.app:app --reload

# Open API docs
# http://127.0.0.1:8000/docs
```

### Run Baseline Inference

```bash
# Set environment variables
export API_BASE_URL=https://openrouter.ai/api/v1
export MODEL_NAME=nvidia/nemotron-3-super-120b-a12b:free
export HF_TOKEN=your-api-key

# Run inference
python inference.py
```

### Docker

```bash
docker build -t exec-assist .
docker run -p 7860:7860 exec-assist
```

## Training (TODO — Apr 26)

We will train using TRL + Unsloth:
1. GRPO trainer setup
2. Reward shaping
3. Baseline comparison
4. Before/after examples

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset?task=easy\|medium\|hard` | POST | Start new episode |
| `/step` | POST | Submit action, get reward |
| `/state` | GET | Current state |
| `/tasks` | GET | List all tasks |
| `/health` | GET | Health check |
| `/metadata` | GET | Environment info |
| `/schema` | GET | Action/observation/state schemas |

## Author

**Gang-gay** — Built for OpenEnv Hackathon 2026
````
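The reward weights and penalty table in the README can be combined into a single scoring helper. The sketch below is an illustrative assumption of how the checks might be aggregated (weighted sum of 0-1 component scores, penalties subtracted, result clamped to [0, 1]); the function and constant names are introduced here and the environment's actual scoring code in the server may differ.

```python
# Hypothetical sketch of the README's reward weighting; not the
# environment's actual implementation.

TASK_WEIGHTS = {
    "easy":   {"email": 0.50, "scheduling": 0.50},
    "medium": {"email": 0.30, "conflict": 0.40, "scheduling": 0.30},
    "hard":   {"email": 0.25, "scheduling": 0.25, "conflict": 0.25, "completion": 0.25},
}

# Anti-reward-hacking penalties from the README
PENALTIES = {
    "too_short_email": -0.30,
    "missing_meeting_details": -0.40,
    "generic_templated": -0.10,
    "overly_long": -0.15,
}

def combine_score(task: str, components: dict, flags: list) -> float:
    """Weighted sum of per-check scores (each 0-1), minus any
    triggered penalties, clamped to [0, 1]."""
    weights = TASK_WEIGHTS[task]
    base = sum(weights[name] * components.get(name, 0.0) for name in weights)
    penalty = sum(PENALTIES[flag] for flag in flags)
    return max(0.0, min(1.0, base + penalty))
```

For example, on the easy task an email score of 0.8 with perfect scheduling would yield 0.5 * 0.8 + 0.5 * 1.0 = 0.9 under this scheme.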
__init__.py
ADDED (+5)

```python
"""ExecAssist — Executive Assistant OpenEnv Environment."""

from client import ExecAssistEnv

__all__ = ["ExecAssistEnv"]
```
client.py
ADDED (+30)

```python
"""
client.py — OpenEnv client for ExecAssist Environment

Provides typed client interface for interacting with the environment.
"""

try:
    from openenv import EnvClient
except ImportError:
    EnvClient = object  # fallback if openenv not installed

from server.models import AssistantAction, AssistantObservation, AssistantState


class ExecAssistEnv(EnvClient):
    """Typed client for the Executive Assistant environment."""

    metadata = {
        "name": "exec-assist",
        "description": "Executive Assistant environment for email and calendar management.",
    }

    class Action(AssistantAction):
        pass

    class Observation(AssistantObservation):
        pass

    class State(AssistantState):
        pass
```
inference.py
ADDED (+276)

```python
"""
inference.py — Baseline inference script for ExecAssist

Runs a baseline AI model against all 3 tasks using structured stdout logging.
Uses OpenRouter API with unlimited free credits.
"""

import os
import json
from typing import List, Optional

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# ============================================================
# CONFIGURATION
# ============================================================

API_BASE_URL = os.getenv("API_BASE_URL") or "https://openrouter.ai/api/v1"
API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME") or "nvidia/nemotron-3-super-120b-a12b:free"

BENCHMARK = "exec-assist"
TEMPERATURE = 0.3
MAX_TOKENS = 500


# ============================================================
# STRUCTURED STDOUT LOGGING
# ============================================================

def log_start(task: str, env: str, model: str) -> None:
    print(f"[START] task={task} env={env} model={model}", flush=True)


def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
    error_val = error if error else "null"
    done_val = str(done).lower()
    print(
        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
        flush=True,
    )


def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    print(
        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
        flush=True,
    )


# ============================================================
# PROMPT BUILDING
# ============================================================

def build_assistant_prompt(observation: dict) -> str:
    """Build prompt for the AI model to act as executive assistant."""
    emails = observation.get("emails", [])
    calendar = observation.get("calendar", {})

    # Build email section
    email_str = ""
    for email in emails:
        email_str += f"\n--- Email from {email['sender']} ---\n"
        email_str += f"Subject: {email['subject']}\n"
        email_str += f"Priority: {email['priority']}\n"
        email_str += f"Body:\n{email['body']}\n"

    # Build calendar section
    meetings = calendar.get("existing_meetings", [])
    calendar_str = "\nExisting Meetings:\n"
    if meetings:
        for mtg in meetings:
            calendar_str += f"  - {mtg['subject']}: {mtg['start_time']} to {mtg['end_time']} (Priority: {mtg['priority']})\n"
    else:
        calendar_str += "  (No existing meetings)\n"

    working_hours = calendar.get("working_hours", {})
    hours_str = "\nWorking Hours:\n"
    for day, hours in working_hours.items():
        hours_str += f"  {day.capitalize()}: {hours}\n"

    task_desc = observation.get("description", "")
    action_required = observation.get("action_required", "")

    prompt = f"""You are an executive assistant for {calendar.get('executive_name', 'Alex Chen')}.

TASK: {task_desc}

{email_str}

{calendar_str}

{hours_str}

ACTION REQUIRED: {action_required}

Respond with ONLY a JSON object in this exact format:
{{
  "email_reply": "Your professional email response here",
  "calendar_action": "book or propose_alternatives or reschedule or decline",
  "meeting_details": {{
    "participants": ["email1@company.com", "email2@company.com"],
    "start_time": "2026-04-28T14:00:00",
    "end_time": "2026-04-28T15:00:00",
    "subject": "Meeting subject",
    "location": "Conference Room A",
    "proposed_alternatives": [
      {{"start_time": "2026-04-29T10:00:00", "end_time": "2026-04-29T11:00:00", "note": "Alternative option"}}
    ]
  }}
}}

Important:
- Be professional and polite in email
- Check for calendar conflicts
- If conflict exists, propose 2-3 alternative times
- Include all email participants in meeting_details.participants
- Use ISO format for all times (YYYY-MM-DDTHH:MM:SS)

Respond with ONLY the JSON object, no explanation."""
    return prompt


# ============================================================
# MODEL INTERACTION
# ============================================================

def call_model(client: OpenAI, prompt: str) -> str:
    """Call OpenRouter API."""
    try:
        completion = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[{"role": "user", "content": prompt}],
            temperature=TEMPERATURE,
            max_tokens=MAX_TOKENS,
        )
        response_text = completion.choices[0].message.content or ""
        return response_text.strip()
    except Exception as exc:
        print(f"API error: {exc}")
        return ""


# ============================================================
# RESPONSE PARSING
# ============================================================

def parse_assistant_response(response: str) -> Optional[dict]:
    """Parse AI response into action dict."""
    if not response:
        return None

    try:
        # Extract JSON from response
        start = response.find("{")
        end = response.rfind("}") + 1
        if start != -1 and end > start:
            json_str = response[start:end]
            parsed = json.loads(json_str)

            # Validate required fields
            if "email_reply" in parsed and "calendar_action" in parsed:
                return parsed
    except (json.JSONDecodeError, KeyError) as e:
        print(f"Parse error: {e}")

    return None


# ============================================================
# ENVIRONMENT INTERACTION
# ============================================================

def run_episode(client: OpenAI, task: str, env_url: str = "http://localhost:8000") -> dict:
    """Run one episode against the environment."""
    import requests

    # Reset environment
    reset_response = requests.post(f"{env_url}/reset", params={"task": task})
    reset_data = reset_response.json()

    observation = reset_data["observation"]

    # Build prompt and get AI response
    prompt = build_assistant_prompt(observation)
    ai_response = call_model(client, prompt)

    # Parse response
    action = parse_assistant_response(ai_response)

    if not action:
        # Fallback action if parsing failed
        action = {
            "email_reply": "Thank you for your message. I'll check the calendar and get back to you shortly.",
            "calendar_action": "propose_alternatives",
            "meeting_details": None,
        }

    # Submit action to environment
    step_response = requests.post(f"{env_url}/step", json=action)
    step_data = step_response.json()

    return {
        "reward": step_data["reward"],
        "done": step_data["done"],
        "info": step_data.get("info", {}),
    }


# ============================================================
# MAIN — Run baseline inference
# ============================================================

def main() -> None:
    """Run baseline inference on all 3 tasks."""
    if not API_KEY:
        print("[END] success=false steps=0 score=0.000 rewards=", flush=True)
        return

    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

    # Environment URL (local or HF Space)
    env_url = os.getenv("ENV_URL", "http://localhost:8000")

    for task in ["easy", "medium", "hard"]:
        rewards = []
        step_count = 0

        log_start(task=task, env=BENCHMARK, model=MODEL_NAME)

        try:
            # Run episode
            result = run_episode(client, task, env_url)

            reward = result["reward"]
            done = result["done"]

            rewards.append(reward)
            step_count += 1

            log_step(
                step=step_count,
                action=f"assistant({task})",
                reward=reward,
                done=done,
                error=None,
            )

            final_score = round(reward, 4)
            success = final_score > 0.5

        except Exception as exc:
            print(f"Error in {task}: {exc}")
            final_score = 0.0
            success = False

        log_end(
            success=success,
            steps=step_count,
            score=final_score,
            rewards=rewards,
        )


if __name__ == "__main__":
    main()
```
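The brace-scan extraction step inside `parse_assistant_response` can be exercised on its own. The sketch below re-implements just that step (find the first `{`, the last `}`, and parse the span between them) so it can be tested without a model call; `extract_json_object` is a name introduced here for illustration, not part of the repository.

```python
import json
from typing import Optional

def extract_json_object(response: str) -> Optional[dict]:
    """Pull the outermost {...} span from a model response and parse it,
    mirroring the find/rfind strategy used in parse_assistant_response."""
    start = response.find("{")
    end = response.rfind("}") + 1
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(response[start:end])
    except json.JSONDecodeError:
        return None
```

This tolerates models that wrap their JSON in prose or Markdown fences, but returns `None` when the response contains several separate JSON objects (the concatenated span is not valid JSON), which is one reason the full parser also validates the required fields afterward.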
openenv.yaml
ADDED (+66)

```yaml
# openenv.yaml — Environment manifest

name: exec-assist
version: "1.0.0"
description: >
  Executive Assistant environment where AI agents learn to manage email and calendar.
  Agents must draft professional replies, schedule meetings, resolve conflicts, and
  handle multi-party coordination. Tests real-world assistant capabilities across
  three difficulty levels.

author: Gang-gay
repository: https://huggingface.co/spaces/YourUsername/exec-assist

tasks:
  - name: easy
    description: >
      Simple meeting request with clear calendar availability.
      Agent must draft polite reply and book the meeting correctly.
      Score = 50% email quality + 50% scheduling correctness.
    difficulty: easy
    max_score: 1.0
    action_schema:
      email_reply: "Professional email response to sender"
      calendar_action: "book | propose_alternatives | reschedule | decline"
      meeting_details: "MeetingDetails object with time, participants, subject"

  - name: medium
    description: >
      Scheduling conflict — requested time is already booked.
      Agent must identify conflict, propose 2-3 alternative slots, and
      explain professionally in email.
      Score = 30% email quality + 40% conflict resolution + 30% scheduling.
    difficulty: medium
    max_score: 1.0
    action_schema:
      email_reply: "Professional email explaining conflict and proposing alternatives"
      calendar_action: "propose_alternatives"
      meeting_details: "MeetingDetails with proposed_alternatives list"

  - name: hard
    description: >
      Multi-party coordination with priority conflicts.
      3 emails requesting meetings, some overlapping, one high-priority requiring
      reshuffling existing meetings. Agent must prioritize, reschedule, and notify.
      Score = 25% email + 25% scheduling + 25% conflict + 25% task completion.
    difficulty: hard
    max_score: 1.0
    action_schema:
      email_reply: "Professional emails to multiple parties"
      calendar_action: "Multiple actions coordinated"
      meeting_details: "Complete coordination plan"

endpoints:
  reset: POST /reset
  step: POST /step
  state: GET /state
  tasks: GET /tasks
  health: GET /health
  metadata: GET /metadata
  schema: GET /schema
  mcp: POST /mcp

environment:
  python_version: "3.9"
  framework: fastapi
  deployment: huggingface_spaces
```
pyproject.toml
ADDED (+26)

```toml
[project]
name = "exec-assist"
version = "1.0.0"
description = "Executive Assistant environment where AI agents learn to manage email and calendar."
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
    {name = "Gang-gay"}
]

dependencies = [
    "fastapi>=0.104.0",
    "uvicorn[standard]>=0.24.0",
    "openai>=1.0.0",
    "python-dotenv>=1.0.0",
    "openenv-core>=0.2.0",
    "pydantic>=2.0.0",
    "python-dateutil>=2.8.0",
]

[project.scripts]
server = "server.app:main"

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"
```
requirements.txt
ADDED (+7)

```text
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
openai>=1.0.0
python-dotenv>=1.0.0
openenv-core>=0.2.0
pydantic>=2.0.0
python-dateutil>=2.8.0
```