---
title: Ghostexec Environment Server
emoji: 💼
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
---
# Ghostexec: The AI Chief-of-Staff Environment
Ghostexec is an OpenEnv-compliant environment where an LLM acts as an executive chief-of-staff under pressure: triaging inbox crises, resolving calendar conflicts, protecting stakeholder relationships, and finishing critical tasks.
The agent gets a dense plain-text briefing, takes one structured action, and is scored on three coupled dimensions: conflict reduction, relationship quality, and task progress.
## Submission Package
| Item | Link |
|---|---|
| Public HF Space (required) | `modelbuilderhq/ghostexec` |
| OpenEnv manifest | `openenv.yaml` |
| Training notebook (Colab-ready) | `notebooks/ghostexec_unsloth_grpo_hf_api.ipynb` |
| Minimal training script (Unsloth + TRL) | `scripts/train_sft_then_grpo.py` |
| Mini-blog (required) | `BLOG.md` on Hugging Face |
| Demo video, under 2 minutes (required) | YouTube → Ghostexec demo |
## Why This Environment Is Competitive
- Novel task composition: combines language-heavy triage, social reasoning, scheduling constraints, and deadline management in a single trainable loop.
- Non-trivial behavior: valid JSON is necessary but not sufficient; the policy must choose useful actions on the right entity ids at the right time.
- Dynamic world model: mood shifts, conflict rebuilds, overdue penalties, and scenario drift events force adaptation over a trajectory.
- Trainable reward signal: dense step reward for learning plus bounded graders for evaluation.
- Hackathon fit: fully OpenEnv-packaged, hostable on HF Spaces, with reproducible training and visible before/after evidence.
## 1) Our Innovation
- The observation is a realistic text briefing, not a toy tabular state dump.
- Actions are schema-bound (`GhostexecAction`) and validated against live world ids.
- The world evolves after each step (conflict graph, stress, mood, time shifts).
- Drift events in scenario data test robustness to changing conditions.
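The schema-bound, id-validated action check described above can be sketched as follows. This is an illustrative stand-in, not the real `GhostexecAction` from `models.py` (which carries more fields such as `email_id` and `message_body`); the field name `entity_id` is a placeholder for whichever id a given action targets.

```python
from dataclasses import dataclass
from typing import Optional

# The eight action types listed in the environment contract below.
VALID_ACTION_TYPES = {
    "reply_email", "archive_email", "reschedule_meeting", "cancel_meeting",
    "complete_task", "delegate_task", "send_message", "do_nothing",
}

@dataclass
class Action:
    # Illustrative stand-in for GhostexecAction (defined in models.py).
    action_type: str
    entity_id: Optional[str] = None

def is_valid(action: Action, live_ids: set) -> bool:
    """Schema check plus liveness check against the current world's ids."""
    if action.action_type not in VALID_ACTION_TYPES:
        return False
    if action.entity_id is not None and action.entity_id not in live_ids:
        return False
    return True
```

The key point is the second check: an action can be schema-correct yet still invalid because it targets an entity id that no longer exists in the evolving world.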
### Task ladder
| Task ID | Difficulty | Scenario |
|---|---|---|
| `phase2_core` | easy | `scenarios/phase2_core.json` |
| `monday_morning` | medium | `scenarios/monday_morning.json` |
| `dinner_disaster` | hard | `scenarios/dinner_disaster.json` |
## 2) Overview
Ghostexec tells a familiar high-stakes story: too many urgent asks, not enough time, and every action has social + operational consequences.
The demo is easy to follow:
- show the same briefing the model sees,
- compare weak vs better action choice,
- show reward movement and policy behavior improvements.
## 3) Improvement in Rewards
The repo includes persisted training artifacts and plot outputs:
- `output/reward_curve.png`
- `output/loss_curve.png`
- `output/baseline_comparison.png`
### Training evidence plots

- Reward trend across training progression.
- SFT/GRPO training loss over optimization steps.
- Random vs frozen vs trained policy mean episode reward.
### Current before/after metrics (from saved artifacts)

| Metric | Baseline | Trained |
|---|---|---|
| Mean step reward | 0.145 | 0.257 |
| Invalid action rate | Not logged in saved artifacts | Not logged in saved artifacts |
| Grader score | Not logged in saved artifacts | Not logged in saved artifacts |
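The saved metrics imply a sizable relative gain in mean step reward; a quick check using the two values from the table:

```python
# Values copied from the before/after metrics table.
baseline, trained = 0.145, 0.257
relative_gain = (trained - baseline) / baseline
print(f"{relative_gain:.0%}")  # prints "77%"
```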
## 4) Reward and Training Pipeline
Ghostexec uses a coherent weighted reward core plus bounded shaping:
$$\text{weighted\_base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task}$$
Then applies structured adjustments (invalid-action penalties, do-nothing pressure, completion/catastrophic terms) with transparent breakdown fields.
Training is end-to-end and environment-connected (not static-only): SFT warm start, then GRPO with environment reward plus local shaping functions.
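The weighted core plus bounded shaping can be sketched in a few lines. The weights match the formula above; the penalty magnitudes for invalid actions and do-nothing pressure are assumptions for illustration, not the values used in `server/reward.py`.

```python
def weighted_base(conflict: float, relationship: float, task: float) -> float:
    # Weights from the formula above: 0.35 / 0.35 / 0.30.
    return 0.35 * conflict + 0.35 * relationship + 0.30 * task

def shaped_reward(base: float, *, action_valid: bool, did_nothing: bool) -> float:
    # Bounded structured adjustments; magnitudes here are assumed, not
    # copied from server/reward.py.
    reward = base
    if not action_valid:
        reward -= 0.5   # invalid-action penalty (assumed magnitude)
    if did_nothing:
        reward -= 0.1   # do-nothing pressure (assumed magnitude)
    return reward
```

In the real environment the per-term values also surface in the observation's reward-breakdown metadata, which is what makes the signal debuggable during training.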
## Quick Start
```bash
uv sync
uv run server --port 8000
```
Python client example:

```python
from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:400], "...")
    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body="Acknowledged. Sending concise revised update before noon.",
        )
    )
    print("reward:", step.reward)
```
## Reproducible Training Commands
```bash
uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60
```
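One plausible shape for a local shaping function in this setup is a per-completion scorer in the style TRL's `GRPOTrainer` accepts (a callable returning one float per completion). The actual shaping functions live in `scripts/train_sft_then_grpo.py` and may differ; this sketch only rewards structurally valid JSON actions, scaled like `--local-reward-scale`.

```python
import json

def json_validity_reward(completions, **kwargs):
    """Hypothetical local shaping: small bonus for completions that parse
    as a JSON object containing an action_type key, small penalty otherwise.
    The 0.35 scale mirrors the --local-reward-scale flag above."""
    rewards = []
    for text in completions:
        try:
            obj = json.loads(text)
            ok = isinstance(obj, dict) and "action_type" in obj
        except json.JSONDecodeError:
            ok = False
        rewards.append(0.35 if ok else -0.35)
    return rewards
```

Keeping local shaping bounded and small relative to the environment reward (scale 1.0) means it nudges the policy toward parseable actions early without dominating the task signal later.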
Generate post-train plots:
```bash
uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir output
```
## OpenEnv and Space Deployment
```bash
openenv serve
openenv build
openenv validate --verbose
openenv push
```
If needed:

```bash
openenv push --repo-id your-username/ghostexec
```
## Environment API and Contract
- Core endpoints: `/reset`, `/step`, `/state`, `/schema`, `/health`, `/docs`, `/ws`
- Observation contains:
  - `echoed_message` (plain-text briefing),
  - optional metadata (step validity, reward breakdown, ids).
- Action schema: see `GhostexecAction` in `models.py`.
Supported `action_type` values: `reply_email`, `archive_email`, `reschedule_meeting`, `cancel_meeting`, `complete_task`, `delegate_task`, `send_message`, `do_nothing`.
## Submission Readiness Checklist
- OpenEnv latest-compatible environment with valid `openenv.yaml`
- Public HF Space deployed and reachable
- Minimal trainable script using Unsloth + TRL
- Colab-ready notebook for reruns
- Training evidence plots embedded in README
- Add HF blog link → `spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md`
- Add YouTube demo link, under 2 minutes → `youtu.be/g4IFZMEzfO8`
## Repository Structure
```
ghostexec/
├── openenv.yaml
├── pyproject.toml
├── models.py
├── client.py
├── graders.py
├── scenarios/
├── scripts/
├── notebooks/
├── tests/
├── output/
└── server/
    ├── app.py
    ├── ghostexec_environment.py
    └── reward.py
```
## Additional References
- OpenEnv (Meta PyTorch)
- OpenEnv Packaging and Deploying Docs
- OpenEnv Hub
- Environment Innovation Deep-Dive
## License
BSD-style license as included in this repository and upstream OpenEnv lineage notices.