title: Ghostexec Environment Server
emoji: 📢
colorFrom: pink
colorTo: yellow
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv

Ghostexec: The AI Chief-of-Staff Environment

Ghostexec is an OpenEnv-compliant environment where an LLM acts as an executive chief-of-staff under pressure: triaging inbox crises, resolving calendar conflicts, protecting stakeholder relationships, and finishing critical tasks.

The agent gets a dense plain-text briefing, takes one structured action, and is scored on three coupled dimensions: conflict reduction, relationship quality, and task progress.

Submission Package

Item	Link
Public HF Space (required)	modelbuilderhq/ghostexec
OpenEnv manifest	`openenv.yaml`
Training notebook (Colab-ready)	`notebooks/ghostexec_unsloth_grpo_hf_api.ipynb`
Minimal training script (Unsloth + TRL)	`scripts/train_sft_then_grpo.py`
Mini-blog (required)	BLOG.md on Hugging Face
Demo video <2 minutes (required)	YouTube — Ghostexec demo

Why This Environment Is Competitive

Novel task composition: combines language-heavy triage, social reasoning, scheduling constraints, and deadline management in a single trainable loop.
Non-trivial behavior: valid JSON is necessary but not sufficient; the policy must choose useful actions on the right entity ids at the right time.
Dynamic world model: mood shifts, conflict rebuilds, overdue penalties, and scenario drift events force adaptation over a trajectory.
Trainable reward signal: dense step reward for learning plus bounded graders for evaluation.
Hackathon fit: fully OpenEnv-packaged, hostable on HF Spaces, with reproducible training and visible before/after evidence.

1) Our Innovation

The observation is a realistic text briefing, not a toy tabular state dump.
Actions are schema-bound (GhostexecAction) and validated against live world ids.
The world evolves after each step (conflict graph, stress, mood, time shifts).
Drift events in scenario data test robustness to changing conditions.
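
As a rough illustration of what "validated against live world ids" could mean, the sketch below checks an email action against the set of ids currently present in the world. `SketchAction`, `validate_against_world`, and the id set are hypothetical stand-ins, not the actual `GhostexecAction` model or the server's validation code.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SketchAction:
    # Hypothetical stand-in for GhostexecAction; only two fields are modeled.
    action_type: str
    email_id: Optional[str] = None


def validate_against_world(action: SketchAction, live_email_ids: Set[str]) -> Optional[str]:
    """Return an error string if the action references an unknown id, else None."""
    if action.action_type in {"reply_email", "archive_email"}:
        if action.email_id is None:
            return "missing email_id"
        if action.email_id not in live_email_ids:
            return f"unknown email_id: {action.email_id}"
    return None


live_ids = {"e01", "e02"}  # assumed ids, for illustration only
print(validate_against_world(SketchAction("reply_email", "e01"), live_ids))
print(validate_against_world(SketchAction("reply_email", "e99"), live_ids))
```

The point of the sketch is that a schema-valid action can still be rejected when its target id does not exist in the current world state.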

Task ladder

Task ID	Difficulty	Scenario
`phase2_core`	easy	`scenarios/phase2_core.json`
`monday_morning`	medium	`scenarios/monday_morning.json`
`dinner_disaster`	hard	`scenarios/dinner_disaster.json`
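
The ladder above can be mirrored in a small lookup table, for example when iterating over tasks in an evaluation script. This is only a sketch: the `TASK_LADDER` dictionary and `scenario_path` helper are hypothetical, and how the environment itself selects a scenario is not shown in this README.

```python
# Task ids, difficulties, and scenario paths copied from the task ladder table.
TASK_LADDER = {
    "phase2_core": ("easy", "scenarios/phase2_core.json"),
    "monday_morning": ("medium", "scenarios/monday_morning.json"),
    "dinner_disaster": ("hard", "scenarios/dinner_disaster.json"),
}


def scenario_path(task_id: str) -> str:
    """Return the scenario file for a task id, or raise KeyError with a hint."""
    try:
        return TASK_LADDER[task_id][1]
    except KeyError:
        known = ", ".join(sorted(TASK_LADDER))
        raise KeyError(f"unknown task id {task_id!r}; expected one of: {known}")


for task_id, (difficulty, path) in TASK_LADDER.items():
    print(f"{task_id:16s} {difficulty:7s} {path}")
```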

2) Overview

Ghostexec tells a familiar high-stakes story: too many urgent asks, not enough time, and every action has both social and operational consequences.

The demo is easy to follow:

show the same briefing the model sees,
compare a weak action choice with a better one,
show how the reward and the policy's behavior improve.

3) Improvement in Rewards

The repo includes persisted training artifacts and plot outputs:

output/reward_curve.png
output/loss_curve.png
output/baseline_comparison.png

Training evidence plots

Reward trend across training progression.

SFT/GRPO training loss over optimization steps.

Random vs frozen vs trained policy mean episode reward.

Current before/after metrics (from saved artifacts)

Metric	Baseline	Trained
Mean step reward	`0.145`	`0.257`
Invalid action rate	`Not logged in saved artifacts`	`Not logged in saved artifacts`
Grader score	`Not logged in saved artifacts`	`Not logged in saved artifacts`
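
To put the mean step reward row in perspective, the arithmetic below computes the absolute and relative change from the two values in the table. It uses only those two numbers; the helper name `improvement` is illustrative.

```python
def improvement(baseline: float, trained: float) -> tuple:
    """Return (absolute gain, relative gain as a fraction of the baseline)."""
    absolute = trained - baseline
    relative = absolute / baseline
    return absolute, relative


# Values taken from the "Mean step reward" row above.
absolute, relative = improvement(0.145, 0.257)
print(f"absolute gain: {absolute:.3f}")   # about 0.112
print(f"relative gain: {relative:.1%}")   # about 77.2%
```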

4) Reward and Training Pipeline

Ghostexec uses a coherent weighted reward core plus bounded shaping:

\[ \text{weighted\_base} = 0.35 \cdot \text{conflict} + 0.35 \cdot \text{relationship} + 0.30 \cdot \text{task} \]

Then applies structured adjustments (invalid-action penalties, do-nothing pressure, completion/catastrophic terms) with transparent breakdown fields.
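
A minimal sketch of the weighted base term follows, using the three weights from the formula. The function name and the example component scores are assumptions; the structured adjustments are not modeled because their exact values are not given here.

```python
# Weights copied from the weighted_base formula.
WEIGHTS = {"conflict": 0.35, "relationship": 0.35, "task": 0.30}


def weighted_base(conflict: float, relationship: float, task: float) -> float:
    """Combine the three component scores into the weighted base reward."""
    return (
        WEIGHTS["conflict"] * conflict
        + WEIGHTS["relationship"] * relationship
        + WEIGHTS["task"] * task
    )


# Illustrative component scores, not taken from a real episode.
print(round(weighted_base(conflict=0.5, relationship=0.4, task=0.2), 3))
```

Because the weights sum to 1.0, equal component scores pass through unchanged: three scores of 0.6 give a base of about 0.6.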

Training is end-to-end and environment-connected (not static-only): SFT warm start, then GRPO with environment reward plus local shaping functions.
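
One plausible way to mix the environment reward with local shaping during GRPO is a scaled sum. The sketch below assumes that form, with defaults matching the `--env-reward-scale 1.0` and `--local-reward-scale 0.35` flags from the training command. The actual combination in `scripts/train_sft_then_grpo.py` may differ, and `combined_reward` is a hypothetical name.

```python
def combined_reward(
    env_reward: float,
    local_reward: float,
    env_scale: float = 1.0,     # assumed to correspond to --env-reward-scale
    local_scale: float = 0.35,  # assumed to correspond to --local-reward-scale
) -> float:
    """Scaled sum of environment reward and local shaping (assumed form)."""
    return env_scale * env_reward + local_scale * local_reward


# Illustrative values: an environment step reward plus a local shaping
# score, such as a bonus for well-formed output.
print(round(combined_reward(env_reward=0.2, local_reward=1.0), 3))
```

With these scales the environment reward dominates, and local shaping acts as a smaller auxiliary signal.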

Quick Start

uv sync
uv run server --port 8000

Python client example:

from ghostexec import GhostexecAction, GhostexecEnv

with GhostexecEnv(base_url="http://127.0.0.1:8000") as env:
    out = env.reset()
    print(out.observation.echoed_message[:400], "...")

    step = env.step(
        GhostexecAction(
            action_type="reply_email",
            email_id="e01",
            message_body="Acknowledged. Sending concise revised update before noon.",
        )
    )
    print("reward:", step.reward)

Reproducible Training Commands

uv run python scripts/train_sft_then_grpo.py \
  --model-preset small_iter_fast \
  --training-preset hackathon_turbo \
  --env-url http://127.0.0.1:8000 \
  --generate-sft-from-env \
  --sft-samples 120 \
  --max-sft-steps 60 \
  --max-grpo-steps 120 \
  --env-reward-scale 1.0 \
  --local-reward-scale 0.35 \
  --complexity-curriculum easy_to_full \
  --curriculum-ramp-ratio 0.60

Generate post-train plots:

uv run python scripts/plot_training_report.py \
  --trainer-history outputs/trainer_state.json \
  --reward-csv outputs/reward_log.csv \
  --baselines-json outputs/compliance_manifest.json \
  --out-dir output

OpenEnv and Space Deployment

openenv serve
openenv build
openenv validate --verbose
openenv push

If needed:

openenv push --repo-id your-username/ghostexec

Environment API and Contract

Core endpoints: /reset, /step, /state, /schema, /health, /docs, /ws
Observation contains:
- echoed_message (plain-text briefing),
- optional metadata (step validity, reward breakdown, ids).
Action schema: see GhostexecAction in models.py.

Supported action_type values:

reply_email
archive_email
reschedule_meeting
cancel_meeting
complete_task
delegate_task
send_message
do_nothing
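
Since valid JSON is necessary but not sufficient, a client may want to reject malformed model output before calling `/step`. The sketch below parses a raw completion and checks `action_type` against the list above. `parse_action` is a hypothetical helper, not part of the Ghostexec package, and it does not check ids or other fields.

```python
import json
from typing import Optional, Tuple

# The eight action types listed above.
SUPPORTED_ACTION_TYPES = frozenset({
    "reply_email", "archive_email", "reschedule_meeting", "cancel_meeting",
    "complete_task", "delegate_task", "send_message", "do_nothing",
})


def parse_action(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (payload, None) on success or (None, error message) on failure."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(payload, dict):
        return None, "action must be a JSON object"
    action_type = payload.get("action_type")
    if not isinstance(action_type, str) or action_type not in SUPPORTED_ACTION_TYPES:
        return None, f"unsupported action_type: {action_type!r}"
    return payload, None


print(parse_action('{"action_type": "archive_email", "email_id": "e01"}'))
print(parse_action('{"action_type": "delete_inbox"}'))
```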

Submission Readiness Checklist

Environment compatible with the latest OpenEnv release, with a valid openenv.yaml
Public HF Space deployed and reachable
Minimal trainable script using Unsloth + TRL
Colab-ready notebook for reruns
Training evidence plots embedded in README
Add HF blog link — spaces/modelbuilderhq/ghostexec/blob/main/BLOG.md
Add <2 minute YouTube demo link — youtu.be/g4IFZMEzfO8

Repository Structure

ghostexec/
├── openenv.yaml
├── pyproject.toml
├── models.py
├── client.py
├── graders.py
├── scenarios/
├── scripts/
├── notebooks/
├── tests/
├── output/
└── server/
    ├── app.py
    ├── ghostexec_environment.py
    └── reward.py

License

Released under a BSD-style license, as included in this repository together with the upstream OpenEnv lineage notices.