Mrkumar007 committed
Commit a49c996 · verified · 1 Parent(s): 4e8be23

Upload folder using huggingface_hub
Dockerfile ADDED
@@ -0,0 +1,81 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ # Multi-stage build using openenv-base
+ # This Dockerfile is flexible and works for both:
+ # - In-repo environments (with local OpenEnv sources)
+ # - Standalone environments (with openenv from PyPI/Git)
+ # The build script (openenv build) handles context detection and sets appropriate build args.
+
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ # Ensure git is available (required for installing dependencies from VCS)
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Build argument to control whether we're building standalone or in-repo
+ ARG BUILD_MODE=in-repo
+ ARG ENV_NAME=cloud_queue_env
+
+ # Copy environment code (always at root of build context)
+ COPY . /app/env
+
+ # For in-repo builds, openenv is already vendored in the build context
+ # For standalone builds, openenv will be installed via pyproject.toml
+ WORKDIR /app/env
+
+ # Ensure uv is available (for local builds where the base image lacks it)
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ # Install dependencies using uv sync
+ # If uv.lock exists, use it; otherwise resolve on the fly
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-install-project --no-editable; \
+     else \
+         uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-editable; \
+     else \
+         uv sync --no-editable; \
+     fi
+
+ # Final runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy the virtual environment from builder
+ COPY --from=builder /app/env/.venv /app/.venv
+
+ # Copy the environment code
+ COPY --from=builder /app/env /app/env
+
+ # Set PATH to use the virtual environment
+ ENV PATH="/app/.venv/bin:$PATH"
+
+ # Set PYTHONPATH so imports work correctly
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # Run the FastAPI server
+ # The module path is constructed to work with the /app/env structure
+ ENV ENABLE_WEB_INTERFACE=true
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
HIGH_SEVERITY_ANALYSIS.md ADDED
@@ -0,0 +1,63 @@
+ # Cloud Queue Env - High Severity Analysis (Updated)
+
+ Date: 2026-04-12
+
+ This note captures the two highest-impact issues still present in the environment logic.
+
+ ## 1) Arrival Modeling and Arrival Metrics Mismatch
+
+ Files and lines:
+ - cloud_queue_env/server/cloud_queue_env_environment.py:240
+ - cloud_queue_env/server/cloud_queue_env_environment.py:241
+ - cloud_queue_env/server/cloud_queue_env_environment.py:248
+ - cloud_queue_env/server/cloud_queue_env_environment.py:259
+
+ What happens now:
+ - The simulator samples Poisson arrivals each step.
+ - If the sampled arrival count is greater than 1, the code still creates only one incoming job object.
+ - The arrivals metric is incremented by 1.0, not by the sampled arrival count.
+
+ Why this is high severity:
+ - Burst behavior is compressed into a single-event stream, so load spikes are underrepresented.
+ - Several business metrics and grader components become biased (rejections, abandonment, SLA pressure).
+ - Policy rankings can drift because the environment under-penalizes burst scenarios.
+
+ Impact on benchmark credibility:
+ - High. This directly affects realism, fairness of grading, and reproducibility claims.
+
+ Recommended fix direction:
+ - Track all sampled arrivals each step.
+ - Either queue all arrivals or maintain an explicit backlog of pending incoming jobs.
+ - Increment the arrivals metric by the true sampled count.
+
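The fix direction above can be sketched in a few lines. This is an illustration only, not the environment's actual code: `pending`, `arrivals_total`, and the job dict are hypothetical names, and the Poisson count is drawn with Knuth's algorithm so the sketch stays self-contained.

```python
import math
import random

def step_arrivals(rng: random.Random, lam: float, pending: list, arrivals_total: float) -> float:
    """Queue every sampled arrival and count all of them (hypothetical names)."""
    # Sample a Poisson(lam) count via Knuth's algorithm (deterministic given rng state).
    threshold = math.exp(-lam)
    n, p = 0, 1.0
    while True:
        n += 1
        p *= rng.random()
        if p <= threshold:
            break
    n -= 1
    # Create one job object per sampled arrival instead of at most one.
    for _ in range(n):
        pending.append({"size": 1})
    # Increment the arrivals metric by the true sampled count, not a flat 1.0.
    return arrivals_total + float(n)
```

With a seeded `rng`, the same seed and step sequence reproduces the same arrival trace, which is what the determinism requirements elsewhere in this repo demand.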
+ ## 2) Agent Dispatch Control Is Partially Bypassed by Autodispatch
+
+ Files and lines:
+ - cloud_queue_env/server/cloud_queue_env_environment.py:353
+ - cloud_queue_env/server/cloud_queue_env_environment.py:391
+ - cloud_queue_env/server/cloud_queue_env_environment.py:738
+
+ What happens now:
+ - The agent may choose an action that is not dispatch.
+ - After the action is applied, the environment still runs autodispatch and moves work to idle servers.
+
+ Why this is high severity:
+ - It weakens action-to-outcome causality for dispatch decisions.
+ - A policy can look better than it should because server assignment still happens automatically.
+ - It reduces benchmark difficulty in exactly the control surface the task is evaluating.
+
+ Impact on benchmark credibility:
+ - High. This can alter policy comparisons and invalidate assumptions about explicit control.
+
+ Recommended fix direction:
+ - Make dispatch behavior explicit by mode:
+   - strict-control mode: only the agent dispatches.
+   - assisted mode: autodispatch stays on, but document this clearly and score accordingly.
+ - Keep one consistent mode for official benchmark scoring.
+
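A mode gate like the following would make the dispatch authority explicit. The class and field names are illustrative, not the environment's API:

```python
from dataclasses import dataclass, field

@dataclass
class DispatchController:
    """Illustrative dispatch-authority gate (names are hypothetical)."""
    mode: str = "strict-control"  # or "assisted"
    dispatched: list = field(default_factory=list)

    def apply(self, action_type: str, queue: list, idle_servers: list) -> None:
        # An explicit agent dispatch works in both modes.
        if action_type == "dispatch" and queue and idle_servers:
            self.dispatched.append((queue.pop(0), idle_servers.pop(0)))
        # Autodispatch runs only in assisted mode; strict-control leaves
        # undispatched work queued, so action-to-outcome causality holds.
        if self.mode == "assisted":
            while queue and idle_servers:
                self.dispatched.append((queue.pop(0), idle_servers.pop(0)))
```

Pinning the official scoring runs to one `mode` value keeps policy comparisons on equal footing.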
+ ## Priority Summary
+
+ 1. Fix arrival accounting and multi-arrival handling first.
+ 2. Fix dispatch authority semantics second.
+
+ Both should be addressed before claiming benchmark-grade reliability.
IMPLEMENTATION_ROADMAP.md ADDED
@@ -0,0 +1,272 @@
+ # QueueOps OpenEnv Implementation Roadmap
+
+ This file is the execution reference for building and iterating the queue operations environment.
+
+ Scope constraints:
+ - Keep the current repository structure unchanged.
+ - Use cloud_queue_env as the project root.
+ - Follow OpenEnv compliance strictly.
+ - Provide deterministic graders with partial scores in [0, 1].
+ - Keep at least 3 benchmark tasks (easy, medium, hard).
+
+ ---
+
+ ## V1 - MVP Submission Build
+
+ Goal: ship a complete, valid benchmark that can be submitted.
+
+ ### Phase 1 - Environment Core
+ Sub-goals:
+ 1. Replace template echo behavior with queue simulator dynamics.
+ 2. Implement deterministic state transitions using explicit seeds.
+ 3. Implement terminal conditions with fixed task horizons.
+ 4. Keep the OpenEnv contract: reset, step, state.
+
+ Exit criteria:
+ 1. reset/step/state are stable and deterministic for a fixed seed + fixed action trace.
+ 2. Episodes terminate correctly.
+
+ ### Phase 2 - Task Pack (Easy/Medium/Hard)
+ Sub-goals:
+ 1. Add a task selector and fixed per-task configs.
+ 2. Easy: single queue with admission/dispatch control.
+ 3. Medium: multi-server with class-aware routing.
+ 4. Hard: two-stage queue network with scaling decisions.
+
+ Exit criteria:
+ 1. All three tasks run end-to-end.
+ 2. Difficulty progression is visible from easy to hard.
+
+ ### Phase 3 - Deterministic Graders
+ Sub-goals:
+ 1. Implement per-task score equations with partial credit.
+ 2. Clamp all task scores to [0, 1].
+ 3. Handle edge cases (NaN/Inf/missing metrics) safely.
+ 4. Add a final aggregate score across tasks.
+
+ Exit criteria:
+ 1. The same seeds and same actions always produce the same score.
+ 2. Scores are interpretable and bounded.
+
+ ### Phase 4 - Reward Shaping
+ Sub-goals:
+ 1. Add dense multi-component rewards (wait, throughput, SLA, cost, fairness, safety).
+ 2. Penalize invalid and exploit-like behavior.
+ 3. Keep the reward scale bounded and stable.
+ 4. Expose the component breakdown in metadata/info.
+
+ Exit criteria:
+ 1. Reward changes across the trajectory (not terminal-only).
+ 2. Unsafe behavior is consistently penalized.
+
+ ### Phase 5 - Inference Runner
+ Sub-goals:
+ 1. Run all benchmark tasks with fixed seeds.
+ 2. Use an OpenAI-compatible client with provider credentials from env variables.
+ 3. Emit [START], [STEP], [END] logs and a final [SUMMARY].
+ 4. Keep runs reproducible (fixed model params).
+
+ Exit criteria:
+ 1. The end-to-end benchmark run works locally and on the deployed runtime.
+ 2. Output format is submission-ready.
+
+ ### Phase 6 - Validation and Docs
+ Sub-goals:
+ 1. Pass openenv validate.
+ 2. Ensure the Docker build/run path works.
+ 3. Update the README with task, reward, grading, and baseline usage.
+ 4. Add a sample benchmark output snippet for evidence.
+
+ Exit criteria:
+ 1. Validation passes.
+ 2. The README is complete for judges and users.
+
+ ### V1 Submission Gate
+ All items must be true:
+ 1. Three tasks implemented and deterministic.
+ 2. Graders produce valid partial scores in [0, 1].
+ 3. The inference script runs all tasks and reports a summary.
+ 4. OpenEnv validation passes.
+ 5. The deployment path is functional.
+
+ ---
+
+ ## V2 - Robustness and Quality Upgrade
+
+ Goal: improve reliability, calibration, and benchmark trustworthiness.
+
+ ### Phase 1 - Determinism Hardening
+ Sub-goals:
+ 1. Separate RNG streams for arrivals/service/abandonment/shocks.
+ 2. Add a replay trace mode for debugging.
+ 3. Add deterministic episode metadata for audits.
+
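Separate RNG streams can be derived with the standard library alone, as in this sketch. The stream names and derivation scheme are assumptions; the real implementation might instead use numpy's `SeedSequence.spawn`:

```python
import random

def make_streams(seed: int) -> dict:
    """One independent RNG per stochastic process, derived from the episode seed."""
    names = ("arrivals", "service", "abandonment", "shocks")
    # String seeds are hashed deterministically by random.seed, so each
    # stream is reproducible across runs and decoupled from the others:
    # consuming arrival randomness never shifts the service-time sequence.
    return {name: random.Random(f"{seed}:{name}") for name in names}
```

The point of the split is auditability: a replayed episode can verify each process's draw sequence independently.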
+ ### Phase 2 - Difficulty Calibration
+ Sub-goals:
+ 1. Tune easy/medium/hard parameter separation.
+ 2. Improve anti-exploit balancing (reject-all, noop loops, over-scaling).
+ 3. Re-check reward and grade alignment across seeds.
+
+ ### Phase 3 - Reporting Upgrade
+ Sub-goals:
+ 1. Add a per-seed result table.
+ 2. Add mean/std and confidence summaries.
+ 3. Add failure/invalid-action diagnostics in the summary.
+
+ ### V2 Exit Criteria
+ 1. Lower variance for fixed seed sets.
+ 2. Clearer task progression and fairer scoring.
+ 3. Better debugging and reproducibility outputs.
+
+ ---
+
+ ## V3 - Extended Benchmark Pack
+
+ Goal: increase novelty and long-term benchmark value.
+
+ ### Phase 1 - Optional Task D
+ Sub-goals:
+ 1. Add stronger non-stationary demand patterns.
+ 2. Grade robustness to bursts and demand shifts.
+
+ ### Phase 2 - Optional Task E
+ Sub-goals:
+ 1. Add partial observability/noisy delayed metrics.
+ 2. Grade safe decision-making under uncertainty.
+
+ ### Phase 3 - Public Benchmarking Bundle
+ Sub-goals:
+ 1. Publish official seed suites and profiles (quick/standard/full).
+ 2. Provide reference baseline runs.
+ 3. Provide reproducibility notes for external users.
+
+ ### V3 Exit Criteria
+ 1. Four or more tasks available.
+ 2. Stronger novelty and benchmark coverage.
+ 3. Cleaner external benchmarking workflow.
+
+ ---
+
+ ## Recommended Execution Order
+
+ 1. Complete V1 and submit.
+ 2. Upgrade to V2 for reliability and scoring quality.
+ 3. Add V3 only if the timeline permits.
+
+ ## Current Status Snapshot
+
+ 1. The V1 core implementation is in place and running.
+ 2. openenv validate has passed.
+ 3. V2 determinism hardening, the calibration pass, and the reporting upgrade are implemented.
+ 4. Current focus shifts to V3 extensions and benchmark quality tuning.
+
+ ## V2 Completion Notes
+
+ Implemented outcomes:
+ 1. Separate RNG streams are active for arrivals, service, abandonment, and exogenous effects.
+ 2. Deterministic trace metadata is exposed (`trace_digest`, `seed`, and RNG stream seeds).
+ 3. Anti-exploit reward calibration includes rejection-heavy and harmful-downscale penalties.
+ 4. Inference supports multi-seed reporting with mean/std/ci95 outputs.
+ 5. Inference supports replay-mode action traces via file input for deterministic debugging.
+ 6. Inference supports JSON/CSV report export for per-seed analysis.
+
+ ---
+
+ ## Requirement Coverage Matrix (From requirementInfo.md)
+
+ This section is the final compliance tracker for judging criteria.
+
+ ### Functional Requirements
+
+ 1. Real-world task simulation
+    - Requirement: must represent real human operational work, not toy behavior.
+    - Implementation target: queue operations in a call-center/cloud/logistics-style flow.
+    - Evidence to keep: README motivation + task descriptions + action semantics.
+    - Status: in progress (core done; examples and narrative should be strengthened).
+
+ 2. OpenEnv spec compliance
+    - Requirement: typed models, reset, step(action), state, openenv.yaml, validate pass.
+    - Implementation target: models.py + server environment + openenv.yaml + app entrypoint.
+    - Evidence to keep: `openenv validate` output in PR notes/README.
+    - Status: done (validate passing).
+
+ 3. Minimum 3 tasks with deterministic graders
+    - Requirement: at least easy/medium/hard, deterministic 0.0-1.0 grading.
+    - Implementation target: task configs + per-task scoring formulas + clamping.
+    - Evidence to keep: a sample run showing all tasks and deterministic seeds.
+    - Status: done for 3 tasks; polish recommended for calibration.
+
+ 4. Meaningful reward function
+    - Requirement: dense trajectory signal + penalties for undesirable behavior.
+    - Implementation target: weighted reward components and safety penalties.
+    - Evidence to keep: reward component logging in metadata and README equations.
+    - Status: done; tune weights in V2.
+
+ 5. Baseline inference script
+    - Requirement: OpenAI-compatible client, env-var credentials, reproducible score over tasks.
+    - Implementation target: fixed tasks/seeds/model params, required log format.
+    - Evidence to keep: saved run logs and summary scores.
+    - Status: done; provider-fallback robustness can be improved.
+
+ ### Non-Functional Requirements
+
+ 1. Hugging Face Space deployment
+    - Requirement: containerized HF Space tagged openenv.
+    - Evidence to keep: Space URL + successful run proof.
+    - Status: done.
+
+ 2. Containerized execution
+    - Requirement: Dockerfile works with build + run.
+    - Evidence to keep: commands and a successful output snippet.
+    - Status: pending explicit evidence capture in docs.
+
+ 3. Documentation completeness
+    - Requirement: README includes env motivation, spaces, tasks, setup/usage, baseline scores.
+    - Evidence to keep: README sections + benchmark output table.
+    - Status: mostly done; baseline score table still needed.
+
+ ---
+
+ ## Evaluation Criteria Coverage Checklist
+
+ ### Real-world utility (30%)
+ 1. Keep README examples tied to concrete real-operations scenarios.
+ 2. Add one paragraph on why this benchmark is useful for agent evaluation.
+
+ ### Task and grader quality (25%)
+ 1. Keep the deterministic seed set fixed and documented.
+ 2. Show per-task scoring decomposition and bounded outputs.
+ 3. Add one reproducibility check note: same seed + same policy => same score.
+
+ ### Environment design (20%)
+ 1. Verify clean reset and sensible done boundaries for all tasks.
+ 2. Keep the action/observation schema stable and documented.
+ 3. Keep dense reward with interpretable components.
+
+ ### Code quality and spec compliance (15%)
+ 1. Keep `openenv validate` passing.
+ 2. Capture docker build/run commands and outcomes.
+ 3. Keep the deployment and ws route functional.
+
+ ### Creativity and novelty (10%)
+ 1. Emphasize queue-control benchmark novelty in the README.
+ 2. Keep the multi-objective reward and cost/fairness tradeoff visible.
+
+ ---
+
+ ## Pre-Submission Evidence Pack (Must Attach)
+
+ 1. Validation proof
+    - `openenv validate` success output.
+
+ 2. Runtime proof
+    - HF Space URL and one successful task run excerpt.
+
+ 3. Baseline proof
+    - One full [START]/[STEP]/[END]/[SUMMARY] run log.
+
+ 4. Docker proof
+    - `docker build` and `docker run` command results.
+
+ 5. Documentation proof
+    - README includes a baseline score table (easy, medium, hard, final).
README.md CHANGED
@@ -1,10 +1,369 @@
  ---
- title: Cloud Queue Env
- emoji: 😻
- colorFrom: red
- colorTo: purple
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Cloud Queue Env Environment Server
+ emoji: 🖨️
+ colorFrom: pink
+ colorTo: blue
  sdk: docker
  pinned: false
+ app_port: 8000
+ base_path: /web
+ tags:
+   - openenv
  ---

+ # Cloud Queue Env Environment
+
+ A real-world queue-operations benchmark for OpenEnv.
+
+ This environment simulates the service-operations decisions humans make in production systems:
+ - Admission and rejection under load
+ - Queue routing and dispatching
+ - Priority handling for urgent traffic
+ - Capacity scaling under infrastructure cost constraints
+
+ The benchmark includes three deterministic tasks with partial graders in [0, 1]:
+ - easy: single-queue stability
+ - medium: multi-server priority routing
+ - hard: two-stage queue network with scaling
+
+ ## Quick Start
+
+ Use the CloudQueueEnv client to connect to a running server or container:
+
+ ```python
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
+
+ env = CloudQueueEnv.from_docker_image("cloud_queue_env-env:latest")
+ try:
+     # Configure task + seed, then reset into that deterministic episode
+     env.reset()
+     env.step(CloudQueueAction(action_type="configure_task", task_id="easy", seed=11))
+     result = env.reset()
+
+     for _ in range(20):
+         obs = result.observation
+         if obs.incoming_job_present:
+             action = CloudQueueAction(action_type="admit", target_queue=0)
+         else:
+             action = CloudQueueAction(action_type="dispatch", target_queue=0)
+
+         result = env.step(action)
+         print(
+             f"step={obs.sim_time} queues={obs.queue_lengths} "
+             f"reward={result.reward:.3f} done={result.done}"
+         )
+         if result.done:
+             break
+
+     final_score = result.observation.metadata.get("episode_score", 0.0)
+     print(f"episode_score={final_score:.3f}")
+ finally:
+     env.close()
+ ```
+
+ The CloudQueueEnv.from_docker_image() method handles:
+ - Starting the Docker container
+ - Waiting for the server to be ready
+ - Connecting to the environment
+ - Container cleanup when you call `close()`
+
+ ## Building the Docker Image
+
+ Before using the environment, build the Docker image:
+
+ ```bash
+ # From project root
+ docker build -t cloud_queue_env-env:latest -f server/Dockerfile .
+ ```
+
+ ## Deploying to Hugging Face Spaces
+
+ You can deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
+
+ ```bash
+ # From the environment directory (where openenv.yaml is located)
+ openenv push
+
+ # Or specify options
+ openenv push --namespace my-org --private
+ ```
+
+ The `openenv push` command will:
+ 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
+ 2. Prepare a custom build for a Hugging Face Docker Space (enables the web interface)
+ 3. Upload to Hugging Face (ensuring you're logged in)
+
+ ### Prerequisites
+
+ - Authenticate with Hugging Face: the command will prompt for login if you are not already authenticated
+
+ ### Options
+
+ - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to the current directory)
+ - `--repo-id`, `-r`: Repository ID in the format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
+ - `--base-image`, `-b`: Base Docker image to use (overrides the Dockerfile FROM)
+ - `--private`: Deploy the Space as private (default: public)
+
+ ### Examples
+
+ ```bash
+ # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
+ openenv push
+
+ # Push to a specific repository
+ openenv push --repo-id my-org/my-env
+
+ # Push with a custom base image
+ openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
+
+ # Push as a private space
+ openenv push --private
+
+ # Combine options
+ openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
+ ```
+
+ After deployment, your Space will be available at:
+ `https://huggingface.co/spaces/<repo-id>`
+
+ The deployed Space includes:
+ - **Web Interface** at `/web` - Interactive UI for exploring the environment
+ - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
+ - **Health Check** at `/health` - Container health monitoring
+ - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
+
+ ## Environment Details
+
+ ### Action
+ CloudQueueAction fields:
+ - action_type: one of configure_task, admit, reject, route, dispatch, scale, reprioritize, noop
+ - target_queue: queue index for route/dispatch/admit
+ - target_server: optional server index
+ - scale_delta: server delta for the scale action
+ - new_priority: new priority value for reprioritize
+ - task_id: easy/medium/hard (used with configure_task)
+ - seed: deterministic task seed (used with configure_task)
+
+ ### Observation
+ CloudQueueObservation includes:
+ - task_id, sim_time, horizon
+ - queue_lengths, queue_wait_ema
+ - server_busy, server_remaining_service, utilization
+ - incoming_job_present, incoming_job_size, incoming_job_priority, incoming_job_deadline, incoming_job_type
+ - sla_violation_rate, abandonment_rate, throughput_recent, energy_cost_rate
+ - level, optional_history, action_mask
+ - reward, done, metadata
+
+ ### Reward
+ Per-step reward is dense and multi-objective:
+
+ $$
+ r_t = 0.35R_{wait} + 0.20R_{throughput} + 0.20R_{sla} + 0.15R_{cost} + 0.05R_{fair} + 0.05R_{safe}
+ $$
+
+ Properties:
+ - Partial progress signal over the full trajectory
+ - Penalties for invalid actions and unsafe/noop behavior under congestion
+ - Bounded reward values for stability
+
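The weighted sum above can be computed as in this sketch. The component dict keys and the clamp range are assumptions for illustration, not the environment's exact code:

```python
def step_reward(components: dict) -> float:
    """Combine per-step reward components with the weights from the equation above."""
    weights = {"wait": 0.35, "throughput": 0.20, "sla": 0.20,
               "cost": 0.15, "fair": 0.05, "safe": 0.05}
    # Missing components contribute zero; weights sum to 1.0, so reward
    # stays in [-1, 1] whenever each component is in that range.
    r = sum(w * components.get(name, 0.0) for name, w in weights.items())
    return max(-1.0, min(1.0, r))  # keep the per-step reward bounded
```

Because the weights sum to 1, a policy that is perfect on every component earns exactly 1.0 per step.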
+ ### Deterministic Graders
+ Each task returns a deterministic episode_score in [0, 1], stored in observation metadata.
+
+ - easy score uses avg wait, throughput, rejection rate, and SLA violations
+ - medium score uses urgent/normal p95 waits, urgent SLA, throughput, and action cost
+ - hard score uses end-to-end p95, abandonment, SLA, throughput, infra cost, and fairness gap
+
+ If the invalid-action rate exceeds a threshold, the score is capped.
+
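The [0, 1] bound plus the NaN/Inf edge-case handling called for by the roadmap fits in one small helper. This is a sketch of that rule, not the shipped grader code:

```python
import math

def clamp_score(raw: float) -> float:
    """Map a raw grader value into [0, 1]; NaN and ±Inf collapse to 0.0."""
    # math.isfinite rejects NaN as well as both infinities, so a degraded
    # metric can never leak an unbounded or undefined score.
    if not math.isfinite(raw):
        return 0.0
    return max(0.0, min(1.0, raw))
```

Applying this at the very end of each task's score equation guarantees the bounded, deterministic output the graders promise.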
+ ## Tasks
+
+ 1. easy (single-queue stability)
+    - one queue, one server
+    - objective: low wait with acceptable throughput and low rejection
+
+ 2. medium (priority routing)
+    - two queues and multiple servers
+    - objective: protect urgent traffic while maintaining total performance
+
+ 3. hard (queue network + scaling)
+    - two-stage queue network with bursty arrivals and heavy-tailed service times
+    - objective: balance latency/SLA/abandonment against infra cost and fairness
+
+ ## Baseline Inference
+
+ Run baseline inference across easy/medium/hard:
+
+ ```bash
+ API_KEY=your_provider_key python inference.py
+ ```
+
+ Optional variables:
+ - API_KEY (OpenAI-compatible provider key for model calls)
+ - API_BASE_URL (default: https://router.huggingface.co/v1)
+ - MODEL_NAME (default: Qwen/Qwen2.5-72B-Instruct)
+ - BASE_URL (if using a deployed Space)
+ - IMAGE_NAME (if launching a local Docker image)
+ - USE_HEURISTIC_ONLY (true/false)
+ - DISABLE_MODEL_ON_FIRST_ERROR (true/false)
+ - MAX_STEPS_OVERRIDE (integer quick-test cap)
+ - TASK_SEEDS_JSON (JSON map for multi-seed runs)
+ - ACTION_TRACE_FILE (JSON replay file keyed by task:seed)
+ - REPORT_JSON_PATH (write seed/task report JSON)
+ - REPORT_CSV_PATH (write per-seed report CSV)
+
+ Output includes the required line types:
+ - [START]
+ - [STEP]
+ - [END]
+
+ And a final aggregate summary:
+ - [SUMMARY] easy=<...> medium=<...> hard=<...> final=<...>
+
+ V2 reporting also includes:
+ - [REPORT_SEED] task=<task_id> seed=<seed> score=<score> steps=<n> trace=<digest>
+ - [REPORT] task=<task_id> seeds=<n> mean=<score> std=<score> ci95=<score>
+
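The per-task aggregates in the [REPORT] line can be reproduced from per-seed scores as below. Note this assumes `ci95` is a normal-approximation half-width; the script's actual convention may differ:

```python
import statistics

def summarize(scores: list) -> dict:
    """Aggregate per-seed scores into the [REPORT] line's mean/std/ci95 fields."""
    mean = statistics.fmean(scores)
    # Sample standard deviation needs at least two seeds; a single-seed
    # run reports zero spread rather than raising.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    ci95 = 1.96 * std / len(scores) ** 0.5  # normal-approximation half-width
    return {"seeds": len(scores), "mean": mean, "std": std, "ci95": ci95}
```

For example, `summarize([0.0, 1.0])` reports a mean of 0.5 with a wide interval, flagging that two seeds are not enough for a stable estimate.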
228
+ ## Baseline Scores
229
+
230
+ Current reproducible heuristic-only baseline (deployed runtime, single seed per task):
231
+
232
+ | Task | Seed Count | Mean Score |
233
+ |---|---:|---:|
234
+ | easy | 1 | 0.000 |
235
+ | medium | 1 | 0.000 |
236
+ | hard | 1 | 0.000 |
237
+ | final (mean of task means) | - | 0.000 |
238
+
239
+ Notes:
240
+ - These values are from heuristic fallback mode and are expected to be low.
241
+ - Model-based scores depend on provider/model availability and should be recorded from a successful funded run.
242
+ - Keep this table updated with your latest official benchmark run before final submission.
243
+
244
+ ## Advanced Usage
245
+
246
+ ### Connecting to an Existing Server
247
+
248
+ If you already have a Cloud Queue Env environment server running, you can connect directly:
249
+
250
+ ```python
251
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
252
+
253
+ # Connect to existing server
254
+ cloud_queue_envenv = CloudQueueEnv(base_url="<ENV_HTTP_URL_HERE>")
255
+
256
+ # Use as normal
257
+ result = cloud_queue_envenv.reset()
258
+ result = cloud_queue_envenv.step(CloudQueueAction(action_type="dispatch", target_queue=0))
259
+ ```
260
+
261
+ Note: When connecting to an existing server, `cloud_queue_envenv.close()` will NOT stop the server.
262
+
263
+ ### Using the Context Manager
264
+
265
+ The client supports context manager usage for automatic connection management:
266
+
267
+ ```python
268
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
269
+
270
+ # Connect with context manager (auto-connects and closes)
271
+ with CloudQueueEnv(base_url="http://localhost:8000") as env:
272
+ result = env.reset()
273
+ print(f"Initial queues: {result.observation.queue_lengths}")
274
+ # Multiple steps with low latency
275
+ for _ in range(10):
276
+ result = env.step(CloudQueueAction(action_type="noop"))
277
+ print(f"Reward: {result.reward:.3f}")
278
+ ```
279
+
280
+ The client uses WebSocket connections for:
281
+ - **Lower latency**: No HTTP connection overhead per request
282
+ - **Persistent session**: Server maintains your environment state
283
+ - **Efficient for episodes**: Better for many sequential steps
284
+
285
+ ### Concurrent WebSocket Sessions
286
+
287
+ The server supports multiple concurrent WebSocket connections. To enable this,
288
+ modify `server/app.py` to use factory mode:
289
+
290
+ ```python
291
+ # In server/app.py - use factory mode for concurrent sessions
292
+ app = create_app(
293
+ CloudQueueEnvironment, # Pass class, not instance
294
+ CloudQueueAction,
295
+ CloudQueueObservation,
296
+ max_concurrent_envs=4, # Allow 4 concurrent sessions
297
+ )
298
+ ```
299
+
300
+ Then multiple clients can connect simultaneously:
301
+
302
+ ```python
303
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv
304
+ from concurrent.futures import ThreadPoolExecutor
305
+
306
+ def run_episode(client_id: int):
307
+ with CloudQueueEnv(base_url="http://localhost:8000") as env:
308
+ result = env.reset()
309
+ for i in range(10):
310
+ result = env.step(CloudQueueAction(action_type="dispatch", target_queue=i % 2))
311
+ return client_id, result.observation.queue_lengths
312
+
313
+ # Run 4 episodes concurrently
314
+ with ThreadPoolExecutor(max_workers=4) as executor:
315
+ results = list(executor.map(run_episode, range(4)))
316
+ ```
317
+
318
+ ## Development & Testing
319
+
320
+ ### Direct Environment Testing
321
+
322
+ Core files:
323
+ - models: typed action/observation schema
324
+ - server environment: queue simulation, reward shaping, grading
325
+ - inference script: task sweep and benchmark logging
326
+
327
+ ### Running Locally
328
+
329
+ Run the server locally for development:
330
+
331
+ ```bash
332
+ uvicorn server.app:app --reload
333
+ ```
334
+
335
+ ## Project Structure
336
+
337
+ ```
338
+ cloud_queue_env/
339
+ ├── .dockerignore
340
+ ├── __init__.py
341
+ ├── README.md
342
+ ├── openenv.yaml
343
+ ├── pyproject.toml
344
+ ├── client.py
345
+ ├── models.py
346
+ ├── inference.py
347
+ ├── IMPLEMENTATION_ROADMAP.md
348
+ └── server/
349
+ ├── __init__.py
350
+ ├── cloud_queue_env_environment.py
351
+ ├── app.py
352
+ └── Dockerfile
353
+ ```
+
+ TASK A — Easy (150 steps)
+ Scenario: 1 queue, 1 server (M/M/1), only admit/reject/dispatch
+ Objective: Keep wait low while maintaining throughput
+ Grader: score = 0.40×(1-avg_wait/6) + 0.30×(throughput/70)
+         + 0.15×(1-rejection_rate/0.3) + 0.15×(1-sla_breaches/0.3)
+
+ TASK B — Medium (200 steps)
+ Scenario: 2 queues, 3 servers, 28% urgent jobs → route + reprioritize
+ Objective: Protect urgent SLA while not starving normal jobs
+ Grader: score = 0.35×urgent_wait_score + 0.25×urgent_sla_score
+         + 0.15×normal_wait_score + 0.15×throughput + 0.10×cost
+
+ TASK C — Hard (250 steps)
+ Scenario: 2-stage pipeline, 1–6 servers, heavy-tail service, abandonments
+ Objective: Maximize quality under budget with fairness
+ Grader: score = 0.25×e2e_latency + 0.20×abandonment + 0.20×sla
+         + 0.15×throughput + 0.10×cost + 0.10×fairness
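The Task A grader formula above can be sketched as a small scoring function. This is an illustrative reimplementation of the published weights, not the environment's actual grader code; in particular, clipping each term to [0, 1] before weighting is an assumption.

```python
def task_a_score(avg_wait: float, throughput: float,
                 rejection_rate: float, sla_breaches: float) -> float:
    """Illustrative Task A grader: weighted sum of normalized metrics.

    Clipping each term to [0, 1] is an assumption -- the real grader
    may normalize out-of-range metrics differently.
    """
    def clip(x: float) -> float:
        return max(0.0, min(1.0, x))

    return (
        0.40 * clip(1 - avg_wait / 6)          # wait term
        + 0.30 * clip(throughput / 70)         # throughput term
        + 0.15 * clip(1 - rejection_rate / 0.3)  # rejection term
        + 0.15 * clip(1 - sla_breaches / 0.3)    # SLA term
    )

# A perfect episode (no waits, full throughput, no rejections or
# SLA breaches) scores 1.0; the success threshold used by the
# inference runner is 0.60.
ideal = task_a_score(avg_wait=0.0, throughput=70, rejection_rate=0.0, sla_breaches=0.0)
```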
__init__.py ADDED
@@ -0,0 +1,16 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Queue Env Environment."""
+
+ from .client import CloudQueueEnv
+ from .models import CloudQueueAction, CloudQueueObservation
+
+ __all__ = [
+     "CloudQueueAction",
+     "CloudQueueObservation",
+     "CloudQueueEnv",
+ ]
client.py ADDED
@@ -0,0 +1,123 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Queue Env Environment Client."""
+
+ from typing import Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import CloudQueueAction, CloudQueueObservation
+
+
+ class CloudQueueEnv(
+     EnvClient[CloudQueueAction, CloudQueueObservation, State]
+ ):
+     """
+     Client for the Cloud Queue Env Environment.
+
+     This client maintains a persistent WebSocket connection to the environment server,
+     enabling efficient multi-step interactions with lower latency.
+     Each client instance has its own dedicated environment session on the server.
+
+     Example:
+         >>> # Connect to a running server
+         >>> with CloudQueueEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(result.observation.queue_lengths)
+         ...
+         ...     result = client.step(CloudQueueAction(action_type="admit", target_queue=0))
+         ...     print(result.observation.throughput_recent)
+
+     Example with Docker:
+         >>> # Automatically start container and connect
+         >>> client = CloudQueueEnv.from_docker_image("cloud_queue_env-env:latest")
+         >>> try:
+         ...     result = client.reset()
+         ...     result = client.step(CloudQueueAction(action_type="dispatch", target_queue=0))
+         ... finally:
+         ...     client.close()
+     """
+
+     def _step_payload(self, action: CloudQueueAction) -> Dict:
+         """
+         Convert CloudQueueAction to JSON payload for step message.
+
+         Args:
+             action: CloudQueueAction instance
+
+         Returns:
+             Dictionary representation suitable for JSON encoding
+         """
+         return {
+             "action_type": action.action_type,
+             "target_queue": action.target_queue,
+             "target_server": action.target_server,
+             "scale_delta": action.scale_delta,
+             "new_priority": action.new_priority,
+             "task_id": action.task_id,
+             "seed": action.seed,
+         }
+
+     def _parse_result(self, payload: Dict) -> StepResult[CloudQueueObservation]:
+         """
+         Parse server response into StepResult[CloudQueueObservation].
+
+         Args:
+             payload: JSON response data from server
+
+         Returns:
+             StepResult with CloudQueueObservation
+         """
+         obs_data = payload.get("observation", {})
+         observation = CloudQueueObservation(
+             task_id=obs_data.get("task_id", "easy"),
+             sim_time=obs_data.get("sim_time", 0),
+             horizon=obs_data.get("horizon", 0),
+             queue_lengths=obs_data.get("queue_lengths", []),
+             queue_wait_ema=obs_data.get("queue_wait_ema", []),
+             server_busy=obs_data.get("server_busy", []),
+             server_remaining_service=obs_data.get("server_remaining_service", []),
+             utilization=obs_data.get("utilization", []),
+             incoming_job_present=obs_data.get("incoming_job_present", False),
+             incoming_job_size=obs_data.get("incoming_job_size", 0.0),
+             incoming_job_priority=obs_data.get("incoming_job_priority", 0),
+             incoming_job_deadline=obs_data.get("incoming_job_deadline", 0.0),
+             incoming_job_type=obs_data.get("incoming_job_type", 0),
+             sla_violation_rate=obs_data.get("sla_violation_rate", 0.0),
+             abandonment_rate=obs_data.get("abandonment_rate", 0.0),
+             throughput_recent=obs_data.get("throughput_recent", 0.0),
+             energy_cost_rate=obs_data.get("energy_cost_rate", 0.0),
+             level=obs_data.get("level", 1.0),
+             optional_history=obs_data.get("optional_history", []),
+             action_mask=obs_data.get("action_mask", []),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         """
+         Parse server response into State object.
+
+         Args:
+             payload: JSON response from state request
+
+         Returns:
+             State object with episode_id and step_count
+         """
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
inference.py ADDED
@@ -0,0 +1,747 @@
1
+ """Baseline inference runner for the queue operations benchmark tasks."""
2
+
3
+ import asyncio
4
+ import csv
5
+ import json
6
+ import os
7
+ import statistics
8
+ import textwrap
9
+ from typing import Any, List, Optional
10
+ from urllib.parse import urlparse, urlunparse
11
+
12
+ from dotenv import load_dotenv
13
+ from openai import OpenAI
14
+
15
+ load_dotenv() # Load environment variables from .env file
16
+
17
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv, CloudQueueObservation
18
+
19
+
20
+ IMAGE_NAME = os.getenv("IMAGE_NAME")
21
+ BASE_URL = os.getenv("BASE_URL")
22
+
23
+
24
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
25
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
26
+
27
+ API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")
28
+
29
+ BENCHMARK = os.getenv("BENCHMARK", "queueops-openenv")
30
+ TASKS = ["easy", "medium", "hard"]
31
+ TASK_SEEDS_JSON = os.getenv("TASK_SEEDS_JSON")
32
+ SEEDS = [11, 23, 37]
33
+ TEMPERATURE = 0.2
34
+ MAX_TOKENS = 180
35
+ SUCCESS_SCORE_THRESHOLD = 0.60
36
+ USE_HEURISTIC_ONLY = os.getenv("USE_HEURISTIC_ONLY", "false").lower() in {"1", "true", "yes"}
37
+ DISABLE_MODEL_ON_FIRST_ERROR = os.getenv("DISABLE_MODEL_ON_FIRST_ERROR", "true").lower() in {"1", "true", "yes"}
38
+ MAX_STEPS_OVERRIDE = int(os.getenv("MAX_STEPS_OVERRIDE", "0") or "0")
39
+ ACTION_TRACE_FILE = os.getenv("ACTION_TRACE_FILE")
40
+ REPORT_JSON_PATH = os.getenv("REPORT_JSON_PATH")
41
+ REPORT_CSV_PATH = os.getenv("REPORT_CSV_PATH")
42
+
43
+ SYSTEM_PROMPT = textwrap.dedent(
44
+ """
45
+ You are an agent controlling a cloud queue scheduling environment.
46
+ Your goal: minimize wait times, SLA violations, and cost while maximizing throughput.
47
+
48
+ ACTIONS (return exactly one JSON object, no extra text):
49
+ {"action_type": "admit", "target_queue": 0} — accept incoming job into queue 0
50
+ {"action_type": "route", "target_queue": 1} — accept incoming job into queue 1 (medium/hard only)
51
+ {"action_type": "reject", "target_queue": null} — reject incoming job (use when queues are filling up)
52
+ {"action_type": "dispatch", "target_queue": 0} — move job from queue to an idle server
53
+ {"action_type": "reprioritize","new_priority": 2} — promote a normal job to urgent (medium/hard only)
54
+ {"action_type": "scale", "scale_delta": 1} — add 1 server (+1) or remove 1 server (-1) (hard only)
55
+ {"action_type": "noop", "target_queue": null} — do nothing
56
+
57
+ STRATEGY HINTS:
58
+ - REJECT jobs when queue fill is above 60% to prevent overflow and SLA breaches.
59
+ - ADMIT when queues have space and server is idle.
60
+ - DISPATCH after admitting to keep servers busy.
61
+ - On medium/hard: ROUTE urgent jobs (priority=2) to a less-loaded queue.
62
+ - On hard: SCALE up (+1) when queue_fill > 70% and cost allows; scale down when queues are empty.
63
+ - Negative reward means the system is struggling — change strategy.
64
+
65
+ Return ONLY valid JSON. No explanation.
66
+ """
67
+ ).strip()
68
+
69
+
70
+ ACTION_TYPES = (
71
+ "configure_task",
72
+ "admit",
73
+ "reject",
74
+ "route",
75
+ "dispatch",
76
+ "scale",
77
+ "reprioritize",
78
+ "noop",
79
+ )
80
+
81
+ TASK_ALLOWED_ACTIONS = {
82
+ "easy": {"admit", "reject", "dispatch", "noop"},
83
+ "medium": {"admit", "reject", "route", "dispatch", "reprioritize", "noop"},
84
+ "hard": {"admit", "reject", "route", "dispatch", "reprioritize", "scale", "noop"},
85
+ }
86
+
87
+ MODEL_ACTION_RESPONSE_FORMAT = {
88
+ "type": "json_schema",
89
+ "json_schema": {
90
+ "name": "cloud_queue_action",
91
+ "strict": True,
92
+ "schema": {
93
+ "type": "object",
94
+ "additionalProperties": False,
95
+ "required": [
96
+ "action_type",
97
+ "target_queue",
98
+ "target_server",
99
+ "scale_delta",
100
+ "new_priority",
101
+ ],
102
+ "properties": {
103
+ "action_type": {"type": "string", "enum": list(ACTION_TYPES)},
104
+ "target_queue": {"type": ["integer", "null"], "minimum": 0},
105
+ "target_server": {"type": ["integer", "null"], "minimum": 0},
106
+ "scale_delta": {"type": ["integer", "null"], "minimum": -2, "maximum": 2},
107
+ "new_priority": {"type": ["integer", "null"], "minimum": 0, "maximum": 3},
108
+ },
109
+ },
110
+ },
111
+ }
112
+
113
+ _SCHEMA_RESPONSE_FORMAT_FAILED = False
114
+
115
+
116
+ def log_start(task: str, env: str, model: str) -> None:
117
+ print(f"[START] task={task} env={env} model={model}", flush=True)
118
+
119
+
120
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
121
+ error_val = error if error else "null"
122
+ done_val = str(done).lower()
123
+ print(
124
+ f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
125
+ flush=True,
126
+ )
127
+
128
+
129
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
130
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
131
+ print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
132
+
133
+
134
+ def parse_task_seed_map() -> dict[str, list[int]]:
135
+ if TASK_SEEDS_JSON:
136
+ try:
137
+ data = json.loads(TASK_SEEDS_JSON)
138
+ task_map: dict[str, list[int]] = {}
139
+ for task_name, seeds in data.items():
140
+ parsed = [int(s) for s in seeds]
141
+ if parsed:
142
+ task_map[str(task_name)] = parsed
143
+ if task_map:
144
+ return task_map
145
+ except Exception as exc:
146
+ print(f"[DEBUG] Invalid TASK_SEEDS_JSON, falling back to defaults: {exc}", flush=True)
147
+
148
+ return {
149
+ "easy": [SEEDS[0]],
150
+ "medium": [SEEDS[1]],
151
+ "hard": [SEEDS[2]],
152
+ }
153
+
154
+
155
+ def _action_from_dict(data: dict) -> CloudQueueAction:
156
+ return CloudQueueAction(
157
+ action_type=str(data.get("action_type", "noop")),
158
+ target_queue=data.get("target_queue"),
159
+ target_server=data.get("target_server"),
160
+ scale_delta=data.get("scale_delta"),
161
+ new_priority=data.get("new_priority"),
162
+ )
163
+
164
+
165
+ def load_replay_actions() -> dict[str, list[CloudQueueAction]]:
166
+ if not ACTION_TRACE_FILE:
167
+ return {}
168
+
169
+ try:
170
+ with open(ACTION_TRACE_FILE, "r", encoding="utf-8") as f:
171
+ payload = json.load(f)
172
+ except Exception as exc:
173
+ print(f"[DEBUG] Failed to load ACTION_TRACE_FILE: {exc}", flush=True)
174
+ return {}
175
+
176
+ replay: dict[str, list[CloudQueueAction]] = {}
177
+ if isinstance(payload, dict):
178
+ for key, action_list in payload.items():
179
+ if not isinstance(action_list, list):
180
+ continue
181
+ parsed = []
182
+ for item in action_list:
183
+ if isinstance(item, dict):
184
+ parsed.append(_action_from_dict(item))
185
+ if parsed:
186
+ replay[str(key)] = parsed
187
+ return replay
188
+
189
+
190
+ def ci95(values: list[float]) -> float:
191
+ if len(values) <= 1:
192
+ return 0.0
193
+ std = statistics.pstdev(values)
194
+ return 1.96 * std / (len(values) ** 0.5)
195
+
196
+
197
+ def write_reports(seed_rows: list[dict], task_score_table: dict[str, list[float]]) -> None:
198
+ if REPORT_JSON_PATH:
199
+ report_payload = {
200
+ "seed_rows": seed_rows,
201
+ "task_summary": {
202
+ task: {
203
+ "mean": statistics.mean(scores) if scores else 0.0,
204
+ "std": statistics.pstdev(scores) if len(scores) > 1 else 0.0,
205
+ "ci95": ci95(scores),
206
+ "count": len(scores),
207
+ }
208
+ for task, scores in task_score_table.items()
209
+ },
210
+ }
211
+ try:
212
+ with open(REPORT_JSON_PATH, "w", encoding="utf-8") as f:
213
+ json.dump(report_payload, f, indent=2)
214
+ except Exception as exc:
215
+ print(f"[DEBUG] Failed to write REPORT_JSON_PATH: {exc}", flush=True)
216
+
217
+ if REPORT_CSV_PATH:
218
+ try:
219
+ with open(REPORT_CSV_PATH, "w", encoding="utf-8", newline="") as f:
220
+ writer = csv.DictWriter(
221
+ f,
222
+ fieldnames=[
223
+ "task",
224
+ "seed",
225
+ "score",
226
+ "steps",
227
+ "success",
228
+ "trace_digest",
229
+ "invalid_actions",
230
+ "harmful_scale_down",
231
+ ],
232
+ )
233
+ writer.writeheader()
234
+ for row in seed_rows:
235
+ writer.writerow(row)
236
+ except Exception as exc:
237
+ print(f"[DEBUG] Failed to write REPORT_CSV_PATH: {exc}", flush=True)
238
+
239
+
240
+ def build_obs_summary(obs: CloudQueueObservation, task_name: str) -> str:
241
+ """Build a rich, structured text summary of the observation for the LLM prompt."""
242
+ # Queue fill percentages — helps model know when to reject
243
+ max_sizes = {"easy": 28, "medium": 42, "hard": 64}
244
+ max_q = max_sizes.get(task_name, 30)
245
+ fills = [f"{l}/{max_q}({100*l//max_q}%)" for l in obs.queue_lengths]
246
+
247
+ # Server status
248
+ busy_count = sum(obs.server_busy)
249
+ total_servers = len(obs.server_busy)
250
+ servers_str = f"{busy_count}/{total_servers} busy"
251
+
252
+ # Incoming job info
253
+ if obs.incoming_job_present:
254
+ urgency = "URGENT" if obs.incoming_job_priority >= 2 else "normal"
255
+ incoming_str = f"YES [{urgency} size={obs.incoming_job_size:.1f} deadline={obs.incoming_job_deadline:.0f}]"
256
+ else:
257
+ incoming_str = "none"
258
+
259
+ return (
260
+ f"task={task_name} | "
261
+ f"queues={fills} | "
262
+ f"servers={servers_str} | "
263
+ f"incoming={incoming_str} | "
264
+ f"sla_breach={obs.sla_violation_rate:.3f} | "
265
+ f"abandonment={obs.abandonment_rate:.3f} | "
266
+ f"cost_rate={obs.energy_cost_rate:.3f}"
267
+ )
268
+
269
+
270
+ def build_user_prompt(step: int, obs_summary: str, last_reward: float, history: List[str], task_name: str) -> str:
271
+ history_block = "\n".join(history[-4:]) if history else "None"
272
+ return textwrap.dedent(
273
+ f"""
274
+ Step {step} | Last reward: {last_reward:.2f}
275
+ State: {obs_summary}
276
+ Recent actions:
277
+ {history_block}
278
+ Choose the best action now.
279
+ """
280
+ ).strip()
281
+
282
+
283
+ def choose_heuristic_action(task_name: str, queue_lengths: List[int], incoming_present: bool) -> CloudQueueAction:
284
+ if incoming_present:
285
+ if task_name == "hard" and len(queue_lengths) > 1 and queue_lengths[0] > queue_lengths[1]:
286
+ return CloudQueueAction(action_type="route", target_queue=1)
287
+ if task_name == "medium" and len(queue_lengths) > 1 and queue_lengths[1] < queue_lengths[0]:
288
+ return CloudQueueAction(action_type="route", target_queue=1)
289
+ return CloudQueueAction(action_type="admit", target_queue=0)
290
+ return CloudQueueAction(action_type="dispatch", target_queue=0)
291
+
292
+
293
+ def _coerce_optional_int(value: Any) -> Optional[int]:
294
+ if value is None:
295
+ return None
296
+ if isinstance(value, bool):
297
+ return int(value)
298
+ if isinstance(value, int):
299
+ return value
300
+ if isinstance(value, float):
301
+ return int(value)
302
+ if isinstance(value, str):
303
+ txt = value.strip().lower()
304
+ if txt in {"", "null", "none"}:
305
+ return None
306
+ try:
307
+ return int(txt)
308
+ except ValueError:
309
+ try:
310
+ return int(float(txt))
311
+ except ValueError:
312
+ return None
313
+ return None
314
+
315
+
316
+ def _extract_json_object(text: str) -> Optional[dict[str, Any]]:
317
+ cleaned = (text or "").strip()
318
+ if not cleaned:
319
+ return None
320
+
321
+ # Handle common fenced responses first.
322
+ if cleaned.startswith("```"):
323
+ chunks = [chunk.strip() for chunk in cleaned.split("```") if chunk.strip()]
324
+ for chunk in chunks:
325
+ candidate = chunk
326
+ if candidate.lower().startswith("json"):
327
+ candidate = candidate[4:].strip()
328
+ try:
329
+ parsed = json.loads(candidate)
330
+ if isinstance(parsed, dict):
331
+ return parsed
332
+ if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
333
+ return parsed[0]
334
+ except Exception:
335
+ continue
336
+
337
+ try:
338
+ parsed = json.loads(cleaned)
339
+ if isinstance(parsed, dict):
340
+ return parsed
341
+ if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
342
+ return parsed[0]
343
+ except Exception:
344
+ pass
345
+
346
+ # Fallback: extract the first balanced JSON object from noisy text.
347
+ start = 0
348
+ while True:
349
+ open_idx = cleaned.find("{", start)
350
+ if open_idx < 0:
351
+ return None
352
+ depth = 0
353
+ for i in range(open_idx, len(cleaned)):
354
+ ch = cleaned[i]
355
+ if ch == "{":
356
+ depth += 1
357
+ elif ch == "}":
358
+ depth -= 1
359
+ if depth == 0:
360
+ candidate = cleaned[open_idx : i + 1]
361
+ try:
362
+ parsed = json.loads(candidate)
363
+ if isinstance(parsed, dict):
364
+ return parsed
365
+ except Exception:
366
+ break
367
+ start = open_idx + 1
368
+
369
+
370
+ def _normalize_action_payload(data: dict[str, Any], task_name: str) -> Optional[dict[str, Any]]:
371
+ action_type = str(data.get("action_type", "noop")).strip().lower()
372
+ if action_type not in ACTION_TYPES:
373
+ return None
374
+
375
+ if action_type not in TASK_ALLOWED_ACTIONS.get(task_name, set(ACTION_TYPES)):
376
+ return None
377
+
378
+ target_queue = _coerce_optional_int(data.get("target_queue"))
379
+ target_server = _coerce_optional_int(data.get("target_server"))
380
+ scale_delta = _coerce_optional_int(data.get("scale_delta"))
381
+ new_priority = _coerce_optional_int(data.get("new_priority"))
382
+
383
+ if action_type in {"admit", "route", "dispatch"} and target_queue is None:
384
+ target_queue = 0
385
+ if action_type in {"reject", "noop"}:
386
+ target_queue = None
387
+ target_server = None
388
+
389
+ if action_type == "scale":
390
+ if scale_delta is None:
391
+ return None
392
+ scale_delta = max(-2, min(2, scale_delta))
393
+ else:
394
+ scale_delta = None
395
+
396
+ if action_type == "reprioritize":
397
+ if new_priority is None:
398
+ new_priority = 2
399
+ else:
400
+ new_priority = None
401
+
402
+ return {
403
+ "action_type": action_type,
404
+ "target_queue": target_queue,
405
+ "target_server": target_server,
406
+ "scale_delta": scale_delta,
407
+ "new_priority": new_priority,
408
+ }
409
+
410
+
411
+ def parse_model_action(text: str, task_name: str) -> Optional[CloudQueueAction]:
412
+ data = _extract_json_object(text)
413
+ if data is None:
414
+ return None
415
+
416
+ payload = _normalize_action_payload(data, task_name)
417
+ if payload is None:
418
+ return None
419
+
420
+ try:
421
+ return CloudQueueAction(**payload)
422
+ except Exception:
423
+ return None
424
+
425
+
426
+ def get_model_action(
427
+ client: OpenAI,
428
+ task_name: str,
429
+ step: int,
430
+ obs_summary: str,
431
+ last_reward: float,
432
+ history: List[str],
433
+ ) -> tuple[Optional[CloudQueueAction], Optional[str]]:
434
+ global _SCHEMA_RESPONSE_FORMAT_FAILED
435
+
436
+ user_prompt = build_user_prompt(step, obs_summary, last_reward, history, task_name)
437
+ messages = [
438
+ {"role": "system", "content": SYSTEM_PROMPT},
439
+ {"role": "user", "content": user_prompt},
440
+ ]
441
+
442
+ try:
443
+ if not _SCHEMA_RESPONSE_FORMAT_FAILED:
444
+ try:
445
+ completion = client.chat.completions.create(
446
+ model=MODEL_NAME,
447
+ messages=messages,
448
+ temperature=TEMPERATURE,
449
+ max_tokens=MAX_TOKENS,
450
+ stream=False,
451
+ response_format=MODEL_ACTION_RESPONSE_FORMAT,
452
+ )
453
+ except Exception as schema_exc:
454
+ _SCHEMA_RESPONSE_FORMAT_FAILED = True
455
+ print(
456
+ f"[DEBUG] response_format unavailable, retrying without schema: {schema_exc}",
457
+ flush=True,
458
+ )
459
+ completion = client.chat.completions.create(
460
+ model=MODEL_NAME,
461
+ messages=messages,
462
+ temperature=TEMPERATURE,
463
+ max_tokens=MAX_TOKENS,
464
+ stream=False,
465
+ )
466
+ else:
467
+ completion = client.chat.completions.create(
468
+ model=MODEL_NAME,
469
+ messages=messages,
470
+ temperature=TEMPERATURE,
471
+ max_tokens=MAX_TOKENS,
472
+ stream=False,
473
+ )
474
+
475
+ text = (completion.choices[0].message.content or "").strip()
476
+ action = parse_model_action(text, task_name)
477
+ if action is None:
478
+ preview = " ".join(text.split())[:180]
479
+ return None, f"invalid_model_action_payload: {preview}"
480
+ return action, None
481
+ except Exception as exc:
482
+ print(f"[DEBUG] Model request failed: {exc}", flush=True)
483
+ return None, str(exc)
484
+
485
+
486
+ def normalize_base_url(base_url: Optional[str]) -> Optional[str]:
487
+ """Normalize user-provided BASE_URL into an API runtime URL.
488
+
489
+ If a Hugging Face repo page URL is provided (huggingface.co/spaces/user/space),
490
+ convert it to the runtime domain (https://user-space.hf.space).
491
+ """
492
+ if not base_url:
493
+ return base_url
494
+
495
+ cleaned = base_url.strip().rstrip("/")
496
+ parsed = urlparse(cleaned)
497
+
498
+ # Handle Hugging Face repo page URL -> runtime URL used by API/WebSocket.
499
+ if parsed.netloc.lower() == "huggingface.co":
500
+ parts = [p for p in parsed.path.strip("/").split("/") if p]
501
+ if len(parts) >= 3 and parts[0] == "spaces":
502
+ owner, space = parts[1], parts[2]
503
+ # HF runtime hostnames use lowercase and are TLS-safe.
504
+ owner = owner.lower().replace("_", "-")
505
+ space = space.lower().replace("_", "-")
506
+ return f"https://{owner}-{space}.hf.space"
507
+
508
+ # Avoid accidentally pointing at the web UI path.
509
+ if cleaned.endswith("/web"):
510
+ cleaned = cleaned[:-4]
511
+ parsed = urlparse(cleaned)
512
+
513
+ # HF runtime domains should be lowercase and avoid underscores for TLS host checks.
514
+ host = (parsed.hostname or "").lower()
515
+ if host.endswith(".hf.space"):
516
+ safe_host = host.replace("_", "-")
517
+ if safe_host != host or (parsed.netloc and parsed.netloc != parsed.netloc.lower()):
518
+ port_part = f":{parsed.port}" if parsed.port else ""
519
+ netloc = f"{safe_host}{port_part}"
520
+ parsed = parsed._replace(netloc=netloc)
521
+ cleaned = urlunparse(parsed)
522
+
523
+ return cleaned
524
+
525
+
526
+ def _smoke_test_model(client: OpenAI) -> bool:
527
+ """Verify the model API is reachable AND can generate a coherent response.
528
+
529
+ Asks a short queue-domain question that requires a real sentence answer.
530
+ An empty or missing reply is treated as failure — not just exceptions.
531
+
532
+ Prints [MODEL_OK] or [MODEL_FAIL] with details.
533
+ Returns True if the model is working, False otherwise.
534
+ """
535
+ print(f"[MODEL_CHECK] Testing model={MODEL_NAME} at {API_BASE_URL} ...", flush=True)
536
+ test_question = (
537
+ "You are a cloud scheduling agent. "
538
+ "A job queue is 80% full and a new urgent job just arrived. "
539
+ "Should you admit the job, reject it, or route it to another queue? "
540
+ "Answer in one sentence and explain why."
541
+ )
542
+ try:
543
+ resp = client.chat.completions.create(
544
+ model=MODEL_NAME,
545
+ messages=[{"role": "user", "content": test_question}],
546
+ temperature=0.0,
547
+ max_tokens=80,
548
+ )
549
+ reply = (resp.choices[0].message.content or "").strip()
550
+ if not reply:
551
+ print("[MODEL_FAIL] Model returned an empty response.", flush=True)
552
+ print("[MODEL_FAIL] Will fall back to heuristic for all steps.", flush=True)
553
+ return False
554
+ print(f"[MODEL_OK] model is reasoning correctly.", flush=True)
555
+ print(f"[MODEL_OK] test reply: {reply}", flush=True)
556
+ return True
557
+ except Exception as exc:
558
+ print(f"[MODEL_FAIL] Cannot reach model: {exc}", flush=True)
559
+ print("[MODEL_FAIL] Will fall back to heuristic for all steps.", flush=True)
560
+ return False
561
+
562
+
563
+ async def main() -> None:
564
+ if not API_KEY and not USE_HEURISTIC_ONLY:
565
+ raise ValueError("API_KEY is required for model inference.")
566
+
567
+ client = None
568
+ if not USE_HEURISTIC_ONLY:
569
+ client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
570
+ runtime_base_url = normalize_base_url(BASE_URL)
571
+
572
+ if runtime_base_url:
573
+ env = CloudQueueEnv(base_url=runtime_base_url)
574
+ else:
575
+ if not IMAGE_NAME:
576
+ raise ValueError(
577
+ "Set BASE_URL for deployed env, or IMAGE_NAME for local docker env."
578
+ )
579
+ env = await CloudQueueEnv.from_docker_image(IMAGE_NAME)
580
+
581
+ try:
582
+ # Run smoke test before benchmark — confirms model API is reachable.
583
+ model_enabled = client is not None
584
+ if client is not None:
585
+ model_enabled = _smoke_test_model(client)
586
+ task_seed_map = parse_task_seed_map()
587
+ replay_map = load_replay_actions()
588
+ task_score_table: dict[str, list[float]] = {}
589
+ seed_rows: list[dict] = []
590
+
591
+ for task_name in TASKS:
592
+ seeds = task_seed_map.get(task_name, [])
593
+ if not seeds:
594
+ continue
595
+
596
+ task_score_table[task_name] = []
597
+
598
+ for seed in seeds:
599
+ history: List[str] = []
600
+ rewards: List[float] = []
601
+ steps_taken = 0
602
+ score = 0.0
603
+ success = False
604
+
605
+ log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
606
+
607
+ await env.reset()
608
+ await env.step(
609
+ CloudQueueAction(action_type="configure_task", task_id=task_name, seed=seed)
610
+ )
611
+ result = await env.reset()
612
+ last_reward = 0.0
613
+ max_steps = max(1, int(result.observation.horizon))
614
+ if MAX_STEPS_OVERRIDE > 0:
615
+ max_steps = min(max_steps, MAX_STEPS_OVERRIDE)
616
+
617
+ for step in range(1, max_steps + 1):
618
+ if result.done:
619
+ break
620
+
621
+ obs = result.observation
622
+ obs_summary = build_obs_summary(obs, task_name)
623
+
624
+ action = None
625
+ model_error = None
626
+ replay_key = f"{task_name}:{seed}"
627
+ replay_actions = replay_map.get(replay_key, [])
628
+ if step - 1 < len(replay_actions):
629
+ action = replay_actions[step - 1]
630
+
631
+ if action is None and model_enabled and client is not None:
632
+ action, model_error = get_model_action(
633
+ client=client,
634
+ task_name=task_name,
635
+ step=step,
636
+ obs_summary=obs_summary,
637
+ last_reward=last_reward,
638
+ history=history,
639
+ )
640
+ if model_error and DISABLE_MODEL_ON_FIRST_ERROR:
641
+ model_enabled = False
642
+ print("[DEBUG] Disabling model calls and switching to heuristic fallback.", flush=True)
643
+
644
+ if action is None:
645
+ action = choose_heuristic_action(
646
+ task_name=task_name,
647
+ queue_lengths=obs.queue_lengths,
648
+ incoming_present=obs.incoming_job_present,
649
+ )
650
+
651
+ result = await env.step(action)
652
+ reward = float(result.reward or 0.0)
653
+ done = bool(result.done)
654
+ error = None
655
+ meta = result.observation.metadata or {}
656
+ info = meta.get("info", {}) if isinstance(meta, dict) else {}
657
+ if isinstance(info, dict) and info.get("valid_action") is False:
658
+ error = str(info.get("note", "invalid_action"))
659
+
660
+ rewards.append(reward)
661
+ steps_taken = step
662
+ last_reward = reward
663
+
664
+ action_str = (
665
+ f"{action.action_type}(q={action.target_queue},s={action.target_server},"
666
+ f"d={action.scale_delta},p={action.new_priority})"
667
+ )
668
+ log_step(step=step, action=action_str, reward=reward, done=done, error=error)
669
+
670
+ history.append(f"step={step} action={action_str} reward={reward:.2f}")
671
+
672
+ if done:
673
+ break
674
+
675
+ if isinstance(result.observation.metadata, dict):
676
+                 score = float(result.observation.metadata.get("episode_score", 0.0) or 0.0)
+                 # Debug: print raw server metadata so we can verify grader output
+                 _m = result.observation.metadata
+                 print(
+                     f"[DEBUG_META] task={task_name} seed={seed} "
+                     f"episode_score={_m.get('episode_score')} "
+                     f"score_details={_m.get('score_details')} "
+                     f"metrics_completed={_m.get('metrics', {}).get('completed')} "
+                     f"metrics_arrivals={_m.get('metrics', {}).get('arrivals')}",
+                     flush=True,
+                 )
+                 score = max(0.0, min(1.0, score))
+                 task_score_table[task_name].append(score)
+                 success = score >= SUCCESS_SCORE_THRESHOLD
+                 log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+                 meta = result.observation.metadata or {}
+                 metrics = meta.get("metrics", {}) if isinstance(meta, dict) else {}
+                 seed_row = {
+                     "task": task_name,
+                     "seed": int(seed),
+                     "score": round(score, 6),
+                     "steps": int(steps_taken),
+                     "success": bool(success),
+                     "trace_digest": str(meta.get("trace_digest", "")),
+                     "invalid_actions": float(metrics.get("invalid_actions", 0.0)),
+                     "harmful_scale_down": float(metrics.get("harmful_scale_down", 0.0)),
+                 }
+                 seed_rows.append(seed_row)
+                 print(
+                     "[REPORT_SEED] "
+                     f"task={seed_row['task']} seed={seed_row['seed']} score={seed_row['score']:.3f} "
+                     f"steps={seed_row['steps']} trace={seed_row['trace_digest']}",
+                     flush=True,
+                 )
+
+             task_scores = task_score_table[task_name]
+             task_mean = statistics.mean(task_scores) if task_scores else 0.0
+             task_std = statistics.pstdev(task_scores) if len(task_scores) > 1 else 0.0
+             task_ci = ci95(task_scores)
+             print(
+                 f"[REPORT] task={task_name} seeds={len(task_scores)} mean={task_mean:.3f} std={task_std:.3f} ci95={task_ci:.3f}",
+                 flush=True,
+             )
+
+         all_task_means = []
+         for task_name in TASKS:
+             scores = task_score_table.get(task_name, [])
+             if scores:
+                 all_task_means.append(statistics.mean(scores))
+
+         if all_task_means:
+             final_score = sum(all_task_means) / len(all_task_means)
+             easy_mean = statistics.mean(task_score_table.get("easy", [0.0]))
+             medium_mean = statistics.mean(task_score_table.get("medium", [0.0]))
+             hard_mean = statistics.mean(task_score_table.get("hard", [0.0]))
+             print(
+                 f"[SUMMARY] easy={easy_mean:.3f} medium={medium_mean:.3f} hard={hard_mean:.3f} final={final_score:.3f}",
+                 flush=True,
+             )
+
+         write_reports(seed_rows=seed_rows, task_score_table=task_score_table)
+
+     finally:
+         try:
+             await env.close()
+         except Exception as e:
+             print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
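The per-task aggregation above (per-task mean/std, a normal-approximation 95% confidence half-width, and a final score equal to the mean of task means) can be checked in isolation. A minimal sketch, mirroring the `ci95` formula used by these runners (1.96 · population std / √n via `statistics.pstdev`); the example scores are illustrative, not real benchmark output:

```python
import statistics


def ci95(values: list[float]) -> float:
    # Normal-approximation half-width: 1.96 * population std / sqrt(n).
    if len(values) <= 1:
        return 0.0
    return 1.96 * statistics.pstdev(values) / (len(values) ** 0.5)


# Aggregate the way the runner does: per-task mean, then mean of task means.
task_score_table = {"easy": [0.8, 0.9], "medium": [0.5, 0.7], "hard": [0.2, 0.4]}
task_means = [statistics.mean(v) for v in task_score_table.values()]
final_score = sum(task_means) / len(task_means)
print(round(final_score, 3))  # mean of 0.85, 0.60, 0.30
```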
inference2.py ADDED
@@ -0,0 +1,751 @@
+ """Strict model-only inference runner for the queue operations benchmark.
+
+ This variant intentionally removes heuristic fallback paths.
+ Every decision must come from either:
+ 1) replay trace input (ACTION_TRACE_FILE), or
+ 2) model output.
+
+ If model output is invalid/unavailable, the seed run is marked failed.
+ """
+
+ import asyncio
+ import csv
+ import json
+ import os
+ import statistics
+ import textwrap
+ from typing import Any, List, Optional
+ from urllib.parse import urlparse, urlunparse
+
+ from dotenv import load_dotenv
+ from openai import OpenAI
+
+ load_dotenv()
+
+ from cloud_queue_env import CloudQueueAction, CloudQueueEnv, CloudQueueObservation
+
+
+ IMAGE_NAME = os.getenv("IMAGE_NAME")
+ BASE_URL = os.getenv("BASE_URL")
+
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+ API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN")
+
+ BENCHMARK = os.getenv("BENCHMARK", "queueops-openenv")
+ TASKS = ["easy", "medium", "hard"]
+ TASK_SEEDS_JSON = os.getenv("TASK_SEEDS_JSON")
+ SEEDS = [11, 23, 37]
+ TEMPERATURE = 0.2
+ MAX_TOKENS = 780
+ SUCCESS_SCORE_THRESHOLD = 0.60
+ MAX_STEPS_OVERRIDE = int(os.getenv("MAX_STEPS_OVERRIDE", "0") or "0")
+ ACTION_TRACE_FILE = os.getenv("ACTION_TRACE_FILE")
+ REPORT_JSON_PATH = os.getenv("REPORT_JSON_PATH")
+ REPORT_CSV_PATH = os.getenv("REPORT_CSV_PATH")
+
+ SYSTEM_PROMPT = textwrap.dedent(
+     """
+     You are an agent controlling a cloud queue scheduling environment.
+     Your goal: minimize wait times, SLA violations, and cost while maximizing throughput.
+
+     Return exactly one JSON object and no extra text.
+
+     ACTIONS:
+     {"action_type": "admit", "target_queue": 0}
+     {"action_type": "route", "target_queue": 1}
+     {"action_type": "reject", "target_queue": null}
+     {"action_type": "dispatch", "target_queue": 0}
+     {"action_type": "reprioritize", "new_priority": 2}
+     {"action_type": "scale", "scale_delta": 1}
+     {"action_type": "noop", "target_queue": null}
+
+     Constraints:
+     - easy: use admit/reject/dispatch/noop only
+     - medium: use admit/reject/route/dispatch/reprioritize/noop only
+     - hard: use admit/reject/route/dispatch/reprioritize/scale/noop only
+
+     No explanation. JSON only.
+     """
+ ).strip()
+
+ ACTION_TYPES = (
+     "configure_task",
+     "admit",
+     "reject",
+     "route",
+     "dispatch",
+     "scale",
+     "reprioritize",
+     "noop",
+ )
+
+ TASK_ALLOWED_ACTIONS = {
+     "easy": {"admit", "reject", "dispatch", "noop"},
+     "medium": {"admit", "reject", "route", "dispatch", "reprioritize", "noop"},
+     "hard": {"admit", "reject", "route", "dispatch", "reprioritize", "scale", "noop"},
+ }
+
+ ACTION_PAYLOAD_PROPERTIES = {
+     "target_queue": {"type": ["integer", "null"], "minimum": 0},
+     "target_server": {"type": ["integer", "null"], "minimum": 0},
+     "scale_delta": {"type": ["integer", "null"], "minimum": -2, "maximum": 2},
+     "new_priority": {"type": ["integer", "null"], "minimum": 0, "maximum": 3},
+ }
+
+ _SCHEMA_RESPONSE_FORMAT_FAILED = False
+
+
+ def log_start(task: str, env: str, model: str) -> None:
+     print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+     error_val = error if error else "null"
+     done_val = str(done).lower()
+     print(
+         f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+         flush=True,
+     )
+
+
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+     print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
+
+
+ def model_action_response_format(task_name: str) -> dict[str, Any]:
+     allowed = sorted(TASK_ALLOWED_ACTIONS.get(task_name, set(ACTION_TYPES)))
+     return {
+         "type": "json_schema",
+         "json_schema": {
+             "name": f"cloud_queue_action_{task_name}",
+             "strict": True,
+             "schema": {
+                 "type": "object",
+                 "additionalProperties": False,
+                 "required": [
+                     "action_type",
+                     "target_queue",
+                     "target_server",
+                     "scale_delta",
+                     "new_priority",
+                 ],
+                 "properties": {
+                     "action_type": {"type": "string", "enum": allowed},
+                     **ACTION_PAYLOAD_PROPERTIES,
+                 },
+             },
+         },
+     }
+
+
+ def parse_task_seed_map() -> dict[str, list[int]]:
+     if TASK_SEEDS_JSON:
+         try:
+             data = json.loads(TASK_SEEDS_JSON)
+             task_map: dict[str, list[int]] = {}
+             for task_name, seeds in data.items():
+                 parsed = [int(s) for s in seeds]
+                 if parsed:
+                     task_map[str(task_name)] = parsed
+             if task_map:
+                 return task_map
+         except Exception as exc:
+             print(f"[DEBUG] Invalid TASK_SEEDS_JSON, falling back to defaults: {exc}", flush=True)
+
+     return {
+         "easy": [SEEDS[0]],
+         "medium": [SEEDS[1]],
+         "hard": [SEEDS[2]],
+     }
+
+
+ def _action_from_dict(data: dict) -> CloudQueueAction:
+     return CloudQueueAction(
+         action_type=str(data.get("action_type", "noop")),
+         target_queue=data.get("target_queue"),
+         target_server=data.get("target_server"),
+         scale_delta=data.get("scale_delta"),
+         new_priority=data.get("new_priority"),
+     )
+
+
+ def load_replay_actions() -> dict[str, list[CloudQueueAction]]:
+     if not ACTION_TRACE_FILE:
+         return {}
+
+     try:
+         with open(ACTION_TRACE_FILE, "r", encoding="utf-8") as f:
+             payload = json.load(f)
+     except Exception as exc:
+         print(f"[DEBUG] Failed to load ACTION_TRACE_FILE: {exc}", flush=True)
+         return {}
+
+     replay: dict[str, list[CloudQueueAction]] = {}
+     if isinstance(payload, dict):
+         for key, action_list in payload.items():
+             if not isinstance(action_list, list):
+                 continue
+             parsed = []
+             for item in action_list:
+                 if isinstance(item, dict):
+                     parsed.append(_action_from_dict(item))
+             if parsed:
+                 replay[str(key)] = parsed
+     return replay
+
+
+ def ci95(values: list[float]) -> float:
+     if len(values) <= 1:
+         return 0.0
+     std = statistics.pstdev(values)
+     return 1.96 * std / (len(values) ** 0.5)
+
+
+ def write_reports(seed_rows: list[dict], task_score_table: dict[str, list[float]]) -> None:
+     if REPORT_JSON_PATH:
+         report_payload = {
+             "seed_rows": seed_rows,
+             "task_summary": {
+                 task: {
+                     "mean": statistics.mean(scores) if scores else 0.0,
+                     "std": statistics.pstdev(scores) if len(scores) > 1 else 0.0,
+                     "ci95": ci95(scores),
+                     "count": len(scores),
+                 }
+                 for task, scores in task_score_table.items()
+             },
+         }
+         try:
+             with open(REPORT_JSON_PATH, "w", encoding="utf-8") as f:
+                 json.dump(report_payload, f, indent=2)
+         except Exception as exc:
+             print(f"[DEBUG] Failed to write REPORT_JSON_PATH: {exc}", flush=True)
+
+     if REPORT_CSV_PATH:
+         try:
+             with open(REPORT_CSV_PATH, "w", encoding="utf-8", newline="") as f:
+                 writer = csv.DictWriter(
+                     f,
+                     fieldnames=[
+                         "task",
+                         "seed",
+                         "score",
+                         "steps",
+                         "success",
+                         "trace_digest",
+                         "invalid_actions",
+                         "harmful_scale_down",
+                         "failure_reason",
+                     ],
+                 )
+                 writer.writeheader()
+                 for row in seed_rows:
+                     writer.writerow(row)
+         except Exception as exc:
+             print(f"[DEBUG] Failed to write REPORT_CSV_PATH: {exc}", flush=True)
+
+
+ def build_obs_summary(obs: CloudQueueObservation, task_name: str) -> str:
+     max_sizes = {"easy": 28, "medium": 42, "hard": 64}
+     max_q = max_sizes.get(task_name, 30)
+     fills = [f"{l}/{max_q}({100*l//max_q}%)" for l in obs.queue_lengths]
+
+     busy_count = sum(obs.server_busy)
+     total_servers = len(obs.server_busy)
+     servers_str = f"{busy_count}/{total_servers} busy"
+
+     if obs.incoming_job_present:
+         urgency = "URGENT" if obs.incoming_job_priority >= 2 else "normal"
+         incoming_str = f"YES [{urgency} size={obs.incoming_job_size:.1f} deadline={obs.incoming_job_deadline:.0f}]"
+     else:
+         incoming_str = "none"
+
+     return (
+         f"task={task_name} | "
+         f"queues={fills} | "
+         f"servers={servers_str} | "
+         f"incoming={incoming_str} | "
+         f"sla_breach={obs.sla_violation_rate:.3f} | "
+         f"abandonment={obs.abandonment_rate:.3f} | "
+         f"cost_rate={obs.energy_cost_rate:.3f}"
+     )
+
+
+ def build_user_prompt(step: int, obs_summary: str, last_reward: float, history: List[str]) -> str:
+     history_block = "\n".join(history[-4:]) if history else "None"
+     return textwrap.dedent(
+         f"""
+         Step {step} | Last reward: {last_reward:.2f}
+         State: {obs_summary}
+         Recent actions:
+         {history_block}
+         Choose the best action now.
+         """
+     ).strip()
+
+
+ def _coerce_optional_int(value: Any) -> Optional[int]:
+     if value is None:
+         return None
+     if isinstance(value, bool):
+         return int(value)
+     if isinstance(value, int):
+         return value
+     if isinstance(value, float):
+         return int(value)
+     if isinstance(value, str):
+         txt = value.strip().lower()
+         if txt in {"", "null", "none"}:
+             return None
+         try:
+             return int(txt)
+         except ValueError:
+             try:
+                 return int(float(txt))
+             except ValueError:
+                 return None
+     return None
+
+
+ def _extract_json_object(text: str) -> Optional[dict[str, Any]]:
+     cleaned = (text or "").strip()
+     if not cleaned:
+         return None
+
+     if cleaned.startswith("```"):
+         chunks = [chunk.strip() for chunk in cleaned.split("```") if chunk.strip()]
+         for chunk in chunks:
+             candidate = chunk
+             if candidate.lower().startswith("json"):
+                 candidate = candidate[4:].strip()
+             try:
+                 parsed = json.loads(candidate)
+                 if isinstance(parsed, dict):
+                     return parsed
+                 if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
+                     return parsed[0]
+             except Exception:
+                 continue
+
+     try:
+         parsed = json.loads(cleaned)
+         if isinstance(parsed, dict):
+             return parsed
+         if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
+             return parsed[0]
+     except Exception:
+         pass
+
+     start = 0
+     while True:
+         open_idx = cleaned.find("{", start)
+         if open_idx < 0:
+             return None
+         depth = 0
+         for i in range(open_idx, len(cleaned)):
+             ch = cleaned[i]
+             if ch == "{":
+                 depth += 1
+             elif ch == "}":
+                 depth -= 1
+                 if depth == 0:
+                     candidate = cleaned[open_idx : i + 1]
+                     try:
+                         parsed = json.loads(candidate)
+                         if isinstance(parsed, dict):
+                             return parsed
+                     except Exception:
+                         break
+         start = open_idx + 1
+
+
+ def _normalize_action_payload(data: dict[str, Any], task_name: str) -> Optional[dict[str, Any]]:
+     action_type = str(data.get("action_type", "noop")).strip().lower()
+     if action_type not in ACTION_TYPES:
+         return None
+     if action_type not in TASK_ALLOWED_ACTIONS.get(task_name, set(ACTION_TYPES)):
+         return None
+
+     target_queue = _coerce_optional_int(data.get("target_queue"))
+     target_server = _coerce_optional_int(data.get("target_server"))
+     scale_delta = _coerce_optional_int(data.get("scale_delta"))
+     new_priority = _coerce_optional_int(data.get("new_priority"))
+
+     if action_type in {"admit", "route", "dispatch"} and target_queue is None:
+         target_queue = 0
+     if action_type in {"reject", "noop"}:
+         target_queue = None
+         target_server = None
+
+     if action_type == "scale":
+         if scale_delta is None:
+             return None
+         scale_delta = max(-2, min(2, scale_delta))
+     else:
+         scale_delta = None
+
+     if action_type == "reprioritize":
+         if new_priority is None:
+             new_priority = 2
+     else:
+         new_priority = None
+
+     return {
+         "action_type": action_type,
+         "target_queue": target_queue,
+         "target_server": target_server,
+         "scale_delta": scale_delta,
+         "new_priority": new_priority,
+     }
+
+
+ def parse_model_action(text: str, task_name: str) -> Optional[CloudQueueAction]:
+     data = _extract_json_object(text)
+     if data is None:
+         return None
+     payload = _normalize_action_payload(data, task_name)
+     if payload is None:
+         return None
+     try:
+         return CloudQueueAction(**payload)
+     except Exception:
+         return None
+
+
+ def get_model_action(
+     client: OpenAI,
+     task_name: str,
+     step: int,
+     obs_summary: str,
+     last_reward: float,
+     history: List[str],
+ ) -> tuple[Optional[CloudQueueAction], Optional[str]]:
+     global _SCHEMA_RESPONSE_FORMAT_FAILED
+
+     user_prompt = build_user_prompt(step, obs_summary, last_reward, history)
+     messages = [
+         {"role": "system", "content": SYSTEM_PROMPT},
+         {"role": "user", "content": user_prompt},
+     ]
+
+     try:
+         if not _SCHEMA_RESPONSE_FORMAT_FAILED:
+             try:
+                 completion = client.chat.completions.create(
+                     model=MODEL_NAME,
+                     messages=messages,
+                     temperature=TEMPERATURE,
+                     max_tokens=MAX_TOKENS,
+                     stream=False,
+                     response_format=model_action_response_format(task_name),
+                 )
+             except Exception as schema_exc:
+                 _SCHEMA_RESPONSE_FORMAT_FAILED = True
+                 print(
+                     f"[DEBUG] response_format unavailable, retrying without schema: {schema_exc}",
+                     flush=True,
+                 )
+                 completion = client.chat.completions.create(
+                     model=MODEL_NAME,
+                     messages=messages,
+                     temperature=TEMPERATURE,
+                     max_tokens=MAX_TOKENS,
+                     stream=False,
+                 )
+         else:
+             completion = client.chat.completions.create(
+                 model=MODEL_NAME,
+                 messages=messages,
+                 temperature=TEMPERATURE,
+                 max_tokens=MAX_TOKENS,
+                 stream=False,
+             )
+
+         text = (completion.choices[0].message.content or "").strip()
+         action = parse_model_action(text, task_name)
+         if action is None:
+             preview = " ".join(text.split())[:180]
+             return None, f"invalid_model_action_payload: {preview}"
+         return action, None
+     except Exception as exc:
+         return None, str(exc)
+
+
+ def get_model_action_with_retry(
+     client: OpenAI,
+     task_name: str,
+     step: int,
+     obs_summary: str,
+     last_reward: float,
+     history: List[str],
+     retries: int = 2,
+ ) -> tuple[Optional[CloudQueueAction], Optional[str]]:
+     last_error: Optional[str] = None
+     for attempt in range(1, retries + 2):
+         action, error = get_model_action(
+             client=client,
+             task_name=task_name,
+             step=step,
+             obs_summary=obs_summary,
+             last_reward=last_reward,
+             history=history,
+         )
+         if action is not None:
+             return action, None
+         last_error = error
+         print(f"[DEBUG] Model action parse failed on attempt={attempt}: {error}", flush=True)
+     return None, last_error
+
+
+ def normalize_base_url(base_url: Optional[str]) -> Optional[str]:
+     if not base_url:
+         return base_url
+
+     cleaned = base_url.strip().rstrip("/")
+     parsed = urlparse(cleaned)
+
+     if parsed.netloc.lower() == "huggingface.co":
+         parts = [p for p in parsed.path.strip("/").split("/") if p]
+         if len(parts) >= 3 and parts[0] == "spaces":
+             owner, space = parts[1], parts[2]
+             owner = owner.lower().replace("_", "-")
+             space = space.lower().replace("_", "-")
+             return f"https://{owner}-{space}.hf.space"
+
+     if cleaned.endswith("/web"):
+         cleaned = cleaned[:-4]
+         parsed = urlparse(cleaned)
+
+     host = (parsed.hostname or "").lower()
+     if host.endswith(".hf.space"):
+         safe_host = host.replace("_", "-")
+         if safe_host != host or (parsed.netloc and parsed.netloc != parsed.netloc.lower()):
+             port_part = f":{parsed.port}" if parsed.port else ""
+             parsed = parsed._replace(netloc=f"{safe_host}{port_part}")
+             cleaned = urlunparse(parsed)
+
+     return cleaned
+
+
+ def _smoke_test_model(client: OpenAI) -> bool:
+     print(f"[MODEL_CHECK] Testing model={MODEL_NAME} at {API_BASE_URL} ...", flush=True)
+     test_question = (
+         "You are a cloud scheduling agent. "
+         "A job queue is 80% full and a new urgent job just arrived. "
+         "Should you admit the job, reject it, or route it to another queue? "
+         "Answer with exactly one JSON object containing action_type and optional fields."
+     )
+     try:
+         resp = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[{"role": "user", "content": test_question}],
+             temperature=0.0,
+             max_tokens=80,
+         )
+         reply = (resp.choices[0].message.content or "").strip()
+         if not reply:
+             print("[MODEL_FAIL] Model returned an empty response.", flush=True)
+             return False
+         print("[MODEL_OK] model endpoint reachable.", flush=True)
+         return True
+     except Exception as exc:
+         print(f"[MODEL_FAIL] Cannot reach model: {exc}", flush=True)
+         return False
+
+
+ async def main() -> None:
+     if not API_KEY:
+         raise ValueError("API_KEY or HF_TOKEN is required for strict model inference.")
+
+     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+     if not _smoke_test_model(client):
+         raise RuntimeError("Model smoke test failed. Aborting strict model-only run.")
+
+     runtime_base_url = normalize_base_url(BASE_URL)
+     if runtime_base_url:
+         env = CloudQueueEnv(base_url=runtime_base_url)
+     else:
+         if not IMAGE_NAME:
+             raise ValueError("Set BASE_URL for deployed env, or IMAGE_NAME for local docker env.")
+         env = await CloudQueueEnv.from_docker_image(IMAGE_NAME)
+
+     try:
+         task_seed_map = parse_task_seed_map()
+         replay_map = load_replay_actions()
+         task_score_table: dict[str, list[float]] = {}
+         seed_rows: list[dict] = []
+
+         for task_name in TASKS:
+             seeds = task_seed_map.get(task_name, [])
+             if not seeds:
+                 continue
+
+             task_score_table[task_name] = []
+
+             for seed in seeds:
+                 history: List[str] = []
+                 rewards: List[float] = []
+                 steps_taken = 0
+                 score = 0.0
+                 success = False
+                 failure_reason: Optional[str] = None
+
+                 log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
+
+                 await env.reset()
+                 await env.step(CloudQueueAction(action_type="configure_task", task_id=task_name, seed=seed))
+                 result = await env.reset()
+
+                 last_reward = 0.0
+                 max_steps = max(1, int(result.observation.horizon))
+                 if MAX_STEPS_OVERRIDE > 0:
+                     max_steps = min(max_steps, MAX_STEPS_OVERRIDE)
+
+                 replay_key = f"{task_name}:{seed}"
+                 replay_actions = replay_map.get(replay_key, [])
+
+                 for step in range(1, max_steps + 1):
+                     if result.done:
+                         break
+
+                     obs = result.observation
+                     obs_summary = build_obs_summary(obs, task_name)
+
+                     action: Optional[CloudQueueAction] = None
+                     model_error: Optional[str] = None
+
+                     if step - 1 < len(replay_actions):
+                         action = replay_actions[step - 1]
+                     else:
+                         action, model_error = get_model_action_with_retry(
+                             client=client,
+                             task_name=task_name,
+                             step=step,
+                             obs_summary=obs_summary,
+                             last_reward=last_reward,
+                             history=history,
+                             retries=2,
+                         )
+
+                     if action is None:
+                         failure_reason = f"model_action_unavailable: {model_error}"
+                         log_step(
+                             step=step,
+                             action="model_action_error",
+                             reward=0.0,
+                             done=True,
+                             error=failure_reason,
+                         )
+                         steps_taken = step
+                         break
+
+                     result = await env.step(action)
+                     reward = float(result.reward or 0.0)
+                     done = bool(result.done)
+                     error = None
+                     meta = result.observation.metadata or {}
+                     info = meta.get("info", {}) if isinstance(meta, dict) else {}
+                     if isinstance(info, dict) and info.get("valid_action") is False:
+                         error = str(info.get("note", "invalid_action"))
+
+                     rewards.append(reward)
+                     steps_taken = step
+                     last_reward = reward
+
+                     action_str = (
+                         f"{action.action_type}(q={action.target_queue},s={action.target_server},"
+                         f"d={action.scale_delta},p={action.new_priority})"
+                     )
+                     log_step(step=step, action=action_str, reward=reward, done=done, error=error)
+                     history.append(f"step={step} action={action_str} reward={reward:.2f}")
+
+                     if done:
+                         break
+
+                 if failure_reason is None and isinstance(result.observation.metadata, dict):
+                     score = float(result.observation.metadata.get("episode_score", 0.0) or 0.0)
+                     _m = result.observation.metadata
+                     print(
+                         f"[DEBUG_META] task={task_name} seed={seed} "
+                         f"episode_score={_m.get('episode_score')} "
+                         f"score_details={_m.get('score_details')} "
+                         f"metrics_completed={_m.get('metrics', {}).get('completed')} "
+                         f"metrics_arrivals={_m.get('metrics', {}).get('arrivals')}",
+                         flush=True,
+                     )
+                 elif failure_reason is not None:
+                     score = 0.0
+
+                 if failure_reason is None and not bool(result.done):
+                     failure_reason = "episode_not_done_within_max_steps"
+                     print(
+                         "[DEBUG] Episode ended early before done=true; "
+                         "set MAX_STEPS_OVERRIDE=0 or unset it for valid benchmark scores.",
+                         flush=True,
+                     )
+                     score = 0.0
+
+                 score = max(0.0, min(1.0, score))
+                 task_score_table[task_name].append(score)
+                 success = failure_reason is None and score >= SUCCESS_SCORE_THRESHOLD
+                 log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+                 meta = result.observation.metadata or {}
+                 metrics = meta.get("metrics", {}) if isinstance(meta, dict) else {}
+                 seed_row = {
+                     "task": task_name,
+                     "seed": int(seed),
+                     "score": round(score, 6),
+                     "steps": int(steps_taken),
+                     "success": bool(success),
+                     "trace_digest": str(meta.get("trace_digest", "")),
+                     "invalid_actions": float(metrics.get("invalid_actions", 0.0)),
+                     "harmful_scale_down": float(metrics.get("harmful_scale_down", 0.0)),
+                     "failure_reason": failure_reason or "",
+                 }
+                 seed_rows.append(seed_row)
+                 print(
+                     "[REPORT_SEED] "
+                     f"task={seed_row['task']} seed={seed_row['seed']} score={seed_row['score']:.3f} "
+                     f"steps={seed_row['steps']} trace={seed_row['trace_digest']}",
+                     flush=True,
+                 )
+
+             task_scores = task_score_table[task_name]
+             task_mean = statistics.mean(task_scores) if task_scores else 0.0
+             task_std = statistics.pstdev(task_scores) if len(task_scores) > 1 else 0.0
+             task_ci = ci95(task_scores)
+             print(
+                 f"[REPORT] task={task_name} seeds={len(task_scores)} mean={task_mean:.3f} std={task_std:.3f} ci95={task_ci:.3f}",
+                 flush=True,
+             )
+
+         all_task_means = []
+         for task_name in TASKS:
+             scores = task_score_table.get(task_name, [])
+             if scores:
+                 all_task_means.append(statistics.mean(scores))
+
+         if all_task_means:
+             final_score = sum(all_task_means) / len(all_task_means)
+             easy_mean = statistics.mean(task_score_table.get("easy", [0.0]))
+             medium_mean = statistics.mean(task_score_table.get("medium", [0.0]))
+             hard_mean = statistics.mean(task_score_table.get("hard", [0.0]))
+             print(
+                 f"[SUMMARY] easy={easy_mean:.3f} medium={medium_mean:.3f} hard={hard_mean:.3f} final={final_score:.3f}",
+                 flush=True,
+             )
+
+         write_reports(seed_rows=seed_rows, task_score_table=task_score_table)
+
+     finally:
+         try:
+             await env.close()
+         except Exception as exc:
+             print(f"[DEBUG] env.close() error (container cleanup): {exc}", flush=True)
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
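`load_replay_actions()` above accepts a JSON file pointed to by `ACTION_TRACE_FILE`. The script does not ship a sample trace, so the shape below is inferred from `replay_key = f"{task_name}:{seed}"` and `_action_from_dict()`: top-level keys are `"<task>:<seed>"` and each value is a list of action dicts. A minimal sketch with illustrative actions:

```python
import json
import tempfile

# Hypothetical replay trace in the shape load_replay_actions() expects:
# keys are "<task>:<seed>", values are lists of action dicts.
trace = {
    "easy:11": [
        {"action_type": "admit", "target_queue": 0},
        {"action_type": "dispatch", "target_queue": 0},
        {"action_type": "noop", "target_queue": None},
    ]
}

# Write it out; point ACTION_TRACE_FILE at this path before running.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(trace, f)
    trace_path = f.name

# Round-trip to confirm the file parses back into the expected shape.
with open(trace_path, encoding="utf-8") as f:
    loaded = json.load(f)
```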
models.py ADDED
@@ -0,0 +1,55 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Data models for the Cloud Queue Env queue operations environment."""
+
+ from openenv.core.env_server.types import Action, Observation
+ from pydantic import Field
+
+
+ class CloudQueueAction(Action):
+     """Action model for queue control decisions."""
+
+     action_type: str = Field(
+         default="noop",
+         description=(
+             "One of: configure_task, admit, reject, route, dispatch, scale, reprioritize, noop"
+         ),
+     )
+     target_queue: int | None = Field(default=None, description="Queue index for admit/route/dispatch")
+     target_server: int | None = Field(default=None, description="Server index for dispatch")
+     scale_delta: int | None = Field(default=None, description="Server pool scale delta for scale action")
+     new_priority: int | None = Field(default=None, description="Updated priority for reprioritize action")
+     task_id: str | None = Field(default=None, description="Task selector: easy, medium, or hard")
+     seed: int | None = Field(default=None, description="Deterministic seed for upcoming reset")
+
+
+ class CloudQueueObservation(Observation):
+     """Observation model exposing queue system state to the agent."""
+
+     task_id: str = Field(default="easy", description="Active benchmark task")
+     sim_time: int = Field(default=0, description="Discrete simulation time step")
+     horizon: int = Field(default=0, description="Episode horizon")
+     queue_lengths: list[int] = Field(default_factory=list, description="Length per queue")
+     queue_wait_ema: list[float] = Field(default_factory=list, description="EMA wait time per queue")
+     server_busy: list[int] = Field(default_factory=list, description="1 if server is busy, else 0")
+     server_remaining_service: list[float] = Field(
+         default_factory=list,
+         description="Remaining service time per server",
+     )
+     utilization: list[float] = Field(default_factory=list, description="Rolling utilization by server")
+     incoming_job_present: bool = Field(default=False, description="Whether a new job is waiting for admission")
+     incoming_job_size: float = Field(default=0.0, description="Incoming job estimated size")
+     incoming_job_priority: int = Field(default=0, description="Incoming job priority")
+     incoming_job_deadline: float = Field(default=0.0, description="Incoming job deadline")
+     incoming_job_type: int = Field(default=0, description="Incoming job class/type id")
+     sla_violation_rate: float = Field(default=0.0, description="Running SLA violation rate")
+     abandonment_rate: float = Field(default=0.0, description="Running abandonment rate")
+     throughput_recent: float = Field(default=0.0, description="Completed jobs in current step")
+     energy_cost_rate: float = Field(default=0.0, description="Current infrastructure cost rate")
+     level: float = Field(default=1.0, description="Difficulty level scalar")
+     optional_history: list[float] = Field(default_factory=list, description="Compact recent context")
+     action_mask: list[int] = Field(default_factory=list, description="Optional valid action hints")
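`CloudQueueAction` serializes to a flat JSON object. A minimal sketch of the wire payload for a `configure_task` call; the field names come from the model above, the values are illustrative, and the actual serialization is handled by pydantic inside openenv:

```python
import json

# Illustrative payload mirroring CloudQueueAction's fields (shape only).
configure = {
    "action_type": "configure_task",
    "target_queue": None,
    "target_server": None,
    "scale_delta": None,
    "new_priority": None,
    "task_id": "medium",  # task selector: easy, medium, or hard
    "seed": 23,           # deterministic seed for the upcoming reset
}

# Encode as it would travel over HTTP, then decode to verify round-tripping.
payload = json.dumps(configure)
decoded = json.loads(payload)
```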
openenv.yaml ADDED
@@ -0,0 +1,30 @@
+ spec_version: 1
+ name: cloud_queue_env
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+
+ metadata:
+   description: >
+     A real-world queueing control environment where an agent manages
+     cloud request scheduling decisions (admission control, routing,
+     dispatching, and dynamic server scaling) under stochastic arrivals
+     and service times. Optimizes latency, throughput, SLA compliance,
+     fairness, and infrastructure cost across three benchmark tasks
+     (Easy / Medium / Hard) with deterministic graders scored in (0, 1).
+   tags:
+     - openenv
+     - reinforcement-learning
+     - queueing
+     - scheduling
+     - cloud-operations
+     - multi-objective
+     - llm-agent
+   difficulty: easy-to-hard
+   tasks:
+     - easy
+     - medium
+     - hard
+   author: Mrkumar007
+
pyproject.toml ADDED
@@ -0,0 +1,45 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "openenv-cloud_queue_env"
+ version = "0.1.0"
+ description = "Cloud Queue Env environment for OpenEnv"
+ requires-python = ">=3.10"
+ dependencies = [
+     # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+     # install from github
+     # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+     "openenv-core[core]>=0.2.2",
+     # Environment-specific dependencies
+     # Add all dependencies needed for your environment here
+     # Examples:
+     # "numpy>=1.19.0",
+     # "torch>=2.0.0",
+     # "gymnasium>=0.29.0",
+     # "openspiel>=1.0.0",
+     # "smolagents>=1.22.0,<2",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=8.0.0",
+     "pytest-cov>=4.0.0",
+ ]
+
+ [project.scripts]
+ # Server entry point - enables running via: uv run --project . server
+ # or: python -m cloud_queue_env.server.app
+ server = "cloud_queue_env.server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = ["cloud_queue_env", "cloud_queue_env.server"]
+ package-dir = { "cloud_queue_env" = ".", "cloud_queue_env.server" = "server" }
ref_inference.py ADDED
@@ -0,0 +1,188 @@
+ """
+ Inference Script Example
+ ===================================
+ MANDATORY
+ - Before submitting, ensure the following variables are defined in your environment configuration:
+     API_BASE_URL      The API endpoint for the LLM.
+     MODEL_NAME        The model identifier to use for inference.
+     HF_TOKEN          Your Hugging Face / API key.
+     LOCAL_IMAGE_NAME  The name of the local image to use for the environment
+                       if you are using the from_docker_image() method.
+
+ - Defaults are set only for API_BASE_URL and MODEL_NAME
+   (and should reflect your active inference setup):
+     API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
+     MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")
+
+ - The inference script must be named `inference.py` and placed in the root directory of the project.
+ - Participants must use the OpenAI client for all LLM calls, configured with the variables above.
+
+ STDOUT FORMAT
+ - The script must emit exactly three line types to stdout, in this order:
+
+     [START] task=<task_name> env=<benchmark> model=<model_name>
+     [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+     [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+
+ Rules:
+ - One [START] line at episode begin.
+ - One [STEP] line per step, immediately after env.step() returns.
+ - One [END] line after env.close(), always emitted (even on exception).
+ - reward and rewards are formatted to 2 decimal places.
+ - done and success are lowercase booleans: true or false.
+ - error is the raw last_action_error string, or null if none.
+ - All fields on a single line with no newlines within a line.
+ - Each task must return a score in [0, 1].
+
+ Example:
+     [START] task=click-test env=miniwob model=Qwen3-VL-30B
+     [STEP] step=1 action=click('123') reward=0.00 done=false error=null
+     [STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null
+     [STEP] step=3 action=click('789') reward=1.00 done=true error=null
+     [END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00
+ """
+
+ import asyncio
+ import os
+ import textwrap
+ from typing import List, Optional
+
+ from openai import OpenAI
+
+ from my_env_v4 import MyEnvV4Action, MyEnvV4Env
+
+ IMAGE_NAME = os.getenv("IMAGE_NAME")  # If you are using docker image
+ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+
+ API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
+ MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
+ TASK_NAME = os.getenv("MY_ENV_V4_TASK", "echo")
+ BENCHMARK = os.getenv("MY_ENV_V4_BENCHMARK", "my_env_v4")
+ MAX_STEPS = 8
+ TEMPERATURE = 0.7
+ MAX_TOKENS = 150
+ SUCCESS_SCORE_THRESHOLD = 0.1  # normalized score in [0, 1]
+
+ # Max possible reward: each token contributes 0.1, across all steps
+ _MAX_REWARD_PER_STEP = MAX_TOKENS * 0.1
+ MAX_TOTAL_REWARD = MAX_STEPS * _MAX_REWARD_PER_STEP
+
+ SYSTEM_PROMPT = textwrap.dedent(
+     """
+     You are interacting with a simple echo environment.
+     Each turn you must send a message. The environment will echo it back.
+     Reward is proportional to message length: reward = len(message) * 0.1
+     Your goal is to maximize total reward by sending meaningful, substantive messages.
+     Reply with exactly one message string — no quotes, no prefixes, just the message text.
+     """
+ ).strip()
+
+
+ def log_start(task: str, env: str, model: str) -> None:
+     print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+     error_val = error if error else "null"
+     done_val = str(done).lower()
+     print(
+         f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
+         flush=True,
+     )
+
+
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+     print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)
+
+
+ def build_user_prompt(step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+     history_block = "\n".join(history[-4:]) if history else "None"
+     return textwrap.dedent(
+         f"""
+         Step: {step}
+         Last echoed message: {last_echoed!r}
+         Last reward: {last_reward:.2f}
+         Previous steps:
+         {history_block}
+         Send your next message.
+         """
+     ).strip()
+
+
+ def get_model_message(client: OpenAI, step: int, last_echoed: str, last_reward: float, history: List[str]) -> str:
+     user_prompt = build_user_prompt(step, last_echoed, last_reward, history)
+     try:
+         completion = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": user_prompt},
+             ],
+             temperature=TEMPERATURE,
+             max_tokens=MAX_TOKENS,
+             stream=False,
+         )
+         text = (completion.choices[0].message.content or "").strip()
+         return text if text else "hello"
+     except Exception as exc:
+         print(f"[DEBUG] Model request failed: {exc}", flush=True)
+         return "hello"
+
+
+ async def main() -> None:
+     client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+
+     env = await MyEnvV4Env.from_docker_image(IMAGE_NAME)
+
+     history: List[str] = []
+     rewards: List[float] = []
+     steps_taken = 0
+     score = 0.0
+     success = False
+
+     log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+
+     try:
+         result = await env.reset()  # OpenEnv reset()
+         last_echoed = result.observation.echoed_message
+         last_reward = 0.0
+
+         for step in range(1, MAX_STEPS + 1):
+             if result.done:
+                 break
+
+             message = get_model_message(client, step, last_echoed, last_reward, history)
+
+             result = await env.step(MyEnvV4Action(message=message))
+             obs = result.observation
+
+             reward = result.reward or 0.0
+             done = result.done
+             error = None
+
+             rewards.append(reward)
+             steps_taken = step
+             last_echoed = obs.echoed_message
+             last_reward = reward
+
+             log_step(step=step, action=message, reward=reward, done=done, error=error)
+
+             history.append(f"Step {step}: {message!r} -> reward {reward:+.2f}")
+
+             if done:
+                 break
+
+         score = sum(rewards) / MAX_TOTAL_REWARD if MAX_TOTAL_REWARD > 0 else 0.0
+         score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
+         success = score >= SUCCESS_SCORE_THRESHOLD
+
+     finally:
+         try:
+             await env.close()
+         except Exception as e:
+             print(f"[DEBUG] env.close() error (container cleanup): {e}", flush=True)
+         log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
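The `[START]`/`[STEP]`/`[END]` stdout contract above is easy to lint mechanically before submitting. A hedged sketch of such a checker — the regexes below are my own approximation of the documented format, not part of any official validator:

```python
import re

# One pattern per documented line type; fields mirror the STDOUT FORMAT spec.
STEP_RE = re.compile(
    r"\[STEP\] step=(\d+) action=(.+) reward=(-?\d+\.\d{2}) done=(true|false) error=(.+)"
)
END_RE = re.compile(
    r"\[END\] success=(true|false) steps=(\d+) score=(-?\d+\.\d+) rewards=(.*)"
)


def parse_episode(lines: list[str]) -> tuple[float, list[float]]:
    """Check [START]/[STEP]/[END] ordering and return (score, per-step rewards)."""
    if not lines or not lines[0].startswith("[START]"):
        raise ValueError("episode must begin with a [START] line")
    for line in lines[1:-1]:
        if not STEP_RE.fullmatch(line):
            raise ValueError(f"malformed step line: {line!r}")
    end = END_RE.fullmatch(lines[-1])
    if not end:
        raise ValueError("episode must end with an [END] line")
    score = float(end.group(3))
    rewards = [float(r) for r in end.group(4).split(",")] if end.group(4) else []
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if len(rewards) != int(end.group(2)):
        raise ValueError("rewards count must match the steps field")
    return score, rewards


# The example episode from the docstring above.
EXAMPLE = [
    "[START] task=click-test env=miniwob model=Qwen3-VL-30B",
    "[STEP] step=1 action=click('123') reward=0.00 done=false error=null",
    "[STEP] step=2 action=fill('456','text') reward=0.00 done=false error=null",
    "[STEP] step=3 action=click('789') reward=1.00 done=true error=null",
    "[END] success=true steps=3 score=1.00 rewards=0.00,0.00,1.00",
]

score, rewards = parse_episode(EXAMPLE)
```

Running the checker over a captured log before submission catches ordering mistakes (for example, a missing `[END]` line after an exception) cheaply.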
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Queue Env environment server components."""
+
+ from .cloud_queue_env_environment import CloudQueueEnvironment
+
+ __all__ = ["CloudQueueEnvironment"]
server/app.py ADDED
@@ -0,0 +1,89 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ FastAPI application for the Cloud Queue Env Environment.
+
+ This module creates an HTTP server that exposes the CloudQueueEnvironment
+ over HTTP and WebSocket endpoints, compatible with EnvClient.
+
+ Endpoints:
+     - POST /reset: Reset the environment
+     - POST /step: Execute an action
+     - GET /state: Get current environment state
+     - GET /schema: Get action/observation schemas
+     - WS /ws: WebSocket endpoint for persistent sessions
+
+ Usage:
+     # Development (with auto-reload):
+     uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+
+     # Production:
+     uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
+
+     # Or run directly:
+     python -m server.app
+ """
+
+ try:
+     from openenv.core.env_server.http_server import create_app
+ except Exception as e:  # pragma: no cover
+     raise ImportError(
+         "openenv is required for the web interface. Install dependencies with 'uv sync'."
+     ) from e
+
+ try:
+     from ..models import CloudQueueAction, CloudQueueObservation
+     from .cloud_queue_env_environment import CloudQueueEnvironment
+ except ImportError:
+     from models import CloudQueueAction, CloudQueueObservation
+     from server.cloud_queue_env_environment import CloudQueueEnvironment
+
+
+ # Create the app with web interface and README integration
+ app = create_app(
+     CloudQueueEnvironment,
+     CloudQueueAction,
+     CloudQueueObservation,
+     env_name="cloud_queue_env",
+     max_concurrent_envs=1,  # increase this number to allow more concurrent WebSocket sessions
+ )
+
+
+ def main(host: str = "0.0.0.0", port: int = 8000) -> None:
+     """
+     Entry point for direct execution via uv run or python -m.
+
+     This function enables running the server without Docker:
+         uv run --project . server
+         uv run --project . server --port 8001
+         python -m cloud_queue_env.server.app
+
+     Args:
+         host: Host address to bind to (default: "0.0.0.0")
+         port: Port number to listen on (default: 8000)
+
+     For production deployments, consider using uvicorn directly with
+     multiple workers:
+         uvicorn cloud_queue_env.server.app:app --workers 4
+     """
+     import uvicorn
+
+     uvicorn.run(app, host=host, port=port)
+
+
+ def _cli_main() -> None:
+     import argparse
+
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--port", type=int, default=8000)
+     parser.add_argument("--host", type=str, default="0.0.0.0")
+     args = parser.parse_args()
+     main(host=args.host, port=args.port)
+
+
+ if __name__ == "__main__":
+     _cli_main()  # parse --host/--port so the documented flags actually take effect
server/cloud_queue_env_environment.py ADDED
@@ -0,0 +1,762 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Queue operations environment with deterministic task grading."""
8
+
9
+ import math
10
+ import random
11
+ import hashlib
12
+ from collections import deque
13
+ from dataclasses import dataclass
14
+ from uuid import uuid4
15
+
16
+ from openenv.core.env_server.interfaces import Environment
17
+ from openenv.core.env_server.types import State
18
+
19
+ try:
20
+ from ..models import CloudQueueAction, CloudQueueObservation
21
+ except ImportError:
22
+ from models import CloudQueueAction, CloudQueueObservation
23
+
24
+
25
+ @dataclass
26
+ class TaskConfig:
27
+ task_id: str
28
+ horizon: int
29
+ level: float
30
+ queue_count: int
31
+ initial_servers: int
32
+ min_servers: int
33
+ max_servers: int
34
+ arrival_rate: float
35
+ urgent_ratio: float
36
+ service_mean: float
37
+ deadline_base: int
38
+ allow_scaling: bool
39
+ allow_priority: bool
40
+ two_stage: bool
41
+ server_cost: float
42
+ max_queue_size: int
43
+ score_refs: dict[str, float]
44
+
45
+
46
+ class CloudQueueEnvironment(Environment):
47
+ """Deterministic queueing environment with easy/medium/hard benchmark tasks."""
48
+
49
+ SUPPORTS_CONCURRENT_SESSIONS: bool = True
50
+
51
+ def __init__(self):
52
+ self._task_configs = self._build_task_configs()
53
+ self._active_task_id = "easy"
54
+ self._pending_task_id = "easy"
55
+ self._pending_seed = 7
56
+ self._rng_streams: dict[str, random.Random] = {}
57
+ self._rng_stream_seeds: dict[str, int] = {}
58
+ self._state = State(episode_id=str(uuid4()), step_count=0)
59
+ self._sim_time = 0
60
+ self._queues: list[deque[dict]] = []
61
+ self._servers: list[dict] = []
62
+ self._incoming_job: dict | None = None
63
+ self._done = False
64
+ self._wait_ema: list[float] = []
65
+ self._utilization_ema: list[float] = []
66
+ self._metrics: dict[str, float] = {}
67
+ self._recent_rewards: deque[float] = deque(maxlen=8)
68
+ self._action_trace: list[str] = []
69
+ self._reset_runtime_state()
70
+
71
+ def _build_task_configs(self) -> dict[str, TaskConfig]:
72
+ return {
73
+ "easy": TaskConfig(
74
+ task_id="easy",
75
+ horizon=150,
76
+ level=1.0,
77
+ queue_count=1,
78
+ initial_servers=1,
79
+ min_servers=1,
80
+ max_servers=1,
81
+ arrival_rate=0.78,
82
+ urgent_ratio=0.0,
83
+ service_mean=1.6,
84
+ deadline_base=10,
85
+ allow_scaling=False,
86
+ allow_priority=False,
87
+ two_stage=False,
88
+ server_cost=0.04,
89
+ max_queue_size=28,
90
+ score_refs={"wait": 6.0, "thr": 70.0, "rej": 0.3, "sla": 0.3},
91
+ ),
92
+ "medium": TaskConfig(
93
+ task_id="medium",
94
+ horizon=200,
95
+ level=2.3,
96
+ queue_count=2,
97
+ initial_servers=3,
98
+ min_servers=3, # scaling disabled on medium — lock to initial_servers
99
+ max_servers=3, # scaling disabled on medium — lock to initial_servers
100
+ arrival_rate=1.15,
101
+ urgent_ratio=0.28,
102
+ service_mean=1.8,
103
+ deadline_base=8,
104
+ allow_scaling=False,
105
+ allow_priority=True,
106
+ two_stage=False,
107
+ server_cost=0.06,
108
+ max_queue_size=42,
109
+ score_refs={"uw": 7.0, "nw": 10.0, "usla": 0.25, "thr": 125.0, "cost": 14.0},
110
+ ),
111
+ "hard": TaskConfig(
112
+ task_id="hard",
113
+ horizon=250,
114
+ level=4.0,
115
+ queue_count=2,
116
+ initial_servers=3,
117
+ min_servers=1,
118
+ max_servers=6,
119
+ arrival_rate=1.45,
120
+ urgent_ratio=0.35,
121
+ service_mean=2.2,
122
+ deadline_base=7,
123
+ allow_scaling=True,
124
+ allow_priority=True,
125
+ two_stage=True,
126
+ server_cost=0.1,
127
+ max_queue_size=64,
128
+ score_refs={
129
+ "e2e": 14.0,
130
+ "abd": 0.25,
131
+ "sla": 0.3,
132
+ "thr": 145.0,
133
+ "cost": 28.0,
134
+ "fair": 0.35,
135
+ },
136
+ ),
137
+ }
138
+
139
+ def _reset_runtime_state(self) -> None:
140
+ cfg = self._task_configs[self._active_task_id]
141
+ self._sim_time = 0
142
+ self._done = False
143
+ self._incoming_job = None
144
+ self._action_trace = []
145
+ self._queues = [deque() for _ in range(cfg.queue_count)]
146
+ self._servers = [
147
+ {"remaining": 0.0, "job": None, "active": True}
148
+ for _ in range(cfg.initial_servers)
149
+ ]
150
+ self._wait_ema = [0.0 for _ in range(cfg.queue_count)]
151
+ self._utilization_ema = [0.0 for _ in range(cfg.max_servers)]
152
+ self._recent_rewards.clear()
153
+ self._metrics = {
154
+ "arrivals": 0.0,
155
+ "accepted": 0.0,
156
+ "rejected": 0.0,
157
+ "completed": 0.0,
158
+ "completed_urgent": 0.0,
159
+ "abandoned": 0.0,
160
+ "wait_sum": 0.0,
161
+ "wait_count": 0.0,
162
+ "wait_sum_urgent": 0.0,
163
+ "wait_count_urgent": 0.0,
164
+ "wait_sum_normal": 0.0,
165
+ "wait_count_normal": 0.0,
166
+ "sla_breaches": 0.0,
167
+ "sla_breaches_urgent": 0.0,
168
+ "invalid_actions": 0.0,
169
+ "noop_under_load": 0.0,
170
+ "harmful_scale_down": 0.0,
171
+ "action_cost": 0.0,
172
+ "infra_cost": 0.0,
173
+ "fairness_gap_sum": 0.0,
174
+ "fairness_gap_count": 0.0,
175
+ }
176
+ self._wait_samples_all: list[float] = []
177
+ self._wait_samples_urgent: list[float] = []
178
+ self._wait_samples_normal: list[float] = []
179
+ self._e2e_wait_samples: list[float] = []
180
+
181
+ def _init_rng_streams(self, base_seed: int) -> None:
182
+ self._rng_stream_seeds = {
183
+ "arrivals": int(base_seed) + 101,
184
+ "service": int(base_seed) + 211,
185
+ "abandonment": int(base_seed) + 307,
186
+ "exogenous": int(base_seed) + 401,
187
+ }
188
+ self._rng_streams = {
189
+ key: random.Random(seed) for key, seed in self._rng_stream_seeds.items()
190
+ }
191
+
192
+ def _rng(self, stream: str) -> random.Random:
193
+ return self._rng_streams[stream]
194
+
195
+ def _sample_poisson(self, lam: float, rng: random.Random) -> int:
196
+ lam = max(0.0, lam)
197
+ if lam == 0.0:
198
+ return 0
199
+ # Knuth algorithm is sufficient for this environment's lambda scale.
200
+ l_term = math.exp(-lam)
201
+ k = 0
202
+ p = 1.0
203
+ while p > l_term:
204
+ k += 1
205
+ p *= rng.random()
206
+ return max(0, k - 1)
207
+
208
+ def _trace_digest(self) -> str:
209
+ raw = f"task={self._active_task_id}|seed={self._pending_seed}|" + "|".join(self._action_trace)
210
+ return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
211
+
212
+ def reset(self) -> CloudQueueObservation:
213
+ self._active_task_id = self._pending_task_id if self._pending_task_id in self._task_configs else "easy"
214
+ self._init_rng_streams(self._pending_seed)
215
+ self._state = State(episode_id=str(uuid4()), step_count=0)
216
+ self._reset_runtime_state()
217
+ return self._build_observation(reward=0.0, done=False, info={"event": "reset"})
218
+
219
+ def _clamp(self, value: float, lo: float, hi: float) -> float:
220
+ return max(lo, min(hi, value))
221
+
222
+ def _sample_service_time(self, cfg: TaskConfig) -> float:
223
+ service_rng = self._rng("service")
224
+ if cfg.task_id == "hard":
225
+ heavy = service_rng.random() < 0.22
226
+ if heavy:
227
+ return self._clamp(service_rng.lognormvariate(1.2, 0.7), 1.0, 12.0)
228
+ return self._clamp(service_rng.expovariate(1.0 / cfg.service_mean), 0.5, 10.0)
229
+
230
+ def _sample_arrivals(self, cfg: TaskConfig) -> int:
231
+ arrival_rng = self._rng("arrivals")
232
+ exogenous_rng = self._rng("exogenous")
233
+ rate = cfg.arrival_rate
234
+ if cfg.task_id == "hard":
235
+ wave = 0.35 * math.sin((self._sim_time + 1) / 13.0)
236
+ jitter = exogenous_rng.uniform(-0.05, 0.05)
237
+ rate += wave + jitter
238
+ return self._sample_poisson(rate, arrival_rng)
239
+
240
+ def _spawn_incoming_job(self, cfg: TaskConfig) -> None:
241
+ arrivals = self._sample_arrivals(cfg)
242
+ if arrivals <= 0:
243
+ self._incoming_job = None
244
+ return
245
+ arrival_rng = self._rng("arrivals")
246
+ priority = 2 if arrival_rng.random() < cfg.urgent_ratio else 1
247
+ size = self._sample_service_time(cfg)
248
+ self._incoming_job = {
249
+ "priority": priority,
250
+ "queue": 0,
251
+ "created_step": self._state.step_count,
252
+ "wait": 0.0,
253
+ "size": size,
254
+ "remaining": size,
255
+ "deadline": self._state.step_count + cfg.deadline_base - (1 if priority == 2 else 0),
256
+ "type": 1 if priority == 2 else 0,
257
+ "stage": 0,
258
+ }
259
+ self._metrics["arrivals"] += 1.0
260
+
261
+ def _update_wait_and_abandonment(self, cfg: TaskConfig) -> float:
262
+ abandonment_rng = self._rng("abandonment")
263
+ abandoned_this_step = 0.0
264
+ for qi, q in enumerate(self._queues):
265
+ kept: deque[dict] = deque()
266
+ while q:
267
+ job = q.popleft()
268
+ job["wait"] += 1.0
269
+ patience = cfg.deadline_base + (2 if job["priority"] == 2 else 4)
270
+ if cfg.task_id == "hard" and job["wait"] > patience and abandonment_rng.random() < 0.35:
271
+ abandoned_this_step += 1.0
272
+ continue
273
+ kept.append(job)
274
+ self._queues[qi] = kept
275
+ if abandoned_this_step:
276
+ self._metrics["abandoned"] += abandoned_this_step
277
+ return abandoned_this_step
278
+
279
+ def _complete_job(self, cfg: TaskConfig, job: dict) -> None:
280
+ if cfg.two_stage and job["stage"] == 0:
281
+ forwarded = dict(job)
282
+ forwarded["stage"] = 1
283
+ forwarded["queue"] = min(1, len(self._queues) - 1)
284
+ forwarded["remaining"] = self._sample_service_time(cfg)
285
+ self._queues[forwarded["queue"]].append(forwarded)
286
+ return
287
+
288
+ self._metrics["completed"] += 1.0
289
+ wait = float(self._state.step_count - job["created_step"])
290
+ self._metrics["wait_sum"] += wait
291
+ self._metrics["wait_count"] += 1.0
292
+ self._wait_samples_all.append(wait)
293
+ self._e2e_wait_samples.append(wait)
294
+ if job["priority"] == 2:
295
+ self._metrics["completed_urgent"] += 1.0
296
+ self._metrics["wait_sum_urgent"] += wait
297
+ self._metrics["wait_count_urgent"] += 1.0
298
+ self._wait_samples_urgent.append(wait)
299
+ else:
300
+ self._metrics["wait_sum_normal"] += wait
301
+ self._metrics["wait_count_normal"] += 1.0
302
+ self._wait_samples_normal.append(wait)
303
+ if self._state.step_count > job["deadline"]:
304
+ self._metrics["sla_breaches"] += 1.0
305
+ if job["priority"] == 2:
306
+ self._metrics["sla_breaches_urgent"] += 1.0
307
+
308
+ def _process_servers(self, cfg: TaskConfig) -> float:
309
+ completed_this_step = 0.0
310
+ for si, server in enumerate(self._servers):
311
+ if not server["active"]:
312
+ continue
313
+ if server["remaining"] > 0:
314
+ server["remaining"] = max(0.0, server["remaining"] - 1.0)
315
+ if server["remaining"] <= 0 and server["job"] is not None:
316
+ self._complete_job(cfg, server["job"])
317
+ completed_this_step += 1.0
318
+ server["job"] = None
319
+ busy_flag = 1.0 if server["job"] is not None else 0.0
320
+ if si < len(self._utilization_ema):
321
+ self._utilization_ema[si] = 0.9 * self._utilization_ema[si] + 0.1 * busy_flag
322
+ return completed_this_step
323
+
324
+ def _admit_job(self, cfg: TaskConfig, queue_idx: int) -> tuple[bool, str]:
325
+ if self._incoming_job is None:
326
+ return False, "no_incoming_job"
327
+ if queue_idx < 0 or queue_idx >= len(self._queues):
328
+ return False, "invalid_queue"
329
+ if len(self._queues[queue_idx]) >= cfg.max_queue_size:
330
+ self._metrics["rejected"] += 1.0
331
+ self._incoming_job = None
332
+ return True, "queue_full_rejected"
333
+ job = dict(self._incoming_job)
334
+ job["queue"] = queue_idx
335
+ self._queues[queue_idx].append(job)
336
+ self._incoming_job = None
337
+ self._metrics["accepted"] += 1.0
338
+ return True, "admitted"
339
+
340
+ def _dispatch(self, queue_idx: int | None) -> tuple[bool, str]:
341
+ target = 0 if queue_idx is None else queue_idx
342
+ if target < 0 or target >= len(self._queues):
343
+ return False, "invalid_dispatch_queue"
344
+ for server in self._servers:
345
+ if not server["active"]:
346
+ continue
347
+ if server["job"] is None and self._queues[target]:
348
+ server["job"] = self._queues[target].popleft()
349
+ server["remaining"] = server["job"]["remaining"]
350
+ return True, "dispatched"
351
+ return False, "no_idle_server_or_empty_queue"
352
+
353
+ def _autodispatch(self) -> None:
354
+ for server in self._servers:
355
+ if not server["active"] or server["job"] is not None:
356
+ continue
357
+ for q in self._queues:
358
+ if q:
359
+ server["job"] = q.popleft()
360
+ server["remaining"] = server["job"]["remaining"]
361
+ break
362
+
363
+ def _apply_action(self, action: CloudQueueAction, cfg: TaskConfig) -> tuple[bool, str]:
364
+ action_type = (action.action_type or "noop").lower()
365
+
366
+ if action_type == "configure_task":
367
+ if action.task_id and action.task_id in self._task_configs:
368
+ self._pending_task_id = action.task_id
369
+ if action.seed is not None:
370
+ self._pending_seed = int(action.seed)
371
+ return True, "configuration_updated_for_next_reset"
372
+
373
+ if self._done:
374
+ return False, "episode_already_done"
375
+
376
+ if action_type == "admit":
377
+ queue_idx = action.target_queue if action.target_queue is not None else 0
378
+ return self._admit_job(cfg, queue_idx)
379
+
380
+ if action_type == "reject":
381
+ if self._incoming_job is None:
382
+ return False, "no_incoming_job"
383
+ self._incoming_job = None
384
+ self._metrics["rejected"] += 1.0
385
+ return True, "rejected"
386
+
387
+ if action_type == "route":
388
+ queue_idx = action.target_queue if action.target_queue is not None else 0
389
+ return self._admit_job(cfg, queue_idx)
390
+
391
+ if action_type == "dispatch":
392
+ return self._dispatch(action.target_queue)
393
+
394
+ if action_type == "scale":
395
+ if not cfg.allow_scaling:
396
+ return False, "scaling_not_supported_for_task"
397
+ delta = action.scale_delta if action.scale_delta is not None else 0
398
+ if delta == 0:
399
+ return True, "no_scale_change"
400
+ active_count = sum(1 for s in self._servers if s["active"])
401
+ requested = int(self._clamp(active_count + delta, cfg.min_servers, cfg.max_servers))
402
+ if requested == active_count:
403
+ return True, "scale_clamped_no_change"
404
+ if requested > active_count:
405
+ for _ in range(requested - active_count):
406
+ self._servers.append({"remaining": 0.0, "job": None, "active": True})
407
+ self._utilization_ema.append(0.0)
408
+ else:
409
+ to_disable = active_count - requested
410
+ for server in reversed(self._servers):
411
+ if to_disable == 0:
412
+ break
413
+ if server["active"] and server["job"] is None:
414
+ server["active"] = False
415
+ to_disable -= 1
416
+ self._metrics["action_cost"] += abs(delta) * 0.35
417
+ return True, "scaled"
418
+
419
+ if action_type == "reprioritize":
420
+ if not cfg.allow_priority:
421
+ return False, "reprioritize_not_supported_for_task"
422
+ new_priority = 2 if (action.new_priority or 1) >= 2 else 1
423
+ for q in self._queues:
424
+ for job in q:
425
+ if job["priority"] == 1:
426
+ job["priority"] = new_priority
427
+ return True, "reprioritized"
428
+ return False, "no_eligible_job"
429
+
430
+ if action_type == "noop":
431
+ return True, "noop"
432
+
433
+ return False, "unknown_action_type"
434
+
435
+ def _percentile(self, values: list[float], p: float) -> float:
436
+ if not values:
437
+ return 0.0
438
+ ordered = sorted(values)
439
+ idx = int(self._clamp(round((len(ordered) - 1) * p), 0, len(ordered) - 1))
440
+ return float(ordered[idx])
441
+
442
+ def _safe_div(self, numerator: float, denominator: float) -> float:
443
+ if denominator <= 0:
444
+ return 0.0
445
+ return numerator / denominator
446
+
447
+ def _current_fairness_gap(self) -> float:
448
+ urgent_avg = self._safe_div(self._metrics["wait_sum_urgent"], self._metrics["wait_count_urgent"])
449
+ normal_avg = self._safe_div(self._metrics["wait_sum_normal"], self._metrics["wait_count_normal"])
450
+ scale = max(1.0, urgent_avg + normal_avg)
451
+ return abs(urgent_avg - normal_avg) / scale
452
+
453
+ def _compute_reward(
454
+ self,
455
+ cfg: TaskConfig,
456
+ action_ok: bool,
457
+ action_type: str,
458
+ action_scale_delta: int,
459
+ completed_step: float,
460
+ ) -> tuple[float, dict[str, float]]:
461
+ avg_wait = self._safe_div(self._metrics["wait_sum"], self._metrics["wait_count"])
462
+ queue_pressure = sum(len(q) for q in self._queues) / max(1.0, float(cfg.max_queue_size))
463
+ r_wait = -self._clamp(avg_wait / max(cfg.deadline_base, 1), 0.0, 1.5) - 0.15 * self._clamp(queue_pressure, 0.0, 1.5)
464
+ r_throughput = self._clamp(completed_step / max(1.0, float(cfg.initial_servers)), 0.0, 1.0)
465
+ total_decisions = max(1.0, self._metrics["completed"] + self._metrics["abandoned"])
466
+ r_sla = -self._clamp(self._metrics["sla_breaches"] / total_decisions, 0.0, 1.0)
467
+ active_servers = sum(1 for s in self._servers if s["active"])
468
+ r_cost = -self._clamp(active_servers / max(1.0, float(cfg.max_servers)), 0.0, 1.0)
469
+ fairness_gap = self._current_fairness_gap()
470
+ r_fair = -self._clamp(fairness_gap / 0.5, 0.0, 1.0)
471
+ r_safe = 0.0 if action_ok else -1.0
472
+ if not action_ok:
473
+ self._metrics["invalid_actions"] += 1.0
474
+ if action_type == "noop" and self._incoming_job is not None and sum(len(q) for q in self._queues) > 0:
475
+ r_safe -= 0.05
476
+ self._metrics["noop_under_load"] += 1.0
477
+
478
+ arrivals = max(1.0, self._metrics["arrivals"])
479
+ rejection_rate = self._safe_div(self._metrics["rejected"], arrivals)
480
+ if arrivals > 10 and rejection_rate > 0.4:
481
+ r_safe -= self._clamp((rejection_rate - 0.4) * 0.4, 0.0, 0.2)
482
+
483
+ if action_type == "scale" and action_scale_delta < 0 and queue_pressure > 0.45:
484
+ overload_penalty = self._clamp((queue_pressure - 0.45) * 0.5, 0.0, 0.25)
485
+ r_safe -= overload_penalty
486
+ self._metrics["harmful_scale_down"] += 1.0
487
+
488
+ reward = 0.35 * r_wait + 0.20 * r_throughput + 0.20 * r_sla + 0.15 * r_cost + 0.05 * r_fair + 0.05 * r_safe
489
+ reward = self._clamp(reward, -1.0, 1.0)
490
+ self._recent_rewards.append(reward)
491
+
492
+ self._metrics["infra_cost"] += active_servers * cfg.server_cost
493
+ self._metrics["fairness_gap_sum"] += fairness_gap
494
+ self._metrics["fairness_gap_count"] += 1.0
495
+
496
+ components = {
497
+ "wait": round(r_wait, 4),
498
+ "throughput": round(r_throughput, 4),
499
+ "sla": round(r_sla, 4),
500
+ "cost": round(r_cost, 4),
501
+ "fairness": round(r_fair, 4),
502
+ "safety": round(r_safe, 4),
503
+ }
504
+ return reward, components
505
+
+    def _score_task(self, cfg: TaskConfig) -> tuple[float, dict[str, float]]:
+        # c01: clamp individual sub-score components to [0, 1] inclusive.
+        def c01(value: float) -> float:
+            if not math.isfinite(value):
+                return 0.0
+            return self._clamp(value, 0.0, 1.0)
+
+        # strict01: final clamp applied only to the episode score.
+        # Validator requires score strictly in (0, 1) — never 0.0 or 1.0.
+        _SCORE_MIN = 0.001
+        _SCORE_MAX = 0.999
+
+        def strict01(value: float) -> float:
+            if not math.isfinite(value):
+                return _SCORE_MIN
+            return self._clamp(value, _SCORE_MIN, _SCORE_MAX)
+
+        completed = self._metrics["completed"]
+        arrivals = self._metrics["arrivals"]
+        rejected = self._metrics["rejected"]
+        avg_wait = self._safe_div(self._metrics["wait_sum"], self._metrics["wait_count"])
+        rejection_rate = self._safe_div(rejected, arrivals)
+        sla_rate = self._safe_div(self._metrics["sla_breaches"], max(1.0, completed))
+        throughput = completed
+        fairness_gap = self._safe_div(self._metrics["fairness_gap_sum"], self._metrics["fairness_gap_count"])
+
+        if cfg.task_id == "easy":
+            score_wait = c01(1.0 - avg_wait / cfg.score_refs["wait"])
+            score_thr = c01(throughput / cfg.score_refs["thr"])
+            score_rej = c01(1.0 - rejection_rate / cfg.score_refs["rej"])
+            score_sla = c01(1.0 - sla_rate / cfg.score_refs["sla"])
+            score = 0.4 * score_wait + 0.3 * score_thr + 0.15 * score_rej + 0.15 * score_sla
+            details = {
+                "score_wait": round(score_wait, 4),
+                "score_throughput": round(score_thr, 4),
+                "score_rejection": round(score_rej, 4),
+                "score_sla": round(score_sla, 4),
+            }
+        elif cfg.task_id == "medium":
+            p95_u = self._percentile(self._wait_samples_urgent, 0.95)
+            p95_n = self._percentile(self._wait_samples_normal, 0.95)
+            urgent_sla = self._safe_div(self._metrics["sla_breaches_urgent"], max(1.0, self._metrics["completed_urgent"]))
+            s_uw = c01(1.0 - p95_u / cfg.score_refs["uw"])
+            s_nw = c01(1.0 - p95_n / cfg.score_refs["nw"])
+            s_usla = c01(1.0 - urgent_sla / cfg.score_refs["usla"])
+            s_thr = c01(throughput / cfg.score_refs["thr"])
+            s_cost = c01(1.0 - self._metrics["action_cost"] / cfg.score_refs["cost"])
+            score = 0.35 * s_uw + 0.15 * s_nw + 0.25 * s_usla + 0.15 * s_thr + 0.10 * s_cost
+            details = {
+                "score_urgent_wait": round(s_uw, 4),
+                "score_normal_wait": round(s_nw, 4),
+                "score_urgent_sla": round(s_usla, 4),
+                "score_throughput": round(s_thr, 4),
+                "score_cost": round(s_cost, 4),
+            }
+        else:
+            e2e_p95 = self._percentile(self._e2e_wait_samples, 0.95)
+            abd_rate = self._safe_div(self._metrics["abandoned"], arrivals)
+            s_e2e = c01(1.0 - e2e_p95 / cfg.score_refs["e2e"])
+            s_abd = c01(1.0 - abd_rate / cfg.score_refs["abd"])
+            s_sla = c01(1.0 - sla_rate / cfg.score_refs["sla"])
+            s_thr = c01(throughput / cfg.score_refs["thr"])
+            s_cost = c01(1.0 - self._metrics["infra_cost"] / cfg.score_refs["cost"])
+            s_fair = c01(1.0 - fairness_gap / cfg.score_refs["fair"])
+            score = 0.25 * s_e2e + 0.20 * s_abd + 0.20 * s_sla + 0.15 * s_thr + 0.10 * s_cost + 0.10 * s_fair
+            details = {
+                "score_e2e_p95": round(s_e2e, 4),
+                "score_abandonment": round(s_abd, 4),
+                "score_sla": round(s_sla, 4),
+                "score_throughput": round(s_thr, 4),
+                "score_cost": round(s_cost, 4),
+                "score_fairness": round(s_fair, 4),
+            }
+
+        if self._metrics["invalid_actions"] > max(3.0, 0.04 * cfg.horizon):
+            score = min(score, 0.4)
+        # Apply strict open-interval clamp: validator rejects 0.0 and 1.0.
+        return strict01(score), details
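The open-interval clamp is small enough to check in isolation. A standalone sketch (constants and control flow mirror the diff; the module-level names are otherwise illustrative):

```python
import math

SCORE_MIN, SCORE_MAX = 0.001, 0.999  # validator accepts only scores strictly inside (0, 1)

def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))

def strict01(value: float) -> float:
    # Non-finite inputs (NaN/inf) collapse to the floor instead of propagating.
    if not math.isfinite(value):
        return SCORE_MIN
    return clamp(value, SCORE_MIN, SCORE_MAX)

print(strict01(0.0))           # 0.001
print(strict01(float("nan")))  # 0.001
```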
+
+    def _compute_action_mask(self, cfg: TaskConfig) -> list[int]:
+        """Compute which of the 8 actions are valid right now.
+
+        Slot order (matches CloudQueueAction.action_type):
+          0: configure_task — always valid (meta, sets next task/seed)
+          1: admit — only if an incoming job is waiting
+          2: reject — only if an incoming job is waiting
+          3: route — only if an incoming job is waiting
+          4: dispatch — only if an idle+active server AND a non-empty queue exist
+          5: scale — only if cfg.allow_scaling is True
+          6: reprioritize — only if cfg.allow_priority AND a normal-priority job is queued
+          7: noop — always valid
+        """
+        has_incoming = self._incoming_job is not None
+
+        has_idle_server = any(
+            s["active"] and s["job"] is None for s in self._servers
+        )
+        has_queued_job = any(len(q) > 0 for q in self._queues)
+        can_dispatch = 1 if (has_idle_server and has_queued_job) else 0
+
+        can_reprioritize = 0
+        if cfg.allow_priority:
+            can_reprioritize = 1 if any(
+                job["priority"] == 1 for q in self._queues for job in q
+            ) else 0
+
+        return [
+            1,                          # 0: configure_task
+            1 if has_incoming else 0,   # 1: admit
+            1 if has_incoming else 0,   # 2: reject
+            1 if has_incoming else 0,   # 3: route
+            can_dispatch,               # 4: dispatch
+            1 if cfg.allow_scaling else 0,  # 5: scale
+            can_reprioritize,           # 6: reprioritize
+            1,                          # 7: noop
+        ]
+
+    def _build_observation(self, reward: float, done: bool, info: dict) -> CloudQueueObservation:
+        cfg = self._task_configs[self._active_task_id]
+        queue_lengths = [len(q) for q in self._queues]
+        for i, q in enumerate(self._queues):
+            current_mean_wait = 0.0
+            if q:
+                current_mean_wait = sum(job["wait"] for job in q) / len(q)
+            self._wait_ema[i] = 0.8 * self._wait_ema[i] + 0.2 * current_mean_wait
+
+        active_servers = max(1, sum(1 for s in self._servers if s["active"]))
+        completed = max(1.0, self._metrics["completed"])
+        sla_violation_rate = self._safe_div(self._metrics["sla_breaches"], completed)
+        abandonment_rate = self._safe_div(self._metrics["abandoned"], max(1.0, self._metrics["arrivals"]))
+        throughput_recent = max(0.0, info.get("completed_this_step", 0.0))
+        energy_cost_rate = active_servers * cfg.server_cost
+
+        incoming = self._incoming_job
+        incoming_present = incoming is not None
+        incoming_size = float(incoming["size"]) if incoming_present else 0.0
+        incoming_priority = int(incoming["priority"]) if incoming_present else 0
+        incoming_deadline = float(incoming["deadline"]) if incoming_present else 0.0
+        incoming_type = int(incoming["type"]) if incoming_present else 0
+
+        score, score_details = (0.0, {})
+        if done:
+            score, score_details = self._score_task(cfg)
+
+        metadata = {
+            "info": info,
+            "reward_components": info.get("reward_components", {}),
+            "applied_action": info.get("applied_action", "noop"),
+            "seed": int(self._pending_seed),
+            "trace_digest": self._trace_digest(),
+            "rng_stream_seeds": self._rng_stream_seeds,
+            "metrics": {
+                "arrivals": self._metrics["arrivals"],
+                "accepted": self._metrics["accepted"],
+                "rejected": self._metrics["rejected"],
+                "completed": self._metrics["completed"],
+                "abandoned": self._metrics["abandoned"],
+                "invalid_actions": self._metrics["invalid_actions"],
+                "harmful_scale_down": self._metrics["harmful_scale_down"],
+                "infra_cost": round(self._metrics["infra_cost"], 4),
+            },
+            "episode_score": round(score, 4),
+            "score_details": score_details,
+        }
+
+        return CloudQueueObservation(
+            task_id=cfg.task_id,
+            sim_time=self._sim_time,
+            horizon=cfg.horizon,
+            queue_lengths=queue_lengths,
+            queue_wait_ema=[round(v, 3) for v in self._wait_ema],
+            server_busy=[1 if s["job"] is not None and s["active"] else 0 for s in self._servers],
+            server_remaining_service=[round(float(s["remaining"]), 3) for s in self._servers],
+            utilization=[round(v, 3) for v in self._utilization_ema[: len(self._servers)]],
+            incoming_job_present=incoming_present,
+            incoming_job_size=round(incoming_size, 3),
+            incoming_job_priority=incoming_priority,
+            incoming_job_deadline=round(incoming_deadline, 3),
+            incoming_job_type=incoming_type,
+            sla_violation_rate=round(sla_violation_rate, 4),
+            abandonment_rate=round(abandonment_rate, 4),
+            throughput_recent=round(throughput_recent, 4),
+            energy_cost_rate=round(energy_cost_rate, 4),
+            level=cfg.level,
+            optional_history=[round(v, 4) for v in list(self._recent_rewards)],
+            action_mask=self._compute_action_mask(cfg),
+            done=done,
+            reward=round(reward, 6),
+            metadata=metadata,
+        )
+
+    def step(self, action: CloudQueueAction) -> CloudQueueObservation:  # type: ignore[override]
+        cfg = self._task_configs[self._active_task_id]
+
+        if (action.action_type or "").lower() == "configure_task":
+            ok, note = self._apply_action(action, cfg)
+            info = {
+                "event": "configure_task",
+                "applied_action": action.action_type,
+                "valid_action": ok,
+                "note": note,
+                "completed_this_step": 0.0,
+                "debug_trace_id": self._trace_digest(),
+            }
+            return self._build_observation(reward=0.0, done=self._done, info=info)
+
+        if self._done:
+            info = {
+                "event": "episode_done",
+                "applied_action": action.action_type,
+                "valid_action": False,
+                "note": "call reset() to start a new episode",
+                "completed_this_step": 0.0,
+                "reward_components": {},
+                "debug_trace_id": self._trace_digest(),
+            }
+            return self._build_observation(reward=0.0, done=True, info=info)
+
+        self._state.step_count += 1
+        self._sim_time += 1
+
+        completed_this_step = self._process_servers(cfg)
+        abandoned_this_step = self._update_wait_and_abandonment(cfg)
+        self._spawn_incoming_job(cfg)
+
+        action_ok, action_note = self._apply_action(action, cfg)
+        action_key = (
+            f"{(action.action_type or 'noop').lower()}|"
+            f"q={action.target_queue}|s={action.target_server}|"
+            f"d={action.scale_delta}|p={action.new_priority}"
+        )
+        self._action_trace.append(action_key)
+        self._autodispatch()
+        reward, reward_components = self._compute_reward(
+            cfg,
+            action_ok=action_ok,
+            action_type=(action.action_type or "noop").lower(),
+            action_scale_delta=int(action.scale_delta or 0),
+            completed_step=completed_this_step,
+        )
+
+        self._done = self._state.step_count >= cfg.horizon
+        info = {
+            "event": "step",
+            "applied_action": action.action_type,
+            "valid_action": action_ok,
+            "note": action_note,
+            "completed_this_step": completed_this_step,
+            "abandoned_this_step": abandoned_this_step,
+            "reward_components": reward_components,
+            "debug_trace_id": self._trace_digest(),
+        }
+        return self._build_observation(reward=reward, done=self._done, info=info)
+
+    @property
+    def state(self) -> State:
+        return self._state
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+
uv.lock ADDED
The diff for this file is too large to render. See raw diff